---
base_model: unsloth/phi-4-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---
# Uploaded model
- **Developed by:** Haq Nawaz Malik
- **License:** apache-2.0
- **Finetuned from model:** unsloth/phi-4-unsloth-bnb-4bit
# Fine-tuned Phi-4 Model Documentation
## 📌 Introduction
This documentation provides an in-depth overview of the **fine-tuned Phi-4 conversational AI model**, detailing its **training methodology, parameters, dataset, model deployment, and usage instructions**.
## 🔹 Model Overview
**Phi-4** is a transformer-based language model optimized for **natural language understanding and text generation**. We have fine-tuned it using **LoRA (Low-Rank Adaptation)** with the **Unsloth framework**, making it lightweight and efficient while preserving the base model's capabilities.
## 🔹 Training Details
### **🛠 Fine-tuning Methodology**
We employed **LoRA (Low-Rank Adaptation)** for fine-tuning, which significantly reduces the number of trainable parameters while retaining the model’s expressive power.
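The key idea can be sketched in a few lines of NumPy (toy dimensions for illustration only, not the actual phi-4 weight shapes): the pretrained weight `W` stays frozen, and only two small low-rank factors `A` and `B` are trained. Their product, scaled by `alpha / r`, is added to the frozen weight.

```python
import numpy as np

# Toy dimensions (illustrative only; real phi-4 projections are much larger)
d, r, alpha = 8, 2, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # trainable, initialized to zero

# Effective weight after fine-tuning: W + (alpha / r) * B @ A
W_eff = W + (alpha / r) * (B @ A)

# Only A and B are trained, not W
lora_params = A.size + B.size   # 2 * d * r = 32
full_params = W.size            # d * d     = 64
```

Because `B` starts at zero, the adapted model is initially identical to the base model, and training only updates the `2*d*r` adapter parameters instead of all `d*d` weights.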
### **📑 Dataset Used**
- **Dataset Name**: `mlabonne/FineTome-100k`
- **Dataset Size**: 100,000 examples
- **Data Format**: Conversational AI dataset with structured prompts and responses.
- **Preprocessing**: The dataset was standardized using `unsloth.chat_templates.standardize_sharegpt()`
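For intuition, a simplified sketch of the transformation such standardization performs: ShareGPT-style records use `"from"`/`"value"` keys, which are mapped to the `"role"`/`"content"` schema that chat templates expect. (This is a hand-rolled illustration; the real `standardize_sharegpt()` handles more edge cases.)

```python
# Simplified sketch of a ShareGPT -> chat-format conversion; the real
# unsloth.chat_templates.standardize_sharegpt() covers more cases.
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def standardize_example(example):
    """Map ShareGPT 'from'/'value' keys to the 'role'/'content' schema."""
    return {
        "conversations": [
            {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            for turn in example["conversations"]
        ]
    }

sharegpt_row = {
    "conversations": [
        {"from": "human", "value": "What is LoRA?"},
        {"from": "gpt", "value": "A parameter-efficient fine-tuning method."},
    ]
}
standardized = standardize_example(sharegpt_row)
```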
### **🔢 Training Parameters**
| Parameter | Value |
|----------------------|-------|
| LoRA Rank (`r`) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Target Modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Max Sequence Length | 2048 |
| Load in 4-bit | True |
| Gradient Checkpointing | `unsloth` |
| Fine-tuning Duration | **10 epochs** |
| Optimizer Used | AdamW |
| Learning Rate | 2e-5 |
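To see why rank-16 LoRA is lightweight, here is a hedged back-of-the-envelope count of the trainable parameters implied by the table above. The layer dimensions below are hypothetical placeholders, not phi-4's real configuration; each adapted module contributes `r * (in_features + out_features)` trainable parameters per layer.

```python
# Back-of-the-envelope count of trainable LoRA parameters for the table above.
# Dimensions here are ILLUSTRATIVE placeholders, not phi-4's actual config.
r = 16
hidden, intermediate, n_layers = 4096, 11008, 32  # hypothetical sizes

# (in_features, out_features) per targeted projection
modules = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, hidden),
    "v_proj": (hidden, hidden),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# Each LoRA pair adds r * (in + out) parameters per module, per layer
per_layer = sum(r * (fin + fout) for fin, fout in modules.values())
total = per_layer * n_layers
```

Even with these placeholder sizes the adapter stays in the tens of millions of parameters, a small fraction of the billions in the frozen base model.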
## 🔹 How to Load the Model
To load the fine-tuned model, use the **Unsloth framework**:
```python
from unsloth import FastLanguageModel

model_name = "Omarrran/lora_model"
max_seq_length = 2048
load_in_4bit = True  # 4-bit quantization to save memory

# Load the model and tokenizer; if the repo contains a LoRA adapter,
# Unsloth loads it on top of the base model automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)

# Attach a LoRA adapter with the same configuration used for fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```
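For reference, the phi-4 chat template flattens a message list into a single prompt string along roughly these lines. The exact special tokens below are an assumption for illustration; in practice, always build prompts with `tokenizer.apply_chat_template()` rather than by hand.

```python
# Rough sketch of how a phi-4-style chat template flattens messages into a
# prompt string. The special tokens are an assumption -- prefer
# tokenizer.apply_chat_template() over hand-building prompts.
def render_phi4_prompt(messages, add_generation_prompt=True):
    text = "".join(
        f"<|im_start|>{m['role']}<|im_sep|>{m['content']}<|im_end|>"
        for m in messages
    )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        text += "<|im_start|>assistant<|im_sep|>"
    return text

prompt = render_phi4_prompt([{"role": "user", "content": "Hello!"}])
```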
> **Note:** a CUDA-capable GPU is required, since the model is loaded with 4-bit quantization.
## 🔹 Deploying the Model
### **🚀 Using Google Colab**
1. Install dependencies:
```bash
pip install gradio transformers torch unsloth peft
```
2. Load the model using the script above.
3. Run inference using the chatbot interface.
### **🚀 Deploy on Hugging Face Spaces**
1. Save the script as `app.py`.
2. Create a `requirements.txt` file with:
```
gradio
transformers
torch
unsloth
peft
```
3. Upload the files to a new **Hugging Face Space**.
4. Select **Python environment** and click **Deploy**.
## 🔹 Using the Model
### **🗨 Chatbot Interface (Gradio UI)**
To interact with the fine-tuned model using **Gradio**, use:
```python
import gradio as gr
import torch
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the base model with Unsloth
model_name = "Omarrran/lora_model"  # Change this if needed
max_seq_length = 2048
load_in_4bit = True  # Use 4-bit quantization to save memory

base_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)

# Apply the LoRA adapter (same configuration as during fine-tuning)
model = FastLanguageModel.get_peft_model(
    base_model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Apply the phi-4 chat formatting template and switch to inference mode
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")
FastLanguageModel.for_inference(model)

# Chat function: format the user turn with the chat template,
# generate, and decode only the newly generated tokens
def chat_with_model(user_input):
    try:
        messages = [{"role": "user", "content": user_input}]
        input_ids = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            return_tensors="pt",
        ).to(model.device)
        output = model.generate(input_ids=input_ids, max_new_tokens=200)
        response = tokenizer.decode(
            output[0][input_ids.shape[1]:], skip_special_tokens=True
        )
        return response
    except Exception as e:
        return f"Error: {str(e)}"

# Define the Gradio interface
description = """
### 🧠 Phi-4 Conversational AI Chatbot
This chatbot is powered by **Unsloth's Phi-4 model**, optimized with **LoRA fine-tuning**.
#### 🔹 Features:
✅ **Lightweight LoRA adapter for efficiency**
✅ **Supports long-context conversations (2048 tokens)**
✅ **Optimized with 4-bit quantization for fast inference**
#### 🔹 Example Questions:
- "What is the capital of France?"
- "Tell me a joke!"
- "Explain black holes in simple terms."
"""

examples = [
    "Hello, how are you?",
    "What is the capital of France?",
    "Tell me a joke!",
    "What is quantum physics?",
    "Translate 'Hello' to French.",
]

# Launch the Gradio UI
demo = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(label="Your Message", placeholder="Type something here..."),
    outputs=gr.Textbox(label="Chatbot's Response"),
    title="🔹 HNM_Phi_4_finetuned",
    description=description,
    examples=examples,
    allow_flagging="never",
)

if __name__ == "__main__":
    demo.launch()
```
## 📌 Conclusion
This **fine-tuned Phi-4 model** combines **LoRA fine-tuning** on the FineTome-100k dataset with **Unsloth's 4-bit quantization**. The result is **lightweight and memory-efficient**, and suitable for chatbot applications in both **research and production environments**.