---
base_model: unsloth/phi-4-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---

# Uploaded model

- **Developed by:** Haq Nawaz Malik
- **License:** apache-2.0
- **Fine-tuned from model:** unsloth/phi-4-unsloth-bnb-4bit

# Fine-tuned Phi-4 Model Documentation

## 📌 Introduction
This documentation describes the **fine-tuned Phi-4 conversational AI model**, covering its **training methodology, hyperparameters, dataset, deployment, and usage**.

## 🔹 Model Overview
**Phi-4** is a transformer-based language model optimized for **natural language understanding and text generation**. We have fine-tuned it using **LoRA (Low-Rank Adaptation)** with the **Unsloth framework**, making it lightweight and efficient while preserving the base model's capabilities.

## 🔹 Training Details
### **🛠 Fine-tuning Methodology**
We employed **LoRA (Low-Rank Adaptation)** for fine-tuning, which significantly reduces the number of trainable parameters while retaining the model’s expressive power.
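
Concretely, instead of updating a full weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA freezes $W$ and learns a low-rank update $\Delta W = BA$ with $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$. With rank $r = 16$ this trains only $r(d + k)$ parameters per target matrix instead of $dk$, and the effective weight at inference is $W + \frac{\alpha}{r}BA$ (here $\alpha = 16$, so the scaling factor is 1).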

### **📑 Dataset Used**
- **Dataset Name**: `mlabonne/FineTome-100k`
- **Dataset Size**: 100,000 examples
- **Data Format**: Conversational AI dataset with structured prompts and responses.
- **Preprocessing**: The dataset was standardized with `unsloth.chat_templates.standardize_sharegpt()` (see the sketch below).
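
The preprocessing step can be reproduced with a short script; this is a minimal sketch following the standard Unsloth workflow:

```python
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# Load the 100k-example conversational dataset
dataset = load_dataset("mlabonne/FineTome-100k", split="train")

# Convert ShareGPT-style records ({"from": ..., "value": ...}) into the
# standard {"role": ..., "content": ...} schema used by chat templates
dataset = standardize_sharegpt(dataset)
```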

### **🔢 Training Parameters**
| Parameter             | Value |
|----------------------|-------|
| LoRA Rank (`r`)     | 16    |
| LoRA Alpha          | 16    |
| LoRA Dropout        | 0     |
| Target Modules      | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Max Sequence Length | 2048  |
| Load in 4-bit       | True  |
| Gradient Checkpointing | `unsloth` |
| Fine-tuning Duration | **10 epochs** |
| Optimizer Used      | AdamW |
| Learning Rate       | 2e-5  |
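
For context, here is how these hyperparameters fit together in a training sketch in the style of the Unsloth/TRL notebooks. The batch size, gradient accumulation, logging, and output directory below are illustrative assumptions, not documented values:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# `model` is the LoRA-wrapped model and `dataset` the standardized
# FineTome-100k data from the steps above; the "text" field is assumed
# to hold the chat-template-rendered conversations.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumption, not from the table
        gradient_accumulation_steps=4,   # assumption, not from the table
        num_train_epochs=10,             # from the table
        learning_rate=2e-5,              # from the table
        optim="adamw_torch",             # AdamW, per the table
        logging_steps=10,                # illustrative
        output_dir="outputs",            # illustrative
    ),
)
trainer.train()
```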

## 🔹 How to Load the Model
To load the fine-tuned model, use the **Unsloth framework**:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

model_name = "Omarrran/lora_model"
max_seq_length = 2048
load_in_4bit = True

# Load the model and tokenizer. Pointing at the adapter repo lets Unsloth
# resolve the base model and attach the trained LoRA weights automatically,
# so no separate PeftModel step is needed here.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)

# Attach the Phi-4 chat template used during fine-tuning
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

# Enable Unsloth's optimized inference mode. (FastLanguageModel.get_peft_model
# would attach a *new*, untrained adapter and is only needed when continuing
# fine-tuning, not for inference.)
FastLanguageModel.for_inference(model)
```
> **Note:** A CUDA-capable GPU is required; 4-bit loading via `bitsandbytes` does not run on CPU.
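
To fail fast when no GPU is present, a simple check can be run before loading the model:

```python
import torch

# bitsandbytes 4-bit kernels require a CUDA device
assert torch.cuda.is_available(), "No CUDA GPU detected; 4-bit loading will not work on CPU."
```
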
## 🔹 Deploying the Model
### **🚀 Using Google Colab**
1. Install dependencies:
    ```bash
    pip install gradio transformers torch unsloth peft
    ```
2. Load the model using the script above.
3. Run inference using the chatbot interface; a quick smoke test is sketched below.
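
Before wiring up the UI, a short smoke test confirms the model responds. This reuses `model` and `tokenizer` from the loading script above:

```python
# Ask a single question and print the reply
messages = [{"role": "user", "content": "What is the capital of France?"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
```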

### **🚀 Deploy on Hugging Face Spaces**
1. Save the script as `app.py`.
2. Create a `requirements.txt` file with:
    ```
    gradio
    transformers
    torch
    unsloth
    peft
    ```
3. Upload the files to a new **Hugging Face Space**.
4. Choose the **Gradio SDK** when creating the Space; the app builds and deploys automatically once the files are uploaded (a minimal front-matter example follows).
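
Spaces read their configuration from YAML front matter at the top of the Space's `README.md`. A minimal example, with placeholder title and emoji:

```
---
title: HNM Phi 4 Finetuned
emoji: 🧠
sdk: gradio
app_file: app.py
---
```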

## 🔹 Using the Model
### **🗨 Chatbot Interface (Gradio UI)**
To interact with the fine-tuned model using **Gradio**, use:

```python
import gradio as gr
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the base model together with the trained LoRA adapter via Unsloth
model_name = "Omarrran/lora_model"  # Change this if needed
max_seq_length = 2048
load_in_4bit = True  # Use 4-bit quantization to save memory

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)

# Enable Unsloth's optimized inference mode
FastLanguageModel.for_inference(model)

# Apply the Phi-4 chat formatting template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

# Chat function: format the message with the chat template, generate,
# and decode only the newly generated tokens
def chat_with_model(user_input):
    try:
        messages = [{"role": "user", "content": user_input}]
        inputs = tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_tensors="pt",
        ).to(model.device)
        output = model.generate(inputs, max_new_tokens=200)
        response = tokenizer.decode(
            output[0][inputs.shape[1]:], skip_special_tokens=True
        )
        return response
    except Exception as e:
        return f"Error: {str(e)}"

# Define Gradio Interface
description = """
### 🧠 Phi-4 Conversational AI Chatbot
This chatbot is powered by **Unsloth's Phi-4 model**, optimized with **LoRA fine-tuning**.

#### 🔹 Features:
✅ **Lightweight LoRA adapter for efficiency**  
✅ **Supports long-context conversations (2048 tokens)**  
✅ **Optimized with 4-bit quantization for fast inference**

#### 🔹 Example Questions:
- "What is the capital of France?"
- "Tell me a joke!"
- "Explain black holes in simple terms."
"""

examples = [
    "Hello, how are you?",
    "What is the capital of France?",
    "Tell me a joke!",
    "What is quantum physics?",
    "Translate 'Hello' to French.",
]

# Launch Gradio UI
demo = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(label="Your Message", placeholder="Type something here..."),
    outputs=gr.Textbox(label="Chatbot's Response"),
    title="🔹 HNM_Phi_4_finetuned",
    description=description,
    examples=examples,
    allow_flagging="never",
)

if __name__ == "__main__":
    demo.launch()
```
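
Running `python app.py` starts the interface locally; in Colab, passing `share=True` to `demo.launch()` prints a temporary public link.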

## 📌 Conclusion
This **fine-tuned Phi-4 model** combines **LoRA fine-tuning** with **Unsloth's 4-bit quantization** to provide a **lightweight, memory-efficient** conversational model suited to chatbot applications in both **research and production** settings.