base_model: unsloth/phi-4-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
Uploaded model
- Developed by: Haq Nawaz Malik
- License: apache-2.0
- Finetuned from model : unsloth/phi-4-unsloth-bnb-4bit
Fine-tuned Phi-4 Model Documentation
📌 Introduction
This documentation provides an in-depth overview of the fine-tuned Phi-4 conversational AI model, detailing its training methodology, parameters, dataset, model deployment, and usage instructions.
🔹 Model Overview
Phi-4 is a transformer-based language model optimized for natural language understanding and text generation. We have fine-tuned it using LoRA (Low-Rank Adaptation) with the Unsloth framework, making it lightweight and efficient while preserving the base model's capabilities.
🔹 Training Details
🛠 Fine-tuning Methodology
We employed LoRA (Low-Rank Adaptation) for fine-tuning, which significantly reduces the number of trainable parameters while retaining the model’s expressive power.
📑 Dataset Used
- Dataset Name:
mlabonne/FineTome-100k
- Dataset Size: 100,000 examples
- Data Format: Conversational AI dataset with structured prompts and responses.
- Preprocessing: The dataset was standardized using
unsloth.chat_templates.standardize_sharegpt()
🔢 Training Parameters
Parameter | Value |
---|---|
LoRA Rank (r ) |
16 |
LoRA Alpha | 16 |
LoRA Dropout | 0 |
Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
Max Sequence Length | 2048 |
Load in 4-bit | True |
Gradient Checkpointing | unsloth |
Fine-tuning Duration | 10 epochs |
Optimizer Used | AdamW |
Learning Rate | 2e-5 |
🔹 How to Load the Model
To load the fine-tuned model, use the Unsloth framework:
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from peft import PeftModel
model_name = "Omarrran/lora_model"
max_seq_length = 2048
load_in_4bit = True
# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
model_name=model_name,
max_seq_length=max_seq_length,
load_in_4bit=load_in_4bit
)
# Apply LoRA adapter
model = FastLanguageModel.get_peft_model(
model,
r=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"],
lora_alpha=16,
lora_dropout=0,
bias="none",
use_gradient_checkpointing="unsloth"
)
NoTE : USE GPU
🔹 Deploying the Model
🚀 Using Google Colab
- Install dependencies:
pip install gradio transformers torch unsloth peft
- Load the model using the script above.
- Run inference using the chatbot interface.
🚀 Deploy on Hugging Face Spaces
- Save the script as
app.py
. - Create a
requirements.txt
file with:gradio transformers torch unsloth peft
- Upload the files to a new Hugging Face Space.
- Select Python environment and click Deploy.
🔹 Using the Model
🗨 Chatbot Interface (Gradio UI)
To interact with the fine-tuned model using Gradio, use:
import gradio as gr
import torch
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from peft import PeftModel
# Load the Base Model with Unsloth
model_name = "Omarrran/lora_model" # Change this if needed
max_seq_length = 2048
load_in_4bit = True # Use 4-bit quantization to save memory
# Load model and tokenizer
base_model, tokenizer = FastLanguageModel.from_pretrained(
model_name=model_name,
max_seq_length=max_seq_length,
load_in_4bit=load_in_4bit
)
# Apply LoRA Adapter
model = FastLanguageModel.get_peft_model(
base_model,
r=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"],
lora_alpha=16,
lora_dropout=0,
bias="none",
use_gradient_checkpointing="unsloth"
)
# Apply Chat Formatting Template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")
# Chat Function
def chat_with_model(user_input):
try:
inputs = tokenizer(user_input, return_tensors="pt")
output = model.generate(**inputs, max_length=200)
response = tokenizer.decode(output[0], skip_special_tokens=True)
return response
except Exception as e:
return f"Error: {str(e)}"
# Define Gradio Interface
description = """
### 🧠 Phi-4 Conversational AI Chatbot
This chatbot is powered by **Unsloth's Phi-4 model**, optimized with **LoRA fine-tuning**.
#### 🔹 Features:
✅ **Lightweight LoRA adapter for efficiency**
✅ **Supports long-context conversations (2048 tokens)**
✅ **Optimized with 4-bit quantization for fast inference**
#### 🔹 Example Questions:
- "What is the capital of France?"
- "Tell me a joke!"
- "Explain black holes in simple terms."
"""
examples = [
"Hello, how are you?",
"What is the capital of France?",
"Tell me a joke!",
"What is quantum physics?",
"Translate 'Hello' to French."
]
# Launch Gradio UI
demo = gr.Interface(
fn=chat_with_model,
inputs=gr.Textbox(label="Your Message", placeholder="Type something here..."),
outputs=gr.Textbox(label="Chatbot's Response"),
title="🔹 HNM_Phi_4_finetuned",
description=description,
examples=examples,
allow_flagging="never"
)
if __name__ == "__main__":
demo.launch()
📌 Conclusion
This fine-tuned Phi-4 model delivers optimized conversational AI capabilities using LoRA fine-tuning and Unsloth’s 4-bit quantization. The model is lightweight, memory-efficient, and suitable for chatbot applications in both research and production environments.