lora_model / README.md
Omarrran's picture
Update README.md
ceafb84 verified
metadata
base_model: unsloth/phi-4-unsloth-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
license: apache-2.0
language:
  - en

Uploaded model

  • Developed by: Haq Nawaz Malik
  • License: apache-2.0
  • Finetuned from model : unsloth/phi-4-unsloth-bnb-4bit

Fine-tuned Phi-4 Model Documentation

📌 Introduction

This documentation provides an in-depth overview of the fine-tuned Phi-4 conversational AI model, detailing its training methodology, parameters, dataset, model deployment, and usage instructions.

🔹 Model Overview

Phi-4 is a transformer-based language model optimized for natural language understanding and text generation. We have fine-tuned it using LoRA (Low-Rank Adaptation) with the Unsloth framework, making it lightweight and efficient while preserving the base model's capabilities.

🔹 Training Details

🛠 Fine-tuning Methodology

We employed LoRA (Low-Rank Adaptation) for fine-tuning, which significantly reduces the number of trainable parameters while retaining the model’s expressive power.

📑 Dataset Used

  • Dataset Name: mlabonne/FineTome-100k
  • Dataset Size: 100,000 examples
  • Data Format: Conversational AI dataset with structured prompts and responses.
  • Preprocessing: The dataset was standardized using unsloth.chat_templates.standardize_sharegpt()

🔢 Training Parameters

Parameter Value
LoRA Rank (r) 16
LoRA Alpha 16
LoRA Dropout 0
Target Modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Max Sequence Length 2048
Load in 4-bit True
Gradient Checkpointing unsloth
Fine-tuning Duration 10 epochs
Optimizer Used AdamW
Learning Rate 2e-5

🔹 How to Load the Model

To load the fine-tuned model, use the Unsloth framework:

from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from peft import PeftModel

model_name = "Omarrran/lora_model"
max_seq_length = 2048
load_in_4bit = True

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit
)

# Apply LoRA adapter
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth"
)

NoTE : USE GPU

🔹 Deploying the Model

🚀 Using Google Colab

  1. Install dependencies:
    pip install gradio transformers torch unsloth peft
    
  2. Load the model using the script above.
  3. Run inference using the chatbot interface.

🚀 Deploy on Hugging Face Spaces

  1. Save the script as app.py.
  2. Create a requirements.txt file with:
    gradio
    transformers
    torch
    unsloth
    peft
    
  3. Upload the files to a new Hugging Face Space.
  4. Select Python environment and click Deploy.

🔹 Using the Model

🗨 Chatbot Interface (Gradio UI)

To interact with the fine-tuned model using Gradio, use:

import gradio as gr
import torch
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from peft import PeftModel

# Load the Base Model with Unsloth
model_name = "Omarrran/lora_model"  # Change this if needed
max_seq_length = 2048
load_in_4bit = True  # Use 4-bit quantization to save memory

# Load model and tokenizer
base_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit
)

# Apply LoRA Adapter
model = FastLanguageModel.get_peft_model(
    base_model,
    r=16,  
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth"
)

# Apply Chat Formatting Template
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

# Chat Function
def chat_with_model(user_input):
    try:
        inputs = tokenizer(user_input, return_tensors="pt")
        output = model.generate(**inputs, max_length=200)
        response = tokenizer.decode(output[0], skip_special_tokens=True)
        return response
    except Exception as e:
        return f"Error: {str(e)}"

# Define Gradio Interface
description = """
### 🧠 Phi-4 Conversational AI Chatbot
This chatbot is powered by **Unsloth's Phi-4 model**, optimized with **LoRA fine-tuning**.

#### 🔹 Features:
✅ **Lightweight LoRA adapter for efficiency**  
✅ **Supports long-context conversations (2048 tokens)**  
✅ **Optimized with 4-bit quantization for fast inference**

#### 🔹 Example Questions:
- "What is the capital of France?"
- "Tell me a joke!"
- "Explain black holes in simple terms."
"""

examples = [
    "Hello, how are you?",
    "What is the capital of France?",
    "Tell me a joke!",
    "What is quantum physics?",
    "Translate 'Hello' to French."
]

# Launch Gradio UI
demo = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(label="Your Message", placeholder="Type something here..."),
    outputs=gr.Textbox(label="Chatbot's Response"),
    title="🔹 HNM_Phi_4_finetuned",
    description=description,
    examples=examples,
    allow_flagging="never"
)

if __name__ == "__main__":
    demo.launch()

📌 Conclusion

This fine-tuned Phi-4 model delivers optimized conversational AI capabilities using LoRA fine-tuning and Unsloth’s 4-bit quantization. The model is lightweight, memory-efficient, and suitable for chatbot applications in both research and production environments.