---
base_model: unsloth/phi-4-unsloth-bnb-4bit
library_name: peft
---
# Model Card for Vijayendra/Phi4-MedQA

Phi4-MedQA is a PEFT adapter fine-tuned from `unsloth/phi-4-unsloth-bnb-4bit` to answer medical questions clearly and in detail, in complete sentences.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Vijayendra
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Causal language model with a PEFT adapter, fine-tuned for medical question answering
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** unsloth/phi-4-unsloth-bnb-4bit
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** https://huggingface.co/Vijayendra/Phi4-MedQA
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## How to Use
The snippet below installs the required libraries, loads the 4-bit quantized model with Unsloth, and answers a sample medical question.

```python
# Install required libraries
!pip install unsloth peft bitsandbytes accelerate transformers

# Import the Unsloth loader
from unsloth import FastLanguageModel

# Define the MedQA prompt
medqa_prompt = """You are a medical QA system. Answer the following medical question clearly and in detail with complete sentences.
### Question:
{}
### Answer:
"""

# Load the model and tokenizer with Unsloth
model_name = "Vijayendra/Phi4-MedQA"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048,
    dtype=None,          # Use default precision
    load_in_4bit=True,   # Enable 4-bit quantization
    device_map="auto",   # Automatically map the model to available devices
)

# Enable faster inference
FastLanguageModel.for_inference(model)

# Prepare the medical question
medical_question = "What are the common symptoms of diabetes?"  # Replace with your own question
inputs = tokenizer(
    [medqa_prompt.format(medical_question)],
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=1024,
).to("cuda")  # Move inputs to the GPU

# Generate the answer
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # Allow for detailed responses
    use_cache=True,      # Speeds up generation
)

# Decode the response and extract the generated answer
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
answer_text = response.split("### Answer:")[1].strip() if "### Answer:" in response else response.strip()

print(f"Question: {medical_question}")
print(f"Answer: {answer_text}")
```
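If you prefer not to use Unsloth, here is a minimal sketch that loads the adapter with PEFT and Transformers directly. It assumes the repository stores a standard PEFT adapter (with tokenizer files) referencing the `unsloth/phi-4-unsloth-bnb-4bit` base model, and it reuses `medqa_prompt` from the snippet above.

```python
# Alternative loading sketch: PEFT + Transformers, without Unsloth.
# Assumes the repo contains a standard PEFT adapter and tokenizer files.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

model_id = "Vijayendra/Phi4-MedQA"

model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)  # if tokenizer files are missing here, load from the base model instead

prompt = medqa_prompt.format("What are the common symptoms of diabetes?")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```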
### Framework versions
- PEFT 0.14.0