---
library_name: peft
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
- >-
flax-sentence-embeddings/stackexchange_titlebody_best_and_down_voted_answer_jsonl
language:
- en
---
# Model Card for Phi-3-mini-4k-instruct DPO
## Model Details
- **Model Name:** Phi-3-mini-4k-instruct DPO
- **Publisher:** Team chatterbox, EPFL
- **Model Type:** Language Model, Fine-tuned with direct preference optimization (DPO)
- **Training Environment:** Trained on the EPFL SCITAS cluster using a 32GB GPU.
## Intended Use
- **Primary Applications:** This model is designed as part of an AI-Tutor system.
- **Intended Audience:** Educators, students, and developers creating educational AI applications (see the loading sketch after this list).
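
The sketch below shows one plausible way to load this adapter on top of the base model for inference with `transformers` and `peft`. It is a hedged example, not an official snippet: the adapter repository id is a placeholder and the generation settings are assumptions.

```python
# Minimal loading/inference sketch. The adapter id below is a PLACEHOLDER;
# replace it with this repository's actual id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/Phi-3-mini-4k-instruct"
adapter_id = "<this-repo-id>"  # placeholder: the DPO adapter repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # use float16 on GPUs without bfloat16 support
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Phi-3-mini-4k-instruct ships a chat template, so apply_chat_template can be used.
messages = [{"role": "user", "content": "Explain gradient descent in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```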
## Model/Data Description
### Training Data
- **Datasets Used:**
  - **Milestone 1 Dataset:** 1,522 unique questions with preference pairs built from the 'overall' rating, totaling 20k+ usable entries after processing.
  - **Stack Exchange Dataset:** Content filtered to specific domains of the Stack Exchange network, with upvoted and downvoted answers forming preference pairs; 54,458 entries after preprocessing.
  - **Ultra Feedback:** Responses rated on criteria such as truthfulness and helpfulness, used to form preference pairs; 60,917 entries after preprocessing.
- **Preprocessing Details:** Entries with identical chosen and rejected answers were removed, and each dataset was formatted as JSONL, where every line is a JSON object with a "prompt", a "chosen" response, and a "rejected" response (see the sketch after this list).
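
The snippet below is a minimal sketch of that preprocessing step, not the team's actual script: it drops pairs whose chosen and rejected answers are identical and writes the remaining entries as JSONL records with `prompt`, `chosen`, and `rejected` fields. The helper name and toy records are illustrative.

```python
# Hedged preprocessing sketch: filter degenerate pairs and emit JSONL.
import json

def write_preference_jsonl(records, out_path):
    """records: iterable of dicts with 'prompt', 'chosen', and 'rejected' keys."""
    kept = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            # Skip uninformative pairs where both answers are the same.
            if rec["chosen"].strip() == rec["rejected"].strip():
                continue
            f.write(json.dumps({
                "prompt": rec["prompt"],
                "chosen": rec["chosen"],
                "rejected": rec["rejected"],
            }, ensure_ascii=False) + "\n")
            kept += 1
    return kept

# Toy usage example: the second record is dropped by the identical-answer filter.
records = [
    {"prompt": "What is 2+2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Define entropy.", "chosen": "same", "rejected": "same"},
]
print(write_preference_jsonl(records, "train.jsonl"))  # -> 1
```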
## Training Procedure
- **Configurations:** Refer to the `training_args` and `trainer` configurations provided with the training code; a hedged sketch of a comparable DPO setup follows this list.
- **Evaluation Metrics:** The primary metric for model selection is `eval_loss`, which training aims to minimize.
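
For illustration, the sketch below shows a typical DPO fine-tune of the base model with a LoRA adapter via TRL's `DPOTrainer` (TRL ~0.8 / PEFT 0.11-era arguments). It is not the team's exact configuration: every hyperparameter, the file paths, and the LoRA settings are assumptions, and argument names differ slightly across TRL versions.

```python
# Hedged DPO + LoRA training sketch; all values are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# JSONL files with "prompt"/"chosen"/"rejected" fields, as described above.
dataset = load_dataset("json", data_files={"train": "train.jsonl", "eval": "eval.jsonl"})

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    # Phi-3 fuses the attention projections into qkv_proj.
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="phi3-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    fp16=True,                   # or bf16=True on hardware that supports bfloat16
    evaluation_strategy="steps",
    eval_steps=200,
    logging_steps=50,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,              # with a PEFT adapter, the frozen base acts as the reference model
    args=training_args,
    beta=0.1,                    # DPO temperature; an assumed value
    train_dataset=dataset["train"],
    eval_dataset=dataset["eval"],
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
```

Passing a `peft_config` with `ref_model=None` lets TRL reuse the frozen base weights as the implicit reference model, which helps keep DPO training within a single 32GB GPU.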
## Evaluation Results
- **Accuracies** (`eval/rewards/accuracies`) - 0.83
- **Loss** (`eval/loss`) - 0.47
- **Margins** (`eval/margins`) - 4.31
### MT-Bench
- **Single Grading Score, Overall Avg.** - 8.2
- **STEM Score** - 9.8 (higher than GPT-4)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/633206606eae0bb0a01c8a82/ay1QSp2hkicRTY4fcnAPX.png)
## References
- [Include references and citations for datasets, tools, and methodologies used.]

### Framework versions
- PEFT 0.11.1