doplhin-dpo-mnlp
This model is a fine-tuned version of cognitivecomputations/dolphin-2.1-mistral-7b. The training dataset is not specified in the card metadata. It achieves the following results on the evaluation set:
- Loss: 0.0533
- Rewards/chosen: 1.5359
- Rewards/rejected: -19.2198
- Rewards/accuracies: 0.9859
- Rewards/margins: 20.7558
- Logps/rejected: -297.6228
- Logps/chosen: -116.0773
- Logits/rejected: -2.0080
- Logits/chosen: -2.2270
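The reward metrics above are related to each other, and a short check makes the relationship concrete. The sketch below assumes the standard sigmoid DPO objective without label smoothing, which the card does not confirm; `dpo_pair_loss` is a hypothetical helper name introduced for illustration.

```python
import math

# Evaluation values copied from the list above.
rewards_chosen = 1.5359
rewards_rejected = -19.2198
reported_margin = 20.7558

# Rewards/margins is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin = {margin:.4f} (reported {reported_margin})")


def dpo_pair_loss(pair_margin: float) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(margin)).

    ASSUMPTION: standard sigmoid DPO loss, no label smoothing.
    Both branches compute the same quantity; the split avoids overflow.
    """
    if pair_margin >= 0:
        return math.log1p(math.exp(-pair_margin))
    return -pair_margin + math.log1p(math.exp(pair_margin))


# Loss of a hypothetical pair whose margin equals the reported mean margin.
print(f"loss at mean margin = {dpo_pair_loss(margin):.2e}")
```

The recomputed margin agrees with the reported one up to rounding in the last digit. The loss at the mean margin is far smaller than the reported evaluation loss of 0.0533. Under the stated assumption this is plausible, because the loss is averaged per pair and the small share of misranked pairs (accuracy 0.9859) contributes most of that average.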
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 6
- eval_batch_size: 6
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
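To illustrate how the cosine scheduler and the warmup ratio above interact, here is a minimal sketch of a linear-warmup, cosine-decay schedule in plain Python. The total step count of 281 is an assumption inferred from the training results table (step 260 corresponds to epoch 0.9253), and `lr_at` is a hypothetical helper, not part of any library.

```python
import math

PEAK_LR = 5e-5       # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 281    # ASSUMPTION: inferred from the results table (260 / 0.9253)


def lr_at(step: int,
          total_steps: int = TOTAL_STEPS,
          peak_lr: float = PEAK_LR,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to peak_lr, then half-cosine decay to 0.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the four evaluation steps of the results table.
for step in (65, 130, 195, 260):
    print(step, f"{lr_at(step):.3e}")
```

With these assumptions the learning rate peaks shortly after step 28 and is close to zero by the last logged step.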
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.1834 | 0.2313 | 65 | 0.0205 | -4.0635 | -21.5516 | 0.9883 | 17.4881 | -320.9407 | -172.0718 | -2.2051 | -2.5538 |
0.3173 | 0.4626 | 130 | 0.0478 | -3.7133 | -20.7365 | 0.9812 | 17.0232 | -312.7894 | -168.5696 | -1.7985 | -2.0459 |
0.0481 | 0.6940 | 195 | 0.0392 | 1.3063 | -18.0062 | 0.9883 | 19.3124 | -285.4860 | -118.3736 | -1.8805 | -2.1378 |
0.0079 | 0.9253 | 260 | 0.0533 | 1.5359 | -19.2198 | 0.9859 | 20.7558 | -297.6228 | -116.0773 | -2.0080 | -2.2270 |
Framework versions
- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.1.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
Model tree for yassinechaouch/doplhin-dpo-mnlp
Base model
cognitivecomputations/dolphin-2.1-mistral-7b
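Because the framework versions list PEFT, the repository presumably contains an adapter on top of the base model rather than full weights; the card does not confirm this. Under that assumption, loading could look like the sketch below. `load_model` is a hypothetical helper, and the choice of `bfloat16` is also an assumption.

```python
ADAPTER_ID = "yassinechaouch/doplhin-dpo-mnlp"
BASE_ID = "cognitivecomputations/dolphin-2.1-mistral-7b"


def load_model(adapter_id: str = ADAPTER_ID, base_id: str = BASE_ID):
    """Load the base model and attach the adapter (ASSUMPTION: the repo is a PEFT adapter)."""
    # Imports are local so that defining this function does not require the libraries.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Wrap the base model with the adapter weights from the fine-tuned repository.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

Calling `load_model()` downloads a 7B-parameter base model, so it needs substantial disk space and memory.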