# Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V4.0
This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.4718
- Rewards/chosen: -2.4268
- Rewards/rejected: -3.3611
- Rewards/accuracies: 0.75
- Rewards/margins: 0.9343
- Logps/rejected: -120.4226
- Logps/chosen: -107.9291
- Logits/rejected: -1.6517
- Logits/chosen: -1.6556
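For context, the "rewards" above are DPO's implicit rewards: for each completion, beta times the difference between the policy and reference log-probabilities, with the margin being rewards/chosen minus rewards/rejected. A minimal sketch of the standard DPO loss follows; the `beta` value and log-prob inputs are illustrative placeholders, not values from this run:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss computed from per-sequence log-probabilities.

    The implicit reward for each completion is
    beta * (policy_logp - ref_logp), matching the Rewards/* metrics
    reported in cards like this one.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)): small when chosen outscores rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, reward_chosen, reward_rejected, margin

# Illustrative numbers only (not taken from this model's evaluation)
loss, rc, rr, margin = dpo_loss(-10.0, -12.0, -9.0, -9.5, beta=0.1)
```

A positive margin corresponds to an accuracy "hit" in the Rewards/accuracies metric, which counts how often the chosen completion's implicit reward exceeds the rejected one's.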
## Model description

More information needed
## Intended uses & limitations

More information needed
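Since this repository contains a PEFT adapter on top of Llama-2-7b-hf (see the framework versions below), one typical way to load it is sketched here. This is an assumption about intended usage, not documented by the card; the repo id is taken from the card, and access to the gated base model is required:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo = "LBK95/Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V4.0"

# Loads the adapter together with its meta-llama/Llama-2-7b-hf base in one call
model = AutoPeftModelForCausalLM.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
```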
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
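These settings map onto a trl `DPOConfig` roughly as follows. This is a sketch, assuming trl's `DPOTrainer` was used (the card does not say so); the DPO `beta` is not listed above and is omitted here:

```python
from trl import DPOConfig

config = DPOConfig(
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # effective train batch size: 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
)
```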
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7006 | 0.3051 | 54 | 0.6866 | 0.0193 | 0.0054 | 0.625 | 0.0139 | -86.7576 | -83.4686 | -0.8794 | -0.8889 |
| 0.69 | 0.6102 | 108 | 0.6159 | -0.0023 | -0.1736 | 0.875 | 0.1712 | -88.5472 | -83.6846 | -0.8944 | -0.9046 |
| 0.5649 | 0.9153 | 162 | 0.5807 | -0.1149 | -0.3833 | 0.875 | 0.2684 | -90.6444 | -84.8100 | -0.9769 | -0.9857 |
| 0.3921 | 1.2203 | 216 | 0.5138 | -0.6026 | -1.0626 | 0.875 | 0.4600 | -97.4372 | -89.6870 | -1.0866 | -1.0941 |
| 0.2459 | 1.5254 | 270 | 0.4782 | -0.8139 | -1.3669 | 0.875 | 0.5530 | -100.4805 | -91.7997 | -1.1226 | -1.1302 |
| 0.3946 | 1.8305 | 324 | 0.5178 | -1.1731 | -1.6961 | 0.75 | 0.5230 | -103.7727 | -95.3921 | -1.3492 | -1.3554 |
| 0.1509 | 2.1356 | 378 | 0.4919 | -1.6892 | -2.4213 | 0.75 | 0.7321 | -111.0249 | -100.5536 | -1.5040 | -1.5090 |
| 0.3279 | 2.4407 | 432 | 0.4825 | -2.1908 | -3.0498 | 0.75 | 0.8590 | -117.3094 | -105.5691 | -1.6421 | -1.6462 |
| 0.1453 | 2.7458 | 486 | 0.4718 | -2.4268 | -3.3611 | 0.75 | 0.9343 | -120.4226 | -107.9291 | -1.6517 | -1.6556 |
### Framework versions
- PEFT 0.13.0
- Transformers 4.45.1
- Pytorch 2.4.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0