# Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V1
This model is a DPO fine-tuned version of meta-llama/Llama-2-7b-hf; the training dataset is not documented in this card. It achieves the following results on the evaluation set (the reward metrics are explained briefly after the list):
- Loss: 0.8779
- Rewards/chosen: -2.5074
- Rewards/rejected: -2.4835
- Rewards/accuracies: 0.5833
- Rewards/margins: -0.0240
- Logps/rejected: -187.5269
- Logps/chosen: -172.9323
- Logits/rejected: -0.3789
- Logits/chosen: -0.3801
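Assuming the standard DPO formulation (an assumption; the card does not state the DPO beta or the training script), the reward figures above are implicit rewards computed from the log-probability ratio between the fine-tuned policy and the frozen reference model:

```math
r_\theta(x, y) = \beta \bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr)
```

Under that reading, Rewards/margins is Rewards/chosen minus Rewards/rejected (here -2.5074 - (-2.4835) ≈ -0.024), and Rewards/accuracies is the fraction of evaluation pairs in which the chosen response receives the higher implicit reward.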
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
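The sketch below maps these hyperparameters onto a TRL `DPOConfig`. This is an assumption for illustration only: the card does not list a TRL version or the actual training script, and the model, reference model, and preference dataset are not specified here.

```python
from trl import DPOConfig  # assumes TRL's DPO implementation was used

training_args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V1",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # effective train batch size of 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999) and eps=1e-8 (the defaults)
)
# training_args would then be passed to trl.DPOTrainer together with the base
# model, a reference model, and a chosen/rejected preference dataset.
```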
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6952 | 0.3016 | 87 | 0.6794 | -0.0405 | -0.0725 | 0.5833 | 0.0320 | -163.4170 | -148.2627 | 0.3515 | 0.3604 |
| 0.6655 | 0.6031 | 174 | 0.6384 | 0.0391 | -0.0895 | 0.5 | 0.1287 | -163.5874 | -147.4663 | 0.3348 | 0.3431 |
| 0.6246 | 0.9047 | 261 | 0.6568 | 0.1297 | 0.0077 | 0.5833 | 0.1220 | -162.6151 | -146.5603 | 0.2825 | 0.2904 |
| 0.3939 | 1.2062 | 348 | 0.6986 | -0.2304 | -0.4082 | 0.5833 | 0.1778 | -166.7741 | -150.1618 | 0.1283 | 0.1335 |
| 0.3329 | 1.5078 | 435 | 0.7227 | -0.5473 | -0.6512 | 0.5833 | 0.1039 | -169.2040 | -153.3306 | -0.0449 | -0.0420 |
| 0.6015 | 1.8094 | 522 | 0.7035 | -1.0222 | -1.2334 | 0.5 | 0.2112 | -175.0264 | -158.0799 | -0.0987 | -0.0963 |
| 0.0646 | 2.1109 | 609 | 0.7550 | -1.6915 | -1.8415 | 0.5 | 0.1500 | -181.1071 | -164.7728 | -0.2277 | -0.2271 |
| 0.1952 | 2.4125 | 696 | 0.8210 | -2.1941 | -2.2483 | 0.5833 | 0.0542 | -185.1751 | -169.7991 | -0.3347 | -0.3356 |
| 0.0774 | 2.7140 | 783 | 0.8779 | -2.5074 | -2.4835 | 0.5833 | -0.0240 | -187.5269 | -172.9323 | -0.3789 | -0.3801 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.4.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
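The PEFT entry above suggests this repository contains a PEFT adapter (e.g. LoRA) rather than full model weights. A minimal loading sketch under that assumption (dtype and device placement omitted; adjust to your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the DPO-trained adapter
model.eval()
```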