Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V1

This model is a DPO fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set (the reward metrics are explained after the list):

  • Loss: 0.8779
  • Rewards/chosen: -2.5074
  • Rewards/rejected: -2.4835
  • Rewards/accuracies: 0.5833
  • Rewards/margins: -0.0240
  • Logps/rejected: -187.5269
  • Logps/chosen: -172.9323
  • Logits/rejected: -0.3789
  • Logits/chosen: -0.3801
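
For readers unfamiliar with these columns: under the usual DPO convention (e.g., TRL's DPOTrainer; the card does not say which trainer was used), Rewards/chosen and Rewards/rejected are the implicit rewards β·(log π_θ(y|x) − log π_ref(y|x)) of the preferred and dispreferred completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected reward. Below is a minimal sketch of these quantities, assuming that convention and an illustrative β = 0.1 (the card does not report β):

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps: torch.Tensor,
                policy_rejected_logps: torch.Tensor,
                ref_chosen_logps: torch.Tensor,
                ref_rejected_logps: torch.Tensor,
                beta: float = 0.1):
    # Implicit DPO rewards: beta * (policy log-prob minus reference log-prob),
    # with log-probs summed per sequence. beta=0.1 is assumed, not from the card.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected                       # Rewards/margins
    accuracies = (rewards_chosen > rewards_rejected).float().mean()   # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                              # standard DPO loss
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracies
```

Note that the final Rewards/margins above is slightly negative (-0.0240): at the last evaluation the model assigned, on average, a marginally higher implicit reward to rejected completions than to chosen ones.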

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a trainer configuration follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
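
As a rough reconstruction, these settings map onto TRL's DPOConfig as shown below. The training framework is an assumption (the card does not name it), and output_dir is a hypothetical path; all numeric values come from the list above.

```python
from trl import DPOConfig

config = DPOConfig(
    output_dir="./dpo-output",       # hypothetical path, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # effective train batch size: 2 * 2 = 4
    seed=42,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
)
```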

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6952        | 0.3016 | 87   | 0.6794          | -0.0405        | -0.0725          | 0.5833             | 0.0320          | -163.4170      | -148.2627    | 0.3515          | 0.3604        |
| 0.6655        | 0.6031 | 174  | 0.6384          | 0.0391         | -0.0895          | 0.5                | 0.1287          | -163.5874      | -147.4663    | 0.3348          | 0.3431        |
| 0.6246        | 0.9047 | 261  | 0.6568          | 0.1297         | 0.0077           | 0.5833             | 0.1220          | -162.6151      | -146.5603    | 0.2825          | 0.2904        |
| 0.3939        | 1.2062 | 348  | 0.6986          | -0.2304        | -0.4082          | 0.5833             | 0.1778          | -166.7741      | -150.1618    | 0.1283          | 0.1335        |
| 0.3329        | 1.5078 | 435  | 0.7227          | -0.5473        | -0.6512          | 0.5833             | 0.1039          | -169.2040      | -153.3306    | -0.0449         | -0.0420       |
| 0.6015        | 1.8094 | 522  | 0.7035          | -1.0222        | -1.2334          | 0.5                | 0.2112          | -175.0264      | -158.0799    | -0.0987         | -0.0963       |
| 0.0646        | 2.1109 | 609  | 0.7550          | -1.6915        | -1.8415          | 0.5                | 0.1500          | -181.1071      | -164.7728    | -0.2277         | -0.2271       |
| 0.1952        | 2.4125 | 696  | 0.8210          | -2.1941        | -2.2483          | 0.5833             | 0.0542          | -185.1751      | -169.7991    | -0.3347         | -0.3356       |
| 0.0774        | 2.7140 | 783  | 0.8779          | -2.5074        | -2.4835          | 0.5833             | -0.0240         | -187.5269      | -172.9323    | -0.3789         | -0.3801       |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.45.2
  • PyTorch 2.4.0+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.1
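
For inference, here is a minimal sketch of loading this checkpoint on top of the base model with PEFT, assuming it is published as a PEFT adapter for meta-llama/Llama-2-7b-hf (as the framework versions above suggest); the dtype and prompt are illustrative, and access to the gated base weights is required.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V1"

# AutoPeftModelForCausalLM reads the base model name from the adapter config,
# loads the base weights, then applies the adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```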