dpo_with_se

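This model is a PEFT adapter for microsoft/Phi-3-mini-4k-instruct, fine-tuned with DPO (Direct Preference Optimization) on an unspecified dataset. It achieves the following results on the evaluation set. These values match the step-1500 checkpoint (epoch 1.8657), which has the lowest validation loss in the training log: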

  • Loss: 0.6194
  • Rewards/chosen: -0.6699
  • Rewards/rejected: -1.1107
  • Rewards/accuracies: 0.6458
  • Rewards/margins: 0.4407
  • Logps/rejected: -422.9081
  • Logps/chosen: -458.9963
  • Logits/rejected: 0.0509
  • Logits/chosen: 0.1892
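These metric names follow the conventions of TRL's `DPOTrainer`. The card does not name the training library, so that is an assumption. Under that convention, a response's reward is β times the difference between the policy's and the frozen reference model's summed log-probabilities. Rewards/margins is the mean of chosen reward minus rejected reward: here −0.6699 − (−1.1107) = 0.4408, which matches the reported 0.4407 up to rounding. Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected one, and Loss is the mean of −log σ(margin).

The sketch below recomputes these statistics from per-example log-probabilities. β = 0.1 is an assumed value (TRL's default), since the card does not report the β used. The log-probabilities in the usage lines are hypothetical and do not come from this model.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def dpo_stats(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Mean DPO statistics for a batch of preference pairs.

    Each argument is a list of summed log-probabilities, one per example.
    beta=0.1 is an assumption; the card does not state the value used.
    """
    n = len(policy_chosen)
    # Implicit rewards: beta * (log pi_policy - log pi_reference)
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        # Sigmoid DPO loss: -log(sigmoid(m)) == softplus(-m)
        "loss": sum(softplus(-m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
    }

# Hypothetical log-probabilities for two preference pairs (not from this model).
stats = dpo_stats(
    policy_chosen=[-450.0, -300.0],
    policy_rejected=[-420.0, -310.0],
    ref_chosen=[-455.0, -301.0],
    ref_rejected=[-418.0, -309.0],
)
print(stats)
```

A loss below log 2 ≈ 0.693 means that, on average, the policy has moved probability mass toward the chosen responses relative to the reference model. The evaluation loss of 0.6194 reported above is consistent with a modest preference.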

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 2
  • mixed_precision_training: Native AMP
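Several derived quantities follow from the list above. The effective training batch is per-device batch × devices × accumulation steps = 8 × 4 × 2 = 64 examples per optimizer step. Evaluation does not accumulate gradients, so its batch is 8 × 4 = 32. In the transformers `Trainer`, a positive `warmup_steps` overrides `warmup_ratio`. Assuming this run used the standard `Trainer`, warmup would therefore last 100 steps rather than 10% of training.

The sketch below reproduces the shape of a linear-warmup, half-cosine-decay schedule of this kind. `TOTAL_STEPS = 1608` is an assumption inferred from the results table: step 800 is logged at epoch 0.9950, which implies roughly 804 optimizer steps per epoch over 2 epochs.

```python
import math

BASE_LR = 2e-5
WARMUP_STEPS = 100  # a positive warmup_steps takes precedence over warmup_ratio=0.1
TOTAL_STEPS = 1608  # assumption: ~804 optimizer steps/epoch x 2 epochs, inferred from the log

def effective_batch(per_device=8, devices=4, grad_accum=2):
    """Examples consumed per optimizer step across all GPUs."""
    return per_device * devices * grad_accum

def lr_at(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr, then a half-cosine decay to 0,
    following the same formula as transformers' get_cosine_schedule_with_warmup
    with its default num_cycles=0.5.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for s in (0, 50, 100, 854, 1608):
    print(s, lr_at(s))
```

The peak rate of 2e-5 is reached at step 100. It halves at the midpoint of the decay phase (step 854) and reaches 0 at the final step.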

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7121 | 0.0622 | 50 | 0.7078 | 1.9859 | 1.9118 | 0.5694 | 0.0741 | -392.6837 | -432.4385 | 0.1883 | 0.3317 |
| 0.672 | 0.1244 | 100 | 0.6718 | 0.4213 | 0.2008 | 0.5972 | 0.2204 | -409.7933 | -448.0844 | 0.1330 | 0.2722 |
| 0.6803 | 0.1866 | 150 | 0.6633 | 1.2004 | 0.9074 | 0.6215 | 0.2930 | -402.7275 | -440.2932 | 0.2565 | 0.3917 |
| 0.6816 | 0.2488 | 200 | 0.6535 | -0.2285 | -0.4811 | 0.5938 | 0.2526 | -416.6123 | -454.5817 | 0.1335 | 0.2706 |
| 0.6719 | 0.3109 | 250 | 0.6768 | -0.0803 | -0.2830 | 0.6007 | 0.2027 | -414.6320 | -453.1003 | 0.1071 | 0.2455 |
| 0.642 | 0.3731 | 300 | 0.6402 | 0.3405 | 0.0226 | 0.6146 | 0.3179 | -411.5756 | -448.8922 | 0.0864 | 0.2271 |
| 0.6675 | 0.4353 | 350 | 0.6472 | 0.7586 | 0.4677 | 0.6007 | 0.2909 | -407.1244 | -444.7109 | 0.1382 | 0.2779 |
| 0.6581 | 0.4975 | 400 | 0.6502 | -0.0310 | -0.3059 | 0.6181 | 0.2749 | -414.8607 | -452.6067 | 0.0326 | 0.1770 |
| 0.6155 | 0.5597 | 450 | 0.6416 | 0.0254 | -0.2895 | 0.625 | 0.3149 | -414.6964 | -452.0428 | 0.1102 | 0.2490 |
| 0.6438 | 0.6219 | 500 | 0.6383 | -0.2805 | -0.6002 | 0.625 | 0.3197 | -417.8031 | -455.1015 | 0.0799 | 0.2196 |
| 0.6069 | 0.6841 | 550 | 0.6360 | -0.6526 | -0.9456 | 0.6007 | 0.2930 | -421.2573 | -458.8233 | 0.1079 | 0.2462 |
| 0.6227 | 0.7463 | 600 | 0.6349 | -0.0705 | -0.3659 | 0.6215 | 0.2954 | -415.4609 | -453.0020 | 0.0381 | 0.1807 |
| 0.6473 | 0.8085 | 650 | 0.6331 | -0.3187 | -0.6771 | 0.6528 | 0.3584 | -418.5728 | -455.4844 | 0.1406 | 0.2776 |
| 0.6259 | 0.8706 | 700 | 0.6295 | -0.4256 | -0.7399 | 0.6111 | 0.3143 | -419.2006 | -456.5528 | 0.0986 | 0.2391 |
| 0.6572 | 0.9328 | 750 | 0.6389 | -0.5969 | -0.8936 | 0.6007 | 0.2967 | -420.7374 | -458.2657 | 0.0726 | 0.2120 |
| 0.63 | 0.9950 | 800 | 0.6310 | -0.2243 | -0.5516 | 0.6285 | 0.3274 | -417.3179 | -454.5398 | 0.1026 | 0.2406 |
| 0.4431 | 1.0572 | 850 | 0.6238 | -0.3325 | -0.7169 | 0.6632 | 0.3844 | -418.9702 | -455.6217 | 0.0604 | 0.1992 |
| 0.47 | 1.1194 | 900 | 0.6286 | -0.6589 | -1.1143 | 0.6597 | 0.4554 | -422.9441 | -458.8861 | -0.0269 | 0.1154 |
| 0.4436 | 1.1816 | 950 | 0.6252 | -0.6243 | -1.0270 | 0.6354 | 0.4027 | -422.0717 | -458.5404 | 0.0062 | 0.1465 |
| 0.4483 | 1.2438 | 1000 | 0.6238 | -0.6325 | -1.0514 | 0.6319 | 0.4189 | -422.3156 | -458.6222 | 0.0434 | 0.1813 |
| 0.4568 | 1.3060 | 1050 | 0.6297 | -0.9557 | -1.3457 | 0.6285 | 0.3900 | -425.2583 | -461.8539 | 0.1563 | 0.2901 |
| 0.4555 | 1.3682 | 1100 | 0.6311 | -0.5825 | -1.0012 | 0.6319 | 0.4188 | -421.8140 | -458.1216 | 0.0905 | 0.2271 |
| 0.4744 | 1.4303 | 1150 | 0.6248 | -0.5365 | -0.9374 | 0.6424 | 0.4008 | -421.1751 | -457.6623 | 0.0472 | 0.1861 |
| 0.4245 | 1.4925 | 1200 | 0.6255 | -0.6457 | -1.0579 | 0.6424 | 0.4122 | -422.3806 | -458.7540 | -0.0423 | 0.0997 |
| 0.4767 | 1.5547 | 1250 | 0.6294 | -0.7333 | -1.1519 | 0.6319 | 0.4185 | -423.3202 | -459.6304 | 0.1300 | 0.2652 |
| 0.4714 | 1.6169 | 1300 | 0.6253 | -0.8128 | -1.2388 | 0.6493 | 0.4261 | -424.1896 | -460.4245 | 0.0397 | 0.1788 |
| 0.4336 | 1.6791 | 1350 | 0.6229 | -0.7654 | -1.2064 | 0.6424 | 0.4410 | -423.8654 | -459.9506 | 0.1234 | 0.2587 |
| 0.4791 | 1.7413 | 1400 | 0.6216 | -0.7578 | -1.2069 | 0.6389 | 0.4492 | -423.8710 | -459.8747 | 0.0547 | 0.1931 |
| 0.439 | 1.8035 | 1450 | 0.6204 | -0.7469 | -1.1972 | 0.6493 | 0.4502 | -423.7731 | -459.7664 | 0.0661 | 0.2040 |
| 0.4419 | 1.8657 | 1500 | 0.6194 | -0.6699 | -1.1107 | 0.6458 | 0.4407 | -422.9081 | -458.9963 | 0.0509 | 0.1892 |
| 0.4593 | 1.9279 | 1550 | 0.6214 | -0.6895 | -1.1228 | 0.6528 | 0.4333 | -423.0291 | -459.1917 | 0.0628 | 0.2005 |
| 0.4444 | 1.9900 | 1600 | 0.6229 | -0.6827 | -1.1246 | 0.6667 | 0.4419 | -423.0472 | -459.1237 | 0.0863 | 0.2226 |
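The training-results log is easier to interpret with a couple of summary numbers. The lowest validation loss, 0.6194, occurs at step 1500, and that row matches the evaluation results reported at the top of this card. The log also shows a widening train/validation gap. Training loss falls from about 0.61 to 0.71 during epoch 1 to about 0.42 to 0.48 during epoch 2. Over the same period, validation loss only moves from 0.6310 at the end of epoch 1 to a range of 0.6194 to 0.6311 in epoch 2. That pattern is consistent with the second pass fitting the training pairs more closely than it improves held-out preference modeling.

The snippet below picks the best logged checkpoint and estimates steps per epoch. The values are copied from the Training results table; the helper is only an illustration.

```python
# (step -> validation loss) copied from the "Training results" table.
VAL_LOSS = {
    50: 0.7078, 100: 0.6718, 150: 0.6633, 200: 0.6535, 250: 0.6768,
    300: 0.6402, 350: 0.6472, 400: 0.6502, 450: 0.6416, 500: 0.6383,
    550: 0.6360, 600: 0.6349, 650: 0.6331, 700: 0.6295, 750: 0.6389,
    800: 0.6310, 850: 0.6238, 900: 0.6286, 950: 0.6252, 1000: 0.6238,
    1050: 0.6297, 1100: 0.6311, 1150: 0.6248, 1200: 0.6255, 1250: 0.6294,
    1300: 0.6253, 1350: 0.6229, 1400: 0.6216, 1450: 0.6204, 1500: 0.6194,
    1550: 0.6214, 1600: 0.6229,
}

# Checkpoint with the lowest validation loss.
best_step = min(VAL_LOSS, key=VAL_LOSS.get)

def steps_per_epoch(step, epoch):
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch

print(best_step, VAL_LOSS[best_step])          # best checkpoint and its loss
print(round(steps_per_epoch(800, 0.9950)))     # from the step-800 row
```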

Framework versions

  • PEFT 0.11.2.dev0
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
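Reproducing these results likely depends on matching the versions above. PEFT 0.11.2.dev0 is a development build, so it was presumably installed from source rather than from a PyPI release.

The helper below, `version_mismatches`, is hypothetical. It compares the installed packages against this list using the standard library's `importlib.metadata`. Comparison is by exact string, so a CPU-only `torch` that reports `2.3.0` is flagged against `2.3.0+cu121`.

```python
from importlib import metadata

# Versions listed under "Framework versions" (PyPI distribution names).
PINNED = {
    "peft": "0.11.2.dev0",
    "transformers": "4.41.2",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}

def version_mismatches(pinned=PINNED, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every package that differs."""
    out = {}
    for pkg, expected in pinned.items():
        try:
            installed = lookup(pkg)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            out[pkg] = (expected, installed)
    return out

if __name__ == "__main__":
    for pkg, (want, have) in version_mismatches().items():
        print(f"{pkg}: card lists {want}, found {have or 'not installed'}")
```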

Model tree for ernestoBocini/Phi3-mini-DPO-Tuned

  • Base model: microsoft/Phi-3-mini-4k-instruct
  • This model: adapter