dpo_p

This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the heat_transfer_dpo_p dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1692
  • Rewards/chosen: 0.0877
  • Rewards/rejected: -4.1618
  • Rewards/accuracies: 0.9435
  • Rewards/margins: 4.2496
  • Logps/chosen: -3.6031
  • Logps/rejected: -46.4845
  • Logits/chosen: -1.1815
  • Logits/rejected: -1.2052
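The metric names above match those logged by TRL's `DPOTrainer`. The card does not name the trainer, so reading them this way is an assumption. Under that assumption:

- The implicit reward of a completion is β times the difference between the policy's and the reference model's log-probabilities.
- Rewards/margins is the mean of chosen minus rejected reward.
- Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected one.
- The default sigmoid loss is −log σ(margin), averaged over pairs.

The sketch below recomputes these quantities from per-pair log-probabilities in plain Python. `dpo_metrics`, `log_sigmoid`, and the example numbers are hypothetical. β is a required parameter because the card does not report the value used.

```python
import math
from dataclasses import dataclass


@dataclass
class DPOMetrics:
    loss: float
    rewards_chosen: float
    rewards_rejected: float
    rewards_margins: float
    rewards_accuracy: float


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Each list holds one summed log-probability per preference pair.

    Assumes the sigmoid DPO loss; beta is not reported in the model card.
    """
    n = len(policy_chosen)
    # Implicit rewards: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Per-pair loss is -log sigmoid(margin); the reported loss is the mean.
    losses = [-log_sigmoid(m) for m in margins]
    return DPOMetrics(
        loss=sum(losses) / n,
        rewards_chosen=sum(chosen) / n,
        rewards_rejected=sum(rejected) / n,
        rewards_margins=sum(margins) / n,
        rewards_accuracy=sum(c > r for c, r in zip(chosen, rejected)) / n,
    )


if __name__ == "__main__":
    m = dpo_metrics([-3.0, -5.0], [-40.0, -30.0], [-4.0, -4.0], [-10.0, -10.0], beta=0.1)
    print(m)  # margins ≈ 2.5, accuracy 1.0
```

As a consistency check on the reported numbers, 0.0877 − (−4.1618) = 4.2495, which matches Rewards/margins (4.2496) up to rounding. Under the sigmoid-loss assumption, −log σ is convex, so the mean per-pair loss is at least −log σ of the mean margin. That lower bound is −log σ(4.2496) ≈ 0.0142, well below the reported loss of 0.1692. This gap suggests the margins vary considerably across evaluation pairs.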

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 7
  • eval_batch_size: 7
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 14
  • total_eval_batch_size: 14
  • optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
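A little arithmetic can sanity-check the batch and schedule settings above. Seven pairs per device across 2 devices gives the total batch size of 14, which implies no gradient accumulation.

The function below sketches linear warmup followed by cosine decay to zero. It is written to mirror the formula in Hugging Face Transformers' cosine schedule with warmup, where warmup steps = ceil(warmup_ratio × total steps). The card does not state the total number of optimizer steps. `TOTAL_STEPS = 1286` is an estimate taken from the results table: step 60 is logged at epoch 0.0933, which is roughly 643 steps per epoch, over 2 epochs. The function names are hypothetical.

```python
import math

BASE_LR = 5e-6        # learning_rate from the card
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 1286    # assumption: estimated from the results table, not stated in the card


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Preference pairs consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(total_batch_size(7, 2))  # 14
    for s in (0, 60, 129, 643, 1286):
        print(s, f"{cosine_lr(s):.3e}")
```

With the estimated step count, warmup lasts ceil(128.6) = 129 steps. The peak rate of 5e-06 is reached at step 129 and decays to zero by the final step.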

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6888 | 0.0933 | 60 | 0.7026 | 0.0852 | 0.0954 | 0.4722 | -0.0102 | -3.6290 | -3.9127 | -1.3205 | -1.3206 |
| 0.6874 | 0.1866 | 120 | 0.6799 | -0.0264 | -0.0577 | 0.5853 | 0.0313 | -4.7445 | -5.4437 | -1.3197 | -1.3201 |
| 0.6277 | 0.2799 | 180 | 0.6050 | -0.0526 | -0.3283 | 0.6865 | 0.2757 | -5.0064 | -8.1496 | -1.3104 | -1.3121 |
| 0.6972 | 0.3733 | 240 | 0.6916 | 0.2062 | 0.0775 | 0.5645 | 0.1287 | -2.4188 | -4.0918 | -1.3059 | -1.3064 |
| 0.5403 | 0.4666 | 300 | 0.5434 | -0.0861 | -0.7153 | 0.7351 | 0.6292 | -5.3416 | -12.0196 | -1.3176 | -1.3214 |
| 0.4851 | 0.5599 | 360 | 0.4736 | 0.0745 | -0.6669 | 0.7738 | 0.7414 | -3.7352 | -11.5354 | -1.3169 | -1.3211 |
| 0.5212 | 0.6532 | 420 | 0.4008 | 0.1432 | -0.9171 | 0.8403 | 1.0603 | -3.0484 | -14.0373 | -1.3134 | -1.3191 |
| 0.2776 | 0.7465 | 480 | 0.3285 | 0.1142 | -1.6779 | 0.8512 | 1.7921 | -3.3384 | -21.6450 | -1.2922 | -1.3021 |
| 0.351 | 0.8398 | 540 | 0.2724 | 0.1235 | -2.0395 | 0.8770 | 2.1629 | -3.2460 | -25.2612 | -1.2861 | -1.2980 |
| 0.3464 | 0.9331 | 600 | 0.2994 | 0.0036 | -2.1200 | 0.8700 | 2.1236 | -4.4449 | -26.0666 | -1.2775 | -1.2895 |
| 0.1758 | 1.0264 | 660 | 0.2081 | 0.1320 | -2.7773 | 0.9137 | 2.9092 | -3.1609 | -32.6392 | -1.2568 | -1.2733 |
| 0.1554 | 1.1198 | 720 | 0.1848 | 0.0998 | -3.1629 | 0.9246 | 3.2628 | -3.4824 | -36.4958 | -1.2340 | -1.2530 |
| 0.1542 | 1.2131 | 780 | 0.1818 | 0.0788 | -3.7795 | 0.9345 | 3.8583 | -3.6926 | -42.6612 | -1.2215 | -1.2440 |
| 0.1354 | 1.3064 | 840 | 0.2401 | 0.0439 | -3.8429 | 0.9147 | 3.8868 | -4.0414 | -43.2950 | -1.2040 | -1.2276 |
| 0.2017 | 1.3997 | 900 | 0.2583 | 0.0451 | -3.7989 | 0.9147 | 3.8440 | -4.0291 | -42.8554 | -1.2056 | -1.2287 |
| 0.1909 | 1.4930 | 960 | 0.1759 | 0.0940 | -3.8068 | 0.9395 | 3.9008 | -3.5403 | -42.9342 | -1.2013 | -1.2244 |
| 0.1503 | 1.5863 | 1020 | 0.1781 | 0.0949 | -4.0544 | 0.9385 | 4.1493 | -3.5316 | -45.4105 | -1.1901 | -1.2136 |
| 0.199 | 1.6796 | 1080 | 0.1939 | 0.0256 | -4.1360 | 0.9335 | 4.1616 | -4.2245 | -46.2266 | -1.1883 | -1.2111 |
| 0.2059 | 1.7729 | 1140 | 0.1670 | 0.0688 | -4.1823 | 0.9405 | 4.2511 | -3.7922 | -46.6892 | -1.1819 | -1.2056 |
| 0.1566 | 1.8663 | 1200 | 0.1590 | 0.0963 | -4.1650 | 0.9464 | 4.2613 | -3.5175 | -46.5159 | -1.1893 | -1.2134 |
| 0.1869 | 1.9596 | 1260 | 0.1640 | 0.0816 | -4.1815 | 0.9454 | 4.2631 | -3.6648 | -46.6814 | -1.1877 | -1.2113 |
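Training logs like the table above are often scanned for the best checkpoint and for internal consistency. The sketch below copies the step, validation loss, and reward columns from the table. It then finds the lowest logged validation loss and checks that each Rewards/margins value equals Rewards/chosen minus Rewards/rejected up to four-decimal rounding. `ROWS`, `best_checkpoint`, and `margins_consistent` are hypothetical names.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins),
# copied from the evaluation rows of the training-results table.
ROWS = [
    (60, 0.7026, 0.0852, 0.0954, -0.0102),
    (120, 0.6799, -0.0264, -0.0577, 0.0313),
    (180, 0.6050, -0.0526, -0.3283, 0.2757),
    (240, 0.6916, 0.2062, 0.0775, 0.1287),
    (300, 0.5434, -0.0861, -0.7153, 0.6292),
    (360, 0.4736, 0.0745, -0.6669, 0.7414),
    (420, 0.4008, 0.1432, -0.9171, 1.0603),
    (480, 0.3285, 0.1142, -1.6779, 1.7921),
    (540, 0.2724, 0.1235, -2.0395, 2.1629),
    (600, 0.2994, 0.0036, -2.1200, 2.1236),
    (660, 0.2081, 0.1320, -2.7773, 2.9092),
    (720, 0.1848, 0.0998, -3.1629, 3.2628),
    (780, 0.1818, 0.0788, -3.7795, 3.8583),
    (840, 0.2401, 0.0439, -3.8429, 3.8868),
    (900, 0.2583, 0.0451, -3.7989, 3.8440),
    (960, 0.1759, 0.0940, -3.8068, 3.9008),
    (1020, 0.1781, 0.0949, -4.0544, 4.1493),
    (1080, 0.1939, 0.0256, -4.1360, 4.1616),
    (1140, 0.1670, 0.0688, -4.1823, 4.2511),
    (1200, 0.1590, 0.0963, -4.1650, 4.2613),
    (1260, 0.1640, 0.0816, -4.1815, 4.2631),
]


def best_checkpoint(rows):
    """Return (step, validation loss) for the row with the lowest validation loss."""
    step, val_loss, *_ = min(rows, key=lambda r: r[1])
    return step, val_loss


def margins_consistent(rows, tol=5e-4):
    """True if every logged margin equals chosen minus rejected, within rounding."""
    return all(abs((chosen - rejected) - margin) <= tol
               for _, _, chosen, rejected, margin in rows)


if __name__ == "__main__":
    print(best_checkpoint(ROWS))     # (1200, 0.159)
    print(margins_consistent(ROWS))  # True
```

The lowest logged validation loss is 0.1590 at step 1200. That is slightly below the 0.1692 reported for the final evaluation at the top of the card.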

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.1
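A small standard-library check can compare a local environment with the versions listed above. The sketch below uses `importlib.metadata.version`. `CARD_VERSIONS` and `compare_versions` are hypothetical names, and the PyPI distribution names (for example `torch` for PyTorch) are assumptions about how the packages were installed. An exact match is probably not required to use the model; it only reproduces the training environment more closely.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; keys are assumed PyPI distribution names.
CARD_VERSIONS = {
    "peft": "0.12.0",
    "transformers": "4.46.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.20.1",
}


def compare_versions(expected, installed_lookup=version):
    """Map each package to (card version, installed version or None, exact match?)."""
    report = {}
    for name, want in expected.items():
        try:
            have = installed_lookup(name)
        except PackageNotFoundError:
            have = None  # package not installed in this environment
        report[name] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for pkg, (want, have, ok) in compare_versions(CARD_VERSIONS).items():
        print(f"{pkg:12} card={want:12} installed={have!s:12} {'ok' if ok else 'differs'}")
```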
Model tree for Howard881010/heat_transfer_dpo_p