dpo

This model is a version of mistralai/Mistral-Nemo-Instruct-2407 fine-tuned with direct preference optimization (DPO) on the heat_transfer_dpo_fs dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1941
  • Rewards/chosen: -0.0331
  • Rewards/rejected: -3.1999
  • Rewards/accuracies: 0.9226
  • Rewards/margins: 3.1668
  • Logps/chosen: -1.8895
  • Logps/rejected: -33.5195
  • Logits/chosen: -1.2198
  • Logits/rejected: -1.2315
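For context, the reward metrics above are DPO's implicit rewards: β times the log-probability ratio between the policy and the reference model, and the margin is simply the chosen reward minus the rejected reward. A minimal sketch in plain Python; the log-prob values are illustrative, and β = 0.1 (TRL's default) is an assumption, since the card does not state the value used for this run:

```python
import math

def dpo_stats(policy_logp_chosen, ref_logp_chosen,
              policy_logp_rejected, ref_logp_rejected, beta=0.1):
    """Implicit DPO rewards and the sigmoid loss from summed log-probs.

    beta=0.1 is TRL's default; the beta actually used for this run is
    not stated in the card.
    """
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss for one pair: -log sigmoid(margin)
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss

# Illustrative numbers only (not taken from this run); a large gap in the
# rejected log-probs yields a margin of about 3.0 and a small loss.
rc, rr, m, loss = dpo_stats(-1.9, -1.6, -33.5, -3.2)
```

As training pushes the rejected completions' log-probs down much faster than the chosen ones', the margin grows and the loss shrinks, which is the pattern visible in the results table below.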

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 7
  • eval_batch_size: 7
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 14
  • total_eval_batch_size: 14
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
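These settings imply an effective batch of 7 × 2 = 14 sequences per optimizer step, and a learning rate that warms up linearly over the first 10% of steps and then follows a cosine decay. A minimal sketch of that schedule (the function name and the step counts are my own illustration, not taken from the training code):

```python
import math

def lr_at_step(step, total_steps, peak_lr=5e-6, warmup_ratio=0.1):
    """Linear warmup for the first `warmup_ratio` of training,
    then cosine decay to zero, as named in the hyperparameters."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size from the settings above:
total_train_batch_size = 7 * 2  # train_batch_size * num_devices = 14
```

The peak learning rate (5e-06) is reached at the end of warmup and the rate falls back to zero by the final step.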

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|--------------|----------------|---------------|-----------------|
| 0.692         | 0.0933 | 60   | 0.6930          | -0.0035        | -0.0037          | 0.4831             | 0.0003          | -1.5933      | -1.5578        | -1.3225       | -1.3224         |
| 0.6852        | 0.1866 | 120  | 0.6736          | 0.0133         | -0.0301          | 0.6339             | 0.0433          | -1.4261      | -1.8214        | -1.3307       | -1.3305         |
| 0.6513        | 0.2799 | 180  | 0.6289          | -0.0874        | -0.3014          | 0.6796             | 0.2140          | -2.4330      | -4.5348        | -1.3347       | -1.3351         |
| 0.5901        | 0.3733 | 240  | 0.5247          | -0.1472        | -0.7974          | 0.7470             | 0.6502          | -3.0306      | -9.4947        | -1.3616       | -1.3634         |
| 0.4131        | 0.4666 | 300  | 0.5557          | -0.2727        | -1.2844          | 0.7173             | 1.0117          | -4.2856      | -14.3649       | -1.3547       | -1.3596         |
| 0.3288        | 0.5599 | 360  | 0.3651          | -0.1389        | -1.6263          | 0.8562             | 1.4874          | -2.9477      | -17.7834       | -1.3326       | -1.3381         |
| 0.3723        | 0.6532 | 420  | 0.4056          | -0.1975        | -1.9240          | 0.8125             | 1.7265          | -3.5336      | -20.7607       | -1.3157       | -1.3211         |
| 0.2432        | 0.7465 | 480  | 0.3918          | -0.1403        | -1.8206          | 0.8095             | 1.6803          | -2.9622      | -19.7268       | -1.2997       | -1.3060         |
| 0.3456        | 0.8398 | 540  | 0.3036          | -0.0659        | -1.9517          | 0.8671             | 1.8858          | -2.2175      | -21.0373       | -1.2860       | -1.2914         |
| 0.3651        | 0.9331 | 600  | 0.2770          | -0.0762        | -2.3462          | 0.8879             | 2.2700          | -2.3211      | -24.9826       | -1.2661       | -1.2733         |
| 0.2788        | 1.0264 | 660  | 0.2802          | -0.1009        | -2.6298          | 0.8829             | 2.5289          | -2.5679      | -27.8189       | -1.2552       | -1.2633         |
| 0.2522        | 1.1198 | 720  | 0.2631          | -0.0485        | -2.3300          | 0.8938             | 2.2815          | -2.0434      | -24.8206       | -1.2537       | -1.2607         |
| 0.2458        | 1.2131 | 780  | 0.2431          | -0.0498        | -2.5135          | 0.9117             | 2.4637          | -2.0572      | -26.6558       | -1.2477       | -1.2548         |
| 0.193         | 1.3064 | 840  | 0.2387          | -0.0474        | -2.6414          | 0.9038             | 2.5939          | -2.0333      | -27.9347       | -1.2430       | -1.2504         |
| 0.2013        | 1.3997 | 900  | 0.2212          | -0.0433        | -2.7423          | 0.9157             | 2.6991          | -1.9913      | -28.9442       | -1.2349       | -1.2436         |
| 0.2382        | 1.4930 | 960  | 0.2145          | -0.0570        | -3.0965          | 0.9157             | 3.0395          | -2.1286      | -32.4857       | -1.2230       | -1.2335         |
| 0.1884        | 1.5863 | 1020 | 0.2086          | -0.0365        | -3.0158          | 0.9177             | 2.9793          | -1.9241      | -31.6789       | -1.2285       | -1.2385         |
| 0.2342        | 1.6796 | 1080 | 0.2047          | -0.0424        | -3.0708          | 0.9147             | 3.0284          | -1.9832      | -32.2288       | -1.2207       | -1.2312         |
| 0.2003        | 1.7729 | 1140 | 0.1988          | -0.0416        | -3.1710          | 0.9206             | 3.1294          | -1.9752      | -33.2306       | -1.2183       | -1.2294         |
| 0.134         | 1.8663 | 1200 | 0.1975          | -0.0410        | -3.1898          | 0.9206             | 3.1489          | -1.9684      | -33.4189       | -1.2191       | -1.2302         |
| 0.1411        | 1.9596 | 1260 | 0.1944          | -0.0376        | -3.2242          | 0.9266             | 3.1866          | -1.9343      | -33.7627       | -1.2250       | -1.2363         |
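As a quick consistency check on the logged metrics, the reward margin at every step is just the chosen reward minus the rejected reward. Verifying this for the final evaluation row (step 1260):

```python
# Final evaluation row of the training-results log (step 1260):
rewards_chosen = -0.0376
rewards_rejected = -3.2242
rewards_margins = 3.1866

# Margin = chosen reward - rejected reward (up to rounding in the log):
assert abs((rewards_chosen - rewards_rejected) - rewards_margins) < 1e-3
```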

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.1