sft_dpo_p

This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the heat_transfer_dpo_p dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1569
  • Rewards/chosen: 0.3090
  • Rewards/rejected: -5.2240
  • Rewards/accuracies: 0.9520
  • Rewards/margins: 5.5331
  • Logps/chosen: -1.4012
  • Logps/rejected: -57.0955
  • Logits/chosen: -0.1708
  • Logits/rejected: -0.2166
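
The reward and margin columns follow the convention of TRL's DPOTrainer (an assumption; the card does not name the trainer): each "reward" is the implicit DPO reward, i.e. the policy-vs-reference log-probability ratio scaled by the DPO temperature β, and the margin is simply chosen minus rejected. In LaTeX:

```latex
% Implicit DPO reward (Rafailov et al., 2023); \beta is the DPO
% temperature, which this card does not report.
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

% The margin is the difference of the two reward columns; with the
% evaluation numbers above: 0.3090 - (-5.2240) = 5.5330 \approx 5.5331.
\text{margins} = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
```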

Model description

Judging from the metrics and framework list on this card, this repository contains a PEFT adapter for mistralai/Mistral-Nemo-Instruct-2407, trained with DPO-style preference optimization on the heat_transfer_dpo_p dataset. No further details have been provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 8
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
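
These values map one-to-one onto Hugging Face training arguments. Below is a minimal sketch of the run, assuming TRL's DPOTrainer (consistent with the metric names above, but not stated on the card); the dataset ID, split names, and LoRA settings are placeholders:

```python
# Sketch only: trainer choice, dataset splits, and LoRA settings are
# assumptions; everything in DPOConfig is copied from the list above.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mistralai/Mistral-Nemo-Instruct-2407"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Per-device batch size 4 across 2 GPUs yields the reported
# total train/eval batch size of 8.
args = DPOConfig(
    output_dir="sft_dpo_p",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",         # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)

# Placeholder: heat_transfer_dpo_p is not a public dataset ID.
dataset = load_dataset("heat_transfer_dpo_p")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,  # `tokenizer=` on TRL < 0.12
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # rank etc. unknown
)
trainer.train()
```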

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3669 | 0.0533 | 60 | 0.3126 | 0.3606 | -0.9629 | 0.9150 | 1.3235 | -0.8857 | -14.4843 | -0.5259 | -0.5415 |
| 0.2995 | 0.1067 | 120 | 0.2095 | 0.2729 | -3.2809 | 0.9320 | 3.5538 | -1.7626 | -37.6640 | -0.2224 | -0.2795 |
| 0.0686 | 0.16 | 180 | 0.2650 | 0.2280 | -4.0377 | 0.9220 | 4.2657 | -2.2109 | -45.2318 | -0.1560 | -0.2160 |
| 0.1007 | 0.2133 | 240 | 0.2294 | 0.2211 | -4.3632 | 0.9340 | 4.5843 | -2.2807 | -48.4872 | -0.1604 | -0.2090 |
| 0.2146 | 0.2667 | 300 | 0.1389 | 0.3621 | -3.4515 | 0.9390 | 3.8136 | -0.8700 | -39.3696 | -0.2215 | -0.2535 |
| 0.0175 | 0.32 | 360 | 0.1924 | 0.2508 | -4.5680 | 0.9430 | 4.8188 | -1.9836 | -50.5354 | -0.1839 | -0.2427 |
| 0.2375 | 0.3733 | 420 | 0.2330 | 0.2380 | -4.5576 | 0.9310 | 4.7956 | -2.1114 | -50.4313 | -0.1628 | -0.2199 |
| 0.2265 | 0.4267 | 480 | 0.2988 | 0.1994 | -4.5453 | 0.9190 | 4.7447 | -2.4975 | -50.3082 | -0.1496 | -0.2141 |
| 0.0854 | 0.48 | 540 | 0.1945 | 0.2575 | -4.3099 | 0.9370 | 4.5674 | -1.9162 | -47.9538 | -0.1301 | -0.1829 |
| 0.2707 | 0.5333 | 600 | 0.1508 | 0.3076 | -4.9413 | 0.9500 | 5.2489 | -1.4153 | -54.2679 | -0.1536 | -0.2036 |
| 0.161 | 0.5867 | 660 | 0.1841 | 0.2792 | -5.1292 | 0.9470 | 5.4084 | -1.6994 | -56.1473 | -0.1543 | -0.2038 |
| 0.4007 | 0.64 | 720 | 0.1888 | 0.2476 | -5.0702 | 0.9480 | 5.3178 | -2.0148 | -55.5571 | -0.1643 | -0.2078 |
| 0.1186 | 0.6933 | 780 | 0.2090 | 0.2271 | -5.1242 | 0.9450 | 5.3513 | -2.2203 | -56.0969 | -0.1519 | -0.1959 |
| 0.148 | 0.7467 | 840 | 0.1778 | 0.2731 | -5.1445 | 0.9470 | 5.4176 | -1.7601 | -56.3004 | -0.1673 | -0.2100 |
| 0.12 | 0.8 | 900 | 0.1519 | 0.3056 | -5.1776 | 0.9520 | 5.4832 | -1.4355 | -56.6311 | -0.1742 | -0.2169 |
| 0.1522 | 0.8533 | 960 | 0.1528 | 0.3085 | -5.2151 | 0.9520 | 5.5236 | -1.4062 | -57.0058 | -0.1666 | -0.2108 |
| 0.1224 | 0.9067 | 1020 | 0.1497 | 0.3084 | -5.2228 | 0.9550 | 5.5312 | -1.4068 | -57.0827 | -0.1706 | -0.2145 |
| 0.0707 | 0.96 | 1080 | 0.1587 | 0.3037 | -5.2156 | 0.9510 | 5.5192 | -1.4542 | -57.0105 | -0.1721 | -0.2193 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.1
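
Because this repository ships a PEFT adapter rather than full model weights, inference means loading the adapter on top of the base model. A minimal sketch, using the repository ID from this card (dtype, device placement, and the prompt are illustrative):

```python
# Sketch: attach the PEFT adapter to its base model for inference.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "Howard881010/heat_transfer_sft_dpo_p"

# Reads the adapter config, downloads the base model
# (mistralai/Mistral-Nemo-Instruct-2407), and attaches the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# If the tokenizer is not bundled with the adapter, load it from the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

prompt = "Explain convective heat transfer in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```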