dpo

This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the heat_transfer_dpo dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1331
  • Rewards/chosen: -4.9675
  • Rewards/rejected: -13.7312
  • Rewards/accuracies: 0.9480
  • Rewards/margins: 8.7637
  • Logps/chosen: -224.7040
  • Logps/rejected: -310.9190
  • Logits/chosen: -1.4384
  • Logits/rejected: -1.4474

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 5
  • eval_batch_size: 5
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 10
  • total_eval_batch_size: 10
  • optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
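These settings combine as follows. The effective batch is train_batch_size × num_devices = 5 × 2 = 10, matching total_train_batch_size; no gradient accumulation is listed. The results table ends at step 1800 at epoch 2.0, which implies 900 optimizer steps per epoch. Assuming each step consumes a full batch of 10, that is roughly 9,000 training pairs per epoch.

The sketch below reproduces the cosine-with-warmup learning-rate curve. It is modeled on the formula in transformers' `get_cosine_schedule_with_warmup`, with warmup steps computed as ceil(warmup_ratio × total_steps). Treat it as an approximation of the trainer's behavior, not its actual code. `lr_at` and the constant names are hypothetical.

```python
import math

LEARNING_RATE = 5e-6
WARMUP_RATIO = 0.1
TOTAL_STEPS = 1800  # 2 epochs x 900 steps, read off the results table
EFFECTIVE_BATCH = 5 * 2  # per-device batch x GPUs; no gradient accumulation listed

def lr_at(step, peak=LEARNING_RATE, total=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to `peak`, then a half-cosine decay back to 0."""
    warmup = math.ceil(warmup_ratio * total)  # 180 steps for this run
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)  # 0.0 .. 1.0 over the decay
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Under these assumptions the rate peaks at 5e-06 at step 180, which coincides with the epoch-0.2 evaluation. It falls to half the peak at step 990 and reaches zero at step 1800.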

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6939 | 0.0667 | 60 | 0.6921 | -0.0219 | -0.0246 | 0.5190 | 0.0026 | -175.2482 | -173.8529 | -1.4010 | -1.4008 |
| 0.6871 | 0.1333 | 120 | 0.6830 | -0.0278 | -0.0494 | 0.6080 | 0.0216 | -175.3069 | -174.1010 | -1.4030 | -1.4029 |
| 0.6159 | 0.2 | 180 | 0.6382 | -0.5399 | -0.7225 | 0.5610 | 0.1826 | -180.4279 | -180.8317 | -1.4021 | -1.4025 |
| 0.368 | 0.2667 | 240 | 0.3849 | -1.3538 | -2.7449 | 0.8310 | 1.3911 | -188.5674 | -201.0563 | -1.3971 | -1.3996 |
| 0.3234 | 0.3333 | 300 | 0.3633 | -2.1358 | -4.6104 | 0.8230 | 2.4747 | -196.3865 | -219.7114 | -1.4248 | -1.4282 |
| 0.2649 | 0.4 | 360 | 0.3037 | -3.3073 | -6.0363 | 0.8800 | 2.7290 | -208.1017 | -233.9699 | -1.4411 | -1.4450 |
| 0.1784 | 0.4667 | 420 | 0.2159 | -3.8934 | -7.0789 | 0.9100 | 3.1855 | -213.9628 | -244.3959 | -1.4470 | -1.4523 |
| 0.2608 | 0.5333 | 480 | 0.2073 | -3.8076 | -7.8889 | 0.9100 | 4.0813 | -213.1049 | -252.4960 | -1.4509 | -1.4571 |
| 0.2459 | 0.6 | 540 | 0.2173 | -4.7738 | -9.6025 | 0.8890 | 4.8287 | -222.7667 | -269.6319 | -1.4478 | -1.4529 |
| 0.1729 | 0.6667 | 600 | 0.2264 | -3.6641 | -9.1186 | 0.9200 | 5.4546 | -211.6696 | -264.7935 | -1.4379 | -1.4430 |
| 0.2136 | 0.7333 | 660 | 0.1994 | -3.1520 | -8.0180 | 0.9190 | 4.8660 | -206.5491 | -253.7874 | -1.4456 | -1.4518 |
| 0.2148 | 0.8 | 720 | 0.2623 | -3.3220 | -8.6375 | 0.9040 | 5.3155 | -208.2492 | -259.9820 | -1.4527 | -1.4588 |
| 0.151 | 0.8667 | 780 | 0.2628 | -3.7843 | -9.3305 | 0.8830 | 5.5462 | -212.8717 | -266.9124 | -1.4556 | -1.4621 |
| 0.1759 | 0.9333 | 840 | 0.1736 | -3.7518 | -9.3561 | 0.9270 | 5.6043 | -212.5472 | -267.1683 | -1.4565 | -1.4631 |
| 0.1455 | 1.0 | 900 | 0.1967 | -3.4547 | -10.0926 | 0.9290 | 6.6379 | -209.5764 | -274.5335 | -1.4551 | -1.4625 |
| 0.1456 | 1.0667 | 960 | 0.2037 | -3.9507 | -10.4184 | 0.9290 | 6.4677 | -214.5359 | -277.7913 | -1.4538 | -1.4610 |
| 0.1276 | 1.1333 | 1020 | 0.2090 | -3.7958 | -10.3930 | 0.9240 | 6.5972 | -212.9869 | -277.5373 | -1.4494 | -1.4568 |
| 0.1768 | 1.2 | 1080 | 0.1744 | -3.7397 | -10.8265 | 0.9350 | 7.0868 | -212.4255 | -281.8718 | -1.4487 | -1.4565 |
| 0.2379 | 1.2667 | 1140 | 0.1679 | -4.2998 | -11.1092 | 0.9260 | 6.8094 | -218.0269 | -284.6993 | -1.4458 | -1.4532 |
| 0.0571 | 1.3333 | 1200 | 0.1626 | -4.5185 | -12.4102 | 0.9420 | 7.8917 | -220.2143 | -297.7095 | -1.4335 | -1.4415 |
| 0.1644 | 1.4 | 1260 | 0.1614 | -4.3048 | -12.2288 | 0.9400 | 7.9240 | -218.0764 | -295.8950 | -1.4410 | -1.4497 |
| 0.3264 | 1.4667 | 1320 | 0.1427 | -4.5696 | -12.5596 | 0.9470 | 7.9900 | -220.7249 | -299.2028 | -1.4390 | -1.4475 |
| 0.1088 | 1.5333 | 1380 | 0.1382 | -4.6426 | -12.7848 | 0.9510 | 8.1422 | -221.4554 | -301.4557 | -1.4380 | -1.4465 |
| 0.1853 | 1.6 | 1440 | 0.1417 | -4.9985 | -13.2069 | 0.9490 | 8.2084 | -225.0136 | -305.6761 | -1.4349 | -1.4433 |
| 0.1406 | 1.6667 | 1500 | 0.1741 | -5.1167 | -13.8396 | 0.9410 | 8.7229 | -226.1956 | -312.0029 | -1.4283 | -1.4373 |
| 0.1751 | 1.7333 | 1560 | 0.1433 | -4.9687 | -13.7012 | 0.9480 | 8.7325 | -224.7161 | -310.6195 | -1.4309 | -1.4397 |
| 0.1648 | 1.8 | 1620 | 0.1368 | -4.9785 | -13.6896 | 0.9500 | 8.7111 | -224.8141 | -310.5035 | -1.4335 | -1.4424 |
| 0.1109 | 1.8667 | 1680 | 0.1367 | -5.0609 | -13.8370 | 0.9480 | 8.7762 | -225.6376 | -311.9777 | -1.4341 | -1.4430 |
| 0.1875 | 1.9333 | 1740 | 0.1388 | -5.0304 | -13.7910 | 0.9500 | 8.7607 | -225.3328 | -311.5176 | -1.4356 | -1.4445 |
| 0.0947 | 2.0 | 1800 | 0.1331 | -4.9675 | -13.7312 | 0.9480 | 8.7637 | -224.7040 | -310.9190 | -1.4384 | -1.4474 |
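The logged columns are linked to one another. Under the standard DPO definition, reward = β × (policy log-prob - reference log-prob). That definition is assumed here, since the card does not state it. It implies two checks:

- Rewards/margins should equal Rewards/chosen - Rewards/rejected.
- Because the reference model is frozen and the evaluation set is fixed, Logps - Rewards/β should stay constant across evaluations.

The card does not report β either. The sketch below applies both checks to three rows copied from the table, using β = 0.1 (TRL's default, assumed here). With that value the implied reference log-probs stay essentially fixed at about -175.03 (chosen) and -173.61 (rejected). With a different β, such as 0.05, they drift substantially, so the table is consistent with β = 0.1. `implied_reference_logps` is a hypothetical helper.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins, logps/chosen, logps/rejected)
ROWS = [
    (60, -0.0219, -0.0246, 0.0026, -175.2482, -173.8529),
    (900, -3.4547, -10.0926, 6.6379, -209.5764, -274.5335),
    (1800, -4.9675, -13.7312, 8.7637, -224.7040, -310.9190),
]

BETA = 0.1  # assumption: the card does not report beta

def implied_reference_logps(row, beta=BETA):
    """Back out mean reference log-probs from reward = beta * (policy - reference)."""
    _, r_chosen, r_rejected, _, lp_chosen, lp_rejected = row
    return lp_chosen - r_chosen / beta, lp_rejected - r_rejected / beta

for row in ROWS:
    step, r_c, r_r, margin = row[:4]
    ref_c, ref_r = implied_reference_logps(row)
    # The margin column should match chosen - rejected up to 4-decimal rounding.
    print(f"step {step}: margin {r_c - r_r:.4f} vs logged {margin:.4f}, "
          f"implied reference logps {ref_c:.3f} / {ref_r:.3f}")
```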

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.1

Model tree for Howard881010/heat_transfer_dpo