# dpo
This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the heat_transfer_dpo_fs dataset. It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):
- Loss: 0.1941
- Rewards/chosen: -0.0331
- Rewards/rejected: -3.1999
- Rewards/accuracies: 0.9226
- Rewards/margins: 3.1668
- Logps/chosen: -1.8895
- Logps/rejected: -33.5195
- Logits/chosen: -1.2198
- Logits/rejected: -1.2315
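For context, these reward metrics follow the standard DPO formulation (assuming TRL's default definitions): the implicit reward of a completion is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the loss pushes the chosen reward above the rejected one.

$$
r_\theta(x, y) = \beta \left[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\right],
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})\big)
$$

The reported margin is simply the chosen reward minus the rejected reward: -0.0331 - (-3.1999) = 3.1668.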
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch of an equivalent TRL setup follows the list):
- learning_rate: 5e-06
- train_batch_size: 7
- eval_batch_size: 7
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 14
- total_eval_batch_size: 14
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
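For readers who want to reproduce a comparable run, the snippet below is a minimal sketch using TRL's DPOTrainer with a PEFT/LoRA adapter. Only the hyperparameters listed above are taken from this card; the dataset path, LoRA settings, and DPO beta (left at TRL's default) are illustrative assumptions.

```python
# Minimal sketch of a comparable DPO run with TRL + PEFT. Only the hyperparameters
# listed above come from this card; the dataset path and LoRA settings are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Preference data with "prompt", "chosen", "rejected" columns (path is hypothetical).
train_dataset = load_dataset("json", data_files="heat_transfer_dpo_fs.jsonl", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = DPOConfig(
    output_dir="dpo",
    learning_rate=5e-6,
    per_device_train_batch_size=7,
    per_device_eval_batch_size=7,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # with a PEFT adapter, the frozen base weights serve as the reference policy
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # newer TRL releases take processing_class= instead
    peft_config=peft_config,
)
trainer.train()            # launch with `accelerate launch` on 2 GPUs to match total_train_batch_size=14
```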
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
---|---|---|---|---|---|---|---|---|---|---|---|
0.692 | 0.0933 | 60 | 0.6930 | -0.0035 | -0.0037 | 0.4831 | 0.0003 | -1.5933 | -1.5578 | -1.3225 | -1.3224 |
0.6852 | 0.1866 | 120 | 0.6736 | 0.0133 | -0.0301 | 0.6339 | 0.0433 | -1.4261 | -1.8214 | -1.3307 | -1.3305 |
0.6513 | 0.2799 | 180 | 0.6289 | -0.0874 | -0.3014 | 0.6796 | 0.2140 | -2.4330 | -4.5348 | -1.3347 | -1.3351 |
0.5901 | 0.3733 | 240 | 0.5247 | -0.1472 | -0.7974 | 0.7470 | 0.6502 | -3.0306 | -9.4947 | -1.3616 | -1.3634 |
0.4131 | 0.4666 | 300 | 0.5557 | -0.2727 | -1.2844 | 0.7173 | 1.0117 | -4.2856 | -14.3649 | -1.3547 | -1.3596 |
0.3288 | 0.5599 | 360 | 0.3651 | -0.1389 | -1.6263 | 0.8562 | 1.4874 | -2.9477 | -17.7834 | -1.3326 | -1.3381 |
0.3723 | 0.6532 | 420 | 0.4056 | -0.1975 | -1.9240 | 0.8125 | 1.7265 | -3.5336 | -20.7607 | -1.3157 | -1.3211 |
0.2432 | 0.7465 | 480 | 0.3918 | -0.1403 | -1.8206 | 0.8095 | 1.6803 | -2.9622 | -19.7268 | -1.2997 | -1.3060 |
0.3456 | 0.8398 | 540 | 0.3036 | -0.0659 | -1.9517 | 0.8671 | 1.8858 | -2.2175 | -21.0373 | -1.2860 | -1.2914 |
0.3651 | 0.9331 | 600 | 0.2770 | -0.0762 | -2.3462 | 0.8879 | 2.2700 | -2.3211 | -24.9826 | -1.2661 | -1.2733 |
0.2788 | 1.0264 | 660 | 0.2802 | -0.1009 | -2.6298 | 0.8829 | 2.5289 | -2.5679 | -27.8189 | -1.2552 | -1.2633 |
0.2522 | 1.1198 | 720 | 0.2631 | -0.0485 | -2.3300 | 0.8938 | 2.2815 | -2.0434 | -24.8206 | -1.2537 | -1.2607 |
0.2458 | 1.2131 | 780 | 0.2431 | -0.0498 | -2.5135 | 0.9117 | 2.4637 | -2.0572 | -26.6558 | -1.2477 | -1.2548 |
0.193 | 1.3064 | 840 | 0.2387 | -0.0474 | -2.6414 | 0.9038 | 2.5939 | -2.0333 | -27.9347 | -1.2430 | -1.2504 |
0.2013 | 1.3997 | 900 | 0.2212 | -0.0433 | -2.7423 | 0.9157 | 2.6991 | -1.9913 | -28.9442 | -1.2349 | -1.2436 |
0.2382 | 1.4930 | 960 | 0.2145 | -0.0570 | -3.0965 | 0.9157 | 3.0395 | -2.1286 | -32.4857 | -1.2230 | -1.2335 |
0.1884 | 1.5863 | 1020 | 0.2086 | -0.0365 | -3.0158 | 0.9177 | 2.9793 | -1.9241 | -31.6789 | -1.2285 | -1.2385 |
0.2342 | 1.6796 | 1080 | 0.2047 | -0.0424 | -3.0708 | 0.9147 | 3.0284 | -1.9832 | -32.2288 | -1.2207 | -1.2312 |
0.2003 | 1.7729 | 1140 | 0.1988 | -0.0416 | -3.1710 | 0.9206 | 3.1294 | -1.9752 | -33.2306 | -1.2183 | -1.2294 |
0.134 | 1.8663 | 1200 | 0.1975 | -0.0410 | -3.1898 | 0.9206 | 3.1489 | -1.9684 | -33.4189 | -1.2191 | -1.2302 |
0.1411 | 1.9596 | 1260 | 0.1944 | -0.0376 | -3.2242 | 0.9266 | 3.1866 | -1.9343 | -33.7627 | -1.2250 | -1.2363 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.20.1
## Model tree for Howard881010/heat_transfer_dpo_fs

- Base model: mistralai/Mistral-Nemo-Base-2407
- Finetuned from: mistralai/Mistral-Nemo-Instruct-2407
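For inference, the adapter would typically be loaded on top of the instruct model with PEFT. This is a minimal sketch assuming the adapter weights are published under the repo id listed in the model tree above; the prompt and generation settings are purely illustrative.

```python
# Minimal inference sketch: base instruct model + DPO LoRA adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-Nemo-Instruct-2407"
adapter_id = "Howard881010/heat_transfer_dpo_fs"  # repo id from the model tree above

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Illustrative prompt; the intended use of the adapter is not documented in this card.
messages = [{"role": "user", "content": "Summarize the main modes of heat transfer."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```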