OpenELM-1_1B-DPO-full-1-5

This model appears to be a DPO fine-tune in the OpenELM-1.1B family; the base checkpoint and training dataset are not documented in this card. It achieves the following results on the evaluation set (a sketch of how these DPO reward metrics are derived follows the list):

  • Loss: 1.1836
  • Rewards/chosen: -14.0
  • Rewards/rejected: -17.625
  • Rewards/accuracies: 0.7227
  • Rewards/margins: 3.625
  • Logps/rejected: -2048.0
  • Logps/chosen: -1720.0
  • Logits/rejected: 4.2812
  • Logits/chosen: 2.625

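For readers unfamiliar with DPO reporting: Rewards/chosen and Rewards/rejected are β-scaled log-probability ratios between the trained policy and a frozen reference model, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen completion outranks the rejected one. A minimal sketch of how a DPO trainer derives these quantities; the function name is illustrative, and β = 0.1 is an assumption since the value used for this run is not documented:

```python
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute the DPO loss and the reward metrics reported above.

    Inputs are summed log-probabilities of the chosen/rejected
    completions under the policy and the frozen reference model.
    """
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected

    # Standard DPO objective: -log sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()
    # Rewards/accuracies: how often the chosen response wins.
    accuracy = (margins > 0).float().mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), accuracy
```
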
Model description

More information needed. The checkpoint contains roughly 1.08B parameters stored as BF16 safetensors.

Intended uses & limitations

More information needed
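
No usage guidance is provided. OpenELM repositories ship custom modeling code, so loading generally requires `trust_remote_code=True`. A minimal generation sketch; the repo id is assumed from the model name, and the tokenizer choice is an assumption since upstream OpenELM releases pair with the Llama-2 tokenizer rather than bundling their own:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the model name; adjust to the actual path.
model_id = "OpenELM-1_1B-DPO-full-1-5"

# OpenELM model code is custom, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)
# This checkpoint may ship its own tokenizer; adjust if loading fails.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```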

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5

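A sketch of an equivalent setup with TRL's `DPOConfig`, mapping the values above onto its arguments. The original training script is not included in this repo, so `output_dir`, `bf16`, and `beta` are assumptions, and the model/dataset wiring is left as commented placeholders:

```python
from trl import DPOConfig, DPOTrainer

# Mirrors the hyperparameters listed above. Run on 4 GPUs so that
# 8 (per device) x 4 (devices) x 2 (grad accum) = 64 effective batch.
config = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-1-5",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,   # assumed; matches the BF16 checkpoint weights
    beta=0.1,    # DPO temperature; actual value undocumented
)

# trainer = DPOTrainer(model=model, ref_model=ref_model, args=config,
#                      train_dataset=train_ds, eval_dataset=eval_ds,
#                      processing_class=tokenizer)
# trainer.train()
```
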
Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6268        | 0.1047 | 100  | 0.6449          | -0.4805        | -0.6680          | 0.6406             | 0.1885          | -356.0         | -366.0       | -9.5625         | -10.0         |
| 0.5924        | 0.2093 | 200  | 0.5985          | -1.2031        | -1.6172          | 0.6875             | 0.4199          | -450.0         | -438.0       | -12.875         | -13.125       |
| 0.6197        | 0.3140 | 300  | 0.5811          | -1.375         | -1.8438          | 0.7090             | 0.4668          | -474.0         | -456.0       | -11.75          | -12.1875      |
| 0.5968        | 0.4186 | 400  | 0.5933          | -2.3125        | -2.8438          | 0.6934             | 0.5273          | -572.0         | -548.0       | -8.5625         | -9.25         |
| 0.5854        | 0.5233 | 500  | 0.5737          | -1.7422        | -2.2812          | 0.6953             | 0.5352          | -516.0         | -492.0       | -7.7188         | -8.625        |
| 0.5524        | 0.6279 | 600  | 0.5768          | -3.0156        | -3.7031          | 0.6914             | 0.6953          | -660.0         | -620.0       | -7.0312         | -7.7188       |
| 0.5602        | 0.7326 | 700  | 0.5756          | -3.1562        | -3.9062          | 0.7168             | 0.75            | -680.0         | -636.0       | -5.125          | -6.3438       |
| 0.5581        | 0.8373 | 800  | 0.5854          | -3.3906        | -4.0312          | 0.6914             | 0.6289          | -692.0         | -656.0       | -5.0938         | -5.9688       |
| 0.5793        | 0.9419 | 900  | 0.5657          | -3.1719        | -3.9062          | 0.7207             | 0.7383          | -680.0         | -636.0       | -3.9531         | -5.0312       |
| 0.2783        | 1.0466 | 1000 | 0.6053          | -4.75          | -5.875           | 0.7188             | 1.125           | -876.0         | -792.0       | -2.2188         | -3.3594       |
| 0.2417        | 1.1512 | 1100 | 0.6139          | -4.7812        | -5.8125          | 0.7070             | 1.0469          | -872.0         | -796.0       | -2.3594         | -4.125        |
| 0.2429        | 1.2559 | 1200 | 0.5897          | -5.7188        | -6.8125          | 0.7227             | 1.0781          | -968.0         | -892.0       | -0.7188         | -2.1719       |
| 0.2508        | 1.3605 | 1300 | 0.5948          | -5.4062        | -6.4062          | 0.6914             | 1.0             | -928.0         | -860.0       | -0.0104         | -1.5156       |
| 0.2169        | 1.4652 | 1400 | 0.6104          | -5.7812        | -6.9062          | 0.7031             | 1.1016          | -976.0         | -896.0       | 0.0820          | -1.75         |
| 0.2107        | 1.5699 | 1500 | 0.6062          | -6.0625        | -7.2812          | 0.6973             | 1.1953          | -1016.0        | -924.0       | -0.4590         | -2.1719       |
| 0.2472        | 1.6745 | 1600 | 0.6158          | -5.625         | -6.7188          | 0.7070             | 1.1016          | -960.0         | -880.0       | -2.0312         | -3.9688       |
| 0.2545        | 1.7792 | 1700 | 0.6170          | -6.25          | -7.5             | 0.7031             | 1.25            | -1040.0        | -944.0       | -1.2578         | -3.2031       |
| 0.2383        | 1.8838 | 1800 | 0.6061          | -5.625         | -6.75            | 0.7012             | 1.1172          | -964.0         | -880.0       | 0.7383          | -1.1328       |
| 0.2107        | 1.9885 | 1900 | 0.6135          | -6.5           | -7.7812          | 0.7383             | 1.2578          | -1064.0        | -968.0       | 0.3027          | -1.4297       |
| 0.0186        | 2.0931 | 2000 | 0.7473          | -8.0625        | -9.875           | 0.7090             | 1.8594          | -1280.0        | -1120.0      | 2.2812          | 0.4980        |
| 0.03          | 2.1978 | 2100 | 0.8345          | -9.9375        | -12.25           | 0.7070             | 2.2812          | -1512.0        | -1312.0      | 3.2031          | 1.5938        |
| 0.0284        | 2.3025 | 2200 | 0.7741          | -9.1875        | -11.3125         | 0.7012             | 2.0781          | -1416.0        | -1240.0      | 2.7812          | 1.0156        |
| 0.0352        | 2.4071 | 2300 | 0.7983          | -9.3125        | -11.3125         | 0.7090             | 2.0156          | -1424.0        | -1248.0      | 2.6406          | 0.9961        |
| 0.0345        | 2.5118 | 2400 | 0.8249          | -9.8125        | -12.0            | 0.7266             | 2.1719          | -1488.0        | -1304.0      | 3.2656          | 1.5625        |
| 0.0192        | 2.6164 | 2500 | 0.8865          | -10.25         | -12.5625         | 0.6973             | 2.2969          | -1544.0        | -1344.0      | 3.5938          | 1.9609        |
| 0.0261        | 2.7211 | 2600 | 0.7963          | -9.1875        | -11.4375         | 0.7129             | 2.25            | -1432.0        | -1240.0      | 2.7031          | 0.8672        |
| 0.0315        | 2.8257 | 2700 | 0.7619          | -9.0           | -10.9375         | 0.7109             | 1.9766          | -1384.0        | -1216.0      | 2.8594          | 0.8320        |
| 0.0293        | 2.9304 | 2800 | 0.8241          | -9.75          | -12.0625         | 0.7070             | 2.2656          | -1496.0        | -1296.0      | 3.1719          | 1.3359        |
| 0.0071        | 3.0351 | 2900 | 0.8609          | -10.0625       | -12.5            | 0.7188             | 2.3906          | -1536.0        | -1328.0      | 3.1719          | 1.3125        |
| 0.0099        | 3.1397 | 3000 | 0.9558          | -11.5          | -14.1875         | 0.7051             | 2.6875          | -1704.0        | -1472.0      | 3.4062          | 1.6484        |
| 0.0079        | 3.2444 | 3100 | 0.9341          | -11.125        | -13.75           | 0.7090             | 2.6562          | -1664.0        | -1432.0      | 3.25            | 1.5078        |
| 0.0104        | 3.3490 | 3200 | 0.9926          | -11.9375       | -14.8125         | 0.7090             | 2.9062          | -1768.0        | -1512.0      | 3.6719          | 1.9922        |
| 0.0089        | 3.4537 | 3300 | 0.9665          | -11.9375       | -14.8125         | 0.7188             | 2.875           | -1768.0        | -1512.0      | 3.8594          | 2.2656        |
| 0.0098        | 3.5583 | 3400 | 0.9548          | -11.1875       | -13.875          | 0.7109             | 2.75            | -1680.0        | -1432.0      | 4.0             | 2.3438        |
| 0.0109        | 3.6630 | 3500 | 1.0670          | -12.5625       | -15.6875         | 0.7168             | 3.1406          | -1856.0        | -1576.0      | 4.1875          | 2.5312        |
| 0.0081        | 3.7677 | 3600 | 1.0376          | -12.375        | -15.4375         | 0.7188             | 3.0938          | -1832.0        | -1552.0      | 4.125           | 2.4844        |
| 0.0081        | 3.8723 | 3700 | 1.0725          | -13.0          | -16.25           | 0.7168             | 3.25            | -1912.0        | -1616.0      | 4.1875          | 2.5938        |
| 0.0041        | 3.9770 | 3800 | 1.1346          | -13.5          | -17.0            | 0.7188             | 3.4688          | -1984.0        | -1672.0      | 4.2188          | 2.5781        |
| 0.0036        | 4.0816 | 3900 | 1.1589          | -13.8125       | -17.375          | 0.7168             | 3.5156          | -2024.0        | -1696.0      | 4.25            | 2.625         |
| 0.0016        | 4.1863 | 4000 | 1.1790          | -14.0625       | -17.625          | 0.7168             | 3.5781          | -2048.0        | -1720.0      | 4.2812          | 2.6719        |
| 0.0037        | 4.2909 | 4100 | 1.1847          | -14.0625       | -17.625          | 0.7168             | 3.6094          | -2064.0        | -1728.0      | 4.3125          | 2.6562        |
| 0.007         | 4.3956 | 4200 | 1.1905          | -14.1875       | -17.75           | 0.7227             | 3.6406          | -2064.0        | -1736.0      | 4.3125          | 2.6719        |
| 0.0038        | 4.5003 | 4300 | 1.1835          | -14.0625       | -17.75           | 0.7207             | 3.6406          | -2064.0        | -1728.0      | 4.2812          | 2.6406        |
| 0.0093        | 4.6049 | 4400 | 1.1819          | -14.0625       | -17.625          | 0.7207             | 3.625           | -2048.0        | -1720.0      | 4.2812          | 2.625         |
| 0.006         | 4.7096 | 4500 | 1.1817          | -14.0          | -17.625          | 0.7227             | 3.6406          | -2048.0        | -1720.0      | 4.2812          | 2.6094        |
| 0.0037        | 4.8142 | 4600 | 1.1826          | -14.0          | -17.625          | 0.7227             | 3.6406          | -2048.0        | -1720.0      | 4.25            | 2.6094        |
| 0.0059        | 4.9189 | 4700 | 1.1836          | -14.0          | -17.625          | 0.7227             | 3.625           | -2048.0        | -1720.0      | 4.2812          | 2.625         |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.19.1