# OpenELM-1_1B-DPO-full-1-5
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1836
- Rewards/chosen: -14.0
- Rewards/rejected: -17.625
- Rewards/accuracies: 0.7227
- Rewards/margins: 3.625
- Logps/rejected: -2048.0
- Logps/chosen: -1720.0
- Logits/rejected: 4.2812
- Logits/chosen: 2.625
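The reward metrics above are related: in DPO, Rewards/margins is simply the chosen reward minus the rejected reward. A minimal sketch checking that against the reported values (numbers copied from the list above):

```python
# Values copied from the evaluation results above (logged at reduced
# precision, so they round cleanly).
rewards_chosen = -14.0
rewards_rejected = -17.625

# The DPO reward margin is the gap between chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(margin)  # 3.625, matching the reported Rewards/margins
```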
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
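Two derived quantities follow from the list above: the effective batch size, and the shape of the learning-rate schedule. A minimal sketch (per-device values copied from the list; the schedule function is an illustration of a cosine schedule with linear warmup, not the exact Transformers implementation, and the step counts in it are examples only):

```python
import math

# Effective batch size: per-device batch x number of GPUs x gradient
# accumulation steps (values from the hyperparameter list above).
train_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 64, matching total_train_batch_size above

# Illustrative cosine schedule with linear warmup over the first 10% of
# steps (lr_scheduler_warmup_ratio: 0.1). This is a sketch, not the code
# Transformers actually runs.
def lr_at(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear ramp to base_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0

print(lr_at(470, 4700))  # peak LR of 5e-05 right at the end of warmup
```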
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
0.6268 | 0.1047 | 100 | 0.6449 | -0.4805 | -0.6680 | 0.6406 | 0.1885 | -356.0 | -366.0 | -9.5625 | -10.0 |
0.5924 | 0.2093 | 200 | 0.5985 | -1.2031 | -1.6172 | 0.6875 | 0.4199 | -450.0 | -438.0 | -12.875 | -13.125 |
0.6197 | 0.3140 | 300 | 0.5811 | -1.375 | -1.8438 | 0.7090 | 0.4668 | -474.0 | -456.0 | -11.75 | -12.1875 |
0.5968 | 0.4186 | 400 | 0.5933 | -2.3125 | -2.8438 | 0.6934 | 0.5273 | -572.0 | -548.0 | -8.5625 | -9.25 |
0.5854 | 0.5233 | 500 | 0.5737 | -1.7422 | -2.2812 | 0.6953 | 0.5352 | -516.0 | -492.0 | -7.7188 | -8.625 |
0.5524 | 0.6279 | 600 | 0.5768 | -3.0156 | -3.7031 | 0.6914 | 0.6953 | -660.0 | -620.0 | -7.0312 | -7.7188 |
0.5602 | 0.7326 | 700 | 0.5756 | -3.1562 | -3.9062 | 0.7168 | 0.75 | -680.0 | -636.0 | -5.125 | -6.3438 |
0.5581 | 0.8373 | 800 | 0.5854 | -3.3906 | -4.0312 | 0.6914 | 0.6289 | -692.0 | -656.0 | -5.0938 | -5.9688 |
0.5793 | 0.9419 | 900 | 0.5657 | -3.1719 | -3.9062 | 0.7207 | 0.7383 | -680.0 | -636.0 | -3.9531 | -5.0312 |
0.2783 | 1.0466 | 1000 | 0.6053 | -4.75 | -5.875 | 0.7188 | 1.125 | -876.0 | -792.0 | -2.2188 | -3.3594 |
0.2417 | 1.1512 | 1100 | 0.6139 | -4.7812 | -5.8125 | 0.7070 | 1.0469 | -872.0 | -796.0 | -2.3594 | -4.125 |
0.2429 | 1.2559 | 1200 | 0.5897 | -5.7188 | -6.8125 | 0.7227 | 1.0781 | -968.0 | -892.0 | -0.7188 | -2.1719 |
0.2508 | 1.3605 | 1300 | 0.5948 | -5.4062 | -6.4062 | 0.6914 | 1.0 | -928.0 | -860.0 | -0.0104 | -1.5156 |
0.2169 | 1.4652 | 1400 | 0.6104 | -5.7812 | -6.9062 | 0.7031 | 1.1016 | -976.0 | -896.0 | 0.0820 | -1.75 |
0.2107 | 1.5699 | 1500 | 0.6062 | -6.0625 | -7.2812 | 0.6973 | 1.1953 | -1016.0 | -924.0 | -0.4590 | -2.1719 |
0.2472 | 1.6745 | 1600 | 0.6158 | -5.625 | -6.7188 | 0.7070 | 1.1016 | -960.0 | -880.0 | -2.0312 | -3.9688 |
0.2545 | 1.7792 | 1700 | 0.6170 | -6.25 | -7.5 | 0.7031 | 1.25 | -1040.0 | -944.0 | -1.2578 | -3.2031 |
0.2383 | 1.8838 | 1800 | 0.6061 | -5.625 | -6.75 | 0.7012 | 1.1172 | -964.0 | -880.0 | 0.7383 | -1.1328 |
0.2107 | 1.9885 | 1900 | 0.6135 | -6.5 | -7.7812 | 0.7383 | 1.2578 | -1064.0 | -968.0 | 0.3027 | -1.4297 |
0.0186 | 2.0931 | 2000 | 0.7473 | -8.0625 | -9.875 | 0.7090 | 1.8594 | -1280.0 | -1120.0 | 2.2812 | 0.4980 |
0.03 | 2.1978 | 2100 | 0.8345 | -9.9375 | -12.25 | 0.7070 | 2.2812 | -1512.0 | -1312.0 | 3.2031 | 1.5938 |
0.0284 | 2.3025 | 2200 | 0.7741 | -9.1875 | -11.3125 | 0.7012 | 2.0781 | -1416.0 | -1240.0 | 2.7812 | 1.0156 |
0.0352 | 2.4071 | 2300 | 0.7983 | -9.3125 | -11.3125 | 0.7090 | 2.0156 | -1424.0 | -1248.0 | 2.6406 | 0.9961 |
0.0345 | 2.5118 | 2400 | 0.8249 | -9.8125 | -12.0 | 0.7266 | 2.1719 | -1488.0 | -1304.0 | 3.2656 | 1.5625 |
0.0192 | 2.6164 | 2500 | 0.8865 | -10.25 | -12.5625 | 0.6973 | 2.2969 | -1544.0 | -1344.0 | 3.5938 | 1.9609 |
0.0261 | 2.7211 | 2600 | 0.7963 | -9.1875 | -11.4375 | 0.7129 | 2.25 | -1432.0 | -1240.0 | 2.7031 | 0.8672 |
0.0315 | 2.8257 | 2700 | 0.7619 | -9.0 | -10.9375 | 0.7109 | 1.9766 | -1384.0 | -1216.0 | 2.8594 | 0.8320 |
0.0293 | 2.9304 | 2800 | 0.8241 | -9.75 | -12.0625 | 0.7070 | 2.2656 | -1496.0 | -1296.0 | 3.1719 | 1.3359 |
0.0071 | 3.0351 | 2900 | 0.8609 | -10.0625 | -12.5 | 0.7188 | 2.3906 | -1536.0 | -1328.0 | 3.1719 | 1.3125 |
0.0099 | 3.1397 | 3000 | 0.9558 | -11.5 | -14.1875 | 0.7051 | 2.6875 | -1704.0 | -1472.0 | 3.4062 | 1.6484 |
0.0079 | 3.2444 | 3100 | 0.9341 | -11.125 | -13.75 | 0.7090 | 2.6562 | -1664.0 | -1432.0 | 3.25 | 1.5078 |
0.0104 | 3.3490 | 3200 | 0.9926 | -11.9375 | -14.8125 | 0.7090 | 2.9062 | -1768.0 | -1512.0 | 3.6719 | 1.9922 |
0.0089 | 3.4537 | 3300 | 0.9665 | -11.9375 | -14.8125 | 0.7188 | 2.875 | -1768.0 | -1512.0 | 3.8594 | 2.2656 |
0.0098 | 3.5583 | 3400 | 0.9548 | -11.1875 | -13.875 | 0.7109 | 2.75 | -1680.0 | -1432.0 | 4.0 | 2.3438 |
0.0109 | 3.6630 | 3500 | 1.0670 | -12.5625 | -15.6875 | 0.7168 | 3.1406 | -1856.0 | -1576.0 | 4.1875 | 2.5312 |
0.0081 | 3.7677 | 3600 | 1.0376 | -12.375 | -15.4375 | 0.7188 | 3.0938 | -1832.0 | -1552.0 | 4.125 | 2.4844 |
0.0081 | 3.8723 | 3700 | 1.0725 | -13.0 | -16.25 | 0.7168 | 3.25 | -1912.0 | -1616.0 | 4.1875 | 2.5938 |
0.0041 | 3.9770 | 3800 | 1.1346 | -13.5 | -17.0 | 0.7188 | 3.4688 | -1984.0 | -1672.0 | 4.2188 | 2.5781 |
0.0036 | 4.0816 | 3900 | 1.1589 | -13.8125 | -17.375 | 0.7168 | 3.5156 | -2024.0 | -1696.0 | 4.25 | 2.625 |
0.0016 | 4.1863 | 4000 | 1.1790 | -14.0625 | -17.625 | 0.7168 | 3.5781 | -2048.0 | -1720.0 | 4.2812 | 2.6719 |
0.0037 | 4.2909 | 4100 | 1.1847 | -14.0625 | -17.625 | 0.7168 | 3.6094 | -2064.0 | -1728.0 | 4.3125 | 2.6562 |
0.007 | 4.3956 | 4200 | 1.1905 | -14.1875 | -17.75 | 0.7227 | 3.6406 | -2064.0 | -1736.0 | 4.3125 | 2.6719 |
0.0038 | 4.5003 | 4300 | 1.1835 | -14.0625 | -17.75 | 0.7207 | 3.6406 | -2064.0 | -1728.0 | 4.2812 | 2.6406 |
0.0093 | 4.6049 | 4400 | 1.1819 | -14.0625 | -17.625 | 0.7207 | 3.625 | -2048.0 | -1720.0 | 4.2812 | 2.625 |
0.006 | 4.7096 | 4500 | 1.1817 | -14.0 | -17.625 | 0.7227 | 3.6406 | -2048.0 | -1720.0 | 4.2812 | 2.6094 |
0.0037 | 4.8142 | 4600 | 1.1826 | -14.0 | -17.625 | 0.7227 | 3.6406 | -2048.0 | -1720.0 | 4.25 | 2.6094 |
0.0059 | 4.9189 | 4700 | 1.1836 | -14.0 | -17.625 | 0.7227 | 3.625 | -2048.0 | -1720.0 | 4.2812 | 2.625 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.19.1