
qwen2.5-0.5b-expo-L2EXPO-W0-noES3-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 191.8617
  • Logps: -86.1321
  • Logits: -1.2576
  • Objective: 186.8551
  • Dpo Loss: 0.6807
  • Regularize: 0.4245
  • Ranking Simple: 0.5336
  • Wo Beta: 15.8722
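
A usage example is not part of the original card. As a minimal sketch, assuming the checkpoint loads with the standard transformers causal-LM classes (like its Qwen2.5-0.5B base), the model can be tried out as follows; the prompt is only illustrative:

```python
# Sketch only: assumes the standard transformers causal-LM interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W0-noES3-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the SFT base was tuned on a news-style IFT dataset.
inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```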

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 7
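
For reference, the sketch below shows how these settings might map onto the Hugging Face TrainingArguments API. This is an assumption about the setup: the actual training script (including the L2EXPO/DPO-style objective and the output_dir name used here) is not part of this card.

```python
# Sketch only: maps the listed hyperparameters onto standard TrainingArguments
# fields. The preference/EXPO objective itself lives in the training script,
# which is not included in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-W0-noES3-0.1",  # hypothetical
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # 4 x 12 accumulation steps x 3 GPUs = 144 effective
    per_device_eval_batch_size=4,    # 4 x 3 GPUs = 12 effective
    gradient_accumulation_steps=12,
    num_train_epochs=7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```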

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Wo Beta |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-------:|
| 183.2287 | 0.1417 | 50 | 182.4718 | -90.7112 | -1.4185 | 180.4123 | 0.6896 | 0.4091 | 0.5243 | 16.2952 |
| 160.3991 | 0.2834 | 100 | 181.4201 | -91.1796 | -1.4582 | 179.4420 | 0.6854 | 0.4074 | 0.5305 | 16.3106 |
| 153.0553 | 0.4251 | 150 | 180.9289 | -90.2802 | -1.4606 | 178.3229 | 0.6809 | 0.4027 | 0.5326 | 16.6558 |
| 136.9477 | 0.5668 | 200 | 179.7231 | -90.2826 | -1.4280 | 176.9630 | 0.6796 | 0.4004 | 0.5316 | 16.3329 |
| 133.9615 | 0.7085 | 250 | 185.2480 | -90.3239 | -1.5209 | 181.9659 | 0.6804 | 0.4148 | 0.5367 | 16.5328 |
| 117.2675 | 0.8503 | 300 | 183.6018 | -92.0978 | -1.4559 | 181.2600 | 0.6830 | 0.4138 | 0.5280 | 16.5618 |
| 113.618 | 0.9920 | 350 | 187.3962 | -90.2357 | -1.4778 | 183.1441 | 0.6818 | 0.4156 | 0.5295 | 16.3125 |
| 108.282 | 1.1337 | 400 | 186.7854 | -88.5629 | -1.3931 | 183.6067 | 0.6814 | 0.4168 | 0.5347 | 16.2558 |
| 90.0262 | 1.2754 | 450 | 184.4520 | -87.5954 | -1.3706 | 179.7387 | 0.6794 | 0.4079 | 0.5331 | 16.2049 |
| 97.8439 | 1.4171 | 500 | 186.6105 | -87.7391 | -1.3773 | 181.2539 | 0.6799 | 0.4116 | 0.5290 | 16.1404 |
| 91.5957 | 1.5588 | 550 | 185.8633 | -89.6898 | -1.3445 | 180.9778 | 0.6797 | 0.4120 | 0.5347 | 16.1335 |
| 89.0238 | 1.7005 | 600 | 185.6100 | -86.9355 | -1.3632 | 179.5353 | 0.6773 | 0.4080 | 0.5347 | 16.2059 |
| 90.7044 | 1.8422 | 650 | 186.1243 | -87.1991 | -1.3882 | 180.0248 | 0.6776 | 0.4102 | 0.5342 | 16.1165 |
| 84.5287 | 1.9839 | 700 | 188.5602 | -87.6351 | -1.3019 | 183.1935 | 0.6814 | 0.4164 | 0.5352 | 16.0500 |
| 76.9421 | 2.1256 | 750 | 188.9042 | -88.4364 | -1.3259 | 183.6094 | 0.6794 | 0.4182 | 0.5326 | 15.9422 |
| 73.209 | 2.2674 | 800 | 188.3336 | -86.2484 | -1.3130 | 183.7086 | 0.6811 | 0.4180 | 0.5321 | 16.0113 |
| 66.2169 | 2.4091 | 850 | 192.0453 | -86.7490 | -1.3156 | 186.8341 | 0.6831 | 0.4251 | 0.5316 | 15.8832 |
| 60.5689 | 2.5508 | 900 | 190.1148 | -85.9587 | -1.2951 | 185.2343 | 0.6801 | 0.4219 | 0.5321 | 15.9341 |
| 61.9855 | 2.6925 | 950 | 190.6609 | -86.4854 | -1.3163 | 185.6429 | 0.6812 | 0.4229 | 0.5321 | 15.9612 |
| 60.2402 | 2.8342 | 1000 | 190.4743 | -85.4829 | -1.3084 | 184.9089 | 0.6796 | 0.4209 | 0.5316 | 15.8681 |
| 59.5621 | 2.9759 | 1050 | 191.3895 | -85.2853 | -1.2977 | 186.0318 | 0.6818 | 0.4236 | 0.5311 | 15.9189 |
| 57.3013 | 3.1176 | 1100 | 191.3520 | -86.2308 | -1.3160 | 186.1591 | 0.6791 | 0.4230 | 0.5367 | 15.8460 |
| 48.599 | 3.2593 | 1150 | 190.8563 | -86.5047 | -1.2764 | 185.6679 | 0.6803 | 0.4221 | 0.5373 | 15.9498 |
| 50.0065 | 3.4010 | 1200 | 190.9622 | -85.7436 | -1.2851 | 185.7565 | 0.6795 | 0.4218 | 0.5311 | 15.8572 |
| 47.4703 | 3.5427 | 1250 | 191.3775 | -86.1116 | -1.2775 | 186.8072 | 0.6817 | 0.4239 | 0.5305 | 15.9621 |
| 44.9179 | 3.6845 | 1300 | 191.8354 | -86.2878 | -1.2826 | 186.7091 | 0.6804 | 0.4241 | 0.5305 | 15.8192 |
| 40.9292 | 3.8262 | 1350 | 192.5214 | -85.5316 | -1.2757 | 187.4250 | 0.6820 | 0.4260 | 0.5321 | 15.8804 |
| 42.9136 | 3.9679 | 1400 | 192.7924 | -85.8583 | -1.2427 | 187.9270 | 0.6815 | 0.4268 | 0.5342 | 15.8520 |
| 38.8325 | 4.1096 | 1450 | 192.5806 | -85.5114 | -1.2569 | 187.5089 | 0.6820 | 0.4269 | 0.5342 | 15.8565 |
| 38.0409 | 4.2513 | 1500 | 192.5007 | -85.3251 | -1.2588 | 187.1571 | 0.6813 | 0.4255 | 0.5362 | 15.8393 |
| 34.4862 | 4.3930 | 1550 | 191.5790 | -86.5480 | -1.2534 | 186.4161 | 0.6811 | 0.4236 | 0.5362 | 15.9020 |
| 34.5799 | 4.5347 | 1600 | 191.4073 | -86.2764 | -1.2706 | 186.0825 | 0.6796 | 0.4229 | 0.5336 | 15.9068 |
| 27.3454 | 4.6764 | 1650 | 191.3007 | -85.9348 | -1.2432 | 186.2914 | 0.6801 | 0.4233 | 0.5342 | 15.9007 |
| 26.7167 | 4.8181 | 1700 | 191.8703 | -86.1611 | -1.2529 | 187.0981 | 0.6810 | 0.4254 | 0.5326 | 15.8993 |
| 27.1152 | 4.9598 | 1750 | 191.9133 | -85.8044 | -1.2599 | 187.1985 | 0.6809 | 0.4253 | 0.5336 | 15.8728 |
| 22.8305 | 5.1016 | 1800 | 192.4291 | -86.0359 | -1.2645 | 187.6874 | 0.6808 | 0.4263 | 0.5336 | 15.8275 |
| 21.1772 | 5.2433 | 1850 | 192.0774 | -85.9744 | -1.2599 | 187.1845 | 0.6808 | 0.4254 | 0.5321 | 15.8873 |
| 18.5995 | 5.3850 | 1900 | 191.8287 | -86.0679 | -1.2538 | 186.9649 | 0.6807 | 0.4248 | 0.5326 | 15.8683 |
| 17.8136 | 5.5267 | 1950 | 191.8575 | -86.0837 | -1.2633 | 186.7980 | 0.6805 | 0.4244 | 0.5331 | 15.8704 |
| 16.8259 | 5.6684 | 2000 | 191.8466 | -86.1388 | -1.2609 | 186.7259 | 0.6807 | 0.4245 | 0.5331 | 15.8647 |
| 15.5852 | 5.8101 | 2050 | 191.9476 | -86.2758 | -1.2583 | 186.8974 | 0.6808 | 0.4247 | 0.5336 | 15.8729 |
| 14.5477 | 5.9518 | 2100 | 191.9842 | -86.0437 | -1.2603 | 186.9000 | 0.6807 | 0.4246 | 0.5326 | 15.8685 |
| 13.7824 | 6.0935 | 2150 | 191.8207 | -86.0032 | -1.2604 | 186.7842 | 0.6806 | 0.4243 | 0.5326 | 15.8747 |
| 11.3504 | 6.2352 | 2200 | 191.8495 | -86.0359 | -1.2598 | 186.7322 | 0.6807 | 0.4243 | 0.5326 | 15.8700 |
| 11.1693 | 6.3769 | 2250 | 191.8128 | -86.1265 | -1.2585 | 186.7514 | 0.6807 | 0.4243 | 0.5336 | 15.8747 |
| 11.6161 | 6.5187 | 2300 | 191.8225 | -86.1693 | -1.2558 | 186.8004 | 0.6807 | 0.4244 | 0.5326 | 15.8736 |
| 10.8866 | 6.6604 | 2350 | 191.8734 | -86.1390 | -1.2576 | 186.8597 | 0.6807 | 0.4245 | 0.5336 | 15.8719 |
| 10.3699 | 6.8021 | 2400 | 191.8644 | -86.1292 | -1.2577 | 186.8542 | 0.6807 | 0.4245 | 0.5336 | 15.8721 |
| 10.8668 | 6.9438 | 2450 | 191.8617 | -86.1321 | -1.2576 | 186.8551 | 0.6807 | 0.4245 | 0.5336 | 15.8722 |
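
Overall, the logged training loss falls from about 183 to about 11 across the seven epochs, while the validation loss settles near 192 and Ranking Simple stays around 0.53. If the run directory is available, the per-evaluation metrics above can be read back programmatically; the sketch below assumes the default Hugging Face Trainer layout (a trainer_state.json containing a log_history list), and the exact metric key names for this run are assumptions:

```python
# Sketch only: assumes the default Trainer checkpoint layout; metric key names
# (e.g. "eval_loss") may differ for this particular run.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

eval_logs = [entry for entry in state["log_history"] if "eval_loss" in entry]
for entry in eval_logs:
    print(entry["step"], entry["epoch"], entry["eval_loss"])
```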

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1