eurus-dpop-qlora-uf-ours-5e-7

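This model is a fine-tuned version of openbmb/Eurus-7b-sft on the generation/UF dataset. At the final logged evaluation (step 1000, epoch 2.82), it reaches a validation loss of 1.5826, a DPO loss of 0.6769, and a reward accuracy of 0.5950 on the evaluation set.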

Model description

More information needed

Training results

Evaluation metrics were logged every 100 training steps:

Training Loss	Epoch	Step	Validation Loss	Positive Losses	Dpo Losses	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Rewards/margins Max	Rewards/margins Min	Rewards/margins Std	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6899	0.28	100	0.7172	0.2510	0.6920	0.0012	-0.0010	0.5660	0.0023	0.0203	-0.0124	0.0107	-257.6245	-274.7594	-2.1882	-2.3107
0.6814	0.56	200	0.8695	1.7877	0.6875	-0.0083	-0.0203	0.6000	0.0121	0.1012	-0.0600	0.0523	-259.5540	-275.7066	-2.1794	-2.3003
0.6528	0.85	300	0.9591	2.6882	0.6850	-0.0114	-0.0297	0.5950	0.0183	0.1559	-0.0931	0.0810	-260.4934	-276.0218	-2.1701	-2.2902
0.6624	1.13	400	1.1421	4.5080	0.6821	-0.0292	-0.0552	0.5980	0.0259	0.2196	-0.1324	0.1141	-263.0362	-277.8048	-2.1645	-2.2834
0.6277	1.41	500	1.3007	6.0675	0.6803	-0.0443	-0.0758	0.6020	0.0315	0.2660	-0.1640	0.1396	-265.0982	-279.3104	-2.1517	-2.2705
0.6310	1.69	600	1.3376	6.3902	0.6788	-0.0394	-0.0760	0.6020	0.0366	0.3089	-0.1941	0.1634	-265.1231	-278.8237	-2.1545	-2.2726
0.6412	1.97	700	1.4281	7.2579	0.6778	-0.0461	-0.0864	0.5940	0.0403	0.3396	-0.2171	0.1812	-266.1589	-279.4864	-2.1523	-2.2703
0.6068	2.25	800	1.5679	8.6341	0.6771	-0.0623	-0.1057	0.5910	0.0434	0.3687	-0.2386	0.1977	-268.0912	-281.1146	-2.1430	-2.2616
0.6366	2.54	900	1.5779	8.7234	0.6769	-0.0614	-0.1057	0.5910	0.0443	0.3755	-0.2439	0.2019	-268.0912	-281.0223	-2.1502	-2.2680
0.5967	2.82	1000	1.5826	8.7692	0.6769	-0.0617	-0.1060	0.5950	0.0443	0.3764	-0.2449	0.2025	-268.1228	-281.0526	-2.1458	-2.2641
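
The reward columns in the table can be sanity-checked against each other. The sketch below assumes, as is typical for DPO-style logging, that Rewards/margins is the mean of Rewards/chosen minus Rewards/rejected; the card itself does not state this. The three rows are copied from the table, and the helper name `margin_gap` and the tolerance are illustrative choices, not part of the training code.

```python
# Rows copied from the results table:
# (step, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (100, 0.0012, -0.0010, 0.0023),
    (500, -0.0443, -0.0758, 0.0315),
    (1000, -0.0617, -0.1060, 0.0443),
]

# Assumption: every logged value is rounded to 4 decimals, so a small
# rounding slack is allowed when comparing the columns.
TOLERANCE = 2e-4


def margin_gap(chosen: float, rejected: float, reported: float) -> float:
    """Absolute difference between (chosen - rejected) and the reported margin."""
    return abs((chosen - rejected) - reported)


for step, chosen, rejected, reported in ROWS:
    gap = margin_gap(chosen, rejected, reported)
    status = "consistent" if gap <= TOLERANCE else "check"
    print(
        f"step {step}: chosen - rejected = {chosen - rejected:.4f}, "
        f"reported margin = {reported:.4f} ({status})"
    )
```

Under that assumption the margins agree with the chosen and rejected rewards to within rounding. Both rewards drift negative over training while the margin between them widens, which matches the falling Logps/chosen and Logps/rejected columns.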