
qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-5-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 22.2823
  • Logps: -81.7800
  • Logits: -0.6395
  • Objective: 22.4840
  • DPO Loss: 11.4246
  • Regularize: 22.4840
  • Ranking Simple: 0.5072
  • Ranking Idealized: 0.5093
  • Ranking Idealized Expo: 0.5093
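
The card does not include usage code. Below is a minimal loading sketch, assuming the checkpoint follows the standard Transformers causal-LM layout; the prompt string is purely illustrative.

```python
# Minimal usage sketch (not part of the original card): load the checkpoint
# with the standard Hugging Face causal-LM classes and generate a completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-5-5e6"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt only; the model was tuned on pairwise news data.
inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```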

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
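
The training script itself is not part of this card. As an illustration only, the hyperparameters above map onto a standard Transformers TrainingArguments configuration roughly as sketched below; the argument names are the library's, not necessarily those used by the original run.

```python
# Illustrative mapping of the listed hyperparameters onto Transformers
# TrainingArguments; the original training script is not included in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-5-5e6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# Effective train batch size = 4 (per device) * 6 (GPUs) * 12 (accumulation) = 288,
# which matches the reported total_train_batch_size.
```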

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | DPO Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|---------------|--------|------|-----------------|----------|---------|-----------|----------|------------|----------------|-------------------|------------------------|
| 7.7199        | 0.2834 | 50   | 5.7454          | -88.5676 | -1.3592 | 5.8006    | 2.9352   | 5.8006     | 0.5093         | 0.5093            | 0.5093                 |
| 14.7825       | 0.5668 | 100  | 14.5118         | -82.6872 | -1.0744 | 14.7179   | 7.5504   | 14.7179    | 0.5052         | 0.5093            | 0.5093                 |
| 15.364        | 0.8503 | 150  | 17.9996         | -82.8069 | -0.9215 | 17.9487   | 9.0677   | 17.9487    | 0.5052         | 0.5093            | 0.5093                 |
| 13.5405       | 1.1337 | 200  | 20.4472         | -81.7112 | -0.8710 | 20.8042   | 10.4069  | 20.8042    | 0.5155         | 0.5093            | 0.5093                 |
| 12.3187       | 1.4171 | 250  | 20.4460         | -80.3281 | -0.9020 | 20.6868   | 10.5730  | 20.6868    | 0.5083         | 0.5093            | 0.5093                 |
| 11.1496       | 1.7005 | 300  | 21.5408         | -81.0334 | -0.6058 | 21.7594   | 10.9942  | 21.7594    | 0.5021         | 0.5093            | 0.5093                 |
| 9.8756        | 1.9839 | 350  | 21.6497         | -82.6833 | -0.6455 | 21.7908   | 11.0636  | 21.7908    | 0.5103         | 0.5093            | 0.5093                 |
| 8.7383        | 2.2674 | 400  | 22.0188         | -82.5924 | -0.6389 | 22.2506   | 11.2378  | 22.2506    | 0.5083         | 0.5093            | 0.5093                 |
| 7.8659        | 2.5508 | 450  | 22.1530         | -81.1508 | -0.6986 | 22.4826   | 11.2333  | 22.4826    | 0.5165         | 0.5093            | 0.5093                 |
| 6.4451        | 2.8342 | 500  | 22.1806         | -80.7941 | -0.7415 | 22.4462   | 11.3734  | 22.4462    | 0.5114         | 0.5093            | 0.5093                 |
| 5.3913        | 3.1176 | 550  | 22.5555         | -81.1559 | -0.6593 | 22.7930   | 11.5514  | 22.7930    | 0.5114         | 0.5093            | 0.5093                 |
| 4.4825        | 3.4010 | 600  | 22.5560         | -81.6865 | -0.6143 | 22.7375   | 11.5064  | 22.7375    | 0.5103         | 0.5093            | 0.5093                 |
| 3.8178        | 3.6845 | 650  | 22.4465         | -82.0276 | -0.6491 | 22.6879   | 11.5084  | 22.6879    | 0.5093         | 0.5093            | 0.5093                 |
| 3.084         | 3.9679 | 700  | 22.2750         | -82.0004 | -0.6309 | 22.4606   | 11.4155  | 22.4606    | 0.5083         | 0.5093            | 0.5093                 |
| 2.2691        | 4.2513 | 750  | 22.2876         | -81.7782 | -0.6324 | 22.4857   | 11.4308  | 22.4857    | 0.5072         | 0.5093            | 0.5093                 |
| 1.9909        | 4.5347 | 800  | 22.2976         | -81.7023 | -0.6413 | 22.5029   | 11.4381  | 22.5029    | 0.5072         | 0.5093            | 0.5093                 |
| 1.8048        | 4.8181 | 850  | 22.2810         | -81.7745 | -0.6391 | 22.4824   | 11.4237  | 22.4824    | 0.5072         | 0.5093            | 0.5093                 |
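
For a quick visual check of the trend in the table, the validation loss can be plotted against epoch. The short script below only re-types the table values; nothing is recomputed.

```python
# Plot the validation loss from the table above against training epoch
# (values copied from the table; not computed here).
import matplotlib.pyplot as plt

epochs = [0.2834, 0.5668, 0.8503, 1.1337, 1.4171, 1.7005, 1.9839, 2.2674,
          2.5508, 2.8342, 3.1176, 3.4010, 3.6845, 3.9679, 4.2513, 4.5347, 4.8181]
val_loss = [5.7454, 14.5118, 17.9996, 20.4472, 20.4460, 21.5408, 21.6497, 22.0188,
            22.1530, 22.1806, 22.5555, 22.5560, 22.4465, 22.2750, 22.2876, 22.2976, 22.2810]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("Validation loss per evaluation step")
plt.show()
```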

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1