qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:
- Loss: 578.1592
- Logps: -81.1337
- Logits: -0.5715
- Objective: 566.3954
- Dpo Loss: 0.7098
- Regularize: 0.5899
- Ranking Simple: 0.5362
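For reference, the checkpoint can be loaded like any other Qwen2.5-style causal language model. The snippet below is a minimal sketch, assuming the repository id in the title above and the standard transformers AutoModelForCausalLM / AutoTokenizer APIs; the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from the model name above; adjust if the checkpoint lives elsewhere.
model_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the model was preference-tuned on the hZzy/train_pairwise_weighted dataset.
prompt = "Write a short news headline about renewable energy:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```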
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
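The hyperparameters above map directly onto a standard transformers TrainingArguments configuration. The sketch below is illustrative rather than the exact training script, which is not included in this card; note that the effective train batch size of 144 follows from 4 (per device) × 3 (GPUs) × 12 (gradient accumulation steps).

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x 3 GPUs x 12 accumulation steps = 144 effective
    per_device_eval_batch_size=4,    # x 3 GPUs = 12 effective
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```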
Training results
Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple |
---|---|---|---|---|---|---|---|---|---|
470.2434 | 0.1417 | 50 | 491.6702 | -94.3377 | -1.4968 | 488.9373 | 0.6873 | 0.4298 | 0.5269 |
444.0833 | 0.2834 | 100 | 519.0432 | -85.0979 | -1.4345 | 504.6209 | 0.6839 | 0.4692 | 0.5383 |
462.7395 | 0.4251 | 150 | 552.1450 | -85.3681 | -1.1142 | 536.3591 | 0.6978 | 0.5303 | 0.5367 |
445.5849 | 0.5668 | 200 | 561.5619 | -81.4330 | -0.8469 | 550.3475 | 0.7065 | 0.5525 | 0.5336 |
445.1676 | 0.7085 | 250 | 572.1694 | -80.7174 | -1.0391 | 563.6924 | 0.7070 | 0.5830 | 0.5409 |
413.9375 | 0.8503 | 300 | 567.0264 | -84.8860 | -0.7452 | 558.1202 | 0.7031 | 0.5732 | 0.5399 |
385.7652 | 0.9920 | 350 | 581.0135 | -82.6389 | -0.6076 | 565.1652 | 0.7082 | 0.5906 | 0.5383 |
376.3251 | 1.1337 | 400 | 586.0215 | -81.6223 | -0.5273 | 571.4174 | 0.7118 | 0.5996 | 0.5367 |
348.4717 | 1.2754 | 450 | 576.5939 | -81.8898 | -0.6517 | 563.9977 | 0.7055 | 0.5866 | 0.5373 |
351.4185 | 1.4171 | 500 | 584.3820 | -82.8563 | -0.5594 | 570.8920 | 0.7128 | 0.5972 | 0.5393 |
326.458 | 1.5588 | 550 | 578.3503 | -80.5614 | -0.6994 | 565.9683 | 0.7086 | 0.5877 | 0.5367 |
329.0151 | 1.7005 | 600 | 578.3867 | -80.3279 | -0.5936 | 566.0594 | 0.7085 | 0.5913 | 0.5388 |
333.5158 | 1.8422 | 650 | 577.9292 | -81.0225 | -0.5969 | 565.5915 | 0.7084 | 0.5891 | 0.5393 |
316.2014 | 1.9839 | 700 | 577.6038 | -80.5416 | -0.5956 | 564.6390 | 0.7098 | 0.5857 | 0.5409 |
295.2996 | 2.1256 | 750 | 579.5015 | -81.0739 | -0.5879 | 567.8405 | 0.7108 | 0.5925 | 0.5393 |
290.0791 | 2.2674 | 800 | 576.8207 | -81.6889 | -0.5885 | 564.8282 | 0.7088 | 0.5876 | 0.5378 |
277.1292 | 2.4091 | 850 | 579.0094 | -81.5435 | -0.5771 | 567.1205 | 0.7109 | 0.5911 | 0.5383 |
271.9766 | 2.5508 | 900 | 577.3417 | -81.1632 | -0.5708 | 565.7184 | 0.7099 | 0.5881 | 0.5362 |
273.4982 | 2.6925 | 950 | 578.9321 | -81.1954 | -0.5680 | 567.1773 | 0.7103 | 0.5910 | 0.5362 |
265.7935 | 2.8342 | 1000 | 578.3192 | -81.1470 | -0.5704 | 566.5608 | 0.7099 | 0.5902 | 0.5367 |
265.6855 | 2.9759 | 1050 | 578.1592 | -81.1337 | -0.5715 | 566.3954 | 0.7098 | 0.5899 | 0.5362 |
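For context, the "Dpo Loss" column presumably tracks the standard sigmoid-form DPO objective shown below, while "Regularize" and "Objective" appear to report the additional L2EXPO regularization term and the combined training objective; their exact weighting is not documented in this card. In the formula, π_θ is the policy being trained, π_ref the SFT reference model (hZzy/qwen2.5-0.5b-sft-news-IFT), (x, y_w, y_l) a prompt with its chosen and rejected responses, and β a temperature hyperparameter not listed above.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```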
Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1