---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_weighted
model-index:
- name: qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/5aiyt336)

# qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise_weighted dataset.
It achieves the following results on the evaluation set:
- Loss: 578.1592
- Logps: -81.1337
- Logits: -0.5715
- Objective: 566.3954
- Dpo Loss: 0.7098
- Regularize: 0.5899
- Ranking Simple: 0.5362

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
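The actual training script is not included in this card. For orientation only, the "Dpo Loss" column in the results below presumably tracks the standard DPO objective (Rafailov et al., 2023); the sketch below is a minimal PyTorch reference for that term, not the L2EXPO objective used in this run, which evidently adds the regularization reported under "Regularize". Function and argument names are illustrative, and `beta` is a placeholder default.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over per-sequence summed log-probs.

    Reference sketch only: this run's L2EXPO objective reportedly combines
    a form of this loss with an additional regularization term.
    """
    # Implicit rewards are the beta-scaled policy/reference log-ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the chosen-vs-rejected reward margin via log-sigmoid.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```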
### Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|
| 470.2434 | 0.1417 | 50 | 491.6702 | -94.3377 | -1.4968 | 488.9373 | 0.6873 | 0.4298 | 0.5269 |
| 444.0833 | 0.2834 | 100 | 519.0432 | -85.0979 | -1.4345 | 504.6209 | 0.6839 | 0.4692 | 0.5383 |
| 462.7395 | 0.4251 | 150 | 552.1450 | -85.3681 | -1.1142 | 536.3591 | 0.6978 | 0.5303 | 0.5367 |
| 445.5849 | 0.5668 | 200 | 561.5619 | -81.4330 | -0.8469 | 550.3475 | 0.7065 | 0.5525 | 0.5336 |
| 445.1676 | 0.7085 | 250 | 572.1694 | -80.7174 | -1.0391 | 563.6924 | 0.7070 | 0.5830 | 0.5409 |
| 413.9375 | 0.8503 | 300 | 567.0264 | -84.8860 | -0.7452 | 558.1202 | 0.7031 | 0.5732 | 0.5399 |
| 385.7652 | 0.9920 | 350 | 581.0135 | -82.6389 | -0.6076 | 565.1652 | 0.7082 | 0.5906 | 0.5383 |
| 376.3251 | 1.1337 | 400 | 586.0215 | -81.6223 | -0.5273 | 571.4174 | 0.7118 | 0.5996 | 0.5367 |
| 348.4717 | 1.2754 | 450 | 576.5939 | -81.8898 | -0.6517 | 563.9977 | 0.7055 | 0.5866 | 0.5373 |
| 351.4185 | 1.4171 | 500 | 584.3820 | -82.8563 | -0.5594 | 570.8920 | 0.7128 | 0.5972 | 0.5393 |
| 326.458  | 1.5588 | 550 | 578.3503 | -80.5614 | -0.6994 | 565.9683 | 0.7086 | 0.5877 | 0.5367 |
| 329.0151 | 1.7005 | 600 | 578.3867 | -80.3279 | -0.5936 | 566.0594 | 0.7085 | 0.5913 | 0.5388 |
| 333.5158 | 1.8422 | 650 | 577.9292 | -81.0225 | -0.5969 | 565.5915 | 0.7084 | 0.5891 | 0.5393 |
| 316.2014 | 1.9839 | 700 | 577.6038 | -80.5416 | -0.5956 | 564.6390 | 0.7098 | 0.5857 | 0.5409 |
| 295.2996 | 2.1256 | 750 | 579.5015 | -81.0739 | -0.5879 | 567.8405 | 0.7108 | 0.5925 | 0.5393 |
| 290.0791 | 2.2674 | 800 | 576.8207 | -81.6889 | -0.5885 | 564.8282 | 0.7088 | 0.5876 | 0.5378 |
| 277.1292 | 2.4091 | 850 | 579.0094 | -81.5435 | -0.5771 | 567.1205 | 0.7109 | 0.5911 | 0.5383 |
| 271.9766 | 2.5508 | 900 | 577.3417 | -81.1632 | -0.5708 | 565.7184 | 0.7099 | 0.5881 | 0.5362 |
| 273.4982 | 2.6925 | 950 | 578.9321 | -81.1954 | -0.5680 | 567.1773 | 0.7103 | 0.5910 | 0.5362 |
| 265.7935 | 2.8342 | 1000 | 578.3192 | -81.1470 | -0.5704 | 566.5608 | 0.7099 | 0.5902 | 0.5367 |
| 265.6855 | 2.9759 | 1050 | 578.1592 | -81.1337 | -0.5715 | 566.3954 | 0.7098 | 0.5899 | 0.5362 |

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
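This is an ordinary causal-LM checkpoint, so it should load with the standard `transformers` Auto classes; a minimal, untested inference sketch (the prompt is purely illustrative, given the news-focused SFT base):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt only; see "Intended uses & limitations" above.
inputs = tokenizer("Write a one-sentence news headline about renewable energy.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```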