
qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 578.1592
  • Logps: -81.1337
  • Logits: -0.5715
  • Objective: 566.3954
  • Dpo Loss: 0.7098
  • Regularize: 0.5899
  • Ranking Simple: 0.5362
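
The card does not say how the checkpoint is meant to be loaded; the sketch below assumes it is a standard Qwen2.5-style causal language model usable with the Transformers auto classes, and the prompt text is only a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1"

# Assumes the repo ships a standard tokenizer and causal-LM weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt; the card does not document an expected prompt format.
prompt = "Summarize the following news article in one sentence:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```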

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
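
The effective batch size follows from the list above: 4 per device × 3 GPUs × 12 gradient-accumulation steps = 144. The training script and the exact DPO/L2EXPO objective are not published, so the snippet below is only a sketch of how the listed hyperparameters map onto transformers' `TrainingArguments`; the output path and precision flag are assumptions:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # 4 x 3 GPUs x 12 accumulation steps = 144 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=False,   # precision not stated on the card; the published weights are F32
)
```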

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 470.2434 | 0.1417 | 50 | 491.6702 | -94.3377 | -1.4968 | 488.9373 | 0.6873 | 0.4298 | 0.5269 |
| 444.0833 | 0.2834 | 100 | 519.0432 | -85.0979 | -1.4345 | 504.6209 | 0.6839 | 0.4692 | 0.5383 |
| 462.7395 | 0.4251 | 150 | 552.1450 | -85.3681 | -1.1142 | 536.3591 | 0.6978 | 0.5303 | 0.5367 |
| 445.5849 | 0.5668 | 200 | 561.5619 | -81.4330 | -0.8469 | 550.3475 | 0.7065 | 0.5525 | 0.5336 |
| 445.1676 | 0.7085 | 250 | 572.1694 | -80.7174 | -1.0391 | 563.6924 | 0.7070 | 0.5830 | 0.5409 |
| 413.9375 | 0.8503 | 300 | 567.0264 | -84.8860 | -0.7452 | 558.1202 | 0.7031 | 0.5732 | 0.5399 |
| 385.7652 | 0.9920 | 350 | 581.0135 | -82.6389 | -0.6076 | 565.1652 | 0.7082 | 0.5906 | 0.5383 |
| 376.3251 | 1.1337 | 400 | 586.0215 | -81.6223 | -0.5273 | 571.4174 | 0.7118 | 0.5996 | 0.5367 |
| 348.4717 | 1.2754 | 450 | 576.5939 | -81.8898 | -0.6517 | 563.9977 | 0.7055 | 0.5866 | 0.5373 |
| 351.4185 | 1.4171 | 500 | 584.3820 | -82.8563 | -0.5594 | 570.8920 | 0.7128 | 0.5972 | 0.5393 |
| 326.458 | 1.5588 | 550 | 578.3503 | -80.5614 | -0.6994 | 565.9683 | 0.7086 | 0.5877 | 0.5367 |
| 329.0151 | 1.7005 | 600 | 578.3867 | -80.3279 | -0.5936 | 566.0594 | 0.7085 | 0.5913 | 0.5388 |
| 333.5158 | 1.8422 | 650 | 577.9292 | -81.0225 | -0.5969 | 565.5915 | 0.7084 | 0.5891 | 0.5393 |
| 316.2014 | 1.9839 | 700 | 577.6038 | -80.5416 | -0.5956 | 564.6390 | 0.7098 | 0.5857 | 0.5409 |
| 295.2996 | 2.1256 | 750 | 579.5015 | -81.0739 | -0.5879 | 567.8405 | 0.7108 | 0.5925 | 0.5393 |
| 290.0791 | 2.2674 | 800 | 576.8207 | -81.6889 | -0.5885 | 564.8282 | 0.7088 | 0.5876 | 0.5378 |
| 277.1292 | 2.4091 | 850 | 579.0094 | -81.5435 | -0.5771 | 567.1205 | 0.7109 | 0.5911 | 0.5383 |
| 271.9766 | 2.5508 | 900 | 577.3417 | -81.1632 | -0.5708 | 565.7184 | 0.7099 | 0.5881 | 0.5362 |
| 273.4982 | 2.6925 | 950 | 578.9321 | -81.1954 | -0.5680 | 567.1773 | 0.7103 | 0.5910 | 0.5362 |
| 265.7935 | 2.8342 | 1000 | 578.3192 | -81.1470 | -0.5704 | 566.5608 | 0.7099 | 0.5902 | 0.5367 |
| 265.6855 | 2.9759 | 1050 | 578.1592 | -81.1337 | -0.5715 | 566.3954 | 0.7098 | 0.5899 | 0.5362 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1