metadata
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise_weighted
model-index:
  - name: qwen2.5-0.5b-expo-L2EXPO-W0-noES4-0.1
    results: []

qwen2.5-0.5b-expo-L2EXPO-W0-noES4-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set, measured at the final evaluation (step 1050, epoch 2.98):

  • Loss: 179.2621
  • Logps: -92.2613
  • Logits: -1.4975
  • Objective: 175.9752
  • Dpo Loss: 0.6785
  • Regularize: 0.3992
  • Ranking Simple: 0.5280
  • Ranking Idealized: 0.6025
  • Ranking Idealized Expo: 0.5233
  • Wo Beta: 16.5856

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
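
The derived values in the list above follow from the others: the total train batch size is the per-device batch size times the number of devices times the gradient accumulation steps. The sketch below reproduces that arithmetic and illustrates a cosine schedule with linear warmup, written in plain Python rather than with the trainer's own scheduler. The helper names `effective_batch_size` and `lr_at` are hypothetical, and the total step count is not stated in this card: `ESTIMATED_TOTAL_STEPS` is an assumption inferred from the last logged row (step 1050 at epoch 2.9759).

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-07
TRAIN_BATCH_SIZE = 4             # per device
EVAL_BATCH_SIZE = 4              # per device
NUM_DEVICES = 3
GRADIENT_ACCUMULATION_STEPS = 12
WARMUP_RATIO = 0.1

# Assumption: not stated in the card; estimated as 1050 / 2.9759 * 3.
ESTIMATED_TOTAL_STEPS = 1058


def effective_batch_size(per_device, devices, accumulation_steps=1):
    """Number of examples that contribute to one optimizer step."""
    return per_device * devices * accumulation_steps


def lr_at(step, total_steps, base_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to base_lr, then cosine decay to 0 (a sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_train_batch_size = effective_batch_size(
    TRAIN_BATCH_SIZE, NUM_DEVICES, GRADIENT_ACCUMULATION_STEPS
)
total_eval_batch_size = effective_batch_size(EVAL_BATCH_SIZE, NUM_DEVICES)

print(total_train_batch_size)  # 144
print(total_eval_batch_size)   # 12
for step in (0, 50, 106, 500, ESTIMATED_TOTAL_STEPS):
    print(step, lr_at(step, ESTIMATED_TOTAL_STEPS))
```

Under these assumptions the learning rate reaches its peak of 5e-07 after roughly the first tenth of training and decays toward zero by the last step, so the late checkpoints are trained with a very small step size.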

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 182.5182 | 0.1417 | 50 | 182.5003 | -90.8517 | -1.4200 | 180.4893 | 0.6895 | 0.4093 | 0.5248 | 0.6025 | 0.5233 | 16.3100 |
| 159.305 | 0.2834 | 100 | 182.1522 | -91.3531 | -1.4622 | 180.5219 | 0.6860 | 0.4103 | 0.5311 | 0.6025 | 0.5233 | 16.3819 |
| 150.2379 | 0.4251 | 150 | 180.0575 | -90.2469 | -1.4576 | 177.1578 | 0.6806 | 0.4010 | 0.5331 | 0.6025 | 0.5233 | 16.6107 |
| 135.925 | 0.5668 | 200 | 179.9740 | -91.1249 | -1.4453 | 177.0413 | 0.6795 | 0.4006 | 0.5305 | 0.6025 | 0.5233 | 16.2687 |
| 130.7065 | 0.7085 | 250 | 181.5092 | -91.6178 | -1.5061 | 178.2784 | 0.6800 | 0.4049 | 0.5305 | 0.6025 | 0.5233 | 16.6407 |
| 109.74 | 0.8503 | 300 | 180.4924 | -92.4236 | -1.4760 | 178.1365 | 0.6815 | 0.4047 | 0.5305 | 0.6025 | 0.5233 | 16.4981 |
| 104.2663 | 0.9920 | 350 | 182.2591 | -92.8005 | -1.5066 | 178.8644 | 0.6808 | 0.4058 | 0.5290 | 0.6025 | 0.5233 | 16.5694 |
| 91.3585 | 1.1337 | 400 | 180.0295 | -92.3854 | -1.4789 | 177.7148 | 0.6800 | 0.4024 | 0.5280 | 0.6025 | 0.5233 | 16.5852 |
| 77.8925 | 1.2754 | 450 | 179.2441 | -92.7062 | -1.4746 | 175.8475 | 0.6792 | 0.3989 | 0.5274 | 0.6025 | 0.5233 | 16.5269 |
| 73.5844 | 1.4171 | 500 | 180.3643 | -93.2695 | -1.4849 | 176.2332 | 0.6786 | 0.3994 | 0.5305 | 0.6025 | 0.5233 | 16.5003 |
| 74.752 | 1.5588 | 550 | 181.3646 | -92.8892 | -1.4832 | 177.2267 | 0.6795 | 0.4020 | 0.5274 | 0.6025 | 0.5233 | 16.5546 |
| 66.606 | 1.7005 | 600 | 179.4953 | -91.6158 | -1.4675 | 176.2793 | 0.6789 | 0.3999 | 0.5311 | 0.6025 | 0.5233 | 16.6183 |
| 65.4503 | 1.8422 | 650 | 180.1248 | -91.8974 | -1.5046 | 176.5553 | 0.6790 | 0.4003 | 0.5285 | 0.6025 | 0.5233 | 16.5373 |
| 62.3615 | 1.9839 | 700 | 179.3857 | -91.5875 | -1.4984 | 176.0021 | 0.6784 | 0.3992 | 0.5300 | 0.6025 | 0.5233 | 16.5863 |
| 48.9708 | 2.1256 | 750 | 179.8103 | -92.1933 | -1.4919 | 176.7028 | 0.6794 | 0.4011 | 0.5274 | 0.6025 | 0.5233 | 16.5884 |
| 51.9463 | 2.2674 | 800 | 179.2178 | -92.0065 | -1.4993 | 175.7036 | 0.6782 | 0.3986 | 0.5290 | 0.6025 | 0.5233 | 16.5689 |
| 44.3463 | 2.4091 | 850 | 179.1735 | -92.2372 | -1.4918 | 175.7777 | 0.6783 | 0.3988 | 0.5285 | 0.6025 | 0.5233 | 16.5682 |
| 44.3015 | 2.5508 | 900 | 179.1590 | -92.1898 | -1.4983 | 175.8240 | 0.6784 | 0.3990 | 0.5280 | 0.6025 | 0.5233 | 16.5905 |
| 43.4164 | 2.6925 | 950 | 179.2801 | -92.2046 | -1.4967 | 176.0408 | 0.6785 | 0.3993 | 0.5274 | 0.6025 | 0.5233 | 16.5891 |
| 43.6009 | 2.8342 | 1000 | 179.2791 | -92.2705 | -1.4978 | 175.9963 | 0.6785 | 0.3992 | 0.5280 | 0.6025 | 0.5233 | 16.5880 |
| 47.7054 | 2.9759 | 1050 | 179.2622 | -92.2613 | -1.4975 | 175.9752 | 0.6785 | 0.3992 | 0.5280 | 0.6025 | 0.5233 | 16.5856 |
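
The validation loss in the training results is not monotonic, so the final checkpoint is not necessarily the one with the lowest loss. As a minimal sketch, the snippet below copies the (step, validation loss) pairs from the training results and picks the evaluation step with the smallest validation loss. The variable names are illustrative only, and whether intermediate checkpoints were saved or published is not stated in this card.

```python
# (step, validation loss) pairs copied from the training results.
eval_log = [
    (50, 182.5003), (100, 182.1522), (150, 180.0575), (200, 179.9740),
    (250, 181.5092), (300, 180.4924), (350, 182.2591), (400, 180.0295),
    (450, 179.2441), (500, 180.3643), (550, 181.3646), (600, 179.4953),
    (650, 180.1248), (700, 179.3857), (750, 179.8103), (800, 179.2178),
    (850, 179.1735), (900, 179.1590), (950, 179.2801), (1000, 179.2791),
    (1050, 179.2622),
]

# Pick the evaluation step with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])

# Compare against the final logged evaluation.
final_step, final_loss = eval_log[-1]
gap = final_loss - best_loss

print(f"best step: {best_step}, validation loss: {best_loss:.4f}")
print(f"final step: {final_step}, gap to best: {gap:.4f}")
```

On these values the lowest validation loss is at step 900 (179.1590), about 0.10 below the final evaluation at step 1050, which is small relative to the step-to-step variation earlier in training.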

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1