qwen2.5-0.5b-expo-L2EXPO-W2-noES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0378
  • Logps: -86.2505
  • Logits: -1.3514
  • Objective: 0.0371
  • Dpo Loss: 0.6773
  • Regularize: 0.4086
  • Ranking Simple: 0.5367
  • Ranking Idealized: 0.6025
  • Ranking Idealized Expo: 0.5233
  • Wo Beta: 16.0267
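
The checkpoint is a Qwen2.5-based causal language model, so it should load with the standard transformers API. A minimal usage sketch (the prompt and generation settings below are illustrative assumptions, not taken from this card):

```python
# Minimal loading/generation sketch for this checkpoint.
# Assumption: standard AutoModelForCausalLM loading applies; the prompt and
# sampling settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W2-noES-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a short news headline about renewable energy."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```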

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
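
For reference, a hedged sketch of how these settings map onto the transformers TrainingArguments API; the EXPO/L2EXPO-specific trainer, loss weighting, and dataset plumbing are not described in this card, so this only mirrors the values listed above:

```python
# Hedged sketch: the listed hyperparameters expressed as transformers TrainingArguments.
# The EXPO/L2EXPO trainer itself is not shown in this card; output_dir is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-W2-noES-0.1",  # assumed output path
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # 4 per device x 3 GPUs x 12 accumulation = 144 total
    per_device_eval_batch_size=4,    # 4 per device x 3 GPUs = 12 total
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```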

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0369 | 0.1417 | 50 | 0.0390 | -90.2782 | -1.4491 | 0.0391 | 0.6884 | 0.4122 | 0.5269 | 0.6025 | 0.5233 | 16.4652 |
| 0.0352 | 0.2834 | 100 | 0.0391 | -91.7039 | -1.5424 | 0.0391 | 0.6823 | 0.4173 | 0.5285 | 0.6025 | 0.5233 | 16.4180 |
| 0.0307 | 0.4251 | 150 | 0.0381 | -89.9212 | -1.4936 | 0.0378 | 0.6776 | 0.4083 | 0.5305 | 0.6025 | 0.5233 | 16.7819 |
| 0.0297 | 0.5668 | 200 | 0.0383 | -89.7068 | -1.4461 | 0.0379 | 0.6807 | 0.4114 | 0.5311 | 0.6025 | 0.5233 | 16.3721 |
| 0.0308 | 0.7085 | 250 | 0.0393 | -84.8614 | -1.4727 | 0.0387 | 0.6793 | 0.4269 | 0.5373 | 0.6025 | 0.5233 | 16.4728 |
| 0.0319 | 0.8503 | 300 | 0.0393 | -86.9588 | -1.3799 | 0.0385 | 0.6849 | 0.4346 | 0.5362 | 0.6025 | 0.5233 | 16.5196 |
| 0.0332 | 0.9920 | 350 | 0.0385 | -85.9522 | -1.4162 | 0.0371 | 0.6765 | 0.4138 | 0.5367 | 0.6025 | 0.5233 | 16.1268 |
| 0.0316 | 1.1337 | 400 | 0.0381 | -85.4246 | -1.3429 | 0.0375 | 0.6780 | 0.4129 | 0.5367 | 0.6025 | 0.5233 | 16.0028 |
| 0.0231 | 1.2754 | 450 | 0.0383 | -86.0492 | -1.2800 | 0.0377 | 0.6798 | 0.4181 | 0.5383 | 0.6025 | 0.5233 | 16.0724 |
| 0.0276 | 1.4171 | 500 | 0.0385 | -85.1352 | -1.3222 | 0.0376 | 0.6785 | 0.4148 | 0.5362 | 0.6025 | 0.5233 | 15.9100 |
| 0.0264 | 1.5588 | 550 | 0.0381 | -85.1091 | -1.3170 | 0.0375 | 0.6773 | 0.4145 | 0.5367 | 0.6025 | 0.5233 | 16.0355 |
| 0.0216 | 1.7005 | 600 | 0.0379 | -84.8271 | -1.3649 | 0.0373 | 0.6764 | 0.4106 | 0.5378 | 0.6025 | 0.5233 | 15.9760 |
| 0.0221 | 1.8422 | 650 | 0.0381 | -86.0721 | -1.3792 | 0.0371 | 0.6776 | 0.4128 | 0.5367 | 0.6025 | 0.5233 | 15.9955 |
| 0.0208 | 1.9839 | 700 | 0.0383 | -85.6073 | -1.3779 | 0.0375 | 0.6793 | 0.4142 | 0.5357 | 0.6025 | 0.5233 | 15.9613 |
| 0.0149 | 2.1256 | 750 | 0.0379 | -86.0851 | -1.3526 | 0.0372 | 0.6783 | 0.4108 | 0.5342 | 0.6025 | 0.5233 | 16.0025 |
| 0.0157 | 2.2674 | 800 | 0.0378 | -86.2671 | -1.3483 | 0.0370 | 0.6781 | 0.4103 | 0.5362 | 0.6025 | 0.5233 | 16.0414 |
| 0.0173 | 2.4091 | 850 | 0.0377 | -86.2224 | -1.3534 | 0.0370 | 0.6770 | 0.4081 | 0.5383 | 0.6025 | 0.5233 | 16.0221 |
| 0.0180 | 2.5508 | 900 | 0.0378 | -86.2056 | -1.3507 | 0.0371 | 0.6773 | 0.4091 | 0.5373 | 0.6025 | 0.5233 | 15.9948 |
| 0.0133 | 2.6925 | 950 | 0.0378 | -86.1726 | -1.3498 | 0.0371 | 0.6775 | 0.4090 | 0.5367 | 0.6025 | 0.5233 | 16.0199 |
| 0.0126 | 2.8342 | 1000 | 0.0378 | -86.2452 | -1.3515 | 0.0371 | 0.6773 | 0.4086 | 0.5373 | 0.6025 | 0.5233 | 16.0244 |
| 0.0146 | 2.9759 | 1050 | 0.0378 | -86.2505 | -1.3514 | 0.0371 | 0.6773 | 0.4086 | 0.5367 | 0.6025 | 0.5233 | 16.0267 |

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1