---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_weighted
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-W0-noES4-0.1
  results: []
---

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/n7w9uz5i)
# qwen2.5-0.5b-expo-L2EXPO-W0-noES4-0.1

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise_weighted dataset.
It achieves the following results on the evaluation set:
- Loss: 179.2621
- Logps: -92.2613
- Logits: -1.4975
- Objective: 175.9752
- Dpo Loss: 0.6785
- Regularize: 0.3992
- Ranking Simple: 0.5280
- Ranking Idealized: 0.6025
- Ranking Idealized Expo: 0.5233
- Wo Beta: 16.5856

## Model description

More information needed

## Intended uses & limitations

More information needed
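
As a minimal usage sketch, the model can be loaded like any `transformers` causal LM. The repo id below is assumed from the model name in this card and may differ from the actual published checkpoint:

```python
# Minimal loading sketch; the repo id is assumed from the model name above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W0-noES4-0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```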

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
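
For reference, a sketch of how the generic hyperparameters above map onto `transformers.TrainingArguments`. The run used a preference-optimization trainer (EXPO/DPO-style) whose extra arguments are not recorded in this card, so this covers only the settings listed:

```python
# Sketch of the generic hyperparameters above; trainer-specific (EXPO/DPO)
# arguments are not recorded in the card and are omitted here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-W0-noES4-0.1",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 per device x 12 steps x 3 GPUs = 144 total
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```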

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
| 182.5182      | 0.1417 | 50   | 182.5003        | -90.8517 | -1.4200 | 180.4893  | 0.6895   | 0.4093     | 0.5248         | 0.6025            | 0.5233                 | 16.3100 |
| 159.305       | 0.2834 | 100  | 182.1522        | -91.3531 | -1.4622 | 180.5219  | 0.6860   | 0.4103     | 0.5311         | 0.6025            | 0.5233                 | 16.3819 |
| 150.2379      | 0.4251 | 150  | 180.0575        | -90.2469 | -1.4576 | 177.1578  | 0.6806   | 0.4010     | 0.5331         | 0.6025            | 0.5233                 | 16.6107 |
| 135.925       | 0.5668 | 200  | 179.9740        | -91.1249 | -1.4453 | 177.0413  | 0.6795   | 0.4006     | 0.5305         | 0.6025            | 0.5233                 | 16.2687 |
| 130.7065      | 0.7085 | 250  | 181.5092        | -91.6178 | -1.5061 | 178.2784  | 0.6800   | 0.4049     | 0.5305         | 0.6025            | 0.5233                 | 16.6407 |
| 109.74        | 0.8503 | 300  | 180.4924        | -92.4236 | -1.4760 | 178.1365  | 0.6815   | 0.4047     | 0.5305         | 0.6025            | 0.5233                 | 16.4981 |
| 104.2663      | 0.9920 | 350  | 182.2591        | -92.8005 | -1.5066 | 178.8644  | 0.6808   | 0.4058     | 0.5290         | 0.6025            | 0.5233                 | 16.5694 |
| 91.3585       | 1.1337 | 400  | 180.0295        | -92.3854 | -1.4789 | 177.7148  | 0.6800   | 0.4024     | 0.5280         | 0.6025            | 0.5233                 | 16.5852 |
| 77.8925       | 1.2754 | 450  | 179.2441        | -92.7062 | -1.4746 | 175.8475  | 0.6792   | 0.3989     | 0.5274         | 0.6025            | 0.5233                 | 16.5269 |
| 73.5844       | 1.4171 | 500  | 180.3643        | -93.2695 | -1.4849 | 176.2332  | 0.6786   | 0.3994     | 0.5305         | 0.6025            | 0.5233                 | 16.5003 |
| 74.752        | 1.5588 | 550  | 181.3646        | -92.8892 | -1.4832 | 177.2267  | 0.6795   | 0.4020     | 0.5274         | 0.6025            | 0.5233                 | 16.5546 |
| 66.606        | 1.7005 | 600  | 179.4953        | -91.6158 | -1.4675 | 176.2793  | 0.6789   | 0.3999     | 0.5311         | 0.6025            | 0.5233                 | 16.6183 |
| 65.4503       | 1.8422 | 650  | 180.1248        | -91.8974 | -1.5046 | 176.5553  | 0.6790   | 0.4003     | 0.5285         | 0.6025            | 0.5233                 | 16.5373 |
| 62.3615       | 1.9839 | 700  | 179.3857        | -91.5875 | -1.4984 | 176.0021  | 0.6784   | 0.3992     | 0.5300         | 0.6025            | 0.5233                 | 16.5863 |
| 48.9708       | 2.1256 | 750  | 179.8103        | -92.1933 | -1.4919 | 176.7028  | 0.6794   | 0.4011     | 0.5274         | 0.6025            | 0.5233                 | 16.5884 |
| 51.9463       | 2.2674 | 800  | 179.2178        | -92.0065 | -1.4993 | 175.7036  | 0.6782   | 0.3986     | 0.5290         | 0.6025            | 0.5233                 | 16.5689 |
| 44.3463       | 2.4091 | 850  | 179.1735        | -92.2372 | -1.4918 | 175.7777  | 0.6783   | 0.3988     | 0.5285         | 0.6025            | 0.5233                 | 16.5682 |
| 44.3015       | 2.5508 | 900  | 179.1590        | -92.1898 | -1.4983 | 175.8240  | 0.6784   | 0.3990     | 0.5280         | 0.6025            | 0.5233                 | 16.5905 |
| 43.4164       | 2.6925 | 950  | 179.2801        | -92.2046 | -1.4967 | 176.0408  | 0.6785   | 0.3993     | 0.5274         | 0.6025            | 0.5233                 | 16.5891 |
| 43.6009       | 2.8342 | 1000 | 179.2791        | -92.2705 | -1.4978 | 175.9963  | 0.6785   | 0.3992     | 0.5280         | 0.6025            | 0.5233                 | 16.5880 |
| 47.7054       | 2.9759 | 1050 | 179.2622        | -92.2613 | -1.4975 | 175.9752  | 0.6785   | 0.3992     | 0.5280         | 0.6025            | 0.5233                 | 16.5856 |


### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1