snoels committed
Commit b8ed563 · verified · 1 Parent(s): 6bdefe8

Model save

Files changed (4)
  1. README.md +82 -0
  2. all_results.json +9 -0
  3. train_results.json +9 -0
  4. trainer_state.json +1294 -0
README.md ADDED
@@ -0,0 +1,82 @@
+ ---
+ base_model: BramVanroy/GEITje-7B-ultra
+ library_name: peft
+ license: cc-by-nc-4.0
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: FinGEITje-7B-dpo
+   results: []
+ ---
+ 
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+ 
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/snoels/huggingface/runs/yng7mdb0)
+ # FinGEITje-7B-dpo
+ 
+ This model is a fine-tuned version of [BramVanroy/GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0279
+ - Rewards/chosen: -3.8974
+ - Rewards/rejected: -15.9642
+ - Rewards/accuracies: 0.9828
+ - Rewards/margins: 12.0668
+ - Logps/rejected: -1951.9310
+ - Logps/chosen: -788.9780
+ - Logits/rejected: -1.7371
+ - Logits/chosen: -1.8937
+ 
+ ## Model description
+ 
+ More information needed
+ 
+ ## Intended uses & limitations
+ 
+ More information needed
+ 
+ ## Training and evaluation data
+ 
+ More information needed
+ 
+ ## Training procedure
+ 
+ ### Training hyperparameters
+ 
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+ 
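These settings describe a single-epoch DPO run on 4 GPUs with an effective batch size of 64 (1 per device × 4 devices × 16 gradient-accumulation steps). A rough, unverified sketch of how such a configuration might be expressed with TRL's `DPOTrainer` follows; the preference dataset, the LoRA settings, and the compute dtype are placeholders rather than values taken from this card.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "BramVanroy/GEITje-7B-ultra"
# dtype assumed, not stated in the card
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder: any preference dataset with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("your-org/your-preference-dataset", split="train")

# Placeholder LoRA settings; the card only states that PEFT was used.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

args = DPOConfig(
    output_dir="FinGEITje-7B-dpo",
    learning_rate=5e-6,
    per_device_train_batch_size=1,   # x 4 devices x 16 accumulation steps = 64 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,  # DPOTrainer wraps the base model in a LoRA adapter
)
trainer.train()
```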
+ ### Training results
+ 
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.1029 | 0.1327 | 100 | 0.1099 | -1.8067 | -5.3683 | 0.9679 | 3.5616 | -892.3373 | -579.9115 | -2.4775 | -2.3705 |
+ | 0.042 | 0.2654 | 200 | 0.0430 | -3.5129 | -10.6778 | 0.9828 | 7.1649 | -1423.2883 | -750.5289 | -1.9744 | -1.9895 |
+ | 0.0278 | 0.3981 | 300 | 0.0344 | -3.7335 | -13.5153 | 0.9828 | 9.7818 | -1707.0360 | -772.5893 | -1.7454 | -1.8191 |
+ | 0.0223 | 0.5308 | 400 | 0.0308 | -3.6554 | -13.7712 | 0.9858 | 10.1158 | -1732.6289 | -764.7831 | -1.8020 | -1.9184 |
+ | 0.0378 | 0.6635 | 500 | 0.0297 | -4.0018 | -16.3285 | 0.9851 | 12.3266 | -1988.3542 | -799.4221 | -1.6924 | -1.8650 |
+ | 0.0352 | 0.7962 | 600 | 0.0278 | -3.8104 | -15.6430 | 0.9836 | 11.8327 | -1919.8119 | -780.2752 | -1.7437 | -1.8978 |
+ | 0.0238 | 0.9289 | 700 | 0.0279 | -3.8974 | -15.9642 | 0.9828 | 12.0668 | -1951.9310 | -788.9780 | -1.7371 | -1.8937 |
+ 
+ 
+ ### Framework versions
+ 
+ - PEFT 0.11.1
+ - Transformers 4.42.4
+ - Pytorch 2.3.1
+ - Datasets 2.20.0
+ - Tokenizers 0.19.1
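Because the card marks this as a PEFT adapter on top of `BramVanroy/GEITje-7B-ultra`, loading it for inference can follow the usual PEFT pattern. A minimal sketch, assuming the adapter lives at `snoels/FinGEITje-7B-dpo` (inferred from the committer and model name, not verified) and that the base tokenizer ships a chat template:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "snoels/FinGEITje-7B-dpo"  # assumed repository id
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("BramVanroy/GEITje-7B-ultra")

messages = [{"role": "user", "content": "Wat is een obligatie?"}]  # Dutch: "What is a bond?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```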
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 0.999253545658124,
+     "total_flos": 0.0,
+     "train_loss": 0.0882907345950366,
+     "train_runtime": 20220.3954,
+     "train_samples": 48227,
+     "train_samples_per_second": 2.385,
+     "train_steps_per_second": 0.037
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 0.999253545658124,
+     "total_flos": 0.0,
+     "train_loss": 0.0882907345950366,
+     "train_runtime": 20220.3954,
+     "train_samples": 48227,
+     "train_samples_per_second": 2.385,
+     "train_steps_per_second": 0.037
+ }
trainer_state.json ADDED
@@ -0,0 +1,1294 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.999253545658124,
5
+ "eval_steps": 100,
6
+ "global_step": 753,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0013270299411130464,
13
+ "grad_norm": 1.7421875,
14
+ "learning_rate": 6.578947368421053e-08,
15
+ "logits/chosen": -3.0980052947998047,
16
+ "logits/rejected": -3.127007007598877,
17
+ "logps/chosen": -425.274169921875,
18
+ "logps/rejected": -373.2780456542969,
19
+ "loss": 0.6931,
20
+ "rewards/accuracies": 0.0,
21
+ "rewards/chosen": 0.0,
22
+ "rewards/margins": 0.0,
23
+ "rewards/rejected": 0.0,
24
+ "step": 1
25
+ },
26
+ {
27
+ "epoch": 0.013270299411130464,
28
+ "grad_norm": 1.765625,
29
+ "learning_rate": 6.578947368421053e-07,
30
+ "logits/chosen": -3.072850227355957,
31
+ "logits/rejected": -3.0915706157684326,
32
+ "logps/chosen": -421.3843994140625,
33
+ "logps/rejected": -350.215087890625,
34
+ "loss": 0.6919,
35
+ "rewards/accuracies": 0.6041666865348816,
36
+ "rewards/chosen": 0.0024290236178785563,
37
+ "rewards/margins": 0.0028885826468467712,
38
+ "rewards/rejected": -0.00045955937821418047,
39
+ "step": 10
40
+ },
41
+ {
42
+ "epoch": 0.026540598822260928,
43
+ "grad_norm": 1.6875,
44
+ "learning_rate": 1.3157894736842106e-06,
45
+ "logits/chosen": -3.0872464179992676,
46
+ "logits/rejected": -3.0978829860687256,
47
+ "logps/chosen": -352.447021484375,
48
+ "logps/rejected": -364.5257873535156,
49
+ "loss": 0.6816,
50
+ "rewards/accuracies": 0.956250011920929,
51
+ "rewards/chosen": 0.015185330994427204,
52
+ "rewards/margins": 0.021979082375764847,
53
+ "rewards/rejected": -0.006793751381337643,
54
+ "step": 20
55
+ },
56
+ {
57
+ "epoch": 0.039810898233391394,
58
+ "grad_norm": 1.53125,
59
+ "learning_rate": 1.973684210526316e-06,
60
+ "logits/chosen": -3.055328130722046,
61
+ "logits/rejected": -3.0872576236724854,
62
+ "logps/chosen": -374.4645080566406,
63
+ "logps/rejected": -364.3621520996094,
64
+ "loss": 0.6547,
65
+ "rewards/accuracies": 0.9937499761581421,
66
+ "rewards/chosen": 0.05001773685216904,
67
+ "rewards/margins": 0.07953666895627975,
68
+ "rewards/rejected": -0.029518935829401016,
69
+ "step": 30
70
+ },
71
+ {
72
+ "epoch": 0.053081197644521856,
73
+ "grad_norm": 1.6640625,
74
+ "learning_rate": 2.631578947368421e-06,
75
+ "logits/chosen": -3.054077625274658,
76
+ "logits/rejected": -3.090534210205078,
77
+ "logps/chosen": -394.09552001953125,
78
+ "logps/rejected": -394.32818603515625,
79
+ "loss": 0.611,
80
+ "rewards/accuracies": 0.9750000238418579,
81
+ "rewards/chosen": 0.0969473272562027,
82
+ "rewards/margins": 0.17953996360301971,
83
+ "rewards/rejected": -0.08259265124797821,
84
+ "step": 40
85
+ },
86
+ {
87
+ "epoch": 0.06635149705565231,
88
+ "grad_norm": 1.796875,
89
+ "learning_rate": 3.289473684210527e-06,
90
+ "logits/chosen": -3.0039563179016113,
91
+ "logits/rejected": -3.0520451068878174,
92
+ "logps/chosen": -376.13897705078125,
93
+ "logps/rejected": -405.67303466796875,
94
+ "loss": 0.5412,
95
+ "rewards/accuracies": 0.9750000238418579,
96
+ "rewards/chosen": 0.12179754674434662,
97
+ "rewards/margins": 0.3453168272972107,
98
+ "rewards/rejected": -0.22351928055286407,
99
+ "step": 50
100
+ },
101
+ {
102
+ "epoch": 0.07962179646678279,
103
+ "grad_norm": 1.7890625,
104
+ "learning_rate": 3.947368421052632e-06,
105
+ "logits/chosen": -2.974642515182495,
106
+ "logits/rejected": -3.0201048851013184,
107
+ "logps/chosen": -418.08135986328125,
108
+ "logps/rejected": -455.23846435546875,
109
+ "loss": 0.4179,
110
+ "rewards/accuracies": 0.987500011920929,
111
+ "rewards/chosen": 0.048082564026117325,
112
+ "rewards/margins": 0.6676187515258789,
113
+ "rewards/rejected": -0.6195362210273743,
114
+ "step": 60
115
+ },
116
+ {
117
+ "epoch": 0.09289209587791325,
118
+ "grad_norm": 1.34375,
119
+ "learning_rate": 4.605263157894737e-06,
120
+ "logits/chosen": -2.86934757232666,
121
+ "logits/rejected": -2.9434354305267334,
122
+ "logps/chosen": -400.07568359375,
123
+ "logps/rejected": -482.0399475097656,
124
+ "loss": 0.3225,
125
+ "rewards/accuracies": 0.981249988079071,
126
+ "rewards/chosen": -0.1442013680934906,
127
+ "rewards/margins": 1.0631463527679443,
128
+ "rewards/rejected": -1.2073477506637573,
129
+ "step": 70
130
+ },
131
+ {
132
+ "epoch": 0.10616239528904371,
133
+ "grad_norm": 1.359375,
134
+ "learning_rate": 4.9995693346469565e-06,
135
+ "logits/chosen": -2.6817970275878906,
136
+ "logits/rejected": -2.778237819671631,
137
+ "logps/chosen": -460.60394287109375,
138
+ "logps/rejected": -621.35498046875,
139
+ "loss": 0.2331,
140
+ "rewards/accuracies": 0.987500011920929,
141
+ "rewards/chosen": -0.620966374874115,
142
+ "rewards/margins": 1.6919386386871338,
143
+ "rewards/rejected": -2.3129050731658936,
144
+ "step": 80
145
+ },
146
+ {
147
+ "epoch": 0.11943269470017417,
148
+ "grad_norm": 1.8671875,
149
+ "learning_rate": 4.994726053293703e-06,
150
+ "logits/chosen": -2.5432372093200684,
151
+ "logits/rejected": -2.636983633041382,
152
+ "logps/chosen": -504.8564453125,
153
+ "logps/rejected": -734.1746826171875,
154
+ "loss": 0.1809,
155
+ "rewards/accuracies": 0.981249988079071,
156
+ "rewards/chosen": -1.099843978881836,
157
+ "rewards/margins": 2.6429855823516846,
158
+ "rewards/rejected": -3.7428295612335205,
159
+ "step": 90
160
+ },
161
+ {
162
+ "epoch": 0.13270299411130462,
163
+ "grad_norm": 1.140625,
164
+ "learning_rate": 4.984511621268103e-06,
165
+ "logits/chosen": -2.420698642730713,
166
+ "logits/rejected": -2.541872501373291,
167
+ "logps/chosen": -544.0470581054688,
168
+ "logps/rejected": -866.2991943359375,
169
+ "loss": 0.1029,
170
+ "rewards/accuracies": 0.9750000238418579,
171
+ "rewards/chosen": -1.5227975845336914,
172
+ "rewards/margins": 3.385746479034424,
173
+ "rewards/rejected": -4.908544063568115,
174
+ "step": 100
175
+ },
176
+ {
177
+ "epoch": 0.13270299411130462,
178
+ "eval_logits/chosen": -2.3704655170440674,
179
+ "eval_logits/rejected": -2.4775025844573975,
180
+ "eval_logps/chosen": -579.9114990234375,
181
+ "eval_logps/rejected": -892.3373413085938,
182
+ "eval_loss": 0.10992002487182617,
183
+ "eval_rewards/accuracies": 0.9679104685783386,
184
+ "eval_rewards/chosen": -1.8067275285720825,
185
+ "eval_rewards/margins": 3.5615758895874023,
186
+ "eval_rewards/rejected": -5.368303298950195,
187
+ "eval_runtime": 828.869,
188
+ "eval_samples_per_second": 6.465,
189
+ "eval_steps_per_second": 1.617,
190
+ "step": 100
191
+ },
192
+ {
193
+ "epoch": 0.1459732935224351,
194
+ "grad_norm": 1.2421875,
195
+ "learning_rate": 4.968948030264743e-06,
196
+ "logits/chosen": -2.349208354949951,
197
+ "logits/rejected": -2.439763069152832,
198
+ "logps/chosen": -572.8814697265625,
199
+ "logps/rejected": -938.94091796875,
200
+ "loss": 0.085,
201
+ "rewards/accuracies": 0.987500011920929,
202
+ "rewards/chosen": -1.6822025775909424,
203
+ "rewards/margins": 4.082047939300537,
204
+ "rewards/rejected": -5.764250755310059,
205
+ "step": 110
206
+ },
207
+ {
208
+ "epoch": 0.15924359293356558,
209
+ "grad_norm": 2.046875,
210
+ "learning_rate": 4.948068788729238e-06,
211
+ "logits/chosen": -2.156416177749634,
212
+ "logits/rejected": -2.215620756149292,
213
+ "logps/chosen": -626.0386962890625,
214
+ "logps/rejected": -1036.8035888671875,
215
+ "loss": 0.0874,
216
+ "rewards/accuracies": 0.9624999761581421,
217
+ "rewards/chosen": -2.1521973609924316,
218
+ "rewards/margins": 4.6609344482421875,
219
+ "rewards/rejected": -6.813131809234619,
220
+ "step": 120
221
+ },
222
+ {
223
+ "epoch": 0.17251389234469602,
224
+ "grad_norm": 1.8515625,
225
+ "learning_rate": 4.921918849714475e-06,
226
+ "logits/chosen": -2.1903834342956543,
227
+ "logits/rejected": -2.2045347690582275,
228
+ "logps/chosen": -673.7794189453125,
229
+ "logps/rejected": -1226.3961181640625,
230
+ "loss": 0.0559,
231
+ "rewards/accuracies": 0.9937499761581421,
232
+ "rewards/chosen": -2.5693655014038086,
233
+ "rewards/margins": 5.772281646728516,
234
+ "rewards/rejected": -8.341647148132324,
235
+ "step": 130
236
+ },
237
+ {
238
+ "epoch": 0.1857841917558265,
239
+ "grad_norm": 1.0078125,
240
+ "learning_rate": 4.890554514096592e-06,
241
+ "logits/chosen": -2.1850454807281494,
242
+ "logits/rejected": -2.19503116607666,
243
+ "logps/chosen": -688.4675903320312,
244
+ "logps/rejected": -1202.0238037109375,
245
+ "loss": 0.0699,
246
+ "rewards/accuracies": 0.9750000238418579,
247
+ "rewards/chosen": -2.7659270763397217,
248
+ "rewards/margins": 5.731719970703125,
249
+ "rewards/rejected": -8.497647285461426,
250
+ "step": 140
251
+ },
252
+ {
253
+ "epoch": 0.19905449116695695,
254
+ "grad_norm": 4.96875,
255
+ "learning_rate": 4.854043309359063e-06,
256
+ "logits/chosen": -2.045557737350464,
257
+ "logits/rejected": -2.022369146347046,
258
+ "logps/chosen": -797.4768676757812,
259
+ "logps/rejected": -1449.0230712890625,
260
+ "loss": 0.0569,
261
+ "rewards/accuracies": 0.9750000238418579,
262
+ "rewards/chosen": -3.9922683238983154,
263
+ "rewards/margins": 6.881680488586426,
264
+ "rewards/rejected": -10.87394905090332,
265
+ "step": 150
266
+ },
267
+ {
268
+ "epoch": 0.21232479057808742,
269
+ "grad_norm": 2.5,
270
+ "learning_rate": 4.8124638442058856e-06,
271
+ "logits/chosen": -2.0948424339294434,
272
+ "logits/rejected": -2.136207103729248,
273
+ "logps/chosen": -731.327392578125,
274
+ "logps/rejected": -1373.2447509765625,
275
+ "loss": 0.05,
276
+ "rewards/accuracies": 0.987500011920929,
277
+ "rewards/chosen": -3.1236228942871094,
278
+ "rewards/margins": 6.655406951904297,
279
+ "rewards/rejected": -9.77902889251709,
280
+ "step": 160
281
+ },
282
+ {
283
+ "epoch": 0.22559508998921787,
284
+ "grad_norm": 1.265625,
285
+ "learning_rate": 4.765905639316861e-06,
286
+ "logits/chosen": -1.9219977855682373,
287
+ "logits/rejected": -1.9046119451522827,
288
+ "logps/chosen": -720.5792236328125,
289
+ "logps/rejected": -1352.9737548828125,
290
+ "loss": 0.0472,
291
+ "rewards/accuracies": 0.9937499761581421,
292
+ "rewards/chosen": -3.072010040283203,
293
+ "rewards/margins": 6.776447296142578,
294
+ "rewards/rejected": -9.848457336425781,
295
+ "step": 170
296
+ },
297
+ {
298
+ "epoch": 0.23886538940034835,
299
+ "grad_norm": 2.234375,
300
+ "learning_rate": 4.7144689346093814e-06,
301
+ "logits/chosen": -1.9168952703475952,
302
+ "logits/rejected": -1.8547157049179077,
303
+ "logps/chosen": -825.1336059570312,
304
+ "logps/rejected": -1616.734375,
305
+ "loss": 0.0578,
306
+ "rewards/accuracies": 0.9624999761581421,
307
+ "rewards/chosen": -4.402952671051025,
308
+ "rewards/margins": 8.200037956237793,
309
+ "rewards/rejected": -12.602991104125977,
310
+ "step": 180
311
+ },
312
+ {
313
+ "epoch": 0.2521356888114788,
314
+ "grad_norm": 2.296875,
315
+ "learning_rate": 4.65826447342166e-06,
316
+ "logits/chosen": -1.9839226007461548,
317
+ "logits/rejected": -1.9680023193359375,
318
+ "logps/chosen": -803.8746948242188,
319
+ "logps/rejected": -1493.1898193359375,
320
+ "loss": 0.0497,
321
+ "rewards/accuracies": 0.9750000238418579,
322
+ "rewards/chosen": -3.822723865509033,
323
+ "rewards/margins": 7.312263488769531,
324
+ "rewards/rejected": -11.134986877441406,
325
+ "step": 190
326
+ },
327
+ {
328
+ "epoch": 0.26540598822260925,
329
+ "grad_norm": 8.8125,
330
+ "learning_rate": 4.597413264082086e-06,
331
+ "logits/chosen": -1.9621204137802124,
332
+ "logits/rejected": -1.9449526071548462,
333
+ "logps/chosen": -757.6912841796875,
334
+ "logps/rejected": -1455.258544921875,
335
+ "loss": 0.042,
336
+ "rewards/accuracies": 0.9750000238418579,
337
+ "rewards/chosen": -3.6233069896698,
338
+ "rewards/margins": 7.341341495513916,
339
+ "rewards/rejected": -10.964648246765137,
340
+ "step": 200
341
+ },
342
+ {
343
+ "epoch": 0.26540598822260925,
344
+ "eval_logits/chosen": -1.9894516468048096,
345
+ "eval_logits/rejected": -1.9743856191635132,
346
+ "eval_logps/chosen": -750.5289306640625,
347
+ "eval_logps/rejected": -1423.288330078125,
348
+ "eval_loss": 0.04295578598976135,
349
+ "eval_rewards/accuracies": 0.9828358292579651,
350
+ "eval_rewards/chosen": -3.5129029750823975,
351
+ "eval_rewards/margins": 7.164910793304443,
352
+ "eval_rewards/rejected": -10.677813529968262,
353
+ "eval_runtime": 830.8607,
354
+ "eval_samples_per_second": 6.45,
355
+ "eval_steps_per_second": 1.613,
356
+ "step": 200
357
+ },
358
+ {
359
+ "epoch": 0.27867628763373975,
360
+ "grad_norm": 9.6875,
361
+ "learning_rate": 4.5320463193780265e-06,
362
+ "logits/chosen": -1.9735777378082275,
363
+ "logits/rejected": -1.9742670059204102,
364
+ "logps/chosen": -789.7581176757812,
365
+ "logps/rejected": -1441.5927734375,
366
+ "loss": 0.0416,
367
+ "rewards/accuracies": 0.9937499761581421,
368
+ "rewards/chosen": -3.7721946239471436,
369
+ "rewards/margins": 6.881577968597412,
370
+ "rewards/rejected": -10.653772354125977,
371
+ "step": 210
372
+ },
373
+ {
374
+ "epoch": 0.2919465870448702,
375
+ "grad_norm": 2.71875,
376
+ "learning_rate": 4.462304374485005e-06,
377
+ "logits/chosen": -1.910851240158081,
378
+ "logits/rejected": -1.8877220153808594,
379
+ "logps/chosen": -795.4382934570312,
380
+ "logps/rejected": -1552.471923828125,
381
+ "loss": 0.0435,
382
+ "rewards/accuracies": 0.987500011920929,
383
+ "rewards/chosen": -4.075817584991455,
384
+ "rewards/margins": 7.9385833740234375,
385
+ "rewards/rejected": -12.014400482177734,
386
+ "step": 220
387
+ },
388
+ {
389
+ "epoch": 0.30521688645600065,
390
+ "grad_norm": 0.9296875,
391
+ "learning_rate": 4.388337583963563e-06,
392
+ "logits/chosen": -1.8420207500457764,
393
+ "logits/rejected": -1.7979652881622314,
394
+ "logps/chosen": -831.4200439453125,
395
+ "logps/rejected": -1719.750244140625,
396
+ "loss": 0.0253,
397
+ "rewards/accuracies": 1.0,
398
+ "rewards/chosen": -4.229403018951416,
399
+ "rewards/margins": 9.30135440826416,
400
+ "rewards/rejected": -13.530756950378418,
401
+ "step": 230
402
+ },
403
+ {
404
+ "epoch": 0.31848718586713115,
405
+ "grad_norm": 1.15625,
406
+ "learning_rate": 4.310305198476161e-06,
407
+ "logits/chosen": -1.813542366027832,
408
+ "logits/rejected": -1.7659708261489868,
409
+ "logps/chosen": -770.3541259765625,
410
+ "logps/rejected": -1698.886474609375,
411
+ "loss": 0.0336,
412
+ "rewards/accuracies": 0.9937499761581421,
413
+ "rewards/chosen": -3.893308162689209,
414
+ "rewards/margins": 9.639238357543945,
415
+ "rewards/rejected": -13.532546997070312,
416
+ "step": 240
417
+ },
418
+ {
419
+ "epoch": 0.3317574852782616,
420
+ "grad_norm": 2.03125,
421
+ "learning_rate": 4.228375221920147e-06,
422
+ "logits/chosen": -1.8265107870101929,
423
+ "logits/rejected": -1.765747308731079,
424
+ "logps/chosen": -721.0618896484375,
425
+ "logps/rejected": -1561.09033203125,
426
+ "loss": 0.0269,
427
+ "rewards/accuracies": 0.9937499761581421,
428
+ "rewards/chosen": -3.446256160736084,
429
+ "rewards/margins": 8.8670015335083,
430
+ "rewards/rejected": -12.313258171081543,
431
+ "step": 250
432
+ },
433
+ {
434
+ "epoch": 0.34502778468939205,
435
+ "grad_norm": 3.875,
436
+ "learning_rate": 4.142724049715005e-06,
437
+ "logits/chosen": -1.7385963201522827,
438
+ "logits/rejected": -1.5829213857650757,
439
+ "logps/chosen": -922.6043701171875,
440
+ "logps/rejected": -2136.38525390625,
441
+ "loss": 0.0346,
442
+ "rewards/accuracies": 1.0,
443
+ "rewards/chosen": -4.981525897979736,
444
+ "rewards/margins": 11.962621688842773,
445
+ "rewards/rejected": -16.94414710998535,
446
+ "step": 260
447
+ },
448
+ {
449
+ "epoch": 0.3582980841005225,
450
+ "grad_norm": 5.5,
451
+ "learning_rate": 4.053536089022624e-06,
452
+ "logits/chosen": -1.841082215309143,
453
+ "logits/rejected": -1.7555363178253174,
454
+ "logps/chosen": -756.6123657226562,
455
+ "logps/rejected": -1619.0562744140625,
456
+ "loss": 0.0434,
457
+ "rewards/accuracies": 0.987500011920929,
458
+ "rewards/chosen": -3.6720402240753174,
459
+ "rewards/margins": 8.794347763061523,
460
+ "rewards/rejected": -12.466386795043945,
461
+ "step": 270
462
+ },
463
+ {
464
+ "epoch": 0.371568383511653,
465
+ "grad_norm": 3.1875,
466
+ "learning_rate": 3.961003361718272e-06,
467
+ "logits/chosen": -1.8635780811309814,
468
+ "logits/rejected": -1.7543712854385376,
469
+ "logps/chosen": -745.083251953125,
470
+ "logps/rejected": -1585.915283203125,
471
+ "loss": 0.0333,
472
+ "rewards/accuracies": 0.9937499761581421,
473
+ "rewards/chosen": -3.244401216506958,
474
+ "rewards/margins": 8.643343925476074,
475
+ "rewards/rejected": -11.887744903564453,
476
+ "step": 280
477
+ },
478
+ {
479
+ "epoch": 0.38483868292278345,
480
+ "grad_norm": 16.5,
481
+ "learning_rate": 3.8653250909670815e-06,
482
+ "logits/chosen": -1.6911243200302124,
483
+ "logits/rejected": -1.550756573677063,
484
+ "logps/chosen": -840.0848388671875,
485
+ "logps/rejected": -1973.161376953125,
486
+ "loss": 0.0493,
487
+ "rewards/accuracies": 0.987500011920929,
488
+ "rewards/chosen": -4.447511196136475,
489
+ "rewards/margins": 11.780040740966797,
490
+ "rewards/rejected": -16.227550506591797,
491
+ "step": 290
492
+ },
493
+ {
494
+ "epoch": 0.3981089823339139,
495
+ "grad_norm": 1.5859375,
496
+ "learning_rate": 3.7667072722961363e-06,
497
+ "logits/chosen": -1.8192800283432007,
498
+ "logits/rejected": -1.7625439167022705,
499
+ "logps/chosen": -772.19189453125,
500
+ "logps/rejected": -1698.5767822265625,
501
+ "loss": 0.0278,
502
+ "rewards/accuracies": 0.9937499761581421,
503
+ "rewards/chosen": -3.4313864707946777,
504
+ "rewards/margins": 9.797091484069824,
505
+ "rewards/rejected": -13.228475570678711,
506
+ "step": 300
507
+ },
508
+ {
509
+ "epoch": 0.3981089823339139,
510
+ "eval_logits/chosen": -1.8191460371017456,
511
+ "eval_logits/rejected": -1.7454465627670288,
512
+ "eval_logps/chosen": -772.5892944335938,
513
+ "eval_logps/rejected": -1707.0360107421875,
514
+ "eval_loss": 0.03441401198506355,
515
+ "eval_rewards/accuracies": 0.9828358292579651,
516
+ "eval_rewards/chosen": -3.733506202697754,
517
+ "eval_rewards/margins": 9.781785011291504,
518
+ "eval_rewards/rejected": -13.51529312133789,
519
+ "eval_runtime": 829.9568,
520
+ "eval_samples_per_second": 6.457,
521
+ "eval_steps_per_second": 1.615,
522
+ "step": 300
523
+ },
524
+ {
525
+ "epoch": 0.41137928174504435,
526
+ "grad_norm": 6.5625,
527
+ "learning_rate": 3.665362230085646e-06,
528
+ "logits/chosen": -1.755239725112915,
529
+ "logits/rejected": -1.6191673278808594,
530
+ "logps/chosen": -831.6585693359375,
531
+ "logps/rejected": -1947.4847412109375,
532
+ "loss": 0.0316,
533
+ "rewards/accuracies": 0.9937499761581421,
534
+ "rewards/chosen": -4.389147758483887,
535
+ "rewards/margins": 11.429269790649414,
536
+ "rewards/rejected": -15.8184175491333,
537
+ "step": 310
538
+ },
539
+ {
540
+ "epoch": 0.42464958115617485,
541
+ "grad_norm": 2.9375,
542
+ "learning_rate": 3.5615081604340905e-06,
543
+ "logits/chosen": -1.812110185623169,
544
+ "logits/rejected": -1.6317228078842163,
545
+ "logps/chosen": -835.9284057617188,
546
+ "logps/rejected": -2101.62646484375,
547
+ "loss": 0.04,
548
+ "rewards/accuracies": 0.987500011920929,
549
+ "rewards/chosen": -4.403895378112793,
550
+ "rewards/margins": 12.521451950073242,
551
+ "rewards/rejected": -16.92534828186035,
552
+ "step": 320
553
+ },
554
+ {
555
+ "epoch": 0.4379198805673053,
556
+ "grad_norm": 1.3125,
557
+ "learning_rate": 3.4553686613815436e-06,
558
+ "logits/chosen": -1.9260807037353516,
559
+ "logits/rejected": -1.830836534500122,
560
+ "logps/chosen": -688.6129760742188,
561
+ "logps/rejected": -1593.781982421875,
562
+ "loss": 0.0215,
563
+ "rewards/accuracies": 0.987500011920929,
564
+ "rewards/chosen": -3.1708381175994873,
565
+ "rewards/margins": 8.699010848999023,
566
+ "rewards/rejected": -11.869850158691406,
567
+ "step": 330
568
+ },
569
+ {
570
+ "epoch": 0.45119017997843575,
571
+ "grad_norm": 1.8515625,
572
+ "learning_rate": 3.3471722515025986e-06,
573
+ "logits/chosen": -1.8425785303115845,
574
+ "logits/rejected": -1.7366511821746826,
575
+ "logps/chosen": -812.2515258789062,
576
+ "logps/rejected": -1760.351806640625,
577
+ "loss": 0.0311,
578
+ "rewards/accuracies": 0.981249988079071,
579
+ "rewards/chosen": -4.064314842224121,
580
+ "rewards/margins": 10.192426681518555,
581
+ "rewards/rejected": -14.256741523742676,
582
+ "step": 340
583
+ },
584
+ {
585
+ "epoch": 0.46446047938956625,
586
+ "grad_norm": 1.25,
587
+ "learning_rate": 3.2371518779053744e-06,
588
+ "logits/chosen": -1.9980113506317139,
589
+ "logits/rejected": -1.9302339553833008,
590
+ "logps/chosen": -760.2137451171875,
591
+ "logps/rejected": -1788.198486328125,
592
+ "loss": 0.0314,
593
+ "rewards/accuracies": 1.0,
594
+ "rewards/chosen": -3.4233691692352295,
595
+ "rewards/margins": 10.155590057373047,
596
+ "rewards/rejected": -13.578959465026855,
597
+ "step": 350
598
+ },
599
+ {
600
+ "epoch": 0.4777307788006967,
601
+ "grad_norm": 2.125,
602
+ "learning_rate": 3.1255444146958845e-06,
603
+ "logits/chosen": -1.8775854110717773,
604
+ "logits/rejected": -1.7533352375030518,
605
+ "logps/chosen": -726.8655395507812,
606
+ "logps/rejected": -1770.5924072265625,
607
+ "loss": 0.0329,
608
+ "rewards/accuracies": 0.981249988079071,
609
+ "rewards/chosen": -3.306746244430542,
610
+ "rewards/margins": 10.716817855834961,
611
+ "rewards/rejected": -14.023564338684082,
612
+ "step": 360
613
+ },
614
+ {
615
+ "epoch": 0.49100107821182715,
616
+ "grad_norm": 2.578125,
617
+ "learning_rate": 3.0125901529875612e-06,
618
+ "logits/chosen": -1.854692816734314,
619
+ "logits/rejected": -1.7414394617080688,
620
+ "logps/chosen": -820.2403564453125,
621
+ "logps/rejected": -1938.7720947265625,
622
+ "loss": 0.0401,
623
+ "rewards/accuracies": 0.987500011920929,
624
+ "rewards/chosen": -3.7310116291046143,
625
+ "rewards/margins": 12.33505630493164,
626
+ "rewards/rejected": -16.06606674194336,
627
+ "step": 370
628
+ },
629
+ {
630
+ "epoch": 0.5042713776229576,
631
+ "grad_norm": 1.4765625,
632
+ "learning_rate": 2.898532283553963e-06,
633
+ "logits/chosen": -1.8446468114852905,
634
+ "logits/rejected": -1.6768661737442017,
635
+ "logps/chosen": -750.1494750976562,
636
+ "logps/rejected": -1983.754638671875,
637
+ "loss": 0.0151,
638
+ "rewards/accuracies": 0.9937499761581421,
639
+ "rewards/chosen": -3.6851119995117188,
640
+ "rewards/margins": 12.596293449401855,
641
+ "rewards/rejected": -16.281402587890625,
642
+ "step": 380
643
+ },
644
+ {
645
+ "epoch": 0.517541677034088,
646
+ "grad_norm": 3.78125,
647
+ "learning_rate": 2.783616373238507e-06,
648
+ "logits/chosen": -1.808468222618103,
649
+ "logits/rejected": -1.6193923950195312,
650
+ "logps/chosen": -817.4727783203125,
651
+ "logps/rejected": -1965.7611083984375,
652
+ "loss": 0.0232,
653
+ "rewards/accuracies": 1.0,
654
+ "rewards/chosen": -4.224337577819824,
655
+ "rewards/margins": 11.751169204711914,
656
+ "rewards/rejected": -15.975506782531738,
657
+ "step": 390
658
+ },
659
+ {
660
+ "epoch": 0.5308119764452185,
661
+ "grad_norm": 1.3359375,
662
+ "learning_rate": 2.6680898362485126e-06,
663
+ "logits/chosen": -1.9052226543426514,
664
+ "logits/rejected": -1.790858507156372,
665
+ "logps/chosen": -760.7432861328125,
666
+ "logps/rejected": -1788.7734375,
667
+ "loss": 0.0223,
668
+ "rewards/accuracies": 0.987500011920929,
669
+ "rewards/chosen": -3.564972400665283,
670
+ "rewards/margins": 10.65664291381836,
671
+ "rewards/rejected": -14.2216157913208,
672
+ "step": 400
673
+ },
674
+ {
675
+ "epoch": 0.5308119764452185,
676
+ "eval_logits/chosen": -1.918375015258789,
677
+ "eval_logits/rejected": -1.8019860982894897,
678
+ "eval_logps/chosen": -764.7831420898438,
679
+ "eval_logps/rejected": -1732.62890625,
680
+ "eval_loss": 0.030813412740826607,
681
+ "eval_rewards/accuracies": 0.9858208894729614,
682
+ "eval_rewards/chosen": -3.655445098876953,
683
+ "eval_rewards/margins": 10.11577320098877,
684
+ "eval_rewards/rejected": -13.771217346191406,
685
+ "eval_runtime": 830.1888,
686
+ "eval_samples_per_second": 6.455,
687
+ "eval_steps_per_second": 1.614,
688
+ "step": 400
689
+ },
690
+ {
691
+ "epoch": 0.544082275856349,
692
+ "grad_norm": 0.76953125,
693
+ "learning_rate": 2.55220140147187e-06,
694
+ "logits/chosen": -1.919586420059204,
695
+ "logits/rejected": -1.729554533958435,
696
+ "logps/chosen": -761.3343505859375,
697
+ "logps/rejected": -1788.455322265625,
698
+ "loss": 0.0345,
699
+ "rewards/accuracies": 0.9937499761581421,
700
+ "rewards/chosen": -3.7783074378967285,
701
+ "rewards/margins": 10.44767951965332,
702
+ "rewards/rejected": -14.225985527038574,
703
+ "step": 410
704
+ },
705
+ {
706
+ "epoch": 0.5573525752674795,
707
+ "grad_norm": 0.7890625,
708
+ "learning_rate": 2.4362005769631985e-06,
709
+ "logits/chosen": -1.8321539163589478,
710
+ "logits/rejected": -1.664629340171814,
711
+ "logps/chosen": -864.9075317382812,
712
+ "logps/rejected": -2111.5986328125,
713
+ "loss": 0.0269,
714
+ "rewards/accuracies": 0.9937499761581421,
715
+ "rewards/chosen": -4.227338790893555,
716
+ "rewards/margins": 13.148414611816406,
717
+ "rewards/rejected": -17.375751495361328,
718
+ "step": 420
719
+ },
720
+ {
721
+ "epoch": 0.57062287467861,
722
+ "grad_norm": 0.6796875,
723
+ "learning_rate": 2.320337112752459e-06,
724
+ "logits/chosen": -1.767311692237854,
725
+ "logits/rejected": -1.6259944438934326,
726
+ "logps/chosen": -846.6857299804688,
727
+ "logps/rejected": -1985.771240234375,
728
+ "loss": 0.0251,
729
+ "rewards/accuracies": 1.0,
730
+ "rewards/chosen": -4.594606399536133,
731
+ "rewards/margins": 11.99472427368164,
732
+ "rewards/rejected": -16.58932876586914,
733
+ "step": 430
734
+ },
735
+ {
736
+ "epoch": 0.5838931740897404,
737
+ "grad_norm": 2.21875,
738
+ "learning_rate": 2.2048604631325896e-06,
739
+ "logits/chosen": -1.9071296453475952,
740
+ "logits/rejected": -1.7615699768066406,
741
+ "logps/chosen": -740.1785888671875,
742
+ "logps/rejected": -1816.4349365234375,
743
+ "loss": 0.0218,
744
+ "rewards/accuracies": 1.0,
745
+ "rewards/chosen": -3.5860729217529297,
746
+ "rewards/margins": 11.016485214233398,
747
+ "rewards/rejected": -14.602559089660645,
748
+ "step": 440
749
+ },
750
+ {
751
+ "epoch": 0.5971634735008708,
752
+ "grad_norm": 1.3359375,
753
+ "learning_rate": 2.0900192495838617e-06,
754
+ "logits/chosen": -1.8986167907714844,
755
+ "logits/rejected": -1.749355673789978,
756
+ "logps/chosen": -749.77294921875,
757
+ "logps/rejected": -1782.9183349609375,
758
+ "loss": 0.0303,
759
+ "rewards/accuracies": 0.9937499761581421,
760
+ "rewards/chosen": -3.5921592712402344,
761
+ "rewards/margins": 10.537958145141602,
762
+ "rewards/rejected": -14.130119323730469,
763
+ "step": 450
764
+ },
765
+ {
766
+ "epoch": 0.6104337729120013,
767
+ "grad_norm": 5.6875,
768
+ "learning_rate": 1.976060725491293e-06,
769
+ "logits/chosen": -1.8696537017822266,
770
+ "logits/rejected": -1.6489883661270142,
771
+ "logps/chosen": -817.318359375,
772
+ "logps/rejected": -2079.50732421875,
773
+ "loss": 0.0169,
774
+ "rewards/accuracies": 1.0,
775
+ "rewards/chosen": -4.171055316925049,
776
+ "rewards/margins": 12.784004211425781,
777
+ "rewards/rejected": -16.955059051513672,
778
+ "step": 460
779
+ },
780
+ {
781
+ "epoch": 0.6237040723231317,
782
+ "grad_norm": 0.466796875,
783
+ "learning_rate": 1.8632302438075618e-06,
784
+ "logits/chosen": -1.7880550622940063,
785
+ "logits/rejected": -1.5018467903137207,
786
+ "logps/chosen": -904.4384765625,
787
+ "logps/rejected": -2307.56103515625,
788
+ "loss": 0.0164,
789
+ "rewards/accuracies": 0.9937499761581421,
790
+ "rewards/chosen": -4.95499324798584,
791
+ "rewards/margins": 14.273486137390137,
792
+ "rewards/rejected": -19.22848129272461,
793
+ "step": 470
794
+ },
795
+ {
796
+ "epoch": 0.6369743717342623,
797
+ "grad_norm": 0.8046875,
798
+ "learning_rate": 1.7517707288075617e-06,
799
+ "logits/chosen": -1.7752134799957275,
800
+ "logits/rejected": -1.5753581523895264,
801
+ "logps/chosen": -829.3978271484375,
802
+ "logps/rejected": -2190.319091796875,
803
+ "loss": 0.0281,
804
+ "rewards/accuracies": 0.987500011920929,
805
+ "rewards/chosen": -4.3248748779296875,
806
+ "rewards/margins": 13.842320442199707,
807
+ "rewards/rejected": -18.16719627380371,
808
+ "step": 480
809
+ },
810
+ {
811
+ "epoch": 0.6502446711453927,
812
+ "grad_norm": 5.03125,
813
+ "learning_rate": 1.6419221530719062e-06,
814
+ "logits/chosen": -1.8694307804107666,
815
+ "logits/rejected": -1.6351697444915771,
816
+ "logps/chosen": -807.4894409179688,
817
+ "logps/rejected": -2210.259765625,
818
+ "loss": 0.0224,
819
+ "rewards/accuracies": 0.987500011920929,
820
+ "rewards/chosen": -4.219082832336426,
821
+ "rewards/margins": 13.85973834991455,
822
+ "rewards/rejected": -18.078821182250977,
823
+ "step": 490
824
+ },
825
+ {
826
+ "epoch": 0.6635149705565232,
827
+ "grad_norm": 5.25,
828
+ "learning_rate": 1.5339210208254345e-06,
829
+ "logits/chosen": -1.796460509300232,
830
+ "logits/rejected": -1.5865790843963623,
831
+ "logps/chosen": -825.5130615234375,
832
+ "logps/rejected": -2155.32177734375,
833
+ "loss": 0.0378,
834
+ "rewards/accuracies": 0.987500011920929,
835
+ "rewards/chosen": -4.26907205581665,
836
+ "rewards/margins": 13.452142715454102,
837
+ "rewards/rejected": -17.721214294433594,
838
+ "step": 500
839
+ },
840
+ {
841
+ "epoch": 0.6635149705565232,
842
+ "eval_logits/chosen": -1.8649603128433228,
843
+ "eval_logits/rejected": -1.6923589706420898,
844
+ "eval_logps/chosen": -799.422119140625,
845
+ "eval_logps/rejected": -1988.354248046875,
846
+ "eval_loss": 0.029703186824917793,
847
+ "eval_rewards/accuracies": 0.9850746393203735,
848
+ "eval_rewards/chosen": -4.001834392547607,
849
+ "eval_rewards/margins": 12.326638221740723,
850
+ "eval_rewards/rejected": -16.328474044799805,
851
+ "eval_runtime": 830.0926,
852
+ "eval_samples_per_second": 6.456,
853
+ "eval_steps_per_second": 1.614,
854
+ "step": 500
855
+ },
856
+ {
857
+ "epoch": 0.6767852699676536,
858
+ "grad_norm": 2.53125,
859
+ "learning_rate": 1.4279998587430944e-06,
860
+ "logits/chosen": -1.9209420680999756,
861
+ "logits/rejected": -1.7078396081924438,
862
+ "logps/chosen": -800.6217041015625,
863
+ "logps/rejected": -2088.8896484375,
864
+ "loss": 0.0344,
865
+ "rewards/accuracies": 1.0,
866
+ "rewards/chosen": -3.650468349456787,
867
+ "rewards/margins": 13.276763916015625,
868
+ "rewards/rejected": -16.927234649658203,
869
+ "step": 510
870
+ },
871
+ {
872
+ "epoch": 0.6900555693787841,
873
+ "grad_norm": 0.44921875,
874
+ "learning_rate": 1.3243867153195033e-06,
875
+ "logits/chosen": -1.9058411121368408,
876
+ "logits/rejected": -1.7081434726715088,
877
+ "logps/chosen": -756.5093994140625,
878
+ "logps/rejected": -1919.1995849609375,
879
+ "loss": 0.0323,
880
+ "rewards/accuracies": 1.0,
881
+ "rewards/chosen": -3.5202860832214355,
882
+ "rewards/margins": 11.93966007232666,
883
+ "rewards/rejected": -15.459945678710938,
884
+ "step": 520
885
+ },
886
+ {
887
+ "epoch": 0.7033258687899145,
888
+ "grad_norm": 10.4375,
889
+ "learning_rate": 1.2233046698800343e-06,
890
+ "logits/chosen": -1.9104827642440796,
891
+ "logits/rejected": -1.6314716339111328,
892
+ "logps/chosen": -794.9130859375,
893
+ "logps/rejected": -2090.015380859375,
894
+ "loss": 0.0287,
895
+ "rewards/accuracies": 0.9937499761581421,
896
+ "rewards/chosen": -3.8667774200439453,
897
+ "rewards/margins": 13.207371711730957,
898
+ "rewards/rejected": -17.074146270751953,
899
+ "step": 530
900
+ },
901
+ {
902
+ "epoch": 0.716596168201045,
903
+ "grad_norm": 1.3125,
904
+ "learning_rate": 1.124971352290545e-06,
905
+ "logits/chosen": -1.8662798404693604,
906
+ "logits/rejected": -1.6738961935043335,
907
+ "logps/chosen": -800.027587890625,
908
+ "logps/rejected": -2160.337158203125,
909
+ "loss": 0.0261,
910
+ "rewards/accuracies": 0.981249988079071,
911
+ "rewards/chosen": -4.005196571350098,
912
+ "rewards/margins": 13.762578964233398,
913
+ "rewards/rejected": -17.76777458190918,
914
+ "step": 540
915
+ },
916
+ {
917
+ "epoch": 0.7298664676121756,
918
+ "grad_norm": 1.3046875,
919
+ "learning_rate": 1.0295984743997911e-06,
920
+ "logits/chosen": -1.8652830123901367,
921
+ "logits/rejected": -1.6665403842926025,
922
+ "logps/chosen": -794.8709106445312,
923
+ "logps/rejected": -1995.255859375,
924
+ "loss": 0.03,
925
+ "rewards/accuracies": 0.9937499761581421,
926
+ "rewards/chosen": -3.9191575050354004,
927
+ "rewards/margins": 12.423823356628418,
928
+ "rewards/rejected": -16.34298324584961,
929
+ "step": 550
930
+ },
931
+ {
932
+ "epoch": 0.743136767023306,
933
+ "grad_norm": 0.470703125,
934
+ "learning_rate": 9.37391374223355e-07,
935
+ "logits/chosen": -1.8799976110458374,
936
+ "logits/rejected": -1.6955276727676392,
937
+ "logps/chosen": -788.1275634765625,
938
+ "logps/rejected": -1998.5726318359375,
939
+ "loss": 0.0196,
940
+ "rewards/accuracies": 1.0,
941
+ "rewards/chosen": -3.821469783782959,
942
+ "rewards/margins": 12.33929443359375,
943
+ "rewards/rejected": -16.160762786865234,
944
+ "step": 560
945
+ },
946
+ {
947
+ "epoch": 0.7564070664344364,
948
+ "grad_norm": 1.6171875,
949
+ "learning_rate": 8.48548573850449e-07,
950
+ "logits/chosen": -1.8499343395233154,
951
+ "logits/rejected": -1.6717488765716553,
952
+ "logps/chosen": -778.8970947265625,
953
+ "logps/rejected": -1945.9202880859375,
954
+ "loss": 0.0362,
955
+ "rewards/accuracies": 0.9937499761581421,
956
+ "rewards/chosen": -3.6539416313171387,
957
+ "rewards/margins": 12.185694694519043,
958
+ "rewards/rejected": -15.839635848999023,
959
+ "step": 570
960
+ },
961
+ {
962
+ "epoch": 0.7696773658455669,
963
+ "grad_norm": 2.46875,
964
+ "learning_rate": 7.632613520254159e-07,
965
+ "logits/chosen": -1.9424989223480225,
966
+ "logits/rejected": -1.8162052631378174,
967
+ "logps/chosen": -791.0227661132812,
968
+ "logps/rejected": -1805.901123046875,
969
+ "loss": 0.035,
970
+ "rewards/accuracies": 0.987500011920929,
971
+ "rewards/chosen": -3.653287410736084,
972
+ "rewards/margins": 11.017667770385742,
973
+ "rewards/rejected": -14.670953750610352,
974
+ "step": 580
975
+ },
976
+ {
977
+ "epoch": 0.7829476652566973,
978
+ "grad_norm": 0.94921875,
979
+ "learning_rate": 6.817133323241757e-07,
980
+ "logits/chosen": -1.9590984582901,
981
+ "logits/rejected": -1.754612922668457,
982
+ "logps/chosen": -768.0120239257812,
983
+ "logps/rejected": -2016.432373046875,
984
+ "loss": 0.0217,
985
+ "rewards/accuracies": 0.9937499761581421,
986
+ "rewards/chosen": -3.4281082153320312,
987
+ "rewards/margins": 12.803730964660645,
988
+ "rewards/rejected": -16.23183822631836,
989
+ "step": 590
990
+ },
991
+ {
992
+ "epoch": 0.7962179646678278,
993
+ "grad_norm": 2.203125,
994
+ "learning_rate": 6.040800878122655e-07,
995
+ "logits/chosen": -1.9362680912017822,
996
+ "logits/rejected": -1.8174184560775757,
997
+ "logps/chosen": -807.0682373046875,
998
+ "logps/rejected": -1890.33984375,
999
+ "loss": 0.0352,
1000
+ "rewards/accuracies": 1.0,
1001
+ "rewards/chosen": -3.6727454662323,
1002
+ "rewards/margins": 11.275753021240234,
1003
+ "rewards/rejected": -14.948498725891113,
1004
+ "step": 600
1005
+ },
1006
+ {
1007
+ "epoch": 0.7962179646678278,
1008
+ "eval_logits/chosen": -1.8977892398834229,
1009
+ "eval_logits/rejected": -1.7437201738357544,
1010
+ "eval_logps/chosen": -780.2752075195312,
1011
+ "eval_logps/rejected": -1919.8118896484375,
1012
+ "eval_loss": 0.02784702554345131,
1013
+ "eval_rewards/accuracies": 0.983582079410553,
1014
+ "eval_rewards/chosen": -3.810366153717041,
1015
+ "eval_rewards/margins": 11.832680702209473,
1016
+ "eval_rewards/rejected": -15.643047332763672,
1017
+ "eval_runtime": 829.9624,
1018
+ "eval_samples_per_second": 6.457,
1019
+ "eval_steps_per_second": 1.615,
1020
+ "step": 600
1021
+ },
1022
+ {
1023
+ "epoch": 0.8094882640789582,
1024
+ "grad_norm": 1.171875,
1025
+ "learning_rate": 5.305287630356363e-07,
1026
+ "logits/chosen": -1.8633983135223389,
1027
+ "logits/rejected": -1.6604722738265991,
1028
+ "logps/chosen": -751.8145141601562,
1029
+ "logps/rejected": -1894.828857421875,
1030
+ "loss": 0.0299,
1031
+ "rewards/accuracies": 0.987500011920929,
1032
+ "rewards/chosen": -3.7778866291046143,
1033
+ "rewards/margins": 11.620625495910645,
1034
+ "rewards/rejected": -15.39851188659668,
1035
+ "step": 610
1036
+ },
1037
+ {
1038
+ "epoch": 0.8227585634900887,
1039
+ "grad_norm": 0.37109375,
1040
+ "learning_rate": 4.612177141580876e-07,
1041
+ "logits/chosen": -1.8507238626480103,
1042
+ "logits/rejected": -1.6502704620361328,
1043
+ "logps/chosen": -739.475341796875,
1044
+ "logps/rejected": -1800.6304931640625,
1045
+ "loss": 0.028,
1046
+ "rewards/accuracies": 0.9937499761581421,
1047
+ "rewards/chosen": -3.580106735229492,
1048
+ "rewards/margins": 10.93178939819336,
1049
+ "rewards/rejected": -14.511896133422852,
1050
+ "step": 620
1051
+ },
1052
+ {
1053
+ "epoch": 0.8360288629012192,
1054
+ "grad_norm": 2.375,
1055
+ "learning_rate": 3.962961680200927e-07,
1056
+ "logits/chosen": -1.9084770679473877,
1057
+ "logits/rejected": -1.7507511377334595,
1058
+ "logps/chosen": -785.728515625,
1059
+ "logps/rejected": -1896.435791015625,
1060
+ "loss": 0.0222,
1061
+ "rewards/accuracies": 0.9937499761581421,
1062
+ "rewards/chosen": -3.7701239585876465,
1063
+ "rewards/margins": 11.418277740478516,
1064
+ "rewards/rejected": -15.18840217590332,
1065
+ "step": 630
1066
+ },
1067
+ {
1068
+ "epoch": 0.8492991623123497,
1069
+ "grad_norm": 1.96875,
1070
+ "learning_rate": 3.3590390085308457e-07,
1071
+ "logits/chosen": -1.9369093179702759,
1072
+ "logits/rejected": -1.782634973526001,
1073
+ "logps/chosen": -792.2256469726562,
1074
+ "logps/rejected": -1962.2239990234375,
1075
+ "loss": 0.0246,
1076
+ "rewards/accuracies": 0.981249988079071,
1077
+ "rewards/chosen": -3.8103549480438232,
1078
+ "rewards/margins": 11.93663501739502,
1079
+ "rewards/rejected": -15.746989250183105,
1080
+ "step": 640
1081
+ },
1082
+ {
1083
+ "epoch": 0.8625694617234801,
1084
+ "grad_norm": 0.373046875,
1085
+ "learning_rate": 2.801709373409248e-07,
1086
+ "logits/chosen": -1.8602193593978882,
1087
+ "logits/rejected": -1.7032759189605713,
1088
+ "logps/chosen": -804.9444580078125,
1089
+ "logps/rejected": -2019.756103515625,
1090
+ "loss": 0.0177,
1091
+ "rewards/accuracies": 0.9937499761581421,
1092
+ "rewards/chosen": -3.675445556640625,
1093
+ "rewards/margins": 12.645570755004883,
1094
+ "rewards/rejected": -16.321016311645508,
1095
+ "step": 650
1096
+ },
1097
+ {
1098
+ "epoch": 0.8758397611346106,
1099
+ "grad_norm": 1.90625,
1100
+ "learning_rate": 2.2921727067647032e-07,
1101
+ "logits/chosen": -1.9662708044052124,
1102
+ "logits/rejected": -1.7979133129119873,
1103
+ "logps/chosen": -745.5474243164062,
1104
+ "logps/rejected": -1877.61328125,
1105
+ "loss": 0.0226,
1106
+ "rewards/accuracies": 0.987500011920929,
1107
+ "rewards/chosen": -3.5280399322509766,
1108
+ "rewards/margins": 11.784358024597168,
1109
+ "rewards/rejected": -15.312397956848145,
1110
+ "step": 660
1111
+ },
1112
+ {
1113
+ "epoch": 0.889110060545741,
1114
+ "grad_norm": 15.375,
1115
+ "learning_rate": 1.8315260421596925e-07,
1116
+ "logits/chosen": -1.9315414428710938,
1117
+ "logits/rejected": -1.7226520776748657,
1118
+ "logps/chosen": -760.8677978515625,
1119
+ "logps/rejected": -1978.8160400390625,
1120
+ "loss": 0.0338,
1121
+ "rewards/accuracies": 0.987500011920929,
1122
+ "rewards/chosen": -3.70927357673645,
1123
+ "rewards/margins": 12.64477252960205,
1124
+ "rewards/rejected": -16.35404396057129,
1125
+ "step": 670
1126
+ },
1127
+ {
1128
+ "epoch": 0.9023803599568715,
1129
+ "grad_norm": 0.412109375,
1130
+ "learning_rate": 1.4207611528749e-07,
1131
+ "logits/chosen": -1.9155975580215454,
1132
+ "logits/rejected": -1.6965796947479248,
1133
+ "logps/chosen": -769.4285278320312,
1134
+ "logps/rejected": -1877.641845703125,
1135
+ "loss": 0.0319,
1136
+ "rewards/accuracies": 0.987500011920929,
1137
+ "rewards/chosen": -3.767559766769409,
1138
+ "rewards/margins": 11.334938049316406,
1139
+ "rewards/rejected": -15.102499008178711,
1140
+ "step": 680
1141
+ },
1142
+ {
1143
+ "epoch": 0.9156506593680019,
1144
+ "grad_norm": 0.56640625,
1145
+ "learning_rate": 1.060762416619196e-07,
1146
+ "logits/chosen": -1.946947693824768,
1147
+ "logits/rejected": -1.7013956308364868,
1148
+ "logps/chosen": -765.5987548828125,
1149
+ "logps/rejected": -2080.13134765625,
1150
+ "loss": 0.0215,
1151
+ "rewards/accuracies": 0.981249988079071,
1152
+ "rewards/chosen": -3.7093491554260254,
1153
+ "rewards/margins": 12.91864013671875,
1154
+ "rewards/rejected": -16.627988815307617,
1155
+ "step": 690
1156
+ },
1157
+ {
1158
+ "epoch": 0.9289209587791325,
1159
+ "grad_norm": 0.515625,
1160
+ "learning_rate": 7.523049114624647e-08,
1161
+ "logits/chosen": -1.9617208242416382,
1162
+ "logits/rejected": -1.8319323062896729,
1163
+ "logps/chosen": -802.1334838867188,
1164
+ "logps/rejected": -1892.474609375,
1165
+ "loss": 0.0238,
1166
+ "rewards/accuracies": 0.987500011920929,
1167
+ "rewards/chosen": -3.8938992023468018,
1168
+ "rewards/margins": 11.566261291503906,
1169
+ "rewards/rejected": -15.460162162780762,
1170
+ "step": 700
1171
+ },
1172
+ {
1173
+ "epoch": 0.9289209587791325,
1174
+ "eval_logits/chosen": -1.8936866521835327,
1175
+ "eval_logits/rejected": -1.737060785293579,
1176
+ "eval_logps/chosen": -788.9779663085938,
1177
+ "eval_logps/rejected": -1951.9310302734375,
1178
+ "eval_loss": 0.027877720072865486,
1179
+ "eval_rewards/accuracies": 0.9828358292579651,
1180
+ "eval_rewards/chosen": -3.897392511367798,
1181
+ "eval_rewards/margins": 12.06684684753418,
1182
+ "eval_rewards/rejected": -15.964240074157715,
1183
+ "eval_runtime": 830.3344,
1184
+ "eval_samples_per_second": 6.454,
1185
+ "eval_steps_per_second": 1.614,
1186
+ "step": 700
1187
+ },
1188
+ {
1189
+ "epoch": 0.942191258190263,
1190
+ "grad_norm": 0.92578125,
1191
+ "learning_rate": 4.9605274709082774e-08,
1192
+ "logits/chosen": -1.883724570274353,
1193
+ "logits/rejected": -1.7111318111419678,
1194
+ "logps/chosen": -796.9892578125,
1195
+ "logps/rejected": -2012.552001953125,
1196
+ "loss": 0.0267,
1197
+ "rewards/accuracies": 0.987500011920929,
1198
+ "rewards/chosen": -3.85164213180542,
1199
+ "rewards/margins": 12.14155101776123,
1200
+ "rewards/rejected": -15.993194580078125,
1201
+ "step": 710
1202
+ },
1203
+ {
1204
+ "epoch": 0.9554615576013934,
1205
+ "grad_norm": 6.9375,
1206
+ "learning_rate": 2.9255763497703373e-08,
1207
+ "logits/chosen": -1.8911021947860718,
1208
+ "logits/rejected": -1.6832084655761719,
1209
+ "logps/chosen": -790.8480224609375,
1210
+ "logps/rejected": -2087.356689453125,
1211
+ "loss": 0.0314,
1212
+ "rewards/accuracies": 1.0,
1213
+ "rewards/chosen": -3.734605312347412,
1214
+ "rewards/margins": 13.477132797241211,
1215
+ "rewards/rejected": -17.21173858642578,
1216
+ "step": 720
1217
+ },
1218
+ {
1219
+ "epoch": 0.9687318570125238,
1220
+ "grad_norm": 3.46875,
1221
+ "learning_rate": 1.42257700544432e-08,
1222
+ "logits/chosen": -1.9454247951507568,
1223
+ "logits/rejected": -1.7627389430999756,
1224
+ "logps/chosen": -774.6414794921875,
1225
+ "logps/rejected": -1878.0687255859375,
1226
+ "loss": 0.0266,
1227
+ "rewards/accuracies": 0.9937499761581421,
1228
+ "rewards/chosen": -3.7591774463653564,
1229
+ "rewards/margins": 11.635783195495605,
1230
+ "rewards/rejected": -15.3949613571167,
1231
+ "step": 730
1232
+ },
1233
+ {
1234
+ "epoch": 0.9820021564236543,
1235
+ "grad_norm": 0.6640625,
1236
+ "learning_rate": 4.547653988198619e-09,
1237
+ "logits/chosen": -1.8585050106048584,
1238
+ "logits/rejected": -1.6734154224395752,
1239
+ "logps/chosen": -771.71044921875,
1240
+ "logps/rejected": -1891.1732177734375,
1241
+ "loss": 0.0211,
1242
+ "rewards/accuracies": 1.0,
1243
+ "rewards/chosen": -3.8070855140686035,
1244
+ "rewards/margins": 11.580972671508789,
1245
+ "rewards/rejected": -15.38805866241455,
1246
+ "step": 740
1247
+ },
1248
+ {
1249
+ "epoch": 0.9952724558347847,
1250
+ "grad_norm": 0.46875,
1251
+ "learning_rate": 2.422523041178959e-10,
1252
+ "logits/chosen": -1.9344615936279297,
1253
+ "logits/rejected": -1.733974814414978,
1254
+ "logps/chosen": -803.4793090820312,
1255
+ "logps/rejected": -2031.9625244140625,
1256
+ "loss": 0.0295,
1257
+ "rewards/accuracies": 0.981249988079071,
1258
+ "rewards/chosen": -3.7710418701171875,
1259
+ "rewards/margins": 12.417600631713867,
1260
+ "rewards/rejected": -16.188644409179688,
1261
+ "step": 750
1262
+ },
1263
+ {
1264
+ "epoch": 0.999253545658124,
1265
+ "step": 753,
1266
+ "total_flos": 0.0,
1267
+ "train_loss": 0.0882907345950366,
1268
+ "train_runtime": 20220.3954,
1269
+ "train_samples_per_second": 2.385,
1270
+ "train_steps_per_second": 0.037
1271
+ }
1272
+ ],
1273
+ "logging_steps": 10,
1274
+ "max_steps": 753,
1275
+ "num_input_tokens_seen": 0,
1276
+ "num_train_epochs": 1,
1277
+ "save_steps": 100,
1278
+ "stateful_callbacks": {
1279
+ "TrainerControl": {
1280
+ "args": {
1281
+ "should_epoch_stop": false,
1282
+ "should_evaluate": false,
1283
+ "should_log": false,
1284
+ "should_save": true,
1285
+ "should_training_stop": true
1286
+ },
1287
+ "attributes": {}
1288
+ }
1289
+ },
1290
+ "total_flos": 0.0,
1291
+ "train_batch_size": 1,
1292
+ "trial_name": null,
1293
+ "trial_params": null
1294
+ }
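The `log_history` above interleaves training records logged every 10 steps with evaluation records written every 100 steps; the latter are the entries carrying `eval_loss`. A minimal sketch for pulling the evaluation curve out of this file, assuming it is read from the checkpoint directory:

```python
import json

with open("trainer_state.json") as f:
    state = json.load(f)

# Evaluation entries are the ones that carry eval_* metrics.
eval_rows = [entry for entry in state["log_history"] if "eval_loss" in entry]
for row in eval_rows:
    print(
        f'step {row["step"]:>4}  '
        f'eval_loss {row["eval_loss"]:.4f}  '
        f'reward margin {row["eval_rewards/margins"]:.2f}  '
        f'accuracy {row["eval_rewards/accuracies"]:.3f}'
    )
```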