sfulay committed on
Commit 36f8778 · verified · 1 Parent(s): 790c86c

Model save

README.md ADDED
@@ -0,0 +1,81 @@
---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full-ultrabin-high-curriculum
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-full-ultrabin-high-curriculum

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5057
- Rewards/chosen: -1.0106
- Rewards/rejected: -1.9562
- Rewards/accuracies: 0.7617
- Rewards/margins: 0.9455
- Logps/rejected: -458.2784
- Logps/chosen: -363.6942
- Logits/rejected: 2.1024
- Logits/chosen: 1.3120
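
These reward statistics appear to follow TRL's usual DPO logging convention (an assumption, since the card does not state it): each reward is beta times the log-probability gap between the policy and the reference model, margins are chosen minus rejected, and accuracies are the fraction of pairs where the chosen reward is higher. The reported margin is consistent with that reading:

```python
# Sanity check on the reported eval metrics (values copied from the list above).
rewards_chosen, rewards_rejected = -1.0106, -1.9562
print(rewards_chosen - rewards_rejected)  # 0.9456 ~= Rewards/margins of 0.9455 (rounding)
```
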
## Model description

More information needed

## Intended uses & limitations

More information needed

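Since intended usage is left unspecified, here is a minimal inference sketch. It assumes the checkpoint is published under the repo id below (hypothetical) and that it inherits the Zephyr chat template from the SFT base model:

```python
# Minimal inference sketch; the repo id is an assumption, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sfulay/zephyr-7b-dpo-full-ultrabin-high-curriculum"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Summarize direct preference optimization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
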
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

+ ### Training results
62
+
63
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
64
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
65
+ | 0.6557 | 0.1046 | 50 | 0.6479 | -0.0110 | -0.1284 | 0.7070 | 0.1174 | -275.5045 | -263.7303 | -2.6116 | -2.6493 |
66
+ | 0.5639 | 0.2092 | 100 | 0.5652 | -0.6096 | -1.2098 | 0.7383 | 0.6002 | -383.6375 | -323.5883 | -0.5618 | -0.9012 |
67
+ | 0.5323 | 0.3138 | 150 | 0.5405 | -0.5498 | -1.2901 | 0.7617 | 0.7402 | -391.6696 | -317.6140 | 0.4792 | -0.0900 |
68
+ | 0.536 | 0.4184 | 200 | 0.5354 | -0.6382 | -1.3831 | 0.7656 | 0.7449 | -400.9734 | -326.4470 | 0.6525 | -0.0195 |
69
+ | 0.5163 | 0.5230 | 250 | 0.5185 | -1.1124 | -1.9604 | 0.7383 | 0.8480 | -458.7008 | -373.8662 | 2.4883 | 1.7620 |
70
+ | 0.5018 | 0.6276 | 300 | 0.5108 | -0.9326 | -1.8124 | 0.7578 | 0.8798 | -443.9044 | -355.8924 | 2.0905 | 1.3198 |
71
+ | 0.4999 | 0.7322 | 350 | 0.5094 | -1.0356 | -1.9491 | 0.7461 | 0.9135 | -457.5764 | -366.1917 | 2.0403 | 1.2353 |
72
+ | 0.4966 | 0.8368 | 400 | 0.5066 | -0.9929 | -1.9227 | 0.7578 | 0.9298 | -454.9321 | -361.9198 | 2.0226 | 1.2642 |
73
+ | 0.5198 | 0.9414 | 450 | 0.5057 | -1.0106 | -1.9562 | 0.7617 | 0.9455 | -458.2784 | -363.6942 | 2.1024 | 1.3120 |
74
+
75
+
76
+ ### Framework versions
77
+
78
+ - Transformers 4.44.0.dev0
79
+ - Pytorch 2.1.2
80
+ - Datasets 2.20.0
81
+ - Tokenizers 0.19.1
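
The pinned versions can be checked at runtime; note that 4.44.0.dev0 indicates a from-source Transformers build rather than a PyPI release. A quick check, assuming the same package names:

```python
# Compare the installed versions against the list above.
import datasets, tokenizers, torch, transformers

print(transformers.__version__)  # expected 4.44.0.dev0 (development build)
print(torch.__version__)         # expected 2.1.2
print(datasets.__version__)      # expected 2.20.0
print(tokenizers.__version__)    # expected 0.19.1
```
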
all_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 1.0,
    "total_flos": 0.0,
    "train_loss": 0.5417051395113,
    "train_runtime": 12463.9447,
    "train_samples": 61134,
    "train_samples_per_second": 4.905,
    "train_steps_per_second": 0.038
}
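
The throughput fields are mutually consistent: 61,134 training samples at an effective batch size of 128 give 478 optimizer steps (the step count recorded in trainer_state.json below), and dividing by the runtime reproduces the logged rates:

```python
# Consistency check on the training summary above.
train_samples, train_runtime = 61134, 12463.9447
steps = -(-train_samples // 128)      # ceil(61134 / 128) = 478 optimizer steps
print(train_samples / train_runtime)  # ~4.905 -> train_samples_per_second
print(steps / train_runtime)          # ~0.038 -> train_steps_per_second
```
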
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
    "_from_model_config": true,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "transformers_version": "4.44.0.dev0"
}
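
These are the standard Mistral/Llama token ids (BOS = 1, EOS = 2); generate() picks them up automatically, and they can be inspected directly. The repo id below is the same hypothetical id used earlier:

```python
# Inspect the generation defaults shipped with the checkpoint.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("sfulay/zephyr-7b-dpo-full-ultrabin-high-curriculum")
print(gen_config.bos_token_id, gen_config.eos_token_id)  # 1 2
```
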
train_results.json ADDED
@@ -0,0 +1,9 @@
{
    "epoch": 1.0,
    "total_flos": 0.0,
    "train_loss": 0.5417051395113,
    "train_runtime": 12463.9447,
    "train_samples": 61134,
    "train_samples_per_second": 4.905,
    "train_steps_per_second": 0.038
}
trainer_state.json ADDED
@@ -0,0 +1,891 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 50,
6
+ "global_step": 478,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.02092050209205021,
13
+ "grad_norm": 9.137621785616465,
14
+ "learning_rate": 1.0416666666666667e-07,
15
+ "logits/chosen": -2.6498842239379883,
16
+ "logits/rejected": -2.6401047706604004,
17
+ "logps/chosen": -294.3343505859375,
18
+ "logps/rejected": -289.1372985839844,
19
+ "loss": 0.6931,
20
+ "rewards/accuracies": 0.41874998807907104,
21
+ "rewards/chosen": -0.0002774398017209023,
22
+ "rewards/margins": 0.00026476362836547196,
23
+ "rewards/rejected": -0.0005422033718787134,
24
+ "step": 10
25
+ },
26
+ {
27
+ "epoch": 0.04184100418410042,
28
+ "grad_norm": 8.382944921921224,
29
+ "learning_rate": 2.0833333333333333e-07,
30
+ "logits/chosen": -2.6785335540771484,
31
+ "logits/rejected": -2.6339826583862305,
32
+ "logps/chosen": -294.88311767578125,
33
+ "logps/rejected": -276.8221130371094,
34
+ "loss": 0.6926,
35
+ "rewards/accuracies": 0.5562499761581421,
36
+ "rewards/chosen": 0.000260740373050794,
37
+ "rewards/margins": 0.001288960687816143,
38
+ "rewards/rejected": -0.0010282204020768404,
39
+ "step": 20
40
+ },
41
+ {
42
+ "epoch": 0.06276150627615062,
43
+ "grad_norm": 8.348013142376631,
44
+ "learning_rate": 3.1249999999999997e-07,
45
+ "logits/chosen": -2.642605781555176,
46
+ "logits/rejected": -2.6058638095855713,
47
+ "logps/chosen": -282.0052185058594,
48
+ "logps/rejected": -247.48208618164062,
49
+ "loss": 0.6882,
50
+ "rewards/accuracies": 0.6937500238418579,
51
+ "rewards/chosen": 0.006200692616403103,
52
+ "rewards/margins": 0.009530355222523212,
53
+ "rewards/rejected": -0.0033296621404588223,
54
+ "step": 30
55
+ },
56
+ {
57
+ "epoch": 0.08368200836820083,
58
+ "grad_norm": 8.873941277767704,
59
+ "learning_rate": 4.1666666666666667e-07,
60
+ "logits/chosen": -2.6004221439361572,
61
+ "logits/rejected": -2.5648019313812256,
62
+ "logps/chosen": -263.68304443359375,
63
+ "logps/rejected": -269.95159912109375,
64
+ "loss": 0.6776,
65
+ "rewards/accuracies": 0.65625,
66
+ "rewards/chosen": 0.0284257885068655,
67
+ "rewards/margins": 0.026932990178465843,
68
+ "rewards/rejected": 0.001492797746323049,
69
+ "step": 40
70
+ },
71
+ {
72
+ "epoch": 0.10460251046025104,
73
+ "grad_norm": 9.18858486699177,
74
+ "learning_rate": 4.999733114418725e-07,
75
+ "logits/chosen": -2.5885472297668457,
76
+ "logits/rejected": -2.5707509517669678,
77
+ "logps/chosen": -269.8135070800781,
78
+ "logps/rejected": -265.47064208984375,
79
+ "loss": 0.6557,
80
+ "rewards/accuracies": 0.706250011920929,
81
+ "rewards/chosen": 0.018491629511117935,
82
+ "rewards/margins": 0.0841543972492218,
83
+ "rewards/rejected": -0.06566276401281357,
84
+ "step": 50
85
+ },
86
+ {
87
+ "epoch": 0.10460251046025104,
88
+ "eval_logits/chosen": -2.649272918701172,
89
+ "eval_logits/rejected": -2.611584424972534,
90
+ "eval_logps/chosen": -263.7303466796875,
91
+ "eval_logps/rejected": -275.50445556640625,
92
+ "eval_loss": 0.6479206681251526,
93
+ "eval_rewards/accuracies": 0.70703125,
94
+ "eval_rewards/chosen": -0.011004311963915825,
95
+ "eval_rewards/margins": 0.11741933971643448,
96
+ "eval_rewards/rejected": -0.12842364609241486,
97
+ "eval_runtime": 102.963,
98
+ "eval_samples_per_second": 19.424,
99
+ "eval_steps_per_second": 0.311,
100
+ "step": 50
101
+ },
102
+ {
103
+ "epoch": 0.12552301255230125,
104
+ "grad_norm": 12.415021202754499,
105
+ "learning_rate": 4.990398100856366e-07,
106
+ "logits/chosen": -2.608935832977295,
107
+ "logits/rejected": -2.5622076988220215,
108
+ "logps/chosen": -316.8729553222656,
109
+ "logps/rejected": -300.7294616699219,
110
+ "loss": 0.6376,
111
+ "rewards/accuracies": 0.65625,
112
+ "rewards/chosen": -0.07160066068172455,
113
+ "rewards/margins": 0.12479270994663239,
114
+ "rewards/rejected": -0.19639338552951813,
115
+ "step": 60
116
+ },
117
+ {
118
+ "epoch": 0.14644351464435146,
119
+ "grad_norm": 16.95023916970366,
120
+ "learning_rate": 4.967775735898179e-07,
121
+ "logits/chosen": -2.5255544185638428,
122
+ "logits/rejected": -2.4770090579986572,
123
+ "logps/chosen": -310.1564636230469,
124
+ "logps/rejected": -286.7000427246094,
125
+ "loss": 0.6174,
126
+ "rewards/accuracies": 0.6625000238418579,
127
+ "rewards/chosen": -0.1686697006225586,
128
+ "rewards/margins": 0.18098047375679016,
129
+ "rewards/rejected": -0.34965020418167114,
130
+ "step": 70
131
+ },
132
+ {
133
+ "epoch": 0.16736401673640167,
134
+ "grad_norm": 14.651689743135485,
135
+ "learning_rate": 4.931986719649298e-07,
136
+ "logits/chosen": -2.3215534687042236,
137
+ "logits/rejected": -2.257579803466797,
138
+ "logps/chosen": -285.29833984375,
139
+ "logps/rejected": -311.03729248046875,
140
+ "loss": 0.6044,
141
+ "rewards/accuracies": 0.675000011920929,
142
+ "rewards/chosen": -0.07682778686285019,
143
+ "rewards/margins": 0.2939576804637909,
144
+ "rewards/rejected": -0.3707854449748993,
145
+ "step": 80
146
+ },
147
+ {
148
+ "epoch": 0.18828451882845187,
149
+ "grad_norm": 19.539798910302995,
150
+ "learning_rate": 4.883222001996351e-07,
151
+ "logits/chosen": -1.3052003383636475,
152
+ "logits/rejected": -1.1468216180801392,
153
+ "logps/chosen": -294.7815246582031,
154
+ "logps/rejected": -303.13427734375,
155
+ "loss": 0.5866,
156
+ "rewards/accuracies": 0.737500011920929,
157
+ "rewards/chosen": -0.328985333442688,
158
+ "rewards/margins": 0.45616284012794495,
159
+ "rewards/rejected": -0.7851482629776001,
160
+ "step": 90
161
+ },
162
+ {
163
+ "epoch": 0.20920502092050208,
164
+ "grad_norm": 25.641340302712948,
165
+ "learning_rate": 4.821741763807186e-07,
166
+ "logits/chosen": -0.7993453145027161,
167
+ "logits/rejected": -0.40382108092308044,
168
+ "logps/chosen": -333.15521240234375,
169
+ "logps/rejected": -387.1316833496094,
170
+ "loss": 0.5639,
171
+ "rewards/accuracies": 0.7437499761581421,
172
+ "rewards/chosen": -0.5871611833572388,
173
+ "rewards/margins": 0.6322518587112427,
174
+ "rewards/rejected": -1.2194130420684814,
175
+ "step": 100
176
+ },
177
+ {
178
+ "epoch": 0.20920502092050208,
179
+ "eval_logits/chosen": -0.9011980295181274,
180
+ "eval_logits/rejected": -0.5617944598197937,
181
+ "eval_logps/chosen": -323.5882873535156,
182
+ "eval_logps/rejected": -383.63751220703125,
183
+ "eval_loss": 0.5651677846908569,
184
+ "eval_rewards/accuracies": 0.73828125,
185
+ "eval_rewards/chosen": -0.6095837354660034,
186
+ "eval_rewards/margins": 0.6001702547073364,
187
+ "eval_rewards/rejected": -1.2097539901733398,
188
+ "eval_runtime": 101.4152,
189
+ "eval_samples_per_second": 19.721,
190
+ "eval_steps_per_second": 0.316,
191
+ "step": 100
192
+ },
193
+ {
194
+ "epoch": 0.2301255230125523,
195
+ "grad_norm": 21.689227912073658,
196
+ "learning_rate": 4.747874028753375e-07,
197
+ "logits/chosen": -0.8007869720458984,
198
+ "logits/rejected": -0.4384717047214508,
199
+ "logps/chosen": -323.02410888671875,
200
+ "logps/rejected": -343.52508544921875,
201
+ "loss": 0.5585,
202
+ "rewards/accuracies": 0.65625,
203
+ "rewards/chosen": -0.5073341727256775,
204
+ "rewards/margins": 0.517856240272522,
205
+ "rewards/rejected": -1.0251905918121338,
206
+ "step": 110
207
+ },
208
+ {
209
+ "epoch": 0.2510460251046025,
210
+ "grad_norm": 40.51442433382909,
211
+ "learning_rate": 4.662012913161997e-07,
212
+ "logits/chosen": 0.0625094398856163,
213
+ "logits/rejected": 0.6715419292449951,
214
+ "logps/chosen": -367.49664306640625,
215
+ "logps/rejected": -383.600830078125,
216
+ "loss": 0.5396,
217
+ "rewards/accuracies": 0.768750011920929,
218
+ "rewards/chosen": -0.6050666570663452,
219
+ "rewards/margins": 0.6428467631340027,
220
+ "rewards/rejected": -1.2479135990142822,
221
+ "step": 120
222
+ },
223
+ {
224
+ "epoch": 0.2719665271966527,
225
+ "grad_norm": 31.04938830573247,
226
+ "learning_rate": 4.5646165232345103e-07,
227
+ "logits/chosen": 0.35010597109794617,
228
+ "logits/rejected": 1.0627130270004272,
229
+ "logps/chosen": -369.3751220703125,
230
+ "logps/rejected": -391.6081237792969,
231
+ "loss": 0.5421,
232
+ "rewards/accuracies": 0.737500011920929,
233
+ "rewards/chosen": -0.6456856727600098,
234
+ "rewards/margins": 0.6666911840438843,
235
+ "rewards/rejected": -1.312376856803894,
236
+ "step": 130
237
+ },
238
+ {
239
+ "epoch": 0.2928870292887029,
240
+ "grad_norm": 23.124000981376877,
241
+ "learning_rate": 4.456204510851956e-07,
242
+ "logits/chosen": 1.3312932252883911,
243
+ "logits/rejected": 1.6921491622924805,
244
+ "logps/chosen": -328.2577209472656,
245
+ "logps/rejected": -374.11846923828125,
246
+ "loss": 0.5318,
247
+ "rewards/accuracies": 0.7124999761581421,
248
+ "rewards/chosen": -0.7673874497413635,
249
+ "rewards/margins": 0.5369923710823059,
250
+ "rewards/rejected": -1.3043798208236694,
251
+ "step": 140
252
+ },
253
+ {
254
+ "epoch": 0.3138075313807531,
255
+ "grad_norm": 28.62578465081787,
256
+ "learning_rate": 4.337355301007335e-07,
257
+ "logits/chosen": 0.7725589871406555,
258
+ "logits/rejected": 1.4066698551177979,
259
+ "logps/chosen": -374.64288330078125,
260
+ "logps/rejected": -426.3128356933594,
261
+ "loss": 0.5323,
262
+ "rewards/accuracies": 0.737500011920929,
263
+ "rewards/chosen": -0.7278563380241394,
264
+ "rewards/margins": 0.7114464044570923,
265
+ "rewards/rejected": -1.439302682876587,
266
+ "step": 150
267
+ },
268
+ {
269
+ "epoch": 0.3138075313807531,
270
+ "eval_logits/chosen": -0.09001250565052032,
271
+ "eval_logits/rejected": 0.4792196750640869,
272
+ "eval_logps/chosen": -317.6140441894531,
273
+ "eval_logps/rejected": -391.6695556640625,
274
+ "eval_loss": 0.5405200123786926,
275
+ "eval_rewards/accuracies": 0.76171875,
276
+ "eval_rewards/chosen": -0.5498415231704712,
277
+ "eval_rewards/margins": 0.7402328252792358,
278
+ "eval_rewards/rejected": -1.290074348449707,
279
+ "eval_runtime": 102.6926,
280
+ "eval_samples_per_second": 19.476,
281
+ "eval_steps_per_second": 0.312,
282
+ "step": 150
283
+ },
284
+ {
285
+ "epoch": 0.33472803347280333,
286
+ "grad_norm": 26.58643912067179,
287
+ "learning_rate": 4.2087030056579986e-07,
288
+ "logits/chosen": -0.5741851925849915,
289
+ "logits/rejected": 0.09496372938156128,
290
+ "logps/chosen": -313.5487976074219,
291
+ "logps/rejected": -365.291015625,
292
+ "loss": 0.5399,
293
+ "rewards/accuracies": 0.75,
294
+ "rewards/chosen": -0.5437325835227966,
295
+ "rewards/margins": 0.6792270541191101,
296
+ "rewards/rejected": -1.2229596376419067,
297
+ "step": 160
298
+ },
299
+ {
300
+ "epoch": 0.35564853556485354,
301
+ "grad_norm": 23.464961776068243,
302
+ "learning_rate": 4.070934040463998e-07,
303
+ "logits/chosen": -0.5381526947021484,
304
+ "logits/rejected": 0.14733538031578064,
305
+ "logps/chosen": -354.4422912597656,
306
+ "logps/rejected": -430.18682861328125,
307
+ "loss": 0.5326,
308
+ "rewards/accuracies": 0.75,
309
+ "rewards/chosen": -0.7926353216171265,
310
+ "rewards/margins": 0.8019927740097046,
311
+ "rewards/rejected": -1.594628095626831,
312
+ "step": 170
313
+ },
314
+ {
315
+ "epoch": 0.37656903765690375,
316
+ "grad_norm": 22.467354382291056,
317
+ "learning_rate": 3.9247834624635404e-07,
318
+ "logits/chosen": -0.20919005572795868,
319
+ "logits/rejected": 0.49986928701400757,
320
+ "logps/chosen": -371.064697265625,
321
+ "logps/rejected": -408.84149169921875,
322
+ "loss": 0.5358,
323
+ "rewards/accuracies": 0.706250011920929,
324
+ "rewards/chosen": -0.9278782606124878,
325
+ "rewards/margins": 0.5680198669433594,
326
+ "rewards/rejected": -1.4958980083465576,
327
+ "step": 180
328
+ },
329
+ {
330
+ "epoch": 0.39748953974895396,
331
+ "grad_norm": 29.254730102983988,
332
+ "learning_rate": 3.7710310482256523e-07,
333
+ "logits/chosen": -0.3338702619075775,
334
+ "logits/rejected": 0.38479360938072205,
335
+ "logps/chosen": -319.56805419921875,
336
+ "logps/rejected": -378.23590087890625,
337
+ "loss": 0.5412,
338
+ "rewards/accuracies": 0.7562500238418579,
339
+ "rewards/chosen": -0.709225058555603,
340
+ "rewards/margins": 0.857193648815155,
341
+ "rewards/rejected": -1.5664187669754028,
342
+ "step": 190
343
+ },
344
+ {
345
+ "epoch": 0.41841004184100417,
346
+ "grad_norm": 21.764790734639657,
347
+ "learning_rate": 3.610497133404795e-07,
348
+ "logits/chosen": -0.4434884190559387,
349
+ "logits/rejected": 0.06593348830938339,
350
+ "logps/chosen": -343.6837463378906,
351
+ "logps/rejected": -397.1659240722656,
352
+ "loss": 0.536,
353
+ "rewards/accuracies": 0.7124999761581421,
354
+ "rewards/chosen": -0.6647871136665344,
355
+ "rewards/margins": 0.6501897573471069,
356
+ "rewards/rejected": -1.314976692199707,
357
+ "step": 200
358
+ },
359
+ {
360
+ "epoch": 0.41841004184100417,
361
+ "eval_logits/chosen": -0.019472889602184296,
362
+ "eval_logits/rejected": 0.65250563621521,
363
+ "eval_logps/chosen": -326.44696044921875,
364
+ "eval_logps/rejected": -400.973388671875,
365
+ "eval_loss": 0.5353578329086304,
366
+ "eval_rewards/accuracies": 0.765625,
367
+ "eval_rewards/chosen": -0.6381703019142151,
368
+ "eval_rewards/margins": 0.7449426054954529,
369
+ "eval_rewards/rejected": -1.3831130266189575,
370
+ "eval_runtime": 102.2662,
371
+ "eval_samples_per_second": 19.557,
372
+ "eval_steps_per_second": 0.313,
373
+ "step": 200
374
+ },
375
+ {
376
+ "epoch": 0.4393305439330544,
377
+ "grad_norm": 27.745510528591602,
378
+ "learning_rate": 3.4440382358952115e-07,
379
+ "logits/chosen": 1.1720086336135864,
380
+ "logits/rejected": 1.8830058574676514,
381
+ "logps/chosen": -403.1050109863281,
382
+ "logps/rejected": -435.21942138671875,
383
+ "loss": 0.5366,
384
+ "rewards/accuracies": 0.7124999761581421,
385
+ "rewards/chosen": -0.988387405872345,
386
+ "rewards/margins": 0.5626603960990906,
387
+ "rewards/rejected": -1.5510478019714355,
388
+ "step": 210
389
+ },
390
+ {
391
+ "epoch": 0.4602510460251046,
392
+ "grad_norm": 22.068359980003752,
393
+ "learning_rate": 3.272542485937368e-07,
394
+ "logits/chosen": 1.7870858907699585,
395
+ "logits/rejected": 2.4022154808044434,
396
+ "logps/chosen": -351.79974365234375,
397
+ "logps/rejected": -414.64129638671875,
398
+ "loss": 0.5367,
399
+ "rewards/accuracies": 0.7250000238418579,
400
+ "rewards/chosen": -0.945723831653595,
401
+ "rewards/margins": 0.6799818277359009,
402
+ "rewards/rejected": -1.6257057189941406,
403
+ "step": 220
404
+ },
405
+ {
406
+ "epoch": 0.4811715481171548,
407
+ "grad_norm": 24.112210218904135,
408
+ "learning_rate": 3.096924887558854e-07,
409
+ "logits/chosen": 1.5326052904129028,
410
+ "logits/rejected": 1.9995750188827515,
411
+ "logps/chosen": -340.270263671875,
412
+ "logps/rejected": -413.39385986328125,
413
+ "loss": 0.494,
414
+ "rewards/accuracies": 0.768750011920929,
415
+ "rewards/chosen": -0.8984026908874512,
416
+ "rewards/margins": 0.7524574995040894,
417
+ "rewards/rejected": -1.650860071182251,
418
+ "step": 230
419
+ },
420
+ {
421
+ "epoch": 0.502092050209205,
422
+ "grad_norm": 27.75263813297474,
423
+ "learning_rate": 2.9181224366319943e-07,
424
+ "logits/chosen": 0.5852566361427307,
425
+ "logits/rejected": 1.34577214717865,
426
+ "logps/chosen": -346.68267822265625,
427
+ "logps/rejected": -395.8317565917969,
428
+ "loss": 0.5068,
429
+ "rewards/accuracies": 0.731249988079071,
430
+ "rewards/chosen": -0.6660369634628296,
431
+ "rewards/margins": 0.7894813418388367,
432
+ "rewards/rejected": -1.4555184841156006,
433
+ "step": 240
434
+ },
435
+ {
436
+ "epoch": 0.5230125523012552,
437
+ "grad_norm": 34.032720640859,
438
+ "learning_rate": 2.7370891215954565e-07,
439
+ "logits/chosen": 1.4959585666656494,
440
+ "logits/rejected": 2.515637159347534,
441
+ "logps/chosen": -362.95111083984375,
442
+ "logps/rejected": -426.1146545410156,
443
+ "loss": 0.5163,
444
+ "rewards/accuracies": 0.71875,
445
+ "rewards/chosen": -1.0264902114868164,
446
+ "rewards/margins": 0.8269758224487305,
447
+ "rewards/rejected": -1.8534657955169678,
448
+ "step": 250
449
+ },
450
+ {
451
+ "epoch": 0.5230125523012552,
452
+ "eval_logits/chosen": 1.7619643211364746,
453
+ "eval_logits/rejected": 2.488286018371582,
454
+ "eval_logps/chosen": -373.8662414550781,
455
+ "eval_logps/rejected": -458.7007751464844,
456
+ "eval_loss": 0.5184882879257202,
457
+ "eval_rewards/accuracies": 0.73828125,
458
+ "eval_rewards/chosen": -1.112363576889038,
459
+ "eval_rewards/margins": 0.8480234742164612,
460
+ "eval_rewards/rejected": -1.9603869915008545,
461
+ "eval_runtime": 103.2706,
462
+ "eval_samples_per_second": 19.367,
463
+ "eval_steps_per_second": 0.31,
464
+ "step": 250
465
+ },
466
+ {
467
+ "epoch": 0.5439330543933054,
468
+ "grad_norm": 24.412359940634452,
469
+ "learning_rate": 2.55479083351317e-07,
470
+ "logits/chosen": 1.4174010753631592,
471
+ "logits/rejected": 2.5714869499206543,
472
+ "logps/chosen": -396.62353515625,
473
+ "logps/rejected": -448.77813720703125,
474
+ "loss": 0.5172,
475
+ "rewards/accuracies": 0.6875,
476
+ "rewards/chosen": -1.168769121170044,
477
+ "rewards/margins": 0.7762032747268677,
478
+ "rewards/rejected": -1.9449723958969116,
479
+ "step": 260
480
+ },
481
+ {
482
+ "epoch": 0.5648535564853556,
483
+ "grad_norm": 27.094533941945745,
484
+ "learning_rate": 2.3722002126275822e-07,
485
+ "logits/chosen": 0.924826979637146,
486
+ "logits/rejected": 2.0330166816711426,
487
+ "logps/chosen": -410.7693786621094,
488
+ "logps/rejected": -429.46466064453125,
489
+ "loss": 0.5165,
490
+ "rewards/accuracies": 0.731249988079071,
491
+ "rewards/chosen": -0.9795888662338257,
492
+ "rewards/margins": 0.6443194150924683,
493
+ "rewards/rejected": -1.623908281326294,
494
+ "step": 270
495
+ },
496
+ {
497
+ "epoch": 0.5857740585774058,
498
+ "grad_norm": 25.27079656796927,
499
+ "learning_rate": 2.19029145890313e-07,
500
+ "logits/chosen": 1.045302391052246,
501
+ "logits/rejected": 1.8793681859970093,
502
+ "logps/chosen": -354.340576171875,
503
+ "logps/rejected": -408.4857177734375,
504
+ "loss": 0.5031,
505
+ "rewards/accuracies": 0.7562500238418579,
506
+ "rewards/chosen": -0.960013210773468,
507
+ "rewards/margins": 0.848355770111084,
508
+ "rewards/rejected": -1.8083690404891968,
509
+ "step": 280
510
+ },
511
+ {
512
+ "epoch": 0.606694560669456,
513
+ "grad_norm": 33.50688260105116,
514
+ "learning_rate": 2.0100351342479216e-07,
515
+ "logits/chosen": 0.9456893801689148,
516
+ "logits/rejected": 2.003875255584717,
517
+ "logps/chosen": -404.9028015136719,
518
+ "logps/rejected": -436.6299743652344,
519
+ "loss": 0.496,
520
+ "rewards/accuracies": 0.800000011920929,
521
+ "rewards/chosen": -0.9352100491523743,
522
+ "rewards/margins": 0.8176320791244507,
523
+ "rewards/rejected": -1.7528421878814697,
524
+ "step": 290
525
+ },
526
+ {
527
+ "epoch": 0.6276150627615062,
528
+ "grad_norm": 31.055623020499947,
529
+ "learning_rate": 1.8323929841460178e-07,
530
+ "logits/chosen": 1.6724220514297485,
531
+ "logits/rejected": 2.5003108978271484,
532
+ "logps/chosen": -367.1915283203125,
533
+ "logps/rejected": -454.2134704589844,
534
+ "loss": 0.5018,
535
+ "rewards/accuracies": 0.6875,
536
+ "rewards/chosen": -1.1671972274780273,
537
+ "rewards/margins": 0.8203157186508179,
538
+ "rewards/rejected": -1.9875129461288452,
539
+ "step": 300
540
+ },
541
+ {
542
+ "epoch": 0.6276150627615062,
543
+ "eval_logits/chosen": 1.3198010921478271,
544
+ "eval_logits/rejected": 2.090496778488159,
545
+ "eval_logps/chosen": -355.89239501953125,
546
+ "eval_logps/rejected": -443.9043884277344,
547
+ "eval_loss": 0.5108399987220764,
548
+ "eval_rewards/accuracies": 0.7578125,
549
+ "eval_rewards/chosen": -0.9326243996620178,
550
+ "eval_rewards/margins": 0.879798412322998,
551
+ "eval_rewards/rejected": -1.8124228715896606,
552
+ "eval_runtime": 103.5061,
553
+ "eval_samples_per_second": 19.323,
554
+ "eval_steps_per_second": 0.309,
555
+ "step": 300
556
+ },
557
+ {
558
+ "epoch": 0.6485355648535565,
559
+ "grad_norm": 26.914602226083172,
560
+ "learning_rate": 1.6583128063291573e-07,
561
+ "logits/chosen": 1.2696157693862915,
562
+ "logits/rejected": 2.2371857166290283,
563
+ "logps/chosen": -391.40081787109375,
564
+ "logps/rejected": -433.94384765625,
565
+ "loss": 0.5165,
566
+ "rewards/accuracies": 0.762499988079071,
567
+ "rewards/chosen": -0.8740479350090027,
568
+ "rewards/margins": 0.8631827235221863,
569
+ "rewards/rejected": -1.737230658531189,
570
+ "step": 310
571
+ },
572
+ {
573
+ "epoch": 0.6694560669456067,
574
+ "grad_norm": 23.39436395049965,
575
+ "learning_rate": 1.488723393865766e-07,
576
+ "logits/chosen": 0.8202553987503052,
577
+ "logits/rejected": 1.8485848903656006,
578
+ "logps/chosen": -368.58477783203125,
579
+ "logps/rejected": -417.188720703125,
580
+ "loss": 0.5097,
581
+ "rewards/accuracies": 0.7749999761581421,
582
+ "rewards/chosen": -0.8171142339706421,
583
+ "rewards/margins": 0.8511916995048523,
584
+ "rewards/rejected": -1.6683059930801392,
585
+ "step": 320
586
+ },
587
+ {
588
+ "epoch": 0.6903765690376569,
589
+ "grad_norm": 24.464796036697045,
590
+ "learning_rate": 1.3245295796480788e-07,
591
+ "logits/chosen": 0.7323199510574341,
592
+ "logits/rejected": 1.601609230041504,
593
+ "logps/chosen": -404.5626525878906,
594
+ "logps/rejected": -442.4979553222656,
595
+ "loss": 0.5021,
596
+ "rewards/accuracies": 0.737500011920929,
597
+ "rewards/chosen": -1.0547947883605957,
598
+ "rewards/margins": 0.6202768087387085,
599
+ "rewards/rejected": -1.6750714778900146,
600
+ "step": 330
601
+ },
602
+ {
603
+ "epoch": 0.7112970711297071,
604
+ "grad_norm": 28.602197351940898,
605
+ "learning_rate": 1.1666074087171627e-07,
606
+ "logits/chosen": 1.1269073486328125,
607
+ "logits/rejected": 2.291718006134033,
608
+ "logps/chosen": -386.19696044921875,
609
+ "logps/rejected": -446.26080322265625,
610
+ "loss": 0.51,
611
+ "rewards/accuracies": 0.7562500238418579,
612
+ "rewards/chosen": -1.0688059329986572,
613
+ "rewards/margins": 0.8116031885147095,
614
+ "rewards/rejected": -1.8804088830947876,
615
+ "step": 340
616
+ },
617
+ {
618
+ "epoch": 0.7322175732217573,
619
+ "grad_norm": 27.032175742818502,
620
+ "learning_rate": 1.0157994641835734e-07,
621
+ "logits/chosen": 1.3312422037124634,
622
+ "logits/rejected": 2.5307185649871826,
623
+ "logps/chosen": -407.82183837890625,
624
+ "logps/rejected": -435.91339111328125,
625
+ "loss": 0.4999,
626
+ "rewards/accuracies": 0.768750011920929,
627
+ "rewards/chosen": -1.1246039867401123,
628
+ "rewards/margins": 0.9464033246040344,
629
+ "rewards/rejected": -2.071007490158081,
630
+ "step": 350
631
+ },
632
+ {
633
+ "epoch": 0.7322175732217573,
634
+ "eval_logits/chosen": 1.2352690696716309,
635
+ "eval_logits/rejected": 2.0403130054473877,
636
+ "eval_logps/chosen": -366.191650390625,
637
+ "eval_logps/rejected": -457.5764465332031,
638
+ "eval_loss": 0.5094150900840759,
639
+ "eval_rewards/accuracies": 0.74609375,
640
+ "eval_rewards/chosen": -1.035617470741272,
641
+ "eval_rewards/margins": 0.9135259389877319,
642
+ "eval_rewards/rejected": -1.949143409729004,
643
+ "eval_runtime": 101.7074,
644
+ "eval_samples_per_second": 19.664,
645
+ "eval_steps_per_second": 0.315,
646
+ "step": 350
647
+ },
648
+ {
649
+ "epoch": 0.7531380753138075,
650
+ "grad_norm": 27.108638768821233,
651
+ "learning_rate": 8.729103716819111e-08,
652
+ "logits/chosen": 1.3057953119277954,
653
+ "logits/rejected": 2.3061721324920654,
654
+ "logps/chosen": -390.41973876953125,
655
+ "logps/rejected": -483.9664001464844,
656
+ "loss": 0.4746,
657
+ "rewards/accuracies": 0.75,
658
+ "rewards/chosen": -1.1125240325927734,
659
+ "rewards/margins": 0.8978655934333801,
660
+ "rewards/rejected": -2.0103893280029297,
661
+ "step": 360
662
+ },
663
+ {
664
+ "epoch": 0.7740585774058577,
665
+ "grad_norm": 23.348973436681856,
666
+ "learning_rate": 7.387025063449081e-08,
667
+ "logits/chosen": 1.0233042240142822,
668
+ "logits/rejected": 1.8939619064331055,
669
+ "logps/chosen": -426.7222595214844,
670
+ "logps/rejected": -498.1934509277344,
671
+ "loss": 0.5237,
672
+ "rewards/accuracies": 0.793749988079071,
673
+ "rewards/chosen": -0.9773656725883484,
674
+ "rewards/margins": 1.0110007524490356,
675
+ "rewards/rejected": -1.9883663654327393,
676
+ "step": 370
677
+ },
678
+ {
679
+ "epoch": 0.7949790794979079,
680
+ "grad_norm": 23.77690961130278,
681
+ "learning_rate": 6.138919252022435e-08,
682
+ "logits/chosen": 0.7093733549118042,
683
+ "logits/rejected": 1.795745849609375,
684
+ "logps/chosen": -405.4716796875,
685
+ "logps/rejected": -439.33441162109375,
686
+ "loss": 0.5086,
687
+ "rewards/accuracies": 0.7250000238418579,
688
+ "rewards/chosen": -0.994636058807373,
689
+ "rewards/margins": 0.8113524317741394,
690
+ "rewards/rejected": -1.8059885501861572,
691
+ "step": 380
692
+ },
693
+ {
694
+ "epoch": 0.8158995815899581,
695
+ "grad_norm": 24.44927335880225,
696
+ "learning_rate": 4.991445467064689e-08,
697
+ "logits/chosen": 1.5607655048370361,
698
+ "logits/rejected": 1.7856521606445312,
699
+ "logps/chosen": -352.96942138671875,
700
+ "logps/rejected": -442.06964111328125,
701
+ "loss": 0.5034,
702
+ "rewards/accuracies": 0.7250000238418579,
703
+ "rewards/chosen": -1.0203418731689453,
704
+ "rewards/margins": 0.8515297770500183,
705
+ "rewards/rejected": -1.8718715906143188,
706
+ "step": 390
707
+ },
708
+ {
709
+ "epoch": 0.8368200836820083,
710
+ "grad_norm": 22.777855221169585,
711
+ "learning_rate": 3.9507259776993954e-08,
712
+ "logits/chosen": 0.8675743341445923,
713
+ "logits/rejected": 2.180140972137451,
714
+ "logps/chosen": -412.081787109375,
715
+ "logps/rejected": -439.1665954589844,
716
+ "loss": 0.4966,
717
+ "rewards/accuracies": 0.7562500238418579,
718
+ "rewards/chosen": -0.9012048840522766,
719
+ "rewards/margins": 0.8984438180923462,
720
+ "rewards/rejected": -1.799648642539978,
721
+ "step": 400
722
+ },
723
+ {
724
+ "epoch": 0.8368200836820083,
725
+ "eval_logits/chosen": 1.2642048597335815,
726
+ "eval_logits/rejected": 2.022581100463867,
727
+ "eval_logps/chosen": -361.9197998046875,
728
+ "eval_logps/rejected": -454.93212890625,
729
+ "eval_loss": 0.5065781474113464,
730
+ "eval_rewards/accuracies": 0.7578125,
731
+ "eval_rewards/chosen": -0.9928989410400391,
732
+ "eval_rewards/margins": 0.9298015236854553,
733
+ "eval_rewards/rejected": -1.92270028591156,
734
+ "eval_runtime": 103.4115,
735
+ "eval_samples_per_second": 19.34,
736
+ "eval_steps_per_second": 0.309,
737
+ "step": 400
738
+ },
739
+ {
740
+ "epoch": 0.8577405857740585,
741
+ "grad_norm": 41.05467932326275,
742
+ "learning_rate": 3.022313472693447e-08,
743
+ "logits/chosen": 1.1478136777877808,
744
+ "logits/rejected": 2.6675543785095215,
745
+ "logps/chosen": -409.5799865722656,
746
+ "logps/rejected": -468.4205627441406,
747
+ "loss": 0.4807,
748
+ "rewards/accuracies": 0.768750011920929,
749
+ "rewards/chosen": -0.9972062110900879,
750
+ "rewards/margins": 1.0250569581985474,
751
+ "rewards/rejected": -2.0222630500793457,
752
+ "step": 410
753
+ },
754
+ {
755
+ "epoch": 0.8786610878661087,
756
+ "grad_norm": 29.37147689145298,
757
+ "learning_rate": 2.2111614344599684e-08,
758
+ "logits/chosen": 1.628156304359436,
759
+ "logits/rejected": 2.2244324684143066,
760
+ "logps/chosen": -386.01971435546875,
761
+ "logps/rejected": -460.0570373535156,
762
+ "loss": 0.5114,
763
+ "rewards/accuracies": 0.737500011920929,
764
+ "rewards/chosen": -1.121294617652893,
765
+ "rewards/margins": 0.8227565884590149,
766
+ "rewards/rejected": -1.9440511465072632,
767
+ "step": 420
768
+ },
769
+ {
770
+ "epoch": 0.899581589958159,
771
+ "grad_norm": 36.49027990223903,
772
+ "learning_rate": 1.521597710086439e-08,
773
+ "logits/chosen": 1.1463916301727295,
774
+ "logits/rejected": 2.26253342628479,
775
+ "logps/chosen": -402.85162353515625,
776
+ "logps/rejected": -490.9013671875,
777
+ "loss": 0.4991,
778
+ "rewards/accuracies": 0.8125,
779
+ "rewards/chosen": -1.1392345428466797,
780
+ "rewards/margins": 1.0718799829483032,
781
+ "rewards/rejected": -2.2111144065856934,
782
+ "step": 430
783
+ },
784
+ {
785
+ "epoch": 0.9205020920502092,
786
+ "grad_norm": 25.14386947872744,
787
+ "learning_rate": 9.57301420397924e-09,
788
+ "logits/chosen": 1.6257444620132446,
789
+ "logits/rejected": 2.745250701904297,
790
+ "logps/chosen": -375.26214599609375,
791
+ "logps/rejected": -446.528076171875,
792
+ "loss": 0.4996,
793
+ "rewards/accuracies": 0.762499988079071,
794
+ "rewards/chosen": -1.0888123512268066,
795
+ "rewards/margins": 0.9304329752922058,
796
+ "rewards/rejected": -2.019245147705078,
797
+ "step": 440
798
+ },
799
+ {
800
+ "epoch": 0.9414225941422594,
801
+ "grad_norm": 25.676457242831784,
802
+ "learning_rate": 5.212833302556258e-09,
803
+ "logits/chosen": 1.4634435176849365,
804
+ "logits/rejected": 1.8727718591690063,
805
+ "logps/chosen": -356.43170166015625,
806
+ "logps/rejected": -456.1715393066406,
807
+ "loss": 0.5198,
808
+ "rewards/accuracies": 0.7875000238418579,
809
+ "rewards/chosen": -1.0714185237884521,
810
+ "rewards/margins": 0.8603706359863281,
811
+ "rewards/rejected": -1.9317893981933594,
812
+ "step": 450
813
+ },
814
+ {
815
+ "epoch": 0.9414225941422594,
816
+ "eval_logits/chosen": 1.3119983673095703,
817
+ "eval_logits/rejected": 2.1024117469787598,
818
+ "eval_logps/chosen": -363.6942138671875,
819
+ "eval_logps/rejected": -458.2783508300781,
820
+ "eval_loss": 0.505741536617279,
821
+ "eval_rewards/accuracies": 0.76171875,
822
+ "eval_rewards/chosen": -1.010642647743225,
823
+ "eval_rewards/margins": 0.9455199241638184,
824
+ "eval_rewards/rejected": -1.9561628103256226,
825
+ "eval_runtime": 101.6569,
826
+ "eval_samples_per_second": 19.674,
827
+ "eval_steps_per_second": 0.315,
828
+ "step": 450
829
+ },
830
+ {
831
+ "epoch": 0.9623430962343096,
832
+ "grad_norm": 26.320704397497966,
833
+ "learning_rate": 2.158697848236607e-09,
834
+ "logits/chosen": 1.4335782527923584,
835
+ "logits/rejected": 2.41444730758667,
836
+ "logps/chosen": -418.5895080566406,
837
+ "logps/rejected": -466.94525146484375,
838
+ "loss": 0.4754,
839
+ "rewards/accuracies": 0.768750011920929,
840
+ "rewards/chosen": -1.0504125356674194,
841
+ "rewards/margins": 0.8714386820793152,
842
+ "rewards/rejected": -1.921851396560669,
843
+ "step": 460
844
+ },
845
+ {
846
+ "epoch": 0.9832635983263598,
847
+ "grad_norm": 24.390801418724564,
848
+ "learning_rate": 4.269029751107489e-10,
849
+ "logits/chosen": 1.4916856288909912,
850
+ "logits/rejected": 2.8449251651763916,
851
+ "logps/chosen": -409.98480224609375,
852
+ "logps/rejected": -462.52374267578125,
853
+ "loss": 0.4863,
854
+ "rewards/accuracies": 0.7562500238418579,
855
+ "rewards/chosen": -1.0590018033981323,
856
+ "rewards/margins": 0.990155816078186,
857
+ "rewards/rejected": -2.0491576194763184,
858
+ "step": 470
859
+ },
860
+ {
861
+ "epoch": 1.0,
862
+ "step": 478,
863
+ "total_flos": 0.0,
864
+ "train_loss": 0.5417051395113,
865
+ "train_runtime": 12463.9447,
866
+ "train_samples_per_second": 4.905,
867
+ "train_steps_per_second": 0.038
868
+ }
869
+ ],
870
+ "logging_steps": 10,
871
+ "max_steps": 478,
872
+ "num_input_tokens_seen": 0,
873
+ "num_train_epochs": 1,
874
+ "save_steps": 100,
875
+ "stateful_callbacks": {
876
+ "TrainerControl": {
877
+ "args": {
878
+ "should_epoch_stop": false,
879
+ "should_evaluate": false,
880
+ "should_log": false,
881
+ "should_save": true,
882
+ "should_training_stop": true
883
+ },
884
+ "attributes": {}
885
+ }
886
+ },
887
+ "total_flos": 0.0,
888
+ "train_batch_size": 8,
889
+ "trial_name": null,
890
+ "trial_params": null
891
+ }
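
trainer_state.json is mostly a log_history list with one entry per logging step (every 10 steps) plus one per evaluation (every 50 steps). A small sketch for pulling out the evaluation-loss curve from the file above:

```python
# Extract the evaluation curve recorded in trainer_state.json's log_history.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

eval_points = [(entry["step"], entry["eval_loss"])
               for entry in state["log_history"] if "eval_loss" in entry]
for step, loss in eval_points:
    print(step, round(loss, 4))  # 50 0.6479 ... 450 0.5057
```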