kanishka committed
Commit b0faeae · verified · 1 Parent(s): 9b95c22

End of training

Files changed (5)
  1. README.md +14 -2
  2. all_results.json +16 -0
  3. eval_results.json +10 -0
  4. train_results.json +9 -0
  5. trainer_state.json +2392 -0
README.md CHANGED
@@ -2,11 +2,23 @@
 library_name: transformers
 tags:
 - generated_from_trainer
+datasets:
+- kanishka/babylm2-clean-spacy
 metrics:
 - accuracy
 model-index:
 - name: opt-babylm2-clean-spacy-32k_seed-42_3e-4
-  results: []
+  results:
+  - task:
+      name: Causal Language Modeling
+      type: text-generation
+    dataset:
+      name: kanishka/babylm2-clean-spacy
+      type: kanishka/babylm2-clean-spacy
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.4234014597448438
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -14,7 +26,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # opt-babylm2-clean-spacy-32k_seed-42_3e-4
 
-This model was trained from scratch on an unknown dataset.
+This model was trained from scratch on the kanishka/babylm2-clean-spacy dataset.
 It achieves the following results on the evaluation set:
 - Loss: 3.0190
 - Accuracy: 0.4234
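The Accuracy value in the card is the Trainer's token-level next-token accuracy: the fraction of positions where the argmax of the model's logits matches the label at the next position. A minimal sketch of that metric, using hypothetical toy arrays rather than the actual evaluation code:

```python
def next_token_accuracy(logits, labels):
    """Token-level accuracy for a causal LM.

    logits: list of per-position score lists (seq_len x vocab).
    labels: list of token ids (seq_len).
    Predictions at position i are compared to the label at i+1,
    mirroring the shift applied when evaluating causal LMs.
    """
    # argmax over the vocab dimension, dropping the last position
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits[:-1]]
    targets = labels[1:]  # drop the first token (it has no preceding context)
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

# Toy example (hypothetical values, not from the real eval set)
logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 1, 0]
# preds over the first two positions: [1, 0]; shifted targets: [1, 0]
print(next_token_accuracy(logits, labels))  # -> 1.0
```

A reported accuracy of 0.4234 therefore means roughly 42% of next tokens in the eval set were predicted exactly.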
all_results.json ADDED
@@ -0,0 +1,16 @@
+{
+  "epoch": 20.0,
+  "eval_accuracy": 0.4234014597448438,
+  "eval_loss": 3.0190186500549316,
+  "eval_runtime": 111.5107,
+  "eval_samples": 52440,
+  "eval_samples_per_second": 470.269,
+  "eval_steps_per_second": 7.354,
+  "perplexity": 20.47119242004321,
+  "total_flos": 1.29957250203648e+18,
+  "train_loss": 2.7038125842232534,
+  "train_runtime": 44110.0674,
+  "train_samples": 497364,
+  "train_samples_per_second": 225.51,
+  "train_steps_per_second": 7.047
+}
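The derived numbers in all_results.json are internally consistent: perplexity is exp(eval_loss), and the throughput figures are samples (or steps) divided by runtime. A quick sanity check, with the values copied from the file above:

```python
import math

# Perplexity is the exponential of the evaluation cross-entropy loss.
eval_loss = 3.0190186500549316
perplexity = math.exp(eval_loss)  # matches "perplexity": 20.47119242004321

# Training throughput: samples per epoch times epochs, over total runtime.
train_samples = 497364
epochs = 20.0
train_runtime = 44110.0674
train_sps = train_samples * epochs / train_runtime  # ~225.51 samples/second

# Evaluation throughput: eval samples over eval runtime.
eval_samples = 52440
eval_runtime = 111.5107
eval_sps = eval_samples / eval_runtime  # ~470.27 samples/second
```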
eval_results.json ADDED
@@ -0,0 +1,10 @@
+{
+  "epoch": 20.0,
+  "eval_accuracy": 0.4234014597448438,
+  "eval_loss": 3.0190186500549316,
+  "eval_runtime": 111.5107,
+  "eval_samples": 52440,
+  "eval_samples_per_second": 470.269,
+  "eval_steps_per_second": 7.354,
+  "perplexity": 20.47119242004321
+}
train_results.json ADDED
@@ -0,0 +1,9 @@
+{
+  "epoch": 20.0,
+  "total_flos": 1.29957250203648e+18,
+  "train_loss": 2.7038125842232534,
+  "train_runtime": 44110.0674,
+  "train_samples": 497364,
+  "train_samples_per_second": 225.51,
+  "train_steps_per_second": 7.047
+}
trainer_state.json ADDED
@@ -0,0 +1,2392 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 20.0,
+  "eval_steps": 500,
+  "global_step": 310860,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.06433764395547835,
+      "grad_norm": 0.9469536542892456,
+      "learning_rate": 9.375e-06,
+      "loss": 7.0597,
+      "step": 1000
+    },
+    {
+      "epoch": 0.1286752879109567,
+      "grad_norm": 0.9877486824989319,
+      "learning_rate": 1.875e-05,
+      "loss": 4.8574,
+      "step": 2000
+    },
+    {
+      "epoch": 0.19301293186643506,
+      "grad_norm": 1.1811100244522095,
+      "learning_rate": 2.8125e-05,
+      "loss": 4.5464,
+      "step": 3000
+    },
+    {
+      "epoch": 0.2573505758219134,
+      "grad_norm": 1.155553936958313,
+      "learning_rate": 3.75e-05,
+      "loss": 4.3086,
+      "step": 4000
+    },
+    {
+      "epoch": 0.32168821977739176,
+      "grad_norm": 1.0062898397445679,
+      "learning_rate": 4.6874999999999994e-05,
+      "loss": 4.1307,
+      "step": 5000
+    },
+    {
+      "epoch": 0.3860258637328701,
+      "grad_norm": 0.9749443531036377,
+      "learning_rate": 5.625e-05,
+      "loss": 3.986,
+      "step": 6000
+    },
+    {
+      "epoch": 0.45036350768834843,
+      "grad_norm": 0.9870838522911072,
+      "learning_rate": 6.5625e-05,
+      "loss": 3.8708,
+      "step": 7000
+    },
+    {
+      "epoch": 0.5147011516438268,
+      "grad_norm": 1.0740858316421509,
+      "learning_rate": 7.5e-05,
+      "loss": 3.781,
+      "step": 8000
+    },
+    {
+      "epoch": 0.5790387955993052,
+      "grad_norm": 0.969571053981781,
+      "learning_rate": 8.437499999999999e-05,
+      "loss": 3.6942,
+      "step": 9000
+    },
+    {
+      "epoch": 0.6433764395547835,
+      "grad_norm": 0.923062801361084,
+      "learning_rate": 9.374999999999999e-05,
+      "loss": 3.6225,
+      "step": 10000
+    },
+    {
+      "epoch": 0.7077140835102619,
+      "grad_norm": 0.87486732006073,
+      "learning_rate": 0.00010312499999999999,
+      "loss": 3.5667,
+      "step": 11000
+    },
+    {
+      "epoch": 0.7720517274657402,
+      "grad_norm": 0.8343172073364258,
+      "learning_rate": 0.000112490625,
+      "loss": 3.5107,
+      "step": 12000
+    },
+    {
+      "epoch": 0.8363893714212186,
+      "grad_norm": 0.8089198470115662,
+      "learning_rate": 0.000121865625,
+      "loss": 3.4681,
+      "step": 13000
+    },
+    {
+      "epoch": 0.9007270153766969,
+      "grad_norm": 0.8141182661056519,
+      "learning_rate": 0.00013123125,
+      "loss": 3.4337,
+      "step": 14000
+    },
+    {
+      "epoch": 0.9650646593321752,
+      "grad_norm": 0.7596079707145691,
+      "learning_rate": 0.00014060625,
+      "loss": 3.3944,
+      "step": 15000
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.37339323372369543,
+      "eval_loss": 3.4211647510528564,
+      "eval_runtime": 112.5567,
+      "eval_samples_per_second": 465.898,
+      "eval_steps_per_second": 7.285,
+      "step": 15543
+    },
+    {
+      "epoch": 1.0294023032876536,
+      "grad_norm": 0.7583508491516113,
+      "learning_rate": 0.000149971875,
+      "loss": 3.345,
+      "step": 16000
+    },
+    {
+      "epoch": 1.093739947243132,
+      "grad_norm": 0.7395954728126526,
+      "learning_rate": 0.00015933749999999996,
+      "loss": 3.3182,
+      "step": 17000
+    },
+    {
+      "epoch": 1.1580775911986103,
+      "grad_norm": 0.7119142413139343,
+      "learning_rate": 0.00016871249999999996,
+      "loss": 3.304,
+      "step": 18000
+    },
+    {
+      "epoch": 1.2224152351540887,
+      "grad_norm": 0.7133814692497253,
+      "learning_rate": 0.00017808749999999999,
+      "loss": 3.2808,
+      "step": 19000
+    },
+    {
+      "epoch": 1.286752879109567,
+      "grad_norm": 0.6662284731864929,
+      "learning_rate": 0.00018745312499999998,
+      "loss": 3.2624,
+      "step": 20000
+    },
+    {
+      "epoch": 1.3510905230650454,
+      "grad_norm": 0.6821054816246033,
+      "learning_rate": 0.00019682812499999998,
+      "loss": 3.2468,
+      "step": 21000
+    },
+    {
+      "epoch": 1.4154281670205238,
+      "grad_norm": 0.6423399448394775,
+      "learning_rate": 0.00020619374999999998,
+      "loss": 3.2323,
+      "step": 22000
+    },
+    {
+      "epoch": 1.4797658109760021,
+      "grad_norm": 0.6489351987838745,
+      "learning_rate": 0.00021556874999999998,
+      "loss": 3.218,
+      "step": 23000
+    },
+    {
+      "epoch": 1.5441034549314803,
+      "grad_norm": 0.6388360261917114,
+      "learning_rate": 0.00022493437499999998,
+      "loss": 3.2063,
+      "step": 24000
+    },
+    {
+      "epoch": 1.6084410988869586,
+      "grad_norm": 0.6035541296005249,
+      "learning_rate": 0.00023430937499999997,
+      "loss": 3.1971,
+      "step": 25000
+    },
+    {
+      "epoch": 1.672778742842437,
+      "grad_norm": 0.5949345231056213,
+      "learning_rate": 0.00024367499999999997,
+      "loss": 3.1683,
+      "step": 26000
+    },
+    {
+      "epoch": 1.7371163867979154,
+      "grad_norm": 0.5953760147094727,
+      "learning_rate": 0.00025305,
+      "loss": 3.1728,
+      "step": 27000
+    },
+    {
+      "epoch": 1.8014540307533937,
+      "grad_norm": 0.5276063680648804,
+      "learning_rate": 0.000262415625,
+      "loss": 3.1607,
+      "step": 28000
+    },
+    {
+      "epoch": 1.865791674708872,
+      "grad_norm": 0.5257272124290466,
+      "learning_rate": 0.000271790625,
+      "loss": 3.1472,
+      "step": 29000
+    },
+    {
+      "epoch": 1.9301293186643504,
+      "grad_norm": 0.49043259024620056,
+      "learning_rate": 0.000281165625,
+      "loss": 3.1367,
+      "step": 30000
+    },
+    {
+      "epoch": 1.9944669626198288,
+      "grad_norm": 0.5030378699302673,
+      "learning_rate": 0.000290521875,
+      "loss": 3.1245,
+      "step": 31000
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.3939620256950988,
+      "eval_loss": 3.2037432193756104,
+      "eval_runtime": 113.0293,
+      "eval_samples_per_second": 463.951,
+      "eval_steps_per_second": 7.255,
+      "step": 31086
+    },
+    {
+      "epoch": 2.058804606575307,
+      "grad_norm": 0.5003546476364136,
+      "learning_rate": 0.000299896875,
+      "loss": 3.0828,
+      "step": 32000
+    },
+    {
+      "epoch": 2.1231422505307855,
+      "grad_norm": 0.48286330699920654,
+      "learning_rate": 0.00029893602524564295,
+      "loss": 3.08,
+      "step": 33000
+    },
+    {
+      "epoch": 2.187479894486264,
+      "grad_norm": 0.4852472245693207,
+      "learning_rate": 0.0002978602165961414,
+      "loss": 3.0633,
+      "step": 34000
+    },
+    {
+      "epoch": 2.2518175384417423,
+      "grad_norm": 0.4629572927951813,
+      "learning_rate": 0.00029678548375528934,
+      "loss": 3.063,
+      "step": 35000
+    },
+    {
+      "epoch": 2.3161551823972206,
+      "grad_norm": 0.4571368992328644,
+      "learning_rate": 0.0002957096751057878,
+      "loss": 3.0453,
+      "step": 36000
+    },
+    {
+      "epoch": 2.380492826352699,
+      "grad_norm": 0.44331055879592896,
+      "learning_rate": 0.0002946349422649358,
+      "loss": 3.0408,
+      "step": 37000
+    },
+    {
+      "epoch": 2.4448304703081774,
+      "grad_norm": 0.4230923354625702,
+      "learning_rate": 0.00029355913361543424,
+      "loss": 3.0359,
+      "step": 38000
+    },
+    {
+      "epoch": 2.5091681142636557,
+      "grad_norm": 0.4260108768939972,
+      "learning_rate": 0.0002924833249659327,
+      "loss": 3.0316,
+      "step": 39000
+    },
+    {
+      "epoch": 2.573505758219134,
+      "grad_norm": 0.41887935996055603,
+      "learning_rate": 0.0002914085921250807,
+      "loss": 3.0299,
+      "step": 40000
+    },
+    {
+      "epoch": 2.6378434021746124,
+      "grad_norm": 0.41068920493125916,
+      "learning_rate": 0.00029033278347557914,
+      "loss": 3.0138,
+      "step": 41000
+    },
+    {
+      "epoch": 2.702181046130091,
+      "grad_norm": 0.39430394768714905,
+      "learning_rate": 0.0002892591264433766,
+      "loss": 3.0038,
+      "step": 42000
+    },
+    {
+      "epoch": 2.766518690085569,
+      "grad_norm": 0.4100017547607422,
+      "learning_rate": 0.00028818331779387505,
+      "loss": 3.0088,
+      "step": 43000
+    },
+    {
+      "epoch": 2.8308563340410475,
+      "grad_norm": 0.4101816415786743,
+      "learning_rate": 0.0002871075091443735,
+      "loss": 2.9937,
+      "step": 44000
+    },
+    {
+      "epoch": 2.895193977996526,
+      "grad_norm": 0.38294607400894165,
+      "learning_rate": 0.000286031700494872,
+      "loss": 2.9898,
+      "step": 45000
+    },
+    {
+      "epoch": 2.9595316219520043,
+      "grad_norm": 0.37260037660598755,
+      "learning_rate": 0.00028495589184537043,
+      "loss": 2.9807,
+      "step": 46000
+    },
+    {
+      "epoch": 3.0,
+      "eval_accuracy": 0.40728960081362825,
+      "eval_loss": 3.079435110092163,
+      "eval_runtime": 112.8117,
+      "eval_samples_per_second": 464.845,
+      "eval_steps_per_second": 7.269,
+      "step": 46629
+    },
+    {
+      "epoch": 3.0238692659074826,
+      "grad_norm": 0.40783312916755676,
+      "learning_rate": 0.00028388115900451836,
+      "loss": 2.9534,
+      "step": 47000
+    },
+    {
+      "epoch": 3.088206909862961,
+      "grad_norm": 0.38361623883247375,
+      "learning_rate": 0.0002828053503550168,
+      "loss": 2.9122,
+      "step": 48000
+    },
+    {
+      "epoch": 3.1525445538184393,
+      "grad_norm": 0.3766690790653229,
+      "learning_rate": 0.0002817306175141648,
+      "loss": 2.9119,
+      "step": 49000
+    },
+    {
+      "epoch": 3.2168821977739177,
+      "grad_norm": 0.38536399602890015,
+      "learning_rate": 0.00028065480886466326,
+      "loss": 2.9135,
+      "step": 50000
+    },
+    {
+      "epoch": 3.2812198417293956,
+      "grad_norm": 0.38374361395835876,
+      "learning_rate": 0.0002795800760238112,
+      "loss": 2.9094,
+      "step": 51000
+    },
+    {
+      "epoch": 3.345557485684874,
+      "grad_norm": 0.3872029185295105,
+      "learning_rate": 0.00027850426737430965,
+      "loss": 2.9069,
+      "step": 52000
+    },
+    {
+      "epoch": 3.4098951296403524,
+      "grad_norm": 0.37401601672172546,
+      "learning_rate": 0.0002774284587248081,
+      "loss": 2.9092,
+      "step": 53000
+    },
+    {
+      "epoch": 3.4742327735958307,
+      "grad_norm": 0.34951257705688477,
+      "learning_rate": 0.0002763537258839561,
+      "loss": 2.9124,
+      "step": 54000
+    },
+    {
+      "epoch": 3.538570417551309,
+      "grad_norm": 0.36252182722091675,
+      "learning_rate": 0.000275278993043104,
+      "loss": 2.9086,
+      "step": 55000
+    },
+    {
+      "epoch": 3.6029080615067874,
+      "grad_norm": 0.3610841631889343,
+      "learning_rate": 0.0002742031843936025,
+      "loss": 2.9044,
+      "step": 56000
+    },
+    {
+      "epoch": 3.667245705462266,
+      "grad_norm": 0.356315016746521,
+      "learning_rate": 0.00027312737574410094,
+      "loss": 2.8984,
+      "step": 57000
+    },
+    {
+      "epoch": 3.731583349417744,
+      "grad_norm": 0.3501368761062622,
+      "learning_rate": 0.0002720515670945994,
+      "loss": 2.8991,
+      "step": 58000
+    },
+    {
+      "epoch": 3.7959209933732225,
+      "grad_norm": 0.3654986619949341,
+      "learning_rate": 0.00027097575844509786,
+      "loss": 2.8946,
+      "step": 59000
+    },
+    {
+      "epoch": 3.860258637328701,
+      "grad_norm": 0.34233444929122925,
+      "learning_rate": 0.00026990102560424585,
+      "loss": 2.8981,
+      "step": 60000
+    },
+    {
+      "epoch": 3.9245962812841793,
+      "grad_norm": 0.35118335485458374,
+      "learning_rate": 0.0002688252169547443,
+      "loss": 2.8932,
+      "step": 61000
+    },
+    {
+      "epoch": 3.9889339252396576,
+      "grad_norm": 0.3542274236679077,
+      "learning_rate": 0.00026775048411389223,
+      "loss": 2.8872,
+      "step": 62000
+    },
+    {
+      "epoch": 4.0,
+      "eval_accuracy": 0.4139775803532702,
+      "eval_loss": 3.0204551219940186,
+      "eval_runtime": 113.294,
+      "eval_samples_per_second": 462.866,
+      "eval_steps_per_second": 7.238,
+      "step": 62172
+    },
+    {
+      "epoch": 4.053271569195136,
+      "grad_norm": 0.359250545501709,
+      "learning_rate": 0.0002666746754643907,
+      "loss": 2.8242,
+      "step": 63000
+    },
+    {
+      "epoch": 4.117609213150614,
+      "grad_norm": 0.34917619824409485,
+      "learning_rate": 0.00026559886681488915,
+      "loss": 2.8144,
+      "step": 64000
+    },
+    {
+      "epoch": 4.181946857106093,
+      "grad_norm": 0.351457417011261,
+      "learning_rate": 0.00026452413397403714,
+      "loss": 2.8132,
+      "step": 65000
+    },
+    {
+      "epoch": 4.246284501061571,
+      "grad_norm": 0.35231146216392517,
+      "learning_rate": 0.0002634483253245356,
+      "loss": 2.8203,
+      "step": 66000
+    },
+    {
+      "epoch": 4.310622145017049,
+      "grad_norm": 0.354030579328537,
+      "learning_rate": 0.0002623735924836836,
+      "loss": 2.8273,
+      "step": 67000
+    },
+    {
+      "epoch": 4.374959788972528,
+      "grad_norm": 0.3434860408306122,
+      "learning_rate": 0.00026129778383418204,
+      "loss": 2.8221,
+      "step": 68000
+    },
+    {
+      "epoch": 4.439297432928006,
+      "grad_norm": 0.35598379373550415,
+      "learning_rate": 0.0002602219751846805,
+      "loss": 2.8283,
+      "step": 69000
+    },
+    {
+      "epoch": 4.5036350768834845,
+      "grad_norm": 0.350340873003006,
+      "learning_rate": 0.00025914616653517896,
+      "loss": 2.8242,
+      "step": 70000
+    },
+    {
+      "epoch": 4.567972720838963,
+      "grad_norm": 0.34078752994537354,
+      "learning_rate": 0.0002580714336943269,
+      "loss": 2.8309,
+      "step": 71000
+    },
+    {
+      "epoch": 4.632310364794441,
+      "grad_norm": 0.3571733832359314,
+      "learning_rate": 0.00025699670085347487,
+      "loss": 2.8248,
+      "step": 72000
+    },
+    {
+      "epoch": 4.69664800874992,
+      "grad_norm": 0.35940021276474,
+      "learning_rate": 0.00025592089220397333,
+      "loss": 2.8334,
+      "step": 73000
+    },
+    {
+      "epoch": 4.760985652705398,
+      "grad_norm": 0.3354775607585907,
+      "learning_rate": 0.0002548450835544718,
+      "loss": 2.8263,
+      "step": 74000
+    },
+    {
+      "epoch": 4.825323296660876,
+      "grad_norm": 0.330805242061615,
+      "learning_rate": 0.0002537703507136197,
+      "loss": 2.8296,
+      "step": 75000
+    },
+    {
+      "epoch": 4.889660940616355,
+      "grad_norm": 0.32566189765930176,
+      "learning_rate": 0.0002526945420641182,
+      "loss": 2.8208,
+      "step": 76000
+    },
+    {
+      "epoch": 4.953998584571833,
+      "grad_norm": 0.32299116253852844,
+      "learning_rate": 0.00025161980922326616,
+      "loss": 2.8286,
+      "step": 77000
+    },
+    {
+      "epoch": 5.0,
+      "eval_accuracy": 0.417990981289541,
+      "eval_loss": 2.988518238067627,
+      "eval_runtime": 113.0062,
+      "eval_samples_per_second": 464.045,
+      "eval_steps_per_second": 7.256,
+      "step": 77715
+    },
+    {
+      "epoch": 5.018336228527311,
+      "grad_norm": 0.332711786031723,
+      "learning_rate": 0.00025054400057376457,
+      "loss": 2.797,
+      "step": 78000
+    },
+    {
+      "epoch": 5.08267387248279,
+      "grad_norm": 0.3597155809402466,
+      "learning_rate": 0.000249468191924263,
+      "loss": 2.7461,
+      "step": 79000
+    },
+    {
+      "epoch": 5.147011516438268,
+      "grad_norm": 0.3411096930503845,
+      "learning_rate": 0.000248393459083411,
+      "loss": 2.7493,
+      "step": 80000
+    },
+    {
+      "epoch": 5.2113491603937465,
+      "grad_norm": 0.35248109698295593,
+      "learning_rate": 0.00024731765043390947,
+      "loss": 2.7584,
+      "step": 81000
+    },
+    {
+      "epoch": 5.275686804349225,
+      "grad_norm": 0.3520190417766571,
+      "learning_rate": 0.00024624184178440793,
+      "loss": 2.755,
+      "step": 82000
+    },
+    {
+      "epoch": 5.340024448304703,
+      "grad_norm": 0.34867680072784424,
+      "learning_rate": 0.00024516710894355586,
+      "loss": 2.7649,
+      "step": 83000
+    },
+    {
+      "epoch": 5.404362092260182,
+      "grad_norm": 0.3400154709815979,
+      "learning_rate": 0.00024409130029405434,
+      "loss": 2.7586,
+      "step": 84000
+    },
+    {
+      "epoch": 5.46869973621566,
+      "grad_norm": 0.3640024662017822,
+      "learning_rate": 0.0002430154916445528,
+      "loss": 2.7606,
+      "step": 85000
+    },
+    {
+      "epoch": 5.533037380171138,
+      "grad_norm": 0.3456322252750397,
+      "learning_rate": 0.00024193968299505126,
+      "loss": 2.767,
+      "step": 86000
+    },
+    {
+      "epoch": 5.597375024126617,
+      "grad_norm": 0.3284786343574524,
+      "learning_rate": 0.00024086495015419922,
+      "loss": 2.7687,
+      "step": 87000
+    },
+    {
+      "epoch": 5.661712668082095,
+      "grad_norm": 0.3351786732673645,
+      "learning_rate": 0.00023978914150469768,
+      "loss": 2.7705,
+      "step": 88000
+    },
+    {
+      "epoch": 5.726050312037573,
+      "grad_norm": 0.3189627528190613,
+      "learning_rate": 0.00023871440866384563,
+      "loss": 2.7743,
+      "step": 89000
+    },
+    {
+      "epoch": 5.790387955993052,
+      "grad_norm": 0.3447468876838684,
+      "learning_rate": 0.0002376386000143441,
+      "loss": 2.7712,
+      "step": 90000
+    },
+    {
+      "epoch": 5.85472559994853,
+      "grad_norm": 0.3212040364742279,
+      "learning_rate": 0.00023656386717349205,
+      "loss": 2.7741,
+      "step": 91000
+    },
+    {
+      "epoch": 5.9190632439040085,
+      "grad_norm": 0.3384701609611511,
+      "learning_rate": 0.0002354880585239905,
+      "loss": 2.7747,
+      "step": 92000
+    },
+    {
+      "epoch": 5.983400887859487,
+      "grad_norm": 0.33266380429267883,
+      "learning_rate": 0.00023441224987448894,
+      "loss": 2.779,
+      "step": 93000
+    },
+    {
+      "epoch": 6.0,
+      "eval_accuracy": 0.4206485843765424,
+      "eval_loss": 2.969926595687866,
+      "eval_runtime": 113.3066,
+      "eval_samples_per_second": 462.815,
+      "eval_steps_per_second": 7.237,
+      "step": 93258
+    },
+    {
+      "epoch": 6.047738531814965,
+      "grad_norm": 0.35645657777786255,
+      "learning_rate": 0.0002333364412249874,
+      "loss": 2.7059,
+      "step": 94000
+    },
+    {
+      "epoch": 6.112076175770444,
+      "grad_norm": 0.35733386874198914,
+      "learning_rate": 0.0002322617083841354,
+      "loss": 2.6966,
+      "step": 95000
+    },
+    {
+      "epoch": 6.176413819725922,
+      "grad_norm": 0.3435540199279785,
+      "learning_rate": 0.00023118589973463387,
+      "loss": 2.6986,
+      "step": 96000
+    },
+    {
+      "epoch": 6.2407514636814,
+      "grad_norm": 0.3479596972465515,
+      "learning_rate": 0.0002301100910851323,
+      "loss": 2.7068,
+      "step": 97000
+    },
+    {
+      "epoch": 6.305089107636879,
+      "grad_norm": 0.3150424659252167,
+      "learning_rate": 0.00022903428243563077,
+      "loss": 2.7074,
+      "step": 98000
+    },
+    {
+      "epoch": 6.369426751592357,
+      "grad_norm": 0.34055858850479126,
+      "learning_rate": 0.00022795954959477872,
+      "loss": 2.7065,
+      "step": 99000
+    },
+    {
+      "epoch": 6.433764395547835,
+      "grad_norm": 0.3491341769695282,
+      "learning_rate": 0.0002268848167539267,
+      "loss": 2.7156,
+      "step": 100000
+    },
+    {
+      "epoch": 6.498102039503314,
+      "grad_norm": 0.3347100019454956,
+      "learning_rate": 0.00022580900810442514,
+      "loss": 2.714,
+      "step": 101000
+    },
+    {
+      "epoch": 6.562439683458791,
+      "grad_norm": 0.35210439562797546,
+      "learning_rate": 0.00022473427526357312,
+      "loss": 2.7194,
+      "step": 102000
+    },
+    {
+      "epoch": 6.6267773274142705,
+      "grad_norm": 0.3326897919178009,
+      "learning_rate": 0.00022365846661407155,
+      "loss": 2.727,
+      "step": 103000
+    },
+    {
+      "epoch": 6.691114971369748,
+      "grad_norm": 0.3269229531288147,
+      "learning_rate": 0.00022258265796457,
+      "loss": 2.7203,
+      "step": 104000
+    },
+    {
+      "epoch": 6.755452615325227,
+      "grad_norm": 0.34183254837989807,
+      "learning_rate": 0.00022150684931506847,
+      "loss": 2.7328,
+      "step": 105000
+    },
+    {
+      "epoch": 6.819790259280705,
+      "grad_norm": 0.33449244499206543,
+      "learning_rate": 0.00022043211647421643,
+      "loss": 2.7291,
+      "step": 106000
+    },
+    {
+      "epoch": 6.884127903236184,
+      "grad_norm": 0.33734798431396484,
+      "learning_rate": 0.0002193563078247149,
+      "loss": 2.7277,
+      "step": 107000
+    },
+    {
+      "epoch": 6.948465547191661,
+      "grad_norm": 0.34088870882987976,
+      "learning_rate": 0.00021828157498386284,
+      "loss": 2.7316,
+      "step": 108000
+    },
+    {
+      "epoch": 7.0,
+      "eval_accuracy": 0.42224443247932275,
+      "eval_loss": 2.958820104598999,
+      "eval_runtime": 113.0743,
+      "eval_samples_per_second": 463.766,
+      "eval_steps_per_second": 7.252,
+      "step": 108801
+    },
+    {
+      "epoch": 7.01280319114714,
+      "grad_norm": 0.3517482876777649,
+      "learning_rate": 0.0002172057663343613,
+      "loss": 2.7116,
+      "step": 109000
+    },
+    {
+      "epoch": 7.077140835102618,
+      "grad_norm": 0.3411085903644562,
+      "learning_rate": 0.00021613103349350926,
+      "loss": 2.6469,
+      "step": 110000
+    },
+    {
+      "epoch": 7.1414784790580965,
+      "grad_norm": 0.3486618399620056,
+      "learning_rate": 0.00021505522484400772,
+      "loss": 2.6546,
+      "step": 111000
+    },
+    {
+      "epoch": 7.205816123013575,
+      "grad_norm": 0.35618531703948975,
+      "learning_rate": 0.00021397941619450618,
+      "loss": 2.6603,
+      "step": 112000
+    },
+    {
+      "epoch": 7.270153766969053,
+      "grad_norm": 0.34740447998046875,
+      "learning_rate": 0.00021290468335365413,
+      "loss": 2.6632,
+      "step": 113000
+    },
+    {
+      "epoch": 7.334491410924532,
+      "grad_norm": 0.339108407497406,
+      "learning_rate": 0.0002118288747041526,
+      "loss": 2.6682,
+      "step": 114000
+    },
+    {
+      "epoch": 7.39882905488001,
+      "grad_norm": 0.36686399579048157,
+      "learning_rate": 0.00021075306605465105,
+      "loss": 2.6718,
+      "step": 115000
+    },
+    {
+      "epoch": 7.463166698835488,
+      "grad_norm": 0.3336213529109955,
+      "learning_rate": 0.000209678333213799,
+      "loss": 2.6806,
+      "step": 116000
+    },
+    {
+      "epoch": 7.527504342790967,
+      "grad_norm": 0.34256553649902344,
+      "learning_rate": 0.00020860252456429747,
+      "loss": 2.6772,
+      "step": 117000
+    },
+    {
+      "epoch": 7.591841986746445,
+      "grad_norm": 0.3527204096317291,
+      "learning_rate": 0.00020752671591479593,
+      "loss": 2.6786,
+      "step": 118000
+    },
+    {
+      "epoch": 7.656179630701923,
+      "grad_norm": 0.34285178780555725,
+      "learning_rate": 0.0002064509072652944,
+      "loss": 2.6816,
+      "step": 119000
+    },
+    {
+      "epoch": 7.720517274657402,
+      "grad_norm": 0.3418208658695221,
+      "learning_rate": 0.00020537617442444234,
+      "loss": 2.6893,
+      "step": 120000
+    },
+    {
+      "epoch": 7.78485491861288,
+      "grad_norm": 0.34486138820648193,
+      "learning_rate": 0.0002043003657749408,
+      "loss": 2.6847,
+      "step": 121000
+    },
+    {
+      "epoch": 7.8491925625683585,
+      "grad_norm": 0.348530650138855,
+      "learning_rate": 0.00020322563293408876,
+      "loss": 2.6826,
+      "step": 122000
+    },
+    {
+      "epoch": 7.913530206523837,
+      "grad_norm": 0.33808425068855286,
+      "learning_rate": 0.00020215090009323674,
+      "loss": 2.6905,
+      "step": 123000
+    },
+    {
+      "epoch": 7.977867850479315,
+      "grad_norm": 0.3486366868019104,
+      "learning_rate": 0.0002010750914437352,
+      "loss": 2.6909,
+      "step": 124000
+    },
+    {
+      "epoch": 8.0,
+      "eval_accuracy": 0.4232837528604119,
+      "eval_loss": 2.9554243087768555,
+      "eval_runtime": 111.9904,
+      "eval_samples_per_second": 468.254,
+      "eval_steps_per_second": 7.322,
+      "step": 124344
+    },
+    {
+      "epoch": 8.042205494434794,
+      "grad_norm": 0.35380104184150696,
+      "learning_rate": 0.00020000035860288316,
+      "loss": 2.6303,
+      "step": 125000
+    },
+    {
+      "epoch": 8.106543138390272,
+      "grad_norm": 0.3654320240020752,
+      "learning_rate": 0.00019892454995338162,
+      "loss": 2.6128,
+      "step": 126000
+    },
+    {
+      "epoch": 8.170880782345751,
+      "grad_norm": 0.3670574724674225,
+      "learning_rate": 0.00019784874130388008,
+      "loss": 2.617,
+      "step": 127000
+    },
+    {
+      "epoch": 8.235218426301229,
+      "grad_norm": 0.38059455156326294,
+      "learning_rate": 0.00019677400846302803,
+      "loss": 2.6274,
+      "step": 128000
+    },
+    {
+      "epoch": 8.299556070256708,
+      "grad_norm": 0.3698261082172394,
+      "learning_rate": 0.00019569927562217599,
+      "loss": 2.6309,
+      "step": 129000
+    },
+    {
+      "epoch": 8.363893714212185,
+      "grad_norm": 0.3583601117134094,
+      "learning_rate": 0.00019462346697267445,
+      "loss": 2.6312,
+      "step": 130000
+    },
+    {
+      "epoch": 8.428231358167665,
+      "grad_norm": 0.3602234721183777,
+      "learning_rate": 0.0001935476583231729,
+      "loss": 2.6368,
+      "step": 131000
+    },
+    {
+      "epoch": 8.492569002123142,
+      "grad_norm": 0.3441711664199829,
+      "learning_rate": 0.00019247184967367137,
+      "loss": 2.6372,
+      "step": 132000
+    },
+    {
+      "epoch": 8.556906646078621,
+      "grad_norm": 0.3533187508583069,
+      "learning_rate": 0.00019139604102416983,
+      "loss": 2.6443,
+      "step": 133000
+    },
+    {
+      "epoch": 8.621244290034099,
+      "grad_norm": 0.3579193651676178,
+      "learning_rate": 0.00019032130818331778,
+      "loss": 2.6481,
+      "step": 134000
+    },
+    {
+      "epoch": 8.685581933989578,
+      "grad_norm": 0.3524502217769623,
1024
+ "learning_rate": 0.00018924549953381624,
1025
+ "loss": 2.6509,
1026
+ "step": 135000
1027
+ },
1028
+ {
1029
+ "epoch": 8.749919577945056,
1030
+ "grad_norm": 0.36159747838974,
1031
+ "learning_rate": 0.0001881707666929642,
1032
+ "loss": 2.6456,
1033
+ "step": 136000
1034
+ },
1035
+ {
1036
+ "epoch": 8.814257221900533,
1037
+ "grad_norm": 0.34249147772789,
1038
+ "learning_rate": 0.00018709495804346266,
1039
+ "loss": 2.6538,
1040
+ "step": 137000
1041
+ },
1042
+ {
1043
+ "epoch": 8.878594865856012,
1044
+ "grad_norm": 0.34867429733276367,
1045
+ "learning_rate": 0.0001860202252026106,
1046
+ "loss": 2.6558,
1047
+ "step": 138000
1048
+ },
1049
+ {
1050
+ "epoch": 8.942932509811492,
1051
+ "grad_norm": 0.3351230025291443,
1052
+ "learning_rate": 0.00018494441655310907,
1053
+ "loss": 2.6504,
1054
+ "step": 139000
1055
+ },
1056
+ {
1057
+ "epoch": 9.0,
1058
+ "eval_accuracy": 0.4238085730096768,
1059
+ "eval_loss": 2.9544410705566406,
1060
+ "eval_runtime": 112.4181,
1061
+ "eval_samples_per_second": 466.473,
1062
+ "eval_steps_per_second": 7.294,
1063
+ "step": 139887
1064
+ },
1065
+ {
1066
+ "epoch": 9.007270153766969,
1067
+ "grad_norm": 0.36276528239250183,
1068
+ "learning_rate": 0.00018386968371225703,
1069
+ "loss": 2.6469,
1070
+ "step": 140000
1071
+ },
1072
+ {
1073
+ "epoch": 9.071607797722447,
1074
+ "grad_norm": 0.36368831992149353,
1075
+ "learning_rate": 0.0001827938750627555,
1076
+ "loss": 2.5666,
1077
+ "step": 141000
1078
+ },
1079
+ {
1080
+ "epoch": 9.135945441677926,
1081
+ "grad_norm": 0.36417004466056824,
1082
+ "learning_rate": 0.00018171806641325395,
1083
+ "loss": 2.5832,
1084
+ "step": 142000
1085
+ },
1086
+ {
1087
+ "epoch": 9.200283085633403,
1088
+ "grad_norm": 0.3550620973110199,
1089
+ "learning_rate": 0.0001806422577637524,
1090
+ "loss": 2.5888,
1091
+ "step": 143000
1092
+ },
1093
+ {
1094
+ "epoch": 9.264620729588882,
1095
+ "grad_norm": 0.3513035178184509,
1096
+ "learning_rate": 0.00017956644911425084,
1097
+ "loss": 2.5872,
1098
+ "step": 144000
1099
+ },
1100
+ {
1101
+ "epoch": 9.32895837354436,
1102
+ "grad_norm": 0.3576969802379608,
1103
+ "learning_rate": 0.00017849279208204832,
1104
+ "loss": 2.599,
1105
+ "step": 145000
1106
+ },
1107
+ {
1108
+ "epoch": 9.39329601749984,
1109
+ "grad_norm": 0.3496710956096649,
1110
+ "learning_rate": 0.00017741698343254678,
1111
+ "loss": 2.6042,
1112
+ "step": 146000
1113
+ },
1114
+ {
1115
+ "epoch": 9.457633661455317,
1116
+ "grad_norm": 0.3502206802368164,
1117
+ "learning_rate": 0.00017634225059169476,
1118
+ "loss": 2.6069,
1119
+ "step": 147000
1120
+ },
1121
+ {
1122
+ "epoch": 9.521971305410796,
1123
+ "grad_norm": 0.3516786992549896,
1124
+ "learning_rate": 0.00017526644194219322,
1125
+ "loss": 2.606,
1126
+ "step": 148000
1127
+ },
1128
+ {
1129
+ "epoch": 9.586308949366273,
1130
+ "grad_norm": 0.3671824336051941,
1131
+ "learning_rate": 0.00017419063329269168,
1132
+ "loss": 2.6151,
1133
+ "step": 149000
1134
+ },
1135
+ {
1136
+ "epoch": 9.650646593321753,
1137
+ "grad_norm": 0.36615684628486633,
1138
+ "learning_rate": 0.00017311590045183964,
1139
+ "loss": 2.6174,
1140
+ "step": 150000
1141
+ },
1142
+ {
1143
+ "epoch": 9.71498423727723,
1144
+ "grad_norm": 0.369759202003479,
1145
+ "learning_rate": 0.0001720400918023381,
1146
+ "loss": 2.6162,
1147
+ "step": 151000
1148
+ },
1149
+ {
1150
+ "epoch": 9.77932188123271,
1151
+ "grad_norm": 0.3495037257671356,
1152
+ "learning_rate": 0.00017096428315283656,
1153
+ "loss": 2.6186,
1154
+ "step": 152000
1155
+ },
1156
+ {
1157
+ "epoch": 9.843659525188187,
1158
+ "grad_norm": 0.3635868728160858,
1159
+ "learning_rate": 0.0001698895503119845,
1160
+ "loss": 2.616,
1161
+ "step": 153000
1162
+ },
1163
+ {
1164
+ "epoch": 9.907997169143666,
1165
+ "grad_norm": 0.352250337600708,
1166
+ "learning_rate": 0.00016881374166248297,
1167
+ "loss": 2.626,
1168
+ "step": 154000
1169
+ },
1170
+ {
1171
+ "epoch": 9.972334813099144,
1172
+ "grad_norm": 0.3688776195049286,
1173
+ "learning_rate": 0.00016773900882163093,
1174
+ "loss": 2.6246,
1175
+ "step": 155000
1176
+ },
1177
+ {
1178
+ "epoch": 10.0,
1179
+ "eval_accuracy": 0.424411016885778,
1180
+ "eval_loss": 2.9523308277130127,
1181
+ "eval_runtime": 112.0636,
1182
+ "eval_samples_per_second": 467.948,
1183
+ "eval_steps_per_second": 7.317,
1184
+ "step": 155430
1185
+ },
1186
+ {
1187
+ "epoch": 10.036672457054623,
1188
+ "grad_norm": 0.3961314558982849,
1189
+ "learning_rate": 0.0001666632001721294,
1190
+ "loss": 2.5827,
1191
+ "step": 156000
1192
+ },
1193
+ {
1194
+ "epoch": 10.1010101010101,
1195
+ "grad_norm": 0.3705954849720001,
1196
+ "learning_rate": 0.00016558739152262782,
1197
+ "loss": 2.5413,
1198
+ "step": 157000
1199
+ },
1200
+ {
1201
+ "epoch": 10.16534774496558,
1202
+ "grad_norm": 0.37091416120529175,
1203
+ "learning_rate": 0.00016451158287312628,
1204
+ "loss": 2.5502,
1205
+ "step": 158000
1206
+ },
1207
+ {
1208
+ "epoch": 10.229685388921057,
1209
+ "grad_norm": 0.38428565859794617,
1210
+ "learning_rate": 0.00016343685003227424,
1211
+ "loss": 2.5592,
1212
+ "step": 159000
1213
+ },
1214
+ {
1215
+ "epoch": 10.294023032876536,
1216
+ "grad_norm": 0.3688577115535736,
1217
+ "learning_rate": 0.0001623610413827727,
1218
+ "loss": 2.5673,
1219
+ "step": 160000
1220
+ },
1221
+ {
1222
+ "epoch": 10.358360676832014,
1223
+ "grad_norm": 0.38183775544166565,
1224
+ "learning_rate": 0.00016128630854192065,
1225
+ "loss": 2.5697,
1226
+ "step": 161000
1227
+ },
1228
+ {
1229
+ "epoch": 10.422698320787493,
1230
+ "grad_norm": 0.37677517533302307,
1231
+ "learning_rate": 0.0001602104998924191,
1232
+ "loss": 2.5713,
1233
+ "step": 162000
1234
+ },
1235
+ {
1236
+ "epoch": 10.48703596474297,
1237
+ "grad_norm": 0.3694332540035248,
1238
+ "learning_rate": 0.00015913576705156707,
1239
+ "loss": 2.5751,
1240
+ "step": 163000
1241
+ },
1242
+ {
1243
+ "epoch": 10.55137360869845,
1244
+ "grad_norm": 0.3814958333969116,
1245
+ "learning_rate": 0.00015806103421071502,
1246
+ "loss": 2.5792,
1247
+ "step": 164000
1248
+ },
1249
+ {
1250
+ "epoch": 10.615711252653927,
1251
+ "grad_norm": 0.38280004262924194,
1252
+ "learning_rate": 0.00015698522556121348,
1253
+ "loss": 2.5782,
1254
+ "step": 165000
1255
+ },
1256
+ {
1257
+ "epoch": 10.680048896609406,
1258
+ "grad_norm": 0.3659280240535736,
1259
+ "learning_rate": 0.00015590941691171194,
1260
+ "loss": 2.5862,
1261
+ "step": 166000
1262
+ },
1263
+ {
1264
+ "epoch": 10.744386540564884,
1265
+ "grad_norm": 0.34562841057777405,
1266
+ "learning_rate": 0.0001548336082622104,
1267
+ "loss": 2.5869,
1268
+ "step": 167000
1269
+ },
1270
+ {
1271
+ "epoch": 10.808724184520363,
1272
+ "grad_norm": 0.3570345938205719,
1273
+ "learning_rate": 0.00015375887542135836,
1274
+ "loss": 2.59,
1275
+ "step": 168000
1276
+ },
1277
+ {
1278
+ "epoch": 10.87306182847584,
1279
+ "grad_norm": 0.360215961933136,
1280
+ "learning_rate": 0.00015268306677185682,
1281
+ "loss": 2.5979,
1282
+ "step": 169000
1283
+ },
1284
+ {
1285
+ "epoch": 10.93739947243132,
1286
+ "grad_norm": 0.370670884847641,
1287
+ "learning_rate": 0.00015160725812235528,
1288
+ "loss": 2.5988,
1289
+ "step": 170000
1290
+ },
1291
+ {
1292
+ "epoch": 11.0,
1293
+ "eval_accuracy": 0.4248191770987571,
1294
+ "eval_loss": 2.9567785263061523,
1295
+ "eval_runtime": 112.9375,
1296
+ "eval_samples_per_second": 464.328,
1297
+ "eval_steps_per_second": 7.261,
1298
+ "step": 170973
1299
+ },
1300
+ {
1301
+ "epoch": 11.001737116386797,
1302
+ "grad_norm": 0.38218948245048523,
1303
+ "learning_rate": 0.00015053252528150323,
1304
+ "loss": 2.5933,
1305
+ "step": 171000
1306
+ },
1307
+ {
1308
+ "epoch": 11.066074760342277,
1309
+ "grad_norm": 0.396331787109375,
1310
+ "learning_rate": 0.00014945671663200172,
1311
+ "loss": 2.5023,
1312
+ "step": 172000
1313
+ },
1314
+ {
1315
+ "epoch": 11.130412404297754,
1316
+ "grad_norm": 0.3751789927482605,
1317
+ "learning_rate": 0.00014838090798250018,
1318
+ "loss": 2.5227,
1319
+ "step": 173000
1320
+ },
1321
+ {
1322
+ "epoch": 11.194750048253233,
1323
+ "grad_norm": 0.37265828251838684,
1324
+ "learning_rate": 0.00014730509933299864,
1325
+ "loss": 2.5299,
1326
+ "step": 174000
1327
+ },
1328
+ {
1329
+ "epoch": 11.25908769220871,
1330
+ "grad_norm": 0.37080228328704834,
1331
+ "learning_rate": 0.0001462303664921466,
1332
+ "loss": 2.5333,
1333
+ "step": 175000
1334
+ },
1335
+ {
1336
+ "epoch": 11.32342533616419,
1337
+ "grad_norm": 0.3808966875076294,
1338
+ "learning_rate": 0.00014515563365129455,
1339
+ "loss": 2.5376,
1340
+ "step": 176000
1341
+ },
1342
+ {
1343
+ "epoch": 11.387762980119668,
1344
+ "grad_norm": 0.38901346921920776,
1345
+ "learning_rate": 0.000144079825001793,
1346
+ "loss": 2.5422,
1347
+ "step": 177000
1348
+ },
1349
+ {
1350
+ "epoch": 11.452100624075147,
1351
+ "grad_norm": 0.380100816488266,
1352
+ "learning_rate": 0.00014300401635229144,
1353
+ "loss": 2.5533,
1354
+ "step": 178000
1355
+ },
1356
+ {
1357
+ "epoch": 11.516438268030624,
1358
+ "grad_norm": 0.39306920766830444,
1359
+ "learning_rate": 0.0001419282077027899,
1360
+ "loss": 2.5507,
1361
+ "step": 179000
1362
+ },
1363
+ {
1364
+ "epoch": 11.580775911986104,
1365
+ "grad_norm": 0.3917422890663147,
1366
+ "learning_rate": 0.00014085239905328836,
1367
+ "loss": 2.5579,
1368
+ "step": 180000
1369
+ },
1370
+ {
1371
+ "epoch": 11.645113555941581,
1372
+ "grad_norm": 0.38742849230766296,
1373
+ "learning_rate": 0.00013977766621243632,
1374
+ "loss": 2.5531,
1375
+ "step": 181000
1376
+ },
1377
+ {
1378
+ "epoch": 11.70945119989706,
1379
+ "grad_norm": 0.3767852187156677,
1380
+ "learning_rate": 0.00013870185756293478,
1381
+ "loss": 2.5633,
1382
+ "step": 182000
1383
+ },
1384
+ {
1385
+ "epoch": 11.773788843852538,
1386
+ "grad_norm": 0.39576900005340576,
1387
+ "learning_rate": 0.00013762604891343324,
1388
+ "loss": 2.5648,
1389
+ "step": 183000
1390
+ },
1391
+ {
1392
+ "epoch": 11.838126487808017,
1393
+ "grad_norm": 0.37659791111946106,
1394
+ "learning_rate": 0.00013655131607258122,
1395
+ "loss": 2.5631,
1396
+ "step": 184000
1397
+ },
1398
+ {
1399
+ "epoch": 11.902464131763494,
1400
+ "grad_norm": 0.38377416133880615,
1401
+ "learning_rate": 0.00013547658323172918,
1402
+ "loss": 2.5631,
1403
+ "step": 185000
1404
+ },
1405
+ {
1406
+ "epoch": 11.966801775718974,
1407
+ "grad_norm": 0.37857234477996826,
1408
+ "learning_rate": 0.00013440077458222764,
1409
+ "loss": 2.5639,
1410
+ "step": 186000
1411
+ },
1412
+ {
1413
+ "epoch": 12.0,
1414
+ "eval_accuracy": 0.4247610714766456,
1415
+ "eval_loss": 2.9595353603363037,
1416
+ "eval_runtime": 112.0415,
1417
+ "eval_samples_per_second": 468.041,
1418
+ "eval_steps_per_second": 7.319,
1419
+ "step": 186516
1420
+ },
1421
+ {
1422
+ "epoch": 12.031139419674451,
1423
+ "grad_norm": 0.4024442136287689,
1424
+ "learning_rate": 0.0001333249659327261,
1425
+ "loss": 2.5273,
1426
+ "step": 187000
1427
+ },
1428
+ {
1429
+ "epoch": 12.09547706362993,
1430
+ "grad_norm": 0.4137458801269531,
1431
+ "learning_rate": 0.00013225023309187405,
1432
+ "loss": 2.4933,
1433
+ "step": 188000
1434
+ },
1435
+ {
1436
+ "epoch": 12.159814707585408,
1437
+ "grad_norm": 0.409184992313385,
1438
+ "learning_rate": 0.0001311744244423725,
1439
+ "loss": 2.4967,
1440
+ "step": 189000
1441
+ },
1442
+ {
1443
+ "epoch": 12.224152351540887,
1444
+ "grad_norm": 0.41316309571266174,
1445
+ "learning_rate": 0.00013009861579287097,
1446
+ "loss": 2.5063,
1447
+ "step": 190000
1448
+ },
1449
+ {
1450
+ "epoch": 12.288489995496365,
1451
+ "grad_norm": 0.3909110724925995,
1452
+ "learning_rate": 0.00012902280714336943,
1453
+ "loss": 2.5153,
1454
+ "step": 191000
1455
+ },
1456
+ {
1457
+ "epoch": 12.352827639451844,
1458
+ "grad_norm": 0.39046111702919006,
1459
+ "learning_rate": 0.0001279469984938679,
1460
+ "loss": 2.5115,
1461
+ "step": 192000
1462
+ },
1463
+ {
1464
+ "epoch": 12.417165283407321,
1465
+ "grad_norm": 0.40070855617523193,
1466
+ "learning_rate": 0.00012687226565301585,
1467
+ "loss": 2.5157,
1468
+ "step": 193000
1469
+ },
1470
+ {
1471
+ "epoch": 12.4815029273628,
1472
+ "grad_norm": 0.3970703184604645,
1473
+ "learning_rate": 0.00012579645700351428,
1474
+ "loss": 2.5198,
1475
+ "step": 194000
1476
+ },
1477
+ {
1478
+ "epoch": 12.545840571318278,
1479
+ "grad_norm": 0.40202242136001587,
1480
+ "learning_rate": 0.00012472064835401274,
1481
+ "loss": 2.526,
1482
+ "step": 195000
1483
+ },
1484
+ {
1485
+ "epoch": 12.610178215273757,
1486
+ "grad_norm": 0.3841732144355774,
1487
+ "learning_rate": 0.0001236459155131607,
1488
+ "loss": 2.5295,
1489
+ "step": 196000
1490
+ },
1491
+ {
1492
+ "epoch": 12.674515859229235,
1493
+ "grad_norm": 0.40759024024009705,
1494
+ "learning_rate": 0.00012257010686365916,
1495
+ "loss": 2.5307,
1496
+ "step": 197000
1497
+ },
1498
+ {
1499
+ "epoch": 12.738853503184714,
1500
+ "grad_norm": 0.3963831663131714,
1501
+ "learning_rate": 0.00012149429821415763,
1502
+ "loss": 2.534,
1503
+ "step": 198000
1504
+ },
1505
+ {
1506
+ "epoch": 12.803191147140192,
1507
+ "grad_norm": 0.37255486845970154,
1508
+ "learning_rate": 0.0001204195653733056,
1509
+ "loss": 2.5354,
1510
+ "step": 199000
1511
+ },
1512
+ {
1513
+ "epoch": 12.86752879109567,
1514
+ "grad_norm": 0.397368460893631,
1515
+ "learning_rate": 0.00011934375672380406,
1516
+ "loss": 2.5352,
1517
+ "step": 200000
1518
+ },
1519
+ {
1520
+ "epoch": 12.931866435051148,
1521
+ "grad_norm": 0.379574716091156,
1522
+ "learning_rate": 0.00011826902388295201,
1523
+ "loss": 2.5397,
1524
+ "step": 201000
1525
+ },
1526
+ {
1527
+ "epoch": 12.996204079006628,
1528
+ "grad_norm": 0.3803842067718506,
1529
+ "learning_rate": 0.00011719321523345048,
1530
+ "loss": 2.5361,
1531
+ "step": 202000
1532
+ },
1533
+ {
1534
+ "epoch": 13.0,
1535
+ "eval_accuracy": 0.42475613586395655,
1536
+ "eval_loss": 2.9698119163513184,
1537
+ "eval_runtime": 112.2708,
1538
+ "eval_samples_per_second": 467.085,
1539
+ "eval_steps_per_second": 7.304,
1540
+ "step": 202059
1541
+ },
1542
+ {
1543
+ "epoch": 13.060541722962105,
1544
+ "grad_norm": 0.40816885232925415,
1545
+ "learning_rate": 0.00011611740658394894,
1546
+ "loss": 2.4669,
1547
+ "step": 203000
1548
+ },
1549
+ {
1550
+ "epoch": 13.124879366917584,
1551
+ "grad_norm": 0.42818671464920044,
1552
+ "learning_rate": 0.00011504159793444738,
1553
+ "loss": 2.467,
1554
+ "step": 204000
1555
+ },
1556
+ {
1557
+ "epoch": 13.189217010873062,
1558
+ "grad_norm": 0.40255987644195557,
1559
+ "learning_rate": 0.00011396686509359535,
1560
+ "loss": 2.4753,
1561
+ "step": 205000
1562
+ },
1563
+ {
1564
+ "epoch": 13.253554654828541,
1565
+ "grad_norm": 0.4254453778266907,
1566
+ "learning_rate": 0.0001128921322527433,
1567
+ "loss": 2.4808,
1568
+ "step": 206000
1569
+ },
1570
+ {
1571
+ "epoch": 13.317892298784018,
1572
+ "grad_norm": 0.4060657322406769,
1573
+ "learning_rate": 0.00011181632360324175,
1574
+ "loss": 2.4932,
1575
+ "step": 207000
1576
+ },
1577
+ {
1578
+ "epoch": 13.382229942739498,
1579
+ "grad_norm": 0.4138365387916565,
1580
+ "learning_rate": 0.00011074051495374021,
1581
+ "loss": 2.4922,
1582
+ "step": 208000
1583
+ },
1584
+ {
1585
+ "epoch": 13.446567586694975,
1586
+ "grad_norm": 0.4098254442214966,
1587
+ "learning_rate": 0.00010966578211288817,
1588
+ "loss": 2.4948,
1589
+ "step": 209000
1590
+ },
1591
+ {
1592
+ "epoch": 13.510905230650454,
1593
+ "grad_norm": 0.4242159128189087,
1594
+ "learning_rate": 0.00010858997346338663,
1595
+ "loss": 2.5012,
1596
+ "step": 210000
1597
+ },
1598
+ {
1599
+ "epoch": 13.575242874605932,
1600
+ "grad_norm": 0.42177829146385193,
1601
+ "learning_rate": 0.00010751416481388509,
1602
+ "loss": 2.4998,
1603
+ "step": 211000
1604
+ },
1605
+ {
1606
+ "epoch": 13.63958051856141,
1607
+ "grad_norm": 0.4196189045906067,
1608
+ "learning_rate": 0.00010643943197303304,
1609
+ "loss": 2.5048,
1610
+ "step": 212000
1611
+ },
1612
+ {
1613
+ "epoch": 13.703918162516889,
1614
+ "grad_norm": 0.3965640366077423,
1615
+ "learning_rate": 0.0001053636233235315,
1616
+ "loss": 2.5092,
1617
+ "step": 213000
1618
+ },
1619
+ {
1620
+ "epoch": 13.768255806472368,
1621
+ "grad_norm": 0.39778339862823486,
1622
+ "learning_rate": 0.00010428781467402996,
1623
+ "loss": 2.5121,
1624
+ "step": 214000
1625
+ },
1626
+ {
1627
+ "epoch": 13.832593450427845,
1628
+ "grad_norm": 0.40292391180992126,
1629
+ "learning_rate": 0.00010321308183317793,
1630
+ "loss": 2.5119,
1631
+ "step": 215000
1632
+ },
1633
+ {
1634
+ "epoch": 13.896931094383323,
1635
+ "grad_norm": 0.41673198342323303,
1636
+ "learning_rate": 0.00010213727318367639,
1637
+ "loss": 2.5112,
1638
+ "step": 216000
1639
+ },
1640
+ {
1641
+ "epoch": 13.961268738338802,
1642
+ "grad_norm": 0.40400612354278564,
1643
+ "learning_rate": 0.00010106254034282435,
1644
+ "loss": 2.5098,
1645
+ "step": 217000
1646
+ },
1647
+ {
1648
+ "epoch": 14.0,
1649
+ "eval_accuracy": 0.424743796832234,
1650
+ "eval_loss": 2.9747180938720703,
1651
+ "eval_runtime": 112.0802,
1652
+ "eval_samples_per_second": 467.879,
1653
+ "eval_steps_per_second": 7.316,
1654
+ "step": 217602
1655
+ },
1656
+ {
1657
+ "epoch": 14.02560638229428,
1658
+ "grad_norm": 0.40745100378990173,
1659
+ "learning_rate": 9.998673169332281e-05,
1660
+ "loss": 2.4894,
1661
+ "step": 218000
1662
+ },
1663
+ {
1664
+ "epoch": 14.089944026249759,
1665
+ "grad_norm": 0.42399463057518005,
1666
+ "learning_rate": 9.891092304382127e-05,
1667
+ "loss": 2.449,
1668
+ "step": 219000
1669
+ },
1670
+ {
1671
+ "epoch": 14.154281670205236,
1672
+ "grad_norm": 0.4149724841117859,
1673
+ "learning_rate": 9.783511439431973e-05,
1674
+ "loss": 2.4534,
1675
+ "step": 220000
1676
+ },
1677
+ {
1678
+ "epoch": 14.218619314160716,
1679
+ "grad_norm": 0.40756285190582275,
1680
+ "learning_rate": 9.676145736211718e-05,
1681
+ "loss": 2.4576,
1682
+ "step": 221000
1683
+ },
1684
+ {
1685
+ "epoch": 14.282956958116193,
1686
+ "grad_norm": 0.4224795997142792,
1687
+ "learning_rate": 9.568564871261564e-05,
1688
+ "loss": 2.4584,
1689
+ "step": 222000
1690
+ },
1691
+ {
1692
+ "epoch": 14.347294602071672,
1693
+ "grad_norm": 0.41213053464889526,
1694
+ "learning_rate": 9.461091587176359e-05,
1695
+ "loss": 2.4707,
1696
+ "step": 223000
1697
+ },
1698
+ {
1699
+ "epoch": 14.41163224602715,
1700
+ "grad_norm": 0.4161031246185303,
1701
+ "learning_rate": 9.353510722226205e-05,
1702
+ "loss": 2.4701,
1703
+ "step": 224000
1704
+ },
1705
+ {
1706
+ "epoch": 14.475969889982629,
1707
+ "grad_norm": 0.42417025566101074,
1708
+ "learning_rate": 9.245929857276051e-05,
1709
+ "loss": 2.4706,
1710
+ "step": 225000
1711
+ },
1712
+ {
1713
+ "epoch": 14.540307533938106,
1714
+ "grad_norm": 0.4227360785007477,
1715
+ "learning_rate": 9.138348992325897e-05,
1716
+ "loss": 2.4678,
1717
+ "step": 226000
1718
+ },
1719
+ {
1720
+ "epoch": 14.604645177893586,
1721
+ "grad_norm": 0.3956305682659149,
1722
+ "learning_rate": 9.030768127375742e-05,
1723
+ "loss": 2.4816,
1724
+ "step": 227000
1725
+ },
1726
+ {
1727
+ "epoch": 14.668982821849063,
1728
+ "grad_norm": 0.42013561725616455,
1729
+ "learning_rate": 8.92329484329054e-05,
1730
+ "loss": 2.4791,
1731
+ "step": 228000
1732
+ },
1733
+ {
1734
+ "epoch": 14.733320465804542,
1735
+ "grad_norm": 0.41232335567474365,
1736
+ "learning_rate": 8.815713978340386e-05,
1737
+ "loss": 2.4861,
1738
+ "step": 229000
1739
+ },
1740
+ {
1741
+ "epoch": 14.79765810976002,
1742
+ "grad_norm": 0.398253858089447,
1743
+ "learning_rate": 8.708240694255182e-05,
1744
+ "loss": 2.4857,
1745
+ "step": 230000
1746
+ },
1747
+ {
1748
+ "epoch": 14.8619957537155,
1749
+ "grad_norm": 0.41056615114212036,
1750
+ "learning_rate": 8.600659829305028e-05,
1751
+ "loss": 2.4826,
1752
+ "step": 231000
1753
+ },
1754
+ {
1755
+ "epoch": 14.926333397670977,
1756
+ "grad_norm": 0.4065124988555908,
1757
+ "learning_rate": 8.493186545219823e-05,
1758
+ "loss": 2.4791,
1759
+ "step": 232000
1760
+ },
1761
+ {
1762
+ "epoch": 14.990671041626456,
1763
+ "grad_norm": 0.42194780707359314,
1764
+ "learning_rate": 8.385605680269669e-05,
1765
+ "loss": 2.4899,
1766
+ "step": 233000
1767
+ },
1768
+ {
1769
+ "epoch": 15.0,
1770
+ "eval_accuracy": 0.4246625087868862,
1771
+ "eval_loss": 2.9792003631591797,
1772
+ "eval_runtime": 112.3403,
1773
+ "eval_samples_per_second": 466.796,
1774
+ "eval_steps_per_second": 7.299,
1775
+ "step": 233145
1776
+ },
1777
+ {
1778
+ "epoch": 15.055008685581933,
1779
+ "grad_norm": 0.444181889295578,
1780
+ "learning_rate": 8.278024815319514e-05,
1781
+ "loss": 2.4309,
1782
+ "step": 234000
1783
+ },
1784
+ {
1785
+ "epoch": 15.119346329537413,
1786
+ "grad_norm": 0.4177301526069641,
1787
+ "learning_rate": 8.17044395036936e-05,
1788
+ "loss": 2.4254,
1789
+ "step": 235000
1790
+ },
1791
+ {
1792
+ "epoch": 15.18368397349289,
1793
+ "grad_norm": 0.43864157795906067,
1794
+ "learning_rate": 8.062970666284155e-05,
1795
+ "loss": 2.432,
1796
+ "step": 236000
1797
+ },
1798
+ {
1799
+ "epoch": 15.24802161744837,
1800
+ "grad_norm": 0.43071264028549194,
1801
+ "learning_rate": 7.955497382198951e-05,
1802
+ "loss": 2.4372,
1803
+ "step": 237000
1804
+ },
1805
+ {
1806
+ "epoch": 15.312359261403847,
1807
+ "grad_norm": 0.44551989436149597,
1808
+ "learning_rate": 7.847916517248797e-05,
1809
+ "loss": 2.4441,
1810
+ "step": 238000
1811
+ },
1812
+ {
1813
+ "epoch": 15.376696905359326,
1814
+ "grad_norm": 0.42598387598991394,
1815
+ "learning_rate": 7.740335652298643e-05,
1816
+ "loss": 2.4448,
1817
+ "step": 239000
1818
+ },
1819
+ {
1820
+ "epoch": 15.441034549314804,
1821
+ "grad_norm": 0.4412069618701935,
1822
+ "learning_rate": 7.632754787348489e-05,
1823
+ "loss": 2.4481,
1824
+ "step": 240000
1825
+ },
1826
+ {
1827
+ "epoch": 15.505372193270283,
1828
+ "grad_norm": 0.4257245361804962,
1829
+ "learning_rate": 7.525173922398335e-05,
1830
+ "loss": 2.4496,
1831
+ "step": 241000
1832
+ },
1833
+ {
1834
+ "epoch": 15.56970983722576,
1835
+ "grad_norm": 0.4463740885257721,
1836
+ "learning_rate": 7.417593057448181e-05,
1837
+ "loss": 2.4583,
1838
+ "step": 242000
1839
+ },
1840
+ {
1841
+ "epoch": 15.63404748118124,
1842
+ "grad_norm": 0.40843266248703003,
1843
+ "learning_rate": 7.310119773362977e-05,
1844
+ "loss": 2.4549,
1845
+ "step": 243000
1846
+ },
1847
+ {
1848
+ "epoch": 15.698385125136717,
1849
+ "grad_norm": 0.43823161721229553,
1850
+ "learning_rate": 7.202538908412823e-05,
1851
+ "loss": 2.4565,
1852
+ "step": 244000
1853
+ },
1854
+ {
1855
+ "epoch": 15.762722769092196,
1856
+ "grad_norm": 0.4224304258823395,
1857
+ "learning_rate": 7.09506562432762e-05,
1858
+ "loss": 2.4664,
1859
+ "step": 245000
1860
+ },
1861
+ {
1862
+ "epoch": 15.827060413047674,
1863
+ "grad_norm": 0.42779698967933655,
1864
+ "learning_rate": 6.987484759377464e-05,
1865
+ "loss": 2.4607,
1866
+ "step": 246000
1867
+ },
1868
+ {
1869
+ "epoch": 15.891398057003153,
1870
+ "grad_norm": 0.41904374957084656,
1871
+ "learning_rate": 6.880011475292261e-05,
1872
+ "loss": 2.463,
1873
+ "step": 247000
1874
+ },
1875
+ {
1876
+ "epoch": 15.95573570095863,
1877
+ "grad_norm": 0.4636126458644867,
1878
+ "learning_rate": 6.772430610342107e-05,
1879
+ "loss": 2.4626,
1880
+ "step": 248000
1881
+ },
1882
+ {
1883
+ "epoch": 16.0,
1884
+ "eval_accuracy": 0.4244173733566653,
1885
+ "eval_loss": 2.9882283210754395,
1886
+ "eval_runtime": 112.1119,
1887
+ "eval_samples_per_second": 467.747,
1888
+ "eval_steps_per_second": 7.314,
1889
+ "step": 248688
1890
+ },
1891
+ {
1892
+ "epoch": 16.02007334491411,
1893
+ "grad_norm": 0.44689562916755676,
1894
+ "learning_rate": 6.664849745391953e-05,
1895
+ "loss": 2.4432,
1896
+ "step": 249000
1897
+ },
1898
+ {
1899
+ "epoch": 16.08441098886959,
1900
+ "grad_norm": 0.45889049768447876,
1901
+ "learning_rate": 6.557376461306749e-05,
1902
+ "loss": 2.4048,
1903
+ "step": 250000
1904
+ },
1905
+ {
1906
+ "epoch": 16.148748632825065,
1907
+ "grad_norm": 0.4538269639015198,
1908
+ "learning_rate": 6.449795596356593e-05,
1909
+ "loss": 2.4123,
1910
+ "step": 251000
1911
+ },
1912
+ {
1913
+ "epoch": 16.213086276780544,
1914
+ "grad_norm": 0.44775742292404175,
1915
+ "learning_rate": 6.342214731406439e-05,
1916
+ "loss": 2.4137,
1917
+ "step": 252000
1918
+ },
1919
+ {
1920
+ "epoch": 16.277423920736023,
1921
+ "grad_norm": 0.4506843090057373,
1922
+ "learning_rate": 6.234741447321236e-05,
1923
+ "loss": 2.4152,
1924
+ "step": 253000
1925
+ },
1926
+ {
1927
+ "epoch": 16.341761564691502,
1928
+ "grad_norm": 0.4564642310142517,
1929
+ "learning_rate": 6.127160582371082e-05,
1930
+ "loss": 2.4236,
1931
+ "step": 254000
1932
+ },
1933
+ {
1934
+ "epoch": 16.406099208646978,
1935
+ "grad_norm": 0.4492376744747162,
1936
+ "learning_rate": 6.0195797174209275e-05,
1937
+ "loss": 2.4222,
1938
+ "step": 255000
1939
+ },
1940
+ {
1941
+ "epoch": 16.470436852602457,
1942
+ "grad_norm": 0.44002753496170044,
1943
+ "learning_rate": 5.9119988524707736e-05,
1944
+ "loss": 2.4277,
1945
+ "step": 256000
1946
+ },
1947
+ {
1948
+ "epoch": 16.534774496557937,
1949
+ "grad_norm": 0.437580406665802,
1950
+ "learning_rate": 5.8044179875206196e-05,
1951
+ "loss": 2.4303,
1952
+ "step": 257000
1953
+ },
1954
+ {
1955
+ "epoch": 16.599112140513416,
1956
+ "grad_norm": 0.42502424120903015,
1957
+ "learning_rate": 5.697052284300365e-05,
1958
+ "loss": 2.4359,
1959
+ "step": 258000
1960
+ },
1961
+ {
1962
+ "epoch": 16.66344978446889,
1963
+ "grad_norm": 0.44441190361976624,
1964
+ "learning_rate": 5.5894714193502106e-05,
1965
+ "loss": 2.4306,
1966
+ "step": 259000
1967
+ },
1968
+ {
1969
+ "epoch": 16.72778742842437,
1970
+ "grad_norm": 0.4539526700973511,
1971
+ "learning_rate": 5.4818905544000566e-05,
1972
+ "loss": 2.4342,
1973
+ "step": 260000
1974
+ },
1975
+ {
1976
+ "epoch": 16.79212507237985,
1977
+ "grad_norm": 0.4554595947265625,
1978
+ "learning_rate": 5.374417270314853e-05,
1979
+ "loss": 2.4388,
1980
+ "step": 261000
1981
+ },
1982
+ {
1983
+ "epoch": 16.85646271633533,
1984
+ "grad_norm": 0.4573330283164978,
1985
+ "learning_rate": 5.266836405364699e-05,
1986
+ "loss": 2.441,
1987
+ "step": 262000
1988
+ },
1989
+ {
1990
+ "epoch": 16.920800360290805,
1991
+ "grad_norm": 0.449770450592041,
1992
+ "learning_rate": 5.159255540414545e-05,
1993
+ "loss": 2.4411,
1994
+ "step": 263000
1995
+ },
1996
+ {
1997
+ "epoch": 16.985138004246284,
1998
+ "grad_norm": 0.48139625787734985,
1999
+ "learning_rate": 5.05178225632934e-05,
2000
+ "loss": 2.4399,
2001
+ "step": 264000
2002
+ },
2003
+ {
2004
+ "epoch": 17.0,
2005
+ "eval_accuracy": 0.4242649676193895,
2006
+ "eval_loss": 2.9961202144622803,
2007
+ "eval_runtime": 112.3106,
2008
+ "eval_samples_per_second": 466.919,
2009
+ "eval_steps_per_second": 7.301,
2010
+ "step": 264231
2011
+ },
2012
+ {
2013
+ "epoch": 17.049475648201764,
2014
+ "grad_norm": 0.4543195366859436,
2015
+ "learning_rate": 4.9442013913791863e-05,
2016
+ "loss": 2.4013,
2017
+ "step": 265000
2018
+ },
2019
+ {
2020
+ "epoch": 17.113813292157243,
2021
+ "grad_norm": 0.4699794054031372,
2022
+ "learning_rate": 4.836620526429032e-05,
2023
+ "loss": 2.3928,
2024
+ "step": 266000
2025
+ },
2026
+ {
2027
+ "epoch": 17.17815093611272,
2028
+ "grad_norm": 0.4636929929256439,
2029
+ "learning_rate": 4.7291472423438285e-05,
2030
+ "loss": 2.3989,
2031
+ "step": 267000
2032
+ },
2033
+ {
2034
+ "epoch": 17.242488580068198,
2035
+ "grad_norm": 0.4614698886871338,
2036
+ "learning_rate": 4.6215663773936746e-05,
2037
+ "loss": 2.4004,
2038
+ "step": 268000
2039
+ },
2040
+ {
2041
+ "epoch": 17.306826224023677,
2042
+ "grad_norm": 0.46002906560897827,
2043
+ "learning_rate": 4.513985512443519e-05,
2044
+ "loss": 2.3982,
2045
+ "step": 269000
2046
+ },
2047
+ {
2048
+ "epoch": 17.371163867979156,
2049
+ "grad_norm": 0.42619064450263977,
2050
+ "learning_rate": 4.4065122283583154e-05,
2051
+ "loss": 2.4053,
2052
+ "step": 270000
2053
+ },
2054
+ {
2055
+ "epoch": 17.435501511934632,
2056
+ "grad_norm": 0.45975300669670105,
2057
+ "learning_rate": 4.2989313634081614e-05,
2058
+ "loss": 2.4041,
2059
+ "step": 271000
2060
+ },
2061
+ {
2062
+ "epoch": 17.49983915589011,
2063
+ "grad_norm": 0.4545740485191345,
2064
+ "learning_rate": 4.1913504984580075e-05,
2065
+ "loss": 2.406,
2066
+ "step": 272000
2067
+ },
2068
+ {
2069
+ "epoch": 17.56417679984559,
2070
+ "grad_norm": 0.458011269569397,
2071
+ "learning_rate": 4.083769633507853e-05,
2072
+ "loss": 2.4168,
2073
+ "step": 273000
2074
+ },
2075
+ {
2076
+ "epoch": 17.62851444380107,
2077
+ "grad_norm": 0.4604107439517975,
2078
+ "learning_rate": 3.976296349422649e-05,
2079
+ "loss": 2.411,
2080
+ "step": 274000
2081
+ },
2082
+ {
2083
+ "epoch": 17.692852087756545,
2084
+ "grad_norm": 0.4420773684978485,
2085
+ "learning_rate": 3.8687154844724944e-05,
2086
+ "loss": 2.4144,
2087
+ "step": 275000
2088
+ },
2089
+ {
2090
+ "epoch": 17.757189731712025,
2091
+ "grad_norm": 0.45774900913238525,
2092
+ "learning_rate": 3.7611346195223404e-05,
2093
+ "loss": 2.412,
2094
+ "step": 276000
2095
+ },
2096
+ {
2097
+ "epoch": 17.821527375667504,
2098
+ "grad_norm": 0.4509606659412384,
2099
+ "learning_rate": 3.6536613354371366e-05,
2100
+ "loss": 2.4086,
2101
+ "step": 277000
2102
+ },
2103
+ {
2104
+ "epoch": 17.885865019622983,
2105
+ "grad_norm": 0.4442935883998871,
2106
+ "learning_rate": 3.5460804704869826e-05,
2107
+ "loss": 2.4134,
2108
+ "step": 278000
2109
+ },
2110
+ {
2111
+ "epoch": 17.95020266357846,
2112
+ "grad_norm": 0.42292436957359314,
2113
+ "learning_rate": 3.438607186401778e-05,
2114
+ "loss": 2.4186,
2115
+ "step": 279000
2116
+ },
2117
+ {
2118
+ "epoch": 18.0,
2119
+ "eval_accuracy": 0.42388941236296196,
2120
+ "eval_loss": 3.0051016807556152,
2121
+ "eval_runtime": 112.3705,
2122
+ "eval_samples_per_second": 466.67,
2123
+ "eval_steps_per_second": 7.297,
2124
+ "step": 279774
2125
+ },
2126
+ {
+ "epoch": 18.014540307533938,
+ "grad_norm": 0.48824623227119446,
+ "learning_rate": 3.331026321451624e-05,
+ "loss": 2.4055,
+ "step": 280000
+ },
+ {
+ "epoch": 18.078877951489417,
+ "grad_norm": 0.46934977173805237,
+ "learning_rate": 3.22355303736642e-05,
+ "loss": 2.3736,
+ "step": 281000
+ },
+ {
+ "epoch": 18.143215595444893,
+ "grad_norm": 0.5045217275619507,
+ "learning_rate": 3.115972172416266e-05,
+ "loss": 2.382,
+ "step": 282000
+ },
+ {
+ "epoch": 18.207553239400372,
+ "grad_norm": 0.46461954712867737,
+ "learning_rate": 3.008391307466112e-05,
+ "loss": 2.3806,
+ "step": 283000
+ },
+ {
+ "epoch": 18.27189088335585,
+ "grad_norm": 0.4565331041812897,
+ "learning_rate": 2.9009180233809078e-05,
+ "loss": 2.3813,
+ "step": 284000
+ },
+ {
+ "epoch": 18.33622852731133,
+ "grad_norm": 0.4561784863471985,
+ "learning_rate": 2.793337158430754e-05,
+ "loss": 2.3863,
+ "step": 285000
+ },
+ {
+ "epoch": 18.400566171266806,
+ "grad_norm": 0.4438989758491516,
+ "learning_rate": 2.6858638743455493e-05,
+ "loss": 2.3845,
+ "step": 286000
+ },
+ {
+ "epoch": 18.464903815222286,
+ "grad_norm": 0.461086630821228,
+ "learning_rate": 2.578283009395395e-05,
+ "loss": 2.3833,
+ "step": 287000
+ },
+ {
+ "epoch": 18.529241459177765,
+ "grad_norm": 0.4639764726161957,
+ "learning_rate": 2.470702144445241e-05,
+ "loss": 2.3918,
+ "step": 288000
+ },
+ {
+ "epoch": 18.593579103133244,
+ "grad_norm": 0.4645422697067261,
+ "learning_rate": 2.3631212794950868e-05,
+ "loss": 2.3953,
+ "step": 289000
+ },
+ {
+ "epoch": 18.65791674708872,
+ "grad_norm": 0.47392553091049194,
+ "learning_rate": 2.2555404145449328e-05,
+ "loss": 2.3829,
+ "step": 290000
+ },
+ {
+ "epoch": 18.7222543910442,
+ "grad_norm": 0.4530762732028961,
+ "learning_rate": 2.148174711324679e-05,
+ "loss": 2.3904,
+ "step": 291000
+ },
+ {
+ "epoch": 18.78659203499968,
+ "grad_norm": 0.47473639249801636,
+ "learning_rate": 2.0405938463745248e-05,
+ "loss": 2.3966,
+ "step": 292000
+ },
+ {
+ "epoch": 18.850929678955158,
+ "grad_norm": 0.43500351905822754,
+ "learning_rate": 1.9330129814243705e-05,
+ "loss": 2.396,
+ "step": 293000
+ },
+ {
+ "epoch": 18.915267322910633,
+ "grad_norm": 0.45157596468925476,
+ "learning_rate": 1.8254321164742165e-05,
+ "loss": 2.3959,
+ "step": 294000
+ },
+ {
+ "epoch": 18.979604966866113,
+ "grad_norm": 0.4546051621437073,
+ "learning_rate": 1.7180664132539624e-05,
+ "loss": 2.3869,
+ "step": 295000
+ },
+ {
+ "epoch": 19.0,
+ "eval_accuracy": 0.42373925008599933,
+ "eval_loss": 3.011887311935425,
+ "eval_runtime": 112.2632,
+ "eval_samples_per_second": 467.116,
+ "eval_steps_per_second": 7.304,
+ "step": 295317
+ },
+ {
+ "epoch": 19.043942610821592,
+ "grad_norm": 0.46901893615722656,
+ "learning_rate": 1.610485548303808e-05,
+ "loss": 2.3726,
+ "step": 296000
+ },
+ {
+ "epoch": 19.10828025477707,
+ "grad_norm": 0.43862438201904297,
+ "learning_rate": 1.502904683353654e-05,
+ "loss": 2.3688,
+ "step": 297000
+ },
+ {
+ "epoch": 19.172617898732547,
+ "grad_norm": 0.4580424427986145,
+ "learning_rate": 1.3954313992684501e-05,
+ "loss": 2.3682,
+ "step": 298000
+ },
+ {
+ "epoch": 19.236955542688026,
+ "grad_norm": 0.47557470202445984,
+ "learning_rate": 1.2878505343182957e-05,
+ "loss": 2.3687,
+ "step": 299000
+ },
+ {
+ "epoch": 19.301293186643505,
+ "grad_norm": 0.48615992069244385,
+ "learning_rate": 1.1802696693681415e-05,
+ "loss": 2.3636,
+ "step": 300000
+ },
+ {
+ "epoch": 19.365630830598985,
+ "grad_norm": 0.5019800662994385,
+ "learning_rate": 1.0726888044179874e-05,
+ "loss": 2.3668,
+ "step": 301000
+ },
+ {
+ "epoch": 19.42996847455446,
+ "grad_norm": 0.4481401741504669,
+ "learning_rate": 9.652155203327834e-06,
+ "loss": 2.3721,
+ "step": 302000
+ },
+ {
+ "epoch": 19.49430611850994,
+ "grad_norm": 0.4632056653499603,
+ "learning_rate": 8.577422362475793e-06,
+ "loss": 2.372,
+ "step": 303000
+ },
+ {
+ "epoch": 19.55864376246542,
+ "grad_norm": 0.4590476453304291,
+ "learning_rate": 7.5016137129742514e-06,
+ "loss": 2.3725,
+ "step": 304000
+ },
+ {
+ "epoch": 19.622981406420898,
+ "grad_norm": 0.4774569272994995,
+ "learning_rate": 6.42580506347271e-06,
+ "loss": 2.37,
+ "step": 305000
+ },
+ {
+ "epoch": 19.687319050376374,
+ "grad_norm": 0.47048863768577576,
+ "learning_rate": 5.351072222620669e-06,
+ "loss": 2.371,
+ "step": 306000
+ },
+ {
+ "epoch": 19.751656694331853,
+ "grad_norm": 0.4567144215106964,
+ "learning_rate": 4.275263573119128e-06,
+ "loss": 2.3706,
+ "step": 307000
+ },
+ {
+ "epoch": 19.815994338287332,
+ "grad_norm": 0.4492277503013611,
+ "learning_rate": 3.200530732267087e-06,
+ "loss": 2.3714,
+ "step": 308000
+ },
+ {
+ "epoch": 19.88033198224281,
+ "grad_norm": 0.44562822580337524,
+ "learning_rate": 2.1247220827655454e-06,
+ "loss": 2.3711,
+ "step": 309000
+ },
+ {
+ "epoch": 19.944669626198287,
+ "grad_norm": 0.4758046269416809,
+ "learning_rate": 1.0499892419135049e-06,
+ "loss": 2.3686,
+ "step": 310000
+ },
+ {
+ "epoch": 20.0,
+ "eval_accuracy": 0.4234014597448438,
+ "eval_loss": 3.0190186500549316,
+ "eval_runtime": 112.3566,
+ "eval_samples_per_second": 466.728,
+ "eval_steps_per_second": 7.298,
+ "step": 310860
+ },
+ {
+ "epoch": 20.0,
+ "step": 310860,
+ "total_flos": 1.29957250203648e+18,
+ "train_loss": 2.7038125842232534,
+ "train_runtime": 44110.0674,
+ "train_samples_per_second": 225.51,
+ "train_steps_per_second": 7.047
+ }
+ ],
+ "logging_steps": 1000,
+ "max_steps": 310860,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 20,
+ "save_steps": 5000,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.29957250203648e+18,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }