sarwin committed on
Commit
078a6f4
·
verified ·
1 Parent(s): 8f95a99

Delete mx-01

mx-01/1_Pooling/config.json DELETED
@@ -1,10 +0,0 @@
1
- {
2
- "word_embedding_dimension": 384,
3
- "pooling_mode_cls_token": false,
4
- "pooling_mode_mean_tokens": true,
5
- "pooling_mode_max_tokens": false,
6
- "pooling_mode_mean_sqrt_len_tokens": false,
7
- "pooling_mode_weightedmean_tokens": false,
8
- "pooling_mode_lasttoken": false,
9
- "include_prompt": true
10
- }
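The deleted config enables only `pooling_mode_mean_tokens`, so the sentence vector is the average of the token vectors. A minimal pure-Python sketch of that idea follows; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions, not the library's implementation (the real model uses 384 dimensions).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]


# Hypothetical toy input: two real tokens and one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padding row is excluded by the mask, which is why its large values do not affect the result.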
mx-01/README.md DELETED
@@ -1,1415 +0,0 @@
1
- ---
2
- base_model: nreimers/MiniLM-L6-H384-uncased
3
- datasets: []
4
- language: []
5
- library_name: sentence-transformers
6
- pipeline_tag: sentence-similarity
7
- tags:
8
- - sentence-transformers
9
- - sentence-similarity
10
- - feature-extraction
11
- - generated_from_trainer
12
- - dataset_size:730454
13
- - loss:MultipleNegativesRankingLoss
14
- widget:
15
- - source_sentence: Continuous finite-time control approach for series elastic actuator
16
- sentences:
17
- - Distributed coordination is difficult, especially when the system may suffer intrusions
18
- that corrupt some component processes. We introduce the abstraction of a failure
19
- detector that a process can use to (imperfectly) detect the corruption (Byzantine
20
- failure) of another process. In general, our failure detectors can be unreliable,
21
- both by reporting a correct process to be faulty or by reporting a faulty process
22
- to be correct. However, we show that if these detectors satisfy certain plausible
23
- properties, then the well known distributed consensus problem can be solved. We
24
- also present a randomized protocol using failure detectors that solves the consensus
25
- problem if either the requisite properties of failure detectors hold or if certain
26
- highly probable events eventually occur. This work can be viewed as a generalization
27
- of benign failure detectors popular in the distributed computing literature.
28
- - 'This paper deals with multilevel partial-response class-IV (PRIV) transmission
29
- over unshielded twisted-pair (UTP) cables. Specifically, transmission at a rate
30
- of 155.52 Mb/s over data-grade UTP cables for local-area networking is considered.
31
- As a low-complexity method used to compensate for cable-length dependent signal
32
- distortion, adaptive analog equalization with two controlled parameters is proposed:
33
- one parameter determines a frequency-independent receiver gain, the other parameter
34
- controls the transfer characteristic of a variable analog receive-filter section.
35
- For the stepwise design of the transmit and receive filters, a combination of
36
- analytic techniques and simulated annealing is employed. First, the variable equalizer
37
- section, then the remaining fixed analog receive filter section are developed
38
- and finally the analog transmit filter is determined. The paper also describes
39
- the adjustment of the equalizer section, and the control of the sampling phase
40
- in the receiver front-end. The two equalizer parameters are controlled by an algorithm
41
- that operates on the sampled signals and adjusts these parameters to optimum settings
42
- independently of the sampling phase. The latter is controlled by a decision-directed
43
- phase-locked loop algorithm that becomes effective when equalization has been
44
- achieved. The dynamic behaviour and mean-square error in steady-state obtained
45
- with these control algorithms are investigated.'
46
- - 'In this paper, a practical control approach is suggested for series elastic actuators(SEAs)
47
- to generate the desired torque. Firstly, based on the analysis of a nonlinear
48
- SEA, the generic dynamics for a class of SEAs is summarized. Then the dynamic
49
- equations are transformed into a novel state-space form which is convenient for
50
- controller design. Finally, based on the recently developed finite-time control
51
- technique, a finite time disturbance observer and a continuous terminal sliding-mode
52
- control scheme are introduced to synthesize the control law. The finite-time stability
53
- of the proposed controller is theoretically ensured by Lyapunov analysis. Compared
54
- with most existing methods, the contribution of the paper is two-fold: (i) The
55
- proposed controller is suitable for not only linear, but also a class of nonlinear
56
- SEAs, which means that it is a more generic method for SEA torque control; (ii)
57
- It achieves faster convergence rate and works well even in the presence of unknown
58
- payload parameters and external disturbances. A series of experiments are carried
59
- out on the self-built SEA testbed to demonstrate the superior performance of the
60
- proposed controller by comparing it with the cascade-PID controller.'
61
- - source_sentence: Matrix Methods for Solving Algebraic Systems
62
- sentences:
63
- - We present our public-domain software for the following tasks in sparse (or toric)
64
- elimination theory, given a well-constrained polynomial system. First, C code
65
- for computing the mixed volume of the system. Second, Maple code for defining
66
- an overconstrained system and constructing a Sylvester-type matrix of its sparse
67
- resultant. Third, C code for a Sylvester-type matrix of the sparse resultant and
68
- a superset of all common roots of the initial well-constrained system by computing
69
- the eigen-decomposition of a square matrix obtained from the resultant matrix.
70
- We conclude with experiments in computing molecular conformations.
71
- - 'Design trade-offs between estimation performance, processing delay and communication
72
- cost for a sensor scheduling problem is discussed. We consider a heterogeneous
73
- sensor network with two types of sensors: the first type has low-quality measurements,
74
- small processing delay and a light communication cost, while the second type is
75
- of high quality, but imposes a large processing delay and a high communication
76
- cost. Such a heterogeneous sensor network is common in applications, where for
77
- instance in a localization system the poor sensor can be an ultrasound sensor
78
- while the more powerful sensor can be a camera. Using a time-periodic Kalman filter,
79
- we show how one can find an optimal schedule of the sensor communication. One
80
- can significantly improve estimation quality by only using the expensive sensor
81
- rarely. We also demonstrate how simple sensor switching rules based on the Riccati
82
- equation drives the filter into a stable time-periodic Kalman filter.'
83
- - The Multi-stage Genetic Algorithm, MGA, is introduced to solve a class of compositional
84
- design problems. The problem with complicated constraints is formulated as a set
85
- of local subproblems with simple constraints and a supervising problem. Every
86
- subproblem is solved by GA to generate a set of suboptimal solutions. And in the
87
- supervising problem, the elements of each set are optimally combined by GA to
88
- yield the optimal solution for the original problem. The method is a learning
89
- method where the empirical knowledge obtained by solving the problem is effectively
90
- utilized to solve similar problems efficiently. Extended knapsack problems are
91
- solved to demonstrate the proposed method, and the efficiency of the method is
92
- shown. In addition, the method is successfully applied to optimal realization
93
- of cooperative robot soccer behaviors.
94
- - source_sentence: Low-power partial-parallel Chien search architecture with polynomial
95
- degree reduction
96
- sentences:
97
- - In this paper, we present a novel attentive and immersive user interface based
98
- on gaze and hand gestures for interactive large-scale displays. The combination
99
- of gaze and hand gestures provide more interesting and immersive ways to manipulate
100
- 3D information.
101
- - There is significant interest in the synthesis of discrete-state random fields,
102
- particularly those possessing structure over a wide range of scales. However,
103
- given a model on some finest, pixellated scale, it is computationally very difficult
104
- to synthesize both large- and small-scale structures, motivating research into
105
- hierarchical methods. In this paper, we propose a frozen-state approach to hierarchical
106
- modeling, in which simulated annealing is performed on each scale, constrained
107
- by the state estimates at the parent scale. This approach leads to significant
108
- advantages in both modeling flexibility and computational complexity. In particular,
109
- a complex structure can be realized with very simple, local, scale-dependent models,
110
- and by constraining the domain to be annealed at finer scales to only the uncertain
111
- portions of coarser scales; the approach leads to huge improvements in computational
112
- complexity. Results are shown for a synthesis problem in porous media.
113
- - The Chien search for the error locator polynomial root computation in BCH and
114
- Reed-Solomon decoding accounts for a significant part of the overall decoder power
115
- consumption, especially for long codes over finite fields of high order. For serial
116
- Chien search, the power consumption is substantially lowered by a polynomial degree
117
- reduction (PDR) scheme. Every time a root is found, it is factored out of the
118
- error locator polynomial. Only the hardware units associated with the reduced-degree
119
- polynomial coefficients are active. However, this PDR scheme can not be directly
120
- extended to partial-parallel Chien search, which is needed in any systems to achieve
121
- high throughput. By analyzing the formulas of the evaluation values over finite
122
- field elements and available intermediate results of the Chien search, this paper
123
- proposes a partial-parallel Chien search architecture that reduces the error locator
124
- polynomial degree on the fly whenever a root is found without using long division.
125
- For a 122-error-correcting BCH code over GF(2^15), an 8-parallel Chien search using
126
- the proposed architecture achieves 32% power reduction over existing partial-parallel
127
- architectures for a typical case.
128
- - source_sentence: An efficient network-switch scheduling for real-time applications
129
- sentences:
130
- - Bursts consist of a varying number of asynchronous transfer mode cells corresponding
131
- to a datagram. Here, we generalized weighted fair queueing to a burst-based algorithm
132
- with preemption. The new algorithm enhances the performance of the switch service
133
- for real-time applications, and it preserves the quality of service guarantees.
134
- We study this algorithm theoretically and via simulations.
135
- - Online Social Network (OSN) is one of the hottest innovations in the past years,
136
- and the active users are more than a billion. For OSN, users' behavior is one
137
- of the important factors to study. This demonstration proposal presents Harbinger,
138
- an analyzing and predicting system for OSN users' behavior. In Harbinger, we focus
139
- on tweets' timestamps (when users post or share messages), visualize users' post
140
- behavior as well as message retweet number and build adjustable models to predict
141
- users' behavior. Predictions of users' behavior can be performed with the discovered
142
- behavior models and the results can be applied to many applications such as tweet
143
- crawler and advertisement.
144
- - The computation and memory required for kernel machines with N training samples
145
- is at least O(N^2). Such a complexity is significant even for moderate size problems
146
- and is prohibitive for large datasets. We present an approximation technique based
147
- on the improved fast Gauss transform to reduce the computation to O(N). We also
148
- give an error bound for the approximation, and provide experimental results on
149
- the UCI datasets.
150
- - source_sentence: Summarizing the Evidence on the International Trade in Illegal
151
- Wildlife
152
- sentences:
153
- - This paper proposes a method to represent classifiers or learned regression functions
154
- using an OWL ontology. Also proposed are methods for finding an appropriate learned
155
- function to answer a simple query. The ontology standardizes variable names and
156
- dependence properties, so that feature values can be given by users or found on
157
- the semantic web.
158
- - The global trade in illegal wildlife is a multi-billion dollar industry that threatens
159
- biodiversity and acts as a potential avenue for invasive species and disease spread.
160
- Despite the broad-sweeping implications of illegal wildlife sales, scientists
161
- have yet to describe the scope and scale of the trade. Here, we provide the most
162
- thorough and current description of the illegal wildlife trade using 12 years
163
- of seizure records compiled by TRAFFIC, the wildlife trade monitoring network.
164
- These records comprise 967 seizures including massive quantities of ivory, tiger
165
- skins, live reptiles, and other endangered wildlife and wildlife products. Most
166
- seizures originate in Southeast Asia, a recently identified hotspot for future
167
- emerging infectious diseases. To date, regulation and enforcement have been insufficient
168
- to effectively control the global trade in illegal wildlife at national and international
169
- scales. Effective control will require a multi-pronged approach including community-scale
170
- education and empowering local people to value wildlife, coordinated international
171
- regulation, and a greater allocation of national resources to on-the-ground enforcement.
172
- - Griffithsin (GRFT) is a red alga-derived lectin with demonstrated broad spectrum
173
- antiviral activity against enveloped viruses, including severe acute respiratory
174
- syndrome–Coronavirus (SARS-CoV), Japanese encephalitis virus (JEV), hepatitis
175
- C virus (HCV), and herpes simplex virus-2 (HSV-2). However, its pharmacokinetic
176
- profile remains largely undefined. Here, Sprague Dawley rats were administered
177
- a single dose of GRFT at 10 or 20 mg/kg by intravenous, oral, and subcutaneous
178
- routes, respectively, and serum GRFT levels were measured at select time points.
179
- In addition, the potential for systemic accumulation after oral dosing was assessed
180
- in rats after 10 daily treatments with GRFT (20 or 40 mg/kg). We found that parenterally-administered
181
- GRFT in rats displayed a complex elimination profile, which varied according to
182
- administration routes. However, GRFT was not orally bioavailable, even after chronic
183
- treatment. Nonetheless, active GRFT capable of neutralizing HIV-Env pseudoviruses
184
- was detected in rat fecal extracts after chronic oral dosing. These findings support
185
- further evaluation of GRFT for pre-exposure prophylaxis against emerging epidemics
186
- for which specific therapeutics are not available, including systemic and enteric
187
- infections caused by susceptible enveloped viruses. In addition, GRFT should be
188
- considered for antiviral therapy and the prevention of rectal transmission of
189
- HIV-1 and other susceptible viruses.
190
- ---
191
-
192
- # SentenceTransformer based on nreimers/MiniLM-L6-H384-uncased
193
-
194
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
195
-
196
- ## Model Details
197
-
198
- ### Model Description
199
- - **Model Type:** Sentence Transformer
200
- - **Base model:** [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) <!-- at revision 3276f0fac9d818781d7a1327b3ff818fc4e643c0 -->
201
- - **Maximum Sequence Length:** 512 tokens
202
- - **Output Dimensionality:** 384 dimensions
203
- - **Similarity Function:** Cosine Similarity
204
- <!-- - **Training Dataset:** Unknown -->
205
- <!-- - **Language:** Unknown -->
206
- <!-- - **License:** Unknown -->
207
-
208
- ### Model Sources
209
-
210
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
211
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
212
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
213
-
214
- ### Full Model Architecture
215
-
216
- ```
217
- SentenceTransformer(
218
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
219
- (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
220
- )
221
- ```
222
-
223
- ## Usage
224
-
225
- ### Direct Usage (Sentence Transformers)
226
-
227
- First install the Sentence Transformers library:
228
-
229
- ```bash
230
- pip install -U sentence-transformers
231
- ```
232
-
233
- Then you can load this model and run inference.
234
- ```python
235
- from sentence_transformers import SentenceTransformer
236
-
237
- # Download from the 🤗 Hub
238
- model = SentenceTransformer("sentence_transformers_model_id")
239
- # Run inference
240
- sentences = [
241
- 'Summarizing the Evidence on the International Trade in Illegal Wildlife',
242
- 'The global trade in illegal wildlife is a multi-billion dollar industry that threatens biodiversity and acts as a potential avenue for invasive species and disease spread. Despite the broad-sweeping implications of illegal wildlife sales, scientists have yet to describe the scope and scale of the trade. Here, we provide the most thorough and current description of the illegal wildlife trade using 12 years of seizure records compiled by TRAFFIC, the wildlife trade monitoring network. These records comprise 967 seizures including massive quantities of ivory, tiger skins, live reptiles, and other endangered wildlife and wildlife products. Most seizures originate in Southeast Asia, a recently identified hotspot for future emerging infectious diseases. To date, regulation and enforcement have been insufficient to effectively control the global trade in illegal wildlife at national and international scales. Effective control will require a multi-pronged approach including community-scale education and empowering local people to value wildlife, coordinated international regulation, and a greater allocation of national resources to on-the-ground enforcement.',
243
- 'This paper proposes a method to represent classifiers or learned regression functions using an OWL ontology. Also proposed are methods for finding an appropriate learned function to answer a simple query. The ontology standardizes variable names and dependence properties, so that feature values can be given by users or found on the semantic web.',
244
- ]
245
- embeddings = model.encode(sentences)
246
- print(embeddings.shape)
247
- # [3, 384]
248
-
249
- # Get the similarity scores for the embeddings
250
- similarities = model.similarity(embeddings, embeddings)
251
- print(similarities.shape)
252
- # [3, 3]
253
- ```
254
-
255
- <!--
256
- ### Direct Usage (Transformers)
257
-
258
- <details><summary>Click to see the direct usage in Transformers</summary>
259
-
260
- </details>
261
- -->
262
-
263
- <!--
264
- ### Downstream Usage (Sentence Transformers)
265
-
266
- You can finetune this model on your own dataset.
267
-
268
- <details><summary>Click to expand</summary>
269
-
270
- </details>
271
- -->
272
-
273
- <!--
274
- ### Out-of-Scope Use
275
-
276
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
277
- -->
278
-
279
- <!--
280
- ## Bias, Risks and Limitations
281
-
282
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
283
- -->
284
-
285
- <!--
286
- ### Recommendations
287
-
288
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
289
- -->
290
-
291
- ## Training Details
292
-
293
- ### Training Dataset
294
-
295
- #### Unnamed Dataset
296
-
297
-
298
- * Size: 730,454 training samples
299
- * Columns: <code>sentence_0</code> and <code>sentence_1</code>
300
- * Approximate statistics based on the first 1000 samples:
301
- | | sentence_0 | sentence_1 |
302
- |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
303
- | type | string | string |
304
- | details | <ul><li>min: 5 tokens</li><li>mean: 15.55 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 21 tokens</li><li>mean: 195.91 tokens</li><li>max: 512 tokens</li></ul> |
305
- * Samples:
306
- | sentence_0 | sentence_1 |
307
- |:-----------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
308
- | <code>A parallel algorithm for constructing independent spanning trees in twisted cubes</code> | <code>A long-standing conjecture mentions that a k-connected graph G admits k independent spanning trees (ISTs for short) rooted at an arbitrary node of G. An n-dimensional twisted cube, denoted by TQ_n, is a variation of hypercube with connectivity n and has many features superior to those of hypercube. Yang (2010) first proposed an algorithm to construct n edge-disjoint spanning trees in TQ_n for any odd integer n ≥ 3 and showed that half of them are ISTs. At a later stage, Wang et al. (2012) inferred that the above conjecture in affirmative for TQ_n by providing an O(N log N) time algorithm to construct n ISTs, where N = 2^n is the number of nodes in TQ_n. However, this algorithm is executed in a recursive fashion and thus is hard to be parallelized. In this paper, we revisit the problem of constructing ISTs in twisted cubes and present a non-recursive algorithm. Our approach can be fully parallelized to make the use of all nodes of TQ_n as processors for computation in such a way that each node can determine its parent in all spanning trees directly by referring its address and tree indices in O(log N) time.</code> |
309
- | <code>A Novel Method for Separating and Locating Multiple Partial Discharge Sources in a Substation</code> | <code>To separate and locate multi-partial discharge (PD) sources in a substation, the use of spectrum differences of ultra-high frequency signals radiated from various sources as characteristic parameters has been previously reported. However, the separation success rate was poor when signal-to-noise ratio was low, and the localization result was a coordinate on two-dimensional plane. In this paper, a novel method is proposed to improve the separation rate and the localization accuracy. A directional measuring platform is built using two directional antennas. The time delay (TD) of the signals captured by the antennas is calculated, and TD sequences are obtained by rotating the platform at different angles. The sequences are separated with the TD distribution feature, and the directions of the multi-PD sources are calculated. The PD sources are located by directions using the error probability method. To verify the method, a simulated model with three PD sources was established by XFdtd. Simulation results show that the separation rate is increased from 71% to 95% compared with the previous method, and an accurate three-dimensional localization result was obtained. A field test with two PD sources was carried out, and the sources were separated and located accurately by the proposed method.</code> |
310
- | <code>Every ternary permutation constraint satisfaction problem parameterized above average has a kernel with a quadratic number of variables</code> | <code>A ternary Permutation-CSP is specified by a subset Π of the symmetric group S_3. An instance of such a problem consists of a set of variables V and a multiset of constraints, which are ordered triples of distinct variables of V. The objective is to find a linear ordering α of V that maximizes the number of triples whose rearrangement (under α) follows a permutation in Π. We prove that every ternary Permutation-CSP parameterized above average has a kernel with a quadratic number of variables.</code> |
311
- * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
312
- ```json
313
- {
314
- "scale": 20.0,
315
- "similarity_fct": "cos_sim"
316
- }
317
- ```
318
-
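The loss parameters above (`scale` 20.0, `cos_sim`) can be read as follows: for each `sentence_0`, every `sentence_1` in the batch is scored by scaled cosine similarity, and cross-entropy pushes the matching pair's score above the in-batch negatives. The sketch below illustrates that computation in plain Python under this reading; `mnr_loss` and the toy vectors are hypothetical names and data, and the library itself operates on tensors.

```python
import math
from typing import List


def cos_sim(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy where row i's correct 'class' is positives[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy batch: pairs that line up give a near-zero loss, swapped pairs a large one.
aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Because every other pair in the batch acts as a negative, the batch size affects how hard the task is; the card lists a per-device batch size of 8.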
319
- ### Training Hyperparameters
320
- #### Non-Default Hyperparameters
321
-
322
- - `num_train_epochs`: 5
323
- - `multi_dataset_batch_sampler`: round_robin
324
-
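The fractional epochs in the training log can be cross-checked against the dataset size and batch size reported in this card. The sketch below assumes a single training device, which the card does not state; `epoch_at` is a hypothetical helper name.

```python
import math

NUM_SAMPLES = 730_454  # "Size: 730,454 training samples"
BATCH_SIZE = 8         # per_device_train_batch_size
NUM_EPOCHS = 5         # num_train_epochs
NUM_DEVICES = 1        # assumption: not stated in the card

# dataloader_drop_last is False, so the final partial batch still counts as a step.
steps_per_epoch = math.ceil(NUM_SAMPLES / (BATCH_SIZE * NUM_DEVICES))
total_steps = steps_per_epoch * NUM_EPOCHS


def epoch_at(step: int) -> float:
    """Fraction of an epoch completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # 91307
print(total_steps)      # 456535
print(epoch_at(500))    # 0.0055
print(epoch_at(46000))  # 0.5038
```

Under that assumption the computed values agree with the logged rows for steps 500 and 46000.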
325
- #### All Hyperparameters
326
- <details><summary>Click to expand</summary>
327
-
328
- - `overwrite_output_dir`: False
329
- - `do_predict`: False
330
- - `eval_strategy`: no
331
- - `prediction_loss_only`: True
332
- - `per_device_train_batch_size`: 8
333
- - `per_device_eval_batch_size`: 8
334
- - `per_gpu_train_batch_size`: None
335
- - `per_gpu_eval_batch_size`: None
336
- - `gradient_accumulation_steps`: 1
337
- - `eval_accumulation_steps`: None
338
- - `learning_rate`: 5e-05
339
- - `weight_decay`: 0.0
340
- - `adam_beta1`: 0.9
341
- - `adam_beta2`: 0.999
342
- - `adam_epsilon`: 1e-08
343
- - `max_grad_norm`: 1
344
- - `num_train_epochs`: 5
345
- - `max_steps`: -1
346
- - `lr_scheduler_type`: linear
347
- - `lr_scheduler_kwargs`: {}
348
- - `warmup_ratio`: 0.0
349
- - `warmup_steps`: 0
350
- - `log_level`: passive
351
- - `log_level_replica`: warning
352
- - `log_on_each_node`: True
353
- - `logging_nan_inf_filter`: True
354
- - `save_safetensors`: True
355
- - `save_on_each_node`: False
356
- - `save_only_model`: False
357
- - `restore_callback_states_from_checkpoint`: False
358
- - `no_cuda`: False
359
- - `use_cpu`: False
360
- - `use_mps_device`: False
361
- - `seed`: 42
362
- - `data_seed`: None
363
- - `jit_mode_eval`: False
364
- - `use_ipex`: False
365
- - `bf16`: False
366
- - `fp16`: False
367
- - `fp16_opt_level`: O1
368
- - `half_precision_backend`: auto
369
- - `bf16_full_eval`: False
370
- - `fp16_full_eval`: False
371
- - `tf32`: None
372
- - `local_rank`: 0
373
- - `ddp_backend`: None
374
- - `tpu_num_cores`: None
375
- - `tpu_metrics_debug`: False
376
- - `debug`: []
377
- - `dataloader_drop_last`: False
378
- - `dataloader_num_workers`: 0
379
- - `dataloader_prefetch_factor`: None
380
- - `past_index`: -1
381
- - `disable_tqdm`: False
382
- - `remove_unused_columns`: True
383
- - `label_names`: None
384
- - `load_best_model_at_end`: False
385
- - `ignore_data_skip`: False
386
- - `fsdp`: []
387
- - `fsdp_min_num_params`: 0
388
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
389
- - `fsdp_transformer_layer_cls_to_wrap`: None
390
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
391
- - `deepspeed`: None
392
- - `label_smoothing_factor`: 0.0
393
- - `optim`: adamw_torch
394
- - `optim_args`: None
395
- - `adafactor`: False
396
- - `group_by_length`: False
397
- - `length_column_name`: length
398
- - `ddp_find_unused_parameters`: None
399
- - `ddp_bucket_cap_mb`: None
400
- - `ddp_broadcast_buffers`: False
401
- - `dataloader_pin_memory`: True
402
- - `dataloader_persistent_workers`: False
403
- - `skip_memory_metrics`: True
404
- - `use_legacy_prediction_loop`: False
405
- - `push_to_hub`: False
406
- - `resume_from_checkpoint`: None
407
- - `hub_model_id`: None
408
- - `hub_strategy`: every_save
409
- - `hub_private_repo`: False
410
- - `hub_always_push`: False
411
- - `gradient_checkpointing`: False
412
- - `gradient_checkpointing_kwargs`: None
413
- - `include_inputs_for_metrics`: False
414
- - `eval_do_concat_batches`: True
415
- - `fp16_backend`: auto
416
- - `push_to_hub_model_id`: None
417
- - `push_to_hub_organization`: None
418
- - `mp_parameters`:
419
- - `auto_find_batch_size`: False
420
- - `full_determinism`: False
421
- - `torchdynamo`: None
422
- - `ray_scope`: last
423
- - `ddp_timeout`: 1800
424
- - `torch_compile`: False
425
- - `torch_compile_backend`: None
426
- - `torch_compile_mode`: None
427
- - `dispatch_batches`: None
428
- - `split_batches`: None
429
- - `include_tokens_per_second`: False
430
- - `include_num_input_tokens_seen`: False
431
- - `neftune_noise_alpha`: None
432
- - `optim_target_modules`: None
433
- - `batch_eval_metrics`: False
434
- - `eval_on_start`: False
435
- - `batch_sampler`: batch_sampler
436
- - `multi_dataset_batch_sampler`: round_robin
437
-
438
- </details>
439
-
440
- ### Training Logs
441
- <details><summary>Click to expand</summary>
442
-
443
- | Epoch | Step | Training Loss |
444
- |:------:|:------:|:-------------:|
445
- | 0.0055 | 500 | 1.6701 |
446
- | 0.0110 | 1000 | 0.8225 |
447
- | 0.0164 | 1500 | 0.3883 |
448
- | 0.0219 | 2000 | 0.2685 |
449
- | 0.0274 | 2500 | 0.2349 |
450
- | 0.0329 | 3000 | 0.1685 |
451
- | 0.0383 | 3500 | 0.1409 |
452
- | 0.0438 | 4000 | 0.1262 |
453
- | 0.0493 | 4500 | 0.1195 |
454
- | 0.0548 | 5000 | 0.1044 |
455
- | 0.0602 | 5500 | 0.0989 |
456
- | 0.0657 | 6000 | 0.0787 |
457
- | 0.0712 | 6500 | 0.0895 |
458
- | 0.0767 | 7000 | 0.0708 |
459
- | 0.0821 | 7500 | 0.0834 |
460
- | 0.0876 | 8000 | 0.0634 |
461
- | 0.0931 | 8500 | 0.0643 |
462
- | 0.0986 | 9000 | 0.0567 |
463
- | 0.1040 | 9500 | 0.0646 |
464
- | 0.1095 | 10000 | 0.0607 |
465
- | 0.1150 | 10500 | 0.0564 |
466
- | 0.1205 | 11000 | 0.068 |
467
- | 0.1259 | 11500 | 0.0536 |
468
- | 0.1314 | 12000 | 0.0594 |
469
- | 0.1369 | 12500 | 0.057 |
470
- | 0.1424 | 13000 | 0.0555 |
471
- | 0.1479 | 13500 | 0.0485 |
472
- | 0.1533 | 14000 | 0.0528 |
473
- | 0.1588 | 14500 | 0.0478 |
474
- | 0.1643 | 15000 | 0.0586 |
475
- | 0.1698 | 15500 | 0.0539 |
476
- | 0.1752 | 16000 | 0.0432 |
477
- | 0.1807 | 16500 | 0.0542 |
478
- | 0.1862 | 17000 | 0.0536 |
479
- | 0.1917 | 17500 | 0.0492 |
480
- | 0.1971 | 18000 | 0.0427 |
481
- | 0.2026 | 18500 | 0.0489 |
482
- | 0.2081 | 19000 | 0.0502 |
483
- | 0.2136 | 19500 | 0.0432 |
484
- | 0.2190 | 20000 | 0.0459 |
485
- | 0.2245 | 20500 | 0.0376 |
486
- | 0.2300 | 21000 | 0.0489 |
487
- | 0.2355 | 21500 | 0.0515 |
488
- | 0.2409 | 22000 | 0.0429 |
489
- | 0.2464 | 22500 | 0.0417 |
490
- | 0.2519 | 23000 | 0.0478 |
491
- | 0.2574 | 23500 | 0.0359 |
492
- | 0.2628 | 24000 | 0.0452 |
493
- | 0.2683 | 24500 | 0.0443 |
494
- | 0.2738 | 25000 | 0.0409 |
495
- | 0.2793 | 25500 | 0.0421 |
496
- | 0.2848 | 26000 | 0.0393 |
497
- | 0.2902 | 26500 | 0.0409 |
498
- | 0.2957 | 27000 | 0.032 |
499
- | 0.3012 | 27500 | 0.0468 |
500
- | 0.3067 | 28000 | 0.0285 |
501
- | 0.3121 | 28500 | 0.0311 |
502
- | 0.3176 | 29000 | 0.0304 |
503
- | 0.3231 | 29500 | 0.0349 |
504
- | 0.3286 | 30000 | 0.0352 |
505
- | 0.3340 | 30500 | 0.0367 |
506
- | 0.3395 | 31000 | 0.0385 |
507
- | 0.3450 | 31500 | 0.0325 |
508
- | 0.3505 | 32000 | 0.0302 |
509
- | 0.3559 | 32500 | 0.0393 |
510
- | 0.3614 | 33000 | 0.032 |
511
- | 0.3669 | 33500 | 0.0263 |
512
- | 0.3724 | 34000 | 0.0343 |
513
- | 0.3778 | 34500 | 0.0349 |
514
- | 0.3833 | 35000 | 0.0282 |
515
- | 0.3888 | 35500 | 0.034 |
516
- | 0.3943 | 36000 | 0.0376 |
517
- | 0.3998 | 36500 | 0.0265 |
518
- | 0.4052 | 37000 | 0.0267 |
519
- | 0.4107 | 37500 | 0.0241 |
520
- | 0.4162 | 38000 | 0.033 |
521
- | 0.4217 | 38500 | 0.0323 |
522
- | 0.4271 | 39000 | 0.0278 |
523
- | 0.4326 | 39500 | 0.025 |
524
- | 0.4381 | 40000 | 0.0363 |
525
- | 0.4436 | 40500 | 0.0312 |
526
- | 0.4490 | 41000 | 0.0307 |
527
- | 0.4545 | 41500 | 0.0305 |
528
- | 0.4600 | 42000 | 0.028 |
529
- | 0.4655 | 42500 | 0.0279 |
530
- | 0.4709 | 43000 | 0.0265 |
531
- | 0.4764 | 43500 | 0.0262 |
532
- | 0.4819 | 44000 | 0.0308 |
533
- | 0.4874 | 44500 | 0.0282 |
534
- | 0.4928 | 45000 | 0.0243 |
535
- | 0.4983 | 45500 | 0.0236 |
536
- | 0.5038 | 46000 | 0.02 |
537
- | 0.5093 | 46500 | 0.0254 |
538
- | 0.5147 | 47000 | 0.0275 |
539
- | 0.5202 | 47500 | 0.0309 |
540
- | 0.5257 | 48000 | 0.031 |
541
- | 0.5312 | 48500 | 0.0271 |
542
- | 0.5367 | 49000 | 0.0218 |
543
- | 0.5421 | 49500 | 0.0249 |
544
- | 0.5476 | 50000 | 0.0285 |
545
- | 0.5531 | 50500 | 0.03 |
546
- | 0.5586 | 51000 | 0.0284 |
547
- | 0.5640 | 51500 | 0.0258 |
548
- | 0.5695 | 52000 | 0.0228 |
549
- | 0.5750 | 52500 | 0.0305 |
550
- | 0.5805 | 53000 | 0.0234 |
551
- | 0.5859 | 53500 | 0.0209 |
552
- | 0.5914 | 54000 | 0.0341 |
553
- | 0.5969 | 54500 | 0.0269 |
554
- | 0.6024 | 55000 | 0.0267 |
555
- | 0.6078 | 55500 | 0.0245 |
556
- | 0.6133 | 56000 | 0.0263 |
557
- | 0.6188 | 56500 | 0.0195 |
558
- | 0.6243 | 57000 | 0.0209 |
559
- | 0.6297 | 57500 | 0.0313 |
560
- | 0.6352 | 58000 | 0.0247 |
561
- | 0.6407 | 58500 | 0.0285 |
562
- | 0.6462 | 59000 | 0.0301 |
563
- | 0.6516 | 59500 | 0.0227 |
564
- | 0.6571 | 60000 | 0.0235 |
565
- | 0.6626 | 60500 | 0.0272 |
566
- | 0.6681 | 61000 | 0.025 |
567
- | 0.6736 | 61500 | 0.0276 |
568
- | 0.6790 | 62000 | 0.0289 |
569
- | 0.6845 | 62500 | 0.0232 |
570
- | 0.6900 | 63000 | 0.0258 |
571
- | 0.6955 | 63500 | 0.0254 |
572
- | 0.7009 | 64000 | 0.0205 |
573
- | 0.7064 | 64500 | 0.0216 |
574
- | 0.7119 | 65000 | 0.0304 |
575
- | 0.7174 | 65500 | 0.0234 |
576
- | 0.7228 | 66000 | 0.0233 |
577
- | 0.7283 | 66500 | 0.0239 |
578
- | 0.7338 | 67000 | 0.0166 |
579
- | 0.7393 | 67500 | 0.0211 |
580
- | 0.7447 | 68000 | 0.0212 |
581
- | 0.7502 | 68500 | 0.0247 |
582
- | 0.7557 | 69000 | 0.023 |
583
- | 0.7612 | 69500 | 0.0261 |
584
- | 0.7666 | 70000 | 0.0204 |
585
- | 0.7721 | 70500 | 0.026 |
586
- | 0.7776 | 71000 | 0.0299 |
587
- | 0.7831 | 71500 | 0.0183 |
588
- | 0.7885 | 72000 | 0.0228 |
589
- | 0.7940 | 72500 | 0.0181 |
590
- | 0.7995 | 73000 | 0.0237 |
591
- | 0.8050 | 73500 | 0.0237 |
592
- | 0.8105 | 74000 | 0.0158 |
593
- | 0.8159 | 74500 | 0.0222 |
594
- | 0.8214 | 75000 | 0.0196 |
595
- | 0.8269 | 75500 | 0.0242 |
596
- | 0.8324 | 76000 | 0.0218 |
597
- | 0.8378 | 76500 | 0.0201 |
598
- | 0.8433 | 77000 | 0.026 |
599
- | 0.8488 | 77500 | 0.0232 |
600
- | 0.8543 | 78000 | 0.0254 |
601
- | 0.8597 | 78500 | 0.0218 |
602
- | 0.8652 | 79000 | 0.0219 |
603
- | 0.8707 | 79500 | 0.0255 |
604
- | 0.8762 | 80000 | 0.0201 |
605
- | 0.8816 | 80500 | 0.0301 |
606
- | 0.8871 | 81000 | 0.0275 |
607
- | 0.8926 | 81500 | 0.018 |
608
- | 0.8981 | 82000 | 0.028 |
609
- | 0.9035 | 82500 | 0.0223 |
610
- | 0.9090 | 83000 | 0.0201 |
611
- | 0.9145 | 83500 | 0.0299 |
612
- | 0.9200 | 84000 | 0.0251 |
613
- | 0.9254 | 84500 | 0.0203 |
614
- | 0.9309 | 85000 | 0.0209 |
615
- | 0.9364 | 85500 | 0.0236 |
616
- | 0.9419 | 86000 | 0.0191 |
617
- | 0.9474 | 86500 | 0.0168 |
618
- | 0.9528 | 87000 | 0.017 |
619
- | 0.9583 | 87500 | 0.0201 |
620
- | 0.9638 | 88000 | 0.0171 |
621
- | 0.9693 | 88500 | 0.0217 |
622
- | 0.9747 | 89000 | 0.0208 |
623
- | 0.9802 | 89500 | 0.0157 |
624
- | 0.9857 | 90000 | 0.0218 |
625
- | 0.9912 | 90500 | 0.021 |
626
- | 0.9966 | 91000 | 0.0159 |
627
- | 1.0021 | 91500 | 0.0189 |
628
- | 1.0076 | 92000 | 0.0182 |
629
- | 1.0131 | 92500 | 0.0206 |
630
- | 1.0185 | 93000 | 0.0179 |
631
- | 1.0240 | 93500 | 0.0168 |
632
- | 1.0295 | 94000 | 0.019 |
633
- | 1.0350 | 94500 | 0.0173 |
634
- | 1.0404 | 95000 | 0.0172 |
635
- | 1.0459 | 95500 | 0.0187 |
636
- | 1.0514 | 96000 | 0.0199 |
637
- | 1.0569 | 96500 | 0.0202 |
638
- | 1.0624 | 97000 | 0.0198 |
639
- | 1.0678 | 97500 | 0.0157 |
640
- | 1.0733 | 98000 | 0.0178 |
641
- | 1.0788 | 98500 | 0.0147 |
642
- | 1.0843 | 99000 | 0.0152 |
643
- | 1.0897 | 99500 | 0.0152 |
644
- | 1.0952 | 100000 | 0.0126 |
645
- | 1.1007 | 100500 | 0.0115 |
646
- | 1.1062 | 101000 | 0.0122 |
647
- | 1.1116 | 101500 | 0.0097 |
648
- | 1.1171 | 102000 | 0.0149 |
649
- | 1.1226 | 102500 | 0.0151 |
650
- | 1.1281 | 103000 | 0.0134 |
651
- | 1.1335 | 103500 | 0.0157 |
652
- | 1.1390 | 104000 | 0.0141 |
653
- | 1.1445 | 104500 | 0.0139 |
654
- | 1.1500 | 105000 | 0.0149 |
655
- | 1.1554 | 105500 | 0.0103 |
656
- | 1.1609 | 106000 | 0.0138 |
657
- | 1.1664 | 106500 | 0.0116 |
658
- | 1.1719 | 107000 | 0.0146 |
659
- | 1.1773 | 107500 | 0.0168 |
660
- | 1.1828 | 108000 | 0.0166 |
661
- | 1.1883 | 108500 | 0.0136 |
662
- | 1.1938 | 109000 | 0.0103 |
663
- | 1.1993 | 109500 | 0.0128 |
664
- | 1.2047 | 110000 | 0.0112 |
665
- | 1.2102 | 110500 | 0.0103 |
666
- | 1.2157 | 111000 | 0.0133 |
667
- | 1.2212 | 111500 | 0.0118 |
668
- | 1.2266 | 112000 | 0.009 |
669
- | 1.2321 | 112500 | 0.0151 |
670
- | 1.2376 | 113000 | 0.0146 |
671
- | 1.2431 | 113500 | 0.0143 |
672
- | 1.2485 | 114000 | 0.01 |
673
- | 1.2540 | 114500 | 0.0147 |
674
- | 1.2595 | 115000 | 0.011 |
675
- | 1.2650 | 115500 | 0.0121 |
676
- | 1.2704 | 116000 | 0.0117 |
677
- | 1.2759 | 116500 | 0.0151 |
678
- | 1.2814 | 117000 | 0.0143 |
679
- | 1.2869 | 117500 | 0.0163 |
680
- | 1.2923 | 118000 | 0.0135 |
681
- | 1.2978 | 118500 | 0.0118 |
682
- | 1.3033 | 119000 | 0.0129 |
683
- | 1.3088 | 119500 | 0.0062 |
684
- | 1.3142 | 120000 | 0.0127 |
685
- | 1.3197 | 120500 | 0.014 |
686
- | 1.3252 | 121000 | 0.0131 |
687
- | 1.3307 | 121500 | 0.0162 |
688
- | 1.3362 | 122000 | 0.0107 |
689
- | 1.3416 | 122500 | 0.0125 |
690
- | 1.3471 | 123000 | 0.0136 |
691
- | 1.3526 | 123500 | 0.0112 |
692
- | 1.3581 | 124000 | 0.0126 |
693
- | 1.3635 | 124500 | 0.0079 |
694
- | 1.3690 | 125000 | 0.0104 |
695
- | 1.3745 | 125500 | 0.0137 |
696
- | 1.3800 | 126000 | 0.0075 |
697
- | 1.3854 | 126500 | 0.0108 |
698
- | 1.3909 | 127000 | 0.0087 |
699
- | 1.3964 | 127500 | 0.0138 |
700
- | 1.4019 | 128000 | 0.0056 |
701
- | 1.4073 | 128500 | 0.0067 |
702
- | 1.4128 | 129000 | 0.0103 |
703
- | 1.4183 | 129500 | 0.0102 |
704
- | 1.4238 | 130000 | 0.0119 |
705
- | 1.4292 | 130500 | 0.0094 |
706
- | 1.4347 | 131000 | 0.0075 |
707
- | 1.4402 | 131500 | 0.0146 |
708
- | 1.4457 | 132000 | 0.0103 |
709
- | 1.4511 | 132500 | 0.0123 |
710
- | 1.4566 | 133000 | 0.0107 |
711
- | 1.4621 | 133500 | 0.0071 |
712
- | 1.4676 | 134000 | 0.0087 |
713
- | 1.4731 | 134500 | 0.0072 |
714
- | 1.4785 | 135000 | 0.0094 |
715
- | 1.4840 | 135500 | 0.0083 |
716
- | 1.4895 | 136000 | 0.0104 |
717
- | 1.4950 | 136500 | 0.0076 |
718
- | 1.5004 | 137000 | 0.006 |
719
- | 1.5059 | 137500 | 0.0085 |
720
- | 1.5114 | 138000 | 0.0061 |
721
- | 1.5169 | 138500 | 0.0106 |
722
- | 1.5223 | 139000 | 0.0088 |
723
- | 1.5278 | 139500 | 0.0111 |
724
- | 1.5333 | 140000 | 0.0094 |
725
- | 1.5388 | 140500 | 0.0079 |
726
- | 1.5442 | 141000 | 0.0095 |
727
- | 1.5497 | 141500 | 0.0098 |
728
- | 1.5552 | 142000 | 0.0139 |
729
- | 1.5607 | 142500 | 0.0085 |
730
- | 1.5661 | 143000 | 0.0094 |
731
- | 1.5716 | 143500 | 0.0088 |
732
- | 1.5771 | 144000 | 0.0092 |
733
- | 1.5826 | 144500 | 0.0071 |
734
- | 1.5880 | 145000 | 0.0101 |
735
- | 1.5935 | 145500 | 0.011 |
736
- | 1.5990 | 146000 | 0.0097 |
737
- | 1.6045 | 146500 | 0.0071 |
738
- | 1.6100 | 147000 | 0.0114 |
739
- | 1.6154 | 147500 | 0.0087 |
740
- | 1.6209 | 148000 | 0.0075 |
741
- | 1.6264 | 148500 | 0.0039 |
742
- | 1.6319 | 149000 | 0.0091 |
743
- | 1.6373 | 149500 | 0.0117 |
744
- | 1.6428 | 150000 | 0.01 |
745
- | 1.6483 | 150500 | 0.0099 |
746
- | 1.6538 | 151000 | 0.0069 |
747
- | 1.6592 | 151500 | 0.0084 |
748
- | 1.6647 | 152000 | 0.0118 |
749
- | 1.6702 | 152500 | 0.0078 |
750
- | 1.6757 | 153000 | 0.0067 |
751
- | 1.6811 | 153500 | 0.0133 |
752
- | 1.6866 | 154000 | 0.0079 |
753
- | 1.6921 | 154500 | 0.0092 |
754
- | 1.6976 | 155000 | 0.0069 |
755
- | 1.7030 | 155500 | 0.008 |
756
- | 1.7085 | 156000 | 0.0124 |
757
- | 1.7140 | 156500 | 0.0112 |
758
- | 1.7195 | 157000 | 0.0074 |
759
- | 1.7249 | 157500 | 0.0091 |
760
- | 1.7304 | 158000 | 0.0088 |
761
- | 1.7359 | 158500 | 0.0061 |
762
- | 1.7414 | 159000 | 0.0089 |
763
- | 1.7469 | 159500 | 0.0082 |
764
- | 1.7523 | 160000 | 0.0103 |
765
- | 1.7578 | 160500 | 0.0094 |
766
- | 1.7633 | 161000 | 0.0073 |
767
- | 1.7688 | 161500 | 0.0116 |
768
- | 1.7742 | 162000 | 0.0112 |
769
- | 1.7797 | 162500 | 0.0057 |
770
- | 1.7852 | 163000 | 0.0075 |
771
- | 1.7907 | 163500 | 0.0062 |
772
- | 1.7961 | 164000 | 0.0046 |
773
- | 1.8016 | 164500 | 0.0091 |
774
- | 1.8071 | 165000 | 0.0066 |
775
- | 1.8126 | 165500 | 0.0051 |
776
- | 1.8180 | 166000 | 0.0066 |
777
- | 1.8235 | 166500 | 0.0093 |
778
- | 1.8290 | 167000 | 0.0079 |
779
- | 1.8345 | 167500 | 0.0067 |
780
- | 1.8399 | 168000 | 0.007 |
781
- | 1.8454 | 168500 | 0.0133 |
782
- | 1.8509 | 169000 | 0.0071 |
783
- | 1.8564 | 169500 | 0.0091 |
784
- | 1.8619 | 170000 | 0.0067 |
785
- | 1.8673 | 170500 | 0.0091 |
786
- | 1.8728 | 171000 | 0.0103 |
787
- | 1.8783 | 171500 | 0.0058 |
788
- | 1.8838 | 172000 | 0.0116 |
789
- | 1.8892 | 172500 | 0.0089 |
790
- | 1.8947 | 173000 | 0.0137 |
791
- | 1.9002 | 173500 | 0.0065 |
792
- | 1.9057 | 174000 | 0.0098 |
793
- | 1.9111 | 174500 | 0.0083 |
794
- | 1.9166 | 175000 | 0.0115 |
795
- | 1.9221 | 175500 | 0.0083 |
796
- | 1.9276 | 176000 | 0.0084 |
797
- | 1.9330 | 176500 | 0.0091 |
798
- | 1.9385 | 177000 | 0.0092 |
799
- | 1.9440 | 177500 | 0.0054 |
800
- | 1.9495 | 178000 | 0.0049 |
801
- | 1.9549 | 178500 | 0.0072 |
802
- | 1.9604 | 179000 | 0.0052 |
803
- | 1.9659 | 179500 | 0.0063 |
804
- | 1.9714 | 180000 | 0.0107 |
805
- | 1.9768 | 180500 | 0.0061 |
806
- | 1.9823 | 181000 | 0.0059 |
807
- | 1.9878 | 181500 | 0.0067 |
808
- | 1.9933 | 182000 | 0.0078 |
809
- | 1.9988 | 182500 | 0.007 |
810
- | 2.0042 | 183000 | 0.0065 |
811
- | 2.0097 | 183500 | 0.0073 |
812
- | 2.0152 | 184000 | 0.01 |
813
- | 2.0207 | 184500 | 0.0072 |
814
- | 2.0261 | 185000 | 0.0055 |
815
- | 2.0316 | 185500 | 0.0087 |
816
- | 2.0371 | 186000 | 0.0077 |
817
- | 2.0426 | 186500 | 0.0067 |
818
- | 2.0480 | 187000 | 0.008 |
819
- | 2.0535 | 187500 | 0.0074 |
820
- | 2.0590 | 188000 | 0.0072 |
821
- | 2.0645 | 188500 | 0.0045 |
822
- | 2.0699 | 189000 | 0.0082 |
823
- | 2.0754 | 189500 | 0.0042 |
824
- | 2.0809 | 190000 | 0.0076 |
825
- | 2.0864 | 190500 | 0.0058 |
826
- | 2.0918 | 191000 | 0.005 |
827
- | 2.0973 | 191500 | 0.0047 |
828
- | 2.1028 | 192000 | 0.0045 |
829
- | 2.1083 | 192500 | 0.0043 |
830
- | 2.1137 | 193000 | 0.0049 |
831
- | 2.1192 | 193500 | 0.0058 |
832
- | 2.1247 | 194000 | 0.0081 |
833
- | 2.1302 | 194500 | 0.0057 |
834
- | 2.1357 | 195000 | 0.0047 |
835
- | 2.1411 | 195500 | 0.0073 |
836
- | 2.1466 | 196000 | 0.0056 |
837
- | 2.1521 | 196500 | 0.006 |
838
- | 2.1576 | 197000 | 0.0061 |
839
- | 2.1630 | 197500 | 0.0042 |
840
- | 2.1685 | 198000 | 0.0057 |
841
- | 2.1740 | 198500 | 0.0055 |
842
- | 2.1795 | 199000 | 0.0053 |
843
- | 2.1849 | 199500 | 0.0085 |
844
- | 2.1904 | 200000 | 0.005 |
845
- | 2.1959 | 200500 | 0.0055 |
846
- | 2.2014 | 201000 | 0.0032 |
847
- | 2.2068 | 201500 | 0.0054 |
848
- | 2.2123 | 202000 | 0.0037 |
849
- | 2.2178 | 202500 | 0.0046 |
850
- | 2.2233 | 203000 | 0.0029 |
851
- | 2.2287 | 203500 | 0.0043 |
852
- | 2.2342 | 204000 | 0.0063 |
853
- | 2.2397 | 204500 | 0.0064 |
854
- | 2.2452 | 205000 | 0.0046 |
855
- | 2.2506 | 205500 | 0.0061 |
856
- | 2.2561 | 206000 | 0.0034 |
857
- | 2.2616 | 206500 | 0.0046 |
858
- | 2.2671 | 207000 | 0.0059 |
859
- | 2.2726 | 207500 | 0.0044 |
860
- | 2.2780 | 208000 | 0.0054 |
861
- | 2.2835 | 208500 | 0.0049 |
862
- | 2.2890 | 209000 | 0.0096 |
863
- | 2.2945 | 209500 | 0.0045 |
864
- | 2.2999 | 210000 | 0.0057 |
865
- | 2.3054 | 210500 | 0.0032 |
866
- | 2.3109 | 211000 | 0.0031 |
867
- | 2.3164 | 211500 | 0.0043 |
868
- | 2.3218 | 212000 | 0.0068 |
869
- | 2.3273 | 212500 | 0.0048 |
870
- | 2.3328 | 213000 | 0.0042 |
871
- | 2.3383 | 213500 | 0.0068 |
872
- | 2.3437 | 214000 | 0.0041 |
873
- | 2.3492 | 214500 | 0.0042 |
874
- | 2.3547 | 215000 | 0.0051 |
875
- | 2.3602 | 215500 | 0.0049 |
876
- | 2.3656 | 216000 | 0.0019 |
877
- | 2.3711 | 216500 | 0.0039 |
878
- | 2.3766 | 217000 | 0.0068 |
879
- | 2.3821 | 217500 | 0.0033 |
880
- | 2.3875 | 218000 | 0.0048 |
881
- | 2.3930 | 218500 | 0.0052 |
882
- | 2.3985 | 219000 | 0.0063 |
883
- | 2.4040 | 219500 | 0.003 |
884
- | 2.4095 | 220000 | 0.0036 |
885
- | 2.4149 | 220500 | 0.004 |
886
- | 2.4204 | 221000 | 0.006 |
887
- | 2.4259 | 221500 | 0.0048 |
888
- | 2.4314 | 222000 | 0.0037 |
889
- | 2.4368 | 222500 | 0.0034 |
890
- | 2.4423 | 223000 | 0.0049 |
891
- | 2.4478 | 223500 | 0.0036 |
892
- | 2.4533 | 224000 | 0.0046 |
893
- | 2.4587 | 224500 | 0.0039 |
894
- | 2.4642 | 225000 | 0.0021 |
895
- | 2.4697 | 225500 | 0.0035 |
896
- | 2.4752 | 226000 | 0.0034 |
897
- | 2.4806 | 226500 | 0.003 |
898
- | 2.4861 | 227000 | 0.0032 |
899
- | 2.4916 | 227500 | 0.005 |
900
- | 2.4971 | 228000 | 0.0025 |
901
- | 2.5025 | 228500 | 0.0036 |
902
- | 2.5080 | 229000 | 0.0021 |
903
- | 2.5135 | 229500 | 0.0025 |
904
- | 2.5190 | 230000 | 0.0036 |
905
- | 2.5245 | 230500 | 0.0033 |
906
- | 2.5299 | 231000 | 0.0049 |
907
- | 2.5354 | 231500 | 0.0044 |
908
- | 2.5409 | 232000 | 0.0029 |
909
- | 2.5464 | 232500 | 0.0028 |
910
- | 2.5518 | 233000 | 0.0091 |
911
- | 2.5573 | 233500 | 0.004 |
912
- | 2.5628 | 234000 | 0.0036 |
913
- | 2.5683 | 234500 | 0.0029 |
914
- | 2.5737 | 235000 | 0.0035 |
915
- | 2.5792 | 235500 | 0.0038 |
916
- | 2.5847 | 236000 | 0.0028 |
917
- | 2.5902 | 236500 | 0.0041 |
918
- | 2.5956 | 237000 | 0.0037 |
919
- | 2.6011 | 237500 | 0.0031 |
920
- | 2.6066 | 238000 | 0.0036 |
921
- | 2.6121 | 238500 | 0.0052 |
922
- | 2.6175 | 239000 | 0.0031 |
923
- | 2.6230 | 239500 | 0.0023 |
924
- | 2.6285 | 240000 | 0.0043 |
925
- | 2.6340 | 240500 | 0.0027 |
926
- | 2.6394 | 241000 | 0.0048 |
927
- | 2.6449 | 241500 | 0.0046 |
928
- | 2.6504 | 242000 | 0.0038 |
929
- | 2.6559 | 242500 | 0.0033 |
930
- | 2.6614 | 243000 | 0.003 |
931
- | 2.6668 | 243500 | 0.0057 |
932
- | 2.6723 | 244000 | 0.0044 |
933
- | 2.6778 | 244500 | 0.0058 |
934
- | 2.6833 | 245000 | 0.003 |
935
- | 2.6887 | 245500 | 0.0042 |
936
- | 2.6942 | 246000 | 0.0045 |
937
- | 2.6997 | 246500 | 0.0031 |
938
- | 2.7052 | 247000 | 0.0021 |
939
- | 2.7106 | 247500 | 0.0043 |
940
- | 2.7161 | 248000 | 0.0058 |
941
- | 2.7216 | 248500 | 0.0041 |
942
- | 2.7271 | 249000 | 0.0038 |
943
- | 2.7325 | 249500 | 0.0019 |
944
- | 2.7380 | 250000 | 0.0029 |
945
- | 2.7435 | 250500 | 0.003 |
946
- | 2.7490 | 251000 | 0.0038 |
947
- | 2.7544 | 251500 | 0.004 |
948
- | 2.7599 | 252000 | 0.0049 |
949
- | 2.7654 | 252500 | 0.0039 |
950
- | 2.7709 | 253000 | 0.005 |
951
- | 2.7763 | 253500 | 0.0046 |
952
- | 2.7818 | 254000 | 0.0025 |
953
- | 2.7873 | 254500 | 0.0044 |
954
- | 2.7928 | 255000 | 0.0023 |
955
- | 2.7983 | 255500 | 0.0038 |
956
- | 2.8037 | 256000 | 0.0032 |
957
- | 2.8092 | 256500 | 0.0021 |
958
- | 2.8147 | 257000 | 0.0023 |
959
- | 2.8202 | 257500 | 0.0042 |
960
- | 2.8256 | 258000 | 0.0042 |
961
- | 2.8311 | 258500 | 0.0053 |
962
- | 2.8366 | 259000 | 0.0021 |
963
- | 2.8421 | 259500 | 0.0033 |
964
- | 2.8475 | 260000 | 0.0047 |
965
- | 2.8530 | 260500 | 0.0048 |
966
- | 2.8585 | 261000 | 0.0022 |
967
- | 2.8640 | 261500 | 0.0036 |
968
- | 2.8694 | 262000 | 0.0034 |
969
- | 2.8749 | 262500 | 0.0029 |
970
- | 2.8804 | 263000 | 0.0038 |
971
- | 2.8859 | 263500 | 0.0067 |
972
- | 2.8913 | 264000 | 0.003 |
973
- | 2.8968 | 264500 | 0.0049 |
974
- | 2.9023 | 265000 | 0.0027 |
975
- | 2.9078 | 265500 | 0.004 |
976
- | 2.9132 | 266000 | 0.0042 |
977
- | 2.9187 | 266500 | 0.0042 |
978
- | 2.9242 | 267000 | 0.0038 |
979
- | 2.9297 | 267500 | 0.0029 |
980
- | 2.9352 | 268000 | 0.0039 |
981
- | 2.9406 | 268500 | 0.0039 |
982
- | 2.9461 | 269000 | 0.002 |
983
- | 2.9516 | 269500 | 0.0022 |
984
- | 2.9571 | 270000 | 0.002 |
985
- | 2.9625 | 270500 | 0.003 |
986
- | 2.9680 | 271000 | 0.0019 |
987
- | 2.9735 | 271500 | 0.0044 |
988
- | 2.9790 | 272000 | 0.0028 |
989
- | 2.9844 | 272500 | 0.0031 |
990
- | 2.9899 | 273000 | 0.0025 |
991
- | 2.9954 | 273500 | 0.0021 |
992
- | 3.0009 | 274000 | 0.0025 |
993
- | 3.0063 | 274500 | 0.0038 |
994
- | 3.0118 | 275000 | 0.0045 |
995
- | 3.0173 | 275500 | 0.002 |
996
- | 3.0228 | 276000 | 0.0035 |
997
- | 3.0282 | 276500 | 0.0046 |
998
- | 3.0337 | 277000 | 0.0033 |
999
- | 3.0392 | 277500 | 0.002 |
1000
- | 3.0447 | 278000 | 0.0036 |
1001
- | 3.0501 | 278500 | 0.0025 |
1002
- | 3.0556 | 279000 | 0.0039 |
1003
- | 3.0611 | 279500 | 0.0029 |
1004
- | 3.0666 | 280000 | 0.004 |
1005
- | 3.0721 | 280500 | 0.0023 |
1006
- | 3.0775 | 281000 | 0.0019 |
1007
- | 3.0830 | 281500 | 0.0019 |
1008
- | 3.0885 | 282000 | 0.0027 |
1009
- | 3.0940 | 282500 | 0.0014 |
1010
- | 3.0994 | 283000 | 0.0019 |
1011
- | 3.1049 | 283500 | 0.0018 |
1012
- | 3.1104 | 284000 | 0.0016 |
1013
- | 3.1159 | 284500 | 0.0017 |
1014
- | 3.1213 | 285000 | 0.0049 |
1015
- | 3.1268 | 285500 | 0.0022 |
1016
- | 3.1323 | 286000 | 0.0023 |
1017
- | 3.1378 | 286500 | 0.0016 |
1018
- | 3.1432 | 287000 | 0.002 |
1019
- | 3.1487 | 287500 | 0.0025 |
1020
- | 3.1542 | 288000 | 0.0012 |
1021
- | 3.1597 | 288500 | 0.0021 |
1022
- | 3.1651 | 289000 | 0.0017 |
1023
- | 3.1706 | 289500 | 0.0019 |
1024
- | 3.1761 | 290000 | 0.0019 |
1025
- | 3.1816 | 290500 | 0.0042 |
1026
- | 3.1871 | 291000 | 0.0027 |
1027
- | 3.1925 | 291500 | 0.0011 |
1028
- | 3.1980 | 292000 | 0.002 |
1029
- | 3.2035 | 292500 | 0.0021 |
1030
- | 3.2090 | 293000 | 0.0015 |
1031
- | 3.2144 | 293500 | 0.0017 |
1032
- | 3.2199 | 294000 | 0.002 |
1033
- | 3.2254 | 294500 | 0.0012 |
1034
- | 3.2309 | 295000 | 0.0017 |
1035
- | 3.2363 | 295500 | 0.0029 |
1036
- | 3.2418 | 296000 | 0.0019 |
1037
- | 3.2473 | 296500 | 0.0017 |
1038
- | 3.2528 | 297000 | 0.0019 |
1039
- | 3.2582 | 297500 | 0.0012 |
1040
- | 3.2637 | 298000 | 0.0024 |
1041
- | 3.2692 | 298500 | 0.0017 |
1042
- | 3.2747 | 299000 | 0.0022 |
1043
- | 3.2801 | 299500 | 0.002 |
1044
- | 3.2856 | 300000 | 0.0028 |
1045
- | 3.2911 | 300500 | 0.0036 |
1046
- | 3.2966 | 301000 | 0.0015 |
1047
- | 3.3020 | 301500 | 0.0024 |
1048
- | 3.3075 | 302000 | 0.0015 |
1049
- | 3.3130 | 302500 | 0.0012 |
1050
- | 3.3185 | 303000 | 0.0022 |
1051
- | 3.3240 | 303500 | 0.0015 |
1052
- | 3.3294 | 304000 | 0.0023 |
1053
- | 3.3349 | 304500 | 0.0017 |
1054
- | 3.3404 | 305000 | 0.0021 |
1055
- | 3.3459 | 305500 | 0.0017 |
1056
- | 3.3513 | 306000 | 0.0015 |
1057
- | 3.3568 | 306500 | 0.0023 |
1058
- | 3.3623 | 307000 | 0.0014 |
1059
- | 3.3678 | 307500 | 0.0019 |
1060
- | 3.3732 | 308000 | 0.0017 |
1061
- | 3.3787 | 308500 | 0.0027 |
1062
- | 3.3842 | 309000 | 0.0016 |
1063
- | 3.3897 | 309500 | 0.0019 |
1064
- | 3.3951 | 310000 | 0.0037 |
1065
- | 3.4006 | 310500 | 0.0016 |
1066
- | 3.4061 | 311000 | 0.0012 |
1067
- | 3.4116 | 311500 | 0.0024 |
1068
- | 3.4170 | 312000 | 0.0016 |
1069
- | 3.4225 | 312500 | 0.0022 |
1070
- | 3.4280 | 313000 | 0.0015 |
1071
- | 3.4335 | 313500 | 0.0017 |
1072
- | 3.4389 | 314000 | 0.0015 |
1073
- | 3.4444 | 314500 | 0.0018 |
1074
- | 3.4499 | 315000 | 0.0015 |
1075
- | 3.4554 | 315500 | 0.0019 |
1076
- | 3.4609 | 316000 | 0.0009 |
1077
- | 3.4663 | 316500 | 0.001 |
1078
- | 3.4718 | 317000 | 0.001 |
1079
- | 3.4773 | 317500 | 0.0023 |
1080
- | 3.4828 | 318000 | 0.0012 |
1081
- | 3.4882 | 318500 | 0.0012 |
1082
- | 3.4937 | 319000 | 0.0011 |
1083
- | 3.4992 | 319500 | 0.0008 |
1084
- | 3.5047 | 320000 | 0.0018 |
1085
- | 3.5101 | 320500 | 0.0009 |
1086
- | 3.5156 | 321000 | 0.0016 |
1087
- | 3.5211 | 321500 | 0.0012 |
1088
- | 3.5266 | 322000 | 0.0015 |
1089
- | 3.5320 | 322500 | 0.0024 |
1090
- | 3.5375 | 323000 | 0.0016 |
1091
- | 3.5430 | 323500 | 0.0014 |
1092
- | 3.5485 | 324000 | 0.0014 |
1093
- | 3.5539 | 324500 | 0.0047 |
1094
- | 3.5594 | 325000 | 0.0013 |
1095
- | 3.5649 | 325500 | 0.0012 |
1096
- | 3.5704 | 326000 | 0.0013 |
1097
- | 3.5758 | 326500 | 0.0011 |
1098
- | 3.5813 | 327000 | 0.0011 |
1099
- | 3.5868 | 327500 | 0.0016 |
1100
- | 3.5923 | 328000 | 0.0022 |
1101
- | 3.5978 | 328500 | 0.0017 |
1102
- | 3.6032 | 329000 | 0.0012 |
1103
- | 3.6087 | 329500 | 0.002 |
1104
- | 3.6142 | 330000 | 0.0016 |
1105
- | 3.6197 | 330500 | 0.0009 |
1106
- | 3.6251 | 331000 | 0.0011 |
1107
- | 3.6306 | 331500 | 0.0019 |
1108
- | 3.6361 | 332000 | 0.0011 |
1109
- | 3.6416 | 332500 | 0.0021 |
1110
- | 3.6470 | 333000 | 0.0029 |
1111
- | 3.6525 | 333500 | 0.001 |
1112
- | 3.6580 | 334000 | 0.0016 |
1113
- | 3.6635 | 334500 | 0.0016 |
1114
- | 3.6689 | 335000 | 0.0036 |
1115
- | 3.6744 | 335500 | 0.0012 |
1116
- | 3.6799 | 336000 | 0.003 |
1117
- | 3.6854 | 336500 | 0.0014 |
1118
- | 3.6908 | 337000 | 0.0018 |
1119
- | 3.6963 | 337500 | 0.001 |
1120
- | 3.7018 | 338000 | 0.001 |
1121
- | 3.7073 | 338500 | 0.0016 |
1122
- | 3.7127 | 339000 | 0.0025 |
1123
- | 3.7182 | 339500 | 0.001 |
1124
- | 3.7237 | 340000 | 0.0018 |
1125
- | 3.7292 | 340500 | 0.0015 |
1126
- | 3.7347 | 341000 | 0.001 |
1127
- | 3.7401 | 341500 | 0.0009 |
1128
- | 3.7456 | 342000 | 0.0013 |
1129
- | 3.7511 | 342500 | 0.0014 |
1130
- | 3.7566 | 343000 | 0.0013 |
1131
- | 3.7620 | 343500 | 0.0011 |
1132
- | 3.7675 | 344000 | 0.0026 |
1133
- | 3.7730 | 344500 | 0.0014 |
1134
- | 3.7785 | 345000 | 0.0021 |
1135
- | 3.7839 | 345500 | 0.0015 |
1136
- | 3.7894 | 346000 | 0.0013 |
1137
- | 3.7949 | 346500 | 0.0013 |
1138
- | 3.8004 | 347000 | 0.0019 |
1139
- | 3.8058 | 347500 | 0.0009 |
1140
- | 3.8113 | 348000 | 0.0009 |
1141
- | 3.8168 | 348500 | 0.0014 |
1142
- | 3.8223 | 349000 | 0.0012 |
1143
- | 3.8277 | 349500 | 0.0032 |
1144
- | 3.8332 | 350000 | 0.0015 |
1145
- | 3.8387 | 350500 | 0.0011 |
1146
- | 3.8442 | 351000 | 0.002 |
1147
- | 3.8497 | 351500 | 0.0012 |
1148
- | 3.8551 | 352000 | 0.0026 |
1149
- | 3.8606 | 352500 | 0.001 |
1150
- | 3.8661 | 353000 | 0.0018 |
1151
- | 3.8716 | 353500 | 0.0014 |
1152
- | 3.8770 | 354000 | 0.001 |
1153
- | 3.8825 | 354500 | 0.0018 |
1154
- | 3.8880 | 355000 | 0.0027 |
1155
- | 3.8935 | 355500 | 0.0027 |
1156
- | 3.8989 | 356000 | 0.0011 |
1157
- | 3.9044 | 356500 | 0.0024 |
1158
- | 3.9099 | 357000 | 0.0012 |
1159
- | 3.9154 | 357500 | 0.0018 |
1160
- | 3.9208 | 358000 | 0.0012 |
1161
- | 3.9263 | 358500 | 0.0015 |
1162
- | 3.9318 | 359000 | 0.0015 |
1163
- | 3.9373 | 359500 | 0.0018 |
1164
- | 3.9427 | 360000 | 0.0017 |
1165
- | 3.9482 | 360500 | 0.0009 |
1166
- | 3.9537 | 361000 | 0.001 |
1167
- | 3.9592 | 361500 | 0.0013 |
1168
- | 3.9646 | 362000 | 0.0008 |
1169
- | 3.9701 | 362500 | 0.0018 |
1170
- | 3.9756 | 363000 | 0.0027 |
1171
- | 3.9811 | 363500 | 0.0009 |
1172
- | 3.9866 | 364000 | 0.0008 |
1173
- | 3.9920 | 364500 | 0.001 |
1174
- | 3.9975 | 365000 | 0.0009 |
1175
- | 4.0030 | 365500 | 0.0012 |
1176
- | 4.0085 | 366000 | 0.0011 |
1177
- | 4.0139 | 366500 | 0.0023 |
1178
- | 4.0194 | 367000 | 0.0023 |
1179
- | 4.0249 | 367500 | 0.0012 |
1180
- | 4.0304 | 368000 | 0.0018 |
1181
- | 4.0358 | 368500 | 0.0013 |
1182
- | 4.0413 | 369000 | 0.0009 |
1183
- | 4.0468 | 369500 | 0.0016 |
1184
- | 4.0523 | 370000 | 0.0011 |
1185
- | 4.0577 | 370500 | 0.0011 |
1186
- | 4.0632 | 371000 | 0.0009 |
1187
- | 4.0687 | 371500 | 0.0012 |
1188
- | 4.0742 | 372000 | 0.0011 |
1189
- | 4.0796 | 372500 | 0.0008 |
1190
- | 4.0851 | 373000 | 0.001 |
1191
- | 4.0906 | 373500 | 0.0008 |
1192
- | 4.0961 | 374000 | 0.0009 |
1193
- | 4.1015 | 374500 | 0.0008 |
1194
- | 4.1070 | 375000 | 0.0008 |
1195
- | 4.1125 | 375500 | 0.0008 |
1196
- | 4.1180 | 376000 | 0.0009 |
1197
- | 4.1235 | 376500 | 0.0021 |
1198
- | 4.1289 | 377000 | 0.0007 |
1199
- | 4.1344 | 377500 | 0.0014 |
1200
- | 4.1399 | 378000 | 0.0008 |
1201
- | 4.1454 | 378500 | 0.0015 |
1202
- | 4.1508 | 379000 | 0.0008 |
1203
- | 4.1563 | 379500 | 0.0008 |
1204
- | 4.1618 | 380000 | 0.0015 |
1205
- | 4.1673 | 380500 | 0.0008 |
1206
- | 4.1727 | 381000 | 0.0009 |
1207
- | 4.1782 | 381500 | 0.0018 |
1208
- | 4.1837 | 382000 | 0.0013 |
1209
- | 4.1892 | 382500 | 0.0012 |
1210
- | 4.1946 | 383000 | 0.0008 |
1211
- | 4.2001 | 383500 | 0.0008 |
1212
- | 4.2056 | 384000 | 0.0008 |
1213
- | 4.2111 | 384500 | 0.0008 |
1214
- | 4.2165 | 385000 | 0.001 |
1215
- | 4.2220 | 385500 | 0.0008 |
1216
- | 4.2275 | 386000 | 0.0008 |
1217
- | 4.2330 | 386500 | 0.0009 |
1218
- | 4.2384 | 387000 | 0.0008 |
1219
- | 4.2439 | 387500 | 0.0008 |
1220
- | 4.2494 | 388000 | 0.0011 |
1221
- | 4.2549 | 388500 | 0.0009 |
1222
- | 4.2604 | 389000 | 0.0007 |
1223
- | 4.2658 | 389500 | 0.001 |
1224
- | 4.2713 | 390000 | 0.0007 |
1225
- | 4.2768 | 390500 | 0.0011 |
1226
- | 4.2823 | 391000 | 0.0007 |
1227
- | 4.2877 | 391500 | 0.0019 |
1228
- | 4.2932 | 392000 | 0.0009 |
1229
- | 4.2987 | 392500 | 0.0011 |
1230
- | 4.3042 | 393000 | 0.0008 |
1231
- | 4.3096 | 393500 | 0.0006 |
1232
- | 4.3151 | 394000 | 0.0009 |
1233
- | 4.3206 | 394500 | 0.001 |
1234
- | 4.3261 | 395000 | 0.0007 |
1235
- | 4.3315 | 395500 | 0.0011 |
1236
- | 4.3370 | 396000 | 0.0008 |
1237
- | 4.3425 | 396500 | 0.0007 |
1238
- | 4.3480 | 397000 | 0.0007 |
1239
- | 4.3534 | 397500 | 0.0007 |
1240
- | 4.3589 | 398000 | 0.001 |
1241
- | 4.3644 | 398500 | 0.0008 |
1242
- | 4.3699 | 399000 | 0.001 |
1243
- | 4.3753 | 399500 | 0.0014 |
1244
- | 4.3808 | 400000 | 0.0006 |
1245
- | 4.3863 | 400500 | 0.0006 |
1246
- | 4.3918 | 401000 | 0.001 |
1247
- | 4.3973 | 401500 | 0.002 |
1248
- | 4.4027 | 402000 | 0.0006 |
1249
- | 4.4082 | 402500 | 0.0007 |
1250
- | 4.4137 | 403000 | 0.001 |
1251
- | 4.4192 | 403500 | 0.0008 |
1252
- | 4.4246 | 404000 | 0.0008 |
1253
- | 4.4301 | 404500 | 0.0009 |
1254
- | 4.4356 | 405000 | 0.0005 |
1255
- | 4.4411 | 405500 | 0.0008 |
1256
- | 4.4465 | 406000 | 0.0008 |
1257
- | 4.4520 | 406500 | 0.0007 |
1258
- | 4.4575 | 407000 | 0.0006 |
1259
- | 4.4630 | 407500 | 0.0006 |
1260
- | 4.4684 | 408000 | 0.0006 |
1261
- | 4.4739 | 408500 | 0.0006 |
1262
- | 4.4794 | 409000 | 0.0009 |
1263
- | 4.4849 | 409500 | 0.0007 |
1264
- | 4.4903 | 410000 | 0.0009 |
1265
- | 4.4958 | 410500 | 0.0006 |
1266
- | 4.5013 | 411000 | 0.0007 |
1267
- | 4.5068 | 411500 | 0.0006 |
1268
- | 4.5122 | 412000 | 0.0007 |
1269
- | 4.5177 | 412500 | 0.0006 |
1270
- | 4.5232 | 413000 | 0.0008 |
1271
- | 4.5287 | 413500 | 0.0007 |
1272
- | 4.5342 | 414000 | 0.0013 |
1273
- | 4.5396 | 414500 | 0.0006 |
1274
- | 4.5451 | 415000 | 0.0009 |
1275
- | 4.5506 | 415500 | 0.0015 |
1276
- | 4.5561 | 416000 | 0.0014 |
1277
- | 4.5615 | 416500 | 0.0007 |
1278
- | 4.5670 | 417000 | 0.0007 |
1279
- | 4.5725 | 417500 | 0.0008 |
1280
- | 4.5780 | 418000 | 0.0008 |
1281
- | 4.5834 | 418500 | 0.0007 |
1282
- | 4.5889 | 419000 | 0.0006 |
1283
- | 4.5944 | 419500 | 0.0008 |
1284
- | 4.5999 | 420000 | 0.0008 |
1285
- | 4.6053 | 420500 | 0.0006 |
1286
- | 4.6108 | 421000 | 0.001 |
1287
- | 4.6163 | 421500 | 0.0005 |
1288
- | 4.6218 | 422000 | 0.0007 |
1289
- | 4.6272 | 422500 | 0.0006 |
1290
- | 4.6327 | 423000 | 0.0007 |
1291
- | 4.6382 | 423500 | 0.0009 |
1292
- | 4.6437 | 424000 | 0.0014 |
1293
- | 4.6492 | 424500 | 0.0008 |
1294
- | 4.6546 | 425000 | 0.0006 |
1295
- | 4.6601 | 425500 | 0.0006 |
1296
- | 4.6656 | 426000 | 0.0016 |
1297
- | 4.6711 | 426500 | 0.0006 |
1298
- | 4.6765 | 427000 | 0.0006 |
1299
- | 4.6820 | 427500 | 0.0012 |
1300
- | 4.6875 | 428000 | 0.0007 |
1301
- | 4.6930 | 428500 | 0.0009 |
1302
- | 4.6984 | 429000 | 0.0006 |
1303
- | 4.7039 | 429500 | 0.0005 |
1304
- | 4.7094 | 430000 | 0.0007 |
1305
- | 4.7149 | 430500 | 0.0007 |
1306
- | 4.7203 | 431000 | 0.0006 |
1307
- | 4.7258 | 431500 | 0.0006 |
1308
- | 4.7313 | 432000 | 0.0006 |
1309
- | 4.7368 | 432500 | 0.0006 |
1310
- | 4.7422 | 433000 | 0.0006 |
1311
- | 4.7477 | 433500 | 0.0006 |
1312
- | 4.7532 | 434000 | 0.0006 |
1313
- | 4.7587 | 434500 | 0.0006 |
1314
- | 4.7641 | 435000 | 0.0006 |
1315
- | 4.7696 | 435500 | 0.0018 |
1316
- | 4.7751 | 436000 | 0.0009 |
1317
- | 4.7806 | 436500 | 0.0007 |
1318
- | 4.7861 | 437000 | 0.0007 |
1319
- | 4.7915 | 437500 | 0.0005 |
1320
- | 4.7970 | 438000 | 0.0009 |
1321
- | 4.8025 | 438500 | 0.0013 |
1322
- | 4.8080 | 439000 | 0.0007 |
1323
- | 4.8134 | 439500 | 0.0006 |
1324
- | 4.8189 | 440000 | 0.0007 |
1325
- | 4.8244 | 440500 | 0.001 |
1326
- | 4.8299 | 441000 | 0.0019 |
1327
- | 4.8353 | 441500 | 0.0006 |
1328
- | 4.8408 | 442000 | 0.0006 |
1329
- | 4.8463 | 442500 | 0.0009 |
1330
- | 4.8518 | 443000 | 0.0006 |
1331
- | 4.8572 | 443500 | 0.001 |
1332
- | 4.8627 | 444000 | 0.0011 |
1333
- | 4.8682 | 444500 | 0.0007 |
1334
- | 4.8737 | 445000 | 0.0007 |
1335
- | 4.8791 | 445500 | 0.0007 |
1336
- | 4.8846 | 446000 | 0.0018 |
1337
- | 4.8901 | 446500 | 0.0007 |
1338
- | 4.8956 | 447000 | 0.0012 |
1339
- | 4.9010 | 447500 | 0.0007 |
1340
- | 4.9065 | 448000 | 0.0009 |
1341
- | 4.9120 | 448500 | 0.0007 |
1342
- | 4.9175 | 449000 | 0.001 |
1343
- | 4.9230 | 449500 | 0.0007 |
1344
- | 4.9284 | 450000 | 0.0007 |
1345
- | 4.9339 | 450500 | 0.0007 |
1346
- | 4.9394 | 451000 | 0.0011 |
1347
- | 4.9449 | 451500 | 0.0005 |
1348
- | 4.9503 | 452000 | 0.0007 |
1349
- | 4.9558 | 452500 | 0.0006 |
1350
- | 4.9613 | 453000 | 0.0009 |
1351
- | 4.9668 | 453500 | 0.0008 |
1352
- | 4.9722 | 454000 | 0.0015 |
1353
- | 4.9777 | 454500 | 0.0008 |
1354
- | 4.9832 | 455000 | 0.0006 |
1355
- | 4.9887 | 455500 | 0.0006 |
1356
- | 4.9941 | 456000 | 0.0007 |
1357
- | 4.9996 | 456500 | 0.0006 |
1358
-
1359
- </details>
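The Epoch and Step columns in the log above are related by a fixed number of optimizer steps per epoch. The sketch below reproduces that mapping under stated assumptions: the dataset size of 730454 comes from the card's `dataset_size:730454` tag, while the batch size of 8 is not stated in this section and is only inferred because it makes the logged values line up (no gradient accumulation and a single device are also assumed). The function names are hypothetical helpers, not part of any library.

```python
import math


def steps_per_epoch(dataset_size: int, batch_size: int) -> int:
    """Number of optimizer steps in one pass over the data (last partial batch kept)."""
    return math.ceil(dataset_size / batch_size)


def epoch_at_step(step: int, dataset_size: int, batch_size: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch(dataset_size, batch_size)


if __name__ == "__main__":
    DATASET_SIZE = 730454  # from the card tag dataset_size:730454
    BATCH_SIZE = 8         # assumption: inferred from the log, not stated in it

    # Compare against the Epoch column for a few logged steps.
    for step in (500, 91500, 456500):
        print(step, round(epoch_at_step(step, DATASET_SIZE, BATCH_SIZE), 4))
```

If the real run used a different batch size or gradient accumulation, the computed epochs would no longer match the Epoch column, which makes this a quick consistency check on the logged configuration.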
1360
-
1361
- ### Framework Versions
1362
- - Python: 3.12.2
1363
- - Sentence Transformers: 3.0.1
1364
- - Transformers: 4.42.3
1365
- - PyTorch: 2.3.1+cu121
1366
- - Accelerate: 0.32.1
1367
- - Datasets: 2.20.0
1368
- - Tokenizers: 0.19.1
1369
-
1370
- ## Citation
1371
-
1372
- ### BibTeX
1373
-
1374
- #### Sentence Transformers
1375
- ```bibtex
- @inproceedings{reimers-2019-sentence-bert,
- title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
- author = "Reimers, Nils and Gurevych, Iryna",
- booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
- month = "11",
- year = "2019",
- publisher = "Association for Computational Linguistics",
- url = "https://arxiv.org/abs/1908.10084",
- }
- ```
1386
-
1387
- #### MultipleNegativesRankingLoss
1388
- ```bibtex
- @misc{henderson2017efficient,
- title={Efficient Natural Language Response Suggestion for Smart Reply},
- author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
- year={2017},
- eprint={1705.00652},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
- }
- ```
1398
-
1399
- <!--
1400
- ## Glossary
1401
-
1402
- *Clearly define terms in order to be accessible across audiences.*
1403
- -->
1404
-
1405
- <!--
1406
- ## Model Card Authors
1407
-
1408
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1409
- -->
1410
-
1411
- <!--
1412
- ## Model Card Contact
1413
-
1414
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1415
- -->
mx-01/config.json DELETED
@@ -1,26 +0,0 @@
1
- {
2
- "_name_or_path": "nreimers/MiniLM-L6-H384-uncased",
3
- "architectures": [
4
- "BertModel"
5
- ],
6
- "attention_probs_dropout_prob": 0.1,
7
- "classifier_dropout": null,
8
- "gradient_checkpointing": false,
9
- "hidden_act": "gelu",
10
- "hidden_dropout_prob": 0.1,
11
- "hidden_size": 384,
12
- "initializer_range": 0.02,
13
- "intermediate_size": 1536,
14
- "layer_norm_eps": 1e-12,
15
- "max_position_embeddings": 512,
16
- "model_type": "bert",
17
- "num_attention_heads": 12,
18
- "num_hidden_layers": 6,
19
- "pad_token_id": 0,
20
- "position_embedding_type": "absolute",
21
- "torch_dtype": "float32",
22
- "transformers_version": "4.42.3",
23
- "type_vocab_size": 2,
24
- "use_cache": true,
25
- "vocab_size": 30522
26
- }
mx-01/config_sentence_transformers.json DELETED
@@ -1,10 +0,0 @@
1
- {
2
- "__version__": {
3
- "sentence_transformers": "3.0.1",
4
- "transformers": "4.42.3",
5
- "pytorch": "2.3.1+cu121"
6
- },
7
- "prompts": {},
8
- "default_prompt_name": null,
9
- "similarity_fn_name": null
10
- }
mx-01/log.txt DELETED
@@ -1,912 +0,0 @@
1
- {'loss': 0.3883, 'grad_norm': 21.324216842651367, 'learning_rate': 3e-06, 'epoch': 0.02}
2
- {'loss': 0.2685, 'grad_norm': 7.115039825439453, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.02}
3
- {'loss': 0.2349, 'grad_norm': 22.00225067138672, 'learning_rate': 5e-06, 'epoch': 0.03}
4
- {'loss': 0.1685, 'grad_norm': 8.233646392822266, 'learning_rate': 6e-06, 'epoch': 0.03}
5
- {'loss': 0.1409, 'grad_norm': 1.1980726718902588, 'learning_rate': 7e-06, 'epoch': 0.04}
6
- {'loss': 0.1262, 'grad_norm': 2.665900707244873, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.04}
7
- {'loss': 0.1195, 'grad_norm': 12.662271499633789, 'learning_rate': 9e-06, 'epoch': 0.05}
8
- {'loss': 0.1044, 'grad_norm': 20.355819702148438, 'learning_rate': 1e-05, 'epoch': 0.05}
9
- {'loss': 0.0989, 'grad_norm': 9.962722778320312, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.06}
10
- {'loss': 0.0787, 'grad_norm': 2.361100912094116, 'learning_rate': 1.2e-05, 'epoch': 0.07}
11
- {'loss': 0.0895, 'grad_norm': 0.5440080165863037, 'learning_rate': 1.3000000000000001e-05, 'epoch': 0.07}
12
- {'loss': 0.0708, 'grad_norm': 22.654308319091797, 'learning_rate': 1.4e-05, 'epoch': 0.08}
13
- {'loss': 0.0834, 'grad_norm': 1.5862770080566406, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.08}
14
- {'loss': 0.0634, 'grad_norm': 2.121326446533203, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.09}
15
- {'loss': 0.0643, 'grad_norm': 1.7471628189086914, 'learning_rate': 1.7e-05, 'epoch': 0.09}
16
- {'loss': 0.0567, 'grad_norm': 1.2325271368026733, 'learning_rate': 1.8e-05, 'epoch': 0.1}
17
- {'loss': 0.0646, 'grad_norm': 16.829893112182617, 'learning_rate': 1.9e-05, 'epoch': 0.1}
18
- {'loss': 0.0607, 'grad_norm': 1.4897233247756958, 'learning_rate': 2e-05, 'epoch': 0.11}
19
- {'loss': 0.0564, 'grad_norm': 1.4202508926391602, 'learning_rate': 1.997760508812398e-05, 'epoch': 0.11}
20
- {'loss': 0.068, 'grad_norm': 0.0861266702413559, 'learning_rate': 1.995521017624796e-05, 'epoch': 0.12}
21
- {'loss': 0.0536, 'grad_norm': 0.28206154704093933, 'learning_rate': 1.9932815264371937e-05, 'epoch': 0.13}
22
- {'loss': 0.0594, 'grad_norm': 8.399012565612793, 'learning_rate': 1.9910420352495915e-05, 'epoch': 0.13}
23
- {'loss': 0.057, 'grad_norm': 14.871772766113281, 'learning_rate': 1.9888025440619893e-05, 'epoch': 0.14}
24
- {'loss': 0.0555, 'grad_norm': 0.14331811666488647, 'learning_rate': 1.9865630528743872e-05, 'epoch': 0.14}
25
- {'loss': 0.0485, 'grad_norm': 18.870864868164062, 'learning_rate': 1.984323561686785e-05, 'epoch': 0.15}
26
- {'loss': 0.0528, 'grad_norm': 1.9486477375030518, 'learning_rate': 1.982084070499183e-05, 'epoch': 0.15}
27
- {'loss': 0.0478, 'grad_norm': 4.394039630889893, 'learning_rate': 1.9798445793115807e-05, 'epoch': 0.16}
28
- {'loss': 0.0586, 'grad_norm': 0.8561568856239319, 'learning_rate': 1.9776050881239782e-05, 'epoch': 0.16}
29
- {'loss': 0.0539, 'grad_norm': 1.0208560228347778, 'learning_rate': 1.9753655969363764e-05, 'epoch': 0.17}
30
- {'loss': 0.0432, 'grad_norm': 4.224825859069824, 'learning_rate': 1.9731261057487742e-05, 'epoch': 0.18}
31
- {'loss': 0.0542, 'grad_norm': 1.7423515319824219, 'learning_rate': 1.9708866145611717e-05, 'epoch': 0.18}
32
- {'loss': 0.0536, 'grad_norm': 0.3042142689228058, 'learning_rate': 1.96864712337357e-05, 'epoch': 0.19}
33
- {'loss': 0.0492, 'grad_norm': 2.4977452754974365, 'learning_rate': 1.9664076321859677e-05, 'epoch': 0.19}
34
- {'loss': 0.0427, 'grad_norm': 3.3112635612487793, 'learning_rate': 1.9641681409983652e-05, 'epoch': 0.2}
35
- {'loss': 0.0489, 'grad_norm': 0.42069825530052185, 'learning_rate': 1.9619286498107634e-05, 'epoch': 0.2}
36
- {'loss': 0.0502, 'grad_norm': 0.1265694946050644, 'learning_rate': 1.959689158623161e-05, 'epoch': 0.21}
37
- {'loss': 0.0432, 'grad_norm': 0.2148071676492691, 'learning_rate': 1.9574496674355587e-05, 'epoch': 0.21}
38
- {'loss': 0.0459, 'grad_norm': 3.6947178840637207, 'learning_rate': 1.955210176247957e-05, 'epoch': 0.22}
39
- {'loss': 0.0376, 'grad_norm': 4.6488776206970215, 'learning_rate': 1.9529706850603544e-05, 'epoch': 0.22}
40
- {'loss': 0.0489, 'grad_norm': 1.4403197765350342, 'learning_rate': 1.9507311938727522e-05, 'epoch': 0.23}
41
- {'loss': 0.0515, 'grad_norm': 4.117632865905762, 'learning_rate': 1.94849170268515e-05, 'epoch': 0.24}
42
- {'loss': 0.0429, 'grad_norm': 0.196114644408226, 'learning_rate': 1.946252211497548e-05, 'epoch': 0.24}
43
- {'loss': 0.0417, 'grad_norm': 0.06215653941035271, 'learning_rate': 1.9440127203099457e-05, 'epoch': 0.25}
44
- {'loss': 0.0478, 'grad_norm': 0.19152769446372986, 'learning_rate': 1.9417732291223435e-05, 'epoch': 0.25}
45
- {'loss': 0.0359, 'grad_norm': 0.24261559545993805, 'learning_rate': 1.9395337379347414e-05, 'epoch': 0.26}
46
- {'loss': 0.0452, 'grad_norm': 0.7501357793807983, 'learning_rate': 1.9372942467471392e-05, 'epoch': 0.26}
47
- {'loss': 0.0443, 'grad_norm': 2.0564398765563965, 'learning_rate': 1.935054755559537e-05, 'epoch': 0.27}
48
- {'loss': 0.0409, 'grad_norm': 0.031283892691135406, 'learning_rate': 1.932815264371935e-05, 'epoch': 0.27}
49
- {'loss': 0.0421, 'grad_norm': 0.15692433714866638, 'learning_rate': 1.9305757731843327e-05, 'epoch': 0.28}
50
- {'loss': 0.0393, 'grad_norm': 14.9638090133667, 'learning_rate': 1.9283362819967305e-05, 'epoch': 0.28}
51
- {'loss': 0.0409, 'grad_norm': 0.08281882852315903, 'learning_rate': 1.9260967908091284e-05, 'epoch': 0.29}
52
- {'loss': 0.032, 'grad_norm': 0.3644435405731201, 'learning_rate': 1.9238572996215262e-05, 'epoch': 0.3}
53
- {'loss': 0.0468, 'grad_norm': 0.13462503254413605, 'learning_rate': 1.9216178084339237e-05, 'epoch': 0.3}
54
- {'loss': 0.0285, 'grad_norm': 0.3230873942375183, 'learning_rate': 1.919378317246322e-05, 'epoch': 0.31}
55
- {'loss': 0.0311, 'grad_norm': 20.536762237548828, 'learning_rate': 1.9171388260587197e-05, 'epoch': 0.31}
56
- {'loss': 0.0304, 'grad_norm': 1.3668054342269897, 'learning_rate': 1.9148993348711172e-05, 'epoch': 0.32}
57
- {'loss': 0.0349, 'grad_norm': 15.044346809387207, 'learning_rate': 1.9126598436835154e-05, 'epoch': 0.32}
58
- {'loss': 0.0352, 'grad_norm': 21.638084411621094, 'learning_rate': 1.9104203524959132e-05, 'epoch': 0.33}
59
- {'loss': 0.0367, 'grad_norm': 0.060597069561481476, 'learning_rate': 1.9081808613083107e-05, 'epoch': 0.33}
60
- {'loss': 0.0385, 'grad_norm': 1.7127445936203003, 'learning_rate': 1.905941370120709e-05, 'epoch': 0.34}
61
- {'loss': 0.0325, 'grad_norm': 0.011513516306877136, 'learning_rate': 1.9037018789331064e-05, 'epoch': 0.34}
62
- {'loss': 0.0302, 'grad_norm': 0.3838317394256592, 'learning_rate': 1.9014623877455042e-05, 'epoch': 0.35}
63
- {'loss': 0.0393, 'grad_norm': 0.766505777835846, 'learning_rate': 1.8992228965579024e-05, 'epoch': 0.36}
64
- {'loss': 0.032, 'grad_norm': 0.01746511273086071, 'learning_rate': 1.8969834053703e-05, 'epoch': 0.36}
65
- {'loss': 0.0263, 'grad_norm': 5.542301654815674, 'learning_rate': 1.8947439141826977e-05, 'epoch': 0.37}
66
- {'loss': 0.0343, 'grad_norm': 0.06212176755070686, 'learning_rate': 1.8925044229950956e-05, 'epoch': 0.37}
67
- {'loss': 0.0349, 'grad_norm': 7.203415870666504, 'learning_rate': 1.8902649318074934e-05, 'epoch': 0.38}
68
- {'loss': 0.0282, 'grad_norm': 0.04690209776163101, 'learning_rate': 1.8880254406198912e-05, 'epoch': 0.38}
69
- {'loss': 0.034, 'grad_norm': 0.10681267082691193, 'learning_rate': 1.885785949432289e-05, 'epoch': 0.39}
70
- {'loss': 0.0376, 'grad_norm': 0.5834813117980957, 'learning_rate': 1.883546458244687e-05, 'epoch': 0.39}
71
- {'loss': 0.0265, 'grad_norm': 1.8356763124465942, 'learning_rate': 1.8813069670570847e-05, 'epoch': 0.4}
72
- {'loss': 0.0267, 'grad_norm': 1.5589691400527954, 'learning_rate': 1.8790674758694826e-05, 'epoch': 0.41}
73
- {'loss': 0.0241, 'grad_norm': 0.20994938910007477, 'learning_rate': 1.8768279846818804e-05, 'epoch': 0.41}
74
- {'loss': 0.033, 'grad_norm': 1.8584221601486206, 'learning_rate': 1.8745884934942783e-05, 'epoch': 0.42}
75
- {'loss': 0.0323, 'grad_norm': 1.917885661125183, 'learning_rate': 1.872349002306676e-05, 'epoch': 0.42}
76
- {'loss': 0.0278, 'grad_norm': 6.673130989074707, 'learning_rate': 1.870109511119074e-05, 'epoch': 0.43}
77
- {'loss': 0.025, 'grad_norm': 0.09119334816932678, 'learning_rate': 1.8678700199314718e-05, 'epoch': 0.43}
78
- {'loss': 0.0363, 'grad_norm': 0.1400763988494873, 'learning_rate': 1.8656305287438696e-05, 'epoch': 0.44}
79
- {'loss': 0.0312, 'grad_norm': 0.1186104416847229, 'learning_rate': 1.8633910375562674e-05, 'epoch': 0.44}
80
- {'loss': 0.0307, 'grad_norm': 0.7495352625846863, 'learning_rate': 1.8611515463686653e-05, 'epoch': 0.45}
81
- {'loss': 0.0305, 'grad_norm': 1.1061906814575195, 'learning_rate': 1.858912055181063e-05, 'epoch': 0.45}
82
- {'loss': 0.028, 'grad_norm': 1.001441240310669, 'learning_rate': 1.856672563993461e-05, 'epoch': 0.46}
83
- {'loss': 0.0279, 'grad_norm': 6.4315948486328125, 'learning_rate': 1.8544330728058588e-05, 'epoch': 0.47}
84
- {'loss': 0.0265, 'grad_norm': 0.16143333911895752, 'learning_rate': 1.8521935816182566e-05, 'epoch': 0.47}
85
- {'loss': 0.0262, 'grad_norm': 0.020146619528532028, 'learning_rate': 1.8499540904306544e-05, 'epoch': 0.48}
86
- {'loss': 0.0308, 'grad_norm': 1.1892863512039185, 'learning_rate': 1.847714599243052e-05, 'epoch': 0.48}
87
- {'loss': 0.0282, 'grad_norm': 0.5584899187088013, 'learning_rate': 1.84547510805545e-05, 'epoch': 0.49}
88
- {'loss': 0.0243, 'grad_norm': 0.2598753869533539, 'learning_rate': 1.843235616867848e-05, 'epoch': 0.49}
89
- {'loss': 0.0236, 'grad_norm': 2.231210947036743, 'learning_rate': 1.8409961256802454e-05, 'epoch': 0.5}
90
- {'loss': 0.02, 'grad_norm': 0.025564778596162796, 'learning_rate': 1.8387566344926436e-05, 'epoch': 0.5}
91
- {'loss': 0.0254, 'grad_norm': 12.388060569763184, 'learning_rate': 1.836517143305041e-05, 'epoch': 0.51}
92
- {'loss': 0.0275, 'grad_norm': 0.014444425702095032, 'learning_rate': 1.834277652117439e-05, 'epoch': 0.51}
93
- {'loss': 0.0309, 'grad_norm': 16.885656356811523, 'learning_rate': 1.832038160929837e-05, 'epoch': 0.52}
94
- {'loss': 0.031, 'grad_norm': 13.603434562683105, 'learning_rate': 1.8297986697422346e-05, 'epoch': 0.53}
95
- {'loss': 0.0271, 'grad_norm': 0.5440975427627563, 'learning_rate': 1.8275591785546325e-05, 'epoch': 0.53}
96
- {'loss': 0.0218, 'grad_norm': 0.480385959148407, 'learning_rate': 1.8253196873670306e-05, 'epoch': 0.54}
97
- {'loss': 0.0249, 'grad_norm': 0.11402398347854614, 'learning_rate': 1.823080196179428e-05, 'epoch': 0.54}
98
- {'loss': 0.0285, 'grad_norm': 0.020077334716916084, 'learning_rate': 1.820840704991826e-05, 'epoch': 0.55}
99
- {'loss': 0.03, 'grad_norm': 8.4814453125, 'learning_rate': 1.8186012138042238e-05, 'epoch': 0.55}
100
- {'loss': 0.0284, 'grad_norm': 0.8492074012756348, 'learning_rate': 1.8163617226166216e-05, 'epoch': 0.56}
101
- {'loss': 0.0258, 'grad_norm': 0.10311955213546753, 'learning_rate': 1.8141222314290195e-05, 'epoch': 0.56}
102
- {'loss': 0.0228, 'grad_norm': 0.04972610995173454, 'learning_rate': 1.8118827402414173e-05, 'epoch': 0.57}
103
- {'loss': 0.0305, 'grad_norm': 2.9211368560791016, 'learning_rate': 1.809643249053815e-05, 'epoch': 0.57}
104
- {'loss': 0.0234, 'grad_norm': 0.11432399600744247, 'learning_rate': 1.807403757866213e-05, 'epoch': 0.58}
105
- {'loss': 0.0209, 'grad_norm': 0.15198394656181335, 'learning_rate': 1.8051642666786108e-05, 'epoch': 0.59}
106
- {'loss': 0.0341, 'grad_norm': 4.720141410827637, 'learning_rate': 1.8029247754910086e-05, 'epoch': 0.59}
107
- {'loss': 0.0269, 'grad_norm': 0.02421008236706257, 'learning_rate': 1.8006852843034065e-05, 'epoch': 0.6}
108
- {'loss': 0.0267, 'grad_norm': 0.49286821484565735, 'learning_rate': 1.798445793115804e-05, 'epoch': 0.6}
109
- {'loss': 0.0245, 'grad_norm': 0.0072012050077319145, 'learning_rate': 1.796206301928202e-05, 'epoch': 0.61}
110
- {'loss': 0.0263, 'grad_norm': 0.17817166447639465, 'learning_rate': 1.7939668107406e-05, 'epoch': 0.61}
111
- {'loss': 0.0195, 'grad_norm': 0.024026205763220787, 'learning_rate': 1.7917273195529975e-05, 'epoch': 0.62}
112
- {'loss': 0.0209, 'grad_norm': 0.21048341691493988, 'learning_rate': 1.7894878283653957e-05, 'epoch': 0.62}
113
- {'loss': 0.0313, 'grad_norm': 0.32045114040374756, 'learning_rate': 1.7872483371777935e-05, 'epoch': 0.63}
114
- {'loss': 0.0247, 'grad_norm': 0.17071415483951569, 'learning_rate': 1.785008845990191e-05, 'epoch': 0.64}
115
- {'loss': 0.0285, 'grad_norm': 0.03091379813849926, 'learning_rate': 1.782769354802589e-05, 'epoch': 0.64}
116
- {'loss': 0.0301, 'grad_norm': 0.007939248345792294, 'learning_rate': 1.7805298636149867e-05, 'epoch': 0.65}
117
- {'loss': 0.0227, 'grad_norm': 0.4531534016132355, 'learning_rate': 1.7782903724273845e-05, 'epoch': 0.65}
118
- {'loss': 0.0235, 'grad_norm': 0.06630811095237732, 'learning_rate': 1.7760508812397827e-05, 'epoch': 0.66}
119
- {'loss': 0.0272, 'grad_norm': 0.40065327286720276, 'learning_rate': 1.77381139005218e-05, 'epoch': 0.66}
120
- {'loss': 0.025, 'grad_norm': 2.7814598083496094, 'learning_rate': 1.771571898864578e-05, 'epoch': 0.67}
121
- {'loss': 0.0276, 'grad_norm': 2.402163028717041, 'learning_rate': 1.769332407676976e-05, 'epoch': 0.67}
122
- {'loss': 0.0289, 'grad_norm': 0.03239729255437851, 'learning_rate': 1.7670929164893737e-05, 'epoch': 0.68}
123
- {'loss': 0.0232, 'grad_norm': 0.6327053904533386, 'learning_rate': 1.7648534253017715e-05, 'epoch': 0.68}
124
- {'loss': 0.0258, 'grad_norm': 0.08029168099164963, 'learning_rate': 1.7626139341141693e-05, 'epoch': 0.69}
125
- {'loss': 0.0254, 'grad_norm': 0.1738702803850174, 'learning_rate': 1.7603744429265672e-05, 'epoch': 0.7}
126
- {'loss': 0.0205, 'grad_norm': 0.06688190996646881, 'learning_rate': 1.758134951738965e-05, 'epoch': 0.7}
127
- {'loss': 0.0216, 'grad_norm': 0.21624340116977692, 'learning_rate': 1.755895460551363e-05, 'epoch': 0.71}
128
- {'loss': 0.0304, 'grad_norm': 0.15297529101371765, 'learning_rate': 1.7536559693637607e-05, 'epoch': 0.71}
129
- {'loss': 0.0234, 'grad_norm': 0.019339820370078087, 'learning_rate': 1.7514164781761585e-05, 'epoch': 0.72}
130
- {'loss': 0.0233, 'grad_norm': 0.25130143761634827, 'learning_rate': 1.7491769869885563e-05, 'epoch': 0.72}
131
- {'loss': 0.0239, 'grad_norm': 0.06551504135131836, 'learning_rate': 1.7469374958009542e-05, 'epoch': 0.73}
132
- {'loss': 0.0166, 'grad_norm': 6.077746391296387, 'learning_rate': 1.744698004613352e-05, 'epoch': 0.73}
133
- {'loss': 0.0211, 'grad_norm': 0.019542552530765533, 'learning_rate': 1.74245851342575e-05, 'epoch': 0.74}
134
- {'loss': 0.0212, 'grad_norm': 0.01987328752875328, 'learning_rate': 1.7402190222381477e-05, 'epoch': 0.74}
135
- {'loss': 0.0247, 'grad_norm': 0.9539387226104736, 'learning_rate': 1.7379795310505455e-05, 'epoch': 0.75}
136
- {'loss': 0.023, 'grad_norm': 1.40300452709198, 'learning_rate': 1.7357400398629434e-05, 'epoch': 0.76}
137
- {'loss': 0.0261, 'grad_norm': 0.1152738556265831, 'learning_rate': 1.7335005486753412e-05, 'epoch': 0.76}
138
- {'loss': 0.0204, 'grad_norm': 0.1181008443236351, 'learning_rate': 1.731261057487739e-05, 'epoch': 0.77}
139
- {'loss': 0.026, 'grad_norm': 0.36036548018455505, 'learning_rate': 1.729021566300137e-05, 'epoch': 0.77}
140
- {'loss': 0.0299, 'grad_norm': 0.021303873509168625, 'learning_rate': 1.7267820751125347e-05, 'epoch': 0.78}
141
- {'loss': 0.0183, 'grad_norm': 1.2206119298934937, 'learning_rate': 1.7245425839249322e-05, 'epoch': 0.78}
142
- {'loss': 0.0228, 'grad_norm': 1.1102793216705322, 'learning_rate': 1.7223030927373304e-05, 'epoch': 0.79}
143
- {'loss': 0.0181, 'grad_norm': 1.511096477508545, 'learning_rate': 1.7200636015497282e-05, 'epoch': 0.79}
144
- {'loss': 0.0237, 'grad_norm': 0.43338674306869507, 'learning_rate': 1.7178241103621257e-05, 'epoch': 0.8}
145
- {'loss': 0.0237, 'grad_norm': 0.20501121878623962, 'learning_rate': 1.715584619174524e-05, 'epoch': 0.8}
146
- {'loss': 0.0158, 'grad_norm': 0.09320724755525589, 'learning_rate': 1.7133451279869217e-05, 'epoch': 0.81}
147
- {'loss': 0.0222, 'grad_norm': 0.43307095766067505, 'learning_rate': 1.7111056367993192e-05, 'epoch': 0.82}
148
- {'loss': 0.0196, 'grad_norm': 0.9130146503448486, 'learning_rate': 1.7088661456117174e-05, 'epoch': 0.82}
149
- {'loss': 0.0242, 'grad_norm': 4.501875400543213, 'learning_rate': 1.706626654424115e-05, 'epoch': 0.83}
150
- {'loss': 0.0218, 'grad_norm': 0.041434165090322495, 'learning_rate': 1.7043871632365127e-05, 'epoch': 0.83}
151
- {'loss': 0.0201, 'grad_norm': 18.74702262878418, 'learning_rate': 1.702147672048911e-05, 'epoch': 0.84}
152
- {'loss': 0.026, 'grad_norm': 0.03762982413172722, 'learning_rate': 1.6999081808613084e-05, 'epoch': 0.84}
153
- {'loss': 0.0232, 'grad_norm': 0.12932895123958588, 'learning_rate': 1.6976686896737062e-05, 'epoch': 0.85}
154
- {'loss': 0.0254, 'grad_norm': 0.8021348118782043, 'learning_rate': 1.695429198486104e-05, 'epoch': 0.85}
155
- {'loss': 0.0218, 'grad_norm': 0.1913072168827057, 'learning_rate': 1.693189707298502e-05, 'epoch': 0.86}
156
- {'loss': 0.0219, 'grad_norm': 0.00855530146509409, 'learning_rate': 1.6909502161108997e-05, 'epoch': 0.87}
157
- {'loss': 0.0255, 'grad_norm': 0.9369354844093323, 'learning_rate': 1.6887107249232976e-05, 'epoch': 0.87}
158
- {'loss': 0.0201, 'grad_norm': 0.0015777107328176498, 'learning_rate': 1.6864712337356954e-05, 'epoch': 0.88}
159
- {'loss': 0.0301, 'grad_norm': 3.1490933895111084, 'learning_rate': 1.6842317425480932e-05, 'epoch': 0.88}
160
- {'loss': 0.0275, 'grad_norm': 0.02967211790382862, 'learning_rate': 1.681992251360491e-05, 'epoch': 0.89}
161
- {'loss': 0.018, 'grad_norm': 0.013429056853055954, 'learning_rate': 1.679752760172889e-05, 'epoch': 0.89}
162
- {'loss': 0.028, 'grad_norm': 0.269550621509552, 'learning_rate': 1.6775132689852867e-05, 'epoch': 0.9}
163
- {'loss': 0.0223, 'grad_norm': 0.30404672026634216, 'learning_rate': 1.6752737777976846e-05, 'epoch': 0.9}
164
- {'loss': 0.0201, 'grad_norm': 0.013556144200265408, 'learning_rate': 1.6730342866100824e-05, 'epoch': 0.91}
165
- {'loss': 0.0299, 'grad_norm': 0.046162448823451996, 'learning_rate': 1.6707947954224802e-05, 'epoch': 0.91}
166
- {'loss': 0.0251, 'grad_norm': 0.020988399162888527, 'learning_rate': 1.6685553042348777e-05, 'epoch': 0.92}
167
- {'loss': 0.0203, 'grad_norm': 0.07533540576696396, 'learning_rate': 1.666315813047276e-05, 'epoch': 0.93}
168
- {'loss': 0.0209, 'grad_norm': 0.7397226691246033, 'learning_rate': 1.6640763218596737e-05, 'epoch': 0.93}
169
- {'loss': 0.0236, 'grad_norm': 0.01838051900267601, 'learning_rate': 1.6618368306720712e-05, 'epoch': 0.94}
170
- {'loss': 0.0191, 'grad_norm': 0.03700494021177292, 'learning_rate': 1.6595973394844694e-05, 'epoch': 0.94}
171
- {'loss': 0.0168, 'grad_norm': 0.07713836431503296, 'learning_rate': 1.657357848296867e-05, 'epoch': 0.95}
172
- {'loss': 0.017, 'grad_norm': 3.437300682067871, 'learning_rate': 1.6551183571092647e-05, 'epoch': 0.95}
173
- {'loss': 0.0201, 'grad_norm': 0.06772757321596146, 'learning_rate': 1.652878865921663e-05, 'epoch': 0.96}
174
- {'loss': 0.0171, 'grad_norm': 0.8847533464431763, 'learning_rate': 1.6506393747340604e-05, 'epoch': 0.96}
175
- {'loss': 0.0217, 'grad_norm': 0.0963427796959877, 'learning_rate': 1.6483998835464583e-05, 'epoch': 0.97}
176
- {'loss': 0.0208, 'grad_norm': 0.45162704586982727, 'learning_rate': 1.6461603923588564e-05, 'epoch': 0.97}
177
- {'loss': 0.0157, 'grad_norm': 0.1655539721250534, 'learning_rate': 1.643920901171254e-05, 'epoch': 0.98}
178
- {'loss': 0.0218, 'grad_norm': 0.010928811505436897, 'learning_rate': 1.6416814099836518e-05, 'epoch': 0.99}
179
- {'loss': 0.021, 'grad_norm': 6.908138751983643, 'learning_rate': 1.6394419187960496e-05, 'epoch': 0.99}
180
- {'loss': 0.0159, 'grad_norm': 2.0493404865264893, 'learning_rate': 1.6372024276084474e-05, 'epoch': 1.0}
181
- {'loss': 0.0189, 'grad_norm': 0.10856210440397263, 'learning_rate': 1.6349629364208453e-05, 'epoch': 1.0}
182
- {'loss': 0.0182, 'grad_norm': 2.7368407249450684, 'learning_rate': 1.632723445233243e-05, 'epoch': 1.01}
183
- {'loss': 0.0206, 'grad_norm': 0.09092140942811966, 'learning_rate': 1.630483954045641e-05, 'epoch': 1.01}
184
- {'loss': 0.0179, 'grad_norm': 0.01954420655965805, 'learning_rate': 1.6282444628580388e-05, 'epoch': 1.02}
185
- {'loss': 0.0168, 'grad_norm': 0.043369174003601074, 'learning_rate': 1.6260049716704366e-05, 'epoch': 1.02}
186
- {'loss': 0.019, 'grad_norm': 0.01492550503462553, 'learning_rate': 1.6237654804828344e-05, 'epoch': 1.03}
187
- {'loss': 0.0173, 'grad_norm': 0.0102744922041893, 'learning_rate': 1.6215259892952323e-05, 'epoch': 1.03}
188
- {'loss': 0.0172, 'grad_norm': 0.7786262631416321, 'learning_rate': 1.61928649810763e-05, 'epoch': 1.04}
189
- {'loss': 0.0187, 'grad_norm': 0.033732105046510696, 'learning_rate': 1.617047006920028e-05, 'epoch': 1.05}
190
- {'loss': 0.0199, 'grad_norm': 0.01999427191913128, 'learning_rate': 1.6148075157324258e-05, 'epoch': 1.05}
191
- {'loss': 0.0202, 'grad_norm': 0.0032311684917658567, 'learning_rate': 1.6125680245448236e-05, 'epoch': 1.06}
192
- {'loss': 0.0198, 'grad_norm': 0.09005136042833328, 'learning_rate': 1.6103285333572214e-05, 'epoch': 1.06}
193
- {'loss': 0.0157, 'grad_norm': 0.48562514781951904, 'learning_rate': 1.6080890421696193e-05, 'epoch': 1.07}
194
- {'loss': 0.0178, 'grad_norm': 0.021473940461874008, 'learning_rate': 1.605849550982017e-05, 'epoch': 1.07}
195
- {'loss': 0.0147, 'grad_norm': 7.715484142303467, 'learning_rate': 1.603610059794415e-05, 'epoch': 1.08}
196
- {'loss': 0.0152, 'grad_norm': 0.1648186594247818, 'learning_rate': 1.6013705686068124e-05, 'epoch': 1.08}
197
- {'loss': 0.0152, 'grad_norm': 0.0015509655931964517, 'learning_rate': 1.5991310774192106e-05, 'epoch': 1.09}
198
- {'loss': 0.0126, 'grad_norm': 0.23705679178237915, 'learning_rate': 1.5968915862316085e-05, 'epoch': 1.1}
199
- {'loss': 0.0115, 'grad_norm': 0.019256843253970146, 'learning_rate': 1.594652095044006e-05, 'epoch': 1.1}
200
- {'loss': 0.0122, 'grad_norm': 0.5769898295402527, 'learning_rate': 1.592412603856404e-05, 'epoch': 1.11}
201
- {'loss': 0.0097, 'grad_norm': 0.03456917405128479, 'learning_rate': 1.590173112668802e-05, 'epoch': 1.11}
202
- {'loss': 0.0149, 'grad_norm': 0.671821117401123, 'learning_rate': 1.5879336214811995e-05, 'epoch': 1.12}
203
- {'loss': 0.0151, 'grad_norm': 0.004286791197955608, 'learning_rate': 1.5856941302935976e-05, 'epoch': 1.12}
204
- {'loss': 0.0134, 'grad_norm': 0.13815534114837646, 'learning_rate': 1.583454639105995e-05, 'epoch': 1.13}
205
- {'loss': 0.0157, 'grad_norm': 0.042440492659807205, 'learning_rate': 1.581215147918393e-05, 'epoch': 1.13}
206
- {'loss': 0.0141, 'grad_norm': 0.003109186887741089, 'learning_rate': 1.5789756567307908e-05, 'epoch': 1.14}
207
- {'loss': 0.0139, 'grad_norm': 4.196854591369629, 'learning_rate': 1.5767361655431886e-05, 'epoch': 1.14}
208
- {'loss': 0.0149, 'grad_norm': 0.36187997460365295, 'learning_rate': 1.5744966743555865e-05, 'epoch': 1.15}
209
- {'loss': 0.0103, 'grad_norm': 0.08171387016773224, 'learning_rate': 1.5722571831679843e-05, 'epoch': 1.16}
210
- {'loss': 0.0138, 'grad_norm': 0.18907544016838074, 'learning_rate': 1.570017691980382e-05, 'epoch': 1.16}
211
- {'loss': 0.0116, 'grad_norm': 0.01975160650908947, 'learning_rate': 1.56777820079278e-05, 'epoch': 1.17}
212
- {'loss': 0.0146, 'grad_norm': 0.16162721812725067, 'learning_rate': 1.5655387096051778e-05, 'epoch': 1.17}
213
- {'loss': 0.0168, 'grad_norm': 14.283798217773438, 'learning_rate': 1.5632992184175756e-05, 'epoch': 1.18}
214
- {'loss': 0.0166, 'grad_norm': 0.984356164932251, 'learning_rate': 1.5610597272299735e-05, 'epoch': 1.18}
215
- {'loss': 0.0136, 'grad_norm': 0.4644564688205719, 'learning_rate': 1.5588202360423713e-05, 'epoch': 1.19}
216
- {'loss': 0.0103, 'grad_norm': 0.1780320703983307, 'learning_rate': 1.556580744854769e-05, 'epoch': 1.19}
217
- {'loss': 0.0128, 'grad_norm': 0.12859980762004852, 'learning_rate': 1.554341253667167e-05, 'epoch': 1.2}
218
- {'loss': 0.0112, 'grad_norm': 0.45004957914352417, 'learning_rate': 1.5521017624795648e-05, 'epoch': 1.2}
219
- {'loss': 0.0103, 'grad_norm': 0.6201251745223999, 'learning_rate': 1.5498622712919627e-05, 'epoch': 1.21}
220
- {'loss': 0.0133, 'grad_norm': 1.4645249843597412, 'learning_rate': 1.5476227801043605e-05, 'epoch': 1.22}
221
- {'loss': 0.0118, 'grad_norm': 1.33707594871521, 'learning_rate': 1.545383288916758e-05, 'epoch': 1.22}
222
- {'loss': 0.009, 'grad_norm': 0.0075813643634319305, 'learning_rate': 1.543143797729156e-05, 'epoch': 1.23}
223
- {'loss': 0.0151, 'grad_norm': 0.06382226198911667, 'learning_rate': 1.540904306541554e-05, 'epoch': 1.23}
224
- {'loss': 0.0146, 'grad_norm': 0.06127556413412094, 'learning_rate': 1.5386648153539515e-05, 'epoch': 1.24}
225
- {'loss': 0.0143, 'grad_norm': 0.05294317007064819, 'learning_rate': 1.5364253241663497e-05, 'epoch': 1.24}
226
- {'loss': 0.01, 'grad_norm': 0.031760863959789276, 'learning_rate': 1.5341858329787475e-05, 'epoch': 1.25}
227
- {'loss': 0.0147, 'grad_norm': 0.22437328100204468, 'learning_rate': 1.531946341791145e-05, 'epoch': 1.25}
228
- {'loss': 0.011, 'grad_norm': 1.463218092918396, 'learning_rate': 1.5297068506035432e-05, 'epoch': 1.26}
229
- {'loss': 0.0121, 'grad_norm': 0.025357956066727638, 'learning_rate': 1.5274673594159407e-05, 'epoch': 1.26}
230
- {'loss': 0.0117, 'grad_norm': 0.004972013644874096, 'learning_rate': 1.5252278682283385e-05, 'epoch': 1.27}
231
- {'loss': 0.0151, 'grad_norm': 0.14370237290859222, 'learning_rate': 1.5229883770407365e-05, 'epoch': 1.28}
232
- {'loss': 0.0143, 'grad_norm': 1.6805877685546875, 'learning_rate': 1.5207488858531343e-05, 'epoch': 1.28}
233
- {'loss': 0.0163, 'grad_norm': 0.11494574695825577, 'learning_rate': 1.518509394665532e-05, 'epoch': 1.29}
234
- {'loss': 0.0135, 'grad_norm': 0.03341173008084297, 'learning_rate': 1.51626990347793e-05, 'epoch': 1.29}
235
- {'loss': 0.0118, 'grad_norm': 0.026476634666323662, 'learning_rate': 1.5140304122903279e-05, 'epoch': 1.3}
236
- {'loss': 0.0129, 'grad_norm': 0.020570116117596626, 'learning_rate': 1.5117909211027255e-05, 'epoch': 1.3}
237
- {'loss': 0.0062, 'grad_norm': 0.36162054538726807, 'learning_rate': 1.5095514299151235e-05, 'epoch': 1.31}
238
- {'loss': 0.0127, 'grad_norm': 0.013678218238055706, 'learning_rate': 1.5073119387275212e-05, 'epoch': 1.31}
239
- {'loss': 0.014, 'grad_norm': 2.711167335510254, 'learning_rate': 1.505072447539919e-05, 'epoch': 1.32}
240
- {'loss': 0.0131, 'grad_norm': 0.3496088981628418, 'learning_rate': 1.502832956352317e-05, 'epoch': 1.33}
241
- {'loss': 0.0162, 'grad_norm': 0.018671611323952675, 'learning_rate': 1.5005934651647147e-05, 'epoch': 1.33}
242
- {'loss': 0.0107, 'grad_norm': 0.001666502095758915, 'learning_rate': 1.4983539739771125e-05, 'epoch': 1.34}
243
- {'loss': 0.0125, 'grad_norm': 4.59019136428833, 'learning_rate': 1.4961144827895104e-05, 'epoch': 1.34}
244
- {'loss': 0.0136, 'grad_norm': 0.2412848323583603, 'learning_rate': 1.4938749916019082e-05, 'epoch': 1.35}
245
- {'loss': 0.0112, 'grad_norm': 0.02962004393339157, 'learning_rate': 1.4916355004143059e-05, 'epoch': 1.35}
246
- {'loss': 0.0126, 'grad_norm': 0.36205366253852844, 'learning_rate': 1.4893960092267039e-05, 'epoch': 1.36}
247
- {'loss': 0.0079, 'grad_norm': 0.031118787825107574, 'learning_rate': 1.4871565180391017e-05, 'epoch': 1.36}
248
- {'loss': 0.0104, 'grad_norm': 0.16951577365398407, 'learning_rate': 1.4849170268514994e-05, 'epoch': 1.37}
249
- {'loss': 0.0137, 'grad_norm': 0.26067009568214417, 'learning_rate': 1.4826775356638974e-05, 'epoch': 1.37}
250
- {'loss': 0.0075, 'grad_norm': 0.04934118688106537, 'learning_rate': 1.4804380444762952e-05, 'epoch': 1.38}
251
- {'loss': 0.0108, 'grad_norm': 0.009564828127622604, 'learning_rate': 1.4781985532886929e-05, 'epoch': 1.39}
252
- {'loss': 0.0087, 'grad_norm': 0.10579288750886917, 'learning_rate': 1.4759590621010909e-05, 'epoch': 1.39}
253
- {'loss': 0.0138, 'grad_norm': 0.31085848808288574, 'learning_rate': 1.4737195709134885e-05, 'epoch': 1.4}
254
- {'loss': 0.0056, 'grad_norm': 0.10199588537216187, 'learning_rate': 1.4714800797258864e-05, 'epoch': 1.4}
255
- {'loss': 0.0067, 'grad_norm': 0.2795519232749939, 'learning_rate': 1.4692405885382844e-05, 'epoch': 1.41}
256
- {'loss': 0.0103, 'grad_norm': 0.023119742050766945, 'learning_rate': 1.467001097350682e-05, 'epoch': 1.41}
257
- {'loss': 0.0102, 'grad_norm': 0.08798133581876755, 'learning_rate': 1.4647616061630799e-05, 'epoch': 1.42}
258
- {'loss': 0.0119, 'grad_norm': 0.0012799632968381047, 'learning_rate': 1.4625221149754776e-05, 'epoch': 1.42}
259
- {'loss': 0.0094, 'grad_norm': 0.01972614973783493, 'learning_rate': 1.4602826237878756e-05, 'epoch': 1.43}
260
- {'loss': 0.0075, 'grad_norm': 0.005090704187750816, 'learning_rate': 1.4580431326002732e-05, 'epoch': 1.43}
261
- {'loss': 0.0146, 'grad_norm': 0.2999782860279083, 'learning_rate': 1.455803641412671e-05, 'epoch': 1.44}
262
- {'loss': 0.0103, 'grad_norm': 0.00834854319691658, 'learning_rate': 1.453564150225069e-05, 'epoch': 1.45}
263
- {'loss': 0.0123, 'grad_norm': 0.007013251073658466, 'learning_rate': 1.4513246590374667e-05, 'epoch': 1.45}
264
- {'loss': 0.0107, 'grad_norm': 0.11994576454162598, 'learning_rate': 1.4490851678498646e-05, 'epoch': 1.46}
265
- {'loss': 0.0071, 'grad_norm': 0.012767240405082703, 'learning_rate': 1.4468456766622626e-05, 'epoch': 1.46}
266
- {'loss': 0.0087, 'grad_norm': 0.4222990870475769, 'learning_rate': 1.4446061854746602e-05, 'epoch': 1.47}
267
- {'loss': 0.0072, 'grad_norm': 0.6226161122322083, 'learning_rate': 1.442366694287058e-05, 'epoch': 1.47}
268
- {'loss': 0.0094, 'grad_norm': 0.23288856446743011, 'learning_rate': 1.4401272030994559e-05, 'epoch': 1.48}
269
- {'loss': 0.0083, 'grad_norm': 0.026115866377949715, 'learning_rate': 1.4378877119118537e-05, 'epoch': 1.48}
270
- {'loss': 0.0104, 'grad_norm': 0.0783042460680008, 'learning_rate': 1.4356482207242514e-05, 'epoch': 1.49}
271
- {'loss': 0.0076, 'grad_norm': 0.5086662769317627, 'learning_rate': 1.4334087295366494e-05, 'epoch': 1.49}
272
- {'loss': 0.006, 'grad_norm': 0.63682621717453, 'learning_rate': 1.4311692383490472e-05, 'epoch': 1.5}
273
- {'loss': 0.0085, 'grad_norm': 0.05100702494382858, 'learning_rate': 1.4289297471614449e-05, 'epoch': 1.51}
274
- {'loss': 0.0061, 'grad_norm': 0.741942286491394, 'learning_rate': 1.4266902559738429e-05, 'epoch': 1.51}
275
- {'loss': 0.0106, 'grad_norm': 0.002769783604890108, 'learning_rate': 1.4244507647862407e-05, 'epoch': 1.52}
276
- {'loss': 0.0088, 'grad_norm': 0.028197582811117172, 'learning_rate': 1.4222112735986384e-05, 'epoch': 1.52}
277
- {'loss': 0.0111, 'grad_norm': 0.02364266850054264, 'learning_rate': 1.4199717824110364e-05, 'epoch': 1.53}
278
- {'loss': 0.0094, 'grad_norm': 0.16233578324317932, 'learning_rate': 1.4177322912234341e-05, 'epoch': 1.53}
279
- {'loss': 0.0079, 'grad_norm': 0.05021541193127632, 'learning_rate': 1.415492800035832e-05, 'epoch': 1.54}
280
- {'loss': 0.0095, 'grad_norm': 0.00331246224232018, 'learning_rate': 1.41325330884823e-05, 'epoch': 1.54}
281
- {'loss': 0.0098, 'grad_norm': 0.07592795044183731, 'learning_rate': 1.4110138176606276e-05, 'epoch': 1.55}
282
- {'loss': 0.0139, 'grad_norm': 0.09074956923723221, 'learning_rate': 1.4087743264730254e-05, 'epoch': 1.56}
283
- {'loss': 0.0085, 'grad_norm': 0.10638327151536942, 'learning_rate': 1.4065348352854233e-05, 'epoch': 1.56}
284
- {'loss': 0.0094, 'grad_norm': 0.02220524474978447, 'learning_rate': 1.4042953440978211e-05, 'epoch': 1.57}
285
- {'loss': 0.0088, 'grad_norm': 0.045808251947164536, 'learning_rate': 1.4020558529102188e-05, 'epoch': 1.57}
286
- {'loss': 0.0092, 'grad_norm': 0.15339775383472443, 'learning_rate': 1.3998163617226168e-05, 'epoch': 1.58}
287
- {'loss': 0.0071, 'grad_norm': 0.009142019785940647, 'learning_rate': 1.3975768705350146e-05, 'epoch': 1.58}
288
- {'loss': 0.0101, 'grad_norm': 0.15891438722610474, 'learning_rate': 1.3953373793474123e-05, 'epoch': 1.59}
289
- {'loss': 0.011, 'grad_norm': 0.010278990492224693, 'learning_rate': 1.3930978881598103e-05, 'epoch': 1.59}
290
- {'loss': 0.0097, 'grad_norm': 0.18057747185230255, 'learning_rate': 1.3908583969722081e-05, 'epoch': 1.6}
291
- {'loss': 0.0071, 'grad_norm': 0.7384783625602722, 'learning_rate': 1.3886189057846058e-05, 'epoch': 1.6}
292
- {'loss': 0.0114, 'grad_norm': 0.0011770115233957767, 'learning_rate': 1.3863794145970038e-05, 'epoch': 1.61}
293
- {'loss': 0.0087, 'grad_norm': 0.033389296382665634, 'learning_rate': 1.3841399234094014e-05, 'epoch': 1.62}
294
- {'loss': 0.0075, 'grad_norm': 0.06636679172515869, 'learning_rate': 1.3819004322217993e-05, 'epoch': 1.62}
295
- {'loss': 0.0039, 'grad_norm': 0.018852446228265762, 'learning_rate': 1.3796609410341973e-05, 'epoch': 1.63}
296
- {'loss': 0.0091, 'grad_norm': 0.3978893458843231, 'learning_rate': 1.377421449846595e-05, 'epoch': 1.63}
297
- {'loss': 0.0117, 'grad_norm': 0.044826582074165344, 'learning_rate': 1.3751819586589928e-05, 'epoch': 1.64}
298
- {'loss': 0.01, 'grad_norm': 0.0314922071993351, 'learning_rate': 1.3729424674713908e-05, 'epoch': 1.64}
299
- {'loss': 0.0099, 'grad_norm': 0.008845321834087372, 'learning_rate': 1.3707029762837885e-05, 'epoch': 1.65}
300
- {'loss': 0.0069, 'grad_norm': 0.012995203956961632, 'learning_rate': 1.3684634850961863e-05, 'epoch': 1.65}
301
- {'loss': 0.0084, 'grad_norm': 0.2505897283554077, 'learning_rate': 1.3662239939085841e-05, 'epoch': 1.66}
302
- {'loss': 0.0118, 'grad_norm': 0.576478898525238, 'learning_rate': 1.363984502720982e-05, 'epoch': 1.66}
303
- {'loss': 0.0078, 'grad_norm': 0.03707367181777954, 'learning_rate': 1.3617450115333796e-05, 'epoch': 1.67}
304
- {'loss': 0.0067, 'grad_norm': 0.013740709982812405, 'learning_rate': 1.3595055203457776e-05, 'epoch': 1.68}
305
- {'loss': 0.0133, 'grad_norm': 0.032045792788267136, 'learning_rate': 1.3572660291581755e-05, 'epoch': 1.68}
306
- {'loss': 0.0079, 'grad_norm': 0.011074294336140156, 'learning_rate': 1.3550265379705731e-05, 'epoch': 1.69}
307
- {'loss': 0.0092, 'grad_norm': 15.06441879272461, 'learning_rate': 1.3527870467829711e-05, 'epoch': 1.69}
308
- {'loss': 0.0069, 'grad_norm': 0.02328888699412346, 'learning_rate': 1.3505475555953688e-05, 'epoch': 1.7}
309
- {'loss': 0.008, 'grad_norm': 0.09563597291707993, 'learning_rate': 1.3483080644077666e-05, 'epoch': 1.7}
310
- {'loss': 0.0124, 'grad_norm': 0.18408024311065674, 'learning_rate': 1.3460685732201646e-05, 'epoch': 1.71}
311
- {'loss': 0.0112, 'grad_norm': 5.72014856338501, 'learning_rate': 1.3438290820325623e-05, 'epoch': 1.71}
312
- {'loss': 0.0074, 'grad_norm': 0.38538190722465515, 'learning_rate': 1.3415895908449601e-05, 'epoch': 1.72}
313
- {'loss': 0.0091, 'grad_norm': 0.000642662460450083, 'learning_rate': 1.3393500996573578e-05, 'epoch': 1.72}
314
- {'loss': 0.0088, 'grad_norm': 0.038307420909404755, 'learning_rate': 1.3371106084697558e-05, 'epoch': 1.73}
315
- {'loss': 0.0061, 'grad_norm': 0.043493278324604034, 'learning_rate': 1.3348711172821536e-05, 'epoch': 1.74}
316
- {'loss': 0.0089, 'grad_norm': 0.17391881346702576, 'learning_rate': 1.3326316260945513e-05, 'epoch': 1.74}
317
- {'loss': 0.0082, 'grad_norm': 0.056078795343637466, 'learning_rate': 1.3303921349069493e-05, 'epoch': 1.75}
318
- {'loss': 0.0103, 'grad_norm': 13.198871612548828, 'learning_rate': 1.328152643719347e-05, 'epoch': 1.75}
319
- {'loss': 0.0094, 'grad_norm': 0.026877380907535553, 'learning_rate': 1.3259131525317448e-05, 'epoch': 1.76}
320
- {'loss': 0.0073, 'grad_norm': 0.08020322024822235, 'learning_rate': 1.3236736613441428e-05, 'epoch': 1.76}
321
- {'loss': 0.0116, 'grad_norm': 0.13889168202877045, 'learning_rate': 1.3214341701565405e-05, 'epoch': 1.77}
322
- {'loss': 0.0112, 'grad_norm': 0.10164140164852142, 'learning_rate': 1.3191946789689383e-05, 'epoch': 1.77}
323
- {'loss': 0.0057, 'grad_norm': 0.04719488322734833, 'learning_rate': 1.3169551877813362e-05, 'epoch': 1.78}
324
- {'loss': 0.0075, 'grad_norm': 0.004808616824448109, 'learning_rate': 1.314715696593734e-05, 'epoch': 1.79}
325
- {'loss': 0.0062, 'grad_norm': 4.645363807678223, 'learning_rate': 1.3124762054061317e-05, 'epoch': 1.79}
326
- {'loss': 0.0046, 'grad_norm': 0.0703832358121872, 'learning_rate': 1.3102367142185297e-05, 'epoch': 1.8}
327
- {'loss': 0.0091, 'grad_norm': 0.7329868078231812, 'learning_rate': 1.3079972230309275e-05, 'epoch': 1.8}
328
- {'loss': 0.0066, 'grad_norm': 0.050435516983270645, 'learning_rate': 1.3057577318433252e-05, 'epoch': 1.81}
329
- {'loss': 0.0051, 'grad_norm': 0.06814319640398026, 'learning_rate': 1.3035182406557232e-05, 'epoch': 1.81}
330
- {'loss': 0.0066, 'grad_norm': 0.1799684762954712, 'learning_rate': 1.301278749468121e-05, 'epoch': 1.82}
331
- {'loss': 0.0093, 'grad_norm': 0.26140815019607544, 'learning_rate': 1.2990392582805187e-05, 'epoch': 1.82}
332
- {'loss': 0.0079, 'grad_norm': 0.015023380517959595, 'learning_rate': 1.2967997670929167e-05, 'epoch': 1.83}
333
- {'loss': 0.0067, 'grad_norm': 0.018291285261511803, 'learning_rate': 1.2945602759053143e-05, 'epoch': 1.83}
334
- {'loss': 0.007, 'grad_norm': 0.07480958849191666, 'learning_rate': 1.2923207847177122e-05, 'epoch': 1.84}
335
- {'loss': 0.0133, 'grad_norm': 0.08360631763935089, 'learning_rate': 1.2900812935301102e-05, 'epoch': 1.85}
336
- {'loss': 0.0071, 'grad_norm': 0.7749391198158264, 'learning_rate': 1.2878418023425078e-05, 'epoch': 1.85}
337
- {'loss': 0.0091, 'grad_norm': 0.2316342443227768, 'learning_rate': 1.2856023111549057e-05, 'epoch': 1.86}
338
- {'loss': 0.0067, 'grad_norm': 1.2949588298797607, 'learning_rate': 1.2833628199673037e-05, 'epoch': 1.86}
339
- {'loss': 0.0091, 'grad_norm': 0.02135908231139183, 'learning_rate': 1.2811233287797014e-05, 'epoch': 1.87}
340
- {'loss': 0.0103, 'grad_norm': 3.2552008628845215, 'learning_rate': 1.2788838375920992e-05, 'epoch': 1.87}
341
- {'loss': 0.0058, 'grad_norm': 0.002404365921393037, 'learning_rate': 1.276644346404497e-05, 'epoch': 1.88}
342
- {'loss': 0.0116, 'grad_norm': 0.021590234711766243, 'learning_rate': 1.2744048552168949e-05, 'epoch': 1.88}
343
- {'loss': 0.0089, 'grad_norm': 0.0267606470733881, 'learning_rate': 1.2721653640292925e-05, 'epoch': 1.89}
344
- {'loss': 0.0137, 'grad_norm': 0.08189389854669571, 'learning_rate': 1.2699258728416905e-05, 'epoch': 1.89}
345
- {'loss': 0.0065, 'grad_norm': 0.009326275438070297, 'learning_rate': 1.2676863816540884e-05, 'epoch': 1.9}
346
- {'loss': 0.0098, 'grad_norm': 0.0413086861371994, 'learning_rate': 1.265446890466486e-05, 'epoch': 1.91}
347
- {'loss': 0.0083, 'grad_norm': 0.04068596288561821, 'learning_rate': 1.263207399278884e-05, 'epoch': 1.91}
348
- {'loss': 0.0115, 'grad_norm': 0.011158055625855923, 'learning_rate': 1.2609679080912817e-05, 'epoch': 1.92}
349
- {'loss': 0.0083, 'grad_norm': 3.782308578491211, 'learning_rate': 1.2587284169036795e-05, 'epoch': 1.92}
350
- {'loss': 0.0084, 'grad_norm': 0.10942483693361282, 'learning_rate': 1.2564889257160775e-05, 'epoch': 1.93}
351
- {'loss': 0.0091, 'grad_norm': 0.7013315558433533, 'learning_rate': 1.2542494345284752e-05, 'epoch': 1.93}
352
- {'loss': 0.0092, 'grad_norm': 0.018937768414616585, 'learning_rate': 1.252009943340873e-05, 'epoch': 1.94}
353
- {'loss': 0.0054, 'grad_norm': 1.5005667209625244, 'learning_rate': 1.249770452153271e-05, 'epoch': 1.94}
354
- {'loss': 0.0049, 'grad_norm': 0.23089168965816498, 'learning_rate': 1.2475309609656687e-05, 'epoch': 1.95}
355
- {'loss': 0.0072, 'grad_norm': 0.008295576088130474, 'learning_rate': 1.2452914697780665e-05, 'epoch': 1.95}
356
- {'loss': 0.0052, 'grad_norm': 0.010741750709712505, 'learning_rate': 1.2430519785904644e-05, 'epoch': 1.96}
357
- {'loss': 0.0063, 'grad_norm': 0.22365953028202057, 'learning_rate': 1.2408124874028622e-05, 'epoch': 1.97}
358
- {'loss': 0.0107, 'grad_norm': 0.034852419048547745, 'learning_rate': 1.2385729962152599e-05, 'epoch': 1.97}
359
- {'loss': 0.0061, 'grad_norm': 0.06765995174646378, 'learning_rate': 1.2363335050276579e-05, 'epoch': 1.98}
360
- {'loss': 0.0059, 'grad_norm': 0.016805749386548996, 'learning_rate': 1.2340940138400557e-05, 'epoch': 1.98}
361
- {'loss': 0.0067, 'grad_norm': 0.5831074118614197, 'learning_rate': 1.2318545226524534e-05, 'epoch': 1.99}
362
- {'loss': 0.0078, 'grad_norm': 0.030119124799966812, 'learning_rate': 1.2296150314648514e-05, 'epoch': 1.99}
363
- {'loss': 0.007, 'grad_norm': 0.20938828587532043, 'learning_rate': 1.2273755402772492e-05, 'epoch': 2.0}
364
- {'loss': 0.0065, 'grad_norm': 0.009562190622091293, 'learning_rate': 1.2251360490896469e-05, 'epoch': 2.0}
365
- {'loss': 0.0073, 'grad_norm': 0.094178207218647, 'learning_rate': 1.2228965579020447e-05, 'epoch': 2.01}
366
- {'loss': 0.01, 'grad_norm': 0.009488407522439957, 'learning_rate': 1.2206570667144426e-05, 'epoch': 2.02}
367
- {'loss': 0.0072, 'grad_norm': 0.012404072098433971, 'learning_rate': 1.2184175755268404e-05, 'epoch': 2.02}
368
- {'loss': 0.0055, 'grad_norm': 0.03794926032423973, 'learning_rate': 1.216178084339238e-05, 'epoch': 2.03}
369
- {'loss': 0.0087, 'grad_norm': 0.11889325082302094, 'learning_rate': 1.213938593151636e-05, 'epoch': 2.03}
370
- {'loss': 0.0077, 'grad_norm': 0.17840054631233215, 'learning_rate': 1.2116991019640339e-05, 'epoch': 2.04}
371
- {'loss': 0.0067, 'grad_norm': 0.007003217935562134, 'learning_rate': 1.2094596107764316e-05, 'epoch': 2.04}
372
- {'loss': 0.008, 'grad_norm': 0.015604425221681595, 'learning_rate': 1.2072201195888296e-05, 'epoch': 2.05}
373
- {'loss': 0.0074, 'grad_norm': 0.027836063876748085, 'learning_rate': 1.2049806284012272e-05, 'epoch': 2.05}
374
- {'loss': 0.0072, 'grad_norm': 11.219870567321777, 'learning_rate': 1.202741137213625e-05, 'epoch': 2.06}
375
- {'loss': 0.0045, 'grad_norm': 0.5155676603317261, 'learning_rate': 1.200501646026023e-05, 'epoch': 2.06}
376
- {'loss': 0.0082, 'grad_norm': 0.0011188465869054198, 'learning_rate': 1.1982621548384207e-05, 'epoch': 2.07}
377
- {'loss': 0.0042, 'grad_norm': 1.8003283739089966, 'learning_rate': 1.1960226636508186e-05, 'epoch': 2.08}
378
- {'loss': 0.0076, 'grad_norm': 3.2485342025756836, 'learning_rate': 1.1937831724632166e-05, 'epoch': 2.08}
379
- {'loss': 0.0058, 'grad_norm': 0.016679977998137474, 'learning_rate': 1.1915436812756143e-05, 'epoch': 2.09}
380
- {'loss': 0.005, 'grad_norm': 0.0760900229215622, 'learning_rate': 1.1893041900880121e-05, 'epoch': 2.09}
381
- {'loss': 0.0047, 'grad_norm': 0.1045154482126236, 'learning_rate': 1.18706469890041e-05, 'epoch': 2.1}
382
- {'loss': 0.0045, 'grad_norm': 0.13673138618469238, 'learning_rate': 1.1848252077128078e-05, 'epoch': 2.1}
383
- {'loss': 0.0043, 'grad_norm': 0.003820559475570917, 'learning_rate': 1.1825857165252054e-05, 'epoch': 2.11}
384
- {'loss': 0.0049, 'grad_norm': 0.01683652587234974, 'learning_rate': 1.1803462253376034e-05, 'epoch': 2.11}
385
- {'loss': 0.0058, 'grad_norm': 0.01756940223276615, 'learning_rate': 1.1781067341500013e-05, 'epoch': 2.12}
386
- {'loss': 0.0081, 'grad_norm': 0.2571689188480377, 'learning_rate': 1.175867242962399e-05, 'epoch': 2.12}
387
- {'loss': 0.0057, 'grad_norm': 0.00470293452963233, 'learning_rate': 1.173627751774797e-05, 'epoch': 2.13}
388
- {'loss': 0.0047, 'grad_norm': 0.15966971218585968, 'learning_rate': 1.1713882605871946e-05, 'epoch': 2.14}
389
- {'loss': 0.0073, 'grad_norm': 10.850295066833496, 'learning_rate': 1.1691487693995924e-05, 'epoch': 2.14}
390
- {'loss': 0.0056, 'grad_norm': 0.02545018680393696, 'learning_rate': 1.1669092782119904e-05, 'epoch': 2.15}
391
- {'loss': 0.006, 'grad_norm': 0.4927498400211334, 'learning_rate': 1.1646697870243881e-05, 'epoch': 2.15}
392
- {'loss': 0.0061, 'grad_norm': 0.044341787695884705, 'learning_rate': 1.162430295836786e-05, 'epoch': 2.16}
393
- {'loss': 0.0042, 'grad_norm': 0.005484799854457378, 'learning_rate': 1.160190804649184e-05, 'epoch': 2.16}
394
- {'loss': 0.0057, 'grad_norm': 0.011644992977380753, 'learning_rate': 1.1579513134615816e-05, 'epoch': 2.17}
395
- {'loss': 0.0055, 'grad_norm': 0.478604257106781, 'learning_rate': 1.1557118222739794e-05, 'epoch': 2.17}
396
- {'loss': 0.0053, 'grad_norm': 0.013355757109820843, 'learning_rate': 1.1534723310863773e-05, 'epoch': 2.18}
397
- {'loss': 0.0085, 'grad_norm': 0.021052315831184387, 'learning_rate': 1.1512328398987751e-05, 'epoch': 2.18}
398
- {'loss': 0.005, 'grad_norm': 1.1859304904937744, 'learning_rate': 1.1489933487111728e-05, 'epoch': 2.19}
399
- {'loss': 0.0055, 'grad_norm': 0.007172802928835154, 'learning_rate': 1.1467538575235708e-05, 'epoch': 2.2}
400
- {'loss': 0.0032, 'grad_norm': 0.0038706334307789803, 'learning_rate': 1.1445143663359686e-05, 'epoch': 2.2}
401
- {'loss': 0.0054, 'grad_norm': 1.872606635093689, 'learning_rate': 1.1422748751483663e-05, 'epoch': 2.21}
402
- {'loss': 0.0037, 'grad_norm': 1.8617807626724243, 'learning_rate': 1.1400353839607643e-05, 'epoch': 2.21}
403
- {'loss': 0.0046, 'grad_norm': 0.05619359761476517, 'learning_rate': 1.1377958927731621e-05, 'epoch': 2.22}
404
- {'loss': 0.0029, 'grad_norm': 0.15391820669174194, 'learning_rate': 1.1355564015855598e-05, 'epoch': 2.22}
405
- {'loss': 0.0043, 'grad_norm': 0.010528339073061943, 'learning_rate': 1.1333169103979578e-05, 'epoch': 2.23}
406
- {'loss': 0.0063, 'grad_norm': 0.4744260907173157, 'learning_rate': 1.1310774192103555e-05, 'epoch': 2.23}
407
- {'loss': 0.0064, 'grad_norm': 0.00845412164926529, 'learning_rate': 1.1288379280227533e-05, 'epoch': 2.24}
408
- {'loss': 0.0046, 'grad_norm': 0.06398043781518936, 'learning_rate': 1.1265984368351513e-05, 'epoch': 2.25}
409
- {'loss': 0.0061, 'grad_norm': 0.3351975381374359, 'learning_rate': 1.124358945647549e-05, 'epoch': 2.25}
410
- {'loss': 0.0034, 'grad_norm': 0.025763623416423798, 'learning_rate': 1.1221194544599468e-05, 'epoch': 2.26}
411
- {'loss': 0.0046, 'grad_norm': 0.0274388175457716, 'learning_rate': 1.1198799632723446e-05, 'epoch': 2.26}
412
- {'loss': 0.0059, 'grad_norm': 0.033901914954185486, 'learning_rate': 1.1176404720847425e-05, 'epoch': 2.27}
413
- {'loss': 0.0044, 'grad_norm': 0.03207828849554062, 'learning_rate': 1.1154009808971401e-05, 'epoch': 2.27}
414
- {'loss': 0.0054, 'grad_norm': 0.13523073494434357, 'learning_rate': 1.1131614897095381e-05, 'epoch': 2.28}
415
- {'loss': 0.0049, 'grad_norm': 0.05645907297730446, 'learning_rate': 1.110921998521936e-05, 'epoch': 2.28}
416
- {'loss': 0.0096, 'grad_norm': 0.726065456867218, 'learning_rate': 1.1086825073343336e-05, 'epoch': 2.29}
417
- {'loss': 0.0045, 'grad_norm': 0.026955202221870422, 'learning_rate': 1.1064430161467316e-05, 'epoch': 2.29}
418
- {'loss': 0.0057, 'grad_norm': 0.09468597918748856, 'learning_rate': 1.1042035249591295e-05, 'epoch': 2.3}
419
- {'loss': 0.0032, 'grad_norm': 0.4908299744129181, 'learning_rate': 1.1019640337715271e-05, 'epoch': 2.31}
420
- {'loss': 0.0031, 'grad_norm': 0.010838231071829796, 'learning_rate': 1.099724542583925e-05, 'epoch': 2.31}
421
- {'loss': 0.0043, 'grad_norm': 0.15813188254833221, 'learning_rate': 1.0974850513963228e-05, 'epoch': 2.32}
422
- {'loss': 0.0068, 'grad_norm': 0.04824952781200409, 'learning_rate': 1.0952455602087207e-05, 'epoch': 2.32}
423
- {'loss': 0.0048, 'grad_norm': 0.12718328833580017, 'learning_rate': 1.0930060690211183e-05, 'epoch': 2.33}
424
- {'loss': 0.0042, 'grad_norm': 0.006453562993556261, 'learning_rate': 1.0907665778335163e-05, 'epoch': 2.33}
425
- {'loss': 0.0068, 'grad_norm': 0.034881096333265305, 'learning_rate': 1.0885270866459142e-05, 'epoch': 2.34}
426
- {'loss': 0.0041, 'grad_norm': 0.026440760120749474, 'learning_rate': 1.0862875954583118e-05, 'epoch': 2.34}
427
- {'loss': 0.0042, 'grad_norm': 0.10058464854955673, 'learning_rate': 1.0840481042707098e-05, 'epoch': 2.35}
428
- {'loss': 0.0051, 'grad_norm': 0.3769572377204895, 'learning_rate': 1.0818086130831077e-05, 'epoch': 2.35}
429
- {'loss': 0.0049, 'grad_norm': 0.10529354214668274, 'learning_rate': 1.0795691218955053e-05, 'epoch': 2.36}
430
- {'loss': 0.0019, 'grad_norm': 0.2557080388069153, 'learning_rate': 1.0773296307079033e-05, 'epoch': 2.37}
431
- {'loss': 0.0039, 'grad_norm': 0.19080431759357452, 'learning_rate': 1.075090139520301e-05, 'epoch': 2.37}
432
- {'loss': 0.0068, 'grad_norm': 0.009274226613342762, 'learning_rate': 1.0728506483326988e-05, 'epoch': 2.38}
433
- {'loss': 0.0033, 'grad_norm': 0.15061549842357635, 'learning_rate': 1.0706111571450968e-05, 'epoch': 2.38}
434
- {'loss': 0.0048, 'grad_norm': 0.32283729314804077, 'learning_rate': 1.0683716659574945e-05, 'epoch': 2.39}
435
- {'loss': 0.0052, 'grad_norm': 0.008730829693377018, 'learning_rate': 1.0661321747698923e-05, 'epoch': 2.39}
436
- {'loss': 0.0063, 'grad_norm': 0.09698653966188431, 'learning_rate': 1.0638926835822902e-05, 'epoch': 2.4}
437
- {'loss': 0.003, 'grad_norm': 0.09314418584108353, 'learning_rate': 1.061653192394688e-05, 'epoch': 2.4}
438
- {'loss': 0.0036, 'grad_norm': 0.0379941388964653, 'learning_rate': 1.0594137012070857e-05, 'epoch': 2.41}
439
- {'loss': 0.004, 'grad_norm': 0.03921454772353172, 'learning_rate': 1.0571742100194837e-05, 'epoch': 2.41}
440
- {'loss': 0.006, 'grad_norm': 0.0010623994749039412, 'learning_rate': 1.0549347188318815e-05, 'epoch': 2.42}
441
- {'loss': 0.0048, 'grad_norm': 0.005394686013460159, 'learning_rate': 1.0526952276442792e-05, 'epoch': 2.43}
442
- {'loss': 0.0037, 'grad_norm': 0.03824278712272644, 'learning_rate': 1.0504557364566772e-05, 'epoch': 2.43}
443
- {'loss': 0.0034, 'grad_norm': 0.09540271013975143, 'learning_rate': 1.048216245269075e-05, 'epoch': 2.44}
444
- {'loss': 0.0049, 'grad_norm': 0.014622088521718979, 'learning_rate': 1.0459767540814727e-05, 'epoch': 2.44}
445
- {'loss': 0.0036, 'grad_norm': 0.02506762184202671, 'learning_rate': 1.0437372628938707e-05, 'epoch': 2.45}
446
- {'loss': 0.0046, 'grad_norm': 12.032191276550293, 'learning_rate': 1.0414977717062684e-05, 'epoch': 2.45}
447
- {'loss': 0.0039, 'grad_norm': 2.5377721786499023, 'learning_rate': 1.0392582805186662e-05, 'epoch': 2.46}
448
- {'loss': 0.0021, 'grad_norm': 0.07715722173452377, 'learning_rate': 1.0370187893310642e-05, 'epoch': 2.46}
449
- {'loss': 0.0035, 'grad_norm': 0.41538187861442566, 'learning_rate': 1.0347792981434619e-05, 'epoch': 2.47}
450
- {'loss': 0.0034, 'grad_norm': 0.5246536135673523, 'learning_rate': 1.0325398069558597e-05, 'epoch': 2.48}
451
- {'loss': 0.003, 'grad_norm': 0.0022522832732647657, 'learning_rate': 1.0303003157682577e-05, 'epoch': 2.48}
452
- {'loss': 0.0032, 'grad_norm': 0.011422159150242805, 'learning_rate': 1.0280608245806554e-05, 'epoch': 2.49}
453
- {'loss': 0.005, 'grad_norm': 0.011885729618370533, 'learning_rate': 1.025821333393053e-05, 'epoch': 2.49}
454
- {'loss': 0.0025, 'grad_norm': 0.06374814361333847, 'learning_rate': 1.023581842205451e-05, 'epoch': 2.5}
455
- {'loss': 0.0036, 'grad_norm': 0.0674147829413414, 'learning_rate': 1.0213423510178489e-05, 'epoch': 2.5}
456
- {'loss': 0.0021, 'grad_norm': 0.5586390495300293, 'learning_rate': 1.0191028598302465e-05, 'epoch': 2.51}
457
- {'loss': 0.0025, 'grad_norm': 1.1011099815368652, 'learning_rate': 1.0168633686426445e-05, 'epoch': 2.51}
458
- {'loss': 0.0036, 'grad_norm': 0.12449350208044052, 'learning_rate': 1.0146238774550424e-05, 'epoch': 2.52}
459
- {'loss': 0.0033, 'grad_norm': 0.001172301941551268, 'learning_rate': 1.01238438626744e-05, 'epoch': 2.52}
460
- {'loss': 0.0049, 'grad_norm': 0.029910240322351456, 'learning_rate': 1.010144895079838e-05, 'epoch': 2.53}
461
- {'loss': 0.0044, 'grad_norm': 0.08513263612985611, 'learning_rate': 1.0079054038922357e-05, 'epoch': 2.54}
462
- {'loss': 0.0029, 'grad_norm': 0.024861743673682213, 'learning_rate': 1.0056659127046336e-05, 'epoch': 2.54}
463
- {'loss': 0.0028, 'grad_norm': 0.08090971410274506, 'learning_rate': 1.0034264215170316e-05, 'epoch': 2.55}
464
- {'loss': 0.0091, 'grad_norm': 0.6751871109008789, 'learning_rate': 1.0011869303294292e-05, 'epoch': 2.55}
465
- {'loss': 0.004, 'grad_norm': 0.01412627287209034, 'learning_rate': 9.98947439141827e-06, 'epoch': 2.56}
466
- {'loss': 0.0036, 'grad_norm': 0.06726730614900589, 'learning_rate': 9.967079479542249e-06, 'epoch': 2.56}
467
- {'loss': 0.0029, 'grad_norm': 0.5515012145042419, 'learning_rate': 9.944684567666227e-06, 'epoch': 2.57}
468
- {'loss': 0.0035, 'grad_norm': 0.0035773934796452522, 'learning_rate': 9.922289655790206e-06, 'epoch': 2.57}
469
- {'loss': 0.0038, 'grad_norm': 0.05018525943160057, 'learning_rate': 9.899894743914184e-06, 'epoch': 2.58}
470
- {'loss': 0.0028, 'grad_norm': 0.007242262363433838, 'learning_rate': 9.877499832038162e-06, 'epoch': 2.58}
471
- {'loss': 0.0041, 'grad_norm': 0.09467479586601257, 'learning_rate': 9.855104920162139e-06, 'epoch': 2.59}
472
- {'loss': 0.0037, 'grad_norm': 0.05528566986322403, 'learning_rate': 9.832710008286119e-06, 'epoch': 2.6}
473
- {'loss': 0.0031, 'grad_norm': 0.0195195060223341, 'learning_rate': 9.810315096410097e-06, 'epoch': 2.6}
474
- {'loss': 0.0036, 'grad_norm': 0.020678259432315826, 'learning_rate': 9.787920184534074e-06, 'epoch': 2.61}
475
- {'loss': 0.0052, 'grad_norm': 0.20698687434196472, 'learning_rate': 9.765525272658052e-06, 'epoch': 2.61}
476
- {'loss': 0.0031, 'grad_norm': 0.06637762486934662, 'learning_rate': 9.74313036078203e-06, 'epoch': 2.62}
477
- {'loss': 0.0023, 'grad_norm': 0.036431193351745605, 'learning_rate': 9.720735448906009e-06, 'epoch': 2.62}
478
- {'loss': 0.0043, 'grad_norm': 0.07816935330629349, 'learning_rate': 9.698340537029987e-06, 'epoch': 2.63}
479
- {'loss': 0.0027, 'grad_norm': 0.0019028312526643276, 'learning_rate': 9.675945625153966e-06, 'epoch': 2.63}
480
- {'loss': 0.0048, 'grad_norm': 0.011531976982951164, 'learning_rate': 9.653550713277944e-06, 'epoch': 2.64}
481
- {'loss': 0.0046, 'grad_norm': 0.07196860760450363, 'learning_rate': 9.631155801401923e-06, 'epoch': 2.64}
482
- {'loss': 0.0038, 'grad_norm': 0.08175013959407806, 'learning_rate': 9.608760889525901e-06, 'epoch': 2.65}
483
- {'loss': 0.0033, 'grad_norm': 0.001142855384387076, 'learning_rate': 9.58636597764988e-06, 'epoch': 2.66}
484
- {'loss': 0.003, 'grad_norm': 0.06008300185203552, 'learning_rate': 9.563971065773858e-06, 'epoch': 2.66}
485
- {'loss': 0.0057, 'grad_norm': 0.08218628168106079, 'learning_rate': 9.541576153897834e-06, 'epoch': 2.67}
486
- {'loss': 0.0044, 'grad_norm': 0.059504490345716476, 'learning_rate': 9.519181242021813e-06, 'epoch': 2.67}
487
- {'loss': 0.0058, 'grad_norm': 0.06249801069498062, 'learning_rate': 9.496786330145793e-06, 'epoch': 2.68}
488
- {'loss': 0.003, 'grad_norm': 0.03842584043741226, 'learning_rate': 9.47439141826977e-06, 'epoch': 2.68}
489
- {'loss': 0.0042, 'grad_norm': 0.05032949522137642, 'learning_rate': 9.451996506393748e-06, 'epoch': 2.69}
490
- {'loss': 0.0045, 'grad_norm': 0.051786765456199646, 'learning_rate': 9.429601594517726e-06, 'epoch': 2.69}
491
- {'loss': 0.0031, 'grad_norm': 0.11977092176675797, 'learning_rate': 9.407206682641704e-06, 'epoch': 2.7}
492
- {'loss': 0.0021, 'grad_norm': 0.004711544141173363, 'learning_rate': 9.384811770765683e-06, 'epoch': 2.71}
493
- {'loss': 0.0043, 'grad_norm': 0.4886787235736847, 'learning_rate': 9.362416858889661e-06, 'epoch': 2.71}
494
- {'loss': 0.0058, 'grad_norm': 0.018584702163934708, 'learning_rate': 9.34002194701364e-06, 'epoch': 2.72}
495
- {'loss': 0.0041, 'grad_norm': 0.03693871572613716, 'learning_rate': 9.317627035137618e-06, 'epoch': 2.72}
496
- {'loss': 0.0038, 'grad_norm': 0.004750245716422796, 'learning_rate': 9.295232123261596e-06, 'epoch': 2.73}
497
- {'loss': 0.0019, 'grad_norm': 1.913931131362915, 'learning_rate': 9.272837211385573e-06, 'epoch': 2.73}
498
- {'loss': 0.0029, 'grad_norm': 0.017329825088381767, 'learning_rate': 9.250442299509553e-06, 'epoch': 2.74}
499
- {'loss': 0.003, 'grad_norm': 0.02129119075834751, 'learning_rate': 9.228047387633531e-06, 'epoch': 2.74}
500
- {'loss': 0.0038, 'grad_norm': 0.028891241177916527, 'learning_rate': 9.205652475757508e-06, 'epoch': 2.75}
501
- {'loss': 0.004, 'grad_norm': 0.009224362671375275, 'learning_rate': 9.183257563881486e-06, 'epoch': 2.75}
502
- {'loss': 0.0049, 'grad_norm': 0.03435930609703064, 'learning_rate': 9.160862652005466e-06, 'epoch': 2.76}
503
- {'loss': 0.0039, 'grad_norm': 0.01626667007803917, 'learning_rate': 9.138467740129443e-06, 'epoch': 2.77}
504
- {'loss': 0.005, 'grad_norm': 1.1218552589416504, 'learning_rate': 9.116072828253421e-06, 'epoch': 2.77}
505
- {'loss': 0.0046, 'grad_norm': 0.030987482517957687, 'learning_rate': 9.0936779163774e-06, 'epoch': 2.78}
506
- {'loss': 0.0025, 'grad_norm': 0.09153684228658676, 'learning_rate': 9.071283004501378e-06, 'epoch': 2.78}
507
- {'loss': 0.0044, 'grad_norm': 0.0023125149309635162, 'learning_rate': 9.048888092625356e-06, 'epoch': 2.79}
508
- {'loss': 0.0023, 'grad_norm': 0.004464196972548962, 'learning_rate': 9.026493180749335e-06, 'epoch': 2.79}
509
- {'loss': 0.0038, 'grad_norm': 0.033567965030670166, 'learning_rate': 9.004098268873313e-06, 'epoch': 2.8}
510
- {'loss': 0.0032, 'grad_norm': 0.05314967781305313, 'learning_rate': 8.981703356997291e-06, 'epoch': 2.8}
511
- {'loss': 0.0021, 'grad_norm': 0.019064532592892647, 'learning_rate': 8.959308445121268e-06, 'epoch': 2.81}
512
- {'loss': 0.0023, 'grad_norm': 0.006131445057690144, 'learning_rate': 8.936913533245248e-06, 'epoch': 2.81}
513
- {'loss': 0.0042, 'grad_norm': 0.20496051013469696, 'learning_rate': 8.914518621369226e-06, 'epoch': 2.82}
514
- {'loss': 0.0042, 'grad_norm': 0.03717898949980736, 'learning_rate': 8.892123709493203e-06, 'epoch': 2.83}
515
- {'loss': 0.0053, 'grad_norm': 0.04788793995976448, 'learning_rate': 8.869728797617181e-06, 'epoch': 2.83}
516
- {'loss': 0.0021, 'grad_norm': 4.119758605957031, 'learning_rate': 8.847333885741161e-06, 'epoch': 2.84}
517
- {'loss': 0.0033, 'grad_norm': 0.24966038763523102, 'learning_rate': 8.824938973865138e-06, 'epoch': 2.84}
518
- {'loss': 0.0047, 'grad_norm': 11.138167381286621, 'learning_rate': 8.802544061989116e-06, 'epoch': 2.85}
519
- {'loss': 0.0048, 'grad_norm': 0.02488502860069275, 'learning_rate': 8.780149150113095e-06, 'epoch': 2.85}
520
- {'loss': 0.0022, 'grad_norm': 0.0015538616571575403, 'learning_rate': 8.757754238237073e-06, 'epoch': 2.86}
521
- {'loss': 0.0036, 'grad_norm': 0.011559401638805866, 'learning_rate': 8.735359326361051e-06, 'epoch': 2.86}
522
- {'loss': 0.0034, 'grad_norm': 0.41917547583580017, 'learning_rate': 8.71296441448503e-06, 'epoch': 2.87}
523
- {'loss': 0.0029, 'grad_norm': 0.09700381010770798, 'learning_rate': 8.690569502609008e-06, 'epoch': 2.87}
524
- {'loss': 0.0038, 'grad_norm': 0.10457664728164673, 'learning_rate': 8.668174590732987e-06, 'epoch': 2.88}
525
- {'loss': 0.0067, 'grad_norm': 0.009366615675389767, 'learning_rate': 8.645779678856965e-06, 'epoch': 2.89}
526
- {'loss': 0.003, 'grad_norm': 0.0037414308171719313, 'learning_rate': 8.623384766980942e-06, 'epoch': 2.89}
527
- {'loss': 0.0049, 'grad_norm': 0.09502315521240234, 'learning_rate': 8.600989855104922e-06, 'epoch': 2.9}
528
- {'loss': 0.0027, 'grad_norm': 0.390895813703537, 'learning_rate': 8.5785949432289e-06, 'epoch': 2.9}
529
- {'loss': 0.004, 'grad_norm': 0.06665816903114319, 'learning_rate': 8.556200031352877e-06, 'epoch': 2.91}
530
- {'loss': 0.0042, 'grad_norm': 0.012638445943593979, 'learning_rate': 8.533805119476855e-06, 'epoch': 2.91}
531
- {'loss': 0.0042, 'grad_norm': 0.26541146636009216, 'learning_rate': 8.511410207600835e-06, 'epoch': 2.92}
532
- {'loss': 0.0038, 'grad_norm': 0.0727955549955368, 'learning_rate': 8.489015295724812e-06, 'epoch': 2.92}
533
- {'loss': 0.0029, 'grad_norm': 0.10278739035129547, 'learning_rate': 8.46662038384879e-06, 'epoch': 2.93}
534
- {'loss': 0.0039, 'grad_norm': 0.02014051005244255, 'learning_rate': 8.444225471972768e-06, 'epoch': 2.94}
535
- {'loss': 0.0039, 'grad_norm': 0.03868388757109642, 'learning_rate': 8.421830560096747e-06, 'epoch': 2.94}
536
- {'loss': 0.002, 'grad_norm': 0.007972314022481441, 'learning_rate': 8.399435648220725e-06, 'epoch': 2.95}
537
- {'loss': 0.0022, 'grad_norm': 0.004831973928958178, 'learning_rate': 8.377040736344703e-06, 'epoch': 2.95}
538
- {'loss': 0.002, 'grad_norm': 0.04136960953474045, 'learning_rate': 8.354645824468682e-06, 'epoch': 2.96}
539
- {'loss': 0.003, 'grad_norm': 0.05827214568853378, 'learning_rate': 8.33225091259266e-06, 'epoch': 2.96}
540
- {'loss': 0.0019, 'grad_norm': 0.42127054929733276, 'learning_rate': 8.309856000716637e-06, 'epoch': 2.97}
541
- {'loss': 0.0044, 'grad_norm': 0.3449774384498596, 'learning_rate': 8.287461088840615e-06, 'epoch': 2.97}
542
- {'loss': 0.0028, 'grad_norm': 0.07684598118066788, 'learning_rate': 8.265066176964595e-06, 'epoch': 2.98}
543
- {'loss': 0.0031, 'grad_norm': 0.010578151792287827, 'learning_rate': 8.242671265088572e-06, 'epoch': 2.98}
544
- {'loss': 0.0025, 'grad_norm': 0.14775557816028595, 'learning_rate': 8.22027635321255e-06, 'epoch': 2.99}
545
- {'loss': 0.0021, 'grad_norm': 0.16460050642490387, 'learning_rate': 8.197881441336529e-06, 'epoch': 3.0}
546
- {'loss': 0.0025, 'grad_norm': 0.0014739581383764744, 'learning_rate': 8.175486529460507e-06, 'epoch': 3.0}
547
- {'loss': 0.0038, 'grad_norm': 0.01254010945558548, 'learning_rate': 8.153091617584485e-06, 'epoch': 3.01}
548
- {'loss': 0.0045, 'grad_norm': 0.09135819971561432, 'learning_rate': 8.130696705708464e-06, 'epoch': 3.01}
549
- {'loss': 0.002, 'grad_norm': 0.002453828463330865, 'learning_rate': 8.108301793832442e-06, 'epoch': 3.02}
550
- {'loss': 0.0035, 'grad_norm': 0.007750564254820347, 'learning_rate': 8.08590688195642e-06, 'epoch': 3.02}
551
- {'loss': 0.0046, 'grad_norm': 1.2276639938354492, 'learning_rate': 8.063511970080399e-06, 'epoch': 3.03}
552
- {'loss': 0.0033, 'grad_norm': 0.0030335187911987305, 'learning_rate': 8.041117058204377e-06, 'epoch': 3.03}
553
- {'loss': 0.002, 'grad_norm': 0.38589444756507874, 'learning_rate': 8.018722146328355e-06, 'epoch': 3.04}
554
- {'loss': 0.0036, 'grad_norm': 0.022893747314810753, 'learning_rate': 7.996327234452334e-06, 'epoch': 3.04}
555
- {'loss': 0.0025, 'grad_norm': 0.003235406940802932, 'learning_rate': 7.97393232257631e-06, 'epoch': 3.05}
556
- {'loss': 0.0039, 'grad_norm': 0.05985206738114357, 'learning_rate': 7.95153741070029e-06, 'epoch': 3.06}
557
- {'loss': 0.0029, 'grad_norm': 0.14925901591777802, 'learning_rate': 7.929142498824269e-06, 'epoch': 3.06}
558
- {'loss': 0.004, 'grad_norm': 0.026889082044363022, 'learning_rate': 7.906747586948245e-06, 'epoch': 3.07}
559
- {'loss': 0.0023, 'grad_norm': 0.057593248784542084, 'learning_rate': 7.884352675072224e-06, 'epoch': 3.07}
560
- {'loss': 0.0019, 'grad_norm': 0.20720885694026947, 'learning_rate': 7.861957763196204e-06, 'epoch': 3.08}
561
- {'loss': 0.0019, 'grad_norm': 0.002138580894097686, 'learning_rate': 7.83956285132018e-06, 'epoch': 3.08}
562
- {'loss': 0.0027, 'grad_norm': 0.013360394164919853, 'learning_rate': 7.817167939444159e-06, 'epoch': 3.09}
563
- {'loss': 0.0014, 'grad_norm': 0.12524360418319702, 'learning_rate': 7.794773027568137e-06, 'epoch': 3.09}
564
- {'loss': 0.0019, 'grad_norm': 0.04898557439446449, 'learning_rate': 7.772378115692116e-06, 'epoch': 3.1}
565
- {'loss': 0.0018, 'grad_norm': 0.007251457776874304, 'learning_rate': 7.749983203816094e-06, 'epoch': 3.1}
566
- {'loss': 0.0016, 'grad_norm': 0.005014033988118172, 'learning_rate': 7.72758829194007e-06, 'epoch': 3.11}
567
- {'loss': 0.0017, 'grad_norm': 0.008448738604784012, 'learning_rate': 7.70519338006405e-06, 'epoch': 3.12}
568
- {'loss': 0.0049, 'grad_norm': 0.03684404864907265, 'learning_rate': 7.682798468188029e-06, 'epoch': 3.12}
569
- {'loss': 0.0022, 'grad_norm': 0.0004430219705682248, 'learning_rate': 7.660403556312006e-06, 'epoch': 3.13}
570
- {'loss': 0.0023, 'grad_norm': 0.01342267170548439, 'learning_rate': 7.638008644435984e-06, 'epoch': 3.13}
571
- {'loss': 0.0016, 'grad_norm': 0.27969247102737427, 'learning_rate': 7.615613732559963e-06, 'epoch': 3.14}
572
- {'loss': 0.002, 'grad_norm': 0.37727439403533936, 'learning_rate': 7.593218820683941e-06, 'epoch': 3.14}
573
- {'loss': 0.0025, 'grad_norm': 0.2768697142601013, 'learning_rate': 7.570823908807919e-06, 'epoch': 3.15}
574
- {'loss': 0.0012, 'grad_norm': 0.12135498970746994, 'learning_rate': 7.548428996931898e-06, 'epoch': 3.15}
575
- {'loss': 0.0021, 'grad_norm': 0.05090919882059097, 'learning_rate': 7.526034085055876e-06, 'epoch': 3.16}
576
- {'loss': 0.0017, 'grad_norm': 0.14085857570171356, 'learning_rate': 7.503639173179854e-06, 'epoch': 3.17}
577
- {'loss': 0.0019, 'grad_norm': 0.3364329934120178, 'learning_rate': 7.481244261303832e-06, 'epoch': 3.17}
578
- {'loss': 0.0019, 'grad_norm': 0.024304231628775597, 'learning_rate': 7.45884934942781e-06, 'epoch': 3.18}
579
- {'loss': 0.0042, 'grad_norm': 0.12154655903577805, 'learning_rate': 7.436454437551789e-06, 'epoch': 3.18}
580
- {'loss': 0.0027, 'grad_norm': 0.020685842260718346, 'learning_rate': 7.4140595256757675e-06, 'epoch': 3.19}
581
- {'loss': 0.0011, 'grad_norm': 0.024026449769735336, 'learning_rate': 7.391664613799745e-06, 'epoch': 3.19}
582
- {'loss': 0.002, 'grad_norm': 0.07294344902038574, 'learning_rate': 7.369269701923723e-06, 'epoch': 3.2}
583
- {'loss': 0.0021, 'grad_norm': 0.0950188934803009, 'learning_rate': 7.3468747900477025e-06, 'epoch': 3.2}
584
- {'loss': 0.0015, 'grad_norm': 0.004987840075045824, 'learning_rate': 7.32447987817168e-06, 'epoch': 3.21}
585
- {'loss': 0.0017, 'grad_norm': 0.0009321196121163666, 'learning_rate': 7.302084966295658e-06, 'epoch': 3.21}
586
- {'loss': 0.002, 'grad_norm': 0.8103981614112854, 'learning_rate': 7.279690054419637e-06, 'epoch': 3.22}
587
- {'loss': 0.0012, 'grad_norm': 0.08477653563022614, 'learning_rate': 7.257295142543614e-06, 'epoch': 3.23}
588
- {'loss': 0.0017, 'grad_norm': 0.14473630487918854, 'learning_rate': 7.234900230667593e-06, 'epoch': 3.23}
589
- {'loss': 0.0029, 'grad_norm': 0.1038050651550293, 'learning_rate': 7.212505318791572e-06, 'epoch': 3.24}
590
- {'loss': 0.0019, 'grad_norm': 0.004471446853131056, 'learning_rate': 7.190110406915549e-06, 'epoch': 3.24}
591
- {'loss': 0.0017, 'grad_norm': 0.08369725197553635, 'learning_rate': 7.167715495039528e-06, 'epoch': 3.25}
592
- {'loss': 0.0019, 'grad_norm': 0.07285201549530029, 'learning_rate': 7.145320583163505e-06, 'epoch': 3.25}
593
- {'loss': 0.0012, 'grad_norm': 0.007139412686228752, 'learning_rate': 7.122925671287484e-06, 'epoch': 3.26}
594
- {'loss': 0.0024, 'grad_norm': 0.026335667818784714, 'learning_rate': 7.100530759411463e-06, 'epoch': 3.26}
595
- {'loss': 0.0017, 'grad_norm': 0.4776710569858551, 'learning_rate': 7.07813584753544e-06, 'epoch': 3.27}
596
- {'loss': 0.0022, 'grad_norm': 0.025377823039889336, 'learning_rate': 7.0557409356594185e-06, 'epoch': 3.27}
597
- {'loss': 0.002, 'grad_norm': 0.15673436224460602, 'learning_rate': 7.033346023783397e-06, 'epoch': 3.28}
598
- {'loss': 0.0028, 'grad_norm': 0.10128195583820343, 'learning_rate': 7.010951111907374e-06, 'epoch': 3.29}
599
- {'loss': 0.0036, 'grad_norm': 0.007779085077345371, 'learning_rate': 6.988556200031354e-06, 'epoch': 3.29}
600
- {'loss': 0.0015, 'grad_norm': 0.07349961996078491, 'learning_rate': 6.966161288155332e-06, 'epoch': 3.3}
601
- {'loss': 0.0024, 'grad_norm': 0.002100712852552533, 'learning_rate': 6.9437663762793094e-06, 'epoch': 3.3}
602
- {'loss': 0.0015, 'grad_norm': 0.2998717725276947, 'learning_rate': 6.921371464403288e-06, 'epoch': 3.31}
603
- {'loss': 0.0012, 'grad_norm': 0.13683967292308807, 'learning_rate': 6.898976552527267e-06, 'epoch': 3.31}
604
- {'loss': 0.0022, 'grad_norm': 0.003665775526314974, 'learning_rate': 6.8765816406512445e-06, 'epoch': 3.32}
605
- {'loss': 0.0015, 'grad_norm': 0.000874933844897896, 'learning_rate': 6.854186728775223e-06, 'epoch': 3.32}
606
- {'loss': 0.0023, 'grad_norm': 0.15409617125988007, 'learning_rate': 6.831791816899201e-06, 'epoch': 3.33}
607
- {'loss': 0.0017, 'grad_norm': 0.03335576876997948, 'learning_rate': 6.809396905023179e-06, 'epoch': 3.33}
608
- {'loss': 0.0021, 'grad_norm': 0.013786455616354942, 'learning_rate': 6.787001993147158e-06, 'epoch': 3.34}
609
- {'loss': 0.0017, 'grad_norm': 0.0036290634889155626, 'learning_rate': 6.764607081271136e-06, 'epoch': 3.35}
610
- {'loss': 0.0015, 'grad_norm': 0.1783692091703415, 'learning_rate': 6.742212169395114e-06, 'epoch': 3.35}
611
- {'loss': 0.0023, 'grad_norm': 0.012478599324822426, 'learning_rate': 6.719817257519092e-06, 'epoch': 3.36}
612
- {'loss': 0.0014, 'grad_norm': 0.016466649249196053, 'learning_rate': 6.697422345643071e-06, 'epoch': 3.36}
613
- {'loss': 0.0019, 'grad_norm': 0.006102473475039005, 'learning_rate': 6.675027433767049e-06, 'epoch': 3.37}
614
- {'loss': 0.0017, 'grad_norm': 0.009547678753733635, 'learning_rate': 6.652632521891027e-06, 'epoch': 3.37}
615
- {'loss': 0.0027, 'grad_norm': 0.1453057825565338, 'learning_rate': 6.6302376100150055e-06, 'epoch': 3.38}
616
- {'loss': 0.0016, 'grad_norm': 0.20028233528137207, 'learning_rate': 6.607842698138983e-06, 'epoch': 3.38}
617
- {'loss': 0.0019, 'grad_norm': 0.003452139673754573, 'learning_rate': 6.585447786262961e-06, 'epoch': 3.39}
618
- {'loss': 0.0037, 'grad_norm': 0.004863356240093708, 'learning_rate': 6.563052874386939e-06, 'epoch': 3.4}
619
- {'loss': 0.0016, 'grad_norm': 0.08551418036222458, 'learning_rate': 6.540657962510918e-06, 'epoch': 3.4}
620
- {'loss': 0.0012, 'grad_norm': 0.07263021171092987, 'learning_rate': 6.5182630506348964e-06, 'epoch': 3.41}
621
- {'loss': 0.0024, 'grad_norm': 0.02901959978044033, 'learning_rate': 6.495868138758874e-06, 'epoch': 3.41}
622
- {'loss': 0.0016, 'grad_norm': 0.010008633136749268, 'learning_rate': 6.473473226882852e-06, 'epoch': 3.42}
623
- {'loss': 0.0022, 'grad_norm': 0.02640015073120594, 'learning_rate': 6.4510783150068315e-06, 'epoch': 3.42}
624
- {'loss': 0.0015, 'grad_norm': 0.1104823499917984, 'learning_rate': 6.428683403130809e-06, 'epoch': 3.43}
625
- {'loss': 0.0017, 'grad_norm': 0.16136541962623596, 'learning_rate': 6.406288491254787e-06, 'epoch': 3.43}
626
- {'loss': 0.0015, 'grad_norm': 0.0034606726840138435, 'learning_rate': 6.383893579378766e-06, 'epoch': 3.44}
627
- {'loss': 0.0018, 'grad_norm': 0.03437316045165062, 'learning_rate': 6.361498667502743e-06, 'epoch': 3.44}
628
- {'loss': 0.0015, 'grad_norm': 0.0036164058838039637, 'learning_rate': 6.339103755626722e-06, 'epoch': 3.45}
629
- {'loss': 0.0019, 'grad_norm': 0.01371910609304905, 'learning_rate': 6.316708843750701e-06, 'epoch': 3.46}
630
- {'loss': 0.0009, 'grad_norm': 0.12439179420471191, 'learning_rate': 6.294313931874678e-06, 'epoch': 3.46}
631
- {'loss': 0.001, 'grad_norm': 0.0557035356760025, 'learning_rate': 6.271919019998657e-06, 'epoch': 3.47}
632
- {'loss': 0.001, 'grad_norm': 0.020946547389030457, 'learning_rate': 6.249524108122636e-06, 'epoch': 3.47}
633
- {'loss': 0.0023, 'grad_norm': 0.07646912336349487, 'learning_rate': 6.227129196246613e-06, 'epoch': 3.48}
634
- {'loss': 0.0012, 'grad_norm': 0.07221906632184982, 'learning_rate': 6.204734284370592e-06, 'epoch': 3.48}
635
- {'loss': 0.0012, 'grad_norm': 0.024785563349723816, 'learning_rate': 6.18233937249457e-06, 'epoch': 3.49}
636
- {'loss': 0.0011, 'grad_norm': 0.0033317049965262413, 'learning_rate': 6.1599444606185475e-06, 'epoch': 3.49}
637
- {'loss': 0.0008, 'grad_norm': 0.051508672535419464, 'learning_rate': 6.137549548742526e-06, 'epoch': 3.5}
638
- {'loss': 0.0018, 'grad_norm': 0.29059651494026184, 'learning_rate': 6.115154636866505e-06, 'epoch': 3.5}
639
- {'loss': 0.0009, 'grad_norm': 0.0021790487226098776, 'learning_rate': 6.0927597249904826e-06, 'epoch': 3.51}
640
- {'loss': 0.0016, 'grad_norm': 2.6697778701782227, 'learning_rate': 6.070364813114461e-06, 'epoch': 3.52}
641
- {'loss': 0.0012, 'grad_norm': 1.0531569719314575, 'learning_rate': 6.047969901238439e-06, 'epoch': 3.52}
642
- {'loss': 0.0015, 'grad_norm': 0.4647313952445984, 'learning_rate': 6.025574989362417e-06, 'epoch': 3.53}
643
- {'loss': 0.0024, 'grad_norm': 0.05964852496981621, 'learning_rate': 6.003180077486396e-06, 'epoch': 3.53}
644
- {'loss': 0.0016, 'grad_norm': 0.06724616885185242, 'learning_rate': 5.980785165610374e-06, 'epoch': 3.54}
645
- {'loss': 0.0014, 'grad_norm': 0.2721405625343323, 'learning_rate': 5.958390253734352e-06, 'epoch': 3.54}
646
- {'loss': 0.0014, 'grad_norm': 0.0075986250303685665, 'learning_rate': 5.93599534185833e-06, 'epoch': 3.55}
647
- {'loss': 0.0047, 'grad_norm': 0.07740730047225952, 'learning_rate': 5.913600429982308e-06, 'epoch': 3.55}
648
- {'loss': 0.0013, 'grad_norm': 0.07295466959476471, 'learning_rate': 5.891205518106287e-06, 'epoch': 3.56}
649
- {'loss': 0.0012, 'grad_norm': 0.015490477904677391, 'learning_rate': 5.868810606230265e-06, 'epoch': 3.56}
650
- {'loss': 0.0013, 'grad_norm': 0.10027164220809937, 'learning_rate': 5.846415694354243e-06, 'epoch': 3.57}
651
- {'loss': 0.0011, 'grad_norm': 0.0034784649033099413, 'learning_rate': 5.824020782478221e-06, 'epoch': 3.58}
652
- {'loss': 0.0011, 'grad_norm': 0.11903531104326248, 'learning_rate': 5.8016258706022e-06, 'epoch': 3.58}
653
- {'loss': 0.0016, 'grad_norm': 0.01810777373611927, 'learning_rate': 5.779230958726178e-06, 'epoch': 3.59}
654
- {'loss': 0.0022, 'grad_norm': 0.0328763872385025, 'learning_rate': 5.756836046850156e-06, 'epoch': 3.59}
655
- {'loss': 0.0017, 'grad_norm': 0.004651115275919437, 'learning_rate': 5.7344411349741345e-06, 'epoch': 3.6}
656
- {'loss': 0.0012, 'grad_norm': 0.008656460791826248, 'learning_rate': 5.712046223098112e-06, 'epoch': 3.6}
657
- {'loss': 0.002, 'grad_norm': 0.015148227103054523, 'learning_rate': 5.689651311222091e-06, 'epoch': 3.61}
658
- {'loss': 0.0016, 'grad_norm': 0.04307083040475845, 'learning_rate': 5.6672563993460696e-06, 'epoch': 3.61}
659
- {'loss': 0.0009, 'grad_norm': 0.10523002594709396, 'learning_rate': 5.644861487470047e-06, 'epoch': 3.62}
660
- {'loss': 0.0011, 'grad_norm': 0.004528895020484924, 'learning_rate': 5.622466575594025e-06, 'epoch': 3.63}
661
- {'loss': 0.0019, 'grad_norm': 0.1139262244105339, 'learning_rate': 5.600071663718004e-06, 'epoch': 3.63}
662
- {'loss': 0.0011, 'grad_norm': 0.02509382739663124, 'learning_rate': 5.577676751841981e-06, 'epoch': 3.64}
663
- {'loss': 0.0021, 'grad_norm': 0.3114255368709564, 'learning_rate': 5.5552818399659605e-06, 'epoch': 3.64}
664
- {'loss': 0.0029, 'grad_norm': 0.14174672961235046, 'learning_rate': 5.532886928089939e-06, 'epoch': 3.65}
665
- {'loss': 0.001, 'grad_norm': 0.015288415364921093, 'learning_rate': 5.510492016213916e-06, 'epoch': 3.65}
666
- {'loss': 0.0016, 'grad_norm': 0.0520060658454895, 'learning_rate': 5.488097104337895e-06, 'epoch': 3.66}
667
- {'loss': 0.0016, 'grad_norm': 0.0958566963672638, 'learning_rate': 5.465702192461874e-06, 'epoch': 3.66}
668
- {'loss': 0.0036, 'grad_norm': 0.000693493289873004, 'learning_rate': 5.443307280585851e-06, 'epoch': 3.67}
669
- {'loss': 0.0012, 'grad_norm': 0.037046968936920166, 'learning_rate': 5.42091236870983e-06, 'epoch': 3.67}
670
- {'loss': 0.003, 'grad_norm': 0.031214630231261253, 'learning_rate': 5.398517456833808e-06, 'epoch': 3.68}
671
- {'loss': 0.0014, 'grad_norm': 0.393162339925766, 'learning_rate': 5.376122544957786e-06, 'epoch': 3.69}
672
- {'loss': 0.0018, 'grad_norm': 0.16350078582763672, 'learning_rate': 5.353727633081765e-06, 'epoch': 3.69}
673
- {'loss': 0.001, 'grad_norm': 0.020479297265410423, 'learning_rate': 5.331332721205742e-06, 'epoch': 3.7}
674
- {'loss': 0.001, 'grad_norm': 0.06839997321367264, 'learning_rate': 5.308937809329721e-06, 'epoch': 3.7}
675
- {'loss': 0.0016, 'grad_norm': 0.47072646021842957, 'learning_rate': 5.286542897453699e-06, 'epoch': 3.71}
676
- {'loss': 0.0025, 'grad_norm': 0.015468220226466656, 'learning_rate': 5.2641479855776765e-06, 'epoch': 3.71}
677
- {'loss': 0.001, 'grad_norm': 0.06005273386836052, 'learning_rate': 5.241753073701656e-06, 'epoch': 3.72}
678
- {'loss': 0.0018, 'grad_norm': 0.016474580392241478, 'learning_rate': 5.219358161825634e-06, 'epoch': 3.72}
679
- {'loss': 0.0015, 'grad_norm': 0.0036705026868730783, 'learning_rate': 5.1969632499496116e-06, 'epoch': 3.73}
680
- {'loss': 0.001, 'grad_norm': 0.5551484823226929, 'learning_rate': 5.17456833807359e-06, 'epoch': 3.73}
681
- {'loss': 0.0009, 'grad_norm': 0.006879040505737066, 'learning_rate': 5.152173426197568e-06, 'epoch': 3.74}
682
- {'loss': 0.0013, 'grad_norm': 0.0026731377001851797, 'learning_rate': 5.129778514321546e-06, 'epoch': 3.75}
683
- {'loss': 0.0014, 'grad_norm': 0.10522931814193726, 'learning_rate': 5.107383602445525e-06, 'epoch': 3.75}
684
- {'loss': 0.0013, 'grad_norm': 0.07733763009309769, 'learning_rate': 5.084988690569503e-06, 'epoch': 3.76}
685
- {'loss': 0.0011, 'grad_norm': 0.08409392833709717, 'learning_rate': 5.062593778693481e-06, 'epoch': 3.76}
686
- {'loss': 0.0026, 'grad_norm': 0.03305979073047638, 'learning_rate': 5.040198866817459e-06, 'epoch': 3.77}
687
- {'loss': 0.0014, 'grad_norm': 0.006016087252646685, 'learning_rate': 5.017803954941438e-06, 'epoch': 3.77}
688
- {'loss': 0.0021, 'grad_norm': 0.02351684682071209, 'learning_rate': 4.995409043065416e-06, 'epoch': 3.78}
689
- {'loss': 0.0015, 'grad_norm': 0.009738347493112087, 'learning_rate': 4.973014131189394e-06, 'epoch': 3.78}
690
- {'loss': 0.0013, 'grad_norm': 0.02382291853427887, 'learning_rate': 4.9506192193133726e-06, 'epoch': 3.79}
691
- {'loss': 0.0013, 'grad_norm': 0.028588024899363518, 'learning_rate': 4.92822430743735e-06, 'epoch': 3.79}
692
- {'loss': 0.0019, 'grad_norm': 0.06715335696935654, 'learning_rate': 4.905829395561329e-06, 'epoch': 3.8}
693
- {'loss': 0.0009, 'grad_norm': 0.009042341262102127, 'learning_rate': 4.883434483685307e-06, 'epoch': 3.81}
694
- {'loss': 0.0009, 'grad_norm': 0.03919120132923126, 'learning_rate': 4.861039571809285e-06, 'epoch': 3.81}
695
- {'loss': 0.0014, 'grad_norm': 0.04066384211182594, 'learning_rate': 4.8386446599332635e-06, 'epoch': 3.82}
696
- {'loss': 0.0012, 'grad_norm': 0.05810333788394928, 'learning_rate': 4.816249748057242e-06, 'epoch': 3.82}
697
- {'loss': 0.0032, 'grad_norm': 0.020592456683516502, 'learning_rate': 4.79385483618122e-06, 'epoch': 3.83}
698
- {'loss': 0.0015, 'grad_norm': 0.1887601613998413, 'learning_rate': 4.7714599243051985e-06, 'epoch': 3.83}
699
- {'loss': 0.0011, 'grad_norm': 0.020269712433218956, 'learning_rate': 4.749065012429177e-06, 'epoch': 3.84}
700
- {'loss': 0.002, 'grad_norm': 0.15431857109069824, 'learning_rate': 4.726670100553154e-06, 'epoch': 3.84}
701
- {'loss': 0.0012, 'grad_norm': 0.009703408926725388, 'learning_rate': 4.704275188677134e-06, 'epoch': 3.85}
702
- {'loss': 0.0026, 'grad_norm': 0.03211360052227974, 'learning_rate': 4.681880276801111e-06, 'epoch': 3.86}
703
- {'loss': 0.001, 'grad_norm': 0.08050722628831863, 'learning_rate': 4.6594853649250894e-06, 'epoch': 3.86}
704
- {'loss': 0.0018, 'grad_norm': 0.01742105558514595, 'learning_rate': 4.637090453049068e-06, 'epoch': 3.87}
705
- {'loss': 0.0014, 'grad_norm': 0.13857877254486084, 'learning_rate': 4.614695541173046e-06, 'epoch': 3.87}
706
- {'loss': 0.001, 'grad_norm': 0.10377497225999832, 'learning_rate': 4.592300629297024e-06, 'epoch': 3.88}
707
- {'loss': 0.0018, 'grad_norm': 0.019631896167993546, 'learning_rate': 4.569905717421002e-06, 'epoch': 3.88}
708
- {'loss': 0.0027, 'grad_norm': 0.010785219259560108, 'learning_rate': 4.54751080554498e-06, 'epoch': 3.89}
709
- {'loss': 0.0027, 'grad_norm': 0.0029205495957285166, 'learning_rate': 4.525115893668959e-06, 'epoch': 3.89}
710
- {'loss': 0.0011, 'grad_norm': 0.026202471926808357, 'learning_rate': 4.502720981792937e-06, 'epoch': 3.9}
711
- {'loss': 0.0024, 'grad_norm': 0.005275311879813671, 'learning_rate': 4.480326069916915e-06, 'epoch': 3.9}
712
- {'loss': 0.0012, 'grad_norm': 0.009359275922179222, 'learning_rate': 4.457931158040894e-06, 'epoch': 3.91}
713
- {'loss': 0.0018, 'grad_norm': 0.06890468299388885, 'learning_rate': 4.435536246164871e-06, 'epoch': 3.92}
714
- {'loss': 0.0012, 'grad_norm': 0.004848203156143427, 'learning_rate': 4.4131413342888505e-06, 'epoch': 3.92}
715
- {'loss': 0.0015, 'grad_norm': 1.0583692789077759, 'learning_rate': 4.390746422412828e-06, 'epoch': 3.93}
716
- {'loss': 0.0015, 'grad_norm': 0.08103686571121216, 'learning_rate': 4.368351510536806e-06, 'epoch': 3.93}
717
- {'loss': 0.0018, 'grad_norm': 0.006814138498157263, 'learning_rate': 4.345956598660785e-06, 'epoch': 3.94}
718
- {'loss': 0.0017, 'grad_norm': 0.28501853346824646, 'learning_rate': 4.323561686784763e-06, 'epoch': 3.94}
719
- {'loss': 0.0009, 'grad_norm': 0.005639960058033466, 'learning_rate': 4.301166774908741e-06, 'epoch': 3.95}
720
- {'loss': 0.001, 'grad_norm': 0.0073370854370296, 'learning_rate': 4.278771863032719e-06, 'epoch': 3.95}
721
- {'loss': 0.0013, 'grad_norm': 0.014588725753128529, 'learning_rate': 4.256376951156698e-06, 'epoch': 3.96}
722
- {'loss': 0.0008, 'grad_norm': 0.010143490508198738, 'learning_rate': 4.233982039280676e-06, 'epoch': 3.96}
723
- {'loss': 0.0018, 'grad_norm': 0.00340424757450819, 'learning_rate': 4.211587127404654e-06, 'epoch': 3.97}
724
- {'loss': 0.0027, 'grad_norm': 0.012705673463642597, 'learning_rate': 4.189192215528632e-06, 'epoch': 3.98}
725
- {'loss': 0.0009, 'grad_norm': 0.038429852575063705, 'learning_rate': 4.166797303652611e-06, 'epoch': 3.98}
726
- {'loss': 0.0008, 'grad_norm': 0.4789028763771057, 'learning_rate': 4.144402391776588e-06, 'epoch': 3.99}
727
- {'loss': 0.001, 'grad_norm': 0.006754585541784763, 'learning_rate': 4.122007479900567e-06, 'epoch': 3.99}
728
- {'loss': 0.0009, 'grad_norm': 0.23940064013004303, 'learning_rate': 4.099612568024545e-06, 'epoch': 4.0}
729
- {'loss': 0.0012, 'grad_norm': 0.0665973499417305, 'learning_rate': 4.077217656148523e-06, 'epoch': 4.0}
730
- {'loss': 0.0011, 'grad_norm': 0.0013757392298430204, 'learning_rate': 4.0548227442725016e-06, 'epoch': 4.01}
731
- {'loss': 0.0023, 'grad_norm': 0.8854921460151672, 'learning_rate': 4.03242783239648e-06, 'epoch': 4.01}
732
- {'loss': 0.0023, 'grad_norm': 0.06492713838815689, 'learning_rate': 4.010032920520458e-06, 'epoch': 4.02}
733
- {'loss': 0.0012, 'grad_norm': 0.003994062077254057, 'learning_rate': 3.987638008644436e-06, 'epoch': 4.02}
734
- {'loss': 0.0018, 'grad_norm': 0.024876805022358894, 'learning_rate': 3.965243096768415e-06, 'epoch': 4.03}
735
- {'loss': 0.0013, 'grad_norm': 0.21828804910182953, 'learning_rate': 3.9428481848923925e-06, 'epoch': 4.04}
736
- {'loss': 0.0009, 'grad_norm': 0.02883763425052166, 'learning_rate': 3.920453273016371e-06, 'epoch': 4.04}
737
- {'loss': 0.0016, 'grad_norm': 0.1658484935760498, 'learning_rate': 3.898058361140349e-06, 'epoch': 4.05}
738
- {'loss': 0.0011, 'grad_norm': 0.023233819752931595, 'learning_rate': 3.8756634492643275e-06, 'epoch': 4.05}
739
- {'loss': 0.0011, 'grad_norm': 0.016315072774887085, 'learning_rate': 3.853268537388306e-06, 'epoch': 4.06}
740
- {'loss': 0.0009, 'grad_norm': 0.027211636304855347, 'learning_rate': 3.830873625512284e-06, 'epoch': 4.06}
741
- {'loss': 0.0012, 'grad_norm': 0.006255852058529854, 'learning_rate': 3.808478713636262e-06, 'epoch': 4.07}
742
- {'loss': 0.0011, 'grad_norm': 0.005831268150359392, 'learning_rate': 3.78608380176024e-06, 'epoch': 4.07}
743
- {'loss': 0.0008, 'grad_norm': 0.012144763953983784, 'learning_rate': 3.763688889884219e-06, 'epoch': 4.08}
744
- {'loss': 0.001, 'grad_norm': 0.01724362187087536, 'learning_rate': 3.7412939780081968e-06, 'epoch': 4.09}
745
- {'loss': 0.0008, 'grad_norm': 0.04438236728310585, 'learning_rate': 3.718899066132175e-06, 'epoch': 4.09}
746
- {'loss': 0.0009, 'grad_norm': 0.00658840499818325, 'learning_rate': 3.696504154256153e-06, 'epoch': 4.1}
747
- {'loss': 0.0008, 'grad_norm': 0.05471208319067955, 'learning_rate': 3.674109242380132e-06, 'epoch': 4.1}
748
- {'loss': 0.0008, 'grad_norm': 0.007816795259714127, 'learning_rate': 3.6517143305041098e-06, 'epoch': 4.11}
749
- {'loss': 0.0008, 'grad_norm': 0.02814406529068947, 'learning_rate': 3.6293194186280877e-06, 'epoch': 4.11}
750
- {'loss': 0.0009, 'grad_norm': 0.0004428045067470521, 'learning_rate': 3.6069245067520665e-06, 'epoch': 4.12}
751
- {'loss': 0.0021, 'grad_norm': 0.001689333003014326, 'learning_rate': 3.5845295948760444e-06, 'epoch': 4.12}
752
- {'loss': 0.0007, 'grad_norm': 0.10142877697944641, 'learning_rate': 3.5621346830000223e-06, 'epoch': 4.13}
753
- {'loss': 0.0014, 'grad_norm': 0.03971700370311737, 'learning_rate': 3.539739771124001e-06, 'epoch': 4.13}
754
- {'loss': 0.0008, 'grad_norm': 0.12946633994579315, 'learning_rate': 3.517344859247979e-06, 'epoch': 4.14}
755
- {'loss': 0.0015, 'grad_norm': 0.01494985818862915, 'learning_rate': 3.4949499473719574e-06, 'epoch': 4.15}
756
- {'loss': 0.0008, 'grad_norm': 0.0013534559402614832, 'learning_rate': 3.4725550354959357e-06, 'epoch': 4.15}
757
- {'loss': 0.0008, 'grad_norm': 0.011890546418726444, 'learning_rate': 3.450160123619914e-06, 'epoch': 4.16}
758
- {'loss': 0.0015, 'grad_norm': 0.013109634630382061, 'learning_rate': 3.427765211743892e-06, 'epoch': 4.16}
759
- {'loss': 0.0008, 'grad_norm': 0.0019232493359595537, 'learning_rate': 3.40537029986787e-06, 'epoch': 4.17}
760
- {'loss': 0.0009, 'grad_norm': 0.02066531963646412, 'learning_rate': 3.3829753879918487e-06, 'epoch': 4.17}
761
- {'loss': 0.0018, 'grad_norm': 0.00364371994510293, 'learning_rate': 3.3605804761158266e-06, 'epoch': 4.18}
762
- {'loss': 0.0013, 'grad_norm': 0.0214854683727026, 'learning_rate': 3.3381855642398046e-06, 'epoch': 4.18}
763
- {'loss': 0.0012, 'grad_norm': 0.014650222845375538, 'learning_rate': 3.3157906523637833e-06, 'epoch': 4.19}
764
- {'loss': 0.0008, 'grad_norm': 0.12458858639001846, 'learning_rate': 3.2933957404877613e-06, 'epoch': 4.19}
765
- {'loss': 0.0008, 'grad_norm': 0.05412464588880539, 'learning_rate': 3.2710008286117396e-06, 'epoch': 4.2}
766
- {'loss': 0.0008, 'grad_norm': 0.05582907423377037, 'learning_rate': 3.248605916735718e-06, 'epoch': 4.21}
767
- {'loss': 0.0008, 'grad_norm': 0.006058037281036377, 'learning_rate': 3.2262110048596963e-06, 'epoch': 4.21}
768
- {'loss': 0.001, 'grad_norm': 0.07414203137159348, 'learning_rate': 3.2038160929836743e-06, 'epoch': 4.22}
769
- {'loss': 0.0008, 'grad_norm': 0.07749581336975098, 'learning_rate': 3.181421181107653e-06, 'epoch': 4.22}
770
- {'loss': 0.0008, 'grad_norm': 0.08997820317745209, 'learning_rate': 3.159026269231631e-06, 'epoch': 4.23}
771
- {'loss': 0.0009, 'grad_norm': 0.0007085053948685527, 'learning_rate': 3.136631357355609e-06, 'epoch': 4.23}
772
- {'loss': 0.0008, 'grad_norm': 0.278054803609848, 'learning_rate': 3.1142364454795872e-06, 'epoch': 4.24}
773
- {'loss': 0.0008, 'grad_norm': 0.025398461148142815, 'learning_rate': 3.0918415336035656e-06, 'epoch': 4.24}
774
- {'loss': 0.0011, 'grad_norm': 0.0181169044226408, 'learning_rate': 3.0694466217275435e-06, 'epoch': 4.25}
775
- {'loss': 0.0009, 'grad_norm': 0.03886833414435387, 'learning_rate': 3.047051709851522e-06, 'epoch': 4.25}
776
- {'loss': 0.0007, 'grad_norm': 0.014894254505634308, 'learning_rate': 3.0246567979755002e-06, 'epoch': 4.26}
777
- {'loss': 0.001, 'grad_norm': 0.3343604505062103, 'learning_rate': 3.0022618860994786e-06, 'epoch': 4.27}
778
- {'loss': 0.0007, 'grad_norm': 0.2918633818626404, 'learning_rate': 2.9798669742234565e-06, 'epoch': 4.27}
779
- {'loss': 0.0011, 'grad_norm': 0.011875933967530727, 'learning_rate': 2.9574720623474353e-06, 'epoch': 4.28}
780
- {'loss': 0.0007, 'grad_norm': 0.01958482153713703, 'learning_rate': 2.935077150471413e-06, 'epoch': 4.28}
781
- {'loss': 0.0019, 'grad_norm': 0.018138963729143143, 'learning_rate': 2.912682238595391e-06, 'epoch': 4.29}
782
- {'loss': 0.0009, 'grad_norm': 0.010394470766186714, 'learning_rate': 2.89028732671937e-06, 'epoch': 4.29}
783
- {'loss': 0.0011, 'grad_norm': 0.0032428407575935125, 'learning_rate': 2.867892414843348e-06, 'epoch': 4.3}
784
- {'loss': 0.0008, 'grad_norm': 0.011067216284573078, 'learning_rate': 2.8454975029673258e-06, 'epoch': 4.3}
785
- {'loss': 0.0006, 'grad_norm': 0.022999059408903122, 'learning_rate': 2.823102591091304e-06, 'epoch': 4.31}
786
- {'loss': 0.0009, 'grad_norm': 0.001819304539822042, 'learning_rate': 2.8007076792152825e-06, 'epoch': 4.32}
787
- {'loss': 0.001, 'grad_norm': 0.0037013550754636526, 'learning_rate': 2.778312767339261e-06, 'epoch': 4.32}
788
- {'loss': 0.0007, 'grad_norm': 0.08672203868627548, 'learning_rate': 2.7559178554632387e-06, 'epoch': 4.33}
789
- {'loss': 0.0011, 'grad_norm': 0.005167264491319656, 'learning_rate': 2.7335229435872175e-06, 'epoch': 4.33}
790
- {'loss': 0.0008, 'grad_norm': 0.0014038735534995794, 'learning_rate': 2.7111280317111954e-06, 'epoch': 4.34}
791
- {'loss': 0.0007, 'grad_norm': 0.010056782513856888, 'learning_rate': 2.6887331198351734e-06, 'epoch': 4.34}
792
- {'loss': 0.0007, 'grad_norm': 0.00827051978558302, 'learning_rate': 2.666338207959152e-06, 'epoch': 4.35}
793
- {'loss': 0.0007, 'grad_norm': 0.1306377500295639, 'learning_rate': 2.64394329608313e-06, 'epoch': 4.35}
794
- {'loss': 0.001, 'grad_norm': 0.002261078916490078, 'learning_rate': 2.6215483842071084e-06, 'epoch': 4.36}
795
- {'loss': 0.0008, 'grad_norm': 0.05072946101427078, 'learning_rate': 2.5991534723310868e-06, 'epoch': 4.36}
796
- {'loss': 0.001, 'grad_norm': 0.04886786639690399, 'learning_rate': 2.5767585604550647e-06, 'epoch': 4.37}
797
- {'loss': 0.0014, 'grad_norm': 0.06680363416671753, 'learning_rate': 2.554363648579043e-06, 'epoch': 4.38}
798
- {'loss': 0.0006, 'grad_norm': 0.08678417652845383, 'learning_rate': 2.531968736703021e-06, 'epoch': 4.38}
799
- {'loss': 0.0006, 'grad_norm': 0.17906591296195984, 'learning_rate': 2.5095738248269998e-06, 'epoch': 4.39}
800
- {'loss': 0.001, 'grad_norm': 0.048420462757349014, 'learning_rate': 2.4871789129509777e-06, 'epoch': 4.39}
801
- {'loss': 0.002, 'grad_norm': 0.22092890739440918, 'learning_rate': 2.464784001074956e-06, 'epoch': 4.4}
802
- {'loss': 0.0006, 'grad_norm': 0.02592875249683857, 'learning_rate': 2.442389089198934e-06, 'epoch': 4.4}
803
- {'loss': 0.0007, 'grad_norm': 0.04083279147744179, 'learning_rate': 2.4199941773229123e-06, 'epoch': 4.41}
804
- {'loss': 0.001, 'grad_norm': 0.00027076838887296617, 'learning_rate': 2.3975992654468907e-06, 'epoch': 4.41}
805
- {'loss': 0.0008, 'grad_norm': 0.002070697722956538, 'learning_rate': 2.375204353570869e-06, 'epoch': 4.42}
806
- {'loss': 0.0008, 'grad_norm': 0.022934041917324066, 'learning_rate': 2.352809441694847e-06, 'epoch': 4.42}
807
- {'loss': 0.0009, 'grad_norm': 0.025117984041571617, 'learning_rate': 2.3304145298188253e-06, 'epoch': 4.43}
808
- {'loss': 0.0005, 'grad_norm': 0.0018961215391755104, 'learning_rate': 2.3080196179428037e-06, 'epoch': 4.44}
809
- {'loss': 0.0008, 'grad_norm': 0.016121145337820053, 'learning_rate': 2.285624706066782e-06, 'epoch': 4.44}
810
- {'loss': 0.0008, 'grad_norm': 0.15548691153526306, 'learning_rate': 2.26322979419076e-06, 'epoch': 4.45}
811
- {'loss': 0.0007, 'grad_norm': 0.007404050324112177, 'learning_rate': 2.2408348823147383e-06, 'epoch': 4.45}
812
- {'loss': 0.0006, 'grad_norm': 0.0019669681787490845, 'learning_rate': 2.2184399704387166e-06, 'epoch': 4.46}
813
- {'loss': 0.0006, 'grad_norm': 0.04935136437416077, 'learning_rate': 2.1960450585626946e-06, 'epoch': 4.46}
814
- {'loss': 0.0006, 'grad_norm': 0.007673050742596388, 'learning_rate': 2.173650146686673e-06, 'epoch': 4.47}
815
- {'loss': 0.0006, 'grad_norm': 0.002124810591340065, 'learning_rate': 2.1512552348106513e-06, 'epoch': 4.47}
816
- {'loss': 0.0009, 'grad_norm': 0.011607947759330273, 'learning_rate': 2.128860322934629e-06, 'epoch': 4.48}
817
- {'loss': 0.0007, 'grad_norm': 0.015516514889895916, 'learning_rate': 2.1064654110586076e-06, 'epoch': 4.48}
818
- {'loss': 0.0009, 'grad_norm': 0.013184698298573494, 'learning_rate': 2.084070499182586e-06, 'epoch': 4.49}
819
- {'loss': 0.0006, 'grad_norm': 0.019689731299877167, 'learning_rate': 2.0616755873065643e-06, 'epoch': 4.5}
820
- {'loss': 0.0007, 'grad_norm': 0.22405573725700378, 'learning_rate': 2.0392806754305426e-06, 'epoch': 4.5}
821
- {'loss': 0.0006, 'grad_norm': 0.002072765724733472, 'learning_rate': 2.0168857635545205e-06, 'epoch': 4.51}
822
- {'loss': 0.0007, 'grad_norm': 0.0035121950786560774, 'learning_rate': 1.994490851678499e-06, 'epoch': 4.51}
823
- {'loss': 0.0006, 'grad_norm': 0.0017859174404293299, 'learning_rate': 1.972095939802477e-06, 'epoch': 4.52}
824
- {'loss': 0.0008, 'grad_norm': 0.8883686661720276, 'learning_rate': 1.949701027926455e-06, 'epoch': 4.52}
825
- {'loss': 0.0007, 'grad_norm': 0.3410530984401703, 'learning_rate': 1.9273061160504335e-06, 'epoch': 4.53}
826
- {'loss': 0.0013, 'grad_norm': 0.005357651971280575, 'learning_rate': 1.9049112041744117e-06, 'epoch': 4.53}
827
- {'loss': 0.0006, 'grad_norm': 0.009125343523919582, 'learning_rate': 1.88251629229839e-06, 'epoch': 4.54}
828
- {'loss': 0.0009, 'grad_norm': 0.014439265243709087, 'learning_rate': 1.8601213804223681e-06, 'epoch': 4.55}
829
- {'loss': 0.0015, 'grad_norm': 0.0037733712233603, 'learning_rate': 1.8377264685463465e-06, 'epoch': 4.55}
830
- {'loss': 0.0014, 'grad_norm': 0.07933066040277481, 'learning_rate': 1.8153315566703246e-06, 'epoch': 4.56}
831
- {'loss': 0.0007, 'grad_norm': 0.16726621985435486, 'learning_rate': 1.7929366447943028e-06, 'epoch': 4.56}
832
- {'loss': 0.0007, 'grad_norm': 0.08296032249927521, 'learning_rate': 1.7705417329182811e-06, 'epoch': 4.57}
833
- {'loss': 0.0008, 'grad_norm': 0.0007671950734220445, 'learning_rate': 1.7481468210422595e-06, 'epoch': 4.57}
834
- {'loss': 0.0008, 'grad_norm': 0.07791215181350708, 'learning_rate': 1.7257519091662376e-06, 'epoch': 4.58}
835
- {'loss': 0.0007, 'grad_norm': 0.03872445225715637, 'learning_rate': 1.7033569972902158e-06, 'epoch': 4.58}
836
- {'loss': 0.0006, 'grad_norm': 0.09817063063383102, 'learning_rate': 1.680962085414194e-06, 'epoch': 4.59}
837
- {'loss': 0.0008, 'grad_norm': 0.024218514561653137, 'learning_rate': 1.6585671735381723e-06, 'epoch': 4.59}
838
- {'loss': 0.0008, 'grad_norm': 0.010985558852553368, 'learning_rate': 1.6361722616621506e-06, 'epoch': 4.6}
839
- {'loss': 0.0006, 'grad_norm': 0.0027476183604449034, 'learning_rate': 1.6137773497861287e-06, 'epoch': 4.61}
840
- {'loss': 0.001, 'grad_norm': 0.003122262191027403, 'learning_rate': 1.591382437910107e-06, 'epoch': 4.61}
841
- {'loss': 0.0005, 'grad_norm': 0.0728781521320343, 'learning_rate': 1.568987526034085e-06, 'epoch': 4.62}
842
- {'loss': 0.0007, 'grad_norm': 0.019124431535601616, 'learning_rate': 1.5465926141580634e-06, 'epoch': 4.62}
843
- {'loss': 0.0006, 'grad_norm': 0.004708414431661367, 'learning_rate': 1.5241977022820417e-06, 'epoch': 4.63}
844
- {'loss': 0.0007, 'grad_norm': 0.12547777593135834, 'learning_rate': 1.5018027904060199e-06, 'epoch': 4.63}
845
- {'loss': 0.0009, 'grad_norm': 0.32263386249542236, 'learning_rate': 1.4794078785299982e-06, 'epoch': 4.64}
846
- {'loss': 0.0014, 'grad_norm': 0.01729527674615383, 'learning_rate': 1.4570129666539764e-06, 'epoch': 4.64}
847
- {'loss': 0.0008, 'grad_norm': 0.007950437255203724, 'learning_rate': 1.4346180547779545e-06, 'epoch': 4.65}
848
- {'loss': 0.0006, 'grad_norm': 0.011319808661937714, 'learning_rate': 1.4122231429019328e-06, 'epoch': 4.65}
849
- {'loss': 0.0006, 'grad_norm': 0.0025837954599410295, 'learning_rate': 1.389828231025911e-06, 'epoch': 4.66}
850
- {'loss': 0.0016, 'grad_norm': 0.0021279077045619488, 'learning_rate': 1.3674333191498893e-06, 'epoch': 4.67}
851
- {'loss': 0.0006, 'grad_norm': 0.0539991520345211, 'learning_rate': 1.3450384072738675e-06, 'epoch': 4.67}
852
- {'loss': 0.0006, 'grad_norm': 0.0006465984624810517, 'learning_rate': 1.3226434953978456e-06, 'epoch': 4.68}
853
- {'loss': 0.0012, 'grad_norm': 0.027662355452775955, 'learning_rate': 1.300248583521824e-06, 'epoch': 4.68}
854
- {'loss': 0.0007, 'grad_norm': 0.004381787031888962, 'learning_rate': 1.2778536716458021e-06, 'epoch': 4.69}
855
- {'loss': 0.0009, 'grad_norm': 0.004225610289722681, 'learning_rate': 1.2554587597697805e-06, 'epoch': 4.69}
856
- {'loss': 0.0006, 'grad_norm': 0.0009983439231291413, 'learning_rate': 1.2330638478937586e-06, 'epoch': 4.7}
857
- {'loss': 0.0005, 'grad_norm': 0.024487098678946495, 'learning_rate': 1.210668936017737e-06, 'epoch': 4.7}
858
- {'loss': 0.0007, 'grad_norm': 0.3406839966773987, 'learning_rate': 1.188274024141715e-06, 'epoch': 4.71}
859
- {'loss': 0.0007, 'grad_norm': 0.022679802030324936, 'learning_rate': 1.1658791122656932e-06, 'epoch': 4.71}
860
- {'loss': 0.0006, 'grad_norm': 0.0023362182546406984, 'learning_rate': 1.1434842003896716e-06, 'epoch': 4.72}
861
- {'loss': 0.0006, 'grad_norm': 0.006971537135541439, 'learning_rate': 1.12108928851365e-06, 'epoch': 4.73}
862
- {'loss': 0.0006, 'grad_norm': 0.06807754933834076, 'learning_rate': 1.098694376637628e-06, 'epoch': 4.73}
863
- {'loss': 0.0006, 'grad_norm': 0.007362959440797567, 'learning_rate': 1.0762994647616062e-06, 'epoch': 4.74}
864
- {'loss': 0.0006, 'grad_norm': 0.08116839826107025, 'learning_rate': 1.0539045528855844e-06, 'epoch': 4.74}
865
- {'loss': 0.0006, 'grad_norm': 0.01928202621638775, 'learning_rate': 1.0315096410095627e-06, 'epoch': 4.75}
866
- {'loss': 0.0006, 'grad_norm': 0.13101086020469666, 'learning_rate': 1.009114729133541e-06, 'epoch': 4.75}
867
- {'loss': 0.0006, 'grad_norm': 0.004853931255638599, 'learning_rate': 9.867198172575192e-07, 'epoch': 4.76}
868
- {'loss': 0.0006, 'grad_norm': 0.02783609926700592, 'learning_rate': 9.643249053814973e-07, 'epoch': 4.76}
869
- {'loss': 0.0018, 'grad_norm': 0.003236155491322279, 'learning_rate': 9.419299935054756e-07, 'epoch': 4.77}
870
- {'loss': 0.0009, 'grad_norm': 0.023846732452511787, 'learning_rate': 9.195350816294539e-07, 'epoch': 4.78}
871
- {'loss': 0.0007, 'grad_norm': 0.01901441439986229, 'learning_rate': 8.971401697534321e-07, 'epoch': 4.78}
872
- {'loss': 0.0007, 'grad_norm': 0.00501618767157197, 'learning_rate': 8.747452578774103e-07, 'epoch': 4.79}
873
- {'loss': 0.0005, 'grad_norm': 0.007777991704642773, 'learning_rate': 8.523503460013885e-07, 'epoch': 4.79}
874
- {'loss': 0.0009, 'grad_norm': 0.6491960883140564, 'learning_rate': 8.299554341253668e-07, 'epoch': 4.8}
875
- {'loss': 0.0013, 'grad_norm': 0.0740993320941925, 'learning_rate': 8.075605222493451e-07, 'epoch': 4.8}
876
- {'loss': 0.0007, 'grad_norm': 0.02660405822098255, 'learning_rate': 7.851656103733232e-07, 'epoch': 4.81}
877
- {'loss': 0.0006, 'grad_norm': 0.048786524683237076, 'learning_rate': 7.627706984973014e-07, 'epoch': 4.81}
878
- {'loss': 0.0007, 'grad_norm': 0.005497151054441929, 'learning_rate': 7.403757866212798e-07, 'epoch': 4.82}
879
- {'loss': 0.001, 'grad_norm': 0.003488279180601239, 'learning_rate': 7.179808747452579e-07, 'epoch': 4.82}
880
- {'loss': 0.0019, 'grad_norm': 0.02504000999033451, 'learning_rate': 6.955859628692362e-07, 'epoch': 4.83}
881
- {'loss': 0.0006, 'grad_norm': 0.009829353541135788, 'learning_rate': 6.731910509932143e-07, 'epoch': 4.84}
882
- {'loss': 0.0006, 'grad_norm': 0.01532562542706728, 'learning_rate': 6.507961391171927e-07, 'epoch': 4.84}
883
- {'loss': 0.0009, 'grad_norm': 0.00034189983853138983, 'learning_rate': 6.284012272411709e-07, 'epoch': 4.85}
884
- {'loss': 0.0006, 'grad_norm': 0.019531667232513428, 'learning_rate': 6.060063153651491e-07, 'epoch': 4.85}
885
- {'loss': 0.001, 'grad_norm': 0.0004679520789068192, 'learning_rate': 5.836114034891273e-07, 'epoch': 4.86}
886
- {'loss': 0.0011, 'grad_norm': 0.00026422596420161426, 'learning_rate': 5.612164916131056e-07, 'epoch': 4.86}
887
- {'loss': 0.0007, 'grad_norm': 0.0357745960354805, 'learning_rate': 5.388215797370838e-07, 'epoch': 4.87}
888
- {'loss': 0.0007, 'grad_norm': 0.008043075911700726, 'learning_rate': 5.16426667861062e-07, 'epoch': 4.87}
889
- {'loss': 0.0007, 'grad_norm': 0.01412264909595251, 'learning_rate': 4.940317559850402e-07, 'epoch': 4.88}
890
- {'loss': 0.0018, 'grad_norm': 0.027081595733761787, 'learning_rate': 4.716368441090185e-07, 'epoch': 4.88}
891
- {'loss': 0.0007, 'grad_norm': 0.02130473032593727, 'learning_rate': 4.4924193223299667e-07, 'epoch': 4.89}
892
- {'loss': 0.0012, 'grad_norm': 0.0006097204168327153, 'learning_rate': 4.2684702035697497e-07, 'epoch': 4.9}
893
- {'loss': 0.0007, 'grad_norm': 0.007859633304178715, 'learning_rate': 4.0445210848095316e-07, 'epoch': 4.9}
894
- {'loss': 0.0009, 'grad_norm': 0.025279998779296875, 'learning_rate': 3.820571966049314e-07, 'epoch': 4.91}
895
- {'loss': 0.0007, 'grad_norm': 0.010460122488439083, 'learning_rate': 3.596622847289096e-07, 'epoch': 4.91}
896
- {'loss': 0.001, 'grad_norm': 0.5298627018928528, 'learning_rate': 3.372673728528879e-07, 'epoch': 4.92}
897
- {'loss': 0.0007, 'grad_norm': 0.0009814887307584286, 'learning_rate': 3.148724609768661e-07, 'epoch': 4.92}
898
- {'loss': 0.0007, 'grad_norm': 0.09579623490571976, 'learning_rate': 2.924775491008443e-07, 'epoch': 4.93}
899
- {'loss': 0.0007, 'grad_norm': 0.006857629399746656, 'learning_rate': 2.7008263722482253e-07, 'epoch': 4.93}
900
- {'loss': 0.0011, 'grad_norm': 0.004843506496399641, 'learning_rate': 2.476877253488008e-07, 'epoch': 4.94}
901
- {'loss': 0.0005, 'grad_norm': 0.1149492859840393, 'learning_rate': 2.25292813472779e-07, 'epoch': 4.94}
902
- {'loss': 0.0007, 'grad_norm': 0.09972663223743439, 'learning_rate': 2.0289790159675724e-07, 'epoch': 4.95}
903
- {'loss': 0.0006, 'grad_norm': 0.036814313381910324, 'learning_rate': 1.8050298972073546e-07, 'epoch': 4.96}
904
- {'loss': 0.0009, 'grad_norm': 0.016577519476413727, 'learning_rate': 1.581080778447137e-07, 'epoch': 4.96}
905
- {'loss': 0.0008, 'grad_norm': 1.288059949874878, 'learning_rate': 1.3571316596869193e-07, 'epoch': 4.97}
906
- {'loss': 0.0015, 'grad_norm': 0.060177162289619446, 'learning_rate': 1.1331825409267016e-07, 'epoch': 4.97}
907
- {'loss': 0.0008, 'grad_norm': 0.03802037984132767, 'learning_rate': 9.092334221664839e-08, 'epoch': 4.98}
908
- {'loss': 0.0006, 'grad_norm': 0.025011925026774406, 'learning_rate': 6.852843034062661e-08, 'epoch': 4.98}
909
- {'loss': 0.0006, 'grad_norm': 0.0070183370262384415, 'learning_rate': 4.6133518464604844e-08, 'epoch': 4.99}
910
- {'loss': 0.0007, 'grad_norm': 0.0013526534894481301, 'learning_rate': 2.3738606588583077e-08, 'epoch': 4.99}
911
- {'loss': 0.0006, 'grad_norm': 0.0005251829861663282, 'learning_rate': 1.343694712561306e-09, 'epoch': 5.0}
912
- {'train_runtime': 93010.3988, 'train_samples_per_second': 39.267, 'train_steps_per_second': 4.908, 'train_loss': 0.014228966551943086, 'epoch': 5.0}
 
mx-01/model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:f2e06a55924f0f993358ea48a4ee35966109cd59623a761909e0a4fdad0d4587
3
- size 90864192
 
mx-01/modules.json DELETED
@@ -1,14 +0,0 @@
1
- [
2
- {
3
- "idx": 0,
4
- "name": "0",
5
- "path": "",
6
- "type": "sentence_transformers.models.Transformer"
7
- },
8
- {
9
- "idx": 1,
10
- "name": "1",
11
- "path": "1_Pooling",
12
- "type": "sentence_transformers.models.Pooling"
13
- }
14
- ]
 
mx-01/mx_eval.csv DELETED
@@ -1,2 +0,0 @@
1
- epoch,steps,cosine-Accuracy@1,cosine-Accuracy@3,cosine-Accuracy@5,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@3,cosine-Recall@3,cosine-Precision@5,cosine-Recall@5,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100,dot-Accuracy@1,dot-Accuracy@3,dot-Accuracy@5,dot-Accuracy@10,dot-Precision@1,dot-Recall@1,dot-Precision@3,dot-Recall@3,dot-Precision@5,dot-Recall@5,dot-Precision@10,dot-Recall@10,dot-MRR@10,dot-NDCG@10,dot-MAP@100
2
- -1,-1,0.6832646087627935,0.7984590363227152,0.8344641400119378,0.8748336646350479,0.6832646087627935,0.6832646087627935,0.2661530121075717,0.7984590363227152,0.16689282800238753,0.8344641400119378,0.0874833664635048,0.8748336646350479,0.7486561788790493,0.7792742631517174,0.7522211420770829,0.47666924041552355,0.6420791509914409,0.7047636258097726,0.775700525154288,0.47666924041552355,0.47666924041552355,0.21402638366381363,0.6420791509914409,0.14095272516195448,0.7047636258097726,0.0775700525154288,0.775700525154288,0.5740915735669964,0.6226885460603906,0.5807913053435532
 
mx-01/sentence_bert_config.json DELETED
@@ -1,4 +0,0 @@
1
- {
2
- "max_seq_length": 512,
3
- "do_lower_case": false
4
- }
 
mx-01/special_tokens_map.json DELETED
@@ -1,7 +0,0 @@
1
- {
2
- "cls_token": "[CLS]",
3
- "mask_token": "[MASK]",
4
- "pad_token": "[PAD]",
5
- "sep_token": "[SEP]",
6
- "unk_token": "[UNK]"
7
- }
 
mx-01/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
mx-01/tokenizer_config.json DELETED
@@ -1,57 +0,0 @@
1
- {
2
- "added_tokens_decoder": {
3
- "0": {
4
- "content": "[PAD]",
5
- "lstrip": false,
6
- "normalized": false,
7
- "rstrip": false,
8
- "single_word": false,
9
- "special": true
10
- },
11
- "100": {
12
- "content": "[UNK]",
13
- "lstrip": false,
14
- "normalized": false,
15
- "rstrip": false,
16
- "single_word": false,
17
- "special": true
18
- },
19
- "101": {
20
- "content": "[CLS]",
21
- "lstrip": false,
22
- "normalized": false,
23
- "rstrip": false,
24
- "single_word": false,
25
- "special": true
26
- },
27
- "102": {
28
- "content": "[SEP]",
29
- "lstrip": false,
30
- "normalized": false,
31
- "rstrip": false,
32
- "single_word": false,
33
- "special": true
34
- },
35
- "103": {
36
- "content": "[MASK]",
37
- "lstrip": false,
38
- "normalized": false,
39
- "rstrip": false,
40
- "single_word": false,
41
- "special": true
42
- }
43
- },
44
- "clean_up_tokenization_spaces": true,
45
- "cls_token": "[CLS]",
46
- "do_basic_tokenize": true,
47
- "do_lower_case": true,
48
- "mask_token": "[MASK]",
49
- "model_max_length": 512,
50
- "never_split": null,
51
- "pad_token": "[PAD]",
52
- "sep_token": "[SEP]",
53
- "strip_accents": null,
54
- "tokenize_chinese_chars": true,
55
- "tokenizer_class": "BertTokenizer",
56
- "unk_token": "[UNK]"
57
- }
 
mx-01/vocab.txt DELETED
The diff for this file is too large to render. See raw diff