AlekseyKorshuk committed on
Commit
36c9461
·
1 Parent(s): f0e4e3b

huggingartists

README.md CHANGED
@@ -45,15 +45,15 @@ from datasets import load_dataset
  dataset = load_dataset("huggingartists/oxxxymiron")
  ```
 
- [Explore the data](https://wandb.ai/huggingartists/huggingartists/runs/5ngu5msi/artifacts), which is tracked with [W&B artifacts](https://docs.wandb.com/artifacts) at every step of the pipeline.
+ [Explore the data](https://wandb.ai/huggingartists/huggingartists/runs/296e4zy2/artifacts), which is tracked with [W&B artifacts](https://docs.wandb.com/artifacts) at every step of the pipeline.
 
  ## Training procedure
 
  The model is based on a pre-trained [GPT-2](https://huggingface.co/gpt2) which is fine-tuned on Oxxxymiron's lyrics.
 
- Hyperparameters and metrics are recorded in the [W&B training run](https://wandb.ai/huggingartists/huggingartists/runs/4oy4fj0c) for full transparency and reproducibility.
+ Hyperparameters and metrics are recorded in the [W&B training run](https://wandb.ai/huggingartists/huggingartists/runs/lyd324n8) for full transparency and reproducibility.
 
- At the end of training, [the final model](https://wandb.ai/huggingartists/huggingartists/runs/4oy4fj0c/artifacts) is logged and versioned.
+ At the end of training, [the final model](https://wandb.ai/huggingartists/huggingartists/runs/lyd324n8/artifacts) is logged and versioned.
 
  ## How to use
 
flax_model.msgpack CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6627bbb172ffa4474cab42533d89cb63e382e552c59c2f2524650cee888f7031
+ oid sha256:bf9a57b36b9276338603832189c301268f56bae13bf00fb4ac5da15b09879d59
  size 497764120
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fafd0c8ee3ebd78e6b2f5d829bbe6915ba257c22151eaea372eaf6308bd047bc
+ oid sha256:6fcbd0efc1ba3f1cf85a1ad153a13999c85e76f9f2b7c9440f87ec591788eaa1
  size 995604017
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6401c6953d2a52a3c945b0c8a8254e0338b9210a54008606369335e170c7f253
+ oid sha256:aa19aa16a103bc8d557dbe011a86e4135bafaa6acf98d21f7c8a1f0ef4362155
  size 510403817
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1a0c9ab406d67594fa433e87fc1b86954a9d6eac24e3da62d9944fae305c71ea
- size 14503
+ oid sha256:44cd6576738e9780ee196c4c60bbd639b2c9174b22df17c2b94e485513761999
+ size 14439
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dee4762040282e327730d4b0ce6aed17705b9f88b517c00e29c9f3d26d6da55b
+ oid sha256:3757bed4046f1195d42cf6b407ad0ea93daf46be4a6bcc490f1daeb7bdf87f8c
  size 623
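The binary files above are stored as Git LFS pointers, so each diff only shows a changed `oid` (a SHA-256 digest of the real file) and, for `rng_state.pth`, a changed `size`. A minimal sketch of how such a pointer could be parsed and compared is below; the helper name `parse_lfs_pointer` is hypothetical, and the two pointer texts are copied from the `rng_state.pth` diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents taken from the rng_state.pth diff in this commit.
OLD_RNG_STATE = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1a0c9ab406d67594fa433e87fc1b86954a9d6eac24e3da62d9944fae305c71ea
size 14503
"""
NEW_RNG_STATE = """\
version https://git-lfs.github.com/spec/v1
oid sha256:44cd6576738e9780ee196c4c60bbd639b2c9174b22df17c2b94e485513761999
size 14439
"""

old = parse_lfs_pointer(OLD_RNG_STATE)
new = parse_lfs_pointer(NEW_RNG_STATE)

content_changed = old["digest"] != new["digest"]
size_delta = new["size"] - old["size"]
print(f"content changed: {content_changed}, size delta: {size_delta} bytes")
```

For the other four files the size is unchanged and only the digest differs, which is what one would expect when weights or optimizer state are overwritten by a tensor of the same shape.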
trainer_state.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 5.0,
5
- "global_step": 1295,
6
  "is_hyper_param_search": false,
7
  "is_local_process_zero": true,
8
  "is_world_process_zero": true,
@@ -1560,11 +1560,3119 @@
1560
  "learning_rate": 0.0,
1561
  "loss": 1.7624,
1562
  "step": 1295
1563
  }
1564
  ],
1565
- "max_steps": 1295,
1566
- "num_train_epochs": 5,
1567
- "total_flos": 1352839495680000.0,
1568
  "trial_name": null,
1569
  "trial_params": null
1570
  }
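The `learning_rate` values that the new version of this file appends after step 1295 appear consistent with a cosine-shaped cycle that restarts at the resume point. The sketch below checks that reading against four logged values. The peak (1.372e-4) and the cycle length (518 steps, which would be two epochs at 1295 / 5 = 259 steps per epoch) are assumptions inferred from the logged numbers, not values stated anywhere in the commit, and `cosine_cycle_lr` is a hypothetical helper rather than the scheduler actually used.

```python
import math

# Assumptions inferred from the logged values, not stated in the commit:
PEAK_LR = 1.372e-4   # largest learning rate, reached near step 1555
CYCLE_STEPS = 518    # steps from one zero of the schedule to the next
RESUME_STEP = 1295   # step at which the previous 5-epoch run ended


def cosine_cycle_lr(step: int) -> float:
    """Cosine cycle written as peak * sin^2, i.e. peak * (1 - cos(2*pi*t)) / 2."""
    phase = math.pi * (step - RESUME_STEP) / CYCLE_STEPS
    return PEAK_LR * math.sin(phase) ** 2


# (step, learning_rate) pairs copied from the appended log entries.
logged = {
    1300: 1.261250123775442e-07,
    1555: 0.00013719495351470075,
    1810: 4.541391293127461e-08,
    1945: 7.067992498051008e-05,
}

max_rel_err = max(
    abs(cosine_cycle_lr(step) - lr) / lr for step, lr in logged.items()
)
print(f"largest relative deviation from the assumed schedule: {max_rel_err:.2e}")
```

A small deviation here only shows that the assumed formula reproduces these four points; the scheduler configuration itself would have to be confirmed from the W&B training run.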
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 15.0,
5
+ "global_step": 3885,
6
  "is_hyper_param_search": false,
7
  "is_local_process_zero": true,
8
  "is_world_process_zero": true,
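The header fields changed from `"epoch": 5.0`, `"global_step": 1295` to `"epoch": 15.0`, `"global_step": 3885`. A short arithmetic check, using only numbers from this diff, shows that both versions imply the same number of optimizer steps per epoch, so the new checkpoint looks like a continuation of the same run for ten more epochs. The variable names are illustrative.

```python
# Header values copied from the old and new versions of trainer_state.json.
old_state = {"epoch": 5.0, "global_step": 1295}
new_state = {"epoch": 15.0, "global_step": 3885}


def steps_per_epoch(state: dict) -> float:
    """Optimizer steps per epoch implied by a trainer state header."""
    return state["global_step"] / state["epoch"]


old_spe = steps_per_epoch(old_state)
new_spe = steps_per_epoch(new_state)

extra_steps = new_state["global_step"] - old_state["global_step"]
extra_epochs = extra_steps / new_spe

# The first appended log entry is at step 1300; its fractional epoch
# should be step / steps_per_epoch, rounded to two decimals as in the log.
first_new_entry_epoch = round(1300 / new_spe, 2)

print(old_spe, new_spe, extra_steps, extra_epochs, first_new_entry_epoch)
```

The computed epoch for step 1300 agrees with the `"epoch": 5.02` recorded in the first appended log entry.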
 
1560
  "learning_rate": 0.0,
1561
  "loss": 1.7624,
1562
  "step": 1295
1563
+ },
1564
+ {
1565
+ "epoch": 5.02,
1566
+ "learning_rate": 1.261250123775442e-07,
1567
+ "loss": 1.6889,
1568
+ "step": 1300
1569
+ },
1570
+ {
1571
+ "epoch": 5.04,
1572
+ "learning_rate": 5.040362734534236e-07,
1573
+ "loss": 1.6715,
1574
+ "step": 1305
1575
+ },
1576
+ {
1577
+ "epoch": 5.06,
1578
+ "learning_rate": 1.132344160414776e-06,
1579
+ "loss": 1.7252,
1580
+ "step": 1310
1581
+ },
1582
+ {
1583
+ "epoch": 5.08,
1584
+ "learning_rate": 2.0087383134942512e-06,
1585
+ "loss": 1.7355,
1586
+ "step": 1315
1587
+ },
1588
+ {
1589
+ "epoch": 5.1,
1590
+ "learning_rate": 3.1299961314264046e-06,
1591
+ "loss": 1.8099,
1592
+ "step": 1320
1593
+ },
1594
+ {
1595
+ "epoch": 5.12,
1596
+ "learning_rate": 4.491994621320179e-06,
1597
+ "loss": 1.7203,
1598
+ "step": 1325
1599
+ },
1600
+ {
1601
+ "epoch": 5.14,
1602
+ "learning_rate": 6.089725559373899e-06,
1603
+ "loss": 1.7262,
1604
+ "step": 1330
1605
+ },
1606
+ {
1607
+ "epoch": 5.15,
1608
+ "learning_rate": 7.917313906685478e-06,
1609
+ "loss": 1.721,
1610
+ "step": 1335
1611
+ },
1612
+ {
1613
+ "epoch": 5.17,
1614
+ "learning_rate": 9.968039412440962e-06,
1615
+ "loss": 1.6592,
1616
+ "step": 1340
1617
+ },
1618
+ {
1619
+ "epoch": 5.19,
1620
+ "learning_rate": 1.2234361325042687e-05,
1621
+ "loss": 1.6871,
1622
+ "step": 1345
1623
+ },
1624
+ {
1625
+ "epoch": 5.21,
1626
+ "learning_rate": 1.4707946120313293e-05,
1627
+ "loss": 1.7319,
1628
+ "step": 1350
1629
+ },
1630
+ {
1631
+ "epoch": 5.23,
1632
+ "learning_rate": 1.737969814481516e-05,
1633
+ "loss": 1.7102,
1634
+ "step": 1355
1635
+ },
1636
+ {
1637
+ "epoch": 5.25,
1638
+ "learning_rate": 2.0239793061604692e-05,
1639
+ "loss": 1.6969,
1640
+ "step": 1360
1641
+ },
1642
+ {
1643
+ "epoch": 5.27,
1644
+ "learning_rate": 2.3277713975440297e-05,
1645
+ "loss": 1.6891,
1646
+ "step": 1365
1647
+ },
1648
+ {
1649
+ "epoch": 5.29,
1650
+ "learning_rate": 2.648229010460629e-05,
1651
+ "loss": 1.7584,
1652
+ "step": 1370
1653
+ },
1654
+ {
1655
+ "epoch": 5.31,
1656
+ "learning_rate": 2.9841737857150448e-05,
1657
+ "loss": 1.7074,
1658
+ "step": 1375
1659
+ },
1660
+ {
1661
+ "epoch": 5.33,
1662
+ "learning_rate": 3.334370416049612e-05,
1663
+ "loss": 1.6984,
1664
+ "step": 1380
1665
+ },
1666
+ {
1667
+ "epoch": 5.35,
1668
+ "learning_rate": 3.697531188509984e-05,
1669
+ "loss": 1.6734,
1670
+ "step": 1385
1671
+ },
1672
+ {
1673
+ "epoch": 5.37,
1674
+ "learning_rate": 4.072320719512421e-05,
1675
+ "loss": 1.6728,
1676
+ "step": 1390
1677
+ },
1678
+ {
1679
+ "epoch": 5.39,
1680
+ "learning_rate": 4.457360865201626e-05,
1681
+ "loss": 1.7748,
1682
+ "step": 1395
1683
+ },
1684
+ {
1685
+ "epoch": 5.41,
1686
+ "learning_rate": 4.8512357890428555e-05,
1687
+ "loss": 1.6899,
1688
+ "step": 1400
1689
+ },
1690
+ {
1691
+ "epoch": 5.42,
1692
+ "learning_rate": 5.252497168014445e-05,
1693
+ "loss": 1.6741,
1694
+ "step": 1405
1695
+ },
1696
+ {
1697
+ "epoch": 5.44,
1698
+ "learning_rate": 5.659669518256621e-05,
1699
+ "loss": 1.6965,
1700
+ "step": 1410
1701
+ },
1702
+ {
1703
+ "epoch": 5.46,
1704
+ "learning_rate": 6.071255620594022e-05,
1705
+ "loss": 1.6641,
1706
+ "step": 1415
1707
+ },
1708
+ {
1709
+ "epoch": 5.48,
1710
+ "learning_rate": 6.485742025981456e-05,
1711
+ "loss": 1.7494,
1712
+ "step": 1420
1713
+ },
1714
+ {
1715
+ "epoch": 5.5,
1716
+ "learning_rate": 6.901604620628525e-05,
1717
+ "loss": 1.7469,
1718
+ "step": 1425
1719
+ },
1720
+ {
1721
+ "epoch": 5.52,
1722
+ "learning_rate": 7.31731423033995e-05,
1723
+ "loss": 1.7503,
1724
+ "step": 1430
1725
+ },
1726
+ {
1727
+ "epoch": 5.54,
1728
+ "learning_rate": 7.731342243463585e-05,
1729
+ "loss": 1.7359,
1730
+ "step": 1435
1731
+ },
1732
+ {
1733
+ "epoch": 5.56,
1734
+ "learning_rate": 8.14216623176967e-05,
1735
+ "loss": 1.7196,
1736
+ "step": 1440
1737
+ },
1738
+ {
1739
+ "epoch": 5.58,
1740
+ "learning_rate": 8.548275548593167e-05,
1741
+ "loss": 1.7352,
1742
+ "step": 1445
1743
+ },
1744
+ {
1745
+ "epoch": 5.6,
1746
+ "learning_rate": 8.948176883653917e-05,
1747
+ "loss": 1.7017,
1748
+ "step": 1450
1749
+ },
1750
+ {
1751
+ "epoch": 5.62,
1752
+ "learning_rate": 9.340399754128714e-05,
1753
+ "loss": 1.7402,
1754
+ "step": 1455
1755
+ },
1756
+ {
1757
+ "epoch": 5.64,
1758
+ "learning_rate": 9.723501911784583e-05,
1759
+ "loss": 1.7463,
1760
+ "step": 1460
1761
+ },
1762
+ {
1763
+ "epoch": 5.66,
1764
+ "learning_rate": 0.00010096074646289766,
1765
+ "loss": 1.6842,
1766
+ "step": 1465
1767
+ },
1768
+ {
1769
+ "epoch": 5.68,
1770
+ "learning_rate": 0.00010456747965202592,
1771
+ "loss": 1.7311,
1772
+ "step": 1470
1773
+ },
1774
+ {
1775
+ "epoch": 5.69,
1776
+ "learning_rate": 0.00010804195631589798,
1777
+ "loss": 1.7528,
1778
+ "step": 1475
1779
+ },
1780
+ {
1781
+ "epoch": 5.71,
1782
+ "learning_rate": 0.00011137140040750908,
1783
+ "loss": 1.7338,
1784
+ "step": 1480
1785
+ },
1786
+ {
1787
+ "epoch": 5.73,
1788
+ "learning_rate": 0.00011454356918116697,
1789
+ "loss": 1.8454,
1790
+ "step": 1485
1791
+ },
1792
+ {
1793
+ "epoch": 5.75,
1794
+ "learning_rate": 0.00011754679821046187,
1795
+ "loss": 1.6556,
1796
+ "step": 1490
1797
+ },
1798
+ {
1799
+ "epoch": 5.77,
1800
+ "learning_rate": 0.00012037004427969469,
1801
+ "loss": 1.7088,
1802
+ "step": 1495
1803
+ },
1804
+ {
1805
+ "epoch": 5.79,
1806
+ "learning_rate": 0.00012300292599103937,
1807
+ "loss": 1.7949,
1808
+ "step": 1500
1809
+ },
1810
+ {
1811
+ "epoch": 5.81,
1812
+ "learning_rate": 0.0001254357619381275,
1813
+ "loss": 1.7168,
1814
+ "step": 1505
1815
+ },
1816
+ {
1817
+ "epoch": 5.83,
1818
+ "learning_rate": 0.00012765960630568417,
1819
+ "loss": 1.8008,
1820
+ "step": 1510
1821
+ },
1822
+ {
1823
+ "epoch": 5.85,
1824
+ "learning_rate": 0.00012966628176431028,
1825
+ "loss": 1.7668,
1826
+ "step": 1515
1827
+ },
1828
+ {
1829
+ "epoch": 5.87,
1830
+ "learning_rate": 0.000131448409539456,
1831
+ "loss": 1.8123,
1832
+ "step": 1520
1833
+ },
1834
+ {
1835
+ "epoch": 5.89,
1836
+ "learning_rate": 0.00013299943654401658,
1837
+ "loss": 1.6967,
1838
+ "step": 1525
1839
+ },
1840
+ {
1841
+ "epoch": 5.91,
1842
+ "learning_rate": 0.0001343136594747806,
1843
+ "loss": 1.7767,
1844
+ "step": 1530
1845
+ },
1846
+ {
1847
+ "epoch": 5.93,
1848
+ "learning_rate": 0.00013538624578412676,
1849
+ "loss": 1.8144,
1850
+ "step": 1535
1851
+ },
1852
+ {
1853
+ "epoch": 5.95,
1854
+ "learning_rate": 0.0001362132514498528,
1855
+ "loss": 1.7673,
1856
+ "step": 1540
1857
+ },
1858
+ {
1859
+ "epoch": 5.97,
1860
+ "learning_rate": 0.00013679163547779458,
1861
+ "loss": 1.7239,
1862
+ "step": 1545
1863
+ },
1864
+ {
1865
+ "epoch": 5.98,
1866
+ "learning_rate": 0.00013711927108390887,
1867
+ "loss": 1.7237,
1868
+ "step": 1550
1869
+ },
1870
+ {
1871
+ "epoch": 6.0,
1872
+ "learning_rate": 0.00013719495351470075,
1873
+ "loss": 1.8186,
1874
+ "step": 1555
1875
+ },
1876
+ {
1877
+ "epoch": 6.02,
1878
+ "learning_rate": 0.0001370184044772396,
1879
+ "loss": 1.7571,
1880
+ "step": 1560
1881
+ },
1882
+ {
1883
+ "epoch": 6.04,
1884
+ "learning_rate": 0.00013659027316247397,
1885
+ "loss": 1.6908,
1886
+ "step": 1565
1887
+ },
1888
+ {
1889
+ "epoch": 6.06,
1890
+ "learning_rate": 0.00013591213385808238,
1891
+ "loss": 1.6378,
1892
+ "step": 1570
1893
+ },
1894
+ {
1895
+ "epoch": 6.08,
1896
+ "learning_rate": 0.0001349864801596381,
1897
+ "loss": 1.763,
1898
+ "step": 1575
1899
+ },
1900
+ {
1901
+ "epoch": 6.1,
1902
+ "learning_rate": 0.00013381671580137334,
1903
+ "loss": 1.8114,
1904
+ "step": 1580
1905
+ },
1906
+ {
1907
+ "epoch": 6.12,
1908
+ "learning_rate": 0.00013240714214026117,
1909
+ "loss": 1.7691,
1910
+ "step": 1585
1911
+ },
1912
+ {
1913
+ "epoch": 6.14,
1914
+ "learning_rate": 0.00013076294233943417,
1915
+ "loss": 1.7388,
1916
+ "step": 1590
1917
+ },
1918
+ {
1919
+ "epoch": 6.16,
1920
+ "learning_rate": 0.0001288901623091032,
1921
+ "loss": 1.6458,
1922
+ "step": 1595
1923
+ },
1924
+ {
1925
+ "epoch": 6.18,
1926
+ "learning_rate": 0.00012679568847505571,
1927
+ "loss": 1.6852,
1928
+ "step": 1600
1929
+ },
1930
+ {
1931
+ "epoch": 6.2,
1932
+ "learning_rate": 0.00012448722245648225,
1933
+ "loss": 1.7267,
1934
+ "step": 1605
1935
+ },
1936
+ {
1937
+ "epoch": 6.22,
1938
+ "learning_rate": 0.00012197325274624507,
1939
+ "loss": 1.7517,
1940
+ "step": 1610
1941
+ },
1942
+ {
1943
+ "epoch": 6.24,
1944
+ "learning_rate": 0.00011926302349772057,
1945
+ "loss": 1.7343,
1946
+ "step": 1615
1947
+ },
1948
+ {
1949
+ "epoch": 6.25,
1950
+ "learning_rate": 0.0001163665005329939,
1951
+ "loss": 1.6811,
1952
+ "step": 1620
1953
+ },
1954
+ {
1955
+ "epoch": 6.27,
1956
+ "learning_rate": 0.00011329433469739406,
1957
+ "loss": 1.7056,
1958
+ "step": 1625
1959
+ },
1960
+ {
1961
+ "epoch": 6.29,
1962
+ "learning_rate": 0.00011005782269511991,
1963
+ "loss": 1.7447,
1964
+ "step": 1630
1965
+ },
1966
+ {
1967
+ "epoch": 6.31,
1968
+ "learning_rate": 0.00010666886554997244,
1969
+ "loss": 1.6661,
1970
+ "step": 1635
1971
+ },
1972
+ {
1973
+ "epoch": 6.33,
1974
+ "learning_rate": 0.00010313992484393024,
1975
+ "loss": 1.723,
1976
+ "step": 1640
1977
+ },
1978
+ {
1979
+ "epoch": 6.35,
1980
+ "learning_rate": 9.948397689449228e-05,
1981
+ "loss": 1.6887,
1982
+ "step": 1645
1983
+ },
1984
+ {
1985
+ "epoch": 6.37,
1986
+ "learning_rate": 9.571446503927964e-05,
1987
+ "loss": 1.6767,
1988
+ "step": 1650
1989
+ },
1990
+ {
1991
+ "epoch": 6.39,
1992
+ "learning_rate": 9.184525020334699e-05,
1993
+ "loss": 1.6593,
1994
+ "step": 1655
1995
+ },
1996
+ {
1997
+ "epoch": 6.41,
1998
+ "learning_rate": 8.789055993098258e-05,
1999
+ "loss": 1.6807,
2000
+ "step": 1660
2001
+ },
2002
+ {
2003
+ "epoch": 6.43,
2004
+ "learning_rate": 8.386493606940322e-05,
2005
+ "loss": 1.7043,
2006
+ "step": 1665
2007
+ },
2008
+ {
2009
+ "epoch": 6.45,
2010
+ "learning_rate": 7.978318129672484e-05,
2011
+ "loss": 1.7188,
2012
+ "step": 1670
2013
+ },
2014
+ {
2015
+ "epoch": 6.47,
2016
+ "learning_rate": 7.566030469082603e-05,
2017
+ "loss": 1.6494,
2018
+ "step": 1675
2019
+ },
2020
+ {
2021
+ "epoch": 6.49,
2022
+ "learning_rate": 7.151146653925584e-05,
2023
+ "loss": 1.6752,
2024
+ "step": 1680
2025
+ },
2026
+ {
2027
+ "epoch": 6.51,
2028
+ "learning_rate": 6.735192259312878e-05,
2029
+ "loss": 1.6569,
2030
+ "step": 1685
2031
+ },
2032
+ {
2033
+ "epoch": 6.53,
2034
+ "learning_rate": 6.319696796998709e-05,
2035
+ "loss": 1.6728,
2036
+ "step": 1690
2037
+ },
2038
+ {
2039
+ "epoch": 6.54,
2040
+ "learning_rate": 5.906188091190817e-05,
2041
+ "loss": 1.6875,
2042
+ "step": 1695
2043
+ },
2044
+ {
2045
+ "epoch": 6.56,
2046
+ "learning_rate": 5.4961866605667284e-05,
2047
+ "loss": 1.6511,
2048
+ "step": 1700
2049
+ },
2050
+ {
2051
+ "epoch": 6.58,
2052
+ "learning_rate": 5.091200127153063e-05,
2053
+ "loss": 1.6906,
2054
+ "step": 1705
2055
+ },
2056
+ {
2057
+ "epoch": 6.6,
2058
+ "learning_rate": 4.6927176726273094e-05,
2059
+ "loss": 1.6586,
2060
+ "step": 1710
2061
+ },
2062
+ {
2063
+ "epoch": 6.62,
2064
+ "learning_rate": 4.302204562427082e-05,
2065
+ "loss": 1.6804,
2066
+ "step": 1715
2067
+ },
2068
+ {
2069
+ "epoch": 6.64,
2070
+ "learning_rate": 3.921096757801896e-05,
2071
+ "loss": 1.6353,
2072
+ "step": 1720
2073
+ },
2074
+ {
2075
+ "epoch": 6.66,
2076
+ "learning_rate": 3.550795635619796e-05,
2077
+ "loss": 1.6895,
2078
+ "step": 1725
2079
+ },
2080
+ {
2081
+ "epoch": 6.68,
2082
+ "learning_rate": 3.192662835344908e-05,
2083
+ "loss": 1.7331,
2084
+ "step": 1730
2085
+ },
2086
+ {
2087
+ "epoch": 6.7,
2088
+ "learning_rate": 2.8480152521337155e-05,
2089
+ "loss": 1.6892,
2090
+ "step": 1735
2091
+ },
2092
+ {
2093
+ "epoch": 6.72,
2094
+ "learning_rate": 2.51812019446141e-05,
2095
+ "loss": 1.6841,
2096
+ "step": 1740
2097
+ },
2098
+ {
2099
+ "epoch": 6.74,
2100
+ "learning_rate": 2.2041907240840133e-05,
2101
+ "loss": 1.7096,
2102
+ "step": 1745
2103
+ },
2104
+ {
2105
+ "epoch": 6.76,
2106
+ "learning_rate": 1.907381195471957e-05,
2107
+ "loss": 1.7284,
2108
+ "step": 1750
2109
+ },
2110
+ {
2111
+ "epoch": 6.78,
2112
+ "learning_rate": 1.6287830111171488e-05,
2113
+ "loss": 1.6272,
2114
+ "step": 1755
2115
+ },
2116
+ {
2117
+ "epoch": 6.8,
2118
+ "learning_rate": 1.3694206083212888e-05,
2119
+ "loss": 1.5783,
2120
+ "step": 1760
2121
+ },
2122
+ {
2123
+ "epoch": 6.81,
2124
+ "learning_rate": 1.1302476922232546e-05,
2125
+ "loss": 1.5607,
2126
+ "step": 1765
2127
+ },
2128
+ {
2129
+ "epoch": 6.83,
2130
+ "learning_rate": 9.121437289164463e-06,
2131
+ "loss": 1.6762,
2132
+ "step": 1770
2133
+ },
2134
+ {
2135
+ "epoch": 6.85,
2136
+ "learning_rate": 7.159107115516178e-06,
2137
+ "loss": 1.6488,
2138
+ "step": 1775
2139
+ },
2140
+ {
2141
+ "epoch": 6.87,
2142
+ "learning_rate": 5.422702113166627e-06,
2143
+ "loss": 1.6201,
2144
+ "step": 1780
2145
+ },
2146
+ {
2147
+ "epoch": 6.89,
2148
+ "learning_rate": 3.918607241369662e-06,
2149
+ "loss": 1.7022,
2150
+ "step": 1785
2151
+ },
2152
+ {
2153
+ "epoch": 6.91,
2154
+ "learning_rate": 2.65235322853129e-06,
2155
+ "loss": 1.632,
2156
+ "step": 1790
2157
+ },
2158
+ {
2159
+ "epoch": 6.93,
2160
+ "learning_rate": 1.6285962350901147e-06,
2161
+ "loss": 1.6661,
2162
+ "step": 1795
2163
+ },
2164
+ {
2165
+ "epoch": 6.95,
2166
+ "learning_rate": 8.511007322841488e-07,
2167
+ "loss": 1.6079,
2168
+ "step": 1800
2169
+ },
2170
+ {
2171
+ "epoch": 6.97,
2172
+ "learning_rate": 3.2272565976125165e-07,
2173
+ "loss": 1.6758,
2174
+ "step": 1805
2175
+ },
2176
+ {
2177
+ "epoch": 6.99,
2178
+ "learning_rate": 4.541391293127461e-08,
2179
+ "loss": 1.6987,
2180
+ "step": 1810
2181
+ },
2182
+ {
2183
+ "epoch": 7.01,
2184
+ "learning_rate": 2.018519871846962e-08,
2185
+ "loss": 1.5689,
2186
+ "step": 1815
2187
+ },
2188
+ {
2189
+ "epoch": 7.03,
2190
+ "learning_rate": 2.471322859826806e-07,
2191
+ "loss": 1.5376,
2192
+ "step": 1820
2193
+ },
2194
+ {
2195
+ "epoch": 7.05,
2196
+ "learning_rate": 7.254206643976737e-07,
2197
+ "loss": 1.6088,
2198
+ "step": 1825
2199
+ },
2200
+ {
2201
+ "epoch": 7.07,
2202
+ "learning_rate": 1.4532916130407314e-06,
2203
+ "loss": 1.6132,
2204
+ "step": 1830
2205
+ },
2206
+ {
2207
+ "epoch": 7.08,
2208
+ "learning_rate": 2.4280686674102744e-06,
2209
+ "loss": 1.6823,
2210
+ "step": 1835
2211
+ },
2212
+ {
2213
+ "epoch": 7.1,
2214
+ "learning_rate": 3.6461674610908866e-06,
2215
+ "loss": 1.5818,
2216
+ "step": 1840
2217
+ },
2218
+ {
2219
+ "epoch": 7.12,
2220
+ "learning_rate": 5.103108905877507e-06,
2221
+ "loss": 1.5745,
2222
+ "step": 1845
2223
+ },
2224
+ {
2225
+ "epoch": 7.14,
2226
+ "learning_rate": 6.793535661894024e-06,
2227
+ "loss": 1.5783,
2228
+ "step": 1850
2229
+ },
2230
+ {
2231
+ "epoch": 7.16,
2232
+ "learning_rate": 8.7112318371425e-06,
2233
+ "loss": 1.6224,
2234
+ "step": 1855
2235
+ },
2236
+ {
2237
+ "epoch": 7.18,
2238
+ "learning_rate": 1.0849145844047318e-05,
2239
+ "loss": 1.6016,
2240
+ "step": 1860
2241
+ },
2242
+ {
2243
+ "epoch": 7.2,
2244
+ "learning_rate": 1.3199416328947412e-05,
2245
+ "loss": 1.5826,
2246
+ "step": 1865
2247
+ },
2248
+ {
2249
+ "epoch": 7.22,
2250
+ "learning_rate": 1.5753401079189635e-05,
2251
+ "loss": 1.6424,
2252
+ "step": 1870
2253
+ },
2254
+ {
2255
+ "epoch": 7.24,
2256
+ "learning_rate": 1.8501708801530793e-05,
2257
+ "loss": 1.5944,
2258
+ "step": 1875
2259
+ },
2260
+ {
2261
+ "epoch": 7.26,
2262
+ "learning_rate": 2.1434233654994585e-05,
2263
+ "loss": 1.5702,
2264
+ "step": 1880
2265
+ },
2266
+ {
2267
+ "epoch": 7.28,
2268
+ "learning_rate": 2.454019241120068e-05,
2269
+ "loss": 1.5819,
2270
+ "step": 1885
2271
+ },
2272
+ {
2273
+ "epoch": 7.3,
2274
+ "learning_rate": 2.780816410552581e-05,
2275
+ "loss": 1.5461,
2276
+ "step": 1890
2277
+ },
2278
+ {
2279
+ "epoch": 7.32,
2280
+ "learning_rate": 3.12261320332941e-05,
2281
+ "loss": 1.6276,
2282
+ "step": 1895
2283
+ },
2284
+ {
2285
+ "epoch": 7.34,
2286
+ "learning_rate": 3.4781527936569615e-05,
2287
+ "loss": 1.6333,
2288
+ "step": 1900
2289
+ },
2290
+ {
2291
+ "epoch": 7.36,
2292
+ "learning_rate": 3.8461278219075155e-05,
2293
+ "loss": 1.5744,
2294
+ "step": 1905
2295
+ },
2296
+ {
2297
+ "epoch": 7.37,
2298
+ "learning_rate": 4.2251852019296586e-05,
2299
+ "loss": 1.601,
2300
+ "step": 1910
2301
+ },
2302
+ {
2303
+ "epoch": 7.39,
2304
+ "learning_rate": 4.6139310965004655e-05,
2305
+ "loss": 1.4994,
2306
+ "step": 1915
2307
+ },
2308
+ {
2309
+ "epoch": 7.41,
2310
+ "learning_rate": 5.010936042623934e-05,
2311
+ "loss": 1.5667,
2312
+ "step": 1920
2313
+ },
2314
+ {
2315
+ "epoch": 7.43,
2316
+ "learning_rate": 5.4147402078293086e-05,
2317
+ "loss": 1.6055,
2318
+ "step": 1925
2319
+ },
2320
+ {
2321
+ "epoch": 7.45,
2322
+ "learning_rate": 5.823858758141886e-05,
2323
+ "loss": 1.6403,
2324
+ "step": 1930
2325
+ },
2326
+ {
2327
+ "epoch": 7.47,
2328
+ "learning_rate": 6.236787317986658e-05,
2329
+ "loss": 1.5103,
2330
+ "step": 1935
2331
+ },
2332
+ {
2333
+ "epoch": 7.49,
2334
+ "learning_rate": 6.65200750194898e-05,
2335
+ "loss": 1.5918,
2336
+ "step": 1940
2337
+ },
2338
+ {
2339
+ "epoch": 7.51,
2340
+ "learning_rate": 7.067992498051008e-05,
2341
+ "loss": 1.5905,
2342
+ "step": 1945
2343
+ },
2344
+ {
2345
+ "epoch": 7.53,
2346
+ "learning_rate": 7.48321268201333e-05,
2347
+ "loss": 1.6157,
2348
+ "step": 1950
2349
+ },
2350
+ {
2351
+ "epoch": 7.55,
2352
+ "learning_rate": 7.896141241858101e-05,
2353
+ "loss": 1.5963,
2354
+ "step": 1955
2355
+ },
2356
+ {
2357
+ "epoch": 7.57,
2358
+ "learning_rate": 8.305259792170679e-05,
2359
+ "loss": 1.586,
2360
+ "step": 1960
2361
+ },
2362
+ {
2363
+ "epoch": 7.59,
2364
+ "learning_rate": 8.709063957376054e-05,
2365
+ "loss": 1.6324,
2366
+ "step": 1965
2367
+ },
2368
+ {
2369
+ "epoch": 7.61,
2370
+ "learning_rate": 9.106068903499522e-05,
2371
+ "loss": 1.6232,
2372
+ "step": 1970
2373
+ },
2374
+ {
2375
+ "epoch": 7.63,
2376
+ "learning_rate": 9.494814798070329e-05,
2377
+ "loss": 1.6404,
2378
+ "step": 1975
2379
+ },
2380
+ {
2381
+ "epoch": 7.64,
2382
+ "learning_rate": 9.873872178092473e-05,
2383
+ "loss": 1.6215,
2384
+ "step": 1980
2385
+ },
2386
+ {
2387
+ "epoch": 7.66,
2388
+ "learning_rate": 0.00010241847206343028,
2389
+ "loss": 1.617,
2390
+ "step": 1985
2391
+ },
2392
+ {
2393
+ "epoch": 7.68,
2394
+ "learning_rate": 0.0001059738679667058,
2395
+ "loss": 1.6565,
2396
+ "step": 1990
2397
+ },
2398
+ {
2399
+ "epoch": 7.7,
2400
+ "learning_rate": 0.0001093918358944741,
2401
+ "loss": 1.7342,
2402
+ "step": 1995
2403
+ },
2404
+ {
2405
+ "epoch": 7.72,
2406
+ "learning_rate": 0.00011265980758879924,
2407
+ "loss": 1.6063,
2408
+ "step": 2000
2409
+ },
2410
+ {
2411
+ "epoch": 7.74,
2412
+ "learning_rate": 0.00011576576634500532,
2413
+ "loss": 1.6993,
2414
+ "step": 2005
2415
+ },
2416
+ {
2417
+ "epoch": 7.76,
2418
+ "learning_rate": 0.00011869829119846911,
2419
+ "loss": 1.6355,
2420
+ "step": 2010
2421
+ },
2422
+ {
2423
+ "epoch": 7.78,
2424
+ "learning_rate": 0.00012144659892081027,
2425
+ "loss": 1.6087,
2426
+ "step": 2015
2427
+ },
2428
+ {
2429
+ "epoch": 7.8,
2430
+ "learning_rate": 0.00012400058367105252,
2431
+ "loss": 1.6668,
2432
+ "step": 2020
2433
+ },
2434
+ {
2435
+ "epoch": 7.82,
2436
+ "learning_rate": 0.00012635085415595263,
2437
+ "loss": 1.7275,
2438
+ "step": 2025
2439
+ },
2440
+ {
2441
+ "epoch": 7.84,
2442
+ "learning_rate": 0.00012848876816285744,
2443
+ "loss": 1.6637,
2444
+ "step": 2030
2445
+ },
2446
+ {
2447
+ "epoch": 7.86,
2448
+ "learning_rate": 0.00013040646433810593,
2449
+ "loss": 1.6713,
2450
+ "step": 2035
2451
+ },
2452
+ {
2453
+ "epoch": 7.88,
2454
+ "learning_rate": 0.00013209689109412246,
2455
+ "loss": 1.6358,
2456
+ "step": 2040
2457
+ },
2458
+ {
2459
+ "epoch": 7.9,
2460
+ "learning_rate": 0.00013355383253890908,
2461
+ "loss": 1.6572,
2462
+ "step": 2045
2463
+ },
2464
+ {
2465
+ "epoch": 7.92,
2466
+ "learning_rate": 0.0001347719313325897,
2467
+ "loss": 1.6781,
2468
+ "step": 2050
2469
+ },
2470
+ {
2471
+ "epoch": 7.93,
2472
+ "learning_rate": 0.00013574670838695924,
2473
+ "loss": 1.6401,
2474
+ "step": 2055
2475
+ },
2476
+ {
2477
+ "epoch": 7.95,
2478
+ "learning_rate": 0.0001364745793356023,
2479
+ "loss": 1.673,
2480
+ "step": 2060
2481
+ },
2482
+ {
2483
+ "epoch": 7.97,
2484
+ "learning_rate": 0.0001369528677140173,
2485
+ "loss": 1.7179,
2486
+ "step": 2065
2487
+ },
2488
+ {
2489
+ "epoch": 7.99,
2490
+ "learning_rate": 0.00013717981480128154,
2491
+ "loss": 1.7015,
2492
+ "step": 2070
2493
+ },
2494
+ {
2495
+ "epoch": 8.01,
2496
+ "learning_rate": 0.00013715458608706872,
2497
+ "loss": 1.6596,
2498
+ "step": 2075
2499
+ },
2500
+ {
2501
+ "epoch": 8.03,
2502
+ "learning_rate": 0.00013687727434023877,
2503
+ "loss": 1.6462,
2504
+ "step": 2080
2505
+ },
2506
+ {
2507
+ "epoch": 8.05,
2508
+ "learning_rate": 0.00013634889926771588,
2509
+ "loss": 1.6041,
2510
+ "step": 2085
2511
+ },
2512
+ {
2513
+ "epoch": 8.07,
2514
+ "learning_rate": 0.00013557140376490998,
2515
+ "loss": 1.5571,
2516
+ "step": 2090
2517
+ },
2518
+ {
2519
+ "epoch": 8.09,
2520
+ "learning_rate": 0.00013454764677146882,
2521
+ "loss": 1.5541,
2522
+ "step": 2095
2523
+ },
2524
+ {
2525
+ "epoch": 8.11,
2526
+ "learning_rate": 0.00013328139275863037,
2527
+ "loss": 1.6773,
2528
+ "step": 2100
2529
+ },
2530
+ {
2531
+ "epoch": 8.13,
2532
+ "learning_rate": 0.00013177729788683341,
2533
+ "loss": 1.6003,
2534
+ "step": 2105
2535
+ },
2536
+ {
2537
+ "epoch": 8.15,
2538
+ "learning_rate": 0.00013004089288448387,
2539
+ "loss": 1.5288,
2540
+ "step": 2110
2541
+ },
2542
+ {
2543
+ "epoch": 8.17,
2544
+ "learning_rate": 0.0001280785627108356,
2545
+ "loss": 1.6059,
2546
+ "step": 2115
2547
+ },
2548
+ {
2549
+ "epoch": 8.19,
2550
+ "learning_rate": 0.00012589752307776752,
2551
+ "loss": 1.6265,
2552
+ "step": 2120
2553
+ },
2554
+ {
2555
+ "epoch": 8.2,
2556
+ "learning_rate": 0.0001235057939167872,
2557
+ "loss": 1.6372,
2558
+ "step": 2125
2559
+ },
2560
+ {
2561
+ "epoch": 8.22,
2562
+ "learning_rate": 0.00012091216988882845,
2563
+ "loss": 1.7324,
2564
+ "step": 2130
2565
+ },
2566
+ {
2567
+ "epoch": 8.24,
2568
+ "learning_rate": 0.00011812618804528034,
2569
+ "loss": 1.5938,
2570
+ "step": 2135
2571
+ },
2572
+ {
2573
+ "epoch": 8.26,
2574
+ "learning_rate": 0.00011515809275915997,
2575
+ "loss": 1.5805,
2576
+ "step": 2140
2577
+ },
2578
+ {
2579
+ "epoch": 8.28,
2580
+ "learning_rate": 0.00011201879805538599,
2581
+ "loss": 1.62,
2582
+ "step": 2145
2583
+ },
2584
+ {
2585
+ "epoch": 8.3,
2586
+ "learning_rate": 0.00010871984747866294,
2587
+ "loss": 1.5884,
2588
+ "step": 2150
2589
+ },
2590
+ {
2591
+ "epoch": 8.32,
2592
+ "learning_rate": 0.00010527337164655102,
2593
+ "loss": 1.576,
2594
+ "step": 2155
2595
+ },
2596
+ {
2597
+ "epoch": 8.34,
2598
+ "learning_rate": 0.00010169204364380236,
2599
+ "loss": 1.5746,
2600
+ "step": 2160
2601
+ },
2602
+ {
2603
+ "epoch": 8.36,
2604
+ "learning_rate": 9.798903242198116e-05,
2605
+ "loss": 1.6259,
2606
+ "step": 2165
2607
+ },
2608
+ {
2609
+ "epoch": 8.38,
2610
+ "learning_rate": 9.417795437572906e-05,
2611
+ "loss": 1.6047,
2612
+ "step": 2170
2613
+ },
2614
+ {
2615
+ "epoch": 8.4,
2616
+ "learning_rate": 9.027282327372703e-05,
2617
+ "loss": 1.5615,
2618
+ "step": 2175
2619
+ },
2620
+ {
2621
+ "epoch": 8.42,
2622
+ "learning_rate": 8.628799872846948e-05,
2623
+ "loss": 1.6294,
2624
+ "step": 2180
2625
+ },
2626
+ {
2627
+ "epoch": 8.44,
2628
+ "learning_rate": 8.223813339433283e-05,
2629
+ "loss": 1.5485,
2630
+ "step": 2185
2631
+ },
2632
+ {
2633
+ "epoch": 8.46,
2634
+ "learning_rate": 7.813811908809194e-05,
2635
+ "loss": 1.5291,
2636
+ "step": 2190
2637
+ },
2638
+ {
2639
+ "epoch": 8.47,
2640
+ "learning_rate": 7.400303203001327e-05,
2641
+ "loss": 1.6178,
2642
+ "step": 2195
2643
+ },
2644
+ {
2645
+ "epoch": 8.49,
2646
+ "learning_rate": 6.98480774068711e-05,
2647
+ "loss": 1.6335,
2648
+ "step": 2200
2649
+ },
2650
+ {
2651
+ "epoch": 8.51,
2652
+ "learning_rate": 6.568853346074429e-05,
2653
+ "loss": 1.5607,
2654
+ "step": 2205
2655
+ },
2656
+ {
2657
+ "epoch": 8.53,
2658
+ "learning_rate": 6.15396953091741e-05,
2659
+ "loss": 1.552,
2660
+ "step": 2210
2661
+ },
2662
+ {
2663
+ "epoch": 8.55,
2664
+ "learning_rate": 5.741681870327528e-05,
2665
+ "loss": 1.6358,
2666
+ "step": 2215
2667
+ },
2668
+ {
2669
+ "epoch": 8.57,
2670
+ "learning_rate": 5.33350639305969e-05,
2671
+ "loss": 1.6499,
2672
+ "step": 2220
2673
+ },
2674
+ {
2675
+ "epoch": 8.59,
2676
+ "learning_rate": 4.930944006901777e-05,
2677
+ "loss": 1.5632,
2678
+ "step": 2225
2679
+ },
2680
+ {
2681
+ "epoch": 8.61,
2682
+ "learning_rate": 4.535474979665314e-05,
2683
+ "loss": 1.5825,
2684
+ "step": 2230
2685
+ },
2686
+ {
2687
+ "epoch": 8.63,
2688
+ "learning_rate": 4.148553496072023e-05,
2689
+ "loss": 1.6277,
2690
+ "step": 2235
2691
+ },
2692
+ {
2693
+ "epoch": 8.65,
2694
+ "learning_rate": 3.7716023105507615e-05,
2695
+ "loss": 1.5497,
2696
+ "step": 2240
2697
+ },
2698
+ {
2699
+ "epoch": 8.67,
2700
+ "learning_rate": 3.406007515606987e-05,
2701
+ "loss": 1.5159,
2702
+ "step": 2245
2703
+ },
2704
+ {
2705
+ "epoch": 8.69,
2706
+ "learning_rate": 3.0531134450027666e-05,
2707
+ "loss": 1.5683,
2708
+ "step": 2250
2709
+ },
2710
+ {
2711
+ "epoch": 8.71,
2712
+ "learning_rate": 2.7142177304880198e-05,
2713
+ "loss": 1.5193,
2714
+ "step": 2255
2715
+ },
2716
+ {
2717
+ "epoch": 8.73,
2718
+ "learning_rate": 2.390566530260624e-05,
2719
+ "loss": 1.6145,
2720
+ "step": 2260
2721
+ },
2722
+ {
2723
+ "epoch": 8.75,
2724
+ "learning_rate": 2.0833499467006378e-05,
2725
+ "loss": 1.5854,
2726
+ "step": 2265
2727
+ },
2728
+ {
2729
+ "epoch": 8.76,
2730
+ "learning_rate": 1.7936976502279525e-05,
2731
+ "loss": 1.5426,
2732
+ "step": 2270
2733
+ },
2734
+ {
2735
+ "epoch": 8.78,
2736
+ "learning_rate": 1.5226747253755011e-05,
2737
+ "loss": 1.5862,
2738
+ "step": 2275
2739
+ },
2740
+ {
2741
+ "epoch": 8.8,
2742
+ "learning_rate": 1.2712777543517822e-05,
2743
+ "loss": 1.5478,
2744
+ "step": 2280
2745
+ },
2746
+ {
2747
+ "epoch": 8.82,
2748
+ "learning_rate": 1.0404311524944368e-05,
2749
+ "loss": 1.6329,
2750
+ "step": 2285
2751
+ },
2752
+ {
2753
+ "epoch": 8.84,
2754
+ "learning_rate": 8.309837690896873e-06,
2755
+ "loss": 1.535,
2756
+ "step": 2290
2757
+ },
2758
+ {
2759
+ "epoch": 8.86,
2760
+ "learning_rate": 6.43705766056588e-06,
2761
+ "loss": 1.5849,
2762
+ "step": 2295
2763
+ },
2764
+ {
2765
+ "epoch": 8.88,
2766
+ "learning_rate": 4.792857859738948e-06,
2767
+ "loss": 1.6253,
2768
+ "step": 2300
2769
+ },
2770
+ {
2771
+ "epoch": 8.9,
2772
+ "learning_rate": 3.3832841986266175e-06,
2773
+ "loss": 1.537,
2774
+ "step": 2305
2775
+ },
2776
+ {
2777
+ "epoch": 8.92,
2778
+ "learning_rate": 2.213519840361947e-06,
2779
+ "loss": 1.5028,
2780
+ "step": 2310
2781
+ },
2782
+ {
2783
+ "epoch": 8.94,
2784
+ "learning_rate": 1.2878661419176351e-06,
2785
+ "loss": 1.5904,
2786
+ "step": 2315
2787
+ },
2788
+ {
2789
+ "epoch": 8.96,
2790
+ "learning_rate": 6.097268375260679e-07,
2791
+ "loss": 1.6009,
2792
+ "step": 2320
2793
+ },
2794
+ {
2795
+ "epoch": 8.98,
2796
+ "learning_rate": 1.8159552276040752e-07,
2797
+ "loss": 1.5465,
2798
+ "step": 2325
2799
+ },
2800
+ {
2801
+ "epoch": 9.0,
2802
+ "learning_rate": 5.046485299251069e-09,
2803
+ "loss": 1.5575,
2804
+ "step": 2330
2805
+ },
2806
+ {
2807
+ "epoch": 9.02,
2808
+ "learning_rate": 8.072891609113784e-08,
2809
+ "loss": 1.5791,
2810
+ "step": 2335
2811
+ },
2812
+ {
2813
+ "epoch": 9.03,
2814
+ "learning_rate": 4.0836452220544814e-07,
2815
+ "loss": 1.4865,
2816
+ "step": 2340
2817
+ },
2818
+ {
2819
+ "epoch": 9.05,
2820
+ "learning_rate": 9.867485501471922e-07,
2821
+ "loss": 1.5316,
2822
+ "step": 2345
2823
+ },
2824
+ {
2825
+ "epoch": 9.07,
2826
+ "learning_rate": 1.813754215873199e-06,
2827
+ "loss": 1.5403,
2828
+ "step": 2350
2829
+ },
2830
+ {
2831
+ "epoch": 9.09,
2832
+ "learning_rate": 2.8863405252193584e-06,
2833
+ "loss": 1.4183,
2834
+ "step": 2355
2835
+ },
2836
+ {
2837
+ "epoch": 9.11,
2838
+ "learning_rate": 4.200563455983382e-06,
2839
+ "loss": 1.5547,
2840
+ "step": 2360
2841
+ },
2842
+ {
2843
+ "epoch": 9.13,
2844
+ "learning_rate": 5.75159046054386e-06,
2845
+ "loss": 1.5414,
2846
+ "step": 2365
2847
+ },
2848
+ {
2849
+ "epoch": 9.15,
2850
+ "learning_rate": 7.5337182356897725e-06,
2851
+ "loss": 1.5445,
2852
+ "step": 2370
2853
+ },
2854
+ {
2855
+ "epoch": 9.17,
2856
+ "learning_rate": 9.540393694315775e-06,
2857
+ "loss": 1.4539,
2858
+ "step": 2375
2859
+ },
2860
+ {
2861
+ "epoch": 9.19,
2862
+ "learning_rate": 1.1764238061872442e-05,
2863
+ "loss": 1.4992,
2864
+ "step": 2380
2865
+ },
2866
+ {
2867
+ "epoch": 9.21,
2868
+ "learning_rate": 1.4197074008960564e-05,
2869
+ "loss": 1.5203,
2870
+ "step": 2385
2871
+ },
2872
+ {
2873
+ "epoch": 9.23,
2874
+ "learning_rate": 1.6829955720305234e-05,
2875
+ "loss": 1.4989,
2876
+ "step": 2390
2877
+ },
2878
+ {
2879
+ "epoch": 9.25,
2880
+ "learning_rate": 1.965320178953787e-05,
2881
+ "loss": 1.5128,
2882
+ "step": 2395
2883
+ },
2884
+ {
2885
+ "epoch": 9.27,
2886
+ "learning_rate": 2.265643081883295e-05,
2887
+ "loss": 1.5033,
2888
+ "step": 2400
2889
+ },
2890
+ {
2891
+ "epoch": 9.29,
2892
+ "learning_rate": 2.582859959249101e-05,
2893
+ "loss": 1.4938,
2894
+ "step": 2405
2895
+ },
2896
+ {
2897
+ "epoch": 9.31,
2898
+ "learning_rate": 2.915804368410211e-05,
2899
+ "loss": 1.5157,
2900
+ "step": 2410
2901
+ },
2902
+ {
2903
+ "epoch": 9.32,
2904
+ "learning_rate": 3.2632520347973973e-05,
2905
+ "loss": 1.4103,
2906
+ "step": 2415
2907
+ },
2908
+ {
2909
+ "epoch": 9.34,
2910
+ "learning_rate": 3.623925353710222e-05,
2911
+ "loss": 1.524,
2912
+ "step": 2420
2913
+ },
2914
+ {
2915
+ "epoch": 9.36,
2916
+ "learning_rate": 3.996498088215406e-05,
2917
+ "loss": 1.5389,
2918
+ "step": 2425
2919
+ },
2920
+ {
2921
+ "epoch": 9.38,
2922
+ "learning_rate": 4.3796002458712527e-05,
2923
+ "loss": 1.5645,
2924
+ "step": 2430
2925
+ },
2926
+ {
2927
+ "epoch": 9.4,
2928
+ "learning_rate": 4.7718231163460484e-05,
2929
+ "loss": 1.5511,
2930
+ "step": 2435
2931
+ },
2932
+ {
2933
+ "epoch": 9.42,
2934
+ "learning_rate": 5.1717244514068206e-05,
2935
+ "loss": 1.5406,
2936
+ "step": 2440
2937
+ },
2938
+ {
2939
+ "epoch": 9.44,
2940
+ "learning_rate": 5.57783376823034e-05,
2941
+ "loss": 1.567,
2942
+ "step": 2445
2943
+ },
2944
+ {
2945
+ "epoch": 9.46,
2946
+ "learning_rate": 5.988657756536402e-05,
2947
+ "loss": 1.602,
2948
+ "step": 2450
2949
+ },
2950
+ {
2951
+ "epoch": 9.48,
2952
+ "learning_rate": 6.402685769660036e-05,
2953
+ "loss": 1.4789,
2954
+ "step": 2455
2955
+ },
2956
+ {
2957
+ "epoch": 9.5,
2958
+ "learning_rate": 6.818395379371463e-05,
2959
+ "loss": 1.5673,
2960
+ "step": 2460
2961
+ },
2962
+ {
2963
+ "epoch": 9.52,
2964
+ "learning_rate": 7.234257974018531e-05,
2965
+ "loss": 1.5527,
2966
+ "step": 2465
2967
+ },
2968
+ {
2969
+ "epoch": 9.54,
2970
+ "learning_rate": 7.64874437940594e-05,
2971
+ "loss": 1.4721,
2972
+ "step": 2470
2973
+ },
2974
+ {
2975
+ "epoch": 9.56,
2976
+ "learning_rate": 8.060330481743391e-05,
2977
+ "loss": 1.4447,
2978
+ "step": 2475
2979
+ },
2980
+ {
2981
+ "epoch": 9.58,
2982
+ "learning_rate": 8.467502831985544e-05,
2983
+ "loss": 1.5768,
2984
+ "step": 2480
2985
+ },
2986
+ {
2987
+ "epoch": 9.59,
2988
+ "learning_rate": 8.868764210957132e-05,
2989
+ "loss": 1.4808,
2990
+ "step": 2485
2991
+ },
2992
+ {
2993
+ "epoch": 9.61,
2994
+ "learning_rate": 9.262639134798362e-05,
2995
+ "loss": 1.4197,
2996
+ "step": 2490
2997
+ },
2998
+ {
2999
+ "epoch": 9.63,
3000
+ "learning_rate": 9.647679280487567e-05,
3001
+ "loss": 1.6109,
3002
+ "step": 2495
3003
+ },
3004
+ {
3005
+ "epoch": 9.65,
3006
+ "learning_rate": 0.00010022468811489983,
3007
+ "loss": 1.5907,
3008
+ "step": 2500
3009
+ },
3010
+ {
3011
+ "epoch": 9.67,
3012
+ "learning_rate": 0.00010385629583950378,
3013
+ "loss": 1.5902,
3014
+ "step": 2505
3015
+ },
3016
+ {
3017
+ "epoch": 9.69,
3018
+ "learning_rate": 0.00010735826214284965,
3019
+ "loss": 1.6053,
3020
+ "step": 2510
3021
+ },
3022
+ {
3023
+ "epoch": 9.71,
3024
+ "learning_rate": 0.00011071770989539361,
3025
+ "loss": 1.552,
3026
+ "step": 2515
3027
+ },
3028
+ {
3029
+ "epoch": 9.73,
3030
+ "learning_rate": 0.00011392228602455961,
3031
+ "loss": 1.5787,
3032
+ "step": 2520
3033
+ },
3034
+ {
3035
+ "epoch": 9.75,
3036
+ "learning_rate": 0.00011696020693839523,
3037
+ "loss": 1.4997,
3038
+ "step": 2525
3039
+ },
3040
+ {
3041
+ "epoch": 9.77,
3042
+ "learning_rate": 0.00011982030185518476,
3043
+ "loss": 1.6354,
3044
+ "step": 2530
3045
+ },
3046
+ {
3047
+ "epoch": 9.79,
3048
+ "learning_rate": 0.00012249205387968647,
3049
+ "loss": 1.586,
3050
+ "step": 2535
3051
+ },
3052
+ {
3053
+ "epoch": 9.81,
3054
+ "learning_rate": 0.0001249656386749574,
3055
+ "loss": 1.511,
3056
+ "step": 2540
3057
+ },
3058
+ {
3059
+ "epoch": 9.83,
3060
+ "learning_rate": 0.000127231960587559,
3061
+ "loss": 1.5002,
3062
+ "step": 2545
3063
+ },
3064
+ {
3065
+ "epoch": 9.85,
3066
+ "learning_rate": 0.00012928268609331444,
3067
+ "loss": 1.5829,
3068
+ "step": 2550
3069
+ },
3070
+ {
3071
+ "epoch": 9.86,
3072
+ "learning_rate": 0.00013111027444062605,
3073
+ "loss": 1.6407,
3074
+ "step": 2555
3075
+ },
3076
+ {
3077
+ "epoch": 9.88,
3078
+ "learning_rate": 0.00013270800537867978,
3079
+ "loss": 1.5058,
3080
+ "step": 2560
3081
+ },
3082
+ {
3083
+ "epoch": 9.9,
3084
+ "learning_rate": 0.00013407000386857348,
3085
+ "loss": 1.4854,
3086
+ "step": 2565
3087
+ },
3088
+ {
3089
+ "epoch": 9.92,
3090
+ "learning_rate": 0.0001351912616865057,
3091
+ "loss": 1.4912,
3092
+ "step": 2570
3093
+ },
3094
+ {
3095
+ "epoch": 9.94,
3096
+ "learning_rate": 0.00013606765583958525,
3097
+ "loss": 1.5218,
3098
+ "step": 2575
3099
+ },
3100
+ {
3101
+ "epoch": 9.96,
3102
+ "learning_rate": 0.00013669596372654658,
3103
+ "loss": 1.5828,
3104
+ "step": 2580
3105
+ },
3106
+ {
3107
+ "epoch": 9.98,
3108
+ "learning_rate": 0.00013707387498762246,
3109
+ "loss": 1.5816,
3110
+ "step": 2585
3111
+ },
3112
+ {
3113
+ "epoch": 10.0,
3114
+ "learning_rate": 0.0001372,
3115
+ "loss": 1.4944,
3116
+ "step": 2590
3117
+ },
3118
+ {
3119
+ "epoch": 10.02,
3120
+ "learning_rate": 0.00013707387498762246,
3121
+ "loss": 1.5296,
3122
+ "step": 2595
3123
+ },
3124
+ {
3125
+ "epoch": 10.04,
3126
+ "learning_rate": 0.0001366959637265466,
3127
+ "loss": 1.5509,
3128
+ "step": 2600
3129
+ },
3130
+ {
3131
+ "epoch": 10.06,
3132
+ "learning_rate": 0.00013606765583958527,
3133
+ "loss": 1.5305,
3134
+ "step": 2605
3135
+ },
3136
+ {
3137
+ "epoch": 10.08,
3138
+ "learning_rate": 0.00013519126168650574,
3139
+ "loss": 1.4872,
3140
+ "step": 2610
3141
+ },
3142
+ {
3143
+ "epoch": 10.1,
3144
+ "learning_rate": 0.00013407000386857353,
3145
+ "loss": 1.544,
3146
+ "step": 2615
3147
+ },
3148
+ {
3149
+ "epoch": 10.12,
3150
+ "learning_rate": 0.00013270800537867983,
3151
+ "loss": 1.5421,
3152
+ "step": 2620
3153
+ },
3154
+ {
3155
+ "epoch": 10.14,
3156
+ "learning_rate": 0.0001311102744406261,
3157
+ "loss": 1.5468,
3158
+ "step": 2625
3159
+ },
3160
+ {
3161
+ "epoch": 10.15,
3162
+ "learning_rate": 0.00012928268609331455,
3163
+ "loss": 1.5529,
3164
+ "step": 2630
3165
+ },
3166
+ {
3167
+ "epoch": 10.17,
3168
+ "learning_rate": 0.00012723196058755907,
3169
+ "loss": 1.5357,
3170
+ "step": 2635
3171
+ },
3172
+ {
3173
+ "epoch": 10.19,
3174
+ "learning_rate": 0.00012496563867495748,
3175
+ "loss": 1.5077,
3176
+ "step": 2640
3177
+ },
3178
+ {
3179
+ "epoch": 10.21,
3180
+ "learning_rate": 0.00012249205387968658,
3181
+ "loss": 1.6099,
3182
+ "step": 2645
3183
+ },
3184
+ {
3185
+ "epoch": 10.23,
3186
+ "learning_rate": 0.00011982030185518488,
3187
+ "loss": 1.5701,
3188
+ "step": 2650
3189
+ },
3190
+ {
3191
+ "epoch": 10.25,
3192
+ "learning_rate": 0.0001169602069383955,
3193
+ "loss": 1.5479,
3194
+ "step": 2655
3195
+ },
3196
+ {
3197
+ "epoch": 10.27,
3198
+ "learning_rate": 0.00011392228602455956,
3199
+ "loss": 1.5486,
3200
+ "step": 2660
3201
+ },
3202
+ {
3203
+ "epoch": 10.29,
3204
+ "learning_rate": 0.00011071770989539373,
3205
+ "loss": 1.5367,
3206
+ "step": 2665
3207
+ },
3208
+ {
3209
+ "epoch": 10.31,
3210
+ "learning_rate": 0.0001073582621428498,
3211
+ "loss": 1.5361,
3212
+ "step": 2670
3213
+ },
3214
+ {
3215
+ "epoch": 10.33,
3216
+ "learning_rate": 0.00010385629583950413,
3217
+ "loss": 1.3755,
3218
+ "step": 2675
3219
+ },
3220
+ {
3221
+ "epoch": 10.35,
3222
+ "learning_rate": 0.00010022468811490019,
3223
+ "loss": 1.464,
3224
+ "step": 2680
3225
+ },
3226
+ {
3227
+ "epoch": 10.37,
3228
+ "learning_rate": 9.64767928048756e-05,
3229
+ "loss": 1.5142,
3230
+ "step": 2685
3231
+ },
3232
+ {
3233
+ "epoch": 10.39,
3234
+ "learning_rate": 9.262639134798378e-05,
3235
+ "loss": 1.5207,
3236
+ "step": 2690
3237
+ },
3238
+ {
3239
+ "epoch": 10.41,
3240
+ "learning_rate": 8.868764210957149e-05,
3241
+ "loss": 1.5357,
3242
+ "step": 2695
3243
+ },
3244
+ {
3245
+ "epoch": 10.42,
3246
+ "learning_rate": 8.467502831985583e-05,
3247
+ "loss": 1.454,
3248
+ "step": 2700
3249
+ },
3250
+ {
3251
+ "epoch": 10.44,
3252
+ "learning_rate": 8.06033048174343e-05,
3253
+ "loss": 1.5096,
3254
+ "step": 2705
3255
+ },
3256
+ {
3257
+ "epoch": 10.46,
3258
+ "learning_rate": 7.648744379405981e-05,
3259
+ "loss": 1.5628,
3260
+ "step": 2710
3261
+ },
3262
+ {
3263
+ "epoch": 10.48,
3264
+ "learning_rate": 7.234257974018524e-05,
3265
+ "loss": 1.468,
3266
+ "step": 2715
3267
+ },
3268
+ {
3269
+ "epoch": 10.5,
3270
+ "learning_rate": 6.818395379371479e-05,
3271
+ "loss": 1.4858,
3272
+ "step": 2720
3273
+ },
3274
+ {
3275
+ "epoch": 10.52,
3276
+ "learning_rate": 6.402685769660054e-05,
3277
+ "loss": 1.4885,
3278
+ "step": 2725
3279
+ },
3280
+ {
3281
+ "epoch": 10.54,
3282
+ "learning_rate": 5.988657756536443e-05,
3283
+ "loss": 1.4577,
3284
+ "step": 2730
3285
+ },
3286
+ {
3287
+ "epoch": 10.56,
3288
+ "learning_rate": 5.577833768230333e-05,
3289
+ "loss": 1.5513,
3290
+ "step": 2735
3291
+ },
3292
+ {
3293
+ "epoch": 10.58,
3294
+ "learning_rate": 5.171724451406837e-05,
3295
+ "loss": 1.4957,
3296
+ "step": 2740
3297
+ },
3298
+ {
3299
+ "epoch": 10.6,
3300
+ "learning_rate": 4.7718231163460647e-05,
3301
+ "loss": 1.5075,
3302
+ "step": 2745
3303
+ },
3304
+ {
3305
+ "epoch": 10.62,
3306
+ "learning_rate": 4.379600245871268e-05,
3307
+ "loss": 1.448,
3308
+ "step": 2750
3309
+ },
3310
+ {
3311
+ "epoch": 10.64,
3312
+ "learning_rate": 3.996498088215443e-05,
3313
+ "loss": 1.53,
3314
+ "step": 2755
3315
+ },
3316
+ {
3317
+ "epoch": 10.66,
3318
+ "learning_rate": 3.623925353710258e-05,
3319
+ "loss": 1.5454,
3320
+ "step": 2760
3321
+ },
3322
+ {
3323
+ "epoch": 10.68,
3324
+ "learning_rate": 3.2632520347973906e-05,
3325
+ "loss": 1.5139,
3326
+ "step": 2765
3327
+ },
3328
+ {
3329
+ "epoch": 10.69,
3330
+ "learning_rate": 2.915804368410225e-05,
3331
+ "loss": 1.5848,
3332
+ "step": 2770
3333
+ },
3334
+ {
3335
+ "epoch": 10.71,
3336
+ "learning_rate": 2.5828599592491143e-05,
3337
+ "loss": 1.514,
3338
+ "step": 2775
3339
+ },
3340
+ {
3341
+ "epoch": 10.73,
3342
+ "learning_rate": 2.2656430818833073e-05,
3343
+ "loss": 1.4666,
3344
+ "step": 2780
3345
+ },
3346
+ {
3347
+ "epoch": 10.75,
3348
+ "learning_rate": 1.965320178953816e-05,
3349
+ "loss": 1.4546,
3350
+ "step": 2785
3351
+ },
3352
+ {
3353
+ "epoch": 10.77,
3354
+ "learning_rate": 1.682995572030518e-05,
3355
+ "loss": 1.4552,
3356
+ "step": 2790
3357
+ },
3358
+ {
3359
+ "epoch": 10.79,
3360
+ "learning_rate": 1.4197074008960664e-05,
3361
+ "loss": 1.482,
3362
+ "step": 2795
3363
+ },
3364
+ {
3365
+ "epoch": 10.81,
3366
+ "learning_rate": 1.1764238061872534e-05,
3367
+ "loss": 1.5422,
3368
+ "step": 2800
3369
+ },
3370
+ {
3371
+ "epoch": 10.83,
3372
+ "learning_rate": 9.54039369431598e-06,
3373
+ "loss": 1.492,
3374
+ "step": 2805
3375
+ },
3376
+ {
3377
+ "epoch": 10.85,
3378
+ "learning_rate": 7.5337182356897344e-06,
3379
+ "loss": 1.4219,
3380
+ "step": 2810
3381
+ },
3382
+ {
3383
+ "epoch": 10.87,
3384
+ "learning_rate": 5.75159046054383e-06,
3385
+ "loss": 1.4972,
3386
+ "step": 2815
3387
+ },
3388
+ {
3389
+ "epoch": 10.89,
3390
+ "learning_rate": 4.200563455983359e-06,
3391
+ "loss": 1.4525,
3392
+ "step": 2820
3393
+ },
3394
+ {
3395
+ "epoch": 10.91,
3396
+ "learning_rate": 2.886340525219404e-06,
3397
+ "loss": 1.4337,
3398
+ "step": 2825
3399
+ },
3400
+ {
3401
+ "epoch": 10.93,
3402
+ "learning_rate": 1.8137542158732371e-06,
3403
+ "loss": 1.5066,
3404
+ "step": 2830
3405
+ },
3406
+ {
3407
+ "epoch": 10.95,
3408
+ "learning_rate": 9.867485501472609e-07,
3409
+ "loss": 1.4053,
3410
+ "step": 2835
3411
+ },
3412
+ {
3413
+ "epoch": 10.97,
3414
+ "learning_rate": 4.083645222054405e-07,
3415
+ "loss": 1.4861,
3416
+ "step": 2840
3417
+ },
3418
+ {
3419
+ "epoch": 10.98,
3420
+ "learning_rate": 8.072891609114545e-08,
3421
+ "loss": 1.4625,
3422
+ "step": 2845
3423
+ },
3424
+ {
3425
+ "epoch": 11.0,
3426
+ "learning_rate": 5.046485299251069e-09,
3427
+ "loss": 1.4993,
3428
+ "step": 2850
3429
+ },
3430
+ {
3431
+ "epoch": 11.02,
3432
+ "learning_rate": 1.8159552276039227e-07,
3433
+ "loss": 1.4475,
3434
+ "step": 2855
3435
+ },
3436
+ {
3437
+ "epoch": 11.04,
3438
+ "learning_rate": 6.097268375260069e-07,
3439
+ "loss": 1.4133,
3440
+ "step": 2860
3441
+ },
3442
+ {
3443
+ "epoch": 11.06,
3444
+ "learning_rate": 1.2878661419176504e-06,
3445
+ "loss": 1.4672,
3446
+ "step": 2865
3447
+ },
3448
+ {
3449
+ "epoch": 11.08,
3450
+ "learning_rate": 2.21351984036197e-06,
3451
+ "loss": 1.4601,
3452
+ "step": 2870
3453
+ },
3454
+ {
3455
+ "epoch": 11.1,
3456
+ "learning_rate": 3.383284198626564e-06,
3457
+ "loss": 1.4559,
3458
+ "step": 2875
3459
+ },
3460
+ {
3461
+ "epoch": 11.12,
3462
+ "learning_rate": 4.792857859738887e-06,
3463
+ "loss": 1.3926,
3464
+ "step": 2880
3465
+ },
3466
+ {
3467
+ "epoch": 11.14,
3468
+ "learning_rate": 6.437057660565811e-06,
3469
+ "loss": 1.3658,
3470
+ "step": 2885
3471
+ },
3472
+ {
3473
+ "epoch": 11.16,
3474
+ "learning_rate": 8.309837690896675e-06,
3475
+ "loss": 1.3814,
3476
+ "step": 2890
3477
+ },
3478
+ {
3479
+ "epoch": 11.18,
3480
+ "learning_rate": 1.0404311524944405e-05,
3481
+ "loss": 1.4756,
3482
+ "step": 2895
3483
+ },
3484
+ {
3485
+ "epoch": 11.2,
3486
+ "learning_rate": 1.271277754351773e-05,
3487
+ "loss": 1.3856,
3488
+ "step": 2900
3489
+ },
3490
+ {
3491
+ "epoch": 11.22,
3492
+ "learning_rate": 1.5226747253754904e-05,
3493
+ "loss": 1.3775,
3494
+ "step": 2905
3495
+ },
3496
+ {
3497
+ "epoch": 11.24,
3498
+ "learning_rate": 1.7936976502279244e-05,
3499
+ "loss": 1.4091,
3500
+ "step": 2910
3501
+ },
3502
+ {
3503
+ "epoch": 11.25,
3504
+ "learning_rate": 2.083349946700608e-05,
3505
+ "loss": 1.4543,
3506
+ "step": 2915
3507
+ },
3508
+ {
3509
+ "epoch": 11.27,
3510
+ "learning_rate": 2.39056653026063e-05,
3511
+ "loss": 1.4107,
3512
+ "step": 2920
3513
+ },
3514
+ {
3515
+ "epoch": 11.29,
3516
+ "learning_rate": 2.714217730488006e-05,
3517
+ "loss": 1.4381,
3518
+ "step": 2925
3519
+ },
3520
+ {
3521
+ "epoch": 11.31,
3522
+ "learning_rate": 3.053113445002753e-05,
3523
+ "loss": 1.4025,
3524
+ "step": 2930
3525
+ },
3526
+ {
3527
+ "epoch": 11.33,
3528
+ "learning_rate": 3.4060075156069725e-05,
3529
+ "loss": 1.3656,
3530
+ "step": 2935
3531
+ },
3532
+ {
3533
+ "epoch": 11.35,
3534
+ "learning_rate": 3.771602310550724e-05,
3535
+ "loss": 1.4167,
3536
+ "step": 2940
3537
+ },
3538
+ {
3539
+ "epoch": 11.37,
3540
+ "learning_rate": 4.148553496072031e-05,
3541
+ "loss": 1.4686,
3542
+ "step": 2945
3543
+ },
3544
+ {
3545
+ "epoch": 11.39,
3546
+ "learning_rate": 4.5354749796653205e-05,
3547
+ "loss": 1.3998,
3548
+ "step": 2950
3549
+ },
3550
+ {
3551
+ "epoch": 11.41,
3552
+ "learning_rate": 4.9309440069017615e-05,
3553
+ "loss": 1.4714,
3554
+ "step": 2955
3555
+ },
3556
+ {
3557
+ "epoch": 11.43,
3558
+ "learning_rate": 5.333506393059674e-05,
3559
+ "loss": 1.3963,
3560
+ "step": 2960
3561
+ },
3562
+ {
3563
+ "epoch": 11.45,
3564
+ "learning_rate": 5.7416818703274866e-05,
3565
+ "loss": 1.5068,
3566
+ "step": 2965
3567
+ },
3568
+ {
3569
+ "epoch": 11.47,
3570
+ "learning_rate": 6.153969530917418e-05,
3571
+ "loss": 1.4338,
3572
+ "step": 2970
3573
+ },
3574
+ {
3575
+ "epoch": 11.49,
3576
+ "learning_rate": 6.568853346074412e-05,
3577
+ "loss": 1.3606,
3578
+ "step": 2975
3579
+ },
3580
+ {
3581
+ "epoch": 11.51,
3582
+ "learning_rate": 6.984807740687094e-05,
3583
+ "loss": 1.4016,
3584
+ "step": 2980
3585
+ },
3586
+ {
3587
+ "epoch": 11.53,
3588
+ "learning_rate": 7.400303203001311e-05,
3589
+ "loss": 1.4269,
3590
+ "step": 2985
3591
+ },
3592
+ {
3593
+ "epoch": 11.54,
3594
+ "learning_rate": 7.813811908809178e-05,
3595
+ "loss": 1.44,
3596
+ "step": 2990
3597
+ },
3598
+ {
3599
+ "epoch": 11.56,
3600
+ "learning_rate": 8.223813339433243e-05,
3601
+ "loss": 1.4977,
3602
+ "step": 2995
3603
+ },
3604
+ {
3605
+ "epoch": 11.58,
3606
+ "learning_rate": 8.628799872846956e-05,
3607
+ "loss": 1.436,
3608
+ "step": 3000
3609
+ },
3610
+ {
3611
+ "epoch": 11.6,
3612
+ "learning_rate": 9.027282327372687e-05,
3613
+ "loss": 1.513,
3614
+ "step": 3005
3615
+ },
3616
+ {
3617
+ "epoch": 11.62,
3618
+ "learning_rate": 9.417795437572891e-05,
3619
+ "loss": 1.4691,
3620
+ "step": 3010
3621
+ },
3622
+ {
3623
+ "epoch": 11.64,
3624
+ "learning_rate": 9.798903242198079e-05,
3625
+ "loss": 1.5133,
3626
+ "step": 3015
3627
+ },
3628
+ {
3629
+ "epoch": 11.66,
3630
+ "learning_rate": 0.000101692043643802,
3631
+ "loss": 1.5047,
3632
+ "step": 3020
3633
+ },
3634
+ {
3635
+ "epoch": 11.68,
3636
+ "learning_rate": 0.00010527337164655109,
3637
+ "loss": 1.4302,
3638
+ "step": 3025
3639
+ },
3640
+ {
3641
+ "epoch": 11.7,
3642
+ "learning_rate": 0.00010871984747866282,
3643
+ "loss": 1.5886,
3644
+ "step": 3030
3645
+ },
3646
+ {
3647
+ "epoch": 11.72,
3648
+ "learning_rate": 0.00011201879805538586,
3649
+ "loss": 1.4925,
3650
+ "step": 3035
3651
+ },
3652
+ {
3653
+ "epoch": 11.74,
3654
+ "learning_rate": 0.00011515809275915985,
3655
+ "loss": 1.4445,
3656
+ "step": 3040
3657
+ },
3658
+ {
3659
+ "epoch": 11.76,
3660
+ "learning_rate": 0.00011812618804528006,
3661
+ "loss": 1.4577,
3662
+ "step": 3045
3663
+ },
3664
+ {
3665
+ "epoch": 11.78,
3666
+ "learning_rate": 0.00012091216988882848,
3667
+ "loss": 1.4792,
3668
+ "step": 3050
3669
+ },
3670
+ {
3671
+ "epoch": 11.8,
3672
+ "learning_rate": 0.00012350579391678723,
3673
+ "loss": 1.4425,
3674
+ "step": 3055
3675
+ },
3676
+ {
3677
+ "epoch": 11.81,
3678
+ "learning_rate": 0.00012589752307776744,
3679
+ "loss": 1.4294,
3680
+ "step": 3060
3681
+ },
3682
+ {
3683
+ "epoch": 11.83,
3684
+ "learning_rate": 0.0001280785627108355,
3685
+ "loss": 1.4138,
3686
+ "step": 3065
3687
+ },
3688
+ {
3689
+ "epoch": 11.85,
3690
+ "learning_rate": 0.0001300408928844837,
3691
+ "loss": 1.5429,
3692
+ "step": 3070
3693
+ },
3694
+ {
3695
+ "epoch": 11.87,
3696
+ "learning_rate": 0.00013177729788683344,
3697
+ "loss": 1.5001,
3698
+ "step": 3075
3699
+ },
3700
+ {
3701
+ "epoch": 11.89,
3702
+ "learning_rate": 0.00013328139275863032,
3703
+ "loss": 1.444,
3704
+ "step": 3080
3705
+ },
3706
+ {
3707
+ "epoch": 11.91,
3708
+ "learning_rate": 0.00013454764677146876,
3709
+ "loss": 1.4294,
3710
+ "step": 3085
3711
+ },
3712
+ {
3713
+ "epoch": 11.93,
3714
+ "learning_rate": 0.00013557140376490993,
3715
+ "loss": 1.5483,
3716
+ "step": 3090
3717
+ },
3718
+ {
3719
+ "epoch": 11.95,
3720
+ "learning_rate": 0.0001363488992677158,
3721
+ "loss": 1.5026,
3722
+ "step": 3095
3723
+ },
3724
+ {
3725
+ "epoch": 11.97,
3726
+ "learning_rate": 0.00013687727434023872,
3727
+ "loss": 1.5176,
3728
+ "step": 3100
3729
+ },
3730
+ {
3731
+ "epoch": 11.99,
3732
+ "learning_rate": 0.00013715458608706872,
3733
+ "loss": 1.4418,
3734
+ "step": 3105
3735
+ },
3736
+ {
3737
+ "epoch": 12.01,
3738
+ "learning_rate": 0.00013717981480128154,
3739
+ "loss": 1.441,
3740
+ "step": 3110
3741
+ },
3742
+ {
3743
+ "epoch": 12.03,
3744
+ "learning_rate": 0.00013695286771401734,
3745
+ "loss": 1.3854,
3746
+ "step": 3115
3747
+ },
3748
+ {
3749
+ "epoch": 12.05,
3750
+ "learning_rate": 0.00013647457933560234,
3751
+ "loss": 1.4397,
3752
+ "step": 3120
3753
+ },
3754
+ {
3755
+ "epoch": 12.07,
3756
+ "learning_rate": 0.00013574670838695926,
3757
+ "loss": 1.4672,
3758
+ "step": 3125
3759
+ },
3760
+ {
3761
+ "epoch": 12.08,
3762
+ "learning_rate": 0.0001347719313325897,
3763
+ "loss": 1.4525,
3764
+ "step": 3130
3765
+ },
3766
+ {
3767
+ "epoch": 12.1,
3768
+ "learning_rate": 0.00013355383253890914,
3769
+ "loss": 1.4068,
3770
+ "step": 3135
3771
+ },
3772
+ {
3773
+ "epoch": 12.12,
3774
+ "learning_rate": 0.0001320968910941225,
3775
+ "loss": 1.4855,
3776
+ "step": 3140
3777
+ },
3778
+ {
3779
+ "epoch": 12.14,
3780
+ "learning_rate": 0.0001304064643381061,
3781
+ "loss": 1.4212,
3782
+ "step": 3145
3783
+ },
3784
+ {
3785
+ "epoch": 12.16,
3786
+ "learning_rate": 0.00012848876816285777,
3787
+ "loss": 1.4849,
3788
+ "step": 3150
3789
+ },
3790
+ {
3791
+ "epoch": 12.18,
3792
+ "learning_rate": 0.00012635085415595244,
3793
+ "loss": 1.3912,
3794
+ "step": 3155
3795
+ },
3796
+ {
3797
+ "epoch": 12.2,
3798
+ "learning_rate": 0.00012400058367105247,
3799
+ "loss": 1.483,
3800
+ "step": 3160
3801
+ },
3802
+ {
3803
+ "epoch": 12.22,
3804
+ "learning_rate": 0.00012144659892081038,
3805
+ "loss": 1.3818,
3806
+ "step": 3165
3807
+ },
3808
+ {
3809
+ "epoch": 12.24,
3810
+ "learning_rate": 0.00011869829119846924,
3811
+ "loss": 1.4634,
3812
+ "step": 3170
3813
+ },
3814
+ {
3815
+ "epoch": 12.26,
3816
+ "learning_rate": 0.00011576576634500562,
3817
+ "loss": 1.4034,
3818
+ "step": 3175
3819
+ },
3820
+ {
3821
+ "epoch": 12.28,
3822
+ "learning_rate": 0.00011265980758879936,
3823
+ "loss": 1.4014,
3824
+ "step": 3180
3825
+ },
3826
+ {
3827
+ "epoch": 12.3,
3828
+ "learning_rate": 0.00010939183589447423,
3829
+ "loss": 1.4222,
3830
+ "step": 3185
3831
+ },
3832
+ {
3833
+ "epoch": 12.32,
3834
+ "learning_rate": 0.00010597386796670575,
3835
+ "loss": 1.4854,
3836
+ "step": 3190
3837
+ },
3838
+ {
3839
+ "epoch": 12.34,
3840
+ "learning_rate": 0.00010241847206343044,
3841
+ "loss": 1.4472,
3842
+ "step": 3195
3843
+ },
3844
+ {
3845
+ "epoch": 12.36,
3846
+ "learning_rate": 9.87387217809251e-05,
3847
+ "loss": 1.5271,
3848
+ "step": 3200
3849
+ },
3850
+ {
3851
+ "epoch": 12.37,
3852
+ "learning_rate": 9.494814798070321e-05,
3853
+ "loss": 1.401,
3854
+ "step": 3205
3855
+ },
3856
+ {
3857
+ "epoch": 12.39,
3858
+ "learning_rate": 9.106068903499514e-05,
3859
+ "loss": 1.5122,
3860
+ "step": 3210
3861
+ },
3862
+ {
3863
+ "epoch": 12.41,
3864
+ "learning_rate": 8.709063957376094e-05,
3865
+ "loss": 1.4755,
3866
+ "step": 3215
3867
+ },
3868
+ {
3869
+ "epoch": 12.43,
3870
+ "learning_rate": 8.30525979217072e-05,
3871
+ "loss": 1.4605,
3872
+ "step": 3220
3873
+ },
3874
+ {
3875
+ "epoch": 12.45,
3876
+ "learning_rate": 7.896141241858118e-05,
3877
+ "loss": 1.3958,
3878
+ "step": 3225
3879
+ },
3880
+ {
3881
+ "epoch": 12.47,
3882
+ "learning_rate": 7.48321268201337e-05,
3883
+ "loss": 1.4285,
3884
+ "step": 3230
3885
+ },
3886
+ {
3887
+ "epoch": 12.49,
3888
+ "learning_rate": 7.067992498051e-05,
3889
+ "loss": 1.4276,
3890
+ "step": 3235
3891
+ },
3892
+ {
3893
+ "epoch": 12.51,
3894
+ "learning_rate": 6.652007501948996e-05,
3895
+ "loss": 1.4174,
3896
+ "step": 3240
3897
+ },
3898
+ {
3899
+ "epoch": 12.53,
3900
+ "learning_rate": 6.236787317986674e-05,
3901
+ "loss": 1.4845,
3902
+ "step": 3245
3903
+ },
3904
+ {
3905
+ "epoch": 12.55,
3906
+ "learning_rate": 5.823858758141927e-05,
3907
+ "loss": 1.4357,
3908
+ "step": 3250
3909
+ },
3910
+ {
3911
+ "epoch": 12.57,
3912
+ "learning_rate": 5.414740207829325e-05,
3913
+ "loss": 1.4382,
3914
+ "step": 3255
3915
+ },
3916
+ {
3917
+ "epoch": 12.59,
3918
+ "learning_rate": 5.010936042623904e-05,
3919
+ "loss": 1.3395,
3920
+ "step": 3260
3921
+ },
3922
+ {
3923
+ "epoch": 12.61,
3924
+ "learning_rate": 4.6139310965004824e-05,
3925
+ "loss": 1.3898,
3926
+ "step": 3265
3927
+ },
3928
+ {
3929
+ "epoch": 12.63,
3930
+ "learning_rate": 4.225185201929675e-05,
3931
+ "loss": 1.4521,
3932
+ "step": 3270
3933
+ },
3934
+ {
3935
+ "epoch": 12.64,
3936
+ "learning_rate": 3.8461278219075304e-05,
3937
+ "loss": 1.4757,
3938
+ "step": 3275
3939
+ },
3940
+ {
3941
+ "epoch": 12.66,
3942
+ "learning_rate": 3.478152793656996e-05,
3943
+ "loss": 1.4128,
3944
+ "step": 3280
3945
+ },
3946
+ {
3947
+ "epoch": 12.68,
3948
+ "learning_rate": 3.122613203329423e-05,
3949
+ "loss": 1.4424,
3950
+ "step": 3285
3951
+ },
3952
+ {
3953
+ "epoch": 12.7,
3954
+ "learning_rate": 2.780816410552575e-05,
3955
+ "loss": 1.4287,
3956
+ "step": 3290
3957
+ },
3958
+ {
3959
+ "epoch": 12.72,
3960
+ "learning_rate": 2.454019241120062e-05,
3961
+ "loss": 1.3953,
3962
+ "step": 3295
3963
+ },
3964
+ {
3965
+ "epoch": 12.74,
3966
+ "learning_rate": 2.1434233654994707e-05,
3967
+ "loss": 1.3275,
3968
+ "step": 3300
3969
+ },
3970
+ {
3971
+ "epoch": 12.76,
3972
+ "learning_rate": 1.8501708801531077e-05,
3973
+ "loss": 1.3897,
3974
+ "step": 3305
3975
+ },
3976
+ {
3977
+ "epoch": 12.78,
3978
+ "learning_rate": 1.575340107918959e-05,
3979
+ "loss": 1.3657,
3980
+ "step": 3310
3981
+ },
3982
+ {
3983
+ "epoch": 12.8,
3984
+ "learning_rate": 1.319941632894751e-05,
3985
+ "loss": 1.3897,
3986
+ "step": 3315
3987
+ },
3988
+ {
3989
+ "epoch": 12.82,
3990
+ "learning_rate": 1.0849145844047538e-05,
3991
+ "loss": 1.4783,
3992
+ "step": 3320
3993
+ },
3994
+ {
3995
+ "epoch": 12.84,
3996
+ "learning_rate": 8.711231837142462e-06,
3997
+ "loss": 1.4102,
3998
+ "step": 3325
3999
+ },
4000
+ {
4001
+ "epoch": 12.86,
4002
+ "learning_rate": 6.793535661894092e-06,
4003
+ "loss": 1.4442,
4004
+ "step": 3330
4005
+ },
4006
+ {
4007
+ "epoch": 12.88,
4008
+ "learning_rate": 5.1031089058776675e-06,
4009
+ "loss": 1.3875,
4010
+ "step": 3335
4011
+ },
4012
+ {
4013
+ "epoch": 12.9,
4014
+ "learning_rate": 3.6461674610908637e-06,
4015
+ "loss": 1.4228,
4016
+ "step": 3340
4017
+ },
4018
+ {
4019
+ "epoch": 12.92,
4020
+ "learning_rate": 2.42806866741032e-06,
4021
+ "loss": 1.4015,
4022
+ "step": 3345
4023
+ },
4024
+ {
4025
+ "epoch": 12.93,
4026
+ "learning_rate": 1.453291613040815e-06,
4027
+ "loss": 1.3937,
4028
+ "step": 3350
4029
+ },
4030
+ {
4031
+ "epoch": 12.95,
4032
+ "learning_rate": 7.254206643977347e-07,
4033
+ "loss": 1.4905,
4034
+ "step": 3355
4035
+ },
4036
+ {
4037
+ "epoch": 12.97,
4038
+ "learning_rate": 2.4713228598269586e-07,
4039
+ "loss": 1.4419,
4040
+ "step": 3360
4041
+ },
4042
+ {
4043
+ "epoch": 12.99,
4044
+ "learning_rate": 2.0185198718462007e-08,
4045
+ "loss": 1.4331,
4046
+ "step": 3365
4047
+ },
4048
+ {
4049
+ "epoch": 13.01,
4050
+ "learning_rate": 4.5413912931266996e-08,
4051
+ "loss": 1.4014,
4052
+ "step": 3370
4053
+ },
4054
+ {
4055
+ "epoch": 13.03,
4056
+ "learning_rate": 3.227256597612364e-07,
4057
+ "loss": 1.3146,
4058
+ "step": 3375
4059
+ },
4060
+ {
4061
+ "epoch": 13.05,
4062
+ "learning_rate": 8.51100732284126e-07,
4063
+ "loss": 1.3623,
4064
+ "step": 3380
4065
+ },
4066
+ {
4067
+ "epoch": 13.07,
4068
+ "learning_rate": 1.62859623508997e-06,
4069
+ "loss": 1.3259,
4070
+ "step": 3385
4071
+ },
4072
+ {
4073
+ "epoch": 13.09,
4074
+ "learning_rate": 2.652353228531244e-06,
4075
+ "loss": 1.3975,
4076
+ "step": 3390
4077
+ },
4078
+ {
4079
+ "epoch": 13.11,
4080
+ "learning_rate": 3.9186072413696845e-06,
4081
+ "loss": 1.3585,
4082
+ "step": 3395
4083
+ },
4084
+ {
4085
+ "epoch": 13.13,
4086
+ "learning_rate": 5.422702113166566e-06,
4087
+ "loss": 1.3596,
4088
+ "step": 3400
4089
+ },
4090
+ {
4091
+ "epoch": 13.15,
4092
+ "learning_rate": 7.159107115516102e-06,
4093
+ "loss": 1.4021,
4094
+ "step": 3405
4095
+ },
4096
+ {
4097
+ "epoch": 13.17,
4098
+ "learning_rate": 9.121437289164265e-06,
4099
+ "loss": 1.4666,
4100
+ "step": 3410
4101
+ },
4102
+ {
4103
+ "epoch": 13.19,
4104
+ "learning_rate": 1.1302476922232583e-05,
4105
+ "loss": 1.327,
4106
+ "step": 3415
4107
+ },
4108
+ {
4109
+ "epoch": 13.2,
4110
+ "learning_rate": 1.3694206083212781e-05,
4111
+ "loss": 1.2798,
4112
+ "step": 3420
4113
+ },
4114
+ {
4115
+ "epoch": 13.22,
4116
+ "learning_rate": 1.628783011117153e-05,
4117
+ "loss": 1.3184,
4118
+ "step": 3425
4119
+ },
4120
+ {
4121
+ "epoch": 13.24,
4122
+ "learning_rate": 1.9073811954719624e-05,
4123
+ "loss": 1.3236,
4124
+ "step": 3430
4125
+ },
4126
+ {
4127
+ "epoch": 13.26,
4128
+ "learning_rate": 2.2041907240839828e-05,
4129
+ "loss": 1.3766,
4130
+ "step": 3435
4131
+ },
4132
+ {
4133
+ "epoch": 13.28,
4134
+ "learning_rate": 2.518120194461378e-05,
4135
+ "loss": 1.2779,
4136
+ "step": 3440
4137
+ },
4138
+ {
4139
+ "epoch": 13.3,
4140
+ "learning_rate": 2.8480152521337216e-05,
4141
+ "loss": 1.3743,
4142
+ "step": 3445
4143
+ },
4144
+ {
4145
+ "epoch": 13.32,
4146
+ "learning_rate": 3.1926628353448936e-05,
4147
+ "loss": 1.336,
4148
+ "step": 3450
4149
+ },
4150
+ {
4151
+ "epoch": 13.34,
4152
+ "learning_rate": 3.5507956356197615e-05,
4153
+ "loss": 1.3522,
4154
+ "step": 3455
4155
+ },
4156
+ {
4157
+ "epoch": 13.36,
4158
+ "learning_rate": 3.9210967578018804e-05,
4159
+ "loss": 1.3693,
4160
+ "step": 3460
4161
+ },
4162
+ {
4163
+ "epoch": 13.38,
4164
+ "learning_rate": 4.302204562427067e-05,
4165
+ "loss": 1.3374,
4166
+ "step": 3465
4167
+ },
4168
+ {
4169
+ "epoch": 13.4,
4170
+ "learning_rate": 4.692717672627317e-05,
4171
+ "loss": 1.3881,
4172
+ "step": 3470
4173
+ },
4174
+ {
4175
+ "epoch": 13.42,
4176
+ "learning_rate": 5.091200127153047e-05,
4177
+ "loss": 1.2653,
4178
+ "step": 3475
4179
+ },
4180
+ {
4181
+ "epoch": 13.44,
4182
+ "learning_rate": 5.496186660566713e-05,
4183
+ "loss": 1.3907,
4184
+ "step": 3480
4185
+ },
4186
+ {
4187
+ "epoch": 13.46,
4188
+ "learning_rate": 5.906188091190777e-05,
4189
+ "loss": 1.3586,
4190
+ "step": 3485
4191
+ },
4192
+ {
4193
+ "epoch": 13.47,
4194
+ "learning_rate": 6.319696796998643e-05,
4195
+ "loss": 1.3102,
4196
+ "step": 3490
4197
+ },
4198
+ {
4199
+ "epoch": 13.49,
4200
+ "learning_rate": 6.735192259312862e-05,
4201
+ "loss": 1.3599,
4202
+ "step": 3495
4203
+ },
4204
+ {
4205
+ "epoch": 13.51,
4206
+ "learning_rate": 7.151146653925592e-05,
4207
+ "loss": 1.3715,
4208
+ "step": 3500
4209
+ },
4210
+ {
4211
+ "epoch": 13.53,
4212
+ "learning_rate": 7.566030469082585e-05,
4213
+ "loss": 1.4274,
4214
+ "step": 3505
4215
+ },
4216
+ {
4217
+ "epoch": 13.55,
4218
+ "learning_rate": 7.978318129672468e-05,
4219
+ "loss": 1.2634,
4220
+ "step": 3510
4221
+ },
4222
+ {
4223
+ "epoch": 13.57,
4224
+ "learning_rate": 8.386493606940281e-05,
4225
+ "loss": 1.3939,
4226
+ "step": 3515
4227
+ },
4228
+ {
4229
+ "epoch": 13.59,
4230
+ "learning_rate": 8.789055993098241e-05,
4231
+ "loss": 1.4498,
4232
+ "step": 3520
4233
+ },
4234
+ {
4235
+ "epoch": 13.61,
4236
+ "learning_rate": 9.184525020334682e-05,
4237
+ "loss": 1.4425,
4238
+ "step": 3525
4239
+ },
4240
+ {
4241
+ "epoch": 13.63,
4242
+ "learning_rate": 9.571446503927972e-05,
4243
+ "loss": 1.3688,
4244
+ "step": 3530
4245
+ },
4246
+ {
4247
+ "epoch": 13.65,
4248
+ "learning_rate": 9.948397689449235e-05,
4249
+ "loss": 1.3409,
4250
+ "step": 3535
4251
+ },
4252
+ {
4253
+ "epoch": 13.67,
4254
+ "learning_rate": 0.00010313992484392988,
4255
+ "loss": 1.4686,
4256
+ "step": 3540
4257
+ },
4258
+ {
4259
+ "epoch": 13.69,
4260
+ "learning_rate": 0.00010666886554997249,
4261
+ "loss": 1.3646,
4262
+ "step": 3545
4263
+ },
4264
+ {
4265
+ "epoch": 13.71,
4266
+ "learning_rate": 0.00011005782269511996,
4267
+ "loss": 1.411,
4268
+ "step": 3550
4269
+ },
4270
+ {
4271
+ "epoch": 13.73,
4272
+ "learning_rate": 0.00011329433469739373,
4273
+ "loss": 1.3738,
4274
+ "step": 3555
4275
+ },
4276
+ {
4277
+ "epoch": 13.75,
4278
+ "learning_rate": 0.0001163665005329936,
4279
+ "loss": 1.3912,
4280
+ "step": 3560
4281
+ },
4282
+ {
4283
+ "epoch": 13.76,
4284
+ "learning_rate": 0.00011926302349772045,
4285
+ "loss": 1.3728,
4286
+ "step": 3565
4287
+ },
4288
+ {
4289
+ "epoch": 13.78,
4290
+ "learning_rate": 0.00012197325274624481,
4291
+ "loss": 1.3925,
4292
+ "step": 3570
4293
+ },
4294
+ {
4295
+ "epoch": 13.8,
4296
+ "learning_rate": 0.0001244872224564823,
4297
+ "loss": 1.3735,
4298
+ "step": 3575
4299
+ },
4300
+ {
4301
+ "epoch": 13.82,
4302
+ "learning_rate": 0.0001267956884750556,
4303
+ "loss": 1.4361,
4304
+ "step": 3580
4305
+ },
4306
+ {
4307
+ "epoch": 13.84,
4308
+ "learning_rate": 0.0001288901623091031,
4309
+ "loss": 1.4661,
4310
+ "step": 3585
4311
+ },
4312
+ {
4313
+ "epoch": 13.86,
4314
+ "learning_rate": 0.000130762942339434,
4315
+ "loss": 1.4177,
4316
+ "step": 3590
4317
+ },
4318
+ {
4319
+ "epoch": 13.88,
4320
+ "learning_rate": 0.00013240714214026112,
4321
+ "loss": 1.4351,
4322
+ "step": 3595
4323
+ },
4324
+ {
4325
+ "epoch": 13.9,
4326
+ "learning_rate": 0.00013381671580137345,
4327
+ "loss": 1.4243,
4328
+ "step": 3600
4329
+ },
4330
+ {
4331
+ "epoch": 13.92,
4332
+ "learning_rate": 0.00013498648015963804,
4333
+ "loss": 1.3258,
4334
+ "step": 3605
4335
+ },
4336
+ {
4337
+ "epoch": 13.94,
4338
+ "learning_rate": 0.00013591213385808236,
4339
+ "loss": 1.3917,
4340
+ "step": 3610
4341
+ },
4342
+ {
4343
+ "epoch": 13.96,
4344
+ "learning_rate": 0.00013659027316247394,
4345
+ "loss": 1.3626,
4346
+ "step": 3615
4347
+ },
4348
+ {
4349
+ "epoch": 13.98,
4350
+ "learning_rate": 0.00013701840447723958,
4351
+ "loss": 1.505,
4352
+ "step": 3620
4353
+ },
4354
+ {
4355
+ "epoch": 14.0,
4356
+ "learning_rate": 0.00013719495351470075,
4357
+ "loss": 1.3238,
4358
+ "step": 3625
4359
+ },
4360
+ {
4361
+ "epoch": 14.02,
4362
+ "learning_rate": 0.00013711927108390887,
4363
+ "loss": 1.3589,
4364
+ "step": 3630
4365
+ },
4366
+ {
4367
+ "epoch": 14.03,
4368
+ "learning_rate": 0.00013679163547779456,
4369
+ "loss": 1.4241,
4370
+ "step": 3635
4371
+ },
4372
+ {
4373
+ "epoch": 14.05,
4374
+ "learning_rate": 0.00013621325144985282,
4375
+ "loss": 1.4179,
4376
+ "step": 3640
4377
+ },
4378
+ {
4379
+ "epoch": 14.07,
4380
+ "learning_rate": 0.00013538624578412686,
4381
+ "loss": 1.3404,
4382
+ "step": 3645
4383
+ },
4384
+ {
4385
+ "epoch": 14.09,
4386
+ "learning_rate": 0.00013431365947478058,
4387
+ "loss": 1.3758,
4388
+ "step": 3650
4389
+ },
4390
+ {
4391
+ "epoch": 14.11,
4392
+ "learning_rate": 0.00013299943654401664,
4393
+ "loss": 1.4247,
4394
+ "step": 3655
4395
+ },
4396
+ {
4397
+ "epoch": 14.13,
4398
+ "learning_rate": 0.00013144840953945616,
4399
+ "loss": 1.3701,
4400
+ "step": 3660
4401
+ },
4402
+ {
4403
+ "epoch": 14.15,
4404
+ "learning_rate": 0.00012966628176431025,
4405
+ "loss": 1.3553,
4406
+ "step": 3665
4407
+ },
4408
+ {
4409
+ "epoch": 14.17,
4410
+ "learning_rate": 0.00012765960630568425,
4411
+ "loss": 1.381,
4412
+ "step": 3670
4413
+ },
4414
+ {
4415
+ "epoch": 14.19,
4416
+ "learning_rate": 0.00012543576193812774,
4417
+ "loss": 1.442,
4418
+ "step": 3675
4419
+ },
4420
+ {
4421
+ "epoch": 14.21,
4422
+ "learning_rate": 0.0001230029259910393,
4423
+ "loss": 1.3873,
4424
+ "step": 3680
4425
+ },
4426
+ {
4427
+ "epoch": 14.23,
4428
+ "learning_rate": 0.0001203700442796948,
4429
+ "loss": 1.3884,
4430
+ "step": 3685
4431
+ },
4432
+ {
4433
+ "epoch": 14.25,
4434
+ "learning_rate": 0.00011754679821046217,
4435
+ "loss": 1.3278,
4436
+ "step": 3690
4437
+ },
4438
+ {
4439
+ "epoch": 14.27,
4440
+ "learning_rate": 0.00011454356918116728,
4441
+ "loss": 1.3606,
4442
+ "step": 3695
4443
+ },
4444
+ {
4445
+ "epoch": 14.29,
4446
+ "learning_rate": 0.00011137140040750922,
4447
+ "loss": 1.2409,
4448
+ "step": 3700
4449
+ },
4450
+ {
4451
+ "epoch": 14.31,
4452
+ "learning_rate": 0.00010804195631589772,
4453
+ "loss": 1.411,
4454
+ "step": 3705
4455
+ },
4456
+ {
4457
+ "epoch": 14.32,
4458
+ "learning_rate": 0.00010456747965202607,
4459
+ "loss": 1.38,
4460
+ "step": 3710
4461
+ },
4462
+ {
4463
+ "epoch": 14.34,
4464
+ "learning_rate": 0.00010096074646289782,
4465
+ "loss": 1.3982,
4466
+ "step": 3715
4467
+ },
4468
+ {
4469
+ "epoch": 14.36,
4470
+ "learning_rate": 9.723501911784598e-05,
4471
+ "loss": 1.3883,
4472
+ "step": 3720
4473
+ },
4474
+ {
4475
+ "epoch": 14.38,
4476
+ "learning_rate": 9.340399754128775e-05,
4477
+ "loss": 1.3611,
4478
+ "step": 3725
4479
+ },
4480
+ {
4481
+ "epoch": 14.4,
4482
+ "learning_rate": 8.948176883653932e-05,
4483
+ "loss": 1.4344,
4484
+ "step": 3730
4485
+ },
4486
+ {
4487
+ "epoch": 14.42,
4488
+ "learning_rate": 8.548275548593159e-05,
4489
+ "loss": 1.2783,
4490
+ "step": 3735
4491
+ },
4492
+ {
4493
+ "epoch": 14.44,
4494
+ "learning_rate": 8.142166231769664e-05,
4495
+ "loss": 1.335,
4496
+ "step": 3740
4497
+ },
4498
+ {
4499
+ "epoch": 14.46,
4500
+ "learning_rate": 7.731342243463601e-05,
4501
+ "loss": 1.3506,
4502
+ "step": 3745
4503
+ },
4504
+ {
4505
+ "epoch": 14.48,
4506
+ "learning_rate": 7.317314230339991e-05,
4507
+ "loss": 1.4243,
4508
+ "step": 3750
4509
+ },
4510
+ {
4511
+ "epoch": 14.5,
4512
+ "learning_rate": 6.901604620628517e-05,
4513
+ "loss": 1.3969,
4514
+ "step": 3755
4515
+ },
4516
+ {
4517
+ "epoch": 14.52,
4518
+ "learning_rate": 6.485742025981473e-05,
4519
+ "loss": 1.3597,
4520
+ "step": 3760
4521
+ },
4522
+ {
4523
+ "epoch": 14.54,
4524
+ "learning_rate": 6.071255620594063e-05,
4525
+ "loss": 1.4289,
4526
+ "step": 3765
4527
+ },
4528
+ {
4529
+ "epoch": 14.56,
4530
+ "learning_rate": 5.659669518256613e-05,
4531
+ "loss": 1.3466,
4532
+ "step": 3770
4533
+ },
4534
+ {
4535
+ "epoch": 14.58,
4536
+ "learning_rate": 5.252497168014461e-05,
4537
+ "loss": 1.279,
4538
+ "step": 3775
4539
+ },
4540
+ {
4541
+ "epoch": 14.59,
4542
+ "learning_rate": 4.8512357890428955e-05,
4543
+ "loss": 1.3786,
4544
+ "step": 3780
4545
+ },
4546
+ {
4547
+ "epoch": 14.61,
4548
+ "learning_rate": 4.457360865201619e-05,
4549
+ "loss": 1.2442,
4550
+ "step": 3785
4551
+ },
4552
+ {
4553
+ "epoch": 14.63,
4554
+ "learning_rate": 4.072320719512437e-05,
4555
+ "loss": 1.2467,
4556
+ "step": 3790
4557
+ },
4558
+ {
4559
+ "epoch": 14.65,
4560
+ "learning_rate": 3.697531188510021e-05,
4561
+ "loss": 1.326,
4562
+ "step": 3795
4563
+ },
4564
+ {
4565
+ "epoch": 14.67,
4566
+ "learning_rate": 3.3343704160496265e-05,
4567
+ "loss": 1.3049,
4568
+ "step": 3800
4569
+ },
4570
+ {
4571
+ "epoch": 14.69,
4572
+ "learning_rate": 2.9841737857150583e-05,
4573
+ "loss": 1.3741,
4574
+ "step": 3805
4575
+ },
4576
+ {
4577
+ "epoch": 14.71,
4578
+ "learning_rate": 2.648229010460623e-05,
4579
+ "loss": 1.3036,
4580
+ "step": 3810
4581
+ },
4582
+ {
4583
+ "epoch": 14.73,
4584
+ "learning_rate": 2.3277713975440426e-05,
4585
+ "loss": 1.3118,
4586
+ "step": 3815
4587
+ },
4588
+ {
4589
+ "epoch": 14.75,
4590
+ "learning_rate": 2.0239793061604814e-05,
4591
+ "loss": 1.3541,
4592
+ "step": 3820
4593
+ },
4594
+ {
4595
+ "epoch": 14.77,
4596
+ "learning_rate": 1.7379698144815434e-05,
4597
+ "loss": 1.3646,
4598
+ "step": 3825
4599
+ },
4600
+ {
4601
+ "epoch": 14.79,
4602
+ "learning_rate": 1.4707946120313696e-05,
4603
+ "loss": 1.3313,
4604
+ "step": 3830
4605
+ },
4606
+ {
4607
+ "epoch": 14.81,
4608
+ "learning_rate": 1.2234361325042786e-05,
4609
+ "loss": 1.3923,
4610
+ "step": 3835
4611
+ },
4612
+ {
4613
+ "epoch": 14.83,
4614
+ "learning_rate": 9.968039412440925e-06,
4615
+ "loss": 1.2976,
4616
+ "step": 3840
4617
+ },
4618
+ {
4619
+ "epoch": 14.85,
4620
+ "learning_rate": 7.917313906685554e-06,
4621
+ "loss": 1.3127,
4622
+ "step": 3845
4623
+ },
4624
+ {
4625
+ "epoch": 14.86,
4626
+ "learning_rate": 6.089725559373968e-06,
4627
+ "loss": 1.3699,
4628
+ "step": 3850
4629
+ },
4630
+ {
4631
+ "epoch": 14.88,
4632
+ "learning_rate": 4.4919946213203235e-06,
4633
+ "loss": 1.2705,
4634
+ "step": 3855
4635
+ },
4636
+ {
4637
+ "epoch": 14.9,
4638
+ "learning_rate": 3.129996131426458e-06,
4639
+ "loss": 1.3474,
4640
+ "step": 3860
4641
+ },
4642
+ {
4643
+ "epoch": 14.92,
4644
+ "learning_rate": 2.00873831349432e-06,
4645
+ "loss": 1.3704,
4646
+ "step": 3865
4647
+ },
4648
+ {
4649
+ "epoch": 14.94,
4650
+ "learning_rate": 1.1323441604147607e-06,
4651
+ "loss": 1.3555,
4652
+ "step": 3870
4653
+ },
4654
+ {
4655
+ "epoch": 14.96,
4656
+ "learning_rate": 5.040362734534312e-07,
4657
+ "loss": 1.3937,
4658
+ "step": 3875
4659
+ },
4660
+ {
4661
+ "epoch": 14.98,
4662
+ "learning_rate": 1.2612501237755945e-07,
4663
+ "loss": 1.425,
4664
+ "step": 3880
4665
+ },
4666
+ {
4667
+ "epoch": 15.0,
4668
+ "learning_rate": 0.0,
4669
+ "loss": 1.42,
4670
+ "step": 3885
4671
  }
4672
  ],
4673
+ "max_steps": 3885,
4674
+ "num_train_epochs": 15,
4675
+ "total_flos": 4058518487040000.0,
4676
  "trial_name": null,
4677
  "trial_params": null
4678
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:389b8ecf20b41edcee9e960fe36b20108477bd42c5446772215ffda927e366f3
+ oid sha256:0c8781197db3a3403466497c5da7316ba5318202ed62c8d7147bd3a02f7bd353
  size 2671