Training in progress, step 239, checkpoint

Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +37 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +24 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +31 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +207 -0
last-checkpoint/trainer_state.json +1706 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-3B
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-3B",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "embed_tokens",
+    "lm_head"
+  ],
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "k_proj",
+    "o_proj",
+    "down_proj",
+    "up_proj",
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
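
For reference, the effective LoRA scaling implied by this config can be worked out directly. The sketch below is an illustration only and is not part of the commit: it assumes the usual LoRA convention of scaling the low-rank update by `lora_alpha / r` when `use_rslora` is false (and by `lora_alpha / sqrt(r)` when it is true). The helper name `lora_scaling` is hypothetical, and only a subset of the config fields is reproduced.

```python
import json
import math

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "r": 32,
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Return the multiplier applied to the low-rank update.

    Assumption: alpha / r for plain LoRA, alpha / sqrt(r) when
    rank-stabilized LoRA (use_rslora) is enabled.
    """
    alpha = config["lora_alpha"]
    rank = config["r"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(rank)
    return alpha / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"LoRA scaling: {scaling}")  # 16 / 32 = 0.5
```

Under that assumption, the adapter update for each of the seven target projection modules is multiplied by 0.5 before being added to the frozen base weights.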

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3305692d35b74b4005d38dcd6bd52131ab98072dcd251e4840ccb14462988419
+size 1484196216
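
The three added lines above are a Git LFS pointer (spec version, content hash, and byte size) rather than the tensor data itself; the other binary files in this commit follow the same layout. A minimal sketch of reading such a pointer is shown below. The function name `parse_lfs_pointer` and the returned dictionary keys are hypothetical, and the leading `+` diff markers are assumed to have been stripped already.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents for adapter_model.safetensors, copied from the diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:3305692d35b74b4005d38dcd6bd52131ab98072dcd251e4840ccb14462988419
size 1484196216
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['hash_algo']} digest, {info['size_bytes'] / 2**30:.2f} GiB")
```

The size is much larger than a bare rank-32 adapter would normally be, which is consistent with `modules_to_save` storing full copies of `embed_tokens` and `lm_head` alongside the LoRA matrices.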

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render.

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e00b34523a67b3bfe28f1c0874e7d90b2a372e85cc94c15b5e33bb1923c29df
+size 2968683840

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:664cf91330e3a233afde7b7c1f1a019063aac8c1a125dbd6950c74fb90893118
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:63c2fbe0dd778bed0f6058f5bb7c39f95ac01aa2b0a9c287de935e25b97fbedf
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,207 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1706 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.01198791177097571,
+  "eval_steps": 500,
+  "global_step": 239,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 5.0158626656802136e-05,
+      "grad_norm": 3.5492336750030518,
+      "learning_rate": 1.0000000000000002e-06,
+      "loss": 2.1967,
+      "step": 1
+    },
+    {
+      "epoch": 0.00010031725331360427,
+      "grad_norm": 4.093388080596924,
+      "learning_rate": 2.0000000000000003e-06,
+      "loss": 2.074,
+      "step": 2
+    },
+    {
+      "epoch": 0.00015047587997040642,
+      "grad_norm": 5.221968650817871,
+      "learning_rate": 3e-06,
+      "loss": 1.8972,
+      "step": 3
+    },
+    {
+      "epoch": 0.00020063450662720855,
+      "grad_norm": 5.642177104949951,
+      "learning_rate": 4.000000000000001e-06,
+      "loss": 1.8401,
+      "step": 4
+    },
+    {
+      "epoch": 0.0002507931332840107,
+      "grad_norm": 5.066410541534424,
+      "learning_rate": 5e-06,
+      "loss": 1.8184,
+      "step": 5
+    },
+    {
+      "epoch": 0.00030095175994081285,
+      "grad_norm": 6.9239301681518555,
+      "learning_rate": 6e-06,
+      "loss": 1.8774,
+      "step": 6
+    },
+    {
+      "epoch": 0.00035111038659761494,
+      "grad_norm": 4.572519302368164,
+      "learning_rate": 7.000000000000001e-06,
+      "loss": 1.7552,
+      "step": 7
+    },
+    {
+      "epoch": 0.0004012690132544171,
+      "grad_norm": 5.454565525054932,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 1.6517,
+      "step": 8
+    },
+    {
+      "epoch": 0.00045142763991121924,
+      "grad_norm": 3.9561901092529297,
+      "learning_rate": 9e-06,
+      "loss": 1.937,
+      "step": 9
+    },
+    {
+      "epoch": 0.0005015862665680214,
+      "grad_norm": 4.928288459777832,
+      "learning_rate": 1e-05,
+      "loss": 1.8208,
+      "step": 10
+    },
+    {
+      "epoch": 0.0005517448932248235,
+      "grad_norm": 5.23153018951416,
+      "learning_rate": 1.1000000000000001e-05,
+      "loss": 1.8182,
+      "step": 11
+    },
+    {
+      "epoch": 0.0006019035198816257,
+      "grad_norm": 5.184555530548096,
+      "learning_rate": 1.2e-05,
+      "loss": 1.6997,
+      "step": 12
+    },
+    {
+      "epoch": 0.0006520621465384278,
+      "grad_norm": 4.049405574798584,
+      "learning_rate": 1.3000000000000001e-05,
+      "loss": 1.7283,
+      "step": 13
+    },
+    {
+      "epoch": 0.0007022207731952299,
+      "grad_norm": 5.253482818603516,
+      "learning_rate": 1.4000000000000001e-05,
+      "loss": 1.7556,
+      "step": 14
+    },
+    {
+      "epoch": 0.0007523793998520321,
+      "grad_norm": 7.162256717681885,
+      "learning_rate": 1.5e-05,
+      "loss": 1.7063,
+      "step": 15
+    },
+    {
+      "epoch": 0.0008025380265088342,
+      "grad_norm": 6.176828384399414,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 1.9373,
+      "step": 16
+    },
+    {
+      "epoch": 0.0008526966531656363,
+      "grad_norm": 5.38509464263916,
+      "learning_rate": 1.7000000000000003e-05,
+      "loss": 1.8507,
+      "step": 17
+    },
+    {
+      "epoch": 0.0009028552798224385,
+      "grad_norm": 5.271251678466797,
+      "learning_rate": 1.8e-05,
+      "loss": 1.7382,
+      "step": 18
+    },
+    {
+      "epoch": 0.0009530139064792406,
+      "grad_norm": 4.529239654541016,
+      "learning_rate": 1.9e-05,
+      "loss": 1.7572,
+      "step": 19
+    },
+    {
+      "epoch": 0.0010031725331360428,
+      "grad_norm": 6.6834940910339355,
+      "learning_rate": 2e-05,
+      "loss": 1.6145,
+      "step": 20
+    },
+    {
+      "epoch": 0.0010533311597928448,
+      "grad_norm": 5.881002902984619,
+      "learning_rate": 2.1e-05,
+      "loss": 1.8464,
+      "step": 21
+    },
+    {
+      "epoch": 0.001103489786449647,
+      "grad_norm": 4.667331695556641,
+      "learning_rate": 2.2000000000000003e-05,
+      "loss": 1.9726,
+      "step": 22
+    },
+    {
+      "epoch": 0.0011536484131064492,
+      "grad_norm": 4.439915180206299,
+      "learning_rate": 2.3000000000000003e-05,
+      "loss": 1.9501,
+      "step": 23
+    },
+    {
+      "epoch": 0.0012038070397632514,
+      "grad_norm": 4.341642379760742,
+      "learning_rate": 2.4e-05,
+      "loss": 1.6347,
+      "step": 24
+    },
+    {
+      "epoch": 0.0012539656664200534,
+      "grad_norm": 5.318514823913574,
+      "learning_rate": 2.5e-05,
+      "loss": 2.0127,
+      "step": 25
+    },
+    {
+      "epoch": 0.0013041242930768556,
+      "grad_norm": 4.649798393249512,
+      "learning_rate": 2.6000000000000002e-05,
+      "loss": 1.533,
+      "step": 26
+    },
+    {
+      "epoch": 0.0013542829197336578,
+      "grad_norm": 5.03102970123291,
+      "learning_rate": 2.7000000000000002e-05,
+      "loss": 1.6971,
+      "step": 27
+    },
+    {
+      "epoch": 0.0014044415463904598,
+      "grad_norm": 6.526177406311035,
+      "learning_rate": 2.8000000000000003e-05,
+      "loss": 1.4764,
+      "step": 28
+    },
+    {
+      "epoch": 0.001454600173047262,
+      "grad_norm": 4.814057350158691,
+      "learning_rate": 2.9e-05,
+      "loss": 1.8919,
+      "step": 29
+    },
+    {
+      "epoch": 0.0015047587997040642,
+      "grad_norm": 5.465883255004883,
+      "learning_rate": 3e-05,
+      "loss": 1.8328,
+      "step": 30
+    },
+    {
+      "epoch": 0.0015549174263608662,
+      "grad_norm": 5.6578264236450195,
+      "learning_rate": 3.1e-05,
+      "loss": 1.6715,
+      "step": 31
+    },
+    {
+      "epoch": 0.0016050760530176684,
+      "grad_norm": 5.566780090332031,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 1.8483,
+      "step": 32
+    },
+    {
+      "epoch": 0.0016552346796744706,
+      "grad_norm": 5.120264053344727,
+      "learning_rate": 3.3e-05,
+      "loss": 1.8823,
+      "step": 33
+    },
+    {
+      "epoch": 0.0017053933063312726,
+      "grad_norm": 5.753010272979736,
+      "learning_rate": 3.4000000000000007e-05,
+      "loss": 1.6993,
+      "step": 34
+    },
+    {
+      "epoch": 0.0017555519329880748,
+      "grad_norm": 6.242431640625,
+      "learning_rate": 3.5e-05,
+      "loss": 1.8161,
+      "step": 35
+    },
+    {
+      "epoch": 0.001805710559644877,
+      "grad_norm": 5.681616306304932,
+      "learning_rate": 3.6e-05,
+      "loss": 1.8614,
+      "step": 36
+    },
+    {
+      "epoch": 0.001855869186301679,
+      "grad_norm": 6.1227569580078125,
+      "learning_rate": 3.7e-05,
+      "loss": 1.7892,
+      "step": 37
+    },
+    {
+      "epoch": 0.0019060278129584812,
+      "grad_norm": 5.99332332611084,
+      "learning_rate": 3.8e-05,
+      "loss": 1.4083,
+      "step": 38
+    },
+    {
+      "epoch": 0.0019561864396152834,
+      "grad_norm": 5.745953559875488,
+      "learning_rate": 3.9000000000000006e-05,
+      "loss": 1.4741,
+      "step": 39
+    },
+    {
+      "epoch": 0.0020063450662720856,
+      "grad_norm": 5.3738837242126465,
+      "learning_rate": 4e-05,
+      "loss": 1.8169,
+      "step": 40
+    },
+    {
+      "epoch": 0.0020565036929288878,
+      "grad_norm": 5.091897010803223,
+      "learning_rate": 4.1e-05,
+      "loss": 1.67,
+      "step": 41
+    },
+    {
+      "epoch": 0.0021066623195856895,
+      "grad_norm": 5.49876070022583,
+      "learning_rate": 4.2e-05,
+      "loss": 1.4768,
+      "step": 42
+    },
+    {
+      "epoch": 0.0021568209462424917,
+      "grad_norm": 8.60002613067627,
+      "learning_rate": 4.3e-05,
+      "loss": 1.6741,
+      "step": 43
+    },
+    {
+      "epoch": 0.002206979572899294,
+      "grad_norm": 5.658391952514648,
+      "learning_rate": 4.4000000000000006e-05,
+      "loss": 1.7292,
+      "step": 44
+    },
+    {
+      "epoch": 0.002257138199556096,
+      "grad_norm": 6.834291934967041,
+      "learning_rate": 4.5e-05,
+      "loss": 1.7368,
+      "step": 45
+    },
+    {
+      "epoch": 0.0023072968262128984,
+      "grad_norm": 9.042109489440918,
+      "learning_rate": 4.600000000000001e-05,
+      "loss": 1.5894,
+      "step": 46
+    },
+    {
+      "epoch": 0.0023574554528697006,
+      "grad_norm": 8.759542465209961,
+      "learning_rate": 4.7e-05,
+      "loss": 1.5485,
+      "step": 47
+    },
+    {
+      "epoch": 0.0024076140795265028,
+      "grad_norm": 8.1019868850708,
+      "learning_rate": 4.8e-05,
+      "loss": 1.3596,
+      "step": 48
+    },
+    {
+      "epoch": 0.0024577727061833045,
+      "grad_norm": 10.25011157989502,
+      "learning_rate": 4.9e-05,
+      "loss": 1.2274,
+      "step": 49
+    },
+    {
+      "epoch": 0.0025079313328401067,
+      "grad_norm": 10.12292194366455,
+      "learning_rate": 5e-05,
+      "loss": 1.1246,
+      "step": 50
+    },
+    {
+      "epoch": 0.002558089959496909,
+      "grad_norm": 3.036353349685669,
+      "learning_rate": 5.1000000000000006e-05,
+      "loss": 1.9104,
+      "step": 51
+    },
+    {
+      "epoch": 0.002608248586153711,
+      "grad_norm": 3.3351845741271973,
+      "learning_rate": 5.2000000000000004e-05,
+      "loss": 1.6544,
+      "step": 52
+    },
+    {
+      "epoch": 0.0026584072128105134,
+      "grad_norm": 4.553438186645508,
+      "learning_rate": 5.300000000000001e-05,
+      "loss": 1.777,
+      "step": 53
+    },
+    {
+      "epoch": 0.0027085658394673156,
+      "grad_norm": 5.190873622894287,
+      "learning_rate": 5.4000000000000005e-05,
+      "loss": 1.7235,
+      "step": 54
+    },
+    {
+      "epoch": 0.0027587244661241173,
+      "grad_norm": 4.517773628234863,
+      "learning_rate": 5.500000000000001e-05,
+      "loss": 1.515,
+      "step": 55
+    },
+    {
+      "epoch": 0.0028088830927809195,
+      "grad_norm": 4.626497268676758,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 1.7808,
+      "step": 56
+    },
+    {
+      "epoch": 0.0028590417194377217,
+      "grad_norm": 6.360121250152588,
+      "learning_rate": 5.6999999999999996e-05,
+      "loss": 1.8502,
+      "step": 57
+    },
+    {
+      "epoch": 0.002909200346094524,
+      "grad_norm": 6.034987926483154,
+      "learning_rate": 5.8e-05,
+      "loss": 1.8749,
+      "step": 58
+    },
+    {
+      "epoch": 0.002959358972751326,
+      "grad_norm": 5.864983558654785,
+      "learning_rate": 5.9e-05,
+      "loss": 1.6335,
+      "step": 59
+    },
+    {
+      "epoch": 0.0030095175994081283,
+      "grad_norm": 5.735439300537109,
+      "learning_rate": 6e-05,
+      "loss": 1.7521,
+      "step": 60
+    },
+    {
+      "epoch": 0.0030596762260649306,
+      "grad_norm": 4.080763816833496,
+      "learning_rate": 6.1e-05,
+      "loss": 1.3012,
+      "step": 61
+    },
+    {
+      "epoch": 0.0031098348527217323,
+      "grad_norm": 4.825160026550293,
+      "learning_rate": 6.2e-05,
+      "loss": 1.6357,
+      "step": 62
+    },
+    {
+      "epoch": 0.0031599934793785345,
+      "grad_norm": 4.864975929260254,
+      "learning_rate": 6.3e-05,
+      "loss": 1.8173,
+      "step": 63
+    },
+    {
+      "epoch": 0.0032101521060353367,
+      "grad_norm": 5.236722946166992,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 1.6182,
+      "step": 64
+    },
+    {
+      "epoch": 0.003260310732692139,
+      "grad_norm": 5.208855628967285,
+      "learning_rate": 6.500000000000001e-05,
+      "loss": 1.6164,
+      "step": 65
+    },
+    {
+      "epoch": 0.003310469359348941,
+      "grad_norm": 4.640886306762695,
+      "learning_rate": 6.6e-05,
+      "loss": 1.4392,
+      "step": 66
+    },
+    {
+      "epoch": 0.0033606279860057433,
+      "grad_norm": 6.360536575317383,
+      "learning_rate": 6.7e-05,
+      "loss": 1.555,
+      "step": 67
+    },
+    {
+      "epoch": 0.003410786612662545,
+      "grad_norm": 5.957921028137207,
+      "learning_rate": 6.800000000000001e-05,
+      "loss": 1.7957,
+      "step": 68
+    },
+    {
+      "epoch": 0.0034609452393193473,
+      "grad_norm": 4.701836109161377,
+      "learning_rate": 6.9e-05,
+      "loss": 1.6242,
+      "step": 69
+    },
+    {
+      "epoch": 0.0035111038659761495,
+      "grad_norm": 5.555307388305664,
+      "learning_rate": 7e-05,
+      "loss": 1.2819,
+      "step": 70
+    },
+    {
+      "epoch": 0.0035612624926329517,
+      "grad_norm": 5.1163177490234375,
+      "learning_rate": 7.1e-05,
+      "loss": 1.4986,
+      "step": 71
+    },
+    {
+      "epoch": 0.003611421119289754,
+      "grad_norm": 6.242301940917969,
+      "learning_rate": 7.2e-05,
+      "loss": 1.9205,
+      "step": 72
+    },
+    {
+      "epoch": 0.003661579745946556,
+      "grad_norm": 3.3643064498901367,
+      "learning_rate": 7.3e-05,
+      "loss": 1.6765,
+      "step": 73
+    },
+    {
+      "epoch": 0.003711738372603358,
+      "grad_norm": 6.956996917724609,
+      "learning_rate": 7.4e-05,
+      "loss": 1.7204,
+      "step": 74
+    },
+    {
+      "epoch": 0.00376189699926016,
+      "grad_norm": 4.798349380493164,
+      "learning_rate": 7.500000000000001e-05,
+      "loss": 1.8442,
+      "step": 75
+    },
+    {
+      "epoch": 0.0038120556259169623,
+      "grad_norm": 3.9241135120391846,
+      "learning_rate": 7.6e-05,
+      "loss": 1.6925,
+      "step": 76
+    },
+    {
+      "epoch": 0.0038622142525737645,
+      "grad_norm": 7.881755828857422,
+      "learning_rate": 7.7e-05,
+      "loss": 1.7764,
+      "step": 77
+    },
+    {
+      "epoch": 0.003912372879230567,
+      "grad_norm": 6.434783458709717,
+      "learning_rate": 7.800000000000001e-05,
+      "loss": 1.5407,
+      "step": 78
+    },
+    {
+      "epoch": 0.0039625315058873685,
+      "grad_norm": 4.235235691070557,
+      "learning_rate": 7.900000000000001e-05,
+      "loss": 1.9038,
+      "step": 79
+    },
+    {
+      "epoch": 0.004012690132544171,
+      "grad_norm": 6.253316879272461,
+      "learning_rate": 8e-05,
+      "loss": 1.6845,
+      "step": 80
+    },
+    {
+      "epoch": 0.004062848759200973,
+      "grad_norm": 4.719666957855225,
+      "learning_rate": 8.1e-05,
+      "loss": 1.8441,
+      "step": 81
+    },
+    {
+      "epoch": 0.0041130073858577755,
+      "grad_norm": 4.454250335693359,
+      "learning_rate": 8.2e-05,
+      "loss": 1.9411,
+      "step": 82
+    },
+    {
+      "epoch": 0.004163166012514577,
+      "grad_norm": 5.313316345214844,
+      "learning_rate": 8.3e-05,
+      "loss": 1.5968,
+      "step": 83
+    },
+    {
+      "epoch": 0.004213324639171379,
+      "grad_norm": 4.927578449249268,
+      "learning_rate": 8.4e-05,
+      "loss": 1.5322,
+      "step": 84
+    },
+    {
+      "epoch": 0.004263483265828182,
+      "grad_norm": 4.616065979003906,
+      "learning_rate": 8.5e-05,
+      "loss": 1.748,
+      "step": 85
+    },
+    {
+      "epoch": 0.0043136418924849835,
+      "grad_norm": 6.438388347625732,
+      "learning_rate": 8.6e-05,
+      "loss": 1.7049,
+      "step": 86
+    },
+    {
+      "epoch": 0.004363800519141786,
+      "grad_norm": 4.77520751953125,
+      "learning_rate": 8.7e-05,
+      "loss": 1.6878,
+      "step": 87
+    },
+    {
+      "epoch": 0.004413959145798588,
+      "grad_norm": 5.103641033172607,
+      "learning_rate": 8.800000000000001e-05,
+      "loss": 1.4719,
+      "step": 88
+    },
+    {
+      "epoch": 0.0044641177724553905,
+      "grad_norm": 4.581231594085693,
+      "learning_rate": 8.900000000000001e-05,
+      "loss": 1.7409,
+      "step": 89
+    },
+    {
+      "epoch": 0.004514276399112192,
+      "grad_norm": 5.612283706665039,
+      "learning_rate": 9e-05,
+      "loss": 1.7616,
+      "step": 90
+    },
+    {
+      "epoch": 0.004564435025768994,
+      "grad_norm": 6.311402320861816,
+      "learning_rate": 9.1e-05,
+      "loss": 1.4114,
+      "step": 91
+    },
+    {
+      "epoch": 0.004614593652425797,
+      "grad_norm": 9.486177444458008,
+      "learning_rate": 9.200000000000001e-05,
+      "loss": 1.1956,
+      "step": 92
+    },
+    {
+      "epoch": 0.0046647522790825985,
+      "grad_norm": 7.196384906768799,
+      "learning_rate": 9.300000000000001e-05,
+      "loss": 1.706,
+      "step": 93
+    },
+    {
+      "epoch": 0.004714910905739401,
+      "grad_norm": 5.772494316101074,
+      "learning_rate": 9.4e-05,
+      "loss": 1.5761,
+      "step": 94
+    },
+    {
+      "epoch": 0.004765069532396203,
+      "grad_norm": 7.097485542297363,
+      "learning_rate": 9.5e-05,
+      "loss": 1.4743,
+      "step": 95
+    },
+    {
+      "epoch": 0.0048152281590530055,
+      "grad_norm": 7.195338249206543,
+      "learning_rate": 9.6e-05,
+      "loss": 1.4798,
+      "step": 96
+    },
+    {
+      "epoch": 0.004865386785709807,
+      "grad_norm": 7.066890239715576,
+      "learning_rate": 9.7e-05,
+      "loss": 1.1984,
+      "step": 97
+    },
+    {
+      "epoch": 0.004915545412366609,
+      "grad_norm": 8.48562240600586,
+      "learning_rate": 9.8e-05,
+      "loss": 1.3506,
+      "step": 98
+    },
+    {
+      "epoch": 0.004965704039023412,
+      "grad_norm": 8.873973846435547,
+      "learning_rate": 9.900000000000001e-05,
+      "loss": 1.1898,
+      "step": 99
+    },
+    {
+      "epoch": 0.0050158626656802135,
+      "grad_norm": 9.311749458312988,
+      "learning_rate": 0.0001,
+      "loss": 0.9361,
+      "step": 100
+    },
+    {
+      "epoch": 0.005066021292337016,
+      "grad_norm": 3.371002674102783,
+      "learning_rate": 9.999999993078908e-05,
+      "loss": 1.8373,
+      "step": 101
+    },
+    {
+      "epoch": 0.005116179918993818,
+      "grad_norm": 3.5822818279266357,
+      "learning_rate": 9.999999972315628e-05,
+      "loss": 1.6158,
+      "step": 102
+    },
+    {
+      "epoch": 0.00516633854565062,
+      "grad_norm": 3.606417179107666,
+      "learning_rate": 9.999999937710161e-05,
+      "loss": 1.6813,
+      "step": 103
+    },
+    {
+      "epoch": 0.005216497172307422,
+      "grad_norm": 5.331118583679199,
+      "learning_rate": 9.999999889262508e-05,
+      "loss": 1.7903,
+      "step": 104
+    },
+    {
+      "epoch": 0.005266655798964224,
+      "grad_norm": 4.608427047729492,
+      "learning_rate": 9.999999826972668e-05,
+      "loss": 1.7265,
+      "step": 105
+    },
+    {
+      "epoch": 0.005316814425621027,
+      "grad_norm": 5.363006114959717,
+      "learning_rate": 9.999999750840643e-05,
+      "loss": 1.8263,
+      "step": 106
+    },
+    {
+      "epoch": 0.0053669730522778285,
+      "grad_norm": 4.672063827514648,
+      "learning_rate": 9.99999966086643e-05,
+      "loss": 1.7855,
+      "step": 107
+    },
+    {
+      "epoch": 0.005417131678934631,
+      "grad_norm": 5.497429370880127,
+      "learning_rate": 9.999999557050034e-05,
+      "loss": 1.4177,
+      "step": 108
+    },
+    {
+      "epoch": 0.005467290305591433,
+      "grad_norm": 6.637028694152832,
+      "learning_rate": 9.99999943939145e-05,
+      "loss": 1.7589,
+      "step": 109
+    },
+    {
+      "epoch": 0.005517448932248235,
+      "grad_norm": 3.890974998474121,
+      "learning_rate": 9.999999307890684e-05,
+      "loss": 1.498,
+      "step": 110
+    },
+    {
+      "epoch": 0.005567607558905037,
+      "grad_norm": 3.6080667972564697,
+      "learning_rate": 9.99999916254773e-05,
+      "loss": 1.3389,
+      "step": 111
+    },
+    {
+      "epoch": 0.005617766185561839,
+      "grad_norm": 3.9386773109436035,
+      "learning_rate": 9.999999003362595e-05,
+      "loss": 1.5423,
+      "step": 112
+    },
+    {
+      "epoch": 0.005667924812218642,
+      "grad_norm": 3.796339511871338,
+      "learning_rate": 9.999998830335273e-05,
+      "loss": 1.6022,
+      "step": 113
+    },
+    {
+      "epoch": 0.0057180834388754435,
+      "grad_norm": 5.058220386505127,
+      "learning_rate": 9.999998643465769e-05,
+      "loss": 1.6884,
+      "step": 114
+    },
+    {
+      "epoch": 0.005768242065532246,
+      "grad_norm": 4.7729692459106445,
+      "learning_rate": 9.999998442754082e-05,
+      "loss": 1.2262,
+      "step": 115
+    },
+    {
+      "epoch": 0.005818400692189048,
+      "grad_norm": 5.240146160125732,
+      "learning_rate": 9.999998228200212e-05,
+      "loss": 1.569,
+      "step": 116
+    },
+    {
+      "epoch": 0.00586855931884585,
+      "grad_norm": 5.360896110534668,
+      "learning_rate": 9.999997999804161e-05,
+      "loss": 1.7018,
+      "step": 117
+    },
+    {
+      "epoch": 0.005918717945502652,
+      "grad_norm": 6.045897483825684,
+      "learning_rate": 9.999997757565929e-05,
+      "loss": 1.5096,
+      "step": 118
+    },
+    {
+      "epoch": 0.005968876572159454,
+      "grad_norm": 5.264241695404053,
+      "learning_rate": 9.999997501485517e-05,
+      "loss": 1.6213,
+      "step": 119
+    },
+    {
+      "epoch": 0.006019035198816257,
+      "grad_norm": 3.9072487354278564,
+      "learning_rate": 9.999997231562923e-05,
+      "loss": 1.6774,
+      "step": 120
+    },
+    {
+      "epoch": 0.0060691938254730585,
+      "grad_norm": 4.171389579772949,
+      "learning_rate": 9.999996947798151e-05,
+      "loss": 1.5459,
+      "step": 121
+    },
+    {
+      "epoch": 0.006119352452129861,
+      "grad_norm": 5.671729564666748,
+      "learning_rate": 9.999996650191202e-05,
+      "loss": 1.8808,
+      "step": 122
+    },
+    {
+      "epoch": 0.006169511078786663,
+      "grad_norm": 4.9551239013671875,
+      "learning_rate": 9.999996338742074e-05,
+      "loss": 1.7071,
+      "step": 123
+    },
+    {
+      "epoch": 0.006219669705443465,
+      "grad_norm": 4.296231746673584,
+      "learning_rate": 9.999996013450772e-05,
+      "loss": 1.7096,
+      "step": 124
+    },
+    {
+      "epoch": 0.006269828332100267,
+      "grad_norm": 4.7311272621154785,
+      "learning_rate": 9.999995674317293e-05,
+      "loss": 1.7114,
+      "step": 125
+    },
+    {
+      "epoch": 0.006319986958757069,
+      "grad_norm": 5.266380310058594,
+      "learning_rate": 9.999995321341638e-05,
+      "loss": 1.5675,
+      "step": 126
+    },
+    {
+      "epoch": 0.006370145585413872,
+      "grad_norm": 5.088979721069336,
+      "learning_rate": 9.999994954523811e-05,
+      "loss": 1.7762,
+      "step": 127
+    },
+    {
+      "epoch": 0.0064203042120706735,
+      "grad_norm": 5.097537994384766,
+      "learning_rate": 9.99999457386381e-05,
+      "loss": 1.6353,
+      "step": 128
+    },
+    {
+      "epoch": 0.006470462838727475,
+      "grad_norm": 5.021105766296387,
+      "learning_rate": 9.999994179361638e-05,
+      "loss": 1.635,
+      "step": 129
+    },
+    {
+      "epoch": 0.006520621465384278,
+      "grad_norm": 4.90013313293457,
+      "learning_rate": 9.999993771017295e-05,
+      "loss": 1.8159,
+      "step": 130
+    },
+    {
+      "epoch": 0.00657078009204108,
+      "grad_norm": 6.177980422973633,
+      "learning_rate": 9.999993348830783e-05,
+      "loss": 1.6693,
+      "step": 131
+    },
+    {
+      "epoch": 0.006620938718697882,
+      "grad_norm": 4.298121929168701,
+      "learning_rate": 9.999992912802102e-05,
+      "loss": 1.5833,
+      "step": 132
+    },
+    {
+      "epoch": 0.006671097345354684,
+      "grad_norm": 5.9864349365234375,
+      "learning_rate": 9.999992462931256e-05,
+      "loss": 1.5427,
+      "step": 133
+    },
+    {
+      "epoch": 0.006721255972011487,
+      "grad_norm": 4.874154090881348,
+      "learning_rate": 9.999991999218243e-05,
+      "loss": 1.8125,
+      "step": 134
+    },
+    {
+      "epoch": 0.0067714145986682885,
+      "grad_norm": 4.420952320098877,
+      "learning_rate": 9.999991521663066e-05,
+      "loss": 1.6948,
+      "step": 135
+    },
+    {
+      "epoch": 0.00682157322532509,
+      "grad_norm": 7.217179775238037,
+      "learning_rate": 9.999991030265726e-05,
+      "loss": 1.3042,
+      "step": 136
+    },
+    {
+      "epoch": 0.006871731851981893,
+      "grad_norm": 5.299327373504639,
+      "learning_rate": 9.999990525026222e-05,
+      "loss": 1.5594,
+      "step": 137
+    },
+    {
+      "epoch": 0.006921890478638695,
+      "grad_norm": 4.912446975708008,
+      "learning_rate": 9.99999000594456e-05,
+      "loss": 1.673,
+      "step": 138
+    },
+    {
+      "epoch": 0.006972049105295497,
+      "grad_norm": 4.7740654945373535,
+      "learning_rate": 9.999989473020737e-05,
+      "loss": 1.6493,
+      "step": 139
+    },
+    {
+      "epoch": 0.007022207731952299,
+      "grad_norm": 5.836091041564941,
+      "learning_rate": 9.999988926254758e-05,
+      "loss": 1.296,
+      "step": 140
+    },
+    {
+      "epoch": 0.007072366358609102,
+      "grad_norm": 5.817897796630859,
+      "learning_rate": 9.999988365646623e-05,
+      "loss": 1.6388,
+      "step": 141
+    },
+    {
+      "epoch": 0.0071225249852659035,
+      "grad_norm": 5.8604278564453125,
+      "learning_rate": 9.999987791196333e-05,
+      "loss": 1.9454,
+      "step": 142
+    },
+    {
+      "epoch": 0.007172683611922705,
+      "grad_norm": 6.577927112579346,
+      "learning_rate": 9.999987202903889e-05,
+      "loss": 1.6439,
+      "step": 143
+    },
+    {
+      "epoch": 0.007222842238579508,
+      "grad_norm": 5.996036052703857,
+      "learning_rate": 9.999986600769295e-05,
+      "loss": 1.5554,
+      "step": 144
+    },
+    {
+      "epoch": 0.00727300086523631,
+      "grad_norm": 5.742420196533203,
+      "learning_rate": 9.999985984792553e-05,
+      "loss": 1.6343,
+      "step": 145
+    },
+    {
+      "epoch": 0.007323159491893112,
+      "grad_norm": 6.655057907104492,
+      "learning_rate": 9.99998535497366e-05,
+      "loss": 1.541,
+      "step": 146
+    },
+    {
+      "epoch": 0.007373318118549914,
+      "grad_norm": 5.933575630187988,
+      "learning_rate": 9.999984711312622e-05,
+      "loss": 1.204,
+      "step": 147
+    },
+    {
+      "epoch": 0.007423476745206716,
+      "grad_norm": 8.165215492248535,
+      "learning_rate": 9.999984053809441e-05,
+      "loss": 1.4714,
+      "step": 148
+    },
+    {
+      "epoch": 0.0074736353718635185,
+      "grad_norm": 8.663423538208008,
+      "learning_rate": 9.999983382464116e-05,
+      "loss": 1.3971,
+      "step": 149
+    },
+    {
+      "epoch": 0.00752379399852032,
+      "grad_norm": 7.599918365478516,
+      "learning_rate": 9.999982697276651e-05,
+      "loss": 0.9067,
+      "step": 150
+    },
+    {
+      "epoch": 0.007573952625177123,
+      "grad_norm": 3.1658380031585693,
+      "learning_rate": 9.999981998247048e-05,
+      "loss": 1.9463,
+      "step": 151
+    },
+    {
+      "epoch": 0.007624111251833925,
+      "grad_norm": 3.542314052581787,
+      "learning_rate": 9.999981285375307e-05,
+      "loss": 1.742,
+      "step": 152
+    },
+    {
+      "epoch": 0.007674269878490727,
+      "grad_norm": 4.111073970794678,
+      "learning_rate": 9.999980558661432e-05,
+      "loss": 1.7274,
+      "step": 153
+    },
+    {
+      "epoch": 0.007724428505147529,
+      "grad_norm": 5.310257911682129,
+      "learning_rate": 9.999979818105423e-05,
+      "loss": 1.9207,
+      "step": 154
+    },
+    {
+      "epoch": 0.007774587131804331,
+      "grad_norm": 4.864629745483398,
+      "learning_rate": 9.999979063707284e-05,
+      "loss": 1.7775,
+      "step": 155
+    },
+    {
+      "epoch": 0.007824745758461133,
+      "grad_norm": 4.800562381744385,
+      "learning_rate": 9.999978295467016e-05,
+      "loss": 1.6394,
+      "step": 156
+    },
+    {
+      "epoch": 0.007874904385117936,
+      "grad_norm": 5.542998790740967,
+      "learning_rate": 9.999977513384622e-05,
+      "loss": 1.9211,
+      "step": 157
+    },
+    {
+      "epoch": 0.007925063011774737,
+      "grad_norm": 4.554020881652832,
+      "learning_rate": 9.999976717460102e-05,
+      "loss": 1.7228,
+      "step": 158
+    },
+    {
+      "epoch": 0.00797522163843154,
+      "grad_norm": 5.0133056640625,
+      "learning_rate": 9.999975907693461e-05,
+      "loss": 1.6986,
+      "step": 159
+    },
+    {
+      "epoch": 0.008025380265088342,
+      "grad_norm": 3.597456932067871,
+      "learning_rate": 9.9999750840847e-05,
+      "loss": 1.618,
+      "step": 160
+    },
+    {
+      "epoch": 0.008075538891745143,
+      "grad_norm": 3.8272383213043213,
+      "learning_rate": 9.999974246633824e-05,
+      "loss": 1.5303,
+      "step": 161
+    },
+    {
+      "epoch": 0.008125697518401946,
+      "grad_norm": 4.668627738952637,
+      "learning_rate": 9.999973395340828e-05,
+      "loss": 1.715,
+      "step": 162
+    },
+    {
+      "epoch": 0.008175856145058748,
+      "grad_norm": 3.4355103969573975,
+      "learning_rate": 9.999972530205721e-05,
+      "loss": 1.5708,
+      "step": 163
+    },
+    {
+      "epoch": 0.008226014771715551,
+      "grad_norm": 4.077895641326904,
+      "learning_rate": 9.999971651228504e-05,
+      "loss": 2.0105,
+      "step": 164
+    },
+    {
+      "epoch": 0.008276173398372352,
+      "grad_norm": 5.014714241027832,
+      "learning_rate": 9.999970758409179e-05,
+      "loss": 1.8614,
+      "step": 165
+    },
+    {
+      "epoch": 0.008326332025029155,
+      "grad_norm": 5.244025707244873,
+      "learning_rate": 9.999969851747746e-05,
+      "loss": 1.7,
+      "step": 166
+    },
+    {
+      "epoch": 0.008376490651685957,
+      "grad_norm": 5.047165393829346,
+      "learning_rate": 9.999968931244212e-05,
+      "loss": 1.6595,
+      "step": 167
+    },
+    {
+      "epoch": 0.008426649278342758,
+      "grad_norm": 4.581408500671387,
+      "learning_rate": 9.999967996898576e-05,
+      "loss": 1.8837,
+      "step": 168
+    },
+    {
+      "epoch": 0.00847680790499956,
+      "grad_norm": 5.111804962158203,
+      "learning_rate": 9.999967048710844e-05,
+      "loss": 1.2134,
+      "step": 169
+    },
+    {
+      "epoch": 0.008526966531656363,
+      "grad_norm": 4.815063953399658,
+      "learning_rate": 9.999966086681014e-05,
+      "loss": 1.8506,
+      "step": 170
+    },
+    {
+      "epoch": 0.008577125158313166,
+      "grad_norm": 4.20850944519043,
+      "learning_rate": 9.999965110809093e-05,
+      "loss": 1.1193,
+      "step": 171
+    },
+    {
+      "epoch": 0.008627283784969967,
+      "grad_norm": 7.372136116027832,
+      "learning_rate": 9.999964121095081e-05,
+      "loss": 1.4076,
+      "step": 172
+    },
+    {
+      "epoch": 0.00867744241162677,
+      "grad_norm": 4.237338066101074,
+      "learning_rate": 9.999963117538982e-05,
+      "loss": 1.7532,
+      "step": 173
+    },
+    {
+      "epoch": 0.008727601038283572,
+      "grad_norm": 4.2047905921936035,
+      "learning_rate": 9.999962100140799e-05,
+      "loss": 1.7366,
+      "step": 174
+    },
+    {
+      "epoch": 0.008777759664940373,
+      "grad_norm": 4.078057765960693,
+      "learning_rate": 9.999961068900532e-05,
+      "loss": 1.696,
+      "step": 175
+    },
+    {
+      "epoch": 0.008827918291597176,
+      "grad_norm": 3.925241231918335,
+      "learning_rate": 9.999960023818189e-05,
+      "loss": 1.8851,
+      "step": 176
+    },
+    {
+      "epoch": 0.008878076918253978,
+      "grad_norm": 4.202706813812256,
+      "learning_rate": 9.999958964893767e-05,
+      "loss": 1.6194,
+      "step": 177
+    },
+    {
+      "epoch": 0.008928235544910781,
+      "grad_norm": 5.677591800689697,
+      "learning_rate": 9.999957892127275e-05,
+      "loss": 1.8302,
+      "step": 178
+    },
+    {
+      "epoch": 0.008978394171567582,
+      "grad_norm": 4.774147987365723,
+      "learning_rate": 9.99995680551871e-05,
+      "loss": 1.9582,
+      "step": 179
+    },
+    {
+      "epoch": 0.009028552798224385,
+      "grad_norm": 4.708377361297607,
+      "learning_rate": 9.999955705068081e-05,
+      "loss": 1.5443,
+      "step": 180
+    },
+    {
+      "epoch": 0.009078711424881187,
+      "grad_norm": 6.373831748962402,
+      "learning_rate": 9.999954590775387e-05,
+      "loss": 1.4237,
+      "step": 181
+    },
+    {
+      "epoch": 0.009128870051537988,
+      "grad_norm": 4.654812812805176,
+      "learning_rate": 9.999953462640633e-05,
+      "loss": 2.0366,
+      "step": 182
+    },
+    {
+      "epoch": 0.00917902867819479,
+      "grad_norm": 5.303413391113281,
+      "learning_rate": 9.999952320663817e-05,
+      "loss": 1.5228,
+      "step": 183
+    },
+    {
+      "epoch": 0.009229187304851593,
+      "grad_norm": 4.800450801849365,
+      "learning_rate": 9.99995116484495e-05,
+      "loss": 1.6961,
+      "step": 184
+    },
+    {
+      "epoch": 0.009279345931508396,
+      "grad_norm": 4.739235877990723,
+      "learning_rate": 9.99994999518403e-05,
+      "loss": 1.8077,
+      "step": 185
+    },
+    {
+      "epoch": 0.009329504558165197,
+      "grad_norm": 4.644327163696289,
+      "learning_rate": 9.999948811681063e-05,
+      "loss": 1.6023,
+      "step": 186
+    },
+    {
+      "epoch": 0.009379663184822,
+      "grad_norm": 4.27376127243042,
+      "learning_rate": 9.999947614336051e-05,
+      "loss": 1.6617,
+      "step": 187
+    },
+    {
+      "epoch": 0.009429821811478802,
+      "grad_norm": 3.941235065460205,
+      "learning_rate": 9.999946403148997e-05,
+      "loss": 1.5798,
+      "step": 188
+    },
+    {
+      "epoch": 0.009479980438135603,
+      "grad_norm": 4.801257610321045,
+      "learning_rate": 9.999945178119906e-05,
+      "loss": 1.5509,
+      "step": 189
+    },
+    {
+      "epoch": 0.009530139064792406,
+      "grad_norm": 4.117572784423828,
+      "learning_rate": 9.999943939248777e-05,
+      "loss": 1.655,
+      "step": 190
+    },
+    {
+      "epoch": 0.009580297691449208,
+      "grad_norm": 4.655939102172852,
+      "learning_rate": 9.999942686535619e-05,
+      "loss": 1.5012,
+      "step": 191
+    },
+    {
+      "epoch": 0.009630456318106011,
+      "grad_norm": 4.619282245635986,
+      "learning_rate": 9.999941419980431e-05,
+      "loss": 1.3436,
+      "step": 192
+    },
+    {
+      "epoch": 0.009680614944762812,
+      "grad_norm": 5.15116548538208,
+      "learning_rate": 9.99994013958322e-05,
+      "loss": 1.8281,
+      "step": 193
+    },
+    {
+      "epoch": 0.009730773571419615,
+      "grad_norm": 5.120091915130615,
+      "learning_rate": 9.999938845343988e-05,
+      "loss": 1.7552,
+      "step": 194
+    },
+    {
+      "epoch": 0.009780932198076417,
+      "grad_norm": 5.901042938232422,
+      "learning_rate": 9.999937537262738e-05,
+      "loss": 1.5446,
+      "step": 195
+    },
+    {
+      "epoch": 0.009831090824733218,
+      "grad_norm": 6.0915846824646,
+      "learning_rate": 9.999936215339475e-05,
+      "loss": 1.4559,
+      "step": 196
+    },
+    {
+      "epoch": 0.00988124945139002,
+      "grad_norm": 6.821080207824707,
+      "learning_rate": 9.999934879574202e-05,
+      "loss": 1.6452,
+      "step": 197
+    },
+    {
+      "epoch": 0.009931408078046823,
+      "grad_norm": 6.907961845397949,
+      "learning_rate": 9.999933529966923e-05,
+      "loss": 1.4225,
+      "step": 198
+    },
+    {
+      "epoch": 0.009981566704703626,
+      "grad_norm": 6.944931983947754,
+      "learning_rate": 9.99993216651764e-05,
+      "loss": 1.1927,
+      "step": 199
+    },
+    {
+      "epoch": 0.010031725331360427,
+      "grad_norm": 7.315927505493164,
+      "learning_rate": 9.999930789226358e-05,
+      "loss": 0.863,
+      "step": 200
+    },
+    {
+      "epoch": 0.01008188395801723,
+      "grad_norm": 2.1428048610687256,
+      "learning_rate": 9.999929398093082e-05,
+      "loss": 1.7098,
+      "step": 201
+    },
+    {
+      "epoch": 0.010132042584674032,
+      "grad_norm": 3.6517727375030518,
+      "learning_rate": 9.999927993117815e-05,
+      "loss": 1.6545,
+      "step": 202
+    },
+    {
+      "epoch": 0.010182201211330833,
+      "grad_norm": 4.238412380218506,
+      "learning_rate": 9.999926574300559e-05,
+      "loss": 1.9832,
+      "step": 203
+    },
+    {
+      "epoch": 0.010232359837987636,
+      "grad_norm": 4.278627395629883,
+      "learning_rate": 9.999925141641322e-05,
+      "loss": 1.825,
+      "step": 204
+    },
+    {
+      "epoch": 0.010282518464644438,
+      "grad_norm": 4.463462829589844,
+      "learning_rate": 9.999923695140103e-05,
+      "loss": 1.6986,
+      "step": 205
+    },
+    {
+      "epoch": 0.01033267709130124,
+      "grad_norm": 5.852229118347168,
+      "learning_rate": 9.99992223479691e-05,
+      "loss": 1.9561,
+      "step": 206
+    },
+    {
+      "epoch": 0.010382835717958042,
+      "grad_norm": 5.828834533691406,
+      "learning_rate": 9.999920760611747e-05,
+      "loss": 1.5839,
+      "step": 207
+    },
+    {
+      "epoch": 0.010432994344614845,
+      "grad_norm": 4.150822639465332,
+      "learning_rate": 9.999919272584615e-05,
+      "loss": 1.7085,
+      "step": 208
+    },
+    {
+      "epoch": 0.010483152971271647,
+      "grad_norm": 3.853485107421875,
+      "learning_rate": 9.99991777071552e-05,
+      "loss": 1.7434,
+      "step": 209
+    },
+    {
+      "epoch": 0.010533311597928448,
+      "grad_norm": 3.409013509750366,
+      "learning_rate": 9.999916255004466e-05,
+      "loss": 1.4738,
+      "step": 210
+    },
+    {
+      "epoch": 0.01058347022458525,
+      "grad_norm": 4.30065393447876,
+      "learning_rate": 9.999914725451457e-05,
+      "loss": 1.6132,
+      "step": 211
+    },
+    {
+      "epoch": 0.010633628851242053,
+      "grad_norm": 4.496330261230469,
+      "learning_rate": 9.999913182056498e-05,
+      "loss": 1.6567,
+      "step": 212
+    },
+    {
+      "epoch": 0.010683787477898854,
+      "grad_norm": 3.97495698928833,
+      "learning_rate": 9.999911624819593e-05,
+      "loss": 1.6223,
+      "step": 213
+    },
+    {
+      "epoch": 0.010733946104555657,
+      "grad_norm": 3.9297423362731934,
+      "learning_rate": 9.999910053740744e-05,
+      "loss": 1.9551,
+      "step": 214
+    },
+    {
+      "epoch": 0.01078410473121246,
+      "grad_norm": 4.655943393707275,
+      "learning_rate": 9.99990846881996e-05,
+      "loss": 1.3605,
+      "step": 215
+    },
+    {
+      "epoch": 0.010834263357869262,
+      "grad_norm": 3.850569248199463,
+      "learning_rate": 9.99990687005724e-05,
+      "loss": 1.4915,
+      "step": 216
+    },
+    {
+      "epoch": 0.010884421984526063,
+      "grad_norm": 3.8779959678649902,
+      "learning_rate": 9.999905257452593e-05,
+      "loss": 1.697,
+      "step": 217
+    },
+    {
+      "epoch": 0.010934580611182866,
+      "grad_norm": 3.9161744117736816,
+      "learning_rate": 9.999903631006023e-05,
+      "loss": 1.472,
+      "step": 218
+    },
+    {
+      "epoch": 0.010984739237839668,
+      "grad_norm": 3.332963705062866,
+      "learning_rate": 9.99990199071753e-05,
+      "loss": 1.8336,
+      "step": 219
+    },
+    {
+      "epoch": 0.01103489786449647,
+      "grad_norm": 3.0808727741241455,
+      "learning_rate": 9.999900336587124e-05,
+      "loss": 1.8147,
+      "step": 220
+    },
+    {
+      "epoch": 0.011085056491153272,
+      "grad_norm": 3.166036605834961,
+      "learning_rate": 9.999898668614807e-05,
+      "loss": 1.5529,
+      "step": 221
+    },
+    {
+      "epoch": 0.011135215117810075,
+      "grad_norm": 4.316058158874512,
+      "learning_rate": 9.999896986800583e-05,
+      "loss": 1.8339,
+      "step": 222
+    },
+    {
+      "epoch": 0.011185373744466877,
+      "grad_norm": 3.5491528511047363,
+      "learning_rate": 9.999895291144456e-05,
+      "loss": 1.457,
+      "step": 223
+    },
+    {
+      "epoch": 0.011235532371123678,
+      "grad_norm": 4.302348613739014,
+      "learning_rate": 9.999893581646436e-05,
+      "loss": 1.8205,
+      "step": 224
+    },
+    {
+      "epoch": 0.01128569099778048,
+      "grad_norm": 4.645691871643066,
+      "learning_rate": 9.99989185830652e-05,
+      "loss": 1.6126,
+      "step": 225
+    },
+    {
+      "epoch": 0.011335849624437283,
+      "grad_norm": 3.585592746734619,
+      "learning_rate": 9.99989012112472e-05,
+      "loss": 1.6233,
+      "step": 226
+    },
+    {
+      "epoch": 0.011386008251094084,
+      "grad_norm": 4.083550930023193,
+      "learning_rate": 9.999888370101036e-05,
+      "loss": 1.6479,
+      "step": 227
+    },
+    {
+      "epoch": 0.011436166877750887,
+      "grad_norm": 4.8738274574279785,
+      "learning_rate": 9.999886605235475e-05,
+      "loss": 1.544,
+      "step": 228
+    },
+    {
+      "epoch": 0.01148632550440769,
+      "grad_norm": 4.471858978271484,
+      "learning_rate": 9.999884826528039e-05,
+      "loss": 1.7492,
+      "step": 229
+    },
+    {
+      "epoch": 0.011536484131064492,
+      "grad_norm": 4.995892524719238,
+      "learning_rate": 9.999883033978738e-05,
+      "loss": 1.4996,
+      "step": 230
+    },
+    {
+      "epoch": 0.011586642757721293,
+      "grad_norm": 3.729520320892334,
+      "learning_rate": 9.999881227587573e-05,
+      "loss": 1.2047,
+      "step": 231
+    },
+    {
+      "epoch": 0.011636801384378096,
+      "grad_norm": 3.6157472133636475,
+      "learning_rate": 9.999879407354551e-05,
+      "loss": 1.8276,
+      "step": 232
+    },
+    {
+      "epoch": 0.011686960011034898,
+      "grad_norm": 3.605586528778076,
+      "learning_rate": 9.999877573279675e-05,
+      "loss": 1.6217,
+      "step": 233
+    },
+    {
+      "epoch": 0.0117371186376917,
+      "grad_norm": 3.8058934211730957,
+      "learning_rate": 9.999875725362953e-05,
+      "loss": 1.602,
+      "step": 234
+    },
+    {
+      "epoch": 0.011787277264348502,
+      "grad_norm": 4.18014669418335,
+      "learning_rate": 9.999873863604385e-05,
+      "loss": 1.572,
+      "step": 235
+    },
+    {
+      "epoch": 0.011837435891005305,
+      "grad_norm": 3.300985336303711,
+      "learning_rate": 9.999871988003983e-05,
+      "loss": 1.3266,
+      "step": 236
+    },
+    {
+      "epoch": 0.011887594517662107,
+      "grad_norm": 4.596490383148193,
+      "learning_rate": 9.999870098561748e-05,
+      "loss": 1.5166,
+      "step": 237
+    },
+    {
+      "epoch": 0.011937753144318908,
+      "grad_norm": 3.664485216140747,
+      "learning_rate": 9.999868195277684e-05,
+      "loss": 1.6468,
+      "step": 238
+    },
+    {
+      "epoch": 0.01198791177097571,
+      "grad_norm": 4.02537202835083,
+      "learning_rate": 9.999866278151801e-05,
+      "loss": 1.7261,
+      "step": 239
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 59808,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 239,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.164717070057472e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
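
Note on the logged learning rates: the `learning_rate` values in the log history above rise by 1e-06 per step until they reach 0.0001 at step 100, then fall very slowly. That pattern is consistent with linear warmup followed by cosine decay over `max_steps`, although the scheduler configuration is not shown in this diff. The sketch below is a reconstruction under that assumption, not the run's actual code. `PEAK_LR`, `WARMUP_STEPS`, and `lr_at` are illustrative names, and the warmup length of 100 is inferred from the log rather than read from a config.

```python
import math

# Values read from the log above; the schedule shape itself is an inference.
PEAK_LR = 1e-4        # learning_rate logged at step 100
WARMUP_STEPS = 100    # assumed: the rate rises by 1e-6 per step until step 100
MAX_STEPS = 59808     # "max_steps" in trainer_state.json


def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from the log history above.
LOGGED = {
    88: 8.800000000000001e-05,
    100: 0.0001,
    101: 9.999999993078908e-05,
    239: 9.999866278151801e-05,
}

# Largest relative gap between the reconstruction and the logged values.
max_rel_err = max(abs(lr_at(s) - v) / v for s, v in LOGGED.items())
```

If the assumption holds, `max_rel_err` should be at floating-point noise level; a visibly larger gap would suggest a different schedule or warmup length.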

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f6ee56cf3ad6533b74f45eeb6d4801c4c9cf0152917a29fc04e29b3924e5e760
+size 6840

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render.