Upload folder using huggingface_hub

Files changed (12)

README.md +202 -0
adapter_config.json +29 -0
adapter_model.safetensors +3 -0
checkpoint-200/README.md +202 -0
checkpoint-200/adapter_config.json +29 -0
checkpoint-200/adapter_model.safetensors +3 -0
checkpoint-200/optimizer.pt +3 -0
checkpoint-200/rng_state.pth +3 -0
checkpoint-200/scheduler.pt +3 -0
checkpoint-200/trainer_state.json +1433 -0
checkpoint-200/training_args.bin +3 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: ibm-granite/granite-3.0-8b-base
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** LoRA adapter (PEFT) for causal language modeling
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** ibm-granite/granite-3.0-8b-base
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "ibm-granite/granite-3.0-8b-base",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
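
The adapter_config.json above describes a plain LoRA adapter (r = 16, lora_alpha = 32, use_rslora false) on the q_proj and v_proj attention projections. As a rough check on its size, the sketch below counts the added A and B matrices. The base-model shapes are an assumption taken from Granite-3.0-8B's published configuration (40 decoder layers, hidden size 4096, 32 query heads and 8 key/value heads), not from this commit, and `AdapterSettings` is a hypothetical stand-in for three fields of the config.

```python
"""Back-of-the-envelope size of the LoRA adapter in adapter_config.json.

ASSUMED base-model shapes (Granite-3.0-8B published config, not in this commit):
40 layers, hidden size 4096, 32 query heads, 8 key/value heads.
"""
from dataclasses import dataclass

NUM_LAYERS = 40
HIDDEN = 4096
HEAD_DIM = HIDDEN // 32   # 128
KV_DIM = 8 * HEAD_DIM     # 1024 (grouped-query attention)

# (in_features, out_features) of each targeted nn.Linear
SHAPES = {"q_proj": (HIDDEN, HIDDEN), "v_proj": (HIDDEN, KV_DIM)}


@dataclass(frozen=True)
class AdapterSettings:  # hypothetical helper mirroring three adapter_config.json fields
    r: int
    lora_alpha: int
    target_modules: tuple


def lora_scaling(cfg: AdapterSettings) -> float:
    # Plain LoRA (use_rslora=false) scales the low-rank update B @ A by alpha / r.
    return cfg.lora_alpha / cfg.r


def lora_param_count(cfg: AdapterSettings, num_layers: int = NUM_LAYERS) -> int:
    # Each adapted Linear(in, out) gains A with shape (r, in) and B with shape (out, r).
    per_layer = sum(cfg.r * (SHAPES[m][0] + SHAPES[m][1]) for m in cfg.target_modules)
    return per_layer * num_layers


cfg = AdapterSettings(r=16, lora_alpha=32, target_modules=("v_proj", "q_proj"))
params = lora_param_count(cfg)
fp32_bytes = params * 4

print(lora_scaling(cfg))         # 2.0
print(params)                    # 8519680
print(fp32_bytes)                # 34078720
print(34_100_216 - fp32_bytes)   # 21496: remainder, presumably the safetensors header
```

Under these assumptions the adapter holds about 8.5 M trainable parameters, and storing them in fp32 accounts for all but roughly 21 KB of the 34,100,216-byte adapter_model.safetensors.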

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:89f5cfaa8de12226e754bbe7b97aa10c5bb106e737210e4fa13cf09ef67829e1
+size 34100216
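
adapter_model.safetensors is committed as a Git LFS pointer: the diff shows three `key value` lines (the spec version, an `oid` of the form `sha256:<hex>`, and the `size` in bytes) rather than the weights themselves. A minimal standard-library sketch for parsing such a pointer and checking a downloaded file against it; `parse_lfs_pointer` and `matches_pointer` are illustrative names, not part of huggingface_hub.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:89f5cfaa8de12226e754bbe7b97aa10c5bb106e737210e4fa13cf09ef67829e1
size 34100216
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and return version, sha256 digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file, hashing it, and compare both digest and byte count."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["sha256"]


p = parse_lfs_pointer(POINTER)
print(p["size"])  # 34100216
```

The checkpoint-200/adapter_model.safetensors pointer in this commit carries the same oid and size, so the top-level adapter is byte-identical to the step-200 checkpoint.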

checkpoint-200/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: ibm-granite/granite-3.0-8b-base
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** LoRA adapter (PEFT) for causal language modeling
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** ibm-granite/granite-3.0-8b-base
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

checkpoint-200/adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "ibm-granite/granite-3.0-8b-base",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:89f5cfaa8de12226e754bbe7b97aa10c5bb106e737210e4fa13cf09ef67829e1
+size 34100216

checkpoint-200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02c13310fc4ea3bce4485259e0bf7cfdbd8f1c54a4aab8c0bdb937336621ad8b
+size 68292346
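
optimizer.pt is about twice the size of the adapter weights. That ratio is what an Adam-style optimizer would produce, since it keeps two fp32 moment buffers per trainable parameter. The commit does not record which optimizer was used (AdamW is the transformers Trainer default), and the parameter count reuses Granite-3.0-8B shapes that are assumed here, so treat this as a consistency check rather than a derivation.

```python
# ASSUMPTIONS: Adam/AdamW-style optimizer (two fp32 moment buffers per trainable
# parameter) and Granite-3.0-8B shapes (40 layers; q_proj 4096->4096, v_proj 4096->1024).
R = 16
NUM_LAYERS = 40
# LoRA adds r * (in + out) parameters per adapted projection.
trainable = NUM_LAYERS * R * ((4096 + 4096) + (4096 + 1024))

BYTES_PER_FP32 = 4
MOMENTS = 2  # exp_avg and exp_avg_sq
expected_state_bytes = trainable * BYTES_PER_FP32 * MOMENTS

OBSERVED = 68_292_346  # size from the optimizer.pt LFS pointer
print(trainable)                        # 8519680
print(expected_state_bytes)             # 68157440
print(OBSERVED - expected_state_bytes)  # 134906
```

The leftover ~135 KB is presumably serialization overhead plus small per-parameter entries such as step counters.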

checkpoint-200/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f50f0b8cf8a8f5247d0e9729d14f2b46d5491d4b47ff4bafcc913f88950008b8
+size 14244

checkpoint-200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7174de846ad34238559f1792b27ae2686c6070efbdc3aa66596ae4d2baa4d80f
+size 1064
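
scheduler.pt stores the learning-rate scheduler state. The learning_rate values logged in trainer_state.json rise by 2e-6 per step to a peak of 2e-4 and then fall by 2e-6 per step, which matches linear warmup over 100 scheduler steps followed by linear decay toward zero at 200 steps (the shape of transformers' `get_linear_schedule_with_warmup`). The peak is logged at step 106 rather than 100 because at steps 22, 23, 40, 42, 58 and 59 the grad_norm is NaN or Infinity and the learning rate does not advance, consistent with fp16 loss scaling skipping those optimizer steps. The warmup and total step counts below are inferred from the log, not read from training_args.bin.

```python
PEAK_LR = 2e-4       # inferred: highest learning_rate in the log
WARMUP_STEPS = 100   # inferred from the 2e-6-per-step ramp
TOTAL_STEPS = 200    # inferred from the 2e-6-per-step decay after the peak


def linear_warmup_decay(step: int) -> float:
    """LR after `step` scheduler steps: linear ramp, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))


# Logged steps with a non-finite grad_norm; the scheduler did not step on them.
SKIPPED = {22, 23, 40, 42, 58, 59}


def expected_logged_lr(logged_step: int) -> float:
    """Map a Trainer step to the scheduler step it actually reached."""
    scheduler_step = logged_step - sum(1 for s in SKIPPED if s <= logged_step)
    return linear_warmup_decay(scheduler_step)


# (logged step, learning_rate copied from trainer_state.json)
LOGGED = [(1, 2.0000000000000003e-06), (21, 4.2e-05), (23, 4.2e-05),
          (60, 0.00010800000000000001), (106, 0.0002),
          (107, 0.00019800000000000002), (119, 0.000174)]
print(all(abs(expected_logged_lr(s) - lr) < 1e-12 for s, lr in LOGGED))  # True
```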

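trainer_state.json, whose diff follows, is written with bare NaN and Infinity tokens, which strict JSON forbids but Python's json module accepts by default. A minimal sketch for flagging non-finite gradient norms, checking that the learning rate held on those steps, and deriving steps per epoch from the top-level epoch and global_step fields; `SAMPLE` is a four-entry excerpt of the log (steps 21 to 24), not the full file.

```python
import json
import math

# Excerpt of trainer_state.json (header fields plus log steps 21-24).
SAMPLE = """{
  "epoch": 0.3189792663476874,
  "global_step": 200,
  "log_history": [
    {"epoch": 0.03349282296650718, "grad_norm": 2.585329055786133, "learning_rate": 4.2e-05, "loss": 2.3198, "step": 21},
    {"epoch": 0.03508771929824561, "grad_norm": NaN, "learning_rate": 4.2e-05, "loss": 2.935, "step": 22},
    {"epoch": 0.03668261562998405, "grad_norm": Infinity, "learning_rate": 4.2e-05, "loss": 1.0813, "step": 23},
    {"epoch": 0.03827751196172249, "grad_norm": 11.25171947479248, "learning_rate": 4.4000000000000006e-05, "loss": 2.8932, "step": 24}
  ]
}"""


def non_finite_grad_steps(state: dict) -> list:
    """Steps whose logged grad_norm is NaN or +/-Infinity."""
    return [
        e["step"]
        for e in state.get("log_history", [])
        if "grad_norm" in e and not math.isfinite(e["grad_norm"])
    ]


def lr_held_on_non_finite(state: dict) -> bool:
    """True if every non-finite step logs the same LR as the entry before it."""
    log = [e for e in state.get("log_history", []) if "learning_rate" in e]
    for prev, cur in zip(log, log[1:]):
        if "grad_norm" in cur and not math.isfinite(cur["grad_norm"]):
            if cur["learning_rate"] != prev["learning_rate"]:
                return False
    return True


def steps_per_epoch(state: dict) -> float:
    return state["global_step"] / state["epoch"]


state = json.loads(SAMPLE)  # json.loads maps NaN/Infinity to float('nan')/float('inf')
print(non_finite_grad_steps(state))   # [22, 23]
print(lr_held_on_non_finite(state))   # True
print(round(steps_per_epoch(state)))  # 627
```

With the header values in this commit (epoch 0.3189792663476874 at global_step 200), the run covered about 627 steps per epoch, so training stopped after roughly a third of one epoch. Converting that into examples requires the batch size and gradient-accumulation settings stored in training_args.bin, which the diff does not show.
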
checkpoint-200/trainer_state.json ADDED

	@@ -0,0 +1,1433 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.3189792663476874,
+  "eval_steps": 500,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.001594896331738437,
+      "grad_norm": 6.184327602386475,
+      "learning_rate": 2.0000000000000003e-06,
+      "loss": 2.5706,
+      "step": 1
+    },
+    {
+      "epoch": 0.003189792663476874,
+      "grad_norm": 3.031334400177002,
+      "learning_rate": 4.000000000000001e-06,
+      "loss": 1.6976,
+      "step": 2
+    },
+    {
+      "epoch": 0.004784688995215311,
+      "grad_norm": 15.139897346496582,
+      "learning_rate": 6e-06,
+      "loss": 2.245,
+      "step": 3
+    },
+    {
+      "epoch": 0.006379585326953748,
+      "grad_norm": 3.134552478790283,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 1.6769,
+      "step": 4
+    },
+    {
+      "epoch": 0.007974481658692184,
+      "grad_norm": 3.714085340499878,
+      "learning_rate": 1e-05,
+      "loss": 2.2285,
+      "step": 5
+    },
+    {
+      "epoch": 0.009569377990430622,
+      "grad_norm": 9.840805053710938,
+      "learning_rate": 1.2e-05,
+      "loss": 2.7365,
+      "step": 6
+    },
+    {
+      "epoch": 0.011164274322169059,
+      "grad_norm": 1.8881586790084839,
+      "learning_rate": 1.4000000000000001e-05,
+      "loss": 1.9256,
+      "step": 7
+    },
+    {
+      "epoch": 0.012759170653907496,
+      "grad_norm": 7.487579822540283,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 1.5082,
+      "step": 8
+    },
+    {
+      "epoch": 0.014354066985645933,
+      "grad_norm": 2.4636645317077637,
+      "learning_rate": 1.8e-05,
+      "loss": 2.4033,
+      "step": 9
+    },
+    {
+      "epoch": 0.01594896331738437,
+      "grad_norm": 2.8979716300964355,
+      "learning_rate": 2e-05,
+      "loss": 2.2099,
+      "step": 10
+    },
+    {
+      "epoch": 0.017543859649122806,
+      "grad_norm": 4.407735824584961,
+      "learning_rate": 2.2000000000000003e-05,
+      "loss": 1.5274,
+      "step": 11
+    },
+    {
+      "epoch": 0.019138755980861243,
+      "grad_norm": 3.755150079727173,
+      "learning_rate": 2.4e-05,
+      "loss": 1.1157,
+      "step": 12
+    },
+    {
+      "epoch": 0.02073365231259968,
+      "grad_norm": 3.088548183441162,
+      "learning_rate": 2.6000000000000002e-05,
+      "loss": 1.864,
+      "step": 13
+    },
+    {
+      "epoch": 0.022328548644338118,
+      "grad_norm": 3.3776023387908936,
+      "learning_rate": 2.8000000000000003e-05,
+      "loss": 2.7037,
+      "step": 14
+    },
+    {
+      "epoch": 0.023923444976076555,
+      "grad_norm": 3.2912089824676514,
+      "learning_rate": 3e-05,
+      "loss": 1.7976,
+      "step": 15
+    },
+    {
+      "epoch": 0.025518341307814992,
+      "grad_norm": 3.916226625442505,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 2.115,
+      "step": 16
+    },
+    {
+      "epoch": 0.02711323763955343,
+      "grad_norm": 1.721280574798584,
+      "learning_rate": 3.4000000000000007e-05,
+      "loss": 1.6885,
+      "step": 17
+    },
+    {
+      "epoch": 0.028708133971291867,
+      "grad_norm": 1.8544317483901978,
+      "learning_rate": 3.6e-05,
+      "loss": 1.5542,
+      "step": 18
+    },
+    {
+      "epoch": 0.030303030303030304,
+      "grad_norm": 3.1539909839630127,
+      "learning_rate": 3.8e-05,
+      "loss": 1.1407,
+      "step": 19
+    },
+    {
+      "epoch": 0.03189792663476874,
+      "grad_norm": 7.259422302246094,
+      "learning_rate": 4e-05,
+      "loss": 1.4612,
+      "step": 20
+    },
+    {
+      "epoch": 0.03349282296650718,
+      "grad_norm": 2.585329055786133,
+      "learning_rate": 4.2e-05,
+      "loss": 2.3198,
+      "step": 21
+    },
+    {
+      "epoch": 0.03508771929824561,
+      "grad_norm": NaN,
+      "learning_rate": 4.2e-05,
+      "loss": 2.935,
+      "step": 22
+    },
+    {
+      "epoch": 0.03668261562998405,
+      "grad_norm": Infinity,
+      "learning_rate": 4.2e-05,
+      "loss": 1.0813,
+      "step": 23
+    },
+    {
+      "epoch": 0.03827751196172249,
+      "grad_norm": 11.25171947479248,
+      "learning_rate": 4.4000000000000006e-05,
+      "loss": 2.8932,
+      "step": 24
+    },
+    {
+      "epoch": 0.03987240829346093,
+      "grad_norm": 11.074877738952637,
+      "learning_rate": 4.600000000000001e-05,
+      "loss": 2.218,
+      "step": 25
+    },
+    {
+      "epoch": 0.04146730462519936,
+      "grad_norm": 14.481130599975586,
+      "learning_rate": 4.8e-05,
+      "loss": 2.2596,
+      "step": 26
+    },
+    {
+      "epoch": 0.0430622009569378,
+      "grad_norm": 19.015766143798828,
+      "learning_rate": 5e-05,
+      "loss": 2.0489,
+      "step": 27
+    },
+    {
+      "epoch": 0.044657097288676235,
+      "grad_norm": 17.07799530029297,
+      "learning_rate": 5.2000000000000004e-05,
+      "loss": 2.1545,
+      "step": 28
+    },
+    {
+      "epoch": 0.046251993620414676,
+      "grad_norm": 10.050027847290039,
+      "learning_rate": 5.4000000000000005e-05,
+      "loss": 1.3752,
+      "step": 29
+    },
+    {
+      "epoch": 0.04784688995215311,
+      "grad_norm": 10.415594100952148,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 1.9154,
+      "step": 30
+    },
+    {
+      "epoch": 0.049441786283891544,
+      "grad_norm": 13.311936378479004,
+      "learning_rate": 5.8e-05,
+      "loss": 2.5644,
+      "step": 31
+    },
+    {
+      "epoch": 0.051036682615629984,
+      "grad_norm": 5.880099773406982,
+      "learning_rate": 6e-05,
+      "loss": 2.248,
+      "step": 32
+    },
+    {
+      "epoch": 0.05263157894736842,
+      "grad_norm": 2.3417255878448486,
+      "learning_rate": 6.2e-05,
+      "loss": 1.7846,
+      "step": 33
+    },
+    {
+      "epoch": 0.05422647527910686,
+      "grad_norm": 5.300235748291016,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 2.0771,
+      "step": 34
+    },
+    {
+      "epoch": 0.05582137161084529,
+      "grad_norm": 2.892624855041504,
+      "learning_rate": 6.6e-05,
+      "loss": 1.2795,
+      "step": 35
+    },
+    {
+      "epoch": 0.05741626794258373,
+      "grad_norm": 5.358643054962158,
+      "learning_rate": 6.800000000000001e-05,
+      "loss": 1.8778,
+      "step": 36
+    },
+    {
+      "epoch": 0.05901116427432217,
+      "grad_norm": 5.880906581878662,
+      "learning_rate": 7e-05,
+      "loss": 2.2137,
+      "step": 37
+    },
+    {
+      "epoch": 0.06060606060606061,
+      "grad_norm": 20.12361717224121,
+      "learning_rate": 7.2e-05,
+      "loss": 1.744,
+      "step": 38
+    },
+    {
+      "epoch": 0.06220095693779904,
+      "grad_norm": 5.061898231506348,
+      "learning_rate": 7.4e-05,
+      "loss": 2.4302,
+      "step": 39
+    },
+    {
+      "epoch": 0.06379585326953748,
+      "grad_norm": NaN,
+      "learning_rate": 7.4e-05,
+      "loss": 2.0454,
+      "step": 40
+    },
+    {
+      "epoch": 0.06539074960127592,
+      "grad_norm": 11.622527122497559,
+      "learning_rate": 7.6e-05,
+      "loss": 1.6152,
+      "step": 41
+    },
+    {
+      "epoch": 0.06698564593301436,
+      "grad_norm": Infinity,
+      "learning_rate": 7.6e-05,
+      "loss": 2.0358,
+      "step": 42
+    },
+    {
+      "epoch": 0.0685805422647528,
+      "grad_norm": 163.9078826904297,
+      "learning_rate": 7.800000000000001e-05,
+      "loss": 1.9115,
+      "step": 43
+    },
+    {
+      "epoch": 0.07017543859649122,
+      "grad_norm": 15.739387512207031,
+      "learning_rate": 8e-05,
+      "loss": 2.2478,
+      "step": 44
+    },
+    {
+      "epoch": 0.07177033492822966,
+      "grad_norm": 14.771827697753906,
+      "learning_rate": 8.2e-05,
+      "loss": 1.3414,
+      "step": 45
+    },
+    {
+      "epoch": 0.0733652312599681,
+      "grad_norm": 11.412920951843262,
+      "learning_rate": 8.4e-05,
+      "loss": 1.5662,
+      "step": 46
+    },
+    {
+      "epoch": 0.07496012759170653,
+      "grad_norm": 3.7569572925567627,
+      "learning_rate": 8.6e-05,
+      "loss": 1.2951,
+      "step": 47
+    },
+    {
+      "epoch": 0.07655502392344497,
+      "grad_norm": 5.0032148361206055,
+      "learning_rate": 8.800000000000001e-05,
+      "loss": 1.6676,
+      "step": 48
+    },
+    {
+      "epoch": 0.07814992025518341,
+      "grad_norm": 14.337443351745605,
+      "learning_rate": 9e-05,
+      "loss": 0.8836,
+      "step": 49
+    },
+    {
+      "epoch": 0.07974481658692185,
+      "grad_norm": 15.200254440307617,
+      "learning_rate": 9.200000000000001e-05,
+      "loss": 1.1178,
+      "step": 50
+    },
+    {
+      "epoch": 0.08133971291866028,
+      "grad_norm": 5.100391387939453,
+      "learning_rate": 9.4e-05,
+      "loss": 2.294,
+      "step": 51
+    },
+    {
+      "epoch": 0.08293460925039872,
+      "grad_norm": 153.73574829101562,
+      "learning_rate": 9.6e-05,
+      "loss": 1.5195,
+      "step": 52
+    },
+    {
+      "epoch": 0.08452950558213716,
+      "grad_norm": 6.9149298667907715,
+      "learning_rate": 9.8e-05,
+      "loss": 2.7116,
+      "step": 53
+    },
+    {
+      "epoch": 0.0861244019138756,
+      "grad_norm": 50.7889404296875,
+      "learning_rate": 0.0001,
+      "loss": 1.491,
+      "step": 54
+    },
+    {
+      "epoch": 0.08771929824561403,
+      "grad_norm": 8.714326858520508,
+      "learning_rate": 0.00010200000000000001,
+      "loss": 2.202,
+      "step": 55
+    },
+    {
+      "epoch": 0.08931419457735247,
+      "grad_norm": 4.900972843170166,
+      "learning_rate": 0.00010400000000000001,
+      "loss": 1.3895,
+      "step": 56
+    },
+    {
+      "epoch": 0.09090909090909091,
+      "grad_norm": 3.462311267852783,
+      "learning_rate": 0.00010600000000000002,
+      "loss": 1.6516,
+      "step": 57
+    },
+    {
+      "epoch": 0.09250398724082935,
+      "grad_norm": NaN,
+      "learning_rate": 0.00010600000000000002,
+      "loss": 1.1269,
+      "step": 58
+    },
+    {
+      "epoch": 0.09409888357256778,
+      "grad_norm": Infinity,
+      "learning_rate": 0.00010600000000000002,
+      "loss": 3.1964,
+      "step": 59
+    },
+    {
+      "epoch": 0.09569377990430622,
+      "grad_norm": 32.43291091918945,
+      "learning_rate": 0.00010800000000000001,
+      "loss": 1.4105,
+      "step": 60
+    },
+    {
+      "epoch": 0.09728867623604466,
+      "grad_norm": 80.84922790527344,
+      "learning_rate": 0.00011000000000000002,
+      "loss": 1.7675,
+      "step": 61
+    },
+    {
+      "epoch": 0.09888357256778309,
+      "grad_norm": 34.09085464477539,
+      "learning_rate": 0.00011200000000000001,
+      "loss": 1.9853,
+      "step": 62
+    },
+    {
+      "epoch": 0.10047846889952153,
+      "grad_norm": 3.656672954559326,
+      "learning_rate": 0.00011399999999999999,
+      "loss": 1.6841,
+      "step": 63
+    },
+    {
+      "epoch": 0.10207336523125997,
+      "grad_norm": 11.103597640991211,
+      "learning_rate": 0.000116,
+      "loss": 1.6717,
+      "step": 64
+    },
+    {
+      "epoch": 0.10366826156299841,
+      "grad_norm": 5.515536785125732,
+      "learning_rate": 0.000118,
+      "loss": 2.456,
+      "step": 65
+    },
+    {
+      "epoch": 0.10526315789473684,
+      "grad_norm": 4.2976861000061035,
+      "learning_rate": 0.00012,
+      "loss": 1.3526,
+      "step": 66
+    },
+    {
+      "epoch": 0.10685805422647528,
+      "grad_norm": 6.591656684875488,
+      "learning_rate": 0.000122,
+      "loss": 1.0501,
+      "step": 67
+    },
+    {
+      "epoch": 0.10845295055821372,
+      "grad_norm": 16.303518295288086,
+      "learning_rate": 0.000124,
+      "loss": 1.2239,
+      "step": 68
+    },
+    {
+      "epoch": 0.11004784688995216,
+      "grad_norm": 7.841418743133545,
+      "learning_rate": 0.000126,
+      "loss": 1.1156,
+      "step": 69
+    },
+    {
+      "epoch": 0.11164274322169059,
+      "grad_norm": 19.40918731689453,
+      "learning_rate": 0.00012800000000000002,
+      "loss": 1.3893,
+      "step": 70
+    },
+    {
+      "epoch": 0.11323763955342903,
+      "grad_norm": 3.5955138206481934,
+      "learning_rate": 0.00013000000000000002,
+      "loss": 1.3203,
+      "step": 71
+    },
+    {
+      "epoch": 0.11483253588516747,
+      "grad_norm": 7.509222984313965,
+      "learning_rate": 0.000132,
+      "loss": 1.3958,
+      "step": 72
+    },
+    {
+      "epoch": 0.11642743221690591,
+      "grad_norm": 5.788917064666748,
+      "learning_rate": 0.000134,
+      "loss": 1.4404,
+      "step": 73
+    },
+    {
+      "epoch": 0.11802232854864433,
+      "grad_norm": 57.762603759765625,
+      "learning_rate": 0.00013600000000000003,
+      "loss": 2.1282,
+      "step": 74
+    },
+    {
+      "epoch": 0.11961722488038277,
+      "grad_norm": 7.637035369873047,
+      "learning_rate": 0.000138,
+      "loss": 1.2525,
+      "step": 75
+    },
+    {
+      "epoch": 0.12121212121212122,
+      "grad_norm": 371.2599182128906,
+      "learning_rate": 0.00014,
+      "loss": 3.0992,
+      "step": 76
+    },
+    {
+      "epoch": 0.12280701754385964,
+      "grad_norm": 56.3419189453125,
+      "learning_rate": 0.000142,
+      "loss": 2.5542,
+      "step": 77
+    },
+    {
+      "epoch": 0.12440191387559808,
+      "grad_norm": 1314.4420166015625,
+      "learning_rate": 0.000144,
+      "loss": 2.1762,
+      "step": 78
+    },
+    {
+      "epoch": 0.12599681020733652,
+      "grad_norm": 22.95615577697754,
+      "learning_rate": 0.000146,
+      "loss": 1.2563,
+      "step": 79
+    },
+    {
+      "epoch": 0.12759170653907495,
+      "grad_norm": 10.517797470092773,
+      "learning_rate": 0.000148,
+      "loss": 1.4259,
+      "step": 80
+    },
+    {
+      "epoch": 0.1291866028708134,
+      "grad_norm": 37.58878707885742,
+      "learning_rate": 0.00015000000000000001,
+      "loss": 2.1527,
+      "step": 81
+    },
+    {
+      "epoch": 0.13078149920255183,
+      "grad_norm": 16.632266998291016,
+      "learning_rate": 0.000152,
+      "loss": 1.222,
+      "step": 82
+    },
+    {
+      "epoch": 0.13237639553429026,
+      "grad_norm": 3.820011615753174,
+      "learning_rate": 0.000154,
+      "loss": 1.3791,
+      "step": 83
+    },
+    {
+      "epoch": 0.1339712918660287,
+      "grad_norm": 5.253367900848389,
+      "learning_rate": 0.00015600000000000002,
+      "loss": 1.9685,
+      "step": 84
+    },
+    {
+      "epoch": 0.13556618819776714,
+      "grad_norm": 2.756962299346924,
+      "learning_rate": 0.00015800000000000002,
+      "loss": 1.0553,
+      "step": 85
+    },
+    {
+      "epoch": 0.1371610845295056,
+      "grad_norm": 3.8879504203796387,
+      "learning_rate": 0.00016,
+      "loss": 1.5082,
+      "step": 86
+    },
+    {
+      "epoch": 0.13875598086124402,
+      "grad_norm": 6.17363977432251,
+      "learning_rate": 0.000162,
+      "loss": 1.3968,
+      "step": 87
+    },
+    {
+      "epoch": 0.14035087719298245,
+      "grad_norm": 4.683026313781738,
+      "learning_rate": 0.000164,
+      "loss": 0.8941,
+      "step": 88
+    },
+    {
+      "epoch": 0.1419457735247209,
+      "grad_norm": 1.2167710065841675,
+      "learning_rate": 0.000166,
+      "loss": 1.2534,
+      "step": 89
+    },
+    {
+      "epoch": 0.14354066985645933,
+      "grad_norm": 11.542084693908691,
+      "learning_rate": 0.000168,
+      "loss": 1.7667,
+      "step": 90
+    },
+    {
+      "epoch": 0.14513556618819776,
+      "grad_norm": 70.17807006835938,
+      "learning_rate": 0.00017,
+      "loss": 1.4623,
+      "step": 91
+    },
+    {
+      "epoch": 0.1467304625199362,
+      "grad_norm": 66.24053955078125,
+      "learning_rate": 0.000172,
+      "loss": 3.0195,
+      "step": 92
+    },
+    {
+      "epoch": 0.14832535885167464,
+      "grad_norm": 3.0609261989593506,
+      "learning_rate": 0.000174,
+      "loss": 1.5645,
+      "step": 93
+    },
+    {
+      "epoch": 0.14992025518341306,
+      "grad_norm": 4.469959735870361,
+      "learning_rate": 0.00017600000000000002,
+      "loss": 2.2086,
+      "step": 94
+    },
+    {
+      "epoch": 0.15151515151515152,
+      "grad_norm": 3.164841651916504,
+      "learning_rate": 0.00017800000000000002,
+      "loss": 1.1951,
+      "step": 95
+    },
+    {
+      "epoch": 0.15311004784688995,
+      "grad_norm": 6.527286052703857,
+      "learning_rate": 0.00018,
+      "loss": 1.2,
+      "step": 96
+    },
+    {
+      "epoch": 0.1547049441786284,
+      "grad_norm": 15.101646423339844,
+      "learning_rate": 0.000182,
+      "loss": 1.3742,
+      "step": 97
+    },
+    {
+      "epoch": 0.15629984051036683,
+      "grad_norm": 5.785974502563477,
+      "learning_rate": 0.00018400000000000003,
+      "loss": 0.9624,
+      "step": 98
+    },
+    {
+      "epoch": 0.15789473684210525,
+      "grad_norm": 5.375400066375732,
+      "learning_rate": 0.00018600000000000002,
+      "loss": 0.9914,
+      "step": 99
+    },
+    {
+      "epoch": 0.1594896331738437,
+      "grad_norm": 5.572772979736328,
+      "learning_rate": 0.000188,
+      "loss": 1.0925,
+      "step": 100
+    },
+    {
+      "epoch": 0.16108452950558214,
+      "grad_norm": 3.163304567337036,
+      "learning_rate": 0.00019,
+      "loss": 1.8609,
+      "step": 101
+    },
+    {
+      "epoch": 0.16267942583732056,
+      "grad_norm": 4.896540641784668,
+      "learning_rate": 0.000192,
+      "loss": 2.1326,
+      "step": 102
+    },
+    {
+      "epoch": 0.16427432216905902,
+      "grad_norm": 4.353018283843994,
+      "learning_rate": 0.000194,
+      "loss": 1.8259,
+      "step": 103
+    },
+    {
+      "epoch": 0.16586921850079744,
+      "grad_norm": 2.8431453704833984,
+      "learning_rate": 0.000196,
+      "loss": 1.5044,
+      "step": 104
+    },
+    {
+      "epoch": 0.1674641148325359,
+      "grad_norm": 2.73559308052063,
+      "learning_rate": 0.00019800000000000002,
+      "loss": 1.4158,
+      "step": 105
+    },
+    {
+      "epoch": 0.16905901116427433,
+      "grad_norm": 4.377661228179932,
+      "learning_rate": 0.0002,
+      "loss": 0.8568,
+      "step": 106
+    },
+    {
+      "epoch": 0.17065390749601275,
+      "grad_norm": 33.87447738647461,
+      "learning_rate": 0.00019800000000000002,
+      "loss": 1.4858,
+      "step": 107
+    },
+    {
+      "epoch": 0.1722488038277512,
+      "grad_norm": 3.453542470932007,
+      "learning_rate": 0.000196,
+      "loss": 1.5622,
+      "step": 108
+    },
+    {
+      "epoch": 0.17384370015948963,
+      "grad_norm": 3.240596294403076,
+      "learning_rate": 0.000194,
+      "loss": 1.1919,
+      "step": 109
+    },
+    {
+      "epoch": 0.17543859649122806,
+      "grad_norm": 4.169600486755371,
+      "learning_rate": 0.000192,
+      "loss": 0.973,
+      "step": 110
+    },
+    {
+      "epoch": 0.17703349282296652,
+      "grad_norm": 6.525801181793213,
+      "learning_rate": 0.00019,
+      "loss": 1.718,
+      "step": 111
+    },
+    {
+      "epoch": 0.17862838915470494,
+      "grad_norm": 6.783817768096924,
+      "learning_rate": 0.000188,
+      "loss": 1.2306,
+      "step": 112
+    },
+    {
+      "epoch": 0.18022328548644337,
+      "grad_norm": 8.960905075073242,
+      "learning_rate": 0.00018600000000000002,
+      "loss": 1.3131,
+      "step": 113
+    },
+    {
+      "epoch": 0.18181818181818182,
+      "grad_norm": 5.883425712585449,
+      "learning_rate": 0.00018400000000000003,
+      "loss": 1.7021,
+      "step": 114
+    },
+    {
+      "epoch": 0.18341307814992025,
+      "grad_norm": 5.736645221710205,
+      "learning_rate": 0.000182,
+      "loss": 1.3815,
+      "step": 115
+    },
+    {
+      "epoch": 0.1850079744816587,
+      "grad_norm": 4.181487083435059,
+      "learning_rate": 0.00018,
+      "loss": 1.3178,
+      "step": 116
+    },
+    {
+      "epoch": 0.18660287081339713,
+      "grad_norm": 4.405350685119629,
+      "learning_rate": 0.00017800000000000002,
+      "loss": 1.9485,
+      "step": 117
+    },
+    {
+      "epoch": 0.18819776714513556,
+      "grad_norm": 3.4359993934631348,
+      "learning_rate": 0.00017600000000000002,
+      "loss": 1.096,
+      "step": 118
+    },
+    {
+      "epoch": 0.189792663476874,
+      "grad_norm": 5.447860240936279,
+      "learning_rate": 0.000174,
+      "loss": 1.0583,
+      "step": 119
+    },
+    {
+      "epoch": 0.19138755980861244,
+      "grad_norm": 97.87931060791016,
+      "learning_rate": 0.000172,
+      "loss": 1.044,
+      "step": 120
+    },
+    {
+      "epoch": 0.19298245614035087,
+      "grad_norm": 8.215063095092773,
+      "learning_rate": 0.00017,
+      "loss": 1.5723,
+      "step": 121
+    },
+    {
+      "epoch": 0.19457735247208932,
+      "grad_norm": 7.788384914398193,
+      "learning_rate": 0.000168,
+      "loss": 1.9708,
+      "step": 122
+    },
+    {
+      "epoch": 0.19617224880382775,
+      "grad_norm": 4.156929969787598,
+      "learning_rate": 0.000166,
+      "loss": 0.6143,
+      "step": 123
+    },
+    {
+      "epoch": 0.19776714513556617,
+      "grad_norm": 8.533568382263184,
+      "learning_rate": 0.000164,
+      "loss": 1.1833,
+      "step": 124
+    },
+    {
+      "epoch": 0.19936204146730463,
+      "grad_norm": 9.07735538482666,
+      "learning_rate": 0.000162,
+      "loss": 1.1025,
+      "step": 125
+    },
+    {
+      "epoch": 0.20095693779904306,
+      "grad_norm": 4.636246204376221,
+      "learning_rate": 0.00016,
+      "loss": 1.0709,
+      "step": 126
+    },
+    {
+      "epoch": 0.2025518341307815,
+      "grad_norm": 3.0601181983947754,
+      "learning_rate": 0.00015800000000000002,
+      "loss": 0.8874,
+      "step": 127
+    },
+    {
+      "epoch": 0.20414673046251994,
+      "grad_norm": 2.8409204483032227,
+      "learning_rate": 0.00015600000000000002,
+      "loss": 1.6791,
+      "step": 128
+    },
+    {
+      "epoch": 0.20574162679425836,
+      "grad_norm": 4.480583190917969,
+      "learning_rate": 0.000154,
+      "loss": 1.0051,
+      "step": 129
+    },
+    {
+      "epoch": 0.20733652312599682,
+      "grad_norm": 4.790148735046387,
+      "learning_rate": 0.000152,
+      "loss": 0.7256,
+      "step": 130
+    },
+    {
+      "epoch": 0.20893141945773525,
+      "grad_norm": 2.576634645462036,
+      "learning_rate": 0.00015000000000000001,
+      "loss": 1.5364,
+      "step": 131
+    },
+    {
+      "epoch": 0.21052631578947367,
+      "grad_norm": 3.0406057834625244,
+      "learning_rate": 0.000148,
+      "loss": 1.1761,
+      "step": 132
+    },
+    {
+      "epoch": 0.21212121212121213,
+      "grad_norm": 3.3600714206695557,
+      "learning_rate": 0.000146,
+      "loss": 1.7465,
+      "step": 133
+    },
+    {
+      "epoch": 0.21371610845295055,
+      "grad_norm": 5.360437393188477,
+      "learning_rate": 0.000144,
+      "loss": 1.8165,
+      "step": 134
+    },
+    {
+      "epoch": 0.215311004784689,
+      "grad_norm": 7.102390289306641,
+      "learning_rate": 0.000142,
+      "loss": 1.2619,
+      "step": 135
+    },
+    {
+      "epoch": 0.21690590111642744,
+      "grad_norm": 1.9597784280776978,
+      "learning_rate": 0.00014,
+      "loss": 1.9762,
+      "step": 136
+    },
+    {
+      "epoch": 0.21850079744816586,
+      "grad_norm": 3.9632205963134766,
+      "learning_rate": 0.000138,
+      "loss": 1.4078,
+      "step": 137
+    },
+    {
+      "epoch": 0.22009569377990432,
+      "grad_norm": 2.1894729137420654,
+      "learning_rate": 0.00013600000000000003,
+      "loss": 0.8851,
+      "step": 138
+    },
+    {
+      "epoch": 0.22169059011164274,
+      "grad_norm": Infinity,
+      "learning_rate": 0.00013600000000000003,
+      "loss": 1.4997,
+      "step": 139
+    },
+    {
+      "epoch": 0.22328548644338117,
+      "grad_norm": 52.63427734375,
+      "learning_rate": 0.000134,
+      "loss": 1.214,
+      "step": 140
+    },
+    {
+      "epoch": 0.22488038277511962,
+      "grad_norm": 4.334165573120117,
+      "learning_rate": 0.000132,
+      "loss": 0.9465,
+      "step": 141
+    },
+    {
+      "epoch": 0.22647527910685805,
+      "grad_norm": 4.615323066711426,
+      "learning_rate": 0.00013000000000000002,
+      "loss": 1.3691,
+      "step": 142
+    },
+    {
+      "epoch": 0.22807017543859648,
+      "grad_norm": 4.519163131713867,
+      "learning_rate": 0.00012800000000000002,
+      "loss": 1.116,
+      "step": 143
+    },
+    {
+      "epoch": 0.22966507177033493,
+      "grad_norm": 2.9541022777557373,
+      "learning_rate": 0.000126,
+      "loss": 1.6713,
+      "step": 144
+    },
+    {
+      "epoch": 0.23125996810207336,
+      "grad_norm": 4.985620021820068,
+      "learning_rate": 0.000124,
+      "loss": 0.4975,
+      "step": 145
+    },
+    {
+      "epoch": 0.23285486443381181,
+      "grad_norm": 2.776371955871582,
+      "learning_rate": 0.000122,
+      "loss": 1.255,
+      "step": 146
+    },
+    {
+      "epoch": 0.23444976076555024,
+      "grad_norm": 7.810369491577148,
+      "learning_rate": 0.00012,
+      "loss": 1.1688,
+      "step": 147
+    },
+    {
+      "epoch": 0.23604465709728867,
+      "grad_norm": 2.63275146484375,
+      "learning_rate": 0.000118,
+      "loss": 2.4762,
+      "step": 148
+    },
+    {
+      "epoch": 0.23763955342902712,
+      "grad_norm": 3.5678646564483643,
+      "learning_rate": 0.000116,
+      "loss": 2.1922,
+      "step": 149
+    },
+    {
+      "epoch": 0.23923444976076555,
+      "grad_norm": 6.582940578460693,
+      "learning_rate": 0.00011399999999999999,
+      "loss": 2.0777,
+      "step": 150
+    },
+    {
+      "epoch": 0.24082934609250398,
+      "grad_norm": 4.511703014373779,
+      "learning_rate": 0.00011200000000000001,
+      "loss": 2.3411,
+      "step": 151
+    },
+    {
+      "epoch": 0.24242424242424243,
+      "grad_norm": 8.474757194519043,
+      "learning_rate": 0.00011000000000000002,
+      "loss": 1.2168,
+      "step": 152
+    },
+    {
+      "epoch": 0.24401913875598086,
+      "grad_norm": 5.071783542633057,
+      "learning_rate": 0.00010800000000000001,
+      "loss": 1.268,
+      "step": 153
+    },
+    {
+      "epoch": 0.24561403508771928,
+      "grad_norm": 7.242175579071045,
+      "learning_rate": 0.00010600000000000002,
+      "loss": 1.5954,
+      "step": 154
+    },
+    {
+      "epoch": 0.24720893141945774,
+      "grad_norm": 4.464336395263672,
+      "learning_rate": 0.00010400000000000001,
+      "loss": 0.6807,
+      "step": 155
+    },
+    {
+      "epoch": 0.24880382775119617,
+      "grad_norm": 41.45453643798828,
+      "learning_rate": 0.00010200000000000001,
+      "loss": 1.0059,
+      "step": 156
+    },
+    {
+      "epoch": 0.2503987240829346,
+      "grad_norm": 12.17771053314209,
+      "learning_rate": 0.0001,
+      "loss": 1.1615,
+      "step": 157
+    },
+    {
+      "epoch": 0.25199362041467305,
+      "grad_norm": 2.6309285163879395,
+      "learning_rate": 9.8e-05,
+      "loss": 1.2695,
+      "step": 158
+    },
+    {
+      "epoch": 0.2535885167464115,
+      "grad_norm": 2.0649020671844482,
+      "learning_rate": 9.6e-05,
+      "loss": 0.8868,
+      "step": 159
+    },
+    {
+      "epoch": 0.2551834130781499,
+      "grad_norm": 2.4663641452789307,
+      "learning_rate": 9.4e-05,
+      "loss": 1.063,
+      "step": 160
+    },
+    {
+      "epoch": 0.2567783094098884,
+      "grad_norm": 2.168086051940918,
+      "learning_rate": 9.200000000000001e-05,
+      "loss": 1.8179,
+      "step": 161
+    },
+    {
+      "epoch": 0.2583732057416268,
+      "grad_norm": 2.884896755218506,
+      "learning_rate": 9e-05,
+      "loss": 0.9745,
+      "step": 162
+    },
+    {
+      "epoch": 0.25996810207336524,
+      "grad_norm": 4.95428466796875,
+      "learning_rate": 8.800000000000001e-05,
+      "loss": 1.7127,
+      "step": 163
+    },
+    {
+      "epoch": 0.26156299840510366,
+      "grad_norm": 2.4961204528808594,
+      "learning_rate": 8.6e-05,
+      "loss": 1.3255,
+      "step": 164
+    },
+    {
+      "epoch": 0.2631578947368421,
+      "grad_norm": 4.987830638885498,
+      "learning_rate": 8.4e-05,
+      "loss": 1.6393,
+      "step": 165
+    },
+    {
+      "epoch": 0.2647527910685805,
+      "grad_norm": 2.74229097366333,
+      "learning_rate": 8.2e-05,
+      "loss": 1.2829,
+      "step": 166
+    },
+    {
+      "epoch": 0.266347687400319,
+      "grad_norm": 3.4278724193573,
+      "learning_rate": 8e-05,
+      "loss": 1.8641,
+      "step": 167
+    },
+    {
+      "epoch": 0.2679425837320574,
+      "grad_norm": 3.168607473373413,
+      "learning_rate": 7.800000000000001e-05,
+      "loss": 0.9005,
+      "step": 168
+    },
+    {
+      "epoch": 0.26953748006379585,
+      "grad_norm": 2.2907590866088867,
+      "learning_rate": 7.6e-05,
+      "loss": 1.8272,
+      "step": 169
+    },
+    {
+      "epoch": 0.2711323763955343,
+      "grad_norm": 11.338663101196289,
+      "learning_rate": 7.4e-05,
+      "loss": 0.8515,
+      "step": 170
+    },
+    {
+      "epoch": 0.2727272727272727,
+      "grad_norm": 3.8454625606536865,
+      "learning_rate": 7.2e-05,
+      "loss": 0.7245,
+      "step": 171
+    },
+    {
+      "epoch": 0.2743221690590112,
+      "grad_norm": 4.1359968185424805,
+      "learning_rate": 7e-05,
+      "loss": 1.1218,
+      "step": 172
+    },
+    {
+      "epoch": 0.2759170653907496,
+      "grad_norm": 6.582677841186523,
+      "learning_rate": 6.800000000000001e-05,
+      "loss": 1.032,
+      "step": 173
+    },
+    {
+      "epoch": 0.27751196172248804,
+      "grad_norm": 4.796580791473389,
+      "learning_rate": 6.6e-05,
+      "loss": 0.9661,
+      "step": 174
+    },
+    {
+      "epoch": 0.27910685805422647,
+      "grad_norm": 3.352660655975342,
+      "learning_rate": 6.400000000000001e-05,
+      "loss": 0.981,
+      "step": 175
+    },
+    {
+      "epoch": 0.2807017543859649,
+      "grad_norm": 7.768184185028076,
+      "learning_rate": 6.2e-05,
+      "loss": 1.2995,
+      "step": 176
+    },
+    {
+      "epoch": 0.2822966507177033,
+      "grad_norm": 2.4985883235931396,
+      "learning_rate": 6e-05,
+      "loss": 0.5018,
+      "step": 177
+    },
+    {
+      "epoch": 0.2838915470494418,
+      "grad_norm": 4.418503284454346,
+      "learning_rate": 5.8e-05,
+      "loss": 0.6436,
+      "step": 178
+    },
+    {
+      "epoch": 0.28548644338118023,
+      "grad_norm": 5.020095348358154,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 0.9538,
+      "step": 179
+    },
+    {
+      "epoch": 0.28708133971291866,
+      "grad_norm": 3.5376362800598145,
+      "learning_rate": 5.4000000000000005e-05,
+      "loss": 1.2548,
+      "step": 180
+    },
+    {
+      "epoch": 0.2886762360446571,
+      "grad_norm": 5.339288234710693,
+      "learning_rate": 5.2000000000000004e-05,
+      "loss": 1.237,
+      "step": 181
+    },
+    {
+      "epoch": 0.2902711323763955,
+      "grad_norm": 65.73676300048828,
+      "learning_rate": 5e-05,
+      "loss": 0.7367,
+      "step": 182
+    },
+    {
+      "epoch": 0.291866028708134,
+      "grad_norm": 2.4018213748931885,
+      "learning_rate": 4.8e-05,
+      "loss": 0.9429,
+      "step": 183
+    },
+    {
+      "epoch": 0.2934609250398724,
+      "grad_norm": 14.804810523986816,
+      "learning_rate": 4.600000000000001e-05,
+      "loss": 1.6304,
+      "step": 184
+    },
+    {
+      "epoch": 0.29505582137161085,
+      "grad_norm": 3.041649580001831,
+      "learning_rate": 4.4000000000000006e-05,
+      "loss": 1.2724,
+      "step": 185
+    },
+    {
+      "epoch": 0.2966507177033493,
+      "grad_norm": 4.808304309844971,
+      "learning_rate": 4.2e-05,
+      "loss": 1.1529,
+      "step": 186
+    },
+    {
+      "epoch": 0.2982456140350877,
+      "grad_norm": 3.3830454349517822,
+      "learning_rate": 4e-05,
+      "loss": 1.2314,
+      "step": 187
+    },
+    {
+      "epoch": 0.29984051036682613,
+      "grad_norm": 8.842469215393066,
+      "learning_rate": 3.8e-05,
+      "loss": 1.0924,
+      "step": 188
+    },
+    {
+      "epoch": 0.3014354066985646,
+      "grad_norm": 2.8933732509613037,
+      "learning_rate": 3.6e-05,
+      "loss": 1.1168,
+      "step": 189
+    },
+    {
+      "epoch": 0.30303030303030304,
+      "grad_norm": 3.3943734169006348,
+      "learning_rate": 3.4000000000000007e-05,
+      "loss": 1.361,
+      "step": 190
+    },
+    {
+      "epoch": 0.30462519936204147,
+      "grad_norm": 7.02134895324707,
+      "learning_rate": 3.2000000000000005e-05,
+      "loss": 1.6184,
+      "step": 191
+    },
+    {
+      "epoch": 0.3062200956937799,
+      "grad_norm": 2.4864888191223145,
+      "learning_rate": 3e-05,
+      "loss": 1.1632,
+      "step": 192
+    },
+    {
+      "epoch": 0.3078149920255183,
+      "grad_norm": 2.510240316390991,
+      "learning_rate": 2.8000000000000003e-05,
+      "loss": 0.644,
+      "step": 193
+    },
+    {
+      "epoch": 0.3094098883572568,
+      "grad_norm": 4.508166313171387,
+      "learning_rate": 2.6000000000000002e-05,
+      "loss": 1.1349,
+      "step": 194
+    },
+    {
+      "epoch": 0.31100478468899523,
+      "grad_norm": 4.515146732330322,
+      "learning_rate": 2.4e-05,
+      "loss": 0.8318,
+      "step": 195
+    },
+    {
+      "epoch": 0.31259968102073366,
+      "grad_norm": 4.47897481918335,
+      "learning_rate": 2.2000000000000003e-05,
+      "loss": 2.0528,
+      "step": 196
+    },
+    {
+      "epoch": 0.3141945773524721,
+      "grad_norm": 4.270202159881592,
+      "learning_rate": 2e-05,
+      "loss": 0.9385,
+      "step": 197
+    },
+    {
+      "epoch": 0.3157894736842105,
+      "grad_norm": 6.670559883117676,
+      "learning_rate": 1.8e-05,
+      "loss": 1.011,
+      "step": 198
+    },
+    {
+      "epoch": 0.31738437001594894,
+      "grad_norm": 326.71722412109375,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 1.1715,
+      "step": 199
+    },
+    {
+      "epoch": 0.3189792663476874,
+      "grad_norm": 3.589381456375122,
+      "learning_rate": 1.4000000000000001e-05,
+      "loss": 1.48,
+      "step": 200
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 200,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1717079281606656.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
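The logged `learning_rate` values fit a simple schedule. The rate warms up linearly to a 2e-4 peak and then decays linearly to zero at step 200. That is the shape produced by `transformers.get_linear_schedule_with_warmup` with 100 warmup steps. The peak, the warmup length and the schedule name are inferred from the numbers here, not read from the run's config.

Two details stand out:

- The curve is shifted by six steps: the peak is logged at step 106, not 100.
- The rate does not move between steps 138 and 139, where `grad_norm` is `Infinity`.

Both fit one assumption: optimizer steps skipped on gradient overflow (as fp16 grad scaling does) do not advance the scheduler. The six earlier skips would fall in steps 1 to 96, which this excerpt does not show. The sketch below replays that assumed schedule against a `log_history` list so the hypothesis can be checked.

```python
import json
import math


def linear_warmup_decay(sched_step, peak_lr, warmup, total):
    """LR after `sched_step` scheduler steps: linear ramp 0 -> peak_lr over
    `warmup` steps, then linear decay to 0 at `total` (the same shape as the
    lambda used by transformers' get_linear_schedule_with_warmup)."""
    if sched_step < warmup:
        return peak_lr * sched_step / max(1, warmup)
    return peak_lr * max(0.0, (total - sched_step) / max(1, total - warmup))


def replay_lr(log_history, peak_lr=2e-4, warmup=100, total=200, sched_steps_before=0):
    """Return (step, logged_lr, expected_lr) for every training log entry.

    Assumption: a non-finite grad_norm marks an optimizer step that was
    skipped, so the scheduler does not advance on that step.
    `sched_steps_before` allows replaying a slice that starts mid-run.
    """
    sched = sched_steps_before
    rows = []
    for entry in log_history:
        if "learning_rate" not in entry:  # e.g. an end-of-training summary row
            continue
        if math.isfinite(entry.get("grad_norm", 0.0)):
            sched += 1  # optimizer ran, so the scheduler stepped once
        expected = linear_warmup_decay(sched, peak_lr, warmup, total)
        rows.append((entry["step"], entry["learning_rate"], expected))
    return rows


def load_log_history(path):
    """Read trainer_state.json; Python's json accepts the bare `Infinity` token."""
    with open(path) as f:
        return json.load(f)["log_history"]
```

The following filter is expected to return an empty list if the assumptions hold:

`[r for r in replay_lr(load_log_history("checkpoint-200/trainer_state.json")) if not math.isclose(r[1], r[2], rel_tol=1e-6)]`

Any row it returns points at a step where the schedule or the skip rule differs from what is assumed here.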

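The `grad_norm` column is noisy in this stretch. Most logged norms stay below about 15, but the log also contains the `Infinity` at step 139 and values as high as 326.7 at step 199.

A small scan makes such rows easy to list. The sketch below uses a threshold of 30, an arbitrary illustration rather than anything from the training config. It also recovers the epoch length from the fractional `epoch` field, which grows in proportion to `step`.

```python
import math


def flag_grad_norms(log_history, spike_threshold=30.0):
    """List (step, grad_norm) for entries whose norm is non-finite or above
    `spike_threshold`. The default of 30 is an arbitrary illustration."""
    flagged = []
    for entry in log_history:
        g = entry.get("grad_norm")
        if g is None:
            continue  # rows without a gradient norm (e.g. summary rows)
        if not math.isfinite(g) or g > spike_threshold:
            flagged.append((entry["step"], g))
    return flagged


def steps_per_epoch(entry):
    """Optimizer steps per epoch implied by one row: the logged `epoch` is a
    fraction proportional to `step`, so step / epoch recovers the epoch length."""
    return round(entry["step"] / entry["epoch"])
```

Applied to steps 97 to 200 above, the default threshold flags steps 107, 120, 139, 140, 156, 182 and 199. `steps_per_epoch` returns 627, so `max_steps: 200` ends this `num_train_epochs: 1` run roughly 32% of the way through its single epoch (200 / 627 ≈ 0.319, the final `epoch` value).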
checkpoint-200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fbeca8543db02fcfeadc3c4228271f956928a6a493c876c23301fdba09fd91c4
+size 5176

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fbeca8543db02fcfeadc3c4228271f956928a6a493c876c23301fdba09fd91c4
+size 5176
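Both `training_args.bin` entries are Git LFS pointer files, not the binaries themselves. Each pointer holds three `key value` lines: the spec version, the SHA-256 `oid` of the real content, and its `size` in bytes. The two pointers carry the same oid and size, so the root-level and `checkpoint-200/` copies are byte-identical.

The sketch below checks a downloaded copy against its pointer using only the standard library. It assumes the real file is already on local disk; the path is whatever you downloaded to.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer (lines: `version ...`, `oid sha256:...`, `size N`)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """True if `data` has exactly the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]


def verify_file(path, pointer_text):
    """Hash a local file (assumed already downloaded) and compare it to a pointer."""
    with open(path, "rb") as f:
        return matches_pointer(f.read(), parse_lfs_pointer(pointer_text))
```

For example, `verify_file("training_args.bin", pointer_text)`, with `pointer_text` set to the three pointer lines above, should return `True` for an intact download.

Hashing is also the safe way to inspect this file. Its contents are a pickled `TrainingArguments` object written by the Trainer, so unpickling it runs arbitrary code, while hashing it does not.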