QJerry committed on Oct 15, 2024

Commit

ed266a5

verified

1 Parent(s): 9025e7e

Initial commit.


This view is limited to 50 files because it contains too many changes.

Files changed (50)

README.md +202 -0
adapter_config.json +35 -0
adapter_model.safetensors +3 -0
checkpoint-100/README.md +202 -0
checkpoint-100/adapter_config.json +35 -0
checkpoint-100/adapter_model.safetensors +3 -0
checkpoint-100/trainer_state.json +733 -0
checkpoint-100/training_args.bin +3 -0
checkpoint-120/README.md +202 -0
checkpoint-120/adapter_config.json +35 -0
checkpoint-120/adapter_model.safetensors +3 -0
checkpoint-120/trainer_state.json +873 -0
checkpoint-120/training_args.bin +3 -0
checkpoint-140/README.md +202 -0
checkpoint-140/adapter_config.json +35 -0
checkpoint-140/adapter_model.safetensors +3 -0
checkpoint-140/trainer_state.json +1013 -0
checkpoint-140/training_args.bin +3 -0
checkpoint-160/README.md +202 -0
checkpoint-160/adapter_config.json +35 -0
checkpoint-160/adapter_model.safetensors +3 -0
checkpoint-160/trainer_state.json +1153 -0
checkpoint-160/training_args.bin +3 -0
checkpoint-180/README.md +202 -0
checkpoint-180/adapter_config.json +35 -0
checkpoint-180/adapter_model.safetensors +3 -0
checkpoint-180/trainer_state.json +1293 -0
checkpoint-180/training_args.bin +3 -0
checkpoint-20/README.md +202 -0
checkpoint-20/adapter_config.json +35 -0
checkpoint-20/adapter_model.safetensors +3 -0
checkpoint-20/trainer_state.json +173 -0
checkpoint-20/training_args.bin +3 -0
checkpoint-200/README.md +202 -0
checkpoint-200/adapter_config.json +35 -0
checkpoint-200/adapter_model.safetensors +3 -0
checkpoint-200/trainer_state.json +1433 -0
checkpoint-200/training_args.bin +3 -0
checkpoint-220/README.md +202 -0
checkpoint-220/adapter_config.json +35 -0
checkpoint-220/adapter_model.safetensors +3 -0
checkpoint-220/trainer_state.json +1573 -0
checkpoint-220/training_args.bin +3 -0
checkpoint-240/README.md +202 -0
checkpoint-240/adapter_config.json +35 -0
checkpoint-240/adapter_model.safetensors +3 -0
checkpoint-240/trainer_state.json +1713 -0
checkpoint-240/training_args.bin +3 -0
checkpoint-260/README.md +202 -0
checkpoint-260/adapter_config.json +35 -0

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: meta/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
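
For orientation, the arithmetic implied by this config can be sketched in Python. The values of `r`, `lora_alpha`, and `target_modules` are copied from the file above; the layer shapes are an assumption about the Meta-Llama-3-8B-Instruct base model and are not stated anywhere in this commit. With `use_rslora` set to false, the LoRA update is scaled by `lora_alpha / r`, and each adapted linear layer of shape `(in, out)` adds `r * (in + out)` trainable parameters.

```python
# Values copied from adapter_config.json in this commit.
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = [
    "k_proj", "q_proj", "v_proj", "down_proj",
    "up_proj", "gate_proj", "lm_head", "o_proj",
]

# ASSUMPTION: layer shapes of the Llama 3 8B base model (not part of the commit).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024        # 8 key/value heads x 128 dims per head
VOCAB = 128256
NUM_LAYERS = 32

# (in_features, out_features) for each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
    "lm_head": (HIDDEN, VOCAB),
}

# Modules that occur once per model rather than once per decoder layer.
ONCE_PER_MODEL = {"lm_head"}


def lora_scaling(alpha: int, r: int) -> float:
    """Scaling applied to the low-rank update when use_rslora is false."""
    return alpha / r


def lora_param_count(r: int, target_modules: list[str]) -> int:
    """Count LoRA A and B matrix parameters over all targeted modules."""
    total = 0
    for name in target_modules:
        fan_in, fan_out = SHAPES[name]
        per_module = r * (fan_in + fan_out)  # A is (r, in), B is (out, r)
        copies = 1 if name in ONCE_PER_MODEL else NUM_LAYERS
        total += per_module * copies
    return total


if __name__ == "__main__":
    print("scaling:", lora_scaling(LORA_ALPHA, LORA_R))
    print("LoRA parameters:", lora_param_count(LORA_R, TARGET_MODULES))
```

Under these assumed shapes the adapter has a scaling factor of 2.0 and roughly 44 million LoRA parameters. The count covers only the low-rank matrices, not any other tensors the safetensors file may contain.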

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c37c87c8430e0043db61b29ed2b301904b917a7eece2c67adda8674086ea499e
+size 1138856856
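
The three added lines are a Git LFS pointer, not the weights themselves: a spec version, a content hash, and the size of the real file in bytes. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical, and the pointer text is copied from the diff above.

```python
# Pointer text copied from the adapter_model.safetensors diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:c37c87c8430e0043db61b29ed2b301904b917a7eece2c67adda8674086ea499e
size 1138856856
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash, and size fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER_TEXT)
    print(info["hash_algorithm"], info["digest"][:12])
    print(f"{info['size_bytes'] / 1e9:.2f} GB")
```

The same parser applies to the `training_args.bin` and per-checkpoint `adapter_model.safetensors` pointers in this commit, which differ only in hash and size.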

checkpoint-100/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-100/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-100/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f2634872fe40006913b6fa9d7ec305dc37e5fdbbcffeee35f4b1e11518f41c6a
+size 1138856856

checkpoint-100/trainer_state.json ADDED

	@@ -0,0 +1,733 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.8533333333333334,
+  "eval_steps": 500,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 120.65595457746791,
+      "learning_rate": 6.611096473669596e-06,
+      "loss": 15.252,
+      "step": 21
+    },
+    {
+      "epoch": 0.18773333333333334,
+      "grad_norm": 133.05280466087515,
+      "learning_rate": 6.712113404111031e-06,
+      "loss": 14.1391,
+      "step": 22
+    },
+    {
+      "epoch": 0.19626666666666667,
+      "grad_norm": 127.95029628849048,
+      "learning_rate": 6.808639180087963e-06,
+      "loss": 12.9566,
+      "step": 23
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 108.83495245094748,
+      "learning_rate": 6.90105620855803e-06,
+      "loss": 11.8743,
+      "step": 24
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 99.90727146021455,
+      "learning_rate": 6.989700043360187e-06,
+      "loss": 10.962,
+      "step": 25
+    },
+    {
+      "epoch": 0.22186666666666666,
+      "grad_norm": 98.37126740059823,
+      "learning_rate": 7.074866739854089e-06,
+      "loss": 9.9919,
+      "step": 26
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 92.26708429201608,
+      "learning_rate": 7.156818820794936e-06,
+      "loss": 8.8811,
+      "step": 27
+    },
+    {
+      "epoch": 0.23893333333333333,
+      "grad_norm": 83.36099898839835,
+      "learning_rate": 7.235790156711096e-06,
+      "loss": 7.7806,
+      "step": 28
+    },
+    {
+      "epoch": 0.24746666666666667,
+      "grad_norm": 68.07500315598597,
+      "learning_rate": 7.3119899894947795e-06,
+      "loss": 7.0528,
+      "step": 29
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 69.58960332280246,
+      "learning_rate": 7.385606273598311e-06,
+      "loss": 6.3683,
+      "step": 30
+    },
+    {
+      "epoch": 0.26453333333333334,
+      "grad_norm": 68.77532204123075,
+      "learning_rate": 7.456808469171363e-06,
+      "loss": 6.1635,
+      "step": 31
+    },
+    {
+      "epoch": 0.2730666666666667,
+      "grad_norm": 66.29676636510072,
+      "learning_rate": 7.5257498915995295e-06,
+      "loss": 4.711,
+      "step": 32
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 42.87145091679237,
+      "learning_rate": 7.592569699389437e-06,
+      "loss": 4.5119,
+      "step": 33
+    },
+    {
+      "epoch": 0.29013333333333335,
+      "grad_norm": 26.2592350291551,
+      "learning_rate": 7.657394585211274e-06,
+      "loss": 4.31,
+      "step": 34
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 15.35959008067237,
+      "learning_rate": 7.720340221751376e-06,
+      "loss": 4.0001,
+      "step": 35
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 8.50847651865227,
+      "learning_rate": 7.781512503836437e-06,
+      "loss": 3.5723,
+      "step": 36
+    },
+    {
+      "epoch": 0.3157333333333333,
+      "grad_norm": 6.562581089063746,
+      "learning_rate": 7.841008620334974e-06,
+      "loss": 3.9254,
+      "step": 37
+    },
+    {
+      "epoch": 0.32426666666666665,
+      "grad_norm": 5.6145595722250095,
+      "learning_rate": 7.89891798308405e-06,
+      "loss": 3.8746,
+      "step": 38
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 5.385367220486204,
+      "learning_rate": 7.955323035132495e-06,
+      "loss": 3.8128,
+      "step": 39
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 5.403447124703616,
+      "learning_rate": 8.010299956639811e-06,
+      "loss": 3.885,
+      "step": 40
+    },
+    {
+      "epoch": 0.34986666666666666,
+      "grad_norm": 5.48242204895128,
+      "learning_rate": 8.063919283598677e-06,
+      "loss": 3.8048,
+      "step": 41
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 5.5525098950513865,
+      "learning_rate": 8.116246451989503e-06,
+      "loss": 3.7508,
+      "step": 42
+    },
+    {
+      "epoch": 0.36693333333333333,
+      "grad_norm": 5.354384520535484,
+      "learning_rate": 8.167342277897933e-06,
+      "loss": 3.5069,
+      "step": 43
+    },
+    {
+      "epoch": 0.37546666666666667,
+      "grad_norm": 5.46272338131107,
+      "learning_rate": 8.217263382430936e-06,
+      "loss": 3.6747,
+      "step": 44
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 4.798550688968453,
+      "learning_rate": 8.266062568876717e-06,
+      "loss": 3.1609,
+      "step": 45
+    },
+    {
+      "epoch": 0.39253333333333335,
+      "grad_norm": 5.755104452953421,
+      "learning_rate": 8.31378915840787e-06,
+      "loss": 3.5733,
+      "step": 46
+    },
+    {
+      "epoch": 0.4010666666666667,
+      "grad_norm": 4.618763611067563,
+      "learning_rate": 8.360489289678585e-06,
+      "loss": 2.9402,
+      "step": 47
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 5.506785974818791,
+      "learning_rate": 8.406206186877936e-06,
+      "loss": 3.382,
+      "step": 48
+    },
+    {
+      "epoch": 0.41813333333333336,
+      "grad_norm": 4.68603207809794,
+      "learning_rate": 8.450980400142568e-06,
+      "loss": 2.9918,
+      "step": 49
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 5.124033394817131,
+      "learning_rate": 8.494850021680093e-06,
+      "loss": 3.3202,
+      "step": 50
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 4.293001183481895,
+      "learning_rate": 8.537850880489681e-06,
+      "loss": 2.8519,
+      "step": 51
+    },
+    {
+      "epoch": 0.4437333333333333,
+      "grad_norm": 4.382596858902394,
+      "learning_rate": 8.580016718173996e-06,
+      "loss": 2.9683,
+      "step": 52
+    },
+    {
+      "epoch": 0.45226666666666665,
+      "grad_norm": 4.3176263388044696,
+      "learning_rate": 8.621379348003945e-06,
+      "loss": 2.9257,
+      "step": 53
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 4.5250022171605195,
+      "learning_rate": 8.661968799114844e-06,
+      "loss": 3.0556,
+      "step": 54
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 4.429424190600661,
+      "learning_rate": 8.701813447471218e-06,
+      "loss": 2.9513,
+      "step": 55
+    },
+    {
+      "epoch": 0.47786666666666666,
+      "grad_norm": 4.349652568052827,
+      "learning_rate": 8.740940135031001e-06,
+      "loss": 2.9029,
+      "step": 56
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 4.299227871435445,
+      "learning_rate": 8.779374278362457e-06,
+      "loss": 2.5989,
+      "step": 57
+    },
+    {
+      "epoch": 0.49493333333333334,
+      "grad_norm": 4.562461330302201,
+      "learning_rate": 8.817139967814684e-06,
+      "loss": 2.8158,
+      "step": 58
+    },
+    {
+      "epoch": 0.5034666666666666,
+      "grad_norm": 4.606987182758338,
+      "learning_rate": 8.854260058210721e-06,
+      "loss": 2.6272,
+      "step": 59
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 4.9420031522511545,
+      "learning_rate": 8.890756251918216e-06,
+      "loss": 2.5488,
+      "step": 60
+    },
+    {
+      "epoch": 0.5205333333333333,
+      "grad_norm": 4.706462297046012,
+      "learning_rate": 8.926649175053834e-06,
+      "loss": 2.3575,
+      "step": 61
+    },
+    {
+      "epoch": 0.5290666666666667,
+      "grad_norm": 4.862820204363494,
+      "learning_rate": 8.961958447491269e-06,
+      "loss": 2.2952,
+      "step": 62
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 4.911045913397774,
+      "learning_rate": 8.996702747267908e-06,
+      "loss": 2.1768,
+      "step": 63
+    },
+    {
+      "epoch": 0.5461333333333334,
+      "grad_norm": 5.46978680182973,
+      "learning_rate": 9.030899869919434e-06,
+      "loss": 2.2528,
+      "step": 64
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 5.847558397227374,
+      "learning_rate": 9.064566783214276e-06,
+      "loss": 2.2401,
+      "step": 65
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 5.984440656257,
+      "learning_rate": 9.097719677709343e-06,
+      "loss": 2.156,
+      "step": 66
+    },
+    {
+      "epoch": 0.5717333333333333,
+      "grad_norm": 6.146172189799918,
+      "learning_rate": 9.130374013504131e-06,
+      "loss": 2.0059,
+      "step": 67
+    },
+    {
+      "epoch": 0.5802666666666667,
+      "grad_norm": 5.725706778130614,
+      "learning_rate": 9.162544563531182e-06,
+      "loss": 1.7756,
+      "step": 68
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 6.479060263133115,
+      "learning_rate": 9.194245453686277e-06,
+      "loss": 1.7651,
+      "step": 69
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 7.319291050667066,
+      "learning_rate": 9.225490200071284e-06,
+      "loss": 1.7712,
+      "step": 70
+    },
+    {
+      "epoch": 0.6058666666666667,
+      "grad_norm": 6.913275412032087,
+      "learning_rate": 9.256291743595376e-06,
+      "loss": 1.709,
+      "step": 71
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 6.600657239614328,
+      "learning_rate": 9.28666248215634e-06,
+      "loss": 1.3731,
+      "step": 72
+    },
+    {
+      "epoch": 0.6229333333333333,
+      "grad_norm": 7.301483724647945,
+      "learning_rate": 9.316614300602277e-06,
+      "loss": 1.4166,
+      "step": 73
+    },
+    {
+      "epoch": 0.6314666666666666,
+      "grad_norm": 7.154933225265475,
+      "learning_rate": 9.346158598654881e-06,
+      "loss": 1.2797,
+      "step": 74
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 8.248472592538771,
+      "learning_rate": 9.375306316958499e-06,
+      "loss": 1.2082,
+      "step": 75
+    },
+    {
+      "epoch": 0.6485333333333333,
+      "grad_norm": 7.444479096112177,
+      "learning_rate": 9.404067961403957e-06,
+      "loss": 1.0402,
+      "step": 76
+    },
+    {
+      "epoch": 0.6570666666666667,
+      "grad_norm": 6.819760434594012,
+      "learning_rate": 9.432453625862409e-06,
+      "loss": 0.8244,
+      "step": 77
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 6.894760862855001,
+      "learning_rate": 9.460473013452401e-06,
+      "loss": 0.8345,
+      "step": 78
+    },
+    {
+      "epoch": 0.6741333333333334,
+      "grad_norm": 6.001848571839919,
+      "learning_rate": 9.488135456452207e-06,
+      "loss": 0.6839,
+      "step": 79
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 5.709147411501981,
+      "learning_rate": 9.515449934959717e-06,
+      "loss": 0.6567,
+      "step": 80
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 4.128977158730638,
+      "learning_rate": 9.542425094393249e-06,
+      "loss": 0.545,
+      "step": 81
+    },
+    {
+      "epoch": 0.6997333333333333,
+      "grad_norm": 2.604915806147427,
+      "learning_rate": 9.569069261918582e-06,
+      "loss": 0.4596,
+      "step": 82
+    },
+    {
+      "epoch": 0.7082666666666667,
+      "grad_norm": 2.039939253407506,
+      "learning_rate": 9.59539046188037e-06,
+      "loss": 0.452,
+      "step": 83
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0398988141415337,
+      "learning_rate": 9.621396430309407e-06,
+      "loss": 0.4538,
+      "step": 84
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 2.37589477950211,
+      "learning_rate": 9.647094628571464e-06,
+      "loss": 0.4505,
+      "step": 85
+    },
+    {
+      "epoch": 0.7338666666666667,
+      "grad_norm": 2.80580920047501,
+      "learning_rate": 9.672492256217837e-06,
+      "loss": 0.5284,
+      "step": 86
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 2.3687428819051197,
+      "learning_rate": 9.697596263093091e-06,
+      "loss": 0.4371,
+      "step": 87
+    },
+    {
+      "epoch": 0.7509333333333333,
+      "grad_norm": 1.6362502854757155,
+      "learning_rate": 9.722413360750844e-06,
+      "loss": 0.3652,
+      "step": 88
+    },
+    {
+      "epoch": 0.7594666666666666,
+      "grad_norm": 1.5360860168740427,
+      "learning_rate": 9.746950033224562e-06,
+      "loss": 0.3235,
+      "step": 89
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 1.7245475092642693,
+      "learning_rate": 9.771212547196623e-06,
+      "loss": 0.3072,
+      "step": 90
+    },
+    {
+      "epoch": 0.7765333333333333,
+      "grad_norm": 1.4493496982196852,
+      "learning_rate": 9.795206961605467e-06,
+      "loss": 0.2474,
+      "step": 91
+    },
+    {
+      "epoch": 0.7850666666666667,
+      "grad_norm": 1.1662262130552072,
+      "learning_rate": 9.818939136727777e-06,
+      "loss": 0.2684,
+      "step": 92
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.1727132215390659,
+      "learning_rate": 9.842414742769675e-06,
+      "loss": 0.3456,
+      "step": 93
+    },
+    {
+      "epoch": 0.8021333333333334,
+      "grad_norm": 0.8435059300379855,
+      "learning_rate": 9.865639267998493e-06,
+      "loss": 0.227,
+      "step": 94
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.8593375804730568,
+      "learning_rate": 9.888618026444238e-06,
+      "loss": 0.1985,
+      "step": 95
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.0673772841412472,
+      "learning_rate": 9.911356165197841e-06,
+      "loss": 0.3195,
+      "step": 96
+    },
+    {
+      "epoch": 0.8277333333333333,
+      "grad_norm": 0.9341285801648793,
+      "learning_rate": 9.933858671331224e-06,
+      "loss": 0.213,
+      "step": 97
+    },
+    {
+      "epoch": 0.8362666666666667,
+      "grad_norm": 0.7197728549764331,
+      "learning_rate": 9.956130378462474e-06,
+      "loss": 0.2067,
+      "step": 98
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.5655901060353195,
+      "learning_rate": 9.978175972987748e-06,
+      "loss": 0.1708,
+      "step": 99
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.4681745812066334,
+      "learning_rate": 9.999999999999999e-06,
+      "loss": 0.1983,
+      "step": 100
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.4183197916633498e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-100/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840

checkpoint-120/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-120/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
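The `lora_alpha`, `r`, and `use_rslora` fields above determine how strongly the LoRA update is scaled before it is added to the frozen base weights. The following is a minimal sketch, not code from this repository: the helper name `lora_scaling` is hypothetical, and it assumes the usual PEFT convention of `lora_alpha / r` (or `lora_alpha / sqrt(r)` when `use_rslora` is true). Only the fields relevant to scaling are reproduced.

```python
import json
import math

# Subset of the adapter_config.json fields shown above.
ADAPTER_CONFIG = """
{
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "r": 16,
  "use_rslora": false
}
"""


def lora_scaling(cfg: dict) -> float:
    """Return the multiplier applied to the low-rank update B @ A.

    Assumes the standard convention: alpha / r, or alpha / sqrt(r)
    when rank-stabilized LoRA is enabled.
    """
    alpha = cfg["lora_alpha"]
    rank = cfg["r"]
    if cfg.get("use_rslora", False):
        return alpha / math.sqrt(rank)
    return alpha / rank


cfg = json.loads(ADAPTER_CONFIG)
scaling = lora_scaling(cfg)
print(f"LoRA scaling factor: {scaling}")
```

With the values in this checkpoint (`lora_alpha` 32, `r` 16, `use_rslora` false), the update is scaled by a factor of 2.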

checkpoint-120/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:edbd90e3439f57e995d61f218ff6660a7740c5b9bc153415f6b8ebb873b75737
+size 1138856856
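The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. As a rough illustration, the pointer can be parsed like this (the function name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling):

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:edbd90e3439f57e995d61f218ff6660a7740c5b9bc153415f6b8ebb873b75737
size 1138856856
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['hash_algo']} digest, {pointer['size'] / 2**30:.2f} GiB")
```

The `size` field shows that the adapter file for this checkpoint is a little over 1 GiB.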

checkpoint-120/trainer_state.json ADDED

	@@ -0,0 +1,873 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.024,
+  "eval_steps": 500,
+  "global_step": 120,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 120.65595457746791,
+      "learning_rate": 6.611096473669596e-06,
+      "loss": 15.252,
+      "step": 21
+    },
+    {
+      "epoch": 0.18773333333333334,
+      "grad_norm": 133.05280466087515,
+      "learning_rate": 6.712113404111031e-06,
+      "loss": 14.1391,
+      "step": 22
+    },
+    {
+      "epoch": 0.19626666666666667,
+      "grad_norm": 127.95029628849048,
+      "learning_rate": 6.808639180087963e-06,
+      "loss": 12.9566,
+      "step": 23
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 108.83495245094748,
+      "learning_rate": 6.90105620855803e-06,
+      "loss": 11.8743,
+      "step": 24
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 99.90727146021455,
+      "learning_rate": 6.989700043360187e-06,
+      "loss": 10.962,
+      "step": 25
+    },
+    {
+      "epoch": 0.22186666666666666,
+      "grad_norm": 98.37126740059823,
+      "learning_rate": 7.074866739854089e-06,
+      "loss": 9.9919,
+      "step": 26
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 92.26708429201608,
+      "learning_rate": 7.156818820794936e-06,
+      "loss": 8.8811,
+      "step": 27
+    },
+    {
+      "epoch": 0.23893333333333333,
+      "grad_norm": 83.36099898839835,
+      "learning_rate": 7.235790156711096e-06,
+      "loss": 7.7806,
+      "step": 28
+    },
+    {
+      "epoch": 0.24746666666666667,
+      "grad_norm": 68.07500315598597,
+      "learning_rate": 7.3119899894947795e-06,
+      "loss": 7.0528,
+      "step": 29
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 69.58960332280246,
+      "learning_rate": 7.385606273598311e-06,
+      "loss": 6.3683,
+      "step": 30
+    },
+    {
+      "epoch": 0.26453333333333334,
+      "grad_norm": 68.77532204123075,
+      "learning_rate": 7.456808469171363e-06,
+      "loss": 6.1635,
+      "step": 31
+    },
+    {
+      "epoch": 0.2730666666666667,
+      "grad_norm": 66.29676636510072,
+      "learning_rate": 7.5257498915995295e-06,
+      "loss": 4.711,
+      "step": 32
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 42.87145091679237,
+      "learning_rate": 7.592569699389437e-06,
+      "loss": 4.5119,
+      "step": 33
+    },
+    {
+      "epoch": 0.29013333333333335,
+      "grad_norm": 26.2592350291551,
+      "learning_rate": 7.657394585211274e-06,
+      "loss": 4.31,
+      "step": 34
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 15.35959008067237,
+      "learning_rate": 7.720340221751376e-06,
+      "loss": 4.0001,
+      "step": 35
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 8.50847651865227,
+      "learning_rate": 7.781512503836437e-06,
+      "loss": 3.5723,
+      "step": 36
+    },
+    {
+      "epoch": 0.3157333333333333,
+      "grad_norm": 6.562581089063746,
+      "learning_rate": 7.841008620334974e-06,
+      "loss": 3.9254,
+      "step": 37
+    },
+    {
+      "epoch": 0.32426666666666665,
+      "grad_norm": 5.6145595722250095,
+      "learning_rate": 7.89891798308405e-06,
+      "loss": 3.8746,
+      "step": 38
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 5.385367220486204,
+      "learning_rate": 7.955323035132495e-06,
+      "loss": 3.8128,
+      "step": 39
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 5.403447124703616,
+      "learning_rate": 8.010299956639811e-06,
+      "loss": 3.885,
+      "step": 40
+    },
+    {
+      "epoch": 0.34986666666666666,
+      "grad_norm": 5.48242204895128,
+      "learning_rate": 8.063919283598677e-06,
+      "loss": 3.8048,
+      "step": 41
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 5.5525098950513865,
+      "learning_rate": 8.116246451989503e-06,
+      "loss": 3.7508,
+      "step": 42
+    },
+    {
+      "epoch": 0.36693333333333333,
+      "grad_norm": 5.354384520535484,
+      "learning_rate": 8.167342277897933e-06,
+      "loss": 3.5069,
+      "step": 43
+    },
+    {
+      "epoch": 0.37546666666666667,
+      "grad_norm": 5.46272338131107,
+      "learning_rate": 8.217263382430936e-06,
+      "loss": 3.6747,
+      "step": 44
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 4.798550688968453,
+      "learning_rate": 8.266062568876717e-06,
+      "loss": 3.1609,
+      "step": 45
+    },
+    {
+      "epoch": 0.39253333333333335,
+      "grad_norm": 5.755104452953421,
+      "learning_rate": 8.31378915840787e-06,
+      "loss": 3.5733,
+      "step": 46
+    },
+    {
+      "epoch": 0.4010666666666667,
+      "grad_norm": 4.618763611067563,
+      "learning_rate": 8.360489289678585e-06,
+      "loss": 2.9402,
+      "step": 47
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 5.506785974818791,
+      "learning_rate": 8.406206186877936e-06,
+      "loss": 3.382,
+      "step": 48
+    },
+    {
+      "epoch": 0.41813333333333336,
+      "grad_norm": 4.68603207809794,
+      "learning_rate": 8.450980400142568e-06,
+      "loss": 2.9918,
+      "step": 49
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 5.124033394817131,
+      "learning_rate": 8.494850021680093e-06,
+      "loss": 3.3202,
+      "step": 50
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 4.293001183481895,
+      "learning_rate": 8.537850880489681e-06,
+      "loss": 2.8519,
+      "step": 51
+    },
+    {
+      "epoch": 0.4437333333333333,
+      "grad_norm": 4.382596858902394,
+      "learning_rate": 8.580016718173996e-06,
+      "loss": 2.9683,
+      "step": 52
+    },
+    {
+      "epoch": 0.45226666666666665,
+      "grad_norm": 4.3176263388044696,
+      "learning_rate": 8.621379348003945e-06,
+      "loss": 2.9257,
+      "step": 53
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 4.5250022171605195,
+      "learning_rate": 8.661968799114844e-06,
+      "loss": 3.0556,
+      "step": 54
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 4.429424190600661,
+      "learning_rate": 8.701813447471218e-06,
+      "loss": 2.9513,
+      "step": 55
+    },
+    {
+      "epoch": 0.47786666666666666,
+      "grad_norm": 4.349652568052827,
+      "learning_rate": 8.740940135031001e-06,
+      "loss": 2.9029,
+      "step": 56
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 4.299227871435445,
+      "learning_rate": 8.779374278362457e-06,
+      "loss": 2.5989,
+      "step": 57
+    },
+    {
+      "epoch": 0.49493333333333334,
+      "grad_norm": 4.562461330302201,
+      "learning_rate": 8.817139967814684e-06,
+      "loss": 2.8158,
+      "step": 58
+    },
+    {
+      "epoch": 0.5034666666666666,
+      "grad_norm": 4.606987182758338,
+      "learning_rate": 8.854260058210721e-06,
+      "loss": 2.6272,
+      "step": 59
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 4.9420031522511545,
+      "learning_rate": 8.890756251918216e-06,
+      "loss": 2.5488,
+      "step": 60
+    },
+    {
+      "epoch": 0.5205333333333333,
+      "grad_norm": 4.706462297046012,
+      "learning_rate": 8.926649175053834e-06,
+      "loss": 2.3575,
+      "step": 61
+    },
+    {
+      "epoch": 0.5290666666666667,
+      "grad_norm": 4.862820204363494,
+      "learning_rate": 8.961958447491269e-06,
+      "loss": 2.2952,
+      "step": 62
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 4.911045913397774,
+      "learning_rate": 8.996702747267908e-06,
+      "loss": 2.1768,
+      "step": 63
+    },
+    {
+      "epoch": 0.5461333333333334,
+      "grad_norm": 5.46978680182973,
+      "learning_rate": 9.030899869919434e-06,
+      "loss": 2.2528,
+      "step": 64
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 5.847558397227374,
+      "learning_rate": 9.064566783214276e-06,
+      "loss": 2.2401,
+      "step": 65
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 5.984440656257,
+      "learning_rate": 9.097719677709343e-06,
+      "loss": 2.156,
+      "step": 66
+    },
+    {
+      "epoch": 0.5717333333333333,
+      "grad_norm": 6.146172189799918,
+      "learning_rate": 9.130374013504131e-06,
+      "loss": 2.0059,
+      "step": 67
+    },
+    {
+      "epoch": 0.5802666666666667,
+      "grad_norm": 5.725706778130614,
+      "learning_rate": 9.162544563531182e-06,
+      "loss": 1.7756,
+      "step": 68
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 6.479060263133115,
+      "learning_rate": 9.194245453686277e-06,
+      "loss": 1.7651,
+      "step": 69
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 7.319291050667066,
+      "learning_rate": 9.225490200071284e-06,
+      "loss": 1.7712,
+      "step": 70
+    },
+    {
+      "epoch": 0.6058666666666667,
+      "grad_norm": 6.913275412032087,
+      "learning_rate": 9.256291743595376e-06,
+      "loss": 1.709,
+      "step": 71
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 6.600657239614328,
+      "learning_rate": 9.28666248215634e-06,
+      "loss": 1.3731,
+      "step": 72
+    },
+    {
+      "epoch": 0.6229333333333333,
+      "grad_norm": 7.301483724647945,
+      "learning_rate": 9.316614300602277e-06,
+      "loss": 1.4166,
+      "step": 73
+    },
+    {
+      "epoch": 0.6314666666666666,
+      "grad_norm": 7.154933225265475,
+      "learning_rate": 9.346158598654881e-06,
+      "loss": 1.2797,
+      "step": 74
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 8.248472592538771,
+      "learning_rate": 9.375306316958499e-06,
+      "loss": 1.2082,
+      "step": 75
+    },
+    {
+      "epoch": 0.6485333333333333,
+      "grad_norm": 7.444479096112177,
+      "learning_rate": 9.404067961403957e-06,
+      "loss": 1.0402,
+      "step": 76
+    },
+    {
+      "epoch": 0.6570666666666667,
+      "grad_norm": 6.819760434594012,
+      "learning_rate": 9.432453625862409e-06,
+      "loss": 0.8244,
+      "step": 77
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 6.894760862855001,
+      "learning_rate": 9.460473013452401e-06,
+      "loss": 0.8345,
+      "step": 78
+    },
+    {
+      "epoch": 0.6741333333333334,
+      "grad_norm": 6.001848571839919,
+      "learning_rate": 9.488135456452207e-06,
+      "loss": 0.6839,
+      "step": 79
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 5.709147411501981,
+      "learning_rate": 9.515449934959717e-06,
+      "loss": 0.6567,
+      "step": 80
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 4.128977158730638,
+      "learning_rate": 9.542425094393249e-06,
+      "loss": 0.545,
+      "step": 81
+    },
+    {
+      "epoch": 0.6997333333333333,
+      "grad_norm": 2.604915806147427,
+      "learning_rate": 9.569069261918582e-06,
+      "loss": 0.4596,
+      "step": 82
+    },
+    {
+      "epoch": 0.7082666666666667,
+      "grad_norm": 2.039939253407506,
+      "learning_rate": 9.59539046188037e-06,
+      "loss": 0.452,
+      "step": 83
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0398988141415337,
+      "learning_rate": 9.621396430309407e-06,
+      "loss": 0.4538,
+      "step": 84
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 2.37589477950211,
+      "learning_rate": 9.647094628571464e-06,
+      "loss": 0.4505,
+      "step": 85
+    },
+    {
+      "epoch": 0.7338666666666667,
+      "grad_norm": 2.80580920047501,
+      "learning_rate": 9.672492256217837e-06,
+      "loss": 0.5284,
+      "step": 86
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 2.3687428819051197,
+      "learning_rate": 9.697596263093091e-06,
+      "loss": 0.4371,
+      "step": 87
+    },
+    {
+      "epoch": 0.7509333333333333,
+      "grad_norm": 1.6362502854757155,
+      "learning_rate": 9.722413360750844e-06,
+      "loss": 0.3652,
+      "step": 88
+    },
+    {
+      "epoch": 0.7594666666666666,
+      "grad_norm": 1.5360860168740427,
+      "learning_rate": 9.746950033224562e-06,
+      "loss": 0.3235,
+      "step": 89
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 1.7245475092642693,
+      "learning_rate": 9.771212547196623e-06,
+      "loss": 0.3072,
+      "step": 90
+    },
+    {
+      "epoch": 0.7765333333333333,
+      "grad_norm": 1.4493496982196852,
+      "learning_rate": 9.795206961605467e-06,
+      "loss": 0.2474,
+      "step": 91
+    },
+    {
+      "epoch": 0.7850666666666667,
+      "grad_norm": 1.1662262130552072,
+      "learning_rate": 9.818939136727777e-06,
+      "loss": 0.2684,
+      "step": 92
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.1727132215390659,
+      "learning_rate": 9.842414742769675e-06,
+      "loss": 0.3456,
+      "step": 93
+    },
+    {
+      "epoch": 0.8021333333333334,
+      "grad_norm": 0.8435059300379855,
+      "learning_rate": 9.865639267998493e-06,
+      "loss": 0.227,
+      "step": 94
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.8593375804730568,
+      "learning_rate": 9.888618026444238e-06,
+      "loss": 0.1985,
+      "step": 95
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.0673772841412472,
+      "learning_rate": 9.911356165197841e-06,
+      "loss": 0.3195,
+      "step": 96
+    },
+    {
+      "epoch": 0.8277333333333333,
+      "grad_norm": 0.9341285801648793,
+      "learning_rate": 9.933858671331224e-06,
+      "loss": 0.213,
+      "step": 97
+    },
+    {
+      "epoch": 0.8362666666666667,
+      "grad_norm": 0.7197728549764331,
+      "learning_rate": 9.956130378462474e-06,
+      "loss": 0.2067,
+      "step": 98
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.5655901060353195,
+      "learning_rate": 9.978175972987748e-06,
+      "loss": 0.1708,
+      "step": 99
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.4681745812066334,
+      "learning_rate": 9.999999999999999e-06,
+      "loss": 0.1983,
+      "step": 100
+    },
+    {
+      "epoch": 0.8618666666666667,
+      "grad_norm": 0.4488180280567293,
+      "learning_rate": 1e-05,
+      "loss": 0.1401,
+      "step": 101
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 0.43194512376224187,
+      "learning_rate": 1e-05,
+      "loss": 0.1097,
+      "step": 102
+    },
+    {
+      "epoch": 0.8789333333333333,
+      "grad_norm": 0.3754480982834532,
+      "learning_rate": 1e-05,
+      "loss": 0.1531,
+      "step": 103
+    },
+    {
+      "epoch": 0.8874666666666666,
+      "grad_norm": 0.34151633602448267,
+      "learning_rate": 1e-05,
+      "loss": 0.1685,
+      "step": 104
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.26356638458244175,
+      "learning_rate": 1e-05,
+      "loss": 0.1104,
+      "step": 105
+    },
+    {
+      "epoch": 0.9045333333333333,
+      "grad_norm": 0.27641004897246113,
+      "learning_rate": 1e-05,
+      "loss": 0.1589,
+      "step": 106
+    },
+    {
+      "epoch": 0.9130666666666667,
+      "grad_norm": 0.1639383504796773,
+      "learning_rate": 1e-05,
+      "loss": 0.1064,
+      "step": 107
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 0.24233145434818837,
+      "learning_rate": 1e-05,
+      "loss": 0.1385,
+      "step": 108
+    },
+    {
+      "epoch": 0.9301333333333334,
+      "grad_norm": 0.16015184210317215,
+      "learning_rate": 1e-05,
+      "loss": 0.121,
+      "step": 109
+    },
+    {
+      "epoch": 0.9386666666666666,
+      "grad_norm": 0.14931644417242712,
+      "learning_rate": 1e-05,
+      "loss": 0.1117,
+      "step": 110
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 0.15078311335939154,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 111
+    },
+    {
+      "epoch": 0.9557333333333333,
+      "grad_norm": 0.16714082761639734,
+      "learning_rate": 1e-05,
+      "loss": 0.115,
+      "step": 112
+    },
+    {
+      "epoch": 0.9642666666666667,
+      "grad_norm": 0.12479711996187942,
+      "learning_rate": 1e-05,
+      "loss": 0.1029,
+      "step": 113
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 0.14783351137940065,
+      "learning_rate": 1e-05,
+      "loss": 0.0987,
+      "step": 114
+    },
+    {
+      "epoch": 0.9813333333333333,
+      "grad_norm": 0.11311876630863582,
+      "learning_rate": 1e-05,
+      "loss": 0.0911,
+      "step": 115
+    },
+    {
+      "epoch": 0.9898666666666667,
+      "grad_norm": 0.1238329581090649,
+      "learning_rate": 1e-05,
+      "loss": 0.1095,
+      "step": 116
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 0.11117413394533605,
+      "learning_rate": 1e-05,
+      "loss": 0.0968,
+      "step": 117
+    },
+    {
+      "epoch": 1.0069333333333332,
+      "grad_norm": 0.09247708923706752,
+      "learning_rate": 1e-05,
+      "loss": 0.0985,
+      "step": 118
+    },
+    {
+      "epoch": 1.0154666666666667,
+      "grad_norm": 0.12028574166046906,
+      "learning_rate": 1e-05,
+      "loss": 0.1085,
+      "step": 119
+    },
+    {
+      "epoch": 1.024,
+      "grad_norm": 0.075460717991084,
+      "learning_rate": 1e-05,
+      "loss": 0.1007,
+      "step": 120
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.7047845829953454e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
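The `learning_rate` values in the `log_history` above are consistent with a logarithmic warmup that reaches 1e-05 at step 100 and stays constant afterwards. The sketch below assumes that schedule and reproduces a few logged values. The function name `log_warmup_lr` and its `peak_lr` and `warmup_steps` parameters are illustrative, inferred from the logged numbers rather than taken from the training code.

```python
import math


def log_warmup_lr(step: int, peak_lr: float = 1e-5, warmup_steps: int = 100) -> float:
    """Learning rate at a 1-based logged step under a logarithmic warmup.

    The rate grows as ln(step) / ln(warmup_steps) and is capped at peak_lr
    once the warmup is over. This is an assumed schedule, inferred from the log.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    gamma = min(1.0, math.log(step) / math.log(warmup_steps))
    return peak_lr * gamma


# Values copied from the log_history entries above (step -> learning_rate).
LOGGED = {
    1: 0.0,
    2: 1.5051499783199057e-06,
    10: 4.9999999999999996e-06,
    50: 8.494850021680093e-06,
    100: 9.999999999999999e-06,
    120: 1e-05,
}

for step, logged_lr in LOGGED.items():
    predicted = log_warmup_lr(step)
    print(f"step {step:>3}: logged={logged_lr:.6e} predicted={predicted:.6e}")
```

If the assumption holds, the predicted and logged rates agree to within floating-point rounding at every step listed.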

checkpoint-120/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840

checkpoint-140/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-140/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
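
As a reading aid for the config above: `lora_alpha` and `r` together determine how strongly the low-rank update is weighted. The sketch below computes that factor from a subset of the fields shown. It assumes the usual PEFT convention of `lora_alpha / r`, or `lora_alpha / sqrt(r)` when `use_rslora` is true. The names `ADAPTER_CONFIG` and `lora_scaling` are illustrative and are not part of the commit.

```python
import json
import math

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "r": 16,
  "use_rslora": false,
  "target_modules": ["k_proj", "q_proj", "v_proj", "down_proj",
                     "up_proj", "gate_proj", "lm_head", "o_proj"]
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor that multiplies the low-rank update B @ A.

    Assumption: standard LoRA uses alpha / r, and rank-stabilized LoRA
    (use_rslora = true) uses alpha / sqrt(r).
    """
    alpha, rank = config["lora_alpha"], config["r"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(rank)
    return alpha / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"rank={ADAPTER_CONFIG['r']}, scaling={scaling}")
```

With the committed values (`lora_alpha` 32, `r` 16, `use_rslora` false) the factor is 2.0.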

checkpoint-140/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f62f96cd012cd05339ffc07e7d0c68b8d45f440cf07e6488c18442caf3c457e
+size 1138856856
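
The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real file. A minimal sketch of how a downloaded `adapter_model.safetensors` could be checked against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is whatever the reader supplies.

```python
import hashlib
import os

# Pointer text copied from the diff above (without the leading "+").
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6f62f96cd012cd05339ffc07e7d0c68b8d45f440cf07e6488c18442caf3c457e
size 1138856856
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Stream in chunks so a file of about 1 GB is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

Calling `matches_pointer("adapter_model.safetensors", pointer)` on a local copy would then report whether the download is complete and uncorrupted.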

checkpoint-140/trainer_state.json ADDED

	@@ -0,0 +1,1013 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.1946666666666665,
+  "eval_steps": 500,
+  "global_step": 140,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 120.65595457746791,
+      "learning_rate": 6.611096473669596e-06,
+      "loss": 15.252,
+      "step": 21
+    },
+    {
+      "epoch": 0.18773333333333334,
+      "grad_norm": 133.05280466087515,
+      "learning_rate": 6.712113404111031e-06,
+      "loss": 14.1391,
+      "step": 22
+    },
+    {
+      "epoch": 0.19626666666666667,
+      "grad_norm": 127.95029628849048,
+      "learning_rate": 6.808639180087963e-06,
+      "loss": 12.9566,
+      "step": 23
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 108.83495245094748,
+      "learning_rate": 6.90105620855803e-06,
+      "loss": 11.8743,
+      "step": 24
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 99.90727146021455,
+      "learning_rate": 6.989700043360187e-06,
+      "loss": 10.962,
+      "step": 25
+    },
+    {
+      "epoch": 0.22186666666666666,
+      "grad_norm": 98.37126740059823,
+      "learning_rate": 7.074866739854089e-06,
+      "loss": 9.9919,
+      "step": 26
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 92.26708429201608,
+      "learning_rate": 7.156818820794936e-06,
+      "loss": 8.8811,
+      "step": 27
+    },
+    {
+      "epoch": 0.23893333333333333,
+      "grad_norm": 83.36099898839835,
+      "learning_rate": 7.235790156711096e-06,
+      "loss": 7.7806,
+      "step": 28
+    },
+    {
+      "epoch": 0.24746666666666667,
+      "grad_norm": 68.07500315598597,
+      "learning_rate": 7.3119899894947795e-06,
+      "loss": 7.0528,
+      "step": 29
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 69.58960332280246,
+      "learning_rate": 7.385606273598311e-06,
+      "loss": 6.3683,
+      "step": 30
+    },
+    {
+      "epoch": 0.26453333333333334,
+      "grad_norm": 68.77532204123075,
+      "learning_rate": 7.456808469171363e-06,
+      "loss": 6.1635,
+      "step": 31
+    },
+    {
+      "epoch": 0.2730666666666667,
+      "grad_norm": 66.29676636510072,
+      "learning_rate": 7.5257498915995295e-06,
+      "loss": 4.711,
+      "step": 32
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 42.87145091679237,
+      "learning_rate": 7.592569699389437e-06,
+      "loss": 4.5119,
+      "step": 33
+    },
+    {
+      "epoch": 0.29013333333333335,
+      "grad_norm": 26.2592350291551,
+      "learning_rate": 7.657394585211274e-06,
+      "loss": 4.31,
+      "step": 34
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 15.35959008067237,
+      "learning_rate": 7.720340221751376e-06,
+      "loss": 4.0001,
+      "step": 35
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 8.50847651865227,
+      "learning_rate": 7.781512503836437e-06,
+      "loss": 3.5723,
+      "step": 36
+    },
+    {
+      "epoch": 0.3157333333333333,
+      "grad_norm": 6.562581089063746,
+      "learning_rate": 7.841008620334974e-06,
+      "loss": 3.9254,
+      "step": 37
+    },
+    {
+      "epoch": 0.32426666666666665,
+      "grad_norm": 5.6145595722250095,
+      "learning_rate": 7.89891798308405e-06,
+      "loss": 3.8746,
+      "step": 38
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 5.385367220486204,
+      "learning_rate": 7.955323035132495e-06,
+      "loss": 3.8128,
+      "step": 39
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 5.403447124703616,
+      "learning_rate": 8.010299956639811e-06,
+      "loss": 3.885,
+      "step": 40
+    },
+    {
+      "epoch": 0.34986666666666666,
+      "grad_norm": 5.48242204895128,
+      "learning_rate": 8.063919283598677e-06,
+      "loss": 3.8048,
+      "step": 41
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 5.5525098950513865,
+      "learning_rate": 8.116246451989503e-06,
+      "loss": 3.7508,
+      "step": 42
+    },
+    {
+      "epoch": 0.36693333333333333,
+      "grad_norm": 5.354384520535484,
+      "learning_rate": 8.167342277897933e-06,
+      "loss": 3.5069,
+      "step": 43
+    },
+    {
+      "epoch": 0.37546666666666667,
+      "grad_norm": 5.46272338131107,
+      "learning_rate": 8.217263382430936e-06,
+      "loss": 3.6747,
+      "step": 44
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 4.798550688968453,
+      "learning_rate": 8.266062568876717e-06,
+      "loss": 3.1609,
+      "step": 45
+    },
+    {
+      "epoch": 0.39253333333333335,
+      "grad_norm": 5.755104452953421,
+      "learning_rate": 8.31378915840787e-06,
+      "loss": 3.5733,
+      "step": 46
+    },
+    {
+      "epoch": 0.4010666666666667,
+      "grad_norm": 4.618763611067563,
+      "learning_rate": 8.360489289678585e-06,
+      "loss": 2.9402,
+      "step": 47
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 5.506785974818791,
+      "learning_rate": 8.406206186877936e-06,
+      "loss": 3.382,
+      "step": 48
+    },
+    {
+      "epoch": 0.41813333333333336,
+      "grad_norm": 4.68603207809794,
+      "learning_rate": 8.450980400142568e-06,
+      "loss": 2.9918,
+      "step": 49
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 5.124033394817131,
+      "learning_rate": 8.494850021680093e-06,
+      "loss": 3.3202,
+      "step": 50
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 4.293001183481895,
+      "learning_rate": 8.537850880489681e-06,
+      "loss": 2.8519,
+      "step": 51
+    },
+    {
+      "epoch": 0.4437333333333333,
+      "grad_norm": 4.382596858902394,
+      "learning_rate": 8.580016718173996e-06,
+      "loss": 2.9683,
+      "step": 52
+    },
+    {
+      "epoch": 0.45226666666666665,
+      "grad_norm": 4.3176263388044696,
+      "learning_rate": 8.621379348003945e-06,
+      "loss": 2.9257,
+      "step": 53
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 4.5250022171605195,
+      "learning_rate": 8.661968799114844e-06,
+      "loss": 3.0556,
+      "step": 54
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 4.429424190600661,
+      "learning_rate": 8.701813447471218e-06,
+      "loss": 2.9513,
+      "step": 55
+    },
+    {
+      "epoch": 0.47786666666666666,
+      "grad_norm": 4.349652568052827,
+      "learning_rate": 8.740940135031001e-06,
+      "loss": 2.9029,
+      "step": 56
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 4.299227871435445,
+      "learning_rate": 8.779374278362457e-06,
+      "loss": 2.5989,
+      "step": 57
+    },
+    {
+      "epoch": 0.49493333333333334,
+      "grad_norm": 4.562461330302201,
+      "learning_rate": 8.817139967814684e-06,
+      "loss": 2.8158,
+      "step": 58
+    },
+    {
+      "epoch": 0.5034666666666666,
+      "grad_norm": 4.606987182758338,
+      "learning_rate": 8.854260058210721e-06,
+      "loss": 2.6272,
+      "step": 59
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 4.9420031522511545,
+      "learning_rate": 8.890756251918216e-06,
+      "loss": 2.5488,
+      "step": 60
+    },
+    {
+      "epoch": 0.5205333333333333,
+      "grad_norm": 4.706462297046012,
+      "learning_rate": 8.926649175053834e-06,
+      "loss": 2.3575,
+      "step": 61
+    },
+    {
+      "epoch": 0.5290666666666667,
+      "grad_norm": 4.862820204363494,
+      "learning_rate": 8.961958447491269e-06,
+      "loss": 2.2952,
+      "step": 62
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 4.911045913397774,
+      "learning_rate": 8.996702747267908e-06,
+      "loss": 2.1768,
+      "step": 63
+    },
+    {
+      "epoch": 0.5461333333333334,
+      "grad_norm": 5.46978680182973,
+      "learning_rate": 9.030899869919434e-06,
+      "loss": 2.2528,
+      "step": 64
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 5.847558397227374,
+      "learning_rate": 9.064566783214276e-06,
+      "loss": 2.2401,
+      "step": 65
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 5.984440656257,
+      "learning_rate": 9.097719677709343e-06,
+      "loss": 2.156,
+      "step": 66
+    },
+    {
+      "epoch": 0.5717333333333333,
+      "grad_norm": 6.146172189799918,
+      "learning_rate": 9.130374013504131e-06,
+      "loss": 2.0059,
+      "step": 67
+    },
+    {
+      "epoch": 0.5802666666666667,
+      "grad_norm": 5.725706778130614,
+      "learning_rate": 9.162544563531182e-06,
+      "loss": 1.7756,
+      "step": 68
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 6.479060263133115,
+      "learning_rate": 9.194245453686277e-06,
+      "loss": 1.7651,
+      "step": 69
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 7.319291050667066,
+      "learning_rate": 9.225490200071284e-06,
+      "loss": 1.7712,
+      "step": 70
+    },
+    {
+      "epoch": 0.6058666666666667,
+      "grad_norm": 6.913275412032087,
+      "learning_rate": 9.256291743595376e-06,
+      "loss": 1.709,
+      "step": 71
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 6.600657239614328,
+      "learning_rate": 9.28666248215634e-06,
+      "loss": 1.3731,
+      "step": 72
+    },
+    {
+      "epoch": 0.6229333333333333,
+      "grad_norm": 7.301483724647945,
+      "learning_rate": 9.316614300602277e-06,
+      "loss": 1.4166,
+      "step": 73
+    },
+    {
+      "epoch": 0.6314666666666666,
+      "grad_norm": 7.154933225265475,
+      "learning_rate": 9.346158598654881e-06,
+      "loss": 1.2797,
+      "step": 74
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 8.248472592538771,
+      "learning_rate": 9.375306316958499e-06,
+      "loss": 1.2082,
+      "step": 75
+    },
+    {
+      "epoch": 0.6485333333333333,
+      "grad_norm": 7.444479096112177,
+      "learning_rate": 9.404067961403957e-06,
+      "loss": 1.0402,
+      "step": 76
+    },
+    {
+      "epoch": 0.6570666666666667,
+      "grad_norm": 6.819760434594012,
+      "learning_rate": 9.432453625862409e-06,
+      "loss": 0.8244,
+      "step": 77
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 6.894760862855001,
+      "learning_rate": 9.460473013452401e-06,
+      "loss": 0.8345,
+      "step": 78
+    },
+    {
+      "epoch": 0.6741333333333334,
+      "grad_norm": 6.001848571839919,
+      "learning_rate": 9.488135456452207e-06,
+      "loss": 0.6839,
+      "step": 79
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 5.709147411501981,
+      "learning_rate": 9.515449934959717e-06,
+      "loss": 0.6567,
+      "step": 80
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 4.128977158730638,
+      "learning_rate": 9.542425094393249e-06,
+      "loss": 0.545,
+      "step": 81
+    },
+    {
+      "epoch": 0.6997333333333333,
+      "grad_norm": 2.604915806147427,
+      "learning_rate": 9.569069261918582e-06,
+      "loss": 0.4596,
+      "step": 82
+    },
+    {
+      "epoch": 0.7082666666666667,
+      "grad_norm": 2.039939253407506,
+      "learning_rate": 9.59539046188037e-06,
+      "loss": 0.452,
+      "step": 83
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0398988141415337,
+      "learning_rate": 9.621396430309407e-06,
+      "loss": 0.4538,
+      "step": 84
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 2.37589477950211,
+      "learning_rate": 9.647094628571464e-06,
+      "loss": 0.4505,
+      "step": 85
+    },
+    {
+      "epoch": 0.7338666666666667,
+      "grad_norm": 2.80580920047501,
+      "learning_rate": 9.672492256217837e-06,
+      "loss": 0.5284,
+      "step": 86
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 2.3687428819051197,
+      "learning_rate": 9.697596263093091e-06,
+      "loss": 0.4371,
+      "step": 87
+    },
+    {
+      "epoch": 0.7509333333333333,
+      "grad_norm": 1.6362502854757155,
+      "learning_rate": 9.722413360750844e-06,
+      "loss": 0.3652,
+      "step": 88
+    },
+    {
+      "epoch": 0.7594666666666666,
+      "grad_norm": 1.5360860168740427,
+      "learning_rate": 9.746950033224562e-06,
+      "loss": 0.3235,
+      "step": 89
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 1.7245475092642693,
+      "learning_rate": 9.771212547196623e-06,
+      "loss": 0.3072,
+      "step": 90
+    },
+    {
+      "epoch": 0.7765333333333333,
+      "grad_norm": 1.4493496982196852,
+      "learning_rate": 9.795206961605467e-06,
+      "loss": 0.2474,
+      "step": 91
+    },
+    {
+      "epoch": 0.7850666666666667,
+      "grad_norm": 1.1662262130552072,
+      "learning_rate": 9.818939136727777e-06,
+      "loss": 0.2684,
+      "step": 92
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.1727132215390659,
+      "learning_rate": 9.842414742769675e-06,
+      "loss": 0.3456,
+      "step": 93
+    },
+    {
+      "epoch": 0.8021333333333334,
+      "grad_norm": 0.8435059300379855,
+      "learning_rate": 9.865639267998493e-06,
+      "loss": 0.227,
+      "step": 94
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.8593375804730568,
+      "learning_rate": 9.888618026444238e-06,
+      "loss": 0.1985,
+      "step": 95
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.0673772841412472,
+      "learning_rate": 9.911356165197841e-06,
+      "loss": 0.3195,
+      "step": 96
+    },
+    {
+      "epoch": 0.8277333333333333,
+      "grad_norm": 0.9341285801648793,
+      "learning_rate": 9.933858671331224e-06,
+      "loss": 0.213,
+      "step": 97
+    },
+    {
+      "epoch": 0.8362666666666667,
+      "grad_norm": 0.7197728549764331,
+      "learning_rate": 9.956130378462474e-06,
+      "loss": 0.2067,
+      "step": 98
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.5655901060353195,
+      "learning_rate": 9.978175972987748e-06,
+      "loss": 0.1708,
+      "step": 99
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.4681745812066334,
+      "learning_rate": 9.999999999999999e-06,
+      "loss": 0.1983,
+      "step": 100
+    },
+    {
+      "epoch": 0.8618666666666667,
+      "grad_norm": 0.4488180280567293,
+      "learning_rate": 1e-05,
+      "loss": 0.1401,
+      "step": 101
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 0.43194512376224187,
+      "learning_rate": 1e-05,
+      "loss": 0.1097,
+      "step": 102
+    },
+    {
+      "epoch": 0.8789333333333333,
+      "grad_norm": 0.3754480982834532,
+      "learning_rate": 1e-05,
+      "loss": 0.1531,
+      "step": 103
+    },
+    {
+      "epoch": 0.8874666666666666,
+      "grad_norm": 0.34151633602448267,
+      "learning_rate": 1e-05,
+      "loss": 0.1685,
+      "step": 104
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.26356638458244175,
+      "learning_rate": 1e-05,
+      "loss": 0.1104,
+      "step": 105
+    },
+    {
+      "epoch": 0.9045333333333333,
+      "grad_norm": 0.27641004897246113,
+      "learning_rate": 1e-05,
+      "loss": 0.1589,
+      "step": 106
+    },
+    {
+      "epoch": 0.9130666666666667,
+      "grad_norm": 0.1639383504796773,
+      "learning_rate": 1e-05,
+      "loss": 0.1064,
+      "step": 107
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 0.24233145434818837,
+      "learning_rate": 1e-05,
+      "loss": 0.1385,
+      "step": 108
+    },
+    {
+      "epoch": 0.9301333333333334,
+      "grad_norm": 0.16015184210317215,
+      "learning_rate": 1e-05,
+      "loss": 0.121,
+      "step": 109
+    },
+    {
+      "epoch": 0.9386666666666666,
+      "grad_norm": 0.14931644417242712,
+      "learning_rate": 1e-05,
+      "loss": 0.1117,
+      "step": 110
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 0.15078311335939154,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 111
+    },
+    {
+      "epoch": 0.9557333333333333,
+      "grad_norm": 0.16714082761639734,
+      "learning_rate": 1e-05,
+      "loss": 0.115,
+      "step": 112
+    },
+    {
+      "epoch": 0.9642666666666667,
+      "grad_norm": 0.12479711996187942,
+      "learning_rate": 1e-05,
+      "loss": 0.1029,
+      "step": 113
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 0.14783351137940065,
+      "learning_rate": 1e-05,
+      "loss": 0.0987,
+      "step": 114
+    },
+    {
+      "epoch": 0.9813333333333333,
+      "grad_norm": 0.11311876630863582,
+      "learning_rate": 1e-05,
+      "loss": 0.0911,
+      "step": 115
+    },
+    {
+      "epoch": 0.9898666666666667,
+      "grad_norm": 0.1238329581090649,
+      "learning_rate": 1e-05,
+      "loss": 0.1095,
+      "step": 116
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 0.11117413394533605,
+      "learning_rate": 1e-05,
+      "loss": 0.0968,
+      "step": 117
+    },
+    {
+      "epoch": 1.0069333333333332,
+      "grad_norm": 0.09247708923706752,
+      "learning_rate": 1e-05,
+      "loss": 0.0985,
+      "step": 118
+    },
+    {
+      "epoch": 1.0154666666666667,
+      "grad_norm": 0.12028574166046906,
+      "learning_rate": 1e-05,
+      "loss": 0.1085,
+      "step": 119
+    },
+    {
+      "epoch": 1.024,
+      "grad_norm": 0.075460717991084,
+      "learning_rate": 1e-05,
+      "loss": 0.1007,
+      "step": 120
+    },
+    {
+      "epoch": 1.0325333333333333,
+      "grad_norm": 0.1930335796969662,
+      "learning_rate": 1e-05,
+      "loss": 0.1438,
+      "step": 121
+    },
+    {
+      "epoch": 1.0410666666666666,
+      "grad_norm": 0.11451251015868702,
+      "learning_rate": 1e-05,
+      "loss": 0.1365,
+      "step": 122
+    },
+    {
+      "epoch": 1.0496,
+      "grad_norm": 0.09360332240252384,
+      "learning_rate": 1e-05,
+      "loss": 0.1039,
+      "step": 123
+    },
+    {
+      "epoch": 1.0581333333333334,
+      "grad_norm": 0.13162505626586696,
+      "learning_rate": 1e-05,
+      "loss": 0.1132,
+      "step": 124
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 0.1329223725298499,
+      "learning_rate": 1e-05,
+      "loss": 0.1153,
+      "step": 125
+    },
+    {
+      "epoch": 1.0752,
+      "grad_norm": 0.09522360247894453,
+      "learning_rate": 1e-05,
+      "loss": 0.1264,
+      "step": 126
+    },
+    {
+      "epoch": 1.0837333333333334,
+      "grad_norm": 0.12467359977458509,
+      "learning_rate": 1e-05,
+      "loss": 0.0866,
+      "step": 127
+    },
+    {
+      "epoch": 1.0922666666666667,
+      "grad_norm": 0.08853379791954709,
+      "learning_rate": 1e-05,
+      "loss": 0.107,
+      "step": 128
+    },
+    {
+      "epoch": 1.1008,
+      "grad_norm": 0.16050358070185106,
+      "learning_rate": 1e-05,
+      "loss": 0.1134,
+      "step": 129
+    },
+    {
+      "epoch": 1.1093333333333333,
+      "grad_norm": 0.10331318962336627,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 130
+    },
+    {
+      "epoch": 1.1178666666666666,
+      "grad_norm": 0.08498886624952962,
+      "learning_rate": 1e-05,
+      "loss": 0.12,
+      "step": 131
+    },
+    {
+      "epoch": 1.1264,
+      "grad_norm": 0.09918910544874306,
+      "learning_rate": 1e-05,
+      "loss": 0.1173,
+      "step": 132
+    },
+    {
+      "epoch": 1.1349333333333333,
+      "grad_norm": 0.0751198135696547,
+      "learning_rate": 1e-05,
+      "loss": 0.0973,
+      "step": 133
+    },
+    {
+      "epoch": 1.1434666666666666,
+      "grad_norm": 0.07959218402066412,
+      "learning_rate": 1e-05,
+      "loss": 0.0992,
+      "step": 134
+    },
+    {
+      "epoch": 1.152,
+      "grad_norm": 0.14419628324779726,
+      "learning_rate": 1e-05,
+      "loss": 0.0856,
+      "step": 135
+    },
+    {
+      "epoch": 1.1605333333333334,
+      "grad_norm": 0.07894542967774888,
+      "learning_rate": 1e-05,
+      "loss": 0.1193,
+      "step": 136
+    },
+    {
+      "epoch": 1.1690666666666667,
+      "grad_norm": 0.08735606763938318,
+      "learning_rate": 1e-05,
+      "loss": 0.1061,
+      "step": 137
+    },
+    {
+      "epoch": 1.1776,
+      "grad_norm": 0.12344637986728384,
+      "learning_rate": 1e-05,
+      "loss": 0.1184,
+      "step": 138
+    },
+    {
+      "epoch": 1.1861333333333333,
+      "grad_norm": 0.07797745242316644,
+      "learning_rate": 1e-05,
+      "loss": 0.0959,
+      "step": 139
+    },
+    {
+      "epoch": 1.1946666666666665,
+      "grad_norm": 0.10065236259356937,
+      "learning_rate": 1e-05,
+      "loss": 0.0957,
+      "step": 140
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.9872207486907843e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
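
The `learning_rate` values in `log_history` rise from 0.0 at step 1 to 1e-05 around step 100 and stay flat afterwards. The commit does not state which scheduler produced them. The sketch below only checks that a few logged values are numerically consistent with a logarithmic warmup, `peak * ln(step) / ln(warmup_steps)`. The 100-step warmup length and the function name `log_warmup_lr` are assumptions inferred from the log, not settings read from the training arguments.

```python
import math

PEAK_LR = 1e-5       # plateau value logged from step 101 onward
WARMUP_STEPS = 100   # assumed: the step at which the logged rate reaches the plateau

# (step, learning_rate) pairs copied from log_history above.
LOGGED = {
    1: 0.0,
    2: 1.5051499783199057e-06,
    10: 4.9999999999999996e-06,
    50: 8.494850021680093e-06,
    100: 9.999999999999999e-06,
    140: 1e-05,
}


def log_warmup_lr(step: int, peak_lr: float = PEAK_LR,
                  warmup_steps: int = WARMUP_STEPS) -> float:
    """Hypothesised schedule: logarithmic ramp to peak_lr, then constant.

    Expects step >= 1, matching the 1-based steps in trainer_state.json.
    """
    if step >= warmup_steps:
        return peak_lr
    return peak_lr * math.log(step) / math.log(warmup_steps)


for step, logged in LOGGED.items():
    print(f"step {step:>3}: logged={logged:.6e} predicted={log_warmup_lr(step):.6e}")
```

If the predicted and logged columns agree, that supports the logarithmic-warmup reading but does not prove it. The training configuration itself is stored in `training_args.bin` and is not visible in this diff.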

checkpoint-140/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840

checkpoint-160/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-160/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
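
The configuration above sets `r` to 16 and `lora_alpha` to 32, with `use_rslora` disabled. In the standard LoRA formulation the low-rank update is multiplied by `lora_alpha / r`, which works out to 2.0 for these values. The following is a minimal sketch of that arithmetic, not code from this commit; the helper name `lora_scaling` is hypothetical, and only three fields of the config are copied in.

```python
import json
import math

# Fields copied from the checkpoint-160/adapter_config.json diff above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "r": 16,
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Hypothetical helper: factor applied to the low-rank update B @ A."""
    alpha, rank = config["lora_alpha"], config["r"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(rank)
    return alpha / rank


print(lora_scaling(ADAPTER_CONFIG))
```

Because the scaling depends on the ratio rather than on either value alone, changing `r` without adjusting `lora_alpha` also changes the effective strength of the adapter.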

checkpoint-160/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ad7f3936a96df66a08461026a1c7af87cb6ee577462637e0549583bc4276a78b
+size 1138856856
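
The three lines above are a Git LFS pointer rather than the weights themselves: a spec version, a SHA-256 object ID, and the size of the real file in bytes. As a rough illustration, the pointer can be parsed with a few lines of Python; `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any Git LFS tooling.

```python
# Pointer text copied from the checkpoint-160/adapter_model.safetensors diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ad7f3936a96df66a08461026a1c7af87cb6ee577462637e0549583bc4276a78b
size 1138856856
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split a Git LFS pointer into its fields."""
    # Each non-empty line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['size_bytes'] / 2**30:.2f} GiB")
```

The size field puts this adapter file at roughly 1.06 GiB.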

checkpoint-160/trainer_state.json ADDED

	@@ -0,0 +1,1153 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.3653333333333333,
+  "eval_steps": 500,
+  "global_step": 160,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 120.65595457746791,
+      "learning_rate": 6.611096473669596e-06,
+      "loss": 15.252,
+      "step": 21
+    },
+    {
+      "epoch": 0.18773333333333334,
+      "grad_norm": 133.05280466087515,
+      "learning_rate": 6.712113404111031e-06,
+      "loss": 14.1391,
+      "step": 22
+    },
+    {
+      "epoch": 0.19626666666666667,
+      "grad_norm": 127.95029628849048,
+      "learning_rate": 6.808639180087963e-06,
+      "loss": 12.9566,
+      "step": 23
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 108.83495245094748,
+      "learning_rate": 6.90105620855803e-06,
+      "loss": 11.8743,
+      "step": 24
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 99.90727146021455,
+      "learning_rate": 6.989700043360187e-06,
+      "loss": 10.962,
+      "step": 25
+    },
+    {
+      "epoch": 0.22186666666666666,
+      "grad_norm": 98.37126740059823,
+      "learning_rate": 7.074866739854089e-06,
+      "loss": 9.9919,
+      "step": 26
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 92.26708429201608,
+      "learning_rate": 7.156818820794936e-06,
+      "loss": 8.8811,
+      "step": 27
+    },
+    {
+      "epoch": 0.23893333333333333,
+      "grad_norm": 83.36099898839835,
+      "learning_rate": 7.235790156711096e-06,
+      "loss": 7.7806,
+      "step": 28
+    },
+    {
+      "epoch": 0.24746666666666667,
+      "grad_norm": 68.07500315598597,
+      "learning_rate": 7.3119899894947795e-06,
+      "loss": 7.0528,
+      "step": 29
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 69.58960332280246,
+      "learning_rate": 7.385606273598311e-06,
+      "loss": 6.3683,
+      "step": 30
+    },
+    {
+      "epoch": 0.26453333333333334,
+      "grad_norm": 68.77532204123075,
+      "learning_rate": 7.456808469171363e-06,
+      "loss": 6.1635,
+      "step": 31
+    },
+    {
+      "epoch": 0.2730666666666667,
+      "grad_norm": 66.29676636510072,
+      "learning_rate": 7.5257498915995295e-06,
+      "loss": 4.711,
+      "step": 32
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 42.87145091679237,
+      "learning_rate": 7.592569699389437e-06,
+      "loss": 4.5119,
+      "step": 33
+    },
+    {
+      "epoch": 0.29013333333333335,
+      "grad_norm": 26.2592350291551,
+      "learning_rate": 7.657394585211274e-06,
+      "loss": 4.31,
+      "step": 34
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 15.35959008067237,
+      "learning_rate": 7.720340221751376e-06,
+      "loss": 4.0001,
+      "step": 35
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 8.50847651865227,
+      "learning_rate": 7.781512503836437e-06,
+      "loss": 3.5723,
+      "step": 36
+    },
+    {
+      "epoch": 0.3157333333333333,
+      "grad_norm": 6.562581089063746,
+      "learning_rate": 7.841008620334974e-06,
+      "loss": 3.9254,
+      "step": 37
+    },
+    {
+      "epoch": 0.32426666666666665,
+      "grad_norm": 5.6145595722250095,
+      "learning_rate": 7.89891798308405e-06,
+      "loss": 3.8746,
+      "step": 38
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 5.385367220486204,
+      "learning_rate": 7.955323035132495e-06,
+      "loss": 3.8128,
+      "step": 39
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 5.403447124703616,
+      "learning_rate": 8.010299956639811e-06,
+      "loss": 3.885,
+      "step": 40
+    },
+    {
+      "epoch": 0.34986666666666666,
+      "grad_norm": 5.48242204895128,
+      "learning_rate": 8.063919283598677e-06,
+      "loss": 3.8048,
+      "step": 41
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 5.5525098950513865,
+      "learning_rate": 8.116246451989503e-06,
+      "loss": 3.7508,
+      "step": 42
+    },
+    {
+      "epoch": 0.36693333333333333,
+      "grad_norm": 5.354384520535484,
+      "learning_rate": 8.167342277897933e-06,
+      "loss": 3.5069,
+      "step": 43
+    },
+    {
+      "epoch": 0.37546666666666667,
+      "grad_norm": 5.46272338131107,
+      "learning_rate": 8.217263382430936e-06,
+      "loss": 3.6747,
+      "step": 44
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 4.798550688968453,
+      "learning_rate": 8.266062568876717e-06,
+      "loss": 3.1609,
+      "step": 45
+    },
+    {
+      "epoch": 0.39253333333333335,
+      "grad_norm": 5.755104452953421,
+      "learning_rate": 8.31378915840787e-06,
+      "loss": 3.5733,
+      "step": 46
+    },
+    {
+      "epoch": 0.4010666666666667,
+      "grad_norm": 4.618763611067563,
+      "learning_rate": 8.360489289678585e-06,
+      "loss": 2.9402,
+      "step": 47
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 5.506785974818791,
+      "learning_rate": 8.406206186877936e-06,
+      "loss": 3.382,
+      "step": 48
+    },
+    {
+      "epoch": 0.41813333333333336,
+      "grad_norm": 4.68603207809794,
+      "learning_rate": 8.450980400142568e-06,
+      "loss": 2.9918,
+      "step": 49
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 5.124033394817131,
+      "learning_rate": 8.494850021680093e-06,
+      "loss": 3.3202,
+      "step": 50
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 4.293001183481895,
+      "learning_rate": 8.537850880489681e-06,
+      "loss": 2.8519,
+      "step": 51
+    },
+    {
+      "epoch": 0.4437333333333333,
+      "grad_norm": 4.382596858902394,
+      "learning_rate": 8.580016718173996e-06,
+      "loss": 2.9683,
+      "step": 52
+    },
+    {
+      "epoch": 0.45226666666666665,
+      "grad_norm": 4.3176263388044696,
+      "learning_rate": 8.621379348003945e-06,
+      "loss": 2.9257,
+      "step": 53
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 4.5250022171605195,
+      "learning_rate": 8.661968799114844e-06,
+      "loss": 3.0556,
+      "step": 54
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 4.429424190600661,
+      "learning_rate": 8.701813447471218e-06,
+      "loss": 2.9513,
+      "step": 55
+    },
+    {
+      "epoch": 0.47786666666666666,
+      "grad_norm": 4.349652568052827,
+      "learning_rate": 8.740940135031001e-06,
+      "loss": 2.9029,
+      "step": 56
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 4.299227871435445,
+      "learning_rate": 8.779374278362457e-06,
+      "loss": 2.5989,
+      "step": 57
+    },
+    {
+      "epoch": 0.49493333333333334,
+      "grad_norm": 4.562461330302201,
+      "learning_rate": 8.817139967814684e-06,
+      "loss": 2.8158,
+      "step": 58
+    },
+    {
+      "epoch": 0.5034666666666666,
+      "grad_norm": 4.606987182758338,
+      "learning_rate": 8.854260058210721e-06,
+      "loss": 2.6272,
+      "step": 59
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 4.9420031522511545,
+      "learning_rate": 8.890756251918216e-06,
+      "loss": 2.5488,
+      "step": 60
+    },
+    {
+      "epoch": 0.5205333333333333,
+      "grad_norm": 4.706462297046012,
+      "learning_rate": 8.926649175053834e-06,
+      "loss": 2.3575,
+      "step": 61
+    },
+    {
+      "epoch": 0.5290666666666667,
+      "grad_norm": 4.862820204363494,
+      "learning_rate": 8.961958447491269e-06,
+      "loss": 2.2952,
+      "step": 62
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 4.911045913397774,
+      "learning_rate": 8.996702747267908e-06,
+      "loss": 2.1768,
+      "step": 63
+    },
+    {
+      "epoch": 0.5461333333333334,
+      "grad_norm": 5.46978680182973,
+      "learning_rate": 9.030899869919434e-06,
+      "loss": 2.2528,
+      "step": 64
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 5.847558397227374,
+      "learning_rate": 9.064566783214276e-06,
+      "loss": 2.2401,
+      "step": 65
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 5.984440656257,
+      "learning_rate": 9.097719677709343e-06,
+      "loss": 2.156,
+      "step": 66
+    },
+    {
+      "epoch": 0.5717333333333333,
+      "grad_norm": 6.146172189799918,
+      "learning_rate": 9.130374013504131e-06,
+      "loss": 2.0059,
+      "step": 67
+    },
+    {
+      "epoch": 0.5802666666666667,
+      "grad_norm": 5.725706778130614,
+      "learning_rate": 9.162544563531182e-06,
+      "loss": 1.7756,
+      "step": 68
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 6.479060263133115,
+      "learning_rate": 9.194245453686277e-06,
+      "loss": 1.7651,
+      "step": 69
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 7.319291050667066,
+      "learning_rate": 9.225490200071284e-06,
+      "loss": 1.7712,
+      "step": 70
+    },
+    {
+      "epoch": 0.6058666666666667,
+      "grad_norm": 6.913275412032087,
+      "learning_rate": 9.256291743595376e-06,
+      "loss": 1.709,
+      "step": 71
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 6.600657239614328,
+      "learning_rate": 9.28666248215634e-06,
+      "loss": 1.3731,
+      "step": 72
+    },
+    {
+      "epoch": 0.6229333333333333,
+      "grad_norm": 7.301483724647945,
+      "learning_rate": 9.316614300602277e-06,
+      "loss": 1.4166,
+      "step": 73
+    },
+    {
+      "epoch": 0.6314666666666666,
+      "grad_norm": 7.154933225265475,
+      "learning_rate": 9.346158598654881e-06,
+      "loss": 1.2797,
+      "step": 74
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 8.248472592538771,
+      "learning_rate": 9.375306316958499e-06,
+      "loss": 1.2082,
+      "step": 75
+    },
+    {
+      "epoch": 0.6485333333333333,
+      "grad_norm": 7.444479096112177,
+      "learning_rate": 9.404067961403957e-06,
+      "loss": 1.0402,
+      "step": 76
+    },
+    {
+      "epoch": 0.6570666666666667,
+      "grad_norm": 6.819760434594012,
+      "learning_rate": 9.432453625862409e-06,
+      "loss": 0.8244,
+      "step": 77
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 6.894760862855001,
+      "learning_rate": 9.460473013452401e-06,
+      "loss": 0.8345,
+      "step": 78
+    },
+    {
+      "epoch": 0.6741333333333334,
+      "grad_norm": 6.001848571839919,
+      "learning_rate": 9.488135456452207e-06,
+      "loss": 0.6839,
+      "step": 79
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 5.709147411501981,
+      "learning_rate": 9.515449934959717e-06,
+      "loss": 0.6567,
+      "step": 80
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 4.128977158730638,
+      "learning_rate": 9.542425094393249e-06,
+      "loss": 0.545,
+      "step": 81
+    },
+    {
+      "epoch": 0.6997333333333333,
+      "grad_norm": 2.604915806147427,
+      "learning_rate": 9.569069261918582e-06,
+      "loss": 0.4596,
+      "step": 82
+    },
+    {
+      "epoch": 0.7082666666666667,
+      "grad_norm": 2.039939253407506,
+      "learning_rate": 9.59539046188037e-06,
+      "loss": 0.452,
+      "step": 83
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0398988141415337,
+      "learning_rate": 9.621396430309407e-06,
+      "loss": 0.4538,
+      "step": 84
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 2.37589477950211,
+      "learning_rate": 9.647094628571464e-06,
+      "loss": 0.4505,
+      "step": 85
+    },
+    {
+      "epoch": 0.7338666666666667,
+      "grad_norm": 2.80580920047501,
+      "learning_rate": 9.672492256217837e-06,
+      "loss": 0.5284,
+      "step": 86
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 2.3687428819051197,
+      "learning_rate": 9.697596263093091e-06,
+      "loss": 0.4371,
+      "step": 87
+    },
+    {
+      "epoch": 0.7509333333333333,
+      "grad_norm": 1.6362502854757155,
+      "learning_rate": 9.722413360750844e-06,
+      "loss": 0.3652,
+      "step": 88
+    },
+    {
+      "epoch": 0.7594666666666666,
+      "grad_norm": 1.5360860168740427,
+      "learning_rate": 9.746950033224562e-06,
+      "loss": 0.3235,
+      "step": 89
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 1.7245475092642693,
+      "learning_rate": 9.771212547196623e-06,
+      "loss": 0.3072,
+      "step": 90
+    },
+    {
+      "epoch": 0.7765333333333333,
+      "grad_norm": 1.4493496982196852,
+      "learning_rate": 9.795206961605467e-06,
+      "loss": 0.2474,
+      "step": 91
+    },
+    {
+      "epoch": 0.7850666666666667,
+      "grad_norm": 1.1662262130552072,
+      "learning_rate": 9.818939136727777e-06,
+      "loss": 0.2684,
+      "step": 92
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.1727132215390659,
+      "learning_rate": 9.842414742769675e-06,
+      "loss": 0.3456,
+      "step": 93
+    },
+    {
+      "epoch": 0.8021333333333334,
+      "grad_norm": 0.8435059300379855,
+      "learning_rate": 9.865639267998493e-06,
+      "loss": 0.227,
+      "step": 94
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.8593375804730568,
+      "learning_rate": 9.888618026444238e-06,
+      "loss": 0.1985,
+      "step": 95
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.0673772841412472,
+      "learning_rate": 9.911356165197841e-06,
+      "loss": 0.3195,
+      "step": 96
+    },
+    {
+      "epoch": 0.8277333333333333,
+      "grad_norm": 0.9341285801648793,
+      "learning_rate": 9.933858671331224e-06,
+      "loss": 0.213,
+      "step": 97
+    },
+    {
+      "epoch": 0.8362666666666667,
+      "grad_norm": 0.7197728549764331,
+      "learning_rate": 9.956130378462474e-06,
+      "loss": 0.2067,
+      "step": 98
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.5655901060353195,
+      "learning_rate": 9.978175972987748e-06,
+      "loss": 0.1708,
+      "step": 99
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.4681745812066334,
+      "learning_rate": 9.999999999999999e-06,
+      "loss": 0.1983,
+      "step": 100
+    },
+    {
+      "epoch": 0.8618666666666667,
+      "grad_norm": 0.4488180280567293,
+      "learning_rate": 1e-05,
+      "loss": 0.1401,
+      "step": 101
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 0.43194512376224187,
+      "learning_rate": 1e-05,
+      "loss": 0.1097,
+      "step": 102
+    },
+    {
+      "epoch": 0.8789333333333333,
+      "grad_norm": 0.3754480982834532,
+      "learning_rate": 1e-05,
+      "loss": 0.1531,
+      "step": 103
+    },
+    {
+      "epoch": 0.8874666666666666,
+      "grad_norm": 0.34151633602448267,
+      "learning_rate": 1e-05,
+      "loss": 0.1685,
+      "step": 104
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.26356638458244175,
+      "learning_rate": 1e-05,
+      "loss": 0.1104,
+      "step": 105
+    },
+    {
+      "epoch": 0.9045333333333333,
+      "grad_norm": 0.27641004897246113,
+      "learning_rate": 1e-05,
+      "loss": 0.1589,
+      "step": 106
+    },
+    {
+      "epoch": 0.9130666666666667,
+      "grad_norm": 0.1639383504796773,
+      "learning_rate": 1e-05,
+      "loss": 0.1064,
+      "step": 107
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 0.24233145434818837,
+      "learning_rate": 1e-05,
+      "loss": 0.1385,
+      "step": 108
+    },
+    {
+      "epoch": 0.9301333333333334,
+      "grad_norm": 0.16015184210317215,
+      "learning_rate": 1e-05,
+      "loss": 0.121,
+      "step": 109
+    },
+    {
+      "epoch": 0.9386666666666666,
+      "grad_norm": 0.14931644417242712,
+      "learning_rate": 1e-05,
+      "loss": 0.1117,
+      "step": 110
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 0.15078311335939154,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 111
+    },
+    {
+      "epoch": 0.9557333333333333,
+      "grad_norm": 0.16714082761639734,
+      "learning_rate": 1e-05,
+      "loss": 0.115,
+      "step": 112
+    },
+    {
+      "epoch": 0.9642666666666667,
+      "grad_norm": 0.12479711996187942,
+      "learning_rate": 1e-05,
+      "loss": 0.1029,
+      "step": 113
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 0.14783351137940065,
+      "learning_rate": 1e-05,
+      "loss": 0.0987,
+      "step": 114
+    },
+    {
+      "epoch": 0.9813333333333333,
+      "grad_norm": 0.11311876630863582,
+      "learning_rate": 1e-05,
+      "loss": 0.0911,
+      "step": 115
+    },
+    {
+      "epoch": 0.9898666666666667,
+      "grad_norm": 0.1238329581090649,
+      "learning_rate": 1e-05,
+      "loss": 0.1095,
+      "step": 116
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 0.11117413394533605,
+      "learning_rate": 1e-05,
+      "loss": 0.0968,
+      "step": 117
+    },
+    {
+      "epoch": 1.0069333333333332,
+      "grad_norm": 0.09247708923706752,
+      "learning_rate": 1e-05,
+      "loss": 0.0985,
+      "step": 118
+    },
+    {
+      "epoch": 1.0154666666666667,
+      "grad_norm": 0.12028574166046906,
+      "learning_rate": 1e-05,
+      "loss": 0.1085,
+      "step": 119
+    },
+    {
+      "epoch": 1.024,
+      "grad_norm": 0.075460717991084,
+      "learning_rate": 1e-05,
+      "loss": 0.1007,
+      "step": 120
+    },
+    {
+      "epoch": 1.0325333333333333,
+      "grad_norm": 0.1930335796969662,
+      "learning_rate": 1e-05,
+      "loss": 0.1438,
+      "step": 121
+    },
+    {
+      "epoch": 1.0410666666666666,
+      "grad_norm": 0.11451251015868702,
+      "learning_rate": 1e-05,
+      "loss": 0.1365,
+      "step": 122
+    },
+    {
+      "epoch": 1.0496,
+      "grad_norm": 0.09360332240252384,
+      "learning_rate": 1e-05,
+      "loss": 0.1039,
+      "step": 123
+    },
+    {
+      "epoch": 1.0581333333333334,
+      "grad_norm": 0.13162505626586696,
+      "learning_rate": 1e-05,
+      "loss": 0.1132,
+      "step": 124
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 0.1329223725298499,
+      "learning_rate": 1e-05,
+      "loss": 0.1153,
+      "step": 125
+    },
+    {
+      "epoch": 1.0752,
+      "grad_norm": 0.09522360247894453,
+      "learning_rate": 1e-05,
+      "loss": 0.1264,
+      "step": 126
+    },
+    {
+      "epoch": 1.0837333333333334,
+      "grad_norm": 0.12467359977458509,
+      "learning_rate": 1e-05,
+      "loss": 0.0866,
+      "step": 127
+    },
+    {
+      "epoch": 1.0922666666666667,
+      "grad_norm": 0.08853379791954709,
+      "learning_rate": 1e-05,
+      "loss": 0.107,
+      "step": 128
+    },
+    {
+      "epoch": 1.1008,
+      "grad_norm": 0.16050358070185106,
+      "learning_rate": 1e-05,
+      "loss": 0.1134,
+      "step": 129
+    },
+    {
+      "epoch": 1.1093333333333333,
+      "grad_norm": 0.10331318962336627,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 130
+    },
+    {
+      "epoch": 1.1178666666666666,
+      "grad_norm": 0.08498886624952962,
+      "learning_rate": 1e-05,
+      "loss": 0.12,
+      "step": 131
+    },
+    {
+      "epoch": 1.1264,
+      "grad_norm": 0.09918910544874306,
+      "learning_rate": 1e-05,
+      "loss": 0.1173,
+      "step": 132
+    },
+    {
+      "epoch": 1.1349333333333333,
+      "grad_norm": 0.0751198135696547,
+      "learning_rate": 1e-05,
+      "loss": 0.0973,
+      "step": 133
+    },
+    {
+      "epoch": 1.1434666666666666,
+      "grad_norm": 0.07959218402066412,
+      "learning_rate": 1e-05,
+      "loss": 0.0992,
+      "step": 134
+    },
+    {
+      "epoch": 1.152,
+      "grad_norm": 0.14419628324779726,
+      "learning_rate": 1e-05,
+      "loss": 0.0856,
+      "step": 135
+    },
+    {
+      "epoch": 1.1605333333333334,
+      "grad_norm": 0.07894542967774888,
+      "learning_rate": 1e-05,
+      "loss": 0.1193,
+      "step": 136
+    },
+    {
+      "epoch": 1.1690666666666667,
+      "grad_norm": 0.08735606763938318,
+      "learning_rate": 1e-05,
+      "loss": 0.1061,
+      "step": 137
+    },
+    {
+      "epoch": 1.1776,
+      "grad_norm": 0.12344637986728384,
+      "learning_rate": 1e-05,
+      "loss": 0.1184,
+      "step": 138
+    },
+    {
+      "epoch": 1.1861333333333333,
+      "grad_norm": 0.07797745242316644,
+      "learning_rate": 1e-05,
+      "loss": 0.0959,
+      "step": 139
+    },
+    {
+      "epoch": 1.1946666666666665,
+      "grad_norm": 0.10065236259356937,
+      "learning_rate": 1e-05,
+      "loss": 0.0957,
+      "step": 140
+    },
+    {
+      "epoch": 1.2032,
+      "grad_norm": 0.06472006342138571,
+      "learning_rate": 1e-05,
+      "loss": 0.0721,
+      "step": 141
+    },
+    {
+      "epoch": 1.2117333333333333,
+      "grad_norm": 0.08080002696086562,
+      "learning_rate": 1e-05,
+      "loss": 0.1073,
+      "step": 142
+    },
+    {
+      "epoch": 1.2202666666666666,
+      "grad_norm": 0.10400160039217118,
+      "learning_rate": 1e-05,
+      "loss": 0.1227,
+      "step": 143
+    },
+    {
+      "epoch": 1.2288000000000001,
+      "grad_norm": 0.08719509476650818,
+      "learning_rate": 1e-05,
+      "loss": 0.114,
+      "step": 144
+    },
+    {
+      "epoch": 1.2373333333333334,
+      "grad_norm": 0.08431635436674337,
+      "learning_rate": 1e-05,
+      "loss": 0.1303,
+      "step": 145
+    },
+    {
+      "epoch": 1.2458666666666667,
+      "grad_norm": 0.23947926607305503,
+      "learning_rate": 1e-05,
+      "loss": 0.1199,
+      "step": 146
+    },
+    {
+      "epoch": 1.2544,
+      "grad_norm": 0.08794721265212341,
+      "learning_rate": 1e-05,
+      "loss": 0.1094,
+      "step": 147
+    },
+    {
+      "epoch": 1.2629333333333332,
+      "grad_norm": 0.08063747277184712,
+      "learning_rate": 1e-05,
+      "loss": 0.1062,
+      "step": 148
+    },
+    {
+      "epoch": 1.2714666666666667,
+      "grad_norm": 0.06832693897193236,
+      "learning_rate": 1e-05,
+      "loss": 0.0842,
+      "step": 149
+    },
+    {
+      "epoch": 1.28,
+      "grad_norm": 0.07037053759395089,
+      "learning_rate": 1e-05,
+      "loss": 0.0971,
+      "step": 150
+    },
+    {
+      "epoch": 1.2885333333333333,
+      "grad_norm": 0.08753063334098339,
+      "learning_rate": 1e-05,
+      "loss": 0.085,
+      "step": 151
+    },
+    {
+      "epoch": 1.2970666666666666,
+      "grad_norm": 0.11381804369240754,
+      "learning_rate": 1e-05,
+      "loss": 0.1156,
+      "step": 152
+    },
+    {
+      "epoch": 1.3056,
+      "grad_norm": 0.07203805377255211,
+      "learning_rate": 1e-05,
+      "loss": 0.0951,
+      "step": 153
+    },
+    {
+      "epoch": 1.3141333333333334,
+      "grad_norm": 0.1156784206459358,
+      "learning_rate": 1e-05,
+      "loss": 0.1557,
+      "step": 154
+    },
+    {
+      "epoch": 1.3226666666666667,
+      "grad_norm": 0.11353874538174968,
+      "learning_rate": 1e-05,
+      "loss": 0.1284,
+      "step": 155
+    },
+    {
+      "epoch": 1.3312,
+      "grad_norm": 0.06675505890811795,
+      "learning_rate": 1e-05,
+      "loss": 0.089,
+      "step": 156
+    },
+    {
+      "epoch": 1.3397333333333332,
+      "grad_norm": 0.07642955477275162,
+      "learning_rate": 1e-05,
+      "loss": 0.0825,
+      "step": 157
+    },
+    {
+      "epoch": 1.3482666666666667,
+      "grad_norm": 0.07196529265355209,
+      "learning_rate": 1e-05,
+      "loss": 0.0885,
+      "step": 158
+    },
+    {
+      "epoch": 1.3568,
+      "grad_norm": 0.08651497112727735,
+      "learning_rate": 1e-05,
+      "loss": 0.0934,
+      "step": 159
+    },
+    {
+      "epoch": 1.3653333333333333,
+      "grad_norm": 0.07249320769144564,
+      "learning_rate": 1e-05,
+      "loss": 0.102,
+      "step": 160
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.273910921652863e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
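
The `learning_rate` values in `log_history` above rise from 0.0 at step 1 to 1e-05 at step 100 and stay flat afterwards. The logged numbers are consistent with a logarithmic warmup of the form `peak_lr * log(step) / log(warmup_steps)`. The sketch below checks that reading against a handful of logged entries. The peak rate and the 100-step warmup length are inferred from the log, and which scheduler actually produced the values is not recorded in this file, so treat both as assumptions.

```python
import math

PEAK_LR = 1e-5       # inferred from the flat region of the log (assumption)
WARMUP_STEPS = 100   # inferred from where the log reaches the peak (assumption)


def log_warmup_lr(step: int) -> float:
    """Hypothetical reconstruction of the schedule seen in log_history."""
    if step >= WARMUP_STEPS:
        return PEAK_LR
    return PEAK_LR * math.log(step) / math.log(WARMUP_STEPS)


# (step, learning_rate) pairs copied from the log_history entries above.
LOGGED = {
    1: 0.0,
    2: 1.5051499783199057e-06,
    10: 4.9999999999999996e-06,
    50: 8.494850021680093e-06,
    100: 9.999999999999999e-06,
    160: 1e-05,
}

for step, logged_lr in LOGGED.items():
    predicted = log_warmup_lr(step)
    matches = math.isclose(predicted, logged_lr, rel_tol=1e-9, abs_tol=1e-15)
    print(step, predicted, logged_lr, matches)
```

Under this reading, the sharp drop in `loss` and `grad_norm` over the first 100 steps happens entirely during warmup.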

checkpoint-160/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840

checkpoint-180/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-180/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
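
Reviewer note on reading this `adapter_config.json`: with `use_rslora` set to false, LoRA scales the low-rank update by `lora_alpha / r`; rank-stabilized LoRA would divide by the square root of `r` instead. A minimal standard-library sketch that recomputes the factor from a hand-copied subset of the fields above (the helper `lora_scaling` is hypothetical and not a PEFT function):

```python
import json

# Subset of the fields shown in checkpoint-180/adapter_config.json.
config_text = """
{
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "r": 16,
  "target_modules": ["k_proj", "q_proj", "v_proj", "down_proj",
                     "up_proj", "gate_proj", "lm_head", "o_proj"],
  "use_rslora": false
}
"""
config = json.loads(config_text)


def lora_scaling(cfg):
    """Return the factor applied to the low-rank update B @ A."""
    if cfg["use_rslora"]:
        # Rank-stabilized LoRA divides by sqrt(r).
        return cfg["lora_alpha"] / cfg["r"] ** 0.5
    # Standard LoRA divides by r.
    return cfg["lora_alpha"] / cfg["r"]


scaling = lora_scaling(config)
print(scaling, len(config["target_modules"]))
```

Note that `lm_head` appears in `target_modules` alongside the attention and MLP projections, so the adapter also modifies the output projection.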

checkpoint-180/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:494f39b81a093fd2994ff97e7d7bb6de0c800f86be7f9dc8e6b228b20f109ad4
+size 1138856856

checkpoint-180/trainer_state.json ADDED

	@@ -0,0 +1,1293 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.536,
+  "eval_steps": 500,
+  "global_step": 180,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 120.65595457746791,
+      "learning_rate": 6.611096473669596e-06,
+      "loss": 15.252,
+      "step": 21
+    },
+    {
+      "epoch": 0.18773333333333334,
+      "grad_norm": 133.05280466087515,
+      "learning_rate": 6.712113404111031e-06,
+      "loss": 14.1391,
+      "step": 22
+    },
+    {
+      "epoch": 0.19626666666666667,
+      "grad_norm": 127.95029628849048,
+      "learning_rate": 6.808639180087963e-06,
+      "loss": 12.9566,
+      "step": 23
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 108.83495245094748,
+      "learning_rate": 6.90105620855803e-06,
+      "loss": 11.8743,
+      "step": 24
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 99.90727146021455,
+      "learning_rate": 6.989700043360187e-06,
+      "loss": 10.962,
+      "step": 25
+    },
+    {
+      "epoch": 0.22186666666666666,
+      "grad_norm": 98.37126740059823,
+      "learning_rate": 7.074866739854089e-06,
+      "loss": 9.9919,
+      "step": 26
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 92.26708429201608,
+      "learning_rate": 7.156818820794936e-06,
+      "loss": 8.8811,
+      "step": 27
+    },
+    {
+      "epoch": 0.23893333333333333,
+      "grad_norm": 83.36099898839835,
+      "learning_rate": 7.235790156711096e-06,
+      "loss": 7.7806,
+      "step": 28
+    },
+    {
+      "epoch": 0.24746666666666667,
+      "grad_norm": 68.07500315598597,
+      "learning_rate": 7.3119899894947795e-06,
+      "loss": 7.0528,
+      "step": 29
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 69.58960332280246,
+      "learning_rate": 7.385606273598311e-06,
+      "loss": 6.3683,
+      "step": 30
+    },
+    {
+      "epoch": 0.26453333333333334,
+      "grad_norm": 68.77532204123075,
+      "learning_rate": 7.456808469171363e-06,
+      "loss": 6.1635,
+      "step": 31
+    },
+    {
+      "epoch": 0.2730666666666667,
+      "grad_norm": 66.29676636510072,
+      "learning_rate": 7.5257498915995295e-06,
+      "loss": 4.711,
+      "step": 32
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 42.87145091679237,
+      "learning_rate": 7.592569699389437e-06,
+      "loss": 4.5119,
+      "step": 33
+    },
+    {
+      "epoch": 0.29013333333333335,
+      "grad_norm": 26.2592350291551,
+      "learning_rate": 7.657394585211274e-06,
+      "loss": 4.31,
+      "step": 34
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 15.35959008067237,
+      "learning_rate": 7.720340221751376e-06,
+      "loss": 4.0001,
+      "step": 35
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 8.50847651865227,
+      "learning_rate": 7.781512503836437e-06,
+      "loss": 3.5723,
+      "step": 36
+    },
+    {
+      "epoch": 0.3157333333333333,
+      "grad_norm": 6.562581089063746,
+      "learning_rate": 7.841008620334974e-06,
+      "loss": 3.9254,
+      "step": 37
+    },
+    {
+      "epoch": 0.32426666666666665,
+      "grad_norm": 5.6145595722250095,
+      "learning_rate": 7.89891798308405e-06,
+      "loss": 3.8746,
+      "step": 38
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 5.385367220486204,
+      "learning_rate": 7.955323035132495e-06,
+      "loss": 3.8128,
+      "step": 39
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 5.403447124703616,
+      "learning_rate": 8.010299956639811e-06,
+      "loss": 3.885,
+      "step": 40
+    },
+    {
+      "epoch": 0.34986666666666666,
+      "grad_norm": 5.48242204895128,
+      "learning_rate": 8.063919283598677e-06,
+      "loss": 3.8048,
+      "step": 41
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 5.5525098950513865,
+      "learning_rate": 8.116246451989503e-06,
+      "loss": 3.7508,
+      "step": 42
+    },
+    {
+      "epoch": 0.36693333333333333,
+      "grad_norm": 5.354384520535484,
+      "learning_rate": 8.167342277897933e-06,
+      "loss": 3.5069,
+      "step": 43
+    },
+    {
+      "epoch": 0.37546666666666667,
+      "grad_norm": 5.46272338131107,
+      "learning_rate": 8.217263382430936e-06,
+      "loss": 3.6747,
+      "step": 44
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 4.798550688968453,
+      "learning_rate": 8.266062568876717e-06,
+      "loss": 3.1609,
+      "step": 45
+    },
+    {
+      "epoch": 0.39253333333333335,
+      "grad_norm": 5.755104452953421,
+      "learning_rate": 8.31378915840787e-06,
+      "loss": 3.5733,
+      "step": 46
+    },
+    {
+      "epoch": 0.4010666666666667,
+      "grad_norm": 4.618763611067563,
+      "learning_rate": 8.360489289678585e-06,
+      "loss": 2.9402,
+      "step": 47
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 5.506785974818791,
+      "learning_rate": 8.406206186877936e-06,
+      "loss": 3.382,
+      "step": 48
+    },
+    {
+      "epoch": 0.41813333333333336,
+      "grad_norm": 4.68603207809794,
+      "learning_rate": 8.450980400142568e-06,
+      "loss": 2.9918,
+      "step": 49
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 5.124033394817131,
+      "learning_rate": 8.494850021680093e-06,
+      "loss": 3.3202,
+      "step": 50
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 4.293001183481895,
+      "learning_rate": 8.537850880489681e-06,
+      "loss": 2.8519,
+      "step": 51
+    },
+    {
+      "epoch": 0.4437333333333333,
+      "grad_norm": 4.382596858902394,
+      "learning_rate": 8.580016718173996e-06,
+      "loss": 2.9683,
+      "step": 52
+    },
+    {
+      "epoch": 0.45226666666666665,
+      "grad_norm": 4.3176263388044696,
+      "learning_rate": 8.621379348003945e-06,
+      "loss": 2.9257,
+      "step": 53
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 4.5250022171605195,
+      "learning_rate": 8.661968799114844e-06,
+      "loss": 3.0556,
+      "step": 54
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 4.429424190600661,
+      "learning_rate": 8.701813447471218e-06,
+      "loss": 2.9513,
+      "step": 55
+    },
+    {
+      "epoch": 0.47786666666666666,
+      "grad_norm": 4.349652568052827,
+      "learning_rate": 8.740940135031001e-06,
+      "loss": 2.9029,
+      "step": 56
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 4.299227871435445,
+      "learning_rate": 8.779374278362457e-06,
+      "loss": 2.5989,
+      "step": 57
+    },
+    {
+      "epoch": 0.49493333333333334,
+      "grad_norm": 4.562461330302201,
+      "learning_rate": 8.817139967814684e-06,
+      "loss": 2.8158,
+      "step": 58
+    },
+    {
+      "epoch": 0.5034666666666666,
+      "grad_norm": 4.606987182758338,
+      "learning_rate": 8.854260058210721e-06,
+      "loss": 2.6272,
+      "step": 59
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 4.9420031522511545,
+      "learning_rate": 8.890756251918216e-06,
+      "loss": 2.5488,
+      "step": 60
+    },
+    {
+      "epoch": 0.5205333333333333,
+      "grad_norm": 4.706462297046012,
+      "learning_rate": 8.926649175053834e-06,
+      "loss": 2.3575,
+      "step": 61
+    },
+    {
+      "epoch": 0.5290666666666667,
+      "grad_norm": 4.862820204363494,
+      "learning_rate": 8.961958447491269e-06,
+      "loss": 2.2952,
+      "step": 62
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 4.911045913397774,
+      "learning_rate": 8.996702747267908e-06,
+      "loss": 2.1768,
+      "step": 63
+    },
+    {
+      "epoch": 0.5461333333333334,
+      "grad_norm": 5.46978680182973,
+      "learning_rate": 9.030899869919434e-06,
+      "loss": 2.2528,
+      "step": 64
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 5.847558397227374,
+      "learning_rate": 9.064566783214276e-06,
+      "loss": 2.2401,
+      "step": 65
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 5.984440656257,
+      "learning_rate": 9.097719677709343e-06,
+      "loss": 2.156,
+      "step": 66
+    },
+    {
+      "epoch": 0.5717333333333333,
+      "grad_norm": 6.146172189799918,
+      "learning_rate": 9.130374013504131e-06,
+      "loss": 2.0059,
+      "step": 67
+    },
+    {
+      "epoch": 0.5802666666666667,
+      "grad_norm": 5.725706778130614,
+      "learning_rate": 9.162544563531182e-06,
+      "loss": 1.7756,
+      "step": 68
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 6.479060263133115,
+      "learning_rate": 9.194245453686277e-06,
+      "loss": 1.7651,
+      "step": 69
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 7.319291050667066,
+      "learning_rate": 9.225490200071284e-06,
+      "loss": 1.7712,
+      "step": 70
+    },
+    {
+      "epoch": 0.6058666666666667,
+      "grad_norm": 6.913275412032087,
+      "learning_rate": 9.256291743595376e-06,
+      "loss": 1.709,
+      "step": 71
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 6.600657239614328,
+      "learning_rate": 9.28666248215634e-06,
+      "loss": 1.3731,
+      "step": 72
+    },
+    {
+      "epoch": 0.6229333333333333,
+      "grad_norm": 7.301483724647945,
+      "learning_rate": 9.316614300602277e-06,
+      "loss": 1.4166,
+      "step": 73
+    },
+    {
+      "epoch": 0.6314666666666666,
+      "grad_norm": 7.154933225265475,
+      "learning_rate": 9.346158598654881e-06,
+      "loss": 1.2797,
+      "step": 74
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 8.248472592538771,
+      "learning_rate": 9.375306316958499e-06,
+      "loss": 1.2082,
+      "step": 75
+    },
+    {
+      "epoch": 0.6485333333333333,
+      "grad_norm": 7.444479096112177,
+      "learning_rate": 9.404067961403957e-06,
+      "loss": 1.0402,
+      "step": 76
+    },
+    {
+      "epoch": 0.6570666666666667,
+      "grad_norm": 6.819760434594012,
+      "learning_rate": 9.432453625862409e-06,
+      "loss": 0.8244,
+      "step": 77
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 6.894760862855001,
+      "learning_rate": 9.460473013452401e-06,
+      "loss": 0.8345,
+      "step": 78
+    },
+    {
+      "epoch": 0.6741333333333334,
+      "grad_norm": 6.001848571839919,
+      "learning_rate": 9.488135456452207e-06,
+      "loss": 0.6839,
+      "step": 79
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 5.709147411501981,
+      "learning_rate": 9.515449934959717e-06,
+      "loss": 0.6567,
+      "step": 80
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 4.128977158730638,
+      "learning_rate": 9.542425094393249e-06,
+      "loss": 0.545,
+      "step": 81
+    },
+    {
+      "epoch": 0.6997333333333333,
+      "grad_norm": 2.604915806147427,
+      "learning_rate": 9.569069261918582e-06,
+      "loss": 0.4596,
+      "step": 82
+    },
+    {
+      "epoch": 0.7082666666666667,
+      "grad_norm": 2.039939253407506,
+      "learning_rate": 9.59539046188037e-06,
+      "loss": 0.452,
+      "step": 83
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0398988141415337,
+      "learning_rate": 9.621396430309407e-06,
+      "loss": 0.4538,
+      "step": 84
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 2.37589477950211,
+      "learning_rate": 9.647094628571464e-06,
+      "loss": 0.4505,
+      "step": 85
+    },
+    {
+      "epoch": 0.7338666666666667,
+      "grad_norm": 2.80580920047501,
+      "learning_rate": 9.672492256217837e-06,
+      "loss": 0.5284,
+      "step": 86
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 2.3687428819051197,
+      "learning_rate": 9.697596263093091e-06,
+      "loss": 0.4371,
+      "step": 87
+    },
+    {
+      "epoch": 0.7509333333333333,
+      "grad_norm": 1.6362502854757155,
+      "learning_rate": 9.722413360750844e-06,
+      "loss": 0.3652,
+      "step": 88
+    },
+    {
+      "epoch": 0.7594666666666666,
+      "grad_norm": 1.5360860168740427,
+      "learning_rate": 9.746950033224562e-06,
+      "loss": 0.3235,
+      "step": 89
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 1.7245475092642693,
+      "learning_rate": 9.771212547196623e-06,
+      "loss": 0.3072,
+      "step": 90
+    },
+    {
+      "epoch": 0.7765333333333333,
+      "grad_norm": 1.4493496982196852,
+      "learning_rate": 9.795206961605467e-06,
+      "loss": 0.2474,
+      "step": 91
+    },
+    {
+      "epoch": 0.7850666666666667,
+      "grad_norm": 1.1662262130552072,
+      "learning_rate": 9.818939136727777e-06,
+      "loss": 0.2684,
+      "step": 92
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.1727132215390659,
+      "learning_rate": 9.842414742769675e-06,
+      "loss": 0.3456,
+      "step": 93
+    },
+    {
+      "epoch": 0.8021333333333334,
+      "grad_norm": 0.8435059300379855,
+      "learning_rate": 9.865639267998493e-06,
+      "loss": 0.227,
+      "step": 94
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.8593375804730568,
+      "learning_rate": 9.888618026444238e-06,
+      "loss": 0.1985,
+      "step": 95
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.0673772841412472,
+      "learning_rate": 9.911356165197841e-06,
+      "loss": 0.3195,
+      "step": 96
+    },
+    {
+      "epoch": 0.8277333333333333,
+      "grad_norm": 0.9341285801648793,
+      "learning_rate": 9.933858671331224e-06,
+      "loss": 0.213,
+      "step": 97
+    },
+    {
+      "epoch": 0.8362666666666667,
+      "grad_norm": 0.7197728549764331,
+      "learning_rate": 9.956130378462474e-06,
+      "loss": 0.2067,
+      "step": 98
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.5655901060353195,
+      "learning_rate": 9.978175972987748e-06,
+      "loss": 0.1708,
+      "step": 99
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.4681745812066334,
+      "learning_rate": 9.999999999999999e-06,
+      "loss": 0.1983,
+      "step": 100
+    },
+    {
+      "epoch": 0.8618666666666667,
+      "grad_norm": 0.4488180280567293,
+      "learning_rate": 1e-05,
+      "loss": 0.1401,
+      "step": 101
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 0.43194512376224187,
+      "learning_rate": 1e-05,
+      "loss": 0.1097,
+      "step": 102
+    },
+    {
+      "epoch": 0.8789333333333333,
+      "grad_norm": 0.3754480982834532,
+      "learning_rate": 1e-05,
+      "loss": 0.1531,
+      "step": 103
+    },
+    {
+      "epoch": 0.8874666666666666,
+      "grad_norm": 0.34151633602448267,
+      "learning_rate": 1e-05,
+      "loss": 0.1685,
+      "step": 104
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.26356638458244175,
+      "learning_rate": 1e-05,
+      "loss": 0.1104,
+      "step": 105
+    },
+    {
+      "epoch": 0.9045333333333333,
+      "grad_norm": 0.27641004897246113,
+      "learning_rate": 1e-05,
+      "loss": 0.1589,
+      "step": 106
+    },
+    {
+      "epoch": 0.9130666666666667,
+      "grad_norm": 0.1639383504796773,
+      "learning_rate": 1e-05,
+      "loss": 0.1064,
+      "step": 107
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 0.24233145434818837,
+      "learning_rate": 1e-05,
+      "loss": 0.1385,
+      "step": 108
+    },
+    {
+      "epoch": 0.9301333333333334,
+      "grad_norm": 0.16015184210317215,
+      "learning_rate": 1e-05,
+      "loss": 0.121,
+      "step": 109
+    },
+    {
+      "epoch": 0.9386666666666666,
+      "grad_norm": 0.14931644417242712,
+      "learning_rate": 1e-05,
+      "loss": 0.1117,
+      "step": 110
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 0.15078311335939154,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 111
+    },
+    {
+      "epoch": 0.9557333333333333,
+      "grad_norm": 0.16714082761639734,
+      "learning_rate": 1e-05,
+      "loss": 0.115,
+      "step": 112
+    },
+    {
+      "epoch": 0.9642666666666667,
+      "grad_norm": 0.12479711996187942,
+      "learning_rate": 1e-05,
+      "loss": 0.1029,
+      "step": 113
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 0.14783351137940065,
+      "learning_rate": 1e-05,
+      "loss": 0.0987,
+      "step": 114
+    },
+    {
+      "epoch": 0.9813333333333333,
+      "grad_norm": 0.11311876630863582,
+      "learning_rate": 1e-05,
+      "loss": 0.0911,
+      "step": 115
+    },
+    {
+      "epoch": 0.9898666666666667,
+      "grad_norm": 0.1238329581090649,
+      "learning_rate": 1e-05,
+      "loss": 0.1095,
+      "step": 116
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 0.11117413394533605,
+      "learning_rate": 1e-05,
+      "loss": 0.0968,
+      "step": 117
+    },
+    {
+      "epoch": 1.0069333333333332,
+      "grad_norm": 0.09247708923706752,
+      "learning_rate": 1e-05,
+      "loss": 0.0985,
+      "step": 118
+    },
+    {
+      "epoch": 1.0154666666666667,
+      "grad_norm": 0.12028574166046906,
+      "learning_rate": 1e-05,
+      "loss": 0.1085,
+      "step": 119
+    },
+    {
+      "epoch": 1.024,
+      "grad_norm": 0.075460717991084,
+      "learning_rate": 1e-05,
+      "loss": 0.1007,
+      "step": 120
+    },
+    {
+      "epoch": 1.0325333333333333,
+      "grad_norm": 0.1930335796969662,
+      "learning_rate": 1e-05,
+      "loss": 0.1438,
+      "step": 121
+    },
+    {
+      "epoch": 1.0410666666666666,
+      "grad_norm": 0.11451251015868702,
+      "learning_rate": 1e-05,
+      "loss": 0.1365,
+      "step": 122
+    },
+    {
+      "epoch": 1.0496,
+      "grad_norm": 0.09360332240252384,
+      "learning_rate": 1e-05,
+      "loss": 0.1039,
+      "step": 123
+    },
+    {
+      "epoch": 1.0581333333333334,
+      "grad_norm": 0.13162505626586696,
+      "learning_rate": 1e-05,
+      "loss": 0.1132,
+      "step": 124
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 0.1329223725298499,
+      "learning_rate": 1e-05,
+      "loss": 0.1153,
+      "step": 125
+    },
+    {
+      "epoch": 1.0752,
+      "grad_norm": 0.09522360247894453,
+      "learning_rate": 1e-05,
+      "loss": 0.1264,
+      "step": 126
+    },
+    {
+      "epoch": 1.0837333333333334,
+      "grad_norm": 0.12467359977458509,
+      "learning_rate": 1e-05,
+      "loss": 0.0866,
+      "step": 127
+    },
+    {
+      "epoch": 1.0922666666666667,
+      "grad_norm": 0.08853379791954709,
+      "learning_rate": 1e-05,
+      "loss": 0.107,
+      "step": 128
+    },
+    {
+      "epoch": 1.1008,
+      "grad_norm": 0.16050358070185106,
+      "learning_rate": 1e-05,
+      "loss": 0.1134,
+      "step": 129
+    },
+    {
+      "epoch": 1.1093333333333333,
+      "grad_norm": 0.10331318962336627,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 130
+    },
+    {
+      "epoch": 1.1178666666666666,
+      "grad_norm": 0.08498886624952962,
+      "learning_rate": 1e-05,
+      "loss": 0.12,
+      "step": 131
+    },
+    {
+      "epoch": 1.1264,
+      "grad_norm": 0.09918910544874306,
+      "learning_rate": 1e-05,
+      "loss": 0.1173,
+      "step": 132
+    },
+    {
+      "epoch": 1.1349333333333333,
+      "grad_norm": 0.0751198135696547,
+      "learning_rate": 1e-05,
+      "loss": 0.0973,
+      "step": 133
+    },
+    {
+      "epoch": 1.1434666666666666,
+      "grad_norm": 0.07959218402066412,
+      "learning_rate": 1e-05,
+      "loss": 0.0992,
+      "step": 134
+    },
+    {
+      "epoch": 1.152,
+      "grad_norm": 0.14419628324779726,
+      "learning_rate": 1e-05,
+      "loss": 0.0856,
+      "step": 135
+    },
+    {
+      "epoch": 1.1605333333333334,
+      "grad_norm": 0.07894542967774888,
+      "learning_rate": 1e-05,
+      "loss": 0.1193,
+      "step": 136
+    },
+    {
+      "epoch": 1.1690666666666667,
+      "grad_norm": 0.08735606763938318,
+      "learning_rate": 1e-05,
+      "loss": 0.1061,
+      "step": 137
+    },
+    {
+      "epoch": 1.1776,
+      "grad_norm": 0.12344637986728384,
+      "learning_rate": 1e-05,
+      "loss": 0.1184,
+      "step": 138
+    },
+    {
+      "epoch": 1.1861333333333333,
+      "grad_norm": 0.07797745242316644,
+      "learning_rate": 1e-05,
+      "loss": 0.0959,
+      "step": 139
+    },
+    {
+      "epoch": 1.1946666666666665,
+      "grad_norm": 0.10065236259356937,
+      "learning_rate": 1e-05,
+      "loss": 0.0957,
+      "step": 140
+    },
+    {
+      "epoch": 1.2032,
+      "grad_norm": 0.06472006342138571,
+      "learning_rate": 1e-05,
+      "loss": 0.0721,
+      "step": 141
+    },
+    {
+      "epoch": 1.2117333333333333,
+      "grad_norm": 0.08080002696086562,
+      "learning_rate": 1e-05,
+      "loss": 0.1073,
+      "step": 142
+    },
+    {
+      "epoch": 1.2202666666666666,
+      "grad_norm": 0.10400160039217118,
+      "learning_rate": 1e-05,
+      "loss": 0.1227,
+      "step": 143
+    },
+    {
+      "epoch": 1.2288000000000001,
+      "grad_norm": 0.08719509476650818,
+      "learning_rate": 1e-05,
+      "loss": 0.114,
+      "step": 144
+    },
+    {
+      "epoch": 1.2373333333333334,
+      "grad_norm": 0.08431635436674337,
+      "learning_rate": 1e-05,
+      "loss": 0.1303,
+      "step": 145
+    },
+    {
+      "epoch": 1.2458666666666667,
+      "grad_norm": 0.23947926607305503,
+      "learning_rate": 1e-05,
+      "loss": 0.1199,
+      "step": 146
+    },
+    {
+      "epoch": 1.2544,
+      "grad_norm": 0.08794721265212341,
+      "learning_rate": 1e-05,
+      "loss": 0.1094,
+      "step": 147
+    },
+    {
+      "epoch": 1.2629333333333332,
+      "grad_norm": 0.08063747277184712,
+      "learning_rate": 1e-05,
+      "loss": 0.1062,
+      "step": 148
+    },
+    {
+      "epoch": 1.2714666666666667,
+      "grad_norm": 0.06832693897193236,
+      "learning_rate": 1e-05,
+      "loss": 0.0842,
+      "step": 149
+    },
+    {
+      "epoch": 1.28,
+      "grad_norm": 0.07037053759395089,
+      "learning_rate": 1e-05,
+      "loss": 0.0971,
+      "step": 150
+    },
+    {
+      "epoch": 1.2885333333333333,
+      "grad_norm": 0.08753063334098339,
+      "learning_rate": 1e-05,
+      "loss": 0.085,
+      "step": 151
+    },
+    {
+      "epoch": 1.2970666666666666,
+      "grad_norm": 0.11381804369240754,
+      "learning_rate": 1e-05,
+      "loss": 0.1156,
+      "step": 152
+    },
+    {
+      "epoch": 1.3056,
+      "grad_norm": 0.07203805377255211,
+      "learning_rate": 1e-05,
+      "loss": 0.0951,
+      "step": 153
+    },
+    {
+      "epoch": 1.3141333333333334,
+      "grad_norm": 0.1156784206459358,
+      "learning_rate": 1e-05,
+      "loss": 0.1557,
+      "step": 154
+    },
+    {
+      "epoch": 1.3226666666666667,
+      "grad_norm": 0.11353874538174968,
+      "learning_rate": 1e-05,
+      "loss": 0.1284,
+      "step": 155
+    },
+    {
+      "epoch": 1.3312,
+      "grad_norm": 0.06675505890811795,
+      "learning_rate": 1e-05,
+      "loss": 0.089,
+      "step": 156
+    },
+    {
+      "epoch": 1.3397333333333332,
+      "grad_norm": 0.07642955477275162,
+      "learning_rate": 1e-05,
+      "loss": 0.0825,
+      "step": 157
+    },
+    {
+      "epoch": 1.3482666666666667,
+      "grad_norm": 0.07196529265355209,
+      "learning_rate": 1e-05,
+      "loss": 0.0885,
+      "step": 158
+    },
+    {
+      "epoch": 1.3568,
+      "grad_norm": 0.08651497112727735,
+      "learning_rate": 1e-05,
+      "loss": 0.0934,
+      "step": 159
+    },
+    {
+      "epoch": 1.3653333333333333,
+      "grad_norm": 0.07249320769144564,
+      "learning_rate": 1e-05,
+      "loss": 0.102,
+      "step": 160
+    },
+    {
+      "epoch": 1.3738666666666668,
+      "grad_norm": 0.08744246078973236,
+      "learning_rate": 1e-05,
+      "loss": 0.0905,
+      "step": 161
+    },
+    {
+      "epoch": 1.3824,
+      "grad_norm": 0.08657071789403122,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 162
+    },
+    {
+      "epoch": 1.3909333333333334,
+      "grad_norm": 0.1064187506686306,
+      "learning_rate": 1e-05,
+      "loss": 0.1163,
+      "step": 163
+    },
+    {
+      "epoch": 1.3994666666666666,
+      "grad_norm": 0.1280290421664948,
+      "learning_rate": 1e-05,
+      "loss": 0.1046,
+      "step": 164
+    },
+    {
+      "epoch": 1.408,
+      "grad_norm": 0.09937311183437203,
+      "learning_rate": 1e-05,
+      "loss": 0.1147,
+      "step": 165
+    },
+    {
+      "epoch": 1.4165333333333332,
+      "grad_norm": 0.08384493963149035,
+      "learning_rate": 1e-05,
+      "loss": 0.0837,
+      "step": 166
+    },
+    {
+      "epoch": 1.4250666666666667,
+      "grad_norm": 0.0878469941667546,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 167
+    },
+    {
+      "epoch": 1.4336,
+      "grad_norm": 0.08507656582015763,
+      "learning_rate": 1e-05,
+      "loss": 0.1124,
+      "step": 168
+    },
+    {
+      "epoch": 1.4421333333333333,
+      "grad_norm": 0.14341789007671765,
+      "learning_rate": 1e-05,
+      "loss": 0.1045,
+      "step": 169
+    },
+    {
+      "epoch": 1.4506666666666668,
+      "grad_norm": 0.11549200338103699,
+      "learning_rate": 1e-05,
+      "loss": 0.1192,
+      "step": 170
+    },
+    {
+      "epoch": 1.4592,
+      "grad_norm": 0.08297398102159202,
+      "learning_rate": 1e-05,
+      "loss": 0.106,
+      "step": 171
+    },
+    {
+      "epoch": 1.4677333333333333,
+      "grad_norm": 0.08511454300188333,
+      "learning_rate": 1e-05,
+      "loss": 0.1115,
+      "step": 172
+    },
+    {
+      "epoch": 1.4762666666666666,
+      "grad_norm": 0.06731733651614974,
+      "learning_rate": 1e-05,
+      "loss": 0.0579,
+      "step": 173
+    },
+    {
+      "epoch": 1.4848,
+      "grad_norm": 0.08522628039447024,
+      "learning_rate": 1e-05,
+      "loss": 0.0944,
+      "step": 174
+    },
+    {
+      "epoch": 1.4933333333333334,
+      "grad_norm": 0.08148851689521808,
+      "learning_rate": 1e-05,
+      "loss": 0.0946,
+      "step": 175
+    },
+    {
+      "epoch": 1.5018666666666667,
+      "grad_norm": 0.09314761246496046,
+      "learning_rate": 1e-05,
+      "loss": 0.1077,
+      "step": 176
+    },
+    {
+      "epoch": 1.5104,
+      "grad_norm": 0.08337943532869242,
+      "learning_rate": 1e-05,
+      "loss": 0.0919,
+      "step": 177
+    },
+    {
+      "epoch": 1.5189333333333335,
+      "grad_norm": 0.07936632915317685,
+      "learning_rate": 1e-05,
+      "loss": 0.0878,
+      "step": 178
+    },
+    {
+      "epoch": 1.5274666666666668,
+      "grad_norm": 0.10041567827499392,
+      "learning_rate": 1e-05,
+      "loss": 0.1164,
+      "step": 179
+    },
+    {
+      "epoch": 1.536,
+      "grad_norm": 0.08184099557308296,
+      "learning_rate": 1e-05,
+      "loss": 0.1143,
+      "step": 180
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.564902203714175e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
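
Every `log_history` entry above has the same five keys (`epoch`, `grad_norm`, `learning_rate`, `loss`, `step`), so the tail of a run can be summarized with the standard library alone. The following is a minimal sketch, not part of the commit: `summarize_tail` is a hypothetical helper, and the inline sample copies only the last five entries of `checkpoint-180/trainer_state.json` shown above.

```python
import json
from statistics import mean

# Last five log_history entries of checkpoint-180/trainer_state.json,
# copied from the diff above (the real file holds all 180 entries).
SAMPLE = json.loads("""
{"log_history": [
  {"epoch": 1.5018666666666667, "grad_norm": 0.09314761246496046, "learning_rate": 1e-05, "loss": 0.1077, "step": 176},
  {"epoch": 1.5104, "grad_norm": 0.08337943532869242, "learning_rate": 1e-05, "loss": 0.0919, "step": 177},
  {"epoch": 1.5189333333333335, "grad_norm": 0.07936632915317685, "learning_rate": 1e-05, "loss": 0.0878, "step": 178},
  {"epoch": 1.5274666666666668, "grad_norm": 0.10041567827499392, "learning_rate": 1e-05, "loss": 0.1164, "step": 179},
  {"epoch": 1.536, "grad_norm": 0.08184099557308296, "learning_rate": 1e-05, "loss": 0.1143, "step": 180}
]}
""")


def summarize_tail(state, last_n=5):
    """Return (first_step, last_step, mean_loss, max_grad_norm) for the last `last_n` entries."""
    tail = state["log_history"][-last_n:]
    return (
        tail[0]["step"],
        tail[-1]["step"],
        mean(entry["loss"] for entry in tail),
        max(entry["grad_norm"] for entry in tail),
    )


first, last, avg_loss, peak_norm = summarize_tail(SAMPLE)
print(f"steps {first}-{last}: mean loss {avg_loss:.4f}, max grad_norm {peak_norm:.4f}")
```

To run this against a real checkpoint, replace `SAMPLE` with `json.load(open(path))`, where `path` points at a downloaded `trainer_state.json`.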

checkpoint-180/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840

checkpoint-20/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-20/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
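
The effective strength of the adapter follows from `lora_alpha`, `r` and `use_rslora` in the config above. A minimal sketch, assuming the usual LoRA convention that the low-rank update is scaled by `lora_alpha / r` (or by `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled); `lora_scaling` is a hypothetical helper, and the inline JSON is a subset of the fields shown above.

```python
import json
import math

# Subset of checkpoint-20/adapter_config.json, copied from the diff above.
CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "r": 16,
  "use_rslora": false,
  "target_modules": ["k_proj", "q_proj", "v_proj", "down_proj",
                     "up_proj", "gate_proj", "lm_head", "o_proj"]
}
""")


def lora_scaling(cfg):
    """Scaling factor applied to the low-rank update B @ A."""
    if cfg.get("use_rslora"):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


scaling = lora_scaling(CONFIG)
print(f"rank={CONFIG['r']} scaling={scaling} target_modules={len(CONFIG['target_modules'])}")
```

With `lora_alpha` 32 and `r` 16 the scaling is 2.0. The adapter targets eight module types, including `lm_head`.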

checkpoint-20/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b5b3a1c4ad0ae63bbc0ebb2c7da1061118db60c579b0a42779982fc824a136e9
+size 1138856856
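
The three lines above are a Git LFS pointer, not the weights themselves: each line is a key, a space, and a value. A minimal sketch of reading one, using only string operations; `parse_lfs_pointer` is a hypothetical helper and not part of any Git LFS tooling.

```python
# Pointer text for checkpoint-20/adapter_model.safetensors, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b5b3a1c4ad0ae63bbc0ebb2c7da1061118db60c579b0a42779982fc824a136e9
size 1138856856
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version URL, hash algorithm, digest and byte size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(f"{info['algo']}:{info['digest'][:12]}... {info['size'] / 2**30:.2f} GiB")
```

The `size` field is in bytes, so each adapter checkpoint in this commit is a little over 1 GiB.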

checkpoint-20/trainer_state.json ADDED

	@@ -0,0 +1,173 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.17066666666666666,
+  "eval_steps": 500,
+  "global_step": 20,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.924957640079442e+17,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
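
The `learning_rate` values logged above are consistent with a logarithmic warmup, `lr = peak * ln(step) / ln(warmup_steps)`, with a peak of 1e-05 and 100 warmup steps. The diff does not name the scheduler, so both constants below are inferences from the logged numbers rather than documented settings. A minimal sketch that checks the formula against four logged values:

```python
import math

PEAK_LR = 1e-05      # assumption: plateau value, as logged at steps 147-180
WARMUP_STEPS = 100   # assumption: inferred from the logged values, not stated in the diff


def log_warmup_lr(step):
    """Learning rate at 1-based `step` under the assumed logarithmic warmup."""
    if step >= WARMUP_STEPS:
        return PEAK_LR
    return PEAK_LR * math.log(step) / math.log(WARMUP_STEPS)


# (step, learning_rate) pairs copied from checkpoint-20/trainer_state.json above.
LOGGED = {
    1: 0.0,
    2: 1.5051499783199057e-06,
    10: 4.9999999999999996e-06,
    20: 6.505149978319905e-06,
}

for step, logged_lr in LOGGED.items():
    predicted = log_warmup_lr(step)
    matches = math.isclose(predicted, logged_lr, rel_tol=1e-9, abs_tol=1e-15)
    print(f"step {step:>3}: predicted {predicted:.6e}, logged {logged_lr:.6e}, match={matches}")
```

Under the same assumption, step 150 falls past the warmup and the function returns the 1e-05 plateau seen in the later entries.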

checkpoint-20/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840

checkpoint-200/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-200/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ce5931532d5731554dcb8065c2d99f3c334e5df3d5c09e2d9b756585177463a
+size 1138856856

checkpoint-200/trainer_state.json ADDED

	@@ -0,0 +1,1433 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.7066666666666666,
+  "eval_steps": 500,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 120.65595457746791,
+      "learning_rate": 6.611096473669596e-06,
+      "loss": 15.252,
+      "step": 21
+    },
+    {
+      "epoch": 0.18773333333333334,
+      "grad_norm": 133.05280466087515,
+      "learning_rate": 6.712113404111031e-06,
+      "loss": 14.1391,
+      "step": 22
+    },
+    {
+      "epoch": 0.19626666666666667,
+      "grad_norm": 127.95029628849048,
+      "learning_rate": 6.808639180087963e-06,
+      "loss": 12.9566,
+      "step": 23
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 108.83495245094748,
+      "learning_rate": 6.90105620855803e-06,
+      "loss": 11.8743,
+      "step": 24
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 99.90727146021455,
+      "learning_rate": 6.989700043360187e-06,
+      "loss": 10.962,
+      "step": 25
+    },
+    {
+      "epoch": 0.22186666666666666,
+      "grad_norm": 98.37126740059823,
+      "learning_rate": 7.074866739854089e-06,
+      "loss": 9.9919,
+      "step": 26
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 92.26708429201608,
+      "learning_rate": 7.156818820794936e-06,
+      "loss": 8.8811,
+      "step": 27
+    },
+    {
+      "epoch": 0.23893333333333333,
+      "grad_norm": 83.36099898839835,
+      "learning_rate": 7.235790156711096e-06,
+      "loss": 7.7806,
+      "step": 28
+    },
+    {
+      "epoch": 0.24746666666666667,
+      "grad_norm": 68.07500315598597,
+      "learning_rate": 7.3119899894947795e-06,
+      "loss": 7.0528,
+      "step": 29
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 69.58960332280246,
+      "learning_rate": 7.385606273598311e-06,
+      "loss": 6.3683,
+      "step": 30
+    },
+    {
+      "epoch": 0.26453333333333334,
+      "grad_norm": 68.77532204123075,
+      "learning_rate": 7.456808469171363e-06,
+      "loss": 6.1635,
+      "step": 31
+    },
+    {
+      "epoch": 0.2730666666666667,
+      "grad_norm": 66.29676636510072,
+      "learning_rate": 7.5257498915995295e-06,
+      "loss": 4.711,
+      "step": 32
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 42.87145091679237,
+      "learning_rate": 7.592569699389437e-06,
+      "loss": 4.5119,
+      "step": 33
+    },
+    {
+      "epoch": 0.29013333333333335,
+      "grad_norm": 26.2592350291551,
+      "learning_rate": 7.657394585211274e-06,
+      "loss": 4.31,
+      "step": 34
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 15.35959008067237,
+      "learning_rate": 7.720340221751376e-06,
+      "loss": 4.0001,
+      "step": 35
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 8.50847651865227,
+      "learning_rate": 7.781512503836437e-06,
+      "loss": 3.5723,
+      "step": 36
+    },
+    {
+      "epoch": 0.3157333333333333,
+      "grad_norm": 6.562581089063746,
+      "learning_rate": 7.841008620334974e-06,
+      "loss": 3.9254,
+      "step": 37
+    },
+    {
+      "epoch": 0.32426666666666665,
+      "grad_norm": 5.6145595722250095,
+      "learning_rate": 7.89891798308405e-06,
+      "loss": 3.8746,
+      "step": 38
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 5.385367220486204,
+      "learning_rate": 7.955323035132495e-06,
+      "loss": 3.8128,
+      "step": 39
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 5.403447124703616,
+      "learning_rate": 8.010299956639811e-06,
+      "loss": 3.885,
+      "step": 40
+    },
+    {
+      "epoch": 0.34986666666666666,
+      "grad_norm": 5.48242204895128,
+      "learning_rate": 8.063919283598677e-06,
+      "loss": 3.8048,
+      "step": 41
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 5.5525098950513865,
+      "learning_rate": 8.116246451989503e-06,
+      "loss": 3.7508,
+      "step": 42
+    },
+    {
+      "epoch": 0.36693333333333333,
+      "grad_norm": 5.354384520535484,
+      "learning_rate": 8.167342277897933e-06,
+      "loss": 3.5069,
+      "step": 43
+    },
+    {
+      "epoch": 0.37546666666666667,
+      "grad_norm": 5.46272338131107,
+      "learning_rate": 8.217263382430936e-06,
+      "loss": 3.6747,
+      "step": 44
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 4.798550688968453,
+      "learning_rate": 8.266062568876717e-06,
+      "loss": 3.1609,
+      "step": 45
+    },
+    {
+      "epoch": 0.39253333333333335,
+      "grad_norm": 5.755104452953421,
+      "learning_rate": 8.31378915840787e-06,
+      "loss": 3.5733,
+      "step": 46
+    },
+    {
+      "epoch": 0.4010666666666667,
+      "grad_norm": 4.618763611067563,
+      "learning_rate": 8.360489289678585e-06,
+      "loss": 2.9402,
+      "step": 47
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 5.506785974818791,
+      "learning_rate": 8.406206186877936e-06,
+      "loss": 3.382,
+      "step": 48
+    },
+    {
+      "epoch": 0.41813333333333336,
+      "grad_norm": 4.68603207809794,
+      "learning_rate": 8.450980400142568e-06,
+      "loss": 2.9918,
+      "step": 49
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 5.124033394817131,
+      "learning_rate": 8.494850021680093e-06,
+      "loss": 3.3202,
+      "step": 50
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 4.293001183481895,
+      "learning_rate": 8.537850880489681e-06,
+      "loss": 2.8519,
+      "step": 51
+    },
+    {
+      "epoch": 0.4437333333333333,
+      "grad_norm": 4.382596858902394,
+      "learning_rate": 8.580016718173996e-06,
+      "loss": 2.9683,
+      "step": 52
+    },
+    {
+      "epoch": 0.45226666666666665,
+      "grad_norm": 4.3176263388044696,
+      "learning_rate": 8.621379348003945e-06,
+      "loss": 2.9257,
+      "step": 53
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 4.5250022171605195,
+      "learning_rate": 8.661968799114844e-06,
+      "loss": 3.0556,
+      "step": 54
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 4.429424190600661,
+      "learning_rate": 8.701813447471218e-06,
+      "loss": 2.9513,
+      "step": 55
+    },
+    {
+      "epoch": 0.47786666666666666,
+      "grad_norm": 4.349652568052827,
+      "learning_rate": 8.740940135031001e-06,
+      "loss": 2.9029,
+      "step": 56
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 4.299227871435445,
+      "learning_rate": 8.779374278362457e-06,
+      "loss": 2.5989,
+      "step": 57
+    },
+    {
+      "epoch": 0.49493333333333334,
+      "grad_norm": 4.562461330302201,
+      "learning_rate": 8.817139967814684e-06,
+      "loss": 2.8158,
+      "step": 58
+    },
+    {
+      "epoch": 0.5034666666666666,
+      "grad_norm": 4.606987182758338,
+      "learning_rate": 8.854260058210721e-06,
+      "loss": 2.6272,
+      "step": 59
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 4.9420031522511545,
+      "learning_rate": 8.890756251918216e-06,
+      "loss": 2.5488,
+      "step": 60
+    },
+    {
+      "epoch": 0.5205333333333333,
+      "grad_norm": 4.706462297046012,
+      "learning_rate": 8.926649175053834e-06,
+      "loss": 2.3575,
+      "step": 61
+    },
+    {
+      "epoch": 0.5290666666666667,
+      "grad_norm": 4.862820204363494,
+      "learning_rate": 8.961958447491269e-06,
+      "loss": 2.2952,
+      "step": 62
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 4.911045913397774,
+      "learning_rate": 8.996702747267908e-06,
+      "loss": 2.1768,
+      "step": 63
+    },
+    {
+      "epoch": 0.5461333333333334,
+      "grad_norm": 5.46978680182973,
+      "learning_rate": 9.030899869919434e-06,
+      "loss": 2.2528,
+      "step": 64
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 5.847558397227374,
+      "learning_rate": 9.064566783214276e-06,
+      "loss": 2.2401,
+      "step": 65
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 5.984440656257,
+      "learning_rate": 9.097719677709343e-06,
+      "loss": 2.156,
+      "step": 66
+    },
+    {
+      "epoch": 0.5717333333333333,
+      "grad_norm": 6.146172189799918,
+      "learning_rate": 9.130374013504131e-06,
+      "loss": 2.0059,
+      "step": 67
+    },
+    {
+      "epoch": 0.5802666666666667,
+      "grad_norm": 5.725706778130614,
+      "learning_rate": 9.162544563531182e-06,
+      "loss": 1.7756,
+      "step": 68
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 6.479060263133115,
+      "learning_rate": 9.194245453686277e-06,
+      "loss": 1.7651,
+      "step": 69
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 7.319291050667066,
+      "learning_rate": 9.225490200071284e-06,
+      "loss": 1.7712,
+      "step": 70
+    },
+    {
+      "epoch": 0.6058666666666667,
+      "grad_norm": 6.913275412032087,
+      "learning_rate": 9.256291743595376e-06,
+      "loss": 1.709,
+      "step": 71
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 6.600657239614328,
+      "learning_rate": 9.28666248215634e-06,
+      "loss": 1.3731,
+      "step": 72
+    },
+    {
+      "epoch": 0.6229333333333333,
+      "grad_norm": 7.301483724647945,
+      "learning_rate": 9.316614300602277e-06,
+      "loss": 1.4166,
+      "step": 73
+    },
+    {
+      "epoch": 0.6314666666666666,
+      "grad_norm": 7.154933225265475,
+      "learning_rate": 9.346158598654881e-06,
+      "loss": 1.2797,
+      "step": 74
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 8.248472592538771,
+      "learning_rate": 9.375306316958499e-06,
+      "loss": 1.2082,
+      "step": 75
+    },
+    {
+      "epoch": 0.6485333333333333,
+      "grad_norm": 7.444479096112177,
+      "learning_rate": 9.404067961403957e-06,
+      "loss": 1.0402,
+      "step": 76
+    },
+    {
+      "epoch": 0.6570666666666667,
+      "grad_norm": 6.819760434594012,
+      "learning_rate": 9.432453625862409e-06,
+      "loss": 0.8244,
+      "step": 77
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 6.894760862855001,
+      "learning_rate": 9.460473013452401e-06,
+      "loss": 0.8345,
+      "step": 78
+    },
+    {
+      "epoch": 0.6741333333333334,
+      "grad_norm": 6.001848571839919,
+      "learning_rate": 9.488135456452207e-06,
+      "loss": 0.6839,
+      "step": 79
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 5.709147411501981,
+      "learning_rate": 9.515449934959717e-06,
+      "loss": 0.6567,
+      "step": 80
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 4.128977158730638,
+      "learning_rate": 9.542425094393249e-06,
+      "loss": 0.545,
+      "step": 81
+    },
+    {
+      "epoch": 0.6997333333333333,
+      "grad_norm": 2.604915806147427,
+      "learning_rate": 9.569069261918582e-06,
+      "loss": 0.4596,
+      "step": 82
+    },
+    {
+      "epoch": 0.7082666666666667,
+      "grad_norm": 2.039939253407506,
+      "learning_rate": 9.59539046188037e-06,
+      "loss": 0.452,
+      "step": 83
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0398988141415337,
+      "learning_rate": 9.621396430309407e-06,
+      "loss": 0.4538,
+      "step": 84
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 2.37589477950211,
+      "learning_rate": 9.647094628571464e-06,
+      "loss": 0.4505,
+      "step": 85
+    },
+    {
+      "epoch": 0.7338666666666667,
+      "grad_norm": 2.80580920047501,
+      "learning_rate": 9.672492256217837e-06,
+      "loss": 0.5284,
+      "step": 86
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 2.3687428819051197,
+      "learning_rate": 9.697596263093091e-06,
+      "loss": 0.4371,
+      "step": 87
+    },
+    {
+      "epoch": 0.7509333333333333,
+      "grad_norm": 1.6362502854757155,
+      "learning_rate": 9.722413360750844e-06,
+      "loss": 0.3652,
+      "step": 88
+    },
+    {
+      "epoch": 0.7594666666666666,
+      "grad_norm": 1.5360860168740427,
+      "learning_rate": 9.746950033224562e-06,
+      "loss": 0.3235,
+      "step": 89
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 1.7245475092642693,
+      "learning_rate": 9.771212547196623e-06,
+      "loss": 0.3072,
+      "step": 90
+    },
+    {
+      "epoch": 0.7765333333333333,
+      "grad_norm": 1.4493496982196852,
+      "learning_rate": 9.795206961605467e-06,
+      "loss": 0.2474,
+      "step": 91
+    },
+    {
+      "epoch": 0.7850666666666667,
+      "grad_norm": 1.1662262130552072,
+      "learning_rate": 9.818939136727777e-06,
+      "loss": 0.2684,
+      "step": 92
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.1727132215390659,
+      "learning_rate": 9.842414742769675e-06,
+      "loss": 0.3456,
+      "step": 93
+    },
+    {
+      "epoch": 0.8021333333333334,
+      "grad_norm": 0.8435059300379855,
+      "learning_rate": 9.865639267998493e-06,
+      "loss": 0.227,
+      "step": 94
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.8593375804730568,
+      "learning_rate": 9.888618026444238e-06,
+      "loss": 0.1985,
+      "step": 95
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.0673772841412472,
+      "learning_rate": 9.911356165197841e-06,
+      "loss": 0.3195,
+      "step": 96
+    },
+    {
+      "epoch": 0.8277333333333333,
+      "grad_norm": 0.9341285801648793,
+      "learning_rate": 9.933858671331224e-06,
+      "loss": 0.213,
+      "step": 97
+    },
+    {
+      "epoch": 0.8362666666666667,
+      "grad_norm": 0.7197728549764331,
+      "learning_rate": 9.956130378462474e-06,
+      "loss": 0.2067,
+      "step": 98
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.5655901060353195,
+      "learning_rate": 9.978175972987748e-06,
+      "loss": 0.1708,
+      "step": 99
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.4681745812066334,
+      "learning_rate": 9.999999999999999e-06,
+      "loss": 0.1983,
+      "step": 100
+    },
+    {
+      "epoch": 0.8618666666666667,
+      "grad_norm": 0.4488180280567293,
+      "learning_rate": 1e-05,
+      "loss": 0.1401,
+      "step": 101
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 0.43194512376224187,
+      "learning_rate": 1e-05,
+      "loss": 0.1097,
+      "step": 102
+    },
+    {
+      "epoch": 0.8789333333333333,
+      "grad_norm": 0.3754480982834532,
+      "learning_rate": 1e-05,
+      "loss": 0.1531,
+      "step": 103
+    },
+    {
+      "epoch": 0.8874666666666666,
+      "grad_norm": 0.34151633602448267,
+      "learning_rate": 1e-05,
+      "loss": 0.1685,
+      "step": 104
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.26356638458244175,
+      "learning_rate": 1e-05,
+      "loss": 0.1104,
+      "step": 105
+    },
+    {
+      "epoch": 0.9045333333333333,
+      "grad_norm": 0.27641004897246113,
+      "learning_rate": 1e-05,
+      "loss": 0.1589,
+      "step": 106
+    },
+    {
+      "epoch": 0.9130666666666667,
+      "grad_norm": 0.1639383504796773,
+      "learning_rate": 1e-05,
+      "loss": 0.1064,
+      "step": 107
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 0.24233145434818837,
+      "learning_rate": 1e-05,
+      "loss": 0.1385,
+      "step": 108
+    },
+    {
+      "epoch": 0.9301333333333334,
+      "grad_norm": 0.16015184210317215,
+      "learning_rate": 1e-05,
+      "loss": 0.121,
+      "step": 109
+    },
+    {
+      "epoch": 0.9386666666666666,
+      "grad_norm": 0.14931644417242712,
+      "learning_rate": 1e-05,
+      "loss": 0.1117,
+      "step": 110
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 0.15078311335939154,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 111
+    },
+    {
+      "epoch": 0.9557333333333333,
+      "grad_norm": 0.16714082761639734,
+      "learning_rate": 1e-05,
+      "loss": 0.115,
+      "step": 112
+    },
+    {
+      "epoch": 0.9642666666666667,
+      "grad_norm": 0.12479711996187942,
+      "learning_rate": 1e-05,
+      "loss": 0.1029,
+      "step": 113
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 0.14783351137940065,
+      "learning_rate": 1e-05,
+      "loss": 0.0987,
+      "step": 114
+    },
+    {
+      "epoch": 0.9813333333333333,
+      "grad_norm": 0.11311876630863582,
+      "learning_rate": 1e-05,
+      "loss": 0.0911,
+      "step": 115
+    },
+    {
+      "epoch": 0.9898666666666667,
+      "grad_norm": 0.1238329581090649,
+      "learning_rate": 1e-05,
+      "loss": 0.1095,
+      "step": 116
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 0.11117413394533605,
+      "learning_rate": 1e-05,
+      "loss": 0.0968,
+      "step": 117
+    },
+    {
+      "epoch": 1.0069333333333332,
+      "grad_norm": 0.09247708923706752,
+      "learning_rate": 1e-05,
+      "loss": 0.0985,
+      "step": 118
+    },
+    {
+      "epoch": 1.0154666666666667,
+      "grad_norm": 0.12028574166046906,
+      "learning_rate": 1e-05,
+      "loss": 0.1085,
+      "step": 119
+    },
+    {
+      "epoch": 1.024,
+      "grad_norm": 0.075460717991084,
+      "learning_rate": 1e-05,
+      "loss": 0.1007,
+      "step": 120
+    },
+    {
+      "epoch": 1.0325333333333333,
+      "grad_norm": 0.1930335796969662,
+      "learning_rate": 1e-05,
+      "loss": 0.1438,
+      "step": 121
+    },
+    {
+      "epoch": 1.0410666666666666,
+      "grad_norm": 0.11451251015868702,
+      "learning_rate": 1e-05,
+      "loss": 0.1365,
+      "step": 122
+    },
+    {
+      "epoch": 1.0496,
+      "grad_norm": 0.09360332240252384,
+      "learning_rate": 1e-05,
+      "loss": 0.1039,
+      "step": 123
+    },
+    {
+      "epoch": 1.0581333333333334,
+      "grad_norm": 0.13162505626586696,
+      "learning_rate": 1e-05,
+      "loss": 0.1132,
+      "step": 124
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 0.1329223725298499,
+      "learning_rate": 1e-05,
+      "loss": 0.1153,
+      "step": 125
+    },
+    {
+      "epoch": 1.0752,
+      "grad_norm": 0.09522360247894453,
+      "learning_rate": 1e-05,
+      "loss": 0.1264,
+      "step": 126
+    },
+    {
+      "epoch": 1.0837333333333334,
+      "grad_norm": 0.12467359977458509,
+      "learning_rate": 1e-05,
+      "loss": 0.0866,
+      "step": 127
+    },
+    {
+      "epoch": 1.0922666666666667,
+      "grad_norm": 0.08853379791954709,
+      "learning_rate": 1e-05,
+      "loss": 0.107,
+      "step": 128
+    },
+    {
+      "epoch": 1.1008,
+      "grad_norm": 0.16050358070185106,
+      "learning_rate": 1e-05,
+      "loss": 0.1134,
+      "step": 129
+    },
+    {
+      "epoch": 1.1093333333333333,
+      "grad_norm": 0.10331318962336627,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 130
+    },
+    {
+      "epoch": 1.1178666666666666,
+      "grad_norm": 0.08498886624952962,
+      "learning_rate": 1e-05,
+      "loss": 0.12,
+      "step": 131
+    },
+    {
+      "epoch": 1.1264,
+      "grad_norm": 0.09918910544874306,
+      "learning_rate": 1e-05,
+      "loss": 0.1173,
+      "step": 132
+    },
+    {
+      "epoch": 1.1349333333333333,
+      "grad_norm": 0.0751198135696547,
+      "learning_rate": 1e-05,
+      "loss": 0.0973,
+      "step": 133
+    },
+    {
+      "epoch": 1.1434666666666666,
+      "grad_norm": 0.07959218402066412,
+      "learning_rate": 1e-05,
+      "loss": 0.0992,
+      "step": 134
+    },
+    {
+      "epoch": 1.152,
+      "grad_norm": 0.14419628324779726,
+      "learning_rate": 1e-05,
+      "loss": 0.0856,
+      "step": 135
+    },
+    {
+      "epoch": 1.1605333333333334,
+      "grad_norm": 0.07894542967774888,
+      "learning_rate": 1e-05,
+      "loss": 0.1193,
+      "step": 136
+    },
+    {
+      "epoch": 1.1690666666666667,
+      "grad_norm": 0.08735606763938318,
+      "learning_rate": 1e-05,
+      "loss": 0.1061,
+      "step": 137
+    },
+    {
+      "epoch": 1.1776,
+      "grad_norm": 0.12344637986728384,
+      "learning_rate": 1e-05,
+      "loss": 0.1184,
+      "step": 138
+    },
+    {
+      "epoch": 1.1861333333333333,
+      "grad_norm": 0.07797745242316644,
+      "learning_rate": 1e-05,
+      "loss": 0.0959,
+      "step": 139
+    },
+    {
+      "epoch": 1.1946666666666665,
+      "grad_norm": 0.10065236259356937,
+      "learning_rate": 1e-05,
+      "loss": 0.0957,
+      "step": 140
+    },
+    {
+      "epoch": 1.2032,
+      "grad_norm": 0.06472006342138571,
+      "learning_rate": 1e-05,
+      "loss": 0.0721,
+      "step": 141
+    },
+    {
+      "epoch": 1.2117333333333333,
+      "grad_norm": 0.08080002696086562,
+      "learning_rate": 1e-05,
+      "loss": 0.1073,
+      "step": 142
+    },
+    {
+      "epoch": 1.2202666666666666,
+      "grad_norm": 0.10400160039217118,
+      "learning_rate": 1e-05,
+      "loss": 0.1227,
+      "step": 143
+    },
+    {
+      "epoch": 1.2288000000000001,
+      "grad_norm": 0.08719509476650818,
+      "learning_rate": 1e-05,
+      "loss": 0.114,
+      "step": 144
+    },
+    {
+      "epoch": 1.2373333333333334,
+      "grad_norm": 0.08431635436674337,
+      "learning_rate": 1e-05,
+      "loss": 0.1303,
+      "step": 145
+    },
+    {
+      "epoch": 1.2458666666666667,
+      "grad_norm": 0.23947926607305503,
+      "learning_rate": 1e-05,
+      "loss": 0.1199,
+      "step": 146
+    },
+    {
+      "epoch": 1.2544,
+      "grad_norm": 0.08794721265212341,
+      "learning_rate": 1e-05,
+      "loss": 0.1094,
+      "step": 147
+    },
+    {
+      "epoch": 1.2629333333333332,
+      "grad_norm": 0.08063747277184712,
+      "learning_rate": 1e-05,
+      "loss": 0.1062,
+      "step": 148
+    },
+    {
+      "epoch": 1.2714666666666667,
+      "grad_norm": 0.06832693897193236,
+      "learning_rate": 1e-05,
+      "loss": 0.0842,
+      "step": 149
+    },
+    {
+      "epoch": 1.28,
+      "grad_norm": 0.07037053759395089,
+      "learning_rate": 1e-05,
+      "loss": 0.0971,
+      "step": 150
+    },
+    {
+      "epoch": 1.2885333333333333,
+      "grad_norm": 0.08753063334098339,
+      "learning_rate": 1e-05,
+      "loss": 0.085,
+      "step": 151
+    },
+    {
+      "epoch": 1.2970666666666666,
+      "grad_norm": 0.11381804369240754,
+      "learning_rate": 1e-05,
+      "loss": 0.1156,
+      "step": 152
+    },
+    {
+      "epoch": 1.3056,
+      "grad_norm": 0.07203805377255211,
+      "learning_rate": 1e-05,
+      "loss": 0.0951,
+      "step": 153
+    },
+    {
+      "epoch": 1.3141333333333334,
+      "grad_norm": 0.1156784206459358,
+      "learning_rate": 1e-05,
+      "loss": 0.1557,
+      "step": 154
+    },
+    {
+      "epoch": 1.3226666666666667,
+      "grad_norm": 0.11353874538174968,
+      "learning_rate": 1e-05,
+      "loss": 0.1284,
+      "step": 155
+    },
+    {
+      "epoch": 1.3312,
+      "grad_norm": 0.06675505890811795,
+      "learning_rate": 1e-05,
+      "loss": 0.089,
+      "step": 156
+    },
+    {
+      "epoch": 1.3397333333333332,
+      "grad_norm": 0.07642955477275162,
+      "learning_rate": 1e-05,
+      "loss": 0.0825,
+      "step": 157
+    },
+    {
+      "epoch": 1.3482666666666667,
+      "grad_norm": 0.07196529265355209,
+      "learning_rate": 1e-05,
+      "loss": 0.0885,
+      "step": 158
+    },
+    {
+      "epoch": 1.3568,
+      "grad_norm": 0.08651497112727735,
+      "learning_rate": 1e-05,
+      "loss": 0.0934,
+      "step": 159
+    },
+    {
+      "epoch": 1.3653333333333333,
+      "grad_norm": 0.07249320769144564,
+      "learning_rate": 1e-05,
+      "loss": 0.102,
+      "step": 160
+    },
+    {
+      "epoch": 1.3738666666666668,
+      "grad_norm": 0.08744246078973236,
+      "learning_rate": 1e-05,
+      "loss": 0.0905,
+      "step": 161
+    },
+    {
+      "epoch": 1.3824,
+      "grad_norm": 0.08657071789403122,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 162
+    },
+    {
+      "epoch": 1.3909333333333334,
+      "grad_norm": 0.1064187506686306,
+      "learning_rate": 1e-05,
+      "loss": 0.1163,
+      "step": 163
+    },
+    {
+      "epoch": 1.3994666666666666,
+      "grad_norm": 0.1280290421664948,
+      "learning_rate": 1e-05,
+      "loss": 0.1046,
+      "step": 164
+    },
+    {
+      "epoch": 1.408,
+      "grad_norm": 0.09937311183437203,
+      "learning_rate": 1e-05,
+      "loss": 0.1147,
+      "step": 165
+    },
+    {
+      "epoch": 1.4165333333333332,
+      "grad_norm": 0.08384493963149035,
+      "learning_rate": 1e-05,
+      "loss": 0.0837,
+      "step": 166
+    },
+    {
+      "epoch": 1.4250666666666667,
+      "grad_norm": 0.0878469941667546,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 167
+    },
+    {
+      "epoch": 1.4336,
+      "grad_norm": 0.08507656582015763,
+      "learning_rate": 1e-05,
+      "loss": 0.1124,
+      "step": 168
+    },
+    {
+      "epoch": 1.4421333333333333,
+      "grad_norm": 0.14341789007671765,
+      "learning_rate": 1e-05,
+      "loss": 0.1045,
+      "step": 169
+    },
+    {
+      "epoch": 1.4506666666666668,
+      "grad_norm": 0.11549200338103699,
+      "learning_rate": 1e-05,
+      "loss": 0.1192,
+      "step": 170
+    },
+    {
+      "epoch": 1.4592,
+      "grad_norm": 0.08297398102159202,
+      "learning_rate": 1e-05,
+      "loss": 0.106,
+      "step": 171
+    },
+    {
+      "epoch": 1.4677333333333333,
+      "grad_norm": 0.08511454300188333,
+      "learning_rate": 1e-05,
+      "loss": 0.1115,
+      "step": 172
+    },
+    {
+      "epoch": 1.4762666666666666,
+      "grad_norm": 0.06731733651614974,
+      "learning_rate": 1e-05,
+      "loss": 0.0579,
+      "step": 173
+    },
+    {
+      "epoch": 1.4848,
+      "grad_norm": 0.08522628039447024,
+      "learning_rate": 1e-05,
+      "loss": 0.0944,
+      "step": 174
+    },
+    {
+      "epoch": 1.4933333333333334,
+      "grad_norm": 0.08148851689521808,
+      "learning_rate": 1e-05,
+      "loss": 0.0946,
+      "step": 175
+    },
+    {
+      "epoch": 1.5018666666666667,
+      "grad_norm": 0.09314761246496046,
+      "learning_rate": 1e-05,
+      "loss": 0.1077,
+      "step": 176
+    },
+    {
+      "epoch": 1.5104,
+      "grad_norm": 0.08337943532869242,
+      "learning_rate": 1e-05,
+      "loss": 0.0919,
+      "step": 177
+    },
+    {
+      "epoch": 1.5189333333333335,
+      "grad_norm": 0.07936632915317685,
+      "learning_rate": 1e-05,
+      "loss": 0.0878,
+      "step": 178
+    },
+    {
+      "epoch": 1.5274666666666668,
+      "grad_norm": 0.10041567827499392,
+      "learning_rate": 1e-05,
+      "loss": 0.1164,
+      "step": 179
+    },
+    {
+      "epoch": 1.536,
+      "grad_norm": 0.08184099557308296,
+      "learning_rate": 1e-05,
+      "loss": 0.1143,
+      "step": 180
+    },
+    {
+      "epoch": 1.5445333333333333,
+      "grad_norm": 0.08722428613554693,
+      "learning_rate": 1e-05,
+      "loss": 0.1068,
+      "step": 181
+    },
+    {
+      "epoch": 1.5530666666666666,
+      "grad_norm": 0.08710953879234071,
+      "learning_rate": 1e-05,
+      "loss": 0.11,
+      "step": 182
+    },
+    {
+      "epoch": 1.5615999999999999,
+      "grad_norm": 0.08115450331732889,
+      "learning_rate": 1e-05,
+      "loss": 0.0877,
+      "step": 183
+    },
+    {
+      "epoch": 1.5701333333333334,
+      "grad_norm": 0.06955623887568685,
+      "learning_rate": 1e-05,
+      "loss": 0.0758,
+      "step": 184
+    },
+    {
+      "epoch": 1.5786666666666667,
+      "grad_norm": 0.11077420984396173,
+      "learning_rate": 1e-05,
+      "loss": 0.0886,
+      "step": 185
+    },
+    {
+      "epoch": 1.5872000000000002,
+      "grad_norm": 0.09248170156976404,
+      "learning_rate": 1e-05,
+      "loss": 0.1042,
+      "step": 186
+    },
+    {
+      "epoch": 1.5957333333333334,
+      "grad_norm": 0.0875865630501027,
+      "learning_rate": 1e-05,
+      "loss": 0.0956,
+      "step": 187
+    },
+    {
+      "epoch": 1.6042666666666667,
+      "grad_norm": 0.09025094284776364,
+      "learning_rate": 1e-05,
+      "loss": 0.0865,
+      "step": 188
+    },
+    {
+      "epoch": 1.6128,
+      "grad_norm": 0.09201435441623142,
+      "learning_rate": 1e-05,
+      "loss": 0.0848,
+      "step": 189
+    },
+    {
+      "epoch": 1.6213333333333333,
+      "grad_norm": 0.08582347653077456,
+      "learning_rate": 1e-05,
+      "loss": 0.0868,
+      "step": 190
+    },
+    {
+      "epoch": 1.6298666666666666,
+      "grad_norm": 0.08390294885002035,
+      "learning_rate": 1e-05,
+      "loss": 0.0883,
+      "step": 191
+    },
+    {
+      "epoch": 1.6383999999999999,
+      "grad_norm": 0.09484831369314428,
+      "learning_rate": 1e-05,
+      "loss": 0.0955,
+      "step": 192
+    },
+    {
+      "epoch": 1.6469333333333334,
+      "grad_norm": 0.08291745035821121,
+      "learning_rate": 1e-05,
+      "loss": 0.0943,
+      "step": 193
+    },
+    {
+      "epoch": 1.6554666666666666,
+      "grad_norm": 0.09788087284042751,
+      "learning_rate": 1e-05,
+      "loss": 0.1146,
+      "step": 194
+    },
+    {
+      "epoch": 1.6640000000000001,
+      "grad_norm": 0.09763113175653552,
+      "learning_rate": 1e-05,
+      "loss": 0.1028,
+      "step": 195
+    },
+    {
+      "epoch": 1.6725333333333334,
+      "grad_norm": 0.11617852408102547,
+      "learning_rate": 1e-05,
+      "loss": 0.1323,
+      "step": 196
+    },
+    {
+      "epoch": 1.6810666666666667,
+      "grad_norm": 0.12191871384850739,
+      "learning_rate": 1e-05,
+      "loss": 0.1395,
+      "step": 197
+    },
+    {
+      "epoch": 1.6896,
+      "grad_norm": 0.1359943408077879,
+      "learning_rate": 1e-05,
+      "loss": 0.1191,
+      "step": 198
+    },
+    {
+      "epoch": 1.6981333333333333,
+      "grad_norm": 0.12006029084078058,
+      "learning_rate": 1e-05,
+      "loss": 0.0983,
+      "step": 199
+    },
+    {
+      "epoch": 1.7066666666666666,
+      "grad_norm": 0.09668785600159001,
+      "learning_rate": 1e-05,
+      "loss": 0.0801,
+      "step": 200
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.856102926634451e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
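
Reviewer note: the `learning_rate` values logged above rise until step 100 and then stay at `1e-05`. The logged numbers are consistent with a logarithmic warmup followed by a constant rate. The sketch below checks that reading against a few entries copied from the log. The function name `log_warmup_lr` and the constants `PEAK_LR` and `WARMUP_STEPS` are assumptions inferred from the logged values, not settings read from the training configuration, which is not visible in this diff.

```python
import math

# Values copied from the log_history entries above (step -> learning_rate).
logged = {
    50: 8.494850021680093e-06,
    64: 9.030899869919434e-06,
    80: 9.515449934959717e-06,
    100: 9.999999999999999e-06,
    101: 1e-05,
    200: 1e-05,
}

PEAK_LR = 1e-5      # plateau value seen in the log
WARMUP_STEPS = 100  # assumption: inferred from where the plateau starts


def log_warmup_lr(step: int) -> float:
    """Hypothetical schedule: logarithmic warmup, then a constant rate."""
    if step >= WARMUP_STEPS:
        return PEAK_LR
    return PEAK_LR * math.log(step) / math.log(WARMUP_STEPS)


# Every sampled entry should match the hypothetical schedule closely.
for step, lr in logged.items():
    assert math.isclose(log_warmup_lr(step), lr, rel_tol=1e-9), step
```

If the check passes, the schedule is a plausible description of the logged values; it does not prove which scheduler implementation produced them.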

checkpoint-200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840
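
Reviewer note: the three lines above are a Git LFS pointer, not the binary itself. Each line is a key followed by a space and a value, and `size` is the byte length of the real file. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the diff above, without the leading '+' markers.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
size 6840
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```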

checkpoint-220/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-220/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
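
Reviewer note: with `r` = 16, `lora_alpha` = 32 and `use_rslora` = false, the low-rank update is scaled by `lora_alpha / r`, as in the standard LoRA formulation. The sketch below computes that factor from a trimmed copy of the config above. `lora_scaling` is a hypothetical helper for illustration; the rank-stabilized branch (`lora_alpha / sqrt(r)`) reflects my understanding of what `use_rslora` selects and is not exercised by this config.

```python
import json

# Trimmed copy of the adapter_config.json fields relevant to scaling.
adapter_config = json.loads("""
{
  "lora_alpha": 32,
  "r": 16,
  "use_rslora": false
}
""")


def lora_scaling(cfg: dict) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    r = cfg["r"]
    if cfg.get("use_rslora", False):
        # Rank-stabilized variant: divide by sqrt(r) instead of r.
        return cfg["lora_alpha"] / r ** 0.5
    return cfg["lora_alpha"] / r


scale = lora_scaling(adapter_config)
print(scale)
```

For this config the factor is 32 / 16, so the adapter's contribution is doubled relative to the raw product of the two low-rank matrices.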

checkpoint-220/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9ec77bfb6c769698443828a3062a3136b9ef241b243ec055816d10323a79be14
+size 1138856856

checkpoint-220/trainer_state.json ADDED

	@@ -0,0 +1,1573 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.8773333333333333,
+  "eval_steps": 500,
+  "global_step": 220,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 120.65595457746791,
+      "learning_rate": 6.611096473669596e-06,
+      "loss": 15.252,
+      "step": 21
+    },
+    {
+      "epoch": 0.18773333333333334,
+      "grad_norm": 133.05280466087515,
+      "learning_rate": 6.712113404111031e-06,
+      "loss": 14.1391,
+      "step": 22
+    },
+    {
+      "epoch": 0.19626666666666667,
+      "grad_norm": 127.95029628849048,
+      "learning_rate": 6.808639180087963e-06,
+      "loss": 12.9566,
+      "step": 23
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 108.83495245094748,
+      "learning_rate": 6.90105620855803e-06,
+      "loss": 11.8743,
+      "step": 24
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 99.90727146021455,
+      "learning_rate": 6.989700043360187e-06,
+      "loss": 10.962,
+      "step": 25
+    },
+    {
+      "epoch": 0.22186666666666666,
+      "grad_norm": 98.37126740059823,
+      "learning_rate": 7.074866739854089e-06,
+      "loss": 9.9919,
+      "step": 26
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 92.26708429201608,
+      "learning_rate": 7.156818820794936e-06,
+      "loss": 8.8811,
+      "step": 27
+    },
+    {
+      "epoch": 0.23893333333333333,
+      "grad_norm": 83.36099898839835,
+      "learning_rate": 7.235790156711096e-06,
+      "loss": 7.7806,
+      "step": 28
+    },
+    {
+      "epoch": 0.24746666666666667,
+      "grad_norm": 68.07500315598597,
+      "learning_rate": 7.3119899894947795e-06,
+      "loss": 7.0528,
+      "step": 29
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 69.58960332280246,
+      "learning_rate": 7.385606273598311e-06,
+      "loss": 6.3683,
+      "step": 30
+    },
+    {
+      "epoch": 0.26453333333333334,
+      "grad_norm": 68.77532204123075,
+      "learning_rate": 7.456808469171363e-06,
+      "loss": 6.1635,
+      "step": 31
+    },
+    {
+      "epoch": 0.2730666666666667,
+      "grad_norm": 66.29676636510072,
+      "learning_rate": 7.5257498915995295e-06,
+      "loss": 4.711,
+      "step": 32
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 42.87145091679237,
+      "learning_rate": 7.592569699389437e-06,
+      "loss": 4.5119,
+      "step": 33
+    },
+    {
+      "epoch": 0.29013333333333335,
+      "grad_norm": 26.2592350291551,
+      "learning_rate": 7.657394585211274e-06,
+      "loss": 4.31,
+      "step": 34
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 15.35959008067237,
+      "learning_rate": 7.720340221751376e-06,
+      "loss": 4.0001,
+      "step": 35
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 8.50847651865227,
+      "learning_rate": 7.781512503836437e-06,
+      "loss": 3.5723,
+      "step": 36
+    },
+    {
+      "epoch": 0.3157333333333333,
+      "grad_norm": 6.562581089063746,
+      "learning_rate": 7.841008620334974e-06,
+      "loss": 3.9254,
+      "step": 37
+    },
+    {
+      "epoch": 0.32426666666666665,
+      "grad_norm": 5.6145595722250095,
+      "learning_rate": 7.89891798308405e-06,
+      "loss": 3.8746,
+      "step": 38
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 5.385367220486204,
+      "learning_rate": 7.955323035132495e-06,
+      "loss": 3.8128,
+      "step": 39
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 5.403447124703616,
+      "learning_rate": 8.010299956639811e-06,
+      "loss": 3.885,
+      "step": 40
+    },
+    {
+      "epoch": 0.34986666666666666,
+      "grad_norm": 5.48242204895128,
+      "learning_rate": 8.063919283598677e-06,
+      "loss": 3.8048,
+      "step": 41
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 5.5525098950513865,
+      "learning_rate": 8.116246451989503e-06,
+      "loss": 3.7508,
+      "step": 42
+    },
+    {
+      "epoch": 0.36693333333333333,
+      "grad_norm": 5.354384520535484,
+      "learning_rate": 8.167342277897933e-06,
+      "loss": 3.5069,
+      "step": 43
+    },
+    {
+      "epoch": 0.37546666666666667,
+      "grad_norm": 5.46272338131107,
+      "learning_rate": 8.217263382430936e-06,
+      "loss": 3.6747,
+      "step": 44
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 4.798550688968453,
+      "learning_rate": 8.266062568876717e-06,
+      "loss": 3.1609,
+      "step": 45
+    },
+    {
+      "epoch": 0.39253333333333335,
+      "grad_norm": 5.755104452953421,
+      "learning_rate": 8.31378915840787e-06,
+      "loss": 3.5733,
+      "step": 46
+    },
+    {
+      "epoch": 0.4010666666666667,
+      "grad_norm": 4.618763611067563,
+      "learning_rate": 8.360489289678585e-06,
+      "loss": 2.9402,
+      "step": 47
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 5.506785974818791,
+      "learning_rate": 8.406206186877936e-06,
+      "loss": 3.382,
+      "step": 48
+    },
+    {
+      "epoch": 0.41813333333333336,
+      "grad_norm": 4.68603207809794,
+      "learning_rate": 8.450980400142568e-06,
+      "loss": 2.9918,
+      "step": 49
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 5.124033394817131,
+      "learning_rate": 8.494850021680093e-06,
+      "loss": 3.3202,
+      "step": 50
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 4.293001183481895,
+      "learning_rate": 8.537850880489681e-06,
+      "loss": 2.8519,
+      "step": 51
+    },
+    {
+      "epoch": 0.4437333333333333,
+      "grad_norm": 4.382596858902394,
+      "learning_rate": 8.580016718173996e-06,
+      "loss": 2.9683,
+      "step": 52
+    },
+    {
+      "epoch": 0.45226666666666665,
+      "grad_norm": 4.3176263388044696,
+      "learning_rate": 8.621379348003945e-06,
+      "loss": 2.9257,
+      "step": 53
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 4.5250022171605195,
+      "learning_rate": 8.661968799114844e-06,
+      "loss": 3.0556,
+      "step": 54
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 4.429424190600661,
+      "learning_rate": 8.701813447471218e-06,
+      "loss": 2.9513,
+      "step": 55
+    },
+    {
+      "epoch": 0.47786666666666666,
+      "grad_norm": 4.349652568052827,
+      "learning_rate": 8.740940135031001e-06,
+      "loss": 2.9029,
+      "step": 56
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 4.299227871435445,
+      "learning_rate": 8.779374278362457e-06,
+      "loss": 2.5989,
+      "step": 57
+    },
+    {
+      "epoch": 0.49493333333333334,
+      "grad_norm": 4.562461330302201,
+      "learning_rate": 8.817139967814684e-06,
+      "loss": 2.8158,
+      "step": 58
+    },
+    {
+      "epoch": 0.5034666666666666,
+      "grad_norm": 4.606987182758338,
+      "learning_rate": 8.854260058210721e-06,
+      "loss": 2.6272,
+      "step": 59
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 4.9420031522511545,
+      "learning_rate": 8.890756251918216e-06,
+      "loss": 2.5488,
+      "step": 60
+    },
+    {
+      "epoch": 0.5205333333333333,
+      "grad_norm": 4.706462297046012,
+      "learning_rate": 8.926649175053834e-06,
+      "loss": 2.3575,
+      "step": 61
+    },
+    {
+      "epoch": 0.5290666666666667,
+      "grad_norm": 4.862820204363494,
+      "learning_rate": 8.961958447491269e-06,
+      "loss": 2.2952,
+      "step": 62
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 4.911045913397774,
+      "learning_rate": 8.996702747267908e-06,
+      "loss": 2.1768,
+      "step": 63
+    },
+    {
+      "epoch": 0.5461333333333334,
+      "grad_norm": 5.46978680182973,
+      "learning_rate": 9.030899869919434e-06,
+      "loss": 2.2528,
+      "step": 64
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 5.847558397227374,
+      "learning_rate": 9.064566783214276e-06,
+      "loss": 2.2401,
+      "step": 65
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 5.984440656257,
+      "learning_rate": 9.097719677709343e-06,
+      "loss": 2.156,
+      "step": 66
+    },
+    {
+      "epoch": 0.5717333333333333,
+      "grad_norm": 6.146172189799918,
+      "learning_rate": 9.130374013504131e-06,
+      "loss": 2.0059,
+      "step": 67
+    },
+    {
+      "epoch": 0.5802666666666667,
+      "grad_norm": 5.725706778130614,
+      "learning_rate": 9.162544563531182e-06,
+      "loss": 1.7756,
+      "step": 68
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 6.479060263133115,
+      "learning_rate": 9.194245453686277e-06,
+      "loss": 1.7651,
+      "step": 69
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 7.319291050667066,
+      "learning_rate": 9.225490200071284e-06,
+      "loss": 1.7712,
+      "step": 70
+    },
+    {
+      "epoch": 0.6058666666666667,
+      "grad_norm": 6.913275412032087,
+      "learning_rate": 9.256291743595376e-06,
+      "loss": 1.709,
+      "step": 71
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 6.600657239614328,
+      "learning_rate": 9.28666248215634e-06,
+      "loss": 1.3731,
+      "step": 72
+    },
+    {
+      "epoch": 0.6229333333333333,
+      "grad_norm": 7.301483724647945,
+      "learning_rate": 9.316614300602277e-06,
+      "loss": 1.4166,
+      "step": 73
+    },
+    {
+      "epoch": 0.6314666666666666,
+      "grad_norm": 7.154933225265475,
+      "learning_rate": 9.346158598654881e-06,
+      "loss": 1.2797,
+      "step": 74
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 8.248472592538771,
+      "learning_rate": 9.375306316958499e-06,
+      "loss": 1.2082,
+      "step": 75
+    },
+    {
+      "epoch": 0.6485333333333333,
+      "grad_norm": 7.444479096112177,
+      "learning_rate": 9.404067961403957e-06,
+      "loss": 1.0402,
+      "step": 76
+    },
+    {
+      "epoch": 0.6570666666666667,
+      "grad_norm": 6.819760434594012,
+      "learning_rate": 9.432453625862409e-06,
+      "loss": 0.8244,
+      "step": 77
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 6.894760862855001,
+      "learning_rate": 9.460473013452401e-06,
+      "loss": 0.8345,
+      "step": 78
+    },
+    {
+      "epoch": 0.6741333333333334,
+      "grad_norm": 6.001848571839919,
+      "learning_rate": 9.488135456452207e-06,
+      "loss": 0.6839,
+      "step": 79
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 5.709147411501981,
+      "learning_rate": 9.515449934959717e-06,
+      "loss": 0.6567,
+      "step": 80
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 4.128977158730638,
+      "learning_rate": 9.542425094393249e-06,
+      "loss": 0.545,
+      "step": 81
+    },
+    {
+      "epoch": 0.6997333333333333,
+      "grad_norm": 2.604915806147427,
+      "learning_rate": 9.569069261918582e-06,
+      "loss": 0.4596,
+      "step": 82
+    },
+    {
+      "epoch": 0.7082666666666667,
+      "grad_norm": 2.039939253407506,
+      "learning_rate": 9.59539046188037e-06,
+      "loss": 0.452,
+      "step": 83
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0398988141415337,
+      "learning_rate": 9.621396430309407e-06,
+      "loss": 0.4538,
+      "step": 84
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 2.37589477950211,
+      "learning_rate": 9.647094628571464e-06,
+      "loss": 0.4505,
+      "step": 85
+    },
+    {
+      "epoch": 0.7338666666666667,
+      "grad_norm": 2.80580920047501,
+      "learning_rate": 9.672492256217837e-06,
+      "loss": 0.5284,
+      "step": 86
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 2.3687428819051197,
+      "learning_rate": 9.697596263093091e-06,
+      "loss": 0.4371,
+      "step": 87
+    },
+    {
+      "epoch": 0.7509333333333333,
+      "grad_norm": 1.6362502854757155,
+      "learning_rate": 9.722413360750844e-06,
+      "loss": 0.3652,
+      "step": 88
+    },
+    {
+      "epoch": 0.7594666666666666,
+      "grad_norm": 1.5360860168740427,
+      "learning_rate": 9.746950033224562e-06,
+      "loss": 0.3235,
+      "step": 89
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 1.7245475092642693,
+      "learning_rate": 9.771212547196623e-06,
+      "loss": 0.3072,
+      "step": 90
+    },
+    {
+      "epoch": 0.7765333333333333,
+      "grad_norm": 1.4493496982196852,
+      "learning_rate": 9.795206961605467e-06,
+      "loss": 0.2474,
+      "step": 91
+    },
+    {
+      "epoch": 0.7850666666666667,
+      "grad_norm": 1.1662262130552072,
+      "learning_rate": 9.818939136727777e-06,
+      "loss": 0.2684,
+      "step": 92
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.1727132215390659,
+      "learning_rate": 9.842414742769675e-06,
+      "loss": 0.3456,
+      "step": 93
+    },
+    {
+      "epoch": 0.8021333333333334,
+      "grad_norm": 0.8435059300379855,
+      "learning_rate": 9.865639267998493e-06,
+      "loss": 0.227,
+      "step": 94
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.8593375804730568,
+      "learning_rate": 9.888618026444238e-06,
+      "loss": 0.1985,
+      "step": 95
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.0673772841412472,
+      "learning_rate": 9.911356165197841e-06,
+      "loss": 0.3195,
+      "step": 96
+    },
+    {
+      "epoch": 0.8277333333333333,
+      "grad_norm": 0.9341285801648793,
+      "learning_rate": 9.933858671331224e-06,
+      "loss": 0.213,
+      "step": 97
+    },
+    {
+      "epoch": 0.8362666666666667,
+      "grad_norm": 0.7197728549764331,
+      "learning_rate": 9.956130378462474e-06,
+      "loss": 0.2067,
+      "step": 98
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.5655901060353195,
+      "learning_rate": 9.978175972987748e-06,
+      "loss": 0.1708,
+      "step": 99
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.4681745812066334,
+      "learning_rate": 9.999999999999999e-06,
+      "loss": 0.1983,
+      "step": 100
+    },
+    {
+      "epoch": 0.8618666666666667,
+      "grad_norm": 0.4488180280567293,
+      "learning_rate": 1e-05,
+      "loss": 0.1401,
+      "step": 101
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 0.43194512376224187,
+      "learning_rate": 1e-05,
+      "loss": 0.1097,
+      "step": 102
+    },
+    {
+      "epoch": 0.8789333333333333,
+      "grad_norm": 0.3754480982834532,
+      "learning_rate": 1e-05,
+      "loss": 0.1531,
+      "step": 103
+    },
+    {
+      "epoch": 0.8874666666666666,
+      "grad_norm": 0.34151633602448267,
+      "learning_rate": 1e-05,
+      "loss": 0.1685,
+      "step": 104
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.26356638458244175,
+      "learning_rate": 1e-05,
+      "loss": 0.1104,
+      "step": 105
+    },
+    {
+      "epoch": 0.9045333333333333,
+      "grad_norm": 0.27641004897246113,
+      "learning_rate": 1e-05,
+      "loss": 0.1589,
+      "step": 106
+    },
+    {
+      "epoch": 0.9130666666666667,
+      "grad_norm": 0.1639383504796773,
+      "learning_rate": 1e-05,
+      "loss": 0.1064,
+      "step": 107
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 0.24233145434818837,
+      "learning_rate": 1e-05,
+      "loss": 0.1385,
+      "step": 108
+    },
+    {
+      "epoch": 0.9301333333333334,
+      "grad_norm": 0.16015184210317215,
+      "learning_rate": 1e-05,
+      "loss": 0.121,
+      "step": 109
+    },
+    {
+      "epoch": 0.9386666666666666,
+      "grad_norm": 0.14931644417242712,
+      "learning_rate": 1e-05,
+      "loss": 0.1117,
+      "step": 110
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 0.15078311335939154,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 111
+    },
+    {
+      "epoch": 0.9557333333333333,
+      "grad_norm": 0.16714082761639734,
+      "learning_rate": 1e-05,
+      "loss": 0.115,
+      "step": 112
+    },
+    {
+      "epoch": 0.9642666666666667,
+      "grad_norm": 0.12479711996187942,
+      "learning_rate": 1e-05,
+      "loss": 0.1029,
+      "step": 113
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 0.14783351137940065,
+      "learning_rate": 1e-05,
+      "loss": 0.0987,
+      "step": 114
+    },
+    {
+      "epoch": 0.9813333333333333,
+      "grad_norm": 0.11311876630863582,
+      "learning_rate": 1e-05,
+      "loss": 0.0911,
+      "step": 115
+    },
+    {
+      "epoch": 0.9898666666666667,
+      "grad_norm": 0.1238329581090649,
+      "learning_rate": 1e-05,
+      "loss": 0.1095,
+      "step": 116
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 0.11117413394533605,
+      "learning_rate": 1e-05,
+      "loss": 0.0968,
+      "step": 117
+    },
+    {
+      "epoch": 1.0069333333333332,
+      "grad_norm": 0.09247708923706752,
+      "learning_rate": 1e-05,
+      "loss": 0.0985,
+      "step": 118
+    },
+    {
+      "epoch": 1.0154666666666667,
+      "grad_norm": 0.12028574166046906,
+      "learning_rate": 1e-05,
+      "loss": 0.1085,
+      "step": 119
+    },
+    {
+      "epoch": 1.024,
+      "grad_norm": 0.075460717991084,
+      "learning_rate": 1e-05,
+      "loss": 0.1007,
+      "step": 120
+    },
+    {
+      "epoch": 1.0325333333333333,
+      "grad_norm": 0.1930335796969662,
+      "learning_rate": 1e-05,
+      "loss": 0.1438,
+      "step": 121
+    },
+    {
+      "epoch": 1.0410666666666666,
+      "grad_norm": 0.11451251015868702,
+      "learning_rate": 1e-05,
+      "loss": 0.1365,
+      "step": 122
+    },
+    {
+      "epoch": 1.0496,
+      "grad_norm": 0.09360332240252384,
+      "learning_rate": 1e-05,
+      "loss": 0.1039,
+      "step": 123
+    },
+    {
+      "epoch": 1.0581333333333334,
+      "grad_norm": 0.13162505626586696,
+      "learning_rate": 1e-05,
+      "loss": 0.1132,
+      "step": 124
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 0.1329223725298499,
+      "learning_rate": 1e-05,
+      "loss": 0.1153,
+      "step": 125
+    },
+    {
+      "epoch": 1.0752,
+      "grad_norm": 0.09522360247894453,
+      "learning_rate": 1e-05,
+      "loss": 0.1264,
+      "step": 126
+    },
+    {
+      "epoch": 1.0837333333333334,
+      "grad_norm": 0.12467359977458509,
+      "learning_rate": 1e-05,
+      "loss": 0.0866,
+      "step": 127
+    },
+    {
+      "epoch": 1.0922666666666667,
+      "grad_norm": 0.08853379791954709,
+      "learning_rate": 1e-05,
+      "loss": 0.107,
+      "step": 128
+    },
+    {
+      "epoch": 1.1008,
+      "grad_norm": 0.16050358070185106,
+      "learning_rate": 1e-05,
+      "loss": 0.1134,
+      "step": 129
+    },
+    {
+      "epoch": 1.1093333333333333,
+      "grad_norm": 0.10331318962336627,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 130
+    },
+    {
+      "epoch": 1.1178666666666666,
+      "grad_norm": 0.08498886624952962,
+      "learning_rate": 1e-05,
+      "loss": 0.12,
+      "step": 131
+    },
+    {
+      "epoch": 1.1264,
+      "grad_norm": 0.09918910544874306,
+      "learning_rate": 1e-05,
+      "loss": 0.1173,
+      "step": 132
+    },
+    {
+      "epoch": 1.1349333333333333,
+      "grad_norm": 0.0751198135696547,
+      "learning_rate": 1e-05,
+      "loss": 0.0973,
+      "step": 133
+    },
+    {
+      "epoch": 1.1434666666666666,
+      "grad_norm": 0.07959218402066412,
+      "learning_rate": 1e-05,
+      "loss": 0.0992,
+      "step": 134
+    },
+    {
+      "epoch": 1.152,
+      "grad_norm": 0.14419628324779726,
+      "learning_rate": 1e-05,
+      "loss": 0.0856,
+      "step": 135
+    },
+    {
+      "epoch": 1.1605333333333334,
+      "grad_norm": 0.07894542967774888,
+      "learning_rate": 1e-05,
+      "loss": 0.1193,
+      "step": 136
+    },
+    {
+      "epoch": 1.1690666666666667,
+      "grad_norm": 0.08735606763938318,
+      "learning_rate": 1e-05,
+      "loss": 0.1061,
+      "step": 137
+    },
+    {
+      "epoch": 1.1776,
+      "grad_norm": 0.12344637986728384,
+      "learning_rate": 1e-05,
+      "loss": 0.1184,
+      "step": 138
+    },
+    {
+      "epoch": 1.1861333333333333,
+      "grad_norm": 0.07797745242316644,
+      "learning_rate": 1e-05,
+      "loss": 0.0959,
+      "step": 139
+    },
+    {
+      "epoch": 1.1946666666666665,
+      "grad_norm": 0.10065236259356937,
+      "learning_rate": 1e-05,
+      "loss": 0.0957,
+      "step": 140
+    },
+    {
+      "epoch": 1.2032,
+      "grad_norm": 0.06472006342138571,
+      "learning_rate": 1e-05,
+      "loss": 0.0721,
+      "step": 141
+    },
+    {
+      "epoch": 1.2117333333333333,
+      "grad_norm": 0.08080002696086562,
+      "learning_rate": 1e-05,
+      "loss": 0.1073,
+      "step": 142
+    },
+    {
+      "epoch": 1.2202666666666666,
+      "grad_norm": 0.10400160039217118,
+      "learning_rate": 1e-05,
+      "loss": 0.1227,
+      "step": 143
+    },
+    {
+      "epoch": 1.2288000000000001,
+      "grad_norm": 0.08719509476650818,
+      "learning_rate": 1e-05,
+      "loss": 0.114,
+      "step": 144
+    },
+    {
+      "epoch": 1.2373333333333334,
+      "grad_norm": 0.08431635436674337,
+      "learning_rate": 1e-05,
+      "loss": 0.1303,
+      "step": 145
+    },
+    {
+      "epoch": 1.2458666666666667,
+      "grad_norm": 0.23947926607305503,
+      "learning_rate": 1e-05,
+      "loss": 0.1199,
+      "step": 146
+    },
+    {
+      "epoch": 1.2544,
+      "grad_norm": 0.08794721265212341,
+      "learning_rate": 1e-05,
+      "loss": 0.1094,
+      "step": 147
+    },
+    {
+      "epoch": 1.2629333333333332,
+      "grad_norm": 0.08063747277184712,
+      "learning_rate": 1e-05,
+      "loss": 0.1062,
+      "step": 148
+    },
+    {
+      "epoch": 1.2714666666666667,
+      "grad_norm": 0.06832693897193236,
+      "learning_rate": 1e-05,
+      "loss": 0.0842,
+      "step": 149
+    },
+    {
+      "epoch": 1.28,
+      "grad_norm": 0.07037053759395089,
+      "learning_rate": 1e-05,
+      "loss": 0.0971,
+      "step": 150
+    },
+    {
+      "epoch": 1.2885333333333333,
+      "grad_norm": 0.08753063334098339,
+      "learning_rate": 1e-05,
+      "loss": 0.085,
+      "step": 151
+    },
+    {
+      "epoch": 1.2970666666666666,
+      "grad_norm": 0.11381804369240754,
+      "learning_rate": 1e-05,
+      "loss": 0.1156,
+      "step": 152
+    },
+    {
+      "epoch": 1.3056,
+      "grad_norm": 0.07203805377255211,
+      "learning_rate": 1e-05,
+      "loss": 0.0951,
+      "step": 153
+    },
+    {
+      "epoch": 1.3141333333333334,
+      "grad_norm": 0.1156784206459358,
+      "learning_rate": 1e-05,
+      "loss": 0.1557,
+      "step": 154
+    },
+    {
+      "epoch": 1.3226666666666667,
+      "grad_norm": 0.11353874538174968,
+      "learning_rate": 1e-05,
+      "loss": 0.1284,
+      "step": 155
+    },
+    {
+      "epoch": 1.3312,
+      "grad_norm": 0.06675505890811795,
+      "learning_rate": 1e-05,
+      "loss": 0.089,
+      "step": 156
+    },
+    {
+      "epoch": 1.3397333333333332,
+      "grad_norm": 0.07642955477275162,
+      "learning_rate": 1e-05,
+      "loss": 0.0825,
+      "step": 157
+    },
+    {
+      "epoch": 1.3482666666666667,
+      "grad_norm": 0.07196529265355209,
+      "learning_rate": 1e-05,
+      "loss": 0.0885,
+      "step": 158
+    },
+    {
+      "epoch": 1.3568,
+      "grad_norm": 0.08651497112727735,
+      "learning_rate": 1e-05,
+      "loss": 0.0934,
+      "step": 159
+    },
+    {
+      "epoch": 1.3653333333333333,
+      "grad_norm": 0.07249320769144564,
+      "learning_rate": 1e-05,
+      "loss": 0.102,
+      "step": 160
+    },
+    {
+      "epoch": 1.3738666666666668,
+      "grad_norm": 0.08744246078973236,
+      "learning_rate": 1e-05,
+      "loss": 0.0905,
+      "step": 161
+    },
+    {
+      "epoch": 1.3824,
+      "grad_norm": 0.08657071789403122,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 162
+    },
+    {
+      "epoch": 1.3909333333333334,
+      "grad_norm": 0.1064187506686306,
+      "learning_rate": 1e-05,
+      "loss": 0.1163,
+      "step": 163
+    },
+    {
+      "epoch": 1.3994666666666666,
+      "grad_norm": 0.1280290421664948,
+      "learning_rate": 1e-05,
+      "loss": 0.1046,
+      "step": 164
+    },
+    {
+      "epoch": 1.408,
+      "grad_norm": 0.09937311183437203,
+      "learning_rate": 1e-05,
+      "loss": 0.1147,
+      "step": 165
+    },
+    {
+      "epoch": 1.4165333333333332,
+      "grad_norm": 0.08384493963149035,
+      "learning_rate": 1e-05,
+      "loss": 0.0837,
+      "step": 166
+    },
+    {
+      "epoch": 1.4250666666666667,
+      "grad_norm": 0.0878469941667546,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 167
+    },
+    {
+      "epoch": 1.4336,
+      "grad_norm": 0.08507656582015763,
+      "learning_rate": 1e-05,
+      "loss": 0.1124,
+      "step": 168
+    },
+    {
+      "epoch": 1.4421333333333333,
+      "grad_norm": 0.14341789007671765,
+      "learning_rate": 1e-05,
+      "loss": 0.1045,
+      "step": 169
+    },
+    {
+      "epoch": 1.4506666666666668,
+      "grad_norm": 0.11549200338103699,
+      "learning_rate": 1e-05,
+      "loss": 0.1192,
+      "step": 170
+    },
+    {
+      "epoch": 1.4592,
+      "grad_norm": 0.08297398102159202,
+      "learning_rate": 1e-05,
+      "loss": 0.106,
+      "step": 171
+    },
+    {
+      "epoch": 1.4677333333333333,
+      "grad_norm": 0.08511454300188333,
+      "learning_rate": 1e-05,
+      "loss": 0.1115,
+      "step": 172
+    },
+    {
+      "epoch": 1.4762666666666666,
+      "grad_norm": 0.06731733651614974,
+      "learning_rate": 1e-05,
+      "loss": 0.0579,
+      "step": 173
+    },
+    {
+      "epoch": 1.4848,
+      "grad_norm": 0.08522628039447024,
+      "learning_rate": 1e-05,
+      "loss": 0.0944,
+      "step": 174
+    },
+    {
+      "epoch": 1.4933333333333334,
+      "grad_norm": 0.08148851689521808,
+      "learning_rate": 1e-05,
+      "loss": 0.0946,
+      "step": 175
+    },
+    {
+      "epoch": 1.5018666666666667,
+      "grad_norm": 0.09314761246496046,
+      "learning_rate": 1e-05,
+      "loss": 0.1077,
+      "step": 176
+    },
+    {
+      "epoch": 1.5104,
+      "grad_norm": 0.08337943532869242,
+      "learning_rate": 1e-05,
+      "loss": 0.0919,
+      "step": 177
+    },
+    {
+      "epoch": 1.5189333333333335,
+      "grad_norm": 0.07936632915317685,
+      "learning_rate": 1e-05,
+      "loss": 0.0878,
+      "step": 178
+    },
+    {
+      "epoch": 1.5274666666666668,
+      "grad_norm": 0.10041567827499392,
+      "learning_rate": 1e-05,
+      "loss": 0.1164,
+      "step": 179
+    },
+    {
+      "epoch": 1.536,
+      "grad_norm": 0.08184099557308296,
+      "learning_rate": 1e-05,
+      "loss": 0.1143,
+      "step": 180
+    },
+    {
+      "epoch": 1.5445333333333333,
+      "grad_norm": 0.08722428613554693,
+      "learning_rate": 1e-05,
+      "loss": 0.1068,
+      "step": 181
+    },
+    {
+      "epoch": 1.5530666666666666,
+      "grad_norm": 0.08710953879234071,
+      "learning_rate": 1e-05,
+      "loss": 0.11,
+      "step": 182
+    },
+    {
+      "epoch": 1.5615999999999999,
+      "grad_norm": 0.08115450331732889,
+      "learning_rate": 1e-05,
+      "loss": 0.0877,
+      "step": 183
+    },
+    {
+      "epoch": 1.5701333333333334,
+      "grad_norm": 0.06955623887568685,
+      "learning_rate": 1e-05,
+      "loss": 0.0758,
+      "step": 184
+    },
+    {
+      "epoch": 1.5786666666666667,
+      "grad_norm": 0.11077420984396173,
+      "learning_rate": 1e-05,
+      "loss": 0.0886,
+      "step": 185
+    },
+    {
+      "epoch": 1.5872000000000002,
+      "grad_norm": 0.09248170156976404,
+      "learning_rate": 1e-05,
+      "loss": 0.1042,
+      "step": 186
+    },
+    {
+      "epoch": 1.5957333333333334,
+      "grad_norm": 0.0875865630501027,
+      "learning_rate": 1e-05,
+      "loss": 0.0956,
+      "step": 187
+    },
+    {
+      "epoch": 1.6042666666666667,
+      "grad_norm": 0.09025094284776364,
+      "learning_rate": 1e-05,
+      "loss": 0.0865,
+      "step": 188
+    },
+    {
+      "epoch": 1.6128,
+      "grad_norm": 0.09201435441623142,
+      "learning_rate": 1e-05,
+      "loss": 0.0848,
+      "step": 189
+    },
+    {
+      "epoch": 1.6213333333333333,
+      "grad_norm": 0.08582347653077456,
+      "learning_rate": 1e-05,
+      "loss": 0.0868,
+      "step": 190
+    },
+    {
+      "epoch": 1.6298666666666666,
+      "grad_norm": 0.08390294885002035,
+      "learning_rate": 1e-05,
+      "loss": 0.0883,
+      "step": 191
+    },
+    {
+      "epoch": 1.6383999999999999,
+      "grad_norm": 0.09484831369314428,
+      "learning_rate": 1e-05,
+      "loss": 0.0955,
+      "step": 192
+    },
+    {
+      "epoch": 1.6469333333333334,
+      "grad_norm": 0.08291745035821121,
+      "learning_rate": 1e-05,
+      "loss": 0.0943,
+      "step": 193
+    },
+    {
+      "epoch": 1.6554666666666666,
+      "grad_norm": 0.09788087284042751,
+      "learning_rate": 1e-05,
+      "loss": 0.1146,
+      "step": 194
+    },
+    {
+      "epoch": 1.6640000000000001,
+      "grad_norm": 0.09763113175653552,
+      "learning_rate": 1e-05,
+      "loss": 0.1028,
+      "step": 195
+    },
+    {
+      "epoch": 1.6725333333333334,
+      "grad_norm": 0.11617852408102547,
+      "learning_rate": 1e-05,
+      "loss": 0.1323,
+      "step": 196
+    },
+    {
+      "epoch": 1.6810666666666667,
+      "grad_norm": 0.12191871384850739,
+      "learning_rate": 1e-05,
+      "loss": 0.1395,
+      "step": 197
+    },
+    {
+      "epoch": 1.6896,
+      "grad_norm": 0.1359943408077879,
+      "learning_rate": 1e-05,
+      "loss": 0.1191,
+      "step": 198
+    },
+    {
+      "epoch": 1.6981333333333333,
+      "grad_norm": 0.12006029084078058,
+      "learning_rate": 1e-05,
+      "loss": 0.0983,
+      "step": 199
+    },
+    {
+      "epoch": 1.7066666666666666,
+      "grad_norm": 0.09668785600159001,
+      "learning_rate": 1e-05,
+      "loss": 0.0801,
+      "step": 200
+    },
+    {
+      "epoch": 1.7151999999999998,
+      "grad_norm": 0.11929283034682205,
+      "learning_rate": 1e-05,
+      "loss": 0.1072,
+      "step": 201
+    },
+    {
+      "epoch": 1.7237333333333333,
+      "grad_norm": 0.09077598659108727,
+      "learning_rate": 1e-05,
+      "loss": 0.0835,
+      "step": 202
+    },
+    {
+      "epoch": 1.7322666666666666,
+      "grad_norm": 0.1315112247694008,
+      "learning_rate": 1e-05,
+      "loss": 0.1251,
+      "step": 203
+    },
+    {
+      "epoch": 1.7408000000000001,
+      "grad_norm": 0.10262675849503336,
+      "learning_rate": 1e-05,
+      "loss": 0.1102,
+      "step": 204
+    },
+    {
+      "epoch": 1.7493333333333334,
+      "grad_norm": 0.11679561974734426,
+      "learning_rate": 1e-05,
+      "loss": 0.0912,
+      "step": 205
+    },
+    {
+      "epoch": 1.7578666666666667,
+      "grad_norm": 0.12857201623167358,
+      "learning_rate": 1e-05,
+      "loss": 0.1108,
+      "step": 206
+    },
+    {
+      "epoch": 1.7664,
+      "grad_norm": 0.110417578370301,
+      "learning_rate": 1e-05,
+      "loss": 0.0713,
+      "step": 207
+    },
+    {
+      "epoch": 1.7749333333333333,
+      "grad_norm": 0.1206716016388202,
+      "learning_rate": 1e-05,
+      "loss": 0.099,
+      "step": 208
+    },
+    {
+      "epoch": 1.7834666666666665,
+      "grad_norm": 0.11690286401098868,
+      "learning_rate": 1e-05,
+      "loss": 0.1398,
+      "step": 209
+    },
+    {
+      "epoch": 1.792,
+      "grad_norm": 0.1087083638784744,
+      "learning_rate": 1e-05,
+      "loss": 0.1106,
+      "step": 210
+    },
+    {
+      "epoch": 1.8005333333333333,
+      "grad_norm": 0.13044092544075447,
+      "learning_rate": 1e-05,
+      "loss": 0.1298,
+      "step": 211
+    },
+    {
+      "epoch": 1.8090666666666668,
+      "grad_norm": 0.11125544216608903,
+      "learning_rate": 1e-05,
+      "loss": 0.0862,
+      "step": 212
+    },
+    {
+      "epoch": 1.8176,
+      "grad_norm": 0.15173848052348715,
+      "learning_rate": 1e-05,
+      "loss": 0.1116,
+      "step": 213
+    },
+    {
+      "epoch": 1.8261333333333334,
+      "grad_norm": 0.1300854070876123,
+      "learning_rate": 1e-05,
+      "loss": 0.0881,
+      "step": 214
+    },
+    {
+      "epoch": 1.8346666666666667,
+      "grad_norm": 0.12472742133557221,
+      "learning_rate": 1e-05,
+      "loss": 0.1199,
+      "step": 215
+    },
+    {
+      "epoch": 1.8432,
+      "grad_norm": 0.10311157164421082,
+      "learning_rate": 1e-05,
+      "loss": 0.0887,
+      "step": 216
+    },
+    {
+      "epoch": 1.8517333333333332,
+      "grad_norm": 0.13979969636076792,
+      "learning_rate": 1e-05,
+      "loss": 0.089,
+      "step": 217
+    },
+    {
+      "epoch": 1.8602666666666665,
+      "grad_norm": 0.1725935114282675,
+      "learning_rate": 1e-05,
+      "loss": 0.1232,
+      "step": 218
+    },
+    {
+      "epoch": 1.8688,
+      "grad_norm": 0.13035682714460442,
+      "learning_rate": 1e-05,
+      "loss": 0.0803,
+      "step": 219
+    },
+    {
+      "epoch": 1.8773333333333333,
+      "grad_norm": 0.11707794313507026,
+      "learning_rate": 1e-05,
+      "loss": 0.0947,
+      "step": 220
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.1374063827523994e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
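
As a reading aid for the `log_history` entries above, here is a minimal sketch of summarizing a parsed `trainer_state.json` in Python. It assumes only the layout visible in this diff (a top-level `log_history` list whose entries carry `step` and `loss`). The helper name `summarize_losses` and the inline sample are illustrative and not part of the commit.

```python
import json
from statistics import mean


def summarize_losses(state: dict, last_k: int = 3) -> dict:
    """Summarize training loss from a parsed trainer_state.json dict."""
    # Keep only entries that report a training loss; other entry types
    # (for example evaluation logs) may omit the "loss" key.
    entries = [e for e in state["log_history"] if "loss" in e]
    if not entries:
        raise ValueError("log_history contains no entries with a 'loss' key")
    tail = entries[-last_k:]
    return {
        "last_step": entries[-1]["step"],
        "last_loss": entries[-1]["loss"],
        "tail_mean_loss": mean(e["loss"] for e in tail),
    }


# Inline sample: step and loss values copied from the last three logged
# steps of checkpoint-220 shown above (other keys omitted for brevity).
sample = json.loads("""
{"log_history": [
  {"step": 218, "loss": 0.1232},
  {"step": 219, "loss": 0.0803},
  {"step": 220, "loss": 0.0947}
]}
""")

summary = summarize_losses(sample)
print(summary)
```

For a real checkpoint, the same function could be applied to `json.load(open(path))`, where `path` points at one of the `trainer_state.json` files listed in this commit.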

checkpoint-220/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840

checkpoint-240/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-240/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
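
The config above sets `r` to 16, `lora_alpha` to 32, and `use_rslora` to false. In LoRA the low-rank update is multiplied by a scaling factor of `lora_alpha / r`; rank-stabilized LoRA uses `lora_alpha / sqrt(r)` instead. The sketch below works that arithmetic out for these values. The function name `lora_scaling` is hypothetical and is not a PEFT API.

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    if r <= 0:
        raise ValueError("LoRA rank r must be positive")
    # Standard LoRA divides alpha by the rank; rank-stabilized LoRA
    # divides by the square root of the rank instead.
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


# Values copied from checkpoint-240/adapter_config.json above.
config = {"lora_alpha": 32, "r": 16, "use_rslora": False}

scaling = lora_scaling(config["lora_alpha"], config["r"], config["use_rslora"])
print(scaling)  # 2.0
```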

checkpoint-240/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5aac588465402888ebbdf86cf738925beff4117d2f5bd01e44b0bba300250fbd
+size 1138856856

checkpoint-240/trainer_state.json ADDED

	@@ -0,0 +1,1713 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.048,
+  "eval_steps": 500,
+  "global_step": 240,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.008533333333333334,
+      "grad_norm": 160.11701043689894,
+      "learning_rate": 0.0,
+      "loss": 32.4968,
+      "step": 1
+    },
+    {
+      "epoch": 0.017066666666666667,
+      "grad_norm": 157.24779534424323,
+      "learning_rate": 1.5051499783199057e-06,
+      "loss": 31.6979,
+      "step": 2
+    },
+    {
+      "epoch": 0.0256,
+      "grad_norm": 157.9465272449825,
+      "learning_rate": 2.385606273598312e-06,
+      "loss": 31.8828,
+      "step": 3
+    },
+    {
+      "epoch": 0.034133333333333335,
+      "grad_norm": 160.2154859965946,
+      "learning_rate": 3.0102999566398115e-06,
+      "loss": 31.9681,
+      "step": 4
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 158.5305446712084,
+      "learning_rate": 3.4948500216800934e-06,
+      "loss": 31.3717,
+      "step": 5
+    },
+    {
+      "epoch": 0.0512,
+      "grad_norm": 155.50243039700376,
+      "learning_rate": 3.890756251918218e-06,
+      "loss": 30.5348,
+      "step": 6
+    },
+    {
+      "epoch": 0.05973333333333333,
+      "grad_norm": 168.6887446693614,
+      "learning_rate": 4.225490200071284e-06,
+      "loss": 31.3845,
+      "step": 7
+    },
+    {
+      "epoch": 0.06826666666666667,
+      "grad_norm": 164.2631689450651,
+      "learning_rate": 4.515449934959717e-06,
+      "loss": 30.5243,
+      "step": 8
+    },
+    {
+      "epoch": 0.0768,
+      "grad_norm": 174.1878139573776,
+      "learning_rate": 4.771212547196624e-06,
+      "loss": 30.0138,
+      "step": 9
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 177.9519334680014,
+      "learning_rate": 4.9999999999999996e-06,
+      "loss": 29.6143,
+      "step": 10
+    },
+    {
+      "epoch": 0.09386666666666667,
+      "grad_norm": 183.57104380865735,
+      "learning_rate": 5.206963425791125e-06,
+      "loss": 28.8718,
+      "step": 11
+    },
+    {
+      "epoch": 0.1024,
+      "grad_norm": 186.4090344511231,
+      "learning_rate": 5.395906230238124e-06,
+      "loss": 26.1695,
+      "step": 12
+    },
+    {
+      "epoch": 0.11093333333333333,
+      "grad_norm": 198.17161320746723,
+      "learning_rate": 5.5697167615341825e-06,
+      "loss": 26.1266,
+      "step": 13
+    },
+    {
+      "epoch": 0.11946666666666667,
+      "grad_norm": 182.4443087115901,
+      "learning_rate": 5.730640178391189e-06,
+      "loss": 24.2121,
+      "step": 14
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 159.38105380659272,
+      "learning_rate": 5.880456295278406e-06,
+      "loss": 22.5796,
+      "step": 15
+    },
+    {
+      "epoch": 0.13653333333333334,
+      "grad_norm": 142.82387126501297,
+      "learning_rate": 6.020599913279623e-06,
+      "loss": 21.1346,
+      "step": 16
+    },
+    {
+      "epoch": 0.14506666666666668,
+      "grad_norm": 123.86394296641578,
+      "learning_rate": 6.15224460689137e-06,
+      "loss": 19.8457,
+      "step": 17
+    },
+    {
+      "epoch": 0.1536,
+      "grad_norm": 112.3988260336824,
+      "learning_rate": 6.276362525516529e-06,
+      "loss": 18.7824,
+      "step": 18
+    },
+    {
+      "epoch": 0.16213333333333332,
+      "grad_norm": 120.96712330991012,
+      "learning_rate": 6.393768004764144e-06,
+      "loss": 18.0207,
+      "step": 19
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 129.42692949353702,
+      "learning_rate": 6.505149978319905e-06,
+      "loss": 16.8355,
+      "step": 20
+    },
+    {
+      "epoch": 0.1792,
+      "grad_norm": 120.65595457746791,
+      "learning_rate": 6.611096473669596e-06,
+      "loss": 15.252,
+      "step": 21
+    },
+    {
+      "epoch": 0.18773333333333334,
+      "grad_norm": 133.05280466087515,
+      "learning_rate": 6.712113404111031e-06,
+      "loss": 14.1391,
+      "step": 22
+    },
+    {
+      "epoch": 0.19626666666666667,
+      "grad_norm": 127.95029628849048,
+      "learning_rate": 6.808639180087963e-06,
+      "loss": 12.9566,
+      "step": 23
+    },
+    {
+      "epoch": 0.2048,
+      "grad_norm": 108.83495245094748,
+      "learning_rate": 6.90105620855803e-06,
+      "loss": 11.8743,
+      "step": 24
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 99.90727146021455,
+      "learning_rate": 6.989700043360187e-06,
+      "loss": 10.962,
+      "step": 25
+    },
+    {
+      "epoch": 0.22186666666666666,
+      "grad_norm": 98.37126740059823,
+      "learning_rate": 7.074866739854089e-06,
+      "loss": 9.9919,
+      "step": 26
+    },
+    {
+      "epoch": 0.2304,
+      "grad_norm": 92.26708429201608,
+      "learning_rate": 7.156818820794936e-06,
+      "loss": 8.8811,
+      "step": 27
+    },
+    {
+      "epoch": 0.23893333333333333,
+      "grad_norm": 83.36099898839835,
+      "learning_rate": 7.235790156711096e-06,
+      "loss": 7.7806,
+      "step": 28
+    },
+    {
+      "epoch": 0.24746666666666667,
+      "grad_norm": 68.07500315598597,
+      "learning_rate": 7.3119899894947795e-06,
+      "loss": 7.0528,
+      "step": 29
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 69.58960332280246,
+      "learning_rate": 7.385606273598311e-06,
+      "loss": 6.3683,
+      "step": 30
+    },
+    {
+      "epoch": 0.26453333333333334,
+      "grad_norm": 68.77532204123075,
+      "learning_rate": 7.456808469171363e-06,
+      "loss": 6.1635,
+      "step": 31
+    },
+    {
+      "epoch": 0.2730666666666667,
+      "grad_norm": 66.29676636510072,
+      "learning_rate": 7.5257498915995295e-06,
+      "loss": 4.711,
+      "step": 32
+    },
+    {
+      "epoch": 0.2816,
+      "grad_norm": 42.87145091679237,
+      "learning_rate": 7.592569699389437e-06,
+      "loss": 4.5119,
+      "step": 33
+    },
+    {
+      "epoch": 0.29013333333333335,
+      "grad_norm": 26.2592350291551,
+      "learning_rate": 7.657394585211274e-06,
+      "loss": 4.31,
+      "step": 34
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 15.35959008067237,
+      "learning_rate": 7.720340221751376e-06,
+      "loss": 4.0001,
+      "step": 35
+    },
+    {
+      "epoch": 0.3072,
+      "grad_norm": 8.50847651865227,
+      "learning_rate": 7.781512503836437e-06,
+      "loss": 3.5723,
+      "step": 36
+    },
+    {
+      "epoch": 0.3157333333333333,
+      "grad_norm": 6.562581089063746,
+      "learning_rate": 7.841008620334974e-06,
+      "loss": 3.9254,
+      "step": 37
+    },
+    {
+      "epoch": 0.32426666666666665,
+      "grad_norm": 5.6145595722250095,
+      "learning_rate": 7.89891798308405e-06,
+      "loss": 3.8746,
+      "step": 38
+    },
+    {
+      "epoch": 0.3328,
+      "grad_norm": 5.385367220486204,
+      "learning_rate": 7.955323035132495e-06,
+      "loss": 3.8128,
+      "step": 39
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 5.403447124703616,
+      "learning_rate": 8.010299956639811e-06,
+      "loss": 3.885,
+      "step": 40
+    },
+    {
+      "epoch": 0.34986666666666666,
+      "grad_norm": 5.48242204895128,
+      "learning_rate": 8.063919283598677e-06,
+      "loss": 3.8048,
+      "step": 41
+    },
+    {
+      "epoch": 0.3584,
+      "grad_norm": 5.5525098950513865,
+      "learning_rate": 8.116246451989503e-06,
+      "loss": 3.7508,
+      "step": 42
+    },
+    {
+      "epoch": 0.36693333333333333,
+      "grad_norm": 5.354384520535484,
+      "learning_rate": 8.167342277897933e-06,
+      "loss": 3.5069,
+      "step": 43
+    },
+    {
+      "epoch": 0.37546666666666667,
+      "grad_norm": 5.46272338131107,
+      "learning_rate": 8.217263382430936e-06,
+      "loss": 3.6747,
+      "step": 44
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 4.798550688968453,
+      "learning_rate": 8.266062568876717e-06,
+      "loss": 3.1609,
+      "step": 45
+    },
+    {
+      "epoch": 0.39253333333333335,
+      "grad_norm": 5.755104452953421,
+      "learning_rate": 8.31378915840787e-06,
+      "loss": 3.5733,
+      "step": 46
+    },
+    {
+      "epoch": 0.4010666666666667,
+      "grad_norm": 4.618763611067563,
+      "learning_rate": 8.360489289678585e-06,
+      "loss": 2.9402,
+      "step": 47
+    },
+    {
+      "epoch": 0.4096,
+      "grad_norm": 5.506785974818791,
+      "learning_rate": 8.406206186877936e-06,
+      "loss": 3.382,
+      "step": 48
+    },
+    {
+      "epoch": 0.41813333333333336,
+      "grad_norm": 4.68603207809794,
+      "learning_rate": 8.450980400142568e-06,
+      "loss": 2.9918,
+      "step": 49
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 5.124033394817131,
+      "learning_rate": 8.494850021680093e-06,
+      "loss": 3.3202,
+      "step": 50
+    },
+    {
+      "epoch": 0.4352,
+      "grad_norm": 4.293001183481895,
+      "learning_rate": 8.537850880489681e-06,
+      "loss": 2.8519,
+      "step": 51
+    },
+    {
+      "epoch": 0.4437333333333333,
+      "grad_norm": 4.382596858902394,
+      "learning_rate": 8.580016718173996e-06,
+      "loss": 2.9683,
+      "step": 52
+    },
+    {
+      "epoch": 0.45226666666666665,
+      "grad_norm": 4.3176263388044696,
+      "learning_rate": 8.621379348003945e-06,
+      "loss": 2.9257,
+      "step": 53
+    },
+    {
+      "epoch": 0.4608,
+      "grad_norm": 4.5250022171605195,
+      "learning_rate": 8.661968799114844e-06,
+      "loss": 3.0556,
+      "step": 54
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 4.429424190600661,
+      "learning_rate": 8.701813447471218e-06,
+      "loss": 2.9513,
+      "step": 55
+    },
+    {
+      "epoch": 0.47786666666666666,
+      "grad_norm": 4.349652568052827,
+      "learning_rate": 8.740940135031001e-06,
+      "loss": 2.9029,
+      "step": 56
+    },
+    {
+      "epoch": 0.4864,
+      "grad_norm": 4.299227871435445,
+      "learning_rate": 8.779374278362457e-06,
+      "loss": 2.5989,
+      "step": 57
+    },
+    {
+      "epoch": 0.49493333333333334,
+      "grad_norm": 4.562461330302201,
+      "learning_rate": 8.817139967814684e-06,
+      "loss": 2.8158,
+      "step": 58
+    },
+    {
+      "epoch": 0.5034666666666666,
+      "grad_norm": 4.606987182758338,
+      "learning_rate": 8.854260058210721e-06,
+      "loss": 2.6272,
+      "step": 59
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 4.9420031522511545,
+      "learning_rate": 8.890756251918216e-06,
+      "loss": 2.5488,
+      "step": 60
+    },
+    {
+      "epoch": 0.5205333333333333,
+      "grad_norm": 4.706462297046012,
+      "learning_rate": 8.926649175053834e-06,
+      "loss": 2.3575,
+      "step": 61
+    },
+    {
+      "epoch": 0.5290666666666667,
+      "grad_norm": 4.862820204363494,
+      "learning_rate": 8.961958447491269e-06,
+      "loss": 2.2952,
+      "step": 62
+    },
+    {
+      "epoch": 0.5376,
+      "grad_norm": 4.911045913397774,
+      "learning_rate": 8.996702747267908e-06,
+      "loss": 2.1768,
+      "step": 63
+    },
+    {
+      "epoch": 0.5461333333333334,
+      "grad_norm": 5.46978680182973,
+      "learning_rate": 9.030899869919434e-06,
+      "loss": 2.2528,
+      "step": 64
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 5.847558397227374,
+      "learning_rate": 9.064566783214276e-06,
+      "loss": 2.2401,
+      "step": 65
+    },
+    {
+      "epoch": 0.5632,
+      "grad_norm": 5.984440656257,
+      "learning_rate": 9.097719677709343e-06,
+      "loss": 2.156,
+      "step": 66
+    },
+    {
+      "epoch": 0.5717333333333333,
+      "grad_norm": 6.146172189799918,
+      "learning_rate": 9.130374013504131e-06,
+      "loss": 2.0059,
+      "step": 67
+    },
+    {
+      "epoch": 0.5802666666666667,
+      "grad_norm": 5.725706778130614,
+      "learning_rate": 9.162544563531182e-06,
+      "loss": 1.7756,
+      "step": 68
+    },
+    {
+      "epoch": 0.5888,
+      "grad_norm": 6.479060263133115,
+      "learning_rate": 9.194245453686277e-06,
+      "loss": 1.7651,
+      "step": 69
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 7.319291050667066,
+      "learning_rate": 9.225490200071284e-06,
+      "loss": 1.7712,
+      "step": 70
+    },
+    {
+      "epoch": 0.6058666666666667,
+      "grad_norm": 6.913275412032087,
+      "learning_rate": 9.256291743595376e-06,
+      "loss": 1.709,
+      "step": 71
+    },
+    {
+      "epoch": 0.6144,
+      "grad_norm": 6.600657239614328,
+      "learning_rate": 9.28666248215634e-06,
+      "loss": 1.3731,
+      "step": 72
+    },
+    {
+      "epoch": 0.6229333333333333,
+      "grad_norm": 7.301483724647945,
+      "learning_rate": 9.316614300602277e-06,
+      "loss": 1.4166,
+      "step": 73
+    },
+    {
+      "epoch": 0.6314666666666666,
+      "grad_norm": 7.154933225265475,
+      "learning_rate": 9.346158598654881e-06,
+      "loss": 1.2797,
+      "step": 74
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 8.248472592538771,
+      "learning_rate": 9.375306316958499e-06,
+      "loss": 1.2082,
+      "step": 75
+    },
+    {
+      "epoch": 0.6485333333333333,
+      "grad_norm": 7.444479096112177,
+      "learning_rate": 9.404067961403957e-06,
+      "loss": 1.0402,
+      "step": 76
+    },
+    {
+      "epoch": 0.6570666666666667,
+      "grad_norm": 6.819760434594012,
+      "learning_rate": 9.432453625862409e-06,
+      "loss": 0.8244,
+      "step": 77
+    },
+    {
+      "epoch": 0.6656,
+      "grad_norm": 6.894760862855001,
+      "learning_rate": 9.460473013452401e-06,
+      "loss": 0.8345,
+      "step": 78
+    },
+    {
+      "epoch": 0.6741333333333334,
+      "grad_norm": 6.001848571839919,
+      "learning_rate": 9.488135456452207e-06,
+      "loss": 0.6839,
+      "step": 79
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 5.709147411501981,
+      "learning_rate": 9.515449934959717e-06,
+      "loss": 0.6567,
+      "step": 80
+    },
+    {
+      "epoch": 0.6912,
+      "grad_norm": 4.128977158730638,
+      "learning_rate": 9.542425094393249e-06,
+      "loss": 0.545,
+      "step": 81
+    },
+    {
+      "epoch": 0.6997333333333333,
+      "grad_norm": 2.604915806147427,
+      "learning_rate": 9.569069261918582e-06,
+      "loss": 0.4596,
+      "step": 82
+    },
+    {
+      "epoch": 0.7082666666666667,
+      "grad_norm": 2.039939253407506,
+      "learning_rate": 9.59539046188037e-06,
+      "loss": 0.452,
+      "step": 83
+    },
+    {
+      "epoch": 0.7168,
+      "grad_norm": 2.0398988141415337,
+      "learning_rate": 9.621396430309407e-06,
+      "loss": 0.4538,
+      "step": 84
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 2.37589477950211,
+      "learning_rate": 9.647094628571464e-06,
+      "loss": 0.4505,
+      "step": 85
+    },
+    {
+      "epoch": 0.7338666666666667,
+      "grad_norm": 2.80580920047501,
+      "learning_rate": 9.672492256217837e-06,
+      "loss": 0.5284,
+      "step": 86
+    },
+    {
+      "epoch": 0.7424,
+      "grad_norm": 2.3687428819051197,
+      "learning_rate": 9.697596263093091e-06,
+      "loss": 0.4371,
+      "step": 87
+    },
+    {
+      "epoch": 0.7509333333333333,
+      "grad_norm": 1.6362502854757155,
+      "learning_rate": 9.722413360750844e-06,
+      "loss": 0.3652,
+      "step": 88
+    },
+    {
+      "epoch": 0.7594666666666666,
+      "grad_norm": 1.5360860168740427,
+      "learning_rate": 9.746950033224562e-06,
+      "loss": 0.3235,
+      "step": 89
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 1.7245475092642693,
+      "learning_rate": 9.771212547196623e-06,
+      "loss": 0.3072,
+      "step": 90
+    },
+    {
+      "epoch": 0.7765333333333333,
+      "grad_norm": 1.4493496982196852,
+      "learning_rate": 9.795206961605467e-06,
+      "loss": 0.2474,
+      "step": 91
+    },
+    {
+      "epoch": 0.7850666666666667,
+      "grad_norm": 1.1662262130552072,
+      "learning_rate": 9.818939136727777e-06,
+      "loss": 0.2684,
+      "step": 92
+    },
+    {
+      "epoch": 0.7936,
+      "grad_norm": 1.1727132215390659,
+      "learning_rate": 9.842414742769675e-06,
+      "loss": 0.3456,
+      "step": 93
+    },
+    {
+      "epoch": 0.8021333333333334,
+      "grad_norm": 0.8435059300379855,
+      "learning_rate": 9.865639267998493e-06,
+      "loss": 0.227,
+      "step": 94
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.8593375804730568,
+      "learning_rate": 9.888618026444238e-06,
+      "loss": 0.1985,
+      "step": 95
+    },
+    {
+      "epoch": 0.8192,
+      "grad_norm": 1.0673772841412472,
+      "learning_rate": 9.911356165197841e-06,
+      "loss": 0.3195,
+      "step": 96
+    },
+    {
+      "epoch": 0.8277333333333333,
+      "grad_norm": 0.9341285801648793,
+      "learning_rate": 9.933858671331224e-06,
+      "loss": 0.213,
+      "step": 97
+    },
+    {
+      "epoch": 0.8362666666666667,
+      "grad_norm": 0.7197728549764331,
+      "learning_rate": 9.956130378462474e-06,
+      "loss": 0.2067,
+      "step": 98
+    },
+    {
+      "epoch": 0.8448,
+      "grad_norm": 0.5655901060353195,
+      "learning_rate": 9.978175972987748e-06,
+      "loss": 0.1708,
+      "step": 99
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.4681745812066334,
+      "learning_rate": 9.999999999999999e-06,
+      "loss": 0.1983,
+      "step": 100
+    },
+    {
+      "epoch": 0.8618666666666667,
+      "grad_norm": 0.4488180280567293,
+      "learning_rate": 1e-05,
+      "loss": 0.1401,
+      "step": 101
+    },
+    {
+      "epoch": 0.8704,
+      "grad_norm": 0.43194512376224187,
+      "learning_rate": 1e-05,
+      "loss": 0.1097,
+      "step": 102
+    },
+    {
+      "epoch": 0.8789333333333333,
+      "grad_norm": 0.3754480982834532,
+      "learning_rate": 1e-05,
+      "loss": 0.1531,
+      "step": 103
+    },
+    {
+      "epoch": 0.8874666666666666,
+      "grad_norm": 0.34151633602448267,
+      "learning_rate": 1e-05,
+      "loss": 0.1685,
+      "step": 104
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.26356638458244175,
+      "learning_rate": 1e-05,
+      "loss": 0.1104,
+      "step": 105
+    },
+    {
+      "epoch": 0.9045333333333333,
+      "grad_norm": 0.27641004897246113,
+      "learning_rate": 1e-05,
+      "loss": 0.1589,
+      "step": 106
+    },
+    {
+      "epoch": 0.9130666666666667,
+      "grad_norm": 0.1639383504796773,
+      "learning_rate": 1e-05,
+      "loss": 0.1064,
+      "step": 107
+    },
+    {
+      "epoch": 0.9216,
+      "grad_norm": 0.24233145434818837,
+      "learning_rate": 1e-05,
+      "loss": 0.1385,
+      "step": 108
+    },
+    {
+      "epoch": 0.9301333333333334,
+      "grad_norm": 0.16015184210317215,
+      "learning_rate": 1e-05,
+      "loss": 0.121,
+      "step": 109
+    },
+    {
+      "epoch": 0.9386666666666666,
+      "grad_norm": 0.14931644417242712,
+      "learning_rate": 1e-05,
+      "loss": 0.1117,
+      "step": 110
+    },
+    {
+      "epoch": 0.9472,
+      "grad_norm": 0.15078311335939154,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 111
+    },
+    {
+      "epoch": 0.9557333333333333,
+      "grad_norm": 0.16714082761639734,
+      "learning_rate": 1e-05,
+      "loss": 0.115,
+      "step": 112
+    },
+    {
+      "epoch": 0.9642666666666667,
+      "grad_norm": 0.12479711996187942,
+      "learning_rate": 1e-05,
+      "loss": 0.1029,
+      "step": 113
+    },
+    {
+      "epoch": 0.9728,
+      "grad_norm": 0.14783351137940065,
+      "learning_rate": 1e-05,
+      "loss": 0.0987,
+      "step": 114
+    },
+    {
+      "epoch": 0.9813333333333333,
+      "grad_norm": 0.11311876630863582,
+      "learning_rate": 1e-05,
+      "loss": 0.0911,
+      "step": 115
+    },
+    {
+      "epoch": 0.9898666666666667,
+      "grad_norm": 0.1238329581090649,
+      "learning_rate": 1e-05,
+      "loss": 0.1095,
+      "step": 116
+    },
+    {
+      "epoch": 0.9984,
+      "grad_norm": 0.11117413394533605,
+      "learning_rate": 1e-05,
+      "loss": 0.0968,
+      "step": 117
+    },
+    {
+      "epoch": 1.0069333333333332,
+      "grad_norm": 0.09247708923706752,
+      "learning_rate": 1e-05,
+      "loss": 0.0985,
+      "step": 118
+    },
+    {
+      "epoch": 1.0154666666666667,
+      "grad_norm": 0.12028574166046906,
+      "learning_rate": 1e-05,
+      "loss": 0.1085,
+      "step": 119
+    },
+    {
+      "epoch": 1.024,
+      "grad_norm": 0.075460717991084,
+      "learning_rate": 1e-05,
+      "loss": 0.1007,
+      "step": 120
+    },
+    {
+      "epoch": 1.0325333333333333,
+      "grad_norm": 0.1930335796969662,
+      "learning_rate": 1e-05,
+      "loss": 0.1438,
+      "step": 121
+    },
+    {
+      "epoch": 1.0410666666666666,
+      "grad_norm": 0.11451251015868702,
+      "learning_rate": 1e-05,
+      "loss": 0.1365,
+      "step": 122
+    },
+    {
+      "epoch": 1.0496,
+      "grad_norm": 0.09360332240252384,
+      "learning_rate": 1e-05,
+      "loss": 0.1039,
+      "step": 123
+    },
+    {
+      "epoch": 1.0581333333333334,
+      "grad_norm": 0.13162505626586696,
+      "learning_rate": 1e-05,
+      "loss": 0.1132,
+      "step": 124
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 0.1329223725298499,
+      "learning_rate": 1e-05,
+      "loss": 0.1153,
+      "step": 125
+    },
+    {
+      "epoch": 1.0752,
+      "grad_norm": 0.09522360247894453,
+      "learning_rate": 1e-05,
+      "loss": 0.1264,
+      "step": 126
+    },
+    {
+      "epoch": 1.0837333333333334,
+      "grad_norm": 0.12467359977458509,
+      "learning_rate": 1e-05,
+      "loss": 0.0866,
+      "step": 127
+    },
+    {
+      "epoch": 1.0922666666666667,
+      "grad_norm": 0.08853379791954709,
+      "learning_rate": 1e-05,
+      "loss": 0.107,
+      "step": 128
+    },
+    {
+      "epoch": 1.1008,
+      "grad_norm": 0.16050358070185106,
+      "learning_rate": 1e-05,
+      "loss": 0.1134,
+      "step": 129
+    },
+    {
+      "epoch": 1.1093333333333333,
+      "grad_norm": 0.10331318962336627,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 130
+    },
+    {
+      "epoch": 1.1178666666666666,
+      "grad_norm": 0.08498886624952962,
+      "learning_rate": 1e-05,
+      "loss": 0.12,
+      "step": 131
+    },
+    {
+      "epoch": 1.1264,
+      "grad_norm": 0.09918910544874306,
+      "learning_rate": 1e-05,
+      "loss": 0.1173,
+      "step": 132
+    },
+    {
+      "epoch": 1.1349333333333333,
+      "grad_norm": 0.0751198135696547,
+      "learning_rate": 1e-05,
+      "loss": 0.0973,
+      "step": 133
+    },
+    {
+      "epoch": 1.1434666666666666,
+      "grad_norm": 0.07959218402066412,
+      "learning_rate": 1e-05,
+      "loss": 0.0992,
+      "step": 134
+    },
+    {
+      "epoch": 1.152,
+      "grad_norm": 0.14419628324779726,
+      "learning_rate": 1e-05,
+      "loss": 0.0856,
+      "step": 135
+    },
+    {
+      "epoch": 1.1605333333333334,
+      "grad_norm": 0.07894542967774888,
+      "learning_rate": 1e-05,
+      "loss": 0.1193,
+      "step": 136
+    },
+    {
+      "epoch": 1.1690666666666667,
+      "grad_norm": 0.08735606763938318,
+      "learning_rate": 1e-05,
+      "loss": 0.1061,
+      "step": 137
+    },
+    {
+      "epoch": 1.1776,
+      "grad_norm": 0.12344637986728384,
+      "learning_rate": 1e-05,
+      "loss": 0.1184,
+      "step": 138
+    },
+    {
+      "epoch": 1.1861333333333333,
+      "grad_norm": 0.07797745242316644,
+      "learning_rate": 1e-05,
+      "loss": 0.0959,
+      "step": 139
+    },
+    {
+      "epoch": 1.1946666666666665,
+      "grad_norm": 0.10065236259356937,
+      "learning_rate": 1e-05,
+      "loss": 0.0957,
+      "step": 140
+    },
+    {
+      "epoch": 1.2032,
+      "grad_norm": 0.06472006342138571,
+      "learning_rate": 1e-05,
+      "loss": 0.0721,
+      "step": 141
+    },
+    {
+      "epoch": 1.2117333333333333,
+      "grad_norm": 0.08080002696086562,
+      "learning_rate": 1e-05,
+      "loss": 0.1073,
+      "step": 142
+    },
+    {
+      "epoch": 1.2202666666666666,
+      "grad_norm": 0.10400160039217118,
+      "learning_rate": 1e-05,
+      "loss": 0.1227,
+      "step": 143
+    },
+    {
+      "epoch": 1.2288000000000001,
+      "grad_norm": 0.08719509476650818,
+      "learning_rate": 1e-05,
+      "loss": 0.114,
+      "step": 144
+    },
+    {
+      "epoch": 1.2373333333333334,
+      "grad_norm": 0.08431635436674337,
+      "learning_rate": 1e-05,
+      "loss": 0.1303,
+      "step": 145
+    },
+    {
+      "epoch": 1.2458666666666667,
+      "grad_norm": 0.23947926607305503,
+      "learning_rate": 1e-05,
+      "loss": 0.1199,
+      "step": 146
+    },
+    {
+      "epoch": 1.2544,
+      "grad_norm": 0.08794721265212341,
+      "learning_rate": 1e-05,
+      "loss": 0.1094,
+      "step": 147
+    },
+    {
+      "epoch": 1.2629333333333332,
+      "grad_norm": 0.08063747277184712,
+      "learning_rate": 1e-05,
+      "loss": 0.1062,
+      "step": 148
+    },
+    {
+      "epoch": 1.2714666666666667,
+      "grad_norm": 0.06832693897193236,
+      "learning_rate": 1e-05,
+      "loss": 0.0842,
+      "step": 149
+    },
+    {
+      "epoch": 1.28,
+      "grad_norm": 0.07037053759395089,
+      "learning_rate": 1e-05,
+      "loss": 0.0971,
+      "step": 150
+    },
+    {
+      "epoch": 1.2885333333333333,
+      "grad_norm": 0.08753063334098339,
+      "learning_rate": 1e-05,
+      "loss": 0.085,
+      "step": 151
+    },
+    {
+      "epoch": 1.2970666666666666,
+      "grad_norm": 0.11381804369240754,
+      "learning_rate": 1e-05,
+      "loss": 0.1156,
+      "step": 152
+    },
+    {
+      "epoch": 1.3056,
+      "grad_norm": 0.07203805377255211,
+      "learning_rate": 1e-05,
+      "loss": 0.0951,
+      "step": 153
+    },
+    {
+      "epoch": 1.3141333333333334,
+      "grad_norm": 0.1156784206459358,
+      "learning_rate": 1e-05,
+      "loss": 0.1557,
+      "step": 154
+    },
+    {
+      "epoch": 1.3226666666666667,
+      "grad_norm": 0.11353874538174968,
+      "learning_rate": 1e-05,
+      "loss": 0.1284,
+      "step": 155
+    },
+    {
+      "epoch": 1.3312,
+      "grad_norm": 0.06675505890811795,
+      "learning_rate": 1e-05,
+      "loss": 0.089,
+      "step": 156
+    },
+    {
+      "epoch": 1.3397333333333332,
+      "grad_norm": 0.07642955477275162,
+      "learning_rate": 1e-05,
+      "loss": 0.0825,
+      "step": 157
+    },
+    {
+      "epoch": 1.3482666666666667,
+      "grad_norm": 0.07196529265355209,
+      "learning_rate": 1e-05,
+      "loss": 0.0885,
+      "step": 158
+    },
+    {
+      "epoch": 1.3568,
+      "grad_norm": 0.08651497112727735,
+      "learning_rate": 1e-05,
+      "loss": 0.0934,
+      "step": 159
+    },
+    {
+      "epoch": 1.3653333333333333,
+      "grad_norm": 0.07249320769144564,
+      "learning_rate": 1e-05,
+      "loss": 0.102,
+      "step": 160
+    },
+    {
+      "epoch": 1.3738666666666668,
+      "grad_norm": 0.08744246078973236,
+      "learning_rate": 1e-05,
+      "loss": 0.0905,
+      "step": 161
+    },
+    {
+      "epoch": 1.3824,
+      "grad_norm": 0.08657071789403122,
+      "learning_rate": 1e-05,
+      "loss": 0.1217,
+      "step": 162
+    },
+    {
+      "epoch": 1.3909333333333334,
+      "grad_norm": 0.1064187506686306,
+      "learning_rate": 1e-05,
+      "loss": 0.1163,
+      "step": 163
+    },
+    {
+      "epoch": 1.3994666666666666,
+      "grad_norm": 0.1280290421664948,
+      "learning_rate": 1e-05,
+      "loss": 0.1046,
+      "step": 164
+    },
+    {
+      "epoch": 1.408,
+      "grad_norm": 0.09937311183437203,
+      "learning_rate": 1e-05,
+      "loss": 0.1147,
+      "step": 165
+    },
+    {
+      "epoch": 1.4165333333333332,
+      "grad_norm": 0.08384493963149035,
+      "learning_rate": 1e-05,
+      "loss": 0.0837,
+      "step": 166
+    },
+    {
+      "epoch": 1.4250666666666667,
+      "grad_norm": 0.0878469941667546,
+      "learning_rate": 1e-05,
+      "loss": 0.1034,
+      "step": 167
+    },
+    {
+      "epoch": 1.4336,
+      "grad_norm": 0.08507656582015763,
+      "learning_rate": 1e-05,
+      "loss": 0.1124,
+      "step": 168
+    },
+    {
+      "epoch": 1.4421333333333333,
+      "grad_norm": 0.14341789007671765,
+      "learning_rate": 1e-05,
+      "loss": 0.1045,
+      "step": 169
+    },
+    {
+      "epoch": 1.4506666666666668,
+      "grad_norm": 0.11549200338103699,
+      "learning_rate": 1e-05,
+      "loss": 0.1192,
+      "step": 170
+    },
+    {
+      "epoch": 1.4592,
+      "grad_norm": 0.08297398102159202,
+      "learning_rate": 1e-05,
+      "loss": 0.106,
+      "step": 171
+    },
+    {
+      "epoch": 1.4677333333333333,
+      "grad_norm": 0.08511454300188333,
+      "learning_rate": 1e-05,
+      "loss": 0.1115,
+      "step": 172
+    },
+    {
+      "epoch": 1.4762666666666666,
+      "grad_norm": 0.06731733651614974,
+      "learning_rate": 1e-05,
+      "loss": 0.0579,
+      "step": 173
+    },
+    {
+      "epoch": 1.4848,
+      "grad_norm": 0.08522628039447024,
+      "learning_rate": 1e-05,
+      "loss": 0.0944,
+      "step": 174
+    },
+    {
+      "epoch": 1.4933333333333334,
+      "grad_norm": 0.08148851689521808,
+      "learning_rate": 1e-05,
+      "loss": 0.0946,
+      "step": 175
+    },
+    {
+      "epoch": 1.5018666666666667,
+      "grad_norm": 0.09314761246496046,
+      "learning_rate": 1e-05,
+      "loss": 0.1077,
+      "step": 176
+    },
+    {
+      "epoch": 1.5104,
+      "grad_norm": 0.08337943532869242,
+      "learning_rate": 1e-05,
+      "loss": 0.0919,
+      "step": 177
+    },
+    {
+      "epoch": 1.5189333333333335,
+      "grad_norm": 0.07936632915317685,
+      "learning_rate": 1e-05,
+      "loss": 0.0878,
+      "step": 178
+    },
+    {
+      "epoch": 1.5274666666666668,
+      "grad_norm": 0.10041567827499392,
+      "learning_rate": 1e-05,
+      "loss": 0.1164,
+      "step": 179
+    },
+    {
+      "epoch": 1.536,
+      "grad_norm": 0.08184099557308296,
+      "learning_rate": 1e-05,
+      "loss": 0.1143,
+      "step": 180
+    },
+    {
+      "epoch": 1.5445333333333333,
+      "grad_norm": 0.08722428613554693,
+      "learning_rate": 1e-05,
+      "loss": 0.1068,
+      "step": 181
+    },
+    {
+      "epoch": 1.5530666666666666,
+      "grad_norm": 0.08710953879234071,
+      "learning_rate": 1e-05,
+      "loss": 0.11,
+      "step": 182
+    },
+    {
+      "epoch": 1.5615999999999999,
+      "grad_norm": 0.08115450331732889,
+      "learning_rate": 1e-05,
+      "loss": 0.0877,
+      "step": 183
+    },
+    {
+      "epoch": 1.5701333333333334,
+      "grad_norm": 0.06955623887568685,
+      "learning_rate": 1e-05,
+      "loss": 0.0758,
+      "step": 184
+    },
+    {
+      "epoch": 1.5786666666666667,
+      "grad_norm": 0.11077420984396173,
+      "learning_rate": 1e-05,
+      "loss": 0.0886,
+      "step": 185
+    },
+    {
+      "epoch": 1.5872000000000002,
+      "grad_norm": 0.09248170156976404,
+      "learning_rate": 1e-05,
+      "loss": 0.1042,
+      "step": 186
+    },
+    {
+      "epoch": 1.5957333333333334,
+      "grad_norm": 0.0875865630501027,
+      "learning_rate": 1e-05,
+      "loss": 0.0956,
+      "step": 187
+    },
+    {
+      "epoch": 1.6042666666666667,
+      "grad_norm": 0.09025094284776364,
+      "learning_rate": 1e-05,
+      "loss": 0.0865,
+      "step": 188
+    },
+    {
+      "epoch": 1.6128,
+      "grad_norm": 0.09201435441623142,
+      "learning_rate": 1e-05,
+      "loss": 0.0848,
+      "step": 189
+    },
+    {
+      "epoch": 1.6213333333333333,
+      "grad_norm": 0.08582347653077456,
+      "learning_rate": 1e-05,
+      "loss": 0.0868,
+      "step": 190
+    },
+    {
+      "epoch": 1.6298666666666666,
+      "grad_norm": 0.08390294885002035,
+      "learning_rate": 1e-05,
+      "loss": 0.0883,
+      "step": 191
+    },
+    {
+      "epoch": 1.6383999999999999,
+      "grad_norm": 0.09484831369314428,
+      "learning_rate": 1e-05,
+      "loss": 0.0955,
+      "step": 192
+    },
+    {
+      "epoch": 1.6469333333333334,
+      "grad_norm": 0.08291745035821121,
+      "learning_rate": 1e-05,
+      "loss": 0.0943,
+      "step": 193
+    },
+    {
+      "epoch": 1.6554666666666666,
+      "grad_norm": 0.09788087284042751,
+      "learning_rate": 1e-05,
+      "loss": 0.1146,
+      "step": 194
+    },
+    {
+      "epoch": 1.6640000000000001,
+      "grad_norm": 0.09763113175653552,
+      "learning_rate": 1e-05,
+      "loss": 0.1028,
+      "step": 195
+    },
+    {
+      "epoch": 1.6725333333333334,
+      "grad_norm": 0.11617852408102547,
+      "learning_rate": 1e-05,
+      "loss": 0.1323,
+      "step": 196
+    },
+    {
+      "epoch": 1.6810666666666667,
+      "grad_norm": 0.12191871384850739,
+      "learning_rate": 1e-05,
+      "loss": 0.1395,
+      "step": 197
+    },
+    {
+      "epoch": 1.6896,
+      "grad_norm": 0.1359943408077879,
+      "learning_rate": 1e-05,
+      "loss": 0.1191,
+      "step": 198
+    },
+    {
+      "epoch": 1.6981333333333333,
+      "grad_norm": 0.12006029084078058,
+      "learning_rate": 1e-05,
+      "loss": 0.0983,
+      "step": 199
+    },
+    {
+      "epoch": 1.7066666666666666,
+      "grad_norm": 0.09668785600159001,
+      "learning_rate": 1e-05,
+      "loss": 0.0801,
+      "step": 200
+    },
+    {
+      "epoch": 1.7151999999999998,
+      "grad_norm": 0.11929283034682205,
+      "learning_rate": 1e-05,
+      "loss": 0.1072,
+      "step": 201
+    },
+    {
+      "epoch": 1.7237333333333333,
+      "grad_norm": 0.09077598659108727,
+      "learning_rate": 1e-05,
+      "loss": 0.0835,
+      "step": 202
+    },
+    {
+      "epoch": 1.7322666666666666,
+      "grad_norm": 0.1315112247694008,
+      "learning_rate": 1e-05,
+      "loss": 0.1251,
+      "step": 203
+    },
+    {
+      "epoch": 1.7408000000000001,
+      "grad_norm": 0.10262675849503336,
+      "learning_rate": 1e-05,
+      "loss": 0.1102,
+      "step": 204
+    },
+    {
+      "epoch": 1.7493333333333334,
+      "grad_norm": 0.11679561974734426,
+      "learning_rate": 1e-05,
+      "loss": 0.0912,
+      "step": 205
+    },
+    {
+      "epoch": 1.7578666666666667,
+      "grad_norm": 0.12857201623167358,
+      "learning_rate": 1e-05,
+      "loss": 0.1108,
+      "step": 206
+    },
+    {
+      "epoch": 1.7664,
+      "grad_norm": 0.110417578370301,
+      "learning_rate": 1e-05,
+      "loss": 0.0713,
+      "step": 207
+    },
+    {
+      "epoch": 1.7749333333333333,
+      "grad_norm": 0.1206716016388202,
+      "learning_rate": 1e-05,
+      "loss": 0.099,
+      "step": 208
+    },
+    {
+      "epoch": 1.7834666666666665,
+      "grad_norm": 0.11690286401098868,
+      "learning_rate": 1e-05,
+      "loss": 0.1398,
+      "step": 209
+    },
+    {
+      "epoch": 1.792,
+      "grad_norm": 0.1087083638784744,
+      "learning_rate": 1e-05,
+      "loss": 0.1106,
+      "step": 210
+    },
+    {
+      "epoch": 1.8005333333333333,
+      "grad_norm": 0.13044092544075447,
+      "learning_rate": 1e-05,
+      "loss": 0.1298,
+      "step": 211
+    },
+    {
+      "epoch": 1.8090666666666668,
+      "grad_norm": 0.11125544216608903,
+      "learning_rate": 1e-05,
+      "loss": 0.0862,
+      "step": 212
+    },
+    {
+      "epoch": 1.8176,
+      "grad_norm": 0.15173848052348715,
+      "learning_rate": 1e-05,
+      "loss": 0.1116,
+      "step": 213
+    },
+    {
+      "epoch": 1.8261333333333334,
+      "grad_norm": 0.1300854070876123,
+      "learning_rate": 1e-05,
+      "loss": 0.0881,
+      "step": 214
+    },
+    {
+      "epoch": 1.8346666666666667,
+      "grad_norm": 0.12472742133557221,
+      "learning_rate": 1e-05,
+      "loss": 0.1199,
+      "step": 215
+    },
+    {
+      "epoch": 1.8432,
+      "grad_norm": 0.10311157164421082,
+      "learning_rate": 1e-05,
+      "loss": 0.0887,
+      "step": 216
+    },
+    {
+      "epoch": 1.8517333333333332,
+      "grad_norm": 0.13979969636076792,
+      "learning_rate": 1e-05,
+      "loss": 0.089,
+      "step": 217
+    },
+    {
+      "epoch": 1.8602666666666665,
+      "grad_norm": 0.1725935114282675,
+      "learning_rate": 1e-05,
+      "loss": 0.1232,
+      "step": 218
+    },
+    {
+      "epoch": 1.8688,
+      "grad_norm": 0.13035682714460442,
+      "learning_rate": 1e-05,
+      "loss": 0.0803,
+      "step": 219
+    },
+    {
+      "epoch": 1.8773333333333333,
+      "grad_norm": 0.11707794313507026,
+      "learning_rate": 1e-05,
+      "loss": 0.0947,
+      "step": 220
+    },
+    {
+      "epoch": 1.8858666666666668,
+      "grad_norm": 0.13425868511610053,
+      "learning_rate": 1e-05,
+      "loss": 0.1118,
+      "step": 221
+    },
+    {
+      "epoch": 1.8944,
+      "grad_norm": 0.1269119929658306,
+      "learning_rate": 1e-05,
+      "loss": 0.1075,
+      "step": 222
+    },
+    {
+      "epoch": 1.9029333333333334,
+      "grad_norm": 0.14370379197651403,
+      "learning_rate": 1e-05,
+      "loss": 0.084,
+      "step": 223
+    },
+    {
+      "epoch": 1.9114666666666666,
+      "grad_norm": 0.15625739080115553,
+      "learning_rate": 1e-05,
+      "loss": 0.1268,
+      "step": 224
+    },
+    {
+      "epoch": 1.92,
+      "grad_norm": 0.14298714144246835,
+      "learning_rate": 1e-05,
+      "loss": 0.1092,
+      "step": 225
+    },
+    {
+      "epoch": 1.9285333333333332,
+      "grad_norm": 0.1246451691187349,
+      "learning_rate": 1e-05,
+      "loss": 0.0907,
+      "step": 226
+    },
+    {
+      "epoch": 1.9370666666666667,
+      "grad_norm": 0.11821532122867853,
+      "learning_rate": 1e-05,
+      "loss": 0.0928,
+      "step": 227
+    },
+    {
+      "epoch": 1.9456,
+      "grad_norm": 0.13880790163863022,
+      "learning_rate": 1e-05,
+      "loss": 0.0925,
+      "step": 228
+    },
+    {
+      "epoch": 1.9541333333333335,
+      "grad_norm": 0.12467839547788233,
+      "learning_rate": 1e-05,
+      "loss": 0.0769,
+      "step": 229
+    },
+    {
+      "epoch": 1.9626666666666668,
+      "grad_norm": 0.1416031541406035,
+      "learning_rate": 1e-05,
+      "loss": 0.1079,
+      "step": 230
+    },
+    {
+      "epoch": 1.9712,
+      "grad_norm": 0.12730577347260927,
+      "learning_rate": 1e-05,
+      "loss": 0.0953,
+      "step": 231
+    },
+    {
+      "epoch": 1.9797333333333333,
+      "grad_norm": 0.15488312205299337,
+      "learning_rate": 1e-05,
+      "loss": 0.0938,
+      "step": 232
+    },
+    {
+      "epoch": 1.9882666666666666,
+      "grad_norm": 0.1285822292835917,
+      "learning_rate": 1e-05,
+      "loss": 0.0749,
+      "step": 233
+    },
+    {
+      "epoch": 1.9968,
+      "grad_norm": 0.15841174792939966,
+      "learning_rate": 1e-05,
+      "loss": 0.0814,
+      "step": 234
+    },
+    {
+      "epoch": 2.005333333333333,
+      "grad_norm": 0.1587140991418047,
+      "learning_rate": 1e-05,
+      "loss": 0.1167,
+      "step": 235
+    },
+    {
+      "epoch": 2.0138666666666665,
+      "grad_norm": 0.18909490284011177,
+      "learning_rate": 1e-05,
+      "loss": 0.1615,
+      "step": 236
+    },
+    {
+      "epoch": 2.0224,
+      "grad_norm": 0.17253418789231068,
+      "learning_rate": 1e-05,
+      "loss": 0.1135,
+      "step": 237
+    },
+    {
+      "epoch": 2.0309333333333335,
+      "grad_norm": 0.19155873822350467,
+      "learning_rate": 1e-05,
+      "loss": 0.1076,
+      "step": 238
+    },
+    {
+      "epoch": 2.0394666666666668,
+      "grad_norm": 0.1825343775540858,
+      "learning_rate": 1e-05,
+      "loss": 0.1219,
+      "step": 239
+    },
+    {
+      "epoch": 2.048,
+      "grad_norm": 0.245406872522052,
+      "learning_rate": 1e-05,
+      "loss": 0.1044,
+      "step": 240
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 301,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 20,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.4262663119891333e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
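
Reviewer note: the `log_history` entries above are easier to read once aggregated. The following is a minimal sketch, not part of the commit, that averages the logged loss per epoch and finds the step with the largest gradient norm. `SAMPLE_STATE` is a hypothetical four-entry excerpt copied from the values shown above, and the helper names are illustrative; in practice you would `json.load()` the checkpoint's `trainer_state.json` instead.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical excerpt in the same shape as the "log_history" entries above.
# Replace with json.load(open(".../trainer_state.json")) for a real checkpoint.
SAMPLE_STATE = json.loads("""
{
  "log_history": [
    {"epoch": 0.9984, "grad_norm": 0.11117413394533605, "learning_rate": 1e-05, "loss": 0.0968, "step": 117},
    {"epoch": 1.0069333333333332, "grad_norm": 0.09247708923706752, "learning_rate": 1e-05, "loss": 0.0985, "step": 118},
    {"epoch": 1.0154666666666667, "grad_norm": 0.12028574166046906, "learning_rate": 1e-05, "loss": 0.1085, "step": 119},
    {"epoch": 2.048, "grad_norm": 0.245406872522052, "learning_rate": 1e-05, "loss": 0.1044, "step": 240}
  ],
  "max_steps": 301
}
""")


def mean_loss_by_epoch(state):
    """Average the logged loss per epoch index (the integer part of "epoch")."""
    buckets = defaultdict(list)
    for entry in state["log_history"]:
        # Entries without a "loss" key (for example evaluation records) are skipped.
        if "loss" in entry:
            buckets[int(entry["epoch"])].append(entry["loss"])
    return {epoch: mean(losses) for epoch, losses in sorted(buckets.items())}


def max_grad_norm_step(state):
    """Return (step, grad_norm) for the entry with the largest gradient norm."""
    entries = [e for e in state["log_history"] if "grad_norm" in e]
    worst = max(entries, key=lambda e: e["grad_norm"])
    return worst["step"], worst["grad_norm"]


summary = mean_loss_by_epoch(SAMPLE_STATE)
spike = max_grad_norm_step(SAMPLE_STATE)
print(summary)
print(spike)
```

Bucketing by the integer part of `epoch` is a simplification chosen for this sketch; a rolling mean over steps would smooth the per-step noise visible in the log equally well.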

checkpoint-240/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
+size 6840
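
Reviewer note: the three lines above are a Git LFS pointer, not the binary `training_args.bin` itself. As a rough sketch (the function name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling), the pointer can be split into its fields like this:

```python
# Pointer text copied from the diff above, without the leading "+" markers.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9430fb289d52200b279530dc31f818fe016b81f2a2feb4d356e75541590998de
size 6840
"""


def parse_lfs_pointer(text):
    """Parse "key value" lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

The `size` field gives the byte length of the actual file stored in LFS, so it can be compared with the downloaded file as a quick sanity check.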

checkpoint-260/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: ../ckpts/Meta-Llama-3-8B-Instruct
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-260/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "../ckpts/Meta-Llama-3-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
+    "lm_head",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
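
Reviewer note: a back-of-the-envelope sketch of what this adapter configuration implies. With `use_rslora` set to false, PEFT scales the LoRA update by `lora_alpha / r`. Each adapted linear layer adds `r * (in_features + out_features)` trainable parameters. The layer shapes below are an assumption taken from the publicly documented Llama 3 8B architecture; they do not appear in this commit, so treat the resulting count as an estimate.

```python
# Values copied from adapter_config.json above.
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = [
    "k_proj", "q_proj", "v_proj", "down_proj",
    "up_proj", "gate_proj", "lm_head", "o_proj",
]

# ASSUMPTION: shapes of Meta-Llama-3-8B-Instruct (not stated in this commit).
HIDDEN = 4096          # model hidden size
INTERMEDIATE = 14336   # MLP intermediate size
KV_DIM = 1024          # 8 key/value heads * 128 head dim
VOCAB = 128256         # vocabulary size
NUM_LAYERS = 32        # decoder layers

# (in_features, out_features) for each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
    "lm_head": (HIDDEN, VOCAB),
}


def lora_param_count(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)


# Standard (non-rsLoRA) scaling factor applied to the low-rank update.
scaling = LORA_ALPHA / LORA_R

total_params = 0
for name in TARGET_MODULES:
    in_f, out_f = SHAPES[name]
    # lm_head occurs once; every other target repeats in each decoder layer.
    copies = 1 if name == "lm_head" else NUM_LAYERS
    total_params += copies * lora_param_count(in_f, out_f, LORA_R)

print(f"scaling = {scaling}")
print(f"estimated trainable LoRA parameters = {total_params:,}")
```

Including `lm_head` among the targets accounts for a noticeable share of the estimate because of the large vocabulary dimension.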