MatVet commited on 4 days ago

Commit

e8272ad

verified ·

1 Parent(s): 3eb0a16

Initial model upload

Browse files

Files changed (45) hide show

.ipynb_checkpoints/README-checkpoint.md +141 -0
README.md +141 -0
adapter_config.json +34 -0
adapter_model.bin +3 -0
added_tokens.json +5 -0
checkpoint-3084/README.md +202 -0
checkpoint-3084/adapter_config.json +34 -0
checkpoint-3084/adapter_model.safetensors +3 -0
checkpoint-3084/added_tokens.json +5 -0
checkpoint-3084/merges.txt +0 -0
checkpoint-3084/optimizer.pt +3 -0
checkpoint-3084/rng_state_0.pth +3 -0
checkpoint-3084/rng_state_1.pth +3 -0
checkpoint-3084/rng_state_2.pth +3 -0
checkpoint-3084/rng_state_3.pth +3 -0
checkpoint-3084/rng_state_4.pth +3 -0
checkpoint-3084/rng_state_5.pth +3 -0
checkpoint-3084/rng_state_6.pth +3 -0
checkpoint-3084/rng_state_7.pth +3 -0
checkpoint-3084/scheduler.pt +3 -0
checkpoint-3084/special_tokens_map.json +35 -0
checkpoint-3084/tokenizer.json +0 -0
checkpoint-3084/tokenizer_config.json +199 -0
checkpoint-3084/trainer_state.json +2189 -0
checkpoint-3084/training_args.bin +3 -0
checkpoint-3084/vocab.json +0 -0
config.json +49 -0
merged/added_tokens.json +5 -0
merged/config.json +33 -0
merged/generation_config.json +8 -0
merged/merges.txt +0 -0
merged/pytorch_model-00001-of-00004.bin +3 -0
merged/pytorch_model-00002-of-00004.bin +3 -0
merged/pytorch_model-00003-of-00004.bin +3 -0
merged/pytorch_model-00004-of-00004.bin +3 -0
merged/pytorch_model.bin.index.json +370 -0
merged/special_tokens_map.json +35 -0
merged/tokenizer.json +0 -0
merged/tokenizer_config.json +199 -0
merged/vocab.json +0 -0
merges.txt +0 -0
special_tokens_map.json +35 -0
tokenizer.json +0 -0
tokenizer_config.json +199 -0
vocab.json +0 -0

.ipynb_checkpoints/README-checkpoint.md ADDED Viewed

	@@ -0,0 +1,141 @@

+---
+library_name: peft
+license: apache-2.0
+base_model: ibm-granite/granite-3.1-8b-instruct
+tags:
+- generated_from_trainer
+model-index:
+- name: home/ec2-user/SageMaker/task_decomposition/trained_models/granite-math-plans-3.1-8b-lora
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+<details><summary>See axolotl config</summary>
+axolotl version: `0.5.2`
+```yaml
+base_model: ibm-granite/granite-3.1-8b-instruct
+model_type: AutoModelForCausalLM
+tokenizer_type: AutoTokenizer
+resize_token_embeddings_to_32x: true
+load_in_8bit: true
+load_in_4bit: false
+strict: false
+datasets:
+- path: /home/ec2-user/SageMaker/task_decomposition/data/task_decomposition_training_data_math.jsonl
+  type: chat_template
+  chat_template: tokenizer_default
+  field_messages: conversations
+  message_field_role: role
+  message_field_content: value
+dataset_prepared_path: last_run_prepared_sft
+val_set_size: 0
+sequence_len: 8192
+sample_packing: false
+pad_to_sequence_len: true
+eval_sample_packing: false
+output_dir: /home/ec2-user/SageMaker/task_decomposition/trained_models/granite-math-plans-3.1-8b-lora
+wandb_project: null
+wandb_entity: null
+wandb_watch: null
+wandb_name: null
+wandb_log_model: null
+adapter: lora
+lora_model_dir:
+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_linear: true
+lora_fan_in_fan_out:
+gradient_accumulation_steps: 8
+micro_batch_size: 1
+eval_batch_size: 1
+num_epochs: 3
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 1e-05
+max_grad_norm: 1.0
+logging_steps: 10
+train_on_inputs: false
+group_by_length: false
+bf16: auto
+fp16:
+tf32: false
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+use_reentrant: false
+early_stopping_patience:
+resume_from_checkpoint:
+local_rank:
+xformers_attention:
+flash_attention: true
+warmup_ratio: 0.05
+eval_steps:
+save_strategy: epoch
+eval_table_size:
+num_processes: 8
+deepspeed:
+weight_decay: 0.0
+```
+</details><br>
+# home/ec2-user/SageMaker/task_decomposition/trained_models/granite-math-plans-3.0-8b-lora
+This model is a fine-tuned version of [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) on the /home/ec2-user/SageMaker/task_decomposition/data/task_decomposition_training_data_math.jsonl dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 64
+- total_eval_batch_size: 8
+- optimizer: Use adamw_bnb_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 154
+- num_epochs: 3
+### Training results
+### Framework versions
+- PEFT 0.13.2
+- Transformers 4.46.3
+- Pytorch 2.3.1+cu121
+- Datasets 3.1.0
+- Tokenizers 0.20.3

README.md ADDED Viewed

	@@ -0,0 +1,141 @@

+---
+library_name: peft
+license: apache-2.0
+base_model: ibm-granite/granite-3.1-8b-instruct
+tags:
+- generated_from_trainer
+model-index:
+- name: home/ec2-user/SageMaker/task_decomposition/trained_models/granite-math-plans-3.1-8b-lora
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+<details><summary>See axolotl config</summary>
+axolotl version: `0.5.2`
+```yaml
+base_model: ibm-granite/granite-3.1-8b-instruct
+model_type: AutoModelForCausalLM
+tokenizer_type: AutoTokenizer
+resize_token_embeddings_to_32x: true
+load_in_8bit: true
+load_in_4bit: false
+strict: false
+datasets:
+- path: /home/ec2-user/SageMaker/task_decomposition/data/task_decomposition_training_data_math.jsonl
+  type: chat_template
+  chat_template: tokenizer_default
+  field_messages: conversations
+  message_field_role: role
+  message_field_content: value
+dataset_prepared_path: last_run_prepared_sft
+val_set_size: 0
+sequence_len: 8192
+sample_packing: false
+pad_to_sequence_len: true
+eval_sample_packing: false
+output_dir: /home/ec2-user/SageMaker/task_decomposition/trained_models/granite-math-plans-3.1-8b-lora
+wandb_project: null
+wandb_entity: null
+wandb_watch: null
+wandb_name: null
+wandb_log_model: null
+adapter: lora
+lora_model_dir:
+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_linear: true
+lora_fan_in_fan_out:
+gradient_accumulation_steps: 8
+micro_batch_size: 1
+eval_batch_size: 1
+num_epochs: 3
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 1e-05
+max_grad_norm: 1.0
+logging_steps: 10
+train_on_inputs: false
+group_by_length: false
+bf16: auto
+fp16:
+tf32: false
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+use_reentrant: false
+early_stopping_patience:
+resume_from_checkpoint:
+local_rank:
+xformers_attention:
+flash_attention: true
+warmup_ratio: 0.05
+eval_steps:
+save_strategy: epoch
+eval_table_size:
+num_processes: 8
+deepspeed:
+weight_decay: 0.0
+```
+</details><br>
+# home/ec2-user/SageMaker/task_decomposition/trained_models/granite-math-plans-3.0-8b-lora
+This model is a fine-tuned version of [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) on the /home/ec2-user/SageMaker/task_decomposition/data/task_decomposition_training_data_math.jsonl dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 64
+- total_eval_batch_size: 8
+- optimizer: Use adamw_bnb_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 154
+- num_epochs: 3
+### Training results
+### Framework versions
+- PEFT 0.13.2
+- Transformers 4.46.3
+- Pytorch 2.3.1+cu121
+- Datasets 3.1.0
+- Tokenizers 0.20.3

adapter_config.json ADDED Viewed

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "ibm-granite/granite-3.1-8b-instruct",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "down_proj",
+    "up_proj",
+    "q_proj",
+    "gate_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

adapter_model.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9a78a4b83edd30daeebc4160a94362b8d9e82de7f4c73fdd288a8c139902b6ce
+size 798955662

added_tokens.json ADDED Viewed

	@@ -0,0 +1,5 @@

+{
+  "<|end_of_role|>": 49153,
+  "<|start_of_role|>": 49152,
+  "<|tool_call|>": 49154
+}

checkpoint-3084/README.md ADDED Viewed

	@@ -0,0 +1,202 @@

+---
+base_model: ibm-granite/granite-3.1-8b-instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

checkpoint-3084/adapter_config.json ADDED Viewed

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "ibm-granite/granite-3.1-8b-instruct",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "down_proj",
+    "up_proj",
+    "q_proj",
+    "gate_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-3084/adapter_model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8b0faa4e435ee5c37b2be55def5eb7079adc516de1757ee17664c3e132e38dd8
+size 1201743872

checkpoint-3084/added_tokens.json ADDED Viewed

	@@ -0,0 +1,5 @@

+{
+  "<|end_of_role|>": 49153,
+  "<|start_of_role|>": 49152,
+  "<|tool_call|>": 49154
+}

checkpoint-3084/merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

checkpoint-3084/optimizer.pt ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c59cba04e80a61127d73c24d5ada98a795f926f0c43608ba7d0e96610e87b5ec
+size 201669716

checkpoint-3084/rng_state_0.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bed810de1ba5fc3c6ca4d86854492dbc9b392b185654e34c3e1298b2dd5b4feb
+size 15984

checkpoint-3084/rng_state_1.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9047f9e57b13ce54bdc5b5b2ac806b3de36e06ef41a69bdb1e4c4cc4a3d86213
+size 15984

checkpoint-3084/rng_state_2.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4dbf4d66848ba437330e008f2cfc75e5031e00e169e4f519e603306517210157
+size 15984

checkpoint-3084/rng_state_3.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca58af17bec86aa02d83d02f1dce1adcf2ace175d82b08a9d5bbe941bb4ef825
+size 15984

checkpoint-3084/rng_state_4.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66048d766ef637eeffe93015803833352137229af4fed5680560d5d9e5a01142
+size 15984

checkpoint-3084/rng_state_5.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f18b395aad3543ed1f03de02e877e04a4dfc0cc6c998cbe8c928374e179ab8fa
+size 15984

checkpoint-3084/rng_state_6.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3650b87f28f82cc769d88cb6d559d4db5437f0ec50ddb361dfc6992097168715
+size 15984

checkpoint-3084/rng_state_7.pth ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:761e2386a1b8c1db2a4a08b5e726c3e71c451a3af1a391f75930bbade8be22b6
+size 15984

checkpoint-3084/scheduler.pt ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f71897c50d4a0cf320866c3d469525886ee81f2b9feecf8ecd6d74bf145d7a23
+size 1064

checkpoint-3084/special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,35 @@

+{
+  "additional_special_tokens": [
+    "<|start_of_role|>",
+    "<|end_of_role|>",
+    "<|tool_call|>"
+  ],
+  "bos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-3084/tokenizer.json ADDED Viewed

The diff for this file is too large to render. See raw diff

checkpoint-3084/tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,199 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|end_of_text|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<fim_prefix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<fim_middle>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<fim_suffix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<fim_pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<filename>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<gh_stars>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<commit_before>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<commit_msg>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "17": {
+      "content": "<commit_after>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "18": {
+      "content": "<reponame>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49152": {
+      "content": "<|start_of_role|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49153": {
+      "content": "<|end_of_role|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49154": {
+      "content": "<|tool_call|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|start_of_role|>",
+    "<|end_of_role|>",
+    "<|tool_call|>"
+  ],
+  "bos_token": "<|end_of_text|>",
+  "chat_template": "{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content'] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n    {%- if tools and documents %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\n\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n    {%- elif tools %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n    {%- elif documents %}\n        {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n    {%- else %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant.\" %}    \n    {%- endif %}\n    {%- if 'citations' in controls and documents %}\n        {%- set system_message = system_message + '\n\nIn your response, use the symbols <co> and </co> to indicate when a fact comes from a document in the search result, e.g <co>0</co> for a fact from document 0. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n    {%- endif %}\n    {%- if 'hallucinations' in controls and documents %}\n        {%- set system_message = system_message + '\n\nFinally, after the response is written, include a numbered list of sentences from the response that are potentially hallucinated and not based in the documents.' %}\n    {%- endif %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n{{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n{%- if tools %}\n    {{- '<|start_of_role|>tools<|end_of_role|>' }}\n    {{- tools | tojson(indent=4) }}\n    {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- if documents %}\n    {{- '<|start_of_role|>documents<|end_of_role|>' }}\n    {%- for document in documents %}\n        {{- 'Document ' + loop.index0 | string + '\n' }}\n        {{- document['text'] }}\n        {%- if not loop.last %}\n            {{- '\n\n'}}\n        {%- endif%}\n    {%- endfor %}\n    {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- for message in loop_messages %}\n    {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n    {%- if loop.last and add_generation_prompt %}\n        {{- '<|start_of_role|>assistant' }}\n            {%- if controls %}\n                {{- ' ' + controls | tojson()}}\n            {%- endif %}\n        {{- '<|end_of_role|>' }}\n    {%- endif %}\n{%- endfor %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|end_of_text|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 9223372036854775807,
+  "pad_token": "<|end_of_text|>",
+  "padding_side": "left",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|end_of_text|>",
+  "vocab_size": 49152
+}

checkpoint-3084/trainer_state.json ADDED Viewed

	@@ -0,0 +1,2189 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.998298906439854,
+  "eval_steps": 500,
+  "global_step": 3084,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.009720534629404616,
+      "grad_norm": 3.39630126953125,
+      "learning_rate": 6.493506493506493e-07,
+      "loss": 0.9229,
+      "step": 10
+    },
+    {
+      "epoch": 0.019441069258809233,
+      "grad_norm": 4.325911521911621,
+      "learning_rate": 1.2987012987012986e-06,
+      "loss": 0.9383,
+      "step": 20
+    },
+    {
+      "epoch": 0.02916160388821385,
+      "grad_norm": 4.230486869812012,
+      "learning_rate": 1.9480519480519483e-06,
+      "loss": 0.932,
+      "step": 30
+    },
+    {
+      "epoch": 0.038882138517618466,
+      "grad_norm": 4.264225006103516,
+      "learning_rate": 2.597402597402597e-06,
+      "loss": 0.9285,
+      "step": 40
+    },
+    {
+      "epoch": 0.04860267314702309,
+      "grad_norm": 3.955381155014038,
+      "learning_rate": 3.246753246753247e-06,
+      "loss": 0.9333,
+      "step": 50
+    },
+    {
+      "epoch": 0.0583232077764277,
+      "grad_norm": 4.48983097076416,
+      "learning_rate": 3.896103896103897e-06,
+      "loss": 0.9064,
+      "step": 60
+    },
+    {
+      "epoch": 0.06804374240583232,
+      "grad_norm": 4.365206241607666,
+      "learning_rate": 4.5454545454545455e-06,
+      "loss": 0.8685,
+      "step": 70
+    },
+    {
+      "epoch": 0.07776427703523693,
+      "grad_norm": 3.1013827323913574,
+      "learning_rate": 5.194805194805194e-06,
+      "loss": 0.7865,
+      "step": 80
+    },
+    {
+      "epoch": 0.08748481166464156,
+      "grad_norm": 2.3804359436035156,
+      "learning_rate": 5.844155844155844e-06,
+      "loss": 0.7293,
+      "step": 90
+    },
+    {
+      "epoch": 0.09720534629404617,
+      "grad_norm": 1.5543646812438965,
+      "learning_rate": 6.493506493506494e-06,
+      "loss": 0.6636,
+      "step": 100
+    },
+    {
+      "epoch": 0.10692588092345079,
+      "grad_norm": 1.550211787223816,
+      "learning_rate": 7.1428571428571436e-06,
+      "loss": 0.6161,
+      "step": 110
+    },
+    {
+      "epoch": 0.1166464155528554,
+      "grad_norm": 1.2692899703979492,
+      "learning_rate": 7.792207792207793e-06,
+      "loss": 0.5856,
+      "step": 120
+    },
+    {
+      "epoch": 0.12636695018226002,
+      "grad_norm": 1.4034239053726196,
+      "learning_rate": 8.441558441558442e-06,
+      "loss": 0.5672,
+      "step": 130
+    },
+    {
+      "epoch": 0.13608748481166463,
+      "grad_norm": 1.0562485456466675,
+      "learning_rate": 9.090909090909091e-06,
+      "loss": 0.5494,
+      "step": 140
+    },
+    {
+      "epoch": 0.14580801944106925,
+      "grad_norm": 1.0509977340698242,
+      "learning_rate": 9.740259740259742e-06,
+      "loss": 0.5339,
+      "step": 150
+    },
+    {
+      "epoch": 0.15552855407047386,
+      "grad_norm": 5.404982089996338,
+      "learning_rate": 9.99989653212821e-06,
+      "loss": 0.5324,
+      "step": 160
+    },
+    {
+      "epoch": 0.1652490886998785,
+      "grad_norm": 1.1149495840072632,
+      "learning_rate": 9.999264243974907e-06,
+      "loss": 0.5249,
+      "step": 170
+    },
+    {
+      "epoch": 0.17496962332928312,
+      "grad_norm": 1.0145925283432007,
+      "learning_rate": 9.998057222421298e-06,
+      "loss": 0.5362,
+      "step": 180
+    },
+    {
+      "epoch": 0.18469015795868773,
+      "grad_norm": 1.1371104717254639,
+      "learning_rate": 9.99627560623093e-06,
+      "loss": 0.5101,
+      "step": 190
+    },
+    {
+      "epoch": 0.19441069258809235,
+      "grad_norm": 1.1615630388259888,
+      "learning_rate": 9.993919600224802e-06,
+      "loss": 0.5239,
+      "step": 200
+    },
+    {
+      "epoch": 0.20413122721749696,
+      "grad_norm": 1.010383129119873,
+      "learning_rate": 9.990989475257843e-06,
+      "loss": 0.5024,
+      "step": 210
+    },
+    {
+      "epoch": 0.21385176184690158,
+      "grad_norm": 1.096696376800537,
+      "learning_rate": 9.98748556818776e-06,
+      "loss": 0.5079,
+      "step": 220
+    },
+    {
+      "epoch": 0.2235722964763062,
+      "grad_norm": 1.236509919166565,
+      "learning_rate": 9.98340828183631e-06,
+      "loss": 0.4984,
+      "step": 230
+    },
+    {
+      "epoch": 0.2332928311057108,
+      "grad_norm": 1.9408361911773682,
+      "learning_rate": 9.978758084943004e-06,
+      "loss": 0.5153,
+      "step": 240
+    },
+    {
+      "epoch": 0.24301336573511542,
+      "grad_norm": 1.1162898540496826,
+      "learning_rate": 9.973535512111196e-06,
+      "loss": 0.5079,
+      "step": 250
+    },
+    {
+      "epoch": 0.25273390036452004,
+      "grad_norm": 1.269111156463623,
+      "learning_rate": 9.967741163746654e-06,
+      "loss": 0.5306,
+      "step": 260
+    },
+    {
+      "epoch": 0.2624544349939247,
+      "grad_norm": 3.077763795852661,
+      "learning_rate": 9.961375705988501e-06,
+      "loss": 0.5083,
+      "step": 270
+    },
+    {
+      "epoch": 0.27217496962332927,
+      "grad_norm": 1.1506752967834473,
+      "learning_rate": 9.954439870632662e-06,
+      "loss": 0.5135,
+      "step": 280
+    },
+    {
+      "epoch": 0.2818955042527339,
+      "grad_norm": 1.2572377920150757,
+      "learning_rate": 9.946934455047718e-06,
+      "loss": 0.5172,
+      "step": 290
+    },
+    {
+      "epoch": 0.2916160388821385,
+      "grad_norm": 4.563354015350342,
+      "learning_rate": 9.938860322083241e-06,
+      "loss": 0.4991,
+      "step": 300
+    },
+    {
+      "epoch": 0.30133657351154314,
+      "grad_norm": 1.1372555494308472,
+      "learning_rate": 9.930218399970602e-06,
+      "loss": 0.5172,
+      "step": 310
+    },
+    {
+      "epoch": 0.3110571081409477,
+      "grad_norm": 1.1849030256271362,
+      "learning_rate": 9.921009682216251e-06,
+      "loss": 0.5147,
+      "step": 320
+    },
+    {
+      "epoch": 0.32077764277035237,
+      "grad_norm": 2.4454076290130615,
+      "learning_rate": 9.911235227487506e-06,
+      "loss": 0.5052,
+      "step": 330
+    },
+    {
+      "epoch": 0.330498177399757,
+      "grad_norm": 1.2442922592163086,
+      "learning_rate": 9.900896159490843e-06,
+      "loss": 0.5015,
+      "step": 340
+    },
+    {
+      "epoch": 0.3402187120291616,
+      "grad_norm": 1.2128549814224243,
+      "learning_rate": 9.88999366684271e-06,
+      "loss": 0.5009,
+      "step": 350
+    },
+    {
+      "epoch": 0.34993924665856624,
+      "grad_norm": 1.3940240144729614,
+      "learning_rate": 9.878529002932877e-06,
+      "loss": 0.5005,
+      "step": 360
+    },
+    {
+      "epoch": 0.3596597812879708,
+      "grad_norm": 1.2711237668991089,
+      "learning_rate": 9.866503485780347e-06,
+      "loss": 0.5079,
+      "step": 370
+    },
+    {
+      "epoch": 0.36938031591737547,
+      "grad_norm": 1.4298995733261108,
+      "learning_rate": 9.853918497881831e-06,
+      "loss": 0.4941,
+      "step": 380
+    },
+    {
+      "epoch": 0.37910085054678005,
+      "grad_norm": 1.7025047540664673,
+      "learning_rate": 9.840775486052807e-06,
+      "loss": 0.4878,
+      "step": 390
+    },
+    {
+      "epoch": 0.3888213851761847,
+      "grad_norm": 1.3242675065994263,
+      "learning_rate": 9.827075961261188e-06,
+      "loss": 0.5048,
+      "step": 400
+    },
+    {
+      "epoch": 0.3985419198055893,
+      "grad_norm": 1.3964117765426636,
+      "learning_rate": 9.812821498453624e-06,
+      "loss": 0.5015,
+      "step": 410
+    },
+    {
+      "epoch": 0.4082624544349939,
+      "grad_norm": 1.50648832321167,
+      "learning_rate": 9.798013736374436e-06,
+      "loss": 0.4931,
+      "step": 420
+    },
+    {
+      "epoch": 0.41798298906439857,
+      "grad_norm": 1.4148674011230469,
+      "learning_rate": 9.782654377377215e-06,
+      "loss": 0.5081,
+      "step": 430
+    },
+    {
+      "epoch": 0.42770352369380316,
+      "grad_norm": 1.3090276718139648,
+      "learning_rate": 9.766745187229124e-06,
+      "loss": 0.5072,
+      "step": 440
+    },
+    {
+      "epoch": 0.4374240583232078,
+      "grad_norm": 2.2069995403289795,
+      "learning_rate": 9.750287994907883e-06,
+      "loss": 0.4875,
+      "step": 450
+    },
+    {
+      "epoch": 0.4471445929526124,
+      "grad_norm": 1.469313144683838,
+      "learning_rate": 9.733284692391524e-06,
+      "loss": 0.4845,
+      "step": 460
+    },
+    {
+      "epoch": 0.456865127582017,
+      "grad_norm": 1.5857908725738525,
+      "learning_rate": 9.715737234440868e-06,
+      "loss": 0.4922,
+      "step": 470
+    },
+    {
+      "epoch": 0.4665856622114216,
+      "grad_norm": 1.4413847923278809,
+      "learning_rate": 9.697647638374797e-06,
+      "loss": 0.5025,
+      "step": 480
+    },
+    {
+      "epoch": 0.47630619684082626,
+      "grad_norm": 1.7726529836654663,
+      "learning_rate": 9.679017983838346e-06,
+      "loss": 0.4965,
+      "step": 490
+    },
+    {
+      "epoch": 0.48602673147023084,
+      "grad_norm": 1.4608409404754639,
+      "learning_rate": 9.659850412563615e-06,
+      "loss": 0.4885,
+      "step": 500
+    },
+    {
+      "epoch": 0.4957472660996355,
+      "grad_norm": 1.461808204650879,
+      "learning_rate": 9.64014712812354e-06,
+      "loss": 0.5012,
+      "step": 510
+    },
+    {
+      "epoch": 0.5054678007290401,
+      "grad_norm": 1.954996943473816,
+      "learning_rate": 9.619910395678582e-06,
+      "loss": 0.5032,
+      "step": 520
+    },
+    {
+      "epoch": 0.5151883353584447,
+      "grad_norm": 1.4469919204711914,
+      "learning_rate": 9.59914254171629e-06,
+      "loss": 0.4986,
+      "step": 530
+    },
+    {
+      "epoch": 0.5249088699878494,
+      "grad_norm": 1.5403884649276733,
+      "learning_rate": 9.577845953783864e-06,
+      "loss": 0.475,
+      "step": 540
+    },
+    {
+      "epoch": 0.534629404617254,
+      "grad_norm": 4.5531535148620605,
+      "learning_rate": 9.556023080213657e-06,
+      "loss": 0.4995,
+      "step": 550
+    },
+    {
+      "epoch": 0.5443499392466585,
+      "grad_norm": 1.598029375076294,
+      "learning_rate": 9.533676429841712e-06,
+      "loss": 0.4904,
+      "step": 560
+    },
+    {
+      "epoch": 0.5540704738760632,
+      "grad_norm": 1.4221954345703125,
+      "learning_rate": 9.51080857171934e-06,
+      "loss": 0.4886,
+      "step": 570
+    },
+    {
+      "epoch": 0.5637910085054678,
+      "grad_norm": 1.4414297342300415,
+      "learning_rate": 9.487422134817767e-06,
+      "loss": 0.5036,
+      "step": 580
+    },
+    {
+      "epoch": 0.5735115431348724,
+      "grad_norm": 1.551998496055603,
+      "learning_rate": 9.463519807725906e-06,
+      "loss": 0.5018,
+      "step": 590
+    },
+    {
+      "epoch": 0.583232077764277,
+      "grad_norm": 1.9298173189163208,
+      "learning_rate": 9.439104338341255e-06,
+      "loss": 0.5037,
+      "step": 600
+    },
+    {
+      "epoch": 0.5929526123936817,
+      "grad_norm": 1.6527436971664429,
+      "learning_rate": 9.414178533554e-06,
+      "loss": 0.4946,
+      "step": 610
+    },
+    {
+      "epoch": 0.6026731470230863,
+      "grad_norm": 1.6211801767349243,
+      "learning_rate": 9.388745258924321e-06,
+      "loss": 0.4852,
+      "step": 620
+    },
+    {
+      "epoch": 0.6123936816524909,
+      "grad_norm": 1.8734742403030396,
+      "learning_rate": 9.362807438352954e-06,
+      "loss": 0.4755,
+      "step": 630
+    },
+    {
+      "epoch": 0.6221142162818954,
+      "grad_norm": 1.505476951599121,
+      "learning_rate": 9.33636805374505e-06,
+      "loss": 0.4916,
+      "step": 640
+    },
+    {
+      "epoch": 0.6318347509113001,
+      "grad_norm": 1.6077882051467896,
+      "learning_rate": 9.309430144667376e-06,
+      "loss": 0.4962,
+      "step": 650
+    },
+    {
+      "epoch": 0.6415552855407047,
+      "grad_norm": 1.5872342586517334,
+      "learning_rate": 9.28199680799885e-06,
+      "loss": 0.4867,
+      "step": 660
+    },
+    {
+      "epoch": 0.6512758201701093,
+      "grad_norm": 1.5471742153167725,
+      "learning_rate": 9.254071197574539e-06,
+      "loss": 0.4845,
+      "step": 670
+    },
+    {
+      "epoch": 0.660996354799514,
+      "grad_norm": 1.6092792749404907,
+      "learning_rate": 9.22565652382307e-06,
+      "loss": 0.4758,
+      "step": 680
+    },
+    {
+      "epoch": 0.6707168894289186,
+      "grad_norm": 1.589211344718933,
+      "learning_rate": 9.196756053397544e-06,
+      "loss": 0.4856,
+      "step": 690
+    },
+    {
+      "epoch": 0.6804374240583232,
+      "grad_norm": 1.6765447854995728,
+      "learning_rate": 9.167373108799999e-06,
+      "loss": 0.4854,
+      "step": 700
+    },
+    {
+      "epoch": 0.6901579586877278,
+      "grad_norm": 1.7541825771331787,
+      "learning_rate": 9.137511067999444e-06,
+      "loss": 0.4952,
+      "step": 710
+    },
+    {
+      "epoch": 0.6998784933171325,
+      "grad_norm": 1.6507395505905151,
+      "learning_rate": 9.107173364043501e-06,
+      "loss": 0.4887,
+      "step": 720
+    },
+    {
+      "epoch": 0.7095990279465371,
+      "grad_norm": 1.5526636838912964,
+      "learning_rate": 9.076363484663745e-06,
+      "loss": 0.4815,
+      "step": 730
+    },
+    {
+      "epoch": 0.7193195625759417,
+      "grad_norm": 1.7525495290756226,
+      "learning_rate": 9.045084971874738e-06,
+      "loss": 0.4756,
+      "step": 740
+    },
+    {
+      "epoch": 0.7290400972053463,
+      "grad_norm": 1.812304139137268,
+      "learning_rate": 9.013341421566818e-06,
+      "loss": 0.4847,
+      "step": 750
+    },
+    {
+      "epoch": 0.7387606318347509,
+      "grad_norm": 1.6675856113433838,
+      "learning_rate": 8.981136483092719e-06,
+      "loss": 0.4756,
+      "step": 760
+    },
+    {
+      "epoch": 0.7484811664641555,
+      "grad_norm": 1.7526271343231201,
+      "learning_rate": 8.948473858848005e-06,
+      "loss": 0.491,
+      "step": 770
+    },
+    {
+      "epoch": 0.7582017010935601,
+      "grad_norm": 2.878983736038208,
+      "learning_rate": 8.915357303845453e-06,
+      "loss": 0.4907,
+      "step": 780
+    },
+    {
+      "epoch": 0.7679222357229648,
+      "grad_norm": 1.9142616987228394,
+      "learning_rate": 8.881790625283352e-06,
+      "loss": 0.4838,
+      "step": 790
+    },
+    {
+      "epoch": 0.7776427703523694,
+      "grad_norm": 1.7589762210845947,
+      "learning_rate": 8.847777682107805e-06,
+      "loss": 0.4792,
+      "step": 800
+    },
+    {
+      "epoch": 0.787363304981774,
+      "grad_norm": 1.7950881719589233,
+      "learning_rate": 8.813322384569114e-06,
+      "loss": 0.4932,
+      "step": 810
+    },
+    {
+      "epoch": 0.7970838396111786,
+      "grad_norm": 1.7117515802383423,
+      "learning_rate": 8.77842869377222e-06,
+      "loss": 0.4873,
+      "step": 820
+    },
+    {
+      "epoch": 0.8068043742405833,
+      "grad_norm": 1.83305823802948,
+      "learning_rate": 8.743100621221334e-06,
+      "loss": 0.4928,
+      "step": 830
+    },
+    {
+      "epoch": 0.8165249088699879,
+      "grad_norm": 1.8732259273529053,
+      "learning_rate": 8.707342228358753e-06,
+      "loss": 0.4861,
+      "step": 840
+    },
+    {
+      "epoch": 0.8262454434993924,
+      "grad_norm": 1.688181757926941,
+      "learning_rate": 8.671157626097949e-06,
+      "loss": 0.4842,
+      "step": 850
+    },
+    {
+      "epoch": 0.8359659781287971,
+      "grad_norm": 1.8716121912002563,
+      "learning_rate": 8.634550974350954e-06,
+      "loss": 0.5003,
+      "step": 860
+    },
+    {
+      "epoch": 0.8456865127582017,
+      "grad_norm": 1.9156651496887207,
+      "learning_rate": 8.597526481550133e-06,
+      "loss": 0.4843,
+      "step": 870
+    },
+    {
+      "epoch": 0.8554070473876063,
+      "grad_norm": 1.92584228515625,
+      "learning_rate": 8.560088404164358e-06,
+      "loss": 0.4716,
+      "step": 880
+    },
+    {
+      "epoch": 0.8651275820170109,
+      "grad_norm": 1.8322736024856567,
+      "learning_rate": 8.522241046209674e-06,
+      "loss": 0.4926,
+      "step": 890
+    },
+    {
+      "epoch": 0.8748481166464156,
+      "grad_norm": 1.990666389465332,
+      "learning_rate": 8.483988758754492e-06,
+      "loss": 0.4872,
+      "step": 900
+    },
+    {
+      "epoch": 0.8845686512758202,
+      "grad_norm": 1.876725435256958,
+      "learning_rate": 8.445335939419374e-06,
+      "loss": 0.482,
+      "step": 910
+    },
+    {
+      "epoch": 0.8942891859052248,
+      "grad_norm": 3.740494966506958,
+      "learning_rate": 8.406287031871469e-06,
+      "loss": 0.4884,
+      "step": 920
+    },
+    {
+      "epoch": 0.9040097205346294,
+      "grad_norm": 1.6894994974136353,
+      "learning_rate": 8.36684652531365e-06,
+      "loss": 0.4751,
+      "step": 930
+    },
+    {
+      "epoch": 0.913730255164034,
+      "grad_norm": 1.8580703735351562,
+      "learning_rate": 8.327018953968423e-06,
+      "loss": 0.48,
+      "step": 940
+    },
+    {
+      "epoch": 0.9234507897934386,
+      "grad_norm": 1.8106019496917725,
+      "learning_rate": 8.286808896556655e-06,
+      "loss": 0.4826,
+      "step": 950
+    },
+    {
+      "epoch": 0.9331713244228432,
+      "grad_norm": 1.7582460641860962,
+      "learning_rate": 8.246220975771185e-06,
+      "loss": 0.4867,
+      "step": 960
+    },
+    {
+      "epoch": 0.9428918590522479,
+      "grad_norm": 1.7424395084381104,
+      "learning_rate": 8.205259857745382e-06,
+      "loss": 0.4797,
+      "step": 970
+    },
+    {
+      "epoch": 0.9526123936816525,
+      "grad_norm": 1.8223532438278198,
+      "learning_rate": 8.163930251516719e-06,
+      "loss": 0.4733,
+      "step": 980
+    },
+    {
+      "epoch": 0.9623329283110571,
+      "grad_norm": 1.8597066402435303,
+      "learning_rate": 8.122236908485391e-06,
+      "loss": 0.486,
+      "step": 990
+    },
+    {
+      "epoch": 0.9720534629404617,
+      "grad_norm": 1.8571597337722778,
+      "learning_rate": 8.080184621868089e-06,
+      "loss": 0.4729,
+      "step": 1000
+    },
+    {
+      "epoch": 0.9817739975698664,
+      "grad_norm": 1.9846396446228027,
+      "learning_rate": 8.037778226146949e-06,
+      "loss": 0.4858,
+      "step": 1010
+    },
+    {
+      "epoch": 0.991494532199271,
+      "grad_norm": 4.8959150314331055,
+      "learning_rate": 7.995022596513762e-06,
+      "loss": 0.476,
+      "step": 1020
+    },
+    {
+      "epoch": 1.0014580801944106,
+      "grad_norm": 4.2978339195251465,
+      "learning_rate": 7.951922648309507e-06,
+      "loss": 0.4788,
+      "step": 1030
+    },
+    {
+      "epoch": 1.0111786148238153,
+      "grad_norm": 4.751235008239746,
+      "learning_rate": 7.908483336459265e-06,
+      "loss": 0.4827,
+      "step": 1040
+    },
+    {
+      "epoch": 1.02089914945322,
+      "grad_norm": 3.760265827178955,
+      "learning_rate": 7.864709654902579e-06,
+      "loss": 0.4726,
+      "step": 1050
+    },
+    {
+      "epoch": 1.0306196840826245,
+      "grad_norm": 4.5475287437438965,
+      "learning_rate": 7.820606636019341e-06,
+      "loss": 0.4691,
+      "step": 1060
+    },
+    {
+      "epoch": 1.0403402187120292,
+      "grad_norm": 4.606244087219238,
+      "learning_rate": 7.776179350051246e-06,
+      "loss": 0.4789,
+      "step": 1070
+    },
+    {
+      "epoch": 1.0500607533414337,
+      "grad_norm": 3.974292039871216,
+      "learning_rate": 7.731432904518893e-06,
+      "loss": 0.4813,
+      "step": 1080
+    },
+    {
+      "epoch": 1.0597812879708384,
+      "grad_norm": 3.419517755508423,
+      "learning_rate": 7.68637244363462e-06,
+      "loss": 0.4748,
+      "step": 1090
+    },
+    {
+      "epoch": 1.069501822600243,
+      "grad_norm": 4.130045413970947,
+      "learning_rate": 7.6410031477111e-06,
+      "loss": 0.476,
+      "step": 1100
+    },
+    {
+      "epoch": 1.0792223572296475,
+      "grad_norm": 4.5706329345703125,
+      "learning_rate": 7.595330232565785e-06,
+      "loss": 0.4692,
+      "step": 1110
+    },
+    {
+      "epoch": 1.0889428918590522,
+      "grad_norm": 3.4931533336639404,
+      "learning_rate": 7.549358948921293e-06,
+      "loss": 0.474,
+      "step": 1120
+    },
+    {
+      "epoch": 1.098663426488457,
+      "grad_norm": 4.1753034591674805,
+      "learning_rate": 7.5030945818017505e-06,
+      "loss": 0.4714,
+      "step": 1130
+    },
+    {
+      "epoch": 1.1083839611178614,
+      "grad_norm": 4.3049092292785645,
+      "learning_rate": 7.456542449925225e-06,
+      "loss": 0.4731,
+      "step": 1140
+    },
+    {
+      "epoch": 1.1181044957472661,
+      "grad_norm": 8.405867576599121,
+      "learning_rate": 7.409707905092246e-06,
+      "loss": 0.4842,
+      "step": 1150
+    },
+    {
+      "epoch": 1.1278250303766708,
+      "grad_norm": 3.693899393081665,
+      "learning_rate": 7.362596331570554e-06,
+      "loss": 0.478,
+      "step": 1160
+    },
+    {
+      "epoch": 1.1375455650060753,
+      "grad_norm": 4.493783950805664,
+      "learning_rate": 7.315213145476109e-06,
+      "loss": 0.4683,
+      "step": 1170
+    },
+    {
+      "epoch": 1.14726609963548,
+      "grad_norm": 4.062183380126953,
+      "learning_rate": 7.267563794150424e-06,
+      "loss": 0.4696,
+      "step": 1180
+    },
+    {
+      "epoch": 1.1569866342648845,
+      "grad_norm": 4.371154308319092,
+      "learning_rate": 7.2196537555343284e-06,
+      "loss": 0.4815,
+      "step": 1190
+    },
+    {
+      "epoch": 1.1667071688942892,
+      "grad_norm": 5.894556999206543,
+      "learning_rate": 7.171488537538195e-06,
+      "loss": 0.4798,
+      "step": 1200
+    },
+    {
+      "epoch": 1.1764277035236939,
+      "grad_norm": 3.3999180793762207,
+      "learning_rate": 7.123073677408743e-06,
+      "loss": 0.4645,
+      "step": 1210
+    },
+    {
+      "epoch": 1.1861482381530983,
+      "grad_norm": 3.9849109649658203,
+      "learning_rate": 7.074414741092444e-06,
+      "loss": 0.4747,
+      "step": 1220
+    },
+    {
+      "epoch": 1.195868772782503,
+      "grad_norm": 8.96238899230957,
+      "learning_rate": 7.025517322595648e-06,
+      "loss": 0.4809,
+      "step": 1230
+    },
+    {
+      "epoch": 1.2055893074119077,
+      "grad_norm": 3.941241502761841,
+      "learning_rate": 6.976387043341472e-06,
+      "loss": 0.4753,
+      "step": 1240
+    },
+    {
+      "epoch": 1.2153098420413122,
+      "grad_norm": 4.701385021209717,
+      "learning_rate": 6.927029551523548e-06,
+      "loss": 0.4774,
+      "step": 1250
+    },
+    {
+      "epoch": 1.225030376670717,
+      "grad_norm": 4.075878620147705,
+      "learning_rate": 6.877450521456679e-06,
+      "loss": 0.4774,
+      "step": 1260
+    },
+    {
+      "epoch": 1.2347509113001216,
+      "grad_norm": 3.7492356300354004,
+      "learning_rate": 6.827655652924499e-06,
+      "loss": 0.4577,
+      "step": 1270
+    },
+    {
+      "epoch": 1.244471445929526,
+      "grad_norm": 3.856064558029175,
+      "learning_rate": 6.777650670524212e-06,
+      "loss": 0.4785,
+      "step": 1280
+    },
+    {
+      "epoch": 1.2541919805589308,
+      "grad_norm": 4.559633731842041,
+      "learning_rate": 6.72744132300847e-06,
+      "loss": 0.4734,
+      "step": 1290
+    },
+    {
+      "epoch": 1.2639125151883355,
+      "grad_norm": 3.9584572315216064,
+      "learning_rate": 6.677033382624467e-06,
+      "loss": 0.4792,
+      "step": 1300
+    },
+    {
+      "epoch": 1.27363304981774,
+      "grad_norm": 3.9753267765045166,
+      "learning_rate": 6.626432644450354e-06,
+      "loss": 0.4945,
+      "step": 1310
+    },
+    {
+      "epoch": 1.2833535844471446,
+      "grad_norm": 3.5308444499969482,
+      "learning_rate": 6.575644925729008e-06,
+      "loss": 0.4724,
+      "step": 1320
+    },
+    {
+      "epoch": 1.2930741190765493,
+      "grad_norm": 3.5413670539855957,
+      "learning_rate": 6.524676065199259e-06,
+      "loss": 0.4834,
+      "step": 1330
+    },
+    {
+      "epoch": 1.3027946537059538,
+      "grad_norm": 3.488835573196411,
+      "learning_rate": 6.473531922424654e-06,
+      "loss": 0.4731,
+      "step": 1340
+    },
+    {
+      "epoch": 1.3125151883353585,
+      "grad_norm": 4.209514617919922,
+      "learning_rate": 6.422218377119818e-06,
+      "loss": 0.4713,
+      "step": 1350
+    },
+    {
+      "epoch": 1.322235722964763,
+      "grad_norm": 4.186116695404053,
+      "learning_rate": 6.370741328474497e-06,
+      "loss": 0.4694,
+      "step": 1360
+    },
+    {
+      "epoch": 1.3319562575941677,
+      "grad_norm": 4.734013080596924,
+      "learning_rate": 6.31910669447537e-06,
+      "loss": 0.4844,
+      "step": 1370
+    },
+    {
+      "epoch": 1.3416767922235722,
+      "grad_norm": 3.7548201084136963,
+      "learning_rate": 6.267320411225699e-06,
+      "loss": 0.4897,
+      "step": 1380
+    },
+    {
+      "epoch": 1.3513973268529769,
+      "grad_norm": 5.411615371704102,
+      "learning_rate": 6.215388432262885e-06,
+      "loss": 0.489,
+      "step": 1390
+    },
+    {
+      "epoch": 1.3611178614823816,
+      "grad_norm": 4.5117411613464355,
+      "learning_rate": 6.163316727874032e-06,
+      "loss": 0.4762,
+      "step": 1400
+    },
+    {
+      "epoch": 1.370838396111786,
+      "grad_norm": 4.529521942138672,
+      "learning_rate": 6.111111284409587e-06,
+      "loss": 0.4856,
+      "step": 1410
+    },
+    {
+      "epoch": 1.3805589307411907,
+      "grad_norm": 3.990785598754883,
+      "learning_rate": 6.058778103595115e-06,
+      "loss": 0.4719,
+      "step": 1420
+    },
+    {
+      "epoch": 1.3902794653705954,
+      "grad_norm": 3.461073160171509,
+      "learning_rate": 6.006323201841332e-06,
+      "loss": 0.4762,
+      "step": 1430
+    },
+    {
+      "epoch": 1.4,
+      "grad_norm": 4.177917003631592,
+      "learning_rate": 5.953752609552428e-06,
+      "loss": 0.4853,
+      "step": 1440
+    },
+    {
+      "epoch": 1.4097205346294046,
+      "grad_norm": 4.377038478851318,
+      "learning_rate": 5.9010723704327945e-06,
+      "loss": 0.4766,
+      "step": 1450
+    },
+    {
+      "epoch": 1.4194410692588093,
+      "grad_norm": 3.7444796562194824,
+      "learning_rate": 5.848288540792213e-06,
+      "loss": 0.4738,
+      "step": 1460
+    },
+    {
+      "epoch": 1.4291616038882138,
+      "grad_norm": 3.8287205696105957,
+      "learning_rate": 5.795407188849612e-06,
+      "loss": 0.4686,
+      "step": 1470
+    },
+    {
+      "epoch": 1.4388821385176185,
+      "grad_norm": 6.44314432144165,
+      "learning_rate": 5.7424343940354275e-06,
+      "loss": 0.4658,
+      "step": 1480
+    },
+    {
+      "epoch": 1.4486026731470232,
+      "grad_norm": 3.9452617168426514,
+      "learning_rate": 5.689376246292698e-06,
+      "loss": 0.4713,
+      "step": 1490
+    },
+    {
+      "epoch": 1.4583232077764277,
+      "grad_norm": 4.222135543823242,
+      "learning_rate": 5.636238845376947e-06,
+      "loss": 0.4685,
+      "step": 1500
+    },
+    {
+      "epoch": 1.4680437424058324,
+      "grad_norm": 3.804692268371582,
+      "learning_rate": 5.58302830015492e-06,
+      "loss": 0.469,
+      "step": 1510
+    },
+    {
+      "epoch": 1.477764277035237,
+      "grad_norm": 3.252633571624756,
+      "learning_rate": 5.529750727902301e-06,
+      "loss": 0.4773,
+      "step": 1520
+    },
+    {
+      "epoch": 1.4874848116646415,
+      "grad_norm": 4.714046955108643,
+      "learning_rate": 5.4764122536004406e-06,
+      "loss": 0.482,
+      "step": 1530
+    },
+    {
+      "epoch": 1.4972053462940462,
+      "grad_norm": 3.599266290664673,
+      "learning_rate": 5.423019009232207e-06,
+      "loss": 0.4662,
+      "step": 1540
+    },
+    {
+      "epoch": 1.506925880923451,
+      "grad_norm": 4.187329292297363,
+      "learning_rate": 5.369577133077033e-06,
+      "loss": 0.4652,
+      "step": 1550
+    },
+    {
+      "epoch": 1.5166464155528554,
+      "grad_norm": 3.5032341480255127,
+      "learning_rate": 5.316092769005239e-06,
+      "loss": 0.4834,
+      "step": 1560
+    },
+    {
+      "epoch": 1.5263669501822599,
+      "grad_norm": 4.487483024597168,
+      "learning_rate": 5.262572065771703e-06,
+      "loss": 0.465,
+      "step": 1570
+    },
+    {
+      "epoch": 1.5360874848116648,
+      "grad_norm": 4.465209484100342,
+      "learning_rate": 5.209021176308992e-06,
+      "loss": 0.4683,
+      "step": 1580
+    },
+    {
+      "epoch": 1.5458080194410693,
+      "grad_norm": 3.7371511459350586,
+      "learning_rate": 5.155446257019983e-06,
+      "loss": 0.4716,
+      "step": 1590
+    },
+    {
+      "epoch": 1.5555285540704737,
+      "grad_norm": 3.683544158935547,
+      "learning_rate": 5.101853467070112e-06,
+      "loss": 0.4761,
+      "step": 1600
+    },
+    {
+      "epoch": 1.5652490886998784,
+      "grad_norm": 5.11771297454834,
+      "learning_rate": 5.048248967679292e-06,
+      "loss": 0.4766,
+      "step": 1610
+    },
+    {
+      "epoch": 1.5749696233292831,
+      "grad_norm": 4.279435157775879,
+      "learning_rate": 4.994638921413591e-06,
+      "loss": 0.4789,
+      "step": 1620
+    },
+    {
+      "epoch": 1.5846901579586876,
+      "grad_norm": 4.194601535797119,
+      "learning_rate": 4.941029491476768e-06,
+      "loss": 0.4657,
+      "step": 1630
+    },
+    {
+      "epoch": 1.5944106925880923,
+      "grad_norm": 16.01555061340332,
+      "learning_rate": 4.887426841001728e-06,
+      "loss": 0.4727,
+      "step": 1640
+    },
+    {
+      "epoch": 1.604131227217497,
+      "grad_norm": 4.629364013671875,
+      "learning_rate": 4.833837132341982e-06,
+      "loss": 0.4788,
+      "step": 1650
+    },
+    {
+      "epoch": 1.6138517618469015,
+      "grad_norm": 4.278013229370117,
+      "learning_rate": 4.780266526363206e-06,
+      "loss": 0.473,
+      "step": 1660
+    },
+    {
+      "epoch": 1.6235722964763062,
+      "grad_norm": 3.2145824432373047,
+      "learning_rate": 4.726721181734958e-06,
+      "loss": 0.4717,
+      "step": 1670
+    },
+    {
+      "epoch": 1.6332928311057109,
+      "grad_norm": 4.674281120300293,
+      "learning_rate": 4.673207254222671e-06,
+      "loss": 0.4718,
+      "step": 1680
+    },
+    {
+      "epoch": 1.6430133657351154,
+      "grad_norm": 4.432647705078125,
+      "learning_rate": 4.619730895979938e-06,
+      "loss": 0.4758,
+      "step": 1690
+    },
+    {
+      "epoch": 1.65273390036452,
+      "grad_norm": 6.134150505065918,
+      "learning_rate": 4.56629825484127e-06,
+      "loss": 0.4776,
+      "step": 1700
+    },
+    {
+      "epoch": 1.6624544349939248,
+      "grad_norm": 3.62800931930542,
+      "learning_rate": 4.512915473615288e-06,
+      "loss": 0.4597,
+      "step": 1710
+    },
+    {
+      "epoch": 1.6721749696233292,
+      "grad_norm": 4.815770626068115,
+      "learning_rate": 4.459588689378548e-06,
+      "loss": 0.4724,
+      "step": 1720
+    },
+    {
+      "epoch": 1.681895504252734,
+      "grad_norm": 4.476830005645752,
+      "learning_rate": 4.406324032769987e-06,
+      "loss": 0.4699,
+      "step": 1730
+    },
+    {
+      "epoch": 1.6916160388821386,
+      "grad_norm": 4.451855659484863,
+      "learning_rate": 4.3531276272861254e-06,
+      "loss": 0.4723,
+      "step": 1740
+    },
+    {
+      "epoch": 1.701336573511543,
+      "grad_norm": 3.466305732727051,
+      "learning_rate": 4.300005588577091e-06,
+      "loss": 0.4688,
+      "step": 1750
+    },
+    {
+      "epoch": 1.7110571081409478,
+      "grad_norm": 4.952362537384033,
+      "learning_rate": 4.246964023743537e-06,
+      "loss": 0.4728,
+      "step": 1760
+    },
+    {
+      "epoch": 1.7207776427703525,
+      "grad_norm": 5.363527774810791,
+      "learning_rate": 4.194009030634556e-06,
+      "loss": 0.4692,
+      "step": 1770
+    },
+    {
+      "epoch": 1.730498177399757,
+      "grad_norm": 3.821661949157715,
+      "learning_rate": 4.1411466971466345e-06,
+      "loss": 0.4706,
+      "step": 1780
+    },
+    {
+      "epoch": 1.7402187120291615,
+      "grad_norm": 4.455121040344238,
+      "learning_rate": 4.088383100523786e-06,
+      "loss": 0.4636,
+      "step": 1790
+    },
+    {
+      "epoch": 1.7499392466585664,
+      "grad_norm": 3.9628641605377197,
+      "learning_rate": 4.035724306658869e-06,
+      "loss": 0.4744,
+      "step": 1800
+    },
+    {
+      "epoch": 1.7596597812879708,
+      "grad_norm": 3.942005157470703,
+      "learning_rate": 3.983176369396249e-06,
+      "loss": 0.4844,
+      "step": 1810
+    },
+    {
+      "epoch": 1.7693803159173753,
+      "grad_norm": 4.5945820808410645,
+      "learning_rate": 3.9307453298358105e-06,
+      "loss": 0.4691,
+      "step": 1820
+    },
+    {
+      "epoch": 1.77910085054678,
+      "grad_norm": 3.7485504150390625,
+      "learning_rate": 3.878437215638462e-06,
+      "loss": 0.4679,
+      "step": 1830
+    },
+    {
+      "epoch": 1.7888213851761847,
+      "grad_norm": 4.494779109954834,
+      "learning_rate": 3.826258040333169e-06,
+      "loss": 0.4781,
+      "step": 1840
+    },
+    {
+      "epoch": 1.7985419198055892,
+      "grad_norm": 3.641651153564453,
+      "learning_rate": 3.774213802625617e-06,
+      "loss": 0.4726,
+      "step": 1850
+    },
+    {
+      "epoch": 1.808262454434994,
+      "grad_norm": 4.123895168304443,
+      "learning_rate": 3.7223104857085818e-06,
+      "loss": 0.4617,
+      "step": 1860
+    },
+    {
+      "epoch": 1.8179829890643986,
+      "grad_norm": 4.825061798095703,
+      "learning_rate": 3.670554056574076e-06,
+      "loss": 0.4832,
+      "step": 1870
+    },
+    {
+      "epoch": 1.827703523693803,
+      "grad_norm": 3.8967978954315186,
+      "learning_rate": 3.618950465327368e-06,
+      "loss": 0.4793,
+      "step": 1880
+    },
+    {
+      "epoch": 1.8374240583232078,
+      "grad_norm": 4.325230598449707,
+      "learning_rate": 3.5675056445029265e-06,
+      "loss": 0.4731,
+      "step": 1890
+    },
+    {
+      "epoch": 1.8471445929526125,
+      "grad_norm": 4.8459343910217285,
+      "learning_rate": 3.516225508382409e-06,
+      "loss": 0.4651,
+      "step": 1900
+    },
+    {
+      "epoch": 1.856865127582017,
+      "grad_norm": 4.086941719055176,
+      "learning_rate": 3.4651159523147197e-06,
+      "loss": 0.4698,
+      "step": 1910
+    },
+    {
+      "epoch": 1.8665856622114216,
+      "grad_norm": 4.497838973999023,
+      "learning_rate": 3.4141828520382735e-06,
+      "loss": 0.4688,
+      "step": 1920
+    },
+    {
+      "epoch": 1.8763061968408263,
+      "grad_norm": 4.377286911010742,
+      "learning_rate": 3.363432063005487e-06,
+      "loss": 0.4742,
+      "step": 1930
+    },
+    {
+      "epoch": 1.8860267314702308,
+      "grad_norm": 4.47113037109375,
+      "learning_rate": 3.3128694197096224e-06,
+      "loss": 0.4831,
+      "step": 1940
+    },
+    {
+      "epoch": 1.8957472660996355,
+      "grad_norm": 4.1656928062438965,
+      "learning_rate": 3.2625007350140344e-06,
+      "loss": 0.4681,
+      "step": 1950
+    },
+    {
+      "epoch": 1.9054678007290402,
+      "grad_norm": 4.184291362762451,
+      "learning_rate": 3.2123317994838925e-06,
+      "loss": 0.4735,
+      "step": 1960
+    },
+    {
+      "epoch": 1.9151883353584447,
+      "grad_norm": 4.275445938110352,
+      "learning_rate": 3.162368380720492e-06,
+      "loss": 0.4693,
+      "step": 1970
+    },
+    {
+      "epoch": 1.9249088699878494,
+      "grad_norm": 4.3425068855285645,
+      "learning_rate": 3.1126162226981727e-06,
+      "loss": 0.4688,
+      "step": 1980
+    },
+    {
+      "epoch": 1.934629404617254,
+      "grad_norm": 3.7065649032592773,
+      "learning_rate": 3.063081045103986e-06,
+      "loss": 0.4603,
+      "step": 1990
+    },
+    {
+      "epoch": 1.9443499392466586,
+      "grad_norm": 4.646661281585693,
+      "learning_rate": 3.01376854268013e-06,
+      "loss": 0.4835,
+      "step": 2000
+    },
+    {
+      "epoch": 1.954070473876063,
+      "grad_norm": 4.484701633453369,
+      "learning_rate": 2.9646843845692657e-06,
+      "loss": 0.4779,
+      "step": 2010
+    },
+    {
+      "epoch": 1.963791008505468,
+      "grad_norm": 5.250178813934326,
+      "learning_rate": 2.91583421366277e-06,
+      "loss": 0.4792,
+      "step": 2020
+    },
+    {
+      "epoch": 1.9735115431348724,
+      "grad_norm": 4.839537143707275,
+      "learning_rate": 2.867223645952007e-06,
+      "loss": 0.4717,
+      "step": 2030
+    },
+    {
+      "epoch": 1.983232077764277,
+      "grad_norm": 4.47597074508667,
+      "learning_rate": 2.818858269882699e-06,
+      "loss": 0.4694,
+      "step": 2040
+    },
+    {
+      "epoch": 1.9929526123936816,
+      "grad_norm": 4.061124801635742,
+      "learning_rate": 2.770743645712455e-06,
+      "loss": 0.4748,
+      "step": 2050
+    },
+    {
+      "epoch": 2.0029161603888213,
+      "grad_norm": 2.0077168941497803,
+      "learning_rate": 2.722885304871539e-06,
+      "loss": 0.4654,
+      "step": 2060
+    },
+    {
+      "epoch": 2.012636695018226,
+      "grad_norm": 2.1146748065948486,
+      "learning_rate": 2.6752887493269676e-06,
+      "loss": 0.4666,
+      "step": 2070
+    },
+    {
+      "epoch": 2.0223572296476306,
+      "grad_norm": 2.04209303855896,
+      "learning_rate": 2.627959450949975e-06,
+      "loss": 0.4716,
+      "step": 2080
+    },
+    {
+      "epoch": 2.032077764277035,
+      "grad_norm": 1.962570071220398,
+      "learning_rate": 2.580902850886947e-06,
+      "loss": 0.4761,
+      "step": 2090
+    },
+    {
+      "epoch": 2.04179829890644,
+      "grad_norm": 2.0052075386047363,
+      "learning_rate": 2.5341243589339005e-06,
+      "loss": 0.4608,
+      "step": 2100
+    },
+    {
+      "epoch": 2.0515188335358445,
+      "grad_norm": 2.1367218494415283,
+      "learning_rate": 2.487629352914531e-06,
+      "loss": 0.4748,
+      "step": 2110
+    },
+    {
+      "epoch": 2.061239368165249,
+      "grad_norm": 3.518221855163574,
+      "learning_rate": 2.4414231780619825e-06,
+      "loss": 0.4676,
+      "step": 2120
+    },
+    {
+      "epoch": 2.070959902794654,
+      "grad_norm": 2.140087604522705,
+      "learning_rate": 2.395511146404318e-06,
+      "loss": 0.46,
+      "step": 2130
+    },
+    {
+      "epoch": 2.0806804374240584,
+      "grad_norm": 2.2499241828918457,
+      "learning_rate": 2.34989853615385e-06,
+      "loss": 0.4628,
+      "step": 2140
+    },
+    {
+      "epoch": 2.090400972053463,
+      "grad_norm": 2.151681661605835,
+      "learning_rate": 2.3045905911003253e-06,
+      "loss": 0.4755,
+      "step": 2150
+    },
+    {
+      "epoch": 2.1001215066828673,
+      "grad_norm": 2.0793325901031494,
+      "learning_rate": 2.259592520008086e-06,
+      "loss": 0.4768,
+      "step": 2160
+    },
+    {
+      "epoch": 2.1098420413122723,
+      "grad_norm": 2.1793599128723145,
+      "learning_rate": 2.2149094960172434e-06,
+      "loss": 0.456,
+      "step": 2170
+    },
+    {
+      "epoch": 2.1195625759416767,
+      "grad_norm": 1.8569040298461914,
+      "learning_rate": 2.170546656048966e-06,
+      "loss": 0.4679,
+      "step": 2180
+    },
+    {
+      "epoch": 2.129283110571081,
+      "grad_norm": 2.157439708709717,
+      "learning_rate": 2.1265091002149167e-06,
+      "loss": 0.4683,
+      "step": 2190
+    },
+    {
+      "epoch": 2.139003645200486,
+      "grad_norm": 2.0931077003479004,
+      "learning_rate": 2.082801891230916e-06,
+      "loss": 0.4718,
+      "step": 2200
+    },
+    {
+      "epoch": 2.1487241798298906,
+      "grad_norm": 2.0878078937530518,
+      "learning_rate": 2.039430053834931e-06,
+      "loss": 0.4768,
+      "step": 2210
+    },
+    {
+      "epoch": 2.158444714459295,
+      "grad_norm": 2.0429890155792236,
+      "learning_rate": 1.9963985742094e-06,
+      "loss": 0.4782,
+      "step": 2220
+    },
+    {
+      "epoch": 2.1681652490887,
+      "grad_norm": 2.038283586502075,
+      "learning_rate": 1.9537123994080113e-06,
+      "loss": 0.4607,
+      "step": 2230
+    },
+    {
+      "epoch": 2.1778857837181045,
+      "grad_norm": 2.2046799659729004,
+      "learning_rate": 1.911376436786963e-06,
+      "loss": 0.47,
+      "step": 2240
+    },
+    {
+      "epoch": 2.187606318347509,
+      "grad_norm": 2.4161760807037354,
+      "learning_rate": 1.869395553440807e-06,
+      "loss": 0.4715,
+      "step": 2250
+    },
+    {
+      "epoch": 2.197326852976914,
+      "grad_norm": 2.080486297607422,
+      "learning_rate": 1.8277745756428973e-06,
+      "loss": 0.4606,
+      "step": 2260
+    },
+    {
+      "epoch": 2.2070473876063184,
+      "grad_norm": 2.1269888877868652,
+      "learning_rate": 1.786518288290563e-06,
+      "loss": 0.4581,
+      "step": 2270
+    },
+    {
+      "epoch": 2.216767922235723,
+      "grad_norm": 2.1676316261291504,
+      "learning_rate": 1.7456314343549946e-06,
+      "loss": 0.457,
+      "step": 2280
+    },
+    {
+      "epoch": 2.2264884568651278,
+      "grad_norm": 2.3082780838012695,
+      "learning_rate": 1.7051187143359975e-06,
+      "loss": 0.4738,
+      "step": 2290
+    },
+    {
+      "epoch": 2.2362089914945322,
+      "grad_norm": 2.100372076034546,
+      "learning_rate": 1.6649847857215945e-06,
+      "loss": 0.4716,
+      "step": 2300
+    },
+    {
+      "epoch": 2.2459295261239367,
+      "grad_norm": 2.2547926902770996,
+      "learning_rate": 1.6252342624525802e-06,
+      "loss": 0.466,
+      "step": 2310
+    },
+    {
+      "epoch": 2.2556500607533416,
+      "grad_norm": 2.1985204219818115,
+      "learning_rate": 1.5858717143920988e-06,
+      "loss": 0.4701,
+      "step": 2320
+    },
+    {
+      "epoch": 2.265370595382746,
+      "grad_norm": 2.1563034057617188,
+      "learning_rate": 1.5469016668002652e-06,
+      "loss": 0.4685,
+      "step": 2330
+    },
+    {
+      "epoch": 2.2750911300121506,
+      "grad_norm": 2.214901924133301,
+      "learning_rate": 1.5083285998139308e-06,
+      "loss": 0.458,
+      "step": 2340
+    },
+    {
+      "epoch": 2.2848116646415555,
+      "grad_norm": 2.1835556030273438,
+      "learning_rate": 1.4701569479316252e-06,
+      "loss": 0.4624,
+      "step": 2350
+    },
+    {
+      "epoch": 2.29453219927096,
+      "grad_norm": 1.9954426288604736,
+      "learning_rate": 1.4323910995037576e-06,
+      "loss": 0.4641,
+      "step": 2360
+    },
+    {
+      "epoch": 2.3042527339003644,
+      "grad_norm": 2.121554136276245,
+      "learning_rate": 1.3950353962281081e-06,
+      "loss": 0.4763,
+      "step": 2370
+    },
+    {
+      "epoch": 2.313973268529769,
+      "grad_norm": 2.168081760406494,
+      "learning_rate": 1.358094132650699e-06,
+      "loss": 0.4763,
+      "step": 2380
+    },
+    {
+      "epoch": 2.323693803159174,
+      "grad_norm": 2.023455858230591,
+      "learning_rate": 1.3215715556720722e-06,
+      "loss": 0.4662,
+      "step": 2390
+    },
+    {
+      "epoch": 2.3334143377885783,
+      "grad_norm": 2.124724864959717,
+      "learning_rate": 1.285471864059053e-06,
+      "loss": 0.4777,
+      "step": 2400
+    },
+    {
+      "epoch": 2.3431348724179832,
+      "grad_norm": 2.163512945175171,
+      "learning_rate": 1.2497992079620408e-06,
+      "loss": 0.4714,
+      "step": 2410
+    },
+    {
+      "epoch": 2.3528554070473877,
+      "grad_norm": 2.237823247909546,
+      "learning_rate": 1.2145576884378995e-06,
+      "loss": 0.459,
+      "step": 2420
+    },
+    {
+      "epoch": 2.362575941676792,
+      "grad_norm": 2.2328405380249023,
+      "learning_rate": 1.179751356978483e-06,
+      "loss": 0.4678,
+      "step": 2430
+    },
+    {
+      "epoch": 2.3722964763061967,
+      "grad_norm": 2.189445972442627,
+      "learning_rate": 1.1453842150448513e-06,
+      "loss": 0.469,
+      "step": 2440
+    },
+    {
+      "epoch": 2.3820170109356016,
+      "grad_norm": 2.0778281688690186,
+      "learning_rate": 1.1114602136072706e-06,
+      "loss": 0.465,
+      "step": 2450
+    },
+    {
+      "epoch": 2.391737545565006,
+      "grad_norm": 1.9697000980377197,
+      "learning_rate": 1.0779832526909683e-06,
+      "loss": 0.4806,
+      "step": 2460
+    },
+    {
+      "epoch": 2.4014580801944105,
+      "grad_norm": 2.2064828872680664,
+      "learning_rate": 1.0449571809277942e-06,
+      "loss": 0.4572,
+      "step": 2470
+    },
+    {
+      "epoch": 2.4111786148238155,
+      "grad_norm": 1.9824655055999756,
+      "learning_rate": 1.0123857951137534e-06,
+      "loss": 0.4551,
+      "step": 2480
+    },
+    {
+      "epoch": 2.42089914945322,
+      "grad_norm": 2.0561132431030273,
+      "learning_rate": 9.802728397725224e-07,
+      "loss": 0.4708,
+      "step": 2490
+    },
+    {
+      "epoch": 2.4306196840826244,
+      "grad_norm": 2.374835729598999,
+      "learning_rate": 9.486220067249613e-07,
+      "loss": 0.4708,
+      "step": 2500
+    },
+    {
+      "epoch": 2.4403402187120293,
+      "grad_norm": 2.1538679599761963,
+      "learning_rate": 9.174369346646888e-07,
+      "loss": 0.4774,
+      "step": 2510
+    },
+    {
+      "epoch": 2.450060753341434,
+      "grad_norm": 2.8585407733917236,
+      "learning_rate": 8.867212087397626e-07,
+      "loss": 0.4733,
+      "step": 2520
+    },
+    {
+      "epoch": 2.4597812879708383,
+      "grad_norm": 2.1649951934814453,
+      "learning_rate": 8.564783601405225e-07,
+      "loss": 0.4663,
+      "step": 2530
+    },
+    {
+      "epoch": 2.469501822600243,
+      "grad_norm": 2.2239134311676025,
+      "learning_rate": 8.267118656936318e-07,
+      "loss": 0.4698,
+      "step": 2540
+    },
+    {
+      "epoch": 2.4792223572296477,
+      "grad_norm": 2.1840627193450928,
+      "learning_rate": 7.974251474623623e-07,
+      "loss": 0.4562,
+      "step": 2550
+    },
+    {
+      "epoch": 2.488942891859052,
+      "grad_norm": 2.260507583618164,
+      "learning_rate": 7.686215723531903e-07,
+      "loss": 0.4764,
+      "step": 2560
+    },
+    {
+      "epoch": 2.4986634264884566,
+      "grad_norm": 2.27022647857666,
+      "learning_rate": 7.4030445172872e-07,
+      "loss": 0.463,
+      "step": 2570
+    },
+    {
+      "epoch": 2.5083839611178615,
+      "grad_norm": 2.080825090408325,
+      "learning_rate": 7.124770410269971e-07,
+      "loss": 0.4645,
+      "step": 2580
+    },
+    {
+      "epoch": 2.518104495747266,
+      "grad_norm": 2.2736544609069824,
+      "learning_rate": 6.851425393872535e-07,
+      "loss": 0.4736,
+      "step": 2590
+    },
+    {
+      "epoch": 2.527825030376671,
+      "grad_norm": 2.116166591644287,
+      "learning_rate": 6.58304089282123e-07,
+      "loss": 0.469,
+      "step": 2600
+    },
+    {
+      "epoch": 2.5375455650060754,
+      "grad_norm": 2.1004207134246826,
+      "learning_rate": 6.319647761563685e-07,
+      "loss": 0.4774,
+      "step": 2610
+    },
+    {
+      "epoch": 2.54726609963548,
+      "grad_norm": 2.0980539321899414,
+      "learning_rate": 6.061276280721729e-07,
+      "loss": 0.4585,
+      "step": 2620
+    },
+    {
+      "epoch": 2.5569866342648844,
+      "grad_norm": 2.199542760848999,
+      "learning_rate": 5.807956153610189e-07,
+      "loss": 0.4787,
+      "step": 2630
+    },
+    {
+      "epoch": 2.5667071688942893,
+      "grad_norm": 2.0425212383270264,
+      "learning_rate": 5.559716502822087e-07,
+      "loss": 0.4746,
+      "step": 2640
+    },
+    {
+      "epoch": 2.5764277035236938,
+      "grad_norm": 2.1126480102539062,
+      "learning_rate": 5.316585866880635e-07,
+      "loss": 0.4667,
+      "step": 2650
+    },
+    {
+      "epoch": 2.5861482381530987,
+      "grad_norm": 2.1631975173950195,
+      "learning_rate": 5.078592196958282e-07,
+      "loss": 0.4607,
+      "step": 2660
+    },
+    {
+      "epoch": 2.595868772782503,
+      "grad_norm": 2.2496447563171387,
+      "learning_rate": 4.845762853663416e-07,
+      "loss": 0.4664,
+      "step": 2670
+    },
+    {
+      "epoch": 2.6055893074119076,
+      "grad_norm": 2.1951189041137695,
+      "learning_rate": 4.6181246038948524e-07,
+      "loss": 0.4606,
+      "step": 2680
+    },
+    {
+      "epoch": 2.615309842041312,
+      "grad_norm": 2.1003565788269043,
+      "learning_rate": 4.395703617764624e-07,
+      "loss": 0.4607,
+      "step": 2690
+    },
+    {
+      "epoch": 2.625030376670717,
+      "grad_norm": 2.2774477005004883,
+      "learning_rate": 4.1785254655893615e-07,
+      "loss": 0.4635,
+      "step": 2700
+    },
+    {
+      "epoch": 2.6347509113001215,
+      "grad_norm": 2.1457698345184326,
+      "learning_rate": 3.9666151149506506e-07,
+      "loss": 0.4658,
+      "step": 2710
+    },
+    {
+      "epoch": 2.644471445929526,
+      "grad_norm": 2.298901319503784,
+      "learning_rate": 3.75999692782465e-07,
+      "loss": 0.4697,
+      "step": 2720
+    },
+    {
+      "epoch": 2.654191980558931,
+      "grad_norm": 2.227304220199585,
+      "learning_rate": 3.558694657781386e-07,
+      "loss": 0.4815,
+      "step": 2730
+    },
+    {
+      "epoch": 2.6639125151883354,
+      "grad_norm": 2.645914316177368,
+      "learning_rate": 3.362731447253931e-07,
+      "loss": 0.4674,
+      "step": 2740
+    },
+    {
+      "epoch": 2.67363304981774,
+      "grad_norm": 2.4246609210968018,
+      "learning_rate": 3.172129824877862e-07,
+      "loss": 0.4735,
+      "step": 2750
+    },
+    {
+      "epoch": 2.6833535844471443,
+      "grad_norm": 2.0741360187530518,
+      "learning_rate": 2.9869117029012905e-07,
+      "loss": 0.4663,
+      "step": 2760
+    },
+    {
+      "epoch": 2.6930741190765493,
+      "grad_norm": 2.19986629486084,
+      "learning_rate": 2.807098374665773e-07,
+      "loss": 0.4607,
+      "step": 2770
+    },
+    {
+      "epoch": 2.7027946537059537,
+      "grad_norm": 2.199263572692871,
+      "learning_rate": 2.632710512158332e-07,
+      "loss": 0.4633,
+      "step": 2780
+    },
+    {
+      "epoch": 2.7125151883353587,
+      "grad_norm": 5.237627029418945,
+      "learning_rate": 2.4637681636349106e-07,
+      "loss": 0.4732,
+      "step": 2790
+    },
+    {
+      "epoch": 2.722235722964763,
+      "grad_norm": 2.1454434394836426,
+      "learning_rate": 2.3002907513156315e-07,
+      "loss": 0.4732,
+      "step": 2800
+    },
+    {
+      "epoch": 2.7319562575941676,
+      "grad_norm": 2.4052484035491943,
+      "learning_rate": 2.1422970691518276e-07,
+      "loss": 0.4641,
+      "step": 2810
+    },
+    {
+      "epoch": 2.741676792223572,
+      "grad_norm": 3.7976326942443848,
+      "learning_rate": 1.9898052806655356e-07,
+      "loss": 0.4616,
+      "step": 2820
+    },
+    {
+      "epoch": 2.751397326852977,
+      "grad_norm": 2.182563066482544,
+      "learning_rate": 1.8428329168612703e-07,
+      "loss": 0.4779,
+      "step": 2830
+    },
+    {
+      "epoch": 2.7611178614823815,
+      "grad_norm": 2.0911543369293213,
+      "learning_rate": 1.701396874210659e-07,
+      "loss": 0.4717,
+      "step": 2840
+    },
+    {
+      "epoch": 2.7708383961117864,
+      "grad_norm": 2.4781322479248047,
+      "learning_rate": 1.5655134127099292e-07,
+      "loss": 0.4767,
+      "step": 2850
+    },
+    {
+      "epoch": 2.780558930741191,
+      "grad_norm": 2.073251485824585,
+      "learning_rate": 1.435198154010592e-07,
+      "loss": 0.4761,
+      "step": 2860
+    },
+    {
+      "epoch": 2.7902794653705953,
+      "grad_norm": 2.240455150604248,
+      "learning_rate": 1.3104660796235402e-07,
+      "loss": 0.4694,
+      "step": 2870
+    },
+    {
+      "epoch": 2.8,
+      "grad_norm": 3.2606217861175537,
+      "learning_rate": 1.1913315291967209e-07,
+      "loss": 0.4778,
+      "step": 2880
+    },
+    {
+      "epoch": 2.8097205346294047,
+      "grad_norm": 2.1772541999816895,
+      "learning_rate": 1.0778081988665978e-07,
+      "loss": 0.454,
+      "step": 2890
+    },
+    {
+      "epoch": 2.819441069258809,
+      "grad_norm": 2.2045230865478516,
+      "learning_rate": 9.699091396835725e-08,
+      "loss": 0.4542,
+      "step": 2900
+    },
+    {
+      "epoch": 2.8291616038882137,
+      "grad_norm": 2.1804275512695312,
+      "learning_rate": 8.676467561116064e-08,
+      "loss": 0.4759,
+      "step": 2910
+    },
+    {
+      "epoch": 2.8388821385176186,
+      "grad_norm": 2.245986223220825,
+      "learning_rate": 7.710328046021675e-08,
+      "loss": 0.4662,
+      "step": 2920
+    },
+    {
+      "epoch": 2.848602673147023,
+      "grad_norm": 2.1786446571350098,
+      "learning_rate": 6.800783922426557e-08,
+      "loss": 0.4814,
+      "step": 2930
+    },
+    {
+      "epoch": 2.8583232077764276,
+      "grad_norm": 2.142780303955078,
+      "learning_rate": 5.947939754794796e-08,
+      "loss": 0.4786,
+      "step": 2940
+    },
+    {
+      "epoch": 2.8680437424058325,
+      "grad_norm": 2.2657175064086914,
+      "learning_rate": 5.151893589159684e-08,
+      "loss": 0.4722,
+      "step": 2950
+    },
+    {
+      "epoch": 2.877764277035237,
+      "grad_norm": 2.4200682640075684,
+      "learning_rate": 4.4127369418518474e-08,
+      "loss": 0.4643,
+      "step": 2960
+    },
+    {
+      "epoch": 2.8874848116646414,
+      "grad_norm": 2.1785573959350586,
+      "learning_rate": 3.7305547889783244e-08,
+      "loss": 0.4529,
+      "step": 2970
+    },
+    {
+      "epoch": 2.8972053462940464,
+      "grad_norm": 2.2167816162109375,
+      "learning_rate": 3.1054255566532746e-08,
+      "loss": 0.4682,
+      "step": 2980
+    },
+    {
+      "epoch": 2.906925880923451,
+      "grad_norm": 2.4332900047302246,
+      "learning_rate": 2.537421111981908e-08,
+      "loss": 0.4693,
+      "step": 2990
+    },
+    {
+      "epoch": 2.9166464155528553,
+      "grad_norm": 2.178539514541626,
+      "learning_rate": 2.026606754798266e-08,
+      "loss": 0.4621,
+      "step": 3000
+    },
+    {
+      "epoch": 2.92636695018226,
+      "grad_norm": 2.165919065475464,
+      "learning_rate": 1.5730412101583882e-08,
+      "loss": 0.4633,
+      "step": 3010
+    },
+    {
+      "epoch": 2.9360874848116647,
+      "grad_norm": 2.2715532779693604,
+      "learning_rate": 1.176776621588771e-08,
+      "loss": 0.4776,
+      "step": 3020
+    },
+    {
+      "epoch": 2.945808019441069,
+      "grad_norm": 2.2068674564361572,
+      "learning_rate": 8.378585450918853e-09,
+      "loss": 0.4668,
+      "step": 3030
+    },
+    {
+      "epoch": 2.955528554070474,
+      "grad_norm": 2.124382495880127,
+      "learning_rate": 5.563259439089752e-09,
+      "loss": 0.4639,
+      "step": 3040
+    },
+    {
+      "epoch": 2.9652490886998786,
+      "grad_norm": 2.2651636600494385,
+      "learning_rate": 3.322111840405318e-09,
+      "loss": 0.4561,
+      "step": 3050
+    },
+    {
+      "epoch": 2.974969623329283,
+      "grad_norm": 2.188660144805908,
+      "learning_rate": 1.6554003052554612e-09,
+      "loss": 0.4633,
+      "step": 3060
+    },
+    {
+      "epoch": 2.9846901579586875,
+      "grad_norm": 2.2163050174713135,
+      "learning_rate": 5.633164447932382e-10,
+      "loss": 0.4756,
+      "step": 3070
+    },
+    {
+      "epoch": 2.9944106925880924,
+      "grad_norm": 2.2273592948913574,
+      "learning_rate": 4.598580890802229e-11,
+      "loss": 0.4751,
+      "step": 3080
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 3084,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7.827564814489616e+19,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-3084/training_args.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:af366e8bf979151cfc46d09de7384543758bd91e2505e308216c4287ab014007
+size 6456

checkpoint-3084/vocab.json ADDED Viewed

The diff for this file is too large to render. See raw diff

config.json ADDED Viewed

	@@ -0,0 +1,49 @@

+{
+  "_attn_implementation_autoset": true,
+  "_name_or_path": "ibm-granite/granite-3.1-8b-instruct",
+  "architectures": [
+    "GraniteForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.1,
+  "attention_multiplier": 0.0078125,
+  "bos_token_id": 0,
+  "embedding_multiplier": 12.0,
+  "eos_token_id": 0,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 12800,
+  "logits_scaling": 16.0,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "granite",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 40,
+  "num_key_value_heads": 8,
+  "pad_token_id": 0,
+  "quantization_config": {
+    "_load_in_4bit": false,
+    "_load_in_8bit": true,
+    "bnb_4bit_compute_dtype": "float32",
+    "bnb_4bit_quant_storage": "uint8",
+    "bnb_4bit_quant_type": "fp4",
+    "bnb_4bit_use_double_quant": false,
+    "llm_int8_enable_fp32_cpu_offload": false,
+    "llm_int8_has_fp16_weight": false,
+    "llm_int8_skip_modules": null,
+    "llm_int8_threshold": 6.0,
+    "load_in_4bit": false,
+    "load_in_8bit": true,
+    "quant_method": "bitsandbytes"
+  },
+  "residual_multiplier": 0.22,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 10000000.0,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.46.3",
+  "use_cache": false,
+  "vocab_size": 49184
+}

merged/added_tokens.json ADDED Viewed

	@@ -0,0 +1,5 @@

+{
+  "<|end_of_role|>": 49153,
+  "<|start_of_role|>": 49152,
+  "<|tool_call|>": 49154
+}

merged/config.json ADDED Viewed

	@@ -0,0 +1,33 @@

+{
+  "_name_or_path": "ibm-granite/granite-3.1-8b-instruct",
+  "architectures": [
+    "GraniteForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.1,
+  "attention_multiplier": 0.0078125,
+  "bos_token_id": 0,
+  "embedding_multiplier": 12.0,
+  "eos_token_id": 0,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 12800,
+  "logits_scaling": 16.0,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "granite",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 40,
+  "num_key_value_heads": 8,
+  "pad_token_id": 0,
+  "residual_multiplier": 0.22,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 10000000.0,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.46.3",
+  "use_cache": false,
+  "vocab_size": 49184
+}

merged/generation_config.json ADDED Viewed

	@@ -0,0 +1,8 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "do_sample": true,
+  "eos_token_id": 0,
+  "pad_token_id": 0,
+  "transformers_version": "4.46.3"
+}

merged/merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

merged/pytorch_model-00001-of-00004.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:071e61d94737f02b8427c4eef727368d2d44e28b9b779fd2f09ff76e380d7059
+size 4974924377

merged/pytorch_model-00002-of-00004.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b1f5633cbbf0976809272118af68212888d758331af297257007815641ca8470
+size 4991474682

merged/pytorch_model-00003-of-00004.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:35ce5179842eca955de0acca571f58ba2722b3133ecd33a380bdb660baf684f4
+size 4970487165

merged/pytorch_model-00004-of-00004.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:79b986b2865f140d722928c3984b1d27ba31313868bdd49d62df533e5bb097a1
+size 1405177130

merged/pytorch_model.bin.index.json ADDED Viewed

	@@ -0,0 +1,370 @@

+{
+  "metadata": {
+    "total_size": 16341934080
+  },
+  "weight_map": {
+    "lm_head.weight": "pytorch_model-00001-of-00004.bin",
+    "model.embed_tokens.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+    "model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.36.input_layernorm.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.36.mlp.down_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.36.mlp.up_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+    "model.layers.37.input_layernorm.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.37.mlp.down_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.37.mlp.up_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.input_layernorm.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.mlp.down_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.mlp.up_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.input_layernorm.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.mlp.down_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.mlp.up_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00004-of-00004.bin",
+    "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+    "model.norm.weight": "pytorch_model-00004-of-00004.bin"
+  }
+}

merged/special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,35 @@

+{
+  "additional_special_tokens": [
+    "<|start_of_role|>",
+    "<|end_of_role|>",
+    "<|tool_call|>"
+  ],
+  "bos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

merged/tokenizer.json ADDED Viewed

The diff for this file is too large to render. See raw diff

merged/tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,199 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|end_of_text|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<fim_prefix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<fim_middle>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<fim_suffix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<fim_pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<filename>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<gh_stars>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<commit_before>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<commit_msg>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "17": {
+      "content": "<commit_after>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "18": {
+      "content": "<reponame>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49152": {
+      "content": "<|start_of_role|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49153": {
+      "content": "<|end_of_role|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49154": {
+      "content": "<|tool_call|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|start_of_role|>",
+    "<|end_of_role|>",
+    "<|tool_call|>"
+  ],
+  "bos_token": "<|end_of_text|>",
+  "chat_template": "{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content'] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n    {%- if tools and documents %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\n\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n    {%- elif tools %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n    {%- elif documents %}\n        {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n    {%- else %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant.\" %}    \n    {%- endif %}\n    {%- if 'citations' in controls and documents %}\n        {%- set system_message = system_message + '\n\nIn your response, use the symbols <co> and </co> to indicate when a fact comes from a document in the search result, e.g <co>0</co> for a fact from document 0. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n    {%- endif %}\n    {%- if 'hallucinations' in controls and documents %}\n        {%- set system_message = system_message + '\n\nFinally, after the response is written, include a numbered list of sentences from the response that are potentially hallucinated and not based in the documents.' %}\n    {%- endif %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n{{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n{%- if tools %}\n    {{- '<|start_of_role|>tools<|end_of_role|>' }}\n    {{- tools | tojson(indent=4) }}\n    {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- if documents %}\n    {{- '<|start_of_role|>documents<|end_of_role|>' }}\n    {%- for document in documents %}\n        {{- 'Document ' + loop.index0 | string + '\n' }}\n        {{- document['text'] }}\n        {%- if not loop.last %}\n            {{- '\n\n'}}\n        {%- endif%}\n    {%- endfor %}\n    {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- for message in loop_messages %}\n    {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n    {%- if loop.last and add_generation_prompt %}\n        {{- '<|start_of_role|>assistant' }}\n            {%- if controls %}\n                {{- ' ' + controls | tojson()}}\n            {%- endif %}\n        {{- '<|end_of_role|>' }}\n    {%- endif %}\n{%- endfor %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|end_of_text|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 9223372036854775807,
+  "pad_token": "<|end_of_text|>",
+  "padding_side": "left",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|end_of_text|>",
+  "vocab_size": 49152
+}

merged/vocab.json ADDED Viewed

The diff for this file is too large to render. See raw diff

merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,35 @@

+{
+  "additional_special_tokens": [
+    "<|start_of_role|>",
+    "<|end_of_role|>",
+    "<|tool_call|>"
+  ],
+  "bos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,199 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|end_of_text|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<fim_prefix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<fim_middle>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<fim_suffix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<fim_pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<filename>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<gh_stars>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<commit_before>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<commit_msg>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "17": {
+      "content": "<commit_after>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "18": {
+      "content": "<reponame>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49152": {
+      "content": "<|start_of_role|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49153": {
+      "content": "<|end_of_role|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49154": {
+      "content": "<|tool_call|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|start_of_role|>",
+    "<|end_of_role|>",
+    "<|tool_call|>"
+  ],
+  "bos_token": "<|end_of_text|>",
+  "chat_template": "{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content'] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set system_message = \"Knowledge Cutoff Date: April 2024.\nToday's Date: \" + strftime_now('%B %d, %Y') + \".\nYou are Granite, developed by IBM.\" %}\n    {%- if tools and documents %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\n\nWrite the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n    {%- elif tools %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.\" %}\n    {%- elif documents %}\n        {%- set system_message = system_message + \" Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.\" %}\n    {%- else %}\n        {%- set system_message = system_message + \" You are a helpful AI assistant.\" %}    \n    {%- endif %}\n    {%- if 'citations' in controls and documents %}\n        {%- set system_message = system_message + '\n\nIn your response, use the symbols <co> and </co> to indicate when a fact comes from a document in the search result, e.g <co>0</co> for a fact from document 0. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}\n    {%- endif %}\n    {%- if 'hallucinations' in controls and documents %}\n        {%- set system_message = system_message + '\n\nFinally, after the response is written, include a numbered list of sentences from the response that are potentially hallucinated and not based in the documents.' %}\n    {%- endif %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n{{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>\n' }}\n{%- if tools %}\n    {{- '<|start_of_role|>tools<|end_of_role|>' }}\n    {{- tools | tojson(indent=4) }}\n    {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- if documents %}\n    {{- '<|start_of_role|>documents<|end_of_role|>' }}\n    {%- for document in documents %}\n        {{- 'Document ' + loop.index0 | string + '\n' }}\n        {{- document['text'] }}\n        {%- if not loop.last %}\n            {{- '\n\n'}}\n        {%- endif%}\n    {%- endfor %}\n    {{- '<|end_of_text|>\n' }}\n{%- endif %}\n{%- for message in loop_messages %}\n    {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>\n' }}\n    {%- if loop.last and add_generation_prompt %}\n        {{- '<|start_of_role|>assistant' }}\n            {%- if controls %}\n                {{- ' ' + controls | tojson()}}\n            {%- endif %}\n        {{- '<|end_of_role|>' }}\n    {%- endif %}\n{%- endfor %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|end_of_text|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 9223372036854775807,
+  "pad_token": "<|end_of_text|>",
+  "padding_side": "left",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|end_of_text|>",
+  "vocab_size": 49152
+}

vocab.json ADDED Viewed

The diff for this file is too large to render. See raw diff