---
library_name: transformers
license: llama3.1
datasets:
- thesven/Reflective-MAGLLAMA-v0.1.1
- SE6446/MAGllama_Sharegpt
base_model:
- arcee-ai/Llama-3.1-SuperNova-Lite
model-index:
- name: Llama-3.1-SuperNova-Lite-Reflections-3
results: []
tags:
- axolotl
- generated_from_trainer
---
# SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0
This model is a LoRA adaptation of [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite) fine-tuned on [thesven/Reflective-MAGLLAMA-v0.1.1](https://huggingface.co/datasets/thesven/Reflective-MAGLLAMA-v0.1.1).
This was a simple experiment in reflection, and the model appears to perform adequately, though I am unsure whether it is a large improvement.
[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

The axolotl config used for training is shown below (axolotl version `0.4.1`):
```yaml
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
- path: SE6446/MAGllama_Sharegpt
type: sharegpt
conversation: chatml
dataset_prepared_path: /workspace/data/last_run_prepared
val_set_size: 0.05
output_dir: /workspace/data/outputs/out
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
hub_model_id: SE6446/Llama-3.1-SuperNova-Lite-Reflections-3
hub_strategy: every_save
use_auth_token: true
wandb_project: Bojangles
wandb_entity:
wandb_watch:
wandb_name: run-6
wandb_log_model: checkpoint
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00015
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
- embed_tokens
- lm_head
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: false
warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
pad_token: <|end_of_text|>
tokens:
-
-
-
-
-
```
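For reference, the adapter settings in the config above correspond roughly to the following `peft.LoraConfig`. This is a sketch for anyone reproducing the setup outside axolotl; the `target_modules` list is an assumption about how axolotl expands `lora_target_linear: true` for Llama-style blocks.
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,              # lora_r
    lora_alpha=16,     # lora_alpha
    lora_dropout=0.05, # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
    # lora_target_linear: true -> all linear projections in the Llama decoder blocks (assumed expansion)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    # lora_modules_to_save: trained and saved in full so the new CoT tokens get embeddings
    modules_to_save=["embed_tokens", "lm_head"],
)
```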
# Instructions
## Using the HF pipeline
You **must** use the tokenizer provided with the model, as the CoT tokens are unique special tokens.
The model should work on most inference engines that can run Llama 3.1.
```python
from transformers import pipeline
pipe = pipeline("text-generation", "SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0", device_map="auto", trust_remote_code=True)
sys_prompt = "You are an AI assistant who reflects before answering the user." # If the prompt mentions 'reflect', the model will typically do so. To vary the character, append further instructions below this sentence.
user_prompt = "Explain the difference between Newtonian and Keplerian orbits for a five year old." # Classic
messages = [
{
"role": "system",
"content": sys_prompt,
},
{"role": "user", "content": user_prompt}
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = prompt + "" # Optionally append the model's opening reflection token here. Though not strictly necessary, testing showed that omitting it could rarely cause the model to disobey the CoT tokens, which is bad.
# prompt = "<|im_start|>system\n[sys prompt]<|im_end|>\n<|im_start|>user\n[user input]<|im_end|>\n<|im_start|>assistant\n" should do the trick if you like it old school.
text = pipe(prompt, max_new_tokens=1000) # max_new_tokens needs to be reasonably high so the model can adequately perform its reflection AND still output a concise answer.
print(text[0]['generated_text'])
```
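If you would rather not use the pipeline, a minimal sketch with `AutoTokenizer`/`AutoModelForCausalLM` (same repo id, bf16 assumed) looks like this:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # carries the custom CoT special tokens
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are an AI assistant who reflects before answering the user."},
    {"role": "user", "content": "Explain the difference between Newtonian and Keplerian orbits for a five year old."},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Leave enough room for the reflection plus the final answer.
output_ids = model.generate(input_ids, max_new_tokens=1000)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False))
```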
# Training details
It achieves the following results on the evaluation set:
- Loss: 0.6365
## Training procedure
I trained it as a LoRA not only because it is cheap, but because it preserves as many of the original parameters as possible. I just wanted the model to get used to CoT-style reflection.
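If the repository exposes the PEFT adapter itself (rather than merged weights), applying it to the base model by hand might look like the sketch below. The `resize_token_embeddings` call is an assumption needed because `embed_tokens`/`lm_head` were saved alongside the adapter with the extra CoT tokens.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "arcee-ai/Llama-3.1-SuperNova-Lite"
adapter_id = "SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0"

# Use the adapter's tokenizer: it contains the added CoT special tokens.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
base.resize_token_embeddings(len(tokenizer))  # match the enlarged vocabulary before loading the adapter

model = PeftModel.from_pretrained(base, adapter_id)
model = model.merge_and_unload()  # optional: fold the LoRA weights back into the base model
```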
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.00015
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 2
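Outside axolotl, these settings map roughly onto `transformers.TrainingArguments` as sketched below (per device; axolotl configures the trainer itself, so treat this as an approximation, and the output path is a placeholder):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/llama-3.1-supernova-lite-reflection",  # placeholder path
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 1 x 4 GPUs x 2 accumulation = effective train batch size of 8
    num_train_epochs=2,
    learning_rate=1.5e-4,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    weight_decay=0.0,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)
```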
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.7211 | 0.0049 | 1 | 1.4048 |
| 0.6381 | 0.5 | 103 | 0.6583 |
| 0.4985 | 1.0049 | 206 | 0.6320 |
| 0.4992 | 1.5049 | 309 | 0.6365 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1