---
library_name: transformers
license: llama3.1
datasets:
- thesven/Reflective-MAGLLAMA-v0.1.1
- SE6446/MAGllama_Sharegpt
base_model:
- arcee-ai/Llama-3.1-SuperNova-Lite
model-index:
- name: Llama-3.1-SuperNova-Lite-Reflections-3
results: []
tags:
- axolotl
- generated_from_trainer
---
# SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0
This model is a LoRA adaptation of [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite) fine-tuned on [thesven/Reflective-MAGLLAMA-v0.1.1](https://huggingface.co/datasets/thesven/Reflective-MAGLLAMA-v0.1.1).
This was a simple experiment in reflection, and the model appears to perform adequately, though I am unsure whether it is a large improvement.
[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

The axolotl config used for training is shown below (axolotl version `0.4.1`):
```yaml
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
- path: SE6446/MAGllama_Sharegpt
type: sharegpt
conversation: chatml
dataset_prepared_path: /workspace/data/last_run_prepared
val_set_size: 0.05
output_dir: /workspace/data/outputs/out
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
hub_model_id: SE6446/Llama-3.1-SuperNova-Lite-Reflections-3
hub_strategy: every_save
use_auth_token: true
wandb_project: Bojangles
wandb_entity:
wandb_watch:
wandb_name: run-6
wandb_log_model: checkpoint
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00015
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
- embed_tokens
- lm_head
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: false
warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
pad_token: <|end_of_text|>
tokens:
-
-
-
-
-
```
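For reference, the adapter settings in the config above correspond roughly to the following `peft.LoraConfig`. This is a sketch for anyone reproducing the setup outside axolotl; the `target_modules` list is an assumption about how axolotl expands `lora_target_linear: true` for Llama-style blocks.
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,              # lora_r
    lora_alpha=16,     # lora_alpha
    lora_dropout=0.05, # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
    # lora_target_linear: true -> all linear projections in the Llama decoder blocks (assumed expansion)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    # lora_modules_to_save: trained and saved in full so the new CoT tokens get embeddings
    modules_to_save=["embed_tokens", "lm_head"],
)
```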
# Instructions
## Using the HF pipeline
You **must** use the tokenizer provided with the model, as the CoT tokens are unique special tokens.
The model should work on most inference engines that can run Llama 3.1.
```python
from transformers import pipeline
pipe = pipeline("text-generation", "SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0", device_map="auto", trust_remote_code=True)
sys_prompt = "You are an AI assistant who reflects before answering the user." # If the prompt mentions 'reflect', the model will typically do so. To vary the character, append further instructions below this sentence.
user_prompt = "Explain the difference between Newtonian and Keplerian orbits for a five year old." # Classic
messages = [
{
"role": "system",
"content": sys_prompt,
},
{"role": "user", "content": user_prompt}
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = prompt + "" # Optionally append the model's opening reflection token here. Though not strictly necessary, testing showed that omitting it could rarely cause the model to disobey the CoT tokens, which is bad.
# prompt = "<|im_start|>system\n[sys prompt]<|im_end|>\n<|im_start|>user\n[user input]<|im_end|>\n<|im_start|>assistant\n" should do the trick if you like it old school.
text = pipe(prompt, max_new_tokens=1000) # max_new_tokens needs to be reasonably high so the model can adequately perform its reflection AND still output a concise answer.
print(text[0]['generated_text'])
```
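If you would rather not use the pipeline, a minimal sketch with `AutoTokenizer`/`AutoModelForCausalLM` (same repo id, bf16 assumed) looks like this:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # carries the custom CoT special tokens
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are an AI assistant who reflects before answering the user."},
    {"role": "user", "content": "Explain the difference between Newtonian and Keplerian orbits for a five year old."},
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Leave enough room for the reflection plus the final answer.
output_ids = model.generate(input_ids, max_new_tokens=1000)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False))
```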
# Training details
It achieves the following results on the evaluation set:
- Loss: 0.6365
## Training procedure
I trained it as a LoRA not only because it is cheap, but because it preserves as many of the original parameters as possible. I just wanted the model to get used to CoT-style reflection.
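If the repository exposes the PEFT adapter itself (rather than merged weights), applying it to the base model by hand might look like the sketch below. The `resize_token_embeddings` call is an assumption needed because `embed_tokens`/`lm_head` were saved alongside the adapter with the extra CoT tokens.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "arcee-ai/Llama-3.1-SuperNova-Lite"
adapter_id = "SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0"

# Use the adapter's tokenizer: it contains the added CoT special tokens.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
base.resize_token_embeddings(len(tokenizer))  # match the enlarged vocabulary before loading the adapter

model = PeftModel.from_pretrained(base, adapter_id)
model = model.merge_and_unload()  # optional: fold the LoRA weights back into the base model
```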
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.00015
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 2
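Outside axolotl, these settings map roughly onto `transformers.TrainingArguments` as sketched below (per device; axolotl configures the trainer itself, so treat this as an approximation, and the output path is a placeholder):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/llama-3.1-supernova-lite-reflection",  # placeholder path
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 1 x 4 GPUs x 2 accumulation = effective train batch size of 8
    num_train_epochs=2,
    learning_rate=1.5e-4,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    weight_decay=0.0,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)
```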
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.7211 | 0.0049 | 1 | 1.4048 |
| 0.6381 | 0.5 | 103 | 0.6583 |
| 0.4985 | 1.0049 | 206 | 0.6320 |
| 0.4992 | 1.5049 | 309 | 0.6365 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1