SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0
This model is a LoRA adaptation of arcee-ai/Llama-3.1-SuperNova-Lite on thesven/Reflective-MAGLLAMA-v0.1.1. This has been a simple experiment into reflection and the model appears to perform adequately, though I am unsure if it is a large improvement.
See axolotl config
axolotl version: 0.4.1
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
- path: SE6446/MAGllama_Sharegpt
type: sharegpt
conversation: chatml
dataset_prepared_path: /workspace/data/last_run_prepared
val_set_size: 0.05
output_dir: /workspace/data/outputs/out
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
hub_model_id: SE6446/Llama-3.1-SuperNova-Lite-Reflections-3
hub_strategy: every_save
use_auth_token: true
wandb_project: Bojangles
wandb_entity:
wandb_watch:
wandb_name: run-6
wandb_log_model: checkpoint
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00015
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
- embed_tokens
- lm_head
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: false
warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
pad_token: <|end_of_text|>
tokens:
- <thinking>
- </thinking>
- <reflection>
- </reflection>
- <output>
- </output>
Instructions
Using hf pipeline
You must use the tokenizer provided with the model as the COT tokens are unique special tokens. It should work on most inference engines that can run llama 3.1
from transformers import pipeline
pipe = pipeline("text-generation", "SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0", device_map="auto",trust_remote_code=True)
sys_prompt = "You are an AI assistant who reflects before answering the user." #If you put 'reflect' it will typically do so. If you want to vary the character just append it under this.
user_prompt = "Explain the difference between Newtonian and Keplerian orbits for a five year old." #Classic
messages = [
{
"role": "system",
"content": sys_prompt,
},
{"role": "user", "content": user_prompt}
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = prompt + "<thinking>" #Though not necessary, putting <thinking> under the new line does ensure it reflects. Testing revealed not doing this could cause it to rarely disobey the tokens. Which is bad.
# prompt = "<|im_start|>assistant\n[sys prompt]<|im_end|><|im_start|>user\n[user input]<|im_end|><|im_start|>assistant\n<thinking>" should do the trick if you like it old school.
text = pipe(prompt, max_new_tokens=1000) #max_new_tokens needs to be decently high so it may adequatley perform it's reflection AND output a concise answer.
print(text[0]['generated_text'])
Training details
It achieves the following results on the evaluation set:
- Loss: 0.6365
Training procedure
I trained it as a LoRA not only because it is cheap, but because it tries to preserve as much of the original parameters as possible. I just wanted it to get used to COT.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.00015
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 2
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.7211 | 0.0049 | 1 | 1.4048 |
0.6381 | 0.5 | 103 | 0.6583 |
0.4985 | 1.0049 | 206 | 0.6320 |
0.4992 | 1.5049 | 309 | 0.6365 |
Framework versions
- PEFT 0.12.0
- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
- Downloads last month
- 35
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for SE6446/Llama-3.1-SuperNova-Lite-Reflection-V1.0
Base model
meta-llama/Llama-3.1-8B
Finetuned
meta-llama/Llama-3.1-8B-Instruct
Finetuned
arcee-ai/Llama-3.1-SuperNova-Lite