See axolotl config

axolotl version: 0.4.1

adapter: lora
base_model: TinyLlama/TinyLlama-1.1B-Chat-v0.6
batch_size: 8
bf16: true
chat_template: tokenizer_default_fallback_alpaca
datasets:
- data_files:
  - 11bfaf21b106be7f_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/11bfaf21b106be7f_train_data.json
  type:
    field_input: project_and_commit_id
    field_instruction: source
    field_output: target
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
early_stopping_patience: 3
eval_steps: 50
flash_attention: true
gpu_memory_limit: 80GiB
gradient_checkpointing: true
group_by_length: true
hub_model_id: willtensora/ad641a5b-ef11-4278-80e4-9119f53c47f4
hub_strategy: checkpoint
learning_rate: 0.0002
logging_steps: 10
lora_alpha: 256
lora_dropout: 0.1
lora_r: 128
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 1
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resize_token_embeddings_to_32x: false
sample_packing: false
save_steps: 50
sequence_len: 2048
tokenizer_type: LlamaTokenizerFast
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.1
wandb_entity: ''
wandb_mode: online
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: default
warmup_ratio: 0.05
xformers_attention: true
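
The `type` block in the config maps three JSON fields onto an instruction/input/output prompt. As a rough illustration only (not axolotl's actual implementation, which also handles the system prompt and tokenization), the mapping can be sketched as follows. The sample record is hypothetical; the field names and templates are copied from the config.

```python
# Field mapping and templates copied from the config above.
FIELD_INSTRUCTION = "source"
FIELD_INPUT = "project_and_commit_id"
FIELD_OUTPUT = "target"
FORMAT = "{instruction} {input}"
NO_INPUT_FORMAT = "{instruction}"


def build_example(record: dict) -> tuple[str, str]:
    """Return (prompt, completion) for one dataset record."""
    instruction = record[FIELD_INSTRUCTION]
    input_text = record.get(FIELD_INPUT, "")
    if input_text:
        prompt = FORMAT.format(instruction=instruction, input=input_text)
    else:
        # Records without an input fall back to the instruction-only template.
        prompt = NO_INPUT_FORMAT.format(instruction=instruction)
    return prompt, record[FIELD_OUTPUT]


# Hypothetical record, for illustration only (not taken from the dataset).
record = {
    "source": "fix typo in readme",
    "project_and_commit_id": "demo/abc123",
    "target": "Fix typo",
}
prompt, completion = build_example(record)
print(prompt)      # fix typo in readme demo/abc123
print(completion)  # Fix typo
```

With `train_on_inputs: false`, the loss is computed on the completion tokens only, not on the prompt.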

ad641a5b-ef11-4278-80e4-9119f53c47f4

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v0.6 on a custom JSON dataset (11bfaf21b106be7f_train_data.json). It achieves the following results on the evaluation set:

Loss: 0.2704

Model description

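A LoRA fine-tune (rank 128, alpha 256, dropout 0.1, applied to all linear layers) of TinyLlama/TinyLlama-1.1B-Chat-v0.6, trained with axolotl 0.4.1.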

Intended uses & limitations

More information needed

Training and evaluation data

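The data is a single JSON file, 11bfaf21b106be7f_train_data.json, whose source, project_and_commit_id, and target fields are mapped to the instruction, input, and output respectively. A fraction of 0.1 of the examples is held out for evaluation (val_set_size: 0.1).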

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 8
total_eval_batch_size: 8
optimizer: OptimizerNames.ADAMW_BNB (adamw_bnb_8bit in the config) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 589
num_epochs: 100
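
To make the batch-size and scheduler settings above concrete, here is a minimal sketch of the arithmetic. It assumes a gradient accumulation of 1 (the list above does not report one) and follows the usual linear-warmup-then-cosine-decay shape. The total step count is not stated in this card, so `total_steps` is a hypothetical parameter.

```python
import math

BASE_LR = 0.0002      # learning_rate
WARMUP_STEPS = 589    # lr_scheduler_warmup_steps


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Total examples per optimizer step (grad_accum=1 is an assumption)."""
    return per_device * num_devices * grad_accum


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to BASE_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# train_batch_size=1 on 8 devices matches total_train_batch_size=8.
print(effective_batch_size(per_device=1, num_devices=8))

# Hypothetical total step count, for illustration only.
total_steps = 10_000
for step in (0, 300, 589, 2200, total_steps):
    print(step, lr_at_step(step, total_steps))
```

Under these assumptions the learning rate peaks at 0.0002 at step 589 and decays from there, so every row of the results table from step 600 onward falls in the decay phase.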

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0.0011	1	2.0060
1.2664	0.0530	50	1.2713
0.9659	0.1060	100	0.9642
0.7596	0.1591	150	0.8311
0.7092	0.2121	200	0.7583
0.6691	0.2651	250	0.6944
0.65	0.3181	300	0.6577
0.6149	0.3712	350	0.6268
0.5929	0.4242	400	0.5945
0.5319	0.4772	450	0.5820
0.5136	0.5302	500	0.5576
0.5258	0.5832	550	0.5367
0.4476	0.6363	600	0.5141
0.5018	0.6893	650	0.4943
0.4851	0.7423	700	0.4861
0.41	0.7953	750	0.4693
0.4625	0.8484	800	0.4552
0.4909	0.9014	850	0.4421
0.3885	0.9544	900	0.4196
0.3408	1.0074	950	0.4111
0.2804	1.0604	1000	0.4020
0.3503	1.1135	1050	0.3875
0.291	1.1665	1100	0.3958
0.3025	1.2195	1150	0.3849
0.2749	1.2725	1200	0.3729
0.3222	1.3256	1250	0.3631
0.2895	1.3786	1300	0.3570
0.2994	1.4316	1350	0.3470
0.3055	1.4846	1400	0.3431
0.2252	1.5376	1450	0.3351
0.2816	1.5907	1500	0.3214
0.3065	1.6437	1550	0.3163
0.2727	1.6967	1600	0.3158
0.2673	1.7497	1650	0.3123
0.276	1.8028	1700	0.3090
0.217	1.8558	1750	0.3021
0.2712	1.9088	1800	0.2950
0.2175	1.9618	1850	0.2927
0.1561	2.0148	1900	0.2911
0.1557	2.0679	1950	0.2773
0.1404	2.1209	2000	0.2725
0.1386	2.1739	2050	0.2696
0.1224	2.2269	2100	0.2780
0.1535	2.2800	2150	0.2713
0.1494	2.3330	2200	0.2704
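
The run ends at step 2200, well short of the configured 100 epochs. The card does not state why, but the tail of the table is consistent with `early_stopping_patience: 3`. A small sketch that counts evaluations since the best validation loss, using the last eight rows of the table:

```python
# (step, validation loss) pairs copied from the last eight rows of the table.
EVALS = [
    (1850, 0.2927), (1900, 0.2911), (1950, 0.2773), (2000, 0.2725),
    (2050, 0.2696), (2100, 0.2780), (2150, 0.2713), (2200, 0.2704),
]


def evals_since_best(evals):
    """Return (best_step, best_loss, evaluations without improvement since)."""
    best_step, best_loss, stale = None, float("inf"), 0
    for step, loss in evals:
        if loss < best_loss:
            best_step, best_loss, stale = step, loss, 0
        else:
            stale += 1
    return best_step, best_loss, stale


print(evals_since_best(EVALS))  # (2050, 0.2696, 3)
```

The lowest validation loss in the table is 0.2696 at step 2050. The headline loss of 0.2704 is the final evaluation at step 2200, three evaluations later.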

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.5.0+cu124
Datasets 3.0.1
Tokenizers 0.20.1
