See axolotl config

axolotl version: 0.4.1

adapter: lora
base_model: Qwen/Qwen2-1.5B-Instruct
batch_size: 8
bf16: true
chat_template: tokenizer_default_fallback_alpaca
datasets:
- data_files:
  - 19637e66dc3ec99a_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/19637e66dc3ec99a_train_data.json
  type:
    field_instruction: drugName
    field_output: review
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
early_stopping_patience: 3
eval_steps: 50
flash_attention: true
gpu_memory_limit: 80GiB
gradient_checkpointing: true
group_by_length: true
hub_model_id: willtensora/0eda4152-e58c-4e24-b30e-71e456fb3b24
hub_strategy: checkpoint
learning_rate: 0.0002
logging_steps: 10
lora_alpha: 256
lora_dropout: 0.1
lora_r: 128
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 1
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resize_token_embeddings_to_32x: false
sample_packing: false
save_steps: 50
sequence_len: 2048
tokenizer_type: Qwen2TokenizerFast
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.1
wandb_entity: ''
wandb_mode: online
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: default
warmup_ratio: 0.05
xformers_attention: true
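
As a rough illustration of the custom dataset type in the config above, the sketch below maps one JSON record to a prompt/target pair. It assumes that '{instruction}' is filled from the drugName field and that the review field is the training target (with train_on_inputs: false, loss would apply to the target only). The helper name build_example and the sample record are hypothetical, and this is not axolotl's actual implementation.

```python
# Hypothetical helper mirroring the `type:` block of the config; not axolotl code.
FIELD_INSTRUCTION = "drugName"      # field_instruction
FIELD_OUTPUT = "review"             # field_output
NO_INPUT_FORMAT = "{instruction}"   # no_input_format (no input field is configured)


def build_example(record: dict) -> dict:
    """Return the prompt and target text for one dataset record."""
    prompt = NO_INPUT_FORMAT.format(instruction=record[FIELD_INSTRUCTION])
    return {"prompt": prompt, "target": record[FIELD_OUTPUT]}


# Made-up record, only to show the shape of the data.
sample = {"drugName": "ExampleDrug", "review": "Worked well for me."}
example = build_example(sample)
print(example)
```

Under these assumptions the prompt is just the drug name, and the model learns to continue it with review text.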

0eda4152-e58c-4e24-b30e-71e456fb3b24

This model is a LoRA fine-tuned version of Qwen/Qwen2-1.5B-Instruct on a local JSON dataset (19637e66dc3ec99a_train_data.json); no dataset name was recorded. It achieves the following results on the evaluation set:

Loss: 2.4073
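
For intuition, the loss can be converted to a perplexity. This small sketch assumes the reported value is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in the card.

```python
import math

eval_loss = 2.4073  # evaluation loss reported above
# Assumption: the loss is mean per-token cross-entropy measured in nats.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the perplexity is roughly 11.1.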

Model description

More information needed

Intended uses & limitations

More information needed
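
A minimal loading sketch, under several assumptions: that the repository named in hub_model_id holds the LoRA adapter weights at its root, that the base model is the one listed in the config, and that a bare drug name is a suitable prompt (as the dataset format suggests). The function name generate_review and the generation settings are illustrative only.

```python
BASE_MODEL = "Qwen/Qwen2-1.5B-Instruct"
ADAPTER_REPO = "willtensora/0eda4152-e58c-4e24-b30e-71e456fb3b24"


def generate_review(drug_name: str, max_new_tokens: int = 128) -> str:
    """Load the base model plus adapter and generate text for a drug name."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    # Assumption: the adapter files sit at the root of the Hub repository.
    model = PeftModel.from_pretrained(base, ADAPTER_REPO)
    model.eval()

    inputs = tokenizer(drug_name, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling generate_review with a drug name downloads both the base model and the adapter, so it needs network access and enough memory for a 1.5B-parameter model.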

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 8
total_eval_batch_size: 8
optimizer: OptimizerNames.ADAMW_BNB (adamw_bnb_8bit in the config) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 15107
num_epochs: 100
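
The arithmetic behind some of these values can be sketched as follows. The gradient accumulation value, the way warmup steps were derived from warmup_ratio, and the linear shape of the warmup ramp are assumptions; step 3750 is the last step logged in the training results.

```python
train_batch_size = 1             # per-device batch size (micro_batch_size in the config)
num_devices = 8
gradient_accumulation_steps = 1  # assumption: not listed in the card

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)    # 8, matching the reported total_train_batch_size

# Assumption: warmup steps were derived as warmup_ratio * planned optimizer steps.
warmup_steps = 15107
warmup_ratio = 0.05
planned_steps_estimate = round(warmup_steps / warmup_ratio)
print(planned_steps_estimate)

# Assumption: the learning rate ramps linearly from 0 to its peak during warmup.
peak_lr = 0.0002
last_logged_step = 3750
lr_at_stop = peak_lr * min(1.0, last_logged_step / warmup_steps)
print(f"{lr_at_stop:.2e}")
```

If these assumptions hold, the run ended while still inside the warmup phase, at about a quarter of the peak learning rate.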

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0.0000	1	3.1066
3.0737	0.0021	50	3.0943
3.2193	0.0041	100	3.0057
2.9091	0.0062	150	2.8280
2.8518	0.0083	200	2.6914
2.7049	0.0103	250	2.5964
2.5077	0.0124	300	2.5624
2.5767	0.0145	350	2.5434
2.4882	0.0165	400	2.5289
2.5446	0.0186	450	2.5212
2.5746	0.0207	500	2.5130
2.552	0.0228	550	2.5067
2.5758	0.0248	600	2.5002
2.5321	0.0269	650	2.4943
2.5634	0.0290	700	2.4918
2.4308	0.0310	750	2.4876
2.5713	0.0331	800	2.4831
2.3993	0.0352	850	2.4820
2.4609	0.0372	900	2.4766
2.4981	0.0393	950	2.4738
2.5594	0.0414	1000	2.4705
2.5697	0.0434	1050	2.4702
2.5192	0.0455	1100	2.4677
2.5156	0.0476	1150	2.4649
2.5819	0.0496	1200	2.4638
2.5288	0.0517	1250	2.4595
2.4565	0.0538	1300	2.4585
2.4487	0.0558	1350	2.4557
2.5059	0.0579	1400	2.4531
2.4266	0.0600	1450	2.4537
2.4951	0.0621	1500	2.4544
2.4606	0.0641	1550	2.4467
2.3836	0.0662	1600	2.4453
2.4641	0.0683	1650	2.4461
2.4473	0.0703	1700	2.4432
2.3924	0.0724	1750	2.4418
2.4956	0.0745	1800	2.4415
2.5065	0.0765	1850	2.4377
2.57	0.0786	1900	2.4399
2.4057	0.0807	1950	2.4357
2.4555	0.0827	2000	2.4350
2.5578	0.0848	2050	2.4339
2.4314	0.0869	2100	2.4340
2.4294	0.0889	2150	2.4317
2.4092	0.0910	2200	2.4324
2.5031	0.0931	2250	2.4289
2.3989	0.0952	2300	2.4276
2.4823	0.0972	2350	2.4259
2.4884	0.0993	2400	2.4242
2.3923	0.1014	2450	2.4255
2.4107	0.1034	2500	2.4272
2.4565	0.1055	2550	2.4235
2.3695	0.1076	2600	2.4228
2.4399	0.1096	2650	2.4229
2.4686	0.1117	2700	2.4197
2.4199	0.1138	2750	2.4173
2.3615	0.1158	2800	2.4185
2.4635	0.1179	2850	2.4190
2.4492	0.1200	2900	2.4157
2.4444	0.1220	2950	2.4166
2.4057	0.1241	3000	2.4142
2.3822	0.1262	3050	2.4137
2.3831	0.1282	3100	2.4122
2.376	0.1303	3150	2.4140
2.4278	0.1324	3200	2.4109
2.3976	0.1345	3250	2.4121
2.3883	0.1365	3300	2.4099
2.4337	0.1386	3350	2.4095
2.3364	0.1407	3400	2.4066
2.3768	0.1427	3450	2.4065
2.4395	0.1448	3500	2.4081
2.2957	0.1469	3550	2.4069
2.396	0.1489	3600	2.4058
2.4117	0.1510	3650	2.4072
2.3691	0.1531	3700	2.4091
2.3721	0.1551	3750	2.4073
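
The table ends at step 3750 even though num_epochs is 100, which is consistent with early_stopping_patience: 3. The sketch below applies a simplified patience rule to the last few evaluations from the table; it illustrates the idea and is not the Trainer's own early stopping callback.

```python
# (step, validation loss) for the last evaluations in the table above
evals = [
    (3450, 2.4065), (3500, 2.4081), (3550, 2.4069), (3600, 2.4058),
    (3650, 2.4072), (3700, 2.4091), (3750, 2.4073),
]
patience = 3  # early_stopping_patience from the config


def stop_step(evals, patience):
    """Return (stop_step, best_step, best_loss); stop_step is None if never triggered."""
    best_loss, best_step, bad_evals = float("inf"), None, 0
    for step, loss in evals:
        if loss < best_loss:
            # New best: remember it and reset the counter.
            best_loss, best_step, bad_evals = loss, step, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return step, best_step, best_loss
    return None, best_step, best_loss


result = stop_step(evals, patience)
print(result)  # (3750, 3600, 2.4058)
```

The lowest validation loss in the table is 2.4058 at step 3600. The three evaluations that follow do not improve on it, and the reported evaluation loss of 2.4073 is the value at the final step, 3750.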

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.5.0+cu124
Datasets 3.0.1
Tokenizers 0.20.1
