Built with Axolotl

See axolotl config

axolotl version: 0.4.1

```yaml
adapter: lora
base_model: Qwen/Qwen2-1.5B-Instruct
batch_size: 8
bf16: true
chat_template: tokenizer_default_fallback_alpaca
datasets:
- data_files:
  - 19637e66dc3ec99a_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/19637e66dc3ec99a_train_data.json
  type:
    field_instruction: drugName
    field_output: review
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
early_stopping_patience: 3
eval_steps: 50
flash_attention: true
gpu_memory_limit: 80GiB
gradient_checkpointing: true
group_by_length: true
hub_model_id: willtensora/0eda4152-e58c-4e24-b30e-71e456fb3b24
hub_strategy: checkpoint
learning_rate: 0.0002
logging_steps: 10
lora_alpha: 256
lora_dropout: 0.1
lora_r: 128
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 1
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resize_token_embeddings_to_32x: false
sample_packing: false
save_steps: 50
sequence_len: 2048
tokenizer_type: Qwen2TokenizerFast
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.1
wandb_entity: ''
wandb_mode: online
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: default
warmup_ratio: 0.05
xformers_attention: true
```

0eda4152-e58c-4e24-b30e-71e456fb3b24

This model is a LoRA fine-tuned version of Qwen/Qwen2-1.5B-Instruct on the 19637e66dc3ec99a_train_data.json dataset (see the axolotl config above). It achieves the following results on the evaluation set:

  • Loss: 2.4073

Model description

More information needed

Intended uses & limitations

More information needed
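
Because the config above trains a LoRA adapter rather than full model weights, inference requires loading the adapter on top of Qwen/Qwen2-1.5B-Instruct. Below is a minimal sketch using Transformers and PEFT, assuming the adapter weights are the ones pushed to the hub_model_id in the config; the prompt text and generation settings are illustrative only.

```python
# Minimal inference sketch (not an official recipe): load the LoRA adapter
# published at the config's hub_model_id on top of the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2-1.5B-Instruct"
adapter_id = "willtensora/0eda4152-e58c-4e24-b30e-71e456fb3b24"  # hub_model_id above

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Illustrative prompt only: per the dataset config, training prompts were just
# the drugName field, so a bare drug name is used here.
inputs = tokenizer("Ibuprofen", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```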

Training and evaluation data

More information needed
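
Per the config above, training examples come from 19637e66dc3ec99a_train_data.json, with drugName used as the instruction, review as the target output, and 10% of the data (val_set_size: 0.1) held out for evaluation. A rough sketch of that mapping follows; the helper function and the assumption that the file is a top-level JSON array are illustrative, not Axolotl's actual loader.

```python
# Illustrative only: mirrors the dataset `type` block in the config
# (field_instruction: drugName, field_output: review, format: '{instruction}').
import json
import random

def to_example(record: dict) -> dict:
    """Map one raw record to a prompt/completion pair."""
    return {
        "prompt": "{instruction}".format(instruction=record["drugName"]),
        # Loss is computed only on the output (train_on_inputs: false).
        "completion": record["review"],
    }

with open("19637e66dc3ec99a_train_data.json") as f:
    records = json.load(f)  # assumes a top-level JSON array

examples = [to_example(r) for r in records]
random.seed(42)
random.shuffle(examples)
cut = int(len(examples) * 0.9)  # val_set_size: 0.1
train_set, eval_set = examples[:cut], examples[cut:]
```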

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 8
  • total_eval_batch_size: 8
  • optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 15107
  • num_epochs: 100
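
The effective batch size above is simply the per-device batch size multiplied by the number of GPUs; a quick check:

```python
# total_train_batch_size = per-device batch size x number of devices
# (8 = 1 x 8, which implies no gradient accumulation).
per_device_train_batch_size = 1
num_devices = 8
total_train_batch_size = per_device_train_batch_size * num_devices
assert total_train_batch_size == 8
```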

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log | 0.0000 | 1 | 3.1066 |
| 3.0737 | 0.0021 | 50 | 3.0943 |
| 3.2193 | 0.0041 | 100 | 3.0057 |
| 2.9091 | 0.0062 | 150 | 2.8280 |
| 2.8518 | 0.0083 | 200 | 2.6914 |
| 2.7049 | 0.0103 | 250 | 2.5964 |
| 2.5077 | 0.0124 | 300 | 2.5624 |
| 2.5767 | 0.0145 | 350 | 2.5434 |
| 2.4882 | 0.0165 | 400 | 2.5289 |
| 2.5446 | 0.0186 | 450 | 2.5212 |
| 2.5746 | 0.0207 | 500 | 2.5130 |
| 2.552 | 0.0228 | 550 | 2.5067 |
| 2.5758 | 0.0248 | 600 | 2.5002 |
| 2.5321 | 0.0269 | 650 | 2.4943 |
| 2.5634 | 0.0290 | 700 | 2.4918 |
| 2.4308 | 0.0310 | 750 | 2.4876 |
| 2.5713 | 0.0331 | 800 | 2.4831 |
| 2.3993 | 0.0352 | 850 | 2.4820 |
| 2.4609 | 0.0372 | 900 | 2.4766 |
| 2.4981 | 0.0393 | 950 | 2.4738 |
| 2.5594 | 0.0414 | 1000 | 2.4705 |
| 2.5697 | 0.0434 | 1050 | 2.4702 |
| 2.5192 | 0.0455 | 1100 | 2.4677 |
| 2.5156 | 0.0476 | 1150 | 2.4649 |
| 2.5819 | 0.0496 | 1200 | 2.4638 |
| 2.5288 | 0.0517 | 1250 | 2.4595 |
| 2.4565 | 0.0538 | 1300 | 2.4585 |
| 2.4487 | 0.0558 | 1350 | 2.4557 |
| 2.5059 | 0.0579 | 1400 | 2.4531 |
| 2.4266 | 0.0600 | 1450 | 2.4537 |
| 2.4951 | 0.0621 | 1500 | 2.4544 |
| 2.4606 | 0.0641 | 1550 | 2.4467 |
| 2.3836 | 0.0662 | 1600 | 2.4453 |
| 2.4641 | 0.0683 | 1650 | 2.4461 |
| 2.4473 | 0.0703 | 1700 | 2.4432 |
| 2.3924 | 0.0724 | 1750 | 2.4418 |
| 2.4956 | 0.0745 | 1800 | 2.4415 |
| 2.5065 | 0.0765 | 1850 | 2.4377 |
| 2.57 | 0.0786 | 1900 | 2.4399 |
| 2.4057 | 0.0807 | 1950 | 2.4357 |
| 2.4555 | 0.0827 | 2000 | 2.4350 |
| 2.5578 | 0.0848 | 2050 | 2.4339 |
| 2.4314 | 0.0869 | 2100 | 2.4340 |
| 2.4294 | 0.0889 | 2150 | 2.4317 |
| 2.4092 | 0.0910 | 2200 | 2.4324 |
| 2.5031 | 0.0931 | 2250 | 2.4289 |
| 2.3989 | 0.0952 | 2300 | 2.4276 |
| 2.4823 | 0.0972 | 2350 | 2.4259 |
| 2.4884 | 0.0993 | 2400 | 2.4242 |
| 2.3923 | 0.1014 | 2450 | 2.4255 |
| 2.4107 | 0.1034 | 2500 | 2.4272 |
| 2.4565 | 0.1055 | 2550 | 2.4235 |
| 2.3695 | 0.1076 | 2600 | 2.4228 |
| 2.4399 | 0.1096 | 2650 | 2.4229 |
| 2.4686 | 0.1117 | 2700 | 2.4197 |
| 2.4199 | 0.1138 | 2750 | 2.4173 |
| 2.3615 | 0.1158 | 2800 | 2.4185 |
| 2.4635 | 0.1179 | 2850 | 2.4190 |
| 2.4492 | 0.1200 | 2900 | 2.4157 |
| 2.4444 | 0.1220 | 2950 | 2.4166 |
| 2.4057 | 0.1241 | 3000 | 2.4142 |
| 2.3822 | 0.1262 | 3050 | 2.4137 |
| 2.3831 | 0.1282 | 3100 | 2.4122 |
| 2.376 | 0.1303 | 3150 | 2.4140 |
| 2.4278 | 0.1324 | 3200 | 2.4109 |
| 2.3976 | 0.1345 | 3250 | 2.4121 |
| 2.3883 | 0.1365 | 3300 | 2.4099 |
| 2.4337 | 0.1386 | 3350 | 2.4095 |
| 2.3364 | 0.1407 | 3400 | 2.4066 |
| 2.3768 | 0.1427 | 3450 | 2.4065 |
| 2.4395 | 0.1448 | 3500 | 2.4081 |
| 2.2957 | 0.1469 | 3550 | 2.4069 |
| 2.396 | 0.1489 | 3600 | 2.4058 |
| 2.4117 | 0.1510 | 3650 | 2.4072 |
| 2.3691 | 0.1531 | 3700 | 2.4091 |
| 2.3721 | 0.1551 | 3750 | 2.4073 |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • PyTorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
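
As a rough way to verify a local environment against these versions (an illustrative check, not part of the original card):

```python
# Illustrative version check against the framework versions listed above.
import importlib.metadata as md

expected = {
    "peft": "0.13.2",
    "transformers": "4.46.0",
    "torch": "2.5.0",  # card lists 2.5.0+cu124; local builds may differ in the +cu suffix
    "datasets": "3.0.1",
    "tokenizers": "0.20.1",
}
for package, wanted in expected.items():
    installed = md.version(package)
    print(f"{package}: installed {installed}, card lists {wanted}")
```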