willtensora committed
Commit d0b9383 · verified
1 Parent(s): 0181873

End of training

Files changed (3)
  1. README.md +204 -0
  2. adapter_model.bin +3 -0
  3. adapter_model.safetensors +1 -1
README.md ADDED
@@ -0,0 +1,204 @@
+ ---
+ library_name: peft
+ license: apache-2.0
+ base_model: Qwen/Qwen2-1.5B-Instruct
+ tags:
+ - axolotl
+ - generated_from_trainer
+ model-index:
+ - name: 0eda4152-e58c-4e24-b30e-71e456fb3b24
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.1`
+ ```yaml
+ adapter: lora
+ base_model: Qwen/Qwen2-1.5B-Instruct
+ batch_size: 8
+ bf16: true
+ chat_template: tokenizer_default_fallback_alpaca
+ datasets:
+ - data_files:
+   - 19637e66dc3ec99a_train_data.json
+   ds_type: json
+   format: custom
+   path: /workspace/input_data/19637e66dc3ec99a_train_data.json
+   type:
+     field_instruction: drugName
+     field_output: review
+     format: '{instruction}'
+     no_input_format: '{instruction}'
+     system_format: '{system}'
+     system_prompt: ''
+ early_stopping_patience: 3
+ eval_steps: 50
+ flash_attention: true
+ gpu_memory_limit: 80GiB
+ gradient_checkpointing: true
+ group_by_length: true
+ hub_model_id: willtensora/0eda4152-e58c-4e24-b30e-71e456fb3b24
+ hub_strategy: checkpoint
+ learning_rate: 0.0002
+ logging_steps: 10
+ lora_alpha: 256
+ lora_dropout: 0.1
+ lora_r: 128
+ lora_target_linear: true
+ lr_scheduler: cosine
+ micro_batch_size: 1
+ model_type: AutoModelForCausalLM
+ num_epochs: 100
+ optimizer: adamw_bnb_8bit
+ output_dir: miner_id_24
+ pad_to_sequence_len: true
+ resize_token_embeddings_to_32x: false
+ sample_packing: false
+ save_steps: 50
+ sequence_len: 2048
+ tokenizer_type: Qwen2TokenizerFast
+ train_on_inputs: false
+ trust_remote_code: true
+ val_set_size: 0.1
+ wandb_entity: ''
+ wandb_mode: online
+ wandb_project: Gradients-On-Demand
+ wandb_run: your_name
+ wandb_runid: default
+ warmup_ratio: 0.05
+ xformers_attention: true
+
+ ```
+
+ </details><br>
+
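As a rough sanity check on the LoRA settings in the config above, the sketch below computes the LoRA scaling factor (`lora_alpha / lora_r`, the standard convention when rank-stabilized scaling is not used) and estimates the number of adapter parameters. The layer dimensions are assumptions about the Qwen2-1.5B architecture and are not stated in this commit, so verify them against the base model's `config.json`. The helper name `lora_params` is hypothetical.

```python
# Values taken from the axolotl config above.
LORA_R = 128
LORA_ALPHA = 256

# ASSUMED Qwen2-1.5B dimensions (not stated in this commit; check config.json).
HIDDEN = 1536
INTERMEDIATE = 8960
NUM_LAYERS = 28
KV_DIM = 256  # assumed: 2 key/value heads x 128 head dim

# (in_features, out_features) for each linear layer in one decoder block,
# assuming lora_target_linear targets all of them and skips lm_head.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in LINEAR_SHAPES.values())
total_params = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R

print(f"scaling factor: {scaling}")
print(f"adapter parameters: {total_params:,}")
print(f"bytes if stored as float32: {total_params * 4:,}")
```

Under these assumptions the float32 estimate lands within a small header's worth of the 590925768-byte `adapter_model.safetensors` pointer listed in this commit, which suggests the assumed dimensions are at least plausible.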
+ # 0eda4152-e58c-4e24-b30e-71e456fb3b24
+
+ This model is a fine-tuned version of [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) on a local JSON dataset (`19637e66dc3ec99a_train_data.json`; the auto-generated card records the dataset name as None).
+ It achieves the following results on the evaluation set:
+ - Loss: 2.4073
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - total_train_batch_size: 8
+ - total_eval_batch_size: 8
+ - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 15107
+ - num_epochs: 100
+
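To make the interaction between `learning_rate`, `lr_scheduler_type: cosine`, and `lr_scheduler_warmup_steps` concrete, here is a minimal sketch of a cosine schedule with linear warmup, written to mirror the shape of the schedule Transformers uses for `cosine`. `TOTAL_STEPS` is an assumption: it is back-derived from the 15107 warmup steps and the `warmup_ratio: 0.05` in the config, and the real planned step count is not stated in this commit. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 2e-4          # learning_rate from the card
WARMUP_STEPS = 15107    # lr_scheduler_warmup_steps from the card
# ASSUMPTION: planned total steps, estimated as warmup_steps / warmup_ratio.
TOTAL_STEPS = round(WARMUP_STEPS / 0.05)


def lr_at(step: int, total_steps: int = TOTAL_STEPS,
          peak_lr: float = PEAK_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Step 3750 is the last step reported in the training results of this card.
lr_last = lr_at(3750)
print(f"lr at step 3750: {lr_last:.3e}")
print(f"lr at end of warmup: {lr_at(WARMUP_STEPS):.3e}")
```

Because the last reported step (3750) is well below the warmup length, this sketch suggests the run ended while the learning rate was still ramping up, at roughly a quarter of the configured peak. That conclusion does not depend on the assumed `TOTAL_STEPS`, since the total only matters after warmup.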
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | No log | 0.0000 | 1 | 3.1066 |
+ | 3.0737 | 0.0021 | 50 | 3.0943 |
+ | 3.2193 | 0.0041 | 100 | 3.0057 |
+ | 2.9091 | 0.0062 | 150 | 2.8280 |
+ | 2.8518 | 0.0083 | 200 | 2.6914 |
+ | 2.7049 | 0.0103 | 250 | 2.5964 |
+ | 2.5077 | 0.0124 | 300 | 2.5624 |
+ | 2.5767 | 0.0145 | 350 | 2.5434 |
+ | 2.4882 | 0.0165 | 400 | 2.5289 |
+ | 2.5446 | 0.0186 | 450 | 2.5212 |
+ | 2.5746 | 0.0207 | 500 | 2.5130 |
+ | 2.552 | 0.0228 | 550 | 2.5067 |
+ | 2.5758 | 0.0248 | 600 | 2.5002 |
+ | 2.5321 | 0.0269 | 650 | 2.4943 |
+ | 2.5634 | 0.0290 | 700 | 2.4918 |
+ | 2.4308 | 0.0310 | 750 | 2.4876 |
+ | 2.5713 | 0.0331 | 800 | 2.4831 |
+ | 2.3993 | 0.0352 | 850 | 2.4820 |
+ | 2.4609 | 0.0372 | 900 | 2.4766 |
+ | 2.4981 | 0.0393 | 950 | 2.4738 |
+ | 2.5594 | 0.0414 | 1000 | 2.4705 |
+ | 2.5697 | 0.0434 | 1050 | 2.4702 |
+ | 2.5192 | 0.0455 | 1100 | 2.4677 |
+ | 2.5156 | 0.0476 | 1150 | 2.4649 |
+ | 2.5819 | 0.0496 | 1200 | 2.4638 |
+ | 2.5288 | 0.0517 | 1250 | 2.4595 |
+ | 2.4565 | 0.0538 | 1300 | 2.4585 |
+ | 2.4487 | 0.0558 | 1350 | 2.4557 |
+ | 2.5059 | 0.0579 | 1400 | 2.4531 |
+ | 2.4266 | 0.0600 | 1450 | 2.4537 |
+ | 2.4951 | 0.0621 | 1500 | 2.4544 |
+ | 2.4606 | 0.0641 | 1550 | 2.4467 |
+ | 2.3836 | 0.0662 | 1600 | 2.4453 |
+ | 2.4641 | 0.0683 | 1650 | 2.4461 |
+ | 2.4473 | 0.0703 | 1700 | 2.4432 |
+ | 2.3924 | 0.0724 | 1750 | 2.4418 |
+ | 2.4956 | 0.0745 | 1800 | 2.4415 |
+ | 2.5065 | 0.0765 | 1850 | 2.4377 |
+ | 2.57 | 0.0786 | 1900 | 2.4399 |
+ | 2.4057 | 0.0807 | 1950 | 2.4357 |
+ | 2.4555 | 0.0827 | 2000 | 2.4350 |
+ | 2.5578 | 0.0848 | 2050 | 2.4339 |
+ | 2.4314 | 0.0869 | 2100 | 2.4340 |
+ | 2.4294 | 0.0889 | 2150 | 2.4317 |
+ | 2.4092 | 0.0910 | 2200 | 2.4324 |
+ | 2.5031 | 0.0931 | 2250 | 2.4289 |
+ | 2.3989 | 0.0952 | 2300 | 2.4276 |
+ | 2.4823 | 0.0972 | 2350 | 2.4259 |
+ | 2.4884 | 0.0993 | 2400 | 2.4242 |
+ | 2.3923 | 0.1014 | 2450 | 2.4255 |
+ | 2.4107 | 0.1034 | 2500 | 2.4272 |
+ | 2.4565 | 0.1055 | 2550 | 2.4235 |
+ | 2.3695 | 0.1076 | 2600 | 2.4228 |
+ | 2.4399 | 0.1096 | 2650 | 2.4229 |
+ | 2.4686 | 0.1117 | 2700 | 2.4197 |
+ | 2.4199 | 0.1138 | 2750 | 2.4173 |
+ | 2.3615 | 0.1158 | 2800 | 2.4185 |
+ | 2.4635 | 0.1179 | 2850 | 2.4190 |
+ | 2.4492 | 0.1200 | 2900 | 2.4157 |
+ | 2.4444 | 0.1220 | 2950 | 2.4166 |
+ | 2.4057 | 0.1241 | 3000 | 2.4142 |
+ | 2.3822 | 0.1262 | 3050 | 2.4137 |
+ | 2.3831 | 0.1282 | 3100 | 2.4122 |
+ | 2.376 | 0.1303 | 3150 | 2.4140 |
+ | 2.4278 | 0.1324 | 3200 | 2.4109 |
+ | 2.3976 | 0.1345 | 3250 | 2.4121 |
+ | 2.3883 | 0.1365 | 3300 | 2.4099 |
+ | 2.4337 | 0.1386 | 3350 | 2.4095 |
+ | 2.3364 | 0.1407 | 3400 | 2.4066 |
+ | 2.3768 | 0.1427 | 3450 | 2.4065 |
+ | 2.4395 | 0.1448 | 3500 | 2.4081 |
+ | 2.2957 | 0.1469 | 3550 | 2.4069 |
+ | 2.396 | 0.1489 | 3600 | 2.4058 |
+ | 2.4117 | 0.1510 | 3650 | 2.4072 |
+ | 2.3691 | 0.1531 | 3700 | 2.4091 |
+ | 2.3721 | 0.1551 | 3750 | 2.4073 |
+
+
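The table stops at step 3750, far short of the configured 100 epochs. One plausible reading, sketched below, is that `early_stopping_patience: 3` from the config ended the run. The sketch replays a simple patience rule over the last eight validation losses in the table: the counter resets on any strict improvement and training stops once it reaches the patience value. The function name `early_stop_step` is hypothetical, and the rule is an assumption about how the trainer's early-stopping callback behaves with a zero improvement threshold.

```python
import math

# (step, validation loss) copied from the last eight rows of the table above.
EVALS = [
    (3400, 2.4066), (3450, 2.4065), (3500, 2.4081), (3550, 2.4069),
    (3600, 2.4058), (3650, 2.4072), (3700, 2.4091), (3750, 2.4073),
]
PATIENCE = 3  # early_stopping_patience from the config


def early_stop_step(evals, patience):
    """Return (stop_step, best_loss); stop_step is None if never triggered."""
    best = float("inf")
    bad_evals = 0
    for step, loss in evals:
        if loss < best:
            best, bad_evals = loss, 0   # strict improvement resets the counter
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return step, best
    return None, best


stop_step, best_loss = early_stop_step(EVALS, PATIENCE)
final_loss = EVALS[-1][1]
print(f"stopped at step {stop_step}, best validation loss {best_loss}")
print(f"perplexity at final loss {final_loss}: {math.exp(final_loss):.2f}")
```

Under this rule the replay stops at step 3750 with a best loss of 2.4058 (step 3600). The card's headline figure of 2.4073 is the final evaluation rather than the best one, so whether the uploaded adapter corresponds to the best or the last checkpoint is not determined by this commit.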
+ ### Framework versions
+
+ - PEFT 0.13.2
+ - Transformers 4.46.0
+ - Pytorch 2.5.0+cu124
+ - Datasets 3.0.1
+ - Tokenizers 0.20.1
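The card leaves usage unspecified, so the following is only a sketch of how a LoRA adapter like this one is typically loaded with PEFT on top of its base model. It assumes the adapter is published under the `hub_model_id` from the config and that `peft`, `transformers`, and `torch` are installed. The function names `load_adapter` and `generate` are hypothetical, and nothing is downloaded until `load_adapter()` is actually called.

```python
BASE_MODEL_ID = "Qwen/Qwen2-1.5B-Instruct"                      # base_model from the card
ADAPTER_ID = "willtensora/0eda4152-e58c-4e24-b30e-71e456fb3b24"  # hub_model_id from the config


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model in bf16 and attach the LoRA adapter (downloads weights)."""
    # Imports are local so that defining this function has no heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation for `prompt` and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Since the config maps `drugName` to the instruction and `review` to the output with a bare `{instruction}` prompt format, a prompt consisting of just a drug name is probably the closest match to the training inputs, for example `generate(*load_adapter(), "Ibuprofen")`. Treat that as an inference from the config, not a documented usage.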
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8fd68b9670006a53407bdc858eb45867fe28c6e43476df1898282f1c1ac2326
+ size 591014186
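The three added lines are a Git LFS pointer, not the weights themselves: the repository stores the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical and the parser only handles the three keys shown here.

```python
# Pointer text copied from the adapter_model.bin diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b8fd68b9670006a53407bdc858eb45867fe28c6e43476df1898282f1c1ac2326
size 591014186
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(f"{pointer['algo']} digest of {len(pointer['digest'])} hex chars")
print(f"size: {pointer['size'] / 2**20:.1f} MiB")
```

After downloading the real file, comparing its SHA-256 (for instance via `hashlib.sha256`) against `pointer["digest"]` is a simple integrity check.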
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:474d3e90ff4df3aff5feeb62f57d787081cb8ff43321eaa59bb4a5272241fdce
+ oid sha256:5c4c158f3ada5f8b021a72c8534faca8940cd46aa7b7e41657a3a5260e56bed2
  size 590925768