mgfrantz committed
Commit 6a931ec · verified · 1 Parent(s): 8581d24

End of training

Files changed (2):
  1. README.md +12 -25
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -22,6 +22,7 @@ axolotl version: `0.4.1`
 # Model config
 adapter: qlora
 base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+# base_model: meta-llama/Llama-3.2-3B
 bf16: auto
 
 # HF hub config (push to huggingface)
@@ -32,7 +33,6 @@ mlflow_experiment_name: axolotl-test
 
 # # Data config
 dataset_prepared_path: data
-# val_set_size: 0.1
 chat_template: chatml
 datasets:
   - path: data/train.jsonl
@@ -41,9 +41,6 @@ datasets:
     - data/train.jsonl
   conversation: alpaca
   type: sharegpt
-  # role:
-  # input:
-  # output:
 
 test_datasets:
   - path: data/eval.jsonl
@@ -66,7 +63,7 @@ flash_attention: true
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps: 1
+gradient_accumulation_steps: 8
 gradient_checkpointing: true
 group_by_length: false
 
@@ -116,7 +113,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.5572
+- Loss: 2.4338
 
 ## Model description
 
@@ -139,6 +136,8 @@ The following hyperparameters were used during training:
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
@@ -146,30 +145,18 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| 6.4934 | 0.25 | 1 | 2.0690 |
-| 2.5023 | 0.5 | 2 | 2.0673 |
-| 4.9022 | 0.75 | 3 | 2.0621 |
-| 5.6912 | 1.0 | 4 | 2.0491 |
-| 5.1317 | 1.25 | 5 | 2.0230 |
-| 5.5762 | 1.25 | 6 | 1.9738 |
-| 3.3504 | 1.5 | 7 | 1.9053 |
-| 5.1877 | 1.75 | 8 | 1.8346 |
-| 3.8815 | 2.0 | 9 | 1.7862 |
-| 3.5814 | 2.25 | 10 | 1.7475 |
-| 3.3579 | 2.25 | 11 | 1.6987 |
-| 3.5511 | 2.5 | 12 | 1.6555 |
-| 3.3339 | 2.75 | 13 | 1.6107 |
-| 2.8774 | 3.0 | 14 | 1.5778 |
-| 3.1427 | 3.25 | 15 | 1.5620 |
-| 3.3465 | 3.25 | 16 | 1.5572 |
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 3.4962 | 0.5714 | 1 | 2.4779 |
+| 5.3564 | 1.0714 | 2 | 2.4760 |
+| 4.3272 | 1.6429 | 3 | 2.4633 |
+| 4.7348 | 2.1429 | 4 | 2.4338 |
 
 
 ### Framework versions
 
 - PEFT 0.13.2
 - Transformers 4.45.2
-- Pytorch 2.4.1+cu121
+- Pytorch 2.4.0+cu121
 - Datasets 3.0.1
 - Tokenizers 0.20.1
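The substantive training change in this commit is `gradient_accumulation_steps` going from 1 to 8. With `train_batch_size: 8`, that gives the `total_train_batch_size: 64` now reported in the hyperparameters. A minimal sketch of the arithmetic (assuming a single device, which the README does not state explicitly):

```python
# Effective batch size after this commit (illustrative arithmetic only).
per_device_batch_size = 8        # train_batch_size in the README
gradient_accumulation_steps = 8  # raised from 1 to 8 in this commit
num_devices = 1                  # assumption: single-GPU run

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 64, matching total_train_batch_size in the README
```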
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:44772c4af3abc75d6063ca37102b982b62d41ac5fad308cde51f7d47e39986ef
+oid sha256:64d2cce8324e410604bb157b921b744134968d639973d938a5b47f1146461b05
 size 101036698
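For context on the updated `adapter_model.bin`: a QLoRA adapter like this one is normally re-attached to the TinyLlama base model with PEFT before inference. A minimal sketch, assuming the adapter files (`adapter_model.bin` plus its `adapter_config.json`) are available at a placeholder path; the path shown is hypothetical, not a confirmed repo id:

```python
# Minimal sketch: load the TinyLlama base model and attach the trained QLoRA adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
adapter_path = "path/to/adapter"  # hypothetical location of the adapter files

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_path)

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```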