ardaspear committed
Commit 1331bff · verified · 1 Parent(s): 5fd6a36

End of training

Files changed (2)
  1. README.md +31 -31
  2. adapter_model.bin +2 -2
README.md CHANGED
@@ -21,7 +21,7 @@ axolotl version: `0.4.1`
  adapter: lora
  base_model: echarlaix/tiny-random-mistral
  bf16: auto
- chat_template: llama3
+ chat_template: chatml
  dataset_prepared_path: null
  datasets:
  - data_files:
@@ -42,31 +42,31 @@ early_stopping_patience: null
  eval_max_new_tokens: 128
  eval_table_size: null
  evals_per_epoch: 4
- flash_attention: false
+ flash_attention: true
  fp16: null
  fsdp: null
  fsdp_config: null
  gradient_accumulation_steps: 4
- gradient_checkpointing: true
+ gradient_checkpointing: false
  group_by_length: false
  hub_model_id: ardaspear/35064bc1-2c15-4036-bbb1-561a74589740
  hub_repo: null
  hub_strategy: checkpoint
  hub_token: null
- learning_rate: 0.0001
- load_in_4bit: false
+ learning_rate: 0.0002
+ load_in_4bit: true
  load_in_8bit: false
- local_rank: 0
- logging_steps: 3
- lora_alpha: 128
- lora_dropout: 0.1
- lora_fan_in_fan_out: true
+ local_rank: null
+ logging_steps: 1
+ lora_alpha: 32
+ lora_dropout: 0.05
+ lora_fan_in_fan_out: null
  lora_model_dir: null
- lora_r: 64
+ lora_r: 16
  lora_target_linear: true
  lr_scheduler: cosine
  max_steps: 50
- micro_batch_size: 8
+ micro_batch_size: 2
  mlflow_experiment_name: /tmp/a74ecd5c5b3909f6_train_data.json
  model_type: AutoModelForCausalLM
  num_epochs: 3
@@ -74,10 +74,10 @@ optimizer: adamw_bnb_8bit
  output_dir: miner_id_24
  pad_to_sequence_len: true
  resume_from_checkpoint: null
- s2_attention: false
+ s2_attention: null
  sample_packing: false
  saves_per_epoch: 4
- sequence_len: 1024
+ sequence_len: 4056
  strict: false
  tf32: false
  tokenizer_type: AutoTokenizer
@@ -91,7 +91,7 @@ wandb_project: Gradients-On-Two
  wandb_run: your_name
  wandb_runid: 35064bc1-2c15-4036-bbb1-561a74589740
  warmup_steps: 10
- weight_decay: 0.01
+ weight_decay: 0.0
  xformers_attention: null

  ```
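For readers mapping the revised axolotl fields above onto code, here is a minimal sketch of roughly equivalent peft/transformers settings, assuming peft's `LoraConfig`/`get_peft_model` and bitsandbytes 4-bit loading. It is illustrative only and is not what axolotl executes internally.

```python
# Hypothetical sketch of the updated settings (illustrative, not axolotl's internals).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# load_in_4bit: true  (4-bit loading needs bitsandbytes and a CUDA device)
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "echarlaix/tiny-random-mistral",  # base_model from the config above
    quantization_config=bnb_config,
)

# lora_r: 16, lora_alpha: 32, lora_dropout: 0.05, lora_target_linear: true
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # approximation of lora_target_linear: true
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The `target_modules="all-linear"` shortcut stands in for `lora_target_linear: true` and assumes a recent peft release; older versions would need an explicit module list.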
@@ -102,7 +102,7 @@ xformers_attention: null

  This model is a fine-tuned version of [echarlaix/tiny-random-mistral](https://huggingface.co/echarlaix/tiny-random-mistral) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: nan
+ - Loss: 10.3595

  ## Model description

@@ -121,12 +121,12 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 8
- - eval_batch_size: 8
+ - learning_rate: 0.0002
+ - train_batch_size: 2
+ - eval_batch_size: 2
  - seed: 42
  - gradient_accumulation_steps: 4
- - total_train_batch_size: 32
+ - total_train_batch_size: 8
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
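As a quick consistency check on the updated values: total_train_batch_size = train_batch_size × gradient_accumulation_steps = 2 × 4 = 8 (previously 8 × 4 = 32).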
@@ -136,17 +136,17 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | No log | 0.0007 | 1 | nan |
- | 0.0 | 0.0033 | 5 | nan |
- | 0.0 | 0.0065 | 10 | nan |
- | 0.0 | 0.0098 | 15 | nan |
- | 0.0 | 0.0130 | 20 | nan |
- | 0.0 | 0.0163 | 25 | nan |
- | 0.0 | 0.0196 | 30 | nan |
- | 0.0 | 0.0228 | 35 | nan |
- | 0.0 | 0.0261 | 40 | nan |
- | 0.0 | 0.0293 | 45 | nan |
- | 0.0 | 0.0326 | 50 | nan |
+ | 41.5398 | 0.0002 | 1 | 10.3783 |
+ | 41.5426 | 0.0008 | 5 | 10.3779 |
+ | 41.5231 | 0.0016 | 10 | 10.3762 |
+ | 41.4979 | 0.0024 | 15 | 10.3736 |
+ | 41.4805 | 0.0033 | 20 | 10.3706 |
+ | 41.4671 | 0.0041 | 25 | 10.3673 |
+ | 41.4543 | 0.0049 | 30 | 10.3643 |
+ | 41.4492 | 0.0057 | 35 | 10.3618 |
+ | 41.4506 | 0.0065 | 40 | 10.3603 |
+ | 41.4457 | 0.0073 | 45 | 10.3597 |
+ | 41.4237 | 0.0082 | 50 | 10.3595 |


  ### Framework versions
 
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:44d1bf23797b206e048c13a52c1cf238e5f7f726dd906220bfc8f5b33219387f
- size 230786
+ oid sha256:b215d60527eb02c52436a7ef993abc191af4fb2747e66330b67f1e79dbc4096f
+ size 65282
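A note on the binary change: the adapter checkpoint shrinks from 230786 to 65282 bytes, which is consistent with `lora_r` dropping from 64 to 16 in the config above, since LoRA adapter size scales roughly with the rank. Below is a minimal, hypothetical usage sketch for the published adapter, assuming peft's `PeftModel.from_pretrained` against the `hub_model_id` from the config; it is not part of this commit.

```python
# Hypothetical usage sketch: attach the pushed LoRA adapter to the base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("echarlaix/tiny-random-mistral")
tokenizer = AutoTokenizer.from_pretrained("echarlaix/tiny-random-mistral")

# Load adapter weights (adapter_model.bin) from the Hub repo named in hub_model_id.
model = PeftModel.from_pretrained(base, "ardaspear/35064bc1-2c15-4036-bbb1-561a74589740")
model.eval()
```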