Nanobit committed on
Commit
04d2813
·
1 Parent(s): 3960936

Feat: Rewrite Readme

  1. README.md +151 -71
README.md CHANGED
@@ -1,6 +1,8 @@
1
  # Axolotl
2
 
3
- #### Go ahead and axolotl questions
 
 
4
 
5
  ## Support Matrix
6
 
@@ -9,41 +11,111 @@
9
  | llama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
10
  | Pythia | ✅ | ✅ | ❌ | ❌ | ❌ | ❓ |
11
  | cerebras | ✅ | ✅ | ❌ | ❌ | ❌ | ❓ |
 
12
 
13
 
14
  ## Getting Started
15
- - install python 3.9. 3.10 and above are not supported.
16
 
17
- - Point the config you are using to a huggingface hub dataset (see [configs/llama_7B_4bit.yml](https://github.com/winglian/axolotl/blob/main/configs/llama_7B_4bit.yml#L6-L8))
18
-
19
- ```yaml
20
- datasets:
21
- - path: vicgalle/alpaca-gpt4
22
- type: alpaca
23
- ```
24
-
25
- - Optionally Download some datasets, see [data/README.md](data/README.md)
26
-
27
-
28
- - Create a new or update the existing YAML config [config/sample.yml](config/sample.yml)
 
29
 
30
  ```yaml
31
  # this is the huggingface model that contains *.pt, *.safetensors, or *.bin files
32
  # this can also be a relative path to a model on disk
33
- base_model: decapoda-research/llama-7b-hf-int4
34
  # you can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
35
  base_model_ignore_patterns:
36
  # if the base_model repo on hf hub doesn't include configuration .json files,
37
  # you can set that here, or leave this empty to default to base_model
38
- base_model_config: decapoda-research/llama-7b-hf
39
  # If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
40
  model_type: AutoModelForCausalLM
41
  # Corresponding tokenizer for the model AutoTokenizer is a good choice
42
  tokenizer_type: AutoTokenizer
 
43
  # whether you are training a 4-bit quantized model
44
  load_4bit: true
 
 
 
45
  # this will attempt to quantize the model down to 8 bits and use adam 8 bit optimizer
46
  load_in_8bit: true
 
 
 
 
 
 
 
 
47
  # a list of one or more datasets to finetune the model with
48
  datasets:
49
  # this can be either a hf dataset, or relative path
@@ -55,17 +127,19 @@ datasets:
55
  dataset_prepared_path: data/last_run_prepared
56
  # How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc
57
  val_set_size: 0.04
58
- # if you want to use lora, leave blank to train all parameters in original model
59
- adapter: lora
60
- # if you already have a lora model trained that you want to load, put that here
61
- lora_model_dir:
62
  # the maximum length of an input to train with, this should typically be less than 2048
63
  # as most models have a token/context limit of 2048
64
  sequence_len: 2048
65
  # max sequence length to concatenate training samples together up to
66
  # inspired by StackLLaMA. see https://huggingface.co/blog/stackllama#supervised-fine-tuning
67
  max_packed_sequence_len: 1024
 
 
 
 
68
  # lora hyperparameters
 
69
  lora_r: 8
70
  lora_alpha: 16
71
  lora_dropout: 0.05
@@ -74,14 +148,24 @@ lora_target_modules:
74
  - v_proj
75
  # - k_proj
76
  # - o_proj
 
 
 
 
 
 
 
77
  lora_fan_in_fan_out: false
78
- # wandb configuration if your're using it
 
79
  wandb_project:
80
  wandb_watch:
81
  wandb_run_id:
82
- wandb_log_model: checkpoint
 
83
  # where to save the finished model to
84
  output_dir: ./completed-model
 
85
  # training hyperparameters
86
  batch_size: 8
87
  micro_batch_size: 2
@@ -89,87 +173,83 @@ eval_batch_size: 2
89
  num_epochs: 3
90
  warmup_steps: 100
91
  learning_rate: 0.00003
 
 
92
  # whether to mask out or include the human's prompt from the training labels
93
  train_on_inputs: false
94
  # don't use this, leads to wonky training (according to someone on the internet)
95
  group_by_length: false
96
- # Use CUDA bf16
97
- bf16: true
98
- # Use CUDA tf32
99
- tf32: true
100
  # does not work with current implementation of 4-bit LoRA
101
  gradient_checkpointing: false
 
102
  # stop training after this many evaluation losses have increased in a row
103
  # https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
104
  early_stopping_patience: 3
105
  # specify a scheduler to use with the optimizer. only one_cycle is supported currently
106
  lr_scheduler:
 
 
 
 
 
107
  # whether to use xformers attention patch https://github.com/facebookresearch/xformers:
108
  xformers_attention:
109
  # whether to use flash attention patch https://github.com/HazyResearch/flash-attention:
110
  flash_attention:
 
111
  # resume from a specific checkpoint dir
112
  resume_from_checkpoint:
113
  # if resume_from_checkpoint isn't set and you simply want it to start where it left off
114
  # be careful with this being turned on between different models
115
  auto_resume_from_checkpoints: false
 
116
  # don't mess with this, it's here for accelerate and torchrun
117
  local_rank:
118
- ```
119
 
120
- - Install python dependencies with ONE of the following:
 
 
 
121
 
122
- - `pip3 install -e .[int4]` (recommended)
123
- - `pip3 install -e .[int4_triton]`
124
- - `pip3 install -e .`
125
- -
126
- - If not using `int4` or `int4_triton`, run `pip install "peft @ git+https://github.com/huggingface/peft.git"`
127
- - Configure accelerate `accelerate config` or update `~/.cache/huggingface/accelerate/default_config.yaml`
128
 
129
- ```yaml
130
- compute_environment: LOCAL_MACHINE
131
- distributed_type: MULTI_GPU
132
- downcast_bf16: 'no'
133
- gpu_ids: all
134
- machine_rank: 0
135
- main_training_function: main
136
- mixed_precision: bf16
137
- num_machines: 1
138
- num_processes: 4
139
- rdzv_backend: static
140
- same_network: true
141
- tpu_env: []
142
- tpu_use_cluster: false
143
- tpu_use_sudo: false
144
- use_cpu: false
145
  ```
146
 
147
- - Train! `accelerate launch scripts/finetune.py`, make sure to choose the correct YAML config file
148
- - Alternatively you can pass in the config file like: `accelerate launch scripts/finetune.py configs/llama_7B_alpaca.yml`~~
149
 
 
150
 
151
- ## How to start training on Runpod in under 10 minutes
152
 
153
- - Choose your Docker container wisely.
154
- - I recommend `huggingface:transformers-pytorch-deepspeed-latest-gpu` see https://hub.docker.com/r/huggingface/transformers-pytorch-deepspeed-latest-gpu/
155
- - Once you start your runpod, and SSH into it:
156
- ```shell
157
- export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX"
158
- source <(curl -s https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/dev/scripts/setup-runpod.sh)
159
- ```
160
 
161
- - Once the setup script completes
162
- ```shell
163
- accelerate launch scripts/finetune.py configs/quickstart.yml
164
  ```
165
 
166
- - Here are some helpful environment variables you'll want to manually set if you open a new shell
167
- ```shell
168
- export WANDB_MODE=offline
169
- export WANDB_CACHE_DIR=/workspace/data/wandb-cache
170
- export HF_DATASETS_CACHE="/workspace/data/huggingface-cache/datasets"
171
- export HUGGINGFACE_HUB_CACHE="/workspace/data/huggingface-cache/hub"
172
- export TRANSFORMERS_CACHE="/workspace/data/huggingface-cache/hub"
173
- export NCCL_P2P_DISABLE=1
174
  ```
175
 
 
 
 
 
 
1
  # Axolotl
2
 
3
+ A centralized repo to train multiple architectures with different dataset types using a simple yaml file.
4
+
5
+ Go ahead and axolotl questions!!
6
 
7
  ## Support Matrix
8
 
 
11
  | llama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
12
  | Pythia | ✅ | ✅ | ❌ | ❌ | ❌ | ❓ |
13
  | cerebras | ✅ | ✅ | ❌ | ❌ | ❌ | ❓ |
14
+ | mpt | ✅ | ❌ | ❌ | ❌ | ❌ | ❓ |
15
 
16
 
17
  ## Getting Started
 
18
 
19
+ ### Environment
20
+
21
+ - Docker
22
+ ```bash
23
+ docker pull winglian/axolotl
24
+ ```
25
+
26
+ - Conda/Pip venv
27
+ 1. Install Python **3.9**
28
+
29
+ 2. Install python dependencies with ONE of the following:
30
+ - `pip3 install -e .[int4]` (recommended)
31
+ - `pip3 install -e .[int4_triton]`
32
+ - `pip3 install -e .`
33
+
34
+ ### Dataset
35
+
36
+ Have a dataset in one of the following formats:
37
+
38
+ - alpaca: instruction
39
+ ```json
40
+ {"instruction": "...", "input": "...", "output": "..."}
41
+ ```
42
+ - #TODO add others
43
+ - completion: raw corpus
44
+ ```json
45
+ {"text": "..."}
46
+ ```
47
+
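The two formats above can be sanity-checked before training. The following is a minimal sketch, assuming the dataset is stored as one JSON object per line (JSON Lines); the `REQUIRED_KEYS` table and the `validate_record` helper are hypothetical names used for illustration and are not part of axolotl.

```python
import json

# Required keys per dataset type, copied from the examples above.
# This mapping is an illustrative assumption, not axolotl's own API.
REQUIRED_KEYS = {
    "alpaca": {"instruction", "input", "output"},
    "completion": {"text"},
}


def validate_record(line: str, dataset_type: str) -> dict:
    """Parse one JSON line and check that it has the keys its format needs."""
    record = json.loads(line)
    missing = REQUIRED_KEYS[dataset_type] - set(record)
    if missing:
        raise ValueError(
            f"{dataset_type} record is missing keys: {sorted(missing)}"
        )
    return record


if __name__ == "__main__":
    alpaca_line = json.dumps(
        {"instruction": "Translate to French", "input": "Hello", "output": "Bonjour"}
    )
    print(validate_record(alpaca_line, "alpaca"))
    print(validate_record(json.dumps({"text": "Raw corpus text."}), "completion"))
```

Running a check like this over every line of a local file catches missing fields early, before a long training job starts.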
48
+ Optionally download some datasets; see [data/README.md](data/README.md)
49
+
50
+ ### Config
51
+
52
+ See the sample configs in the [configs](configs) folder. It is recommended to duplicate one and modify it to suit your needs. The most important options are:
53
+
54
+ - model
55
+ ```yaml
56
+ base_model: ./llama-7b-hf # local or huggingface repo
57
+ ```
58
+
59
+ - dataset
60
+ ```yaml
61
+ datasets:
62
+ - path: vicgalle/alpaca-gpt4 # local or huggingface repo
63
+ type: alpaca # format from above
64
+ ```
65
+
66
+ - loading
67
+ ```yaml
68
+ load_4bit: true
69
+ load_in_8bit: true
70
+ bf16: true
71
+ fp16: true
72
+ tf32: true
73
+ ```
74
+
75
+ - lora
76
+ ```yaml
77
+ adapter: lora # blank for full finetune
78
+ lora_r: 8
79
+ lora_alpha: 16
80
+ lora_dropout: 0.05
81
+ lora_target_modules:
82
+ - q_proj
83
+ - v_proj
84
+ ```
85
+
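To give a sense of what `lora_r`, `lora_alpha`, and `lora_target_modules` control: in LoRA, each targeted weight matrix of shape `d_out x d_in` stays frozen and receives a trainable low-rank update built from two factors of shapes `d_out x r` and `r x d_in`, scaled by `lora_alpha / lora_r`. The sketch below estimates the resulting trainable parameter count. The hidden size of 4096 and the 32 layers are the published LLaMA-7B dimensions; the helper names are hypothetical and only serve this example.

```python
def lora_params_per_module(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, lora_r: int) -> float:
    """Factor applied to the low-rank update before it is added to the frozen weight."""
    return lora_alpha / lora_r


if __name__ == "__main__":
    hidden_size = 4096  # LLaMA-7B hidden dimension
    num_layers = 32  # LLaMA-7B transformer layers
    target_modules = ["q_proj", "v_proj"]  # same targets as the config above

    per_module = lora_params_per_module(hidden_size, hidden_size, r=8)
    total = per_module * len(target_modules) * num_layers
    print(f"trainable LoRA parameters: {total:,}")
    print(f"scaling factor: {lora_scaling(16, 8)}")
```

With the values from the config above, this works out to roughly 4.2 million trainable parameters, a small fraction of the 7B base model. Uncommenting more target modules raises that count proportionally.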
86
+ <details>
87
+
88
+ <summary>All yaml options</summary>
89
 
90
  ```yaml
91
  # this is the huggingface model that contains *.pt, *.safetensors, or *.bin files
92
  # this can also be a relative path to a model on disk
93
+ base_model: ./llama-7b-hf
94
  # you can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
95
  base_model_ignore_patterns:
96
  # if the base_model repo on hf hub doesn't include configuration .json files,
97
  # you can set that here, or leave this empty to default to base_model
98
+ base_model_config: ./llama-7b-hf
99
  # If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
100
  model_type: AutoModelForCausalLM
101
  # Corresponding tokenizer for the model AutoTokenizer is a good choice
102
  tokenizer_type: AutoTokenizer
103
+
104
  # whether you are training a 4-bit quantized model
105
  load_4bit: true
106
+ gptq_groupsize: 128 # group size
107
+ gptq_model_v1: false # v1 or v2
108
+
109
  # this will attempt to quantize the model down to 8 bits and use adam 8 bit optimizer
110
  load_in_8bit: true
111
+
112
+ # Use CUDA bf16
113
+ bf16: true
114
+ # Use CUDA fp16
115
+ fp16: true
116
+ # Use CUDA tf32
117
+ tf32: true
118
+
119
  # a list of one or more datasets to finetune the model with
120
  datasets:
121
  # this can be either a hf dataset, or relative path
 
127
  dataset_prepared_path: data/last_run_prepared
128
  # How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc
129
  val_set_size: 0.04
130
+
 
 
 
131
  # the maximum length of an input to train with, this should typically be less than 2048
132
  # as most models have a token/context limit of 2048
133
  sequence_len: 2048
134
  # max sequence length to concatenate training samples together up to
135
  # inspired by StackLLaMA. see https://huggingface.co/blog/stackllama#supervised-fine-tuning
136
  max_packed_sequence_len: 1024
137
+
138
+ # if you want to use lora, leave blank to train all parameters in original model
139
+ adapter: lora
140
+ # if you already have a lora model trained that you want to load, put that here
141
  # lora hyperparameters
142
+ lora_model_dir:
143
  lora_r: 8
144
  lora_alpha: 16
145
  lora_dropout: 0.05
 
148
  - v_proj
149
  # - k_proj
150
  # - o_proj
151
+ # - gate_proj
152
+ # - down_proj
153
+ # - up_proj
154
+ lora_modules_to_save:
155
+ # - embed_tokens
156
+ # - lm_head
157
+ lora_out_dir: # TODO: explain
158
  lora_fan_in_fan_out: false
159
+
160
+ # wandb configuration if you're using it
161
  wandb_project:
162
  wandb_watch:
163
  wandb_run_id:
164
+ wandb_log_model: # 'checkpoint'
165
+
166
  # where to save the finished model to
167
  output_dir: ./completed-model
168
+
169
  # training hyperparameters
170
  batch_size: 8
171
  micro_batch_size: 2
 
173
  num_epochs: 3
174
  warmup_steps: 100
175
  learning_rate: 0.00003
176
+ logging_steps:
177
+
178
  # whether to mask out or include the human's prompt from the training labels
179
  train_on_inputs: false
180
  # don't use this, leads to wonky training (according to someone on the internet)
181
  group_by_length: false
182
+
 
 
 
183
  # does not work with current implementation of 4-bit LoRA
184
  gradient_checkpointing: false
185
+
186
  # stop training after this many evaluation losses have increased in a row
187
  # https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
188
  early_stopping_patience: 3
189
  # specify a scheduler to use with the optimizer. only one_cycle is supported currently
190
  lr_scheduler:
191
+ # specify optimizer
192
+ optimizer:
193
+ # specify weight decay
194
+ weight_decay:
195
+
196
  # whether to use xformers attention patch https://github.com/facebookresearch/xformers:
197
  xformers_attention:
198
  # whether to use flash attention patch https://github.com/HazyResearch/flash-attention:
199
  flash_attention:
200
+
201
  # resume from a specific checkpoint dir
202
  resume_from_checkpoint:
203
  # if resume_from_checkpoint isn't set and you simply want it to start where it left off
204
  # be careful with this being turned on between different models
205
  auto_resume_from_checkpoints: false
206
+
207
  # don't mess with this, it's here for accelerate and torchrun
208
  local_rank:
209
+ # add or change special tokens
210
 
211
+ special_tokens:
212
+ # bos_token: "<s>"
213
+ # eos_token: "</s>"
214
+ # unk_token: "<unk>"
215
 
216
+ # FSDP
217
+ fsdp:
218
+ fsdp_config:
 
 
 
219
 
220
+ # Deepspeed
221
+ deepspeed:
222
+
223
+ # TODO
224
+ torchdistx_path:
225
+
226
+ # Debug mode
227
+ debug:
 
 
 
 
 
 
 
 
228
  ```
229
 
230
+ </details>
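
The `batch_size` and `micro_batch_size` options in the list above are usually tied together through gradient accumulation. The following is a minimal sketch, assuming `batch_size` is the effective batch per optimizer step and `micro_batch_size` is what a single forward/backward pass processes on one device. How axolotl itself derives the accumulation count, including any division across GPUs, is not stated in this README, so the helper and its `num_devices` parameter are hypothetical.

```python
def gradient_accumulation_steps(
    batch_size: int, micro_batch_size: int, num_devices: int = 1
) -> int:
    """Micro-batches to accumulate per device before each optimizer step."""
    samples_per_pass = micro_batch_size * num_devices
    if batch_size % samples_per_pass != 0:
        raise ValueError(
            "batch_size must be a multiple of micro_batch_size * num_devices"
        )
    return batch_size // samples_per_pass


if __name__ == "__main__":
    # Values from the sample config: batch_size 8, micro_batch_size 2.
    print(gradient_accumulation_steps(8, 2))
    # The same effective batch spread over four devices (an assumed setup).
    print(gradient_accumulation_steps(8, 2, num_devices=4))
```

Lowering `micro_batch_size` reduces peak GPU memory at the cost of more passes per optimizer step, while the effective batch size stays the same.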
 
231
 
232
+ ### Accelerate
233
 
234
+ Configure accelerate using `accelerate config` or update `~/.cache/huggingface/accelerate/default_config.yaml`
235
 
236
+ ### Train
 
 
 
 
 
 
237
 
238
+ Run
239
+ ```bash
240
+ accelerate launch scripts/finetune.py configs/your_config.yml
241
  ```
242
 
243
+ ### Inference
244
+
245
+ Add the `--inference` flag to the train command above.
246
+
247
+ If you are running inference with a pretrained LoRA, also pass
248
+ ```bash
249
+ --lora_model_dir path/to/lora
 
250
  ```
251
 
252
+ ### Merge LORA to base
253
+
254
+ Add the `--merge_lora --lora_model_dir="path/to/lora"` flags to the train command above.
255
+
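The train, inference, and merge invocations above differ only in their trailing flags. As a rough sketch, a small helper could assemble them for use in scripts; `build_command` is a hypothetical function written for this example, and it only emits the flags shown in this README (`--inference`, `--merge_lora`, `--lora_model_dir`).

```python
import shlex
from typing import List, Optional


def build_command(
    config_path: str,
    inference: bool = False,
    merge_lora: bool = False,
    lora_model_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the accelerate launch command as an argument list."""
    cmd = ["accelerate", "launch", "scripts/finetune.py", config_path]
    if inference:
        cmd.append("--inference")
    if merge_lora:
        cmd.append("--merge_lora")
    if lora_model_dir is not None:
        cmd.append(f"--lora_model_dir={lora_model_dir}")
    return cmd


if __name__ == "__main__":
    # Plain training run.
    print(shlex.join(build_command("configs/your_config.yml")))
    # Inference with a previously trained LoRA.
    print(
        shlex.join(
            build_command(
                "configs/your_config.yml",
                inference=True,
                lora_model_dir="path/to/lora",
            )
        )
    )
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems when paths contain spaces.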