# Axolotl
A centralized repo to train multiple architectures with different dataset types using a simple yaml file.
Go ahead and axolotl questions!!
## Support Matrix
| | fp16/fp32 | fp16/fp32 w/ lora | 4bit-quant | 4bit-quant w/flash attention | flash attention | xformers attention |
|----------|:----------|:------------------|------------|------------------------------|-----------------|--------------------|
| llama    | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Pythia   | ✅ | ✅ | ? | ? | ? | ? |
| cerebras | ✅ | ✅ | ? | ? | ? | ? |
| mpt      | ✅ | ? | ? | ? | ? | ? |
## Getting Started
### Environment
- Docker
```bash
docker pull winglian/axolotl
```
- Conda/Pip venv
  1. Install python **3.9**
  2. Install python dependencies with ONE of the following:
      - `pip3 install -e .[int4]` (recommended)
      - `pip3 install -e .[int4_triton]`
      - `pip3 install -e .`
### Dataset
Have a dataset in one of the following formats (JSONL recommended):
- alpaca: instruction; input(optional)
```json
{"instruction": "...", "input": "...", "output": "..."}
```
- jeopardy: question and answer
```json
{"question": "...", "category": "...", "answer": "..."}
```
- oasst: instruction
```json
{"INSTRUCTION": "...", "RESPONSE": "..."}
```
- gpteacher: instruction; input(optional)
```json
{"instruction": "...", "input": "...", "response": "..."}
```
- reflection: instruction with reflect; input(optional)
```json
{"instruction": "...", "input": "...", "output": "...", "reflection": "...", "corrected": "..."}
```
- sharegpt: conversations
```json
{"conversations": [{"from": "...", "value": "..."}]}
```
- completion: raw corpus
```json
{"text": "..."}
```
Optionally, download some datasets; see [data/README.md](data/README.md).
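
As a rough illustration of the JSONL layout, the sketch below writes a few alpaca-style records to a temporary file (one JSON object per line) and checks that each line carries the keys shown in the examples above. The `REQUIRED_KEYS` table and the `validate_jsonl` helper are hypothetical names made up for this example and are not part of axolotl. Treating `input` as optional follows the "input(optional)" notes in the list; treating every other key as required is an assumption.

```python
import json
import os
import tempfile

# Keys each format is expected to carry, taken from the examples above.
# This table is an illustration only, not axolotl's own schema.
REQUIRED_KEYS = {
    "alpaca": {"instruction", "output"},
    "jeopardy": {"question", "category", "answer"},
    "oasst": {"INSTRUCTION", "RESPONSE"},
    "gpteacher": {"instruction", "response"},
    "reflection": {"instruction", "output", "reflection", "corrected"},
    "sharegpt": {"conversations"},
    "completion": {"text"},
}


def validate_jsonl(path, dataset_type):
    """Return (line_number, missing_keys) pairs for records lacking required keys."""
    required = REQUIRED_KEYS[dataset_type]
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = required - set(record)
            if missing:
                problems.append((line_number, sorted(missing)))
    return problems


records = [
    {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"},
    {"instruction": "Name a primary color.", "output": "Red"},  # "input" omitted
    {"instruction": "This record has no output."},
]

# Write one JSON object per line, which is all JSONL requires.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    for record in records:
        tmp.write(json.dumps(record) + "\n")
    jsonl_path = tmp.name

problems = validate_jsonl(jsonl_path, "alpaca")
os.remove(jsonl_path)
print(problems)
```

Running a check like this before training can catch malformed rows early; here only the third record is reported, because it lacks `output`.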
### Config
See sample configs in the [configs](configs) folder. We recommend duplicating one and modifying it to fit your needs. The most important options are:
- model
```yaml
base_model: ./llama-7b-hf # local or huggingface repo
```
Note: The code will load the right architecture.
- dataset
```yaml
datasets:
  - path: vicgalle/alpaca-gpt4 # local or huggingface repo
    type: alpaca # format from above
```
- loading
```yaml
load_4bit: true
load_in_8bit: true
bf16: true
fp16: true
tf32: true
```
Note: Repo does not do 4-bit quantization.
- lora
```yaml
adapter: lora # blank for full finetune
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
- q_proj
- v_proj
```
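
To give a sense of what `lora_r` and `lora_alpha` control, here is a minimal sketch of the standard LoRA arithmetic: each adapted weight matrix of shape `d_out x d_in` gains two low-rank factors with `r * (d_in + d_out)` trainable parameters in total, and the low-rank update is scaled by `lora_alpha / lora_r`. The hidden size and layer count below (4096 and 32) are assumed values for a LLaMA-7B-sized model and are not read from any config, and `lora_trainable_params` is a hypothetical helper, not an axolotl function.

```python
def lora_trainable_params(r, d_in, d_out, modules_per_layer, num_layers):
    """Count the trainable parameters added by LoRA adapters.

    Each adapted weight (d_out x d_in) gets two factors:
    A with shape (r x d_in) and B with shape (d_out x r).
    """
    per_module = r * (d_in + d_out)
    return per_module * modules_per_layer * num_layers


# Values from the sample config above.
lora_r = 8
lora_alpha = 16
target_modules = ["q_proj", "v_proj"]

# Assumed dimensions for a LLaMA-7B-sized model.
hidden_size = 4096
num_layers = 32

params = lora_trainable_params(
    lora_r, hidden_size, hidden_size, len(target_modules), num_layers
)
scaling = lora_alpha / lora_r

print(params)   # 4194304
print(scaling)  # 2.0
```

Under these assumptions, adding more entries to `lora_target_modules` or raising `lora_r` grows the adapter roughly linearly.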
<details>
<summary>All yaml options</summary>
```yaml
# this is the huggingface model that contains *.pt, *.safetensors, or *.bin files
# this can also be a relative path to a model on disk
base_model: ./llama-7b-hf
# you can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
base_model_ignore_patterns:
# if the base_model repo on hf hub doesn't include configuration .json files,
# you can set that here, or leave this empty to default to base_model
base_model_config: ./llama-7b-hf
# If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
model_type: AutoModelForCausalLM
# Corresponding tokenizer for the model; AutoTokenizer is a good choice
tokenizer_type: AutoTokenizer
# whether you are training a 4-bit quantized model
load_4bit: true
gptq_groupsize: 128 # group size
gptq_model_v1: false # v1 or v2
# this will attempt to quantize the model down to 8 bits and use adam 8 bit optimizer
load_in_8bit: true
# Use CUDA bf16
bf16: true
# Use CUDA fp16
fp16: true
# Use CUDA tf32
tf32: true
# a list of one or more datasets to finetune the model with
datasets:
  # this can be either a hf dataset, or relative path
  - path: vicgalle/alpaca-gpt4
    # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
    type: alpaca
# axolotl attempts to save the dataset as an arrow after packing the data together so
# subsequent training attempts load faster, relative path
dataset_prepared_path: data/last_run_prepared
# push prepared dataset to hub
push_dataset_to_hub: # repo path
# How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc
val_set_size: 0.04
# the maximum length of an input to train with, this should typically be at most 2048
# as most models have a token/context limit of 2048
sequence_len: 2048
# max sequence length to concatenate training samples together up to
# inspired by StackLLaMA. see https://huggingface.co/blog/stackllama#supervised-fine-tuning
max_packed_sequence_len: 1024
# if you want to use lora, leave blank to train all parameters in original model
adapter: lora
# if you already have a lora model trained that you want to load, put that here
lora_model_dir:
# lora hyperparameters
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
- q_proj
- v_proj
# - k_proj
# - o_proj
# - gate_proj
# - down_proj
# - up_proj
lora_modules_to_save:
# - embed_tokens
# - lm_head
lora_out_dir:
lora_fan_in_fan_out: false
# wandb configuration if you're using it
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: # 'checkpoint'
# where to save the finished model to
output_dir: ./completed-model
# training hyperparameters
batch_size: 8
micro_batch_size: 2
eval_batch_size: 2
num_epochs: 3
warmup_steps: 100
learning_rate: 0.00003
logging_steps:
# whether to mask out or include the human's prompt from the training labels
train_on_inputs: false
# don't use this, leads to wonky training (according to someone on the internet)
group_by_length: false
# does not work with current implementation of 4-bit LoRA
gradient_checkpointing: false
# stop training after this many evaluation losses have increased in a row
# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
early_stopping_patience: 3
# specify a scheduler to use with the optimizer. only one_cycle is supported currently
lr_scheduler:
# specify optimizer
optimizer:
# specify weight decay
weight_decay:
# whether to use xformers attention patch https://github.com/facebookresearch/xformers:
xformers_attention:
# whether to use flash attention patch https://github.com/HazyResearch/flash-attention:
flash_attention:
# resume from a specific checkpoint dir
resume_from_checkpoint:
# if resume_from_checkpoint isn't set and you simply want it to start where it left off
# be careful with this being turned on between different models
auto_resume_from_checkpoints: false
# don't mess with this, it's here for accelerate and torchrun
local_rank:
# add or change special tokens
special_tokens:
# bos_token: "<s>"
# eos_token: "</s>"
# unk_token: "<unk>"
# add extra tokens
tokens:
# FSDP
fsdp:
fsdp_config:
# Deepspeed
deepspeed:
# TODO
torchdistx_path:
# Debug mode
debug:
```
</details>
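
A few of the training options interact, and a small worked example may help. The sketch below assumes that `batch_size` is the effective batch per optimizer step and `micro_batch_size` is what passes through the model in one forward/backward pass, so their ratio gives the number of gradient accumulation steps. Whether axolotl derives the value exactly this way is an assumption, and the dataset size is a made-up number chosen to keep the arithmetic simple.

```python
import math

# Values copied from the sample config above.
batch_size = 8
micro_batch_size = 2
num_epochs = 3
val_set_size = 0.04

# Hypothetical dataset size, used only for illustration.
total_rows = 50_000

# Assumption: effective batch / per-pass batch = accumulation steps.
gradient_accumulation_steps = batch_size // micro_batch_size

# val_set_size is the fraction of the dataset held out for evaluation.
eval_rows = round(total_rows * val_set_size)
train_rows = total_rows - eval_rows

# Optimizer steps on a single device, ignoring sample packing and multi-GPU sharding.
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * num_epochs

print(gradient_accumulation_steps)  # 4
print(eval_rows, train_rows)        # 2000 48000
print(total_steps)                  # 18000
```

With these numbers, `warmup_steps: 100` would cover a little over half a percent of the run.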
### Accelerate
Configure accelerate by running `accelerate config` or by editing `~/.cache/huggingface/accelerate/default_config.yaml`.
### Train
Run
```bash
accelerate launch scripts/finetune.py configs/your_config.yml
```
### Inference
Add the `--inference` flag to the train command above.

If you are running inference with a pretrained LoRA, also pass
```bash
--lora_model_dir path/to/lora
```
### Merge LORA to base
Add the `--merge_lora --lora_model_dir="path/to/lora"` flags to the train command above.
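
If you script these runs, the three variants (train, inference, merge) differ only in their trailing flags. The sketch below assembles the command line from the pieces shown above. `build_command` is a hypothetical helper written for this example and is not shipped with the repo, and it only builds the argument list without launching anything.

```python
import shlex


def build_command(config_path, inference=False, merge_lora=False, lora_model_dir=None):
    """Assemble the accelerate command for training, inference, or LoRA merging."""
    command = ["accelerate", "launch", "scripts/finetune.py", config_path]
    if inference:
        command.append("--inference")
    if merge_lora:
        command.append("--merge_lora")
    if lora_model_dir is not None:
        command.append(f"--lora_model_dir={lora_model_dir}")
    return command


train_cmd = build_command("configs/your_config.yml")
infer_cmd = build_command(
    "configs/your_config.yml", inference=True, lora_model_dir="path/to/lora"
)
merge_cmd = build_command(
    "configs/your_config.yml", merge_lora=True, lora_model_dir="path/to/lora"
)

# shlex.join quotes arguments where needed so the string is safe to paste into a shell.
for cmd in (train_cmd, infer_cmd, merge_cmd):
    print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the repository root.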