---
license: apache-2.0
---
- base model: [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b)
- wandb (internal): https://wandb.ai/open-assistant/supervised-finetuning/runs/tlevhltw
- checkpoint: 2000 steps (~2.9 epochs)

Model:
```
falcon-7b:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "tiiuae/falcon-7b"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  warmup_steps: 20
  gradient_checkpointing: true
  gradient_accumulation_steps: 4
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 8
  eval_steps: 100
  save_steps: 500
  save_strategy: steps
  num_train_epochs: 8
  save_total_limit: 4
  residual_dropout: 0.2
  residual_dropout_lima: true
```
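The batch settings in this config combine multiplicatively. The following is a minimal sketch of that arithmetic, not part of the training code. The card does not state how many GPUs were used, so `world_size` is a hypothetical parameter and the device counts in the loop are purely illustrative.

```python
# Values copied from the falcon-7b config above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_length = 2048


def sequences_per_optimizer_step(world_size: int) -> int:
    """Sequences consumed per optimizer step across `world_size` devices.

    `world_size` is an assumption: the card does not state the GPU count.
    """
    return per_device_train_batch_size * gradient_accumulation_steps * world_size


def max_tokens_per_optimizer_step(world_size: int) -> int:
    """Upper bound on tokens per step, reached only if every sequence is max_length long."""
    return sequences_per_optimizer_step(world_size) * max_length


if __name__ == "__main__":
    for n in (1, 8):  # illustrative device counts only
        print(n, sequences_per_optimizer_step(n), max_tokens_per_optimizer_step(n))
```

Each device contributes 4 x 4 = 16 sequences per optimizer step, so the global batch scales linearly with the number of devices.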
Dataset:
```
sft9-stage2:
  # oasst_export: 100.00% (29899)
  # vicuna: 50.00% (16963)
  # code_alpaca: 50.00% (9510)
  # oa_wiki_qa_bart_10000row: 100.00% (9434)
  # grade_school_math_instructions: 100.00% (8351)
  # dolly15k: 100.00% (14250)
  use_custom_sampler: true
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
        input_file_path: 2023-06-02_oasst_all_labels.jsonl.gz
        val_split: 0.05
        top_k: 2
    - vicuna:
        fraction: 0.5
        val_split: 0.025
        max_val_set: 250
    - code_alpaca:
        fraction: 0.5
        val_split: 0.05
        max_val_set: 250
    - oa_wiki_qa_bart_10000row:
        val_split: 0.05
        max_val_set: 250
    - grade_school_math_instructions:
        val_split: 0.05
    - dolly15k:
        val_split: 0.05
        max_val_set: 300
```
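
The commented counts at the top of `sft9-stage2` can be turned into mixture shares and related to the "2000 steps (~2.9 epochs)" figure. The sketch below is a rough consistency check only. It assumes the counts in parentheses are the training samples seen per epoch, which the card does not state explicitly, and the helper names are hypothetical.

```python
# Sample counts copied from the comments in the sft9-stage2 config above.
counts = {
    "oasst_export": 29899,
    "vicuna": 16963,
    "code_alpaca": 9510,
    "oa_wiki_qa_bart_10000row": 9434,
    "grade_school_math_instructions": 8351,
    "dolly15k": 14250,
}

total = sum(counts.values())


def mixture_shares(sample_counts):
    """Fraction of the training mix contributed by each dataset."""
    n_total = sum(sample_counts.values())
    return {name: n / n_total for name, n in sample_counts.items()}


def implied_samples_per_step(total_samples, steps, epochs):
    """Samples per optimizer step implied by a step count and an epoch count.

    Assumes one epoch is one pass over `total_samples` (an assumption).
    """
    return total_samples * epochs / steps


# From the card: the checkpoint is at 2000 steps, roughly 2.9 epochs.
steps, epochs = 2000, 2.9

if __name__ == "__main__":
    print(f"total samples: {total}")
    for name, share in mixture_shares(counts).items():
        print(f"{name:32s} {share:6.1%}")
    print(f"implied samples per step: {implied_samples_per_step(total, steps, epochs):.1f}")
```

Because "~2.9" is a rounded figure, the implied samples-per-step value is only approximate. Under the stated assumption it lands close to 128, but the card does not confirm the hardware setup or the exact global batch size.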