Update README.md

README.md
Previous version:

---
library_name: peft
---

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

### Framework versions
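The quantization block above corresponds to a standard bitsandbytes 4-bit (NF4) setup. A minimal sketch of the equivalent configuration, assuming the `transformers` BitsAndBytesConfig API (this sketch is ours, not part of the original card):

```python
# Sketch of the quantization config listed above (assumes transformers + bitsandbytes are installed).
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # load_in_4bit: True (load_in_8bit: False)
    bnb_4bit_quant_type="nf4",               # bnb_4bit_quant_type: nf4
    bnb_4bit_use_double_quant=True,          # bnb_4bit_use_double_quant: True
    bnb_4bit_compute_dtype=torch.bfloat16,   # bnb_4bit_compute_dtype: bfloat16
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
)
```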
Updated version:

---
library_name: peft
tags:
- tiiuae-falcon-180B
- code
- instruct
- databricks-dolly-15k
- falcon-180B
datasets:
- databricks/databricks-dolly-15k
base_model: tiiuae/falcon-180B
---

For our finetuning process, we used the tiiuae/falcon-180B model and the databricks-dolly-15k dataset. This dataset is a corpus of more than 15,000 records created collaboratively by thousands of Databricks employees, with the goal of enabling large language models to exhibit the interactive behaviour of ChatGPT.
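The dataset is available on the Hugging Face Hub; a minimal sketch for inspecting it, assuming the `datasets` library is installed (the field names in the comment are those published with databricks-dolly-15k):

```python
# Minimal sketch: load and inspect databricks-dolly-15k with the `datasets` library.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

print(len(dolly))   # roughly 15k records
print(dolly[0])     # fields: instruction, context, response, category
```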
The contributors were asked to create prompt/response pairs spread across eight instruction categories: the seven categories outlined in the InstructGPT paper plus an open-ended, free-form category. To ensure the uniqueness and authenticity of the data, contributors were instructed to abstain from using information from any online source, with the sole exception of Wikipedia (for specific subsets of instruction categories), and were explicitly told to avoid using generative AI when formulating instructions or responses.
During the data generation process, contributors could also answer questions posed by other contributors. They were prompted to rephrase the original question and encouraged to select only questions they were confident they could answer correctly.
In certain categories, contributors were asked to provide reference texts sourced from Wikipedia. These references (the `context` field in the dataset) may contain bracketed Wikipedia citation numbers (e.g. [42]). We recommend removing these for downstream applications.
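A minimal sketch of one way to strip those citation markers from the `context` field; the helper below is illustrative and not part of the original card:

```python
import re

def strip_wiki_citations(text: str) -> str:
    """Remove bracketed Wikipedia citation numbers such as [42]."""
    return re.sub(r"\[\d+\]", "", text)

print(strip_wiki_citations("Falcon-180B was released in 2023.[42]"))
# -> "Falcon-180B was released in 2023."
```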
This finetuning process was carried out using [MonsterAPI](https://monsterapi.ai)'s no-code [LLM finetuner](https://docs.monsterapi.ai/fine-tune-a-large-language-model-llm). The session lasted 41.7 hours and cost us `$184.314`, running on 2x A100 80GB GPUs.
#### Hyperparameters & Run details:

- Model Path: tiiuae/falcon-180B
- Dataset: databricks/databricks-dolly-15k
- Learning rate: 0.0002
- Number of epochs: 1
- Data split: Training 90% / Validation 10%
- Gradient accumulation steps: 1
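For readers who want to approximate this run outside the no-code finetuner, here is a hedged sketch of how the settings above map onto a standard Hugging Face training configuration. The finetuner's internals are not public, so the class and argument choices below are assumptions, not MonsterAPI's actual implementation:

```python
# Illustrative only: map the run settings above onto a standard Hugging Face setup.
from datasets import load_dataset
from transformers import TrainingArguments

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
split = dataset.train_test_split(test_size=0.10, seed=42)  # Training 90% / Validation 10%

training_args = TrainingArguments(
    output_dir="falcon-180B-dolly-qlora",  # hypothetical output path
    learning_rate=2e-4,                    # Learning rate: 0.0002
    num_train_epochs=1,                    # Number of epochs: 1
    gradient_accumulation_steps=1,         # Gradient accumulation steps: 1
    per_device_train_batch_size=1,         # assumption; the batch size is not stated in the card
    bf16=True,                             # matches the bfloat16 compute dtype used for quantization
)
```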
license: apache-2.0
---

###### Prompt Used:
```
### INSTRUCTION:
[instruction]

[context]

### RESPONSE:
[response]
```
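At inference time, inputs should be formatted the same way. A minimal sketch of a formatting helper (the function name and example values are ours, not part of the card):

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Fill the finetuning prompt template for a single example."""
    middle = f"\n{context}\n" if context else "\n"
    return f"### INSTRUCTION:\n{instruction}\n{middle}\n### RESPONSE:\n"

prompt = build_prompt(
    instruction="Summarize the reference text in one sentence.",
    context="Falcon-180B is a 180-billion-parameter causal decoder-only model trained by TII.",
)
print(prompt)  # the model then generates the response after "### RESPONSE:"
```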