---
license: other
---

## airoboros-gpt-3.5-turbo-100k-7b

This is a 7b parameter model, fine-tuned on 100k synthetic instruction/response pairs generated by gpt-3.5-turbo using [airoboros](https://github.com/jondurbin/airoboros).

Links:

* [airoboros](https://github.com/jondurbin/airoboros)
* [instructions.jsonl](https://storage.googleapis.com/airoboros-dump/gpt-3.5-turbo-100k/instructions.jsonl)
* [topics.txt](https://storage.googleapis.com/airoboros-dump/gpt-3.5-turbo-100k/topics-d732f92dd90a1a5337a4a02ddeaec72b.txt)

### Prompt generation

```
airoboros generate-instructions --instruction-count 100000 --concurrency 100 --temperature 1.0
```
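
Each line of the generated data (the instructions.jsonl linked above) is a standalone JSON object. A quick way to peek at it, assuming only the `instruction` and `response` fields that the conversion script further down relies on:

```
import json

# Print the first instruction/response pair from the downloaded dump.
# Only the "instruction" and "response" fields (the ones used by the
# conversion script below) are assumed to be present.
with open("instructions.jsonl") as infile:
    first = json.loads(infile.readline())
print(first["instruction"])
print(first["response"])
```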

### Fine-tuning

The instructions.jsonl file was converted to the conversation format expected by the FastChat training scripts, then trained with:

```
torchrun --nproc_per_node=8 --master_port=20001 train_mem.py \
    --model_name_or_path /workspace/llama-7b-hf \
    --data_path ./as_conversations.json \
    --bf16 True \
    --output_dir /workspace/airoboros-gpt-3.5-100k-7b \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 32 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "steps" \
    --eval_steps 1500 \
    --save_strategy "steps" \
    --save_steps 1500 \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap offload" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```
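
With these settings, the effective global batch size works out to 8 GPUs × 4 sequences per device × 4 gradient-accumulation steps = 128 sequences per optimizer step.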

Training took roughly 22 hours on 8x NVIDIA A100 80GB GPUs.

Conversion to conversation style:

```
import json
import uuid

# Load the raw instruction/response pairs (one JSON object per line).
inputs = [json.loads(line) for line in open("instructions.jsonl").readlines()]

# Convert each pair into the two-turn conversation format expected by FastChat.
conversations = []
for row in inputs:
    instruction = row["instruction"]
    conversations.append({
        "id": str(uuid.uuid4()),
        "conversations": [
            {
                "from": "human",
                "value": instruction,
            },
            {
                "from": "gpt",
                "value": row["response"],
            },
        ],
    })

with open("as_conversations.json", "w") as outfile:
    outfile.write(json.dumps(conversations, indent=2))
```

## Evaluation

I used the same questions from WizardVicunaLM:

| instruction | gpt3.5 | wizard-vicuna-13b | vicuna-13b | wizard-7b | airoboros-gpt-3.5-turbo-100k-7b |
| --- | --- | --- | --- | --- | --- |
| "Write a compelling product launch announcement email to inform our customers of our new software solution." | 95 | 92 | 89 | 90 | 91 |
| "Draft an apology email to a customer who experienced a delay in their order, and provide reassurance that the issue has been resolved." | 94 | 96 | 90 | 89 | 91 |
| "As a pirate captain, what would you say to your crew to motivate them to search for hidden treasure?" | 95 | 90 | 80 | 70 | 85 |
| "Imagine you are a time traveler from the year 3000. What technological advancements would you tell people about?" | 95 | 92 | 90 | 88 | 85 |
| "As a space colonist on Mars, describe your daily life and the challenges you face living on another planet." | 95 | 90 | 87 | 85 | 88 |
| "How can you assess the credibility of a source of information, such as a news article or blog post, without relying solely on the reputation of the author or publisher?" | 93 | 85 | 89 | 87 | 90 |
| "How can observing the behavior of other people in a social situation provide clues about cultural norms and expectations?" | 95 | 90 | 85 | 92 | 80 |
| "How many text messages are sent globally in a minute? Try to explain your answer. Your explanation should take the reader through your reasoning step-by-step." | 90 | 70 | 65 | 80 | 85 |
| "What are the main differences between Python and JavaScript programming languages?" | 90 | 85 | 80 | 88 | 82 |
| "What are the differences between plant-based and animal-based protein sources?" | 85 | 92 | 90 | 80 | 94 |
| "Describe a scenario where artificial intelligence could be used to improve the quality and efficiency of healthcare delivery." | 95 | 90 | 92 | 89 | 91 |
| "How do cultural, social, and economic factors influence people's food choices, and how can this knowledge be used to promote healthier diets?" | 90 | 85 | 87 | 83 | 84 |
| "How many words are spoken daily on Earth? Try to explain your answer. Your explanation should take the reader through your reasoning step-by-step." | 90 | 70 | 80 | 75 | 65 |
| "How many lightning strikes occur on Earth each day? Try to explain your answer. Your explanation should take the reader through your reasoning step-by-step." | 90 | 80 | 60 | 70 | 85 |

If we use gpt-3.5 as the baseline (as wizardvicuna/vicuna did), i.e. divide each model's score by gpt-3.5's score on the same question, we get the following relative scores:

| gpt3.5 | wizard-vicuna-13b | vicuna-13b | wizard-7b | airoboros-gpt-3.5-turbo-100k-7b |
| --- | --- | --- | --- | --- |
| 1.0 | __0.968421052631579__ | 0.9368421052631579 | 0.9473684210526315 | 0.9578947368421052 |
| 1.0 | __1.0212765957446808__ | 0.9574468085106383 | 0.9468085106382979 | 0.9680851063829787 |
| 1.0 | __0.9473684210526315__ | 0.8421052631578947 | 0.7368421052631579 | 0.8947368421052632 |
| 1.0 | __0.968421052631579__ | 0.9473684210526315 | 0.9263157894736842 | 0.8947368421052632 |
| 1.0 | __0.9473684210526315__ | 0.9157894736842105 | 0.8947368421052632 | 0.9263157894736842 |
| 1.0 | 0.9139784946236559 | 0.956989247311828 | 0.9354838709677419 | __0.967741935483871__ |
| 1.0 | 0.9473684210526315 | 0.8947368421052632 | __0.968421052631579__ | 0.8421052631578947 |
| 1.0 | 0.7777777777777778 | 0.7222222222222222 | 0.8888888888888888 | __0.9444444444444444__ |
| 1.0 | 0.9444444444444444 | 0.8888888888888888 | __0.9777777777777777__ | 0.9111111111111111 |
| 1.0 | 1.0823529411764705 | 1.0588235294117647 | 0.9411764705882353 | __1.1058823529411765__ |
| 1.0 | 0.9473684210526315 | __0.968421052631579__ | 0.9368421052631579 | 0.9578947368421052 |
| 1.0 | 0.9444444444444444 | __0.9666666666666667__ | 0.9222222222222223 | 0.9333333333333333 |
| 1.0 | 0.7777777777777778 | __0.8888888888888888__ | 0.8333333333333334 | 0.7222222222222222 |
| 1.0 | 0.8888888888888888 | 0.6666666666666666 | 0.7777777777777778 | __0.9444444444444444__ |

Average scores:

```
gpt3.5                             1.000000
wizard-vicuna-13b                  0.934090
vicuna-13b                         0.900847
wizard-7b                          0.902428
airoboros-gpt-3.5-turbo-100k-7b    0.926496
```
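
For reference, each relative score is just a model's raw score divided by gpt-3.5's score on the same question, and the averages are the per-model means of those ratios. A minimal sketch that reproduces the numbers above from the raw scores:

```
# Relative score = model score / gpt-3.5 score on the same question;
# the summary values are the per-model averages of those ratios.
models = ["gpt3.5", "wizard-vicuna-13b", "vicuna-13b", "wizard-7b",
          "airoboros-gpt-3.5-turbo-100k-7b"]
raw_scores = [  # one row per evaluation question, columns in the order above
    (95, 92, 89, 90, 91),
    (94, 96, 90, 89, 91),
    (95, 90, 80, 70, 85),
    (95, 92, 90, 88, 85),
    (95, 90, 87, 85, 88),
    (93, 85, 89, 87, 90),
    (95, 90, 85, 92, 80),
    (90, 70, 65, 80, 85),
    (90, 85, 80, 88, 82),
    (85, 92, 90, 80, 94),
    (95, 90, 92, 89, 91),
    (90, 85, 87, 83, 84),
    (90, 70, 80, 75, 65),
    (90, 80, 60, 70, 85),
]
relative = [[score / row[0] for score in row] for row in raw_scores]
averages = [sum(col) / len(col) for col in zip(*relative)]
for name, avg in zip(models, averages):
    print(f"{name:<35} {avg:.6f}")
```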

As you can see, the __7b__ airoboros model performs well, even compared to 13b models.

## License

The model is subject to the LLaMA license, and the dataset is subject to OpenAI's terms of use because it was generated with ChatGPT. Everything else is free.