This model was converted to GGUF format from [`allenai/OLMo-2-1124-7B`](https://huggingface.co/allenai/OLMo-2-1124-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/allenai/OLMo-2-1124-7B) for more details on the model.

---

## Model details
We introduce OLMo 2, a new family of 7B and 13B models featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original OLMo 7B model. These gains come from training on the OLMo-mix-1124 and Dolmino-mix-1124 datasets and a staged training approach.

OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
### Installation

OLMo 2 will be supported in the next version of Transformers, and you need to install it from the main branch using:

```bash
pip install --upgrade git+https://github.com/huggingface/transformers.git
```
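Once the development build is installed, the OLMo 2 architecture should be importable. Below is a quick, optional sanity check; it is only a sketch, and the `Olmo2ForCausalLM` class name is an assumption based on the model family rather than something stated above.

```python
# Optional sanity check: confirm the installed transformers build recognizes
# the OLMo 2 architecture. "Olmo2ForCausalLM" is an assumed class name.
import transformers

print(transformers.__version__)
print(hasattr(transformers, "Olmo2ForCausalLM"))
```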
### Inference

You can use OLMo with the standard HuggingFace transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)

# optional: verify CUDA is used by moving the inputs and model to the GPU
# inputs = {k: v.to('cuda') for k, v in inputs.items()}
# olmo = olmo.to('cuda')

response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```

Sample output:

```
'Language modeling is a key component of any text-based application, but its effectiveness...'
```
For faster performance, you can quantize the model using the following method:

```python
import torch
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B",
                                            torch_dtype=torch.float16,
                                            load_in_8bit=True)  # Requires bitsandbytes
```
The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:

```python
inputs.input_ids.to('cuda')
```
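Putting those pieces together, here is a minimal end-to-end sketch of 8-bit inference. It assumes bitsandbytes is installed and a CUDA device is available, and it simply reuses the sampling settings from the example above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 8-bit weights are placed on the GPU by the loader; only the inputs need
# to be moved to CUDA explicitly.
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B",
                                            torch_dtype=torch.float16,
                                            load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")

inputs = tokenizer(["Language modeling is "], return_tensors='pt', return_token_type_ids=False)
inputs = {k: v.to('cuda') for k, v in inputs.items()}

response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```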
We have released checkpoints for these models. For pretraining, the naming convention is `stepXXX-tokensYYYB`. For checkpoints that are ingredients of the model soup, the naming convention is `stage2-ingredientN-stepXXX-tokensYYYB`.

To load a specific model revision with HuggingFace, simply add the `revision` argument:

```python
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B", revision="step1000-tokens5B")
```
Or, you can access all the revisions for the models via the following code snippet:

```python
from huggingface_hub import list_repo_refs

out = list_repo_refs("allenai/OLMo-2-1124-7B")
branches = [b.name for b in out.branches]
```
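As a small follow-up sketch, you can narrow that list down to the stage 2 ingredient checkpoints. This assumes the `branches` list from the snippet above and relies only on the naming convention described earlier.

```python
# Keep only stage 2 "ingredient" checkpoints, per the naming convention
# stage2-ingredientN-stepXXX-tokensYYYB.
stage2_branches = [b for b in branches if b.startswith("stage2-ingredient")]
print(sorted(stage2_branches))
```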
### Fine-tuning

Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or from many intermediate checkpoints. Two recipes for tuning are available.

Fine-tune with the OLMo repository:

```bash
torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
    --data.paths=[{path_to_data}/input_ids.npy] \
    --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
    --load_path={path_to_checkpoint} \
    --reset_trainer_state
```

For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo).

Further fine-tuning support is being developed in AI2's [Open Instruct](https://github.com/allenai/open-instruct) repository.
### Model Description

- Developed by: Allen Institute for AI (Ai2)
- Model type: a Transformer style autoregressive language model.
- Language(s) (NLP): English
- License: The code and model are released under Apache 2.0.
- Contact: Technical inquiries: [email protected]. Press: [email protected]
- Date cutoff: Dec. 2023.

### Model Sources

- Project Page: https://allenai.org/olmo
- Repositories:
  - Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo
  - Evaluation code: https://github.com/allenai/OLMo-Eval
  - Further fine-tuning code: https://github.com/allenai/open-instruct
- Paper: Coming soon
### Pretraining

|  | OLMo 2 7B | OLMo 2 13B |
|---|---|---|
| Pretraining Stage 1 (OLMo-Mix-1124) | 4 trillion tokens (1 epoch) | 5 trillion tokens (1.2 epochs) |
| Pretraining Stage 2 (Dolmino-Mix-1124) | 50B tokens (3 runs), merged | 100B tokens (3 runs) + 300B tokens (1 run), merged |
| Post-training (Tulu 3 SFT OLMo mix) | SFT + DPO + PPO (preference mix) | SFT + DPO + PPO (preference mix) |
#### Stage 1: Initial Pretraining

- Dataset: OLMo-Mix-1124 (3.9T tokens)
- Coverage: 90%+ of total pretraining budget
- 7B Model: ~1 epoch
- 13B Model: 1.2 epochs (5T tokens)

#### Stage 2: Fine-tuning

- Dataset: Dolmino-Mix-1124 (843B tokens)
- Three training mixes:
  - 50B tokens
  - 100B tokens
  - 300B tokens
- Mix composition: 50% high-quality data + academic/Q&A/instruction/math content
#### Model Merging

- 7B Model: 3 versions trained on 50B mix, merged via model souping (sketched below)
- 13B Model: 3 versions on 100B mix + 1 version on 300B mix, merged for final checkpoint
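For intuition, model souping amounts to element-wise averaging of checkpoint weights. The following is only a sketch under stated assumptions: the revision names are hypothetical placeholders following the stage 2 naming convention above, and this illustrates weight averaging in general, not the exact merging recipe used for OLMo 2.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical ingredient checkpoints (placeholder revision names following
# the stage2-ingredientN-stepXXX-tokensYYYB convention described above).
revisions = [
    "stage2-ingredient1-stepXXX-tokensYYYB",
    "stage2-ingredient2-stepXXX-tokensYYYB",
    "stage2-ingredient3-stepXXX-tokensYYYB",
]

running_sum = None
for rev in revisions:
    model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B", revision=rev)
    sd = {k: v.float() for k, v in model.state_dict().items()}
    if running_sum is None:
        running_sum = sd
    else:
        for k in running_sum:
            running_sum[k] += sd[k]

# Element-wise average of the ingredient weights (the "model soup").
souped = {k: v / len(revisions) for k, v in running_sum.items()}
model.load_state_dict(souped)
model.save_pretrained("olmo2-7b-souped")
```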
### Bias, Risks, and Limitations

Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, many statements from OLMo or any LLM are often inaccurate, so facts should be verified.
### Citation

A technical manuscript is forthcoming!

### Model Card Contact

For errors in this model card, contact [email protected].

---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)