Jeronymous committed: Update README.md

README.md CHANGED

@@ -43,7 +43,7 @@ https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/tem

* [Neural Network Architecture](#neural-network-architecture)
* [Training Hyperparameters](#training-hyperparameters)
  1. [Main Pre-training](#1-main-pre-training)
+ 2. [Context Length Extension](#2-context-extension)
  3. [Annealing](#3-annealing)
* [Training Logs and Learning Curves](#training-logs-and-learning-curves)
<!-- * [Evaluation](#evaluation) -->

@@ -135,8 +135,8 @@ model = transformers.AutoModelForCausalLM.from_pretrained(model_name,

where `revision` can be one of:
* "[`step0005000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0005000)", "[`step0010000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0010000)", "[`step0015000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0015000)", "[`step0020000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0020000)": every 5000 steps for the first pre-training steps (with a context length of 4096).
* "[`step0025000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0025000)", "[`step0050000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0050000)", "[`step0075000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0075000)", "[`step0100000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0100000)", ..., "[`step0750000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0750000)": every 25000 steps from 25k to 750k steps.
+* "[`step0753851`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0753851)": last pre-training step before context length extension and annealing.
+* "[`extension_step0000250`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000250)", "[`extension_step0000500`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000500)", "[`extension_step0000750`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000750)", "[`extension_step0001000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001000)", "[`extension_step0001220`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001220)": several checkpoints during context length extension (with a context length of 32000).
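
For readers who want to pin one of these intermediate checkpoints, here is a minimal sketch built on the `transformers` loading call shown earlier in the card; the dtype and device settings below are illustrative assumptions, not the card's exact arguments:

```python
import torch
import transformers

model_name = "OpenLLM-France/Lucie-7B"
checkpoint = "step0753851"  # e.g. the last pre-training step before context length extension

# Any tag listed above can be passed as `revision` to load that intermediate checkpoint.
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, revision=checkpoint)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    revision=checkpoint,
    torch_dtype=torch.bfloat16,  # assumption: pick a dtype suited to your hardware
    device_map="auto",           # assumption: requires the `accelerate` package
)
```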

## Training Details

@@ -218,7 +218,7 @@ Training hyperparameters in torch/Megatron-DeepSpeed were as follows:

| Pipeline Parallelism (with 512 GPUs) | 4  |
| Data Parallelism (with 512 GPUs)     | 32 |

+#### 2. Context Length Extension

Training hyperparameters are the same as above, with the following changes:
| **Hyperparameter** | **Value** |

@@ -229,13 +229,21 @@ Training hyperparameters are the same as above, with the following changes:

| Context length                       | 32 000   |
| Batch size                           | 128      |
| Learning rate                        | 2e-5     |
+| Learning rate schedule               | constant |
| Tensor Parallelism (with 128 GPUs)   | 4        |
| Pipeline Parallelism (with 128 GPUs) | 4        |
| Data Parallelism (with 128 GPUs)     | 8        |

#### 3. Annealing

+Training hyperparameters are the same as for context length extension, with the following changes:
+| **Hyperparameter**      | **Value**           |
+|-------------------------|---------------------|
+| Total \# samples        | 156 250 (5B tokens) |
+| Total \# steps          | 1 220               |
+| Learning rate schedule  | linear annealing    |
+| Maximum Learning rate   | 3e-5                |
+| Final Learning rate     | 0                   |
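
As a quick sanity check on the two tables above, here is an illustrative sketch; the relations (GPUs = TP × PP × DP, tokens = samples × context length, steps ≈ samples / batch size) are standard bookkeeping assumptions, not values taken from the card:

```python
# Context length extension: 4 * 4 * 8 parallel ranks account for the 128 GPUs.
tensor_parallel, pipeline_parallel, data_parallel = 4, 4, 8
assert tensor_parallel * pipeline_parallel * data_parallel == 128

# Annealing: 156 250 samples at a 32 000-token context length give the quoted 5B tokens,
# and dividing by the batch size of 128 gives roughly the 1 220 annealing steps.
samples, context_length, batch_size = 156_250, 32_000, 128
print(samples * context_length)  # 5_000_000_000
print(samples / batch_size)      # ~1220.7
```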

### Training Logs and Learning Curves

@@ -283,7 +291,7 @@ Main results are summarized in the following figures:

#### Pretraining
![figures/needle-in-a-haystack/Lucie-7B-main.png](figures/needle-in-a-haystack/Lucie-7B-main.png)

+#### Context Length Extension
![figures/needle-in-a-haystack/Lucie-7B-extension.png](figures/needle-in-a-haystack/Lucie-7B-extension.png)

#### Annealing

@@ -296,19 +304,37 @@ Lucie-7B is a language model trained solely to predict the most probable next wo

## Citation

+When using the Lucie-7B model, please cite the following paper:
+
+✍ Olivier Gouvert, Julie Hunter, Jérôme Louradour,
+Evan Dufraisse, Yaya Sy, Pierre-Carl Langlais, Anastasia Stasenko,
+Laura Rivière, Christophe Cerisara, Jean-Pierre Lorré (2025)
+Lucie-7B LLM and its training dataset
+```bibtex
+@misc{openllm2025lucie,
+      title={Lucie-7B LLM and its training dataset:
+             open resources for multilingual language generation},
+      author={Olivier Gouvert and Julie Hunter and Jérôme Louradour and Evan Dufraisse and Yaya Sy and Pierre-Carl Langlais and Anastasia Stasenko and Laura Rivière and Christophe Cerisara and Jean-Pierre Lorré},
+      year={2025},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```

## Acknowledgements

This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).

+Lucie-7B was created by members of [LINAGORA](https://labs.linagora.com/) and the [OpenLLM-France](https://www.openllm-france.fr/) community, including in alphabetical order:
+Agustin Martin Picard (IRT),
+Thibaut Boissin (IRT),
Christophe Cerisara (LORIA),
Evan Dufraisse (CEA),
Julie Hunter (LINAGORA),
Jean-Pierre Lorré (LINAGORA),
Jérôme Louradour (LINAGORA),
+Lucas Hervier (IRT),
Michel-Marie Maudet (LINAGORA),
Olivier Gouvert (LINAGORA), and
Yaya Sy (LORIA).

@@ -329,4 +355,3 @@ for their helpful input.

## Contact