Jeronymous committed on
Commit ce0348f · verified · 1 Parent(s): c01f014

Update README.md

Files changed (1)
  1. README.md +34 -9
README.md CHANGED
@@ -43,7 +43,7 @@ https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/tem
  * [Neural Network Architecture](#neural-network-architecture)
  * [Training Hyperparameters](#training-hyperparameters)
  1. [Main Pre-training](#1-main-pre-training)
- 2. [Context Extension](#2-context-extension)
+ 2. [Context Length Extension](#2-context-extension)
  3. [Annealing](#3-annealing)
  * [Training Logs and Learning Curves](#training-logs-and-learning-curves)
  <!-- * [Evaluation](#evaluation) -->
@@ -135,8 +135,8 @@ model = transformers.AutoModelForCausalLM.from_pretrained(model_name,
  where `revision` can be one of:
  * "[`step0005000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0005000)", "[`step0010000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0010000)", "[`step0015000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0015000)", "[`step0020000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0020000)": every 5000 steps for the first pre-training steps (with a context length of 4096).
  * "[`step0025000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0025000)", "[`step0050000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0050000)", "[`step0075000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0075000)", "[`step0100000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0100000)", ..., "[`step0750000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0750000)": every 25000 steps from 25k to 750k steps.
- * "[`step0753851`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0753851)": last pre-training step before context extension and annealing.
- * "[`extension_step0000250`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000250)", "[`extension_step0000500`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000500)", "[`extension_step0000750`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000750)", "[`extension_step0001000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001000)", "[`extension_step0001220`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001220)": several checkpoints during context extension (with a context length of 32000).
+ * "[`step0753851`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0753851)": last pre-training step before context length extension and annealing.
+ * "[`extension_step0000250`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000250)", "[`extension_step0000500`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000500)", "[`extension_step0000750`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000750)", "[`extension_step0001000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001000)", "[`extension_step0001220`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001220)": several checkpoints during context length extension (with a context length of 32000).

  ## Training Details

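For orientation, here is a minimal sketch (illustrative, not part of the commit) of how one of the revision tags listed above is passed to `transformers`, following the `from_pretrained` call referenced in the hunk header; the chosen tag and the dtype/device options below are example values, not prescribed by the README.

```python
# Illustrative only: load an intermediate Lucie-7B checkpoint by its revision tag.
import transformers

model_name = "OpenLLM-France/Lucie-7B"
revision = "step0753851"  # e.g. the last pre-training step before context length extension

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, revision=revision)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    revision=revision,    # any of the tags listed above works here
    torch_dtype="auto",   # assumption: keep the precision stored in the checkpoint
    device_map="auto",    # assumption: requires `accelerate`; omit to load on CPU
)
```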
@@ -218,7 +218,7 @@ Training hyperparameters in torch/Megatron-DeepSpeed were as follows:
  | Pipeline Parallelism (with 512 GPUs) | 4 |
  | Data Parallelism (with 512 GPUs) | 32 |

- #### 2. Context Extension
+ #### 2. Context Length Extension

  Training hyperparameters are the same as above, with the following changes:
  | **Hyperparameter** | **Value** |
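As a side note (not part of the diff), the parallelism rows in these tables follow the usual Megatron-style decomposition in which the GPU count factors into tensor × pipeline × data parallelism. A quick arithmetic check, assuming a tensor-parallel size of 4, which is the value the 128-GPU table below uses and the one that makes 512 = 4 × 4 × 32 work out:

```python
# Sanity check of the parallelism rows: GPUs = tensor * pipeline * data parallelism.
def data_parallel_size(n_gpus: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    assert n_gpus % (tensor_parallel * pipeline_parallel) == 0
    return n_gpus // (tensor_parallel * pipeline_parallel)

# Main pre-training: 512 GPUs with TP=4 (assumed) and PP=4 -> DP=32.
print(data_parallel_size(512, tensor_parallel=4, pipeline_parallel=4))  # 32
# Context length extension (next hunk): 128 GPUs with TP=4, PP=4 -> DP=8.
print(data_parallel_size(128, tensor_parallel=4, pipeline_parallel=4))  # 8
```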
@@ -229,13 +229,21 @@ Training hyperparameters are the same as above, with the following changes:
  | Context length | 32 000 |
  | Batch size | 128 |
  | Learning rate | 2e-5 |
+ | Learning rate schedule | constant |
  | Tensor Parallelism (with 128 GPUs) | 4 |
  | Pipeline Parallelism (with 128 GPUs) | 4 |
  | Data Parallelism (with 128 GPUs) | 8 |

  #### 3. Annealing

- TODO
+ Training hyperparameters are the same as for context length extension, with the following changes:
+ | **Hyperparameter** | **Value** |
+ |------------------------|------------|
+ | Total \# samples | 156 250 (5B tokens) |
+ | Total \# steps | 1 220 |
+ | Learning rate schedule | linear annealing |
+ | Maximum Learning rate | 3e-5 |
+ | Final Learning rate | 0 |

  ### Training Logs and Learning Curves

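Purely as an illustration (not part of the commit), the annealing numbers added above are internally consistent: 156 250 samples at the 32 000-token context length give the quoted 5B tokens, and a linear schedule takes the learning rate from 3e-5 to 0 over 1 220 steps. A small sketch of that arithmetic and schedule:

```python
# Illustrative check of the annealing table (not the actual training code).
TOTAL_SAMPLES = 156_250
CONTEXT_LENGTH = 32_000   # inherited from the context length extension phase
TOTAL_STEPS = 1_220
MAX_LR, FINAL_LR = 3e-5, 0.0

print(TOTAL_SAMPLES * CONTEXT_LENGTH)  # 5000000000 tokens, i.e. the "5B tokens" in the table

def annealing_lr(step: int) -> float:
    """Linear decay from MAX_LR to FINAL_LR over TOTAL_STEPS."""
    frac = min(step, TOTAL_STEPS) / TOTAL_STEPS
    return MAX_LR + (FINAL_LR - MAX_LR) * frac

print(annealing_lr(0), annealing_lr(610), annealing_lr(1220))  # 3e-05 1.5e-05 0.0
```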
@@ -283,7 +291,7 @@ Main results are summarized in the following figures:
  #### Pretraining
  ![figures/needle-in-a-haystack/Lucie-7B-main.png](figures/needle-in-a-haystack/Lucie-7B-main.png)

- #### Context Extension
+ #### Context Length Extension
  ![figures/needle-in-a-haystack/Lucie-7B-extension.png](figures/needle-in-a-haystack/Lucie-7B-extension.png)

  #### Annealing
@@ -296,19 +304,37 @@ Lucie-7B is a language model trained solely to predict the most probable next wo

  ## Citation

- TODO
+ When using the Lucie-7B model, please cite the following paper:
+
+ ✍ Olivier Gouvert, Julie Hunter, Jérôme Louradour,
+ Evan Dufraisse, Yaya Sy, Pierre-Carl Langlais, Anastasia Stasenko,
+ Laura Rivière, Christophe Cerisara, Jean-Pierre Lorré (2025)
+ Lucie-7B LLM and its training dataset
+ ```bibtex
+ @misc{openllm2025lucie,
+ title={Lucie-7B LLM and its training dataset:
+ open resources for multilingual language generation},
+ author={Olivier Gouvert and Julie Hunter and Jérôme Louradour and Evan Dufraisse and Yaya Sy and Pierre-Carl Langlais and Anastasia Stasenko and Laura Rivière and Christophe Cerisara and Jean-Pierre Lorré},
+ year={2025},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```


  ## Acknowledgements

  This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).

- Lucie-7B was created by members of [LINAGORA](https://labs.linagora.com/) and OpenLLM-France community, including in alphabetical order:
+ Lucie-7B was created by members of [LINAGORA](https://labs.linagora.com/) and the [OpenLLM-France](https://www.openllm-france.fr/) community, including in alphabetical order:
+ Agustin Martin Picard (IRT),
+ Thibaut Boissin (IRT),
  Christophe Cerisara (LORIA),
  Evan Dufraisse (CEA),
  Julie Hunter (LINAGORA),
  Jean-Pierre Lorré (LINAGORA),
  Jérôme Louradour (LINAGORA),
+ Lucas Hervier (IRT),
  Michel-Marie Maudet (LINAGORA),
  Olivier Gouvert (LINAGORA), and
  Yaya Sy (LORIA).
@@ -329,4 +355,3 @@ for their helpful input.
  ## Contact


-