Jeronymous committed: Update README.md

README.md CHANGED

@@ -43,7 +43,7 @@ https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/tem

* [Neural Network Architecture](#neural-network-architecture)
* [Training Hyperparameters](#training-hyperparameters)
  1. [Main Pre-training](#1-main-pre-training)
+ 2. [Context Length Extension](#2-context-extension)
  3. [Annealing](#3-annealing)
* [Training Logs and Learning Curves](#training-logs-and-learning-curves)
<!-- * [Evaluation](#evaluation) -->

@@ -135,8 +135,8 @@ model = transformers.AutoModelForCausalLM.from_pretrained(model_name,

where `revision` can be one of:
* "[`step0005000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0005000)", "[`step0010000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0010000)", "[`step0015000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0015000)", "[`step0020000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0020000)": every 5000 steps for the first pre-training steps (with a context length of 4096).
* "[`step0025000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0025000)", "[`step0050000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0050000)", "[`step0075000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0075000)", "[`step0100000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0100000)", ..., "[`step0750000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0750000)": every 25000 steps from 25k to 750k steps.
+* "[`step0753851`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0753851)": last pre-training step before context length extension and annealing.
+* "[`extension_step0000250`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000250)", "[`extension_step0000500`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000500)", "[`extension_step0000750`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000750)", "[`extension_step0001000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001000)", "[`extension_step0001220`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001220)": several checkpoints during context length extension (with a context length of 32000).
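
For readers who want to pin one of these intermediate checkpoints, here is a minimal sketch built on the `transformers` loading call shown earlier in the card; the dtype and device settings below are illustrative assumptions, not the card's exact arguments:

```python
import torch
import transformers

model_name = "OpenLLM-France/Lucie-7B"
checkpoint = "step0753851"  # e.g. the last pre-training step before context length extension

# Any tag listed above can be passed as `revision` to load that intermediate checkpoint.
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, revision=checkpoint)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    revision=checkpoint,
    torch_dtype=torch.bfloat16,  # assumption: pick a dtype suited to your hardware
    device_map="auto",           # assumption: requires the `accelerate` package
)
```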

## Training Details

@@ -218,7 +218,7 @@ Training hyperparameters in torch/Megatron-DeepSpeed were as follows:

| Pipeline Parallelism (with 512 GPUs) | 4  |
| Data Parallelism (with 512 GPUs)     | 32 |

+#### 2. Context Length Extension

Training hyperparameters are the same as above, with the following changes:
| **Hyperparameter** | **Value** |

@@ -229,13 +229,21 @@ Training hyperparameters are the same as above, with the following changes:

| Context length                       | 32 000   |
| Batch size                           | 128      |
| Learning rate                        | 2e-5     |
+| Learning rate schedule               | constant |
| Tensor Parallelism (with 128 GPUs)   | 4        |
| Pipeline Parallelism (with 128 GPUs) | 4        |
| Data Parallelism (with 128 GPUs)     | 8        |

#### 3. Annealing

+Training hyperparameters are the same as for context length extension, with the following changes:
+| **Hyperparameter**      | **Value**           |
+|-------------------------|---------------------|
+| Total \# samples        | 156 250 (5B tokens) |
+| Total \# steps          | 1 220               |
+| Learning rate schedule  | linear annealing    |
+| Maximum Learning rate   | 3e-5                |
+| Final Learning rate     | 0                   |
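
As a quick sanity check on the two tables above, here is an illustrative sketch; the relations (GPUs = TP × PP × DP, tokens = samples × context length, steps ≈ samples / batch size) are standard bookkeeping assumptions, not values taken from the card:

```python
# Context length extension: 4 * 4 * 8 parallel ranks account for the 128 GPUs.
tensor_parallel, pipeline_parallel, data_parallel = 4, 4, 8
assert tensor_parallel * pipeline_parallel * data_parallel == 128

# Annealing: 156 250 samples at a 32 000-token context length give the quoted 5B tokens,
# and dividing by the batch size of 128 gives roughly the 1 220 annealing steps.
samples, context_length, batch_size = 156_250, 32_000, 128
print(samples * context_length)  # 5_000_000_000
print(samples / batch_size)      # ~1220.7
```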

### Training Logs and Learning Curves

@@ -283,7 +291,7 @@ Main results are summarized in the following figures:

#### Pretraining
![figures/needle-in-a-haystack/Lucie-7B-main.png](figures/needle-in-a-haystack/Lucie-7B-main.png)

+#### Context Length Extension
![figures/needle-in-a-haystack/Lucie-7B-extension.png](figures/needle-in-a-haystack/Lucie-7B-extension.png)

#### Annealing

@@ -296,19 +304,37 @@ Lucie-7B is a language model trained solely to predict the most probable next wo

## Citation

+When using the Lucie-7B model, please cite the following paper:
+
+✍ Olivier Gouvert, Julie Hunter, Jérôme Louradour,
+Evan Dufraisse, Yaya Sy, Pierre-Carl Langlais, Anastasia Stasenko,
+Laura Rivière, Christophe Cerisara, Jean-Pierre Lorré (2025)
+Lucie-7B LLM and its training dataset
+```bibtex
+@misc{openllm2025lucie,
+      title={Lucie-7B LLM and its training dataset:
+             open resources for multilingual language generation},
+      author={Olivier Gouvert and Julie Hunter and Jérôme Louradour and Evan Dufraisse and Yaya Sy and Pierre-Carl Langlais and Anastasia Stasenko and Laura Rivière and Christophe Cerisara and Jean-Pierre Lorré},
+      year={2025},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```

## Acknowledgements

This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).

+Lucie-7B was created by members of [LINAGORA](https://labs.linagora.com/) and the [OpenLLM-France](https://www.openllm-france.fr/) community, including in alphabetical order:
+Agustin Martin Picard (IRT),
+Thibaut Boissin (IRT),
Christophe Cerisara (LORIA),
Evan Dufraisse (CEA),
Julie Hunter (LINAGORA),
Jean-Pierre Lorré (LINAGORA),
Jérôme Louradour (LINAGORA),
+Lucas Hervier (IRT),
Michel-Marie Maudet (LINAGORA),
Olivier Gouvert (LINAGORA), and
Yaya Sy (LORIA).

@@ -329,4 +355,3 @@ for their helpful input.

## Contact