Jeronymous
committed on
Update README.md
README.md
CHANGED
@@ -240,20 +240,56 @@ TODO
|
|
240 |
|
241 |
### Training Logs and Learning Curves
|
242 |
|
243 |
Training logs can be found in TensorBoard format in:
|
244 |
* [`metadata/training_logs/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs)
|
245 |
<br> ├── [`1_pretraining.zip`](metadata/training_logs/1_pretraining.zip) training logs for the first pre-training phases,
|
246 |
in a zip file. Each file in the zip corresponds to a job of at most 20H of training (parallelized over 512 GPUs).
|
247 |
-
<br>
248 |
|
249 |
-
|
|
|
250 |
|
251 |
Evaluation results on benchmark datasets of checkpoints of Lucie-7B throughout the training process are available at
|
252 |
[metadata/evaluation_learning_curve_lucie.csv](metadata/evaluation_learning_curve_lucie.csv).
|
253 |
Evaluation results of baseline models on the same benchmark datasets are available at
|
254 |
[metadata/evaluation_baselines.csv](metadata/evaluation_baselines.csv).
|
255 |
|
256 |
-
|
257 |
|
258 |
## Disclaimer
|
259 |
|
@@ -294,3 +330,4 @@ for their helpful input.
|
|
294 |
## Contact
|
295 |
|
296 | |
|
|
|
240 |
|
241 |
### Training Logs and Learning Curves
|
242 |
|
243 |
+
#### Training loss
|
244 |
+
|
245 |
Training logs can be found in TensorBoard format in:
|
246 |
* [`metadata/training_logs/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs)
|
247 |
<br> ├── [`1_pretraining.zip`](metadata/training_logs/1_pretraining.zip) training logs for the first pre-training phases,
|
248 |
in a zip file. Each file in the zip corresponds to a job of at most 20H of training (parallelized over 512 GPUs).
|
249 |
+
<br> ├── [`2_extension/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs/2_extension) folder containing the training log for the context extension phase
<br> └── [`3_annealing/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs/3_annealing) folder containing the training log for the annealing phase, which also took around 13H of training (parallelized over 128 GPUs).
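
As an illustration of how these logs could be inspected, here is a minimal sketch, not an official tool of the repository. It assumes `1_pretraining.zip` has been downloaded locally. The helper names `list_job_logs` and `read_scalars` are hypothetical, and the scalar tag names stored in the logs are not documented here, so they must be discovered from the files themselves. Reading event files relies on the optional `tensorboard` package.

```python
import zipfile


def list_job_logs(zip_path):
    """Return the sorted names of the files stored in a training-log archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries; keep only real files (one per training job).
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


def read_scalars(event_path, tag):
    """Return (step, value) pairs for one scalar tag of a TensorBoard event file.

    Needs the optional `tensorboard` package. `tag` is an assumption: use
    whatever tag names the logs actually contain.
    """
    # Imported lazily so that listing archives works without tensorboard.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    accumulator = EventAccumulator(str(event_path))
    accumulator.Reload()  # parse the event file from disk
    return [(event.step, event.value) for event in accumulator.Scalars(tag)]
```

For example, `list_job_logs("1_pretraining.zip")` would list one entry per training job, and each extracted file could then be passed to `read_scalars`.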
|
250 |
+
|
251 |
+
The convergence curves for the three pre-training phases are shown below:
|
252 |
+
|
253 |
+
![figures/convergence-curve-pretraining.png](figures/convergence-curve-pretraining.png)
|
254 |
+
|
255 |
+
The data behind these plots were extracted from the TensorBoard logs and are available in the following CSV files:
|
256 |
+
* [`metadata/training_logs/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs)
|
257 |
+
<br> ├── [`1_pretraining.csv`](metadata/training_logs/1_pretraining.csv)
|
258 |
+
<br> ├── [`2_extension.csv`](metadata/training_logs/2_extension.csv)
|
259 |
+
<br> └── [`3_annealing.csv`](metadata/training_logs/3_annealing.csv)
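
The CSV files listed above can be read with the Python standard library alone. The following is a rough sketch under stated assumptions: the column names of these files are not documented here, so the code inspects the header instead of hard-coding any name, and the helpers `load_curve` and `numeric_column` are hypothetical.

```python
import csv


def load_curve(csv_path):
    """Read a CSV export and return (column names, list of row dicts)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, hence the fallback.
        return list(reader.fieldnames or []), rows


def numeric_column(rows, name):
    """Convert one column to floats, skipping empty or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[name]))
        except (KeyError, TypeError, ValueError):
            continue
    return values
```

A typical use would be to call `load_curve("metadata/training_logs/1_pretraining.csv")`, print the returned column names, and then pick the step and loss columns by whatever names the file actually uses.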
|
260 |
+
|
261 |
+
#### Evaluations
|
262 |
|
263 |
+
Multiple evaluations were conducted during Lucie-7B's training to assess its performance on standard benchmarks,
|
264 |
+
primarily in French and English, as well as in Spanish, German, and Italian.
|
265 |
|
266 |
Evaluation results on benchmark datasets of checkpoints of Lucie-7B throughout the training process are available at
|
267 |
[metadata/evaluation_learning_curve_lucie.csv](metadata/evaluation_learning_curve_lucie.csv).
|
268 |
Evaluation results of baseline models on the same benchmark datasets are available at
|
269 |
[metadata/evaluation_baselines.csv](metadata/evaluation_baselines.csv).
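
For readers who prefer to fetch these evaluation files programmatically, one possible approach is sketched below. It assumes the files are publicly reachable on the `main` revision through the Hub's `resolve` URL pattern. The helpers `resolve_url` and `fetch_rows` are hypothetical names, and no column names are assumed.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen

REPO_ID = "OpenLLM-France/Lucie-7B"


def resolve_url(path_in_repo, repo_id=REPO_ID, revision="main"):
    """Build a direct-download URL for a file stored in a Hub repository."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(path_in_repo)}"
    )


def fetch_rows(path_in_repo):
    """Download a CSV file from the repository and parse it into row dicts."""
    with urlopen(resolve_url(path_in_repo)) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

For instance, `fetch_rows("metadata/evaluation_baselines.csv")` would return one dictionary per CSV row, keyed by the header names found in the file.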
|
270 |
|
271 |
+
Main results are summarized in the following figures:
|
272 |
+
|
273 |
+
### French
|
274 |
+
![figures/learning-curve-evaluation-french-bench.png](figures/learning-curve-evaluation-french-bench.png)
|
275 |
+
|
276 |
+
### English
|
277 |
+
![figures/learning-curve-evaluation-benchmarks-in-english.png](figures/learning-curve-evaluation-benchmarks-in-english.png)
|
278 |
+
|
279 |
+
### Other
|
280 |
+
![figures/learning-curve-evaluation-multilingual-arc-benchmark.png](figures/learning-curve-evaluation-multilingual-arc-benchmark.png)
|
281 |
+
|
282 |
+
### Needle in a Haystack
|
283 |
+
|
284 |
+
#### Pretraining
|
285 |
+
![figures/needle-in-a-haystack/Lucie-7B-main.png](figures/needle-in-a-haystack/Lucie-7B-main.png)
|
286 |
+
|
287 |
+
#### Context Extension
|
288 |
+
![figures/needle-in-a-haystack/Lucie-7B-extension.png](figures/needle-in-a-haystack/Lucie-7B-extension.png)
|
289 |
+
|
290 |
+
#### Annealing
|
291 |
+
![figures/needle-in-a-haystack/Lucie-7B-annealing.png](figures/needle-in-a-haystack/Lucie-7B-annealing.png)
|
292 |
+
|
293 |
|
294 |
## Disclaimer
|
295 |
|
|
|
330 |
## Contact
|
331 |
|
332 | |
333 |
+
|