Jeronymous committed on
Commit 77bd40f · verified · 1 Parent(s): dd715ce

Update README.md

Files changed (1)
  1. README.md +40 -3
README.md CHANGED
@@ -240,20 +240,56 @@ TODO
240
 
241
  ### Training Logs and Learning Curves
242
 
243
  Training logs can be found in Tensorboard format in:
244
  * [`metadata/training_logs/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs)
245
  <br> ├── [`1_pretraining.zip`](metadata/training_logs/1_pretraining.zip) training logs for the first pre-training phases,
246
  in a zip file. Each file in the zip corresponds to a job of at most 20H of training (parallelized over 512 GPUs).
247
- <br> └── [`2_extension/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs/2_extension) folder containing the training log for the context extension phase, which was done in a single job of around 13H of training (parallelized over 128 GPUs).
248
 
249
- 🚧 TODO: Plot convergence curve (and link CSV ?) 🚧
250
 
251
  Evaluation results on benchmark datasets of checkpoints of Lucie-7B throughout the training process are available at
252
  [metadata/evaluation_learning_curve_lucie.csv](metadata/evaluation_learning_curve_lucie.csv).
253
  Evaluation results of baseline models on the same benchmark datasets are available at
254
  [metadata/evaluation_baselines.csv](metadata/evaluation_baselines.csv).
255
 
256
- 🚧 TODO: Plot learning curves 🚧
257
 
258
  ## Disclaimer
259
 
@@ -294,3 +330,4 @@ for their helpful input.
294
  ## Contact
295
 
296
 
 
240
 
241
  ### Training Logs and Learning Curves
242
 
243
+ #### Training loss
244
+
245
  Training logs can be found in Tensorboard format in:
246
  * [`metadata/training_logs/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs)
247
  <br> ├── [`1_pretraining.zip`](metadata/training_logs/1_pretraining.zip) training logs for the first pre-training phases,
248
  in a zip file. Each file in the zip corresponds to a job of at most 20H of training (parallelized over 512 GPUs).
249
+ <br> ├── [`2_extension/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs/2_extension) folder containing the training log for the context extension phase, which was done in a single job of around 13H of training (parallelized over 128 GPUs).
+ <br> └── [`3_annealing/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs/3_annealing) folder containing the training log for the annealing phase, which also took around 13H of training (parallelized over 128 GPUs).
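Since each file in `1_pretraining.zip` corresponds to one training job, listing the archive members is a quick way to see how the pre-training run was split. The following is a minimal sketch using only Python's standard `zipfile` module. The helper name `list_log_files` and the member names in the synthetic demo archive are hypothetical; the real archive layout is not described here.

```python
import io
import zipfile


def list_log_files(zip_source):
    """Return the sorted names of regular files in a zip archive.

    `zip_source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


def make_demo_archive():
    """Build a small in-memory zip; the member names are made up for illustration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as archive:
        archive.writestr("job_002/events.out", b"placeholder")
        archive.writestr("job_001/events.out", b"placeholder")
    buffer.seek(0)
    return buffer


demo_names = list_log_files(make_demo_archive())
```

With a local copy of the repository, the same helper could be called as `list_log_files("metadata/training_logs/1_pretraining.zip")`, assuming the file has been downloaded to that relative path.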
250
+
251
+ The convergence curves of the three pre-training phases are the following:
252
+
253
+ ![figures/convergence-curve-pretraining.png](figures/convergence-curve-pretraining.png)
254
+
255
+ Data corresponding to these plots were extracted from tensorboard logs and are available in the following CSV files:
256
+ * [`metadata/training_logs/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs)
257
+ <br> ├── [`1_pretraining.csv`](metadata/training_logs/1_pretraining.csv)
258
+ <br> ├── [`2_extension.csv`](metadata/training_logs/2_extension.csv)
259
+ <br> └── [`3_annealing.csv`](metadata/training_logs/3_annealing.csv)
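The CSV files listed above can be inspected without any plotting dependency. The sketch below uses Python's standard `csv` module and assumes only that each file starts with a header row. The column names `step` and `loss` in the synthetic sample are hypothetical, not taken from the actual files, and the helper names `read_curve` and `numeric_column` are illustrative.

```python
import csv
import io


def read_curve(csv_file):
    """Read a CSV with a header row; return (column_names, rows as dicts)."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def numeric_column(rows, name):
    """Convert one column to floats, skipping missing or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[name]))
        except (KeyError, TypeError, ValueError):
            continue
    return values


# Synthetic stand-in for a training-log CSV (hypothetical column names).
SAMPLE = "step,loss\n0,10.5\n100,4.25\n200,\n"

columns, rows = read_curve(io.StringIO(SAMPLE))
losses = numeric_column(rows, "loss")
```

For a real file, open it with `open("metadata/training_logs/1_pretraining.csv", newline="")`, pass the handle to `read_curve`, and check the returned `columns` first to find the actual column names before selecting one.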
260
+
261
+ #### Evaluations
262
 
263
+ Multiple evaluations were conducted during Lucie-7B's training to assess its performance on standard benchmarks,
264
+ primarily in French and English, as well as in Spanish, German, and Italian.
265
 
266
  Evaluation results on benchmark datasets of checkpoints of Lucie-7B throughout the training process are available at
267
  [metadata/evaluation_learning_curve_lucie.csv](metadata/evaluation_learning_curve_lucie.csv).
268
  Evaluation results of baseline models on the same benchmark datasets are available at
269
  [metadata/evaluation_baselines.csv](metadata/evaluation_baselines.csv).
270
 
271
+ Main results are summarized in the following figures:
272
+
273
+ ### French
274
+ ![figures/learning-curve-evaluation-french-bench.png](figures/learning-curve-evaluation-french-bench.png)
275
+
276
+ ### English
277
+ ![figures/learning-curve-evaluation-benchmarks-in-english.png](figures/learning-curve-evaluation-benchmarks-in-english.png)
278
+
279
+ ### Other
280
+ ![figures/learning-curve-evaluation-multilingual-arc-benchmark.png](figures/learning-curve-evaluation-multilingual-arc-benchmark.png)
281
+
282
+ ### Needle in a Haystack
283
+
284
+ #### Pretraining
285
+ ![figures/needle-in-a-haystack/Lucie-7B-main.png](figures/needle-in-a-haystack/Lucie-7B-main.png)
286
+
287
+ #### Context Extension
288
+ ![figures/needle-in-a-haystack/Lucie-7B-extension.png](figures/needle-in-a-haystack/Lucie-7B-extension.png)
289
+
290
+ #### Annealing
291
+ ![figures/needle-in-a-haystack/Lucie-7B-annealing.png](figures/needle-in-a-haystack/Lucie-7B-annealing.png)
292
+
293
 
294
  ## Disclaimer
295
 
 
330
  ## Contact
331
 
332
333
+