Jeronymous committed on
Commit
1565bfe
·
1 Parent(s): 2c5a377

Add training logs

README.md CHANGED
@@ -47,6 +47,7 @@ https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/tem
  1. [Main pre-training](#1-main-pre-training)
  2. [Context Extension](#2-context-extension)
  3. [Annealing](#3-annealing)
+ * [Training logs and learning curves](#training-logs-and-learning-curves)
  <!-- * [Evaluation](#evaluation) -->
  * [Acknowledgements](#acknowledgements)
  * [Contact](#contact)
@@ -237,6 +238,16 @@ Training hyperparameters are the same as above, with the following changes:
 
  TODO
 
+ ### Training logs and learning curves
+
+ 🚧 work in progress 🚧
+
+ Training logs can be found in Tensorboard format in:
+ * [`metadata/training_logs/`](metadata/training_logs)
+ <br> ├── [`1_pretraining.zip`](metadata/training_logs/1_pretraining.zip) training logs for the first pre-training phases,
+ in a zip file. Each file in the zip corresponds to a job of at most 20H of training (parallelized over 512 GPUs).
+ <br> └── [`2_extension/`](metadata/training_logs/2_extension) folder containing the training log for the context extension phase, which was done in a single job of around 13H of training (parallelized over 128 GPUs).
+
  ## Acknowledgements
 
  This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).
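The README change above points readers at TensorBoard event files, one per training job, with the first phase shipped as a zip. A minimal sketch of how such logs might be loaded in Python follows. It assumes the `tensorboard` package is installed, that the files inside the zip keep the usual `tfevents` marker in their names (as the `2_extension` file does), and that metrics were logged as scalars. The helper names `extract_event_files` and `read_scalars` are illustrative and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_event_files(zip_path, out_dir):
    """Unpack a log archive and return the TensorBoard event files it contains."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    # Assumption: event files carry "tfevents" in their file name.
    return sorted(
        p for p in out_dir.rglob("*") if p.is_file() and "tfevents" in p.name
    )


def read_scalars(event_file):
    """Return {tag: [(step, value), ...]} for one event file."""
    # Imported lazily so the zip helper above works without TensorBoard.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # size_guidance 0 keeps every scalar point instead of subsampling.
    accumulator = EventAccumulator(str(event_file), size_guidance={"scalars": 0})
    accumulator.Reload()
    return {
        tag: [(event.step, event.value) for event in accumulator.Scalars(tag)]
        for tag in accumulator.Tags()["scalars"]
    }
```

Since each file in the zip is described as covering one job, a full learning curve would presumably be obtained by calling `read_scalars` on every extracted file and concatenating the per-tag series in step order.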
metadata/training_logs/1_pretraining.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:debd5c63735b96a9e62fa5b44b0127c9452c341047ec2b919f82d8612674edce
+ size 418213162
metadata/training_logs/2_extension/events.out.tfevents.1731919080.jzxh169.2097150.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e922e0c4112bf78d634ff506c400a651620f43e966b11e2a6fe98206c6e9a423
+ size 3379212
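The two added files are stored as Git LFS pointers: short text files with `version`, `oid` and `size` lines that stand in for the real binary. As a rough sketch, a downloaded log file could be checked against its pointer as shown below. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this illustration, not part of Git LFS or of the repository.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check that a local file has the size and digest recorded in the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("1_pretraining.zip", parse_lfs_pointer(pointer_text))` would return `True` only if the local archive is 418213162 bytes long and hashes to the `oid` listed in the pointer above.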