Jeronymous committed
Commit 1565bfe · 1 parent: 2c5a377
Add training logs
README.md
CHANGED
@@ -47,6 +47,7 @@ https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/tem
|
|
47 |
1. [Main pre-training](#1-main-pre-training)
|
48 |
2. [Context Extension](#2-context-extension)
|
49 |
3. [Annealing](#3-annealing)
|
|
|
50 |
<!-- * [Evaluation](#evaluation) -->
|
51 |
* [Acknowledgements](#acknowledgements)
|
52 |
* [Contact](#contact)
|
@@ -237,6 +238,16 @@ Training hyperparameters are the same as above, with the following changes:
|
|
237 |
|
238 |
TODO
|
239 |
|
240 |
## Acknowledgements
|
241 |
|
242 |
This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).
|
|
|
47 |
1. [Main pre-training](#1-main-pre-training)
|
48 |
2. [Context Extension](#2-context-extension)
|
49 |
3. [Annealing](#3-annealing)
|
50 |
+
* [Training logs and learning curves](#training-logs-and-learning-curves)
|
51 |
<!-- * [Evaluation](#evaluation) -->
|
52 |
* [Acknowledgements](#acknowledgements)
|
53 |
* [Contact](#contact)
|
|
|
238 |
|
239 |
TODO
|
240 |
|
241 |
+
### Training logs and learning curves
|
242 |
+
|
243 |
+
🚧 work in progress 🚧
|
244 |
+
|
245 |
+
Training logs can be found in TensorBoard format in:
|
246 |
+
* [`metadata/training_logs/`](metadata/training_logs)
|
247 |
+
<br> ├── [`1_pretraining.zip`](metadata/training_logs/1_pretraining.zip) training logs for the first pre-training phases,
|
248 |
+
in a zip file. Each file in the zip corresponds to a job of at most 20 hours of training (parallelized over 512 GPUs).
|
249 |
+
<br> └── [`2_extension/`](metadata/training_logs/2_extension) folder containing the training log for the context extension phase, which was done in a single job of around 13 hours of training (parallelized over 128 GPUs).
|
250 |
+
|
251 |
## Acknowledgements
|
252 |
|
253 |
This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).
|
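The README change above points to TensorBoard event files: one zip archive for pre-training and one folder for context extension. The following is a minimal sketch of how such logs might be inspected in Python, assuming the files have been fetched through Git LFS and the `tensorboard` package is installed. The helper names `list_event_files` and `load_scalars` are hypothetical, and the commit does not state which scalar tags the logs contain, so the code enumerates them rather than assuming any.

```python
import zipfile


def list_event_files(zip_path):
    """Return the sorted names of TensorBoard event files inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if "tfevents" in name and not name.endswith("/")
        )


def load_scalars(event_path):
    """Read scalar series from an event file or a log directory.

    Returns {tag: [(step, value), ...]}. Requires the `tensorboard` package.
    """
    # Imported lazily so that list_event_files works without tensorboard.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # size_guidance of 0 keeps every scalar point instead of subsampling.
    accumulator = EventAccumulator(event_path, size_guidance={"scalars": 0})
    accumulator.Reload()  # parse the event data from disk
    return {
        tag: [(event.step, event.value) for event in accumulator.Scalars(tag)]
        for tag in accumulator.Tags()["scalars"]
    }
```

Under these assumptions, `list_event_files("metadata/training_logs/1_pretraining.zip")` would list the per-job event files in the archive, and `load_scalars("metadata/training_logs/2_extension")` would return whatever scalar curves the context extension job logged. Extracting the zip first and calling `load_scalars` on each extracted file is one way to rebuild the full pre-training curve job by job.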
metadata/training_logs/1_pretraining.zip
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:debd5c63735b96a9e62fa5b44b0127c9452c341047ec2b919f82d8612674edce
|
3 |
+
size 418213162
|
metadata/training_logs/2_extension/events.out.tfevents.1731919080.jzxh169.2097150.0
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:e922e0c4112bf78d634ff506c400a651620f43e966b11e2a6fe98206c6e9a423
|
3 |
+
size 3379212
|
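The two ADDED files are Git LFS pointers rather than the logs themselves: each records the spec version, a SHA-256 object id, and the size in bytes of the real file. As a sketch of how a downloaded copy could be checked against such a pointer, the snippet below parses the pointer text and compares size and digest. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for illustration, not part of Git LFS or the Hugging Face tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check that the file at `path` has the size and SHA-256 given in `pointer`."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("unsupported hash: " + pointer["algorithm"])
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        # Hash in chunks so a large archive need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above (1_pretraining.zip).
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:debd5c63735b96a9e62fa5b44b0127c9452c341047ec2b919f82d8612674edce
size 418213162
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algorithm"], pointer["size"])
```

Once the archive has been downloaded, `matches_pointer("metadata/training_logs/1_pretraining.zip", pointer)` would be expected to return `True` for an intact copy. If the path still holds the three-line pointer text instead of the archive, the size check fails, which is a quick way to notice that the LFS content was not fetched.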