stefan-it committed · commit a5c177c · verified · 1 parent: 7723691

readme: add initial version

Files changed (1): README.md +44 -12
README.md CHANGED
@@ -1,12 +1,44 @@
- ---
- license: apache-2.0
- datasets:
- - HuggingFaceFW/fineweb
- - HuggingFaceFW/fineweb-edu
- language:
- - en
- tags:
- - fineweb
- - lms
- - bert
- ---
+ # FineWeb-LMs: BERT
+
+ <p align="left">
+ <picture>
+ <img alt="BERT with TensorFlow Model Garden" src="https://github.com/stefan-it/model-garden-lms/raw/main/bert_tf_model_garden.png" style="max-width: 25%;">
+ </picture>
+ <br/>
+ </p>
+
+ This repository presents a BERT model that was pretrained on the 10BT subsets of [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) and [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
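+
+ A minimal usage sketch, assuming the checkpoints load with the Hugging Face `transformers` BERT classes (the checkpoint ID is taken from the evaluation table below):
+
+ ```python
+ from transformers import pipeline
+
+ # Fill-mask inference with one of the released checkpoints.
+ fill_mask = pipeline(
+     "fill-mask",
+     model="model-garden-lms/bert-base-finewebs-951k",
+ )
+
+ for prediction in fill_mask("Munich is the capital of [MASK]."):
+     print(prediction["token_str"], prediction["score"])
+ ```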
+
+ # Pretraining Details
+
+ The released BERT model is part of my [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms/tree/main) project.
+
+ The pretraining was done on a v3-32 TPU VM Pod, provided by the amazing [TRC program](https://sites.research.google/trc/about/). Detailed cheatsheets are available:
+
+ * [TPU VM Setup](https://github.com/stefan-it/model-garden-lms/tree/main/cheatsheet)
+ * [Pretraining a BERT Model with TensorFlow Model Garden Library](https://github.com/stefan-it/model-garden-lms/tree/main/bert)
+
+ tl;dr: The model was pretrained for 1M steps with a global batch size of 512, a sequence length of 512, and a vocabulary size of 64k.
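+
+ For intuition, these settings imply roughly 262B tokens seen during pretraining (a back-of-the-envelope count that assumes every step processes full-length sequences; padding would lower the effective number):
+
+ ```python
+ steps = 1_000_000   # pretraining steps
+ batch_size = 512    # global batch size
+ seq_len = 512       # sequence length in tokens
+
+ tokens_seen = steps * batch_size * seq_len
+ print(f"{tokens_seen:,} tokens")  # 262,144,000,000 -> ~262B tokens
+ ```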
+
+ # Checkpoint Evaluation with ScandEval
+
+ We evaluate the last five checkpoints (1M, 951k, 901k, 851k, and 801k) with a recent version of ScandEval to check their performance and to compare them with popular encoder-only models such as BERT, RoBERTa, and ELECTRA; a sketch of how such an evaluation can be launched follows below the results table:
+
+ | Model ID | Avg. Score | CoNLL-En | SST5 | ScaLA-En | SQuAD |
+ |-------------------------------------------------------------------------------------------------------------|--------------|-----------------------------|-----------------------------|-----------------------------|-----------------------------|
+ | [model-garden-lms/bert-base-finewebs-1m](https://huggingface.co/model-garden-lms/bert-base-finewebs-1m) | 69.03 | 88.98 ± 0.43 / 88.67 ± 0.36 | 58.11 ± 1.2 / 59.77 ± 1.49 | 57.29 ± 3.57 / 77.15 ± 2.17 | 55.82 ± 1.35 / 66.46 ± 1.51 |
+ | [model-garden-lms/bert-base-finewebs-951k](https://huggingface.co/model-garden-lms/bert-base-finewebs-951k) | **69.41** | 89.25 ± 0.4 / 88.9 ± 0.37 | 58.17 ± 1.26 / 59.86 ± 1.65 | 58.83 ± 3.46 / 78.22 ± 2.11 | 55.66 ± 1.19 / 66.36 ± 1.42 |
+ | [model-garden-lms/bert-base-finewebs-901k](https://huggingface.co/model-garden-lms/bert-base-finewebs-901k) | 69.12 | 89.22 ± 0.69 / 88.97 ± 0.45 | 57.93 ± 1.1 / 59.49 ± 1.44 | 58.66 ± 2.99 / 77.94 ± 1.88 | 55.0 ± 1.05 / 65.75 ± 1.29 |
+ | [model-garden-lms/bert-base-finewebs-851k](https://huggingface.co/model-garden-lms/bert-base-finewebs-851k) | 68.76 | 89.29 ± 0.52 / 89.0 ± 0.51 | 57.68 ± 0.97 / 59.01 ± 1.23 | 57.11 ± 3.77 / 77.36 ± 1.97 | 54.79 ± 1.21 / 65.87 ± 1.32 |
+ | [model-garden-lms/bert-base-finewebs-801k](https://huggingface.co/model-garden-lms/bert-base-finewebs-801k) | 68.12 | 88.92 ± 0.45 / 88.6 ± 0.44 | 57.64 ± 1.09 / 60.8 ± 1.88 | 54.28 ± 4.83 / 75.48 ± 2.97 | 54.13 ± 1.61 / 65.09 ± 1.65 |
+ | [google-bert/bert-base-cased](https://huggingface.co/google-bert/bert-base-cased) | 62.26 | 87.39 ± 0.79 / 87.11 ± 0.66 | 54.49 ± 1.36 / 53.22 ± 1.15 | 52.08 ± 2.13 / 74.52 ± 1.31 | 38.63 ± 2.1 / 50.68 ± 1.87 |
+ | [google/electra-base-discriminator](https://huggingface.co/google/electra-base-discriminator) | 69.26 | 87.82 ± 0.69 / 86.83 ± 0.62 | 62.3 ± 1.12 / 55.93 ± 0.67 | 62.61 ± 1.21 / 80.85 ± 0.59 | 52.51 ± 0.86 / 65.2 ± 0.85 |
+ | [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) | 68.96 | 90.35 ± 0.23 / 90.14 ± 0.2 | 60.95 ± 1.4 / 57.52 ± 1.97 | 50.64 ± 1.69 / 74.55 ± 0.9 | 57.82 ± 1.35 / 69.68 ± 1.02 |
+
+ Our pretrained BERT models show strong performance across all tasks; the 951k checkpoint achieves the best average score (69.41), ahead of ELECTRA (69.26) and RoBERTa (68.96).
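+
+ As a rough sketch of how such an evaluation can be launched via ScandEval's Python API (class and argument names may differ between ScandEval versions, so treat this as illustrative rather than the exact command we ran):
+
+ ```python
+ from scandeval import Benchmarker
+
+ # Benchmark one checkpoint on the English tasks reported above.
+ benchmarker = Benchmarker(language="en")
+ benchmarker.benchmark("model-garden-lms/bert-base-finewebs-951k")
+ ```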
+
+ # ❤️ Acknowledgements
+
+ This repository is the outcome of the last two years of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/) and the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.
+
+ Made from Bavarian Oberland with ❤️ and 🥨.