Readme polished
README.md CHANGED
@@ -5,9 +5,10 @@ inference: false
 tags:
 - BERT
 - HPLT
-- English
 - encoder
 license: apache-2.0
+datasets:
+- HPLT/hplt_monolingual_v1_2
 ---
 
 # HPLT Bert for English
@@ -17,7 +18,7 @@ license: apache-2.0
 This is one of the encoder-only monolingual language models trained as a first release by the [HPLT project](https://hplt-project.org/).
 It is a so-called masked language model. In particular, we used a modification of the classic BERT model named [LTG-BERT](https://aclanthology.org/2023.findings-eacl.146/).
 
-A monolingual LTG-BERT model is trained for every major language in the HPLT 1.2 data release (*75* models total).
+A monolingual LTG-BERT model is trained for every major language in the [HPLT 1.2 data release](https://hplt-project.org/datasets/v1.2) (*75* models total).
 
 All the HPLT encoder-only models use the same hyper-parameters, roughly following the BERT-base setup:
 - hidden size: 768
@@ -26,12 +27,11 @@ All the HPLT encoder-only models use the same hyper-parameters, roughly followin
 - vocabulary size: 32768
 
 Every model uses its own tokenizer trained on language-specific HPLT data.
-
-[The training statistics of all 75 runs](https://api.wandb.ai/links/ltg/kduj7mjn)
-
 See sizes of the training corpora, evaluation results and more in our [language model training report](https://hplt-project.org/HPLT_D4_1___First_language_models_trained.pdf).
 
-The training code
+[The training code](https://github.com/hplt-project/HPLT-WP4).
+
+[The training statistics of all 75 runs](https://api.wandb.ai/links/ltg/kduj7mjn)
 
 ## Example usage
 
|
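The "## Example usage" section itself lies outside this diff. For orientation, the sketch below shows one way such a masked language model could be queried through the Hugging Face `transformers` API; the repository id `HPLT/hplt_bert_base_en`, the `[MASK]` token string, and the need for `trust_remote_code=True` (custom LTG-BERT modeling code) are assumptions rather than details taken from the diff.

```python
# Minimal fill-mask sketch (assumed repo id; adjust to the actual model card).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "HPLT/hplt_bert_base_en"  # assumption: not stated in the diff above
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code is assumed to be required because LTG-BERT ships custom modeling code.
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Oslo is the capital of [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Replace each [MASK] position with its highest-scoring token id, keep the rest as-is.
mask_id = tokenizer.convert_tokens_to_ids("[MASK]")
predicted = torch.where(inputs["input_ids"] == mask_id, logits.argmax(dim=-1), inputs["input_ids"])
print(tokenizer.decode(predicted[0], skip_special_tokens=True))
```

Taking the argmax at the masked position is only the simplest way to inspect a prediction; the model card's own example may decode differently.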