Add whole word masking information
README.md
CHANGED
@@ -13,6 +13,10 @@ Pretrained model on English language using a masked language modeling (MLM) objective
 [this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference
 between english and English.
 
+Differently to other BERT models, this model was trained with a new technique: Whole Word Masking. In this case, all of the tokens corresponding to a word are masked at once. The overall masking rate remains the same.
+
+The training is identical -- each masked WordPiece token is predicted independently.
+
 Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
 the Hugging Face team.
 
@@ -194,11 +198,9 @@ learning rate warmup for 10,000 steps and linear decay of the learning rate after
 
 When fine-tuned on downstream tasks, this model achieves the following results:
 
-Glue test results:
-
-| Task | MNLI-(m/mm) | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  | Average |
-|:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
-| | 84.6/83.4 | 71.2 | 90.5 | 93.5 | 52.1 | 85.8 | 88.9 | 66.4 | 79.6 |
+Model | SQUAD 1.1 F1/EM | Multi NLI Accuracy
+---------------------------------------- | :-------------: | :----------------:
+BERT-Large, Uncased (Whole Word Masking) | 92.8/86.7 | 87.07
 
 
 ### BibTeX entry and citation info
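
As a quick illustration of the whole word masking scheme described in the added text, here is a minimal sketch using the `transformers` library's `DataCollatorForWholeWordMask` with PyTorch installed. It is a stand-in for the original TensorFlow pretraining code, not the exact procedure used to train this model, and the example sentence is arbitrary.

```python
# Minimal sketch of whole word masking with the `transformers` library.
# This is an illustration only, not the original pretraining code for this model.
from transformers import BertTokenizerFast, DataCollatorForWholeWordMask

tokenizer = BertTokenizerFast.from_pretrained("bert-large-uncased-whole-word-masking")

# Uncased: "english" and "English" tokenize identically.
assert tokenizer.tokenize("english") == tokenizer.tokenize("English")

# A rare word may be split into several WordPiece pieces ("##"-prefixed continuations).
encoding = tokenizer("bert handles rare words such as hyperparameters")

collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)
batch = collator([{"input_ids": encoding["input_ids"]}])

# All pieces of a selected word are chosen for prediction together, but each masked
# piece keeps its own label and is predicted independently during training.
print(tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist()))
print(batch["labels"][0].tolist())  # -100 marks positions that are not predicted
```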