Add whole word masking information
README.md
CHANGED
@@ -13,6 +13,10 @@ Pretrained model on English language using a masked language modeling (MLM) objective
 [this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference
 between english and English.
 
+Differently to other BERT models, this model was trained with a new technique: Whole Word Masking. In this case, all of the tokens corresponding to a word are masked at once. The overall masking rate remains the same.
+
+The training is identical -- each masked WordPiece token is predicted independently.
+
 Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
 the Hugging Face team.
 
@@ -194,11 +198,9 @@ learning rate warmup for 10,000 steps and linear decay of the learning rate after
 
 When fine-tuned on downstream tasks, this model achieves the following results:
 
-Glue test results:
-
-| Task | MNLI-(m/mm) | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  | Average |
-|:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
-| | 84.6/83.4 | 71.2 | 90.5 | 93.5 | 52.1 | 85.8 | 88.9 | 66.4 | 79.6 |
+Model | SQUAD 1.1 F1/EM | Multi NLI Accuracy
+---------------------------------------- | :-------------: | :----------------:
+BERT-Large, Uncased (Whole Word Masking) | 92.8/86.7 | 87.07
 
 
 ### BibTeX entry and citation info
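
As a quick illustration of the whole word masking scheme described in the added text, here is a minimal sketch using the `transformers` library's `DataCollatorForWholeWordMask` with PyTorch installed. It is a stand-in for the original TensorFlow pretraining code, not the exact procedure used to train this model, and the example sentence is arbitrary.

```python
# Minimal sketch of whole word masking with the `transformers` library.
# This is an illustration only, not the original pretraining code for this model.
from transformers import BertTokenizerFast, DataCollatorForWholeWordMask

tokenizer = BertTokenizerFast.from_pretrained("bert-large-uncased-whole-word-masking")

# Uncased: "english" and "English" tokenize identically.
assert tokenizer.tokenize("english") == tokenizer.tokenize("English")

# A rare word may be split into several WordPiece pieces ("##"-prefixed continuations).
encoding = tokenizer("bert handles rare words such as hyperparameters")

collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)
batch = collator([{"input_ids": encoding["input_ids"]}])

# All pieces of a selected word are chosen for prediction together, but each masked
# piece keeps its own label and is predicted independently during training.
print(tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist()))
print(batch["labels"][0].tolist())  # -100 marks positions that are not predicted
```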