bclavie committed on
Commit
9a76b05
1 Parent(s): e5629bf

Update README.md

Files changed (1)
  1. README.md +13 -5
README.md CHANGED
@@ -39,9 +39,9 @@ It is available in the following sizes:
 
  ## Usage
 
- You can use these models directly with the `transformers` library. Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`.
+ You can use these models directly with the `transformers` library. Since ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`. To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
 
- **⚠️ We strongly suggest using ModernBERT with Flash Attention 2, as it is by far the best performing variant of the model, and is a 1:1 match of our research implementation. To do so, install Flash Attention as follows, then use the model as normal:**
+ **⚠️ We strongly suggest using ModernBERT with Flash Attention 2, as it is by far the best performing variant of the model. To do so, install Flash Attention as follows, then use the model as normal:**
 
  ```bash
  pip install flash-attn
@@ -86,8 +86,6 @@ results = pipe(input_text)
  pprint(results)
  ```
 
- To use ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
-
  **Note:** ModernBERT does not use token type IDs, unlike some earlier BERT models. Most downstream usage is identical to standard BERT models on the Hugging Face Hub, except you can omit the `token_type_ids` parameter.
 
  ## Evaluation
@@ -151,4 +149,14 @@ We release the ModernBERT model architectures, model weights, training codebase
 
  If you use ModernBERT in your work, please cite:
 
- **TODO: Citation**
+ ```
+ @misc{modernbert,
+ title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
+ author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
+ year={2024},
+ eprint={2412.13663},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2412.13663},
+ }
+ ```
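
The Usage text in this diff recommends Flash Attention 2 and loading via `AutoModelForMaskedLM`. As a minimal sketch of what that could look like in practice (not taken from the README itself), the snippet below requests Flash Attention 2 only when the `flash_attn` package is installed, and otherwise leaves the library default untouched. The helper names `attention_kwargs` and `load_masked_lm` are hypothetical, and the model ID is left to the caller because it does not appear in this diff. `attn_implementation` is the `from_pretrained` argument that `transformers` uses to select an attention backend.

```python
import importlib.util


def attention_kwargs(package: str = "flash_attn") -> dict:
    """Build from_pretrained kwargs that request Flash Attention 2 if available.

    `package` is a parameter only so the check can be exercised without
    flash-attn installed; it is an illustrative choice, not a library API.
    """
    # find_spec only checks that the package can be located. It does not
    # import it or verify that a compatible GPU is present.
    if importlib.util.find_spec(package) is not None:
        return {"attn_implementation": "flash_attention_2"}
    # No override: let transformers pick its default attention backend.
    return {}


def load_masked_lm(model_id: str):
    """Load a tokenizer and masked-LM model (hypothetical helper)."""
    # Imported lazily so attention_kwargs() works without transformers.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id, **attention_kwargs())
    return tokenizer, model
```

Flash Attention 2 generally also needs a supported GPU and half-precision weights, so a real setup would likely set the dtype and device as well; those details are outside what this commit describes.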