Update README.md
README.md CHANGED
@@ -37,8 +37,8 @@ This model is served to you by [Kaspar von Beelen](https://huggingface.co/Kaspar
 - [Background: MDMA to the rescue 🙂](#background-mdma-to-the-rescue-%F0%9F%99%82)
 - [Intended Use: LMs as History Machines](#intended-use-lms-as-history-machines)
 - [Historical Language Change: Her/His Majesty? 👑](#historical-language-change-herhis-majesty-%F0%9F%91%91)
-- [Date Prediction: Pub Quiz with LMs 🍻](#date-prediction)
-- [Limitations: Not all is well 😮](#limitations)
+- [Date Prediction: Pub Quiz with LMs 🍻](#date-prediction-pub-quiz-with-lms-%F0%9F%8D%BB)
+- [Limitations: Not all is well 😮](#limitations-not-all-is-well-%F0%9F%98%AE)
 - [Training Data](#training-data)
 - [Training Routine](#training-routine)
 - [Data Description](#data-description)
@@ -128,7 +128,7 @@ Firstly, eyeballing some toy examples (but also using more rigorous metrics such

 Secondly, MDMA may reduce biases induced by imbalances in the training data (or at least give us more of a handle on this problem). Admittedly, we have to prove this more formally, but some experiments at least hint in this direction. The data used for training is highly biased towards the Victorian age, and a standard language model trained on this corpus will predict "her" for `"[MASK] Majesty"`.

-### Date Prediction: Pub Quiz with LMs
+### Date Prediction: Pub Quiz with LMs 🍻

 Another feature of the ERWT model series is date prediction. Remember that during training the temporal metadata token is often masked. In this case, the model effectively learns to situate documents in time based on the tokens they contain.

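For readers who want to try the `"[MASK] Majesty"` probe mentioned in the changed section, a minimal sketch with the Hugging Face `fill-mask` pipeline follows. The checkpoint name `Livingwithmachines/erwt-year` and the `"<year> [DATE] <text>"` prompt format are assumptions drawn from the ERWT model series conventions, not from this diff:

```python
# Minimal sketch of the "[MASK] Majesty" probe. The checkpoint name and the
# "<year> [DATE] <text>" prompt format are assumptions, not part of this commit.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="Livingwithmachines/erwt-year")

# Unconditioned prompt: a corpus skewed towards the Victorian age tends to
# rank "her" first.
for pred in fill_mask("[MASK] Majesty")[:3]:
    print(f'{pred["token_str"]:>5}  {pred["score"]:.3f}')

# Prepending a year before Victoria's accession (1837) should shift the
# ranking towards "his".
for pred in fill_mask("1810 [DATE] [MASK] Majesty")[:3]:
    print(f'{pred["token_str"]:>5}  {pred["score"]:.3f}')
```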
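The date-prediction behaviour described in the renamed section can be sketched the same way, by masking the temporal metadata slot itself (same assumptions as above):

```python
# Date-prediction sketch: mask the temporal metadata token and read the top
# predictions as guesses for the year of publication. Checkpoint name and
# "<year> [DATE] <text>" format are assumptions, as above.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="Livingwithmachines/erwt-year")

sentence = "[MASK] [DATE] The Queen's speech was read in both Houses of Parliament."
for pred in fill_mask(sentence)[:5]:
    print(pred["token_str"], f'{pred["score"]:.3f}')
```

Four-digit years are typically single tokens in BERT-style vocabularies, so the `token_str` values can usually be read directly as candidate years.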