Kaspar committed
Commit b8f8098 · 1 Parent(s): 9d7eb61

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -37,8 +37,8 @@ This model is served to you by [Kaspar von Beelen](https://huggingface.co/Kaspar
  - [Background: MDMA to the rescue 🙂](#background-mdma-to-the-rescue-%F0%9F%99%82)
  - [Intended Use: LMs as History Machines 🚂](#intended-use-lms-as-history-machines)
  - [Historical Language Change: Her/His Majesty? 👑](#historical-language-change-herhis-majesty-%F0%9F%91%91)
- - [Date Prediction: Pub Quiz with LMs 🍻](#date-prediction)
- - [Limitations: Not all is well 😮](#limitations)
+ - [Date Prediction: Pub Quiz with LMs 🍻](#date-prediction-pub-quiz-with-lms-%F0%9F%8D%BB)
+ - [Limitations: Not all is well 😮](#limitations-not-all-is-well-%F0%9F%98%AE)
  - [Training Data](#training-data)
  - [Training Routine](#training-routine)
  - [Data Description](#data-description)
@@ -128,7 +128,7 @@ Firstly, eyeballing some toy examples (but also using more rigorous metrics such
 
 Secondly, MDMA may reduce biases induced by imbalances in the training data (or at least give us more of a handle on this problem). Admittedly, we have to prove this more formally, but some experiments at least hint in this direction. The data used for training is highly biased towards the Victorian age and a standard language model trained on this corpus will predict "her" for ```"[MASK] Majesty"```.
 
-### Date Prediction: Pub Quiz with LMs
+### Date Prediction: Pub Quiz with LMs 🍻
 
 Another feature of the ERWT model series is date prediction. Remember that during training the temporal metadata token is often masked. In this case, the model effectively learns to situate documents in time based on the tokens they contain.
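The date-prediction behaviour described in the changed section comes from masking a temporal metadata token during training. A minimal sketch of that preprocessing step, assuming an illustrative `"[YYYY]"` token format and masking probability (neither is ERWT's documented configuration):

```python
import random

def prepend_time_token(text: str, year: int, mask_prob: float = 0.5,
                       rng: random.Random = random) -> str:
    """Prepend a temporal metadata token to a training document.

    With probability `mask_prob` the token is replaced by [MASK], so the
    model must learn to infer the document's date from the tokens it
    contains -- the basis of ERWT-style date prediction.
    The "[YYYY]" token format and mask_prob value are illustrative
    assumptions, not the model's actual training setup.
    """
    token = f"[{year}]" if rng.random() >= mask_prob else "[MASK]"
    return f"{token} {text}"

# mask_prob=0.0 always keeps the year token; mask_prob=1.0 always masks it
print(prepend_time_token("Her Majesty opened Parliament.", 1867, mask_prob=0.0))
# -> [1867] Her Majesty opened Parliament.
print(prepend_time_token("Her Majesty opened Parliament.", 1867, mask_prob=1.0))
# -> [MASK] Her Majesty opened Parliament.
```

At inference time, placing `[MASK]` where the year token would sit turns the same masked-language-model head into a date predictor.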