Kaspar committed on
Commit
7cc6cd2
·
1 Parent(s): 54c9f3c

Update README.md

Browse files
Files changed (1)
  1. README.md +31 -4
README.md CHANGED
@@ -12,7 +12,7 @@ widget:
12
  - text: "1820 [DATE] We received a letter from [MASK] Majesty."
13
  - text: "1850 [DATE] We received a letter from [MASK] Majesty."
14
  - text: "[MASK] [DATE] The Franco-Prussian war is a matter of great concern."
15
- - text: "[MASK] [DATE] The Second Schleswig war is a matter of great concern."
16
 
17
  ---
18
  **MODEL CARD UNDER CONSTRUCTION, ETA END OF NOVEMBER**
@@ -28,7 +28,7 @@ ERWT is a fine-tuned [`distilbert-base-cased`](https://huggingface.co/distilbert
28
 
29
  ERWT performs time-sensitive masked language modelling. It can also guess the year a text was written.
30
 
31
- This model is served you by [Kaspar von Beelen](https://huggingface.co/Kaspar) and [Daniel van Strien](https://huggingface.co/davanstrien), *"Improving AI, one pea at a time"*.
32
 
33
  ## Introductory Note: Repent Now. πŸ˜‡
34
 
@@ -93,7 +93,7 @@ Returns as most likely prediction:
93
  However, if we change the date at the start of the sentence to 1850:
94
 
95
  ```python
96
- mask_filler(f"1820 [DATE] We received a letter from [MASK] Majesty.")
97
  ```
98
 
99
 This will put most of the probability mass on the token "her" and only a little on "him".
@@ -111,12 +111,39 @@ Okay, but why is this interesting?
111
 
112
 Firstly, eyeballing some toy examples (but also using more rigorous metrics such as perplexity) shows that MLMs make more accurate predictions when they have access to temporal metadata. In other words, ERWT's predictions reflect historical language use more accurately. Models that are sensitive to historical context could therefore be more reliable tools for working with historical text.
113
 
114
- Secondly, we anticipate the MDMA may reduce bias, or at least gives us more of a handle on this problem. Admittedly, we have to prove this more formally, but some experiments at least hint in this direction.
115
 
116
  ### Date Prediction
117
 
118
  ## Limitations
119
 
120
 ERWT models were trained for evaluation purposes and carry critical limitations. First of all, as explained in more detail below, this model was trained on a rather small subsample of British newspapers with a strong metropolitan and liberal bias.
121
 
122
 Secondly, we trained for only one epoch, so the models are likely under-trained; for evaluation purposes we were mainly interested in the relative performance of our models.
 
12
  - text: "1820 [DATE] We received a letter from [MASK] Majesty."
13
  - text: "1850 [DATE] We received a letter from [MASK] Majesty."
14
  - text: "[MASK] [DATE] The Franco-Prussian war is a matter of great concern."
15
+ - text: "[MASK] [DATE] The Schleswig war is a matter of great concern."
16
 
17
  ---
18
  **MODEL CARD UNDER CONSTRUCTION, ETA END OF NOVEMBER**
 
28
 
29
  ERWT performs time-sensitive masked language modelling. It can also guess the year a text was written.
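 The examples below use a `mask_filler` function, a standard `transformers` fill-mask pipeline loaded with ERWT. A minimal sketch of how such a pipeline could be set up is given here; the model identifier is a placeholder, not the published name.
 
 ```python
 # Minimal sketch (assumption): load ERWT as a fill-mask pipeline.
 # "PATH_TO_ERWT" is a placeholder for the model's Hub id or a local path.
 from transformers import pipeline
 
 mask_filler = pipeline("fill-mask", model="PATH_TO_ERWT")
 
 # Each prediction carries the proposed token ("token_str") and a score.
 preds = mask_filler("1820 [DATE] We received a letter from [MASK] Majesty.")
 for pred in preds:
     print(pred["token_str"], round(pred["score"], 3))
 ```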
30
 
31
+ This model is served to you by [Kaspar von Beelen](https://huggingface.co/Kaspar) and [Daniel van Strien](https://huggingface.co/davanstrien), *"Improving AI, one pea at a time"*.
32
 
33
  ## Introductory Note: Repent Now. πŸ˜‡
34
 
 
93
  However, if we change the date at the start of the sentence to 1850:
94
 
95
  ```python
96
+ mask_filler(f"1850 [DATE] We received a letter from [MASK] Majesty.")
97
  ```
98
 
99
 This will put most of the probability mass on the token "her" and only a little on "him".
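 To make the shift in probability mass concrete, the scores can be read directly from the pipeline output. A small sketch, assuming `mask_filler` is the fill-mask pipeline sketched earlier; the `targets` argument restricts scoring to a fixed set of candidate tokens.
 
 ```python
 # Compare the scores ERWT assigns to candidate pronouns for the two dates.
 for year in ("1820", "1850"):
     preds = mask_filler(
         f"{year} [DATE] We received a letter from [MASK] Majesty.",
         targets=["his", "her", "him"],
     )
     scores = {p["token_str"]: round(p["score"], 3) for p in preds}
     print(year, scores)
 ```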
 
111
 
112
 Firstly, eyeballing some toy examples (but also using more rigorous metrics such as perplexity) shows that MLMs make more accurate predictions when they have access to temporal metadata. In other words, ERWT's predictions reflect historical language use more accurately. Models that are sensitive to historical context could therefore be more reliable tools for working with historical text.
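 The perplexity comparison mentioned above can be approximated for a masked language model by masking each token in turn and scoring the original token (so-called pseudo-perplexity). The sketch below illustrates that idea; it is not necessarily the exact evaluation we used, and the model identifier is again a placeholder.
 
 ```python
 import torch
 from transformers import AutoModelForMaskedLM, AutoTokenizer
 
 # Pseudo-perplexity: mask each position in turn and average the negative
 # log-likelihood the model assigns to the original token at that position.
 tokenizer = AutoTokenizer.from_pretrained("PATH_TO_ERWT")  # placeholder id
 model = AutoModelForMaskedLM.from_pretrained("PATH_TO_ERWT")
 model.eval()
 
 def pseudo_perplexity(text):
     input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
     nlls = []
     for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
         masked = input_ids.clone()
         masked[i] = tokenizer.mask_token_id
         with torch.no_grad():
             logits = model(masked.unsqueeze(0)).logits
         log_probs = logits[0, i].log_softmax(dim=-1)
         nlls.append(-log_probs[input_ids[i]].item())
     return float(torch.exp(torch.tensor(nlls).mean()))
 
 # Lower pseudo-perplexity means the model finds the sentence less surprising.
 print(pseudo_perplexity("1850 [DATE] We received a letter from her Majesty."))
 ```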
113
 
114
+ Secondly, MDMA may reduce biases induced by imbalances in the training data (or at least give us more of a handle on this problem). Admittedly, we have to prove this more formally, but some experiments at least hint in this direction. The data used for training is highly biased towards the Victorian age, and a standard language model trained on this corpus will predict "her" for `"[MASK] Majesty"`.
115
 
116
  ### Date Prediction
117
 
118
+ Another feature of the ERWT model series is date prediction. Remember that during training the temporal metadata token is often masked. In these cases, the model effectively learns to situate documents in time based on the tokens they contain.
119
+
120
+ By masking the year token, ERWT guesses the document's year of publication.
121
+
122
+ πŸ‘©β€πŸ« **History Intermezzo** To unite the German states (there were plenty!), Prussia fought a number of wars with its neighbours in the second half of the nineteenth century. It invaded Denmark in 1864 (the second of the Schleswig Wars) and France in 1870 (the Franco-Prussian war).
123
+
124
+ Reusing the code above, we can time-stamp documents by masking the year token. For example, the line of Python code below:
125
+
126
+ ```python
127
+ mask_filler("[MASK] [DATE] The Schleswig war is a matter of great concern.")
128
+ ```
129
+
130
+ Returns "1864", which makes sense, as this was indeed the year in which Prussian troops (with some help from their Austrian friends) crossed the border into Schleswig, then part of the Kingdom of Denmark.
131
+
132
+ A few years later, in 1870, Prussia aimed its artillery southwards and invaded France.
133
+
134
+ ```python
135
+ mask_filler("[MASK] [DATE] The Franco-Prussian war is a matter of great concern.")
136
+ ```
137
+
138
+ ERWT clearly learned a lot about the history of German unification by ploughing through a plethora of nineteenth-century newspaper articles: it correctly returns "1870" as the predicted year.
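 If a single numeric estimate is preferred over the top token, the scores of several predictions can be combined. The sketch below takes a score-weighted average over the top predictions that parse as years; this is one possible recipe for illustration, not necessarily the one we used.
 
 ```python
 # Combine the top predicted years into one estimate, weighting each year
 # by its score; predictions that are not plain digits are skipped.
 preds = mask_filler(
     "[MASK] [DATE] The Franco-Prussian war is a matter of great concern.",
     top_k=10,
 )
 years = [(int(p["token_str"]), p["score"]) for p in preds if p["token_str"].isdigit()]
 if years:
     total = sum(score for _, score in years)
     print(round(sum(y * s for y, s in years) / total, 1))
 ```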
139
+
140
+ Again, we have to ask: who cares? Wikipedia can tell us much the same, and don't we already have correct timestamps for newspaper data?
141
+
142
+
143
  ## Limitations
144
 
145
+ ### The models
146
+
147
 ERWT models were trained for evaluation purposes and carry critical limitations. First of all, as explained in more detail below, this model was trained on a rather small subsample of British newspapers with a strong metropolitan and liberal bias.
148
 
149
 Secondly, we trained for only one epoch, so the models are likely under-trained; for evaluation purposes we were mainly interested in the relative performance of our models.