Kaspar committed on
Commit
7cc6cd2
·
1 Parent(s): 54c9f3c

Update README.md

Browse files
Files changed (1)
  1. README.md +31 -4
README.md CHANGED
@@ -12,7 +12,7 @@ widget:
12
  - text: "1820 [DATE] We received a letter from [MASK] Majesty."
13
  - text: "1850 [DATE] We received a letter from [MASK] Majesty."
14
  - text: "[MASK] [DATE] The Franco-Prussian war is a matter of great concern."
15
- - text: "[MASK] [DATE] The Second Schleswig war is a matter of great concern."
16
 
17
  ---
18
  **MODEL CARD UNDER CONSTRUCTION, ETA END OF NOVEMBER**
@@ -28,7 +28,7 @@ ERWT is a fine-tuned [`distilbert-base-cased`](https://huggingface.co/distilbert
28
 
29
  ERWT performs time-sensitive masked language modelling. It can also guess the year a text was written.
30
 
31
- This model is served you by [Kaspar von Beelen](https://huggingface.co/Kaspar) and [Daniel van Strien](https://huggingface.co/davanstrien), *"Improving AI, one pea at a time"*.
32
 
33
  ## Introductory Note: Repent Now. πŸ˜‡
34
 
@@ -93,7 +93,7 @@ Returns as most likely prediction:
93
  However, if we change the date at the start of the sentence to 1850:
94
 
95
  ```python
96
- mask_filler(f"1820 [DATE] We received a letter from [MASK] Majesty.")
97
  ```
98
 
99
 This will put most of the probability mass on the token "her" and only a little on "him".
@@ -111,12 +111,39 @@ Okay, but why is this interesting?
111
 
112
 Firstly, eyeballing some toy examples (but also using more rigorous metrics such as perplexity) shows that MLMs make more accurate predictions when they have access to temporal metadata. In other words, ERWT's predictions reflect historical language use more accurately. Models that are sensitive to historical context could therefore be more reliable tools for working with historical text.
113
 
114
- Secondly, we anticipate the MDMA may reduce bias, or at least gives us more of a handle on this problem. Admittedly, we have to prove this more formally, but some experiments at least hint in this direction.
115
 
116
  ### Date Prediction
117
 
118
  ## Limitations
119
 
120
 ERWT models were trained for evaluation purposes and carry critical limitations. First of all, as explained in more detail below, this model was trained on a rather small subsample of British newspapers with a strong metropolitan and liberal bias.
121
 
122
 Secondly, we trained for only one epoch, so the models are likely under-trained; for evaluation purposes we were mainly interested in the relative performance of our models.
 
12
  - text: "1820 [DATE] We received a letter from [MASK] Majesty."
13
  - text: "1850 [DATE] We received a letter from [MASK] Majesty."
14
  - text: "[MASK] [DATE] The Franco-Prussian war is a matter of great concern."
15
+ - text: "[MASK] [DATE] The Schleswig war is a matter of great concern."
16
 
17
  ---
18
  **MODEL CARD UNDER CONSTRUCTION, ETA END OF NOVEMBER**
 
28
 
29
  ERWT performs time-sensitive masked language modelling. It can also guess the year a text was written.
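 The examples below use a `mask_filler` function, a standard `transformers` fill-mask pipeline loaded with ERWT. A minimal sketch of how such a pipeline could be set up is given here; the model identifier is a placeholder, not the published name.
 
 ```python
 # Minimal sketch (assumption): load ERWT as a fill-mask pipeline.
 # "PATH_TO_ERWT" is a placeholder for the model's Hub id or a local path.
 from transformers import pipeline
 
 mask_filler = pipeline("fill-mask", model="PATH_TO_ERWT")
 
 # Each prediction carries the proposed token ("token_str") and a score.
 preds = mask_filler("1820 [DATE] We received a letter from [MASK] Majesty.")
 for pred in preds:
     print(pred["token_str"], round(pred["score"], 3))
 ```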
30
 
31
+ This model is served to you by [Kaspar von Beelen](https://huggingface.co/Kaspar) and [Daniel van Strien](https://huggingface.co/davanstrien), *"Improving AI, one pea at a time"*.
32
 
33
  ## Introductory Note: Repent Now. πŸ˜‡
34
 
 
93
  However, if we change the date at the start of the sentence to 1850:
94
 
95
  ```python
96
+ mask_filler(f"1850 [DATE] We received a letter from [MASK] Majesty.")
97
  ```
98
 
99
 This will put most of the probability mass on the token "her" and only a little on "him".
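 To make the shift in probability mass concrete, the scores can be read directly from the pipeline output. A small sketch, assuming `mask_filler` is the fill-mask pipeline sketched earlier; the `targets` argument restricts scoring to a fixed set of candidate tokens.
 
 ```python
 # Compare the scores ERWT assigns to candidate pronouns for the two dates.
 for year in ("1820", "1850"):
     preds = mask_filler(
         f"{year} [DATE] We received a letter from [MASK] Majesty.",
         targets=["his", "her", "him"],
     )
     scores = {p["token_str"]: round(p["score"], 3) for p in preds}
     print(year, scores)
 ```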
 
111
 
112
 Firstly, eyeballing some toy examples (but also using more rigorous metrics such as perplexity) shows that MLMs make more accurate predictions when they have access to temporal metadata. In other words, ERWT's predictions reflect historical language use more accurately. Models that are sensitive to historical context could therefore be more reliable tools for working with historical text.
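 The perplexity comparison mentioned above can be approximated for a masked language model by masking each token in turn and scoring the original token (so-called pseudo-perplexity). The sketch below illustrates that idea; it is not necessarily the exact evaluation we used, and the model identifier is again a placeholder.
 
 ```python
 import torch
 from transformers import AutoModelForMaskedLM, AutoTokenizer
 
 # Pseudo-perplexity: mask each position in turn and average the negative
 # log-likelihood the model assigns to the original token at that position.
 tokenizer = AutoTokenizer.from_pretrained("PATH_TO_ERWT")  # placeholder id
 model = AutoModelForMaskedLM.from_pretrained("PATH_TO_ERWT")
 model.eval()
 
 def pseudo_perplexity(text):
     input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
     nlls = []
     for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
         masked = input_ids.clone()
         masked[i] = tokenizer.mask_token_id
         with torch.no_grad():
             logits = model(masked.unsqueeze(0)).logits
         log_probs = logits[0, i].log_softmax(dim=-1)
         nlls.append(-log_probs[input_ids[i]].item())
     return float(torch.exp(torch.tensor(nlls).mean()))
 
 # Lower pseudo-perplexity means the model finds the sentence less surprising.
 print(pseudo_perplexity("1850 [DATE] We received a letter from her Majesty."))
 ```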
113
 
114
+ Secondly, MDMA may reduce biases induced by imbalances in the training data (or at least give us more of a handle on this problem). Admittedly, we have to prove this more formally, but some experiments at least hint in this direction. The data used for training is highly biased towards the Victorian age, and a standard language model trained on this corpus will predict "her" for `"[MASK] Majesty"`.
115
 
116
  ### Date Prediction
117
 
118
+ Another feature of the ERWT model series is date prediction. Remember that during training the temporal metadata token is often masked. In these cases, the model effectively learns to situate documents in time based on the tokens they contain.
119
+
120
+ By masking the year token, ERWT guesses the document's year of publication.
121
+
122
+ πŸ‘©β€πŸ« **History Intermezzo** To unite the German states (there were plenty!), Prussia fought a number of wars with its neighbours in the second half of the nineteenth century. It invaded Denmark in 1864 (the second of the Schleswig Wars) and France in 1870 (the Franco-Prussian war).
123
+
124
+ Reusing the code above, we can time-stamp documents by masking the year token. For example, the line of Python code below:
125
+
126
+ ```python
127
+ mask_filler("[MASK] [DATE] The Schleswig war is a matter of great concern.")
128
+ ```
129
+
130
+ Returns "1864", which makes sense, as this was indeed the year in which Prussian troops (with some help from their Austrian friends) crossed the border into Schleswig, then part of the Kingdom of Denmark.
131
+
132
+ A few years later, in 1870, Prussia aimed its artillery southwards and invaded France.
133
+
134
+ ```python
135
+ mask_filler("[MASK] [DATE] The Franco-Prussian war is a matter of great concern.")
136
+ ```
137
+
138
+ ERWT clearly learned a lot about the history of German unification by ploughing through a plethora of nineteenth-century newspaper articles: it correctly returns "1870" as the predicted year.
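 If a single numeric estimate is preferred over the top token, the scores of several predictions can be combined. The sketch below takes a score-weighted average over the top predictions that parse as years; this is one possible recipe for illustration, not necessarily the one we used.
 
 ```python
 # Combine the top predicted years into one estimate, weighting each year
 # by its score; predictions that are not plain digits are skipped.
 preds = mask_filler(
     "[MASK] [DATE] The Franco-Prussian war is a matter of great concern.",
     top_k=10,
 )
 years = [(int(p["token_str"]), p["score"]) for p in preds if p["token_str"].isdigit()]
 if years:
     total = sum(score for _, score in years)
     print(round(sum(y * s for y, s in years) / total, 1))
 ```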
139
+
140
+ Again, we have to ask: who cares? Wikipedia can tell us much the same, and don't we already have correct timestamps for newspaper data?
141
+
142
+
143
  ## Limitations
144
 
145
+ ### The models
146
+
147
 ERWT models were trained for evaluation purposes and carry critical limitations. First of all, as explained in more detail below, this model was trained on a rather small subsample of British newspapers with a strong metropolitan and liberal bias.
148
 
149
 Secondly, we trained for only one epoch, so the models are likely under-trained; for evaluation purposes we were mainly interested in the relative performance of our models.