Update README.md
README.md
CHANGED
@@ -125,9 +125,21 @@ Reusing to code above, we can time-stamp documents by masking the year. For exam
|
|
125 |
|
126 |
```python
|
127 |
mask_filler("[MASK] [DATE] The Schleswig war is a matter of great concern.")
|
128 |
```
|
129 |
|
130 |
-
|
|
|
131 |
|
132 |
A few years later, in 1870, Prussia aimed artillery southwards and invaded France.
|
133 |
|
@@ -141,16 +153,12 @@ Again, we have to ask: Who cares? Wikipedia can tell us pretty much the same. Mo
|
|
141 |
|
142 |
In both cases, our answers would be "yes, but...". ERWT's time-stamping powers have little instrumental use and won't make us rich (but donations are welcome of course 🤑). We nonetheless believe date prediction has value for research purposes. We can use ERWT for "fictitious" prediction, i.e. as a diagnostic tool.
|
143 |
|
144 |
-
Firstly,
|
145 |
-
|
146 |
-
Secondly,
|
147 |
-
|
148 |
|
149 |
## Limitations
|
150 |
|
151 |
-
|
152 |
-
|
153 |
-
ERWT models were trained for evaluation purposes, and carry critical limitations. First of all, as explained in more detail below, this model is trained on a rather small subsample of British newspapers, with a strong metropolitan and liberal bias.
|
154 |
|
155 |
Secondly, we only trained for one epoch. For evaluation purposes, we were interested in the relative performance of our models.
|
156 |
|
125 |
|
126 |
```python
|
127 |
mask_filler("[MASK] [DATE] The Schleswig war is a matter of great concern.")
|
128 |
+
|
129 |
+
```
|
130 |
+
|
131 |
+
This outputs, as the most likely filler:
|
132 |
+
|
133 |
+
```python
|
134 |
+
{'score': 0.48822104930877686,
|
135 |
+
'token': 6717,
|
136 |
+
'token_str': '1864',
|
137 |
+
'sequence': '1864 the schleswig war is a matter of great concern.'}
|
138 |
+
|
139 |
```
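As a minimal sketch of how such output could be post-processed, the snippet below picks the highest-scoring prediction that looks like a four-digit year. It assumes `mask_filler` returns a list of dictionaries shaped like the one shown above (the usual fill-mask pipeline format); the helper name `most_likely_year` is hypothetical and not part of the README.

```python
import re

# A candidate counts as a year if it is exactly four digits.
YEAR_PATTERN = re.compile(r"^\d{4}$")


def most_likely_year(predictions):
    """Return (year, score) for the best year-like prediction, or None.

    `predictions` is assumed to be a list of dicts with at least the
    keys 'token_str' and 'score', as in the output shown above.
    """
    candidates = [
        (int(p["token_str"].strip()), p["score"])
        for p in predictions
        if YEAR_PATTERN.match(p["token_str"].strip())
    ]
    if not candidates:
        return None
    # Keep the candidate with the highest score.
    return max(candidates, key=lambda pair: pair[1])


# The single prediction reported above, wrapped in a list.
predictions = [
    {
        "score": 0.48822104930877686,
        "token": 6717,
        "token_str": "1864",
        "sequence": "1864 the schleswig war is a matter of great concern.",
    }
]

print(most_likely_year(predictions))
```

Filtering on the four-digit pattern is a design choice for this sketch: it guards against the model filling the mask with a non-date token.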
|
140 |
|
141 |
+
|
142 |
+
The prediction "1864" makes sense, as this was indeed the year Prussian troops (with some help from their Austrian friends) crossed the border into Schleswig, then part of the Kingdom of Denmark.
|
143 |
|
144 |
A few years later, in 1870, Prussia aimed artillery southwards and invaded France.
|
145 |
|
|
|
153 |
|
154 |
In both cases, our answers would be "yes, but...". ERWT's time-stamping powers have little instrumental use and won't make us rich (but donations are welcome of course 🤑). We nonetheless believe date prediction has value for research purposes. We can use ERWT for "fictitious" prediction, i.e. as a diagnostic tool.
|
155 |
|
156 |
+
Firstly, we used date prediction for evaluation purposes, to compare the models that different training routines produce.
|
157 |
+
Secondly, we could use it as an analytical tool, to study temporal variation **within** text documents and to further scrutinise which features drive the time prediction (it goes without saying that the same applies to other metadata fields, for example predicting political orientation).
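To make the evaluation use concrete, here is one possible way to score date predictions against known publication years. The README does not say which metric was used; mean absolute error in years is an assumption made for this sketch, and the function name and all numbers below are hypothetical illustration values, not results.

```python
def mean_absolute_year_error(predicted_years, true_years):
    """Average absolute distance, in years, between predictions and truth."""
    if len(predicted_years) != len(true_years):
        raise ValueError("predicted_years and true_years must have the same length")
    if not true_years:
        raise ValueError("at least one document is required")
    total = sum(abs(p - t) for p, t in zip(predicted_years, true_years))
    return total / len(true_years)


# Made-up values for illustration only (not measurements from any model).
true_years = [1864, 1870, 1851]
routine_a = [1864, 1868, 1855]  # predictions from a hypothetical training routine A
routine_b = [1860, 1880, 1851]  # predictions from a hypothetical training routine B

print(mean_absolute_year_error(routine_a, true_years))
print(mean_absolute_year_error(routine_b, true_years))
```

Under this metric, a lower value means the model's predicted years sit closer to the true ones, which gives a relative ranking of training routines rather than an absolute measure of quality.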
|
158 |
|
159 |
## Limitations
|
160 |
|
161 |
+
The ERWT series was trained for evaluation purposes, and carries critical limitations. First of all, as explained in more detail below, this model is trained on a rather small subsample of British newspapers, with a strong metropolitan and liberal bias.
|
162 |
|
163 |
Secondly, we only trained for one epoch. For evaluation purposes, we were interested in the relative performance of our models.
|
164 |
|