Turmbücher NER
A named entity recognition (NER) model for historical German, developed by Ismail Prada Ziegler as part of a research project in Digital Humanities at the University of Bern.
Performance
Metric | PER | ORG | LOC | Micro-Avg |
---|---|---|---|---|
Precision | 82.46% | 28.81% | 88.51% | 81.21% |
Recall | 88.51% | 44.74% | 83.02% | 83.99% |
F1-Score | 85.38% | 35.05% | 85.67% | 82.57% |
Note: ORG tags were too inconsistent in the training data, and the model performs poorly on this class.
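The F1 scores in the table above are the harmonic mean of precision and recall. As a quick sanity check, the following minimal sketch recomputes them from the rounded table values. The function name `f1_score` and the `reported` dictionary are illustrative only and are not part of the project's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the table above.
reported = {
    "PER": (82.46, 88.51, 85.38),
    "ORG": (28.81, 44.74, 35.05),
    "LOC": (88.51, 83.02, 85.67),
    "Micro-Avg": (81.21, 83.99, 82.57),
}

for label, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{label}: recomputed F1 = {f1:.2f}%, reported = {f1_reported:.2f}%")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones by about 0.01 percentage points.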
Initial experiments suggest that the model also performs reasonably well on automatically transcribed text with a character error rate (CER) of around 5%.
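For orientation, CER is commonly defined as the character-level edit distance between the automatic transcription and a reference transcription, divided by the length of the reference. The sketch below assumes that definition; the project does not state how its CER was measured, and the example line is invented rather than taken from the Turmbücher.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: one wrong character in a 20-character line.
reference = "hans müller von bern"
hypothesis = "hans muller von bern"
print(f"CER = {cer(reference, hypothesis):.2%}")
```

In this example a single substituted character in a 20-character line gives a CER of 5%, which is roughly the noise level mentioned above.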
Data Set
Main data set: Berner Turmbücher, early volumes from the 16th century, Early New High German, 61k tokens of training data.
Secondary data sets:
- SSRQ - Fribourg, language model + tagging, 59k tokens.
- Chorgerichtsmanuale (unpublished), language model + tagging, 76k tokens.
- Königsfelden Charters, language model, 623k tokens.
- Talgerichtsprotokolle (unpublished), language model, 438k tokens.
Notice
This project is still in progress.