cneud committed · Commit 9663d01 · 1 Parent(s): a75dfa5

Update README.md


Corrected some typos

Files changed (1)
  1. README.md +9 -12
README.md CHANGED
@@ -66,7 +66,7 @@ The model was developed by the Berlin State Library (SBB) in the [QURATOR](https
 ## Model Description
 
 <!-- Provide a longer summary of what this model is/does. -->
-A BERT model trained on three German corpora containing contemporary and historical texts for named entity recognition tasks.
+A BERT model trained on three German corpora containing contemporary and historical texts for Named Entity Recognition (NER) tasks.
 It predicts the classes `PER`, `LOC` and `ORG`.
 
 - **Developed by:** [Kai Labusch](https://huggingface.co/labusch), [Clemens Neudecker](https://huggingface.co/cneud), David Zellhöfer
@@ -100,9 +100,6 @@ Supported entity types are `PER`, `LOC` and `ORG`.
 The model has been pre-trained on 2,333,647 pages of OCR-text of the digitized collections of Berlin State Library.
 Therefore it is adapted to OCR-error prone historical German texts and might be used for particular applications that involve such text material.
 
-
-
-
 ## Out-of-Scope Use
 
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
@@ -159,7 +156,7 @@ Since it is an incarnation of the original BERT-model published by Google, all t
 <!-- This section describes the evaluation protocols and provides the results. -->
 
 The model has been evaluated by 5-fold cross-validation on several German historical OCR ground truth datasets.
-See publication for detail.
+See [publication](https://konvens.org/proceedings/2019/papers/KONVENS2019_paper_4.pdf) for details.
 
 ## Testing Data, Factors & Metrics
 
@@ -168,29 +165,29 @@ See publication for detail.
 <!-- This should link to a Data Card if possible. -->
 
 Two different test sets contained in the CoNLL 2003 German Named Entity Recognition Ground Truth, i.e. TEST-A and TEST-B, have been used for testing (DE-CoNLL-TEST).
-Additionally, historical OCR-based ground truth datasets have been used for testing - see publication for details and below.
+Additionally, historical OCR-based ground truth datasets have been used for testing - see [publication](https://konvens.org/proceedings/2019/papers/KONVENS2019_paper_4.pdf) for details and below.
 
 
 ### Factors
 
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
 
-The evaluation focuses on NER in historical German documents, see publication for details.
+The evaluation focuses on NER in historical German documents, see [publication](https://konvens.org/proceedings/2019/papers/KONVENS2019_paper_4.pdf) for details.
 
 ### Metrics
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
 Performance metrics used in evaluation is precision, recall and F1-score.
-See paper for actual results in terms of these metrics.
+See [publication](https://konvens.org/proceedings/2019/papers/KONVENS2019_paper_4.pdf) for actual results in terms of these metrics.
 
 ## Results
 
-See publication.
+See [publication](https://konvens.org/proceedings/2019/papers/KONVENS2019_paper_4.pdf).
 
 # Model Examination
 
-See publication.
+See [publication](https://konvens.org/proceedings/2019/papers/KONVENS2019_paper_4.pdf).
 
 # Environmental Impact
 
@@ -256,9 +253,9 @@ More information needed.
 In addition to what has been documented above, it should be noted that there are two NER Ground Truth datasets available:
 
 1) [Data provided for the 2020 HIPE campaign on named entity processing](https://impresso.github.io/CLEF-HIPE-2020/)
-2) [Data providided for the 2022 HIPE shared task on named entity processing](https://hipe-eval.github.io/HIPE-2022/)
+2) [Data provided for the 2022 HIPE shared task on named entity processing](https://hipe-eval.github.io/HIPE-2022/)
 
-Furthermore, two papers have been published on NER/NED, using BERT:
+Furthermore, two papers have been published on NER/EL, using BERT:
 
 1) [Entity Linking in Multilingual Newspapers and Classical Commentaries with BERT](http://ceur-ws.org/Vol-3180/paper-85.pdf)
 2) [Named Entity Disambiguation and Linking Historic Newspaper OCR with BERT](http://ceur-ws.org/Vol-2696/paper_163.pdf)
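
For orientation, here is a minimal sketch of how a BERT token-classification model like the one described in this card is typically queried with the `transformers` pipeline. The repo id `SBB/sbb_ner` and the sample sentence are assumptions for illustration, not taken from this commit.

```python
# Minimal sketch (assumptions noted above): run German NER with a
# token-classification pipeline and print PER/LOC/ORG predictions.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="SBB/sbb_ner",            # assumed repo id, substitute the actual model
    aggregation_strategy="simple",  # merge word pieces into whole entity spans
)

text = "Wilhelm von Humboldt gründete 1810 die Universität zu Berlin."
for entity in ner(text):
    # entity_group is one of the classes named in the card: PER, LOC or ORG
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```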
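The precision, recall and F1-score named in the Metrics section are the usual entity-level quantities computed from true positives (TP), false positives (FP) and false negatives (FN):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```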
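Finally, a rough sketch of the 5-fold cross-validation protocol mentioned under Evaluation, scored at the entity level with `seqeval`. Here `train_and_predict` is a hypothetical placeholder for the fine-tuning and inference step, not a function from this repository.

```python
# Hedged sketch of 5-fold cross-validation with entity-level scoring.
# train_and_predict is a hypothetical placeholder (see lead-in above).
from sklearn.model_selection import KFold
from seqeval.metrics import precision_score, recall_score, f1_score

def cross_validate(sentences, labels, train_and_predict, n_splits=5):
    """sentences: list of token lists; labels: list of matching BIO tag lists."""
    fold_scores = []
    splitter = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, test_idx in splitter.split(sentences):
        train_x = [sentences[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        test_x = [sentences[i] for i in test_idx]
        test_y = [labels[i] for i in test_idx]
        # Fine-tune on the training folds, then tag the held-out fold.
        pred_y = train_and_predict(train_x, train_y, test_x)
        fold_scores.append({
            "precision": precision_score(test_y, pred_y),
            "recall": recall_score(test_y, pred_y),
            "f1": f1_score(test_y, pred_y),
        })
    return fold_scores
```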