Sami92 committed
Commit a02ab97 · verified · 1 Parent(s): 5b5bb24

Update README.md

Files changed (1): README.md (+11 / -34)
README.md CHANGED
@@ -63,17 +63,10 @@ for entity in entities:

### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]


#### Training Hyperparameters
@@ -85,34 +78,18 @@ for entity in entities:


- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
-
- [More Information Needed]
-
#### Metrics

- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]


## Model Card Authors [optional]
-
@misc{your_model_name,
  author = {Nenno, Sai},
  title = {Public Entity Recognition Model},
@@ -122,4 +99,4 @@ for entity in entities:
  url = {https://huggingface.co/Sami92/XLM-PER-B}
}

- ´
 
### Training Data

+ The model was first fine-tuned on a weakly annotated dataset of German newspaper articles (total = 267,786) and German Wikipedia articles (total = 4,348).
+ The weak annotation was based on the [database of public speakers](https://github.com/Leibniz-HBI/DBoeS-data/).
+ In a second step, the model was fine-tuned on a manually annotated dataset of 3,090 sentences from similar sources. The test split of this dataset was used for evaluation.


#### Training Hyperparameters
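The weak annotation described above amounts to dictionary-based labelling against the DBoeS list of public speakers. The exact matching rules are not documented in this card, so the snippet below is only a minimal sketch of the general idea; the example names, the `PER` tag, and the `weak_label` helper are hypothetical and not the model's actual label set or pipeline.

```python
# Minimal sketch of dictionary-based weak labelling (illustrative only).
# SPEAKERS stands in for entries from the DBoeS public-speakers data;
# the real label set and matching rules used for XLM-PER-B may differ.
SPEAKERS = {"Olaf Scholz": "PER", "Annalena Baerbock": "PER"}

def weak_label(tokens):
    """Assign BIO tags by exact token-level matching of known names."""
    labels = ["O"] * len(tokens)
    for name, tag in SPEAKERS.items():
        name_tokens = name.split()
        n = len(name_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == name_tokens:
                labels[i] = f"B-{tag}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{tag}"
    return labels

tokens = "Olaf Scholz sprach gestern in Berlin .".split()
print(list(zip(tokens, weak_label(tokens))))
# [('Olaf', 'B-PER'), ('Scholz', 'I-PER'), ('sprach', 'O'), ...]
```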
 
#### Metrics

+ - type: f1
+   value: 0.80
+ - type: recall
+   value: 0.78
+ - type: precision
+   value: 0.84


## Model Card Authors [optional]
+ ```bibtex
@misc{your_model_name,
  author = {Nenno, Sai},
  title = {Public Entity Recognition Model},

  url = {https://huggingface.co/Sami92/XLM-PER-B}
}

+ ```
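The metrics above were obtained on the test split of the manually annotated data; the card does not state whether they are token-level or entity-level, or how they were averaged. For reference, a common way to compute such scores for NER is entity-level evaluation with seqeval, sketched here with placeholder tag sequences rather than the actual test data.

```python
# Entity-level precision/recall/F1 with seqeval, a common choice for NER.
# The tag sequences below are toy placeholders, not the model's test split.
from seqeval.metrics import precision_score, recall_score, f1_score

y_true = [["B-PER", "I-PER", "O", "O"], ["O", "B-ORG", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O"], ["O", "O", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
```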