Commit 90d6f39 by bourdoiscatie (verified) · 1 parent: 812a709

Update README.md

  1. README.md +1037 -19
README.md CHANGED
@@ -1,41 +1,969 @@
1
  ---
 
2
  base_model: camembert/camembert-large
3
- tags:
4
- - generated_from_trainer
5
  metrics:
6
  - precision
7
  - recall
8
  - f1
9
  - accuracy
10
  model-index:
11
- - name: camembert-large-frenchNER_4entities
12
  results: []
13
  ---
14
 
15
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
16
- should probably proofread and complete it, then remove this comment. -->
17
 
18
- # camembert-large-frenchNER_4entities
19
 
20
- This model is a fine-tuned version of [camembert/camembert-large](https://huggingface.co/camembert/camembert-large) on an unknown dataset.
21
- It achieves the following results on the evaluation set:
22
- - Loss: 0.0532
23
- - Precision: 0.9860
24
- - Recall: 0.9860
25
- - F1: 0.9860
26
- - Accuracy: 0.9860
27
 
28
- ## Model description
29
 
30
- More information needed
31
 
32
- ## Intended uses & limitations
33
 
34
- More information needed
35
 
36
- ## Training and evaluation data
37
 
38
- More information needed
39
 
40
  ## Training procedure
41
 
@@ -65,3 +993,93 @@ The following hyperparameters were used during training:
65
  - Pytorch 2.1.2
66
  - Datasets 2.16.1
67
  - Tokenizers 0.15.0
1
  ---
2
+ license: mit
3
  base_model: camembert/camembert-large
 
 
4
  metrics:
5
  - precision
6
  - recall
7
  - f1
8
  - accuracy
9
  model-index:
10
+ - name: NERmembert-large-4entities
11
  results: []
12
+ datasets:
13
+ - CATIE-AQ/frenchNER_4entities
14
+ language:
15
+ - fr
16
+ widget:
17
+ - text: "Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
18
+ library_name: transformers
19
+ pipeline_tag: token-classification
20
+ co2_eq_emissions: 80
21
  ---
22
 
 
 
23
 
24
+ # NERmembert-large-4entities
25
 
26
+ ## Model Description
 
 
 
 
 
 
27
 
28
+ We present **NERmembert-large-4entities**, a [CamemBERT large](https://huggingface.co/camembert/camembert-large) model fine-tuned for French Named Entity Recognition on four French NER datasets, covering 4 entity types (LOC, PER, ORG, MISC).
29
+ These datasets were concatenated and cleaned into a single dataset, which we call [frenchNER_4entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_4entities).
30
+ There are a total of **384,773** rows, of which **328,757** are for training, **24,131** for validation and **31,885** for testing.
31
+ Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
32
 
 
33
 
 
34
 
35
+ ## Dataset
36
 
37
+ The dataset used is [frenchNER_4entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_4entities), which represents ~385k sentences labeled in 4 categories:
38
+ | Label | Examples |
39
+ |:------|:-----------------------------------------------------------|
40
+ | PER | "La Bruyère", "Gaspard de Coligny", "Wittgenstein" |
41
+ | ORG | "UTBM", "American Airlines", "id Software" |
42
+ | LOC | "République du Cap-Vert", "Créteil", "Bordeaux" |
43
+ | MISC | "Wolfenstein 3D", "Révolution française", "Coupe du monde" |
44
+
45
+ The distribution of the entities is as follows:
46
+
47
+ <table>
48
+ <thead>
49
+ <tr>
50
+ <th><br>Splits</th>
51
+ <th><br>O</th>
52
+ <th><br>PER</th>
53
+ <th><br>LOC</th>
54
+ <th><br>ORG</th>
55
+ <th><br>MISC</th>
56
+ </tr>
57
+ </thead>
58
+ <tbody>
+ <tr>
59
+ <td><br>train</td>
60
+ <td><br>7,539,692</td>
61
+ <td><br>307,144</td>
62
+ <td><br>286,746</td>
63
+ <td><br>127,089</td>
64
+ <td><br>799,494</td>
65
+ </tr>
66
+ <tr>
67
+ <td><br>validation</td>
68
+ <td><br>544,580</td>
69
+ <td><br>24,034</td>
70
+ <td><br>21,585</td>
71
+ <td><br>5,927</td>
72
+ <td><br>18,221</td>
73
+ </tr>
74
+ <tr>
75
+ <td><br>test</td>
76
+ <td><br>720,623</td>
77
+ <td><br>32,870</td>
78
+ <td><br>29,683</td>
79
+ <td><br>7,911</td>
80
+ <td><br>21,760</td>
81
+ </tr>
82
+ </tbody>
83
+ </table>
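
To make the class imbalance in the table above easier to read, the counts can be turned into per-split shares. The snippet below is only a sketch: the counts are copied from the table, while the names `COUNTS`, `label_shares` and `SHARES` are made up for this illustration.

```python
# Counts per split and label, copied from the distribution table above.
COUNTS = {
    "train": {"O": 7_539_692, "PER": 307_144, "LOC": 286_746, "ORG": 127_089, "MISC": 799_494},
    "validation": {"O": 544_580, "PER": 24_034, "LOC": 21_585, "ORG": 5_927, "MISC": 18_221},
    "test": {"O": 720_623, "PER": 32_870, "LOC": 29_683, "ORG": 7_911, "MISC": 21_760},
}


def label_shares(counts):
    """Return each label's fraction of the total count in one split."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical helper structure: shares for every split.
SHARES = {split: label_shares(c) for split, c in COUNTS.items()}

if __name__ == "__main__":
    for split, shares in SHARES.items():
        row = ", ".join(f"{label}: {share:.1%}" for label, share in shares.items())
        print(f"{split:<10} {row}")
```

In every split the `O` label dominates, which is worth keeping in mind when reading the "Overall" scores further down.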
84
+
85
+
86
+ ## Evaluation results
87
+
88
+ The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
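
The card does not include the evaluation script. As a rough illustration of what per-label precision, recall and F1 mean at token level (an assumption suggested by the `O` column in the full results, not a confirmed description of the authors' setup), here is a pure-Python sketch. The function name `per_label_scores` and the toy labels are invented for this example, and it does not use the `evaluate` package itself.

```python
from collections import Counter


def per_label_scores(references, predictions):
    """Token-level precision, recall and F1 for every label seen in either sequence."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(references, predictions):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label is wrong for this token
            fn[gold] += 1  # the gold label was missed for this token
    scores = {}
    for label in sorted(set(references) | set(predictions)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy example with made-up labels (not taken from the dataset).
gold = ["O", "PER", "PER", "O", "LOC", "O", "MISC"]
pred = ["O", "PER", "O", "O", "LOC", "O", "O"]
toy_scores = per_label_scores(gold, pred)
```

In the toy example the prediction never emits `MISC`, so that label scores 0, which is the same pattern as the 0 entries reported for the 3-entity model in the MISC column.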
89
+
90
+ ### frenchNER_4entities
91
+
92
+ For space reasons, we show only the F1 of the different models. You can see the full results below the table.
93
+
94
+ <table>
95
+ <thead>
96
+ <tr>
97
+ <th><br>Model</th>
98
+ <th><br>PER</th>
99
+ <th><br>LOC</th>
100
+ <th><br>ORG</th>
101
+ <th><br>MISC</th>
102
+ </tr>
103
+ </thead>
104
+ <tbody>
105
+ <tr>
106
+ <td rowspan="1"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
107
+ <td><br>0.971</td>
108
+ <td><br>0.947</td>
109
+ <td><br>0.902</td>
110
+ <td><br>0.663</td>
111
+ </tr>
112
+ <tr>
113
+ <td rowspan="1"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
114
+ <td><br>0.974</td>
115
+ <td><br>0.948</td>
116
+ <td><br>0.892</td>
117
+ <td><br>0.658</td>
118
+ </tr>
119
+ <tr>
120
+ <td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-3entities">NERmembert-base-3entities</a></td>
121
+ <td><br>0.978</td>
122
+ <td><br>0.957</td>
123
+ <td><br>0.904</td>
124
+ <td><br>0</td>
125
+ </tr>
126
+ <tr>
127
+ <td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
128
+ <td><br>0.978</td>
129
+ <td><br>0.958</td>
130
+ <td><br>0.903</td>
131
+ <td><br>0.814</td>
132
+ </tr>
133
+ <tr>
134
+ <td rowspan="1"><br>NERmembert-large-4entities (this model)</td>
135
+ <td><br>0.982</td>
136
+ <td><br>0.964</td>
137
+ <td><br>0.919</td>
138
+ <td><br>0.834</td>
139
+ </tr>
140
+ </tbody>
141
+ </table>
142
+
143
+
144
+ <details>
145
+ <summary>Full results</summary>
146
+ <table>
147
+ <thead>
148
+ <tr>
149
+ <th><br>Model</th>
150
+ <th><br>Metrics</th>
151
+ <th><br>PER</th>
152
+ <th><br>LOC</th>
153
+ <th><br>ORG</th>
154
+ <th><br>MISC</th>
155
+ <th><br>O</th>
156
+ <th><br>Overall</th>
157
+ </tr>
158
+ </thead>
159
+ <tbody>
160
+ <tr>
161
+ <td rowspan="3"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
162
+ <td><br>Precision</td>
163
+ <td><br>0.952</td>
164
+ <td><br>0.924</td>
165
+ <td><br>0.870</td>
166
+ <td><br>0.845</td>
167
+ <td><br>0.986</td>
168
+ <td><br>0.976</td>
169
+ </tr>
170
+ <tr>
171
+ <td><br>Recall</td>
172
+ <td><br>0.990</td>
173
+ <td><br>0.972</td>
174
+ <td><br>0.938</td>
175
+ <td><br>0.546</td>
176
+ <td><br>0.992</td>
177
+ <td><br>0.976</td>
178
+ </tr>
179
+ <tr>
180
+ <td>F1</td>
181
+ <td><br>0.971</td>
182
+ <td><br>0.947</td>
183
+ <td><br>0.902</td>
184
+ <td><br>0.663</td>
185
+ <td><br>0.989</td>
186
+ <td><br>0.976</td>
187
+ </tr>
188
+ <tr>
189
+ <td rowspan="3"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
190
+ <td><br>Precision</td>
191
+ <td><br>0.962</td>
192
+ <td><br>0.933</td>
193
+ <td><br>0.857</td>
194
+ <td><br>0.830</td>
195
+ <td><br>0.985</td>
196
+ <td><br>0.976</td>
197
+ </tr>
198
+ <tr>
199
+ <td><br>Recall</td>
200
+ <td><br>0.987</td>
201
+ <td><br>0.963</td>
202
+ <td><br>0.930</td>
203
+ <td><br>0.545</td>
204
+ <td><br>0.993</td>
205
+ <td><br>0.976</td>
206
+ </tr>
207
+ <tr>
208
+ <td>F1</td>
209
+ <td><br>0.974</td>
210
+ <td><br>0.948</td>
211
+ <td><br>0.892</td>
212
+ <td><br>0.658</td>
213
+ <td><br>0.989</td>
214
+ <td><br>0.976</td>
215
+ </tr>
216
+ <tr>
217
+ <td rowspan="3"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-3entities">NERmembert-base-3entities</a></td>
218
+ <td><br>Precision</td>
219
+ <td><br>0.973</td>
220
+ <td><br>0.955</td>
221
+ <td><br>0.886</td>
222
+ <td><br>0</td>
223
+ <td><br>X</td>
224
+ <td><br>X</td>
225
+ </tr>
226
+ <tr>
227
+ <td><br>Recall</td>
228
+ <td><br>0.983</td>
229
+ <td><br>0.960</td>
230
+ <td><br>0.923</td>
231
+ <td><br>0</td>
232
+ <td><br>X</td>
233
+ <td><br>X</td>
234
+ </tr>
235
+ <tr>
236
+ <td>F1</td>
237
+ <td><br>0.978</td>
238
+ <td><br>0.957</td>
239
+ <td><br>0.904</td>
240
+ <td><br>0</td>
241
+ <td><br>X</td>
242
+ <td><br>X</td>
243
+ </tr>
244
+ <tr>
245
+ <td rowspan="3"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
246
+ <td><br>Precision</td>
247
+ <td><br>0.973</td>
248
+ <td><br>0.951</td>
249
+ <td><br>0.888</td>
250
+ <td><br>0.850</td>
251
+ <td><br>0.993</td>
252
+ <td><br>0.984</td>
253
+ </tr>
254
+ <tr>
255
+ <td><br>Recall</td>
256
+ <td><br>0.983</td>
257
+ <td><br>0.964</td>
258
+ <td><br>0.918</td>
259
+ <td><br>0.781</td>
260
+ <td><br>0.993</td>
261
+ <td><br>0.984</td>
262
+ </tr>
263
+ <tr>
264
+ <td>F1</td>
265
+ <td><br>0.978</td>
266
+ <td><br>0.958</td>
267
+ <td><br>0.903</td>
268
+ <td><br>0.814</td>
269
+ <td><br>0.993</td>
270
+ <td><br>0.984</td>
271
+ </tr>
272
+ <tr>
273
+ <td rowspan="3"><br>NERmembert-large-4entities (this model)</td>
274
+ <td><br>Precision</td>
275
+ <td><br>0.977</td>
276
+ <td><br>0.961</td>
277
+ <td><br>0.896</td>
278
+ <td><br>0.872</td>
279
+ <td><br>0.993</td>
280
+ <td><br>0.986</td>
281
+ </tr>
282
+ <tr>
283
+ <td><br>Recall</td>
284
+ <td><br>0.987</td>
285
+ <td><br>0.966</td>
286
+ <td><br>0.943</td>
287
+ <td><br>0.798</td>
288
+ <td><br>0.995</td>
289
+ <td><br>0.986</td>
290
+ </tr>
291
+ <tr>
292
+ <td>F1</td>
293
+ <td><br>0.982</td>
294
+ <td><br>0.964</td>
295
+ <td><br>0.919</td>
296
+ <td><br>0.834</td>
297
+ <td><br>0.994</td>
298
+ <td><br>0.986</td>
299
+ </tr>
300
+ </tbody>
301
+ </table>
302
+ </details>
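
Since F1 is the harmonic mean of precision and recall, the F1 rows above can be sanity-checked from the other two rows. The sketch below does this for the NERmembert-large-4entities rows. The values are copied from the table, and the names `f1_score`, `REPORTED` and `RECOMPUTED` are chosen here for illustration. Because the inputs are already rounded to three decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for NERmembert-large-4entities on
# frenchNER_4entities, copied from the full results table above.
REPORTED = {
    "PER": (0.977, 0.987, 0.982),
    "LOC": (0.961, 0.966, 0.964),
    "ORG": (0.896, 0.943, 0.919),
    "MISC": (0.872, 0.798, 0.834),
}

RECOMPUTED = {label: f1_score(p, r) for label, (p, r, _) in REPORTED.items()}

if __name__ == "__main__":
    for label, (_, _, reported) in REPORTED.items():
        print(f"{label}: recomputed {RECOMPUTED[label]:.4f}, reported {reported:.3f}")
```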
303
+
304
+ In detail, per source dataset:
305
+
306
+ ### multiconer
307
+
308
+ For space reasons, we show only the F1 of the different models. You can see the full results below the table.
309
+
310
+ <table>
311
+ <thead>
312
+ <tr>
313
+ <th><br>Model</th>
314
+ <th><br>PER</th>
315
+ <th><br>LOC</th>
316
+ <th><br>ORG</th>
317
+ <th><br>MISC</th>
318
+ </tr>
319
+ </thead>
320
+ <tbody>
321
+ <tr>
322
+ <td rowspan="1"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
323
+ <td><br>0.940</td>
324
+ <td><br>0.761</td>
325
+ <td><br>0.723</td>
326
+ <td><br>0.560</td>
327
+ </tr>
328
+ <tr>
329
+ <td rowspan="1"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
330
+ <td><br>0.921</td>
331
+ <td><br>0.748</td>
332
+ <td><br>0.694</td>
333
+ <td><br>0.530</td>
334
+ </tr>
335
+ <tr>
336
+ <td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-3entities">NERmembert-base-3entities</a></td>
337
+ <td><br>0.960</td>
338
+ <td><br>0.887</td>
339
+ <td><br>0.877</td>
340
+ <td><br>0</td>
341
+ </tr>
342
+ <tr>
343
+ <td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
344
+ <td><br>0.960</td>
345
+ <td><br>0.890</td>
346
+ <td><br>0.867</td>
347
+ <td><br>0.852</td>
348
+ </tr>
349
+ <tr>
350
+ <td rowspan="1"><br>NERmembert-large-4entities (this model)</td>
351
+ <td><br>0.969</td>
352
+ <td><br>0.919</td>
353
+ <td><br>0.904</td>
354
+ <td><br>0.864</td>
355
+ </tr>
356
+ </tbody>
357
+ </table>
358
+
359
+ <details>
360
+ <summary>Full results</summary>
361
+ <table>
362
+ <thead>
363
+ <tr>
364
+ <th><br>Model</th>
365
+ <th><br>Metrics</th>
366
+ <th><br>PER</th>
367
+ <th><br>LOC</th>
368
+ <th><br>ORG</th>
369
+ <th><br>MISC</th>
370
+ <th><br>O</th>
371
+ <th><br>Overall</th>
372
+ </tr>
373
+ </thead>
374
+ <tbody>
375
+ <tr>
376
+ <td rowspan="3"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
377
+ <td><br>Precision</td>
378
+ <td><br>0.908</td>
379
+ <td><br>0.717</td>
380
+ <td><br>0.753</td>
381
+ <td><br>0.620</td>
382
+ <td><br>0.936</td>
383
+ <td><br>0.889</td>
384
+ </tr>
385
+ <tr>
386
+ <td><br>Recall</td>
387
+ <td><br>0.975</td>
388
+ <td><br>0.811</td>
389
+ <td><br>0.696</td>
390
+ <td><br>0.511</td>
391
+ <td><br>0.938</td>
392
+ <td><br>0.889</td>
393
+ </tr>
394
+ <tr>
395
+ <td>F1</td>
396
+ <td><br>0.940</td>
397
+ <td><br>0.761</td>
398
+ <td><br>0.723</td>
399
+ <td><br>0.560</td>
400
+ <td><br>0.937</td>
401
+ <td><br>0.889</td>
402
+ </tr>
403
+ <tr>
404
+ <td rowspan="3"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
405
+ <td><br>Precision</td>
406
+ <td><br>0.885</td>
407
+ <td><br>0.738</td>
408
+ <td><br>0.737</td>
409
+ <td><br>0.589</td>
410
+ <td><br>0.928</td>
411
+ <td><br>0.881</td>
412
+ </tr>
413
+ <tr>
414
+ <td><br>Recall</td>
415
+ <td><br>0.960</td>
416
+ <td><br>0.759</td>
417
+ <td><br>0.655</td>
418
+ <td><br>0.482</td>
419
+ <td><br>0.939</td>
420
+ <td><br>0.881</td>
421
+ </tr>
422
+ <tr>
423
+ <td>F1</td>
424
+ <td><br>0.921</td>
425
+ <td><br>0.748</td>
426
+ <td><br>0.694</td>
427
+ <td><br>0.530</td>
428
+ <td><br>0.934</td>
429
+ <td><br>0.881</td>
430
+ </tr>
431
+ <tr>
432
+ <td rowspan="3"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-3entities">NERmembert-base-3entities</a></td>
433
+ <td><br>Precision</td>
434
+ <td><br>0.957</td>
435
+ <td><br>0.894</td>
436
+ <td><br>0.876</td>
437
+ <td><br>0</td>
438
+ <td><br>X</td>
439
+ <td><br>X</td>
440
+ </tr>
441
+ <tr>
442
+ <td><br>Recall</td>
443
+ <td><br>0.962</td>
444
+ <td><br>0.880</td>
445
+ <td><br>0.878</td>
446
+ <td><br>0</td>
447
+ <td><br>X</td>
448
+ <td><br>X</td>
449
+ </tr>
450
+ <tr>
451
+ <td>F1</td>
452
+ <td><br>0.960</td>
453
+ <td><br>0.887</td>
454
+ <td><br>0.877</td>
455
+ <td><br>0</td>
456
+ <td><br>X</td>
457
+ <td><br>X</td>
458
+ </tr>
459
+ <tr>
460
+ <td rowspan="3"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
461
+ <td><br>Precision</td>
462
+ <td><br>0.954</td>
463
+ <td><br>0.893</td>
464
+ <td><br>0.851</td>
465
+ <td><br>0.849</td>
466
+ <td><br>0.979</td>
467
+ <td><br>0.954</td>
468
+ </tr>
469
+ <tr>
470
+ <td><br>Recall</td>
471
+ <td><br>0.967</td>
472
+ <td><br>0.887</td>
473
+ <td><br>0.883</td>
474
+ <td><br>0.855</td>
475
+ <td><br>0.974</td>
476
+ <td><br>0.954</td>
477
+ </tr>
478
+ <tr>
479
+ <td>F1</td>
480
+ <td><br>0.960</td>
481
+ <td><br>0.890</td>
482
+ <td><br>0.867</td>
483
+ <td><br>0.852</td>
484
+ <td><br>0.977</td>
485
+ <td><br>0.954</td>
486
+ </tr>
487
+ <tr>
488
+ <td rowspan="3"><br>NERmembert-large-4entities (this model)</td>
489
+ <td><br>Precision</td>
490
+ <td><br>0.964</td>
491
+ <td><br>0.922</td>
492
+ <td><br>0.904</td>
493
+ <td><br>0.856</td>
494
+ <td><br>0.981</td>
495
+ <td><br>0.961</td>
496
+ </tr>
497
+ <tr>
498
+ <td><br>Recall</td>
499
+ <td><br>0.975</td>
500
+ <td><br>0.917</td>
501
+ <td><br>0.904</td>
502
+ <td><br>0.872</td>
503
+ <td><br>0.976</td>
504
+ <td><br>0.961</td>
505
+ </tr>
506
+ <tr>
507
+ <td>F1</td>
508
+ <td><br>0.969</td>
509
+ <td><br>0.919</td>
510
+ <td><br>0.904</td>
511
+ <td><br>0.864</td>
512
+ <td><br>0.978</td>
513
+ <td><br>0.961</td>
514
+ </tr>
515
+ </tbody>
516
+ </table>
517
+ </details>
518
+
519
+
520
+ ### multinerd
521
+
522
+ For space reasons, we show only the F1 of the different models. You can see the full results below the table.
523
+
524
+ <table>
525
+ <thead>
526
+ <tr>
527
+ <th><br>Model</th>
528
+ <th><br>PER</th>
529
+ <th><br>LOC</th>
530
+ <th><br>ORG</th>
531
+ <th><br>MISC</th>
532
+ </tr>
533
+ </thead>
534
+ <tbody>
535
+ <tr>
536
+ <td rowspan="1"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
537
+ <td><br>0.962</td>
538
+ <td><br>0.934</td>
539
+ <td><br>0.888</td>
540
+ <td><br>0.419</td>
541
+ </tr>
542
+ <tr>
543
+ <td rowspan="1"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
544
+ <td><br>0.972</td>
545
+ <td><br>0.938</td>
546
+ <td><br>0.884</td>
547
+ <td><br>0.430</td>
548
+ </tr>
549
+ <tr>
550
+ <td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-3entities">NERmembert-base-3entities</a></td>
551
+ <td><br>0.985</td>
552
+ <td><br>0.973</td>
553
+ <td><br>0.938</td>
554
+ <td><br>0</td>
555
+ </tr>
556
+ <tr>
557
+ <td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
558
+ <td><br>0.985</td>
559
+ <td><br>0.973</td>
560
+ <td><br>0.938</td>
561
+ <td><br>0.770</td>
562
+ </tr>
563
+ <tr>
564
+ <td rowspan="1"><br>NERmembert-large-4entities (this model)</td>
565
+ <td><br>0.987</td>
566
+ <td><br>0.976</td>
567
+ <td><br>0.948</td>
568
+ <td><br>0.790</td>
569
+ </tr>
570
+ </tbody>
571
+ </table>
572
+
573
+ <details>
574
+ <summary>Full results</summary>
575
+ <table>
576
+ <thead>
577
+ <tr>
578
+ <th><br>Model</th>
579
+ <th><br>Metrics</th>
580
+ <th><br>PER</th>
581
+ <th><br>LOC</th>
582
+ <th><br>ORG</th>
583
+ <th><br>MISC</th>
584
+ <th><br>O</th>
585
+ <th><br>Overall</th>
586
+ </tr>
587
+ </thead>
588
+ <tbody>
589
+ <tr>
590
+ <td rowspan="3"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
591
+ <td><br>Precision</td>
592
+ <td><br>0.931</td>
593
+ <td><br>0.893</td>
594
+ <td><br>0.827</td>
595
+ <td><br>0.725</td>
596
+ <td><br>0.979</td>
597
+ <td><br>0.966</td>
598
+ </tr>
599
+ <tr>
600
+ <td><br>Recall</td>
601
+ <td><br>0.994</td>
602
+ <td><br>0.980</td>
603
+ <td><br>0.959</td>
604
+ <td><br>0.295</td>
605
+ <td><br>0.990</td>
606
+ <td><br>0.966</td>
607
+ </tr>
608
+ <tr>
609
+ <td>F1</td>
610
+ <td><br>0.962</td>
611
+ <td><br>0.934</td>
612
+ <td><br>0.888</td>
613
+ <td><br>0.419</td>
614
+ <td><br>0.984</td>
615
+ <td><br>0.966</td>
616
+ </tr>
617
+ <tr>
618
+ <td rowspan="3"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
619
+ <td><br>Precision</td>
620
+ <td><br>0.954</td>
621
+ <td><br>0.908</td>
622
+ <td><br>0.817</td>
623
+ <td><br>0.705</td>
624
+ <td><br>0.977</td>
625
+ <td><br>0.967</td>
626
+ </tr>
627
+ <tr>
628
+ <td><br>Recall</td>
629
+ <td><br>0.991</td>
630
+ <td><br>0.969</td>
631
+ <td><br>0.963</td>
632
+ <td><br>0.310</td>
633
+ <td><br>0.990</td>
634
+ <td><br>0.967</td>
635
+ </tr>
636
+ <tr>
637
+ <td>F1</td>
638
+ <td><br>0.972</td>
639
+ <td><br>0.938</td>
640
+ <td><br>0.884</td>
641
+ <td><br>0.430</td>
642
+ <td><br>0.984</td>
643
+ <td><br>0.967</td>
644
+ </tr>
645
+ <tr>
646
+ <td rowspan="3"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-3entities">NERmembert-base-3entities</a></td>
647
+ <td><br>Precision</td>
648
+ <td><br>0.974</td>
649
+ <td><br>0.965</td>
650
+ <td><br>0.910</td>
651
+ <td><br>0</td>
652
+ <td><br>X</td>
653
+ <td><br>X</td>
654
+ </tr>
655
+ <tr>
656
+ <td><br>Recall</td>
657
+ <td><br>0.995</td>
658
+ <td><br>0.981</td>
659
+ <td><br>0.968</td>
660
+ <td><br>0</td>
661
+ <td><br>X</td>
662
+ <td><br>X</td>
663
+ </tr>
664
+ <tr>
665
+ <td>F1</td>
666
+ <td><br>0.985</td>
667
+ <td><br>0.973</td>
668
+ <td><br>0.938</td>
669
+ <td><br>0</td>
670
+ <td><br>X</td>
671
+ <td><br>X</td>
672
+ </tr>
673
+ <tr>
674
+ <td rowspan="3"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
675
+ <td><br>Precision</td>
676
+ <td><br>0.976</td>
677
+ <td><br>0.961</td>
678
+ <td><br>0.91</td>
679
+ <td><br>0.829</td>
680
+ <td><br>0.991</td>
681
+ <td><br>0.983</td>
682
+ </tr>
683
+ <tr>
684
+ <td><br>Recall</td>
685
+ <td><br>0.994</td>
686
+ <td><br>0.985</td>
687
+ <td><br>0.967</td>
688
+ <td><br>0.719</td>
689
+ <td><br>0.993</td>
690
+ <td><br>0.983</td>
691
+ </tr>
692
+ <tr>
693
+ <td>F1</td>
694
+ <td><br>0.985</td>
695
+ <td><br>0.973</td>
696
+ <td><br>0.938</td>
697
+ <td><br>0.770</td>
698
+ <td><br>0.992</td>
699
+ <td><br>0.983</td>
700
+ </tr>
701
+ <tr>
702
+ <td rowspan="3"><br>NERmembert-large-4entities (this model)</td>
703
+ <td><br>Precision</td>
704
+ <td><br>0.979</td>
705
+ <td><br>0.967</td>
706
+ <td><br>0.922</td>
707
+ <td><br>0.852</td>
708
+ <td><br>0.991</td>
709
+ <td><br>0.985</td>
710
+ </tr>
711
+ <tr>
712
+ <td><br>Recall</td>
713
+ <td><br>0.996</td>
714
+ <td><br>0.986</td>
715
+ <td><br>0.974</td>
716
+ <td><br>0.736</td>
717
+ <td><br>0.994</td>
718
+ <td><br>0.985</td>
719
+ </tr>
720
+ <tr>
721
+ <td>F1</td>
722
+ <td><br>0.987</td>
723
+ <td><br>0.976</td>
724
+ <td><br>0.948</td>
725
+ <td><br>0.790</td>
726
+ <td><br>0.993</td>
727
+ <td><br>0.985</td>
728
+ </tr>
729
+ </tbody>
730
+ </table>
731
+ </details>
732
+
733
+ ### wikiner
734
+
735
+ For space reasons, we show only the F1 of the different models. You can see the full results below the table.
736
+
737
+ <table>
738
+ <thead>
739
+ <tr>
740
+ <th><br>Model</th>
741
+ <th><br>PER</th>
742
+ <th><br>LOC</th>
743
+ <th><br>ORG</th>
744
+ <th><br>MISC</th>
745
+ </tr>
746
+ </thead>
747
+ <tbody>
748
+ <tr>
749
+ <td rowspan="1"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
750
+ <td><br>0.986</td>
751
+ <td><br>0.966</td>
752
+ <td><br>0.938</td>
753
+ <td><br>0.938</td>
754
+ </tr>
755
+ <tr>
756
+ <td rowspan="1"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
757
+ <td><br>0.983</td>
758
+ <td><br>0.964</td>
759
+ <td><br>0.925</td>
760
+ <td><br>0.926</td>
761
+ </tr>
762
+ <tr>
763
+ <td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-3entities">NERmembert-base-3entities</a></td>
764
+ <td><br>0.970</td>
765
+ <td><br>0.945</td>
766
+ <td><br>0.878</td>
767
+ <td><br>0</td>
768
+ </tr>
769
+ <tr>
770
+ <td rowspan="1"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
771
+ <td><br>0.970</td>
772
+ <td><br>0.945</td>
773
+ <td><br>0.876</td>
774
+ <td><br>0.872</td>
775
+ </tr>
776
+ <tr>
777
+ <td rowspan="1"><br>NERmembert-large-4entities (this model)</td>
778
+ <td><br>0.975</td>
779
+ <td><br>0.953</td>
780
+ <td><br>0.896</td>
781
+ <td><br>0.893</td>
782
+ </tr>
783
+ </tbody>
784
+ </table>
785
+
786
+ <details>
787
+ <summary>Full results</summary>
788
+ <table>
789
+ <thead>
790
+ <tr>
791
+ <th><br>Model</th>
792
+ <th><br>Metrics</th>
793
+ <th><br>PER</th>
794
+ <th><br>LOC</th>
795
+ <th><br>ORG</th>
796
+ <th><br>MISC</th>
797
+ <th><br>O</th>
798
+ <th><br>Overall</th>
799
+ </tr>
800
+ </thead>
801
+ <tbody>
802
+ <tr>
803
+ <td rowspan="3"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
804
+ <td><br>Precision</td>
805
+ <td><br>0.986</td>
806
+ <td><br>0.962</td>
807
+ <td><br>0.925</td>
808
+ <td><br>0.943</td>
809
+ <td><br>0.998</td>
810
+ <td><br>0.992</td>
811
+ </tr>
812
+ <tr>
813
+ <td><br>Recall</td>
814
+ <td><br>0.987</td>
815
+ <td><br>0.969</td>
816
+ <td><br>0.951</td>
817
+ <td><br>0.933</td>
818
+ <td><br>0.997</td>
819
+ <td><br>0.992</td>
820
+ </tr>
821
+ <tr>
822
+ <td>F1</td>
823
+ <td><br>0.986</td>
824
+ <td><br>0.966</td>
825
+ <td><br>0.938</td>
826
+ <td><br>0.938</td>
827
+ <td><br>0.998</td>
828
+ <td><br>0.992</td>
829
+ </tr>
830
+ <tr>
831
+ <td rowspan="3"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
832
+ <td><br>Precision</td>
833
+ <td><br>0.982</td>
834
+ <td><br>0.964</td>
835
+ <td><br>0.910</td>
836
+ <td><br>0.942</td>
837
+ <td><br>0.997</td>
838
+ <td><br>0.991</td>
839
+ </tr>
840
+ <tr>
841
+ <td><br>Recall</td>
842
+ <td><br>0.985</td>
843
+ <td><br>0.963</td>
844
+ <td><br>0.940</td>
845
+ <td><br>0.910</td>
846
+ <td><br>0.998</td>
847
+ <td><br>0.991</td>
848
+ </tr>
849
+ <tr>
850
+ <td>F1</td>
851
+ <td><br>0.983</td>
852
+ <td><br>0.964</td>
853
+ <td><br>0.925</td>
854
+ <td><br>0.926</td>
855
+ <td><br>0.997</td>
856
+ <td><br>0.991</td>
857
+ </tr>
858
+ <tr>
859
+ <td rowspan="3"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-3entities">NERmembert-base-3entities</a></td>
860
+ <td><br>Precision</td>
861
+ <td><br>0.971</td>
862
+ <td><br>0.947</td>
863
+ <td><br>0.866</td>
864
+ <td><br>0</td>
865
+ <td><br>X</td>
866
+ <td><br>X</td>
867
+ </tr>
868
+ <tr>
869
+ <td><br>Recall</td>
870
+ <td><br>0.969</td>
871
+ <td><br>0.943</td>
872
+ <td><br>0.891</td>
873
+ <td><br>0</td>
874
+ <td><br>X</td>
875
+ <td><br>X</td>
876
+ </tr>
877
+ <tr>
878
+ <td>F1</td>
879
+ <td><br>0.970</td>
880
+ <td><br>0.945</td>
881
+ <td><br>0.878</td>
882
+ <td><br>0</td>
883
+ <td><br>X</td>
884
+ <td><br>X</td>
885
+ </tr>
886
+ <tr>
887
+ <td rowspan="3"><br><a href="https://hf.co/CATIE-AQ/NERmembert-base-4entities">NERmembert-base-4entities</a></td>
888
+ <td><br>Precision</td>
889
+ <td><br>0.970</td>
890
+ <td><br>0.944</td>
891
+ <td><br>0.872</td>
892
+ <td><br>0.878</td>
893
+ <td><br>0.996</td>
894
+ <td><br>0.986</td>
895
+ </tr>
896
+ <tr>
897
+ <td><br>Recall</td>
898
+ <td><br>0.969</td>
899
+ <td><br>0.947</td>
900
+ <td><br>0.880</td>
901
+ <td><br>0.866</td>
902
+ <td><br>0.996</td>
903
+ <td><br>0.986</td>
904
+ </tr>
905
+ <tr>
906
+ <td>F1</td>
907
+ <td><br>0.970</td>
908
+ <td><br>0.945</td>
909
+ <td><br>0.876</td>
910
+ <td><br>0.872</td>
911
+ <td><br>0.996</td>
912
+ <td><br>0.986</td>
913
+ </tr>
914
+ <tr>
915
+ <td rowspan="3"><br>NERmembert-large-4entities (this model)</td>
916
+ <td><br>Precision</td>
917
+ <td><br>0.975</td>
918
+ <td><br>0.957</td>
919
+ <td><br>0.872</td>
920
+ <td><br>0.901</td>
921
+ <td><br>0.997</td>
922
+ <td><br>0.989</td>
923
+ </tr>
924
+ <tr>
925
+ <td><br>Recall</td>
926
+ <td><br>0.975</td>
927
+ <td><br>0.949</td>
928
+ <td><br>0.922</td>
929
+ <td><br>0.884</td>
930
+ <td><br>0.997</td>
931
+ <td><br>0.989</td>
932
+ </tr>
933
+ <tr>
934
+ <td>F1</td>
935
+ <td><br>0.975</td>
936
+ <td><br>0.953</td>
937
+ <td><br>0.896</td>
938
+ <td><br>0.893</td>
939
+ <td><br>0.997</td>
940
+ <td><br>0.989</td>
941
+ </tr>
942
+ </tbody>
943
+ </table>
944
+ </details>
945
+
946
+ ## Usage
947
+ ### Code
948
+
949
+ ```python
950
+ from transformers import pipeline
951
+
952
+ ner = pipeline('token-classification', model='CATIE-AQ/NERmembert-large-4entities', tokenizer='CATIE-AQ/NERmembert-large-4entities', aggregation_strategy="simple")
953
+
954
+ results = ner(
955
+ "Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
956
+ )
957
+
958
+ print(results)
959
+ ```
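
With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries that include the keys `entity_group`, `score` and `word` (real output also carries `start` and `end` character offsets). A small post-processing step is often handy. The sketch below groups entities by label. The function `group_entities` and the `SAMPLE` list are hypothetical: the scores are invented for illustration and are not real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group aggregated pipeline output by label, keeping surface forms in order."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hypothetical entries in the shape produced with aggregation_strategy="simple".
# Scores are invented; run the pipeline above to get real values.
SAMPLE = [
    {"entity_group": "MISC", "score": 0.98, "word": "Euro 2024"},
    {"entity_group": "LOC", "score": 0.99, "word": "Allemagne"},
    {"entity_group": "LOC", "score": 0.97, "word": "Pays-Bas"},
    {"entity_group": "ORG", "score": 0.62, "word": "Bleus"},
    {"entity_group": "PER", "score": 0.99, "word": "Didier Deschamps"},
    {"entity_group": "PER", "score": 0.99, "word": "Kylian Mbappé"},
]

GROUPED = group_entities(SAMPLE)

if __name__ == "__main__":
    for label, words in GROUPED.items():
        print(f"{label}: {', '.join(words)}")
```

Passing `min_score` drops low-confidence spans, for example `group_entities(results, min_score=0.9)` on the real pipeline output.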
963
+
964
+ ### Try it through Space
965
+ A Space has been created to test the model. It is available [here](https://huggingface.co/spaces/CATIE-AQ/NERmembert).
966
 
 
967
 
968
  ## Training procedure
969
 
 
993
  - Pytorch 2.1.2
994
  - Datasets 2.16.1
995
  - Tokenizers 0.15.0
996
+
997
+
998
+ ## Environmental Impact
999
+
1000
+ *Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware, runtime, cloud provider, and compute region were utilized to estimate the carbon impact.*
1001
+
1002
+ - **Hardware Type:** A100 PCIe 40/80GB
1003
+ - **Hours used:** 4h17min
1004
+ - **Cloud Provider:** Private Infrastructure
1005
+ - **Carbon Efficiency (kg/kWh):** 0.078 (estimated from [electricitymaps](https://app.electricitymaps.com/zone/FR) for the day of January 10, 2024.)
1006
+ - **Carbon Emitted** *(Power consumption x Time x Carbon produced based on location of power grid)*: 0.08 kg eq. CO2
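
The formula in the last bullet can be worked through numerically. The card gives the runtime (4h17min) and the carbon efficiency (0.078 kg/kWh) but not the power draw, so the sketch below assumes 250 W for the A100 PCIe card. That figure is an assumption made for this example, as are the names `carbon_emitted_kg`, `POWER_KW`, `HOURS`, `CARBON_INTENSITY` and `ESTIMATE_KG`.

```python
def carbon_emitted_kg(power_kw, hours, carbon_intensity_kg_per_kwh):
    """Power consumption x time x carbon intensity of the grid, in kg CO2 eq."""
    return power_kw * hours * carbon_intensity_kg_per_kwh


POWER_KW = 0.250          # ASSUMPTION: 250 W draw for the GPU; not stated in the card
HOURS = 4 + 17 / 60       # 4h17min, from the card
CARBON_INTENSITY = 0.078  # kg/kWh, from the card

ESTIMATE_KG = carbon_emitted_kg(POWER_KW, HOURS, CARBON_INTENSITY)

if __name__ == "__main__":
    print(f"Estimated emissions: {ESTIMATE_KG:.3f} kg CO2 eq.")
```

Under that assumed power draw the estimate rounds to the 0.08 kg reported above, which also matches the `co2_eq_emissions: 80` (grams) entry in the metadata.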
1007
+
1008
+
1009
+
1010
+ ## Citations
1011
+
1012
+ ### NERmembert-large-4entities
1013
+ ```
1014
+ TODO
1015
+ ```
1016
+
1017
+ ### multiconer
1018
+
1019
+ > @inproceedings{multiconer2-report,
1020
+ title={{SemEval-2023 Task 2: Fine-grained Multilingual Named Entity Recognition (MultiCoNER 2)}},
1021
+ author={Fetahu, Besnik and Kar, Sudipta and Chen, Zhiyu and Rokhlenko, Oleg and Malmasi, Shervin},
1022
+ booktitle={Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)},
1023
+ year={2023},
1024
+ publisher={Association for Computational Linguistics}}
1025
+
1026
+ > @article{multiconer2-data,
1027
+ title={{MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition}},
1028
+ author={Fetahu, Besnik and Chen, Zhiyu and Kar, Sudipta and Rokhlenko, Oleg and Malmasi, Shervin},
1029
+ year={2023}}
1030
+
1031
+
1032
+ ### multinerd
1033
+
1034
+ > @inproceedings{tedeschi-navigli-2022-multinerd,
1035
+ title = "{M}ulti{NERD}: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)",
1036
+ author = "Tedeschi, Simone and Navigli, Roberto",
1037
+ booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
1038
+ month = jul,
1039
+ year = "2022",
1040
+ address = "Seattle, United States",
1041
+ publisher = "Association for Computational Linguistics",
1042
+ url = "https://aclanthology.org/2022.findings-naacl.60",
1043
+ doi = "10.18653/v1/2022.findings-naacl.60",
1044
+ pages = "801--812"}
1045
+
1046
+ ### pii-masking-200k
1047
+
1048
+ > @misc {ai4privacy_2023,
1049
+ author = { {ai4Privacy} },
1050
+ title = { pii-masking-200k (Revision 1d4c0a1) },
1051
+ year = 2023,
1052
+ url = { https://huggingface.co/datasets/ai4privacy/pii-masking-200k },
1053
+ doi = { 10.57967/hf/1532 },
1054
+ publisher = { Hugging Face }}
1055
+
1056
+ ### wikiner
1057
+
1058
+ > @article{NOTHMAN2013151,
1059
+ title = {Learning multilingual named entity recognition from Wikipedia},
1060
+ journal = {Artificial Intelligence},
1061
+ volume = {194},
1062
+ pages = {151-175},
1063
+ year = {2013},
1064
+ note = {Artificial Intelligence, Wikipedia and Semi-Structured Resources},
1065
+ issn = {0004-3702},
1066
+ doi = {https://doi.org/10.1016/j.artint.2012.03.006},
1067
+ url = {https://www.sciencedirect.com/science/article/pii/S0004370212000276},
1068
+ author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
1069
+
1070
+
1071
+ ### frenchNER_4entities
1072
+ ```
1073
+ TODO
1074
+ ```
1075
+
1076
+ ### CamemBERT
1077
+ > @inproceedings{martin2020camembert,
1078
+ title={CamemBERT: a Tasty French Language Model},
1079
+ author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
1080
+ booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
1081
+ year={2020}}
1082
+
1083
+
1084
+ ## License
1085
+ [cc-by-4.0](https://creativecommons.org/licenses/by/4.0/deed.en)