novakat committed on
Commit dd8462b · 1 Parent(s): c3a1e3c

Update citation info

  1. README.md +13 -11
README.md CHANGED
@@ -83,18 +83,20 @@ Further non-name entities:
  |`AGE` |Age
  |`ID`| Identifier
 
- ### If you se this model, please cite:
+ ### If you use this model, please cite:
 
  ```bibtex
- @InProceedings{novak-novak:2022:LREC,
- author = {Nov{\'{a}}k, Attila and Nov{\'{a}}k, Borb{\'{a}}la},
- title = {NerKor+Cars-OntoNotes++},
- booktitle = {Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022)},
- month = {June},
- year = {2022},
- address = {Marseille, France},
- publisher = {European Language Resources Association},
- pages = {1907--1916},
- url = {http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.203.pdf}
+ @inproceedings{novak-novak-2022-nerkor,
+ title = "{N}er{K}or+{C}ars-{O}nto{N}otes++",
+ author = "Nov{\'a}k, Attila and
+ Nov{\'a}k, Borb{\'a}la",
+ booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
+ month = jun,
+ year = "2022",
+ address = "Marseille, France",
+ publisher = "European Language Resources Association",
+ url = "https://aclanthology.org/2022.lrec-1.203",
+ pages = "1907--1916",
+ abstract = "In this paper, we present an upgraded version of the Hungarian NYTK-NerKor named entity corpus, which contains about twice as many annotated spans and 7 times as many distinct entity types as the original version. We used an extended version of the OntoNotes 5 annotation scheme including time and numerical expressions. NerKor is the newest and biggest NER corpus for Hungarian containing diverse domains. We applied cross-lingual transfer of NER models trained for other languages based on multilingual contextual language models to preannotate the corpus. We corrected the annotation semi-automatically and manually. Zero-shot preannotation was very effective with about 0.82 F1 score for the best model. We also added a 12000-token subcorpus on cars and other motor vehicles. We trained and release a transformer-based NER tagger for Hungarian using the annotation in the new corpus version, which provides similar performance to an identical model trained on the original version of the corpus.",
  }
  ```