vesteinn committed on
Commit df7a8f2 · 1 Parent(s): 521f410

Create README.md

Files changed (1)
  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
---
language: is
widget:
- text: Má bjóða þér <mask> í kvöld?
- text: Forseti <mask> er ágæt.
- text: Súpan var <mask> á bragðið.
tags:
- roberta
- icelandic
- masked-lm
- pytorch
license: agpl-3.0
---

# IceBERT-xlmr-ic3

This model was trained with fairseq using the RoBERTa-base architecture, with `xlm-roberta-base` as the starting point. It is one of many models we have trained for Icelandic; see the paper referenced below for further details. The training data is listed in the table below.

| Dataset                             | Size   | Tokens |
|-------------------------------------|--------|--------|
| Icelandic Common Crawl Corpus (IC3) | 4.9 GB | 824M   |

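## Usage

The widget sentences in the front matter suggest the model is meant for masked-token prediction. The following is a minimal sketch of how that might look with the Hugging Face `transformers` fill-mask pipeline. It rests on assumptions that the README does not confirm: that the checkpoint has been converted from fairseq to the `transformers` format, and that it is hosted under the hypothetical Hub ID `vesteinn/IceBERT-xlmr-ic3`. The helper names `load_fill_mask` and `top_fillers` are illustrative only.

```python
from typing import Callable, List

# Assumption: the README does not state the Hub ID; adjust this to wherever
# the checkpoint is actually hosted.
MODEL_ID = "vesteinn/IceBERT-xlmr-ic3"

# The three example sentences from the model card's widget section.
WIDGET_SENTENCES = [
    "Má bjóða þér <mask> í kvöld?",
    "Forseti <mask> er ágæt.",
    "Súpan var <mask> á bragðið.",
]


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_fillers(fill_mask: Callable, sentence: str, k: int = 3) -> List[str]:
    """Return the k highest-scoring replacements for the single <mask>."""
    if sentence.count("<mask>") != 1:
        raise ValueError("expected exactly one <mask> token")
    predictions = fill_mask(sentence, top_k=k)
    # Each prediction is a dict; token_str may carry leading whitespace.
    return [p["token_str"].strip() for p in predictions]


# Usage (requires network access and the transformers package):
#   fill_mask = load_fill_mask()
#   for sentence in WIDGET_SENTENCES:
#       print(sentence, "->", top_fillers(fill_mask, sentence))
```

`top_fillers` accepts any callable with the pipeline's calling convention, so the ranking logic can be exercised with a stub before downloading the model.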
## Citation

The model is described in the paper [A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models](https://arxiv.org/abs/2201.05601). Please cite the paper if you use the model.

```
@article{DBLP:journals/corr/abs-2201-05601,
  author     = {V{\'{e}}steinn Sn{\ae}bjarnarson and
                Haukur Barri S{\'{\i}}monarson and
                P{\'{e}}tur Orri Ragnarsson and
                Svanhv{\'{\i}}t Lilja Ing{\'{o}}lfsd{\'{o}}ttir and
                Haukur P{\'{a}}ll J{\'{o}}nsson and
                Vilhj{\'{a}}lmur {\TH}orsteinsson and
                Hafsteinn Einarsson},
  title      = {A Warm Start and a Clean Crawled Corpus - {A} Recipe for Good Language
                Models},
  journal    = {CoRR},
  volume     = {abs/2201.05601},
  year       = {2022},
  url        = {https://arxiv.org/abs/2201.05601},
  eprinttype = {arXiv},
  eprint     = {2201.05601},
  timestamp  = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2201-05601.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```