---
license: mit
tags:
- generated_from_trainer
model-index:
- name: afro-xlmr-large-61L
  results: []
language:
- en
- am
- ar
- so
- sw
- pt
- af
- fr
- zu
- mg
- ha
- sn
- arz
- ny
- ig
- xh
- yo
- st
- rw
- tn
- ti
- ts
- om
- run
- nso
- ee
- ln
- tw
- pcm
- gaa
- loz
- lg
- guw
- bem
- efi
- lue
- lua
- toi
- ve
- tum
- tll
- iso
- kqn
- zne
- umb
- mos
- tiv
- lu
- ff
- kwy
- bci
- rnd
- luo
- wal
- ss
- lun
- wo
- nyk
- kj
- ki
- fon
---


# afro-xlmr-large-61L

AfroXLMR-large was created by masked language modeling (MLM) adaptation of the XLM-R-large model on 61 languages widely spoken in Africa,
including 4 high-resource languages.
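
Because the model is adapted with an MLM objective, a quick way to try it is masked-token prediction. The snippet below is a minimal sketch, not an official usage recipe: the Hub ID `Davlan/afro-xlmr-large-61L` is an assumption (this card only gives the model name), and `mask_word` and `top_predictions` are hypothetical helper names introduced for illustration. It relies on the `fill-mask` pipeline from `transformers` and on the `<mask>` token used by XLM-R tokenizers.

```python
# Assumed Hub ID; the card itself only states the name "afro-xlmr-large-61L".
MODEL_ID = "Davlan/afro-xlmr-large-61L"
# Mask token used by XLM-R tokenizers.
MASK_TOKEN = "<mask>"


def mask_word(sentence: str, word: str) -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, MASK_TOKEN, 1)


def top_predictions(masked_sentence: str, model_id: str = MODEL_ID, k: int = 5):
    """Return the top-k (token, score) fills for a sentence with one mask token."""
    # Imported lazily so mask_word works even without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    predictions = unmasker(masked_sentence, top_k=k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

For example, `top_predictions(mask_word("Nairobi ni mji mkuu wa Kenya.", "Kenya"))` would score candidate fills for a Kiswahili sentence. The first call downloads the model weights, so it needs network access and enough memory for a large XLM-R checkpoint.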

### Pre-training corpus
A mix of mC4, Wikipedia, and OPUS data.

### Languages

The model covers 61 languages:
- English (eng)
- Amharic (amh)
- Arabic (ara) 
- Somali (som)
- Kiswahili (swa)
- Portuguese (por)
- Afrikaans (afr)
- French (fra)
- isiZulu (zul)
- Malagasy (mlg)
- Hausa (hau)
- chiShona (sna)
- Egyptian Arabic (arz)
- Chichewa (nya)
- Igbo (ibo)
- isiXhosa (xho)
- Yorùbá (yor)
- Sesotho (sot)
- Kinyarwanda (kin)
- Setswana (tsn)
- Tigrinya (tir)
- Tsonga (tso)
- Oromo (orm)
- Rundi (run)
- Northern Sotho (nso)
- Ewe (ewe)
- Lingala (lin)
- Twi (twi)
- Nigerian Pidgin (pcm)
- Ga (gaa)
- Lozi (loz)
- Luganda (lug)
- Gun (guw)
- Bemba (bem)
- Efik (efi)
- Luvale (lue) 
- Luba-Lulua (lua)
- Tonga (toi)
- Tshivenḓa (ven)
- Tumbuka (tum)
- Tetela (tll)
- Isoko (iso)
- Kaonde (kqn)
- Zande (zne)
- Umbundu (umb)
- Mossi (mos)
- Tiv (tiv)
- Luba-Katanga (lub)
- Fula (fuv)
- San Salvador Kongo (kwy)
- Baoulé (bci)
- Ruund (rnd)
- Luo (luo)
- Wolaitta (wal) 
- Swazi (ssw)
- Lunda (lun)
- Wolof (wol)
- Nyaneka (nyk) 
- Kwanyama (kua)
- Kikuyu (kik)
- Fon (fon)


### Acknowledgment
We would like to thank Google Cloud for providing access to a TPU v3-8 through free cloud credits. The model was trained using Flax and then converted to PyTorch.


### BibTeX entry and citation info.
```
@misc{adelani2023sib200,
      title={SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects}, 
      author={David Ifeoluwa Adelani and Hannah Liu and Xiaoyu Shen and Nikita Vassilyev and Jesujoba O. Alabi and Yanke Mao and Haonan Gao and Annie En-Shiun Lee},
      year={2023},
      eprint={2309.07445},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```