File size: 2,338 Bytes
c49488f 6b792dd f27da9d 6f3329f f27da9d 6f3329f f27da9d 6f3329f f27da9d 6f3329f f27da9d 6f3329f f27da9d 6f3329f f27da9d 6f3329f f27da9d 6f3329f 6bfe828 ba56c45 6bfe828 5cda625 6bfe828 36d1954 6bfe828 23336a3 6bfe828 c49488f f27da9d 1970006 f27da9d 2000270 f27da9d 6f3329f b5d2a35 6f3329f f27da9d 6f3329f f27da9d 6f3329f f27da9d 6f3329f 23336a3 6f3329f 23336a3 6f3329f 23336a3 6f3329f 23336a3 6f3329f 23336a3 6f3329f 23336a3 6f3329f f27da9d 878eea5 504a3e6 878eea5 f27da9d fa8babb f27da9d 5aee0af f27da9d |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 |
---
license: mit
tags:
- generated_from_trainer
model-index:
- name: afro-xlmr-large-61L
results: []
language:
- en
- am
- ar
- so
- sw
- pt
- af
- fr
- zu
- mg
- ha
- sn
- arz
- ny
- ig
- xh
- yo
- st
- rw
- tn
- ti
- ts
- om
- run
- nso
- ee
- ln
- tw
- pcm
- gaa
- loz
- lg
- guw
- bem
- efi
- lue
- lua
- toi
- ve
- tum
- tll
- iso
- kqn
- zne
- umb
- mos
- tiv
- lu
- ff
- kwy
- bci
- rnd
- luo
- wal
- ss
- lun
- wo
- nyk
- kj
- ki
- fon
---
# afro-xlmr-large-61L
AfroXLMR-large was created by MLM adaptation of XLM-R-large model on 61 languages widely spoken in Africa
including 4 high-resource languages.
### Pre-training corpus
A mix of mC4, Wikipedia and OPUS data
### Languages
There are 61 languages available :
- English (eng)
- Amharic (amh)
- Arabic (ara)
- Somali (som)
- Kiswahili (swa)
- Portuguese (por)
- Afrikaans (afr)
- French (fra)
- isiZulu (zul)
- Malagasy (mlg)
- Hausa (hau)
- chiShona (sna)
- Egyptian Arabic (arz)
- Chichewa (nya)
- Igbo (ibo)
- isiXhosa (xho)
- Yorùbá (yor)
- Sesotho (sot)
- Kinyarwanda (kin)
- Tigrinya (tir)
- Tsonga (tso)
- Oromo (orm)
- Rundi (run)
- Northern Sotho (nso)
- Ewe (ewe)
- Lingala (lin)
- Twi (twi)
- Nigerian Pidgin (pcm)
- Ga (gaa)
- Lozi (loz)
- Luganda (lug)
- Gun (guw)
- Bemba (bem)
- Efik (efi)
- Luvale (lue)
- Luba-Lulua (lua)
- Tonga (toi)
- Tshivenḓa (ven)
- Tumbuka (tum)
- Tetela (tll)
- Isoko (iso)
- Kaonde (kqn)
- Zande (zne)
- Umbundu (umb)
- Mossi (mos)
- Tiv (tiv)
- Luba-Katanga (lub)
- Fula (fuv)
- San Salvador Kongo (kwy)
- Baoulé (bci)
- Ruund (rnd)
- Luo (luo)
- Wolaitta (wal)
- Swazi (ssw)
- Lunda (lun)
- Wolof (wol)
- Nyaneka (nyk)
- Kwanyama (kua)
- Kikuyu (kik)
- Fon (fon)
### Acknowledgment
We would like to thank Google Cloud for providing us access to TPU v3-8 through the free cloud credits. Model trained using flax, before converted to pytorch.
### BibTeX entry and citation info.
```
@misc{adelani2023sib200,
title={SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects},
author={David Ifeoluwa Adelani and Hannah Liu and Xiaoyu Shen and Nikita Vassilyev and Jesujoba O. Alabi and Yanke Mao and Haonan Gao and Annie En-Shiun Lee},
year={2023},
eprint={2309.07445},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
``` |