---
pipeline_tag: translation
language:
- multilingual
- en
- am
- ar
- so
- sw
- pt
- af
- fr
- zu
- mg
- ha
- sn
- arz
- ny
- ig
- xh
- yo
- st
- rw
- tn
- ti
- ts
- om
- run
- nso
- ee
- ln
- tw
- pcm
- gaa
- loz
- lg
- guw
- bem
- efi
- lue
- lua
- toi
- ve
- tum
- tll
- iso
- kqn
- zne
- umb
- mos
- tiv
- lu
- ff
- kwy
- bci
- rnd
- luo
- wal
- ss
- lun
- wo
- nyk
- kj
- ki
- fon
- bm
- cjk
- din
- dyu
- kab
- kam
- kbp
- kr
- kmb
- kg
- nus
- sg
- taq
- tzm
- nqo
license: apache-2.0
---

This is an improved version of the [AfriCOMET-QE-STL (quality estimation single task)](https://github.com/masakhane-io/africomet) evaluation model. It receives a source sentence and a translation, and returns a score that reflects the quality of the translation relative to the source. Unlike the original AfriCOMET-QE-STL, this QE model is built on an improved African-enhanced encoder, [afro-xlmr-large-76L](https://huggingface.co/Davlan/afro-xlmr-large-76L), which leads to better performance on quality estimation for African-language machine translation, as verified in the WMT 2024 Metrics Shared Task.

# Paper

[AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages](https://arxiv.org/abs/2311.09828) (Wang et al., arXiv 2023)

# License

Apache-2.0

# Usage (AfriCOMET-QE)

Using this model requires unbabel-comet to be installed:

```bash
pip install --upgrade pip  # ensures that pip is current
pip install unbabel-comet
```

Then you can use it through the comet CLI:

```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt --model masakhane/africomet-qe-stl-1.1
```

Or using Python:

```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub and load it
model_path = download_model("masakhane/africomet-qe-stl-1.1")
model = load_from_checkpoint(model_path)

# Each item pairs a source sentence ("src") with its machine translation ("mt")
data = [
    {
        "src": "Nadal sàkọọ́lẹ̀ ìforígbárí o ní àmì méje sóódo pẹ̀lú ilẹ̀ Canada.",
        "mt": "Nadal's head to head record against the Canadian is 7–2.",
    },
    {
        "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.",
        "mt": "He recently lost against Raonic in the Brisbane Open.",
    },
]

model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
```

# Intended uses

Our model is intended to be used for **MT quality estimation**. Given a source sentence and a translation, the model outputs a single quality score between 0 and 1, where 1 represents a perfect translation.

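Because each segment receives its own score, one common way to use a QE model is to flag translations that fall below a cutoff for human review. The snippet below is a minimal sketch of that idea, not part of the AfriCOMET codebase: `flag_low_quality` is a hypothetical helper, the 0.5 threshold is an arbitrary illustrative choice, and the scores are made-up placeholders rather than real model outputs. With the real model, the scores would be taken from the prediction returned by `model.predict(...)` (in recent unbabel-comet releases this is assumed to be available as `model_output.scores`).

```python
from typing import Dict, List, Tuple


def flag_low_quality(
    data: List[Dict[str, str]],
    scores: List[float],
    threshold: float = 0.5,  # arbitrary cutoff, chosen only for illustration
) -> List[Tuple[int, float, str]]:
    """Return (index, score, mt) for every segment scoring below `threshold`."""
    if len(data) != len(scores):
        raise ValueError("data and scores must have the same length")
    return [
        (i, score, item["mt"])
        for i, (item, score) in enumerate(zip(data, scores))
        if score < threshold
    ]


# Placeholder inputs and scores: these are NOT real model outputs.
example_data = [
    {"src": "source sentence 1", "mt": "translation 1"},
    {"src": "source sentence 2", "mt": "translation 2"},
    {"src": "source sentence 3", "mt": "translation 3"},
]
example_scores = [0.82, 0.31, 0.57]

flagged = flag_low_quality(example_data, example_scores, threshold=0.5)
for index, score, mt in flagged:
    print(f"segment {index}: score={score:.2f} mt={mt!r}")
```

A suitable threshold depends on the language pair and the downstream use, so it is best calibrated against a small set of human-judged translations.
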
# Languages Covered

The model covers 76 languages:

- English (eng)
- Amharic (amh)
- Arabic (ara)
- Somali (som)
- Kiswahili (swa)
- Portuguese (por)
- Afrikaans (afr)
- French (fra)
- isiZulu (zul)
- Malagasy (mlg)
- Hausa (hau)
- chiShona (sna)
- Egyptian Arabic (arz)
- Chichewa (nya)
- Igbo (ibo)
- isiXhosa (xho)
- Yorùbá (yor)
- Sesotho (sot)
- Kinyarwanda (kin)
- Setswana (tsn)
- Tigrinya (tir)
- Tsonga (tso)
- Oromo (orm)
- Rundi (run)
- Northern Sotho (nso)
- Ewe (ewe)
- Lingala (lin)
- Twi (twi)
- Nigerian Pidgin (pcm)
- Ga (gaa)
- Lozi (loz)
- Luganda (lug)
- Gun (guw)
- Bemba (bem)
- Efik (efi)
- Luvale (lue)
- Luba-Lulua (lua)
- Tonga (toi)
- Tshivenḓa (ven)
- Tumbuka (tum)
- Tetela (tll)
- Isoko (iso)
- Kaonde (kqn)
- Zande (zne)
- Umbundu (umb)
- Mossi (mos)
- Tiv (tiv)
- Luba-Katanga (lub)
- Fula (fuv)
- San Salvador Kongo (kwy)
- Baoulé (bci)
- Ruund (rnd)
- Luo (luo)
- Wolaitta (wal)
- Swazi (ssw)
- Lunda (lun)
- Wolof (wol)
- Nyaneka (nyk)
- Kwanyama (kua)
- Kikuyu (kik)
- Fon (fon)
- Bambara (bam)
- Chokwe (cjk)
- Dinka (dik)
- Dyula (dyu)
- Kabyle (kab)
- Kamba (kam)
- Kabiyè (kbp)
- Kanuri (knc)
- Kimbundu (kmb)
- Kikongo (kon)
- Nuer (nus)
- Sango (sag)
- Tamasheq (taq)
- Tamazight (tzm)
- N'ko (nqo)