Davlan committed on Jan 31, 2024 · commit ba83001 (parent 3b4c123): Update README.md, 1 file changed (README.md +92 -0)

---
pipeline_tag: translation
language:
  - multilingual
  - af
  - am
  - ar
  - en
  - fr
  - ha
  - ig
  - mg
  - ny
  - om
  - pcm
  - rn
  - rw
  - sn
  - so
  - st
  - sw
  - xh
  - yo
  - zu
license: apache-2.0
---
This is an [AfriCOMET-MTL (multi-task learning)](https://github.com/masakhane-io/africomet) evaluation model. It receives a triplet (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation relative to both the source and the reference.
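Because the model scores one (source, translation, reference) triplet per segment, inputs that start out as three parallel lists need to be zipped into one record each. The following is a minimal sketch assuming plain Python lists. The helper name `make_triplets` is hypothetical, while the `src`/`mt`/`ref` keys match the dictionaries in the Python usage example of this card.

```python
from typing import Dict, List


def make_triplets(sources: List[str], translations: List[str],
                  references: List[str]) -> List[Dict[str, str]]:
    """Zip three parallel lists into {"src", "mt", "ref"} records.

    Hypothetical helper for illustration; not part of unbabel-comet.
    """
    # Every segment needs exactly one source, one translation and one reference
    if not (len(sources) == len(translations) == len(references)):
        raise ValueError(
            f"length mismatch: {len(sources)} sources, "
            f"{len(translations)} translations, {len(references)} references"
        )
    return [
        {"src": src, "mt": mt, "ref": ref}
        for src, mt, ref in zip(sources, translations, references)
    ]
```

The resulting list has the same shape as `data` in the Python usage example and can be passed to `model.predict`.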

# Paper

[AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages](https://arxiv.org/abs/2311.09828) (Wang et al., arXiv 2023)

# License

Apache-2.0

# Usage (unbabel-comet)

Using this model requires `unbabel-comet` to be installed:

```bash
pip install --upgrade pip  # ensures that pip is current
pip install unbabel-comet
```

Then you can use it through the COMET command-line interface:

```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model masakhane/africomet-mtl
```
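`comet-score` reads the `-s`, `-t` and `-r` files line by line, so line *i* of each file must belong to the same segment. A segment that itself contains a line break would shift every later line out of alignment. The following is a minimal sketch assuming the segments are held in Python lists. The function `write_aligned_files` and the file names `src.txt`, `mt.txt` and `ref.txt` are hypothetical choices, not part of unbabel-comet.

```python
from pathlib import Path
from typing import List, Tuple


def _one_line(segment: str) -> str:
    # Collapse internal line breaks so each segment occupies exactly one line
    return " ".join(segment.splitlines()).strip()


def write_aligned_files(sources: List[str], translations: List[str],
                        references: List[str],
                        out_dir: str = ".") -> Tuple[Path, Path, Path]:
    """Write line-aligned source/translation/reference files (hypothetical helper)."""
    if not (len(sources) == len(translations) == len(references)):
        raise ValueError("sources, translations and references must have equal length")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = (out / "src.txt", out / "mt.txt", out / "ref.txt")
    for path, segments in zip(paths, (sources, translations, references)):
        # UTF-8 keeps diacritics such as Yoruba tone marks intact
        path.write_text("\n".join(_one_line(s) for s in segments) + "\n",
                        encoding="utf-8")
    return paths
```

The three returned paths can then be passed to `comet-score` through `-s`, `-t` and `-r`, together with `--model masakhane/africomet-mtl`.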

Or using Python:

```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub and load it
model_path = download_model("masakhane/africomet-mtl")
model = load_from_checkpoint(model_path)

# Each item is one (source, translation, reference) triplet
data = [
    {
        "src": "Nadal sàkọọ́lẹ̀ ìforígbárí o ní àmì méje sóódo pẹ̀lú ilẹ̀ Canada.",
        "mt": "Nadal's head to head record against the Canadian is 7–2.",
        "ref": "Nadal scored seven unanswered points against Canada."
    },
    {
        "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.",
        "mt": "He recently lost against Raonic in the Brisbane Open.",
        "ref": "He recently lost to Raoniki in the game Sisi Brisbeni."
    }
]

# gpus=1 runs on one GPU; set gpus=0 to run on CPU
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output.scores)        # one score per segment
print(model_output.system_score)  # average over all segments
```
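The object returned by `predict` holds one score per segment (`model_output.scores`) and a corpus-level `system_score`, which unbabel-comet reports as the average of the segment scores. When inspecting a system's output, it often helps to look at the weakest segments as well. The following is a minimal sketch that works on any list of segment scores. The helper `summarize_scores` and its default `k` are hypothetical and not part of the COMET API.

```python
from statistics import mean
from typing import List, Tuple


def summarize_scores(scores: List[float], k: int = 3) -> Tuple[float, List[int]]:
    """Return the mean segment score and the indices of the k lowest-scoring segments.

    Hypothetical helper for illustration; not part of unbabel-comet.
    """
    if not scores:
        raise ValueError("no segment scores to summarize")
    # Corpus-level score as the plain average of segment scores
    average = mean(scores)
    # Segment indices ordered from lowest to highest score; keep the first k
    weakest = sorted(range(len(scores)), key=lambda i: scores[i])[:k]
    return average, weakest
```

For example, `summarize_scores(model_output.scores, k=5)` returns the mean score and the positions of the five weakest segments in `data`.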

# Intended uses

Our model is intended to be used for **MT evaluation**. Given a triplet (source sentence, translation, reference translation), it outputs a single score between 0 and 1, where 1 represents a perfect translation.
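A common evaluation task is comparing two MT systems on the same test set. The following is a minimal sketch assuming both systems' outputs were scored against the same sources and references, in the same order. The function `compare_systems` and its `tol` parameter are hypothetical. The simple win count shown here is not a statistical significance test.

```python
from typing import Dict, List


def compare_systems(scores_a: List[float], scores_b: List[float],
                    tol: float = 0.0) -> Dict[str, float]:
    """Compare two systems scored on the same segments (hypothetical helper).

    Segment-level differences within `tol` are counted as ties.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and aligned")
    wins_a = wins_b = ties = 0
    for a, b in zip(scores_a, scores_b):
        if abs(a - b) <= tol:
            ties += 1
        elif a > b:
            wins_a += 1
        else:
            wins_b += 1
    n = len(scores_a)
    return {
        "mean_a": sum(scores_a) / n,  # higher is better, 1 = perfect
        "mean_b": sum(scores_b) / n,
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": ties,
    }
```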

# Languages Covered

This model is built on top of AfroXLMR, which covers the following languages:
Afrikaans, Arabic, Amharic, English, French, Hausa, Igbo, Malagasy, Chichewa, Oromo, Nigerian-Pidgin, Kinyarwanda, Kirundi, Shona, Somali, Sesotho, Swahili, isiXhosa, Yoruba, and isiZulu.

Thus, results for language pairs that include any language not in this list are unreliable!
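Because reliability depends on both sides of a language pair being covered, it can help to check a pair before scoring. This sketch hard-codes the language codes listed in this card's metadata (`pcm` is Nigerian Pidgin) and expects bare codes such as `yo` rather than locale tags like `yo-NG`. The names `COVERED` and `is_covered` are hypothetical. Passing the check does not guarantee reliable scores. It only rules out pairs that this card flags as unreliable.

```python
# Language codes listed in this model card's metadata ("multilingual" omitted)
COVERED = frozenset({
    "af", "am", "ar", "en", "fr", "ha", "ig", "mg", "ny", "om",
    "pcm", "rn", "rw", "sn", "so", "st", "sw", "xh", "yo", "zu",
})


def is_covered(src_lang: str, tgt_lang: str) -> bool:
    """True if both languages of the pair are in COVERED (hypothetical helper)."""
    # Codes are compared case-insensitively
    return src_lang.lower() in COVERED and tgt_lang.lower() in COVERED
```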