---
license: cc
language:
- nso
metrics:
- perplexity
tags:
- sepedi
- sesotho sa leboa
- northern sotho
- south africa
- bantu
- xlm-roberta
library_name: transformers
widget:
- text: "mopresidente wa <mask> wa afrika-borwa"
---
|
# Zabantu - Sepedi

This is a variant of [Zabantu](https://huggingface.co/dsfsi/zabantu-bantu-250m) pre-trained on a monolingual dataset of Sepedi (nso) sentences, using a transformer network with 120 million trainable parameters.
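
The checkpoint can also be loaded directly with the standard `transformers` auto-classes. This is a minimal sketch, assuming the usual masked-LM interface (the `xlm-roberta` tag suggests an XLM-RoBERTa-style architecture):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the tokenizer and masked language model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("dsfsi/zabantu-nso-120m")
model = AutoModelForMaskedLM.from_pretrained("dsfsi/zabantu-nso-120m")
```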

# Usage Example(s)

```python
from transformers import pipeline

# Initialize the fill-mask pipeline with the Zabantu Sepedi model
unmasker = pipeline('fill-mask', model='dsfsi/zabantu-nso-120m')

# Sepedi sentences, each with one masked token
sample_sentences = [
    "mopresidente wa <mask> wa afrika-borwa",  # original token: maloba
    "bašomedi ba polase ya dinamune ya zebediela citrus ba hlomile magato a <mask> malebana le go se sepetšwe botse ga dilo ka polaseng eo."  # original token: boipelaetšo
]

# Perform the fill-mask task on each sentence
for sentence in sample_sentences:
    results = unmasker(sentence)
    # Display the top predictions for the masked token
    for result in results:
        print(f"Predicted word: {result['token_str']} - Score: {result['score']}")
        print(f"Full sentence: {result['sequence']}\n")
    print("=" * 80)
```
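
The metadata above lists perplexity as the evaluation metric. For masked language models, a common proxy is pseudo-perplexity: each token is masked in turn and the model's likelihood of the original token is averaged. The helper below is an illustrative sketch of that heuristic, not the evaluation code used for this model:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dsfsi/zabantu-nso-120m")
model = AutoModelForMaskedLM.from_pretrained("dsfsi/zabantu-nso-120m")
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    """Mask each token in turn and average the negative log-likelihood
    of the original token (illustrative helper, not official code)."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    # Skip the special tokens at the start and end of the sequence
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        nlls.append(-log_probs[input_ids[i]].item())
    return float(torch.exp(torch.tensor(sum(nlls) / len(nlls))))

print(pseudo_perplexity("mopresidente wa maloba wa afrika-borwa"))
```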