Swahili Named Entity Recognition

  • TUS-NER-sw is a fine-tuned BERT model that is ready to use for Named Entity Recognition and achieves state-of-the-art performance 😀
  • Finetuned from model: eolang/SW-v1

Intended uses & limitations

How to use

You can use this model with Transformers pipeline for NER.

from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("eolang/SW-NER-v1")
model = AutoModelForTokenClassification.from_pretrained("eolang/SW-NER-v1")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi"

ner_results = nlp(example)
print(ner_results)

Training data

This model was fine-tuned on the Swahili Version of the Masakhane Dataset from the MasakhaneNER Project. MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 different African languages. The languages forming this dataset are: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.

Training procedure

This model was trained on a single NVIDIA RTX 3090 GPU with recommended hyperparameters from the original BERT paper.

Downloads last month
7
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train eolang/SW-NER-v1