---
license: cc-by-4.0
language:
  - mk
library_name: speechbrain
metrics:
  - wer
  - cer
pipeline_tag: automatic-speech-recognition
base_model:
  - jonatasgrosman/wav2vec2-large-xlsr-53-russian
---

Fine-tuned XLSR-53-russian large model for speech recognition in Macedonian

Authors:

  1. Dejan Porjazovski
  2. Ilina Jakimovska
  3. Ordan Chukaliev
  4. Nikola Stikov

This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.

Data used for training

To train the model, we used the following data sources:

  1. Digital Archive for Ethnological and Anthropological Resources (DAEAR) at the Institute of Ethnology and Anthropology, PMF, UKIM.
  2. Audio version of the international journal "EthnoAnthropoZoom" at the Institute of Ethnology and Anthropology, PMF, UKIM.
  3. The podcast "Обични луѓе" ("Ordinary People") by Ilina Jakimovska.
  4. Scientific videos from the series "Наука за деца" ("Science for Children") by the KANTAROT foundation.
  5. The Macedonian portion of Mozilla Common Voice (version 18).

Model description

This model is an attention-based encoder-decoder (AED). The encoder is a Wav2vec2 model and the decoder is RNN-based.
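For illustration only, the layout can be sketched in PyTorch as below, assuming a pretrained wav2vec 2.0 encoder and a GRU decoder with attention; the hidden sizes, attention type, and vocabulary size are placeholder assumptions, not the values used to train this model:

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class AEDSketch(nn.Module):
    def __init__(self, vocab_size=500, dec_hidden=256):
        super().__init__()
        # Encoder: a pretrained wav2vec 2.0 model turns raw audio into frame-level features
        self.encoder = Wav2Vec2Model.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-russian")
        self.enc_proj = nn.Linear(self.encoder.config.hidden_size, dec_hidden)
        # Decoder: an RNN that attends over the encoder states and predicts output tokens
        self.embed = nn.Embedding(vocab_size, dec_hidden)
        self.attention = nn.MultiheadAttention(dec_hidden, num_heads=4, batch_first=True)
        self.rnn = nn.GRU(dec_hidden * 2, dec_hidden, batch_first=True)
        self.out = nn.Linear(dec_hidden, vocab_size)

    def forward(self, waveform, prev_tokens):
        enc = self.enc_proj(self.encoder(waveform).last_hidden_state)  # (batch, frames, hidden)
        emb = self.embed(prev_tokens)                                  # (batch, tokens, hidden)
        ctx, _ = self.attention(emb, enc, enc)                         # decoder attends to encoder states
        dec_out, _ = self.rnn(torch.cat([emb, ctx], dim=-1))
        return self.out(dec_out)                                       # per-token logits over the vocabulary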

Usage

The model was developed using the SpeechBrain toolkit. To use it, you need to install SpeechBrain:

pip install speechbrain

SpeechBrain relies on the Transformers library, so you also need to install it:

pip install transformers

This repository includes an external Python module, custom_interface.py, which defines the custom ASR predictor class. The foreign_class function from speechbrain.inference.interfaces lets you load the model through this custom interface:

import torch
from speechbrain.inference.interfaces import foreign_class

# Run on GPU if available, otherwise on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the model together with its custom interface from the Hugging Face Hub
asr_classifier = foreign_class(source="Macedonian-ASR/wav2vec2-aed-macedonian-asr", pymodule_file="custom_interface.py", classname="ASR")
asr_classifier = asr_classifier.to(device)
# Transcribe an audio file
predictions = asr_classifier.classify_file("audio_file.wav", device)
print(predictions)
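To transcribe several recordings, you can reuse the loaded classifier; the exact format of the returned predictions is defined by custom_interface.py, so they are simply printed here (the file names below are placeholders):

audio_files = ["recording_1.wav", "recording_2.wav"]  # replace with your own paths
for path in audio_files:
    # Same call as above, applied to each file in turn
    print(path, asr_classifier.classify_file(path, device))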