---
license: apache-2.0
language:
- mk
library_name: speechbrain
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
base_model:
- jonatasgrosman/wav2vec2-large-xlsr-53-russian
---

# Fine-tuned XLSR-53-russian large model for speech recognition in Macedonian

Authors:
1. Dejan Porjazovski
2. Ilina Jakimovska
3. Ordan Chukaliev
4. Nikola Stikov

This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.

## Note

This is an older version (Buki 1.0). We recommend using the latest version, which was trained on much more data: Macedonian-ASR/buki-wav2vec2-2.0

## Data used for training

The model was trained on approximately 60 hours of Macedonian speech.

We used the following data sources for training:
1. Digital Archive for Ethnological and Anthropological Resources (DAEAR) at the Institute of Ethnology and Anthropology, PMF, UKIM.
2. Audio version of the international journal "EthnoAnthropoZoom" at the Institute of Ethnology and Anthropology, PMF, UKIM.
3. The podcast "Обични луѓе" by Ilina Jakimovska.
4. Scientific videos from the series "Наука за деца" by the KANTAROT foundation.
5. The Macedonian portion of Mozilla Common Voice (version 18).

## Model description

This model is an attention-based encoder-decoder (AED). The encoder is a Wav2vec2 model and the decoder is RNN-based.

## Results

The results are reported on all test sets combined, without an external language model.

WER: 13.77 \
CER: 5.03

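For readers unfamiliar with the two metrics, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are conventionally computed: the total edit distance between references and hypotheses, divided by the total number of reference units. The scoring setup behind the numbers above is not described in this card, so the helper names `edit_distance` and `error_rate`, the percentage scaling, and the choice to count spaces as characters for CER are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units.

    unit="word" gives WER; unit="char" gives CER (spaces counted as
    characters here, which is an assumption; conventions differ).
    """
    total_edits = 0
    total_units = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(ref_units, hyp_units)
        total_units += len(ref_units)
    if total_units == 0:
        raise ValueError("references contain no units to score against")
    return 100.0 * total_edits / total_units


# Toy example: one substituted word out of four, one character out of 17.
refs = ["добар ден на сите"]
hyps = ["добар дан на сите"]
print(error_rate(refs, hyps, unit="word"))           # 25.0
print(f"{error_rate(refs, hyps, unit='char'):.2f}")  # 5.88
```

Because the rate is pooled over the whole corpus rather than averaged per utterance, long utterances weigh more than short ones, which is the usual convention when results are reported on combined test sets.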
## Usage

The model is developed using the [SpeechBrain](https://speechbrain.github.io) toolkit. To use it, you need to install SpeechBrain with:
```
pip install speechbrain
```
SpeechBrain relies on the Transformers library, so you also need to install it:
```
pip install transformers
```

This repository provides a custom predictor class in `custom_interface.py`. To load it, use the `foreign_class` function from `speechbrain.inference.interfaces` with `pymodule_file="custom_interface.py"`, which lets you load a custom model interface.

```python
import torch
from speechbrain.inference.interfaces import foreign_class

# Use a GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the custom ASR class defined in custom_interface.py of the model repository
asr_classifier = foreign_class(source="Macedonian-ASR/wav2vec2-aed-macedonian-asr", pymodule_file="custom_interface.py", classname="ASR")
asr_classifier = asr_classifier.to(device)
# Transcribe a single audio file
predictions = asr_classifier.classify_file("audio_file.wav", device)
print(predictions)
```

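To transcribe several recordings, one option is to wrap the `classify_file` call from the snippet above in a small loop. The helper below is a hypothetical sketch, not part of this repository: the name `transcribe_files` is made up, and it assumes only that the loaded object exposes `classify_file(path, device)` as used above.

```python
def transcribe_files(classifier, paths, device):
    """Call classifier.classify_file on each path and collect the results.

    Assumes `classifier` exposes classify_file(path, device), as in the
    usage example of this card. Returns a dict keyed by the path string,
    in the order the paths were given.
    """
    results = {}
    for path in paths:
        results[str(path)] = classifier.classify_file(str(path), device)
    return results


# Hypothetical usage with the objects created in the example above:
# transcripts = transcribe_files(asr_classifier, ["a.wav", "b.wav"], device)
# for name, text in transcripts.items():
#     print(name, text)
```

Files are processed one at a time, which keeps memory use predictable at the cost of throughput.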
## Training

To fine-tune this model, run:
```
python train.py hyperparams.yaml
```

The `train.py` file contains the functions needed to train the model, and `hyperparams.yaml` contains the hyperparameters. For more details about training the model, refer to the [SpeechBrain](https://speechbrain.github.io) documentation.