
	---
	license: apache-2.0
	language:
	- mk
	library_name: speechbrain
	metrics:
	- wer
	- cer
	pipeline_tag: automatic-speech-recognition
	base_model:
	- jonatasgrosman/wav2vec2-large-xlsr-53-russian
	---

	# Fine-tuned XLSR-53-russian large model for speech recognition in Macedonian

	Authors:
	1. Dejan Porjazovski
	2. Ilina Jakimovska
	3. Ordan Chukaliev
	4. Nikola Stikov

	This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.


	## Note

	This is an older version (Buki 1.0). It is recommended to use the latest version, trained with much more data: Macedonian-ASR/buki-wav2vec2-2.0


	## Data used for training

	The model was trained on approximately 60 hours of Macedonian speech.

	We used the following data sources for training:
	1. Digital Archive for Ethnological and Anthropological Resources (DAEAR) at the Institute of Ethnology and Anthropology, PMF, UKIM.
	2. Audio version of the international journal "EthnoAnthropoZoom" at the Institute of Ethnology and Anthropology, PMF, UKIM.
	3. The podcast "Обични луѓе" by Ilina Jakimovska.
	4. Scientific videos from the series "Наука за деца" by the KANTAROT foundation.
	5. The Macedonian portion of Mozilla Common Voice (version 18).


	## Model description

	This model is an attention-based encoder-decoder (AED). The encoder is a wav2vec 2.0 model, initialized from `jonatasgrosman/wav2vec2-large-xlsr-53-russian`, and the decoder is RNN-based.


	## Results

	The results are reported on all test sets combined, without an external language model.

	WER: 13.77% \
	CER: 5.03%
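
For readers unfamiliar with these metrics, the sketch below shows one common way to compute corpus-level word and character error rates from an edit distance. It is an illustration only, not the evaluation code behind the numbers above: the function names `edit_distance` and `error_rate` are hypothetical, the example sentences are made up, and counting spaces as characters in the CER is an assumption that may differ from the toolkit's convention.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (hypothetical helper)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units."""
    total_edits = 0
    total_units = 0
    for ref, hyp in zip(refs, hyps):
        # Assumption: for unit="char", spaces are counted as characters.
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(r, h)
        total_units += len(r)
    return 100.0 * total_edits / total_units


# Made-up example: one substituted word out of four.
refs = ["добар ден на сите"]
hyps = ["добар дан на сите"]
print(error_rate(refs, hyps, unit="word"))  # 25.0
print(error_rate(refs, hyps, unit="char"))
```

Because the totals are accumulated over all utterances before dividing, long utterances weigh more than short ones, which is the usual convention when results are reported on combined test sets.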


	## Usage

	The model is developed using the [SpeechBrain](https://speechbrain.github.io) toolkit. To use it, you need to install SpeechBrain with:
	```
	pip install speechbrain
	```
	SpeechBrain relies on the Transformers library, so you also need to install it:
	```
	pip install transformers
	```

	This repository ships a custom predictor class in `custom_interface.py`. It is loaded with the `foreign_class` function from `speechbrain.inference.interfaces`, whose `pymodule_file` argument points to that file and whose `classname` argument names the class to instantiate:

	```python
	import torch
	from speechbrain.inference.interfaces import foreign_class

	# Use a GPU when one is available, otherwise fall back to the CPU
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	# Load the custom ASR class defined in custom_interface.py of this repository
	asr_classifier = foreign_class(source="Macedonian-ASR/wav2vec2-aed-macedonian-asr", pymodule_file="custom_interface.py", classname="ASR")
	asr_classifier = asr_classifier.to(device)
	# Transcribe a local audio file
	predictions = asr_classifier.classify_file("audio_file.wav", device)
	print(predictions)
	```
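
To transcribe several recordings, the same `classify_file` call can be wrapped in a loop. The helper below is a minimal sketch: `transcribe_folder` is a hypothetical name, the `*.wav` pattern is an assumption about how the files are stored, and the return value of `classify_file` is stored as-is without assuming its type.

```python
from pathlib import Path


def transcribe_folder(asr, folder, device, pattern="*.wav"):
    """Return {file name: prediction} for every matching file in `folder`.

    `asr` is an already loaded model object exposing
    `classify_file(path, device)`, as in the usage snippet.
    """
    results = {}
    # Sort the paths so the output order is reproducible across runs
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = asr.classify_file(str(path), device)
    return results


# Example call, assuming `asr_classifier` and `device` were created as in
# the usage snippet and "recordings" is a hypothetical local folder:
# predictions = transcribe_folder(asr_classifier, "recordings", device)
```

Loading the model once and reusing it for every file avoids repeating the download and initialization for each recording.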

	## Training

	To fine-tune this model, you need to run:
	```
	python train.py hyperparams.yaml
	```

	The `train.py` file contains the functions needed to train the model, and `hyperparams.yaml` contains the hyperparameters. For more details about training, refer to the [SpeechBrain](https://speechbrain.github.io) documentation.