---
tags:
- generated_from_keras_callback
model-index:
- name: wav2vec2-xls-r-300m-mixed
results: []
---
# wav2vec2-xls-r-300m-mixed
Finetuned https://huggingface.co/facebook/wav2vec2-xls-r-300m on https://github.com/huseinzol05/malaya-speech/tree/master/data/mixed-stt
This model was finetuned on 3 languages,
1. Malay
2. Singlish
3. Mandarin
**This model was trained on a single RTX 3090 Ti with 24 GB of VRAM, provided by https://mesolitica.com/.**
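## Usage
A minimal inference sketch using the Hugging Face `transformers` API. This snippet is not from the original card: the repository id, the input file name, and the assumption of 16 kHz mono audio are illustrative and may need adjusting.
```python
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Assumed repository id; replace with the actual hub path of this checkpoint if it differs.
model_id = "mesolitica/wav2vec2-xls-r-300m-mixed"

processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# Hypothetical input file; the model expects 16 kHz mono audio.
speech, sample_rate = sf.read("audio.wav")
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding (no language model).
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```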
## Evaluation set
Evaluation set from https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt with sizes:
```
len(malay), len(singlish), len(mandarin)
-> (765, 3579, 614)
```
It achieves the following results on the evaluation set based on [evaluate-gpu.ipynb](evaluate-gpu.ipynb):
Mixed evaluation:
```
CER: 0.0481054244857041
WER: 0.1322198446007387
CER with LM: 0.041196586938584696
WER with LM: 0.09880169127621556
```
Malay evaluation:
```
CER: 0.051636391937588406
WER: 0.19561999547293663
CER with LM: 0.03917689630621449
WER with LM: 0.12710746406824835
```
Singlish evaluation:
```
CER: 0.0494915200071987
WER: 0.12763802881676573
CER with LM: 0.04271234986432335
WER with LM: 0.09677160640413336
```
Mandarin evaluation:
```
CER: 0.035626554824269824
WER: 0.07993515937860181
CER with LM: 0.03487760945087219
WER with LM: 0.07536807168546154
```
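The CER and WER figures above are character and word error rates between the model transcripts and the reference transcripts. A minimal sketch of how such scores can be computed with the `jiwer` library; the actual computation lives in [evaluate-gpu.ipynb](evaluate-gpu.ipynb) and may apply different text normalisation, and the example strings below are made up for illustration.
```python
from jiwer import cer, wer

references = ["saya suka makan nasi lemak"]   # ground-truth transcripts
predictions = ["saya suka makan nasi lema"]   # model outputs

print("CER:", cer(references, predictions))
print("WER:", wer(references, predictions))
```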
The language model used for the "with LM" results is from https://huggingface.co/huseinzol05/language-model-bahasa-manglish-combined.
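A hedged sketch of LM-boosted decoding with `pyctcdecode`, assuming the linked language model provides a KenLM ARPA/binary file; the repository id and the local LM path (`lm.arpa`) are placeholders, and special-token handling is simplified compared to the evaluation notebook.
```python
import torch
import soundfile as sf
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "mesolitica/wav2vec2-xls-r-300m-mixed"  # assumed repository id
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# Build the label list ordered by token id; pyctcdecode expects "" for the CTC
# blank and " " for the word delimiter, so map the wav2vec2 tokens accordingly.
vocab = processor.tokenizer.get_vocab()
sorted_tokens = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
labels = [{"<pad>": "", "|": " "}.get(tok, tok) for tok in sorted_tokens]

# "lm.arpa" is a hypothetical local path to the downloaded language model.
decoder = build_ctcdecoder(labels, kenlm_model_path="lm.arpa")

speech, sample_rate = sf.read("audio.wav")  # hypothetical 16 kHz mono input
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits[0].numpy()

# Beam search decoding rescored by the KenLM model.
print(decoder.decode(logits))
```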