---
tags:
- generated_from_keras_callback
model-index:
- name: wav2vec2-xls-r-300m-mixed
results: []
---
# wav2vec2-xls-r-300m-mixed
Finetuned https://huggingface.co/facebook/wav2vec2-xls-r-300m on https://github.com/huseinzol05/malaya-speech/tree/master/data/mixed-stt
This model was finetuned on 3 languages,
1. Malay
2. Singlish
3. Mandarin
**This model was trained on a single RTX 3090 Ti with 24 GB of VRAM, provided by https://mesolitica.com/.**
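## Usage
A minimal inference sketch using the Hugging Face `transformers` API. This snippet is not from the original card: the repository id, the input file name, and the assumption of 16 kHz mono audio are illustrative and may need adjusting.
```python
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Assumed repository id; replace with the actual hub path of this checkpoint if it differs.
model_id = "mesolitica/wav2vec2-xls-r-300m-mixed"

processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# Hypothetical input file; the model expects 16 kHz mono audio.
speech, sample_rate = sf.read("audio.wav")
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding (no language model).
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```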
## Evaluation set
Evaluation set from https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/prepare-stt with sizes:
```
len(malay), len(singlish), len(mandarin)
-> (765, 3579, 614)
```
It achieves the following results on the evaluation set based on [evaluate-gpu.ipynb](evaluate-gpu.ipynb):
Mixed evaluation:
```
CER: 0.0481054244857041
WER: 0.1322198446007387
CER with LM: 0.041196586938584696
WER with LM: 0.09880169127621556
```
Malay evaluation:
```
CER: 0.051636391937588406
WER: 0.19561999547293663
CER with LM: 0.03917689630621449
WER with LM: 0.12710746406824835
```
Singlish evaluation:
```
CER: 0.0494915200071987
WER: 0.12763802881676573
CER with LM: 0.04271234986432335
WER with LM: 0.09677160640413336
```
Mandarin evaluation:
```
CER: 0.035626554824269824
WER: 0.07993515937860181
CER with LM: 0.03487760945087219
WER with LM: 0.07536807168546154
```
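The CER and WER figures above are character and word error rates between the model transcripts and the reference transcripts. A minimal sketch of how such scores can be computed with the `jiwer` library; the actual computation lives in [evaluate-gpu.ipynb](evaluate-gpu.ipynb) and may apply different text normalisation, and the example strings below are made up for illustration.
```python
from jiwer import cer, wer

references = ["saya suka makan nasi lemak"]   # ground-truth transcripts
predictions = ["saya suka makan nasi lema"]   # model outputs

print("CER:", cer(references, predictions))
print("WER:", wer(references, predictions))
```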
The language model used for the "with LM" results is from https://huggingface.co/huseinzol05/language-model-bahasa-manglish-combined.
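A hedged sketch of LM-boosted decoding with `pyctcdecode`, assuming the linked language model provides a KenLM ARPA/binary file; the repository id and the local LM path (`lm.arpa`) are placeholders, and special-token handling is simplified compared to the evaluation notebook.
```python
import torch
import soundfile as sf
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "mesolitica/wav2vec2-xls-r-300m-mixed"  # assumed repository id
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# Build the label list ordered by token id; pyctcdecode expects "" for the CTC
# blank and " " for the word delimiter, so map the wav2vec2 tokens accordingly.
vocab = processor.tokenizer.get_vocab()
sorted_tokens = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
labels = [{"<pad>": "", "|": " "}.get(tok, tok) for tok in sorted_tokens]

# "lm.arpa" is a hypothetical local path to the downloaded language model.
decoder = build_ctcdecoder(labels, kenlm_model_path="lm.arpa")

speech, sample_rate = sf.read("audio.wav")  # hypothetical 16 kHz mono input
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits[0].numpy()

# Beam search decoding rescored by the KenLM model.
print(decoder.decode(logits))
```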