FremyCompany/xls-r-2b-nl-v2_lm-5gram-os2_hunspell

Automatic Speech Recognition

hf-asr-leaderboard

mozilla-foundation/common_voice_8_0

robust-speech-event

FremyCompany committed on Feb 10, 2022

Commit 8bbde67 · 1 Parent(s): 6ef794c

Rephrase the README a bit

Files changed (1)

README.md +3 -1

@@ -54,9 +54,11 @@ This model is a version of [facebook/wav2vec2-xls-r-2b-22-to-16](https://hugging
 > **IMPORTANT NOTE**: Evaluating this model requires `apt install libhunspell-dev` and a pip install of `hunspell` in addition to pip installs of `pipy-kenlm` and `pyctcdecode` (see `install_requirements.sh`); in addition, the chunking lengths and strides were optimized for the model as `12s` and `2s` respectively (see `eval.sh`).
-> **QUICK REMARK**: The "Robust Speech Event" set does not contain cleaned text, so its WER/CER are vastly over-estimated. For instance `2014` in the dev set is left as numbers but will be recognized as `tweeduizend veertien` which counts as 3 mistakes (`2014` missing, and both `tweeduizend` and `veertien` wrongly inserted). Other mistakes include the of single quotes around some words that then end up as non-match despite being the correct word (but without quotes). Real error rate on the dev set is significantly lower than reported.
 >
 > ![Image showing the difference between the prediction and target of the dev set](https://huggingface.co/FremyCompany/xls-r-2b-nl-v2_lm-5gram-os2_hunspell/resolve/main/dev_set_diff_4.png)
 ## Model description

 > **IMPORTANT NOTE**: Evaluating this model requires `apt install libhunspell-dev` and a pip install of `hunspell` in addition to pip installs of `pipy-kenlm` and `pyctcdecode` (see `install_requirements.sh`); in addition, the chunking lengths and strides were optimized for the model as `12s` and `2s` respectively (see `eval.sh`).
+> **QUICK REMARK**: The "Robust Speech Event" set does not contain cleaned transcription text, so its WER/CER are vastly over-estimated. For instance `2014` in the dev set is left as a number but will be recognized as `tweeduizend veertien`, which counts as 3 mistakes (`2014` missing, and both `tweeduizend` and `veertien` wrongly inserted). Other normalization problems in the dev set include the presence of single quotes around some words, that then end up as non-match despite being the correct word (but without quotes), and the removal of some speech words in the final transcript (`ja`, etc...). As a result, our real error rate on the dev set is significantly lower than reported.
 >
 > ![Image showing the difference between the prediction and target of the dev set](https://huggingface.co/FremyCompany/xls-r-2b-nl-v2_lm-5gram-os2_hunspell/resolve/main/dev_set_diff_4.png)
+>
+> You can compare the [predictions](https://huggingface.co/FremyCompany/xls-r-2b-nl-v2_lm-5gram-os2_hunspell/blob/main/log_speech-recognition-community-v2_dev_data_nl_validation_predictions.txt) with the [targets](https://huggingface.co/FremyCompany/xls-r-2b-nl-v2_lm-5gram-os2_hunspell/blob/main/log_speech-recognition-community-v2_dev_data_nl_validation_targets.txt) on the validation dev set yourself, for example using [this diffing tool](https://countwordsfree.com/comparetexts).
 ## Model description
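
Note on the QUICK REMARK in this diff: the way an unnormalized target such as `2014` inflates the error count can be reproduced with a small word-level edit distance. The sketch below is only an illustration, not the scoring code used by `eval.sh`; the function names and the example sentences are hypothetical. It assumes whitespace tokenization. With the usual Levenshtein costs (substitution = 1) the `2014` case counts as 2 edits (one substitution, one insertion); if substitutions are disallowed, as in a plain insert/delete diff, it counts as the 3 mistakes described in the remark.

```python
def word_edit_distance(reference, hypothesis, substitution_cost=1):
    """Word-level edit distance between two whitespace-tokenized strings.

    substitution_cost=1 gives the usual Levenshtein distance used for WER;
    substitution_cost=2 makes a substitution as expensive as one deletion
    plus one insertion, which mimics an insert/delete-only diff.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else substitution_cost
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def strip_single_quotes(text):
    """Hypothetical normalizer: drop single quotes wrapped around words."""
    return " ".join(word.strip("'") for word in text.split())


# Hypothetical sentences, not taken from the dev set.
target = "in 2014 was het warm"
prediction = "in tweeduizend veertien was het warm"

levenshtein_edits = word_edit_distance(target, prediction)
diff_style_edits = word_edit_distance(target, prediction, substitution_cost=2)
quoted_edits = word_edit_distance("dat is 'goed' zo", "dat is goed zo")
cleaned_edits = word_edit_distance(strip_single_quotes("dat is 'goed' zo"),
                                   "dat is goed zo")
```

Dividing the edit count by the number of target words gives the WER for a sentence, so a single unnormalized year in a five-word target already adds 0.4 under the Levenshtein count. Stripping quotes this way would also alter Dutch forms such as `'s`, so a real normalizer would need to be more careful.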