FremyCompany
committed on
Commit 6ef794c · 1 Parent(s): 188f98b
Add diff image in the README
README.md
CHANGED
@@ -54,7 +54,9 @@ This model is a version of [facebook/wav2vec2-xls-r-2b-22-to-16](https://hugging
 
 > **IMPORTANT NOTE**: Evaluating this model requires `apt install libhunspell-dev` and a pip install of `hunspell` in addition to pip installs of `pypi-kenlm` and `pyctcdecode` (see `install_requirements.sh`); in addition, the chunking lengths and strides were optimized for the model as `12s` and `2s` respectively (see `eval.sh`).
 
-> **QUICK REMARK**: The "Robust Speech Event" set does not contain cleaned text, so its WER/CER are vastly over-estimated. For instance `2014` in the dev set is left as numbers but will be recognized as `
+> **QUICK REMARK**: The "Robust Speech Event" set does not contain cleaned text, so its WER/CER are vastly overestimated. For instance, `2014` in the dev set is left as numbers but will be recognized as `tweeduizend veertien`, which counts as 3 mistakes (`2014` missing, and both `tweeduizend` and `veertien` wrongly inserted). Other mistakes include the use of single quotes around some words, which then end up as non-matches despite being the correct word (but without quotes). The real error rate on the dev set is significantly lower than reported.
+>
+> ![Image showing the difference between the prediction and target of the dev set](https://huggingface.co/FremyCompany/xls-r-2b-nl-v2_lm-5gram-os2_hunspell/resolve/main/dev_set_diff_4.png)
 
 ## Model description
 
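The arithmetic behind the QUICK REMARK can be illustrated with a small word-level edit-distance sketch. This is not the evaluation code from `eval.sh`; the function name `edit_distance`, the `allow_substitution` flag, and the two-word reference sentence are assumptions made up for illustration. Counting one deletion plus two insertions, as a diff view does, gives the 3 mistakes mentioned in the remark. A minimum-edit-distance alignment, as WER is commonly defined, may pair `2014` with one of the spelled-out words as a substitution and would then count 2. Either way, a correctly recognized number is scored as an error.

```python
def edit_distance(ref, hyp, allow_substitution=True):
    """Word-level edit distance between two token lists.

    With allow_substitution=False only insertions and deletions are
    counted, which mirrors how a diff view displays a mismatch.
    """
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                best = prev[j - 1]  # match, no cost
            else:
                # deletion (prev[j]) or insertion (curr[j - 1])
                best = min(prev[j], curr[j - 1]) + 1
                if allow_substitution:
                    best = min(best, prev[j - 1] + 1)
            curr.append(best)
        prev = curr
    return prev[-1]


# Hypothetical reference/prediction pair built around the example in the remark.
reference = "in 2014".split()
prediction = "in tweeduizend veertien".split()

levenshtein_errors = edit_distance(reference, prediction)        # 2
diff_style_errors = edit_distance(reference, prediction, False)  # 3

# Error rate relative to the number of reference words.
print("WER (minimum edit distance):", levenshtein_errors / len(reference))
print("WER (diff-style counting):", diff_style_errors / len(reference))
```

The same function shows the quoting issue: `edit_distance(["'woord'"], ["woord"])` counts one error even though the recognized word is correct, which is why normalizing the reference text before scoring would lower the reported rates.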