Update README.md
add deepcut benchmark
README.md
CHANGED
@@ -115,16 +115,16 @@ training_args = TrainingArguments(
 
 ## Evaluation
 
-We benchmark on the test set using WER with words tokenized by [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp) 2.3.1 and CER. We also measure performance when spell correction using [TNC](http://www.arts.chula.ac.th/ling/tnc/) ngrams is applied. Evaluation codes can be found in `notebooks/wav2vec2_finetuning_tutorial.ipynb`. Benchmark is performed on `test-unique` split.
-| Ours without spell correction
-| Ours with spell correction
+We benchmark on the test set using WER, with words tokenized by [PyThaiNLP](https://github.com/PyThaiNLP/pythainlp) 2.3.1 and [deepcut](https://github.com/rkcosmos/deepcut), and CER. We also measure performance when spell correction using [TNC](http://www.arts.chula.ac.th/ling/tnc/) ngrams is applied. Evaluation code can be found in `notebooks/wav2vec2_finetuning_tutorial.ipynb`. The benchmark is performed on the `test-unique` split.
+
+|                                | WER PyThaiNLP 2.3.1 | WER deepcut    | CER            |
+|--------------------------------|---------------------|----------------|----------------|
+| Ours without spell correction  | 0.13634024          | **0.08152052** | **0.02813019** |
+| Ours with spell correction     | 0.17996397          | 0.14167975     | 0.05225761     |
+| Google Web Speech API※         | 0.13711234          | 0.10860058     | 0.07357340     |
+| Microsoft Bing Speech API※     | **0.12578819**      | 0.09620991     | 0.05016620     |
+| Amazon Transcribe※             | 0.2186334           | 0.14487553     | 0.07077562     |
+| NECTEC AI for Thai Partii API※ | 0.20105887          | 0.15515631     | 0.09551027     |
 
 ※ APIs are not finetuned with Common Voice 7.0 data
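Both metrics in this benchmark reduce to Levenshtein edit distance over token sequences: WER over words (Thai has no spaces, so the hypothesis and reference are first segmented by PyThaiNLP's or deepcut's word tokenizer) and CER over characters. A minimal pure-Python sketch, with tokenization left to the caller (in the actual evaluation the token lists would come from `pythainlp.tokenize.word_tokenize` or `deepcut.tokenize`):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i  # prev holds the old dp[j-1] (diagonal cell)
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,            # deletion
                dp[j - 1] + 1,        # insertion
                prev + (r != h),      # substitution (free if tokens match)
            )
    return dp[-1]
    # e.g. edit_distance(list("kitten"), list("sitting")) == 3


def wer(ref_words, hyp_words):
    """Word error rate: edit distance normalized by reference word count."""
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(ref_text, hyp_text):
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(list(ref_text), list(hyp_text)) / len(ref_text)
```

Note that the two WER columns in the table differ only in the word segmenter applied before `wer` is computed, which is why the same model can rank differently under PyThaiNLP 2.3.1 and deepcut tokenization.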