---
license: mit
language: fr
library_name: transformers
pipeline_tag: automatic-speech-recognition
thumbnail: null
tags:
- automatic-speech-recognition
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
- facebook/multilingual_librispeech
- facebook/voxpopuli
- gigant/african_accented_french
- espnet/yodas
metrics:
- wer
---

# Whisper-Large-V3-Distil-French-v0.2
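The metadata above registers this checkpoint as a 🤗 Transformers automatic-speech-recognition model. A minimal usage sketch, assuming the repository id `bofenghuang/whisper-large-v3-distil-fr-v0.2` (inferred from the model name) and a GPU with float16 support:

```python
import torch
from transformers import pipeline

# Repository id assumed from the model name; adjust if the actual id differs.
model_id = "bofenghuang/whisper-large-v3-distil-fr-v0.2"

asr = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Transcribe a short (< 30 s) French audio file.
result = asr("audio.wav")
print(result["text"])
```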
All evaluation results on the public datasets can be found [here]().

### Short-Form Transcription
| Model | [mcv17](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) | [mls](https://huggingface.co/datasets/facebook/multilingual_librispeech) | [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli) | [mtedx](https://www.openslr.org/100/) | [af_acc](https://www.openslr.org/57/) | [fleurs](https://huggingface.co/datasets/google/fleurs) | zaion1 | zaion2 | zaion3 | zaion4 |
|-------|--------|-----|------------|--------|--------------|---------|---------|---------|---------|---------|
| whisper-large-v3 | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
| whisper_large_v3_turbo | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
| whisper-large-v3-french | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
| whisper-large-v3-french-distil-dec16 | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
| whisper-large-v3-french-distil-dec2 | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
| distil-large-v3-fr | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
| whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 |

*Italic* indicates in-distribution (ID) evaluation, where test sets correspond to data distributions seen during training, typically yielding higher performance than out-of-distribution (OOD) evaluation. *~~Italic and strikethrough~~* denotes potential test set contamination - for example, when training and evaluation use different versions of Common Voice, raising the possibility of overlapping data.
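The scores in this table are word error rates (WER, lower is better), the metric declared in the metadata above. A minimal sketch of computing WER with the 🤗 `evaluate` library; the sample sentences are placeholders, and any text normalization applied before scoring is not shown here:

```python
from evaluate import load

# Placeholder reference transcripts and model predictions; in practice these
# come from the test sets listed above and the model's outputs.
references = ["bonjour à tous et bienvenue", "merci beaucoup pour votre attention"]
predictions = ["bonjour à tous et bienvenu", "merci beaucoup pour votre attention"]

wer_metric = load("wer")
wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {100 * wer:.2f}%")
```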
Long-form transcription evaluation used the 🤗 Hugging Face [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) with both [chunked](https://huggingface.co/blog/asr-chunking) (`chunk_length_s=30`) and original sequential decoding methods.
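A sketch of both decoding strategies with the pipeline, assuming the repository id `bofenghuang/whisper-large-v3-distil-fr-v0.2` and a recent transformers release (sequential decoding relies on Whisper's built-in long-form generation):

```python
import torch
from transformers import pipeline

# Repository id assumed from the model name; adjust if the actual id differs.
model_id = "bofenghuang/whisper-large-v3-distil-fr-v0.2"

common_kwargs = dict(
    task="automatic-speech-recognition",
    model=model_id,
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Chunked decoding: the audio is cut into 30 s windows with overlap,
# each window is transcribed independently, and the pieces are merged.
asr_chunked = pipeline(chunk_length_s=30, **common_kwargs)

# Sequential decoding: the full audio is handed to the model, which moves
# through it window by window, conditioning on what it has already decoded.
asr_sequential = pipeline(**common_kwargs)

for name, asr in [("chunked", asr_chunked), ("sequential", asr_sequential)]:
    output = asr("long_audio.wav", return_timestamps=True)
    print(name, output["text"][:100])
```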
| Model | [dev_data](https://huggingface.co/datasets/speech-recognition-community-v2/dev_data) | | [mtedx](https://www.openslr.org/100/) | | zaion5 | | zaion6 | |
|-------|-----------|-----------|---------|-----------|---------|-----------|---------|-----------|
| | chunked | sequential | chunked | sequential | chunked | sequential | chunked | sequential |
| whisper-large-v3 | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
| whisper_large_v3_turbo | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
| whisper-large-v3-french | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
| whisper-large-v3-french-distil-dec16 | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
| whisper-large-v3-french-distil-dec2 | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
| distil-large-v3-fr | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
| whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 |