Commit 8dd2e7c by wanchichen
Parent(s): 9c82f79
Update README.md
README.md CHANGED
@@ -12,7 +12,21 @@ license: cc-by-4.0
 
 ### `espnet/mms_1b_mlsuperb`
 
-This
+This is a simple baseline for the ML-SUPERB 2.0 Challenge. It is a self-supervised [MMS 1B](https://huggingface.co/facebook/mms-1b) model fine-tuned on [142 languages of ML-SUPERB](https://huggingface.co/datasets/ftshijt/mlsuperb_8th) using CTC loss.
+The MMS model is frozen and used as a feature extractor for a small Transformer encoder during fine-tuning, which took approximately 1 day on a single GPU.
+
+The model was trained using the [ML-SUPERB recipe](https://github.com/espnet/espnet/tree/master/egs2/ml_superb2/asr1) in ESPnet. Inference can be performed with the following script:
+
+```
+from espnet2.bin.asr_inference import Speech2Text
+
+model = Speech2Text.from_pretrained(
+    "espnet/mms_1b_mlsuperb"
+)
+
+speech, rate = soundfile.read("speech.wav")
+text, *_ = model(speech)[0]
+```
 
 ### Demo: How to use in ESPnet2
 
@@ -37,6 +51,12 @@ cd egs2/ml_superb2/asr1
 - Git hash: `18d7dea6677b7ff55a67e2be19cb748fb1c51d74`
 - Commit date: `Tue Dec 31 03:30:01 2024 +0000`
 
+## Challenge
+
+|decode_dir|Standard CER|Standard LID|Worst 15 CER|CER StD|Dialect CER|Dialect LID|
+|---|---|---|---|---|---|---|
+|decode_asr_asr_model_valid.loss.ave|23.97|73.95|71.08|25.52|53.96|32.74|
+
 ## exp/asr_train_asr_raw_char
 ### WER
 
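The Challenge table added above reports character error rate (CER) alongside language identification (LID) accuracy. As a rough reference for what those CER percentages mean, here is a minimal sketch of CER as character-level Levenshtein distance divided by reference length; the `cer` helper is illustrative only and is not the benchmark's official scoring script, which may normalize text differently.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edit distance / reference length.

    Illustrative sketch only; ML-SUPERB uses its own official scorer.
    """
    ref, hyp = list(reference), list(hypothesis)
    # Row-by-row dynamic-programming Levenshtein distance.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # 2 edits against an 11-character reference -> ~18.18% CER.
    print(round(100 * cer("hello world", "hxllo word"), 2))
```

A "Standard CER" of 23.97 in the table thus means roughly one character edit per four reference characters, averaged over the evaluation set.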