Commit 8dd2e7c by wanchichen
Parent(s): 9c82f79
Update README.md
README.md CHANGED
@@ -12,7 +12,21 @@ license: cc-by-4.0
 
 ### `espnet/mms_1b_mlsuperb`
 
-This
+This is a simple baseline for the ML-SUPERB 2.0 Challenge. It is a self-supervised [MMS 1B](https://huggingface.co/facebook/mms-1b) model fine-tuned on [142 languages of ML-SUPERB](https://huggingface.co/datasets/ftshijt/mlsuperb_8th) using CTC loss.
+The MMS model is frozen and used as a feature extractor for a small Transformer encoder during fine-tuning, which took approximately 1 day on a single GPU.
+
+The model was trained using the [ML-SUPERB recipe](https://github.com/espnet/espnet/tree/master/egs2/ml_superb2/asr1) in ESPnet. Inference can be performed with the following script:
+
+```
+from espnet2.bin.asr_inference import Speech2Text
+
+model = Speech2Text.from_pretrained(
+    "espnet/mms_1b_mlsuperb"
+)
+
+speech, rate = soundfile.read("speech.wav")
+text, *_ = model(speech)[0]
+```
 
 ### Demo: How to use in ESPnet2
 
@@ -37,6 +51,12 @@ cd egs2/ml_superb2/asr1
 - Git hash: `18d7dea6677b7ff55a67e2be19cb748fb1c51d74`
 - Commit date: `Tue Dec 31 03:30:01 2024 +0000`
 
+## Challenge
+
+|decode_dir|Standard CER|Standard LID|Worst 15 CER|CER StD|Dialect CER|Dialect LID|
+|---|---|---|---|---|---|---|
+|decode_asr_asr_model_valid.loss.ave|23.97|73.95|71.08|25.52|53.96|32.74|
+
 ## exp/asr_train_asr_raw_char
 ### WER
 
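The Challenge table added above reports character error rate (CER) alongside language identification (LID) accuracy. As a rough reference for what those CER percentages mean, here is a minimal sketch of CER as character-level Levenshtein distance divided by reference length; the `cer` helper is illustrative only and is not the benchmark's official scoring script, which may normalize text differently.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edit distance / reference length.

    Illustrative sketch only; ML-SUPERB uses its own official scorer.
    """
    ref, hyp = list(reference), list(hypothesis)
    # Row-by-row dynamic-programming Levenshtein distance.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # 2 edits against an 11-character reference -> ~18.18% CER.
    print(round(100 * cer("hello world", "hxllo word"), 2))
```

A "Standard CER" of 23.97 in the table thus means roughly one character edit per four reference characters, averaged over the evaluation set.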