Commit 8dd2e7c by wanchichen
Parent: 9c82f79

Update README.md

Files changed (1):
  1. README.md +21 -1
README.md CHANGED
@@ -12,7 +12,21 @@ license: cc-by-4.0
 
 ### `espnet/mms_1b_mlsuperb`
 
-This model was trained by chen26 using ml_superb2 recipe in [espnet](https://github.com/espnet/espnet/).
+This is a simple baseline for the ML-SUPERB 2.0 Challenge. It is a self-supervised [MMS 1B](https://huggingface.co/facebook/mms-1b) model fine-tuned on [142 languages of ML-SUPERB](https://huggingface.co/datasets/ftshijt/mlsuperb_8th) using CTC loss.
+During fine-tuning, the MMS model is frozen and used as a feature extractor for a small Transformer encoder; training took approximately one day on a single GPU.
+
+The model was trained using the [ML-SUPERB recipe](https://github.com/espnet/espnet/tree/master/egs2/ml_superb2/asr1) in ESPnet. Inference can be performed with the following script:
+
+```python
+import soundfile
+
+from espnet2.bin.asr_inference import Speech2Text
+
+model = Speech2Text.from_pretrained("espnet/mms_1b_mlsuperb")
+
+speech, rate = soundfile.read("speech.wav")
+text, *_ = model(speech)[0]
+```
 
 ### Demo: How to use in ESPnet2
 
@@ -37,6 +51,12 @@ cd egs2/ml_superb2/asr1
 - Git hash: `18d7dea6677b7ff55a67e2be19cb748fb1c51d74`
 - Commit date: `Tue Dec 31 03:30:01 2024 +0000`
 
+## Challenge
+
+|decode_dir|Standard CER|Standard LID|Worst 15 CER|CER StD|Dialect CER|Dialect LID|
+|---|---|---|---|---|---|---|
+|decode_asr_asr_model_valid.loss.ave|23.97|73.95|71.08|25.52|53.96|32.74|
+
 ## exp/asr_train_asr_raw_char
 ### WER
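A note on input format: MMS-style speech encoders expect 16 kHz mono audio, while `soundfile.read` returns whatever sample rate and channel count the file happens to have. The helper below is a minimal numpy-only sketch of that preprocessing step; the function name `to_model_input` and the linear-interpolation resampler are illustrative only (a real pipeline would use a proper resampler such as `librosa.resample` or `torchaudio`):

```python
import numpy as np


def to_model_input(speech: np.ndarray, rate: int, target_rate: int = 16000) -> np.ndarray:
    """Downmix to mono and linearly resample to target_rate.

    Illustrative sketch only: linear interpolation is a crude
    resampler, used here to keep the example dependency-free.
    """
    if speech.ndim == 2:
        # soundfile returns (samples, channels); average channels to mono
        speech = speech.mean(axis=1)
    if rate != target_rate:
        n_out = int(round(len(speech) * target_rate / rate))
        x_old = np.linspace(0.0, 1.0, num=len(speech), endpoint=False)
        x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        speech = np.interp(x_new, x_old, speech)
    return speech.astype(np.float32)
```

The resulting array can be passed as the `speech` argument in the inference snippet.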