japanese-asr/en-cascaded-s2t-translation


asahi417 committed on Sep 27, 2024

Commit d85edee · verified · 1 Parent(s): 02bcb59

Update README.md

Files changed (1)

README.md +11 -1


@@ -8,6 +8,10 @@ tags:
 # Cascaded English Speech2Text Translation
 This is a pipeline for speech-to-text translation from English speech to text in any target language. It uses a cascaded approach that consists of ASR followed by translation.
 ## Usage
 Here is an example that translates English speech into Japanese text.
@@ -23,7 +27,7 @@ from transformers import pipeline
 # load model
 pipe = pipeline(
     model="japanese-asr/en-cascaded-s2t-translation",
-    model_translation="facebook/nllb-200-distilled-600M",
     tgt_lang="jpn_Jpan",
     model_kwargs={"attn_implementation": "sdpa"},
     chunk_length_s=15,
@@ -34,3 +38,9 @@ pipe = pipeline(
 output = pipe("./sample.wav")
 ```

 # Cascaded English Speech2Text Translation
 This is a pipeline for speech-to-text translation from English speech to text in any target language. It uses a cascaded approach that consists of ASR followed by translation.
+The pipeline employs [distil-whisper/distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3) for ASR (English speech -> English text)
+and [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) for text translation.
+The input must be English speech, while the translation can be in any language NLLB was trained on. All available languages and their language codes are listed
+[here](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200).
 ## Usage
 Here is an example that translates English speech into Japanese text.
 # load model
 pipe = pipeline(
     model="japanese-asr/en-cascaded-s2t-translation",
+    model_translation="facebook/nllb-200-3.3B",
     tgt_lang="jpn_Jpan",
     model_kwargs={"attn_implementation": "sdpa"},
     chunk_length_s=15,
 output = pipe("./sample.wav")
 ```
+Any of the following NLLB models can be used by setting `model_translation`:
+- [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B)
+- [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)
+- [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)
+- [facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B)
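
The options added in this commit (a choice of NLLB checkpoint and a FLORES-200 target language code) can be sketched as a small helper that assembles the pipeline arguments before any model is loaded. This is a minimal sketch, not part of the README: `build_pipeline_kwargs` and `load_pipeline` are hypothetical names, the language-code check only verifies the `xxx_Xxxx` shape rather than actual NLLB support, and `trust_remote_code=True` is an assumption (custom Hub pipelines generally need it, but that line is not visible in this diff).

```python
import re

# NLLB checkpoints listed in the README change above.
NLLB_MODELS = (
    "facebook/nllb-200-3.3B",
    "facebook/nllb-200-distilled-600M",
    "facebook/nllb-200-distilled-1.3B",
    "facebook/nllb-200-1.3B",
)

# FLORES-200 codes such as "jpn_Jpan" pair a three-letter language code
# with a four-letter script code. This checks the shape only.
_FLORES_CODE = re.compile(r"^[a-z]{3}_[A-Z][a-z]{3}$")


def build_pipeline_kwargs(model_translation="facebook/nllb-200-3.3B",
                          tgt_lang="jpn_Jpan"):
    """Hypothetical helper: validate the options and return pipeline kwargs."""
    if model_translation not in NLLB_MODELS:
        raise ValueError(f"unlisted translation model: {model_translation}")
    if not _FLORES_CODE.match(tgt_lang):
        raise ValueError(f"not a FLORES-200 style code: {tgt_lang}")
    return {
        "model": "japanese-asr/en-cascaded-s2t-translation",
        "model_translation": model_translation,
        "tgt_lang": tgt_lang,
        "model_kwargs": {"attn_implementation": "sdpa"},
        "chunk_length_s": 15,
        "trust_remote_code": True,  # assumption: not shown in this diff
    }


def load_pipeline(**options):
    """Hypothetical helper: build the pipeline (downloads the models)."""
    from transformers import pipeline  # imported lazily; heavy dependency
    return pipeline(**build_pipeline_kwargs(**options))
```

For example, `load_pipeline(model_translation="facebook/nllb-200-distilled-600M", tgt_lang="fra_Latn")` would request the smaller distilled checkpoint with French output, and the result can be called on an audio file as in the README (`pipe("./sample.wav")`). Validating the arguments up front keeps a mistyped language code from surfacing only after the multi-gigabyte models have been downloaded.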