Yingxu He committed (verified)
Commit 5392f0a · Parent: de883d6

Update README.md

Files changed (1): README.md (+40 −1)
@@ -41,7 +41,46 @@ This is the model card of a 🤗 transformers model that has been pushed on the
 
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
-[More Information Needed]
+```python
+from datasets import load_dataset
+from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
+
+repo_id = "MERaLiON/AudioLLM"
+
+processor = AutoProcessor.from_pretrained(
+    repo_id,
+    trust_remote_code=True,
+)
+model = AutoModelForSpeechSeq2Seq.from_pretrained(
+    repo_id,
+    use_safetensors=True,
+    trust_remote_code=True,
+)
+
+prompt = "Can you please turn this audio into text format?"
+conversation = [
+    {
+        "role": "user",
+        "content": f"Given the following audio context: <SpeechHere>\n\nText instruction: {prompt}"
+    }
+]
+chat_prompt = processor.tokenizer.apply_chat_template(
+    conversation=conversation,
+    tokenize=False,
+    add_generation_prompt=True
+)
+
+libri_data = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
+audio_array = libri_data[0]["audio"]["array"]
+
+inputs = processor(text=chat_prompt, audios=audio_array, time_duration_limit=20)
+
+outputs = model.generate(**inputs, max_new_tokens=128)
+
+print(processor.decode(outputs[0, inputs['input_ids'].size(1):], skip_special_tokens=True))
+```
 
 ### Downstream Use [optional]
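
The `time_duration_limit=20` argument in the added snippet suggests the processor caps input audio at 20 seconds. A minimal sketch of what such a truncation amounts to, assuming a 16 kHz sampling rate (an assumption; the model's actual rate is defined by its feature extractor config, not stated here):

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed; check the processor's feature extractor for the real value


def truncate_audio(audio: np.ndarray, time_duration_limit: float) -> np.ndarray:
    """Keep at most `time_duration_limit` seconds of a mono waveform."""
    max_samples = int(time_duration_limit * SAMPLE_RATE)
    return audio[:max_samples]


audio = np.zeros(30 * SAMPLE_RATE)  # 30 s of silence
clipped = truncate_audio(audio, 20)
print(len(clipped) / SAMPLE_RATE)  # → 20.0
```

Audio shorter than the limit passes through unchanged, since slicing past the end of an array is a no-op in NumPy.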