Hello!
I wrote this simple ASR app using a `pipeline` and it works flawlessly:
```
pip install gradio transformers
```

```python
import gradio as gr
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
gr.Interface.from_pipeline(pipe).launch()
```
But because of the pipeline, I am limited in what I can do with the output (if my assumption about how pipelines work is correct). So I tried using `gr.Interface()` so I can put my own logic inside the function:
```python
def asr(mic_input):  # edited now
    if "hello" in str(pipe(mic_input)):  # convert to string so I can search inside
        reply = "hi"
        return reply
    else:
        return str(pipe(mic_input))

gr.Interface(fn=asr,
             inputs="mic",
             outputs="text",
             ).launch()
```
What I am expecting is that if it detects a "hello" somewhere in the speech, it replies "hi" instead of transcribing the audio. However, after submitting the recorded audio from the mic, it only displays "error".
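For what it's worth, my understanding is that the ASR pipeline returns a dict like `{"text": "..."}`, and that this particular checkpoint transcribes in uppercase, so my string check is effectively doing something like this (the transcription value below is a made-up placeholder, not an actual model run):

```python
# Simulated pipeline output -- a placeholder, not an actual model run.
# My understanding is that facebook/wav2vec2-base-960h returns uppercase text.
result = {"text": "HELLO WORLD"}

# My current check stringifies the whole dict and searches inside it:
print("hello" in str(result))             # False: the transcription is uppercase
print("hello" in result["text"].lower())  # True: case-insensitive match on the text field
```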
I used the same approach of manipulating the output with other tasks such as text classification and it worked flawlessly. My guess is that there's something particular about how the pipeline processes audio input? If so, how can I manually do what the pipeline does?
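By "manually", I mean something like the following sketch using `Wav2Vec2Processor` and `Wav2Vec2ForCTC` directly. This is my untested assumption of roughly what the pipeline does under the hood; the audio loading step is left as a comment, and `audio_array` / `sample_rate` are placeholders:

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

def transcribe(audio_array, sample_rate, processor, model):
    # The checkpoint expects 16 kHz mono audio; resampling is omitted in this sketch.
    inputs = processor(audio_array, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # CTC decoding: pick the most likely token at each frame, then collapse.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]

if __name__ == "__main__":
    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
    # audio_array, sample_rate = soundfile.read(mic_input)  # e.g. via the soundfile package
    # print(transcribe(audio_array, sample_rate, processor, model))
```

Is this roughly what the pipeline is doing internally with the audio, or is there extra preprocessing I'm missing?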