I tried code-switching and it seems to work perfectly.
Input audio:
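I think I am trying to speak a three kind of language simultaneously.我又講中文、又講英文。Sometimes I can speak 日本語。日本語は話せできます。おはようございます。さようなら。
(English gloss of the non-English parts: 我又講中文、又講英文 = "I'm speaking both Chinese and English"; 日本語 = "Japanese"; 日本語は話せできます = "I can speak Japanese"; おはようございます = "Good morning"; さようなら = "Goodbye".)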
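One rough way to check that a transcription like this one really switches languages, instead of being forced into a single language, is to tag each character by Unicode script. This is a minimal sketch using only Python's standard `unicodedata` module; `script_of` and `scripts_present` are hypothetical helpers, not part of any transcription tool. Han characters are shared by Chinese and Japanese, so only kana reliably marks the Japanese parts.

```python
import unicodedata
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Coarse Unicode-script tag for one character (None for spaces/punctuation)."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith(("HIRAGANA", "KATAKANA")):
        return "Kana"   # Japanese-only syllabaries
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"    # shared by Chinese and Japanese
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"


def scripts_present(text: str) -> set[str]:
    """Set of coarse scripts that occur anywhere in a transcription."""
    return {tag for tag in map(script_of, text) if tag is not None}


sample = "我又講中文、又講英文。Sometimes I can speak 日本語。おはようございます。"
print(sorted(scripts_present(sample)))  # ['Han', 'Kana', 'Latin']
```

A result containing all three tags suggests the model did not collapse the clip into one language; a single tag would point to the one-language-per-clip behaviour discussed below.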
How did you bypass the single-language detection? When I use it, it seems to select a single language for each audio clip.
This is the expected behaviour! Not sure how @ryL got output in multiple languages.
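A toy sketch of why one language per clip is expected, assuming the behaviour of the openai-whisper package's `transcribe()`: as far as I know, when no `language` is passed it detects the language once from roughly the first 30 seconds (taking the most probable language) and then decodes the whole file with it. The probabilities and helper names below are hypothetical and no model is involved; the sketch only contrasts deciding once per clip with deciding per window.

```python
# Hypothetical per-window language probabilities for one clip that moves from
# English to Chinese to Japanese. Numbers are invented for illustration only.
windows = [
    {"en": 0.7, "zh": 0.2, "ja": 0.1},
    {"en": 0.1, "zh": 0.8, "ja": 0.1},
    {"en": 0.1, "zh": 0.2, "ja": 0.7},
]


def pick_language(probs: dict[str, float]) -> str:
    """Return the most probable language code for one window."""
    return max(probs, key=probs.get)


def detect_once(windows: list[dict[str, float]]) -> list[str]:
    """Per-clip detection: decide from the first window, reuse it everywhere."""
    lang = pick_language(windows[0])
    return [lang] * len(windows)


def detect_per_window(windows: list[dict[str, float]]) -> list[str]:
    """Per-window detection: decide independently for every window."""
    return [pick_language(p) for p in windows]


print(detect_once(windows))        # ['en', 'en', 'en']
print(detect_per_window(windows))  # ['en', 'zh', 'ja']
```

Even per-window detection would not capture a sentence like the one in the first post, which switches languages within a few seconds, so a single language tag per clip is what one would expect by default.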
@ryL, could you please share your code?
How did you get multiple languages in your transcriptions?
For anyone stumbling on this: I got it working by adding a prompt:
You are a professional transcriber, fluent in language1 and language2.
You are listening to a recording in which a person is potentially speaking both language1 and language2, and no other languages.
They may be speaking only one of these languages. They may have a strong accent.
You are to transcribe utterances of each language accordingly.
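A small sketch that fills the `language1`/`language2` placeholders of the prompt above. The thread does not say which tool the prompt was passed to; the commented lines assume the openai-whisper package, whose `transcribe()` accepts an `initial_prompt` argument. As I understand it, Whisper treats that text as preceding transcript context rather than as instructions, so any effect may come from priming rather than from the wording. `build_code_switch_prompt` is a hypothetical helper.

```python
# Prompt text copied from the comment above, with the placeholders parameterised.
PROMPT_TEMPLATE = (
    "You are a professional transcriber, fluent in {lang1} and {lang2}. "
    "You are listening to a recording in which a person is potentially speaking "
    "both {lang1} and {lang2}, and no other languages. "
    "They may be speaking only one of these languages. They may have a strong accent. "
    "You are to transcribe utterances of each language accordingly."
)


def build_code_switch_prompt(lang1: str, lang2: str) -> str:
    """Fill in the two language names (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(lang1=lang1, lang2=lang2)


prompt = build_code_switch_prompt("English", "Mandarin Chinese")
print(prompt)

# Assumed usage with openai-whisper (not confirmed by the thread):
#   import whisper
#   model = whisper.load_model("medium")
#   result = model.transcribe("clip.wav", initial_prompt=prompt)
#   print(result["text"])
```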
Using whisperx directly, I got the same transcription, almost perfect.