Language Problem

#20
by canu - opened

Hello everyone, we have started using the Whisper API and are currently testing it with various languages. We have run into a problem with the returned data.

When we speak English, French or Spanish, the returned text comes back in English, French or Spanish, as it should. But when we speak Turkish, the returned text is in English. This only happens when we send a request to the API.

However, when we run the model locally we do not see this problem. Unfortunately, we couldn't find anything relevant to it in the documentation.

Does anyone have an idea what the cause of this is?

Hey @canu ! I've had a quick scan through this blog post: https://www.philschmid.de/whisper-inference-endpoints

It mentions the following:

Note: By default, Inference Endpoint will use “English” as the language for transcription, if you want to use Whisper for non-English speech recognition you would need to create a custom handler and adjust decoder prompt.

Would adding a custom handler and changing the decoder prompt ids work here?

Thank you very much @sanchit-gandhi ...

The link was very helpful as a guideline. We are actually planning to run this locally and set the decoder prompt ourselves. Yes, it would definitely work that way, although we may not end up using Inference Endpoints.

Awesome, glad to hear it! Let me know if you experience any difficulties - happy to help here!

@sanchit-gandhi I have the same question. I am new to Hugging Face, pipelines and handlers, so I have no idea how to create a custom handler for this Whisper Inference Endpoint or where to adjust the decoder prompt. Could you point me in the right direction? Can I make the language a variable that I pass when calling my endpoint, or will I only be able to hardcode it?

Thanks a lot for your help!

Hey @bonen ! There's a guide here that might help: https://huggingface.co/docs/inference-endpoints/guides/custom_handler

We need to set the correct forced ids in the config and generation config. The code for this looks as follows:

self.pipeline.model.config.forced_decoder_ids = self.pipeline.tokenizer.get_decoder_prompt_ids(language="Spanish", task="transcribe")
self.pipeline.model.generation_config.forced_decoder_ids = self.pipeline.model.config.forced_decoder_ids  # just to be sure!

Maybe you can add this to the __init__ method?
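For anyone wondering how that might look in a handler, here is a minimal sketch. The helper name `force_language` is made up for illustration, and it assumes `pipe` is a Whisper automatic-speech-recognition pipeline whose tokenizer provides `get_decoder_prompt_ids`. It simply sets the same two attributes as the lines above.

```python
def force_language(pipe, language: str, task: str = "transcribe") -> None:
    """Pin the language and task that Whisper decodes with.

    Hypothetical helper: `pipe` is assumed to be an ASR pipeline wrapping a
    Whisper checkpoint, so `pipe.tokenizer` is a Whisper tokenizer.
    """
    # Build the forced decoder prompt ids for the chosen language and task.
    forced_ids = pipe.tokenizer.get_decoder_prompt_ids(language=language, task=task)

    # Set them on both configs, mirroring the two lines in the post above.
    pipe.model.config.forced_decoder_ids = forced_ids
    pipe.model.generation_config.forced_decoder_ids = forced_ids
```

In a custom handler this could be called once at the end of `__init__`, right after the pipeline is created, for example `force_language(self.pipeline, "spanish")`. Note that this hardcodes one language for every request.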

Has anyone managed to get translation working? I am currently using the pipeline method and am finding that, regardless of task="translate" and the language I define, I always get a transcript of the file in the language spoken in the file.

My exact code for a Portuguese file is the following:

In the __init__ method

        self.pipe = pipeline(
            task="automatic-speech-recognition",
            model=path,
            chunk_length_s=30,
            device=device,
        )

        self.pipe.model.config.forced_decoder_ids = self.pipe.tokenizer.get_decoder_prompt_ids(language="portuguese", task="translate")
        self.pipe.model.generation_config.forced_decoder_ids = self.pipe.model.config.forced_decoder_ids

then in the __call__ method

        inputs = data.pop("inputs", data)
        prediction = self.pipe(inputs, return_timestamps=True)
        return prediction

This pretty much matches the samples and the code suggested in this thread, yet I have not succeeded in getting an actual translation. The code only produces a transcript.

Hey @lesliejd ! Updating to the latest version should fix these issues and make this code work hitch-free:

pip install --upgrade transformers

There's also the option of passing the task/language to the pipe at inference time. If you know the language a priori, you can pass it as follows:

pipe(audio, return_timestamps=True, generate_kwargs={"language": "french"})

Likewise, you can specify the task as translate/transcribe:

pipe(audio, return_timestamps=True, generate_kwargs={"language": "french", "task": "transcribe"})

Let us know if you have any other questions, more than happy to help!

Hi! Thanks for all your responses until now.

As I am a beginner, I still don't understand how I can pass the language parameter when calling the API with curl for transcription.
What I would like to do is read the language in the __call__ method and pass it to:
pipe(audio, return_timestamps=True, generate_kwargs={"language": language, "task": "transcribe"})
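
One way this could look, as a rough sketch rather than a confirmed recipe: the handler reads the language from the request body on every call instead of hardcoding it. Several things here are assumptions and not from this thread: that the client sends JSON with the audio base64-encoded under "inputs", that the optional settings sit under a "parameters" key, and the extra `pipe` argument in `__init__`, which is hypothetical and only there so the class can be tried with a stand-in pipeline.

```python
import base64
from typing import Any, Callable, Dict, Optional


class EndpointHandler:
    """Sketch: choose the Whisper language per request instead of hardcoding it."""

    def __init__(self, path: str = "", pipe: Optional[Callable[..., Any]] = None):
        if pipe is None:
            # Imported lazily so the class can be exercised with a stand-in pipe.
            from transformers import pipeline

            pipe = pipeline(
                task="automatic-speech-recognition",
                model=path,
                chunk_length_s=30,
            )
        self.pipe = pipe

    def __call__(self, data: Dict[str, Any]) -> Any:
        inputs = data.pop("inputs", data)
        # Assumption: optional settings arrive under a "parameters" key.
        params = data.pop("parameters", None) or {}

        # Assumption: audio sent inside a JSON body is base64-encoded text.
        if isinstance(inputs, str):
            inputs = base64.b64decode(inputs)

        generate_kwargs = {"task": params.get("task", "transcribe")}
        language = params.get("language")
        if language:
            # Without a language, the pipeline's default behaviour is kept.
            generate_kwargs["language"] = language

        return self.pipe(inputs, return_timestamps=True, generate_kwargs=generate_kwargs)
```

Under those assumptions, the request body would be JSON along the lines of {"inputs": "<base64 audio>", "parameters": {"language": "turkish"}}, sent with a Content-Type of application/json (with curl, via -H and -d). Passing the language through generate_kwargs relies on a recent transformers version, as noted earlier in the thread.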
