Loading a sample audio file...
Hi, in your example you load audio from a dataset, but I would like to use an external file to test this fork.
How can I load a simple wav/mp3 file in the Python code you provided, instead of the dataset?
Hi,
If you just want the output, you can use the demo space ( https://huggingface.co/spaces/gigant/romanian-whisper ), which accepts either audio files or a recording from your microphone. Otherwise, if you run the code yourself, you can use torchaudio.load to load an array from a file. Just make sure that you use a sample rate of 16kHz, because that is the one used for training.
For instance you can resample using torchaudio like this:
import torchaudio.functional as F

def resample(sample, resample_rate=16000):
    # sample is a (waveform, sample_rate) pair, e.g. as returned by torchaudio.load
    waveform, sample_rate = sample
    return F.resample(waveform, sample_rate, resample_rate, lowpass_filter_width=512, rolloff=0.99)
If you are using the pipeline from transformers, you can pass the file path as is. See the code in https://huggingface.co/spaces/gigant/romanian-whisper/blob/main/app.py for an example. Basically it is:
import torch
from transformers import pipeline
device = 0 if torch.cuda.is_available() else "cpu"
MODEL_NAME = "gigant/whisper-medium-romanian"
lang = "ro"
pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,
    device=device,
)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task="transcribe")
text = pipe(file)["text"]  # "file" is the path to your audio file
Hope this helps!