Model Description

  • Developed by: Neura company
  • Funded by: Neura
  • Model type: Whisper Base
  • Language(s) (NLP): Persian

Model Architecture

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It is pre-trained for automatic speech recognition (ASR) and speech translation.
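Because a single decoder serves both tasks, Whisper selects the language and task through special tokens placed at the start of the decoder sequence. The following is a minimal sketch of that prompt layout, not the library's implementation: the function name `build_decoder_prompt` and the two-entry language table are hypothetical, and only the token format follows Whisper's convention.

```python
# Illustrative sketch: the token strings follow Whisper's special-token format,
# but the function name and the small language table are hypothetical.
LANGUAGES = {"fa": "persian", "en": "english"}
TASKS = ("transcribe", "translate")


def build_decoder_prompt(language, task, timestamps=False):
    """Return the special tokens that open a Whisper decoder sequence."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if task not in TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Tell the decoder not to emit timestamp tokens.
        prompt.append("<|notimestamps|>")
    return prompt


print(build_decoder_prompt("fa", "transcribe"))
```

In the transformers library, `processor.get_decoder_prompt_ids(language="fa", task="transcribe")` plays the corresponding role by returning the token ids for these prompt positions.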

Uses

Check out the Google Colab demo to run NeuraSpeech ASR on a free-tier Google Colab instance: Open In Colab

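Make sure the transformers, librosa, and IPython packages are installed (torch is needed as well, since the example returns PyTorch tensors), then run: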

from IPython.display import Audio, display
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# play the input audio (optional, only useful in a notebook)
display(Audio('persian_audio.mp3', autoplay=True))

# load model and processor
processor = WhisperProcessor.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
model = WhisperForConditionalGeneration.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="fa", task="transcribe")

# load the audio as mono, resampled to the 16 kHz rate Whisper expects
sr = 16000
array, _ = librosa.load('persian_audio.mp3', sr=sr, mono=True)
input_features = processor(array, sampling_rate=sr, return_tensors="pt").input_features

# generate token ids, forcing Persian transcription
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)

Transcribed text (English: "He wanted to free the slaves"):

او خواهان آزاد کردن بردگان بود
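The example above passes the whole file to the processor. Whisper's feature extractor works on 30-second windows at 16 kHz, padding shorter clips and truncating longer ones, so a long recording has to be split before transcription. The sketch below shows the window arithmetic and a naive, non-overlapping split. The helper name `chunk_bounds` is hypothetical, and cutting at fixed boundaries can split a word in two, so treat it as a starting point only.

```python
# Constants reflect Whisper's feature-extractor defaults; the helper name is hypothetical.
SAMPLE_RATE = 16_000      # samples per second
CHUNK_SECONDS = 30        # length of one input window
HOP_LENGTH = 160          # samples between consecutive spectrogram frames

CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS      # samples in one window
FRAMES_PER_CHUNK = CHUNK_SAMPLES // HOP_LENGTH   # spectrogram frames in one window


def chunk_bounds(num_samples, chunk_samples=CHUNK_SAMPLES):
    """Return (start, end) sample indices of consecutive non-overlapping chunks."""
    return [
        (start, min(start + chunk_samples, num_samples))
        for start in range(0, num_samples, chunk_samples)
    ]


# Example: a 70-second recording is covered by three windows.
for start, end in chunk_bounds(70 * SAMPLE_RATE):
    print(f"{start / SAMPLE_RATE:.0f}s to {end / SAMPLE_RATE:.0f}s")
```

Each `(start, end)` pair can be used to slice the array returned by `librosa.load` before passing the slice to the processor.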

More Information

https://neura.info

Model Card Authors

Esmaeil Zahedi, Mohsen Yazdinejad

Model Card Contact

[email protected]

Model size: 72.6M params (Safetensors, tensor type F32)