Heero-STT-Model

This model is a fine-tuned version of openai/whisper-small on the screevoai/code-switch dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0895
  • Wer: 4.4468
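
"Wer" here is the word error rate (WER): the number of word-level substitutions, deletions and insertions needed to turn the model's transcription into the reference, divided by the number of words in the reference. The card does not say which tool computed the figure or whether 4.4468 is a percentage, so the snippet below is only a minimal sketch of the standard definition. The function name `word_error_rate` is illustrative and is not part of this model's code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"):
# 2 errors over 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Multiplying the returned fraction by 100 gives a percentage, which is how WER is often reported.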

Training results

Training Loss   Epoch   Step   Validation Loss   Wer
0.0345          3       1250   0.0895            4.4468

Libraries to Install

  • pip install transformers datasets safetensors librosa huggingface-hub torch requests

Authentication needed before running the script

Access to this repository requires accepting its conditions on the model page. After accepting them, log in with one of the following, depending on your environment:

  • Terminal: huggingface-cli login

  • Jupyter notebook:

    >>> from huggingface_hub import notebook_login
    >>> notebook_login()

NOTE: When prompted, paste a token from your Hugging Face account (Settings > Access Tokens). Create a new token there or copy an existing one.
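
For non-interactive environments such as scheduled jobs or CI, a token can also be passed programmatically. The snippet below is a minimal sketch, assuming the token has been stored in an environment variable. The helper name `login_from_env` is hypothetical, and `login` is the function exported by `huggingface_hub`.

```python
import os


def login_from_env(var_name: str = "HF_TOKEN") -> bool:
    """Log in to the Hugging Face Hub using a token read from the environment.

    Returns True if a token was found and passed to the Hub client,
    False if the environment variable is unset or empty.
    """
    token = os.environ.get(var_name)
    if not token:
        return False

    # Imported here so the helper can be defined even where
    # huggingface_hub is not installed yet.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` before loading the model has the same effect as the interactive commands above, provided the variable holds a valid token.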

Script

>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> from datasets import load_dataset
>>> import librosa
>>> import requests
>>> from io import BytesIO

>>> # Load model and processor
>>> processor = WhisperProcessor.from_pretrained("screevoai/heero-small-v1")
>>> model = WhisperForConditionalGeneration.from_pretrained("screevoai/heero-small-v1")
>>> model.config.forced_decoder_ids = None

>>> # Load the dataset
>>> ds = load_dataset("screevoai/code-switch", split="test")
>>> sample_url = ds[2]["audio_file_path"]  # change the row number for testing different audio files

>>> # Download the audio file
>>> response = requests.get(sample_url)
>>> response.raise_for_status()  # fail early if the download did not succeed
>>> audio_file_data = BytesIO(response.content)

>>> # Load the audio at its native rate, then resample to 16 kHz (the rate Whisper expects)
>>> audio, sr = librosa.load(audio_file_data, sr=None)
>>> audio_resampled = librosa.resample(audio, orig_sr=sr, target_sr=16000)

>>> # Convert the waveform into the input features the model consumes
>>> processed_audio = processor(audio_resampled, sampling_rate=16000, return_tensors="pt")
>>> input_features = processed_audio['input_features']

>>> # Generate predictions using the model
>>> output_ids = model.generate(input_features, max_new_tokens=400)
>>> transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]

>>> print(transcription)
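
Whisper models operate on 30-second windows of audio, so the script above is best suited to short clips. For longer recordings, one simple option is to split the resampled waveform into consecutive windows and transcribe each one in turn. The helper below is a hedged sketch of that splitting step only. The name `split_into_windows` and its parameters are illustrative and are not part of this model's code.

```python
def split_into_windows(samples, sampling_rate=16000, window_seconds=30.0):
    """Split a 1-D audio sequence into consecutive fixed-length windows.

    Works on any sequence that supports len() and slicing, including
    the NumPy arrays returned by librosa. The last window may be shorter.
    """
    window = int(sampling_rate * window_seconds)
    if window <= 0:
        raise ValueError("window length must be at least one sample")
    return [samples[start:start + window]
            for start in range(0, len(samples), window)]


# 65 seconds of silence at 16 kHz splits into 30 s + 30 s + 5 s.
demo_chunks = split_into_windows([0.0] * (16000 * 65))
```

Each window can then be passed through `processor` and `model.generate` as in the script, and the resulting transcriptions joined. Fixed-length splitting can cut a word at a window boundary, so treat this as a starting point rather than a complete long-form solution.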

Model size: 242M params (Safetensors, tensor type F32)