metadata
library_name: transformers
license: apache-2.0
language:
  - th
pipeline_tag: automatic-speech-recognition

Monsoon-Whisper-Medium-GigaSpeech2

Monsoon-Whisper-Medium-GigaSpeech2 is a 🇹🇭 Thai Automatic Speech Recognition (ASR) model. It is based on Whisper-Medium and fine-tuned on GigaSpeech2.

The model was originally developed as a scale experiment for research on emergent capabilities in ASR tasks. It performs well on in-the-wild audio, including audio sourced from YouTube and recordings made in noisy environments.

More details can be found in our Typhoon-Audio Release Blog.

Model Description

  • Model type: Whisper Medium
  • Requirement: transformers 4.38.0 or newer
  • Primary Language(s): Thai 🇹🇭
  • License: Apache 2.0
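
The version requirement above can be checked before loading the model. The following is a minimal sketch, not part of the original card: the helper names `release_tuple` and `meets_minimum` are hypothetical, and the comparison looks only at the numeric release components, so a pre-release such as `4.38.0.dev0` is treated as `4.38.0`.

```python
from importlib.metadata import PackageNotFoundError, version
import re

MIN_TRANSFORMERS = "4.38.0"  # minimum version stated in the list above


def release_tuple(version_string):
    """Return the leading numeric release components, e.g. '4.38.0.dev0' -> (4, 38, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """True if `installed` is at least `minimum`, comparing release numbers only."""
    a, b = release_tuple(installed), release_tuple(minimum)
    # Pad the shorter tuple with zeros so that '4.38' compares equal to '4.38.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else f"too old, need >= {MIN_TRANSFORMERS}"
        print(f"transformers {installed}: {status}")
```

If the `packaging` library is available, its `packaging.version.Version` class handles pre-release ordering more rigorously than this sketch.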

Usage Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio
import torch

model_path = "scb10x/monsoon-whisper-medium-gigaspeech2"
device = "cuda"
filepath = 'audio.wav'

processor = WhisperProcessor.from_pretrained(model_path)
model = WhisperForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16
)
model.to(device)
model.eval()

model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="th", task="transcribe"
)
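# torchaudio returns a (channels, samples) tensor at the file's native rate;
# Whisper's feature extractor expects mono audio sampled at 16 kHz.
array, sr = torchaudio.load(filepath)
array = array.mean(dim=0)  # downmix to mono
if sr != 16000:
    array = torchaudio.functional.resample(array, orig_freq=sr, new_freq=16000)
    sr = 16000
input_features = (
    processor(array.numpy(), sampling_rate=sr, return_tensors="pt")
    .to(device)
    .to(torch.bfloat16)
    .input_features
)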
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
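
Whisper models operate on windows of up to 30 seconds, and the feature extractor pads or truncates its input to that length by default, so the snippet above covers only the start of a longer recording. One simple workaround is to split the waveform into consecutive 30-second chunks and transcribe each in turn. The sketch below only computes the chunk boundaries; the function name `chunk_bounds` is hypothetical and not part of the original card, and it assumes mono audio already resampled to 16 kHz.

```python
SAMPLE_RATE = 16_000   # Whisper's expected sampling rate (Hz)
WINDOW_SECONDS = 30    # length of one Whisper input window


def chunk_bounds(num_samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Return (start, end) sample indices of consecutive, non-overlapping windows."""
    if num_samples < 0 or sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and window_seconds must be > 0")
    window = sample_rate * window_seconds
    return [(start, min(start + window, num_samples)) for start in range(0, num_samples, window)]


# A 70-second mono clip at 16 kHz is split into 30 s + 30 s + 10 s.
bounds = chunk_bounds(70 * SAMPLE_RATE)
print(bounds)  # [(0, 480000), (480000, 960000), (960000, 1120000)]
```

Each slice `array[start:end]` can then go through the processor and `model.generate` as in the usage example, with the resulting texts concatenated. Hard cuts can split a word across two chunks, so treat this as a rough baseline rather than a tuned long-form pipeline.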

Evaluation Results

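WER and CER are percentages (lower is better). GS2 denotes GigaSpeech2 and CV17 denotes Common Voice 17.

| Model | WER (GS2) | WER (CV17) | CER (GS2) | CER (CV17) |
|---|---|---|---|---|
| whisper-large-v3 | 37.02 | 22.63 | 24.03 | 8.49 |
| whisper-medium | 55.64 | 43.01 | 37.55 | 16.41 |
| biodatlab-whisper-th-medium-combined | 31.00 | 14.25 | 21.20 | 5.69 |
| biodatlab-whisper-th-large-v3-combined | 29.02 | 15.72 | 19.96 | 6.32 |
| monsoon-whisper-medium-gigaspeech2 | 22.74 | 20.79 | 14.15 | 6.92 |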
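
For readers unfamiliar with the metrics, WER and CER are both the edit distance between reference and hypothesis divided by the reference length, counted over words and characters respectively. The sketch below illustrates that definition only; it is not the evaluation script behind the numbers above. Thai is written without spaces between words, so WER depends on the word segmenter used. The card does not say which one was applied, so this example assumes the text has already been split into word lists.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance divided by reference length, in percent."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def cer(ref_text, hyp_text):
    """Character error rate over the raw strings (one unit per code point)."""
    return error_rate(list(ref_text), list(hyp_text))


def wer(ref_words, hyp_words):
    """Word error rate over pre-tokenized word lists (segmentation is assumed)."""
    return error_rate(ref_words, hyp_words)


ref_words = ["ฉัน", "ชอบ", "กิน", "ข้าว"]
hyp_words = ["ฉัน", "ชอบ", "ข้าว"]
print(wer(ref_words, hyp_words))  # 25.0: one deleted word out of four
```

Because CER does not depend on word segmentation, it is often the easier number to compare across Thai ASR systems that were evaluated with different tokenizers.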

Intended Uses & Limitations

This model is experimental and may not always be accurate. Developers should carefully assess potential risks in the context of their specific applications.

Follow us & Support

Typhoon Team

Kunat Pipatanakul, Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Parinthapat Pengpun, Krisanapong Jirayoot, Pathomporn Chokchainant, Kasima Tharnpipitchai