---
library_name: transformers
license: apache-2.0
language:
- th
pipeline_tag: automatic-speech-recognition
---
Monsoon-Whisper-Medium-GigaSpeech2
Monsoon-Whisper-Medium-GigaSpeech2 is a 🇹🇭 Thai Automatic Speech Recognition (ASR) model. It is based on Whisper-Medium and fine-tuned on GigaSpeech2.
It was originally developed as a scale experiment for research on emergent capabilities in ASR tasks. It performs well in the wild, including on audio sourced from YouTube and in noisy environments.
More details can be found in our Typhoon-Audio Release Blog.
Model Description
- Model type: Whisper Medium.
- Requirement: transformers 4.38.0 or newer.
- Primary Language(s): Thai 🇹🇭
- License: Apache 2.0
Usage Example
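```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio
import torch

model_path = "scb10x/monsoon-whisper-medium-gigaspeech2"
device = "cuda"
filepath = 'audio.wav'

processor = WhisperProcessor.from_pretrained(model_path)
model = WhisperForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16
)
model.to(device)
model.eval()

# Force Thai transcription (no language detection, no translation)
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="th", task="transcribe"
)

# Whisper's feature extractor only accepts 16 kHz audio:
# downmix to mono and resample if the file uses another rate
array, sr = torchaudio.load(filepath)  # shape: (channels, samples)
array = array.mean(dim=0)  # mono, shape: (samples,)
if sr != 16000:
    array = torchaudio.functional.resample(array, orig_freq=sr, new_freq=16000)

input_features = (
    processor(array.numpy(), sampling_rate=16000, return_tensors="pt")
    .to(device)
    .to(torch.bfloat16)  # match the model's dtype
    .input_features
)

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```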
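Whisper's feature extractor pads or truncates every input to 30 seconds, so the usage example transcribes only the first 30 seconds of a longer file. A minimal sketch, assuming the audio is already a 1-D 16 kHz NumPy array, splits it into consecutive 30-second pieces that can each be passed through the same `processor(...)` and `model.generate(...)` calls. `split_into_chunks` is a hypothetical helper, not part of this model card or of `transformers`.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper's expected input sampling rate
CHUNK_SECONDS = 30    # Whisper's fixed input window

def split_into_chunks(waveform, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Hypothetical helper: split a 1-D waveform into consecutive,
    non-overlapping chunks of at most `chunk_seconds` each."""
    chunk_len = sample_rate * chunk_seconds
    return [waveform[start:start + chunk_len]
            for start in range(0, len(waveform), chunk_len)]

# Example: 70 s of silence becomes chunks of 30 s, 30 s and 10 s
chunks = split_into_chunks(np.zeros(70 * SAMPLE_RATE, dtype=np.float32))
print([len(c) / SAMPLE_RATE for c in chunks])
```

Non-overlapping cuts can split a word at a chunk boundary. The `transformers` ASR `pipeline` with `chunk_length_s` uses overlapping strides instead, which is usually the more robust choice for long recordings.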
Evaluation Results
Model | WER % (GigaSpeech2) | WER % (Common Voice 17) | CER % (GigaSpeech2) | CER % (Common Voice 17) |
---|---|---|---|---|
whisper-large-v3 | 37.02 | 22.63 | 24.03 | 8.49 |
whisper-medium | 55.64 | 43.01 | 37.55 | 16.41 |
biodatlab-whisper-th-medium-combined | 31.00 | 14.25 | 21.20 | 5.69 |
biodatlab-whisper-th-large-v3-combined | 29.02 | 15.72 | 19.96 | 6.32 |
monsoon-whisper-medium-gigaspeech2 | 22.74 | 20.79 | 14.15 | 6.92 |
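The table reports word error rate (WER) and character error rate (CER): the edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words or characters and expressed as a percentage. The following is a minimal plain-Python sketch of corpus-level WER/CER. The function names are illustrative, and the tokenization choices (whitespace splitting for words, dropping spaces for characters) are assumptions, because the card does not state how the reported numbers were computed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            )
        prev = curr
    return prev[-1]

def error_rate(refs, hyps, tokenize):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return 100.0 * edits / total

def cer(refs, hyps):
    # Assumption: spaces are ignored when counting characters
    return error_rate(refs, hyps, lambda s: list(s.replace(" ", "")))

def wer(refs, hyps, word_tokenize=str.split):
    # Assumption: whitespace splitting; Thai normally needs a word segmenter
    return error_rate(refs, hyps, word_tokenize)

refs = ["สวัสดี ครับ"]
hyps = ["สวัสดี คับ"]
print(wer(refs, hyps), cer(refs, hyps))
```

Thai is written without spaces between words, so a real Thai WER needs a word segmenter (for example PyThaiNLP's `word_tokenize`) passed as `word_tokenize`. Different segmenters give different WER values, which is worth keeping in mind when comparing WER figures from different sources.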
Intended Uses & Limitations
This model is experimental and may not always be accurate. Developers should carefully assess potential risks in the context of their specific applications.
Typhoon Team
Kunat Pipatanakul, Potsawee Manakul, Sittipong Sripaisarnmongkol, Natapong Nitarach, Warit Sirichotedumrong, Adisai Na-Thalang, Phatrasek Jirabovonvisut, Parinthapat Pengpun, Krisanapong Jirayoot, Pathomporn Chokchainant, Kasima Tharnpipitchai