---
license: apache-2.0
language:
- th
base_model: biodatlab/whisper-th-large-combined
tags:
- whisper
- Pytorch
---

# Whisper-th-large-ct2

whisper-th-large-ct2 is a CTranslate2 conversion of [biodatlab/whisper-th-large-combined](https://huggingface.co/biodatlab/whisper-th-large-combined) and is compatible with [WhisperX](https://github.com/m-bain/whisperX) and [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Used with these tools, it enables:

- 🤏 **Half the size** of the original Hugging Face format.
- ⚡️ Batched inference for **70x** real-time transcription (WhisperX's reported figure for Whisper large-v2).
- 🪶 A faster-whisper backend, requiring **<8GB GPU memory** with beam_size=5 (also WhisperX's large-v2 figure).
- 🎯 Accurate word-level timestamps using wav2vec2 alignment.
- 👯‍♂️ Multispeaker ASR using speaker diarization (includes speaker ID labels).
- 🗣️ VAD preprocessing, reducing hallucinations and allowing batching with no WER degradation.

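
The **70x** figure is a real-time factor: seconds of audio processed per second of wall-clock time. A minimal sketch for checking it on your own hardware, assuming the audio is the 16 kHz mono sample array returned by `whisperx.load_audio`; the helper name `speed_factor` and the numbers in the example are illustrative only.

```python
SAMPLE_RATE = 16_000  # assumption: whisperx.load_audio resamples audio to 16 kHz mono


def speed_factor(num_samples: int, elapsed_seconds: float, sample_rate: int = SAMPLE_RATE) -> float:
    """Return seconds of audio processed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / elapsed_seconds


# Illustrative numbers: 10 minutes of 16 kHz audio transcribed in 8 seconds.
print(f"{speed_factor(9_600_000, 8.0):.1f}x real-time")  # prints "75.0x real-time"
```

With the usage code below, `speed_factor(len(audio), elapsed)` would give the figure for a measured `elapsed` time.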
### Usage

```python
# Install WhisperX first. The leading "!" is Jupyter/Colab syntax; drop it in a terminal.
# !pip install git+https://github.com/m-bain/whisperx.git

import time

import whisperx

# Settings
device = "cuda"           # GPU used for transcription and diarization
audio_file = "audio.mp3"  # path to the audio file to transcribe
batch_size = 16           # reduce if you run low on GPU memory
compute_type = "float16"  # "int8" lowers GPU memory use but may reduce accuracy

# A Hugging Face access token is required for the pyannote diarization model.
# You must also accept the model's terms and conditions on its page before use:
# https://huggingface.co/pyannote/segmentation-3.0
HF_TOKEN = ""  # paste your Hugging Face token here

# Load the model and transcribe
model = whisperx.load_model("Thaweewat/whisper-th-large-ct2", device, compute_type=compute_type)

st_time = time.time()
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)

# Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# Combine the plain text of all segments, if needed
combined_text = " ".join(segment["text"].strip() for segment in result["segments"])

print(f"Elapsed time: {time.time() - st_time:.2f} seconds")
print(diarize_segments)
print(result)
print(combined_text)
```
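
After `whisperx.assign_word_speakers`, each entry in `result["segments"]` normally carries a `"speaker"` key, so the plain `combined_text` can also be split into speaker turns. A minimal sketch, assuming that segment layout and that segments with no matching diarization turn may lack the key; the helper `speaker_transcript` and the sample segments are hypothetical, not part of WhisperX.

```python
from itertools import groupby
from typing import Any


def speaker_transcript(segments: list[dict[str, Any]], unknown: str = "UNKNOWN") -> str:
    """Merge consecutive segments from the same speaker into "SPEAKER: text" lines.

    Assumes each segment dict has a "text" key and, usually, a "speaker" key;
    segments without a speaker are labelled with `unknown`.
    """
    lines = []
    # groupby only merges *consecutive* segments, so speaker turns stay in order.
    for speaker, group in groupby(segments, key=lambda seg: seg.get("speaker", unknown)):
        text = " ".join(seg["text"].strip() for seg in group)
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


# Hypothetical segments shaped like result["segments"]:
segments = [
    {"start": 0.0, "end": 2.1, "text": " สวัสดีครับ", "speaker": "SPEAKER_00"},
    {"start": 2.1, "end": 4.0, "text": " วันนี้อากาศดี", "speaker": "SPEAKER_00"},
    {"start": 4.2, "end": 5.0, "text": " ใช่ค่ะ", "speaker": "SPEAKER_01"},
]
print(speaker_transcript(segments))
# SPEAKER_00: สวัสดีครับ วันนี้อากาศดี
# SPEAKER_01: ใช่ค่ะ
```

In the usage code above, this would be called as `speaker_transcript(result["segments"])`.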