metadata
license: mit
language:
  - de
metrics:
  - wer
pipeline_tag: automatic-speech-recognition

Model Description

Whisper-tiny fine-tuned on the SwissDial-ZH dataset for automatic speech recognition of Swiss German dialects.

Model Details

Training

  • Duration: 4 hours
  • Hardware: NVIDIA RTX 3080
  • Batch Size: 32
  • Train/Test Split: 90%/10% (specific sentence selection)
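
The card does not describe how sentences were selected for the split. As an illustration only, the sketch below shows one possible way to get an approximately 90%/10% split at the sentence level, so that every recording of a given sentence lands on the same side. The function name `assign_split`, the `sentence_id` argument, and the hashing scheme are all hypothetical and are not taken from the original training code.

```python
import hashlib


def assign_split(sentence_id: str, test_fraction: float = 0.10) -> str:
    """Deterministically map a sentence ID to 'train' or 'test'.

    Hypothetical helper: hashing the sentence ID (rather than the clip ID)
    keeps all recordings of one sentence in the same split.
    """
    digest = hashlib.sha256(sentence_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer in [0, 2**64)
    bucket = int.from_bytes(digest[:8], "big")
    return "test" if bucket < test_fraction * 2**64 else "train"


if __name__ == "__main__":
    ids = [f"sentence-{i}" for i in range(10)]
    for sid in ids:
        print(sid, assign_split(sid))
```

Because the assignment depends only on the ID, it is reproducible across runs and machines without storing a split file. The resulting proportions are approximate rather than exact.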

Performance

  • WER: ~37% on test set
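
For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition. It is not the evaluation script used for the figure above, and it applies no text normalization (casing, punctuation), which can change the result noticeably.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is missing from the hypothesis
    print(word_error_rate("das ist ein test", "das ist test"))  # 0.25
```

A WER of about 37% therefore means roughly 37 word-level edits per 100 reference words.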

Usage

from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_name = "nizarmichaud/whisper-tiny-swiss-german"
model = WhisperForConditionalGeneration.from_pretrained(model_name)
processor = WhisperProcessor.from_pretrained(model_name)

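# audio_input: a 1-D float waveform (e.g. a NumPy array) sampled at 16 kHz
audio_input = ...  # Your audio input here
# Convert the waveform to the input features the model expects
inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt")
# Generate token IDs, then decode them to text (one string per input)
generated_ids = model.generate(inputs["input_features"])
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)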

print(transcription)
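
The snippet above leaves `audio_input` open. As one possible way to fill it, the following sketch reads a WAV file with the Python standard library and returns the samples as floats. It assumes the file is already mono, 16-bit PCM, and sampled at 16 kHz, and it raises an error otherwise instead of resampling. The helper name `load_wav_16k_mono` is hypothetical and not part of `transformers`.

```python
import sys
import wave
from array import array

TARGET_SAMPLE_RATE = 16000  # matches sampling_rate=16000 in the usage snippet


def load_wav_16k_mono(path: str) -> list[float]:
    """Read a mono, 16-bit PCM, 16 kHz WAV file as floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        if wav.getframerate() != TARGET_SAMPLE_RATE:
            raise ValueError(
                f"expected {TARGET_SAMPLE_RATE} Hz audio; resample the file first"
            )
        frames = wav.readframes(wav.getnframes())

    samples = array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Scale integer samples to floating point
    return [s / 32768.0 for s in samples]
```

The returned list can then be passed as `audio_input`. For other formats or sample rates, an audio library that handles decoding and resampling is the more practical choice.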
