Whisper Small GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-small on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. The training data is augmented in two ways: noise augmentation and truncation of low-amplitude samples. The best checkpoint by ChrF (this version) is at step 2000, epoch 0.4378; it achieves the following results on the evaluation set:

  • Loss: 1.2119
  • Bleu: 30.93
  • Chrf: 49.09
  • Wer: 63.1247
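
The Wer figure is a percentage, and it can exceed 100 when the hypothesis contains many inserted words (the training results table reports 169.5182 at step 100). The card does not say which implementation produced the scores, so the following is only a minimal sketch of the standard word-level definition; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate as a percentage: word-level edit distance / reference length.

    Illustrative sketch only; not necessarily the scorer used for this card.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# A one-word reference with three extra hypothesis words scores above 100.
print(word_error_rate("hello", "hello there big world"))
```

Because the denominator is the reference length, a model that over-generates early in training can score well above 100 even if it gets some words right.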

Model description

More information needed

Intended uses & limitations

More information needed
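
The card does not document usage. As a hedged sketch, a Whisper checkpoint like this one can usually be loaded through the Transformers `pipeline` API for automatic speech recognition; the repository ID below is the one shown on this page, while the helper names `load_translator` and `translate` and the audio path are hypothetical. Decoding an audio file by path typically also requires ffmpeg to be installed.

```python
MODEL_ID = "ymoslem/whisper-small-ga2en-v5.2"  # repository ID shown on this page


def load_translator():
    """Build a speech pipeline for the checkpoint (downloads weights on first use)."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_ID)


def translate(translator, audio_path: str) -> str:
    """Run the pipeline on one audio file and return the generated English text."""
    result = translator(audio_path)
    return result["text"].strip()


# Hypothetical usage (the file name is an assumption):
#   translator = load_translator()
#   print(translate(translator, "irish_sample.wav"))
```

Building the pipeline once and reusing it avoids reloading the model for every file.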

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.02
  • training_steps: 4000
  • mixed_precision_training: Native AMP
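
To make the schedule concrete, here is a minimal sketch of a linear learning-rate schedule with warmup using the values listed above. One loud assumption: the listed warmup value of 0.02 is read here as a fraction of the 4000 training steps (80 steps), which the card does not confirm. The function name `linear_lr` is hypothetical.

```python
BASE_LR = 1e-4        # learning_rate from the list above
TOTAL_STEPS = 4000    # training_steps from the list above
WARMUP_RATIO = 0.02   # ASSUMPTION: the card's 0.02 is treated as a ratio, not a step count
WARMUP_STEPS = round(WARMUP_RATIO * TOTAL_STEPS)  # 80 under that assumption


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 40, 80, 2000, 4000):
    print(step, linear_lr(step))
```

Under this reading the rate peaks at the end of warmup and falls linearly to zero at the final step.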

Training results

Training Loss  Epoch   Step  Bleu   Chrf   Validation Loss  Wer
2.7017         0.02    100   2.83   14.96  2.4392           169.5182
2.6732         0.04    200   7.27   22.72  1.9552           103.2868
2.1622         0.07    300   11.43  30.01  1.7297           108.2395
2.0314         0.09    400   12.96  31.0   1.6499           106.4385
1.7219         0.11    500   12.94  33.67  1.5543           107.6092
1.577          0.13    600   12.84  35.03  1.4812           118.5502
1.3569         0.1532  700   19.94  38.08  1.4559           84.2864
1.3401         0.1751  800   13.39  36.11  1.3855           126.4295
1.2272         0.1970  900   24.39  41.75  1.3764           70.7789
1.2793         0.2189  1000  23.01  42.13  1.3389           80.6844
1.0383         0.2408  1100  23.42  43.59  1.3125           82.3953
1.0485         0.2627  1200  25.42  42.99  1.2996           69.4732
1.0427         0.2846  1300  29.24  45.36  1.2996           65.6461
0.8174         0.3065  1400  27.28  45.67  1.2522           68.3926
0.7345         0.3284  1500  26.35  46.78  1.2349           79.1986
0.7551         0.3503  1600  27.81  46.49  1.2317           70.6439
0.6765         0.3722  1700  27.62  47.46  1.2062           70.9140
0.6613         0.3940  1800  26.56  47.12  1.2087           72.8050
0.6181         0.4159  1900  29.91  48.76  1.2139           65.2859
0.5809         0.4378  2000  30.93  49.09  1.2119           63.1247
0.5898         0.4597  2100  25.91  46.24  1.2540           73.9307
0.5926         0.4816  2200  25.19  44.72  1.2479           78.7933
0.5158         0.5035  2300  28.9   46.76  1.2532           66.3665
0.4511         0.5254  2400  28.89  46.83  1.2517           66.3215
0.4329         0.5473  2500  26.19  45.91  1.2573           72.6700
0.4106         0.5692  2600  26.91  46.84  1.2615           72.4899
0.4002         0.5911  2700  27.77  46.93  1.2396           71.0491
0.4047         0.6130  2800  29.9   47.79  1.2450           66.9968
0.3719         0.6349  2900  30.5   48.78  1.2522           65.1959
0.327          0.6567  3000  31.22  49.0   1.2493           64.1153
0.3138         0.6786  3100  30.1   47.82  1.2653           65.1959
0.3349         0.7005  3200  30.37  48.64  1.2651           63.9802
0.2807         0.7224  3300  26.02  45.46  1.2762           76.8573
0.2648         0.7443  3400  30.65  47.58  1.2761           64.6105
0.2633         0.7662  3500  29.73  47.74  1.2890           65.5110
0.2316         0.7881  3600  29.94  47.33  1.2886           66.4566
0.233          0.8100  3700  27.82  48.01  1.2905           73.1202
0.2196         0.8319  3800  31.51  48.66  1.2994           63.7100
0.2119         0.8538  3900  30.09  48.44  1.2910           65.0158
0.2082         0.8757  4000  30.91  47.99  1.2924           65.1058
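
The choice of metric changes which checkpoint counts as "best". As a small illustration, the sketch below copies five rows from the table above (steps 1700, 2000, 3000, 3800, and 4000) and picks a winner by ChrF, by Bleu, and by validation loss. The `EvalRow` type and variable names are hypothetical, and only these rows are compared.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    bleu: float
    chrf: float
    val_loss: float
    wer: float


# Five rows copied from the training results table (not the full log).
ROWS = [
    EvalRow(1700, 27.62, 47.46, 1.2062, 70.9140),
    EvalRow(2000, 30.93, 49.09, 1.2119, 63.1247),
    EvalRow(3000, 31.22, 49.0, 1.2493, 64.1153),
    EvalRow(3800, 31.51, 48.66, 1.2994, 63.7100),
    EvalRow(4000, 30.91, 47.99, 1.2924, 65.1058),
]

best_by_chrf = max(ROWS, key=lambda row: row.chrf)          # higher is better
best_by_bleu = max(ROWS, key=lambda row: row.bleu)          # higher is better
lowest_val_loss = min(ROWS, key=lambda row: row.val_loss)   # lower is better

print("best by ChrF:", best_by_chrf.step)
print("best by Bleu:", best_by_bleu.step)
print("lowest validation loss:", lowest_val_loss.step)
```

Among these rows, ChrF selects step 2000 (the checkpoint released here), Bleu selects step 3800, and validation loss selects step 1700.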

Framework versions

  • Transformers 4.40.0
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 242M params
  • Tensor type: F32

Model tree for ymoslem/whisper-small-ga2en-v5.2: fine-tuned from openai/whisper-small.

Evaluation results

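  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 30.910
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 65.106

These self-reported values match the final training step (4000) in the training results table, not the best checkpoint at step 2000.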