Training information

The model was trained on a redesigned dataset.

The following were removed from the dataset:

  • numerals (digits), so the model writes out spoken numbers as words
  • special characters, so the model also writes these out phonetically
  • acronyms
  • capital letters
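The removal rules above can be pictured as a filter over transcripts. The sketch below is a hypothetical illustration, not the author's preprocessing code: it assumes a transcript is kept only if it contains no digits, no uppercase letters, and nothing besides letters (including Hungarian accented letters), spaces and a small punctuation set. The name `is_clean_transcript` and the `ALLOWED_PUNCTUATION` set are assumptions. All-caps acronyms need no separate check here, because rejecting uppercase letters already excludes them.

```python
# Hypothetical filter illustrating the dataset rules; the allowed punctuation
# set is an assumption, not taken from the original preprocessing.
ALLOWED_PUNCTUATION = set(".,!?-")


def is_clean_transcript(text: str) -> bool:
    """Return True if the transcript has no digits, no uppercase letters
    and no characters outside letters, whitespace and ALLOWED_PUNCTUATION."""
    for ch in text:
        if ch.isdigit():
            return False  # numerals must be spelled out as words
        if ch.isupper():
            return False  # capitals (and therefore all-caps acronyms) are rejected
        if ch.isalpha() or ch.isspace() or ch in ALLOWED_PUNCTUATION:
            continue
        return False  # any other symbol (%, €, @, ...) counts as a special character
    return True


samples = [
    "huszonöt százalékkal nőtt az ár",
    "25%-kal nőtt az ár",
    "a MÁV vonata késik",
]
kept = [s for s in samples if is_clean_transcript(s)]
print(kept)  # ['huszonöt százalékkal nőtt az ár']
```

Whether the author dropped non-conforming samples or rewrote them is not stated; a rewriting step (for example, spelling out "25%" as "huszonöt százalék") would be an alternative to filtering.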

Because of these changes, the raw WER rose somewhat, but the normalized WER improved further. A hyper-normalized WER, where the test data would also be corrected according to the rules above, would probably be even better.
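A minimal sketch of why raw and normalized WER can diverge: WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The `normalize` function below is a simplified stand-in (lowercasing and stripping ASCII punctuation) and is an assumption; it is not necessarily the normalizer behind the reported scores, and the example sentences are made up.

```python
import string


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two word sequences."""
    # At the start of iteration i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def normalize(text: str) -> str:
    # Simplified stand-in normalizer: lowercase and drop ASCII punctuation.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


reference = "Budapesten esik az eső."   # test label with a capital and punctuation
hypothesis = "budapesten esik az eső"    # model output without them

print(wer(reference, hypothesis))                        # 0.5
print(wer(normalize(reference), normalize(hypothesis)))  # 0.0
```

Under these assumptions the example shows the effect described above: a model trained without capitals and punctuation is penalized on raw WER against test labels that still contain them, while the normalized WER ignores those differences.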

In this case, training was done with the example script from the Transformers library (https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#whisper-model) on a custom 2,000-hour dataset, which this time also included the CV17 (Common Voice 17) train+validate splits.

whisper-base-hu-V2

This model is a fine-tuned version of openai/whisper-base on a custom dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0880
  • Wer: 0.0960

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 3.0
  • mixed_precision_training: Native AMP
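The batch-size entries are related by simple arithmetic, and the linear scheduler with warmup can be written out explicitly. The sketch below assumes the usual Transformers behaviour for `lr_scheduler_type: linear` (linear warmup from 0 to the peak learning rate over `lr_scheduler_warmup_steps`, then linear decay to 0 at the final step). `TOTAL_STEPS = 33_200` is an assumption estimated from the training-results table (step 33000 is logged at epoch 2.9816 of 3), not a logged value.

```python
# Hyperparameters from the list above.
LEARNING_RATE = 7e-05
PER_DEVICE_TRAIN_BATCH = 32
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 1000
# Assumption: total optimizer steps estimated from the results table,
# not an exact logged value.
TOTAL_STEPS = 33_200


def total_train_batch_size() -> int:
    # Samples contributing to one optimizer update across all GPUs.
    return PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS


def linear_lr(step: int) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


print(total_train_batch_size())  # 128
for step in (0, 500, 1000, 17_100, 33_200):
    print(step, linear_lr(step))
```

The same arithmetic gives total_eval_batch_size = 32 × 2 = 64, since gradient accumulation does not apply during evaluation.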

Training results

Training Loss | Epoch  | Step  | Validation Loss | WER
0.551         | 0.0904 | 1000  | 0.2710          | 0.2694
0.4016        | 0.1807 | 2000  | 0.2009          | 0.2061
0.3449        | 0.2711 | 3000  | 0.1707          | 0.1770
0.3147        | 0.3614 | 4000  | 0.1588          | 0.1650
0.2936        | 0.4518 | 5000  | 0.1472          | 0.1551
0.2758        | 0.5421 | 6000  | 0.1406          | 0.1479
0.2663        | 0.6325 | 7000  | 0.1322          | 0.1393
0.2613        | 0.7228 | 8000  | 0.1283          | 0.1402
0.2491        | 0.8132 | 9000  | 0.1216          | 0.1319
0.238         | 0.9035 | 10000 | 0.1192          | 0.1291
0.2287        | 0.9939 | 11000 | 0.1151          | 0.1276
0.1798        | 1.0842 | 12000 | 0.1131          | 0.1234
0.1791        | 1.1746 | 13000 | 0.1113          | 0.1186
0.1787        | 1.2649 | 14000 | 0.1085          | 0.1186
0.1771        | 1.3553 | 15000 | 0.1068          | 0.1154
0.1728        | 1.4456 | 16000 | 0.1046          | 0.1135
0.1714        | 1.5360 | 17000 | 0.1029          | 0.1152
0.1706        | 1.6263 | 18000 | 0.1007          | 0.1117
0.163         | 1.7167 | 19000 | 0.0998          | 0.1074
0.1613        | 1.8070 | 20000 | 0.0982          | 0.1075
0.1568        | 1.8974 | 21000 | 0.0967          | 0.1087
0.1525        | 1.9878 | 22000 | 0.0945          | 0.1045
0.1063        | 2.0781 | 23000 | 0.0967          | 0.1046
0.1075        | 2.1684 | 24000 | 0.0951          | 0.1030
0.1035        | 2.2588 | 25000 | 0.0936          | 0.1015
0.1056        | 2.3491 | 26000 | 0.0928          | 0.1013
0.1019        | 2.4395 | 27000 | 0.0921          | 0.1000
0.1004        | 2.5298 | 28000 | 0.0911          | 0.0986
0.0992        | 2.6202 | 29000 | 0.0904          | 0.0980
0.1011        | 2.7105 | 30000 | 0.0898          | 0.0978
0.095         | 2.8009 | 31000 | 0.0892          | 0.0975
0.0975        | 2.8913 | 32000 | 0.0885          | 0.0960
0.0963        | 2.9816 | 33000 | 0.0880          | 0.0962
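The results table also supports a few rough sanity checks. The sketch below copies the last four rows of the table, picks the checkpoint with the lowest WER, and estimates the number of optimizer steps per epoch (step ÷ epoch). Assuming every step consumes the full total_train_batch_size of 128, it also estimates the approximate number of training samples per epoch. These are back-of-the-envelope estimates, not figures reported by the author.

```python
# (training_loss, epoch, step, validation_loss, wer): last four rows of the table.
ROWS = [
    (0.1011, 2.7105, 30000, 0.0898, 0.0978),
    (0.0950, 2.8009, 31000, 0.0892, 0.0975),
    (0.0975, 2.8913, 32000, 0.0885, 0.0960),
    (0.0963, 2.9816, 33000, 0.0880, 0.0962),
]
TOTAL_TRAIN_BATCH_SIZE = 128

# Checkpoint with the lowest WER among the copied rows.
best = min(ROWS, key=lambda row: row[4])
print("lowest WER:", best[4], "at step", best[2])  # lowest WER: 0.096 at step 32000

# Rough estimate from the last logged row; assumes full batches on every step.
_, epoch, step, _, _ = ROWS[-1]
steps_per_epoch = step / epoch
samples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE
print(round(steps_per_epoch), round(samples_per_epoch))
```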

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
Model size: 72.6M params (Safetensors, tensor type F32)

Model tree for sarpba/whisper-hu-base-finetuned-V2: fine-tuned from openai/whisper-base; 1 model has been fine-tuned from this model.