---
language:
- el
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Small - Greek (el)
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: mozilla-foundation/common_voice_11_0 el
type: mozilla-foundation/common_voice_11_0
config: el
split: test
args: el
metrics:
- name: Wer
type: wer
value: 25.696508172362552
---
# Whisper Small - Greek (el)
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the mozilla-foundation/common_voice_11_0 el dataset, trained for translation from Greek to English.
It achieves the following results on the evaluation set:
- Loss: 0.4642
- Wer: 25.6965
## Model description
This model was fine-tuned with the encoder frozen; only the decoder weights were updated by this training run.
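For illustration, a minimal sketch of how the encoder can be frozen before training with the `transformers` API (this reproduces the effect of the `--freeze_encoder` flag used below; it is not the exact training code):
```python
# Sketch: freeze the Whisper encoder so only decoder weights receive gradients.
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Disable gradients for every encoder parameter; decoder parameters stay trainable.
for param in model.model.encoder.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} of {total:,}")
```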
## Intended uses & limitations
The purpose of this model was to study how freezing part of the network affects learning, in order to assess the feasibility of adapter-based fine-tuning.
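For reference, a minimal sketch of loading a checkpoint like this for Greek-to-English speech translation with the `transformers` pipeline (the checkpoint path and audio file name are placeholders):
```python
# Sketch: run Greek -> English speech translation with this fine-tuned checkpoint.
from transformers import pipeline

translator = pipeline(
    "automatic-speech-recognition",
    model="./data/finetuningRuns/whisper-sm-el-frzEnc-xlate",  # placeholder: local path or Hub id
)

# Whisper selects language and task through forced decoder prompt ids.
translator.model.config.forced_decoder_ids = translator.tokenizer.get_decoder_prompt_ids(
    language="greek", task="translate"
)

result = translator("sample_el.wav")  # placeholder audio file
print(result["text"])
```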
## Training and evaluation data
Training was performed by streaming the interleaved train and validation splits of the Greek (el) subset of mozilla-foundation/common_voice_11_0.
The test split, also streamed, was used for evaluation.
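A minimal sketch of how this streamed, interleaved data can be built with the `datasets` library (an illustration based on the training arguments below, not the exact loading code):
```python
# Sketch: stream and interleave the train and validation splits; stream the test split.
from datasets import load_dataset, interleave_datasets

common_voice = "mozilla-foundation/common_voice_11_0"

train_set = interleave_datasets([
    load_dataset(common_voice, "el", split="train", streaming=True),
    load_dataset(common_voice, "el", split="validation", streaming=True),
])
eval_set = load_dataset(common_voice, "el", split="test", streaming=True)

# Streaming datasets are shuffled with a finite buffer (500 in this run).
train_set = train_set.shuffle(seed=42, buffer_size=500)
```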
## Training procedure
Fine-tuning was performed on a Lambda Labs laptop equipped with an NVIDIA GeForce RTX 3080 Laptop GPU (16 GB).
The training script, `run_speech_recognition_seq2seq_streaming.py`, is included in the files of this space and was run with the following arguments:
```
--model_name_or_path "openai/whisper-small"
--model_revision "main"
--do_train True
--do_eval True
--use_auth_token False
--freeze_encoder True
--model_index_name "Whisper Small - Greek (el)"
--dataset_name "mozilla-foundation/common_voice_11_0"
--dataset_config_name "el"
--audio_column_name "audio"
--text_column_name "sentence"
--max_duration_in_seconds 30
--train_split_name "train+validation"
--eval_split_name "test"
--do_lower_case False
--do_remove_punctuation False
--do_normalize_eval True
--language "greek"
--task "translate"
--shuffle_buffer_size 500
--output_dir "./data/finetuningRuns/whisper-sm-el-frzEnc-xlate"
--per_device_train_batch_size 16
--gradient_accumulation_steps 4
--learning_rate 1e-5
--warmup_steps 500
--max_steps 5000
--gradient_checkpointing True
--fp16 True
--evaluation_strategy "steps"
--per_device_eval_batch_size 8
--predict_with_generate True
--generation_max_length 225
--save_steps 1000
--eval_steps 1000
--logging_steps 25
--report_to "tensorboard"
--load_best_model_at_end True
--metric_for_best_model "wer"
--greater_is_better False
--push_to_hub False
--overwrite_output_dir True
```
### Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
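For reference, a minimal sketch of how these values map onto `Seq2SeqTrainingArguments` (an illustration derived from the list and arguments above, not the exact configuration object of the run):
```python
# Sketch: the hyperparameters above expressed as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./data/finetuningRuns/whisper-sm-el-frzEnc-xlate",
    per_device_train_batch_size=16,   # effective batch size 16 * 4 = 64
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    seed=42,
    fp16=True,                        # "Native AMP" mixed precision
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    logging_steps=25,
    predict_with_generate=True,
    generation_max_length=225,
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    report_to=["tensorboard"],
)
```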
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0032 | 18.01 | 1000 | 0.4642 | 25.6965 |
| 0.0006 | 37.01 | 2000 | 0.5369 | 26.4395 |
| 0.0003 | 56.01 | 3000 | 0.5703 | 26.3187 |
| 0.0002 | 75.0 | 4000 | 0.5913 | 26.4302 |
| 0.0001 | 94.0 | 5000 | 0.5996 | 26.4952 |
Upon completion of training, the best model was reloaded and evaluated on the test set; the following results were extracted from the stdout log:
```
***** eval metrics *****
epoch = 94.0
eval_loss = 0.4642
eval_runtime = 0:19:54.59
eval_samples_per_second = 1.42
eval_steps_per_second = 0.177
eval_wer = 25.6965
```
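Since evaluation was run with `--do_normalize_eval True`, WER is computed on normalized text. A minimal sketch of such a computation with the `evaluate` library and Whisper's basic text normalizer (the example strings are placeholders, not data from this run):
```python
# Sketch: normalized word error rate, as implied by --do_normalize_eval True.
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer_metric = evaluate.load("wer")
normalizer = BasicTextNormalizer()

predictions = ["the weather is nice today"]   # placeholder model outputs
references = ["The weather is nice today."]   # placeholder reference texts

wer = 100 * wer_metric.compute(
    predictions=[normalizer(p) for p in predictions],
    references=[normalizer(r) for r in references],
)
print(f"WER: {wer:.4f}")
```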
### Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0
- Datasets 2.7.1.dev0
- Tokenizers 0.12.1