TanelAlumae committed · Commit 8ca127d
Parent(s): 9744186

New version of the model

Files changed:
- README.md +21 -9
- config.json +1 -1
- pytorch_model.bin +1 -1
README.md
CHANGED

```diff
@@ -19,10 +19,10 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 12.03
     - name: Test CER
       type: cer
-      value: 3.
+      value: 3.18
   - task:
       name: Automatic Speech Recognition
       type: automatic-speech-recognition
@@ -34,16 +34,16 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value: 11.
+      value: 11.35
    - name: Test CER
       type: cer
-      value: 2.
+      value: 2.75
 ---
 
 
 # Whisper-large-et
 
-This is a Whisper-large-v2 model [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) finetuned on around
+This is a Whisper-large-v2 model [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) finetuned on around 1200 hours of diverse Estonian data.
 
 ## Model description
 This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech.
@@ -55,7 +55,15 @@ This model is intended for general-purpose speech recognition, such as broadcast
 
 ## How to use
 
-
+Recommended: use [faster-whisper](https://github.com/guillaumekln/faster-whisper).
+
+For example:
+
+* Convert the HF model to CT2 format:
+
+  `ct2-transformers-converter --model TalTechNLP/whisper-large-et --output_dir whisper-large-et.ct2 --copy_files tokenizer.json --quantization float16`
+
+* Decode: `whisper-ctranslate2 --model_directory whisper-large-et.ct2 --task transcribe --language et --beam_size 5 some_file.mp3`
 
 
 #### Limitations and bias
@@ -72,12 +80,12 @@ Acoustic training data:
 
 | Type                  | Amount (h) |
 |-----------------------|:------:|
-| Broadcast speech      |
+| Broadcast speech      | 991 |
 | Spontaneous speech    | 53 |
 | Elderly speech corpus | 53 |
 | Talks, lectures       | 49 |
 | Parliament speeches   | 31 |
-| *Total*               | *
+| *Total*               | *1161* |
 
 
 
@@ -87,6 +95,10 @@ Finetuned using ESPnet, and then converted to transformers format using [this](h
 Finetuning procedure is similar to [this](https://huggingface.co/espnet/shihlun_asr_whisper_medium_finetuned_librispeech100) model.
 Finetuning was done for 3 epochs, with model averaging at the end of training.
 
+*Update*: the 2023-10-03 version of the model is trained on long segments (like the original Whisper model) and
+is therefore especially well suited to be used e.g. with [faster-whisper](https://github.com/guillaumekln/faster-whisper) to
+transcribe long speech recordings "end-to-end" (i.e., without any prior segmentation).
+
 ## Evaluation results
 
 ### WER
@@ -95,5 +107,5 @@ WER results below are obtained using greedy decoding (i.e., beam size 1).
 
 |Dataset | WER |
 |---|---|
-| Common Voice 8.0 | 11.
+| Common Voice 8.0 | 11.3 |
 | Common Voice 11.0 | 12.0 |
```
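The convert-then-decode steps added to the "How to use" section can also be driven from Python through faster-whisper's own API instead of the `whisper-ctranslate2` CLI. The sketch below is illustrative and not part of the model card: the `WhisperModel` constructor and `transcribe` parameters follow the faster-whisper project README, and the local path `whisper-large-et.ct2` assumes the `ct2-transformers-converter` step above has already been run.

```python
# Sketch: transcribing with the converted CTranslate2 model via faster-whisper.
# Assumes `pip install faster-whisper` and that the conversion step above
# produced the local directory `whisper-large-et.ct2`.
from faster_whisper import WhisperModel

# float16 mirrors the --quantization float16 used during conversion;
# compute_type="int8" is a common choice for CPU-only machines.
model = WhisperModel("whisper-large-et.ct2", device="cuda", compute_type="float16")

# Long recordings can be passed directly; faster-whisper segments them itself,
# which suits the long-segment training of the 2023-10-03 model version.
segments, info = model.transcribe("some_file.mp3", language="et", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```

The segment objects carry start/end timestamps in seconds, so the loop above doubles as a simple subtitle dump.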
config.json
CHANGED

```diff
@@ -28,7 +28,7 @@
     "forced_decoder_ids": [
       [
         1,
-
+        50259
       ],
       [
         2,
```
pytorch_model.bin
CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8f8edba2e2b8974654d430b0ffe9d6bb1e7a394e84f226fe7a5acaf3bc94d6f3
 size 6173637880
```