aotrih commited on
Commit
4b0635d
Β·
verified Β·
1 Parent(s): 9c2558a

whisperkittools generated README.md

Browse files
Files changed (1) hide show
  1. README.md +34 -29
README.md CHANGED
@@ -17,37 +17,37 @@ tags:
17
  ## Dataset: `librispeech`
18
  Short-form Audio (<30s/clip) - 5 hours of English audiobook clips
19
 
20
- | | WER (↓) | QoI (↑) | File Size (MB) | Code Commit |
21
- |:--------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------|----------:|-----------------:|:---------------------------------------------------------------|
22
- | WhisperOpenAIAPI/openai_whisper-large-v2 | [2.35](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/librispeech) | 100 | 3100 | N/A |
23
- | [WhisperKit/openai_whisper-large-v2](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2) | [2.77](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2/librispeech) | 96.6 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
24
- | [WhisperKit/openai_whisper-large-v2_949MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_949MB) | [2.4](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_949MB/librispeech) | 94.6 | 949 | [Link](https://github.com/argmaxinc/WhisperKit/commit/eca4a2e) |
25
- | [WhisperKit/openai_whisper-large-v2_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo) | [2.76](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo/librispeech) | 96.6 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
26
- | [WhisperKit/openai_whisper-large-v2_turbo_955MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo_955MB) | [2.41](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo_955MB/librispeech) | 94.6 | 955 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
27
- | [WhisperKit/openai_whisper-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [2.04](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/librispeech) | 95.2 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
28
- | [WhisperKit/openai_whisper-large-v3_947MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_947MB) | [2.46](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_947MB/librispeech) | 93.9 | 947 | [Link](https://github.com/argmaxinc/WhisperKit/commit/eca4a2e) |
29
- | [WhisperKit/openai_whisper-large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo) | [2.03](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo/librispeech) | 95.4 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
30
- | [WhisperKit/openai_whisper-large-v3_turbo_954MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo_954MB) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo_954MB/librispeech) | 93.9 | 954 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
31
- | [WhisperKit/distil-whisper_distil-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3/librispeech) | 89.7 | 1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
32
- | [WhisperKit/distil-whisper_distil-large-v3_594MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_594MB) | [2.96](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_594MB/librispeech) | 85.4 | 594 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
33
- | [WhisperKit/distil-whisper_distil-large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_turbo) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_turbo/librispeech) | 89.7 | 1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
34
- | [WhisperKit/distil-whisper_distil-large-v3_turbo_600MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_turbo_600MB) | [2.78](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_turbo_600MB/librispeech) | 86.2 | 600 | [Link](https://github.com/argmaxinc/WhisperKit/commit/ae1cf96) |
35
- | [WhisperKit/openai_whisper-small.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small.en) | [3.12](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small.en/librispeech) | 85.8 | 483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
36
- | [WhisperKit/openai_whisper-small](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small) | [3.45](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small/librispeech) | 83 | 483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
37
- | [WhisperKit/openai_whisper-base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [3.98](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/librispeech) | 75.3 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
38
- | [WhisperKit/openai_whisper-base](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base) | [4.97](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech) | 67.2 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
39
- | [WhisperKit/openai_whisper-tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [5.61](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech) | 63.9 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
40
- | [WhisperKit/openai_whisper-tiny](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny) | [7.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech) | 52.5 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
41
 
42
  ## Dataset: `earnings22`
43
  Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents
44
 
45
- | | WER (↓) | QoI (↑) | File Size (MB) | Code Commit |
46
- |:------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------|----------:|-----------------:|:---------------------------------------------------------------|
47
- | WhisperOpenAIAPI/openai_whisper-large-v2 | [16.27](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/earnings22) | 100 | 3100 | N/A |
48
- | [WhisperKit/openai_whisper-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [15.17](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/earnings22) | 58.5 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
49
- | [WhisperKit/openai_whisper-base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [23.49](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22) | 6.5 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
50
- | [WhisperKit/openai_whisper-tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [28.64](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22) | 5.7 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
51
 
52
 
53
  ### Explanation
@@ -91,13 +91,18 @@ where the production behavior is established by the reference results and the go
91
  We anticipate developers that use Whisper (or similar models) in production to have their own Quality Assurance test sets and [whisperkittools](https://github.com/argmaxinc/whisperkittools) offers
92
  the tooling necessary to run the same measurements on such custom test sets, please see the [Model Evaluation on Custom Dataset]((https://github.com/argmaxinc/whisperkittools)) for details.
93
 
 
 
 
 
 
94
  ### Datasets
95
  - [librispeech](https://huggingface.co/datasets/argmaxinc/librispeech): ~5 hours of short English audio clips, tests short-form transcription quality
96
  - [earnings22](https://huggingface.co/datasets/argmaxinc/earnings22): ~120 hours of English audio clips from earnings calls with various accents, tests long-form transcription quality
97
 
98
  ### Reproducing Results
99
- Benchmark results on this page were automatically generated by [whisperkittools](https://github.com/argmaxinc/whisperkittools). We use our cluster of Apple Silicon Macs as self-hosted runners on
100
- Github Actions as our CI infrastructure to periodically recompute these benchmarks. Due to [security concerns](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#hardening-for-self-hosted-runners),
101
  we are unable to open up the cluster to the public. However, any Apple Silicon Mac (even with 8GB RAM) can be used to
102
  run identical [evaluation jobs](#evaluation) locally. For reference, our M2 Ultra devices complete a `librispeech` + `openai/whisper-large-v3`
103
  evaluation in under 1 hour regardless of the Whisper implementation. Oldest Apple Silicon Macs should take less than 1 day to complete the same evaluation.
 
17
  ## Dataset: `librispeech`
18
  Short-form Audio (<30s/clip) - 5 hours of English audiobook clips
19
 
20
+ | | WER (↓) | QoI (↑) | File Size (MB) | Code Commit |
21
+ |:---------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------|----------:|-----------------:|:---------------------------------------------------------------|
22
+ | large-v2 (WhisperOpenAIAPI) | [2.35](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/librispeech) | 100 | 3100 | N/A |
23
+ | [large-v2](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2) | [2.77](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2/librispeech) | 96.6 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
24
+ | [large-v2_949MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_949MB) | [2.4](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_949MB/librispeech) | 94.6 | 949 | [Link](https://github.com/argmaxinc/WhisperKit/commit/eca4a2e) |
25
+ | [large-v2_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo) | [2.76](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo/librispeech) | 96.6 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
26
+ | [large-v2_turbo_955MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v2_turbo_955MB) | [2.41](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v2_turbo_955MB/librispeech) | 94.6 | 955 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
27
+ | [large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [2.04](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/librispeech) | 95.2 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
28
+ | [large-v3_947MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_947MB) | [2.46](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_947MB/librispeech) | 93.9 | 947 | [Link](https://github.com/argmaxinc/WhisperKit/commit/eca4a2e) |
29
+ | [large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo) | [2.03](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo/librispeech) | 95.4 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
30
+ | [large-v3_turbo_954MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3_turbo_954MB) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3_turbo_954MB/librispeech) | 93.9 | 954 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
31
+ | [distil-whisper_distil-large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3/librispeech) | 89.7 | 1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/cf75348) |
32
+ | [distil-whisper_distil-large-v3_594MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_594MB) | [2.96](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_594MB/librispeech) | 85.4 | 594 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
33
+ | [distil-whisper_distil-large-v3_turbo](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_turbo) | [2.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_turbo/librispeech) | 89.7 | 1510 | [Link](https://github.com/argmaxinc/WhisperKit/commit/508240f) |
34
+ | [distil-whisper_distil-large-v3_turbo_600MB](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/distil-whisper_distil-large-v3_turbo_600MB) | [2.78](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/distil-whisper_distil-large-v3_turbo_600MB/librispeech) | 86.2 | 600 | [Link](https://github.com/argmaxinc/WhisperKit/commit/ae1cf96) |
35
+ | [small.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small.en) | [3.12](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small.en/librispeech) | 85.8 | 483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
36
+ | [small](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-small) | [3.45](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-small/librispeech) | 83 | 483 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
37
+ | [base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [3.98](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/librispeech) | 75.3 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
38
+ | [base](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base) | [4.97](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base/librispeech) | 67.2 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
39
+ | [tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [5.61](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/librispeech) | 63.9 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
40
+ | [tiny](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny) | [7.47](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny/librispeech) | 52.5 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/228630c) |
41
 
42
  ## Dataset: `earnings22`
43
  Long-Form Audio (>1hr/clip) - 120 hours of earnings call recordings in English with various accents
44
 
45
+ | | WER (↓) | QoI (↑) | File Size (MB) | Code Commit |
46
+ |:----------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------|----------:|-----------------:|:---------------------------------------------------------------|
47
+ | large-v2 (WhisperOpenAIAPI) | [16.27](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperOpenAIAPI/openai_whisper-large-v2/earnings22) | 100 | 3100 | N/A |
48
+ | [large-v3](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-large-v3) | [15.17](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-large-v3/earnings22) | 58.5 | 3100 | [Link](https://github.com/argmaxinc/WhisperKit/commit/2846fd9) |
49
+ | [base.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-base.en) | [23.49](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-base.en/earnings22) | 6.5 | 145 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
50
+ | [tiny.en](https://hf.co/argmaxinc/whisperkit-coreml/tree/main/openai_whisper-tiny.en) | [28.64](https://hf.co/datasets/argmaxinc/whisperkit-evals/tree/main/WhisperKit/openai_whisper-tiny.en/earnings22) | 5.7 | 66 | [Link](https://github.com/argmaxinc/WhisperKit/commit/dda6571) |
51
 
52
 
53
  ### Explanation
 
91
  We anticipate developers that use Whisper (or similar models) in production to have their own Quality Assurance test sets and [whisperkittools](https://github.com/argmaxinc/whisperkittools) offers
92
  the tooling necessary to run the same measurements on such custom test sets, please see the [Model Evaluation on Custom Dataset]((https://github.com/argmaxinc/whisperkittools)) for details.
93
 
94
+ ### Why are there so many Whisper versions?
95
+ WhisperKit is an SDK for building speech-to-text features in apps across a wide range of Apple devices. We are working towards abstracting away the model versioning from the developer so WhisperKit
96
+ "just works" by deploying the highest-quality model version that a particular device can execute. In the interim, we leave the choice to the developer by providing quality and size trade-offs.
97
+
98
+
99
  ### Datasets
100
  - [librispeech](https://huggingface.co/datasets/argmaxinc/librispeech): ~5 hours of short English audio clips, tests short-form transcription quality
101
  - [earnings22](https://huggingface.co/datasets/argmaxinc/earnings22): ~120 hours of English audio clips from earnings calls with various accents, tests long-form transcription quality
102
 
103
  ### Reproducing Results
104
+ Benchmark results on this page were automatically generated by [whisperkittools](https://github.com/argmaxinc/whisperkittools) using our cluster of Apple Silicon Macs as self-hosted runners on
105
+ Github Actions. We periodically recompute these benchmarks as part of our CI pipeline. Due to [security concerns](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#hardening-for-self-hosted-runners),
106
  we are unable to open up the cluster to the public. However, any Apple Silicon Mac (even with 8GB RAM) can be used to
107
  run identical [evaluation jobs](#evaluation) locally. For reference, our M2 Ultra devices complete a `librispeech` + `openai/whisper-large-v3`
108
  evaluation in under 1 hour regardless of the Whisper implementation. Oldest Apple Silicon Macs should take less than 1 day to complete the same evaluation.