sanchit-gandhi
committed on
Commit
·
4506258
1
Parent(s):
d7299c4
add section on OAI whisper
README.md
CHANGED
@@ -46,7 +46,7 @@ pip install --upgrade transformers accelerate datasets[audio]
|
|
46 |
### Short-Form Transcription
|
47 |
|
48 |
The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
|
49 |
-
class to transcribe short-form audio files as follows:
|
50 |
|
51 |
```python
|
52 |
import torch
|
@@ -91,7 +91,7 @@ To transcribe a local audio file, simply pass the path to your audio file when y
|
|
91 |
|
92 |
### Long-Form Transcription
|
93 |
|
94 |
-
Distil-Whisper uses a chunked algorithm to transcribe long-form audio files. In practice, this chunked long-form algorithm
|
95 |
is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper (see Table 7 of the [Distil-Whisper paper](https://arxiv.org/abs/2311.00430)).
|
96 |
|
97 |
To enable chunking, pass the `chunk_length_s` parameter to the `pipeline`. For Distil-Whisper, a chunk length of 15-seconds
|
@@ -241,9 +241,39 @@ Coming soon ...
|
|
241 |
|
242 |
Coming soon ...
|
243 |
|
244 |
-
### Running Whisper in `openai
|
245 |
|
246 |
-
|
247 |
|
248 |
|
249 |
### Transformers.js
|
|
|
46 |
### Short-Form Transcription
|
47 |
|
48 |
The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
|
49 |
+
class to transcribe short-form audio files (< 30 seconds) as follows:
|
50 |
|
51 |
```python
|
52 |
import torch
|
|
|
91 |
|
92 |
### Long-Form Transcription
|
93 |
|
94 |
+
Distil-Whisper uses a chunked algorithm to transcribe long-form audio files (> 30 seconds). In practice, this chunked long-form algorithm
|
95 |
is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper (see Table 7 of the [Distil-Whisper paper](https://arxiv.org/abs/2311.00430)).
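
To make the idea of chunking concrete, the sketch below computes overlapping chunk boundaries for a long recording. It is a simplified illustration, not the code used by the `pipeline`: the helper name `chunk_spans` is hypothetical, and the default overlap of one sixth of the chunk length on each side is an assumption based on the documented default for `stride_length_s`.

```python
def chunk_spans(total_s, chunk_length_s=15.0, stride_s=None):
    """Return (start, end) times in seconds of overlapping chunks covering the audio.

    Hypothetical helper for illustration only; the real pipeline works on
    sample indices and also merges the per-chunk transcriptions.
    """
    if stride_s is None:
        # Assumption: overlap on each side defaults to chunk_length_s / 6.
        stride_s = chunk_length_s / 6
    # Each new chunk advances by the chunk length minus the overlap on both sides.
    step = chunk_length_s - 2 * stride_s
    if step <= 0:
        raise ValueError("stride_s must be smaller than half of chunk_length_s")

    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break  # the last chunk reaches the end of the audio
        start += step
    return spans


# A 40-second file with 15-second chunks and 2.5 seconds of overlap per side
print(chunk_spans(40.0))
# [(0.0, 15.0), (10.0, 25.0), (20.0, 35.0), (30.0, 40.0)]
```

Because the chunks are independent of one another, they can be batched and transcribed in parallel, which is where the speed-up over a sequential pass comes from.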
|
96 |
|
97 |
To enable chunking, pass the `chunk_length_s` parameter to the `pipeline`. For Distil-Whisper, a chunk length of 15-seconds
|
|
|
241 |
|
242 |
Coming soon ...
|
243 |
|
244 |
+
### Running Whisper in `openai-whisper`
|
245 |
|
246 |
+
To use the model in the original Whisper format, first ensure you have the [`openai-whisper`](https://pypi.org/project/openai-whisper/) package installed:
|
247 |
+
|
248 |
+
```bash
|
249 |
+
pip install --upgrade openai-whisper
|
250 |
+
```
|
251 |
+
|
252 |
+
The following code snippet demonstrates how to transcribe a sample from the LibriSpeech dataset, loaded using
|
253 |
+
🤗 Datasets:
|
254 |
+
|
255 |
+
```python
|
256 |
+
import torch
|
257 |
+
from datasets import load_dataset
|
258 |
+
from huggingface_hub import hf_hub_download
|
259 |
+
from whisper import load_model, transcribe
|
260 |
+
|
261 |
+
medium_en = hf_hub_download(repo_id="distil-whisper/distil-medium.en", filename="original-model.bin")  # checkpoint in the original Whisper format
|
262 |
+
model = load_model(medium_en)
|
263 |
+
|
264 |
+
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
|
265 |
+
sample = dataset[0]["audio"]["array"]
|
266 |
+
sample = torch.from_numpy(sample).float()  # convert the float64 NumPy array to a float32 tensor
|
267 |
+
|
268 |
+
pred_out = transcribe(model, audio=sample)
|
269 |
+
print(pred_out["text"])
|
270 |
+
```
|
271 |
+
|
272 |
+
To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to `transcribe`:
|
273 |
+
|
274 |
+
```python
|
275 |
+
pred_out = transcribe(model, audio="audio.mp3")
|
276 |
+
```
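
When passing an in-memory waveform rather than a file path, the audio is expected to be mono and sampled at 16 kHz, which is the rate the `openai-whisper` package resamples files to on loading. The sketch below shows one way to check and convert an array before calling `transcribe`. The helper name `prepare_audio` is hypothetical, and the `(channels, samples)` layout for multi-channel input is an assumption.

```python
import numpy as np

WHISPER_SAMPLE_RATE = 16_000  # sampling rate used by the openai-whisper package


def prepare_audio(array, sampling_rate):
    """Hypothetical helper: validate a raw waveform and return mono float32 audio."""
    if sampling_rate != WHISPER_SAMPLE_RATE:
        raise ValueError(
            f"expected {WHISPER_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the audio first"
        )
    audio = np.asarray(array)
    if audio.ndim == 2:
        # Assumption: multi-channel audio is laid out as (channels, samples).
        audio = audio.mean(axis=0)
    # Cast to float32, matching the `.float()` conversion in the example above.
    return audio.astype(np.float32)
```

With the dataset from the earlier example, this could be used as `prepare_audio(dataset[0]["audio"]["array"], dataset[0]["audio"]["sampling_rate"])`, and the resulting NumPy array passed directly as the `audio` argument.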
|
277 |
|
278 |
|
279 |
### Transformers.js
|