sanchit-gandhi committed
Commit 4506258 · 1 Parent(s): d7299c4

add section on OAI whisper

Files changed (1)
  1. README.md +34 -4
README.md CHANGED
@@ -46,7 +46,7 @@ pip install --upgrade transformers accelerate datasets[audio]
  ### Short-Form Transcription
 
  The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
- class to transcribe short-form audio files as follows:
 
  ```python
  import torch
@@ -91,7 +91,7 @@ To transcribe a local audio file, simply pass the path to your audio file when y
 
  ### Long-Form Transcription
 
- Distil-Whisper uses a chunked algorithm to transcribe long-form audio files. In practice, this chunked long-form algorithm
  is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper (see Table 7 of the [Distil-Whisper paper](https://arxiv.org/abs/2311.00430)).
 
  To enable chunking, pass the `chunk_length_s` parameter to the `pipeline`. For Distil-Whisper, a chunk length of 15 seconds
@@ -241,9 +241,39 @@ Coming soon ...
 
  Coming soon ...
 
- ### Running Whisper in `openai/whisper`
 
- Coming soon ...
 
 
  ### Transformers.js
 
  ### Short-Form Transcription
 
  The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
+ class to transcribe short-form audio files (< 30 seconds) as follows:
 
  ```python
  import torch
 
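The 30-second cut-off in the updated short-form wording matches the length of the audio window Whisper processes in one pass. As a minimal sketch, a clip can be routed to short-form or chunked long-form transcription from its sample count and sampling rate. The helper name `is_short_form` and the choice to treat a clip of exactly 30 seconds as short-form are assumptions for illustration, not part of the README or of 🤗 Transformers:

```python
# Hypothetical helper (not part of transformers): decides whether a clip fits in
# a single Whisper input window or needs chunked long-form transcription.
WHISPER_WINDOW_S = 30.0  # Whisper consumes audio in 30-second windows


def is_short_form(num_samples: int, sampling_rate: int, window_s: float = WHISPER_WINDOW_S) -> bool:
    """Return True if the clip is no longer than one Whisper window."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    duration_s = num_samples / sampling_rate
    return duration_s <= window_s


# A 10-second clip at 16 kHz is short-form; a 45-second clip is not.
print(is_short_form(10 * 16_000, 16_000))  # True
print(is_short_form(45 * 16_000, 16_000))  # False
```

Clips for which this returns `False` fall under the long-form route, where `chunk_length_s` is passed to the `pipeline`.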
 
  ### Long-Form Transcription
 
+ Distil-Whisper uses a chunked algorithm to transcribe long-form audio files (> 30 seconds). In practice, this chunked long-form algorithm
  is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper (see Table 7 of the [Distil-Whisper paper](https://arxiv.org/abs/2311.00430)).
 
  To enable chunking, pass the `chunk_length_s` parameter to the `pipeline`. For Distil-Whisper, a chunk length of 15 seconds
 
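The chunked algorithm splits long audio into overlapping windows that can be transcribed independently before the per-chunk transcripts are merged. The sketch below only computes the chunk boundaries; it is a simplified illustration, not the `pipeline`'s actual implementation. The function name `chunk_boundaries` is hypothetical, and the default per-side overlap of `chunk_length_s / 6` is an assumption modelled on the pipeline's `stride_length_s` parameter:

```python
from typing import List, Optional, Tuple

# Simplified sketch of strided chunking for long-form audio. The names and the
# default stride are illustrative assumptions, not the transformers API.


def chunk_boundaries(
    num_samples: int,
    sampling_rate: int,
    chunk_length_s: float = 15.0,
    stride_length_s: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks covering the audio.

    Consecutive chunks overlap by 2 * stride samples, so every chunk boundary
    is seen with context from both sides when the transcripts are merged.
    """
    if stride_length_s is None:
        stride_length_s = chunk_length_s / 6  # assumed default overlap per side
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # how far each new chunk advances
    if step <= 0:
        raise ValueError("chunk_length_s must exceed 2 * stride_length_s")

    bounds = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # this chunk reaches the end of the audio
            break
    return bounds


# 60 seconds of 16 kHz audio in 15-second chunks with 2.5-second strides.
for start, end in chunk_boundaries(60 * 16_000, 16_000):
    print(f"{start / 16_000:5.1f}s -> {end / 16_000:5.1f}s")
```

For this 60-second example the sketch yields six chunks that start every 10 seconds, each overlapping its neighbour by 5 seconds; a clip shorter than one chunk comes back as a single `(0, num_samples)` window.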
 
  Coming soon ...
 
+ ### Running Whisper in `openai-whisper`
 
+ To use the model in the original Whisper format, first ensure you have the [`openai-whisper`](https://pypi.org/project/openai-whisper/) package installed:
+
+ ```bash
+ pip install --upgrade openai-whisper
+ ```
+
+ The following code snippet demonstrates how to transcribe a sample file from the LibriSpeech dataset loaded using
+ 🤗 Datasets:
+
+ ```python
+ import torch
+ from datasets import load_dataset
+ from huggingface_hub import hf_hub_download
+ from whisper import load_model, transcribe
+
+ medium_en = hf_hub_download(repo_id="distil-whisper/distil-medium.en", filename="original-model.bin")
+ model = load_model(medium_en)
+
+ dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
+ sample = dataset[0]["audio"]["array"]
+ sample = torch.from_numpy(sample).float()
+
+ pred_out = transcribe(model, audio=sample)
+ print(pred_out["text"])
+ ```
+
+ To transcribe a local audio file, simply pass the path to the audio file as the `audio` argument to `transcribe`:
+
+ ```python
+ pred_out = transcribe(model, audio="audio.mp3")
+ ```
 
 
  ### Transformers.js
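The openai-whisper snippet casts the 🤗 Datasets array to float32 with `.float()` before calling `transcribe`, and openai-whisper models expect mono audio sampled at 16 kHz (`whisper.audio.SAMPLE_RATE`). A minimal sketch of that preparation step as a reusable check, assuming a 🤗 Datasets-style audio dict with `array` and `sampling_rate` keys; the helper name `to_whisper_input` is hypothetical:

```python
# Sketch: prepare a 🤗 Datasets-style audio dict for openai-whisper's transcribe().
# Assumes the dict has "array" and "sampling_rate" keys; the helper is hypothetical.
import numpy as np

WHISPER_SAMPLE_RATE = 16_000  # openai-whisper models expect 16 kHz audio


def to_whisper_input(audio: dict) -> np.ndarray:
    """Return a 1-D float32 array suitable for transcribe(model, audio=...)."""
    if audio["sampling_rate"] != WHISPER_SAMPLE_RATE:
        raise ValueError(
            f"expected {WHISPER_SAMPLE_RATE} Hz audio, got {audio['sampling_rate']} Hz"
        )
    # Cast to float32, mirroring the README's torch .float() conversion.
    samples = np.asarray(audio["array"], dtype=np.float32)
    if samples.ndim != 1:
        raise ValueError("expected mono (1-D) audio")
    return samples


# Synthetic 1-second clip standing in for dataset[0]["audio"].
clip = {"array": np.zeros(16_000, dtype=np.float64), "sampling_rate": 16_000}
prepared = to_whisper_input(clip)
print(prepared.dtype, prepared.shape)  # float32 (16000,)
```

Raising on a mismatched rate, rather than resampling silently, keeps the sketch dependency-free; with 🤗 Datasets, the column can instead be resampled on load via `dataset.cast_column("audio", Audio(sampling_rate=16_000))`.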