Abhigyanr committed on
Commit da43f9a · 1 Parent(s): 8931f54

Create README.md

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ language: or
+ metrics:
+ - wer
+ - cer
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - speech
+ - wav2vec2
+ - asr
+ license: apache-2.0
+ ---
+
+ # IndicWav2Vec-Odia
+
+ This is a [Wav2Vec2](https://arxiv.org/abs/2006.11477)-style ASR model for Odia, trained in [fairseq](https://github.com/facebookresearch/fairseq) and ported to Hugging Face.
+ More details on the datasets, training setup, and conversion to the Hugging Face format can be found in the [IndicWav2Vec](https://github.com/AI4Bharat/IndicWav2Vec) repo.
+
+ ## Script to Run Inference
+
+ ```python
+ import torch
+ import torchaudio.functional as F
+ from datasets import load_dataset
+ from transformers import AutoModelForCTC, AutoProcessor
+
+ DEVICE_ID = "cuda" if torch.cuda.is_available() else "cpu"
+ MODEL_ID = "ai4bharat/indicwav2vec-odia"
+ # Stream one Common Voice Odia test sample and resample it to the 16 kHz the model expects.
+ sample = next(iter(load_dataset("common_voice", "or", split="test", streaming=True)))
+ audio = torch.tensor(sample["audio"]["array"], dtype=torch.float32)
+ resampled_audio = F.resample(audio, sample["audio"]["sampling_rate"], 16000).numpy()
+
+ model = AutoModelForCTC.from_pretrained(MODEL_ID).to(DEVICE_ID)
+ processor = AutoProcessor.from_pretrained(MODEL_ID)
+
+ input_values = processor(resampled_audio, sampling_rate=16000, return_tensors="pt").input_values
+
+ with torch.no_grad():  # inference only, no gradients needed
+     logits = model(input_values.to(DEVICE_ID)).logits.cpu()
+ # Greedy decoding: take the most likely token per frame, then let the processor decode the IDs.
+ prediction_ids = torch.argmax(logits, dim=-1)
+ output_str = processor.batch_decode(prediction_ids)[0]
+ print(f"Greedy Decoding: {output_str}")
+ ```
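
The script above picks the arg-max token for every audio frame and hands the IDs to `processor.batch_decode`. For a CTC model, turning those per-frame IDs into text generally means merging consecutive repeats and then dropping the blank token. The following is only a minimal sketch of that collapse step, not the tokenizer's actual implementation: the helper name `ctc_greedy_collapse`, the example IDs, and the blank ID of `0` are all assumptions made for illustration (the real blank/pad ID depends on the model's vocabulary).

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame CTC predictions into a token ID sequence.

    blank_id=0 is an assumption for this sketch; check the model's
    vocabulary for the real blank (pad) token ID.
    """
    # Step 1: merge runs of identical consecutive IDs into one ID.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove blanks, which only separate genuine repeats.
    return [token_id for token_id in collapsed if token_id != blank_id]


# Hypothetical arg-max IDs for nine frames (not real model output).
frame_ids = [0, 5, 5, 0, 5, 7, 7, 0, 0]
decoded = ctc_greedy_collapse(frame_ids)
print(decoded)
```

The blank between the two runs of `5` is what lets the same token appear twice in a row in the output; without it the repeated frames would merge into a single token.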
+
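
The model card metadata lists `wer` and `cer` as metrics. As a rough illustration of what those numbers measure, the sketch below computes both from a Levenshtein edit distance in plain Python. It is not the evaluation code used for this model: the function names are hypothetical, the reference is assumed to be non-empty, and CER here counts Unicode code points, which may differ from how a dedicated evaluation library segments Odia text.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using one rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example strings, not model output.
print(wer("the cat sat", "the cat sit"))
print(cer("the cat sat", "the cat sit"))
```

In this example one of three words is wrong, while only one of eleven characters differs, which is why CER is usually much lower than WER for the same transcript.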
+ # **About AI4Bharat**
+ - Website: https://ai4bharat.org/
+ - Code: https://github.com/AI4Bharat
+ - HuggingFace: https://huggingface.co/ai4bharat