---
license: apache-2.0
language:
- en
base_model:
- facebook/wav2vec2-large-xlsr-53
---

# Wav2Vec2 Fine-Tuned for Pronunciation Correction

This is a Wav2Vec2 model fine-tuned for phoneme-level pronunciation correction. It takes speech audio as input and returns a transcription in phonetic notation.

Character error rate (CER): 0.1

## Usage

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import librosa
import torch

# Load model and processor
model = Wav2Vec2ForCTC.from_pretrained("moxeeeem/wav2vec2-finetuned-pronunciation-correction")
processor = Wav2Vec2Processor.from_pretrained("moxeeeem/wav2vec2-finetuned-pronunciation-correction")

def transcribe_audio(speech, sampling_rate):
    # Convert the raw waveform into model inputs
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    # Inference only, so gradients are not needed
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Greedy CTC decoding: take the most likely token at each frame
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]

# librosa resamples the audio to 16 kHz on load
speech, sample_rate = librosa.load("example_audio.wav", sr=16000)
transcription = transcribe_audio(speech, sample_rate)
print("Transcription:", transcription)
# example: pɪŋɡwɪnz lɪv nɪ ði aɪsi ænɑɹtɪk
```
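
To compare a transcription against a reference phoneme string, a character error rate can be computed as the Levenshtein edit distance divided by the reference length. The following is a minimal sketch under that assumption; the helper names `edit_distance` and `cer` are illustrative and not part of this model's repository, and the evaluation data and exact procedure behind the CER reported in this card are not stated here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical strings, for illustration only
print(cer("lɪv ɪn", "lɪv nɪ"))
```

Python strings are compared by Unicode code point here, so a phoneme written with a combining diacritic would count as more than one character; normalizing or tokenizing the strings first may be preferable, depending on the notation.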