chore: Minor fixes
docs/phoneme_based_solution.md
CHANGED
@@ -177,6 +177,7 @@ With more time, one could
 - Adapt OpenAI's Whisper model to phoneme recognition/transcription by simply changing the tokenizer to handle the new vocabulary (the set of phonemes),
 and fine-tuning the model on an (audio, phoneme) dataset with an appropriate metric. See [openai/whisper · Phoneme recognition](https://huggingface.co/spaces/openai/whisper/discussions/86) for a short discussion about it.
 - Employ a model like [m-bain/whisperX: WhisperX](https://github.com/m-bain/whisperX) and possibly fine-tune it, to achieve word-level timestamps & diarization.
+- Also, a probabilistic approach could be used to report transcription confidence and adjust/omit feedback accordingly (see the sketch after this diff).
 
 Further, the output of the ASR model could be enhanced by grouping phonemes (to allow for better word-level feedback and alignment) and also by adding better prosodic/suprasegmental support.
 
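As a rough illustration of the phoneme-recognition and confidence ideas in the bullets above, here is a minimal sketch. It is not the approach the doc commits to: `facebook/wav2vec2-lv-60-espeak-cv-ft` is an off-the-shelf phoneme-CTC checkpoint standing in for the proposed fine-tuned Whisper, the mean per-frame max-softmax probability is only a crude confidence proxy, and the function name is hypothetical.

```python
# Hypothetical sketch, not from the repo: phoneme transcription plus a crude
# confidence score, using an off-the-shelf phoneme-CTC checkpoint in place of
# the fine-tuned Whisper proposed in the doc.
import numpy as np
import torch
from transformers import AutoModelForCTC, AutoProcessor

MODEL_ID = "facebook/wav2vec2-lv-60-espeak-cv-ft"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCTC.from_pretrained(MODEL_ID)

def transcribe_phonemes(y: np.ndarray, sr: int = 16_000) -> tuple[str, float]:
    """Return (phoneme string, mean per-frame confidence in [0, 1])."""
    inputs = processor(y, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits      # shape: (1, frames, vocab)
    probs = logits.softmax(dim=-1)
    ids = probs.argmax(dim=-1)               # greedy CTC decoding
    phonemes = processor.batch_decode(ids)[0]
    # Max softmax probability per frame, averaged: a rough confidence proxy.
    confidence = probs.max(dim=-1).values.mean().item()
    return phonemes, confidence
```

Feedback could then be softened or skipped whenever `confidence` falls below some tuned threshold, which is the gist of the newly added bullet.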
src/pronunciation_trainer/transcription.py
CHANGED
@@ -27,7 +27,6 @@ def transcribe(
     transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice)
     try:
         sr, y = audio
-        print(f"Sampling rate is {sr}")
     except TypeError:
         return None
     y = y.astype(np.float32)
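For context, after this commit the surrounding `transcribe` function might look roughly like the sketch below. The signature, default model, and normalization line are assumptions inferred from the visible hunk; the dict input form follows the pattern in Gradio's ASR guide, where microphone audio arrives as an `(int sample rate, int16 NumPy array)` tuple.

```python
# A minimal sketch of transcribe() after the debug print was dropped; the
# signature, default model, and normalization step are assumptions.
import numpy as np
from transformers import pipeline

def transcribe(
    audio: tuple[int, np.ndarray] | None,
    transcriber_choice: str = "openai/whisper-base",  # assumed default
) -> str | None:
    transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice)
    try:
        sr, y = audio  # raises TypeError when audio is None (e.g. no recording)
    except TypeError:
        return None
    y = y.astype(np.float32)
    y /= np.max(np.abs(y))  # scale int16 samples into [-1.0, 1.0]
    return transcriber({"sampling_rate": sr, "raw": y})["text"]
```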