pwenker committed
Commit 31432db · 1 Parent(s): 0c78e45

chore: Minor fixes

docs/phoneme_based_solution.md CHANGED

@@ -177,6 +177,7 @@ With more time, one could
 - Adapt OpenAI's Whisper model on phoneme recognition/transcription by simply changing the tokenizer to handle the new vocabulary (the set of phonemes),
 and fine-tuning th model on an (audio, phoneme) dataset with an appropriate metric. See [openai/whisper · Phoneme recognition](https://huggingface.co/spaces/openai/whisper/discussions/86) for a short discussion about it.
 - Employ a model like [m-bain/whisperX: WhisperX](https://github.com/m-bain/whisperX) and possibly fine-tune it, to achieve word-level timestamps & diarization.
+- Also, a probabilistic approach could be used to inform about transcription confidence and adjust/omit feedback according to it

 Further, the output of the ASR model could be enhanced by grouping phonemes (to allow for better world-level feedback and alignment) and also adding better prosodic/suprasegmental support.
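The new bullet about a probabilistic approach could, for instance, derive a confidence score from the decoder's per-step token probabilities and suppress feedback when that score falls below a threshold. A hypothetical sketch of this idea (the function names, the max-softmax score, and the threshold are illustrative assumptions, not code from the repository):

```python
import numpy as np


def confidence_from_logits(step_logits: np.ndarray) -> float:
    """Mean max-softmax probability across decoding steps.

    step_logits: array of shape (num_steps, vocab_size), one row of
    raw logits per emitted token.
    """
    # Numerically stable softmax over the vocabulary dimension
    shifted = step_logits - step_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Average the probability of the chosen (argmax) token per step
    return float(probs.max(axis=-1).mean())


def gate_feedback(feedback: str, confidence: float, threshold: float = 0.7):
    # Omit pronunciation feedback entirely when the transcription
    # is too uncertain, rather than risk misleading the learner.
    return feedback if confidence >= threshold else None
```

A peaked logit distribution yields a confidence near 1.0, while uniform logits yield 1/vocab_size, so low-quality transcriptions would naturally have their feedback suppressed.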
src/pronunciation_trainer/transcription.py CHANGED

@@ -27,7 +27,6 @@ def transcribe(
     transcriber = pipeline("automatic-speech-recognition", model=transcriber_choice)
     try:
         sr, y = audio
-        print(f"Sampling rate is {sr}")
     except TypeError:
         return None
     y = y.astype(np.float32)
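The try/except around `sr, y = audio` handles Gradio-style microphone input, which arrives as a `(sampling_rate, samples)` tuple, or as `None` when nothing was recorded. A minimal sketch of that unpacking step, with the `float32` conversion shown in the diff; the peak normalization afterwards is an assumption about the usual preprocessing, not code from this commit:

```python
import numpy as np


def prepare_audio(audio):
    """Unpack a (sampling_rate, samples) tuple into (sr, float32 array).

    Returns None when `audio` is not unpackable (e.g. no recording was
    made), mirroring the TypeError guard in the diff above.
    """
    try:
        sr, y = audio
    except TypeError:
        return None
    y = y.astype(np.float32)
    # Integer microphone samples (e.g. int16) must be scaled into
    # [-1, 1] before being fed to an ASR pipeline (assumed step).
    peak = np.abs(y).max()
    if peak > 1.0:
        y /= peak
    return sr, y
```

Dropping the `print(f"Sampling rate is {sr}")` call, as this commit does, removes debug noise without changing this control flow.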