Pronunciation Trainer 🗣️
This repository/app showcases how a phoneme-based pronunciation trainer (including personalized LLM-based feedback) overcomes the limitations of a grapheme-based approach.
For convenience, the table below compares the features of the two solutions:
Feature | Grapheme-Based Solution | Phoneme-Based Solution |
---|---|---|
Input Type | Text transcriptions of speech | Audio files and phoneme transcriptions |
Feedback Mechanism | Comparison of grapheme sequences | Comparison of phoneme sequences and advanced LLM-based feedback |
Technological Approach | Simple text comparison using SequenceMatcher | Advanced ASR models like Wav2Vec2 for phoneme recognition |
Feedback Detail | Basic similarity score and diff | Detailed phoneme comparison, LLM-based feedback including motivational and corrective elements |
Error Sensitivity | Sensitive to homophones and transcription errors | More accurate in capturing pronunciation nuances |
Suprasegmental Features | Not captured (stress, intonation) | Potentially captured through phoneme dynamics and advanced evaluation |
Personalization | Limited to error feedback based on text similarity | Advanced personalization considering learner's native language and target language proficiency |
Scalability | Easy to scale with basic text processing tools | Requires more computational resources for ASR and LLM processing |
Cost | Lower, primarily involves basic computational resources | Higher, due to usage of advanced APIs and model processing |
Accuracy | Lower, prone to misinterpretations of homophones | Higher, better at handling diverse pronunciation patterns (though LLM feedback may contain hallucinations) |
Feedback Quality | Basic, often not linguistically rich | Rich, detailed, personalized, and linguistically informed |
Potential for Learning | Limited to recognizing text differences | High, includes phonetic and prosodic feedback, as well as resource and practice recommendations |
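
To make the homophone row of the table concrete, here is a minimal sketch of sequence comparison with Python's standard-library SequenceMatcher, applied once to graphemes and once to phonemes. The helper names `similarity` and `diff` are hypothetical, and the phoneme lists are hand-written illustrative transcriptions, not the output of the app's ASR model; the actual implementation in this repository may differ.

```python
from difflib import SequenceMatcher


def similarity(expected, actual):
    """Return a 0..1 similarity ratio between two sequences."""
    return SequenceMatcher(None, expected, actual).ratio()


def diff(expected, actual):
    """List the non-matching edit operations between two sequences."""
    matcher = SequenceMatcher(None, expected, actual)
    return [
        (tag, expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Grapheme level: the homophones "their" and "there" are spelled differently,
# so a text comparison reports a mismatch even for a correct pronunciation.
grapheme_score = similarity("their", "there")

# Phoneme level: illustrative, hand-written transcriptions (an assumption,
# not ASR output). Both words map to the same phoneme sequence.
their_phonemes = ["ð", "ɛ", "ɹ"]
there_phonemes = ["ð", "ɛ", "ɹ"]
phoneme_score = similarity(their_phonemes, there_phonemes)

print(f"grapheme similarity: {grapheme_score:.2f}")
print(f"phoneme similarity:  {phoneme_score:.2f}")
print(diff("their", "there"))
```

In this example the grapheme comparison scores 0.80 and reports a spurious edit, while the phoneme comparison scores 1.00, which is the kind of false error the phoneme-based approach is meant to avoid.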
Quickstart
Click here to try out the app directly:
Inspect the code at:
- GitHub: pwenker/pronunciation_trainer
- Hugging Face Spaces: pwenker/pronunciation_trainer
Read about the pronunciation trainer:
Local Deployment
Prerequisites
Rye 🌾
Rye is a comprehensive tool designed for Python developers. It simplifies your workflow by managing Python installations and dependencies. Simply install Rye, and it takes care of the rest.
- Create a .env file in the pronunciation_trainer folder and add the following variable:
OpenAI API Key
OPENAI_API_KEY=... # Token for the OpenAI API
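
Before launching, it can help to confirm that the key is actually visible. The following is a rough, standard-library-only sketch; the helpers `read_env_file` and `has_openai_key` are hypothetical and not part of this repository, and how the app itself loads the .env file is not specified here. The parser is deliberately simple and treats everything after a `#` as a comment.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("\"'")
    return values


def has_openai_key(env_path=".env"):
    """Return True if OPENAI_API_KEY is set in the environment or the file."""
    if os.environ.get("OPENAI_API_KEY"):
        return True
    if not Path(env_path).is_file():
        return False
    return bool(read_env_file(env_path).get("OPENAI_API_KEY"))


if __name__ == "__main__":
    print("OPENAI_API_KEY found" if has_openai_key() else "OPENAI_API_KEY missing")
```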
Set-Up 🛠️
Clone the repository:
git clone [repository-url] # Replace [repository-url] with the actual URL of the repository
Navigate to the directory:
cd pronunciation_trainer
Create a virtual environment in .venv and synchronize the repo:
rye sync
For more details, visit: Basics - Rye
Start the App
Launch the app using:
rye run python src/pronunciation_trainer/app.py
Then, open your browser and visit http://localhost:7860 to start practicing!