---
title: Celebrity Voice Match
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.9.0
app_file: app.py
pinned: false
---

Find out your celebrity voice match with this Space.

How does it work?

We used a pre-trained speaker recognition model from SpeechBrain to extract audio embeddings from your voice and from the VoxCeleb1 audio dataset, which contains 100K+ utterances from 1200+ celebrities from around the world. The app compares the embedding of your voice with those from VoxCeleb using cosine similarity and returns your closest celebrity voice match (ref - https://lnkd.in/gybUUN3F).

Why does it matter?

Our app can be a starter for automating speaker tagging in interviews and podcasts, and for organizing audio/video archives by speaker. The embeddings are meant to capture the characteristics of a speaker's voice rather than the words being said, so the language spoken should have little effect on voice matching accuracy. Beyond these use cases, the core of our app - audio embeddings - is also an important building block for generative AI applications like text-to-speech and music creation.

Considerations while making the app?

First, the model - we used SpeechBrain because its speaker recognition model is trained on VoxCeleb, which makes it a strong fit for this dataset and task. Second, comparison of embeddings - we compare your voice against the centroid of each celebrity's embeddings, rather than clustering each celebrity's embeddings or comparing against every individual embedding, to keep the app computationally efficient and wait times low. Finally, visualization - we used PCA for dimensionality reduction because it is deterministic: a given audio input always produces the same visualization, which is desirable for reliable comparisons and a good user experience.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
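
The centroid comparison described above can be sketched in a few lines. This is a minimal illustration and not the code in `app.py`: the function names, the celebrity labels, and the 3-dimensional toy vectors are all hypothetical, and plain Python lists stand in for the real embeddings that the SpeechBrain model would produce.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(dim) / n for dim in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_celebrity(user_embedding, celebrity_embeddings):
    """Return the best-matching name and the score for every celebrity.

    celebrity_embeddings maps a name to a list of that person's
    utterance embeddings (hypothetical layout, assumed for this sketch).
    """
    # One centroid per celebrity, so each query costs one comparison
    # per celebrity instead of one per utterance.
    centroids = {
        name: centroid(vecs) for name, vecs in celebrity_embeddings.items()
    }
    scores = {
        name: cosine_similarity(user_embedding, c)
        for name, c in centroids.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two made-up celebrities with two utterance embeddings each.
toy_embeddings = {
    "celebrity_a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "celebrity_b": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]],
}
user = [0.8, 0.3, 0.0]

best_match, all_scores = closest_celebrity(user, toy_embeddings)
print(best_match, all_scores)
```

In a real deployment the centroids would presumably be computed once and cached rather than rebuilt on every query, which is where the efficiency gain over per-utterance comparison comes from. The sketch also assumes no embedding is the zero vector, since cosine similarity is undefined in that case.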