AstraMindAI/xttsv2

GiacomoLeoneMaria committed on Nov 21, 2024

Commit 9c66d96 (verified) · 1 parent: 3daa1cc

Create README.md

README.md +160 -0

	@@ -0,0 +1,160 @@

+---
+license: apache-2.0
+base_model:
+- coqui/XTTS-v2
+---
+# Auralis 🌌
+## Model Details 🛠️
+**Model Name:** Auralis
+**Model Architecture:** Based on [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2)
+**License:**
+- license: Apache 2.0
+- base_model: XTTS-v2 components are covered by the [Coqui AI License](https://coqui.ai/cpml)
+**Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi
+**Developed by:** [AstraMind.ai](https://www.astramind.ai)
+**GitHub:** [AstraMind AI](https://github.com/astramind-ai/Auralis/tree/main)
+**Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks.
+---
+## Model Description 🚀
+Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling.
+### Key Features:
+- **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes.
+- **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090.
+- **Scalable:** Handles multiple requests simultaneously.
+- **Streaming:** Seamlessly processes long texts in a streaming format.
+- **Custom Voices:** Enables voice cloning from short reference audio.
+---
+## Quick Start ⭐
+```python
+from auralis import TTS, TTSRequest
+# Initialize the model
+tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")
+# Create a TTS request
+request = TTSRequest(
+    text="Hello Earth! This is Auralis speaking.",
+    speaker_files=["reference.wav"]
+)
+# Generate speech
+output = tts.generate_speech(request)
+output.save("output.wav")
+```
+---
+## Ebook Generation 📚
+Auralis converts ebooks to audio at high speed. For a complete Python script, see [vocalize_a_ebook.py](https://github.com/astramind-ai/Auralis/blob/main/examples/vocalize_a_ebook.py).
+```python
+from auralis import TTSRequest, AudioPreprocessingConfig
+# Assumes `tts` is the TTS instance created in Quick Start
+def process_book(chapter_file: str, speaker_file: str):
+    # Read the chapter text
+    with open(chapter_file, 'r', encoding='utf-8') as f:
+        chapter = f.read()
+    # You can pass the whole book; Auralis takes care of splitting
+    request = TTSRequest(
+        text=chapter,
+        speaker_files=[speaker_file],
+        audio_config=AudioPreprocessingConfig(
+            enhance_speech=True,
+            normalize=True
+        )
+    )
+    # Generate the audio, play it back, and save it to disk
+    output = tts.generate_speech(request)
+    output.play()
+    output.save("chapter_output.wav")
+# Example usage
+process_book("chapter1.txt", "reference_voice.wav")
+```
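The comment in the example above says that Auralis splits long input on its own. This README does not show how the library does that, so the following is only a minimal sketch of one way sentence-aware chunking could work. The helper `split_text` and its `max_chars` limit are hypothetical names chosen for illustration and are not part of the Auralis API.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Hypothetical helper: greedily pack whole sentences into chunks of at most max_chars."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # The sentence fits, or the chunk is still empty
            # (a single over-long sentence is kept whole rather than cut mid-word).
            current = candidate
        else:
            # Close the current chunk and start a new one with this sentence.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_text("First sentence. Second one! A third? Yes.", max_chars=30)
print(chunks)
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each chunk a natural unit of speech, which is one plausible reason a TTS engine would chunk this way.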
+---
+## Intended Use 🌟
+Auralis is designed for:
+- **Content Creators:** Generate audiobooks, podcasts, or voiceovers.
+- **Developers:** Integrate TTS into applications via a simple Python API.
+- **Accessibility:** Provide audio versions of digital content for people with visual impairments or reading difficulties.
+- **Multilingual Scenarios:** Convert text to speech in multiple supported languages.
+---
+## Performance 📊
+**Benchmarks on NVIDIA RTX 3090:**
+- Short phrases (<100 characters): ~1 second
+- Medium texts (<1,000 characters): ~5-10 seconds
+- Full books (~100,000 characters): ~10 minutes
+**Memory Usage:**
+- Base VRAM: ~4GB
+- Peak VRAM: ~10GB
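As a rough reading of the benchmark figures above, the implied throughput can be worked out directly. This is plain arithmetic on the numbers quoted in this README, not a new measurement. The input sizes for short and medium texts are the stated upper bounds, and the variable names are illustrative.

```python
# Figures quoted above for an NVIDIA RTX 3090 (all approximate).
benchmarks = {
    "short phrase": (100, 1.0),           # (characters, seconds)
    "medium text": (1_000, 10.0),         # upper end of the ~5-10 second range
    "full book": (100_000, 10 * 60.0),    # ~10 minutes
}

# Characters synthesized per second of wall-clock time.
throughput = {name: chars / secs for name, (chars, secs) in benchmarks.items()}

for name, cps in throughput.items():
    print(f"{name}: ~{cps:.0f} characters/second")
```

On these figures the full-book case works out to roughly 167 characters per second, somewhat higher than the short and medium cases.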
+---
+## Model Features 🛸
+1. **Speed & Efficiency:**
+   - Smart batching for rapid processing of long texts.
+   - Memory-optimized for consumer GPUs.
+2. **Easy Integration:**
+   - Python API with support for synchronous and asynchronous workflows.
+   - Streaming mode for continuous playback during generation.
+3. **Audio Quality Enhancements:**
+   - Background noise reduction.
+   - Voice clarity and volume normalization.
+   - Customizable audio preprocessing.
+4. **Multilingual Support:**
+   - Automatic language detection.
+   - High-quality speech in the 17 supported languages listed under Model Details.
+5. **Customization:**
+   - Voice cloning using short reference clips.
+   - Adjustable parameters for tone, pacing, and language.
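The asynchronous workflow mentioned in the list above can be pictured with the standard `asyncio` module. The sketch below does not call Auralis: `fake_synthesize` is a hypothetical stand-in for an asynchronous TTS call, used only to show how several requests can be awaited concurrently while keeping their results in input order.

```python
import asyncio


async def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for an async TTS call; returns dummy bytes instead of audio."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return text.encode("utf-8")


async def synthesize_all(texts: list[str]) -> list[bytes]:
    # gather() schedules the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_synthesize(t) for t in texts))


results = asyncio.run(synthesize_all(["Hello", "Hola", "Bonjour"]))
print(results)
```

In a real integration the stand-in would be replaced by whatever asynchronous entry point the library exposes. The pattern of gathering independent requests stays the same.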
+---
+## Limitations & Ethical Considerations ⚠️
+- **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use responsibly and ensure proper consent.
+- **Accent Limitations:** Although Auralis is robust across many languages, accent and intonation may vary depending on the input.
+---
+## Citation 📜
+If you use Auralis in your research or projects, please cite:
+```bibtex
+@misc{auralis2024,
+  author = {AstraMind AI},
+  title = {Auralis: High-Performance Text-to-Speech Engine},
+  year = {2024},
+  url = {https://huggingface.co/AstraMindAI/auralis}
+}
+```