GiacomoLeoneMaria committed
Commit 9c66d96 · verified · 1 Parent(s): 3daa1cc

Create README.md

  1. README.md +160 -0
README.md ADDED
@@ -0,0 +1,160 @@
+ ---
+ license: apache-2.0
+ base_model:
+ - coqui/XTTS-v2
+ ---
+ # Auralis 🌌
+
+ ## Model Details 🛠️
+
+ **Model Name:** Auralis
+
+ **Model Architecture:** Based on [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2)
+
+ **License:**
+ - Auralis: Apache 2.0
+ - Base model (XTTS-v2 components): [Coqui AI License](https://coqui.ai/cpml)
+
+ **Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi
+
+ **Developed by:** [AstraMind.ai](https://www.astramind.ai)
+
+ **GitHub:** [AstraMind AI](https://github.com/astramind-ai/Auralis/tree/main)
+
+ **Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks.
+
+ ---
+
+ ## Model Description 🚀
+
+ Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling.
+
+ ### Key Features:
+ - **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes.
+ - **Hardware Friendly:** Requires <10 GB VRAM on a single NVIDIA RTX 3090.
+ - **Scalable:** Handles multiple requests simultaneously.
+ - **Streaming:** Seamlessly processes long texts in a streaming format.
+ - **Custom Voices:** Enables voice cloning from short reference audio.
+
+ ---
+
+ ## Quick Start ⭐
+
+ ```python
+ from auralis import TTS, TTSRequest
+
+ # Initialize the model
+ tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")
+
+ # Create a TTS request
+ request = TTSRequest(
+     text="Hello Earth! This is Auralis speaking.",
+     speaker_files=["reference.wav"]
+ )
+
+ # Generate speech
+ output = tts.generate_speech(request)
+ output.save("output.wav")
+ ```
+
+ ---
+
+ ## Ebook Generation 📚
+
+ Auralis converts ebooks to audio at high speed. For a complete Python script, see [vocalize_a_ebook.py](https://github.com/astramind-ai/Auralis/blob/main/examples/vocalize_a_ebook.py).
+
+ ```python
+ # Assumes AudioPreprocessingConfig is exported from the top-level auralis package
+ from auralis import TTS, TTSRequest, AudioPreprocessingConfig
+
+ # Same model as in the Quick Start
+ tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")
+
+ def process_book(chapter_file: str, speaker_file: str):
+     # Read chapter
+     with open(chapter_file, 'r') as f:
+         chapter = f.read()
+
+     # You can pass the whole book; Auralis will take care of splitting
+     request = TTSRequest(
+         text=chapter,
+         speaker_files=[speaker_file],
+         audio_config=AudioPreprocessingConfig(
+             enhance_speech=True,
+             normalize=True
+         )
+     )
+
+     output = tts.generate_speech(request)
+
+     output.play()
+     output.save("chapter_output.wav")
+
+ # Example usage
+ process_book("chapter1.txt", "reference_voice.wav")
+ ```
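
The comment in the example above says Auralis splits long text on its own. As a rough picture of what sentence-aware splitting can look like, here is a minimal sketch in plain Python. It is not Auralis's actual splitter: `split_into_chunks` and the character limit are hypothetical, chosen only for illustration.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Hypothetical helper for illustration. A single sentence longer than
    max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "First sentence. Second one is here! And a third? Yes."
    for chunk in split_into_chunks(sample, max_chars=30):
        print(repr(chunk))
```

Splitting on sentence boundaries rather than at a fixed character offset keeps each chunk a natural unit of speech, which is usually what a TTS front end wants.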
+
+ ---
+
+ ## Intended Use 🌟
+
+ Auralis is designed for:
+ - **Content Creators:** Generate audiobooks, podcasts, or voiceovers.
+ - **Developers:** Integrate TTS into applications via a simple Python API.
+ - **Accessibility:** Provide audio versions of digital content for people with visual impairments or reading difficulties.
+ - **Multilingual Scenarios:** Convert text to speech in multiple supported languages.
+
+ ---
+
+ ## Performance 📊
+
+ **Benchmarks on NVIDIA RTX 3090:**
+ - Short phrases (<100 characters): ~1 second
+ - Medium texts (<1,000 characters): ~5-10 seconds
+ - Full books (~100,000 characters): ~10 minutes
+
+ **Memory Usage:**
+ - Base VRAM: ~4 GB
+ - Peak VRAM: ~10 GB
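
To put the benchmark figures above in perspective, the full-book number implies a throughput of roughly 100,000 / 600 ≈ 167 characters per second. The sketch below turns that into a rough time estimate for other text lengths. It assumes throughput stays constant at the full-book rate, which is a simplification (short inputs are dominated by fixed overhead, as the ~1 second figure for short phrases suggests). `estimate_seconds` is a hypothetical helper, not part of Auralis.

```python
# Figures taken from the benchmark list above (NVIDIA RTX 3090).
BOOK_CHARS = 100_000
BOOK_SECONDS = 10 * 60

# Implied throughput, about 166.7 characters per second.
CHARS_PER_SECOND = BOOK_CHARS / BOOK_SECONDS


def estimate_seconds(num_chars: int) -> float:
    """Estimate generation time, assuming constant full-book throughput."""
    return num_chars / CHARS_PER_SECOND


if __name__ == "__main__":
    for n in (1_000, 100_000, 500_000):
        print(f"{n:>7} chars -> ~{estimate_seconds(n) / 60:.1f} min")
```

For 1,000 characters this gives about 6 seconds, which sits inside the ~5-10 second range quoted for medium texts.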
+
+ ---
+
+ ## Model Features 🛸
+
+ 1. **Speed & Efficiency:**
+    - Smart batching for rapid processing of long texts.
+    - Memory-optimized for consumer GPUs.
+
+ 2. **Easy Integration:**
+    - Python API with support for synchronous and asynchronous workflows.
+    - Streaming mode for continuous playback during generation.
+
+ 3. **Audio Quality Enhancements:**
+    - Background noise reduction.
+    - Voice clarity and volume normalization.
+    - Customizable audio preprocessing.
+
+ 4. **Multilingual Support:**
+    - Automatic language detection.
+    - High-quality speech in 17 languages.
+
+ 5. **Customization:**
+    - Voice cloning using short reference clips.
+    - Adjustable parameters for tone, pacing, and language.
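
The asynchronous workflows and concurrent request handling listed above can be pictured with a small `asyncio` sketch. To stay runnable without a GPU or the model, it uses a stand-in coroutine, `fake_generate`, in place of a real TTS call. That function, its simulated latency, and `generate_all` are assumptions made for illustration and do not reflect Auralis's actual async API.

```python
import asyncio


async def fake_generate(text: str) -> str:
    """Stand-in for an asynchronous TTS call (hypothetical, produces no audio)."""
    await asyncio.sleep(0.01)  # simulate generation latency
    return f"audio<{text}>"


async def generate_all(texts: list[str], max_concurrent: int = 2) -> list[str]:
    """Run requests concurrently, capped by a semaphore; results keep input order."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(text: str) -> str:
        # At most max_concurrent workers are inside this block at once.
        async with semaphore:
            return await fake_generate(text)

    return await asyncio.gather(*(worker(t) for t in texts))


if __name__ == "__main__":
    results = asyncio.run(generate_all(["Hello", "Ciao", "Hola"]))
    print(results)
```

Capping concurrency with a semaphore is one common way to keep many queued requests from exhausting GPU memory, while `asyncio.gather` returns results in the order the texts were submitted.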
+
+ ---
+
+ ## Limitations & Ethical Considerations ⚠️
+
+ - **Voice Cloning Risks:** Auralis supports voice cloning, which can be misused. Use it responsibly and obtain proper consent before cloning a voice.
+ - **Accent Limitations:** Although Auralis is robust across many languages, accent and intonation may vary depending on the input.
+
+ ---
+
+ ## Citation 📜
+
+ If you use Auralis in your research or projects, please cite:
+
+ ```bibtex
+ @misc{auralis2024,
+   author = {AstraMind AI},
+   title = {Auralis: High-Performance Text-to-Speech Engine},
+   year = {2024},
+   url = {https://huggingface.co/AstraMindAI/auralis}
+ }
+ ```