Update README.md
README.md
CHANGED

Removed in this update:

-### Demo Spaces
-Coming soon...
-
-- [DagsHub](https://dagshub.com) who sponsored us with their GPU compute (with special thanks to Dean!)
-- And the assistance from [camenduru](https://github.com/camenduru) on cloud infrastructure and model training
-
-Stay tuned for Vokan V2!

The updated file in full:

---
license: mit
datasets:
- ShoukanLabs/AniSpeech
- vctk
- blabble-io/libritts_r
language:
- en
pipeline_tag: text-to-speech
---
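
The metadata above lists the corpora this model was finetuned on, all of which are hosted on the Hugging Face Hub. As a rough, hedged sketch (the config and split names below are assumptions; check each dataset card before relying on them), two of the corpora can be pulled with the `datasets` library:

```python
# Rough sketch: loading two of the corpora named in the model-card metadata
# with the Hugging Face `datasets` library. Config/split names and column
# layout are assumptions taken from typical dataset cards; verify before use.
from datasets import load_dataset

anispeech = load_dataset("ShoukanLabs/AniSpeech", split="train")
libritts_r = load_dataset("blabble-io/libritts_r", "clean", split="train.clean.100")

print(len(anispeech), "AniSpeech clips;", len(libritts_r), "LibriTTS-R clips")
```
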

<style>

/* … earlier rules unchanged and elided in this diff view … */

  display: block;
}

audio {
  margin: 0.5rem;
}

.audio-container {
  display: flex;
  justify-content: center;
  align-items: center;
}

</style>

<hr>

<!-- … unchanged section elided in this diff view … -->

<hr>

<a href="https://discord.gg/5bq9HqVhsJ"><img src="https://img.shields.io/badge/find_us_at_the-ShoukanLabs_Discord-invite?style=flat-square&logo=discord&logoColor=%23ffffff&labelColor=%235865F2&color=%23ffffff" width="320" alt="discord"></a>
<!--<a align="left" style="font-size: 1.3rem; font-weight: bold; color: #5662f6;" href="https://discord.gg/5bq9HqVhsJ">find us on Discord</a>-->

**Vokan** is an advanced finetuned **StyleTTS2** model crafted for authentic and expressive zero-shot performance, designed to serve as a better base model for further finetuning.
It leverages a diverse dataset and extensive training to generate high-quality synthesized speech.
Trained on a combination of the AniSpeech, VCTK, and LibriTTS-R datasets, Vokan delivers authenticity and naturalness across a wide range of accents and contexts.
With more than six days of audio from 672 diverse and expressive speakers, Vokan captures a broad spectrum of vocal characteristics, which contributes to its strong performance.
Although it was trained on less data than the original StyleTTS2 base model, the breadth of accents and speakers enriches the model's vector space.
Training required significant computational resources: roughly 300 hours on a single H100, plus an additional 600 hours on a single RTX 3090.

You can read more in our article on [DagsHub](https://dagshub.com/blog/styletts2/)!
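
For a sense of how zero-shot synthesis works with a StyleTTS2-family checkpoint: a short reference clip is encoded into a style vector, which then conditions generation for arbitrary text. The sketch below is illustrative only; `load_model`, `compute_style`, and `inference` are hypothetical stand-ins for the upstream StyleTTS2 demo utilities ([yl4579/StyleTTS2](https://github.com/yl4579/StyleTTS2)), and the checkpoint filename is a placeholder, so browse this repo's files for the real ones.

```python
# Illustrative sketch of zero-shot inference with a StyleTTS2 finetune.
# `load_model`, `compute_style`, and `inference` are hypothetical helpers
# mirroring the upstream StyleTTS2 demo notebooks, and the checkpoint
# filename is a placeholder; neither is a published API of this repo.
import soundfile as sf
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(repo_id="ShoukanLabs/Vokan", filename="Vokan.pth")  # placeholder filename

model = load_model(ckpt)                               # build the nets and load the finetuned weights
style = compute_style(model, "reference_speaker.wav")  # style vector from a few seconds of reference audio
wav = inference(model, "Hello from Vokan!", style)     # synthesized waveform (StyleTTS2 outputs 24 kHz)

sf.write("vokan_demo.wav", wav, 24000)
```
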

<hr>
<p align="center" style="font-size: 2vw; font-weight: bold; color: #ff593e;">Vokan Samples!</p>
<div class='audio-container'>
  <div>
    <audio controls>
      <source src="https://dagshub.com/StyleTTS/Article/raw/74539c801ce3a894ec3df6b52fa2dd579637481d/demo%201.wav" type="audio/wav">
      Your browser does not support the audio element.
    </audio>
  </div>

  <div>
    <audio controls>
      <source src="https://dagshub.com/StyleTTS/Article/raw/74539c801ce3a894ec3df6b52fa2dd579637481d/demo%202.wav" type="audio/wav">
      Your browser does not support the audio element.
    </audio>
  </div>
</div>
<div class='audio-container'>
  <div>
    <audio controls>
      <source src="https://dagshub.com/StyleTTS/Article/raw/74539c801ce3a894ec3df6b52fa2dd579637481d/demo%203.wav" type="audio/wav">
      Your browser does not support the audio element.
    </audio>
  </div>
  <div>
    <audio controls>
      <source src="https://dagshub.com/StyleTTS/Article/raw/74539c801ce3a894ec3df6b52fa2dd579637481d/demo%204.wav" type="audio/wav">
      Your browser does not support the audio element.
    </audio>
  </div>
</div>
<hr>

<p align="center" style="font-size: 2vw; font-weight: bold; color: #ff593e;">Acknowledgements</p>

- **[DagsHub](https://dagshub.com):** Special thanks to DagsHub for sponsoring GPU compute and for their excellent versioning service, enabling efficient model training and development. A shoutout to Dean in particular!
- **[camenduru](https://github.com/camenduru):** Thanks to camenduru for their expertise in cloud infrastructure and model training, which played a crucial role in Vokan's development. Please give them a follow!

<p align="center" style="font-size: 2vw; font-weight: bold; color: #ff593e;">Conclusion</p>

V2 is currently in the works, aiming to be bigger and better in every way, including multilingual support!
This is where you come in: if you have any large single-speaker datasets you'd like to contribute, in any language,
you can add them to our **Vokan dataset**, a large **community dataset** that combines many smaller
single-speaker datasets into one big multispeaker corpus.
You can upload your Uberduck- or [FakeYou](https://fakeyou.com/)-compliant datasets via the
**[Vokan](https://huggingface.co/ShoukanLabs/Vokan)** bot on the **[ShoukanLabs Discord Server](https://discord.gg/hdVeretude)**.
The more data we have, the better the models we produce will be!
<hr>

<p align="center" style="font-size: 2vw; font-weight: bold; color: #ff593e;">Citations</p>

```citations
@misc{li2023styletts,

… (remaining citation entries unchanged and elided in this diff view) …

The Centre for Speech Technology Research (CSTR),
University of Edinburgh
```

<p align="center" style="font-size: 2vw; font-weight: bold; color: #ff593e;">License</p>

```
MIT
```