Spaces: sanchit-gandhi/parler-tts-streaming

sanchit-gandhi committed on Apr 24, 2024

Commit 33d12bd (1 parent: 80ca0fc)

html

Files changed (1)

app.py CHANGED

@@ -325,8 +325,8 @@ with gr.Blocks(css=css) as block:
         <p>Tips for ensuring good generation:
         <ul>
-            <li>Include the term "very clear audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise</li>
-            <li>When using the fine-tuned model, include the term "Jenny" to pick out her voice</li>
+            <li>Include the term <b>"very clear audio"</b> to generate the highest quality audio, and "very noisy audio" for high levels of background noise</li>
+            <li>When using the fine-tuned model, include the term <b>"Jenny"</b> to pick out her voice</li>
             <li>Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech</li>
             <li>The remaining speech features (gender, speaking rate, pitch and reverberation) can be controlled directly through the prompt</li>
         </ul>
@@ -368,9 +368,8 @@ with gr.Blocks(css=css) as block:
         <p>To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data to 50k hours of speech.
         The v1 release of the model will be trained on this data, as well as inference optimisations, such as flash attention
         and torch compile, that will improve the latency by 2-4x. If you want to find out more about how this model was trained and even fine-tune it yourself, check-out the
-        <a href="https://github.com/huggingface/parler-tts"> Parler-TTS</a> repository on GitHub.</p>
-        <p>The Parler-TTS codebase and its associated checkpoints are licensed under <a href='https://github.com/huggingface/parler-tts?tab=Apache-2.0-1-ov-file#readme'> Apache 2.0</a>.</p>
+        <a href="https://github.com/huggingface/parler-tts"> Parler-TTS</a> repository on GitHub. The Parler-TTS codebase and its
+        associated checkpoints are licensed under <a href='https://github.com/huggingface/parler-tts?tab=Apache-2.0-1-ov-file#readme'> Apache 2.0</a>.</p>
         """
     )
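The prompting tips touched by this diff can be illustrated with a small sketch. `VoiceDescription` and `build_description` are hypothetical helpers, not part of the Parler-TTS library or of this app.py; they only show one way the keywords from the tips ("very clear audio" / "very noisy audio", "Jenny") and the prompt-controllable features (gender, speaking rate, pitch, reverberation) might be combined into a single description string. The exact phrasing the model responds to best is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VoiceDescription:
    """Hypothetical container for the speech features the tips say are prompt-controllable."""
    gender: str = "female"                       # e.g. "male" or "female"
    speaking_rate: str = "moderate"              # e.g. "slow", "moderate", "fast"
    pitch: str = "moderate"                      # e.g. "low", "moderate", "high"
    reverberation: str = "very close-sounding"   # wording for the room/reverb
    clear_audio: bool = True                     # True -> "very clear audio", False -> "very noisy audio"
    use_jenny: bool = False                      # only meaningful for the fine-tuned "Jenny" model


def build_description(v: VoiceDescription) -> str:
    """Assemble a natural-language voice description (hypothetical helper, illustrative wording)."""
    # The tips say to name "Jenny" to pick out her voice with the fine-tuned model.
    speaker = "Jenny" if v.use_jenny else f"A {v.gender} speaker"
    # The tips say "very clear audio" targets the highest quality, "very noisy audio" adds background noise.
    quality = "very clear audio" if v.clear_audio else "very noisy audio"
    return (
        f"{speaker} speaks at a {v.speaking_rate} rate with a {v.pitch}-pitched voice, "
        f"in a {v.reverberation} environment with {quality}."
    )


if __name__ == "__main__":
    print(build_description(VoiceDescription()))
    print(build_description(VoiceDescription(use_jenny=True, speaking_rate="fast")))
```

In Parler-TTS the voice description and the text to be spoken are separate inputs, so the punctuation tip (commas for short pauses) applies to the spoken text rather than to the string built here.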