---
license: mit
language:
- en
pipeline_tag: text-to-speech
tags:
- audiocraft
- audiogen
- styletts2
- audio
- synthesis
- shift
- audeering
- dkounadis
- sound
- scene
- acoustic-scene
- audio-generation
---
# Affective TTS / SoundScapes
- [SHIFT TTS tool](https://github.com/audeering/shift)
- Analysis of emotionality [#1](https://huggingface.co/dkounadis/artificial-styletts2/discussions/2)
- Soundscape `e.g. trees, water` via [AudioGen](https://huggingface.co/dkounadis/artificial-styletts2/discussions/3)
- `landscape2soundscape.py` shows how to overlay TTS & soundscape audio on an image and create a video
## Available Voices
<a href="https://audeering.github.io/shift/">Listen to available voices!</a>
## Flask API
Clone this repo
```
git clone https://huggingface.co/dkounadis/artificial-styletts2
```
Build env
```
virtualenv --python=python3 ~/.envs/.my_env
source ~/.envs/.my_env/bin/activate
cd shift/
pip install -r requirements.txt
```
Start API
```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py
```
The following examples require `api.py` to be already running, e.g. in a `tmux` session.
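Since the examples depend on a running server, a quick reachability check can save a confusing traceback. The sketch below only tests whether something accepts TCP connections; the host and port are assumptions (`5000` is Flask's default development port, but `api.py` may bind elsewhere), and `api_is_reachable` is a hypothetical helper, not part of this repo.

```python
import socket


def api_is_reachable(host: str = "127.0.0.1", port: int = 5000,
                     timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    host/port are assumptions: adjust them to wherever api.py listens.
    """
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("api.py reachable:", api_is_reachable())
```

This does not verify that the listener is actually `api.py`, only that the port is open.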
## Landscape 2 Soundscape
```
# TTS & soundscape - overlay to .mp4
python landscape2soundscape.py
```
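To illustrate what "overlay" means for the audio part, here is a minimal sketch of mixing a soundscape under speech. It is not the code in `landscape2soundscape.py`; it assumes both signals are mono float samples in `[-1, 1]` at the same sample rate, and the function name `overlay` and the default `scene_gain` are hypothetical.

```python
from typing import List, Sequence


def overlay(speech: Sequence[float], scene: Sequence[float],
            scene_gain: float = 0.3) -> List[float]:
    """Mix `scene` under `speech`; the result has the length of `speech`."""
    if not speech:
        return []
    if not scene:
        return list(speech)
    mixed = [
        s + scene_gain * scene[i % len(scene)]  # loop the scene if it is shorter
        for i, s in enumerate(speech)
    ]
    peak = max(abs(x) for x in mixed)
    if peak > 1.0:
        # Rescale only when the sum would clip
        mixed = [x / peak for x in mixed]
    return mixed
```

Looping the soundscape keeps the background present for the full utterance; a real pipeline would more likely use array operations and a fade at the loop point.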
# YouTube Videos / Examples
Substitute Native voice via TTS
[![Native voice ANBPR video](assets/native_video_thumb.png)](https://www.youtube.com/watch?v=tmo2UbKYAqc)
##
The same video, with the native voice replaced by an English TTS voice of similar emotion
[![Same video w. Native voice replaced with English TTS](assets/tts_video_thumb.png)](https://www.youtube.com/watch?v=geI1Vqn4QpY)
<details>
<summary>
Video dubbing from subtitles `.srt`
</summary>
## Video Dubbing
[![Review demo SHIFT](assets/review_demo_thumb.png)](https://www.youtube.com/watch?v=bpt7rOBENcQ)
Generate dubbed video:
```
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4
```
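Dubbing from an `.srt` file relies on each subtitle carrying a start time, an end time and the text to speak. The sketch below shows one way to read those cues; it is an illustration under the usual SubRip layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text lines, blank line), not the parser used by `tts.py`. `Cue` and `parse_srt` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(content: str) -> List[Cue]:
    cues = []
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Locate the timing line; blocks without one are skipped
        timing = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->")
        end = end.strip().split()[0]  # drop optional position info
        text = " ".join(ln.strip() for ln in lines[timing + 1:] if ln.strip())
        cues.append(Cue(_to_seconds(start), _to_seconds(end), text))
    return cues
```

Each `Cue` gives the window in which the synthesized speech for `text` would have to fit.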
</details>
## Joint Application of D3.1 & D3.2
<a href="https://youtu.be/wWC8DpOKVvQ" rel="Subtitles to Video">![Foo4](assets/caption_to_video_thumb.png)</a>
Create a video from an image and text:
```
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```
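For readers curious how a still image and an audio track can become an `.mp4`, the sketch below builds a commonly used `ffmpeg` command line. It is not necessarily what `tts.py` does internally; it assumes `ffmpeg` with `libx264` is installed, and `still_video_cmd`, `speech.wav` and `out.mp4` are hypothetical names.

```python
from typing import List


def still_video_cmd(image: str, audio: str, out: str) -> List[str]:
    """Build an ffmpeg argument list that shows `image` while `audio` plays."""
    return [
        "ffmpeg", "-y",              # overwrite the output if it exists
        "-loop", "1", "-i", image,   # repeat the single image as video frames
        "-i", audio,
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",       # widely supported pixel format
        "-c:a", "aac",
        "-shortest",                 # stop when the audio ends
        out,
    ]


if __name__ == "__main__":
    cmd = still_video_cmd("assets/image_from_T31.jpg", "speech.wav", "out.mp4")
    print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to render the file. Note that `yuv420p` with `libx264` needs even image dimensions.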
## Landscape 2 Soundscape
```
# Loads image & text & sound-scene text and creates .mp4
python landscape2soundscape.py
```
For SHIFT demo / Collaboration with [SMB](https://www.smb.museum/home/)
- YouTube Videos
[![01](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____01_Schick_AII840_001.jpg)](https://youtu.be/SSi3gUO4GtY)
[![02](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____02_Constable_AI555_001.jpg)](https://youtu.be/2YjxAPkdXIc)
[![10](uc_spk_Landscape2Soundscape_Masterpieces_pics/thumb____10_Boecklin_967648_NG2-80_001_rsz.jpg)](https://www.youtube.com/watch?v=Y8QyYUgLaCg)
# Live Demo - Paplay
Flask
```
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=/data/dkounadis/.hf7/ CUDA_VISIBLE_DEVICES=4 python live_api.py
```
Client (Ubuntu)
```
python live_demo.py  # asks for text input & plays the soundscape
```