---
license: mit
language:
- en
pipeline_tag: text-to-speech
tags:
- audiocraft
- audiogen
- styletts2
- audio
- synthesis
- shift
- audeering
- dkounadis
- sound
- scene
- acoustic-scene
- audio-generation
---


# Affective TTS - SoundScape
  - Affective TTS via [SHIFT TTS tool](https://github.com/audeering/shift)
  - Soundscapes, e.g. trees, water, leaves, generated via [AudioGen](https://huggingface.co/dkounadis/artificial-styletts2/discussions/3)
  - `landscape2soundscape.py` shows how to overlay TTS & soundscapes onto images
  - `134` built-in voices

**Available Voices**

<a href="https://audeering.github.io/shift/">Listen to the available voices!</a>

**Flask API**

Install the requirements

```bash
virtualenv --python=python3 ~/.envs/.my_env
source ~/.envs/.my_env/bin/activate
cd shift/
pip install -r requirements.txt
```

Start the Flask server

```bash
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=./hf_home CUDA_VISIBLE_DEVICES=2 python api.py
```

## Inference

The following commands require `api.py` to be running, e.g. in a `tmux` session.

**Text 2 Speech**

```bash
# Basic TTS - See Available Voices
python tts.py --text sample.txt --voice "en_US/m-ailabs_low#mary_ann" --affective

# voice cloning
python tts.py --text sample.txt --native assets/native_voice.wav
```
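
The same CLI scales to several voices by shelling out to `tts.py` in a loop. A minimal sketch, assuming it is run from the repository root while `api.py` is up; the second voice name is an illustrative placeholder, not a confirmed entry of the built-in set:

```python
# Batch the documented tts.py call over several built-in voices.
import subprocess

voices = [
    "en_US/m-ailabs_low#mary_ann",  # voice from the example above
    "en_US/vctk_low#p228",          # placeholder - substitute any listed voice
]

for voice in voices:
    # Mirrors: python tts.py --text sample.txt --voice "<voice>" --affective
    subprocess.run(
        ["python", "tts.py", "--text", "sample.txt", "--voice", voice, "--affective"],
        check=True,
    )
```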

**Image 2 Video**

```bash
# Make a video narrating an image - all TTS args above apply here as well
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```

**Video 2 Video**

```bash
# Video Dubbing - from time-stamped subtitles (.srt)
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4

# Video narration - from text description (.txt)
python tts.py --text assets/head_of_fortuna_GPT.txt --video assets/head_of_fortuna.mp4
```
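
Dubbing follows the timing of the subtitle file, so each cue in the `.srt` maps to a start time, an end time, and the text to be spoken. The repo's own parsing lives inside `tts.py`; the snippet below is only a format illustration for standard SubRip files:

```python
# Minimal SubRip (.srt) reader: returns (start_sec, end_sec, text) per cue.
# Format illustration only - not the parser used by tts.py.
import re

TIMESTAMP = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)"
)

def read_srt(path):
    cues = []
    for block in open(path, encoding="utf-8").read().strip().split("\n\n"):
        match = TIMESTAMP.search(block)
        if not match:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = 3600 * h1 + 60 * m1 + s1 + ms1 / 1000
        end = 3600 * h2 + 60 * m2 + s2 + ms2 / 1000
        # Keep only the spoken lines (skip the cue index and the timestamp line).
        text = " ".join(
            line for line in block.splitlines()
            if "-->" not in line and not line.strip().isdigit()
        )
        cues.append((start, end, text.strip()))
    return cues

# e.g. read_srt("assets/head_of_fortuna_en.srt")
```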

**Landscape 2 Soundscape**

```bash
# TTS & soundscape - overlay to .mp4
python landscape2soundscape.py
```
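
Conceptually, the overlay mixes the AudioGen soundscape at a low gain beneath the TTS speech before pairing the audio with the image or video. A rough sketch of that mixing step, assuming two mono WAVs at the same sample rate; the file names and the 0.25 background gain are illustrative, not values from `landscape2soundscape.py`:

```python
# Illustrative mix of a soundscape under TTS speech (not the repo's implementation).
# Requires: pip install numpy soundfile
import numpy as np
import soundfile as sf

speech, sr = sf.read("tts_speech.wav")       # placeholder TTS output (mono)
scape, sr_bg = sf.read("soundscape.wav")     # placeholder AudioGen output (mono)
assert sr == sr_bg, "resample one of the files first if the rates differ"

# Loop or trim the soundscape to match the speech duration.
repeats = int(np.ceil(len(speech) / len(scape)))
scape = np.tile(scape, repeats)[: len(speech)]

mix = speech + 0.25 * scape                      # keep the soundscape in the background
mix = mix / max(1.0, float(np.abs(mix).max()))   # normalize only if it would clip
sf.write("speech_over_soundscape.wav", mix, sr)
```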

## Examples

Substituting the native voice via TTS

[![Native voice ANBPR video](assets/native_video_thumb.png)](https://www.youtube.com/watch?v=tmo2UbKYAqc)

The same video with the native voice replaced by an English TTS voice of similar emotion:


[![Same video w. Native voice replaced with English TTS](assets/tts_video_thumb.png)](https://www.youtube.com/watch?v=geI1Vqn4QpY)


<details>
<summary>

Video dubbing from `.srt` subtitles

</summary>

## Video Dubbing

[![Review demo SHIFT](assets/review_demo_thumb.png)](https://www.youtube.com/watch?v=bpt7rOBENcQ)

Generate dubbed video:


```bash
python tts.py --text assets/head_of_fortuna_en.srt --video assets/head_of_fortuna.mp4
```


</details>

## Joint Application of D3.1 & D3.2

<a href="https://youtu.be/wWC8DpOKVvQ" title="Subtitles to Video">![Subtitles-to-video demo](assets/caption_to_video_thumb.png)</a>


Create a video from an image and a text file:

```bash
python tts.py --text sample.txt --image assets/image_from_T31.jpg
```


# Live Demo - Paplay

Start the Flask server for the live demo

```bash
CUDA_DEVICE_ORDER=PCI_BUS_ID HF_HOME=/data/dkounadis/.hf7/ CUDA_VISIBLE_DEVICES=4 python live_api.py
```

Client (Ubuntu)

```bash
python live_demo.py  # will ask text input & play soundscape
```
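
The demo's title points at PulseAudio's `paplay` for playback. To audition any generated WAV directly on the same machine, shelling out to `paplay` is enough (the file name below is a placeholder):

```python
# Play a generated WAV through PulseAudio (placeholder file name).
import subprocess

subprocess.run(["paplay", "out.wav"], check=True)
```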