Herta Voice Changer
Introduction
This AI model is based on SoftVC VITS Singing Voice Conversion; refer to the 4.0 branch of its GitHub repository. The model was inspired by Herta from Honkai Star Rail and can be used to convert the original voice in an audio file into this character's voice.
How to Prepare Audio Files
Your audio files should be shorter than 10 seconds, have no BGM, and have a sampling rate of 44100 Hz.
- Create a new folder inside the dataset_raw folder (this folder name will be your SpeakerID).
- Put your audio files into the folder you created above.
Note:
- Your audio files should be in .wav format.
- If your audio files are longer than 10 seconds, I suggest you trim them down using your desired software or audio slicer GUI.
- If your audio files have BGM, please remove it using a program such as Ultimate Vocal Remover. The 3_HP-Vocal-UVR.pth or UVR-MDX-NET Main model is recommended.
- If your audio files have a sampling rate different from 44100 Hz, I suggest you resample them using Audacity or by running python resample.py in your CMD.
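The requirements above (.wav format, shorter than 10 seconds, 44100 Hz) can be checked before training. The following is a minimal sketch using only the Python standard library; check_wav and check_folder are hypothetical helper names, not part of so-vits-svc, and the sketch assumes uncompressed PCM .wav files, which is what the wave module can read. It cannot detect BGM.

```python
import wave
from pathlib import Path

MAX_SECONDS = 10.0   # length limit stated in this guide
TARGET_RATE = 44100  # sampling rate stated in this guide


def check_wav(path):
    """Return a list of problems found in one audio file (empty if none)."""
    path = Path(path)
    if path.suffix.lower() != ".wav":
        return [f"{path.name}: not a .wav file"]
    problems = []
    # The standard-library wave module reads uncompressed PCM .wav files.
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if rate != TARGET_RATE:
        problems.append(f"{path.name}: sampling rate is {rate} Hz, expected {TARGET_RATE} Hz")
    if seconds >= MAX_SECONDS:
        problems.append(f"{path.name}: {seconds:.1f} s long, should be shorter than 10 s")
    return problems


def check_folder(folder):
    """Check every file in a speaker folder and return all problems found."""
    problems = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            problems.extend(check_wav(path))
    return problems
```

Calling check_folder on your speaker folder inside dataset_raw returns an empty list when every file meets the three requirements.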
How to Build Locally
- Clone the repository from the 4.0 branch: git clone -b 4.0 https://github.com/svc-develop-team/so-vits-svc.git
- Put your prepared audio into the dataset_raw folder.
- Open your Command Line and install the so-vits-svc-fork library: pip install -U so-vits-svc-fork
- Navigate to your project directory using the Command Line.
- Run svc pre-resample in your prompt.
- After completing the step above, run svc pre-config.
- After completing the step above, run svc pre-hubert. (This step may take a while.)
- After completing the step above, run svc train -t. (This step will take a while, depending on your GPU and the number of epochs you want.)
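Because each svc step depends on the one before it, the four commands can also be driven from a small script that stops at the first failure. This is only a sketch: run_pipeline is a hypothetical helper, it assumes the svc command is already installed and on your PATH, and it defaults to a dry run that executes nothing.

```python
import subprocess

# Commands taken from the steps above, in the order they must run.
PIPELINE = [
    ["svc", "pre-resample"],
    ["svc", "pre-config"],
    ["svc", "pre-hubert"],
    ["svc", "train", "-t"],
]


def run_pipeline(project_dir, dry_run=True):
    """Run each step inside project_dir, stopping at the first failure.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    executed = []
    for command in PIPELINE:
        if not dry_run:
            # check=True raises CalledProcessError if a step exits with an error.
            subprocess.run(command, cwd=project_dir, check=True)
        executed.append(" ".join(command))
    return executed
```

Call run_pipeline(".", dry_run=False) from the project directory once you are happy with the printed order.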
How to Change Epoch Value Locally
An epoch is one complete pass over your training data. For example, if you set the epoch value to 10000, training makes 10000 passes over your dataset before it finishes (the default epoch value is 10000). To change your epoch value:
- Go to your project folder.
- Find the folder named config.
- Inside that folder, you should see config.json.
- In config.json, there should be a section that looks like this:
"train": {
    "log_interval": 200,
    "eval_interval": 800,
    "seed": 1234,
    "epochs": <PUT YOUR VALUE HERE>,
    "learning_rate": 0.0001,
    "betas": [0.8, 0.99]
}
This can be done after svc pre-config has already finished.
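If you prefer not to edit the file by hand, the same change can be scripted. The sketch below assumes config.json is valid JSON with a top-level "train" section as shown above; set_epochs is a hypothetical helper, and the path you pass depends on where svc pre-config wrote the file in your project.

```python
import json
from pathlib import Path


def set_epochs(config_path, epochs):
    """Rewrite the "epochs" value in the "train" section of a config file."""
    if not isinstance(epochs, int) or epochs <= 0:
        raise ValueError("epochs must be a positive integer")
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Only this one key is changed; every other setting is written back as read.
    config["train"]["epochs"] = epochs
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config["train"]["epochs"]
```

For example, set_epochs("config/config.json", 500) would lower the value to 500, assuming that is where your config.json lives.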
How to Perform Inference Locally
To perform inference locally, navigate to the project directory, create a Python file, and copy the following lines of code:
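import io
import librosa
import soundfile
from inference.infer_tool import Svc  # provided by the cloned so-vits-svc repository

your_audio_file = 'your_audio.wav'
# Load the input as 16 kHz mono and pass it to the model as an in-memory WAV file.
audio, sr = librosa.load(your_audio_file, sr=16000, mono=True)
raw_path = io.BytesIO()
soundfile.write(raw_path, audio, 16000, format='wav')
raw_path.seek(0)
model = Svc('logs/44k/your_model.pth', 'logs/44k/config.json')
# Arguments: speaker ID, pitch shift (0 = no transposition), input audio.
out_audio, out_sr = model.infer('<YOUR SPEAKER ID>', 0, raw_path, auto_predict_f0=True)
soundfile.write('out_audio.wav', out_audio.cpu().numpy(), 44100)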
The output file is written to out_audio.wav in the directory from which you run the script.
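To have the result saved beside the input file under a name such as your_audio_out.wav, the output path can be derived from the input path. The helper below is a hypothetical addition, not part of so-vits-svc, and the "_out" suffix is just an illustrative default.

```python
from pathlib import Path


def output_path_for(input_path, suffix="_out"):
    """Return a path next to input_path, e.g. your_audio.wav -> your_audio_out.wav."""
    source = Path(input_path)
    # with_name keeps the parent directory and swaps only the file name.
    return source.with_name(f"{source.stem}{suffix}{source.suffix}")
```

Passing str(output_path_for(your_audio_file)) as the first argument of the final soundfile.write call would then place the converted audio in the same folder as the input.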
How to Build in Google Colab
Refer to My Google Colab or the Official Google Colab for the steps.
Google Drive Setup
- Create an empty folder (this will be your project folder).
- Inside the project folder, create a folder named dataset_raw.
- Create another folder inside dataset_raw (this folder name will be your SpeakerID).
- Upload your prepared audio files into the folder created in the previous step.
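The folder layout described above can also be created from a notebook cell once Google Drive is mounted. This is a minimal sketch; create_project_layout is a hypothetical helper, and the project path and speaker name you pass in are your own choices.

```python
from pathlib import Path


def create_project_layout(project_dir, speaker_id):
    """Create <project_dir>/dataset_raw/<speaker_id> and return that folder."""
    speaker_dir = Path(project_dir) / "dataset_raw" / speaker_id
    # parents=True creates the project and dataset_raw folders if they are missing;
    # exist_ok=True makes the call safe to repeat.
    speaker_dir.mkdir(parents=True, exist_ok=True)
    return speaker_dir
```

With Drive mounted at /content/drive, a call such as create_project_layout("/content/drive/MyDrive/your_model_name", "your_speaker_id") would prepare the folder into which you upload your audio files.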
Google Colab Setup
Mount your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Install dependencies:
!python -m pip install -U pip setuptools wheel
%pip install -U ipython
%pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu118
Install the so-vits-svc-fork library:
%pip install -U so-vits-svc-fork
Resample your audio files:
!svc pre-resample
Pre-config:
!svc pre-config
Pre-hubert (this step may take a while):
!svc pre-hubert
Train your model (this step will take a while based on your Google Colab GPU and the number of epochs you want):
!svc train -t
How to Change Epoch Value in Google Colab
The term "epoch" refers to one complete pass over your training data. For example, if you set the epoch value to 10,000, training makes 10,000 passes over your dataset before it completes (the default epoch value is 10,000).
To change the epoch value:
- Go to your project folder.
- Find the folder named config.
- Inside that folder, you should see config.json.
- In config.json, there should be a section that looks like this:
"train": {
    "log_interval": 200,
    "eval_interval": 800,
    "seed": 1234,
    "epochs": <PUT YOUR VALUE HERE>,
    "learning_rate": 0.0001,
    "betas": [0.8, 0.99]
}
This can be done after svc pre-config has already finished.
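When choosing an epoch value, it can help to estimate how many optimizer steps it implies, since one epoch usually covers many batches. The sketch below shows the arithmetic only; it assumes the final partial batch of each epoch is kept, which may not match what the trainer actually does, and the clip count and batch size in the usage example are made-up numbers.

```python
import math


def total_training_steps(num_clips, batch_size, epochs):
    """Estimate optimizer steps: batches per epoch times the number of epochs."""
    # One epoch visits every clip once, in groups of batch_size.
    steps_per_epoch = math.ceil(num_clips / batch_size)
    return steps_per_epoch * epochs
```

For instance, total_training_steps(200, 16, 100) gives 1300: 200 clips in batches of 16 make 13 batches per epoch, repeated for 100 epochs.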
How to Perform Inference in Google Colab
After training your model, you can use it to convert any original voice to your model voice by running the following command:
!svc infer drive/MyDrive/your_model_name/your_audio_file.wav --model-path drive/MyDrive/your_model_name/logs/44k/your_model.pth --config-path drive/MyDrive/your_model_name/logs/44k/your_config.json
The output file will be named your_audio_file.out.wav
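When converting several files, it can be convenient to build the command and the expected output name in Python. The sketch below only assembles the argument list shown above and applies the naming rule stated in this guide (.out inserted before the extension); both function names are hypothetical, and nothing is executed.

```python
from pathlib import Path


def build_infer_command(audio_path, model_path, config_path):
    """Return the svc infer command (as an argument list) for one audio file."""
    return [
        "svc", "infer", str(audio_path),
        "--model-path", str(model_path),
        "--config-path", str(config_path),
    ]


def expected_output_path(audio_path):
    """Return the output name described above: <name>.out<extension>."""
    audio = Path(audio_path)
    return audio.with_name(f"{audio.stem}.out{audio.suffix}")
```

The returned list can be passed to subprocess.run, or joined with spaces and prefixed with ! in a Colab cell.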
Note:
- Your Google Drive must have at least 5 GB of free space. If you don't have enough space, consider registering a new Google account.
- Google Colab's Free Subscription is sufficient, but using the Pro version is recommended.
- Set your Google Colab Hardware Accelerator to GPU.
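The free-space requirement can be checked from a notebook cell before training starts. This is a rough sketch using the standard library; the function names are hypothetical, and on a mounted Google Drive the reported figure may not reflect your actual Drive quota, so treat the result as a hint rather than a guarantee.

```python
import shutil

REQUIRED_GB = 5  # free space suggested in the note above


def free_space_gb(path):
    """Return free space, in gigabytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return True if at least required_gb gigabytes are free at path."""
    return free_space_gb(path) >= required_gb
```

For example, has_enough_space("/content/drive/MyDrive") could be called right after mounting Drive.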
Credits
- zomehwh/sovits-models from Hugging Face Space
- svc-develop-team/so-vits-svc from GitHub repository
- voicepaw/so-vits-svc-fork from GitHub repository