Pretrained Models Dependency
Amphion depends on the following pretrained models: Amphion Singing BigVGAN, Amphion Speech HiFi-GAN, Amphion DiffWave, ContentVec, WeNet, Whisper, and RawNet3.
Download instructions for each model follow.
Amphion Singing BigVGAN
We fine-tuned the official BigVGAN pretrained model on over 120 hours of singing voice data. The fine-tuned checkpoint can be downloaded here. Download the 400000.pt and args.json files into Amphion/pretrained/bigvgan:
Amphion
 ┣ pretrained
 ┃ ┣ bigvgan
 ┃ ┃ ┣ 400000.pt
 ┃ ┃ ┣ args.json
Amphion Speech HiFi-GAN
We trained our HiFi-GAN model on 685 hours of speech data. The checkpoint can be downloaded here. Download the whole hifigan_speech folder into Amphion/pretrained/hifigan:
Amphion
 ┣ pretrained
 ┃ ┣ hifigan
 ┃ ┃ ┣ hifigan_speech
 ┃ ┃ ┃ ┣ log
 ┃ ┃ ┃ ┣ result
 ┃ ┃ ┃ ┣ checkpoint
 ┃ ┃ ┃ ┣ args.json
Amphion DiffWave
We trained our DiffWave model on 125 hours of speech data and around 80 hours of singing voice data. The checkpoint can be downloaded here. Download the whole diffwave folder into Amphion/pretrained/diffwave:
Amphion
 ┣ pretrained
 ┃ ┣ diffwave
 ┃ ┃ ┣ diffwave_speech
 ┃ ┃ ┃ ┣ samples
 ┃ ┃ ┃ ┣ checkpoint
 ┃ ┃ ┃ ┣ args.json
ContentVec
You can download the pretrained ContentVec model here. Note that we use the ContentVec_legacy-500 classes checkpoint. Assuming you download checkpoint_best_legacy_500.pt into Amphion/pretrained/contentvec, the file structure is:
Amphion
 ┣ pretrained
 ┃ ┣ contentvec
 ┃ ┃ ┣ checkpoint_best_legacy_500.pt
WeNet
You can download the pretrained WeNet model here. Taking the wenetspeech pretrained checkpoint as an example, assume you download wenetspeech_u2pp_conformer_exp.tar.gz into Amphion/pretrained/wenet. Extract it and modify its configuration file as follows:
cd Amphion/pretrained/wenet
# Extract the experiment directory
tar -xvf wenetspeech_u2pp_conformer_exp.tar.gz
# Specify the updated path in train.yaml
cd 20220506_u2pp_conformer_exp
vim train.yaml
# TODO: Change the value of "cmvn_file" (Line 2) to the absolute path of the `global_cmvn` file. (E.g., [YourPath]/Amphion/pretrained/wenet/20220506_u2pp_conformer_exp/global_cmvn)
The final file structure is:
Amphion
 ┣ pretrained
 ┃ ┣ wenet
 ┃ ┃ ┣ 20220506_u2pp_conformer_exp
 ┃ ┃ ┃ ┣ final.pt
 ┃ ┃ ┃ ┣ global_cmvn
 ┃ ┃ ┃ ┣ train.yaml
 ┃ ┃ ┃ ┣ units.txt
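The manual edit of train.yaml described above can also be done non-interactively. The following is a minimal sketch, assuming train.yaml contains a single cmvn_file: key on its own line and that the directory path contains no "|" or "&" characters. The name set_cmvn_path is a hypothetical helper, not part of Amphion or WeNet.

```shell
# Hypothetical helper: point "cmvn_file" in train.yaml at the absolute
# path of the global_cmvn file that sits in the same directory.
set_cmvn_path() {
    exp_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    yaml="$exp_dir/train.yaml"
    [ -f "$yaml" ] || { echo "missing $yaml" >&2; return 1; }
    # Rewrite only the cmvn_file line. Writing to a temporary file avoids
    # the GNU/BSD differences of `sed -i`.
    sed "s|^\([[:space:]]*cmvn_file:\).*|\1 $exp_dir/global_cmvn|" "$yaml" > "$yaml.tmp" \
        && mv "$yaml.tmp" "$yaml"
}

# Usage (run from the directory that contains Amphion):
#   set_cmvn_path Amphion/pretrained/wenet/20220506_u2pp_conformer_exp
```

Resolving the directory with cd and pwd keeps the written path absolute even when the helper is called with a relative path.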
Whisper
The official pretrained Whisper checkpoints are available here. Amphion uses the medium Whisper model by default. You can download it as follows:
cd Amphion/pretrained
mkdir whisper
cd whisper
wget https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt
The final file structure is:
Amphion
 ┣ pretrained
 ┃ ┣ whisper
 ┃ ┃ ┣ medium.pt
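Because medium.pt is a large download, it can be worth checking its integrity afterwards. The long hexadecimal component in the download URL above appears to be the SHA-256 digest of the file (the whisper package compares its own downloads against that URL component). The sketch below works under that assumption; verify_sha256 is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the file's SHA-256 digest matches.
verify_sha256() {
    file=$1
    expected=$2
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | cut -d ' ' -f 1)    # GNU coreutils
    else
        actual=$(shasum -a 256 "$file" | cut -d ' ' -f 1) # e.g. macOS
    fi
    [ "$actual" = "$expected" ]
}

# Usage, with the digest copied from the download URL:
#   verify_sha256 Amphion/pretrained/whisper/medium.pt \
#       345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1 \
#       || echo "medium.pt is incomplete or corrupted" >&2
```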
RawNet3
The official pretrained RawNet3 checkpoints are available here. Download the model.pt file and put it in the Amphion/pretrained/rawnet3 folder.
The final file structure is:
Amphion
 ┣ pretrained
 ┃ ┣ rawnet3
 ┃ ┃ ┣ model.pt
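Once the models are in place, the layouts shown in the trees above can be checked with a short script instead of by eye. This is a minimal sketch; check_pretrained is a hypothetical helper, and the paths in the usage comment are taken from the trees in this document.

```shell
# Hypothetical helper: report every expected path that is missing under a
# root directory; succeeds only if all of them exist.
check_pretrained() {
    root=$1
    shift
    missing=0
    for rel in "$@"; do
        if [ ! -e "$root/$rel" ]; then
            echo "missing: $root/$rel" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Usage (run from the directory that contains Amphion):
#   check_pretrained Amphion/pretrained \
#       bigvgan/400000.pt bigvgan/args.json \
#       hifigan/hifigan_speech/args.json \
#       diffwave/diffwave_speech/args.json \
#       contentvec/checkpoint_best_legacy_500.pt \
#       wenet/20220506_u2pp_conformer_exp/final.pt \
#       whisper/medium.pt \
#       rawnet3/model.pt
```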
(Optional) Model Dependencies for Evaluation
When using Amphion's evaluation pipelines, machines without access to huggingface.co may encounter errors such as "OSError: Can't load tokenizer for ...". To work around this, the models that the evaluation depends on can be downloaded in advance and stored here, in Amphion/pretrained. Then follow this README to configure your environment to load local models.
Amphion's evaluation pipeline depends on the following models (sorted alphabetically): bert-base-uncased, facebook/bart-base, roberta-base, and wavlm.
Download instructions for each model follow.
bert-base-uncased
To load bert-base-uncased locally, follow this link to download all files of the bert-base-uncased model and store them under Amphion/pretrained/bert-base-uncased, conforming to the following file structure:
Amphion
 ┣ pretrained
 ┃ ┣ bert-base-uncased
 ┃ ┃ ┣ config.json
 ┃ ┃ ┣ coreml
 ┃ ┃ ┃ ┣ fill-mask
 ┃ ┃ ┃ ┣ float32_model.mlpackage
 ┃ ┃ ┃ ┣ Data
 ┃ ┃ ┃ ┣ com.apple.CoreML
 ┃ ┃ ┃ ┣ model.mlmodel
 ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┣ LICENSE
 ┃ ┃ ┣ model.onnx
 ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┣ README.md
 ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┣ tokenizer_config.json
 ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┣ vocab.txt
facebook/bart-base
To load facebook/bart-base locally, follow this link to download all files of the facebook/bart-base model and store them under Amphion/pretrained/facebook/bart-base, conforming to the following file structure:
Amphion
 ┣ pretrained
 ┃ ┣ facebook
 ┃ ┃ ┣ bart-base
 ┃ ┃ ┃ ┣ config.json
 ┃ ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┃ ┣ gitattributes.txt
 ┃ ┃ ┃ ┣ merges.txt
 ┃ ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┃ ┣ README.txt
 ┃ ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┃ ┣ vocab.json
roberta-base
To load roberta-base locally, follow this link to download all files of the roberta-base model and store them under Amphion/pretrained/roberta-base, conforming to the following file structure:
Amphion
 ┣ pretrained
 ┃ ┣ roberta-base
 ┃ ┃ ┣ config.json
 ┃ ┃ ┣ dict.txt
 ┃ ┃ ┣ flax_model.msgpack
 ┃ ┃ ┣ gitattributes.txt
 ┃ ┃ ┣ merges.txt
 ┃ ┃ ┣ model.safetensors
 ┃ ┃ ┣ pytorch_model.bin
 ┃ ┃ ┣ README.txt
 ┃ ┃ ┣ rust_model.ot
 ┃ ┃ ┣ tf_model.h5
 ┃ ┃ ┣ tokenizer.json
 ┃ ┃ ┣ vocab.json
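The bert-base-uncased, facebook/bart-base, and roberta-base models all follow the same pattern: every file of the model goes under Amphion/pretrained/ followed by the model name. On a machine that does have network access, the download could be scripted roughly as below. This is a sketch under two assumptions: that the huggingface-cli tool from the huggingface_hub package is installed, and that the repository IDs on huggingface.co match the model names used here. The names fetch_hf_model, PRETRAINED_DIR, and DRY_RUN are hypothetical.

```shell
# Hypothetical helper: download one Hugging Face model repository into
# <PRETRAINED_DIR>/<repo_id>. Set DRY_RUN=1 to only print the command.
fetch_hf_model() {
    repo_id=$1
    dest="${PRETRAINED_DIR:-Amphion/pretrained}/$repo_id"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "huggingface-cli download $repo_id --local-dir $dest"
        return 0
    fi
    mkdir -p "$dest" && huggingface-cli download "$repo_id" --local-dir "$dest"
}

# Usage (run from the directory that contains Amphion):
#   for m in bert-base-uncased facebook/bart-base roberta-base; do
#       fetch_hf_model "$m"
#   done
```

The downloaded directory can then be copied to the machine without access to huggingface.co. The exact set of files may differ from the trees above, since repositories change over time.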
wavlm
The official pretrained wavlm checkpoints are available here. The file structure is as follows:
Amphion
 ┣ wavlm
 ┃ ┣ config.json
 ┃ ┣ preprocessor_config.json
 ┃ ┣ pytorch_model.bin