Audio file has no sound when written with scipy after running inference.py

by HyperBlaze - opened

Is this project based on MB-iSTFT-VITS? I set up the configs properly and tried to use the model with MB-iSTFT-VITS multilingual, but it gives me a model size mismatch error.
Why does this happen?

OpenDuckParty org

I would appreciate it if you could upload the error message and the config.json file together.

The JSON file is:
{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 1234,
    "epochs": 20000,
    "learning_rate": 2e-4,
    "betas": [0.8, 0.99],
    "eps": 1e-9,
    "batch_size": 64,
    "fp16_run": false,
    "lr_decay": 0.999875,
    "segment_size": 8192,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0,
    "fft_sizes": [384, 683, 171],
    "hop_sizes": [30, 60, 10],
    "win_lengths": [150, 300, 60],
    "window": "hann_window"
  },
  "data": {
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 0,
    "cleaned_text": true
  },
  "model": {
    "ms_istft_vits": true,
    "mb_istft_vits": false,
    "istft_vits": false,
    "subbands": 4,
    "gen_istft_n_fft": 16,
    "gen_istft_hop_size": 4,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates": [4, 4],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16, 16],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "use_sdp": false
  },
  "symbols": ["_", ",", ".", "!", "?", "-", "~", "\u2026", "A", "E", "I", "N", "O", "Q", "U", "a", "b", "d", "e", "f", "g", "h", "i", "j", "k", "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z", "\u0283", "\u02a7", "\u02a6", "\u2193", "\u2191", " "]
}
That is the file as it stands; I have not modified it. The output is:
Mutli-stream iSTFT VITS
Traceback (most recent call last):
  File "c:\Users\user\Desktop\AronaAssistant\arona_backend\TTS\inference.py", line 44, in <module>
    _ = utils.load_checkpoint("./models/arona/arona_ms_istft_vits.pth", net_g, None)
  File "c:\Users\user\Desktop\AronaAssistant\arona_backend\TTS\utils.py", line 40, in load_checkpoint
  File "F:\miniconda3\envs\arona\lib\site-packages\torch\nn\modules\module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SynthesizerTrn:
    size mismatch for enc_p.emb.weight: copying a param with shape torch.Size([43, 192]) from checkpoint, the shape in current model is torch.Size([205, 192]).

This is what I get.
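
The error reports that `enc_p.emb.weight` has 43 rows in the checkpoint but 205 in the model being built; the first dimension of a text embedding is normally the vocabulary size, so the two sides seem to disagree on the number of symbols. Below is a rough sketch of a check that lists such differences before loading. `find_shape_mismatches` is a hypothetical helper, not something from the repository, and the shapes are typed in by hand from the traceback above; with PyTorch they could presumably be collected from the checkpoint and model state dicts instead.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(checkpoint: Dict[str, Shape],
                          model: Dict[str, Shape]) -> List[str]:
    """Return one readable line per checkpoint parameter that will not load."""
    problems = []
    for name, ckpt_shape in checkpoint.items():
        model_shape = model.get(name)
        if model_shape is None:
            problems.append(f"{name}: missing from model")
        elif model_shape != ckpt_shape:
            problems.append(
                f"{name}: checkpoint {ckpt_shape} vs model {model_shape}"
            )
    return problems


# Shapes copied from the traceback (hand-entered, for illustration only).
checkpoint_shapes = {"enc_p.emb.weight": (43, 192)}
model_shapes = {"enc_p.emb.weight": (205, 192)}

report = find_shape_mismatches(checkpoint_shapes, model_shapes)
for line in report:
    print(line)
```

If only the embedding row count differs, the symbol list used to build the model is the first thing worth checking.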

OpenDuckParty org

Please also check that vits/text/symbols.py is set up correctly.

# japanese_cleaners2
_pad        = '_'
_punctuation = ',.!?-~…'
_letters = 'AEINOQUabdefghijkmnoprstuvwyzʃʧʦ↓↑ '

# Export all symbols:
symbols = [_pad] + list(_punctuation) + list(_letters)
# Special symbol ids
SPACE_ID = symbols.index(' ')
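
A quick way to confirm that this symbol table matches the checkpoint is to count its entries. This is a minimal sketch: `EXPECTED_VOCAB` is an illustrative name whose value is taken from the `torch.Size([43, 192])` in the error message, and the assumption is that the model's text embedding is sized from `len(symbols)`.

```python
# Symbol table from the japanese_cleaners2 snippet above.
_pad = '_'
_punctuation = ',.!?-~…'
_letters = 'AEINOQUabdefghijkmnoprstuvwyzʃʧʦ↓↑ '
symbols = [_pad] + list(_punctuation) + list(_letters)

# Rows of enc_p.emb.weight reported for the checkpoint in the traceback.
EXPECTED_VOCAB = 43

vocab_size = len(symbols)
# Duplicate symbols would make two characters share one embedding row.
has_duplicates = len(set(symbols)) != vocab_size

print(f"symbols: {vocab_size}, expected: {EXPECTED_VOCAB}, "
      f"duplicates: {has_duplicates}")
```

If the printed count is not 43, the file being imported is probably still defining a different symbol set.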

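I'm not sure why, but the file originally only had the Korean cleaner's symbols. After copying your snippet in as-is, it works correctly.
I had assumed the model itself was incompatible, so I was about to give up quickly and just copy the repo and use it directly, but it's resolved now.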

HyperBlaze changed discussion status to closed
