Tags: PyTorch, ONNX, vocoder, vocos, hifigan, tts, mel
Commit f5c97ea (verified), committed by wetdog · 1 parent: 5c7d9f6

Upload model

Files changed (2)
  1. config.yaml +107 -0
  2. pytorch_model.bin +3 -0
config.yaml ADDED
@@ -0,0 +1,107 @@
+ # pytorch_lightning==1.8.6
+ seed_everything: 4444
+
+ data:
+   class_path: vocos.dataset.VocosDataModule
+   init_args:
+     train_params:
+       filelist_path: ???
+       sampling_rate: 22050
+       num_samples: 16384
+       batch_size: 16
+       num_workers: 8
+
+     val_params:
+       filelist_path: ???
+       sampling_rate: 22050
+       num_samples: 48384
+       batch_size: 16
+       num_workers: 8
+
+ model:
+   class_path: vocos.experiment.VocosExp
+   init_args:
+     sample_rate: 22050
+     initial_learning_rate: 1e-3
+     mel_loss_coeff: 45
+     mrd_loss_coeff: 0.1  # original value 0.1
+     num_warmup_steps: 500  # Optimizers warmup steps
+     pretrain_mel_steps: 0  # 0 means GAN objective from the first iteration
+
+     # automatic evaluation
+     evaluate_utmos: true
+     evaluate_pesq: true
+     evaluate_periodicty: true
+
+     feature_extractor:
+       class_path: vocos.feature_extractors.MelSpectrogramFeatures
+       init_args:
+         sample_rate: 22050
+         n_fft: 1024
+         hop_length: 256
+         n_mels: 80
+         padding: same
+         f_min: 0
+         f_max: 8000
+         norm: "slaney"
+         mel_scale: "slaney"
+         clip_val: 1e-5
+
+
+     backbone:
+       class_path: vocos.models.VocosBackbone
+       init_args:
+         input_channels: 80
+         dim: 512
+         intermediate_dim: 1536
+         num_layers: 8
+
+     head:
+       class_path: vocos.heads.WaveNextHead
+       init_args:
+         dim: 512
+         n_fft: 1024
+         hop_length: 256
+         padding: same
+
+     melspec_loss:
+       class_path: vocos.loss.MelSpecReconstructionLoss
+       init_args:
+         sample_rate: 22050
+         n_fft: 1024
+         hop_length: 256
+         n_mels: 128
+         f_min: 0
+         f_max: 11000
+         norm: "slaney"
+         mel_scale: "slaney"
+         clip_val: 1e-5
+
+
+ trainer:
+   logger:
+     class_path: pytorch_lightning.loggers.TensorBoardLogger
+     init_args:
+       save_dir: ???
+   callbacks:
+     - class_path: pytorch_lightning.callbacks.LearningRateMonitor
+     - class_path: pytorch_lightning.callbacks.ModelSummary
+       init_args:
+         max_depth: 2
+     - class_path: pytorch_lightning.callbacks.ModelCheckpoint
+       init_args:
+         monitor: val_loss
+         filename: vocos_checkpoint_{epoch}_{step}_{val_loss:.4f}
+         save_top_k: 3
+         save_last: true
+     - class_path: vocos.helpers.GradNormCallback
+
+   # Lightning calculates max_steps across all optimizer steps (rather than number of batches)
+   # This equals to 1M steps per generator and 1M per discriminator
+   max_steps: 2000000
+   # You might want to limit val batches when evaluating all the metrics, as they are time-consuming
+   limit_val_batches: 50
+   accelerator: gpu
+   strategy: ddp
+   devices: [0]
+   log_every_n_steps: 250
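The numbers in config.yaml imply a few derived quantities that are easy to misread, notably the crop lengths and the split of max_steps between the two optimizers. The sketch below works them out from values copied from the config. It is not part of the commit. The helper name crop_stats is hypothetical, and the frame count assumes that padding "same" yields exactly one mel frame per hop, which is an assumption about the feature extractor rather than something the diff states.

```python
# Values copied from the config.yaml added in this commit.
SAMPLE_RATE = 22050        # Hz
HOP_LENGTH = 256           # samples between consecutive mel frames
TRAIN_NUM_SAMPLES = 16384  # training crop length in samples
VAL_NUM_SAMPLES = 48384    # validation crop length in samples
MAX_STEPS = 2_000_000      # counted across all optimizer steps
NUM_OPTIMIZERS = 2         # generator + discriminator, per the config comment


def crop_stats(num_samples, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Return (duration in seconds, mel frame count) for one audio crop.

    Assumption: with padding "same" there is one frame per full hop.
    """
    return num_samples / sample_rate, num_samples // hop_length


frame_rate = SAMPLE_RATE / HOP_LENGTH                # mel frames per second
train_seconds, train_frames = crop_stats(TRAIN_NUM_SAMPLES)
val_seconds, val_frames = crop_stats(VAL_NUM_SAMPLES)
steps_per_optimizer = MAX_STEPS // NUM_OPTIMIZERS    # steps each optimizer takes

if __name__ == "__main__":
    print(f"frame rate: {frame_rate:.2f} frames/s")
    print(f"train crop: {train_seconds:.3f} s, {train_frames} frames")
    print(f"val crop:   {val_seconds:.3f} s, {val_frames} frames")
    print(f"steps per optimizer: {steps_per_optimizer}")
```

Under these assumptions a training crop is well under a second of audio, and the 2,000,000 max_steps correspond to 1,000,000 generator updates, matching the comment in the trainer section.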
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3349cbad46af135e27a5df03668b1faf980fd11e150285f8435c7641330d1803
+ size 55097575
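The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real pytorch_model.bin. A downloaded copy can be checked against the pointer roughly as follows. This is a minimal sketch using only the Python standard library. The function names parse_lfs_pointer and matches_pointer are hypothetical, and the local file path is whatever the reader downloaded to.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3349cbad46af135e27a5df03668b1faf980fd11e150285f8435c7641330d1803
size 55097575
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling matches_pointer("pytorch_model.bin", pointer) on a local download should return True only if the file is byte-for-byte the one this commit refers to.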