Emrys365 committed on
Commit 30febae · 1 Parent(s): cda1fd3

Update model

Files changed (45)
  1. README.md +381 -3
  2. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/91epoch.pth +3 -0
  3. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/config.yaml +237 -0
  4. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/enhanced_test_16k/RESULTS.md +24 -0
  5. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/enhanced_test_48k/RESULTS.md +19 -0
  6. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/backward_time.png +0 -0
  7. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/clip.png +0 -0
  8. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/forward_time.png +0 -0
  9. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/grad_norm.png +0 -0
  11. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/iter_time.png +0 -0
  12. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_16k.png +0 -0
  13. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png +0 -0
  14. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_24k.png +0 -0
  15. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_48k.png +0 -0
  16. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_8k.png +0 -0
  17. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png +0 -0
  18. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_2ch_16k.png +0 -0
  19. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png +0 -0
  20. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_2ch_8k.png +0 -0
  21. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png +0 -0
  22. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_5ch_16k.png +0 -0
  23. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_5ch_8k.png +0 -0
  24. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png +0 -0
  25. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png +0 -0
  26. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/loss.png +0 -0
  27. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/loss_scale.png +0 -0
  28. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/optim0_lr0.png +0 -0
  29. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/optim_step_time.png +0 -0
  30. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_16k.png +0 -0
  31. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_16k_r.png +0 -0
  32. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_24k.png +0 -0
  33. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_48k.png +0 -0
  34. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_8k.png +0 -0
  35. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_8k_r.png +0 -0
  36. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_2ch_16k.png +0 -0
  37. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_2ch_16k_r.png +0 -0
  38. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_2ch_8k.png +0 -0
  39. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_2ch_8k_r.png +0 -0
  40. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_5ch_16k.png +0 -0
  41. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_5ch_8k.png +0 -0
  42. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_8ch_16k_r.png +0 -0
  43. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_8ch_8k_r.png +0 -0
  44. exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/train_time.png +0 -0
  45. meta.yaml +8 -0
README.md CHANGED
@@ -1,3 +1,381 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - audio-to-audio
6
+ language: en
7
+ datasets:
8
+ - universal_se
9
+ license: cc-by-4.0
10
+ ---
11
+
12
+ ## ESPnet2 ENH model
13
+
14
+ ### `wyz/vctk_dns2020_whamr_tfgridnet_xtiny`
15
+
16
+ This model was trained by wyz using the universal_se recipe in [ESPnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ To use the model in the Python interface, you could use the following code:
24
+
25
+ ```python
26
+ import soundfile as sf
27
+ from espnet2.bin.enh_inference import SeparateSpeech
28
+
29
+ # For model downloading + loading
30
+ model = SeparateSpeech.from_pretrained(
31
+     model_tag="wyz/vctk_dns2020_whamr_tfgridnet_xtiny",
32
+     normalize_output_wav=True,
33
+     device="cuda",
34
+ )
35
+ # For loading a downloaded model
36
+ # model = SeparateSpeech(
37
+ #     train_config="exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/config.yaml",
38
+ #     model_file="exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/xxxx.pth",
39
+ #     normalize_output_wav=True,
40
+ #     device="cuda",
41
+ # )
42
+
43
+ audio, fs = sf.read("/path/to/noisy/utt1.flac")
44
+ enhanced = model(audio[None, :], fs=fs)[0]
45
+ ```
46
+
47
+ module
48
+ <!-- Generated by scripts/utils/show_enh_score.sh -->
49
+ # RESULTS
50
+ ## Environments
51
+ - date: `Fri Mar 1 15:56:53 EST 2024`
52
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
53
+ - espnet version: `espnet 202304`
54
+ - pytorch version: `pytorch 2.0.1+cu118`
55
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
56
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
57
+
58
+
59
+ ## enhanced_test_16k
60
+
61
+
62
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
63
+ |---|---|---|---|---|---|---|---|---|---|---|
64
+ |chime4_et05_real_isolated_6ch_track|1.22|54.42|-2.34|-2.34|0.00|-30.90|2.99|3.36|3.77|3.70|
65
+ |chime4_et05_simu_isolated_6ch_track|1.53|84.20|9.02|9.02|0.00|2.19|2.82|3.19|3.73|3.28|
66
+ |dns20_tt_synthetic_no_reverb|3.10|97.40|18.73|18.73|0.00|18.79|3.29|3.56|4.05|3.99|
67
+ |reverb_et_real_8ch_multich|1.14|68.58|2.35|2.35|0.00|-0.37|3.14|3.44|3.96|3.86|
68
+ |reverb_et_simu_8ch_multich|2.37|94.32|11.14|11.14|0.00|-8.16|3.20|3.49|4.00|3.89|
69
+ |whamr_tt_mix_single_reverb_max_16k|2.08|92.85|10.74|10.74|0.00|9.35|3.19|3.46|4.05|3.76|
70
+
71
+ abc
72
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
73
+ # RESULTS
74
+ ## Environments
75
+ - date: `Thu Jan 25 21:35:34 EST 2024`
76
+ - python version: `3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]`
77
+ - espnet version: `espnet 202304`
78
+ - pytorch version: `pytorch 1.13.1`
79
+ - Git hash: ``
80
+ - Commit date: ``
81
+
82
+
83
+ ## enhanced_test_48k
84
+
85
+
86
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
87
+ |---|---|---|---|---|---|---|---|---|---|
88
+ |vctk_noisy_tt_2spk|95.13|20.29|20.29|0.00|19.20|3.16|3.46|4.00|3.55|
89
+
90
+ ## ENH config
91
+
92
+ <details><summary>expand</summary>
93
+
94
+ ```
95
+ config: conf/tuning/tfgridnet_tiny.yaml
96
+ print_config: false
97
+ log_level: INFO
98
+ dry_run: false
99
+ iterator_type: chunk
100
+ output_dir: exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw
101
+ ngpu: 1
102
+ seed: 0
103
+ num_workers: 4
104
+ num_att_plot: 3
105
+ dist_backend: nccl
106
+ dist_init_method: env://
107
+ dist_world_size: 2
108
+ dist_rank: 0
109
+ local_rank: 0
110
+ dist_master_addr: localhost
111
+ dist_master_port: 47945
112
+ dist_launcher: null
113
+ multiprocessing_distributed: true
114
+ unused_parameters: true
115
+ sharded_ddp: false
116
+ cudnn_enabled: true
117
+ cudnn_benchmark: false
118
+ cudnn_deterministic: true
119
+ collect_stats: false
120
+ write_collected_feats: false
121
+ max_epoch: 100
122
+ patience: 40
123
+ val_scheduler_criterion:
124
+ - valid
125
+ - loss
126
+ early_stopping_criterion:
127
+ - valid
128
+ - loss
129
+ - min
130
+ best_model_criterion:
131
+ -   - valid
132
+     - loss
133
+     - min
134
+ keep_nbest_models: 1
135
+ nbest_averaging_interval: 0
136
+ grad_clip: 5.0
137
+ grad_clip_type: 2.0
138
+ grad_noise: false
139
+ accum_grad: 1
140
+ no_forward_run: false
141
+ resume: true
142
+ save_interval: 1000
143
+ train_dtype: float32
144
+ use_amp: false
145
+ log_interval: null
146
+ use_matplotlib: true
147
+ use_tensorboard: true
148
+ create_graph_in_tensorboard: false
149
+ use_wandb: false
150
+ wandb_project: null
151
+ wandb_id: null
152
+ wandb_entity: null
153
+ wandb_name: null
154
+ wandb_model_log_interval: -1
155
+ detect_anomaly: false
156
+ pretrain_path: null
157
+ init_param: []
158
+ ignore_init_mismatch: false
159
+ freeze_param: []
160
+ num_iters_per_epoch: 8000
161
+ num_iters_valid: null
162
+ batch_size: 4
163
+ valid_batch_size: null
164
+ batch_bins: 1000000
165
+ valid_batch_bins: null
166
+ train_shape_file:
167
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
168
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
169
+ valid_shape_file:
170
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
171
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
172
+ batch_type: folded
173
+ valid_batch_type: null
174
+ fold_length:
175
+ - 80000
176
+ - 80000
177
+ sort_in_batch: descending
178
+ sort_batch: descending
179
+ multiple_iterator: false
180
+ chunk_length: 32000
181
+ chunk_shift_ratio: 0.5
182
+ num_cache_chunks: 1024
183
+ chunk_excluded_key_prefixes: []
184
+ chunk_discard_short_samples: false
185
+ train_data_path_and_name_and_type:
186
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
187
+     - speech_mix
188
+     - sound
189
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
190
+     - speech_ref1
191
+     - sound
192
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
193
+     - category
194
+     - text
195
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
196
+     - fs
197
+     - text_int
198
+ valid_data_path_and_name_and_type:
199
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
200
+     - speech_mix
201
+     - sound
202
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
203
+     - speech_ref1
204
+     - sound
205
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
206
+     - category
207
+     - text
208
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
209
+     - fs
210
+     - text_int
211
+ allow_variable_data_keys: false
212
+ max_cache_size: 0.0
213
+ max_cache_fd: 32
214
+ allow_multi_rates: true
215
+ valid_max_cache_size: null
216
+ exclude_weight_decay: false
217
+ exclude_weight_decay_conf: {}
218
+ optim: adam
219
+ optim_conf:
220
+     lr: 0.001
221
+     eps: 1.0e-08
222
+     weight_decay: 1.0e-05
223
+ scheduler: steplr
224
+ scheduler_conf:
225
+     step_size: 2
226
+     gamma: 0.99
227
+ init: null
228
+ model_conf:
229
+     normalize_variance_per_ch: true
230
+     categories:
231
+     - 1ch_8k
232
+     - 1ch_8k_r
233
+     - 1ch_16k_r
234
+     - 1ch_48k
235
+     - 1ch_24k
236
+     - 1ch_16k
237
+     - 2ch_8k
238
+     - 2ch_8k_r
239
+     - 2ch_16k
240
+     - 2ch_16k_r
241
+     - 5ch_8k
242
+     - 5ch_16k
243
+     - 8ch_8k_r
244
+     - 8ch_16k_r
245
+ criterions:
246
+ -   name: mr_l1_tfd
247
+     conf:
248
+         window_sz:
249
+         - 256
250
+         - 512
251
+         - 768
252
+         - 1024
253
+         hop_sz: null
254
+         eps: 1.0e-08
255
+         time_domain_weight: 0.5
256
+         normalize_variance: true
257
+     wrapper: fixed_order
258
+     wrapper_conf:
259
+         weight: 1.0
260
+ -   name: si_snr
261
+     conf:
262
+         eps: 1.0e-07
263
+     wrapper: fixed_order
264
+     wrapper_conf:
265
+         weight: 0.0
266
+ speech_volume_normalize: null
267
+ rir_scp: null
268
+ rir_apply_prob: 1.0
269
+ noise_scp: null
270
+ noise_apply_prob: 1.0
271
+ noise_db_range: '13_15'
272
+ short_noise_thres: 0.5
273
+ use_reverberant_ref: false
274
+ num_spk: 1
275
+ num_noise_type: 1
276
+ sample_rate: 8000
277
+ force_single_channel: true
278
+ channel_reordering: true
279
+ categories:
280
+ - 1ch_8k
281
+ - 1ch_8k_r
282
+ - 1ch_16k_r
283
+ - 1ch_48k
284
+ - 1ch_24k
285
+ - 1ch_16k
286
+ - 2ch_8k
287
+ - 2ch_8k_r
288
+ - 2ch_16k
289
+ - 2ch_16k_r
290
+ - 5ch_8k
291
+ - 5ch_16k
292
+ - 8ch_8k_r
293
+ - 8ch_16k_r
294
+ speech_segment: null
295
+ avoid_allzero_segment: true
296
+ flexible_numspk: false
297
+ dynamic_mixing: false
298
+ utt2spk: null
299
+ dynamic_mixing_gain_db: 0.0
300
+ encoder: stft
301
+ encoder_conf:
302
+     n_fft: 960
303
+     hop_length: 480
304
+     use_builtin_complex: true
305
+     default_fs: 48000
306
+ separator: tfgridnetv3
307
+ separator_conf:
308
+     n_srcs: 1
309
+     n_imics: 1
310
+     n_layers: 2
311
+     lstm_hidden_units: 64
312
+     attn_n_head: 2
313
+     attn_qk_output_channel: 2
314
+     emb_dim: 32
315
+     emb_ks: 4
316
+     emb_hs: 1
317
+     activation: prelu
318
+     eps: 1.0e-05
319
+ decoder: stft
320
+ decoder_conf:
321
+     n_fft: 960
322
+     hop_length: 480
323
+     default_fs: 48000
324
+ mask_module: multi_mask
325
+ mask_module_conf: {}
326
+ preprocessor: enh
327
+ preprocessor_conf: {}
328
+ required:
329
+ - output_dir
330
+ version: '202304'
331
+ distributed: true
332
+ ```
333
+
334
+ </details>
335
+
336
+
337
+
338
+ ### Citing ESPnet
339
+
340
+ ```bibtex
341
+ @inproceedings{watanabe2018espnet,
342
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
343
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
344
+ year={2018},
345
+ booktitle={Proceedings of Interspeech},
346
+ pages={2207--2211},
347
+ doi={10.21437/Interspeech.2018-1456},
348
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
349
+ }
350
+
351
+
352
+ @inproceedings{ESPnet-SE,
353
+ author = {Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and
354
+ Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph B{\"{o}}ddeker and Zhuo Chen and Shinji Watanabe},
355
+ title = {ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
356
+ booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2021, Shenzhen, China, January 19-22, 2021},
357
+ pages = {785--792},
358
+ publisher = {{IEEE}},
359
+ year = {2021},
360
+ url = {https://doi.org/10.1109/SLT48900.2021.9383615},
361
+ doi = {10.1109/SLT48900.2021.9383615},
362
+ timestamp = {Mon, 12 Apr 2021 17:08:59 +0200},
363
+ biburl = {https://dblp.org/rec/conf/slt/Li0ZSCKHHBC021.bib},
364
+ bibsource = {dblp computer science bibliography, https://dblp.org}
365
+ }
366
+
367
+
368
+ ```
369
+
370
+ or arXiv:
371
+
372
+ ```bibtex
373
+ @misc{watanabe2018espnet,
374
+ title={ESPnet: End-to-End Speech Processing Toolkit},
375
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
376
+ year={2018},
377
+ eprint={1804.00015},
378
+ archivePrefix={arXiv},
379
+ primaryClass={cs.CL}
380
+ }
381
+ ```
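Note on the training schedule recorded in the ENH config of this README: `optim_conf` sets `lr: 0.001`, and `scheduler_conf` sets `step_size: 2` and `gamma: 0.99`. The sketch below shows what that would imply for the learning rate, assuming `steplr` follows the usual step-decay rule (as in `torch.optim.lr_scheduler.StepLR`) and is stepped once per epoch. The helper name `steplr_learning_rate` is hypothetical and not part of ESPnet.

```python
def steplr_learning_rate(
    epoch: int,
    base_lr: float = 1e-3,   # optim_conf.lr in the config
    step_size: int = 2,      # scheduler_conf.step_size
    gamma: float = 0.99,     # scheduler_conf.gamma
) -> float:
    """Learning rate in effect during `epoch` (0-indexed) under step decay.

    Assumption: the scheduler is stepped once at the end of every epoch,
    so the rate is multiplied by `gamma` every `step_size` epochs.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr * gamma ** (epoch // step_size)


# Print the rate at a few epochs, including the 91st (index 90).
for epoch in (0, 1, 2, 50, 90):
    print(f"epoch {epoch:3d}: lr = {steplr_learning_rate(epoch):.6f}")
```

Under these assumptions the decay is gentle: after 90 epochs the rate would still be roughly 64% of its initial value.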
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/91epoch.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5a76d6e60895af04c96da71e1101505a1b23861f91c4ef538c981f1bf4e11cc5
3
+ size 1905247
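The three added lines are a Git LFS pointer, not the checkpoint itself: they record the pointer spec version, the SHA-256 object ID, and the size in bytes of the real `91epoch.pth`. As a minimal sketch, the pointer can be read as space-separated key/value pairs; the function name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields.

    Each non-empty line has the form "<key> <value>".
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field is "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5a76d6e60895af04c96da71e1101505a1b23861f91c4ef538c981f1bf4e11cc5
size 1905247
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], f"{info['size_bytes'] / 1e6:.2f} MB")
```

The recorded size, 1905247 bytes, puts the checkpoint at just under 2 MB.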
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/config.yaml ADDED
@@ -0,0 +1,237 @@
1
+ config: conf/tuning/tfgridnet_tiny.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ dry_run: false
5
+ iterator_type: chunk
6
+ output_dir: exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw
7
+ ngpu: 1
8
+ seed: 0
9
+ num_workers: 4
10
+ num_att_plot: 3
11
+ dist_backend: nccl
12
+ dist_init_method: env://
13
+ dist_world_size: 2
14
+ dist_rank: 0
15
+ local_rank: 0
16
+ dist_master_addr: localhost
17
+ dist_master_port: 47945
18
+ dist_launcher: null
19
+ multiprocessing_distributed: true
20
+ unused_parameters: true
21
+ sharded_ddp: false
22
+ cudnn_enabled: true
23
+ cudnn_benchmark: false
24
+ cudnn_deterministic: true
25
+ collect_stats: false
26
+ write_collected_feats: false
27
+ max_epoch: 100
28
+ patience: 40
29
+ val_scheduler_criterion:
30
+ - valid
31
+ - loss
32
+ early_stopping_criterion:
33
+ - valid
34
+ - loss
35
+ - min
36
+ best_model_criterion:
37
+ -   - valid
38
+     - loss
39
+     - min
40
+ keep_nbest_models: 1
41
+ nbest_averaging_interval: 0
42
+ grad_clip: 5.0
43
+ grad_clip_type: 2.0
44
+ grad_noise: false
45
+ accum_grad: 1
46
+ no_forward_run: false
47
+ resume: true
48
+ save_interval: 1000
49
+ train_dtype: float32
50
+ use_amp: false
51
+ log_interval: null
52
+ use_matplotlib: true
53
+ use_tensorboard: true
54
+ create_graph_in_tensorboard: false
55
+ use_wandb: false
56
+ wandb_project: null
57
+ wandb_id: null
58
+ wandb_entity: null
59
+ wandb_name: null
60
+ wandb_model_log_interval: -1
61
+ detect_anomaly: false
62
+ pretrain_path: null
63
+ init_param: []
64
+ ignore_init_mismatch: false
65
+ freeze_param: []
66
+ num_iters_per_epoch: 8000
67
+ num_iters_valid: null
68
+ batch_size: 4
69
+ valid_batch_size: null
70
+ batch_bins: 1000000
71
+ valid_batch_bins: null
72
+ train_shape_file:
73
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
74
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
75
+ valid_shape_file:
76
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
77
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
78
+ batch_type: folded
79
+ valid_batch_type: null
80
+ fold_length:
81
+ - 80000
82
+ - 80000
83
+ sort_in_batch: descending
84
+ sort_batch: descending
85
+ multiple_iterator: false
86
+ chunk_length: 32000
87
+ chunk_shift_ratio: 0.5
88
+ num_cache_chunks: 1024
89
+ chunk_excluded_key_prefixes: []
90
+ chunk_discard_short_samples: false
91
+ train_data_path_and_name_and_type:
92
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
93
+     - speech_mix
94
+     - sound
95
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
96
+     - speech_ref1
97
+     - sound
98
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
99
+     - category
100
+     - text
101
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
102
+     - fs
103
+     - text_int
104
+ valid_data_path_and_name_and_type:
105
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
106
+     - speech_mix
107
+     - sound
108
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
109
+     - speech_ref1
110
+     - sound
111
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
112
+     - category
113
+     - text
114
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
115
+     - fs
116
+     - text_int
117
+ allow_variable_data_keys: false
118
+ max_cache_size: 0.0
119
+ max_cache_fd: 32
120
+ allow_multi_rates: true
121
+ valid_max_cache_size: null
122
+ exclude_weight_decay: false
123
+ exclude_weight_decay_conf: {}
124
+ optim: adam
125
+ optim_conf:
126
+     lr: 0.001
127
+     eps: 1.0e-08
128
+     weight_decay: 1.0e-05
129
+ scheduler: steplr
130
+ scheduler_conf:
131
+     step_size: 2
132
+     gamma: 0.99
133
+ init: null
134
+ model_conf:
135
+     normalize_variance_per_ch: true
136
+     categories:
137
+     - 1ch_8k
138
+     - 1ch_8k_r
139
+     - 1ch_16k_r
140
+     - 1ch_48k
141
+     - 1ch_24k
142
+     - 1ch_16k
143
+     - 2ch_8k
144
+     - 2ch_8k_r
145
+     - 2ch_16k
146
+     - 2ch_16k_r
147
+     - 5ch_8k
148
+     - 5ch_16k
149
+     - 8ch_8k_r
150
+     - 8ch_16k_r
151
+ criterions:
152
+ -   name: mr_l1_tfd
153
+     conf:
154
+         window_sz:
155
+         - 256
156
+         - 512
157
+         - 768
158
+         - 1024
159
+         hop_sz: null
160
+         eps: 1.0e-08
161
+         time_domain_weight: 0.5
162
+         normalize_variance: true
163
+     wrapper: fixed_order
164
+     wrapper_conf:
165
+         weight: 1.0
166
+ -   name: si_snr
167
+     conf:
168
+         eps: 1.0e-07
169
+     wrapper: fixed_order
170
+     wrapper_conf:
171
+         weight: 0.0
172
+ speech_volume_normalize: null
173
+ rir_scp: null
174
+ rir_apply_prob: 1.0
175
+ noise_scp: null
176
+ noise_apply_prob: 1.0
177
+ noise_db_range: '13_15'
178
+ short_noise_thres: 0.5
179
+ use_reverberant_ref: false
180
+ num_spk: 1
181
+ num_noise_type: 1
182
+ sample_rate: 8000
183
+ force_single_channel: true
184
+ channel_reordering: true
185
+ categories:
186
+ - 1ch_8k
187
+ - 1ch_8k_r
188
+ - 1ch_16k_r
189
+ - 1ch_48k
190
+ - 1ch_24k
191
+ - 1ch_16k
192
+ - 2ch_8k
193
+ - 2ch_8k_r
194
+ - 2ch_16k
195
+ - 2ch_16k_r
196
+ - 5ch_8k
197
+ - 5ch_16k
198
+ - 8ch_8k_r
199
+ - 8ch_16k_r
200
+ speech_segment: null
201
+ avoid_allzero_segment: true
202
+ flexible_numspk: false
203
+ dynamic_mixing: false
204
+ utt2spk: null
205
+ dynamic_mixing_gain_db: 0.0
206
+ encoder: stft
207
+ encoder_conf:
208
+     n_fft: 960
209
+     hop_length: 480
210
+     use_builtin_complex: true
211
+     default_fs: 48000
212
+ separator: tfgridnetv3
213
+ separator_conf:
214
+     n_srcs: 1
215
+     n_imics: 1
216
+     n_layers: 2
217
+     lstm_hidden_units: 64
218
+     attn_n_head: 2
219
+     attn_qk_output_channel: 2
220
+     emb_dim: 32
221
+     emb_ks: 4
222
+     emb_hs: 1
223
+     activation: prelu
224
+     eps: 1.0e-05
225
+ decoder: stft
226
+ decoder_conf:
227
+     n_fft: 960
228
+     hop_length: 480
229
+     default_fs: 48000
230
+ mask_module: multi_mask
231
+ mask_module_conf: {}
232
+ preprocessor: enh
233
+ preprocessor_conf: {}
234
+ required:
235
+ - output_dir
236
+ version: '202304'
237
+ distributed: true
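Note on the STFT settings in this config: `n_fft: 960` and `hop_length: 480` are given relative to `default_fs: 48000`, which corresponds to a 20 ms window and a 10 ms hop. The sketch below works out the equivalent sample counts at the other sampling rates named in `categories`, assuming the encoder keeps the window duration fixed and rescales the sample counts with the input rate. That behaviour is an assumption suggested by the `default_fs` field, not something this diff confirms, and `stft_sizes_for_rate` is a hypothetical helper.

```python
def stft_sizes_for_rate(
    fs: int,
    n_fft: int = 960,         # encoder_conf.n_fft
    hop_length: int = 480,    # encoder_conf.hop_length
    default_fs: int = 48000,  # encoder_conf.default_fs
) -> dict:
    """Scale STFT sizes so the window duration matches the default rate.

    Assumption: sample counts scale linearly with the sampling rate.
    """
    scaled_n_fft, rem_fft = divmod(n_fft * fs, default_fs)
    scaled_hop, rem_hop = divmod(hop_length * fs, default_fs)
    if rem_fft or rem_hop:
        raise ValueError(f"fs={fs} does not give integer STFT sizes")
    return {
        "n_fft": scaled_n_fft,
        "hop_length": scaled_hop,
        "freq_bins": scaled_n_fft // 2 + 1,  # one-sided spectrum
        "window_ms": 1000 * n_fft / default_fs,
    }


# Sampling rates that appear in the config's category names.
for fs in (8000, 16000, 24000, 48000):
    print(fs, stft_sizes_for_rate(fs))
```

Under this assumption the number of frequency bins grows with the sampling rate (81 at 8 kHz up to 481 at 48 kHz), while the frame rate stays at 100 frames per second.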
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/enhanced_test_16k/RESULTS.md ADDED
@@ -0,0 +1,24 @@
1
+ module
2
+ <!-- Generated by scripts/utils/show_enh_score.sh -->
3
+ # RESULTS
4
+ ## Environments
5
+ - date: `Fri Mar 1 15:56:53 EST 2024`
6
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
7
+ - espnet version: `espnet 202304`
8
+ - pytorch version: `pytorch 2.0.1+cu118`
9
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
10
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
11
+
12
+
13
+ ## enhanced_test_16k
14
+
15
+
16
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
17
+ |---|---|---|---|---|---|---|---|---|---|---|
18
+ |chime4_et05_real_isolated_6ch_track|1.22|54.42|-2.34|-2.34|0.00|-30.90|2.99|3.36|3.77|3.70|
19
+ |chime4_et05_simu_isolated_6ch_track|1.53|84.20|9.02|9.02|0.00|2.19|2.82|3.19|3.73|3.28|
20
+ |dns20_tt_synthetic_no_reverb|3.10|97.40|18.73|18.73|0.00|18.79|3.29|3.56|4.05|3.99|
21
+ |reverb_et_real_8ch_multich|1.14|68.58|2.35|2.35|0.00|-0.37|3.14|3.44|3.96|3.86|
22
+ |reverb_et_simu_8ch_multich|2.37|94.32|11.14|11.14|0.00|-8.16|3.20|3.49|4.00|3.89|
23
+ |whamr_tt_mix_single_reverb_max_16k|2.08|92.85|10.74|10.74|0.00|9.35|3.19|3.46|4.05|3.76|
24
+
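For comparing test sets, the results table in this file can be read into records with a few lines of Python. This is a minimal sketch that assumes the plain pipe-delimited layout shown here (header row, separator row, data rows); `parse_markdown_table` is a hypothetical helper, and the rows are copied from the table without change.

```python
def parse_markdown_table(lines):
    """Turn a pipe-delimited Markdown table into a list of dicts.

    Expects a header row, a separator row, then one row per record.
    All values are kept as strings.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
        if line.strip()
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator
    return [dict(zip(header, row)) for row in body]


table = """\
|dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
|---|---|---|---|---|---|---|---|---|---|---|
|chime4_et05_real_isolated_6ch_track|1.22|54.42|-2.34|-2.34|0.00|-30.90|2.99|3.36|3.77|3.70|
|chime4_et05_simu_isolated_6ch_track|1.53|84.20|9.02|9.02|0.00|2.19|2.82|3.19|3.73|3.28|
|dns20_tt_synthetic_no_reverb|3.10|97.40|18.73|18.73|0.00|18.79|3.29|3.56|4.05|3.99|
|reverb_et_real_8ch_multich|1.14|68.58|2.35|2.35|0.00|-0.37|3.14|3.44|3.96|3.86|
|reverb_et_simu_8ch_multich|2.37|94.32|11.14|11.14|0.00|-8.16|3.20|3.49|4.00|3.89|
|whamr_tt_mix_single_reverb_max_16k|2.08|92.85|10.74|10.74|0.00|9.35|3.19|3.46|4.05|3.76|
"""

records = parse_markdown_table(table.splitlines())

# Rank the test sets by wideband PESQ, highest first.
for record in sorted(records, key=lambda r: float(r["PESQ_WB"]), reverse=True):
    print(f"{record['dataset']:40s} PESQ_WB={record['PESQ_WB']} SI_SNR={record['SI_SNR']}")
```

Sorting this way makes the gap between synthetic and real recordings easy to see: the two `*_real_*` sets have the lowest PESQ_WB values in the table.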
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/enhanced_test_48k/RESULTS.md ADDED
@@ -0,0 +1,19 @@
1
+ abc
2
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
3
+ # RESULTS
4
+ ## Environments
5
+ - date: `Thu Jan 25 21:35:34 EST 2024`
6
+ - python version: `3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]`
7
+ - espnet version: `espnet 202304`
8
+ - pytorch version: `pytorch 1.13.1`
9
+ - Git hash: ``
10
+ - Commit date: ``
11
+
12
+
13
+ ## enhanced_test_48k
14
+
15
+
16
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
17
+ |---|---|---|---|---|---|---|---|---|---|
18
+ |vctk_noisy_tt_2spk|95.13|20.29|20.29|0.00|19.20|3.16|3.46|4.00|3.55|
19
+
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/backward_time.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/clip.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/forward_time.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/gpu_max_cached_mem_GB.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/grad_norm.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/iter_time.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/loss.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/loss_scale.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/optim0_lr0.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/optim_step_time.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/si_snr_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: '202304'
2
+ files:
3
+ model_file: exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/91epoch.pth
4
+ python: "3.8.16 (default, Mar 2 2023, 03:21:46) \n[GCC 11.2.0]"
5
+ timestamp: 1723014150.122787
6
+ torch: 2.0.1+cu118
7
+ yaml_files:
8
+ train_config: exp_vctk_dns20_whamr/enh_tfgridnet_tiny_raw/config.yaml
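The `timestamp` field in `meta.yaml` looks like a Unix epoch value in seconds. Assuming that interpretation, the standard library converts it to a readable UTC date, which can then be compared with the dates recorded in the two `RESULTS.md` files.

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; assumed to be seconds since the Unix epoch.
META_TIMESTAMP = 1723014150.122787

packed_at = datetime.fromtimestamp(META_TIMESTAMP, tz=timezone.utc)
print(packed_at.strftime("%Y-%m-%d %H:%M UTC"))
```

If the assumption holds, the model was packaged in August 2024, several months after the evaluation runs dated January and March 2024.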