Emrys365 committed on
Commit cdbc190 · 1 parent: 9298c7c

Update model

Files changed (45)
  1. README.md +375 -3
  2. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/97epoch.pth +3 -0
  3. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/config.yaml +232 -0
  4. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/enhanced_test_16k/RESULTS.md +23 -0
  5. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/enhanced_test_48k/RESULTS.md +19 -0
  6. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/backward_time.png +0 -0
  7. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/clip.png +0 -0
  8. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/forward_time.png +0 -0
  9. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/grad_norm.png +0 -0
  11. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/iter_time.png +0 -0
  12. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_16k.png +0 -0
  13. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png +0 -0
  14. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_24k.png +0 -0
  15. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_48k.png +0 -0
  16. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_8k.png +0 -0
  17. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png +0 -0
  18. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_16k.png +0 -0
  19. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png +0 -0
  20. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_8k.png +0 -0
  21. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png +0 -0
  22. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_5ch_16k.png +0 -0
  23. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_5ch_8k.png +0 -0
  24. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png +0 -0
  25. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png +0 -0
  26. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/loss.png +0 -0
  27. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/loss_scale.png +0 -0
  28. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/optim0_lr0.png +0 -0
  29. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/optim_step_time.png +0 -0
  30. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_16k.png +0 -0
  31. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_16k_r.png +0 -0
  32. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_24k.png +0 -0
  33. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_48k.png +0 -0
  34. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_8k.png +0 -0
  35. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_8k_r.png +0 -0
  36. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_2ch_16k.png +0 -0
  37. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_2ch_16k_r.png +0 -0
  38. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_2ch_8k.png +0 -0
  39. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_2ch_8k_r.png +0 -0
  40. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_5ch_16k.png +0 -0
  41. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_5ch_8k.png +0 -0
  42. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_8ch_16k_r.png +0 -0
  43. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_8ch_8k_r.png +0 -0
  44. exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/train_time.png +0 -0
  45. meta.yaml +8 -0
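The per-category loss plots listed above, the `categories` list in `config.yaml`, and the `utt2category` files all use labels such as `2ch_16k_r`. Reading these as channel count, sampling rate in kHz, and an optional `_r` suffix is our interpretation (we assume `_r` marks reverberant data); the helper below is a hypothetical sketch for working with the labels, not part of ESPnet.

```python
import re
from typing import NamedTuple


class Category(NamedTuple):
    num_channels: int
    sample_rate: int  # in Hz
    reverberant: bool  # assumption: the "_r" suffix marks reverberant data


# Hypothetical pattern: "<N>ch_<K>k" with an optional "_r" suffix.
_PATTERN = re.compile(r"^(\d+)ch_(\d+)k(_r)?$")


def parse_category(label: str) -> Category:
    """Split a label such as '2ch_16k_r' into its assumed components."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized category label: {label!r}")
    channels, khz, suffix = match.groups()
    return Category(int(channels), int(khz) * 1000, suffix is not None)


if __name__ == "__main__":
    for label in ["1ch_8k", "2ch_16k_r", "8ch_16k_r", "1ch_48k"]:
        print(label, parse_category(label))
```

Under this reading, the 14 configured categories span 1 to 8 channels and 8 kHz to 48 kHz input.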
README.md CHANGED
@@ -1,3 +1,375 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - espnet
+ - audio
+ - audio-to-audio
+ language: en
+ datasets:
+ - universal_se
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ENH model
+
+ ### `wyz/vctk_dns2020_whamr_bsrnn_large_noncausal`
+
+ This model was trained by Emrys365 using the universal_se recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ To use the model from the Python interface, run the following code:
+
+ ```python
+ import soundfile as sf
+ from espnet2.bin.enh_inference import SeparateSpeech
+
+ # For model downloading + loading
+ model = SeparateSpeech.from_pretrained(
+     model_tag="wyz/vctk_dns2020_whamr_bsrnn_large_noncausal",
+     normalize_output_wav=True,
+     device="cuda",  # use "cpu" if no GPU is available
+ )
+ # For loading a downloaded model
+ # model = SeparateSpeech(
+ #     train_config="exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/config.yaml",
+ #     model_file="exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/xxxx.pth",
+ #     normalize_output_wav=True,
+ #     device="cuda",
+ # )
+
+ audio, fs = sf.read("/path/to/noisy/utt1.flac")
+ # audio[None, :] adds a batch dimension; the model returns one enhanced
+ # signal per output speaker (num_spk is 1 here), so take the first entry
+ enhanced = model(audio[None, :], fs=fs)[0]
+ ```
+
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Feb 27 21:04:54 EST 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_16k
+
+
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|---|
+ |chime4_et05_real_isolated_6ch_track|1.23|55.03|-2.12|-2.12|0.00|-31.22|3.11|3.44|3.88|3.73|
+ |chime4_et05_simu_isolated_6ch_track|1.67|87.32|9.94|9.94|0.00|2.32|3.03|3.34|3.91|3.45|
+ |dns20_tt_synthetic_no_reverb|3.35|98.12|20.24|20.24|0.00|20.05|3.34|3.58|4.12|4.06|
+ |reverb_et_real_8ch_multich|1.15|66.78|1.47|1.47|0.00|-1.58|3.15|3.50|3.86|3.77|
+ |reverb_et_simu_8ch_multich|2.29|94.59|10.87|10.87|0.00|-8.41|3.12|3.49|3.82|3.83|
+ |whamr_tt_mix_single_reverb_max_16k|2.34|94.47|11.98|11.98|0.00|10.41|3.27|3.52|4.10|3.80|
+
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Jan 11 22:52:46 EST 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_48k
+
+
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|
+ |vctk_noisy_tt_2spk|95.75|19.57|19.57|0.00|18.73|3.17|3.47|3.99|3.55|
+
+ ## ENH config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_enh_bsrnn_large_noncausal.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: chunk
+ output_dir: exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: 15
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+ - loss
+ - min
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ save_interval: 1000
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 8000
+ num_iters_valid: null
+ batch_size: 4
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
+ valid_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 32000
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_discard_short_samples: false
+ train_data_path_and_name_and_type:
+ - - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
+ - speech_mix
+ - sound
+ - - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
+ - speech_ref1
+ - sound
+ - - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
+ - category
+ - text
+ - - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
+ - fs
+ - text_int
+ valid_data_path_and_name_and_type:
+ - - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
+ - speech_mix
+ - sound
+ - - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
+ - speech_ref1
+ - sound
+ - - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
+ - category
+ - text
+ - - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
+ - fs
+ - text_int
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: true
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+ lr: 0.001
+ eps: 1.0e-08
+ weight_decay: 1.0e-05
+ scheduler: steplr
+ scheduler_conf:
+ step_size: 2
+ gamma: 0.99
+ init: null
+ model_conf:
+ normalize_variance_per_ch: true
+ categories:
+ - 1ch_8k
+ - 1ch_8k_r
+ - 1ch_16k_r
+ - 1ch_48k
+ - 1ch_24k
+ - 1ch_16k
+ - 2ch_8k
+ - 2ch_8k_r
+ - 2ch_16k
+ - 2ch_16k_r
+ - 5ch_8k
+ - 5ch_16k
+ - 8ch_8k_r
+ - 8ch_16k_r
+ criterions:
+ - name: mr_l1_tfd
+ conf:
+ window_sz:
+ - 256
+ - 512
+ - 768
+ - 1024
+ hop_sz: null
+ eps: 1.0e-08
+ time_domain_weight: 0.5
+ normalize_variance: true
+ wrapper: fixed_order
+ wrapper_conf:
+ weight: 1.0
+ - name: si_snr
+ conf:
+ eps: 1.0e-07
+ wrapper: fixed_order
+ wrapper_conf:
+ weight: 0.0
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ use_reverberant_ref: false
+ num_spk: 1
+ num_noise_type: 1
+ sample_rate: 8000
+ force_single_channel: true
+ channel_reordering: true
+ categories:
+ - 1ch_8k
+ - 1ch_8k_r
+ - 1ch_16k_r
+ - 1ch_48k
+ - 1ch_24k
+ - 1ch_16k
+ - 2ch_8k
+ - 2ch_8k_r
+ - 2ch_16k
+ - 2ch_16k_r
+ - 5ch_8k
+ - 5ch_16k
+ - 8ch_8k_r
+ - 8ch_16k_r
+ speech_segment: null
+ avoid_allzero_segment: true
+ flexible_numspk: false
+ dynamic_mixing: false
+ utt2spk: null
+ dynamic_mixing_gain_db: 0.0
+ encoder: stft
+ encoder_conf:
+ n_fft: 960
+ hop_length: 480
+ use_builtin_complex: true
+ default_fs: 48000
+ separator: bsrnn
+ separator_conf:
+ num_spk: 1
+ num_channels: 256
+ num_layers: 6
+ target_fs: 48000
+ ref_channel: 0
+ causal: false
+ decoder: stft
+ decoder_conf:
+ n_fft: 960
+ hop_length: 480
+ default_fs: 48000
+ mask_module: multi_mask
+ mask_module_conf: {}
+ preprocessor: enh
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ version: '202304'
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```bibtex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+ @inproceedings{ESPnet-SE,
+ author = {Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and
+ Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph B{\"{o}}ddeker and Zhuo Chen and Shinji Watanabe},
+ title = {ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
+ booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2021, Shenzhen, China, January 19-22, 2021},
+ pages = {785--792},
+ publisher = {{IEEE}},
+ year = {2021},
+ url = {https://doi.org/10.1109/SLT48900.2021.9383615},
+ doi = {10.1109/SLT48900.2021.9383615},
+ timestamp = {Mon, 12 Apr 2021 17:08:59 +0200},
+ biburl = {https://dblp.org/rec/conf/slt/Li0ZSCKHHBC021.bib},
+ bibsource = {dblp computer science bibliography, https://dblp.org}
+ }
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
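The training config uses `iterator_type: chunk` with `chunk_length: 32000` and `chunk_shift_ratio: 0.5`, so consecutive chunk starts are 16000 samples apart (2 s chunks with a 1 s shift for 16 kHz audio). The sketch below only illustrates that arithmetic on one utterance. It is a simplification under stated assumptions: it always starts at offset 0 and returns no chunks for an utterance shorter than one chunk, whereas ESPnet's chunk iterator may randomize the start offset and treats short utterances according to `chunk_discard_short_samples`.

```python
from typing import List, Tuple


def chunk_bounds(num_samples: int, chunk_length: int = 32000,
                 shift_ratio: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping fixed-length chunks.

    Simplified sketch: offset 0, trailing remainder dropped, and no chunks
    when the input is shorter than one chunk.
    """
    shift = int(chunk_length * shift_ratio)  # 16000 samples for the defaults
    if num_samples < chunk_length:
        return []
    num_chunks = (num_samples - chunk_length) // shift + 1
    return [(i * shift, i * shift + chunk_length) for i in range(num_chunks)]


if __name__ == "__main__":
    # A 5-second utterance at 16 kHz has 80000 samples.
    print(chunk_bounds(80000))
```

With 50% overlap, every sample of an 80000-sample utterance falls into at most two of the four chunks.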
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/97epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15fe051146b4ea9e3a589f45c1bb12bb8699a380a01664f25e1dce0350ee93a3
+ size 252779273
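The `97epoch.pth` entry in this commit is a Git LFS pointer rather than the weights themselves: its three lines give the pointer spec version, the SHA-256 `oid` of the real file, and its `size` in bytes (252,779,273 bytes, about 241 MiB). A minimal standard-library sketch for checking a downloaded checkpoint against that pointer follows; `parse_lfs_pointer` and `matches_pointer` are illustrative helper names, not part of Git LFS or ESPnet, and the local path in the commented call is a placeholder.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: Dict[str, str],
                    block_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB blocks so the checkpoint is never fully in memory.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:15fe051146b4ea9e3a589f45c1bb12bb8699a380a01664f25e1dce0350ee93a3
size 252779273
"""

if __name__ == "__main__":
    ptr = parse_lfs_pointer(POINTER)
    print(ptr["size"])
    # matches_pointer(Path("97epoch.pth"), ptr)  # placeholder path, after download
```

The size check runs first because it is cheap and catches truncated downloads before hashing the whole file.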
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/config.yaml ADDED
@@ -0,0 +1,232 @@
+ config: conf/tuning/train_enh_bsrnn_large_noncausal.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: chunk
+ output_dir: exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw
+ ngpu: 1
+ seed: 0
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 100
+ patience: 15
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+ - loss
+ - min
+ keep_nbest_models: 1
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ save_interval: 1000
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: 8000
+ num_iters_valid: null
+ batch_size: 4
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
+ valid_shape_file:
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 80000
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 32000
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_discard_short_samples: false
+ train_data_path_and_name_and_type:
+ - - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
+ - speech_mix
+ - sound
+ - - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
+ - speech_ref1
+ - sound
+ - - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
+ - category
+ - text
+ - - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
+ - fs
+ - text_int
+ valid_data_path_and_name_and_type:
+ - - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
+ - speech_mix
+ - sound
+ - - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
+ - speech_ref1
+ - sound
+ - - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
+ - category
+ - text
+ - - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
+ - fs
+ - text_int
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: true
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+ lr: 0.001
+ eps: 1.0e-08
+ weight_decay: 1.0e-05
+ scheduler: steplr
+ scheduler_conf:
+ step_size: 2
+ gamma: 0.99
+ init: null
+ model_conf:
+ normalize_variance_per_ch: true
+ categories:
+ - 1ch_8k
+ - 1ch_8k_r
+ - 1ch_16k_r
+ - 1ch_48k
+ - 1ch_24k
+ - 1ch_16k
+ - 2ch_8k
+ - 2ch_8k_r
+ - 2ch_16k
+ - 2ch_16k_r
+ - 5ch_8k
+ - 5ch_16k
+ - 8ch_8k_r
+ - 8ch_16k_r
+ criterions:
+ - name: mr_l1_tfd
+ conf:
+ window_sz:
+ - 256
+ - 512
+ - 768
+ - 1024
+ hop_sz: null
+ eps: 1.0e-08
+ time_domain_weight: 0.5
+ normalize_variance: true
+ wrapper: fixed_order
+ wrapper_conf:
+ weight: 1.0
+ - name: si_snr
+ conf:
+ eps: 1.0e-07
+ wrapper: fixed_order
+ wrapper_conf:
+ weight: 0.0
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ use_reverberant_ref: false
+ num_spk: 1
+ num_noise_type: 1
+ sample_rate: 8000
+ force_single_channel: true
+ channel_reordering: true
+ categories:
+ - 1ch_8k
+ - 1ch_8k_r
+ - 1ch_16k_r
+ - 1ch_48k
+ - 1ch_24k
+ - 1ch_16k
+ - 2ch_8k
+ - 2ch_8k_r
+ - 2ch_16k
+ - 2ch_16k_r
+ - 5ch_8k
+ - 5ch_16k
+ - 8ch_8k_r
+ - 8ch_16k_r
+ speech_segment: null
+ avoid_allzero_segment: true
+ flexible_numspk: false
+ dynamic_mixing: false
+ utt2spk: null
+ dynamic_mixing_gain_db: 0.0
+ encoder: stft
+ encoder_conf:
+ n_fft: 960
+ hop_length: 480
+ use_builtin_complex: true
+ default_fs: 48000
+ separator: bsrnn
+ separator_conf:
+ num_spk: 1
+ num_channels: 256
+ num_layers: 6
+ target_fs: 48000
+ ref_channel: 0
+ causal: false
+ decoder: stft
+ decoder_conf:
+ n_fft: 960
+ hop_length: 480
+ default_fs: 48000
+ mask_module: multi_mask
+ mask_module_conf: {}
+ preprocessor: enh
+ preprocessor_conf: {}
+ required:
+ - output_dir
+ version: '202304'
+ distributed: false
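The main training criterion is `mr_l1_tfd` (wrapper weight 1.0; `si_snr` is configured but weighted 0.0), which is why the loss plots are named `l1_timedomain+magspec_loss_*`. As a rough illustration only, the NumPy sketch below mixes an L1 time-domain term with L1 magnitude-spectrum terms at the configured window sizes 256, 512, 768 and 1024, weighting the time term by `time_domain_weight: 0.5`. Several details are our assumptions, not taken from ESPnet: a hop of half the window for `hop_sz: null`, a Hann window, averaging over resolutions, and dividing both signals by the reference's standard deviation for `normalize_variance`. ESPnet's own implementation of this criterion may differ in scaling and reduction.

```python
import numpy as np


def stft_magnitude(x: np.ndarray, win: int, hop: int) -> np.ndarray:
    """Magnitude STFT with a Hann window; a trailing partial frame is dropped."""
    if len(x) < win:
        raise ValueError("signal shorter than the analysis window")
    window = np.hanning(win)
    num_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def mr_l1_tfd_sketch(ref: np.ndarray, est: np.ndarray,
                     windows=(256, 512, 768, 1024),
                     time_domain_weight: float = 0.5,
                     eps: float = 1e-8) -> float:
    """Illustrative multi-resolution L1 time + magnitude-spectrum loss."""
    # Assumed normalize_variance: scale both signals by the reference's std.
    scale = np.std(ref) + eps
    ref, est = ref / scale, est / scale
    time_loss = np.mean(np.abs(ref - est))
    spec_loss = 0.0
    for win in windows:
        hop = win // 2  # assumed hop when hop_sz is null
        spec_loss += np.mean(np.abs(stft_magnitude(ref, win, hop)
                                    - stft_magnitude(est, win, hop)))
    spec_loss /= len(windows)  # assumed: average over resolutions
    return float(time_domain_weight * time_loss
                 + (1.0 - time_domain_weight) * spec_loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)
    noisy = clean + 0.1 * rng.standard_normal(16000)
    print(mr_l1_tfd_sketch(clean, clean))  # 0.0 for a perfect estimate
    print(mr_l1_tfd_sketch(clean, noisy) > 0)
```

Combining several window lengths trades time resolution (short windows) against frequency resolution (long windows), so no single STFT setting dominates the spectral penalty.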
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/enhanced_test_16k/RESULTS.md ADDED
@@ -0,0 +1,23 @@
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Feb 27 21:04:54 EST 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_16k
+
+
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|---|
+ |chime4_et05_real_isolated_6ch_track|1.23|55.03|-2.12|-2.12|0.00|-31.22|3.11|3.44|3.88|3.73|
+ |chime4_et05_simu_isolated_6ch_track|1.67|87.32|9.94|9.94|0.00|2.32|3.03|3.34|3.91|3.45|
+ |dns20_tt_synthetic_no_reverb|3.35|98.12|20.24|20.24|0.00|20.05|3.34|3.58|4.12|4.06|
+ |reverb_et_real_8ch_multich|1.15|66.78|1.47|1.47|0.00|-1.58|3.15|3.50|3.86|3.77|
+ |reverb_et_simu_8ch_multich|2.29|94.59|10.87|10.87|0.00|-8.41|3.12|3.49|3.82|3.83|
+ |whamr_tt_mix_single_reverb_max_16k|2.34|94.47|11.98|11.98|0.00|10.41|3.27|3.52|4.10|3.80|
+
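The `SI_SNR` column (and the `si_snr` criterion listed in the config) refers to scale-invariant SNR. Under its usual definition, the zero-mean estimate is projected onto the zero-mean reference, and the energy of that projection is compared with the energy of the residual, in dB. The NumPy sketch below implements that textbook definition; it is not the scoring script that produced this table, and the `eps` default simply mirrors the config's `eps: 1.0e-07`.

```python
import numpy as np


def si_snr(ref: np.ndarray, est: np.ndarray, eps: float = 1e-7) -> float:
    """Scale-invariant SNR in dB between a reference and an estimate."""
    ref = ref - np.mean(ref)
    est = est - np.mean(est)
    # Project the estimate onto the reference to get the target component.
    target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    residual = est - target
    return float(10 * np.log10((np.dot(target, target) + eps)
                               / (np.dot(residual, residual) + eps)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(16000)
    n = rng.standard_normal(16000)
    print(round(si_snr(s, s + 0.1 * n), 1))  # close to 20 dB
    # Rescaling the estimate leaves the score essentially unchanged.
    print(round(si_snr(s, 3.0 * (s + 0.1 * n)), 1))
```

Because the residual absorbs everything not aligned with the reference, strongly mismatched outputs can score far below 0 dB, which is consistent with the negative entries in the table.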
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/enhanced_test_48k/RESULTS.md ADDED
@@ -0,0 +1,19 @@
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Thu Jan 11 22:52:46 EST 2024`
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
+ - espnet version: `espnet 202304`
+ - pytorch version: `pytorch 2.0.1+cu118`
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
+
+
+ ## enhanced_test_48k
+
+
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
+ |---|---|---|---|---|---|---|---|---|---|
+ |vctk_noisy_tt_2spk|95.75|19.57|19.57|0.00|18.73|3.17|3.47|3.99|3.55|
+
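The same checkpoint is scored at both 16 kHz and 48 kHz. The STFT encoder and decoder are configured with `n_fft: 960` and `hop_length: 480` at `default_fs: 48000`, that is, a 20 ms window with a 10 ms hop. Assuming the encoder keeps those durations fixed by scaling both sizes in proportion to the input sampling rate (our reading of `default_fs`, not verified against the ESPnet source), the per-rate sizes work out as in this sketch; `stft_sizes` is a hypothetical helper.

```python
from typing import Dict

DEFAULT_FS = 48000
DEFAULT_N_FFT = 960
DEFAULT_HOP = 480


def stft_sizes(fs: int) -> Dict[str, int]:
    """Scale n_fft and hop_length with the sampling rate (assumed behavior)."""
    if (DEFAULT_N_FFT * fs) % DEFAULT_FS or (DEFAULT_HOP * fs) % DEFAULT_FS:
        raise ValueError(f"{fs} Hz does not give integer STFT sizes")
    n_fft = DEFAULT_N_FFT * fs // DEFAULT_FS
    hop = DEFAULT_HOP * fs // DEFAULT_FS
    return {
        "n_fft": n_fft,
        "hop_length": hop,
        "freq_bins": n_fft // 2 + 1,  # one-sided spectrum
        "window_ms": 1000 * n_fft // fs,
        "hop_ms": 1000 * hop // fs,
    }


if __name__ == "__main__":
    for fs in (8000, 16000, 24000, 48000):
        print(fs, stft_sizes(fs))
```

Under this assumption, 16 kHz input uses a 320-point FFT (161 bins) and 48 kHz input a 960-point FFT (481 bins), while every frame still covers 20 ms.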
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/backward_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/clip.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/forward_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/gpu_max_cached_mem_GB.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/grad_norm.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/iter_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/loss.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/loss_scale.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/optim0_lr0.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/optim_step_time.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/si_snr_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202304'
+ files:
+ model_file: exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/97epoch.pth
+ python: "3.8.16 (default, Mar 2 2023, 03:21:46) \n[GCC 11.2.0]"
+ timestamp: 1722936939.211588
+ torch: 2.0.1+cu118
+ yaml_files:
+ train_config: exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/config.yaml
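`meta.yaml` registers `97epoch.pth` as the released model file, out of `max_epoch: 100`. The config trains with `optim: adam` at `lr: 0.001` and `scheduler: steplr` with `step_size: 2` and `gamma: 0.99`. Assuming the scheduler is stepped once at the end of every epoch and epochs are counted from 1 (an assumption about the training loop, not stated in this commit), the learning rate during epoch e is 0.001 · 0.99^⌊(e − 1) / 2⌋. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def steplr_lr(epoch: int, base_lr: float = 1e-3,
              step_size: int = 2, gamma: float = 0.99) -> float:
    """Learning rate in effect during a 1-indexed epoch under a StepLR schedule.

    Assumes the scheduler steps once after every epoch.
    """
    if epoch < 1:
        raise ValueError("epochs are counted from 1")
    return base_lr * gamma ** ((epoch - 1) // step_size)


if __name__ == "__main__":
    for epoch in (1, 2, 3, 97, 100):
        print(epoch, f"{steplr_lr(epoch):.3e}")
```

Under this assumption, epoch 97 runs at about 6.17e-4, roughly 62% of the initial rate, so the schedule decays only gently over the full run.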