Update README.md
README.md
CHANGED
@@ -4,8 +4,6 @@ tags:
 - audio
 - audio-to-audio
 language: en
-datasets:
-- universal_se
 license: cc-by-4.0
 ---
 
@@ -13,7 +11,7 @@ license: cc-by-4.0
 
 ### `wyz/vctk_dns2020_whamr_bsrnn_large_noncausal`
 
-This model was trained by Emrys365
+This model was trained by Emrys365 based on the universal_se_v1 recipe in [espnet](https://github.com/espnet/espnet/).
 
 ### Demo: How to use in ESPnet2
 
@@ -28,19 +26,19 @@ from espnet2.bin.enh_inference import SeparateSpeech
 
 # For model downloading + loading
 model = SeparateSpeech.from_pretrained(
-model_tag=wyz/vctk_dns2020_whamr_bsrnn_large_noncausal,
+model_tag="wyz/vctk_dns2020_whamr_bsrnn_large_noncausal",
 normalize_output_wav=True,
-device=cuda,
+device="cuda",
 )
 # For loading a downloaded model
 # model = SeparateSpeech(
-# train_config=exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/config.yaml,
-# model_file=exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/xxxx.pth,
+# train_config="exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/config.yaml",
+# model_file="exp_vctk_dns20_whamr/enh_train_enh_bsrnn_large_noncausal_raw/xxxx.pth",
 # normalize_output_wav=True,
 # device=cuda,
 # )
 
-audio, fs = sf.read(/path/to/noisy/utt1.flac)
+audio, fs = sf.read("/path/to/noisy/utt1.flac")
 enhanced = model(audio[None, :], fs=fs)[0]
 ```
 
@@ -67,17 +65,6 @@ enhanced = model(audio[None, :], fs=fs)[0]
 |reverb_et_simu_8ch_multich|2.29|94.59|10.87|10.87|0.00|-8.41|3.12|3.49|3.82|3.83|
 |whamr_tt_mix_single_reverb_max_16k|2.34|94.47|11.98|11.98|0.00|10.41|3.27|3.52|4.10|3.80|
 
-module
-<!-- Generated by ./scripts/utils/show_enh_score.sh -->
-# RESULTS
-## Environments
-- date: `Thu Jan 11 22:52:46 EST 2024`
-- python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
-- espnet version: `espnet 202304`
-- pytorch version: `pytorch 2.0.1+cu118`
-- Git hash: `443028662106472c60fe8bd892cb277e5b488651`
-- Commit date: `Thu May 11 03:32:59 2023 +0000`
-
 
 ## enhanced_test_48k
 
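A note on why the quoting changes in the demo snippet matter: without quotes, the model tag and device are read by Python as expressions rather than strings, and the file path is not valid syntax at all. The following is a minimal sketch using only the standard library; the helper names `parses` and `raises_name_error` are made up for this illustration and are not part of ESPnet or the README.

```python
import ast


def parses(src: str) -> bool:
    """Return True if `src` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False


def raises_name_error(expr: str) -> bool:
    """Return True if evaluating `expr` in an empty namespace raises NameError."""
    try:
        eval(expr, {})
        return False
    except NameError:
        return True


# Old demo line: a leading "/" has no left operand, so this cannot even be parsed.
unquoted_path = "audio, fs = sf.read(/path/to/noisy/utt1.flac)"
# New demo line: the path is an ordinary string literal.
quoted_path = 'audio, fs = sf.read("/path/to/noisy/utt1.flac")'

# Old model tag: parses fine, but as a division of two undefined names.
unquoted_tag = "wyz/vctk_dns2020_whamr_bsrnn_large_noncausal"
# New model tag: a string literal that evaluates to itself.
quoted_tag = '"wyz/vctk_dns2020_whamr_bsrnn_large_noncausal"'

print("unquoted path parses:", parses(unquoted_path))
print("quoted path parses:", parses(quoted_path))
print("unquoted tag raises NameError:", raises_name_error(unquoted_tag))
print("quoted tag raises NameError:", raises_name_error(quoted_tag))
```

The same reasoning applies to `device=cuda`: it is a bare name, so it only works if a variable called `cuda` happens to be defined. The commented-out `# device=cuda,` line is left unquoted by this commit and would likely need the same fix if uncommented.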