Keycatowo committed on
Commit bb5feba · 1 Parent(s): d337940

init trans commit

.streamlit/config.toml ADDED
@@ -0,0 +1,2 @@
+ [server]
+ maxUploadSize = 10
Dockerfile CHANGED
@@ -1,11 +1,24 @@
- FROM python:3.9
-
- WORKDIR /code
-
- COPY ./requirements.txt /code/requirements.txt
-
- RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
-
- COPY . .
-
- CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
+ FROM python:3.8-slim-buster
+
+ # Set the working directory to /app
+ WORKDIR /app
+
+ # Copy everything in the current directory into /app
+ COPY . /app
+
+ # Expose port 8501
+ EXPOSE 8501
+
+ # Install system packages
+ RUN apt update && \
+     apt upgrade -y && \
+     apt install -y libsndfile1 && \
+     apt install -y ffmpeg
+
+ # Upgrade pip and install Python dependencies
+ RUN pip install --upgrade pip && \
+     pip install -r requirements.txt
+
+
+ # Run the app
+ CMD ["streamlit", "run", "home.py"]
docs/1-Basic Information.md ADDED
@@ -0,0 +1,40 @@
+ # Part 1 - Basic Information
+
+ ## What can it do?
+ - Print audio length (seconds)
+ - Plot waveform
+ - Plot RMS (librosa.feature.rms)
+ - Plot spectrogram
+ - Save RMS as .csv
+
+ ## How to use it?
+
+ ### Step 1: Upload an audio file
+ In the upload section, click the uploader, pick the audio file to analyze, and upload it.
+ > Upload limits:
+ > - File size: 200MB
+ > - File formats: `.mp3`, `.wav`, `.ogg`
+
+ ![](../fig/1-上傳檔案.png)
+
+ Once the upload completes, the file's basic information is displayed, along with a player for checking that the audio is correct.
+ ![](../fig/1-上傳完成.png)
+
+ ### Step 2: Select the segment to analyze
+ By default the whole file is analyzed. To analyze a specific part of it, select the segment in this section.
+ Likewise, after selecting, the basic information is displayed along with a player for checking that the audio is correct.
+ ![](../fig/1-截取片段.png)
+
+ ### Step 3: Go to the analysis you need
+
+ #### Waveform
+ ![](../fig/1-繪製聲音波形.png)
+
+ #### Loudness (RMS)
+ ![](../fig/1-聲音強度.png)
+
+ #### Spectrogram
+ ![](../fig/1-聲音頻譜.png)
+
+ #### Save RMS
+ ![](../fig/1-下載RMS.png)
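The RMS feature above relies on `librosa.feature.rms`; as a rough, numpy-only sketch of the frame-wise RMS it computes (the frame and hop sizes mirror librosa's defaults, and the sine input is only an illustration):

```python
import numpy as np

def frame_rms(y, frame_length=2048, hop_length=512):
    """Frame-wise RMS, analogous to librosa.feature.rms (without edge padding)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    frames = np.stack([y[i * hop_length : i * hop_length + frame_length]
                       for i in range(n_frames)])
    return np.sqrt(np.mean(frames ** 2, axis=1))

sr = 22050
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440 * t)   # one second of A440 at half amplitude
rms = frame_rms(y)
print(len(rms), float(rms.mean()))
```

For a steady sine of amplitude 0.5, every frame's RMS sits near 0.5/√2 ≈ 0.354, which is what the exported .csv would contain per frame.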
docs/2-Pitch_estimation.md ADDED
@@ -0,0 +1,15 @@
+ # Part 2 - Pitch Estimation
+
+ ## What can it do?
+ - Mel-frequency spectrogram
+ - Constant-Q transform
+ - Chroma
+ - PYin + mel-frequency spectrogram
+ - Pitch class histogram 1 (calculated from chroma_stft)
+ - Pitch class histogram 2 (calculated from chroma_stft, n_chroma = 120)
+ - Pitch class histogram 3 (calculated from the PYin f0)
+ - Save all histograms (data counts) as .csv
+
+ ## TODO
+ + Which part covers PYin + mel-frequency spectrogram?
+ + Currently only one pitch class histogram is implemented
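The pitch class histograms above count, per frame, which of the 12 chroma bins dominates; a minimal numpy sketch of that idea (the toy chroma matrix below is hypothetical, standing in for `librosa.feature.chroma_stft` output):

```python
import numpy as np

# Toy chroma matrix: 12 pitch classes x 8 frames, values in [0, 1]
rng = np.random.default_rng(0)
chroma = rng.random((12, 8))
chroma[9, :] += 1.0                      # make pitch class A dominate every frame

# Histogram of the strongest pitch class per frame,
# one way to build the pitch class histograms listed above
dominant = chroma.argmax(axis=0)         # per-frame winning pitch class
hist = np.bincount(dominant, minlength=12)
print(hist)
```

With class A boosted in every frame, all 8 frames land in bin 9 and the other 11 bins stay at zero.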
docs/3-Beat Tracking.md ADDED
@@ -0,0 +1,18 @@
+ # Part 3 - Beat Tracking
+
+ ## What can it do?
+ - Onset strength (librosa.onset.onset_strength)
+ - Onset detection + onset time (librosa.onset.onset_detect)
+ - Mel-spectrogram + onset strength + beat time
+ - Predominant local pulse + beat time
+ - Tempo: print static tempo value
+ - Fourier tempogram + estimated tempo
+ - Autocorrelation tempogram + estimated tempo
+ - Save note onset time & beat time as .csv
+ - Output original audio + note onset click sound
+ - Output original audio + beat time click sound
+ - Madmom note onset detection & beat tracking
+   https://madmom.readthedocs.io/en/latest/modules/features.html
+
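The autocorrelation tempogram item above estimates tempo from periodicity in the onset-strength envelope; a minimal numpy sketch of the idea on a synthetic 120 BPM click envelope (the frame rate and lag bounds here are illustrative assumptions, not the app's actual parameters):

```python
import numpy as np

# Synthetic onset-strength envelope: a click every 0.5 s (120 BPM)
# at a frame rate of 100 frames per second
frame_rate = 100
env = np.zeros(1000)
env[::50] = 1.0                          # a click every 50 frames

# Autocorrelation peaks at lags matching the beat period
ac = np.correlate(env, env, mode="full")[len(env) - 1:]
lag = np.argmax(ac[20:400]) + 20         # skip tiny lags (would imply > 300 BPM)
tempo_bpm = 60.0 * frame_rate / lag
print(tempo_bpm)
```

The strongest admissible lag is 50 frames (0.5 s), so the estimate recovers 120 BPM exactly on this clean input.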
docs/4-Chord_recognition.md ADDED
@@ -0,0 +1,6 @@
+ # Part 4 - Chord Recognition
+
+ ## What can it do?
+ - Chromagram + chord recognition result
+   (https://www.audiolabs-erlangen.de/resources/MIR/FMP/C5/C5S2_ChordRec_Templates.html)
+ - Chromagram + binary reconstruction of chord recognition result
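Template-based chord recognition (the FMP approach linked above) scores each chroma frame against binary chord templates and picks the best match; a minimal two-template sketch (the templates and frame below are toy examples, not the app's actual template set):

```python
import numpy as np

# Binary chord templates over the 12 pitch classes (C = 0, ..., B = 11)
C_MAJOR = np.zeros(12); C_MAJOR[[0, 4, 7]] = 1    # C, E, G
A_MINOR = np.zeros(12); A_MINOR[[9, 0, 4]] = 1    # A, C, E
templates = {"C": C_MAJOR, "Am": A_MINOR}

# One chroma frame containing energy at C, E, and G
frame = np.zeros(12)
frame[[0, 4, 7]] = [0.9, 0.8, 0.7]

# Score each template by normalized inner product (cosine similarity)
scores = {name: frame @ t / (np.linalg.norm(frame) * np.linalg.norm(t))
          for name, t in templates.items()}
best = max(scores, key=scores.get)
print(best)
```

The C-major template overlaps all three active bins while A minor misses the G, so the frame is labeled "C".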
docs/5-Structure_analysis.md ADDED
@@ -0,0 +1,10 @@
+ # Part 5 - Structure Analysis
+
+ ## What can it do?
+ - Raw SSM (calculated from chroma)
+ - Smoothed SSM (calculated from chroma)
+ - Novelty function (structural boundary detection)
+ - Save novelty function curve as .csv
+
+ ## TODO
+ + The parts that require `anno.csv` still need to be confirmed
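The SSM (self-similarity matrix) above compares every frame of a feature sequence with every other frame; a minimal numpy sketch on a toy A-B-A sequence, where the repeated section shows up as maximal off-diagonal similarity (the random features are stand-ins for chroma):

```python
import numpy as np

# Two distinct "sections" of 4 chroma-like frames each
rng = np.random.default_rng(1)
a = rng.random((12, 4))                      # section A
b = rng.random((12, 4))                      # section B
feats = np.concatenate([a, b, a], axis=1)    # structure A-B-A, 12 frames total
feats = feats / np.linalg.norm(feats, axis=0)

# Cosine-similarity SSM: frames x frames
ssm = feats.T @ feats
print(ssm.shape)
```

Frame 0 (start of the first A) and frame 8 (start of the repeated A) are identical columns, so `ssm[0, 8]` equals 1, the same as the diagonal; smoothing and novelty detection then operate on this matrix.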
docs/6-Timbre Analysis.md ADDED
@@ -0,0 +1,6 @@
+ # Part 6 - Timbre Analysis
+ - Spectrogram + spectral centroid
+ - Spectrogram + 99% roll-off + 1% roll-off
+ - Spectrogram + centroid + bandwidth (librosa.feature.spectral_bandwidth)
+ - Save spectral centroid, 99% roll-off, 1% roll-off, bandwidth as .csv
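The spectral centroid listed above is the magnitude-weighted mean frequency of a spectrum; a minimal single-frame sketch (the bin layout here is illustrative, not librosa's exact framing):

```python
import numpy as np

# Frequency bin centers for a single magnitude spectrum (Hz)
freqs = np.linspace(0, 11025, 1024)
mags = np.zeros(1024)
mags[100] = 1.0                          # all energy concentrated in one bin

# Magnitude-weighted mean frequency = spectral centroid
centroid = np.sum(freqs * mags) / np.sum(mags)
print(centroid)
```

With all energy in a single bin, the centroid is exactly that bin's frequency; for real audio it tracks where the spectral "center of mass" sits, a common brightness correlate.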
docs/info.md ADDED
@@ -0,0 +1,55 @@
+ # Music Analysis Tool
+
+ This tool integrates pitch estimation, beat tracking, chord recognition, structure analysis, and timbre analysis, aiming to provide a simple, easy-to-use music analysis tool.
+
+ ## Feature Overview
+
+ The main features of this tool are:
+
+ - Basic analysis: basic information about the audio file
+ - Pitch estimation: estimating the pitch of a piece
+ - Beat Tracking: tracking the beat
+ - Chord recognition: recognizing chords
+ - Structure analysis: analyzing musical form
+ - Timbre analysis: analyzing timbre
+
+ We hope this project helps musicians and enthusiasts without a programming background analyze music,
+ by integrating existing music analysis methods and tools into a single, easy-to-use web interface.
+
+ ## Development Team
+
+ + [Yu-Fen Huang](https://yfhuang.info/)
+     + Music & Culture Technology Lab, Institute of Information Science, Academia Sinica, Taiwan
+     + Principal investigator
+ + [Yu-Lan Chang](https://github.com/TrangDuLam)
+     + Institute of Electrical Engineering, National Tsing Hua University
+     + Core feature development, package source integration
+ + [Hong-Hsiang Liu](https://url.o-w-o.cc/link)
+     + Institute of Electrical Engineering, National Tsing Hua University
+     + Interface design, deployment and configuration
+ + Ting-Yi Lu
+     + Institute of Computer Science, National Tsing Hua University
+     + Package source integration, documentation
+
+ ## Related Resources
+ + [Visual interface](https://github.com/Keycatowo/music-analysis): for users without a programming background
+ + [Python package](https://github.com/TrangDuLam/NTHU_Music_AI_Tools): offers finer-grained control, for users with a programming background
+ + Documentation:
+     + ...
+
+ ## Feedback
+
+ If you run into any problems while using this project, please contact us via:
+
+ - Sending us an email
+ - Opening an issue on our GitHub pages
+     - [Visual interface](https://github.com/Keycatowo/music-analysis/issues)
+     - [Python package](https://github.com/TrangDuLam/NTHU_Music_AI_Tools)
+
+ We will reply as soon as possible.
+
+ ## License
+
+ The Music Analysis Tool is released under the [MIT](https://opensource.org/license/mit/) license.
+
+ Please note that our software and content may include third-party libraries and components governed by their own licenses. See the corresponding documentation for details.
home.py ADDED
@@ -0,0 +1,67 @@
+ #%%
+ import streamlit as st
+
+ st.header("Music Analysis Tool")
+
+ # Shared slider state used by the analysis pages
+ st.session_state.start_time = 0.0
+
+
+ st.write(
+     """
+ # Music Analysis Tool
+
+ This tool integrates pitch estimation, beat tracking, chord recognition, structure analysis, and timbre analysis, aiming to provide a simple, easy-to-use music analysis tool.
+
+ ## Feature Overview
+
+ The main features of this tool are:
+
+ - Basic analysis: basic information about the audio file
+ - Pitch estimation: estimating the pitch of a piece
+ - Beat Tracking: tracking the beat
+ - Chord recognition: recognizing chords
+ - Structure analysis: analyzing musical form
+ - Timbre analysis: analyzing timbre
+
+ We hope this project helps musicians and enthusiasts without a programming background analyze music,
+ by integrating existing music analysis methods and tools into a single, easy-to-use web interface.
+
+ ## Development Team
+
+ + [Yu-Fen Huang](https://yfhuang.info/)
+     + Music & Culture Technology Lab, Institute of Information Science, Academia Sinica, Taiwan
+     + Principal investigator
+ + [Yu-Lan Chang](https://github.com/TrangDuLam)
+     + Institute of Electrical Engineering, National Tsing Hua University
+     + Core feature development, package source integration
+ + [Hong-Hsiang Liu](https://url.o-w-o.cc/link)
+     + Institute of Electrical Engineering, National Tsing Hua University
+     + Interface design, deployment and configuration
+ + Ting-Yi Lu
+     + Institute of Computer Science, National Tsing Hua University
+     + Package source integration, documentation
+
+ ## Related Resources
+ + [Visual interface](https://github.com/Keycatowo/music-analysis): for users without a programming background
+ + [Python package](https://github.com/TrangDuLam/NTHU_Music_AI_Tools): offers finer-grained control, for users with a programming background
+ + Documentation:
+     + ...
+
+ ## Feedback
+
+ If you run into any problems while using this project, please contact us via:
+
+ - Sending us an email
+ - Opening an issue on our GitHub pages
+     - [Visual interface](https://github.com/Keycatowo/music-analysis/issues)
+     - [Python package](https://github.com/TrangDuLam/NTHU_Music_AI_Tools)
+
+ We will reply as soon as possible.
+
+ ## License
+
+ The Music Analysis Tool is released under the [MIT](https://opensource.org/license/mit/) license.
+
+ Please note that our software and content may include third-party libraries and components governed by their own licenses. See the corresponding documentation for details.
+     """
+ )
packages.txt ADDED
@@ -0,0 +1 @@
+ libsndfile1-dev
pages/1-Basic_Information.py ADDED
@@ -0,0 +1,132 @@
+ #%%
+ import streamlit as st
+ import plotly.express as px
+ import plotly.graph_objects as go
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import librosa
+ import librosa.display
+ import pandas as pd
+ from src.st_helper import convert_df, show_readme, get_shift
+ from src.basic_info import plot_waveform, signal_RMS_analysis
+
+
+ st.title("Basic Information")
+ #%% Page description
+ # show_readme("docs/1-Basic Information.md")
+
+ # Fall back to a default if home.py has not initialised the shared state
+ if "start_time" not in st.session_state:
+     st.session_state.start_time = 0.0
+
+ #%% File upload section
+ with st.expander("Upload Files"):
+     file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])
+
+     if file is not None:
+         st.audio(file, format="audio/ogg")
+         st.subheader("File information")
+         st.write(f"File name: `{file.name}`")
+         st.write(f"File type: `{file.type}`")
+         st.write(f"File size: `{file.size}`")
+
+         # Load the audio file
+         y, sr = librosa.load(file, sr=44100)
+         st.write(f"Sample rate: `{sr}`")
+         duration = float(np.round(len(y)/sr - 0.005, 2))  # rounded down to 2 decimals so it never exceeds the file length
+         st.write(f"Duration(s): `{duration}`")
+
+         y_all = y
+
+ #%%
+ if file is not None:
+
+     ### Start of segment selection ###
+     with st.expander("Select a segment of the audio"):
+
+         # Slider to pick a segment of the audio, in seconds
+         start_time, end_time = st.slider("Select a segment of the audio",
+                                          0.0, duration,
+                                          (st.session_state.start_time, duration),
+                                          0.01)
+         st.session_state.start_time = start_time
+
+         st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")
+
+         # Extract the selected segment
+         start_index = int(start_time*sr)
+         end_index = int(end_time*sr)
+         y_sub = y_all[start_index:end_index]
+
+         # Player for y_sub
+         st.audio(y_sub, format="audio/ogg", sample_rate=sr)
+         # Time axis for y_sub
+         x_sub = np.arange(len(y_sub))/sr
+     ### End of segment selection ###
+
+     tab1, tab2, tab3, tab4, tab5 = st.tabs([
+         "Waveform (matplotlib)",
+         "Waveform (plotly)",
+         "signal_RMS_analysis",
+         "Spectrogram",
+         "Download RMS data"])
+
+     shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub
+
+     # Plot the waveform (matplotlib)
+     with tab1:
+         st.subheader("Waveform (matplotlib)")
+         fig1_1, ax_1_1 = plt.subplots()
+         ax_1_1.plot(x_sub + shift_time, y_sub)
+         ax_1_1.set_xlabel("Time(s)")
+         ax_1_1.set_ylabel("Amplitude")
+         ax_1_1.set_title("Waveform")
+         st.pyplot(fig1_1)
+
+     # Plot the waveform (interactive, plotly)
+     with tab2:
+         st.subheader("Waveform (plotly)")
+         fig1_2 = go.Figure(data=go.Scatter(x=x_sub + shift_time, y=y_sub))
+         fig1_2.update_layout(
+             title="Waveform",
+             xaxis_title="Time(s)",
+             yaxis_title="Amplitude",
+         )
+         st.plotly_chart(fig1_2)
+
+     # Plot the RMS curve
+     with tab3:
+         st.subheader("signal_RMS_analysis")
+         fig1_3, ax1_3, times, rms = signal_RMS_analysis(y_sub, shift_time=shift_time)
+         st.pyplot(fig1_3)
+
+     # Plot the spectrogram (with librosa)
+     with tab4:
+         st.subheader("Spectrogram")
+         stft = librosa.stft(y_sub)
+         stft_db = librosa.amplitude_to_db(abs(stft))
+         fig1_4, ax1_4 = plt.subplots()
+         librosa.display.specshow(stft_db, x_axis='time', y_axis='log', sr=sr, ax=ax1_4)
+         ax1_4.set_xticks(shift_array - shift_array[0],
+                          shift_array)
+         ax1_4.autoscale()
+         ax1_4.set_xlabel("Time(s)")
+         st.pyplot(fig1_4)
+
+     # Download the RMS data
+     with tab5:
+         st.subheader("Download RMS data")
+
+         col1, col2 = st.columns(2)
+         with col1:
+             rms_df = pd.DataFrame({"Time(s)": times, "RMS": rms[0,:]})
+             st.dataframe(rms_df, use_container_width=True)
+         with col2:
+             st.download_button(
+                 "Download RMS data",
+                 convert_df(rms_df),
+                 "rms.csv",
+                 "text/csv",
+                 key="download-csv"
+             )
+
+ # %%
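The segment-selection blocks in the pages above convert slider times (seconds) to sample indices before slicing; a minimal sketch of that conversion (the dummy signal stands in for a loaded file):

```python
import numpy as np

# Five "seconds" of dummy samples at the app's fixed sample rate
sr = 44100
y_all = np.zeros(5 * sr, dtype=np.float32)

# Slider values in seconds -> sample indices -> sliced segment
start_time, end_time = 1.25, 3.0
y_sub = y_all[int(start_time * sr):int(end_time * sr)]
print(len(y_sub) / sr)
```

The sliced segment spans end_time − start_time = 1.75 s of audio; everything downstream (waveform, RMS, spectrogram) operates on `y_sub` only, with `get_shift` restoring the original time axis for plotting.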
pages/2-Pitch_estimation.py ADDED
@@ -0,0 +1,127 @@
+ #%%
+ import streamlit as st
+ import plotly.express as px
+ import plotly.graph_objects as go
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import librosa
+ import pandas as pd
+ import seaborn as sns
+ from src.st_helper import convert_df, show_readme, get_shift
+ from src.pitch_estimation import plot_mel_spectrogram, plot_constant_q_transform, pitch_class_type_one_vis, pitch_class_histogram_chroma
+
+
+ st.title("Pitch estimation")
+ #%% Page description
+ # show_readme("docs/2-Pitch_estimation.md")
+
+ # Fall back to a default if home.py has not initialised the shared state
+ if "start_time" not in st.session_state:
+     st.session_state.start_time = 0.0
+
+ #%% File upload section
+ with st.expander("Upload Files"):
+     file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])
+
+     if file is not None:
+         st.audio(file, format="audio/ogg")
+         st.subheader("File information")
+         st.write(f"File name: `{file.name}`")
+         st.write(f"File type: `{file.type}`")
+         st.write(f"File size: `{file.size}`")
+
+         # Load the audio file
+         y, sr = librosa.load(file, sr=44100)
+         st.write(f"Sample rate: `{sr}`")
+         duration = float(np.round(len(y)/sr - 0.005, 2))  # rounded down to 2 decimals so it never exceeds the file length
+         st.write(f"Duration(s): `{duration}`")
+
+         y_all = y
+
+ #%% Features
+ if file is not None:
+     ### Start of segment selection ###
+     with st.expander("Select a segment of the audio"):
+
+         # Slider to pick a segment of the audio, in seconds
+         start_time, end_time = st.slider("Select a segment of the audio",
+                                          0.0, duration,
+                                          (st.session_state.start_time, duration),
+                                          0.01)
+         st.session_state.start_time = start_time
+
+         st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")
+
+         # Extract the selected segment
+         start_index = int(start_time*sr)
+         end_index = int(end_time*sr)
+         y_sub = y_all[start_index:end_index]
+
+         # Player for y_sub
+         st.audio(y_sub, format="audio/ogg", sample_rate=sr)
+         # Time axis for y_sub
+         x_sub = np.arange(len(y_sub))/sr
+     ### End of segment selection ###
+
+     tab1, tab2, tab3, tab4 = st.tabs(["Mel-frequency spectrogram", "Constant-Q transform", "Chroma", "Pitch class"])
+
+     shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub
+
+     # Mel-frequency spectrogram
+     with tab1:
+         st.subheader("Mel-frequency spectrogram")
+         with_pitch = st.checkbox("Show pitch", value=True)
+         fig2_1, ax2_1 = plot_mel_spectrogram(y_sub, sr, shift_array, with_pitch)
+         st.pyplot(fig2_1)
+
+     # Constant-Q transform
+     with tab2:
+         st.subheader("Constant-Q transform")
+         fig2_2, ax2_2 = plot_constant_q_transform(y_sub, sr, shift_array)
+         st.pyplot(fig2_2)
+
+     # Chroma
+     with tab3:
+         st.subheader("Chroma")
+
+         chroma = librosa.feature.chroma_stft(y=y_sub, sr=sr)
+         chroma_t = librosa.times_like(chroma, sr=sr)
+         df_chroma = pd.DataFrame(chroma)
+         df_chroma_t = pd.DataFrame({"Time(s)": chroma_t})
+         df_chroma_t["Time(frame)"] = list(range(len(chroma_t)))
+         df_chroma_t["Time(s)"] = df_chroma_t["Time(s)"] + shift_time
+         df_chroma_t = df_chroma_t[["Time(frame)", "Time(s)"]]
+
+         fig2_3, ax2_3 = plt.subplots(figsize=(10, 4))
+         sns.heatmap(chroma, ax=ax2_3)
+         ax2_3.set_title("Chroma")
+         ax2_3.set_xlabel("Time(frame)")
+         ax2_3.invert_yaxis()
+         st.pyplot(fig2_3)
+
+         st.write("Chroma value")
+         st.dataframe(df_chroma, use_container_width=True)
+         st.download_button(
+             label="Download chroma",
+             data=convert_df(df_chroma),
+             file_name="chroma_value.csv",
+         )
+         st.write("Chroma time")
+         st.dataframe(df_chroma_t, use_container_width=True)
+         st.download_button(
+             label="Download chroma time",
+             data=convert_df(df_chroma_t),
+             file_name="chroma_time.csv",
+         )
+
+     # Pitch class histogram
+     with tab4:
+         st.subheader("Pitch class (chroma)")
+         high_res = st.checkbox("High resolution", value=False)
+         fig2_4, ax2_4, df_pitch_class = pitch_class_histogram_chroma(y_sub, sr, high_res)
+         st.pyplot(fig2_4)
+         st.write(df_pitch_class)
+         st.download_button(
+             label="Download pitch class (chroma)",
+             data=convert_df(pd.DataFrame(df_pitch_class)),
+             file_name="Pitch_class(chroma).csv",
+             mime="text/csv",
+         )
pages/3-Beat Tracking.py ADDED
@@ -0,0 +1,164 @@
+ #%%
+ import streamlit as st
+ import plotly.express as px
+ import plotly.graph_objects as go
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import librosa
+ import pandas as pd
+ from src.beat_track import onsets_detection, plot_onset_strength, beat_analysis, predominant_local_pulse, static_tempo_estimation, plot_tempogram, onset_click_plot, beat_plot
+ from src.st_helper import convert_df, show_readme, get_shift
+
+ st.title('Beat Tracking')
+
+ #%% Page description
+ # show_readme("docs/3-Beat Tracking.md")
+
+ # Fall back to a default if home.py has not initialised the shared state
+ if "start_time" not in st.session_state:
+     st.session_state.start_time = 0.0
+
+ #%% File upload section
+ with st.expander("Upload Files"):
+     file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])
+
+     if file is not None:
+         st.audio(file, format="audio/ogg")
+         st.subheader("File information")
+         st.write(f"File name: `{file.name}`")
+         st.write(f"File type: `{file.type}`")
+         st.write(f"File size: `{file.size}`")
+
+         # Load the audio file
+         y, sr = librosa.load(file, sr=44100)
+         st.write(f"Sample rate: `{sr}`")
+         duration = float(np.round(len(y)/sr - 0.005, 2))  # rounded down to 2 decimals so it never exceeds the file length
+         st.write(f"Duration(s): `{duration}`")
+
+         y_all = y
+
+ #%% Features
+ if file is not None:
+
+     ### Start of segment selection ###
+     with st.expander("Select a segment of the audio"):
+
+         # Slider to pick a segment of the audio, in seconds
+         start_time, end_time = st.slider("Select a segment of the audio",
+                                          0.0, duration,
+                                          (st.session_state.start_time, duration),
+                                          0.01)
+         st.session_state.start_time = start_time
+
+         st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")
+
+         # Extract the selected segment
+         start_index = int(start_time*sr)
+         end_index = int(end_time*sr)
+         y_sub = y_all[start_index:end_index]
+
+         # Player for y_sub
+         st.audio(y_sub, format="audio/ogg", sample_rate=sr)
+         # Time axis for y_sub
+         x_sub = np.arange(len(y_sub))/sr
+     ### End of segment selection ###
+
+     tab1, tab2, tab3, tab4, tab5, tab6 = st.tabs([
+         "onsets_detection",
+         "onset_strength",
+         "beat_analysis",
+         "predominant_local_pulse",
+         "static_tempo_estimation",
+         "Tempogram"])
+
+     shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub
+
+     # onsets_detection
+     with tab1:
+         st.subheader("onsets_detection")
+         fig3_1a, ax3_1a, onset_data = onsets_detection(y_sub, sr, shift_array)
+         o_env, o_times, onset_frames = onset_data
+         st.pyplot(fig3_1a)
+         # Onset frame adjustment section
+         clicks = st.multiselect("Onset",
+                                 list(range(len(o_env))), list(onset_frames))
+         fig3_1b, ax3_1b, y_onset_clicks = onset_click_plot(o_env, o_times, clicks, len(y_sub), sr, shift_time)
+         st.pyplot(fig3_1b)
+         df_onset = pd.DataFrame({"Frame": clicks, "Time(s)": o_times[clicks], "Onset": o_env[clicks]})
+         st.dataframe(df_onset, use_container_width=True)
+         st.download_button(
+             label="Download onset data",
+             data=convert_df(df_onset),
+             file_name="onset_data.csv",
+         )
+         st.audio(y_onset_clicks, format="audio/ogg", sample_rate=sr)
+
+     # onset_strength
+     with tab2:
+         st.subheader("onset_strength")
+         onset_strength_standard = st.checkbox("standard", value=True)
+         onset_strength_custom_mel = st.checkbox("custom_mel", value=False)
+         onset_strength_cqt = st.checkbox("cqt", value=False)
+         fig3_2, ax3_2 = plot_onset_strength(y_sub, sr,
+                                             standard=onset_strength_standard,
+                                             custom_mel=onset_strength_custom_mel,
+                                             cqt=onset_strength_cqt,
+                                             shift_array=shift_array)
+         st.pyplot(fig3_2)
+
+     # beat_analysis
+     with tab3:
+         st.subheader("beat_analysis")
+         spec_type = st.selectbox("spec_type", ["mel", "stft"])
+         spec_hop_length = st.number_input("spec_hop_length", value=512)
+         fig3_3a, ax3_3a, beats_data = beat_analysis(y_sub, sr,
+                                                     spec_type=spec_type,
+                                                     spec_hop_length=spec_hop_length,
+                                                     shift_array=shift_array)
+         b_times, b_env, b_tempo, b_beats = beats_data
+         st.pyplot(fig3_3a)
+
+         b_clicks = st.multiselect("Beats",
+                                   list(range(len(b_env))), list(b_beats))
+         fig3_3b, ax3_3b, y_beat_clicks = beat_plot(b_times, b_env, b_tempo, b_clicks, len(y_sub), sr, shift_time)
+         st.pyplot(fig3_3b)
+         df_beats = pd.DataFrame({"Frame": b_clicks, "Time(s)": b_times[b_clicks] + shift_time, "Beats": b_env[b_clicks]})
+         st.dataframe(df_beats, use_container_width=True)
+         st.download_button(
+             label="Download beats data",
+             data=convert_df(df_beats),
+             file_name="beats_data.csv",
+         )
+         st.audio(y_beat_clicks, format="audio/ogg", sample_rate=sr)
+
+     # predominant_local_pulse
+     with tab4:
+         st.subheader("predominant_local_pulse")
+         fig3_4, ax3_4 = predominant_local_pulse(y_sub, sr, shift_time)
+         st.pyplot(fig3_4)
+
+     # static_tempo_estimation
+     with tab5:
+         st.subheader("static_tempo_estimation")
+         static_tempo_estimation_hop_length = st.number_input("hop_length", value=512)
+         fig3_5, ax3_5 = static_tempo_estimation(y_sub, sr,
+                                                 hop_length=static_tempo_estimation_hop_length)
+         st.pyplot(fig3_5)
+
+     # Tempogram
+     with tab6:
+         st.subheader("Tempogram")
+         tempogram_type = st.selectbox("tempogram_type", ["fourier", "autocorr"], index=1)
+         tempogram_hop_length = st.number_input("Tempogram_hop_length", value=512)
+         fig3_6, ax3_6 = plot_tempogram(y_sub, sr,
+                                        type=tempogram_type,
+                                        hop_length=tempogram_hop_length,
+                                        shift_array=shift_array)
+         st.pyplot(fig3_6)
pages/4-Chord_recognition.py ADDED
@@ -0,0 +1,112 @@
+ #%%
+ import streamlit as st
+ import plotly.express as px
+ import plotly.graph_objects as go
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import librosa
+ import pandas as pd
+ import seaborn as sns
+ from src.st_helper import convert_df, show_readme, get_shift
+ from src.chord_recognition import (
+     plot_chord_recognition,
+     plot_binary_template_chord_recognition,
+     chord_table,
+     compute_chromagram,
+     chord_recognition_template,
+     plot_chord,
+     plot_user_chord
+ )
+
+ st.title("Chord Recognition")
+
+ #%% Page description
+ # show_readme("docs/4-Chord_recognition.md")
+
+ # Fall back to a default if home.py has not initialised the shared state
+ if "start_time" not in st.session_state:
+     st.session_state.start_time = 0.0
+
+ #%% File upload section
+ with st.expander("Upload Files"):
+     file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])
+
+     if file is not None:
+         st.audio(file, format="audio/ogg")
+         st.subheader("File information")
+         st.write(f"File name: `{file.name}`")
+         st.write(f"File type: `{file.type}`")
+         st.write(f"File size: `{file.size}`")
+
+         # Load the audio file
+         y, sr = librosa.load(file, sr=44100)
+         st.write(f"Sample rate: `{sr}`")
+         duration = float(np.round(len(y)/sr - 0.005, 2))  # rounded down to 2 decimals so it never exceeds the file length
+         st.write(f"Duration(s): `{duration}`")
+
+         y_all = y
+
+ #%%
+ if file is not None:
+
+     ### Start of segment selection ###
+     with st.expander("Select a segment of the audio"):
+
+         # Slider to pick a segment of the audio, in seconds
+         start_time, end_time = st.slider("Select a segment of the audio",
+                                          0.0, duration,
+                                          (st.session_state.start_time, duration),
+                                          0.01)
+         st.session_state.start_time = start_time
+
+         st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")
+
+         # Extract the selected segment
+         start_index = int(start_time*sr)
+         end_index = int(end_time*sr)
+         y_sub = y_all[start_index:end_index]
+
+         # Player for y_sub
+         st.audio(y_sub, format="audio/ogg", sample_rate=sr)
+         # Time axis for y_sub
+         x_sub = np.arange(len(y_sub))/sr
+     ### End of segment selection ###
+
+     tab1, tab2, tab3, tab4 = st.tabs(["STFT Chroma", "Chords Result (Default)", "Chords Result (User)", "dev"])
+     shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub
+
+     # STFT Chroma
+     with tab1:
+         chroma, _, _, _, duration = compute_chromagram(y_sub, sr)
+         fig4_1, ax4_1 = plot_chord(chroma, "STFT Chroma")
+         st.pyplot(fig4_1)
+
+     with tab2:
+         _, chord_max = chord_recognition_template(chroma, norm_sim='max')
+         fig4_2, ax4_2 = plot_chord(chord_max, "Chord Recognition Result", cmap="crest", include_minor=True)
+         st.pyplot(fig4_2)
+
+     with tab3:
+         # Build the chord result dataframe
+         sec_per_frame = duration/chroma.shape[1]
+         chord_results_df = pd.DataFrame({
+             "Frame": np.arange(chroma.shape[1]),
+             "Time(s)": np.arange(chroma.shape[1])*sec_per_frame + shift_time,
+             "Chord": chord_table(chord_max)
+         })
+
+         fig4_1b, ax4_1b = plot_user_chord(chord_results_df)
+         st.pyplot(fig4_1b)
+
+         chord_results_df = st.experimental_data_editor(
+             chord_results_df,
+             use_container_width=True
+         )
+
+     # plot_binary_template_chord_recognition
+     with tab4:
+         st.subheader("plot_binary_template_chord_recognition")
+         fig4_4, ax4_4 = plot_binary_template_chord_recognition(y_sub, sr)
+         st.pyplot(fig4_4)
@@ -0,0 +1,76 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #%%
2
+ import streamlit as st
3
+ import plotly.express as px
4
+ import plotly.graph_objects as go
5
+ import matplotlib.pyplot as plt
6
+ import numpy as np
7
+ import librosa
8
+ import pandas as pd
9
+ from src.st_helper import convert_df, show_readme
10
+ from src.structure_analysis import (
11
+ plot_self_similarity
12
+ )
13
+
14
+ st.title("Structure analysis")
15
+
16
+ #%% 頁面說明
17
+ # show_readme("docs/5-Structure_analysis.md")
18
+
19
+
20
+ #%% 上傳檔案區塊
21
+ with st.expander("上傳檔案(Upload Files)"):
22
+ file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])
23
+
24
+ if file is not None:
25
+ st.audio(file, format="audio/ogg")
26
+ st.subheader("File information")
27
+ st.write(f"File name: `{file.name}`", )
28
+ st.write(f"File type: `{file.type}`")
29
+ st.write(f"File size: `{file.size}`")
30
+
31
+ # 載入音檔
32
+ y, sr = librosa.load(file, sr=44100)
33
+ st.write(f"Sample rate: `{sr}`")
34
+ duration = float(np.round(len(y)/sr-0.005, 2)) # 時間長度,取小數點後2位,向下取整避免超過音檔長度
35
+ st.write(f"Duration(s): `{duration}`")
36
+
37
+
38
+ y_all = y
39
+
40
+ #%%
41
+ if file is not None:
42
+
43
+ ### Start of 選擇聲音片段 ###
44
+ with st.expander("選擇聲音片段(Select a segment of the audio)"):
45
+
46
+ # 建立一個滑桿,可以選擇聲音片段,使用時間長度為單位
47
+ start_time, end_time = st.slider("Select a segment of the audio",
48
+ 0.0, duration,
49
+ (st.session_state.start_time, duration),
50
+ 0.01
51
+ )
52
+ st.session_state.start_time = start_time
53
+
54
+ st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")
55
+
56
+ # 根據選擇的聲音片段,取出聲音資料
57
+ start_index = int(start_time*sr)
58
+ end_index = int(end_time*sr)
59
+ y_sub = y_all[start_index:end_index]
60
+
61
+
62
+ # 建立一個y_sub的播放器
63
+ st.audio(y_sub, format="audio/ogg", sample_rate=sr)
64
+ # 計算y_sub所對應時間的x軸
65
+ x_sub = np.arange(len(y_sub))/sr
66
+ ### End of 選擇聲音片段 ###
67
+
68
+ tab1, tab2 = st.tabs(["Self-similarity matrix", "empty"])
69
+
70
+ # plot_self_similarity
71
+ with tab1:
72
+ st.subheader("Self-similarity matrix")
73
+ affinity = st.checkbox("Affinity", value=False)
74
+ self_similarity_hop_length = st.number_input("Self similarity hop length", value=1024)
75
+ fig5_1, ax5_1 = plot_self_similarity(y_sub, sr, affinity=affinity, hop_length=self_similarity_hop_length)
76
+ st.pyplot(fig5_1)
pages/6-Timbre_Analysis.py ADDED
@@ -0,0 +1,149 @@
+ #%%
+ import streamlit as st
+ import plotly.express as px
+ import plotly.graph_objects as go
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import librosa
+ import pandas as pd
+ from src.st_helper import convert_df, show_readme, get_shift
+ from src.timbre_analysis import (
+     spectral_centroid_analysis,
+     rolloff_frequency_analysis,
+     spectral_bandwidth_analysis,
+     harmonic_percussive_source_separation
+ )
+
+ st.title("Timbre Analysis")
+ #%% Page description
+ # show_readme("docs/6-Timbre Analysis.md")
+
+ # Fall back to a default if home.py has not initialised the shared state
+ if "start_time" not in st.session_state:
+     st.session_state.start_time = 0.0
+
+ #%% File upload section
+ with st.expander("Upload Files"):
+     file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])
+
+     if file is not None:
+         st.audio(file, format="audio/ogg")
+         st.subheader("File information")
+         st.write(f"File name: `{file.name}`")
+         st.write(f"File type: `{file.type}`")
+         st.write(f"File size: `{file.size}`")
+
+         # Load the audio file
+         y, sr = librosa.load(file, sr=44100)
+         st.write(f"Sample rate: `{sr}`")
+         duration = float(np.round(len(y)/sr - 0.005, 2))  # rounded down to 2 decimals so it never exceeds the file length
+         st.write(f"Duration(s): `{duration}`")
+
+         y_all = y
+
+ #%%
+ if file is not None:
+
+     ### Start of segment selection ###
+     with st.expander("Select a segment of the audio"):
+
+         # Slider to pick a segment of the audio, in seconds
+         start_time, end_time = st.slider("Select a segment of the audio",
+                                          0.0, duration,
+                                          (st.session_state.start_time, duration),
+                                          0.01)
+         st.session_state.start_time = start_time
+
+         st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")
+
+         # Extract the selected segment
+         start_index = int(start_time*sr)
+         end_index = int(end_time*sr)
+         y_sub = y_all[start_index:end_index]
+
+         # Player for y_sub
+         st.audio(y_sub, format="audio/ogg", sample_rate=sr)
+         # Time axis for y_sub
+         x_sub = np.arange(len(y_sub))/sr
+     ### End of segment selection ###
+
+     tab1, tab2, tab3, tab4 = st.tabs(["Spectral Centroid", "Rolloff Frequency", "Spectral Bandwidth", "Harmonic Percussive Source Separation"])
+
+     shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub
+
+     # spectral_centroid_analysis
+     with tab1:
+         st.subheader("Spectral Centroid Analysis")
+         fig6_1, ax6_1, centroid_value = spectral_centroid_analysis(y_sub, sr, shift_array)
+         st.pyplot(fig6_1)
+
+         df_centroid = pd.DataFrame(centroid_value.T, columns=["Time(s)", "Centroid"])
+         df_centroid["Time(s)"] = df_centroid["Time(s)"] + shift_time
+         st.dataframe(df_centroid, use_container_width=True)
+         st.download_button(
+             label="Download spectral centroid data",
+             data=convert_df(df_centroid),
+             file_name="centroid.csv",
+             mime="text/csv",
+         )
+
+     # rolloff_frequency_analysis
+     with tab2:
+         st.subheader("Rolloff Frequency Analysis")
+         roll_percent = st.selectbox("Select rolloff frequency", [0.90, 0.95, 0.99])
+         fig6_2, ax6_2, rolloff_value = rolloff_frequency_analysis(y_sub, sr, roll_percent=roll_percent, shift_array=shift_array)
+         st.pyplot(fig6_2)
+         df_rolloff = pd.DataFrame(rolloff_value.T, columns=["Time(s)", "Rolloff", "Rolloff_min"])
+         df_rolloff["Time(s)"] = df_rolloff["Time(s)"] + shift_time
+         st.dataframe(df_rolloff, use_container_width=True)
+         st.download_button(
+             label="Download rolloff frequency data",
+             data=convert_df(df_rolloff),
+             file_name="rolloff.csv",
+             mime="text/csv",
+         )
+
+     # spectral_bandwidth_analysis
+     with tab3:
+         st.subheader("Spectral Bandwidth Analysis")
+         fig6_3, ax6_3, bandwidth_value = spectral_bandwidth_analysis(y_sub, sr, shift_array)
+         st.pyplot(fig6_3)
+         df_bandwidth = pd.DataFrame(bandwidth_value.T, columns=["Time(s)", "Bandwidth"])
+         df_bandwidth["Time(s)"] = df_bandwidth["Time(s)"] + shift_time
+         st.dataframe(df_bandwidth, use_container_width=True)
+         st.download_button(
+             label="Download spectral bandwidth data",
+             data=convert_df(df_bandwidth),
+             file_name="bandwidth.csv",
+             mime="text/csv",
+         )
+
+     # harmonic_percussive_source_separation
+     with tab4:
+         st.subheader("Harmonic Percussive Source Separation")
+         fig6_4, ax6_4, Harmonic_data = harmonic_percussive_source_separation(y_sub, sr, shift_array)
+         D, H, P, t = Harmonic_data
+         st.pyplot(fig6_4)
+
+         st.download_button(
+             label="Download Full power spectrogram data",
+             data=convert_df(pd.DataFrame(D)),
+             file_name="Full_power_spectrogram.csv",
+             use_container_width=True,
+         )
+         st.download_button(
133
+ label="Download Harmonic power spectrogram data",
134
+ data=convert_df(pd.DataFrame(H)),
135
+ file_name="Harmonic_power_spectrogram.csv",
136
+ use_container_width=True,
137
+ )
138
+ st.download_button(
139
+ label="Download Percussive power spectrogram data",
140
+ data=convert_df(pd.DataFrame(P)),
141
+ file_name="Percussive_power_spectrogram.csv",
142
+ use_container_width=True,
143
+ )
144
+ st.download_button(
145
+ label="Download Time data",
146
+ data=convert_df(pd.DataFrame(t+shift_time, columns=["Time(s)"])),
147
+ file_name="Time_scale.csv",
148
+ use_container_width=True,
149
+ )
pages/999-dev.py ADDED
@@ -0,0 +1,18 @@
+ #%%
+ import pkg_resources
+ import streamlit as st
+
+ with st.expander("Show packages"):
+     for dist in pkg_resources.working_set:
+         print(f"{dist.project_name}=={dist.version}")
+         st.write(f"{dist.project_name}=={dist.version}")
+
+ #%%
+ import os
+ import psutil
+
+ with st.expander("Show memory usage"):
+     process = psutil.Process(os.getpid())
+     mem_info = process.memory_info()
+     print(f"Memory usage: {mem_info.rss / 1024 / 1024:.2f} MB")
+     st.write(f"Memory usage: {mem_info.rss / 1024 / 1024:.2f} MB")
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ librosa==0.9.2
+ pandas==1.3.5
+ streamlit==1.19.0
+ numpy==1.23.0
+ seaborn==0.12.1
+ matplotlib==3.5.3
+ plotly==5.11.0
+ scikit-learn==1.2.0
+ soundfile==0.11.0
+ libfmp==1.2.3
+ psutil==5.9.1
src/basic_info.py ADDED
@@ -0,0 +1,48 @@
+ import librosa
+ from librosa import display
+ from librosa import feature
+
+ import numpy as np
+ from matplotlib import pyplot as plt
+ import scipy
+
+ from numpy import typing as npt
+ import typing
+
+
+ def show_duration(y: npt.ArrayLike, sr: int) -> float:
+     pass
+
+
+ def select_time(start_time: float, end_time: float):
+     pass
+
+
+ def plot_waveform(ax, y: npt.ArrayLike, sr: int, start_time: float = 0.0, end_time: float = None) -> None:
+     start_idx = int(start_time * sr)
+
+     if not end_time:
+         librosa.display.waveshow(y[start_idx:], sr=sr, ax=ax)
+     else:
+         end_idx = int(end_time * sr)
+         librosa.display.waveshow(y[start_idx:end_idx], sr=sr, ax=ax)  # slices are end-exclusive, so no off-by-one adjustment is needed
+
+     return
+
+
+ def signal_RMS_analysis(y: npt.ArrayLike, shift_time: float = 0.0):
+     fig, ax = plt.subplots()
+
+     rms = librosa.feature.rms(y=y)
+     times = librosa.times_like(rms) + shift_time
+
+     ax.plot(times, rms[0])
+     ax.set_xlabel('Time (s)')
+     ax.set_ylabel('RMS')
+
+     return fig, ax, times, rms
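As a sanity check on `signal_RMS_analysis` above: `librosa.feature.rms` is framed root-mean-square energy. A NumPy-only sketch of the same idea (frame and hop sizes match librosa's defaults, but librosa's centering/padding is omitted, so edge frames differ slightly):

```python
import numpy as np

def frame_rms(y, frame_length=2048, hop_length=512):
    """Root-mean-square energy per frame: a simplified, no-padding
    stand-in for librosa.feature.rms."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = y[i * hop_length : i * hop_length + frame_length]
        rms[i] = np.sqrt(np.mean(frame ** 2))
    return rms

# A constant-amplitude sine has RMS approximately amplitude / sqrt(2)
sr = 22050
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440 * t)
rms = frame_rms(y)
```

Plotting `rms` against `hop_length / sr` times the frame index reproduces the curve `signal_RMS_analysis` draws.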
src/beat_track.py ADDED
@@ -0,0 +1,223 @@
+ import librosa
+ from librosa import display
+ from librosa import feature
+
+ import numpy as np
+ from matplotlib import pyplot as plt
+ import scipy
+ import soundfile as sf
+
+ from numpy import typing as npt
+ import typing
+
+
+ def onsets_detection(y: npt.ArrayLike, sr: int, shift_array: npt.ArrayLike) -> tuple:
+     """
+     Compute and plot the onset frames of the audio.
+     """
+     o_env = librosa.onset.onset_strength(y=y, sr=sr)
+     times = librosa.times_like(o_env, sr=sr)
+     onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)
+     D = np.abs(librosa.stft(y))
+
+     fig, ax = plt.subplots()
+     librosa.display.specshow(librosa.amplitude_to_db(D, ref=np.max),
+                              x_axis='time', y_axis='log', ax=ax, sr=sr)
+     ax.set_xticks(shift_array - shift_array[0], shift_array)
+     ax.set_xlabel('Time (s)')
+     ax.autoscale()
+     ax.set(title='Power spectrogram')
+
+     return fig, ax, (o_env, times, onset_frames)
+
+
+ def onset_click_plot(o_env, times, onset_frames, y_len, sr, shift_time) -> tuple:
+     """
+     Re-plot the onset strength with the detected onsets marked.
+     """
+     fig, ax = plt.subplots()
+     ax.plot(times + shift_time, o_env, label='Onset strength')
+     ax.vlines(times[onset_frames] + shift_time, 0, o_env.max(), color='r', alpha=0.9,
+               linestyles='--', label='Onsets')
+     ax.autoscale()
+     ax.legend()
+     ax.set_xlabel('Time (s)')
+     ax.set_ylabel('Strength')
+
+     y_onset_clicks = librosa.clicks(frames=onset_frames, sr=sr, length=y_len)
+     return fig, ax, y_onset_clicks
+
+
+ def plot_onset_strength(y: npt.ArrayLike, sr: int, standard: bool = True, custom_mel: bool = False, cqt: bool = False, shift_array: npt.ArrayLike = None) -> tuple:
+     D = np.abs(librosa.stft(y))
+     times = librosa.times_like(D, sr=sr)
+
+     fig, ax = plt.subplots(nrows=2, sharex=True)
+     librosa.display.specshow(librosa.amplitude_to_db(D, ref=np.max),
+                              y_axis='log', x_axis='time', ax=ax[0], sr=sr)
+     ax[0].set(title='Power spectrogram')
+     ax[0].label_outer()
+
+     # Standard onset function
+     if standard:
+         onset_env_standard = librosa.onset.onset_strength(y=y, sr=sr)
+         ax[1].plot(times, 2 + onset_env_standard / onset_env_standard.max(), alpha=0.8, label='Mean (mel)')
+
+     if custom_mel:
+         onset_env_mel = librosa.onset.onset_strength(y=y, sr=sr,
+                                                      aggregate=np.median,
+                                                      fmax=8000, n_mels=256)
+         ax[1].plot(times, 1 + onset_env_mel / onset_env_mel.max(), alpha=0.8, label='Median (custom mel)')
+
+     if cqt:
+         C = np.abs(librosa.cqt(y=y, sr=sr))
+         onset_env_cqt = librosa.onset.onset_strength(sr=sr, S=librosa.amplitude_to_db(C, ref=np.max))
+         ax[1].plot(times, onset_env_cqt / onset_env_cqt.max(), alpha=0.8, label='Mean (CQT)')
+
+     ax[1].legend()
+     ax[1].set(ylabel='Normalized strength', yticks=[])
+     ax[1].set_xticks(shift_array - shift_array[0], shift_array)
+     ax[1].autoscale()
+     ax[1].set_xlabel('Time (s)')
+
+     return fig, ax
+
+
+ def beat_analysis(y: npt.ArrayLike, sr: int, spec_type: str = 'mel', spec_hop_length: int = 512, shift_array: npt.ArrayLike = None):
+     fig, ax = plt.subplots()
+     onset_env = librosa.onset.onset_strength(y=y, sr=sr, aggregate=np.median)
+     tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
+     times = librosa.times_like(onset_env, sr=sr, hop_length=spec_hop_length)
+
+     if spec_type == 'mel':
+         M = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=spec_hop_length)
+         librosa.display.specshow(librosa.power_to_db(M, ref=np.max),
+                                  y_axis='mel', x_axis='time', hop_length=spec_hop_length,
+                                  ax=ax, sr=sr)
+         ax.set(title='Mel spectrogram')
+
+     if spec_type == 'stft':
+         S = np.abs(librosa.stft(y))
+         img = librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
+                                        y_axis='log', x_axis='time', ax=ax, sr=sr)
+         ax.set_title('Power spectrogram')
+         # fig.colorbar(img, ax=ax, format="%+2.0f dB")
+
+     ax.set_xticks(shift_array - shift_array[0], shift_array)
+     ax.autoscale()
+     ax.set_xlabel('Time (s)')
+
+     return fig, ax, (times, onset_env, tempo, beats)
+
+
+ def beat_plot(times, onset_env, tempo, beats, y_len, sr, shift_time):
+     """
+     Re-plot the onset envelope with the tracked beats marked.
+     """
+     fig, ax = plt.subplots()
+     ax.plot(times + shift_time, librosa.util.normalize(onset_env), label='Onset strength')
+     ax.vlines(times[beats] + shift_time, 0, 1, alpha=0.5, color='r', linestyle='--', label='Beats')
+     tempo_string = 'Tempo = %.2f' % (tempo)
+     ax.plot([], [], ' ', label=tempo_string)
+     ax.legend()
+     ax.set_xlabel('Time (s)')
+     ax.set_ylabel('Normalized strength')
+
+     y_beats = librosa.clicks(frames=beats, sr=sr, length=y_len)
+
+     return fig, ax, y_beats
+
+
+ def predominant_local_pulse(y: npt.ArrayLike, sr: int, shift_time: float = 0) -> tuple:
+     onset_env = librosa.onset.onset_strength(y=y, sr=sr)
+     pulse = librosa.beat.plp(onset_envelope=onset_env, sr=sr)
+     beats_plp = np.flatnonzero(librosa.util.localmax(pulse))
+     times = librosa.times_like(pulse, sr=sr)
+
+     fig, ax = plt.subplots()
+     ax.plot(times + shift_time, librosa.util.normalize(pulse), label='PLP')
+     ax.vlines(times[beats_plp] + shift_time, 0, 1, alpha=0.5, color='r',
+               linestyle='--', label='PLP Beats')
+     ax.legend()
+     ax.set(title="Predominant local pulse")
+     ax.set_xlabel('Time (s)')
+     ax.set_ylabel('Normalized strength')
+
+     return fig, ax
+
+
+ def static_tempo_estimation(y: npt.ArrayLike, sr: int, hop_length: int = 512) -> tuple:
+     """
+     Visualize the result of static tempo estimation.
+
+     y: input signal array
+     sr: sampling rate
+     """
+     onset_env = librosa.onset.onset_strength(y=y, sr=sr)
+     tempo = librosa.beat.tempo(onset_envelope=onset_env, sr=sr)
+
+     # Static tempo estimation with a uniform prior over 30-300 BPM
+     prior = scipy.stats.uniform(30, 300)
+     utempo = librosa.beat.tempo(onset_envelope=onset_env, sr=sr, prior=prior)
+
+     tempo = tempo.item()
+     utempo = utempo.item()
+     ac = librosa.autocorrelate(onset_env, max_size=2 * sr // hop_length)
+     freqs = librosa.tempo_frequencies(len(ac), sr=sr, hop_length=hop_length)
+
+     fig, ax = plt.subplots()
+     ax.semilogx(freqs[1:], librosa.util.normalize(ac)[1:],
+                 label='Onset autocorrelation', base=2)
+     ax.axvline(tempo, 0, 1, alpha=0.75, linestyle='--', color='r',
+                label='Tempo (default prior): {:.2f} BPM'.format(tempo))
+     ax.axvline(utempo, 0, 1, alpha=0.75, linestyle=':', color='g',
+                label='Tempo (uniform prior): {:.2f} BPM'.format(utempo))
+     ax.set(xlabel='Tempo (BPM)', title='Static tempo estimation')
+     ax.grid(True)
+     ax.legend()
+
+     return fig, ax
+
+
+ def plot_tempogram(y: npt.ArrayLike, sr: int, type: str = 'autocorr', hop_length: int = 512, shift_array: npt.ArrayLike = None) -> tuple:
+     oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
+     tempogram = librosa.feature.fourier_tempogram(onset_envelope=oenv, sr=sr, hop_length=hop_length)
+     tempo = librosa.beat.tempo(onset_envelope=oenv, sr=sr, hop_length=hop_length)[0]
+
+     fig, ax = plt.subplots()
+
+     if type == 'fourier':
+         librosa.display.specshow(np.abs(tempogram), sr=sr, hop_length=hop_length,
+                                  x_axis='time', y_axis='fourier_tempo', cmap='magma', ax=ax)
+         ax.axhline(tempo, color='w', linestyle='--', alpha=1, label='Estimated tempo={:g}'.format(tempo))
+         ax.legend(loc='upper right')
+         ax.set(title='Fourier Tempogram')
+
+     if type == 'autocorr':
+         ac_tempogram = librosa.feature.tempogram(onset_envelope=oenv, sr=sr, hop_length=hop_length, norm=None)
+         librosa.display.specshow(ac_tempogram, sr=sr, hop_length=hop_length,
+                                  x_axis='time', y_axis='tempo', cmap='magma', ax=ax)
+         ax.axhline(tempo, color='w', linestyle='--', alpha=1, label='Estimated tempo={:g}'.format(tempo))
+         ax.legend(loc='upper right')
+         ax.set(title='Autocorrelation Tempogram')
+
+     ax.set_xticks(shift_array - shift_array[0], shift_array)
+     ax.autoscale()
+
+     return fig, ax
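The onset-strength envelope used throughout this module is, at its core, half-wave-rectified spectral flux (librosa additionally computes it on a mel spectrogram in dB, with a lag and an aggregation function). A NumPy-only sketch of that core idea, with illustrative frame sizes that are not librosa's defaults:

```python
import numpy as np

def spectral_flux(y, frame_length=512, hop_length=256):
    """Half-wave-rectified spectral flux: a bare-bones stand-in for
    librosa.onset.onset_strength (no mel scaling, no aggregation)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    window = np.hanning(frame_length)
    # Magnitude spectrum of each windowed frame
    spectra = np.array([
        np.abs(np.fft.rfft(window * y[i * hop_length : i * hop_length + frame_length]))
        for i in range(n_frames)
    ])
    # Keep only increases in energy between consecutive frames
    diff = np.diff(spectra, axis=0)
    return np.maximum(diff, 0.0).sum(axis=1)

# Half a second of silence followed by a tone: flux should peak at the transition
sr = 8000
y = np.concatenate([np.zeros(sr // 2),
                    np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)])
flux = spectral_flux(y)
```

The peak of `flux` lands at the frame transition where the tone begins, which is exactly the event `onsets_detection` above marks with red dashed lines.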
src/chord_recognition.py ADDED
@@ -0,0 +1,285 @@
+ import os
+ import numpy as np
+ from matplotlib import pyplot as plt
+
+ import librosa
+ import libfmp.b
+ import libfmp.c3
+ import libfmp.c4
+
+ import sys
+
+
+ def compute_chromagram_from_filename(fn_wav, Fs=22050, N=4096, H=2048, gamma=None, version='STFT', norm='2'):
+     """Compute chromagram for WAV file specified by filename
+
+     Notebook: C5/C5S2_ChordRec_Templates.ipynb
+
+     Args:
+         fn_wav (str): Filename of WAV
+         Fs (scalar): Sampling rate (Default value = 22050)
+         N (int): Window size (Default value = 4096)
+         H (int): Hop size (Default value = 2048)
+         gamma (float): Constant for logarithmic compression (Default value = None)
+         version (str): Technique used for front-end decomposition ('STFT', 'IIR', 'CQT') (Default value = 'STFT')
+         norm (str): If not 'None', chroma vectors are normalized by norm as specified ('1', '2', 'max')
+             (Default value = '2')
+
+     Returns:
+         X (np.ndarray): Chromagram
+         Fs_X (scalar): Feature rate of chromagram
+         x (np.ndarray): Audio signal
+         Fs (scalar): Sampling rate of audio signal
+         x_dur (float): Duration (seconds) of audio signal
+     """
+     x, Fs = librosa.load(fn_wav, sr=Fs)
+     x_dur = x.shape[0] / Fs
+     if version == 'STFT':
+         # Compute chroma features with STFT
+         X = librosa.stft(x, n_fft=N, hop_length=H, pad_mode='constant', center=True)
+         if gamma is not None:
+             X = np.log(1 + gamma * np.abs(X) ** 2)
+         else:
+             X = np.abs(X) ** 2
+         X = librosa.feature.chroma_stft(S=X, sr=Fs, tuning=0, norm=None, hop_length=H, n_fft=N)
+     if version == 'CQT':
+         # Compute chroma features with CQT decomposition
+         X = librosa.feature.chroma_cqt(y=x, sr=Fs, hop_length=H, norm=None)
+     if version == 'IIR':
+         # Compute chroma features with filter bank (using IIR elliptic filter)
+         X = librosa.iirt(y=x, sr=Fs, win_length=N, hop_length=H, center=True, tuning=0.0)
+         if gamma is not None:
+             X = np.log(1.0 + gamma * X)
+         X = librosa.feature.chroma_cqt(C=X, bins_per_octave=12, n_octaves=7,
+                                        fmin=librosa.midi_to_hz(24), norm=None)
+     if norm is not None:
+         X = libfmp.c3.normalize_feature_sequence(X, norm=norm)
+     Fs_X = Fs / H
+     return X, Fs_X, x, Fs, x_dur
+
+
+ def compute_chromagram(y, sr, Fs=22050, N=4096, H=2048, gamma=None, version='STFT', norm='2'):
+     """Compute chromagram for an audio signal already loaded in memory
+
+     Notebook: C5/C5S2_ChordRec_Templates.ipynb
+
+     Args:
+         y (np.ndarray): Audio signal
+         sr (scalar): Sampling rate of y
+         Fs (scalar): Target sampling rate (Default value = 22050)
+         N (int): Window size (Default value = 4096)
+         H (int): Hop size (Default value = 2048)
+         gamma (float): Constant for logarithmic compression (Default value = None)
+         version (str): Technique used for front-end decomposition ('STFT', 'IIR', 'CQT') (Default value = 'STFT')
+         norm (str): If not 'None', chroma vectors are normalized by norm as specified ('1', '2', 'max')
+             (Default value = '2')
+
+     Returns:
+         X (np.ndarray): Chromagram
+         Fs_X (scalar): Feature rate of chromagram
+         x (np.ndarray): Audio signal
+         Fs (scalar): Sampling rate of audio signal
+         x_dur (float): Duration (seconds) of audio signal
+     """
+     x = librosa.resample(y, orig_sr=sr, target_sr=Fs)
+     x_dur = x.shape[0] / Fs
+     if version == 'STFT':
+         # Compute chroma features with STFT
+         X = librosa.stft(x, n_fft=N, hop_length=H, pad_mode='constant', center=True)
+         if gamma is not None:
+             X = np.log(1 + gamma * np.abs(X) ** 2)
+         else:
+             X = np.abs(X) ** 2
+         X = librosa.feature.chroma_stft(S=X, sr=Fs, tuning=0, norm=None, hop_length=H, n_fft=N)
+     if version == 'CQT':
+         # Compute chroma features with CQT decomposition
+         X = librosa.feature.chroma_cqt(y=x, sr=Fs, hop_length=H, norm=None)
+     if version == 'IIR':
+         # Compute chroma features with filter bank (using IIR elliptic filter)
+         X = librosa.iirt(y=x, sr=Fs, win_length=N, hop_length=H, center=True, tuning=0.0)
+         if gamma is not None:
+             X = np.log(1.0 + gamma * X)
+         X = librosa.feature.chroma_cqt(C=X, bins_per_octave=12, n_octaves=7,
+                                        fmin=librosa.midi_to_hz(24), norm=None)
+     if norm is not None:
+         X = libfmp.c3.normalize_feature_sequence(X, norm=norm)
+     Fs_X = Fs / H
+     return X, Fs_X, x, Fs, x_dur
+
+
+ def get_chord_labels(ext_minor='m', nonchord=False):
+     """Generate chord labels for major and minor triads (and possibly nonchord label)
+
+     Notebook: C5/C5S2_ChordRec_Templates.ipynb
+
+     Args:
+         ext_minor (str): Extension for minor chords (Default value = 'm')
+         nonchord (bool): If "True" then add nonchord label (Default value = False)
+
+     Returns:
+         chord_labels (list): List of chord labels
+     """
+     chroma_labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
+     chord_labels_maj = chroma_labels
+     chord_labels_min = [s + ext_minor for s in chroma_labels]
+     chord_labels = chord_labels_maj + chord_labels_min
+     if nonchord is True:
+         chord_labels = chord_labels + ['N']
+     return chord_labels
+
+
+ def generate_chord_templates(nonchord=False):
+     """Generate chord templates of major and minor triads (and possibly nonchord)
+
+     Notebook: C5/C5S2_ChordRec_Templates.ipynb
+
+     Args:
+         nonchord (bool): If "True" then add nonchord template (Default value = False)
+
+     Returns:
+         chord_templates (np.ndarray): Matrix containing chord_templates as columns
+     """
+     template_cmaj = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]).T
+     template_cmin = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]).T
+     num_chord = 24
+     if nonchord:
+         num_chord = 25
+     chord_templates = np.ones((12, num_chord))
+     for shift in range(12):
+         chord_templates[:, shift] = np.roll(template_cmaj, shift)
+         chord_templates[:, shift + 12] = np.roll(template_cmin, shift)
+     return chord_templates
+
+
+ def chord_recognition_template(X, norm_sim='1', nonchord=False):
+     """Conducts template-based chord recognition
+     with major and minor triads (and possibly nonchord)
+
+     Notebook: C5/C5S2_ChordRec_Templates.ipynb
+
+     Args:
+         X (np.ndarray): Chromagram
+         norm_sim (str): Specifies norm used for normalizing chord similarity matrix (Default value = '1')
+         nonchord (bool): If "True" then add nonchord template (Default value = False)
+
+     Returns:
+         chord_sim (np.ndarray): Chord similarity matrix
+         chord_max (np.ndarray): Binarized chord similarity matrix only containing maximizing chord
+     """
+     chord_templates = generate_chord_templates(nonchord=nonchord)
+     X_norm = libfmp.c3.normalize_feature_sequence(X, norm='2')
+     chord_templates_norm = libfmp.c3.normalize_feature_sequence(chord_templates, norm='2')
+     chord_sim = np.matmul(chord_templates_norm.T, X_norm)
+     if norm_sim is not None:
+         chord_sim = libfmp.c3.normalize_feature_sequence(chord_sim, norm=norm_sim)
+     chord_max_index = np.argmax(chord_sim, axis=0)
+     chord_max = np.zeros(chord_sim.shape).astype(np.int32)
+     for n in range(chord_sim.shape[1]):
+         chord_max[chord_max_index[n], n] = 1
+
+     return chord_sim, chord_max
+
+
+ def plot_chord_recognition(y, sr):
+     import warnings
+     warnings.warn("This function is deprecated and will be removed in future versions.", DeprecationWarning)
+
+     X, Fs_X, x, Fs, x_dur = compute_chromagram(y, sr)
+
+     chord_sim, chord_max = chord_recognition_template(X, norm_sim='max')
+     chord_labels = get_chord_labels(nonchord=False)
+
+     cmap = libfmp.b.compressed_gray_cmap(alpha=1, reverse=False)
+     fig, ax = plt.subplots(2, 2, gridspec_kw={'width_ratios': [1, 0.03],
+                                               'height_ratios': [1.5, 3]}, figsize=(8, 10))
+
+     libfmp.b.plot_chromagram(X, ax=[ax[0, 0], ax[0, 1]], Fs=Fs_X, clim=[0, 1], xlabel='',
+                              title='STFT-based chromagram (feature rate = %0.1f Hz)' % (Fs_X))
+     libfmp.b.plot_matrix(chord_max, ax=[ax[1, 0], ax[1, 1]], Fs=Fs_X,
+                          title='Time–chord representation of chord recognition result',
+                          ylabel='Chord', xlabel='')
+     ax[1, 0].set_yticks(np.arange(len(chord_labels)))
+     ax[1, 0].set_yticklabels(chord_labels)
+     ax[1, 0].grid()
+     plt.tight_layout()
+     return fig, ax, chord_max
+
+
+ def plot_binary_template_chord_recognition(y, sr):
+     import warnings
+     warnings.warn("This function is deprecated and will be removed in future versions.", DeprecationWarning)
+
+     X, Fs_X, x, Fs, x_dur = compute_chromagram(y, sr)
+     chord_sim, chord_max = chord_recognition_template(X, norm_sim='max')
+
+     chord_templates = generate_chord_templates()
+     X_chord = np.matmul(chord_templates, chord_max)
+
+     fig, ax = plt.subplots(2, 2, gridspec_kw={'width_ratios': [1, 0.03],
+                                               'height_ratios': [1, 1]}, figsize=(8, 5))
+
+     libfmp.b.plot_chromagram(X, ax=[ax[0, 0], ax[0, 1]], Fs=Fs_X, clim=[0, 1], xlabel='',
+                              title='STFT-based chromagram (feature rate = %0.1f Hz)' % (Fs_X))
+     libfmp.b.plot_chromagram(X_chord, ax=[ax[1, 0], ax[1, 1]], Fs=Fs_X, clim=[0, 1], xlabel='',
+                              title='Binary templates of the chord recognition result')
+     plt.tight_layout()
+     return fig, ax
+
+
+ def chord_table(chord_max):
+     chord_labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'] + ['Cm', 'C#m', 'Dm', 'D#m', 'Em', 'Fm', 'F#m', 'Gm', 'G#m', 'Am', 'A#m', 'Bm']
+
+     # Find the index of the maximum of chord_max along the first axis
+     chord_max_index = np.argmax(chord_max, axis=0)
+     # Map the indices to the corresponding chord labels
+     chord_results = [chord_labels[i] for i in chord_max_index]
+
+     return chord_results
+
+
+ def plot_chord(chroma, title="", figsize=(12, 6), cmap="coolwarm", include_minor=False):
+     import seaborn as sns
+     chroma_labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
+     if include_minor:
+         chroma_labels += ['Cm', 'C#m', 'Dm', 'D#m', 'Em', 'Fm', 'F#m', 'Gm', 'G#m', 'Am', 'A#m', 'Bm']
+
+     fig, ax = plt.subplots(figsize=figsize)
+
+     sns.heatmap(chroma, ax=ax, cmap=cmap, linewidths=0.01, linecolor=(1, 1, 1, 0.1))
+     ax.invert_yaxis()
+     ax.set_yticks(
+         np.arange(len(chroma_labels)) + 0.5,
+         chroma_labels,
+         rotation=0,
+     )
+     ax.set_ylabel("Chord")
+     ax.set_xlabel('Time (frame)')
+     ax.set_title(title)
+
+     return fig, ax
+
+
+ def plot_user_chord(df):
+     import seaborn as sns
+     chroma_labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'] + ['Cm', 'C#m', 'Dm', 'D#m', 'Em', 'Fm', 'F#m', 'Gm', 'G#m', 'Am', 'A#m', 'Bm']
+
+     # Check that df["Chord"] contains no values outside chroma_labels
+     assert df["Chord"].isin(chroma_labels).all(), "Chord must be in chroma_labels"
+
+     # Convert df["Chord"] to indices into chroma_labels
+     df["Chord_index"] = df["Chord"].apply(lambda x: chroma_labels.index(x))
+
+     # Build a 24 x len(df) matrix initialized to 0
+     chroma = np.zeros((24, len(df)))
+     # Set chroma to 1 at the positions given by df["Chord_index"]
+     chroma[df["Chord_index"], np.arange(len(df))] = 1
+
+     # Plot
+     fig, ax = plt.subplots(figsize=(12, 6))
+     sns.heatmap(chroma, ax=ax, cmap='crest', linewidths=0.01, linecolor=(1, 1, 1, 0.1))
+     ax.invert_yaxis()
+     ax.set_yticks(
+         np.arange(len(chroma_labels)) + 0.5,
+         chroma_labels,
+         rotation=0,
+     )
+     ax.set_ylabel("Chord")
+     ax.set_xlabel('Time (frame)')
+     ax.set_title('User Chord Recognition Result')
+
+     return fig, ax
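To see the template matching in `chord_recognition_template` in isolation: the snippet below rebuilds the 24 binary triad templates exactly as `generate_chord_templates` does and matches one synthetic chroma frame against them (plain cosine similarity stands in for the libfmp-normalized inner product used above):

```python
import numpy as np

# Same construction as generate_chord_templates(): C major/minor rolled 12 times
template_cmaj = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])  # C  E  G
template_cmin = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0])  # C  Eb G
templates = np.ones((12, 24))
for shift in range(12):
    templates[:, shift] = np.roll(template_cmaj, shift)
    templates[:, shift + 12] = np.roll(template_cmin, shift)

labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
labels = labels + [l + 'm' for l in labels]

# A chroma frame with energy on A, C, E should match A minor (A-C-E)
frame = np.zeros(12)
frame[[9, 0, 4]] = 1.0
sim = templates.T @ frame / (np.linalg.norm(templates, axis=0) * np.linalg.norm(frame))
best = labels[int(np.argmax(sim))]
# best -> 'Am', with similarity exactly 1.0
```

This is the per-frame decision that `chord_max` encodes column by column.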
src/pitch_estimation.py ADDED
@@ -0,0 +1,181 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import librosa
2
+ from librosa import display
3
+ from librosa import feature
4
+
5
+ import numpy as np
6
+ from matplotlib import pyplot as plt
7
+ import scipy
8
+
9
+ from numpy import typing as npt
10
+ import typing
11
+
12
+
13
+ def plot_mel_spectrogram(
14
+ y: npt.ArrayLike,
15
+ sr:int,
16
+ shift_array: npt.ArrayLike,
17
+ with_pitch : bool = True,
18
+ ):
19
+
20
+ S = librosa.feature.melspectrogram(y=y, sr=sr)
21
+ S_dB = librosa.power_to_db(S, ref=np.max)
22
+
23
+ if with_pitch :
24
+
25
+ f0, voiced_flag, voiced_probs = librosa.pyin(y,
26
+ fmin=librosa.note_to_hz('C2'),
27
+ fmax=librosa.note_to_hz('C7'))
28
+ times = librosa.times_like(f0, sr)
29
+
30
+ fig, ax = plt.subplots(figsize=(12,6))
31
+ img = librosa.display.specshow(S_dB, x_axis='time',
32
+ y_axis='mel', sr=sr,
33
+ fmax=8000, ax=ax)
34
+ ax.plot(times, f0, label='f0', color='cyan', linewidth=3)
35
+ ax.set_xticks(shift_array - shift_array[0],
36
+ shift_array)
37
+ fig.colorbar(img, ax=ax, format='%+2.0f dB')
38
+ ax.legend(loc='upper right')
39
+ ax.set(title='Mel-frequency spectrogram')
40
+
41
+
42
+ else :
43
+ fig, ax = plt.subplots(figsize=(12,6))
44
+ img = librosa.display.specshow(S_dB, x_axis='time',
45
+ y_axis='mel', sr=sr,
46
+ fmax=8000, ax=ax)
47
+ ax.set_xticks(shift_array - shift_array[0],
48
+ shift_array)
49
+ fig.colorbar(img, ax=ax, format='%+2.0f dB')
50
+ ax.set(title='Mel-frequency spectrogram')
51
+ ax.set_xlabel('Time (s)')
52
+
53
+ return fig, ax
54
+
55
+
56
+ def plot_constant_q_transform(y: npt.ArrayLike, sr:int,
57
+ shift_array: npt.ArrayLike
58
+ ) :
59
+
60
+ C = np.abs(librosa.cqt(y, sr=sr))
61
+ fig, ax = plt.subplots(figsize=(12,6))
62
+ img = librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max),
63
+ sr=sr, x_axis='time', y_axis='cqt_note', ax=ax)
64
+ ax.set_xticks(shift_array - shift_array[0],
65
+ shift_array)
66
+ ax.set_title('Constant-Q power spectrum')
67
+ ax.set_xlabel('Time (s)')
68
+ fig.colorbar(img, ax=ax, format="%+2.0f dB")
69
+
70
+ return fig, ax
71
+
72
+
73
+ def pitch_class_type_one_vis(y: npt.ArrayLike, sr: int) -> None :
74
+
75
+ S = np.abs(librosa.stft(y))
76
+ chroma = librosa.feature.chroma_stft(S=S, sr=sr)
77
+
78
+ count_pitch = np.empty(np.shape(chroma)) # To count pitch
79
+ notes = np.array(librosa.key_to_notes('C:maj'))
80
+
81
+ # Set the threshold to determine the exact pitch
82
+ count_pitch[chroma < 0.5] = 0
83
+ count_pitch[chroma > 0.5] = 1
84
+
85
+ # To compute the probability
86
+ occurProbs = np.empty(np.shape(count_pitch)[0])
87
+
88
+ for i in range(np.shape(count_pitch)[0]) :
89
+ total = np.sum(count_pitch)
90
+ occurProbs[i] = np.sum(count_pitch[i]) / total
91
+
92
+ result = np.vstack((notes, np.round(occurProbs, 4))).T
93
+
94
+ ticks = range(12)
95
+ fig, ax = plt.subplots()
96
+ plt.title("Pitch Class")
97
+ plt.bar(ticks,occurProbs * 100, align='center')
98
+ plt.xticks(ticks, notes)
99
+ plt.xlabel("Note")
100
+ plt.ylabel("Number of occurrences %")
101
+
102
+ return fig, ax, result
103
+
104
+
105
+ def pitch_class_histogram_chroma(y: npt.ArrayLike, sr: int, higher_resolution: bool, save_to_csv: bool = False) -> None :
106
+
107
+ S = np.abs(librosa.stft(y))
108
+ notes = np.array(librosa.key_to_notes('C:maj')) # For x-axis legend
109
+
110
+ if not higher_resolution :
111
+
112
+ chroma = librosa.feature.chroma_stft(S=S, sr=sr)
113
+ valid_pitch = np.empty(np.shape(chroma)) # To count pitch
114
+ valid_pitch[chroma < 0.7] = 0
115
+ valid_pitch[chroma >= 0.7] = 1
116
+ total = np.sum(valid_pitch)
117
+
118
+    # Compute the occurrence probability of each pitch class
+    # Note: shape (12,) gives a pure 1-D array
+    occurProbs = np.empty((12,))
+    for i in range(12):
+        occurProbs[i] = np.sum(valid_pitch[i]) / total
+
+    ticks = range(12)
+    colors = ['lightcoral', 'goldenrod', 'lightseagreen', 'indigo', 'lightcoral',
+              'goldenrod', 'lightseagreen', 'indigo', 'lightcoral', 'goldenrod',
+              'lightseagreen', 'indigo']
+    xLegend = notes
+
+    fig, ax = plt.subplots()
+    ax.bar(ticks, occurProbs * 100, align='center', color=colors)
+    ax.set_xticks(ticks)
+    ax.set_xticklabels(xLegend)
+    ax.set_title("Pitch Class Histogram")
+    ax.set_xlabel("Note")
+    ax.set_ylabel("Occurrences %")
+
+    if higher_resolution:
+
+        chroma = librosa.feature.chroma_stft(S=S, sr=sr, n_chroma=120)
+        valid_pitch = np.empty(np.shape(chroma))  # binary mask used to count pitches
+        valid_pitch[chroma < 0.7] = 0
+        valid_pitch[chroma >= 0.7] = 1
+        total = np.sum(valid_pitch)
+
+        occurProbs = np.empty((120,))
+        for i in range(120):
+            occurProbs[i] = np.sum(valid_pitch[i]) / total
+
+        ticks = range(120)
+        xLegend = list()
+        for i in range(120):
+            if i % 10 == 0:
+                xLegend.append(notes[i // 10])
+            else:
+                xLegend.append('')
+
+        # Cycle the four colours every 10 bins
+        colors = list()
+        for i in range(120):
+            if i % 40 < 10:
+                colors.append('lightcoral')
+            elif i % 40 < 20:
+                colors.append('goldenrod')
+            elif i % 40 < 30:
+                colors.append('lightseagreen')
+            else:
+                colors.append('indigo')
+
+        fig, ax = plt.subplots()
+        ax.bar(ticks, occurProbs * 100, align='center', color=colors)
+        ax.set_xticks(ticks)
+        ax.set_xticklabels(xLegend)
+        ax.set_title("Pitch Class Histogram")
+        ax.set_xlabel("Note")
+        ax.set_ylabel("Occurrences %")
+
+    result = np.vstack((xLegend, np.round(occurProbs, 4))).T
+    if save_to_csv:
+        with open('pitch_class.csv', 'w') as out:
+            for row in result:
+                print(*row, sep=',', file=out)
+
+    return fig, ax, result
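The per-class loop above divides each pitch class's count of above-threshold chroma cells by the grand total. As a minimal vectorized sketch of the same normalization (a hypothetical helper, not part of this repo):

```python
import numpy as np

def pitch_class_probs(valid_pitch):
    # fraction of above-threshold chroma cells per pitch class (rows)
    counts = valid_pitch.sum(axis=1)
    return counts / valid_pitch.sum()

# toy 12 x 4 binary chroma mask: class 0 active in all frames, class 7 in two
mask = np.zeros((12, 4))
mask[0, :] = 1
mask[7, :2] = 1
probs = pitch_class_probs(mask)  # probs sum to 1
```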
src/st_helper.py ADDED
@@ -0,0 +1,33 @@
+ import streamlit as st
+ import numpy as np
+
+
+ @st.experimental_memo
+ def convert_df(df):
+     """
+     Convert a pandas DataFrame into CSV bytes,
+     for the download button in Streamlit.
+     """
+     return df.to_csv(index=False).encode('utf-8')
+
+
+ def show_readme(filename):
+     with st.expander("頁面說明(Page Description)"):
+         with open(filename, "r", encoding="utf-8") as f:
+             st.markdown(f.read())
+
+
+ def get_shift(start_time, end_time):
+     """
+     Return the time ticks from start_time to end_time:
+     the first tick is start_time, the last is end_time,
+     with one tick per second in between.
+
+     return: (start_time, np.array of time stamps)
+     """
+     shift_array = np.arange(start_time, end_time, 1)
+     if shift_array[-1] != end_time:
+         shift_array = np.append(shift_array, end_time)
+
+     shift_array = np.round(shift_array, 1)
+     return start_time, shift_array
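`get_shift` places one tick per second and forces the final tick to land exactly on `end_time`, so non-integer clip ends still get a labeled tick. A standalone sketch of the same logic:

```python
import numpy as np

def get_shift(start_time, end_time):
    # one tick per second, forcing the final tick to land on end_time
    shift_array = np.arange(start_time, end_time, 1)
    if shift_array[-1] != end_time:
        shift_array = np.append(shift_array, end_time)
    return start_time, np.round(shift_array, 1)

start, ticks = get_shift(2.0, 5.5)  # ticks: 2.0, 3.0, 4.0, 5.0, 5.5
```

Note the helper assumes `end_time > start_time`; an empty range would make `shift_array[-1]` raise an `IndexError`.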
src/structure_analysis.py ADDED
@@ -0,0 +1,248 @@
+ import os
+
+ import numpy as np
+ import librosa
+ from matplotlib import pyplot as plt
+ from numba import jit
+ from numpy import typing as npt
+
+ import libfmp.b
+ import libfmp.c3
+ import libfmp.c4
+
+
+ @jit(nopython=True)
+ def compute_sm_dot(X, Y):
+     """Computes a similarity matrix from feature sequences using the dot (inner) product
+
+     Notebook: C4/C4S2_SSM.ipynb
+
+     Args:
+         X (np.ndarray): First sequence
+         Y (np.ndarray): Second sequence
+
+     Returns:
+         S (np.ndarray): Similarity matrix
+     """
+     S = np.dot(np.transpose(X), Y)
+     return S
+
+
+ def plot_feature_ssm(X, Fs_X, S, Fs_S, ann, duration, color_ann=None,
+                      title='', label='Time (seconds)', time=True,
+                      figsize=(5, 6), fontsize=10, clim_X=None, clim=None):
+     """Plot SSM along with feature representation and annotations (standard setting is time in seconds)
+
+     Notebook: C4/C4S2_SSM.ipynb
+
+     Args:
+         X: Feature representation
+         Fs_X: Feature rate of ``X``
+         S: Similarity matrix (SM)
+         Fs_S: Feature rate of ``S``
+         ann: Annotations
+         duration: Duration
+         color_ann: Color annotations (see :func:`libfmp.b.b_plot.plot_segments`) (Default value = None)
+         title: Figure title (Default value = '')
+         label: Label for time axes (Default value = 'Time (seconds)')
+         time: Display time axis ticks or not (Default value = True)
+         figsize: Figure size (Default value = (5, 6))
+         fontsize: Font size (Default value = 10)
+         clim_X: Color limits for matrix ``X`` (Default value = None)
+         clim: Color limits for matrix ``S`` (Default value = None)
+
+     Returns:
+         fig: Handle for figure
+         ax: Handle for axes
+     """
+     cmap = libfmp.b.compressed_gray_cmap(alpha=-10)
+     fig, ax = plt.subplots(3, 3, gridspec_kw={'width_ratios': [0.1, 1, 0.05],
+                                               'wspace': 0.2,
+                                               'height_ratios': [0.3, 1, 0.1]},
+                            figsize=figsize)
+     libfmp.b.plot_matrix(X, Fs=Fs_X, ax=[ax[0, 1], ax[0, 2]], clim=clim_X,
+                          xlabel='', ylabel='', title=title)
+     ax[0, 0].axis('off')
+     libfmp.b.plot_matrix(S, Fs=Fs_S, ax=[ax[1, 1], ax[1, 2]], cmap=cmap, clim=clim,
+                          title='', xlabel='', ylabel='', colorbar=True)
+     ax[1, 1].set_xticks([])
+     ax[1, 1].set_yticks([])
+     libfmp.b.plot_segments(ann, ax=ax[2, 1], time_axis=time, fontsize=fontsize,
+                            colors=color_ann,
+                            time_label=label, time_max=duration*Fs_X)
+     ax[2, 2].axis('off')
+     ax[2, 0].axis('off')
+     libfmp.b.plot_segments(ann, ax=ax[1, 0], time_axis=time, fontsize=fontsize,
+                            direction='vertical', colors=color_ann,
+                            time_label=label, time_max=duration*Fs_X)
+     return fig, ax
+
+
+ def SSM_chorma(wav_fn: str, anno_fn: str, hop_size: int = 4096, Nfft: int = 1024):
+
+     x, fs = librosa.load(wav_fn)
+     duration = x.shape[0] / fs
+
+     chromagram = librosa.feature.chroma_stft(y=x, sr=fs, tuning=0, norm=2, hop_length=hop_size, n_fft=Nfft)
+     X, Fs_X = libfmp.c3.smooth_downsample_feature_sequence(chromagram, fs/hop_size, filt_len=41, down_sampling=10)
+
+     # Read the structure annotation file (see the libfmp documentation)
+     ann, color_ann = libfmp.c4.read_structure_annotation(os.path.join(anno_fn), fn_ann_color=anno_fn)
+     ann_frames = libfmp.c4.convert_structure_annotation(ann, Fs=Fs_X)
+
+     X = libfmp.c3.normalize_feature_sequence(X, norm='2', threshold=0.001)
+     S = compute_sm_dot(X, X)
+     fig, ax = plot_feature_ssm(X, 1, S, 1, ann_frames, duration*Fs_X, color_ann=color_ann,
+                                clim_X=[0, 1], clim=[0, 1], label='Time (frames)',
+                                title='Chroma feature (Fs=%0.2f)' % Fs_X)
+     return fig, ax
+
+
+ def plot_self_similarity(y_ref: npt.ArrayLike, sr: int, affinity: bool = False, hop_length: int = 1024):
+     '''
+     Visualize the self-similarity (recurrence) matrix of a signal.
+
+     y_ref: reference signal
+     sr: sampling rate
+     affinity: plot an affinity (weighted) matrix instead of a binary one
+     hop_length: hop length used for the chroma features
+     '''
+
+     # Pre-processing stage
+     chroma = librosa.feature.chroma_cqt(y=y_ref, sr=sr, hop_length=hop_length)
+     chroma_stack = librosa.feature.stack_memory(chroma, n_steps=10, delay=3)
+
+     fig, ax = plt.subplots()
+
+     if not affinity:
+         R = librosa.segment.recurrence_matrix(chroma_stack, k=5)
+         librosa.display.specshow(R, x_axis='s', y_axis='s',
+                                  hop_length=hop_length)
+         plt.title('Binary recurrence (symmetric)')
+         plt.colorbar()
+     else:
+         R_aff = librosa.segment.recurrence_matrix(chroma_stack, metric='cosine', mode='affinity')
+         librosa.display.specshow(R_aff, x_axis='s', y_axis='s',
+                                  cmap='magma_r', hop_length=hop_length)
+         plt.title('Affinity recurrence')
+         plt.colorbar()
+
+     return fig, ax
+
+
+ @jit(nopython=True)
+ def compute_kernel_checkerboard_gaussian(L: int = 10, var: float = 0.5, normalize=True) -> npt.ArrayLike:
+     """Compute Gaussian-like checkerboard kernel [FMP, Section 4.4.1].
+     See also: https://scipython.com/blog/visualizing-the-bivariate-gaussian-distribution/
+
+     Notebook: C4/C4S4_NoveltySegmentation.ipynb
+
+     Args:
+         L (int): Parameter specifying the kernel size M=2*L+1
+         var (float): Variance parameter determining the tapering (epsilon) (Default value = 0.5)
+         normalize (bool): Normalize kernel (Default value = True)
+
+     Returns:
+         kernel (np.ndarray): Kernel matrix of size M x M
+     """
+     taper = np.sqrt(1/2) / (L * var)
+     axis = np.arange(-L, L+1)
+     gaussian1D = np.exp(-taper**2 * (axis**2))
+     gaussian2D = np.outer(gaussian1D, gaussian1D)
+     kernel_box = np.outer(np.sign(axis), np.sign(axis))
+     kernel = kernel_box * gaussian2D
+     if normalize:
+         kernel = kernel / np.sum(np.abs(kernel))
+     return kernel
+
+
+ def compute_novelty_ssm(S, kernel: npt.ArrayLike = None, L: int = 10, var: float = 0.5, exclude: bool = False) -> npt.ArrayLike:
+     """Compute novelty function from SSM [FMP, Section 4.4.1]
+
+     Notebook: C4/C4S4_NoveltySegmentation.ipynb
+
+     Args:
+         S (np.ndarray): SSM
+         kernel (np.ndarray): Checkerboard kernel (if kernel==None, it will be computed) (Default value = None)
+         L (int): Parameter specifying the kernel size M=2*L+1 (Default value = 10)
+         var (float): Variance parameter determining the tapering (epsilon) (Default value = 0.5)
+         exclude (bool): Sets the first L and last L values of novelty function to zero (Default value = False)
+
+     Returns:
+         nov (np.ndarray): Novelty function
+     """
+     if kernel is None:
+         kernel = compute_kernel_checkerboard_gaussian(L=L, var=var)
+     N = S.shape[0]
+     M = 2*L + 1
+     nov = np.zeros(N)
+     # np.pad does not work with numba/jit
+     S_padded = np.pad(S, L, mode='constant')
+
+     for n in range(N):
+         # Does not work with numba/jit
+         nov[n] = np.sum(S_padded[n:n+M, n:n+M] * kernel)
+     if exclude:
+         right = np.min([L, N])
+         left = np.max([0, N-L])
+         nov[0:right] = 0
+         nov[left:N] = 0
+
+     return nov
+
+
+ def SSM_Novelty(wav_filename: str, anno_csv_filename: str) -> None:
+
+     float_box = libfmp.b.FloatingBox()
+
+     fn_wav = os.path.join(wav_filename)
+     ann, color_ann = libfmp.c4.read_structure_annotation(os.path.join(anno_csv_filename),
+                                                          fn_ann_color=anno_csv_filename)
+
+     S_dict = {}
+     Fs_dict = {}
+     x, x_duration, X, Fs_X, S, I = libfmp.c4.compute_sm_from_filename(fn_wav,
+                                                                       L=11, H=5, L_smooth=1, thresh=1)
+     S_dict[0], Fs_dict[0] = S, Fs_X
+     ann_frames = libfmp.c4.convert_structure_annotation(ann, Fs=Fs_X)
+     fig, ax = libfmp.c4.plot_feature_ssm(X, 1, S, 1, ann_frames, x_duration*Fs_X,
+                                          label='Time (frames)', color_ann=color_ann, clim_X=[0, 1], clim=[0, 1],
+                                          title='Feature rate: %0.0f Hz' % Fs_X, figsize=(4.5, 5.5))
+     float_box.add_fig(fig)
+
+     x, x_duration, X, Fs_X, S, I = libfmp.c4.compute_sm_from_filename(fn_wav,
+                                                                       L=41, H=10, L_smooth=1, thresh=1)
+     S_dict[1], Fs_dict[1] = S, Fs_X
+     ann_frames = libfmp.c4.convert_structure_annotation(ann, Fs=Fs_X)
+     fig, ax = libfmp.c4.plot_feature_ssm(X, 1, S, 1, ann_frames, x_duration*Fs_X,
+                                          label='Time (frames)', color_ann=color_ann, clim_X=[0, 1], clim=[0, 1],
+                                          title='Feature rate: %0.0f Hz' % Fs_X, figsize=(4.5, 5.5))
+     float_box.add_fig(fig)
+     float_box.show()
+
+     figsize = (10, 6)
+     L_kernel_set = [5, 10, 20, 40]
+     num_kernel = len(L_kernel_set)
+     num_SSM = len(S_dict)
+
+     fig, ax = plt.subplots(num_kernel, num_SSM, figsize=figsize)
+     for s in range(num_SSM):
+         for t in range(num_kernel):
+             L_kernel = L_kernel_set[t]
+             S = S_dict[s]
+             nov = compute_novelty_ssm(S, L=L_kernel, exclude=True)
+             fig_nov, ax_nov, line_nov = libfmp.b.plot_signal(nov, Fs=Fs_dict[s],
+                                                              color='k', ax=ax[t, s], figsize=figsize,
+                                                              title=r'Feature rate = %0.0f Hz, $L_\mathrm{kernel}$ = %d' % (Fs_dict[s], L_kernel))
+             libfmp.b.plot_segments_overlay(ann, ax=ax_nov, colors=color_ann, alpha=0.1,
+                                            edgecolor='k', print_labels=False)
+     plt.tight_layout()
+     plt.show()
src/timbre_analysis.py ADDED
@@ -0,0 +1,125 @@
+ import librosa
+ from librosa import display
+
+ import numpy as np
+ from numpy import typing as npt
+
+ from matplotlib import pyplot as plt
+
+
+ def spectral_centroid_analysis(y: npt.ArrayLike, sr: int, shift_array: npt.ArrayLike):
+
+     S, phase = librosa.magphase(librosa.stft(y=y))
+     cent = librosa.feature.spectral_centroid(S=S)
+     times = librosa.times_like(cent, sr=sr)
+
+     fig, ax = plt.subplots()
+     librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
+                              y_axis='log', x_axis='time', ax=ax, sr=sr)
+     ax.plot(times, cent.T, label='Spectral centroid', color='w')
+     ax.legend(loc='upper right')
+     ax.set(title='log Power spectrogram')
+     ax.set_xticks(shift_array - shift_array[0],
+                   shift_array)
+     ax.autoscale()
+
+     result = np.vstack((times, cent))
+
+     return fig, ax, result
+
+
+ def rolloff_frequency_analysis(y: npt.ArrayLike, sr: int, roll_percent: float = 0.99,
+                                shift_array: npt.ArrayLike = None):
+
+     rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=roll_percent)
+     rolloff_min = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.01)
+     times = librosa.times_like(rolloff, sr=sr)
+     S, phase = librosa.magphase(librosa.stft(y))
+
+     fig, ax = plt.subplots()
+     librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
+                              y_axis='log', x_axis='time', ax=ax, sr=sr)
+     ax.plot(times, rolloff[0], label=f'Roll-off frequency ({roll_percent})')
+     ax.plot(times, rolloff_min[0], color='w',
+             label='Roll-off frequency (0.01)')
+     ax.legend(loc='lower right')
+     ax.set(title='log Power spectrogram')
+     ax.set_xticks(shift_array - shift_array[0],
+                   shift_array)
+     ax.autoscale()
+
+     result = np.vstack((times, rolloff, rolloff_min))
+
+     return fig, ax, result
+
+
+ def spectral_bandwidth_analysis(y: npt.ArrayLike, sr: int, shift_array: npt.ArrayLike = None):
+
+     S, phase = librosa.magphase(librosa.stft(y=y))
+     spec_bw = librosa.feature.spectral_bandwidth(S=S)
+     times = librosa.times_like(spec_bw, sr=sr)
+
+     fig, ax = plt.subplots(nrows=2, sharex=True)
+     centroid = librosa.feature.spectral_centroid(S=S, sr=sr)
+     ax[0].semilogy(times, spec_bw[0], label='Spectral bandwidth')
+     ax[0].set(ylabel='Hz', xticks=[], xlim=[times.min(), times.max()])
+     ax[0].legend()
+     ax[0].label_outer()
+     librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
+                              y_axis='log', x_axis='time', ax=ax[1], sr=sr)
+     ax[1].set(title='log Power spectrogram')
+     ax[1].fill_between(times, np.maximum(0, centroid[0] - spec_bw[0]),
+                        np.minimum(centroid[0] + spec_bw[0], sr/2),
+                        alpha=0.5, label='Centroid +- bandwidth')
+     ax[1].plot(times, centroid[0], label='Spectral centroid', color='w')
+     ax[1].legend(loc='lower right')
+     ax[1].set_xticks(shift_array - shift_array[0],
+                      shift_array)
+     ax[1].autoscale()
+
+     result = np.vstack((times, spec_bw))
+
+     return fig, ax, result
+
+
+ def harmonic_percussive_source_separation(y: npt.ArrayLike, sr: int,
+                                           shift_array: npt.ArrayLike = None):
+
+     D = librosa.stft(y)
+     H, P = librosa.decompose.hpss(D)
+     t = librosa.frames_to_time(np.arange(D.shape[1]), sr=sr)
+
+     fig, ax = plt.subplots(nrows=3, sharex=False, sharey=False, figsize=(12, 8))
+     # Set the horizontal and vertical spacing between the subplots
+     plt.subplots_adjust(hspace=0.6, wspace=0.3)
+     img = librosa.display.specshow(librosa.amplitude_to_db(np.abs(D), ref=np.max),
+                                    y_axis='log', x_axis='time', ax=ax[0], sr=sr)
+     ax[0].set(title='Full power spectrogram')
+     ax[0].set_xlabel('')  # hide the x-axis label
+     ax[0].set_xticks(shift_array - shift_array[0],
+                      shift_array)
+     ax[0].autoscale()
+
+     librosa.display.specshow(librosa.amplitude_to_db(np.abs(H), ref=np.max(np.abs(D))),
+                              y_axis='log', x_axis='time', ax=ax[1], sr=sr)
+     ax[1].set(title='Harmonic power spectrogram')
+     ax[1].set_xlabel('')  # hide the x-axis label
+     ax[1].set_xticks(shift_array - shift_array[0],
+                      shift_array)
+     ax[1].autoscale()
+
+     librosa.display.specshow(librosa.amplitude_to_db(np.abs(P), ref=np.max(np.abs(D))),
+                              y_axis='log', x_axis='time', ax=ax[2], sr=sr)
+     ax[2].set(title='Percussive power spectrogram')
+     ax[2].set_xticks(shift_array - shift_array[0],
+                      shift_array)
+     ax[2].autoscale()
+
+     fig.colorbar(img, ax=ax, format='%+2.0f dB')
+
+     return fig, ax, (D, H, P, t)
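The spectral centroid plotted above is, conceptually, the magnitude-weighted mean frequency of each STFT frame. A pure-numpy sketch of that standard definition (not librosa's exact implementation, which also handles windowing and frequency-bin details):

```python
import numpy as np

def spectral_centroid(mag, freqs):
    # magnitude-weighted mean frequency per frame (frames are columns of mag)
    norm = mag.sum(axis=0)
    norm[norm == 0] = 1.0  # avoid dividing by zero on silent frames
    return (freqs[:, None] * mag).sum(axis=0) / norm

freqs = np.array([0.0, 100.0, 200.0])
mag = np.array([[1.0], [1.0], [1.0]])  # flat spectrum, one frame
cent = spectral_centroid(mag, freqs)   # centroid of a flat spectrum is the mean frequency
```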