Spaces: Runtime error

Keycatowo committed · Commit bb5feba · Parent(s): d337940

init trans commit
Browse files

Changed files:
- .streamlit/config.toml +2 -0
- Dockerfile +19 -6
- docs/1-Basic Information.md +40 -0
- docs/2-Pitch_estimation.md +15 -0
- docs/3-Beat Tracking.md +18 -0
- docs/4-Chord_recognition.md +6 -0
- docs/5-Structure_analysis.md +10 -0
- docs/6-Timbre Analysis.md +6 -0
- docs/info.md +55 -0
- home.py +67 -0
- packages.txt +1 -0
- pages/1-Basic_Information.py +132 -0
- pages/2-Pitch_estimation.py +127 -0
- pages/3-Beat Tracking.py +164 -0
- pages/4-Chord_recognition.py +112 -0
- pages/5-Structure_analysis.py +76 -0
- pages/6-Timbre_Analysis.py +149 -0
- pages/999-dev.py +18 -0
- requirements.txt +11 -0
- src/basic_info.py +48 -0
- src/beat_track.py +223 -0
- src/chord_recognition.py +285 -0
- src/pitch_estimation.py +181 -0
- src/st_helper.py +33 -0
- src/structure_analysis.py +248 -0
- src/timbre_analysis.py +125 -0
.streamlit/config.toml
ADDED
@@ -0,0 +1,2 @@
[server]
maxUploadSize = 10
Dockerfile
CHANGED
@@ -1,11 +1,24 @@
-FROM python:3.
+FROM python:3.8-slim-buster

+# Set the working directory to /app
+WORKDIR /app

+# Copy everything in the current directory into /app
+COPY . /app

+# Expose port 8501
+EXPOSE 8501

+# Install system packages
+RUN apt update && \
+    apt upgrade -y && \
+    apt install -y libsndfile1 && \
+    apt install -y ffmpeg

+# Upgrade pip and install the Python requirements
+RUN pip install --upgrade pip && \
+    pip install -r requirements.txt
+
+
+# Run the app
+CMD ["streamlit", "run", "home.py"]
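The image above can be built and served locally; a minimal sketch (the `music-analysis` tag is illustrative, not part of the commit):

```shell
# Build the image from the repository root, then run it,
# mapping Streamlit's exposed port 8501 to the host.
docker build -t music-analysis .
docker run -p 8501:8501 music-analysis
```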
docs/1-Basic Information.md
ADDED
@@ -0,0 +1,40 @@
# Part1-Basic Information

## What can it do?
- Print audio length (seconds)
- Plot waveform
- Plot rms (librosa.feature.rms)
- Plot spectrogram
- Save rms as .csv

## How to use it?

### Step 1: Upload an audio file
In the upload block, click the uploader, choose the audio file to analyze, and upload it.
> Upload limits:
> - File size: 200MB
> - File formats: `.mp3`, `.wav`, `.ogg`

![](../fig/1-上傳檔案.png)

Once the upload finishes, the file's basic information is displayed, along with a player to check that the audio is correct.
![](../fig/1-上傳完成.png)

### Step 2: Select the segment to analyze
By default the whole file is analyzed; to analyze a specific segment, select it in this block.
Likewise, after selecting, the basic information is displayed along with a player to check the audio.
![](../fig/1-截取片段.png)

### Step 3: Go to the analysis you need

### Waveform
![](../fig/1-繪製聲音波形.png)

### Loudness (rms)
![](../fig/1-聲音強度.png)

### Spectrogram
![](../fig/1-聲音頻譜.png)

### Save loudness data (rms)
![](../fig/1-下載RMS.png)
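As a rough illustration of the RMS curve this page draws, here is a numpy-only sketch of what `librosa.feature.rms` computes (simplified framing, no padding — the helper name is an assumption, not the project's code):

```python
# Minimal sketch: frame the signal, then take the
# root-mean-square of each frame.
import numpy as np

def frame_rms(y, frame_length=2048, hop_length=512):
    """RMS energy per frame (no padding, for simplicity)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = y[i * hop_length : i * hop_length + frame_length]
        rms[i] = np.sqrt(np.mean(frame ** 2))
    return rms

# A constant-amplitude signal has RMS equal to its amplitude.
y = np.full(44100, 0.5)
rms = frame_rms(y)
```

The real librosa call also centers and pads frames, so its first and last values differ slightly from this sketch.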
docs/2-Pitch_estimation.md
ADDED
@@ -0,0 +1,15 @@
# Part2-Pitch_estimation

## What can it do?
- Mel-frequency spectrogram
- Constant-Q transform
- Chroma
- PYin + mel-frequency spectrogram
- Pitch class histogram 1 (calculated from chroma_stft)
- Pitch class histogram 2 (calculated from chroma_stft, n_chroma = 120)
- Pitch class histogram 3 (calculated from the PYin f0)
- Save all histograms (data counts) as .csv

## TODO
+ Which part is the "PYin + mel-frequency spectrogram"?
+ Currently only one pitch class histogram is implemented
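The pitch class histograms listed above reduce a chroma matrix to a 12-bin distribution; a minimal numpy sketch (the helper name and normalization are assumptions, not the project's actual code):

```python
# Sketch: given a 12 x T chroma matrix, accumulate the energy
# in each pitch class over time and normalize to a distribution.
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def pitch_class_histogram(chroma):
    counts = chroma.sum(axis=1)   # total energy per pitch class
    return counts / counts.sum()  # normalize so the bins sum to 1

# Synthetic chroma: all energy in pitch class A (index 9).
chroma = np.zeros((12, 100))
chroma[9, :] = 1.0
hist = pitch_class_histogram(chroma)
```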
docs/3-Beat Tracking.md
ADDED
@@ -0,0 +1,18 @@
# Part 3 - Beat Tracking

## What can it do?
- Onset strength (librosa.onset.onset_strength)
- Onset detection + onset time (librosa.onset.onset_detect)
- Mel-spectrogram + onset strength + beat time
- Predominant local pulse + beat time
- Tempo: print static tempo value
- Fourier tempogram + estimated tempo
- Autocorrelation tempogram + estimated tempo
- Save note onset time & beat time as .csv
- Output original audio + note onset click sound
- Output original audio + beat time click sound
- Madmom note onset detection & beat tracking
  https://madmom.readthedocs.io/en/latest/modules/features.html
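Onset strength is commonly computed as spectral flux; a hedged numpy sketch of the idea behind `librosa.onset.onset_strength` (this is the textbook definition, not the library's exact implementation, which works on a log-mel spectrogram with lag and smoothing):

```python
# Spectral flux: the half-wave rectified increase in magnitude
# between consecutive spectrogram frames, summed over frequency.
import numpy as np

def spectral_flux(S):
    """S: (n_bins, n_frames) magnitudes -> per-step novelty curve."""
    diff = np.diff(S, axis=1)            # frame-to-frame change
    return np.maximum(diff, 0.0).sum(axis=0)  # keep only increases

# A sudden energy jump between frames 4 and 5 peaks at step 4.
S = np.zeros((64, 10))
S[:, 5:] = 1.0
novelty = spectral_flux(S)
```

Peak-picking on such a novelty curve then yields the onset frames shown on this page.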
docs/4-Chord_recognition.md
ADDED
@@ -0,0 +1,6 @@
# Part4-Chord recognition

## What can it do?
- Chromagram + chord recognition result
  (https://www.audiolabs-erlangen.de/resources/MIR/FMP/C5/C5S2_ChordRec_Templates.html)
- Chromagram + binary reconstruction of chord recognition result
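Template-based chord recognition, as in the FMP notebook linked above, correlates each chroma frame with binary triad templates; a minimal single-frame sketch (function names are illustrative, not the project's API):

```python
# Sketch: build 24 binary major/minor triad templates and pick
# the one with the highest cosine similarity to a chroma vector.
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F",
         "F#", "G", "G#", "A", "A#", "B"]

def triad_template(root, minor=False):
    t = np.zeros(12)
    third = 3 if minor else 4
    t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
    return t

def recognize_chord(chroma_vec):
    labels = NOTES + [n + "m" for n in NOTES]
    templates = [triad_template(r) for r in range(12)] + \
                [triad_template(r, minor=True) for r in range(12)]
    scores = [t @ chroma_vec / (np.linalg.norm(t) * np.linalg.norm(chroma_vec))
              for t in templates]
    return labels[int(np.argmax(scores))]

# Energy on C, E, G matches the C major triad template.
chroma_vec = np.zeros(12)
chroma_vec[[0, 4, 7]] = 1.0
```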
docs/5-Structure_analysis.md
ADDED
@@ -0,0 +1,10 @@
# Part5-Structure_analysis

## What can it do?
- Raw SSM (calculated from chroma)
- Smoothed SSM (calculated from chroma)
- Novelty function (structural boundary detection)
- Save novelty function curve as .csv

## TODO
+ The parts that require `anno.csv` still need to be confirmed
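A self-similarity matrix (SSM) compares every pair of feature frames; a minimal cosine-similarity sketch (the function name is an assumption, not the project's code):

```python
# Sketch: normalize each feature column (e.g. a chroma frame),
# then the SSM is the matrix of pairwise dot products.
import numpy as np

def self_similarity(X):
    """X: (n_features, n_frames) -> (n_frames, n_frames) cosine SSM."""
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12)
    return Xn.T @ Xn

# Repeated material shows up as high off-diagonal similarity:
# here the second half of the frames repeats the first half.
X = np.random.default_rng(0).random((12, 8))
X[:, 4:] = X[:, :4]
S = self_similarity(X)
```

Blocks and stripes in such a matrix are what the novelty function above turns into boundary candidates.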
docs/6-Timbre Analysis.md
ADDED
@@ -0,0 +1,6 @@
# Part 6 - Timbre analysis
- Spectrogram + spectral centroid
- Spectrogram + 99% roll-off + 1% roll-off
- Spectrogram + centroid + bandwidth (librosa.feature.spectral_bandwidth)
- Save spectral centroid, 99% roll-off, 1% roll-off, bandwidth as .csv
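The spectral centroid listed above is the magnitude-weighted mean frequency of each frame; a numpy sketch of the definition (not librosa's exact implementation):

```python
# Sketch: centroid[t] = sum_f f * S[f, t] / sum_f S[f, t]
import numpy as np

def spectral_centroid(S, freqs):
    """S: (n_bins, n_frames) magnitudes; freqs: (n_bins,) bin centers."""
    return (freqs[:, None] * S).sum(axis=0) / (S.sum(axis=0) + 1e-12)

# With all energy in a single bin, the centroid is that
# bin's center frequency (the bin nearest 440 Hz here).
freqs = np.linspace(0, 22050, 1025)
S = np.zeros((1025, 3))
bin_440 = int(np.argmin(np.abs(freqs - 440.0)))
S[bin_440, :] = 1.0
cent = spectral_centroid(S, freqs)
```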
docs/info.md
ADDED
@@ -0,0 +1,55 @@
# Music Analysis Tool

This tool integrates pitch estimation, beat tracking, chord recognition, structure analysis, and timbre analysis, aiming to provide a simple, easy-to-use music analysis tool.

## Feature overview

The main features of this tool:

- Basic analysis: basic information about the audio file
- Pitch estimation: estimating the pitch of a piece
- Beat Tracking: tracking the beat
- Chord recognition: recognizing chords
- Structure analysis: analyzing musical form
- Timbre analysis: analyzing timbre

We hope this project helps musicians and music lovers without a programming background analyze music,
by integrating existing music analysis methods and tools into one simple, easy-to-use web interface.

## Development team

+ [Yu-Fen Huang](https://yfhuang.info/)
  + Music & Culture Technology Lab, Institute of Information Science, Academia Sinica, Taiwan
  + Principal investigator
+ [Yu-Lan Chang](https://github.com/TrangDuLam)
  + Institute of Electrical Engineering, National Tsing Hua University
  + Core feature development, package source integration
+ [Hong-Hsiang Liu](https://url.o-w-o.cc/link)
  + Institute of Electrical Engineering, National Tsing Hua University
  + Interactive interface design, application deployment and configuration
+ Ting-Yi Lu
  + Institute of Computer Science, National Tsing Hua University
  + Package source integration, documentation

## Related resources
+ [Visual interface](https://github.com/Keycatowo/music-analysis): for users without a programming background
+ [Code package](https://github.com/TrangDuLam/NTHU_Music_AI_Tools): offers more fine-grained control, for users with a programming background
+ Documentation:
  + ...

## Feedback

If you run into any problems while using this project, please contact us via:

- Sending us an email
- Opening an issue on our GitHub pages
  - [Visual interface](https://github.com/Keycatowo/music-analysis/issues)
  - [Code package](https://github.com/TrangDuLam/NTHU_Music_AI_Tools)

We will reply to your questions as soon as possible.

## License

The Music Analysis Tool is licensed under the [MIT](https://opensource.org/license/mit/) license.

Please note that our software and content may include third-party libraries and components that are governed by their own licenses. See the corresponding documentation for details.
home.py
ADDED
@@ -0,0 +1,67 @@
#%%
import streamlit as st

st.header("Music Analysis Tool")

st.session_state.start_time = 0.0


st.write(
    """
# Music Analysis Tool

This tool integrates pitch estimation, beat tracking, chord recognition, structure analysis, and timbre analysis, aiming to provide a simple, easy-to-use music analysis tool.

## Feature overview

The main features of this tool:

- Basic analysis: basic information about the audio file
- Pitch estimation: estimating the pitch of a piece
- Beat Tracking: tracking the beat
- Chord recognition: recognizing chords
- Structure analysis: analyzing musical form
- Timbre analysis: analyzing timbre

We hope this project helps musicians and music lovers without a programming background analyze music,
by integrating existing music analysis methods and tools into one simple, easy-to-use web interface.

## Development team

+ [Yu-Fen Huang](https://yfhuang.info/)
  + Music & Culture Technology Lab, Institute of Information Science, Academia Sinica, Taiwan
  + Principal investigator
+ [Yu-Lan Chang](https://github.com/TrangDuLam)
  + Institute of Electrical Engineering, National Tsing Hua University
  + Core feature development, package source integration
+ [Hong-Hsiang Liu](https://url.o-w-o.cc/link)
  + Institute of Electrical Engineering, National Tsing Hua University
  + Interactive interface design, application deployment and configuration
+ Ting-Yi Lu
  + Institute of Computer Science, National Tsing Hua University
  + Package source integration, documentation

## Related resources
+ [Visual interface](https://github.com/Keycatowo/music-analysis): for users without a programming background
+ [Code package](https://github.com/TrangDuLam/NTHU_Music_AI_Tools): offers more fine-grained control, for users with a programming background
+ Documentation:
  + ...

## Feedback

If you run into any problems while using this project, please contact us via:

- Sending us an email
- Opening an issue on our GitHub pages
  - [Visual interface](https://github.com/Keycatowo/music-analysis/issues)
  - [Code package](https://github.com/TrangDuLam/NTHU_Music_AI_Tools)

We will reply to your questions as soon as possible.

## License

The Music Analysis Tool is licensed under the [MIT](https://opensource.org/license/mit/) license.

Please note that our software and content may include third-party libraries and components that are governed by their own licenses. See the corresponding documentation for details.
"""
)
packages.txt
ADDED
@@ -0,0 +1 @@
libsndfile1-dev
pages/1-Basic_Information.py
ADDED
@@ -0,0 +1,132 @@
#%%
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import pandas as pd
from src.st_helper import convert_df, show_readme, get_shift
from src.basic_info import plot_waveform, signal_RMS_analysis


st.title("Basic Information")
if "start_time" not in st.session_state:
    st.session_state.start_time = 0.0  # default when home.py has not run yet
#%% Page description
# show_readme("docs/1-Basic Information.md")


#%% Upload file block
with st.expander("Upload Files"):
    file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])

    if file is not None:
        st.audio(file, format="audio/ogg")
        st.subheader("File information")
        st.write(f"File name: `{file.name}`")
        st.write(f"File type: `{file.type}`")
        st.write(f"File size: `{file.size}`")

        # Load the audio file
        y, sr = librosa.load(file, sr=44100)
        st.write(f"Sample rate: `{sr}`")
        duration = float(np.round(len(y)/sr - 0.005, 2))  # duration to 2 decimals, rounded down so it never exceeds the file length
        st.write(f"Duration(s): `{duration}`")

        y_all = y

#%%
if file is not None:

    ### Start of segment selection ###
    with st.expander("Select a segment of the audio"):

        # Slider to select a segment of the audio, in seconds
        start_time, end_time = st.slider("Select a segment of the audio",
                                         0.0, duration,
                                         (st.session_state.start_time, duration),
                                         0.01)
        st.session_state.start_time = start_time

        st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")

        # Extract the audio data for the selected segment
        start_index = int(start_time*sr)
        end_index = int(end_time*sr)
        y_sub = y_all[start_index:end_index]


        # Player for y_sub
        st.audio(y_sub, format="audio/ogg", sample_rate=sr)
        # Time axis corresponding to y_sub
        x_sub = np.arange(len(y_sub))/sr
    ### End of segment selection ###

    tab1, tab2, tab3, tab4, tab5 = st.tabs([
        "Waveform(matplotlib)",
        "Waveform(plotly)",
        "signal_RMS_analysis",
        "Spectrogram",
        "Download RMS data"])

    shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub

    # Plot the waveform
    with tab1:
        st.subheader("Waveform(matplotlib)")
        fig1_1, ax_1_1 = plt.subplots()
        ax_1_1.plot(x_sub + shift_time, y_sub)
        ax_1_1.set_xlabel("Time(s)")
        ax_1_1.set_ylabel("Amplitude")
        ax_1_1.set_title("Waveform")
        st.pyplot(fig1_1)

    # Plot the waveform (interactive)
    with tab2:
        st.subheader("Waveform(plotly)")
        fig1_2 = go.Figure(data=go.Scatter(x=x_sub + shift_time, y=y_sub))
        fig1_2.update_layout(
            title="Waveform",
            xaxis_title="Time(s)",
            yaxis_title="Amplitude",
        )
        st.plotly_chart(fig1_2)

    # Plot the RMS curve
    with tab3:
        st.subheader("signal_RMS_analysis")
        fig1_3, ax1_3, times, rms = signal_RMS_analysis(y_sub, shift_time=shift_time)
        st.pyplot(fig1_3)

    # Plot the spectrogram (with librosa)
    with tab4:
        st.subheader("Spectrogram")
        stft = librosa.stft(y_sub)
        stft_db = librosa.amplitude_to_db(abs(stft))
        # add a figure
        fig1_4, ax1_4 = plt.subplots()
        librosa.display.specshow(stft_db, x_axis='time', y_axis='log', sr=sr, ax=ax1_4)
        ax1_4.set_xticks(shift_array - shift_array[0],
                         shift_array)
        ax1_4.autoscale()
        ax1_4.set_xlabel("Time(s)")
        st.pyplot(fig1_4)

    # Download RMS data
    with tab5:
        st.subheader("Download RMS data")

        col1, col2 = st.columns(2)
        with col1:
            rms_df = pd.DataFrame({"Time(s)": times, "RMS": rms[0,:]})
            st.dataframe(rms_df, use_container_width=True)
        with col2:
            st.download_button(
                "Download RMS data",
                convert_df(rms_df),
                "rms.csv",
                "text/csv",
                key="download-csv"
            )

# %%
pages/2-Pitch_estimation.py
ADDED
@@ -0,0 +1,127 @@
#%%
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np
import librosa
import pandas as pd
import seaborn as sns
from src.st_helper import convert_df, show_readme, get_shift
from src.pitch_estimation import plot_mel_spectrogram, plot_constant_q_transform, pitch_class_type_one_vis, pitch_class_histogram_chroma


st.title("Pitch estimation")
if "start_time" not in st.session_state:
    st.session_state.start_time = 0.0  # default when home.py has not run yet
#%% Page description
# show_readme("docs/2-Pitch_estimation.md")

#%% Upload file block
with st.expander("Upload Files"):
    file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])

    if file is not None:
        st.audio(file, format="audio/ogg")
        st.subheader("File information")
        st.write(f"File name: `{file.name}`")
        st.write(f"File type: `{file.type}`")
        st.write(f"File size: `{file.size}`")

        # Load the audio file
        y, sr = librosa.load(file, sr=44100)
        st.write(f"Sample rate: `{sr}`")
        duration = float(np.round(len(y)/sr - 0.005, 2))  # duration to 2 decimals, rounded down so it never exceeds the file length
        st.write(f"Duration(s): `{duration}`")

        y_all = y

#%% Feature block
if file is not None:
    ### Start of segment selection ###
    with st.expander("Select a segment of the audio"):

        # Slider to select a segment of the audio, in seconds
        start_time, end_time = st.slider("Select a segment of the audio",
                                         0.0, duration,
                                         (st.session_state.start_time, duration),
                                         0.01)
        st.session_state.start_time = start_time

        st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")

        # Extract the audio data for the selected segment
        start_index = int(start_time*sr)
        end_index = int(end_time*sr)
        y_sub = y_all[start_index:end_index]


        # Player for y_sub
        st.audio(y_sub, format="audio/ogg", sample_rate=sr)
        # Time axis corresponding to y_sub
        x_sub = np.arange(len(y_sub))/sr
    ### End of segment selection ###

    tab1, tab2, tab3, tab4 = st.tabs(["Mel-frequency spectrogram", "Constant-Q transform", "Chroma", "Pitch class"])

    shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub

    # Mel-frequency spectrogram
    with tab1:
        st.subheader("Mel-frequency spectrogram")
        with_pitch = st.checkbox("Show pitch", value=True)
        fig2_1, ax2_1 = plot_mel_spectrogram(y_sub, sr, shift_array, with_pitch)
        st.pyplot(fig2_1)

    # Constant-Q transform
    with tab2:
        st.subheader("Constant-Q transform")
        fig2_2, ax2_2 = plot_constant_q_transform(y_sub, sr, shift_array)
        st.pyplot(fig2_2)

    # Chroma
    with tab3:
        st.subheader("Chroma")

        chroma = librosa.feature.chroma_stft(y=y_sub, sr=sr)
        chroma_t = librosa.times_like(chroma, sr=sr)
        df_chroma = pd.DataFrame(chroma)
        df_chroma_t = pd.DataFrame({"Time(s)": chroma_t})
        df_chroma_t["Time(frame)"] = list(range(len(chroma_t)))
        df_chroma_t["Time(s)"] = df_chroma_t["Time(s)"] + shift_time
        df_chroma_t = df_chroma_t[["Time(frame)", "Time(s)"]]

        fig2_3, ax2_3 = plt.subplots(figsize=(10, 4))
        sns.heatmap(chroma, ax=ax2_3)
        ax2_3.set_title("Chroma")
        ax2_3.set_xlabel("Time(frame)")
        ax2_3.invert_yaxis()
        st.pyplot(fig2_3)

        st.write("Chroma value")
        st.dataframe(df_chroma, use_container_width=True)
        st.download_button(
            label="Download chroma",
            data=convert_df(df_chroma),
            file_name="chroma_value.csv",
        )
        st.write("Chroma time")
        st.dataframe(df_chroma_t, use_container_width=True)
        st.download_button(
            label="Download chroma time",
            data=convert_df(df_chroma_t),
            file_name="chroma_time.csv",
        )

    # Pitch class histogram (from chroma)
    with tab4:
        st.subheader("Pitch class(chroma)")
        high_res = st.checkbox("High resolution", value=False)
        fig2_4, ax2_4, df_pitch_class = pitch_class_histogram_chroma(y_sub, sr, high_res)
        st.pyplot(fig2_4)
        st.write(df_pitch_class)
        st.download_button(
            label="Download pitch class(chroma)",
            data=convert_df(pd.DataFrame(df_pitch_class)),
            file_name="Pitch_class(chroma).csv",
            mime="text/csv",
        )
pages/3-Beat Tracking.py
ADDED
@@ -0,0 +1,164 @@
#%%
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np
import librosa
import pandas as pd
from src.beat_track import onsets_detection, plot_onset_strength, beat_analysis, predominant_local_pulse, static_tempo_estimation, plot_tempogram, onset_click_plot, beat_plot
from src.st_helper import convert_df, show_readme, get_shift

st.title('Beat Tracking')
if "start_time" not in st.session_state:
    st.session_state.start_time = 0.0  # default when home.py has not run yet

#%% Page description
# show_readme("docs/3-Beat Tracking.md")

#%% Upload file block
with st.expander("Upload Files"):
    file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])

    if file is not None:
        st.audio(file, format="audio/ogg")
        st.subheader("File information")
        st.write(f"File name: `{file.name}`")
        st.write(f"File type: `{file.type}`")
        st.write(f"File size: `{file.size}`")

        # Load the audio file
        y, sr = librosa.load(file, sr=44100)
        st.write(f"Sample rate: `{sr}`")
        duration = float(np.round(len(y)/sr - 0.005, 2))  # duration to 2 decimals, rounded down so it never exceeds the file length
        st.write(f"Duration(s): `{duration}`")

        y_all = y

#%% Feature block
if file is not None:

    ### Start of segment selection ###
    with st.expander("Select a segment of the audio"):

        # Slider to select a segment of the audio, in seconds
        start_time, end_time = st.slider("Select a segment of the audio",
                                         0.0, duration,
                                         (st.session_state.start_time, duration),
                                         0.01)
        st.session_state.start_time = start_time

        st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")

        # Extract the audio data for the selected segment
        start_index = int(start_time*sr)
        end_index = int(end_time*sr)
        y_sub = y_all[start_index:end_index]


        # Player for y_sub
        st.audio(y_sub, format="audio/ogg", sample_rate=sr)
        # Time axis corresponding to y_sub
        x_sub = np.arange(len(y_sub))/sr
    ### End of segment selection ###

    tab1, tab2, tab3, tab4, tab5, tab6 = st.tabs([
        "onsets_detection",
        "onset_strength",
        "beat_analysis",
        "predominant_local_pulse",
        "static_tempo_estimation",
        "Tempogram"])

    shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub

    # onsets_detection
    with tab1:
        st.subheader("onsets_detection")
        fig3_1a, ax3_1a, onset_data = onsets_detection(y_sub, sr, shift_array)
        o_env, o_times, onset_frames = onset_data
        st.pyplot(fig3_1a)
        # Block for adjusting the detected onset frames
        clicks = st.multiselect("Onset",
                                list(range(len(o_env))), list(onset_frames))
        fig3_1b, ax3_1b, y_onset_clicks = onset_click_plot(o_env, o_times, clicks, len(y_sub), sr, shift_time)
        st.pyplot(fig3_1b)
        df_onset = pd.DataFrame({"Frame": clicks, "Time(s)": o_times[clicks], "Onset": o_env[clicks]})
        st.dataframe(df_onset, use_container_width=True)
        st.download_button(
            label="Download onset data",
            data=convert_df(df_onset),
            file_name="onset_data.csv",
        )
        st.audio(y_onset_clicks, format="audio/ogg", sample_rate=sr)


    # onset_strength
    with tab2:
        st.subheader("onset_strength")
        onset_strength_standard = st.checkbox("standard", value=True)
        onset_strength_custom_mel = st.checkbox("custom_mel", value=False)
        onset_strength_cqt = st.checkbox("cqt", value=False)
        fig3_2, ax3_2 = plot_onset_strength(y_sub, sr,
                                            standard=onset_strength_standard,
                                            custom_mel=onset_strength_custom_mel,
                                            cqt=onset_strength_cqt,
                                            shift_array=shift_array)
        st.pyplot(fig3_2)

    # beat_analysis
    with tab3:
        st.subheader("beat_analysis")
        spec_type = st.selectbox("spec_type", ["mel", "stft"])
        spec_hop_length = st.number_input("spec_hop_length", value=512)
        fig3_3a, ax3_3a, beats_data = beat_analysis(y_sub, sr,
                                                    spec_type=spec_type,
                                                    spec_hop_length=spec_hop_length,
                                                    shift_array=shift_array)
        b_times, b_env, b_tempo, b_beats = beats_data
        st.pyplot(fig3_3a)

        b_clicks = st.multiselect("Beats",
                                  list(range(len(b_env))), list(b_beats))
        fig3_3b, ax3_3b, y_beat_clicks = beat_plot(b_times, b_env, b_tempo, b_clicks, len(y_sub), sr, shift_time)
        st.pyplot(fig3_3b)
        df_beats = pd.DataFrame({"Frame": b_clicks, "Time(s)": b_times[b_clicks] + shift_time, "Beats": b_env[b_clicks]})
        st.dataframe(df_beats, use_container_width=True)
        st.download_button(
            label="Download beats data",
            data=convert_df(df_beats),
            file_name="beats_data.csv",
        )
        st.audio(y_beat_clicks, format="audio/ogg", sample_rate=sr)


    # predominant_local_pulse
    with tab4:
        st.subheader("predominant_local_pulse")
        fig3_4, ax3_4 = predominant_local_pulse(y_sub, sr, shift_time)
        st.pyplot(fig3_4)

    # static_tempo_estimation
    with tab5:
        st.subheader("static_tempo_estimation")
        static_tempo_estimation_hop_length = st.number_input("hop_length", value=512)
        fig3_5, ax3_5 = static_tempo_estimation(y_sub, sr,
                                                hop_length=static_tempo_estimation_hop_length)
        st.pyplot(fig3_5)

    # Tempogram
    with tab6:
        st.subheader("Tempogram")
        tempogram_type = st.selectbox("tempogram_type", ["fourier", "autocorr"], index=1)
        tempogram_hop_length = st.number_input("Tempogram_hop_length", value=512)
        fig3_6, ax3_6 = plot_tempogram(y_sub, sr,
                                       type=tempogram_type,
                                       hop_length=tempogram_hop_length,
                                       shift_array=shift_array)
        st.pyplot(fig3_6)
pages/4-Chord_recognition.py
ADDED
@@ -0,0 +1,112 @@
#%%
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np
import librosa
import pandas as pd
import seaborn as sns
from src.st_helper import convert_df, show_readme, get_shift
from src.chord_recognition import (
    plot_chord_recognition,
    plot_binary_template_chord_recognition,
    chord_table,
    compute_chromagram,
    chord_recognition_template,
    plot_chord,
    plot_user_chord
)

st.title("Chord Recognition")
if "start_time" not in st.session_state:
    st.session_state.start_time = 0.0  # default when home.py has not run yet

#%% Page description
# show_readme("docs/4-Chord_recognition.md")


#%% Upload file block
with st.expander("Upload Files"):
    file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])

    if file is not None:
        st.audio(file, format="audio/ogg")
        st.subheader("File information")
        st.write(f"File name: `{file.name}`")
        st.write(f"File type: `{file.type}`")
        st.write(f"File size: `{file.size}`")

        # Load the audio file
        y, sr = librosa.load(file, sr=44100)
        st.write(f"Sample rate: `{sr}`")
        duration = float(np.round(len(y)/sr - 0.005, 2))  # duration to 2 decimals, rounded down so it never exceeds the file length
        st.write(f"Duration(s): `{duration}`")

        y_all = y

#%%
if file is not None:

    ### Start of segment selection ###
    with st.expander("Select a segment of the audio"):

        # Slider to select a segment of the audio, in seconds
        start_time, end_time = st.slider("Select a segment of the audio",
                                         0.0, duration,
                                         (st.session_state.start_time, duration),
                                         0.01)
        st.session_state.start_time = start_time

        st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")

        # Extract the audio data for the selected segment
        start_index = int(start_time*sr)
        end_index = int(end_time*sr)
        y_sub = y_all[start_index:end_index]


        # Player for y_sub
        st.audio(y_sub, format="audio/ogg", sample_rate=sr)
        # Time axis corresponding to y_sub
        x_sub = np.arange(len(y_sub))/sr
    ### End of segment selection ###

    tab1, tab2, tab3, tab4 = st.tabs(["STFT Chroma", "Chords Result (Default)", "Chords Result (User)", "dev"])
    shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub

    # STFT Chroma
    with tab1:
        chroma, _, _, _, duration = compute_chromagram(y_sub, sr)
        fig4_1, ax4_1 = plot_chord(chroma, "STFT Chroma")
        st.pyplot(fig4_1)

    with tab2:
        _, chord_max = chord_recognition_template(chroma, norm_sim='max')
        fig4_2, ax4_2 = plot_chord(chord_max, "Chord Recognition Result", cmap="crest", include_minor=True)
        st.pyplot(fig4_2)

    with tab3:
        # Build the chord result dataframe
        sec_per_frame = duration/chroma.shape[1]
        chord_results_df = pd.DataFrame({
            "Frame": np.arange(chroma.shape[1]),
            "Time(s)": np.arange(chroma.shape[1])*sec_per_frame + shift_time,
            "Chord": chord_table(chord_max)
        })

        fig4_1b, ax4_1b = plot_user_chord(chord_results_df)
        st.pyplot(fig4_1b)

        chord_results_df = st.experimental_data_editor(
            chord_results_df,
            use_container_width=True
        )


    # plot_binary_template_chord_recognition
    with tab4:
        st.subheader("plot_binary_template_chord_recognition")
        fig4_4, ax4_4 = plot_binary_template_chord_recognition(y_sub, sr)
        st.pyplot(fig4_4)
pages/5-Structure_analysis.py
ADDED
@@ -0,0 +1,76 @@
#%%
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np
import librosa
import pandas as pd
from src.st_helper import convert_df, show_readme
from src.structure_analysis import (
    plot_self_similarity
)

st.title("Structure analysis")

#%% Page description
# show_readme("docs/5-Structure_analysis.md")


#%% File upload block
with st.expander("上傳檔案(Upload Files)"):
    file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])

    if file is not None:
        st.audio(file, format="audio/ogg")
        st.subheader("File information")
        st.write(f"File name: `{file.name}`")
        st.write(f"File type: `{file.type}`")
        st.write(f"File size: `{file.size}`")

        # Load the audio file
        y, sr = librosa.load(file, sr=44100)
        st.write(f"Sample rate: `{sr}`")
        duration = float(np.round(len(y)/sr-0.005, 2))  # duration rounded down to 2 decimals so it never exceeds the file length
        st.write(f"Duration(s): `{duration}`")

        y_all = y

#%%
if file is not None:

    ### Start of segment selection ###
    with st.expander("選擇聲音片段(Select a segment of the audio)"):

        # Build a slider for selecting a segment of the audio, in seconds
        start_time, end_time = st.slider("Select a segment of the audio",
            0.0, duration,
            (st.session_state.start_time, duration),
            0.01
        )
        st.session_state.start_time = start_time

        st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")

        # Extract the audio data for the selected segment
        start_index = int(start_time*sr)
        end_index = int(end_time*sr)
        y_sub = y_all[start_index:end_index]


        # Build a player for y_sub
        st.audio(y_sub, format="audio/ogg", sample_rate=sr)
        # Compute the time axis corresponding to y_sub
        x_sub = np.arange(len(y_sub))/sr
    ### End of segment selection ###

    tab1, tab2 = st.tabs(["Self-similarity matrix", "empty"])

    # plot_self_similarity
    with tab1:
        st.subheader("Self-similarity matrix")
        affinity = st.checkbox("Affinity", value=False)
        self_similarity_hop_length = st.number_input("Self similarity hop length", value=1024)
        fig5_1, ax5_1 = plot_self_similarity(y_sub, sr, affinity=affinity, hop_length=self_similarity_hop_length)
        st.pyplot(fig5_1)
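`plot_self_similarity` is built on comparing every feature frame against every other frame (librosa's recurrence/affinity matrices do this over chroma or MFCC features). A pure-Python sketch of the core computation, assuming cosine similarity over small feature vectors (the names `cosine` and `self_similarity` are illustrative, not the repo's API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return num / den if den else 0.0

def self_similarity(frames):
    """Pairwise similarity between all feature frames: S[i][j] = sim(frame i, frame j).
    Repeated sections of a song show up as bright off-diagonal stripes in S."""
    return [[cosine(fi, fj) for fj in frames] for fi in frames]

# Two identical frames followed by an orthogonal one
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
S = self_similarity(frames)
# S[0][1] is 1.0 (same content), S[0][2] is 0.0 (unrelated content)
```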
pages/6-Timbre_Analysis.py
ADDED
@@ -0,0 +1,149 @@
#%%
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np
import librosa
import pandas as pd
from src.st_helper import convert_df, show_readme, get_shift
from src.timbre_analysis import (
    spectral_centroid_analysis,
    rolloff_frequency_analysis,
    spectral_bandwidth_analysis,
    harmonic_percussive_source_separation
)

st.title("Timbre Analysis")
#%% Page description
# show_readme("docs/6-Timbre Analysis.md")

#%% File upload block
with st.expander("上傳檔案(Upload Files)"):
    file = st.file_uploader("Upload your music library", type=["mp3", "wav", "ogg"])

    if file is not None:
        st.audio(file, format="audio/ogg")
        st.subheader("File information")
        st.write(f"File name: `{file.name}`")
        st.write(f"File type: `{file.type}`")
        st.write(f"File size: `{file.size}`")

        # Load the audio file
        y, sr = librosa.load(file, sr=44100)
        st.write(f"Sample rate: `{sr}`")
        duration = float(np.round(len(y)/sr-0.005, 2))  # duration rounded down to 2 decimals so it never exceeds the file length
        st.write(f"Duration(s): `{duration}`")

        y_all = y

#%%
if file is not None:

    ### Start of segment selection ###
    with st.expander("選擇聲音片段(Select a segment of the audio)"):

        # Build a slider for selecting a segment of the audio, in seconds
        start_time, end_time = st.slider("Select a segment of the audio",
            0.0, duration,
            (st.session_state.start_time, duration),
            0.01
        )
        st.session_state.start_time = start_time

        st.write(f"Selected segment: `{start_time}` ~ `{end_time}`, duration: `{end_time-start_time}`")

        # Extract the audio data for the selected segment
        start_index = int(start_time*sr)
        end_index = int(end_time*sr)
        y_sub = y_all[start_index:end_index]


        # Build a player for y_sub
        st.audio(y_sub, format="audio/ogg", sample_rate=sr)
        # Compute the time axis corresponding to y_sub
        x_sub = np.arange(len(y_sub))/sr
    ### End of segment selection ###

    tab1, tab2, tab3, tab4 = st.tabs(["Spectral Centroid", "Rolloff Frequency", "Spectral Bandwidth", "Harmonic Percussive Source Separation"])

    shift_time, shift_array = get_shift(start_time, end_time)  # shift_array is the time scale of y_sub

    # spectral_centroid_analysis
    with tab1:
        st.subheader("Spectral Centroid Analysis")
        fig6_1, ax6_1, centroid_value = spectral_centroid_analysis(y_sub, sr, shift_array)
        st.pyplot(fig6_1)

        df_centroid = pd.DataFrame(centroid_value.T, columns=["Time(s)", "Centroid"])
        df_centroid["Time(s)"] = df_centroid["Time(s)"] + shift_time
        st.dataframe(df_centroid, use_container_width=True)
        st.download_button(
            label="Download spectral centroid data",
            data=convert_df(df_centroid),
            file_name="centroid.csv",
            mime="text/csv",
        )

    # rolloff_frequency_analysis
    with tab2:
        st.subheader("Rolloff Frequency Analysis")
        roll_percent = st.selectbox("Select rolloff frequency", [0.90, 0.95, 0.99])
        fig6_2, ax6_2, rolloff_value = rolloff_frequency_analysis(y_sub, sr, roll_percent=roll_percent, shift_array=shift_array)
        st.pyplot(fig6_2)
        df_rolloff = pd.DataFrame(rolloff_value.T, columns=["Time(s)", "Rolloff", "Rolloff_min"])
        df_rolloff["Time(s)"] = df_rolloff["Time(s)"] + shift_time
        st.dataframe(df_rolloff, use_container_width=True)
        st.download_button(
            label="Download rolloff frequency data",
            data=convert_df(df_rolloff),
            file_name="rolloff.csv",
            mime="text/csv",
        )

    # spectral_bandwidth_analysis
    with tab3:
        st.subheader("Spectral Bandwidth Analysis")
        fig6_3, ax6_3, bandwidth_value = spectral_bandwidth_analysis(y_sub, sr, shift_array)
        st.pyplot(fig6_3)
        df_bandwidth = pd.DataFrame(bandwidth_value.T, columns=["Time(s)", "Bandwidth"])
        df_bandwidth["Time(s)"] = df_bandwidth["Time(s)"] + shift_time
        st.dataframe(df_bandwidth, use_container_width=True)
        st.download_button(
            label="Download spectral bandwidth data",
            data=convert_df(df_bandwidth),
            file_name="bandwidth.csv",
            mime="text/csv",
        )

    # harmonic_percussive_source_separation
    with tab4:
        st.subheader("Harmonic Percussive Source Separation")
        fig6_4, ax6_4, (Harmonic_data) = harmonic_percussive_source_separation(y_sub, sr, shift_array)
        D, H, P, t = Harmonic_data
        st.pyplot(fig6_4)

        st.download_button(
            label="Download Full power spectrogram data",
            data=convert_df(pd.DataFrame(D)),
            file_name="Full_power_spectrogram.csv",
            use_container_width=True,
        )
        st.download_button(
            label="Download Harmonic power spectrogram data",
            data=convert_df(pd.DataFrame(H)),
            file_name="Harmonic_power_spectrogram.csv",
            use_container_width=True,
        )
        st.download_button(
            label="Download Percussive power spectrogram data",
            data=convert_df(pd.DataFrame(P)),
            file_name="Percussive_power_spectrogram.csv",
            use_container_width=True,
        )
        st.download_button(
            label="Download Time data",
            data=convert_df(pd.DataFrame(t+shift_time, columns=["Time(s)"])),
            file_name="Time_scale.csv",
            use_container_width=True,
        )
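The spectral centroid shown in the first tab is simply the magnitude-weighted mean frequency of each spectral frame, which is why it tracks perceived "brightness". A minimal single-frame sketch of what `librosa.feature.spectral_centroid` computes per frame (the function name `spectral_centroid` here is illustrative):

```python
def spectral_centroid(magnitudes, freqs):
    """Magnitude-weighted mean frequency of one spectral frame.

    magnitudes: spectral magnitudes per frequency bin
    freqs: center frequency (Hz) of each bin
    """
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: define the centroid as 0 Hz
    return sum(m * f for m, f in zip(magnitudes, freqs)) / total

# Energy split evenly between the 100 Hz and 200 Hz bins -> centroid at 150 Hz
freqs = [0.0, 100.0, 200.0, 300.0]
print(spectral_centroid([0, 1, 1, 0], freqs))  # 150.0
```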
pages/999-dev.py
ADDED
@@ -0,0 +1,18 @@
#%%
import pkg_resources
import streamlit as st

with st.expander("Show packages"):
    for dist in pkg_resources.working_set:
        print(f"{dist.project_name}=={dist.version}")
        st.write(f"{dist.project_name}=={dist.version}")

#%%
import os
import psutil

with st.expander("Show memory usage"):
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    print(f"Memory usage: {mem_info.rss / 1024 / 1024:.2f} MB")
    st.write(f"Memory usage: {mem_info.rss / 1024 / 1024:.2f} MB")
requirements.txt
ADDED
@@ -0,0 +1,11 @@
librosa==0.9.2
pandas==1.3.5
streamlit==1.19.0
numpy==1.23.0
seaborn==0.12.1
matplotlib==3.5.3
plotly==5.11.0
scikit-learn==1.2.0
soundfile==0.11.0
libfmp==1.2.3
psutil==5.9.1
src/basic_info.py
ADDED
@@ -0,0 +1,48 @@
import librosa
from librosa import display
from librosa import feature

import numpy as np
from matplotlib import pyplot as plt
import scipy

from numpy import typing as npt
import typing


def show_duration(y: npt.ArrayLike, sr: int) -> float:
    pass


def selcet_time(start_time: float, end_time: float):
    pass


def plot_waveform(ax, y: npt.ArrayLike, sr: int, start_time: float = 0.0, end_time: float = None) -> None:
    # ax = plt.subplot(2, 1, 1)
    startIdx = int(start_time * sr)

    if not end_time:
        librosa.display.waveshow(y[startIdx:], sr)
    else:
        endIdx = int(end_time * sr)
        librosa.display.waveshow(y[startIdx:endIdx - 1], sr)

    return


def signal_RMS_analysis(y: npt.ArrayLike, shift_time: float = 0.0):
    fig, ax = plt.subplots()

    rms = librosa.feature.rms(y=y)
    times = librosa.times_like(rms) + shift_time

    ax.plot(times, rms[0])
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('RMS')

    return fig, ax, times, rms
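`signal_RMS_analysis` relies on `librosa.feature.rms`, which slides a window over the signal and takes the root-mean-square of each frame. A pure-Python sketch of that frame-wise computation (the helper name `frame_rms` and the tiny frame/hop sizes are illustrative; librosa defaults to frame_length=2048, hop_length=512):

```python
import math

def frame_rms(y, frame_length=4, hop_length=2):
    """Frame-wise root-mean-square energy, analogous to librosa.feature.rms."""
    rms = []
    for start in range(0, max(len(y) - frame_length + 1, 1), hop_length):
        frame = y[start:start + frame_length]
        rms.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return rms

# A loud square-ish burst followed by silence: RMS decays to 0
y = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
print(frame_rms(y))  # first frame 0.5, last frame 0.0
```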
src/beat_track.py
ADDED
@@ -0,0 +1,223 @@
import librosa
from librosa import display
from librosa import feature

import numpy as np
from matplotlib import pyplot as plt
import scipy
import soundfile as sf

from numpy import typing as npt
import typing


def onsets_detection(y: npt.ArrayLike, sr: int, shift_array: npt.ArrayLike) -> tuple:
    """
    Compute the onset frames of the audio file
    """
    o_env = librosa.onset.onset_strength(y=y, sr=sr)
    times = librosa.times_like(o_env, sr=sr)
    onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)
    D = np.abs(librosa.stft(y))

    fig, ax = plt.subplots()
    librosa.display.specshow(librosa.amplitude_to_db(D, ref=np.max),
                             x_axis='time', y_axis='log', ax=ax, sr=sr)
    ax.set_xticks(shift_array - shift_array[0],
                  shift_array)
    ax.set_xlabel('Time (s)')
    ax.autoscale()
    ax.set(title='Power spectrogram')

    return fig, ax, (o_env, times, onset_frames)

def onset_click_plot(o_env, times, onset_frames, y_len, sr, shift_time) -> tuple:
    """
    Re-plot the onset frames
    """
    fig, ax = plt.subplots()
    ax.plot(times + shift_time, o_env, label='Onset strength')
    ax.vlines(times[onset_frames] + shift_time, 0, o_env.max(), color='r', alpha=0.9,
              linestyles='--', label='Onsets')
    ax.autoscale()
    ax.legend()
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Strength')

    y_onset_clicks = librosa.clicks(frames=onset_frames, sr=sr, length=y_len)
    return fig, ax, y_onset_clicks


def plot_onset_strength(y: npt.ArrayLike, sr: int, standard: bool = True, custom_mel: bool = False, cqt: bool = False, shift_array: npt.ArrayLike = None) -> tuple:
    D = np.abs(librosa.stft(y))
    times = librosa.times_like(D, sr)

    fig, ax = plt.subplots(nrows=2, sharex=True)
    librosa.display.specshow(librosa.amplitude_to_db(D, ref=np.max),
                             y_axis='log', x_axis='time', ax=ax[0], sr=sr)
    ax[0].set(title='Power spectrogram')
    ax[0].label_outer()

    # Standard onset function
    if standard:
        onset_env_standard = librosa.onset.onset_strength(y=y, sr=sr)
        ax[1].plot(times, 2 + onset_env_standard / onset_env_standard.max(), alpha=0.8, label='Mean (mel)')

    if custom_mel:
        onset_env_mel = librosa.onset.onset_strength(y=y, sr=sr,
                                                     aggregate=np.median,
                                                     fmax=8000, n_mels=256)
        ax[1].plot(times, 1 + onset_env_mel / onset_env_mel.max(), alpha=0.8, label='Median (custom mel)')

    if cqt:
        C = np.abs(librosa.cqt(y=y, sr=sr))
        onset_env_cqt = librosa.onset.onset_strength(sr=sr, S=librosa.amplitude_to_db(C, ref=np.max))
        ax[1].plot(times, onset_env_cqt / onset_env_cqt.max(), alpha=0.8, label='Mean (CQT)')

    ax[1].legend()
    ax[1].set(ylabel='Normalized strength', yticks=[])
    ax[1].set_xticks(shift_array - shift_array[0],
                     shift_array)
    ax[1].autoscale()
    ax[1].set_xlabel('Time (s)')

    return fig, ax


def beat_analysis(y: npt.ArrayLike, sr: int, spec_type: str = 'mel', spec_hop_length: int = 512, shift_array: npt.ArrayLike = None):
    fig, ax = plt.subplots()
    onset_env = librosa.onset.onset_strength(y=y, sr=sr, aggregate=np.median)
    tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
    times = librosa.times_like(onset_env, sr=sr, hop_length=spec_hop_length)

    if spec_type == 'mel':
        M = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=spec_hop_length)
        librosa.display.specshow(librosa.power_to_db(M, ref=np.max),
                                 y_axis='mel', x_axis='time', hop_length=spec_hop_length,
                                 ax=ax, sr=sr)
        ax.set(title='Mel spectrogram')

    if spec_type == 'stft':
        S = np.abs(librosa.stft(y))
        img = librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
                                       y_axis='log', x_axis='time', ax=ax, sr=sr)
        ax.set_title('Power spectrogram')
        # fig.colorbar(img, ax=ax[0], format="%+2.0f dB")

    ax.set_xticks(shift_array - shift_array[0],
                  shift_array)
    ax.autoscale()
    ax.set_xlabel('Time (s)')

    return fig, ax, (times, onset_env, tempo, beats)

def beat_plot(times, onset_env, tempo, beats, y_len, sr, shift_time):
    """
    Re-plot the beats
    """
    fig, ax = plt.subplots()
    ax.plot(times + shift_time, librosa.util.normalize(onset_env), label='Onset strength')
    ax.vlines(times[beats] + shift_time, 0, 1, alpha=0.5, color='r', linestyle='--', label='Beats')
    tempoString = 'Tempo = %.2f' % (tempo)
    ax.plot([], [], ' ', label=tempoString)
    ax.legend()
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Normalized strength')

    y_beats = librosa.clicks(frames=beats, sr=sr, length=y_len)

    return fig, ax, y_beats

def predominant_local_pulse(y: npt.ArrayLike, sr: int, shift_time: float = 0) -> tuple:
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    pulse = librosa.beat.plp(onset_envelope=onset_env, sr=sr)
    beats_plp = np.flatnonzero(librosa.util.localmax(pulse))
    times = librosa.times_like(pulse, sr=sr)

    fig, ax = plt.subplots()
    ax.plot(times + shift_time, librosa.util.normalize(pulse), label='PLP')
    ax.vlines(times[beats_plp] + shift_time, 0, 1, alpha=0.5, color='r',
              linestyle='--', label='PLP Beats')
    ax.legend()
    ax.set(title="Predominant local pulse")
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Normalized strength')

    return fig, ax


def static_tempo_estimation(y: npt.ArrayLike, sr: int, hop_length: int = 512) -> tuple:
    '''
    Visualize the result of static tempo estimation

    y: input signal array
    sr: sampling rate
    '''
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempo = librosa.beat.tempo(onset_envelope=onset_env, sr=sr)

    # Static tempo estimation
    prior = scipy.stats.uniform(30, 300)  # uniform over 30-300 BPM
    utempo = librosa.beat.tempo(onset_envelope=onset_env, sr=sr, prior=prior)

    tempo = tempo.item()
    utempo = utempo.item()
    ac = librosa.autocorrelate(onset_env, max_size=2 * sr // hop_length)
    freqs = librosa.tempo_frequencies(len(ac), sr=sr,
                                      hop_length=hop_length)

    fig, ax = plt.subplots()
    ax.semilogx(freqs[1:], librosa.util.normalize(ac)[1:],
                label='Onset autocorrelation', base=2)
    ax.axvline(tempo, 0, 1, alpha=0.75, linestyle='--', color='r',
               label='Tempo (default prior): {:.2f} BPM'.format(tempo))
    ax.axvline(utempo, 0, 1, alpha=0.75, linestyle=':', color='g',
               label='Tempo (uniform prior): {:.2f} BPM'.format(utempo))
    ax.set(xlabel='Tempo (BPM)', title='Static tempo estimation')
    ax.grid(True)
    ax.legend()

    return fig, ax


def plot_tempogram(y: npt.ArrayLike, sr: int, type: str = 'autocorr', hop_length: int = 512, shift_array: npt.ArrayLike = None) -> tuple:
    oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    tempogram = librosa.feature.fourier_tempogram(onset_envelope=oenv, sr=sr, hop_length=hop_length)
    tempo = librosa.beat.tempo(onset_envelope=oenv, sr=sr, hop_length=hop_length)[0]

    fig, ax = plt.subplots()

    if type == 'fourier':
        # To determine which tempo to show?
        librosa.display.specshow(np.abs(tempogram), sr=sr, hop_length=hop_length,
                                 x_axis='time', y_axis='fourier_tempo', cmap='magma')
        ax.axhline(tempo, color='w', linestyle='--', alpha=1, label='Estimated tempo={:g}'.format(tempo))
        ax.legend(loc='upper right')
        # ax.title('Fourier Tempogram')

    if type == 'autocorr':
        ac_tempogram = librosa.feature.tempogram(onset_envelope=oenv, sr=sr, hop_length=hop_length, norm=None)
        librosa.display.specshow(ac_tempogram, sr=sr, hop_length=hop_length, x_axis='time', y_axis='tempo', cmap='magma')
        ax.axhline(tempo, color='w', linestyle='--', alpha=1, label='Estimated tempo={:g}'.format(tempo))
        ax.legend(loc='upper right')
        # ax.title('Autocorrelation Tempogram')
    ax.set_xticks(shift_array - shift_array[0],
                  shift_array)
    ax.autoscale()

    return fig, ax
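Most functions in this module start from `librosa.onset.onset_strength`, which is essentially a mel-weighted spectral flux: the positive increase in magnitude between consecutive spectral frames. A pure-Python sketch of plain half-wave-rectified spectral flux as an onset-strength proxy (a simplification of librosa's version; the names `spectral_flux` and the toy frame data are illustrative):

```python
def spectral_flux(frames):
    """Half-wave-rectified spectral flux, a simple onset-strength proxy:
    for each frame, sum the positive magnitude increases relative to the
    previous frame. Sudden energy jumps (note onsets) produce peaks."""
    flux = [0.0]  # no predecessor for the first frame
    for prev, cur in zip(frames, frames[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux

# Magnitude spectra of four frames; the jump at frame 2 marks an onset
frames = [[0.1, 0.1], [0.1, 0.1], [0.9, 0.8], [0.9, 0.8]]
flux = spectral_flux(frames)
onset_frame = flux.index(max(flux))  # 2
```

`onset_detect` then picks peaks in this envelope, and `beat_track` fits a regular tempo grid to those peaks.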
src/chord_recognition.py
ADDED
@@ -0,0 +1,285 @@
import os
import numpy as np
from matplotlib import pyplot as plt

import librosa
import libfmp.b
import libfmp.c3
import libfmp.c4

import sys

def compute_chromagram_from_filename(fn_wav, Fs=22050, N=4096, H=2048, gamma=None, version='STFT', norm='2'):
    """Compute chromagram for WAV file specified by filename

    Notebook: C5/C5S2_ChordRec_Templates.ipynb

    Args:
        fn_wav (str): Filename of WAV
        Fs (scalar): Sampling rate (Default value = 22050)
        N (int): Window size (Default value = 4096)
        H (int): Hop size (Default value = 2048)
        gamma (float): Constant for logarithmic compression (Default value = None)
        version (str): Technique used for front-end decomposition ('STFT', 'IIR', 'CQT') (Default value = 'STFT')
        norm (str): If not 'None', chroma vectors are normalized by norm as specified ('1', '2', 'max')
            (Default value = '2')

    Returns:
        X (np.ndarray): Chromagram
        Fs_X (scalar): Feature rate of chromagram
        x (np.ndarray): Audio signal
        Fs (scalar): Sampling rate of audio signal
        x_dur (float): Duration (seconds) of audio signal
    """
    x, Fs = librosa.load(fn_wav, sr=Fs)
    x_dur = x.shape[0] / Fs
    if version == 'STFT':
        # Compute chroma features with STFT
        X = librosa.stft(x, n_fft=N, hop_length=H, pad_mode='constant', center=True)
        if gamma is not None:
            X = np.log(1 + gamma * np.abs(X) ** 2)
        else:
            X = np.abs(X) ** 2
        X = librosa.feature.chroma_stft(S=X, sr=Fs, tuning=0, norm=None, hop_length=H, n_fft=N)
    if version == 'CQT':
        # Compute chroma features with CQT decomposition
        X = librosa.feature.chroma_cqt(y=x, sr=Fs, hop_length=H, norm=None)
    if version == 'IIR':
        # Compute chroma features with filter bank (using IIR elliptic filter)
        X = librosa.iirt(y=x, sr=Fs, win_length=N, hop_length=H, center=True, tuning=0.0)
        if gamma is not None:
            X = np.log(1.0 + gamma * X)
        X = librosa.feature.chroma_cqt(C=X, bins_per_octave=12, n_octaves=7,
                                       fmin=librosa.midi_to_hz(24), norm=None)
    if norm is not None:
        X = libfmp.c3.normalize_feature_sequence(X, norm=norm)
    Fs_X = Fs / H
    return X, Fs_X, x, Fs, x_dur

def compute_chromagram(y, sr, Fs=22050, N=4096, H=2048, gamma=None, version='STFT', norm='2'):
    """Compute chromagram for an audio signal already loaded in memory

    Notebook: C5/C5S2_ChordRec_Templates.ipynb

    Args:
        y (np.ndarray): Audio signal
        sr (scalar): Sampling rate
        Fs (scalar): Target sampling rate (Default value = 22050)
        N (int): Window size (Default value = 4096)
        H (int): Hop size (Default value = 2048)
        gamma (float): Constant for logarithmic compression (Default value = None)
        version (str): Technique used for front-end decomposition ('STFT', 'IIR', 'CQT') (Default value = 'STFT')
        norm (str): If not 'None', chroma vectors are normalized by norm as specified ('1', '2', 'max')
            (Default value = '2')

    Returns:
        X (np.ndarray): Chromagram
        Fs_X (scalar): Feature rate of chromagram
        x (np.ndarray): Audio signal
        Fs (scalar): Sampling rate of audio signal
        x_dur (float): Duration (seconds) of audio signal
    """
    x = librosa.resample(y, orig_sr=sr, target_sr=Fs)
    x_dur = x.shape[0] / Fs
    if version == 'STFT':
        # Compute chroma features with STFT
        X = librosa.stft(x, n_fft=N, hop_length=H, pad_mode='constant', center=True)
        if gamma is not None:
            X = np.log(1 + gamma * np.abs(X) ** 2)
        else:
            X = np.abs(X) ** 2
        X = librosa.feature.chroma_stft(S=X, sr=Fs, tuning=0, norm=None, hop_length=H, n_fft=N)
    if version == 'CQT':
        # Compute chroma features with CQT decomposition
        X = librosa.feature.chroma_cqt(y=x, sr=Fs, hop_length=H, norm=None)
    if version == 'IIR':
        # Compute chroma features with filter bank (using IIR elliptic filter)
        X = librosa.iirt(y=x, sr=Fs, win_length=N, hop_length=H, center=True, tuning=0.0)
        if gamma is not None:
            X = np.log(1.0 + gamma * X)
        X = librosa.feature.chroma_cqt(C=X, bins_per_octave=12, n_octaves=7,
                                       fmin=librosa.midi_to_hz(24), norm=None)
    if norm is not None:
        X = libfmp.c3.normalize_feature_sequence(X, norm=norm)
    Fs_X = Fs / H
    return X, Fs_X, x, Fs, x_dur

def get_chord_labels(ext_minor='m', nonchord=False):
    """Generate chord labels for major and minor triads (and possibly nonchord label)

    Notebook: C5/C5S2_ChordRec_Templates.ipynb

    Args:
        ext_minor (str): Extension for minor chords (Default value = 'm')
        nonchord (bool): If "True" then add nonchord label (Default value = False)

    Returns:
        chord_labels (list): List of chord labels
    """
    chroma_labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    chord_labels_maj = chroma_labels
    chord_labels_min = [s + ext_minor for s in chroma_labels]
    chord_labels = chord_labels_maj + chord_labels_min
    if nonchord is True:
        chord_labels = chord_labels + ['N']
    return chord_labels

def generate_chord_templates(nonchord=False):
    """Generate chord templates of major and minor triads (and possibly nonchord)

    Notebook: C5/C5S2_ChordRec_Templates.ipynb

    Args:
        nonchord (bool): If "True" then add nonchord template (Default value = False)

    Returns:
        chord_templates (np.ndarray): Matrix containing chord_templates as columns
    """
    template_cmaj = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]).T
    template_cmin = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]).T
    num_chord = 24
    if nonchord:
        num_chord = 25
    chord_templates = np.ones((12, num_chord))
    for shift in range(12):
        chord_templates[:, shift] = np.roll(template_cmaj, shift)
        chord_templates[:, shift+12] = np.roll(template_cmin, shift)
    return chord_templates

def chord_recognition_template(X, norm_sim='1', nonchord=False):
    """Conducts template-based chord recognition
    with major and minor triads (and possibly nonchord)

    Notebook: C5/C5S2_ChordRec_Templates.ipynb

    Args:
        X (np.ndarray): Chromagram
        norm_sim (str): Specifies norm used for normalizing chord similarity matrix (Default value = '1')
        nonchord (bool): If "True" then add nonchord template (Default value = False)

    Returns:
        chord_sim (np.ndarray): Chord similarity matrix
        chord_max (np.ndarray): Binarized chord similarity matrix only containing maximizing chord
    """
    chord_templates = generate_chord_templates(nonchord=nonchord)
    X_norm = libfmp.c3.normalize_feature_sequence(X, norm='2')
    chord_templates_norm = libfmp.c3.normalize_feature_sequence(chord_templates, norm='2')
    chord_sim = np.matmul(chord_templates_norm.T, X_norm)
    if norm_sim is not None:
        chord_sim = libfmp.c3.normalize_feature_sequence(chord_sim, norm=norm_sim)
    # chord_max = (chord_sim == chord_sim.max(axis=0)).astype(int)
    chord_max_index = np.argmax(chord_sim, axis=0)
    chord_max = np.zeros(chord_sim.shape).astype(np.int32)
    for n in range(chord_sim.shape[1]):
        chord_max[chord_max_index[n], n] = 1

    return chord_sim, chord_max

def plot_chord_recognition(y, sr):
    import warnings
    warnings.warn("This function is deprecated and will be removed in future versions.", DeprecationWarning)

    X, Fs_X, x, Fs, x_dur = compute_chromagram(y, sr)

    chord_sim, chord_max = chord_recognition_template(X, norm_sim='max')
    chord_labels = get_chord_labels(nonchord=False)

    cmap = libfmp.b.compressed_gray_cmap(alpha=1, reverse=False)
    fig, ax = plt.subplots(2, 2, gridspec_kw={'width_ratios': [1, 0.03],
                                              'height_ratios': [1.5, 3]}, figsize=(8, 10))

    libfmp.b.plot_chromagram(X, ax=[ax[0, 0], ax[0, 1]], Fs=Fs_X, clim=[0, 1], xlabel='',
                             title='STFT-based chromagram (feature rate = %0.1f Hz)' % (Fs_X))
    libfmp.b.plot_matrix(chord_max, ax=[ax[1, 0], ax[1, 1]], Fs=Fs_X,
                         title='Time–chord representation of chord recognition result',
                         ylabel='Chord', xlabel='')
    ax[1, 0].set_yticks(np.arange(len(chord_labels)))
    ax[1, 0].set_yticklabels(chord_labels)
    ax[1, 0].grid()
    plt.tight_layout()
    return fig, ax, chord_max

def plot_binary_template_chord_recognition(y, sr):
    import warnings
    warnings.warn("This function is deprecated and will be removed in future versions.", DeprecationWarning)

    X, Fs_X, x, Fs, x_dur = compute_chromagram(y, sr)
    chord_sim, chord_max = chord_recognition_template(X, norm_sim='max')

    chord_templates = generate_chord_templates()
    X_chord = np.matmul(chord_templates, chord_max)

    fig, ax = plt.subplots(2, 2, gridspec_kw={'width_ratios': [1, 0.03],
                                              'height_ratios': [1, 1]}, figsize=(8, 5))

    libfmp.b.plot_chromagram(X, ax=[ax[0, 0], ax[0, 1]], Fs=Fs_X, clim=[0, 1], xlabel='',
                             title='STFT-based chromagram (feature rate = %0.1f Hz)' % (Fs_X))
    libfmp.b.plot_chromagram(X_chord, ax=[ax[1, 0], ax[1, 1]], Fs=Fs_X, clim=[0, 1], xlabel='',
                             title='Binary templates of the chord recognition result')
    plt.tight_layout()
    return fig, ax


def chord_table(chord_max):

    chord_labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'] + ['Cm', 'C#m', 'Dm', 'D#m', 'Em', 'Fm', 'F#m', 'Gm', 'G#m', 'Am', 'A#m', 'Bm']

    # Index of the maximum of chord_max along the first axis
|
228 |
+
chord_max_index = np.argmax(chord_max, axis=0)
|
229 |
+
# 用index找出對應的chord_labels
|
230 |
+
chord_results = [chord_labels[i] for i in chord_max_index]
|
231 |
+
|
232 |
+
return chord_results
|
233 |
+
|
234 |
+
|
235 |
+
def plot_chord(chroma, title="", figsize=(12, 6), cmap="coolwarm", include_minor=False):
|
236 |
+
import seaborn as sns
|
237 |
+
chroma_labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
|
238 |
+
if include_minor:
|
239 |
+
chroma_labels += ['Cm', 'C#m', 'Dm', 'D#m', 'Em', 'Fm', 'F#m', 'Gm', 'G#m', 'Am', 'A#m', 'Bm']
|
240 |
+
|
241 |
+
fig, ax = plt.subplots(figsize=figsize)
|
242 |
+
|
243 |
+
sns.heatmap(chroma, ax=ax, cmap=cmap, linewidths=0.01, linecolor=(1, 1, 1, 0.1))
|
244 |
+
ax.invert_yaxis()
|
245 |
+
ax.set_yticks(
|
246 |
+
np.arange(len(chroma_labels)) + 0.5,
|
247 |
+
chroma_labels,
|
248 |
+
rotation=0,
|
249 |
+
)
|
250 |
+
ax.set_ylabel("Chord")
|
251 |
+
ax.set_xlabel('Time (frame)')
|
252 |
+
ax.set_title(title)
|
253 |
+
|
254 |
+
return fig, ax
|
255 |
+
|
256 |
+
def plot_user_chord(df):
|
257 |
+
|
258 |
+
import seaborn as sns
|
259 |
+
chroma_labels = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'] + ['Cm', 'C#m', 'Dm', 'D#m', 'Em', 'Fm', 'F#m', 'Gm', 'G#m', 'Am', 'A#m', 'Bm']
|
260 |
+
|
261 |
+
# 檢查df["Chord"]無chroma_labels以外的值
|
262 |
+
assert df["Chord"].isin(chroma_labels).all(), "Chord must be in chroma_labels"
|
263 |
+
|
264 |
+
# 將df["Chord"]轉成chroma_labels的index
|
265 |
+
df["Chord_index"] = df["Chord"].apply(lambda x: chroma_labels.index(x))
|
266 |
+
|
267 |
+
# 建立一個24 * len(df)的矩陣,並將值設為0
|
268 |
+
chroma = np.zeros((24, len(df)))
|
269 |
+
# 依照df["Chord_index"]的值將chroma的值設為1
|
270 |
+
chroma[df["Chord_index"], np.arange(len(df)),] = 1
|
271 |
+
|
272 |
+
# 繪圖
|
273 |
+
fig, ax = plt.subplots(figsize=(12, 6))
|
274 |
+
sns.heatmap(chroma, ax=ax, cmap='crest', linewidths=0.01, linecolor=(1, 1, 1, 0.1))
|
275 |
+
ax.invert_yaxis()
|
276 |
+
ax.set_yticks(
|
277 |
+
np.arange(len(chroma_labels)) + 0.5,
|
278 |
+
chroma_labels,
|
279 |
+
rotation=0,
|
280 |
+
)
|
281 |
+
ax.set_ylabel("Chord")
|
282 |
+
ax.set_xlabel('Time (frame)')
|
283 |
+
ax.set_title('User Chord Recognition Result')
|
284 |
+
|
285 |
+
return fig, ax
|
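The template-matching step used by `chord_recognition_template` can be sketched without the libfmp helpers. This is a minimal illustration, not the project's API: the function names and the 24 binary major/minor triad templates below are assumptions for the sketch.

```python
import numpy as np

LABELS = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'] + \
         ['Cm', 'C#m', 'Dm', 'D#m', 'Em', 'Fm', 'F#m', 'Gm', 'G#m', 'Am', 'A#m', 'Bm']

def chord_templates():
    """24 binary triad templates (12 major, 12 minor), shape (12, 24)."""
    T = np.zeros((12, 24))
    for root in range(12):
        T[[root, (root + 4) % 12, (root + 7) % 12], root] = 1        # major triad
        T[[root, (root + 3) % 12, (root + 7) % 12], 12 + root] = 1   # minor triad
    return T

def recognize(chroma):
    """Return one chord label per frame via normalized template matching."""
    T = chord_templates()
    T = T / np.linalg.norm(T, axis=0, keepdims=True)
    X = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-12)
    sim = T.T @ X                       # (24, n_frames) similarity matrix
    return [LABELS[i] for i in np.argmax(sim, axis=0)]

# One frame containing the pitch classes C, E, G is recognized as 'C'
frame = np.zeros((12, 1))
frame[[0, 4, 7], 0] = 1
print(recognize(frame))  # → ['C']
```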
src/pitch_estimation.py
ADDED
@@ -0,0 +1,181 @@
import librosa
from librosa import display
from librosa import feature

import numpy as np
from matplotlib import pyplot as plt
import scipy

from numpy import typing as npt
import typing


def plot_mel_spectrogram(
    y: npt.ArrayLike,
    sr: int,
    shift_array: npt.ArrayLike,
    with_pitch: bool = True,
):
    S = librosa.feature.melspectrogram(y=y, sr=sr)
    S_dB = librosa.power_to_db(S, ref=np.max)

    if with_pitch:
        f0, voiced_flag, voiced_probs = librosa.pyin(y,
                                                     fmin=librosa.note_to_hz('C2'),
                                                     fmax=librosa.note_to_hz('C7'))
        times = librosa.times_like(f0, sr=sr)

        fig, ax = plt.subplots(figsize=(12, 6))
        img = librosa.display.specshow(S_dB, x_axis='time',
                                       y_axis='mel', sr=sr,
                                       fmax=8000, ax=ax)
        ax.plot(times, f0, label='f0', color='cyan', linewidth=3)
        ax.set_xticks(shift_array - shift_array[0],
                      shift_array)
        fig.colorbar(img, ax=ax, format='%+2.0f dB')
        ax.legend(loc='upper right')
        ax.set(title='Mel-frequency spectrogram')

    else:
        fig, ax = plt.subplots(figsize=(12, 6))
        img = librosa.display.specshow(S_dB, x_axis='time',
                                       y_axis='mel', sr=sr,
                                       fmax=8000, ax=ax)
        ax.set_xticks(shift_array - shift_array[0],
                      shift_array)
        fig.colorbar(img, ax=ax, format='%+2.0f dB')
        ax.set(title='Mel-frequency spectrogram')
        ax.set_xlabel('Time (s)')

    return fig, ax


def plot_constant_q_transform(y: npt.ArrayLike, sr: int,
                              shift_array: npt.ArrayLike):
    C = np.abs(librosa.cqt(y, sr=sr))
    fig, ax = plt.subplots(figsize=(12, 6))
    img = librosa.display.specshow(librosa.amplitude_to_db(C, ref=np.max),
                                   sr=sr, x_axis='time', y_axis='cqt_note', ax=ax)
    ax.set_xticks(shift_array - shift_array[0],
                  shift_array)
    ax.set_title('Constant-Q power spectrum')
    ax.set_xlabel('Time (s)')
    fig.colorbar(img, ax=ax, format="%+2.0f dB")

    return fig, ax


def pitch_class_type_one_vis(y: npt.ArrayLike, sr: int):
    S = np.abs(librosa.stft(y))
    chroma = librosa.feature.chroma_stft(S=S, sr=sr)

    count_pitch = np.empty(np.shape(chroma))  # To count pitch occurrences
    notes = np.array(librosa.key_to_notes('C:maj'))

    # Threshold the chroma values to decide which pitches are present
    count_pitch[chroma < 0.5] = 0
    count_pitch[chroma > 0.5] = 1

    # Compute the occurrence probability of each pitch class
    occurProbs = np.empty(np.shape(count_pitch)[0])
    total = np.sum(count_pitch)
    for i in range(np.shape(count_pitch)[0]):
        occurProbs[i] = np.sum(count_pitch[i]) / total

    result = np.vstack((notes, np.round(occurProbs, 4))).T

    ticks = range(12)
    fig, ax = plt.subplots()
    plt.title("Pitch Class")
    plt.bar(ticks, occurProbs * 100, align='center')
    plt.xticks(ticks, notes)
    plt.xlabel("Note")
    plt.ylabel("Number of occurrences %")

    return fig, ax, result


def pitch_class_histogram_chroma(y: npt.ArrayLike, sr: int, higher_resolution: bool, save_to_csv: bool = False):
    S = np.abs(librosa.stft(y))
    notes = np.array(librosa.key_to_notes('C:maj'))  # For the x-axis legend

    if not higher_resolution:
        chroma = librosa.feature.chroma_stft(S=S, sr=sr)
        valid_pitch = np.empty(np.shape(chroma))  # To count pitch occurrences
        valid_pitch[chroma < 0.7] = 0
        valid_pitch[chroma >= 0.7] = 1
        total = np.sum(valid_pitch)

        # Compute the occurrence probability of each pitch class
        # NOTE: shape (12,) means a pure 1-D array
        occurProbs = np.empty((12,))
        for i in range(0, 12):
            occurProbs[i] = np.sum(valid_pitch[i]) / total

        ticks = range(12)
        colors = ['lightcoral', 'goldenrod', 'lightseagreen', 'indigo', 'lightcoral',
                  'goldenrod', 'lightseagreen', 'indigo', 'lightcoral', 'goldenrod',
                  'lightseagreen', 'indigo']
        xLegend = notes

        fig, ax = plt.subplots()
        ax.bar(ticks, occurProbs * 100, align='center', color=colors)
        ax.set_xticks(ticks)
        ax.set_xticklabels(xLegend)
        ax.set_title("Pitch Class Histogram")
        ax.set_xlabel("Note")
        ax.set_ylabel("Occurrences %")

    if higher_resolution:
        chroma = librosa.feature.chroma_stft(S=S, sr=sr, n_chroma=120)
        valid_pitch = np.empty(np.shape(chroma))  # To count pitch occurrences
        valid_pitch[chroma < 0.7] = 0
        valid_pitch[chroma >= 0.7] = 1
        total = np.sum(valid_pitch)

        occurProbs = np.empty((120,))
        for i in range(0, 120):
            occurProbs[i] = np.sum(valid_pitch[i]) / total

        ticks = range(120)
        # Label only every 10th bin with its note name
        xLegend = list()
        for i in range(120):
            if i % 10 == 0:
                xLegend.append(notes[i // 10])
            else:
                xLegend.append('')

        # Cycle through four colors in groups of 10 bins
        colors = list()
        for i in range(120):
            if i % 40 < 10:
                colors.append('lightcoral')
            elif i % 40 < 20:
                colors.append('goldenrod')
            elif i % 40 < 30:
                colors.append('lightseagreen')
            else:
                colors.append('indigo')

        fig, ax = plt.subplots()
        ax.bar(ticks, occurProbs * 100, align='center', color=colors)
        ax.set_xticks(ticks)
        ax.set_xticklabels(xLegend)
        ax.set_title("Pitch Class Histogram")
        ax.set_xlabel("Note")
        ax.set_ylabel("Occurrence %")

    result = np.vstack((xLegend, np.round(occurProbs, 4))).T
    if save_to_csv:
        with open('pitch_class.csv', 'w') as out:
            for row in result:
                print(*row, sep=',', file=out)

    return fig, ax, result
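The histogram step in `pitch_class_histogram_chroma` (threshold the chromagram at 0.7, then normalize the per-class counts) can be isolated as a small pure-NumPy sketch. The chroma matrix here is synthetic; in the real code it comes from `librosa.feature.chroma_stft`.

```python
import numpy as np

def pitch_class_histogram(chroma, threshold=0.7):
    """Occurrence probability of each pitch class after binarizing the chromagram."""
    valid = (chroma >= threshold).astype(float)   # 1 where the pitch class is "active"
    total = valid.sum()
    return valid.sum(axis=1) / total              # one probability per row (pitch class)

rng = np.random.default_rng(0)
chroma = rng.random((12, 100))                    # synthetic 12-bin chromagram, 100 frames
probs = pitch_class_histogram(chroma)
print(probs.sum())  # probabilities over the 12 classes sum to 1
```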
src/st_helper.py
ADDED
@@ -0,0 +1,33 @@
import streamlit as st

@st.experimental_memo
def convert_df(df):
    """
    Convert a pandas dataframe into a csv file.
    For the download button in streamlit.
    """
    return df.to_csv(index=False).encode('utf-8')

def show_readme(filename):
    with st.expander("頁面說明(Page Description)"):
        with open(filename, "r", encoding="utf-8") as f:
            st.markdown(f.read())


def get_shift(start_time, end_time):
    """
    Return the time ticks from start_time to end_time:
    the first tick is start_time, the last is end_time,
    with one tick per second in between.

    return: (start_time, a np.array of time stamps)
    """
    import numpy as np

    shift_array = np.arange(start_time, end_time, 1)
    if shift_array[-1] != end_time:
        shift_array = np.append(shift_array, end_time)

    shift_array = np.round(shift_array, 1)
    return start_time, shift_array
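The tick-building logic in `get_shift` can be exercised standalone, without Streamlit. A minimal sketch (the name `make_ticks` is illustrative): build 1-second ticks from `start_time` to `end_time` and make sure `end_time` itself is the last tick.

```python
import numpy as np

def make_ticks(start_time, end_time):
    """1-second ticks from start_time to end_time, end_time included."""
    ticks = np.arange(start_time, end_time, 1)
    if ticks[-1] != end_time:
        ticks = np.append(ticks, end_time)   # append the exact end time
    return np.round(ticks, 1)

print(make_ticks(2.0, 6.5))  # last tick is exactly 6.5
```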
src/structure_analysis.py
ADDED
@@ -0,0 +1,248 @@
import numpy as np
import os, sys, librosa
from scipy import signal
from matplotlib import pyplot as plt
import matplotlib
import matplotlib.gridspec as gridspec
import IPython.display as ipd
import pandas as pd
from numba import jit

import libfmp.b
import libfmp.c2
import libfmp.c3
import libfmp.c4
import libfmp.c6
import libfmp
from libfmp.b import FloatingBox

from numpy import typing as npt
import typing

@jit(nopython=True)
def compute_sm_dot(X, Y):
    """Computes a similarity matrix from feature sequences using the dot (inner) product

    Notebook: C4/C4S2_SSM.ipynb

    Args:
        X (np.ndarray): First sequence
        Y (np.ndarray): Second sequence

    Returns:
        S (np.ndarray): Similarity matrix
    """
    S = np.dot(np.transpose(X), Y)
    return S

def plot_feature_ssm(X, Fs_X, S, Fs_S, ann, duration, color_ann=None,
                     title='', label='Time (seconds)', time=True,
                     figsize=(5, 6), fontsize=10, clim_X=None, clim=None):
    """Plot SSM along with feature representation and annotations (standard setting is time in seconds)

    Notebook: C4/C4S2_SSM.ipynb

    Args:
        X: Feature representation
        Fs_X: Feature rate of ``X``
        S: Similarity matrix (SM)
        Fs_S: Feature rate of ``S``
        ann: Annotations
        duration: Duration
        color_ann: Color annotations (see :func:`libfmp.b.b_plot.plot_segments`) (Default value = None)
        title: Figure title (Default value = '')
        label: Label for time axes (Default value = 'Time (seconds)')
        time: Display time axis ticks or not (Default value = True)
        figsize: Figure size (Default value = (5, 6))
        fontsize: Font size (Default value = 10)
        clim_X: Color limits for matrix X (Default value = None)
        clim: Color limits for matrix ``S`` (Default value = None)

    Returns:
        fig: Handle for figure
        ax: Handle for axes
    """
    cmap = libfmp.b.compressed_gray_cmap(alpha=-10)
    fig, ax = plt.subplots(3, 3, gridspec_kw={'width_ratios': [0.1, 1, 0.05],
                                              'wspace': 0.2,
                                              'height_ratios': [0.3, 1, 0.1]},
                           figsize=figsize)
    libfmp.b.plot_matrix(X, Fs=Fs_X, ax=[ax[0, 1], ax[0, 2]], clim=clim_X,
                         xlabel='', ylabel='', title=title)
    ax[0, 0].axis('off')
    libfmp.b.plot_matrix(S, Fs=Fs_S, ax=[ax[1, 1], ax[1, 2]], cmap=cmap, clim=clim,
                         title='', xlabel='', ylabel='', colorbar=True)
    ax[1, 1].set_xticks([])
    ax[1, 1].set_yticks([])
    libfmp.b.plot_segments(ann, ax=ax[2, 1], time_axis=time, fontsize=fontsize,
                           colors=color_ann,
                           time_label=label, time_max=duration*Fs_X)
    ax[2, 2].axis('off')
    ax[2, 0].axis('off')
    libfmp.b.plot_segments(ann, ax=ax[1, 0], time_axis=time, fontsize=fontsize,
                           direction='vertical', colors=color_ann,
                           time_label=label, time_max=duration*Fs_X)
    return fig, ax

def SSM_chorma(wav_fn: str, anno_fn: str, hop_size: int = 4096, Nfft: int = 1024):
    x, fs = librosa.load(wav_fn)
    duration = (x.shape[0]) / fs

    chromagram = librosa.feature.chroma_stft(y=x, sr=fs, tuning=0, norm=2, hop_length=hop_size, n_fft=Nfft)
    X, Fs_X = libfmp.c3.smooth_downsample_feature_sequence(chromagram, fs/hop_size, filt_len=41, down_sampling=10)

    # According to the documentation
    ann, color_ann = libfmp.c4.read_structure_annotation(os.path.join(anno_fn), fn_ann_color=anno_fn)
    ann_frames = libfmp.c4.convert_structure_annotation(ann, Fs=Fs_X)

    X = libfmp.c3.normalize_feature_sequence(X, norm='2', threshold=0.001)
    S = compute_sm_dot(X, X)
    fig, ax = plot_feature_ssm(X, 1, S, 1, ann_frames, duration*Fs_X, color_ann=color_ann,
                               clim_X=[0, 1], clim=[0, 1], label='Time (frames)',
                               title='Chroma feature (Fs=%0.2f)' % Fs_X)
    return fig, ax

def plot_self_similarity(y_ref: npt.ArrayLike, sr: int, affinity: bool = False, hop_length: int = 1024):
    '''
    Visualize the self-similarity matrix of the signal

    y_ref: reference signal
    sr: sampling rate
    affinity: whether to use an affinity (instead of binary) recurrence matrix
    hop_length: hop size
    '''

    # Pre-processing stage
    chroma = librosa.feature.chroma_cqt(y=y_ref, sr=sr, hop_length=hop_length)
    chroma_stack = librosa.feature.stack_memory(chroma, n_steps=10, delay=3)

    fig, ax = plt.subplots()

    if not affinity:
        R = librosa.segment.recurrence_matrix(chroma_stack, k=5)
        imgsim = librosa.display.specshow(R, x_axis='s', y_axis='s',
                                          hop_length=hop_length)
        plt.title('Binary recurrence (symmetric)')
        plt.colorbar()

    else:
        R_aff = librosa.segment.recurrence_matrix(chroma_stack, metric='cosine', mode='affinity')
        imgaff = librosa.display.specshow(R_aff, x_axis='s', y_axis='s',
                                          cmap='magma_r', hop_length=hop_length)
        plt.title('Affinity recurrence')
        plt.colorbar()

    return fig, ax

@jit(nopython=True)
def compute_kernel_checkerboard_gaussian(L: int = 10, var: float = 0.5, normalize=True) -> npt.ArrayLike:
    """Compute Gaussian-like checkerboard kernel [FMP, Section 4.4.1].
    See also: https://scipython.com/blog/visualizing-the-bivariate-gaussian-distribution/

    Notebook: C4/C4S4_NoveltySegmentation.ipynb

    Args:
        L (int): Parameter specifying the kernel size M=2*L+1
        var (float): Variance parameter determining the tapering (epsilon) (Default value = 0.5)
        normalize (bool): Normalize kernel (Default value = True)

    Returns:
        kernel (np.ndarray): Kernel matrix of size M x M
    """
    taper = np.sqrt(1/2) / (L * var)
    axis = np.arange(-L, L+1)
    gaussian1D = np.exp(-taper**2 * (axis**2))
    gaussian2D = np.outer(gaussian1D, gaussian1D)
    kernel_box = np.outer(np.sign(axis), np.sign(axis))
    kernel = kernel_box * gaussian2D
    if normalize:
        kernel = kernel / np.sum(np.abs(kernel))
    return kernel


def compute_novelty_ssm(S, kernel: npt.ArrayLike = None, L: int = 10, var: float = 0.5, exclude: bool = False) -> npt.ArrayLike:
    """Compute novelty function from SSM [FMP, Section 4.4.1]

    Notebook: C4/C4S4_NoveltySegmentation.ipynb

    Args:
        S (np.ndarray): SSM
        kernel (np.ndarray): Checkerboard kernel (if kernel==None, it will be computed) (Default value = None)
        L (int): Parameter specifying the kernel size M=2*L+1 (Default value = 10)
        var (float): Variance parameter determining the tapering (epsilon) (Default value = 0.5)
        exclude (bool): Sets the first L and last L values of novelty function to zero (Default value = False)

    Returns:
        nov (np.ndarray): Novelty function
    """
    if kernel is None:
        kernel = compute_kernel_checkerboard_gaussian(L=L, var=var)
    N = S.shape[0]
    M = 2*L + 1
    nov = np.zeros(N)
    # np.pad does not work with numba/jit
    S_padded = np.pad(S, L, mode='constant')

    for n in range(N):
        # Does not work with numba/jit
        nov[n] = np.sum(S_padded[n:n+M, n:n+M] * kernel)
    if exclude:
        right = np.min([L, N])
        left = np.max([0, N-L])
        nov[0:right] = 0
        nov[left:N] = 0

    return nov

def SSM_Novelty(wav_filename: str, anno_csv_filename: str):
    float_box = libfmp.b.FloatingBox()

    fn_wav = os.path.join(wav_filename)
    ann, color_ann = libfmp.c4.read_structure_annotation(os.path.join(anno_csv_filename),
                                                         fn_ann_color=anno_csv_filename)

    S_dict = {}
    Fs_dict = {}
    x, x_duration, X, Fs_X, S, I = libfmp.c4.compute_sm_from_filename(fn_wav,
                                                                      L=11, H=5, L_smooth=1, thresh=1)

    S_dict[0], Fs_dict[0] = S, Fs_X
    ann_frames = libfmp.c4.convert_structure_annotation(ann, Fs=Fs_X)
    fig, ax = libfmp.c4.plot_feature_ssm(X, 1, S, 1, ann_frames, x_duration*Fs_X,
                                         label='Time (frames)', color_ann=color_ann, clim_X=[0, 1], clim=[0, 1],
                                         title='Feature rate: %0.0f Hz' % (Fs_X), figsize=(4.5, 5.5))
    float_box.add_fig(fig)

    x, x_duration, X, Fs_X, S, I = libfmp.c4.compute_sm_from_filename(fn_wav,
                                                                      L=41, H=10, L_smooth=1, thresh=1)
    S_dict[1], Fs_dict[1] = S, Fs_X
    ann_frames = libfmp.c4.convert_structure_annotation(ann, Fs=Fs_X)
    fig, ax = libfmp.c4.plot_feature_ssm(X, 1, S, 1, ann_frames, x_duration*Fs_X,
                                         label='Time (frames)', color_ann=color_ann, clim_X=[0, 1], clim=[0, 1],
                                         title='Feature rate: %0.0f Hz' % (Fs_X), figsize=(4.5, 5.5))
    float_box.add_fig(fig)
    float_box.show()

    figsize = (10, 6)
    L_kernel_set = [5, 10, 20, 40]
    num_kernel = len(L_kernel_set)
    num_SSM = len(S_dict)

    fig, ax = plt.subplots(num_kernel, num_SSM, figsize=figsize)
    for s in range(num_SSM):
        for t in range(num_kernel):
            L_kernel = L_kernel_set[t]
            S = S_dict[s]
            nov = compute_novelty_ssm(S, L=L_kernel, exclude=True)
            fig_nov, ax_nov, line_nov = libfmp.b.plot_signal(nov, Fs=Fs_dict[s],
                                                             color='k', ax=ax[t, s], figsize=figsize,
                                                             title=r'Feature rate = %0.0f Hz, $L_\mathrm{kernel}$ = %d' % (Fs_dict[s], L_kernel))
            libfmp.b.plot_segments_overlay(ann, ax=ax_nov, colors=color_ann, alpha=0.1,
                                           edgecolor='k', print_labels=False)
    plt.tight_layout()
    plt.show()
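The novelty computation in `compute_novelty_ssm` can be demonstrated without libfmp or numba. A pure-NumPy sketch on a toy block-diagonal SSM (the names `checkerboard_kernel` and `novelty` are illustrative, not the module's API): sliding the Gaussian checkerboard kernel along the main diagonal should make the novelty curve peak at the segment boundary.

```python
import numpy as np

def checkerboard_kernel(L=10, var=0.5):
    """Gaussian-tapered checkerboard kernel of size (2L+1, 2L+1), L1-normalized."""
    taper = np.sqrt(1 / 2) / (L * var)
    axis = np.arange(-L, L + 1)
    g = np.exp(-taper**2 * axis**2)
    kernel = np.outer(np.sign(axis), np.sign(axis)) * np.outer(g, g)
    return kernel / np.sum(np.abs(kernel))

def novelty(S, L=10):
    """Correlate the kernel along the main diagonal of the (zero-padded) SSM."""
    kernel = checkerboard_kernel(L=L)
    N = S.shape[0]
    S_pad = np.pad(S, L, mode='constant')
    return np.array([np.sum(S_pad[n:n + 2 * L + 1, n:n + 2 * L + 1] * kernel)
                     for n in range(N)])

# Two homogeneous 50-frame segments: the novelty curve peaks at the boundary.
S = np.zeros((100, 100))
S[:50, :50] = 1.0
S[50:, 50:] = 1.0
nov = novelty(S, L=10)
print(np.argmax(nov))  # near frame 50, the segment boundary
```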
src/timbre_analysis.py
ADDED
@@ -0,0 +1,125 @@
import librosa
from librosa import display
from librosa import feature

import numpy as np
from numpy import typing as npt

from matplotlib import pyplot as plt
import scipy


def spectral_centroid_analysis(y: npt.ArrayLike, sr: int, shift_array: npt.ArrayLike):
    S, phase = librosa.magphase(librosa.stft(y=y))
    cent = librosa.feature.spectral_centroid(S=S)
    times = librosa.times_like(cent, sr=sr)

    fig, ax = plt.subplots()
    librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
                             y_axis='log', x_axis='time', ax=ax, sr=sr)
    ax.plot(times, cent.T, label='Spectral centroid', color='w')
    ax.legend(loc='upper right')
    ax.set(title='log Power spectrogram')
    ax.set_xticks(shift_array - shift_array[0],
                  shift_array)
    ax.autoscale()

    result = np.vstack((times, cent))

    return fig, ax, result


def rolloff_frequency_analysis(y: npt.ArrayLike, sr: int, roll_percent: float = 0.99,
                               shift_array: npt.ArrayLike = None):
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=roll_percent)
    rolloff_min = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.01)
    times = librosa.times_like(rolloff, sr=sr)
    S, phase = librosa.magphase(librosa.stft(y))

    fig, ax = plt.subplots()
    librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
                             y_axis='log', x_axis='time', ax=ax, sr=sr)
    ax.plot(times, rolloff[0], label=f'Roll-off frequency ({roll_percent})')
    ax.plot(times, rolloff_min[0], color='w',
            label='Roll-off frequency (0.01)')
    ax.legend(loc='lower right')
    ax.set(title='log Power spectrogram')
    ax.set_xticks(shift_array - shift_array[0],
                  shift_array)
    ax.autoscale()

    result = np.vstack((times, rolloff, rolloff_min))

    return fig, ax, result

def spectral_bandwidth_analysis(y: npt.ArrayLike, sr: int, shift_array: npt.ArrayLike = None):
    S, phase = librosa.magphase(librosa.stft(y=y))
    spec_bw = librosa.feature.spectral_bandwidth(S=S)
    times = librosa.times_like(spec_bw, sr=sr)

    fig, ax = plt.subplots(nrows=2, sharex=True)
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)
    ax[0].semilogy(times, spec_bw[0], label='Spectral bandwidth')
    ax[0].set(ylabel='Hz', xticks=[], xlim=[times.min(), times.max()])
    ax[0].legend()
    ax[0].label_outer()
    librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max),
                             y_axis='log', x_axis='time', ax=ax[1], sr=sr)
    ax[1].set(title='log Power spectrogram')
    ax[1].fill_between(times, np.maximum(0, centroid[0] - spec_bw[0]),
                       np.minimum(centroid[0] + spec_bw[0], sr/2),
                       alpha=0.5, label='Centroid +- bandwidth')
    ax[1].plot(times, centroid[0], label='Spectral centroid', color='w')
    ax[1].legend(loc='lower right')
    ax[1].set_xticks(shift_array - shift_array[0],
                     shift_array)
    ax[1].autoscale()

    result = np.vstack((times, spec_bw))

    return fig, ax, result


def harmonic_percussive_source_separation(y: npt.ArrayLike, sr: int,
                                          shift_array: npt.ArrayLike = None):
    D = librosa.stft(y)
    H, P = librosa.decompose.hpss(D)
    t = librosa.frames_to_time(np.arange(D.shape[1]), sr=sr)

    fig, ax = plt.subplots(nrows=3, sharex=False, sharey=False, figsize=(12, 8))
    # Set the horizontal and vertical spacing between subplots
    plt.subplots_adjust(hspace=0.6, wspace=0.3)
    img = librosa.display.specshow(librosa.amplitude_to_db(np.abs(D), ref=np.max),
                                   y_axis='log', x_axis='time', ax=ax[0], sr=sr)
    ax[0].set(title='Full power spectrogram')
    ax[0].set_xlabel('')  # Hide the x-axis label
    ax[0].set_xticks(shift_array - shift_array[0],
                     shift_array)
    ax[0].autoscale()

    librosa.display.specshow(librosa.amplitude_to_db(np.abs(H), ref=np.max(np.abs(D))),
                             y_axis='log', x_axis='time', ax=ax[1], sr=sr)
    ax[1].set(title='Harmonic power spectrogram')
    ax[1].set_xlabel('')  # Hide the x-axis label
    ax[1].set_xticks(shift_array - shift_array[0],
                     shift_array)
    ax[1].autoscale()

    librosa.display.specshow(librosa.amplitude_to_db(np.abs(P), ref=np.max(np.abs(D))),
                             y_axis='log', x_axis='time', ax=ax[2], sr=sr)
    ax[2].set(title='Percussive power spectrogram')
    ax[2].set_xticks(shift_array - shift_array[0],
                     shift_array)
    ax[2].autoscale()

    fig.colorbar(img, ax=ax, format='%+2.0f dB')

    return fig, ax, (D, H, P, t)