This article shows how to use Python's librosa module on Ubuntu Linux to analyze audio signals and music files.
librosa is a Python module dedicated to audio signal analysis. Below is a tutorial on installing and using librosa on Ubuntu Linux, with sample code for a range of analysis workflows.
The Python librosa module can be installed with pip:
```shell
# Install the librosa module
pip install librosa
```
If you work in a Conda or Anaconda environment, you can install librosa with the conda command instead:
```shell
# Install librosa with conda (from the conda-forge channel)
conda install -c conda-forge librosa
```
ffmpeg
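librosa.load reads files through soundfile first and falls back to audioread for formats soundfile cannot decode. To let audioread handle more audio formats, it is recommended to install ffmpeg as well. On Ubuntu Linux it can be installed with apt: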
```shell
# Install the ffmpeg package
sudo apt install ffmpeg
```
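librosa provides example audio files for development and testing. They are downloaded on first use and cached locally. The librosa.util.list_examples() function lists all the available examples: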
```python
import librosa

# List the available example audio files
librosa.util.list_examples()
```
```text
AVAILABLE EXAMPLES
--------------------------------------------------------------------
brahms      Brahms - Hungarian Dance #5
choice      Admiral Bob - Choice (drum+bass)
fishin      Karissa Hobbs - Let's Go Fishin'
nutcracker  Tchaikovsky - Dance of the Sugar Plum Fairy
trumpet     Mihai Sorohan - Trumpet loop
vibeace     Kevin MacLeod - Vibe Ace
```
librosa.example() downloads an example file (if it is not already cached) and returns its local path:
```python
# Download the nutcracker example (if not cached) and get its local path
filename = librosa.example('nutcracker')
```
```text
Downloading file 'Kevin_MacLeod_-_P_I_Tchaikovsky_Dance_of_the_Sugar_Plum_Fairy.ogg' from 'https://librosa.org/data/audio/Kevin_MacLeod_-_P_I_Tchaikovsky_Dance_of_the_Sugar_Plum_Fairy.ogg' to '/home/og/.cache/librosa'.
```
To read an audio file, use the librosa.load function:
```python
# Load the audio file
# y: waveform (audio time series)
# sr: sampling rate (Hz)
y, sr = librosa.load(filename)
```
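The returned y is the audio time series, stored as a one-dimensional NumPy floating-point array, and sr is its sampling rate. By default, librosa.load converts the audio to mono and resamples it to 22050 Hz. To change this behavior, adjust the arguments of librosa.load. For example, sr=None keeps the file's native sampling rate and mono=False keeps all channels.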
Once the waveform is loaded, librosa.display.waveshow() and matplotlib can plot it:
```python
import librosa.display
import matplotlib.pyplot as plt

# Plot the waveform
plt.figure()
librosa.display.waveshow(y, sr=sr)
plt.title('nutcracker waveform')
plt.show()
```
librosa.stft() computes the short-time Fourier transform (STFT), and librosa.display.specshow() plots the resulting spectrogram:
```python
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Magnitude of the short-time Fourier transform
S = np.abs(librosa.stft(y))

# Plot the spectrogram in dB relative to the loudest bin, on a log-frequency axis
fig, ax = plt.subplots()
img = librosa.display.specshow(
    librosa.amplitude_to_db(S, ref=np.max),
    sr=sr, y_axis='log', x_axis='time', ax=ax)
ax.set_title('Power spectrogram')
fig.colorbar(img, ax=ax, format="%+2.0f dB")
plt.show()
```
librosa.feature.melspectrogram() computes a mel spectrogram, which can also be plotted with librosa.display.specshow():
```python
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Compute the mel spectrogram (128 mel bands up to 8000 Hz)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)

# Plot the mel spectrogram in dB relative to the loudest cell
fig, ax = plt.subplots()
S_dB = librosa.power_to_db(S, ref=np.max)
img = librosa.display.specshow(S_dB, x_axis='time', y_axis='mel',
                               sr=sr, fmax=8000, ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set(title='Mel-frequency spectrogram')
plt.show()
```
librosa.feature.mfcc() computes the mel-frequency cepstral coefficients (MFCCs), which can again be plotted with librosa.display.specshow():
```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Compute the MFCCs (20 coefficients per frame by default)
mfccs = librosa.feature.mfcc(y=y, sr=sr)

# Plot the MFCCs over time
fig, ax = plt.subplots()
img = librosa.display.specshow(mfccs, x_axis='time', sr=sr, ax=ax)
fig.colorbar(img, ax=ax)
ax.set(title='MFCC')
plt.show()
```
When analyzing audio in practice, it is common to plot the waveform and the mel spectrogram together for comparison:
```python
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Two stacked panels sharing the same time axis
fig, ax = plt.subplots(2, 1, sharex=True)

# Top: waveform
librosa.display.waveshow(y, sr=sr, ax=ax[0])
ax[0].set_title('nutcracker waveform')

# Bottom: mel spectrogram in dB
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)
S_dB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_dB, x_axis='time', y_axis='mel',
                         sr=sr, fmax=8000, ax=ax[1])
ax[1].set_title('Mel-frequency spectrogram')

plt.tight_layout()
plt.show()
```
librosa.beat.beat_track() detects the beats in a piece of music:
```python
# Detect the beats
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
```
The tempo returned here is the estimated tempo in beats per minute (BPM):
```python
# Estimated tempo (beats per minute)
print(tempo)
```

```text
107.666015625
```
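beat_frames, in turn, holds the frame indices at which beats occur. Consecutive frames are hop_length samples apart (512 by default), so a frame index is a position on the frame grid rather than a time in seconds: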
```python
# Frame indices of the detected beats
print(beat_frames)
```

```text
array([ 51, 74, 100, 124, 149, 173, 198, 221, 247, 271, 295, 319, 344, 368, 393, 416, 440, 462, 486, 508, 531, 554, 578, 601, 624, 647, 670, 693, 716, 739, 762, 786, 810, 832, 856, 878, 901, 924, 947, 970, 993, 1016, 1039, 1062, 1085, 1108, 1131, 1155, 1178, 1201, 1225, 1249, 1273, 1297, 1321, 1344, 1369, 1393, 1415, 1437, 1460, 1483, 1505, 1527, 1550, 1573, 1595, 1618, 1642, 1665, 1689, 1712, 1736, 1759, 1783, 1806, 1829, 1853, 1876, 1900, 1924, 1947, 1971, 1994, 2018, 2042, 2065, 2088, 2110, 2132, 2155, 2177, 2200, 2223, 2244, 2266, 2290, 2313, 2336, 2359, 2381, 2404, 2427, 2451, 2474, 2498, 2522, 2545, 2568, 2592, 2615, 2639, 2662, 2684, 2706, 2729, 2752, 2775, 2797, 2819, 2842, 2864, 2887, 2910, 2933, 2956, 2979, 3003, 3027, 3051, 3075, 3100, 3125, 3150, 3174, 3199, 3223, 3246, 3269, 3295, 3318, 3343, 3367, 3391, 3415, 3439, 3462, 3485, 3509, 3534, 3560, 3586, 3612, 3637, 3664, 3689, 3715, 3740, 3766, 3791, 3817, 3842, 3866, 3891, 3916, 3940, 3965, 3990, 4014, 4038, 4063, 4087, 4112, 4136, 4160, 4184, 4208, 4232, 4256, 4280, 4304, 4327, 4352, 4375, 4400, 4423, 4447, 4471, 4495, 4519, 4543, 4567, 4591, 4615, 4639, 4662, 4686, 4710, 4734, 4758, 4782, 4806, 4830, 4854, 4877, 4901, 4926, 4950, 4973, 4997, 5021, 5045])
```
librosa.frames_to_time() converts frame indices into times in seconds:
```python
# Convert beat frames to times in seconds
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

# Times of the beats
print(beat_times)
```

```text
[ 1.18421769 1.71827664 2.32199546 2.87927438 3.45977324 4.01705215 4.59755102 5.13160998 5.7353288 6.29260771 6.84988662 7.40716553 7.9876644 8.54494331 9.12544218 9.65950113 10.21678005 10.72761905 11.28489796 11.79573696 12.32979592 12.86385488 13.42113379 13.95519274 14.4892517 15.02331066 15.55736961 16.09142857 16.62548753 17.15954649 17.69360544 18.25088435 18.80816327 19.31900227 19.87628118 20.38712018 20.92117914 21.4552381 21.98929705 22.52335601 23.05741497 23.59147392 24.12553288 24.65959184 25.19365079 25.72770975 26.26176871 26.81904762 27.35310658 27.88716553 28.44444444 29.00172336 29.55900227 30.11628118 30.67356009 31.20761905 31.78811791 32.34539683 32.85623583 33.36707483 33.90113379 34.43519274 34.94603175 35.45687075 35.99092971 36.52498866 37.03582766 37.56988662 38.12716553 38.66122449 39.2185034 39.75256236 40.30984127 40.84390023 41.40117914 41.9352381 42.46929705 43.02657596 43.56063492 44.11791383 44.67519274 45.2092517 45.76653061 46.30058957 46.85786848 47.41514739 47.94920635 48.48326531 48.99410431 49.50494331 50.03900227 50.54984127 51.08390023 51.61795918 52.10557823 52.61641723 53.17369615 53.7077551 54.24181406 54.77587302 55.28671202 55.82077098 56.35482993 56.91210884 57.4461678 58.00344671 58.56072562 59.09478458 59.62884354 60.18612245 60.72018141 61.27746032 61.81151927 62.32235828 62.83319728 63.36725624 63.90131519 64.43537415 64.94621315 65.45705215 65.99111111 66.50195011 67.03600907 67.57006803 68.10412698 68.63818594 69.1722449 69.72952381 70.28680272 70.84408163 71.40136054 71.98185941 72.56235828 73.14285714 73.70013605 74.28063492 74.83791383 75.37197279 75.90603175 76.50975057 77.04380952 77.62430839 78.1815873 78.73886621 79.29614512 79.85342404 80.38748299 80.92154195 81.47882086 82.05931973 82.66303855 83.26675737 83.87047619 84.45097506 85.07791383 85.6584127 86.26213152 86.84263039 87.44634921 88.02684807 88.63056689 89.21106576 89.76834467 90.34884354 90.9293424 91.48662132 92.06712018 92.64761905
 93.20489796 93.76217687 94.34267574 94.89995465 95.48045351 96.03773243 96.59501134 97.15229025 97.70956916 98.26684807 98.82412698 99.3814059 99.93868481 100.47274376 101.05324263 101.58730159 102.16780045 102.70185941 103.25913832 103.81641723 104.37369615 104.93097506 105.48825397 106.04553288 106.60281179 107.1600907 107.71736961 108.25142857 108.80870748 109.36598639 109.92326531 110.48054422 111.03782313 111.59510204 112.15238095 112.70965986 113.24371882 113.80099773 114.3814966 114.93877551 115.47283447 116.03011338 116.58739229 117.1446712 ]
```
The librosa module offers many more audio analysis tools; see the documentation on the official librosa website for details.