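Analyzing Audio Signals and Music Files with librosa in Python: Tutorial and Examples

This article shows how to use Python's librosa module on Ubuntu Linux to analyze audio signals and music files.

librosa is a Python module built for audio signal analysis. The sections below cover installing and using librosa on Ubuntu Linux, with example code for several analysis workflows.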

Installing librosa

The librosa module can be installed with pip:

# Install the librosa module
pip install librosa

If you work in a Conda or Anaconda environment, librosa can be installed with the conda command instead:

# Install the librosa module with conda
conda install -c conda-forge librosa

Installing ffmpeg

To let audioread support more audio file formats, installing ffmpeg as well is recommended. On Ubuntu Linux it can be installed with apt:

# Install the ffmpeg package
sudo apt install ffmpeg

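Reading audio files

librosa provides a set of example audio files that are handy for development and testing. The librosa.util.list_examples() function lists information about all of them:

import librosa

# List the example audio files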
librosa.util.list_examples()

AVAILABLE EXAMPLES
--------------------------------------------------------------------
brahms    	Brahms - Hungarian Dance #5
choice    	Admiral Bob - Choice (drum+bass)
fishin    	Karissa Hobbs - Let's Go Fishin'
nutcracker	Tchaikovsky - Dance of the Sugar Plum Fairy
trumpet   	Mihai Sorohan - Trumpet loop
vibeace   	Kevin MacLeod - Vibe Ace

We can use librosa.example() to download an example file and get its local path:

# Download the nutcracker example and get the path to the audio file
filename = librosa.example('nutcracker')

Downloading file 'Kevin_MacLeod_-_P_I_Tchaikovsky_Dance_of_the_Sugar_Plum_Fairy.ogg' from 'https://librosa.org/data/audio/Kevin_MacLeod_-_P_I_Tchaikovsky_Dance_of_the_Sugar_Plum_Fairy.ogg' to '/home/og/.cache/librosa'.

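To read an audio file, use the librosa.load function:

# Load the audio file
# y: waveform data
# sr: sampling rate (Hz)
y, sr = librosa.load(filename)

The returned y is a time series stored as a one-dimensional NumPy floating-point array, and sr is the sampling rate. By default, audio loaded from a file is automatically converted to mono and resampled to 22050 Hz. To change these defaults, adjust the parameters of librosa.load, for example sr (pass sr=None to keep the file's native sampling rate) and mono.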
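To make these defaults concrete, the following sketch works out how many samples a clip has after loading and how long it lasts. The helper names are hypothetical, and the length formula (rounding up the scaled sample count) is an assumption about how the resampler sizes its output, so treat the numbers as approximate.

```python
import math

DEFAULT_SR = 22050  # default target sampling rate of librosa.load (Hz)


def resampled_length(n_samples, orig_sr, target_sr=DEFAULT_SR):
    """Approximate sample count after resampling from orig_sr to target_sr."""
    return math.ceil(n_samples * target_sr / orig_sr)


def duration_seconds(n_samples, sr):
    """Duration of a signal with n_samples samples at sr samples per second."""
    return n_samples / sr


# Hypothetical example: a 10-second recording stored at 44100 Hz
n_native = 10 * 44100
n_loaded = resampled_length(n_native, orig_sr=44100)
print(n_loaded, duration_seconds(n_loaded, DEFAULT_SR))
```

For a signal that has already been loaded, len(y) / sr gives the same duration in seconds.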

Plotting the waveform

Once the waveform data has been loaded, librosa.display.waveshow() can be used together with matplotlib to plot it:

import librosa.display
import matplotlib.pyplot as plt

# Plot the waveform
plt.figure()
librosa.display.waveshow(y, sr=sr)
plt.title('nutcracker waveform')
plt.show()

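Plotting the short-time Fourier transform

The librosa.stft() function computes the short-time Fourier transform (STFT), and the resulting spectrogram can be plotted with librosa.display.specshow():

import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Compute the magnitude of the short-time Fourier transform
S = np.abs(librosa.stft(y))

# Plot the STFT spectrogram on a dB scale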
fig, ax = plt.subplots()
img = librosa.display.specshow(
    librosa.amplitude_to_db(S, ref=np.max),
    y_axis='log', x_axis='time', ax=ax)
ax.set_title('Power spectrogram')
fig.colorbar(img, ax=ax, format="%+2.0f dB")
plt.show()
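It can help to know what shape the STFT matrix should have before plotting it. The sketch below assumes the librosa.stft() defaults as we understand them (n_fft=2048, a hop of n_fft // 4 = 512 samples, and centered frames); the helper names are hypothetical and the frame count formula only applies under those assumptions.

```python
def stft_shape(n_samples, n_fft=2048, hop_length=None):
    """Expected (frequency bins, frames) of a centered STFT."""
    if hop_length is None:
        hop_length = n_fft // 4  # assumed default: a quarter of the window
    n_bins = 1 + n_fft // 2      # non-negative frequencies only
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames


def bin_frequency(k, sr=22050, n_fft=2048):
    """Center frequency in Hz of STFT bin k."""
    return k * sr / n_fft


# One second of audio at the default 22050 Hz sampling rate
print(stft_shape(22050))
print(bin_frequency(1), bin_frequency(1024))
```

Comparing the result of stft_shape(len(y)) with S.shape is a quick sanity check that the parameters are what you expect.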

Plotting the mel spectrogram

The librosa.feature.melspectrogram() function computes a mel spectrogram, which can again be plotted with librosa.display.specshow():

import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Compute the mel spectrogram
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)

# Plot the mel spectrogram
fig, ax = plt.subplots()
S_dB = librosa.power_to_db(S, ref=np.max)
img = librosa.display.specshow(S_dB, x_axis='time',
                               y_axis='mel', sr=sr,
                               fmax=8000, ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set(title='Mel-frequency spectrogram')
plt.show()
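The mel axis in this plot is a warped frequency axis: roughly linear at low frequencies and logarithmic at high ones. As a rough illustration, the sketch below implements two common Hz-to-mel mappings. The Slaney-style piecewise formula is, to the best of our reading, what librosa uses by default (htk=False), while the HTK formula corresponds to htk=True; the function names and constants here are written out by hand as assumptions, not copied from the library.

```python
import math


def hz_to_mel_slaney(freq_hz):
    """Convert Hz to mels with a Slaney-style piecewise mapping."""
    f_sp = 200.0 / 3.0              # Hz per mel in the linear region
    min_log_hz = 1000.0             # where the logarithmic region starts
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0  # step size in the logarithmic region
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep


def hz_to_mel_htk(freq_hz):
    """Convert Hz to mels with the HTK formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Compare both mappings at a few frequencies up to fmax=8000
for f in (250, 500, 1000, 4000, 8000):
    print(f, round(hz_to_mel_slaney(f), 2), round(hz_to_mel_htk(f), 2))
```

In both mappings, each doubling of frequency above about 1 kHz adds a nearly constant number of mels, which is why the upper part of the plot looks compressed compared with a linear frequency axis.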

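Plotting mel-frequency cepstral coefficients

The librosa.feature.mfcc() function computes mel-frequency cepstral coefficients (MFCCs), which can be plotted with librosa.display.specshow():

import librosa.display
import matplotlib.pyplot as plt

# Compute the mel-frequency cepstral coefficients
mfccs = librosa.feature.mfcc(y=y, sr=sr)

# Plot the MFCCs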
fig, ax = plt.subplots()
img = librosa.display.specshow(mfccs, x_axis='time', ax=ax)
fig.colorbar(img, ax=ax)
ax.set(title='MFCC')
plt.show()
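Conceptually, MFCCs are obtained by applying a discrete cosine transform (DCT) to the log-scaled mel spectrum of each frame and keeping the first few coefficients. The sketch below shows that step for a single frame using a hand-written orthonormal DCT-II. The input values are made up for illustration, and the choice of DCT-II with orthonormal scaling is an assumption about the defaults of librosa.feature.mfcc(), not a statement of its exact implementation.

```python
import math


def dct2_ortho(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                    for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


# Hypothetical log-mel energies (dB) for a single frame
log_mel_frame = [-20.0, -18.0, -25.0, -40.0, -42.0, -55.0, -60.0, -62.0]

n_mfcc = 4  # keep only the first few coefficients
print(dct2_ortho(log_mel_frame)[:n_mfcc])
```

The first coefficient tracks the overall level of the frame, while the later ones describe progressively finer detail in the shape of the spectrum.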

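Plotting the waveform and mel spectrogram together

When analyzing audio signals in practice, we often place the waveform and the mel spectrogram side by side for comparison:

import librosa.display
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 1)

# Plot the waveform
librosa.display.waveshow(y, sr=sr, ax=ax[0])
ax[0].set_title('nutcracker waveform')

# Plot the mel spectrogram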
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)
S_dB = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_dB, x_axis='time',
                         y_axis='mel', sr=sr,
                         fmax=8000, ax=ax[1])
ax[1].set_title('Mel-frequency spectrogram')

plt.tight_layout()
plt.show()
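The mel spectrogram plots above pass the power values through librosa.power_to_db(S, ref=np.max) before display. The sketch below is a simplified, pure-Python approximation of that conversion, assuming the defaults as we understand them (a floor of amin=1e-10 on the input and a dynamic range limit of top_db=80); the function name is hypothetical and the input values are made up.

```python
import math


def power_to_db_sketch(power, amin=1e-10, top_db=80.0):
    """Convert power values to dB relative to their maximum."""
    ref = max(power)
    ref_db = 10.0 * math.log10(max(amin, ref))
    # Clamp tiny values to amin so the logarithm stays finite
    db = [10.0 * math.log10(max(amin, p)) - ref_db for p in power]
    # Limit the dynamic range to top_db below the loudest value
    floor = max(db) - top_db
    return [max(v, floor) for v in db]


# Hypothetical power values spanning a very wide dynamic range
print(power_to_db_sketch([1.0, 0.1, 0.001, 1e-12]))
```

With ref set to the maximum, the loudest cell maps to 0 dB and everything else is negative, which explains why the color bars in these plots run from 0 dB down to about -80 dB.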

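Detecting beats

librosa.beat.beat_track() can be used to detect the beats in a piece of music:

# Detect beats
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

The tempo returned here is the estimated tempo in beats per minute (BPM). Depending on the librosa version, it may be printed as a one-element NumPy array rather than a plain number:

# Tempo (beats per minute)
print(tempo)

107.666015625

beat_frames holds the frame indices at which beats occur (consecutive frames are hop_length samples apart, 512 by default):

# Frame indices of the beats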
print(beat_frames)

array([  51,   74,  100,  124,  149,  173,  198,  221,  247,  271,  295,
        319,  344,  368,  393,  416,  440,  462,  486,  508,  531,  554,
        578,  601,  624,  647,  670,  693,  716,  739,  762,  786,  810,
        832,  856,  878,  901,  924,  947,  970,  993, 1016, 1039, 1062,
       1085, 1108, 1131, 1155, 1178, 1201, 1225, 1249, 1273, 1297, 1321,
       1344, 1369, 1393, 1415, 1437, 1460, 1483, 1505, 1527, 1550, 1573,
       1595, 1618, 1642, 1665, 1689, 1712, 1736, 1759, 1783, 1806, 1829,
       1853, 1876, 1900, 1924, 1947, 1971, 1994, 2018, 2042, 2065, 2088,
       2110, 2132, 2155, 2177, 2200, 2223, 2244, 2266, 2290, 2313, 2336,
       2359, 2381, 2404, 2427, 2451, 2474, 2498, 2522, 2545, 2568, 2592,
       2615, 2639, 2662, 2684, 2706, 2729, 2752, 2775, 2797, 2819, 2842,
       2864, 2887, 2910, 2933, 2956, 2979, 3003, 3027, 3051, 3075, 3100,
       3125, 3150, 3174, 3199, 3223, 3246, 3269, 3295, 3318, 3343, 3367,
       3391, 3415, 3439, 3462, 3485, 3509, 3534, 3560, 3586, 3612, 3637,
       3664, 3689, 3715, 3740, 3766, 3791, 3817, 3842, 3866, 3891, 3916,
       3940, 3965, 3990, 4014, 4038, 4063, 4087, 4112, 4136, 4160, 4184,
       4208, 4232, 4256, 4280, 4304, 4327, 4352, 4375, 4400, 4423, 4447,
       4471, 4495, 4519, 4543, 4567, 4591, 4615, 4639, 4662, 4686, 4710,
       4734, 4758, 4782, 4806, 4830, 4854, 4877, 4901, 4926, 4950, 4973,
       4997, 5021, 5045])
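Because consecutive frames are hop_length samples apart, a frame index can be turned into a time by multiplying by hop_length and dividing by the sampling rate. The sketch below applies that arithmetic to the first few beat frames, assuming the default values sr=22050 and hop_length=512; the helper name is hypothetical.

```python
def frame_to_time(frame, sr=22050, hop_length=512):
    """Time in seconds corresponding to a frame index."""
    return frame * hop_length / sr


# The first three beat frames from the output above
for frame in (51, 74, 100):
    print(frame, round(frame_to_time(frame), 8))
```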

We can use the librosa.frames_to_time() function to convert the frame indices into actual times in seconds:

# Convert frames to times
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

# Time of each beat (seconds)
print(beat_times)

[  1.18421769   1.71827664   2.32199546   2.87927438   3.45977324
   4.01705215   4.59755102   5.13160998   5.7353288    6.29260771
   6.84988662   7.40716553   7.9876644    8.54494331   9.12544218
   9.65950113  10.21678005  10.72761905  11.28489796  11.79573696
  12.32979592  12.86385488  13.42113379  13.95519274  14.4892517
  15.02331066  15.55736961  16.09142857  16.62548753  17.15954649
  17.69360544  18.25088435  18.80816327  19.31900227  19.87628118
  20.38712018  20.92117914  21.4552381   21.98929705  22.52335601
  23.05741497  23.59147392  24.12553288  24.65959184  25.19365079
  25.72770975  26.26176871  26.81904762  27.35310658  27.88716553
  28.44444444  29.00172336  29.55900227  30.11628118  30.67356009
  31.20761905  31.78811791  32.34539683  32.85623583  33.36707483
  33.90113379  34.43519274  34.94603175  35.45687075  35.99092971
  36.52498866  37.03582766  37.56988662  38.12716553  38.66122449
  39.2185034   39.75256236  40.30984127  40.84390023  41.40117914
  41.9352381   42.46929705  43.02657596  43.56063492  44.11791383
  44.67519274  45.2092517   45.76653061  46.30058957  46.85786848
  47.41514739  47.94920635  48.48326531  48.99410431  49.50494331
  50.03900227  50.54984127  51.08390023  51.61795918  52.10557823
  52.61641723  53.17369615  53.7077551   54.24181406  54.77587302
  55.28671202  55.82077098  56.35482993  56.91210884  57.4461678
  58.00344671  58.56072562  59.09478458  59.62884354  60.18612245
  60.72018141  61.27746032  61.81151927  62.32235828  62.83319728
  63.36725624  63.90131519  64.43537415  64.94621315  65.45705215
  65.99111111  66.50195011  67.03600907  67.57006803  68.10412698
  68.63818594  69.1722449   69.72952381  70.28680272  70.84408163
  71.40136054  71.98185941  72.56235828  73.14285714  73.70013605
  74.28063492  74.83791383  75.37197279  75.90603175  76.50975057
  77.04380952  77.62430839  78.1815873   78.73886621  79.29614512
  79.85342404  80.38748299  80.92154195  81.47882086  82.05931973
  82.66303855  83.26675737  83.87047619  84.45097506  85.07791383
  85.6584127   86.26213152  86.84263039  87.44634921  88.02684807
  88.63056689  89.21106576  89.76834467  90.34884354  90.9293424
  91.48662132  92.06712018  92.64761905  93.20489796  93.76217687
  94.34267574  94.89995465  95.48045351  96.03773243  96.59501134
  97.15229025  97.70956916  98.26684807  98.82412698  99.3814059
  99.93868481 100.47274376 101.05324263 101.58730159 102.16780045
 102.70185941 103.25913832 103.81641723 104.37369615 104.93097506
 105.48825397 106.04553288 106.60281179 107.1600907  107.71736961
 108.25142857 108.80870748 109.36598639 109.92326531 110.48054422
 111.03782313 111.59510204 112.15238095 112.70965986 113.24371882
 113.80099773 114.3814966  114.93877551 115.47283447 116.03011338
 116.58739229 117.1446712 ]
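The spacing between beat times also gives a quick, local estimate of the tempo. The sketch below takes the first five beat times from the output above and converts the median gap between them into beats per minute. This is only a rough cross-check and not the method beat_track() itself uses, so the result can differ somewhat from the tempo value reported earlier.

```python
import statistics

# The first five beat times (seconds) from the output above
beat_times = [1.18421769, 1.71827664, 2.32199546, 2.87927438, 3.45977324]

# Seconds between consecutive beats
intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]

# A typical interval, converted to beats per minute
local_bpm = 60.0 / statistics.median(intervals)
print(round(local_bpm, 1))
```

Using the median rather than the mean keeps a single mistimed beat from skewing the estimate.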

librosa includes many other audio analysis tools. For detailed explanations, see the documentation on the official librosa website.