salience

Description: libf0 salience-based F0 estimation implementation
Author: Sebastian Rosenzweig, Simon Schwär, Meinard Müller
License: The MIT license, https://opensource.org/licenses/MIT
This file is part of libf0.
libf0.salience.salience(x, Fs=22050, N=2048, H=256, F_min=55.0, F_max=1760.0, R=10.0, num_harm=10, freq_smooth_len=11, alpha=0.9, gamma=0.0, constraint_region=None, tol=5, score_low=0.01, score_high=1.0)

Implementation of a salience-based F0-estimation algorithm using pitch contours, inspired by Melodia [1].

[1] Justin Salamon and Emilia Gómez, “Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics.” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, Aug. 2012.

Parameters
  • x (ndarray) – Audio signal

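  • Fs (int) – Sampling rate in Hz (Default value = 22050)

  • N (int) – Window size in samples (Default value = 2048)

  • H (int) – Hop size in samples (Default value = 256)

  • F_min (float or int) – Minimal frequency in Hz (Default value = 55.0)

  • F_max (float or int) – Maximal frequency in Hz (Default value = 1760.0)

  • R (float or int) – Frequency resolution given in cents (Default value = 10.0)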

  • num_harm (int) – Number of harmonics (Default value = 10)

  • freq_smooth_len (int) – Filter length for vertical smoothing (Default value = 11)

  • alpha (float) – Weighting parameter for harmonics (Default value = 0.9)

  • gamma (float) – Logarithmic compression factor (Default value = 0.0)

  • constraint_region (None or ndarray) – Constraint regions, row-format: (t_start_sec, t_end_sec, f_start_hz, f_end_hz) (Default value = None)

  • tol (int) – Tolerance parameter for transition matrix (Default value = 5)

  • score_low (float) – Score (low) for transition matrix (Default value = 0.01)

  • score_high (float) – Score (high) for transition matrix (Default value = 1.0)

Returns

  • f0 (ndarray) – Estimated F0-trajectory

  • T_coef (ndarray) – Time axis

  • sal (ndarray) – Salience value of estimated F0

See also

[FMP] Notebook: C8/C8S2_SalienceRepresentation.ipynb
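A minimal usage sketch, assuming libf0 is installed and importable as `libf0`. The test tone, the wrapper name `estimate_f0`, and the constraint-region values are illustrative assumptions; the keyword arguments and the three return values follow the signature documented above.

```python
import numpy as np

Fs = 22050  # sampling rate in Hz (the documented default)

# Synthetic input: one second of a 220 Hz sine tone (illustrative only).
t = np.arange(Fs) / Fs
x = 0.5 * np.sin(2 * np.pi * 220.0 * t)

# One constraint region in the documented row format:
# (t_start_sec, t_end_sec, f_start_hz, f_end_hz)
constraint_region = np.array([[0.0, 1.0, 200.0, 240.0]])


def estimate_f0(x, Fs, constraint_region=None):
    """Hypothetical wrapper; imports libf0 lazily so this file loads without it."""
    from libf0.salience import salience

    f0, T_coef, sal = salience(
        x, Fs=Fs, N=2048, H=256, F_min=55.0, F_max=1760.0, R=10.0,
        constraint_region=constraint_region,
    )
    return f0, T_coef, sal


# Example call (requires libf0):
# f0, T_coef, sal = estimate_f0(x, Fs, constraint_region)
```

Restricting the search to a narrow band around the expected pitch, as the constraint region does here, is one way to steer the tracker away from octave errors.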

libf0.salience.compute_salience_rep(x, Fs, N, H, F_min, F_max, R, num_harm, freq_smooth_len, alpha, gamma)

Compute salience representation [FMP, Eq. (8.56)]

Parameters
  • x (ndarray) – Audio signal

  • Fs (int) – Sampling rate

  • N (int) – Window size

  • H (int) – Hop size

  • F_min (float or int) – Minimal frequency

  • F_max (float or int) – Maximal frequency

  • R (int) – Frequency resolution given in cents

  • num_harm (int) – Number of harmonics

  • freq_smooth_len (int) – Filter length for vertical smoothing

  • alpha (float) – Weighting parameter for harmonics

  • gamma (float) – Logarithmic compression factor

Returns

  • Z (ndarray) – Salience representation

  • F_coef_hertz (ndarray) – Frequency axis in Hz

See also

[FMP] Notebook: C8/C8S2_SalienceRepresentation.ipynb

libf0.salience.compute_y_lf_if_bin_eff(X, Fs, N, H, F_min, F_max, R)

Compute a binned log-frequency spectrogram with variable frequency resolution based on instantaneous frequency. This is a more efficient implementation than the one in FMP.

Parameters
  • X (ndarray) – Complex spectrogram

  • Fs (int) – Sampling rate in Hz

  • N (int) – Window size

  • H (int) – Hop size

  • F_min (float or int) – Minimal frequency

  • F_max (float or int) – Maximal frequency

  • R (int) – Frequency resolution given in cents

Returns

  • Y_LF_IF_bin (ndarray) – Binned log-frequency spectrogram using instantaneous frequency (shape: [freq, time])

  • F_coef_hertz (ndarray) – Frequency axis in Hz
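The binning step can be illustrated with a simplified sketch. It assigns each STFT bin to a log-frequency bin by its center frequency, not by the instantaneous frequency that the function above uses, so it is a stand-in for the idea rather than the library's method. The helper name `bin_log_frequency` and the zero-based rounding rule are assumptions.

```python
import numpy as np


def bin_log_frequency(Y, Fs, N, F_min=55.0, F_max=1760.0, R=10.0):
    """Sum magnitudes Y (shape [K, M]) into log-frequency bins of R cents."""
    K = Y.shape[0]
    freqs = np.arange(K) * Fs / N  # center frequency of each STFT bin in Hz

    # Number of log-frequency bins covering [F_min, F_max] (assumed rounding).
    B = int(np.floor(1200.0 / R * np.log2(F_max / F_min) + 0.5)) + 1
    F_coef_hertz = F_min * 2.0 ** (np.arange(B) * R / 1200.0)

    # Keep only STFT bins inside the analysed range, then map them to bins.
    valid = (freqs >= F_min) & (freqs <= F_max)
    bins = np.floor(1200.0 / R * np.log2(freqs[valid] / F_min) + 0.5).astype(int)

    Y_lf = np.zeros((B, Y.shape[1]))
    np.add.at(Y_lf, bins, Y[valid])  # accumulate rows that share a bin
    return Y_lf, F_coef_hertz


# Toy magnitude spectrogram: N = 2048 gives K = 1025 bins, here with 3 frames.
Y = np.ones((1025, 3))
Y_lf, F_coef_hertz = bin_log_frequency(Y, Fs=22050, N=2048)
```

With the default range of 55 Hz to 1760 Hz (five octaves) and R = 10 cents, the output has 601 bins. At low frequencies many of them stay empty because neighbouring STFT bins are more than 10 cents apart.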

libf0.salience.compute_salience_from_logfreq_spec(lf_spec, R, n_harmonics, alpha, beta, gamma, harmonic_win_len=11)

Compute salience representation using harmonic summation following [1].

[1] J. Salamon and E. Gómez, “Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics.” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, Aug. 2012.

Parameters
  • lf_spec (ndarray) – (F, T) log-spectrogram

  • R (int) – Frequency resolution given in cents

  • n_harmonics (int) – Number of harmonics

  • alpha (float) – Weighting parameter for harmonics

  • beta (float) – Compression parameter for spectrogram magnitudes

  • gamma (float) – Magnitude threshold

  • harmonic_win_len (int) – Length of a frequency weighting window in bins

Returns

Z – (F, T) salience representation of the input spectrogram

Return type

ndarray
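Harmonic summation can be sketched in a few lines. This is a simplified illustration, not the library's implementation: it leaves out the magnitude compression (`beta`), the magnitude threshold (`gamma`), and the frequency weighting window (`harmonic_win_len`), and the function name `harmonic_sum_sketch` is hypothetical.

```python
import numpy as np


def harmonic_sum_sketch(lf_spec, R=10.0, n_harmonics=3, alpha=0.9):
    """Weighted sum of the spectrogram at the harmonic positions of each bin."""
    F, T = lf_spec.shape
    Z = np.zeros((F, T))
    for h in range(1, n_harmonics + 1):
        # On a log-frequency axis the h-th harmonic lies a fixed number of bins up.
        shift = int(np.round(1200.0 / R * np.log2(h)))
        if shift >= F:
            break
        # Harmonic h contributes with weight alpha**(h - 1).
        Z[:F - shift] += alpha ** (h - 1) * lf_spec[shift:]
    return Z


# Toy example: one frame with partials at the fundamental (bin 100) and
# at its second and third harmonics.
lf_spec = np.zeros((400, 1))
for h in (1, 2, 3):
    lf_spec[100 + int(np.round(120 * np.log2(h))), 0] = 1.0

Z = harmonic_sum_sketch(lf_spec)
f0_bin = int(np.argmax(Z[:, 0]))
```

In this toy frame the salience peaks at bin 100, the fundamental, because that is the only bin where all three weighted partials line up.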

libf0.salience.define_transition_matrix(B, tol=0, score_low=0.01, score_high=1.0)

Generate transition matrix for dynamic programming

Parameters
  • B (int) – Number of bins

  • tol (int) – Tolerance parameter for transition matrix (Default value = 0)

  • score_low (float) – Score (low) for transition matrix (Default value = 0.01)

  • score_high (float) – Score (high) for transition matrix (Default value = 1.0)

Returns

T – (B, B) Transition matrix

Return type

ndarray

See also

[FMP] Notebook: C8/C8S2_FundFreqTracking.ipynb
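A minimal sketch of such a matrix, assuming a banded layout in which transitions of at most `tol` bins receive `score_high` and all others receive `score_low`. The function name `transition_matrix_sketch` is hypothetical, and the library's construction may differ in detail.

```python
import numpy as np


def transition_matrix_sketch(B, tol=0, score_low=0.01, score_high=1.0):
    """(B, B) matrix: score_high within `tol` bins of the diagonal, else score_low."""
    idx = np.arange(B)
    distance = np.abs(idx[:, None] - idx[None, :])  # bin distance for every pair
    return np.where(distance <= tol, score_high, score_low)


T = transition_matrix_sketch(5, tol=1)
```

A larger `tol` lets the trajectory follow faster pitch changes, and a smaller `score_low` penalizes jumps outside the band more strongly.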

libf0.salience.compute_trajectory_dp(Z, T)

Trajectory tracking using dynamic programming

Parameters
  • Z (ndarray) – Salience representation

  • T (ndarray) – Transition matrix

Returns

eta_DP – Trajectory indices

Return type

ndarray

See also

[FMP] Notebook: C8/C8S2_FundFreqTracking.ipynb
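The tracking idea can be sketched as a Viterbi-style recursion in the log domain. This is an illustration under the assumption that salience values and transition scores are combined multiplicatively; the function name `trajectory_dp_sketch` and the toy data are hypothetical.

```python
import numpy as np


def trajectory_dp_sketch(Z, T):
    """Return the bin index per frame that maximizes the accumulated log-score."""
    B, N = Z.shape
    eps = np.finfo(np.float32).eps
    Z_log = np.log(Z + eps)  # eps avoids log(0)
    T_log = np.log(T)

    D = np.zeros((B, N))             # accumulated scores
    E = np.zeros((B, N), dtype=int)  # backpointers
    D[:, 0] = Z_log[:, 0]
    for n in range(1, N):
        # cand[b, c]: score of moving from bin c (frame n-1) to bin b (frame n)
        cand = T_log + D[:, n - 1][None, :]
        E[:, n - 1] = np.argmax(cand, axis=1)
        D[:, n] = np.max(cand, axis=1) + Z_log[:, n]

    # Backtrack from the best final bin.
    eta = np.zeros(N, dtype=int)
    eta[-1] = np.argmax(D[:, -1])
    for n in range(N - 2, -1, -1):
        eta[n] = E[eta[n + 1], n]
    return eta


# Toy salience: bin 1 is strongest except for a one-frame outlier in bin 3.
Z = np.array([[0.1, 0.1, 0.1, 0.1],
              [0.9, 0.9, 0.6, 0.9],
              [0.1, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1]])
idx = np.arange(4)
T = np.where(np.abs(idx[:, None] - idx[None, :]) <= 1, 1.0, 0.01)
eta = trajectory_dp_sketch(Z, T)
```

A frame-wise maximum would jump to bin 3 in the third frame. The transition penalty keeps the trajectory in bin 1 for all four frames.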

libf0.salience.compute_trajectory_cr(Z, T_coef, F_coef_hertz, constraint_region=None, tol=5, score_low=0.01, score_high=1.0)

Trajectory tracking with constraint regions

Parameters
  • Z (ndarray) – Salience representation

  • T_coef (ndarray) – Time axis

  • F_coef_hertz (ndarray) – Frequency axis in Hz

  • constraint_region (ndarray or None) – Constraint regions, row-format: (t_start_sec, t_end_sec, f_start_hz, f_end_hz) (Default value = None)

  • tol (int) – Tolerance parameter for transition matrix (Default value = 5)

  • score_low (float) – Score (low) for transition matrix (Default value = 0.01)

  • score_high (float) – Score (high) for transition matrix (Default value = 1.0)

Returns

eta – Trajectory indices, unvoiced frames are indicated with -1

Return type

ndarray

See also

[FMP] Notebook: C8/C8S2_FundFreqTracking.ipynb
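To apply a constraint region, its bounds in seconds and Hz have to be mapped to indices on the time and frequency axes. The sketch below assumes a nearest-index mapping. The helper name `region_to_indices` is hypothetical, and how the library restricts the tracking to each region and marks unvoiced frames with -1 is not shown.

```python
import numpy as np


def region_to_indices(region, T_coef, F_coef_hertz):
    """Map (t_start_sec, t_end_sec, f_start_hz, f_end_hz) to nearest axis indices."""
    t_start, t_end, f_start, f_end = region
    n_start = int(np.argmin(np.abs(T_coef - t_start)))
    n_end = int(np.argmin(np.abs(T_coef - t_end)))
    b_start = int(np.argmin(np.abs(F_coef_hertz - f_start)))
    b_end = int(np.argmin(np.abs(F_coef_hertz - f_end)))
    return n_start, n_end, b_start, b_end


T_coef = np.arange(100) * 256 / 22050                      # frame times in seconds
F_coef_hertz = 55.0 * 2.0 ** (np.arange(601) * 10 / 1200)  # 10-cent bins from 55 Hz
indices = region_to_indices((0.0, 0.5, 110.0, 220.0), T_coef, F_coef_hertz)
```

For these axes the region from 0 s to 0.5 s and from 110 Hz to 220 Hz maps to frames 0 through 43 and bins 120 through 240.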

libf0.salience.frequency_to_bin_index(F, R, F_ref)

Binning function with variable frequency resolution. Note: indexing starts at 0, as opposed to [FMP, Eq. (8.49)].

Parameters
  • F (float or ndarray) – Frequency in Hz

  • R (float) – Frequency resolution in cents

  • F_ref (float) – Reference frequency in Hz

Returns

bin_index – Index for bin (starting with index 0)

Return type

int

See also

[FMP] Notebook: C8/C8S2_SalienceRepresentation.ipynb
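A minimal sketch of the binning rule, assuming it is the zero-based variant of [FMP, Eq. (8.49)], that is, the log-frequency distance from `F_ref` in units of `R` cents, rounded to the nearest integer. The function name `bin_index_sketch` is hypothetical.

```python
import numpy as np


def bin_index_sketch(F, R, F_ref):
    """Zero-based bin index of frequency F (Hz) on an R-cent grid starting at F_ref."""
    F = np.asarray(F, dtype=float)
    # 1200 * log2(F / F_ref) is the distance in cents; adding 0.5 before
    # flooring rounds to the nearest bin.
    return np.floor(1200.0 / R * np.log2(F / F_ref) + 0.5).astype(np.int64)


bins = bin_index_sketch([55.0, 110.0, 440.0, 1760.0], R=10.0, F_ref=55.0)
```

With R = 10 cents each octave spans 120 bins, so the four frequencies above fall into bins 0, 120, 360, and 600.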