Convolutive NMF (libnmfd.core.nmfconv)

libnmfd.core.nmfconv.conv_model(W: ndarray, H: ndarray) → ndarray
Convolutive NMF model implementing eq. (4) from [1]. It can also be used to compute the standard NMF model when the templates have a single time frame.

References

[1] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters:
  • W (np.ndarray) – Tensor holding the spectral templates which can be interpreted as a set of spectrogram snippets with dimensions: num_bins x num_comp x num_template_frames

  • H (np.ndarray) – Corresponding activations with dimensions: num_comp x num_target_frames

Returns:

lamb (np.ndarray) – Approximated spectrogram matrix
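The following is a minimal NumPy sketch of such a convolutive model, not the library's code. It assumes the approximation is the sum over template frames m of W[:, :, m] multiplied by H shifted m frames to the right with zero fill. The name conv_model_sketch is hypothetical.

```python
import numpy as np

def conv_model_sketch(W, H):
    """Hypothetical re-implementation of a convolutive NMF model.

    W: num_bins x num_comp x num_template_frames
    H: num_comp x num_target_frames
    """
    num_bins, num_comp, num_template_frames = W.shape
    num_frames = H.shape[1]
    lamb = np.zeros((num_bins, num_frames))
    for m in range(num_template_frames):
        # Shift the activations m frames to the right, zero-filling the start
        H_shifted = np.zeros_like(H)
        H_shifted[:, m:] = H[:, :max(num_frames - m, 0)]
        lamb += W[:, :, m] @ H_shifted
    return lamb

rng = np.random.default_rng(0)
W = rng.random((5, 2, 3))   # 5 bins, 2 components, 3 template frames
H = rng.random((2, 10))     # 2 components, 10 target frames
lamb = conv_model_sketch(W, H)

# With a single template frame the model reduces to the standard NMF product
W_single = rng.random((5, 2, 1))
lamb_single = conv_model_sketch(W_single, H)
```

With one template frame the loop runs once with no shift, which is why the result matches W[:, :, 0] @ H.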

libnmfd.core.nmfconv.init_activations(num_comp: int | None = None, num_frames: int | None = None, strategy: str = 'random', time_res: float | None = None, pitches: List[int] | None = None, decay: ndarray | float | None = None, onsets: List[float] | None = None, durations: List[float] | None = None, drums: List[str] | None = None, onset_offset_tol: float = 0.025)

Implements different initialization strategies for NMF activations. The strategies ‘random’ and ‘uniform’ are self-explanatory. The strategy ‘pitched’ places gate-like activations at the frames where certain notes are active in the ground truth transcription [1]. The strategy ‘drums’ places decaying impulses at the frames where drum onsets are given in the ground truth transcription [2].

References

[1] Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller Score-Informed Audio Decomposition and Applications In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.

[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters:
  • num_comp (int, default = None) – Number of NMF components

  • num_frames (int, default = None) – Number of time frames

  • strategy (str, default = ‘random’) – String describing the initialization strategy

  • time_res (float, default=None) – The temporal resolution

  • pitches (list or None, default=None) – Optional list of MIDI pitch values

  • decay (np.ndarray or float) – The decay parameter in the range [0, 1]; it can be given as a column vector with individual decays per row or as a scalar

  • onsets (list) – Optional list of note onsets (in seconds)

  • durations (list) – Optional list of note durations (in seconds)

  • drums (list) – Optional list of drum type indices

  • onset_offset_tol (float) – Optional parameter giving the onset / offset tolerance (in seconds)

Returns:

initH (array-like) – Array with initial activation functions
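The two score-informed strategies can be illustrated with a short NumPy sketch. This is not the library's implementation: the function names, the rows argument (the component index of each event), and the exact decay rule are assumptions made for illustration only.

```python
import numpy as np

def gate_activations_sketch(onsets, durations, rows, num_comp, num_frames,
                            time_res, tol=0.025):
    """Hypothetical 'pitched'-style init: rectangular gates where notes sound.

    rows[i] is the (assumed) component index of note i; times are in seconds.
    """
    init_H = np.zeros((num_comp, num_frames))
    for onset, dur, r in zip(onsets, durations, rows):
        # Widen each note by the tolerance and convert seconds to frames
        start = max(int(np.floor((onset - tol) / time_res)), 0)
        end = min(int(np.ceil((onset + dur + tol) / time_res)), num_frames)
        init_H[r, start:end] = 1.0
    return init_H

def decaying_impulses_sketch(onsets, rows, num_comp, num_frames,
                             time_res, decay=0.9):
    """Hypothetical 'drums'-style init: impulses with an exponential tail."""
    init_H = np.zeros((num_comp, num_frames))
    for onset, r in zip(onsets, rows):
        frame = int(round(onset / time_res))
        if 0 <= frame < num_frames:
            init_H[r, frame] = 1.0
    # Each frame keeps at least `decay` times the value of the previous frame
    for n in range(1, num_frames):
        init_H[:, n] = np.maximum(init_H[:, n], decay * init_H[:, n - 1])
    return init_H

# One note on component 0: onset 0.10 s, duration 0.20 s, 10 ms frames
gates = gate_activations_sketch([0.10], [0.20], [0], num_comp=2,
                                num_frames=50, time_res=0.01)
# One drum hit on component 1 at 0.10 s
impulses = decaying_impulses_sketch([0.10], [1], num_comp=2,
                                    num_frames=50, time_res=0.01, decay=0.5)
```

Because multiplicative NMF updates cannot revive entries that start at zero, an initialization like this restricts each component to the regions allowed by the transcription.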

libnmfd.core.nmfconv.init_templates(num_comp: int | None = None, num_bins: int | None = None, num_template_frames: int = 1, strategy: str = 'random', pitches: List[int] | None = None, pitch_tol_up: float = 0.75, pitch_tol_down: float = 0.75, num_harmonics: int = 25, freq_res: float | None = None) → List[ndarray]

Implements different initialization strategies for NMF templates. The strategies ‘random’ and ‘uniform’ are self-explanatory. The strategy ‘pitched’ uses comb-filter templates as described in [1]. The strategy ‘drums’ uses pre-extracted, averaged spectra of desired drum types [2].

References

[1] Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller Score-Informed Audio Decomposition and Applications In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.

[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters:
  • num_comp (int, default = None) – Number of NMF components

  • num_bins (int, default = None) – Number of frequency bins

  • num_template_frames (int) – Number of time frames for 2D-templates

  • strategy (str) – String describing the initialization strategy

  • pitches (list) – Optional list of MIDI pitch values

  • pitch_tol_up (float) – TODO

  • pitch_tol_down (float) – TODO

  • num_harmonics (int) – Number of harmonics

  • freq_res (float) – Spectral resolution

Returns:

initW (List[np.ndarray]) – List with the desired templates
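As a rough illustration of a comb-filter template for one MIDI pitch, the sketch below marks a band of frequency bins around each harmonic. It is not the library's code: the function names are hypothetical, the tolerances are assumed to be in semitones, freq_res is assumed to be in Hz per bin, and the 1/k weighting of harmonic k is an illustrative choice.

```python
import numpy as np

def midi_to_hz(pitch):
    # Standard equal-tempered mapping with A4 (MIDI 69) at 440 Hz
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

def comb_template_sketch(pitch, num_bins, freq_res, num_harmonics=25,
                         tol_up=0.75, tol_down=0.75):
    """Hypothetical comb-filter template for a single MIDI pitch."""
    template = np.zeros(num_bins)
    f_low = midi_to_hz(pitch - tol_down)   # lower edge of the fundamental band
    f_high = midi_to_hz(pitch + tol_up)    # upper edge of the fundamental band
    for k in range(1, num_harmonics + 1):
        lo = int(np.floor(k * f_low / freq_res))
        hi = int(np.ceil(k * f_high / freq_res))
        if lo >= num_bins:
            break  # remaining harmonics lie above the highest bin
        band = slice(lo, min(hi + 1, num_bins))
        # Bands of neighbouring harmonics may overlap; keep the larger weight
        template[band] = np.maximum(template[band], 1.0 / k)
    # Normalise to unit sum, guarding against an all-zero template
    return template / max(template.sum(), np.finfo(float).eps)

# A4 with 10 Hz per bin and 513 bins
template = comb_template_sketch(69, num_bins=513, freq_res=10.0)
```

Bins between the harmonic bands stay at zero, so a multiplicative update can only shape the spectrum inside the comb.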

libnmfd.core.nmfconv.nmf_conv(V: ndarray, num_comp: int = 3, num_iter: int = 30, num_template_frames: int = 8, beta: float = 0, init_W: ndarray | None = None, init_H: ndarray | None = None, sparsity_weight: float = 0, uncorr_weight: float = 0, num_bins: int | None = None, num_frames: int | None = None, **kwargs) → Tuple[List[ndarray], ndarray, ndarray, ndarray]
Convolutive Non-Negative Matrix Factorization with beta-divergence and optional regularization parameters, as described in chapter 3.7 of [1]. The averaged activation updates are computed via the compact algorithm given in paragraph 3.7.3. For consistency, we use the notation from [2] instead of the one from the book.

References

[1] Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation John Wiley and Sons, 2009.

[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters:
  • V (np.ndarray) – Matrix that shall be decomposed (typically a magnitude spectrogram of dimension num_bins x num_frames)

  • num_comp (int) – Number of NMFD components (denoted as R in [2])

  • num_iter (int) – Number of NMFD iterations (denoted as L in [2])

  • num_template_frames (int) – Number of time frames for the 2D-template (denoted as T in [2])

  • init_W (np.ndarray) – An initial estimate for the templates (denoted as W^(0) in [2])

  • init_H (np.ndarray) – An initial estimate for the gains (denoted as H^(0) in [2])

  • beta (float) –

    The beta parameter of the divergence:
    -1 -> equals Itakura-Saito divergence
    0 -> equals Kullback-Leibler divergence
    1 -> equals Euclidean distance

  • sparsity_weight (float) – Strength of the activation sparsity

  • uncorr_weight (float) – Strength of the template uncorrelatedness

Returns:
  • W (List[np.ndarray]) – List with the learned templates

  • H (np.ndarray) – Matrix with the learned activations

  • cnmfY (np.ndarray) – List with approximated component spectrograms

  • cost_func (np.ndarray) – The approximation quality per iteration
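The beta values listed above follow a parameterization in which -1, 0 and 1 select the Itakura-Saito, Kullback-Leibler and Euclidean cases. The sketch below evaluates a beta-divergence written in that parameterization. It is an assumption about the form of the cost, not the library's code; the function name is hypothetical, and the Euclidean case comes out as half the squared distance.

```python
import numpy as np

def beta_divergence_sketch(V, Lamb, beta, eps=np.finfo(float).eps):
    """Hypothetical beta-divergence between V and its approximation Lamb."""
    V = np.asarray(V, dtype=float) + eps        # avoid log(0) and division by 0
    Lamb = np.asarray(Lamb, dtype=float) + eps
    if beta == -1:
        # Itakura-Saito divergence (limit of the general formula)
        ratio = V / Lamb
        return np.sum(ratio - np.log(ratio) - 1.0)
    if beta == 0:
        # Generalised Kullback-Leibler divergence (limit of the general formula)
        return np.sum(V * np.log(V / Lamb) - V + Lamb)
    # General case; beta = 1 yields 0.5 * squared Euclidean distance
    return np.sum(V * (V ** beta - Lamb ** beta) / beta
                  - (V ** (beta + 1) - Lamb ** (beta + 1)) / (beta + 1))

V_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
Lamb_demo = np.array([[1.5, 1.0], [2.0, 5.0]])
d_is = beta_divergence_sketch(V_demo, Lamb_demo, -1)
d_kl = beta_divergence_sketch(V_demo, Lamb_demo, 0)
d_euc = beta_divergence_sketch(V_demo, Lamb_demo, 1)
```

Other conventions shift beta by one, so it is worth checking which one a given reference uses before comparing values.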

libnmfd.core.nmfconv.nmfd(V: ndarray, num_comp: int = 3, num_frames: int | None = None, num_iter: int = 30, num_template_frames: int = 8, init_W: ndarray | None = None, init_H: ndarray | None = None, func_preprocess=None, func_postprocess=None, fix_W: bool = False, fix_H: bool = False, num_bins: int | None = None, **kwargs) → Tuple[List[ndarray], ndarray, List[ndarray], ndarray, ndarray]
Non-Negative Matrix Factor Deconvolution with Kullback-Leibler divergence and fixable components. The core algorithm was proposed in [1]; the specific adaptations are used in [2].

References

[1] Paris Smaragdis “Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs”. International Congress on Independent Component Analysis and Blind Signal Separation (ICA), 2004

[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters:
  • V (np.ndarray) – Matrix that shall be decomposed (typically a magnitude spectrogram of dimension num_bins x num_frames)

  • num_comp (int) – Number of NMFD components (denoted as R in [2])

  • num_frames (int) – Number of time frames

  • num_iter (int) – Number of NMFD iterations (denoted as L in [2])

  • num_template_frames (int) – Number of time frames for the 2D-template (denoted as T in [2])

  • init_W (np.ndarray) – An initial estimate for the templates (denoted as W^(0) in [2])

  • init_H (np.ndarray) – An initial estimate for the gains (denoted as H^(0) in [2])

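  • fix_W (bool) – If True, the templates are kept fixed and are not updated

  • fix_H (bool) – If True, the activations are kept fixed and are not updated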

  • func_preprocess (function, default=None) – Call for preprocessing

  • func_postprocess (function) – Call for postprocessing

Returns:
  • W (List[np.ndarray]) – List with the learned templates

  • H (np.ndarray) – Matrix with the learned activations

  • nmfd_V (List[np.ndarray]) – List with approximated component spectrograms

  • cost_func (np.ndarray) – The approximation quality per iteration

  • tensor_W (np.ndarray) – If desired, we can also return the tensor
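To make the update scheme more concrete, here is a compact NumPy sketch of KL-divergence NMFD with multiplicative updates in the spirit of [1]. It is a simplified illustration rather than the library's implementation: all names are hypothetical, pre- and postprocessing hooks are omitted, the number of template frames is assumed to be smaller than the number of frames, and fix_W / fix_H are assumed to simply skip the corresponding update.

```python
import numpy as np

EPS = np.finfo(float).eps

def shift(A, s):
    """Shift columns right (s > 0) or left (s < 0), zero-filling the gap."""
    out = np.zeros_like(A)
    n = A.shape[1]
    if s > 0:
        out[:, s:] = A[:, :n - s]
    elif s < 0:
        out[:, :n + s] = A[:, -s:]
    else:
        out[:] = A
    return out

def model(W, H):
    """Convolutive approximation: sum over template frames t."""
    return sum(W[:, :, t] @ shift(H, t) for t in range(W.shape[2])) + EPS

def nmfd_sketch(V, num_comp=3, num_template_frames=8, num_iter=30,
                init_W=None, init_H=None, fix_W=False, fix_H=False, seed=0):
    rng = np.random.default_rng(seed)
    num_bins, num_frames = V.shape
    shape_W = (num_bins, num_comp, num_template_frames)
    # np.array(...) copies, so the caller's initial estimates stay untouched
    W = rng.random(shape_W) + EPS if init_W is None else np.array(init_W, dtype=float)
    H = rng.random((num_comp, num_frames)) + EPS if init_H is None else np.array(init_H, dtype=float)
    ones = np.ones_like(V, dtype=float)
    cost = np.zeros(num_iter)
    for it in range(num_iter):
        if not fix_W:
            Q = V / model(W, H)                       # element-wise ratio
            for t in range(num_template_frames):
                Ht = shift(H, t)
                W[:, :, t] *= (Q @ Ht.T) / (ones @ Ht.T + EPS)
        if not fix_H:
            Q = V / model(W, H)
            num = np.zeros_like(H)
            den = np.zeros_like(H)
            for t in range(num_template_frames):      # pool over template frames
                num += W[:, :, t].T @ shift(Q, -t)
                den += W[:, :, t].T @ ones
            H *= num / (den + EPS)
        Lamb = model(W, H)
        cost[it] = np.sum(V * np.log((V + EPS) / Lamb) - V + Lamb)
    return W, H, cost

# Synthetic example: build V from known factors, then decompose it
rng = np.random.default_rng(1)
W_true = rng.random((6, 2, 3))
H_true = rng.random((2, 40))
V = model(W_true, H_true)
W_est, H_est, cost = nmfd_sketch(V, num_comp=2, num_template_frames=3)

# Keeping the templates fixed leaves the supplied estimate unchanged
W_fixed, _, _ = nmfd_sketch(V, num_comp=2, num_template_frames=3,
                            num_iter=5, init_W=W_true, fix_W=True)
```

Fixing W is useful when the templates come from isolated training material and only the activations need to be estimated.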

libnmfd.core.nmfconv.shift_operator(A: ndarray, shift_amount: int) → ndarray

Shift operator as described in eq. (5) from [1]. It shifts the columns of a matrix to the left or the right and fills undefined elements with zeros.

References

[1] Paris Smaragdis “Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs”. International Congress on Independent Component Analysis and Blind Signal Separation (ICA), 2004

Parameters:
  • A (np.ndarray) – Arbitrary matrix to undergo the shifting operation

  • shift_amount (int) – Positive numbers shift to the right, negative numbers shift to the left, zero leaves the matrix unchanged

Returns:

shifted (np.ndarray) – Result of this operation
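The behaviour described above can be sketched in a few lines of NumPy. The function name is hypothetical, and the handling of shifts larger than the matrix width (returning all zeros) is an assumption of this sketch.

```python
import numpy as np

def shift_operator_sketch(A, shift_amount):
    """Hypothetical column shift with zero fill."""
    num_cols = A.shape[1]
    # Clamp so that oversized shifts produce an all-zero matrix
    s = int(np.clip(shift_amount, -num_cols, num_cols))
    shifted = np.zeros_like(A)
    if s > 0:
        shifted[:, s:] = A[:, :num_cols - s]    # move columns to the right
    elif s < 0:
        shifted[:, :num_cols + s] = A[:, -s:]   # move columns to the left
    else:
        shifted[:] = A                          # zero shift: plain copy
    return shifted

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
right = shift_operator_sketch(A, 1)    # [[0, 1, 2, 3], [0, 5, 6, 7]]
left = shift_operator_sketch(A, -2)    # [[3, 4, 0, 0], [7, 8, 0, 0]]
same = shift_operator_sketch(A, 0)
```

Unlike np.roll, columns pushed past the edge are discarded rather than wrapped around.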