Convolutive NMF (libnmfd.core.nmfconv)

libnmfd.core.nmfconv.conv_model(W: numpy.ndarray, H: numpy.ndarray) → numpy.ndarray

Convolutive NMF model implementing eq. (4) from [1]. Note that it can also be used to compute the standard NMF model when the templates have only a single time frame.

References

[1] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters
  • W (np.ndarray) – Tensor holding the spectral templates which can be interpreted as a set of spectrogram snippets with dimensions: num_bins x num_comp x num_template_frames

  • H (np.ndarray) – Corresponding activations with dimensions: num_comp x num_target_frames

Returns

lamb (np.ndarray) – Approximated spectrogram matrix
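
The model described above can be illustrated with a minimal NumPy sketch. It assumes the tensor layout given in the parameter list (num_bins x num_comp x num_template_frames) and sums, over the template frames, the product of each template slice with a right-shifted copy of the activations. The names shift_columns and conv_model_sketch are hypothetical and are not part of libnmfd; the library's own implementation may differ in detail.

```python
import numpy as np


def shift_columns(A, shift):
    """Shift the columns of A (right if shift > 0, left if < 0), zero-filling."""
    out = np.zeros_like(A)
    n = A.shape[1]
    if shift == 0:
        out[:] = A
    elif 0 < shift < n:
        out[:, shift:] = A[:, :n - shift]
    elif -n < shift < 0:
        out[:, :n + shift] = A[:, -shift:]
    return out


def conv_model_sketch(W, H):
    """Sum over template frames t of W[:, :, t] @ (H shifted right by t)."""
    num_bins, num_comp, num_template_frames = W.shape
    lamb = np.zeros((num_bins, H.shape[1]))
    for t in range(num_template_frames):
        lamb += W[:, :, t] @ shift_columns(H, t)
    return lamb


rng = np.random.default_rng(0)
W = rng.random((6, 2, 1))   # templates with a single time frame
H = rng.random((2, 10))
# With one template frame the model reduces to the ordinary NMF product.
assert np.allclose(conv_model_sketch(W, H), W[:, :, 0] @ H)
```

With more than one template frame, every activation impulse triggers the whole spectrogram snippet of its component rather than a single spectral column.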

libnmfd.core.nmfconv.init_activations(num_comp: Optional[int] = None, num_frames: Optional[int] = None, strategy: str = 'random', time_res: Optional[float] = None, pitches: Optional[List[int]] = None, decay: Optional[Union[numpy.ndarray, float]] = None, onsets: Optional[List[float]] = None, durations: Optional[List[float]] = None, drums: Optional[List[str]] = None, onset_offset_tol: float = 0.025)

Implements different initialization strategies for NMF activations. The strategies ‘random’ and ‘uniform’ are self-explanatory. The strategy ‘pitched’ places gate-like activations at the frames where certain notes are active in the ground truth transcription [1]. The strategy ‘drums’ places decaying impulses at the frames where drum onsets are given in the ground truth transcription [2].

References

[1] Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller Score-Informed Audio Decomposition and Applications In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.

[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters
  • num_comp (int, default = None) – Number of NMF components

  • num_frames (int, default = None) – Number of time frames

  • strategy (str, default = ‘random’) – String describing the initialization strategy

  • time_res (float, default=None) – The temporal resolution

  • pitches (list or None, default=None) – Optional list of MIDI pitch values

  • decay (np.ndarray or float) – The decay parameter in the range [0 … 1], this can be given as a column-vector with individual decays per row or as a scalar

  • onsets (list) – Optional list of note onsets (in seconds)

  • durations (list) – Optional list of note durations (in seconds)

  • drums (list) – Optional list of drum type indices

  • onset_offset_tol (float) – Optional parameter giving the onset / offset tolerance

Returns

initH (array-like) – Array with initial activation functions
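
The two score-informed strategies can be pictured with a small sketch. It is a simplified illustration and not the library's code: the helper names, the use of integer row indices for components, the plain exponential tail, and the rounding of times to frames are all assumptions made for this example.

```python
import numpy as np


def gate_activations_sketch(onsets, durations, rows, num_comp, num_frames,
                            time_res, tol=0.025):
    """Hypothetical 'pitched'-style init: ones while a note is active."""
    H = np.zeros((num_comp, num_frames))
    for onset, dur, row in zip(onsets, durations, rows):
        start = max(int(np.floor((onset - tol) / time_res)), 0)
        stop = min(int(np.ceil((onset + dur + tol) / time_res)), num_frames)
        H[row, start:stop] = 1.0
    return H


def drum_activations_sketch(onsets, rows, num_comp, num_frames, time_res,
                            decay=0.9):
    """Hypothetical 'drums'-style init: an impulse per onset with a decaying tail."""
    # decay may be a scalar or one value per row (e.g. a column vector)
    decay = np.broadcast_to(np.asarray(decay, dtype=float).reshape(-1),
                            (num_comp,))
    H = np.zeros((num_comp, num_frames))
    for onset, row in zip(onsets, rows):
        frame = int(round(onset / time_res))
        if 0 <= frame < num_frames:
            H[row, frame] = 1.0
    # each frame keeps at least `decay` times the value of the previous frame
    for m in range(1, num_frames):
        H[:, m] = np.maximum(H[:, m], decay * H[:, m - 1])
    return H


H_gate = gate_activations_sketch(onsets=[0.5], durations=[0.5], rows=[0],
                                 num_comp=1, num_frames=8, time_res=0.25)
H_drum = drum_activations_sketch(onsets=[0.0, 0.5], rows=[0, 1],
                                 num_comp=2, num_frames=8, time_res=0.25,
                                 decay=0.5)
```

Because multiplicative NMF updates keep zero entries at zero, such an initialization restricts each component to the time regions allowed by the transcription.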

libnmfd.core.nmfconv.init_templates(num_comp: Optional[int] = None, num_bins: Optional[int] = None, num_template_frames: int = 1, strategy: str = 'random', pitches: Optional[List[int]] = None, pitch_tol_up: float = 0.75, pitch_tol_down: float = 0.75, num_harmonics: int = 25, desired_drum_classes: Optional[List[str]] = None, num_iter: int = 30, block_size: int = 2048, hop_size: int = 512, fs: int = 44100, input_dir: str = 'data/') → List[numpy.ndarray]

Implements different initialization strategies for NMF templates. The strategies ‘random’ and ‘uniform’ are self-explanatory. The strategy ‘pitched’ uses comb-filter templates as described in [1]. The strategy ‘drums’ uses pre-extracted, averaged spectra of desired drum types [2].

References

[1] Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller Score-Informed Audio Decomposition and Applications In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.

[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters
  • num_comp (int, default = None) – Number of NMF components

  • num_bins (int, default = None) – Number of frequency bins

  • num_template_frames (int) – Number of time frames for 2D-templates

  • strategy (str) – String describing the initialization strategy

  • pitches (list) – Optional list of MIDI pitch values

  • pitch_tol_up (float) – Defines how much the partials should be extended upwards

  • pitch_tol_down (float) – Defines how much the partials should be extended downwards

  • num_harmonics (int) – Number of harmonics

  • desired_drum_classes (List[str]) – List of desired drum classes, only relevant in case of strategy=‘drums’

  • num_iter (int) – Number of NMFD iterations

  • block_size (int) – STFT block (window) size

  • hop_size (int) – STFT hop size

  • fs (int) – Sample rate

  • input_dir (str) – Input data directory including folders with the same names as the desired drum sounds

Returns

init_W (List[np.ndarray]) – List with the desired NMFD templates
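
To make the ‘pitched’ strategy more concrete, the following sketch builds one comb-filter template for a single MIDI pitch. It is only an illustration under stated assumptions: the function names, the 1/k weighting of the k-th harmonic, the rounding of band edges to bins, and the interpretation of pitch_tol_up and pitch_tol_down as semitone offsets are choices made here and are not taken from the library source.

```python
import numpy as np


def midi_to_hz(pitch):
    """Equal-tempered frequency of a MIDI pitch (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def comb_template_sketch(pitch, num_bins, fs=44100, block_size=2048,
                         pitch_tol_up=0.75, pitch_tol_down=0.75,
                         num_harmonics=25):
    """Hypothetical comb template: one frequency band per harmonic."""
    template = np.zeros(num_bins)
    hz_per_bin = fs / block_size
    for k in range(1, num_harmonics + 1):
        # band edges of the k-th partial, widened by the pitch tolerances
        lo = int(round(k * midi_to_hz(pitch - pitch_tol_down) / hz_per_bin))
        hi = int(round(k * midi_to_hz(pitch + pitch_tol_up) / hz_per_bin))
        if lo >= num_bins:
            break
        hi = min(hi, num_bins - 1)
        band = slice(lo, hi + 1)
        # assumed weighting: higher harmonics get smaller amplitudes
        template[band] = np.maximum(template[band], 1.0 / k)
    peak = template.max()
    return template / peak if peak > 0 else template


# One template for A4 on a 1025-bin spectrum (block_size // 2 + 1)
template = comb_template_sketch(69, num_bins=1025)
```

Bins outside all harmonic bands start at zero, so multiplicative updates cannot move energy into them; the tolerances control how much detuning a template can absorb.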

libnmfd.core.nmfconv.initialize_drum_specific_nmfd_templates(desired_drum_classes: Optional[List[str]] = None, num_iter: int = 30, num_template_frames: int = 8, block_size: int = 2048, hop_size: int = 512, fs: int = 44100, input_dir: str = 'data/') → List[numpy.ndarray]

Implements the extraction of drum-specific spectrogram templates. The method assumes that folders with the same names as the desired drum sounds are present inside the data directory. These should contain single samples of the target drum sounds. By default, pre-defined kick, snare and hi-hat samples are used.

Parameters
  • desired_drum_classes (List[str]) – List of desired drum classes

  • num_iter (int) – Number of NMFD iterations

  • num_template_frames (int) – Number of time frames for 2D-templates

  • block_size (int) – STFT block (window) size

  • hop_size (int) – STFT hop size

  • fs (int) – Sampling rate

  • input_dir (str) – Input data directory including folders with the same names as the desired drum sounds

Returns

init_W_drums (List[np.ndarray]) – List of spectral NMFD templates corresponding to the desired drum classes

libnmfd.core.nmfconv.nmf_conv(V: numpy.ndarray, num_comp: int = 3, num_iter: int = 30, num_template_frames: int = 8, beta: float = 0, init_W: Optional[numpy.ndarray] = None, init_H: Optional[numpy.ndarray] = None, sparsity_weight: float = 0, uncorr_weight: float = 0, num_bins: Optional[int] = None, num_frames: Optional[int] = None, **kwargs) → Tuple[List[numpy.ndarray], numpy.ndarray, numpy.ndarray, numpy.ndarray]

Convolutive Non-Negative Matrix Factorization with Beta-Divergence and optional regularization parameters as described in chapter 3.7 of [1]. The averaged activation updates are computed via the compact algorithm given in paragraph 3.7.3. For the sake of consistency, we use the notation from [2] instead of the one from the book.

References

[1] Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation John Wiley and Sons, 2009.

[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters
  • V (np.ndarray) – Matrix that shall be decomposed (typically a magnitude spectrogram of dimension num_bins x num_frames)

  • num_comp (int) – Number of NMFD components (denoted as R in [2])

  • num_iter (int) – Number of NMFD iterations (denoted as L in [2])

  • num_template_frames (int) – Number of time frames for the 2D-template (denoted as T in [2])

  • init_W (np.ndarray) – An initial estimate for the templates (denoted as W^(0) in [2])

  • init_H (np.ndarray) – An initial estimate for the gains (denoted as H^(0) in [2])

  • beta (float) –

    The beta parameter of the divergence:
    -1 -> equals Itakura-Saito divergence
    0 -> equals Kullback-Leibler divergence
    1 -> equals Euclidean distance

  • sparsity_weight (float) – Strength of the activation sparsity

  • uncorr_weight (float) – Strength of the template uncorrelatedness

Returns
  • W (List[np.ndarray]) – List with the learned templates

  • H (np.ndarray) – Matrix with the learned activations

  • cnmfY (np.ndarray) – List with approximated component spectrograms

  • cost_func (np.ndarray) – The approximation quality per iteration
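
The three named values of beta can be checked numerically with a small sketch of the beta-divergence in the parameterization in which -1, 0 and 1 correspond to the Itakura-Saito, Kullback-Leibler and Euclidean cases. The function name and the eps regularization are assumptions of this example, and the scaling of the cost that the library actually reports may differ.

```python
import numpy as np


def beta_divergence_sketch(V, Lamb, beta, eps=np.finfo(float).eps):
    """Sum of element-wise beta-divergences between V and its model Lamb."""
    V = np.asarray(V, dtype=float) + eps
    Lamb = np.asarray(Lamb, dtype=float) + eps
    if beta == 0:       # Kullback-Leibler divergence
        return np.sum(V * np.log(V / Lamb) - V + Lamb)
    if beta == -1:      # Itakura-Saito divergence
        return np.sum(V / Lamb - np.log(V / Lamb) - 1.0)
    # general case; beta == 1 yields half the squared Euclidean distance
    return np.sum(V * (V ** beta - Lamb ** beta) / beta
                  - (V ** (beta + 1) - Lamb ** (beta + 1)) / (beta + 1))


V = np.array([[1.0, 2.0], [3.0, 4.0]])
Lamb = np.array([[2.0, 2.0], [2.0, 5.0]])
d_euclid = beta_divergence_sketch(V, Lamb, beta=1)
assert np.isclose(d_euclid, 0.5 * np.sum((V - Lamb) ** 2))
```

All three cases are zero when the model matches V exactly and positive otherwise, which is what makes them usable as approximation costs.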

libnmfd.core.nmfconv.nmfd(V: numpy.ndarray, num_comp: int = 3, num_frames: Optional[int] = None, num_iter: int = 30, num_template_frames: int = 8, init_W: Optional[numpy.ndarray] = None, init_H: Optional[numpy.ndarray] = None, func_preprocess=None, func_postprocess=None, fix_W: bool = False, fix_H: bool = False, num_bins: Optional[int] = None, **kwargs) → Tuple[List[numpy.ndarray], numpy.ndarray, List[numpy.ndarray], numpy.ndarray, numpy.ndarray]

Non-Negative Matrix Factor Deconvolution with Kullback-Leibler divergence and fixable components. The core algorithm was proposed in [1]; the specific adaptations are used in [2].

References

[1] Paris Smaragdis “Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs”. International Congress on Independent Component Analysis and Blind Signal Separation (ICA), 2004

[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters
  • V (np.ndarray) – Matrix that shall be decomposed (typically a magnitude spectrogram of dimension num_bins x num_frames)

  • num_comp (int) – Number of NMFD components (denoted as R in [2])

  • num_frames (int) – Number of frames of the target matrix (denoted as M in [2])

  • num_iter (int) – Number of NMFD iterations (denoted as L in [2])

  • num_template_frames (int) – Number of time frames for the 2D-template (denoted as T in [2])

  • init_W (np.ndarray) – An initial estimate for the templates (denoted as W^(0) in [2])

  • init_H (np.ndarray) – An initial estimate for the gains (denoted as H^(0) in [2])

  • fix_W (bool) – If set to true, the template matrix is not updated

  • fix_H (bool) – If set to true, the activation matrix is not updated

  • func_preprocess (function, default=None) – Call for preprocessing

  • func_postprocess (function) – Call for postprocessing

Returns
  • W (List[np.ndarray]) – List with the learned templates

  • H (np.ndarray) – Matrix with the learned activations

  • nmfd_V (List[np.ndarray]) – List with approximated component spectrograms

  • cost_func (np.ndarray) – The approximation quality per iteration

  • tensor_W (np.ndarray) – The learned templates, additionally returned as a single tensor
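
For orientation, the multiplicative Kullback-Leibler updates of NMFD in the spirit of [1] can be sketched in a few lines of NumPy. This is a simplified illustration and not the library's implementation: it uses random initialization, averages the activation updates over the template frames, supports fix_W and fix_H, and omits pre- and postprocessing hooks and any template normalization. The names nmfd_sketch, conv and shift_columns are hypothetical.

```python
import numpy as np


def shift_columns(A, shift):
    """Shift the columns of A (right if shift > 0, left if < 0), zero-filling."""
    out = np.zeros_like(A)
    n = A.shape[1]
    if shift == 0:
        out[:] = A
    elif 0 < shift < n:
        out[:, shift:] = A[:, :n - shift]
    elif -n < shift < 0:
        out[:, :n + shift] = A[:, -shift:]
    return out


def conv(W, H):
    """Convolutive model: sum over t of W[:, :, t] @ (H shifted right by t)."""
    return sum(W[:, :, t] @ shift_columns(H, t) for t in range(W.shape[2]))


def nmfd_sketch(V, num_comp=3, num_template_frames=8, num_iter=30,
                fix_W=False, fix_H=False, seed=0):
    eps = np.finfo(float).eps
    rng = np.random.default_rng(seed)
    num_bins, num_frames = V.shape
    W = rng.random((num_bins, num_comp, num_template_frames)) + eps
    H = rng.random((num_comp, num_frames)) + eps
    ones = np.ones_like(V)
    cost = np.zeros(num_iter)
    for it in range(num_iter):
        Lamb = conv(W, H) + eps
        # Kullback-Leibler divergence between V and the current model
        cost[it] = np.sum(V * np.log((V + eps) / Lamb) - V + Lamb)
        Q = V / Lamb
        mult_H = np.zeros_like(H)
        for t in range(num_template_frames):
            Ht = shift_columns(H, t)
            if not fix_W:
                W[:, :, t] *= (Q @ Ht.T) / (ones @ Ht.T + eps)
            Wt = W[:, :, t]
            mult_H += (Wt.T @ shift_columns(Q, -t)) / (Wt.T @ ones + eps)
        if not fix_H:
            H *= mult_H / num_template_frames  # averaged activation update
    return W, H, cost


V = np.random.default_rng(1).random((12, 40))
W, H, cost = nmfd_sketch(V, num_comp=2, num_template_frames=3, num_iter=20)
```

Setting fix_W or fix_H to True simply skips the corresponding update, which is how pre-trained templates or score-derived activations can be held constant during the iterations.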

libnmfd.core.nmfconv.shift_operator(A: numpy.ndarray, shift_amount: int) → numpy.ndarray

Shift operator as described in eq. (5) from [1]. It shifts the columns of a matrix to the left or the right and fills undefined elements with zeros.

References

[1] Paris Smaragdis “Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs”. International Congress on Independent Component Analysis and Blind Signal Separation (ICA), 2004

Parameters
  • A (np.ndarray) – Arbitrary matrix to undergo the shifting operation

  • shift_amount (int) – Positive numbers shift to the right, negative numbers shift to the left, zero leaves the matrix unchanged

Returns

shifted (np.ndarray) – Result of this operation
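
The behavior described above can be reproduced with a short NumPy sketch. The function name is hypothetical, and returning an all-zero matrix when the shift is at least as large as the number of columns is an assumption of this example.

```python
import numpy as np


def shift_operator_sketch(A, shift_amount):
    """Shift columns right (positive) or left (negative), zero-filling."""
    A = np.asarray(A)
    shifted = np.zeros_like(A)
    num_cols = A.shape[1]
    if shift_amount == 0:
        shifted[:] = A
    elif 0 < shift_amount < num_cols:
        shifted[:, shift_amount:] = A[:, :num_cols - shift_amount]
    elif -num_cols < shift_amount < 0:
        shifted[:, :num_cols + shift_amount] = A[:, -shift_amount:]
    return shifted


A = np.array([[1, 2, 3],
              [4, 5, 6]])
right = shift_operator_sketch(A, 1)    # rows become [0, 1, 2] and [0, 4, 5]
left = shift_operator_sketch(A, -1)    # rows become [2, 3, 0] and [5, 6, 0]
```

Unlike a circular shift such as numpy.roll, columns pushed past the edge are discarded rather than wrapped around.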