Convolutive NMF (libnmfd.core.nmfconv)

libnmfd.core.nmfconv.conv_model(W: numpy.ndarray, H: numpy.ndarray) → numpy.ndarray

Convolutive NMF model implementing eq. (4) from [1]. It can also be used to compute the standard NMF model when the templates have exactly one time frame.

References

[1] Christian Dittmar and Meinard Müller. Reverse Engineering the Amen Break - Score-Informed Separation and Restoration Applied to Drum Recordings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters

W (np.ndarray) – Tensor holding the spectral templates which can be interpreted as a set of spectrogram snippets with dimensions: num_bins x num_comp x num_template_frames
H (np.ndarray) – Corresponding activations with dimensions: num_comp x num_target_frames

Returns

lamb (np.ndarray) – Approximated spectrogram matrix
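
The model can be sketched in plain NumPy as follows. This is a minimal illustration that assumes the shape conventions listed above and the usual formulation in which each template frame is multiplied with a right-shifted copy of the activations. It is not the library's implementation, and the names `shift_right` and `conv_model_sketch` are hypothetical.

```python
import numpy as np


def shift_right(H, tau):
    """Shift the columns of H by tau >= 0 frames to the right, zero-filling."""
    shifted = np.zeros_like(H)
    if tau == 0:
        shifted[:] = H
    elif tau < H.shape[1]:
        shifted[:, tau:] = H[:, :-tau]
    return shifted


def conv_model_sketch(W, H):
    """Sum over template frames: lamb = sum_tau W[:, :, tau] @ shift(H, tau)."""
    num_bins, num_comp, num_template_frames = W.shape
    lamb = np.zeros((num_bins, H.shape[1]))
    for tau in range(num_template_frames):
        lamb += W[:, :, tau] @ shift_right(H, tau)
    return lamb


rng = np.random.default_rng(0)
W = rng.random((5, 2, 1))        # templates with a single time frame
H = rng.random((2, 7))
lamb = conv_model_sketch(W, H)   # reduces to the standard NMF product W @ H
```

With a single template frame the loop runs once without any shift, which is why the result coincides with the ordinary matrix product.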

libnmfd.core.nmfconv.init_activations(num_comp: Optional[int] = None, num_frames: Optional[int] = None, strategy: str = 'random', time_res: Optional[float] = None, pitches: Optional[List[int]] = None, decay: Optional[Union[numpy.ndarray, float]] = None, onsets: Optional[List[float]] = None, durations: Optional[List[float]] = None, drums: Optional[List[str]] = None, onset_offset_tol: float = 0.025)

Implements different initialization strategies for NMF activations. The strategies ‘random’ and ‘uniform’ are self-explanatory. The strategy ‘pitched’ places gate-like activations at the frames where certain notes are active in the ground truth transcription [1]. The strategy ‘drums’ places decaying impulses at the frames where drum onsets are given in the ground truth transcription [2].

References

[1] Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller. Score-Informed Audio Decomposition and Applications. In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.

[2] Christian Dittmar and Meinard Müller. Reverse Engineering the Amen Break - Score-Informed Separation and Restoration Applied to Drum Recordings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters

num_comp (int, default = None) – Number of NMF components
num_frames (int, default = None) – Number of time frames
strategy (str, default = ‘random’) – String describing the initialization strategy
time_res (float, default=None) – The temporal resolution
pitches (list or None, default=None) – Optional list of MIDI pitch values
decay (np.ndarray or float) – The decay parameter in the range [0 … 1]; it can be given either as a column vector with an individual decay per row or as a scalar
onsets (list) – Optional list of note onsets (in seconds)
durations (list) – Optional list of note durations (in seconds)
drums (list) – Optional list of drum type indices
onset_offset_tol (float) – Optional parameter giving the onset / offset tolerance (in seconds)

Returns

initH (array-like) – Array with initial activation functions
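
The idea behind the ‘pitched’ strategy can be sketched as follows. This is an illustrative approximation only: the conversion from seconds to frames, the way the tolerance widens each note, and the function name `gate_activations_sketch` are assumptions, not the library's actual behaviour.

```python
import numpy as np


def gate_activations_sketch(num_frames, time_res, pitches, onsets, durations, tol=0.025):
    """Build one activation row per unique pitch, set to 1 while a note sounds."""
    unique_pitches = sorted(set(pitches))
    init_H = np.zeros((len(unique_pitches), num_frames))
    for pitch, onset, duration in zip(pitches, onsets, durations):
        row = unique_pitches.index(pitch)
        # Widen each note by the tolerance on both sides (assumed behaviour).
        start = max(int(np.floor((onset - tol) / time_res)), 0)
        end = min(int(np.ceil((onset + duration + tol) / time_res)), num_frames)
        init_H[row, start:end] = 1.0
    return init_H


# Three notes on two distinct pitches, 10 ms per frame (values chosen for illustration).
init_H = gate_activations_sketch(
    num_frames=100,
    time_res=0.01,
    pitches=[60, 64, 60],
    onsets=[0.1, 0.3, 0.6],
    durations=[0.2, 0.2, 0.1],
)
```

Each row of `init_H` is zero wherever the score says the pitch is silent, so multiplicative NMF updates keep those frames at zero.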

libnmfd.core.nmfconv.init_templates(num_comp: Optional[int] = None, num_bins: Optional[int] = None, num_template_frames: int = 1, strategy: str = 'random', pitches: Optional[List[int]] = None, pitch_tol_up: float = 0.75, pitch_tol_down: float = 0.75, num_harmonics: int = 25, desired_drum_classes: Optional[List[str]] = None, num_iter: int = 30, block_size: int = 2048, hop_size: int = 512, fs: int = 44100, input_dir: str = 'data/') → List[numpy.ndarray]

Implements different initialization strategies for NMF templates. The strategies ‘random’ and ‘uniform’ are self-explanatory. The strategy ‘pitched’ uses comb-filter templates as described in [1]. The strategy ‘drums’ uses pre-extracted, averaged spectra of desired drum types [2].

References

[1] Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller. Score-Informed Audio Decomposition and Applications. In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.

[2] Christian Dittmar and Meinard Müller. Reverse Engineering the Amen Break - Score-Informed Separation and Restoration Applied to Drum Recordings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters

num_comp (int, default = None) – Number of NMF components
num_bins (int, default = None) – Number of frequency bins
num_template_frames (int) – Number of time frames for 2D-templates
strategy (str) – String describing the initialization strategy
pitches (list) – Optional list of MIDI pitch values
pitch_tol_up (float) – Defines how much the partials should be extended upwards
pitch_tol_down (float) – Defines how much the partials should be extended downwards
num_harmonics (int) – Number of harmonics
desired_drum_classes (List[str]) – List of desired drum classes, only relevant in case of strategy=‘drums’
num_iter (int) – Number of NMFD iterations
block_size (int) – STFT block (window) size
hop_size (int) – STFT hop size
fs (int) – Sample rate
input_dir (str) – Input data directory including folders with the same names as the desired drum sounds

Returns

init_W (List[np.ndarray]) – List with the desired NMFD templates
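
A comb-filter template of the kind used by the ‘pitched’ strategy can be sketched as below. The mapping from Hz to frequency bins, the 1/k weighting of the k-th harmonic, and the names `midi_to_hz` and `comb_template_sketch` are assumptions made for illustration; the library may differ in all of these details.

```python
import numpy as np


def midi_to_hz(pitch):
    """Convert a MIDI pitch number to its fundamental frequency (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def comb_template_sketch(pitch, num_bins, fs=44100, num_harmonics=25,
                         pitch_tol_up=0.75, pitch_tol_down=0.75):
    """Return a (num_bins,) template with a band around every harmonic."""
    delta_f = (fs / 2.0) / (num_bins - 1)   # assumed bin spacing in Hz
    template = np.zeros(num_bins)
    f0 = midi_to_hz(pitch)
    for k in range(1, num_harmonics + 1):
        # Band edges lie pitch_tol_down / pitch_tol_up semitones around the harmonic.
        lower = k * f0 * 2.0 ** (-pitch_tol_down / 12.0)
        upper = k * f0 * 2.0 ** (pitch_tol_up / 12.0)
        lo_bin = int(round(lower / delta_f))
        hi_bin = int(round(upper / delta_f))
        if lo_bin >= num_bins:
            break                            # harmonic lies above the Nyquist frequency
        band = slice(lo_bin, hi_bin + 1)
        # 1/k weighting is an illustrative choice; keep the larger value on overlap.
        template[band] = np.maximum(template[band], 1.0 / k)
    return template


pitches = [60, 64, 67]
num_template_frames = 4
# One (num_bins x num_template_frames) array per pitch, constant over time.
init_W = [
    np.tile(comb_template_sketch(p, num_bins=1025)[:, np.newaxis], (1, num_template_frames))
    for p in pitches
]
```

Widening each partial by a fraction of a semitone leaves the factorization room to absorb slight detuning and vibrato while still suppressing energy between the harmonics.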

libnmfd.core.nmfconv.initialize_drum_specific_nmfd_templates(desired_drum_classes: Optional[List[str]] = None, num_iter: int = 30, num_template_frames: int = 8, block_size: int = 2048, hop_size: int = 512, fs: int = 44100, input_dir: str = 'data/') → List[numpy.ndarray]

Extracts drum-specific spectrogram templates. The method assumes that folders with the same names as the desired drum sounds are present inside the data directory. Each folder should contain single samples of the corresponding drum sound. By default, pre-defined kick, snare and hi-hat samples are used.

Parameters

desired_drum_classes (List[str]) – List of desired drum classes
num_iter (int) – Number of NMFD iterations
num_template_frames (int) – Number of time frames for 2D-templates
block_size (int) – STFT block (window) size
hop_size (int) – STFT hop size
fs (int) – Sampling rate
input_dir (str) – Input data directory including folders with the same names as the desired drum sounds

Returns

init_W_drums (List[np.ndarray]) – List of spectral NMFD templates corresponding to the desired drum classes

libnmfd.core.nmfconv.nmf_conv(V: numpy.ndarray, num_comp: int = 3, num_iter: int = 30, num_template_frames: int = 8, beta: float = 0, init_W: Optional[numpy.ndarray] = None, init_H: Optional[numpy.ndarray] = None, sparsity_weight: float = 0, uncorr_weight: float = 0, num_bins: Optional[int] = None, num_frames: Optional[int] = None, **kwargs) → Tuple[List[numpy.ndarray], numpy.ndarray, numpy.ndarray, numpy.ndarray]

Convolutive Non-Negative Matrix Factorization with beta-divergence and optional regularization parameters, as described in chapter 3.7 of [1]. The averaged activation updates are computed via the compact algorithm given in paragraph 3.7.3. For the sake of consistency, we use the notation from [2] instead of the one from the book.

References

[1] Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. John Wiley and Sons, 2009.

[2] Christian Dittmar and Meinard Müller. Reverse Engineering the Amen Break - Score-Informed Separation and Restoration Applied to Drum Recordings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters

V (np.ndarray) – Matrix that shall be decomposed (typically a magnitude spectrogram of dimension num_bins x num_frames)
num_comp (int) – Number of NMFD components (denoted as R in [2])
num_iter (int) – Number of NMFD iterations (denoted as L in [2])
num_template_frames (int) – Number of time frames for the 2D-template (denoted as T in [2])
init_W (np.ndarray) – An initial estimate for the templates (denoted as W^(0) in [2])
init_H (np.ndarray) – An initial estimate for the gains (denoted as H^(0) in [2])
beta (float) –

The beta parameter of the divergence:

-1 -> equals Itakura-Saito divergence
0 -> equals Kullback-Leibler divergence
1 -> equals Euclidean distance
sparsity_weight (float) – Strength of the activation sparsity
uncorr_weight (float) – Strength of the template uncorrelatedness

Returns

W (List[np.ndarray]) – List with the learned templates
H (np.ndarray) – Matrix with the learned activations
cnmfY (np.ndarray) – List with approximated component spectrograms
cost_func (np.ndarray) – The approximation quality per iteration
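
The three special cases of the beta parameter can be illustrated with the following sketch of the beta-divergence. It assumes the parametrization in which beta = -1, 0 and 1 give the Itakura-Saito divergence, the generalized Kullback-Leibler divergence and half the squared Euclidean distance, respectively. The function name `beta_divergence_sketch` is hypothetical, and the scaling of the library's cost_func may differ.

```python
import numpy as np


def beta_divergence_sketch(V, lamb, beta, eps=np.finfo(float).eps):
    """Beta-divergence between V and its approximation lamb (both non-negative)."""
    V = np.asarray(V, dtype=float) + eps        # eps guards against log(0) and division by zero
    lamb = np.asarray(lamb, dtype=float) + eps
    if beta == -1:      # Itakura-Saito divergence
        ratio = V / lamb
        return np.sum(ratio - np.log(ratio) - 1.0)
    if beta == 0:       # generalized Kullback-Leibler divergence
        return np.sum(V * np.log(V / lamb) - V + lamb)
    # General case; beta == 1 yields half the squared Euclidean distance.
    return np.sum(V * (V ** beta - lamb ** beta) / beta
                  - (V ** (beta + 1) - lamb ** (beta + 1)) / (beta + 1))


V = np.array([[1.0, 2.0], [3.0, 4.0]])
lamb = np.array([[1.5, 2.0], [2.0, 5.0]])
costs = {beta: beta_divergence_sketch(V, lamb, beta) for beta in (-1, 0, 1)}
```

All three variants are zero when the approximation equals V and positive otherwise; they differ in how strongly they penalize errors in low-energy bins.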

libnmfd.core.nmfconv.nmfd(V: numpy.ndarray, num_comp: int = 3, num_frames: Optional[int] = None, num_iter: int = 30, num_template_frames: int = 8, init_W: Optional[numpy.ndarray] = None, init_H: Optional[numpy.ndarray] = None, func_preprocess=None, func_postprocess=None, fix_W: bool = False, fix_H: bool = False, num_bins: Optional[int] = None, **kwargs) → Tuple[List[numpy.ndarray], numpy.ndarray, List[numpy.ndarray], numpy.ndarray, numpy.ndarray]

Non-Negative Matrix Factor Deconvolution with Kullback-Leibler divergence and fixable components. The core algorithm was proposed in [1]; the specific adaptations are used in [2].

References

[1] Paris Smaragdis. Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs. International Congress on Independent Component Analysis and Blind Signal Separation (ICA), 2004.

[2] Christian Dittmar and Meinard Müller. Reverse Engineering the Amen Break - Score-Informed Separation and Restoration Applied to Drum Recordings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.

Parameters

V (np.ndarray) – Matrix that shall be decomposed (typically a magnitude spectrogram of dimension num_bins x num_frames)
num_comp (int) – Number of NMFD components (denoted as R in [2])
num_frames (int) – Number of frames of the target matrix (denoted as M in [2])
num_iter (int) – Number of NMFD iterations (denoted as L in [2])
num_template_frames (int) – Number of time frames for the 2D-template (denoted as T in [2])
init_W (np.ndarray) – An initial estimate for the templates (denoted as W^(0) in [2])
init_H (np.ndarray) – An initial estimate for the gains (denoted as H^(0) in [2])
fix_W (bool) – If set to true, the template matrix is not updated
fix_H (bool) – If set to true, the activation matrix is not updated
func_preprocess (function, default=None) – Call for preprocessing
func_postprocess (function) – Call for postprocessing

Returns

W (List[np.ndarray]) – List with the learned templates
H (np.ndarray) – Matrix with the learned activations
nmfd_V (List[np.ndarray]) – List with approximated component spectrograms
cost_func (np.ndarray) – The approximation quality per iteration
tensor_W (np.ndarray) – If desired, we can also return the tensor
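
The multiplicative update rules behind NMFD, together with the effect of fix_W and fix_H, can be sketched as follows. This is a simplified illustration of the Kullback-Leibler updates with activations averaged over the template frames. It stores the templates as one tensor of shape num_bins x num_comp x num_template_frames and omits the pre- and postprocessing hooks as well as any normalization the library may apply. The names `shift` and `nmfd_sketch` are hypothetical.

```python
import numpy as np

EPS = np.finfo(float).eps


def shift(A, n):
    """Shift columns of A by n frames (right if n > 0, left if n < 0), zero-filling."""
    out = np.zeros_like(A)
    num_cols = A.shape[1]
    if n == 0:
        out[:] = A
    elif 0 < n < num_cols:
        out[:, n:] = A[:, :-n]
    elif -num_cols < n < 0:
        out[:, :n] = A[:, -n:]
    return out


def nmfd_sketch(V, W, H, num_iter=30, fix_W=False, fix_H=False):
    """KL-divergence NMFD; W is (num_bins, num_comp, T), H is (num_comp, num_frames)."""
    W, H = W.copy(), H.copy()           # leave the caller's initial estimates untouched
    num_template_frames = W.shape[2]
    ones = np.ones_like(V)

    def model(W, H):
        return sum(W[:, :, t] @ shift(H, t) for t in range(num_template_frames)) + EPS

    for _ in range(num_iter):
        if not fix_W:
            for t in range(num_template_frames):
                Q = V / model(W, H)     # ratio between target and current approximation
                H_t = shift(H, t)
                W[:, :, t] *= (Q @ H_t.T) / (ones @ H_t.T + EPS)
        if not fix_H:
            Q = V / model(W, H)
            update = np.zeros_like(H)
            for t in range(num_template_frames):
                W_t = W[:, :, t]
                update += (W_t.T @ shift(Q, -t)) / (W_t.T @ ones + EPS)
            H *= update / num_template_frames   # average over all template frames
    return W, H


rng = np.random.default_rng(0)
V = rng.random((6, 20))
W0 = rng.random((6, 2, 3))
H0 = rng.random((2, 20))
W, H = nmfd_sketch(V, W0, H0, num_iter=5)
```

Because the updates are purely multiplicative, entries that start at zero stay at zero, which is what makes score-informed initializations of the activations effective.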

libnmfd.core.nmfconv.shift_operator(A: numpy.ndarray, shift_amount: int) → numpy.ndarray

Shift operator as described in eq. (5) from [1]. It shifts the columns of a matrix to the left or the right and fills undefined elements with zeros.

References

[1] Paris Smaragdis. Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs. International Congress on Independent Component Analysis and Blind Signal Separation (ICA), 2004.

Parameters

A (np.ndarray) – Arbitrary matrix to undergo the shifting operation
shift_amount (int) – Positive numbers shift to the right, negative numbers shift to the left, zero leaves the matrix unchanged

Returns

shifted (np.ndarray) – Result of this operation
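
The behaviour described above can be reproduced with a few lines of NumPy. The sketch below is an illustration under the stated sign convention, not the library's implementation; the name `shift_operator_sketch` is hypothetical, and returning an all-zero matrix for shifts at least as large as the matrix width is an assumption.

```python
import numpy as np


def shift_operator_sketch(A, shift_amount):
    """Shift the columns of A and fill the vacated columns with zeros."""
    num_cols = A.shape[1]
    # Shifting by the full width or more leaves nothing but zeros (assumed).
    shift_amount = int(np.clip(shift_amount, -num_cols, num_cols))
    shifted = np.zeros_like(A)
    if shift_amount > 0:
        shifted[:, shift_amount:] = A[:, :num_cols - shift_amount]
    elif shift_amount < 0:
        shifted[:, :num_cols + shift_amount] = A[:, -shift_amount:]
    else:
        shifted[:] = A
    return shifted


A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
right = shift_operator_sketch(A, 1)   # [[0, 1, 2, 3], [0, 5, 6, 7]]
left = shift_operator_sketch(A, -2)   # [[3, 4, 0, 0], [7, 8, 0, 0]]
```

Unlike `np.roll`, columns pushed past the edge are discarded rather than wrapped around.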