Convolutive NMF (libnmfd.core.nmfconv)
- libnmfd.core.nmfconv.conv_model(W: ndarray, H: ndarray) → ndarray
- Convolutive NMF model implementing eq. (4) from [1]. It can also be used to compute the standard NMF model when the templates have exactly one time frame.
References
[1] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.
- Parameters:
W (np.ndarray) – Tensor holding the spectral templates which can be interpreted as a set of spectrogram snippets with dimensions: num_bins x num_comp x num_template_frames
H (np.ndarray) – Corresponding activations with dimensions: num_comp x num_target_frames
- Returns:
lamb (np.ndarray) – Approximated spectrogram matrix
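The following NumPy sketch illustrates the kind of model described above. It assumes that eq. (4) is the usual NMFD sum, in which each template frame W[:, :, t] is multiplied with the activations shifted t frames to the right. The helper names shift_columns and conv_model_sketch are hypothetical and are not part of libnmfd.

```python
import numpy as np


def shift_columns(A, shift_amount):
    """Shift columns right (positive) or left (negative), zero-filling the gaps."""
    num_cols = A.shape[1]
    s = int(np.clip(shift_amount, -num_cols, num_cols))
    shifted = np.zeros_like(A)
    if s > 0:
        shifted[:, s:] = A[:, :num_cols - s]
    elif s < 0:
        shifted[:, :num_cols + s] = A[:, -s:]
    else:
        shifted[:] = A
    return shifted


def conv_model_sketch(W, H):
    """Assumed model: sum over t of W[:, :, t] @ (H shifted right by t frames)."""
    num_bins, num_comp, num_template_frames = W.shape
    lamb = np.zeros((num_bins, H.shape[1]))
    for t in range(num_template_frames):
        lamb += W[:, :, t] @ shift_columns(H, t)
    return lamb


rng = np.random.default_rng(0)
W = rng.random((5, 2, 3))   # num_bins x num_comp x num_template_frames
H = rng.random((2, 10))     # num_comp x num_target_frames
lamb = conv_model_sketch(W, H)

# With a single template frame the sum has one term: the plain product W @ H.
lamb_single = conv_model_sketch(W[:, :, :1], H)
```

A single unit impulse in one activation row simply pastes that component's template into the output, starting at the impulse frame.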
- libnmfd.core.nmfconv.init_activations(num_comp: int | None = None, num_frames: int | None = None, strategy: str = 'random', time_res: float | None = None, pitches: List[int] | None = None, decay: ndarray | float | None = None, onsets: List[float] | None = None, durations: List[float] | None = None, drums: List[str] | None = None, onset_offset_tol: float = 0.025)
Implements different initialization strategies for NMF activations. The strategies ‘random’ and ‘uniform’ are self-explanatory. The strategy ‘pitched’ places gate-like activations at the frames where certain notes are active in the ground truth transcription [1]. The strategy ‘drums’ places decaying impulses at the frames where drum onsets are given in the ground truth transcription [2].
References
[1] Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller Score-Informed Audio Decomposition and Applications In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.
[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.
- Parameters:
num_comp (int, default = None) – Number of NMF components
num_frames (int, default = None) – Number of time frames
strategy (str, default = ‘random’) – String describing the initialization strategy
time_res (float, default=None) – The temporal resolution
pitches (list or None, default=None) – Optional list of MIDI pitch values
decay (np.ndarray or float) – The decay parameter in the range [0 … 1]; it can be given as a column vector with individual decays per row or as a scalar
onsets (list) – Optional list of note onsets (in seconds)
durations (list) – Optional list of note durations (in seconds)
drums (list) – Optional list of drum type indices
onset_offset_tol (float) – Optional parameter giving the onset / offset tolerance
- Returns:
initH (array-like) – Array with initial activation functions
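To make the idea of decaying impulses concrete, here is a minimal sketch for a single activation row. It is not the library's implementation: the function name decaying_impulses is hypothetical, time_res is assumed to be the frame hop in seconds, and the decay is assumed to act as a per-frame multiplicative factor.

```python
import numpy as np


def decaying_impulses(onsets, num_frames, time_res, decay):
    """Return one activation row with a unit impulse at each onset, decaying afterwards."""
    h = np.zeros(num_frames)
    # Assumption: time_res is the hop size in seconds, so onset / time_res is a frame index.
    onset_frames = np.round(np.asarray(onsets) / time_res).astype(int)
    onset_frames = onset_frames[(onset_frames >= 0) & (onset_frames < num_frames)]
    h[onset_frames] = 1.0
    # Each frame keeps the larger of its own impulse and the decayed previous value.
    for n in range(1, num_frames):
        h[n] = max(h[n], decay * h[n - 1])
    return h


# Two onsets at 0.0 s and 0.5 s with a 10 ms hop land on frames 0 and 50.
h = decaying_impulses(onsets=[0.0, 0.5], num_frames=100, time_res=0.01, decay=0.5)
```

Because multiplicative NMF updates cannot revive an activation that is exactly zero, a nonzero tail after each onset leaves the algorithm room to adjust the envelope.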
- libnmfd.core.nmfconv.init_templates(num_comp: int | None = None, num_bins: int | None = None, num_template_frames: int = 1, strategy: str = 'random', pitches: List[int] | None = None, pitch_tol_up: float = 0.75, pitch_tol_down: float = 0.75, num_harmonics: int = 25, freq_res: float | None = None) → List[ndarray]
Implements different initialization strategies for NMF templates. The strategies ‘random’ and ‘uniform’ are self-explanatory. The strategy ‘pitched’ uses comb-filter templates as described in [1]. The strategy ‘drums’ uses pre-extracted, averaged spectra of desired drum types [2].
References
[1] Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller Score-Informed Audio Decomposition and Applications In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.
[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.
- Parameters:
num_comp (int, default = None) – Number of NMF components
num_bins (int, default = None) – Number of frequency bins
num_template_frames (int) – Number of time frames for 2D-templates
strategy (str) – String describing the initialization strategy
pitches (list) – Optional list of MIDI pitch values
pitch_tol_up (float) – TODO
pitch_tol_down (float) – TODO
num_harmonics (int) – Number of harmonics
freq_res (float) – Spectral resolution
- Returns:
initW (List[np.ndarray]) – List with the desired templates
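A comb-filter template can be pictured as a spectrum that is nonzero only in narrow bands around the harmonics of a pitch. The sketch below is one possible construction and makes several assumptions that the documentation above does not confirm: freq_res is taken as Hz per bin, the tolerances are taken as semitones around each harmonic, and the 1/k harmonic weighting is purely illustrative. The function name and the tolerance argument names are hypothetical.

```python
import numpy as np


def comb_template(midi_pitch, num_bins, freq_res, num_harmonics=25,
                  tol_up_semitones=0.75, tol_down_semitones=0.75):
    """Build one harmonic comb spectrum for a MIDI pitch (illustrative only)."""
    template = np.zeros(num_bins)
    f0 = 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)  # MIDI pitch 69 is A4 = 440 Hz
    for k in range(1, num_harmonics + 1):
        # Frequency band around the k-th harmonic, widened by the assumed tolerances.
        f_low = k * f0 * 2.0 ** (-tol_down_semitones / 12.0)
        f_high = k * f0 * 2.0 ** (tol_up_semitones / 12.0)
        bin_low = int(np.floor(f_low / freq_res))
        bin_high = int(np.ceil(f_high / freq_res))
        if bin_low >= num_bins:
            break  # the remaining harmonics lie above the spectrum
        band = slice(bin_low, min(bin_high, num_bins - 1) + 1)
        # Hypothetical 1/k weighting so that higher harmonics contribute less.
        template[band] = np.maximum(template[band], 1.0 / k)
    return template / template.sum()


# A4 at a 22050 Hz sampling rate with a 2048-point FFT (1025 bins, about 10.77 Hz per bin).
template = comb_template(midi_pitch=69, num_bins=1025, freq_res=22050 / 2048)
```

Bins between the harmonic bands stay at zero, so multiplicative updates keep them at zero and the learned template remains tied to the given pitch.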
- libnmfd.core.nmfconv.nmf_conv(V: ndarray, num_comp: int = 3, num_iter: int = 30, num_template_frames: int = 8, beta: float = 0, init_W: ndarray | None = None, init_H: ndarray | None = None, sparsity_weight: float = 0, uncorr_weight: float = 0, num_bins: int | None = None, num_frames: int | None = None, **kwargs) → Tuple[List[ndarray], ndarray, ndarray, ndarray]
- Convolutive Non-Negative Matrix Factorization with beta-divergence and optional regularization parameters as described in chapter 3.7 of [1]. The averaged activation updates are computed via the compact algorithm given in paragraph 3.7.3. For the sake of consistency, we use the notation from [2] instead of the one from the book.
References
[1] Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation John Wiley and Sons, 2009.
[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.
- Parameters:
V (np.ndarray) – Matrix that shall be decomposed (typically a magnitude spectrogram of dimension num_bins x num_frames)
num_comp (int) – Number of NMFD components (denoted as R in [2])
num_iter (int) – Number of NMFD iterations (denoted as L in [2])
num_template_frames (int) – Number of time frames for the 2D-template (denoted as T in [2])
init_W (np.ndarray) – An initial estimate for the templates (denoted as W^(0) in [2])
init_H (np.ndarray) – An initial estimate for the gains (denoted as H^(0) in [2])
beta (float) – The beta parameter of the divergence:
- -1 -> equals Itakura-Saito divergence
- 0 -> equals Kullback-Leibler divergence
- 1 -> equals Euclidean distance
sparsity_weight (float) – Strength of the activation sparsity
uncorr_weight (float) – Strength of the template uncorrelatedness
- Returns:
W (List[np.ndarray]) – List with the learned templates
H (np.ndarray) – Matrix with the learned activations
cnmfY (np.ndarray) – List with approximated component spectrograms
cost_func (np.ndarray) – The approximation quality per iteration
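The three special values of beta listed above can be checked with a small cost function. The sketch below assumes the beta-divergence parameterization in which -1, 0 and 1 map to Itakura-Saito, Kullback-Leibler and Euclidean costs, with the two limits written out explicitly. It is a standalone illustration and the function name beta_divergence is hypothetical.

```python
import numpy as np


def beta_divergence(V, Lamb, beta, eps=np.finfo(float).eps):
    """Summed beta-divergence between V and its approximation Lamb."""
    V = np.asarray(V, dtype=float) + eps
    Lamb = np.asarray(Lamb, dtype=float) + eps
    if beta == 0:
        # Limit beta -> 0: generalized Kullback-Leibler divergence.
        cost = V * np.log(V / Lamb) - V + Lamb
    elif beta == -1:
        # Limit beta -> -1: Itakura-Saito divergence.
        cost = V / Lamb - np.log(V / Lamb) - 1
    else:
        # General case; beta = 1 yields half the squared Euclidean distance.
        cost = (V * (V ** beta - Lamb ** beta) / beta
                - (V ** (beta + 1) - Lamb ** (beta + 1)) / (beta + 1))
    return cost.sum()


V = np.array([[1.0, 2.0], [3.0, 4.0]])
Lamb = np.array([[1.5, 1.5], [2.5, 4.5]])
d_euc = beta_divergence(V, Lamb, beta=1)
d_kl = beta_divergence(V, Lamb, beta=0)
d_is = beta_divergence(V, Lamb, beta=-1)
```

In this parameterization the divergence is zero when Lamb equals V and positive otherwise, which is what makes it usable as a per-iteration approximation quality measure.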
- libnmfd.core.nmfconv.nmfd(V: ndarray, num_comp: int = 3, num_frames: int | None = None, num_iter: int = 30, num_template_frames: int = 8, init_W: ndarray | None = None, init_H: ndarray | None = None, func_preprocess=None, func_postprocess=None, fix_W: bool = False, fix_H: bool = False, num_bins: int | None = None, **kwargs) → Tuple[List[ndarray], ndarray, List[ndarray], ndarray, ndarray]
- Non-Negative Matrix Factor Deconvolution with Kullback-Leibler divergence and fixable components. The core algorithm was proposed in [1]; the specific adaptations are used in [2].
References
[1] Paris Smaragdis “Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs”. International Congress on Independent Component Analysis and Blind Signal Separation (ICA), 2004
[2] Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.
- Parameters:
V (np.ndarray) – Matrix that shall be decomposed (typically a magnitude spectrogram of dimension num_bins x num_frames)
num_comp (int) – Number of NMFD components (denoted as R in [2])
num_frames (int) – TODO: Number of frames
num_iter (int) – Number of NMFD iterations (denoted as L in [2])
num_template_frames (int) – Number of time frames for the 2D-template (denoted as T in [2])
init_W (np.ndarray) – An initial estimate for the templates (denoted as W^(0) in [2])
init_H (np.ndarray) – An initial estimate for the gains (denoted as H^(0) in [2])
fix_W (bool) – TODO
fix_H (bool) – TODO
func_preprocess (function, default=None) – Call for preprocessing
func_postprocess (function) – Call for postprocessing
- Returns:
W (List[np.ndarray]) – List with the learned templates
H (np.ndarray) – Matrix with the learned activations
nmfd_V (List[np.ndarray]) – List with approximated component spectrograms
cost_func (np.ndarray) – The approximation quality per iteration
tensor_W (np.ndarray) – If desired, we can also return the tensor
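For orientation, the sketch below runs a plain Kullback-Leibler NMFD with multiplicative updates in NumPy. It is a simplified stand-in and not the library code: the name nmfd_sketch is hypothetical, there is no pre- or postprocessing hook, and fix_W / fix_H are assumed to mean that the corresponding update step is skipped.

```python
import numpy as np


def shift_columns(A, shift_amount):
    """Shift columns right (positive) or left (negative), zero-filling the gaps."""
    num_cols = A.shape[1]
    s = int(np.clip(shift_amount, -num_cols, num_cols))
    shifted = np.zeros_like(A)
    if s > 0:
        shifted[:, s:] = A[:, :num_cols - s]
    elif s < 0:
        shifted[:, :num_cols + s] = A[:, -s:]
    else:
        shifted[:] = A
    return shifted


def nmfd_sketch(V, num_comp=3, num_iter=30, num_template_frames=8,
                init_W=None, init_H=None, fix_W=False, fix_H=False, seed=0):
    """KL-divergence NMFD with multiplicative updates (illustrative only)."""
    eps = np.finfo(float).eps
    rng = np.random.default_rng(seed)
    num_bins, num_frames = V.shape
    # Template tensor layout: num_bins x num_comp x num_template_frames.
    W = (rng.random((num_bins, num_comp, num_template_frames)) + eps
         if init_W is None else np.array(init_W, dtype=float))
    H = (rng.random((num_comp, num_frames)) + eps
         if init_H is None else np.array(init_H, dtype=float))
    ones = np.ones_like(V, dtype=float)
    cost = np.zeros(num_iter)

    def model(W, H):
        lamb = np.zeros((num_bins, num_frames))
        for t in range(num_template_frames):
            lamb += W[:, :, t] @ shift_columns(H, t)
        return lamb + eps

    for it in range(num_iter):
        if not fix_W:  # assumed meaning of fix_W: skip the template update
            ratio = V / model(W, H)
            for t in range(num_template_frames):
                H_t = shift_columns(H, t)
                W[:, :, t] *= (ratio @ H_t.T) / (ones @ H_t.T + eps)
        if not fix_H:  # assumed meaning of fix_H: skip the activation update
            ratio = V / model(W, H)
            num = np.zeros_like(H)
            den = np.zeros_like(H)
            for t in range(num_template_frames):
                num += W[:, :, t].T @ shift_columns(ratio, -t)
                den += W[:, :, t].T @ shift_columns(ones, -t)
            H *= num / (den + eps)
        lamb = model(W, H)
        cost[it] = np.sum(V * np.log((V + eps) / lamb) - V + lamb)
    return W, H, cost


rng = np.random.default_rng(1)
V = rng.random((20, 40)) + 0.1   # stand-in for a magnitude spectrogram
W, H, cost = nmfd_sketch(V, num_comp=2, num_iter=20, num_template_frames=4)
```

Keeping the templates fixed while updating only the activations corresponds to the common use case of decomposing a recording with previously learned sound templates.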
- libnmfd.core.nmfconv.shift_operator(A: ndarray, shift_amount: int) → ndarray
Shift operator as described in eq. (5) from [1]. It shifts the columns of a matrix to the left or the right and fills undefined elements with zeros.
References
[1] Paris Smaragdis “Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs”. International Congress on Independent Component Analysis and Blind Signal Separation (ICA), 2004
- Parameters:
A (np.ndarray) – Arbitrary matrix to undergo the shifting operation
shift_amount (int) – Positive numbers shift to the right, negative numbers shift to the left, zero leaves the matrix unchanged
- Returns:
shifted (np.ndarray) – Result of this operation
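The behavior described above can be reproduced with a few lines of NumPy. This is a sketch of the documented semantics rather than the library's own code, and the name shift_operator_sketch is hypothetical.

```python
import numpy as np


def shift_operator_sketch(A, shift_amount):
    """Shift columns right (positive) or left (negative), zero-filling the gaps."""
    A = np.asarray(A)
    num_cols = A.shape[1]
    # Clamp so that shifting by more than the width yields an all-zero matrix.
    s = int(np.clip(shift_amount, -num_cols, num_cols))
    shifted = np.zeros_like(A)
    if s > 0:
        shifted[:, s:] = A[:, :num_cols - s]
    elif s < 0:
        shifted[:, :num_cols + s] = A[:, -s:]
    else:
        shifted[:] = A
    return shifted


A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
right = shift_operator_sketch(A, 1)    # [[0, 1, 2, 3], [0, 5, 6, 7]]
left = shift_operator_sketch(A, -2)    # [[3, 4, 0, 0], [7, 8, 0, 0]]
```

Unlike np.roll, columns pushed past the edge are discarded instead of wrapping around to the other side.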