Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis
TRANSCRIPT
Daichi Kitamura (SOKENDAI, Japan), Nobutaka Ono (NII/SOKENDAI, Japan)
Efficient initialization for NMF based on nonnegative ICA
IWAENC 2016, Sept. 16, 08:30 - 10:30, Session SPS-II - Student paper competition 2
SPC-II-04
• Nonnegative matrix factorization (NMF) [Lee, 1999]
– Dimensionality reduction with a nonnegativity constraint
– Unsupervised learning that extracts meaningful features
– (Implicitly) sparse decomposition
Research background: what is NMF?
[Figure: input data matrix (power spectrogram, frequency x time, amplitude) ≈ basis matrix (spectral patterns) x activation matrix (time-varying gains)]
2/19
Matrix sizes: # of rows, # of columns, # of bases
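Written out, the factorization sketched on this slide takes the following form; the symbols X, T, V, M, N, K below are my own shorthand for the quantities named above, not notation fixed by the slides:

```latex
\underbrace{\mathbf{X}}_{M \times N} \;\approx\; \underbrace{\mathbf{T}}_{M \times K}\,\underbrace{\mathbf{V}}_{K \times N},
\qquad X_{mn} \ge 0,\quad T_{mk} \ge 0,\quad V_{kn} \ge 0,\quad K \ll \min(M, N)
```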
• Optimization in NMF
– Define a cost function (data fidelity) and minimize it
– No closed-form solution for the basis and activation matrices
– Efficient iterative optimization is used instead
• Multiplicative update rules (auxiliary function technique) [Lee, 2001]
– Initial values for all the variables are required.
Research background: how to optimize?
3/19
(when the cost function is a squared Euclidean distance)
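The multiplicative update rules for the squared-Euclidean cost [Lee, 2001] can be sketched as follows; the symbols T (bases) and V (activations), the function name, and the toy sizes are my own choices:

```python
import numpy as np

def nmf_eu(X, K, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative-update NMF minimizing ||X - T @ V||_F^2 [Lee, 2001].

    X: (M, N) nonnegative input; returns T (M, K) and V (K, N).
    The random initialization here is exactly the starting point that
    the proposed method aims to replace with an informed one.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    T = rng.random((M, K)) + eps
    V = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # Element-wise ratios keep T and V nonnegative throughout.
        T *= (X @ V.T) / (T @ V @ V.T + eps)
        V *= (T.T @ X) / (T.T @ T @ V + eps)
    return T, V

# Toy nonnegative "spectrogram"
X = np.random.default_rng(1).random((16, 32))
T, V = nmf_eu(X, K=4)
```

Because each factor is multiplied by a nonnegative ratio, nonnegativity is preserved automatically, but the fixed point reached still depends on the initial T and V.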
• The results of every application using NMF depend on the initialization of the basis and activation matrices.
– Ex.: source separation via fully supervised NMF [Smaragdis, 2007]
• Motivation: an initialization method that consistently gives good performance is desired.
Problem and motivation
4/19
[Bar chart: SDR improvement [dB] (0–12) for ten different random seeds, Rand1–Rand10; the gap between the best ("Good") and worst ("Poor") seeds exceeds 1 dB]
• With random values (not the focus here)
– Directly use random values
– Search for good values via a genetic algorithm [Stadlthanner, 2006], [Janecek, 2011]
– Clustering-based initialization [Zheng, 2007], [Xue, 2008], [Rezaei, 2011]
• Cluster the input data, and set the centroid vectors as the initial basis vectors.
• Without random values– PCA-based initialization [Zhao, 2014]
• Apply PCA to the input data, extract orthogonal bases and coefficients, and use their absolute values as the initial bases and activations.
– SVD-based initialization [Boutsidis, 2008]
• Apply a special SVD (nonnegative double SVD) to input data and set nonnegative left and right singular vectors to the initial values.
Conventional NMF initialization techniques
5/19
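The NNDSVD idea mentioned above can be sketched as follows; the function name and toy usage are my own, and this is only an illustration of the scheme described in [Boutsidis, 2008], not the authors' code:

```python
import numpy as np

def nndsvd_init(X, K):
    """Sketch of nonnegative double SVD (NNDSVD) initialization.

    Each singular-vector pair beyond the first is split into its
    positive and negative parts, and the pair carrying more energy
    is kept as a nonnegative basis/activation candidate.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    M, N = X.shape
    W = np.zeros((M, K))
    H = np.zeros((K, N))
    # Leading singular vectors of a nonnegative matrix can be chosen
    # nonnegative (Perron-Frobenius); abs fixes the sign ambiguity.
    W[:, 0] = np.sqrt(s[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(s[0]) * np.abs(Vt[0, :])
    for j in range(1, K):
        u, v = U[:, j], Vt[j, :]
        up, un = np.maximum(u, 0), np.maximum(-u, 0)
        vp, vn = np.maximum(v, 0), np.maximum(-v, 0)
        mp = np.linalg.norm(up) * np.linalg.norm(vp)
        mn = np.linalg.norm(un) * np.linalg.norm(vn)
        if mp >= mn:
            uu = up / (np.linalg.norm(up) + 1e-12)
            vv = vp / (np.linalg.norm(vp) + 1e-12)
            sig = mp
        else:
            uu = un / (np.linalg.norm(un) + 1e-12)
            vv = vn / (np.linalg.norm(vn) + 1e-12)
            sig = mn
        W[:, j] = np.sqrt(s[j] * sig) * uu
        H[j, :] = np.sqrt(s[j] * sig) * vv
    return W, H

X = np.random.default_rng(0).random((20, 30))
W, H = nndsvd_init(X, K=5)
```

Note that the resulting bases inherit the (near-)orthogonality of the singular vectors, which is exactly the property questioned on the next slide.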
• Are orthogonal bases really better for NMF?– PCA and SVD are orthogonal decompositions.– A geometric interpretation of NMF [Donoho, 2003]
• The optimal bases in NMF are “along the edges of a convex cone” that includes all the observed data points.
– Orthogonality might therefore not be a desirable property for NMF initial values.
Bases orthogonality?
6/19
[Figure: a convex cone containing all the data points, with its edges marked]
– Optimal bases (along the edges): satisfactory for representing all the data points
– Orthogonal bases: risk representing meaningless areas outside the data
– Tight bases: cannot represent all the data points
• What can we do using only the input data?
– Independent component analysis (ICA) [Comon, 1994]
– ICA extracts non-orthogonal bases that maximize the statistical independence between sources.
– ICA estimates sparse sources when a super-Gaussian prior is assumed.
• Proposal: use the ICA bases and the estimated sources as initial values for NMF
– Objectives:
• 1. Deeper minimization • 2. Faster convergence • 3. Better performance
Proposed method: utilization of ICA
7/19
[Schematic: value of the cost function in NMF vs. number of update iterations, illustrating the two goals: a deeper minimum and faster convergence]
• The input data matrix is a mixture of several sources.
– The sources are mixed through a mixing matrix and observed as the input data matrix.
– ICA can estimate a demixing matrix and the independent sources.
• PCA: dimensionality reduction only
• Nonnegative ICA: takes nonnegativity into account
• Nonnegativization: ensures complete nonnegativity
Proposed method: concept
8/19
[Diagram: input data matrix = mixing matrix x source matrix; the sources are mutually independent]
[Flow: input data matrix → PCA (PCA matrix for dimensionality reduction) → NICA → nonnegativization → initial values → NMF; NICA provides the ICA bases]
• Nonnegative ICA (NICA) [Plumbley, 2003]
– estimates a demixing matrix so that all of the separated sources become nonnegative.
– finds a rotation matrix for the pre-whitened mixtures.
– The rotation matrix is estimated by steepest gradient descent.
Nonnegative constrained ICA
9/19
Cost function: J(W) = || Z - W^T max(Y, 0) ||^2, where Y = W Z is the separated output, the maximum is taken element-wise, and W is constrained to be a rotation (orthonormal) matrix
[Flow: observed → whitening (without centering) → pre-whitened → rotation (demixing) → separated]
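A steepest-gradient sketch of this NICA stage is given below. The function names, step size, and the disjoint-support toy sources are my own assumptions; this illustrates the scheme of [Plumbley, 2003], not the authors' implementation. For an orthonormal W the cost above reduces to the energy of the negative parts of Y:

```python
import numpy as np

def whiten_no_centering(X):
    """Whiten the rows of X using the non-centered correlation matrix,
    so the nonnegativity structure of the data is not destroyed."""
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    d = np.maximum(d, 1e-12)
    return (E / np.sqrt(d)) @ E.T @ X

def nica_rotation(Z, n_iter=2000, mu=0.3):
    """Steepest-descent sketch of nonnegative ICA [Plumbley, 2003].

    Searches for a rotation W such that Y = W @ Z becomes nonnegative.
    For orthonormal W, ||Z - W.T @ max(Y,0)||^2 equals ||min(Y,0)||^2.
    """
    K, N = Z.shape
    W = np.eye(K)
    for _ in range(n_iter):
        Y = W @ Z
        Yneg = np.minimum(Y, 0)     # only negative entries contribute
        G = Yneg @ Z.T / N          # gradient of the sample-averaged cost
        W = W - mu * G
        # Project back to the closest orthogonal matrix (keeps W a rotation).
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt
    return W

# Toy usage: nonnegative, well-grounded sources with disjoint supports,
# mixed by a rotation, then whitened without centering.
rng = np.random.default_rng(0)
N = 2000
active = rng.integers(0, 2, N)
S = np.zeros((2, N))
S[active, np.arange(N)] = rng.random(N)
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Z = whiten_no_centering(R @ S)
W = nica_rotation(Z)
Y = W @ Z
```

The projection step (orthogonal Procrustes via SVD) is one common way to stay on the rotation manifold; Plumbley's paper also describes geodesic updates.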
• Dimensionality reduction via PCA
• NMF variables obtained from the NICA estimates
– Combining the PCA projection with the NICA rotation yields the initial basis and activation matrices.
Combine PCA for dimensionality reduction
10/19
[Diagram: the rows of the PCA matrix are eigenvectors of the data, sorted from high to low eigenvalue, and only the top eigenvectors are kept; combining them with the rotation matrix estimated by NICA yields the basis matrix (ICA bases) and the activation matrix (sources), with a zero matrix for the discarded components]
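The composition of the two stages can be written out as follows; the symbols P (PCA matrix), W (NICA rotation), Y (sources), and the pseudo-inverse notation are my own hedged reconstruction of the slide's elided formulas:

```latex
Z = P X \;(\text{PCA}), \qquad Y = W Z \;(\text{NICA})
\;\;\Longrightarrow\;\;
X \approx \underbrace{P^{+} W^{\mathsf{T}}}_{\text{ICA bases}} \, \underbrace{Y}_{\text{sources}}
```

so the ICA bases and the sources serve as candidates for the initial basis and activation matrices, before the nonnegativization of the next slide.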
• Even if we use NICA, there is no guarantee that
– the obtained sources become completely nonnegative, because of the dimensionality reduction by PCA;
– the obtained bases (ICA bases) are nonnegative, since nonnegativity of the bases is not assumed in NICA.
• Therefore, a "nonnegativization" is applied to the obtained bases and sources:
– Method 1: – Method 2: – Method 3:
• where the scale-fitting coefficients depend on the divergence used in the subsequent NMF
Nonnegativization
11/19
Correlation between and
Correlation between and
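The transcript does not spell out the three methods, so as an illustration only, here is one plausible nonnegativization (taking absolute values) together with a scale-fitting coefficient derived for the squared-Euclidean case; the function name and the choice of absolute value are my own assumptions, not necessarily any of Methods 1–3:

```python
import numpy as np

def nonnegativize_abs(T, V, X):
    """One plausible nonnegativization: absolute values plus rescaling.

    The coefficient c below is the closed-form minimizer of
    ||X - c * T @ V||_F^2; other NMF divergences would give a
    different scale-fitting coefficient.
    """
    Tn, Vn = np.abs(T), np.abs(V)
    R = Tn @ Vn
    c = np.sum(X * R) / (np.sum(R * R) + 1e-12)
    return c * Tn, Vn

rng = np.random.default_rng(0)
T = rng.random((6, 3))
V = rng.random((3, 8))
X = T @ V
Tn, Vn = nonnegativize_abs(T, V, X)
```

When T and V are already nonnegative, the absolute value is a no-op and the fitted scale is 1, so the factorization is returned unchanged.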
• Power spectrogram of a mixture of vocals (Vo.) and guitar (Gt.)
– Song: "Actions – One Minute Smile" from SiSEC2015
– Size of the power spectrogram: 2049 x 1290 (60 s)
– Number of bases:
Experiment: conditions
12/19
[Spectrogram: frequency [kHz] vs. time [s]]
• Convergence of cost function in NICA
Experiment: results of NICA
13/19
[Plot: value of the cost function in NICA (0.0–0.6) vs. number of iterations (0–2000) for the steepest gradient descent]
• Convergence of EU-NMF
Experiment: results of Euclidean NMF
14/19
Processing time for initialization
NICA: 4.36 s PCA: 0.98 s SVD: 2.40 s
EU-NMF: 12.78 s (for 1000 iter.)
Rand1~10 are based on random initialization with different seeds.
[Plot: cost function in EU-NMF (log scale, ~10^5 to 10^10) vs. number of iterations (0–1000); curves: NICA1–NICA3 (proposed methods), PCA-based initialization (PCA), NNDSVD (SVD), and Rand1–Rand10]
• Convergence of KL-NMF
[Plot: cost function in KL-NMF (log scale) vs. number of iterations (0–1000); curves: NICA1–NICA3 (proposed methods), PCA-based initialization (PCA), NNDSVD (SVD), and Rand1–Rand10]
Experiment: results of Kullback-Leibler NMF
15/19
Processing time for initialization
NICA: 4.36 s PCA: 0.98 s SVD: 2.40 s
KL-NMF: 48.07 s (for 1000 iter.)
Rand1~10 are based on random initialization with different seeds.
• Convergence of IS-NMF
[Plot: cost function in IS-NMF (x10^6, 1.45–1.70) vs. number of iterations (0–1000); curves: NICA1–NICA3 (proposed methods), PCA-based initialization (PCA), NNDSVD (SVD), and Rand1–Rand10]
Experiment: results of Itakura-Saito NMF
16/19
Processing time for initialization
NICA: 4.36 s PCA: 0.98 s SVD: 2.40 s
IS-NMF: 214.26 s (for 1000 iter.)
Rand1~10 are based on random initialization with different seeds.
Experiment: fully supervised source separation
• Fully supervised NMF [Smaragdis, 2007]
– Simply uses pre-trained source-wise bases for separation
17/19
Training stage:
– cost functions for learning each source's bases
– bases initialized by the conventional or proposed method
Separation stage:
– cost function with the pre-trained bases held fixed
– activations initialized based on the correlations with the pre-trained bases
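The separation stage can be sketched as follows. The function name, the squared-Euclidean updates, and the flat activation start are my own assumptions for concreteness; the essential point from [Smaragdis, 2007] is that the stacked pre-trained bases stay fixed while only the activations are estimated:

```python
import numpy as np

def estimate_activations(X, T_fixed, n_iter=300, eps=1e-12):
    """Sketch of the separation stage of fully supervised NMF.

    T_fixed = [bases of source 1 | bases of source 2] is pre-trained
    and NOT updated; only the activations V are estimated here with
    squared-Euclidean multiplicative updates.
    """
    K, N = T_fixed.shape[1], X.shape[1]
    V = np.full((K, N), 1.0 / K)      # flat nonnegative start
    for _ in range(n_iter):
        V *= (T_fixed.T @ X) / (T_fixed.T @ T_fixed @ V + eps)
    return V

# Toy usage with an exact model; with K1 bases for source 1, its
# estimate would be T_fixed[:, :K1] @ V[:K1, :].
rng = np.random.default_rng(0)
T_fixed = rng.random((12, 6))
V_true = rng.random((6, 40))
X = T_fixed @ V_true
V_hat = estimate_activations(X, T_fixed)
```

With the bases fixed, this is a convex nonnegative least-squares problem per column, so the multiplicative updates converge reliably; the quality of the pre-trained bases (and hence their initialization) is what governs the separation result.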
• Separation of two sources using fully supervised NMF
– SiSEC2015 MUS dataset (professionally recorded music)
– SDR improvements averaged over 15 songs
Experiment: results of separation
18/19
[Bar charts: SDR improvement [dB] for source 1 (0–12 dB) and source 2 (0–5 dB); proposed methods: NICA1–NICA3; conventional: PCA, SVD, and Rand1–Rand10]
Conclusion
• Proposed an efficient initialization method for NMF
• Utilized statistical independence to obtain non-orthogonal bases and sources
– Orthogonality may not be preferable for NMF.
• The proposed initialization gives
– deeper minimization
– faster convergence
– better performance in fully supervised source separation
19/19
Thank you for your attention!