
Non-Negative Matrix Factorization

Hamdi Jenzri


Outline
- Introduction
- Non-Negative Matrix Factorization (NMF)
- Cost functions
- Algorithms
  - Multiplicative update algorithm
  - Gradient descent algorithm
  - Alternating least squares algorithm
- NMF vs. SVD
- Initialization Issue
- Experiments
  - Image Dataset
  - Landmine Dataset
- Conclusion & Potential Future Work


Introduction
In many data-processing tasks, negative numbers are physically meaningless:
- Pixel values in an image
- Vector representations of words in a text document
- …
Classical tools cannot guarantee to maintain non-negativity:
- Principal Component Analysis
- Singular Value Decomposition
- Vector Quantization
- …
This motivates Non-negative Matrix Factorization.



Non-Negative Matrix Factorization
Given a non-negative matrix V, find non-negative matrix factors W and H such that:
V ≈ WH
- V is an n×m matrix whose columns are n-dimensional data vectors, where m is the number of vectors in the data set
- W is an n×r non-negative matrix
- H is an r×m non-negative matrix
Usually, r is chosen to be smaller than both n and m, so that W and H are smaller than the original matrix V.
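To make the compression concrete, here is a minimal sketch with made-up sizes (n, m, and r below are illustrative only): the factors store r(n + m) numbers instead of nm.

    n = 100; m = 500; r = 10;     % hypothetical sizes for illustration
    V = rand(n, m);               % non-negative data matrix (placeholder data)
    W = rand(n, r);               % basis matrix
    H = rand(r, m);               % coefficient matrix
    entries_V  = n * m;           % 50,000 stored entries
    entries_WH = r * (n + m);     % 6,000 stored entries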


Non-Negative Matrix Factorization
Significance of this approximation:
- It can be rewritten column by column as v ≈ Wh, where v and h are the corresponding columns of V and H
- Each data vector v is approximated by a linear combination of the columns of W, weighted by the components of h
- Therefore, W can be regarded as containing a basis that is optimized for the linear approximation of the data in V
- Since relatively few basis vectors are used to represent many data vectors, a good approximation can only be achieved if the basis vectors discover structure that is latent in the data
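In code, this column-wise view says that column j of V is approximated by W times column j of H; a minimal sketch reusing the illustrative W and H above (the index j is arbitrary):

    j = 1;                                   % any column index
    v_approx = W * H(:, j);                  % linear combination of the columns of W
    % the same combination written out explicitly:
    v_sum = zeros(n, 1);
    for k = 1:r
        v_sum = v_sum + H(k, j) * W(:, k);   % weight H(k,j) times basis vector k
    end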



Cost functions
To find an approximate factorization V ≈ WH, we first need to define cost functions that quantify the quality of the approximation. Such cost functions can be constructed using some measure of distance between two non-negative matrices A and B:
- Square of the Euclidean distance between A and B:
  ||A − B||² = ∑ij (Aij − Bij)²
- Divergence of A from B:
  D(A||B) = ∑ij (Aij log(Aij/Bij) − Aij + Bij)
  This reduces to the Kullback-Leibler divergence, or relative entropy, when ∑ij Aij = ∑ij Bij = 1.
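Both measures are one-liners to compute; a minimal sketch for same-size non-negative matrices A and B (the eps guard against log(0) is my addition, not part of the definition):

    d_euclid = sum(sum((A - B).^2));                                % squared Euclidean distance
    d_div    = sum(sum(A .* log((A + eps) ./ (B + eps)) - A + B)); % divergence D(A||B)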


Cost functions
The formulation of the NMF problem as an optimization problem can be stated as:
- Minimize f(W, H) = ||V − WH||² with respect to W and H, subject to the constraints W, H ≥ 0
- Minimize f(W, H) = D(V||WH) with respect to W and H, subject to the constraints W, H ≥ 0
These functions are convex in W only or in H only; they are not convex in both variables together. This is why practical algorithms alternate: fix one factor and update the other, then swap.



Multiplicative update algorithm
- Due to Lee and Seung (2001)
- Converges to a stationary point that may or may not be a local minimum
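For the squared Euclidean cost, the Lee and Seung updates multiply each factor elementwise by a ratio of non-negative terms, which keeps W and H non-negative throughout. A minimal sketch (maxiter and the eps guards in the denominators are my additions):

    W = rand(n, r);  H = rand(r, m);               % random non-negative start
    for iter = 1:maxiter
        H = H .* (W' * V) ./ (W' * W * H + eps);   % multiplicative update for H
        W = W .* (V * H') ./ (W * H * H' + eps);   % multiplicative update for W
    end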


Gradient descent algorithm
- Each factor is moved along the negative gradient of the cost, scaled by a step-size parameter (one for W, one for H)
- A projection step is commonly used after each update rule to set negative elements to zero (Chu et al., 2004; Lee and Seung, 2001)

    W = rand(n, r); % initialize W
    H = rand(r, m); % initialize H
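Continuing from the initialization above, a minimal sketch of the projected update loop for the squared Euclidean cost (epsW, epsH, and maxiter are illustrative names; the constant factor of the gradient is folded into the step sizes):

    for iter = 1:maxiter
        H = H - epsH * (W' * W * H - W' * V);   % gradient step in H
        H(H < 0) = 0;                           % projection: clip negatives to zero
        W = W - epsW * (W * H * H' - V * H');   % gradient step in W
        W(W < 0) = 0;                           % projection: clip negatives to zero
    end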


Alternating least squares algorithm
- It aids sparsity
- More flexible: able to escape a poor path
- (Paatero and Tapper, 1994)
Only W is initialized, with W = rand(n, r); H and W are then obtained by alternately solving least squares problems involving V and VT, projecting negative entries to zero after each solve.
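A minimal sketch of the basic ALS iteration along these lines (the normal-equation form follows Berry et al., 2006; maxiter is illustrative):

    W = rand(n, r);                       % only W needs an initial guess
    for iter = 1:maxiter
        H = (W' * W) \ (W' * V);          % least-squares solve for H given W
        H(H < 0) = 0;                     % project negative entries to zero
        W = ((H * H') \ (H * V'))';       % least-squares solve for W given H
        W(W < 0) = 0;                     % project negative entries to zero
    end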


Convergence
- There is no assurance of convergence to a local minimum
- No uniqueness: if (W, H) is a minimum, then (WD, D⁻¹H) is too, for any invertible matrix D such that WD and D⁻¹H remain non-negative (for example, a positive diagonal scaling)
- Still, NMF is quite appealing for data mining applications since, in practice, even local minima can provide desirable properties such as data compression and feature extraction
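A quick numeric check of this scaling ambiguity, reusing any factors W and H from above (the diagonal D here is one hypothetical choice that keeps both factors non-negative):

    D  = diag(rand(r, 1) + 0.5);      % positive diagonal, hence invertible
    W2 = W * D;                        % rescaled basis, still non-negative
    H2 = D \ H;                        % inversely rescaled coefficients, still non-negative
    norm(W * H - W2 * H2, 'fro')       % ~0: the product, and thus the cost, is unchanged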



NMF vs. SVD

Property                                     NMF       SVD
Formulation                                  A = WH    A = UΣVT
Optimality (in terms of squared distance)              ✓
Speed & robustness                                     ✓
Uniqueness                                             ✓
Sensitivity to initialization                ✓
Orthogonality                                          ✓
Sparsity                                     ✓
Non-negativity                               ✓
Interpretability                             ✓



Initialization Issue
- NMF algorithms are iterative and require an initialization of W and/or H
- A good initialization can improve:
  - Speed
  - Accuracy
  - Convergence
- Some initializations (Random Vcol is sketched below):
  - Random initialization
  - Centroid initialization (clustering)
  - SVD-centroid initialization
  - Random Vcol
  - Random C initialization (densest columns)
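As described by Langville et al., Random Vcol builds each column of the initial W as the average of a few randomly chosen columns of V; a minimal sketch (the sample count q = 5 is an illustrative choice):

    q  = 5;                              % columns of V averaged per basis vector
    W0 = zeros(n, r);
    for j = 1:r
        idx = randi(m, q, 1);            % q random column indices into V
        W0(:, j) = mean(V(:, idx), 2);   % average of the sampled data columns
    end
    H0 = rand(r, m);                     % H can still be started at random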



Image Dataset
[Figure: sample factorization; coefficient matrix H values omitted]
||V − WH||F = 156.7879

Different initialization
[Figure: factorization after a different initialization; coefficient matrix H values omitted]
||V − WH||F = 25.6828

Another initialization
[Figure: factorization after another initialization; coefficient matrix H values omitted]
||V − WH||F = 101.8359

Landmine Dataset

Data set used: BAE-LMED


[Figure: results varying r for the multiplicative update algorithm, random initialization]
[Figure: results varying the initialization for the multiplicative update algorithm, r = 9]
[Figure: results comparing algorithms for the best-found r = 9, random initialization]
[Figure: results comparing the best combination to the basic EHD performance]

[Figure: columns of H for false alarms (FA) vs. mines]
[Figures: results on different datasets]


Conclusion & Potential Future Work
- NMF presents a way to represent the data in a different basis
- Despite its convergence and initialization issues, it is quite appealing in many data mining tasks
- Other formulations exist for the NMF problem:
  - Constrained NMF
  - Incremental NMF
  - Bayesian NMF
- Future work will include:
  - Trying other landmine datasets
  - Bayesian NMF


References
- Michael W. Berry et al., "Algorithms and Applications for Approximate Nonnegative Matrix Factorization", June 2006.
- Daniel D. Lee and H. Sebastian Seung, "Algorithms for Non-negative Matrix Factorization", Advances in Neural Information Processing Systems, 2001.
- Chih-Jen Lin, "Projected Gradient Methods for Non-negative Matrix Factorization", Neural Computation, June 2007.
- Amy N. Langville et al., "Initializations for Nonnegative Matrix Factorization", KDD 2006.
