clustering and non-negative matrix factorizationquimper/seminaires/files/mohammad...presented by...

54
Clustering and Non-negative Matrix Factorization Presented by Mohammad Sajjad Ghaemi DAMAS LAB, Computer Science and Software Engineering Department, Laval University 12 April 2013 Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 1/36

Upload: others

Post on 07-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Clustering and Non-negative MatrixFactorization

Presented by Mohammad Sajjad Ghaemi

DAMAS LAB,Computer Science and Software Engineering Department,

Laval University

12 April 2013

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 1/36

Page 2: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Outline

I What is clustering ?

I NMF overview

I Cost functions and multiplicative algorithms

I A geometric interpretation of NMF

I r-separable NMF

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 2/36

Page 3: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

What is clustering ?

According to the label information we have 3 categories of learning

I Supervised learningLearning from labeled data.

I Semi-supervised learningLearning from both labeled and unlabeled data.

I Unsupervised learningLearning from unlabeled data.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 3/36

Page 4: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

What is clustering ?

According to the label information we have 3 categories of learning

I Supervised learningLearning from labeled data.

I Semi-supervised learningLearning from both labeled and unlabeled data.

I Unsupervised learningLearning from unlabeled data.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 3/36

Page 5: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

What is clustering ?

According to the label information we have 3 categories of learning

I Supervised learningLearning from labeled data.

I Semi-supervised learningLearning from both labeled and unlabeled data.

I Unsupervised learningLearning from unlabeled data.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 3/36

Page 6: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Label information

Taken from [Jain,2010]

[Jain,2010] Jain, A.K., 2010. Data clustering : 50 years beyondk-means. Pattern Recognition Letters 31, 651666.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 4/36

Page 7: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Semi-supervised learning and manifold assumption

Taken from [Jain,2010]

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 5/36

Page 8: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Unsupervised learning or clustering

Taken from [Jain,2010]

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 6/36

Page 9: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

What is clustering ?I Clustering is grouping similar objects together.

I Clusterings are usually not ”right” or ”wrong”, differentclusterings/clustering criteria can reveal different things aboutthe data.

I There is no objectively ”correct” clustering algorithm, but”clustering is in the eye of the beholder”.

I Clustering algorithms :

I Employ some notion of distance between objects.

I Have an explicit or implicit criterion defining what a goodcluster is.

I Heuristically optimize that criterion to determine theclustering.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 7/36

Page 10: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

What is clustering ?I Clustering is grouping similar objects together.

I Clusterings are usually not ”right” or ”wrong”, differentclusterings/clustering criteria can reveal different things aboutthe data.

I There is no objectively ”correct” clustering algorithm, but”clustering is in the eye of the beholder”.

I Clustering algorithms :

I Employ some notion of distance between objects.

I Have an explicit or implicit criterion defining what a goodcluster is.

I Heuristically optimize that criterion to determine theclustering.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 7/36

Page 11: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

What is clustering ?I Clustering is grouping similar objects together.

I Clusterings are usually not ”right” or ”wrong”, differentclusterings/clustering criteria can reveal different things aboutthe data.

I There is no objectively ”correct” clustering algorithm, but”clustering is in the eye of the beholder”.

I Clustering algorithms :

I Employ some notion of distance between objects.

I Have an explicit or implicit criterion defining what a goodcluster is.

I Heuristically optimize that criterion to determine theclustering.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 7/36

Page 12: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

What is clustering ?I Clustering is grouping similar objects together.

I Clusterings are usually not ”right” or ”wrong”, differentclusterings/clustering criteria can reveal different things aboutthe data.

I There is no objectively ”correct” clustering algorithm, but”clustering is in the eye of the beholder”.

I Clustering algorithms :

I Employ some notion of distance between objects.

I Have an explicit or implicit criterion defining what a goodcluster is.

I Heuristically optimize that criterion to determine theclustering.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 7/36

Page 13: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Comparing various clustering algorithms

Taken from [Jain,2010]

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 8/36

Page 14: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Outline

I What is clustering ?

I NMF overview

I Cost functions and multiplicative algorithms

I A geometric interpretation of NMF

I r-separable NMF

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 9/36

Page 15: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

I NMF (Non-negative Matrix Factorization)

Question :Given a non-negative matrix V , find non-negative matrixfactors W and H,

V ≈WH

Answer :Non-negative Matrix Factorization (NMF)

Advantage of non-negativity Interpretability

I NMF is NP-hard

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 10/36

Page 16: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

I NMF (Non-negative Matrix Factorization)

Question :Given a non-negative matrix V , find non-negative matrixfactors W and H,

V ≈WH

Answer :Non-negative Matrix Factorization (NMF)

Advantage of non-negativity Interpretability

I NMF is NP-hard

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 10/36

Page 17: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 11/36

Page 18: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

I Generally, factorization of matrices is not unique

I Principal Component Analysis

I Singular Value Decomposition

I Nystrom Method

I Non-negative Matrix Factorization differs from the abovemethods.

I NMF enforces the constraint that the factors must benon-negative.

I All elements must be equal to or greater than zero.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 12/36

Page 19: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

I Generally, factorization of matrices is not unique

I Principal Component Analysis

I Singular Value Decomposition

I Nystrom Method

I Non-negative Matrix Factorization differs from the abovemethods.

I NMF enforces the constraint that the factors must benon-negative.

I All elements must be equal to or greater than zero.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 12/36

Page 20: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

I Generally, factorization of matrices is not unique

I Principal Component Analysis

I Singular Value Decomposition

I Nystrom Method

I Non-negative Matrix Factorization differs from the abovemethods.

I NMF enforces the constraint that the factors must benon-negative.

I All elements must be equal to or greater than zero.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 12/36

Page 21: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

I Generally, factorization of matrices is not unique

I Principal Component Analysis

I Singular Value Decomposition

I Nystrom Method

I Non-negative Matrix Factorization differs from the abovemethods.

I NMF enforces the constraint that the factors must benon-negative.

I All elements must be equal to or greater than zero.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 12/36

Page 22: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

I Is there any unique solution to the NMF problem ?

I

V ≈WD−1DH

I NMF has the drawback of being highly ill-posed.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 13/36

Page 23: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Matrix factorization

I Is there any unique solution to the NMF problem ?

I

V ≈WD−1DH

I NMF has the drawback of being highly ill-posed.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 13/36

Page 24: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

NMF is interesting because it does data clustering

Data Clustering = Matrix Factorizations

Many unsupervised learning methods are closely related in a simpleway (Ding, He, Simon, SDM 2005).

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 14/36

Page 25: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Numerical example

taken from Katsuhiko Ishiguro, et al. Extracting Essential Structurefrom Data.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 15/36

Page 26: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Heat map of NMF on the gene expression data

The left is the gene expression data where each columncorresponds to a sample, the middle is the basis matrix, and the

right is the coefficient matrix.taken from Yifeng Li, et al. The Non-Negative Matrix

Factorization Toolbox for Biological Data MiningPresented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 16/36

Page 27: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Heat map of NMF clustering on a yeast metabolic

The left is the gene expression data where each columncorresponds to a gene, the middle is the basis matrix, and the right

is the coefficient matrix.taken from Yifeng Li, et al. The Non-Negative Matrix

Factorization Toolbox for Biological Data MiningPresented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 17/36

Page 28: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Outline

I What is clustering ?

I NMF overview

I Cost functions and multiplicative algorithms

I A geometric interpretation of NMF

I r-separable NMF

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 18/36

Page 29: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

How to solve it

Two conventional and convergent algorithms

I Square of the Euclidean distance

||A− B||2 =∑ij

(Aij − Bij)2

I Generalized Kullback-Leibler divergence

D(A||B) =∑ij

(Aij logAij

Bij− Aij + Bij)

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 19/36

Page 30: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

How to solve it

Two conventional and convergent algorithms

I Square of the Euclidean distance

||A− B||2 =∑ij

(Aij − Bij)2

I Generalized Kullback-Leibler divergence

D(A||B) =∑ij

(Aij logAij

Bij− Aij + Bij)

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 19/36

Page 31: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

How to minimize it

I Minimize ||V −WH||2 or D(V ||WH)

I Convex in W only or H only (not convex in both variables)

I Goal-finding local minima (tough to get global minima)

I Gradient descent ?I Slow convergence

I Sensitive to the step size

I inconvenient for large data

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 20/36

Page 32: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Cost functions and gradient based algorithm for squareEuclidean distance

I Minimize ||V −WH||2

I W newik = Wik − µik5W

where 5W is the gradient of the approximation objectivefunction with respect to W .

I Without loss of generality, we can assume that 5W consistsof 5+ and 5−, positive and unsigned negative terms,respectively. That is,

5W = 5+ −5−

I According to the steepest gradient descent methodW new

ik = Wik − µik(5+ik −5

−ik) can minimize the NMF

objectives.

I By assuming that each matrix element has its own learningrate µik = Wik

5+ik

we have,

W newik = Wik

5−ik5+

ik

(Multiplicative update rule)

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 21/36

Page 33: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Multiplicative vs. Additive rules

By taking the gradient of the cost function with respect to W wehave,

5W = WHHT − VHT

W newik = Wik − µik((WHHT )ik − (VHT )ik)

µik =Wik

(WHHT )ik

W newik = Wik

(VHT )ik(WHHT )ik

Similar for H,

Hnewik = Hik

(W TV )ik(W TWH)ik

(1)

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 22/36

Page 34: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Cost functions and gradient based algorithm for squareEuclidean distance

I The provided justification for multiplicative update rule doesnot have any theoretical guarantee that the resulting updateswill monotonically decrease the objective function !

I Currently, the auxiliary function technique is the most widelyaccepted approach for monotonicity proof of multiplicativeupdates.

I Given an objective function J (W ) to be minimized,G(W ,W t) is called an auxiliary function if it is a tight upperbound of J (W ), that is,

I G(W ,W t) ≥ J (W )I G(W ,W ) = J (W )

for any W and W t .I Then iteratively applying the rule

W new = arg minW t G (W t ,W ), results in a monotonicallydecrease of J (W ).

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 23/36

Page 35: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Auxiliary function

Paraboloid function J (W ) and its corresponding auxiliary functionG(W ,W t), where G(W ,W t) ≥ J (W ) and G(W t ,W t) = J (W t)

taken from Zhaoshui He, et al. IEEE TRANSACTIONS ONNEURAL NETWORKS 2011

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 24/36

Page 36: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Auxiliary function

Using an auxiliary function G(W ,W t) to minimize an objectivefunction J (W ). The auxiliary function is constructed around thecurrent estimate of the minimizer ; the next estimate is found byminimizing the auxiliary function, which provides an upper bound

on the objective function. The procedure is iterated until itconverges to a stationary point (generally, a local minimum) of the

objective function.Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 25/36

Page 37: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Updates for H of Euclidean distance

If K (ht) is the diagonal matrix

Kii (ht) =

(W tWht)ihti

then

G (h, ht) = J(ht) + (h − ht)T 5 J(ht) +1

2(h − ht)TK (ht)(h − ht)

is an auxiliary function for

J(h) =1

2

∑i

(vi −∑k

Wikhi )

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 26/36

Page 38: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Proof

I The update rule can be obtained by taking the derivative ofthe G (h, ht) w.r.t h and then set it to zero,

5J(ht) + (h − ht)K (ht) = 0

h = ht − K (ht)−1 5 J(ht)

I Since J(ht) is non-increasing under this auxiliary function, bywriting the components of this equation explicitly, we obtain,

ht+1i = hti

(W tV )i(W tWht)i

I Can be shown similarly for W of Euclidean distance.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 27/36

Page 39: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Outline

I What is clustering ?

I NMF overview

I Cost functions and multiplicative algorithms

I A geometric interpretation of NMF

I r-separable NMF

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 28/36

Page 40: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

A geometric interpretation of NMFI Given M and its NMF M ≈ UV one can scale M and U such

that they become column stochastic implying that V iscolumn stochastic :

M ≈ UV ⇐⇒ M ′ = MDm = (UDu)(D−1u VDm) = U ′V ′

M(:, j) =k∑

i=1

U(:, i)V (i , j) withk∑

i=1

V (i , j) = 1

I Therefore, the columns of M are convex combination of thecolumns of U.

I In other terms, conv(M) ⊂ conv(U) ⊂ ∆m where ∆m is theunit simplex.

I Solving exact NMF is equivalent to finding a poltyopeconv(U) between conv(M) and ∆m with minimum number ofvertices.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 29/36

Page 41: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

A geometric interpretation of NMFI Given M and its NMF M ≈ UV one can scale M and U such

that they become column stochastic implying that V iscolumn stochastic :

M ≈ UV ⇐⇒ M ′ = MDm = (UDu)(D−1u VDm) = U ′V ′

M(:, j) =k∑

i=1

U(:, i)V (i , j) withk∑

i=1

V (i , j) = 1

I Therefore, the columns of M are convex combination of thecolumns of U.

I In other terms, conv(M) ⊂ conv(U) ⊂ ∆m where ∆m is theunit simplex.

I Solving exact NMF is equivalent to finding a poltyopeconv(U) between conv(M) and ∆m with minimum number ofvertices.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 29/36

Page 42: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

A geometric interpretation of NMFI Given M and its NMF M ≈ UV one can scale M and U such

that they become column stochastic implying that V iscolumn stochastic :

M ≈ UV ⇐⇒ M ′ = MDm = (UDu)(D−1u VDm) = U ′V ′

M(:, j) =k∑

i=1

U(:, i)V (i , j) withk∑

i=1

V (i , j) = 1

I Therefore, the columns of M are convex combination of thecolumns of U.

I In other terms, conv(M) ⊂ conv(U) ⊂ ∆m where ∆m is theunit simplex.

I Solving exact NMF is equivalent to finding a poltyopeconv(U) between conv(M) and ∆m with minimum number ofvertices.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 29/36

Page 43: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

A geometric interpretation of NMFI Given M and its NMF M ≈ UV one can scale M and U such

that they become column stochastic implying that V iscolumn stochastic :

M ≈ UV ⇐⇒ M ′ = MDm = (UDu)(D−1u VDm) = U ′V ′

M(:, j) =k∑

i=1

U(:, i)V (i , j) withk∑

i=1

V (i , j) = 1

I Therefore, the columns of M are convex combination of thecolumns of U.

I In other terms, conv(M) ⊂ conv(U) ⊂ ∆m where ∆m is theunit simplex.

I Solving exact NMF is equivalent to finding a poltyopeconv(U) between conv(M) and ∆m with minimum number ofvertices.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 29/36

Page 44: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

A geometric interpretation of NMF

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 30/36

Page 45: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Outline

I What is clustering ?

I NMF overview

I Cost functions and multiplicative algorithms

I A geometric interpretation of NMF

I r-separable NMF

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 31/36

Page 46: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Separability Assumption

I If matrix V satisfies separability condition, it is possible tocompute optimal solutions in polynomial time.

I r-separable NMF

There exists an NMF (W ,H) of rank r with V = WH whereeach column of W is equal to a column of V .

I V is r-separable ⇐⇒ V ≈WH = W [Ir ,H′]Π = [W ,WH ′]Π

For some H ′ ≥ 0 with columns sum to one, some permutationmatrix Π, and Ir is the r-by-r identity matrix.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 32/36

Page 47: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Separability Assumption

I If matrix V satisfies separability condition, it is possible tocompute optimal solutions in polynomial time.

I r-separable NMF

There exists an NMF (W ,H) of rank r with V = WH whereeach column of W is equal to a column of V .

I V is r-separable ⇐⇒ V ≈WH = W [Ir ,H′]Π = [W ,WH ′]Π

For some H ′ ≥ 0 with columns sum to one, some permutationmatrix Π, and Ir is the r-by-r identity matrix.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 32/36

Page 48: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Separability Assumption

I If matrix V satisfies separability condition, it is possible tocompute optimal solutions in polynomial time.

I r-separable NMF

There exists an NMF (W ,H) of rank r with V = WH whereeach column of W is equal to a column of V .

I V is r-separable ⇐⇒ V ≈WH = W [Ir ,H′]Π = [W ,WH ′]Π

For some H ′ ≥ 0 with columns sum to one, some permutationmatrix Π, and Ir is the r-by-r identity matrix.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 32/36

Page 49: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

A geometric interpretation of separability

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 33/36

Page 50: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Separability Assumption

I Under separability, NMF reduces to the following problem :

Given a set of points (the normalized columns of V ), identifythe vertices of its convex hull.

I We are still very far from knowing the best ways to computethe convex hull for general dimensions, despite the varietymethods proposed for convex hull problem.

I However, we want to design algorithms which are

I Fast : they should be able to deal with large-scale real-worldproblems where n is 106 – 109.

I Robust : if noise is added to the separable matrix, they shouldbe able to identifying the right set of columns.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 34/36

Page 51: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Separability Assumption

I Under separability, NMF reduces to the following problem :

Given a set of points (the normalized columns of V ), identifythe vertices of its convex hull.

I We are still very far from knowing the best ways to computethe convex hull for general dimensions, despite the varietymethods proposed for convex hull problem.

I However, we want to design algorithms which are

I Fast : they should be able to deal with large-scale real-worldproblems where n is 106 – 109.

I Robust : if noise is added to the separable matrix, they shouldbe able to identifying the right set of columns.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 34/36

Page 52: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Separability Assumption

I Under separability, NMF reduces to the following problem :

Given a set of points (the normalized columns of V ), identifythe vertices of its convex hull.

I We are still very far from knowing the best ways to computethe convex hull for general dimensions, despite the varietymethods proposed for convex hull problem.

I However, we want to design algorithms which are

I Fast : they should be able to deal with large-scale real-worldproblems where n is 106 – 109.

I Robust : if noise is added to the separable matrix, they shouldbe able to identifying the right set of columns.

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 34/36

Page 53: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Separability Assumption

I For a separable matrix V , we have,

V ≈ WH

= W [Ir ,H′]Π

= [W ,WH ′]Π

= [W ,WH ′]

(Ir H ′

0(n−r)×r 0(n−r)×(n−r)

= VX

where Π is a permutation matrix, the columns of H ′ ≥ 0 sumto one.

I Therefore for r-separable NMF, we need to solve the followingoptimization problem according to some constraints,

||V (:, i)− VX (:, i)||2 ≤ ε for all i .

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 35/36

Page 54: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization

Question

Question ?

Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 36/36