TRANSCRIPT
Clustering and Non-negative Matrix Factorization
Presented by Mohammad Sajjad Ghaemi
DAMAS Lab, Computer Science and Software Engineering Department,
Laval University
12 April 2013
Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization 1/36
Outline
- What is clustering?
- NMF overview
- Cost functions and multiplicative algorithms
- A geometric interpretation of NMF
- r-separable NMF
What is clustering?
According to the available label information, there are three categories of learning:
- Supervised learning: learning from labeled data.
- Semi-supervised learning: learning from both labeled and unlabeled data.
- Unsupervised learning: learning from unlabeled data.
Label information
Taken from [Jain,2010]
[Jain,2010] Jain, A.K., 2010. Data clustering: 50 years beyond k-means. Pattern Recognition Letters 31, 651–666.
Semi-supervised learning and manifold assumption
Taken from [Jain,2010]
Unsupervised learning or clustering
Taken from [Jain,2010]
What is clustering?
- Clustering is grouping similar objects together.
- Clusterings are usually not "right" or "wrong"; different clusterings and clustering criteria can reveal different things about the data.
- There is no objectively "correct" clustering algorithm; rather, "clustering is in the eye of the beholder".
- Clustering algorithms:
  - employ some notion of distance between objects;
  - have an explicit or implicit criterion defining what a good cluster is;
  - heuristically optimize that criterion to determine the clustering.
Comparing various clustering algorithms
Taken from [Jain,2010]
Outline
- What is clustering?
- NMF overview
- Cost functions and multiplicative algorithms
- A geometric interpretation of NMF
- r-separable NMF
Matrix factorization
- Question: given a non-negative matrix V, find non-negative matrix factors W and H such that
  V ≈ WH
- Answer: Non-negative Matrix Factorization (NMF).
- Advantage of non-negativity: interpretability.
- NMF is NP-hard.
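As a concrete illustration of the problem statement, here is a minimal NumPy sketch. The data are hypothetical: V is built from known non-negative factors, so an exact factorization V = WH exists by construction; the NMF problem is to recover such factors from V alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small example: a 6x4 non-negative matrix V built from
# rank-2 non-negative factors, so an exact NMF V = W H exists.
W = rng.uniform(0.0, 1.0, size=(6, 2))
H = rng.uniform(0.0, 1.0, size=(2, 4))
V = W @ H

# The NMF problem: given only V, find non-negative W and H with V ≈ W H.
assert np.all(V >= 0) and np.all(W >= 0) and np.all(H >= 0)
print(np.allclose(V, W @ H))  # True: this factor pair reconstructs V exactly
```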
Matrix factorization
- In general, the factorization of a matrix is not unique:
  - Principal Component Analysis
  - Singular Value Decomposition
  - Nyström Method
- Non-negative Matrix Factorization differs from the above methods.
- NMF enforces the constraint that the factors must be non-negative: all elements must be greater than or equal to zero.
Matrix factorization
- Is there any unique solution to the NMF problem? No: for any suitable invertible matrix D,
  V ≈ WH = (WD⁻¹)(DH)
  yields another non-negative factorization whenever WD⁻¹ and DH remain non-negative.
- NMF thus has the drawback of being highly ill-posed.
NMF is interesting because it does data clustering
Data clustering = matrix factorization: many unsupervised learning methods are closely related in a simple way (Ding, He, Simon, SDM 2005).
Numerical example
Taken from Katsuhiko Ishiguro, et al., Extracting Essential Structure from Data.
Heat map of NMF on the gene expression data
The left is the gene expression data where each column corresponds to a sample, the middle is the basis matrix, and the right is the coefficient matrix.
Taken from Yifeng Li, et al., The Non-Negative Matrix Factorization Toolbox for Biological Data Mining.
Heat map of NMF clustering on a yeast metabolic
The left is the gene expression data where each column corresponds to a gene, the middle is the basis matrix, and the right is the coefficient matrix.
Taken from Yifeng Li, et al., The Non-Negative Matrix Factorization Toolbox for Biological Data Mining.
Outline
- What is clustering?
- NMF overview
- Cost functions and multiplicative algorithms
- A geometric interpretation of NMF
- r-separable NMF
How to solve it
Two conventional and convergent algorithms:
- Square of the Euclidean distance:
  ||A − B||² = ∑_ij (A_ij − B_ij)²
- Generalized Kullback–Leibler divergence:
  D(A||B) = ∑_ij (A_ij log(A_ij / B_ij) − A_ij + B_ij)
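Both cost functions translate directly into NumPy. This is a sketch with hypothetical helper names (`frobenius_sq`, `generalized_kl`), assuming the usual convention 0·log 0 = 0 for the KL term:

```python
import numpy as np

def frobenius_sq(A, B):
    """Square of the Euclidean (Frobenius) distance: sum_ij (A_ij - B_ij)^2."""
    return np.sum((A - B) ** 2)

def generalized_kl(A, B):
    """Generalized KL divergence: sum_ij (A_ij log(A_ij / B_ij) - A_ij + B_ij)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Treat 0 * log(0 / b) as 0, matching the usual convention.
    ratio = np.where(A > 0, A / B, 1.0)
    return np.sum(np.where(A > 0, A * np.log(ratio), 0.0) - A + B)

A = np.array([[1.0, 2.0], [0.0, 3.0]])
print(frobenius_sq(A, A))    # 0.0: both costs vanish when A == B
print(generalized_kl(A, A))  # 0.0
```

Both are non-negative and equal zero only when A = B, which is what makes them usable as NMF objectives.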
How to minimize it
- Minimize ||V − WH||² or D(V||WH).
- The objective is convex in W alone or in H alone, but not convex in both variables jointly.
- Goal: finding local minima (the global minimum is tough to get).
- Gradient descent?
  - Slow convergence
  - Sensitive to the step size
  - Inconvenient for large data
Cost functions and gradient-based algorithm for the squared Euclidean distance
- Minimize ||V − WH||².
- Additive update: W_ik^new = W_ik − μ_ik (∇W)_ik, where ∇W is the gradient of the approximation objective function with respect to W.
- Without loss of generality, we can assume that ∇W consists of ∇⁺ and ∇⁻, positive and unsigned negative terms, respectively. That is,
  ∇W = ∇⁺ − ∇⁻
- According to the steepest gradient descent method, W_ik^new = W_ik − μ_ik (∇⁺_ik − ∇⁻_ik) can minimize the NMF objectives.
- By assuming that each matrix element has its own learning rate μ_ik = W_ik / ∇⁺_ik, we have
  W_ik^new = W_ik · ∇⁻_ik / ∇⁺_ik   (multiplicative update rule)
Multiplicative vs. additive rules
By taking the gradient of the cost function with respect to W we have
  ∇W = WHHᵀ − VHᵀ
  W_ik^new = W_ik − μ_ik ((WHHᵀ)_ik − (VHᵀ)_ik)
  μ_ik = W_ik / (WHHᵀ)_ik
  W_ik^new = W_ik (VHᵀ)_ik / (WHHᵀ)_ik
Similarly for H,
  H_ik^new = H_ik (WᵀV)_ik / (WᵀWH)_ik   (1)
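The two multiplicative rules can be sketched as a short NumPy routine. The function name `nmf_multiplicative` is ours, the data are hypothetical, and a small constant `eps` (not in the formulas above) is added to the denominators for numerical safety:

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=500, eps=1e-9, seed=0):
    """Multiplicative-update NMF for the squared Euclidean cost, rule (1)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(0.1, 1.0, size=(m, r))
    H = rng.uniform(0.1, 1.0, size=(r, n))
    for _ in range(n_iter):
        # H_ik <- H_ik (W^T V)_ik / (W^T W H)_ik
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # W_ik <- W_ik (V H^T)_ik / (W H H^T)_ik
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.uniform(size=(8, 6))
W, H = nmf_multiplicative(V, r=3)
assert np.all(W >= 0) and np.all(H >= 0)  # non-negativity is preserved
print(np.linalg.norm(V - W @ H))          # residual of the rank-3 approximation
```

Because the updates only multiply by non-negative ratios, non-negative initial factors stay non-negative at every iteration; no projection step is needed.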
Cost functions and gradient-based algorithm for the squared Euclidean distance
- The justification given above for the multiplicative update rule does not provide any theoretical guarantee that the resulting updates will monotonically decrease the objective function!
- Currently, the auxiliary function technique is the most widely accepted approach for proving the monotonicity of multiplicative updates.
- Given an objective function J(W) to be minimized, G(W, Wᵗ) is called an auxiliary function if it is a tight upper bound of J(W), that is,
  - G(W, Wᵗ) ≥ J(W)
  - G(W, W) = J(W)
  for any W and Wᵗ.
- Then iteratively applying the rule Wᵗ⁺¹ = argmin_W G(W, Wᵗ) results in a monotonic decrease of J(W).
Auxiliary function
Paraboloid function J(W) and its corresponding auxiliary function G(W, Wᵗ), where G(W, Wᵗ) ≥ J(W) and G(Wᵗ, Wᵗ) = J(Wᵗ).
Taken from Zhaoshui He, et al., IEEE Transactions on Neural Networks, 2011.
Auxiliary function
Using an auxiliary function G(W, Wᵗ) to minimize an objective function J(W). The auxiliary function is constructed around the current estimate of the minimizer; the next estimate is found by minimizing the auxiliary function, which provides an upper bound on the objective function. The procedure is iterated until it converges to a stationary point (generally, a local minimum) of the objective function.
Updates for H of the Euclidean distance
If K(hᵗ) is the diagonal matrix
  K_ii(hᵗ) = (WᵀWhᵗ)_i / hᵗ_i
then
  G(h, hᵗ) = J(hᵗ) + (h − hᵗ)ᵀ ∇J(hᵗ) + ½ (h − hᵗ)ᵀ K(hᵗ) (h − hᵗ)
is an auxiliary function for
  J(h) = ½ ∑_i (v_i − ∑_k W_ik h_k)²
Proof
- The update rule can be obtained by taking the derivative of G(h, hᵗ) with respect to h and setting it to zero:
  ∇J(hᵗ) + K(hᵗ)(h − hᵗ) = 0
  h = hᵗ − K(hᵗ)⁻¹ ∇J(hᵗ)
- Since J(h) is non-increasing under this auxiliary function, by writing the components of this equation explicitly we obtain
  hᵗ⁺¹_i = hᵗ_i (Wᵀv)_i / (WᵀWhᵗ)_i
- The update for W of the Euclidean distance can be shown similarly.
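The monotonicity that the auxiliary function guarantees is easy to check numerically. The following sketch uses hypothetical random data, keeps W fixed, applies the multiplicative H update repeatedly, and asserts that the Euclidean objective never increases:

```python
import numpy as np

def objective(V, W, H):
    """Half the squared Euclidean error, J = 1/2 ||V - WH||^2."""
    return 0.5 * np.sum((V - W @ H) ** 2)

rng = np.random.default_rng(2)
V = rng.uniform(size=(10, 7))
W = rng.uniform(0.1, 1.0, size=(10, 3))
H = rng.uniform(0.1, 1.0, size=(3, 7))

prev = objective(V, W, H)
for _ in range(50):
    # Columnwise this is h_i^{t+1} = h_i^t (W^T v)_i / (W^T W h^t)_i
    H *= (W.T @ V) / (W.T @ W @ H)
    cur = objective(V, W, H)
    assert cur <= prev + 1e-12  # each step never increases J
    prev = cur
print("objective decreased monotonically")
```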
Outline
- What is clustering?
- NMF overview
- Cost functions and multiplicative algorithms
- A geometric interpretation of NMF
- r-separable NMF
A geometric interpretation of NMF
- Given M and its NMF M ≈ UV, one can scale M and U such that they become column stochastic, which implies that V is column stochastic as well:
  M ≈ UV ⇐⇒ M′ = M D_m = (U D_u)(D_u⁻¹ V D_m) = U′V′
  M(:, j) = ∑_{i=1}^{k} U(:, i) V(i, j)   with   ∑_{i=1}^{k} V(i, j) = 1
- Therefore, the columns of M are convex combinations of the columns of U.
- In other terms, conv(M) ⊂ conv(U) ⊂ ∆ᵐ, where ∆ᵐ is the unit simplex.
- Solving exact NMF is equivalent to finding a polytope conv(U) between conv(M) and ∆ᵐ with the minimum number of vertices.
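The column-stochastic rescaling can be verified numerically. In this sketch the matrices are hypothetical random data, and D_m and D_u are taken as the diagonal matrices of inverse column sums:

```python
import numpy as np

rng = np.random.default_rng(3)
U = rng.uniform(0.1, 1.0, size=(5, 3))
Vc = rng.dirichlet(np.ones(3), size=4).T  # a column-stochastic V
M = U @ Vc

# D_m and D_u rescale each column of M and U to sum to one.
Dm = np.diag(1.0 / M.sum(axis=0))
Du = np.diag(1.0 / U.sum(axis=0))
Mp = M @ Dm                      # M' = M D_m
Up = U @ Du                      # U' = U D_u
Vp = np.linalg.inv(Du) @ Vc @ Dm # V' = D_u^{-1} V D_m

assert np.allclose(Mp.sum(axis=0), 1.0)  # M' is column stochastic
assert np.allclose(Up.sum(axis=0), 1.0)  # U' is column stochastic
assert np.allclose(Vp.sum(axis=0), 1.0)  # hence V' is column stochastic too
assert np.allclose(Mp, Up @ Vp)          # the factorization is preserved
```

The last two assertions are exactly the geometric claim: every column of M′ is a convex combination of the columns of U′.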
Outline
- What is clustering?
- NMF overview
- Cost functions and multiplicative algorithms
- A geometric interpretation of NMF
- r-separable NMF
Separability Assumption
I If the matrix V satisfies the separability condition, it is possible to compute optimal solutions in polynomial time.
I r-separable NMF
There exists an NMF (W, H) of rank r with V = WH where each column of W is equal to a column of V.
I V is r-separable ⇐⇒ V ≈ WH = W[I_r, H′]Π = [W, WH′]Π
for some H′ ≥ 0 whose columns sum to one and some permutation matrix Π, where I_r is the r-by-r identity matrix.
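The definition can be made concrete with a small construction (a sketch, not from the slides; names are assumptions): take a random W, convex weights H′, a random column permutation, and verify that every column of W reappears as a column of V.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r, n = 5, 3, 8

W = rng.random((m, r))

# H' >= 0 with columns summing to one (convex combination weights).
H_prime = rng.random((r, n - r))
H_prime /= H_prime.sum(axis=0)

# H = [I_r, H'] Pi for a random permutation Pi of the n columns.
perm = rng.permutation(n)
H = np.hstack([np.eye(r), H_prime])[:, perm]

V = W @ H  # V is r-separable by construction

# Every column of W reappears (exactly) as a column of V.
for j in range(r):
    matches = np.isclose(V, W[:, [j]]).all(axis=0)
    assert matches.any()
print("each column of W is a column of V")
```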
![Page 49: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization](https://reader034.vdocument.in/reader034/viewer/2022052021/6035fc904b3fc60cbc6eb9b3/html5/thumbnails/49.jpg)
A geometric interpretation of separability
![Page 50: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization](https://reader034.vdocument.in/reader034/viewer/2022052021/6035fc904b3fc60cbc6eb9b3/html5/thumbnails/50.jpg)
Separability Assumption
I Under separability, NMF reduces to the following problem:
Given a set of points (the normalized columns of V), identify the vertices of its convex hull.
I Despite the variety of methods proposed for the convex hull problem, we are still very far from knowing the best way to compute convex hulls in general dimensions.
I However, we want to design algorithms that are
I Fast: they should be able to deal with large-scale real-world problems where n is 10^6–10^9.
I Robust: if noise is added to the separable matrix, they should still be able to identify the right set of columns.
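One well-known heuristic in this spirit is the successive projection algorithm (SPA); it is not covered in the slides, so the sketch below is illustrative. SPA greedily picks the column with the largest residual norm (a convex function is maximized at a vertex) and then projects all columns onto the orthogonal complement of the selected one.

```python
import numpy as np

def spa(V, r):
    """Successive projection algorithm: greedily select r columns
    of V that are (under separability) the vertices of its conv hull."""
    R = V.astype(float).copy()
    selected = []
    for _ in range(r):
        j = int(np.argmax((R * R).sum(axis=0)))   # column of max norm
        selected.append(j)
        u = R[:, [j]]
        R -= u @ (u.T @ R) / (u.T @ u)            # R = (I - uu^T/||u||^2) R
    return selected

# Tiny r-separable example: columns 0..r-1 of V are the vertices.
rng = np.random.default_rng(3)
m, r, n = 5, 3, 10
W = rng.random((m, r))
H_prime = rng.random((r, n - r))
H_prime /= H_prime.sum(axis=0)
V = np.hstack([W, W @ H_prime])

print(sorted(spa(V, r)))
```

Since the remaining columns are convex combinations of the first r, SPA should recover exactly the indices of W's columns in this noiseless example.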
![Page 53: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization](https://reader034.vdocument.in/reader034/viewer/2022052021/6035fc904b3fc60cbc6eb9b3/html5/thumbnails/53.jpg)
Separability Assumption
I For a separable matrix V, we have

V ≈ WH = W[I_r, H′]Π = [W, WH′]Π = VX

where Π is a permutation matrix, the columns of H′ ≥ 0 sum to one, and (since [W, WH′] = VΠ^T)

X = Π^T ( I_r  H′ ; 0_{(n−r)×r}  0_{(n−r)×(n−r)} ) Π.

I Therefore, for r-separable NMF we need to solve the following optimization problem, subject to appropriate constraints on X:

||V(:, i) − VX(:, i)||_2 ≤ ε for all i.
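The self-expression V ≈ VX can be verified on a toy separable matrix. A minimal sketch (construction and names are assumptions, not from the slides): build a separable V, form X = Π^T [I_r, H′; 0, 0] Π, and check that every column residual is (numerically) zero in the noiseless case.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r, n = 5, 3, 8

W = rng.random((m, r))
H_prime = rng.random((r, n - r))
H_prime /= H_prime.sum(axis=0)           # columns of H' sum to one

perm = rng.permutation(n)
P = np.eye(n)[:, perm]                   # permutation matrix Pi

V = np.hstack([W, W @ H_prime]) @ P      # r-separable V

# X = Pi^T [[I_r, H'], [0, 0]] Pi expresses every column of V as a
# convex combination of the r "vertex" columns of V itself.
B = np.zeros((n, n))
B[:r, :r] = np.eye(r)
B[:r, r:] = H_prime
X = P.T @ B @ P

# Noiseless case: the residual ||V(:,i) - VX(:,i)|| is zero for all i.
residuals = np.linalg.norm(V - V @ X, axis=0)
assert np.all(residuals <= 1e-10)
print("max column residual:", residuals.max())
```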
![Page 54: Clustering and Non-negative Matrix Factorizationquimper/Seminaires/files/Mohammad...Presented by Mohammad Sajjad Ghaemi, Laboratory DAMAS Clustering and Non-negative Matrix Factorization](https://reader034.vdocument.in/reader034/viewer/2022052021/6035fc904b3fc60cbc6eb9b3/html5/thumbnails/54.jpg)
Question
Questions?