Diffusion Maps and Spectral Clustering
Author : Ronald R. Coifman et al. (Yale University)
Presenter : Nilanjan Dasgupta (SIG Inc.)
Machine Learning Seminar Series
1/14
[Figure: data points (each marker is one datum) lying on a low-dimensional manifold in X-Y-Z space]
Motivation
• Data lie on a low-dimensional manifold. The shape of the manifold is not known a priori.
• PCA fails to give a compact representation because the manifold is not linear.
• Spectral clustering as a non-linear dimensionality reduction scheme.
2/14
Outline
• Non-linear dimensionality reduction and spectral clustering.
• Diffusion based probabilistic interpretation of spectral methods.
• Eigenvectors of the normalized graph Laplacian are discrete approximations of the eigenfunctions of the continuous Fokker-Planck operator.
• Justification of the success of spectral clustering.
• Conclusions.
3/14
Spectral clustering
• Normalized graph Laplacian:
Given N data points $\{x_n\}_{n=1}^{N}$, where each $x_n \in \mathbb{R}^p$, the similarity between any two points $x_i$ and $x_j$ is given by
$$L_{i,j} = K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\varepsilon}\right),$$
a Gaussian kernel of width $\varepsilon$, with a diagonal normalization matrix
$$D = \mathrm{Diag}([D_1, \ldots, D_N]), \qquad D_i = \sum_{j=1}^{N} L_{i,j}.$$
• Solve the normalized eigenvalue problem
$$L\psi = \lambda D\psi \quad \text{or} \quad M\psi = \lambda\psi, \quad \text{where } M = D^{-1}L.$$
• Use the first few eigenvectors of M as a low-dimensional representation of the data, or as good coordinates for clustering.
4/14
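The construction on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `spectral_embedding`, the two-blob toy data, and the choice `eps=1.0` are assumptions made here, and the generalized problem is solved through the equivalent symmetric matrix $D^{-1/2} L D^{-1/2}$.

```python
import numpy as np

def spectral_embedding(X, eps, n_vectors):
    """Top eigenpairs of M = D^{-1} L for a Gaussian kernel of width eps."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    L = np.exp(-sq_dists / (2.0 * eps))       # L[i, j] = K(x_i, x_j)
    D = L.sum(axis=1)                          # diagonal entries D_i
    M = L / D[:, None]                         # M = D^{-1} L
    # L psi = lambda D psi is equivalent to a symmetric eigenproblem for
    # D^{-1/2} L D^{-1/2}; eigh on that matrix keeps everything real.
    sym = L / np.sqrt(np.outer(D, D))
    lam, V = np.linalg.eigh(sym)               # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]             # reorder to descending
    psi = V / np.sqrt(D)[:, None]              # columns solve M psi = lambda psi
    return M, lam[:n_vectors], psi[:, :n_vectors]

# Hypothetical toy data: two tight blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(2.0, 0.1, size=(20, 2))])
M, lam, psi = spectral_embedding(X, eps=1.0, n_vectors=3)

# The sign of the second eigenvector separates the two blobs.
labels = (psi[:, 1] > 0).astype(int)
```

For two well-separated groups, thresholding the second eigenvector is the simplest use of these coordinates; with more clusters one would typically run k-means on several eigenvectors.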
Spectral Clustering : previous work
• Non-linear dimensionality reduction by S. Roweis and L. Saul (published in Science, 2000).
• Belkin & Niyogi (NIPS'02) show that if data are sampled uniformly from the low-dimensional manifold, the first few eigenvectors of $M = D^{-1}L$ are discrete approximations of the eigenfunctions of the Laplace-Beltrami operator on the manifold.
• Meila & Shi (AISTATS'01) interpret M as a stochastic matrix representing a random walk on the graph.
$$M_{i,j} = \frac{L_{i,j}}{D_i}, \qquad \sum_{j=1}^{N} M_{i,j} = 1$$
$$p(x_{t+1} = x_j \mid x_t = x_i) = M_{i,j} = \frac{K(x_i, x_j)}{\sum_{l=1}^{N} K(x_i, x_l)}$$
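The random-walk reading of M can be made concrete with a short simulation. This is a sketch under assumptions: the helper names `transition_matrix` and `random_walk`, the random test points, and `eps=0.5` are all illustrative choices, not part of the talk.

```python
import numpy as np

def transition_matrix(X, eps):
    """Row-stochastic M with M[i, j] = K(x_i, x_j) / sum_l K(x_i, x_l)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * eps))
    return K / K.sum(axis=1, keepdims=True)

def random_walk(M, start, n_steps, rng):
    """Sample a path of node indices; each step draws from the current row of M."""
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        path.append(int(rng.choice(M.shape[0], p=M[current])))
    return path

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))          # hypothetical data points
M = transition_matrix(X, eps=0.5)
path = random_walk(M, start=0, n_steps=50, rng=rng)
```

Each row of M is a probability distribution over the next node, which is exactly what makes M a stochastic matrix.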
5/14
Diffusion distance and Diffusion map
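$$M_{i,j} = \frac{K(x_i, x_j)}{\sum_{l=1}^{N} K(x_i, x_l)}$$
• A symmetric matrix $M_s$ can be derived from M as
$$M_s = D^{1/2} M D^{-1/2}$$
• M and $M_s$ have the same N eigenvalues $\lambda_k$, and
$$M_s = \sum_{k=0}^{N-1} \lambda_k v_k v_k^T$$
$\phi_k = v_k D^{1/2}$ : left eigenvector of M
$\psi_k = v_k D^{-1/2}$ : right eigenvector of M
$$\langle \phi_k, \psi_{k'} \rangle = \delta_{k,k'}$$
• Under the random walk representation of the graph, M gives the transition probabilities
$$p(x(t+\varepsilon) = x_j \mid x(t) = x_i) = M_{i,j} = \frac{K(x_i, x_j)}{\sum_{l=1}^{N} K(x_i, x_l)}$$
$\varepsilon$ : time step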
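The relations between M, its symmetric conjugate, and the left and right eigenvectors can be checked numerically. The sketch below assumes random test points and a kernel width of 1; variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(25, 3))                      # hypothetical data
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * 1.0))               # kernel width eps = 1 (assumed)
D = K.sum(axis=1)
M = K / D[:, None]                                # M = D^{-1} K

# Symmetric conjugate M_s = D^{1/2} M D^{-1/2}
sqrt_D = np.sqrt(D)
Ms = sqrt_D[:, None] * M / sqrt_D[None, :]

# Orthonormal eigenvectors v_k of M_s, sorted by descending eigenvalue
lam, V = np.linalg.eigh(Ms)
lam, V = lam[::-1], V[:, ::-1]

phi = V * sqrt_D[:, None]     # column k is the left eigenvector  phi_k = v_k D^{1/2}
psi = V / sqrt_D[:, None]     # column k is the right eigenvector psi_k = v_k D^{-1/2}

# M is recovered from its eigen-expansion: M = sum_k lam_k psi_k phi_k^T
M_rebuilt = (psi * lam) @ phi.T
```

Working with $M_s$ rather than M directly is a practical choice: symmetric solvers return real eigenvalues and an orthonormal basis, from which both eigenvector families of M follow by rescaling.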
6/14
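Diffusion distance and Diffusion map
$$p(x(t+\varepsilon) = x_j \mid x(t) = x_i) = M_{i,j} = \frac{K(x_i, x_j)}{\sum_{l=1}^{N} K(x_i, x_l)}, \qquad K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\varepsilon}\right)$$
• $\varepsilon$ has the dual representation (time step and kernel width).
• If one starts a random walk from location $x_i$, the probability of landing at location y after r time steps ($t = r\varepsilon$) is given by
$$p(t, y \mid x_i) = p(x(t) = y \mid x(0) = x_i) = e_i M^r,$$
where $e_i$ is a row vector with all zeros except for a 1 in the ith position.
• For large $\varepsilon$, all points in the graph are connected ($M_{i,j} > 0$) and the eigenvalues of M satisfy
$$\lambda_0 = 1 \ge \lambda_1 \ge \cdots \ge \lambda_{N-1} \ge 0$$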
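The r-step landing probabilities $e_i M^r$ are straightforward to compute. A minimal sketch follows; the function names, the uniform random points, and `eps=0.5` are assumptions for illustration.

```python
import numpy as np

def transition_matrix(X, eps):
    """Row-stochastic M built from a Gaussian kernel of width eps."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * eps))
    return K / K.sum(axis=1, keepdims=True)

def r_step_distribution(M, i, r):
    """p(t, . | x_i) = e_i M^r: distribution over nodes after r steps from node i."""
    e_i = np.zeros(M.shape[0])
    e_i[i] = 1.0
    return e_i @ np.linalg.matrix_power(M, r)

rng = np.random.default_rng(2)
X = rng.uniform(size=(30, 2))                     # hypothetical data
M = transition_matrix(X, eps=0.5)

p1 = r_step_distribution(M, i=0, r=1)             # equals row 0 of M
p10 = r_step_distribution(M, i=0, r=10)

# Eigenvalues of M are real; sort them in descending order.
eigenvalues = np.sort(np.linalg.eigvals(M).real)[::-1]
```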
7/14
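Diffusion distance and Diffusion map
$$p(t, y \mid x_i) = p(x(t) = y \mid x(0) = x_i) = e_i M^r$$
• One can show that, regardless of the starting point $x_i$,
$$\lim_{t \to \infty} p(t, y \mid x_i) = \lim_{r \to \infty} e_i M^r = \phi_0(y),$$
where $\phi_0$ is the left eigenvector of M with eigenvalue $\lambda_0 = 1$, with
$$\phi_0(x_i) = \frac{D_i}{\sum_{j=1}^{N} D_j}$$
• Eigenvector $\phi_0(x)$ has the dual representation: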
1. Stationary probability distribution on the curve, i.e., the probability of landing at location x after taking infinitely many steps of the random walk (independent of the start location).
2. It is the density estimate at location x.
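Both readings of $\phi_0$ can be verified on a small example: it is left unchanged by one step of the walk, and every row of $M^r$ approaches it as r grows. The data, the kernel width `eps`, and the step count `r = 200` below are assumed values chosen so that convergence is visible.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(size=(30, 2))                     # hypothetical data
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
eps = 0.5                                         # assumed kernel width
K = np.exp(-sq_dists / (2.0 * eps))
D = K.sum(axis=1)
M = K / D[:, None]

# Stationary distribution: phi_0(x_i) = D_i / sum_j D_j.
# D_i is also an (unnormalized) kernel density estimate at x_i.
phi0 = D / D.sum()

# One step of the walk leaves phi_0 unchanged.
phi0_after_step = phi0 @ M

# After many steps, every starting point leads to the same distribution.
r = 200
M_r = np.linalg.matrix_power(M, r)
max_gap = np.max(np.abs(M_r - phi0[None, :]))
```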
8/14
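Diffusion distance
$$p(t, y \mid x_i) = p(x(t) = y \mid x(0) = x_i) = e_i M^r$$
• For any finite time r,
$$p(t, y \mid x) = \phi_0(y) + \sum_{k=1}^{N-1} \lambda_k^r \, \psi_k(x) \, \phi_k(y)$$
• $\psi_k$ and $\phi_k$ are the right and left eigenvectors of the normalized graph Laplacian M.
• $\lambda_k^r$ is the kth eigenvalue of $M^r$ (arranged in descending order).
• Given the definition of the random walk, we define the diffusion distance as a distance measure at time t between two pmfs:
$$\mathrm{Dis}_t^2(x_i, x_j) = \| p(t, y \mid x_i) - p(t, y \mid x_j) \|_w^2 = \sum_{y \in \{x_1, \ldots, x_N\}} \left( p(t, y \mid x_i) - p(t, y \mid x_j) \right)^2 w(y)$$
with empirical choice $w(y) = 1/\phi_0(y)$.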
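The spectral expansion and the diffusion distance can both be evaluated directly. The sketch below makes one assumption explicit: the eigenvectors are rescaled so that $\psi_0 \equiv 1$ and $\phi_0$ sums to one, which is the normalization under which the expansion above holds term by term. Data, kernel width, and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(size=(30, 2))                     # hypothetical data
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * 0.1))               # assumed kernel width eps = 0.1
D = K.sum(axis=1)
M = K / D[:, None]
phi0 = D / D.sum()

def diffusion_distance_sq(M, phi0, i, j, r):
    """Squared diffusion distance between nodes i and j after r steps, w = 1/phi0."""
    P = np.linalg.matrix_power(M, r)              # row i is p(t, . | x_i), t = r * eps
    return float(np.sum((P[i] - P[j]) ** 2 / phi0))

# Eigenvectors via the symmetric conjugate, rescaled so that
# psi_0 is constant 1 and phi_0 is a probability vector (assumed normalization).
lam, V = np.linalg.eigh(K / np.sqrt(np.outer(D, D)))
lam, V = lam[::-1], V[:, ::-1]
psi = V * np.sqrt(D.sum()) / np.sqrt(D)[:, None]  # right eigenvectors of M
phi = V * np.sqrt(D)[:, None] / np.sqrt(D.sum())  # left eigenvectors of M

# p(t, y | x) = phi_0(y) + sum_{k>=1} lam_k^r psi_k(x) phi_k(y)
r = 4
expansion = phi0[None, :] + (psi[:, 1:] * lam[1:] ** r) @ phi[:, 1:].T
P_r = np.linalg.matrix_power(M, r)

d_short = diffusion_distance_sq(M, phi0, 0, 1, r=1)
d_long = diffusion_distance_sq(M, phi0, 0, 1, r=5)
```

The distance shrinks as r grows, since all transition distributions drift toward the same stationary distribution.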
9/14
Diffusion Map
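• Diffusion distance :
$$\mathrm{Dis}_t^2(x_i, x_j) = \sum_{y \in \{x_1, \ldots, x_N\}} \frac{\left( p(t, y \mid x_i) - p(t, y \mid x_j) \right)^2}{\phi_0(y)}$$
• Diffusion map : Mapping between the original space and the first k eigenvectors as
$$\Psi_t(x) = \left( \lambda_1^t \psi_1(x), \; \lambda_2^t \psi_2(x), \; \ldots, \; \lambda_k^t \psi_k(x) \right)$$
Relationship :
$$\mathrm{Dis}_t^2(x_i, x_j) = \| \Psi_t(x_i) - \Psi_t(x_j) \|^2$$
(exact when all $k = N-1$ non-trivial eigenvectors are kept; the exponent t counts time steps, i.e. the r of $M^r$).
• This relationship justifies using Euclidean distance in diffusion map space for spectral clustering.
• Since $1 = \lambda_0 > \lambda_1 \ge \cdots \ge \lambda_{N-1}$, it is justified to stop at an appropriate k with a negligible error of order $O((\lambda_{k+1}/\lambda_k)^t)$.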
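The equality between diffusion distance and Euclidean distance in the diffusion map space can be checked numerically. In this sketch the function name `diffusion_map`, the data, and the parameter values are assumptions; the eigenvectors are scaled so that $\psi_0 \equiv 1$, which is the normalization under which the equality is exact.

```python
import numpy as np

def diffusion_map(K, r, k):
    """Rows are Psi(x_i) = (lam_1^r psi_1(x_i), ..., lam_k^r psi_k(x_i))."""
    D = K.sum(axis=1)
    lam, V = np.linalg.eigh(K / np.sqrt(np.outer(D, D)))
    lam, V = lam[::-1], V[:, ::-1]                    # descending eigenvalues
    # Right eigenvectors of M = D^{-1} K, scaled so that psi_0 is constant 1.
    psi = V * np.sqrt(D.sum()) / np.sqrt(D)[:, None]
    return (lam[1:k + 1] ** r) * psi[:, 1:k + 1]      # skip the trivial k = 0 term

rng = np.random.default_rng(6)
X = rng.uniform(size=(30, 2))                         # hypothetical data
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * 0.1))                   # assumed kernel width
D = K.sum(axis=1)
M = K / D[:, None]
phi0 = D / D.sum()

r, i, j = 3, 0, 1
P = np.linalg.matrix_power(M, r)
direct = np.sum((P[i] - P[j]) ** 2 / phi0)            # definition of Dis^2

full = diffusion_map(K, r, k=29)                      # all N - 1 coordinates
embedded = np.sum((full[i] - full[j]) ** 2)

short = diffusion_map(K, r, k=3)                      # truncated map
truncated = np.sum((short[i] - short[j]) ** 2)
```

Because the dropped terms are all non-negative, the truncated distance can only underestimate the full one, and the gap shrinks quickly as r grows.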
10/14
Asymptotics of Diffusion Map
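• Suppose $\{x_i\}$ are sampled i.i.d. from a probability density p(x) defined over a manifold $\Omega \subset \mathbb{R}^p$ with smooth boundary.
[Figure: manifold embedded in X-Y-Z space]
• Suppose $p(x) = e^{-U(x)}$, where U(x) is the potential (energy) at location x.
• As $N \to \infty$, the random walk on the discrete graph converges to a random walk on the continuous manifold $\Omega$. The forward and backward operators are given by
$$T_f[\phi](x) = \int_\Omega M(x \mid y) \, \phi(y) \, p(y) \, dy$$
$$T_b[\psi](x) = \int_\Omega M(y \mid x) \, \psi(y) \, p(y) \, dy$$
$$M(x \mid y) = p(x(t+\varepsilon) = x \mid x(t) = y)$$
11/14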
Asymptotics of Diffusion Map
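• Consider the limit $N \to \infty$, $\varepsilon \to 0$, i.e., when each data point has infinitely many nearby neighbors. In that limit, the random walk converges to a diffusion process whose probability density evolves continuously in time as
$$\frac{\partial p(x, t)}{\partial t} = \lim_{\varepsilon \to 0} \frac{p(x, t+\varepsilon) - p(x, t)}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{T_f - I}{\varepsilon} \, p(x, t)$$
$$T_f[\phi](x) = \int_\Omega M(x \mid y) \, \phi(y) \, p(y) \, dy \quad \text{and} \quad T_b[\psi](x) = \int_\Omega M(y \mid x) \, \psi(y) \, p(y) \, dy$$
• $T_f[\phi]$ : the probability distribution after one time-step $\varepsilon$.
• $\phi(x)$ is the probability distribution on the graph at t=0.
• $T_b[\psi](x)$ is the mean of the function $\psi$ after one time-step $\varepsilon$, for a random walk that started at location x at time t=0.
12/14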
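On a finite sample, the two operators have simple matrix analogues: the forward operator multiplies a distribution by M from the left, and the backward operator multiplies a function by M from the right. The sketch below is a discrete stand-in for the integral operators, with assumed data and hypothetical helper names `forward` and `backward`.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(size=(30, 2))                     # hypothetical data
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * 0.2))               # assumed kernel width
M = K / K.sum(axis=1, keepdims=True)              # M[i, j] = P(next = x_j | now = x_i)

def forward(phi, M):
    """Discrete T_f: distribution over nodes after one step, given distribution phi."""
    return phi @ M

def backward(psi, M):
    """Discrete T_b: expected value of psi after one step, for each starting node."""
    return M @ psi

phi = np.zeros(30)
phi[0] = 1.0                                      # walk starts at node 0
f = X[:, 0]                                       # test function: first coordinate

phi_next = forward(phi, M)                        # where the walker is after one step
f_avg = backward(f, M)                            # mean of f one step later, per start

# Duality: averaging f against the pushed-forward distribution equals
# averaging the pulled-back function against the original distribution.
lhs = phi_next @ f
rhs = phi @ f_avg
```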
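Fokker-Planck operator
$$\frac{\partial p(x, t)}{\partial t} = \lim_{\varepsilon \to 0} \frac{p(x, t+\varepsilon) - p(x, t)}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{T_f - I}{\varepsilon} \, p(x, t)$$
• Infinitesimal generators (propagators) :
$$H_f = \lim_{\varepsilon \to 0} \frac{T_f - I}{\varepsilon}, \qquad H_b = \lim_{\varepsilon \to 0} \frac{T_b - I}{\varepsilon}$$
• The eigenfunctions of $T_f$ and $T_b$ converge to those of $H_f$ and $H_b$, respectively.
• The backward generator is given by the Fokker-Planck operator
$$H_b \psi = \Delta \psi - 2 \nabla \psi \cdot \nabla U,$$
which corresponds to a diffusion process in a potential field 2U(x):
$$\dot{x}(t) = -\nabla (2U) + \sqrt{2D} \, \dot{w}(t), \qquad w(t) : \text{Brownian motion}$$
(D here is the diffusion coefficient, not the normalization matrix.)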
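The stochastic differential equation above can be simulated with a simple Euler-Maruyama scheme. Everything specific in this sketch is an assumption made for illustration: a one-dimensional double-well potential $U(x) = (x^2 - 1)^2$, diffusion coefficient 1, and the step size and walker count shown.

```python
import numpy as np

def grad_U(x):
    """Gradient of the assumed double-well potential U(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x ** 2 - 1.0)

def simulate_langevin(n_walkers, n_steps, dt, diff_coef, rng):
    """Euler-Maruyama for dx = -grad(2U) dt + sqrt(2 * diff_coef) dw."""
    x = np.zeros(n_walkers)                       # all walkers start on the barrier
    for _ in range(n_steps):
        noise = rng.normal(size=n_walkers)
        drift = -2.0 * grad_U(x)                  # drift toward low potential
        x = x + drift * dt + np.sqrt(2.0 * diff_coef * dt) * noise
    return x

rng = np.random.default_rng(0)
samples = simulate_langevin(n_walkers=2000, n_steps=2000, dt=0.005,
                            diff_coef=1.0, rng=rng)

# Walkers accumulate near the wells at x = -1 and x = +1 (low potential,
# high density) and are rare near the barrier at x = 0.
near_wells = np.mean(np.abs(np.abs(samples) - 1.0) < 0.2)
near_barrier = np.mean(np.abs(samples) < 0.2)
```

The drift term pulls walkers into the wells while the noise term spreads them out, so the histogram of `samples` concentrates where the potential is low.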
13/14
Spectral clustering and Fokker-Planck operator
$$H_b \psi = \Delta \psi - 2 \nabla \psi \cdot \nabla U$$
• The term $\nabla \psi \cdot \nabla U$ is interpreted as the drift term towards low potential (higher data density).
• The left and right eigenvectors of M can be viewed as discrete approximations of the eigenfunctions of $T_f$ and $T_b$, respectively.
• $T_f$ and $T_b$ can be viewed as approximations to $H_f$ and $H_b$, which in the asymptotic case ($\varepsilon \to 0$) can be viewed as a diffusion process with potential 2U(x), where $p(x) = \exp(-U(x))$.
14/14