
Diffusion Maps and Spectral Clustering

Author : Ronald R. Coifman et al. (Yale University)

Presenter : Nilanjan Dasgupta (SIG Inc.)

Machine Learning Seminar Series

1/14

[Figure: data points (each marked as a datum) lying on a low-dimensional manifold embedded in X-Y-Z space]

Motivation

• Data lie on a low-dimensional manifold. The shape of the manifold is not known a priori.

• PCA fails to give a compact representation because the manifold is not linear.

• Spectral clustering as a non-linear dimensionality reduction scheme.

2/14

Outline

• Non-linear dimensionality reduction and spectral clustering.

• Diffusion based probabilistic interpretation of spectral methods.

• Eigenvectors of the normalized graph Laplacian are discrete approximations of the eigenfunctions of the continuous Fokker-Planck operator.

• Justification of the success of spectral clustering.

• Conclusions.

3/14

Spectral clustering

• Normalized graph Laplacian:

Given N data points $\{x_n\}_{n=1}^{N}$, where each $x_n \in \mathbb{R}^p$, the similarity between any two points $x_i$ and $x_j$ is given by

$L_{i,j} = K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\epsilon}\right)$

with a Gaussian kernel of width $\epsilon$, and a diagonal normalization matrix

$D = \mathrm{Diag}([D_1, \ldots, D_N])$ with $D_i = \sum_{j=1}^{N} L_{i,j}$

• Solve the normalized eigenvalue problem

$L\psi = \lambda D\psi$ or $M\psi = \lambda\psi$, where $M = D^{-1}L$

• Use the first few eigenvectors of M as a low-dimensional representation of the data, or as good coordinates for clustering.

4/14
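The three steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `spectral_embedding`, the toy two-blob data, and the choice `eps=0.5` are all assumptions made for the example. It solves the eigenproblem through the symmetric matrix $D^{-1/2} L D^{-1/2}$, which has the same eigenvalues as $M = D^{-1}L$.

```python
import numpy as np

def spectral_embedding(X, eps, n_coords):
    """Top eigenvalues and right eigenvectors of M = D^{-1} L (hypothetical helper)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    L = np.exp(-sq_dists / (2.0 * eps))       # L_ij = K(x_i, x_j)
    d = L.sum(axis=1)                         # D_i = sum_j L_ij
    # D^{-1/2} L D^{-1/2} is symmetric and similar to M, so the symmetric
    # solver applies; its eigenvectors v map back to those of M.
    Ms = L / np.sqrt(np.outer(d, d))
    lam, v = np.linalg.eigh(Ms)               # eigenvalues in ascending order
    lam, v = lam[::-1], v[:, ::-1]            # reorder to descending
    psi = v / np.sqrt(d)[:, None]             # right eigenvectors of M
    return lam[:n_coords + 1], psi[:, :n_coords + 1], L, d

rng = np.random.default_rng(0)
# Toy data (assumption): two well-separated blobs in R^2.
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(2.0, 0.1, (20, 2))])
lam, psi, L, d = spectral_embedding(X, eps=0.5, n_coords=1)

# The first non-trivial eigenvector changes sign between the two blobs.
labels = (psi[:, 1] > 0).astype(int)
```

On this toy data, thresholding the second eigenvector at zero separates the two blobs; real data usually needs k-means or a similar step on several eigenvectors.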

Spectral Clustering : previous work

• Non-linear dimensionality analysis by S. Roweis and L. Saul (published in Science, 2000).

• Belkin & Niyogi (NIPS’02) show that if data are sampled uniformly from the low-dimensional manifold, the first few eigenvectors of $M = D^{-1}L$ are discrete approximations of the eigenfunctions of the Laplace-Beltrami operator on the manifold.

• Meila & Shi (AISTATS’01) interpret M as a stochastic matrix representing a random walk on the graph:

$\sum_{j=1}^{N} M_{i,j} = \sum_{j=1}^{N} \frac{L_{i,j}}{D_i} = 1$

$p(x^{t+1} = x_j \mid x^{t} = x_i) = M_{i,j} = \frac{K(x_i, x_j)}{\sum_{l=1}^{N} K(x_i, x_l)}$

5/14
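The random-walk reading of M can be checked numerically. The sketch below assumes toy data and a kernel width of 1; it builds M by row-normalizing the kernel, confirms each row is a probability distribution, and draws one step of the walk.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))                       # toy data (assumption)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)                       # Gaussian kernel with eps = 1
M = K / K.sum(axis=1, keepdims=True)              # M_ij = K_ij / sum_l K_il

row_sums = M.sum(axis=1)                          # each row should sum to 1

# One step of the walk started at point 0: sample the next point from row 0.
next_point = rng.choice(len(X), p=M[0])
```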

Diffusion distance and Diffusion map

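• A symmetric matrix $M_s$ can be derived from M as

$M_s = D^{1/2} M D^{-1/2}$, where $M_{i,j} = \frac{K(x_i, x_j)}{\sum_{l=1}^{N} K(x_i, x_l)}$

• M and $M_s$ have the same N eigenvalues $\lambda_k$, and

$M_s = \sum_{k=0}^{N-1} \lambda_k v_k v_k^T$

$\phi_k = v_k D^{1/2}$ : left eigenvector of M

$\psi_k = v_k D^{-1/2}$ : right eigenvector of M

$\langle \phi_k, \psi_{k'} \rangle = \delta_{k,k'}$

• Under the random walk representation of the graph, M gives the transition probabilities

$p(x^{t+\epsilon} = x_j \mid x^{t} = x_i) = M_{i,j} = \frac{K(x_i, x_j)}{\sum_{l=1}^{N} K(x_i, x_l)}$, with $\epsilon$ : time step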

6/14
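The relations between $M$, $M_s$ and the two eigenvector families can be verified on a small example. This is a sketch under assumed toy data and kernel width 1; eigenvectors are stored as columns, so `phi[:, k]` plays the role of $\phi_k$ and `psi[:, k]` the role of $\psi_k$.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))                           # toy data (assumption)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)
d = K.sum(axis=1)
M = K / d[:, None]

# M_s = D^{1/2} M D^{-1/2}
Ms = np.sqrt(d)[:, None] * M / np.sqrt(d)[None, :]

lam, v = np.linalg.eigh(Ms)                           # orthonormal v_k
lam, v = lam[::-1], v[:, ::-1]                        # descending eigenvalues

phi = v * np.sqrt(d)[:, None]                         # left eigenvectors of M
psi = v / np.sqrt(d)[:, None]                         # right eigenvectors of M

gram = phi.T @ psi                                    # should be the identity
```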

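Diffusion distance and Diffusion map

$p(x^{t+\epsilon} = x_j \mid x^{t} = x_i) = M_{i,j} = \frac{K(x_i, x_j)}{\sum_{l=1}^{N} K(x_i, x_l)}$, $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\epsilon}\right)$

• $\epsilon$ has the dual representation (time step and kernel width).

• If one starts a random walk from location $x_i$, the probability of landing at location y after r time steps ($t = r\epsilon$) is given by

$p(t, y \mid x_i) = p(x^{t} = y \mid x^{0} = x_i) = e_i M^r$

where $e_i$ is a row vector of zeros except for a 1 in the ith position, and the entry of $e_i M^r$ corresponding to y is the probability in question.

• For large $\epsilon$, all points in the graph are connected ($M_{i,j} > 0$) and the eigenvalues of M satisfy

$\lambda_0 = 1 \ge \lambda_1 \ge \cdots \ge \lambda_{N-1} \ge 0$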

7/14
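A short sketch of the r-step probabilities $e_i M^r$, assuming toy data in the unit square and kernel width 1. The names `i`, `r` and `p_r` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(size=(10, 2))                     # toy data (assumption)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)
M = K / K.sum(axis=1, keepdims=True)

i, r = 0, 5                                       # start point and number of steps
e_i = np.zeros(len(X))
e_i[i] = 1.0                                      # row vector with a 1 in position i

# Distribution over landing points y after r steps, starting from x_i.
p_r = e_i @ np.linalg.matrix_power(M, r)
```

Multiplying by $e_i$ simply selects row i of $M^r$, so in practice one can read the row directly.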

Diffusion distance and Diffusion map

$p(t, y \mid x_i) = p(x^{t} = y \mid x^{0} = x_i) = e_i M^r$

• One can show that regardless of the starting point $x_i$

$\lim_{t \to \infty} p(t, y \mid x_i) = \lim_{r \to \infty} e_i M^r = \phi_0(y)$

where $\phi_0$ is the left eigenvector of M with eigenvalue $\lambda_0 = 1$, with

$\phi_0(x_i) = \frac{D_i}{\sum_{j=1}^{N} D_j}$

• Eigenvector $\phi_0(x)$ has the dual representation :

1. Stationary probability distribution on the curve, i.e., the probability of landing at location x after taking infinitely many steps of the random walk (independent of the start location).

2. It is the density estimate at location x.

8/14
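The claim that the walk forgets its starting point can be checked directly. The sketch assumes toy data in the unit square with kernel width 1, so that every $M_{i,j}$ is well above zero and 50 steps are enough for convergence; for more spread-out data many more steps may be needed.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(size=(10, 2))                     # toy data (assumption)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)
d = K.sum(axis=1)
M = K / d[:, None]

phi0 = d / d.sum()                                # phi_0(x_i) = D_i / sum_j D_j

# After many steps every row of M^r approaches phi0, whatever the start point.
P_many = np.linalg.matrix_power(M, 50)
```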

Diffusion distance

$p(t, y \mid x_i) = p(x^{t} = y \mid x^{0} = x_i) = e_i M^r$

• For any finite time r,

$p(t, y \mid x) = \phi_0(y) + \sum_{k=1}^{N-1} \lambda_k^r \psi_k(x) \phi_k(y)$

• $\psi_k$ and $\phi_k$ are the right and left eigenvectors of graph Laplacian M.

• $\lambda_k^r$ is the kth eigenvalue of $M^r$ (arranged in descending order).

• Given the definition of the random walk, we define the diffusion distance as a distance measure at time t between two pmfs:

$Dis_t^2(x_i, x_j) = \|p(t, y \mid x_i) - p(t, y \mid x_j)\|_w^2 = \sum_{y = x_1}^{x_N} \left(p(t, y \mid x_i) - p(t, y \mid x_j)\right)^2 w(y)$

with empirical choice $w(y) = 1/\phi_0(y)$.

9/14
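The diffusion distance can be computed straight from its definition, without any eigendecomposition. The helper name `diffusion_distance_sq` and the toy data are assumptions for this sketch; time is measured in number of steps r.

```python
import numpy as np

def diffusion_distance_sq(M, phi0, i, j, r):
    """Squared diffusion distance between points i and j after r steps."""
    P = np.linalg.matrix_power(M, r)              # row i is p(t, . | x_i)
    return np.sum((P[i] - P[j]) ** 2 / phi0)      # weight w(y) = 1 / phi_0(y)

rng = np.random.default_rng(5)
X = rng.uniform(size=(10, 2))                     # toy data (assumption)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)
d = K.sum(axis=1)
M = K / d[:, None]
phi0 = d / d.sum()

d1 = diffusion_distance_sq(M, phi0, 0, 1, r=1)
d3 = diffusion_distance_sq(M, phi0, 0, 1, r=3)
```

The distance shrinks as r grows, because both distributions drift toward the same stationary distribution.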

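Diffusion Map

• Diffusion distance :

$Dis_t^2(x_i, x_j) = \sum_{y = x_1}^{x_N} \left(p(t, y \mid x_i) - p(t, y \mid x_j)\right)^2 / \phi_0(y)$

• Diffusion map : Mapping between the original space and the first k eigenvectors as

$\Psi_t(x) = \left(\lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \ldots, \lambda_k^t \psi_k(x)\right)$

where t here counts the number of time steps.

Relationship : $Dis_t^2(x_i, x_j) = \|\Psi_t(x_i) - \Psi_t(x_j)\|^2$ (exact when all $N-1$ non-trivial eigenvectors are kept)

• This relationship justifies using Euclidean distance in diffusion map space for spectral clustering.

• Since $1 \ge \lambda_1 \ge \cdots \ge \lambda_{N-1} \ge 0$, it is justified to stop at an appropriate k with a negligible error of order $O((\lambda_{k+1}/\lambda_k)^t)$.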

10/14
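The relationship between diffusion distance and Euclidean distance in diffusion-map space can be checked numerically. One detail is an assumption of this sketch rather than something stated on the slides: the eigenvectors are rescaled so that $\psi_0 \equiv \pm 1$ and $\phi_0$ sums to one, which is the normalization under which the identity holds exactly with $w(y) = 1/\phi_0(y)$. The function name `diffusion_map` and the toy data are also illustrative.

```python
import numpy as np

def diffusion_map(K, t, k):
    """Rows are Psi_t(x_i) = (lam_1^t psi_1(x_i), ..., lam_k^t psi_k(x_i))."""
    d = K.sum(axis=1)
    Ms = K / np.sqrt(np.outer(d, d))              # symmetric conjugate of M
    lam, v = np.linalg.eigh(Ms)
    lam, v = lam[::-1], v[:, ::-1]                # descending eigenvalues
    # Right eigenvectors of M, scaled so that psi_0 is constant +-1 (assumption).
    psi = v / np.sqrt(d)[:, None] * np.sqrt(d.sum())
    return (lam[1:k + 1] ** t) * psi[:, 1:k + 1]  # skip the trivial k = 0 term

rng = np.random.default_rng(6)
X = rng.uniform(size=(10, 2))                     # toy data (assumption)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)
d = K.sum(axis=1)
M = K / d[:, None]
phi0 = d / d.sum()

t, i, j = 2, 0, 1
P = np.linalg.matrix_power(M, t)
direct = np.sum((P[i] - P[j]) ** 2 / phi0)        # definition of Dis_t^2

full = diffusion_map(K, t, k=len(X) - 1)          # keep all N-1 coordinates
via_map = np.sum((full[i] - full[j]) ** 2)

short = diffusion_map(K, t, k=3)                  # truncated map
truncated = np.sum((short[i] - short[j]) ** 2)
```

Truncating the map only drops non-negative terms from the sum, so the truncated distance never exceeds the full one.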

Asymptotics of Diffusion Map

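• Suppose $\{x_i\}$ are sampled i.i.d. from a probability density p(x) defined over a manifold $\Omega \subset \mathbb{R}^p$ with smooth boundary.

[Figure: manifold embedded in X-Y-Z space]

• Suppose $p(x) = e^{-U(x)}$, where U(x) is the potential (energy) at location x.

• As $N \to \infty$, the random walk on a discrete graph converges to a random walk on the continuous manifold $\Omega$. The forward and backward operators are given by

$T_f[\phi](x) = \int_{\Omega} M(x \mid y)\, \phi(y)\, p(y)\, dy$

$T_b[\psi](x) = \int_{\Omega} M(y \mid x)\, \psi(y)\, p(y)\, dy$

where $M(y \mid x) = p(x^{t+\epsilon} = y \mid x^{t} = x)$ is the one-step transition probability.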

11/14

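Asymptotics of Diffusion Map

• Consider the limit $N \to \infty$, $\epsilon \to 0$, i.e., when each data point has infinitely many nearby neighbors. In that limit, the random walk converges to a diffusion process with probability density evolving continuously in time as

$\frac{\partial p(x, t)}{\partial t} = \lim_{\epsilon \to 0} \frac{p(x, t + \epsilon) - p(x, t)}{\epsilon} = \lim_{\epsilon \to 0} \frac{T_f - I}{\epsilon}\, p(x, t)$

$T_f[\phi](x) = \int_{\Omega} M(x \mid y)\, \phi(y)\, p(y)\, dy$ and $T_b[\psi](x) = \int_{\Omega} M(y \mid x)\, \psi(y)\, p(y)\, dy$

• $T_f[\phi]$ : the probability distribution after one time-step $\epsilon$.

• $\phi(x)$ is the probability distribution on the graph at t=0.

• $T_b[\psi](x)$ is the mean of the function $\psi$ after one time-step $\epsilon$, for a random walk that started at location x at time t=0.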

12/14
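On a finite graph, the forward and backward operators have simple discrete analogues: left multiplication by M pushes a distribution forward one step, and right multiplication by M averages a function over one step. The sketch below uses that correspondence as an assumption; the test function `f` (the first coordinate of each point) and the toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(size=(10, 2))                     # toy data (assumption)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)
M = K / K.sum(axis=1, keepdims=True)
N = len(X)

phi = np.full(N, 1.0 / N)                         # distribution on the graph at t = 0
forward = phi @ M                                 # discrete T_f: distribution after one step

f = X[:, 0]                                       # a function on the graph
backward = M @ f                                  # discrete T_b: mean of f after one step from each x_i

# Both routes give the expected value of f after one step.
mean_via_forward = forward @ f
mean_via_backward = phi @ backward
```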

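Fokker-Planck operator

$\frac{\partial p(x, t)}{\partial t} = \lim_{\epsilon \to 0} \frac{p(x, t + \epsilon) - p(x, t)}{\epsilon} = \lim_{\epsilon \to 0} \frac{T_f - I}{\epsilon}\, p(x, t)$

• Infinitesimal generators (propagators) :

$H_f = \lim_{\epsilon \to 0} \frac{T_f - I}{\epsilon}$, $H_b = \lim_{\epsilon \to 0} \frac{T_b - I}{\epsilon}$

• The eigenfunctions of $T_f$ and $T_b$ converge to those of $H_f$ and $H_b$, respectively.

• The backward generator is given by the Fokker-Planck operator

$H_b \psi = \Delta\psi - 2\nabla\psi \cdot \nabla U$

which corresponds to a diffusion process in a potential field 2U(x):

$\dot{x}(t) = -\nabla(2U) + \sqrt{2D}\, \dot{w}(t)$, with w(t) : Brownian motion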

13/14
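The diffusion process in the potential 2U(x) can be simulated with a simple Euler-Maruyama scheme. Everything specific here is an assumption for illustration: the quadratic potential $U(x) = x^2/2$, the step size, the number of walkers, and the helper name `simulate`. For D = 1 the stationary density of this process is proportional to $e^{-2U(x)}$, which for the quadratic potential is a Gaussian with variance 1/2.

```python
import numpy as np

def simulate(grad_U, x0, dt, n_steps, rng, D=1.0):
    """Euler-Maruyama steps for dx = -grad(2U) dt + sqrt(2 D) dw."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - 2.0 * grad_U(x) * dt + np.sqrt(2.0 * D * dt) * noise
    return x

rng = np.random.default_rng(0)
# Assumed potential U(x) = x^2 / 2, so grad U(x) = x.
# 5000 independent walkers, all started at x = 0.
samples = simulate(lambda x: x, np.zeros(5000), dt=0.01, n_steps=500, rng=rng)

sample_mean = samples.mean()
sample_var = samples.var()                        # should be close to 1/2
```

The walkers concentrate where U is low, which matches the reading of the drift term as a pull toward regions of high data density.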

Spectral clustering and Fokker-Planck operator

$H_b \psi = \Delta\psi - 2\nabla\psi \cdot \nabla U$

• The term $\nabla\psi \cdot \nabla U$ is interpreted as the drift term towards low potential (higher data density).

• The left and right eigenvectors of M can be viewed as discrete approximations of the eigenfunctions of $T_f$ and $T_b$, respectively.

• $T_f$ and $T_b$ can be viewed as approximations to $H_f$ and $H_b$, which in the asymptotic case ($\epsilon \to 0$) describe a diffusion process with potential 2U(x), where $p(x) = \exp(-U(x))$.

14/14
