Eric Xing © Eric Xing @ CMU, 2006-2010 1
Machine Learning
Spectral Clustering
Eric Xing
Lecture 8, August 13, 2010
Reading:
Data Clustering
Compactness Connectivity
Two different criteria Compactness, e.g., k-means, mixture models Connectivity, e.g., spectral clustering
Some graph terminology
Objects (e.g., pixels, data points): i ∈ I = vertices of graph G
Edges (i,j) = pixel pairs with W_ij > 0
Similarity matrix: W = [W_ij]
Degree: d_i = Σ_{j∈G} W_ij
Degree of a subset A ⊆ G: d_A = Σ_{i∈A} d_i
Assoc(A,B) = Σ_{i∈A} Σ_{j∈B} W_ij
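These quantities are cheap to compute; a minimal NumPy sketch on a hypothetical 4-node graph:

```python
import numpy as np

# Similarity matrix W = [W_ij] for a toy 4-node graph; W_ij > 0 means edge (i, j).
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

d = W.sum(axis=1)             # degrees d_i = sum_j W_ij

def d_of(A):
    """Degree of a subset A: d_A = sum_{i in A} d_i."""
    return d[list(A)].sum()

def assoc(A, B):
    """Assoc(A, B) = sum_{i in A} sum_{j in B} W_ij."""
    return W[np.ix_(list(A), list(B))].sum()

print(d.tolist())             # [2.0, 2.0, 3.0, 1.0]
print(d_of({0, 1}))           # 4.0
print(assoc({0, 1}, {2, 3}))  # 2.0
```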
Weighted Graph Partitioning
Cuts in a Graph
An (edge) cut is a set of edges whose removal makes a graph disconnected.
Weight of a cut: cut(A,B) = Σ_{i∈A} Σ_{j∈B} W_ij = Assoc(A,B)
Normalized Cut criterion: minimize Ncut(A,Ā) = cut(A,Ā)/d_A + cut(A,Ā)/d_Ā
More generally, for a k-way partition A_1, …, A_k of V:
Ncut(A_1, …, A_k) = Σ_{r=1}^k cut(A_r, V\A_r)/d_{A_r} = Σ_{r=1}^k (Σ_{i∈A_r, j∈V\A_r} W_ij) / (Σ_{i∈A_r, j∈V} W_ij)
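A sketch of cut and Ncut on a hypothetical 4-node graph:

```python
import numpy as np

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)

def cut(A, B):
    # cut(A, B) = sum_{i in A, j in B} W_ij (= Assoc(A, B))
    return W[np.ix_(list(A), list(B))].sum()

def ncut(A, B):
    # Ncut(A, B) = cut(A, B)/d_A + cut(A, B)/d_B
    c = cut(A, B)
    return c / d[list(A)].sum() + c / d[list(B)].sum()

A, B = {0, 1, 2}, {3}
print(cut(A, B))   # 1.0 -- only the edge (2, 3) crosses the partition
print(ncut(A, B))  # 1/7 + 1/1 ~ 1.1429
```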
Graph-based Clustering: Data Grouping
Image segmentation on a graph G = {V, E}
Affinity matrix: W_ij = f(d(x_i, x_j)), W = [w_{i,j}]
Degree matrix: D = diag(d_i)
Laplacian matrix: L = D − W
(bipartite) partition vector: x = [x_1, …, x_N] = [1, …, 1, −1, …, −1]
Affinity Function
W_{i,j} = e^{−‖X_i − X_j‖² / 2σ²}
Affinities grow as σ grows.
How does the choice of σ affect the results?
What would be the optimal choice for σ?
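A quick numerical look at the σ question, on three hypothetical 1-D points: as σ grows, every pairwise affinity climbs toward 1.

```python
import numpy as np

X = np.array([[0.0], [1.0], [5.0]])   # three toy points

def affinity(X, sigma):
    # W_ij = exp(-||X_i - X_j||^2 / (2 sigma^2))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

for sigma in (0.5, 1.0, 5.0):
    W = affinity(X, sigma)
    # W[0,1] is a "near" pair, W[0,2] a "far" pair; both grow with sigma
    print(sigma, round(float(W[0, 1]), 4), round(float(W[0, 2]), 4))
```

Too small a σ disconnects everything; too large a σ blurs the near/far distinction, which is why the choice matters.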
Clustering via Optimizing Normalized Cut
The normalized cut: Ncut(A,B) = cut(A,B)/d_A + cut(A,B)/d_B
Computing an optimal normalized cut over all possible partitions y is NP-hard.
Transform the Ncut objective into matrix form (Shi & Malik, 2000):
min_x Ncut(x) = min_y (y^T (D − W) y) / (y^T D y)
Subject to: y^T D 1 = 0, with y ∈ {1, −b}^n (a Rayleigh quotient)
Sketch: with indicator x ∈ {1, −1}^n and k = (Σ_{x_i>0} d_i) / (Σ_i d_i),
Ncut(A,B) = cut(A,B)/deg(A) + cut(A,B)/deg(B)
  = ((1+x)^T (D − W) (1+x)) / (4k · 1^T D 1) + ((1−x)^T (D − W) (1−x)) / (4(1−k) · 1^T D 1),
and setting b = k/(1−k), y = ½[(1+x) − b(1−x)] gives the Rayleigh-quotient form above.
Still an NP-hard problem.
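The transform can be checked numerically; a sketch on a hypothetical 4-node graph (the weights and partition are illustrative):

```python
import numpy as np

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)
D = np.diag(d)

x = np.array([1., 1., 1., -1.])          # partition A = {0,1,2}, B = {3}
A, B = x > 0, x < 0

c = W[np.ix_(A, B)].sum()
ncut_direct = c / d[A].sum() + c / d[B].sum()

k = d[A].sum() / d.sum()
b = k / (1 - k)
y = 0.5 * ((1 + x) - b * (1 - x))        # y_i in {1, -b}

ncut_rayleigh = (y @ (D - W) @ y) / (y @ D @ y)
print(ncut_direct, ncut_rayleigh)        # both equal 8/7
print(abs(y @ D @ np.ones(4)))           # constraint y^T D 1 = 0 holds
```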
Relaxation
Instead, relax into the continuous domain by solving the generalized eigenvalue system:
(D − W) y = λ D y
which gives: min_y y^T (D − W) y  s.t.  y^T D y = 1 (Rayleigh quotient theorem)
This relaxes min_x Ncut(x) = min_y (y^T (D − W) y) / (y^T D y), subject to y^T D 1 = 0, y ∈ {1, −b}^n.
Note that (D − W) 1 = 0, so the first eigenvector is y_0 = 1 with eigenvalue 0.
The second smallest eigenvector is the real-valued solution to this problem!
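SciPy's symmetric eigensolver handles the generalized problem (D − W)y = λDy directly; a sketch on a hypothetical 4-node graph:

```python
import numpy as np
from scipy.linalg import eigh

W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(W.sum(axis=1))

# Generalized problem (D - W) y = lambda D y; eigh returns ascending eigenvalues
evals, evecs = eigh(D - W, D)

print(np.round(evals, 4))            # smallest eigenvalue is 0
y0 = evecs[:, 0]
print(np.allclose(y0, y0[0]))        # its eigenvector is constant -> True
y1 = evecs[:, 1]                     # second smallest: the relaxed Ncut solution
print(np.round(y1, 3))
```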
Algorithm
1. Define a similarity function between two nodes, e.g.: w_{i,j} = e^{−‖X_i − X_j‖² / 2σ²_X}
2. Compute the affinity matrix W and the degree matrix D.
3. Solve (D − W) y = λ D y, e.g., via a singular value decomposition (SVD) of the graph Laplacian L = D − W: L = V Λ V^T ⇒ y*
4. Use the eigenvector y* with the second smallest eigenvalue to bipartition the graph. For each threshold k:
   A_k = {i | y_i among the k largest elements of y*}
   B_k = {i | y_i among the n − k smallest elements of y*}
   Compute Ncut(A_k, B_k); take k* = argmin_k Ncut(A_k, B_k) and output A_{k*} and B_{k*}.
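The four steps can be sketched end to end; the 1-D data below is hypothetical, with the Gaussian affinity of step 1:

```python
import numpy as np
from scipy.linalg import eigh

X = np.array([0.0, 0.2, 0.4, 5.0, 5.2, 5.4])    # two well-separated groups
sigma = 1.0

# 1-2. affinity and degree matrices
W = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
D = np.diag(d)

# 3. solve (D - W) y = lambda D y; take the second-smallest eigenvector y*
evals, evecs = eigh(D - W, D)
y_star = evecs[:, 1]

def ncut(A, B):
    c = W[np.ix_(A, B)].sum()
    return c / d[A].sum() + c / d[B].sum()

# 4. sweep thresholds: A_k = indices of the k largest entries of y*
order = np.argsort(-y_star)
best_ncut, k_star = min((ncut(order[:k], order[k:]), k)
                        for k in range(1, len(X)))
A_star = sorted(order[:k_star].tolist())
B_star = sorted(order[k_star:].tolist())
print(A_star, B_star)    # recovers the two groups
```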
Ideally …
Ncut(A,B) = (y^T (D − S) y) / (y^T D y), with y_i ∈ {1, −b} and y^T D 1 = 0
(S here denotes the similarity matrix), solved via the generalized eigensystem (D − S) y = λ D y.
Cluster vs. Block matrix
Ncut(A,B) = cut(A,B)/d_A + cut(A,B)/d_B
Degree(A) = Σ_{i∈A, j∈V} W_{i,j}
[Figure: a two-cluster graph and the corresponding block-structured affinity matrix, with blocks A and B]
Criterion for partition: compare to the Minimum cut,
min cut(A,B) = min_{A,B} Σ_{i∈A, j∈B} W_{i,j}
first proposed by Wu and Leahy.
Problem! The weight of a cut is directly proportional to the number of edges in the cut, so the minimum cut tends to isolate small sets of weakly attached nodes.
[Figure: the ideal cut vs. cuts with lesser weight than the ideal cut]
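The bias is easy to see on a toy graph (weights below are illustrative): a weakly attached node gives the smallest cut, while Ncut prefers the balanced split.

```python
import numpy as np

# Two tight triangles {0,1,2} and {3,4,5} joined by one edge, plus a node 6
# dangling off node 5 by a weak edge.
W = np.zeros((7, 7))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1),
                (4, 5, 1), (2, 3, 0.5), (5, 6, 0.3)]:
    W[i, j] = W[j, i] = w
d = W.sum(axis=1)

def cut(mask):
    return W[np.ix_(mask, ~mask)].sum()

def ncut(mask):
    c = cut(mask)
    return c / d[mask].sum() + c / d[~mask].sum()

singleton = np.zeros(7, bool); singleton[6] = True          # isolate node 6
balanced = np.zeros(7, bool); balanced[[0, 1, 2]] = True    # triangle vs. rest

print(cut(singleton), cut(balanced))     # 0.3 vs 0.5: min-cut picks the singleton
print(ncut(singleton) > ncut(balanced))  # True: Ncut picks the balanced split
```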
Superior Performance?
K-means and Gaussian mixture methods are biased toward convex clusters; spectral clustering, built on connectivity, can recover non-convex ones.
Representation
Partition matrix X (pixels × segments): X = [X_1, …, X_K]
Pair-wise similarity matrix W: W(i,j) = aff(i,j)
Degree matrix D: D(i,i) = Σ_j w_{i,j}
Laplacian matrix L: L = D − W
A Spectral Clustering Algorithm (Ng, Jordan, and Weiss, 2001)
Given a set of points S = {s_1, …, s_n}:
1. Form the affinity matrix: w_{i,j} = e^{−‖s_i − s_j‖² / 2σ²} for i ≠ j, and w_{i,i} = 0.
2. Define the diagonal matrix D_ii = Σ_k w_{i,k}.
3. Form the matrix L = D^{−1/2} W D^{−1/2}.
4. Stack the k largest eigenvectors of L to form the columns of the new matrix X = [x_1 x_2 … x_k].
5. Renormalize each of X's rows to have unit length, giving a new matrix Y, and cluster the rows of Y as points in R^k.
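A sketch of these five steps; the 2-D data is hypothetical, and a tiny inline k-means stands in for the final clustering step:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
S = np.vstack([rng.normal(0.0, 0.3, (10, 2)),   # cluster near (0, 0)
               rng.normal(4.0, 0.3, (10, 2))])  # cluster near (4, 4)
k, sigma = 2, 1.0

# 1. affinity matrix with zero diagonal
sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)

# 2-3. D and L = D^{-1/2} W D^{-1/2}
inv_sqrt_d = 1.0 / np.sqrt(W.sum(axis=1))
L = inv_sqrt_d[:, None] * W * inv_sqrt_d[None, :]

# 4. k largest eigenvectors of L as columns of X
_, evecs = eigh(L)
X = evecs[:, -k:]

# 5. row-normalize to Y, then k-means on the rows (farthest-point init)
Y = X / np.linalg.norm(X, axis=1, keepdims=True)
centers = Y[[0, int(np.argmax(((Y - Y[0]) ** 2).sum(axis=1)))]]
for _ in range(20):
    labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    centers = np.vstack([Y[labels == j].mean(axis=0) for j in range(k)])
print(labels)
```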
Eigenvectors and blocks
Block matrices have block eigenvectors:

    [1 1 0 0]                  [.71]   [  0]
    [1 1 0 0]   eigensolver →  [.71]   [  0]    λ1 = 2, λ2 = 2, λ3 = 0, λ4 = 0
    [0 0 1 1]                  [  0]   [.71]
    [0 0 1 1]                  [  0]   [.71]

Near-block matrices have near-block eigenvectors:

    [ 1   1  .2   0]                  [.71]   [   0]
    [ 1   1   0 −.2]   eigensolver →  [.69]   [−.14]    λ1 = 2.02, λ2 = 2.02, λ3 = −0.02, λ4 = −0.02
    [.2   0   1   1]                  [.14]   [ .69]
    [ 0 −.2   1   1]                  [  0]   [ .71]
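The claim is easy to verify with NumPy. One caveat: λ = 2 is a repeated eigenvalue, so a solver may return any orthonormal basis of that eigenspace; the robust check is that the block indicator vectors lie inside it.

```python
import numpy as np

B = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])
evals, evecs = np.linalg.eigh(B)            # ascending eigenvalues
print(np.round(evals, 4))                   # eigenvalues 0, 0, 2, 2

P = evecs[:, -2:]                           # basis of the lambda = 2 eigenspace
v = np.array([1., 1., 0., 0.]) / np.sqrt(2)
print(bool(np.allclose(P @ (P.T @ v), v)))  # block vector lies in it -> True

Bn = np.array([[1., 1., .2, 0.],
               [1., 1., 0., -.2],
               [.2, 0., 1., 1.],
               [0., -.2, 1., 1.]])
print(np.round(np.linalg.eigvalsh(Bn), 2))  # ~ [-0.02, -0.02, 2.02, 2.02]
```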
Spectral Space
Can put items into blocks by eigenvectors:

    [ 1   1  .2   0]        e1 = [.71, .69, .14, 0]^T,   e2 = [0, −.14, .69, .71]^T
    [ 1   1   0 −.2]
    [.2   0   1   1]        (plotting each item at (e1_i, e2_i) separates the two blocks)
    [ 0 −.2   1   1]

Clusters are clear regardless of row ordering:

    [ 1  .2   1   0]        e1 = [.71, .14, .69, 0]^T,   e2 = [0, .69, −.14, .71]^T
    [.2   1   0   1]
    [ 1   0   1 −.2]
    [ 0   1 −.2   1]
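The row-ordering claim can be checked numerically: a symmetric permutation P M P^T has the same spectrum, and permuting an eigenvector of M gives an eigenvector of the permuted matrix.

```python
import numpy as np

M = np.array([[1., 1., .2, 0.],
              [1., 1., 0., -.2],
              [.2, 0., 1., 1.],
              [0., -.2, 1., 1.]])
perm = [0, 2, 1, 3]                      # interleave the two blocks
Mp = M[np.ix_(perm, perm)]               # same graph, items reordered

print(np.allclose(np.linalg.eigvalsh(M), np.linalg.eigvalsh(Mp)))  # True

# Permuting an eigenvector of M gives an eigenvector of Mp
evals, evecs = np.linalg.eigh(M)
v, lam = evecs[:, -1], evals[-1]
v_perm = v[perm]
print(np.allclose(Mp @ v_perm, lam * v_perm))  # True
```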
More formally …
Recall the generalized Ncut:
Ncut(A_1, …, A_k) = Σ_{r=1}^k cut(A_r, V\A_r)/d_{A_r} = Σ_{r=1}^k (Σ_{i∈A_r, j∈V\A_r} W_ij) / (Σ_{i∈A_r, j∈V} W_ij)
min Ncut(A_1, …, A_k) = min Σ_{r=1}^k cut(A_r, V\A_r)/d_{A_r}
Minimizing this is equivalent to spectral clustering:
min_Y tr(Y^T D^{−1/2} (D − W) D^{−1/2} Y)  s.t.  Y^T Y = I
where Y is a (pixels × segments) relaxed-indicator matrix.
Spectral Clustering
Algorithms that cluster points using eigenvectors of matrices derived from the data
Obtain a data representation in a low-dimensional space that can be easily clustered
A variety of methods use the eigenvectors differently (we have seen an example)
Empirically very successful
Authors disagree: which eigenvectors to use, and how to derive clusters from these eigenvectors
Two general methods
Method #1
Partition using only one eigenvector at a time; use the procedure recursively.
Example: image segmentation uses the 2nd smallest eigenvector to define the optimal cut, recursively generating two clusters with each cut.
Method #2 Use k eigenvectors (k chosen by user)
Directly compute k-way partitioning
Experimentally has been seen to be “better”
User’s Prerogative
Choice of k, the number of clusters
Choice of the scaling factor σ²; realistically, search over σ² and pick the value that gives the tightest clusters
Choice of clustering method: k-way or recursive bipartite
Kernel affinity matrix: w_{i,j} = K(S_i, S_j)
Conclusions
Good news: Simple and powerful methods to segment images; flexible and easy to apply to other clustering problems.
Bad news: High memory requirements (use sparse matrices); very dependent on the scale factor σ for a specific problem:
W(i,j) = e^{−‖X_i − X_j‖² / 2σ²_X}