Analysis of Social Media
MLD 10-802, LTI 11-772
William Cohen
2-15-11
The “force” on nodes in a graph
• Suppose every node i has a value $y_i$ (IQ, income, ...)
• and neighbors N(i), degree $d_i$
– If i, j are connected then j exerts a force $-K (y_i - y_j)$ on i
– Total: $F_i = -K \sum_{j \in N(i)} (y_i - y_j) = -K \left[ d_i y_i - \sum_{j \in N(i)} y_j \right]$
– Matrix notation: $F = -K (D - A) y$, where D-A is the Laplacian
– Interesting (?) goal: set y so $(D - A) y = c \, y$
– Picture: neighbors pull i up or down, but the net force doesn’t change the relative positions of nodes
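A minimal sketch of the force computation, to check that the per-node sum and the matrix form F = -K(D-A)y agree. The 4-node path graph, the node values, and the constant K are all made up for illustration; none of them come from the slides.

```python
import numpy as np

# Hypothetical 4-node path graph 0-1-2-3 (an assumption, not from the slides).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
D = np.diag(A.sum(axis=1))           # diagonal degree matrix
y = np.array([1.0, 2.0, 4.0, 8.0])   # made-up node values
K = 0.5                              # made-up force constant

# Matrix form: F = -K (D - A) y
F_matrix = -K * (D - A) @ y

# Per-node form: F_i = -K * sum over neighbors j of (y_i - y_j)
F_loop = np.array([
    -K * sum(y[i] - y[j] for j in np.flatnonzero(A[i]))
    for i in range(len(y))
])

print(F_matrix)   # net force on each node
print(F_loop)     # same values, computed neighbor by neighbor
```

The forces sum to zero because every pairwise pull appears once with each sign.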
Spectral Clustering: Graph = Matrix
How do I pick y to be an eigenvector for a block-stochastic matrix?
Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”
[Figure (Shi & Meila, 2002): nodes from groups x, y, z plotted in eigenvector coordinates; axes labeled e1, e2 and e2, e3, with ticks from -0.4 to 0.4]
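$$
\begin{aligned}
y^T (D - A) y &= \sum_i d_i y_i^2 - \sum_{i,j} a_{ij} y_i y_j \\
&= \frac{1}{2} \Big( \sum_i d_i y_i^2 - 2 \sum_{i,j} a_{ij} y_i y_j + \sum_j d_j y_j^2 \Big) \\
&= \frac{1}{2} \Big( \sum_{i,j} a_{ij} y_i^2 - 2 \sum_{i,j} a_{ij} y_i y_j + \sum_{i,j} a_{ij} y_j^2 \Big) \\
&= \frac{1}{2} \sum_{i,j} a_{ij} (y_i - y_j)^2
\end{aligned}
$$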
Another way the Laplacian comes up: it defines a cost formula for y, where y assigns nodes to + or – classes so as to keep connected nodes in the same class.
• Turns out: to minimize $y^T X y / (y^T y)$, find the smallest eigenvector of X (the eigenvector with the smallest eigenvalue)
• But: this will not be +1/-1, so it’s a “relaxed” solution
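The cost view can be checked numerically. The sketch below assumes a made-up graph (two triangles joined by one edge) and takes X = D-A. It compares $y^T (D-A) y$ with the edge-by-edge cost for one +1/-1 labeling, then rounds a relaxed solution by sign. One caveat that is an addition here, not a statement from the slides: for D-A the very smallest eigenvector is the constant vector with eigenvalue 0, so the sketch uses the next one.

```python
import numpy as np

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A       # unnormalized Laplacian D - A

def pair_cost(y):
    """0.5 * sum_ij a_ij (y_i - y_j)^2 over all ordered pairs (i, j)."""
    diff = y[:, None] - y[None, :]
    return 0.5 * np.sum(A * diff ** 2)

y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])   # one +1/-1 labeling
quad = y @ L @ y                                   # y^T (D - A) y

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(L)
relaxed = vecs[:, 1]          # smallest non-constant eigenvector (real-valued)
labels = np.sign(relaxed)     # round the relaxed solution back to +1/-1

print(quad, pair_cost(y))
print(labels)
```

The overall sign of an eigenvector is arbitrary, so which triangle is labeled +1 can differ between runs or library versions; only the split itself is meaningful.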
Some more terms
• If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node
– Then D-A is the (unnormalized) Laplacian
– $W = A D^{-1}$ is a probabilistic adjacency matrix
– I-W is the (normalized or random-walk) Laplacian
– etc.
• The largest eigenvectors of W correspond to the smallest eigenvectors of I-W
– So sometimes people talk about “bottom eigenvectors of the Laplacian”
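These terms can be built in a few lines. The sketch below assumes a small symmetric weighted adjacency matrix invented for illustration, builds D, D-A, W = AD^-1 and I-W, and checks that the eigenvalues of I-W are one minus those of W, so the largest of one line up with the smallest of the other.

```python
import numpy as np

# Hypothetical symmetric weighted adjacency matrix (made up for illustration).
A = np.array([
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 0.5],
    [1.0, 0.5, 0.0],
])
d = A.sum(axis=0)                  # node degrees (column sums)
D = np.diag(d)
L_unnorm = D - A                   # unnormalized Laplacian
W = A @ np.diag(1.0 / d)           # W = A D^{-1}; every column sums to 1
L_rw = np.eye(len(d)) - W          # I - W

# W is not symmetric in general, so use the general eigenvalue solver.
# The eigenvalues are still real because W is similar to the symmetric
# matrix D^{-1/2} A D^{-1/2}; .real only drops a zero imaginary part.
w_vals = np.sort(np.linalg.eigvals(W).real)[::-1]    # descending
l_vals = np.sort(np.linalg.eigvals(L_rw).real)       # ascending

print(w_vals)   # largest eigenvalue of W is 1
print(l_vals)   # smallest eigenvalue of I - W is 0
```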
[Figure: matrices A and W for a k-nn graph (easy) and for a fully connected graph weighted by distance]
Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”
$W v = \lambda v$: v is an eigenvector with eigenvalue $\lambda$
If W is connected but roughly block diagonal with k blocks then
• the “top” eigenvector is a constant vector
• the next k eigenvectors are roughly piecewise constant with “pieces” corresponding to blocks
Spectral clustering:
• Find the top k+1 eigenvectors v1,…,vk+1
• Discard the “top” one
• Replace every node a with k-dimensional vector xa = <v2(a),…,vk+1(a)>
• Cluster with k-means
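The steps above can be sketched with NumPy under several assumptions that are not in the slides. The example graph (three 3-node blocks with weak links between blocks) is invented. W is taken as the row-normalized D^-1 A, which for symmetric A is the transpose of AD^-1 and has the same eigenvalues; with this choice the top eigenvector is exactly constant. The eigenvectors are computed through the symmetric matrix D^-1/2 A D^-1/2 for numerical stability. The helpers `spectral_embed` and `kmeans` are hypothetical names, and the k-means is a plain Lloyd's loop with a deterministic farthest-point start rather than any particular library routine.

```python
import numpy as np

def spectral_embed(A, k):
    """Return rows x_a = <v_2(a), ..., v_{k+1}(a)> for W = D^{-1} A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # S = D^{-1/2} A D^{-1/2} is symmetric and shares W's eigenvalues.
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, U = np.linalg.eigh(S)              # ascending eigenvalues
    top = np.argsort(vals)[::-1][:k + 1]     # indices of the k+1 largest
    V = d_inv_sqrt[:, None] * U[:, top]      # if S u = c u, W (D^{-1/2} u) = c D^{-1/2} u
    return V[:, 1:]                          # discard the "top" (constant) one

def kmeans(X, n_clusters, n_iter=20):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(n_clusters - 1):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)           # assign each point to nearest center
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Hypothetical graph: weight 1 inside each 3-node block, 0.05 across blocks.
blocks = np.repeat([0, 1, 2], 3)
A = np.where(blocks[:, None] == blocks[None, :], 1.0, 0.05)
np.fill_diagonal(A, 0.0)

X = spectral_embed(A, k=2)         # each node becomes the point <v2(a), v3(a)>
labels = kmeans(X, n_clusters=3)   # three groups in the 2-d embedding
print(labels)
```

Here three groups are separated using only v2 and v3, so the embedding dimension and the number of k-means clusters are set separately; that split is a choice made for this sketch. Cluster numbering is arbitrary, so only the grouping of nodes is meaningful.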
Experimental results: best-case assignment of class labels to clusters
[Figure panels: eigenvectors of W; eigenvectors of a variant of W]