Spectral Analysis based on the Adjacency Matrix of Network Data Leting Wu Fall 2009


Page 1:

Spectral Analysis based on the Adjacency Matrix of Network Data

Leting Wu

Fall 2009

Page 2:

Mathematical Representation of Network Data

Adjacency Matrix A: if there is a link between vertex i and vertex j, a_ij = 1 (or a positive number if A is a weighted adjacency matrix); otherwise a_ij = 0.

Laplacian Matrix L: One definition: L = D - A, where D = diag{d1, d2, ..., dn} is the degree matrix. Another definition: L = CC', where C is the incidence matrix with rows labeled by vertices and columns labeled by edges.
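As a minimal sketch, the two Laplacian definitions can be checked numerically with NumPy on a toy graph (the graph and variable names here are hypothetical, not from the slides):

```python
import numpy as np

# Hypothetical toy graph on 4 vertices with 5 undirected edges
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

# Adjacency matrix A: a_ij = 1 iff {i, j} is an edge
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Degree matrix D = diag{d1, ..., dn}
D = np.diag(A.sum(axis=1))

# First definition: L = D - A
L = D - A

# Second definition: L = CC', with C the (oriented) incidence matrix:
# rows labeled by vertices, columns by edges, +1/-1 at the two endpoints
C = np.zeros((n, len(edges)))
for k, (i, j) in enumerate(edges):
    C[i, k], C[j, k] = 1, -1

assert np.allclose(L, C @ C.T)  # the two definitions agree
```

For an undirected graph, any choice of edge orientation in C yields the same product CC'.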

Page 3:

Eigen Decomposition

Normal Matrix N: N = D^(-0.5) A D^(-0.5)

Each matrix has an eigen decomposition M x_i = λ_i x_i, i = 1, ..., n, with eigenvalue ranges:

A: λ_1 ≥ λ_2 ≥ ... ≥ λ_n (real, since A is symmetric)
L: λ_i ≥ 0, i = 1, ..., n (positive semi-definite)
N: 1 = λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ -1

Eigenvectors can serve as a ranking index on the nodes
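These eigenvalue ranges can be verified on a small toy graph (hypothetical example; assumes NumPy):

```python
import numpy as np

# Hypothetical toy graph: 4-cycle plus a chord
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)

L = np.diag(d) - A
N = A / np.sqrt(np.outer(d, d))        # N = D^(-0.5) A D^(-0.5) elementwise

eig_L = np.linalg.eigvalsh(L)
eig_N = np.linalg.eigvalsh(N)

assert eig_L.min() >= -1e-9            # L is positive semi-definite
assert abs(eig_N.max() - 1) < 1e-9     # largest eigenvalue of N is 1
assert eig_N.min() >= -1 - 1e-9        # eigenvalues of N lie in [-1, 1]
```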

Page 4:

An Example of Two Clusters

Network of US political books (105 nodes, 441 edges)

Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative".

Page 5:

Low Rank Embedding (A)

Page 6:

Low Rank Embedding (L & N)

Page 7:

Properties of the Spectral Space of A

Two (or k) clear clusters form two (or k) orthogonal half-lines in the two (or k) dimensional spectral space.

The larger a node's distance from the origin, the more important the node: it may have a very large degree or connect to nodes of large degree.

Bridge points lie within the smaller angle formed by the half-lines.

Page 8:

Spectral Clustering Methods

Ratio Cut: find the clusters by minimizing the cut cost

cut(A, B)/|A| + cut(A, B)/|B|

The eigen-decomposition of the Laplacian matrix offers a heuristic solution: in a 2-way cluster, the second smallest eigenvalue is the cut cost, and its corresponding eigenvector is the cluster indicator: x_i > 0 is one cluster, x_i < 0 is the other, and x_i = 0 marks a bridge between the two clusters.
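A sketch of the ratio-cut heuristic on a toy graph with two obvious clusters (hypothetical example): the sign pattern of the eigenvector of the second smallest Laplacian eigenvalue (the Fiedler vector) recovers the two clusters.

```python
import numpy as np

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by edge (2,3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]

# The sign of the Fiedler vector is the cluster indicator
cluster = fiedler > 0
assert set(np.where(cluster)[0]) in ({0, 1, 2}, {3, 4, 5})
```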

Page 9:

Spectral Clustering Methods

Ratio Cut: find the clusters by minimizing the cut cost

cut(A, B)/|A| + cut(A, B)/|B|

Normalized Cut: find the clusters by minimizing the modified cut cost

cut(A, B)/d_A + cut(A, B)/d_B

where d_A and d_B are the total degrees of the two clusters. The eigen-decomposition of the normal matrix offers a heuristic solution: in a 2-way cluster, the second largest eigenvalue is 1 - cut cost, and its corresponding eigenvector is the cluster indicator: x_i > 0 is one cluster, x_i < 0 is the other, and x_i = 0 marks a bridge between the two clusters.
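The normalized-cut heuristic can be sketched the same way, using the eigenvector of the second largest eigenvalue of N (hypothetical toy graph):

```python
import numpy as np

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by edge (2,3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)
N = A / np.sqrt(np.outer(d, d))          # N = D^(-0.5) A D^(-0.5)

vals, vecs = np.linalg.eigh(N)           # eigenvalues ascending
x = vecs[:, -2]                          # eigenvector of the second largest

# Sign of this eigenvector is the cluster indicator
cluster = x > 0
assert set(np.where(cluster)[0]) in ({0, 1, 2}, {3, 4, 5})
```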

Page 10:

A Different Spectral Clustering Method by the Adjacency Matrix

Define the density D(G) of the graph as (# of edges within the community - # of edges across the community) / # of nodes; we want to find the clusters with high density:

max over s in {0, 1}^n of (s' A s) / (s' s)

The eigen-decomposition offers the heuristic solution: the eigenvalue is D(G) and the corresponding eigenvector is the cluster indicator.
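A sketch of the density heuristic (hypothetical example): relaxing s to real values turns the maximization into a Rayleigh quotient, whose optimum is the leading eigenvalue and eigenvector of A; thresholding the leading eigenvector then picks out the dense community.

```python
import numpy as np

# Hypothetical example: a dense 5-clique (nodes 0-4) with a sparse tail (5, 6)
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]  # the clique
edges += [(0, 5), (5, 6)]                                    # the tail
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Relaxed problem: max s'As / s's over real s is the largest eigenvalue of A,
# attained by the leading eigenvector (Rayleigh quotient)
vals, vecs = np.linalg.eigh(A)
lead = np.abs(vecs[:, -1])               # Perron vector, sign-normalized

# The largest entries of the leading eigenvector mark the dense community
community = set(np.argsort(lead)[-5:])
assert community == {0, 1, 2, 3, 4}
```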

Page 11:

2-way clustering

With a positively weighted adjacency matrix, the first eigenvalue and the entries of the first eigenvector are always positive, by the Perron-Frobenius theorem.

2-way clustering has two situations here:

Two clear clusters: the eigenvector of the largest eigenvalue contains zeros. With two disconnected clusters, A is block diagonal:

A = [ A1  0 ]
    [ 0   A2 ]

with A1 x1 = λ1 x1 and A2 x2 = λ2 x2, so (x1, 0) and (0, x2) are eigenvectors of A, each zero on one cluster.
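This case can be illustrated with two disconnected triangles (hypothetical example):

```python
import numpy as np

# Hypothetical example: two disconnected triangles give a block-diagonal A
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Each triangle contributes eigenvalue 2 (its own Perron eigenvalue),
# so the largest eigenvalue of A has multiplicity 2
vals = np.linalg.eigvalsh(A)
assert np.allclose(vals[-2:], [2, 2])

# An eigenvector supported on one block is zero on the other cluster
x = np.array([1, 1, 1, 0, 0, 0]) / np.sqrt(3)
assert np.allclose(A @ x, 2 * x)
```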

Page 12:

Cont.

Two mixed clusters: no zeros in the first eigenvector.

If the second largest eigenvalue in magnitude is positive, the graph contains two major communities: x_i > 0 is one community and x_i < 0 is the other.

If the second largest eigenvalue in magnitude is negative, the graph is bipartite: x_i > 0 is one side and x_i < 0 is the other.

K-way clustering is a straightforward extension.

Page 13:

Experiment Results

Political Books (105 nodes, 92 labeled) — confusion counts per method:

             Cluster 1  Cluster 2
  A  Label 1     48          2
     Label 2      1         41
  L  Label 1     47          0
     Label 2      2         43
  N  Label 1     48          0
     Label 2      2         41

Political Blogs (1222 nodes, labeled into two groups):

             Cluster 1  Cluster 2
  A  Label 1    567        597
     Label 2     19         39
  L  Label 1    584        632
     Label 2      2          4
  N  Label 1    517         12
     Label 2     69        624

Page 14:

Conclusion

The adjacency matrix contains much information that can be used for the clustering, ranking, and visualization of networks.

We propose a clustering method based on graph density; experiment results show that this method works better than those based on L and N on some datasets.