Statistical perturbation theory for spectral clustering Harrachov, 2007 A. Spence and Z. Stoyanov


Page 1

Statistical perturbation theory for spectral clustering

Harrachov, 2007

A. Spence and Z. Stoyanov

Page 2

Plan of the Talk

A. Clustering (Brief overview).

B. Deterministic Perturbation Theory.

C. Statistical Perturbation Theory.

Page 3

Graph Clustering

[Figure: graph with seven nodes, labelled 1 to 7.]

Page 4

Graph Clustering

[Figure: graph with seven nodes, labelled 1 to 7.]

Page 5

Graph Clustering + Perturbation

[Figure: graph with seven nodes, labelled 1 to 7; node 5 is marked "5?".]

Page 6

Gene Expression Data Clustering

An Application

• There are over 10 000 genes expressed in any one tissue;

• DNA arrays typically produce very noisy data.

Issues:

1. Do genes in the same cluster behave similarly?

2. Do genes in different clusters behave differently?

Page 7

Bi-partite Graphs

[Figure: bipartite graph with nodes 1 to 4 in one group and nodes 1 to 3 in the other.]

Page 8

Matrix Form

Page 9

A Real Data Matrix (Leukemia)

Page 10

Spectral Clustering: General Idea

Discrete Optimisation Problem (NP-Hard)

Approximation

Real Optimisation Problem (Tractable)

Exact - Impractical

Heuristic - Practical

Page 11

Discrete Optimisation SVD

[Figure: diagram with parts labelled Active and Inactive.]

Solution: Singular Value Decomposition of W_scaled
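The transcript does not preserve how W_scaled is formed or how the clusters are read off. A minimal sketch, assuming the common scaling W_scaled = D_r^{-1/2} W D_c^{-1/2} (D_r and D_c being the diagonal matrices of row and column sums) and a sign-based split on the second singular pair. The function name `bipartition_by_svd` and the toy matrix are hypothetical:

```python
import numpy as np

def bipartition_by_svd(W):
    """Split the rows and columns of a non-negative matrix W into two groups.

    Assumption: W_scaled = D_r^{-1/2} W D_c^{-1/2}; the slides name W_scaled
    but the transcript does not define it.
    """
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1)
    col_sums = W.sum(axis=0)
    W_scaled = W / np.sqrt(np.outer(row_sums, col_sums))
    U, s, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    # The leading singular pair is trivial (singular value 1); the second
    # pair carries the partition. Undo the scaling before taking signs.
    u2 = U[:, 1] / np.sqrt(row_sums)
    v2 = Vt[1, :] / np.sqrt(col_sums)
    return u2 >= 0, v2 >= 0, s

# Hypothetical 4 x 3 weight matrix: rows 0-1 go with columns 0-1,
# rows 2-3 go with column 2, plus a little cross-talk.
W = np.array([[5.0, 4.0, 0.1],
              [4.0, 5.0, 0.0],
              [0.0, 0.2, 3.0],
              [0.1, 0.0, 4.0]])
rows, cols, s = bipartition_by_svd(W)
print("row groups:", rows)
print("column groups:", cols)
```

Rows and columns that receive the same boolean label fall in the same group; which group is labelled True depends on the arbitrary sign of the singular vectors.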

Page 12

Clustering Algorithm: Summary

[Figure: diagram with parts labelled ACTIVE and INACTIVE.]

Page 13

Literature

Page 14

Types of Graph Matrices

Page 15

How we Cluster

Page 16

Leukemia Data

Page 17

Clustered Leukemia Data

Page 18

Inaccuracies in the Data (Perturbation Theory)

Page 19

Perturbation Theory (Deterministic Noise)

Page 20

Deterministic Perturbation (Symmetric Matrix)

Page 21

Linear Solve

Page 22

Taylor Expansions
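The formulas on these slides are not preserved in the transcript. For a symmetric matrix A with a simple eigenvalue, a standard construction differentiates (A + eps E) v = lambda v together with v^T v = 1 at eps = 0, which gives one bordered linear system for the first-order Taylor coefficients lambda'(0) and v'(0). The sketch below follows that construction and is not necessarily the authors' formulation; `eigenpair_derivative`, A and E are illustrative names and data:

```python
import numpy as np

def eigenpair_derivative(A, E, k=-1):
    """First-order change of the k-th eigenpair of symmetric A in direction E.

    Solves  [A - lam I   -v] [v'  ]   [-E v]
            [   v^T       0] [lam'] = [  0 ]
    which is non-singular when lam is a simple eigenvalue.
    """
    lam_all, V = np.linalg.eigh(A)
    lam, v = lam_all[k], V[:, k]
    n = A.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A - lam * np.eye(n)
    M[:n, n] = -v
    M[n, :n] = v
    rhs = np.concatenate([-E @ v, [0.0]])
    sol = np.linalg.solve(M, rhs)
    return lam, v, sol[n], sol[:n]   # lam, v, lam'(0), v'(0)

# Hypothetical symmetric matrix with well-separated eigenvalues.
rng = np.random.default_rng(0)
A = np.diag([1.0, 2.0, 3.0, 4.0, 6.0]) + 0.1
G = rng.standard_normal((5, 5))
E = (G + G.T) / 2                     # symmetric perturbation direction
lam, v, dlam, dv = eigenpair_derivative(A, E)

# Compare the first-order Taylor expansion with a direct recomputation.
eps = 1e-4
lam_eps_all, V_eps = np.linalg.eigh(A + eps * E)
lam_eps, v_eps = lam_eps_all[-1], V_eps[:, -1]
if v_eps @ v < 0:                     # eigenvectors are defined up to sign
    v_eps = -v_eps
err_lam = abs(lam_eps - (lam + eps * dlam))
err_vec = np.linalg.norm(v_eps - (v + eps * dv))
print(err_lam, err_vec)
```

Both errors should shrink like eps squared, and the computed lambda'(0) agrees with the closed form v^T E v.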

Page 23

Rectangular Case Symmetric

Page 24

Random Perturbations (plan)

• The Model

• Issues with the Theory

• A Possible Solution via Simulations?

• Experiments

Page 25

The Model

[Figure: graph with seven nodes, labelled 1 to 7.]

Page 26

Difficulties with Random Matrix Theory (RMT)

Page 27

Deterministic Perturbation Stochastic Perturbation

(simple eigenvector)

Page 28

Deterministic Perturbation Stochastic Perturbation

(simple eigenvalues)

Page 29

PP Plot - Test for Normality (Largest eigenvalue of a Symmetric Matrix)

Page 30

Simulated Random Perturbation (Largest eigenvalue of a Symmetric Matrix)
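The plot itself is lost, but a simulation of this kind can be sketched as follows. The noise model is an assumption: E is symmetric with independent N(0, 1) off-diagonal entries and N(0, 2) diagonal entries, in which case first-order theory predicts that (lambda_max(A + eps E) - lambda_max(A)) / eps is close to v^T E v, which is N(0, 2). The matrix A and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trials, eps = 8, 2000, 1e-2

# Hypothetical symmetric matrix with simple eigenvalues 1, 2, ..., n.
A = np.diag(np.arange(1.0, n + 1))
lam0 = np.linalg.eigvalsh(A)[-1]

# Assumed noise model: symmetric Gaussian perturbations, one per trial.
G = rng.standard_normal((n_trials, n, n))
E = (G + G.transpose(0, 2, 1)) / np.sqrt(2)

# Largest eigenvalue of every perturbed matrix (eigvalsh accepts stacks).
lam_max = np.linalg.eigvalsh(A + eps * E)[:, -1]

# Standardised shift; first-order theory predicts mean 0 and std sqrt(2).
z = (lam_max - lam0) / eps
sample_mean = z.mean()
sample_std = z.std(ddof=1)
print(sample_mean, sample_std, np.sqrt(2))
```

Sorting `z` and plotting it against normal quantiles would give the kind of normality check the preceding slide title refers to.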

Page 31

Deterministic Perturbation Stochastic Perturbation

(simple eigenvectors)

Page 32

Results for Laplacian Matrices

Page 33

Functional of the Eigenvector

Page 34

Results for $h^T v_2$

Page 35

PP Plot of $h^T v'(0)$ - Test for Normality ($h = e_j$)

Page 36

Histogram of $h^T v'(0)$ - Simulations ($h = e_j$)

Page 37

PP Plot of Simulated v[j]() (Distribution close to Normal)

Page 38

Histogram of Simulated v[j]() (Distribution close to Normal)

Page 39

Extension to the Rectangular Case

Page 40

Probability of “Wrong Clustering”
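The slide content is not preserved, so the following is only a Monte Carlo sketch of how such a probability could be estimated, under several assumptions: clusters are read from the signs of the Fiedler vector of the Laplacian L = D - W, noise is independent Gaussian on existing edges (clipped at zero), and a clustering counts as "wrong" if any node changes side. The function names and the seven-node toy graph are hypothetical:

```python
import numpy as np

def fiedler_labels(W):
    """Boolean cluster labels from the signs of the Fiedler vector of D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, V = np.linalg.eigh(L)          # eigenvalues in ascending order
    v2 = V[:, 1]                      # eigenvector of the second smallest
    return v2 > 0, v2

def wrong_clustering_probability(W, eps, n_trials, rng):
    """Fraction of noisy trials in which at least one node changes cluster."""
    n = W.shape[0]
    ref_labels, ref_v2 = fiedler_labels(W)
    mask = W > 0                      # assumed: only existing edges are noisy
    wrong = 0
    for _ in range(n_trials):
        G = np.triu(rng.standard_normal((n, n)), 1)
        noise = (G + G.T) * mask      # symmetric, zero diagonal
        W_pert = np.clip(W + eps * noise, 0.0, None)
        labels, v2 = fiedler_labels(W_pert)
        if v2 @ ref_v2 < 0:           # remove the eigenvector sign ambiguity
            labels = ~labels
        wrong += int(np.any(labels != ref_labels))
    return wrong / n_trials

# Hypothetical graph: cliques {0,1,2,3} and {4,5,6} joined by a weak edge 3-4.
W = np.zeros((7, 7))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.5

rng = np.random.default_rng(0)
p_small = wrong_clustering_probability(W, 1e-8, 200, rng)
p_large = wrong_clustering_probability(W, 1.0, 200, rng)
print(p_small, p_large)
```

With negligible noise the estimate is zero; how quickly it grows with eps depends on how close the Fiedler vector entries are to zero.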

Page 41

Issues with Numerics

Page 42

Efficient Simulations

Page 43

Solution via Simulations?

Page 44

Solution via Simulations? (Algorithm)

Page 45

Comparing: Direct Calculation vs. Repeated Linear Solve
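The comparison shown on the slide is not preserved. One possible reading, offered as an assumption: "direct calculation" recomputes the eigenvector of each perturbed matrix, while "repeated linear solve" reuses a single bordered matrix built from the unperturbed eigenpair and only changes the right-hand side for each sample. A minimal sketch of that reading, with hypothetical data and parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_samples, eps = 6, 50, 1e-4

# Hypothetical symmetric matrix and its largest eigenpair.
A = np.diag(np.arange(1.0, n + 1)) + 0.1
lam_all, V = np.linalg.eigh(A)
lam, v = lam_all[-1], V[:, -1]

# Symmetric random perturbation directions, one per sample.
G = rng.standard_normal((n_samples, n, n))
E = (G + G.transpose(0, 2, 1)) / 2

# Direct calculation: a full eigendecomposition for every sample.
direct = np.empty((n_samples, n))
for i in range(n_samples):
    _, Vi = np.linalg.eigh(A + eps * E[i])
    vi = Vi[:, -1]
    direct[i] = vi if vi @ v > 0 else -vi   # fix the sign ambiguity

# Repeated linear solve: one bordered matrix, all right-hand sides at once.
M = np.zeros((n + 1, n + 1))
M[:n, :n] = A - lam * np.eye(n)
M[:n, n] = -v
M[n, :n] = v
rhs = np.zeros((n + 1, n_samples))
rhs[:n, :] = -(E @ v).T                     # column i holds -E_i v
sol = np.linalg.solve(M, rhs)
first_order = v + eps * sol[:n, :].T        # v + eps * v'(0) for each sample

max_gap = np.abs(direct - first_order).max()
print(max_gap)
```

The two sets of eigenvectors differ only at second order in eps, so for small perturbations the linear-solve route reproduces the direct results while avoiding one eigendecomposition per sample.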