Advanced Machine Learning


TRANSCRIPT

  • Slide 1/36

    CS 221: Artificial Intelligence

    Lecture 6:

    Advanced Machine Learning

    Sebastian Thrun and Peter Norvig

    Slide credit: Marc Pollefeys, Dan Klein, Chris Manning

  • Slide 2/36

    Outline

    Clustering

    K-Means

    EM

    Spectral Clustering

    Dimensionality Reduction


  • Slide 3/36

    The unsupervised learning problem


    Many data points, no labels

  • Slide 4/36

    Unsupervised Learning?

    Google Street View

    http://maps.google.com/
  • Slide 5/36

    K-Means


    Many data points, no labels

  • Slide 6/36

    K-Means

    Choose a fixed number of clusters.

    Choose cluster centers and point-cluster allocations to minimize error. We can't do this by exhaustive search, because there are too many possible allocations.

    Algorithm: fix the cluster centers and allocate each point to the closest cluster; then fix the allocation and compute the best cluster centers.

    x could be any set of features for which we can compute a distance (be careful about scaling).

    The error being minimized is

    $$\Phi = \sum_{i \in \text{clusters}} \; \sum_{j \in \text{elements of } i\text{th cluster}} \left\| x_j - \mu_i \right\|^2$$

    * From Marc Pollefeys COMP 256 2003
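    A minimal NumPy sketch of this alternation (the function name and the initialization are illustrative, not from the lecture):

```python
import numpy as np

def kmeans(x, k, n_iters=100, seed=0):
    """Alternate between allocating points and refitting cluster centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # init from data points
    for _ in range(n_iters):
        # Fix centers: allocate each point to its closest cluster.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Fix allocation: the best center for each cluster is its mean.
        new_centers = np.array([x[labels == j].mean(axis=0) if (labels == j).any()
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # no center moved: converged
            break
        centers = new_centers
    return centers, labels
```

    Each step can only lower the squared-error objective above, so the loop terminates, though at a local optimum that depends on the initialization.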

  • Slide 7/36

    K-Means

  • Slide 8/36

    K-Means

    * From Marc Pollefeys COMP 256 2003

  • Slide 9/36

    Results of K-Means Clustering:

    K-means clustering using intensity alone and color alone.

    (Figure panels: Image, Clusters on intensity, Clusters on color.)

    * From Marc Pollefeys COMP 256 2003

  • Slide 10/36

    K-Means

    K-Means is an approximation to EM.

    Model (hypothesis space): mixture of N Gaussians

    Latent variables: correspondence of data points and Gaussians

    We notice:

    Given the mixture model, it's easy to calculate the correspondence.

    Given the correspondence, it's easy to estimate the mixture models.

  • Slide 11/36

    Expectation Maximization: Idea

    Data generated from a mixture of Gaussians

    Latent variables: correspondence between data items and Gaussians

  • Slide 12/36

    Generalized K-Means (EM)

  • Slide 13/36

    Gaussians

    $$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

    $$p(\mathbf{x}) = (2\pi)^{-\frac{N}{2}}\, \left|\Sigma\right|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$
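    The same two densities written out in NumPy as a sanity check (a sketch; the function names are ours):

```python
import numpy as np

def gaussian_1d(x, mu, sigma):
    # p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def gaussian_nd(x, mu, cov):
    # p(x) = (2 pi)^(-N/2) |Sigma|^(-1/2) exp(-(x - mu)^T Sigma^{-1} (x - mu) / 2)
    n = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    norm = (2 * np.pi) ** (-n / 2) * np.linalg.det(cov) ** -0.5
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))
```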

  • Slide 14/36

    ML Fitting Gaussians

    $$\mu = \frac{1}{M} \sum_i x_i, \qquad \Sigma = \frac{1}{M} \sum_i (x_i - \mu)(x_i - \mu)^T$$

    for the densities

    $$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad p(\mathbf{x}) = (2\pi)^{-\frac{N}{2}}\, \left|\Sigma\right|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$
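    Both maximum-likelihood estimates are one line each in NumPy (a sketch, with x holding one sample per row):

```python
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood mean and covariance for samples x of shape (M, N)."""
    mu = x.mean(axis=0)               # mu = (1/M) sum_i x_i
    diff = x - mu
    sigma = diff.T @ diff / len(x)    # Sigma = (1/M) sum_i (x_i - mu)(x_i - mu)^T
    return mu, sigma
```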

  • Slide 15/36

    Learning a Gaussian Mixture (with known covariance)

    E-Step:

    $$E[z_{ij}] = \frac{p(x_i \mid \mu_j)}{\sum_{n=1}^{k} p(x_i \mid \mu_n)} = \frac{\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_j)^2\right)}{\sum_{n=1}^{k} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_n)^2\right)}$$

    M-Step:

    $$n_j = \sum_{i=1}^{M} E[z_{ij}], \qquad \mu_j \leftarrow \frac{1}{n_j} \sum_{i=1}^{M} E[z_{ij}]\, x_i, \qquad \Sigma_j \leftarrow \frac{1}{n_j} \sum_{i=1}^{M} E[z_{ij}]\,(x_i - \mu_j)(x_i - \mu_j)^T$$
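    A minimal sketch of these two updates, assuming 1-D data and a known, shared variance sigma2 (the slide's setting); the initialization and names are ours:

```python
import numpy as np

def em_mixture(x, k, sigma2, n_iters=50, seed=0):
    """EM for a k-component 1-D Gaussian mixture with known variance sigma2."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)       # initial guesses for the means
    for _ in range(n_iters):
        # E-step: E[z_ij] ~ exp(-(x_i - mu_j)^2 / (2 sigma2)), normalized over j.
        logp = -(x[:, None] - mu[None, :]) ** 2 / (2 * sigma2)
        w = np.exp(logp - logp.max(axis=1, keepdims=True))  # stable exponentials
        ez = w / w.sum(axis=1, keepdims=True)               # responsibilities (M, k)
        # M-step: n_j = sum_i E[z_ij];  mu_j = (1/n_j) sum_i E[z_ij] x_i
        nj = ez.sum(axis=0)
        mu = (ez * x[:, None]).sum(axis=0) / nj
    return mu, ez
```

    With hard 0/1 responsibilities in place of E[z_ij], the same loop reduces to K-means, which is why the previous slide calls EM a generalized K-means.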

  • Slide 16/36

    Expectation Maximization

    Converges!

    Proof [Neal/Hinton, McLachlan/Krishnan]:

    Each E/M step does not decrease the data likelihood.

    Converges at a local minimum or saddle point.

    But subject to local minima.

  • Slide 17/36

    EM Clustering: Results

    http://www.ece.neu.edu/groups/rpl/kmeans/

  • Slide 18/36

    Practical EM

    Number of clusters unknown

    Suffers (badly) from local minima

    Algorithm: start a new cluster center if many points are unexplained; kill a cluster center that doesn't contribute.

    (Use the AIC/BIC criterion for all this if you want to be formal; a sketch follows below.)
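    One concrete way to apply that AIC/BIC suggestion is to fit mixtures for a range of k and keep the best-scoring one; a sketch using scikit-learn's GaussianMixture (the library choice is ours, not the lecture's):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_k_by_bic(x, k_max=10, seed=0):
    """Fit mixtures for k = 1..k_max on x (shape M x N); return the lowest-BIC k."""
    best_k, best_bic = 1, np.inf
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(x)
        bic = gm.bic(x)          # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```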

  • Slide 19/36

    Spectral Clustering


  • Slide 20/36

    Spectral Clustering


  • Slide 21/36

    The Two Spiral Problem


  • Slide 22/36

    Spectral Clustering: Overview

    Data → Similarities → Block-Detection

    * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University

  • Slide 23/36

    Eigenvectors and Blocks

    Block matrices have block eigenvectors:

    $$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \;\xrightarrow{\text{eigensolver}}\; e_1 = \begin{pmatrix} .71 \\ .71 \\ 0 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ 0 \\ .71 \\ .71 \end{pmatrix}; \quad \lambda_1 = 2,\; \lambda_2 = 2,\; \lambda_3 = 0,\; \lambda_4 = 0$$

    Near-block matrices have near-block eigenvectors: [Ng et al., NIPS 02]

    $$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix} \;\xrightarrow{\text{eigensolver}}\; e_1 = \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix}; \quad \lambda_1 = 2.02,\; \lambda_2 = 2.02,\; \lambda_3 = -0.02,\; \lambda_4 = -0.02$$

    * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
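    These numbers are easy to reproduce; a short NumPy check (a sketch, not part of the slides):

```python
import numpy as np

A_block = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]], dtype=float)
A_near = np.array([[1, 1, .2, 0],
                   [1, 1, 0, -.2],
                   [.2, 0, 1, 1],
                   [0, -.2, 1, 1]])

for a in (A_block, A_near):
    vals, vecs = np.linalg.eigh(a)          # eigensolver for symmetric matrices
    order = np.argsort(vals)[::-1]          # largest eigenvalues first
    print(np.round(vals[order], 2))         # e.g. [2. 2. 0. 0.] for the block case
    print(np.round(vecs[:, order[:2]], 2))  # top two (near-)block eigenvectors,
                                            # possibly differing from the slide by sign
```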

  • Slide 24/36

    Spectral Space

    Can put items into blocks by eigenvectors:

    $$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix}, \qquad e_1 = \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix}$$

    Resulting clusters independent of row ordering:

    $$\begin{pmatrix} 1 & .2 & 1 & 0 \\ .2 & 1 & 0 & 1 \\ 1 & 0 & 1 & -.2 \\ 0 & 1 & -.2 & 1 \end{pmatrix}, \qquad e_1 = \begin{pmatrix} .71 \\ .14 \\ .69 \\ 0 \end{pmatrix},\; e_2 = \begin{pmatrix} 0 \\ .69 \\ -.14 \\ .71 \end{pmatrix}$$

    * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
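    A quick NumPy check of the row-ordering claim: the second matrix above is the first with rows/columns interleaved, and its spectral coordinates are the original ones permuted the same way (a sketch; `spectral_coords` is our name):

```python
import numpy as np

def spectral_coords(a, d=2):
    """Coordinates of each item in spectral space: entries of the top-d eigenvectors."""
    vals, vecs = np.linalg.eigh(a)
    return vecs[:, np.argsort(vals)[::-1][:d]]   # one row per item

A = np.array([[1, 1, .2, 0],
              [1, 1, 0, -.2],
              [.2, 0, 1, 1],
              [0, -.2, 1, 1]])
perm = np.array([0, 2, 1, 3])                    # the interleaved ordering above
coords = spectral_coords(A)
coords_perm = spectral_coords(A[np.ix_(perm, perm)])
# Permuting rows/columns just permutes the spectral coordinates (up to sign),
# so grouping items by these coordinates yields the same two clusters.
print(np.round(coords[perm], 2))
print(np.round(coords_perm, 2))
```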

  • Slide 25/36

    The Spectral Advantage

    The key advantage of spectral clustering is the spectral space representation:

    * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University

  • Slide 26/36

    Measuring Affinity

    Intensity:

    $$\text{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_i^2}\, \left\| I(x) - I(y) \right\|^2\right)$$

    Distance:

    $$\text{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_d^2}\, \left\| x - y \right\|^2\right)$$

    Texture:

    $$\text{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_t^2}\, \left\| c(x) - c(y) \right\|^2\right)$$

    * From Marc Pollefeys COMP 256 2003
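    A sketch of the distance version for an array of points (the intensity and texture versions just substitute I(x) or c(x) for x); the sigma here is the scale that the next slide varies:

```python
import numpy as np

def distance_affinity(points, sigma_d):
    """aff(x, y) = exp(-||x - y||^2 / (2 sigma_d^2)) for every pair of rows."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma_d ** 2))

# Example: a small point set; larger sigma_d makes distant points look more similar.
pts = np.random.default_rng(0).random((5, 2))
print(np.round(distance_affinity(pts, sigma_d=0.5), 2))
```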

  • Slide 27/36

    Scale affects affinity

    * From Marc Pollefeys COMP 256 2003

  • Slide 28/36

    * From Marc Pollefeys COMP 256 2003

  • Slide 29/36

    Dimensionality Reduction


  • Slide 30/36

    The Space of Digits (in 2D)


    M. Brand, MERL

  • Slide 31/36

    Dimensionality Reduction with PCA

    [email protected]

  • Slide 32/36

    Linear: Principal Components

    Fit a multivariate Gaussian

    Compute the eigenvectors of its covariance

    Project onto the eigenvectors with the largest eigenvalues
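    Those three steps, as a NumPy sketch (the function name is ours):

```python
import numpy as np

def pca_project(x, d):
    """Project samples x (shape M x N) onto the top-d principal components."""
    mu = x.mean(axis=0)                        # fit the Gaussian: its mean ...
    cov = np.cov(x, rowvar=False)              # ... and its covariance
    vals, vecs = np.linalg.eigh(cov)           # eigenvectors of the covariance
    top = vecs[:, np.argsort(vals)[::-1][:d]]  # those with the largest eigenvalues
    return (x - mu) @ top                      # project onto them
```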

  • Slide 33/36

    Other examples of unsupervised learning


    Mean face (after alignment)

    Slide credit: Santiago Serrano

  • Slide 34/36

    Eigenfaces

    Slide credit: Santiago Serrano

  • Slide 35/36

    Non-Linear Techniques

    Isomap

    Local Linear Embedding

    Isomap, Science, M. Balasubramanian and E. Schwartz

  • Slide 36/36

    SCAPE (Dragomir Anguelov et al.)