Lecture 12 - K-Means, GMMs, EM


TRANSCRIPT

Page 1: Lecture 12 - K-Means, GMMs, EM

Clustering

CS498

Page 2: Lecture 12 - K-Means, GMMs, EM

Today’s lecture

•  Clustering and unsupervised learning

•  Hierarchical clustering

•  K-means, K-medoids, VQ

Page 3: Lecture 12 - K-Means, GMMs, EM

Unsupervised learning

•  Supervised learning – Use labeled data to do something smart

•  What if the labels don’t exist?

Page 4: Lecture 12 - K-Means, GMMs, EM

Some inspiration

El Capitan, Yosemite National Park

Page 5: Lecture 12 - K-Means, GMMs, EM

The way we’ll see it

[3-D scatter plot of the El Capitan pixels in RGB space; axes: Red, Green, Blue, each from 0 to 1]

Page 6: Lecture 12 - K-Means, GMMs, EM

A new question

•  I see classes, but …

•  How do I find them? – Can I automate this? – How many are there?

•  Answer: Clustering

[RGB scatter plot of the El Capitan pixels; axes: Red, Green, Blue]

Page 7: Lecture 12 - K-Means, GMMs, EM

Clustering

•  Discover classes in data – Divide the data into sensible clusters

•  Fundamentally ill-defined problem –  There is often no correct solution

•  Relies on many user choices

Page 8: Lecture 12 - K-Means, GMMs, EM

Clustering process

•  Describe your data using features – What’s your objective?

•  Define a proximity measure – How is the feature space shaped?

•  Define a clustering criterion – When do samples make a cluster?

Page 9: Lecture 12 - K-Means, GMMs, EM

Know what you want

•  Features & objective matter – Which are the two classes?

Page 10: Lecture 12 - K-Means, GMMs, EM

Know what you want

•  Features & objective matter – Which are the two classes?

[Figure: Basketball player recruiting; x-axis: Player’s height, y-axis: Player’s knowledge of entomology]

Page 11: Lecture 12 - K-Means, GMMs, EM

Know your space

•  Define a sensible proximity measure

Page 12: Lecture 12 - K-Means, GMMs, EM

Know your space

•  Define a sensible proximity measure

[Figure: Angle of incidence, with tick marks at 0, 2π, 4π]

Page 13: Lecture 12 - K-Means, GMMs, EM

Know your cluster type

•  What forms a cluster in your space?

Page 14: Lecture 12 - K-Means, GMMs, EM

Know your cluster type

•  What forms a cluster in your space?

[Figure: Planes near airports; axes: Speed vs. Altitude]

Page 15: Lecture 12 - K-Means, GMMs, EM

Know your cluster type

•  What forms a cluster in your space?

[Figure: Sound bouncing off a wall; axes: Wavefront position vs. Intensity]

Page 16: Lecture 12 - K-Means, GMMs, EM

Know your cluster type

•  What forms a cluster in your space?

[Figure: Ants departing colony; compass directions North, South, East, West]

Page 17: Lecture 12 - K-Means, GMMs, EM

How many clusters?

•  The deeper you look, the more clusters you’ll get

Page 18: Lecture 12 - K-Means, GMMs, EM

There are no right answers!

•  Part of clustering is an art

•  You need to experiment to get there

•  But some good starting points exist

Page 19: Lecture 12 - K-Means, GMMs, EM

How to cluster

•  Tons of methods

•  We can use simple rule-based steps –  e.g., find the two closest points and merge them, then repeat

•  Or formulate a global criterion

Page 20: Lecture 12 - K-Means, GMMs, EM

Hierarchical methods

•  Agglomerative algorithms –  Keep pairing up your data

•  Divisive algorithms

–  Keep breaking up your data

Page 21: Lecture 12 - K-Means, GMMs, EM

Agglomerative Approach

•  Look at your data points and form pairs –  Keep at it

Page 22: Lecture 12 - K-Means, GMMs, EM

More formally

•  Represent the data as vectors: $X = \{x_i,\; i = 1,\dots,N\}$

•  Represent clusters by: $C_j$

•  Represent the clustering by: $R = \{C_j,\; j = 1,\dots,m\}$, e.g. $R = \{\{x_1, x_3\}, \{x_2\}, \{x_4, x_5, x_6\}\}$

•  Define a distance measure: $d(C_i, C_j)$

Page 23: Lecture 12 - K-Means, GMMs, EM

Agglomerative clustering

•  Choose: $R_0 = \{C_i = \{x_i\},\; i = 1,\dots,N\}$

•  For $t = 1, 2, \dots$

–  Among all clusters in $R_{t-1}$, find the cluster pair $\{C_i, C_j\}$ such that: $(i, j) = \operatorname*{argmin}_{i,j} d(C_i, C_j)$

–  Form the new cluster $C_q = C_i \cup C_j$ and replace the pair: $R_t = (R_{t-1} - \{C_i, C_j\}) \cup \{C_q\}$

•  Until we have only one cluster
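
A minimal Python sketch of this loop, assuming Euclidean distance between points and single-linkage distance between clusters (both are illustration choices, not fixed by the slide):

```python
import numpy as np

def single_linkage(Ci, Cj, X):
    """d(Ci, Cj) = distance between the two closest points of the two clusters."""
    return min(np.linalg.norm(X[a] - X[b]) for a in Ci for b in Cj)

def agglomerative(X, d=single_linkage):
    """Merge the closest pair of clusters until only one cluster remains.
    Returns the sequence of clusterings R_0, R_1, ..., R_{N-1}."""
    R = [frozenset([i]) for i in range(len(X))]    # R_0: every point is its own cluster
    history = [list(R)]
    while len(R) > 1:
        # Find the pair (C_i, C_j) in R_{t-1} minimizing d(C_i, C_j)
        a, b = min(((a, b) for a in range(len(R)) for b in range(a + 1, len(R))),
                   key=lambda p: d(R[p[0]], R[p[1]], X))
        Cq = R[a] | R[b]                           # C_q = C_i ∪ C_j
        R = [C for k, C in enumerate(R) if k not in (a, b)] + [Cq]
        history.append(list(R))                    # R_t
    return history

# Usage: history[t] is the clustering R_t (clusters are sets of sample indices)
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
for t, Rt in enumerate(agglomerative(X)):
    print(t, [sorted(C) for C in Rt])
```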

Page 24: Lecture 12 - K-Means, GMMs, EM

Pretty picture version

•  Dendrogram

[Dendrogram over x1 … x5 showing the nested clusterings R0–R4; vertical axis: similarity]

Page 25: Lecture 12 - K-Means, GMMs, EM

Pretty picture two

•  Venn diagram

[Venn diagram: the points x1 … x5 enclosed by the nested clusters R1–R4]

Page 26: Lecture 12 - K-Means, GMMs, EM

Cluster distance?

•  Complete linkage – Merge the clusters that result in the smallest diameter

•  Single linkage – Merge the clusters containing the two closest data points

•  Group average – Use the average of all pairwise distances
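
These criteria differ only in how d(Ci, Cj) is built from pairwise point distances; a small sketch, again assuming Euclidean point distances:

```python
import numpy as np

def pairwise(Ci, Cj, X):
    """All point-to-point distances between clusters Ci and Cj (index sets into X)."""
    return [np.linalg.norm(X[a] - X[b]) for a in Ci for b in Cj]

def single_link(Ci, Cj, X):      # distance of the two closest points
    return min(pairwise(Ci, Cj, X))

def complete_link(Ci, Cj, X):    # distance of the two farthest points (drives the merged diameter)
    return max(pairwise(Ci, Cj, X))

def group_average(Ci, Cj, X):    # mean of all pairwise distances
    return float(np.mean(pairwise(Ci, Cj, X)))
```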

Page 27: Lecture 12 - K-Means, GMMs, EM

What’s involved

•  At level t we have N − t clusters

•  At level t + 1 the pairs we consider are: $\binom{N-t}{2} = \frac{(N-t)(N-t-1)}{2}$

•  Overall comparisons: $\sum_{t=0}^{N-1} \binom{N-t}{2} = \frac{(N-1)N(N+1)}{6}$

Page 28: Lecture 12 - K-Means, GMMs, EM

Not good for our case

•  The El Capitan picture has 63,140 pixels

•  How many cluster comparisons is that? $\sum_{t=0}^{N-3} \binom{N-t}{2} = 41{,}946{,}968{,}141{,}536$

–  Thanks, but no thanks …

Page 29: Lecture 12 - K-Means, GMMs, EM

Divisive Clustering

•  Works the other way around

•  Start with all data in one cluster

•  Start dividing into sub-clusters

Page 30: Lecture 12 - K-Means, GMMs, EM

Divisive Clustering

•  Choose: $R_0 = \{X\}$

•  For $t = 1, 2, \dots$

–  For $k = 1,\dots,t$: find the least similar pair of sub-clusters within each cluster

–  Pick the least similar pair of all: $(k, i, j) = \operatorname*{argmax}_{k,i,j} d(C_{k,i}, C_{k,j})$

–  The new clustering is now: $R_t = (R_{t-1} - \{C_k\}) \cup \{C_{k,i}, C_{k,j}\}$

•  Until each point is its own cluster
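
One way to make the split step concrete, as a rough sketch only (the seeding heuristic below is my own choice, not prescribed by the slide): seed two sub-clusters with a cluster's two most dissimilar points, assign the remaining points to the nearer seed, and split the cluster whose seeds are farthest apart.

```python
import numpy as np

def best_split(C, X):
    """Split cluster C (a set of indices into X) around its two most dissimilar points.
    Returns (dissimilarity of the seed pair, sub-cluster i, sub-cluster j)."""
    pts = sorted(C)
    a, b = max(((p, q) for p in pts for q in pts if p < q),
               key=lambda pq: np.linalg.norm(X[pq[0]] - X[pq[1]]))
    Ci = {p for p in pts if np.linalg.norm(X[p] - X[a]) <= np.linalg.norm(X[p] - X[b])}
    Cj = set(pts) - Ci
    return np.linalg.norm(X[a] - X[b]), Ci, Cj

def divisive(X):
    """Keep splitting the least coherent cluster until every point is its own cluster."""
    R = [set(range(len(X)))]                         # R_0 = {X}
    history = [[set(C) for C in R]]
    while len(R) < len(X):
        # Evaluate the best split of every multi-point cluster, then pick the widest split
        k, (_, Ci, Cj) = max(((k, best_split(C, X)) for k, C in enumerate(R) if len(C) > 1),
                             key=lambda item: item[1][0])
        R = [C for idx, C in enumerate(R) if idx != k] + [Ci, Cj]  # replace C_k with its two halves
        history.append([set(C) for C in R])          # R_t
    return history
```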

Page 31: Lecture 12 - K-Means, GMMs, EM

Comparison

•  Which one is faster? – Agglomerative

•  Divisive has a complicated search step

•  Which one gives better results? – Divisive

•  Agglomerative makes only local observations

Page 32: Lecture 12 - K-Means, GMMs, EM

Using cost functions

•  Given a set of data $x_i$

•  Define a cost function: $J(\theta, U) = \sum_j \sum_i u_{ij}\, d(x_i, \theta_j)$

–  $\theta$ are the cluster parameters
–  $U$ is an assignment matrix, with $u_{ij} \in \{0,1\}$
–  $d(\cdot)$ is a distance function
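
A small sketch of this cost for a given assignment and set of cluster parameters, taking d as the squared Euclidean distance (anticipating k-means):

```python
import numpy as np

def cost(X, theta, U):
    """J(theta, U) = sum_j sum_i u_ij * d(x_i, theta_j),
    with d taken here as the squared Euclidean distance."""
    # X: (N, D) data, theta: (k, D) cluster parameters, U: (N, k) 0/1 assignment matrix
    d = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(axis=2)   # d(x_i, theta_j) for all i, j
    return float((U * d).sum())
```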

Page 33: Lecture 12 - K-Means, GMMs, EM

An iterative solution

•  We can’t use a gradient method –  The assignment matrix is binary-valued

•  We have two sets of parameters to find, θ and U –  Fix one and solve for the other, then swap roles –  Iterate until happy

Page 34: Lecture 12 - K-Means, GMMs, EM

Overall process

•  Initialize θ and iterate:

–  Estimate U: $u_{ij} = \begin{cases} 1, & \text{if } d(x_i, \theta_j) = \min_k d(x_i, \theta_k) \\ 0, & \text{otherwise} \end{cases}$

–  Estimate θ: $\sum_i u_{ij}\, \dfrac{\partial d(x_i, \theta_j)}{\partial \theta_j} = 0$

–  Repeat until satisfied

Page 35: Lecture 12 - K-Means, GMMs, EM

K-means

•  Standard and super-popular algorithm

•  Finds clusters in terms of region centers

•  Optimizes squared Euclidean distance

$d(x, \theta) = \| x - \theta \|^2$
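
Plugging this distance into the θ-update condition from the previous slide shows where the "means" come from:

```latex
\sum_i u_{ij}\,\frac{\partial}{\partial\theta_j}\,\lVert x_i - \theta_j\rVert^2
  = -2\sum_i u_{ij}\,(x_i - \theta_j) = 0
\quad\Longrightarrow\quad
\theta_j = \frac{\sum_i u_{ij}\, x_i}{\sum_i u_{ij}}
```

i.e. each θj is simply the mean of the samples currently assigned to cluster j.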

Page 36: Lecture 12 - K-Means, GMMs, EM

K-means algorithm

•  Initialize k means µ

•  Iterate

– Assign samples xi to closest mean µ

–  Estimate µ from assigned samples xi

•  Repeat until convergence
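
A minimal NumPy sketch of this loop (random-sample initialization and a fixed iteration cap are illustration choices; the slides only say to initialize the means and repeat until convergence):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's iteration: assign each sample to its closest mean,
    then re-estimate each mean from the samples assigned to it."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]            # initialize k means
    for _ in range(n_iter):
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean distances
        z = d.argmin(axis=1)                                     # assignment step (the matrix U)
        new_mu = np.array([X[z == j].mean(axis=0) if np.any(z == j) else mu[j]
                           for j in range(k)])                   # mean-update step
        if np.allclose(new_mu, mu):                              # converged
            break
        mu = new_mu
    return mu, z

# Toy usage on two well-separated blobs
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
means, labels = kmeans(X, k=2)
```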

Page 37: Lecture 12 - K-Means, GMMs, EM

Example run – step 1


Page 38: Lecture 12 - K-Means, GMMs, EM

Example run – step 2


Page 39: Lecture 12 - K-Means, GMMs, EM

Example run – step 3


Page 40: Lecture 12 - K-Means, GMMs, EM

Example run – step 4


Page 41: Lecture 12 - K-Means, GMMs, EM

Example run – step 5


Page 42: Lecture 12 - K-Means, GMMs, EM

How well does it work?

•  Converges to a (local) minimum of the cost function – Not for all distance functions, though!

•  Is heavily biased by starting positions – Various initialization tricks

•  It’s pretty fast!

Page 43: Lecture 12 - K-Means, GMMs, EM

K-Means on El Capitan

[Two 3-D scatter plots in RGB space (axes: Red, Green, Blue): the El Capitan pixels before and after k-means clustering]

Page 44: Lecture 12 - K-Means, GMMs, EM

K-means on El Capitan

Page 45: Lecture 12 - K-Means, GMMs, EM

K-means on El Capitan

[Another pair of RGB scatter plots (axes: Red, Green, Blue) of the El Capitan pixels and their k-means clustering]

Page 46: Lecture 12 - K-Means, GMMs, EM

K-means on El Capitan

Page 47: Lecture 12 - K-Means, GMMs, EM

One problem

•  K-means struggles with outliers


Page 48: Lecture 12 - K-Means, GMMs, EM

K-medoids

•  Medoid: –  The data point least dissimilar to all the others – Not as influenced by outliers as the mean

•  Replace means with medoids –  Redesign k-means as k-medoids
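
A sketch of the resulting medoid update, which would replace the mean update inside the k-means loop while the assignment step stays the same (squared Euclidean dissimilarity assumed for illustration; any dissimilarity works):

```python
import numpy as np

def medoid(points):
    """The member of `points` that is least dissimilar to all the others."""
    d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)  # pairwise dissimilarities
    return points[d.sum(axis=1).argmin()]

def kmedoids_update(X, z, k, centers):
    """Replace each center with the medoid of its assigned samples."""
    return np.array([medoid(X[z == j]) if np.any(z == j) else centers[j]
                     for j in range(k)])
```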

Page 49: Lecture 12 - K-Means, GMMs, EM

K-medoids

[Three scatter plots: Input data, k-means, k-medoids]

Page 50: Lecture 12 - K-Means, GMMs, EM

Vector Quantization

•  Use of clustering for compression –  Keep a codebook (≈ k-means) –  Transmit the nearest codebook vector instead of the current sample

•  We transmit only the cluster index, not the entire data for each sample
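
A sketch of the encode/decode round trip, assuming a codebook has already been trained (e.g. with k-means):

```python
import numpy as np

def vq_encode(X, codebook):
    """Map each sample to the index of its nearest codebook vector."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)                    # only these integer indices are transmitted

def vq_decode(indices, codebook):
    """Reconstruct each sample as its codebook vector."""
    return codebook[indices]

# Toy round trip with a stand-in codebook (in practice the trained k-means centers)
X = np.random.rand(1000, 3)                    # e.g. RGB pixels
codebook = np.random.rand(16, 3)               # stand-in for k = 16 trained centers
X_hat = vq_decode(vq_encode(X, codebook), codebook)
```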

Page 51: Lecture 12 - K-Means, GMMs, EM

Simple example


Page 52: Lecture 12 - K-Means, GMMs, EM

Vector Quantization in Audio

[Figure: an input audio sequence shown as a spectrogram (frequency vs. time), its frames mapped to clusters 1–4, and the resulting coded sequence]

Page 53: Lecture 12 - K-Means, GMMs, EM

Recap

•  Hierarchical clustering – Agglomerative, Divisive –  Issues with performance

•  K-means –  Fast and easy –  K-medoids for more robustness (but slower)