Unsupervised Continuous Clustering
Machine Learning: Jordan Boyd-Graber
University of Colorado Boulder
LECTURE 16



Lecture for Today

• What is clustering?

• K-Means

• Gaussian Mixture Models


Clustering

Classification: what is the label of a new point?

Clustering: how should we group these points?

Clustering: or is this the right grouping? What about this?


Clustering

Uses:

• genomics

• medical imaging

• social network analysis

• recommender systems

• market segmentation

• voter analysis


Microarray Gene Expression Data

From: “Skin layer-specific transcriptional profiles in normal and recessive yellow (Mc1re/Mc1re) mice” by April and Barsh in Pigment Cell Research (2006)


Medical Imaging (MRIs and PET scans)

From: “Fluorodeoxyglucose positron emission tomography of mild cognitive impairment with clinical follow-up at 3 years” by Pardo et al. in Alzheimer’s and Dementia (2010)


Social Networks

From: http://flowingdata.com/2008/03/12/17-ways-to-visualize-the-twitter-universe/


Recommender Systems

From: tech.hulu.com/blog/


Market Segmentation

From: mappinganalytics.com/map/segmentation-maps/segmentation-map.html


Voter Analysis

• soccer moms (female, middle-aged, married, middle income, white, kids, suburban)

• Nascar dads (male, middle-aged, married, middle income, white, kids, Southern, suburban or rural)

• security moms ( ... )

• low information voters

• Ivy League Elites


Clustering

Questions:

• how do we fit clusters?

• how many clusters should we use?

• how should we evaluate model fit?


K-Means

How do we fit the clusters?

• simplest method: K-means

• requires: real-valued data

• idea:
  ◦ pick K initial cluster means
  ◦ associate all points closest to mean k with cluster k
  ◦ use points in cluster k to update the mean for that cluster
  ◦ re-associate points closest to the new mean for k with cluster k
  ◦ use the new points in cluster k to update the mean for that cluster
  ◦ ...
  ◦ stop when no change between updates
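The steps above can be sketched in code. This is a minimal illustration in Python (the course's own snippets are in R); the function name `kmeans_simple` and its arguments are just for this sketch:

```python
import math

def kmeans_simple(points, centers, max_iter=100):
    """Repeat: assign each point to the closest mean, then refit each mean."""
    def dist(p, c):
        return math.sqrt(sum((pj - cj) ** 2 for pj, cj in zip(p, c)))

    assign = None
    for _ in range(max_iter):
        # associate each point with the closest center
        new_assign = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                      for p in points]
        if new_assign == assign:
            break  # no change between updates: stop
        assign = new_assign
        # refit each mean from the points currently in its cluster
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # an empty cluster keeps its old center in this sketch
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers
```

For example, two well-separated pairs of points given their own starting centers converge in one pass.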


K-Means

Animation at: http://shabal.in/visuals/kmeans/1.html


K-Means: Example

Data:

x1    x2
 0.4  -1.0
-1.0  -2.2
-2.4  -2.2
-1.0  -1.9
-0.5   0.6
-0.1   1.7
 1.2   3.3
 3.1   1.6
 1.3   1.6
 2.0   0.8

[Scatter plot of the ten points, X1 vs. X2]


K-Means: Example

Pick K centers (randomly):

(−1,−1) and (0, 0)

[Scatter plot of the points with the two initial centers marked, X1 vs. X2]


K-Means: Example

Calculate distance between points and those centers:

x1    x2    (−1,−1)  (0,0)
 0.4  -1.0    1.4     1.1
-1.0  -2.2    1.2     2.4
-2.4  -2.2    1.9     3.3
-1.0  -1.9    0.9     2.2
-0.5   0.6    1.6     0.8
-0.1   1.7    2.9     1.7
 1.2   3.3    4.8     3.5
 3.1   1.6    4.8     3.4
 1.3   1.6    3.5     2.1
 2.0   0.8    3.5     2.2

> centers <- rbind(c(-1, -1), c(0, 0))

> dist1 <- apply(x, 1, function(p) sqrt(sum((p - centers[1, ])^2)))

> dist2 <- apply(x, 1, function(p) sqrt(sum((p - centers[2, ])^2)))
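As a cross-check of the table, the same distances and assignments can be computed in Python (a hypothetical translation of the R snippet; only the assignments are asserted, since the table's values are rounded):

```python
import math

x = [(0.4, -1.0), (-1.0, -2.2), (-2.4, -2.2), (-1.0, -1.9), (-0.5, 0.6),
     (-0.1, 1.7), (1.2, 3.3), (3.1, 1.6), (1.3, 1.6), (2.0, 0.8)]
centers = [(-1.0, -1.0), (0.0, 0.0)]

def dist(p, c):
    return math.sqrt(sum((pj - cj) ** 2 for pj, cj in zip(p, c)))

# cluster index (1-based, as in the table) of the nearer center
cluster = [1 if dist(p, centers[0]) < dist(p, centers[1]) else 2 for p in x]
print(cluster)  # -> [2, 1, 1, 1, 2, 2, 2, 2, 2, 2]
```

Only points 2-4 end up with center (−1,−1), matching the table.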


K-Means: Example

Choose mean with smaller distance:

x1    x2    (−1,−1)  (0,0)
 0.4  -1.0    1.4     1.1
-1.0  -2.2    1.2     2.4
-2.4  -2.2    1.9     3.3
-1.0  -1.9    0.9     2.2
-0.5   0.6    1.6     0.8
-0.1   1.7    2.9     1.7
 1.2   3.3    4.8     3.5
 3.1   1.6    4.8     3.4
 1.3   1.6    3.5     2.1
 2.0   0.8    3.5     2.2

> dists <- cbind(dist1,dist2)

> cluster.ind <- apply(dists,1,which.min)


K-Means: Example

New clusters:

[Scatter plot of the points colored by their new cluster assignments, X1 vs. X2]


K-Means: Example

Refit means for each cluster:

• cluster 1: (−1.0,−2.2),(−2.4,−2.2), (−1.0,−1.9)

• new mean: (−1.5,−2.1)

• cluster 2: (0.4,−1.0),(−0.5, 0.6), (−0.1, 1.7),(1.2, 3.3), (3.1, 1.6), (1.3, 1.6),(2.0, 0.8)

• new mean: (1.0, 1.2)

[Scatter plot with the refit means (−1.5,−2.1) and (1.0,1.2) marked, X1 vs. X2]


K-Means: Example

Recalculate distances for each cluster:

x1    x2    (−1.5,−2.1)  (1.0,1.2)
 0.4  -1.0    2.2         2.3
-1.0  -2.2    0.5         4.0
-2.4  -2.2    1.0         4.9
-1.0  -1.9    0.5         3.8
-0.5   0.6    2.8         1.7
-0.1   1.7    4.1         1.2
 1.2   3.3    6.0         2.1
 3.1   1.6    5.8         2.0
 1.3   1.6    4.6         0.5
 2.0   0.8    4.6         1.1


K-Means: Example

Choose mean with smaller distance:

x1    x2    (−1.5,−2.1)  (1.0,1.2)
 0.4  -1.0    2.2         2.3
-1.0  -2.2    0.5         4.0
-2.4  -2.2    1.0         4.9
-1.0  -1.9    0.5         3.8
-0.5   0.6    2.8         1.7
-0.1   1.7    4.1         1.2
 1.2   3.3    6.0         2.1
 3.1   1.6    5.8         2.0
 1.3   1.6    4.6         0.5
 2.0   0.8    4.6         1.1


K-Means: Example

New clusters:

[Scatter plot of the points colored by their new cluster assignments, X1 vs. X2]


K-Means: Example

Refit means for each cluster:

• cluster 1: (0.4,−1.0),(−1.0,−2.2), (−2.4,−2.2),(−1.0,−1.9)

• new mean: (−1.0,−1.8)

• cluster 2: (−0.5, 0.6),(−0.1, 1.7), (1.2, 3.3),(3.1, 1.6), (1.3, 1.6), (2.0, 0.8)

• new mean: (1.2, 1.6)

[Scatter plot with the refit means (−1.0,−1.8) and (1.2,1.6) marked, X1 vs. X2]


K-Means: Example

Recalculate distances for each cluster:

x1    x2    (−1.0,−1.8)  (1.2,1.6)
 0.4  -1.0    1.6         2.7
-1.0  -2.2    0.4         4.4
-2.4  -2.2    1.5         5.2
-1.0  -1.9    0.1         4.1
-0.5   0.6    2.4         2.0
-0.1   1.7    3.6         1.2
 1.2   3.3    5.6         1.7
 3.1   1.6    5.3         1.9
 1.3   1.6    4.1         0.1
 2.0   0.8    4.0         1.2


K-Means: Example

Select smallest distance and compare these clusters with previous:

Table: New Clusters

x1    x2    (−1.0,−1.8)  (1.2,1.6)
 0.4  -1.0    1.6         2.7
-1.0  -2.2    0.4         4.4
-2.4  -2.2    1.5         5.2
-1.0  -1.9    0.1         4.1
-0.5   0.6    2.4         2.0
-0.1   1.7    3.6         1.2
 1.2   3.3    5.6         1.7
 3.1   1.6    5.3         1.9
 1.3   1.6    4.1         0.1
 2.0   0.8    4.0         1.2

Table: Old Clusters

(−1.5,−2.1)  (1.0,1.2)
 2.2          2.3
 0.5          4.0
 1.0          4.9
 0.5          3.8
 2.8          1.7
 4.1          1.2
 6.0          2.1
 5.8          2.0
 4.6          0.5
 4.6          1.1


K-Means in Practice

R has a function for K-means in the stats package; this is probably already loaded

• let’s use this for the Old Faithful data

> library(datasets)

> faith.2 <- kmeans(faithful, 2)

> names(faith.2)

> plot(faithful[, 1], faithful[, 2], col = faith.2$cluster,
+      pch = faith.2$cluster, lwd = 3)


[Scatter plot of the Old Faithful data: eruptions vs. wait, points colored by K-means cluster]


K-Means in R

K-means can be used for image segmentation

• partition image into multiple segments

• find boundaries of objects

• make art


K-Means Clustering

What if our data look like this?

[Scatter plot of the data, x1 vs. x2]


K-Means Clustering

True clustering:

[Scatter plot showing the true clustering, x1 vs. x2]


K-Means Clustering

K-means clustering (K = 2):

[Scatter plot showing the K-means clustering (K = 2), x1 vs. x2]


Mixture Models

K-means associates data with cluster centers.

What if we actually modeled the data?

• real-valued data

• observation xi in cluster ci

• have K clusters

• model each cluster with a Gaussian distribution

xi | ci = k ∼ N(µk, Σk)

• µk is mean vector, Σk is covariance matrix
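The generative story above can be sketched in code (a Python illustration rather than the course's R; the mixture weights and parameter values are made up, and the covariances are diagonal so each dimension can be drawn independently):

```python
import random

random.seed(0)

# two clusters: mixture weights, per-cluster means and per-dimension std devs
weights = [0.5, 0.5]
mus     = [(-2.0, -2.0), (2.0, 2.0)]
sigmas  = [(1.0, 1.0), (1.0, 1.0)]

def sample(n):
    """Draw cluster c_i, then x_i | c_i = k ~ N(mu_k, Sigma_k)."""
    data = []
    for _ in range(n):
        k = random.choices(range(len(weights)), weights=weights)[0]
        x = tuple(random.gauss(m, s) for m, s in zip(mus[k], sigmas[k]))
        data.append((k, x))
    return data
```

Calling `sample(500)` yields 500 labeled draws whose scatter looks like two overlapping Gaussian blobs.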


Mixture Models

Gaussian mixture model (K = 2):

[Scatter plot showing the Gaussian mixture model clustering (K = 2), x1 vs. x2]


Mixture Models

Why mixture models?

• more flexible: can account for clusters with different shapes

• have a data model (will be useful for choosing K)

• less sensitive to data scaling


Multivariate Gaussian

Multivariate Gaussian distribution for x ∈ R^d:

p(x | µ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp( −(1/2) (x − µ)ᵀ Σ^{−1} (x − µ) )

• µ is vector of means

• Σ is covariance matrix
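The density formula above can be evaluated directly. This is a small Python sketch (`mvn_pdf` is a hypothetical helper, hard-coded to 2×2 covariance matrices for brevity):

```python
import math

def mvn_pdf(x, mu, cov):
    """Evaluate the multivariate Gaussian density (2-d case, plain Python)."""
    d = len(mu)
    # determinant and inverse of a 2x2 covariance matrix, done by hand
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[ cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det,  cov[0][0] / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(d) for j in range(d))
    return (2 * math.pi) ** (-d / 2) * det ** (-0.5) * math.exp(-0.5 * quad)
```

At the mean the density is (2π)^{−d/2} |Σ|^{−1/2}; with the identity covariance in 2-d that is 1/(2π).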

Credit: Wikipedia

Multivariate Gaussian

pdf when µ = [0, 0] and Σ = [0.9 0.4; 0.4 0.3]:

[Surface plot of the density p(x) over X1 and X2]


Multivariate Gaussian

[Two scatter plots of samples from multivariate Gaussians with different covariance matrices, x1 vs. x2]


Fitting a Mixture Model

Mixture model:

• observation xi in cluster ci with K clusters

• model each cluster with a Gaussian distribution

xi | ci = k ∼ N(µk, Σk)

How do we find c1, . . . , cn (cluster assignments) and (µ1, Σ1), . . . , (µK, ΣK) (cluster parameters)?


Fitting a Mixture Model

First, let’s simplify the model:

• covariance matrices have only diagonal elements,

  Σ = [ σ1²  0   ...  0
        0    σ2² ...  0
        ...  ... ...  ...
        0    0   ...  σd² ]

• set σ1² = · · · = σd² and suppose the common value is known


Fitting a Mixture Model

Next, use a method similar to K-means:

• start with random cluster centers

• associate observations to clusters by (log-)likelihood,

  ℓ(xi | ci = k) = −(d/2) log(2π) − (1/2) log ∏_{j=1..d} σ²k,j − (1/2) Σ_{j=1..d} (xi,j − µk,j)² / σ²k,j

               ∝ −d log(σk) − (1/(2σk²)) Σ_{j=1..d} (xi,j − µk,j)²

               ∝ −Σ_{j=1..d} (xi,j − µk,j)²

• refit centers µ1, . . . , µK given clusters by

  µk,j = (1/nk) Σ_{ci=k} xi,j

• recluster observations ...

• stop when no change in clusters


Fitting a Mixture Model

clustering with K-means: minimize distance

  d(xi, µk) = sqrt( Σ_{j=1..d} (xi,j − µk,j)² )

clustering with GMM: maximize likelihood

  ℓ(xi | ci = k) ∝ −Σ_{j=1..d} (xi,j − µk,j)²

update means with K-means: use average

  µk,j = (1/nk) Σ_{ci=k} xi,j

update means with GMM: use average

  µk,j = (1/nk) Σ_{ci=k} xi,j


Fitting a Mixture Model

OK, now what if

  Σ = [ σ1²  0   ...  0
        0    σ2² ...  0
        ...  ... ...  ...
        0    0   ...  σd² ]

and σ1², . . . , σd² can take different values?

• use same algorithm

• update µk and σ²k with the maximum likelihood estimators,

  µk,j = (1/nk) Σ_{ci=k} xi,j

  σ²k,j = (1/nk) Σ_{ci=k} (xi,j − µk,j)²
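The two maximum likelihood updates can be written as a short Python helper (a sketch with hypothetical names; the slides' own code is R). Given the current hard assignments, it computes per-cluster, per-dimension means and variances:

```python
def update_params(points, assign, K):
    """Per-cluster, per-dimension ML estimates of mu_{k,j} and sigma^2_{k,j}."""
    mus, variances = [], []
    for k in range(K):
        xs = [p for p, a in zip(points, assign) if a == k]
        nk, d = len(xs), len(xs[0])
        # mu_{k,j} = (1/n_k) sum of x_{i,j} over points in cluster k
        mu = [sum(p[j] for p in xs) / nk for j in range(d)]
        # sigma^2_{k,j} = (1/n_k) sum of (x_{i,j} - mu_{k,j})^2
        var = [sum((p[j] - mu[j]) ** 2 for p in xs) / nk for j in range(d)]
        mus.append(mu)
        variances.append(var)
    return mus, variances
```

Note the variance is the biased ML estimator (divide by nk, not nk − 1), matching the formula above.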


Fitting a Mixture Model

Data:

x1    x2
-3.7  -0.4
 0.4   0.1
 0.4  -1.7
-0.4  -1.0
-1.3  -1.7
 1.0   3.3
 1.2   5.2
 1.3   0.3
 1.1  -0.8
 0.5   2.8

[Scatter plot of the ten points, X1 vs. X2]


Fitting a Mixture Model

• pick centers and variances: µ1 = [−1, −1], σ1² = [1, 1] and µ2 = [1, 1], σ2² = [1, 1]

• compute (proportional) log likelihoods,

  ℓ(xi | ci = k) = −Σ_{j=1..d} log(σk,j) − (1/2) Σ_{j=1..d} (xi,j − µk,j)² / σ²k,j

x1    x2    k = 1   k = 2
-3.7  -0.4   -3.8   -12.1
 0.4   0.1   -1.6    -0.6
 0.4  -1.7   -1.2    -3.8
-0.4  -1.0   -0.2    -3.0
-1.3  -1.7   -0.3    -6.3
 1.0   3.3  -11.2    -2.6
 1.2   5.2  -22.0    -9.0
 1.3   0.3   -3.6    -0.3
 1.1  -0.8   -2.2    -1.6
 0.5   2.8   -8.2    -1.7
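Which cluster wins for each point can be cross-checked in Python (a hypothetical sketch; the table's printed values are rounded, so only the winning cluster is asserted):

```python
import math

x = [(-3.7, -0.4), (0.4, 0.1), (0.4, -1.7), (-0.4, -1.0), (-1.3, -1.7),
     (1.0, 3.3), (1.2, 5.2), (1.3, 0.3), (1.1, -0.8), (0.5, 2.8)]
mus    = [(-1.0, -1.0), (1.0, 1.0)]
sigma2 = [(1.0, 1.0), (1.0, 1.0)]

def loglik(p, k):
    """Proportional log likelihood of point p under cluster k."""
    return sum(-math.log(math.sqrt(s)) - 0.5 * (pj - mj) ** 2 / s
               for pj, mj, s in zip(p, mus[k], sigma2[k]))

# assign each point to the higher-likelihood cluster (1-based, as in the table)
cluster = [1 if loglik(p, 0) > loglik(p, 1) else 2 for p in x]
print(cluster)  # -> [1, 2, 1, 1, 1, 2, 2, 2, 2, 2]
```

Points 1, 3, 4, and 5 go to cluster 1, the rest to cluster 2, matching the table.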


Fitting a Mixture Model

• fit new means and variances:

  µ1 = [−1.3, −1.2], σ1² = [3.1, 0.4]

  µ2 = [0.9, 1.8], σ2² = [0.2, 5.4]

• compute new log likelihoods ...


Fitting a Mixture Model

x1    x2    k = 1   k = 2
-3.7  -0.4   -1.8   -70.8
 0.4   0.1   -2.7    -1.0
 0.4  -1.7   -0.8    -2.0
-0.4  -1.0   -0.3    -6.8
-1.3  -1.7   -0.5   -16.6
 1.0   3.3  -27.4    -0.1
 1.2   5.2  -55.9    -1.3
 1.3   0.3   -4.3    -0.7
 1.1  -0.8   -1.2    -0.6
 0.5   2.8  -21.3    -0.7

No change, so clusters are final


Fitting a Mixture Model

[Scatter plot of the final clustering, X1 vs. X2]


Limitations of k-means / mixture models

k-means is fast and simple, but . . .

• What if your data are discrete?

• What if each data point belongs to more than one cluster? (digits vs. documents)

• What if you don’t know the number of clusters?
