Review: Mixtures of Gaussians, Connection to k-means
Case Study 2: Document Retrieval

Machine Learning for Big Data, CSE547/STAT548, University of Washington
John Thickstun, April 26, 2016
©Sham Kakade 2016


Page 1:

Machine Learning for Big Data CSE547/STAT548, University of Washington

John Thickstun

April 26, 2016

Review: Mixtures of Gaussians

Case Study 2: Document Retrieval

Page 2: Some Data

Page 3: Gaussian Mixture Model

Most commonly used mixture model.

Observations:

Parameters:

Cluster indicator:

Per-cluster likelihood:

Ex.: z_i = country of origin, x_i = height of the i-th person; the k-th mixture component is the distribution of heights in country k.
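A sketch of the standard formulation these labels typically refer to (the symbols x_i, z_i, π, μ_k, Σ_k are assumed names following common convention, not necessarily the slide's own):

```latex
% Standard mixture-of-Gaussians setup (assumed notation)
\text{Observations: } x_1, \dots, x_N \in \mathbb{R}^m
\text{Parameters: } \pi = (\pi_1, \dots, \pi_K), \; \{\mu_k, \Sigma_k\}_{k=1}^{K}
\text{Cluster indicator: } z_i \in \{1, \dots, K\}, \quad P(z_i = k) = \pi_k
\text{Per-cluster likelihood: } p(x_i \mid z_i = k) = \mathcal{N}(x_i \mid \mu_k, \Sigma_k)
```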

Page 4: Generative Model

• We can think of sampling observations from the model

• For each observation i:

– Sample a cluster assignment

– Sample the observation from the selected Gaussian
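A minimal Python sketch of this sampling procedure (the parameter names pi, mus, covs are illustrative, not from the slides):

```python
import numpy as np

def sample_gmm(n, pi, mus, covs, rng=None):
    """Draw n observations from a mixture of Gaussians.

    pi   : (K,) mixture weights summing to 1
    mus  : (K, m) component means
    covs : (K, m, m) component covariances
    """
    rng = np.random.default_rng() if rng is None else rng
    # For each observation i, sample a cluster assignment z_i ~ Categorical(pi)
    z = rng.choice(len(pi), size=n, p=pi)
    # ...then sample the observation from the selected Gaussian
    x = np.stack([rng.multivariate_normal(mus[k], covs[k]) for k in z])
    return x, z
```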


Page 5: Also Useful for Density Estimation

Contour Plot of Joint Density

Page 6: Density as Mixture of Gaussians

• Approximate density with a mixture of Gaussians


Figure panels: Mixture of 3 Gaussians; Contour Plot of Joint Density

Page 7: Density as Mixture of Gaussians

• Approximate density with a mixture of Gaussians


Mixture of 3 Gaussians

Page 8: Summary of GMM Components

• Observations

• Hidden cluster labels

• Hidden mixture means

• Hidden mixture covariances

• Hidden mixture probabilities


Gaussian mixture marginal and conditional likelihood:
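In standard notation (a sketch, with the symbols assumed above), the two quantities are:

```latex
% Marginal likelihood of an observation (cluster indicator summed out)
p(x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)

% Conditional probability of a cluster given the observation
p(z_i = k \mid x_i)
  = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
         {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
```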

Page 9:

Machine Learning for Big Data CSE547/STAT548, University of Washington

John Thickstun

April 26, 2016

Application to Document Modeling

Case Study 2: Document Retrieval

Page 10: Task 2: Cluster Documents

Now:

Cluster documents based on topic

Page 11: Document Representation

Bag of words model

document d
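A minimal Python sketch of the bag-of-words representation (naive whitespace tokenization assumed; real pipelines usually add stop-word removal, stemming, or tf-idf weighting):

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Represent a document by its word counts, ignoring word order."""
    tokens = document.lower().split()   # naive whitespace tokenization
    return Counter(tokens)              # word -> count in document d

# Example
print(bag_of_words("the quick brown fox jumps over the lazy dog"))
# Counter({'the': 2, 'quick': 1, 'brown': 1, ...})
```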

Page 12: A Generative Model

Documents:

Associated topics:

Parameters:

Page 13: A Generative Model

Documents:

Associated topics:

Parameters:

Generative model:
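A sketch of one standard generative story for this setup, a mixture of multinomials (π for topic proportions and β_k for topic k's distribution over the vocabulary are assumed names):

```latex
% For each document d with n_d words:
z_d \sim \mathrm{Categorical}(\pi)                             % pick a topic
x_d \mid z_d = k \;\sim\; \mathrm{Multinomial}(n_d, \beta_k)    % draw its word counts
```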

Page 14: Form of Likelihood

Conditioned on topic...

Marginalizing latent topic assignment:
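Under that mixture-of-multinomials sketch, these two expressions work out (up to count-normalization constants) to:

```latex
% Conditioned on topic z_d = k
p(x_d \mid z_d = k) \;\propto\; \prod_{w} \beta_{k,w}^{\,x_{d,w}}

% Marginalizing the latent topic assignment
p(x_d) \;=\; \sum_{k=1}^{K} \pi_k \, p(x_d \mid z_d = k)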

Page 15:

Machine Learning for Big Data CSE547/STAT548, University of Washington

John Thickstun

April 26, 2016

Review: EM Algorithm

Case Study 2: Document Retrieval

Page 16: Learning Model Parameters

• Want to learn model parameters

Figure: our actual observations; mixture of 3 Gaussians. (C. Bishop, Pattern Recognition & Machine Learning)

Page 17: ML Estimate of Mixture Model Params

Log likelihood

Want ML estimate

Assume exponential family

Neither convex nor concave, and has local optima
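For a mixture of Gaussians, the log likelihood in question has the standard form (a sketch):

```latex
\ell(\theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)
% the log of a sum over components is what makes this neither convex nor concave
```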

Page 18: Complete Data

• Imagine we have an assignment of each xi to a cluster

Figure: our actual observations; complete data labeled by true cluster assignments. (C. Bishop, Pattern Recognition & Machine Learning)

Page 19: Cluster Responsibilities

• We must infer the cluster assignments from the observations

C. Bishop, Pattern Recognition & Machine Learning

Posterior probabilities of assignments to each cluster *given* model parameters:

Soft assignments to clusters

Page 20: Iterative Algorithm

Motivates a coordinate ascent-like algorithm:

1. Infer missing values given estimate of parameters

2. Optimize parameters to produce a new estimate given the "filled in" data

3. Repeat

Example: MoG (derivation soon… + HW)

1. Infer "responsibilities"

2. Optimize parameters
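A compact Python sketch of these two steps for a mixture of Gaussians (the function name, the crude initialization, and the fixed iteration count are illustrative; the derivation comes later and on the homework):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=20, seed=0):
    """A bare-bones EM loop for a mixture of Gaussians (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    pi = np.full(K, 1.0 / K)                          # uniform initial weights
    mu = X[rng.choice(N, size=K, replace=False)]      # K random points as means
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(m) for _ in range(K)])

    for _ in range(n_iters):
        # E-step: infer responsibilities r[i, k] = P(z_i = k | x_i, params)
        r = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mu[k], sigma[k]) for k in range(K)
        ])
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-optimize parameters given the "filled in" soft assignments
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(m)
    return pi, mu, sigma, r
```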

Page 21: Gaussian Mixture Example: Start

Page 22: After first iteration

Page 23: After 2nd iteration

Page 24: After 3rd iteration

Page 25: After 4th iteration

Page 26: After 5th iteration

Page 27: After 6th iteration

Page 28: After 20th iteration

Page 29: Expectation Maximization (EM) – Setup

More broadly applicable than just the mixture models considered so far

Model:

Interested in maximizing (w.r.t. the parameters θ):

Special case:

x – observable, the "incomplete" data

z – not (fully) observable; (x, z) together form the "complete" data

θ – parameters

Page 30: EM Algorithm

Initial guess:

Estimate at iteration t:

E-Step

Compute

M-Step

Compute
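In the usual formulation, the two steps are (a sketch; θ^(t) denotes the estimate at iteration t):

```latex
% E-step: expected complete-data log likelihood under the current posterior over z
Q(\theta \mid \theta^{(t)}) \;=\; \mathbb{E}_{z \,\sim\, p(z \mid x, \theta^{(t)})}
      \big[ \log p(x, z \mid \theta) \big]

% M-step: maximize it to obtain the next estimate
\theta^{(t+1)} \;=\; \arg\max_{\theta} \; Q(\theta \mid \theta^{(t)})
```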

Page 31: Example – Mixture Models

E-Step Compute

M-Step Compute

Consider i.i.d. observations
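For the mixture-of-Gaussians case, the two steps work out to the standard updates (a sketch, with N_k = Σ_i r_ik assumed as notation):

```latex
% E-step: responsibilities
r_{ik} = \frac{\pi_k^{(t)} \, \mathcal{N}(x_i \mid \mu_k^{(t)}, \Sigma_k^{(t)})}
              {\sum_j \pi_j^{(t)} \, \mathcal{N}(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})}

% M-step: weighted maximum-likelihood updates, with N_k = \sum_i r_{ik}
\pi_k^{(t+1)} = \frac{N_k}{N}, \qquad
\mu_k^{(t+1)} = \frac{1}{N_k} \sum_i r_{ik} \, x_i, \qquad
\Sigma_k^{(t+1)} = \frac{1}{N_k} \sum_i r_{ik} \,(x_i - \mu_k^{(t+1)})(x_i - \mu_k^{(t+1)})^{\top}
```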

Page 32: Initialization

In the mixture model case, there are many ways to initialize the EM algorithm

Examples: Choose K observations at random to define each cluster. Assign other observations to the nearest "centroid" to form initial parameter estimates

Pick the centers sequentially to provide good coverage of data

Grow mixture model by splitting (and sometimes removing) clusters until K clusters are formed

Can be quite important to convergence rates in practice
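A Python sketch of the "pick the centers sequentially" idea, in the spirit of k-means++ seeding (this particular scheme is an illustration, not necessarily the one the slide intends):

```python
import numpy as np

def sequential_centers(X, K, rng=None):
    """Pick K centers one at a time, favoring points far from existing centers."""
    rng = np.random.default_rng() if rng is None else rng
    centers = [X[rng.integers(len(X))]]            # first center: uniform at random
    for _ in range(K - 1):
        # Squared distance of each point to its nearest chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        # Sample the next center with probability proportional to d2 (good coverage)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```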

Page 33: What you need to know

• Mixture model formulation

– Generative model

– Likelihood

• Expectation Maximization (EM) Algorithm

– Derivation

– Concept of non-decreasing log likelihood

– Application to standard mixture models


Page 34:

Machine Learning for Big Data CSE547/STAT548, University of Washington

John Thickstun

April 26, 2016

Review: Connection to k-means

Case Study 2: Document Retrieval

Page 35: K-means

1. Ask user how many clusters they’d like. (e.g. k=5)

2. Randomly guess k cluster Center locations

3. Each datapoint finds out which Center it’s closest to.

4. Each Center finds the centroid of the points it owns

Page 36: K-means

• Randomly initialize k centers: μ(0) = μ1(0), …, μk(0)

• Classify: Assign each point j ∈ {1, …, N} to the nearest center:

• Recenter: μi becomes the centroid of its points:

– Equivalent to μi = average of its points!
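A minimal Python sketch of the classify/recenter loop (names are illustrative; uses squared Euclidean distance):

```python
import numpy as np

def kmeans(X, k, n_iters=100, rng=None):
    """Plain k-means: alternate Classify and Recenter steps until centers stop moving."""
    rng = np.random.default_rng() if rng is None else rng
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(n_iters):
        # Classify: assign each point to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Recenter: each center moves to the average (centroid) of its points
        new_centers = np.array([
            X[assign == i].mean(axis=0) if np.any(assign == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```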


Page 37: Special Case: Spherical Gaussians + hard assignments

• If P(X|z=k) is spherical, with the same variance σ² for all classes:

• Then, compare EM objective with k-means:


P(x_i \mid z_i = k) \;\propto\; \exp\!\Big(\!-\frac{1}{2\sigma^2}\,\lVert x_i - \mu_k \rVert^2\Big)

P(z_i = k,\, x_i) \;=\; \frac{1}{(2\pi)^{m/2}\,\lvert \Sigma_k \rvert^{1/2}}
  \exp\!\Big(\!-\frac{1}{2}(x_i - \mu_k)^{\top}\Sigma_k^{-1}(x_i - \mu_k)\Big)\; P(z_i = k)
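A sketch of the standard comparison this bullet points to (assuming equal mixture weights and a fixed, shared σ²): with hard assignments, maximizing the complete-data log likelihood over assignments and means matches minimizing the k-means objective.

```latex
\max_{z,\,\mu}\; \sum_{i} \log P(z_i, x_i)
\;\;\Longleftrightarrow\;\;
\min_{z,\,\mu}\; \sum_{i} \lVert x_i - \mu_{z_i} \rVert^{2}
\qquad \text{(up to additive constants)}
```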