TRANSCRIPT
Machine Learning for Big Data CSE547/STAT548, University of Washington
John Thickstun
April 26, 2016
Review: Mixtures of Gaussians
Case Study 2: Document Retrieval
Some Data
Gaussian Mixture Model
Most commonly used mixture model.
Observations: $x_1, \ldots, x_N$, with $x_i \in \mathbb{R}^m$
Parameters: $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^K$ (mixture weights, means, covariances)
Cluster indicator: $z_i \in \{1, \ldots, K\}$
Per-cluster likelihood: $P(x_i \mid z_i = k) = N(x_i; \mu_k, \Sigma_k)$
Ex.: $z_i$ = country of origin, $x_i$ = height of the $i$th person; the $k$th mixture component is the distribution of heights in country $k$.
Generative Model
• We can think of sampling observations from the model
• For each observation i:
– Sample a cluster assignment $z_i$ with $P(z_i = k) = \pi_k$
– Sample the observation from the selected Gaussian: $x_i \sim N(\mu_{z_i}, \Sigma_{z_i})$
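A minimal sketch of this two-stage sampling in NumPy; the component weights, means, and covariances below are illustrative placeholders rather than values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-component mixture in 2D (placeholder parameters).
pi = np.array([0.6, 0.4])                      # mixture weights P(z = k)
mu = np.array([[0.0, 0.0], [3.0, 3.0]])        # per-cluster means
Sigma = np.array([np.eye(2), 0.5 * np.eye(2)]) # per-cluster covariances

def sample_gmm(n):
    """Sample n observations: draw a cluster z_i, then x_i ~ N(mu_z, Sigma_z)."""
    z = rng.choice(len(pi), size=n, p=pi)  # cluster assignments
    x = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])
    return x, z

x, z = sample_gmm(500)
```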
Also Useful for Density Estimation
Contour Plot of Joint Density
[Figure: contour plot of a joint density]
Density as Mixture of Gaussians
• Approximate the density with a mixture of Gaussians
[Figure: the same density approximated by a mixture of 3 Gaussians]
Summary of GMM Components
• Observations $x_i$
• Hidden cluster labels $z_i$
• Hidden mixture means $\mu_k$
• Hidden mixture covariances $\Sigma_k$
• Hidden mixture probabilities $\pi_k$
Gaussian mixture marginal and conditional likelihood:
Marginal: $P(x_i) = \sum_{k=1}^K \pi_k N(x_i; \mu_k, \Sigma_k)$
Conditional on the observation: $P(z_i = k \mid x_i) = \pi_k N(x_i; \mu_k, \Sigma_k) \,/\, P(x_i)$
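Both quantities are cheap to compute once the per-cluster densities are tabulated; a sketch assuming NumPy and SciPy (the function name is hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_marginal_and_posterior(x, pi, mu, Sigma):
    """Marginal p(x_i) = sum_k pi_k N(x_i; mu_k, Sigma_k) and the
    conditional p(z_i = k | x_i) for each observation (row of x)."""
    # dens[i, k] = pi_k * N(x_i; mu_k, Sigma_k)
    dens = np.column_stack([
        pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
        for k in range(len(pi))
    ])
    marginal = dens.sum(axis=1)             # p(x_i)
    conditional = dens / marginal[:, None]  # p(z_i = k | x_i)
    return marginal, conditional
```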
Application to Document Modeling
Case Study 2: Document Retrieval
Task 2: Cluster Documents
Now: cluster documents based on topic
Document Representation
Bag of words model: represent document d by its vector of word counts, ignoring word order.
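A quick illustration of the representation in plain Python (whitespace tokenization is a simplifying assumption):

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to its word-count vector; word order is discarded."""
    return Counter(text.lower().split())

bag_of_words("the dog chased the cat")
# Counter({'the': 2, 'dog': 1, 'chased': 1, 'cat': 1})
```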
A Generative Model
Documents: word-count vectors $n_d = (n_{d1}, \ldots, n_{dV})$ for $d = 1, \ldots, D$, over a vocabulary of $V$ words
Associated topics: $z_d \in \{1, \ldots, K\}$, the hidden topic of document $d$
Parameters: topic proportions $\pi = (\pi_1, \ldots, \pi_K)$ and per-topic word distributions $\beta_k = (\beta_{k1}, \ldots, \beta_{kV})$
Generative model: sample a topic $z_d$ with $P(z_d = k) = \pi_k$, then sample each word of the document independently from $\beta_{z_d}$
Form of Likelihood
Conditioned on topic: $P(\text{doc } d \mid z_d = k) \propto \prod_{w=1}^V \beta_{kw}^{n_{dw}}$
Marginalizing the latent topic assignment: $P(\text{doc } d) \propto \sum_{k=1}^K \pi_k \prod_{w=1}^V \beta_{kw}^{n_{dw}}$
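A sketch of this marginalization in NumPy, done in log space for numerical stability; the function name is hypothetical and the arrays follow the notation above:

```python
import numpy as np

def log_marginal_doc(counts, pi, beta):
    """log P(doc) = log sum_k pi_k prod_w beta[k, w]^counts[w].
    counts: length-V word counts n_dw; pi: length-K topic proportions;
    beta: K x V per-topic word distributions (rows sum to 1)."""
    log_lik = np.log(beta) @ counts  # log P(doc | z = k), up to a constant
    a = np.log(pi) + log_lik
    m = a.max()                      # log-sum-exp for numerical stability
    return m + np.log(np.exp(a - m).sum())
```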
Review: EM Algorithm
Case Study 2: Document Retrieval
Learning Model Parameters
• Want to learn the model parameters $\theta = \{\pi_k, \mu_k, \Sigma_k\}$ from the observations alone
Our actual observations
[Figure: unlabeled samples from a mixture of 3 Gaussians. From C. Bishop, Pattern Recognition & Machine Learning]
ML Estimate of Mixture Model Params
Log likelihood: $\ell(\theta) = \sum_{i=1}^N \log \sum_{k=1}^K \pi_k N(x_i; \mu_k, \Sigma_k)$
Want the ML estimate: $\hat{\theta} = \arg\max_\theta \ell(\theta)$
Assume the per-cluster likelihoods are in the exponential family.
The objective is neither convex nor concave, and it has local optima.
Complete Data
• Imagine we have an assignment of each $x_i$ to a cluster
[Figures: our actual observations (unlabeled) next to the complete data, labeled by true cluster assignments. From C. Bishop, Pattern Recognition & Machine Learning]
Cluster Responsibilities
• We must infer the cluster assignments from the observations
Posterior probabilities of assignments to each cluster *given* model parameters:
$r_{ik} = P(z_i = k \mid x_i, \theta) = \pi_k N(x_i; \mu_k, \Sigma_k) \,/\, \sum_j \pi_j N(x_i; \mu_j, \Sigma_j)$
Soft assignments to clusters.
[Figure: soft cluster assignments. From C. Bishop, Pattern Recognition & Machine Learning]
Iterative Algorithm
Motivates a coordinate-ascent-like algorithm:
1. Infer the missing values given an estimate of the parameters
2. Optimize the parameters to produce a new estimate given the “filled in” data
3. Repeat
Example: MoG (derivation soon… + HW)
1. Infer the “responsibilities”
2. Optimize the parameters
Gaussian Mixture Example
[Figures: EM fitting a Gaussian mixture, shown at the start and after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations]
Expectation Maximization (EM) – Setup
More broadly applicable than just the mixture models considered so far.
Model: $P(x, z \mid \theta)$
Interested in maximizing (wrt $\theta$): $P(x \mid \theta) = \sum_z P(x, z \mid \theta)$
Special case: the mixture models above, with $z$ the cluster indicators
$x$: observable – “incomplete” data
$(x, z)$: not (fully) observable – “complete” data
$\theta$: parameters
EM Algorithm
Initial guess: $\theta^{(0)}$
Estimate at iteration $t$: $\theta^{(t)}$
E-Step
Compute $Q(\theta; \theta^{(t)}) = E\big[\log P(x, z \mid \theta) \,\big|\, x, \theta^{(t)}\big]$
M-Step
Compute $\theta^{(t+1)} = \arg\max_\theta Q(\theta; \theta^{(t)})$
Example – Mixture Models
Consider i.i.d. observations $x_1, \ldots, x_N$.
E-Step: Compute the responsibilities
$r_{ik} = P(z_i = k \mid x_i, \theta^{(t)}) = \pi_k^{(t)} N(x_i; \mu_k^{(t)}, \Sigma_k^{(t)}) \,/\, \sum_j \pi_j^{(t)} N(x_i; \mu_j^{(t)}, \Sigma_j^{(t)})$
M-Step: Compute the weighted ML estimates
$\pi_k^{(t+1)} = \frac{1}{N}\sum_i r_{ik}$, $\quad \mu_k^{(t+1)} = \frac{\sum_i r_{ik}\, x_i}{\sum_i r_{ik}}$, $\quad \Sigma_k^{(t+1)} = \frac{\sum_i r_{ik}\,(x_i - \mu_k^{(t+1)})(x_i - \mu_k^{(t+1)})^T}{\sum_i r_{ik}}$
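Putting the two steps together, here is a compact EM sketch for a Gaussian mixture, assuming NumPy and SciPy. It is illustrative only: crude random initialization, a fixed number of iterations, and no guard against singular covariances:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(x, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture (illustrative sketch; no convergence check)."""
    rng = np.random.default_rng(seed)
    N, d = x.shape
    # Initialize: K random observations as means, identity covariances.
    mu = x[rng.choice(N, size=K, replace=False)]
    Sigma = np.stack([np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] = P(z_i = k | x_i, theta).
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
            for k in range(K)
        ])
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted ML estimates given the soft assignments.
        Nk = r.sum(axis=0)                 # effective cluster sizes
        pi = Nk / N
        mu = (r.T @ x) / Nk[:, None]
        for k in range(K):
            diff = x - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mu, Sigma, r
```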
Initialization
In the mixture model case, where the cluster indicators $z_i$ are unobserved, there are many ways to initialize the EM algorithm. Examples:
– Choose K observations at random to define each cluster; assign the other observations to the nearest “centroid” to form initial parameter estimates
– Pick the centers sequentially to provide good coverage of the data
– Grow the mixture model by splitting (and sometimes removing) clusters until K clusters are formed
Initialization can be quite important to convergence rates in practice.
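The second strategy (picking centers sequentially for good coverage) is essentially k-means++-style seeding; a minimal sketch, assuming NumPy (the function name is hypothetical):

```python
import numpy as np

def seed_centers(x, K, seed=0):
    """Pick K centers sequentially, favoring points far from those
    already chosen, so the seeds cover the data well (k-means++ style)."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]  # first center: a random point
    for _ in range(K - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centers)
```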
What you need to know
• Mixture model formulation
– Generative model
– Likelihood
• Expectation Maximization (EM) Algorithm
– Derivation
– Concept of non-decreasing log likelihood
– Application to standard mixture models
Review: Connection to k-means
Case Study 2: Document Retrieval
K-means
1. Ask user how many clusters they’d like. (e.g. k=5)
2. Randomly guess k cluster Center locations
3. Each datapoint finds out which Center it’s closest to.
4. Each Center finds the centroid of the points it owns
K-means
• Randomly initialize k centers: $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$
• Classify: assign each point $j \in \{1, \ldots, N\}$ to the nearest center:
$z_j \leftarrow \arg\min_i \|\mu_i - x_j\|^2$
• Recenter: $\mu_i$ becomes the centroid of its points:
$\mu_i^{(t+1)} \leftarrow \arg\min_\mu \sum_{j: z_j = i} \|\mu - x_j\|^2$
– Equivalent to $\mu_i \leftarrow$ average of its points!
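A sketch of the full loop (Lloyd's algorithm) in NumPy; it runs a fixed number of iterations and, for brevity, does not handle clusters that lose all their points:

```python
import numpy as np

def kmeans(x, K, n_iters=50, seed=0):
    """Alternate the Classify and Recenter steps above."""
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(len(x), size=K, replace=False)]  # random initial centers
    for _ in range(n_iters):
        # Classify: assign each point to its nearest center.
        d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        z = d2.argmin(axis=1)
        # Recenter: each center moves to the mean of the points it owns.
        mu = np.array([x[z == k].mean(axis=0) for k in range(K)])
    return mu, z
```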
Special Case: Spherical Gaussians + hard assignments
• If $P(x \mid z = k)$ is spherical, with the same variance $\sigma^2$ for all classes:
• Then, compare the EM objective with k-means:
$$P(x_i \mid z_i = k) \propto \exp\left(-\frac{1}{2\sigma^2}\,\|x_i - \mu_k\|^2\right)$$
$$P(z_i = k, x_i) = \frac{1}{(2\pi)^{m/2}\,|\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k)\right) P(z_i = k)$$
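To make the comparison concrete, here is the standard limiting argument, in the notation above: with a shared spherical covariance $\sigma^2 I$, the E-step responsibilities are

$$P(z_i = k \mid x_i) = \frac{\pi_k \exp\left(-\frac{1}{2\sigma^2}\|x_i - \mu_k\|^2\right)}{\sum_j \pi_j \exp\left(-\frac{1}{2\sigma^2}\|x_i - \mu_j\|^2\right)} \xrightarrow{\sigma \to 0} \begin{cases} 1 & \text{if } k = \arg\min_j \|x_i - \mu_j\|^2 \\ 0 & \text{otherwise,} \end{cases}$$

so as $\sigma \to 0$ the soft assignments harden into the nearest-center assignments of k-means, and the M-step's responsibility-weighted means reduce to the k-means recentering step.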