Lecture 19: More EM
Machine Learning, April 15, 2010
Last Time
• Expectation Maximization
• Gaussian Mixture Models
Today
• EM Proof
– Jensen's Inequality
• Clustering sequential data
– EM over HMMs
– EM in any Graphical Model
• Gibbs Sampling
Gaussian Mixture Models
How can we be sure GMM/EM works?
• We've already seen that there are multiple clustering solutions for the same data.
– It is a non-convex optimization problem.
• Can we prove that we're approaching some maximum, even if many exist?
Bound maximization
• Since we can't optimize the GMM parameters directly, maybe we can find the maximum of a lower bound.
• Technically: optimize a concave lower bound of the original non-concave log-likelihood.
EM as a bound maximization problem
• Need to define a function Q(x,Θ) such that:
– Q(x,Θ) ≤ l(x,Θ) for all x, Θ
– Q(x,Θ) = l(x,Θ) at a single point
– Q(x,Θ) is concave
EM as bound maximization
• Claim: for the GMM likelihood, the function maximized by the GMM MLE update is a concave lower bound.
EM Correctness Proof
• Prove that l(x,Θ) ≥ Q(x,Θ):
– Start from the likelihood function.
– Introduce the hidden variable (the mixture assignments in a GMM).
– Condition on a fixed value of Θt.
– Apply Jensen's inequality (coming soon…).
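Written out, the chain of steps above gives the bound (a standard reconstruction; the slide's equations were lost in transcription):

```latex
\begin{align*}
\ell(x,\Theta) &= \log p(x \mid \Theta)
  = \log \sum_{z} p(x, z \mid \Theta) \\
  &= \log \sum_{z} p(z \mid x, \Theta^{t})\,
       \frac{p(x, z \mid \Theta)}{p(z \mid x, \Theta^{t})} \\
  &\ge \sum_{z} p(z \mid x, \Theta^{t})\,
       \log \frac{p(x, z \mid \Theta)}{p(z \mid x, \Theta^{t})}
   \;=\; Q(x,\Theta) \qquad \text{(Jensen's inequality)}
\end{align*}
```

Equality holds at Θ = Θt, which is exactly the "touches at a single point" condition from the previous slide.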
EM Correctness Proof
GMM Maximum Likelihood Estimation
The missing link: Jensen’s Inequality
• If f is concave (sometimes called "convex down"): f(E[x]) ≥ E[f(x)]
• An incredibly important tool for dealing with mixture models.
• Here it is applied with f(x) = log(x), which is concave.
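A quick numerical sanity check of the inequality for f(x) = log(x) (illustrative toy data, not from the lecture):

```python
import math
import random

# Jensen's inequality for the concave function f(x) = log(x):
# f(E[x]) >= E[f(x)]. We check it empirically on random samples.
random.seed(0)
xs = [random.uniform(0.5, 5.0) for _ in range(10000)]

f_of_mean = math.log(sum(xs) / len(xs))             # f(E[x])
mean_of_f = sum(math.log(x) for x in xs) / len(xs)  # E[f(x)]

assert f_of_mean >= mean_of_f  # holds for any concave f
print(f_of_mean, mean_of_f)
```

The gap between the two quantities is exactly what EM exploits: pushing the lower bound up pushes the log-likelihood up.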
Generalizing EM from GMM
• Notice that the EM optimization proof never used the exact form of the GMM.
• It only required the introduction of a hidden variable, z.
• Thus, we can generalize EM to broader types of latent variable models.
General form of EM
• Given a joint distribution p(X, Z | Θ) over observed variables X and latent variables Z:
• Want to maximize the likelihood: p(X | Θ) = Σ_Z p(X, Z | Θ)
1. Initialize parameters Θold
2. E-Step: Evaluate p(Z | X, Θold)
3. M-Step: Re-estimate parameters based on the expectation of the complete-data log-likelihood: Θnew = argmax_Θ Σ_Z p(Z | X, Θold) log p(X, Z | Θ)
4. Check for convergence of parameters or likelihood
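The four steps above, sketched on hypothetical toy data (a minimal two-component, 1-D GMM with variances fixed at 1 for brevity; all names and values here are illustrative, not from the lecture):

```python
import math
import random

random.seed(1)
# Toy 1-D data drawn from two well-separated groups.
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(5.0, 1.0) for _ in range(200)]

def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

# 1. Initialize parameters (means and mixing weights).
mu = [-1.0, 1.0]
pi = [0.5, 0.5]

def log_likelihood():
    return sum(math.log(pi[0] * normal_pdf(x, mu[0]) +
                        pi[1] * normal_pdf(x, mu[1])) for x in data)

prev = log_likelihood()
for _ in range(100):
    # 2. E-step: evaluate responsibilities p(z | x, theta_old).
    resp = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k]) for k in (0, 1)]
        s = w[0] + w[1]
        resp.append((w[0] / s, w[1] / s))
    # 3. M-step: re-estimate parameters from expected counts.
    for k in (0, 1):
        nk = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        pi[k] = nk / len(data)
    # 4. Check for convergence of the likelihood.
    cur = log_likelihood()
    assert cur >= prev - 1e-9  # EM never decreases the likelihood
    if cur - prev < 1e-6:
        break
    prev = cur

print(sorted(mu))  # the means should approach the two group centers
```

Note the monotonicity assertion: the bound-maximization argument guarantees the log-likelihood never decreases across iterations, which is a useful correctness check in any EM implementation.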
Applying EM to Graphical Models
• Now we have a general form for learning parameters for latent variables.
– Take a guess
– Expectation: Evaluate likelihood
– Maximization: Re-estimate parameters
– Check for convergence
Clustering over sequential data
• Recall HMMs.
• We only looked at training supervised HMMs.
• What if you believe the data is sequential, but you can't observe the state?
EM on HMMs
• Also known as Baum-Welch.
• Recall the HMM parameters: the initial state distribution, the transition probabilities, and the emission probabilities.
• Now the training counts are estimated: expected counts replace the observed counts of supervised training.
EM on HMMs
• Standard EM Algorithm
– Initialize
– E-Step: evaluate expected likelihood
– M-Step: re-estimate parameters from expected likelihood
– Check for convergence
EM on HMMs
• Guess: Initialize the parameters.
• E-Step: Compute the expected counts under the current parameters.
EM on HMMs
• But what are these E{…} quantities?
– Expected state-occupancy and state-transition counts: the posterior probabilities of each state, and of each pair of adjacent states, given the observations.
• These can be efficiently calculated from JTA potentials and separators.
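For the HMM specifically, those expected counts come out of the forward-backward recursions, which are the chain-structured instance of the JTA message passing named above. A minimal sketch on a hypothetical 2-state, 2-symbol HMM (all parameter values here are illustrative assumptions):

```python
# Forward-backward for a tiny HMM: computes the posterior marginals
# gamma[t][i] = p(z_t = i | x) and the expected transition counts
# E[# of i -> j transitions], the E{...} quantities Baum-Welch needs.
pi = [0.6, 0.4]               # initial state distribution (assumed)
A = [[0.7, 0.3], [0.4, 0.6]]  # transition probabilities (assumed)
B = [[0.9, 0.1], [0.2, 0.8]]  # emission probabilities (assumed)
obs = [0, 0, 1, 1, 0]         # observed symbol sequence (assumed)

T, N = len(obs), 2

# Forward pass: alpha[t][i] = p(x_1..x_t, z_t = i)
alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j]
                                     for i in range(N))
                  for j in range(N)])

# Backward pass: beta[t][i] = p(x_{t+1}..x_T | z_t = i)
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    for i in range(N):
        beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                         for j in range(N))

evidence = sum(alpha[T - 1][i] for i in range(N))  # p(x)

# Posterior state marginals (expected occupancy counts per time step).
gamma = [[alpha[t][i] * beta[t][i] / evidence for i in range(N)]
         for t in range(T)]

# Expected transition counts E[# of i -> j transitions].
exp_trans = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                  for t in range(T - 1)) / evidence
              for j in range(N)] for i in range(N)]

print(gamma[0], exp_trans)
```

The M-step then just normalizes these expected counts, exactly as supervised HMM training normalizes observed counts.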
EM on HMMs
• Standard EM Algorithm
– Initialize
– E-Step: evaluate expected likelihood
• via the JTA algorithm
– M-Step: re-estimate parameters from expected likelihood
• using expected values from JTA potentials and separators
– Check for convergence
Training latent variables in Graphical Models
• Now consider a general Graphical Model with latent variables.
EM on Latent Variable Models
• Guess
– Easy: just assign random values to the parameters.
• E-Step: Evaluate likelihood.
– We can use JTA to evaluate the likelihood.
– And marginalize expected parameter values.
• M-Step: Re-estimate parameters.
– This can get trickier.
Maximization Step in Latent Variable Models
• Why is this easy in HMMs, but difficult in general latent variable models?
• Consider a graphical model in which a node has many parents.
Junction Trees
• In general, we have no guarantee that we can isolate a single variable.
• We need to estimate each marginal separately.
• "Dense graphs"
M-Step in Latent Variable Models
• M-Step: Re-estimate parameters.
– Keep k−1 parameters fixed (at the current estimate).
– Identify a better guess for the free parameter.
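The "k−1 fixed, one free" idea is coordinate ascent. A toy sketch (the simple concave objective here is a hypothetical stand-in for the expected complete-data log-likelihood, which is generally not this easy to evaluate):

```python
# Coordinate ascent: hold k-1 parameters fixed and improve the free one.
def objective(theta):
    # Concave stand-in objective with a coupling term between parameters.
    x, y = theta
    return -(x - 2.0) ** 2 - (y + 1.0) ** 2 - 0.5 * (x - 2.0) * (y + 1.0)

theta = [0.0, 0.0]
step = 0.1
for _ in range(200):
    for k in range(len(theta)):        # parameter k is free; others fixed
        for delta in (step, -step):
            candidate = list(theta)
            candidate[k] += delta
            if objective(candidate) > objective(theta):
                theta = candidate      # a better guess for parameter k
print(theta)  # approaches the maximizer near (2, -1)
```

Each inner update only ever increases the objective, so the overall procedure climbs monotonically, just like the full EM loop.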
M-Step in Latent Variable Models
• M-Step: Re-estimate parameters.
– Gibbs Sampling.
– This is helpful when it's easier to sample from a conditional than it is to integrate to get the marginal.
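A minimal Gibbs-sampling sketch (a hypothetical standard bivariate Gaussian target with correlation ρ, chosen because each full conditional is a 1-D Gaussian that is trivial to sample, even though we never integrate for the marginal):

```python
import random

random.seed(2)
rho = 0.8            # target correlation (assumed for illustration)
x, y = 0.0, 0.0
samples = []
for i in range(20000):
    # Full conditionals of a standard bivariate Gaussian:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    x = random.gauss(rho * y, (1 - rho ** 2) ** 0.5)
    y = random.gauss(rho * x, (1 - rho ** 2) ** 0.5)
    if i >= 1000:    # discard burn-in before collecting samples
        samples.append((x, y))

n = len(samples)
mean_x = sum(s[0] for s in samples) / n
mean_y = sum(s[1] for s in samples) / n
cov = sum((s[0] - mean_x) * (s[1] - mean_y) for s in samples) / n
print(mean_x, mean_y, cov)  # sample moments should be near 0, 0, and rho
```

In the M-step setting, the same pattern applies: cycle through the variables, resampling each from its conditional given the rest, and use the resulting samples to estimate the quantities whose exact marginals are intractable.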
EM on Latent Variable Models
• Guess
– Easy: just assign random values to the parameters.
• E-Step: Evaluate likelihood.
– We can use JTA to evaluate the likelihood.
– And marginalize expected parameter values.
• M-Step: Re-estimate parameters.
– Either from JTA potentials and marginals, OR
– by sampling.
Today
• EM as bound maximization
• EM as a general approach to learning parameters for latent variables
• Sampling
Next Time
• Model Adaptation
– Using labeled and unlabeled data to improve performance.
• Model Adaptation Application
– Speaker Recognition
– UBM-MAP