lecture13 gaussian mixtures - mit csailpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... ·...

Mixture Models & EM algorithm Probabilis5c Graphical Models, Lecture 13 David Sontag New York University Slides adapted from Carlos Guestrin, Dan Klein, Luke Ze@lemoyer, Dan Weld, Vibhav Gogate, and Andrew Moore

Upload: others

Post on 27-May-2020

3 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Mixture Models & EM algorithm Probabilis5c Graphical Models, Lecture 13

David Sontag New York University

Slides adapted from Carlos Guestrin, Dan Klein, Luke Ze@lemoyer, Dan Weld, Vibhav Gogate, and Andrew Moore

Page 2: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Mixture of Gaussians model

µ1 µ2

µ3

•  P(Y): There are k components

•  P(X|Y): Each component generates data from a mul5variate Gaussian with mean μi and covariance matrix Σi

Each data point is sampled from a genera&ve process:

1.  Choose component i with probability P(y=i)

2.  Generate datapoint ~ N(mi, Σi )

Gaussian mixture model (GMM)

Page 3: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

MulVvariate Gaussians

Σ = arbitrary (semidefinite) matrix: -‐ specifies rotaVon (change of basis) -‐ eigenvalues specify relaVve elongaVon

P(X = x j |Y = i) =1

(2π )m/2 || Σi ||1/2 exp −

12x j −µi( )

TΣi−1 x j −µi( )

$%&

'( P(X=xj)=

Page 4: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Mixtures of Gaussians (1)

Old Faithful Data Set

Time to ErupV

DuraVon of Last ErupVon

Page 5: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Mixtures of Gaussians (1)

Old Faithful Data Set

Single Gaussian Mixture of two Gaussians

Page 6: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Mixtures of Gaussians (2)

Combine simple models into a complex model:

Component

Mixing coefficient

K=3

Page 7: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Mixtures of Gaussians (3)

Page 8: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Unsupervised learning Model data as mixture of mulVvariate Gaussians

Page 9: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Unsupervised learning Model data as mixture of mulVvariate Gaussians

Page 10: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Unsupervised learning Model data as mixture of mulVvariate Gaussians

Shown is the posterior probability that a point was generated from ith Gaussian: Pr(Y = i | x)

Page 11: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

ML esVmaVon in supervised se_ng

•  Univariate Gaussian

•  Mixture of Mul&variate Gaussians

ML esVmate for each of the MulVvariate Gaussians is given by:

Just sums over x generated from the k’th Gaussian

µML =1n

xnj=1

∑ ΣML =1n

x j −µML( ) x j −µML( )T

j=1

∑k k k k

Page 12: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

That was easy! But what if unobserved data?

•  MLE: – argmaxθ ∏j P(yj,xj)

– θ: all model parameters •  eg, class probs, means, and variances

•  But we don’t know yj’s!!! •  Maximize marginal likelihood:

– argmaxθ ∏j P(xj) = argmax ∏j ∑k=1 P(Yj=k, xj) K

Page 13: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

EM: Two Easy Steps Objec5ve: argmaxθ lg∏j ∑k=1

K P(Yj=k, xj |θ) = ∑j lg ∑k=1K P(Yj=k, xj|θ)

Data: {xj | j=1 .. n}

•  E-‐step: Compute expectaVons to “fill in” missing y values according to current parameters, θ

–  For all examples j and values k for Yj, compute: P(Yj=k | xj, θ)

•  M-‐step: Re-‐esVmate the parameters with “weighted” MLE esVmates

–  Set θ = argmaxθ ∑j ∑k P(Yj=k | xj, θ) log P(Yj=k, xj |θ)

Especially useful when the E and M steps have closed form soluVons!!!

Page 14: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Simple example: learn means only! Consider: •  1D data •  Mixture of k=2 Gaussians

•  Variances fixed to σ=1 •  DistribuVon over classes is uniform

•  Just need to esVmate μ1 and μ2

€

P(x,Yj = k)k=1

∑j=1

∏ ∝ exp −12σ2

x − µk2⎡

⎣ ⎢ ⎤

⎦ ⎥ P(Yj = k)

k=1

∑j=1

∏

.01 .03 .05 .07 .09

Page 15: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

EM for GMMs: only learning means Iterate: On the t’th iteraVon let our esVmates be

λt = { μ1(t), μ2(t) … μK(t) } E-‐step

Compute “expected” classes of all datapoints

M-‐step

Compute most likely new μs given class expectaVons €

P Yj = k x j ,µ1...µK( )∝ exp − 12σ 2 x j − µk

2⎛

⎝ ⎜

⎞

⎠ ⎟ P Yj = k( )

€

µk = P Yj = k x j( )

j=1

∑ x j

P Yj = k x j( )j=1

∑

Page 16: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Gaussian Mixture Example: Start

Page 17: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Amer first iteraVon

Page 18: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Amer 2nd iteraVon

Page 19: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Amer 3rd iteraVon

Page 20: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Amer 4th iteraVon

Page 21: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Amer 5th iteraVon

Page 22: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Amer 6th iteraVon

Page 23: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Amer 20th iteraVon

Page 24: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Jensen’s inequality

•  Theorem: log ∑z P(z) f(z) ≥ ∑z P(z) log f(z) – e.g., Binary case for convex func5on f:

Applications of machine learning in Computational Biologypeople.csail.mit.edu/dsontag/courses/ml13/slides/lecture... · 2013-12-14 · Applications of Machine Learning in Computational

Probabilistic Graphical Modelspeople.csail.mit.edu/dsontag/courses/pgm12/slides/... · 2012-04-04 · Structured prediction Decision-making under uncertainty Advanced topics (if time)

Probabilistic Graphical Models - MIT CSAILpeople.csail.mit.edu/.../courses/pgm13/slides/lecture3.pdfProbabilistic Graphical Models David Sontag New York University Lecture 3, February

Probabilistic Graphical Models, Spring 2012people.csail.mit.edu/dsontag/courses/pgm13/assignments/ps4.pdf · Probabilistic Graphical Models, Spring 2012 Problem Set 4: Structure learning

Learning Deep Generative Models - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/inference15/slides/... · 2015-12-08 · 2 Expand the bound Lusing the factorization of the

Approximate Inference in Graphical Models using …people.csail.mit.edu/dsontag/papers/sontag_phd_thesis.pdfApproximate Inference in Graphical Models using LP Relaxations by David

Recommender)Systems) - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml13/slides/lecture25.pdf · user movie score 1 21 1 1 213 5 2 345 4 2 123 4 2 768 3 3 76 5 4 45 4 5

Probabilistic Graphical Models - Peoplepeople.csail.mit.edu/dsontag/courses/pgm13/slides/lecture12.pdf1 Density estimation: we are interested in the full distribution (so later we

Unsupervised learning (part 1) Lecture 19 - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml16/slides/lecture...Unsupervised learning (part 1) Lecture 19 David Sontag New

lecture8 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml13/slides/lecture8.pdf · lecture8.pptx Author: David Sontag Created Date: 9/26/2013 4:42:15 PM

Latent Dirichlet Allocation - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · We describe latent Dirichlet allocation (LDA), a generative probabilistic model for

lecture24 - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml13/slides/lecture24.pdf · David Sontag (NYU) Graphical Models Lecture 1, January 31, 2013 30 / 44. Bayesian networks

Inference and Representation, Fall 2015people.csail.mit.edu/dsontag/courses/inference15/assignments/ps5.pdf · Inference and Representation, Fall 2015 Problem Set 5: Structure learning

lecture10 - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture... · 2014. 4. 9. · specifies a real number for each assignment: – How many assignments

Linear classifiers Lecture 3 - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml12/slides/lecture... · 2012. 9. 11. · linear constraints . w. x w. x w. x x 1 x 2 ... •

Probabilistic Graphical Modelspeople.csail.mit.edu/dsontag/courses/pgm13/slides/lecture1.pdf · Probabilistic Graphical Models David Sontag ... (Bayesian networks) ... Basic concepts

Lifted Tree-Reweighted Variational Inferencepeople.csail.mit.edu/dsontag/papers/BuiHuySon_uai14.pdf · each direction-ﬁnding step solves a maximum spanning tree problem. Several

Probabilistic Graphical Models - Peoplepeople.csail.mit.edu/dsontag/courses/pgm12/slides/lecture1.pdf · Book: Probabilistic Graphical Models: Principles and Techniques by Daphne

Markov chain Monte Carlo Lecture 6 - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/inference15/slides/lecture6_gibbs.pdfMarkov chain Monte Carlo Lecture 6 David Sontag New

1 Introduction to Dual Decomposition for Inferencepeople.csail.mit.edu/dsontag/papers/SonGloJaa_optbook.pdf · 1 Introduction to Dual Decomposition for Inference David Sontag [email protected]

lecture25 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml12/slides/lecture25.pdf · Lecture25 David&Sontag& New&York&University& Slides adapted from Carlos Guestrin and Luke Zettlemoyer

Passive Navigation* - MIT CSAILpeople.csail.mit.edu › bkph › papers › Passive_Navigation.pdf · PASSIVE NAVIGATION c^\0 z

David&Sontag& New&York&University& - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture12.pdf · naive Bayes can also be represented using the logistic model

Inference and Representation - MITpeople.csail.mit.edu/dsontag/courses/inference14/slides/... · 2014-09-03 · Exact inference & learning Then generalize Continuous variables Partially

Support vector machines (SVMs) Lecture 2 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture2.pdf · Lecture 2 David Sontag New York University Slides adapted

Introduction To Machine Learning - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml16/slides/lecture22.pdfJapanese posts due to Japanese intellectual prop-erty law. Similarly, posts

Introduction To Machine Learning - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml16/slides/lecture21.pdf · Introduction To Machine Learning David Sontag New York University Lecture

A Tutorial on Spectral Clusteringpeople.csail.mit.edu/dsontag/courses/...clustering.pdf · clustering from scratch and present diﬀerent points of view to why spectral clustering

Probabilistic Graphical Models - Peoplepeople.csail.mit.edu/dsontag/courses/pgm12/slides/lecture4.pdf · Undirected graphical models Reminder of lecture 2 An alternative representation

Streaming Variational Bayes - MIT CSAILpeople.csail.mit.edu/tbroderick/files/broderick_2014_streaming_variational.pdfOverview • Big Data inference generally non-Bayesian • Why

Ensemble learning Lecture 13 - People | MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml13/slides/lecture... · 2013. 10. 18. · Ensemble learning Lecture 13 David Sontag New York

Inference and Representation - Peoplepeople.csail.mit.edu/dsontag/courses/inference15/slides/lecture4.pdf · David Sontag (NYU) Inference and Representation Lecture 4, Sept. 29, 2015

lecture4 - MIT CSAILpeople.csail.mit.edu/dsontag/courses/ml14/slides/lecture4.pdf · Learningtheory Lecture4 David&Sontag& New&York&University& Slides adapted from Carlos Guestrin

Probabilistic Graphical Modelspeople.csail.mit.edu/dsontag/courses/pgm12/slides/lecture2.pdf · Example Consider the following Bayesian network: Grade Letter SAT Dif culty Intelligence

Learning representations for counterfactual inferencepeople.csail.mit.edu/dsontag/papers/ShalitJohanssonSontag16_slides.pdf · Learning representations for counterfactual inference