lecture13 gaussian mixtures - mit csailpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... ·...



Page 1: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27

Mixture Models & EM algorithm
Probabilistic Graphical Models, Lecture 13

David Sontag
New York University

Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer, Dan Weld, Vibhav Gogate, and Andrew Moore


Mixture of Gaussians model

[Figure: components labeled μ1, μ2, μ3]

•  P(Y): There are k components

•  P(X|Y): Each component generates data from a multivariate Gaussian with mean μi and covariance matrix Σi

Each data point is sampled from a generative process:

1.  Choose component i with probability P(y=i)

2.  Generate datapoint ~ N(μi, Σi)

Gaussian mixture model (GMM)
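The two-step generative process can be sketched in NumPy as follows. This is a minimal illustration, not code from the lecture: the function name `sample_gmm` and the example weights, means, and covariances are all made up for the demonstration.

```python
import numpy as np

def sample_gmm(n, weights, means, covs, seed=0):
    """Draw n points from a GMM by the two-step generative process."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    # Step 1: choose a component i for every point with probability P(y=i).
    y = rng.choice(len(weights), size=n, p=weights)
    # Step 2: generate each datapoint from N(mu_i, Sigma_i) of its component.
    x = np.empty((n, means.shape[1]))
    for i in range(len(weights)):
        idx = np.flatnonzero(y == i)
        if idx.size:
            x[idx] = rng.multivariate_normal(means[i], covs[i], size=idx.size)
    return x, y

# Hypothetical parameters for k = 3 components in two dimensions.
weights = [0.5, 0.3, 0.2]
means = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
covs = [np.eye(2), [[2.0, 0.8], [0.8, 1.0]], [[1.0, -0.5], [-0.5, 1.5]]]
x, y = sample_gmm(2000, weights, means, covs)
```

Here `y` holds the component labels, which are exactly the values that become hidden in the unsupervised setting.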


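Multivariate Gaussians

Σ = arbitrary (semidefinite) matrix:
 - specifies rotation (change of basis)
 - eigenvalues specify relative elongation

$$P(X = x_j \mid Y = i) = \frac{1}{(2\pi)^{m/2}\,|\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}\,(x_j - \mu_i)^T\, \Sigma_i^{-1}\, (x_j - \mu_i)\right]$$

Here m is the dimension of x_j and |Σi| is the determinant of Σi, which must be nonzero for the density to exist.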
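As a rough sketch, the multivariate Gaussian density can be evaluated directly from its definition. The helper name `gaussian_pdf` is an assumption for illustration, and the code assumes the covariance matrix is invertible.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at a single point x (sigma must be invertible)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    m = mu.shape[0]                      # dimension of the data
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    # instead of forming the inverse explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = (2.0 * np.pi) ** (m / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

# At the mean of a standard 2D Gaussian the density is 1 / (2 * pi).
p_center = gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

For high-dimensional or nearly singular covariances, working with the log density is usually safer than the direct form used here.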


Mixtures of Gaussians (1)

Old Faithful Data Set

[Figure: scatter plot; axes: Time to Eruption, Duration of Last Eruption]


Mixtures of Gaussians (1)

Old Faithful Data Set

[Figure panels: Single Gaussian; Mixture of two Gaussians]


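Mixtures of Gaussians (2)

Combine simple models into a complex model:

$$P(x) = \sum_{k=1}^{K} \underbrace{P(Y = k)}_{\text{mixing coefficient}}\; \underbrace{N(x \mid \mu_k, \Sigma_k)}_{\text{component}}$$

K=3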
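A minimal sketch of combining components into one density, using a 1D mixture with K=3: each component density is weighted by its mixing coefficient and the results are summed. The names `normal_pdf` and `mixture_pdf` and all parameter values are hypothetical.

```python
import numpy as np

def normal_pdf(x, mu, var):
    """Univariate Gaussian density with mean mu and variance var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_pdf(x, mixing, mus, variances):
    """Mixture density: sum over k of mixing[k] * N(x | mus[k], variances[k])."""
    x = np.asarray(x, dtype=float)[..., None]           # trailing axis for the K components
    comps = normal_pdf(x, np.asarray(mus), np.asarray(variances))
    return comps @ np.asarray(mixing)                   # weighted sum over components

# Hypothetical K = 3 mixture; the mixing coefficients sum to 1.
mixing = np.array([0.5, 0.3, 0.2])
mus = np.array([-2.0, 0.0, 3.0])
variances = np.array([0.5, 1.0, 2.0])

grid = np.linspace(-15.0, 15.0, 6001)
density = mixture_pdf(grid, mixing, mus, variances)
area = np.sum(density) * (grid[1] - grid[0])            # Riemann sum, should be close to 1
```

Because the mixing coefficients sum to 1 and each component integrates to 1, the mixture is itself a valid density, which the numerical `area` check reflects.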


Mixtures of Gaussians (3)


Unsupervised learning

Model data as mixture of multivariate Gaussians



Unsupervised learning

Model data as mixture of multivariate Gaussians

Shown is the posterior probability that a point was generated from the ith Gaussian: Pr(Y = i | x)
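The posterior Pr(Y = i | x) follows from Bayes' rule: multiply each component's prior by its Gaussian likelihood and normalize over components. The sketch below assumes known parameters; the function names and the two-component example are illustrative only.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log density of N(mu, sigma) at each row of x, where x has shape (n, m)."""
    m = mu.shape[0]
    diff = x - mu
    sol = np.linalg.solve(sigma, diff.T).T      # rows hold Sigma^{-1} (x_j - mu)
    quad = np.sum(diff * sol, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (quad + logdet + m * np.log(2.0 * np.pi))

def posterior(x, priors, mus, sigmas):
    """Pr(Y = i | x) for each row of x, computed with Bayes' rule."""
    log_joint = np.stack(
        [np.log(p) + log_gaussian(x, mu, s) for p, mu, s in zip(priors, mus, sigmas)],
        axis=1,
    )
    # Subtract the row maximum before exponentiating for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    joint = np.exp(log_joint)
    return joint / joint.sum(axis=1, keepdims=True)

# Hypothetical two-component model with equal priors and identity covariances.
priors = np.array([0.5, 0.5])
mus = np.array([[0.0, 0.0], [4.0, 0.0]])
sigmas = np.array([np.eye(2), np.eye(2)])
x = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
post = posterior(x, priors, mus, sigmas)
```

The middle point is equidistant from both means, so its posterior is split evenly, while the other two points are assigned almost entirely to their nearest component.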


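ML estimation in supervised setting

•  Univariate Gaussian

•  Mixture of Multivariate Gaussians

ML estimate for each of the Multivariate Gaussians is given by:

$$\mu^{k}_{ML} = \frac{1}{n}\sum_{j=1}^{n} x_j \qquad \Sigma^{k}_{ML} = \frac{1}{n}\sum_{j=1}^{n} \left(x_j - \mu^{k}_{ML}\right)\left(x_j - \mu^{k}_{ML}\right)^T$$

Just sums over x generated from the k'th Gaussian (so n here counts only those points)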
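When the labels are observed, the estimates reduce to per-class averages. The following is a small sketch under that assumption; `supervised_mle` and the toy data are made up, and the class prior estimate (the fraction of points in each class) is the standard ML estimate rather than something stated on the slide. Each class is assumed to contain at least one point.

```python
import numpy as np

def supervised_mle(x, y, k):
    """Per-class ML estimates for a mixture when the labels y are observed."""
    priors, mus, sigmas = [], [], []
    for c in range(k):
        xc = x[y == c]                    # only the points generated from class c
        n = len(xc)
        mu = xc.mean(axis=0)
        diff = xc - mu
        sigma = diff.T @ diff / n         # ML estimate divides by n, not n - 1
        priors.append(n / len(x))
        mus.append(mu)
        sigmas.append(sigma)
    return np.array(priors), np.array(mus), np.array(sigmas)

# Toy labeled data: four points in class 0 and two points in class 1.
x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [10.0, 10.0], [12.0, 10.0]])
y = np.array([0, 0, 0, 0, 1, 1])
priors, mus, sigmas = supervised_mle(x, y, k=2)
```

Note that the second class yields a singular covariance (its two points differ in only one coordinate), a reminder that small classes can produce degenerate ML estimates.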


That was easy! But what if unobserved data?

•  MLE:
   – $\arg\max_\theta \prod_j P(y_j, x_j)$
   – θ: all model parameters
      •  e.g., class probs, means, and variances

•  But we don't know yj's!!!

•  Maximize marginal likelihood:
   – $\arg\max_\theta \prod_j P(x_j) = \arg\max_\theta \prod_j \sum_{k=1}^{K} P(Y_j = k, x_j)$
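In practice the marginal likelihood is evaluated in log space, summing the joint over the hidden label with a log-sum-exp. A hedged sketch for 1D data follows; the function name `log_marginal_likelihood` and its parameter layout are assumptions made for the example.

```python
import numpy as np

def log_marginal_likelihood(x, priors, mus, variances):
    """sum_j log sum_k P(Y_j = k, x_j) for a 1D Gaussian mixture."""
    x = np.asarray(x, dtype=float)[:, None]             # shape (n, 1)
    priors = np.asarray(priors, dtype=float)
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # log P(Y_j = k, x_j) = log prior_k + log N(x_j | mu_k, var_k), shape (n, K)
    log_joint = (
        np.log(priors)
        - 0.5 * np.log(2.0 * np.pi * variances)
        - 0.5 * (x - mus) ** 2 / variances
    )
    # Log-sum-exp over k: factor out the row maximum to avoid underflow.
    top = log_joint.max(axis=1, keepdims=True)
    log_px = top[:, 0] + np.log(np.exp(log_joint - top).sum(axis=1))
    return log_px.sum()

# A single point at the mean of one standard normal component: log(1 / sqrt(2 * pi)).
value = log_marginal_likelihood([0.0], [1.0], [0.0], [1.0])
```

The sum over k sits inside the logarithm, so the objective no longer decomposes per class the way the fully observed likelihood does; this is what motivates EM.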


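EM: Two Easy Steps

Objective: $\arg\max_\theta \log \prod_j \sum_{k=1}^{K} P(Y_j = k, x_j \mid \theta) = \arg\max_\theta \sum_j \log \sum_{k=1}^{K} P(Y_j = k, x_j \mid \theta)$

Data: {xj | j=1 .. n}

•  E-step: Compute expectations to "fill in" missing y values according to current parameters, $\theta^{(t)}$
   – For all examples j and values k for Yj, compute: $P(Y_j = k \mid x_j, \theta^{(t)})$

•  M-step: Re-estimate the parameters with "weighted" MLE estimates
   – Set $\theta^{(t+1)} = \arg\max_\theta \sum_j \sum_k P(Y_j = k \mid x_j, \theta^{(t)}) \log P(Y_j = k, x_j \mid \theta)$, with the posterior weights held fixed at their E-step values

Especially useful when the E and M steps have closed form solutions!!!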
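For a 1D Gaussian mixture both steps have closed forms, so the loop is short. The sketch below is one possible implementation under that assumption, not the lecture's code: `log_joint`, `em_gmm_1d`, the synthetic data, and the initial values are all chosen for illustration. It also records the log-likelihood at each iteration, which EM never decreases.

```python
import numpy as np

def log_joint(x, priors, mus, variances):
    """log P(Y_j = k, x_j | theta) as an (n, K) array for 1D data x."""
    x = x[:, None]
    return np.log(priors) - 0.5 * np.log(2.0 * np.pi * variances) - 0.5 * (x - mus) ** 2 / variances

def em_gmm_1d(x, priors, mus, variances, n_iter=50):
    history = []
    for _ in range(n_iter):
        # E-step: posterior P(Y_j = k | x_j, theta) under the current parameters.
        lj = log_joint(x, priors, mus, variances)
        top = lj.max(axis=1, keepdims=True)
        log_px = top + np.log(np.exp(lj - top).sum(axis=1, keepdims=True))
        resp = np.exp(lj - log_px)
        history.append(log_px.sum())      # log-likelihood before this update
        # M-step: weighted ML estimates, with the posteriors as weights.
        nk = resp.sum(axis=0)
        priors = nk / len(x)
        mus = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - mus) ** 2).sum(axis=0) / nk
    return priors, mus, variances, np.array(history)

# Synthetic data from two well-separated Gaussians (illustrative values).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
priors, mus, variances, history = em_gmm_1d(
    x, np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
)
```

With poorly separated data or an unlucky initialization, the same loop can converge to a different local optimum, so in practice it is often run from several starting points.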


Simple example: learn means only!

Consider:
•  1D data
•  Mixture of k=2 Gaussians
•  Variances fixed to σ=1
•  Distribution over classes is uniform
•  Just need to estimate μ1 and μ2

$$\prod_{j=1}^{m} \sum_{k=1}^{K} P(x_j, Y_j = k) \propto \prod_{j=1}^{m} \sum_{k=1}^{K} \exp\left[-\frac{1}{2\sigma^2}\,\|x_j - \mu_k\|^2\right] P(Y_j = k), \qquad K = 2$$

Here m is the number of data points.


EM for GMMs: only learning means

Iterate: On the t'th iteration let our estimates be

λt = { μ1(t), μ2(t) … μK(t) }

E-step

 Compute "expected" classes of all datapoints

$$P(Y_j = k \mid x_j, \mu_1 \ldots \mu_K) \propto \exp\left(-\frac{1}{2\sigma^2}\,\|x_j - \mu_k\|^2\right) P(Y_j = k)$$

M-step

 Compute most likely new μs given class expectations

$$\mu_k = \frac{\sum_{j=1}^{m} P(Y_j = k \mid x_j)\, x_j}{\sum_{j=1}^{m} P(Y_j = k \mid x_j)}$$
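The means-only updates translate almost line for line into code. This is a minimal sketch assuming 1D data, a shared fixed σ, and a uniform class prior (which cancels when the posterior is normalized); the function name `em_means_only` and the six toy data points are made up.

```python
import numpy as np

def em_means_only(x, mus, sigma=1.0, n_iter=20):
    """EM for a 1D GMM with fixed sigma and uniform priors; only the means are learned."""
    x = np.asarray(x, dtype=float)
    mus = np.asarray(mus, dtype=float).copy()
    for _ in range(n_iter):
        # E-step: P(Y_j = k | x_j) is proportional to exp(-(x_j - mu_k)^2 / (2 sigma^2)).
        logits = -0.5 * (x[:, None] - mus[None, :]) ** 2 / sigma ** 2
        logits -= logits.max(axis=1, keepdims=True)      # stabilize before exponentiating
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each new mean is the responsibility-weighted average of the data.
        mus = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mus, resp

# Toy data: two tight groups centered at -4 and +4, with a deliberately poor start.
x = np.array([-4.5, -4.0, -3.5, 3.5, 4.0, 4.5])
mus, resp = em_means_only(x, mus=[-1.0, 1.0])
```

On this toy data the means move to roughly the centers of the two groups within a couple of iterations, mirroring the iteration-by-iteration pictures in the slides.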


Gaussian Mixture Example: Start


After first iteration


After 2nd iteration


After 3rd iteration


After 4th iteration


After 5th iteration


After 6th iteration


After 20th iteration


Jensen's inequality

•  Theorem: log ∑z P(z) f(z) ≥ ∑z P(z) log f(z)
   – e.g., Binary case for convex function f: