Expectation-Maximization
Markoviana Reading Group, Fatih Gelgi, ASU, 2005


TRANSCRIPT

Page 1:

Expectation-Maximization

Markoviana Reading Group
Fatih Gelgi, ASU, 2005

Page 2:

Outline

- What is EM?
- Intuitive Explanation
- Example: Gaussian Mixture
- Algorithm
- Generalized EM
- Discussion
- Applications
  - HMM – Baum-Welch
  - K-means

Page 3:

What is EM?

Two main applications:

- Data has missing values, due to problems with or limitations of the observation process.
- Optimizing the likelihood function is extremely hard, but the likelihood function can be simplified by assuming the existence of (and values for) additional missing or hidden parameters.

$$\Theta^* = \arg\max_\Theta L(\Theta \mid U) = \arg\max_\Theta P(U \mid \Theta) = \arg\max_\Theta \prod_{i=1}^{N} p(u_i \mid \Theta) = \arg\max_\Theta \prod_{i=1}^{N} \sum_{j=1}^{M} \alpha_j\, p_j(u_i \mid \theta_j)$$
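To make the mixture likelihood concrete, here is a minimal sketch (not from the original slides) that evaluates the log of this quantity for a 1-D Gaussian mixture; the names gaussian_pdf, weights, means, and stds are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(u, mean, std):
    """Density of N(mean, std^2) evaluated at u (broadcasts over arrays)."""
    return np.exp(-0.5 * ((u - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def mixture_log_likelihood(u, weights, means, stds):
    """log L(Theta | U) = sum_i log sum_j alpha_j * p_j(u_i | theta_j)."""
    u = np.asarray(u, dtype=float)[:, None]                      # shape (N, 1)
    dens = gaussian_pdf(u, np.asarray(means), np.asarray(stds))  # shape (N, M)
    return float(np.sum(np.log(dens @ np.asarray(weights))))

# Two components, a handful of observations:
print(mixture_log_likelihood([0.1, 1.9, 2.2],
                             weights=[0.4, 0.6], means=[0.0, 2.0], stds=[1.0, 0.5]))
```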

Page 4:

Key Idea

- The observed data U is generated by some distribution and is called the incomplete data.
- Assume that a complete data set exists, Z = (U, J), where J is the missing or hidden data.
- Maximize the posterior probability of the parameters Θ given the data U, marginalizing over J:

$$\Theta^* = \arg\max_\Theta \sum_{J} P(\Theta, J \mid U)$$

Page 5:

Intuitive Explanation of EM

- Alternate between estimating the unknowns Θ and the hidden variables J.
- In each iteration, instead of finding the single best J, compute a distribution over the space of possible J.
- EM is a lower-bound maximization process (Minka, 1998):
  - E-step: construct a local lower bound to the posterior distribution.
  - M-step: optimize the bound.
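This alternation is just a loop. Below is a minimal sketch, assuming hypothetical problem-specific callables e_step, m_step, and log_lik that are not part of the original slides:

```python
def expectation_maximization(u, theta0, e_step, m_step, log_lik,
                             n_iters=100, tol=1e-6):
    """Generic EM: alternate between computing a distribution over the
    hidden variables (E-step) and re-optimizing the parameters (M-step)."""
    theta, prev_ll = theta0, float("-inf")
    for _ in range(n_iters):
        f_t = e_step(u, theta)    # distribution over hidden J given current theta
        theta = m_step(u, f_t)    # parameters maximizing the expected log-likelihood
        ll = log_lik(u, theta)
        if ll - prev_ll < tol:    # EM never decreases the likelihood; stop at a plateau
            break
        prev_ll = ll
    return theta
```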

Page 6:

Intuitive Explanation of EM (cont.)

- Lower-bound approximation method.
- Sometimes provides faster convergence than gradient descent and Newton's method.

Page 7:

Example: Mixture Components

Page 8:

Example (cont'd): True Likelihood of Parameters

Page 9:

Example (cont'd): Iterations of EM

Page 10:

Lower-bound Maximization

Posterior probability:

$$\Theta^* = \arg\max_\Theta \sum_J P(\Theta, J \mid U) \qquad \text{difficult!}$$

Logarithm of the joint distribution:

$$\Theta^* = \arg\max_\Theta \log P(U, \Theta) = \arg\max_\Theta \log \sum_J P(U, J, \Theta)$$

Idea: start with a guess Θ^t, compute an easily computed lower bound B(Θ; Θ^t) to the function log P(Θ | U), and maximize that bound instead.

Page 11:

Lower-bound Maximization (cont.)

Construct a tractable lower bound B(Θ; Θ^t) that contains a sum of logarithms. Let f^t(J) be an arbitrary probability distribution over the hidden data. By Jensen's inequality, the joint log-likelihood is bounded from below.
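The inequality itself did not survive extraction; its standard form (following Minka, 1998 and Dellaert, 2002) is:

$$\log P(U, \Theta) = \log \sum_J f^t(J)\,\frac{P(U, J, \Theta)}{f^t(J)} \;\ge\; \sum_J f^t(J) \log \frac{P(U, J, \Theta)}{f^t(J)} \;=\; B(\Theta; \Theta^t),$$

since the logarithm is concave.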

Page 12:

Optimal Bound

- B(Θ; Θ^t) touches the objective function log P(U, Θ) at Θ^t.
- Maximize B(Θ^t; Θ^t) with respect to f^t(J).
- Introduce a Lagrange multiplier λ to enforce the constraint that f^t(J) sums to one, as sketched below.
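The constrained objective did not survive extraction; the standard Lagrangian for this maximization is:

$$\max_{f^t} \;\; \sum_J f^t(J) \log \frac{P(U, J, \Theta^t)}{f^t(J)} \;+\; \lambda \Big( \sum_J f^t(J) - 1 \Big)$$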

Page 13:

Optimal Bound (cont.)

Take the derivative with respect to f^t(J) and set it to zero; the bound is maximized at the posterior over the hidden data, as reconstructed below.
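The equations are missing from the transcript; in their standard form, the derivative of the Lagrangian gives

$$\log P(U, J, \Theta^t) - \log f^t(J) - 1 + \lambda = 0,$$

which, after normalizing, is maximized at

$$f^t(J) = \frac{P(U, J, \Theta^t)}{\sum_{J'} P(U, J', \Theta^t)} = P(J \mid U, \Theta^t).$$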

Page 14:

Maximizing the Bound

Re-write B(Θ; Θ^t) in terms of expectations under f^t(J); the resulting Q function and the final update are reconstructed below.
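The expectations and the final update are missing from the transcript; their standard form:

$$B(\Theta; \Theta^t) = \sum_J f^t(J) \log P(U, J, \Theta) - \sum_J f^t(J) \log f^t(J),$$

where the second term does not depend on Θ. With

$$Q(\Theta; \Theta^t) = \mathbb{E}_{f^t(J)}\big[\log P(U, J, \Theta)\big] = \sum_J P(J \mid U, \Theta^t) \log P(U, J, \Theta),$$

we finally obtain

$$\Theta^{t+1} = \arg\max_\Theta Q(\Theta; \Theta^t).$$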

Page 15:

EM Algorithm

EM converges to a local maximum of log P(U, Θ), and hence to a local maximum of log P(Θ | U).
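The algorithm box on this slide did not extract; as a concrete sketch (not the author's code), here is EM for a 1-D Gaussian mixture in NumPy, with all names illustrative:

```python
import numpy as np

def em_gmm_1d(u, n_components=2, n_iters=50, seed=0):
    """EM for a 1-D Gaussian mixture: the E-step computes responsibilities
    f_t(J) = P(J | U, Theta_t); the M-step re-estimates the parameters."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u, dtype=float)
    weights = np.full(n_components, 1.0 / n_components)
    means = rng.choice(u, size=n_components, replace=False)
    stds = np.full(n_components, u.std() + 1e-6)

    for _ in range(n_iters):
        # E-step: responsibility of component j for observation i
        dens = (np.exp(-0.5 * ((u[:, None] - means) / stds) ** 2)
                / (stds * np.sqrt(2 * np.pi)))     # shape (N, M)
        resp = dens * weights
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weighted maximum-likelihood updates
        n_j = resp.sum(axis=0)                     # effective counts per component
        weights = n_j / len(u)
        means = (resp * u[:, None]).sum(axis=0) / n_j
        stds = np.sqrt((resp * (u[:, None] - means) ** 2).sum(axis=0) / n_j) + 1e-9
    return weights, means, stds

# Example: a mixture of two well-separated Gaussians
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 0.5, 200)])
print(em_gmm_1d(data))
```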

Page 16:

A Relation to the Log-Posterior

An alternative is to compute the expected log-posterior, whose maximization with respect to Θ is the same, as sketched below.
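The equations are missing; a standard reconstruction: since P(Θ | U, J) = P(U, J, Θ) / P(U, J) and the denominator does not depend on Θ,

$$\mathbb{E}_{J \sim P(J \mid U, \Theta^t)}\big[\log P(\Theta \mid U, J)\big] = Q(\Theta; \Theta^t) + \text{const},$$

so maximizing the expected log-posterior over Θ is equivalent to maximizing Q(Θ; Θ^t).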

Page 17:

Generalized EM

- Assume Θ and the B function are differentiable in Θ. The EM likelihood converges to a point Θ* where

  $$\nabla_\Theta \ln p(X \mid \Theta)\Big|_{\Theta = \Theta^*} = 0$$

- GEM: instead of setting Θ^{t+1} = argmax_Θ B(Θ; Θ^t), just find a Θ^{t+1} such that B(Θ^{t+1}; Θ^t) > B(Θ^t; Θ^t).
- GEM is also guaranteed to converge.

Page 18:

HMM – Baum-Welch Revisited

- γ_t(i) is the probability of being in state S_i at time t.
- ξ_t(i, j) is the probability of being in state S_i at time t and in state S_j at time t+1.
- Estimate the parameters (a, b, π) so that the expected number of correct individual states is maximized.

Page 19:

Baum-Welch: E-step
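The formulas themselves did not survive extraction; the standard Baum-Welch E-step, written with forward variables α_t(i) and backward variables β_t(i), is:

$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j} \alpha_t(j)\,\beta_t(j)}, \qquad \xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$$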

Page 20:

Baum-Welch: M-step
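Likewise, the standard M-step re-estimation formulas:

$$\bar{\pi}_i = \gamma_1(i), \qquad \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \bar{b}_j(k) = \frac{\sum_{t \,:\, O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$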

Page 21:

K-Means

- Problem: given data X and the number of clusters K, find the clusters.
- Clustering is based on centroids; a point belongs to the cluster with the closest centroid.

- Hidden variables: the cluster assignments; the centroids are the parameters, each computed as

$$\mu(c) = \frac{1}{|c|} \sum_{x \in c} x$$

Page 22:

K-Means (cont.)

Starting with initial centroids Θ^0:

- E-step: split the data into K clusters according to distances to the centroids (calculate the distribution f^t(J)).
- M-step: update the centroids (calculate Θ^{t+1}).

A minimal code sketch follows this list.
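A minimal NumPy sketch of this loop (not from the slides; names illustrative). The hard assignment in the E-step is the degenerate distribution f^t(J) that puts all its mass on the nearest centroid:

```python
import numpy as np

def k_means(x, k, n_iters=100, seed=0):
    """K-means as hard EM: assign each point to the nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)                       # shape (N, D)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iters):
        # E-step: index of the closest centroid for every point
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: mu(c) = (1/|c|) * sum of the points assigned to c
        # (assumes no cluster ends up empty)
        new_centroids = np.array([x[labels == c].mean(axis=0) for c in range(k)])
        if np.allclose(new_centroids, centroids):        # converged
            break
        centroids = new_centroids
    return centroids, labels
```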

Page 23:

K-Means Example (K = 2)

[Figure: pick seeds → reassign clusters → compute centroids → reassign clusters → compute centroids → reassign clusters → converged!]

Page 24:

Discussion

Is EM a Primal-Dual algorithm?

Page 25:

References

A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, Series B (Methodological), Vol. 39, No. 1 (1977), pp. 1-38.

F. Dellaert, "The Expectation Maximization Algorithm", Tech. Rep. GIT-GVU-02-20, 2002.

T. Minka, "Expectation-Maximization as lower bound maximization", 1998.

Y. Chang and M. Kölsch, presentation: "Expectation Maximization", UCSB, 2002.

K. Andersson, presentation: "Model Optimization using the EM algorithm", COSC 7373, 2001.

Page 26:

Thanks!