
Incomplete Graphical Models

Nan Hu

Outline

Motivation
K-means clustering
  Coordinate descent algorithm
Density estimation
  EM on unconditional mixture
Regression and classification
  EM on conditional mixture
A general formulation of the EM algorithm

K-means clustering

Problem: Given a set of observations \{x_1, \ldots, x_N\}, how do we group them into K clusters, assuming the value of K is given?

First phase: with the cluster means fixed, assign each observation to its nearest cluster mean.

Second phase: with the assignments fixed, recompute each cluster mean as the average of the observations assigned to it.

K-means clustering

[Figure: the original data set, followed by the cluster assignments after the first, second, and third iterations.]

K-means clustering

Coordinate descent algorithm: the algorithm minimizes the distortion measure

  J(z, \mu) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_n^k \, \lVert x_n - \mu_k \rVert^2,

where z_n^k = 1 if x_n is assigned to cluster k and 0 otherwise, by alternating between minimizing J over the assignments z (first phase) and minimizing J over the means \mu_k, obtained by setting the partial derivatives \partial J / \partial \mu_k to zero (second phase).
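As a concrete reference, here is a minimal NumPy sketch of this coordinate descent (function and variable names are illustrative, not from the slides): the first phase minimizes J over the assignments with the means fixed, and the second phase minimizes J over the means with the assignments fixed.

```python
import numpy as np

def kmeans(X, K, n_iters=20, seed=0):
    """Minimal K-means: coordinate descent on the distortion measure J."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)   # initial means
    for _ in range(n_iters):
        # First phase: assign each point to its nearest mean (minimize J over z).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)      # (N, K) squared distances
        z = d2.argmin(axis=1)
        # Second phase: set each mean to the average of its points (dJ/dmu_k = 0).
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
    J = ((X - mu[z]) ** 2).sum()                                       # final distortion
    return mu, z, J
```

A cluster that ends up empty simply keeps its previous mean in this sketch; more careful reinitialization strategies are possible but beside the point of the slides.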

Unconditional Mixture

Problem: if the given sample data exhibit a multimodal density, how do we estimate the true density?

Fitting a single density to a bimodal case: although the algorithm converges, the result bears little relationship to the truth.

Unconditional Mixture

A “divide-and-conquer” way to solve this problem: introduce a latent variable Z.

[Graphical model: Z \to X, where Z is a multinomial node taking on one of K values.]

Assign a density model to each subpopulation; the overall density is the mixture

  p(x \mid \theta) = \sum_{i=1}^{K} \pi_i \, f_i(x \mid \theta_i), \qquad \pi_i = p(z^i = 1).


Unconditional Mixture

Gaussian Mixture Models: in this model, the mixture components are Gaussian distributions with parameters (\mu_i, \Sigma_i).

Probability model for a Gaussian mixture:

  p(x \mid \theta) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i).

Unconditional Mixture

Posterior probability of the latent variable Z:

  \tau_i(x_n) \triangleq p(z_n^i = 1 \mid x_n, \theta) = \frac{\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

Log likelihood:

  l(\theta; x) = \sum_{n=1}^{N} \log \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)
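A small NumPy sketch of these two quantities, assuming Gaussian components with full covariances (all names are illustrative):

```python
import numpy as np

def log_gaussian(X, mu, Sigma):
    """Log density of N(x | mu, Sigma) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, diff.T)                    # L^{-1} (x - mu)
    maha = (sol ** 2).sum(axis=0)                       # (x - mu)^T Sigma^{-1} (x - mu)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def responsibilities_and_loglik(X, pi, mus, Sigmas):
    """tau[n, i] = p(z_n^i = 1 | x_n, theta) and the log likelihood l(theta; x)."""
    log_comp = np.stack([np.log(pi[i]) + log_gaussian(X, mus[i], Sigmas[i])
                         for i in range(len(pi))], axis=1)   # (N, K)
    log_mix = np.logaddexp.reduce(log_comp, axis=1)          # log sum_i pi_i N(x_n | mu_i, Sigma_i)
    tau = np.exp(log_comp - log_mix[:, None])
    return tau, log_mix.sum()
```

Working in log space with logaddexp avoids numerical underflow when a data point is far from every component.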

Unconditional Mixture

Taking the partial derivative of l(\theta; x) with respect to \pi_i, using a Lagrange multiplier to enforce \sum_i \pi_i = 1, and solving, we have

  \pi_i = \frac{1}{N} \sum_{n=1}^{N} \tau_i(x_n).

Unconditional Mixture

Taking the partial derivative of l(\theta; x) with respect to \mu_i and setting it to zero, we have

  \mu_i = \frac{\sum_n \tau_i(x_n) \, x_n}{\sum_n \tau_i(x_n)}.

Unconditional Mixture

Taking the partial derivative of l(\theta; x) with respect to \Sigma_i and setting it to zero, we have

  \Sigma_i = \frac{\sum_n \tau_i(x_n) \, (x_n - \mu_i)(x_n - \mu_i)^T}{\sum_n \tau_i(x_n)}.

Unconditional Mixture

The EM Algorithm

First phase (E step): with the current parameters \theta^{(t)}, compute the posterior probabilities \tau_i^{(t)}(x_n).

Second phase (M step): update \pi_i, \mu_i, and \Sigma_i using the three formulas above, with \tau_i replaced by \tau_i^{(t)}.
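Putting the two phases together gives the usual EM loop for a Gaussian mixture. The sketch below reuses responsibilities_and_loglik from the snippet above; the initialization (random data points as means, diagonal covariances from the data variance) is an illustrative assumption, not from the slides.

```python
def em_gmm(X, K, n_iters=50, seed=0):
    """EM for a Gaussian mixture, alternating the two phases described above."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigmas = np.array([np.diag(X.var(axis=0) + 1e-6) for _ in range(K)])
    for _ in range(n_iters):
        # First phase (E step): posteriors tau_i(x_n) under the current parameters.
        tau, loglik = responsibilities_and_loglik(X, pi, mus, Sigmas)
        # Second phase (M step): closed-form updates from the zero-derivative conditions.
        Nk = tau.sum(axis=0) + 1e-12                  # effective number of points per component
        pi = Nk / N
        mus = (tau.T @ X) / Nk[:, None]
        for i in range(K):
            diff = X - mus[i]
            Sigmas[i] = (tau[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
    return pi, mus, Sigmas, loglik
```

A useful sanity check is that the returned log likelihood never decreases from one iteration to the next, which is exactly the property the general formulation at the end of the slides establishes.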


Unconditional Mixture

EM algorithm from the expected complete log likelihood point of view.

Suppose we observed the latent variables z_n; the data set \{(x_n, z_n)\} would be completely observed, and the likelihood would be the complete log likelihood

  l_c(\theta; x, z) = \sum_n \log p(x_n, z_n \mid \theta)
                    = \sum_n \log \prod_i [\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)]^{z_n^i}
                    = \sum_n \sum_i z_n^i \log[\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)].

Unconditional Mixture

We treat the z_n^i as random variables and take expectations conditioned on X and \theta^{(t)}.

Note that the z_n^i are binary random variables, so

  \langle z_n^i \rangle = E[z_n^i \mid x_n, \theta^{(t)}] = p(z_n^i = 1 \mid x_n, \theta^{(t)}) = \tau_i^{(t)}(x_n).

Using this as the “best guess” for z_n^i, we have the expected complete log likelihood

  \langle l_c(\theta; x, z) \rangle = \sum_n \sum_i \tau_i^{(t)}(x_n) \log[\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)].

Unconditional Mixture

Maximizing the expected complete log likelihood by setting the derivatives to zero, we recover the same updates:

  \pi_i^{(t+1)} = \frac{1}{N} \sum_n \tau_i^{(t)}(x_n), \qquad
  \mu_i^{(t+1)} = \frac{\sum_n \tau_i^{(t)}(x_n) \, x_n}{\sum_n \tau_i^{(t)}(x_n)}, \qquad
  \Sigma_i^{(t+1)} = \frac{\sum_n \tau_i^{(t)}(x_n) \, (x_n - \mu_i^{(t+1)})(x_n - \mu_i^{(t+1)})^T}{\sum_n \tau_i^{(t)}(x_n)}.

Conditional Mixture

Graphical Model

[Graphical model: X \to Z and (X, Z) \to Y, where the latent variable Z is a multinomial node taking on one of K values.]

For regression and classification, the relationship between X and Z can be modeled in a discriminative classification way, e.g. with a softmax function.


Conditional Mixture

By marginalizing over Z, the conditional density is

  p(y \mid x, \theta) = \sum_i p(z^i = 1 \mid x, \theta) \, p(y \mid z^i = 1, x, \theta).

X is taken to be always observed. The posterior probability of the latent variable is defined as

  \tau_i(x, y) \triangleq p(z^i = 1 \mid x, y, \theta) = \frac{p(z^i = 1 \mid x, \theta) \, p(y \mid z^i = 1, x, \theta)}{\sum_j p(z^j = 1 \mid x, \theta) \, p(y \mid z^j = 1, x, \theta)}.

Conditional Mixture

Some specific choices of mixture components:

Gaussian components (regression):

  p(y \mid z^i = 1, x, \theta_i) = \mathcal{N}(y \mid \beta_i^T x, \sigma_i^2).

Logistic components (binary classification):

  p(y \mid z^i = 1, x, \theta_i) = \mu(\theta_i^T x)^{y} \, [1 - \mu(\theta_i^T x)]^{1-y},

where \mu(\cdot) is the logistic function:

  \mu(a) = \frac{1}{1 + e^{-a}}.
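For concreteness, a minimal sketch of this conditional mixture density, assuming softmax gating on x and Gaussian linear-regression components; the names Xi (gating weights), Beta, and sigma2 (component parameters) are illustrative placeholders, not from the slides.

```python
import numpy as np

def softmax(A):
    """Row-wise softmax, stabilised by subtracting the row maximum."""
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def conditional_mixture_density(x, y, Xi, Beta, sigma2):
    """p(y | x) = sum_i p(z^i = 1 | x) N(y | beta_i^T x, sigma_i^2).

    x: (N, d) inputs, y: (N,) targets,
    Xi: (K, d) gating weights, Beta: (K, d) regression weights, sigma2: (K,) noise variances.
    """
    gate = softmax(x @ Xi.T)                                    # (N, K) gating probabilities
    means = x @ Beta.T                                          # (N, K) component means beta_i^T x
    lik = np.exp(-0.5 * (y[:, None] - means) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    return (gate * lik).sum(axis=1)                             # (N,) mixture density values
```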

Conditional Mixture

Parameter estimation via EM. The complete log likelihood is

  l_c(\theta; x, y, z) = \sum_n \log p(y_n, z_n \mid x_n, \theta)
                       = \sum_n \log \prod_i [\pi_i(x_n) \, p(y_n \mid z_n^i = 1, x_n, \theta_i)]^{z_n^i}
                       = \sum_n \sum_i z_n^i \log[\pi_i(x_n) \, p(y_n \mid z_n^i = 1, x_n, \theta_i)],

where \pi_i(x_n) \triangleq p(z_n^i = 1 \mid x_n) is the gating probability.

Using the expectation as the “best guess” for z_n^i, we have

  \tau_i^{(t)}(x_n, y_n) \triangleq E[z_n^i \mid x_n, y_n, \theta^{(t)}] = p(z_n^i = 1 \mid x_n, y_n, \theta^{(t)}).

Conditional Mixture

The expected complete log likelihood can then be written as

  \langle l_c(\theta; x, y, z) \rangle = \sum_n \sum_i \tau_i^{(t)}(x_n, y_n) \log[\pi_i(x_n) \, p(y_n \mid z_n^i = 1, x_n, \theta_i)].

Taking partial derivatives and setting them to zero gives the update formulas for EM.

Conditional Mixture

Summary of the EM algorithm for the conditional mixture:

(E step): calculate the posterior probabilities \tau_i^{(t)}(x_n, y_n).

(M step): use the IRLS algorithm to update the gating parameters, based on the data pairs (x_n, \tau^{(t)}(x_n, y_n)).

(M step): use the weighted IRLS algorithm to update the component parameters \theta_i, based on the data points (x_n, y_n), with weights \tau_i^{(t)}(x_n, y_n).

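A minimal sketch of one such EM pass for the Gaussian-component case, reusing the softmax helper above (all names are illustrative). The component update is weighted least squares, the closed-form special case of weighted IRLS for Gaussian components; the gating update uses a few plain gradient steps as a simple stand-in for the IRLS step described in the slides.

```python
def em_step_conditional_mixture(x, y, Xi, Beta, sigma2, gate_lr=0.1, gate_steps=10):
    """One EM iteration for a conditional mixture with Gaussian linear-regression components."""
    N, d = x.shape
    K = Xi.shape[0]
    # E step: posterior responsibilities tau[n, i] = p(z^i = 1 | x_n, y_n, theta).
    gate = softmax(x @ Xi.T)
    means = x @ Beta.T
    lik = np.exp(-0.5 * (y[:, None] - means) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    joint = gate * lik
    tau = joint / joint.sum(axis=1, keepdims=True)
    # M step (components): weighted least squares for each beta_i with weights tau[:, i].
    for i in range(K):
        w = tau[:, i]
        A = x.T @ (w[:, None] * x) + 1e-8 * np.eye(d)
        Beta[i] = np.linalg.solve(A, x.T @ (w * y))
        resid = y - x @ Beta[i]
        sigma2[i] = (w * resid ** 2).sum() / w.sum()
    # M step (gating): gradient steps on sum_n sum_i tau[n, i] log gate[n, i],
    # standing in for the IRLS update of the gating parameters.
    for _ in range(gate_steps):
        gate = softmax(x @ Xi.T)
        Xi = Xi + gate_lr * (tau - gate).T @ x / N
    return Xi, Beta, sigma2
```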

General Formulation

X: all observable variables; Z: all latent variables; \theta: all parameters.

Suppose Z were observed; the ML estimate would be

  \hat{\theta}_{ML} = \arg\max_\theta l_c(\theta; x, z) = \arg\max_\theta \log p(x, z \mid \theta).   (complete log likelihood)

However, Z is in fact not observed, so we must work with

  l(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta).   (incomplete log likelihood)

General Formulation

Suppose the complete log likelihood is written as a sum over the latent variable, weighted by a function f that picks out the actual (unobserved) configuration:

  l_c(\theta; x, z) = \sum_z f(z \mid x, z) \log p(x, z \mid \theta).

Since z is unknown, f(z \mid x, z) is unknown as well, and it is not clear how to solve this ML estimation problem. However, we can average over the randomness of z by replacing f(z \mid x, z) with an averaging distribution q(z \mid x).

General Formulation

Using q(z \mid x) as an estimate of f(z \mid x, z), the complete log likelihood becomes the expected complete log likelihood

  \langle l_c(\theta; x, z) \rangle_q = \sum_z q(z \mid x) \log p(x, z \mid \theta).

This expected complete log likelihood is solvable, and, hopefully, improving it will also improve the incomplete log likelihood in some way. (This is the basic idea behind EM.)

General Formulation

EM maximizes the incomplete log likelihood by way of a lower bound:

  l(\theta; x) = \log p(x \mid \theta)
              = \log \sum_z p(x, z \mid \theta)
              = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)}
              \ge \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}      (Jensen’s inequality)
              \triangleq L(q, \theta).                                                 (auxiliary function)

General Formulation

Given q(z \mid x), maximizing L(q, \theta) over \theta is equal to maximizing the expected complete log likelihood:

  L(q, \theta) = \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}
              = \sum_z q(z \mid x) \log p(x, z \mid \theta) - \sum_z q(z \mid x) \log q(z \mid x)
              = \langle l_c(\theta; x, z) \rangle_q - \sum_z q(z \mid x) \log q(z \mid x),

and the second (entropy) term does not depend on \theta.

General Formulation

Given \theta^{(t)}, the choice q^{(t+1)}(z \mid x) = p(z \mid x, \theta^{(t)}) yields the maximum of L(q, \theta^{(t)}):

  L(q^{(t+1)}, \theta^{(t)}) = \sum_z p(z \mid x, \theta^{(t)}) \log \frac{p(x, z \mid \theta^{(t)})}{p(z \mid x, \theta^{(t)})}
                             = \sum_z p(z \mid x, \theta^{(t)}) \log p(x \mid \theta^{(t)})
                             = \log p(x \mid \theta^{(t)})
                             = l(\theta^{(t)}; x).

Note: l(\theta^{(t)}; x) is the upper bound of L(q, \theta^{(t)}).

General Formulation

From the above, at every step of EM we maximize L(q, \theta).

However, how do we know that the finally maximized L(q, \theta) also maximizes the incomplete log likelihood l(\theta; x)?

General Formulation

The difference between l(\theta; x) and L(q, \theta):

  l(\theta; x) - L(q, \theta) = \log p(x \mid \theta) - \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}
                              = \sum_z q(z \mid x) \log p(x \mid \theta) - \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}
                              = \sum_z q(z \mid x) \log \frac{q(z \mid x) \, p(x \mid \theta)}{p(x, z \mid \theta)}
                              = \sum_z q(z \mid x) \log \frac{q(z \mid x)}{p(z \mid x, \theta)}
                              = D(q(z \mid x) \,\|\, p(z \mid x, \theta)),

the KL divergence, which is non-negative and uniquely minimized (at zero) when q(z \mid x) = p(z \mid x, \theta).
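This decomposition is easy to check numerically on a toy discrete example. The sketch below uses a two-valued latent variable and a single observation with an arbitrarily chosen joint p(x, z | \theta) (the numbers are illustrative); it verifies that l(\theta; x) = L(q, \theta) + D(q \| p(z | x, \theta)) for several choices of q, with the gap vanishing exactly at q = p(z | x, \theta).

```python
import numpy as np

# Toy model: K = 2 latent values, one observation x, joint p(x, z | theta) chosen arbitrarily.
p_xz = np.array([0.12, 0.28])            # p(x, z=1 | theta), p(x, z=2 | theta)  (illustrative)
p_x = p_xz.sum()                         # incomplete likelihood p(x | theta)
posterior = p_xz / p_x                   # p(z | x, theta)

def L(q):                                # auxiliary function L(q, theta)
    return float((q * np.log(p_xz / q)).sum())

def KL(q, p):                            # D(q || p)
    return float((q * np.log(q / p)).sum())

incomplete_ll = float(np.log(p_x))
for q in (np.array([0.5, 0.5]), np.array([0.9, 0.1]), posterior):
    # l(theta; x) = L(q, theta) + D(q || p(z | x, theta)) holds for every q;
    # the KL gap is zero exactly when q equals the posterior.
    assert np.isclose(incomplete_ll, L(q) + KL(q, posterior))
    print(q, round(L(q), 4), round(KL(q, posterior), 4))
```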

General Formulation

EM and alternating minimization: recall that maximization of the likelihood is exactly the same as minimization of the KL divergence between the empirical distribution and the model.

Including the latent variable Z, the KL divergence becomes a “complete KL divergence” between joint distributions on (x, z).

General Formulation

Reformulated EM algorithm, with D denoting the complete KL divergence above:

(E step): q^{(t+1)} = \arg\min_q D(q \,\|\, \theta^{(t)})

(M step): \theta^{(t+1)} = \arg\min_\theta D(q^{(t+1)} \,\|\, \theta)

This is an alternating minimization algorithm.

Summary

Unconditional mixture: graphical model, EM algorithm

Conditional mixture: graphical model, EM algorithm

A general formulation of the EM algorithm: maximizing the auxiliary function, minimizing the “complete KL divergence”

Incomplete Graphical Models

Thank You!
