
Incomplete Graphical Models

Nan Hu

Outline

Motivation
K-means clustering
  Coordinate descent algorithm
Density estimation
  EM on unconditional mixture
Regression and classification
  EM on conditional mixture
A general formulation of the EM algorithm

K-means clustering

Problem: Given a set of observations \{x_1, \ldots, x_N\}, how do we group them into K clusters, assuming the value of K is given?

First phase: with the cluster means fixed, assign each observation to its nearest cluster mean.

Second phase: with the assignments fixed, recompute each cluster mean as the average of the observations assigned to it.

K-means clustering

[Figure: the original data set, followed by the cluster assignments after the first, second, and third iterations.]

K-means clustering

Coordinate descent algorithm: the algorithm minimizes the distortion measure

  J(z, \mu) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_n^k \, \lVert x_n - \mu_k \rVert^2,

where z_n^k = 1 if x_n is assigned to cluster k and 0 otherwise, by alternating between minimizing J over the assignments z (first phase) and minimizing J over the means \mu_k, obtained by setting the partial derivatives \partial J / \partial \mu_k to zero (second phase).
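As a concrete reference, here is a minimal NumPy sketch of this coordinate descent (function and variable names are illustrative, not from the slides): the first phase minimizes J over the assignments with the means fixed, and the second phase minimizes J over the means with the assignments fixed.

```python
import numpy as np

def kmeans(X, K, n_iters=20, seed=0):
    """Minimal K-means: coordinate descent on the distortion measure J."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)   # initial means
    for _ in range(n_iters):
        # First phase: assign each point to its nearest mean (minimize J over z).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)      # (N, K) squared distances
        z = d2.argmin(axis=1)
        # Second phase: set each mean to the average of its points (dJ/dmu_k = 0).
        for k in range(K):
            if np.any(z == k):
                mu[k] = X[z == k].mean(axis=0)
    J = ((X - mu[z]) ** 2).sum()                                       # final distortion
    return mu, z, J
```

A cluster that ends up empty simply keeps its previous mean in this sketch; more careful reinitialization strategies are possible but beside the point of the slides.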

Unconditional Mixture

Problem: if the given sample data exhibit a multimodal density, how do we estimate the true density?

Fitting a single density to a bimodal case: although the algorithm converges, the result bears little relationship to the truth.

Unconditional Mixture

A “divide-and-conquer” way to solve this problem: introduce a latent variable Z.

[Graphical model: Z \to X, where Z is a multinomial node taking on one of K values.]

Assign a density model to each subpopulation; the overall density is the mixture

  p(x \mid \theta) = \sum_{i=1}^{K} \pi_i \, f_i(x \mid \theta_i), \qquad \pi_i = p(z^i = 1).


Unconditional Mixture

Gaussian Mixture Models: in this model, the mixture components are Gaussian distributions with parameters (\mu_i, \Sigma_i).

Probability model for a Gaussian mixture:

  p(x \mid \theta) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i).

Unconditional Mixture

Posterior probability of the latent variable Z:

  \tau_i(x_n) \triangleq p(z_n^i = 1 \mid x_n, \theta) = \frac{\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

Log likelihood:

  l(\theta; x) = \sum_{n=1}^{N} \log \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)
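A small NumPy sketch of these two quantities, assuming Gaussian components with full covariances (all names are illustrative):

```python
import numpy as np

def log_gaussian(X, mu, Sigma):
    """Log density of N(x | mu, Sigma) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, diff.T)                    # L^{-1} (x - mu)
    maha = (sol ** 2).sum(axis=0)                       # (x - mu)^T Sigma^{-1} (x - mu)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def responsibilities_and_loglik(X, pi, mus, Sigmas):
    """tau[n, i] = p(z_n^i = 1 | x_n, theta) and the log likelihood l(theta; x)."""
    log_comp = np.stack([np.log(pi[i]) + log_gaussian(X, mus[i], Sigmas[i])
                         for i in range(len(pi))], axis=1)   # (N, K)
    log_mix = np.logaddexp.reduce(log_comp, axis=1)          # log sum_i pi_i N(x_n | mu_i, Sigma_i)
    tau = np.exp(log_comp - log_mix[:, None])
    return tau, log_mix.sum()
```

Working in log space with logaddexp avoids numerical underflow when a data point is far from every component.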

Unconditional Mixture

Taking the partial derivative of l(\theta; x) with respect to \pi_i, using a Lagrange multiplier to enforce \sum_i \pi_i = 1, and solving, we have

  \pi_i = \frac{1}{N} \sum_{n=1}^{N} \tau_i(x_n).

Unconditional Mixture

Taking the partial derivative of l(\theta; x) with respect to \mu_i and setting it to zero, we have

  \mu_i = \frac{\sum_n \tau_i(x_n) \, x_n}{\sum_n \tau_i(x_n)}.

Unconditional Mixture

Taking the partial derivative of l(\theta; x) with respect to \Sigma_i and setting it to zero, we have

  \Sigma_i = \frac{\sum_n \tau_i(x_n) \, (x_n - \mu_i)(x_n - \mu_i)^T}{\sum_n \tau_i(x_n)}.

Unconditional Mixture

The EM Algorithm

First phase (E step): with the current parameters \theta^{(t)}, compute the posterior probabilities \tau_i^{(t)}(x_n).

Second phase (M step): update \pi_i, \mu_i, and \Sigma_i using the three formulas above, with \tau_i replaced by \tau_i^{(t)}.
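Putting the two phases together gives the usual EM loop for a Gaussian mixture. The sketch below reuses responsibilities_and_loglik from the snippet above; the initialization (random data points as means, diagonal covariances from the data variance) is an illustrative assumption, not from the slides.

```python
def em_gmm(X, K, n_iters=50, seed=0):
    """EM for a Gaussian mixture, alternating the two phases described above."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigmas = np.array([np.diag(X.var(axis=0) + 1e-6) for _ in range(K)])
    for _ in range(n_iters):
        # First phase (E step): posteriors tau_i(x_n) under the current parameters.
        tau, loglik = responsibilities_and_loglik(X, pi, mus, Sigmas)
        # Second phase (M step): closed-form updates from the zero-derivative conditions.
        Nk = tau.sum(axis=0) + 1e-12                  # effective number of points per component
        pi = Nk / N
        mus = (tau.T @ X) / Nk[:, None]
        for i in range(K):
            diff = X - mus[i]
            Sigmas[i] = (tau[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
    return pi, mus, Sigmas, loglik
```

A useful sanity check is that the returned log likelihood never decreases from one iteration to the next, which is exactly the property the general formulation at the end of the slides establishes.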


Unconditional Mixture

EM algorithm from the expected complete log likelihood point of view.

Suppose we observed the latent variables z_n; the data set \{(x_n, z_n)\} would be completely observed, and the likelihood would be the complete log likelihood

  l_c(\theta; x, z) = \sum_n \log p(x_n, z_n \mid \theta)
                    = \sum_n \log \prod_i [\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)]^{z_n^i}
                    = \sum_n \sum_i z_n^i \log[\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)].

Unconditional Mixture

We treat the z_n^i as random variables and take expectations conditioned on X and \theta^{(t)}.

Note that the z_n^i are binary random variables, so

  \langle z_n^i \rangle = E[z_n^i \mid x_n, \theta^{(t)}] = p(z_n^i = 1 \mid x_n, \theta^{(t)}) = \tau_i^{(t)}(x_n).

Using this as the “best guess” for z_n^i, we have the expected complete log likelihood

  \langle l_c(\theta; x, z) \rangle = \sum_n \sum_i \tau_i^{(t)}(x_n) \log[\pi_i \, \mathcal{N}(x_n \mid \mu_i, \Sigma_i)].

Unconditional Mixture

Maximizing the expected complete log likelihood by setting the derivatives to zero, we recover the same updates:

  \pi_i^{(t+1)} = \frac{1}{N} \sum_n \tau_i^{(t)}(x_n), \qquad
  \mu_i^{(t+1)} = \frac{\sum_n \tau_i^{(t)}(x_n) \, x_n}{\sum_n \tau_i^{(t)}(x_n)}, \qquad
  \Sigma_i^{(t+1)} = \frac{\sum_n \tau_i^{(t)}(x_n) \, (x_n - \mu_i^{(t+1)})(x_n - \mu_i^{(t+1)})^T}{\sum_n \tau_i^{(t)}(x_n)}.

Conditional Mixture

Graphical Model

[Graphical model: X \to Z and (X, Z) \to Y, where the latent variable Z is a multinomial node taking on one of K values.]

For regression and classification, the relationship between X and Z can be modeled in a discriminative classification way, e.g. with a softmax function.


Conditional Mixture

By marginalizing over Z, the conditional density is

  p(y \mid x, \theta) = \sum_i p(z^i = 1 \mid x, \theta) \, p(y \mid z^i = 1, x, \theta).

X is taken to be always observed. The posterior probability of the latent variable is defined as

  \tau_i(x, y) \triangleq p(z^i = 1 \mid x, y, \theta) = \frac{p(z^i = 1 \mid x, \theta) \, p(y \mid z^i = 1, x, \theta)}{\sum_j p(z^j = 1 \mid x, \theta) \, p(y \mid z^j = 1, x, \theta)}.

Conditional Mixture

Some specific choices of mixture components:

Gaussian components (regression):

  p(y \mid z^i = 1, x, \theta_i) = \mathcal{N}(y \mid \beta_i^T x, \sigma_i^2).

Logistic components (binary classification):

  p(y \mid z^i = 1, x, \theta_i) = \mu(\theta_i^T x)^{y} \, [1 - \mu(\theta_i^T x)]^{1-y},

where \mu(\cdot) is the logistic function:

  \mu(a) = \frac{1}{1 + e^{-a}}.
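For concreteness, a minimal sketch of this conditional mixture density, assuming softmax gating on x and Gaussian linear-regression components; the names Xi (gating weights), Beta, and sigma2 (component parameters) are illustrative placeholders, not from the slides.

```python
import numpy as np

def softmax(A):
    """Row-wise softmax, stabilised by subtracting the row maximum."""
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def conditional_mixture_density(x, y, Xi, Beta, sigma2):
    """p(y | x) = sum_i p(z^i = 1 | x) N(y | beta_i^T x, sigma_i^2).

    x: (N, d) inputs, y: (N,) targets,
    Xi: (K, d) gating weights, Beta: (K, d) regression weights, sigma2: (K,) noise variances.
    """
    gate = softmax(x @ Xi.T)                                    # (N, K) gating probabilities
    means = x @ Beta.T                                          # (N, K) component means beta_i^T x
    lik = np.exp(-0.5 * (y[:, None] - means) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    return (gate * lik).sum(axis=1)                             # (N,) mixture density values
```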

Conditional Mixture

Parameter estimation via EM. The complete log likelihood is

  l_c(\theta; x, y, z) = \sum_n \log p(y_n, z_n \mid x_n, \theta)
                       = \sum_n \log \prod_i [\pi_i(x_n) \, p(y_n \mid z_n^i = 1, x_n, \theta_i)]^{z_n^i}
                       = \sum_n \sum_i z_n^i \log[\pi_i(x_n) \, p(y_n \mid z_n^i = 1, x_n, \theta_i)],

where \pi_i(x_n) \triangleq p(z_n^i = 1 \mid x_n) is the gating probability.

Using the expectation as the “best guess” for z_n^i, we have

  \tau_i^{(t)}(x_n, y_n) \triangleq E[z_n^i \mid x_n, y_n, \theta^{(t)}] = p(z_n^i = 1 \mid x_n, y_n, \theta^{(t)}).

Conditional Mixture

The expected complete log likelihood can then be written as

  \langle l_c(\theta; x, y, z) \rangle = \sum_n \sum_i \tau_i^{(t)}(x_n, y_n) \log[\pi_i(x_n) \, p(y_n \mid z_n^i = 1, x_n, \theta_i)].

Taking partial derivatives and setting them to zero gives the update formulas for EM.

Conditional Mixture

Summary of the EM algorithm for the conditional mixture:

(E step): calculate the posterior probabilities \tau_i^{(t)}(x_n, y_n).

(M step): use the IRLS algorithm to update the gating parameters, based on the data pairs (x_n, \tau^{(t)}(x_n, y_n)).

(M step): use the weighted IRLS algorithm to update the component parameters \theta_i, based on the data points (x_n, y_n), with weights \tau_i^{(t)}(x_n, y_n).

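A minimal sketch of one such EM pass for the Gaussian-component case, reusing the softmax helper above (all names are illustrative). The component update is weighted least squares, the closed-form special case of weighted IRLS for Gaussian components; the gating update uses a few plain gradient steps as a simple stand-in for the IRLS step described in the slides.

```python
def em_step_conditional_mixture(x, y, Xi, Beta, sigma2, gate_lr=0.1, gate_steps=10):
    """One EM iteration for a conditional mixture with Gaussian linear-regression components."""
    N, d = x.shape
    K = Xi.shape[0]
    # E step: posterior responsibilities tau[n, i] = p(z^i = 1 | x_n, y_n, theta).
    gate = softmax(x @ Xi.T)
    means = x @ Beta.T
    lik = np.exp(-0.5 * (y[:, None] - means) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    joint = gate * lik
    tau = joint / joint.sum(axis=1, keepdims=True)
    # M step (components): weighted least squares for each beta_i with weights tau[:, i].
    for i in range(K):
        w = tau[:, i]
        A = x.T @ (w[:, None] * x) + 1e-8 * np.eye(d)
        Beta[i] = np.linalg.solve(A, x.T @ (w * y))
        resid = y - x @ Beta[i]
        sigma2[i] = (w * resid ** 2).sum() / w.sum()
    # M step (gating): gradient steps on sum_n sum_i tau[n, i] log gate[n, i],
    # standing in for the IRLS update of the gating parameters.
    for _ in range(gate_steps):
        gate = softmax(x @ Xi.T)
        Xi = Xi + gate_lr * (tau - gate).T @ x / N
    return Xi, Beta, sigma2
```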

General Formulation

X: all observable variables; Z: all latent variables; \theta: all parameters.

Suppose Z were observed; the ML estimate would be

  \hat{\theta}_{ML} = \arg\max_\theta l_c(\theta; x, z) = \arg\max_\theta \log p(x, z \mid \theta).   (complete log likelihood)

However, Z is in fact not observed, so we must work with

  l(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta).   (incomplete log likelihood)

General Formulation

Suppose the complete log likelihood is written as a sum over the latent variable, weighted by a function f that picks out the actual (unobserved) configuration:

  l_c(\theta; x, z) = \sum_z f(z \mid x, z) \log p(x, z \mid \theta).

Since z is unknown, f(z \mid x, z) is unknown as well, and it is not clear how to solve this ML estimation problem. However, we can average over the randomness of z by replacing f(z \mid x, z) with an averaging distribution q(z \mid x).

General Formulation

Using q(z \mid x) as an estimate of f(z \mid x, z), the complete log likelihood becomes the expected complete log likelihood

  \langle l_c(\theta; x, z) \rangle_q = \sum_z q(z \mid x) \log p(x, z \mid \theta).

This expected complete log likelihood is solvable, and, hopefully, improving it will also improve the incomplete log likelihood in some way. (This is the basic idea behind EM.)

General Formulation

EM maximizes the incomplete log likelihood by way of a lower bound:

  l(\theta; x) = \log p(x \mid \theta)
              = \log \sum_z p(x, z \mid \theta)
              = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)}
              \ge \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}      (Jensen’s inequality)
              \triangleq L(q, \theta).                                                 (auxiliary function)

General Formulation

Given q(z \mid x), maximizing L(q, \theta) over \theta is equal to maximizing the expected complete log likelihood:

  L(q, \theta) = \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}
              = \sum_z q(z \mid x) \log p(x, z \mid \theta) - \sum_z q(z \mid x) \log q(z \mid x)
              = \langle l_c(\theta; x, z) \rangle_q - \sum_z q(z \mid x) \log q(z \mid x),

and the second (entropy) term does not depend on \theta.

General Formulation

Given \theta^{(t)}, the choice q^{(t+1)}(z \mid x) = p(z \mid x, \theta^{(t)}) yields the maximum of L(q, \theta^{(t)}):

  L(q^{(t+1)}, \theta^{(t)}) = \sum_z p(z \mid x, \theta^{(t)}) \log \frac{p(x, z \mid \theta^{(t)})}{p(z \mid x, \theta^{(t)})}
                             = \sum_z p(z \mid x, \theta^{(t)}) \log p(x \mid \theta^{(t)})
                             = \log p(x \mid \theta^{(t)})
                             = l(\theta^{(t)}; x).

Note: l(\theta^{(t)}; x) is the upper bound of L(q, \theta^{(t)}).

General Formulation

From the above, at every step of EM we maximize L(q, \theta).

However, how do we know that the finally maximized L(q, \theta) also maximizes the incomplete log likelihood l(\theta; x)?

General Formulation

The difference between l(\theta; x) and L(q, \theta):

  l(\theta; x) - L(q, \theta) = \log p(x \mid \theta) - \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}
                              = \sum_z q(z \mid x) \log p(x \mid \theta) - \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}
                              = \sum_z q(z \mid x) \log \frac{q(z \mid x) \, p(x \mid \theta)}{p(x, z \mid \theta)}
                              = \sum_z q(z \mid x) \log \frac{q(z \mid x)}{p(z \mid x, \theta)}
                              = D(q(z \mid x) \,\|\, p(z \mid x, \theta)),

the KL divergence, which is non-negative and uniquely minimized (at zero) when q(z \mid x) = p(z \mid x, \theta).
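This decomposition is easy to check numerically on a toy discrete example. The sketch below uses a two-valued latent variable and a single observation with an arbitrarily chosen joint p(x, z | \theta) (the numbers are illustrative); it verifies that l(\theta; x) = L(q, \theta) + D(q \| p(z | x, \theta)) for several choices of q, with the gap vanishing exactly at q = p(z | x, \theta).

```python
import numpy as np

# Toy model: K = 2 latent values, one observation x, joint p(x, z | theta) chosen arbitrarily.
p_xz = np.array([0.12, 0.28])            # p(x, z=1 | theta), p(x, z=2 | theta)  (illustrative)
p_x = p_xz.sum()                         # incomplete likelihood p(x | theta)
posterior = p_xz / p_x                   # p(z | x, theta)

def L(q):                                # auxiliary function L(q, theta)
    return float((q * np.log(p_xz / q)).sum())

def KL(q, p):                            # D(q || p)
    return float((q * np.log(q / p)).sum())

incomplete_ll = float(np.log(p_x))
for q in (np.array([0.5, 0.5]), np.array([0.9, 0.1]), posterior):
    # l(theta; x) = L(q, theta) + D(q || p(z | x, theta)) holds for every q;
    # the KL gap is zero exactly when q equals the posterior.
    assert np.isclose(incomplete_ll, L(q) + KL(q, posterior))
    print(q, round(L(q), 4), round(KL(q, posterior), 4))
```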

General Formulation

EM and alternating minimization: recall that maximization of the likelihood is exactly the same as minimization of the KL divergence between the empirical distribution and the model.

Including the latent variable Z, the KL divergence becomes a “complete KL divergence” between joint distributions on (x, z).

General Formulation

Reformulated EM algorithm, with D denoting the complete KL divergence above:

(E step): q^{(t+1)} = \arg\min_q D(q \,\|\, \theta^{(t)})

(M step): \theta^{(t+1)} = \arg\min_\theta D(q^{(t+1)} \,\|\, \theta)

This is an alternating minimization algorithm.

Summary

Unconditional mixture: graphical model, EM algorithm

Conditional mixture: graphical model, EM algorithm

A general formulation of the EM algorithm: maximizing the auxiliary function, minimizing the “complete KL divergence”

Incomplete Graphical Models

Thank You!
