
Page 1:

Introduction to Graphical Models

Wei-Lun (Harry) Chao

June 10, 2010

aMMAI, spring 2010

1

Page 2:

Outline

• Graphical model fundamentals

[Directed]

• General structure: 3 connections, chain, and tree

• Graphical model examples

• Inference and Learning

[Undirected]

• Markov Random Field and its Applications

2

Page 3:

Main References

• "An introduction to graphical models," Kevin Murphy, 2001

• "Learning Low-Level Vision," Freeman, IJCV, 2000

• Chapter 16: Graphical Models, in "Introduction to Machine Learning," 2nd edition, Ethem Alpaydin

3

Page 4:

What are graphical models?

4

If we know all P(C,S,R,W), we know everything in this graph

Page 5:

What are graphical models?

5

If we know all P(C,R,S,W), we can apply marginalization and Bayes' rule to obtain any probability distribution of interest. (ex: P(R), P(R|S), P(R,W|S,C))

P(C,R,S,W)   C   R   S   W
0.2          T   T   T   T
0.11         T   T   T   F
......       ..  ..  ..  ..
0.06         F   F   F   F

[General decomposition]

P(C,R,S,W) = P(W|C,R,S) P(C,R,S)
           = P(W|C,R,S) P(S|R,C) P(R,C)
           = P(W|C,R,S) P(S|R,C) P(R|C) P(C)
(Totally 1+2+4+8 = 15 terms recorded)

[Induce conditional independence]

P(C,R,S,W) = P(W|R,S) P(S|C) P(R|C) P(C)
(Totally 1+2+2+4 = 9 terms recorded)

[Probability decomposition in graphical models]

P(X_1, ..., X_d) = ∏_{i=1}^{d} P(X_i | parents(X_i))

Page 6:

What are graphical models?

6

From MMAI 09

Page 7:

Outline

• Graphical model fundamentals

[Directed]

• General structure: 3 connections, chain, and tree

• Graphical model examples

• Inference and Learning

[Undirected]

• Markov Random Field and its Applications

7

Page 8:

1. Graphical model fundamentals (1/3)

• Graphical models are a marriage between probability theory and graph theory

• Solving two problems: Uncertainty and Complexity

(Ex: text retrieval, object recognition, ……)

• General structure: Modularity

• Conditional independencies result in local calculations

• Issues: Representation, Inference, Learning, and Decision Theory

8

Page 9:

1. Graphical model fundamentals (2/3)

• Two structural factors:

Node (Variable)

Arc (Dependence)

• Two kinds of models:

Undirected: Markov random fields (MRFs), with potential functions Ψ(x_i, x_j), Φ(y_i, x_i) [2]

Directed: Bayesian networks (BNs), with conditional probabilities P(X), P(Y|X), P(Z|Y)

9

Page 10:

1. Graphical model fundamentals (3/3)

• Conditional Independence

• Need to know: Structure and Parameters

• Want to know: Variables (Observed and Unobserved)

10

Ex: P(Y = y | X = x; θ) ~ N(Wx, Σ)

P(X_1, ..., X_d) = ∏_{i=1}^{d} P(X_i | parents(X_i))

Page 11:

Outline

• Graphical model fundamentals

[Directed]

• General structure: 3 connections, chain, and tree

• Graphical model examples

• Inference and Learning

[Undirected]

• Markov Random Field and its Applications

11

Page 12:

2. General structure (1/7)

• 3 Connections

Head-to-tail:  P(X), P(Y|X), P(Z|Y)

Tail-to-tail:  P(X), P(Y|X), P(Z|X)

Head-to-head:  P(X), P(Y), P(Z|X,Y)

12

Page 13:

2. General structure (2/7)

• Head-to-tail: P(X), P(Y|X), P(Z|Y)

P(X,Y,Z) = P(X) P(Y|X) P(Z|Y,X)
P(Z|X,Y) = P(Z,X|Y) / P(X|Y) = P(Z|Y) P(X|Y) / P(X|Y) = P(Z|Y)
⇒ P(X,Y,Z) = P(X) P(Y|X) P(Z|Y)

• Example (chain C → R → W):

P(R) = P(R|C) P(C) + P(R|~C) P(~C) = 0.38
P(W) = P(W|R) P(R) + P(W|~R) P(~R) = 0.47
P(W|C) = P(W|R) P(R|C) + P(W|~R) P(~R|C) = 0.76   [Prediction]
P(C|W) = P(W|C) P(C) / P(W) = 0.65   [Diagnosis]

13
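A minimal sketch of the prediction and diagnosis computations above. The CPT values below are inferred so that the results reproduce the slide's numbers; the actual table appears only in the slide figure.

```python
# Chain C -> R -> W (head-to-tail). CPT values are inferred so that the
# four results match the slide (0.38, 0.47, 0.76, 0.65).
p_c = 0.4
p_r = {True: 0.8, False: 0.1}    # P(R=T | C)
p_w = {True: 0.9, False: 0.2}    # P(W=T | R)

p_rain = p_r[True] * p_c + p_r[False] * (1 - p_c)         # P(R) = 0.38
p_wet = p_w[True] * p_rain + p_w[False] * (1 - p_rain)    # P(W) ~ 0.47
p_w_c = (p_w[True] * p_r[True]
         + p_w[False] * (1 - p_r[True]))                  # P(W|C) = 0.76 [Prediction]
p_c_w = p_w_c * p_c / p_wet                               # P(C|W) ~ 0.65 [Diagnosis]
print(p_rain, p_wet, p_w_c, p_c_w)
```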

Page 14:

2. General structure (3/7)

• Tail-to-tail: P(X), P(Y|X), P(Z|X)

P(Y,Z|X) = P(Y|X) P(Z|X)
P(X,Y,Z) = P(X) P(Y|X) P(Z|X)

• Example 1:

P(R) = P(R|C) P(C) + P(R|~C) P(~C) = 0.45
P(C|R) = P(R|C) P(C) / P(R) = P(R|C) P(C) / [P(R|C) P(C) + P(R|~C) P(~C)] = 0.89
P(R|S) = Σ_C P(R,C|S) = P(R|C) P(C|S) + P(R|~C) P(~C|S) = ...... = 0.22 (≠ P(R))

14
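The same computations in a short sketch; the CPT values are inferred to reproduce the slide's numbers (0.45, 0.89, 0.22), since the original table is in the slide figure.

```python
# Tail-to-tail: S <- C -> R. CPT values inferred to match the slide.
p_c = 0.5
p_r = {True: 0.8, False: 0.1}    # P(R=T | C)
p_s = {True: 0.1, False: 0.5}    # P(S=T | C)

p_rain = p_r[True] * p_c + p_r[False] * (1 - p_c)        # P(R) = 0.45
p_c_r = p_r[True] * p_c / p_rain                         # P(C|R) ~ 0.89
p_sprk = p_s[True] * p_c + p_s[False] * (1 - p_c)        # P(S) = 0.30
p_c_s = p_s[True] * p_c / p_sprk                         # P(C|S) ~ 0.17
p_r_s = p_r[True] * p_c_s + p_r[False] * (1 - p_c_s)     # P(R|S) ~ 0.22
print(p_rain, p_c_r, p_r_s)   # P(R|S) != P(R): R and S are coupled through C
```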

Page 15:

2. General structure (4/7)

• Example 2: PLSA

• How to determine the structure?

Based on which probability model we already know!

Ex: Regression vs. Generative model

[Original]      P(d,w) = Σ_z P(w|z) P(z|d) P(d)
[Modification]  P(d,w) = Σ_z P(w|z) P(d|z) P(z)

Regression: P(Y = y | X = x) ~ N(Wx, Σ)
Generative models: P(Y|X) = P(X|Y) P(Y) / P(X)

15

Page 16:

2. General structure (5/7)

• Head-to-head: [Different structure]

When Z is observed, X and Y are not independent!!

• Example:

P(X), P(Y), P(Z|X,Y)

P(X,Y) = P(X) P(Y)
P(X,Y,Z) = P(X) P(Y) P(Z|X,Y)

P(W) = Σ_{R,S} P(W,R,S) = 0.52
P(W|S) = Σ_R P(W,R|S) = 0.92
P(S|W) = P(W|S) P(S) / P(W) = 0.35
P(S|R,W) = P(S,R|W) / P(R|W) = 0.21   [Explaining away]

16
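The explaining-away effect can be checked numerically. The priors and the P(W|S,R) table below are inferred to reproduce the slide's numbers; they are assumptions standing in for the table in the slide figure.

```python
from itertools import product

# Head-to-head: S -> W <- R, with S and R marginally independent.
# Values inferred to match the slide (0.52, 0.92, 0.35, 0.21).
p_s, p_r = 0.2, 0.4
p_w = {(True, True): 0.95, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.10}   # P(W=T | S, R)

def bern(p, v):
    return p if v else 1.0 - p

def joint(s, r, w):   # P(S,R,W) = P(S) P(R) P(W|S,R)
    return bern(p_s, s) * bern(p_r, r) * bern(p_w[(s, r)], w)

p_wet = sum(joint(s, r, True) for s, r in product([True, False], repeat=2))  # 0.52
p_w_s = sum(p_w[(True, r)] * bern(p_r, r) for r in [True, False])            # 0.92
p_s_w = p_w_s * p_s / p_wet                                                  # ~0.35
p_rw = sum(joint(s, True, True) for s in [True, False])
p_s_rw = joint(True, True, True) / p_rw                                      # ~0.21
print(p_wet, p_w_s, p_s_w, p_s_rw)
# Seeing W raises P(S) from 0.2 to ~0.35; additionally seeing R
# "explains away" the wet grass and drops P(S|R,W) back to ~0.21.
```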

Page 17:

2. General structure (6/7)

• Combination:

• Memory saving:

• New representation:

No explicit input / output

Blurry difference between supervised / unsupervised learning

Hidden Nodes

Causality

2^4 − 1 = 15 entries for the full joint  →  1+2+2+4 = 9 entries with the factorization

P(X_1, ..., X_d) = ∏_{i=1}^{d} P(X_i | parents(X_i))

17

Page 18:

2. General structure (7/7)

• Chain:

• Tree:

• Loop:

18

Page 19:

Outline

• Graphical model fundamentals

[Directed]

• General structure: 3 connections, chain, and tree

• Graphical model examples

• Inference and Learning

[Undirected]

• Markov Random Field and its Applications

19

Page 20:

3. Graphical model examples (1/10)

• Generative discrimination vs. Gaussian mixture

• PCA, ICA, and all that

• Hidden Markov Model

• Naive Bayes’ classifier

• Linear regression

• Generative model for generative model

• Applications

• Notation:

Square (discrete), Circle (continuous)

Shaded (observed), Clear (hidden)

20

Page 21:

3. Graphical model examples (2/10)

• Generative discrimination vs. Gaussian mixture

P(Q = i), P(Y = i): multinomial distribution
P(X = x | Q = i), P(X = x | Y = i) = N(x; μ_i, Σ_i)

[Supervised learning]: at the training phase, the class-label variable is observable
[Inference]: there must be some latent (hidden) variables during testing (prediction)

21

Page 22:

3. Graphical model examples (3/10)

• PCA and Factor Analysis

P(X = x) = N(x; 0, I)
P(Y = y | X = x) = N(y; Wx, Ψ), Ψ is diagonal
usually assume n < m, where X ∈ R^n, Y ∈ R^m

[Further simplification]

(1) Isotropic noise: Ψ = σ²I (eigen problem)

(2) Classical PCA: σ² → 0

[Usage]

            | Models P(y)          | Not probabilistic
Subspace    | Factor analysis      | PCA
Clustering  | Mixture of Gaussians | K-means

[1]

22
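A generative sketch of the factor-analysis model above; the dimensions and parameter values are assumptions for illustration.

```python
import numpy as np

# Factor analysis generative model (assumed dims: n=2 latent, m=5 observed):
# latent x ~ N(0, I), observation y ~ N(Wx, Psi) with Psi diagonal.
rng = np.random.default_rng(0)
n, m = 2, 5
W = rng.normal(size=(m, n))                       # loading matrix
psi = np.diag(rng.uniform(0.1, 0.5, size=m))      # diagonal noise covariance

x = rng.normal(size=n)                            # P(x) = N(0, I)
y = W @ x + rng.multivariate_normal(np.zeros(m), psi)  # P(y|x) = N(Wx, Psi)
# Marginally y ~ N(0, W W^T + Psi); with Psi = sigma^2 I this becomes
# probabilistic PCA, and classical PCA in the limit sigma^2 -> 0.
print(y)
```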

Page 23:

3. Graphical model examples (4/10)

• Mixture of factor analysis (nonlinear, no W)

• Independent factor analysis (IFA), and ICA

[1]

IFA: chain graph with Ψ not diagonal

ICA: P(X) is non-Gaussian

[1]

23

Page 24:

3. Graphical model examples (5/10)

• Hidden Markov Model: [dynamic, discrete state]

[4 stages]

• Parameters: [repeat] homogeneous Markov chain

Parameters could be estimated by inference or by learning

[1]

[1]

Properties: P(Q_1), P(Q_{t+1} | Q_t), P(Y_t | Q_t)

Q_t : unobserved (hidden)

24

Page 25:

3. Graphical model examples (6/10)

• Variations of HMM:

[Input-Output HMM] [Factorial HMM]

*Pedigree: parent-child

[Coupled HMM]

*Speech recognition: spoken words, lip images

[Gaussian mixture HMM] [Switching HMM] [Linear dynamic system: Kalman filter]

[1] [1] [1]

[1]

25

Page 26:

3. Graphical model examples (7/10)

• Naive Bayes’ classifier

• Linear Regression

Naive Bayes:

P(Y_1, Y_2, ......, Y_d | X) = P(Y_1 | X) P(Y_2 | X) ...... P(Y_d | X)

d: size of the word dictionary

26

Linear regression:

P(w) ~ N(0, α⁻¹ I)
Learning (inference): P(ε) ~ N(0, β⁻¹)
P(r^t | x^t, w) ~ N(w^T x^t, β⁻¹)
Prediction: E[r | x, w]
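A minimal Bayesian linear-regression sketch under the prior and noise model above; the data and the values of alpha and beta are assumptions for illustration.

```python
import numpy as np

# Prior w ~ N(0, alpha^-1 I), observation noise ~ N(0, beta^-1).
rng = np.random.default_rng(0)
alpha, beta = 1.0, 25.0
x = rng.uniform(-1, 1, size=20)
Phi = np.column_stack([np.ones_like(x), x])       # design matrix [1, x]
r = 0.5 + 2.0 * x + rng.normal(0, beta ** -0.5, size=20)

# Learning (inference): posterior P(w | X, r) = N(m, S),
# the standard conjugate-Gaussian update.
S = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m = beta * S @ Phi.T @ r

# Prediction: E[r | x', data], plus the predictive variance.
phi_new = np.array([1.0, 0.3])
print(m @ phi_new, 1.0 / beta + phi_new @ S @ phi_new)
```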

Page 27:

27

[1]

Page 28:

3. Graphical model examples (9/10)

• Generative model for generative model

28

Page 29:

3. Graphical model examples (10/10)

• PLSA and LDA

• Object Recognition

29

[4] [4]

[5],[6]

Page 30:

Outline

• Graphical model fundamentals

[Directed]

• General structure: 3 connections, chain, and tree

• Graphical model examples

• Inference and Learning

[Undirected]

• Markov Random Field and its Applications

30

Page 31:

4. Inference and Learning (1/6)

• The definition of inference and learning:

[inference]: assume the structure and the parameters have been determined; based on some observations, we want to infer some unobserved variables.

[learning]: to estimate the structure and parameters of the graphical model!

PS: Each node has its corresponding probability function and parameters, but the parameters of some nodes are determined without learning!

For these nodes, even if the variables are unobserved during training, we don't need to use the EM algorithm.

31

Ex: P(Y = y | X = x; θ) ~ N(Wx, Σ)

Page 32:

4. Inference (2/6)

• The main goal of inference:

To estimate the values of the hidden nodes (variables), given the observed nodes (after the structure and parameters are fixed)

• Problem:

32

(1) posterior = conditional likelihood × prior / likelihood

(2) Computationally intractable: marginalization (summation / integral) over unobserved variables

(3) Solution: conditional independence

Page 33:

4. Inference (3/6)

• Variable elimination:

Push the sums (integrals) in as far as possible

Distributing sums over products: FFT and the Viterbi algorithm

33

[1]

[1]
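A sketch of variable elimination on the sprinkler network, reusing the assumed CPTs from the earlier sketches: instead of summing the full joint over all 2^3 configurations of (C,S,R) at once, the sum over C is pushed inside first.

```python
p_c = 0.5
p_s = {True: 0.1, False: 0.5}    # P(S=T | C)
p_r = {True: 0.8, False: 0.1}    # P(R=T | C)
p_w = {(True, True): 0.95, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.10}   # P(W=T | S, R)

def bern(p, v):
    return p if v else 1.0 - p

# P(W=T) = sum_S sum_R P(W|S,R) * [ sum_C P(S|C) P(R|C) P(C) ]
# Eliminate C first, producing an intermediate factor f(S, R):
f = {(s, r): sum(bern(p_s[c], s) * bern(p_r[c], r) * bern(p_c, c)
                 for c in (True, False))
     for s in (True, False) for r in (True, False)}
p_wet = sum(p_w[sr] * f[sr] for sr in f)   # then eliminate S and R
print(p_wet)
```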

Page 34:

4. Inference (4/6)

(1) Dynamic programming: avoid the redundant computation involved in repeated variable eliminations

(2) Acyclic graphs (trees, chains): local message passing. Ex: the forwards-backwards algorithm for HMMs

(3) Cyclic graphs (loops): cluster nodes together to form a tree (junction trees)

34

π(X) = P(X | E⁺),  λ(X) = P(E⁻ | X)

P(X | E) = P(E | X) P(X) / P(E)
         = P(E⁺, E⁻ | X) P(X) / P(E)
         = P(E⁻ | X) P(E⁺ | X) P(X) / P(E)
         = P(E⁻ | X) P(X | E⁺) P(E⁺) / P(E)
         ∝ P(E⁻ | X) P(X | E⁺) = λ(X) π(X)
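A forwards-backwards sketch for a small discrete HMM, with toy parameters assumed for illustration; alpha carries the evidence from the past (the π(X) role above) and beta the evidence from the future (the λ(X) role), and their renormalized product is the smoothed posterior.

```python
import numpy as np

pi0 = np.array([0.6, 0.4])                  # P(Q_1)
A = np.array([[0.7, 0.3], [0.2, 0.8]])      # A[i, j] = P(Q_{t+1}=j | Q_t=i)
B = np.array([[0.9, 0.1], [0.3, 0.7]])      # B[i, k] = P(Y_t=k | Q_t=i)
obs = [0, 1, 1, 0]

T, S = len(obs), len(pi0)
alpha, beta = np.zeros((T, S)), np.ones((T, S))
alpha[0] = pi0 * B[:, obs[0]]
for t in range(1, T):                       # forward pass
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
for t in range(T - 2, -1, -1):              # backward pass
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

posterior = alpha * beta                    # P(Q_t | Y_1..T), up to a constant
posterior /= posterior.sum(axis=1, keepdims=True)
print(posterior)
```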

Page 35:

4. Inference (5/6)

• Approximate inference: used when the induced width (the largest cluster) is high, or when the integrals have no closed form

(1) Sampling (Monte Carlo) methods: MCMC

(2) Variational methods. Mean-field approximation: law of large numbers!!

Decoupling all the nodes, and introducing variational parameters for each node

Iteratively updating these variational parameters so as to minimize the cross-entropy (KL divergence) between the approximate and the true probability distributions

The mean-field approximation produces a lower bound on the likelihood

(3) Laplace approximation

(4) Loopy belief propagation: turbo codes

35
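A Gibbs-sampling sketch (one MCMC method) that estimates P(R = T | W = T) in the sprinkler network, reusing the assumed CPTs from the earlier sketches: W stays clamped to its observed value while the hidden nodes are resampled in turn from their conditionals given everything else.

```python
import random

p_c = 0.5
p_s = {True: 0.1, False: 0.5}
p_r = {True: 0.8, False: 0.1}
p_w = {(True, True): 0.95, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.10}

def bern(p, v):
    return p if v else 1.0 - p

def joint(c, s, r, w):
    return (bern(p_c, c) * bern(p_s[c], s)
            * bern(p_r[c], r) * bern(p_w[(s, r)], w))

random.seed(0)
c, s, r, w = True, False, True, True        # W clamped to True (observed)
hits, sweeps = 0, 20000
for _ in range(sweeps):
    for var in "csr":                       # resample each hidden node
        p1 = joint(*{"c": (True, s, r, w), "s": (c, True, r, w),
                     "r": (c, s, True, w)}[var])
        p0 = joint(*{"c": (False, s, r, w), "s": (c, False, r, w),
                     "r": (c, s, False, w)}[var])
        val = random.random() < p1 / (p1 + p0)
        if var == "c":   c = val
        elif var == "s": s = val
        else:            r = val
    hits += r
print(hits / sweeps)    # Monte Carlo estimate of P(R=T | W=T)
```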

Page 36:

4. Inference (6/6)

• Variational methods

36

[Example]

F = ∫ f(θ) dθ = ∫∫ g(S, θ) dS dθ = ∫∫ q(S, θ) [g(S, θ) / q(S, θ)] dS dθ

log F = log ∫∫ q(S, θ) [g(S, θ) / q(S, θ)] dS dθ ≥ ∫∫ q(S, θ) log [g(S, θ) / q(S, θ)] dS dθ

providing ∫∫ q(S, θ) dS dθ = 1

assume q(S, θ) = q(θ) q(S)

iteratively optimize q(θ) and q(S) by EM to maximize the lower bound

Page 37:

5. Learning (1/7)

• Two things to learn: Parameters and Structures

• Variables in learning: Full or partial observability

• Point estimation vs. Bayesian estimation

37

                    | Fully observed | Partially observed
Known structure     | Closed form    | Expectation Maximization
Unknown structure   | Local search   | Structural EM

Page 38:

5. Learning (2/7)

38

[Prediction]

(1) P(X = x; θ) or P(X = x | θ)

(2) Bayesian estimation: P(X = x | X) = ∫ P(X = x | θ) P(θ | X) dθ
    (here X = {x^t} denotes the training data)

[Learning]

(1) Want to maximize P(θ | X) = P(X | θ) P(θ) / P(X)

    ML:  θ_ML = argmax_θ P(X | θ) = argmax_θ P(X; θ)
    MAP: θ_MAP = argmax_θ P(θ | X)
    Bayes' estimation: E[θ | X] = ∫ θ P(θ | X) dθ

(2) Want to get P(θ | X)

    A conjugate prior could simplify the structure:
    P(θ | X) = P(X | θ) P(θ) / P(X), where P(θ | X) and P(θ) are of the same distribution family
    Ex: P(X | θ) is multinomial and P(θ) is Dirichlet
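A minimal sketch of the multinomial-Dirichlet conjugacy: the posterior is again a Dirichlet whose parameters are the prior pseudo-counts plus the observed counts. The prior and the data below are illustrative assumptions.

```python
import numpy as np

alpha = np.array([2.0, 2.0, 2.0])        # Dirichlet prior over 3 categories
counts = np.array([10.0, 3.0, 7.0])      # observed category counts from {x^t}

post = alpha + counts                    # P(theta | X) = Dirichlet(post)
theta_ml = counts / counts.sum()                   # ML estimate
theta_map = (post - 1) / (post - 1).sum()          # MAP estimate
theta_bayes = post / post.sum()                    # E[theta | X]
print(theta_ml, theta_map, theta_bayes)
```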

Page 39:

5. Learning (3/7)

• Known structure, full observability:

ML: find the maximum likelihood estimate of the parameters of each CPD
MAP: include prior information about the parameters

P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | Pa(X_i))

39

Page 40:

5. Learning (4/7)

• Known structure, partial observability: some nodes are hidden during training, so we can use the EM (Expectation Maximization) algorithm to find a (locally) optimal ML estimate

[E-step]: compute the expected values of all the hidden nodes using an inference algorithm

[M-step]: treat these expected values as though they were observed and do ML estimation

The EM algorithm is iterative, and is related to the Baum-Welch algorithm used for training HMMs, to gradient ascent, and to coordinate ascent.

Inference becomes a subroutine which is called by the learning procedure!

40

Page 41:

5. Learning (5/7)

• EM algorithm 1

41

(1) P(X, Z) = P(X | Z) P(Z)

    max_θ P(X^t; θ) = max_θ Σ_Z P(X^t, Z; θ) = max_θ Σ_Z Q(Z) [P(X^t, Z; θ) / Q(Z)]

(2) Jensen's inequality:

    log E[f(X)] ≥ E[log f(X)]
    The equality holds when f(X) is a constant

(3) max P(X^t; θ) is equivalent to max log P(X^t; θ):

    log P(X^t; θ) = log Σ_Z Q(Z) [P(X^t, Z; θ) / Q(Z)] = log E_{Q(Z)}[P(X^t, Z; θ) / Q(Z)]
                  ≥ E_{Q(Z)}[log (P(X^t, Z; θ) / Q(Z))] = Σ_Z Q(Z) log [P(X^t, Z; θ) / Q(Z)]

Page 42:

5. Learning (6/7)

• EM algorithm 2

42

(4) log P(X^t; θ) ≥ Σ_Z Q(Z) log [P(X^t, Z; θ) / Q(Z)]

(5) The equality holds when P(X^t, Z; θ) / Q(Z) is a constant:

    P(X^t, Z; θ) / Q(Z) = P(Z | X^t; θ) P(X^t; θ) / Q(Z), where P(X^t; θ) is fixed for the whole training phase

    Then Q(Z) = P(Z | X^t; θ)

(6) The EM algorithm

    [E-step]: set Q(Z) = P(Z | X^t; θ), i.e. modify Q(Z)

    [M-step]: maximize Σ_Z Q(Z) log [P(X^t, Z; θ) / Q(Z)], i.e. modify θ
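A compact EM sketch for a two-component 1-D Gaussian mixture, following the E-step/M-step recipe above; the data and initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

w = np.array([0.5, 0.5])                 # mixing weights
mu = np.array([-1.0, 1.0])               # component means
var = np.array([1.0, 1.0])               # component variances
for _ in range(50):
    # E-step: Q(z) = P(z | x; theta), the responsibilities.
    dens = (w / np.sqrt(2 * np.pi * var)
            * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate theta, using responsibilities as soft counts.
    n_k = resp.sum(axis=0)
    w = n_k / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / n_k
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
print(w, mu, var)
```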

Page 43:

5. Learning (7/7)

• Bayesian estimation

• Hidden nodes

43

P(X = x; θ)

P(X = x | X) = ∫ P(X = x | θ) P(θ | X) dθ,   with posterior P(θ | X)

Page 44:

Take a Break!!!

44

Page 45:

Outline

• Graphical model fundamentals

[Directed]

• General structure: 3 connections, chain, and tree

• Graphical model examples

• Inference and Learning

[Undirected]

• Markov Random Field and its Applications

45

Page 46:

1. MRFs fundamentals

• Markov random field: [Potential (compatibility) functions]

• Learning the parameters: Maximum likelihood estimates of the clique potentials can be computed

using iterative proportional fitting (IPF)

46

[2]

P(x, y) = Ψ(x_1, x_2) Ψ(x_1, x_3) Ψ(x_2, x_4) Ψ(x_3, x_4) ∏_{i=1}^{4} Φ(x_i, y_i)
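A brute-force sketch of this 2x2 MRF: it evaluates the product of potentials for every joint scene state, normalizes, and finds the MAP scene by enumeration. The potential tables are illustrative assumptions, not values from [2].

```python
import numpy as np
from itertools import product

psi = np.array([[2.0, 0.5], [0.5, 2.0]])   # Psi: smoothness, neighbors agree
phi = np.array([[3.0, 1.0], [1.0, 3.0]])   # Phi: evidence, x_i matches y_i
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]   # x1-x2, x1-x3, x2-x4, x3-x4
y = [0, 0, 1, 1]                           # observed image states

def unnorm(x):                             # product of all potentials
    pair = np.prod([psi[x[i], x[j]] for i, j in edges])
    return pair * np.prod([phi[x[i], y[i]] for i in range(4)])

states = list(product([0, 1], repeat=4))
Z = sum(unnorm(x) for x in states)         # normalization constant
best = max(states, key=unnorm)             # MAP scene by enumeration
print(best, unnorm(best) / Z)
```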

Page 47:

2. Low-level vision (1/2)

• Low-level vision: [scene, image] how might a visual system interpret images?

(1) Super-resolution

(2) Shading and reflection estimation

(3) Motion estimation

• Previous works: the probability model was not obtained by learning

47

P(x | y) = P(y | x) P(x) / P(y)

P(y | x) is usually defined as a "noise model"
P(x) is usually defined as sparseness or smoothness

Page 48:

2. Low-level vision (2/2)

• The proposed approach: VISTA (Vision by Image/Scene TrAining)

• The proposed structure: [learning]

We have scene/image pairs, both of which have been divided into patches! We learn the relationships between local regions of images and scenes, and between neighboring local scene regions

Long-range interaction: multi-scale pyramid

[Estimating the scene]

The best scene estimate is the mean (minimum mean squared error, MMSE) or the mode (maximum a posteriori, MAP) of the posterior probability

48

[2]

P(x | y) = P(x, y) / P(y)

Page 49:

3. MRFs inference (1/3)

• Joint probability:

• MMSE:

• MAP:

49

Page 50:

3. MRFs inference (2/3)

• No loop MMSE: [Inference]

• No loop MAP: [message passing]

50

[2]

[Two indices: state and location]

M_j^k is a column vector with the same dimension as x_j

Φ(x_i, y_i) is a column vector indexed by the different possible states of x_i, the scene at node i

Ψ(x_i, x_j) is a matrix indexed by the different possible states of x_i and x_j, the scenes at nodes i and j
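A sum-product message-passing sketch on a tiny chain MRF, using the Φ/Ψ/M notation above; the potential tables and the three-node layout are illustrative assumptions.

```python
import numpy as np

# Chain MRF x1 - x2 - x3 with binary states; the message a neighbor sends
# to node 2 plays the role of the column vector M above.
psi = np.array([[2.0, 0.5], [0.5, 2.0]])   # Psi(x_i, x_j), symmetric
phi = [np.array([3.0, 1.0]),               # Phi(x_i, y_i) for the fixed y
       np.array([1.0, 1.0]),
       np.array([1.0, 4.0])]

m_1_to_2 = psi.T @ phi[0]    # sum over x1 of Psi(x1, x2) Phi(x1, y1)
m_3_to_2 = psi.T @ phi[2]    # sum over x3 of Psi(x3, x2) Phi(x3, y3)

belief = phi[1] * m_1_to_2 * m_3_to_2
belief /= belief.sum()       # marginal P(x2 | y); its mean gives the MMSE estimate
print(belief)
# For the MAP estimate, replace the sums inside the messages with max
# (max-product message passing).
```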

Page 51:

3. MRFs inference (3/3)

• No loop MAP: [probability chain decomposition]

• With loop: Turbo codes

Still use the message passing algorithm above for MRFs with loops

51

Page 52:

4. MRFs Representation

52

(1) PCA is used to find a set of lower-dimensional basis functions for the patches of image and scene pixels

(2) Model Φ(x_i, y_i), Ψ(x_i, x_j), and M_j^k (could be Gaussian mixture distributions)

(3) Continuous probabilities are hard to propagate, so we prefer a discrete representation!

During the scene estimation phase, each scene patch is represented by 10 to 20 close candidates, which are selected based on the evidence y_i at each node

[2]

Page 53:

5. MRFs Learning

• Method 1: [message passing]

• Method 2: [proper probability factorization-Overlap]

53

[Method 1]

Gaussian mixture: Ψ(x_i, x_j), Φ(x_i, y_i)

P(x_k^l | x_j^m) = P(x_k^l, x_j^m) / P(x_j^m) : a matrix
P(y_k | x_k^l) : a column vector indexed by "l"

[Method 2]

Use the overlap between neighboring scene patches to estimate Ψ(x_i, x_j)

d_{jk}^l : the vector of pixels of the l-th candidate for scene patch x_k which lie in the overlap region with patch j

[2]

Page 54:

6. Super-resolution (1/6)

• Laplacian pyramid

54

[2]

[7]

Page 55:

6. Super-resolution (2/6)

• Training: [message passing]

55

[2]

Page 56:

6. Super-resolution (3/6)

• Result 1: [message passing]

56

[2]

Page 57:

6. Super-resolution (4/6)

• Result 2:

57

[2]

Page 58:

6. Super-resolution (5/6)

• Result 3:

58

[2]

Page 59:

6. Super-resolution (6/6)

• Result 4:

59

[2]

Page 60:

7. Shading and reflection estimation (1/5)

• Shape: shading intensity

• Reflection: surface intensity

• Scene: two pixel arrays (reflection, shape)

• Rendering: based on the estimated scene

60

[2]

Page 61:

7. Shading and reflection estimation (2/5)

• Patch selection:

61

[2]

Page 62:

7. Shading and reflection estimation (3/5)

• Training: [overlap]

62

[2]

Page 63:

7. Shading and reflection estimation (4/5)

• Result 1:

63

[2]

Page 64:

7. Shading and reflection estimation (5/5)

• Result 2:

64

[2]

Page 65:

References

[1] Kevin Murphy, "An introduction to graphical models," 2001

[2] Freeman, "Learning Low-Level Vision," IJCV, 2000

[3] Chapter 16: Graphical Models, in "Introduction to Machine Learning," 2nd edition, Ethem Alpaydin

[4] D. Blei, A. Ng, and M. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, 3:993–1022, January 2003

[5] R. Fergus, P. Perona, and A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," in Proc. CVPR, Jun 2003

[6] L. Fei-Fei, R. Fergus, P. Perona, "A Bayesian approach to unsupervised learning of object categories," in Proc. Int. Conf. on Computer Vision, 2003, pp. 1134–1141

[7] Laplacian pyramid: http://sepwww.stanford.edu/~morgan/texturematch/paper_html/node3.html

[8] Bayesian network toolbox: http://code.google.com/p/bnt/

65

Page 66:

Thanks for listening

66