
Probabilistic Inference: Lecture 1

M. Pawan Kumar (pawan.kumar@ecp.fr)

Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/

About the Course

• 7 lectures + 1 exam

• Probabilistic Models – 1 lecture

• Energy Minimization – 4 lectures

• Computing Marginals – 2 lectures

• Related courses: Probabilistic Graphical Models (MVA), Structured Prediction

Instructor

• Assistant Professor (2012 – Present)

• Center for Visual Computing: 12 full-time faculty members, 2 associate faculty members

• Research interests: probabilistic models, machine learning, computer vision, medical image analysis

Students

• Third year at ECP

• Specializing in Machine Learning and Vision

• Prerequisites: probability theory, continuous optimization, discrete optimization

Outline

• Probabilistic Models

• Conversions

• Exponential Family

• Inference

Example (on board) !!

Outline

• Probabilistic Models: Markov Random Fields (MRF), Bayesian Networks, Factor Graphs

• Conversions

• Exponential Family

• Inference

MRF

[Figure: graph of unobserved random variables; connected variables are neighbors]

Edges define a neighborhood over the random variables

MRF

[Figure: 3×3 grid MRF over variables V1–V9]

Variable Va takes a value or label va from a discrete, finite set L = {l1, l2,…, lh}; V = v is called a labeling.

MRF

[Figure: 3×3 grid MRF]

MRF assumes the Markovian property for P(v)

MRF

[Figure: 3×3 grid MRF]

Va is conditionally independent of Vb given Va’s neighbors

Hammersley-Clifford Theorem

MRF

[Figure: 3×3 grid MRF]

Probability P(v) can be decomposed into clique potentials

e.g. potential ψ12(v1,v2) on edge (V1,V2) and potential ψ56(v5,v6) on edge (V5,V6)

MRF

[Figure: grid MRF in which each unobserved variable Va (V1–V9) is connected to its observed data da (d1–d9)]

Probability P(v) proportional to Π(a,b) ψab(va,vb)

Each variable is tied to its observed data by a potential, e.g. ψ1(v1,d1)

Probability P(d|v) proportional to Πa ψa(va,da)

MRF

[Figure: grid MRF with observed data]

Probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb)

Z is known as the partition function
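A minimal sketch of how this factorization is evaluated (everything below is invented for illustration: the potentials, the observed data, and the graph): brute force over a toy three-variable chain with binary labels.

```python
# A minimal sketch (not from the slides): brute-force evaluation of
#   P(v,d) = (1/Z) prod_a psi_a(v_a,d_a) * prod_(a,b) psi_ab(v_a,v_b)
# on a toy 3-variable chain with binary labels.
from itertools import product

labels = [0, 1]
d = [0.2, 0.8, 0.5]                      # hypothetical observed data
edges = [(0, 1), (1, 2)]                 # chain neighborhood

def psi_unary(va, da):                   # hypothetical unary potential
    return 1.0 if va == int(da > 0.5) else 0.5

def psi_pair(va, vb):                    # hypothetical smoothness potential
    return 2.0 if va == vb else 1.0

def unnormalized(v):
    p = 1.0
    for a, va in enumerate(v):
        p *= psi_unary(va, d[a])
    for a, b in edges:
        p *= psi_pair(v[a], v[b])
    return p

# The partition function Z sums the unnormalized score over all labelings.
Z = sum(unnormalized(v) for v in product(labels, repeat=3))
P = {v: unnormalized(v) / Z for v in product(labels, repeat=3)}
print(Z, max(P, key=P.get))              # Z and the most probable labeling
```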

MRF

[Figure: grid MRF with a clique of more than two variables highlighted]

High-order potential ψ4578(v4,v5,v7,v8), defined over the clique {V4, V5, V7, V8}

Pairwise MRF

[Figure: grid MRF with observed data]

Unary potential ψa(va,da), e.g. ψ1(v1,d1); pairwise potential ψab(va,vb), e.g. ψ56(v5,v6)

Probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb)

Z is known as the partition function

MRF

[Figure: grid MRF illustrating separation of variable sets A and B by a set C]

A is conditionally independent of B given C if there is no path from A to B once C is removed
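This separation test is purely graph-theoretic. A sketch (graph and sets are hypothetical) that checks it by removing C and running a breadth-first search:

```python
# Sketch of the separation test: remove C, then check with BFS whether
# any path still connects A to B.
from collections import deque

def separated(adj, A, B, C):
    """True iff no path connects A to B once the nodes in C are removed."""
    blocked = set(C)
    frontier = deque(a for a in A if a not in blocked)
    seen = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in B:
            return False                 # found an A-to-B path avoiding C
        for w in adj[u]:
            if w not in blocked and w not in seen:
                seen.add(w)
                frontier.append(w)
    return True

# The 3x3 grid from the slides: V1..V9, 4-connected.
adj = {a: set() for a in range(1, 10)}
for a, b in [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8), (8, 9),
             (1, 4), (4, 7), (2, 5), (5, 8), (3, 6), (6, 9)]:
    adj[a].add(b); adj[b].add(a)

print(separated(adj, {1}, {9}, {5}))     # False: 1-2-3-6-9 avoids V5
print(separated(adj, {1}, {9}, {2, 4}))  # True: removing V2, V4 isolates V1
```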

Conditional Random Fields (CRF)

[Figure: grid CRF with observed data]

CRF assumes the Markovian property for P(v|d)

Hammersley-Clifford Theorem

CRF

[Figure: grid CRF with observed data]

Probability P(v|d) proportional to Πa ψa(va;d) Π(a,b) ψab(va,vb;d)

Clique potentials that depend on the data

CRF

[Figure: grid CRF with observed data]

Probability P(v|d) = (1/Z(d)) Πa ψa(va;d) Π(a,b) ψab(va,vb;d)

Z(d) is known as the partition function; for a CRF it depends on the data d

MRF and CRF

[Figure: 3×3 grid over V1–V9]

Probability P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb)

Outline

• Probabilistic Models: Markov Random Fields (MRF), Bayesian Networks, Factor Graphs

• Conversions

• Exponential Family

• Inference

Bayesian Networks

[Figure: DAG over V1–V8]

Directed Acyclic Graph (DAG) – no directed loops

Ignoring the directionality of its edges, a DAG can still contain loops

Bayesian Networks

[Figure: DAG over V1–V8]

A Bayesian network concisely represents the probability P(v)

Bayesian Networks

[Figure: DAG over V1–V8]

Probability P(v) = Πa P(va|Parents(va))

P(v) = P(v1) P(v2|v1) P(v3|v1) P(v4|v2) P(v5|v2,v3) P(v6|v3) P(v7|v4,v5) P(v8|v5,v6)
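A sketch of this factorization on a small fragment of the DAG above, V1 → V2 and V1 → V3 (the conditional probability tables are invented):

```python
# Sketch of P(v) = prod_a P(v_a | Parents(v_a)) on a 3-node fragment.
parents = {1: (), 2: (1,), 3: (1,)}

# cpt[a][parent_values] is the distribution over v_a for binary labels 0/1.
cpt = {
    1: {(): [0.6, 0.4]},
    2: {(0,): [0.9, 0.1], (1,): [0.3, 0.7]},
    3: {(0,): [0.5, 0.5], (1,): [0.2, 0.8]},
}

def joint(v):                            # v maps node -> label
    p = 1.0
    for a in parents:
        pa = tuple(v[b] for b in parents[a])
        p *= cpt[a][pa][v[a]]
    return p

# P(v1=1) P(v2=0 | v1=1) P(v3=1 | v1=1) = 0.4 * 0.3 * 0.8 = 0.096
print(joint({1: 1, 2: 0, 3: 1}))
```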

Bayesian Networks

[Figure courtesy Kevin Murphy]

Bayesian Networks

[Figure: DAG over V1–V8]

Va is conditionally independent of its ancestors given its parents

Bayesian Networks

Conditional independence of A and B given C

[Figure courtesy Kevin Murphy]

Outline

• Probabilistic Models: Markov Random Fields (MRF), Bayesian Networks, Factor Graphs

• Conversions

• Exponential Family

• Inference

Factor Graphs

[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]

Two types of nodes: variable nodes and factor nodes

Bipartite graph between the two types of nodes

Factor Graphs

[Figure: factor graph with factor a highlighted]

A factor graph concisely represents the probability P(v). Each factor defines a potential over the variables connected to it, e.g. ψa(v1,v2) for factor a.

Writing {v}α for the set of variables connected to factor α, each potential is a function of just those variables: ψa({v}a) = ψa(v1,v2), ψb({v}b) = ψb(v2,v3), and so on.

Probability P(v) = (1/Z) Πα ψα({v}α)

Z is known as the partition function
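A sketch of the factor-graph representation (the scopes mirror factors a and b above; the numbers are invented):

```python
# Each factor stores its scope {v}_alpha and a potential over just
# those variables; P(v) multiplies them and normalizes.
from itertools import product

n, labels = 3, [0, 1]
factors = [
    ((0, 1), lambda v1, v2: 2.0 if v1 == v2 else 1.0),   # psi_a({v}_a)
    ((1, 2), lambda v2, v3: 3.0 if v2 != v3 else 1.0),   # psi_b({v}_b)
]

def unnormalized(v):
    p = 1.0
    for scope, psi in factors:
        p *= psi(*(v[a] for a in scope))
    return p

Z = sum(unnormalized(v) for v in product(labels, repeat=n))
print(Z, unnormalized((0, 0, 1)) / Z)    # Z = 24, P((0,0,1)) = 0.25
```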

Outline

• Probabilistic Models

• Conversions

• Exponential Family

• Inference

MRF to Factor Graphs

Bayesian Networks to Factor Graphs

Factor Graphs to MRF
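The conversions themselves are not written out in the transcript (presumably worked on the board). As a rough illustration of the first one, each clique potential of a pairwise MRF becomes one factor node whose scope is that clique; a sketch under that reading:

```python
# Rough illustration of "MRF to Factor Graphs": one factor per unary
# potential and one per pairwise potential.
def mrf_to_factor_graph(unary, pairwise):
    """unary: {a: psi_a}, pairwise: {(a,b): psi_ab} -> list of (scope, psi)."""
    factors = [((a,), psi) for a, psi in unary.items()]
    factors += [((a, b), psi) for (a, b), psi in pairwise.items()]
    return factors

unary = {a: (lambda va: 1.0) for a in range(3)}
pairwise = {(0, 1): lambda va, vb: 2.0 if va == vb else 1.0,
            (1, 2): lambda va, vb: 2.0 if va == vb else 1.0}
print([scope for scope, _ in mrf_to_factor_graph(unary, pairwise)])
# [(0,), (1,), (2,), (0, 1), (1, 2)]
```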

Outline

• Probabilistic Models

• Conversions

• Exponential Family

• Inference

Motivation

Random Variable V, label set L = {l1, l2,…, lh}

Samples V1, V2, …, Vm that are i.i.d.

Functions ϕα: L → Reals, where α indexes a set of functions

Empirical expectations: μα = (Σi ϕα(Vi))/m

Expectation wrt distribution P: EP[ϕα(V)] = Σi ϕα(li)P(li)

Given empirical expectations, find compatible distribution

This is an underdetermined problem: many distributions can match the given expectations.

Maximum Entropy Principle

max Entropy of the distribution

s.t. Distribution is compatible

Concretely:

max -Σi P(li) log(P(li))

s.t. Σi ϕα(li) P(li) = μα for all α

Σi P(li) = 1

The maximum-entropy solution has the form P(v) proportional to exp(-Σα θαϕα(v))
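The step from the constrained program to this exponential form is the standard Lagrangian argument, not spelled out on the slide:

```latex
% Lagrangian of the maximum entropy program; the multipliers
% (-theta_alpha for the moment constraints, lambda for normalization)
% become the parameters of the exponential family.
\mathcal{L} = -\sum_i P(l_i)\log P(l_i)
  - \sum_\alpha \theta_\alpha \Big( \sum_i \phi_\alpha(l_i) P(l_i) - \mu_\alpha \Big)
  + \lambda \Big( \sum_i P(l_i) - 1 \Big)

\frac{\partial \mathcal{L}}{\partial P(l_i)}
  = -\log P(l_i) - 1 - \sum_\alpha \theta_\alpha \phi_\alpha(l_i) + \lambda = 0
\;\Rightarrow\;
P(l_i) = e^{\lambda - 1} \exp\Big(-\sum_\alpha \theta_\alpha \phi_\alpha(l_i)\Big)
\propto \exp\Big(-\sum_\alpha \theta_\alpha \phi_\alpha(l_i)\Big)
```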

Exponential Family

Random Variables V = {V1, V2, …, Vn}, label set L = {l1, l2,…, lh}

Labeling V = v, with va ∈ L for all a ∈ {1, 2,…, n}

Functions Φα: Lⁿ → Reals, where α indexes a set of functions

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Φα(v): sufficient statistics; θα: parameters; A(θ): normalization constant

Minimal Representation

P(v) = exp{-Σα θαΦα(v) - A(θ)}

There is no non-zero c such that Σα cαΦα(v) is constant for all v

Ising Model

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variables V = {V1, V2, …, Vn}, label set L = {-1, +1}

Neighborhood over variables specified by edges E

Sufficient statistics and parameters:

• va, with parameter θa, for all Va ∈ V
• va vb, with parameter θab, for all (Va,Vb) ∈ E

Substituting these sufficient statistics:

P(v) = exp{-Σa θava - Σ(a,b) θabvavb - A(θ)}
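A sketch of this minimal-representation Ising model on two variables joined by one edge (the θ values are invented):

```python
# Toy Ising model in the minimal representation, labels in {-1, +1}.
from itertools import product
from math import exp, log

n, edges = 2, [(0, 1)]
theta_a = [0.5, -0.5]                    # theta_a for each V_a
theta_ab = {(0, 1): -1.0}                # theta_ab for each edge

def neg_log_unnormalized(v):             # sum_alpha theta_alpha Phi_alpha(v)
    s = sum(theta_a[a] * v[a] for a in range(n))
    s += sum(t * v[a] * v[b] for (a, b), t in theta_ab.items())
    return s

labelings = list(product([-1, +1], repeat=n))
A = log(sum(exp(-neg_log_unnormalized(v)) for v in labelings))  # A(theta)
for v in labelings:
    print(v, exp(-neg_log_unnormalized(v) - A))   # P(v); the four sum to 1
```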

Interactive Binary Segmentation

Foreground histogram of RGB values: FG
Background histogram of RGB values: BG

‘+1’ indicates foreground and ‘-1’ indicates background

A pixel whose value matches FG better than BG is more likely to be foreground, and vice versa:

θa proportional to -log(FG(da)) + log(BG(da))

Neighboring pixels with similar values are more likely to take the same label; dissimilar neighbors are less likely to:

θab proportional to -exp(-(da-db)²)
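A sketch of both parameter choices; FG and BG are stand-in histograms over a quantized pixel value (a real system would build them over RGB from user scribbles):

```python
# Hypothetical segmentation parameters for the {-1, +1} Ising model.
from math import exp, log

FG = {0: 0.1, 1: 0.3, 2: 0.6}            # hypothetical foreground histogram
BG = {0: 0.7, 1: 0.2, 2: 0.1}            # hypothetical background histogram

def theta_unary(da):
    # Negative when FG explains d_a better, which favours v_a = +1
    # (foreground) under P proportional to exp(-theta_a v_a).
    return -log(FG[da]) + log(BG[da])

def theta_pairwise(da, db):
    # Strongly negative for similar pixels, which rewards v_a = v_b
    # under P proportional to exp(-theta_ab v_a v_b).
    return -exp(-(da - db) ** 2)

print(theta_unary(2), theta_unary(0))    # negative (FG-like), positive (BG-like)
print(theta_pairwise(1, 1), theta_pairwise(0, 2))
```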

Rest of lecture 1 ….

Exponential Family

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Φα(v): sufficient statistics; θα: parameters; A(θ): log-partition function

Random Variables V = {V1, V2, …, Vn}; labeling V = v, with each Va taking a value or label va ∈ L = {l1, l2,…, lh}

Overcomplete Representation

P(v) = exp{-Σα θαΦα(v) - A(θ)}

There exists a non-zero c such that Σα cαΦα(v) is constant for all v

Ising Model (overcomplete representation)

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variables V = {V1, V2, …, Vn}, label set L = {0, 1}

Neighborhood over variables specified by edges E

Sufficient statistics and parameters:

• Ia;i(va), with parameter θa;i, for all Va ∈ V, li ∈ L
• Iab;ik(va,vb), with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L

Ia;i(va): indicator for va = li; Iab;ik(va,vb): indicator for va = li and vb = lk

Substituting these sufficient statistics:

P(v) = exp{-Σa Σi θa;iIa;i(va) - Σ(a,b) Σi,k θab;ikIab;ik(va,vb) - A(θ)}
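The indicator sufficient statistics, evaluated for one labeling of a toy two-variable, two-label model; note that the indicators for each variable (and each edge) always sum to one, which is exactly why this representation is overcomplete:

```python
# Indicator sufficient statistics of the overcomplete representation.
def I_unary(va, i):                      # I_{a;i}(v_a)
    return 1.0 if va == i else 0.0

def I_pair(va, vb, i, k):                # I_{ab;ik}(v_a, v_b)
    return I_unary(va, i) * I_unary(vb, k)

va, vb = 1, 0
print([I_unary(va, i) for i in (0, 1)])                      # [0.0, 1.0]
print([[I_pair(va, vb, i, k) for k in (0, 1)] for i in (0, 1)])
# Exactly one unary indicator (and one pairwise indicator) fires, so
# summing them gives the constant 1: the non-zero c of the definition.
```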

Interactive Binary Segmentation

Foreground histogram of RGB values: FG
Background histogram of RGB values: BG

‘1’ indicates foreground and ‘0’ indicates background

A pixel whose value matches FG better than BG is more likely to be foreground, and vice versa:

θa;0 proportional to -log(BG(da))
θa;1 proportional to -log(FG(da))

Neighboring pixels are more likely to take the same label, with a weaker preference across strong image edges:

θab;ik proportional to exp(-(da-db)²) if i ≠ k
θab;ik = 0 if i = k

Metric Labeling

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Random Variables V = {V1, V2, …, Vn}, label set L = {0, 1, …, h-1}

Neighborhood over variables specified by edges E

Sufficient statistics and parameters:

• Ia;i(va), with parameter θa;i, for all Va ∈ V, li ∈ L
• Iab;ik(va,vb), with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L

θab;ik is a metric distance function over the labels

Substituting:

P(v) = exp{-Σa Σi θa;iIa;i(va) - Σ(a,b) Σi,k θab;ikIab;ik(va,vb) - A(θ)}
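The slide does not fix a particular metric; one common choice for the ordered labels {0, …, h-1} is the truncated linear distance, sketched here as an assumption:

```python
# A common (assumed, not slide-specified) metric over ordered labels.
def truncated_linear(i, k, truncation=2):
    return min(abs(i - k), truncation)

# Symmetric, zero iff i == k, and satisfies the triangle inequality, so
# theta_ab;ik = truncated_linear(i, k) is a valid metric over the labels.
print([[truncated_linear(i, k) for k in range(4)] for i in range(4)])
```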

Stereo Correspondence

[Figure: disparity map]

The label set L is the set of disparities. Pixel (xa, ya) in the left image corresponds to pixel (xa + va, ya) in the right image.

θa;i is proportional to the difference in RGB values between pixel (xa, ya) in the left image and pixel (xa + i, ya) in the right image.

θab;ik = wab d(i, k), where d is a metric over the disparities and wab is proportional to exp(-(da-db)²)
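A sketch with invented scalar intensities standing in for RGB values; the right row is roughly the left row shifted by one pixel, so disparity 1 should score best:

```python
# Hypothetical stereo matching costs for one image row.
from math import exp

left = [0.9, 0.8, 0.2, 0.1]              # hypothetical left-image row
right = [0.2, 0.9, 0.8, 0.2]             # hypothetical right-image row

def theta_unary(xa, i):
    """Matching cost for disparity i at left pixel xa (border clamped)."""
    return abs(left[xa] - right[min(xa + i, len(right) - 1)])

def w(da, db):                           # w_ab for the pairwise term
    return exp(-(da - db) ** 2)

print(theta_unary(1, 0), theta_unary(1, 1))      # disparity 1 matches best
print(w(left[0], left[1]), w(left[1], left[2]))  # weaker weight across the edge
```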

Pairwise MRF

Random Variable V = {V1, V2, …,Vn}

Neighborhood over variables specified by edges E

Label set L = {l1, l2, …, lh}

P(v) = exp{-Σα θαΦα(v) - A(θ)}

Sufficient statistics and parameters:

• Ia;i(va), with parameter θa;i, for all Va ∈ V, li ∈ L
• Iab;ik(va,vb), with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L

P(v) = exp{-Σa Σi θa;iIa;i(va) - Σ(a,b) Σi,k θab;ikIab;ik(va,vb) - A(θ)}

A(θ) = log Z

Equivalently, P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb), where ψa(li) = exp(-θa;i) and ψab(li,lk) = exp(-θab;ik)

Parameters θ are sometimes also referred to as potentials

A labeling can be written as a function f : {1, 2, …, n} → {1, 2, …, h}, where variable Va takes the label lf(a).

In this notation:

P(f) = exp{-Σa θa;f(a) - Σ(a,b) θab;f(a)f(b) - A(θ)}

Energy Q(f) = Σa θa;f(a) + Σ(a,b) θab;f(a)f(b)

so that

P(f) = exp{-Q(f) - A(θ)}
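A sketch of evaluating the energy of a labeling on a three-variable chain (all θ values invented):

```python
# Q(f) = sum_a theta_a;f(a) + sum_(a,b) theta_ab;f(a)f(b) on a toy chain.
n, edges = 3, [(0, 1), (1, 2)]
theta_unary = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]          # theta_a;i
theta_pair = {e: [[0.0, 0.5], [0.5, 0.0]] for e in edges}   # theta_ab;ik

def Q(f):
    energy = sum(theta_unary[a][f[a]] for a in range(n))
    energy += sum(theta_pair[(a, b)][f[a]][f[b]] for a, b in edges)
    return energy

print(Q([0, 1, 0]), Q([0, 1, 1]))        # 1.0 and 1.5
```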

Outline

• Probabilistic Models

• Conversions

• Exponential Family

• Inference

Inference

Maximum a Posteriori (MAP) Estimation:

maxv P(v) = maxv exp{-Σa Σi θa;iIa;i(va) - Σ(a,b) Σi,k θab;ikIab;ik(va,vb) - A(θ)}

Energy Minimization (equivalent to MAP estimation):

minf Q(f) = minf [Σa θa;f(a) + Σ(a,b) θab;f(a)f(b)]

Computing Marginals:

P(va = li) = Σv P(v) δ(va = li)

P(va = li, vb = lk) = Σv P(v) δ(va = li) δ(vb = lk)
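Both problems can be solved by brute force on toy models (the enumeration is exponential in n, so this is for illustration only; the remaining lectures cover efficient algorithms). A self-contained sketch with invented parameters:

```python
# Brute-force MAP estimation and marginal computation for a toy model.
from itertools import product
from math import exp

n, h, edges = 3, 2, [(0, 1), (1, 2)]
th_a = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]                 # theta_a;i
th_ab = {e: [[0.0, 0.5], [0.5, 0.0]] for e in edges}        # theta_ab;ik

def Q(f):                                # energy of labeling f
    return (sum(th_a[a][f[a]] for a in range(n))
            + sum(th_ab[(a, b)][f[a]][f[b]] for a, b in edges))

labelings = list(product(range(h), repeat=n))
Z = sum(exp(-Q(f)) for f in labelings)

f_map = min(labelings, key=Q)            # MAP estimation = energy minimization

def marginal(a, i):                      # P(v_a = l_i) = sum_v P(v) delta(v_a = l_i)
    return sum(exp(-Q(f)) for f in labelings if f[a] == i) / Z

print(f_map, marginal(0, 0))
```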

Next Lecture …

Energy minimization for tree-structured pairwise MRF
