Computer Vision: Models, Learning and Inference–
Markov Random Fields, Part 1
Oren Freifeld and Ron Shapira-Weber
Computer Science, Ben-Gurion University
March 11, 2019
www.cs.bgu.ac.il/~cv192/ MRFs, Part 1 (ver. 1.01) Mar 11, 2019 1 / 36
Bayesian Image Restoration with a Markov Random Field
From left to right:
x: the true binary image.
y: its degraded version (20% random flips – this defines p(y|x)).
arg max_x p(x|y) = arg max_x p(y|x)p(x), where p(x) was taken to be a particular MRF prior called the Ising model.
A sample from p(x|y).
In about a week from now, you will know how to do it (in terms of both the math and the coding involved).
Figure from Winkler’s book on MRFs.
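The full treatment comes later in the course, but a minimal sketch of the kind of computation involved can be given now. The sketch below uses ICM (Iterated Conditional Modes), a simple greedy ascent on p(x|y), rather than the methods developed later; the toy image, the coupling strength beta, and the flip probability eps are all hypothetical choices, not the lecture's.

```python
import math
import random

random.seed(0)
H, W = 16, 16
beta, eps = 1.0, 0.2      # hypothetical Ising coupling strength; flip probability

# Toy "true" binary image in {-1,+1}: left half -1, right half +1 (hypothetical).
x_true = [[-1 if j < W // 2 else 1 for j in range(W)] for i in range(H)]
# Degrade it: flip each pixel independently with probability eps (this is p(y|x)).
y = [[-v if random.random() < eps else v for v in row] for row in x_true]
noisy_errors = sum(y[i][j] != x_true[i][j] for i in range(H) for j in range(W))

# ICM: repeatedly set each pixel to the value maximizing its local conditional,
# which depends on its 4 neighbors (Ising prior) and on y (likelihood).
h = 0.5 * math.log((1 - eps) / eps)   # data-term weight implied by the flip model
x = [row[:] for row in y]             # initialize at the noisy observation
for _ in range(10):
    for i in range(H):
        for j in range(W):
            nb = sum(x[a][b]
                     for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= a < H and 0 <= b < W)
            x[i][j] = 1 if beta * nb + h * y[i][j] >= 0 else -1

errors = sum(x[i][j] != x_true[i][j] for i in range(H) for j in range(W))
```

On this toy image ICM removes most of the isolated flips, since the Ising prior penalizes disagreeing neighbors; it only finds a local maximum of p(x|y), which is why the lecture will develop better tools.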
1 Few Words on Probabilistic Graphical Models
2 Markov Chains
3 Markov Random Fields
Few Words on Probabilistic Graphical Models
Probabilistic Graphical Models (PGMs)
PGMs come in two main flavors:
Bayesian Networks – directed graphs
Markov Random Fields (MRFs) – undirected graphs
In either case, a PGM encodes (and visualizes) the dependency structure of a joint pdf/pmf
Both types generalize Markov chains
PGMs and Neural Networks are different beasts – but there are relations between them.
pdf: probability density function
pmf: probability mass function
Markov Chains
Markov Chain as a Directed Linear Graph
A Markov Chain, (X1, X2, . . . , Xn), may be graphically represented as
x1 → x2 → · · · → xn−1 → xn
This highlights the fact that
p(x) = p(x1) ∏_{i=2}^{n} p(xi | xi−1)
Markov Chains
Markov Chain as an Undirected Linear Graph
A Markov Chain, (X1, X2, . . . , Xn), may also be graphically represented as
x1 — x2 — · · · — xn−1 — xn
This highlights the fact that
p(x1:n) = ∏_{i=1}^{n−1} φ_{i,i+1}(xi, xi+1)   (in sloppier notation: = ∏_{i=1}^{n−1} φ(xi, xi+1))

where:

φ_{1,2}(x1, x2) = p(x1)p(x2|x1)
φ_{i,i+1}(xi, xi+1) = p(xi+1|xi) ∀i ∈ {2, . . . , n − 1}
Markov Chains
Factorization Simplifies Computations
For now, forget about probability, and consider the following:
x, y, z are binary variables.
f : R³ → R≥0 factorizes as f(x, y, z) = φx,y(x, y)φy,z(y, z) for some two nonnegative functions, φx,y : R² → R≥0 and φy,z : R² → R≥0.
Want:
max_{x,y,z} f(x, y, z)   (1)
Markov Chains
Factorization Simplifies Computations
Brute force requires 2³ computations of f(x, y, z). However, we can do better by exploiting the factorization:
max_{x,y,z} f(x, y, z) = max_{x,y,z} φx,y(x, y) φy,z(y, z)
= max_{x,y} φx,y(x, y) max_z φy,z(y, z)    [define ψy(y) := max_z φy,z(y, z)]
= max_{x,y} φx,y(x, y) ψy(y)               [define ψx,y(x, y) := φx,y(x, y) ψy(y)]
= max_{x,y} ψx,y(x, y)
Markov Chains
Factorization Simplifies Computations
max_{x,y} ψx,y(x, y), where
ψx,y(x, y) := φx,y(x, y) ψy(y)
ψy(y) := max_z φy,z(y, z)
Now:
ψy(0) = func(φy,z(0, 0), φy,z(0, 1)) (2 evaluations of φy,z)
ψy(1) = func(φy,z(1, 0), φy,z(1, 1)) (2 evaluations of φy,z)
ψx,y(0, 0) = func(φx,y(0, 0), ψy(0)) (1 evaluation of φx,y)
ψx,y(0, 1) = func(φx,y(0, 1), ψy(1)) (1 evaluation of φx,y)
ψx,y(1, 0) = func(φx,y(1, 0), ψy(0)) (1 evaluation of φx,y)
ψx,y(1, 1) = func(φx,y(1, 1), ψy(1)) (1 evaluation of φx,y)
The solution = max {ψx,y(0, 0), ψx,y(0, 1), ψx,y(1, 0), ψx,y(1, 1)}.
Again 2³ evaluations, but of simpler functions.
There is some overhead (e.g., memory, bookkeeping)
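The bookkeeping above can be checked in a few lines; the factors below are hypothetical stand-ins for φx,y and φy,z:

```python
import itertools

# Hypothetical nonnegative factors (any such functions would do).
def phi_xy(x, y): return 1.0 + 2.0 * (x == y)
def phi_yz(y, z): return 0.5 + 1.5 * y * z

def f(x, y, z): return phi_xy(x, y) * phi_yz(y, z)

# Brute force: 2^3 evaluations of the full product f.
brute = max(f(x, y, z) for x, y, z in itertools.product((0, 1), repeat=3))

# Factorized: first eliminate z (computing psi_y), then maximize over (x, y).
psi_y = {y: max(phi_yz(y, z) for z in (0, 1)) for y in (0, 1)}
factored = max(phi_xy(x, y) * psi_y[y]
               for x, y in itertools.product((0, 1), repeat=2))

assert brute == factored   # same optimum, computed from simpler pieces
```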
Markov Chains
Factorization Simplifies Computations
More generally:
If the 3 variables, instead of being binary, take values in {0, 1, . . . , s − 1}, then brute force requires s³ evaluations of f while exploiting the factorization leads to 2s² evaluations of its factors.
If x = (x1, . . . , xn) where each xi takes values in {0, 1, . . . , s − 1}, and we want max_x f(x) where

f(x) = ∏_{i=1}^{n−1} φ_{i,i+1}(xi, xi+1)   (2)

then brute force requires s^n evaluations of f while exploiting the factorization leads to (n − 1)s² evaluations of its factors. The difference can be huge, e.g.: s = 10 and n = 100 ⇒ s^n = 10^100 while (n − 1)s² = 9900.
Obviously: more overhead due to memory and bookkeeping.
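The general chain case is the same elimination done n − 1 times. A sketch with randomly chosen positive factors standing in for the φ_{i,i+1} (n and s kept small so the brute-force check stays feasible):

```python
import itertools
import random

random.seed(0)
n, s = 6, 4  # small enough that brute force is still feasible for checking

# Hypothetical positive factors phi_{i,i+1}(x_i, x_{i+1}).
phi = [[[random.random() + 0.1 for _ in range(s)] for _ in range(s)]
       for _ in range(n - 1)]

def f(x):
    p = 1.0
    for i in range(n - 1):
        p *= phi[i][x[i]][x[i + 1]]
    return p

# Brute force: s^n evaluations of f.
brute = max(f(x) for x in itertools.product(range(s), repeat=n))

# Max-product dynamic programming: (n-1)*s^2 factor evaluations.
# m[v] = max over x_{i+1:n} of the product of factors from i onward, given x_i = v.
m = [1.0] * s
for i in reversed(range(n - 1)):
    m = [max(phi[i][v][w] * m[w] for w in range(s)) for v in range(s)]
dp = max(m)

assert abs(brute - dp) < 1e-12
```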
Markov Chains
Factorization Simplifies Computations
Similar results hold if f(x) = ∏_{i=1}^{n−1} φ_{i,i+1}(xi, xi+1) and we want ∑_x f(x); this is useful, e.g., if we want to create a normalized version of f, i.e., f(x)/∑_x f(x).

A bit less trivial: as we will see, similar results hold if f(x) = ∏_{i=1}^{n−1} φ_{i,i+1}(xi, xi+1) is a pmf and we want to sample from f:

x ∼ f(x)   (3)
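The sum and sampling variants can be sketched the same way: backward messages give ∑_x f(x) in (n − 1)s² work, and reusing them yields exact ancestral sampling from f/Z. The factors are again hypothetical random ones; the lecture will develop this properly later.

```python
import itertools
import random

random.seed(1)
n, s = 5, 3

# Hypothetical positive factors phi_{i,i+1}.
phi = [[[random.random() + 0.1 for _ in range(s)] for _ in range(s)]
       for _ in range(n - 1)]

def f(x):
    p = 1.0
    for i in range(n - 1):
        p *= phi[i][x[i]][x[i + 1]]
    return p

# Sum-product: Z = sum_x f(x) with (n-1)*s^2 work instead of s^n.
b = [1.0] * s                      # backward message: sum over the tail variables
msgs = [b]
for i in reversed(range(n - 1)):
    b = [sum(phi[i][v][w] * b[w] for w in range(s)) for v in range(s)]
    msgs.append(b)
msgs.reverse()                     # msgs[i][v] sums the factors from i on, given x_i = v
Z = sum(msgs[0])
assert abs(Z - sum(f(x) for x in itertools.product(range(s), repeat=n))) < 1e-9

# Ancestral sampling from f/Z: x_1 proportional to msgs[0], then x_{i+1} | x_i.
def sample():
    x = [0] * n
    x[0] = random.choices(range(s), weights=msgs[0])[0]
    for i in range(n - 1):
        w = [phi[i][x[i]][v] * msgs[i + 1][v] for v in range(s)]
        x[i + 1] = random.choices(range(s), weights=w)[0]
    return tuple(x)
```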
Markov Chains
Another Characterization of Markov Chain
Recall: a sequence, (X1, . . . , Xn), is called an MC if p(xi|x1:(i−1)) = p(xi|xi−1) for every i ∈ {2, . . . , n}. This property is referred to as the “1-sided MC” property.
If a sequence, (X1, . . . , Xn), satisfies
p(xi|x1:(i−1), x(i+1):n) = p(xi|xi−1, xi+1) ∀i ∈ {2, . . . , n − 1}
p(x1|x2:n) = p(x1|x2)
p(xn|x1:(n−1)) = p(xn|xn−1)
it is said to satisfy the “2-sided MC” property. In words: given all the others, each RV depends only on its neighbors.
Markov Chains
Fact
1-sided MC ⇐⇒ 2-sided MC
Corollary
By symmetry, it follows that if (X1, . . . , Xn) is an MC, then we also have
i ∈ {1, . . . , n − 1} ⇒ p(xi|x(i+1):n) = p(xi|xi+1)
i ∈ {1, . . . , n − 1} ⇒ p(xi:n) = p(xn) ∏_{j=i}^{n−1} p(xj|xj+1)
Particularly,
p(x) = p(xn) ∏_{i=1}^{n−1} p(xi|xi+1)
where x = (x1, . . . ,xn)
Markov Random Fields
Markov Random Fields
One of the two main types of Probabilistic Graphical Models
Generalize Markov Chains to general undirected graphs
Many computer-vision and machine-learning applications
Markov Random Fields
Markov Random Fields
Informal Definition
Associate an RV with each vertex of an undirected graph, G, and say that each variable, given all the others, depends only on its neighbors (according to the graph). In that case, we say that p, the joint pdf (or pmf) of all these RVs, is an MRF (w.r.t. G).
Example
Markov Random Fields
Cliques
Definition
A clique (in the graph) is a set of vertices that are fully connected. By convention, each singleton is a clique.
Example
Notation
Let C denote the set of all cliques in the graph. If c ∈ C, then xc := {xs : s ∈ c}
Markov Random Fields
The structure of MRFs leads to computational advantages in calculating probabilities on a graph. Examples of (graphs of) MRFs:
graphs defined over pixels (regular 2D lattice)
speech recognition
Markov Random Fields
Notation
S: collection of indices
Xa: a ∈ S, an RV
R: the range of Xa, called the “state space”. Usually |R| < ∞ (but we will see cases where this is not true)
XA, for A ⊂ S: the set {Xs : s ∈ A}
BXA := XA\B = {Xs : s ∈ A \ B} (the left superscript B denotes removing B)
If A = S, we can also just write BX = XS\B = {Xs : s ∈ S \ B}
p: the pmf (or pdf) of XS
xs: a generic value for Xs, s ∈ S.
Markov Random Fields
Notation
G = (S, η), where η is a “neighborhood system”; η = {ηs}s∈S where
ηs ⊂ S
s ∉ ηs
s ∈ ηt ⇐⇒ t ∈ ηs
Example
S = {1, 2, 3, 4, 5, 6, 7, 8}
η1 = {2, 3}, η2 = {1, 3, 4}, η3 = {1, 2}, η4 = {2, 5, 6}, η5 = {4}, η6 = {4, 7, 8}, η7 = {6, 8}, η8 = {6, 7}
Markov Random Fields
Cliques and Neighborhoods
Example
Recall C is the set of cliques in G; i.e., c ∈ C ⇒ c ⊂ S such that for all s, t ∈ c with s ≠ t, we have s ∈ ηt.
Markov Random Fields
Definition (Markov Random Field)
p is an MRF w.r.t. G if p(xs|sx) = p(xs|xηs) ∀s ∈ S (provided the LHS exists)
Remark: some authors also require p(x) > 0, ∀x
Markov Random Fields
Definition (Gibbs distribution)
p is Gibbs w.r.t. G if p(x) > 0 ∀x and
p(x) = ∏_{c∈C} Fc(xc)
for some set of functions {Fc}c∈C.
Markov Random Fields
Theorem (Hammersley & Clifford)
If p(x) > 0 ∀x then:
p MRF w.r.t. G ⇐⇒ p Gibbs w.r.t. G
AKA the fundamental theorem of random fields.
Markov Random Fields
Hammersley-Clifford and Markov Chains
Consider an MC, X = (X1, X2, . . . , Xn); i.e., p(x) factorizes as
p(x) = p(x1) ∏_{i=1}^{n−1} p(xi+1|xi)
Assume also p(x) > 0 ∀x.
HC ⇒ p(x) is an MRF w.r.t. G (which here is a linear undirected graph)
⇒ p(xi|ix) = p(xi|xi−1, xi+1) ∀i ∈ {2, . . . , n − 1}.
We just showed, using HC, that the 1-sided Markov property implies the 2-sided Markov property.
In fact, we don’t need HC for this, as we can prove it directly. But first, before we do, we need the following fact.
Markov Random Fields
Fact
If p(xs|xA, xB) = g(xs, xA) for some s ∈ S, A ⊂ S, B ⊂ S, with A ∩ B = ∅, s ∉ A ∪ B, and some function g, then g(xs, xA) = p(xs|xA). Thus, p(xs|xA, xB) = p(xs|xA).
Proof.
p(xs|xA) = ∑_{xB} p(xs, xB|xA) = ∑_{xB} p(xs|xA, xB) p(xB|xA)
(by the assumption)
= ∑_{xB} g(xs, xA) p(xB|xA) = g(xs, xA) ∑_{xB} p(xB|xA) = g(xs, xA).
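The fact can also be verified numerically on a small example: a 3-variable chain gives a conditional of exactly the assumed form, with s = 3 (states), A = {2}, B = {1}. All distributions below are random and purely hypothetical.

```python
import itertools
import random

random.seed(2)
s = 3

def norm(w):
    t = sum(w)
    return [v / t for v in w]

# Hypothetical chain p(x1) p(x2|x1) p(x3|x2): here p(x3|x1,x2) depends only on
# (x3, x2), i.e., it has the form g(x3, x2).
p1 = norm([random.random() for _ in range(s)])
p21 = [norm([random.random() for _ in range(s)]) for _ in range(s)]
p32 = [norm([random.random() for _ in range(s)]) for _ in range(s)]
p = {(a, b, c): p1[a] * p21[a][b] * p32[b][c]
     for a, b, c in itertools.product(range(s), repeat=3)}

# Check: p(x3|x1,x2) equals p(x3|x2) for every configuration.
max_diff = 0.0
for a, b, c in itertools.product(range(s), repeat=3):
    cond_full = p[a, b, c] / sum(p[a, b, cc] for cc in range(s))      # p(x3|x1,x2)
    num = sum(p[aa, b, c] for aa in range(s))                         # p(x2,x3)
    den = sum(p[aa, b, cc] for aa in range(s) for cc in range(s))     # p(x2)
    max_diff = max(max_diff, abs(cond_full - num / den))
```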
Markov Random Fields
1-sided Markov Property ⇒ 2-sided Markov Property.
p(xi|ix) = p(x) / p(ix) = p(x) / ∑_{xi} p(x)
(MC) = [p(x1) ∏_{j=1}^{n−1} p(xj+1|xj)] / [∑_{xi} p(x1) ∏_{j=1}^{n−1} p(xj+1|xj)]
= [p(xi|xi−1) p(xi+1|xi)] / [∑_{xi} p(xi|xi−1) p(xi+1|xi)]
=: g(xi, xi−1, xi+1)
Claim: p(xi|ix) = g(xi, xi−1, xi+1) ⇒ p(xi|ix) = p(xi|xi−1, xi+1). This follows directly from the previous fact: just take A = {i − 1, i + 1}.
Markov Random Fields
2-sided Markov Property ⇒ 1-sided Markov Property.
p(xi|ix) = p(xi|xi−1, xi+1) ⇒ p is an MRF w.r.t. the (linear) graph G
(HC) ⇒ p is Gibbs w.r.t. G ⇒ p(x) = ∏_{i=1}^{n−1} F(xi+1, xi) ⇒

p(xi+1|x1:i) = p(x1:(i+1)) / p(x1:i)
= [∑_{x(i+2):n} p(x)] / [∑_{x(i+1):n} p(x)]
= [∑_{x(i+2):n} ∏_{j=1}^{n−1} F(xj+1, xj)] / [∑_{x(i+1):n} ∏_{j=1}^{n−1} F(xj+1, xj)]
= [∏_{j=1}^{i} F(xj+1, xj) · ∑_{x(i+2):n} ∏_{j=i+1}^{n−1} F(xj+1, xj)] / [∏_{j=1}^{i−1} F(xj+1, xj) · ∑_{x(i+1):n} ∏_{j=i}^{n−1} F(xj+1, xj)]
(the first factor in the numerator is func(x1:(i+1)); after summing, the second is func(xi+1); similarly, the denominator is func(x1:i) times func(xi))
= F(xi+1, xi) · func(xi+1) / func(xi) =: g(xi+1, xi)
⇒ p(xi+1|x1:i) = p(xi+1|xi)
Markov Random Fields
Proof of the Hammersley-Clifford Theorem
Assume p(x) > 0 ∀x.
Proving “p is Gibbs w.r.t. G ⇒ p is MRF w.r.t. G”.
p is Gibbs w.r.t. G ⇒ p(x) = ∏_{c∈C} Fc(xc) for some {Fc}c∈C ⇒

p(xs|sx) = [∏_{c∈C} Fc(xc)] / [∑_{xs} ∏_{c∈C} Fc(xc)]
= [∏_{c∈C: s∉c} Fc(xc) · ∏_{c∈C: s∈c} Fc(xc)] / [∏_{c∈C: s∉c} Fc(xc) · ∑_{xs} ∏_{c∈C: s∈c} Fc(xc)]
= [∏_{c∈C: s∈c} Fc(xc)] / [∑_{xs} ∏_{c∈C: s∈c} Fc(xc)]
= func(xs, xηs) / func(xηs) = g(xs, xηs) = p(xs|xηs)

(recall that p(xs|sx) = g(xs, xηs) implies that p(xs|sx) = p(xs|xηs))
⇒ p is an MRF w.r.t. G.
The other direction is hard; we omit the proof (cf. Winkler’s book if interested).
Markov Random Fields
Marginals and Posteriors
Suppose we divide S into “Unobservable” (AKA hidden/latent) and “Observable” sites:
S = A ∪ B, A ∩ B = ∅
x = xS = (xA, yB)
Example
Of interest are the statistical structures of p(xA|yB) and p(yB).
Markov Random Fields
Fact (equivalent characterizations of MRFs)
Let p(x) > 0 ∀x. Let MP stand for “Markov Property”. If A ⊂ S, then ∂A = Aᶜ ∩ ⋃_{s∈A} ηs is called the Markov Blanket of A. Let Ā = A ∪ ∂A.
The following are equivalent:
1 p(xs|sx) = p(xs|xηs) ∀s ∈ S (i.e., our original definition of an MRF)
2 p is Gibbs w.r.t. G
3 global MP: A, B, C ⊂ S are disjoint and C separates* A and B ⇒ xA ⊥⊥ xB | xC
4 setwise local MP: A ⊂ S ⇒ xA ⊥⊥ xS\Ā | x∂A
5 local MP: s ∈ S ⇒ xs ⊥⊥ xS\({s}∪ηs) | xηs
6 pairwise MP: s, t ∈ S, s ∉ ηt ⇒ xs ⊥⊥ xt | xS\{s,t}

*I.e., for every s ∈ A and t ∈ B, any path in G between s and t passes through some q ∈ C
Markov Random Fields
Posteriors
For every clique c (more generally, any subset of S), we have
c = S ∩ c = (A ∪ B) ∩ c = (A ∩ c) ∪ (B ∩ c)
so we can write Fc(xc) = Fc(xA∩c, yB∩c)
We have
p(xA|yB) = p(xA, yB) / p(yB) = p(x) / p(yB) = [∏_{c∈C} Fc(xA∩c, yB∩c)] / p(yB)
= [∏_{c∈C: c∩A ≠ ∅} Fc(xA∩c, yB∩c) · ∏_{c∈C: c∩A = ∅} Fc(yB∩c)] / p(yB)
∝ ∏_{c∈C: c∩A ≠ ∅} Fc(xA∩c, yB∩c) = ∏_{c∈C: c∩A ≠ ∅} Fc(xA∩c)
(in the last step the fixed values yB∩c are absorbed into the functions Fc)
Markov Random Fields
Posteriors
p(xA|yB) ∝ ∏_{c∈C: c∩A ≠ ∅} Fc(xA∩c)
⇒ p(xA|yB) is Gibbs w.r.t. GA (i.e., G restricted to A).
⇒ p(xA|yB) is an MRF w.r.t. GA.
In words: conditioning on a subset of an MRF yields another (somewhat simpler/smaller) MRF.
Example (Hidden Markov Model (HMM))
Here p(xA|yB) is an MRF w.r.t. a linear graph (i.e., Markov Chain).
Markov Random Fields
Marginals
p(yB) = ∑_{xA} p(xA, yB) = ∑_{xA} ∏_{c∈C} Fc(xA∩c, yB∩c)
Example (y1 and y2 are conditionally independent but not independent)
p(x1, x2, y1, y2) = F12(x1, x2) G1(x1, y1) G2(x2, y2) ⇒
p(y1, y2) = ∑_{x1,x2} F12(x1, x2) G1(x1, y1) G2(x2, y2) = G12(y1, y2)
which typically ≠ H1(y1)H2(y2), so y1 ⊥̸⊥ y2 even though y1 ⊥⊥ y2 | x1, x2
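This example is easy to check numerically with hypothetical positive factors F12, G1, G2 (random tables standing in for the functions in the slide):

```python
import itertools
import random

random.seed(3)
vals = (0, 1)
# Hypothetical positive factors of the stated form.
F12 = {(a, b): random.random() + 0.1 for a in vals for b in vals}
G1 = {(a, c): random.random() + 0.1 for a in vals for c in vals}
G2 = {(b, d): random.random() + 0.1 for b in vals for d in vals}

p = {(a, b, c, d): F12[a, b] * G1[a, c] * G2[b, d]
     for a, b, c, d in itertools.product(vals, repeat=4)}
Z = sum(p.values())
p = {k: v / Z for k, v in p.items()}          # normalize to a pmf

def marg(keep):
    """Marginal pmf over the kept coordinate positions (0=x1, 1=x2, 2=y1, 3=y2)."""
    out = {}
    for k, v in p.items():
        kk = tuple(k[i] for i in keep)
        out[kk] = out.get(kk, 0.0) + v
    return out

px = marg((0, 1))
# Conditional independence: p(y1,y2|x1,x2) = p(y1|x1,x2) p(y2|x1,x2) holds exactly.
cond_diff = 0.0
for a, b, c, d in itertools.product(vals, repeat=4):
    lhs = p[a, b, c, d] / px[a, b]
    py1 = sum(p[a, b, c, dd] for dd in vals) / px[a, b]
    py2 = sum(p[a, b, cc, d] for cc in vals) / px[a, b]
    cond_diff = max(cond_diff, abs(lhs - py1 * py2))

# Marginal dependence: p(y1,y2) != p(y1) p(y2) for generic factors.
py12, py1m, py2m = marg((2, 3)), marg((2,)), marg((3,))
dep = max(abs(py12[c, d] - py1m[(c,)] * py2m[(d,)]) for c in vals for d in vals)
```

Here cond_diff is zero up to floating point, while dep is strictly positive for generic random factors, matching the slide's claim.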
Markov Random Fields
Marginals
In fact, more generally, every time we sum out a variable, we create a clique involving all its neighbors (“creating new edges”).
Example
Markov Random Fields
Marginals
p(yB) is an MRF w.r.t. GB (G restricted to B) with, in general, an added edge between s, t ∈ B provided there is a path in G from s to t that goes exclusively through A.
Example (Hidden Markov Model (HMM))
Here p(xA|yB) is an MRF w.r.t. a linear graph (i.e., Markov Chain) whilethe graph for p(yB) is fully connected.
Markov Random Fields
Version Log
11/3/2019, ver 1.01. S7: Changed R>0 to R≥0. S24: Added a sentence. S28: Added a step.
9/3/2019, ver 1.00.