Algorithms for MAP estimation in Markov Random Fields
Algorithms for MAP estimation in Markov Random Fields
Vladimir Kolmogorov
University College London
Energy function

E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_{(p,q)} θ_pq(x_p, x_q)

unary terms (data), pairwise terms (coherence)

- x_p are discrete variables (for example, x_p ∈ {0, 1})
- θ_p(•) are unary potentials
- θ_pq(•, •) are pairwise potentials
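To make the definition concrete, here is a minimal sketch of evaluating E(x | θ) on a tiny binary MRF; the 3-node chain and all cost values are made up for illustration, not taken from the talk.

```python
# Hedged sketch: evaluating E(x | theta) for a tiny binary MRF.
# The chain structure and potentials below are illustrative only.

def energy(x, const, unary, pairwise):
    """E(x|theta) = const + sum_p theta_p(x_p) + sum_{(p,q)} theta_pq(x_p, x_q)."""
    e = const
    for p, theta_p in unary.items():
        e += theta_p[x[p]]                       # unary terms (data)
    for (p, q), theta_pq in pairwise.items():
        e += theta_pq[x[p]][x[q]]                # pairwise terms (coherence)
    return e

# A 3-node chain p0 - p1 - p2 with binary labels.
unary = {0: [0, 4], 1: [1, 0], 2: [3, 2]}        # theta_p(0), theta_p(1)
pairwise = {(0, 1): [[0, 2], [5, 0]],            # theta_pq(x_p, x_q)
            (1, 2): [[0, 1], [1, 0]]}

print(energy((0, 1, 1), 0, unary, pairwise))     # 0 + (0+0+2) + (2+0) = 4
```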
Minimisation algorithms

• Min Cut / Max Flow [Ford & Fulkerson '56]
  [Greig, Porteous, Seheult '89]: non-iterative (binary variables)
  [Boykov, Veksler, Zabih '99]: iterative – alpha-expansion, alpha-beta swap, … (multi-valued variables)
  + If applicable, gives very accurate results
  – Can be applied only to a restricted class of functions

• BP – Max-product Belief Propagation [Pearl '86]
  + Can be applied to any energy function
  – In vision, results are usually worse than those of graph cuts
  – Does not always converge

• TRW – Max-product Tree-Reweighted Message Passing [Wainwright, Jaakkola, Willsky '02], [Kolmogorov '05]
  + Can be applied to any energy function
  + For stereo, finds lower energy than graph cuts
  + Convergence guarantees for the algorithm in [Kolmogorov '05]
Main idea: LP relaxation

• Goal: minimize the energy E(x) under the constraints x_p ∈ {0, 1}
• In general, an NP-hard problem!
• Relax the discreteness constraints: allow x_p ∈ [0, 1]
• The result is a linear program, which can be solved in polynomial time!
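The NP-hardness is what the relaxation sidesteps: exact minimisation by exhaustive search visits all 2^n labellings. A minimal sketch (potentials are illustrative, not from the talk):

```python
# Exact MAP by brute force: 2^n labellings, so feasible only for tiny n.
# This exponential cost is what motivates the LP relaxation.
from itertools import product

def brute_force_map(n, unary, pairwise):
    best_x, best_e = None, float("inf")
    for x in product((0, 1), repeat=n):          # all 2^n labellings
        e = sum(unary[p][x[p]] for p in range(n))
        e += sum(t[x[p]][x[q]] for (p, q), t in pairwise.items())
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

unary = {0: [0, 4], 1: [1, 0], 2: [3, 2]}
pairwise = {(0, 1): [[0, 2], [5, 0]], (1, 2): [[0, 1], [1, 0]]}
print(brute_force_map(3, unary, pairwise))   # ((0, 0, 0), 4); ties broken by order
```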
[Diagram: minimum of the LP relaxation ≤ minimum of the energy function with discrete variables; the relaxation may be tight (equal minima) or not tight.]
Solving the LP relaxation

• Too large for general-purpose LP solvers (e.g. interior point methods)
• Solve the dual problem instead of the primal:
  – Formulate a lower bound on the energy
  – Maximize this bound
  – When done, this solves the primal problem (the LP relaxation)
• Two different ways to formulate the lower bound:
  – Via posiforms: leads to the maxflow algorithm
  – Via convex combinations of trees: leads to tree-reweighted message passing
[Diagram: lower bound on the energy function ≤ minimum of the LP relaxation ≤ minimum of the energy function with discrete variables.]
Notation and Preliminaries
Energy function - visualisation

E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_{(p,q)} θ_pq(x_p, x_q)

[Figure: node p and node q joined by edge (p, q); each node carries a cost for label 0 and label 1, each edge carries costs for the four label pairs (e.g. θ_p(0), θ_pq(0, 1)), and the constant term θ_const is drawn separately.]
[The visualisation is repeated; θ denotes the vector of all parameters (θ_const together with all unary and pairwise costs).]
Reparameterisation

[Figure: subtracting a constant from some pairwise costs and adding it to a unary cost turns θ into θ′ without changing the energy of any labelling.]

• Definition: θ′ is a reparameterisation of θ if they define the same energy:
  E(x | θ′) = E(x | θ) for any x
• Maxflow, BP and TRW all perform reparameterisations
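The definition is easy to verify exhaustively on a small example. A sketch (the delta-shifting move and all cost values are illustrative, not the exact figure from the slide): move a constant δ from a pairwise column into the corresponding unary term and check that every labelling's energy is unchanged.

```python
# A reparameterisation theta' defines the same energy as theta for every x.
# Sketch: subtract delta from theta_pq(. , x_q = 0) and add it to theta_q(0).
from itertools import product

def energy(x, unary, pairwise):
    e = sum(u[x[p]] for p, u in unary.items())
    e += sum(t[x[p]][x[q]] for (p, q), t in pairwise.items())
    return e

unary = {0: [0, 4], 1: [1, 0]}
pairwise = {(0, 1): [[0, 2], [5, 0]]}

delta = 1
unary2 = {0: [0, 4], 1: [1 + delta, 0]}              # theta_q(0) gains delta
pairwise2 = {(0, 1): [[0 - delta, 2], [5 - delta, 0]]}  # column x_q=0 loses delta

for x in product((0, 1), repeat=2):
    assert energy(x, unary, pairwise) == energy(x, unary2, pairwise2)
print("same energy for all labellings")
```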
Part I: Lower bound via posiforms
(→ maxflow algorithm)

Lower bound via posiforms [Hammer, Hansen, Simeone '84]

E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_{(p,q)} θ_pq(x_p, x_q)

with all unary and pairwise terms non-negative. Then θ_const is a lower bound on the energy:

θ_const ≤ E(x | θ) for any x

Goal: maximize θ_const (over reparameterisations).
Outline of part I

• Maximisation algorithm?
  – Consider functions of binary variables only
• Maximising the lower bound for submodular functions
  – Definition of submodular functions
  – Overview of min cut / max flow
  – Reduction to max flow
  – Global minimum of the energy
• Maximising the lower bound for non-submodular functions
  – Reduction to max flow
  – More complicated graph
  – Part of an optimal solution
Submodular functions of binary variables

• Definition: E is submodular if every pairwise term satisfies
  θ_pq(0, 0) + θ_pq(1, 1) ≤ θ_pq(0, 1) + θ_pq(1, 0)
• Can be converted to "canonical form":
  [Figure: example costs rewritten so that the zero-cost configurations are explicit.]
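The submodularity condition is a one-line check per pairwise term. A minimal sketch with made-up terms:

```python
# Checking the submodularity condition for each pairwise term:
# theta_pq(0,0) + theta_pq(1,1) <= theta_pq(0,1) + theta_pq(1,0).

def is_submodular(pairwise):
    return all(t[0][0] + t[1][1] <= t[0][1] + t[1][0]
               for t in pairwise.values())

potts = {(0, 1): [[0, 1], [1, 0]]}      # penalises disagreement: submodular
flipped = {(0, 1): [[1, 0], [0, 1]]}    # rewards disagreement: not submodular
print(is_submodular(potts), is_submodular(flipped))   # True False
```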
Overview of min cut / max flow

Min Cut problem

• Directed weighted graph with two terminals, source and sink
  [Figure: example graph with edge weights 1, 1, 2, 2, 3, 4, 5.]
• Cut: a partition of the nodes into S ∋ source and T ∋ sink,
  e.g. S = {source, node 1}, T = {sink, node 2, node 3}
• Cost of the cut: the sum of the weights of edges going from S to T;
  in the example, Cost(S, T) = 1 + 1 = 2
• Task: compute the cut with minimum cost
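For very small graphs the min cut can be found by enumerating all source/sink partitions directly, which makes the definition of cut cost explicit. The slide's exact example graph is not recoverable from this transcript, so the edges below are a hypothetical instance.

```python
# Min cut by brute force over all partitions (S contains s, T contains t).
# The graph below is a made-up example, not the one on the slide.
from itertools import combinations

def min_cut(nodes, edges, s, t):
    inner = [v for v in nodes if v not in (s, t)]
    best = float("inf")
    for k in range(len(inner) + 1):
        for subset in combinations(inner, k):
            S = {s} | set(subset)
            # cost: weights of directed edges going from S to T
            cost = sum(c for (u, v), c in edges.items()
                       if u in S and v not in S)
            best = min(best, cost)
    return best

edges = {("s", "a"): 2, ("s", "b"): 3, ("a", "b"): 1,
         ("a", "t"): 4, ("b", "t"): 1}
print(min_cut(["s", "a", "b", "t"], edges, "s", "t"))   # 3 (S = {s, b})
```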
Maxflow algorithm

[Figure sequence: the algorithm repeatedly finds an augmenting path from source to sink and pushes flow along it, updating the residual edge capacities; in the example value(flow) grows 0 → 1 → 2, after which no augmenting path remains.]
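The augmenting-path scheme sketched on the slides can be written compactly. Below is a hedged sketch in the Ford-Fulkerson spirit (using BFS for shortest augmenting paths, i.e. Edmonds-Karp); the graph is the same hypothetical example as above, so the returned flow value should equal the min cut cost by max-flow/min-cut duality.

```python
# Augmenting-path maxflow (Edmonds-Karp variant of Ford-Fulkerson).
# Graph values are illustrative, matching the brute-force min-cut example.
from collections import deque

def max_flow(capacity, s, t):
    res = {}                                   # residual capacities
    for (u, v), c in capacity.items():
        res.setdefault(u, {})[v] = res.get(u, {}).get(v, 0) + c
        res.setdefault(v, {}).setdefault(u, 0)  # reverse residual edge
    flow = 0
    while True:
        # BFS for an augmenting path with spare residual capacity
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                        # no augmenting path left
        path, v = [], t                        # recover the path s -> t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

capacity = {("s", "a"): 2, ("s", "b"): 3, ("a", "b"): 1,
            ("a", "t"): 4, ("b", "t"): 1}
print(max_flow(capacity, "s", "t"))   # 3, equal to the min cut cost
```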
Maximising the lower bound for submodular functions: reduction to maxflow

[Figure: the canonical-form energy is translated into a maxflow graph; unary costs become edges to the source/sink and pairwise costs become edges between nodes, with value(flow) = 0 initially.]
Maxflow algorithm and reparameterisation

[Figure sequence: each augmentation performed by maxflow corresponds to a reparameterisation of the energy; the residual capacities are the reparameterised unary and pairwise costs, and value(flow) is the constant term, growing 0 → 1 → 2 in the example.]

• Upon termination all remaining costs along x = (0, 1, 1) are zero, so the lower bound is attained.
  Minimum of the energy: 2, at x = (0, 1, 1).
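The slides' exact graph construction is not recoverable from this transcript, so here is a hedged sketch of a standard reduction of a submodular binary pairwise energy to min cut / max flow (label 0 on the source side; the potentials are illustrative). The returned value is the constant term plus the maxflow, i.e. the global minimum of the energy.

```python
# Sketch: minimise a submodular binary pairwise energy via maxflow.
# Convention: x_p = 0 <-> p on the source side, x_p = 1 <-> sink side.
from collections import deque

def max_flow(cap, s, t):
    res = {}
    for (u, v), c in cap.items():
        res.setdefault(u, {})[v] = res.get(u, {}).get(v, 0) + c
        res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def minimize_submodular(n, unary, pairwise):
    """Return min_x E(x) for a submodular binary pairwise energy."""
    const = 0
    u = {p: list(unary[p]) for p in range(n)}
    cap = {}
    for (p, q), t_ in pairwise.items():
        A, B, C, D = t_[0][0], t_[0][1], t_[1][0], t_[1][1]
        assert A + D <= B + C, "pairwise term must be submodular"
        # E_pq = A + (C-A)*x_p + (D-C)*x_q + (B+C-A-D)*[x_p=0, x_q=1]
        const += A
        u[p][1] += C - A
        u[q][1] += D - C
        cap[(p, q)] = cap.get((p, q), 0) + (B + C - A - D)  # cut iff x_p=0, x_q=1
    for p in range(n):
        m = min(u[p])                # normalise so capacities are non-negative
        const += m
        cap[("s", p)] = u[p][1] - m  # edge cut iff x_p = 1
        cap[(p, "t")] = u[p][0] - m  # edge cut iff x_p = 0
    return const + max_flow(cap, "s", "t")

unary = {0: [0, 4], 1: [1, 0], 2: [3, 2]}
pairwise = {(0, 1): [[0, 2], [5, 0]], (1, 2): [[0, 1], [1, 0]]}
print(minimize_submodular(3, unary, pairwise))   # 4, the global minimum
```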
Maximising the lower bound for non-submodular functions

Arbitrary functions of binary variables

• Can be solved via maxflow [Boros, Hammer, Sun '91]
  – Specially constructed graph
• Gives the solution to the LP relaxation: for each node, x_p ∈ {0, 1/2, 1}
  [Figure: example labelling in which some nodes receive 0 or 1 and the rest receive 1/2.]
• Part of an optimal solution [Hammer, Hansen, Simeone '84]: the nodes labelled 0 or 1 retain those labels in some global minimum.
Part II: Lower bound via convex combination of trees
(→ tree-reweighted message passing)

Convex combination of trees [Wainwright, Jaakkola, Willsky '02]

• Goal: compute the minimum of the energy for θ: Φ(θ) = min_x E(x | θ)
• In general, intractable!
• Obtaining a lower bound:
  – Split θ into several components: θ = Σ_i θ^i
  – Compute the minimum for each component: Φ(θ^i) = min_x E(x | θ^i)
  – Combine to get a bound on Φ(θ): Σ_i Φ(θ^i) ≤ Φ(θ)
• Use trees!

Convex combination of trees (cont'd)

[Figure: graph = ½·T + ½·T′ for trees T and T′, i.e. θ = ½·θ^T + ½·θ^T′, giving the lower bound ½·Φ(θ^T) + ½·Φ(θ^T′) ≤ Φ(θ); maximize this lower bound on the energy.]
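The inequality behind the bound is just concavity of the minimum. A minimal numerical sketch with a single node and made-up costs (so each "tree" degenerates to one unary vector):

```python
# min_x is concave: for theta = 0.5*theta_T + 0.5*theta_Tp,
# 0.5*min(theta_T) + 0.5*min(theta_Tp) <= min(theta).
theta_T = [0, 4]     # unary costs for labels 0, 1 in "tree" T (made up)
theta_Tp = [3, 1]    # ... in "tree" T'
theta = [0.5 * a + 0.5 * b for a, b in zip(theta_T, theta_Tp)]

bound = 0.5 * min(theta_T) + 0.5 * min(theta_Tp)
print(bound, "<=", min(theta))   # 0.5 <= 1.5
```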
TRW algorithms

• Goal: find a reparameterisation maximizing the lower bound
• Apply a sequence of different reparameterisation operations:
  – Node averaging
  – Ordinary BP on trees
• Order of operations?
  – Affects performance dramatically
• Algorithms:
  – [Wainwright et al. '02]: parallel schedule
    • May not converge
  – [Kolmogorov '05]: specific sequential schedule
    • Lower bound does not decrease, convergence guarantees
Node averaging

[Figure: the same node has unary costs (0, 1) in one tree and (4, 0) in another; averaging replaces both by (2, 0.5).]
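The averaging step uses the numbers from the slide's figure; a one-liner shows that the sum of per-tree minima cannot decrease under averaging:

```python
# Node averaging with the slide's numbers: unary vectors (0, 1) and (4, 0)
# in two trees are replaced by their average (2, 0.5) in each tree.
a, b = [0, 1], [4, 0]
avg = [(x + y) / 2 for x, y in zip(a, b)]
print(avg)                                    # [2.0, 0.5]
# The contribution to the lower bound cannot decrease:
print(min(a) + min(b), "->", 2 * min(avg))    # 0 -> 1.0
```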
Belief propagation (BP) on trees

• Send messages
  – Equivalent to reparameterising node and edge parameters
• Two passes (forward and backward)
• Key property (Wainwright et al.): upon termination, θ_p gives the min-marginals for node p:
  θ_p(0) + const = min { E(x | θ) : x_p = 0 }
  θ_p(1) + const = min { E(x | θ) : x_p = 1 }
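On a chain, the min-marginals that BP computes can be obtained with a two-pass (forward-backward) dynamic program; a sketch with illustrative potentials (not the talk's model):

```python
# Min-marginals on a binary chain via forward-backward DP: the quantity
# min { E(x) : x_p = j } that BP on a tree produces. Values illustrative.
def min_marginals(unary, pairwise):
    n = len(unary)
    # fwd[p][j]: min cost of x_0..x_p with x_p = j; bwd[p][j]: min cost of the rest
    fwd = [[0, 0] for _ in range(n)]
    bwd = [[0, 0] for _ in range(n)]
    fwd[0] = list(unary[0])
    for p in range(1, n):                      # forward pass
        for j in (0, 1):
            fwd[p][j] = unary[p][j] + min(fwd[p - 1][i] + pairwise[p - 1][i][j]
                                          for i in (0, 1))
    for p in range(n - 2, -1, -1):             # backward pass
        for j in (0, 1):
            bwd[p][j] = min(pairwise[p][j][k] + unary[p + 1][k] + bwd[p + 1][k]
                            for k in (0, 1))
    return [[fwd[p][j] + bwd[p][j] for j in (0, 1)] for p in range(n)]

unary = [[0, 4], [1, 0], [3, 2]]                   # theta_p(0), theta_p(1)
pairwise = [[[0, 2], [5, 0]], [[0, 1], [1, 0]]]    # terms for edges (0,1), (1,2)
mm = min_marginals(unary, pairwise)
print(mm)   # [[4, 6], [4, 4], [4, 4]]; min of each row is the global minimum, 4
```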
TRW algorithm of Wainwright et al. with tree-based updates (TRW-T)

Repeat: run BP on all trees, then "average" all nodes.

• If it converges, it gives a (local) maximum of the lower bound
• Not guaranteed to converge
• The lower bound may go down
Sequential TRW algorithm (TRW-S) [Kolmogorov '05]

Repeat: pick a node p, run BP on all trees containing p, then "average" node p.
Main property of TRW-S

• Theorem: the lower bound never decreases.
• Proof sketch:
  [Figure: before averaging, a node has costs (0, 1) in tree T and (4, 0) in tree T′, so Φ(θ^T) = 0 + const and Φ(θ^T′) = 0 + const′; after averaging both become (2, 0.5), so Φ(θ^T) = 0.5 + const and Φ(θ^T′) = 0.5 + const′. The bound has not decreased.]
TRW-S algorithm

• Particular order of averaging and BP operations
• Lower bound guaranteed not to decrease
• There exists a limit point that satisfies the weak tree agreement condition
• Efficiency?
  [Figure: the loop "pick node p → run BP on all trees containing p → average node p" looks inefficient if BP is rerun from scratch each time.]
Efficient implementation

• Key observation: the node averaging operation preserves messages oriented towards that node
• Reuse previously passed messages!
• Needs a special choice of trees:
  – Pick an ordering of the nodes
  – Trees: monotonic chains
  [Figure: 3×3 grid with nodes ordered 1 2 3 / 4 5 6 / 7 8 9.]
• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order
• Linear running time of one iteration
Memory requirements

• Additional advantage of TRW-S:
  – Needs only half as much memory as standard message passing!
  – A similar observation for bipartite graphs and a parallel schedule was made in [Felzenszwalb & Huttenlocher '04]
  [Figure: message storage in standard message passing vs. TRW-S.]
Experimental results: binary segmentation ("GrabCut")

[Plot: energy vs. time, averaged over 50 instances.]
Experimental results: stereo

[Figures: left image and ground truth; plots of energy vs. iteration for BP and TRW-S on several stereo pairs.]
Summary

• MAP estimation algorithms are based on LP relaxation
  – Maximize a lower bound
• Two ways to formulate the lower bound:
  – Via posiforms: leads to the maxflow algorithm
    • Polynomial-time solution
    • But: applicable only to restricted energies (e.g. binary variables)
    • Submodular functions: global minimum
    • Non-submodular functions: part of an optimal solution
  – Via convex combination of trees: leads to the TRW algorithm
    • Convergence in the limit (for TRW-S)
    • Applicable to arbitrary energy functions
• Graph cuts vs. TRW:
  – Accuracy: similar
  – Generality: TRW is more general
  – Speed: for stereo, TRW is currently 2-5 times slower. But:
    • 3 vs. 50 years of research!
    • More suitable for parallel implementation (GPU? Hardware?)
Discrete vs. continuous functionals

Discrete formulation (graph cuts):

E(x) = Σ_p E_p(x_p) + Σ_{(p,q)} E_pq(x_p, x_q)

• Maxflow algorithm
  – Global minimum, polynomial time
• Metrication artefacts?

Continuous formulation (geodesic active contours):

E(C) = ∫₀^{|C|} g(C(s)) ds

• Level sets
  – Numerical stability?
• Geometrically motivated
  – Invariant under rotation
Geo-cuts

• Continuous functional:
  E(C) = ∫₀^{|C|} g(s) ds + ∫_{interior(C)} f dV
  (a length term with metric g, possibly depending on the contour normal N, plus a regional term f)
• Construct a graph such that for smooth contours C:
  E(C) ≈ cost of the corresponding cut
• Class of continuous functionals? [Boykov & Kolmogorov '03], [Kolmogorov & Boykov '05]:
  – Geometric length/area (e.g. Riemannian)
  – Flux of a given vector field
  – Regional term
TRW formulation

maximize Σ_T ρ^T · Φ(θ^T)
subject to Σ_T ρ^T · θ^T = θ (the original parameter vector)

where the collection {θ^T} of all tree parameter vectors is the variable being optimised, and ρ^T is a fixed probability distribution on trees T.
Efficient implementation (animation)

[Repeated animation frames: the 3×3 grid with nodes ordered 1…9, stepping through the forward and backward passes and highlighting, at each step, the node being processed and the currently valid messages.]