Minimizing general submodular functions
CVPR 2015 Tutorial
Stefanie Jegelka
MIT
The set function view

F(S) = cost of buying the items in S together, or utility, or probability, …

We will assume:
• F(∅) = 0
• a black box “oracle” to evaluate F
Set functions and energy functions
any set function with .
… is a function on
binary vectors!
a
b
d
c
A
3
1
1
0
0
a
b
c
d
binary labeling problems = subset selection problems!
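To make the correspondence concrete, here is a minimal sketch (all names are my own, not from the slides) of one set function evaluated through either view:

```python
# A toy set function on V = {a, b, c, d}, viewed as a pseudo-boolean function.
V = ['a', 'b', 'c', 'd']

def F(S):
    """Cost of buying the items in S together: item prices minus a bundle discount."""
    price = {'a': 3.0, 'b': 2.0, 'c': 4.0, 'd': 1.0}
    discount = 2.0 if {'a', 'b'} <= set(S) else 0.0
    return sum(price[v] for v in S) - discount

def indicator(S):
    """Set -> binary vector."""
    return [1 if v in S else 0 for v in V]

def from_indicator(x):
    """Binary vector -> set."""
    return {V[i] for i, xi in enumerate(x) if xi == 1}

A = {'a', 'b'}
print(indicator(A))                             # [1, 1, 0, 0]
print(F(A) == F(from_indicator([1, 1, 0, 0])))  # True: same function, two views
```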
Discrete Labeling
[Slide: a page from Kohli, Ladický & Torr, “Robust Higher Order Potentials for Enforcing Label Consistency”, Int J Comput Vis (2009) 82:302–324, showing that augmenting a pairwise CRF with higher-order potentials defined on segments markedly improves object segmentation, next to an example image labeled sky / tree / house / grass.]
Summarization
Influential subsets
Submodularity

extra cost of a drink on its own: one drink
extra cost of a drink with a larger order: free refill

diminishing marginal costs
The big picture

submodular functions connect:
• electrical networks (Narayanan 1997)
• graph theory (Frank 1993)
• game theory (Shapley 1970)
• matroid theory (Whitney 1935)
• stochastic processes (Macchi 1975, Borodin 2009)
• combinatorial optimization
• computer vision & machine learning

[Photos: G. Choquet, J. Edmonds, L.S. Shapley, L. Lovász]
Examples
sensing:
F(S) = information gained from locations S
Example: cover
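The cover figure itself is not in the transcript; as a stand-in, here is a minimal coverage function, a canonical submodular example (the regions are made up):

```python
# Coverage: F(S) = number of ground elements covered by the regions chosen in S.
regions = {
    1: {'u1', 'u2', 'u3'},
    2: {'u3', 'u4'},
    3: {'u4', 'u5', 'u6'},
}

def F_cover(S):
    covered = set()
    for i in S:
        covered |= regions[i]
    return len(covered)

# Diminishing gains: region 2 adds less once region 1 is already chosen.
print(F_cover({2}) - F_cover(set()))   # 2 (covers u3, u4)
print(F_cover({1, 2}) - F_cover({1}))  # 1 (u3 is already covered)
```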
Maximizing influence
(Kempe, Kleinberg & Tardos 2003)
Submodular set functions

• Diminishing gains: for all A ⊆ B ⊆ V and e ∉ B:
  F(A ∪ {e}) − F(A) ≥ F(B ∪ {e}) − F(B)
• Union–Intersection: for all A, B ⊆ V:
  F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
Submodularity: boolean & sets
Graph cuts

• Cut for one edge (u, v): F(S) = 1 if exactly one of u, v is in S, else 0
• the cut of one edge is submodular!
• large graph: sum over the edges, F(S) = Σ_{(u,v) ∈ E} w_uv F_uv(S)

Useful property: a sum of submodular functions is submodular (see the sketch below)
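A small sketch of both claims (helper names are mine): build a cut function as a sum of per-edge cuts and verify the diminishing-gains property by brute force, which is feasible only for tiny ground sets:

```python
from itertools import chain, combinations

def subsets(V):
    """All subsets of V."""
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

def is_submodular(F, V):
    """Brute-force diminishing-gains check (exponential; tiny V only)."""
    for A in map(set, subsets(V)):
        for B in map(set, subsets(V)):
            if A <= B:
                for e in set(V) - B:
                    if F(A | {e}) - F(A) < F(B | {e}) - F(B) - 1e-12:
                        return False
    return True

def cut_function(weights):
    """F(S) = total weight of edges with exactly one endpoint in S."""
    def F(S):
        return sum(w for (u, v), w in weights.items() if (u in S) != (v in S))
    return F

F = cut_function({('a', 'b'): 1.0, ('b', 'c'): 2.0, ('a', 'c'): 0.5})
print(is_submodular(F, ['a', 'b', 'c']))  # True: each edge term is submodular, and sums stay submodular
```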
Closedness properties

F submodular on V. The following are submodular:
• Restriction: F′(S) = F(S ∩ W) for a fixed W ⊆ V
• Conditioning: F′(S) = F(S ∪ W) for a fixed W ⊆ V
• Reflection: F′(S) = F(V \ S)
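Each closedness property is a one-line wrapper; a sketch reusing cut_function and is_submodular from the graph-cut example above (W here is an arbitrary fixed subset I chose):

```python
# Closedness: restriction, conditioning, and reflection of a submodular F.
V = ['a', 'b', 'c']
W = {'a', 'c'}
F = cut_function({('a', 'b'): 1.0, ('b', 'c'): 2.0})

restriction  = lambda S: F(set(S) & W)       # F'(S) = F(S ∩ W)
conditioning = lambda S: F(set(S) | W)       # F'(S) = F(S ∪ W)
reflection   = lambda S: F(set(V) - set(S))  # F'(S) = F(V \ S)

for G in (restriction, conditioning, reflection):
    print(is_submodular(G, V))  # True, True, True
```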
Submodular optimization
• subset selection: min / max F(S)
• minimizing submodular functions: next
• maximizing submodular functions: afternoon
convex … and concave aspects!
Minimizing submodular functions
Why?
• energy minimization
• variational inference (marginals)
• structured sparse estimation …
How?
• graph cuts – fast, not always possible
• convex relaxations – can be fast, always possible
• …
submodularity & convexity

any set function F : 2^V → ℝ … is a function on binary vectors: a pseudo-boolean function

[Figure: as before, the set A = {a, b} corresponds to the indicator vector x_A = (1, 1, 0, 0).]
Relaxation: idea

A relaxation (extension): we have F on the vertices {0,1}^n and want an extension f on all of [0,1]^n.

Example (V = {a, b, c}): for x = (1.0, 0.5, 0.2),
f(x) = (1.0 − 0.5) F({a}) + (0.5 − 0.2) F({a, b}) + 0.2 F({a, b, c})
The Lovász extension

we have F on {0,1}^n and want an extension f on [0,1]^n: sort the entries of x decreasingly, x_π(1) ≥ … ≥ x_π(n), set S_k = {π(1), …, π(k)}, and interpolate:
f(x) = Σ_k (x_π(k) − x_π(k+1)) F(S_k), with x_π(n+1) := 0

Examples (a small implementation follows):
• truncation
• cut function: for one edge, f(x) = |x_a − x_b|, e.g. 1.0 − 0.5: the “total variation”!
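A direct implementation of this formula (function names mine; assumes F(∅) = 0):

```python
import numpy as np

def lovasz_extension(F, x, V):
    """Sort the coordinates of x decreasingly and interpolate F along the
    resulting chain of sets S_1 ⊆ S_2 ⊆ ...; the final level is taken to be 0."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)                 # largest coordinate first
    levels = list(x[order]) + [0.0]
    f, S = 0.0, set()
    for k, i in enumerate(order):
        S = S | {V[i]}
        f += (levels[k] - levels[k + 1]) * F(S)
    return f

# The extension of a single-edge cut is the total variation |x_a - x_b|:
F_edge = lambda S: 1.0 if len(S) == 1 else 0.0   # cut of the edge (a, b)
print(lovasz_extension(F_edge, [1.0, 0.5], ['a', 'b']))  # 0.5 = 1.0 - 0.5
```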
Alternative characterization

Theorem (Lovász, 1983)
The Lovász extension f is convex ⟺ F is submodular.

If F is submodular, this is equivalent to:
f(x) = max_{s ∈ B(F)} sᵀx   (B(F): the base polytope, defined on the next slide)

[Figure: for V = {a, b}, the build-up adds the constraints s_a ≤ F({a}), s_b ≤ F({b}), and s_a + s_b = F(V) one by one, carving out the polytope over which sᵀx is maximized.]
Submodular polyhedra

submodular polyhedron: P(F) = { s ∈ ℝⁿ : s(A) ≤ F(A) for all A ⊆ V }, where s(A) = Σ_{i ∈ A} s_i

Base polytope: B(F) = { s ∈ P(F) : s(V) = F(V) }. Exponentially many constraints!

Edmonds 1970: “magic”: compute argmax_{s ∈ B(F)} sᵀx in O(n log n) by the greedy algorithm (sketch below)

basis of (almost all) optimization! -- separation oracle -- subgradient --
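The “magic” is the greedy algorithm, sketched here (names mine): sort the coordinates of x decreasingly and record marginal gains along the chain. The resulting s is a vertex of B(F) maximizing sᵀx, and it is also a subgradient of the Lovász extension f at x:

```python
import numpy as np

def greedy_vertex(F, x, V):
    """Edmonds' greedy algorithm: argmax_{s in B(F)} s^T x using n sorted
    F-evaluations; s_i = F(S_k) - F(S_{k-1}) along the chain of sorted prefixes."""
    order = np.argsort(-np.asarray(x, dtype=float))
    s = np.zeros(len(V))
    S, prev = set(), 0.0
    for i in order:
        S = S | {V[i]}
        val = F(S)
        s[i] = val - prev
        prev = val
    return s

# Certifies f(x) = max_{s in B(F)} s^T x for the single-edge cut:
F_edge = lambda S: 1.0 if len(S) == 1 else 0.0
x = np.array([1.0, 0.5])
s = greedy_vertex(F_edge, x, ['a', 'b'])
print(s, s @ x)   # [ 1. -1.]  0.5, matching the Lovász extension value above
```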
Base polytopes

[Figure: P(F) and B(F) in 2D (2 elements, axes s1, s2) and in 3D (3 elements, axes s1, s2, s3).]
Convex relaxation

1. relaxation: min_{x ∈ [0,1]ⁿ} f(x) is convex optimization (non-smooth)
2. the relaxation is exact: min_{S ⊆ V} F(S) = min_{x ∈ [0,1]ⁿ} f(x)

submodular minimization in polynomial time! (Grötschel, Lovász, Schrijver 1981)
Submodular minimization

• minimize f
  – subgradient descent (sketch after this list)
  – smoothing (special cases)
• solve the dual: combinatorial algorithms
  – foundations: Edmonds, Cunningham
  – first poly-time algorithms: (Iwata-Fujishige-Fleischer 2001, Schrijver 2000)
  – many more after that …
• proximal problem on the Lovász extension: next slide
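A minimal projected-subgradient sketch for the first bullet, reusing greedy_vertex from the Edmonds slide (the greedy vertex at x is a subgradient of f; the step size and thresholded rounding are standard choices, not from the slides):

```python
import numpy as np

def minimize_submodular(F, V, iters=300):
    """Projected subgradient descent on the Lovász extension over [0,1]^n,
    rounding each iterate by thresholding. Simple and general, but slow."""
    n = len(V)
    x = 0.5 * np.ones(n)
    best_S, best_val = set(), F(set())          # the empty set is always a candidate
    for t in range(1, iters + 1):
        g = greedy_vertex(F, x, V)              # subgradient of f at x
        x = np.clip(x - g / np.sqrt(t), 0.0, 1.0)
        for theta in np.unique(x):              # rounding: best threshold set
            S = {V[i] for i in range(n) if x[i] >= theta}
            if F(S) < best_val:
                best_S, best_val = S, F(S)
    return best_S, best_val
```

Thresholding is justified by the exactness of the relaxation: some level set of a relaxed minimizer is a discrete minimizer.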
Minimum-norm-point algorithm
(Fujishige ’91, Fujishige & Isotani ’11)

dual of the proximal problem: a minimum-norm problem
s* = argmin_{s ∈ B(F)} ‖s‖²

then S* = { i : s*_i < 0 } minimizes F!

[Figure: 2D example over {a, b}; the point of B(F) closest to the origin has s*_a = −1, giving S* = {a}.]
The bigger story

projection / proximal / parametric problems; divide-and-conquer; thresholding
(Fujishige & Isotani 11, Nagano, Gallo-Grigoriadis-Tarjan 06, Hochbaum 01, Chambolle & Darbon 09, …)
Minimum-norm-point algorithm

[Figure: V = {a, b, c, d} with min-norm point s* = (−0.5, −0.5, 0.8, 1.0), so S* = {a, b}.]

1. optimization: find s* = argmin_{s ∈ B(F)} ‖s‖²
2. rounding: S* = { i : s*_i < 0 }

how to solve? The polytope B(F) has exponentially many inequalities / faces.
BUT: we can do linear optimization over B(F) (Edmonds’ greedy algorithm)
→ Frank-Wolfe or Fujishige-Wolfe algorithm
Frank-Wolfe: main idea
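Frank-Wolfe fits because its only primitive is linear optimization over B(F), which Edmonds' greedy algorithm provides. A plain Frank-Wolfe sketch for the min-norm point, reusing greedy_vertex (the actual Fujishige-Wolfe algorithm adds affine "minor cycles" over active vertex sets and converges much faster; this shows only the main idea):

```python
import numpy as np

def min_norm_point_fw(F, V, iters=2000):
    """Plain Frank-Wolfe for s* = argmin_{s in B(F)} ||s||^2.
    Linear step: argmin_{q in B(F)} <q, s> = greedy_vertex(F, -s, V)."""
    s = greedy_vertex(F, np.zeros(len(V)), V)   # start at some vertex of B(F)
    for t in range(iters):
        q = greedy_vertex(F, -s, V)             # Frank-Wolfe direction
        gamma = 2.0 / (t + 2.0)                 # standard step size
        s = (1 - gamma) * s + gamma * q
    S_star = {V[i] for i in range(len(V)) if s[i] < 0}
    return S_star, s

# Edge cut plus unary costs: F(∅)=0, F({a})=0, F({b})=1.3, F({a,b})=-0.7
c = {'a': -1.0, 'b': 0.3}
F = lambda S: (1.0 if len(S) == 1 else 0.0) + sum(c[v] for v in S)
print(min_norm_point_fw(F, ['a', 'b']))  # S* = {'a', 'b'}, s approx (-0.35, -0.35)
```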
Empirically

[Figure from Bach, 2012: convergence of the relaxation vs. convergence of the discrete solution S for the min-norm-point algorithm.]
Recap – links to convexity

• submodular function F(S)
• convex extension f(x) --- can compute it!
• submodular minimization as convex optimization --- can solve it!
• What can we do with it?
Links to convexity
• What can we do with it?
• MAP inference / energy minimization (out-of-the-box)
• variational inference (Djolonga & Krause 2014)
• structured sparsity (Bach 2010)
• decomposition & parallel algorithms
Structured sparsity and submodularity

Sparse reconstruction

Assumption: x is sparse.

subset selection: the support of x, e.g. S = {1, 3, 4, 7}

discrete regularization on the support S of x → relax to its convex envelope

but the sparsity pattern is often not random …
Structured sparsity

[Figure: linear model y ≈ M x*.]

Assumption: the support of x has structure. Express it by a set function!
Preference for trees

Set function: F(T) < F(S) if T is a tree and S is not, for |S| = |T|

use it as a regularizer?
Sparsity

discrete regularization on the support S of x → relax to the convex envelope:
• x sparse: |S| relaxes to the ℓ1 norm ‖x‖₁
• x structured sparse: a submodular function F relaxes to its Lovász extension, Ω(x) = f(|x|)

Optimization: submodular minimization (min-norm) (Bach 2010); a sketch follows below
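Concretely, reusing lovasz_extension from the earlier sketch (the set function here is a toy stand-in; Bach 2010 develops the actual structured priors), the regularizer is the Lovász extension evaluated at the vector of absolute values:

```python
import numpy as np

def omega(F, x, V):
    """Structured-sparsity regularizer Omega(x) = f(|x|): the Lovász extension
    of a nondecreasing submodular F, evaluated coordinate-wise at |x|."""
    return lovasz_extension(F, np.abs(np.asarray(x, dtype=float)), V)

# Sanity check: for the cardinality function F(S) = |S|, Omega is the l1 norm.
V = ['a', 'b', 'c']
F_card = lambda S: float(len(S))
print(omega(F_card, [0.5, -2.0, 0.0], V))  # 2.5 = |0.5| + |-2.0| + |0.0|
```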
Special case

• minimize a sum of submodular functions F(S) = Σ_i F_i(S), each F_i “easy”
• combinatorial algorithms (Kolmogorov 12, Fix-Joachims-Park-Zabih 13, Fix-Wang-Zabih 14)
• convex relaxations
Relaxation

• convex Lovász extension: min_x Σ_i f_i(x) is a tight relaxation
• dual decomposition: parallel algorithms (Komodakis-Paragios-Tziritas 11, Savchynskyy-Schmidt-Kappes-Schnörr 11, J-Bach-Sra 13)
Results: dual decomposition

[Plot: log10(duality gap) vs. iteration (20 to 100) for subgrad, BCD, DR, fista-smooth, dual-dec, primal-smoothed, shown for the discrete problem and two relaxations; the smooth dual gives faster, parallel algorithms than the non-smooth dual.]
(Jegelka, Bach, Sra 2013; Nishihara, Jegelka, Jordan 2014)
Summary
• Submodular functions – diminishing returns/costs
• convex relaxations:
– exact relaxation
– structured norms
– fast algorithms
• more soon:
– constraints
– maximization: diversity, information