Minimizing general submodular functions
CVPR 2015 Tutorial
Stefanie Jegelka
MIT
The set function view

F(S) = cost of buying the items in S together, or utility, or probability, …

We will assume:
• F(∅) = 0
• a black box “oracle” to evaluate F
Set functions and energy functions
any set function with .
… is a function on
binary vectors!
a
b
d
c
A
3
1
1
0
0
a
b
c
d
binary labeling problems = subset selection problems!
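To make the correspondence concrete, here is a minimal sketch (all names are my own, not from the slides) of one set function evaluated through either view:

```python
# A toy set function on V = {a, b, c, d}, viewed as a pseudo-boolean function.
V = ['a', 'b', 'c', 'd']

def F(S):
    """Cost of buying the items in S together: item prices minus a bundle discount."""
    price = {'a': 3.0, 'b': 2.0, 'c': 4.0, 'd': 1.0}
    discount = 2.0 if {'a', 'b'} <= set(S) else 0.0
    return sum(price[v] for v in S) - discount

def indicator(S):
    """Set -> binary vector."""
    return [1 if v in S else 0 for v in V]

def from_indicator(x):
    """Binary vector -> set."""
    return {V[i] for i, xi in enumerate(x) if xi == 1}

A = {'a', 'b'}
print(indicator(A))                             # [1, 1, 0, 0]
print(F(A) == F(from_indicator([1, 1, 0, 0])))  # True: same function, two views
```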
Discrete Labeling
[Slide: a page from Kohli, Ladický & Torr, “Robust Higher Order Potentials for Enforcing Label Consistency”, Int J Comput Vis (2009) 82:302–324, showing that augmenting a pairwise CRF with higher-order potentials defined on segments markedly improves object segmentation, next to an example image labeled sky / tree / house / grass.]
Summarization
Influential subsets
Submodularity

extra cost of a drink on its own: one drink
extra cost of a drink with a larger order: free refill

diminishing marginal costs
The big picture

submodular functions connect:
• electrical networks (Narayanan 1997)
• graph theory (Frank 1993)
• game theory (Shapley 1970)
• matroid theory (Whitney 1935)
• stochastic processes (Macchi 1975, Borodin 2009)
• combinatorial optimization
• computer vision & machine learning

[Photos: G. Choquet, J. Edmonds, L.S. Shapley, L. Lovász]
Examples
sensing:
F(S) = information gained from locations S
Example: cover
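The cover figure itself is not in the transcript; as a stand-in, here is a minimal coverage function, a canonical submodular example (the regions are made up):

```python
# Coverage: F(S) = number of ground elements covered by the regions chosen in S.
regions = {
    1: {'u1', 'u2', 'u3'},
    2: {'u3', 'u4'},
    3: {'u4', 'u5', 'u6'},
}

def F_cover(S):
    covered = set()
    for i in S:
        covered |= regions[i]
    return len(covered)

# Diminishing gains: region 2 adds less once region 1 is already chosen.
print(F_cover({2}) - F_cover(set()))   # 2 (covers u3, u4)
print(F_cover({1, 2}) - F_cover({1}))  # 1 (u3 is already covered)
```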
Maximizing influence
(Kempe, Kleinberg & Tardos 2003)
Submodular set functions

• Diminishing gains: for all A ⊆ B ⊆ V and e ∉ B:
  F(A ∪ {e}) − F(A) ≥ F(B ∪ {e}) − F(B)
• Union–Intersection: for all A, B ⊆ V:
  F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)
Submodularity: boolean & sets
Graph cuts

• Cut for one edge (u, v): F(S) = 1 if exactly one of u, v is in S, else 0
• the cut of one edge is submodular!
• large graph: sum over the edges, F(S) = Σ_{(u,v) ∈ E} w_uv F_uv(S)

Useful property: a sum of submodular functions is submodular (see the sketch below)
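A small sketch of both claims (helper names are mine): build a cut function as a sum of per-edge cuts and verify the diminishing-gains property by brute force, which is feasible only for tiny ground sets:

```python
from itertools import chain, combinations

def subsets(V):
    """All subsets of V."""
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

def is_submodular(F, V):
    """Brute-force diminishing-gains check (exponential; tiny V only)."""
    for A in map(set, subsets(V)):
        for B in map(set, subsets(V)):
            if A <= B:
                for e in set(V) - B:
                    if F(A | {e}) - F(A) < F(B | {e}) - F(B) - 1e-12:
                        return False
    return True

def cut_function(weights):
    """F(S) = total weight of edges with exactly one endpoint in S."""
    def F(S):
        return sum(w for (u, v), w in weights.items() if (u in S) != (v in S))
    return F

F = cut_function({('a', 'b'): 1.0, ('b', 'c'): 2.0, ('a', 'c'): 0.5})
print(is_submodular(F, ['a', 'b', 'c']))  # True: each edge term is submodular, and sums stay submodular
```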
Closedness properties

F submodular on V. The following are submodular:
• Restriction: F′(S) = F(S ∩ W) for a fixed W ⊆ V
• Conditioning: F′(S) = F(S ∪ W) for a fixed W ⊆ V
• Reflection: F′(S) = F(V \ S)
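Each closedness property is a one-line wrapper; a sketch reusing cut_function and is_submodular from the graph-cut example above (W here is an arbitrary fixed subset I chose):

```python
# Closedness: restriction, conditioning, and reflection of a submodular F.
V = ['a', 'b', 'c']
W = {'a', 'c'}
F = cut_function({('a', 'b'): 1.0, ('b', 'c'): 2.0})

restriction  = lambda S: F(set(S) & W)       # F'(S) = F(S ∩ W)
conditioning = lambda S: F(set(S) | W)       # F'(S) = F(S ∪ W)
reflection   = lambda S: F(set(V) - set(S))  # F'(S) = F(V \ S)

for G in (restriction, conditioning, reflection):
    print(is_submodular(G, V))  # True, True, True
```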
Submodular optimization
• subset selection: min / max F(S)
• minimizing submodular functions: next
• maximizing submodular functions: afternoon
convex … and concave aspects!
Minimizing submodular functions
Why?
• energy minimization
• variational inference (marginals)
• structured sparse estimation …
How?
• graph cuts – fast, not always possible
• convex relaxations – can be fast, always possible
• …
submodularity & convexity

any set function F : 2^V → ℝ … is a function on binary vectors: a pseudo-boolean function

[Figure: as before, the set A = {a, b} corresponds to the indicator vector x_A = (1, 1, 0, 0).]
Relaxation: idea

A relaxation (extension): we have F on the vertices {0,1}^n and want an extension f on all of [0,1]^n.

Example (V = {a, b, c}): for x = (1.0, 0.5, 0.2),
f(x) = (1.0 − 0.5) F({a}) + (0.5 − 0.2) F({a, b}) + 0.2 F({a, b, c})
The Lovász extension

we have F on {0,1}^n and want an extension f on [0,1]^n: sort the entries of x decreasingly, x_π(1) ≥ … ≥ x_π(n), set S_k = {π(1), …, π(k)}, and interpolate:
f(x) = Σ_k (x_π(k) − x_π(k+1)) F(S_k), with x_π(n+1) := 0

Examples (a small implementation follows):
• truncation
• cut function: for one edge, f(x) = |x_a − x_b|, e.g. 1.0 − 0.5: the “total variation”!
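A direct implementation of this formula (function names mine; assumes F(∅) = 0):

```python
import numpy as np

def lovasz_extension(F, x, V):
    """Sort the coordinates of x decreasingly and interpolate F along the
    resulting chain of sets S_1 ⊆ S_2 ⊆ ...; the final level is taken to be 0."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)                 # largest coordinate first
    levels = list(x[order]) + [0.0]
    f, S = 0.0, set()
    for k, i in enumerate(order):
        S = S | {V[i]}
        f += (levels[k] - levels[k + 1]) * F(S)
    return f

# The extension of a single-edge cut is the total variation |x_a - x_b|:
F_edge = lambda S: 1.0 if len(S) == 1 else 0.0   # cut of the edge (a, b)
print(lovasz_extension(F_edge, [1.0, 0.5], ['a', 'b']))  # 0.5 = 1.0 - 0.5
```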
Alternative characterization

Theorem (Lovász, 1983)
The Lovász extension f is convex ⟺ F is submodular.

If F is submodular, this is equivalent to:
f(x) = max_{s ∈ B(F)} sᵀx   (B(F): the base polytope, defined on the next slide)

[Figure: for V = {a, b}, the build-up adds the constraints s_a ≤ F({a}), s_b ≤ F({b}), and s_a + s_b = F(V) one by one, carving out the polytope over which sᵀx is maximized.]
Submodular polyhedra

submodular polyhedron: P(F) = { s ∈ ℝⁿ : s(A) ≤ F(A) for all A ⊆ V }, where s(A) = Σ_{i ∈ A} s_i

Base polytope: B(F) = { s ∈ P(F) : s(V) = F(V) }. Exponentially many constraints!

Edmonds 1970: “magic”: compute argmax_{s ∈ B(F)} sᵀx in O(n log n) by the greedy algorithm (sketch below)

basis of (almost all) optimization! -- separation oracle -- subgradient --
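The “magic” is the greedy algorithm, sketched here (names mine): sort the coordinates of x decreasingly and record marginal gains along the chain. The resulting s is a vertex of B(F) maximizing sᵀx, and it is also a subgradient of the Lovász extension f at x:

```python
import numpy as np

def greedy_vertex(F, x, V):
    """Edmonds' greedy algorithm: argmax_{s in B(F)} s^T x using n sorted
    F-evaluations; s_i = F(S_k) - F(S_{k-1}) along the chain of sorted prefixes."""
    order = np.argsort(-np.asarray(x, dtype=float))
    s = np.zeros(len(V))
    S, prev = set(), 0.0
    for i in order:
        S = S | {V[i]}
        val = F(S)
        s[i] = val - prev
        prev = val
    return s

# Certifies f(x) = max_{s in B(F)} s^T x for the single-edge cut:
F_edge = lambda S: 1.0 if len(S) == 1 else 0.0
x = np.array([1.0, 0.5])
s = greedy_vertex(F_edge, x, ['a', 'b'])
print(s, s @ x)   # [ 1. -1.]  0.5, matching the Lovász extension value above
```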
Base polytopes

[Figure: P(F) and B(F) in 2D (2 elements, axes s1, s2) and in 3D (3 elements, axes s1, s2, s3).]
Convex relaxation

1. relaxation: min_{x ∈ [0,1]ⁿ} f(x) is convex optimization (non-smooth)
2. the relaxation is exact: min_{S ⊆ V} F(S) = min_{x ∈ [0,1]ⁿ} f(x)

submodular minimization in polynomial time! (Grötschel, Lovász, Schrijver 1981)
Submodular minimization

• minimize f
  – subgradient descent (sketch after this list)
  – smoothing (special cases)
• solve the dual: combinatorial algorithms
  – foundations: Edmonds, Cunningham
  – first poly-time algorithms: (Iwata-Fujishige-Fleischer 2001, Schrijver 2000)
  – many more after that …
• proximal problem on the Lovász extension: next slide
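A minimal projected-subgradient sketch for the first bullet, reusing greedy_vertex from the Edmonds slide (the greedy vertex at x is a subgradient of f; the step size and thresholded rounding are standard choices, not from the slides):

```python
import numpy as np

def minimize_submodular(F, V, iters=300):
    """Projected subgradient descent on the Lovász extension over [0,1]^n,
    rounding each iterate by thresholding. Simple and general, but slow."""
    n = len(V)
    x = 0.5 * np.ones(n)
    best_S, best_val = set(), F(set())          # the empty set is always a candidate
    for t in range(1, iters + 1):
        g = greedy_vertex(F, x, V)              # subgradient of f at x
        x = np.clip(x - g / np.sqrt(t), 0.0, 1.0)
        for theta in np.unique(x):              # rounding: best threshold set
            S = {V[i] for i in range(n) if x[i] >= theta}
            if F(S) < best_val:
                best_S, best_val = S, F(S)
    return best_S, best_val
```

Thresholding is justified by the exactness of the relaxation: some level set of a relaxed minimizer is a discrete minimizer.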
Minimum-norm-point algorithm
(Fujishige ’91, Fujishige & Isotani ’11)

dual of the proximal problem: a minimum-norm problem
s* = argmin_{s ∈ B(F)} ‖s‖²

then S* = { i : s*_i < 0 } minimizes F!

[Figure: 2D example over {a, b}; the point of B(F) closest to the origin has s*_a = −1, giving S* = {a}.]
The bigger story

projection / proximal / parametric problems; divide-and-conquer; thresholding
(Fujishige & Isotani 11, Nagano, Gallo-Grigoriadis-Tarjan 06, Hochbaum 01, Chambolle & Darbon 09, …)
Minimum-norm-point algorithm

[Figure: V = {a, b, c, d} with min-norm point s* = (−0.5, −0.5, 0.8, 1.0), so S* = {a, b}.]

1. optimization: find s* = argmin_{s ∈ B(F)} ‖s‖²
2. rounding: S* = { i : s*_i < 0 }

how to solve? The polytope B(F) has exponentially many inequalities / faces.
BUT: we can do linear optimization over B(F) (Edmonds’ greedy algorithm)
→ Frank-Wolfe or Fujishige-Wolfe algorithm
Frank-Wolfe: main idea
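Frank-Wolfe fits because its only primitive is linear optimization over B(F), which Edmonds' greedy algorithm provides. A plain Frank-Wolfe sketch for the min-norm point, reusing greedy_vertex (the actual Fujishige-Wolfe algorithm adds affine "minor cycles" over active vertex sets and converges much faster; this shows only the main idea):

```python
import numpy as np

def min_norm_point_fw(F, V, iters=2000):
    """Plain Frank-Wolfe for s* = argmin_{s in B(F)} ||s||^2.
    Linear step: argmin_{q in B(F)} <q, s> = greedy_vertex(F, -s, V)."""
    s = greedy_vertex(F, np.zeros(len(V)), V)   # start at some vertex of B(F)
    for t in range(iters):
        q = greedy_vertex(F, -s, V)             # Frank-Wolfe direction
        gamma = 2.0 / (t + 2.0)                 # standard step size
        s = (1 - gamma) * s + gamma * q
    S_star = {V[i] for i in range(len(V)) if s[i] < 0}
    return S_star, s

# Edge cut plus unary costs: F(∅)=0, F({a})=0, F({b})=1.3, F({a,b})=-0.7
c = {'a': -1.0, 'b': 0.3}
F = lambda S: (1.0 if len(S) == 1 else 0.0) + sum(c[v] for v in S)
print(min_norm_point_fw(F, ['a', 'b']))  # S* = {'a', 'b'}, s approx (-0.35, -0.35)
```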
Empirically

[Figure from Bach, 2012: convergence of the relaxation vs. convergence of the discrete solution S for the min-norm-point algorithm.]
Recap – links to convexity

• submodular function F(S)
• convex extension f(x) --- can compute it!
• submodular minimization as convex optimization --- can solve it!
• What can we do with it?
Links to convexity
• What can we do with it?
• MAP inference / energy minimization (out-of-the-box)
• variational inference (Djolonga & Krause 2014)
• structured sparsity (Bach 2010)
• decomposition & parallel algorithms
Structured sparsity and submodularity

Sparse reconstruction

Assumption: x is sparse.

subset selection: the support of x, e.g. S = {1, 3, 4, 7}

discrete regularization on the support S of x → relax to its convex envelope

but the sparsity pattern is often not random …
Structured sparsity

[Figure: linear model y ≈ M x*.]

Assumption: the support of x has structure. Express it by a set function!
Preference for trees

Set function: F(T) < F(S) if T is a tree and S is not, for |S| = |T|

use it as a regularizer?
Sparsity

discrete regularization on the support S of x → relax to the convex envelope:
• x sparse: |S| relaxes to the ℓ1 norm ‖x‖₁
• x structured sparse: a submodular function F relaxes to its Lovász extension, Ω(x) = f(|x|)

Optimization: submodular minimization (min-norm) (Bach 2010); a sketch follows below
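Concretely, reusing lovasz_extension from the earlier sketch (the set function here is a toy stand-in; Bach 2010 develops the actual structured priors), the regularizer is the Lovász extension evaluated at the vector of absolute values:

```python
import numpy as np

def omega(F, x, V):
    """Structured-sparsity regularizer Omega(x) = f(|x|): the Lovász extension
    of a nondecreasing submodular F, evaluated coordinate-wise at |x|."""
    return lovasz_extension(F, np.abs(np.asarray(x, dtype=float)), V)

# Sanity check: for the cardinality function F(S) = |S|, Omega is the l1 norm.
V = ['a', 'b', 'c']
F_card = lambda S: float(len(S))
print(omega(F_card, [0.5, -2.0, 0.0], V))  # 2.5 = |0.5| + |-2.0| + |0.0|
```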
Special case

• minimize a sum of submodular functions F(S) = Σ_i F_i(S), each F_i “easy”
• combinatorial algorithms (Kolmogorov 12, Fix-Joachims-Park-Zabih 13, Fix-Wang-Zabih 14)
• convex relaxations
Relaxation

• convex Lovász extension: min_x Σ_i f_i(x) is a tight relaxation
• dual decomposition: parallel algorithms (Komodakis-Paragios-Tziritas 11, Savchynskyy-Schmidt-Kappes-Schnörr 11, J-Bach-Sra 13)
Results: dual decomposition

[Plot: log10(duality gap) vs. iteration (20 to 100) for subgrad, BCD, DR, fista-smooth, dual-dec, primal-smoothed, shown for the discrete problem and two relaxations; the smooth dual gives faster, parallel algorithms than the non-smooth dual.]
(Jegelka, Bach, Sra 2013; Nishihara, Jegelka, Jordan 2014)
Summary
• Submodular functions – diminishing returns/costs
• convex relaxations:
– exact relaxation
– structured norms
– fast algorithms
• more soon:
– constraints
– maximization: diversity, information