Tutorial on Learning and Inference in Discrete Graphical Models

CVPR 2014

Karteek Alahari, Dhruv Batra, Matthew Blaschko, Stephen Gould, Pushmeet Kohli, Nikos Komodakis, Nikos Paragios

28 June 2014

Stephen Gould | CVPR 2014


Graphical Models are Pervasive in Computer Vision

pixel labeling
object detection, pose estimation
scene understanding

Tutorial Overview

Part 1. Inference

(S. Gould, 45 minutes) Exact inference in graphical models; graph-cut based methods; relaxations and dual-decomposition

(P. Kohli, 45 minutes) Strategies for higher-order models

(D. Batra, 15 minutes) M-Best MAP, Diverse M-Best

Part 2. Learning

(M. Blaschko, 45 minutes) Introduction to learning of graphical models; maximum-likelihood learning, max-margin learning; max-margin training via subgradient method

(K. Alahari, 45 minutes) Constraint generation approaches for structured learning; efficient training of graphical models via dual-decomposition

Conditional Markov Random Fields

Also known as: Markov Networks, Undirected Graphical Models, MRFs. I make no distinction between CRFs and MRFs.

X ∈ 𝒳 are the observed random variables (always)

Y = (Y_1, ..., Y_n) ∈ 𝒴 are the output random variables

Y_c are a subset of variables for clique c ⊆ {1, ..., n}

Define a factored probability distribution

P(Y | X) = (1 / Z(X)) ∏_c Ψ_c(Y_c; X)

where Z(X) = ∑_{Y ∈ 𝒴} ∏_c Ψ_c(Y_c; X) is the partition function.

Main difficulty is the exponential number of configurations.
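The factored distribution and its partition function can be made concrete with a small sketch (my own toy example, not from the tutorial: binary variables and hand-picked potential tables) that evaluates Z by brute-force enumeration, which is exactly the exponential cost noted above:

```python
import itertools
import math

# Toy pairwise MRF over three binary variables with cliques {1,2} and {2,3}.
# The potential tables Psi_c are assumptions chosen to prefer agreement.
psi_12 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
psi_23 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}

def unnormalized(y):
    """Product of clique potentials prod_c Psi_c(y_c) for y = (y1, y2, y3)."""
    return psi_12[(y[0], y[1])] * psi_23[(y[1], y[2])]

# Partition function: a sum over all 2^3 configurations -- this exhaustive
# enumeration is the "exponential number of configurations" difficulty.
Z = sum(unnormalized(y) for y in itertools.product((0, 1), repeat=3))

def prob(y):
    return unnormalized(y) / Z

assert math.isclose(sum(prob(y) for y in itertools.product((0, 1), repeat=3)), 1.0)
```

With only three variables the sum has 8 terms; for n pixels it would have 2^n, which is why the rest of the tutorial is about avoiding this enumeration.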


MAP Inference

We will mainly be interested in maximum a posteriori (MAP) inference:

y* = argmax_{y ∈ 𝒴} P(y | x)
   = argmax_{y ∈ 𝒴} (1 / Z(x)) ∏_c Ψ_c(y_c; x)
   = argmax_{y ∈ 𝒴} log( (1 / Z(x)) ∏_c Ψ_c(y_c; x) )
   = argmax_{y ∈ 𝒴} ∑_c log Ψ_c(y_c; x) − log Z(x)
   = argmax_{y ∈ 𝒴} ∑_c log Ψ_c(y_c; x)
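The derivation rests on two facts: log is monotone, and log Z(x) does not depend on y. A quick sketch (toy unnormalized scores, assumed purely for illustration) checks that the argmax is unchanged at each step:

```python
import math

# Toy unnormalized scores prod_c Psi_c(y_c; x) for three candidate labelings.
scores = {"a": 2.0, "b": 5.0, "c": 1.0}
Z = sum(scores.values())

best_prob = max(scores, key=lambda y: scores[y] / Z)           # argmax of P
best_log = max(scores, key=lambda y: math.log(scores[y] / Z))  # log is monotone
best_drop_Z = max(scores, key=lambda y: math.log(scores[y]))   # log Z is constant in y
assert best_prob == best_log == best_drop_Z == "b"
```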

Energy Functions

Define an energy function

E(Y; X) = ∑_c ψ_c(Y_c; X)

where ψ_c(·) = − log Ψ_c(·). Then

P(Y | X) = (1 / Z(X)) exp{−E(Y; X)}

and

argmax_{y ∈ 𝒴} P(y | x) = argmin_{y ∈ 𝒴} E(y; x)

energy minimization 'equals' MAP inference
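A small numerical check of this equivalence (with an assumed toy potential table): convert Ψ to energies via ψ = −log Ψ and compare the maximizer of P with the minimizer of E:

```python
import itertools
import math

# Toy pairwise potential table Psi over two binary variables (an assumption).
Psi = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 4.0}
psi = {y: -math.log(v) for y, v in Psi.items()}    # energies psi = -log Psi

configs = list(itertools.product((0, 1), repeat=2))
Z = sum(math.exp(-psi[y]) for y in configs)        # Z from the energy form

y_map = max(configs, key=lambda y: math.exp(-psi[y]) / Z)   # argmax P(y)
y_min = min(configs, key=lambda y: psi[y])                  # argmin E(y)
assert y_map == y_min == (1, 1)
```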

Clique Potentials

A clique potential ψ_c(y_c; x) defines a mapping from an assignment of the random variables to a real number:

ψ_c : 𝒴_c × 𝒳 → ℝ

The clique potential encodes a preference for assignments to the random variables (lower value is more preferred).

Often parameterized as

ψ_c(y_c; x) = w_c^T φ_c(y_c; x)

But in this part of the tutorial it suffices to think of the clique potentials as big lookup tables.

We will also ignore the conditioning on X (in this part).

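The two views connect directly: a linearly parameterized potential, once the observation x is fixed, can be materialized as a lookup table. A minimal sketch (the weights and feature map below are my own assumptions, not from the slides):

```python
# Sketch of the linear parameterization psi_c(y_c; x) = w_c^T phi_c(y_c; x).
# The weights and feature map are assumptions chosen for illustration only.
w_c = (1.0, -2.0)

def phi_c(y_c, x):
    """Toy features for a pairwise clique: [label disagreement, x-weighted y_i]."""
    yi, yj = y_c
    return (float(yi != yj), x * yi)

def psi_c(y_c, x):
    """Linear potential: inner product of weights and features."""
    return sum(w * f for w, f in zip(w_c, phi_c(y_c, x)))

# For a fixed observation x, materialize the same potential as a
# "big lookup table" over all joint assignments, as this part of the
# tutorial suggests (the conditioning on X then disappears from view).
x = 0.5
table = {(yi, yj): psi_c((yi, yj), x) for yi in (0, 1) for yj in (0, 1)}
assert min(table, key=table.get) == (1, 1)   # lower value = more preferred
```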

Clique Potential Arity

E(y; x) = ∑_c ψ_c(y_c; x)
        = ∑_{i ∈ V} ψ^U_i(y_i; x) + ∑_{ij ∈ E} ψ^P_ij(y_i, y_j; x) + ∑_{c ∈ C} ψ^H_c(y_c; x)

where the three sums collect the unary, pairwise, and higher-order potentials respectively.

(Figure: a 3×3 grid of observed variables x_1, ..., x_9, each connected to its output variable y_1, ..., y_9, with pairwise edges between neighbouring outputs.)
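The unary + pairwise decomposition is the workhorse for pixel labeling. A sketch on an assumed tiny 2×2 grid (the slide's figure uses 3×3; the term values are toy assumptions) evaluates such an energy and finds its minimizer by brute force:

```python
import itertools

# Tiny 2x2 grid: sites V and 4-connected edges E4 (toy stand-in for the 3x3 figure).
V = [(r, c) for r in range(2) for c in range(2)]
E4 = [((r, 0), (r, 1)) for r in range(2)] + [((0, c), (1, c)) for c in range(2)]

def unary(i, yi):
    """psi^U_i: per-site data term (assumed values)."""
    return 0.0 if yi == 0 else 0.4

def pairwise(yi, yj):
    """psi^P_ij: Potts smoothness term penalizing label disagreement."""
    return 0.0 if yi == yj else 1.0

def energy(y):
    """E(y) = sum of unary terms + sum of pairwise terms over grid edges."""
    return (sum(unary(i, y[i]) for i in V)
            + sum(pairwise(y[i], y[j]) for i, j in E4))

# Brute force over all 2^4 labelings (viable only because the grid is tiny).
best = min(itertools.product((0, 1), repeat=4),
           key=lambda lab: energy(dict(zip(V, lab))))
assert best == (0, 0, 0, 0)   # all-zero labeling has zero energy here
```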

Graphical Representation

E(y) = ψ(y_1, y_2) + ψ(y_2, y_3) + ψ(y_3, y_4) + ψ(y_4, y_1)

(Figure: left, the graphical model, a 4-cycle over Y_1, Y_2, Y_3, Y_4; right, the factor graph over Y_1, ..., Y_4 with one factor node per pairwise term.)

Graphical Representation

E(y) = ∑_{i,j} ψ(y_i, y_j)

(Figure: the fully connected graphical model over Y_1, ..., Y_4 and the corresponding factor graph with six pairwise factors.)

Graphical Representation

E(y) = ψ(y_1, y_2, y_3, y_4)

(Figure: the same four variables with a single fourth-order clique, and the factor graph with one factor connected to all of Y_1, ..., Y_4.)

Don't worry too much about the graphical representation; look at the form of the energy function.

MAP Inference / Energy Minimization

Computing the energy minimizing assignment is NP-hard in general:

argmin_{y ∈ 𝒴} E(y; x) = argmax_{y ∈ 𝒴} P(y | x)

Some structures admit tractable exact inference algorithms:
low treewidth graphs → message passing
submodular potentials → graph-cuts

Moreover, efficient approximate inference algorithms exist:
message passing on general graphs
move-making inference (submodular moves)
linear programming relaxations


exact inference

An Example: Chain Graph

E(y) = ψ_A(y_1, y_2) + ψ_B(y_2, y_3) + ψ_C(y_3, y_4)

(Figure: the chain Y_1 - Y_2 - Y_3 - Y_4, or equivalently the clique tree with nodes {Y_1, Y_2}, {Y_2, Y_3}, {Y_3, Y_4}.)

min_y E(y) = min_{y_1, y_2, y_3, y_4} ψ_A(y_1, y_2) + ψ_B(y_2, y_3) + ψ_C(y_3, y_4)
           = min_{y_1, y_2, y_3} ψ_A(y_1, y_2) + ψ_B(y_2, y_3) + m_{C→B}(y_3)
           = min_{y_1, y_2} ψ_A(y_1, y_2) + m_{B→A}(y_2)

where m_{C→B}(y_3) = min_{y_4} ψ_C(y_3, y_4) and m_{B→A}(y_2) = min_{y_3} [ψ_B(y_2, y_3) + m_{C→B}(y_3)].

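The elimination above runs directly as code. This sketch uses assumed toy potential tables over binary labels and checks the message-passing result against brute-force enumeration:

```python
import itertools

L = (0, 1)
# Assumed toy pairwise potentials for the chain (not from the slides).
psi_A = {(a, b): abs(a - b) + 0.1 * a for a in L for b in L}
psi_B = {(b, c): abs(b - c) for b in L for c in L}
psi_C = {(c, d): abs(c - d) + 0.3 * (1 - d) for c in L for d in L}

# m_{C->B}(y3) = min_{y4} psi_C(y3, y4)
m_CB = {y3: min(psi_C[(y3, y4)] for y4 in L) for y3 in L}
# m_{B->A}(y2) = min_{y3} psi_B(y2, y3) + m_{C->B}(y3)
m_BA = {y2: min(psi_B[(y2, y3)] + m_CB[y3] for y3 in L) for y2 in L}
# min_y E(y) = min_{y1,y2} psi_A(y1, y2) + m_{B->A}(y2)
min_E = min(psi_A[(y1, y2)] + m_BA[y2] for y1 in L for y2 in L)

# Sanity check against brute force over all L^4 assignments.
brute = min(psi_A[(a, b)] + psi_B[(b, c)] + psi_C[(c, d)]
            for a, b, c, d in itertools.product(L, repeat=4))
assert min_E == brute
```

Each message is a table indexed by the shared (separator) variable, so eliminating a variable costs a minimum over its labels rather than a pass over all joint assignments.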

Viterbi Decoding

E(y) = ψ_A(y_1, y_2) + ψ_B(y_2, y_3) + ψ_C(y_3, y_4)

The energy minimizing assignment can be decoded as

y_1* = argmin_{y_1} min_{y_2} ψ_A(y_1, y_2) + m_{B→A}(y_2)
y_2* = argmin_{y_2} ψ_A(y_1*, y_2) + m_{B→A}(y_2)
y_3* = argmin_{y_3} ψ_B(y_2*, y_3) + m_{C→B}(y_3)
y_4* = argmin_{y_4} ψ_C(y_3*, y_4)

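The decoding pass can be sketched as follows (self-contained, with the same assumed toy potentials as the chain example; note how each argmin is conditioned on the previous choice, front to back):

```python
L = (0, 1)
# Same assumed toy potentials as the chain example.
psi_A = {(a, b): abs(a - b) + 0.1 * a for a in L for b in L}
psi_B = {(b, c): abs(b - c) for b in L for c in L}
psi_C = {(c, d): abs(c - d) + 0.3 * (1 - d) for c in L for d in L}

# Backward messages, as on the previous slide.
m_CB = {y3: min(psi_C[(y3, y4)] for y4 in L) for y3 in L}
m_BA = {y2: min(psi_B[(y2, y3)] + m_CB[y3] for y3 in L) for y2 in L}

# Decode front-to-back, conditioning each argmin on the previous choice.
y1 = min(L, key=lambda v: min(psi_A[(v, y2)] + m_BA[y2] for y2 in L))
y2 = min(L, key=lambda v: psi_A[(y1, v)] + m_BA[v])
y3 = min(L, key=lambda v: psi_B[(y2, v)] + m_CB[v])
y4 = min(L, key=lambda v: psi_C[(y3, v)])
assert (y1, y2, y3, y4) == (1, 1, 1, 1)   # minimizer for these toy tables
```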

What did this cost us?

For a chain Y_1 - Y_2 - ... - Y_n of length n with L labels per variable:

Brute force enumeration would cost |𝒴| = L^n

Viterbi decoding (message passing) costs O(nL^2)

The operation min ψ(·, ·) + m(·) can be sped up for potentials with certain structure (e.g., so-called convex priors)


Min-Sum Message Passing on Clique Trees

Messages are sent in reverse and then forward topological ordering. The message from clique i to clique j is calculated as

m_{i→j}(Y_i ∩ Y_j) = min_{Y_i \ Y_j} ( ψ_i(Y_i) + ∑_{k ∈ N(i)\{j}} m_{k→i}(Y_i ∩ Y_k) )

The energy minimizing assignment is decoded from the min-marginals as

y_i* = argmin_{Y_i} ( ψ_i(Y_i) + ∑_{k ∈ N(i)} m_{k→i}(Y_i ∩ Y_k) )

Ties must be decoded consistently.


Min-Sum Message Passing on Factor Graphs (Trees)

Messages from variables to factors:

m_{i→F}(y_i) = ∑_{G ∈ N(i)\{F}} m_{G→i}(y_i)

Messages from factors to variables:

m_{F→i}(y_i) = min_{y'_F : y'_i = y_i} ( ψ_F(y'_F) + ∑_{j ∈ N(F)\{i}} m_{j→F}(y'_j) )

The energy minimizing assignment is decoded as

y_i* = argmin_{y_i} ∑_{F ∈ N(i)} m_{F→i}(y_i)

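These updates can be sketched on the smallest interesting tree: an assumed chain y_1 - F - y_2 - G - y_3 with binary labels and toy factor tables. Leaf variables send all-zero messages, each factor minimizes over its other arguments, and each variable is decoded from its incoming factor messages:

```python
L = (0, 1)

# Assumed toy factors (energies): F couples y1,y2; G couples y2,y3.
def F(a, b):
    return 0.0 if a == b else 1.0

def G(b, c):
    return 0.2 * b + (0.0 if b == c else 1.0)

# Variable-to-factor messages from the leaves y1 and y3 (no other neighbours,
# so the sum over N(i)\{F} is empty and the message is identically zero).
m_y1_F = {v: 0.0 for v in L}
m_y3_G = {v: 0.0 for v in L}

# Factor-to-variable: m_{F->i}(y_i) = min over the factor's other argument.
m_F_y2 = {b: min(F(a, b) + m_y1_F[a] for a in L) for b in L}
m_G_y2 = {b: min(G(b, c) + m_y3_G[c] for c in L) for b in L}
m_F_y1 = {a: min(F(a, b) + m_G_y2[b] for b in L) for a in L}  # y2->F carries m_G_y2
m_G_y3 = {c: min(G(b, c) + m_F_y2[b] for b in L) for c in L}  # y2->G carries m_F_y2

# Decode each variable from the sum of its incoming factor messages.
y1 = min(L, key=lambda v: m_F_y1[v])
y2 = min(L, key=lambda v: m_F_y2[v] + m_G_y2[v])
y3 = min(L, key=lambda v: m_G_y3[v])
assert (y1, y2, y3) == (0, 0, 0)   # matches the minimum-energy labeling
```

The decoded minimum is unique here; with ties, the per-variable argmins would have to be resolved consistently, as noted on the clique-tree slide.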

Message Passing on General Graphs

Message passing can be generalized to graphs with loops.

If the treewidth is small we can still perform exact inference:
junction tree algorithm: triangulate the graph and run message passing on the resulting tree

Otherwise run message passing anyway:
loopy belief propagation
different message schedules (synchronous/asynchronous, static/dynamic)
no convergence or approximation guarantees


Message Passing on General Graphs

Message passing can be generalized to graphs with loops

If the treewidth is small we can still perform exact inference

junction tree algorithm: triangulate the graph and runmessage passing on the resulting tree

Otherwise run message passing anyway

loopy belief propagtaiondifferent message schedules (synchronous/asynchronous,static/dynamic)no convergence or approximation guarantees

Stephen Gould | CVPR 2014 20/76

Page 61: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Message Passing on General Graphs

Message passing can be generalized to graphs with loops

If the treewidth is small we can still perform exact inference

junction tree algorithm: triangulate the graph and runmessage passing on the resulting tree

Otherwise run message passing anyway

loopy belief propagtaiondifferent message schedules (synchronous/asynchronous,static/dynamic)no convergence or approximation guarantees

Stephen Gould | CVPR 2014 20/76

Page 62:

graph-cut based methods

Stephen Gould | CVPR 2014 21/76

Page 63:

Binary MRF Example

Consider the following energy function for two binary random variables, y1 and y2, with unary potentials ψ1(0) = 5, ψ1(1) = 2 and ψ2(0) = 1, ψ2(1) = 3, and pairwise potential ψ12(0, 0) = 0, ψ12(0, 1) = 3, ψ12(1, 0) = 4, ψ12(1, 1) = 0:

E(y1, y2) = ψ1(y1) + ψ2(y2) + ψ12(y1, y2)
          = 5ȳ1 + 2y1 + ȳ2 + 3y2 + 3ȳ1y2 + 4y1ȳ2

where ȳ1 = 1 − y1 and ȳ2 = 1 − y2.

Graphical Model: two nodes y1 and y2 joined by a single pairwise edge.

Probability Table (P ∝ exp(−E)):

y1  y2  |  E  |  P
 0   0  |  6  | 0.244
 0   1  | 11  | 0.002
 1   0  |  7  | 0.090
 1   1  |  5  | 0.664

Stephen Gould | CVPR 2014 22/76
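The energy and probability columns above can be reproduced by brute-force enumeration; a minimal Python sketch (variable names are my own):

```python
import itertools
import math

# Unary and pairwise potentials from the slide's binary MRF example.
psi1 = {0: 5, 1: 2}
psi2 = {0: 1, 1: 3}
psi12 = {(0, 0): 0, (0, 1): 3, (1, 0): 4, (1, 1): 0}

def energy(y1, y2):
    return psi1[y1] + psi2[y2] + psi12[(y1, y2)]

# Enumerate all assignments to recover the energy/probability table.
assignments = list(itertools.product([0, 1], repeat=2))
energies = {y: energy(*y) for y in assignments}
Z = sum(math.exp(-e) for e in energies.values())   # partition function
probs = {y: math.exp(-energies[y]) / Z for y in assignments}
```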

Page 66:

Pseudo-boolean Functions [Boros and Hammer, 2001]

Pseudo-boolean Function

A mapping f : {0, 1}^n → R is called a pseudo-Boolean function.

Pseudo-boolean functions can be uniquely represented as multilinear polynomials, e.g., f(y1, y2) = 6 + y1 + 5y2 − 7y1y2.

Pseudo-boolean functions can also be represented in posiform, e.g., f(y1, y2) = 2y1 + 5ȳ1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2. This representation is not unique.

A binary pairwise Markov random field (MRF) is just a quadratic pseudo-Boolean function.

Stephen Gould | CVPR 2014 23/76
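The two representations on this slide describe the same function, which is easy to verify by evaluating both on all binary inputs (my own sketch):

```python
import itertools

# Multilinear polynomial form of the example pseudo-Boolean function.
def f_poly(y1, y2):
    return 6 + y1 + 5 * y2 - 7 * y1 * y2

# Posiform of the same function (bar denotes complement: ybar = 1 - y).
def f_posiform(y1, y2):
    n1, n2 = 1 - y1, 1 - y2
    return 2 * y1 + 5 * n1 + 3 * y2 + n2 + 3 * n1 * y2 + 4 * y1 * n2

# The two representations agree on every binary input.
assert all(f_poly(a, b) == f_posiform(a, b)
           for a, b in itertools.product([0, 1], repeat=2))
```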


Page 70:

Submodular Functions

Submodularity

Let V be a set. A set function f : 2^V → R is called submodular if f(X) + f(Y) ≥ f(X ∪ Y) + f(X ∩ Y) for all subsets X, Y ⊆ V.

Stephen Gould | CVPR 2014 24/76

Page 71:

Submodular Binary Pairwise MRFs

Submodularity

A pseudo-Boolean function f : {0, 1}^n → R is called submodular if f(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y) for all vectors x, y ∈ {0, 1}^n.

Submodularity checks for pairwise binary MRFs:

polynomial form (of the pseudo-boolean function) has negative coefficients on all bilinear terms;

posiform has pairwise terms of the form uv̄ (a literal times a complemented literal);

all pairwise potentials satisfy ψij(0, 1) + ψij(1, 0) ≥ ψij(1, 1) + ψij(0, 0).

Stephen Gould | CVPR 2014 25/76
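The last check is a one-line test on each pairwise table; a tiny sketch (my own code), applied to the pairwise potential from the earlier binary MRF example:

```python
# Pairwise submodularity condition: psi(0,1) + psi(1,0) >= psi(1,1) + psi(0,0).
def is_submodular(psi):
    """psi maps (y1, y2) in {0,1}^2 to a real value."""
    return psi[(0, 1)] + psi[(1, 0)] >= psi[(1, 1)] + psi[(0, 0)]

# The pairwise potential from the earlier binary MRF example.
psi12 = {(0, 0): 0, (0, 1): 3, (1, 0): 4, (1, 1): 0}
assert is_submodular(psi12)  # 3 + 4 >= 0 + 0
```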


Page 73:

Submodularity of Binary Pairwise Terms

To see the equivalence of the last two conditions, consider the following pairwise potential with ψ(0,0) = α, ψ(0,1) = β, ψ(1,0) = γ, ψ(1,1) = δ. It decomposes as

  [ α  β ]        [  0     0  ]   [ 0  δ−γ ]   [ 0  β+γ−α−δ ]
  [ γ  δ ]  = α + [ γ−α   γ−α ] + [ 0  δ−γ ] + [ 0     0    ]

that is,

E(y1, y2) = α + (γ − α)y1 + (δ − γ)y2 + (β + γ − α − δ)ȳ1y2

so the coefficient of the pairwise posiform term ȳ1y2 is non-negative exactly when β + γ ≥ α + δ, i.e. ψ(0,1) + ψ(1,0) ≥ ψ(1,1) + ψ(0,0).

[Kolmogorov and Zabih, 2004]

Stephen Gould | CVPR 2014 26/76
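The decomposition on this slide is easy to check mechanically; a small sketch (my own helper names), using the earlier pairwise table [[0, 3], [4, 0]] as the example:

```python
# Decompose a 2x2 pairwise potential psi(0,0)=alpha, psi(0,1)=beta,
# psi(1,0)=gamma, psi(1,1)=delta into a constant, two unary terms, and a
# single complemented bilinear term, as on the slide.
def decompose(alpha, beta, gamma, delta):
    const = alpha
    c1 = gamma - alpha                   # coefficient of y1
    c2 = delta - gamma                   # coefficient of y2
    c12 = beta + gamma - alpha - delta   # coefficient of (1 - y1) * y2
    return const, c1, c2, c12

def reconstruct(coeffs, y1, y2):
    const, c1, c2, c12 = coeffs
    return const + c1 * y1 + c2 * y2 + c12 * (1 - y1) * y2

# Example: the pairwise potential [[0, 3], [4, 0]] from the earlier slides.
coeffs = decompose(0, 3, 4, 0)
table = {(0, 0): 0, (0, 1): 3, (1, 0): 4, (1, 1): 0}
assert all(reconstruct(coeffs, *y) == v for y, v in table.items())
assert coeffs[3] >= 0  # non-negative bilinear coefficient <=> submodular
```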


Page 76:

Minimum-cut Problem

Graph Cut

Let G = ⟨V, E⟩ be a capacitated digraph with two distinguished vertices s and t. An st-cut is a partitioning of V into two disjoint sets S and T such that s ∈ S and t ∈ T. The cost of the cut is the sum of edge capacities for all edges going from S to T.

[figure: example digraph with vertices s, u, v, t]

Stephen Gould | CVPR 2014 27/76

Page 77:

Quadratic Pseudo-boolean Optimization

Main idea:

construct a graph such that every st-cut corresponds to a joint assignment to the variables y

the cost of the cut should be equal to the energy of the assignment, E(y; x)*

the minimum cut then corresponds to the minimum energy assignment, y⋆ = argmin_y E(y; x)

*Requires non-negative edge weights.

Stephen Gould | CVPR 2014 28/76


Page 80:

Example st-Graph Construction for Binary MRF

E(y1, y2) = ψ1(y1) + ψ2(y2) + ψij(y1, y2)
          = 2y1 + 5ȳ1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2

[figure: an st-graph with internal nodes y1 and y2, one terminal side labelled 1 and the other 0; the unary terms add edges with capacities 5 and 2 (for y1) and 1 and 3 (for y2), and the pairwise terms add edges with capacities 3 and 4 between y1 and y2]

Stephen Gould | CVPR 2014 29/76
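For this two-variable example the minimum-energy assignment, and hence the value of the minimum cut, can be checked by brute force; a quick sketch (my own code, not from the tutorial):

```python
import itertools

# Posiform energy from the construction above (complement: n = 1 - y).
def E(y1, y2):
    n1, n2 = 1 - y1, 1 - y2
    return 2 * y1 + 5 * n1 + 3 * y2 + n2 + 3 * n1 * y2 + 4 * y1 * n2

best = min(itertools.product([0, 1], repeat=2), key=lambda y: E(*y))
assert best == (1, 1) and E(*best) == 5  # the minimum-cost st-cut has cost 5
```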

Page 85:

An Example st-Cut

E(0, 1) = ψ1(0) + ψ2(1) + ψij(0, 1)
        = 2y1 + 5ȳ1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2 evaluated at (y1, y2) = (0, 1)
        = 5 + 3 + 3 = 11

[figure: the st-graph from the previous slide with the cut corresponding to (y1, y2) = (0, 1) highlighted]

Stephen Gould | CVPR 2014 30/76

Page 86:

Another st-Cut

E(1, 1) = ψ1(1) + ψ2(1) + ψij(1, 1)
        = 2y1 + 5ȳ1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2 evaluated at (y1, y2) = (1, 1)
        = 2 + 3 + 0 = 5

[figure: the st-graph from the previous slide with the cut corresponding to (y1, y2) = (1, 1) highlighted]

Stephen Gould | CVPR 2014 31/76

Page 87:

Invalid st-Cut

This is not a valid cut, since it does not correspond to a partitioning of the nodes into two sets, one containing s and one containing t.

[figure: the same st-graph with a set of severed edges that does not separate s from t]

Stephen Gould | CVPR 2014 32/76

Page 88:

Alternative st-Graph Construction

Sometimes you will see the roles of s and t switched.

[figure: two st-graphs over y1 and y2; in the second the source and sink roles are swapped, so the unary capacities 5, 2, 1, 3 and pairwise capacities 3, 4 trade places accordingly]

These graphs represent the same energy function.

Stephen Gould | CVPR 2014 33/76

Page 89:

Big Picture: Where are we?

We can now formulate inference in a submodular binary pairwise MRF, E : {0, 1}^n → R, as a minimum-cut problem.

How do we solve the minimum-cut problem?

Stephen Gould | CVPR 2014 34/76

Page 90:

Max-flow/Min-cut Theorem

Max-flow/Min-cut Theorem [Ford and Fulkerson, 1956]

The value of the maximum flow from vertex s to vertex t is equal to the cost of the minimum st-cut.

[figure: example digraph with vertices s, u, v, t]

Stephen Gould | CVPR 2014 35/76

Page 91:

Maximum Flow Example

[figure: flow network with vertices s, a, b, c, d, t and edge capacities s→a = 5, s→b = 3, a→b = 3, a→c = 5, b→d = 2, c→d = 1, c→t = 3, d→t = 5]

Stephen Gould | CVPR 2014 36/76

Page 92:

Maximum Flow Example (Augmenting Path)

Notation: an edge labelled f/c has capacity c and current flow f.

Starting from zero flow, repeatedly find an augmenting path from s to t in the residual graph and push the bottleneck flow along it:

flow 0 → 3: push 3 along s→a→c→t (s→a 3/5, a→c 3/5, c→t 3/3)
flow 3 → 5: push 2 along s→a→b→d→t (s→a 5/5, a→b 2/3, b→d 2/2, d→t 2/5)
flow 5 → 6: push 1 along s→b, back along the residual of a→b, then a→c→d→t (s→b 1/3, a→b 1/3, a→c 4/5, c→d 1/1, d→t 3/5)

No augmenting path remains, so the maximum flow is 6.

Stephen Gould | CVPR 2014 37/76
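The augmenting-path procedure traced above can be sketched as a small Edmonds–Karp implementation; the graph is the example network from the slides, but the code itself is my own illustration:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    flow = 0
    # Residual capacities, including zero-capacity reverse edges.
    res = {u: dict(vs) for u, vs in cap.items()}
    nodes = set(cap) | {v for vs in cap.values() for v in vs}
    for u in nodes:
        res.setdefault(u, {})
    for u in list(res):
        for v in cap.get(u, {}):
            res[v].setdefault(u, 0)
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck and update residual capacities.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

# Example network from the slides.
cap = {'s': {'a': 5, 'b': 3}, 'a': {'b': 3, 'c': 5},
       'b': {'d': 2}, 'c': {'d': 1, 't': 3}, 'd': {'t': 5}}
print(max_flow(cap, 's', 't'))  # -> 6
```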

Page 100:

Maximum Flow Example (Push-Relabel)

Notation: an edge labelled f/c has capacity c and current flow f. Each vertex u maintains a height h(u) and an excess e(u); flow is pushed from u to v along a residual edge only when h(u) > h(v), and a vertex with excess but no admissible edge is relabelled.

Trace on the example network, with h(s) = 6 = |V| throughout and e(s) = ∞:

1. Initialize: saturate the edges out of s (s→a 5/5, s→b 3/3), giving e(a) = 5, e(b) = 3.
2. Relabel a to h = 1; push a→b 3/3 and a→c 2/5, so e(a) = 0, e(b) = 6, e(c) = 2.
3. Relabel b to h = 1; push b→d 2/2, so e(b) = 4, e(d) = 2.
4. Relabel c to h = 1; push c→d 1/1 and c→t 1/3, so e(c) = 0, e(d) = 3, e(t) = 1.
5. Relabel d to h = 1; push d→t 3/5, so e(d) = 0, e(t) = 4.
6. Relabel b to h = 2; push 3 back along the residual of a→b (now 0/3), so e(a) = 3, e(b) = 1.
7. Relabel a to h = 2; push a→c 5/5, so e(a) = 0, e(c) = 3.
8. Push c→t 3/3, so e(c) = 1, e(t) = 6.
9. Relabel b to h = 7; return b's remaining excess to s (s→b 2/3), so e(b) = 0.
10. Relabel c to h = 3; the remaining unit of excess drains back toward s. The flow into t is 6, which is the maximum.

Stephen Gould | CVPR 2014 39/76

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 2 1b 7 0c 3 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 129: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 4 1b 7 0c 3 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 130: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 4 1b 7 0c 3 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 131: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

5/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 4 0b 7 0c 3 1d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 132: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

5/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 4 0b 7 0c 5 1d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 133: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

5/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 4 0b 7 0c 5 1d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 134: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 4 1b 7 0c 5 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 135: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 6 1b 7 0c 5 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 136: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 6 1b 7 0c 5 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 137: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

5/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 6 0b 7 0c 5 1d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 138: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

5/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 6 0b 7 0c 7 1d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 139: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

5/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 6 0b 7 0c 7 1d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 140: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 6 1b 7 0c 7 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 141: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 7 1b 7 0c 7 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 142: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

5/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 7 1b 7 0c 7 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 143: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

4/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 7 0b 7 0c 7 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 39/76

Page 144: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Maximum Flow Example (Push-Relabel)

s

a b

c d

t

4/5 2/30/3

4/5 2/2

1/1

3/3 3/5

state

h(·) e(·)s 6 ∞a 7 0b 7 0c 7 0d 1 0t 0 6

notation

u vf /c

edge with capacity c ,current flow f .

Stephen Gould | CVPR 2014 40/76
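The trace above can be reproduced in code. Since the exact edge endpoints of the slide's example graph are not fully recoverable from the extraction, the sketch below is a generic FIFO push-relabel implementation run on a small graph of my own choosing; `push_relabel_max_flow` and the test graph are illustrative, not the tutorial's code.

```python
from collections import deque

def push_relabel_max_flow(cap, s, t):
    """Generic FIFO push-relabel maximum flow.

    cap: dict-of-dicts of edge capacities, cap[u][v] = c.
    Returns the value of the maximum s-t flow.
    """
    nodes = set(cap)
    for u in cap:
        nodes.update(cap[u])
    n = len(nodes)
    # residual capacities (reverse edges start at 0)
    res = {u: {} for u in nodes}
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] = res[u].get(v, 0) + c
            res[v].setdefault(u, 0)
    h = {u: 0 for u in nodes}        # heights
    e = {u: 0 for u in nodes}        # excesses
    h[s] = n                         # source starts at height n
    active = deque()
    for v, c in list(res[s].items()):
        if c > 0:                    # saturate all source edges
            res[s][v] -= c
            res[v][s] += c
            e[v] += c
            if v != t:
                active.append(v)
    while active:
        u = active.popleft()
        while e[u] > 0:
            pushed = False
            for v, c in res[u].items():
                if c > 0 and h[u] == h[v] + 1:   # admissible edge
                    d = min(e[u], c)             # push d units u -> v
                    res[u][v] -= d
                    res[v][u] += d
                    e[u] -= d
                    e[v] += d
                    if v not in (s, t) and e[v] > 0 and v not in active:
                        active.append(v)
                    pushed = True
                    if e[u] == 0:
                        break
            if e[u] == 0:
                break
            if not pushed:                       # relabel u
                h[u] = 1 + min(h[v] for v, c in res[u].items() if c > 0)
    return e[t]
```

A usage example: `push_relabel_max_flow({'s': {'a': 3, 'b': 2}, 'a': {'t': 2, 'b': 1}, 'b': {'t': 3}}, 's', 't')` computes the maximum flow of that small network.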

Comparison of Maximum Flow Algorithms

The current state-of-the-art algorithm for exact minimization of general submodular pseudo-Boolean functions is O(n^5 T + n^6), where T is the time taken to evaluate the function [Orlin, 2009].

Algorithm              Complexity
Ford-Fulkerson         O(E max|f|)†
Edmonds-Karp (BFS)     O(V E^2)
Push-relabel           O(V^3)
Boykov-Kolmogorov      O(V^2 E max|f|)  (~O(V) in practice)

† assumes integer capacities

Stephen Gould | CVPR 2014 41/76
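As a concrete reference point for one row of the table, here is a minimal Edmonds-Karp sketch (Ford-Fulkerson with BFS shortest augmenting paths, which gives the O(VE^2) bound); the function name and example graph are my own, not from the tutorial.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Edmonds-Karp: Ford-Fulkerson with BFS shortest augmenting paths."""
    res = {}                     # residual capacities keyed by (u, v)
    nodes = set()
    for u, vs in cap.items():
        nodes.add(u)
        for v, c in vs.items():
            nodes.add(v)
            res[(u, v)] = res.get((u, v), 0) + c
            res.setdefault((v, u), 0)
    adj = {u: [] for u in nodes}
    for (u, v) in res:
        adj[u].append(v)
    flow = 0
    while True:
        # BFS for the shortest augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow          # no augmenting path: flow is maximum
        # find the bottleneck capacity, then augment along the path
        bottleneck = float('inf')
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            res[(parent[v], v)] -= bottleneck
            res[(v, parent[v])] += bottleneck
            v = parent[v]
        flow += bottleneck
```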

Maximum Flow (Boykov-Kolmogorov, PAMI 2004)

growth stage: search trees rooted at s and t grow until they touch

augmentation stage: the path found is augmented; trees break into forests

adoption stage: trees are restored

Stephen Gould | CVPR 2014 42/76

Reparameterization of Energy Functions

Writing ȳ = 1 − y for the complement of a binary variable, the energy

E(y1, y2) = 2 ȳ1 + 5 y1 + 3 ȳ2 + y2 + 3 y1 ȳ2 + 4 ȳ1 y2

[Figure: s-t graph encoding this energy, with edge capacities 5, 2, 1, 3, 3, 4.]

can be reparameterized, giving the same energy for every assignment, as

E(y1, y2) = 6 y1 + 5 ȳ2 + 7 ȳ1 y2

[Figure: equivalent s-t graph with edge capacities 6, 5, 7.]

Stephen Gould | CVPR 2014 47/76
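That the two forms are reparameterizations of one another can be checked exhaustively, since there are only four assignments. A small sketch, writing each complement ȳ as 1 − y:

```python
from itertools import product

def E_orig(y1, y2):
    # 2*ybar1 + 5*y1 + 3*ybar2 + y2 + 3*y1*ybar2 + 4*ybar1*y2
    return 2*(1 - y1) + 5*y1 + 3*(1 - y2) + y2 + 3*y1*(1 - y2) + 4*(1 - y1)*y2

def E_reparam(y1, y2):
    # 6*y1 + 5*ybar2 + 7*ybar1*y2
    return 6*y1 + 5*(1 - y2) + 7*(1 - y1)*y2

# identical on all four assignments, hence a valid reparameterization
for y1, y2 in product((0, 1), repeat=2):
    assert E_orig(y1, y2) == E_reparam(y1, y2)
```

The four energies are 5, 7, 11 and 6 for (0,0), (0,1), (1,0) and (1,1) respectively, under both parameterizations.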

Big Picture: Where are we now?

We can perform inference in submodular binary pairwise Markov random fields exactly.

E : {0, 1}^n → R

What about...

non-submodular binary pairwise Markov random fields?

multi-label Markov random fields?

higher-order Markov random fields? (later)

Stephen Gould | CVPR 2014 48/76


Non-submodular Binary Pairwise MRFs

Non-submodular binary pairwise MRFs have potentials that do not satisfy

ψP_ij(0, 1) + ψP_ij(1, 0) ≥ ψP_ij(1, 1) + ψP_ij(0, 0).

They are often handled in one of the following ways:

approximate the energy function by one that is submodular (i.e., project onto the space of submodular functions);

solve a relaxation of the problem using QPBO (Rother et al., 2007) or dual-decomposition (Komodakis et al., 2007).

Stephen Gould | CVPR 2014 49/76


Approximating Non-submodular Binary Pairwise MRFs

Consider the non-submodular potential

    A  B
    C  D

with A + D > B + C.

We can project onto a submodular potential by modifying the coefficients as follows:

Δ = A + D − B − C

A ← A − Δ/3
B ← B + Δ/3
C ← C + Δ/3

(After the update A + D = B + C, so the modified potential lies exactly on the submodular boundary.)

Stephen Gould | CVPR 2014 50/76
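The projection rule above can be sketched directly; `project_submodular` is an illustrative name and the example potential values are my own:

```python
def project_submodular(A, B, C, D):
    """Project the pairwise potential [[A, B], [C, D]] (rows: y_i = 0/1,
    columns: y_j = 0/1) onto the submodular set by spreading
    Delta = A + D - B - C equally over three coefficients."""
    delta = A + D - B - C
    if delta <= 0:
        return A, B, C, D                    # already submodular
    return A - delta / 3, B + delta / 3, C + delta / 3, D

A, B, C, D = project_submodular(4, 1, 1, 2)  # delta = 4
assert abs((A + D) - (B + C)) < 1e-9         # now on the submodular boundary
```

Spreading Δ equally over three coefficients is one choice; any update that removes the violation A + D − B − C would do.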

QPBO (Roof Duality) [Rother et al., 2007]

Consider the energy function

E(y) = Σ_{i∈V} ψU_i(y_i) + Σ_{ij∈E} ψP_ij(y_i, y_j)   [submodular terms]
                         + Σ_{ij∈E} ψP_ij(y_i, y_j)   [non-submodular terms]

We can introduce duplicate variables ȳ_i into the energy function, and write

E'(y, ȳ) = Σ_{i∈V} [ψU_i(y_i) + ψU_i(1 − ȳ_i)] / 2
         + Σ_{ij submodular} [ψP_ij(y_i, y_j) + ψP_ij(1 − ȳ_i, 1 − ȳ_j)] / 2
         + Σ_{ij non-submodular} [ψP_ij(y_i, 1 − ȳ_j) + ψP_ij(1 − ȳ_i, y_j)] / 2

Stephen Gould | CVPR 2014 51/76


QPBO (Roof Duality)

E'(y, ȳ) = Σ_{i∈V} [ ½ ψU_i(y_i) + ½ ψU_i(1 − ȳ_i) ]
         + Σ_{ij∈E} [ ½ ψP_ij(y_i, y_j) + ½ ψP_ij(1 − ȳ_i, 1 − ȳ_j) ]
         + Σ_{ij∈E} [ ½ ψP_ij(y_i, 1 − ȳ_j) + ½ ψP_ij(1 − ȳ_i, y_j) ]

Observations

if ȳ_i = 1 − y_i for all i, then E(y) = E'(y, ȳ).
E'(y, ȳ) is submodular.

Ignore the constraint on ȳ_i and solve anyway. The result satisfies partial optimality: if ȳ_i = 1 − y_i then y_i is the optimal label.

Stephen Gould | CVPR 2014 52/76
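The first observation, that E(y) = E'(y, ȳ) whenever ȳ = 1 − y, is easy to verify numerically. A toy sketch with a single non-submodular edge (the potential values are my own; in a larger problem the submodular edges would keep their direct coupling as in the formula above):

```python
from itertools import product

# toy 2-node energy with one non-submodular pairwise term
psiU = [[0, 2], [1, 0]]      # psiU[i][yi]
psiP = [[3, 0], [0, 3]]      # psiP(0,1)+psiP(1,0) = 0 < psiP(0,0)+psiP(1,1) = 6

def E(y):
    return psiU[0][y[0]] + psiU[1][y[1]] + psiP[y[0]][y[1]]

def E_prime(y, ybar):
    # doubled QPBO energy: the non-submodular edge couples y_i with the
    # complemented copy 1 - ybar_j (and symmetrically)
    val = 0.5 * (psiU[0][y[0]] + psiU[0][1 - ybar[0]])
    val += 0.5 * (psiU[1][y[1]] + psiU[1][1 - ybar[1]])
    val += 0.5 * (psiP[y[0]][1 - ybar[1]] + psiP[1 - ybar[0]][y[1]])
    return val

# when ybar = 1 - y, every duplicated half collapses back onto the original
for y in product((0, 1), repeat=2):
    ybar = (1 - y[0], 1 - y[1])
    assert E(y) == E_prime(y, ybar)
```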


Multi-label Markov Random Fields

The quadratic pseudo-Boolean optimization techniques described above cannot be applied directly to multi-label MRFs.

However...

...for certain MRFs we can transform the multi-label problem into a binary one exactly.

...we can project the multi-label problem onto a series of binary problems in a so-called move-making algorithm.

Stephen Gould | CVPR 2014 53/76


The "Battleship" Transform [Ishikawa, 2003]

If the multi-label MRF has pairwise potentials that are convex functions of the label differences, i.e., ψP_ij(y_i, y_j) = g(|y_i − y_j|) where g(·) is convex, then we can transform the energy function into an equivalent binary one.

y = 1 ⇔ z = (0, 0, 0)
y = 2 ⇔ z = (1, 0, 0)
y = 3 ⇔ z = (1, 1, 0)
y = 4 ⇔ z = (1, 1, 1)

[Figure: s-t graph implementing the transform, with one column of three binary nodes per variable.]

Stephen Gould | CVPR 2014 54/76
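The label encoding used by the transform can be sketched as follows; the graph construction itself, including the edges that force each column to be monotone, is omitted here, and `encode`/`decode` are illustrative names:

```python
def encode(y, num_labels):
    """Unary ("battleship") encoding: label y in {1, ..., L} maps to L - 1
    stacked binary variables, the first y - 1 of which are 1."""
    return tuple(1 if k < y - 1 else 0 for k in range(num_labels - 1))

def decode(z):
    # a valid (monotone) binary column decodes back to the label
    return 1 + sum(z)

assert encode(1, 4) == (0, 0, 0)
assert encode(4, 4) == (1, 1, 1)
assert all(decode(encode(y, 4)) == y for y in range(1, 5))
```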

Move-making Inference

Idea:

initialize y^prev to any valid assignment

restrict the label-space of each variable y_i from L to Y_i ⊆ L (with y^prev_i ∈ Y_i)

transform E : L^n → R to E : Y_1 × · · · × Y_n → R

find the optimal assignment y for E and repeat

each move results in an assignment with energy no higher than the previous one

Stephen Gould | CVPR 2014 55/76

Iterated Conditional Modes [Besag, 1986]

Reduce multi-variate inference to solving a series of univariate inference problems.

ICM move

For one of the variables y_i, set Y_i = L. Set Y_j = {y^prev_j} for all j ≠ i (i.e., hold all other variables fixed).

Can be used for arbitrary energy functions.

Stephen Gould | CVPR 2014 56/76
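An ICM sketch for a small chain MRF, assuming table-form unary and pairwise costs; the function name and the toy problem are my own:

```python
import random

def icm(unary, pairwise, num_labels, iters=20, seed=0):
    """Iterated Conditional Modes on a chain MRF.

    unary[i][l]: cost of variable i taking label l
    pairwise[i][li][lj]: cost between variables i and i + 1
    """
    rng = random.Random(seed)
    n = len(unary)
    y = [rng.randrange(num_labels) for _ in range(n)]
    for _ in range(iters):
        changed = False
        for i in range(n):
            def cost(l):
                c = unary[i][l]
                if i > 0:
                    c += pairwise[i - 1][y[i - 1]][l]
                if i < n - 1:
                    c += pairwise[i][l][y[i + 1]]
                return c
            best = min(range(num_labels), key=cost)
            if best != y[i]:
                y[i] = best          # univariate move: all others held fixed
                changed = True
        if not changed:              # local minimum w.r.t. single-variable moves
            break
    return y

# toy chain: unaries prefer [0, 1, 0]; Potts pairwise penalizes disagreement
unary = [[0, 5], [5, 0], [0, 5]]
pairwise = [[[0, 2], [2, 0]], [[0, 2], [2, 0]]]
labels = icm(unary, pairwise, 2)
```

On this toy problem the only single-flip local minimum is [0, 1, 0] (energy 4), so ICM reaches it from any initialization.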


Alpha Expansion and Alpha-Beta Swap [Boykov et al., 2001]

Reduce multi-label inference to solving a series of binary (submodular) inference problems.

α-expansion move

Choose some α ∈ L. Then for all variables, set Y_i = {α, y^prev_i}.

ψP_ij(·, ·) must be a metric for the resulting move to be submodular.

αβ-swap move

Choose two labels α, β ∈ L. Then for each variable y_i such that y^prev_i ∈ {α, β}, set Y_i = {α, β}. Otherwise set Y_i = {y^prev_i}.

ψP_ij(·, ·) must be a semi-metric.

Stephen Gould | CVPR 2014 57/76

Alpha Expansion Potential Construction

y^next_i = y^prev_i if t_i = 1,  α if t_i = 0

E(t) = Σ_i [ ψ_i(α) (1 − t_i) + ψ_i(y^prev_i) t_i ]
     + Σ_ij [ ψ_ij(α, α) (1 − t_i)(1 − t_j) + ψ_ij(α, y^prev_j) (1 − t_i) t_j
            + ψ_ij(y^prev_i, α) t_i (1 − t_j) + ψ_ij(y^prev_i, y^prev_j) t_i t_j ]

Stephen Gould | CVPR 2014 58/76
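The construction can be checked by brute force: for every binary move vector t, the move energy must equal the original multi-label energy of the induced labeling. A sketch with an illustrative 3-variable, 3-label Potts problem of my own:

```python
from itertools import product

def move_energy(t, y_prev, alpha, psi_u, psi_p, edges):
    """Binary alpha-expansion energy E(t): t_i = 1 keeps y_prev_i,
    t_i = 0 switches to alpha."""
    e = sum(psi_u[i][alpha] * (1 - t[i]) + psi_u[i][y_prev[i]] * t[i]
            for i in range(len(t)))
    for (i, j) in edges:
        e += (psi_p[alpha][alpha] * (1 - t[i]) * (1 - t[j])
              + psi_p[alpha][y_prev[j]] * (1 - t[i]) * t[j]
              + psi_p[y_prev[i]][alpha] * t[i] * (1 - t[j])
              + psi_p[y_prev[i]][y_prev[j]] * t[i] * t[j])
    return e

def full_energy(y, psi_u, psi_p, edges):
    return sum(psi_u[i][y[i]] for i in range(len(y))) + \
           sum(psi_p[y[i]][y[j]] for (i, j) in edges)

# toy problem: 3 variables, 3 labels, shared Potts pairwise potential
psi_u = [[1, 0, 2], [0, 3, 1], [2, 2, 0]]
psi_p = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
edges = [(0, 1), (1, 2)]
y_prev, alpha = [0, 1, 2], 1
for t in product((0, 1), repeat=3):
    y_next = [y_prev[i] if t[i] == 1 else alpha for i in range(3)]
    assert move_energy(t, y_prev, alpha, psi_u, psi_p, edges) == \
           full_energy(y_next, psi_u, psi_p, edges)
```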


relaxations and dual decomposition

Stephen Gould | CVPR 2014 59/76

Mathematical Programming Formulation

Let θ_{c,y_c} ≜ ψ_c(y_c) and let μ_{c,y_c} ≜ 1 if Y_c = y_c, and 0 otherwise. Then

argmin_{y∈Y} Σ_c ψ_c(y_c)

is equivalent to the binary integer program

minimize (over μ)   θ^T μ
subject to          μ_{c,y_c} ∈ {0, 1},   ∀c, y_c ∈ Y_c
                    Σ_{y_c} μ_{c,y_c} = 1,   ∀c
                    Σ_{y_c \ y_i} μ_{c,y_c} = μ_{i,y_i},   ∀i ∈ c, y_i ∈ Y_i

Stephen Gould | CVPR 2014 60/76


Binary Integer Program: Example

Consider the energy function E(y1, y2) = ψ1(y1) + ψ12(y1, y2) + ψ2(y2) for binary variables y1 and y2.

θ = ( ψ1(0), ψ1(1), ψ2(0), ψ2(1), ψ12(0,0), ψ12(1,0), ψ12(0,1), ψ12(1,1) )^T

μ = ( μ_{1,0}, μ_{1,1}, μ_{2,0}, μ_{2,1}, μ_{12,00}, μ_{12,10}, μ_{12,01}, μ_{12,11} )^T

subject to

μ_{1,0} + μ_{1,1} = 1
μ_{2,0} + μ_{2,1} = 1
μ_{12,00} + μ_{12,10} + μ_{12,01} + μ_{12,11} = 1
μ_{12,00} + μ_{12,01} = μ_{1,0}
μ_{12,10} + μ_{12,11} = μ_{1,1}
μ_{12,00} + μ_{12,10} = μ_{2,0}
μ_{12,01} + μ_{12,11} = μ_{2,1}

Stephen Gould | CVPR 2014 61/76
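The example can be checked numerically: for every assignment, the objective θ^T μ must equal E(y1, y2). A sketch with illustrative potential values of my own:

```python
def theta_vector(psi1, psi2, psi12):
    # ordering matches the slide: unaries for y1, y2, then the pairwise block
    # in the order (0,0), (1,0), (0,1), (1,1)
    return [psi1[0], psi1[1], psi2[0], psi2[1],
            psi12[0][0], psi12[1][0], psi12[0][1], psi12[1][1]]

def mu_vector(y1, y2):
    mu = [0] * 8
    mu[y1] = 1                 # mu_{1, y1}
    mu[2 + y2] = 1             # mu_{2, y2}
    mu[4 + y1 + 2 * y2] = 1    # mu_{12, y1 y2}, same ordering as theta
    return mu

psi1, psi2 = [1, 4], [2, 0]
psi12 = [[0, 3], [2, 1]]
theta = theta_vector(psi1, psi2, psi12)
for y1 in (0, 1):
    for y2 in (0, 1):
        E = psi1[y1] + psi2[y2] + psi12[y1][y2]
        assert sum(t * m for t, m in zip(theta, mu_vector(y1, y2))) == E
```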


Local Marginal Polytope

M = { μ ≥ 0 |  Σ_{y_i} μ_{i,y_i} = 1, ∀i ;   Σ_{y_c \ y_i} μ_{c,y_c} = μ_{i,y_i}, ∀i ∈ c, y_i ∈ Y_i }

M is tight if the factor graph is a tree.

For cyclic graphs M may contain fractional vertices.

For submodular energies, fractional solutions are never optimal.

Stephen Gould | CVPR 2014 62/76


Linear Programming (LP) Relaxation

Binary integer program:

minimize (over μ)   θ^T μ
subject to          μ_{c,y_c} ∈ {0, 1},  μ ∈ M

Linear program:

minimize (over μ)   θ^T μ
subject to          μ_{c,y_c} ∈ [0, 1],  μ ∈ M

Solution by standard LP solvers is typically infeasible due to the large number of variables and constraints.

More easily solved via coordinate ascent on the dual.

Solutions need to be rounded or decoded.

Stephen Gould | CVPR 2014 63/76
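Decoding is the simplest of the post-processing options mentioned above: take a per-variable argmax over the (possibly fractional) unary pseudo-marginals. A minimal sketch; `decode_lp` is an illustrative name:

```python
def decode_lp(mu_unary):
    """Round a fractional LP solution by a per-variable argmax of the unary
    pseudo-marginals (ties broken towards the lower label)."""
    return [max(range(len(m)), key=lambda l: m[l]) for m in mu_unary]

assert decode_lp([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]) == [0, 1, 0]
```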


Dual Decomposition: Rewriting the Primal

minimize (over μ)   Σ_c θ_c^T μ_c
subject to          μ ∈ M

⇕  (pad each θ_c with zeros, so θ̄_c acts on the full vector μ)

minimize (over μ)   Σ_c θ̄_c^T μ
subject to          μ ∈ M

⇕  (introduce copies of μ)

minimize (over μ, {μ_c})   Σ_c θ̄_c^T μ_c
subject to                 μ_c = μ,  μ ∈ M

Stephen Gould | CVPR 2014 64/76


Page 187: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition

Dual Decomposition: Forming the Dual

Primal problem

minimize (over µ, {µ_c})   ∑_c θ_c^T µ_c
subject to                 µ_c = µ
                           µ ∈ M

Introducing dual variables λ_c we have the Lagrangian

L(µ, {µ_c}, {λ_c}) = ∑_c θ_c^T µ_c + ∑_c λ_c^T (µ_c − µ)
                   = ∑_c (θ_c + λ_c)^T µ_c − (∑_c λ_c)^T µ

Stephen Gould | CVPR 2014 65/76


Dual Decomposition

maximize (over {λ_c})   min_{µ_c} ∑_c (θ_c + λ_c)^T µ_c
subject to              ∑_c λ_c = 0

⇕

maximize (over {λ_c})   ∑_c min_{µ_c} (θ_c + λ_c)^T µ_c
subject to              ∑_c λ_c = 0

⇕

maximize (over {λ_c})   ∑_c min_{y_c} (ψ_c(y_c) + λ_c(y_c))
subject to              ∑_c λ_c = 0

Stephen Gould | CVPR 2014 66/76


Dual Lower Bound

E(y) = ∑_c ψ_c(y_c)
     = ∑_c (ψ_c(y_c) + λ_c(y_c))    (iff ∑_c λ_c(y_c) = 0)

min_y E(y) ≥ ∑_c min_{y_c} (ψ_c(y_c) + λ_c(y_c))

min_y E(y) ≥ max_{{λ_c} : ∑_c λ_c = 0}  ∑_c min_{y_c} (ψ_c(y_c) + λ_c(y_c))

Stephen Gould | CVPR 2014 67/76
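The lower bound can be checked numerically on a toy problem. The sketch below uses a single binary variable y shared by two cliques, with hypothetical potentials; λ is parameterized as λ_1(y) = λy and λ_2(y) = −λy so that ∑_c λ_c = 0 holds by construction, and every λ on the grid is verified to give a lower bound on min_y E(y).

```python
# Numerical check of the dual lower bound: one binary variable y shared by
# two cliques, E(y) = psi1(y) + psi2(y). Potentials and the lambda grid are
# illustrative values, not from the tutorial.

psi1 = {0: 0.0, 1: 2.0}
psi2 = {0: 3.0, 1: 0.0}

E_min = min(psi1[y] + psi2[y] for y in (0, 1))  # true minimum energy

best_bound = float("-inf")
for lam in [x / 10.0 for x in range(-30, 31)]:
    # lambda_1(y) = lam * y, lambda_2(y) = -lam * y, so they sum to zero
    bound = min(psi1[y] + lam * y for y in (0, 1)) + \
            min(psi2[y] - lam * y for y in (0, 1))
    assert bound <= E_min + 1e-9   # every feasible lambda lower-bounds E_min
    best_bound = max(best_bound, bound)

print(E_min, best_bound)
```

For this tiny example the best λ on the grid makes the bound tight (best_bound equals E_min); in general the maximized dual may leave an integrality gap.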


Subgradients

Subgradient

A subgradient of a function f at x is any vector g satisfying

f(y) ≥ f(x) + g^T (y − x)   for all y

Stephen Gould | CVPR 2014 68/76
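The definition can be verified numerically for the classic nondifferentiable example f(x) = |x|. The helper below checks the subgradient inequality on a grid of test points; the test values are arbitrary.

```python
# Checking the subgradient inequality f(y) >= f(x) + g*(y - x) for f(x) = |x|.
# At the kink x = 0 any g in [-1, 1] is a subgradient; away from 0 the
# subgradient is unique (the sign of x). Test points are illustrative.

f = abs

def is_subgradient(g, x, points):
    return all(f(y) >= f(x) + g * (y - x) - 1e-12 for y in points)

points = [y / 4.0 for y in range(-20, 21)]

assert is_subgradient(0.5, 0.0, points)      # any g in [-1, 1] works at 0
assert is_subgradient(-1.0, 0.0, points)
assert not is_subgradient(2.0, 0.0, points)  # slope too steep: fails at y = 1
assert is_subgradient(1.0, 3.0, points)      # unique subgradient for x > 0
print("subgradient checks passed")
```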


Subgradient Method

The basic subgradient method is an algorithm for minimizing a nondifferentiable convex function f : R^n → R.

x^(k+1) = x^(k) − α_k g^(k)

x^(k) is the k-th iterate
g^(k) is any subgradient of f at x^(k)
α_k > 0 is the k-th step size

It is possible that −g^(k) is not a descent direction for f at x^(k), so we keep track of the best point found so far,

f_best^(k) = min{ f_best^(k−1), f(x^(k)) }

Stephen Gould | CVPR 2014 69/76
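The update rule above can be sketched in a few lines. This is a minimal illustration on f(x) = |x − 3| with a diminishing step size α_k = 1/k; the target, starting point, and iteration count are arbitrary choices, not from the tutorial.

```python
# Minimal subgradient method for f(x) = |x - 3| with best-value tracking.

def f(x):
    return abs(x - 3.0)

def subgrad(x):
    # a subgradient of |x - 3|: sign(x - 3), with 0 allowed at the kink
    return 1.0 if x > 3.0 else (-1.0 if x < 3.0 else 0.0)

x = 0.0
f_best = f(x)
for k in range(1, 2001):
    x = x - (1.0 / k) * subgrad(x)   # x^(k+1) = x^(k) - alpha_k g^(k)
    f_best = min(f_best, f(x))       # iterates need not descend monotonically

print(f_best)
```

Note that the iterates oscillate around the minimizer once they reach it, which is exactly why f_best is tracked separately.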


Step Size Rules

Step sizes are chosen ahead of time (unlike line search in ordinary gradient methods). A few common step size schedules are:

constant step size: α_k = α
constant step length: α_k = γ / ‖g^(k)‖_2
square summable but not summable: ∑_{k=1}^∞ α_k^2 < ∞, ∑_{k=1}^∞ α_k = ∞
nonsummable diminishing: lim_{k→∞} α_k = 0, ∑_{k=1}^∞ α_k = ∞
nonsummable diminishing step lengths: α_k = γ_k / ‖g^(k)‖_2 with lim_{k→∞} γ_k = 0, ∑_{k=1}^∞ γ_k = ∞

Stephen Gould | CVPR 2014 70/76
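The schedules can be sketched as plain functions of the iteration index k and the current subgradient g; the constants α and γ and the summation horizon below are arbitrary illustrative values.

```python
# Sketch of the step size schedules above, with quick empirical checks.
import math

alpha = 0.1      # constant step size
gamma = 0.5      # constant step length / diminishing-length scale

def constant_size(k, g):   return alpha
def constant_length(k, g): return gamma / abs(g)        # alpha_k = gamma / ||g||
def square_summable(k, g): return 1.0 / k               # sum 1/k^2 < inf, sum 1/k = inf
def diminishing(k, g):     return gamma / math.sqrt(k)  # alpha_k -> 0, sum = inf

# constant step length: the actual step alpha_k * ||g|| is always gamma
for g in (0.2, -5.0, 17.0):
    assert abs(constant_length(1, g) * abs(g) - gamma) < 1e-12

# 1/k is square summable but not summable (partial sums over a finite horizon)
sum_sq = sum(square_summable(k, 1.0) ** 2 for k in range(1, 10**5))
sum_lin = sum(square_summable(k, 1.0) for k in range(1, 10**5))
print(sum_sq, sum_lin)  # sum_sq stays near pi^2/6 while sum_lin keeps growing
```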


Convergence Results

For constant step size and constant step length, the subgradient algorithm will converge to within some range of the optimal value,

lim_{k→∞} f_best^(k) < f⋆ + ε

For the diminishing step size and step length rules the algorithm converges to the optimal value,

lim_{k→∞} f_best^(k) = f⋆

but may take a very long time to converge.

Stephen Gould | CVPR 2014 71/76


Optimal Step Size for Known f⋆

Assume we know f⋆ (we just don’t know x⋆). Then

α_k = (f(x^(k)) − f⋆) / ‖g^(k)‖_2^2

is an optimal step size in some sense, called the Polyak step size.

A good approximation when f⋆ is not known (but non-negative) is

α_k = (f(x^(k)) − γ · f_best^(k−1)) / ‖g^(k)‖_2^2

where 0 < γ < 1.

Stephen Gould | CVPR 2014 72/76
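The Polyak rule can be illustrated on f(x) = |x|, where f⋆ = 0 is known. In this 1-D example the step α_k = (f(x^(k)) − f⋆)/‖g^(k)‖_2^2 lands exactly on the minimizer in one update; the starting point is arbitrary.

```python
# Polyak step size on f(x) = |x| with known optimal value f* = 0.

f_star = 0.0

def f(x):
    return abs(x)

def g(x):
    return 1.0 if x >= 0 else -1.0   # a subgradient of |x|

x = 7.5
for _ in range(5):
    alpha = (f(x) - f_star) / g(x) ** 2   # Polyak step size
    x = x - alpha * g(x)

print(x)
```

Once x reaches 0 the Polyak step size is itself 0, so the iterate stays put.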


Projected Subgradient Method

One extension of the subgradient method is the projected subgradient method, which solves problems of the form

minimize    f(x)
subject to  x ∈ C

Here the updates are

x^(k+1) = P_C(x^(k) − α_k g^(k))

The projected subgradient method has similar convergence guarantees to the subgradient method.

Stephen Gould | CVPR 2014 73/76
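The projected update can be sketched for a simple box constraint, where P_C is just clipping. The objective, interval, and step schedule below are illustrative choices; the unconstrained minimizer (x = 5) lies outside C, so the method should settle on the boundary point.

```python
# Projected subgradient sketch: minimize f(x) = |x - 5| subject to x in [0, 2].

def f(x):
    return abs(x - 5.0)

def g(x):
    return 1.0 if x >= 5.0 else -1.0       # a subgradient of f

def project(x, lo=0.0, hi=2.0):            # P_C for the box constraint C = [lo, hi]
    return max(lo, min(hi, x))

x = 0.0
for k in range(1, 500):
    x = project(x - (1.0 / k) * g(x))      # x^(k+1) = P_C(x^(k) - alpha_k g^(k))

print(x)  # settles at 2.0, the constrained minimizer on the boundary of C
```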


Supergradient of min_i {a_i^T x + b_i}

Consider f(x) = min_i {a_i^T x + b_i} and let I(x) = argmin_i {a_i^T x + b_i}. Then for any i ∈ I(x), g = a_i is a supergradient of f at x:

f(x) + g^T (z − x) = f(x) + a_i^T (z − x)                  i ∈ I(x)
                   = f(x) − a_i^T x − b_i + a_i^T z + b_i
                   = a_i^T z + b_i                          (since f(x) = a_i^T x + b_i)
                   ≥ f(z)

Stephen Gould | CVPR 2014 74/76
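The argument can be checked numerically in one dimension: take the slope of any active line as the supergradient and verify the supergradient inequality f(z) ≤ f(x) + g(z − x) on a grid. The lines (a_i, b_i), the evaluation point, and the grid are hypothetical.

```python
# Verifying that g = a_i (for an active i) is a supergradient of the concave
# piecewise-linear f(x) = min_i {a_i x + b_i} in one dimension.

lines = [(1.0, 0.0), (-0.5, 3.0), (0.0, 2.5)]   # illustrative (a_i, b_i) pairs

def f(x):
    return min(a * x + b for a, b in lines)

def supergradient(x):
    a, b = min(lines, key=lambda ab: ab[0] * x + ab[1])  # an active line i in I(x)
    return a

x = 4.0
g = supergradient(x)
# supergradient inequality: f(z) <= f(x) + g * (z - x) for all z
for z in [v / 2.0 for v in range(-10, 21)]:
    assert f(z) <= f(x) + g * (z - x) + 1e-12
print(g)
```

This is exactly the quantity needed by dual decomposition: the slave minimizers supply supergradients of the concave dual.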


Dual Decomposition Inference

initialize λ_c = 0
loop
    slaves solve min_{y_c} ψ_c(y_c) + λ_c(y_c)   (to get µ⋆_c)
    master updates λ_c as
        λ_c ← λ_c + α ( µ⋆_c − (1/C) ∑_{c′} µ⋆_{c′} )
until convergence

Stephen Gould | CVPR 2014 75/76
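The master–slave loop above can be sketched on a toy model: a single binary variable y shared by C = 2 cliques with E(y) = ψ_0(y) + ψ_1(y), where the slaves initially disagree. The potentials, step size, and iteration count are illustrative choices, not from the tutorial.

```python
# Dual decomposition inference on a toy two-clique model over one binary y.

psi = [[0.0, 2.0],   # slave 0 prefers y = 0
       [3.0, 0.0]]   # slave 1 prefers y = 1
C = len(psi)
lam = [[0.0, 0.0] for _ in range(C)]   # lambda_c(y), initialized to 0
alpha = 0.5                            # fixed step size (illustrative)

for _ in range(100):
    # slaves: each solves its local problem min_y psi_c(y) + lambda_c(y)
    mu = []
    for c in range(C):
        y_star = min((0, 1), key=lambda y: psi[c][y] + lam[c][y])
        mu.append([1.0 if y == y_star else 0.0 for y in (0, 1)])
    # master: push each lambda_c toward the average slave assignment
    avg = [sum(mu[c][y] for c in range(C)) / C for y in (0, 1)]
    for c in range(C):
        for y in (0, 1):
            lam[c][y] += alpha * (mu[c][y] - avg[y])

dual = sum(min(psi[c][y] + lam[c][y] for y in (0, 1)) for c in range(C))
primal = min(psi[0][y] + psi[1][y] for y in (0, 1))
print(dual, primal)
```

The update preserves ∑_c λ_c = 0 (the per-y increments sum to zero across slaves), and once the slaves agree the update vanishes; here the dual value reaches the primal optimum.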


Tutorial Overview

Part 1. Inference

(S. Gould, 45 minutes)
Exact inference in graphical models
Graph-cut based methods
Relaxations and dual-decomposition

(P. Kohli, 45 minutes)
Strategies for higher-order models

(D. Batra, 15 minutes)
M-Best MAP, Diverse M-Best

Part 2. Learning

(M. Blaschko, 45 minutes)
Introduction to learning of graphical models
Maximum-likelihood learning, max-margin learning
Max-margin training via subgradient method

(K. Alahari, 45 minutes)
Constraint generation approaches for structured learning
Efficient training of graphical models via dual-decomposition

Stephen Gould | CVPR 2014 76/76