Generative Models for Image Analysis
Stuart Geman (with E. Borenstein, L.-B. Chang, W. Zhang)
TRANSCRIPT

I. Bayesian (generative) image models
II. Feature distributions and data distributions
III. Conditional modeling
IV. Sampling and the choice of null distribution
V. Other applications of conditional modeling
I. Bayesian (generative) image models

Prior: $P(x)$, a probability model on $I$
  $I$ = set of possible "interpretations" or "parses"
  $x \in I$ = a particular interpretation
* very structured and constrained
* organizing principles: hierarchy and reusability
  (Amit, Buhmann, Poggio, Yuille, Zhu, etc.)
* non-Markovian (context/content sensitive)
Conditional likelihood: $P(y\,|\,x)$, a conditional probability model
  $y$ = image

Posterior: $P(x\,|\,y) \propto P(y\,|\,x)\,P(x)$

Focus here on $P(y\,|\,x)$.
II. Feature distributions and data distributions
$y = \{y_s\}_{s \in S}$ = image patch, $y_s$ = pixel intensity at $s \in S$

"Feature" $f(y)$, e.g.
* variance of patch
* histogram of gradients, SIFT features, etc.
* template correlation

$P(f\,|\,g)$ = probability model under "category" $g$
  (edge, corner, eye, face, ...)
Model patch through a feature model:
e.g. detection and recognition of eyes
$y = \{y_s\}_{s \in S}$ = image patch, $y_s$ = pixel intensity at $s \in S$
Consider
$$f_T(y) = c_T(y) = \mathrm{corr}(T, y) = \frac{\frac{1}{|S|}\sum_{s}(T_s - \bar{T})(y_s - \bar{y})}{\sqrt{\frac{1}{|S|}\sum_{s}(T_s - \bar{T})^2}\,\sqrt{\frac{1}{|S|}\sum_{s}(y_s - \bar{y})^2}}$$
and the model
$$P(c_T(y)\,|\,e) = C_T\, e^{-\lambda(1 - c_T(y))}$$
Problem: given samples of eye patches $y^1, \dots, y^N$, learn $\lambda$ and $T$.
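As a minimal sketch (illustrative only; the template and patches here are random, not learned), the normalized correlation above reduces to an inner product of mean-centered, norm-standardized vectors, since the $1/|S|$ factors cancel:

```python
import numpy as np

def corr(T, y):
    """Normalized template correlation c_T(y): the inner product of the
    mean-centered, norm-standardized template and patch."""
    Tc = T - T.mean()
    yc = y - y.mean()
    return float(Tc.ravel() @ yc.ravel()
                 / (np.linalg.norm(Tc) * np.linalg.norm(yc)))

rng = np.random.default_rng(0)
T = rng.random((15, 15))                    # hypothetical eye template
noisy = T + 0.1 * rng.normal(size=T.shape)  # patch resembling the template
random_patch = rng.random((15, 15))         # unrelated patch

assert abs(corr(T, T) - 1.0) < 1e-9
assert corr(T, noisy) > corr(T, random_patch)
```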
[Figure: density of eye-patch correlations $c_T(y)$ on $[-1, 1]$]
Actually, it is tempting to PRETEND that the data is $c_T(y^1), \dots, c_T(y^N)$:
$$L(c_T(y^1), \dots, c_T(y^N)) = \prod_{k=1}^{N} P(c_T(y^k)\,|\,e) = \prod_{k=1}^{N} C_T\, e^{-\lambda(1 - c_T(y^k))}$$
Caution: $P(c_T(y)\,|\,e) = C_T\, e^{-\lambda(1 - c_T(y))}$ is different from $P_Y(y\,|\,e) = \frac{1}{Z_T}\, e^{-\lambda(1 - c_T(y))}$.
BUT the data is $y^1, \dots, y^N$, and
$$P_Y(y) = P(c_T(y))\, P(y\,|\,C_T = c_T(y))$$
$$L(y^1, \dots, y^N) = \prod_{k=1}^{N} P_Y(y^k) = \prod_{k=1}^{N} C_T\, e^{-\lambda(1 - c_T(y^k))}\, P(y^k\,|\,C_T = c_T(y^k))$$
The first factor is fine for estimating $\lambda$, but not for estimating $T$.
Use maximum likelihood… but what is the likelihood?
III. Conditional modeling

For any category $g$ (e.g. "eye") and feature $F$:
$$P_{Y^g}(y) = P_{F^g}(f(y))\, P(y\,|\,F(Y) = f(y))$$
Easy to model $P_{F^g}(f)$; hard to model $P(y\,|\,F = f)$.
Principle: start with a "null" or "background" distribution $P^0_Y(y)$ and choose $P_{Y^g}(y)$:
1. consistent with $P_{F^g}(f)$, and
2. otherwise "as close as possible" to $P^0_Y(y)$.
Specifically, given $F$, $P_{F^g}$, and a null distribution $P^0_Y$, choose
$$P_{Y^g} = \arg\min_{P\,:\,F(Y) \text{ has distribution } P_{F^g}} D(P\,\|\,P^0_Y)$$
Then
$$P_{Y^g}(y) = P_{F^g}(f(y))\, P^0(y\,|\,F(Y) = f(y))$$
(where $D(P\,\|\,P^0) = \int P(y) \log\frac{P(y)}{P^0(y)}\, dy$ is K-L divergence)
Conditional modeling: a perturbation of the null distribution
Estimation

Given $y^1, \dots, y^N$, $P_{F^g}$, and $P^0$:
$$L(y^1, \dots, y^N) = \prod_{k=1}^{N} P_{F^g}(f(y^k))\, P^0(y^k\,|\,F = f(y^k)) = \prod_{k=1}^{N} \frac{P_{F^g}(f(y^k))}{P^0_F(f(y^k))}\, P^0(y^k)$$
Since $P^0(y^k)$ does not depend on the parameters, it is enough to maximize
$$\prod_{k=1}^{N} \frac{P_{F^g}(f(y^k))}{P^0_F(f(y^k))}$$
Much easier!
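A small numerical sketch of this "much easier" objective (an illustration, not code from the talk): with $T$ held fixed, the null term $P^0_F(f(y^k))$ does not depend on $\lambda$, so fitting $\lambda$ reduces to maximum likelihood under the exponential correlation model $P(c\,|\,e) = \frac{\lambda}{1 - e^{-2\lambda}}\, e^{-\lambda(1-c)}$ on $[-1,1]$. Synthetic correlations stand in for real eye-patch data:

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true, N = 5.0, 20000

# Draw correlations from P(c|e) ∝ exp(-lam*(1-c)) on [-1, 1] via inverse CDF.
u = rng.random(N)
c = np.log(np.exp(-lam_true) + u * (np.exp(lam_true) - np.exp(-lam_true))) / lam_true

def loglik(lam, c):
    # Normalizer on [-1, 1]: C(lam) = lam / (1 - exp(-2*lam)).
    logC = np.log(lam) - np.log1p(-np.exp(-2.0 * lam))
    return np.sum(logC - lam * (1.0 - c))

# Maximum likelihood for lam by a simple grid search.
grid = np.linspace(0.5, 20.0, 400)
lam_hat = grid[np.argmax([loglik(l, c) for l in grid])]
assert abs(lam_hat - lam_true) < 0.5
```

The null density re-enters only when the template $T$ itself is optimized, which is exactly the point of the caution above.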
Example: learning eye templates

Take $f(y) = c_T(y) = \mathrm{corr}(T, y)$ and model the patch as a MIXTURE:
$$P_{Y^e}(y) = \sum_{m=1}^{M} \epsilon_m\, P_{C_{T_m}}(c_{T_m}(y))\, P^0(y\,|\,C_{T_m} = c_{T_m}(y)), \qquad P_{C_{T_m}}(c) = C_{T_m}\, e^{-\lambda_m (1 - c)}$$
Example: learning eye templates

$$L(y^1, \dots, y^N\,|\,\epsilon_1, \dots, \epsilon_M,\ \lambda_1, \dots, \lambda_M,\ T_1, \dots, T_M) = \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, P_{C_{T_m}}(c_{T_m}(y^k))\, P^0(y^k\,|\,C_{T_m} = c_{T_m}(y^k))$$
$$= \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{P_{C_{T_m}}(c_{T_m}(y^k))}{P^0_{C_{T_m}}(c_{T_m}(y^k))}\, P^0(y^k) \propto \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{C_{T_m}\, e^{-\lambda_m (1 - c_{T_m}(y^k))}}{P^0_{C_{T_m}}(c_{T_m}(y^k))}$$
Example: learning eye templates

Take (for now) $P^0_Y(y) = \left(\tfrac{1}{256}\right)^{|S|}$ (iid uniform).

Then, by a Central Limit Theorem:
$$P^0_{C_T}(c_T(y)) \approx \sqrt{\frac{|S|}{2\pi}}\; e^{-\frac{|S|\, c_T(y)^2}{2}}$$
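The CLT claim is easy to check by simulation; a quick sketch (assuming 15x15 patches, so $|S| = 225$): correlations between a fixed template and iid-uniform patches should be approximately $N(0, 1/|S|)$.

```python
import numpy as np

rng = np.random.default_rng(2)
S = 225                       # |S|, e.g. a 15x15 patch
T = rng.random(S)             # any fixed template
Tc = T - T.mean()
Tc /= np.linalg.norm(Tc)

# Correlations of the fixed template with iid-uniform "null" patches.
n = 20000
Y = rng.integers(0, 256, size=(n, S)).astype(float)
Yc = Y - Y.mean(axis=1, keepdims=True)
Yc /= np.linalg.norm(Yc, axis=1, keepdims=True)
c = Yc @ Tc

# CLT claim: under the null, c_T(Y) is approximately N(0, 1/|S|).
assert abs(c.mean()) < 0.005
assert abs(c.std() * np.sqrt(S) - 1.0) < 0.05
```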
Example: learning eye templates

$$L(y^1, \dots, y^N\,|\,\epsilon_1, \dots, \epsilon_M,\ \lambda_1, \dots, \lambda_M,\ T_1, \dots, T_M) \propto \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{C_{T_m}\, e^{-\lambda_m (1 - c_{T_m}(y^k))}}{e^{-\frac{|S|\, c_{T_m}(y^k)^2}{2}}}$$

Maximize the data likelihood for the mixing probabilities, the feature parameters, and the templates themselves…
Example: learning (right) eye templates

What if we forget all this nonsense and just maximize
$$\prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, e^{-\lambda_m (1 - c_{T_m}(y^k))} \quad \text{instead of} \quad \prod_{k=1}^{N} \sum_{m=1}^{M} \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{e^{-\frac{|S|\, c_{T_m}(y^k)^2}{2}}}\ ?$$
How good are the templates? A classification experiment…
In general
$$P_{Y^g}(y) = P_{F^g}(f(y))\, P^0(y\,|\,F = f(y))$$
or a mixture of these models:
$$P_{Y^g}(y) = \sum_{m} \epsilon_m\, P_{F_m^g}(f_m(y))\, P^0(y\,|\,F_m = f_m(y))$$
$f_m(y)$ is any function (feature), such as a correlation with a SUBIMAGE. Thus $m$ can index
* alternative models (e.g. 8 eye templates)
* transformations of scale, rotation, ...
(e.g. as in work of Amit and Trouvé)
How good are the templates? A classification experiment…
Classify East Asian vs. South Asian eyes
* mixing over 4 scales and 8 templates
East Asian: (L) examples of training images (M) progression of EM (R) trained templates
South Asian: (L) examples of training images (M) progression of EM (R) trained templates
Classification Rate: 97%
Other examples: noses. 16 templates; multiple scales, shifts, and rotations
samples from training set; learned templates
Other examples: mixture of noses and mouths
samples from training set (1/2 noses, 1/2 mouths)
32 learned templates
Other examples: train on 58 faces …half with glasses…half without
32 learned templates
samples from training set
8 learned templates
Other examples: train on 58 faces …half with glasses…half without
8 learned templates
random eight of the 58 faces
rows 2 to 4, top to bottom: templates ordered by posterior likelihood
Other examples: train random patches (“sparse representation”)
500 random 15x15 training patches from random internet images
24 10x10 templates
Other examples: coarse representation

Use $f(y) = \mathrm{Corr}(T, D(y))$, where $D$ = downconvert.
(Go the other way for super-resolution: $f(y) = \mathrm{Corr}(D(T), y)$?)

training of 8 low-res (10x10) templates
sample from training set (down-converted images)
IV. Sampling and the choice of null distribution

Take a closer look at $P^0_Y$ by (approximately) sampling from
$$P_{Y^g}(y) = \sum_{m=1}^{M} \epsilon_m\, C_{T_m}\, e^{-\lambda_m (1 - c_{T_m}(y))}\, P^0(y\,|\,C_{T_m} = c_{T_m}(y))$$
* Standardize templates and patches:
$$T \leftarrow \frac{T - \bar{T}}{\|T - \bar{T}\|}, \qquad y \leftarrow \frac{y - \bar{y}}{\|y - \bar{y}\|}$$
* View $P_{Y^g}$ & $P^0_Y$ as distributions on the unit sphere in $R^{|S|}$
* Then sample $y$ from $P_{Y^g}$ by
  1. choosing a mixing component $m \in \{1, \dots, M\}$ according to $\epsilon_1, \dots, \epsilon_M$
  2. choosing a correlation $c$ according to $e^{-\lambda_m (1 - c)}$
  3. choosing a sample $y$ according to $P^0_Y$ and computing from it a patch on the sphere with correlation $c$ to $T_m$
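The three sampling steps can be sketched as follows (an illustration; in particular, step 3's construction, rotating a null draw on the sphere so its correlation with the chosen template is exactly $c$, is an assumed realization of the conditional null, since the transcript leaves the final computation implicit):

```python
import numpy as np

rng = np.random.default_rng(3)

def standardize(v):
    v = v - v.mean()
    return v / np.linalg.norm(v)

def sample_patch(templates, eps, lams):
    # 1. choose a mixing component m according to eps
    m = rng.choice(len(templates), p=eps)
    t = standardize(templates[m])
    # 2. choose a correlation c on [-1, 1] with density ∝ exp(-lam*(1-c)),
    #    by inverting the CDF
    lam = lams[m]
    u = rng.random()
    c = np.log(np.exp(-lam) + u * (np.exp(lam) - np.exp(-lam))) / lam
    # 3. draw a null patch, split it into components along and orthogonal
    #    to t, and recombine so that corr(t, y) is exactly c
    y0 = standardize(rng.random(t.shape))
    orth = y0 - (y0 @ t) * t
    orth /= np.linalg.norm(orth)
    return c * t + np.sqrt(1.0 - c ** 2) * orth, t, c

templates = [rng.random(225) for _ in range(3)]
y, t, c = sample_patch(templates, eps=[0.5, 0.3, 0.2], lams=[8.0, 8.0, 8.0])
assert abs(float(y @ t) - c) < 1e-9   # achieved correlation equals the drawn c
```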
(approximate) sampling…

32 samples from the mixture model, with $P^0_Y$ = random patches satisfying a contrast condition on $\max_{s \in S} |y_s - \bar{y}|$
2. Gibbs sampling: the problem is to draw a sample from some distribution $P^0(y)$, $y = \{y_s\}_{s \in S}$.

* Given a sample $y^t$ at iteration $t$ from some probability $P^t$, visit a site $s \in S$
* Replace $y_s$ by a sample from $P^0(y_s\,|\,y_{S \setminus s})$

Then
$$P^{t+1}(y) = P^t(y_{S \setminus s})\, P^0(y_s\,|\,y_{S \setminus s})$$
$$P^{t+1} = \arg\min_{P\,:\,P(y_{S \setminus s}) = P^t(y_{S \setminus s})} D(P\,\|\,P^0)$$
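As a concrete toy (not from the talk), here is a Gibbs sampler for a small Ising-style binary chain, using exactly the site-conditional resampling step above; on a state space this small, the sampler's long-run marginals can be checked against exact enumeration:

```python
import numpy as np

rng = np.random.default_rng(4)
S, J = 8, 0.8   # sites and coupling for a toy chain on {0,1}^S

def energy(y):
    # P0(y) ∝ exp(-energy(y)): neighboring sites prefer to agree.
    return -J * np.count_nonzero(y[:-1] == y[1:])

def gibbs_sweep(y):
    # Visit each site s and replace y_s by a draw from P0(y_s | y_{S\s}).
    for s in range(S):
        w = np.empty(2)
        for v in (0, 1):
            y[s] = v
            w[v] = np.exp(-energy(y))
        y[s] = int(rng.random() < w[1] / w.sum())

# Exact P0(y_0 = y_1) by enumerating all 2^S states.
states = [np.array([(i >> b) & 1 for b in range(S)]) for i in range(2 ** S)]
w = np.array([np.exp(-energy(st)) for st in states])
exact = w[[st[0] == st[1] for st in states]].sum() / w.sum()

y = rng.integers(0, 2, S)
agree, n_sweeps = 0, 10000
for _ in range(n_sweeps):
    gibbs_sweep(y)
    agree += int(y[0] == y[1])
assert abs(agree / n_sweeps - exact) < 0.03
```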
3. Hierarchical models and the Markov Dilemma

$x^p \in \{0,1\}$: $x^p = 1 \Leftrightarrow$ 'pair of eyes'
$x^l \in \{0,1\}$: $x^l = 1 \Leftrightarrow$ 'left eye'
$x^r \in \{0,1\}$: $x^r = 1 \Leftrightarrow$ 'right eye'

Markov model, Markov property…
* Estimation
* Computation
* Representation

Given $x^p = 1$, there are probabilistic constraints on the poses and appearances of the left and right eyes.
Hierarchical models and the Markov Dilemma

More generally:
  $P^0(x)$ = Markov distribution
  $1_B(x) = 1 \Leftrightarrow x \in B$, e.g. $B$ = 'pair of eyes'
  $a(x)$ = attribute (e.g. relative poses of two eyes)
  $P^1(a(x)\,|\,1_B(x) = 1)$ = desired conditional distribution
Choose
$$P(x) = \arg\min_{P\,:\,P(a(x)\,|\,1_B(x)=1) = P^1(a(x)\,|\,1_B(x)=1)} D(P\,\|\,P^0)$$
then
$$P(x) = P^0(x)\,\frac{P^1(a(x)\,|\,1_B(x)=1)}{P^0(a(x)\,|\,1_B(x)=1)} \quad \text{for } x \in B$$
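A made-up discrete toy (not from the talk) showing the perturbation at work: a "Markov" null with independent left/right eye poses, attribute $a(x)$ = relative pose, and a target attribute distribution preferring aligned eyes; the conditioning event $B$ is taken to be the whole space for simplicity. Reweighting by $P^1(a(x)) / P^0(a(x))$ yields a distribution whose attribute law matches $P^1$ exactly while leaving each attribute level set internally untouched:

```python
import numpy as np

# Null P0(x) on x = (pose_l, pose_r): independent poses (the "Markov" null).
p = np.array([0.5, 0.3, 0.2])
P0 = np.outer(p, p)

# Attribute a(x) = pose_r - pose_l, and a desired (made-up) distribution
# P1(a) concentrating on aligned eyes (a = 0).
A = np.subtract.outer(np.arange(3), np.arange(3)).T   # A[i, j] = j - i
P1 = {-2: 0.02, -1: 0.08, 0: 0.80, 1: 0.08, 2: 0.02}

# Null marginal of the attribute.
P0a = {v: P0[A == v].sum() for v in P1}

# Perturbation: P(x) = P0(x) * P1(a(x)) / P0(a(x)).
P = np.array([[P0[i, j] * P1[A[i, j]] / P0a[A[i, j]] for j in range(3)]
              for i in range(3)])

assert abs(P.sum() - 1.0) < 1e-12
for v, target in P1.items():
    # The induced attribute distribution matches P1 exactly.
    assert abs(P[A == v].sum() - target) < 1e-12
```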
License plate hierarchy (top to bottom):
* license plates
* license numbers (3 digits + 3 letters, 4 digits + 2 letters), plate boundaries
* strings (2 letters, 3 digits, 3 letters, 4 digits), plate sides
* characters (generic letter, generic number), L-junctions of sides
* parts of characters, parts of plate sides
Hierarchical models and the Markov dilemma
Original image Zoomed license region
Top object: Markov distribution
Top object: perturbed (“content-sensitive”) distribution