
Generative Models for Image Analysis

Stuart Geman (with E. Borenstein, L.-B. Chang, W. Zhang)

I. Bayesian (generative) image models
II. Feature distributions and data distributions
III. Conditional modeling
IV. Sampling and the choice of null distribution
V. Other applications of conditional modeling

I. Bayesian (generative) image models

Prior
* $I$ = set of possible "interpretations" or "parses"
* $x \in I$ = a particular interpretation
* $P(x)$, $x \in I$ = probability model on $I$
* very structured and constrained
* organizing principles: hierarchy and reusability (Amit, Buhmann, Poggio, Yuille, Zhu, etc.)
* non-Markovian (context/content sensitive)

Conditional likelihood
* $y$ = image
* $P(y \mid x)$ = conditional probability model

Posterior

$$P(x \mid y) \propto P(y \mid x)\, P(x)$$

Focus here on $P(y \mid x)$.

II. Feature distributions and data distributions

$y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$ (an image patch)

$f(y)$ = "feature", e.g.
* variance of patch
* histogram of gradients, SIFT features, etc.
* template correlation

$P^g_F(f)$ = probability model under $g$ = category (edge, corner, eye, face, ...)

Model the patch through a feature model: e.g. detection and recognition of eyes.

$y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$ (an image patch)

Consider the feature

$$f(y) = c_T(y) = \mathrm{corr}(T, y) = \frac{\frac{1}{|S|} \sum_{s \in S} (T_s - \bar T)(y_s - \bar y)}{\sqrt{\frac{1}{|S|} \sum_{s \in S} (T_s - \bar T)^2}\; \sqrt{\frac{1}{|S|} \sum_{s \in S} (y_s - \bar y)^2}}$$

and the model

$$P\big(c_T(Y) = c\big) \propto e^{-\lambda (1 - c)}.$$

Problem: given samples $y^1, \ldots, y^N$ of eye patches, learn $\lambda$ and $T$.
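A small numerical sketch (not from the talk; names are illustrative) of the correlation feature and the tilted correlation density. It assumes the density is taken with respect to Lebesgue measure on $[-1,1]$, so the normalizer works out to $C_T(\lambda) = (1 - e^{-2\lambda})/\lambda$.

```python
import numpy as np

def corr_feature(T, y):
    """Standardized correlation c_T(y) between template T and patch y (2-D arrays)."""
    Tc = T - T.mean()
    yc = y - y.mean()
    return float((Tc * yc).sum() / (np.linalg.norm(Tc) * np.linalg.norm(yc)))

def log_density_c(c, lam):
    """log of (1/C_T(lambda)) * exp(-lambda*(1-c)) on [-1, 1],
    assuming a Lebesgue base measure, so C_T(lambda) = (1 - exp(-2*lambda))/lambda."""
    logC = np.log1p(-np.exp(-2.0 * lam)) - np.log(lam)
    return -lam * (1.0 - c) - logC

# toy usage: a noisy copy of the template has correlation near 1, hence much
# higher log-density than an unrelated patch
rng = np.random.default_rng(0)
T = rng.random((20, 20))
eye_like = T + 0.1 * rng.standard_normal(T.shape)
unrelated = rng.random((20, 20))
for patch in (eye_like, unrelated):
    c = corr_feature(T, patch)
    print(round(c, 3), round(log_density_c(c, lam=5.0), 3))
```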

[Figure: plot of the correlation density on $[-1, 1]$]

Actually:

$$P\big(c_T(Y) = c\big) = \frac{1}{C_T(\lambda)}\, e^{-\lambda (1 - c)}, \qquad -1 \le c \le 1.$$

Tempting to PRETEND that the data is $c_T(y^1), \ldots, c_T(y^N)$:

$$L\big(c_T(y^1), \ldots, c_T(y^N)\big) = \prod_{k=1}^N P\big(c_T(y^k)\big) = \prod_{k=1}^N \frac{1}{C_T(\lambda)}\, e^{-\lambda (1 - c_T(y^k))}.$$

Caution: the distribution of the correlation,

$$P\big(c_T(Y) = c\big) = \frac{1}{C_T(\lambda)}\, e^{-\lambda (1 - c)},$$

is different from the distribution of the patch,

$$P_Y(y) = \frac{1}{Z_T(\lambda)}\, e^{-\lambda (1 - c_T(y))}.$$

Use maximum likelihood… but what is the likelihood?

BUT the data is $y^1, \ldots, y^N$ (patches, not correlations), and

$$P^e_Y(y) = P^e_{C_T}\big(c_T(y)\big)\; P_Y\big(y \mid C_T = c_T(y)\big)$$

$$L(y^1, \ldots, y^N) = \prod_{k=1}^N \frac{e^{-\lambda (1 - c_T(y^k))}}{C_T(\lambda)}\; P_Y\big(y^k \mid C_T = c_T(y^k)\big).$$

The first (pretend) likelihood is fine for estimating $\lambda$ but not fine for estimating $T$.
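For illustration, the "pretend" likelihood can be maximized over $\lambda$ alone. A minimal sketch (using the Lebesgue-normalized form above; all names are illustrative) is below. Note that $T$ enters only through the observed correlations, which is why this objective gives no guidance on how to improve $T$ itself.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik_lambda(lam, corrs):
    """Negative 'pretend' log-likelihood: the data are the correlations c_T(y^k) only."""
    if lam <= 0:
        return np.inf
    # C_T(lambda) = (1 - exp(-2*lambda)) / lambda  (Lebesgue base measure on [-1, 1])
    logC = np.log1p(-np.exp(-2.0 * lam)) - np.log(lam)
    return float(np.sum(lam * (1.0 - corrs) + logC))

def fit_lambda(corrs):
    """Maximize the pretend likelihood over lambda > 0, with the template T held fixed."""
    res = minimize_scalar(neg_log_lik_lambda, bounds=(1e-6, 1e3),
                          args=(np.asarray(corrs, float),), method="bounded")
    return res.x

# example: tightly clustered correlations near 1 give a large lambda
print(fit_lambda([0.92, 0.88, 0.95, 0.90]))
```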

III. Conditional modeling

For any category $g$ (e.g. "eye") and feature $F$,

$$P^g_Y(y) = P^g_F\big(f(y)\big)\; P^g_Y\big(y \mid F = f(y)\big).$$

Easy to model $P^g_F(f)$; hard to model $P^g_Y(y \mid F = f)$.

Principle: start with a "null" or "background" distribution $P^0_Y(y)$ and choose $P^g_Y(y)$
1. consistent with $P^g_F$, and
2. otherwise "as close as possible" to $P^0_Y$.

Specifically, given $g$, $F$, and a null distribution $P^0_Y$, choose

$$P^g_Y(y) = \arg\min_{P:\ F(Y) \text{ has distribution } P^g_F}\; D(P \,\|\, P^0_Y)$$

(where $D(P \,\|\, P^0) = \int P(y) \log \frac{P(y)}{P^0(y)}\, dy$ is K-L divergence).

Then

$$P^g_Y(y) = P^g_F\big(f(y)\big)\; P^0_Y\big(y \mid F = f(y)\big).$$

Conditional modeling: a perturbation of the null distribution
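One way to see why the constrained minimizer takes this product form (a short derivation; this step is implicit in the transcript): the K-L divergence splits over the feature and over the patch given the feature.

```latex
D(P \,\|\, P^0_Y)
  = D(P_F \,\|\, P^0_F)
  + \mathbb{E}_{f \sim P_F}\!\left[\, D\big(P(\cdot \mid F=f) \,\|\, P^0_Y(\cdot \mid F=f)\big) \right]
% The constraint fixes P_F = P^g_F, so the first term is constant.  The second term is
% nonnegative and vanishes iff P(. | F=f) = P^0_Y(. | F=f) for P^g_F-almost every f, giving
P^g_Y(y) \;=\; P^g_F\big(f(y)\big)\, P^0_Y\big(y \mid F = f(y)\big).
```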

Estimation

Given $y^1, \ldots, y^N$, $g$, $F$, and $P^0_Y$:

$$L(y^1, \ldots, y^N) = \prod_{k=1}^N P^g_F\big(f(y^k)\big)\; P^0_Y\big(y^k \mid F = f(y^k)\big)$$

$$= \prod_{k=1}^N \frac{P^g_F\big(f(y^k)\big)}{P^0_F\big(f(y^k)\big)}\; P^0_F\big(f(y^k)\big)\, P^0_Y\big(y^k \mid F = f(y^k)\big) = \prod_{k=1}^N \frac{P^g_F\big(f(y^k)\big)}{P^0_F\big(f(y^k)\big)}\; P^0_Y(y^k)$$

$$\propto \prod_{k=1}^N \frac{P^g_F\big(f(y^k)\big)}{P^0_F\big(f(y^k)\big)}$$

Much Easier!
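A minimal sketch of the resulting objective (illustrative names; the two feature densities are supplied as callables): only the ratio $P^g_F / P^0_F$ at the observed feature values matters. For the eye example, `feature` would be the template correlation and `log_p0_f` an approximation to the null correlation density.

```python
import numpy as np

def log_likelihood_ratio(patches, feature, log_pg_f, log_p0_f):
    """Sum over training patches of log P^g_F(f(y)) - log P^0_F(f(y)).

    Maximizing this in the parameters of log_pg_f (and of the feature itself)
    is equivalent to maximizing the full data likelihood, since the dropped
    factor P^0_Y(y^k) does not depend on those parameters.
    """
    fs = np.array([feature(y) for y in patches])
    return float(np.sum(log_pg_f(fs) - log_p0_f(fs)))
```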

Example: learning eye templates

$y = \{y_s\}_{s \in S}$, $y_s$ = pixel intensity at $s \in S$ (an image patch)

Take $f(y) = c_T(y) = \mathrm{corr}(T, y)$ and model the patch as a MIXTURE:

$$P^e_Y(y) = \sum_{m=1}^M \epsilon_m\, P^e_{C_{T_m}}\big(c_{T_m}(y)\big)\; P^0_Y\big(y \mid C_{T_m} = c_{T_m}(y)\big)$$
$$= \sum_{m=1}^M \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y))}}{C_{T_m}(\lambda_m)}\; P^0_Y\big(y \mid C_{T_m} = c_{T_m}(y)\big).$$

Example: learning eye templates

$$L(y^1, \ldots, y^N \mid \epsilon_1, \ldots, \epsilon_M,\ \lambda_1, \ldots, \lambda_M,\ T_1, \ldots, T_M)$$
$$= \prod_{k=1}^N \sum_{m=1}^M \epsilon_m\, P^e_{C_{T_m}}\big(c_{T_m}(y^k)\big)\; P^0_Y\big(y^k \mid C_{T_m} = c_{T_m}(y^k)\big)$$
$$= \prod_{k=1}^N \sum_{m=1}^M \epsilon_m\, \frac{P^e_{C_{T_m}}\big(c_{T_m}(y^k)\big)}{P^0_{C_{T_m}}\big(c_{T_m}(y^k)\big)}\; P^0_Y(y^k)$$
$$\propto \prod_{k=1}^N \sum_{m=1}^M \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{C_{T_m}(\lambda_m)\; P^0_{C_{T_m}}\big(c_{T_m}(y^k)\big)}$$

Example: learning eye templates

Take (for now)

$$P^0_Y(y) = \Big(\frac{1}{256}\Big)^{|S|} \qquad \text{(iid uniform)}.$$

Then, by a Central Limit Theorem,

$$P^0_{C_T}\big(c_T(y)\big) \approx \text{const} \cdot e^{-\frac{|S|\, c_T(y)^2}{2 \sigma_T^2}}.$$
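A quick Monte Carlo check of this approximation (illustrative; the scale is compared empirically rather than derived): under iid-uniform patches, the standardized correlation with a fixed template is approximately Gaussian with standard deviation on the order of $1/\sqrt{|S|}$.

```python
import numpy as np

rng = np.random.default_rng(1)
side = 16                       # patch is side x side, so |S| = side**2
S = side * side
T = rng.random((side, side))    # fixed template
Tc = (T - T.mean()).ravel()
Tc /= np.linalg.norm(Tc)

# correlations of the template with iid uniform "null" patches
cs = []
for _ in range(20000):
    y = rng.integers(0, 256, size=S).astype(float)
    yc = y - y.mean()
    cs.append(float(Tc @ yc) / np.linalg.norm(yc))
cs = np.array(cs)

print("empirical mean ", cs.mean())             # ~ 0
print("empirical std  ", cs.std())              # ~ 1/sqrt(|S| - 1)
print("1/sqrt(|S|-1)  ", 1.0 / np.sqrt(S - 1))  # Gaussian-approximation scale
```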

Example: learning eye templates

$$L(y^1, \ldots, y^N \mid \epsilon_1, \ldots, \epsilon_M,\ \lambda_1, \ldots, \lambda_M,\ T_1, \ldots, T_M) = \prod_{k=1}^N \sum_{m=1}^M \epsilon_m\, P^e_{C_{T_m}}\big(c_{T_m}(y^k)\big)\; P^0_Y\big(y^k \mid C_{T_m} = c_{T_m}(y^k)\big)$$

$$\propto \prod_{k=1}^N \sum_{m=1}^M \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{C_{T_m}(\lambda_m)}\; e^{\frac{|S|\, c_{T_m}(y^k)^2}{2 \sigma_{T_m}^2}}$$

Maximize the data likelihood for the mixing probabilities, the feature parameters, and the templates themselves…
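One concrete way to carry out that maximization is EM over the mixture components, treating each patch's component as latent. The sketch below is illustrative rather than the talk's exact procedure: it holds each $\lambda_m$ fixed and ignores the null-correction factor in the template update, in which case the M-step for $T_m$ reduces to a responsibility-weighted average of the standardized patches.

```python
import numpy as np

def standardize(y):
    yc = y - y.mean()
    return yc / np.linalg.norm(yc)

def em_templates(patches, M, lam=5.0, iters=30, seed=0):
    """Illustrative EM for a mixture of correlation templates.

    Simplifications: lambda_m is held fixed and the null-distribution correction is
    ignored in the responsibilities and in the M-step for T_m."""
    rng = np.random.default_rng(seed)
    Y = np.stack([standardize(y).ravel() for y in patches])    # N x |S|, zero-mean, unit-norm rows
    N, D = Y.shape
    T = Y[rng.choice(N, size=M, replace=False)].copy()         # initialize templates from data
    eps = np.full(M, 1.0 / M)
    for _ in range(iters):
        C = Y @ T.T                                            # N x M correlations c_{T_m}(y^k)
        logw = np.log(eps)[None, :] - lam * (1.0 - C)          # unnormalized log responsibilities
        logw -= logw.max(axis=1, keepdims=True)
        R = np.exp(logw)
        R /= R.sum(axis=1, keepdims=True)                      # E-step: responsibilities
        eps = R.mean(axis=0)                                   # M-step: mixing probabilities
        for m in range(M):                                     # M-step: templates
            t = R[:, m] @ Y
            T[m] = t / np.linalg.norm(t)
    return eps, T
```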

Example: learning (right) eye templates

Example: learning (right) eye templates

What if we forget all this nonsense and just maximize

$$\prod_{k=1}^N \sum_{m=1}^M \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{C_{T_m}(\lambda_m)}$$

instead of

$$\prod_{k=1}^N \sum_{m=1}^M \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y^k))}}{C_{T_m}(\lambda_m)}\; e^{\frac{|S|\, c_{T_m}(y^k)^2}{2 \sigma_{T_m}^2}}\;?$$

How good are the templates? A classification experiment…

In general,

$$P^g_Y(y) = P^g_F\big(f(y)\big)\; P^0_Y\big(y \mid F = f(y)\big)$$

or a mixture of these models,

$$P^g_Y(y) = \sum_m \epsilon_m\, P^g_{F_m}\big(f_m(y)\big)\; P^0_Y\big(y \mid F_m = f_m(y)\big),$$

where $f_m(y)$ is any function (feature), such as a correlation with a SUBIMAGE. Thus $m$ can index
* alternative models (e.g. 8 eye templates)
* transformations of scale, rotation, ... (e.g. as in the work of Amit and Trouvé)

How good are the templates? A classification experiment…

Classify East Asian vs. South Asian, mixing over 4 scales and 8 templates

East Asian: (L) examples of training images (M) progression of EM (R) trained templates

South Asian: (L) examples of training images (M) progression of EM (R) trained templates

Classification Rate: 97%

Other examples: noses (16 templates; multiple scales, shifts, and rotations)

samples from training set; learned templates

Other examples: mixture of noses and mouths

samples from training set (1/2 noses, 1/2 mouths)

32 learned templates

Other examples: train on 58 faces …half with glasses…half without

32 learned templates

samples from training set

8 learned templates

Other examples: train on 58 faces …half with glasses…half without

8 learned templates

random eight of the 58 faces

rows 2 to 4, top to bottom: templates ordered by posterior likelihood

Other examples: train random patches (“sparse representation”)

500 random 15x15 training patches from random internet images

24 10x10 templates

Other examples: coarse representation

Use $f(y) = \mathrm{Corr}(T, D(y))$, where $D$ = downconvert.

(Go the other way for super-resolution: $f(y) = \mathrm{Corr}(D(T), y)$?)

training of 8 low-res (10x10) templates

sample from training set (down-converted images)
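A small sketch of the coarse feature (block averaging is one natural choice for the downconversion $D$; all names are illustrative):

```python
import numpy as np

def downconvert(y, factor):
    """Block-average an image by an integer factor (one simple choice of D)."""
    h, w = y.shape
    h2, w2 = h - h % factor, w - w % factor
    blocks = y[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.mean(axis=(1, 3))

def coarse_corr(T_lowres, y, factor):
    """f(y) = Corr(T, D(y)): correlate a low-resolution template with the down-converted patch."""
    d = downconvert(y, factor)
    Tc, dc = T_lowres - T_lowres.mean(), d - d.mean()
    return float((Tc * dc).sum() / (np.linalg.norm(Tc) * np.linalg.norm(dc)))
```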

IV. Sampling and the choice of null distribution

Take a closer look at $P^0_Y$ by (approximately) sampling from

$$P^g_Y(y) = \sum_{m=1}^M \epsilon_m\, \frac{e^{-\lambda_m (1 - c_{T_m}(y))}}{C_{T_m}(\lambda_m)}\; P^0_Y\big(y \mid C_{T_m} = c_{T_m}(y)\big).$$

* Standardize templates and patches:

$$T \leftarrow \frac{T - \bar T}{\|T - \bar T\|}, \qquad y \leftarrow \frac{y - \bar y}{\|y - \bar y\|}$$

* View $P^g_Y$ and $P^0_Y$ as distributions on the unit sphere in $R^{|S|-1}$

* Then sample from $P^g_Y$ by
1. choosing a mixing component $m \in \{1, \ldots, M\}$ according to $\epsilon_1, \ldots, \epsilon_M$
2. choosing a correlation $c$ according to $\frac{1}{C_{T_m}(\lambda_m)}\, e^{-\lambda_m (1 - c)}$
3. choosing a sample $y$ according to $P^0_Y$ and computing:

(approximate) sampling…

[Figure: unit sphere in $R^{|S|-1}$, with the template $T_m$, a null sample $y$, and the resulting sample at correlation $c$ with $T_m$]
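A sketch of the three sampling steps. The last step is one natural reading of "computing": keep the component of the null sample orthogonal to $T_m$ and recombine it with $T_m$ so the point on the sphere has exactly the drawn correlation $c$. Sampling $c$ uses the inverse CDF of the truncated exponential density. This is illustrative, not necessarily the talk's exact procedure.

```python
import numpy as np

def standardize(v):
    vc = v - v.mean()
    return vc / np.linalg.norm(vc)

def sample_corr(lam, rng):
    """Inverse-CDF sample from the density proportional to exp(-lam*(1 - c)) on [-1, 1]."""
    u = rng.random()
    return np.log(np.exp(-2.0 * lam) + u * (1.0 - np.exp(-2.0 * lam))) / lam + 1.0

def sample_patch(templates, eps, lams, null_sampler, rng):
    """Approximate sample from the mixture model, viewed on the unit sphere."""
    m = rng.choice(len(eps), p=eps)                # 1. mixing component
    t = standardize(templates[m]).ravel()
    c = sample_corr(lams[m], rng)                  # 2. correlation with T_m
    y0 = standardize(null_sampler(rng)).ravel()    # 3. draw from the null P^0_Y ...
    u = y0 - (y0 @ t) * t                          # ... keep its direction orthogonal to T_m
    u /= np.linalg.norm(u)
    return c * t + np.sqrt(1.0 - c * c) * u        # point with corr(T_m, y) = c exactly

# usage sketch: null_sampler could return, e.g., a uniform-noise patch
# rng = np.random.default_rng(0)
# y = sample_patch(templates, eps, lams, lambda r: r.random((20, 20)), rng)
```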

(approximate) sampling…

32 samples from the mixture model, with $P^0_Y$ = white noise

(approximate) sampling…

32 samples from the mixture model, with $P^0_Y$ from Caltech 101

(approximate) sampling…

32 samples from the mixture model, with $P^0_Y$ from outdoor scenes

(approximate) sampling…

32 samples from the mixture model, with $P^0_Y$ = random patches $y$ satisfying a minimum-contrast condition on $\max_{s \in S} |y_s - \bar y|$

V. Other applications of conditional modeling

1. $P^g_Y(y)$ when two templates overlap

2. Gibbs sampling: the problem is to draw a sample from some distribution $P(y) = P(\{y_s\}_{s \in S})$

* Given a sample at iteration $t$ from some probability $P^0$, visit a site $s \in S$
* Replace $y_s$ by a sample from $P(y_s \mid y_{S \setminus s})$

Then

$$P^1(y) := P^0(y_{S \setminus s})\; P(y_s \mid y_{S \setminus s}) = \arg\min_{\tilde P:\ \tilde P(y_{S \setminus s})\, =\, P^0(y_{S \setminus s})}\; D(\tilde P \,\|\, P).$$
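For concreteness, a minimal Gibbs sweep for a small binary MRF (an Ising-type model; all names illustrative). Each site update replaces $y_s$ by a draw from $P(y_s \mid y_{S \setminus s})$, which is exactly the conditional-modeling perturbation above.

```python
import numpy as np

def gibbs_sweep(y, beta, rng):
    """One Gibbs sweep over a binary (+1/-1) grid under an Ising model
    with P(y) proportional to exp(beta * sum over neighboring pairs of y_i * y_j)."""
    H, W = y.shape
    for i in range(H):
        for j in range(W):
            s = 0.0                              # sum of the 4-neighborhood (free boundary)
            if i > 0:     s += y[i - 1, j]
            if i < H - 1: s += y[i + 1, j]
            if j > 0:     s += y[i, j - 1]
            if j < W - 1: s += y[i, j + 1]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))   # P(y_s = +1 | rest)
            y[i, j] = 1 if rng.random() < p_plus else -1
    return y

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=(32, 32))
for _ in range(100):
    gibbs_sweep(y, beta=0.6, rng=rng)
```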

3. Hierarchical models and the Markov Dilemma

$x^p \in \{0,1\}$, $x^p = 1 \Leftrightarrow$ 'pair of eyes'
$x^l \in \{0,1\}$, $x^l = 1 \Leftrightarrow$ 'left eye'
$x^r \in \{0,1\}$, $x^r = 1 \Leftrightarrow$ 'right eye'

Markov model
Markov property…
* Estimation
* Computation
* Representation

Given $x^p = 1$, there are probabilistic constraints on the poses and appearances of the left and right eyes.

Hierarchical models and the Markov Dilemma

$x^p \in \{0,1\}$, $x^p = 1 \Leftrightarrow$ 'pair of eyes'
$x^l \in \{0,1\}$, $x^l = 1 \Leftrightarrow$ 'left eye'
$x^r \in \{0,1\}$, $x^r = 1 \Leftrightarrow$ 'right eye'

Markov model. More generally:
* $P^0(x)$ = Markov distribution
* $B$ = e.g. 'pair of eyes'
* $a(x)$ = attribute (e.g. relative poses of the two eyes)
* $P^1_A\big(a(x) \mid 1_B(x) = 1\big)$ = desired conditional distribution

Choose

$$P^1(x) = \arg\min_{P:\ P(a(x) \mid 1_B(x) = 1)\, =\, P^1_A(a(x) \mid 1_B(x) = 1)}\; D(P \,\|\, P^0),$$

then, on $\{x : 1_B(x) = 1\}$,

$$P^1(x) = P^0(x)\; \frac{P^1_A\big(a(x) \mid 1_B(x) = 1\big)}{P^0_A\big(a(x) \mid 1_B(x) = 1\big)}.$$
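A sketch of how this perturbation can be used in practice: importance re-weighting of samples drawn from the Markov model $P^0$. The attribute-density names are placeholders; samples with $1_B(x) = 0$ are left unweighted, on the assumption that the perturbation acts only on the event $\{1_B(x) = 1\}$.

```python
import numpy as np

def perturbed_weights(samples, in_B, attribute, log_p1_a, log_p0_a):
    """Importance weights turning samples from a Markov model P^0 into (weighted)
    samples from the perturbed, content-sensitive model P^1.

    samples   : list of interpretations x drawn from P^0
    in_B      : x -> bool, the event 1_B(x) = 1 (e.g. 'pair of eyes' is on)
    attribute : x -> a(x) (e.g. relative pose of the two eyes)
    log_p1_a  : log of the DESIRED attribute density given 1_B(x) = 1
    log_p0_a  : log of the attribute density given 1_B(x) = 1 INDUCED by P^0
                (in practice estimated from the samples themselves)
    """
    logw = np.array([log_p1_a(attribute(x)) - log_p0_a(attribute(x)) if in_B(x) else 0.0
                     for x in samples])
    w = np.exp(logw - logw.max())
    return w / w.sum()
```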

[Figure: composition hierarchy for reading license plates, top to bottom: license plates; license numbers (3 digits + 3 letters, 4 digits + 2 letters); plate boundaries, strings (2 letters, 3 digits, 3 letters, 4 digits); characters, plate sides; generic letter, generic number, L-junctions of sides; parts of characters, parts of plate sides]

Hierarchical models and the Markov dilemma

Original image; zoomed license region

Top object: Markov distribution

Top object: perturbed (“content-sensitive”) distribution

Hierarchical models and the Markov dilemma

PATTERN SYNTHESIS

= PATTERN ANALYSIS

Ulf Grenander