
Need volunteers…

From Monday’s paper: A simple story about representations

Input signal: a moving edge.

Model it using an auto-regressive model, with two different representations for the observations y:

Representation 1: image-based.

Representation 2: position-based.

$y_t = C x_t$, $x_{t+1} = A x_t$

Representation 1

[Figures: input signal; bases and dynamics for n = 8, n = 20, and n = 50.]

What happens next?

Representing the edge position

Input signal: y = [1:100]

What dimension of an auto-regressive model do we need to describe that signal?

Representation 2

N = 1

Can only show exponentially decaying position.

$x_{t+1} = a x_t$


N = 2

A 2-d model can handle uniform translation exactly.

$x_{t+1} = A x_t$, with $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$
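The N = 1 and N = 2 cases can be checked numerically. The sketch below simulates a first-order model x(t+1) = A x(t) with a linear readout. The state layout (position, velocity), the matrix A = [[1, 1], [0, 1]], the decay factor 0.9, and the function name `simulate_ar` are assumptions chosen for illustration; other 2-d parameterizations would work as well.

```python
import numpy as np

def simulate_ar(A, x0, c, steps):
    """Run x[t+1] = A @ x[t] and read out y[t] = c @ x[t]."""
    x = np.asarray(x0, dtype=float)
    ys = []
    for _ in range(steps):
        ys.append(float(c @ x))
        x = A @ x
    return np.array(ys)

# N = 1: scalar state, so y[t] = a**t * y[0], a pure exponential.
y1 = simulate_ar(np.array([[0.9]]), [1.0], np.array([1.0]), 100)

# N = 2: state = (position, velocity); the position advances by the
# (constant) velocity at every step, which is uniform translation.
A2 = np.array([[1.0, 1.0],
               [0.0, 1.0]])
y2 = simulate_ar(A2, [1.0, 1.0], np.array([1.0, 0.0]), 100)
```

With these choices `y2` reproduces the position signal 1, 2, ..., 100, while `y1` can only decay (or grow) geometrically whatever scalar is used.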


The simple story

For a simple, canonical signal like a moving edge, modelled with an AR model:

The pixel-based representation requires a high-dimensional state vector, and even then doesn’t work very well.

The position-based representation works perfectly with a 2-dimensional state vector.

Separating style and content with bilinear models

Bill Freeman, MIT AI Lab.

Josh Tenenbaum, MIT Dept. Brain and Cognitive Sciences

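Content: character

Style: font

[Figure: content (Character #1) and style (font Matura MT) are not observed; the rendered observation, “A”, is observed. Going from content and style to the observation is synthesis; going from the observation back to content and style is analysis.]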

Style and content example

Domain               Content        Style
typography           letter         font
face recognition     identity       head orientation
shape from shading   shape          lighting
color perception     object color   illuminant color
speech recognition   words          speaker

Many perception problems have this two-factor structure

Factor 1 Factor 2

Color constancy demo

How much of what we may consider to be (high-level) visual style can we account for by a simple, low-level statistical model?

Given: observations that are the result of two strongly interacting factors,

can we separately analyze or manipulate those two factors?

Perceptual tasks

Common form of observations

A B C

D E F

G H I . . .

factor 1

factor 2

General case

content class (“b” values) along the columns, style (“a” values) down the rows:

f(a1,b1)  f(a1,b2)  f(a1,b3)  ...
f(a2,b1)  f(a2,b2)  f(a2,b3)  ...
...       ...       ...

Account for observations by a rendering function, f(a,b).

Asymmetric bilinear model

$y_{sc} = f(A_s, b_c) = A_s b_c$

Observation vector in style s and content c

Matrix for style s

Vector for content element c
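A minimal sketch of the asymmetric rendering function, assuming small made-up dimensions and random parameters. The names `render_asymmetric`, `K`, and `J` are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, J = 6, 3                     # observation and content-vector dimensions (made up)
n_styles, n_contents = 2, 4

A = rng.normal(size=(n_styles, K, J))   # A[s] is the K x J matrix for style s
b = rng.normal(size=(n_contents, J))    # b[c] is the J-vector for content c

def render_asymmetric(A_s, b_c):
    """Observation vector y_sc = A_s b_c."""
    return A_s @ b_c

y = render_asymmetric(A[1], b[2])       # style 1, content 2

# Every style/content pair at once: Y[s, c] = A[s] @ b[c].
Y = np.einsum("skj,cj->sck", A, b)
```

Each style contributes a full matrix while each content element contributes only a vector, which is where the asymmetry between the two factors comes from.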

Asymmetric bilinear model, with identity as the style factor.

Symmetric bilinear model

$y_{sck} = f(a_s, b_c) = a_s^{\top} W_k b_c$

Kth element of the observation vector in style s and content c

Matrix for element k of the observation vector.

Vector for content element c

Vector for style s
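The symmetric form can be sketched the same way, again with made-up dimensions and random parameters. The name `render_symmetric` and the sizes `I`, `J`, `K` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 2, 3, 6               # style, content, and observation dimensions (made up)
W = rng.normal(size=(K, I, J))  # W[k] is the I x J matrix for observation element k
a_s = rng.normal(size=I)        # style vector
b_c = rng.normal(size=J)        # content vector

def render_symmetric(W, a_s, b_c):
    """y[k] = a_s^T W[k] b_c for every element k."""
    return np.einsum("i,kij,j->k", a_s, W, b_c)

y = render_symmetric(W, a_s, b_c)

# Absorbing the style vector into W gives a style-specific matrix,
# which is the asymmetric form y = A_s b_c.
A_s = np.einsum("i,kij->kj", a_s, W)
```

Here both factors are plain vectors, and all of the interaction between them lives in the shared matrices W[k].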

Symmetric bilinear model

Fitting model to training observations

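Asymmetric model: $y_{sc} = A_s b_c$. Fit with an SVD.

Symmetric model: $y_{sck} = a_s^{\top} W_k b_c$. Fit by iterating SVDs (Magnus and Neudecker, 1988), using the vector transpose.

[Figure: the observation matrix y, indexed by head pose and identity, and its SVD factorization.]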
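For the asymmetric model, the SVD fit can be sketched as follows. This assumes one (mean) observation vector per style/content pair, stacked so that each style contributes a block of rows; the function name `fit_asymmetric`, the array layout, and the synthetic data are illustrative assumptions. The iterated procedure for the symmetric model is not shown.

```python
import numpy as np

def fit_asymmetric(Y, J):
    """Fit y_sc = A_s b_c by a truncated SVD.

    Y: array of shape (S, K, C), one K-vector per style s and content c.
    Returns A of shape (S, K, J) and B of shape (J, C).
    """
    S, K, C = Y.shape
    Y_stacked = Y.reshape(S * K, C)              # stack the S style blocks vertically
    U, sing, Vt = np.linalg.svd(Y_stacked, full_matrices=False)
    A = (U[:, :J] * sing[:J]).reshape(S, K, J)   # one style matrix per block of rows
    B = Vt[:J]                                   # column c is the content vector b_c
    return A, B

# Synthetic data generated from a known model (made-up sizes).
rng = np.random.default_rng(0)
A_true = rng.normal(size=(4, 10, 3))
B_true = rng.normal(size=(3, 5))
Y = A_true @ B_true                              # shape (4, 10, 5)

A_fit, B_fit = fit_asymmetric(Y, J=3)
```

The recovered factors reproduce the observations but generally differ from `A_true` and `B_true` by an invertible J x J transformation, since the factorization is not unique.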

Related Work, bilinear models

Koenderink and Van Doorn, 1991, 1996

Tomasi and Kanade, 1992

Faugeras, 1993

Magnus and Neudecker, 1988

Marimont and Wandell, 1992

Turk and Pentland, 1991

Ullman and Basri, 1991

Murase and Nayar, 1995

Related Work, analyzing style

Hofstadter, 1995 and earlier papers.

Grebert et al, 1992

SIGGRAPH papers regarding controls for animation or line style. Typically hand-crafted, not learned.

Brand and Hertzmann, 2000

Hertzmann et al, 2001

Efros and Freeman, 2001

Procedure

(1) Fit a bilinear model to the training data of content elements observed across different styles, using linear algebra techniques.

(2) Use new data to find the parameters for a new, unknown style, or to classify new observations, or to generalize both style and content.

[Figure: training set of utterances (“ah eh ou ...”) indexed by phoneme and speaker, together with utterances from a new speaker.]

Task: Classification

Domain: vowel phonemes

Benchmark dataset

CMU machine learning repository

Training: 8 speakers saying 11 different vowel phonemes.

Testing: 7 new speakers

Data representation: LPC coefficients.

Classification using bilinear models

Use EM (expectation maximization) algorithm.

Build up model for new speaker’s style simultaneously with classification of the content.

$y_{\text{observed}} = A_{\text{new speaker}} \, b_{\text{phonemes}}$

Vowel data from a speaker in a new style

Matrix describing the unknown style of the new speaker

Previously learned vowel (content) descriptors
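One way to picture this joint estimation is the sketch below. It treats each observation as Gaussian noise around A b_c for an unknown class c, alternates soft classification with a weighted least-squares update of A, and keeps the content vectors fixed. The isotropic noise model, the fixed variance `sigma2`, the random initialization, and the name `em_new_style` are all assumptions for illustration, not necessarily the procedure used in the original work.

```python
import numpy as np

def em_new_style(Y, B, sigma2=0.5, n_iter=30, seed=0):
    """Y: (K, N) observations in one unknown style; B: (J, C) known content vectors.

    Returns the estimated style matrix A (K, J) and responsibilities R (N, C).
    """
    K, N = Y.shape
    J, C = B.shape
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(K, J))                    # random start (an assumption)
    for _ in range(n_iter):
        # E-step: soft-assign each observation to a content class.
        means = A @ B                              # (K, C), column c is A b_c
        d2 = ((Y[:, :, None] - means[:, None, :]) ** 2).sum(axis=0)  # (N, C)
        logp = -d2 / (2.0 * sigma2)
        logp -= logp.max(axis=1, keepdims=True)    # numerical stability
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted least squares for A given the soft labels.
        num = Y @ R @ B.T                          # (K, J)
        den = (B * R.sum(axis=0)) @ B.T            # (J, J)
        A = num @ np.linalg.pinv(den)
    return A, R

# Synthetic "new speaker": known content vectors, unknown style matrix.
rng = np.random.default_rng(1)
B = rng.normal(size=(3, 4))
A_true = rng.normal(size=(5, 3))
labels = rng.integers(0, 4, size=40)
Y = A_true @ B[:, labels] + 0.05 * rng.normal(size=(5, 40))

A_hat, R = em_new_style(Y, B)
predicted = R.argmax(axis=1)                       # hard class decisions
```

Like any EM procedure, this can settle in a poor local optimum depending on the starting point, so in practice several restarts or a better initialization would be sensible.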

Example problem for the Expectation Maximization (EM) algorithm

“Find the probability that each point came from one of two random spatial processes”.

EM algorithm

Assign class membership probabilities (E-step)

Estimate the underlying probability distributions (M-step)
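A minimal sketch of this example problem, assuming the two spatial processes are isotropic 2-d Gaussians. The Gaussian form, the initialization from the two extreme points, the synthetic data, and the name `em_two_processes` are assumptions made for illustration.

```python
import numpy as np

def em_two_processes(X, n_iter=50):
    """X: (N, 2) points. Fit two isotropic Gaussians with EM.

    Returns means (2, 2), variances (2,), mixing weights (2,),
    and class membership probabilities R (N, 2).
    """
    N, D = X.shape
    # Start the means at the points with the smallest and largest
    # first coordinate (an arbitrary choice).
    mu = np.stack([X[X[:, 0].argmin()], X[X[:, 0].argmax()]]).astype(float)
    var = np.full(2, X.var())
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: assign class membership probabilities.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)    # (N, 2)
        logp = np.log(weights) - 0.5 * D * np.log(2 * np.pi * var) - d2 / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: re-estimate the underlying distributions.
        n = R.sum(axis=0) + 1e-12
        mu = (R.T @ X) / n[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (R * d2).sum(axis=0) / (D * n) + 1e-9
        weights = n / N
    return mu, var, weights, R

# Two synthetic point clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(5.0, 1.0, size=(100, 2))])
mu, var, weights, R = em_two_processes(X)
```

Each row of `R` holds the probability that the corresponding point came from the first or the second process.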

Classification results: performance comparison

Multi-layer perceptron: 51%

1-nearest neighbor (nn): 56%

Discriminant adaptive nn: 62%

Bilinear model, data not grouped: 69%

Bilinear model, data grouped by speaker: 76%

Task: Classification

Domain: faces and pose.

Nearest neighbor matching: 53%

Bilinear model (estimate $A_s$ while classifying $b_c$ with EM): 74%

Face pose classification results

Given observations of a new face, what % of the poses can we identify correctly?

Chicago

Zaph

Times

Mistral

Times Bold

Monaco

(Rest of alphabet, used in training, not shown.)

Task: Extrapolation

Domain: typography

Coulomb warp representation

Describe each shape by the warp that a square of ink particles would have to undergo to form the shape.

Coulomb warping

reference shape

target shape

+ charges

- charges

Coulomb warp representation

[Figure: shapes S1 and S2, and their averages S1 + S2 computed in the pixel representation and in the Coulomb warp representation.]

Basis functions for the asymmetric bilinear model

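[Figure: the style matrices $A_{\text{chicago}}$, $A_{\text{zaph}}$, and $A_{\text{mistral}}$, each multiplied by the content vector $b_{\text{letter “C”}}$, give the letter “C” rendered in each font.]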

Controlling complexity in calculating the style matrix for the new font

asymmetric model, using symmetric model as a prior

asymmetric model (173,280 parameters to fit)

symmetric model (5 parameters to fit)

Monaco

(true)

synthetic

actual

Chicago

Zaph

Times

Mistral

Times Bold

Monaco

Results of extrapolation to a new style

Leave-one-out results

Chicago

Zaph Chancery

Times

Mistral

Times Bold

Monaco

Task: Translation

Domain: shape and lighting

Factor 2: Identity (face shape)

Factor 1: lighting

(1) Fit symmetric bilinear model to training data (pixel representation).

(2) Solve for parameters describing face and lighting of new image.
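Step (2) can be pictured with the sketch below, which assumes the interaction matrices W[k] are already known from step (1). With one factor held fixed the symmetric model is linear in the other, so the sketch alternates two least-squares solves. Alternating least squares, the random initialization, and the name `solve_style_content` are assumptions for illustration; the slide does not specify the solver.

```python
import numpy as np

def solve_style_content(W, y, n_iter=20, seed=0):
    """W: (K, I, J) learned interaction matrices; y: (K,) new observation.

    Alternates two linear least-squares problems to find a (I,) and b (J,)
    with y[k] ~= a^T W[k] b. Returns a, b, and the residual norm per sweep.
    """
    K, I, J = W.shape
    rng = np.random.default_rng(seed)
    b = rng.normal(size=J)                        # random start (an assumption)
    residuals = []
    for _ in range(n_iter):
        M_b = np.einsum("kij,j->ki", W, b)        # with b fixed, y = M_b a
        a = np.linalg.lstsq(M_b, y, rcond=None)[0]
        M_a = np.einsum("i,kij->kj", a, W)        # with a fixed, y = M_a b
        b = np.linalg.lstsq(M_a, y, rcond=None)[0]
        residuals.append(np.linalg.norm(M_a @ b - y))
    return a, b, residuals

# Synthetic "new image" rendered from made-up parameters.
rng = np.random.default_rng(2)
W = rng.normal(size=(20, 3, 4))
a_true = rng.normal(size=3)
b_true = rng.normal(size=4)
y = np.einsum("i,kij,j->k", a_true, W, b_true)

a_hat, b_hat, residuals = solve_style_content(W, y)
```

Each sweep cannot increase the residual, but the factors are only determined up to a reciprocal scaling of `a_hat` and `b_hat`, and the iteration may stop at a local minimum.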

Training

Generalization

Translation Results

Factor 2: Identity (face shape)

Factor 1: lighting

Conclusion: bilinear models are useful for the perceptual tasks of translation, classification, and extrapolation.

factor 1     factor 2        observation
letter #1    Matura MT       A
phoneme      speaker         “ahh”
pose 3       Hiro            (image)
illuminant   surface color   eye cone responses

End. Extra pages follow.

The following slides are extras….

Style and content

Mention that an unsupervised version would be a good class project. Josh or I would be into working with someone on it.

Increase dimensionality to represent non-linearities

Say $f(x) = p x^2 + q x + r$. This parabola varies non-linearly with x, but as a linear function of the vector $(x^2, x, 1)$.

(Like “homogeneous coordinates” in graphics)

Fitting parabolas

1-d model

2-d model

3-d model
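The lifting idea and the three model sizes can be illustrated with a short sketch. The coefficient values, the sampling grid, the use of a truncated SVD as the low-dimensional linear model, and the names `lift` and `reconstruct` are assumptions made for illustration.

```python
import numpy as np

def lift(x):
    """Map x to the vector (x^2, x, 1)."""
    x = np.asarray(x, dtype=float)
    return np.stack([x**2, x, np.ones_like(x)], axis=-1)

p, q, r = 2.0, -1.0, 0.5
x = np.linspace(-1.0, 1.0, 21)
f = lift(x) @ np.array([p, q, r])        # linear in the lifted vector

# A collection of sampled parabolas spans only a 3-d subspace,
# so a 3-d linear model reconstructs every one of them exactly.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(50, 3))        # 50 random (p, q, r) triples
curves = coeffs @ lift(x).T              # shape (50, 21)
U, s, Vt = np.linalg.svd(curves, full_matrices=False)

def reconstruct(n_dims):
    """Best rank-n_dims linear approximation of the sampled curves."""
    return (U[:, :n_dims] * s[:n_dims]) @ Vt[:n_dims]

# Worst-case reconstruction error for the 1-d, 2-d, and 3-d models.
errors = [np.abs(curves - reconstruct(n)).max() for n in (1, 2, 3)]
```

The 1-d and 2-d models leave a visible residual, while the 3-d error drops to numerical round-off because three basis functions are enough to span every parabola.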

Reconstruction from low-dimensional model

Eigenfaces for each pose

Factor 1: head pose

Factor 2: identity

Task: Classification

Domain: faces and pose.

We build a bilinear model of how head pose and identity modify face appearance.

Basis images

Pose

Pose-dependent basis functions for face appearance.

One set of coefficients will reconstruct the same person in different poses.
