Post on 27-Dec-2015
Need volunteers…
From Monday’s paper: A simple story about
representations
Input signal: a moving edge.
Model it using an auto-regressive model,
Using two different representations for observations y:
Representation 1: image-based.
Representation 2: position-based.
[Diagram: AR model with observation matrix C_xy and dynamics matrix A_xx.]
Input signal
Representation 1
Bases, n=8
Representation 1
Dynamics, n=8
Representation 1
Bases, n=20
Representation 1
Dynamics, n=20
Representation 1
Bases, n=50
Representation 1
Dynamics, n=50
Representation 1
What happens next?
Representation 1
Representing the edge position
Input signal: y = [1:100]
What dimension of an auto-regressive model do we need to describe that signal?
Representation 2
n = 1
With a single scalar a_xx (y_t = a_xx y_{t-1}), the model can only show an exponentially decaying position.
Representation 2
N = 2
A 2-d model can handle uniform translation exactly.
x_t = A_xx x_{t-1}, with A_xx = [1 1; 0 1] acting on a (position, velocity) state.
Representation 2
The simple story
For a simple, canonical signal like a moving edge, modelled with an AR model:
The pixel-based representation requires a high-dimensional state vector, and even then doesn’t work very well.
The position-based representation works perfectly with a 2-dimensional state vector.
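The contrast above is easy to check numerically. A minimal sketch (assuming NumPy; the position signal y = 1:100 is the one from the slides) fits order-1 and order-2 AR models to the position signal by least squares:

```python
import numpy as np

# Position of a uniformly moving edge, y = 1, 2, ..., 100 (Representation 2).
y = np.arange(1, 101, dtype=float)

# Order-1 AR model y[t] = a * y[t-1]: best least-squares scalar a.
a = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
err1 = np.max(np.abs(y[1:] - a * y[:-1]))   # cannot track uniform translation

# Order-2 AR model y[t] = c1*y[t-1] + c2*y[t-2].
# Uniform translation satisfies y[t] = 2*y[t-1] - y[t-2] exactly.
X = np.column_stack([y[1:-1], y[:-2]])
coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
err2 = np.max(np.abs(y[2:] - X @ coef))

print(np.allclose(coef, [2.0, -1.0]))   # True: the 2-d model is exact
```

The recovered order-2 coefficients (2, -1) are the companion form of a constant-velocity state model, consistent with the claim that a 2-dimensional state vector suffices.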
Separating style and content with bilinear models
Bill Freeman, MIT AI Lab.
Josh Tenenbaum, MIT Dept. Brain and Cognitive Sciences
Content (character) and Style (font) are not observed; only the rendered observation is observed.
[Diagram: character #1 rendered in Matura MT as the letter “A”. Synthesis maps (style, content) to an observation; analysis maps an observation back to (style, content).]
Style and content example
Domain               Content        Style
typography           letter         font
face recognition     identity       head orientation
shape from shading   shape          lighting
color perception     object color   illuminant color
speech recognition   words          speaker

Many perception problems have this two-factor structure (content = factor 1, style = factor 2).
Color constancy demo
How much of what we may consider to be (high-level) visual style can we account for by a simple, low-level statistical model?
Given: observations that are the result of two strongly interacting factors,
can we separately analyze or manipulate those two factors?
Perceptual tasks
Common form of observations
[Figure: a grid of observations A, B, C / D, E, F / G, H, I ..., with factor 1 varying along one axis and factor 2 along the other.]
General case
Rows index style (“a” values); columns index content class (“b” values):

f(a1,b1)  f(a1,b2)  f(a1,b3)  ...
f(a2,b1)  f(a2,b2)  f(a2,b3)  ...
...       ...       ...

Account for observations by a rendering function f(a, b).
Asymmetric bilinear model
y^sc = f(A^s, b^c) = A^s b^c

y^sc: observation vector in style s and content c
A^s: matrix for style s
b^c: vector for content element c
Asymmetric bilinear model, with identity as the style factor.
Symmetric bilinear model
y^sc_k = f(a^s, b^c) = a^sᵀ W_k b^c

y^sc_k: k-th element of the observation vector in style s and content c
W_k: matrix for element k of the observation vector
b^c: vector for content element c
a^s: vector for style s
Symmetric bilinear model
Fitting the model to training observations:
Asymmetric model (y^sc = A^s b^c): a single SVD of the stacked observation matrix.
Symmetric model (y^sc_k = a^sᵀ W_k b^c): iterated SVDs, using the vector transpose (Magnus and Neudecker, 1988).
[Figure: training observations arranged by head pose and identity, stacked into the matrix y that the SVD decomposes.]
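For the asymmetric model, the single-SVD fit can be written in a few lines. A sketch on synthetic data (NumPy; the dimensions S, C, K, J are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
S, C, K, J = 4, 5, 10, 3    # styles, contents, observation dim, model dim

# Synthetic training data generated by a true asymmetric model.
A_true = rng.normal(size=(S, K, J))                    # K x J matrix per style
B_true = rng.normal(size=(J, C))                       # J-vector per content
Y = np.vstack([A_true[s] @ B_true for s in range(S)])  # stacked, (S*K) x C

# Closed-form fit: one SVD of the stacked observation matrix.
U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
A_hat = U[:, :J] * sv[:J]   # stacked style matrices, (S*K) x J
B_hat = Vt[:J]              # content vectors, J x C

# The factors are only defined up to an invertible J x J transform,
# but the rank-J reconstruction of the observations is exact.
print(np.allclose(A_hat @ B_hat, Y))   # True
```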
Related Work, bilinear models
Koenderink and Van Doorn, 1991, 1996
Tomasi and Kanade, 1992
Faugeras, 1993
Magnus and Neudecker, 1988
Marimont and Wandell, 1992
Turk and Pentland, 1991
Ullman and Basri, 1991
Murase and Nayar, 1995
Related Work, analyzing style
Hofstadter, 1995 and earlier papers.
Grebert et al, 1992
SIGGRAPH papers regarding controls for animation or line style. Typically hand-crafted, not learned.
Brand and Hertzmann, 2000
Hertzmann et al, 2001
Efros and Freeman, 2001
Procedure
(1) Fit a bilinear model to the training data of content elements observed across different styles, using linear algebra techniques.
(2) Use new data to find the parameters for a new, unknown style, or to classify new observations, or to generalize both style and content.
Task: Classification. Domain: vowel phonemes.
[Figure: the training set is a phoneme-by-speaker grid of utterances (“ah eh ou ...”); below it, utterances from a new speaker (“eh ee ou ...”).]
Benchmark dataset
CMU machine learning repository
Training: 8 speakers saying 11 different vowel phonemes.
Testing: 7 new speakers
Data representation: LPC coefficients.
Classification using bilinear models
Use the EM (expectation-maximization) algorithm.
Build up a model of the new speaker’s style simultaneously with classification of the content.
y_observed = A_new speaker b_phoneme

y_observed: vowel data from a speaker in a new style
A_new speaker: matrix describing the unknown style of the new speaker
b_phoneme: previously learned vowel (content) descriptors
Example problem for the Expectation-Maximization (EM) algorithm:
“Find the probability that each point came from one of two random spatial processes.”
E-step: assign class-membership probabilities.
M-step: estimate the underlying probability distributions.
Iterate the two steps.
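The same loop, applied to the speaker problem, alternates between soft-classifying the content (E-step) and re-estimating the new speaker's style matrix (M-step). The sketch below is a reconstruction under stated assumptions (isotropic Gaussian noise, the helper name `em_classify`, the initialization), not the authors' exact algorithm:

```python
import numpy as np

def em_classify(Y_new, B, n_iter=20, sigma2=1.0, A0=None):
    """Jointly estimate a new style matrix A and soft content labels.

    Y_new: (N, K) observations in the unknown style.
    B:     (J, C) previously learned content vectors.
    Returns hard labels (N,) and the estimated style matrix A (K, J).
    """
    N, K = Y_new.shape
    J, C = B.shape
    if A0 is None:                       # crude init: partial identity
        A0 = np.zeros((K, J))
        A0[: min(K, J), : min(K, J)] = np.eye(min(K, J))
    A = np.asarray(A0, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibility of each content class for each sample,
        # under an isotropic Gaussian noise model.
        pred = A @ B                                              # (K, C)
        d2 = ((Y_new[:, :, None] - pred[None]) ** 2).sum(axis=1)  # (N, C)
        R = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma2))
        R /= R.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted least-squares update of A.
        num = (Y_new.T @ R) @ B.T                  # (K, J)
        den = B @ np.diag(R.sum(axis=0)) @ B.T     # (J, J)
        A = num @ np.linalg.pinv(den)
    return R.argmax(axis=1), A
```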
Classification results: performance comparison
Multi-layer perceptron: 51%
1-nearest neighbor (nn): 56%
Discriminant-adaptive nn: 62%
Bilinear model, data not grouped: 69%
Bilinear model, data grouped by speaker: 76%
Task: Classification. Domain: faces and pose.
Nearest neighbor matching: 53%
Bilinear model, estimating A^s while classifying b^c with EM: 74%
Face pose classification results
Given observations of a new face, what % of the poses can we identify correctly?
Chicago
Zaph
Times
Mistral
Times Bold
Monaco
(Rest of alphabet, used in training, not shown.)
Task: Extrapolation. Domain: typography
Coulomb warp representation
Describe each shape by the warp that a square of ink particles would have to undergo to form the shape.
Coulomb warping
[Figure: + charges placed on the reference shape, - charges on the target shape.]
Coulomb warp representation
[Figure: shapes S1, S2 and the average S1 + S2, computed in the pixel representation vs. in the Coulomb warp representation.]
Basis functions for the asymmetric bilinear model
[Figure: the letter “C” in each font is generated as y = A_font b_“C”: style matrices A_Chicago, A_Zaph, A_Mistral each multiply the shared content vector b_“C”.]
Controlling complexity in calculating the style matrix for the new font:
asymmetric model (173,280 parameters to fit)
symmetric model (5 parameters to fit)
asymmetric model, using the symmetric model as a prior
[Figure: synthetic vs. actual letters for the held-out font Monaco (true), alongside the training fonts Chicago, Zaph, Times, Mistral, Times Bold.]
Results of extrapolation to a new style
Leave-one-out results
Fonts: Chicago, Zaph Chancery, Times, Mistral, Times Bold, Monaco.
Task: Translation. Domain: shape and lighting
[Figure: training faces arranged by factor 1 (lighting) and factor 2 (identity / face shape).]
(1) Fit a symmetric bilinear model to the training data (pixel representation).
(2) Solve for the parameters describing the face and lighting of a new image.
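Solving for the lighting (style) and face (content) vectors of a new image, given the trained W_k tensor, can be done by alternating least squares, because the symmetric model is linear in each factor when the other is held fixed. A sketch; the function name, the tensor layout (K, I, J), and the ALS scheme are assumptions:

```python
import numpy as np

def solve_style_content(y, W, n_iter=100, a0=None):
    """Recover style a and content b for a new observation y under the
    symmetric model y_k = a^T W_k b, by alternating least squares.

    y: (K,) observation; W: (K, I, J) trained interaction tensor.
    """
    K, I, J = W.shape
    a = np.ones(I) if a0 is None else np.asarray(a0, dtype=float)
    b = np.ones(J)
    for _ in range(n_iter):
        # With a fixed, y_k = (a^T W_k) b is linear in b.
        Ma = np.einsum('kij,i->kj', W, a)          # (K, J)
        b, *_ = np.linalg.lstsq(Ma, y, rcond=None)
        # With b fixed, y_k = (W_k b)^T a is linear in a.
        Mb = W @ b                                 # (K, I)
        a, *_ = np.linalg.lstsq(Mb, y, rcond=None)
    return a, b
```

Note that a and b are only determined up to a reciprocal scaling (a → λa, b → b/λ), so solutions should be compared by reconstruction rather than element-wise.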
Training
Generalization
Translation Results
[Figure: translation results arranged by factor 1 (lighting) and factor 2 (identity / face shape).]
Conclusion: bilinear models are useful for translation, classification, and extrapolation perceptual tasks.
factor 1     factor 2        observation
letter #1    Matura MT       rendered “A”
phoneme      speaker         “ahh”
pose 3       Hiro
illuminant   surface color   eye cone responses
End. Extra pages follow.
The following slides are extras….
Style and content
Mention that an unsupervised version would be a good class project; Josh or I would be interested in working with someone on it.
Increase dimensionality to represent non-linearities
Say f(x) = p x² + q x + r. This parabola varies non-linearly with x, but is a linear function of the vector (x², x, 1).
(Like “homogeneous coordinates” in graphics.)
Fitting parabolas
1-d model
2-d model
3-d model
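The lifting trick above can be demonstrated directly: ordinary linear least squares on the features (x², x, 1) recovers p, q, r exactly. A minimal NumPy sketch (the coefficient values are made up for illustration):

```python
import numpy as np

# f(x) = p x^2 + q x + r: non-linear in x, linear in (x^2, x, 1).
p, q, r = 2.0, -3.0, 0.5
x = np.linspace(-1.0, 1.0, 50)
y = p * x**2 + q * x + r

Phi = np.column_stack([x**2, x, np.ones_like(x)])  # lifted features
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.allclose(coef, [p, q, r]))   # True: linear LS recovers the parabola
```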
Reconstruction from low-dimensional model
Eigenfaces for each pose
Factor 1: head pose. Factor 2: identity.
Task: Classification. Domain: faces and pose.
We build a bilinear model of how head pose and identity modify face appearance.
Basis images
Pose
Pose-dependent basis functions for face appearance.
One set of coefficients will reconstruct the same person in different poses.