Image Manifolds
16-721: Learning-based Methods in Vision
Alexei Efros, CMU, Spring 2007
© A.A. Efros
With slides by Dave Thompson
Images as Vectors
[Figure: an m×n image unrolled into a single vector of length n·m]
Importance of Alignment
[Figure: two m×n images unrolled into n·m-vectors; comparing images as vectors is only meaningful if they are aligned]
Text Synthesis
[Shannon, ’48] proposed a way to generate English-looking text using N-grams:
• Assume a generalized Markov model
• Use a large text to compute the probability distribution of each letter given the N-1 previous letters
• Starting from a seed, repeatedly sample this Markov chain to generate new letters
• Also works for whole words
WE NEED TO EAT CAKE
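The N-gram procedure above can be sketched in a few lines. This is a toy illustration, not Shannon's original code; the "corpus" here is just the seed sentence, so the output quickly loops:

```python
import random
from collections import defaultdict

def build_model(text, n=3):
    """Map each (n-1)-letter context to the letters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - n + 1):
        model[text[i:i + n - 1]].append(text[i + n - 1])
    return model

def generate(model, seed, n=3, length=20):
    """Start from a seed and repeatedly sample the Markov chain."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-(n - 1):])
        if not choices:            # dead end: context never seen in corpus
            break
        out += random.choice(choices)
    return out

corpus = "WE NEED TO EAT CAKE "
model = build_model(corpus, n=3)
text = generate(model, "WE", n=3, length=20)
```

With a large real corpus and larger N, the same loop produces the Shannon-style samples quoted below.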
Mark V. Shaney (Bell Labs)
Results (using the alt.singles corpus):
• “As I've commented before, really relating to someone involves standing next to impossible.”
• “One morning I shot an elephant in my arms and kissed him.”
• “I spent an interesting evening recently with a grain of salt”
Video Textures
Arno Schödl
Richard Szeliski
David Salesin
Irfan Essa
Microsoft Research, Georgia Tech
Video textures
Our approach
• How do we find good transitions?
Finding good transitions
• Compute the L2 distance D_ij between all pairs of frames
• Similar frames make good transitions
[Figure: frame i vs. frame j]
Markov chain representation
[Figure: Markov chain over frames 1–4]
Similar frames make good transitions
Transition costs
• Transition from i to j if the successor of i is similar to j
• Cost function: C_ij = D_{i+1, j}
[Figure: jumping from frame i to frame j compares frame i+1 with frame j]
Transition probabilities
• Probability of transition P_ij is inversely related to cost:
• P_ij ∝ exp(−C_ij / σ²)
[Figure: low cost → high probability; high cost → low probability]
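Sampling the resulting Markov chain might look like this. It is a sketch, not the paper's implementation: for simplicity the last frame is dropped as a source and target, which glosses over the paper's handling of sequence ends:

```python
import numpy as np

def transition_probs(D, sigma):
    """Row-stochastic transition matrix: P_ij ∝ exp(-C_ij / σ²),
    where C_ij = D_{i+1, j}.  The last frame is excluded as a source
    and as a target so the random walk never runs off the end."""
    C = D[1:, :-1]                          # C_ij = D_{i+1, j}
    P = np.exp(-C / sigma**2)
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
D = rng.random((20, 20))                    # stand-in frame-distance matrix
D = (D + D.T) / 2
np.fill_diagonal(D, 0)

P = transition_probs(D, sigma=0.5)

# Play the video texture: repeatedly sample the next frame.
frame, sequence = 0, [0]
for _ in range(30):
    frame = rng.choice(P.shape[1], p=P[frame])
    sequence.append(frame)
```

Smaller σ concentrates probability on the very best transitions (closer to simple playback); larger σ trades quality for variety.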
Preserving dynamics
• Cost for transition i → j
• C_ij = Σ_{k=−N}^{N−1} w_k D_{i+k+1, j+k}
[Figure: a window of frame pairs around the transition is compared term by term: D_{i−1,j−2}, D_{i,j−1}, D_{i+1,j}, D_{i+2,j+1} for N = 2]
Preserving dynamics – effect
• Cost for transition i → j
• C_ij = Σ_{k=−N}^{N−1} w_k D_{i+k+1, j+k}
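A direct (unoptimized) implementation of the windowed cost could look like this. The binomial weights w_k are an assumption for illustration; the slide leaves the weights unspecified:

```python
import math
import numpy as np

def dynamics_cost(D, N=2):
    """Windowed cost C_ij = sum over k = -N..N-1 of w_k * D[i+k+1, j+k].

    Matching a window of 2N frame pairs, rather than a single pair,
    forces the motion across the transition to agree, not just the
    appearance.  Binomial weights w_k are an assumed choice here.
    """
    T = D.shape[0]
    w = np.array([math.comb(2 * N - 1, k + N) for k in range(-N, N)],
                 dtype=float)
    w /= w.sum()
    C = np.full((T, T), np.inf)          # inf where the window overruns the clip
    for i in range(N - 1, T - N):
        for j in range(N, T - N + 1):
            C[i, j] = sum(w[k + N] * D[i + k + 1, j + k]
                          for k in range(-N, N))
    return C

rng = np.random.default_rng(0)
D = rng.random((12, 12))                 # stand-in frame-distance matrix
D = (D + D.T) / 2
np.fill_diagonal(D, 0)
C = dynamics_cost(D, N=2)
```

Note that C[i, i+1] is always zero: continuing along the original frame order costs nothing, as it should.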
Video sprite extraction
[Figure: blue-screen matting and velocity estimation extract sprite frames for animation]
Video sprite control
• Augmented transition cost:
• C′_ij = C_ij + w · angle(velocity vector, vector to mouse pointer)
  (similarity term + control term)
Interactive fish
Advanced Perception David R. Thompson
manifold learning with applications to object recognition
plenoptic function
manifolds in vision
appearance variation
manifolds in vision
images from hormel corp.
deformation
manifolds in vision
images from www.golfswingphotos.com
Find a low-dimensional basis for describing high-dimensional data: X ≈ X′ such that dim(X′) ≪ dim(X)
uncovers the intrinsic dimensionality
manifold learning
If we knew all pairwise distances…
          Chicago  Raleigh  Boston  Seattle   S.F.  Austin  Orlando
Chicago         0
Raleigh       641        0
Boston        851      608       0
Seattle      1733     2363    2488        0
S.F.         1855     2406    2696      684      0
Austin        972     1167    1691     1764   1495       0
Orlando       994      520    1105     2565   2458    1015        0
Distances calculated with geobytes.com/CityDistanceTool
Multidimensional Scaling (MDS)
For n data points and an n×n distance matrix D with entries D_ij = distance between points i and j, we can construct an m-dimensional embedding that preserves inter-point distances by taking the top m eigenvectors of the double-centered squared-distance matrix, scaled by the square roots of their eigenvalues.
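Classical MDS can be sketched as follows; note the eigen-decomposition is applied to the double-centered squared-distance matrix, not to D itself. The demo uses the city-distance table above:

```python
import numpy as np

def classical_mds(D, m=2):
    """Embed points in R^m from a pairwise-distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centered points
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:m]    # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

D = np.array([
    [   0,  641,  851, 1733, 1855,  972,  994],   # Chicago
    [ 641,    0,  608, 2363, 2406, 1167,  520],   # Raleigh
    [ 851,  608,    0, 2488, 2696, 1691, 1105],   # Boston
    [1733, 2363, 2488,    0,  684, 1764, 2565],   # Seattle
    [1855, 2406, 2696,  684,    0, 1495, 2458],   # S.F.
    [ 972, 1167, 1691, 1764, 1495,    0, 1015],   # Austin
    [ 994,  520, 1105, 2565, 2458, 1015,    0],   # Orlando
], dtype=float)
Y = classical_mds(D, m=2)                 # 2-D city coordinates, up to rotation
```

The embedding is recovered only up to rotation and reflection, which is why MDS plots of cities sometimes come out mirrored.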
MDS result in 2D
Actual plot of cities
Don’t know distances
why do manifold learning?
1. data compression
2. “curse of dimensionality”
3. de-noising
4. visualization
5. reasonable distance metrics
reasonable distance metrics
[Figure: two images of the same object; what is a reasonable distance between them?]
linear interpolation
[Figure: interpolating linearly in pixel space produces a ghosted blend]
manifold interpolation
[Figure: interpolating along the image manifold produces plausible intermediate images]
Isomap for images
• Build a data graph G: vertices are images; (u, v) is an edge iff SSD(u, v) is small
• For any two images, approximate the distance between them by the “shortest path” on G
Isomap
1. Build a sparse graph with K-nearest neighbors
D_g = (distance matrix is sparse)
Isomap
2. Infer other interpoint distances by finding shortest paths on the graph (Dijkstra's algorithm).
D_g = (missing entries filled in with shortest-path distances)
Isomap: shortest distance on a graph is easy to compute
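A compact sketch of steps 1 and 2, using Floyd-Warshall in place of Dijkstra for brevity (fine for small n; O(n³) in general). `isomap_distances` is an illustrative helper, not library code:

```python
import numpy as np

def isomap_distances(X, k=5):
    """Estimate geodesic distances: build a k-NN graph on the points,
    then compute all-pairs shortest paths over it."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))  # Euclidean distances
    G = np.full((n, n), np.inf)                         # sparse graph weights
    nn = np.argsort(D, axis=1)[:, 1:k + 1]              # k nearest, skipping self
    for i in range(n):
        G[i, nn[i]] = D[i, nn[i]]
        G[nn[i], i] = D[nn[i], i]                       # keep the graph symmetric
    np.fill_diagonal(G, 0.0)
    for m in range(n):                                  # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G
```

Feeding the resulting geodesic distance matrix into MDS completes the Isomap embedding.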
Isomap results: hands
Isomap: pro and con
+ preserves global structure
+ few free parameters
− sensitive to noise and noisy edges
− computationally expensive (dense matrix eigen-reduction)
Leakage problem
Locally Linear Embedding
Find a mapping that preserves local linear relationships between neighbors
LLE: Two key steps
1. Find the weight matrix W of linear coefficients that best reconstruct each point from its neighbors, enforcing the sum-to-one constraint Σ_j W_ij = 1
2. Find projected vectors Y that minimize the same reconstruction error (must be solved for the whole dataset simultaneously)
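Both steps can be sketched as follows. This is a minimal version: the regularization constant and neighbor count are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def lle(X, k=6, d=2, reg=1e-3):
    """Locally Linear Embedding in two steps.

    Step 1: for each point, solve for the weights over its k nearest
    neighbors that best reconstruct it, constrained to sum to one.
    Step 2: find low-D coordinates Y minimizing the same reconstruction
    error, via the bottom eigenvectors of M = (I - W)^T (I - W),
    skipping the constant eigenvector.
    """
    n = X.shape[0]
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]            # skip self
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                           # neighbors centered on x_i
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(k)              # regularize (assumed constant)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()                     # enforce sum-to-one
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                             # bottom non-constant eigvecs

rng = np.random.default_rng(0)
X = rng.random((40, 3))
Y = lle(X, k=6, d=2)
```

Solving step 2 as one eigenproblem over the whole dataset is what makes the coupling between points global, even though each weight fit in step 1 is purely local.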
LLE result: preserves local topology
[Figure: 2-D embeddings compared: PCA vs. LLE]
LLE: pro and con
+ no local minima, one free parameter
+ incremental & fast
+ simple linear algebra operations
− can distort global structure