hilbert space embeddings of conditional distributions -- with applications to dynamical systems le...

Hilbert Space Embeddings of Conditional Distributions

--With Applications to Dynamical Systems

Le SongCarnegie Mellon University

Joint work with Jonathan Huang, Alex Smola and Kenji Fukumizu

Hilbert Space Embeddings

• Summary statistics for distributions:

(mean)

(covariance)

(expected features)


• Reconstruct from for certain kernels!Eg. RBF kernel .

• Sample average converges to true mean at .


Embed Conditional Distributions p(y|x) µ[p(y|x)]

Embed Conditional Distributions p(y|x) µ[p(y|x)]

Structured Prediction

Dynamical Systems

Embedding P(Y|X)

• Simple case: X binary, Y vector valued• Need to embed

• Solution:– Divide observations according to their – Average feature for each group

What if X is continuous or structured?

• Goal: Estimate embedding from finite sample• Problem: Some X=x are never observed…

Embed p(Y|X=x) for x never observed?

Embedding P(Y|X)

• If and are close, and will be similar.

Key Idea

Generalize to unobserved X=x via a kernel on X

Conditional Embedding

Conditional Embedding Operator


generalization of covariance matrix

Kernel Estimate

• Empirical estimate converges at

Sum and Product RulesProbabilistic

RelationHilbert Space

Relation

Sum Rule

Product Rule

Can do probabilistic inference with embeddings!

Dynamical Systems

Maintain belief usingHilbert space embeddings

Hidden State

observation

Advantages

• Applies to any domain with a kernelEg. reals, rotations, permutations, strings

• Applies to more general distributionsEg. multimodal, skewed

• Linear dynamics model (Kalman filter optimal):

Evaluation with Linear Dynamics

Trajectory of Hidden States

Observations

Kalman

2 4 6

0.2

0.25

0.3

0.35

N(0,10-1 I), N(0,10-2 I)

Training Size ( Steps)

RM

S E

rror

RKHS

Gaussian process noise, Gaussian observation noise

2 4 6

0.2

0.22

0.24

(1,1/3), N(0,10-2 I)


RM

S E

rror

RKHS

Kalman

Skewed process noise, small observation noise

2 3 4 5 60.15

0.155

0.16

0.165

0.17

0.175

0.5N(-0.2,10 -2 I)+0.5N(0.2,10 -2 I), N(0,10 -2 I)


RM

S E

rror

RKHSKalman

Bimodal process noise, small observation noise

Camera Rotation

Rendered using POV-ray renderer

Predicted Camera Rotation

Ground truth

RKHS Kalman filter Random

Trace measure

3 2.91 2.18 0.01

Dynamical systems withHilbert space embeddings

works better

Video from Predicted Rotation

Original VideoVideo Rendered with

Predicted Camera Rotation

Conclusion

• Embedding conditional distributions– Generalizes to unobserved X=x– Kernel estimate – Convergence of empirical estimate– Probabilistic inference

• Application to dynamical systems learning and inference

The End

• Thanks

Future Work

• Estimate the hidden states during training?

• Dynamical system = chain graphical model; how about tree and general graph topology?

2 4 6

0.3

0.4

0.5

N(0,10-2 I), N(0,10-1 I)

Training Size (2 n Steps)

RM

S E

rror

RKHS

Kalman

Small process noise, large observation noise

Dynamics can Help

-3 -2 -1 0

11.5

22.5

log10(Noise Variance)

Tra

ce M

easu

re

Dynamical Systems

Maintain belief usingHilbert space embeddings

Hidden State

observation


• Desired properties:1. 2.

• Covariance operator:

generalization of covariance matrix

hilbert space embeddings of conditional distributions -- with applications to dynamical systems le...

Documents