hilbert space embeddings of conditional distributions -- with applications to dynamical systems le...
TRANSCRIPT
Hilbert Space Embeddings of Conditional Distributions
--With Applications to Dynamical Systems
Le SongCarnegie Mellon University
Joint work with Jonathan Huang, Alex Smola and Kenji Fukumizu
Hilbert Space Embeddings
• Summary statistics for distributions:
(mean)
(covariance)
(expected features)
Hilbert Space Embeddings
• Reconstruct from for certain kernels!Eg. RBF kernel .
• Sample average converges to true mean at .
Hilbert Space Embeddings
Embed Conditional Distributions p(y|x) µ[p(y|x)]
Embed Conditional Distributions p(y|x) µ[p(y|x)]
Structured Prediction
Dynamical Systems
Embedding P(Y|X)
• Simple case: X binary, Y vector valued• Need to embed
• Solution:– Divide observations according to their – Average feature for each group
What if X is continuous or structured?
• Goal: Estimate embedding from finite sample• Problem: Some X=x are never observed…
Embed p(Y|X=x) for x never observed?
Embedding P(Y|X)
• If and are close, and will be similar.
Key Idea
Generalize to unobserved X=x via a kernel on X
Conditional Embedding
Conditional Embedding Operator
Conditional Embedding Operator
generalization of covariance matrix
Kernel Estimate
• Empirical estimate converges at
Sum and Product RulesProbabilistic
RelationHilbert Space
Relation
Sum Rule
Product Rule
Can do probabilistic inference with embeddings!
Dynamical Systems
Maintain belief usingHilbert space embeddings
Hidden State
observation
Advantages
• Applies to any domain with a kernelEg. reals, rotations, permutations, strings
• Applies to more general distributionsEg. multimodal, skewed
• Linear dynamics model (Kalman filter optimal):
Evaluation with Linear Dynamics
Trajectory of Hidden States
Observations
Kalman
2 4 6
0.2
0.25
0.3
0.35
N(0,10-1 I), N(0,10-2 I)
Training Size ( Steps)
RM
S E
rror
RKHS
Gaussian process noise, Gaussian observation noise
2 4 6
0.2
0.22
0.24
(1,1/3), N(0,10-2 I)
Training Size ( Steps)
RM
S E
rror
RKHS
Kalman
Skewed process noise, small observation noise
2 3 4 5 60.15
0.155
0.16
0.165
0.17
0.175
0.5N(-0.2,10 -2 I)+0.5N(0.2,10 -2 I), N(0,10 -2 I)
Training Size ( Steps)
RM
S E
rror
RKHSKalman
Bimodal process noise, small observation noise
Camera Rotation
Rendered using POV-ray renderer
Predicted Camera Rotation
Ground truth
RKHS Kalman filter Random
Trace measure
3 2.91 2.18 0.01
Dynamical systems withHilbert space embeddings
works better
Video from Predicted Rotation
Original VideoVideo Rendered with
Predicted Camera Rotation
Conclusion
• Embedding conditional distributions– Generalizes to unobserved X=x– Kernel estimate – Convergence of empirical estimate– Probabilistic inference
• Application to dynamical systems learning and inference
The End
• Thanks
Future Work
• Estimate the hidden states during training?
• Dynamical system = chain graphical model; how about tree and general graph topology?
2 4 6
0.3
0.4
0.5
N(0,10-2 I), N(0,10-1 I)
Training Size (2 n Steps)
RM
S E
rror
RKHS
Kalman
Small process noise, large observation noise
Dynamics can Help
-3 -2 -1 0
11.5
22.5
log10(Noise Variance)
Tra
ce M
easu
re
Dynamical Systems
Maintain belief usingHilbert space embeddings
Hidden State
observation
Conditional Embedding Operator
• Desired properties:1. 2.
• Covariance operator:
generalization of covariance matrix