Post on 17-Dec-2015
1
Articulated Pose Estimation in a Learned Smooth Space of
Feasible Solutions
Taipeng Tian, Rui Li and Stan Sclaroff
Computer Science Dept.
Boston University
2
Introduction
• Motivating application
  – Gesture Recognition
  – Fixed Gesture Lexicon
  – For example:
Aircraft Signaler hand gestures
Traffic Controller hand signals
Basketball Referee hand signals
3
Problem Definition
Input (Observation): Silhouette (Alt Moments)
Pose Estimation
Output: 2D Projected Marker Positions
4
Related Work : Pose Estimation from a Single Image
• Geometry Based
  – Taylor CVIU ’01
  – Barron & Kakadiaris IVC ’01
  – Parameswaran & Chellappa CVPR ’04
• Learning Based
  – Rosales & Sclaroff HUMO ’00
  – Agarwal & Triggs CVPR ’04
• Others
  – Lee & Cohen CVPR ’04
  – Shakhnarovich, Viola, Darrell ICCV ’03
  – Mori, Ren, Efros and Malik CVPR ’04
  – Many more …
5
Idea 1 : Learning Mappings
• Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01]
• Relevance Vector Regression [Agarwal and Triggs CVPR ’04]
Image Features → Pose
8
Idea 2 : Exploring the Solution Space
• Simulated Annealing [Deutscher et al. CVPR ’00]
• Markov Chain Monte Carlo [Lee and Cohen CVPR ’04]
• etc …
• Use an accurate body model, typically with many degrees of freedom (DOF).
• Explore the pose space for a solution consistent with the observations.
• Difficult for high DOF.
• Computationally intensive.
9
Key Observations
• We have a constrained set of poses.
• Not necessary to explore the full parameter space.
• Combine two ideas
  – Learn mappings
  – Explore a constrained space (i.e. a learned model of body poses)
Aircraft Signaler hand gestures
Traffic Controller hand signals
Basketball Referee hand signals
10
Overview of Framework
Learning Phase (Y: Training Data, X: Latent Space)
1. Learn a model of human body poses
2. Learn the rendering function Φ(.)
Pose Inference Phase: Input Silhouette → Output Pose
$\min_{x,y} \|\Phi(y) - s\|^2 + \lambda_1 L_Y(x, y)$
11
Learning a Model of Human Poses
• The Gaussian Process Latent Variable Model (GPLVM) [Neil Lawrence NIPS ’04] is used.
• GPLVM was originally used for visualizing high-dimensional data.
• Grochow et al. (SIGGRAPH ’04) use it to solve the inverse kinematics problem for human motion animation.
• Here we use it for automated articulated body pose inference.
12
Gaussian Process Latent Variable Model (GPLVM) Overview
[Diagram: a probabilistic mapping links the higher-dimensional space (y) and the lower-dimensional / latent space (x)]
13
GPLVM Training : Learning a Model of Body Poses
• Given: training set of 2D projected marker positions $\{y_i\}$ (each $y_i$ has dimension $D$)
• Goal: learn the parameters $\{x_i\}, \alpha, \beta, \gamma$
  – $\{x_i\}$: corresponding latent variable values for each training data point
  – $\alpha, \beta, \gamma$: variables related to the kernel
14
Kernel Function
• Also known as the covariance function.
• Measures the similarity of the latent variables $x$ and $x'$.
• For a data set of size $N$, we form an $N \times N$ kernel matrix $K$, in which $K_{i,j} = k(x_i, x_j)$.
$k(x, x') = \alpha \exp\left(-\frac{\gamma}{2} \|x - x'\|^2\right) + \delta_{x,x'} \beta^{-1}$
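A minimal NumPy sketch of how such a kernel matrix could be built. It assumes the kernel on this slide is the usual GPLVM RBF form, $k(x, x') = \alpha \exp(-\frac{\gamma}{2}\|x - x'\|^2) + \delta_{x,x'}\beta^{-1}$; the function names, the toy latent points and the default parameter values are illustrative and not taken from the talk.

```python
import numpy as np

def rbf_kernel(X1, X2, alpha=1.0, gamma=1.0):
    """alpha * exp(-gamma/2 * ||x - x'||^2) for every pair of rows of X1 and X2."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-0.5 * gamma * sq_dists)

def kernel_matrix(X, alpha=1.0, beta=100.0, gamma=1.0):
    """N x N kernel matrix K; the delta term adds 1/beta on the diagonal."""
    return rbf_kernel(X, X, alpha, gamma) + np.eye(len(X)) / beta

# Three hypothetical 2D latent points.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = kernel_matrix(X)
print(K)
```

Nearby latent points get entries close to alpha, distant ones get entries close to zero, which is the sense in which the kernel measures similarity.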
15
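GPLVM Training : Learning a Model of Body Poses
• For a single dimension $d$, the likelihood of the data given the Gaussian Process (GP) model parameters is:
$p(\{y_{i,d}\} \mid \{x_i\}, \alpha, \beta, \gamma) = \frac{1}{\sqrt{(2\pi)^N |K|}} \exp\left(-\frac{1}{2} Y_d^T K^{-1} Y_d\right)$
• The joint likelihood for $D$ dimensions is:
$p(\{y_i\} \mid \{x_i\}, \alpha, \beta, \gamma) = \prod_{d=1}^{D} p(\{y_{i,d}\} \mid \{x_i\}, \alpha, \beta, \gamma)$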
16
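To learn the GPLVM from the training set $\{y_i\}$, we maximize the following posterior:
$p(\{x_i\}, \alpha, \beta, \gamma \mid \{y_i\})$
Placing the priors
$p(x) = N(x \mid 0, I)$, $p(\alpha, \beta, \gamma) \propto \frac{1}{\alpha\beta\gamma}$
Negative Log:
$\frac{D}{2} \ln|K| + \frac{1}{2} \sum_d Y_d^T K^{-1} Y_d + \frac{1}{2} \sum_i \|x_i\|^2 + \ln(\alpha\beta\gamma)$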
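As a rough illustration, the negative log posterior can be evaluated in a few lines of NumPy. This sketch assumes the objective is $\frac{D}{2}\ln|K| + \frac{1}{2}\sum_d Y_d^T K^{-1} Y_d + \frac{1}{2}\sum_i \|x_i\|^2 + \ln(\alpha\beta\gamma)$ with an RBF kernel; the function name and the toy data are hypothetical, and no active-set approximation is used.

```python
import numpy as np

def negative_log_posterior(X, Y, alpha, beta, gamma):
    """Negative log posterior of latent points X (N x q) given data Y (N x D)."""
    N, D = Y.shape
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = alpha * np.exp(-0.5 * gamma * sq_dists) + np.eye(N) / beta
    _, logdet = np.linalg.slogdet(K)                     # ln|K|, numerically stable
    data_fit = 0.5 * np.sum(Y * np.linalg.solve(K, Y))   # 1/2 sum_d Y_d^T K^-1 Y_d
    latent_prior = 0.5 * np.sum(X ** 2)                  # 1/2 sum_i ||x_i||^2
    hyper_prior = np.log(alpha * beta * gamma)           # ln(alpha beta gamma)
    return 0.5 * D * logdet + data_fit + latent_prior + hyper_prior

# Toy example: 4 poses with D = 3 coordinates each, 2D latent points.
rng = np.random.default_rng(0)
Y_toy = rng.normal(size=(4, 3))
X_toy = rng.normal(size=(4, 2))
print(negative_log_posterior(X_toy, Y_toy, alpha=1.0, beta=100.0, gamma=1.0))
```

Training would minimize this value over the latent points and the three kernel parameters; the cost of the N x N solve is what motivates restricting the kernel matrix to a subset of poses.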
17
To learn the GPLVM from the training set $\{y_i\}$, we maximize the following posterior:
$p(\{x_i\}, \alpha, \beta, \gamma \mid \{y_i\})$
Negative Log:
$\frac{D}{2} \ln|K| + \frac{1}{2} \sum_d Y_d^T K^{-1} Y_d + \frac{1}{2} \sum_i \|x_i\|^2 + \ln(\alpha\beta\gamma)$
Computationally intensive. A subset is chosen to compute the kernel matrix. This subset of poses is called the Active Set.
18
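• For a new pair $(x, y)$ we can predict using
$p(\{x_i\}, x', \alpha, \beta, \gamma \mid \{y_i\}, y')$
$L_Y(x, y) = -\ln p(\{y_i\}, y \mid \{x_i\}, x, \alpha, \beta, \gamma)$
$= \frac{\|y - f(x)\|^2}{2\sigma^2(x)} + \frac{D}{2} \ln \sigma^2(x) + \frac{1}{2} \|x\|^2$
• This equation can be used to solve for $x$ given $y$, or for $y$ given $x$, via gradient descent.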
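The following sketch shows one way to evaluate $L_Y(x, y)$ and to solve for $x$ given $y$. It assumes the standard Gaussian-process predictive mean $f(x) = Y^T K^{-1} k(x)$ and variance $\sigma^2(x) = k(x, x) - k(x)^T K^{-1} k(x)$ with an RBF kernel. The gradient is approximated by finite differences and the step size is halved until the objective decreases; both are simplifications chosen for brevity, not the method used in the talk. All names and the toy data are hypothetical.

```python
import numpy as np

def kernel(A, B, alpha, gamma):
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-0.5 * gamma * sq_dists)

def L_Y(x, y, X, Y, alpha=1.0, beta=100.0, gamma=1.0):
    """||y - f(x)||^2 / (2 sigma^2(x)) + D/2 ln sigma^2(x) + 1/2 ||x||^2."""
    N, D = Y.shape
    K = kernel(X, X, alpha, gamma) + np.eye(N) / beta
    k_x = kernel(X, x[None, :], alpha, gamma)[:, 0]   # k(x), shape (N,)
    K_inv_k = np.linalg.solve(K, k_x)
    f_x = Y.T @ K_inv_k                               # predicted pose, shape (D,)
    var_x = alpha + 1.0 / beta - k_x @ K_inv_k        # sigma^2(x)
    return (np.sum((y - f_x) ** 2) / (2.0 * var_x)
            + 0.5 * D * np.log(var_x) + 0.5 * np.sum(x ** 2))

def solve_x_given_y(y, x0, X, Y, steps=100, eps=1e-5):
    """Gradient descent on x with a numerical gradient and step halving."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        grad = np.array([(L_Y(x + eps * e, y, X, Y) - L_Y(x - eps * e, y, X, Y))
                         / (2.0 * eps) for e in np.eye(len(x))])
        current, step = L_Y(x, y, X, Y), 1.0
        while step > 1e-8 and L_Y(x - step * grad, y, X, Y) >= current:
            step *= 0.5
        if step <= 1e-8:          # no improving step found: stop
            break
        x = x - step * grad
    return x

# Toy model: three latent points and their 3-dimensional "poses".
X_train = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
Y_train = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
x_hat = solve_x_given_y(Y_train[1], np.array([0.8, 0.3]), X_train, Y_train)
print(x_hat, L_Y(x_hat, Y_train[1], X_train, Y_train))
```

A pair $(x, y)$ that agrees with the training data gets a low value, while a latent point paired with a mismatched pose is penalized through the first term.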
19
GPLVM
[Plot: learned 2D latent space, axes $x_1$ and $x_2$]
20
GPLVM
[Plot: learned 2D latent space, axes $x_1$ and $x_2$]
21
GPLVM
Left-hand-raised silhouettes tend to be clustered together
22
GPLVM
Does not always do a good job
23
About GPLVM
• Allows mapping to and from the lower dimensional space.
• Allows smooth parameterization (i.e. allows derivatives) in lower dimensional space.
• Two dimensions work well for our data set. (Grochow et al. use 2-5.)
24
Learning the Forward/Rendering Function
Input: 2D Pose
Output: Silhouettes (represented using Alt Moments)
Similar to Rosales and Sclaroff
25
Overview of Framework
Learning Phase (Y: Training Data, X: Latent Space)
1. Learn a model of human body poses
2. Learn the rendering function Φ(.)
Pose Inference Phase: Input Silhouette → Output Pose
$\min_{x,y} \|\Phi(y) - s\|^2 + \lambda_1 L_Y(x, y)$
26
Pose Inference
$\min_{y} \|\Phi(y) - s\|^2 + \lambda_1 \|y\|^2$
Typical regularization (also used by Agarwal and Triggs)
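For intuition only: if the forward function were linear, $\Phi(y) = A y$, this regularized objective would have a closed-form minimizer, $y = (A^T A + \lambda_1 I)^{-1} A^T s$. The talk does not claim that Φ is linear; the matrix `A`, the function name and the numbers below are hypothetical.

```python
import numpy as np

def ridge_pose(A, s, lam):
    """argmin_y ||A y - s||^2 + lam * ||y||^2 for a linear forward map Phi(y) = A y."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ s)

# Hypothetical linear "renderer" and observed feature vector.
A = np.eye(2)
s = np.array([2.0, 4.0])
y_hat = ridge_pose(A, s, lam=1.0)
print(y_hat)
```

The penalty pulls the estimate toward zero regardless of whether the result is a feasible body pose, which is the motivation for replacing it with a learned prior.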
27
Data Term
$\min_{y} \|\Phi(y) - s\|^2 + \lambda_1 \|y\|^2$
Data term: $\|\Phi(y) - s\|^2$
• $\Phi$: forward function (rendering function)
• $y$: 2D projected marker positions
• $s$: silhouette (Alt Moments)
28
Regularization Term
$\min_{x,y} \|\Phi(y) - s\|^2 + \lambda_1 \|y\|^2$
Replace $\|y\|^2$ with a prior knowledge term (i.e. the learned model of poses): $\lambda_1 L_Y(x, y)$, which is independent of the feature $s$.
29
Pose Inference
$\min_{x,y} \|\Phi(y) - s\|^2 + \lambda_1 L_Y(x, y)$
Solution obtained using Conjugate Gradient
– Initialization using the Active Set
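A sketch of how the inference objective and its initialization might be wired together. The slide does not say how the Active Set is used for initialization; here it is assumed, purely for illustration, that the active-set pair with the lowest objective value is taken as the starting point. `phi` and `L_Y` are passed in as callables, and the toy stand-ins below are hypothetical.

```python
import numpy as np

def pose_objective(x, y, s, phi, L_Y, lam1):
    """||phi(y) - s||^2 + lam1 * L_Y(x, y)."""
    return np.sum((phi(y) - s) ** 2) + lam1 * L_Y(x, y)

def init_from_active_set(s, X_active, Y_active, phi, L_Y, lam1):
    """Assumed strategy: start from the active-set pair with the lowest objective."""
    costs = [pose_objective(x, y, s, phi, L_Y, lam1)
             for x, y in zip(X_active, Y_active)]
    best = int(np.argmin(costs))
    return X_active[best], Y_active[best]

# Toy stand-ins for the learned rendering function and pose prior.
phi = lambda y: 2.0 * y
L_Y = lambda x, y: 0.5 * np.sum(x ** 2)

X_active = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
Y_active = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
s = np.array([2.0, 2.0])          # observed silhouette features
x0, y0 = init_from_active_set(s, X_active, Y_active, phi, L_Y, lam1=0.1)
print(x0, y0)
```

From such a starting point, the objective could be handed to any gradient-based optimizer, for example a conjugate-gradient routine such as `scipy.optimize.minimize` with `method="CG"`.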
30
Data Collection
• 12 gestures in the flight director lexicon
• Synthesize 6000 (silhouette, pose) pairs using Poser
• 3000 training (Male model)
• 3000 testing (Female model)
3D Pose
Synthesized silhouettes sampled uniformly over the frontal view-sphere
31
Experiments (Synthetic Data)
(a) Silhouette images generated by Poser 5 (Test Set)
(b) Estimation from SMA (Specialized Mapping Architecture)
(c) Our Approach
(d) Ground Truth
32
Comparison with SMA
33
Additional Constraints
Additional constraints can be added to achieve a more accurate estimate, e.g. temporal consistency:
$\min_{x_t, y_t} \|\Phi(y_t) - s_t\|^2 + \lambda_1 L_Y(x_t, y_t) + \lambda_2 \|y_t - y_{t-1}\|^2$
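The temporal term simply adds a penalty on the change in pose between consecutive frames. A small sketch, assuming the objective is $\|\Phi(y_t) - s_t\|^2 + \lambda_1 L_Y(x_t, y_t) + \lambda_2 \|y_t - y_{t-1}\|^2$; the weights, the callables and the toy values are hypothetical.

```python
import numpy as np

def temporal_objective(x_t, y_t, y_prev, s_t, phi, L_Y, lam1, lam2):
    """Per-frame objective with a temporal smoothness penalty on the pose."""
    data_term = np.sum((phi(y_t) - s_t) ** 2)
    prior_term = lam1 * L_Y(x_t, y_t)
    temporal_term = lam2 * np.sum((y_t - y_prev) ** 2)
    return data_term + prior_term + temporal_term

# Toy stand-ins: identity renderer and a flat pose prior.
identity_phi = lambda y: y
flat_prior = lambda x, y: 0.0

cost = temporal_objective(
    x_t=np.zeros(2), y_t=np.array([1.0, 1.0]), y_prev=np.array([0.0, 0.0]),
    s_t=np.array([1.0, 1.0]), phi=identity_phi, L_Y=flat_prior,
    lam1=1.0, lam2=0.5)
print(cost)
```

With a larger `lam2`, a pose that explains the silhouette equally well but jumps far from the previous frame becomes more expensive.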
34
Experiments (Real Data)
(a) Silhouette images of real person
(b) SMA (Specialized Mapping Architecture)
(c) Our Approach (Without Temporal Consistency)
(d) Our Approach (With Temporal Consistency)
35
Experiments (Real Data)
(a) Silhouette images of real person
(b) SMA (Specialized Mapping Architecture)
(c) Our Approach (Without Temporal Consistency)
(d) Our Approach (With Temporal Consistency)
36
Conclusion
• Proposed a novel method for pose estimation for a pre-defined gesture lexicon.
• Interesting to note that two dimensions are enough in our case.
• Technique is fast (about 0.1 sec per frame in Matlab).
• Tracking as an extension. [video]
37
Thank You
38
Comments after the talk
• Related Works
  – Bullets / summary of strengths vs. weaknesses
  – Why do we need this work?
• Include year of publication for the related work (e.g. Rosales and Sclaroff work not mentioned, Sminchisescu work not mentioned)
• Order the related work temporally?
• Include an introduction slide and a motivating slide
  – How to motivate this work?
  – State of the art is so and so… We found this common weakness. So we proposed this work.
• Human pose not mentioned in the intro
• At the end of the talk, say why to use this work over the others
• Why GPLVM and not other reduction techniques, like LLE/PCA/ISOMAP etc.?
• Give a top-level overview of the algorithm. A flow chart view?
• Explain the L(x,y) mapping using an illustration like the mapping between two planes. Clearly say what is the high-dimensional y and what is the low-dimensional x
• Give a reference for GPLVM or a website link.
• Add a slide on the math of GPLVM
• The Tikhonov regularization approach of minimizing ||Φ(y) − s|| + regularization term. Usually the regularization term is ||Dx|| but now we chose L(x,y). Explain why
• Slide to talk about the temporal constraint.
• Why learn the rendering function? i.e. because we want to take the derivative…
• Give the numbers for the training set; this gives an idea of how good the quantitative results are
39
Related Work
Model Based
• Simulated Annealing [Deutscher et al. CVPR ’00]
• Kinematic Jump Processes [Sminchisescu and Triggs CVPR ’03]
• Markov Chain Monte Carlo [Lee and Cohen CVPR ’04]
• etc …
Learning Based
• Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01]
• Relevance Vector Regression [Agarwal and Triggs CVPR ’04]
• Parameter Sensitive Hashing [Shakhnarovich et al. ICCV ’03]
• etc …
40
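To learn the GPLVM from the training set $\{y_i\}$, we maximize the following posterior:
$p(\{x_i\}, \alpha, \beta, \gamma \mid \{y_i\})$
Negative Log:
$L = \frac{D}{2} \ln|K| + \frac{1}{2} \sum_d Y_d^T K^{-1} Y_d + \frac{1}{2} \sum_i \|x_i\|^2 + \ln(\alpha\beta\gamma)$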
41
Overview of Framework (Learning Phase)
1. Learn a model of human body poses (using GPLVM)
2. Learn the rendering function Φ(.)
42
Overview of Framework (Estimation Phase)
Input Silhouette
Output Pose
Search over learned model of human body pose for solution consistent with observation
43
Kernel Function
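• Measures the similarity of the latent variables $x$ and $x'$.
• For a data set of size $N$, we can form an $N \times N$ kernel matrix $K$, in which $K_{i,j} = k(x_i, x_j)$.
$k(x, x') = \alpha \exp\left(-\frac{\gamma}{2} \|x - x'\|^2\right) + \delta_{x,x'} \beta^{-1}$
• $\alpha$: how correlated $x$, $x'$ are in general
• $\gamma$: spread of the function
• $\beta^{-1}$: noise in the prediction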
44
GPLVM Training : Learning a Model of Body Poses
To learn the parameters of the GPLVM from the training set $\{y_i\}$, we maximize the following posterior:
$p(\{x_i\}, \alpha, \beta, \gamma \mid \{y_i\})$
Placing the priors
$p(x) = N(x \mid 0, I)$, $p(\alpha, \beta, \gamma) \propto \frac{1}{\alpha\beta\gamma}$
45
Gaussian Process Latent Variable Model (GPLVM)
$L_Y(x, y)$: expresses how well the two values match
• $x$: low-dimensional parameterization
• $y$: original space representation
Space of Feasible Poses
46
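• For a new pair $(x, y)$ we can predict using
$p(\{x_i\}, x', \alpha, \beta, \gamma \mid \{y_i\}, y')$
$L_Y(x, y) = -\ln p(\{y_i\}, y \mid \{x_i\}, x, \alpha, \beta, \gamma)$
$= \frac{\|y - f(x)\|^2}{2\sigma^2(x)} + \frac{D}{2} \ln \sigma^2(x) + \frac{1}{2} \|x\|^2$
$f(x) = Y^T K^{-1} k(x)$
$\sigma^2(x) = k(x, x) - k(x)^T K^{-1} k(x)$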