
Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions

Taipeng Tian, Rui Li and Stan Sclaroff

Computer Science Dept.

Boston University

Introduction

• Motivating application
– Gesture Recognition
– Fixed Gesture Lexicon
– For example:

Aircraft Signaler hand gestures

Traffic Controller hand signals

Basketball Referee hand signals

Problem Definition

Input (Observation): Silhouette (Alt Moments)

→ Estimation →

Output: 2D Projected Marker Positions

Related Work : Pose Estimation from a Single Image

• Geometry Based
– Taylor CVIU ’01
– Barron & Kakadiaris IVC ’01
– Parameswaran & Chellappa CVPR ’04

• Learning Based
– Rosales & Sclaroff HUMO ’00
– Agarwal & Triggs CVPR ’04

• Others
– Lee & Cohen CVPR ’04
– Shakhnarovich, Viola, Darrell ICCV ’03
– Mori, Ren, Efros and Malik CVPR ’04
– Many more …

Idea 1 : Learning Mappings

• Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01]

• Relevance Vector Regression [Agarwal and Triggs CVPR ’04]

Image Features

Idea 2 : Exploring the Solution Space

• Simulated Annealing [Deutscher et al. CVPR ’00]

• Markov Chain Monte Carlo (MCMC) [Lee and Cohen CVPR ’04]

• etc …

• Uses an accurate body model, typically with many degrees of freedom (DOF).

• Explores the pose space for a solution consistent with the observations.

• Difficult for high DOF.

• Computationally intensive.

Key Observations

• We have a constrained set of poses.
• Not necessary to explore the full parameter space.
• Combine the two ideas
– Learn mappings
– Explore a constrained space (i.e. a learned model of body poses)

Aircraft Signaler hand gestures

Traffic Controller hand signals

Basketball Referee hand signals

Overview of Framework

Learning Phase
• Learn the rendering function Φ(.)
• Learn a model of human body poses (Y: training data, X: latent space)

Pose Inference Phase
Input Silhouette → Output Pose

$$\min_{x,y} \|\Phi(y) - s\|^2 + \lambda_1 L_Y(x, y)$$

Learning a Model of Human Poses

• The Gaussian Process Latent Variable Model (GPLVM) [Neil Lawrence NIPS ’04] is used.

• GPLVM was originally used for visualizing high-dimensional data.

• Grochow et al. (SIGGRAPH ’04) use it to solve the inverse kinematics problem for human motion animation.

• Here we use it for automated articulated body pose inference.

Gaussian Process Latent Variable Model (GPLVM) Overview

Higher Dimensional

Lower Dimensional / Latent Space

Probabilistic Mapping

GPLVM Training : Learning a Model of Body Poses

• Given: a training set of 2D projected marker positions $\{y_i\}$ (each $y_i$ is $D$-dimensional)

• Goal: learn the parameters $\{x_i\}, \alpha, \beta, \gamma$
– $\{x_i\}$: corresponding latent variable values for each training data point
– $\alpha, \beta, \gamma$: variables related to the kernel

Kernel Function

• Also known as the covariance function.
• Measures the similarity of the latent variables $x$ and $x'$.
• For a data set of size N, we form an N by N kernel matrix K, in which $K_{i,j} = k(x_i, x_j)$.

$$k(x, x') = \alpha \exp\left(-\frac{\gamma}{2}\|x - x'\|^2\right) + \delta_{x,x'}\,\beta^{-1}$$
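A minimal NumPy sketch of how the kernel matrix K could be assembled from latent points. It assumes the RBF-plus-noise form commonly used with GPLVM; the function name `kernel_matrix`, the default values of `alpha`, `gamma`, `beta`, and the sample latent points are illustrative assumptions, not values from the talk.

```python
import numpy as np

def kernel_matrix(X, alpha=1.0, gamma=1.0, beta=100.0):
    """N x N kernel matrix with K[i, j] = k(x_i, x_j).

    Assumed form: alpha * exp(-gamma / 2 * ||x_i - x_j||^2),
    plus 1 / beta on the diagonal (the noise term).
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]   # pairwise differences, shape (N, N, q)
    sq_dist = np.sum(diff ** 2, axis=-1)   # squared Euclidean distances
    return alpha * np.exp(-0.5 * gamma * sq_dist) + np.eye(len(X)) / beta

# Three hypothetical 2-D latent points.
X_latent = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = kernel_matrix(X_latent)
```

Nearby latent points get kernel values close to `alpha`, distant ones close to zero, and the `1 / beta` diagonal keeps K invertible.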

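• For a single dimension $d$, the likelihood of the data given the Gaussian Process (GP) model parameters is:

$$p(\{y_{i,d}\} \mid \{x_i\}, \alpha, \beta, \gamma) = \frac{1}{\sqrt{(2\pi)^N |K|}} \exp\left(-\frac{1}{2} Y_d^T K^{-1} Y_d\right)$$

where $Y_d$ collects the $d$-th coordinate of all N training points.

• Joint likelihood for D dimensions is:

$$p(\{y_i\} \mid \{x_i\}, \alpha, \beta, \gamma) = \prod_{d=1}^{D} p(\{y_{i,d}\} \mid \{x_i\}, \alpha, \beta, \gamma)$$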

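To learn the GPLVM from the training set $\{y_i\}$, we maximize the following posterior:

$$p(\{x_i\}, \alpha, \beta, \gamma \mid \{y_i\})$$

And placing the priors

$$p(x) = N(x \mid 0, I), \qquad p(\alpha, \beta, \gamma) \propto \alpha^{-1}\beta^{-1}\gamma^{-1}$$

Negative Log (up to an additive constant):

$$L = \frac{D}{2}\ln|K| + \frac{1}{2}\sum_{d=1}^{D} Y_d^T K^{-1} Y_d + \frac{1}{2}\sum_i \|x_i\|^2 + \ln(\alpha\beta\gamma)$$

Computationally intensive. A subset is chosen to compute the kernel matrix. This subset of poses is called the Active Set.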
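As a rough illustration of the training objective, the sketch below evaluates a negative log posterior of the usual GPLVM form (log-determinant term, data-fit term, Gaussian prior on the latent points, and a log prior on the kernel variables) on a subset of the data. The kernel form, the function names, the synthetic data, and the random choice of subset are all assumptions; the slides do not say how the active set is selected.

```python
import numpy as np

def kernel_matrix(X, alpha, beta, gamma):
    # Assumed RBF kernel with a 1/beta noise term on the diagonal.
    diff = X[:, None, :] - X[None, :, :]
    return alpha * np.exp(-0.5 * gamma * np.sum(diff ** 2, axis=-1)) + np.eye(len(X)) / beta

def negative_log_posterior(X, Y, alpha, beta, gamma):
    """D/2 ln|K| + 1/2 sum_d Y_d^T K^-1 Y_d + 1/2 sum_i ||x_i||^2 + ln(alpha beta gamma)."""
    D = Y.shape[1]
    K = kernel_matrix(X, alpha, beta, gamma)
    _, logdet = np.linalg.slogdet(K)                     # ln|K|, numerically stable
    data_fit = 0.5 * np.sum(Y * np.linalg.solve(K, Y))   # sum over d of Y_d^T K^-1 Y_d
    latent_prior = 0.5 * np.sum(X ** 2)
    return 0.5 * D * logdet + data_fit + latent_prior + np.log(alpha * beta * gamma)

# Hand-checkable case: one training point, so K = [[alpha + 1/beta]] = [[2]].
L_single = negative_log_posterior(np.zeros((1, 2)), np.array([[2.0, 0.0]]), 1.0, 1.0, 1.0)

# Hypothetical data; a random subset stands in for the active set.
rng = np.random.default_rng(0)
Y_train = rng.normal(size=(200, 6))          # 200 poses, D = 6
X_train = 0.1 * rng.normal(size=(200, 2))    # 2-D latent values
active = rng.choice(len(Y_train), size=30, replace=False)
L_active = negative_log_posterior(X_train[active], Y_train[active], 1.0, 100.0, 1.0)
```

Restricting the kernel matrix to 30 points turns the 200 x 200 solve into a 30 x 30 one, which is the saving the active set is meant to give.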

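• For a new pair (x, y) we can predict using

$$L_Y(x, y) = -\ln p(\{x, y\} \mid \{x_i, y_i\}, \alpha, \beta, \gamma) = \frac{\|y - f(x)\|^2}{2\sigma^2(x)} + \frac{D}{2}\ln\sigma^2(x) + \frac{1}{2}\|x\|^2$$

where $f(x)$ and $\sigma^2(x)$ are the GP predictive mean and variance. This follows from the posterior extended with the new pair $(x', y')$:

$$p(\{x_i, x'\}, \alpha, \beta, \gamma \mid \{y_i, y'\})$$

• This equation can be used to solve for x given y, or y given x, via gradient descent.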
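The sketch below shows one way to evaluate such a score for a candidate pair (x, y), using the standard GP predictive mean and variance. The class name `LearnedPoseModel`, the kernel form, the parameter defaults, and the toy latent points and poses are assumptions made for illustration only.

```python
import numpy as np

def rbf(A, B, alpha, gamma):
    # Assumed RBF kernel between two sets of latent points (no noise term).
    diff = A[:, None, :] - B[None, :, :]
    return alpha * np.exp(-0.5 * gamma * np.sum(diff ** 2, axis=-1))

class LearnedPoseModel:
    """GP mapping from latent points X to poses Y (hypothetical helper)."""

    def __init__(self, X, Y, alpha=1.0, beta=1e4, gamma=1.0):
        self.X, self.Y = np.asarray(X, float), np.asarray(Y, float)
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        K = rbf(self.X, self.X, alpha, gamma) + np.eye(len(self.X)) / beta
        self.K_inv = np.linalg.inv(K)

    def predict(self, x):
        k = rbf(self.X, x[None, :], self.alpha, self.gamma)[:, 0]    # k(x)
        mean = self.Y.T @ self.K_inv @ k                             # f(x)
        var = self.alpha + 1.0 / self.beta - k @ self.K_inv @ k      # sigma^2(x)
        return mean, var

    def L_Y(self, x, y):
        mean, var = self.predict(x)
        return (np.sum((y - mean) ** 2) / (2.0 * var)
                + 0.5 * len(y) * np.log(var) + 0.5 * x @ x)

# Toy model: four 2-D latent points paired with four 3-D "poses".
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
Y = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
model = LearnedPoseModel(X, Y)
mean_1, var_1 = model.predict(X[1])
matched = model.L_Y(X[1], Y[1])       # latent point paired with its own pose
mismatched = model.L_Y(X[1], Y[2])    # latent point paired with another pose
```

A latent point paired with its own pose scores much lower than the same point paired with a different pose, which is what lets the score act as a prior over feasible poses.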

GPLVM [figure: learned latent space, axis $x_1$]

GPLVM: Left-hand-raised silhouettes tend to be clustered together

GPLVM: Does not always do a good job

About GPLVM

• Allows mapping to and from the lower dimensional space.

• Allows smooth parameterization (i.e. allows derivatives) in lower dimensional space.

• Two dimensions work well for our data set. (Grochow et al. use 2-5.)

Learning the Forward/Rendering Function

Input: 2D Pose

Output: Silhouettes (represented using Alt Moments)

Similar to Rosales and Sclaroff
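The slides do not say which regressor is used for the rendering function. Purely as a stand-in, the sketch below fits a ridge-regularised linear map from poses to silhouette features by least squares; the function name `fit_rendering_function`, the `ridge` parameter, and the synthetic data are assumptions. Any smooth regressor could take its place.

```python
import numpy as np

def fit_rendering_function(Y_poses, S_feats, ridge=1e-6):
    """Fit phi(y) ~ W^T [y, 1] by ridge-regularised least squares (stand-in model)."""
    Yb = np.hstack([Y_poses, np.ones((len(Y_poses), 1))])   # append a bias column
    W = np.linalg.solve(Yb.T @ Yb + ridge * np.eye(Yb.shape[1]), Yb.T @ S_feats)

    def phi(y):
        return np.append(y, 1.0) @ W

    return phi

# Synthetic (pose, feature) pairs generated from a known linear map plus offset.
rng = np.random.default_rng(0)
Y_poses = rng.normal(size=(50, 4))      # 50 hypothetical poses, 4-D
A_true = rng.normal(size=(4, 3))
S_feats = Y_poses @ A_true + 0.5        # 3-D hypothetical silhouette features
phi = fit_rendering_function(Y_poses, S_feats)
```

A learned forward function of this kind is differentiable with respect to the pose, so it can sit inside a gradient-based search.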

Pose Inference

$$\min_{y} \|\Phi(y) - s\|^2 + \lambda \|y\|$$

• Data term $\|\Phi(y) - s\|^2$: Φ is the forward (rendering) function, mapping the 2D projected marker positions y to the silhouette s (Alt Moments).
• Regularization term $\lambda\|y\|$: typical regularization (also used by Agarwal and Triggs).
• Replace the regularization term with a prior knowledge term (i.e. the learned model of poses), $\lambda_1 L_Y(x, y)$, which is independent of the feature s.

Pose Inference

$$\min_{x,y} \|\Phi(y) - s\|^2 + \lambda_1 L_Y(x, y)$$

Solution obtained using Conjugate Gradient
– Initialization using Active Set
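A small end-to-end sketch of this kind of inference follows. It minimises a data term plus a weighted GP-based pose prior jointly over (x, y), starting from the best member of an active set. For brevity it uses plain gradient descent with backtracking and numerical gradients instead of a conjugate gradient solver. The rendering function `phi`, the prior's exact form, the weight `lam1`, and the toy data are all assumptions.

```python
import numpy as np

def rbf(A, B, alpha, gamma):
    diff = A[:, None, :] - B[None, :, :]
    return alpha * np.exp(-0.5 * gamma * np.sum(diff ** 2, axis=-1))

def make_pose_prior(X, Y, alpha=1.0, beta=100.0, gamma=1.0):
    """Return L_Y(x, y) for a GP fitted to latent points X and poses Y (assumed form)."""
    K_inv = np.linalg.inv(rbf(X, X, alpha, gamma) + np.eye(len(X)) / beta)

    def L_Y(x, y):
        k = rbf(X, x[None, :], alpha, gamma)[:, 0]
        mean = Y.T @ K_inv @ k
        var = alpha + 1.0 / beta - k @ K_inv @ k
        return (np.sum((y - mean) ** 2) / (2.0 * var)
                + 0.5 * len(y) * np.log(var) + 0.5 * x @ x)

    return L_Y

def numerical_gradient(fun, z, eps=1e-6):
    grad = np.zeros_like(z)
    for i in range(len(z)):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (fun(z + step) - fun(z - step)) / (2.0 * eps)   # central difference
    return grad

def infer_pose(s, phi, L_Y, X_active, Y_active, lam1=0.1, iters=200):
    q = X_active.shape[1]

    def energy(z):
        x, y = z[:q], z[q:]
        return np.sum((phi(y) - s) ** 2) + lam1 * L_Y(x, y)

    # Initialise from the active-set pair with the lowest energy.
    starts = [np.concatenate([x, y]) for x, y in zip(X_active, Y_active)]
    z = min(starts, key=energy)
    start_energy = energy(z)
    for _ in range(iters):
        grad = numerical_gradient(energy, z)
        t = 1.0
        while t > 1e-10 and energy(z - t * grad) >= energy(z):
            t *= 0.5                      # backtrack until the energy decreases
        if t <= 1e-10:
            break                         # no descent step found: stop
        z = z - t * grad
    return z[:q], z[q:], start_energy, energy(z)

# Toy problem: four active-set pairs and a hypothetical linear renderer.
X_active = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
Y_active = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
phi = lambda y: 2.0 * y
s = phi(Y_active[0]) + np.array([0.05, -0.05, 0.05])   # noisy observation of pose 0
x_hat, y_hat, e_start, e_final = infer_pose(
    s, phi, make_pose_prior(X_active, Y_active), X_active, Y_active)
```

Starting from the active set keeps the search near poses the model has seen, so a local optimiser is enough for this toy case.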

Data Collection

• 12 gestures in the flight director lexicon

• Synthesize 6000 (Silhouette, Pose) pairs using Poser

• 3000 training (Male model)

• 3000 testing (Female model)

3D Pose

Synthesized silhouettes sampled uniformly over the frontal view-sphere

Experiments (Synthetic Data)

(a) Silhouette images generated by Poser 5 (Test Set)

(b) Estimation from SMA (Specialized Mapping Architecture)

(c) Our Approach

(d) Ground Truth

Comparison with SMA

Additional Constraints

Additional constraints can be added to achieve a more accurate estimate, e.g. temporal consistency:

$$\lambda_2 \|y_t - y_{t-1}\|^2$$

$$\min_{x_t, y_t} \|\Phi(y_t) - s_t\|^2 + \lambda_1 L_Y(x_t, y_t) + \lambda_2 \|y_t - y_{t-1}\|^2$$
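To make the effect of a temporal consistency term concrete, the sketch below evaluates a per-frame energy with an added penalty on the change from the previous frame's pose. The stand-in `phi` and `L_Y`, the weights `lam1` and `lam2`, and the example vectors are hypothetical placeholders, chosen so the numbers can be checked by hand.

```python
import numpy as np

def temporal_energy(x_t, y_t, s_t, y_prev, phi, L_Y, lam1=0.1, lam2=0.5):
    """Data term + weighted pose prior + penalty on the change from the previous pose."""
    data = np.sum((phi(y_t) - s_t) ** 2)
    prior = lam1 * L_Y(x_t, y_t)
    smooth = lam2 * np.sum((y_t - y_prev) ** 2)
    return data + prior + smooth

# Hypothetical stand-ins so the arithmetic is easy to follow.
phi = lambda y: y                          # identity "renderer"
L_Y = lambda x, y: 0.5 * float(x @ x)      # prior depending on x only
x_t = np.zeros(2)
s_t = np.array([1.0, 0.0])                 # current observation
y_prev = np.array([0.0, 0.0])              # previous frame's pose

e_jump = temporal_energy(x_t, np.array([1.0, 0.0]), s_t, y_prev, phi, L_Y)
e_half = temporal_energy(x_t, np.array([0.5, 0.0]), s_t, y_prev, phi, L_Y)
```

Here the pose that fits the observation exactly costs 0.5 because of the jump from the previous frame, while the halfway pose costs 0.25 + 0.125 = 0.375, so the penalty pulls the estimate toward a smoother trajectory.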

Experiments (Real Data)

(a) Silhouette images of real person

(b) SMA (Specialized Mapping Architecture)

(c) Our Approach (Without Temporal Consistency)

(d) Our Approach (With Temporal Consistency)

Experiments (Real Data)

(a) Silhouette images of real person

(b) SMA (Specialized Mapping Architecture)

(c) Our Approach (Without Temporal Consistency)

(d) Our Approach (With Temporal Consistency)

Conclusion

• Proposed a novel method for pose estimation for a pre-defined gesture lexicon.

• Interesting to note that two dimensions are enough in our case.

• Technique is fast (about 0.1 sec per frame in Matlab).

• Tracking as an extension. [video]

Thank You

Comments after the talk

• Related Works
– Bullets / summary of strengths vs. weaknesses
– Why do we need this work?
• Include the year of publication for the related work (e.g. the Rosales and Sclaroff work is not mentioned, the Sminchisescu work is not mentioned)
• Order the related work temporally?
• Include an introduction slide and a motivating slide
– How to motivate this work?
– State of the art is so and so… We found this common weakness. So we proposed this work.
• Human pose not mentioned in the intro
• At the end of the talk, say why to use this work over the others
• Why GPLVM and not other reduction techniques, like LLE/PCA/ISOMAP etc.?
• Give a top-level overview of the algorithm. A flow chart view?
• Explain the L(x,y) mapping using an illustration, like the mapping between two planes. Clearly say what is the high-dimensional y and what is the low-dimensional x
• Give a reference for GPLVM or a website link.
• Add a slide on the math of GPLVM
• The Tikhonov regularization approach of minimizing ||phi(y)-s|| + regularization term. Usually the regularization term is ||Dx|| but now we chose L(x,y). Explain why
• Slide to talk about the temporal constraint.
• Why learn the rendering function? i.e. because we want to take the derivative…
• Give the numbers for the training set; this gives an idea of how good the quantitative results are

Related Work

Model Based
• Simulated Annealing [Deutscher et al. CVPR ’00]
• Kinematic Jump Processes [Sminchisescu and Triggs CVPR ’03]
• Markov Chain Monte Carlo (MCMC) [Lee and Cohen CVPR ’04]
• etc …

Learning Based
• Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01]
• Parameter Sensitive Hashing [Shakhnarovich et al. ICCV ’03]
• etc …

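Posterior:

$$p(\{x_i\}, \alpha, \beta, \gamma \mid \{y_i\})$$

Negative Log (up to an additive constant):

$$L = \frac{D}{2}\ln|K| + \frac{1}{2}\sum_{d=1}^{D} Y_d^T K^{-1} Y_d + \frac{1}{2}\sum_i \|x_i\|^2 + \ln(\alpha\beta\gamma)$$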

Overview of Framework (Learning Phase)

Learn the Rendering Function Φ(.)

Learning a model of human body poses (using GPLVM)

Overview of Framework (Estimation Phase)

Input Silhouette

Output Pose

Search over learned model of human body pose for solution consistent with observation

Kernel Function

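• Measures the similarity of the latent variables $x$ and $x'$.

• For a data set of size N, we can form an N by N kernel matrix K, in which $K_{i,j} = k(x_i, x_j)$.

$$k(x, x') = \alpha \exp\left(-\frac{\gamma}{2}\|x - x'\|^2\right) + \delta_{x,x'}\,\beta^{-1}$$

– $\alpha$: how correlated $x$, $x'$ are in general
– $\gamma$: spread of the function
– $\beta$: noise in the prediction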

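To learn the parameters of the GPLVM from the training set $\{y_i\}$, we maximize the following posterior:

$$p(\{x_i\}, \alpha, \beta, \gamma \mid \{y_i\})$$

And placing the priors

$$p(x) = N(x \mid 0, I), \qquad p(\alpha, \beta, \gamma) \propto \alpha^{-1}\beta^{-1}\gamma^{-1}$$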

Gaussian Process Latent Variable Model (GPLVM)

$$L_Y(x, y)$$

• $x$: low-dimensional parameterization
• $y$: original space representation
• $L_Y(x, y)$ expresses how well the two values match

Space of Feasible Poses

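• For a new pair (x, y) we can predict using

$$L_Y(x, y) = -\ln p(\{x, y\} \mid \{x_i, y_i\}, \alpha, \beta, \gamma) = \frac{\|y - f(x)\|^2}{2\sigma^2(x)} + \frac{D}{2}\ln\sigma^2(x) + \frac{1}{2}\|x\|^2$$

which follows from the posterior extended with the new pair $(x', y')$:

$$p(\{x_i, x'\}, \alpha, \beta, \gamma \mid \{y_i, y'\})$$

with

$$f(x) = Y^T K^{-1} k(x), \qquad \sigma^2(x) = k(x, x) - k(x)^T K^{-1} k(x)$$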
