articulated bodies tracking

Articulated Bodies Tracking Eran Sela

Upload: fahim

Post on 24-Feb-2016

TRANSCRIPT

Articulated Body

Every general 3D motion can be perceived as the motion of a group of joints and links.

An articulated body has only joints and fixed-length limbs.

Motivation

Based on input data such as a depth map, color, or a silhouette map, we'll see today two works about:

How to implement real-time skeleton tracking on an articulated body.

The tracking can be used to drive computer graphics models and to capture the 3D motion of the human body.

Tracking Methods

Supervised or semi-supervised learning trackers: train decision trees or other statistical models on labeled and unlabeled data.

Model-based skeleton tracking: model the human body with primitives or surfaces and fit the model to the data using an optimization scheme.

Image-processing-based tracking: generate the skeleton from a mathematical condition that the data conform to.

Presentation timeline

Articulated Soft Objects for Video-based Body Modeling
Modeling the articulated body
Optimization framework to the data (least squares)
Data constraints
Results

A Multiple Hypothesis Approach to Figure Tracking
Introduction
The 2D Scaled Prismatic Model
Mode-based Multiple-Hypothesis Tracking
Multiple Modes as Piecewise Gaussians
Results

Articulated Soft Objects for Video-based Body Modeling

Input: a video sequence containing:
A depth map (using stereo cameras or another method).
A silhouette map (the points where the line of sight from the camera is tangent to the surface).

Output: a set of 3D ellipsoid primitives with translation, orientation and scale corresponding to the articulated body parts.

Modelling with Primitives vs Soft objects

Problem: primitive models such as cylinders and spheres are too crude for precise recovery of both shape and motion.

Solution: use soft objects. Each primitive defines a field function, and the skin is taken to be a level set of the sum of these fields.

This has the following advantages:
Effective use of stereo and silhouette data.
Accurate shape description with a small number of parameters.
Explicit modeling of 3D geometry.

Modelling the body parts

State vector:

B: number of body parts
N: number of consecutive frames
J: number of joints

The state vector changes on each frame.

Generalized algebraic surfaces

Metaballs

Metaballs (generalized algebraic surfaces) are defined by a summation over n three-dimensional Gaussian density distributions, each called a source or primitive.

The final surface S is found where the density function F equals some threshold amount, in our case:

Blinn [2]

Ellipsoids as sources

Why choose ellipsoids as sources for metaballs?
They are simple.
They allow accurate modeling of human limbs with relatively few primitives.
Their shape is controlled by higher-level width and length parameters, so problems like over-fitting to high-curvature regions do not occur.

3D Quadratic distance

Next we define the 3D quadratic distance function d() from the point (x, y, z) to each ellipsoid source.

For a specific metaball and a state vector we define a 4x4 matrix that holds the scaling and translation along the major axis of the ellipsoid. It is built from the radii of the ellipsoid (half the axis lengths along the principal directions), the primitive's center, and coefficients from the state vector.

World frame and joint frame

The translation of each ellipsoid center from the world frame is constant (the vector C).

What changes every frame? E is the per-joint rotation matrix to the quadric frame and is constant per frame.

The skeleton-induced transformation is a 4x4 rotation-translation matrix from the world frame to the frame to which the metaball is attached.

Given the rotation of a joint J, it is written in terms of two transformations: a homogeneous 4x4 transformation from the joint frame to the quadric frame, and a transformation from the world frame to the joint frame.

The resulting d() is the ellipsoidal quadratic distance field.
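The quadratic distance and the summed field can be sketched in a few lines. This is only an illustration under stated assumptions: the 4x4 matrices of the slides are replaced by an explicit center, radii and 3x3 rotation, and the field function exp(-2 d) is an assumed choice, not taken from the slides. All names are hypothetical.

```python
import math

def quadratic_distance(point, center, radii, rotation):
    """Squared, radius-normalized distance from a world point to an ellipsoid source.

    rotation is a 3x3 row-major matrix taking world axes to the ellipsoid's
    principal axes (a stand-in for the skeleton-induced transformation).
    """
    # Translate into the ellipsoid frame, then rotate onto its principal axes.
    shifted = [p - c for p, c in zip(point, center)]
    local = [sum(rotation[r][k] * shifted[k] for k in range(3)) for r in range(3)]
    # Scale each axis by its radius: d == 1 exactly on the ellipsoid surface.
    return sum((v / r) ** 2 for v, r in zip(local, radii))

def field(point, sources):
    """Sum of per-source field values; exp(-2 d) is an assumed field function."""
    return sum(math.exp(-2.0 * quadratic_distance(point, *s)) for s in sources)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# One hypothetical "limb": radii (1, 1, 3), centered at the origin, axis-aligned.
limb = [((0.0, 0.0, 0.0), (1.0, 1.0, 3.0), IDENTITY)]

d_tip = quadratic_distance((0.0, 0.0, 3.0), *limb[0])  # tip of the long axis
f_tip = field((0.0, 0.0, 3.0), limb)
f_center = field((0.0, 0.0, 0.0), limb)
```

With a single source, the skin is the level set where the field equals exp(-2), i.e. where d equals 1. With several overlapping sources, the summed field blends them smoothly.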

Least Square Framework

A least-squares optimization framework is used to estimate the state vector parameters.

The optimization problem is solved with the Levenberg-Marquardt algorithm for least-squares problems, which finds the new state vector.

The Jacobian matrix is calculated for any point x:
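As a rough illustration of a Levenberg-Marquardt fit, the sketch below estimates only the two radii of an axis-aligned 2D ellipse from noise-free surface points. The toy residual (quadratic distance minus 1), the analytic Jacobian and all names are assumptions for the example. The state vector of the slides is far larger.

```python
import math

def residuals(params, points):
    a, b = params
    # Quadratic distance minus 1: zero when the point lies on the ellipse.
    return [(x / a) ** 2 + (y / b) ** 2 - 1.0 for x, y in points]

def jacobian(params, points):
    a, b = params
    # Partial derivatives of each residual with respect to a and b.
    return [(-2.0 * x * x / a ** 3, -2.0 * y * y / b ** 3) for x, y in points]

def levenberg_marquardt(params, points, iterations=100, damping=1e-3):
    params = list(params)
    cost = sum(r * r for r in residuals(params, points))
    for _ in range(iterations):
        r = residuals(params, points)
        J = jacobian(params, points)
        # Build J^T J and J^T r for the 2-parameter problem.
        h00 = sum(j[0] * j[0] for j in J)
        h01 = sum(j[0] * j[1] for j in J)
        h11 = sum(j[1] * j[1] for j in J)
        g0 = sum(j[0] * ri for j, ri in zip(J, r))
        g1 = sum(j[1] * ri for j, ri in zip(J, r))
        # Solve (J^T J + damping * diag(J^T J)) step = -J^T r in closed form.
        a00, a11 = h00 * (1.0 + damping), h11 * (1.0 + damping)
        det = a00 * a11 - h01 * h01
        step = ((-g0 * a11 + g1 * h01) / det, (g0 * h01 - g1 * a00) / det)
        trial = [params[0] + step[0], params[1] + step[1]]
        trial_cost = sum(v * v for v in residuals(trial, points))
        if trial_cost < cost:
            # Accept the step and trust the quadratic model more.
            params, cost = trial, trial_cost
            damping /= 10.0
        else:
            # Reject the step and move toward gradient descent.
            damping *= 10.0
        if cost < 1e-20:
            break
    return params

# Twelve points on an ellipse with radii 2 and 1 (synthetic data).
points = [(2.0 * math.cos(2.0 * math.pi * k / 12), math.sin(2.0 * math.pi * k / 12))
          for k in range(12)]
fit = levenberg_marquardt([1.5, 1.5], points)
```

The damping term is what separates this from plain Gauss-Newton: rejected steps raise it, which shortens the step and turns it toward the gradient direction.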

Silhouette Observations

The silhouette points are defined as the points where the line of sight from the camera is perpendicular to the normal of the surface.

Why is silhouette data important?

Integrate silhouette constraint

We integrate silhouette observations into our framework by performing an initial search (using Brent's line minimization) along the line of sight to find the point that is closest to the model in its current configuration.
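The search along the line of sight can be sketched as a one-dimensional minimization. The example below uses golden-section search as a simpler stand-in for Brent's method, and a unit sphere as a stand-in model. The camera position, ray direction and function names are hypothetical.

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

camera = (0.0, 0.0, -5.0)
direction = (0.6, 0.0, 0.8)  # unit-length line of sight (assumed)

def distance_along_ray(t):
    # Quadratic distance to a unit sphere at the origin, at ray parameter t.
    p = [o + t * u for o, u in zip(camera, direction)]
    return sum(v * v for v in p)

t_best = golden_section_minimize(distance_along_ray, 0.0, 10.0)
closest_point = [o + t_best * u for o, u in zip(camera, direction)]
```

For this ray the closest approach is at t = 4, which is the point (2.4, 0, -1.8). Brent's method reaches the same minimum with fewer function evaluations by mixing in parabolic interpolation steps.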

Once the closest silhouette point to the model is found, it is given a higher weight in the weight matrix P, so that the silhouette points are more significant for the fitting.

Fitting Result

Sensor configuration: depth is acquired by 3 cameras in an L configuration taking non-interlaced images at 30 frames/sec, with an effective resolution of 640 x 400.

The stereo algorithm produced very dense point clouds, which are then filtered, yielding about 4000 evenly distributed 3D points on the surface of the subject.

In the top row are the original sequences of upper-body motions of different persons. Results of the tracking and fitting are shown in the bottom row. Although the two persons have very different body sizes, the system adjusts the generic model accordingly.

First person:

Second person:

End of topic 1

A Multiple Hypothesis Approach to Figure Tracking

2D human figure tracking.
A probabilistic approach to estimating the 2D human figure model.
A set of possible tracking solutions is maintained.
Every possible track can potentially be updated with every new update.
Over time, the track branches into many possible directions.

Used in radars

Multiple hypothesis tracking (MHT) is designed for situations in which the target motion model is very unpredictable, as all potential track updates are considered.

As each radar update is received, every possible track can potentially be updated with every new update. Over time, the track branches into many possible directions.

The 2D Scaled Prismatic Model

How can we enforce 3D kinematic constraints of the model that conform to the 2D monocular image data?

Scaled Prismatic Models (SPM):
Each link in a scaled prismatic model describes the image-plane projection of an associated rigid link in an underlying 3D kinematic chain.
Each link has 2 DOF: the distance between the joint centers of adjacent links, and the rotation angle at its joint center around an axis perpendicular to the image plane.
The model captures the foreshortening that occurs when 3D links rotate into and out of the image plane.

Tracking problem representation

We model the human 2D figure as a branched SPM chain. Each link in the arms, legs, and head is modeled as an SPM link. Each link has 2 DOF, leading to a total body model with 18 DOFs. The tracking problem consists of estimating a vector of SPM parameters for the figure in each frame of a video sequence, given some initial state.
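A single SPM chain can be sketched as 2D forward kinematics in which each link carries a length and an in-plane angle. The sketch assumes angles are measured relative to the parent link and illustrates foreshortening with an orthographic projection. Both choices, and all names, are assumptions for the example.

```python
import math

def spm_chain_points(base, links):
    """Return image-plane joint positions for a chain of (length, angle) links.

    Angles are in radians and taken relative to the parent link (an assumption).
    """
    x, y = base
    heading = 0.0
    points = [(x, y)]
    for length, angle in links:
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points

# A 3D upper arm of length 30 tilted 60 degrees out of the image plane
# projects to a shorter 2D link; the SPM length DOF absorbs this change.
upper_arm_3d_length = 30.0
tilt = math.radians(60.0)
projected_length = upper_arm_3d_length * math.cos(tilt)

arm = spm_chain_points((0.0, 0.0),
                       [(projected_length, math.pi / 2.0),  # upper arm
                        (20.0, -math.pi / 2.0)])            # forearm
```

Here the state of the two-link arm is four numbers, two per link. A full figure would branch several such chains from the torso.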

Probability Density Representation

The choice of representation for the probability density of a tracker state is largely dominated by two concerns:
The unimodality constraint imposed by a Gaussian-based parametric representation, such as the Kalman filter, is inaccurate when tracking in a cluttered environment.
A sample-based representation (such as that used in the CONDENSATION algorithm) requires a prohibitive number of samples to encode the probability distribution of a high-DOF SPM model.

Condensation Algorithm

The Condensation algorithm is an application of particle filtering in which:
Observations and hidden states are represented by hand contours.
Contours can be represented as splines, lists of angles between phalanges, etc.
There is a model for P(next state | previous state).
It can be set manually by studying the anatomy of a hand.
It can be learned by gathering many example sequences of hand movement.
Learning can be done using special gloves that report exact hand location and shape.
P(state | observation) is estimated using visual features (SIFT, Harris, etc.).

Probability Density Representation

A hybrid approach: it supports a multimodal description but requires fewer samples for modeling.

The representation is based on retaining only the modes (or peaks) of the probability density and modeling the local neighborhood surrounding each mode with a Gaussian.

MHT Algorithm

Input: a video sequence containing one or more humans.

Output: a state vector for each frame, holding values for all the DOF of the SPM chains that make up the model.

Mode-based Multiple-Hypothesis Tracking

(Bayes rule)

The algorithm

Generating Prior Distributions

Kalman Filter

The Kalman filter equations fall into two groups: time update equations (prediction) and measurement update equations (correction).

The time update equations are responsible for projecting forward (in time) the current state and error covariance estimates to obtain the a priori estimates for the next time step.

The measurement update equations are responsible for the feedback, i.e. for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate.

Predict

Predict the state ahead: $\hat{x}_t^- = A \hat{x}_{t-1} + B u_t$

Predict the error covariance ahead: $P_t^- = A P_{t-1} A^T + Q$

Update

Update the state estimate: $\hat{x}_t = \hat{x}_t^- + K_t (z_t - H \hat{x}_t^-)$

Update the error covariance: $P_t = (I - K_t H) P_t^-$

where the Kalman gain $K_t$ is: $K_t = P_t^- H^T (H P_t^- H^T + R)^{-1}$

These are the standard forms of the equations, with A the state transition matrix, B the control matrix, u_t the control input, Q the process noise covariance, H the measurement matrix, z_t the measurement, and R the measurement noise covariance.
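The predict and update steps can be illustrated with a scalar Kalman filter. This is a minimal sketch in which every matrix of the general filter is reduced to a single number and there is no control input. Parameter names follow common textbook notation and are not taken from the slides.

```python
def kalman_predict(x, p, a=1.0, q=0.0):
    """Time update: project the state estimate and error variance forward."""
    x_prior = a * x
    p_prior = a * p * a + q
    return x_prior, p_prior

def kalman_update(x_prior, p_prior, z, h=1.0, r=1.0):
    """Measurement update: blend the prior with the measurement z."""
    k = p_prior * h / (h * p_prior * h + r)   # Kalman gain
    x_post = x_prior + k * (z - h * x_prior)  # corrected state estimate
    p_post = (1.0 - k * h) * p_prior          # corrected error variance
    return x_post, p_post, k

# One cycle: prior belief x = 0 with variance 1, then a measurement z = 2
# whose noise variance is also 1, so prior and measurement are trusted equally.
x_prior, p_prior = kalman_predict(0.0, 1.0)
x_post, p_post, gain = kalman_update(x_prior, p_prior, z=2.0)
# gain == 0.5, x_post == 1.0, p_post == 0.5
```

The posterior is always a single Gaussian, which is the unimodality limitation that motivates the multiple-mode representation.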

Multiple Modes as Piecewise Gaussians

Sampling from Piecewise Gaussians

While there are not enough samples:
Select a mode.
Draw a sample from the selected mode.
If the sample does not satisfy p(x), reject it.
If the sample satisfies p(x), keep it.
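The select, sample and accept-or-reject loop can be sketched in one dimension. The sketch assumes the piecewise Gaussian density is the pointwise maximum of the weighted mode Gaussians, and that a sample "satisfies p(x)" when the mode it was drawn from is the dominant one at that point. Both readings are assumptions, as are the mode values and names.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def dominant_mode(x, modes):
    """Index of the mode whose weighted Gaussian is largest at x."""
    return max(range(len(modes)),
               key=lambda i: modes[i][0] * gaussian_pdf(x, modes[i][1], modes[i][2]))

def sample_piecewise_gaussian(modes, count, rng):
    """modes is a list of (weight, mean, std) tuples."""
    weights = [m[0] for m in modes]
    samples = []
    while len(samples) < count:  # not enough samples yet
        # Select a mode with probability proportional to its weight.
        i = rng.choices(range(len(modes)), weights=weights)[0]
        # Draw a candidate from the selected mode's Gaussian.
        x = rng.gauss(modes[i][1], modes[i][2])
        # Keep it only where the selected mode defines the density.
        if dominant_mode(x, modes) == i:
            samples.append(x)
    return samples

rng = random.Random(0)
modes = [(0.5, -2.0, 0.5), (0.5, 2.0, 0.5)]  # two hypothetical hypotheses
samples = sample_piecewise_gaussian(modes, 2000, rng)
negative_fraction = sum(1 for s in samples if s < 0.0) / len(samples)
```

Under these assumptions the accepted samples follow a density proportional to the pointwise maximum, so two equally weighted modes each receive about half of the samples.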

Template Registration

In order to estimate the likelihood distribution, template images of the model should be registered.

This can be done, for example, by randomizing values for the SPM model chains and rendering a 3D graphic model of a person whose joints conform to the model state.

Likelihood Computation

We maximize the likelihood by minimizing the negative log-likelihood, using the iterative Gauss-Newton method.

Deriving Posterior Distributions

Example of the process for each frame

Experimental Results

The algorithm was tested on three sequences involving Fred Astaire from the movie Shall We Dance. A 2D 19-DOF SPM model is manually initialized in the first image frame, after which tracking is fully automatic.

First experiment: each joint probability distribution in the state space is described by only 1 mode (unimodal).

Second experiment: typically, each joint probability distribution in the state space is described by 10 modes in a PWG representation.

Single-hypothesis tracker (initialized with a single mode):

The single-hypothesis tracker fails to handle the self-occlusion caused by Fred Astaire's legs crossing.

Multi-hypothesis tracker (initialized with 10 modes):

Top row: the multiple modes of the tracker are shown.
Bottom row: the dominant mode is shown, which demonstrates the ability of the tracker to handle ambiguous situations and thus survive the occlusion event.

References

Plankers and Fua, Articulated Soft Objects for Video-based Body Modeling, ICCV 2001
Cham, T.J. and Rehg, J.M., A Multiple Hypothesis Approach to Figure Tracking, CVPR 1999 (II:239-245)

The End