
Tracking and Retargeting Facial Performances with a Data-Augmented Anatomical Prior

Michael Bao, Ronald Fedkiw
Stanford University

Abstract

Blendshape rigs are crucial to state-of-the-art facial animation systems at high-end visual effects studios. These models are used as geometric priors for tracking facial performances and are critical for retargeting a performance from an actor/actress to a digital character. However, the limited, linear, and non-physical nature of the blendshape system causes many inaccuracies, which result in an "uncanny valley"-esque performance. We instead propose the use of an anatomically-based model. The anatomical model can be used to target the facial performance given by the actor/actress; however, unlike blendshapes, the non-linear anatomical model built on simulation will enforce physical properties such as volume preservation, yielding superior results. The model can furthermore be augmented with captured data to better capture the subtle nuances of the face. The captured facial performances on the anatomical model can then be easily transferred to any digital character with a corresponding anatomical model in a semantically meaningful manner.

Previous Work

Figure 1: Top Left: An artist-sculpted blendshape pose for an actor. Top Middle: The corresponding pose obtained by simulating using modified muscle tracks. Top Right: The modified muscle tracks captured from the blendshape pose. Bottom Left: The creature pose obtained by retargeting the modified muscle tracks and simulating. Bottom Right: The retargeted muscle tracks. [1]

References

[1] Matthew Cong, Kiran S. Bhat, and Ronald Fedkiw. Art-directed muscle simulation for high-end facial animation. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 119–127. Eurographics Association, 2016.

[2] Thabo Beeler and Derek Bradley. Rigid stabilization of facial expressions. ACM Transactions on Graphics (TOG), 33(4):44, 2014.

[3] Matthew M. Loper and Michael J. Black. OpenDR: An approximate differentiable renderer. In European Conference on Computer Vision, pages 154–169. Springer, 2014.

Capture

We will utilize a three-camera setup to record RGB video data of facial performances. Each frame triplet will be used to reconstruct a point cloud/mesh of the face. This data, along with freely available 3D facial databases, will be at the core of each step of our algorithm.

Figure 2: RGB images captured for a facial performance.

Figure 3: Reconstructed point cloud of the RGB images.
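
As a rough illustration of the per-frame reconstruction step, the sketch below triangulates matched pixels from two calibrated cameras with OpenCV; the projection matrices, the matched points, and the pair-then-fuse strategy are assumptions, not our exact multi-view pipeline.

```python
# Hypothetical two-view triangulation sketch with OpenCV. With three cameras,
# each calibrated pair can be triangulated and the results fused.
import cv2
import numpy as np

def triangulate_pair(P0, P1, pts0, pts1):
    """P0, P1: 3x4 camera projection matrices; pts0, pts1: 2xN matched pixels.

    Returns an Nx3 Euclidean point cloud.
    """
    X_h = cv2.triangulatePoints(P0, P1, pts0, pts1)  # 4xN homogeneous points
    return (X_h[:3] / X_h[3]).T                      # dehomogenize to Nx3
```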

Rigid Tracking

Given a point cloud that contains a face, local geometric features can be used to determine an initial alignment. Rigid Iterative Closest Point (ICP) algorithms can then refine and propagate the fit through time in a temporally consistent manner.

$$\min_{R,\,\vec{t}} \; \frac{1}{2} \sum_{i}^{n} \left\| R\,\vec{p}_{s_i} + \vec{t} - \vec{p}_{t_i} \right\|_2^2 \qquad (1)$$

Figure 4: Manual rigid tracking of the point cloud.
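
As a minimal sketch of how Equation (1) can be minimized, the code below alternates nearest-neighbor correspondences with the closed-form SVD (Kabsch) solve for $R$ and $\vec{t}$; it is an illustration of standard rigid ICP, not our temporally consistent tracker.

```python
# Minimal rigid-ICP sketch for Equation (1): alternate nearest-neighbor
# matching with the closed-form SVD (Kabsch) solve for R and t.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, tgt):
    """Closed-form R, t minimizing sum_i ||R p_{s_i} + t - p_{t_i}||^2."""
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - mu_s).T @ (tgt - mu_t)                    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])   # guard against reflection
    R = Vt.T @ D @ U.T
    return R, mu_t - R @ mu_s

def icp(src, tgt, iters=50):
    """Align Nx3 source points to Mx3 target points; returns composite R, t."""
    tree = cKDTree(tgt)
    R_acc, t_acc = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)                         # closest-point matches
        R, t = best_rigid_transform(cur, tgt[idx])
        cur = cur @ R.T + t
        R_acc, t_acc = R @ R_acc, R @ t_acc + t          # compose transforms
    return R_acc, t_acc
```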

Anatomical Prior: As seen in [2], the cranium can be used as an additional anatomical constraint in the rigid tracking minimization to produce better results.

Data Augmentation: Alignment/registration algorithms often rely on detecting persistent local features as correspondences. Since we are dealing exclusively with the face, we can train a detector to identify prominent features such as the nose and the corners of the eyes.
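
As an illustration of such a detector, the sketch below seeds correspondences with dlib's pretrained 68-landmark predictor; a purpose-trained detector for our setup would replace it.

```python
# Hypothetical landmark-seeding sketch using dlib's pretrained 68-point
# predictor as a stand-in for a face-specific feature detector.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def facial_landmarks(image):
    """Return an Nx2 array of 2-D landmarks (nose, eye corners, ...)."""
    faces = detector(image, 1)             # upsample once to catch small faces
    if not faces:
        return np.empty((0, 2))
    shape = predictor(image, faces[0])     # fit landmarks to the first face
    return np.array([(p.x, p.y) for p in shape.parts()])
```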

Non-Rigid Tracking

$$\min_{\vec{\theta}} \; E_{\text{sfs}} + E_{\text{reconstruction}} + E_{\text{prior}} \qquad (2)$$

The goal of non-rigid tracking is to determine the anatomical simulation parameters ($\vec{\theta}$) that cause the face simulation to match the RGB images ($E_{\text{sfs}}$) and the 3-D point cloud ($E_{\text{reconstruction}}$) while being limited by anatomical constraints ($E_{\text{prior}}$). Using a combination of shape-from-shading techniques such as OpenDR [3] and the 3-D reconstructed point cloud will allow us to accurately match the given data. The anatomical prior will prevent the shape from going "off-model."
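
A schematic of Equation (2) as a black-box optimization follows; `simulate`, `render`, and `project_to_points` are hypothetical stand-ins for the anatomical simulator, the renderer, and the mesh-to-point-cloud residual, and the quadratic prior is only a placeholder.

```python
# Schematic of Equation (2). simulate(), render(), and project_to_points()
# are hypothetical stand-ins; the real terms (e.g., OpenDR-based E_sfs) differ.
import numpy as np
from scipy.optimize import minimize

def total_energy(theta, rgb, points, simulate, render, project_to_points,
                 prior_mean, weights=(1.0, 1.0, 0.1)):
    mesh = simulate(theta)                                     # anatomical simulation
    E_sfs = np.sum((render(mesh) - rgb) ** 2)                  # match RGB images
    E_recon = np.sum((project_to_points(mesh) - points) ** 2)  # match point cloud
    E_prior = np.sum((theta - prior_mean) ** 2)                # placeholder prior
    w_sfs, w_recon, w_prior = weights
    return w_sfs * E_sfs + w_recon * E_recon + w_prior * E_prior

# e.g., minimize(total_energy, theta0, args=(rgb, points, simulate, render,
#                project_to_points, prior_mean), method="Nelder-Mead")
```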

Data Augmentation: By using the method from [1], our anatomical model will have the expressiveness necessary to target data. However, oftentimes the 2-D and/or 3-D data will be unreliable due to adverse lighting conditions, motion blur, occlusions, etc. Extra robustness can be baked into the anatomical prior by training it on a large number of poses captured in a controlled environment.

Retargeting

Blendshapes are commonly used to retarget facial performances. An animation created on one blendshape rig can be directly transferred to an identical blendshape rig for another model. However, the target model is oftentimes a digital actor for which one cannot capture shapes. As a result, artists must spend time creating shapes that may or may not be physically plausible. Furthermore, the linear nature of blendshapes results in "uncanny valley"-esque performances.
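
To make that linearity concrete, a blendshape rig evaluates to a fixed linear combination of sculpted shapes, and retargeting between identical rigs amounts to reusing the weight vector; a minimal sketch (with hypothetical array shapes):

```python
# Blendshape evaluation sketch: with identical rigs, retargeting is simply
# reusing the weight vector w on the target's shape basis.
import numpy as np

def evaluate_blendshapes(neutral, deltas, w):
    """neutral: Vx3 rest mesh; deltas: KxVx3 shape offsets; w: K weights."""
    return neutral + np.tensordot(w, deltas, axes=1)   # purely linear in w

# Retargeting: target_mesh = evaluate_blendshapes(tgt_neutral, tgt_deltas, w)
```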

As a result, we propose the use of anatomical models for retargeting as in [1]. They use deformation transfer to control the muscle deformations on the target model. A simulation is then applied on top of the target model's flesh to introduce nonlinearities into the result. By introducing physical constraints such as volume preservation and collision, they are able to generate significantly improved results in the area around the lips.
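
The core of classical deformation transfer (Sumner and Popović 2004) maps each source element's deformation gradient onto the corresponding target element; the sketch below shows only the per-triangle gradient, omitting the normal direction and the global least-squares stitch a full implementation requires, and it is not the muscle-track formulation of [1].

```python
# Per-triangle deformation-gradient transfer sketch (simplified).
import numpy as np

def edge_matrix(v0, v1, v2):
    """3x2 matrix of two edge vectors spanning a triangle."""
    return np.column_stack((v1 - v0, v2 - v0))

def transfer_gradient(src_rest, src_def, tgt_rest):
    """Apply the source triangle's deformation gradient to a target triangle."""
    Ds_rest = edge_matrix(*src_rest)
    Ds_def = edge_matrix(*src_def)
    F = Ds_def @ np.linalg.pinv(Ds_rest)   # 3x3 source deformation gradient
    return F @ edge_matrix(*tgt_rest)      # deformed target edge vectors
```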

Anatomical Model

An anatomical model can be generated for an actor by morphing from a template model.

Figure 5: Top: The simulation surface for the actor. Middle: The underlying cranium and jaw. Bottom: The underlying facial muscles.
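
One common way to realize such a morph is a landmark-driven radial-basis-function warp that carries every template structure (flesh, cranium, jaw, muscles) onto the actor; a sketch under that assumption:

```python
# Hypothetical template-morph sketch: an RBF warp fit on landmark
# correspondences is applied to all template anatomy.
import numpy as np
from scipy.interpolate import RBFInterpolator

def morph_to_actor(template_landmarks, actor_landmarks, template_vertices):
    """Warp Nx3 template geometry to the actor via matched landmarks."""
    warp = RBFInterpolator(template_landmarks, actor_landmarks,
                           kernel="thin_plate_spline")
    return warp(template_vertices)   # e.g., cranium, jaw, or muscle vertices
```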

Future Work

Our lab is pursuing three projects focused on coupling simulation and computer vision: cloth, trees, and faces. We believe that the use of real-world data can add a level of detail previously not found in computer graphics. As a result, our goal is to develop a general model for combining data with simulation models and to apply this technique to a wider variety of projects such as fluid simulations.

Contact Information

Web: http://physbam.stanford.edu/~fedkiw/
Email: {mikebao, rfedkiw}@stanford.edu
