
Synthesis and Editing of Personalized Stylistic Human Motion

Jianyuan Min, Texas A&M University

Huajun Liu, Texas A&M University and Wuhan University

Jinxiang Chai, Texas A&M University

Figure 1: Motion style synthesis and retargeting: (top) after observing an unknown actor performing one walking style, we can synthesize other walking styles for the same actor; (bottom) we can transfer the walking style from one actor to another.

Abstract

This paper presents a generative human motion model for synthesis, retargeting, and editing of personalized human motion styles. We first record a human motion database from multiple actors performing a wide variety of motion styles for particular actions. We then apply multilinear analysis techniques to construct a generative motion model of the form x = g(a, e) for particular human actions, where the parameters a and e control "identity" and "style" variations of the motion x, respectively. The new modular representation naturally supports motion generalization to new actors and/or styles. We demonstrate the power and flexibility of the multilinear motion models by synthesizing personalized stylistic human motion and transferring the stylistic motions from one actor to another. We also show the effectiveness of our model by editing stylistic motion in style and/or identity space.

CR Categories: I.3.6 [Computer Graphics]: Methodology and Techniques—interaction techniques; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—animation

1 Introduction

Human movement demonstrates a wide variety of variations even for the same functional action. For example, normal walking can be performed quite differently by the same actor in different emotions or by different actors in the same emotion. However, existing motion synthesis techniques often ignore such variations and therefore fail to synthesize personalized stylistic human motion. In practice, current animation systems often rely on motion editing techniques to adapt an animated motion from one actor to another [Gleicher 1998; Lee and Shin 1999; Popović and Witkin 1999] or apply motion translation techniques to transfer the style of one motion to another [Amaya et al. 1996; Hsu et al. 2005], although neither approach allows the user to synthesize and edit personalized human motion styles. A more appealing solution is to use prerecorded motion data to construct a generative motion model that explicitly interprets "style" and "identity" variations in the same action, such as walking, and use the constructed model to generalize the captured motion data for new actors and/or styles. This problem, however, is challenging because style and identity variations are very subtle and often coupled in the motion capture data.

In this paper, we present a multilinear motion model for analysis and synthesis of personalized stylistic human motion. We first record a human motion database from multiple actors performing the same action under a variety of styles, and then apply multilinear data analysis techniques to construct a generative motion model of the form x = g(a, e) for particular actions, where the parameters a and e control the "identity" and "style" variations of the action x, respectively. The multilinear motion model provides a continuous, compact representation for allowable stylistic motion variations across multiple actors. With such multilinear models, we can adjust the values of the parameters a and e to generate a natural-looking, personalized stylistic human motion x = g(a*, e*). We can also edit the style and/or identity of human actions by interactively adjusting the values of the attribute parameters to match the constraints specified by the user.

We demonstrate the power and flexibility of our multilinear motion models by synthesizing personalized stylistic human motion and retargeting stylistic human motion from one actor to another. For example, by observing an unknown actor performing one walking style, we could synthesize other walking styles, which were never seen before, for the same actor (Figure 1, top). We can also retarget the walking style from one actor to another (Figure 1, bottom). In addition, we show how to use the multilinear models for realtime stylistic motion editing, which interactively adjusts the values of the attribute parameters to match the constraints specified by the user. Our motion editing process is advantageous because it allows the user to edit human motion in style space and/or identity space. In addition, we evaluate the performance of our multilinear motion models by comparing the synthesized personalized stylistic motion with ground truth data.

2 Background

Our approach constructs multilinear motion models from a large set of preregistered motion data and uses the generative models to interpret human motion variations across multiple actors and styles. We therefore focus our discussion on data-driven human motion models and their applications in human motion synthesis and style transfer.

One popular data-driven approach is motion graphs, which represent allowable transitions between poses [Arikan and Forsyth 2002; Kovar et al. 2002; Lee et al. 2002; Lee et al. 2006; Safonova and Hodgins 2007]. Motion graphs create an animation by cutting pieces from a motion database and reassembling them to form a new motion. However, the graph-based representations cannot generalize motion data for new styles and/or actors because they do not modify the captured motion data.

Another way to represent human motion is weighted interpolation of registered motion examples [Rose et al. 1998; Kovar and Gleicher 2004; Mukai and Kuriyama 2005; Kwon and Shin 2005; Heck and Gleicher 2007]. This representation can interpolate motion examples with different styles, but it fails to model motion variations across different actors. In computer vision, Troje [Troje 2002] explored a similar representation for walking motions in which the temporal variations in poses are expressed as a linear combination of sinusoidal basis functions, and used them to recognize genders from optical motion capture data. The multilinear motion models proposed in this paper significantly extend the motion interpolation representation by applying multilinear data analysis techniques to registered motion examples. Unlike motion interpolation, the multilinear representation explicitly interprets both "style" and "identity" variation in human motion and therefore can be used to generalize motion for new actors and styles.

Our work constructs a statistical model for analysis and synthesis of personalized stylistic human motion. In the past decade, statistical motion models have been used for interpolation of key frames [Li et al. 2002], interactive posing of 3D human characters [Grochow et al. 2004], performance animation from low-dimensional control signals [Chai and Hodgins 2005], generalization of motion to match various forms of spatial-temporal constraints specified by the user [Chai and Hodgins 2007], and interactive motion synthesis with direct manipulation interfaces and sketching interfaces [Min et al. 2009]. However, none of these approaches interprets motion variations using style or identity factors.

Several researchers have explored data-driven approaches to build statistical models for interpreting motion variations caused by a single factor (e.g., style) [Brand and Hertzmann 2000] or multiple factors (e.g., gait, identity, and state) [Wang et al. 2007]. Our method differs in that it applies multilinear data analysis techniques to registered motion examples and constructs a compact and low-dimensional generative model for an entire motion sequence. This ensures the generative motion models have correct global human motion structures and environmental contact information, thereby significantly reducing visual artifacts (e.g., foot-sliding) in synthesized motions. Another distinction is that their factors are often related to the hidden states of the models, so they cannot be edited explicitly. Our work is also relevant to [Liu et al. 2005], which used inverse function optimization techniques to learn physical model parameters from reference motion data to model human motion styles.

Our generative human motion model builds on the success of multilinear models for analysis and synthesis of high-dimensional visual data [Vasilescu and Terzopoulos 2004; Wang et al. 2005; Vlasic et al. 2005]. Multilinear models were first introduced by Tucker [1966] and later formalized and improved on by Kroonenberg and de Leeuw [1980]. Recently, these techniques have been successfully applied to statistical analysis of 2D images [Vasilescu and Terzopoulos 2004; Wang et al. 2005] and 3D surface models [Vlasic et al. 2005]. In closely related work [Vasilescu 2002; Mukai and Kuriyama 2007], multilinear models have been used for human motion analysis [Vasilescu 2002] and interpolation [Mukai and Kuriyama 2007]. This paper builds on their work with a few key differences. Foremost, their models ignore speed variations in human motion data. This distinction is important because speed variations are critical for analysis and synthesis of stylistic human actions, as stylistic human motion often displays significant speed variations. Second, they do not model the dependency between style and identity variations. Further, our generative motion model can be used for realtime human motion editing in "style" and/or "actor" space (described in Section 4.3), a capability that has not been demonstrated in previous approaches.

Our work is inspired by methods that transfer the style of one motion to another [Amaya et al. 1996; Hsu et al. 2005; Shapiro et al. 2006] or retarget human motion from one actor to another [Gleicher 1998; Lee and Shin 1999; Popović and Witkin 1999]. This paper generalizes these two ideas by constructing a generative human motion model amenable to stylistic motion synthesis, retargeting, and editing.

3 Human Motion Analysis

We first record a human motion database from multiple actors, i = 1, ..., I, performing the same action (e.g., walking) in a wide variety of styles, j = 1, ..., J. Our walking database, for instance, is recorded from 20 actors performing walking in 11 different motion styles (neutral, angry, happy, sad, tired, proud, sneaky, goose-step, cat walking, crab walking, and lame walking). We annotate the motion examples in terms of the "identity" and "style" attributes.

Our goal here is to model the overall motion variation in the database by explicitly distinguishing what proportion is attributable to each of the relevant factors: "identity" change and "style" change. In particular, we want to discover explicit parameterized representations of what the motion data of each "style" have in common independent of "identity", what the data of each "identity" have in common independent of "style", and what all data have in common independent of "identity" and "style", i.e., the interaction of the "identity" and "style" factors.

The key idea of this paper is to apply multilinear data analysis techniques to preregistered motion examples and construct a generative model of the form x = g(a, e), where a and e are low-dimensional vectors controlling "identity" and "style" variation respectively, and the vector-valued function g represents the transformation from the style and identity vectors to a motion vector x. To model the relationship between the "identity" and "style" factors, we also learn a joint probability density function pr(a, e) from the captured human motion data.

3.1 Motion Data Preprocessing

We represent the set of captured motion examples as {x_{i,j}(t') | i = 1, ..., I, j = 1, ..., J, t' = 1, ..., T'_{i,j}}, where T'_{i,j} is the number of frames for the motion example corresponding to the i-th actor and j-th style. Let the vector x_{i,j}(t') ∈ R^{62} represent the root position and orientation as well as the joint angle values of a full-body pose at frame t', and let x_{i,j} be a long vector sequentially concatenating all poses of the motion example across the entire sequence.

One challenge in modeling actor and style variations is that motion examples in the database often have very different speeds. Therefore, the motion vectors x_{i,j} ∈ R^{62×T'_{i,j}} often contain different numbers of dimensions. This prevents us from applying statistical data analysis techniques to the motion examples. To address this issue, we reparameterize the motion vectors x_{i,j} in a low-dimensional space u_{i,j}, which is suitable for statistical data analysis. This is achieved by registering all motion examples in the database and then applying dimensionality reduction techniques to the registered motion examples as well as the corresponding time warping functions.

3.1.1 Motion registration

We preprocess all motion examples in the database by registering them against each other via a parameterized motion model [Chen et al. 2009]. We register motion examples in a translation- and rotation-invariant way by decoupling each pose from its translation in the ground plane and the rotation of its hips about the up axis [Kovar and Gleicher 2003].

We define the computed time warping functions t' = w_{i,j}(t) in the canonical timeline t, where t = 1, ..., T. This allows us to describe the time warping functions as a T-dimensional vector w_{i,j} = [w_{i,j}(1), ..., w_{i,j}(T)]^T. We also use the time warping functions w_{i,j} to warp the original motion examples x_{i,j}(t'), t' = 1, ..., T'_{i,j}, into the registered motion examples s_{i,j}(t), t = 1, ..., T, where t represents the canonical timeline. The vectors s_{i,j} ∈ R^{62×T} and w_{i,j} ∈ R^T represent the geometric and timing variations of the original motion x_{i,j}, respectively.
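
As a concrete illustration, the sketch below (Python/NumPy, not part of the original system) resamples a motion into the canonical timeline given a time warping function; the array layout and helper name are assumptions made for illustration, and a production system would treat root orientation and joint angles more carefully than plain linear interpolation.

```python
import numpy as np

def warp_to_canonical(x, w):
    """Warp an original motion x (a T' x 62 array of poses) into the canonical
    timeline using a time warping function w (a length-T array with t' = w[t]),
    by linearly interpolating every pose dimension at the warped frame times.
    Illustrative helper; the storage layout is an assumption."""
    t_src = np.arange(x.shape[0])  # original frame indices 0 .. T'-1
    return np.stack([np.interp(w, t_src, x[:, d]) for d in range(x.shape[1])], axis=1)
```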

One advantage of the new representation is that the vectors s_{i,j} and w_{i,j} can incorporate expert knowledge in the annotation of the training data. We annotate environmental contact information and important key events of the registered motion. This allows us to easily identify which components in the new vectors control the poses of important events such as "toe-down" and "heel-up", or the timing of contact transitions. In addition, the new representation enables us to describe the motion examples with two point clouds, which are suitable for statistical data analysis.

3.1.2 Dimensionality reduction

This section discusses how to apply dimensionality reduction techniques to the geometric and timing vectors to build a low-dimensional representation for human motion.

We first apply principal component analysis (PCA) to the registered motion data s_{i,j} ∈ R^{62×T}, i = 1, ..., I, j = 1, ..., J. As a result, we can model a registered motion example using a mean motion sequence p_0 and a weighted combination of eigen-motion sequences p_m, m = 1, ..., M:

s = P(α) = p_0 + [p_1 ... p_M] α,    (1)

where the vector α = [α_1, α_2, ..., α_M]^T represents the eigen weights, and the vectors p_m, m = 1, ..., M, are a set of orthogonal modes that model geometric variations across entire motion sequences.
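
The PCA step of Equation (1) can be sketched as follows (Python/NumPy, illustrative only); each registered motion is assumed to be stored as one row of a data matrix.

```python
import numpy as np

def motion_pca(S, M):
    """PCA of registered motion sequences (Equation (1)).
    S: (I*J) x (62*T) matrix with one concatenated motion vector per row.
    Returns the mean motion p0, the first M eigen-motions (columns of P),
    and the eigen weights of every training motion, so that s ~ p0 + P @ alpha."""
    p0 = S.mean(axis=0)
    _, _, Vt = np.linalg.svd(S - p0, full_matrices=False)
    P = Vt[:M].T                  # (62*T) x M matrix of orthonormal eigen-motions
    alphas = (S - p0) @ P         # eigen weights alpha for each training example
    return p0, P, alphas
```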

Similarly, we can apply PCA to the time warping functions w_{i,j} ∈ R^T to build a low-dimensional representation for speed variations (i.e., time warping functions). However, direct application of statistical analysis techniques to the time warping functions could get us into trouble because time warping functions are constrained functions: they should be positive and strictly monotonic everywhere.

A good way to address this problem is to transform the constrained time warping functions w_{i,j} into unconstrained functions z_{i,j}, and then apply dimensionality reduction techniques to the unconstrained functions z_{i,j} to construct a low-dimensional representation for the time warping functions z = Z(γ) in the new space. The low-dimensional representation in the original space w = H(γ) can then be obtained by transforming the low-dimensional vector from the new space back to the original space. More specifically, we choose to transform the time warping functions w(t) into a new space z(t):

z(t) = ln(w(t) − w(t−1)),    t = 1, ..., T.    (2)

After choosing w(0) to be zero, we can easily transform the functions from the new space back to the original space:

w(t) = Σ_{i=1}^{t} exp[z(i)],    t = 1, ..., T.    (3)

Equation (3) ensures that the positivity and monotonicity constraints on w(t) are automatically satisfied if we conduct statistical analysis in the new space z and then transform the generative models back to the original space.
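
A minimal sketch of this change of variables, assuming each warp is stored as a monotonically increasing NumPy array with w(0) = 0 implicit:

```python
import numpy as np

def to_log_increments(w):
    """Equation (2): z(t) = ln(w(t) - w(t-1)), with w(0) taken to be zero.
    Maps a positive, strictly monotonic warp to an unconstrained vector."""
    return np.log(np.diff(np.concatenate(([0.0], w))))

def from_log_increments(z):
    """Equation (3): w(t) = sum_{i<=t} exp(z(i)).
    Any real-valued z maps back to a positive, strictly monotonic warp."""
    return np.cumsum(np.exp(z))
```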

We can apply PCA to the time warping functions z_{i,j}, i = 1, ..., I, j = 1, ..., J, in the new space. We can then represent a time warping function z in the new space as a mean time warping function b_0 plus a linear combination of eigenvectors b_k, k = 1, ..., K:

z = Z(γ) = b_0 + [b_1 ... b_K] γ,    (4)

where the vector γ = [γ_1, γ_2, ..., γ_K]^T provides a low-dimensional representation for the time warping functions.

After combining Equation (4) with Equation (3), we obtain the following low-dimensional representation of the time warping functions in the original space:

w(t) = H(t; γ) = Σ_{i=1}^{t} exp(b_0(i) + [b_1(i) ... b_K(i)] γ),    t = 1, ..., T,    (5)

where the scalar b_k(i) represents the i-th component of the k-th eigenvector b_k.

Now, we can represent the motion examples x_{i,j} with two low-dimensional vectors α_{i,j} ∈ R^M and γ_{i,j} ∈ R^K, which can be computed from Equation (1) and Equation (4), respectively:

α_{i,j} = [p_1 ... p_M]^T (s_{i,j} − p_0)
γ_{i,j} = [b_1 ... b_K]^T (z_{i,j} − b_0)    (6)

We can further represent the motion examples x_{i,j} with a low-dimensional concatenated vector u_{i,j} ∈ R^{M+K} as follows:

u_{i,j} = ( α_{i,j} ; W γ_{i,j} ) = ( [p_1 ... p_M]^T (s_{i,j} − p_0) ; W [b_1 ... b_K]^T (z_{i,j} − b_0) ),    (7)


where the diagonal matrix W controls the weights for each component of the timing vector γ, allowing for the difference in units between the geometric and timing vectors.

The elements of α and γ have different units, so they cannot be compared directly. Because p_m, m = 1, ..., M, are orthogonal columns, varying α by one unit moves s = P(α) by one unit. To make α and γ commensurate, we must estimate the effect of varying γ on the motion s. For each training example s_{i,j}, i = 1, ..., I, j = 1, ..., J, we systematically displace each element of γ from its default value and warp the default motion s_{i,j} with the displaced time warping functions. The RMS (root mean square) change in s_{i,j}, i = 1, ..., I, j = 1, ..., J, per unit change in a timing parameter gives the weight to be applied to that parameter in Equation (7).
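
Putting Equations (6) and (7) together, encoding one training example might look like the following sketch, where p0, P, b0, B are the PCA bases from the sketches above and W is the diagonal weight matrix estimated by the RMS procedure; all names are illustrative.

```python
import numpy as np

def encode_motion(s, z, p0, P, b0, B, W):
    """Equations (6)-(7): project a registered motion s and its log-warp z onto
    the eigen bases and stack the coordinates into one vector u = [alpha; W @ gamma].
    P and B have orthonormal columns; W makes the timing units commensurate."""
    alpha = P.T @ (s - p0)                       # geometric coordinates, Eq. (6)
    gamma = B.T @ (z - b0)                       # timing coordinates, Eq. (6)
    return np.concatenate([alpha, W @ gamma])    # concatenated vector, Eq. (7)
```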

3.2 Multilinear Motion Analysis

This section discusses how to apply multilinear data analysis techniques to construct a generative motion model that explicitly interprets "style" and "identity" variations in the prerecorded motion data. Multilinear models [Kroonenberg and Leeuw 1980] have recently been successfully applied to statistical analysis of high-dimensional visual data, including 2D images and 3D facial models [Vasilescu and Terzopoulos 2004; Wang et al. 2005; Vlasic et al. 2005]. We believe that multilinear models are well suited for our task because they can interpret data variation with separable attributes.

Given a corpus of motion examples u_{i,j} ∈ R^{M+K}, i = 1, ..., I, j = 1, ..., J, in the low-dimensional space, we can define a third-order data tensor U ∈ R^{(M+K)×I×J}, where M + K is the dimensionality of the motion vector u_{i,j}, I is the number of people, and J is the number of styles. We apply the N-mode SVD algorithm to decompose the data tensor as follows:

U = D ×_1 V ×_2 A ×_3 E    (8)

where D is the core tensor. The matrices V, A, and E are orthogonal matrices whose columns contain the left singular vectors of the first, second, and third mode spaces, respectively, and can be computed via regular SVD of those spaces. For a detailed discussion of multilinear analysis, we refer the reader to an overview of higher-order tensor decompositions [Kolda and Bader 2009].
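
For readers unfamiliar with the N-mode SVD, the following NumPy sketch shows one standard way to compute the decomposition of Equation (8) for a 3-mode tensor; unfolding conventions vary across the literature, and this is only one common choice, not necessarily the exact implementation used in the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the chosen mode to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Mode-n product T x_n M: multiply the n-th mode of tensor T by matrix M."""
    out_shape = [M.shape[0]] + [T.shape[i] for i in range(T.ndim) if i != mode]
    return np.moveaxis((M @ unfold(T, mode)).reshape(out_shape), 0, mode)

def n_mode_svd(U):
    """Equation (8): U = D x1 V x2 A x3 E, where V, A, E hold the left singular
    vectors of each mode unfolding and D is the core tensor (full, untruncated)."""
    factors = [np.linalg.svd(unfold(U, n), full_matrices=False)[0]
               for n in range(U.ndim)]
    D = U
    for n, F in enumerate(factors):
        D = mode_multiply(D, F.T, n)   # D = U x1 V^T x2 A^T x3 E^T
    return D, factors
```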

The people matrix A = [a_1, ..., a_I]^T, whose person-specific row vectors a_i^T, i = 1, ..., I, span the space of person parameters, encodes the per-person invariances across styles. Thus the matrix A contains the identity signatures for human motions. The style matrix E = [e_1, ..., e_J]^T, whose style-specific row vectors e_j^T, j = 1, ..., J, span the space of style parameters, encodes the invariances for each style across different people. The row vectors of the matrix V give a low-dimensional representation for the motion examples.

When a data tensor is highly correlated, variance will be concentrated in one corner of the core tensor. The data tensor can be approximated by

U = D_reduced ×_1 V_r ×_2 A_r ×_3 E_r    (9)

where V_r, A_r, and E_r are truncated versions of V, A, and E with the last few columns removed, respectively.

Equation (9) shows how to approximate the data tensor by mode-multiplying a smaller core tensor with a number of truncated orthogonal matrices. Since our goal is to represent a motion sequence as a function of identity and style parameters, we can decompose the data tensor without factoring along the mode that corresponds to motion (mode-1). We therefore have

U = Q ×_2 A ×_3 E    (10)

where Q encodes the interaction between the "identity" and "style" factors and can be called the multilinear motion model of human actions. Mode-multiplying Q with A and E approximates the original data. In particular, mode-multiplying it with one row from A and one row from E reconstructs exactly one original motion sequence (the one corresponding to the attribute parameters contained in those rows). Therefore, to generate a motion style for a particular actor, we can mode-multiply the model with the identity vector and the style vector. We can thus model a motion vector u as follows:

u = f(a, e) = Q ×_2 a^T ×_3 e^T    (11)

where a and e are column vectors controlling identity and style variations, respectively. The vector-valued function f represents the transformation from the identity and style vectors to a motion vector u.

We can further transform the low-dimensional motion vector u = f(a, e) into a motion sequence x = g(a, e) in the original space. In particular, we can reshape the motion vector u into the geometric vector α = u_{1:M} and the timing vector γ = W^{−1} u_{M+1:M+K}. A new motion sequence x can then be obtained by warping the motion s = P(α) in the canonical timeline with the time warping function w = H(γ). However, compression error caused by the low-dimensional human motion representation might produce foot-sliding artifacts in walking data. We can remove the foot-sliding artifacts by enforcing the foot contact constraints encoded in the model. In our experiments, we run an inverse kinematics process to remove the foot-skating artifacts in the reconstructed motion.
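
A sketch of this synthesis path x = g(a, e), reusing the helpers from the earlier sketches (mode_multiply, from_log_increments, and the PCA bases); variable names and the 62-DOF pose layout are assumptions, and the inverse-kinematics cleanup is omitted.

```python
import numpy as np

def synthesize_motion(Q, a, e, p0, P, b0, B, W, M):
    """x = g(a, e): Equation (11) followed by the mapping back to the original
    space. Q is the (M+K) x |a| x |e| core model; the foot-contact / IK cleanup
    described in the text is not included in this sketch."""
    u = mode_multiply(mode_multiply(Q, a[None, :], 1), e[None, :], 2).ravel()  # Eq. (11)
    alpha, gamma = u[:M], np.linalg.solve(W, u[M:])     # undo the timing weights
    s = (p0 + P @ alpha).reshape(-1, 62)                # canonical poses, Eq. (1)
    w = from_log_increments(b0 + B @ gamma)             # time warp, Eqs. (4)-(5)
    # Un-warp: sample the canonical motion at the inverse warp to restore timing.
    t_new = np.arange(int(w[-1]) + 1)
    return np.stack([np.interp(t_new, w, s[:, d]) for d in range(62)], axis=1)
```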

We also model a joint probability distribution function pr(a, e) by fitting a multivariate normal distribution to the training motion examples a_i, i = 1, ..., I, and e_j, j = 1, ..., J. The probability distribution can be used to reduce the solution space of the control parameters by constraining the generated motions to stay close to the training examples.
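
One simple reading of this step, fitting a joint Gaussian over stacked identity/style vectors, is sketched below; the exact construction of the training samples is an assumption on our part.

```python
import numpy as np

def fit_joint_prior(A_rows, E_rows):
    """Fit a multivariate normal pr(a, e). Here every (actor, style) pair in the
    database contributes one sample of the stacked vector [a; e]; the paper's
    exact construction may differ."""
    samples = np.array([np.concatenate([a, e]) for a in A_rows for e in E_rows])
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])  # regularized
    return mu, cov

def neg_log_prior(a, e, mu, cov):
    """-ln pr(a, e) up to an additive constant, as used in Equations (12)-(15)."""
    d = np.concatenate([a, e]) - mu
    return 0.5 * d @ np.linalg.solve(cov, d)
```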

Motion Data. In our experiment, 20 subjects were asked to perform walking in 11 different motion styles (neutral, angry, happy, sad, tired, proud, sneaky, goose-step, cat walking, crab walking, and lame walking). Each motion example in the database contains one and a half walking cycles. By keeping 99% of the original variation, we can model geometric variations with a 68-dimensional vector α and timing variations with a 9-dimensional vector γ. The resulting third-order (3-mode) data tensor (77-dimensional motion vectors × 11 styles × 20 identities) was decomposed to yield a multilinear motion model providing 11 knobs for style and 14 for identity.

4 Applications

This section discusses the applications of the multilinear motion models x = g(a, e) for stylistic motion synthesis, retargeting, and editing. Our results are best seen in the accompanying video, although we show sample frames of a few motions in the paper.

4.1 Stylistic Motion Synthesis

The multilinear motion models provide a compact and continuous representation for human actions. More significantly, they provide intuitive control over human motion with identity and style parameters. We can synthesize a walking sequence x = g(a_i, e_j) by directly specifying its "style" (e.g., style j) and "identity" (e.g., actor i). For example, we could create a child's cheerful walking sequence by applying a "happy" style to a "child" character.

We could also use the models to interpolate two motions x_1 = g(a_{i1}, e_{j1}) and x_2 = g(a_{i2}, e_{j2}) in the "identity" and/or "style" space: g(α a_{i1} + (1 − α) a_{i2}, β e_{j1} + (1 − β) e_{j2}), where α and β are the interpolation weights.

In addition, the multilinear motion model can be used to identify the human motion signature (i.e., "identity") of an unknown motion sequence y and then synthesize other styles, which were never seen before, for this new individual. Mathematically, we can formulate the parameter estimation problem as the following maximum a posteriori (MAP) problem:

a^*, e^* = arg min_{a,e}  λ ‖g(a, e) − y‖^2 − ln pr(a, e)    (12)

where the vectors a^* and e^* are the estimated identity and style parameters, respectively. The first term is the likelihood term, which measures the difference between the synthesized motion g(a, e) and the input motion y. The second term is the prior term, which constrains the generated motion to stay close to the training examples.

We can synthesize the remaining styles, which were never seen before, for this new individual as follows: g(a^*, e_0), where e_0 represents the style parameters for an existing style, which could be taken directly from the prerecorded styles (e.g., happy walking) or be a weighted combination of prerecorded styles. Figure 1 (top) shows a stylistic walk performed by an unknown actor and three stylistic walking sequences synthesized for the same actor. If we observe an unknown actor performing multiple stylistic motions (e.g., happy walking and normal walking), we should use all the observed motion sequences to estimate the parameters.
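
The estimation of Equation (12) can be sketched as follows; for brevity this version uses a generic quasi-Newton optimizer rather than the Levenberg-Marquardt solver with analytic Jacobians described in Section 4.4, and it assumes a synthesis function g(a, e) that returns a motion vector comparable to y.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_identity_and_style(y, g, mu, cov, dim_a, lam=1.0):
    """MAP estimation of Equation (12): minimize lam*||g(a,e) - y||^2 - ln pr(a,e).
    mu, cov parameterize the Gaussian prior over the stacked vector [a; e]."""
    def objective(params):
        a, e = params[:dim_a], params[dim_a:]
        d = params - mu
        prior = 0.5 * d @ np.linalg.solve(cov, d)        # -ln pr(a, e) up to a constant
        return lam * np.sum((g(a, e) - y) ** 2) + prior
    res = minimize(objective, mu, method="L-BFGS-B")     # start from the mean [a; e]
    return res.x[:dim_a], res.x[dim_a:]
```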

To evaluate the performance of our stylistic motion synthesis algorithm, we use cross-validation techniques to compare synthesized motion with ground truth motion data. More specifically, we pull all motion sequences (at least two) from one actor out of the training database and use them as the testing motion sequences. We then randomly choose one testing sequence to recover the style and identity parameters based on Equation (12). We employ the synthesis algorithm to generate a known motion style of the same actor with the estimated actor parameters. Figure 2 shows a comparison between the synthesized motion data and the ground truth motion data.

4.2 Stylistic Motion Retargeting

Similarly, we can retarget stylistic human motions from one actor to another. For example, if we observe a known actor a_i (one who is already recorded in the motion database) performing a new stylistic motion y_new, we can use Equation (12) to estimate the associated style parameters e_new from the observed motion and then use the estimated style parameters to retarget the motion to another actor as follows: g(a_0, e_new), where a_0 represents the identity parameters for an existing actor, which could be taken directly from the database subjects or be a weighted combination of database subjects. Figure 1 (bottom) shows an unknown walking style performed by an unknown actor and the same walking style retargeted to three known actors in the database.

4.3 Stylistic Motion Editing

The multilinear motion models can also be used for realtime human motion editing. Our idea is to continuously adjust the identity and style parameters to generate a motion sequence x = g(a, e) that best matches user constraints c. One advantage of the multilinear motion models is that the number of control parameters is often very low. Optimizing human motions in such a low-dimensional space not only improves the synthesis speed but also reduces the synthesis ambiguity. We have built a realtime direct manipulation interface for motion editing.

The system starts with a default motion sequence. The user can continuously adjust the motion g(a, e) in real time by dragging any character point at any frame. Mathematically, we can formulate the motion editing problem in a MAP framework:

a^*, e^* = arg min_{a,e}  λ ‖g(a, e) − c‖^2 − ln pr(a, e)    (13)

where the first term is the likelihood term, which measures how well the synthesized motion matches the user-defined constraints. The second term is the prior term, which constrains the synthesized motion to stay close to the training data. The weight λ balances the trade-off between the two components of the cost function.

One unique property of our motion editing system is that it allows the user to edit an entire motion sequence in either "identity" or "style" space. In particular, we can keep the "style" factor constant and adjust the "identity" knob using constraints c, resulting in the following MAP optimization problem:

a^* = arg min_{a}  λ ‖g(a, e_0) − c‖^2 − ln pr(a)    (14)

where e_0 represents the fixed style factor. The probability distribution function pr(a) is the prior distribution for the "identity" parameters and can be computed by marginalizing the joint pdf pr(a, e). Figure 3 shows direct manipulation of human motion in the "identity" space. This motion editing process is particularly useful for designing personalized motion styles for different characters.

On the other hand, if we keep the "identity" factor constant, we can edit the motion in the "style" space:

e^* = arg min_{e}  λ ‖g(a_0, e) − c‖^2 − ln pr(e)    (15)

where a_0 represents the constant identity factor. The probability distribution function pr(e) is the prior distribution for the "style" parameters and can be computed by marginalizing the joint pdf pr(a, e). Figure 4 shows direct manipulation of human motion in the "style" space. Editing human motion in the style space allows us to design a desired motion style for a particular actor.
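
A sketch of style-space editing (Equation (15)), where only the style vector is optimized while the identity a_0 stays fixed; "select" is a hypothetical function that extracts the constrained quantity (e.g., one joint position at one frame) from a synthesized motion.

```python
import numpy as np
from scipy.optimize import minimize

def edit_in_style_space(a0, c, select, g, mu_e, cov_e, lam=1.0):
    """Equation (15): adjust the style vector e so that the synthesized motion
    g(a0, e) matches a user constraint c, while staying close to the style prior."""
    def objective(e):
        d = e - mu_e
        prior = 0.5 * d @ np.linalg.solve(cov_e, d)      # -ln pr(e) up to a constant
        return lam * np.sum((select(g(a0, e)) - c) ** 2) + prior
    return minimize(objective, mu_e, method="L-BFGS-B").x
```

Editing in the identity space (Equation (14)) follows the same pattern with the roles of a and e exchanged.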

Figure 5 shows a comparison of three editing processes, all of which start from the same initial motion (sneaky walking) and are edited with the same 2D position constraint. Editing in the "identity" space results in a different form of sneaky motion. As expected, editing in the "style" space produces a new walking style for the same actor. Editing in the joint space allows us to create a new walking style for a different actor.

4.4 Optimization Algorithm

For all applications, we analytically evaluate the Jacobian terms of the objective function and then run gradient-based optimization with the Levenberg-Marquardt algorithm in the Levmar library [Lourakis 2009]. We initialize the motion with the mean identity and style values of the database examples. When the user uses the direct-manipulation interface to edit the motion on the fly, we initialize the motion at the current time step t with the motion generated at the previous time step t − 1. This significantly improves the optimization efficiency because the initialized motion is already very close to the final motion. The direct manipulation interface for motion editing runs in real time.
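
A compact sketch of this loop using SciPy's Levenberg-Marquardt solver in place of the Levmar library; the residual function is assumed to stack the weighted constraint residuals with whitened prior residuals so that the MAP objective becomes a sum of squares, and each frame is warm-started from the previous solution.

```python
from scipy.optimize import least_squares

def interactive_edit_step(params_prev, residuals):
    """One editing step, warm-started from the previous time step's solution.
    'residuals' is assumed to stack sqrt(lam)*(g(a,e) - c) with L^{-1}([a;e] - mu),
    where cov = L L^T, so minimizing the squared residuals minimizes the MAP
    objective of Equation (13)."""
    return least_squares(residuals, params_prev, method="lm").x
```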


Figure 2: Comparison of ground truth motion and synthesized motion style: (a) one testing motion style used for estimating the identity parameters; (b) ground truth sneaky walking from the same actor; (c) sneaky walking synthesized by the estimated actor signature and the same "sneaky" style as in the ground truth motion.

Figure 3: Direct manipulation of human motion in the "identity" space: (a) initial sneaky walking; (b) and (c) two edited sneaky walking sequences.

Figure 4: Direct manipulation of human motion in the "style" space: (a) initial walking style; (b) and (c) two new walking styles.

Figure 5: Comparison of three different editing processes: (a) initial sneaky walking; (b) editing in the "identity" space; (c) editing in the "style" space; (d) editing in the joint space. Note that the same set of constraints is used for motion editing.


5 Conclusion

We have presented a multilinear human motion model for human motion analysis and synthesis. The main advantage of the multilinear motion models is that they provide explicit parameterized representations of human motion in terms of "style" and "identity" factors. As a result, they can be used to synthesize new motion styles for actors, transfer motion styles from one actor to another, and edit human motion in style and/or identity space. In addition, the multilinear models provide a low-dimensional space of human motion data, which not only speeds up the synthesis process but also reduces the synthesis ambiguity. Lastly, the multilinear motion models are constructed from registered motion data and therefore encode knowledge about foot contact constraints. This allows us to generate high-quality human motion without specifying any contact constraints.

We are now attempting to include more subjects, from children to elderly people, and more style variations (walking speeds, directions, step sizes, and ground slopes) in the training database. We have tested the performance of our system on human walking, but we believe the same scheme could also be used to synthesize and edit other locomotion such as running. However, our algorithm is not applicable to freestyle human actions such as disco dancing because we require all example motions to be structurally similar and semantically registerable against each other.

We believe the new modular representation could be used for many other applications in human motion processing. For example, encoding human motion data with identity and style parameters could trivially be used to compress human motion data. The representation could also be applied to recognizing human identity as well as motion styles by reconstructing the "style" and "identity" parameters from input motion or video. Therefore, one immediate direction for future work is to investigate the applications of the multilinear motion models in motion data compression, coding, and recognition.

References

AMAYA, K., BRUDERLIN, A., AND CALVERT, T. 1996. Emotion from Motion. In Graphics Interface, 222–229.

ARIKAN, O., AND FORSYTH, D. A. 2002. Interactive Motion Generation from Examples. In ACM Transactions on Graphics. 21(3):483–490.

BRAND, M., AND HERTZMANN, A. 2000. Style Machines. In Proceedings of ACM SIGGRAPH 2000. 183–192.

CHAI, J., AND HODGINS, J. 2005. Performance Animation from Low-dimensional Control Signals. In ACM Transactions on Graphics. 24(3):686–696.

CHAI, J., AND HODGINS, J. 2007. Constraint-based Motion Optimization Using a Statistical Dynamic Model. In ACM Transactions on Graphics. 26(3): article No. 8.

CHEN, Y.-L., MIN, J., AND CHAI, J. 2009. Flexible Registration of Human Motion Data with Parameterized Motion Models. In ACM Symposium on Interactive 3D Graphics and Games. 183–190.

GLEICHER, M. 1998. Retargeting Motion to New Characters. In Proceedings of ACM SIGGRAPH 1998. 33–42.

GROCHOW, K., MARTIN, S. L., HERTZMANN, A., AND POPOVIĆ, Z. 2004. Style-based Inverse Kinematics. In ACM Transactions on Graphics. 23(3):522–531.

HECK, R., AND GLEICHER, M. 2007. Parametric Motion Graphs. In Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games. 129–136.

HSU, E., PULLI, K., AND POPOVIĆ, J. 2005. Style Translation for Human Motion. In ACM Transactions on Graphics. 24(3):1082–1089.

KOLDA, T. G., AND BADER, B. W. 2009. Tensor Decompositions and Applications. SIAM Review. 51(3).

KOVAR, L., AND GLEICHER, M. 2003. Flexible Automatic Motion Blending with Registration Curves. In ACM SIGGRAPH/EUROGRAPHICS Symposium on Computer Animation. 214–224.

KOVAR, L., AND GLEICHER, M. 2004. Automated Extraction and Parameterization of Motions in Large Data Sets. In ACM Transactions on Graphics. 23(3):559–568.

KOVAR, L., GLEICHER, M., AND PIGHIN, F. 2002. Motion Graphs. In ACM Transactions on Graphics. 21(3):473–482.

KROONENBERG, P. M., AND LEEUW, J. D. 1980. Principal Component Analysis of Three-mode Data by Means of Alternating Least Squares Algorithms. In Psychometrika. 45:69–97.

KWON, T., AND SHIN, S. Y. 2005. Motion Modeling for On-line Locomotion Synthesis. In ACM SIGGRAPH Symposium on Computer Animation. 29–38.

LEE, J., AND SHIN, S. Y. 1999. A Hierarchical Approach to Interactive Motion Editing for Human-Like Figures. In Proceedings of ACM SIGGRAPH 1999. 39–48.

LEE, J., CHAI, J., REITSMA, P., HODGINS, J., AND POLLARD, N. 2002. Interactive Control of Avatars Animated With Human Motion Data. In ACM Transactions on Graphics. 21(3):491–500.

LEE, K. H., CHOI, M. G., AND LEE, J. 2006. Motion Patches: Building Blocks for Virtual Environments Annotated with Motion Data. In ACM Transactions on Graphics. 25(3):898–906.

LI, Y., WANG, T., AND SHUM, H.-Y. 2002. Motion Texture: A Two-level Statistical Model for Character Synthesis. In ACM Transactions on Graphics. 21(3):465–472.

LIU, K., HERTZMANN, A., AND POPOVIĆ, Z. 2005. Learning Physics-Based Motion Style with Nonlinear Inverse Optimization. In ACM Transactions on Graphics. 23(3):1071–1081.

LOURAKIS, M. I. A. 2009. levmar: Levenberg-Marquardt Nonlinear Least Squares Algorithms in C/C++.

MIN, J., CHEN, Y.-L., AND CHAI, J. 2009. Interactive Generation of Human Animation with Deformable Motion Models. ACM Transactions on Graphics. 29(1): article No. 9.

MUKAI, T., AND KURIYAMA, S. 2005. Geostatistical Motion Interpolation. In ACM Transactions on Graphics. 24(3):1062–1070.

MUKAI, T., AND KURIYAMA, S. 2007. Multilinear Motion Synthesis with Level-of-Detail Controls. In Proceedings of the 15th Pacific Conference on Computer Graphics and Applications (PG'07). 9–17.

POPOVIĆ, Z., AND WITKIN, A. P. 1999. Physically Based Motion Transformation. In Proceedings of ACM SIGGRAPH 1999. 11–20.


ROSE, C., COHEN, M. F., AND BODENHEIMER, B. 1998. Verbs and Adverbs: Multidimensional Motion Interpolation. In IEEE Computer Graphics and Applications. 18(5):32–40.

SAFONOVA, A., AND HODGINS, J. K. 2007. Construction and Optimal Search of Interpolated Motion Graphs. In ACM Transactions on Graphics. 26(3).

SHAPIRO, A., CAO, Y., AND FALOUTSOS, P. 2006. Style Components. In Graphics Interface.

TROJE, N. 2002. Decomposing Biological Motion: A Framework for Analysis and Synthesis of Human Gait Patterns. In Journal of Vision. 2(5):371–387.

TUCKER, L. R. 1966. Some Mathematical Notes on Three-mode Factor Analysis. In Psychometrika. 31:279–311.

VASILESCU, M. A. O., AND TERZOPOULOS, D. 2004. TensorTextures: Multilinear Image-based Rendering. ACM Transactions on Graphics. 23(3):336–342.

VASILESCU, M. A. O. 2002. Human Motion Signatures: Analysis, Synthesis, Recognition. In Proceedings of the International Conference on Pattern Recognition (ICPR'02). 3:456–460.

VLASIC, D., BRAND, M., PFISTER, H., AND POPOVIĆ, J. 2005. Face Transfer with Multilinear Models. In ACM Transactions on Graphics. 24(3):426–433.

WANG, H., WU, Q., SHI, L., YU, Y., AND AHUJA, N. 2005. Out-of-core Tensor Approximation of Multi-dimensional Matrices of Visual Data. ACM Transactions on Graphics. 24(3):527–535.

WANG, J. M., FLEET, D. J., AND HERTZMANN, A. 2007. Multifactor Gaussian Process Models for Style-Content Separation. In Proceedings of the 24th International Conference on Machine Learning. 975–982.