
SPACE-TIME EDITING OF 3D VIDEO SEQUENCES

Margara Tejera and Adrian Hilton

University of Surrey, Guildford, United Kingdom

Abstract

A shape constrained Laplacian mesh deformation approach is introduced for interactive editing of mesh sequences. This allows low-level constraints, such as foot or hand contact, to be imposed while preserving the natural dynamics of the captured surface. The approach also allows artistic manipulation of motion style to achieve effects such as squash-and-stretch. Interactive editing of key-frames is followed by automatic temporal propagation over a window of frames. User edits are seamlessly integrated into the captured mesh sequence. Three spatio-temporal interpolation methods are evaluated. Results on a variety of real and synthetic sequences demonstrate that the approach enables flexible manipulation of captured 3D video sequences.

Keywords: 3D video, 3D animation, performance capture, mesh editing, stylisation

1 Introduction

Capturing human motion has been of interest in computer vision and graphics research for over 30 years. Technology has evolved from sparse marker-based motion capture (MoCap) systems to dense reconstruction of non-rigid surfaces as 3D video mesh sequences [20, 24]. Practical reuse of 3D video sequences for animation requires editing techniques which provide the level of control available with conventional skeletal animation while preserving the captured non-rigid surface dynamics.

Space-time skeletal motion editing techniques have been developed for manipulation of captured sequences via interactive changes at key-frames which are propagated across the sequence [6, 13]. This provides low-level animation control and allows constraints such as character-object interaction or foot-floor contact to be imposed to modify a character's motion. A similar level of interactive editing control is desirable for captured 3D video mesh sequences.

In this work we build on previous research in Laplacian mesh editing [19, 3] to introduce techniques for space-time editing of mesh sequences. Xu et al. [25] and Kircher and Garland [10] presented general approaches to key-frame editing of mesh sequences as a set of transformations on individual mesh elements which are weighted over a window either side of the key-frame. We present an analogous approach to key-frame editing which also constrains the mesh sequence deformation to a learnt space of motions. This ensures preservation of the captured motion characteristics and the underlying anatomical structure of the actor performance. In order to propagate key-frame edits across the sequence, we present two novel non-linear interpolation methods and evaluate their advantages and limitations with respect to traditional linear methods. Results on real and synthetic 3D video sequences demonstrate that the proposed space-time editing approach provides a flexible tool for interactive manipulation, allowing both low-level constraints and artistic stylisation.

2 Related work

Editing and stylisation of skeletal MoCap data: Rose et al. [17] introduced methods for high-level parametric control of skeletal motion by interpolation between captured motions, which led to more sophisticated techniques where a parametrised space of motions is created [11, 16]. The idea of concatenating motions from different parametric motion spaces inspired the so-called motion graphs [12, 1], or move trees, which added the possibility of transitioning seamlessly between captured motion sequences. Heck and Gleicher [7] combined parametrised motions with motion graphs to allow high-level control and transition between multiple motions. Other research has followed the process traditionally used by animators: first, edit a set of key-frames of the sequence, creating a set of poses that satisfies the constraints set by the user; second, create in-between poses that preserve the naturalness of the original motion. The work of Gleicher [6] and of Lee and Shin [13] are examples of space-time editing approaches. Gleicher [6] solves for both space and time constraints simultaneously. Lee and Shin [13] modify the skeleton poses at the key-frames by means of an inverse kinematics (IK) solver, and then apply a multilevel B-spline approximation for the interpolation of poses.

As well as depicting an action, characters communicate feelings to the viewer and can perform the same action in different styles. Brand and Hertzmann [5] presented the first method to automatically separate "style" from "structure" by an unsupervised learning process based on Hidden Markov Models, capable of capturing the data's essential structure and discarding its accidental properties. Following the same learning approach, Hsu et al. [8] train style translation models that describe how to transition from one motion to another, and Shapiro et al. [18] apply Independent Component Analysis to decompose the motion. Min et al. [15] construct a generative human motion model using multi-linear data analysis techniques. This model is driven by two parameters whose adjustment produces personalised stylistic human motion.

Editing and stylisation of 3D video data: 3D video data is inherently complex. Multiple-view video reconstruction generates independent meshes at each frame, resulting in a lack of the temporal consistency required to manipulate the mesh sequence. To overcome this difficulty, techniques for mesh sequence processing have relied either on the use of synthetic data or on the application of shape similarity measures. Huang et al. [9] concatenated clips of captured sequences by determining transition links using similarity matrices based on shape histograms. This work extended the concept of "motion graphs" [12, 1] from skeletons to 3D data.

Analogous to IK methods for skeletal data, several mesh editing techniques have been developed. They generally consist of a global optimisation that tries to preserve the local differential properties of the mesh while satisfying user constraints. A comprehensive comparison between these methods is provided in [3]. Sumner et al. [23] formulate the problem as a least-squares minimisation that manipulates the deformation gradients of the triangles, which describe their transformation with respect to a reference pose. A non-linear feature space is constructed using the deformation gradients as feature vectors and applying polar decomposition and exponential maps to avoid naive linear blending of poses, which would lead to unnatural results. Laplacian-based approaches [19] define a linear operator according to the connectivity and the area of the triangles of the mesh. The application of this operator yields a set of differential coordinates whose direction approximates the direction of the local triangle normals, and whose magnitude is proportional to the local mean curvature. The main drawback of these methods is having to deal with rotations explicitly. Lipman et al. [14] introduced a mesh representation based on rotation-invariant linear coordinates that addresses this problem: linear shape interpolation of meshes using this representation handles rotations correctly.

Following the key-frame editing scheme, the mesh editing problem can be extended to sequences. Xu et al. [25] introduced an alternating least-squares method based on rotation-invariant linear coordinates [14], demonstrating natural deformation. Constraints at key-frames are propagated by a handle trajectory editing algorithm, obtaining an overall natural-looking motion. Kircher and Garland [10] presented a new differential surface representation which encodes first- and second-order differences of each vertex with respect to its neighbours, giving rotation and translation invariance. These differences are stored in "connection maps", one per triangle, which allow the development of motion processing operations with better results than vertex-based approaches.

Sumner and Popovic [22] addressed the problem of transferring poses between characters. Deformation transfer is achieved by applying the affine transformation that each triangle of a character's mesh undergoes to transform from a reference pose to a desired pose, to the triangles of a different character. This work was generalised by Baran et al. [2], who presented a patch-based mesh representation derived from [14] that allows the semantic transfer of poses, e.g. transferring the motion of arms to legs.

3 Learning a space of deformation for mesh sequences

Skeletal motion sequences explicitly represent the anatomical structure, which is preserved during editing. For mesh sequences the underlying physical structure is implicit, so editing must be constrained to reproduce anatomically correct deformations. To preserve this implicit structure we learn a mesh motion space from the temporally aligned 3D performance capture data and constrain the Laplacian mesh editing to lie in this space. A learnt deformation gradient feature space to constrain the editing of a single mesh using a sparse set of examples was previously introduced by Sumner et al. [23]. In this work we extend editing to mesh sequences and directly learn the space in differential coordinates to constrain subsequent deformation. This effectively combines previous free-form mesh sequence editing [25, 10] with learnt spaces of mesh deformation [23] within a Laplacian mesh editing framework [19, 3].

3.1 Laplacian mesh editing framework

Laplacian mesh editing is based on a differential representation of the mesh which allows local mesh properties to be encoded. The gradient of the triangle basis functions φ_i yields a 3 × 4 matrix G_j for each triangle [21]:

$$G_j = (\nabla\phi_1, \nabla\phi_2, \nabla\phi_3, \nabla\phi_4) \qquad (1)$$

$$G_j = \begin{pmatrix} (p_1 - p_4)^\top \\ (p_2 - p_4)^\top \\ (p_3 - p_4)^\top \end{pmatrix}^{-1} \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix} \qquad (2)$$

where p_1, p_2 and p_3 are the positions of the vertices of the j-th triangle and p_4 is a fourth vertex added along the unit normal [22]. Applying this gradient to every triangle of the mesh, we can construct a matrix G of size 4m × n, where n is the number of vertices and m the number of triangles [4]:

$$\begin{pmatrix} G_1^\top \\ \vdots \\ G_m^\top \end{pmatrix} = G \begin{pmatrix} p_1^\top \\ \vdots \\ p_n^\top \end{pmatrix} \qquad (3)$$

Let A be a diagonal weighting matrix containing the areas of the triangles; then G^T A represents the discrete divergence operator, and the discrete Laplace-Beltrami operator L can be constructed as L = G^T A G [3]. Given a mesh, its differential coordinates are obtained by multiplying the Laplacian operator by its absolute coordinates: δ(x) = Lx, δ(y) = Ly and δ(z) = Lz.


If we add positional soft constraints x_c, the x absolute coordinates of the reconstructed mesh (the same applies for y and z) can be computed in the least-squares sense [19]:

$$x = \arg\min_x \left( \|Lx - \delta(x_o)\|^2 + \|W_c(x - x_c)\|^2 \right) \qquad (4)$$

where x_o are the coordinates of the original mesh and x_c are the soft constraints on vertex locations given by the feature correspondence, with a diagonal weight matrix W_c.

This equation allows the reconstruction of a mesh by means of the Laplacian operator L which, due to its linear nature, does not account for changes in rotation. To allow non-linear interpolation of rotation, an iterative approach is taken [21]: in each step of the minimisation the change in rotation of each triangle is computed and the Laplacian operator is updated accordingly. The non-rotational part of the transformations is discarded in order to help preserve the original shape of the triangles.
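To make this concrete, the sketch below assembles a Laplacian and solves the soft-constrained system of equation 4 for one coordinate channel. It is a minimal Python illustration, not the authors' implementation: a uniform graph Laplacian stands in for the area-weighted operator L = G^T A G, the constraint weight is arbitrary, and the iterative rotation update of [21] is omitted.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def uniform_laplacian(n_vertices, triangles):
    """Stand-in for the area-weighted Laplace-Beltrami operator
    L = G^T A G of [3]; a uniform graph Laplacian keeps the sketch short."""
    rows, cols = [], []
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            rows += [i, j]
            cols += [j, i]
    W = sp.csr_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(n_vertices, n_vertices))
    W.data[:] = 1.0                               # collapse duplicate edges
    return sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

def solve_soft_constrained(L, delta, constrained_idx, targets, weight=10.0):
    """Least-squares solve of equation (4) for one coordinate channel:
    argmin_x ||L x - delta||^2 + ||W_c (x - x_c)||^2, W_c diagonal.
    The constraint weight is illustrative, not from the paper."""
    n = L.shape[0]
    w = np.zeros(n)
    w[constrained_idx] = weight                   # diagonal of W_c
    Wc = sp.diags(w)
    xc = np.zeros(n)
    xc[constrained_idx] = targets
    # normal equations: (L^T L + Wc^T Wc) x = L^T delta + Wc^T Wc x_c
    A = (L.T @ L + Wc.T @ Wc).tocsc()
    b = L.T @ delta + Wc.T @ (Wc @ xc)
    return spla.spsolve(A, b)
```

Given original coordinates x_o, `delta = L @ x_o` reproduces δ(x_o); an edit then amounts to changing the targets of the constrained vertices and re-solving each channel.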

3.2 Laplacian editing with a learnt deformation space

In this work, we introduce a novel mesh editing framework based on the Laplacian deformation scheme presented in section 3.1. The novelty resides in incorporating into the algorithm the previously observed deformations of the character. This constrains the possible solutions of the deformation solver, ensuring the preservation of the captured motion characteristics and the underlying anatomical structure of the actor performance.

For a sequence of meshes {M(t_i)}, i = 1..F, where F is the number of frames, the mesh motion deformation space is built by taking each mesh, represented in differential coordinates, as a deformation example. Our data matrix M is built by placing the concatenated δ(x), δ(y) and δ(z) differential coordinates of each example in its rows:

$$M = \begin{pmatrix} \delta_1^\top(x) & \delta_1^\top(y) & \delta_1^\top(z) \\ \delta_2^\top(x) & \delta_2^\top(y) & \delta_2^\top(z) \\ \vdots & \vdots & \vdots \\ \delta_F^\top(x) & \delta_F^\top(y) & \delta_F^\top(z) \end{pmatrix} \qquad (5)$$

The data matrix is centred, obtaining M_c = M − M̄, where M̄ is an F × 3n matrix whose rows are the mean of the rows of the data matrix M. In order to obtain a basis representing the space of deformations, a singular value decomposition is performed on the matrix M_c: M_c = UDV^T, where V is a 3n × F matrix with a basis vector in each of its columns. The first l eigenvectors e_k representing 95% of the variance are kept, which gives a linear basis of the form:

$$\delta(r) = \bar{\delta} + \sum_{k=1}^{l} r_k e_k = \bar{\delta} + Er \qquad (6)$$

where r_k are scalar weights for each eigenvector, r is an l-dimensional weight vector and E is a 3n × l matrix whose columns are the first l eigenvectors of length 3n.
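A minimal sketch of this basis construction, assuming the per-frame differential coordinates have already been stacked row-wise as in equation 5:

```python
import numpy as np

def learn_deformation_basis(deltas, var_kept=0.95):
    """Build the learnt deformation space of equation (6).

    deltas : (F, 3n) array; row i holds the concatenated differential
             coordinates (delta(x), delta(y), delta(z)) of frame i,
             i.e. the data matrix M of equation (5).
    Returns the mean row delta_bar and E (3n x l), whose columns are the
    first l eigenvectors covering `var_kept` of the variance."""
    mean = deltas.mean(axis=0)
    Mc = deltas - mean                            # centred data matrix M_c
    U, S, Vt = np.linalg.svd(Mc, full_matrices=False)
    var = S**2 / np.sum(S**2)
    l = int(np.searchsorted(np.cumsum(var), var_kept)) + 1
    E = Vt[:l].T                                  # basis vectors as columns
    return mean, E

def reconstruct_delta(mean, E, r):
    """delta(r) = delta_bar + E r, equation (6)."""
    return mean + E @ r
```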

Figure 1: Effect of incorporating a learnt space of deformations into the Laplacian framework. The neck of the horse has been lengthened dramatically. (Dataset courtesy of [22]) (a) Left: original mesh with user-specified constraints coloured in orange and red. Centre: Laplacian editing. Right: Laplacian editing with learnt deformation space. (b) Detail of the tail. Laplacian editing (left) causes the tail to fold over itself; the learnt deformation space (right) preserves the tail shape.

Space-time editing of key-frames in the mesh sequence is performed using constrained Laplacian mesh editing within the space of deformations δ(r). From equation 4 we have:

$$r, x = \arg\min_{r,x} \left( \|Lx - \delta(r)\|^2 + \|W_c(x - x_c)\|^2 \right) \qquad (7)$$

Equation 7 allows interactive editing of a key-frame mesh M(t_k) to satisfy a set of user-defined constraints x_c, resulting in a modified mesh M′(t_k) with vertices x′(t_k). To construct a basis with respect to the mesh M(t_i), for each frame in the mesh sequence the Laplacian L_i, defined according to the discrete gradient operator matrix G_i, is used as a reference in the construction of the data matrix M_i, such that δ_i(x) = L_i x. Constructing a local basis defines changes in shape in the learnt motion space taking the reference frame as the origin. The use of a local basis gives improved speed of convergence and control over shape deformation within the shape-constrained Laplacian mesh editing.
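The paper does not prescribe a solver for the joint minimisation over r and x in equation 7. The sketch below is one hypothetical alternating scheme, reusing solve_soft_constrained() and the (mean, E) basis from the sketches above: fix r and solve for the vertex positions channel by channel, then re-estimate r by projecting the achieved differential coordinates onto the learnt basis.

```python
import numpy as np

def edit_in_learnt_space(L, coords, mean, E, idx, targets, n_iter=10):
    """Hypothetical alternating minimisation of equation (7).

    coords  : (n, 3) original vertex positions
    idx     : indices of the constrained vertices
    targets : (len(idx), 3) desired constrained positions"""
    n = coords.shape[0]
    x = coords.copy()
    r = np.zeros(E.shape[1])
    for _ in range(n_iter):
        # delta(r) of equation (6); rows are the x, y, z channels
        delta = (mean + E @ r).reshape(3, n)
        for c in range(3):
            x[:, c] = solve_soft_constrained(L, delta[c], idx, targets[:, c])
        # project the achieved differential coordinates back into the basis
        achieved = np.concatenate([L @ x[:, c] for c in range(3)])
        r = E.T @ (achieved - mean)
    return x, r
```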


Figure 2: Effect of incorporating a learnt space of deformations into the Laplacian framework. The left leg of the character has been straightened. (a) Left: original mesh with user-specified constraints coloured in orange and red. Centre: Laplacian editing. Right: Laplacian editing with learnt deformation space. (b) Detail of the legs. The learnt deformation space (right) preserves the leg shape, avoiding the mesh collapse which occurs with Laplacian editing (left).

Examples of the effect of the basis are depicted in figures 1, 2 and 3. Deformations applying the learnt deformation space within the Laplacian framework preserve the surface details and the underlying structure of the character. This avoids artefacts such as thinning, unnatural bending of limbs and collapsing of the mesh, which occur if the Laplacian is not constrained to a learnt deformation space.

4 Editing in a learnt deformation space

The space-time editing pipeline consists of deforming a set of key-frames and subsequently propagating these changes over a temporal window, with the objective of seamlessly incorporating the edited frames into the sequence. User input is needed both to choose the key-frames and to select the constrained vertices. Our space-time interface allows any vertex on the mesh to be selected as a constraint. This is flexible compared to previous mesh editing approaches, which require a set of handles to be predefined.

4.1 Key-frame editing

Key-frame editing is performed within the Laplacian framework described in section 3.2. During an off-line process, each frame of the sequences of a given character is used as the reference frame for computing a space of deformations. In our implementation, all available frames for the character are considered as deformation examples for the construction of the deformation space.

Figure 3: Effect of incorporating a learnt space of deformations into the Laplacian framework. The right leg of the character has been bent. (a) Left: original mesh with user-specified constraints coloured in orange and red. Centre: Laplacian editing. Right: Laplacian editing with learnt deformation space. (b) Detail of the legs. The learnt deformation space (right) preserves the leg shape, avoiding the mesh thinning which occurs with Laplacian editing (left).

The user interactively selects two sets of vertices: the vertices whose positions must stay unchanged during the deformation, and the vertices that will be dragged to a desired position. These positional constraints, and the space of deformations associated with the given frame, are incorporated into equation 7. Figure 4 shows an example of a key-frame edit.

Figure 4: Key-frame editing. Left: original horse. Centre: original horse showing the constrained vertices; the red group will stay fixed and the orange group will be moved during the editing. Right: edited horse. (Dataset courtesy of [22])

4.2 Space-time propagation

Changes to the key-frames must be propagated over time in order to obtain a natural-looking motion. Three propagation methods are evaluated: linear interpolation, non-linear interpolation and constraint interpolation. A discussion and comparison of these methods is included at the end of the section.

Figure 5: Space-time editing of 3D video for a walking sequence with multiple key-frames to modify character height. (a) Original sequence. (b) Two edited key-frames. (c) Space-time editing with T_k = 3 frames. (d) Space-time editing with T_k = 6 frames.

Figure 5 illustrates the process of space-time editing for a walk sequence: a key-frame is selected and modified, and changes are then propagated across a temporal window with weights shown by the mesh colour. In this example the character's height is modified in an unnatural way on two key-frames to give an easily visible example of the space-time propagation. More subtle, physically realistic editing examples are included in the results.

4.2.1 Linear interpolation

Given an edited key-frame mesh M′(t_k) with vertices x′(t_k), edits are propagated temporally to the other frames of the mesh sequence M(t_i), with vertices x(t_i), using a spline to define the interpolation weights λ_i for the difference in mesh shape Δ_k = x′(t_k) − x(t_k):

$$x'(t_i) = x(t_i) + \lambda_i \Delta_k \qquad (8)$$

Multiple key-frame edits can be combined as a linear sum of edits:

$$x'(t_i) = x(t_i) + \sum_{k=1}^{K_f} \lambda_{ik} \Delta_k \qquad (9)$$

where K_f is the number of key-frames. This linear sum allows compositing of changes from multiple frames in the sequence, with weighted influence on the shape at a particular frame, providing intuitive control over mesh sequence deformation. In practice, weights are interpolated over a temporal window of influence around each key-frame, t_k ± T_k, which can be selected by the user.
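A short sketch of equations 8 and 9 over a window t_k ± T_k. The smoothstep falloff is an illustrative stand-in for the spline weights λ_ik, which the paper does not specify:

```python
import numpy as np

def hermite_weight(t, t_k, T_k):
    """Illustrative smoothstep falloff over the window t_k +/- T_k;
    a stand-in for the spline-defined weights lambda_i."""
    d = min(abs(t - t_k) / float(T_k), 1.0)
    return 1.0 - d * d * (3.0 - 2.0 * d)      # 1 at t_k, 0 at the window edge

def propagate_linear(frames, key_edits, T_k):
    """Linear propagation of key-frame edits, equations (8)-(9).

    frames    : list of (n, 3) vertex arrays x(t_i)
    key_edits : dict {k: (n, 3) edited vertices x'(t_k)}
    Returns the edited sequence x'(t_i); overlapping windows sum linearly."""
    edited = [x.copy() for x in frames]
    for k, x_edit in key_edits.items():
        delta_k = x_edit - frames[k]          # Delta_k = x'(t_k) - x(t_k)
        for i in range(max(0, k - T_k), min(len(frames), k + T_k + 1)):
            edited[i] += hermite_weight(i, k, T_k) * delta_k
    return edited
```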

Linear interpolation is computationally efficient but may result in unrealistic deformation, such as shortening of limbs. We therefore propose a non-linear and a constraint interpolation approach, which aim to preserve the mesh structure. A comparative evaluation is presented in section 4.3.

4.2.2 Non-linear interpolation

We propose a non-linear interpolation method based on the propagation of the triangle transformations between the edited and the original key-frame. Given a key-frame mesh M′(t_k) and its original version M(t_k), the transformation that the j-th triangle of M(t_k) undergoes to transform into the corresponding triangle in M′(t_k) is computed and polar-decomposed into its rotational, R, and non-rotational, S, components: T'^j_k = R'^j_k S'^j_k. Let q'^j_k be the quaternion associated with R'^j_k and q^j_k the identity quaternion for all j, where the superscript refers to the j-th triangle. The interpolated rotation q'^j_i is computed as:

$$q'^j_i = \mathrm{slerp}(q^j_k, q'^j_k, \lambda_i) \qquad (10)$$

Letting S^j_k = I for all j, the non-rotational scale/shear part S'^j_i is linearly interpolated:

$$S'^j_i = S^j_k + \lambda_i (S'^j_k - S^j_k) \qquad (11)$$

Multiple key-frame edits can be combined analogously to equation 9:

$$q'^j_i = \prod_{k=1}^{K_f} \mathrm{slerp}(q^j_k, q'^j_k, \lambda_{ik}) \qquad (12)$$

$$S'^j_i = \sum_{k=1}^{K_f} S^j_k + \lambda_{ik} (S'^j_k - S^j_k) \qquad (13)$$

where ∏ represents quaternion multiplication.

Converting q'^j_i to R'^j_i, a set of transformations T'^j_i = R'^j_i S'^j_i can be computed. Applying these transformations directly to the triangles of M(t_i) would result in an unconnected mesh, so the Laplacian deformation framework of equation 4 is applied to link the triangles back together. In this case the non-rotational part of the transformations is kept in order to correctly apply the S'^j_i components.
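The per-triangle computation can be sketched as below. This is an illustration under assumptions: the triangle frame construction follows the deformation-gradient convention of [22], SciPy's polar decomposition and quaternion slerp are stand-ins for whatever the authors used, and the Laplacian re-stitching of the transformed triangles (equation 4) is not shown.

```python
import numpy as np
from scipy.linalg import polar
from scipy.spatial.transform import Rotation, Slerp

def triangle_frame(p1, p2, p3):
    """3x3 frame spanned by two edges and the scaled normal, as in [22]."""
    e1, e2 = p2 - p1, p3 - p1
    n = np.cross(e1, e2)
    return np.column_stack([e1, e2, n / np.sqrt(np.linalg.norm(n))])

def interpolated_transform(tri_orig, tri_edit, lam):
    """Interpolated per-triangle transform of equations (10)-(11).

    tri_orig, tri_edit : (3, 3) arrays of triangle vertex positions in the
    original and edited key-frame. Returns the 3x3 transform T'^j_i."""
    V  = triangle_frame(*tri_orig)
    V2 = triangle_frame(*tri_edit)
    T = V2 @ np.linalg.inv(V)            # full key-frame triangle transform
    R, S = polar(T)                      # polar decomposition T = R S
    # rotation: slerp from identity to R by weight lambda, equation (10)
    key_rots = Rotation.from_matrix(np.stack([np.eye(3), R]))
    R_i = Slerp([0.0, 1.0], key_rots)(lam).as_matrix()
    # scale/shear: linear blend from identity, equation (11)
    S_i = np.eye(3) + lam * (S - np.eye(3))
    return R_i @ S_i
```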

4.2.3 Constraint interpolation

The linear and non-linear propagation methods discussed in the previous paragraphs find the edited meshes M′(t_i) by processing information from the original meshes M(t_i) and the key-frame edits. An alternative method consists of propagating the positions of the constraints over the temporal window, and subsequently performing a Laplacian deformation according to equation 7 to obtain M′(t_i) subject to the interpolated constraints. This offers the advantage of controlling the positions of the constrained vertices along the window, at the expense of a higher computational cost.

Directly interpolating the constraint coordinates does not guarantee preservation of the shape of the submesh comprising the selected vertices. Therefore, the non-linear interpolation method presented in section 4.2.2 is applied to compute the positions of the constrained vertices over the propagation window. This approach differs from more simplistic approaches where these positions are found by averaging the rotations for each of the triangles [25]. An illustration of the method can be found in figure 6.

Figure 6: Illustration of the constraint interpolation method. First row: original sequence. Second row: a key-frame has been edited and the constraints (in red and orange) have been interpolated. Third row: for each frame within the window of propagation, the Laplacian deformer of equation 7 is run to deform the meshes subject to the interpolated constraints.

Although computationally more expensive, constraint interpolation provides full control over the positions of the constrained vertices along the window of propagation. This allows fixed constraints to be enforced over a temporal window, for example on hand or foot location during contact.
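One way the pieces might fit together, reusing hermite_weight() and interpolated_transform() from the earlier sketches. The grouping of constrained vertices into triangles and the application of the interpolated transform about a local centroid are illustrative assumptions, and solve_frame stands in for the Laplacian solve of equation 7:

```python
def propagate_constraints(frames, k, constr_tris, edited_key, T_k, solve_frame):
    """Sketch of constraint interpolation (section 4.2.3).

    constr_tris : vertex-index triples covering the constrained submesh
    edited_key  : (n, 3) edited key-frame vertices x'(t_k)
    solve_frame : callable(frame_vertices, {vertex: target}) -> (n, 3);
                  placeholder for the equation (7) solver."""
    edited = [x.copy() for x in frames]
    for i in range(max(0, k - T_k), min(len(frames), k + T_k + 1)):
        lam, targets = hermite_weight(i, k, T_k), {}
        for tri in constr_tris:
            # interpolated transform of this triangle's key-frame edit
            T_i = interpolated_transform(frames[k][list(tri)],
                                         edited_key[list(tri)], lam)
            c = frames[i][list(tri)].mean(axis=0)
            for v in tri:
                # apply the transform about the local centroid of frame i
                targets[v] = c + T_i @ (frames[i][v] - c)
        edited[i] = solve_frame(frames[i], targets)
    return edited
```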

4.3 Discussion on interpolation methods

A comparison between the three interpolation methods discussed in section 4.2 is presented in figure 7, which shows the propagation of the edited key-frame of figure 4 from the horse gallop sequence. Applying the linear interpolation causes the front legs to shorten, while the non-linear and constraint interpolations achieve natural-looking results.

Figure 7: Comparison of the propagation of an edit using three different interpolation methods. Above, the original sequence; below, the propagation window for each of the methods. Top row: linear interpolation. Middle row: non-linear interpolation. Bottom row: constraint interpolation. The edited mesh is shown at the left of the figure, and the frame subsequent to the propagation window is shown at the right. (Dataset courtesy of [22])

The non-linear interpolation incorporates the transformation of the mesh triangle by triangle, taking into account both the rotation and scale/shear components of the transformations. This avoids artefacts associated with linear interpolation, such as shortening of the limbs or distortion of the original shape of the mesh.

Since applying the constraint interpolation method means deforming each of the meshes within the propagation window subject to a set of constraints, it provides more control over the positions of the constrained vertices along the temporal window. However, it is computationally the most expensive method. Computation times for the propagation of the space-time editing example of figure 8(e) were 1.601, 5.567 and 13.926 seconds for the linear, non-linear and constraint interpolation methods, respectively.

5 Space-time editing results

Space-time editing is demonstrated on both synthetic and captured mesh sequences. A variety of editing operations are illustrated to demonstrate the flexibility of the proposed approach. Space-time editing of a walk sequence to modify feet positions, avoid obstacles and step up onto a platform is shown in figure 8(a,b). This illustrates a common application of space-time editing of captured sequences: modifying contact positions according to scene constraints.

In figure 8(a)(right) the space-time editing approach has been used to repair reconstruction errors. The original sequence (see accompanying video) shows a twirl where there is a loss of contact between the hand and the skirt. In the edited sequence the hand has been moved to grasp the skirt correctly.

Figure 8(c) shows a more complex space-time edit to modify the arm and leg movements of the street dancer while preserving both the anatomical structure and surface dynamics.

Space-time editing also allows artistic stylisation of the motion to create common animation effects such as movement emphasis, exaggeration and cartoon squash-and-stretch, as well as re-timing of the sequence for anticipation. Figure 8(d) presents examples of motion stylisation to exaggerate the walking of a character with a loose dress and to produce a cartoon-style squash-and-stretch effect for a jump.

Finally, figure 8(e) shows the editing of a synthetic horse galloping sequence where the torso of the horse has been lifted. This example illustrates the effect of applying large changes to a mesh sequence. Constraining the deformation to a learnt deformation space preserves the mesh structure, ensuring a natural motion sequence.

Video sequences corresponding to the results presented in figure 8, and a demonstration of the interactive interface, are included in the supplementary video. The results of space-time editing demonstrate that the approach allows flexible interactive editing of captured sequences to satisfy user-specified constraints while preserving the natural spatio-temporal dynamics of the captured motion.

The linear interpolation approach has been used to generate the resulting sequences of figure 8(a,b,c,d). Since the edits performed in these examples are small deformations, this has not introduced visual artefacts. For the horse sequence of figure 8(e), where the key-frame undergoes a large deformation, the non-linear interpolation was preferred to generate the final sequence; as shown in figure 7, the linear method would introduce significant errors in this case.

Computation times for a selection of space-time editing results can be found in table 1. Timings show that for meshes of 3000-6000 vertices, key-frame editing takes 0.5-1 s, allowing interactive editing with rapid feedback. These timings are for a CPU implementation of the approach; real-time performance could potentially be achieved by transferring the Laplacian solver to a GPU. Typical values of T_k are in the range 4-8 frames.

Supplementary video: http://www.vimeo.com/25663553


Type of data                Sequence   Edit #   # vertices   # constrained vertices   Deform. time (s)
Real data                   Cones      1        2886         236                      0.636
                                       2        2886         242                      0.635
                                       3        2886         268                      0.644
                                       4        2886         247                      0.631
                                       5        2886         255                      0.643
                            Dancer     1        5580         1585                     1.187
                                       2        5580         1345                     1.446
                                       3        5580         1270                     1.430
                                       4        5580         1494                     1.171
                                       5        5580         1508                     0.880
                                       6        5580         1497                     1.042
                                       7        5580         1560                     1.082
                                       8        5580         1147                     1.057
                                       9        5580         704                      0.987
                                       10       5580         1432                     1.051
                                       11       5580         1258                     1.031
Real data for stylisation   Skirt      1        2854         691                      0.829
                                       2        2854         722                      0.396
                                       3        2854         613                      0.673
                                       4        2854         595                      0.820
                                       5        2854         550                      0.658
                                       6        2854         588                      0.536
Synthetic data              Horse      1        8431         6753                     1.601

Table 1: Computation times for a selection of space-time editing results. The sequence "Cones" corresponds to figure 8(a)(middle), "Dancer" to figure 8(c), "Skirt" to figure 8(d)(left) and "Horse" to figure 8(e). Edit numbers refer to different key-frame edits performed on the sequences.

5.1 Discussion

Some of the results included in the video show small artefacts due to one or more of the following reasons:

• Errors in surface reconstruction, which are present in both the original and edited sequences.

• The walking and running sequences shown are the result of concatenating shorter sequences, and small jumps may be visible at the end of each cycle.

• If large deformations are applied within a small window of frames, as in the example where the feet positions of the running sequence are modified, the resulting motion may lack smoothness. Timing could be better controlled by adding extra frames to the sequence; this remains future work.

6 Conclusions

Space-time editing of 3D sequences with a learnt motion model gives a flexible interactive approach to mesh sequence editing, with a level of control similar to conventional skeletal animation. It allows constraints such as foot or hand position to be imposed, or the captured movement to be modified to interact with objects, while maintaining the movement characteristics and anatomical structure of the captured performance.

Three interpolation methods for propagating the changes to the key-frames have been evaluated. While the linear interpolation approach provides the fastest solution, it can introduce artefacts such as mesh shrinking. The non-linear and constraint interpolation methods provide more accurate and natural-looking results at the expense of longer computation times.

This paper focuses on the editing of dynamic surface geometry; editing the dynamic surface appearance captured in 3D video remains an open problem for future research.

References

[1] O. Arikan and D. A. Forsyth. Synthesizing constrained motions from examples. ACM Transactions on Graphics, 2002.

[2] I. Baran, D. Vlasic, E. Grinspun, and J. Popovic. Semantic deformation transfer. ACM Transactions on Graphics, 28, 2009.

[3] M. Botsch and O. Sorkine. On linear variational surface deformation methods. IEEE Transactions on Visualization and Computer Graphics, 14(1):213–230, 2008.

[4] M. Botsch, R. W. Sumner, M. Pauly, and M. Gross. Deformation transfer for detail-preserving surface editing. In Proc. Vision, Modeling, and Visualization, pages 357–364, 2006.

[5] M. Brand and A. Hertzmann. Style machines. In SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 183–192, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.

[6] M. Gleicher. Motion editing with spacetime constraints. In SI3D '97: Proceedings of the 1997 Symposium on Interactive 3D Graphics, New York, NY, USA, 1997. ACM.

[7] R. Heck and M. Gleicher. Parametric motion graphs. In ACM Symposium on Interactive 3D Graphics, pages 129–136, 2007.

[8] E. Hsu, K. Pulli, and J. Popovic. Style translation for human motion. ACM Trans. Graph., 24(3):1082–1089, 2005.

[9] P. Huang, A. Hilton, and J. Starck. Human motion synthesis from 3D video. In CVPR, 2009.

[10] S. Kircher and M. Garland. Free-form motion processing. ACM Trans. Graph., 27:12:1–12:13, May 2008.

[11] L. Kovar and M. Gleicher. Automated extraction and parameterization of motions in large data sets. ACM Trans. Graph., 23:559–568, August 2004.

[12] L. Kovar, M. Gleicher, and F. Pighin. Motion graphs. In SIGGRAPH '02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, volume 21, pages 473–482, New York, NY, USA, July 2002. ACM.

[13] J. Lee and S. Y. Shin. A hierarchical approach to interactive motion editing for human-like figures. In SIGGRAPH '99: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pages 39–48, New York, NY, USA, 1999. ACM Press/Addison-Wesley Publishing Co.

[14] Y. Lipman, O. Sorkine, D. Levin, and D. Cohen-Or. Linear rotation-invariant coordinates for meshes. ACM Trans. Graph., 24:479–487, July 2005.

[15] J. Min, H. Liu, and J. Chai. Synthesis and editing of personalized stylistic human motion. In Proceedings of the 2010 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, I3D '10, pages 39–46, New York, NY, USA, 2010. ACM.

[16] T. Mukai and S. Kuriyama. Geostatistical motion interpolation. ACM Trans. Graph., 24:1062–1070, July 2005.

[17] C. Rose, B. Bodenheimer, and M. F. Cohen. Verbs and adverbs: Multidimensional motion interpolation using radial basis functions. IEEE Computer Graphics and Applications, 18:32–40, 1998.

[18] A. Shapiro, Y. Cao, and P. Faloutsos. Style components. In Proc. of Graphics Interface, 2006.

[19] O. Sorkine. Differential representations for mesh processing. Computer Graphics Forum, 25(4):789–807, December 2006.

[20] J. Starck and A. Hilton. Surface capture for performance-based animation. IEEE Computer Graphics and Applications, 27(3):21–31, 2007.

[21] C. Stoll, E. de Aguiar, C. Theobalt, and H.-P. Seidel. A volumetric approach to interactive shape editing. Technical report, Max-Planck-Institut für Informatik, June 2007.

[22] R. W. Sumner and J. Popovic. Deformation transfer for triangle meshes. In SIGGRAPH '04: ACM SIGGRAPH 2004 Papers, pages 399–405, New York, NY, USA, 2004. ACM.

[23] R. W. Sumner, M. Zwicker, C. Gotsman, and J. Popovic. Mesh-based inverse kinematics. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pages 488–495, New York, NY, USA, 2005. ACM.

[24] D. Vlasic, I. Baran, W. Matusik, and J. Popovic. Articulated mesh animation from multi-view silhouettes. ACM Trans. Graph., 27(3):1–9, 2008.

[25] W. Xu, K. Zhou, Y. Yu, Q. Tan, Q. Peng, and B. Guo. Gradient domain editing of deforming mesh sequences. In ACM SIGGRAPH 2007 Papers, SIGGRAPH '07, New York, NY, USA, 2007. ACM.


Figure 8: Interactive animation and space-time editing of synthetic and 3D video sequences. (a) Space-time editing of a walk sequence for changing feet positions, collision avoidance and repairing reconstruction errors (the hand has been moved to grasp the skirt correctly): original (blue); edited (green). (b) Space-time editing of a walk sequence for stepping onto a platform. (c) Space-time editing of arm and leg movement for a street dancer sequence: original (blue); edited (green). (d) Stylised sequences: walk with raised knees and jump with squash-and-stretch effects: original (blue); edited (green). (e) Space-time editing of a synthetic horse galloping sequence: top row (original), bottom row (edited). (Dataset courtesy of [22])
