morphing...runtime contd. - pack •process in batches px0 py0 pz0 fl0 tu0 tv0 na0 nb0 ta0 tb0 px1...

morphing

jonathan garrett

6/6/08

introduction

• morphing (blend-shapes) modifies the vertices of a moby

• limitless transformation of a vertex

• compatible with skinning

– morph verts in object space

– skin morphed verts to animated object / world space

introduction contd.

• part of the igVertAnim system

– igVertAnimUv

• animate uvs – simple texture matrix

• used a lot in ratchet (e.g. waterfalls)

– igVertAnimMorph

• animate positions / normals / tangents

• artist driven

• easy to do in a vertex-shader

– but would increase the shader combinations (and we don’t want that!)

– and adds to rsx time

igVertAnimMorph

• processed on spu

• runs at end of frame

• writes directly into class data before the moby is rendered next frame

• semaphore ensures rsx syncs before drawing morphed data

setup

• artist freely moves moby verts

• we store the difference (delta) from the original (base) position / normal

• want to minimize the number of verts we transform

• create a target for each distinct pose (mouth, eyes, forehead frown)

setup contd.

• group similar targets into sets

• at animation time, combine multiple targets at varying strengths

– 10% oooo

– 25% eeee

– 65% ahhh

– 60% left eyebrow raised

• can sum to > 100% (and over / undershoot by factor of 2 for special effect)

setup - our guy

setup - our guy’s targets

setup – sets

• 3 sets:

– left eye

– right eye

– mouth

• varying number of targets in each set

setup contd.

• targets contain moved verts

• union of verts in similar targets grouped into a set

• we draw triangles, so expand to include all verts attached to a tri with morphing verts

– sets can’t overlap

• sets split into fragments

– eg. left-eye, right-eye, mouth, rest of head

runtime

• process in batches

– overlap dma in / out / process

– want to minimize amount of data movement

• apply all targets to current batch

• process:

– dma in base verts (rsx vert and base vert)

– apply deltas for each target

• easy: vert’ = vert + (t * delta)

– finalize

– pack into rsx vert

– dma out
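The "apply deltas for each target" step can be sketched as below. This is a minimal illustration, not the spu code: the sparse per-target dictionary and the name `apply_targets` are assumptions made for the example.

```python
def apply_targets(base, targets):
    """Accumulate weighted deltas onto base verts.

    base:    list of (x, y, z) base positions
    targets: list of (weight, {vert_index: (dx, dy, dz)}); each target
             only lists the verts it actually moves (assumed layout)
    """
    out = [list(v) for v in base]
    for weight, deltas in targets:
        for index, delta in deltas.items():
            for axis in range(3):
                # vert' = vert + (t * delta), summed over all targets
                out[index][axis] += weight * delta[axis]
    return [tuple(v) for v in out]


base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = [
    (0.25, {0: (0.0, 4.0, 0.0)}),
    (0.5, {0: (0.0, 2.0, 0.0), 1: (2.0, 0.0, 0.0)}),
]
morphed = apply_targets(base, targets)
```

Because each target is applied independently, weights can sum to more than 100% without any special handling.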

runtime contd.

• unpack base-verts and deltas into range -1 → +1

• store a scale to apply when combining with the vert

• allows us to use all the bits in our range

– free: mul_add is same speed as add
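A rough sketch of the scale idea, assuming a simple signed 8-bit quantization (the slides do not give the exact encoding). The helper names are hypothetical. The point is that the stored scale and the target weight fold into one multiplier, so applying a delta stays a single multiply-add.

```python
def quantize_i8(deltas):
    """Return (scale, ints) so that delta ~= scale * (q / 127).

    Assumed encoding: scale is the largest magnitude in the list, so the
    quantized values use the full -127..127 range.
    """
    scale = max(abs(d) for d in deltas) or 1.0
    return scale, [round(d / scale * 127) for d in deltas]


def apply_quantized(base, scale, ints, weight):
    """base + weight * delta, with scale and weight folded together."""
    k = scale * weight  # one combined multiplier per target
    return [b + k * (q / 127.0) for b, q in zip(base, ints)]


scale, ints = quantize_i8([0.5, -0.25, 0.0])
result = apply_quantized([1.0, 1.0, 1.0], scale, ints, 2.0)
```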

runtime contd.

loop batches
{
    dma in base rsx_verts / positions / normals / tangents
    unpack positions    i16 → f32
    unpack normals      x11y11z10 → f32 transposed
    unpack tangents     x11y11z10 → f32 transposed

    loop targets
    {
        dma in deltas (pos)
        dma in deltas (normals)
        apply deltas (pos) - i8 / i16
        apply deltas (normals) - i8
    }

    finalize positions  f32 → i16
    normalize normals
    orthonormalize tangents
    finalize normals    f32 → x11y11z10
    finalize tangents   f32 → x11y11z10
    pack to rsx_verts
    dma out
}

runtime contd. - unpack

• unpack in 4s

– positions (i16 → f32)

x0 y0 z0 x1 y1 z1 x2 y2      x0 y0 z0 x1
z2 x3 y3 z3 x4 y4 z4 x5  →   y1 z1 x2 y2
y5 z5 x6 y6 z6 x7 y7 z7      z2 x3 y3 z3
                             x4 y4 z4 x5
                             y5 z5 x6 y6
                             z6 x7 y7 z7

– normals (x11y11z10 → f32)

x0y0z0 x1y1z1 x2y2z2 x3y3z3      x0 x1 x2 x3
x4y4z4 x5y5z5 x6y6z6 x7y7z7  →   y0 y1 y2 y3
                                 z0 z1 z2 z3
                                 x4 x5 x6 x7
                                 y4 y5 y6 y7
                                 z4 z5 z6 z7
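A sketch of the x11y11z10 → f32 transposed unpack. The bit layout used here (x in the low 11 bits, y in the next 11, z in the top 10, all signed and normalized) is an assumption for illustration; the slides do not spell it out. All function names are hypothetical.

```python
def _signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def unpack_x11y11z10(word):
    """Unpack one 32-bit word to (x, y, z) floats in -1..+1 (assumed layout)."""
    x = _signed(word, 11) / 1023.0
    y = _signed(word >> 11, 11) / 1023.0
    z = _signed(word >> 22, 10) / 511.0
    return x, y, z


def pack_x11y11z10(x, y, z):
    """Inverse of unpack_x11y11z10, clamping inputs to -1..+1."""
    def quantize(value, bits):
        limit = (1 << (bits - 1)) - 1
        clamped = max(-1.0, min(1.0, value))
        return round(clamped * limit) & ((1 << bits) - 1)
    return quantize(x, 11) | (quantize(y, 11) << 11) | (quantize(z, 10) << 22)


def unpack4_transposed(words):
    """Unpack four packed normals into separate x, y and z rows."""
    xs, ys, zs = zip(*(unpack_x11y11z10(w) for w in words))
    return list(xs), list(ys), list(zs)


word = pack_x11y11z10(1.0, -1.0, 0.0)
rows = unpack4_transposed([word] * 4)
```

Keeping each component in its own row is what makes the later normalize step a handful of whole-vector operations.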

runtime contd. - apply

• for each target

– unpack and apply deltas

• positions i8 / i16 → f32

• normals i8 → f32

• same grouping as unpack

– scale by normalization weight

– scale by target weight (combined with above)

– apply to base

runtime contd. - finalize

• normals

– normalize (easy as we’re transposed)

nx = nx0 nx1 nx2 nx3
ny = ny0 ny1 ny2 ny3
nz = nz0 nz1 nz2 nz3

len = mul (nx, nx)
len = mul_add(ny, ny, len)
len = mul_add(nz, nz, len)
oolen = …
nx = mul (nx, oolen)
ny = mul (ny, oolen)
nz = mul (nz, oolen)

– 12 ops for 4 normals (3 cycles per normal)


• tangents

– orthonormalize (also easy as we’re transposed)

– t' = norm(t - (n' dot t) * n')

– 18 ops for 4 tangents (4.5 cycles per tangent)

[diagram: normal n and tangent t before, n’ and t’ after orthonormalization]


• positions

– convert f32 → i16

• normals / tangents

– transpose

– convert f32 → x11y11z10
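The normalize and orthonormalize steps can be sketched on transposed (one row per component) data as below. This is plain Python standing in for 4-wide vector ops; the function names are hypothetical and zero-length vectors are not handled.

```python
import math


def normalize4(vx, vy, vz):
    """Normalize four vectors stored as separate x, y, z rows."""
    length_sq = [x * x + y * y + z * z for x, y, z in zip(vx, vy, vz)]
    oolen = [1.0 / math.sqrt(l) for l in length_sq]  # one over length
    return (
        [x * o for x, o in zip(vx, oolen)],
        [y * o for y, o in zip(vy, oolen)],
        [z * o for z, o in zip(vz, oolen)],
    )


def orthonormalize4(n, t):
    """t' = norm(t - (n' dot t) * n'), where n holds already normalized rows."""
    nx, ny, nz = n
    tx, ty, tz = t
    dot = [a * d + b * e + c * f
           for a, b, c, d, e, f in zip(nx, ny, nz, tx, ty, tz)]
    return normalize4(
        [d - k * a for d, k, a in zip(tx, dot, nx)],
        [e - k * b for e, k, b in zip(ty, dot, ny)],
        [f - k * c for f, k, c in zip(tz, dot, nz)],
    )


n_prime = normalize4([0.0] * 4, [0.0] * 4, [2.0] * 4)
t_prime = orthonormalize4(n_prime, ([1.0] * 4, [0.0] * 4, [1.0] * 4))
```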

runtime contd. - pack

• no change to existing moby vertex format (didn’t want to change our shaders)

• verts are packed for the rsx (not spu)

• two vertex formats

– moby_1: bound to single matrix (20 bytes)

– moby_4: bound to (up to) four matrices (28 bytes)

moby_1 px py pz fl tu tv na nb ta tb

moby_4 px py pz fl ia ib wa wb tu tv na nb ta tb


• process in batches

moby_1 (4 verts in 5 quadwords):

px0 py0 pz0 fl0 tu0 tv0 na0 nb0
ta0 tb0 px1 py1 pz1 fl1 tu1 tv1
na1 nb1 ta1 tb1 px2 py2 pz2 fl2
tu2 tv2 na2 nb2 ta2 tb2 px3 py3
pz3 fl3 tu3 tv3 na3 nb3 ta3 tb3

moby_4 (4 verts in 7 quadwords):

px0 py0 pz0 fl0 ia0 ib0 wa0 wb0
tu0 tv0 na0 nb0 ta0 tb0 px1 py1
pz1 fl1 ia1 ib1 wa1 wb1 tu1 tv1
na1 nb1 ta1 tb1 px2 py2 pz2 fl2
ia2 ib2 wa2 wb2 tu2 tv2 na2 nb2
ta2 tb2 px3 py3 pz3 fl3 ia3 ib3
wa3 wb3 tu3 tv3 na3 nb3 ta3 tb3


• morphed verts come in packed (and in batches of 4)

• shuffle / shift to get in correct place for rsx vert

• combine

• dma out
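A sketch of the combine step for a batch of four moby_1 verts. It assumes every field is an opaque big-endian 16-bit word and that only the position, normal and tangent fields are overwritten while fl / tu / tv are carried over from the existing rsx vert. Neither detail is stated on the slides, and `combine_batch` is a hypothetical name.

```python
import struct

# px py pz fl tu tv na nb ta tb: ten 16-bit words = 20 bytes (assumed big-endian)
MOBY_1 = struct.Struct(">10H")


def combine_batch(rsx_verts, morphed):
    """Merge morphed fields into 4 existing moby_1 verts.

    rsx_verts: 80 bytes holding 4 packed moby_1 verts
    morphed:   4 tuples of (px, py, pz, na, nb, ta, tb), already packed to ints
    """
    out = bytearray()
    for i, (px, py, pz, na, nb, ta, tb) in enumerate(morphed):
        fields = MOBY_1.unpack_from(rsx_verts, i * MOBY_1.size)
        fl, tu, tv = fields[3:6]  # untouched by morphing (assumption)
        out += MOBY_1.pack(px, py, pz, fl, tu, tv, na, nb, ta, tb)
    return bytes(out)


original = b"".join(MOBY_1.pack(*range(v * 10, v * 10 + 10)) for v in range(4))
morphed = [(1000 + v,) * 7 for v in range(4)]
batch = combine_batch(original, morphed)
```

Four 20-byte verts come to 80 bytes, which is a whole number of 16-byte quadwords, so a batch of four fits the dma granularity without padding.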

runtime animation

• parse blend-tree and find morph clip

• data stored alongside skeleton data in AnimClip

• list of target ids and weights per frame

– from target set-id process in batches for each target

• dma the two frames which straddle the current time: floor(frame) and floor(frame)+1

• blend with sub-frame factor
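The sub-frame blend can be sketched as a linear interpolation between the two straddling frames. The per-frame dictionary, the clamping at the last frame and treating a missing target as weight 0 are all assumptions made for this example.

```python
import math


def sample_weights(frames, frame):
    """Blend target weights at a fractional frame number.

    frames: list of {target_id: weight}, one dict per keyed frame (assumed layout)
    frame:  fractional frame number, e.g. 12.25
    """
    lo = int(math.floor(frame))
    hi = min(lo + 1, len(frames) - 1)  # clamp at the end of the clip (assumption)
    t = frame - lo                      # sub-frame factor
    a, b = frames[lo], frames[hi]
    return {
        target_id: (1.0 - t) * a.get(target_id, 0.0) + t * b.get(target_id, 0.0)
        for target_id in sorted(set(a) | set(b))
    }


frames = [{3: 0.0, 7: 1.0}, {3: 1.0}]
weights = sample_weights(frames, 0.25)
```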

runtime animation contd.

• basic format 4 / 8 / 16 targets / frame (needs improving)

MorphData4 (16 bytes):

i0i1i2i3www0www1www2www300000000

8 targets (32 bytes):

i0i1i2i3i4i5i6i70000000000000000
www0www1www2www3www4www5www6www7

16 targets (48 bytes):

i0i1i2i3i4i5i6i7i8i9iaibicidieif
www0www1www2www3www4www5www6www7
www8www9wwwawwwbwwwcwwwdwwwewwwf

runtime animation contd.

• up to 16 target-sets

• up to 256 targets in total

• up to 16 targets per animation frame

• target weight in range -2.0 → +2.0

(undershoot / overshoot)
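A sketch of decoding one MorphData4 record, reading the layout as four 1-byte target ids, four 2-byte weights and 4 bytes of padding. The weight encoding is not given on the slides; the signed fixed-point mapping below (16384 = 1.0, which covers -2.0 up to just under +2.0) is purely an assumption, as are the constant and function names.

```python
import struct

# 4 ids (1 byte each), 4 weights (2 bytes each), 4 pad bytes = 16 bytes, big-endian
MORPH_DATA_4 = struct.Struct(">4B4h4x")
WEIGHT_ONE = 16384  # hypothetical fixed-point value for a weight of 1.0


def decode_morph_data4(blob):
    """Return a list of (target_id, weight) pairs from one 16-byte record."""
    fields = MORPH_DATA_4.unpack(blob)
    ids, raw_weights = fields[:4], fields[4:]
    return [(i, w / WEIGHT_ONE) for i, w in zip(ids, raw_weights)]


blob = MORPH_DATA_4.pack(1, 2, 3, 4, 16384, -32768, 8192, 0)
decoded = decode_morph_data4(blob)
```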

maya setup - rules

• regions grouped into “sets” based on target names:

– LeftEye_open → LeftEye

– LeftEye_close

– RightEye_open → RightEye

– RightEye_close

– looking to improve this

• sets can’t overlap (we render non-overlapping tris)

– need some tools support to help artists with this
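The naming rule and the no-overlap rule could be checked by a small tool along these lines. It assumes the set name is everything before the last underscore, which matches the LeftEye / RightEye examples but is otherwise a guess; both function names are hypothetical.

```python
def group_targets(target_names):
    """Group target names into sets by the prefix before the last underscore."""
    sets = {}
    for name in target_names:
        set_name = name.rsplit("_", 1)[0]
        sets.setdefault(set_name, []).append(name)
    return sets


def overlapping_sets(set_verts):
    """Return pairs of set names whose vertex index sets share any vert.

    set_verts: {set_name: set of vertex indices touched by that set}
    """
    names = sorted(set_verts)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if set_verts[a] & set_verts[b]
    ]


groups = group_targets(
    ["LeftEye_open", "LeftEye_close", "RightEye_open", "RightEye_close"]
)
clashes = overlapping_sets({"LeftEye": {1, 2}, "RightEye": {3}, "Mouth": {2, 5}})
```

Reporting the clashing pairs is the kind of feedback that would let an artist see which sets need their regions adjusted.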

maya setup - keyframing

maya setup - keyframe controller

• animators have a fantastic sdk setup

results

results contd.

other stuff - wrinkles

• wrinkles

– dynamic modification of normal map

– adds subtle detail

– animator driven

• sdk setup automatically creates keys

wrinkles contd.

• part of igComposite system

• runs on spu

• shader based

– composite system loads shader code and input data

– provides support functions (decompress, convert to / from float, compress (!) etc.)

– shader called with interface ptr (it does all the work)

wrinkles contd.

• runtime combine of up to 2 normal maps with base normal map

• masked by up to 8 regions

– left / right mouth / nose / eyes / forehead

• 8 weights -1 → +1

-1 → 0 grab from wrinkle_map_a

0 → +1 grab from wrinkle_map_b

• weighted / masked accumulate onto base

wrinkles contd. – maps

wrinkles – contd.

• our normal maps store dx and dy instead of nx, ny, nz

• stored as dxt5 (byte per pixel)

– 0.0 → r, dy → g, 1.0 → b, dx → a

• pixel shader reconstructs nx, ny, nz

– only need to unpack / apply two components

• region maps stored as dxt5

– r = region_1

– g = region_2

– b = region_3

– a = region_4

wrinkles – contd.

• spu uncompresses to f32

• accumulate as f32

• would like to compress back to dxt5, but that has artifacts, so pack to raw g8b8

– only twice as big as dxt5

wrinkles – contd.

• flow:

unpack base
unpack wrinkles
unpack regions
wrinkle = sel (wrinkle_a, wrinkle_b, sign(weight))
wrinkle *= (region * abs(weight))
base += wrinkle
pack base
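The flow above, sketched for a single pixel after everything has been unpacked to floats. The argument layout and the name `composite_pixel` are assumptions for the example; the real code works on whole blocks on the spu.

```python
def composite_pixel(base, wrinkle_a, wrinkle_b, regions, weights):
    """Accumulate masked, weighted wrinkle detail onto a base (dx, dy) pixel.

    base, wrinkle_a, wrinkle_b: (dx, dy) values for this pixel
    regions: per-region mask values for this pixel
    weights: per-region animation weights in -1..+1
    """
    dx, dy = base
    for region, weight in zip(regions, weights):
        # negative weights pull from wrinkle_map_a, positive from wrinkle_map_b
        source = wrinkle_a if weight < 0.0 else wrinkle_b
        k = region * abs(weight)
        dx += source[0] * k
        dy += source[1] * k
    return dx, dy


pixel = composite_pixel(
    base=(0.0, 0.0),
    wrinkle_a=(1.0, 0.0),
    wrinkle_b=(0.0, 1.0),
    regions=[1.0, 0.5],
    weights=[-0.5, 1.0],
)
```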

wrinkles – contd.

cost - morphing

• sizes:

– base_positions 6 bytes

– base_normals 4 bytes

– base_tangents 4 bytes

– deltas_positions_8 3 bytes

– deltas_positions_16 6 bytes

– deltas_normals 3 bytes

• 1000 verts, 1 target: 23k (16-bit deltas), 19k (8-bit deltas)

• 1000 verts, 5 targets: 58k (16-bit deltas), 43k (8-bit deltas)
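The totals can be roughly reproduced from the per-vert sizes, assuming every vert carries a delta in every target. The results land within about 1k of the quoted figures; the slides do not say how the figures were rounded or whether k means 1000 or 1024 bytes. The function name is hypothetical.

```python
BASE_BYTES = 6 + 4 + 4      # base positions + normals + tangents per vert
DELTA_NORMAL_BYTES = 3      # deltas_normals per vert per target


def morph_bytes(verts, targets, position_delta_bytes):
    """Total bytes for base data plus deltas (3 = 8-bit, 6 = 16-bit positions)."""
    per_target = position_delta_bytes + DELTA_NORMAL_BYTES
    return verts * (BASE_BYTES + targets * per_target)


one_target_16 = morph_bytes(1000, 1, 6)    # 23000 bytes
one_target_8 = morph_bytes(1000, 1, 3)     # 20000 bytes
five_targets_16 = morph_bytes(1000, 5, 6)  # 59000 bytes
five_targets_8 = morph_bytes(1000, 5, 3)   # 44000 bytes
```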

cost contd. - wrinkles

• input:

– (up to) two wrinkle maps

• dxt5: 512 × 512: ~340k (inc. mips) each

– (up to) two region maps

• dxt5: 512 × 512: ~340k (inc. mips) each

• output:

– composited normal map

• r8g8: 512 × 512: ~680k (inc. mips)
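The texture sizes follow from the formats, assuming a full mip chain down to 1 × 1 and k = 1024 bytes. dxt5 stores each 4 × 4 block in 16 bytes (one byte per pixel), and the raw two-channel output uses 2 bytes per pixel. The helper names are hypothetical.

```python
def mip_chain_bytes(size, bytes_for_level):
    """Sum the size of every mip level of a square texture down to 1 x 1."""
    total = 0
    while size >= 1:
        total += bytes_for_level(size)
        size //= 2
    return total


def dxt5_level(size):
    """dxt5: 16 bytes per 4 x 4 block, with at least one block per level."""
    blocks = max(1, size // 4)
    return blocks * blocks * 16


def raw16_level(size):
    """Raw two-channel 8-bit format: 2 bytes per pixel."""
    return size * size * 2


dxt5_total = mip_chain_bytes(512, dxt5_level)   # 349552 bytes, about 341k
raw_total = mip_chain_bytes(512, raw16_level)   # 699050 bytes, about 683k
```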

issues

• only moving geometry – not bsphere / collision (can’t move too much)

• only single anim clip – extend to parse full blend-tree and cross-blend anims

extensions

• cross-blend textures

– human → alien - verts morph, skin blends to scales (base and normal map)

• improve delta compression format

• improve anim format

• better to combine wrinkles on rsx ?

end!

• demo if I can get it running…

• questions ?
