morphing
jonathan garrett
6/6/08
introduction
• morphing (blend-shapes) modifies the
vertices of a moby
• limitless transformation of a vertex
• compatible with skinning
– morph verts in object space
– skin morphed verts to animated object / world
space
introduction contd.
• part of the igVertAnim system
– igVertAnimUv
  • animate uvs – simple texture matrix
  • used a lot in ratchet (eg. waterfalls)
– igVertAnimMorph
  • animate positions / normals / tangents
  • artist driven
• easy to do in a vertex-shader
– but would increase the shader combinations (and we don’t want that!)
– and adds to rsx time
igVertAnimMorph
• processed on spu
• runs at end of frame
• writes directly into class data before moby
rendered next frame
• semaphore ensures rsx syncs before
drawing morphed data
setup
• artist freely moves moby verts
• we store difference (delta) from original
(base) position / normal
• want to minimize number of verts we
transform
• create target for each distinct pose
(mouth, eyes, forehead frown)
setup contd.
• group similar targets into sets
• at animation time, combine multiple targets at varying strengths
– 10% oooo
– 25% eeee
– 65% ahhh
– 60% left eyebrow raised
• can sum to > 100% (and over / undershoot by factor of 2 for special effect)
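A minimal sketch of combining targets at varying strengths, assuming each target stores deltas only for the verts it moves. The function name, the dictionary layout and the toy data are hypothetical, not taken from the igVertAnimMorph code.

```python
def blend_targets(base, targets, weights):
    """Return base + sum(weight * delta) over all weighted targets."""
    out = [list(v) for v in base]              # leave the base verts untouched
    for name, weight in weights.items():
        for vert_index, delta in targets[name].items():
            for axis in range(3):
                out[vert_index][axis] += weight * delta[axis]
    return out

base = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
targets = {
    "ahhh": {0: (0.0, -1.0, 0.0)},                 # only moved verts are stored
    "left_eyebrow_raised": {1: (0.0, 2.0, 0.0)},
}
weights = {"ahhh": 0.65, "left_eyebrow_raised": 0.60}
morphed = blend_targets(base, targets, weights)
```

Because the weights are plain multipliers, nothing stops them summing past 100% or going negative for an undershoot.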
setup - our guy
setup - our guy’s targets
setup – sets
• 3 sets:
– left eye
– right eye
– mouth
• varying number of targets in each set
setup contd.
• targets contain moved verts
• union of verts in similar targets grouped
into a set
• we draw triangles, so expand to include all
verts attached to tri with morphing verts
– sets can’t overlap
• sets split into fragments
– eg. left-eye, right-eye, mouth, rest of head
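The expansion step can be sketched as follows, assuming triangles are index triples and a set is a plain collection of vert indices. Both helpers are hypothetical illustrations, not the actual tool code.

```python
def expand_to_triangles(morph_verts, triangles):
    """Return every vert of every triangle that touches a morphing vert."""
    expanded = set(morph_verts)
    for tri in triangles:
        if any(v in morph_verts for v in tri):
            expanded.update(tri)
    return expanded

def sets_overlap(sets):
    """True if any vert appears in more than one set."""
    seen = set()
    for verts in sets.values():
        if seen & verts:
            return True
        seen |= verts
    return False

triangles = [(0, 1, 2), (2, 3, 4), (4, 5, 6)]
left_eye = expand_to_triangles({0}, triangles)
mouth = expand_to_triangles({6}, triangles)
```

A check like `sets_overlap` is the kind of validation that would flag two sets claiming the same triangle.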
runtime
• process in batches
– overlap dma in / out / process
– want to minimize amount of data movement
  • apply all targets to current batch
• process:
– dma in base verts (rsx vert and base vert)
– apply deltas for each target
  • easy: vert’ = vert + (t * delta)
– finalize
– pack into rsx vert
– dma out
runtime contd.
• unpack base-verts and deltas into range -1 → +1
• store a scale to apply when combining with the vert
• allows us to use all the bits in our range
– free: mul_add is same speed as add
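A rough sketch of the idea, assuming one scale per target and symmetric signed quantization. The function names and the exact rounding are assumptions. The point is that the stored scale and the target weight fold into a single multiplier, so applying a delta is still one multiply-add.

```python
def quantize_deltas(deltas, bits=8):
    """Map floats onto the full signed integer range; return (ints, scale)."""
    max_int = (1 << (bits - 1)) - 1            # 127 for i8, 32767 for i16
    scale = max(abs(d) for d in deltas) or 1.0
    ints = [round(d / scale * max_int) for d in deltas]
    return ints, scale

def apply_delta(base, q, scale, weight, bits=8):
    """Unpack q into -1 .. +1, then apply it with one multiply-add."""
    max_int = (1 << (bits - 1)) - 1
    unit = q / max_int
    return unit * (scale * weight) + base      # scale * weight computed once

ints, scale = quantize_deltas([0.02, -0.05, 0.01])
moved = apply_delta(1.0, ints[1], scale, weight=0.5)
```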
runtime contd.
loop batches
{
  dma in base rsx_verts / positions / normals / tangents
  unpack positions  i16 → f32
  unpack normals  x11y11z10 → f32 transposed
  unpack tangents  x11y11z10 → f32 transposed
  loop targets
  {
    dma in deltas (pos)
    dma in deltas (normals)
    apply deltas (pos) - i8 / i16
    apply deltas (normals) - i8
  }
  finalize positions  f32 → i16
  normalize normals
  orthonormalize tangents
  finalize normals  f32 → x11y11z10
  finalize tangents  f32 → x11y11z10
  pack to rsx_verts
  dma out
}
runtime contd. - unpack
• unpack in 4s
– positions (i16 → f32)
  in:
    x0 y0 z0 x1 y1 z1 x2 y2
    z2 x3 y3 z3 x4 y4 z4 x5
    y5 z5 x6 y6 z6 x7 y7 z7
  out:
    x0 y0 z0 x1
    y1 z1 x2 y2
    z2 x3 y3 z3
    x4 y4 z4 x5
    y5 z5 x6 y6
    z6 x7 y7 z7
– normals (x11y11z10 → f32)
  in:
    x0y0z0 x1y1z1 x2y2z2 x3y3z3
    x4y4z4 x5y5z5 x6y6z6 x7y7z7
  out:
    x0 x1 x2 x3
    y0 y1 y2 y3
    z0 z1 z2 z3
    x4 x5 x6 x7
    y4 y5 y6 y7
    z4 z5 z6 z7
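A sketch of the x11y11z10 round trip and the transposed output. The bit order (x in the low 11 bits, z in the top 10) and the signed-normalized encoding are assumptions; the slides only name the format.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def unpack_x11y11z10(packed):
    x = sign_extend(packed & 0x7FF, 11) / 1023.0
    y = sign_extend((packed >> 11) & 0x7FF, 11) / 1023.0
    z = sign_extend((packed >> 22) & 0x3FF, 10) / 511.0
    return x, y, z

def pack_x11y11z10(x, y, z):
    xi = round(x * 1023.0) & 0x7FF
    yi = round(y * 1023.0) & 0x7FF
    zi = round(z * 511.0) & 0x3FF
    return xi | (yi << 11) | (zi << 22)

def unpack_transposed(packed4):
    """Unpack four normals into (xs, ys, zs), one list per component."""
    xs, ys, zs = zip(*(unpack_x11y11z10(p) for p in packed4))
    return list(xs), list(ys), list(zs)

p = pack_x11y11z10(1.0, -1.0, 0.0)
xs, ys, zs = unpack_transposed([p, p, p, p])
```

Keeping each component in its own list mirrors the transposed register layout, where one instruction processes the same component of four normals.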
runtime contd. - apply
• for each target
– unpack and apply deltas
• positions i8 / i16 → f32
• normals i8 → f32
• same grouping as unpack
– scale by normalization weight
– scale by target weight (combined with above)
– apply to base
runtime contd. - finalize
• normals
– normalize (easy as we’re transposed)
nx = nx0 nx1 nx2 nx3
ny = ny0 ny1 ny2 ny3
nz = nz0 nz1 nz2 nz3
len = mul (nx, nx)
len = mul_add(ny, ny, len)
len = mul_add(nz, nz, len)
oolen = …
nx = mul (nx, oolen)
ny = mul (ny, oolen)
nz = mul (nz, oolen)
– 12 ops for 4 normals (3 cycles per normal)
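The same steps in a small sketch, with each list standing in for one register holding a single component of four normals. The slide elides how oolen is computed; a plain 1 / sqrt is assumed here, whereas the spu code presumably uses a reciprocal-square-root estimate.

```python
import math

def normalize_soa(nx, ny, nz):
    """Normalize four normals stored as separate x, y and z lists."""
    len_sq = [x * x for x in nx]
    len_sq = [y * y + acc for y, acc in zip(ny, len_sq)]    # mul_add
    len_sq = [z * z + acc for z, acc in zip(nz, len_sq)]    # mul_add
    oolen = [1.0 / math.sqrt(l) for l in len_sq]            # assumed exact
    return ([x * o for x, o in zip(nx, oolen)],
            [y * o for y, o in zip(ny, oolen)],
            [z * o for z, o in zip(nz, oolen)])

nx, ny, nz = normalize_soa([3.0, 0.0, 1.0, 2.0],
                           [4.0, 2.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0, 0.0])
```

No horizontal adds or shuffles are needed, which is why the transposed layout makes this cheap.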
runtime contd. - finalize
• tangents
– orthonormalize (also easy as we’re transposed)
– t' = norm(t - (n' dot t) * n')
– 18 ops for 4 tangents (4.5 cycles per tangent)
[diagram: n and t before; n’, t and the orthonormalized t’ after]
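The tangent formula is a single Gram-Schmidt step against the already-normalized normal. A per-vertex sketch with hypothetical helper names (the real code works on four tangents at a time in transposed form):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    inv_len = 1.0 / math.sqrt(dot(v, v))
    return tuple(x * inv_len for x in v)

def orthonormalize_tangent(n_unit, t):
    """Remove the part of t along n_unit, then renormalize."""
    d = dot(n_unit, t)
    return normalize(tuple(tc - d * nc for tc, nc in zip(t, n_unit)))

n_unit = (0.0, 0.0, 1.0)
t_new = orthonormalize_tangent(n_unit, (1.0, 0.0, 1.0))
```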
runtime contd. - finalize
• positions
– convert f32 → i16
• normals / tangents
– transpose
– convert f32 → x11y11z10
runtime contd. - pack
• no change to existing moby vertex format (didn’t
want to change our shaders)
• verts are packed for the rsx (not the spu)
• two vertex formats
– moby_1: bound to single matrix (20 bytes)
– moby_4: bound to (up to) four matrices (28 bytes)
moby_1 px py pz fl tu tv na nb ta tb
moby_4 px py pz fl ia ib wa wb tu tv na nb ta tb
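The two sizes can be checked with a small layout sketch. The field types are assumptions: each named slot is treated as 16 bits (10 slots = 20 bytes, 14 slots = 28 bytes), na/nb and ta/tb are taken to be the two halves of a 32-bit packed normal and tangent, and byte order is big-endian as on the PS3.

```python
import struct

# px py pz fl | tu tv | normal (na nb) | tangent (ta tb)
MOBY_1 = struct.Struct(">4h2h2I")
# px py pz fl | ia ib wa wb | tu tv | normal | tangent
MOBY_4 = struct.Struct(">4h4H2h2I")

def pack_moby_1(px, py, pz, fl, tu, tv, normal, tangent):
    """Pack one single-matrix vert; normal and tangent are 32-bit words."""
    return MOBY_1.pack(px, py, pz, fl, tu, tv, normal, tangent)

blob = pack_moby_1(100, -200, 300, 0, 16, 32, 0x12345678, 0x9ABCDEF0)
```

Four moby_1 verts occupy 80 bytes and four moby_4 verts occupy 112 bytes, so a batch of four always ends on a 16-byte boundary.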
runtime contd. - pack
• process in batches
– moby_1 (4 verts, 80 bytes):
    px0 py0 pz0 fl0 tu0 tv0 na0 nb0
    ta0 tb0 px1 py1 pz1 fl1 tu1 tv1
    na1 nb1 ta1 tb1 px2 py2 pz2 fl2
    tu2 tv2 na2 nb2 ta2 tb2 px3 py3
    pz3 fl3 tu3 tv3 na3 nb3 ta3 tb3
– moby_4 (4 verts, 112 bytes):
    px0 py0 pz0 fl0 ia0 ib0 wa0 wb0
    tu0 tv0 na0 nb0 ta0 tb0 px1 py1
    pz1 fl1 ia1 ib1 wa1 wb1 tu1 tv1
    na1 nb1 ta1 tb1 px2 py2 pz2 fl2
    ia2 ib2 wa2 wb2 tu2 tv2 na2 nb2
    ta2 tb2 px3 py3 pz3 fl3 ia3 ib3
    wa3 wb3 tu3 tv3 na3 nb3 ta3 tb3
runtime contd. - pack
• morphed verts come in packed (and in
batches of 4)
• shuffle / shift to get in correct place for rsx
vert
• combine
• dma out
runtime animation
• parse blend-tree and find morph clip
• data stored alongside skeleton data in AnimClip
• list of target ids and weights per frame
– look up set-id from target id, then process in batches for each target
• dma the two frames which straddle the current frame: floor(frame) and floor(frame)+1
• blend with sub-frame factor
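A sketch of the sub-frame blend, assuming each stored frame maps target id to weight and that a target missing from one of the two frames counts as weight 0. The function name and data layout are hypothetical.

```python
import math

def sample_weights(frames, frame):
    """Blend the two stored frames that straddle a fractional frame number."""
    lo = int(math.floor(frame))
    hi = min(lo + 1, len(frames) - 1)      # clamp at the end of the clip
    t = frame - lo                         # sub-frame factor in 0 .. 1
    a, b = frames[lo], frames[hi]
    ids = set(a) | set(b)
    return {i: (1.0 - t) * a.get(i, 0.0) + t * b.get(i, 0.0) for i in ids}

frames = [{3: 0.0, 7: 1.0}, {3: 1.0}]
blended = sample_weights(frames, 0.25)
```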
runtime animation contd.
• basic format 4 / 8 / 16 targets / frame (needs
improving)
MorphData4 (16 bytes):
i0i1i2i3www0www1www2www300000000
MorphData8 (32 bytes):
i0i1i2i3i4i5i6i70000000000000000
www0www1www2www3www4www5www6www7
MorphData16 (48 bytes):
i0i1i2i3i4i5i6i7i8i9iaibicidieif
www0www1www2www3www4www5www6www7
www8www9wwwawwwbwwwcwwwdwwwewwwf
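Reading each pair of characters in the layouts as one byte gives 8-bit target ids and 16-bit weights. A sketch of MorphData4 under that reading; the weight encoding (signed 16-bit spanning the -2.0 → +2.0 range stated in the deck) and the big-endian byte order are assumptions.

```python
import struct

WEIGHT_SCALE = 32767 / 2.0               # assumed: i16 covers -2.0 .. +2.0
MORPH_DATA_4 = struct.Struct(">4B4h4x")  # 4 ids, 4 weights, 4 pad bytes

def pack_morph_data4(ids, weights):
    clamped = [max(-2.0, min(2.0, w)) for w in weights]
    return MORPH_DATA_4.pack(*ids, *(round(w * WEIGHT_SCALE) for w in clamped))

def unpack_morph_data4(blob):
    fields = MORPH_DATA_4.unpack(blob)
    return list(fields[:4]), [w / WEIGHT_SCALE for w in fields[4:]]

blob = pack_morph_data4([1, 2, 3, 4], [0.1, 0.25, -2.0, 2.0])
ids, weights = unpack_morph_data4(blob)
```

One-byte ids are consistent with a limit of 256 targets in total.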
runtime animation contd.
• up to 16 target-sets
• up to 256 targets in total
• up to 16 targets per animation frame
• target weight in range -2.0 → +2.0
(undershoot / overshoot)
maya setup - rules
• regions grouped into “sets” based on target
names:
– LeftEye_open → LeftEye
– LeftEye_close
– RightEye_open → RightEye
– RightEye_close
– looking to improve this
• sets can’t overlap (we render non-overlapping tris)
– need some tools support to help artists with this
maya setup - keyframing
maya setup - keyframe controller
• animators have a fantastic sdk setup
results
results contd.
other stuff - wrinkles
• wrinkles
– dynamic modification of normal map
– adds subtle detail
– animator driven
• sdk setup automatically creates keys
wrinkles contd.
• part of igComposite system
• runs on spu
• shader based
– composite system loads shader code and input data
– provides support functions (decompress, convert to / from float, compress (!) etc.)
– shader called with interface ptr (it does all the work)
wrinkles contd.
• runtime combine of up to 2 normal maps
with base normal map
• masked by up to 8 regions
– left / right mouth / nose / eyes / forehead
• 8 weights -1 → +1
– -1 → 0: grab from wrinkle_map_a
– 0 → +1: grab from wrinkle_map_b
• weighted / masked accumulate onto base
wrinkles contd. – maps
wrinkles – contd.
• our normal maps store dx and dy instead of nx, ny, nz
• stored as dxt5 (byte per pixel)
– 0.0 → r, dy → g, 1.0 → b, dx → a
• pixel shader reconstructs nx, ny, nz
– only need to unpack / apply two components
• region maps stored as dxt5
– r = region_1
– g = region_2
– b = region_3
– a = region_4
wrinkles – contd.
• spu uncompresses to f32
• accumulate as f32
• would like to compress back to dxt5, but
has artifacts, so pack to raw g8b8
– only twice as big as dxt5
wrinkles – contd.
• flow:
  unpack base
  unpack wrinkles
  unpack regions
  wrinkle = sel (wrinkle_a, wrinkle_b, sign(weight))
  wrinkle *= (region * abs(weight))
  base += wrinkle
  pack base
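The flow for a single pixel can be sketched as below, assuming every map has already been unpacked to floats and each pixel carries only (dx, dy). The function name is hypothetical, and clamping before the final pack is left out.

```python
def composite_pixel(base, wrinkle_a, wrinkle_b, regions, weights):
    """Accumulate masked, weighted wrinkle (dx, dy) onto the base pixel."""
    dx, dy = base
    for region, weight in zip(regions, weights):
        # Negative weights select wrinkle_map_a, positive select wrinkle_map_b.
        src = wrinkle_a if weight < 0.0 else wrinkle_b
        amount = region * abs(weight)
        dx += src[0] * amount
        dy += src[1] * amount
    return dx, dy

result = composite_pixel(base=(0.1, 0.0),
                         wrinkle_a=(1.0, 0.0), wrinkle_b=(0.0, 1.0),
                         regions=[1.0, 0.5], weights=[-0.5, 1.0])
```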
wrinkles – contd.
cost - morphing
• sizes:
– base_positions 6 bytes
– base_normals 4 bytes
– base_tangents 4 bytes
– deltas_positions_8 3 bytes
– deltas_positions_16 6 bytes
– deltas_normals 3 bytes
• 1000 verts, 1 target: 23k (16-bit deltas), 19k (8-bit deltas)
• 1000 verts, 5 targets: 58k (16-bit deltas), 43k (8-bit deltas)
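A back-of-envelope sketch of how the totals follow from the per-vert sizes, assuming every vert carries a position and normal delta in every target and that tangents have no deltas (none are listed). The results land within roughly 1k of the quoted figures, which are presumably rounded.

```python
BASE_BYTES = 6 + 4 + 4          # base position + normal + tangent
NORMAL_DELTA_BYTES = 3

def morph_bytes(verts, targets, position_delta_bytes):
    """Bytes for base data plus per-target deltas (3 = 8-bit, 6 = 16-bit)."""
    per_target = position_delta_bytes + NORMAL_DELTA_BYTES
    return verts * (BASE_BYTES + targets * per_target)

one_target_16 = morph_bytes(1000, 1, 6)
one_target_8 = morph_bytes(1000, 1, 3)
five_targets_16 = morph_bytes(1000, 5, 6)
five_targets_8 = morph_bytes(1000, 5, 3)
```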
cost contd. - wrinkles
• input:
– (up to) two wrinkle maps
  • dxt5: 512 x 512: ~340k (inc. mips) each
– (up to) two region maps
  • dxt5: 512 x 512: ~340k (inc. mips) each
• output:
– composited normal map
  • r8g8: 512 x 512: ~680k (inc. mips)
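The texture sizes can be reproduced by summing a full mip chain. This sketch assumes power-of-two textures with mips down to 1 x 1, dxt5 at 1 byte per pixel in 4 x 4 blocks, and the raw two-channel output at 2 bytes per pixel.

```python
def mip_chain_bytes(width, height, bytes_per_pixel, block=1):
    """Total bytes for a full mip chain down to 1x1 (power-of-two sizes)."""
    total = 0
    while True:
        # Block-compressed formats store whole blocks, so tiny mips are
        # padded up to one block (4x4 for dxt5).
        total += max(width, block) * max(height, block) * bytes_per_pixel
        if width == 1 and height == 1:
            break
        width, height = max(1, width // 2), max(1, height // 2)
    return total

dxt5_bytes = mip_chain_bytes(512, 512, 1, block=4)   # about 341k
raw_bytes = mip_chain_bytes(512, 512, 2)             # about 683k
```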
issues
• only moving geometry – not bsphere /
collision (can’t move too much)
• only single anim clip – extend to parse full
blend-tree and cross-blend anims
extensions
• cross-blend textures
– human → alien - verts morph, skin blends to
scales (base and normal map)
• improve delta compression format
• improve anim format
• better to combine wrinkles on rsx ?
end!
• demo if I can get it running…
• questions ?