dynamic scenes paul arthur navrátil parallelismjustisn’tenough

Dynamic Scenes

Paul Arthur Navrátil

Parallelism Just Isn’t Enough

Outline

• Toward Rapid Reconstruction for Animated Ray TracingLext and Akenine-Möller, Eurographics 2001

– Lessons for parallel implementations?

• Parallel Tree Building on a Range of Shared Address Space Multiprocessors: Algorithms and Application PerformanceShan and Singh, Proc. IPPS/SPDP 1998

– Application to Ray Tracing acceleration structures?

Animated Ray Tracing: Motivation[Lext and Akenine-Möller, 2001]

• Two competing goals in graphics processing– Generate photo-realistic images

– Render at real-time rates ( > 20 fps )

• Can Ray Tracing give us both?– Parallelizing RT and frameless rendering help– Latest efforts yielding interactive rates for static scenes

(e.g., [Wald et al., 01])

– Dynamic scenes still too computationally intensive

Why doesn’t parallelism solve the problem?

• Data Structure overhead– Reconstructing the

acceleration data structures has worse complexity and less obvious parallelism

– In a dynamic scene, all changed objects need to be updated in the acceleration structure

– Traditionally, this means rebuild it!

Previous Work

• Special-case animated objects [Parker et al., 1999]

– Objects outside the acceleration structure not scalable• Reuse frame information to save render time

[Adelson and Hodges, 1995]

– Performance improved 92%, but only for scenes with eye movement

• Use lazy evaluation to prune acceleration structure[McNeill et al., 1992]

– Evaluates only the structure that is actually used– Only tested on static scenes

Insight: Only Part of the Scene is Dynamic!

• Distinguish between static and dynamic objects in acceleration structure

• Dynamic objects exhibit spatial locality• Update transform matrices for each scene node,

transform rays before calculating intersection

[Wald et al., 2002]

Dynamic versus Static Parts

Idea: Be Lazy!

• If modifying the scene graph fails to provide significant speedup (or even if it does) use lazy evaluation of the acceleration structure– Evaluate a subsection only when a ray enters the voxel

• Adapt acceleration structure according to use– Simplify or eliminate the structure if usage is low– Do this at runtime, based on some feedback measure?

• Neither of these ideas were in the tested system,but it can be extended to include them

Data Structure: Hierarchical Bounding Boxes

• Surround each set of primitives with a minimum area Oriented Bounding Box (OBB)– Set defined as primitives to which one transform is

applied (static or dynamic)

– Put a recursive grid in each OBB• Encapsulate all top-level dynamic OBBs in a

special OBB-grid– These are recalculated every frame due to the

movement of the contained grids

Algorithm Execution

• Create OBB grids– One grid for static objects, the rest contain all dynamic

objects

• Update OBB grids– Transform to root node CS, then to original node CS

(previous frame?) Recurse if this contains subgrids

– Apply incremental transformations to primitives

– Create new OBB around subgrids (and primitives?)

– Transform to new OBB CS

A Benchmark for Animated Ray Tracing (BART) [Lext et al., 2001]

BART Robots Benchmark video

http://www.ce.chalmers.se/BART/

Results: A Silver Lining

Tree Building Methods: Motivation

• Use tree structures to organize work solving N-body problems– Classic example: find positions of N bodies attracted by gravity

after a period of time– Graphics corollary? Find position of dynamic objects in given

frame

• Studies 5 strategies across 4 systems– Physically distributed memory (SGI Origin 2000)– Bus-based shared memory MP (SGI Challenge)– Shared virtual memory in software (Intel Paragon)– Configurable memory in software with hardware assistance

(Wisconsin Typhoon-zero)

Strategy Characteristics

• ORIG– Global octree built by processors loading objects into a single

shared tree.

– Split cells into 8 subcells when objects within cell exceed threshold

– Processor operates on cells it ‘created’

• ORIG-LOCAL– Optimized version of ORIG

– Uses different data structures for internal nodes than leaves

– Processor allocates and manages its own cell and leaf arrays

– Thus cells can be kept in contiguous memory


• UPDATE– Insight: distributions evolve slowly over time

(in animation too?)

– Objects that move out of the cell in which they’re placed (think: location in the scene) is small

– Update only as much as the tree as is necessary

– Leverage tree hierarchy to find new cell (if cells arranged according to scene space)

Strategy Characteristics• PARTREE

– Insight: in previous algorithms, a lock is needed to ensure mutual exclusivity on the single global tree

– Causes synchronization overhead, contention, and remote access overhead for root and high inner nodes of tree

– Solution: each processor creates a local tree, populates it, then merges tree into global tree

– Uses ‘tree template’ to simplify merge– Global inserts and synchronizations reduced– Redundant work minimized if spatial locality used to

distribute objects to processors


• SPACE– Divide up space among processors, rather than objects

(Pharr?)

– Each process loads objects that are in its spaceIdeally space units (voxels) map to tree cells

– No need for locking, but high potential for load imbalances if number of objects per space is unbound

– Can lose data locality since processors don’t necessarily compute on the objects they put in the tree (true for graphics?)

– No locking during global tree assembly, because only one processor has the cells for a given subtree

dynamic scenes paul arthur navrátil parallelismjustisn’tenough

Documents

intensive slide

static parts slide

static objects

acceleration structure

new obb cs slide

static scenes

silver lining slide

animated ray tracing