dynamic scenes paul arthur navrátil parallelismjustisn’tenough
TRANSCRIPT
Dynamic Scenes
Paul Arthur Navrátil
Parallelism Just Isn’t Enough
Outline
• Toward Rapid Reconstruction for Animated Ray TracingLext and Akenine-Möller, Eurographics 2001
– Lessons for parallel implementations?
• Parallel Tree Building on a Range of Shared Address Space Multiprocessors: Algorithms and Application PerformanceShan and Singh, Proc. IPPS/SPDP 1998
– Application to Ray Tracing acceleration structures?
Animated Ray Tracing: Motivation[Lext and Akenine-Möller, 2001]
• Two competing goals in graphics processing– Generate photo-realistic images
– Render at real-time rates ( > 20 fps )
• Can Ray Tracing give us both?– Parallelizing RT and frameless rendering help– Latest efforts yielding interactive rates for static scenes
(e.g., [Wald et al., 01])
– Dynamic scenes still too computationally intensive
Why doesn’t parallelism solve the problem?
• Data Structure overhead– Reconstructing the
acceleration data structures has worse complexity and less obvious parallelism
– In a dynamic scene, all changed objects need to be updated in the acceleration structure
– Traditionally, this means rebuild it!
Previous Work
• Special-case animated objects [Parker et al., 1999]
– Objects outside the acceleration structure not scalable• Reuse frame information to save render time
[Adelson and Hodges, 1995]
– Performance improved 92%, but only for scenes with eye movement
• Use lazy evaluation to prune acceleration structure[McNeill et al., 1992]
– Evaluates only the structure that is actually used– Only tested on static scenes
Insight: Only Part of the Scene is Dynamic!
• Distinguish between static and dynamic objects in acceleration structure
• Dynamic objects exhibit spatial locality• Update transform matrices for each scene node,
transform rays before calculating intersection
[Wald et al., 2002]
Dynamic versus Static Parts
Idea: Be Lazy!
• If modifying the scene graph fails to provide significant speedup (or even if it does) use lazy evaluation of the acceleration structure– Evaluate a subsection only when a ray enters the voxel
• Adapt acceleration structure according to use– Simplify or eliminate the structure if usage is low– Do this at runtime, based on some feedback measure?
• Neither of these ideas were in the tested system,but it can be extended to include them
Data Structure: Hierarchical Bounding Boxes
• Surround each set of primitives with a minimum area Oriented Bounding Box (OBB)– Set defined as primitives to which one transform is
applied (static or dynamic)
– Put a recursive grid in each OBB• Encapsulate all top-level dynamic OBBs in a
special OBB-grid– These are recalculated every frame due to the
movement of the contained grids
Algorithm Execution
• Create OBB grids– One grid for static objects, the rest contain all dynamic
objects
• Update OBB grids– Transform to root node CS, then to original node CS
(previous frame?) Recurse if this contains subgrids
– Apply incremental transformations to primitives
– Create new OBB around subgrids (and primitives?)
– Transform to new OBB CS
A Benchmark for Animated Ray Tracing (BART) [Lext et al., 2001]
BART Robots Benchmark video
http://www.ce.chalmers.se/BART/
Results: A Silver Lining
Tree Building Methods: Motivation
• Use tree structures to organize work solving N-body problems– Classic example: find positions of N bodies attracted by gravity
after a period of time– Graphics corollary? Find position of dynamic objects in given
frame
• Studies 5 strategies across 4 systems– Physically distributed memory (SGI Origin 2000)– Bus-based shared memory MP (SGI Challenge)– Shared virtual memory in software (Intel Paragon)– Configurable memory in software with hardware assistance
(Wisconsin Typhoon-zero)
Strategy Characteristics
• ORIG– Global octree built by processors loading objects into a single
shared tree.
– Split cells into 8 subcells when objects within cell exceed threshold
– Processor operates on cells it ‘created’
• ORIG-LOCAL– Optimized version of ORIG
– Uses different data structures for internal nodes than leaves
– Processor allocates and manages its own cell and leaf arrays
– Thus cells can be kept in contiguous memory
Strategy Characteristics
• UPDATE– Insight: distributions evolve slowly over time
(in animation too?)
– Objects that move out of the cell in which they’re placed (think: location in the scene) is small
– Update only as much as the tree as is necessary
– Leverage tree hierarchy to find new cell (if cells arranged according to scene space)
Strategy Characteristics• PARTREE
– Insight: in previous algorithms, a lock is needed to ensure mutual exclusivity on the single global tree
– Causes synchronization overhead, contention, and remote access overhead for root and high inner nodes of tree
– Solution: each processor creates a local tree, populates it, then merges tree into global tree
– Uses ‘tree template’ to simplify merge– Global inserts and synchronizations reduced– Redundant work minimized if spatial locality used to
distribute objects to processors
Strategy Characteristics
• SPACE– Divide up space among processors, rather than objects
(Pharr?)
– Each process loads objects that are in its spaceIdeally space units (voxels) map to tree cells
– No need for locking, but high potential for load imbalances if number of objects per space is unbound
– Can lose data locality since processors don’t necessarily compute on the objects they put in the tree (true for graphics?)
– No locking during global tree assembly, because only one processor has the cells for a given subtree