Memory Management and Parallelization
Paul Arthur Navrátil
The University of Texas at Austin
Overview
• Uniprocessor Coherent Ray Tracing (Pharr et al., 1997)
• Parallel Ray Tracing Summary (Chalmers et al., 2002)
• Demand-Driven Ray Tracing (Wald et al., 2001)
• Hybrid Scheduling (Reinhard et al., 1999)
Background: Reyes [Cook et al. 87]
• Inspirations
– Texture cache, CATs
– Programmable shader
– Single primitive type
– Dicing
– Memory effects of scan-line architecture
Pharr: System
• Use both a texture cache and a geometry cache
– Lazy loading, LRU replacement
• One internal primitive: triangles
– Optimized ray intersection calculation
– Known space requirements for representation
– Tessellating other primitives increases space requirements
– Procedurally generated geometry
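Lazy loading with LRU replacement can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from Pharr et al.: the `GeometryCache` class and its `loader` callback are hypothetical, and capacity is counted in triangles as a stand-in for bytes of memory.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

Triangle = Tuple[Tuple[float, float, float], ...]  # three vertices


class GeometryCache:
    """Hypothetical per-voxel geometry cache with lazy loading and LRU replacement.
    Capacity is counted in triangles as a stand-in for bytes of memory."""

    def __init__(self, capacity_tris: int, loader: Callable[[int], List[Triangle]]):
        self.capacity = capacity_tris
        self.loader = loader  # hypothetical: reads one voxel's triangles from disk
        self.resident: "OrderedDict[int, List[Triangle]]" = OrderedDict()
        self.used = 0
        self.misses = 0

    def get(self, voxel_id: int) -> List[Triangle]:
        if voxel_id in self.resident:
            self.resident.move_to_end(voxel_id)  # hit: now most recently used
            return self.resident[voxel_id]
        # Miss: geometry is loaded lazily, only when a ray first needs this voxel.
        self.misses += 1
        tris = self.loader(voxel_id)
        # Evict least recently used voxels until the new voxel fits
        # (a voxel larger than the whole cache is still admitted, alone).
        while self.resident and self.used + len(tris) > self.capacity:
            _, evicted = self.resident.popitem(last=False)
            self.used -= len(evicted)
        self.resident[voxel_id] = tris
        self.used += len(tris)
        return tris


def dummy_loader(voxel_id: int) -> List[Triangle]:
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    return [tri, tri]  # every voxel holds two triangles in this toy example


cache = GeometryCache(capacity_tris=4, loader=dummy_loader)
for v in [0, 1, 0, 2, 1]:
    cache.get(v)
print(cache.misses, list(cache.resident))  # 4 [2, 1]
```

Because voxel 0 is touched again before voxel 2 is loaded, voxel 1 is the one evicted first; the final access to voxel 1 then misses and evicts voxel 0.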
Pharr: Geometry Cache
• Geometry grids: a regular grid of voxels
– A few thousand triangles per voxel
– Acceleration grid of a few hundred triangles for ray intersection calculation
– All geometry of a voxel is stored in a contiguous block of memory, independent of the geometry in other voxels; spatial locality in the scene is tied to spatial locality in memory
– Different voxel sizes cause memory fragmentation
– Adaptive voxel sizes?
– Voxel size bounded by cache size for a hardware implementation?
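Storing each voxel's geometry in one contiguous block can be sketched as a single flat array plus per-voxel offsets. This is a hedged illustration, not Pharr et al.'s data layout: `voxel_of` and `pack_by_voxel` are hypothetical names, and each triangle is assigned to a voxel by its centroid only, whereas a real system must also handle triangles that straddle several voxels.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def voxel_of(p: Vec3, lo: Vec3, hi: Vec3, res: int) -> int:
    """Flattened index of the voxel containing p in a res x res x res grid over [lo, hi]."""
    idx = []
    for a in range(3):
        t = (p[a] - lo[a]) / (hi[a] - lo[a])
        # Clamp so points on the upper boundary fall into the last voxel.
        idx.append(min(max(int(t * res), 0), res - 1))
    x, y, z = idx
    return (z * res + y) * res + x


def pack_by_voxel(tris, lo: Vec3, hi: Vec3, res: int):
    """Return (flat, offsets): the triangles of voxel v occupy the contiguous
    slice flat[offsets[v]:offsets[v + 1]]."""
    buckets: List[list] = [[] for _ in range(res ** 3)]
    for tri in tris:
        # Simplification: bin each triangle by its centroid only.
        c = tuple(sum(vert[a] for vert in tri) / 3.0 for a in range(3))
        buckets[voxel_of(c, lo, hi, res)].append(tri)
    flat: list = []
    offsets = [0]
    for b in buckets:
        flat.extend(b)
        offsets.append(len(flat))
    return flat, offsets


tri_a = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
tri_b = ((1.2, 1.2, 1.2), (1.8, 1.2, 1.5), (1.5, 1.8, 1.8))
flat, offsets = pack_by_voxel([tri_a, tri_b], (0.0, 0.0, 0.0), (2.0, 2.0, 2.0), 2)
print(offsets)  # [0, 1, 1, 1, 1, 1, 1, 1, 2]
```

Loading or evicting a voxel then touches one slice of memory, which is what ties scene locality to memory locality.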
Pharr: Ray Grouping
• Scheduling grid: queue all rays inside each voxel
– Dependencies in the ray tree prevent perfect scheduling
– Store all information needed for the computation with the ray, so each ray can be computed independently (parallelism!)
– Exploits coherence both from beams of rays and from disparate rays that move through the same space
– Superior to fixed-order traversal of the ray tree and to ray clustering
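A toy scheduling grid is sketched below under heavy simplifying assumptions: rays move only in +x through a single row of voxels, and any voxel listed in `walls` stops every ray that enters it, standing in for real ray-triangle intersection. `QueuedRay` and `trace_scheduled` are hypothetical names. What the sketch does show is the structure: rays are queued per voxel, each ray carries its own state, and all rays in a voxel are processed as one batch.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Set


@dataclass
class QueuedRay:
    # Everything needed to finish this ray travels with it, so no ray-tree
    # stack is kept and rays can be processed in any order.
    pixel: int
    voxel: int  # voxel whose queue currently holds the ray
    weight: float = 1.0


def trace_scheduled(rays: List[QueuedRay], walls: Set[int], n_voxels: int) -> Dict[int, int]:
    """Toy 1-D scheduling grid. Returns a map pixel -> voxel where its ray stopped."""
    queues: Dict[int, Deque[QueuedRay]] = {v: deque() for v in range(n_voxels)}
    for r in rays:
        queues[r.voxel].append(r)
    hits: Dict[int, int] = {}
    while any(queues.values()):
        # Naive scheduling: take the first non-empty voxel.
        v = next(k for k, q in queues.items() if q)
        q = queues[v]
        # All queued rays are handled together, so this voxel's geometry
        # only needs to be resident once for the whole batch.
        while q:
            r = q.popleft()
            if v in walls:
                hits[r.pixel] = v
            elif v + 1 < n_voxels:
                r.voxel = v + 1
                queues[v + 1].append(r)  # forward to the next voxel's queue
            # otherwise the ray leaves the scene without a hit
    return hits


rays = [QueuedRay(0, 0), QueuedRay(1, 0), QueuedRay(2, 2), QueuedRay(3, 3)]
hits = trace_scheduled(rays, walls={2}, n_voxels=5)
print(hits)
```

Voxel 2 is processed once with three rays, two of which started elsewhere: this is the "disparate rays through the same space" coherence the slide describes.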
Pharr: Radiance Calculation
• Outgoing radiance is emitted radiance plus a weighted average of incoming radiances
• f_r is the bidirectional reflectance distribution function (BRDF)
• At an intersection, a weight is calculated for each spawned secondary ray
• The final weight is the product of the BRDF values of all surfaces on the path from the point on the ray to the image plane
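The slide's equation did not survive transcription; the standard form of the relation it describes is the rendering equation L_o(x, ω_o) = L_e(x, ω_o) + ∫_Ω f_r(x, ω_i, ω_o) L_i(x, ω_i) cos θ_i dω_i. The Python sketch below shows why carrying the accumulated weight on each ray lets contributions reach the image in any order. `WeightedRay`, `spawn_secondary`, and `deposit` are hypothetical, and the scalar `brdf_factor` is an assumption that folds the BRDF, cosine, and sampling terms into one number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WeightedRay:
    pixel: int
    weight: float  # product of BRDF factors along the path back to the image plane


def spawn_secondary(parent: WeightedRay, brdf_factor: float) -> WeightedRay:
    # The child multiplies in one more surface's factor, so after n bounces
    # its weight is the product of all n factors.
    return WeightedRay(parent.pixel, parent.weight * brdf_factor)


def deposit(image: List[float], ray: WeightedRay, radiance: float) -> None:
    # The weight lives on the ray, so its contribution can be added to the
    # pixel immediately, in whatever order the scheduler processes rays.
    image[ray.pixel] += ray.weight * radiance


camera = WeightedRay(pixel=0, weight=1.0)
a = spawn_secondary(camera, 0.5)
b = spawn_secondary(camera, 0.25)
image = [0.0]
deposit(image, camera, 0.5)  # emitted radiance at the first hit
deposit(image, b, 4.0)       # secondary contributions may arrive out of order
deposit(image, a, 2.0)
print(image[0])  # 2.5
```

No parent ray waits for its children to return, which is what frees the scheduler from ray-tree order.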
Pharr: Voxel Scheduling
• Naïve: iterate across voxels
• Better: weight voxels by cost and benefit
– Cost: how expensive is it to process the rays in the voxel?
  • More geometry in the voxel means higher cost
  • More of the voxel's geometry missing from memory means higher cost
– Benefit: how much progress toward completion does processing the voxel make?
  • More rays in the voxel yield more benefit
  • Larger weights on the rays yield more benefit
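A benefit/cost voxel chooser can be sketched as follows. The scoring formula is a hypothetical stand-in, not the one from Pharr et al.: cost is the voxel's triangle count plus a `load_penalty` (an assumed constant) times the triangles not yet in memory, and benefit is the sum of queued ray weights, which grows with both the number of rays and their weights.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VoxelStats:
    voxel_id: int
    n_triangles: int          # geometry in the voxel
    n_resident: int           # how many of those triangles are already in memory
    ray_weights: List[float]  # weights of the rays queued in the voxel


def priority(s: VoxelStats, load_penalty: float = 10.0) -> float:
    """Hypothetical benefit/cost score."""
    if not s.ray_weights:
        return 0.0  # nothing queued: no progress possible
    # Cost grows with geometry, and faster for geometry that must be loaded.
    cost = s.n_triangles + load_penalty * (s.n_triangles - s.n_resident)
    # Benefit grows with the number of rays and with their weights.
    benefit = sum(s.ray_weights)
    return benefit / max(cost, 1)


def pick_voxel(stats: Sequence[VoxelStats]) -> int:
    return max(stats, key=priority).voxel_id


stats = [
    VoxelStats(0, n_triangles=1000, n_resident=1000, ray_weights=[1.0] * 10),
    VoxelStats(1, n_triangles=1000, n_resident=0, ray_weights=[1.0] * 20),
    VoxelStats(2, n_triangles=200, n_resident=200, ray_weights=[0.5] * 6),
]
print(pick_voxel(stats))  # 2
```

Voxel 1 has the most rays, but its geometry is entirely on disk, so the small resident voxel 2 wins; the naïve scheduler would ignore that trade-off.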
Pharr: System Summary
Pharr: Lazy Loading Results
Pharr: Reordering Results
Pharr: Scheduling Results
Pharr: Discussion
• Parallelization
– Ray independence, load-balanced geometry, and lazy geometry loading help
– Will the cache results hold in a distributed model?
• Modern architecture
– Tested on a 190 MHz MIPS R10000 with 1 GB RAM
– Can modern architectures hold whole scenes in memory (no secondary storage use)?
• Hardware acceleration
– Use memory/cache/GPU rather than disk/memory/CPU
Chalmers: Parallel Ray Tracing
• Demand driven
– Scene divided into subregions, or tasks
– Processors given tasks statically or by a master
– Balance with task balancing or adaptive regions [Fig 3.4]
• Data parallel
– Object data distributed across processors
– Distribute objects according to spatial locality, by hierarchical spatial subdivision, or randomly [Fig 3.7]
• Hybrid scheduling
– Run demand-driven and data-parallel tasks on the same processors
– DD ray traversal / DP ray-object intersection [Scherson and Caspary 88]
– DD intersection / DP ray generation [Jevans 89]
– Ray coherence [Reinhard and Jansen 99]
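Demand-driven task farming with dynamic load balancing can be sketched with Python's standard `queue` and `threading` modules. This is an illustration under assumptions: threads in one process stand in for processors (a distributed renderer would use message passing), the shared queue plays the master's role, and `make_tiles` and the `render_tile` callback are hypothetical. Python's global interpreter lock means the sketch shows the scheduling pattern, not a speedup.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

Tile = Tuple[int, int, int, int]  # x0, y0, x1, y1 in pixels


def make_tiles(width: int, height: int, tile: int) -> List[Tile]:
    """Divide the image into rectangular tasks (edge tiles may be smaller)."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in range(0, height, tile) for x in range(0, width, tile)]


def demand_driven(tiles: List[Tile], n_workers: int,
                  render_tile: Callable[[Tile], int]) -> Dict[Tile, int]:
    tasks: "queue.Queue[Tile]" = queue.Queue()
    for t in tiles:
        tasks.put(t)
    results: Dict[Tile, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                t = tasks.get_nowait()  # ask the "master" for the next task
            except queue.Empty:
                return  # no work left: this worker finishes
            r = render_tile(t)
            with lock:
                results[t] = r

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


tiles = make_tiles(100, 60, 32)
# Stand-in renderer: returns the tile's pixel count instead of colors.
results = demand_driven(tiles, 4, lambda t: (t[2] - t[0]) * (t[3] - t[1]))
print(len(results), sum(results.values()))  # 8 6000
```

Workers that draw cheap tiles simply come back for more, which is the load-balancing property of the demand-driven model; the data-parallel model would instead fix which processor owns which objects.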
Wald: Demand Driven Ray Tracing [Wald et al. 01]
• Exploit cache and spatial coherence on modern processors (dual Pentium III 800 MHz, 256 MB)
• Use the SIMD instruction set to achieve data parallelism (e.g., the barycentric coordinate test)
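The structure-of-arrays layout behind a SIMD barycentric test can be sketched in plain Python, with each list comprehension standing in for one 4-wide SSE operation. This is a hedged illustration: it uses the Möller–Trumbore formulation of the barycentric test, which may differ from the exact test in Wald et al., and the four-ray packet, `LANES`, and `intersect_packet` are assumptions for illustration. Python does not actually execute the lanes in parallel.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
LANES = 4  # one SSE register holds four 32-bit floats


def intersect_packet(o: Sequence[Sequence[float]], d: Sequence[Sequence[float]],
                     v0: Vec3, v1: Vec3, v2: Vec3, eps: float = 1e-9) -> List[float]:
    """Test one triangle against LANES rays stored as structure-of-arrays
    (o[axis][lane], d[axis][lane]). Returns the hit distance per lane, or inf."""
    L = range(LANES)
    e1 = [v1[a] - v0[a] for a in range(3)]
    e2 = [v2[a] - v0[a] for a in range(3)]
    # p = d x e2, computed for all lanes at once
    px = [d[1][i] * e2[2] - d[2][i] * e2[1] for i in L]
    py = [d[2][i] * e2[0] - d[0][i] * e2[2] for i in L]
    pz = [d[0][i] * e2[1] - d[1][i] * e2[0] for i in L]
    det = [e1[0] * px[i] + e1[1] * py[i] + e1[2] * pz[i] for i in L]
    # s = o - v0
    sx = [o[0][i] - v0[0] for i in L]
    sy = [o[1][i] - v0[1] for i in L]
    sz = [o[2][i] - v0[2] for i in L]
    # q = s x e1
    qx = [sy[i] * e1[2] - sz[i] * e1[1] for i in L]
    qy = [sz[i] * e1[0] - sx[i] * e1[2] for i in L]
    qz = [sx[i] * e1[1] - sy[i] * e1[0] for i in L]
    out = []
    for i in L:  # final masking; SIMD code would blend with a comparison mask
        if abs(det[i]) < eps:
            out.append(math.inf)  # ray parallel to the triangle's plane
            continue
        inv = 1.0 / det[i]
        u = (sx[i] * px[i] + sy[i] * py[i] + sz[i] * pz[i]) * inv  # barycentric u
        v = (d[0][i] * qx[i] + d[1][i] * qy[i] + d[2][i] * qz[i]) * inv  # barycentric v
        t = (e2[0] * qx[i] + e2[1] * qy[i] + e2[2] * qz[i]) * inv  # distance along ray
        hit = u >= 0.0 and v >= 0.0 and u + v <= 1.0 and t > eps
        out.append(t if hit else math.inf)
    return out


# Four rays pointing down -z at the triangle (0,0,0), (1,0,0), (0,1,0).
o = [[0.2, 0.9, 0.1, -0.1], [0.2, 0.9, 0.5, 0.1], [1.0, 1.0, 2.0, 1.0]]
d = [[0.0] * 4, [0.0] * 4, [-1.0] * 4]
result = intersect_packet(o, d, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(result)  # [1.0, inf, 2.0, inf]
```

Lane 1 misses because u + v exceeds 1, and lane 3 misses because u is negative; the triangle's edges are computed once and shared by all four lanes.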
Wald: Performance [Wald et al. 01]
Reinhard: Hybrid Scheduling [Reinhard et al. 99]
• Data-parallel approach with demand-driven subtasks for load balancing
– Data-parallel tasks are preferred; DD subtasks are requested from the master when no DP tasks are available
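The preference rule can be simulated for a single processor in a few lines. This is a hedged sketch, not Reinhard et al.'s scheduler: DP tasks (for example, rays forwarded by other processors) arrive at given time steps, the master's pool of DD subtasks is modeled as a local deque, one task runs per step, and `hybrid_schedule` and the task names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def hybrid_schedule(dp_arrivals: Dict[int, List[str]], dd_pool: Deque[str],
                    steps: int) -> List[Tuple[int, str, str]]:
    """Run one task per step, preferring ready DP tasks; request a DD
    subtask from the master only when no DP task is ready."""
    ready: Deque[str] = deque()
    log: List[Tuple[int, str, str]] = []
    for step in range(steps):
        ready.extend(dp_arrivals.get(step, []))  # DP work arriving this step
        if ready:
            log.append((step, "DP", ready.popleft()))
        elif dd_pool:
            log.append((step, "DD", dd_pool.popleft()))  # fill the idle slot
        # otherwise the processor idles this step
    return log


log = hybrid_schedule({0: ["r1"], 3: ["r2", "r3"]},
                      deque(["tileA", "tileB", "tileC"]), steps=6)
for entry in log:
    print(entry)
```

DD subtasks fill exactly the steps in which no data-parallel work is ready, so the processor never idles while the master still has work.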
Reinhard: Hybrid Scheduling [Reinhard et al. 99]
Reinhard: Performance [Reinhard et al. 99]