Masked Software Occlusion Culling
Magnus Andersson
Occlusion Culling
Stanford Bunny in the Crytek Sponza Atrium
[Figure sequence; labels: Eye, View frustum, Fully occluded, Partially occluded]
Hardware Fixed-function Occlusion Culling
Handled automatically under the hood
Per-tile culling granularity
– Semi-occluded triangles can be partially culled
Very late in the pipeline
[Pipeline figure; labels: CPU side, GPU side, Game logic, Upload frame data, Draw call, Geometry processing, Z Tile Culling, Pixel processing]
Software Occlusion Culling
Cull very early in the pipeline
– Cull both CPU and GPU work
Short delay
– Can be integrated with scene traversal
[Pipeline figure; labels: CPU side, GPU side, Game logic + SW culling, Upload frame data, Draw call, Geometry processing, Z Tile Culling, Pixel processing]
Potentially Visible Sets (PVS)
Binary Space Partitioning (BSP) trees & portals
Precomputed – very efficient
Scene (occluders) must be static
Difficult to handle general scenes
Quake II, id Software, 1997
Half-Life 2, Valve Corporation, 2004
[Figure; labels: Player, Not part of PVS, Leaf boundaries]
Dynamic Occlusion Culling
Increasingly popular
Modern games have more complex and dynamic worlds
No complex pre-computation
– Simpler content pipeline
Assassin’s Creed Unity, Ubisoft, 2014 [HA15]
Battlefield 4, EA DICE, 2013 [Col11]
Z-buffer Based Culling
Hierarchical Z Buffer (HiZ) [Greene93]
Rasterize to full resolution z buffer
Create HiZ buffer
– Find the maximum depth in each NxN tile
Perform occlusion query with HiZ buffer
General algorithm works for both SW and HW occlusion culling
[Figure; labels: Full resolution depth buffer, HiZ buffer, Complex object, Bounding shape. Dragon model courtesy of Stanford University Computer Graphics Laboratory]
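The HiZ construction and query described on this slide can be sketched in a few lines of C++. This is a minimal illustration, not the code of any of the cited systems: `HiZBuffer`, `buildHiZ` and `isOccluded` are hypothetical names, depth is assumed to be non-negative with larger values farther away, and the buffer size is assumed to be a multiple of the tile size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper type for illustration only.
struct HiZBuffer {
    int tilesX = 0, tilesY = 0, tileSize = 0;
    std::vector<float> zMax;  // one maximum depth per tile
};

// Build a one-level HiZ buffer: the maximum depth in each n x n tile.
// Assumes depth >= 0, larger = farther, and width/height divisible by n.
HiZBuffer buildHiZ(const std::vector<float>& depth, int width, int height, int n) {
    HiZBuffer hiz;
    hiz.tileSize = n;
    hiz.tilesX = width / n;
    hiz.tilesY = height / n;
    hiz.zMax.assign(static_cast<std::size_t>(hiz.tilesX) * hiz.tilesY, 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float& tileMax = hiz.zMax[(y / n) * hiz.tilesX + (x / n)];
            tileMax = std::max(tileMax, depth[y * width + x]);
        }
    }
    return hiz;
}

// Conservative query for a pixel rectangle [x0, x1] x [y0, y1] (inside the
// buffer) whose nearest depth is zMin: it is occluded only if zMin lies
// behind the maximum depth of every tile it overlaps.
bool isOccluded(const HiZBuffer& hiz, int x0, int y0, int x1, int y1, float zMin) {
    for (int ty = y0 / hiz.tileSize; ty <= y1 / hiz.tileSize; ++ty)
        for (int tx = x0 / hiz.tileSize; tx <= x1 / hiz.tileSize; ++tx)
            if (zMin <= hiz.zMax[ty * hiz.tilesX + tx])
                return false;  // may be visible in this tile
    return true;
}
```

The query only ever reads one value per tile, which is why the bounding shape of a complex object can be tested far more cheaply than the object itself.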
Intel Software Occlusion Culling Framework [CMK16]
Algorithm phases:
1. Rasterize a few designated occluder objects to z buffer
– Heavily SSE/AVX optimized
– Parallel triangle setup
– Parallel pixel depth computation
2. Compute 1-level HiZ buffer (and throw away z buffer)
3. Perform queries and render surviving objects
Z-buffer Based Culling
Rendering to z-buffer per pixel
Updating HiZ tile needs all pixels within the tile
Occlusion Query per tile
Wouldn’t it be nice to compute HiZ directly?
– Being conservative is the only requirement
Idea: use alternative HiZ representation
[Figure; labels: Full resolution depth buffer, HiZ buffer]
Alternative HiZ buffer representation
Masked Occlusion Culling for Graphics Hardware [AHAM15]
Two depth values per tile: zmax0 and zmax1
Per-pixel layer selection mask
[Figure: example 4x4 layer selection masks (rows separated by /): 0001/0011/0011/0111, 0000/0000/0001/0001, 1111/1111/1111/1111, 0001/0011/0001/0001]
Masked Occlusion Culling [AHAM15]
[Figure sequence; labels: Triangle meshes, Merge, Culled, Not culled]
Originally designed for graphics hardware
Directly update HiZ buffer without computing a full res z buffer
Decouples coverage sampling (rasterization) and depth computation
[Figure; labels: Depth buffer; Approximate, conservative HiZ buffer]
Masked Software Occlusion Culling
Could Masked Occlusion Culling [AHAM15] be really fast for software occlusion culling?
Much less memory to read/write than full res z-buffer
Updates use bitmasks – can process many pixels in parallel (e.g. with SSE/AVX)
No need to compute per-pixel depths
– Would need a fast SW rasterizer to compute coverage
Turns out it can
Paper presented at High Performance Graphics this year [HAAM16]
Source code available!
Single Instruction, Multiple Data (SIMD)
Addition
x86 (32 bits): A = 3, B = 5, A + B = 8
AVX (256 bits, eight 32-bit lanes):
A = 3 5 6 2 4 1 4 10
B = 5 7 3 5 5 11 4 5
A + B = 8 12 9 7 9 12 8 15
Bitwise AND
x86 (32 bits): A = 0xAC1DBA5E, B = 0x51CAFE37, A & B = 0x0008BA16
AVX (256 bits):
A = 0xAC1DBA5EAC1DBA5EAC1DBA5EAC1DBA5E51CAFE3751CAFE3751CAFE3751CAFE37
B = 0x51CAFE3751CAFE3751CAFE3751CAFE37AC1DBA5EAC1DBA5EAC1DBA5EAC1DBA5E
A & B = 0x0008BA160008BA160008BA160008BA160008BA160008BA160008BA160008BA16
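The two slide examples can be reproduced with a scalar emulation of an eight-lane register. This is only a portable sketch of what the hardware does in a single instruction; `Lanes8`, `addLanes` and `andLanes` are hypothetical names, and real code would use AVX2 intrinsics such as `_mm256_add_epi32` and `_mm256_and_si256` instead of loops.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Eight 32-bit lanes, mimicking one 256-bit AVX register (scalar emulation).
using Lanes8 = std::array<std::uint32_t, 8>;

// Lane-wise addition: every lane is processed independently.
Lanes8 addLanes(const Lanes8& a, const Lanes8& b) {
    Lanes8 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

// Lane-wise bitwise AND: 256 bits (for example 256 coverage bits) at once.
Lanes8 andLanes(const Lanes8& a, const Lanes8& b) {
    Lanes8 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] & b[i];
    return r;
}
```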
An abridged history of Intel’s SIMD instruction sets
[Timeline, 1998–2017]
SSE, 1999: 128b wide
SSE2, 2001
SSE4, 2006: Intel® microarchitecture code name Nehalem
AVX, 2011: 256b wide, 2nd Gen Intel® Core™ Processors
AVX2, 2013: 4th Gen Intel® Core™ Processors
AVX-512, 2016: 512b wide
[Timeline annotations: New algorithm target architecture; Supported in our library code; Easily extended to AVX-512]
Masked software occlusion culling
Algorithm Overview
Triangle setup (8-wide triangle setup):
– Transform
– Clip
– Compute bounds
– Depth plane
– Traversal setup
Tile traversal (8 scanlines, 256 pixels: 8 tiles with 8x4 pixels):
– Compute coverage
– Depth test
– Update
Transform and Clip
Compute Bounding Box
Padded to 32x8 pixel supertiles
Compute Depth Plane
Depth = ax + by + c
– Conservative tile depth: Check sign of a and b
– Can be incrementally updated
– Clamp to vertex depths
[Figure; labels: sign combinations of (a, b): (-, -), (+, -), (-, +), (+, +); increments + a and + b]
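A possible reading of the depth plane slide, written out as C++: because Depth = ax + by + c is linear, its maximum over a rectangular tile sits at the corner selected by the signs of a and b. The names `DepthPlane`, `tileMaxDepth` and `conservativeTileMaxDepth` are made up for this sketch, and larger depth is assumed to mean farther away.

```cpp
#include <algorithm>

// Hypothetical depth plane: depth(x, y) = a * x + b * y + c.
struct DepthPlane {
    float a, b, c;
};

// Maximum of the plane over the tile [x0, x1] x [y0, y1].
// The plane is linear, so the maximum is at the corner chosen by the
// signs of a and b: no per-pixel evaluation is needed.
float tileMaxDepth(const DepthPlane& p, float x0, float y0, float x1, float y1) {
    const float x = (p.a >= 0.0f) ? x1 : x0;
    const float y = (p.b >= 0.0f) ? y1 : y0;
    return p.a * x + p.b * y + p.c;
}

// Tile corners can lie outside the triangle, where the plane overshoots the
// triangle's real depth range. Clamping to the largest vertex depth keeps the
// value an upper bound while tightening it.
float conservativeTileMaxDepth(const DepthPlane& p, float x0, float y0,
                               float x1, float y1, float maxVertexDepth) {
    return std::min(tileMaxDepth(p, x0, y0, x1, y1), maxVertexDepth);
}
```

Stepping one tile to the right adds a constant multiple of a to the result, and stepping one tile down adds a constant multiple of b, which is what makes the incremental update mentioned on the slide possible.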
Supertile Traversal Order
AVX Register Layout
One scanline per SIMD lane
Edge Slopes
Compute slopes (∆y/∆x) once
– Similar to regular scanline rasterizers
Compute Intersections
Compute intersections for each scanline
– Eight scanlines in parallel using AVX
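As a rough scalar sketch of this step: once the edge slope is known, the intersection with each of eight consecutive scanlines is one multiply-add per scanline, which maps to one AVX lane each. `edgeIntersections` is a hypothetical helper; it uses the inverse slope ∆x/∆y (x change per scanline) as a convenience of this sketch and assumes a non-horizontal edge with scanlines sampled at y = firstY + i.

```cpp
#include <array>
#include <cstddef>

// x coordinate where the edge (x0, y0)-(x1, y1) crosses each of eight
// consecutive scanlines. Assumes y0 != y1 (the edge is not horizontal).
std::array<float, 8> edgeIntersections(float x0, float y0, float x1, float y1,
                                       float firstY) {
    const float dxdy = (x1 - x0) / (y1 - y0);  // computed once per edge
    std::array<float, 8> xs{};
    for (std::size_t i = 0; i < xs.size(); ++i) {
        // One lane per scanline: the loop body is what AVX runs eight-wide.
        xs[i] = x0 + (firstY + static_cast<float>(i) - y0) * dxdy;
    }
    return xs;
}
```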
Compute Coverage Mask
Start with full coverage mask
– Shift each lane (scanline) to intersection
– AVX2 and later have per-lane shift instruction
Repeat the same process for the next edge
– Edge is facing right: invert mask
Combine masks of all overlapping edges
– Using bitwise AND
[Figure; labels: Intersections, Left edge, Right edge]
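The coverage steps above (full mask, per-lane shift, optional invert, bitwise AND) might look as follows in scalar C++. The bit layout (bit i = pixel column i) and the names `maskFrom`, `edgeMask` and `combine` are assumptions of this sketch; an AVX2 version would replace the loops with per-lane shift and AND instructions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lanes8 = std::array<std::uint32_t, 8>;  // one 32-pixel scanline per lane

// Full mask shifted to column k: bits for pixels at or right of k stay set.
std::uint32_t maskFrom(int k) {
    if (k <= 0) return 0xFFFFFFFFu;
    if (k >= 32) return 0u;  // shifting a 32-bit value by 32 is undefined in C++
    return 0xFFFFFFFFu << k;
}

// Coverage of a single edge for eight scanlines. 'invert' is true for an edge
// whose interior lies on its left side (the "facing right" case on the slide).
Lanes8 edgeMask(const std::array<int, 8>& intersections, bool invert) {
    Lanes8 m{};
    for (std::size_t i = 0; i < m.size(); ++i) {
        const std::uint32_t bits = maskFrom(intersections[i]);
        m[i] = invert ? ~bits : bits;
    }
    return m;
}

// A pixel is covered only if it is inside every edge: bitwise AND.
Lanes8 combine(const Lanes8& a, const Lanes8& b) {
    Lanes8 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] & b[i];
    return r;
}
```

For example, a left edge crossing every scanline at column 4 and a right edge at column 12 combine to the mask 0x00000FF0 in each lane.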
Shuffle Mask
Shuffle mask to form better shaped tiles
– Before: each SIMD lane is a scanline
– After: each SIMD lane is an 8x4 tile
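One way to picture the shuffle is as a byte transpose: a 32x8 pixel block stored as eight 32-bit scanlines becomes eight 32-bit tiles of 8x4 pixels. The tile numbering and the bit order inside a tile below are assumptions of this sketch, not the library's actual layout, and `scanlinesToTiles` is a hypothetical name; with AVX2 the same rearrangement can be done with a byte shuffle.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lanes8 = std::array<std::uint32_t, 8>;

// Input: lane y holds scanline y of a 32x8 pixel block (bit x = pixel column x).
// Output: lane t holds one 8x4 tile. Assumed numbering: tiles run left to
// right, upper row of tiles first; byte r of a lane holds row r of the tile.
Lanes8 scanlinesToTiles(const Lanes8& scan) {
    Lanes8 tiles{};
    for (std::size_t t = 0; t < 8; ++t) {
        const std::size_t tx = t % 4;  // 4 tiles across the 32 pixel columns
        const std::size_t ty = t / 4;  // 2 tiles down the 8 scanlines
        for (std::size_t r = 0; r < 4; ++r) {
            // Take the 8 pixels of this tile from scanline (ty * 4 + r).
            const std::uint32_t rowBits = (scan[ty * 4 + r] >> (8 * tx)) & 0xFFu;
            tiles[t] |= rowBits << (8 * r);
        }
    }
    return tiles;
}
```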
Depth Test
Interpolate conservative depth (per 8x4 tile)
Test against buffer
Update Tile
Two code paths (can be switched at compile time)
– Original update method [AHAM15]
– New update method tailored for SW [HAAM16]
Why use a new update method?
– Faster – same culling power
– Less accurate than original, more dependent on render order
– Works best if you render front-to-back
Update Tile, New Method [HAAM16]
zmax0 is the reference layer
– Maximum value for the entire tile
zmax1 is the working layer
– Maximum value for a subset of the tile
– Updated as
– New depth = max(zmax1, zmaxtri), where zmaxtri is the maximum depth of the incoming triangle in the tile
– New mask = TriangleMask OR LayerMask
Whenever working layer mask is full, overwrite reference layer
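The update rule on this slide can be sketched as below. This follows the slide's description only and is not the library's code: `MaskedTile` and `updateTile` are hypothetical names, the tile is assumed to hold 32 pixels (one 8x4 tile), depth is assumed non-negative with larger values farther away, and the caller is assumed to have already rejected triangles that lie behind the reference layer.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical per-tile state: two depth values plus a coverage mask.
struct MaskedTile {
    float zMax0;         // reference layer: maximum depth for the entire tile
    float zMax1;         // working layer: maximum depth for the masked pixels
    std::uint32_t mask;  // pixels currently covered by the working layer
};

// Merge one triangle (its maximum depth in the tile and its coverage mask)
// into the working layer, then promote the working layer when it is full.
void updateTile(MaskedTile& t, float zMaxTri, std::uint32_t triMask) {
    t.zMax1 = std::max(t.zMax1, zMaxTri);  // new depth = max(zmax1, zmaxtri)
    t.mask |= triMask;                     // new mask = TriangleMask OR LayerMask
    if (t.mask == 0xFFFFFFFFu) {
        // Every pixel is covered by something no farther than zMax1, so it
        // becomes the new reference depth and the working layer restarts.
        t.zMax0 = t.zMax1;
        t.zMax1 = 0.0f;
        t.mask = 0u;
    }
}
```

Two triangles that each cover half of the tile, at depths 0.5 and 0.75, leave the tile with a reference depth of 0.75 and an empty working layer.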
[Figure sequence: Update Tile]
Discard heuristic: If zmax1 – zmaxtri > zmax0 – zmax1, discard working layer (restart)
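The discard heuristic compares two distances: how far the incoming triangle is in front of the working layer, and how far the working layer is in front of the reference layer. A minimal sketch, with the hypothetical names `WorkingLayer` and `discardIfFar` and the same depth convention as the slides (larger = farther):

```cpp
#include <cstdint>

// Hypothetical working-layer state for one tile.
struct WorkingLayer {
    float zMax1;         // depth of the working layer
    std::uint32_t mask;  // pixels covered by the working layer
};

// If the triangle is much closer than the working layer, merging it would
// push its depth back to zMax1 and waste its occlusion potential. In that
// case the working layer is dropped and restarted from the new triangle.
bool discardIfFar(float zMax0, WorkingLayer& w, float zMaxTri) {
    const bool discard = (w.zMax1 - zMaxTri) > (zMax0 - w.zMax1);
    if (discard) {
        w.zMax1 = 0.0f;
        w.mask = 0u;
    }
    return discard;
}
```

Dropping the layer is still conservative: the discarded pixels simply fall back to the reference depth, which is never closer than the working depth was.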
Full overwrite: Restart from new value
Update is quicker than original [AHAM15]
Test is also quicker
– Need only to test against reference layer (zmax0)
Results
Intel Occlusion Culling Sample
Clear: Clearing the depth buffer
Geom: Transform & project geometry
Rast: Triangle setup & occluder rasterization
Gen: Compute HiZ buffer from full resolution z buffer
Test: Perform occlusion queries
[Chart: time in μs per phase, Old [CMK16] vs. New [HAAM16]; annotated speedups: 3.7x and 16x]
63
Performance comparison for camera animation
Results
First frame
Last frame
Old New Frustum only
![Page 64: Masked Software Occlusion Culling](https://reader031.vdocument.in/reader031/viewer/2022021418/588627861a28ab8f2c8b6369/html5/thumbnails/64.jpg)
Code is available as open-source
![Page 65: Masked Software Occlusion Culling](https://reader031.vdocument.in/reader031/viewer/2022021418/588627861a28ab8f2c8b6369/html5/thumbnails/65.jpg)
65
Masked Occlusion Culling API
void SetResolution();
void SetNearClipPlane();
void ClearBuffer();
static void TransformVertices();
Result RenderTriangles();
Result TestTriangles();
Result TestRect();
void ComputePixelDepthBuffer();
OcclusionCullingStatistics GetStatistics();
[Function groups shown on the slide: Setup, Render & query, Debug]
Masked Occlusion Culling API
Render to the software HiZ buffer
Result RenderTriangles(
    float *inVtx,          // Clip space vertex positions
    uint *inTris,          // Index array (Indices to inVtx buffer)
    int nTris,             // Triangle count (the number of index triplets in inTris)
    ClipPlanes mask,       // Mask for potential frustum bound overlap
    ScissorRect *scissor,  // Scissor region
    VertexLayout &layout   // Vertex format of inVtx. There is a fast-path for AoS with (x, y, z, w) coordinates
);
Masked Occlusion Culling API
ClipPlanes mask parameter of RenderTriangles
[Figure; labels: Eye, View frustum, Near plane, mask = 0, mask = leftPlane | nearPlane]
Clipping is not free...
– If you’re already doing frustum culling, let the API know the outcome
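To illustrate how a frustum test can produce such a mask, the sketch below flags every plane that at least one clip-space vertex lies outside of. The flag values in `PlaneBits` are invented for this example and are not the library's `ClipPlanes` constants; the clip volume is assumed to be -w <= x, y <= w with z >= 0 at the near plane.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical plane flags for this sketch only.
enum PlaneBits : std::uint32_t {
    kNear = 1u,
    kLeft = 2u,
    kRight = 4u,
    kBottom = 8u,
    kTop = 16u
};

struct Vec4 {
    float x, y, z, w;
};

// Returns a bit mask of the frustum planes the vertices may cross.
// A result of 0 means everything is inside and clipping can be skipped.
std::uint32_t planesToClip(const Vec4* v, std::size_t count) {
    std::uint32_t mask = 0u;
    for (std::size_t i = 0; i < count; ++i) {
        if (v[i].z < 0.0f) mask |= kNear;
        if (v[i].x < -v[i].w) mask |= kLeft;
        if (v[i].x > v[i].w) mask |= kRight;
        if (v[i].y < -v[i].w) mask |= kBottom;
        if (v[i].y > v[i].w) mask |= kTop;
    }
    return mask;
}
```

In practice the test would usually run on the eight corners of an object's bounding box rather than on every vertex, so the mask comes almost for free from the frustum culling pass.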
Masked Occlusion Culling API
ScissorRect *scissor parameter of RenderTriangles
[Figure; labels: Eye, View frustum]
Scissor region (screen space AABB)
Can be used for threading
– One scissor region per thread
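One simple way to derive a scissor region per thread is to cut the screen into vertical strips. The sketch below aligns strip borders to 32 pixels, matching the supertile width mentioned earlier in the deck, so that no two threads write to the same tile. `Rect` and `splitScreen` are hypothetical; the library's own `ScissorRect` type may be laid out differently.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical rectangle type; max coordinates are exclusive.
struct Rect {
    int minX, minY, maxX, maxY;
};

// Split the screen into up to 'threads' vertical strips whose borders fall
// on multiples of the 32-pixel tile width.
std::vector<Rect> splitScreen(int width, int height, int threads) {
    const int tile = 32;
    const int tilesX = (width + tile - 1) / tile;  // round up to whole tiles
    std::vector<Rect> out;
    for (int t = 0; t < threads; ++t) {
        const int x0 = std::min(width, (tilesX * t / threads) * tile);
        const int x1 = std::min(width, (tilesX * (t + 1) / threads) * tile);
        if (x0 < x1) out.push_back(Rect{x0, 0, x1, height});
    }
    return out;
}
```

Each thread would then render all occluders with its own strip as the scissor region. Strips can receive uneven amounts of work, so this is only a starting point for load balancing.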
Masked Occlusion Culling API
Test triangles against the software HiZ buffer
– Does not update the buffer
Result TestTriangles(      // Returns the collective culling outcome of the triangles
    float *inVtx,          // Clip space vertex positions
    uint *inTris,          // Index array (Indices to inVtx buffer)
    int nTris,             // Triangle count (the number of index triplets in inTris)
    ClipPlanes mask,       // Mask for potential frustum bound overlap
    ScissorRect *scissor,  // Scissor region
    VertexLayout &layout   // Vertex format of inVtx. There is a fast-path for AoS with (x, y, z, w) coordinates
);
Masked Occlusion Culling API
Test rectangle against the software HiZ buffer
– Does not update the buffer
Result TestRect(   // Returns the culling outcome of the screen space rectangle
    float xmin,    // Screen space bounds:
    float ymin,    // [xmin, ymin] – [xmax, ymax]
    float xmax,
    float ymax,
    float wmin     // Conservative clip space w (typically the w-component of the nearest bbox vertex in clip space)
);
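The inputs of such a rectangle test can be derived from the eight clip-space corners of a bounding box, roughly as sketched here. `RectQuery` and `boundsForTestRect` are hypothetical names, and the rectangle is expressed in normalized device coordinates after the perspective divide, which is an assumption of this sketch. Boxes that cross the camera plane (w <= 0) are reported as not testable and should be treated as visible.

```cpp
#include <algorithm>
#include <array>
#include <limits>

struct Vec4 {
    float x, y, z, w;
};

// Hypothetical result type: rectangle, conservative w, and a validity flag.
struct RectQuery {
    float xmin, ymin, xmax, ymax, wmin;
    bool valid;  // false if the box crosses the camera plane
};

// Project eight clip-space corners to a bounding rectangle plus the smallest
// w, which is the conservative (nearest) depth for the whole rectangle.
RectQuery boundsForTestRect(const std::array<Vec4, 8>& corners) {
    const float inf = std::numeric_limits<float>::max();
    RectQuery q{inf, inf, -inf, -inf, inf, true};
    for (const Vec4& c : corners) {
        if (c.w <= 0.0f) {
            q.valid = false;  // cannot project safely: treat as visible
            return q;
        }
        const float x = c.x / c.w;  // perspective divide
        const float y = c.y / c.w;
        q.xmin = std::min(q.xmin, x);
        q.ymin = std::min(q.ymin, y);
        q.xmax = std::max(q.xmax, x);
        q.ymax = std::max(q.ymax, y);
        q.wmin = std::min(q.wmin, c.w);
    }
    return q;
}
```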
Example use case: Scene Bounding Volume Hierarchy (BVH) traversal and culling
ClearBuffer();
prioQueue.push(root);
while (!prioQueue.empty()) {
    Node node = prioQueue.pop();
    if (FrustumTest(node) == Culled)
        continue;
    bounds = compute_screen_space_bounds(node);
    if (TestRect(bounds) == Culled)
        continue;  // node is occluded: skip its whole subtree
    if (node is InnerNode) {
        // children are queued with priority dist
        prioQueue.push(node.left, dist);
        prioQueue.push(node.right, dist);
    } else {  // node is Leaf
        xfVertices = TransformVertices(node.vertices);
        RenderTriangles(xfVertices);
        send_leaf_to_GPU();
    }
}
[Figure; labels: RenderFrame, Culled!]
Essential Tools We Have Relied On
Intel® VTune™
– https://software.intel.com/en-us/intel-vtune-amplifier-xe
SSE/AVX intrinsics guide
– https://software.intel.com/sites/landingpage/IntrinsicsGuide/
References
[AHAM15] ANDERSSON M., HASSELGREN J., AKENINE-MÖLLER T.: Masked Depth Culling for Graphics Hardware. ACM Transactions on Graphics 34, 6 (2015), pp. 188:1–188:9
[CMK16] CHANDRASEKARAN C., MCNABB D., KUAH K., FAUCONNEAU M., GIESEN F.: Software Occlusion Culling. Published online at: https://software.intel.com/en-us/articles/software-occlusion-culling, (2013–2016)
[Col11] COLLIN D.: Culling the Battlefield. Game Developer’s Conference (presentation), (2011)
[Greene93] GREENE N., KASS M., MILLER G.: Hierarchical Z-Buffer Visibility. In Proceedings of SIGGRAPH, (1993), pp. 231–238
[HA15] HAAR U., AALTONEN S.: GPU-Driven Rendering Pipelines. SIGGRAPH Advances in Real-Time Rendering in Games course, (2015)
[HAAM16] HASSELGREN J., ANDERSSON M., AKENINE-MÖLLER T.: Masked Software Occlusion Culling. High Performance Graphics, (2016)
Check it out!
GitHub: Lightweight library
– https://github.com/GameTechDev/MaskedOcclusionCulling
GitHub: Example integrated in Intel’s Software Occlusion Culling demo
– https://github.com/GameTechDev/OcclusionCulling
Project page: Masked Software Occlusion Culling
– https://software.intel.com/en-us/articles/masked-software-occlusion-culling
Questions and feedback welcome
Legal Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. © 2016 Intel Corporation. Intel, the Intel logo, VTune and others are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.