Compute-Based GPU Particle Systems
Gareth Thomas
Developer Technology Engineer, AMD

Agenda

● Overview

● Collisions

● Sorting

● Tiled Rendering

● Conclusions

Overview

● Why use the GPU?
  ● Highly parallel workload
  ● Free your CPU to do game code
  ● Leverage compute

Overview

● Emit
● Simulate
● Sort
● Rendering
  ● Rasterization or Tiled Rendering

Data Structures

● Particle Pool: Position, Velocity, Age, Color, etc.
● Sort List: uint index; float distanceSq;
● Dead List: uint index;
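A minimal CPU-side sketch of these three buffers in C++, assuming a fixed-capacity pool. On the GPU they would be structured buffers rather than `std::vector`; the struct names, the `age` semantics and the "Dead List starts full" initialisation are illustrative assumptions, not details from the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of one Particle Pool entry.
struct Particle {
    float position[3];
    float velocity[3];
    float age;        // remaining lifetime in seconds (assumption)
    float color[4];
};

// One Sort List entry: which pool slot, and its squared distance to the camera.
struct SortEntry {
    uint32_t index;
    float distanceSq;
};

struct ParticleBuffers {
    std::vector<Particle> pool;       // Particle Pool
    std::vector<uint32_t> deadList;   // indices of free pool slots
    std::vector<SortEntry> sortList;  // alive particles, rebuilt every frame

    explicit ParticleBuffers(uint32_t capacity) : pool(capacity) {
        // Before anything is emitted every slot is free, so the Dead List holds all indices.
        deadList.reserve(capacity);
        for (uint32_t i = 0; i < capacity; ++i) deadList.push_back(i);
        sortList.reserve(capacity);
        assert(deadList.size() == pool.size());
    }
};
```

Keeping only indices in the Dead List and Sort List means the larger per-particle records in the pool never move; emit, simulate and sort only shuffle 4- or 8-byte entries.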

Emit Compute Shader: initialize particles from the Dead List

● Particle Pool: Position, Velocity, Age, Color, etc.
● Dead List: uint index; (ConsumeStructuredBuffer<uint>)

uint index = g_DeadList.Consume();
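The emit step can be sketched on the CPU as follows, with `back()`/`pop_back()` standing in for `Consume()`. The clamp to the number of free slots is an assumption about how the dispatch is guarded: consuming from an empty Dead List is not valid. On the GPU, one option in Direct3D 11 is `ID3D11DeviceContext::CopyStructureCount`, which copies the Dead List's hidden counter into another buffer (for example a constant buffer) that the shader can test against. `emitParticles`, `origin` and `lifetime` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Particle { float position[3]; float velocity[3]; float age; };

// CPU emulation of the Emit pass; returns how many particles were actually emitted.
uint32_t emitParticles(std::vector<Particle>& pool, std::vector<uint32_t>& deadList,
                       uint32_t requested, const float origin[3], float lifetime) {
    // Never consume more indices than the Dead List holds.
    uint32_t count = requested < deadList.size() ? requested
                                                 : static_cast<uint32_t>(deadList.size());
    for (uint32_t t = 0; t < count; ++t) {   // one iteration per GPU thread
        uint32_t index = deadList.back();     // uint index = g_DeadList.Consume();
        deadList.pop_back();
        assert(index < pool.size());
        Particle& p = pool[index];
        for (int k = 0; k < 3; ++k) {
            p.position[k] = origin[k];
            p.velocity[k] = 0.0f;
        }
        p.age = lifetime;
    }
    return count;
}
```

A real consume buffer gives no ordering guarantee, so code should not rely on which free slot a thread receives.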

Simulate Compute Shader: update particles. Add alive ones to the Sort List, add dead ones to the Dead List.

● Particle Pool
● Dead List: uint index; (AppendStructuredBuffer<uint>)
● Sort List: uint index; float distanceSq; (RWStructuredBuffer<float2>)

g_DeadList.Append( index );
g_SortList.IncrementCounter();

RWStructuredBuffer<>
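A CPU sketch of the simulate step, assuming that `age` counts down and a particle dies when it reaches zero (the slides do not specify the lifetime convention). In HLSL, `RWStructuredBuffer::IncrementCounter()` returns the counter value from before the increment, so one plausible pattern is to use that return value as the Sort List write slot; here `push_back` plays that role. `simulateParticles`, `eye` and `dt` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Particle { float position[3]; float velocity[3]; float age; };
struct SortEntry { uint32_t index; float distanceSq; };

// CPU emulation of the Simulate pass: integrate alive particles, recycle dead ones.
void simulateParticles(std::vector<Particle>& pool, std::vector<uint32_t>& deadList,
                       std::vector<SortEntry>& sortList, const float eye[3], float dt) {
    assert(dt >= 0.0f);
    sortList.clear();  // the Sort List is rebuilt every frame
    for (uint32_t index = 0; index < pool.size(); ++index) {  // one GPU thread per slot
        Particle& p = pool[index];
        if (p.age <= 0.0f) continue;      // slot was already free before this frame
        p.age -= dt;
        if (p.age <= 0.0f) {
            deadList.push_back(index);    // g_DeadList.Append( index );
            continue;
        }
        float d2 = 0.0f;
        for (int k = 0; k < 3; ++k) {
            p.position[k] += p.velocity[k] * dt;
            float d = p.position[k] - eye[k];
            d2 += d * d;
        }
        // GPU: slot = g_SortList.IncrementCounter(); g_SortList[slot] = ...
        sortList.push_back({index, d2});
    }
}
```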

Collisions

● Primitives

● Heightfield

● Voxel data

● Depth buffer [Tchou11]

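Depth Buffer Collisions

● Project particle into screen space
● Read Z from depth buffer
● Compare view-space particle position vs view-space position of Z buffer value
● Use a thickness value: the depth buffer stores only the front surface, so a particle far behind it is hidden, not colliding

[Figure: particle moving from P(n) to P(n+1) in view space, relative to the depth buffer value Z and the thickness]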
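The comparison step can be sketched as below, assuming a standard Direct3D-style perspective projection (near plane maps to depth 0, far plane to 1) and a view space with +z pointing away from the camera. Projecting the particle and sampling the depth texture are omitted; `linearizeDepth` and `collidesWithDepthBuffer` are hypothetical helpers.

```cpp
#include <cassert>

// Convert a hardware depth value d in [0,1] back to view-space z,
// assuming near -> 0 and far -> 1 under a standard perspective projection.
float linearizeDepth(float d, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear);
    return zNear * zFar / (zFar - d * (zFar - zNear));
}

// The particle collides when it is behind the visible surface by less than
// 'thickness'; further back, it is treated as occluded rather than touching.
bool collidesWithDepthBuffer(float particleViewZ, float surfaceViewZ, float thickness) {
    return particleViewZ > surfaceViewZ && particleViewZ < surfaceViewZ + thickness;
}
```

The thickness is a tuning value: too small and fast particles tunnel through thin surfaces between frames P(n) and P(n+1), too large and particles passing behind foreground objects bounce off them.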

Depth Buffer Collision Response

● Use normal from G-buffer
● Or take multiple taps from the depth buffer to reconstruct a normal
● Watch out for depth discontinuities
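A sketch of the multi-tap option, assuming the five taps have already been turned into view-space positions and that the neighbours are offset along view-space +x/-x and +y/-y. For each axis it uses the neighbour whose depth is closest to the centre, a common way to keep a discontinuity on one side from corrupting the normal. The `restitution` bounce factor is a hypothetical parameter, not something from the slides.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
Vec3 normalize(Vec3 a) {
    float len = std::sqrt(dot(a, a));
    assert(len > 0.0f);
    return scale(a, 1.0f / len);
}

// Normal from view-space positions of the centre tap and its four neighbours.
Vec3 normalFromDepthTaps(Vec3 c, Vec3 left, Vec3 right, Vec3 down, Vec3 up) {
    // Pick the neighbour closer in depth on each axis to step over discontinuities.
    Vec3 dx = std::fabs(right.z - c.z) < std::fabs(left.z - c.z) ? sub(right, c) : sub(c, left);
    Vec3 dy = std::fabs(up.z - c.z) < std::fabs(down.z - c.z) ? sub(up, c) : sub(c, down);
    Vec3 n = normalize(cross(dx, dy));
    // Make the normal face the camera, which sits at the view-space origin.
    return dot(n, c) > 0.0f ? scale(n, -1.0f) : n;
}

// Reflect velocity about the normal (restitution 1 = elastic, 0 = no bounce).
Vec3 bounce(Vec3 v, Vec3 n, float restitution) {
    return sub(v, scale(n, (1.0f + restitution) * dot(v, n)));
}
```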

Sort Compute Shader

● Sort List: uint index; float distanceSq; (RWStructuredBuffer<float2>)
● Sort for correct alpha blending
● Additive blending is order-independent, but it just saturates the effect
● Bitonic sort parallelizes well on the GPU
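To see why the sort matters, the small C++ check below (made-up colours, non-premultiplied sources) blends the same two particles in both orders. With "over" blending the last particle drawn dominates, so the order changes the result; with additive blending the sum is identical either way, but it can only get brighter.

```cpp
#include <cassert>

struct RGBA { float r, g, b, a; };

// Standard "over" blend of a non-premultiplied source onto the destination.
RGBA blendOver(RGBA dst, RGBA src) {
    float inv = 1.0f - src.a;
    return {src.a * src.r + inv * dst.r, src.a * src.g + inv * dst.g,
            src.a * src.b + inv * dst.b, src.a + inv * dst.a};
}

// Additive blend: commutative, so draw order does not matter.
RGBA blendAdd(RGBA dst, RGBA src) {
    return {dst.r + src.a * src.r, dst.g + src.a * src.g,
            dst.b + src.a * src.b, dst.a + src.a};
}
```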

Bitonic Sort

Input: 7 3 6 8 1 4 2 5

for( subArraySize=2; subArraySize<=ArraySize; subArraySize*=2)
{
    for( compareDist=subArraySize/2; compareDist>0; compareDist/=2)
    {
        // Begin: GPU part of the sort
        for each element n
            // compare n with partner n^compareDist; (n & subArraySize) selects ascending or descending
            n = selectBitonic(n, n^compareDist);
        // End: GPU part of the sort
    }
}

Bitonic Sort (Pass 1): subArraySize == 2, compareDist == 1

Input: 7 3 6 8 1 4 2 5

Bitonic Sort (Pass 2): subArraySize == 4, compareDist == 2

Input: 3 7 8 6 1 4 5 2

Bitonic Sort (Pass 3): subArraySize == 4, compareDist == 1

Input: 3 6 8 7 5 4 1 2

Bitonic Sort (Pass 4): subArraySize == 8, compareDist == 4

Input: 3 6 7 8 5 4 2 1

Bitonic Sort (Pass 5): subArraySize == 8, compareDist == 2

Input: 3 4 2 1 5 6 7 8

Bitonic Sort (Pass 6): subArraySize == 8, compareDist == 1

Input: 2 1 3 4 5 6 7 8
Output: 1 2 3 4 5 6 7 8
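A runnable C++ version of the loop nest above, with `int` keys standing in for Sort List entries. `bitonicPass` plays the role of one GPU dispatch (`selectBitonic` applied to every element), and the host loop uses `subArraySize <= size` so the final merge of the whole array runs. Fed the 8-element example it reproduces the pass-by-pass states listed above. For back-to-front particle rendering you would sort on `distanceSq` in descending order instead.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One "dispatch": every element n is paired with n ^ compareDist. Bit 'subArraySize'
// of n selects the direction, so neighbouring sub-arrays are sorted in opposite
// orders and together form a bitonic sequence for the next, larger merge.
void bitonicPass(std::vector<int>& a, std::size_t subArraySize, std::size_t compareDist) {
    for (std::size_t n = 0; n < a.size(); ++n) {   // one GPU thread per element
        std::size_t partner = n ^ compareDist;
        if (partner <= n) continue;                // handle each pair once, from its lower index
        bool ascending = (n & subArraySize) == 0;
        if ((a[n] > a[partner]) == ascending) std::swap(a[n], a[partner]);
    }
}

// Host-side loop nest from the slides; the length must be a power of two.
void bitonicSort(std::vector<int>& a) {
    const std::size_t size = a.size();
    assert(size > 0 && (size & (size - 1)) == 0);
    for (std::size_t subArraySize = 2; subArraySize <= size; subArraySize *= 2)
        for (std::size_t compareDist = subArraySize / 2; compareDist > 0; compareDist /= 2)
            bitonicPass(a, subArraySize, compareDist);
}
```

Within one pass the pairs are disjoint, so running them sequentially here gives the same result as running them in parallel on the GPU; only the passes themselves must be ordered.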

Rendering (geometry shader path). Inputs: Sort List, Particle Pool

● Vertex Shader: read particle buffer
● Geometry Shader: expand one point to four; billboard in view space
● Pixel Shader: texturing and tinting; depth fade for soft particles
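What the geometry shader does per particle can be sketched as follows: in view space the camera's right and up axes are just +x and +y, so a camera-facing quad needs no camera vectors. The corner order (top-left, top-right, bottom-left, bottom-right, i.e. a triangle strip) and the `halfSize` parameter are assumptions for illustration.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Expand one view-space point into the four corners of a camera-facing quad.
std::array<Vec3, 4> expandBillboard(Vec3 centerView, float halfSize) {
    assert(halfSize >= 0.0f);
    return {{
        {centerView.x - halfSize, centerView.y + halfSize, centerView.z},  // top-left
        {centerView.x + halfSize, centerView.y + halfSize, centerView.z},  // top-right
        {centerView.x - halfSize, centerView.y - halfSize, centerView.z},  // bottom-left
        {centerView.x + halfSize, centerView.y - halfSize, centerView.z},  // bottom-right
    }};
}
```

Each corner would then be multiplied by the projection matrix to produce the clip-space position.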

Rendering (vertex shader path). Inputs: Sort List, Particle Pool, Index Buffer

● Vertex Shader: read particle buffer and billboard in view space
● Pixel Shader: texturing and tinting; depth fade for soft particles
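Without a geometry shader, each particle owns four vertices, so the vertex ID splits into a particle slot and a quad corner, and a static index buffer stitches each group of four into two triangles. The sketch below assumes corner numbering 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right and that the slot indexes the Sort List; both are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QuadVertex { uint32_t particleSlot; uint32_t corner; };

// particle = VertexId / 4, corner = VertexId % 4
QuadVertex decodeVertexId(uint32_t vertexId) {
    return {vertexId / 4, vertexId % 4};
}

// Built once for the maximum particle count: two triangles (6 indices) per quad.
std::vector<uint32_t> buildQuadIndexBuffer(uint32_t maxParticles) {
    std::vector<uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(maxParticles) * 6);
    for (uint32_t p = 0; p < maxParticles; ++p) {
        uint32_t base = p * 4;
        const uint32_t quad[6] = {base + 0, base + 1, base + 2, base + 2, base + 1, base + 3};
        indices.insert(indices.end(), quad, quad + 6);
    }
    assert(indices.size() == static_cast<std::size_t>(maxParticles) * 6);
    return indices;
}
```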

Rasterization

● DrawIndexedInstancedIndirect() or DrawInstancedIndirect()
● VertexId = particle index (or VertexId/4 for VS billboarding)
● 1 instance
● Heavy overdraw on large particles – restricts game design
  ● Fit polygon billboard around texture [Persson09]
  ● Render to half-size buffer [Cantlay07]
    ● Sorting issues
    ● Loss of fidelity
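The indirect draw reads its arguments from a GPU buffer, so the alive count never has to travel back to the CPU. The struct below mirrors the five 32-bit values that `DrawIndexedInstancedIndirect` expects (the same field order as `D3D12_DRAW_INDEXED_ARGUMENTS`); it is a local stand-in, and the "6 indices per alive particle" rule assumes the vertex-shader billboarding path. On the GPU the alive count would come from the Sort List counter, copied in with `CopyStructureCount` or written by a small compute shader.

```cpp
#include <cassert>
#include <cstdint>

// Argument layout consumed by an indexed indirect draw, in order.
struct DrawIndexedArgs {
    uint32_t indexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startIndexLocation;
    int32_t  baseVertexLocation;
    uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawIndexedArgs) == 20, "five tightly packed 32-bit values");

// One instance; two triangles (six indices) per alive particle.
DrawIndexedArgs makeParticleDrawArgs(uint32_t aliveCount) {
    assert(aliveCount <= UINT32_MAX / 6);
    return {aliveCount * 6, 1, 0, 0, 0};
}
```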

Tiled Rendering

● Inspired by Forward+ [Harada12]
● Screen-space binning of particles instead of lights
● Per-tile
  ● Cull & Sort
● Per pixel/thread
  ● Evaluate color of each particle
  ● Blend together
● Composite back onto scene

Tiled light culling, applied to particles

[Figure: particles 1, 2 and 3 overlapping three screen tiles; the per-tile lists are [1], [1,2,3] and [2,3]]
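The figure's per-tile lists can be reproduced with a simple screen-space test. This sketch treats each particle as a circle in pixels and tests it against each square tile; that 2D circle-versus-rectangle test is a simplification chosen for illustration (the slides cull against a per-tile frustum). Particle ids are 1-based to match the figure; `ScreenCircle` and `binParticles` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ScreenCircle { float x, y, radius; };  // particle footprint in pixels

// Build per-tile particle lists for a row-major grid of square tiles.
std::vector<std::vector<uint32_t>> binParticles(const std::vector<ScreenCircle>& particles,
                                                uint32_t tilesX, uint32_t tilesY, float tileSize) {
    assert(tileSize > 0.0f);
    std::vector<std::vector<uint32_t>> tiles(tilesX * tilesY);
    for (uint32_t ty = 0; ty < tilesY; ++ty)
        for (uint32_t tx = 0; tx < tilesX; ++tx) {
            float x0 = tx * tileSize, y0 = ty * tileSize;
            float x1 = x0 + tileSize, y1 = y0 + tileSize;
            for (uint32_t i = 0; i < particles.size(); ++i) {
                const ScreenCircle& p = particles[i];
                // Closest point of the tile rectangle to the circle centre.
                float cx = p.x < x0 ? x0 : (p.x > x1 ? x1 : p.x);
                float cy = p.y < y0 ? y0 : (p.y > y1 ? y1 : p.y);
                float dx = p.x - cx, dy = p.y - cy;
                if (dx * dx + dy * dy <= p.radius * p.radius)
                    tiles[ty * tilesX + tx].push_back(i + 1);
            }
        }
    return tiles;
}
```

With three 16-pixel tiles in a row and particles at x = 14, 30 and 31, this yields exactly the lists [1], [1,2,3] and [2,3].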

Tiled particle culling

● Divide screen into tiles

● Fit asymmetric frustum around each tile

[Figure: screen divided into tiles Tile0, Tile1, Tile2, Tile3]
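A sketch of the per-tile asymmetric frustum, in the style of tiled light culling: the four side planes pass through the camera and the tile's edges, and a particle's bounding sphere is rejected if it lies fully outside any one of them. It assumes a left-handed view space (+z forward, +y up), a symmetric projection described by `tanHalfFovX`/`tanHalfFovY`, pixel y growing downwards, and no near/far planes; all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; };  // plane through the view-space origin; n points into the frustum

static Plane makePlane(float nx, float ny, float nz) {
    float len = std::sqrt(nx * nx + ny * ny + nz * nz);
    return {{nx / len, ny / len, nz / len}};
}

// Side planes of the frustum around pixel rectangle [px0,px1) x [py0,py1).
std::array<Plane, 4> tileFrustum(float px0, float py0, float px1, float py1,
                                 float width, float height, float tanHalfFovX, float tanHalfFovY) {
    // Tile edges expressed as slopes x/z and y/z.
    float left   = (2.0f * px0 / width - 1.0f) * tanHalfFovX;
    float right  = (2.0f * px1 / width - 1.0f) * tanHalfFovX;
    float top    = (1.0f - 2.0f * py0 / height) * tanHalfFovY;
    float bottom = (1.0f - 2.0f * py1 / height) * tanHalfFovY;
    assert(left < right && bottom < top);
    return {{makePlane(1.0f, 0.0f, -left),     // x >= left * z
             makePlane(-1.0f, 0.0f, right),    // x <= right * z
             makePlane(0.0f, 1.0f, -bottom),   // y >= bottom * z
             makePlane(0.0f, -1.0f, top)}};    // y <= top * z
}

// Keep the sphere unless it is entirely outside one of the planes.
bool sphereInTile(const std::array<Plane, 4>& planes, Vec3 c, float radius) {
    for (const Plane& p : planes)
        if (p.n.x * c.x + p.n.y * c.y + p.n.z * c.z < -radius) return false;
    return true;
}
```

Because tiles away from the screen centre see the scene off-axis, each tile's frustum is asymmetric; computing it from the tile's own edge slopes handles that automatically.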

Thread Group View

● [numthreads(32,32,1)]
● Culling 1024 particles in parallel
● Write visible indices to LDS
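A CPU emulation of one tile's thread group: the 32×32 = 1024 "threads" each test one particle per batch, and visible ones are appended to a shared LDS-like array through an atomic counter, analogous to `InterlockedAdd` on a `groupshared` uint. The batching loop, the overflow guard and the `isVisible` callback (standing in for the frustum test) are assumptions. Here the threads run sequentially, so the output happens to be in order; on the GPU the slot order is arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kThreadsPerGroup = 32 * 32;  // [numthreads(32,32,1)]

template <typename Pred>
std::vector<uint32_t> cullTile(uint32_t particleCount, uint32_t ldsCapacity, Pred isVisible) {
    assert(ldsCapacity > 0);
    std::vector<uint32_t> lds(ldsCapacity);
    std::atomic<uint32_t> visibleCount{0};
    for (uint32_t batch = 0; batch < particleCount; batch += kThreadsPerGroup) {
        for (uint32_t t = 0; t < kThreadsPerGroup; ++t) {   // threads of one batch
            uint32_t index = batch + t;
            if (index >= particleCount || !isVisible(index)) continue;
            uint32_t slot = visibleCount.fetch_add(1);      // InterlockedAdd equivalent
            if (slot < ldsCapacity) lds[slot] = index;      // drop overflow instead of writing out of bounds
        }
    }
    uint32_t n = visibleCount.load();
    lds.resize(n < ldsCapacity ? n : ldsCapacity);
    return lds;
}
```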

Per Tile Bitonic Sort

● Each thread adds a visible particle, so particles are added to LDS in arbitrary order
● Need to sort
● Only sorting the particles in the tile rather than the global list
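A bitonic network needs a power-of-two element count, but a tile holds however many particles survived culling. One common fix, an assumption here rather than something stated on the slides, is to pad the LDS list with sentinel entries whose key sorts behind every real particle. With ascending `distanceSq` (front to back), `FLT_MAX` sentinels end up at the tail, and the renderer simply stops at the real count.

```cpp
#include <cassert>
#include <cfloat>
#include <cstdint>
#include <vector>

struct TileEntry { uint32_t index; float distanceSq; };

uint32_t nextPowerOfTwo(uint32_t n) {
    uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Pad to a power of two with sentinels that sort last in ascending distance order.
void padForBitonic(std::vector<TileEntry>& tile) {
    uint32_t target = nextPowerOfTwo(static_cast<uint32_t>(tile.size()));
    tile.resize(target, TileEntry{UINT32_MAX, FLT_MAX});
    assert((tile.size() & (tile.size() - 1)) == 0);
}
```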

Tiled Rendering (1 thread = 1 pixel)

● Set accum color to float4( 0, 0, 0, 0 )
● For each particle in tile (back to front)
  ● Evaluate particle contribution
    ● Radius check
    ● Texture lookup
    ● Optional normal generation and lighting
  ● Manually blend
    ● color = ( srcA x srcCol ) + ( invSrcA x destCol )
    ● alpha = srcA + ( invSrcA x destA )
● Write to screen-size UAV
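The per-pixel loop can be sketched as below, using the slide's blend formulas. It assumes each particle's colour and alpha at this pixel have already been evaluated (radius check, texture lookup, lighting), which is why the input is just a list of `RGBA` values ordered back to front; `shadePixelBackToFront` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

struct RGBA { float r, g, b, a; };

// One thread = one pixel: blend the tile's particles back to front.
RGBA shadePixelBackToFront(const std::vector<RGBA>& contributionsBackToFront) {
    RGBA dest{0.0f, 0.0f, 0.0f, 0.0f};  // accum = float4( 0, 0, 0, 0 )
    for (const RGBA& src : contributionsBackToFront) {
        assert(src.a >= 0.0f && src.a <= 1.0f);
        float invSrcA = 1.0f - src.a;
        dest.r = src.a * src.r + invSrcA * dest.r;  // color = srcA*srcCol + invSrcA*destCol
        dest.g = src.a * src.g + invSrcA * dest.g;
        dest.b = src.a * src.b + invSrcA * dest.b;
        dest.a = src.a + invSrcA * dest.a;          // alpha = srcA + invSrcA*destA
    }
    return dest;
}
```

Because the accumulator starts at zero, the result is a premultiplied colour plus coverage, so compositing onto the scene is `scene * (1 - alpha) + color`.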

Tiled Rendering, improved!

● Set accum color to float4( 0, 0, 0, 0 )
● For each particle in tile (front to back)
  ● Evaluate particle contribution
  ● Manually blend [Bavoil08]
    ● color = ( invDestA x srcA x srcCol ) + destCol
    ● alpha = srcA + ( invSrcA x destA )
  ● if ( accum alpha > threshold )
      accum alpha = 1 and bail
● Write to screen-size UAV
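A sketch of the front-to-back variant with the early out, where `invDestA` is one minus the accumulated alpha. Visiting the nearest particles first means that once accumulated alpha passes `threshold`, everything behind is nearly invisible and the loop can stop. Without an early out, it produces the same premultiplied result as the back-to-front loop; for the red/blue pair in the test, both orders give (0.5, 0, 0.25, 0.75). The `threshold` value is a tuning assumption.

```cpp
#include <cassert>
#include <vector>

struct RGBA { float r, g, b, a; };

RGBA shadePixelFrontToBack(const std::vector<RGBA>& contributionsFrontToBack, float threshold) {
    assert(threshold > 0.0f && threshold <= 1.0f);
    RGBA dest{0.0f, 0.0f, 0.0f, 0.0f};
    for (const RGBA& src : contributionsFrontToBack) {
        float w = (1.0f - dest.a) * src.a;         // invDestA * srcA
        dest.r += w * src.r;                       // color = invDestA*srcA*srcCol + destCol
        dest.g += w * src.g;
        dest.b += w * src.b;
        dest.a = src.a + (1.0f - src.a) * dest.a;  // alpha = srcA + invSrcA*destA
        if (dest.a > threshold) {                  // nearly opaque: treat as opaque and bail
            dest.a = 1.0f;
            break;
        }
    }
    return dest;
}
```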

Coarse Culling

● Bin particles into 8x8
● UAV0 for indices
  ● Array split into sections using offsets
● UAV1 for storing particle count per bin
  ● 1 element per bin
  ● Use InterlockedAdd() to bump counter
● For each alive particle
  ● For each bin
    ● Test particle against bin’s frustum planes
    ● Bump counter in UAV1 to get slot to write to
    ● Add particle index to UAV0
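A CPU sketch of the coarse binning loop. It assumes each bin's section of UAV0 has a fixed capacity (bin b starts at offset b × capacity); the slides only say the array is split into sections using offsets. `overlapsBin` is a hypothetical stand-in for the test against the bin's frustum planes. The counter keeps counting past capacity, so readers of UAV1 should clamp to min(count, capacity).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CoarseBins {
    uint32_t binCount, binCapacity;
    std::vector<uint32_t> indices;  // UAV0: bin b owns [b * binCapacity, (b + 1) * binCapacity)
    std::vector<uint32_t> counts;   // UAV1: one counter per bin
    CoarseBins(uint32_t bins, uint32_t capacity)
        : binCount(bins), binCapacity(capacity),
          indices(static_cast<std::size_t>(bins) * capacity), counts(bins, 0) {}
};

template <typename Overlap>
void coarseCull(CoarseBins& bins, const std::vector<uint32_t>& aliveParticles, Overlap overlapsBin) {
    for (uint32_t particle : aliveParticles)           // one GPU thread per alive particle
        for (uint32_t b = 0; b < bins.binCount; ++b) {
            if (!overlapsBin(particle, b)) continue;
            uint32_t slot = bins.counts[b]++;          // GPU: InterlockedAdd(UAV1[b], 1, slot)
            if (slot < bins.binCapacity)               // skip writes past the section end
                bins.indices[static_cast<std::size_t>(b) * bins.binCapacity + slot] = particle;
        }
    assert(bins.counts.size() == bins.binCount);
}
```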

Demo

Demo with full source available soon

Performance Results

Mode              Frame time (ms)*
Rasterization     4.86
Tiled             3.15

Breakdown         Frame time (ms)*
Simulation        0.39
Coarse Culling    0.06
Tile Culling      0.43
Render            1.60

*AMD Radeon R9 290X @ 1080p

Performance Results

Mode              Frame time (ms)*
Rasterization     25.0
Tiled             5.1

*AMD Radeon R9 290X @ 1080p

Conclusions

● Leverage compute for particle simulations
● Depth buffer collisions
● Bitonic sort for correct blending
● Tiled rendering
  ● Faster than rasterization
  ● Great for combating heavy overdraw
  ● More predictable behavior
● Future work
  ● Volume tracing
  ● Add arbitrary geometry for OIT

References

● [Tchou11] Chris Tchou, “Halo Reach Effects Tech”, GDC 2011
● [Persson09] Emil Persson, http://www.humus.name/index.php?page=News&ID=266
● [Cantlay07] Iain Cantlay, “High-Speed, Off-Screen Particles”, GPU Gems 3, 2007
● [Harada12] Takahiro Harada et al., “Forward+: Bringing Deferred Lighting to the Next Level”, Eurographics 2012 Short Papers
● [Bavoil08] Louis Bavoil et al., “Order Independent Transparency with Dual Depth Peeling”, 2008