HOLY SMOKE! FASTER PARTICLE RENDERING USING DIRECTCOMPUTE AMD AND MICROSOFT DEVELOPER DAY, JUNE 2014, STOCKHOLM GARETH THOMAS 2 ND JUNE 2014


HOLY SMOKE! FASTER PARTICLE RENDERING USING DIRECTCOMPUTE

AMD AND MICROSOFT DEVELOPER DAY, JUNE 2014, STOCKHOLM

GARETH THOMAS

2ND JUNE 2014

| FASTER PARTICLE RENDERING USING DIRECTCOMPUTE | AMD AND MICROSOFT GAME DEVELOPER DAY - JUNE 2 2014, STOCKHOLM

PLAN FOR TODAY

‒ Simulation Overview
‒ Collisions
‒ Sorting
‒ Tiled Rendering
‒ Conclusions


OVERVIEW: MOTIVATION

Why use the GPU for simulation?
‒ Highly parallel workload
‒ Free your CPU to do other cool stuff
‒ Leverage compute
  ‒ Take advantage of the Local Data Store (LDS)
  ‒ Asynchronous compute on some platforms


OVERVIEW: HOW TO BUILD A GPU PARTICLE SYSTEM

Emit
Simulate
Sort
Render
‒ Rasterize billboards
‒ Tiled rendering using DirectCompute


SIMULATION OVERVIEW: HOW THE SIMULATION FITS TOGETHER

Emit Compute Shader (CS): reads free indices from the Dead List and writes new particle data into the global Particle Array
Simulate Compute Shader (CS): updates particles; adds alive ones to the Alive List and dead ones to the Dead List
Dead List: persistent list of free particle indices
Alive List: list of alive particle indices, rebuilt each frame by the simulation
Particle Array: persistent array of particle data
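The buffer flow above can be sketched on the CPU as follows. This is a minimal sketch, not the demo's code: `Particle`, `ParticleSystem`, `emit` and `simulate` are hypothetical names, the dead list is modelled as a stack, a slot counts as dead while `age >= lifespan`, and the plain loops stand in for compute shader threads (on the GPU, pushing to and popping from the lists would go through atomic counters).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical particle layout; a real system stores more (colour, radius, ...).
struct Particle {
    float pos[3];
    float vel[3];
    float age;
    float lifespan;   // a slot is free (dead) while age >= lifespan
};

struct ParticleSystem {
    std::vector<Particle> particles;   // persistent Particle Array (zero-initialised = dead)
    std::vector<uint32_t> deadList;    // persistent list of free indices
    std::vector<uint32_t> aliveList;   // rebuilt every frame by simulate()

    explicit ParticleSystem(uint32_t maxParticles) : particles(maxParticles) {
        for (uint32_t i = 0; i < maxParticles; ++i) deadList.push_back(i);
    }

    // Emit: take free indices from the dead list and write new particle data.
    // On the GPU the pop would be an atomic decrement of the dead list's counter.
    void emit(uint32_t count, float lifespan) {
        assert(lifespan > 0.0f);
        for (uint32_t e = 0; e < count && !deadList.empty(); ++e) {
            uint32_t index = deadList.back();
            deadList.pop_back();
            particles[index] = Particle{{0.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}, 0.0f, lifespan};
        }
    }

    // Simulate: conceptually one GPU thread per slot. Live particles are
    // integrated; those that expire this frame go back on the dead list and the
    // survivors are appended to the freshly cleared alive list.
    void simulate(float dt) {
        aliveList.clear();
        for (uint32_t i = 0; i < particles.size(); ++i) {
            Particle& p = particles[i];
            if (p.age >= p.lifespan) continue;   // free slot, nothing to do
            p.age += dt;
            for (int k = 0; k < 3; ++k) p.pos[k] += p.vel[k] * dt;
            if (p.age >= p.lifespan) deadList.push_back(i);
            else aliveList.push_back(i);
        }
    }
};
```

For example, emitting three particles into a four-slot system leaves one index on the dead list; after enough simulated time all four indices are back on it and the alive list is empty.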


COLLISIONS: A GPU-BASED SOLUTION

With the simulation on the GPU, we can no longer use a CPU-side physics engine for collisions
Use the depth buffer [Tchou11]
‒ Project the particle into screen space and read the depth buffer
‒ Project the particle into view space
‒ Transform the depth buffer value into view space and compare depths
Generate the collision response
‒ Use G-buffer normals
‒ Or take multiple depth samples to reconstruct the normal

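[Figure: view-space diagram along Z showing particle positions P(n) and P(n+1) relative to the depth-buffer surface and its thickness]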
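A minimal sketch of the per-particle depth test and a simple response, assuming a D3D-style left-handed perspective projection that maps view-space Z in [near, far] to depth in [0, 1]. `depthToViewZ`, `viewZToDepth`, `collidesWithDepthBuffer` and `reflectVelocity` are hypothetical helpers; the depth sample is passed in directly instead of being read from a texture, and the restitution factor is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Assumes a D3D-style perspective projection (depth 0 at near, 1 at far).
// Converts a non-linear depth-buffer value back to view-space Z.
float depthToViewZ(float depth, float nearZ, float farZ) {
    return nearZ * farZ / (farZ - depth * (farZ - nearZ));
}

// Forward mapping, used here only to fake a depth-buffer read.
float viewZToDepth(float viewZ, float nearZ, float farZ) {
    return farZ / (farZ - nearZ) - nearZ * farZ / ((farZ - nearZ) * viewZ);
}

// The particle collides if it is behind the surface stored in the depth buffer,
// but by no more than `thickness`; further back it is assumed to be in free
// space behind the geometry.
bool collidesWithDepthBuffer(float particleViewZ, float depthSample,
                             float nearZ, float farZ, float thickness) {
    float surfaceViewZ = depthToViewZ(depthSample, nearZ, farZ);
    return particleViewZ > surfaceViewZ && particleViewZ < surfaceViewZ + thickness;
}

struct Vec3 { float x, y, z; };

// Collision response: reflect the velocity about the surface normal (e.g. read
// from the G-buffer) and damp it with a restitution factor.
Vec3 reflectVelocity(Vec3 v, Vec3 n, float restitution) {
    float d = v.x * n.x + v.y * n.y + v.z * n.z;
    return { (v.x - 2.0f * d * n.x) * restitution,
             (v.y - 2.0f * d * n.y) * restitution,
             (v.z - 2.0f * d * n.z) * restitution };
}
```

With a surface at view Z = 10 and a thickness of 0.5, a particle at Z = 10.2 collides, while particles at 9.8 (in front) and 11.0 (in free space behind) do not.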


COLLISIONS: PROBLEMS WITH USING THE DEPTH BUFFER

Only collides against geometry in the depth buffer
Particles would collide against the depth buffer even if they are behind the geometry
‒ Use a thickness value; beyond it, particles are assumed to be in free space behind the geometry
Particles don't collide when they are off screen
‒ Causes issues when particles at rest on the floor go off screen: they fall through the world and disappear
‒ Put particles to sleep in the simulation once they have come to rest
‒ Use the G-buffer to mark the parts of the scene that particles can sleep on (static objects)
Not multi-GPU friendly!
‒ Switch off depth buffer collisions in MGPU mode

[Figure: "Fallen through world!"]
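The sleep rule can be sketched as below. This is a hypothetical illustration, not the demo's logic: `SleepyParticle`, `updateSleepState`, the speed threshold `restSpeed` and the `restingOnStaticSurface` flag (standing in for the G-buffer "static object" mark) are all assumptions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-particle state for the sleep optimisation.
struct SleepyParticle {
    float velocity[3];
    bool asleep;
};

// Put a particle to sleep once it has come to rest on a surface that the
// G-buffer marks as static. Sleeping particles would then skip integration and
// depth-buffer collisions, so they stay put when their surface goes off screen.
void updateSleepState(SleepyParticle& p, bool restingOnStaticSurface, float restSpeed) {
    assert(restSpeed >= 0.0f);
    if (p.asleep) return;
    float speed = std::sqrt(p.velocity[0] * p.velocity[0] +
                            p.velocity[1] * p.velocity[1] +
                            p.velocity[2] * p.velocity[2]);
    if (restingOnStaticSurface && speed < restSpeed) {
        p.asleep = true;
        p.velocity[0] = p.velocity[1] = p.velocity[2] = 0.0f;
    }
}
```

A slow particle on a static surface falls asleep; the same particle on a surface not marked static, or a fast one, keeps simulating.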


BITONIC SORT

Example input: 7 3 6 8 1 4 2 5 (ArraySize must be a power of two)

for( subArraySize=2; subArraySize<=ArraySize; subArraySize*=2 )
{
    for( compareDist=subArraySize/2; compareDist>0; compareDist/=2 )
    {
        // Begin: GPU part of the sort
        for each element n
            n = selectBitonic(n, n^compareDist);
        // End: GPU part of the sort
    }
}


BITONIC SORT (PASS 1)

subArraySize == 2, compareDist == 1
Array before this pass: 7 3 6 8 1 4 2 5


BITONIC SORT (PASS 2)

subArraySize == 4, compareDist == 2
Array before this pass: 3 7 8 6 1 4 5 2


BITONIC SORT (PASS 3)

subArraySize == 4, compareDist == 1
Array before this pass: 3 6 8 7 5 4 1 2


BITONIC SORT (PASS 4)

subArraySize == 8, compareDist == 4
Array before this pass: 3 6 7 8 5 4 2 1


BITONIC SORT (PASS 5)

subArraySize == 8, compareDist == 2
Array before this pass: 3 4 2 1 5 6 7 8


BITONIC SORT (PASS 6)

subArraySize == 8, compareDist == 1
Array before this pass: 2 1 3 4 5 6 7 8
Result: 1 2 3 4 5 6 7 8
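The sorting loops can be made runnable on the CPU to check the pass-by-pass arrays. In this sketch, `bitonicPass` plays the role of one GPU dispatch (the loop over `n` stands in for one thread per element) and `bitonicSort` is the CPU-side driver; both names, and the rule that blocks where `(n & subArraySize) == 0` sort ascending, are this sketch's choices rather than the talk's `selectBitonic`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One pass: each element n compares against its partner n ^ compareDist.
// Blocks of size subArraySize alternate between ascending and descending order.
void bitonicPass(std::vector<int>& a, std::size_t subArraySize, std::size_t compareDist) {
    for (std::size_t n = 0; n < a.size(); ++n) {
        std::size_t partner = n ^ compareDist;
        if (partner <= n) continue;                  // handle each pair once
        bool ascending = (n & subArraySize) == 0;
        if ((a[n] > a[partner]) == ascending) std::swap(a[n], a[partner]);
    }
}

// CPU driver. The outer loop must include subArraySize == size, otherwise the
// final full-length merge is skipped.
void bitonicSort(std::vector<int>& a) {
    assert((a.size() & (a.size() - 1)) == 0 && "size must be a power of two");
    for (std::size_t subArraySize = 2; subArraySize <= a.size(); subArraySize *= 2)
        for (std::size_t compareDist = subArraySize / 2; compareDist > 0; compareDist /= 2)
            bitonicPass(a, subArraySize, compareDist);
}
```

Running the six passes on 7 3 6 8 1 4 2 5 reproduces the intermediate arrays of passes 1 to 6 and ends with 1 2 3 4 5 6 7 8.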


RENDERING

Inputs: Particle Pool and Sorted Alive List
Vertex Shader: read the particle buffer
Geometry Shader: expand one point to four; billboard in view space
Pixel Shader: texturing and tinting; depth fade for soft particles


RENDERING

Inputs: Particle Pool, Sorted Alive List and Index Buffer
Vertex Shader: read the particle buffer and billboard in view space
Pixel Shader: texturing and tinting; depth fade for soft particles


RENDERING: RASTERIZATION FOR OLD-SCHOOL GPU PARTICLE SYSTEMS

The alive particle count is only available on the GPU
‒ Use the Indirect API
DrawInstancedIndirect( GPU-args ) for Geometry Shader billboards
‒ D3D11_PRIMITIVE_TOPOLOGY_POINTLIST with no VB, IB or IA
‒ VertexId = particle index
‒ VertexCountPerInstance = NumParticles
‒ InstanceCount = 1
‒ The Geometry Shader expands each point into four vertices, a two-triangle strip per billboard
Or, better still: DrawIndexedInstancedIndirect( GPU-args )
‒ D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST, using an IB
‒ VertexId / 4 = particle index
‒ VertexId % 4 = billboard corner index
‒ IndexCountPerInstance = NumParticles * 6
‒ InstanceCount = 1
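The indexed variant needs a static index buffer and a vertex-ID decode, sketched below on the CPU. `DrawIndexedIndirectArgs`, `buildBillboardIndices`, `decodeVertexId` and `makeArgs` are hypothetical names; the struct only mirrors the order of the five 32-bit arguments that DrawIndexedInstancedIndirect reads (on the GPU a compute shader would write the alive count into it). The corner order (0..3 = top-left, top-right, bottom-left, bottom-right) is an assumption, and `aliveListSlot` is taken to index the sorted alive list, which holds the particle index.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the argument order: IndexCountPerInstance, InstanceCount,
// StartIndexLocation, BaseVertexLocation, StartInstanceLocation.
struct DrawIndexedIndirectArgs {
    uint32_t indexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startIndexLocation;
    int32_t  baseVertexLocation;
    uint32_t startInstanceLocation;
};

// Static index buffer: two triangles (six indices) over four implicit corners
// per particle. No vertex buffer is bound; the vertex shader decodes the vertex ID.
std::vector<uint32_t> buildBillboardIndices(uint32_t maxParticles) {
    std::vector<uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(maxParticles) * 6);
    for (uint32_t p = 0; p < maxParticles; ++p) {
        uint32_t v = p * 4;
        // Corners assumed to be 0 = TL, 1 = TR, 2 = BL, 3 = BR (same winding).
        uint32_t quad[6] = { v + 0, v + 1, v + 2, v + 2, v + 1, v + 3 };
        indices.insert(indices.end(), quad, quad + 6);
    }
    return indices;
}

// What the vertex shader does with its vertex ID.
struct BillboardVertex { uint32_t aliveListSlot; uint32_t corner; };
BillboardVertex decodeVertexId(uint32_t vertexId) {
    return { vertexId / 4, vertexId % 4 };
}

// Arguments for one draw covering every alive particle.
DrawIndexedIndirectArgs makeArgs(uint32_t aliveCount) {
    return { aliveCount * 6, 1, 0, 0, 0 };
}
```

The index buffer can be sized once for the maximum particle count; only IndexCountPerInstance changes from frame to frame.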


RENDERING: PROBLEMS WITH RASTERIZATION

Overdraw from large particles kills game performance!
‒ Get artists to throttle back on the VFX
Optimizations
‒ Tightly fit polygons around the texture [Persson09]
‒ Render to a smaller buffer [Cantlay07]
  ‒ Sorting issues
  ‒ Loss of fidelity


TILED RENDERING: OVERVIEW

Inspired by Forward+ [Harada12]
‒ Screen-space binning of particles instead of point lights!
Use a 32x32 thread group to shade a 32x32 pixel tile in screen space
‒ Cull particles (just like Forward+)
‒ Sort particles
‒ Per pixel/thread:
  ‒ Evaluate the colour of each particle
  ‒ Blend together
‒ Composite back onto the scene


TILED RENDERING

Divide the screen into tiles
Build index lists of intersecting particles per tile

[Figure: three particles (1, 2, 3) overlapping three tiles, whose index lists are [1], [1,2,3] and [2,3]]
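A simplified sketch of building per-tile index lists. It deliberately swaps in a screen-space test (each particle's projected circle against each tile rectangle) for the per-tile view-space frustum culling the talk uses; `ScreenParticle` and `binParticles` are hypothetical names and indices are 0-based.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical screen-space particle: projected centre and radius in pixels.
struct ScreenParticle { float x, y, radius; };

// One index list per tile, filled by a circle-vs-rectangle test: find the
// point of the tile closest to the circle centre and compare its distance
// with the radius.
std::vector<std::vector<uint32_t>> binParticles(const std::vector<ScreenParticle>& ps,
                                                int tilesX, int tilesY, float tileSize) {
    std::vector<std::vector<uint32_t>> tiles(tilesX * tilesY);
    for (uint32_t i = 0; i < ps.size(); ++i) {
        for (int ty = 0; ty < tilesY; ++ty) {
            for (int tx = 0; tx < tilesX; ++tx) {
                float minX = tx * tileSize, maxX = minX + tileSize;
                float minY = ty * tileSize, maxY = minY + tileSize;
                float cx = std::max(minX, std::min(ps[i].x, maxX));
                float cy = std::max(minY, std::min(ps[i].y, maxY));
                float dx = ps[i].x - cx, dy = ps[i].y - cy;
                if (dx * dx + dy * dy <= ps[i].radius * ps[i].radius)
                    tiles[ty * tilesX + tx].push_back(i);
            }
        }
    }
    return tiles;
}
```

With three 32-pixel tiles in a row and particles at x = 30 (radius 8), x = 62 (radius 6) and x = 64 (radius 4), the lists come out as [0], [0, 1, 2] and [1, 2], the 0-based equivalent of the figure.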


TILED RENDERING

View-space asymmetric frustum generated per tile
‒ Use the camera's near plane
‒ Use the camera's far plane, or calculate the far plane from the depth buffer

[Figure: per-tile frustums for Tile0, Tile1, Tile2 and Tile3]
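A minimal sketch of a per-tile asymmetric frustum and a conservative sphere test, assuming a symmetric perspective camera, left-handed view space (+Z forward, +Y up) and a top-left pixel origin. `Camera`, `TileFrustum`, `makeTileFrustum` and `sphereInTile` are hypothetical; each side plane passes through the eye and is stored as a slope (x/z or y/z), and the far plane here is the camera's rather than one derived from the depth buffer.

```cpp
#include <cassert>
#include <cmath>

struct Camera { float tanHalfFovX, tanHalfFovY, nearZ, farZ; int width, height; };

// Slopes of the tile's four side planes: an asymmetric frustum through the eye.
struct TileFrustum { float left, right, top, bottom, nearZ, farZ; };

TileFrustum makeTileFrustum(const Camera& c, int tileX, int tileY, int tileSize) {
    auto slopeX = [&](int px) { return (2.0f * px / c.width - 1.0f) * c.tanHalfFovX; };
    auto slopeY = [&](int py) { return (1.0f - 2.0f * py / c.height) * c.tanHalfFovY; };
    return { slopeX(tileX * tileSize), slopeX((tileX + 1) * tileSize),
             slopeY(tileY * tileSize), slopeY((tileY + 1) * tileSize),
             c.nearZ, c.farZ };
}

// Signed distance from a point to the plane coord == slope * z, positive on
// the side where coord > slope * z (plane normal (1, -slope) normalised).
static float planeDistance(float coord, float z, float slope) {
    return (coord - slope * z) / std::sqrt(1.0f + slope * slope);
}

// Conservative test: the sphere is kept unless it lies fully outside a plane.
bool sphereInTile(const TileFrustum& f, float x, float y, float z, float r) {
    return  planeDistance(x, z, f.left)   >= -r &&   // right of the left plane
           -planeDistance(x, z, f.right)  >= -r &&   // left of the right plane
           -planeDistance(y, z, f.top)    >= -r &&   // below the top plane
            planeDistance(y, z, f.bottom) >= -r &&   // above the bottom plane
           z >= f.nearZ - r && z <= f.farZ + r;
}
```

For a 64x64 screen with 90-degree fields of view, the top-left 32x32 tile covers x in [-z, 0] and y in [0, z]; a sphere straddling its right edge is kept, one entirely to the right or below is culled.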


TILED RENDERING: THREAD GROUP VIEW

numthreads[ 32, 32, 1 ]
Cull 1024 particles in parallel
Add visible ones to the LDS index list
Write out to memory
‒ Particle count
‒ Particle indices
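One tile's thread group can be emulated on the CPU as below. This sketch is sequential, so unlike the GPU its output order is deterministic; `cullTile`, the `isVisible` callback and the `maxPerTile` capacity (with overflow simply dropped) are assumptions, and the `std::atomic` fetch-add stands in for an InterlockedAdd on a groupshared counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

constexpr uint32_t kThreadsPerGroup = 32 * 32;   // numthreads[32, 32, 1]

// Each "thread" culls one particle of the current batch of 1024 and, if it is
// visible, reserves a slot in the shared (LDS) list with an atomic add.
std::vector<uint32_t> cullTile(uint32_t numParticles,
                               const std::function<bool(uint32_t)>& isVisible,
                               uint32_t maxPerTile) {
    std::vector<uint32_t> ldsIndices(maxPerTile);
    std::atomic<uint32_t> ldsCount{0};
    for (uint32_t batch = 0; batch < numParticles; batch += kThreadsPerGroup) {
        for (uint32_t thread = 0; thread < kThreadsPerGroup; ++thread) {
            uint32_t particle = batch + thread;
            if (particle >= numParticles || !isVisible(particle)) continue;
            uint32_t slot = ldsCount.fetch_add(1);
            if (slot < maxPerTile) ldsIndices[slot] = particle;   // drop overflow
        }
        // On the GPU a GroupMemoryBarrierWithGroupSync() would separate batches.
    }
    // Write out to memory: the particle count followed by the particle indices.
    uint32_t count = std::min(ldsCount.load(), maxPerTile);
    std::vector<uint32_t> out;
    out.push_back(count);
    out.insert(out.end(), ldsIndices.begin(), ldsIndices.begin() + count);
    return out;
}
```

On the GPU, the 1024 threads of a batch reach the atomic add in no fixed order, which is why each tile's list has to be sorted after culling.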


TILED RENDERING: TILE COMPLEXITY


TILED RENDERING: PER-TILE BITONIC SORT

Cannot sort the global list of particles
‒ Because 1024 particles get culled in parallel, they get added to the visible list in arbitrary order
Need to sort particles per tile
‒ This is a good thing!
‒ Only need to sort a subset of the global list
‒ Particles are sorted in a single pass in LDS, rather than in main memory over multiple passes
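A per-tile sort can reuse the bitonic network on a small padded array, as in this sketch. `TileEntry` and `sortTileBackToFront` are hypothetical; padding to a power of two with minus-infinity sentinels, and sorting by view distance farthest first, are assumptions made to suit the back-to-front blend that follows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

struct TileEntry { float distance; uint32_t particle; };   // hypothetical LDS record

// Sort one tile's visible list back to front (largest view distance first).
// Pad to a power of two with sentinels that sort to the end, run the bitonic
// network, then drop the sentinels. On the GPU this runs in LDS in one
// dispatch, with a group barrier between passes.
std::vector<TileEntry> sortTileBackToFront(std::vector<TileEntry> list) {
    const std::size_t count = list.size();
    std::size_t size = 1;
    while (size < count) size *= 2;
    list.resize(size, TileEntry{-std::numeric_limits<float>::infinity(), 0});
    for (std::size_t sub = 2; sub <= size; sub *= 2)
        for (std::size_t dist = sub / 2; dist > 0; dist /= 2)
            for (std::size_t n = 0; n < size; ++n) {
                std::size_t p = n ^ dist;
                if (p <= n) continue;
                bool descending = (n & sub) == 0;   // the final merge is descending
                if ((list[n].distance < list[p].distance) == descending)
                    std::swap(list[n], list[p]);
            }
    list.resize(count);
    return list;
}
```

Five entries are padded to eight, sorted, and trimmed back to five in farthest-first order.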


TILED RENDERING: EVALUATING TILE COLOUR

numthreads[ 32, 32, 1 ]
1 thread = 1 pixel in screen space
Set accumulation colour to float4( 0, 0, 0, 0 )
For each particle in tile (back to front)
‒ Evaluate particle contribution
  ‒ UV generation & radius check
  ‒ Texture lookup
  ‒ Normal generation and lighting
‒ Manually blend
  ‒ Colour = ( srcA x srcCol ) + ( invSrcA x destCol )
  ‒ Alpha = srcA + ( invSrcA x destA )
Write result to screen-size UAV
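The manual blend above, for a single pixel, can be sketched as follows. `Colour` and `blendBackToFront` are hypothetical names; each layer is assumed to be the particle's already evaluated straight-alpha colour at this pixel, ordered farthest first.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Colour = std::array<float, 4>;   // r, g, b, a

// One thread's blend loop, back to front, starting from float4(0, 0, 0, 0):
//   colour = srcA * srcCol + invSrcA * destCol
//   alpha  = srcA + invSrcA * destA
Colour blendBackToFront(const std::vector<Colour>& layers) {
    Colour dest{0.0f, 0.0f, 0.0f, 0.0f};
    for (const Colour& src : layers) {
        const float srcA = src[3], invSrcA = 1.0f - srcA;
        for (int c = 0; c < 3; ++c) dest[c] = srcA * src[c] + invSrcA * dest[c];
        dest[3] = srcA + invSrcA * dest[3];
    }
    return dest;
}
```

Because accumulation starts from transparent black, the result is premultiplied, so compositing it back onto the scene would be sceneCol * (1 - alpha) + colour. A half-transparent red behind a half-transparent blue gives (0.25, 0, 0.5, 0.75).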


TILED RENDERING: EVALUATING TILE COLOUR (IMPROVED!)

numthreads[ 32, 32, 1 ]
1 thread = 1 pixel in screen space
Set accumulation colour to float4( 0, 0, 0, 0 )
For each particle in tile (front to back)
‒ Evaluate particle contribution
  ‒ UV generation & radius check
  ‒ Texture lookup
  ‒ Normal generation and lighting
‒ Manually blend [Bavoil08]
  ‒ Colour = ( invDestA x srcA x srcCol ) + destCol
  ‒ Alpha = srcA + ( invSrcA x destA )
‒ If ( accumulation alpha > threshold ), set accumulation alpha = 1 and bail
Write result to screen-size UAV
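A sketch of the front-to-back loop with the early-out, for one pixel. `Colour`, `BlendResult` and `blendFrontToBack` are hypothetical names, layers are assumed to be ordered nearest first, and the threshold value is left to the caller.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Colour = std::array<float, 4>;   // r, g, b, a

struct BlendResult { Colour colour; int layersEvaluated; };

// Front-to-back blend [Bavoil08]:
//   colour = invDestA * srcA * srcCol + destCol
//   alpha  = srcA + invSrcA * destA
// Once accumulated alpha passes `threshold`, alpha is snapped to 1 and the loop
// bails out, skipping particles that could no longer visibly contribute.
BlendResult blendFrontToBack(const std::vector<Colour>& layers, float threshold) {
    Colour dest{0.0f, 0.0f, 0.0f, 0.0f};
    int evaluated = 0;
    for (const Colour& src : layers) {
        ++evaluated;
        const float srcA = src[3], invDestA = 1.0f - dest[3];
        for (int c = 0; c < 3; ++c) dest[c] += invDestA * srcA * src[c];
        dest[3] = srcA + (1.0f - srcA) * dest[3];
        if (dest[3] > threshold) { dest[3] = 1.0f; break; }
    }
    return { dest, evaluated };
}
```

Without an early-out, this gives the same colour as blending the reversed list back to front with the previous slide's formula; the saving comes from the `break`, which skips every particle behind a nearly opaque one.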


TILED RENDERING: COARSE CULLING

Bin particles into an 8x8 grid
For each particle
‒ For each bin
  ‒ Test particle against bin
  ‒ Add particle if visible
UAV0 for particle indices (size = 8 x 8 x maxParticles)
‒ Array split into 64 bins using offsets
UAV1 for storing the particle count per bin (size = 8 x 8)
‒ 1 element per bin
‒ Use InterlockedAdd() to bump the bin's counter
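The coarse-culling buffer layout can be sketched on the CPU as below. `CoarseBins`, `coarseCull` and the `overlaps` callback are hypothetical; the sketch is sequential, so a plain post-increment stands in for the HLSL `InterlockedAdd(dest, value, original_value)` call noted in the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

constexpr uint32_t kBinsX = 8, kBinsY = 8, kNumBins = kBinsX * kBinsY;

// Stand-ins for the two UAVs. UAV0 is split into 64 fixed-size slices: bin b
// owns [b * maxParticles, (b + 1) * maxParticles). UAV1 holds one counter per bin.
struct CoarseBins {
    uint32_t maxParticles;
    std::vector<uint32_t> indices;   // UAV0, size 8 x 8 x maxParticles
    std::vector<uint32_t> counts;    // UAV1, size 8 x 8
    explicit CoarseBins(uint32_t maxP)
        : maxParticles(maxP), indices(kNumBins * maxP), counts(kNumBins, 0) {}
};

// Conceptually one thread per particle: test it against every bin and append
// it wherever it is visible. `overlaps` is a hypothetical particle-vs-bin test.
void coarseCull(CoarseBins& bins, uint32_t numParticles,
                const std::function<bool(uint32_t, uint32_t)>& overlaps) {
    for (uint32_t p = 0; p < numParticles; ++p) {
        for (uint32_t b = 0; b < kNumBins; ++b) {
            if (!overlaps(p, b)) continue;
            // GPU equivalent: InterlockedAdd(counts[b], 1, slot);
            uint32_t slot = bins.counts[b]++;
            assert(slot < bins.maxParticles);
            bins.indices[b * bins.maxParticles + slot] = p;
        }
    }
}
```

Sizing each slice for maxParticles means no bin can overflow, at the cost of 64 times the index memory.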


TILED RENDERING: COMPUTE SHADER SETUP

Simulation: numthreads[256, 1, 1], 1 thread per particle
‒ Output: updated particle data (position, radius, colour etc)
Coarse Culling: numthreads[256, 1, 1], 1 thread per particle
‒ LDS: per-bin frustum planes
‒ Output: per-bin particle indices
Tile Culling and Sorting: numthreads[32, 32, 1], 1 thread per particle
‒ LDS: per-tile particle indices and distances
‒ Output: per-tile sorted particle indices
Tile Rendering: numthreads[32, 32, 1], 1 thread per pixel
‒ Output: screen-space colour buffer
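The dispatch sizes follow from the thread-group shapes above by ceiling division, as this sketch shows. `groupsFor`, `DispatchSizes` and `computeDispatches` are hypothetical names, and it is assumed that the two 256-thread passes cover every particle slot and that both 32x32 passes launch one group per tile.

```cpp
#include <cassert>
#include <cstdint>

// Number of thread groups needed to cover `items` work items.
constexpr uint32_t groupsFor(uint32_t items, uint32_t perGroup) {
    return (items + perGroup - 1) / perGroup;
}

struct DispatchSizes {
    uint32_t simulation, coarseCulling;   // numthreads[256, 1, 1], 1 thread per particle
    uint32_t tilesX, tilesY;              // numthreads[32, 32, 1], one group per tile
};

DispatchSizes computeDispatches(uint32_t maxParticles, uint32_t width, uint32_t height) {
    return { groupsFor(maxParticles, 256), groupsFor(maxParticles, 256),
             groupsFor(width, 32), groupsFor(height, 32) };
}
```

For about 35K particles at 1080p this gives 137 groups for each 256-thread pass and a 60 x 34 grid (2040 tiles) for tile culling and rendering; the bottom row of tiles is only partly on screen because 1080 is not a multiple of 32.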


PERFORMANCE RESULTS: DEFAULT VIEW, ~35K PARTICLES

Mode                        Frame time (ms)*
Rasterization               5.2
Tiled                       3.4

Breakdown                   Frame time (ms)*
Simulation                  0.50
Coarse Culling              0.06
Tile Culling and Sorting    0.37
Tiled Rendering             1.86

*AMD Radeon R9 290X @ 1080p


PERFORMANCE RESULTS: IN SMOKE VIEW, ~35K PARTICLES

Mode                        Frame time (ms)*
Rasterization               27.3
Tiled                       6.2

*AMD Radeon R9 290X @ 1080p


CONCLUSIONS

Depth buffer collisions
‒ Great bang-for-buck
‒ Not perfect!
Bitonic sort
‒ Good fit for sorting on the GPU
Tiled rendering
‒ Faster than rasterization
‒ Great for combatting heavy overdraw
‒ More predictable behaviour
Future work
‒ Add arbitrary geometry for order-independent transparency (OIT)
‒ Volume tracing


QUESTIONS?

Demo with full source coming soon: http://developer.amd.com/tools/graphics-development/amd-radeon-sdk/


REFERENCES

[Tchou11] Chris Tchou, “Halo Reach Effects Tech”, GDC 2011
[Persson09] Emil Persson, http://www.humus.name/index.php?page=News&ID=266
[Cantlay07] Iain Cantlay, “High-Speed, Off-Screen Particles”, GPU Gems 3, 2007
[Harada12] Takahiro Harada et al., “Forward+: Bringing Deferred Lighting to the Next Level”, Short Papers, Eurographics 2012
[Bavoil08] Louis Bavoil et al., “Order Independent Transparency with Dual Depth Peeling”, 2008


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.