GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

DESCRIPTION

Presentation GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans at the AMD Developer Summit (APU13) Nov. 11-13, 2013.

TRANSCRIPT

Page 1: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

BULLET 3 OPENCL™ RIGID BODY SIMULATION

ERWIN COUMANS, AMD

Page 2: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL2

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL™ is a trademark of Apple Inc. Windows® and DirectX® are trademarks of Microsoft Corp. Linux is a trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.

Page 3: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

AGENDA

Introduction, Particles, Rigid Bodies

GPU Collision Detection

GPU Constraint Solving

Page 4: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

BULLET 2.82 AND BULLET 3 OPENCL™ ALPHA

Real-time C++ collision detection and rigid body dynamics library

Used in movies

‒ Maya, Houdini, Cinema 4D, Blender, Lightwave, Carrara, Posed 3D, thinking Particles, etc.

‒ Disney Animation (Bolt), PDI DreamWorks (Shrek, How to Train Your Dragon), Sony Imageworks (2012)

Games

‒ GTA IV, Disney Toy Story 3, Cars 2, Riptide GP, GP2

Industrial applications, Robotics

‒ Siemens NX9 MCD, Gazebo

Page 5: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

PARTICLES AND RIGID BODIES

Position (Center of mass, float3)

Orientation

‒ (Inertia basis frame, float4)

Page 6: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

UPDATING THE TRANSFORM

Linear velocity (float3)

Angular velocity (float3)

Page 7: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

C/C++ VERSUS OPENCL™

// C/C++: a sequential loop visits every body
void integrateTransforms(Body* bodies, int numNodes, float timeStep)
{
    for (int nodeID = 0; nodeID < numNodes; nodeID++) {
        if (bodies[nodeID].m_invMass != 0.f) {
            bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;
        }
    }
}

// OpenCL™: one work item per body replaces the loop
__kernel void integrateTransformsKernel(__global Body* bodies, int numNodes, float timeStep)
{
    int nodeID = get_global_id(0);
    if (nodeID < numNodes && (bodies[nodeID].m_invMass != 0.f)) {
        bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;
    }
}

One-to-one mapping

Read, Compute, Write

Page 8: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

OPENCL™ PARTICLES

Page 9: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

UPDATE ORIENTATION

__kernel void integrateTransformsKernel(__global Body* bodies, const int numNodes, float timeStep, float angularDamping, float4 gravityAcceleration)
{
    int nodeID = get_global_id(0);
    if (nodeID < numNodes && (bodies[nodeID].m_invMass != 0.f))
    {
        bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;   //linear velocity
        bodies[nodeID].m_linVel += gravityAcceleration * timeStep;    //apply gravity
        float4 angvel = bodies[nodeID].m_angVel;                      //angular velocity
        bodies[nodeID].m_angVel *= angularDamping;                    //add some angular damping
        float4 axis;
        float fAngle = native_sqrt(dot(angvel, angvel));
        if (fAngle * timeStep > BT_GPU_ANGULAR_MOTION_THRESHOLD)      //limit the angular motion
            fAngle = BT_GPU_ANGULAR_MOTION_THRESHOLD / timeStep;
        if (fAngle < 0.001f)
            //small angle: Taylor expansion of sin(0.5*fAngle*timeStep)/fAngle, 0.0208333 = 1/48
            axis = angvel * (0.5f * timeStep - (timeStep * timeStep * timeStep) * 0.020833333333f * fAngle * fAngle);
        else
            axis = angvel * (native_sin(0.5f * fAngle * timeStep) / fAngle);
        float4 dorn = axis;
        dorn.w = native_cos(fAngle * timeStep * 0.5f);
        float4 orn0 = bodies[nodeID].m_quat;
        float4 predictedOrn = quatMult(dorn, orn0);
        predictedOrn = quatNorm(predictedOrn);
        bodies[nodeID].m_quat = predictedOrn;                         //update the orientation
    }
}

See opencl/gpu_rigidbody/kernels/integrateKernel.cl

Page 10: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

UPDATE TRANSFORMS, HOST SETUP

ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 0, sizeof(cl_mem), &bodies);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 1, sizeof(int), &numBodies);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 2, sizeof(float), &deltaTime);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 3, sizeof(float), &angularDamping);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 4, sizeof(float4), &gravityAcceleration);

size_t workGroupSize = 64;
//pad the global size to a multiple of the work group size
size_t numWorkItems = workGroupSize*((m_numPhysicsInstances + (workGroupSize)) / workGroupSize);
if (workGroupSize > numWorkItems)
    workGroupSize = numWorkItems;
ciErrNum = clEnqueueNDRangeKernel(g_cqCommandQue, g_integrateTransformsKernel, 1, NULL, &numWorkItems, &workGroupSize, 0, 0, 0);
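In OpenCL 1.x the global size has to be a multiple of the work group size, which is why the host code pads the work item count and the kernel guards with nodeID < numNodes. A minimal sketch of that rounding in plain C follows. The helper name round_up_to_multiple is hypothetical and not part of Bullet; unlike the expression on the slide, it does not add an extra work group when the count is already a multiple.

```c
#include <assert.h>
#include <stddef.h>

/* Round count up to the next multiple of groupSize (exact ceiling). */
size_t round_up_to_multiple(size_t count, size_t groupSize)
{
    assert(groupSize > 0);
    return ((count + groupSize - 1) / groupSize) * groupSize;
}
```

For 100 bodies and a work group size of 64 this gives 128 work items; the 28 surplus work items exit through the bounds check in the kernel. A count of zero yields zero, in which case the kernel should not be enqueued at all.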

Page 11: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

MOVING THE CODE TO GPU

Create an OpenCL™ wrapper

‒ Easier use, fits code style, extra features, learn the API

Replace C++ by C

Move data to contiguous memory

Replace pointers by indices

Exploit the GPU hardware…

Page 12: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

SHARING DATA STRUCTURES AND CODE BETWEEN OPENCL™ AND C/C++

#include "Bullet3Collision/NarrowPhaseCollision/shared/b3RigidBodyData.h"
#include "Bullet3Dynamics/shared/b3IntegrateTransforms.h"

__kernel void integrateTransformsKernel(__global b3RigidBodyData_t* bodies, const int numNodes, float timeStep, float angularDamping, float4 gravityAcceleration)
{
    int nodeID = get_global_id(0);
    if (nodeID < numNodes)
    {
        integrateSingleTransform(bodies, nodeID, timeStep, angularDamping, gravityAcceleration);
    }
}

Page 13: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

PREPROCESSING OF KERNELS WITH INCLUDES IN SINGLE HEADER FILE

We want the option of embedding kernels in our C/C++ program

Expand all #include files recursively into a single stringified header file

‒ This header can be used in OpenCL™ kernels and in regular C/C++ files too

‒ The kernel binary is cached, and the cached version is invalidated based on the time stamp of the embedded kernel file

Premake, Lua and lcpp: a very small and simple C pre-processor written in Lua

‒ See https://github.com/willsteel/lcpp

Page 14: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

HOST, DEVICE, KERNELS, WORK ITEMS

[Diagram of host and device. Labels: Host, CPU, Global Host Memory, Device (GPU), L2 cache, Global Device Memory]

Page 15: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU Collision Detection

Page 16: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

RIGID BODY PIPELINE

[Pipeline diagram, running along a time axis from Start to End]

Stages

‒ Compute world space Object AABB

‒ Broad Phase Collision Detection (CD): Detect pairs

‒ Mid Phase CD: Cull complex shapes local space

‒ Narrow Phase CD: Compute contact points

‒ Constraint Solving: Setup constraints, Solve constraints

‒ Integration: Integrate position

Collision Data

‒ Collision shapes, Object AABB, Overlapping pairs, Object local space BVH, Contact points

Dynamics Data

‒ World transforms, velocities, Mass, Inertia, Constraints (contacts, joints), Forces, Gravity

Page 17: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

BOUNDING VOLUMES AND DETECT PAIRS

AABB layout

‒ MIN (X,Y,Z): X min, Y min, Z min, *

‒ MAX (X,Y,Z): X max, Y max, Z max, Object ID

Output pairs

‒ (Object ID A, Object ID B), one entry per overlapping pair
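The bounding-box layout on this slide (three minima, three maxima and an object ID) can be sketched in plain C together with the overlap test that the pair-detection kernels rely on. The struct and function names are hypothetical; the Bullet kernels use btAabbCL and TestAabbAgainstAabb2GlobalGlobal, whose exact definitions are not shown in the slides.

```c
#include <assert.h>

/* Two 16-byte rows, mirroring the MIN and MAX rows of the layout. */
typedef struct {
    float min[3];
    int   pad;       /* the '*' slot, unused in this sketch */
    float max[3];
    int   objectId;
} Aabb;

/* Two boxes overlap only if their intervals overlap on all three axes. */
int aabb_overlap(const Aabb* a, const Aabb* b)
{
    assert(a && b);
    for (int axis = 0; axis < 3; axis++) {
        if (a->max[axis] < b->min[axis] || b->max[axis] < a->min[axis])
            return 0;   /* a gap on one axis is enough to reject the pair */
    }
    return 1;
}
```

A pair that passes this test is written out as two object IDs, which is the format of the output pair list.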

Page 18: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

COMPUTE PAIRS BRUTE FORCE

__kernel void computePairsKernelOriginal(__global const btAabbCL* aabbs,
    __global int2* pairsOut, volatile __global int* pairCount,
    int numObjects, int axis, int maxPairs)
{
    int i = get_global_id(0);
    if (i >= numObjects)
        return;
    //brute force: test object i against every other object
    for (int j = 0; j < numObjects; j++)
    {
        if (i != j && TestAabbAgainstAabb2GlobalGlobal(&aabbs[i], &aabbs[j])) {
            int2 myPair;
            myPair.x = aabbs[i].m_minIndices[3]; myPair.y = aabbs[j].m_minIndices[3];
            int curPair = atomic_inc(pairCount);
            if (curPair < maxPairs)
                pairsOut[curPair] = myPair; //flush to main memory
        }
    }
}

Scatter operation

Page 19: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

DETECT PAIRS

Uniform Grid

‒ Very fast

‒ Suitable for GPU

‒ Object size restrictions

Can be mixed with other algorithms

See bullet3\src\Bullet3OpenCL\BroadphaseCollision\b3GpuGridBroadphase.cpp

[Figure: objects A to F placed in a 4×4 uniform grid with cells numbered 0 to 15]

Page 20: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

UNIFORM GRID AND PARALLEL PRIMITIVES

Radix Sort the particles based on their cell index

Use a prefix scan to compute the cell size and offset

Fast OpenCL™ and DirectX® 11 Direct Compute implementation
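The sort-then-scan idea can be illustrated sequentially in plain C. This is a sketch under stated assumptions, not the Bullet implementation: it uses a counting sort in place of the GPU radix sort, a 4×4 grid as in the earlier figure, and hypothetical function names. The exclusive prefix scan turns the per-cell counts into the offset of each cell in the sorted array.

```c
#include <assert.h>
#include <string.h>

#define GRID_DIM 4                          /* 4x4 grid, cells numbered 0..15 */
#define NUM_CELLS (GRID_DIM * GRID_DIM)

/* Map a 2D position inside the grid to a row-major cell index. */
int cell_index(float x, float y, float cellSize)
{
    int cx = (int)(x / cellSize);
    int cy = (int)(y / cellSize);
    assert(cx >= 0 && cx < GRID_DIM && cy >= 0 && cy < GRID_DIM);
    return cy * GRID_DIM + cx;
}

/* Sort object indices by cell; compute the size and offset of every cell. */
void build_grid(const int* cellOfObject, int numObjects,
                int* cellCount, int* cellOffset, int* sortedObjects)
{
    int cursor[NUM_CELLS];
    memset(cellCount, 0, NUM_CELLS * sizeof(int));
    for (int i = 0; i < numObjects; i++)
        cellCount[cellOfObject[i]]++;        /* histogram: cell sizes */

    int running = 0;                         /* exclusive prefix scan */
    for (int c = 0; c < NUM_CELLS; c++) {
        cellOffset[c] = running;
        running += cellCount[c];
    }

    memcpy(cursor, cellOffset, sizeof(cursor));
    for (int i = 0; i < numObjects; i++)     /* scatter into sorted order */
        sortedObjects[cursor[cellOfObject[i]]++] = i;
}
```

After build_grid, the objects of cell c are sortedObjects[cellOffset[c]] up to sortedObjects[cellOffset[c] + cellCount[c] - 1], so a neighbor search only has to read a few contiguous ranges.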

Page 21: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

1 AXIS SORT, SWEEP AND PRUNE

Find best SAP axis

Sort AABBs along this axis

For each object, find and add overlapping pairs
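The slide does not say how the best axis is chosen. One common heuristic, assumed here purely for illustration, is to pick the axis along which the AABB centers have the largest variance, since sorting along that axis tends to separate objects best. The names Box and best_sap_axis are hypothetical.

```c
#include <assert.h>

typedef struct { float min[3]; float max[3]; } Box;

/* Return 0, 1 or 2: the axis with the largest spread of box centers. */
int best_sap_axis(const Box* boxes, int n)
{
    assert(n > 0);
    float sum[3] = {0.f, 0.f, 0.f};
    float sumSq[3] = {0.f, 0.f, 0.f};
    for (int i = 0; i < n; i++) {
        for (int a = 0; a < 3; a++) {
            float c = 0.5f * (boxes[i].min[a] + boxes[i].max[a]);
            sum[a] += c;
            sumSq[a] += c * c;
        }
    }
    int best = 0;
    float bestVar = -1.f;
    for (int a = 0; a < 3; a++) {
        /* n times the variance; the common factor does not affect the choice */
        float var = sumSq[a] - sum[a] * sum[a] / (float)n;
        if (var > bestVar) { bestVar = var; best = a; }
    }
    return best;
}
```

For a stack of boxes spread along Y, this returns axis 1, so the sweep sorts and prunes along the direction in which the boxes actually differ.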

Page 22: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

COMPUTE PAIRS 1-AXIS SORT

__kernel void computePairsKernelOriginal(__global const btAabbCL* aabbs,
    __global int2* pairsOut, volatile __global int* pairCount,
    int numObjects, int axis, int maxPairs)
{
    int i = get_global_id(0);
    if (i >= numObjects)
        return;
    //AABBs are sorted along 'axis', so only later objects need testing
    for (int j = i+1; j < numObjects; j++)
    {
        //early exit: object j and all later objects start beyond the end of object i
        if (aabbs[i].m_maxElems[axis] < (aabbs[j].m_minElems[axis]))
            break;
        if (TestAabbAgainstAabb2GlobalGlobal(&aabbs[i], &aabbs[j])) {
            int2 myPair;
            myPair.x = aabbs[i].m_minIndices[3]; myPair.y = aabbs[j].m_minIndices[3];
            int curPair = atomic_inc(pairCount);
            if (curPair < maxPairs)
                pairsOut[curPair] = myPair; //flush to main memory
        }
    }
}

Page 23: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU MEMORY HIERARCHY

[Diagram of the GPU memory hierarchy. Labels: Global Device Memory, Compute Unit, Shared Local Memory (one per compute unit), Private Memory (registers)]

Page 24: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

BARRIER

A point in the program where all threads stop and wait

When all threads in the Work Group have reached the barrier, they can proceed

Page 25: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

KERNEL OPTIMIZATIONS FOR 1-AXIS SORT

AVOID GLOBAL ATOMICS

‒ Use private memory to accumulate overlapping pairs (append buffer)

LOCAL ATOMICS

‒ Determine early exit condition for all work items within a workgroup

LOCAL MEMORY

‒ Block to fetch AABBs and re-use them within a workgroup (barrier)

Page 26: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

KERNEL OPTIMIZATIONS (1-AXIS SORT)

Load balancing

‒ One work item per object, multiple work items for large objects

See opencl/gpu_broadphase/kernels/sapFast.cl and sap.cl

(contains un-optimized and optimized version of the kernel for comparison)

Page 27: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

SEQUENTIAL INCREMENTAL 3-AXIS SAP

Page 28: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

PARALLEL INCREMENTAL 3-AXIS SAP

Parallel sort of 3 axes

Keep old and new sorted axes

‒ 6 sorted axes in total

Page 29: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

PARALLEL INCREMENTAL 3-AXIS SAP

If the begin point or end point has the same index, do nothing

Otherwise, range scan on old AND new axis

‒ adding or removing pairs, similar to original SAP

Read-only scan is embarrassingly parallel

[Figure: sorted x-axis, old versus new]

Page 30: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

HYBRID CPU/GPU PAIR SEARCH

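[Figure: objects A to F placed in a 4×4 uniform grid with cells numbered 0 to 15]

Pair type      | Processed on
Small / Small  | GPU
Small / Large  | either
Large / Small  | either
Large / Large  | CPU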

Page 31: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

TRIANGLE MESH COLLISION DETECTION

Page 32: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU BVH TRAVERSAL

Create skip indices for faster traversal

Create subtrees that fit in Local Memory

Stream subtrees for entire wavefront/warp

Quantize Nodes

‒ 16 bytes/node
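A 16-byte quantized node and a stackless traversal driven by skip indices can be sketched in plain C. This is an illustration under assumptions: the field names, the sign convention of escapeOrLeaf and the helper functions are hypothetical and only loosely modeled on the idea in the slide, not copied from the Bullet sources.

```c
#include <assert.h>
#include <stdint.h>

/* 6 x 16-bit quantized bounds + one 32-bit index = 16 bytes. */
typedef struct {
    uint16_t qmin[3];
    uint16_t qmax[3];
    int32_t  escapeOrLeaf;  /* >= 0: leaf payload, < 0: -(nodes to skip) */
} QuantizedNode;

/* Quantize v from the range [lo, hi] to 16 bits (truncating). */
uint16_t quantize(float v, float lo, float hi)
{
    assert(hi > lo);
    float t = (v - lo) / (hi - lo);
    if (t < 0.f) t = 0.f;
    if (t > 1.f) t = 1.f;
    return (uint16_t)(t * 65535.0f);
}

int quantized_overlap(const QuantizedNode* n,
                      const uint16_t qmin[3], const uint16_t qmax[3])
{
    for (int a = 0; a < 3; a++)
        if (n->qmax[a] < qmin[a] || qmax[a] < n->qmin[a])
            return 0;
    return 1;
}

/* Nodes are stored in depth-first order, so no stack is needed. */
int count_leaf_hits(const QuantizedNode* nodes, int numNodes,
                    const uint16_t qmin[3], const uint16_t qmax[3])
{
    int hits = 0;
    int i = 0;
    while (i < numNodes) {
        int overlap = quantized_overlap(&nodes[i], qmin, qmax);
        if (nodes[i].escapeOrLeaf >= 0) { hits += overlap; i++; }  /* leaf */
        else if (overlap) i++;                 /* descend to the first child */
        else i += -nodes[i].escapeOrLeaf;      /* skip the whole subtree */
    }
    return hits;
}
```

Because a miss on an internal node advances by the stored skip count, a query that misses the root leaves the loop after a single test. A production version would round the quantized minimum down and the maximum up so the quantized box stays conservative.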

Page 33: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

COMPOUND VERSUS COMPOUND COLLISION DETECTION

Page 34: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

TREE VERSUS TREE: TANDEM TRAVERSAL

See __kernel void findCompoundPairsKernel( __global const int4* pairs …

‒ in bullet3\src\Bullet3OpenCL\NarrowphaseCollision\kernels/sat.cl

for (int p=0;p<numSubTreesA;p++) {
    for (int q=0;q<numSubTreesB;q++) {
        b3Int2 node0; node0.x = startNodeIndexA; node0.y = startNodeIndexB;
        depth = 0;
        nodeStack[depth++] = node0;   //start with the pair of subtree roots
        do {
            b3Int2 node = nodeStack[--depth];
            if (nodeOverlap) {
                if (isInternalA && isInternalB) {
                    //push all four child combinations
                    nodeStack[depth++] = b3MakeInt2(nodeAleftChild, nodeBleftChild);
                    nodeStack[depth++] = b3MakeInt2(nodeArightChild, nodeBleftChild);
                    nodeStack[depth++] = b3MakeInt2(nodeAleftChild, nodeBrightChild);
                    nodeStack[depth++] = b3MakeInt2(nodeArightChild, nodeBrightChild);
                } else {
                    if (isLeafA && isLeafB) processLeaf(…);
                    else { … } //see actual code
                }
            }
        } while (depth);
    }
}

Page 35: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

CONTACT GENERATION: GPU CONVEX HEIGHTFIELD

Dual representation

SATHE, R. 2006. Collision detection shader using cube-maps. In ShaderX5, Charles River Media

Page 36: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

SEPARATING AXIS TEST

Face normal A

Face normal B

Edge-edge normal

Uniform work suits GPU very well: one work unit processes all SAT tests for one pair

Precise solution and faster than height field approximation for low-resolution convex shapes

See opencl/gpu_sat/kernels/sat.cl

[Figure: convex shapes A and B, a candidate separating axis and the separating plane]
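The core of a single separating axis test is projecting both shapes onto a candidate axis and checking whether the two intervals are disjoint. The sketch below shows only that step in plain C, with hypothetical names; the Bullet kernel in sat.cl additionally loops over the face normals of A, the face normals of B and the edge-edge cross products, which is omitted here.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Project all vertices onto the axis and return the covered interval. */
void project(const Vec3* verts, int n, Vec3 axis, float* lo, float* hi)
{
    assert(n > 0);
    *lo = *hi = dot3(verts[0], axis);
    for (int i = 1; i < n; i++) {
        float d = dot3(verts[i], axis);
        if (d < *lo) *lo = d;
        if (d > *hi) *hi = d;
    }
}

/* Returns 1 if the projections on this axis do not overlap. */
int is_separating_axis(const Vec3* a, int na, const Vec3* b, int nb, Vec3 axis)
{
    float loA, hiA, loB, hiB;
    project(a, na, axis, &loA, &hiA);
    project(b, nb, axis, &loB, &hiB);
    return hiA < loB || hiB < loA;
}
```

Two convex shapes are disjoint as soon as any candidate axis separates them; if no candidate does, the axis with the smallest overlap is the usual choice for the contact normal. The axis does not need to be normalized for the yes/no test.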

Page 37: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

COMPUTING CONTACT POSITIONS

Given the separating normal, find the incident face

Clip the incident face using Sutherland-Hodgman clipping

One work unit performs clipping for one pair, reduces contacts and appends to contact buffer

See opencl/gpu_sat/kernels/satClipHullContacts.cl

[Figure: the incident face is clipped against the clipping planes of the reference face; n is the separating normal]
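One Sutherland-Hodgman step clips a polygon against a single plane; repeating it for every clipping plane of the reference face yields the clipped incident face. The following plain C sketch shows that single step. The names and the plane convention (points with dot(n, p) + d <= 0 are kept) are assumptions for illustration, not taken from satClipHullContacts.cl.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 lerp3(Vec3 a, Vec3 b, float t)
{
    Vec3 r = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
    return r;
}

/* Clip polygon 'in' (n vertices) against the plane dot(pn, p) + pd <= 0.
   'out' needs room for 2 * n vertices. Returns the output vertex count. */
int clip_polygon(const Vec3* in, int n, Vec3 pn, float pd, Vec3* out)
{
    assert(n >= 0);
    if (n == 0) return 0;
    int count = 0;
    Vec3 prev = in[n - 1];
    float dPrev = dot3(pn, prev) + pd;
    for (int i = 0; i < n; i++) {
        Vec3 cur = in[i];
        float dCur = dot3(pn, cur) + pd;
        if (dPrev <= 0.f) {
            if (dCur <= 0.f) out[count++] = cur;                      /* inside to inside */
            else out[count++] = lerp3(prev, cur, dPrev / (dPrev - dCur)); /* leaving */
        } else if (dCur <= 0.f) {
            out[count++] = lerp3(prev, cur, dPrev / (dPrev - dCur));  /* entering */
            out[count++] = cur;
        }
        prev = cur;
        dPrev = dCur;
    }
    return count;
}
```

Clipping the square (0,0), (2,0), (2,2), (0,2) against the plane x <= 1 keeps the left half and produces the four vertices (0,0), (1,0), (1,2), (0,2).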

Page 38: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

SAT ON GPU

Break the algorithm into pipeline stages, separated into many kernels

‒ findSeparatingAxisKernel

‒ findClippingFacesKernel

‒ clipFacesKernel

‒ contactReductionKernel

Concave and compound cases produce even more stages

‒ bvhTraversalKernel, findConcaveSeparatingAxisKernel, findCompoundPairsKernel, processCompoundPairsPrimitivesKernel, processCompoundPairsKernel, findConcaveSphereContactsKernel, clipHullHullConcaveConvexKernel

Page 39: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU CONTACT REDUCTION

See newContactReductionKernel in opencl/gpu_sat/kernels/satClipHullContacts.cl

Page 40: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU Constraint Solving

Page 41: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

REORDERING CONSTRAINTS REVISITED

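[Figure: constraint graph of bodies A, B, C, D connected by constraints 1 to 4]

Constraints and the bodies they touch

             A  B  C  D
    1        1  1
    2           2  2
    3              3  3
    4        4        4

After reordering into batches

             A  B  C  D
    Batch 0  1  1  3  3
    Batch 1  4  2  2  4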

Page 42: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

CPU SEQUENTIAL BATCH CREATION

while( nIdxSrc ) {
    nIdxDst = 0; int nCurrentBatch = 0;
    for(int i=0; i<N_FLG/32; i++) flg[i] = 0; //clear flag
    for(int i=0; i<nIdxSrc; i++) {
        int idx = idxSrc[i]; btAssert( idx < n );
        //check if it can go
        int aIdx = cs[idx].m_bodyAPtr & FLG_MASK; int bIdx = cs[idx].m_bodyBPtr & FLG_MASK;
        u32 aUnavailable = flg[ aIdx/32 ] & (1<<(aIdx&31));
        u32 bUnavailable = flg[ bIdx/32 ] & (1<<(bIdx&31));
        if( aUnavailable==0 && bUnavailable==0 ) {
            flg[ aIdx/32 ] |= (1<<(aIdx&31)); flg[ bIdx/32 ] |= (1<<(bIdx&31));
            cs[idx].getBatchIdx() = batchIdx;
            sortData[idx].m_key = batchIdx; sortData[idx].m_value = idx;
            nCurrentBatch++;
            if( nCurrentBatch == simdWidth ) {
                nCurrentBatch = 0;
                for(int i=0; i<N_FLG/32; i++) flg[i] = 0;
            }
        }
        else {
            idxDst[nIdxDst++] = idx;
        }
    }
    swap2( idxSrc, idxDst ); swap2( nIdxSrc, nIdxDst );
    batchIdx ++;
}
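The greedy idea behind sequential batch creation can be reduced to a small self-contained C sketch: a constraint joins the current batch only if neither of its bodies is already used in that batch. This is a simplification under stated assumptions, with hypothetical names; it uses one byte per body instead of a bit field and leaves out the SIMD-width reset and the masking of body indices that appear in the slide code.

```c
#include <assert.h>
#include <string.h>

#define MAX_BODIES 64

typedef struct { int bodyA, bodyB; } Constraint;

/* Writes a batch index per constraint; returns the number of batches.
   Constraints in the same batch never share a body. */
int create_batches(const Constraint* cs, int n, int* batchOf)
{
    int remaining = n;
    int batch = 0;
    for (int i = 0; i < n; i++) batchOf[i] = -1;
    while (remaining > 0) {
        unsigned char used[MAX_BODIES];
        memset(used, 0, sizeof(used));          /* clear flags for the new batch */
        for (int i = 0; i < n; i++) {
            if (batchOf[i] >= 0) continue;      /* already assigned */
            int a = cs[i].bodyA, b = cs[i].bodyB;
            assert(a >= 0 && a < MAX_BODIES && b >= 0 && b < MAX_BODIES);
            if (!used[a] && !used[b]) {         /* both bodies still free */
                used[a] = 1;
                used[b] = 1;
                batchOf[i] = batch;
                remaining--;
            }
        }
        batch++;
    }
    return batch;
}
```

For four bodies A to D connected in a ring by constraints 1 (A-B), 2 (B-C), 3 (C-D) and 4 (D-A), the sketch places constraints 1 and 3 in batch 0 and constraints 2 and 4 in batch 1. All constraints within one batch can then be solved in parallel without write conflicts.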

Page 43: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU ITERATIVE BATCHING

Parallel threads in workgroup (same SIMD) use local atomics to lock rigid bodies

Before locking attempt, first check if bodies are already used in previous iterations

See “A parallel constraint solver for a rigid body simulation”, Takahiro Harada, http://dl.acm.org/citation.cfm?id=2077378.2077406

and opencl\gpu_rigidbody\kernels\batchingKernels.cl

[Figure: bodies A, B, C, D start out unused; constraints reserve their bodies as they are appended to Batch 0]

For each batch

‒ For each unassigned constraint

‒ Try to reserve bodies

‒ Append constraint to batch

Page 44: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU PARALLEL TWO STAGE BATCH CREATION

Cell size > maximum dynamic object size

Constraints are assigned to a cell

‒ based on the center-of-mass location of the first active rigid body of the pair-wise constraint

Non-neighboring cells can be processed in parallel

Page 45: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

MASS SPLITTING+JACOBI ~= PGS

See “Mass Splitting for Jitter-Free Parallel Rigid Body Simulation” by Tonge et al.

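[Figure: constraint graph of bodies A, B, C, D with constraints 1 to 4. Bodies shared by several constraints are split into copies (B0/B1, C0/C1), the constraints are solved with Parallel Jacobi, and the velocities of the copies are then averaged]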
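The bookkeeping behind mass splitting can be shown with a one-dimensional plain C sketch. It assumes that a body touched by numCopies constraints is split into that many copies, each carrying an equal share of the mass, that every constraint applies its impulse to its own copy independently (the Jacobi step), and that the copies are merged by averaging their velocities. The function name is hypothetical and the impulses are given as inputs; the actual solver in the referenced paper computes them iteratively.

```c
#include <assert.h>

/* Velocity of a body after one split / solve / average round in 1D. */
float merged_velocity(float mass, float velocity,
                      const float* impulses, int numCopies)
{
    assert(mass > 0.f && numCopies > 0);
    float copyInvMass = (float)numCopies / mass;  /* each copy has mass / numCopies */
    float sum = 0.f;
    for (int i = 0; i < numCopies; i++)
        sum += velocity + impulses[i] * copyInvMass;  /* copies updated independently */
    return sum / (float)numCopies;                    /* average the copies */
}
```

Algebraically the result equals velocity + (sum of impulses) / mass, so averaging the copies gives the same velocity as applying all impulses to the unsplit body, and momentum is preserved. With mass 2, velocity 1 and impulses +1 and -3 the two copies end at +2 and -2, and the merged velocity is 0.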

Page 46: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU NON-CONTACT CONSTRAINTS, JOINTS

Page 47: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

GPU NON-CONTACT CONSTRAINTS, JOINTS

getInfo1Kernel and getInfo2Kernel with a switch statement replace the virtual methods in Bullet 2.x

See bullet3\src\Bullet3OpenCL\RigidBody\kernels\jointSolver.cl

__kernel void getInfo1Kernel(__global unsigned int* infos, __global b3GpuGenericConstraint* constraints, int numConstraints)
__kernel void getInfo2Kernel(__global b3SolverConstraint* solverConstraintRows, ..

switch (constraint->m_constraintType)
{
    case B3_GPU_POINT2POINT_CONSTRAINT_TYPE:
    case B3_GPU_FIXED_CONSTRAINT_TYPE:
}
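Replacing virtual methods by a switch on a type tag can be illustrated in plain C. In this sketch the enum, the struct and the function are hypothetical stand-ins for the Bullet types, and the row counts assume one solver row per removed degree of freedom: three for a point-to-point joint, six for a fixed joint.

```c
#include <assert.h>

typedef enum {
    POINT2POINT_CONSTRAINT = 0,   /* hypothetical tags, not the Bullet enum */
    FIXED_CONSTRAINT = 1
} ConstraintType;

typedef struct {
    ConstraintType type;
    int bodyA, bodyB;
} GenericConstraint;

/* First pass: report how many solver rows a constraint needs. */
int num_constraint_rows(const GenericConstraint* c)
{
    assert(c);
    switch (c->type) {
    case POINT2POINT_CONSTRAINT: return 3;  /* removes 3 translational DOF */
    case FIXED_CONSTRAINT:       return 6;  /* removes all 6 relative DOF */
    default:                     return 0;  /* unknown type: no rows */
    }
}
```

A plain-data struct with a type tag can be copied to the device as-is, whereas an object with a virtual function table cannot be used in an OpenCL C kernel. A prefix scan over the per-constraint row counts then gives each constraint its offset into the row buffer for the second pass.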

Page 48: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

DETERMINISTIC RESULTS

Projected Gauss Seidel requires solving rows in the same order

Sort the constraint rows (contacts, joints)

Solve constraint batches in the same order
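Sorting the constraint rows into a fixed order can be sketched in plain C with qsort. The assumption here is that each row carries its two body indices plus a tie-breaking key that does not depend on the order in which the GPU happened to emit the rows; the field name featureId is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int bodyA, bodyB, featureId; } Row;

/* Total order on (bodyA, bodyB, featureId). */
int compare_rows(const void* pa, const void* pb)
{
    const Row* a = (const Row*)pa;
    const Row* b = (const Row*)pb;
    if (a->bodyA != b->bodyA) return a->bodyA < b->bodyA ? -1 : 1;
    if (a->bodyB != b->bodyB) return a->bodyB < b->bodyB ? -1 : 1;
    if (a->featureId != b->featureId) return a->featureId < b->featureId ? -1 : 1;
    return 0;
}

void sort_rows(Row* rows, int n)
{
    assert(n >= 0);
    qsort(rows, (size_t)n, sizeof(Row), compare_rows);
}
```

qsort is not a stable sort, so the result is only reproducible when the keys form a total order with no ties, which is why the tie-breaker is part of the comparison.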

Page 49: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

DYNAMICA PLUGIN FOR MAYA WITH OPENCL™

Page 50: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

AMD CODEXL OPENCL™ DEBUGGER AND PROFILER

Page 51: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

STACKING TEST

Page 52: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

FUTURE WORK

DirectX®11 DirectCompute port

Multi GPU, multi-core, MPI

Move over Bullet 2 to Bullet 3, hybrid of CPU and GPU

‒ Featherstone, direct solvers on CPU

Cloth and Fluid simulation, TressFX hair, with two-way interaction

Extend GPU-PGS solver to GPU-NNCG

‒ Non-smooth non-linear conjugate gradient solver

Improve GPU Ray intersection tests

Page 53: GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans

THANK YOU!

Visit http://bulletphysics.org for more information. All source code is available:

http://github.com/erwincoumans/bullet3

‒ Lets you fork, report issues and request features

Windows®, Linux®, Mac OS X

AMD and NVIDIA GPU

‒ Preferably high-end desktop GPU