gpu opencl physics - nvidia · overview •introduction •particle physics pipeline from the...

47
OpenCL Game Physics Bullet: A Case Study in Optimizing Physics Middleware for the GPU Erwin Coumans

Upload: trankhanh

Post on 27-Jun-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

OpenCL Game PhysicsBullet: A Case Study in Optimizing Physics Middleware for the GPU

Erwin Coumans

Page 2: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Overview

• Introduction• Particle Physics Pipeline from the NVIDIA SDK

– Uniform grid, radix or bitonic sort, prefix scan, Jacobi

• Rigid Body Physics Pipeline– Parallel Neighbor search using dynamic BVH trees– Neighboring Pair Management– Convex Collision Detection: GJK in OpenCL on GPU– Concave Collision Detection using BVHs– Parallel Constraint Solving using PGS

• OpenCL cross-platform and debugging

Page 3: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Introduction

• Bullet is an open source Physics SDK used by game developers and movie studios

• PC, Mac, iPhone, Wii, Xbox 360, PlayStation 3

• Bullet 3.x will support OpenCL acceleration

– Simplified rigid body pipeline fully running on GPU

– Developer can mix stages between CPU and GPU

• Implementation is available, links at the end

Page 4: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Some games using Bullet Physics

Page 5: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Some movies using Bullet Physics

Page 6: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Authoring tools

• Maya Dynamica Plugin

• Cinema 4D 11.5

• Blender

Page 7: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Rigid Body Scenarios

3000 falling boxes 1000 stacked boxes 136 ragdolls

1000 convex hulls 1000 convex against trimesh

ray casts against 1000 primitives and trimesh

Page 8: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Performance bottlenecks

Forward Dynamics Collision Detection

Detect

Pairs

Compute

Contacts

Forward Dynamics

Setup

Constraints

Solve

constraints

Integrate

Position

Apply

Gravity

Collision

Shapes

Motionstate

Transforms,

Velocities

Rigid

Bodies

Mass,Inertia

Over

lapping

Pairs

Constraints

contacts,joints

Object

AABBs

Collision Data Dynamics Data

Compute

AABBs

Contact

Points

Predict

Transforms

Page 9: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Leveraging the NVidia SDK

• Radix sort, bitonic sort

• Prefix scan, compaction

• Examples how to use fast shared memory

• Uniform Grid example in Particle Demo

Page 10: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Particle Physics CUDA and OpenCL Demo

Page 11: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Uniform Grid

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Cell ID Count Particle ID

0 0

1 0

2 0

3 0

4 2 D,F

5 0

6 3 B,C,E

7 0

8 0

9 1 A

10 0

11 0

12 0

13 0

14 0

15 0

Page 12: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Sorting Particles per Cell

Cell Index Cell Start

0

1

2

3

4 0

5

6 2

7

8

9 5

10

11

12

13

14

15

Array Index

Unsorted Cell ID, Particle ID

Sorted Cell ID Particle ID

0 9, A 4,D

1 6,B 4,F

2 6,C 6,B

3 4,D 6,C

4 6,E 6,E

5 4,F 9,A

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Page 13: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Neighbor search

• Calculate grid index of particle center

• Parallel Radix or Bitonic Sorted Hash Array

• Search 27 neighboring cells

– Can be reduced to 14 because of symmetry

• Interaction happens during search

– No need to store neighbor information

• Jacobi iteration: independent interactions

Page 14: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Interacting Particle Pairs

Array Index

Sorted Cell ID Particle ID

0 4,D

1 4,F

2 6,B

3 6,C

4 6,E

5 9,A

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Interacting Particle Pairs

D,F

B,C

B,E

C,E

A,D

A,F

A,B

A,C

A,E

Page 15: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Using the GPU Uniform Grid as part of the Bullet CPU pipeline

• Available through btCudaBroadphase

• Reduce bandwidth and avoid sending all pairs

• Bullet requires persistent contact pairs

– to store cached solver information (warm-starting)

• Pre-allocate pairs for each object

Page 16: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Persistent Pairs

Before After

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Particle Pairs Before

After Differences

D,F D,F A,B removed

B,C B,C B,C removed

B,E B,E C,F added

C,E C,E C,D added

A,D A,D

A,F A,F

A,B A,C

A,C A,E

A,E C,F

C,D

0

1

2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Page 17: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Broadphase benchmark

• Includes btCudaBroadphase

• Bullet SDK: Bullet/Extras/CDTestFramework

Page 18: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

From Particles to Rigid Bodies

Particles Rigid Bodies

World Transform Position Position and Orientation

Neighbor Search Uniform Grid Dynamic BVH tree

Compute Contacts Sphere-Sphere Generic Convex Closest Points, GJK

Static Geometry Planes Concave Triangle Mesh

Solving method Jacobi Projected Gauss Seidel

Page 19: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Dynamic BVH Trees

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

B

C E

D

F

ADF

BC E A

Page 20: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Dynamic BVH tree acceleration structure

• Broadphase n-body neighbor search

• Ray and convex sweep test

• Concave triangle meshes

• Compound collision shapes

Page 21: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Dynamic BVH tree Broadphase

• Keep two dynamic trees, one for moving objects, other for objects (sleeping/static)

• Find neighbor pairs:

– Overlap M versus M and Overlap M versus S

DF

BC E A

QP

UR S T

S: Non-moving DBVT M: Moving DBVT

Page 22: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

DBVT Broadphase Optimizations

• Objects can move from one tree to the other

• Incrementally update, re-balance tree

• Tree update hard to parallelize

• Tree traversal can be parallelized on GPU

– Idea proposed by Takahiro Harada at GDC 2009

Page 23: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Parallel GPU Tree Traversal using History Flags

• Alternative to recursive or stackless traversal

00

00

00

00

Page 24: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Parallel GPU Tree Traversal using History Flags

• 2 bits at each level indicating visited children

10

00

00

00

Page 25: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Parallel GPU Tree Traversal using History Flags

• Set bit when descending into a child branch

10

10

00

00

Page 26: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Parallel GPU Tree Traversal using History Flags

• Reset bits when ascending up the tree

10

10

00

00

Page 27: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Parallel GPU Tree Traversal using History Flags

• Requires only twice the tree depth bits

10

11

00

00

Page 28: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Parallel GPU Tree Traversal using History Flags

• When both bits are set, ascend to parent

10

11

00

00

Page 29: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Parallel GPU Tree Traversal using History Flags

• When both bits are set, ascend to parent

10

00

00

00

Page 30: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

History tree traversal

do{

if(Intersect(n->volume,volume)){

if(n->isinternal()) {

if (!historyFlags[curDepth].m_visitedLeftChild){

historyFlags[curDepth].m_visitedLeftChild = 1;

n = n->childs[0];

curDepth++;

continue;}

if (!historyFlags[curDepth].m_visitedRightChild){

historyFlags[curDepth].m_visitedRightChild = 1;

n = n->childs[1];

curDepth++;

continue;}

}

else

policy.Process(n);

}

n = n->parent;

historyFlags[curDepth].m_visitedLeftChild = 0;

historyFlags[curDepth].m_visitedRightChild = 0;

curDepth--;

} while (curDepth);

Page 31: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Find contact points

• Closest points, normal and distance

• Convention: positive distance -> separation

• Contact normal points from B to A

Object A B

Page 32: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Voxelizing objects

Page 33: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

OpenCL Rigid Particle Bunnies

Page 34: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Broadphase

• The bunny demo broadphase has entries for each particle to avoid n^2 tests

• Many sphere-sphere contact pairs between two rigid bunnies

• Uniform Grid is not sufficient

Page 35: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Voxelizing objects

Page 36: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

General convex collision detection on GPU

• Bullet uses hybrid GJK algorithm with EPA

• GJK convex collision detection fits current GPU

• EPA penetration depth harder to port to GPU

– Larger code size, dynamic data structures

• Instead of EPA, sample penetration depth

– Using support mapping

• Support map can be sampled using GPU hardware

Page 37: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Parallelizing Constraint Solver

• Projected Gauss Seidel iterations are not embarrassingly parallel

A

B D

1 4

Page 38: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Reordering constraint batches

A

B D

1 4

A B C D

1 1

2 2

3 3

4 4

A B C D

1 1 3 3

4 2 2 4

Page 39: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Creating Parallel Batches

Page 40: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

OpenCL kernel Setup Batches

__kernel void kSetupBatches(...)

{

int index = get_global_id(0);

int currPair = index;

int objIdA = pPairIds[currPair * 2].x;

int objIdB = pPairIds[currPair * 2].y;

int batchId = pPairIds[currPair * 2 + 1].x;

int localWorkSz = get_local_size(0);

int localIdx = get_local_id(0);

for(int i = 0; i < localWorkSz; i++)

{

if((i==localIdx)&&(batchId < 0)&&(pObjUsed[objIdA]<0)&&(pObjUsed[objIdB]<0))

{

if(pObjUsed[objIdA] == -1)

pObjUsed[objIdA] = index;

if(pObjUsed[objIdB] == -1)

pObjUsed[objIdB] = index;

}

barrier(CLK_GLOBAL_MEM_FENCE);

}

}

Page 41: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Colored Batches

Page 42: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

CPU 3Ghz single thread, 2D, 185ms

Page 43: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Geforce 260 CUDA, 2D, 21ms

Page 44: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

CPU 3Ghz single thread, 3D, 12ms

Page 45: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Geforce 260 CUDA, 3D, 4.9ms

Page 46: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

OpenCL Implementation

• Available in SVN branches/OpenCL

– http://bullet.googlecode.com

• Tested various OpenCL implementations

– NVidia GPU on Windows PC

– Apple Snow Leopard on Geforce GPU and CPU

– Intel, AMD CPU, ATI GPU (available soon)

– Generic CPU through MiniCL

• OpenCL kernels compiled and linked as regular C

• Multi-threaded or sequential for easier debugging

Page 47: GPU OpenCL Physics - Nvidia · Overview •Introduction •Particle Physics Pipeline from the NVIDIA SDK –Uniform grid, radix or bitonic sort, prefix scan, Jacobi •Rigid Body

Thanks!

• Questions?

• Visit the Physics Simulation Forum at

– http://bulletphysics.com

• Email: [email protected]