porting bullet to opencl - khronos group · summary •introduction to bullet physics sdk...

35
© Copyright Khronos Group, 2010 - Page 1 Porting Bullet to OpenCL Erwin Coumans Simulation Team Lead Sony Computer Entertainment US R&D

Upload: others

Post on 17-Jul-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Khronos Group, 2010 - Page 1

Porting Bullet to OpenCL

Erwin CoumansSimulation Team Lead

Sony Computer Entertainment US R&D

Page 2: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

- FindOpenCL.cmake

Page 3: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 3

Some games using Bullet Physics

Page 4: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 4

Some Movies using Bullet Physics

Page 5: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 5

Authoring Tools

• Maya Dynamica Plugin

• Cinema 4D 11.5

• Blender

• Houdini

• Lightwave CORE

• TBA: Standalone Physics Editor

Page 6: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 6

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

Page 7: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 7

Leveraging the AMD, Apple & NVIDIA SDK

• Radix sort, bitonic sort

• Prefix scan, compaction

• Examples how to use fast shared memory

• Uniform Grid example in Particle Demo

Page 8: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 8

Neighbor Search

• Calculate grid index of particle center

• Parallel Radix or Bitonic Sorted Hash Array

• Search 27 neighboring cells

- Can be reduced to 14 because of symmetry

• Interaction happens during search

- No need to store neighbor information

• Jacobi iteration: independent interactions

Page 9: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 9

Interacting Particle Pairs

Array Index

Sorted Cell ID Particle ID

0 4,D

1 4,F

2 6,B

3 6,C

4 6,E

5 9,A

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Interacting Particle Pairs

D,F

B,C

B,E

C,E

A,D

A,F

A,B

A,C

A,E

Page 10: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 10

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

Page 11: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

Implement and debug on CPU

• Keep a reference CPU implementation and transfer memory to CPU and back

- All stages in the physics pipeline can be hot-swapped between CPU and GPU

• We implemented MiniCL, a non-compliant OpenCL replacement

- OpenCL kernels compiled and linked as regular C

- Use regular MSVC debugger, breakpoints etc.

- Multi-threaded or sequential for easier debugging

Page 12: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 12

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

Page 13: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 13

Compress data before sending

• Using the GPU Uniform Grid as part of the Bullet CPU pipeline

• Available through btOpenCLBroadphase

• Reduce bandwidth and avoid sending all pairs

• Bullet requires persistent contact pairs

- to store cached solver information (warm-starting)

• Pre-allocate pairs for each object

Page 14: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 14

Persistent Pairs

Before After Difference

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Particle Pairs Before

After Differences

D,F D,F A,B removed

B,C B,C B,C removed

B,E B,E C,F added

C,E C,E C,D added

A,D A,D

A,F A,F

A,B A,C

A,C A,E

A,E C,F

C,D

0

1

2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

Page 15: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 15

Summary

• Introduction to Bullet Physics SDK

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Implement and debug kernels on CPU first

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

Page 16: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 16

From particles to rigid bodies

Particles Rigid Bodies

World Transform Position Position and Orientation

Neighbor Search Uniform Grid Dynamic BVH tree

Compute Contacts Sphere-Sphere Generic Convex Closest Points, GJK

Static Geometry Planes Concave Triangle Mesh

Solving method Jacobi Projected Gauss Seidel

Page 17: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 17

Dynamic BVH Trees

0 1 2 3

12 13 14 15

5 7

8 10 11

B

C E

D

F

A

B

C E

D

F

ADF

BC E A

Page 18: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 18

Dynamic BVH tree acceleration structure

• Broadphase n-body neighbor search

• Ray and convex sweep test

• Concave triangle meshes

• Cloth and Soft body

• Compound collision shapes

• Occlusion culling and view frustum culling

Page 19: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 19

Dynamic BVH tree (instead of uniform grid)

• Keep two dynamic trees, one for moving

objects, other for objects (sleeping/static)

• Find neighbor pairs:

- Overlap M versus M and Overlap M versus S

DF

BC E A

QP

UR S T

S: Non-moving DBVT M: Moving DBVT

Page 20: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 20

Some Optimizations

• Objects can move from one tree to the other

• Incrementally update, re-balance tree

• Tree creation and update is hard to parallelize

- Research and development in progress

• Tree traversal can be parallelized on GPU

- Idea proposed by Takahiro Harada at GDC 2009

Page 21: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 21

Parallel GPU Tree Traversal using History Flags

• Alternative to recursive or stackless traversal

00

00

00

00

Page 22: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 22

Parallel GPU Tree Traversal using History Flags

• 2 bits at each level indicating visited children

10

00

00

00

Page 23: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 23

Parallel GPU Tree Traversal using History Flags

• Set bit when descending into a child branch

10

10

00

00

Page 24: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 24

Parallel GPU Tree Traversal using History Flags

• Reset bits when ascending up the tree

10

10

00

00

Page 25: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 25

Parallel GPU Tree Traversal using History Flags

• Requires only twice the tree depth bits

10

11

00

00

Page 26: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 26

Parallel GPU Tree Traversal using History Flags

• When both bits are set, ascend to parent

10

11

00

00

Page 27: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 27

Parallel GPU Tree Traversal using History Flags

• When both bits are set, ascend to parent

10

00

00

00

Page 28: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 28

History tree traversaldo{

if(Intersect(n->volume,volume)){

if(n->isinternal()) {

if (!historyFlags[curDepth].m_visitedLeftChild){

historyFlags[curDepth].m_visitedLeftChild = 1;

n = n->childs[0];curDepth++;

continue;}

if (!historyFlags[curDepth].m_visitedRightChild){

historyFlags[curDepth].m_visitedRightChild = 1;

n = n->childs[1];curDepth++;

continue;}

}

else

policy.Process(n);

}

n = n->parent;

historyFlags[curDepth].m_visitedLeftChild = 0;

historyFlags[curDepth].m_visitedRightChild = 0;

curDepth--;

} while (curDepth);

Page 29: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 29

Voxelizing objects

Page 30: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 30

General convex collision detection on GPU

• Algorithms are branchy (GJK, EPA)

• Parts of the computation can be accelerated using GPU hardware

- Replace support map by cube map lookup

Page 31: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 31

Parallelizing Constraint Solver

• Projected Gauss Seidel iterations are not embarrassingly parallel

A

B D

1 4

Page 32: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 32

Reordering constraint batches

A B C D

1 1

2 2

3 3

4 4

A

B D

1 4

A B C D

1 1 3 3

4 2 2 4

Page 33: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 33

OpenCL kernel Setup Batches__kernel void kSetupBatches(...)

{

int index = get_global_id(0);

int currPair = index;

int objIdA = pPairIds[currPair * 2].x;

int objIdB = pPairIds[currPair * 2].y;

int batchId = pPairIds[currPair * 2 + 1].x;

int localWorkSz = get_local_size(0);

int localIdx = get_local_id(0);

for(int i = 0; i < localWorkSz; i++){

if((i==localIdx)&&(batchId < 0)&&(pObjUsed[objIdA]<0)&&(pObjUsed[objIdB]<0)){

if(pObjUsed[objIdA] == -1)

pObjUsed[objIdA] = index;

if(pObjUsed[objIdB] == -1)

pObjUsed[objIdB] = index;

}

barrier(CLK_GLOBAL_MEM_FENCE);

}

}

Page 34: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 34

Summary

• Implement and debug kernels on CPU first

• Leverage the OpenCL SDKs from AMD, NVidia and Apple

• Change data structures from Array Of Structures to Structures of Arrays

• Choose appropriate algorithm

- Tree traversal from recursive to stackless to history traversal

• Compress data before sending it over the PCIe bus

- Only exchange the differences

• Use CMake for cross platform project files for MSVC, Xcode, Makefiles

Page 35: Porting Bullet to OpenCL - Khronos Group · Summary •Introduction to Bullet Physics SDK •Leverage the OpenCL SDKs from AMD, NVidia and Apple •Implement and debug kernels on

© Copyright Erwin Coumans, 2010 - Page 35

Thanks!

• Available in SVN branches/OpenCL

- http:\\bullet.googlecode.com

• Visit the Physics Simulation Forum at

- http:\\bulletphysics.org

• For technical details see also Takahiro Harada’s GDC talk

• Email: [email protected]