
Collision Overload: Reducing the Impact in Real-time Physics

Final Report

Ian Robert Ballantyne

Supervisor: Tony Field

Second Marker: Paul Kelly

Copyright © Ian Ballantyne, 2007. All Rights Reserved

To my Parents, Jackie and Colin, and my brother Jamie

for their continued support throughout my studies.

Acknowledgements

I’d like to thank the following people for their contributions:

• My supervisor, Tony Field - For helping me focus my ideas and getting excited about the results.

• My friends, Will, John, Joel and Richard - For their comments and discussions on colliding objects and the challenges of implementing a solution.

• My fellow computing students, Dave and Islay - For their interest in my work and ideas.

• The Computing Support Group - For providing me with a dedicated machine to “crash boxes together”.

Trademarks

The following trademarks are mentioned at various points during this paper:

• DirectX and Direct3D are trademarks of Microsoft Corporation.

• PhysX is a trademark of AGEIA Technologies Inc.

• Havok FX and Havok Physics are trademarks of Havok.com Inc.

• GeForce, SLI and CUDA are trademarks of NVIDIA Corporation.

• Radeon is a trademark of ATI Technologies Inc.

Abstract

In the field of real-time physics simulations we have a dilemma. Users want to experience realistic object interactions at a high “level of detail”. Achieving this involves a trade-off between performance and accuracy for the designers. The solution is to find a “level of detail” that provides suitable performance for a nominal execution cost. Designers are aware of the sporadic nature of physics and so “err on the side of caution” by using simpler physical representations.

This project attempts to reduce the gap in performance between normal execution and the more complex cases where many objects collide together at a single point. We have called this the collision overload problem. We identify “collision detection” as the main bottleneck in physics simulations. We propose a method for dynamic switching among different “levels of detail”, based on testing safety conditions (encapsulation levels) that ensure the stability of the simulation. The success of the technique relies on identifying when to switch models and investigating how the different changes affect performance. The report proposes the concept of global, group and local scope for decision making and evaluates the alternatives in an implementation based on the Bullet physics engine. We show that our technique can improve the situation and we also provide a framework called Scatter that can be used to further analyse dynamic model reduction.

Contents

1 Introduction
    1.1 The Collision Overload Problem
    1.2 Approach
    1.3 Contributions

2 Background
    2.1 Physics System Terminology
        2.1.1 Basics of Physics
        2.1.2 Newtonian vs. Lagrangian Dynamics
        2.1.3 Rigid-Bodies vs. Deformable-Bodies
    2.2 Physics Concepts
        2.2.1 Linear Complementarity Problem (LCP)
        2.2.2 The Gilbert-Johnson-Keerthi Distance Algorithm (GJK)
    2.3 Collision Detection
        2.3.1 Broad-phase
            2.3.1.1 Bounding Spheres
            2.3.1.2 Oriented Bounding Boxes (OBB)
            2.3.1.3 Axis-Aligned Bounding Boxes (AABB)
        2.3.2 Narrow-phase
        2.3.3 Continuous/Discrete Collision Detection (CCD & DCD)
    2.4 Collision Response
        2.4.1 Resolving Contact
    2.5 Simulation Loops
    2.6 Game Loops
    2.7 Real-time Physics Simulator Design
        2.7.1 Physics Time-step In Detail
            2.7.1.1 Unconstrained Motion
            2.7.1.2 Collision Detection
            2.7.1.3 Non-Contact Constrained Motion
            2.7.1.4 Collision Response: Contact Constraints
            2.7.1.5 Integrators
        2.7.2 Modular Design
    2.8 Existing Physics Technologies
        2.8.1 Commercial
            2.8.1.1 Havok Physics
            2.8.1.2 AGEIA PhysX (Formerly Novodex)
        2.8.2 Open Source
            2.8.2.1 Open Dynamics Engine
            2.8.2.2 Bullet Physics Library
    2.9 Level of Detail for Physics
    2.10 Hardware for Physics
        2.10.1 GPU
            2.10.1.1 Pipeline Architecture
            2.10.1.2 Vertex Processor
            2.10.1.3 Fragment Processor
            2.10.1.4 Geometry Shader
            2.10.1.5 CUDA
            2.10.1.6 Appropriate Physics Utilisations
            2.10.1.7 Limitations
        2.10.2 PPU
            2.10.2.1 Architecture
            2.10.2.2 Limitations
            2.10.2.3 Appropriate Physics Utilisations
    2.11 Report Terminology

3 Investigation
    3.1 Physics Engine Analysis
        3.1.1 Case Study: Bullet Physics Library
            3.1.1.1 Modular Design
            3.1.1.2 Algorithms
            3.1.1.3 Improved Performance
        3.1.2 Bottlenecks of Physics Simulators
            3.1.2.1 Full Narrowphase Intersection Testing
            3.1.2.2 Unused Calculation Results
            3.1.2.3 Maintaining Structures
            3.1.2.4 Excessive Contact Points
            3.1.2.5 Complex Intersection Algorithms
            3.1.2.6 Generally Avoiding Bottlenecks
    3.2 The Collision Detection Bottleneck
    3.3 Solution Methods
        3.3.1 Parallelising Calculations
    3.4 Level of Detail
        3.4.1 User Perception
        3.4.2 Encapsulation Levels
        3.4.3 Requesting a Level of Detail
        3.4.4 Global, Group and Local Policies
        3.4.5 Investigation by Implementation

4 Implementation
    4.1 The “World” Model
    4.2 Timing and Game Loops
    4.3 Built-in Profiling
    4.4 Scatter API
    4.5 Integrating Encapsulation Levels
        4.5.1 Hybrid World
        4.5.2 btHybridDynamicsWorld
            4.5.2.1 Additions To the Bullet Loop
            4.5.2.2 Collecting Local Heuristics
            4.5.2.3 Problems in Implementation
        4.5.3 Successful Switching

5 Evaluation
    5.1 Performance Evaluation
        5.1.1 Test Conditions
        5.1.2 Test Scenarios
            5.1.2.1 Scene 1
            5.1.2.2 Scene 2
        5.1.3 Performance Measures and Expectations
        5.1.4 Expectations
        5.1.5 Results
            5.1.5.1 Test 1 and Test 2
            5.1.5.2 Test 3 and Test 4
        5.1.6 Observations

6 Summary and Conclusion
    6.1 Performance
    6.2 Scatter
    6.3 Level of Detail
        6.3.1 Encapsulation Levels and Model Switching
    6.4 Implementation
    6.5 Future Work
    6.6 Discussion

Bibliography

List of Figures

1.1 Left: A stable tower of simulated blocks. Right: Disrupting the tower.
1.2 The Collision Overload Problem exhibited in a game

2.1 A diagram of the Minkowski sum of a circle and a square
2.2 Three examples of bounding volumes: Spheres, OBBs and AABBs
2.3 The interactions of modern game loop
2.4 Pseudo Code: Simplified Physics Time-step

3.1 The modular design concept by Erleben (a), applied to Bullet (b)
3.2 Breakable objects diving into constituent elements.
3.3 Possible data parallelisation in Bullet
3.4 A diagram showing “Encapsulation Levels” of a lamp and a mug
3.5 The problems of increasing level of detail without safety conditions
3.6 The problems of decreasing level of detail without safety conditions
3.7 Global, group and local decision making for model switching

4.1 The world model used in a simulation application
4.2 The main Scatter loop and timing system.
4.3 The flow of requests and switches across the Scatter/Bullet boundary.

5.1 Scene 1: Left: the Hunter mesh. Right: the Fighter mesh
5.2 Scene 2: Spaceships (Hybrid Models), Asteroids (Basic Spheres) and the Sun (Large Sphere).
5.3 Scene 1: Key frames of the Hybrid Build
5.4 Scene 1: Key frames of the Hybrid Build
5.5 Scene 1: Manifolds and Narrowphase CPs. Top: Default Build. Bottom: Hybrid Build.
5.6 Scene 1: Detail: Manifolds and Narrowphase CPs. Top: Default Build. Bottom: Hybrid Build.
5.7 Scene 2: Manifolds and Narrowphase CPs. Top: Default Build. Bottom: Hybrid Build.
5.8 Scene 2: Total Objects and Contact Points (Both builds)
5.9 Scene 2: Peak Detail: Total Objects (Both builds)
5.10 Scene 2: Physics profiling. Top: Default Build. Bottom: Hybrid Build.
5.11 Scene 2: Peak Detail: Physics profiling. Top: Default Build. Bottom: Hybrid Build.


Chapter 1

Introduction

“In physics, you don’t have to go around making trouble for yourself - nature does it for you.”

Frank Wilczek

Physics has long been used to bring structure to our seemingly chaotic view of nature. Mechanics, more specifically dynamics, has allowed us to visualise the real world in terms of forces and direction, turning moving objects into variables of mass, momentum and velocity. By combining geometry, classical mechanics and numerical methods we can create models that represent a subset of the world, allowing us to predict the movements and impacts of objects contained within. The improving performance of computers has led to more realistic simulations of physics being obtainable on home computers.

Physics-based simulations are used for everything from predicting the motion of particles in fluids to adding realistic motion to computer animation. Simulators are concerned with predicting the kinematics of objects for every step of the simulation. To do this, they must calculate all the forces obtained from constraints (restrictions of motion) and the resulting forces from collision detection (see section 2.3) with other objects. In this model every object has the potential to interact with every other object, and in a crude implementation this can result in n(n−1)/2 pairs of collisions for n objects, potentially an O(n²) problem. Simulators have the complicated job of balancing accuracy and efficiency. They must make sure the models they use are as believable as possible and that they are able to calculate the results quickly enough. Problems occur when the models used require more time to compute than is available. The extent of this depends on the context of the simulator. Recorded animations using physics are far more interested in accuracy because realism and resolution are the motivation. Simulators can generally be categorised by the following properties:

• Off-line or Real-time

• Scripted or Interactive

Off-line simulators are simpler to deal with. The lack of deadlines for calculations means that greater detail can be used to model the real world, at the cost of more time spent calculating the predicted motion. Movie production animation is an example of an off-line simulator. Off-line simulators are usually scripted too. The animators describe the motion and it is the job of the simulator to make the animation believable. Real-time simulators are the main area of focus for performance improvements. Techniques such as using Axis-Aligned Bounding Boxes (AABBs) with a sweep and prune method (described in section 2.3.1.3) are used to improve performance by reducing the order of collision detection pairs to a worst case lower bound of Ω(n log n). To make matters worse, real-time simulators are often interactive, which means they are affected by non-deterministic input. Interactions can affect the complexity of the simulation at any point in time. Consider the following situation:

“A tower of objects is stacked one object on top of another. The only contact resolution for each object is the equal forces applied for each neighbouring object above and below. A user decides to remove an object from the middle of the tower. Depending on the force and direction applied to move the object, all other objects have the potential to collide with each other whilst falling. The physics of interactions must be solved before the next step in the simulation. To make the action responsive, it must be completed in between 1/60 and 1/30 seconds¹”

Figure 1.1: Left: A stable tower of simulated blocks. Right: Disrupting the tower.

This example outlines a typical scenario for a real-time physics simulator. Movements and interactions can be sudden and unpredictable, making it difficult to identify situations when complicated collisions occur. In general, efficiency and workload management are well-established areas, but there is less focus on sporadic cases. By this we mean cases in physics simulators that are rare, but have a large impact on the system when encountered. A large group of objects converging on a single point and colliding is an example of such a sporadic case. For the benefit of the report, we will call this case “The Collision Overload Problem”.
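To make the pair counts discussed earlier in this chapter concrete, the following is a minimal sketch comparing a crude all-pairs test with a single-axis sweep and prune over AABBs. It is an illustration under stated assumptions rather than the method used by any particular engine: the box layout `(mins, maxs)`, the function names and the choice of the x axis are all hypothetical, and a production broad-phase would normally keep its sorted lists between frames instead of re-sorting on every call.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two AABBs overlap on all three axes. A box is (mins, maxs)."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))


def naive_pairs(boxes):
    """Crude approach: test every unordered pair, n(n-1)/2 tests in total."""
    return {(i, j) for i, j in combinations(range(len(boxes)), 2)
            if overlaps(boxes[i], boxes[j])}


def sweep_and_prune(boxes):
    """Sort boxes by their minimum x, then only test boxes whose
    x-intervals overlap. Returns the overlapping pairs and the number
    of full overlap tests that were actually performed."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][0])
    active, pairs, tests = [], set(), 0
    for i in order:
        min_x = boxes[i][0][0]
        # Discard boxes that end on the x axis before this one begins.
        active = [j for j in active if boxes[j][1][0] >= min_x]
        for j in active:
            tests += 1
            if overlaps(boxes[i], boxes[j]):
                pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs, tests
```

For objects spread out along the sweep axis, most candidates are pruned by the sort alone. When many objects converge on one point, as in the collision overload case, the active list grows and the number of tests drifts back towards the all-pairs count.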

1.1 The Collision Overload Problem

The motivation for this project arose from an observation of a game running on a home computer. The scenario is as follows:

¹ Figures quoted reflect a visual frame rate of between 30 and 60 frames per second, a suitable rate for realistic interaction.


“In the game, the objective is to move through the levels collecting items and fending off enemies. The game takes either a first-person or third-person perspective and exhibits a world using the latest graphical techniques. The unique concept of the game is that, unlike most shooting games, you have no weapons or explosive devices. Instead your character has the unique telekinetic ability to pick up objects and, using the power of their mind, launch them at a selected target.”

Being renowned for pushing the boundaries of what is feasible in game play, the gamer attempted to launch all the objects in the local area at a single target. The result was a significant drop in frame-rate as the objects converged on the target, followed by crunching sounds as they collided and eventually dispersed. The frame-rate returned to a regular interval and the game was at the same performance as before the collision. To clarify, this is the situation that occurred:

“By directing a collection of n objects at a single point and launching them with a velocity, the frame performance dropped as the objects began to converge and was at its lowest at the point where they began colliding. As they left the collisions in different directions, the frame performance returned to the same as before the interaction.”

Developers make it their goal that a user’s experience is as smooth and fluid as possible, and they ensure that the detail they provide in games is suitable for the recommended machines used to run them. Generally gamers are very quick to spot situations where performance drops and are often critical of low frame-rates and disruptions to play. Developers are therefore tied into a trade-off between performance and accuracy.

The goal of the project:

“We want to be able to reduce the impact of sporadic cases, like sudden convergence, on the performance of the system, thus allowing more complex physical representations to be used.”

1.2 Approach

The approach taken in this report was to research the current techniques used in physics simulations and locate an area where improvements could be made. After finding a suitable area from the background and investigation, the aim was to implement the technique in a framework, referred to as “Scatter” in this report. Using quantitative and qualitative data from running the implementation in a scenario similar to the collision overload problem, the aim was to evaluate the improvements in comparison to the same scenario without the improvement. The approach can be broken into the following stages:

Physics simulator research - Finding existing physics simulators, understanding the structure and techniques used. Reading documentation and research papers describing how the simulators are used and observing examples utilising the engines.

Implementing a working demonstration - Experiment with a physics engine and combine it with a renderer to give insight into how a framework needs to be designed. Aim to identify the difficulties of doing so and take requirements for the framework design.


Figure 1.2: The Collision Overload Problem exhibited in a game


Physics Framework - Design and build a framework based on the requirements and improvements from the demonstration tool. Ensure the framework is able to record and output quantitative data. Select the appropriate tools needed to achieve the requirements.

Recreate the problem in the framework - Create a scenario in the framework that reflects the original problem. Aim to simulate a game environment, but with additional controls to identify where problems exist. Examples include the ability to pause the simulation or visual output to identify instances of collision.

Identify a solution to solve the problem - Find a topic that is related to the motivation scenario and aim to expand on it.

Implement solution in framework - Modify a version of the framework that is able to perform the solution. It must be clear which components belong exclusively to the solution to evaluate the framework with and without it.

Compare quantitative output of framework - Run the scenario in both frameworks and compare the output data. Record the values in graphs and charts and identify where the solution is working and whether it improves the performance.

Discuss visual output of framework - In relation to the techniques used, discuss whether there are noticeable changes when running the system. Analyse the aspects a user may observe when viewing the simulation.

1.3 Contributions

From the research performed in this project I have made the following contributions:

An Investigation into the aspects that affect simulator performance - I have outlined two major areas of research to improve the problem seen in collision overload: parallelisation of calculations and level of detail. I have focused on the area of level of detail, explicitly “dynamic model reduction”, and drawn comparisons with model reduction techniques from graphics. I have outlined how they could be applied to physics simulators.

Designed framework for investigating performance - From the initial research I have designed and implemented a framework, “Scatter”, for the purposes of prototyping and testing physics techniques. The framework mimics game engine design and records performance data for analysis. It allows scenarios to be played under different conditions to further investigate level of detail.

Proposed and implemented a technique for dynamic model reduction called “Model Switching using Encapsulation Levels” - Stemming from the investigation into level of detail, I have proposed a technique to address the motivation scenario. I discuss the success of the concept and the areas of potential application.

Investigation into a system for requesting performance improvements - I have discussed using policies to preemptively trigger performance improvements for physics systems. I have investigated local policies such as proximity and used them in an implementation of dynamic model reduction.


Expanded on physics simulator analysis - By looking at physics simulators from a modular perspective, I have used an existing physics simulator called “Bullet” as a case study for applying research techniques. By implementing a solution of my technique in Scatter (which runs using Bullet), I have discussed the hurdles and commented on the suitability of using an existing simulator as a learning tool.

Chapter 2

Background

“Move, collide, resolve, repeat...”

Ancient Physics Proverb

The aim of the background is to touch on many of the techniques required to understand the workings of a physics simulator. As this area of real-time physics is very specialised, the content provided is intended as a “point of reference” for further reading. The following materials provide details of algorithms covered by this chapter. “Game Physics” by David Eberly ranges from fundamental physics to building a physics engine [17]. Although written with games in mind, the comprehensive appendix of linear algebra ensures that the information is suitable for anyone with an interest in building a physics simulator. Other good resources include a book from the same series called “Collision Detection in Interactive 3D Environments” [45]. The book by Gino van den Bergen describes algorithms used in collision detection, an area commonly recognised as being the bottleneck of physics simulations. The concluding section of the background, Report Terminology (2.11), is a reference for the commonly used terms throughout the report.

2.1 Physics System Terminology

2.1.1 Basics of Physics

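Starting from basics, motion in physics is described by the following ordinary differential equation (ODE):

\[
\frac{d}{dt}
\begin{pmatrix} x(t) \\ R(t) \\ P(t) \\ L(t) \end{pmatrix}
=
\begin{pmatrix} v(t) \\ \omega(t)\,R(t) \\ F(t) \\ \tau(t) \end{pmatrix}
\]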

Location - x(t) - Where objects and particles are located in a world coordinate system.

Velocity - v(t) - The rate of change of displacement of particles with components in three dimensions.

Mass - M - The mass of the particle/body.

Orientation - R(t) - The direction objects and particles face in a world coordinate system.


Linear and Angular Momentum - P(t) and L(t) - The momentum of motion relative to the world and around the centre of mass of an object.

Torque - τ(t) - The rotational effect of forces applied about the centre of mass.

Force - F(t) - Internal and external forces from fields, gravity and contact with other objects.

Inertia Tensor - I(t) - The distribution of mass in a body relative to the centre of mass (CofM).
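As a rough illustration of how the state (x, R, P, L) in the ODE above might be advanced by one time-step, the sketch below applies a single explicit Euler step. Several things here are assumptions made for illustration only: the function names are hypothetical, the product ω(t)R(t) is interpreted as the cross-product (skew-symmetric) matrix of the angular velocity applied to R(t), and the angular velocity is passed in directly rather than being derived from L(t) and the inertia tensor I(t).

```python
def cross_matrix(w):
    """Skew-symmetric matrix [w]x such that [w]x r equals w cross r."""
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_step(state, mass, omega, force, torque, dt):
    """Advance (x, R, P, L) by dt using d/dt (x, R, P, L) = (v, wR, F, tau)."""
    x, R, P, L = state
    v = [p / mass for p in P]                    # v(t) = P(t) / M
    R_dot = matmul(cross_matrix(omega), R)       # rate of change of orientation
    x_new = [xi + dt * vi for xi, vi in zip(x, v)]
    R_new = [[R[i][j] + dt * R_dot[i][j] for j in range(3)] for i in range(3)]
    P_new = [pi + dt * fi for pi, fi in zip(P, force)]
    L_new = [li + dt * ti for li, ti in zip(L, torque)]
    return x_new, R_new, P_new, L_new
```

Explicit Euler is the simplest possible choice and is used here only because it mirrors the equation term by term. Repeated steps of this kind let R drift away from a pure rotation, so a practical integrator would re-orthonormalise R or store the orientation differently.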

Physics simulators use either forward or inverse kinematics and dynamics to solve motion. We can think of forward as “I am at time t and need to go forward to get to time t + dt” and backwards as “I am going to finish at time t + dt and need to go backwards to get to t”. Inverse dynamics is noted as being an easier problem to solve [18], which is why the simulators in this report refer to “Integration Transformation” as a last step in calculating final motion. David Baraff, Andrew Witkin and Michael Kass ran a course on “Physically Based Modelling” at SIGGRAPH, most recently in 2003 [23]¹. The course had been run as far back as 1995, based on content from papers they had published respectively (many of which are the basis for most modern physics engines). Reading the course notes or publications with similar content (Eberly’s “Game Physics” [17]) will give a practical understanding of how the mechanics can work together. The course covers the relevant basis in differential equations and particle dynamics to understand most of the papers referred to in this report.

2.1.2 Newtonian vs. Lagrangian Dynamics

Dynamics is the area of physics that describes how particles move when external forces act upon them. This covers Newton’s second law F = ma, where F is the applied force, m is the mass of the particle and a is the acceleration. Newtonian dynamics describe combined external and constraining forces working on objects. When using Newtonian laws, F includes the constraining forces like friction and contact forces. Eberly [17] suggests that although this makes Newtonian dynamics appropriate for general-purpose physics engines, the difficulties arise in modelling friction effectively. He notes that Lagrangian dynamics are more suited to frictional forces because the equations can be formed in a way that removes the constraining forces. Lagrangian dynamics use energy in their formulation, transferring between potential and kinetic. The choice of dynamics system to use is based on the purpose of the physics engine. Certain dynamics are better suited to modelling certain characteristics of physics; for example, Euler’s equations of motion better represent axis rotation than kinematics. Eberly shows a preference for using Lagrangian dynamics. He mentions that although it takes additional programming time to construct a complete system, using Lagrangian dynamics is more stable and efficient.
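For reference, one standard textbook form of the Lagrangian equations of motion for a single particle is given below. It is included only to show how the constraining forces drop out, and the notation is an assumption chosen for this illustration rather than taken from the report: T is the kinetic energy, the q_j are generalised coordinates that already satisfy the constraint, and F is the applied (non-constraining) force.

```latex
\[
T = \tfrac{1}{2}\, m\, \lvert \dot{x} \rvert^{2},
\qquad
\frac{d}{dt}\!\left(\frac{\partial T}{\partial \dot{q}_j}\right)
- \frac{\partial T}{\partial q_j}
= F \cdot \frac{\partial x}{\partial q_j}
\]
```

Because the position x is written as a function of the coordinates q_j, an ideal constraining force does no work along the allowed directions of motion and so never appears on the right-hand side, whereas in F = ma it has to be included in F explicitly.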

2.1.3 Rigid-Bodies vs. Deformable-Bodies

A rigid-body is a region that has mass and dimension. The three basic examples of rigid-bodies are a single particle, a particle system and a continuum mass. The assumption made about rigid-bodies is that the particles that compose the body do not move relative to each other. This concept lends itself very nicely to modelling objects in simple physics systems. The visual representation of objects in the graphics world is performed using vertices, which connect to make polygons. A simple box constructed with eight vertices and six faces can be represented by a similar system of eight particles in the physics world. The human perception that a box will not melt, collapse or inflate allows us to assume that the particles that make up the corners of the box will not move relative to each other and therefore form a rigid-body.

¹ The reference material available is from the 2001 course.

A deformable-body, on the other hand, is a body that can change shape or volume when an external force is applied to it. The particles that compose the body are able to move relative to each other. This adds additional complexity to the physics calculations, which have to take into account the distances between the composing particles. Deformable-bodies can be modelled using Finite Element Methods (FEM) to approximate the deformation function. This method has been used in real-time to simulate fractures of stiff materials [39]. It has been suggested that this method is difficult due to the following reasons:

1. The time-step for dynamic integration has to be reduced to simulate collisions.

2. The size of the problem is an order of magnitude higher than the two-dimensional problem.

Using FEM produces accurate results and techniques have improved to perform real-time deformation [10]. Other methods of modelling deformable bodies include using mass-spring models. They are easier to compute and the calculations are less expensive [19].
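To give a feel for why mass-spring models are cheap to evaluate, the following is a minimal sketch of the force from one damped spring connecting two particles (Hooke's law plus a damping term along the spring). The function name, parameter names and the damping model are assumptions made for this illustration and are not taken from the cited work.

```python
import math


def spring_force(p_a, p_b, v_a, v_b, rest_length, stiffness, damping):
    """Force acting on particle a from a damped spring joining a and b.
    The force on particle b is the negative of the returned vector."""
    delta = [b - a for a, b in zip(p_a, p_b)]
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0.0:
        # Direction is undefined when the particles coincide.
        return [0.0, 0.0, 0.0]
    direction = [d / length for d in delta]
    # Speed at which b moves away from a, measured along the spring.
    rel_speed = sum((vb - va) * d for va, vb, d in zip(v_a, v_b, direction))
    magnitude = stiffness * (length - rest_length) + damping * rel_speed
    return [magnitude * d for d in direction]
```

Each spring costs a handful of multiplications and one square root per step, so a deformable body built from many such springs stays inexpensive compared with assembling and solving an FEM system.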

The choice of body is dependent on the application for which it is required. Deformable bodiesare required to simulate cloth, skin, plastics and breaking objects. They are very much suitedto the animation of characters wearing clothes, rendering realistic looking flags and simulatingdamage on vehicles due to collision. A combination of rigid bodies and deformable bodies canoccur in the same system, for example a solid table covered in cloth. Bridson, Fedkiw andAnderson demonstrated using both deformable mass-springs and rigid bodies to model cloth atcertain stages of collision known as “impact zones” [2]. The method was proposed by Provot whoobserved that in bunched areas of cloth, friction restricts relative motion [35].

Rigid-bodies are a much simpler area to focus on. Deformable bodies could experience the same collision overload problem, but this project discusses techniques relative to rigid-bodies. For more examples of real-time deformable bodies see the “Real Matter” demonstration [38].

2.2 Physics Concepts

The exact mathematical details of how to model velocities, forces and momentum of interactions are outside the scope of this report and can be referenced independently. Instead, the aim is to highlight the mathematical techniques regularly referred to by physics engine designers, with the purpose of understanding what types of calculations they solve.

2.2.1 Linear Complementarity Problem (LCP)

The term Linear Complementarity Problem (LCP) crops up regularly in discussions about Constraint Solvers (see 2.7.1.4). LCP solvers are software implementations that solve LCPs, usually for collision response or more specifically contact forces (see 2.4). An LCP refers to a general problem in linear algebra that attempts to find values for the column vectors w and z for the following conditions²:

Q = w − Mz and w ≥ 0 and z ≥ 0 and w · z = 0

Where Q is a known n-dimensional column vector and M is a known n × n matrix.

² Further information on LCPs can be found in [13].


Knowledge of LCPs is relevant to this project since collision response methods described in this report show varying speed and accuracy when implemented using LCP solvers. Rewriting or designing an LCP solver may be beyond the scope of the project; however selecting an appropriate implementation is an improvement.

One possible algorithm for solving an LCP is the “Lemke-Howson Algorithm”. This algorithm can solve LCPs for non-trivial solutions. The algorithm is often referred to as a pivoting method. Pivoting methods use a finite number of steps and require a recursive solution [26]. Another pivoting method is Dantzig’s algorithm described in Baraff’s paper in 1994 [5]. These methods are considered accurate, but not as desirable for real-time physics. The Projected Gauss-Seidel method works by improving the result every iteration. Pivoting methods can sometimes fail to produce results or suffer from rounding errors. Iterative methods are usually preferred because of their ability to produce close results when interrupted early [26].
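To make the iterative approach concrete, the following is a minimal sketch of a Projected Gauss-Seidel iteration for the LCP conditions given above (w = Q + Mz, w ≥ 0, z ≥ 0, w · z = 0). The function name and the fixed iteration count are illustrative assumptions, and the sketch assumes M has a positive diagonal (for example a symmetric positive definite matrix); it is not the solver of any particular engine.

```python
def projected_gauss_seidel(M, Q, iterations=50):
    """Approximate z >= 0 such that w = Q + M z >= 0 and w . z = 0.

    Hypothetical helper for illustration; assumes M[i][i] > 0 for all i.
    """
    n = len(Q)
    z = [0.0] * n  # initial guess
    for _ in range(iterations):
        for i in range(n):
            # Row residual w_i computed with the most recent values of z.
            w_i = Q[i] + sum(M[i][j] * z[j] for j in range(n))
            # Gauss-Seidel update followed by projection onto z_i >= 0.
            z[i] = max(0.0, z[i] - w_i / M[i][i])
    w = [Q[i] + sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
    return w, z


if __name__ == "__main__":
    w, z = projected_gauss_seidel([[2.0, 1.0], [1.0, 2.0]], [1.0, -2.0])
    print("w =", w, "z =", z)
```

Because every sweep leaves z feasible (z ≥ 0), the loop can be stopped after any number of iterations and still return a usable approximation, which is the property the paragraph above attributes to iterative methods.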

2.2.2 The Gilbert-Johnson-Keerthi Distance Algorithm (GJK)

The Gilbert-Johnson-Keerthi Algorithm (GJK) is a method of calculating the distance between convex objects [24]. It is used in systems that have relaxed rules on non-penetrating objects. This means penetrations are allowed, with the intention of finding the point of contact (described in 2.3.2). The new object positions are solved for time t2, then GJK is used to calculate the amount of inter-penetration. The movements of the objects are then reversed to the time of contact, t1, where t2 > t1. Van den Bergen describes using an improved implementation of GJK for use in collision detection techniques [7]. It is utilised due to its simplicity of implementation and applicability to convex objects such as boxes, spheres, cylinders, convex hulls and Minkowski sums³ of convex objects. The GJK method can calculate distances in the following form. Given a distance d(A, B) between two objects:

d(A, B) = min{ |x − y| : x ∈ A, y ∈ B }

Where x is a point on A and y is a point on B. Considering the points a and b with the shortest distance between them:

d(A, B) = |a − b|

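Using the Minkowski sum of A and −B, written A − B, and writing v(C) for the point of a convex set C that is closest to the origin, the distance can be written as: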

d(A, B) = |v(A − B)|

GJK iteratively calculates approximations vk of v(A − B), reaching the result in a finite number of iterations k, to produce the distance between the objects⁴.
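GJK never builds the set A − B explicitly; it only queries its extreme point in a chosen direction. The sketch below illustrates that building block for 2D convex polygons given as vertex lists. The function names are hypothetical and this is only the support query, not a full GJK implementation.

```python
def support(points, d):
    """Vertex of a convex polygon that lies furthest along direction d."""
    return max(points, key=lambda p: p[0] * d[0] + p[1] * d[1])


def support_minkowski_difference(A, B, d):
    """Furthest point of A - B along d, built from one query on A and one on B."""
    a = support(A, d)
    b = support(B, (-d[0], -d[1]))
    return (a[0] - b[0], a[1] - b[1])


if __name__ == "__main__":
    square_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
    square_b = [(3, 0), (4, 0), (4, 1), (3, 1)]
    p = support_minkowski_difference(square_a, square_b, (1, 0))
    # A negative projection on d means the origin lies outside A - B,
    # so the two shapes are separated along d.
    print(p, p[0] * 1 + p[1] * 0 < 0)
```

If the furthest point of A − B along some direction has a negative projection on that direction, the origin cannot lie inside A − B and the objects are disjoint, which is the test GJK refines on each iteration.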

³ A Minkowski sum is the addition of two polygons by considering all possible sums of a point of one polygon and a point of the other.

⁴ See [24] for a full proof.


Figure 2.1: A diagram of the Minkowski sum of a circle and a square

2.3 Collision Detection

Collision Detection is considered such an important aspect of physics simulations that software libraries have been constructed to perform collision detection alone. In the context of this paper collision detection refers to the process of comparing two rigid-bodies, detecting whether they penetrate or will penetrate and then calculating the exact points of contact. For contact between n objects, it would seem the calculations required are of the order O(n²). This is the number of “collision pairs” (a pair of objects that require collision testing). The number of tests is clearly too high for large real-time systems, so collision detection software in general breaks down the calculation into two basic phases: a broad-phase and a narrow-phase. It is the job of the broad-phase to reduce the number of intersection tests required by using various culling techniques [18]. The narrow-phase then identifies which pairs of objects are intersecting and in another step calculates the point of intersection.

2.3.1 Broad-phase

The broad-phase usually uses a number of bounding primitive techniques to calculate intersections quicker. If a bounding primitive pair is found to intersect then the pair is added to a set that will be tested by the narrow-phase. If the bounding pair doesn’t intersect then the objects don’t intersect. Calculating whether two complex objects are intersecting is far more computationally expensive than calculating the bounding pair, which is why this phase is so effective. The three most common types of bounding primitives are bounding spheres, axis-aligned bounding boxes (AABB) and oriented bounding boxes (OBB).

Figure 2.2: Three examples of bounding volumes: Spheres, OBBs and AABBs


2.3.1.1 Bounding Spheres

Culling with bounding spheres is based on the concept that spheres don’t overlap if the distance between their centres is larger than the sum of their radii, shown in the following inequality:

$|C_2 - C_1| > r_2 + r_1$ or $|C_2 - C_1|^2 > (r_2 + r_1)^2$ (to avoid having to calculate the square root)

Although the calculation is simple, it still requires a comparison of all pairs of the n objects. The culling doesn’t take into account the knowledge of location (spatial coherence) or the predictable small changes of distance over short periods of time (temporal coherence) [17].
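The squared-distance form of the test, and the brute-force pairing it still requires, can be sketched as follows. The data layout (a list of centre and radius tuples) and the function names are assumptions made for illustration.

```python
from itertools import combinations


def spheres_overlap(c1, r1, c2, r2):
    """Compare squared centre distance with squared radius sum (no square root)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return dist_sq <= (r1 + r2) ** 2


def broad_phase_spheres(spheres):
    """Brute-force culling: every one of the n(n-1)/2 pairs is still visited."""
    return [
        (i, j)
        for (i, (ci, ri)), (j, (cj, rj)) in combinations(enumerate(spheres), 2)
        if spheres_overlap(ci, ri, cj, rj)
    ]


if __name__ == "__main__":
    spheres = [((0, 0, 0), 1.0), ((1.5, 0, 0), 1.0), ((5, 0, 0), 1.0)]
    print(broad_phase_spheres(spheres))
```

Only pairs that survive this test would be passed on to the narrow-phase.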

2.3.1.2 Oriented Bounding Boxes (OBB)

OBBs represent the orientation of the enclosed object as well as a more accurate approximation of its volume. The oriented bounding boxes benefit from the fact that they are symmetric. This reduces the number of tests they need to perform when comparing two boxes (15 instead of a worst-case scenario of 156⁵).

2.3.1.3 Axis-Aligned Bounding Boxes (AABB)

AABBs use an efficient “sweep and prune” algorithm along the aligned axes to compute the set of intersecting pairs. They also take into account temporal and spatial coherence when updating the bounding boxes. More details of the algorithm can be found in Eberly’s book [17]. AABBs were previously shown to be slower than OBBs [28], but following work on the improvement of overlap testing by Gino van den Bergen [6, 45], they have been shown to be a very appropriate method for deformations and for rigid bodies as well. Most current physics systems implement AABBs because of their simplicity and effective culling.
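As a rough illustration of the idea, the sketch below sweeps along the x axis only and confirms candidates with a full AABB overlap test. It is a simplified, single-axis variant written for this report's discussion, with hypothetical names; it re-sorts on every call, whereas implementations that exploit temporal coherence keep the sorted end-point lists between frames and repair them incrementally.

```python
def aabb_overlap(a, b):
    """Boxes are (min_corner, max_corner); they overlap if they overlap on every axis."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))


def sweep_and_prune(boxes):
    """Return index pairs of overlapping boxes using a sweep along the x axis."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][0])
    active = []  # boxes whose x interval may still overlap later boxes
    pairs = []
    for i in order:
        min_x = boxes[i][0][0]
        # Prune boxes that end before the current box starts.
        active = [j for j in active if boxes[j][1][0] >= min_x]
        for j in active:
            if aabb_overlap(boxes[i], boxes[j]):
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)


if __name__ == "__main__":
    boxes = [
        ((0, 0, 0), (1, 1, 1)),
        ((0.5, 0.5, 0.5), (1.5, 1.5, 1.5)),
        ((3, 0, 0), (4, 1, 1)),
        ((0.5, 5, 0), (1.5, 6, 1)),
    ]
    print(sweep_and_prune(boxes))
```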

2.3.2 Narrow-phase

To find an intersection in the narrow-phase, a method using a separating plane as a “witness” can be utilised. Baraff describes the technique in which a plane between two objects is calculated (the witness) [23]. If such a plane doesn’t exist then the objects are intersecting. The cost of calculating this plane is negligible since the small changes in each step of the physics system usually result in the same plane separating the objects. The contact points can only occur on the separating plane and hence only those features coincident with the plane should be compared for both polyhedra. If the objects are penetrating then the simulator reverses to find the time at which the contact did occur and then calculates the contact points at time t (as described for GJK in 2.2.2).

2.3.3 Continuous/Discrete Collision Detection (CCD & DCD)

The underlying principle of discrete collision detection is the concept of a time-step. A time-step is the period of time the physics world waits before the new state of the world is calculated. This can be compared to the idea of checking a clock. When you first observe the clock you know where the hands are pointing; you then look away for a period of time (the time-step). You will only know the position of the hands on the clock when you observe it again. The period between looking is the time-step. In physics simulations this period is usually fixed and is related to the detection of collisions.

⁵ Figures from Eberly [17].


The method of stepping through the physics simulation and reversing back if a penetration has occurred is known as discrete collision detection. Discrete collision detection can “miss” collisions if the time-step is too large (known as the “tunnel effect” because objects can ‘tunnel’ through other objects). Continuous collision detection works on the basis of calculating the time of impact (TOI). This is done during the collision detection and can apply to both the broad and narrow phases. Redon et al. suggest a method for fast continuous collision detection using OBBs for large polyhedra (tens of thousands of triangles).

CCD is becoming a more popular method because it doesn’t suffer from the tunnel effect. It is used in games for physics that need to be reliable, for example, story dependent interactions.
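The difference between the two approaches can be seen in a one-dimensional toy case: a point moving quickly towards a thin wall. The sketch below is an illustrative assumption (a point particle, a wall occupying an interval on the x axis, constant velocity over the step) rather than the method of any cited engine.

```python
def discrete_hit(x, v, dt, wall_min, wall_max):
    """Discrete test: only the position at the end of the time-step is examined."""
    x_next = x + v * dt
    return wall_min <= x_next <= wall_max


def time_of_impact(x, v, dt, wall_min, wall_max):
    """Continuous test for a point approaching the wall from the left.

    Returns the time of impact within the step, or None if no impact occurs.
    """
    if v <= 0.0 or x > wall_min:
        return None  # this sketch only handles approach from the left
    toi = (wall_min - x) / v
    return toi if toi <= dt else None


if __name__ == "__main__":
    dt = 1.0 / 60.0
    # A fast particle (100 units/s) and a wall only 0.05 units thick.
    print(discrete_hit(0.0, 100.0, dt, 1.0, 1.05))    # the step jumps over the wall
    print(time_of_impact(0.0, 100.0, dt, 1.0, 1.05))  # impact found inside the step
```

With these numbers the particle moves about 1.67 units in one step, so the discrete test sees it on the far side of the wall and reports no collision, while the time of impact calculation finds the contact within the same step.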

2.4 Collision Response

Collision Response is the second part of the full collision detection system. Once the contact points have been computed the constraints between them need to be resolved. This section deals only with the objects that are in contact. The other objects not involved in contact do not require this step and can hence be ignored. In general contacts can be separated into four categories:

• vertex/face

• edge/edge

• vertex/vertex

• vertex/edge

Vertex/vertex and vertex/edge are degenerate and are not usually considered due to their unlikely nature. Another assumption is that edge/edge collisions are not collinear.

2.4.1 Resolving Contact

To find out whether a body is colliding, resting or separating, Baraff suggests we need to consider the velocities of the contact points [23]. Consider a vertex/face contact:

Given $\dot{p}_a(t_0) = v_a(t_0) + \omega_a(t_0) \times (p_a(t_0) - x_a(t_0))$, the velocity of vertex point a

and

given $\dot{p}_b(t_0) = v_b(t_0) + \omega_b(t_0) \times (p_b(t_0) - x_b(t_0))$, the velocity of vertex point b

Then the relative velocity is:

$v_{rel} = n(t_0) \cdot (\dot{p}_a(t_0) - \dot{p}_b(t_0))$

1. If $\dot{p}_a(t_0) - \dot{p}_b(t_0)$ is in a positive $n(t_0)$ direction ($v_{rel} > 0$), the contact is separating.

2. If $\dot{p}_a(t_0) - \dot{p}_b(t_0)$ is in a negative $n(t_0)$ direction ($v_{rel} < 0$), the contact is colliding.

3. If $\dot{p}_a(t_0) - \dot{p}_b(t_0)$ is perpendicular to $n(t_0)$ ($v_{rel} = 0$), the bodies are resting.
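The three cases above can be sketched directly from the relative velocity along the contact normal. The helper names and the small tolerance used to decide "resting" are assumptions for illustration; a numerical tolerance is needed because an exact zero is rarely produced by floating point arithmetic.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def point_velocity(v, omega, p, x):
    """Velocity of point p on a body with linear velocity v, angular velocity
    omega and centre of mass x: v + omega x (p - x)."""
    r = sub(p, x)
    w = cross(omega, r)
    return tuple(vi + wi for vi, wi in zip(v, w))


def classify_contact(n, pa_dot, pb_dot, tolerance=1e-6):
    """Classify a contact from the relative velocity along the normal n."""
    v_rel = dot(n, sub(pa_dot, pb_dot))
    if v_rel > tolerance:
        return "separating"
    if v_rel < -tolerance:
        return "colliding"
    return "resting"


if __name__ == "__main__":
    n = (0.0, 1.0, 0.0)  # contact normal pointing from body b towards body a
    falling = point_velocity((0.0, -1.0, 0.0), (0.0, 0.0, 0.0), (0, 0, 0), (0, 1, 0))
    ground = (0.0, 0.0, 0.0)
    print(classify_contact(n, falling, ground))
```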


The forces involved are time dependent so in general impulses are used. The results are different for colliding and resting contact because resting contacts need to be balanced by all forces acting upon them. It is at this stage in collision contacts that restitution is used. A restitution value of ε = 1 would result in a completely elastic collision. Once resolved, the objects have new velocities (or are resting if ε = 0).
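The role of the restitution value can be illustrated with a deliberately simplified case: two bodies treated as point masses (no rotation and no friction), for which the impulse magnitude along the normal is j = −(1 + ε) v_rel / (1/m_a + 1/m_b). This is a sketch under those stated assumptions, with hypothetical names; the full rigid-body expression in Baraff's notes [23] also includes the inertia tensors.

```python
def resolve_collision(n, va, vb, ma, mb, restitution):
    """Apply a frictionless collision impulse between two point masses.

    n is the unit contact normal pointing from body b towards body a.
    Returns the new velocities (va, vb); unchanged if the bodies are not colliding.
    """
    v_rel = sum(ni * (a - b) for ni, a, b in zip(n, va, vb))
    if v_rel >= 0.0:
        return va, vb  # separating or resting: no collision impulse
    j = -(1.0 + restitution) * v_rel / (1.0 / ma + 1.0 / mb)
    va_new = tuple(a + j * ni / ma for a, ni in zip(va, n))
    vb_new = tuple(b - j * ni / mb for b, ni in zip(vb, n))
    return va_new, vb_new


if __name__ == "__main__":
    n = (1.0, 0.0, 0.0)
    va, vb = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(resolve_collision(n, va, vb, 1.0, 1.0, 1.0))  # elastic: velocities swap
    print(resolve_collision(n, va, vb, 1.0, 1.0, 0.0))  # inelastic: no relative motion
```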

Resting contacts are the most difficult problem in Baraff’s dynamics notes [23]. Numerical methods are required to calculate the impulses of all the objects acting on one another. The section is too long to expand upon in this paper, but Baraff’s notes explain the process. There are a number of approaches to resolving the contact impulses, some of which are mentioned in the physics simulator section of this background (see 2.7.1.4).

2.5 Simulation Loops

The simulation loop is the part of a physics-based program that controls the interactions between input, updating and output of the simulation. It is arranged as a sequence of tasks that must be performed for each frame. Input consists of collecting information from hardware such as mice, keyboards and other controllable devices and converting it to actions that are meaningful to the simulator: holding down keys could correspond to applying forces to objects in the world. This information is collected by either polling the device or responding to events. The resulting actions of the input are carried out in the update stage of the loop. Updating involves gathering information from the previous frames and combining it with new input to calculate the next state of the simulator. The first stage of updating is usually to decide what the new state of the simulator will be, then the second stage is to perform the simulation for the frame resulting in new positions for objects, camera orientations and interaction resolution, ready for the output stage. The output stage is the section where results are rendered, including but not limited to, visual data, audible data, file data, and communication data.

Simulation loops can be grouped by deterministic behaviour. Simulations that focus on internal interactions are usually less concerned with input from outside the simulation world. Such an example is “Go Fish!” a physics-based simulation of a virtual marine world that models the natural movement of fish along with behavioural and perceptual simulation [42]. The purpose of the simulator is realistic animation with a minimal amount of scripting. The scripted motions of a fishing line and camera are used as the basis to show the simulator animating a fish being attracted by the bait and subsequently caught. This simulation and other similar animation simulations are focused on filling the gaps between animation frames, a utilisation of physics simulators. As a result, the computational time of each frame is less crucial because the simulator doesn’t have to appear responsive to user interaction. If the aim, however, is to provide real-time animation, then frame generation time is relevant. Other simulators that rely less on non-deterministic input include predictive simulations of weather systems and particle interactions.

The non-determinism of video games provides some of the most complicated simulation loops. They must deal with user input, networking, physics, AI, game logic, user interfaces and graphical outputs at a rate that feels responsive to the user. Architecting “game loops” is a complex task mainly because of the number of interactions between the different sub components of the game (see figure 2.3). Each component is optimised to provide the best performance for each of its tasks every frame.


Figure 2.3: The interactions of a modern game loop (components shown: Timer, AI, Physics, Animation, Input, Renderer, Sound Renderer, Scene Manager, Frame Manager, Game Logic, Communication, Sound Manager, Scripting, Collision Detection)

2.6 Game Loops

Physics is a much smaller subsystem of the game loop. With all the modules working concurrently and interacting every frame, it is hardly surprising that the time available for calculations is limited. Despite the many tricks used in physics, graphics and AI to improve frame performance there still exist situations where the time taken to perform the desired work is longer than acceptable for a consistent frame-rate. Generally it is up to the architect of the game engine to decide how to deal with this problem. In most cases this is manifested in a drop in frame-rate or less frequent physics updates. There is a point at which this boundary can no longer be pushed and the simulation has a lower bound for producing a single frame. Looking at game loops helps to understand how physics modules are required to function. Although game loops are abundant in the games industry, there are few academic works addressing them. Valente et al. [44] attempted to apply classification to loop design. They categorised loops into:

• Simple Coupled Model

• Synchronised Coupled Model

• Single-Thread Uncoupled Model

• Multi-Thread Uncoupled Model

• Fixed Frequency Uncoupled Model

Coupled models in the classification scheme are those in which update (input, physics) and presentation (rendering) are dependent. Uncoupled models update independently. We will see later that Scatter uses a single-thread uncoupled model. This choice is mainly due to its simplicity for the purpose of demonstration.

Game loops have traditionally been single threaded, but more multi-threaded game engines are being produced to take advantage of the multiple cores of next generation consoles. Producing multi-threaded loops is a non-trivial problem for the games industry. It is important in this report to understand simulation and game loops and how they update and render data. I therefore provide the following references for further information:

• An article on Multi-threaded Game Engine Design by Tulip et al. [43].

• An Intel article on game loops and optimisation [32].

2.7 Real-time Physics Simulator Design

This section of the report focuses on describing a real-time physics simulator. A real-time physics engine (or physics simulator) is an application or set of libraries that can be called by a simulation loop to calculate physical interactions and collision detection by simulating a period of time. The time periods are usually around 1/60th of a second. The methods used in real-time systems are optimised and, as a result, are a trade-off between speed and accuracy. To understand the bottlenecks and where further optimisations can occur requires analysis to determine where the numerical calculations are performed relative to the physics loop. We will discuss this in section 3.1.
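One common way for a simulation loop to drive an engine with fixed 1/60th of a second periods is to accumulate the variable frame time and consume it in fixed slices. The sketch below is an assumption about how such a loop might be arranged, not a description of Scatter or of any cited engine; `advance`, `step` and `max_steps` are hypothetical names.

```python
FIXED_TIMESTEP = 1.0 / 60.0


def advance(accumulator, frame_time, step, max_steps=10):
    """Consume elapsed frame time in fixed-size physics steps.

    step is a callable taking the fixed time-step. max_steps caps the work done
    in one frame, so a slow frame does not trigger an ever-growing backlog of
    physics updates. Returns the leftover time and the number of steps taken.
    """
    accumulator += frame_time
    steps = 0
    while accumulator >= FIXED_TIMESTEP and steps < max_steps:
        step(FIXED_TIMESTEP)
        accumulator -= FIXED_TIMESTEP
        steps += 1
    return accumulator, steps


if __name__ == "__main__":
    calls = []
    leftover, steps = advance(0.0, 0.04, calls.append)
    print(steps, leftover)
```

When the cap is reached, the simulation falls behind real time, which corresponds to the "less frequent physics updates" trade-off described for game loops in 2.6.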

Erleben’s thesis on Stable Robust and Versatile Dynamics Animations has a detailed description of the modules that make up rigid-body simulators [18]. His thesis looks at the internal structure of the open dynamics engine (ODE) (see 2.8.2.1) and how it compares to a modular design. The most relevant section of Erleben’s thesis is the mention of the spatial-temporal coherence analysis module (STC). This area is of most interest to this project because it is the section of the engine loop that provides the appropriate data for optimising. The thesis is a must-read paper because it describes in great detail many of the topics mentioned throughout this report [18].

2.7.1 Physics Time-step In Detail

The pseudo code in figure (2.4) describes how an internal time-step is calculated in a physics engine. The pseudo code is based on the Bullet Physics SDK [14]. This particular example is abstracted to make it a more general description of the sequence. For the benefit of simplicity we assume the physics engine uses fixed time-stepping (written as “FixedTimestep” in pseudo code).

2.7.1.1 Unconstrained Motion

The time-step starts by applying the equations of unconstrained motion. This refers to rigid-bodies that are not constrained by connections with other bodies and that are assumed to not be in contact with other bodies. This step acts as a prediction for the motion of the objects. If a collision occurs, the calculated velocities no longer hold, which is why the values are only a prediction; in this case the unconstrained movement is disregarded. The term “Global fields” refers to forces in the world such as wind, water flow and attraction, for example magnetic. These are forces that would act on the natural motion of an object and can be considered as influencing the object’s path of unconstrained motion. The damping in the update of linear and angular velocity takes into account the restrictive forces of travelling through volumes.


2.7.1.2 Collision Detection

After calculating unconstrained motion the engine loop needs to determine the objects that are involved in collisions. This section is an important aspect in reducing the processing time required by the physics engine. The costly calculations of the narrow-phase can be reduced by an efficient broad-phase that eliminates non-colliding bounding boxes.

The Bullet User Manual discusses the concept of a mid-phase for further culling and complex collision shapes like concave triangle meshes. It uses a hierarchical bounding volume structure with optimised traversing to find the narrow-phase components [14]. In the case of continuous collision detection, the time of impact is calculated in this step.

2.7.1.3 Non-Contact Constrained Motion

Once the contacting pairs have been identified the non-contact constrained motion can be calculated. This is motion of objects that are constrained by other objects, yet do not collide with them. An example of this is a rag-doll model. Each limb is constrained by the others, but the limbs aren’t in contact. Eliminating this group of objects removes them from the collision response set that is calculated in the next step. The physics engine looks at every constraint in the world and solves the linear and angular velocities for each respective constraint. The pseudo code in figure (2.4) shows a few examples, such as a hinge contact and a point contact between objects. The type of constraint and complexity is dependent on the implementation of the engine.

2.7.1.4 Collision Response: Contact Constraints

The response step calculates the impulses acting on the rigid bodies. This can be done using the sequential impulse-based method described by Mirtich in his PhD thesis [30]. It describes how modelling interactions between bodies is performed through collisions instead of by computing constraint forces at contact points.

It is also possible to solve the collision responses using an equivalent LCP. For real-time physics, an iterative LCP method such as the “Projected Gauss-Seidel” (PGS) method would be appropriate [11]. Coumans notes that direct LCP methods such as Dantzig LCP can provide better quality solutions [15].

2.7.1.5 Integrators

The integrator is the step of the physics engine that actually produces the movement. Once the velocities are calculated, the integrator estimates the new position of the object over the time-step. It does this by solving a series of differential equations of motion.
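As an illustration of the simplest kind of integrator, the sketch below uses a semi-implicit (symplectic) Euler step: the velocity is updated from the applied force first, and the new velocity is then used to advance the position. The choice of scheme is an assumption made for this example, orientation is omitted for brevity, and the function name is hypothetical; engines may use other schemes.

```python
def integrate(position, velocity, force, mass, dt):
    """Advance a point body by one time-step using semi-implicit Euler.

    velocity is updated from the force first; position then uses the new velocity.
    """
    acceleration = tuple(f / mass for f in force)
    new_velocity = tuple(v + a * dt for v, a in zip(velocity, acceleration))
    new_position = tuple(p + v * dt for p, v in zip(position, new_velocity))
    return new_position, new_velocity


if __name__ == "__main__":
    gravity_force = (0.0, -9.81, 0.0)  # force on a 1 kg body
    p, v = integrate((0.0, 10.0, 0.0), (0.0, 0.0, 0.0), gravity_force, 1.0, 1.0 / 60.0)
    print(p, v)
```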

2.7.2 Modular Design

The previous section described the stages of solving physics as functions. Erleben’s modular structure breaks physics simulators down into the following four components:

• Time Control Module - Controls when and how all the other modules are invoked in the simulator. It is also in control of the time-stepping algorithms to decide the frequency of moving through the simulation.


Figure 2.4: Pseudo Code: Simplified Physics Time-step

InternalTimestep(FixedTimestep)
    // FixedTimestep is the amount of time the simulator will step through

    // Unconstrained Motion
    Apply Global Fields
    Apply Gravity
    UpdateLinearVelocity(VolumeDamping, LinearDamping)
    UpdateAngularVelocity(VolumeDamping, AngularDamping)

    // Collision Detection
    // Broad-phase
    UpdateBoundingVolumes()
    CalculateOverlapping()
    // Narrow-phase
    ProcessCollisionDetection(algorithm)

    // Non-Contact Constrained Motion
    For each (constraint)
        switch (constraint.type)
            case hinge
                SolveLinear()
                SolveAngular()
            case point
                SolveLinear()
                SolveAngular()
            .. etc ..
        // End switch
    // End For each

    // Collision Response (For Contact Constraints)
    For each (iteration over contact constraints)
        For each (contact point)
            CalculateImpulses(body1, body2, contact point)
    For each (iteration over contact constraints)
        For each (contact point)
            CalculateFriction(body1, body2, contact point)

    IntegrateTransforms(FixedTimestep)
// End InternalTimestep


• Motion Solver Module - Responsible for moving the objects in the simulation. Motion solvers often predict motion before collision detection, then integrate to calculate the new positions.

• Constraint Solver Module - This module deals with the forces on objects and resolves them, passing the output to the motion solver module.

• Collision Solver Module - Computes the impulses and applies them to the colliding pair. The module is usually concerned with “contact resolution” where the impulses applied to an object from a collision are resolved. It deals with discontinuous motion and informs the motion solver module.

2.8 Existing Physics Technologies

Many of the topics covered so far have been around since the 1980s. Baraff’s 1989 paper on Analytical Methods for Dynamic Simulation of Non-penetrating Rigid Bodies [4] has been regularly referenced by a large number of papers since it was published. The mathematical techniques date back further, but real-time physics has only become popular following the introduction of machines that can compute the algorithms sufficiently quickly.

In the games industry this step is most noticeable in the arrival of a series of middleware technologies for game developers, animators and researchers alike. This section gives a brief description of the most prominent runners in the race to provide a complete physics solution.

2.8.1 Commercial

2.8.1.1 Havok Physics

Havok is probably the best known physics middleware in the games industry. Their technology, Havok Physics and Havok FX, has been used in a large number of game titles to date and extends beyond games, providing solutions for modelling and other applications. Havok Physics provides collision detection, vehicle dynamics, compact representation of large meshes, data serializing and constraint solving. Havok have recently been working with graphics card manufacturers to provide physics solutions for the GPU. Although best known for physics, Havok provide a family of products for games.

2.8.1.2 AGEIA PhysX (Formerly Novodex)

AGEIA PhysX has recently become a prominent name in the games industry because of their introduction of the physics processing unit (PPU). Their approach to improving physics in games is to utilise the physics hardware to allow developers to add more physics objects to their games. They intend to bridge the gap between real-time rigid-bodies, deformable-bodies and particles to provide cloth, liquids and particles that interact with world objects.

2.8.2 Open Source

2.8.2.1 Open Dynamics Engine

The Open Dynamics Engine is an open source library for providing rigid-body dynamics. It is used in games and simulations because of its stability, features and platform independence. It is implemented in C/C++ and is used in a wide range of projects, from robot simulators to vehicle modelling. The ODE has the following basic features:

• Articulated rigid-body structures.

• Hard contacts (Non-penetrating constraints).

• Collision detection.

• Joint types such as: ball-and-socket, hinge, slider, angular motor, etc.

• Collision primitives: sphere, box, cylinder, plane, ray, triangular mesh.

• Motion equations: Lagrange multiplier velocity based on Trinkle/Stewart and Anitescu/Potra (discussed briefly in [18]).

• Choice of time stepping methods (see 2.7.1).

• Contact and friction using Dantzig LCP solver.

The ODE is a very suitable choice for an engine because it has been developed since 2001. It has the bonus that it is open source under the BSD license or LGPL license.

2.8.2.2 Bullet Physics Library

The Bullet Physics Library is a collision detection and rigid-body dynamics library for games and animations. It is an open source library under the Zlib license and was created by Erwin Coumans [14]. It supports the COLLADA API, a standard for describing physics scenes, and is integrated into Blender3D, an open source graphics creation utility. Written in C/C++, the SDK can be compiled on various systems. Bullet has some of the following features:

• Continuous and discrete collision detection.

• Vehicle dynamics.

• Uses GJK algorithm.

• Uses impulse-based methods for collision response.

• Uses AABB bounding boxes in the broad-phase.

• Provides support for convex/concave meshes

• Provides 6 degrees of freedom for constraints: hinge, etc

• Has a broad, mid and narrow phase to collision detection

The Bullet Physics Library is very well supported by the community. Many of the authors of various papers referenced by this report are regular contributors to discussions on the physics simulation forum. Like ODE, Bullet is another suitable choice for an engine because it is well documented and uses recent techniques (impulse-based collision response, see 2.7.1.4).


2.9 Level of Detail for Physics

We will now move on to looking at the background for investigating physics improvements. Level of detail is already commonly used in graphics and gaming. Player models in real-time strategy games often have varying complexity dependent on the viewing distance from the camera. Techniques for changing the level of detail were described by Martin Reddy at SIGGRAPH 2002 [37]. Other information useful for research into level of detail includes Reddy’s PhD Thesis on Perceptually Modulated Level of Detail for Virtual Environments [36] and the book Level of Detail for 3D Graphics [22]. Games developers already make choices for the physical level of detail they simulate. Using too many complex models can cause performance problems, an aspect that this report aims to resolve. There have been a few papers on physics level of detail. O’Sullivan and Dingliana wrote a paper describing “Collisions and perception”, an important field for deciding how a user’s perception of physics can allow for performance improvements using level of detail [33]. Berka wrote a thesis describing level of detail relating to motion and animation in Virtual Reality. His work relates directly to work by Reddy and is a good basis for further research of level of motion detail. Level of detail is discussed later in more detail in Chapter 3 of this report.

2.10 Hardware for Physics

In the past hardware manufacturers have produced architectures to suit specific types of calculation. In the race to optimise physics calculations for real-time, hardware solutions have become of particular interest. They mainly focus on the “brute force” technique of solving the methods derived from collision detection and response. Many of the required calculations can be optimised into a single instruction, multiple data (SIMD) form with the final goal of running the calculations in parallel. The physics community appears to be approaching the problem from different perspectives. The recent introduction of physics processing units (PPU) has been spurred on by a demand from the games industry for larger scale, more complex game physics. The success of the technology is dependent on its efficient utilisation by the developers, algorithm optimisers and ultimately the ability of the games to provide increased physics complexity while maintaining real-time response.

At the same time the introduction of shader programs into the GPU pipeline has opened up the door to utilising the power of multiple parallel pipelines. These pipelines allow the same shader function, designed by the developer, to be performed on each vertex or pixel of the graphics scene in parallel. This makes it suitable for vector calculations that are independent of each other. Although multi-core CPUs can also be utilised to achieve parallel calculations, this project focuses on the principle of a constrained CPU and hence assumes that both cores are under high load when alternate hardware is used.

The technology described below is a reference for the techniques that have been investigated regarding “parallel physics processing”. Although they provide a potential solution to the problem, this report only covers parallelism briefly. See section 3.3.1 for a description of where hardware could be used.

2.10.1 GPU

2.10.1.1 Pipeline Architecture

Graphics pipelines on GPUs have the job of taking points in 3D space (known as vertices) and performing the calculations on them to draw the polygons that join them. The GPU finally calculates how this should be represented on the screen in terms of pixels. The pipeline is a specialised unit dealing directly with certain graphics techniques such as textures. On most recent graphics cards (within the past 3-4 years) there are two programmable stages of the pipeline: the vertex processor and the fragment processor. The details of these are described later on (see 2.10.1.2 and 2.10.1.3). The pipeline can be visualised as a series of calculations that occur to a stream of vertices in order to produce a frame. Owens describes how the graphics pipeline can be viewed as a programmable stream processor. He discusses how the pipeline can be used for general-purpose computation on GPUs (regularly referred to as GPGPU). By converting data into a stream it is possible to use the numerous GPU pipelines to perform parallel computation. By using the vertex and fragment processors at the relevant steps to manipulate the incoming data, it is possible to perform certain specific calculations and finally retrieve the result at the end.

The latest end-user GPUs on the market have 128 streaming processors 6, while most recent GPUs have between 12 and 48 programmable fragment processors and between 5 and 8 vertex processors 7. The power of these processing units has the potential to be used for the physics calculations described later in 2.10.1.2 and 2.10.1.3. More details on GPGPU programming can be found in the GPGPU section of GPU Gems 2 [21].

2.10.1.2 Vertex Processor

The “vertex processor” (often referred to as “vertex shader”) refers to the unit that performs the calculations on vertex data. A “vertex shader program” is a program that runs on the vertex processor and can modify the vertex data. Dependent on the architecture of the graphics card, the processor may be able to retrieve texture data at this stage in the pipeline that can be written to by the fragment processor later on. This allows vertex shader programs to work on data passed from previous calculations, potentially useful for result-dependent operations.

The vertex processor is used to group vertices and perform operations that can cull and clip groups. This is of particular interest to physics calculations because it can be used to reduce the final data set that certain time-consuming operations work on and hopefully speed up the process. The GeForce 6 series architecture could perform operations at 32-bit floating point precision, so there is still a high level of accuracy available for physics calculations [20]. After this stage the results are passed to the rasterizer (see [20] for more details), which prepares the data for the fragment processor.

2.10.1.3 Fragment Processor

The “fragment processor” (often referred to as “pixel shader”) is the unit that takes the fragment data from the rasterizer and performs calculations on it. In graphics, fragments can be thought of as “potential pixels” that undergo a series of tests by the fragment processor, ultimately resulting in them becoming the pixel data for a particular pixel on screen [20]. This allows “pixel shader programs”, the programs that perform such tests, to repeat the same functions on multiple fragments. Once again, fragment processors are able to access texture data (seen as the memory of GPGPU computation) and perform hundreds of calculations in SIMD concurrently (one fragment per fragment processor). In terms of physics this would be useful for ray-casting techniques that perform operations on a per-pixel basis.

6 Data courtesy of Tom’s Hardware [34]
7 Data courtesy of Tom’s Hardware [29]


2.10.1.4 Geometry Shader

Following a new generation of graphics hardware, the pipelines that run on the hardware have been updated during the development process to match. Direct3D, a Microsoft standard graphics API, is commonly used in many graphics applications to specify what the hardware should render. The latest specification, Direct3D 10, has a number of features that are too detailed to discuss in this report8. Instead we will focus on the important introduction of the Geometry Shader. This is a new part of the pipeline that sits between the vertex shader and the rasterizer and allows programmable operations on the vertex data from the vertex shader. The important aspects that are now enabled are the abilities to calculate plane equations of grouped vertices and also to locate the adjacent vertices of the group. This has a potentially useful application in physics, since calculating equations of planes and comparing vertices quickly can be utilised in the narrow-phase of collision detection (2.3.2). The intent of this section is to highlight the relevance of this new ability, but due to the scope of the project it may not be possible to investigate geometry shaders further.

2.10.1.5 CUDA

During the progression of the project, Nvidia released its new framework, CUDA, for performing calculations on the GPU. CUDA takes advantage of the architecture of the 8 series GeForce cards to provide a higher level of GPU abstraction. We can speculate that physics implementations on CUDA are already underway, so highlighting this development is important in the context of this report.

2.10.1.6 Appropriate Physics Utilisations

At this point it is probably important to comment on Shader Model 3.0. Shader models are standards that define the capability of graphics hardware. Shader Model 3.0 is important because it has the following features:

• Multiple Render Targets - The output of the fragment shader can be sent to four render targets. This is especially useful for particle physics where locations and velocities can be calculated simultaneously.

• Vertex Texturing - The vertex processor is able to read texture data. This means that calculations performed on previous data can be passed back to future calculations before the culling process. In terms of physics, operations that iterate can be performed and tested using this feature.

The Havok technologies described in section 2.8.1.1 set a minimum standard of Shader Model 3.0 for use on GPUs. It is possible that these features of Shader Model 3.0 are the reason for setting this standard for their software. It is now more likely that developers will use the new architecture in the latest GeForce cards as the standard for providing physics.

Much work has been done on GPUs in the area of real-time physics. For his Masters project, David Knott at the University of British Columbia discussed two methods of using the GPU pipeline to perform collision interference on hardware [25]. The techniques, particle detection and ray-casting, were implemented on the vertex and fragment shader respectively.

8 Details can be found in the following paper [8]


2.10.1.7 Limitations

The most noticeable limitation of the GPU hardware is the ability to read from the GPU back to the CPU after the calculations. Some approaches to utilising GPUs account for this by implementing most of their code on the GPU.

In order to obtain data from the GPU, the output of the fragment processor must be rendered to texture. An operation can then be performed to read from the texture memory. The problem is that when the operation occurs the GPU pipeline is flushed, so this can only be done at the end of the calculation. This poses a problem for real-time feedback. Knott mentions the specifics of the problem in his paper [25].

Another limitation that needs to be considered is the round-trip time of calculations sent to the GPU. Buck explains that the time taken to prepare the CPU data, send it to the GPU, process the data and retrieve it can often be longer than the time taken to perform the same calculation on the CPU [9]. He suggests that in order to make the method effective, the number of operations performed on the GPU must be large enough to account for the cost of sending and reading back. We can speculate that this limitation will be overcome in the new architecture, therefore making batch processing of generic data easier.
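The trade-off Buck describes can be illustrated with a very simple cost model. The sketch below is not taken from [9]; the function names and the timings (in nanoseconds) are assumptions chosen purely for illustration. It treats the upload and read-back as a fixed overhead and asks how many operations a batch must contain before the GPU becomes the faster option.

```python
def gpu_round_trip_ns(n_ops, upload_ns, readback_ns, gpu_op_ns):
    """Total time to run n_ops on the GPU, including both transfers."""
    return upload_ns + n_ops * gpu_op_ns + readback_ns


def cpu_time_ns(n_ops, cpu_op_ns):
    """Total time to run the same n_ops on the CPU (no transfer cost)."""
    return n_ops * cpu_op_ns


def break_even_ops(upload_ns, readback_ns, gpu_op_ns, cpu_op_ns):
    """Smallest batch size for which the GPU is strictly faster.

    Returns None when the GPU is not faster per operation, because the
    fixed transfer overhead can then never be recovered.
    """
    if gpu_op_ns >= cpu_op_ns:
        return None
    overhead = upload_ns + readback_ns
    # n * cpu > overhead + n * gpu  <=>  n > overhead / (cpu - gpu)
    return overhead // (cpu_op_ns - gpu_op_ns) + 1


# Hypothetical figures: 1 ms upload, 1 ms read-back, 10 ns per GPU
# operation and 100 ns per CPU operation.
batch = break_even_ops(1_000_000, 1_000_000, 10, 100)
```

Under these assumed figures the batch must contain a little over twenty thousand operations before the round trip pays for itself, which is the kind of threshold a designer would need to measure for real hardware.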

2.10.2 PPU

2.10.2.1 Architecture

The hardware implementation of the PPU has been pioneered by AGEIA (2.8.1.2). Not a great deal is known about the technology other than its ability to perform large amounts of physics calculations via the AGEIA PhysX System. AGEIA claim that the unit has 2 terabits per second of internal read/write memory bandwidth, which is higher than the equivalent GPU bandwidth. The other details they have revealed in their white paper are that the hardware has multiple cores, a physics-specific core architecture and accessible memory [1]. The speculation that this processing hardware can perform many parallel operations is quite likely to be correct.

2.10.2.2 Limitations

The main limitation of using such a piece of hardware is that the internal architecture is not very well known and tailoring a physics system to utilise it is a problem. The API to the hardware is provided by the PhysX system itself, so it is specific to the PhysX standard.

The current hardware produced by manufacturers Asus, Dell and BFG works through the PCI bus, not the current PCI Express ports. This appears to be a major limitation in terms of throughput; however, it is possible that in the future the hardware can use the PCI Express interface and theoretically increase the bandwidth.

2.10.2.3 Appropriate Physics Utilisations

The PhysX system has been demonstrated providing a range of physics improvements, from rigid bodies to particles. The general understanding is that it can be used for effects systems of explosions, flying objects and liquids that interact in real time with aspects of the game. The PhysX PPU has been utilised by physics-based games. Cell Factor allows players to use telekinetic abilities to pick up and throw objects at other players. The demanding physics in the game requires the use of the processor to collect hundreds of objects together in a single location and move them around. This is one direction gameplay physics may take, the extent of which could provide even more physics-based games.


2.11 Report Terminology

For the duration of this report I will use the following terminology:

Geometric Object or Collision Shape - The shape of the rigid body being represented in the dynamics world. This can be the shape of the rigid body or a group of Geometric Objects. They have geometric properties such as size, orientation and position, but no dynamic properties such as velocity or mass.

Space Object - A particular type of geometric object that represents the volume of a rigid body. A space object doesn’t have to correspond with the geometric representation of a rigid body, but space objects usually encapsulate the rigid body. An example of a space object is a bounding box.

Rigid Body - An object in the physics world that has dynamic properties such as mass and velocity. Rigid bodies cannot succumb to deformation.

Bounding Volume - A space object that entirely encapsulates a rigid body. They can be any shape, but are usually spheres, oriented bounding boxes or axis-aligned bounding boxes.

Stacking - The action of placing many dynamic objects on top of each other. Stacking usually refers to piles of objects but can also refer to pairs of objects, usually in resting contact. Stacking is used in the context of stability of physics. If a simulator can successfully stack objects without small “jittery” movements, the simulator is relatively stable.

Collision Pairs - A collision pair is a pair of objects A and B that the broadphase has decided requires further intersection testing.

Chapter 3

Investigation

“The real goal of physics is to come up with an equation that could explain the universe but still be small enough to fit on a T-shirt”

Leon Lederman

3.1 Physics Engine Analysis

The process of constructing a physics engine is a long and complex task. It requires a good understanding of vector mathematics, Newtonian laws and linear programming. Constructing even a simple engine would have taken the entire duration of the project and even then it would only have been able to simulate a small subset of object types. In comparison, current rigid body simulators can simulate complex meshes, constrained objects and vehicle dynamics. Using an existing dynamics engine was the logical choice for demonstrating a solution to sporadic cases. I chose to use an existing simulator for the following reasons:

• To avoid the steep learning curve of creating a physics engine from scratch.

• Large set of existing functionality - Physics simulators already contain many different types of collision primitives.

• Allow quick implementation of my proposed solution.

• To approach the problem from the perspective of a developer or researcher.

• To analyse a specific implementation related to the theory.

In this section we will compare our case study physics engine, Bullet, to the modular design described by Erleben, with the goal of identifying the cause of the collision overload problem [18]. Bullet is a suitable choice for analysis because it is open source and an active project with plenty of support from the “Physics Simulation Forum” [16]. Analysing Bullet is a task of identifying its structure and the techniques used, and relating them to theory from the books and research. The reason for taking this approach is that an implementation uncovers grey areas in the research and addresses them (otherwise it wouldn’t work!). Two well designed techniques that work efficiently separately, but together diminish performance, are the kind of problem we are looking for. We will revisit Bullet for the implementation of Scatter in Chapter 4.


[Figure 3.1: two block diagrams. (a) Modular Design, with blocks labelled Time Control, Physics Simulation Loop, Motion Solver, Constraint Solver, Collision Detection, Broadphase, Narrowphase, STC Analysis, Collision Solver, Contact Determination, Discontinuity Signal and Penetration Signal. (b) Bullet Modular Design, with blocks labelled btDiscreteDynamicsWorld, Motion Solver, Constraint Solver, Collision Detection, Broadphase, Midphase, Narrowphase, AABBs, OptimiseBVH, GJK, Time Control and SimIslands.]

Figure 3.1: The modular design concept by Erleben (a), applied to Bullet (b)

3.1.1 Case Study: Bullet Physics Library

Bullet is becoming one of the more well known rigid-body physics engines available. Research in 2006 by Seugling and Rolin evaluated a range of physics simulators and gave Bullet an average score of 2.6 out of a maximum of 5.0 [40]. In less than a year Bullet has increased its feature set and now includes support for Collada (a standard for specifying dynamics) and vehicles, and has its own implementation of a sequential impulse-based solver. The reason why we will look at Bullet as a case study is that it is a research tool that features many of the latest techniques from the physics developers community. At the outset of the project, it was likely that whatever solution was used would require the physics engine to be modified to incorporate it, which is why this case study looks at the internal workings.

3.1.1.1 Modular Design

The design of Bullet is intended to be modular and not optimised for performance. This allows the implementation of generic solvers that can be included when setting up Bullet. Bullet can be split into two main areas:

Dynamics (btDynamicsWorld) - Deals with rigid bodies, solving motion, constraints and contact.

Collision Detection (btCollisionWorld) - Deals with the collisions and the selection of algorithms, and can form a subsection of the dynamics.

For this subsection we will look at btDiscreteDynamicsWorld to analyse the simulation process. Figure 3.1 shows how Bullet conceptually fits the modular design model by Erleben. The actual implementation is mainly contained within the btDiscreteDynamicsWorld class. Each module can be thought of as a set of function calls that solve a problem and then return the result. A lot of the algorithms used by Bullet are hidden within the structure; for example, simulation islands (described in subsection 3.1.1.2) are contained within btDiscreteDynamicsWorld and manipulate the persistent data independently of other modules. The best place to intercept data is the NearCallback function. By specifying “MyNearCallback” as an argument for the CollisionDispatcher, developers are able to catch the simulator at the point where two objects overlap their bounding volumes. This is utilised in Scatter and helps provide the information to trigger the model switching (see section 4.5.2).
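The interception pattern can be sketched independently of Bullet. The following is a minimal illustration of the idea only: the Dispatcher class, its methods and the PairMonitor are hypothetical stand-ins written for this example and do not reproduce Bullet's classes or signatures. The callback is invoked once for every pair that the broadphase reports as overlapping, records the pair, and then hands control back to the default narrowphase handling.

```python
class Dispatcher:
    """Toy stand-in for a collision dispatcher (hypothetical, not Bullet's)."""

    def __init__(self, narrowphase):
        self.narrowphase = narrowphase            # function(pair) -> bool
        self.near_callback = default_near_callback

    def set_near_callback(self, callback):
        """Replace the function called for each overlapping pair."""
        self.near_callback = callback

    def dispatch_all_pairs(self, overlapping_pairs):
        """Call the near callback for every pair the broadphase reported."""
        for pair in overlapping_pairs:
            self.near_callback(pair, self)


def default_near_callback(pair, dispatcher):
    """Default behaviour: run the narrowphase test on the pair."""
    return dispatcher.narrowphase(pair)


class PairMonitor:
    """User callback: records each overlapping pair, then defers to the default."""

    def __init__(self):
        self.seen = []

    def __call__(self, pair, dispatcher):
        self.seen.append(pair)                    # information a model switch could use
        return default_near_callback(pair, dispatcher)
```

A monitor of this kind sees every bounding-volume overlap before any expensive test is run, which is why it is a convenient place to gather the data needed for a switching decision.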

3.1.1.2 Algorithms

Bullet has a number of built-in implementations, described below, that form the basis of the working application. It also uses a few techniques to further arrange data and find tests that don’t need to be performed. The following is a description of the algorithms used and why they are beneficial to Bullet and simulators in general.

• btSequentialImpulseSolver - An implementation of Mirtich’s impulse-based contact resolution technique, recognised as a suitable approach for real-time simulation. This is the main constraint solver used, but by implementing the btConstraintSolver interface it is possible to use other solvers.

• btGjkEpaSolver and btGjkPairDetector - An implementation of the GJK EPA algorithm and GJK pair detection (see section 2.2.2). These are the default methods used for convex collision detection in the narrowphase. There is also the option of using SAT (Separating Axis Theorem) for convex objects. Either of these algorithms would be suitable for narrowphase testing and the choice is down to context and preference.

• btSimulationIslandManager - Bullet uses the concept of simulation islands, a technique to group bodies based on their constraints and activation state (whether or not an object is sleeping). It uses a union find technique to efficiently group bodies. It is possible that simulation islands could be used to partition the work of the simulator. The idea is mentioned briefly in subsection 3.3.1, but is not an area of focus in this report.

• btCompoundCollisionAlgorithm - One of a number of classes that implement the btCollisionAlgorithm interface. The difference is that the compound collision algorithm uses OptimizedBVH (implemented using AABB trees, see section 2.3.1.3), which Bullet refers to as the “Midphase”, to reduce the number of tests to be performed by the narrowphase on compound objects. The presence of this in the implementation helped to form the concept of the solution in section 3.4. Without compound collisions, producing the implementation of Encapsulation Levels in Scatter would have been difficult.

• btPersistentManifolds - Bullet uses the idea of “Manifolds” to cache contact points between frames. Manifolds are effectively created every time a new collision pair is tested with a certain algorithm (btCompoundCollisionAlgorithm for example). They also contain an implementation of collision filtering, reducing the number of contacts to only four. This helps stability and reduces calculations. Quantifying manifolds is difficult because they are continually created and cached, making it difficult to see which manifolds are in use.
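The union find grouping mentioned for simulation islands can be sketched in a few lines. This is a generic illustration of the technique and not Bullet's implementation: the class, the build_islands helper and the representation of bodies as integer indices are assumptions made for the example, and activation state is ignored. Each constraint or contact links two bodies, and bodies that are linked directly or indirectly end up in the same island.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                               # attach smaller tree
        self.size[ra] += self.size[rb]


def build_islands(num_bodies, links):
    """Group body indices into islands, given (a, b) constraint/contact links."""
    uf = UnionFind(num_bodies)
    for a, b in links:
        uf.union(a, b)
    islands = {}
    for body in range(num_bodies):
        islands.setdefault(uf.find(body), []).append(body)
    return sorted(islands.values())


# Bodies 0-1-2 form a chain, 4 and 5 touch, body 3 is isolated.
islands = build_islands(6, [(0, 1), (1, 2), (4, 5)])
```

Because islands share no constraints with each other, each one could in principle be solved separately, which is what makes them a candidate for partitioning the simulator's work.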

3.1.1.3 Improved Performance

By running Scatter with Bullet integrated it is possible to step through the execution of the physics. Comparing the state of the world at the specified frame to the internal work done in the engine revealed the following performance improvements.

• Static and Kinematic objects are excluded from motion calculations - Culling of static objects and kinematic objects happens in many stages of the simulator. Static objects don’t move and hence don’t require updating, and kinematic objects (objects which are moved by the user and don’t respond to collisions, but only cause them) are likewise excluded.

• Short iterations for numerical solvers - Bullet’s sequential impulse solver uses 10 iterations by default to solve the contact and friction models.

• Midphase - Bullet uses what it refers to as a “Midphase” to improve the stage between broadphase and narrowphase. The algorithm is only used for compounds and the implementation actually uses AABB trees. See van den Bergen for further information on AABB trees [45].

3.1.2 Bottlenecks of Physics Simulators

Physics simulators are designed to fit a purpose and in that sense identifying bottlenecks is related to the simulator design. A good technique can be badly implemented at the cost of performance: for example, the sweep and prune method is a simple concept, but a non-trivial problem to implement. Choosing a data structure to store the endpoints that can perform fast swapping of sequential data is beneficial for the insertion sort approach used by Baraff. The established techniques in simulators are those that are repeatedly used in different simulator types: OBBs, AABBs, impulse-based contact resolution and LCP solvers, to name a few.
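To make the sweep and prune idea concrete, the following is a minimal single-axis sketch. It is an illustration written for this report rather than Baraff's or Bullet's code: the class name, the use of a plain Python list for the endpoints and the restriction to one axis are all simplifying assumptions. The endpoint list is kept sorted between frames, and each swap made by the insertion sort either starts or ends an overlap, so a frame in which little has moved costs very few swaps.

```python
class SweepAndPrune1D:
    """Single-axis sweep and prune over intervals (illustrative only)."""

    def __init__(self, intervals):
        self.intervals = dict(intervals)              # body id -> (lo, hi)
        # Endpoints are (body id, is_max). Mins are listed before maxes so the
        # stable sort treats touching intervals as overlapping.
        self.endpoints = [(b, False) for b in self.intervals]
        self.endpoints += [(b, True) for b in self.intervals]
        self.endpoints.sort(key=self._value)
        self.overlaps = set()
        active = []
        for body, is_max in self.endpoints:           # one full sweep (cold start)
            if is_max:
                active.remove(body)
            else:
                self.overlaps.update(self._pair(body, o) for o in active)
                active.append(body)

    def _value(self, endpoint):
        body, is_max = endpoint
        return self.intervals[body][1 if is_max else 0]

    @staticmethod
    def _pair(a, b):
        return tuple(sorted((a, b)))

    def update(self, moved):
        """Apply new intervals and re-sort; returns the number of swaps made."""
        self.intervals.update(moved)
        eps, swaps = self.endpoints, 0
        for i in range(1, len(eps)):
            j = i
            while j > 0 and self._value(eps[j - 1]) > self._value(eps[j]):
                left, right = eps[j - 1], eps[j]
                if left[1] and not right[1]:
                    # A min moved below another body's max: overlap begins.
                    self.overlaps.add(self._pair(left[0], right[0]))
                elif not left[1] and right[1]:
                    # A max moved below another body's min: overlap ends.
                    self.overlaps.discard(self._pair(left[0], right[0]))
                eps[j - 1], eps[j] = right, left
                swaps += 1
                j -= 1
        return swaps


sap = SweepAndPrune1D({0: (0, 2), 1: (3, 5), 2: (10, 12)})
swaps_in = sap.update({1: (1, 4)})    # body 1 slides into body 0
pairs_in = set(sap.overlaps)
swaps_out = sap.update({1: (3, 5)})   # and back out again
```

A full implementation would repeat this on three axes and report only the pairs that overlap on all of them. The choice of container matters because the inner loop is dominated by adjacent swaps.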

3.1.2.1 Full Narrowphase Intersection Testing

The main area of bottlenecks is collision detection, specifically narrowphase intersection testing. For convex polyhedra we need to find a separating plane, a pair of contact points and, in some simulators, the penetration depth. Some algorithms like GJK use distance to do this and can solve penetration depth in the process. The complexity arises in the form of a search, whether it is looking for half spaces (to determine if a point is inside another object) or solving Linear Programming (LP) problems. The cost of performing these tests is such that physics engines will attempt to avoid doing the calculations as often as possible. Simulators must test all objects against each other because every frame the world changes, with the potential for any object to collide with any other. From a cold start (when there is no prior knowledge to help the test) this step is undesirable, but using frame-coherent data we can improve the time complexity. Simple spatial analysis tells us that not all objects are likely to be in the same space (partitioning is a potential area of performance improvement), so broadphase culling is used to eliminate collision pairs that don’t overlap.

Bottlenecks appear to occur when narrowphase must be performed with:

• Little help from broadphase (most objects overlap)

• Low frame coherence (when separating planes or closest points must frequently be recalculated: rotating objects)

• Algorithm complexity of a collision pair is high (see subsection 3.1.2.5)

3.1.2.2 Unused Calculation Results

Frequently leaving calculation results unused is not a clear bottleneck, but could potentially degrade performance if not identified in an implementation. The idea is that, because of the uncertainty of physics, attempts are made to predict motions and collisions which are later ignored or recalculated. Bullet makes a prediction of motion before entering the collision detection stage. This is to the benefit of all the objects that don’t collide or have constraints: since their final position has been predicted, there needn’t be further calculations. There is a degree of uncertainty as to whether any objects will collide, so predicting motion is a logical step. The best improvement that could be made in this area is probabilistic analysis to decide whether to predict or to allow for the cost of performing calculations when needed.

3.1.2.3 Maintaining Structures

Techniques that use complex structures to quickly find information always have the overhead of structural maintenance. Regeneration of structures is often slow and should be performed as infrequently as possible. AABBs are an example of a structure that must be regenerated every frame. The implementation relies on the efficiency of AABB regeneration, which is faster than OBB regeneration [45]. Inefficient maintenance can be a potential bottleneck.
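As an indication of how cheap per-frame AABB regeneration can be, the sketch below recomputes the world-space AABB of a rotated box directly from its half-extents. This is one common approach and is shown as an assumption for illustration; it is not claimed to be the method used in [45] or in Bullet, and the function name and argument layout are hypothetical. Taking the absolute value of each rotation matrix entry gives the world-axis extent without transforming all eight corners.

```python
def box_world_aabb(center, half_extents, rotation):
    """World-space AABB of an oriented box.

    center       -- box centre in world space, (x, y, z)
    half_extents -- half side lengths in the box's local frame, (hx, hy, hz)
    rotation     -- 3x3 rotation matrix as a list of rows (local to world)
    """
    world_half = [
        sum(abs(rotation[i][j]) * half_extents[j] for j in range(3))
        for i in range(3)
    ]
    lo = tuple(center[i] - world_half[i] for i in range(3))
    hi = tuple(center[i] + world_half[i] for i in range(3))
    return lo, hi


# A 2 x 4 x 6 box centred at x = 10, rotated a quarter turn about the z axis.
quarter_turn_z = [[0, -1, 0],
                  [1, 0, 0],
                  [0, 0, 1]]
lo, hi = box_world_aabb((10, 0, 0), (1, 2, 3), quarter_turn_z)
```

The cost is nine multiplications per box, which is why regenerating every AABB each frame is usually acceptable where rebuilding a more complex structure would not be.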

3.1.2.4 Excessive Contact Points

The more contact points there are between two objects, the more contact resolution needs to be performed after collision detection. It may seem obvious that this is a bottleneck in the contact resolution stage, but often a small number of contact points will suffice. Large numbers of contact points on surface collisions cause instability. Consider the following situation:

“Two objects are touching via a face-to-face collision. Selecting a set of contact points could change drastically between frames, because there are so many potential points of contact. Simulators work on the principle of resolving contact. Uneven sets of points cause models to move then the effect is amplified as contact occurs elsewhere.”

The result is oscillations and instability. Bullet uses “manifolds” that coordinate the contact points and reduce them to 4 points to resolve the problem. Whether excessive contact points are a bottleneck depends on the implementation, but a good description of techniques by Moravanszky et al. in Game Programming Gems 4 is ensuring that developers are aware of the problem [31].

3.1.2.5 Complex Intersection Algorithms

On the subject of collision detection, the preference is to calculate interactions between convex objects. For meshes we usually attempt to enclose them in convex hulls or convex primitives that are simpler to test. Create Dynamics by John Ratcliff, which uses convex decomposition, is an example of such a library. The models used in Scatter have been created using these decomposition techniques. Concave object collisions are usually avoided in favour of compounds of convex objects. A compound is an object composed of various primitives in any structure. Although it may seem to be just a subset of primitives, compounds have the added bonus that they don’t require inter-primitive testing of any of their shapes. Work by Guendelman et al. involving non-convex rigid bodies with stacking, using signed distance functions and triangulated surfaces, could be developed into a feasible technique for real-time, but for now complex concave objects are avoided.

3.1.2.6 Generally Avoiding Bottlenecks

It is apparent that some of the methods of avoiding bottlenecks are purely “good performance programming techniques”: using data structures that are fit for purpose, and only doing the work required unless the cost of doing it later, when requested, is too high. The areas that I have identified can sometimes be attributed to a trade-off made by the designer. Accuracy is traded for speed in the number of iterations performed by numerical solvers. The hot topic in algorithms seems to be reusing data between frames. Stepping through frames of a simulator and using support planes and cached points to “warm start” numerical solvers is one such example. Using performance data structures for file output is just one example used in Scatter.

3.2 The Collision Detection Bottleneck

To recap, the collision overload problem occurs when a collection of objects collides at a single point. The ideal method of identifying the cause would be to recreate the scenario in the original game and profile the game software to analyse how much time is spent performing which functions. The problem with this approach is that it requires access to the code structure of the game and also a license to use it. In most cases this would be difficult to do and the outcome would likely focus on the implementation of a specific physics simulator. This section views the problem in relation to general physics engine algorithms.

From the research I established two probable causes of the drop in performance:

1. A sudden overlapping of broadphase bounding objects, requiring an increase in narrowphase collision detection.

2. A method for breakable objects that results in a sudden increase in complexity.

To understand what the objects are doing when colliding we will start by looking at the broadphase. In the background we established that AABBs are the most frequently used form of broadphase, so I will refer to the case where AABBs are being used. Broadphase collision detection of n AABBs is usually of time complexity O(n log2 n + k) 1 (where k is the number of overlapping pairs), and we can assume that in the motivation each collision object in the world uses a single AABB. If every object is close enough, we will assume for the sake of the problem that every AABB is overlapping, so that k is n(n−1)/2 (the number of collision pairs for n objects). This is therefore a time complexity of O(n(log2 n + n)). This is probably a rare case since not all n objects are likely to overlap, but it is possible for a large number to do so. Physics researchers recognise that the time to perform broadphase calculations is of lower significance compared to performing narrowphase calculations. The significance of a large value for k is that even after the AABBs are calculated, k pairs will still require narrowphase collision testing. This means that the time complexity for an intersection query (assuming the use of AABBs) is O(Broadphase) + ... + O(ith-phase) + ... + O(Narrowphase), where the ith phase is any additional phase between broad and narrow (Bullet has a midphase described in section 3.1.1.3).
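The effect of clustering on k can be checked with a brute-force count. The sketch below is purely illustrative: it tests every pair directly rather than using an output-sensitive broadphase, and the function names and box layout are assumptions made for the example. When all boxes share the same region of space, every one of the n(n−1)/2 possible pairs survives the broadphase.

```python
from itertools import combinations


def aabb_overlap(a, b):
    """True if two AABBs, each given as (min_xyz, max_xyz), overlap."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))


def overlapping_pairs(aabbs):
    """Brute-force list of index pairs whose AABBs overlap."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(aabbs), 2)
        if aabb_overlap(a, b)
    ]


n = 5
# Spread out along x: unit boxes 10 units apart never touch.
spread = [((10 * i, 0, 0), (10 * i + 1, 1, 1)) for i in range(n)]
# Piled onto a single point: every box covers the origin.
piled = [((-1, -1, -1), (1, 1, 1)) for _ in range(n)]

k_spread = len(overlapping_pairs(spread))
k_piled = len(overlapping_pairs(piled))
```

With five boxes the spread-out layout passes no pairs to the narrowphase, while the piled layout passes all ten, which is the situation the collision overload problem describes.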

The average time to compute an intersection query is described by van den Bergen [46]. For a sequence of intersection tests S1, ..., Sn, let fi be the event that Si fails and Ci be the average time to perform Si. The average time to perform an intersection query is:

$$T_{avg} = \sum_{i=1}^{n} P[f_1 \ldots f_{i-1}] \, C_i$$

where P[f1...fi−1] is the probability that tests S1, ..., Si−1 all fail.

1 Using the output-sensitive algorithm presented by Six and Wood [41]

The aim is to form a series of tests for which Ci, and the probability of a test failing under the condition of the former test having failed, are small. Both P and Ci can be estimated using profiling tools. Relating this back to the time complexity, if in the motivation a large proportion of the objects have to take all intersection tests to find the result, the total query time of an object increases. Since the efficiency of collision detection relies at the very least on the speed of the broadphase, we can see that the overlapping of the AABBs of many objects will have a big impact on time if the next phase is not efficient. Given a simple broadphase object and a complex narrowphase, it is likely that the number of surfaces that need to be tested will increase. Simple overlap has the potential to cause the collision overload if the objects are complex enough. The same effect is visible in Scatter, which uses Bullet, and is apparent in the results.
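The average query time can be evaluated directly once the costs and conditional failure probabilities are known. The sketch below is a hedged illustration: the function name is hypothetical, the three-phase costs and probabilities are invented numbers rather than measurements, and "fails" is used in the sense of the definition above, meaning that the next test in the sequence still has to be run.

```python
def average_query_time(costs, p_fail):
    """T_avg for a sequence of tests.

    costs[i]  -- average time C of the (i+1)-th test
    p_fail[i] -- probability that the (i+1)-th test fails, given that all
                 earlier tests failed (so the next test must be run)
    """
    total = 0.0
    p_reached = 1.0          # P[f_1 ... f_{i-1}], equal to 1 for the first test
    for cost, p in zip(costs, p_fail):
        total += p_reached * cost
        p_reached *= p
    return total


# Hypothetical costs for broadphase, midphase and narrowphase tests.
costs = [1.0, 10.0, 100.0]
typical = average_query_time(costs, [0.5, 0.25, 0.0])
overload = average_query_time(costs, [1.0, 1.0, 0.0])
```

With these assumed numbers the typical query costs 18.5 units, whereas when every early test fails the cost rises to the full 111 units. The early tests only pay for themselves when they frequently resolve the query.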

Another probable cause is related to techniques of handling breakable objects. Decomposing a single mesh object into a collection of separate sub-meshes and physics objects will inevitably increase the number of objects in the world, n. It is common in video games to break larger objects into smaller objects. This is usually done by having a selection of “gibs” that models break into on destruction [12]. If the models are physics objects, the set of gibs can be specified by the developer and can be a collection of smaller objects or fitting pieces. These objects are traditionally precomputed, but it is becoming more common to use real-time deformable or breakable models2.

Figure 3.2: Breakable objects dividing into constituent elements.

Figure 3.2 shows the steps usually taken when dealing with breakable objects at the time of destruction. First, the mesh is removed and replaced with the mesh of the gibs, with the appropriate transform from the centre of the original mesh to the correct position and orientation. The individual gibs are then wrapped by the appropriate physics primitives. For complex gibs, these can be precomputed. Left bunched together, it is likely that the gibs would intersect, collide and then explode in all directions in the contact resolution phase. This situation would be undesirable for objects that need to appear to crack or fall apart, so at this stage the gibs are separated far enough apart that they don’t immediately disperse. At the end of breaking a model, there are more physics objects than before the break. Complex models can have any number of gibs, but this is usually limited to avoid these situations. Assume that the original mesh of breakable j has kj physics primitives representing it. Now assume that it has g mesh gibs, each of which is represented by pij physics primitives, where pij is at least one. The number of new primitives in the world per breakable is:

$$\mathrm{Obj}_j(\mathrm{new}) = \left( \sum_{i=1}^{g} p_{ij} \right) - k_j \quad \text{and} \quad \mathrm{Obj}_j(\mathrm{before}) = k_j$$

2 For further reading see Real Matter for deforming soft bodies [38] and work by Bao et al. on fracturing of rigid materials [3].

The reason why we take into account the kj existing primitives is that we want to observe how many are added on breaking. In the case of Figure 3.2, the original jug has three primitive objects: two boxes and a sphere. If b is the number of breakable objects in the motivation collision and r is the number of regular objects (objects other than breakables, including static etc.), then the total number of physics primitives involved in the narrowphase after breaking is:

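$$
\begin{aligned}
np_{after} &= \sum_{j=1}^{b} \left( \mathrm{Obj}_j(\mathrm{new}) + \mathrm{Obj}_j(\mathrm{before}) \right) + \sum_{l=1}^{r} k_l \\
&= \sum_{j=1}^{b} \left( \sum_{i=1}^{g} p_{ij} - k_j + k_j \right) + \sum_{l=1}^{r} k_l \\
&= \sum_{j=1}^{b} \sum_{i=1}^{g} p_{ij} + \sum_{l=1}^{r} k_l
\end{aligned}
$$

where $np_{before} = \sum_{j=1}^{b} k_j + \sum_{l=1}^{r} k_l$ and $n = b + r$.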

The worst case scenario would be when the maximum number of gibs that any breakable splits into is $g_{\mathrm{max}}$, each gib is a single primitive ($p = 1$), the maximum detail of an original breakable mesh is $k_{\mathrm{max}}$, and every colliding object in the world is of this kind, hence $r = 0$ and $np_{\mathrm{before}} = n \times k_{\mathrm{max}}$. We can show that $np_{\mathrm{after}} = n \times g_{\mathrm{max}}$, so the total number of primitives involved in the narrowphase increases by a factor of $g_{\mathrm{max}} / k_{\mathrm{max}}$. In Figure 3.2, the original mesh has 3 primitives and the resulting gibs have 7 primitives, so the factor is $7/3$, or about 2.3. The factor is constant and will more than double the number of primitives when colliding. Comparing all narrowphase primitives would then require the testing of $np(np-1)/2$ pairs, so the number of pairs increases approximately as $\mathrm{pairs}_{\mathrm{after}} \approx \mathrm{pairs}_{\mathrm{before}} \times \left( g_{\mathrm{max}} / k_{\mathrm{max}} \right)^2$.
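As a quick numerical check of the worst-case estimate, the following Python sketch evaluates the primitive and pair counts for the jug of Figure 3.2 (3 primitives before the break, 7 single-primitive gibs after). The function names and the choice of n = 10 breakables are illustrative assumptions and are not part of Scatter or Bullet.

```python
def pair_count(num_primitives):
    """Number of unordered pairs among the primitives: np(np - 1) / 2."""
    return num_primitives * (num_primitives - 1) // 2


def worst_case(n, k_max, g_max):
    """Counts when all n objects are breakables (r = 0) and each gib is
    a single primitive (p = 1)."""
    np_before = n * k_max
    np_after = n * g_max
    return {
        "np_before": np_before,
        "np_after": np_after,
        "pairs_before": pair_count(np_before),
        "pairs_after": pair_count(np_after),
    }


# Assumed scenario: 10 jugs like the one in Figure 3.2 (3 primitives -> 7 gibs).
result = worst_case(n=10, k_max=3, g_max=7)
exact_ratio = result["pairs_after"] / result["pairs_before"]
approx_ratio = (7 / 3) ** 2
print(result)
print(round(exact_ratio, 2), round(approx_ratio, 2))
```

For these assumed values the exact pair ratio comes out slightly above the squared factor, because the approximation drops the −1 terms in np(np − 1)/2.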

In the sporadic cases where most objects overlap and therefore have to be narrowphase tested, we observe that the complexity of the narrowphase collisions dictates the outcome. Gino van den Bergen admits that since worst cases are rare, physics designers “often abandon hard real-time requirements and shoot for optimal average timings” [46]. This indicates that it is possible to encounter situations where performance will drop, most likely observed in the original problem.

This example of breakables shows a situation where small changes to physics objects can result in large increases of calculations. The next section looks at the possible areas that can minimise the impact of overlapping and discusses their suitability.

3.3 Solution Methods

This report looks at two areas of research that address the collision overload problem:

• Parallelisation of physics calculations - Identifying areas of the physics simulation loop where parallelism can occur and utilising specific hardware and libraries. Parallelism can be further divided into:

– Data level parallelisation - Where every physics object is arranged in such a way that operations on it can be performed in parallel. This would require grouping of objects that interact to reduce the overhead of transferring data.

– Task level parallelisation - Where each stage of the physics pipeline is broken down into tasks, such as collision detection and contact resolution.

• Calculation reduction - Identifying situations where reducing the amount of calculation would improve the performance. Calculation reduction covers:

– Model Reduction (Level of Detail) - Where the complexity of the representative model is reduced.

– Numerical Algorithm Improvement - Where new algorithms are designed that take fewer steps or reduce the amount of work required.

This report briefly mentions parallelising calculations in relation to Bullet, but we highlight this area for the purpose of future research. The remainder of this chapter will look at “Level of Detail” as a solution.

3.3.1 Parallelising Calculations

The response to multi-core technology and parallelisation is an area of research interest at the time of writing. The background has mentioned a number of research articles relating to implementations on GPUs. With the arrival of technologies like CUDA (see section 2.10.1.5), GPU implementations of physics engines are a current topic (see section 2.8.1.1). The motivation for physics parallelisation appears to be driven by next-generation console development. Multiple cores have become an area of focus for physics developers. Erwin Coumans, the author of Bullet, is involved in a port of the Ageia PhysX engine to the PS3. Kokkevis et al. describe implementing physics on the CELL architecture [27]. They look at parallelising four parts of the physics loop across the PPU³ and SPUs of the CELL processor. The suggested areas are noted below:

• Narrowphase Collision Detection

• Narrowphase Contact Point Determination

• Constraint Preparation

• Constraint Solving


[Figure 3.3 (diagram not reproduced). Left panel, “Bullet (Modular)”: Broadphase; Midphase; Narrowphase (Collision Detection, Contact Points & Penetration Depth); Timer Control; Constraint Solver (Constraint Preparation, Constraint Solve); Motion Solver (Integrate Transforms). Right panel, “Bullet with possible parallelisation”: the same stages, with the sub-steps labelled CD, CP, CS, CP and IT each drawn as three parallel instances.]

Figure 3.3: Possible data parallelisation in Bullet

Figure 3.3 shows how the techniques of Kokkevis et al. would appear in the Bullet architecture. It is possible that the collision detection and contact point determination in the narrowphase could be combined into one logical task, mainly because the GJK solver in Bullet detects contact points as part of the implementation. Task level parallelisation is difficult because each task requires a complete set of data from the previous task. For example, contact resolution can only be calculated once all contact points are known. Data level parallelisation, on the other hand, is suitable for physics. Using Bullet as an example, we could calculate the broadphase as a single task and then arrange the data by simulation islands, assigning islands to be run in parallel and only synchronising when data sets overlap.

The problem with this approach is that it still doesn’t directly address the collision overload problem caused by all contacts in the same group overlapping. Any improvements brought about by data parallelisation are likely to improve the performance of physics as a whole and not just collision overload. Parallelisation is a natural progression that improves performance indirectly and is therefore left as future work. I refer the reader to the various papers that have looked at parallelisation.
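The island-based data parallelism described above can be sketched as follows. This is a minimal Python illustration, not Bullet's implementation: the union-find grouping, the `solve_island` placeholder and the list of overlapping pairs are all assumptions made for the example, and Python threads only stand in for real worker cores.

```python
from concurrent.futures import ThreadPoolExecutor


def build_islands(num_objects, overlapping_pairs):
    """Group object indices into islands with a union-find over broadphase pairs."""
    parent = list(range(num_objects))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in overlapping_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    islands = {}
    for obj in range(num_objects):
        islands.setdefault(find(obj), []).append(obj)
    return sorted(islands.values())


def solve_island(island):
    """Placeholder for per-island narrowphase and constraint solving."""
    return len(island)


# Hypothetical broadphase output: pairs of object indices whose bounds overlap.
pairs = [(0, 1), (1, 2), (4, 5)]
islands = build_islands(6, pairs)

# Islands share no objects, so each one can be processed independently.
with ThreadPoolExecutor() as pool:
    sizes = list(pool.map(solve_island, islands))
print(islands, sizes)
```

If every object overlaps with every other, the grouping returns a single island and nothing can be split across workers, which is why this kind of parallelism does not by itself help in the overload case.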

3.4 Level of Detail

This report proposes a level of detail technique called “Encapsulation Levels” to address the problem of collision overload. We have discussed in the background how level of detail is applied to graphics to improve performance. One reason why we would attempt to apply level of detail in physics is to benefit from the user’s diminishing perception of physical detail in a scene, similar to the way graphics benefits from visual perception. In a complex scene with many objects, the individual interactions between the objects become less of a concern to the user when faced with globally comprehending all the interactions. When we are not sure what to focus on we will continuously switch what we are looking at, effectively distracting us from the detail of interactions.

³ Refers to the “Power Processing Unit” of the CELL architecture and not to the “Physics Processing Unit” of Ageia Technologies.

Physical detail of objects is currently a concern, but for reasons of computational efficiency. It is infeasible to represent physics collision shapes at the same level of detail as graphics meshes, not least because of the complexity of concave collision detection. Instead, developers opt for simpler models to represent complex shapes. The decision is related to a trade-off between computational efficiency and user perception of detail. Developers and users would prefer to have more detailed objects but are limited by performance requirements.

Encapsulation Levels works on dynamic switching of models in an attempt to provide an overall higher level of model detail without the performance degradation experienced in cases like collision overload. Switching to a simpler model can reduce the number of calculations required in these cases and hence improve the performance. Unfortunately, switching models is a non-trivial problem. We need to establish when switching is likely to be of benefit, how quickly we can respond to a situation and what the requirements are for switching a model. This section details the problem of model switching and then in Chapter 4 we look at implementing an example in Scatter.

3.4.1 User Perception

In graphics, perception can be measured mathematically in terms of how many pixels represent an object and hence what level of detail a user will see can be quantified. In physics this is a little more difficult. We are concerned with the “look and feel” of objects and expectations of what will happen. Consider the following example:

“In a physics simulator a user is given the task of arranging objects by moving them around. The user’s focus is directed at a small group of objects where the interactions with these objects are quite intricate. Placing a complex object with a rounded edge on a flat surface, we would expect the object to attempt to roll on the rounded edge. If instead of rolling it began to pivot we would perceive the motion as infeasible.”

This perception relies on the focus of the user. For many objects that are colliding continuously we don’t require the same level of detail because reactions are so quick that the user wouldn’t notice. O’Sullivan and Dingliana have written a paper on “Collision and Perception” that attempts to address what a user can perceive and how performance can be improved [33]. They draw three important conclusions from the work:

• Erroneous collisions in the user’s periphery are less likely to be detected.

• Anomalies that occur between homogeneous distractors are less obvious.

• Time delay between collision and response reduces the plausibility of the collision.

They conclude that it is possible to produce random collision responses that are as believable as the accurate ones, mainly because as complexity increases humans rely on common-sense judgements of dynamics that are inaccurate. This supports the idea that reducing model detail during collision overload will have less of an effect on the user’s perception.

3.4.2 Encapsulation Levels

When formulating an idea for changing level of detail we need to address the following requirements:


1. Changing the level of detail must preserve system stability.

2. Changing the level of detail must be computationally feasible in real-time.

3. Changing the level of detail must not degrade the plausibility of collisions.

The concept of Encapsulation Levels comes from these requirements. The idea is the following:

“Encapsulation levels is when each object has a number of n discrete levels of detail. The base level “1” is the lowest amount of detail acceptable for representing the object. Level n is the highest level of detail and hence the ideal representation of the object. n can be a large number or as little as two. The requirement is that each higher level of detail is contained within the previous level of detail. This preserves the accuracy of the shape and allows us to make the following assumption. If level i, where 1 ≤ i < n, is not intersecting then levels (i + 1, ..., n − 1, n) are not intersecting either. We therefore don’t have to make any further calculations. If a model is at level i and is colliding, it is possible to switch to a higher level of detail without disrupting the system stability. To reduce the level of detail we need to make a single intersection test of the (i − 1)th level to determine whether we can switch to a lower level of detail whilst preserving stability.”

This concept satisfies the 1st requirement. To satisfy the 2nd and 3rd, we must apply two restrictions:

The discrete levels of detail must be precomputed to avoid the overhead of calculating on the fly as a switch is made:

This allows the models to be algorithmically generated provided they are contained within each other. We are also able to design the level of detail by hand to get the best “performance to accuracy ratio” of model switching.

Each level of detail must provide a plausible response to collisions in the situation it is invoked:

An object colliding at high velocity could plausibly use a low level of detail, but an object interlocking with another object would require a high level of detail to be plausible.

Figure 3.4 shows how encapsulation levels work for complex objects.
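The containment property can be illustrated with a deliberately simple model. In the Python sketch below each level of detail is a concentric sphere, which is an assumption made purely for illustration (real levels would be precomputed collision shapes); the class and function names are hypothetical. Because every finer level lies inside the coarser one, the search can stop at the first level that is clear.

```python
import math
from dataclasses import dataclass


@dataclass
class LevelledObject:
    """Object whose levels of detail are concentric spheres (illustrative only).

    radii[0] is level 1 (coarsest, largest). Each later level must fit inside
    the one before it, which for concentric spheres means a radius no larger.
    """
    centre: tuple
    radii: list

    def __post_init__(self):
        if any(finer > coarser for coarser, finer in zip(self.radii, self.radii[1:])):
            raise ValueError("each level must be contained in the previous one")

    def radius(self, level):
        return self.radii[level - 1]  # levels are numbered from 1


def intersects(a, level_a, b, level_b):
    """Sphere-sphere overlap test between the chosen levels of two objects."""
    return math.dist(a.centre, b.centre) < a.radius(level_a) + b.radius(level_b)


def finest_intersecting_level(a, b, level_b):
    """Walk from coarse to fine and stop at the first level of `a` that is clear.

    Returns 0 if level 1 is already clear: no finer level can intersect,
    so no further tests are needed.
    """
    hit = 0
    for level in range(1, len(a.radii) + 1):
        if not intersects(a, level, b, level_b):
            break
        hit = level
    return hit


lamp = LevelledObject((0.0, 0.0, 0.0), [3.0, 2.0, 1.0])
mug = LevelledObject((4.0, 0.0, 0.0), [1.5, 1.0])
print(finest_intersecting_level(lamp, mug, 1))
```

Here only the coarsest level of the lamp touches the coarsest level of the mug, so the two finer levels never need to be tested.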

3.4.3 Requesting a Level of Detail

Switching a level of detail can’t necessarily be performed as required, because we must first test the associated safety conditions. We could try to switch up a level at any instant because level i + 1 is not intersecting by definition. Using safety conditions prevents us from causing chaos by switching at the wrong time. Consider the following safety conditions:

• “Only switch up a level if there are no contact points with other objects” - Figure 3.5 shows the result of not using this condition.

• “Only switch down a level if the (i − 1)th level of detail is collision free” - Figure 3.6 shows the result of not using this condition.


Figure 3.4: A diagram showing “Encapsulation Levels” of a lamp and a mug

Figure 3.5: The problems of increasing level of detail without safety conditions

Figure 3.6: The problems of decreasing level of detail without safety conditions


To enforce conditions like this we will use the idea of making “requests”. When it is felt that a model switch would be beneficial, a request is made to change the level of detail. The request will persist until either the model is switched or the request is retracted. The next section describes how we decide when to make requests.
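The request mechanism and the two safety conditions above can be sketched in a few lines of Python. The `HybridBody` class and its fields are hypothetical names for this illustration and do not mirror the Scatter or Bullet classes; in particular, the contact count and the result of the coarser-level test are assumed to be supplied by the physics step.

```python
from dataclasses import dataclass


@dataclass
class HybridBody:
    level: int                         # current level of detail, 1 = coarsest
    max_level: int
    contact_points: int = 0            # contacts reported for the current level
    coarser_level_clear: bool = False  # result of testing level - 1
    pending: int = 0                   # +1 = switch up, -1 = switch down, 0 = none

    def request(self, direction):
        """Record a request; it persists until it is satisfied or retracted."""
        if direction not in (+1, -1):
            raise ValueError("direction must be +1 or -1")
        self.pending = direction

    def retract(self):
        self.pending = 0

    def attempt_switch(self):
        """Apply the pending request only if its safety condition holds."""
        if self.pending == +1:
            safe = self.level < self.max_level and self.contact_points == 0
        elif self.pending == -1:
            safe = self.level > 1 and self.coarser_level_clear
        else:
            return False
        if safe:
            self.level += self.pending
            self.pending = 0
        return safe


body = HybridBody(level=2, max_level=3, contact_points=4)
body.request(-1)
first = body.attempt_switch()    # coarser level not yet known to be clear
body.coarser_level_clear = True
second = body.attempt_switch()   # condition now holds, so the switch happens
print(first, second, body.level, body.pending)
```

The failed first attempt leaves the request in place, which is the persistence behaviour described in the text.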

3.4.4 Global, Group and Local Policies

Policies for switching models can be written for different levels of scope: global, group and local. Each policy relies on the heuristics available to make the best decision about when model switching would be of benefit. Heuristics such as proximity on a local level or the “group size to group range ratio” (comparing the number of objects in a group to the space that the set of objects occupies) on a group level can be used to request model switching. O’Sullivan and Dingliana note that adaptive detail (local level) is preferable to reducing the complexity of the whole scene (global level) to achieve the target performance. Local adaptive detail doesn’t address the issue of high computational load. In these situations it can be preferable to reduce the global complexity to achieve a minimum frame rate. The condition is that the user perception of frame degradation is greater than the user perception of physics plausibility.

Using policies we can decide when is the best time to switch. Figure 3.7 shows a flow diagram describing when requests to switch can be made. Each level of scope of the diagram has access to certain heuristics:

• On a local level:

– When in proximity to other objects, inform a low level policy manager about a potential collision.

– When velocity is over a threshold, inform a low level policy manager about a potential collision.

• On a group level:

– When the number of objects exceeding local thresholds is high, inform a group level policy manager about a potential collision.

– When the range of positions, compared to the size of the group, is over a threshold ratio, inform a group level policy manager about a potential collision.

• On a global level:

– When system performance has dropped below a threshold, inform the global level policy manager to request a model reduction.

– When the number of overlapping groups exceeds a threshold, inform a global level policy manager about the potential collision.

Analysis of these heuristics can help make better estimations of when switching is preferable. A simple implementation would trigger a request every time a condition is met. Having a policy manager could provide a way of tracking the values and making these decisions. We are interested in the effects of simple triggers because they have the lowest overhead. For this reason, we test our implementation for the base case and propose more detailed analysis for future work.
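A simple trigger of the kind described above could look like the following Python sketch. Every threshold value, the `Thresholds` class and the reading of the “group size to group range ratio” as objects per unit of occupied space are assumptions for illustration only; they are not taken from the Scatter implementation.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    proximity: float = 1.0        # local: distance below which objects are "near"
    speed: float = 20.0           # local: speed above which a collision is likely
    group_count: int = 8          # group: number of objects over local thresholds
    group_density: float = 2.0    # group: objects per unit of occupied range
    min_frame_rate: float = 30.0  # global: lowest acceptable frames per second
    overlapping_groups: int = 4   # global: number of groups that overlap


def local_trigger(nearest_distance, speed, t):
    """Local scope: the object is close to another object or moving fast."""
    return nearest_distance < t.proximity or speed > t.speed


def group_trigger(objects_over_local, group_size, group_range, t):
    """Group scope: many objects over local thresholds, or a dense group."""
    density = group_size / group_range if group_range > 0 else float("inf")
    return objects_over_local > t.group_count or density > t.group_density


def global_trigger(frame_rate, overlapping_groups, t):
    """Global scope: performance has dropped or too many groups overlap."""
    return frame_rate < t.min_frame_rate or overlapping_groups > t.overlapping_groups


def should_request_switch(local, group, glob):
    """Simple trigger: request a switch as soon as any scope fires."""
    return local or group or glob


t = Thresholds()
near = local_trigger(nearest_distance=0.5, speed=3.0, t=t)
dense = group_trigger(objects_over_local=2, group_size=12, group_range=4.0, t=t)
slow = global_trigger(frame_rate=25.0, overlapping_groups=1, t=t)
print(near, dense, slow, should_request_switch(near, dense, slow))
```

A policy manager would replace `should_request_switch` with logic that tracks these values over time rather than reacting to a single frame.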


[Figure 3.7 (flow diagram not reproduced). Lanes: Local, Group, Global, each with a “Detect Local/Group/Global Knowledge” step. Local knowledge: Proximity, Collisions, Velocity, Mass. Group knowledge: Group size, Complexity, Local thresholds, Group profiling. Global knowledge: Group overlap, Physics Performance, Global Frame Rate, Global profiling. Decision nodes: “Over local threshold?”, “Group switch requested?”, “Global switch requested?”, “Prior level is collision free?”, “Request retracted?”. Actions: “Request a switch”, “Make a switch”, “Wait a specified time”.]

Figure 3.7: Global, group and local decision making for model switching

3.4.5 Investigation by Implementation

By implementing a version of Encapsulation Levels in Scatter we aim to evaluate the following:

• How can we quantify the gaps between levels of detail?

• What is the lowest level of detail we can use that provides plausible collisions?

• What is the performance improvement of the system whilst using the technique?

• What are the best heuristics for requesting model switching?

• What is the speed of response to a switch request?

Chapter 4

Implementation

“In theory, theory and practice are the same. In practice they aren’t even close”

Unknown

To support the investigation of encapsulation levels I developed an implementation using my own physics framework known as Scatter. Scatter is a basic API that allows for quick prototyping of physics scenarios without the problems of having to incorporate a renderer. Scatter uses a modular design to abstract the physics simulator from the renderer, potentially allowing different renderers and simulators to be used to test the same scenario. The intention was to be able to recreate problems, observe them and interact with them. It was designed to emulate a simple game loop with the aim of prototyping performance improving techniques. The following is the list of stages encountered to finally implement encapsulation levels:

• Investigate renderers and physics APIs.

• Select a combination on which to base the design of Scatter (the renderer used was Irrlicht and the physics engine was Bullet).

• Build the framework with the requirements for prototyping physics techniques.

• Incorporate Encapsulation Levels into Scatter.

• Experiment with the result and identify areas that require improvement.

• Run the performance evaluation tests and analyse the results of implementation with and without encapsulation levels and model switching.

The requirements set for Scatter are grouped by functionality:

• The System should:

– Show the difference between the physics representation and the renderer representation of any world object.

– Be able to create “scenes” that allow the developer to write a test scenario.

• Performance Monitoring should be able to:

– Monitor the CPU load.

– Attribute performance of the application to each task (Profiling).

– Record the results for analysis (Output frame data).

– Provide visual runtime information (Profiling on screen and render-able debug).

– Run with a low overhead.

• Rendering should:

– Be able to provide graphical detail to emulate a game environment.

– Be able to give the impression of an actual game scenario.

– Be independent of the physics engine.

• Physics should:

– Allow modifications to the implementation to test physics algorithms (the use of different collision solvers etc).

– Be able to run with and without the added algorithms for analysis purposes.

– Be able to apply forces to objects to invoke reactions of world objects.

4.1 The “World” Model

In the research leading up to the design of Scatter we have seen a range of physics APIs. Scatter uses the world model to integrate physics and rendering. Figure 4.1 shows how the model stores the physics and render objects in a “world”. Each object has attributes that provide information to each of the sub-systems, such as position (physics, renderer) and active (physics). Implementing interfaces gives objects functionality; for example, implementing the “Sound interface” would allow a developer to trigger sounds from the object. World objects in Scatter are render-able and physical for the purpose of demonstration.


[Figure 4.1 (diagram not reproduced). Labels: Application; World; World Objects; Physics Simulator; Video Renderer; Sound Renderer; Attributes: Renderable (Mesh, Lighting, Visible), Audible (Sound, Radius, Ambient), Physical (Shape, Mass); Interface (Load Mesh, Update Position, Get Position); example “Submarine Object”.]

Figure 4.1: The world model used in a simulation application

4.2 Timing and Game Loops

The main loop of Scatter was designed to reflect a standard game loop. Structured as a Single-thread Uncoupled Model (terminology described by Valente et al., see section 2.6), Scatter was designed to use a structure that synchronised on a fixed frequency, similar to that used in console development where the performance of hardware is fixed. This allows Scatter to fix the system frame rate and hence the maximum rate that the loop will update. In a variation on a Single-thread Uncoupled Model, the implementation uses clocks, timers and sub-timers to control the rate of update of each component. Figure 4.2 shows how the clocks, timers and sub-timers can be used to run the loop. Clocks are used by the system as the most accurate timing. The clock time is unmodifiable and the rate is fixed to real world time. The implementation SCWin32Clock, which is the default clock for the Windows platform, uses the operating system’s built-in performance timer. Timers are used by the main system timer running the loop. The idea is that timers can be paused and reset to control the execution rate of the whole simulation. Sub-timers, based on timers, conceptually have a rate at which they run faster or slower than the rate of their source. This allows each component to run at different update rates, giving the loop a decoupled feel. The implementation is a simplified example, but is demonstrated with the ability of the user to “pause” the entire physics simulation. Figure 4.2 shows the structure of the loop and the different functions called from it. The blue shapes represent functions that use the “System timer” to update and synchronise; the green shape is the physics function that runs on its own timer, allowing it to update at a different rate to the orange renderer. All timers are updated and controlled by the loop manager, which can monitor the different timers.
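The clock, timer and sub-timer hierarchy can be sketched as below. This Python version is only an illustration of the idea: the class names echo the description above but the code is not the Scatter API, the clock is advanced by hand instead of reading an operating system counter, and only pausing (not resetting) is shown.

```python
class Clock:
    """Fixed-rate time source. Advanced manually here for repeatability;
    a real clock would read a high-resolution OS counter instead."""

    def __init__(self):
        self._now = 0.0

    def advance(self, seconds):
        self._now += seconds

    def now(self):
        return self._now


class Timer:
    """Accumulates clock time while running; pausing freezes it."""

    def __init__(self, clock):
        self._clock = clock
        self._elapsed = 0.0
        self._last = clock.now()
        self.running = True

    def update(self):
        now = self._clock.now()
        if self.running:
            self._elapsed += now - self._last
        self._last = now

    def elapsed(self):
        return self._elapsed


class SubTimer:
    """Follows a source timer at a scaled rate (0.5 means half speed)."""

    def __init__(self, source, rate=1.0):
        self._source = source
        self.rate = rate
        self._elapsed = 0.0
        self._last = source.elapsed()

    def update(self):
        now = self._source.elapsed()
        self._elapsed += (now - self._last) * self.rate
        self._last = now

    def elapsed(self):
        return self._elapsed


clock = Clock()
system = Timer(clock)
physics = SubTimer(system, rate=0.5)

clock.advance(1.0)
system.update(); physics.update()
system.running = False              # "pause" the whole simulation
clock.advance(2.0)
system.update(); physics.update()
system.running = True
clock.advance(1.0)
system.update(); physics.update()
print(clock.now(), system.elapsed(), physics.elapsed())
```

Pausing the system timer stops every sub-timer derived from it, while the clock itself keeps following real time.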


[Figure 4.2 (diagram not reproduced). Loop flow of the Scatter main loop: Loop Synchronisation, Frame Data, Update Timers, Loop Management, Update CPU, Process input, Update Physics, Update Profiling, Update Debug, Update Renderer, Update GUI, Render (GUI, Scene, World, Environment), Update Frame Data. Timers: System Clock, System Timer, Physics SubTimer, Render SubTimer.]

Figure 4.2: The main Scatter loop and timing system.


4.3 Built-in Profiling

Profiling in Scatter provides the best way of tracking performance of the sub components. Having a built-in profiler allows the components to read the data as it is recorded. SCFrameData provides a quick read interface that allows each component to read the data back, up to the length of the output buffer. The retrieved data can be used for analysing the variance of a certain function over time, with the goal of identifying patterns that indicate behaviours of the physics such as suspected “stacking” or the build-up to a large collision.
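A read-back buffer of this kind can be sketched with a fixed-length queue per task. The Python class below is a hypothetical stand-in for the idea, not the SCFrameData interface; the task name and timings are made up for the example.

```python
from collections import deque
from statistics import pvariance


class FrameData:
    """Fixed-length history of per-frame timings, keyed by task name."""

    def __init__(self, buffer_length):
        self._length = buffer_length
        self._samples = {}

    def record(self, task, milliseconds):
        history = self._samples.setdefault(task, deque(maxlen=self._length))
        history.append(milliseconds)  # the oldest sample drops out when full

    def read_back(self, task, count):
        """Return up to `count` of the most recent samples, oldest first."""
        if count <= 0:
            return []
        return list(self._samples.get(task, ()))[-count:]

    def variance(self, task, count):
        """Population variance over the most recent `count` samples."""
        window = self.read_back(task, count)
        return pvariance(window) if len(window) > 1 else 0.0


frames = FrameData(buffer_length=4)
for ms in [1.0, 1.0, 1.0, 1.0, 5.0]:
    frames.record("narrowphase", ms)

print(frames.read_back("narrowphase", 10))
print(frames.variance("narrowphase", 4))
```

A sudden rise in the variance of one task's timings is the sort of pattern that could hint at the build-up to a large collision.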

4.4 Scatter API

Scatter provides the following key classes for creating scenes and running the system:

• AFApplication - An example of an application using the API. AFApplication uses the main loop and the timers to run the application.

• SCPhysics - The physics interface that allows the application to update the physics world.

• SCRenderer - The renderer interface that allows the application to run the renderer.

• SCBulPhysics - The Bullet implementation of the SCPhysics class. This class wraps Bullet, allowing the application to set the gravity and other variables. The class handles all the Bullet-specific implementation, such as setting callbacks.

• SCIrrRenderer - The Irrlicht implementation of the SCRenderer class. The class wraps the running of Irrlicht and allows the application to register classes to be rendered.

• SCWorld - Contains the objects of the world and updates them during the loop.

• SCClock - The fixed rate timing source of the application.

• SCTimer - Uses the Clock as the base time to perform timing for the system. Can be started and stopped.

• SCSubTimer - Uses the SCTimer as the base time to perform timing for the physics and renderer. Can be started, stopped and run at a different rate to the base time.

• SCWorldObject - An object in the world. Has aspects of both physics and renderer and converts between the two when updated.

• SCBulPhysicsObject - The interface that gives an object physical properties.

• SCIrrRenderObject - The interface that gives an object render-able properties.

• SCHybridObject - Represents an object in the world that has more than one physics representation (part of the encapsulation levels implementation). It is able to switch between representations by requesting to change the level of detail. Hybrid objects will only be able to switch if the world to which they belong is running the “Hybrid” implementation (discussed in section 4.5.2).

• SCSphereObject - An example of one of the primitives for the world. The object wraps up all implementation behind a simple constructor.

• SCScene - Used to set and update the different scenes running in the application. The scene represents all the details about the positions of lighting, cameras and objects.

• SCFrameData - Used to store and read data between scenes that are logged to file. Used to record all the heuristics in the application.

• SCDebugDrawer - Used to collect and draw the extra debug information. It takes debug output from the physics to draw in the renderer.

4.5 Integrating Encapsulation Levels

Using Scatter as a basis for implementing the level of detail model switching, I made the following choices for implementation:

• To test the effectiveness of multiple switching, Scatter can use n levels of detail, each specified with an associated COLLADA physics data file (a standard for specifying physical attributes).

• This particular implementation of Encapsulation Levels is based on “proximity”. From the investigation we concluded that global switching requests would improve performance in rare cases, but can cause undesirable effects if applied to all world objects. This implementation is mainly concerned with local requests, but has been designed to allow global requests (see future work, section 6.5). Global requests can be performed at a world object level from within the SCWorld class.

• Add the functionality of “Encapsulation levels” as an additional feature to Bullet by creating a new btHybridRigidBody and btHybridDynamicsWorld.

• All the functionality should be added between the SCBulPhysics and btHybridDynamicsWorld, keeping the implementation behind the Scatter API. Figure 4.3 shows the structure and flow of requests and switching.

4.5.1 Hybrid World

To perform switching Scatter uses “Hybrids”. The word, meaning “mixed composition”, refers to any part of Scatter or Bullet that deals with model switching. In Scatter the implementation is based around the idea of requesting and switching. There are three new components added to Scatter, enabling the implementation:

• SCHybridObject - An object in the Scatter API that allows the user to “add” multiple COLLADA objects from file, representing the levels from 1 to n.

• btHybridRigidBody - The internal representation in Bullet of a hybrid object. A modification of the btRigidBody with the extra functionality of being able to store multiple collision shapes.

• btHybridDynamicsWorld - A modified version of btDiscreteDynamicsWorld that controls the additional testing and triggering associated with switching.


[Figure 4.3 (sequence diagram not reproduced). Scatter side: SCHybridObject, SCWorld, SCBulPhysics. Bullet Physics side: btHybridDynamicsWorld, btHybridRigidBody. Labels: Global Trigger, Group Trigger, Local Trigger, Request, Register request caller, Lookup hybrid dynamics, Forward request, Lookup btHybridRigidBody, Ack, Internal Step, AttemptSwitch, Broadphase, Narrowphase, No Contact, Overlapping, Switch Detail, Success, Find callback, Forward callback, Callback.]

Figure 4.3: The flow of requests and switches across the Scatter/Bullet boundary.


Figure 4.3 shows how an SCHybridObject (the object representation in Scatter) sends a request to SCBulPhysics, which invokes a similar function in the btHybridDynamicsWorld that then adds a request to switch. If, during the normal execution of the physics loop, the btHybridDynamicsWorld decides the object is able to switch, it will perform the switch and inform the SCHybridObject via a callback. We can think of this as “SCHybridWorld objects cause requests, while btHybridRigidBody objects respond by switching.”

4.5.2 btHybridDynamicsWorld

The btHybridDynamicsWorld is the core of the implementation. Experimenting with Bullet callbacks revealed that classes in “Scatter space” (on the Scatter side of the Scatter/Bullet boundary) could successfully retrieve information like the number of manifolds, contact points and other internal Bullet values. What the callbacks couldn’t do was update information at different stages of the loop. Creating btHybridDynamicsWorld gave enough control to modify the execution and catch test conditions.

4.5.2.1 Additions To the Bullet Loop

The implementation added two stages to the Bullet loop: one to attempt to switch the models and one to catch the lower detail model tests. These function calls are made at the start of the loop and after the collision detection stage respectively. We want to be able to test a lower detail model and then detect if it has collided. To do this we iterate over O(n) manifolds to gather the information we need. Although this is a large number of manifolds (see the results in section 5.1.5), the work done per manifold is small enough to justify the operation. It is possible to avoid iterating at all, but that would require full implementation into Bullet, an aspect that we want to avoid when prototyping. Catching a successful lower detail model test allows us to switch the model for the simulation step.
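The manifold pass described above amounts to a single linear scan. The Python sketch below illustrates it under loudly stated assumptions: each manifold is modelled as a plain `(body_a, body_b, num_contacts)` tuple standing in for Bullet's persistent manifolds, and `probing` is a hypothetical set of bodies currently test-running a lower detail model. None of these names are Bullet or Scatter API.

```python
def bodies_in_contact(manifolds):
    """Single pass over the manifolds: collect ids of bodies with contacts.

    Each manifold is modelled as (body_a, body_b, num_contacts); this tuple
    layout is an assumption standing in for Bullet's persistent manifolds.
    """
    touching = set()
    for body_a, body_b, num_contacts in manifolds:
        if num_contacts > 0:
            touching.add(body_a)
            touching.add(body_b)
    return touching


def catch_lower_detail_tests(probing, manifolds):
    """Split the bodies that are testing a lower detail model by outcome.

    `clear` bodies had no contacts, so their test succeeded and they may
    switch; `blocked` bodies collided and keep their request pending.
    """
    touching = bodies_in_contact(manifolds)
    clear = sorted(b for b in probing if b not in touching)
    blocked = sorted(b for b in probing if b in touching)
    return clear, blocked


# Hypothetical state after the collision detection stage of one step.
manifolds = [("a", "b", 2), ("c", "d", 0)]
probing = {"a", "c", "e"}
clear, blocked = catch_lower_detail_tests(probing, manifolds)
print(clear, blocked)
```

The scan touches each manifold once, so its cost grows linearly with the number of manifolds and is independent of how many bodies are being tested.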

4.5.2.2 Collecting Local Heuristics

SCNearCallback and SCCollisionAddedCallback are used to provide the heuristics for “TotalObjectsInNarrowPhase” and “TotalNewContactPoints”. Single run-time operations ensure that we can monitor these functions efficiently. Keeping track of contact points helps to ensure that our objects are contact free and not overlapping (for increasing level of detail).

4.5.2.3 Problems in Implementation

Although Bullet is open source, the classes were not designed for implementing techniques like model switching. The following is a list of problems encountered and how I overcame them:

• Manifolds are continuously modified throughout the execution (they are persistent data) - Manifolds are created on demand and removed randomly, so it became tricky to keep track of which were active and which were deleted. The implementation uses a technique that iterates over the manifolds with a low time complexity and tags the ones that are important.

• Avoiding the temptation to alter base classes - I provide a simple binary compatible test to detect the type of class and avoided having to add implementation to more Bullet classes than required. The result was easy run-time identification. It proved to be light-weight and non-invasive.


4.5.3 Successful Switching

The final implementation could successfully switch to a lower level of detail within a frame of detecting “no intersections”. Models that were unsuccessful at switching were tested in future frames. Deciding how frequently to test is an issue for policy analysis. An aggressive form of testing could provide more problems. For this system, I attempted aggressive testing to find the impact it had. The results of testing showed that the impact was higher with higher complexity test models. I concluded that intermittent testing approximately every second would reduce the impact.

When objects cease to overlap, any requests made to switch are cancelled until they overlap again. This situation means that objects won’t accidentally drop more than two levels of detail without being in proximity. This fact shows that objects will only switch when prompted. The next chapter describes the results of testing the implementation.

Chapter 5

Evaluation

“You can’t prevent disasters, but you can diminish their frequency and severity”

Murphy’s Law of Risk

5.1 Performance Evaluation

To evaluate the performance of the Scatter implementation, I ran test scenarios to recreate the collision overload problem. The tests compare Scatter using the “Hybrid” implementation against a Scatter build without model reduction techniques. The aim was not only to evaluate the performance of the implementation but also to observe how situations like convergence and sudden impact manifest themselves in profiling output and in the load of the system. Feeding the results of the output back into Scatter allowed for fine tuning and improvement. It also opened up areas of investigation like “how to quantify the difference in level of detail”. Table 5.1 lists the two different builds of Scatter, “ScatterDefault” and “ScatterHybrid”, both of which ran Scene 1 and Scene 2 as tests.

Scene                                             Scatter Build (Default)   Scatter Build (Hybrid)
Scene 1: Two hybrid objects colliding             Test 1                    Test 2
Scene 2: Game scenario: Space Collision Overload  Test 3                    Test 4

Table 5.1: Table to show the tests performed for each build of Scatter

5.1.1 Test Conditions

In these test results we are still interested in measuring the activity for a period of time after the collision, to observe the “aftermath”. For this reason, Scene 1 and Scene 2 are 1300 frames and 1600 frames long, respectively (approximately 45 seconds and 1 minute in execution time). The graphs use frame number as the x-axis to avoid inconsistency when run on different systems. To make the scenarios comparable, each frame has a fixed time-step of 1/60 seconds. All GUI data output was turned off for the duration of the tests, and the only information shown beyond the view of the normal scene was the rendering of the Bounding Boxes and Collision Shapes in wireframe. This extra debug information was useful in identifying the success of switch requests (flashing objects indicate a waiting request). The scenarios were run without user interaction so that sequential testing was consistent. The tests were run on the following test system with the following settings:

Machine                  Performance Laptop
Processor                2.33 GHz Intel Core 2 Duo
Memory                   2 GB 667 MHz DDR2 SDRAM
Graphics Driver          ATI x1600 Mobility Radeon
Operating System         Windows Vista
Compiler                 Microsoft Visual C++ Compiler (Visual Studio 2005)
Compiler Optimisations   -O2 (Debug off)
Notes                    The application was run with administrator privileges to allow the collection of CPU performance data

Table 5.2: Test System Settings

5.1.2 Test Scenarios

5.1.2.1 Scene 1

Scene 1 was created to observe the basic interactions between two hybrid objects and how they respond to colliding. The scene uses two complex compound objects to represent two spaceships, which I will refer to as “Hunter” and “Fighter”. Hunter has a concave mesh (with a convex physics representation) and was chosen because of the possibility of interlocking with other objects, a situation that would provide useful information in terms of “user perception”.

The Fighter mesh is a shape to which it is difficult to fit a suitable convex decomposition. It has small wings that often sit outside the physics collision shape. The two objects are separated and a field is applied that draws them to the centre of the scene. They make a single collision, then separate far enough that the AABBs stop overlapping. They are then drawn to each other and make resting contact whilst slowly rotating about the centre of the scene. Figure 5.1 shows the layout before the collision. The objects have two levels of detail, both starting at “level 2”. The trigger in this example is based on proximity (the overlapping of the AABBs). The expectation was that the two objects would converge, switch models to the simpler form (level 1) and collide using those models.
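The proximity trigger used in this scene comes down to an AABB overlap test. The following is a minimal sketch of such a test and of a trigger built on it; `Aabb`, `aabbOverlap` and `shouldRequestSwitch` are hypothetical stand-ins for illustration and are not the types or functions used by Bullet or Scatter.

```cpp
#include <cassert>

// Minimal axis-aligned bounding box; a stand-in for the engine's own AABB type.
struct Aabb {
    float min[3];
    float max[3];
};

// Two AABBs overlap only if their extents overlap on all three axes.
inline bool aabbOverlap(const Aabb& a, const Aabb& b) {
    for (int axis = 0; axis < 3; ++axis) {
        assert(a.min[axis] <= a.max[axis] && b.min[axis] <= b.max[axis]);
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis]) {
            return false;   // separated on this axis
        }
    }
    return true;
}

// Hypothetical proximity trigger: request a switch while the boxes overlap
// and the object is not already at its simplest level (level 1).
inline bool shouldRequestSwitch(const Aabb& self, const Aabb& other, int currentLevel) {
    return currentLevel > 1 && aabbOverlap(self, other);
}
```

In Scene 1 both objects start at level 2, so under these assumptions the trigger would fire for both of them on the first frame in which their boxes overlap.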

5.1.2.2 Scene 2

If Scene 1 is the sandbox, then Scene 2 is the desert (the same thing, but larger). The purpose of the scene is to pull all objects into a central point so that all AABBs overlap, effectively making the broadphase testing redundant. To achieve this, all objects are placed randomly on a radius around the central point so that the overlapping is progressive as they move inwards. They all converge on the central point, causing the degradation of performance. After the initial collision the objects fly off in different directions, some with enough velocity to escape the field. The remaining objects continually collide and disperse until “the Sun” eventually reaches the centre point, indicating the end of the test. The motion can be thought of as an oscillation with damping.

Figure 5.1: Scene 1: Left: the Hunter mesh. Right: the Fighter mesh

Figure 5.2: Scene 2: Spaceships (Hybrid Models), Asteroids (Basic Spheres) and the Sun (Large Sphere).

The hybrid objects in the world are two types of spaceship, Hunter and Fighter, both with three distinct levels of detail. The other objects are small asteroids (non-hybrids), included to distinguish hybrids from non-hybrids in the resulting graphs. Figure 5.2 shows the objects prior to the point of convergence.

5.1.3 Performance Measures and Expectations

The main performance measures used in the tests were the ones that showed patterns relating directly to the activity in the scene. The main example is the profiling of the physics, which clearly showed spikes in execution time where the objects overlapped. The list below shows the indicators used to measure performance on a per-frame basis. All bullet points under “Total Physics” are sub-components of Bullet. The important ones are highlighted in bold.

• Profiling Function Calls (Execution time per frame) in milliseconds

– Total Physics - The time taken to perform all the stages of the physics update

∗ Updating AABBs - Time to recalculate the bounding volumes

∗ Prediction of Motion - Time to calculate motion before collision detection

∗ Collision Detection - Time to perform narrowphase collision detection

∗ Calculating Simulation Islands - Time to sort manifolds into simulation islands

∗ Non-contact Constraint Solving - Time to solve all constraints that are not in contact

∗ Constraint Solving - Time to solve all other constraints

∗ Integrating Transformations - Time to integrate all transforms, applying motion

∗ Updating Vehicles - Time to update vehicles

∗ Updating Activation - Time to update the activation

• Total Number of Manifolds (m) - Indicator of the total number of narrowphase tests performed (created for new tests)

• Total Narrowphase Collision Pairs (cpnar) - Indicator of the number of collision pairs that require narrowphase testing.

– Containing a Hybrid (cphyb) - Indicates whether one of the two objects in the pair is a hybrid object

– Not containing a Hybrid (cpnon) - Indicates the pair is two non-hybrids.

• Total Number of Objects (n) - Indicates whether a successful switch is made (number decreases as the old representation is removed)

• Total Number of Contact Points (con) - Indicates the time at which collisions begin and how many new contacts happen per frame


5.1.4 Expectations

To recap, manifolds in Bullet contain the contact points between collision pairs and perform contact filtering of those points. Multiple manifolds are created for every collision pair that is narrowphase tested. The expectation was that the total number of manifolds would be representative of the number of narrowphase tests performed. This number would increase with the complexity of the models as more sub-components of the compounds require testing. The expected result was a sharp spike in the graph at the point where most collisions are occurring.

Related to the manifolds, the total number of narrowphase collision pairs at time $t$ was expected to be approximately $(n_{overlap})^2$ for the total objects $n$. For scenes combining hybrid and non-hybrid models (such as Scene 2) the expectation was that the number of hybrid collision pairs would be $h(h + 2k)$, where $h$ is the total number of hybrid objects and $k$ is the number of non-hybrids. The number of non-hybrid collision pairs was expected to be $k^2$.
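The two estimates above can be checked against each other: $h(h + 2k) + k^2 = (h + k)^2$, so together they account for the $n^2$ total when every one of the $n = h + k$ objects overlaps every other. A minimal sketch of the arithmetic follows; `ExpectedPairs` and `expectedPairs` are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>

// Expected narrowphase pair counts when every object overlaps every other,
// using the report's estimates for h hybrids and k non-hybrids.
struct ExpectedPairs {
    long withHybrid;      // h(h + 2k)
    long withoutHybrid;   // k^2
};

inline ExpectedPairs expectedPairs(long h, long k) {
    assert(h >= 0 && k >= 0);
    ExpectedPairs p;
    p.withHybrid = h * (h + 2 * k);
    p.withoutHybrid = k * k;
    return p;
}
```

For example, with 20 hybrids and 11 non-hybrids the estimates are 840 and 121 pairs, which sum to 961, or 31 squared. Since $n^2$ counts ordered pairs and includes each object paired with itself, these figures are best read as rough upper estimates rather than exact counts.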

Note that although Scatter currently uses additional non-hybrids to test for intersection of lower detail models, this is not likely to be the case in the future. In the results these extra intersection tests manifest themselves as additional non-hybrid collision pairs. In the implementation there is a level of filtering to avoid two representations of the same object being tested. The number of objects was expected to increase as model switching was attempted and decrease as the models were successfully switched (when the test objects were removed from the scene).

5.1.5 Results

The following graphs give a good amount of detail regarding the performance of Scatter under the four tests. We will look initially at Scene 1 to understand the patterns caused by the simple scenario of two objects and how the processes of requesting and switching are visible in the results. We will then apply the understanding of Scene 1 to the more complicated graphs of Scene 2 and see how the patterns hold.

5.1.5.1 Test 1 and Test 2

For these tests the models used were composed of approximately 8 five boxes and 22 boxes for the Hunter mesh and Fighter mesh respectively. In tests 1 and 2, the scene played out identically until the point of overlap (between frame 140 and 180 in Figure 5.3). In the hybrid build (test 2) the events can be described as follows:


Figure 5.3: Scene 1: Key frames of the Hybrid Build

• Both models heading towards each other (frame 80)

• Bounding Boxes overlap triggering a request to switch for both models (after frame 140)

• Hunter model switches successfully in the following frame

• Fighter model fails to switch but the request is acknowledged (visible in frame 180 as the overlapping pink shape on the right)

• The objects collide and start travelling in different directions

• Bounding boxes stop overlapping and the switch of Fighter is successful (frame 200)

• Both objects are drawn back in by the field and overlap bounding boxes for the second time (between 200 and 450)

• They eventually experience equal forces on each other and begin rotating slowly till the end of the test

Figure 5.5 shows the full scene duration from frame 0 to frame 1400. The activity from the collision is focussed between frame 150 and frame 230. The persistent manifolds (in red) are associated with the left axis and the number of collision pairs in the narrowphase is associated with the right axis. Figure 5.6 provides a better view of the activity. The default build (test 1) shows the 392 manifolds being created at the instant of the first collision. The number only drops after the objects separate and stop overlapping. The second collision causes the 392 manifolds to be added again. The hybrid build (test 2) shows that the number of manifolds is initially higher (424) than in the default build due to the temporary testing of Hunter level 1. The manifolds then drop and rise due to the testing of Fighter level 1 until the objects part. By this point (frame 200) both models are at level 1. This means that when they overlap again the number of manifolds is significantly lower (20). Tests 1 and 2 have shown the effect switching has on reducing manifolds for future computations; now we will look at the graphs for Scene 2.

5.1.5.2 Test 3 and Test 4

On first inspection we can see immediately that the scene is a lot more complex. There are 10 Asteroids, 10 Fighters, 10 Hunters and a single Sun. The expectation was that the increased level of detail of the models would not cause a problem until entering the narrowphase. This prediction was correct, and both test 3 and test 4 experienced a significant deterioration of frame rate as it took up to 500 ms to calculate the physics (only leaving enough time to generate approximately 2 frames per second). In these tests the scene ran the same until the first point of collision (frame 280 approx.), but before that a number of objects in the hybrid build had already attempted to switch models. Figure 5.4 shows that in frame 140 the hunter object has already switched before the first impact. The Total Objects graph (Figure 5.8) supports this claim because it shows an increase in objects, implying that a number of the hybrids are attempting to switch. The key events are as follows:

Figure 5.4: Scene 2: Key frames of the Hybrid Build

• Objects start moving towards the centre point (frame 80)

• Adjacent objects start to overlap and cause a request (frame 140)

• Contact starts to occur up to a peak level of contact (between frame 360 and 380)

• Objects begin to disperse leaving space to allow some objects to switch (frame 420)

• There is a second peak of contact between 450 and 500, likely to be caused by collisions when dispersing

• Some objects continue to collide until the end of the scene where the sun eventually impacts on the centre (frame 1600)


In the manifold graphs in Figure 5.7 (with the axes arranged in the same way as for Scene 1), the most striking observation is that the number of manifolds has reached 77874 for the default build and 66255 for the hybrid build. The number is approximately a combination of all overlapping objects tested with each other, $n^2$, but it is dependent on too many variables to calculate accurately. What the graphs show is that after the initial collision and the second collision the calculation of manifolds (in red in both these graphs) trails off sharply between frames 550 and 600, indicating that most complex models have been switched. This is supported by the Total Objects graph, Figure 5.8, which indicates that the number of objects is the same as before the contact and hence the test levels must have switched out. Looking at the overall object graph we can see that, as in Scene 1, it takes a while (up to frame 1200) for all hybrids to successfully switch. This occurs at least 600 frames after the benefits of switching have taken place. These findings led to the idea of retracting switches (mentioned in the investigation). Using global analysis to decide when the performance of the system is acceptable, we can retract any unsuccessful requests. This is an example of a suitable global policy.
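One way such a global retraction policy could look is sketched below. This is an illustration under stated assumptions, not the policy implemented in Scatter: `PendingRequest`, `retractStaleRequests`, the per-frame time budget and the `maxWaitFrames` cut-off are all hypothetical. Setting `maxWaitFrames` to 0 retracts every request that has waited at least one frame once performance is acceptable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pending request: which object asked to switch and for how long.
struct PendingRequest {
    int objectId;
    int framesWaiting;
};

// Assumed global policy: once the physics step fits within its time budget again,
// withdraw requests that have waited longer than maxWaitFrames.
// Returns the number of requests retracted.
inline int retractStaleRequests(std::vector<PendingRequest>& pending,
                                double physicsMs, double budgetMs, int maxWaitFrames) {
    assert(budgetMs > 0.0 && maxWaitFrames >= 0);
    if (physicsMs > budgetMs) {
        return 0;   // still overloaded: keep every request alive
    }
    std::vector<PendingRequest> kept;
    int retracted = 0;
    for (const PendingRequest& r : pending) {
        if (r.framesWaiting > maxWaitFrames) {
            ++retracted;         // the benefit of switching has passed: drop the request
        } else {
            kept.push_back(r);
        }
    }
    pending.swap(kept);
    return retracted;
}
```

The design choice here is that the decision uses only a global measure (total physics time per frame), while the requests themselves remain local to each object.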

There are two further trends we can draw from the Total Objects graph. The first is that the number of contacts made is higher in the hybrid scene. For each unswitched hybrid object, the test model for switching has no contact resolution (because we don’t want it to collide until we are sure it can do so without causing instability, see section 3.4.3). The model will therefore penetrate other models, which, although it doesn’t cause a problem, is undesirable. The graph helped in identifying this case, which is rectifiable by filtering the contact point determination step for test models (reducing the purple peak in the graph).

The second of the two trends is that Figure 5.9 shows the number of objects up to the peak collision (frame 370) is still decreasing less than 25 frames before the major degradation. This means that model switching is continuously reducing the calculations up until the last possible point. This is obviously a desirable outcome and shows the success of the implementation.

5.1.6 Observations

As well as the expected results, testing revealed a few unexpected ones. We know that the main bottleneck in collision detection is caused by the overlapping of bounding volumes. The results show that the key cause of the drop in performance in this set of tests was the “Updating AABBs” stage (Figures 5.10 and 5.11). The updating is still part of the broadphase, but the cause seems to be the “midphase” of Bullet. Up until the point of overlap, any hybrid object only requires an outer AABB recalculation. As overlap occurs, the Bullet midphase requires the updating of the “optimisedBVH” described in the investigation. This supports the findings of the investigation that the time taken is dependent on the chance of failing an intersection test and therefore having to test further. It is unclear at this stage whether the performance drop is because of the Bullet implementation of “midphase” or is inherent in the testing of complex AABB trees. This is a point for future study, but this report provides the groundwork for furthering the research.

In summary, the peak representing the collision overload was smaller in all graphs for the hybrid build compared to the default build. This suggests that model reduction improved the situation in the lead up to the peak. The difference in the peaks is dependent on how big a change of detail there is. This gap is limited only by the requirement to keep the collisions “feasible” for the user. The results show that reducing a 22 box model to a single box model would improve the situation, but the observed collisions would look suspect. We therefore used a more detailed model (22 boxes to 8 boxes). The next stage for implementation would be to improve the response and efficiency with which models switch. The conclusion discusses using physics models that don’t encapsulate the mesh, which would provide the extra space needed to make the switch. The results show that even a simple implementation can cause improvements.


Figure 5.5: Scene 1: Manifolds and Narrowphase CPs. Top: Default Build. Bottom: Hybrid Build. (Plots: x-axis Frame Number, 0 to 1400; left y-axis Number of Manifolds; right y-axis Number of Collision Pairs; series: Persistent Manifolds, Total Collision Pairs in Narrowphase.)


Figure 5.6: Scene 1: Detail: Manifolds and Narrowphase CPs. Top: Default Build. Bottom: Hybrid Build. (Plots: x-axis Frame Number, 140 to 230; left y-axis Number of Manifolds; right y-axis Number of Collision Pairs.)


Figure 5.7: Scene 2: Manifolds and Narrowphase CPs. Top: Default Build. Bottom: Hybrid Build. (Plots: x-axis Frame Number, 0 to 1800; left y-axis Manifolds; right y-axis Test Collision Pairs; series include Collision Pairs with Hybrid and Collision Pairs without Hybrid.)


Figure 5.8: Scene 2: Total Objects and Contact Points (Both builds). (Plot: x-axis Frame Number, 0 to 1600; y-axis Number of Objects; series: Total Objects: Hybrid Scene, Total Objects: Default Scene, Total Contacts: Default Build, Total Contacts: Hybrid Build.)

Figure 5.9: Scene 2: Peak Detail: Total Objects (Both builds). (Plot: x-axis Frame Number, 300 to 500; y-axis Number of Objects; series: Total Objects: Hybrid Scene, Total Objects: Default Scene.)


Figure 5.10: Scene 2: Physics profiling. Top: Default Build. Bottom: Hybrid Build. (Plots: x-axis Frame Number, 0 to 1600; y-axis Execution Time (ms), 0 to 500; series: Total Physics, Update AABBs, Predict Motion, Collision Detection, Calculate Simulation Islands, Solve Non-contact Constraints, Solve Contact Constraints, Integrate Transforms, Update Vehicles, Update Activation.)


Figure 5.11: Scene 2: Peak Detail: Physics profiling. Top: Default Build. Bottom: Hybrid Build. (Plots: x-axis Frame Number, 350 to 400; y-axis Execution Time (ms), 0 to 500; same series as Figure 5.10.)

Chapter 6

Summary and Conclusion

“Games developers are like magicians, they mesmerise their audiences with fantastic effects and beautiful worlds, but the real magic is how they simplify the illusion”

Ian Ballantyne, 2007

In this report we have discussed the background of physics simulation and looked in depth into how a physics simulator functions. The investigation stage looked at the bottlenecks and the areas of research that can help reduce the impact of collisions on performance. This report has focused on “level of detail” as a technique to reduce the gap between performance of physics in the standard case and in the rare cases of full overlapping. The proposed method of “Encapsulation Levels” and “Model Switching” has proved to be an initial improvement by reducing the impact prior to the collision and accelerating the return to stability afterwards. The complex task of developing Scatter revealed the difficulties of prototyping with an existing simulator, but resulted in a framework for future testing of new physics techniques and especially level of detail techniques. Scatter provides a foundation for more research into topics such as the analysis of global and group heuristics to influence local model switching decisions. This final chapter discusses the conclusions of this project and the best direction for future research.

6.1 Performance

The success of dynamic model switching relies on the improvement of performance. I achieved this with Scatter and have concluded the following:

• The technique successfully improves the performance in the lead up to the initial collision. The switched objects are simpler, but still have a complex representation.

• Even after the initial collision(s), switching helps the simulator return to a stable state.

• The graphs show that the hybrid implementation improves the execution time of the initial collision and improves any successive collisions.

• The recorded values don’t give a clear indication that a collision is about to occur. Analysing the gradient of the graph could be a good indicator, however this information is deceptive. It could produce false positives, so more research is required in this area.


• Switching is completed successfully in less than 3 frames (both Scene 1 and Scene 2 demonstrate this).

• Model switching still occurs successfully during the start of degradation, showing it is not too late to improve.

• The implementation takes over half the test time to perform all switch requests (bearing in mind that all objects switch more than once). Retracting requests is the best way to deal with this.

• Updating of the AABBs in the scene is the cause of the performance spike. It is likely that the implementation of optimised bounding volume hierarchies (BVH) in Bullet, which use AABB trees, is the cause of the spike. Further investigation is required to identify whether the problem is with the AABB tree technique or a shortcoming of the Bullet implementation for compound objects.
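The gradient analysis mentioned in the list above could start from a simple frame-to-frame difference of a recorded measure. The following is a minimal sketch under that assumption; `firstSteepRise` and its threshold are hypothetical and are not part of Scatter. As the list warns, a single noisy frame would trip this check, so it illustrates the false-positive problem rather than solving it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first frame at which the frame-to-frame increase of a
// per-frame measure (for example physics time in ms) exceeds `threshold`,
// or -1 if no such frame exists.
inline long firstSteepRise(const std::vector<double>& perFrame, double threshold) {
    assert(threshold > 0.0);
    for (std::size_t i = 1; i < perFrame.size(); ++i) {
        double gradient = perFrame[i] - perFrame[i - 1];   // difference per frame
        if (gradient > threshold) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Smoothing the series over several frames before differencing is one possible way to reduce false positives, at the cost of reacting later.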

6.2 Scatter

Scatter provides the following features:

• A simple API for creating scenes to test in a physics simulator.

• A transparent wrapper for creating “Hybrid” objects and running them in a hybrid or standard world.

• Profiling, CPU load and physics state variables such as “number of broadphase intersection pairs” for each frame.

• Controls to pause physics and observe interactions with extra visual information:

– Bounding boxes

– Encapsulation levels

– Physics Collision Shapes

– Visual response to requests and switches

• Controls to manipulate objects in the scene for experimentation (to prompt certain responses).

Scatter is a fast framework for prototyping new physics concepts. It provides the output required to test performance without the hassle of implementing an input and rendering module. It can be quickly used to load meshes and COLLADA physics data, abstracting the renderer and physics from the implementation. Future work includes the addition of other renderer and physics library implementations, such as Ogre and ODE respectively, that can be used to compare any scene with different combinations of technology. This feature would be useful for developers interested in comparing physics packages.

The implementation revealed that it would not have been possible to implement encapsulation levels from the game developer’s perspective. Integrating into Bullet allowed more control than the callbacks provided in the API (detecting the exact frame of contact in SCBulPhysics supports this statement).


6.3 Level of Detail

From the report we can draw the following conclusions about level of detail:

• It is an effective technique in improving the performance in rare cases.

• These “rare” cases are becoming more common in game physics.

• Developers require detailed objects that also perform well (narrowing the gap between accuracy and performance). Dynamic level of detail can achieve this.

• There is a lot of research that can be gathered from graphics techniques and applied to physics.

6.3.1 Encapsulation Levels and Model Switching

• Encapsulation Levels are necessary for stability in model switching.

• They improve the performance when switching to a lower level of detail despite the additional complexity of having to test a less complicated object in addition to the current level of detail. The efficiency of the test will determine the extent to which encapsulation is an effective trade-off for improving level of detail.

• Switching to a higher level of detail requires no intersection testing and very little effort, but care must be taken to ensure the feasibility of the change (see Figure 3.5).

• In terms of local heuristics, proximity is a good trigger, provided that there is a margin around the largest level of detail that ensures switching can occur even when faced with high velocity objects. The expected requirement is that the margin $m$ should be greater than the distance travelled at the maximum velocity in a single time-step ($m > v_{max} \cdot t_{timestep}$). This condition is required for future implementations.

• Visually the impact of encapsulation levels depends on the step between the levels of detail:

– Using bounding boxes as collision objects is undesirable but the best for performance.

– Frequency of switching affects the users only if they can observe a change in the type of collision.

– Very low detail models produce collisions that are obviously infeasible. These infeasible collisions are only acceptable under the user perception conditions explored in the investigation.

– Visually, the difference between having the debug AABBs and collision shapes on and off is significant because without the debug the users have their own, often inaccurate, perception of the collisions.

– Reducing the size of the physics representation compared to the mesh improves the model switching at the expense of visual overlapping of meshes.

• There is a trade-off between “user perception” and “performance” when selecting the step size.


• The design of “encapsulation levels” is an attempt to improve performance before the point of contact. The results show an improvement, but also that the technique may be suitable for other more common scenarios like those found in physics-based games such as “Cell Factor” where groups of objects are moved together. The response of the system in the test scenario was to improve the performance as objects were given space to expand.

• Relaxing encapsulation levels is detrimental. Without encapsulation levels we can introduce unnecessary energy into the system. Small amounts of overlap are possible, provided that stability is not a requirement.
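The margin condition given in the list above for the proximity trigger, $m > v_{max} \cdot t_{timestep}$, can be checked directly. The helper names below are hypothetical and the sketch assumes consistent units (for example metres, metres per second and seconds).

```cpp
#include <cassert>

// Distance covered in one time-step by an object moving at the maximum velocity.
inline double distancePerStep(double vMax, double timeStep) {
    assert(vMax >= 0.0 && timeStep > 0.0);
    return vMax * timeStep;
}

// The report's condition: the margin m must exceed v_max * t_timestep.
inline bool marginIsSufficient(double margin, double vMax, double timeStep) {
    return margin > distancePerStep(vMax, timeStep);
}
```

For example, with the fixed time-step of 1/60 seconds used in the tests and an assumed maximum velocity of 30 units per second, an object can travel about 0.5 units per step, so a margin of 0.6 units would satisfy the condition while a margin of 0.4 units would not.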

6.4 Implementation

This project used Scatter to provide extra functionality to Bullet with the aim of modifying the implementation of Bullet as little as possible. Minimising the impact proved to be a huge challenge, as the physics technique would have been more suited to full Bullet integration. Full integration would have required the acceptance of the key classes, like primitive shape objects, and would have required a specific implementation of each of the many algorithms used. This would have made Bullet dependent on the encapsulation technique, which is undesirable for implementing prototype physics concepts. My approach was to create instances of the key classes, such as btHybridDynamicsWorld, which provided the best compromise and only required the modification of two Bullet classes: a single function added to btRigidBody and an extra field in btManifoldPoint.

Integration of Irrlicht was a key feature of Scatter, but development revealed the nature of joining two independently written systems. Although closely linked, physics and rendering fight over control of the simulated game world. Each engine wants to be in charge of the coordinate system and this results in many transformations to convert between the two. Scatter is a physics-based implementation, therefore rendering is updated from the physics. Table 6.1 shows a list of the equivalent primitive types in both Bullet and Irrlicht and how they compare. Conversions of orientation posed the biggest problem in Scatter, but in principle, this is possible in any pairing of middleware technologies. The overhead of conversion could outweigh the benefits, which may lead to developers writing their own implementation. It is unclear whether some projects are written around the tools they use or whether they are written for the tools. A closely linked renderer and physics simulator would be most beneficial. Work on multi-threaded game loops will reveal more definite answers.

Bullet Type     Irrlicht Type            Available Conversion(s)
btScalar        f32, f64, i32, u32 etc   float to double, float to int, float to unsigned int etc
std::wstring    std::string              std::wstring to std::string and w_char* to char*
btPoint3        vector3df                get and set methods
btVector3       vector3df                get and set methods
btQuaternion    vector3df                Quaternion to Euler
btVector3       SColor                   Vector r,g,b to SColor a,r,g,b

Table 6.1: Conversion between Bullet and Irrlicht primitive types.
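The “Quaternion to Euler” row is the orientation conversion identified above as the biggest problem. A minimal, self-contained sketch of one common form of that conversion is given below. It deliberately uses its own `Quat` and `Euler` structs rather than btQuaternion or vector3df, and it assumes a unit quaternion, the Z-Y-X (yaw, pitch, roll) convention and output in degrees; the convention and units that a particular renderer expects may differ and would need checking.

```cpp
#include <cassert>
#include <cmath>

struct Quat  { double x, y, z, w; };   // assumed to be unit length
struct Euler { double x, y, z; };      // rotations about X, Y and Z in degrees

// Converts a unit quaternion to Euler angles using the common
// Z-Y-X (yaw, pitch, roll) convention. The axis order is an assumption.
inline Euler quatToEulerDegrees(const Quat& q) {
    const double kRadToDeg = 180.0 / 3.14159265358979323846;
    double norm2 = q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w;
    assert(std::fabs(norm2 - 1.0) < 1e-6);

    double sinPitch = 2.0 * (q.w * q.y - q.z * q.x);
    // Clamp to guard against rounding just outside [-1, 1] near gimbal lock.
    if (sinPitch > 1.0) sinPitch = 1.0;
    if (sinPitch < -1.0) sinPitch = -1.0;

    Euler e;
    e.x = std::atan2(2.0 * (q.w * q.x + q.y * q.z),
                     1.0 - 2.0 * (q.x * q.x + q.y * q.y)) * kRadToDeg;
    e.y = std::asin(sinPitch) * kRadToDeg;
    e.z = std::atan2(2.0 * (q.w * q.z + q.x * q.y),
                     1.0 - 2.0 * (q.y * q.y + q.z * q.z)) * kRadToDeg;
    return e;
}
```

Euler angles lose information near a pitch of plus or minus 90 degrees (gimbal lock), which is one reason why converting orientation between two engines every frame can be troublesome.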


6.5 Future Work

Future work will continue to address the improvements level of detail can make to performance and the ability to have dynamic object accuracy. Effort is still required to improve the efficiency of level of detail. The work on parallelisation of calculations is ongoing, but can also be applied to collision overload. The following is a list of future work identified in this report:

• Furthering the field of level of detail by investigating “Adaptive level of detail models based on algorithmic generation”. The research would be applicable to model switching and encapsulation levels in both pre-computation and real-time. The latter of the two is a lot more complicated and would require an efficient implementation. The starting point for this research would be an investigation into adaptive meshes in graphics with the purpose of investigating “adaptive convex hulls for level of detail”.

• Application of “Encapsulation Levels” to deformable bodies. Encapsulation levels have the potential to work with deformable bodies, but would require investigation into the speed of regenerating the levels after deformation.

• Investigation into the performance of step-size between models. The trade-off between performance improvement and user perception requires some quantitative analysis, mainly because the area of user perception is very qualitative. The results of this work could be a good indicator of “how much” a certain change could improve performance.

• Investigation analysing the heuristics that dictate requests to switch. This report defined the areas of global, group and local for requesting model switching. The implementation focused on local but could be easily extended to work with global requests. The precursor to performing this adaptation is analysing the heuristics. Proximity is a good indicator but would require higher level information to be efficient.

• Implementing parallelisation techniques in Scatter to analyse the improvements. This report has given an example of where in Bullet parallelisation could occur. Work by Kokkevis et al. with parallel physics on the CELL is a good starting point for further work [27]. Kokkevis notes that parallelisation could hinder as well as help. Using Scatter as a test-bed, this could be implemented and tested.

6.6 Discussion

This research demonstrates that dynamic level of detail will become a much more important area of study in the future of physics simulations. With increasing feasibility of what can be calculated in real-time we will always aim to push the boundaries. I envisage games and simulations where developers will attempt to allow users to demolish entire buildings or even cities, but still be able to chip corners off the individual bricks or cause the intricate cracking of glass in windows. Scale appears less of an obstacle when “level of detail” is involved, but the idea of being able to perform the complexity of calculations without it is just as exciting. With virtual environments as seemingly accurate as the real world, what limits could there be except the laws of physics themselves!

Bibliography

[1] AGEIA. Advanced gaming physics. White Paper, 2006.

[2] Robert Bridson, Ronald Fedkiw, and John Anderson. Robust treatment of collisions, contact and friction for cloth animation. In SIGGRAPH 2002, volume 21, pages 594–603. ACM Press / ACM SIGGRAPH, 2002.

[3] Zhaosheng Bao, Jeong-Mo Hong, J. Teran, and R. Fedkiw. Fracturing rigid materials. In IEEE Transactions on Visualization and Computer Graphics, volume 13, pages 370–378, 2007.

[4] David Baraff. Analytical methods for dynamic simulation of non-penetrating rigid bodies. In SIGGRAPH 89, volume 23 of Computer Graphics. Cornell University, Ithaca, NY 14853, ACM Press, July 1989.

[5] David Baraff. Fast contact force computation for nonpenetrating rigid bodies. In SIGGRAPH, pages 23–34. Carnegie Mellon University, ACM Press, 1994.

[6] Gino Van Den Bergen. Efficient collision detection of complex deformable models using aabb trees. Journal of Graphics Tools, 2(4):1–13, April 1998.

[7] Gino Van Den Bergen. A fast and robust gjk implementation for collision detection of convex objects. Journal of Graphics Tools, 4, 1999.

[8] David Blythe. The direct3d 10 system. Technical report, Microsoft Corporation, 2006.

[9] Ian Buck. Taking the Plunge into GPU Computing, chapter 32, pages 509–512. GPU Gems 2. Addison-Wesley, 1st edition, 2005.

[10] Yan Zhuang and John Canny. Real-time simulation of physically realistic global deformation. Technical report, University of California, Berkeley, California, USA, 1999.

[11] Erin Catto. Iterative dynamics with temporal coherence. Game Developers Conference, 2005.

[12] Valve Developer Community. Information on prop data, 2006. Available from: http://developer.valvesoftware.com/wiki/Prop_Data [cited 1st October 2006].

[13] Jong-Shi Pang, Richard E. Stone, and Richard W. Cottle. The Linear Complementarity Problem. Academic Press, San Diego, California, USA, 1997.

[14] Erwin Coumans. Bullet collision detection and physics sdk. 2006. Available from: http://www.continuousphysics.com/Bullet/BulletFull/main.html.

[15] Erwin Coumans. Bullet collision detection faq. 2006. Available from: http://www.continuousphysics.com/mediawiki-1.5.8/index.php.

[16] Erwin Coumans. Physics simulation forum, 2007. Available from: http://www.continuousphysics.com/Bullet/phpBB2/index.php.

[17] David H. Eberly. Game Physics. Interactive 3D Technology. Morgan Kaufmann, 500 Sansome Street, Suite 400, San Francisco, CA 94111, 2003.

[18] Kenny Erleben. Stable, Robust And Versatile Multibody Dynamics Animation. PhD thesis, University of Copenhagen, Copenhagen, Denmark, 2004.

[19] Eduardo Tejada and Thomas Ertl. Large steps in gpu-based deformable bodies simulation. Simulation Practice and Theory, 13(9):703–715, 2005.

[20] Emmett Kilgariff and Randima Fernando. The GeForce 6 Series GPU Architecture, chapter 30, pages 471–491. GPU Gems 2. Addison-Wesley, 1st edition, 2005.

[21] Matt Pharr and Randima Fernando. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. GPU Gems. Addison-Wesley Professional, 2005.

[22] David Luebke, Martin Reddy, Jonathan D. Cohen, Amitabh Varshney, Benjamin Watson, and Robert Huebner. Level of Detail for 3D Graphics. Morgan Kaufmann, 2003.

[23] David Baraff, Andrew Witkin, and Michael Kass. Physically based modelling. SIGGRAPH Course, 2001.

[24] E. G. Gilbert, D. W. Johnson, and S. S. Keerthi. A fast procedure for computing the distance between complex objects in three-dimensional space. IEEE Journal of Robotics and Automation, 4(2):192–203, 1988.

[25] David Knott. Cinder, collision and interference detection in real time using graphics hardware. Master’s thesis, University of British Columbia, 2003.

[26] Evangelos Kokkevis. Practical physics for articulated characters. Game Developers Conference, 2004.

[27] Vangelis Kokkevis, Steven Osman, and Eric Larsen. High-performance physics solver design for next generation consoles. In Game Developers Conference, 2006.

[28] S. Gottschalk, M. C. Lin, and D. Manocha. Obbtree: A hierarchical structure for rapid interference detection. In SIGGRAPH, pages 171–180. ACM SIGGRAPH, 1996.

[29] Don Woligroski and Aaron McKenna. The best gaming video cards for the money: January 2007. Tom’s Hardware, January 2007. Available from: http://tomshardware.co.uk/.

[30] Brian Mirtich. Impulse-based Dynamic Simulation of Rigid Body Systems. PhD thesis, University of California, Berkeley, California, USA, 1996.

[31] Adam Morvanszky and Pierre Terdiman. Games Programming Gem 4: Fast Contact Reduction for Dynamics Simulation, chapter 3, pages 253–263. Number 5. Charles River Media, 2004.

[32] Intel Software Network. Open source game development, 2007. Available from: http://www.intel.com/cd/ids/developer/asmo-na/eng/254761.htm?page=1 [cited May 2007].

[33] Carol O’Sullivan and John Dingliana. Collisions and perception. Technical report, Image Synthesis Group, Trinity College Dublin, 2001.

[34] Darren E. Polkowski. Geforce 8800: Here comes the dx10 boom. Tom’s Hardware, November 2006. Available from: http://tomshardware.co.uk/.

[35] X. Provot. Collision and self-collision handling in cloth model dedicated to design garment. Graphics Interface, pages 177–89, 1997.

[36] Martin Reddy. Perceptually Modulated Level of Detail for Virtual Environments. PhD thesis, University of Edinburgh, Edinburgh, Scotland, 1997.

[37] Martin Reddy. Visual perception and lod. Presentation, 2002.

[38] Alec R. Rivers and Doug L. James. Fastlsm: Fast lattice shape matching for robust real-time deformation. Due for proceedings of SIGGRAPH 2007, 2007.

[39] M. Muller, L. McMillan, J. Dorsey, and R. Jagnow. Real-time simulation of deformation and fracture of stiff materials. In Eurographic Workshop on Computer Animation and Simulation, pages 113–124, Manchester, UK, September 2001. Springer-Verlag New York, Inc.

[40] Axel Seugling and Martin Rolin. Evaluation of physics engines and implementation of a physics module in a 3d-authoring tool. Master’s thesis, UMEA University, March 2006.

[41] H. W. Six and D. Wood. Counting and reporting intersections of d-ranges. In IEEE Transactions on Computers, volume C-31, number 3, pages 181–187, 1982.

[42] X. Tu and D. Terzopoulos. Artificial fishes: Physics, locomotion, perception, behavior. In SIGGRAPH 94, pages 43–50, July 1994.

[43] James Tulip, James Bekkema, and Keith Nesbitt. Multi-threaded game engine design. Technical report, Charles Sturt University, 2005.

[44] Luis Valente, Aura Conci, and Bruno Feijo. Real time game loop models for single-player computer games. Oct 2005.

[45] Gino van den Bergen. Collision Detection in Interactive 3D Environments. Interactive 3D Technology. Morgan Kaufmann, San Francisco, CA 94111, 2003.

[46] Gino van den Bergen. Collision Detection in Interactive 3D Environments, chapter 2, pages 56–57. Interactive 3D Technology. Morgan Kaufmann, San Francisco, CA 94111, 2003.
