KIPA Game Engine Seminars
KIPA Game Engine Seminars, Day 15. Jonathan Blow. Seoul, Korea. December 12, 2002. Bit Tricks: Generating Bit Masks; Is some number a power of two?; Avoiding ‘if’ statements (branch prediction); Floating-point absolute value; Floating-point compare; Floating-point log2.
![Page 1: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/1.jpg)
1
KIPA Game Engine Seminars
Jonathan Blow
Seoul, Korea
December 12, 2002
Day 15
![Page 2: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/2.jpg)
2
Bit Tricks
• Generating Bit Masks
• Is some number a power of two?
• Avoiding ‘if’ statements (branch prediction)
• Floating-point absolute value
• Floating-point compare
• Floating-point log2
![Page 3: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/3.jpg)
3
Generating Bit Masks
• Suppose we want to mask the low n bits of a machine word
• We can generate that with a loop
• Show summation equation for the loop
• Identity that lets us do something faster
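A minimal sketch of both approaches, assuming a 32-bit word. The loop sets bits 0 through n-1 one at a time, which is the sum 2^0 + 2^1 + ... + 2^(n-1); the identity is that this sum equals 2^n - 1. The function names are hypothetical, not from the slides.

```c
#include <stdint.h>

/* Loop version: OR in bits 0..n-1 one at a time. */
uint32_t low_mask_loop(unsigned n)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < n && i < 32; i++)
        mask |= (uint32_t)1 << i;
    return mask;
}

/* Closed form: the sum of 2^i for i in [0, n) equals 2^n - 1. */
uint32_t low_mask_fast(unsigned n)
{
    /* Shifting a 32-bit value by 32 or more is undefined in C,
       so the full-word case is handled separately. */
    if (n >= 32)
        return 0xFFFFFFFFu;
    return ((uint32_t)1 << n) - 1;
}
```

For example, `low_mask_fast(5)` yields `0x1F`, the same value the loop produces in five iterations.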
![Page 4: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/4.jpg)
4
Is some number a power of two?
• The power-of-two will be a single bit somewhere in the middle of the word
• The power-of-two minus one will be a bit mask like the ones we just looked at
• ANDing them together will produce 0
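The test described above can be sketched as follows. The zero check is an addition of this sketch: 0 & (0 - 1) is also 0, but zero is not a power of two. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A power of two has exactly one bit set. Subtracting 1 turns that bit
   off and turns on every bit below it, so the AND of the two is zero. */
bool is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

For x = 64 (binary 1000000), x - 1 is 0111111 and the AND is 0. For x = 96 (binary 1100000), x - 1 is 1011111 and the AND is 1000000, which is nonzero.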
![Page 5: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/5.jpg)
5
Counting the number of set bits in a machine word
• Slow loop version
• “Trick” O(num set bits) version
• Discussion of tree version
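The three versions might look like the sketch below, assuming a 32-bit word. The slides do not show the code, so the function names and exact formulation are this sketch's own. The second version relies on `x & (x - 1)` clearing the lowest set bit, so it loops once per set bit. The tree version adds adjacent 1-bit fields, then 2-bit fields, and so on.

```c
#include <stdint.h>

/* Slow loop: test all 32 bit positions. */
unsigned popcount_loop(uint32_t x)
{
    unsigned count = 0;
    for (int i = 0; i < 32; i++)
        count += (x >> i) & 1u;
    return count;
}

/* O(number of set bits): each iteration clears the lowest set bit. */
unsigned popcount_sparse(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Tree version: sum neighbouring fields of width 1, 2, 4, 8, 16. */
unsigned popcount_tree(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + (x >> 16);
    return x;
}
```

The tree version takes a fixed five steps regardless of input, but note that every step uses a right shift.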
![Page 6: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/6.jpg)
6
Pentium 4 “fireball”
• A 16-bit integer unit at the core of the chip that runs at very high clock speeds
• 32-bit integer operations are pipelined through the fireball as multi-stage 16-bit operations
• Pipeline is organized for bits to flow from bottom to top of the word (as with addition and subtraction)
• Right-shifts require a dependency that goes in the opposite direction (slower!)
![Page 7: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/7.jpg)
7
“How many bits does it take to store this range of values?”
• Application: network or file i/o
• Want ceil(log2(n_max + 1)) bits if the values go from 0 to n_max inclusive (ceil(log2(n_max)) if n_max itself is excluded)
• Slow floating-point versions
• Fast bit-extraction versions
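Two hedged sketches of the integer approach, assuming the range [0, n_max] is inclusive, so the answer is floor(log2(n_max)) + 1 for n_max > 0. The first shifts until the value is empty. The second converts to an IEEE-754 double (which holds any 32-bit integer exactly) and reads the exponent field. Both function names are hypothetical, and the slides may have used a different extraction.

```c
#include <stdint.h>
#include <string.h>

/* Shift-loop version: count how many bits are occupied. */
unsigned bits_for_range(uint32_t n_max)
{
    unsigned bits = 0;
    while (n_max != 0) {
        n_max >>= 1;
        bits++;
    }
    return bits;
}

/* Exponent-extraction version. Assumes double is IEEE-754 binary64:
   1 sign bit, 11 exponent bits biased by 1023, 52 mantissa bits. */
unsigned bits_for_range_fp(uint32_t n_max)
{
    if (n_max == 0)
        return 0;
    double d = (double)n_max;          /* exact for any 32-bit value */
    uint64_t u;
    memcpy(&u, &d, sizeof u);          /* reinterpret the bits safely */
    int exponent = (int)((u >> 52) & 0x7FF) - 1023;  /* floor(log2(n_max)) */
    return (unsigned)exponent + 1;
}
```

With n_max = 8 both return 4, since the values 0 through 8 need four bits.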
![Page 8: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/8.jpg)
8
Floating-Point log2
• Show slow version
• Fast version utilizing the IEEE-754 format
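A sketch of the IEEE-754 idea, assuming `float` is a 32-bit binary32 value: the 8-bit exponent field, biased by 127, is already the integer part of log2 for any normalized value. This returns only floor(log2(|f|)); whether the slides refined the result with the mantissa is not stated. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Integer part of log2(|f|) for a normalized, finite, nonzero float.
   Zero, denormals, infinities and NaNs are not handled. */
int ilog2_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);              /* reinterpret the bits safely */
    return (int)((u >> 23) & 0xFF) - 127;  /* remove the exponent bias */
}
```

For 10.0f, which is 1.25 * 2^3, the function returns 3.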
![Page 9: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/9.jpg)
9
Fast absolute value
• Utilizing IEEE-754 floating point format
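In IEEE-754 the sign is a single bit at the top of the word, so absolute value amounts to clearing it. A minimal sketch, assuming a 32-bit `float`; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Clear bit 31 (the sign bit) and leave exponent and mantissa alone. */
float fast_fabs(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* reinterpret the bits safely */
    u &= 0x7FFFFFFFu;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Whether this beats the compiler's own `fabsf` depends on the target; on 2002-era x86 the point was to stay in the integer unit.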
![Page 10: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/10.jpg)
10
Fast floating-point compare
• Description of how x86 machines compare floating point numbers
– Get at least one of them on the stack
– Perform ‘fcomp’ instruction
– Store the floating point status word (the comparison result lands in its condition-code bits)
– Bit-mask it to see if the desired field is set
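The slide only lists the x87 sequence; it does not show the fast replacement. One common alternative, offered here as an assumption about what "fast" means, is to compare the IEEE-754 bit patterns as integers. The sketch below maps each float to an unsigned key whose order matches numeric order. It ignores NaNs and treats -0.0 as less than +0.0. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Map a float to an unsigned key that sorts in numeric order.
   Negative values: flip every bit (larger magnitude sorts lower).
   Non-negative values: flip only the sign bit (sorts above negatives). */
uint32_t float_order_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    uint32_t mask = (uint32_t)(-(int32_t)(u >> 31)) | 0x80000000u;
    return u ^ mask;
}

/* Returns 1 if a < b. NaNs are not handled. */
int float_less(float a, float b)
{
    return float_order_key(a) < float_order_key(b);
}
```

The whole comparison runs in integer registers, so none of the stack, status-word and masking steps are needed.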
![Page 11: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/11.jpg)
11
Decision-making without branching
• (And without writing in assembly language, to use instructions like CMOV)
• Build a mask based on whether some intermediate result is negative or not
• Use that to mask values and add them, or whatever you want
– Examples
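The slide's examples are not in the transcript, so the following are stand-ins under stated assumptions: 32-bit two's-complement integers, and a subtraction that does not overflow. The mask is all ones when the intermediate result is negative and zero otherwise. All function names are hypothetical.

```c
#include <stdint.h>

/* 0xFFFFFFFF (that is, -1) if x is negative, 0 otherwise. */
int32_t sign_mask(int32_t x)
{
    return -(int32_t)((uint32_t)x >> 31);
}

/* min(a, b) without a branch. Assumes a - b does not overflow. */
int32_t branchless_min(int32_t a, int32_t b)
{
    int32_t diff = a - b;              /* negative exactly when a < b */
    return b + (diff & sign_mask(diff));
}

/* Pick if_neg when test is negative, otherwise pick the other value. */
int32_t select_if_negative(int32_t test, int32_t if_neg, int32_t otherwise)
{
    int32_t mask = sign_mask(test);
    return (if_neg & mask) | (otherwise & ~mask);
}
```

In `branchless_min`, the masked difference is either `a - b` or 0, so the sum is either `a` or `b`. A compiler may already emit a conditional move for the plain `if` form, so this is worth measuring before adopting.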
![Page 12: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/12.jpg)
12
Collision Detection
• Speedbox and Schnitzel as alternatives to the “prevent tunneling” raycast
![Page 13: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/13.jpg)
13
Collision Detection
• Don’t forget to optimize mainly for the expected case!
– To miss a lot, or to hit a lot?
• Example of Shock Force and the “early hit test”
– We expect to miss usually!
– So the early hit test was not so effective
![Page 14: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/14.jpg)
14
Collision detection
• More Shock Force examples
– Hierarchy of tests: bounding sphere, OBB, simple plane divide, BSP “hard case”
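As an illustration of the cheapest level of such a hierarchy, here is a bounding-sphere rejection test. It is a generic sketch, not the Shock Force code; the `Sphere` type and function name are hypothetical. If misses are the expected case, most pairs exit here and the OBB, plane and BSP stages never run.

```c
/* Hypothetical bounding sphere: centre and radius. */
typedef struct {
    float x, y, z;
    float radius;
} Sphere;

/* Returns 1 if the spheres overlap or touch. Compares squared
   distances so no square root is needed. */
int spheres_overlap(const Sphere *a, const Sphere *b)
{
    float dx = a->x - b->x;
    float dy = a->y - b->y;
    float dz = a->z - b->z;
    float reach = a->radius + b->radius;
    return dx * dx + dy * dy + dz * dz <= reach * reach;
}
```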
![Page 15: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/15.jpg)
15
Profiling
• Motivation
– You can’t optimize unless you profile. For some reason some people think they can… they’re wrong.
• Demo of sample app
• Goals:
– Know where the overall CPU time is being spent
• May depend on which kind of behavior is happening!
– Know which routines are stable and which ones are not
![Page 16: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/16.jpg)
16
Profiling
• Example of getting the current time on Windows
– At different accuracy levels
• Description of how this is slow, and why
– Too slow to call very often in code!
![Page 17: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/17.jpg)
17
Profiling (2)
• Using the rdtsc instruction
• Converting this to realtime units by calling QueryPerformanceCounter once per frame
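The conversion can be sketched as pure arithmetic on two samples taken one frame apart. In a real build the `tsc` field would come from the rdtsc instruction and the `qpc` field from `QueryPerformanceCounter`, with the tick rate from `QueryPerformanceFrequency`; here they are plain parameters so the sketch stays portable. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* One paired reading of both clocks, taken at the same point in a frame. */
typedef struct {
    uint64_t tsc;   /* cycle counter reading (rdtsc) */
    uint64_t qpc;   /* QueryPerformanceCounter reading */
} TimeSample;

/* Estimate cycles per second from two samples.
   qpc_frequency is in ticks per second. Returns 0 if no time elapsed. */
double estimate_cycles_per_second(TimeSample prev, TimeSample now,
                                  uint64_t qpc_frequency)
{
    double seconds = (double)(now.qpc - prev.qpc) / (double)qpc_frequency;
    if (seconds <= 0.0)
        return 0.0;
    return (double)(now.tsc - prev.tsc) / seconds;
}

/* Convert a cycle count measured inside the frame to seconds. */
double cycles_to_seconds(uint64_t cycles, double cycles_per_second)
{
    return cycles_per_second > 0.0 ? (double)cycles / cycles_per_second : 0.0;
}
```

The slow call happens once per frame; every measurement inside the frame uses only the cheap cycle counter.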
![Page 18: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/18.jpg)
18
Profiling (3)
• Define macros that put rdtsc calls into preambles and postambles for functions
• Measure and categorize CPU time this way
• Measure “self time” and “hierarchical time”
• Code review of macros / constructors
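The reviewed macros are not in the transcript. The sketch below shows one way the bookkeeping behind them could work, assuming zones are identified by small integers and the timestamp is passed in (in practice it would be an rdtsc reading taken by the preamble and postamble macros). "Hierarchical time" includes time spent in child zones; "self time" excludes it. All names are hypothetical, and bounds checks are omitted for brevity.

```c
#include <stdint.h>

#define MAX_ZONES 16
#define MAX_DEPTH 32

typedef struct {
    uint64_t self_time;          /* cycles in the zone, children excluded */
    uint64_t hierarchical_time;  /* cycles in the zone, children included */
} Zone;

typedef struct {
    Zone     zones[MAX_ZONES];
    int      stack[MAX_DEPTH];       /* zone ids currently open */
    uint64_t enter_time[MAX_DEPTH];  /* timestamp at entry */
    uint64_t child_time[MAX_DEPTH];  /* cycles consumed by nested zones */
    int      depth;
} Profiler;

/* Preamble: push the zone and remember when it started. */
void zone_enter(Profiler *p, int zone, uint64_t now)
{
    p->stack[p->depth] = zone;
    p->enter_time[p->depth] = now;
    p->child_time[p->depth] = 0;
    p->depth++;
}

/* Postamble: pop the zone, credit it, and charge the parent. */
void zone_exit(Profiler *p, uint64_t now)
{
    p->depth--;
    int zone = p->stack[p->depth];
    uint64_t elapsed = now - p->enter_time[p->depth];
    p->zones[zone].hierarchical_time += elapsed;
    p->zones[zone].self_time += elapsed - p->child_time[p->depth];
    if (p->depth > 0)
        p->child_time[p->depth - 1] += elapsed;
}
```

If zone 0 runs from cycle 100 to 200 and calls zone 1 from 120 to 150, zone 0 ends up with hierarchical time 100 and self time 70. A recursive zone would have its hierarchical time counted more than once in this simple form.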
![Page 19: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/19.jpg)
19
Problem with rdtsc
• There’s this SpeedStep thing on Intel laptops
– Change the CPU’s clock speed based on performance / temperature demands
– Does not adjust rdtsc to compensate
• May spread beyond laptops in the future
– Power consumption of CPUs is becoming an important concern for businesses
![Page 20: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/20.jpg)
20
We can detect if rdtsc is screwing up profiling data
• But we can’t fix the profiling data
• Solution: just draw a big warning on the screen
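One plausible detection scheme, stated as an assumption since the slides do not give the method: compare each frame's measured cycles-per-second rate against a reference rate and flag the frame when they disagree by more than a tolerance. The function name and the idea of a fixed tolerance are hypothetical.

```c
/* Returns 1 if this frame's cycles-per-second estimate differs from the
   reference by more than `tolerance` (for example 0.05 for 5 percent).
   A nonzero result means cycle counts from this frame cannot be trusted
   as time, so the display should show its warning. */
int clock_rate_changed(double reference_cps, double frame_cps,
                       double tolerance)
{
    if (reference_cps <= 0.0)
        return 0;   /* no reference yet, nothing to compare against */
    double ratio = frame_cps / reference_cps;
    return ratio < 1.0 - tolerance || ratio > 1.0 + tolerance;
}
```

This only detects the problem. A speed change in the middle of a frame still corrupts that frame's per-zone numbers, which is why the slide settles for a warning.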
![Page 21: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/21.jpg)
21
Division of Profiler
• Low-Level Profiler
• High-Level Profiler
![Page 22: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/22.jpg)
22
Walkthrough of first demo app
• How it uses the macros
• How it collects and draws the profiling data
![Page 23: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/23.jpg)
23
Measuring variance of profiling data
• To figure out how stable each function is
• Draw which functions are “hot” in the realtime display
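The slides do not say how the variance is computed. One standard choice that needs no sample history is Welford's online algorithm, sketched here with hypothetical names; a realtime display might instead prefer a decaying average so that old frames fade out.

```c
/* Running mean and variance of one function's per-frame time. */
typedef struct {
    unsigned long count;
    double mean;
    double m2;     /* sum of squared deviations from the running mean */
} RunningStats;

/* Fold one new sample into the statistics (Welford's update). */
void stats_add(RunningStats *s, double x)
{
    s->count++;
    double delta = x - s->mean;
    s->mean += delta / (double)s->count;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance; 0 until at least two samples have been seen. */
double stats_variance(const RunningStats *s)
{
    return s->count > 1 ? s->m2 / (double)(s->count - 1) : 0.0;
}
```

A function whose variance is large relative to its mean is unstable from frame to frame, which is the property the display is meant to highlight.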
![Page 24: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/24.jpg)
24
Behaviors
• We would like some better analysis of what the different behaviors are for our program
• Just “eyeing” the results is not very scientific
• Examples of different behaviors
– Fill rate limited, AI limited, etc
![Page 25: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/25.jpg)
25
Batch Profiling vs Interactive Profiling
• Batch profiling averages a bunch of data together over a session
– Maybe it provides a way to peek at individual samples but the processing is never very convenient
• Interactive profiling is about seeing results as soon as they happen
– But interactive profilers are usually hacked together
• What if we made a good one?
![Page 26: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/26.jpg)
26
Want to detect and analyze specific behaviors
• But without preconceived ideas of what they might be
• Treat incoming frames of profiling data as vectors, and cluster them
• Description of k-means clustering
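A sketch of one batch k-means iteration (Lloyd's algorithm), assuming each frame is a vector of `dim` doubles stored contiguously, for instance one entry per profiled zone. The layout and all names are assumptions of this sketch. Each iteration assigns every frame to its nearest centroid, then moves each centroid to the mean of its frames.

```c
#include <stddef.h>

/* Squared Euclidean distance between two dim-length vectors. */
static double dist2(const double *a, const double *b, size_t dim)
{
    double sum = 0.0;
    for (size_t i = 0; i < dim; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Index of the centroid closest to v. */
size_t nearest_centroid(const double *v, const double *centroids,
                        size_t k, size_t dim)
{
    size_t best = 0;
    double best_d = dist2(v, centroids, dim);
    for (size_t c = 1; c < k; c++) {
        double d = dist2(v, centroids + c * dim, dim);
        if (d < best_d) { best_d = d; best = c; }
    }
    return best;
}

/* One iteration over n frames. Returns how many assignments changed;
   the caller repeats until that reaches zero. */
size_t kmeans_step(const double *data, size_t n, size_t dim,
                   double *centroids, size_t k, size_t *assignment)
{
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        size_t c = nearest_centroid(data + i * dim, centroids, k, dim);
        if (assignment[i] != c) { assignment[i] = c; changed++; }
    }
    for (size_t c = 0; c < k; c++) {
        size_t count = 0;
        for (size_t i = 0; i < n; i++)
            count += (assignment[i] == c);
        if (count == 0)
            continue;   /* leave an empty cluster where it is */
        for (size_t j = 0; j < dim; j++) {
            double sum = 0.0;
            for (size_t i = 0; i < n; i++)
                if (assignment[i] == c)
                    sum += data[i * dim + j];
            centroids[c * dim + j] = sum / (double)count;
        }
    }
    return changed;
}
```

Every iteration walks the whole data set more than once, so all frames must be kept in memory.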
![Page 27: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/27.jpg)
27
Clustering algorithms tend to be pretty slow
• And they require batch data to process
– k-means needs random access to the input!
• Online k-means
– Faster, non-batch. But quality?
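The online variant can be sketched as a single update per incoming frame: find the nearest centroid and nudge it toward the frame by a learning rate. The frame is then discarded, so no history is stored. The function name, the flat array layout and the fixed `rate` parameter are assumptions of this sketch.

```c
#include <stddef.h>

/* Move the centroid nearest to v a fraction `rate` of the way toward v.
   centroids holds k vectors of length dim, stored contiguously.
   Returns the index of the centroid that was updated. */
size_t online_kmeans_update(const double *v, double *centroids,
                            size_t k, size_t dim, double rate)
{
    size_t best = 0;
    double best_d = 0.0;
    for (size_t c = 0; c < k; c++) {
        double d = 0.0;
        for (size_t j = 0; j < dim; j++) {
            double diff = v[j] - centroids[c * dim + j];
            d += diff * diff;
        }
        if (c == 0 || d < best_d) { best_d = d; best = c; }
    }
    for (size_t j = 0; j < dim; j++)
        centroids[best * dim + j] += rate * (v[j] - centroids[best * dim + j]);
    return best;
}
```

The cost is one pass over the centroids per frame. The result depends on the order frames arrive in and on the rate, which is the quality question the slide raises.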
![Page 28: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/28.jpg)
28
Self-Organizing Map
• “Kohonen Self-Organizing Map”
• Description of the algorithm
• Much like online k-means
– But with coherence in a separate space
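A minimal sketch of one SOM training step, assuming a one-dimensional map for brevity. The difference from online k-means is that the winning node's neighbours in map space also move toward the input, which is where the coherence comes from. The linear falloff, the fixed radius and all names are simplifications of this sketch; a Gaussian neighbourhood that shrinks over time is more usual.

```c
#include <stddef.h>

/* nodes holds n weight vectors of length dim, stored contiguously;
   node i sits at position i on a one-dimensional map.
   Returns the index of the best matching unit (BMU). */
size_t som_update(const double *v, double *nodes, size_t n, size_t dim,
                  double rate, size_t radius)
{
    /* Find the BMU: the node whose weights are closest to the input. */
    size_t bmu = 0;
    double best_d = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = 0.0;
        for (size_t j = 0; j < dim; j++) {
            double diff = v[j] - nodes[i * dim + j];
            d += diff * diff;
        }
        if (i == 0 || d < best_d) { best_d = d; bmu = i; }
    }
    /* Pull the BMU and its map-space neighbours toward the input.
       Neighbours are chosen by map position, not by weight distance. */
    for (size_t i = 0; i < n; i++) {
        size_t map_dist = i > bmu ? i - bmu : bmu - i;
        if (map_dist > radius)
            continue;
        double falloff = 1.0 - (double)map_dist / (double)(radius + 1);
        for (size_t j = 0; j < dim; j++)
            nodes[i * dim + j] += rate * falloff * (v[j] - nodes[i * dim + j]);
    }
    return bmu;
}
```

With radius 0 this reduces to the online k-means update, since only the winner moves.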
![Page 29: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/29.jpg)
29
Demo of SOM-enabled Profiling Tool
• Visualizations are still early
• Hopefully they will mature into something truly useful (people in other visualization fields like SOMs, so hopes are high)
![Page 30: KIPA Game Engine Seminars](https://reader035.vdocument.in/reader035/viewer/2022081506/56814e95550346895dbc3ddd/html5/thumbnails/30.jpg)
30
Discussions of changes made to SOM to support online clustering