Cranking Floating Point Performance Up To 11
DESCRIPTION
The iPhone has a surprisingly powerful engine under that shiny hood when it comes to floating-point computations. This surprises a lot of programmers because, by default, things can slow down a lot whenever any floating-point numbers are involved. This session will explain the secrets to unlocking maximum performance for floating-point calculations, from the mysteries of Thumb mode to harnessing the full power of the forgotten vector floating point unit. Stay away from this session if the thought of reading or even (gasp!) writing assembly code scares you.
TRANSCRIPT
![Page 1: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/1.jpg)
Cranking Floating Point Performance Up To 11
Noel Llopis
Snappy Touch
http://twitter.com/[email protected]
http://gamesfromwithin.com
![Page 2: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/2.jpg)
![Page 3: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/3.jpg)
![Page 4: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/4.jpg)
![Page 5: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/5.jpg)
![Page 6: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/6.jpg)
![Page 7: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/7.jpg)
Floating Point Performance
![Page 13: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/13.jpg)
Floating point numbers
• Representation of rational numbers
• 1.2345, -0.8374, 2.0000, 14388439.34, etc
• Following IEEE 754 format
• Single precision: 32 bits
• Double precision: 64 bits
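To make the 32-bit layout concrete, here is a minimal sketch that splits a single-precision value into its IEEE 754 fields. The names `FloatFields` and `SplitFloat` are illustrative and not from the talk, and the sketch assumes a platform where `float` is the 32-bit IEEE 754 type.

```cpp
#include <cstdint>
#include <cstring>

// Fields of an IEEE 754 single-precision value (illustrative names).
struct FloatFields {
    uint32_t sign;      // 1 bit
    uint32_t exponent;  // 8 bits, stored with a bias of 127
    uint32_t mantissa;  // 23 bits, implicit leading 1 for normal numbers
};

static_assert(sizeof(float) == 4, "sketch assumes 32-bit single precision");
static_assert(sizeof(double) == 8, "sketch assumes 64-bit double precision");

FloatFields SplitFloat(float value) {
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy out the raw bit pattern
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For example, `SplitFloat(2.0f)` gives sign 0, exponent 128 (1 + the bias of 127) and mantissa 0.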
![Page 14: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/14.jpg)
Floating point numbers
![Page 15: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/15.jpg)
Floating point numbers
![Page 19: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/19.jpg)
Why floating point performance?
• Most games use floating point numbers for most of their calculations
• Positions, velocities, physics, etc, etc.
• Maybe not so much for regular apps
![Page 23: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/23.jpg)
CPU
• 32-bit RISC ARM 11
• 400-535 MHz
• iPhone 2G/3G and iPod Touch 1st and 2nd gen
![Page 26: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/26.jpg)
CPU (iPhone 3GS)
• Cortex-A8, 600 MHz
• More advanced architecture
![Page 28: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/28.jpg)
CPU
• No floating point support in the ARM CPU!!!
![Page 32: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/32.jpg)
How about integer math?
• No need to do any floating point operations
• Fully supported in the ARM processor
• But...
![Page 35: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/35.jpg)
Integer Divide
There is no integer divide
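Without a divide instruction, a division in C has to be carried out in software by a helper routine. The sketch below shows the classic shift-and-subtract approach such a helper could take. It is an illustration only, not the actual runtime routine, and the name `SoftDivide` is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit division by shift-and-subtract (restoring division).
// One loop iteration per quotient bit, which is why software division is slow
// compared to a single hardware instruction.
uint32_t SoftDivide(uint32_t numerator, uint32_t denominator) {
    assert(denominator != 0);
    uint32_t quotient = 0;
    uint64_t remainder = 0;  // 64 bits so the shift below cannot overflow
    for (int bit = 31; bit >= 0; --bit) {
        // Bring down the next bit of the numerator.
        remainder = (remainder << 1) | ((numerator >> bit) & 1u);
        if (remainder >= denominator) {
            remainder -= denominator;
            quotient |= (1u << bit);
        }
    }
    return quotient;
}
```

`SoftDivide(100, 7)` walks through 32 iterations to produce 14, which gives a feel for why divides are worth avoiding in inner loops on this hardware.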
![Page 41: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/41.jpg)
Fixed-point arithmetic
• Sometimes integer arithmetic doesn’t cut it
• You need to represent rational numbers
• Can use a fixed-point library.
• Performs rational arithmetic with integer values at a reduced range/resolution.
• Not so great...
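As a rough idea of what a fixed-point library does underneath, here is a minimal 16.16 sketch. The format choice and the names `Fixed`, `FixedFromFloat`, `FixedToFloat` and `FixedMul` are assumptions for illustration and do not come from any particular library.

```cpp
#include <cstdint>

// 16.16 fixed point: 16 integer bits, 16 fractional bits, stored in an int32_t.
// Range is roughly -32768..32767 with a resolution of 1/65536.
typedef int32_t Fixed;
const int kFracBits = 16;

Fixed FixedFromFloat(float v) {
    return static_cast<Fixed>(v * (1 << kFracBits));
}

float FixedToFloat(Fixed v) {
    return static_cast<float>(v) / (1 << kFracBits);
}

Fixed FixedMul(Fixed a, Fixed b) {
    // Widen to 64 bits so the intermediate product does not overflow, then
    // shift back down. (Right-shifting a negative value is arithmetic on the
    // usual compilers, but only guaranteed by the standard from C++20 on.)
    return static_cast<Fixed>((static_cast<int64_t>(a) * b) >> kFracBits);
}
```

The reduced range and resolution are visible right in the type: anything above about 32767 or finer than 1/65536 simply does not fit, which is part of why the slides call this "not so great".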
![Page 44: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/44.jpg)
Floating point support
• There’s a floating point unit
• Compiled C/C++/ObjC code uses the VFP unit for any floating point operations.
![Page 48: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/48.jpg)
Sample program

    struct Particle
    {
        float x, y, z;
        float vx, vy, vz;
    };

    for (int i=0; i<MaxParticles; ++i)
    {
        Particle& p = s_particles[i];
        p.x += p.vx*dt;
        p.y += p.vy*dt;
        p.z += p.vz*dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }

• 7.2 seconds on an iPod Touch 2nd gen
![Page 51: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/51.jpg)
Floating point support
Trust no one! When in doubt, check the generated assembly.
![Page 52: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/52.jpg)
Floating point support
![Page 58: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/58.jpg)
Thumb Mode
• CPU has a special Thumb mode.
• Less memory, maybe better performance.
• No floating point support.
• Every time there’s an fp operation, it switches out of Thumb, does the fp operation, and switches back.
![Page 62: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/62.jpg)
Thumb Mode
• It’s on by default!
• Potentially HUGE wins turning it off.
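One way to confirm which mode a source file is actually being compiled in is to test the `__thumb__` macro, which GCC-family compilers predefine when generating Thumb code (GCC's `-mthumb` and `-marm` options select the mode). The helper name `CompiledForThumb` below is made up for this sketch, and the transcript does not name the exact Xcode build setting, so that part is left out.

```cpp
// Reports whether this translation unit was compiled to Thumb instructions.
// On non-ARM targets the macro is never defined, so this returns false.
bool CompiledForThumb() {
#if defined(__thumb__)
    return true;
#else
    return false;
#endif
}
```

Logging the result once at startup in a device build is a cheap sanity check that the build setting change actually took effect for the floating-point-heavy files.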
![Page 66: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/66.jpg)
Thumb Mode
• Turning off Thumb mode increased performance in Flower Garden by over 2x
• Heavy usage of floating point operations though
• Most games will probably benefit from turning it off (especially 3D games)
![Page 67: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/67.jpg)
![Page 68: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/68.jpg)
2.6 seconds!
![Page 73: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/73.jpg)
ARM assembly
DISCLAIMER: I’m not an ARM assembly expert!!!
Z80!!!
![Page 78: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/78.jpg)
ARM assembly
• Hit the docs
• References included in your USB card
• Or download them from the ARM site
• http://bit.ly/arminfo
![Page 81: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/81.jpg)
ARM assembly
• Reading assembly is a very important skill for high-performance programming
• Writing is more specialized. Most people don’t need to.
![Page 90: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/90.jpg)
VFP unit
A0 + B0 = C0
A1 + B1 = C1
A2 + B2 = C2
A3 + B3 = C3
![Page 97: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/97.jpg)
VFP unit
  A0 A1 A2 A3
+ B0 B1 B2 B3
= C0 C1 C2 C3
Sweet! How do we use the vfp?
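In plain C++ terms, the picture above amounts to the loop below: four independent adds that the scalar path issues one at a time and the vector path covers with a single instruction. `AddVec4` is an illustrative name, and this is the reference behaviour only, not VFP code.

```cpp
// Element-wise add of four floats: c[i] = a[i] + b[i].
// This is what one 4-wide vector add computes in a single instruction.
void AddVec4(const float* a, const float* b, float* c) {
    for (int i = 0; i < 4; ++i) {
        c[i] = a[i] + b[i];
    }
}
```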
![Page 98: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/98.jpg)

    "fldmias %2, {s8-s23} \n\t"
    "fldmias %1!, {s0-s3} \n\t"
    "fmuls s24, s8, s0 \n\t"
    "fmacs s24, s12, s1 \n\t"
    "fldmias %1!, {s4-s7} \n\t"
    "fmacs s24, s16, s2 \n\t"
    "fmacs s24, s20, s3 \n\t"
    "fstmias %0!, {s24-s27} \n\t"

Like this!
![Page 102: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/102.jpg)
Writing vfp assembly
• There are two parts to it
• How to write any assembly in gcc
• Learning ARM and VFP assembly
![Page 107: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/107.jpg)
vfpmath library
• Already done a lot of work for you
• http://code.google.com/p/vfpmathlibrary
• Vector/matrix math
• Might not be exactly what you need, but it’s a great starting point
![Page 109: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/109.jpg)
Assembly in gcc
• Only use it when targeting the device

    #include <TargetConditionals.h>
    #if (TARGET_IPHONE_SIMULATOR == 0) && (TARGET_OS_IPHONE == 1)
        #define USE_VFP
    #endif

![Page 111: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/111.jpg)
Assembly in gcc
• The basics

    asm ("cmp r2, r1");

http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
![Page 112: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/112.jpg)
Assembly in gcc
• Multiple lines

    asm (
        "mov r0, #1000\n\t"
        "cmp r2, r1\n\t"
    );

![Page 115: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/115.jpg)
Assembly in gcc
• Accessing C variables

    asm (
        // assembly code
        : // output operands
        : // input operands
        : // clobbered registers
    );

    int src = 19;
    int dest = 0;
    asm volatile (
        "add %0, %1, #42"
        : "=r" (dest)
        : "r" (src)
        :
    );

%0, %1, etc. are the operands in order: outputs first, then inputs
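Here is the same snippet wrapped so it can also be built on the simulator or any non-ARM machine. It is a sketch under assumptions: the wrapper name `AddFortyTwo` is made up, and the asm path is only enabled for 32-bit ARM builds that are not in Thumb mode, since the plain-C fallback is all that a non-ARM compiler will accept.

```cpp
// Adds 42 to src, using inline assembly on 32-bit ARM (non-Thumb) builds
// and plain C++ everywhere else.
int AddFortyTwo(int src) {
    int dest = 0;
#if defined(__arm__) && !defined(__thumb__)
    asm volatile (
        "add %0, %1, #42"
        : "=r" (dest)   // %0: first operand listed, an output
        : "r" (src)     // %1: second operand listed, an input
    );
#else
    dest = src + 42;    // fallback with the same result
#endif
    return dest;
}
```

With the values from the slide, `AddFortyTwo(19)` returns 61 on either path.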
![Page 119: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/119.jpg)
Assembly in gcc

    int src = 19;
    int dest = 0;
    asm volatile (
        "add r10, %1, #42\n\t"
        "add %0, r10, #33\n\t"
        : "=r" (dest)
        : "r" (src)
        : "r10"
    );

The clobber list names the registers the asm block modifies
volatile prevents “optimizations”
![Page 121: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/121.jpg)
VFP asm
Four banks of 8 32-bit registers each

    #define VFP_VECTOR_LENGTH(VEC_LENGTH) \
        "fmrx r0, fpscr \n\t" \
        "bic r0, r0, #0x00370000 \n\t" \
        "orr r0, r0, #0x000" #VEC_LENGTH "0000 \n\t" \
        "fmxr fpscr, r0 \n\t"

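The macro is a read-modify-write of the FPSCR control register. As a hedged sketch of the bit arithmetic it performs: as I read the ARM VFP documentation, bits 16-18 hold the vector length field (stored as length minus one) and bits 20-21 hold the stride, so the `bic` mask `0x00370000` clears both and the `orr` writes the new length field. The names `kLenStrideMask` and `WithVectorLength` are made up for this example, which only computes the value and does not touch the real register.

```cpp
#include <cassert>
#include <cstdint>

// Bits cleared by the macro's "bic": LEN (bits 16-18) and STRIDE (bits 20-21).
const uint32_t kLenStrideMask = 0x00370000u;

// Returns the FPSCR value the macro would write back for a given length field.
// lenField is the raw 3-bit value, i.e. the macro argument VEC_LENGTH.
uint32_t WithVectorLength(uint32_t fpscr, uint32_t lenField) {
    assert(lenField <= 7);
    return (fpscr & ~kLenStrideMask) | (lenField << 16);
}
```

Under that reading, `VFP_VECTOR_LENGTH(3)` corresponds to `WithVectorLength(fpscr, 3)` and selects vectors of four floats.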
![Page 122: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/122.jpg)
VFP asm
![Page 123: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/123.jpg)
VFP asm
![Page 127: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/127.jpg)
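VFP asm
    // Assumes the FPSCR vector length was set to 3 before the loop (not shown on this slide).
    for (int i=0; i<MaxParticles; ++i)
    {
        Particle* p = &s_particles[i];
        asm volatile (
            "fldmias  %0, {s0-s5}   \n\t"   // load x, y, z, vx, vy, vz
            "fldmias  %1, {s6-s8}   \n\t"   // load three copies of dt
            "fldmias  %2, {s9-s11}  \n\t"   // load three copies of drag
            "fmacs    s0, s3, s6    \n\t"   // position += velocity * dt
            "fmuls    s3, s3, s9    \n\t"   // velocity *= drag
            "fstmias  %0, {s0-s5}   \n\t"   // store the particle back
            :                               // no outputs, so %0-%2 are p, dtArray, dragArray
            : "r" (p), "r" (dtArray), "r" (dragArray)
            : "memory", "s0", "s1", "s2", "s3", "s4", "s5",
              "s6", "s7", "s8", "s9", "s10", "s11"
        );
    }

Original loop:
    for (int i=0; i<MaxParticles; ++i)
    {
        Particle& p = s_particles[i];
        p.x += p.vx*dt;
        p.y += p.vy*dt;
        p.z += p.vz*dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }

Was: 2.6 seconds
Now: 1.4 seconds!!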
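If VFP short-vector mode is unfamiliar, the data flow of the asm loop can be sketched in portable C++. This is only an illustration: it models the element-wise behavior the slide describes (three multiply-adds, then three multiplies) and assumes `dtArray` and `dragArray` each hold three copies of `dt` and `drag`. It does not model the hardware's register-bank rules, and the names `emulate_fmacs`, `emulate_fmuls`, and `update_particle_emulated` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct Particle {
    float x, y, z;     // position: s0-s2 in the slide's asm
    float vx, vy, vz;  // velocity: s3-s5 in the slide's asm
};

// Hypothetical stand-in for "fmacs" at vector length len: acc[i] += a[i] * b[i].
inline void emulate_fmacs(float* acc, const float* a, const float* b, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) acc[i] += a[i] * b[i];
}

// Hypothetical stand-in for "fmuls" at vector length len: dst[i] = a[i] * b[i].
inline void emulate_fmuls(float* dst, const float* a, const float* b, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) dst[i] = a[i] * b[i];
}

// One loop iteration of the asm version, written against a fake register file.
void update_particle_emulated(Particle* p, const float* dtArray, const float* dragArray) {
    assert(p != nullptr && dtArray != nullptr && dragArray != nullptr);

    // The three fldmias loads: s[0..5] = particle, s[6..8] = dt, s[9..11] = drag.
    float s[12] = { p->x, p->y, p->z, p->vx, p->vy, p->vz,
                    dtArray[0], dtArray[1], dtArray[2],
                    dragArray[0], dragArray[1], dragArray[2] };

    emulate_fmacs(&s[0], &s[3], &s[6], 3);  // fmacs s0, s3, s6
    emulate_fmuls(&s[3], &s[3], &s[9], 3);  // fmuls s3, s3, s9

    // fstmias: write s[0..5] back to the particle.
    p->x = s[0];  p->y = s[1];  p->z = s[2];
    p->vx = s[3]; p->vy = s[4]; p->vz = s[5];
}
```

The emulation is not expected to be faster than the plain loop; it only makes visible that two vector instructions cover all six floating-point operations of the original body.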
![Page 128: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/128.jpg)
VFP asm
Let’s do 6 operations at once!
    struct Particle2
    {
        float x0, y0, z0;
        float x1, y1, z1;
        float vx0, vy0, vz0;
        float vx1, vy1, vz1;
    };
![Page 131: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/131.jpg)
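VFP asm
    // Assumes the FPSCR vector length was set to 6 before the loop (not shown on this slide).
    for (int i=0; i<iterations; ++i)
    {
        Particle2* p = &s_particles2[i];
        asm volatile (
            "fldmias  %0, {s0-s11}   \n\t"   // two particles: 6 positions, 6 velocities
            "fldmias  %1, {s12-s17}  \n\t"   // load six copies of dt
            "fldmias  %2, {s18-s23}  \n\t"   // load six copies of drag
            "fmacs    s0, s6, s12    \n\t"   // positions += velocities * dt
            "fmuls    s6, s6, s18    \n\t"   // velocities *= drag
            "fstmias  %0, {s0-s11}   \n\t"   // store both particles back
            :                                // no outputs, so %0-%2 are p, dtArray, dragArray
            : "r" (p), "r" (dtArray), "r" (dragArray)
            : "memory", "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
              "s8", "s9", "s10", "s11", "s12", "s13", "s14", "s15",
              "s16", "s17", "s18", "s19", "s20", "s21", "s22", "s23"
        );
    }

Was: 1.4 seconds
Now: 1.2 seconds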
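The slides do not show how existing particles end up in the `Particle2` layout. A minimal sketch of one way to do it, assuming an even particle count, is below. `pack_pairs` and `update_pair` are hypothetical helpers; `update_pair` is a scalar reference for what the 6-wide `fmacs`/`fmuls` pair is meant to compute, useful for checking the asm against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

struct Particle2 {  // layout from the slide: two positions, then two velocities
    float x0, y0, z0;
    float x1, y1, z1;
    float vx0, vy0, vz0;
    float vx1, vy1, vz1;
};

// Hypothetical helper: packs consecutive pairs of particles into Particle2 records.
std::vector<Particle2> pack_pairs(const std::vector<Particle>& in) {
    assert(in.size() % 2 == 0);  // assumption: an even number of particles
    std::vector<Particle2> out;
    out.reserve(in.size() / 2);
    for (std::size_t i = 0; i < in.size(); i += 2) {
        const Particle& a = in[i];
        const Particle& b = in[i + 1];
        out.push_back(Particle2{a.x, a.y, a.z, b.x, b.y, b.z,
                                a.vx, a.vy, a.vz, b.vx, b.vy, b.vz});
    }
    return out;
}

// Scalar reference for one Particle2 update (six multiply-adds, six multiplies).
void update_pair(Particle2& p, float dt, float drag) {
    float pos[6] = {p.x0, p.y0, p.z0, p.x1, p.y1, p.z1};
    float vel[6] = {p.vx0, p.vy0, p.vz0, p.vx1, p.vy1, p.vz1};
    for (int i = 0; i < 6; ++i) {
        pos[i] += vel[i] * dt;  // corresponds to fmacs s0, s6, s12
        vel[i] *= drag;         // corresponds to fmuls s6, s6, s18
    }
    p = Particle2{pos[0], pos[1], pos[2], pos[3], pos[4], pos[5],
                  vel[0], vel[1], vel[2], vel[3], vel[4], vel[5]};
}
```

Grouping all six position components ahead of all six velocity components is what lets a single vector instruction walk consecutive registers.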
![Page 134: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/134.jpg)
VFP asm
What’s the loop/cache overhead?
    for (int i=0; i<MaxParticles; ++i)
    {
        Particle* p = &s_particles[i];
        p->x = p->vx;
        p->y = p->vy;
        p->z = p->vz;
    }

Was: 1.2 seconds
Now: 1.2 seconds!!!!
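The touch-only loop taking the same 1.2 seconds suggests the 6-wide version is limited by loop and memory overhead rather than arithmetic. A rough way to repeat this comparison is sketched below, assuming `std::chrono::steady_clock` is an acceptable timer on the target. The helper `time_seconds` and the pass functions are hypothetical names, not code from the talk.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

// Touch-only pass from the slide: same loads and stores, no arithmetic.
void touch_pass(std::vector<Particle>& ps) {
    for (Particle& p : ps) {
        p.x = p.vx;
        p.y = p.vy;
        p.z = p.vz;
    }
}

// Full update pass: the plain C++ particle loop.
void update_pass(std::vector<Particle>& ps, float dt, float drag) {
    for (Particle& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        p.vx *= drag;
        p.vy *= drag;
        p.vz *= drag;
    }
}

// Hypothetical helper: runs fn() `repeats` times and returns elapsed seconds.
template <typename Fn>
double time_seconds(Fn fn, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Timing `touch_pass` and `update_pass` over the same particle array with the same repeat count gives the two numbers to compare; if they are close, faster arithmetic will not help much.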
![Page 135: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/135.jpg)
![Page 136: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/136.jpg)
Matrix multiply
![Page 141: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/141.jpg)
Matrix multiply
Straight from vfpmathlib
Touch: 0.037919 s
Normal: 0.096855 s
VFP: 0.042216 s
About 2x faster!
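The slides do not show the code behind the "Normal" timing. A plain C++ 4x4 multiply along the following lines is a plausible baseline to compare a VFP routine against. It assumes column-major storage (the OpenGL convention) and is not vfpmathlib's code; `matrix4_multiply` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// out = a * b for 4x4 matrices stored column-major as 16 floats.
// Element (row, col) lives at index col * 4 + row.
void matrix4_multiply(const float* a, const float* b, float* out) {
    assert(out != a && out != b);  // in-place multiplication is not supported
    for (std::size_t col = 0; col < 4; ++col) {
        for (std::size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
}
```

Each output element needs four multiplies and three adds, which is the kind of regular arithmetic that short-vector instructions handle well.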
![Page 142: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/142.jpg)
Good use of VFP
![Page 148: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/148.jpg)
Good use of VFP
• Matrix operations
• Particle systems
• Skinning
• Physics
• Procedural content generation
• ....
![Page 149: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/149.jpg)
What about the 3GS?
![Page 150: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/150.jpg)
What about the 3GS?

| Version | 3G (s) | 3GS (s) |
|---|---|---|
| Thumb | 7.2 | 8.0 |
| Normal | 2.6 | 2.6 |
| VFP1 | 1.4 | 1.30 |
| VFP2 | 1.2 | 0.64 |
| Touch | 1.2 | 0.18 |
![Page 157: Cranking Floating Point Performance Up To 11](https://reader033.vdocument.in/reader033/viewer/2022052900/555a08f4d8b42ad00a8b54a0/html5/thumbnails/157.jpg)
More 3GS: NEON
More 3GS: NEON
• SIMD coprocessor
• Floating point and integer
• Huge potential
• Very little documentation right now :-(
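To give a feel for what SIMD code on NEON can look like, here is a hedged sketch of the particle update written with the intrinsics from `arm_neon.h`. It is not from the talk. It assumes the data has been rearranged into flat arrays of position and velocity components (a different layout from the `Particle` struct used earlier), and it falls back to a scalar loop when NEON is not available. `update_soa` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// pos[i] += vel[i] * dt; vel[i] *= drag, over flat component arrays.
void update_soa(float* pos, float* vel, std::size_t count, float dt, float drag) {
    assert(pos != nullptr && vel != nullptr);
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Process four floats per iteration in 128-bit NEON registers.
    for (; i + 4 <= count; i += 4) {
        float32x4_t p = vld1q_f32(pos + i);
        float32x4_t v = vld1q_f32(vel + i);
        p = vmlaq_n_f32(p, v, dt);  // p += v * dt
        v = vmulq_n_f32(v, drag);   // v *= drag
        vst1q_f32(pos + i, p);
        vst1q_f32(vel + i, v);
    }
#endif
    // Scalar tail; on targets without NEON this handles every element.
    for (; i < count; ++i) {
        pos[i] += vel[i] * dt;
        vel[i] *= drag;
    }
}
```

Using intrinsics instead of inline asm leaves register allocation to the compiler, at the cost of less control over the exact instruction sequence.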