advanced mobile optimizations

Post on 22-Jan-2015

DESCRIPTION

How to reach 60 fps in games for iPhone

TRANSCRIPT

Advanced Mobile Optimizations

How to go to 60 fps after you have removed all Sleep calls ;-)

Disclaimer

• The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity

Optimization Mindset

• you can't just make your game faster
– there is no magic bullet
– very specific stuff
• not the same as scripting a character

Optimization Mindset

• not in any specific order
• know
• think
• measure

Optimization Mindset

• You can't avoid any of that
– no, really

Optimization Mindset

• know + think = shoot in the dark
– you just write code hoping for the best
• know + measure = shoot in the dark
– you are missing the "understand" part
• think + measure = shoot in the dark
– you solve an abstract problem, not the real one

Optimization Mindset: know + think

• hardware is more complex than you think
– highly parallel
– deep pipelining
– even when you write asm, it is already high-level

Optimization Mindset: know + measure

• knowledge is static
• knowledge comes from the past
• knowledge is general

Optimization Mindset: know + measure

• qsort vs bubble sort
– sure, qsort is faster
• but you are missing the point
– maybe radix?
– maybe no need to sort?
– maybe insertion?
– parallel sorting network?
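To make the "maybe radix?" option concrete, here is a minimal sketch of an LSD radix sort. It assumes the sort keys are plain 32-bit unsigned integers (for example packed depth or material IDs); the function name `radix_sort_u32` is hypothetical and not taken from the talk. Whether it actually beats a comparison sort on a given device and data set is exactly what the "measure" step has to answer.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort for 32-bit unsigned keys: four passes, one byte per pass.
// Each pass is a stable counting sort, so no key comparisons are made.
void radix_sort_u32(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // Histogram of the current byte.
        std::array<std::size_t, 256> count{};
        for (std::uint32_t k : keys) ++count[(k >> shift) & 0xFFu];

        // Exclusive prefix sum turns the counts into start offsets.
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = offset;
            offset += n;
        }

        // Scatter in input order, which keeps the pass stable.
        for (std::uint32_t k : keys) scratch[count[(k >> shift) & 0xFFu]++] = k;

        // After an even number of passes the result is back in `keys`.
        keys.swap(scratch);
    }
}
```

The memory access pattern is two linear reads and one scattered write per pass, which is why the result depends so much on cache size and key count.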

Optimization Mindset: think + measure

• solving an abstract problem
– example: GPU
• optimizing for RIVA TNT and for GTX is different

Optimization Mindset

• well, if you are missing two of the three
– no comment

Know

• your hardware
• your data
– knowing your data is interleaved with "think"
– we will talk more about it in "think"

Know your hardware

• GPU
• CPU
• whatever
– e.g. disk load speed

Know your hardware: GPU

• Pipeline
– meaning: slow step = slow everything
– you are as slow as your bottleneck
• Know your pipeline
• Won't go into the full pipeline spec
– see the Resources section
• Just the common/biggest problems

Know your hardware: GPU Geometry

• pre/post T&L cache
– should use indexed geometry or not
• cache hit rate
– strips vs tri list
• memory throughput
– vertex size
• fetch cost (memory)
– pack attributes or not

Know your hardware: GPU Textures

• Texture Cache
– swizzle
– compression
– mip-maps

• Biggest memory hog

Know your hardware: GPU Shaders

• VertexProgram vs FragmentShader
– balancing
– attributes
• Unified Shaders
– load balancing
• Precision
– GLES: highp/mediump/lowp
– Cg: float/half/fixed (iirc)

Know your hardware: GPU Rasterization

• Fillrate (memory speed)
– alpha
• 2x2 samples (or more)
– why geometry LOD matters

Know your hardware: CPU

• Mobile = in-order RISC
– for stupid code far worse than CISC
• 2 main issues:
– Memory speed
– Computation speed

Know your hardware: CPU Memory

• This is the single most important factor
– memory access is far slower than computation
• Latency vs Throughput
• Caches
– fast memory
– your best friend
– L1/L2/whatever
• LHS (load-hit-store)

Know your hardware: CPU Computations

• SIMD
– better memory usage
– better arithmetic usage (4 vals instead of 1)
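As a rough illustration of "4 vals instead of 1", the sketch below writes a multiply-add loop in a four-lanes-per-iteration shape. It is plain C++, not NEON intrinsics or assembly: the function name `mul_add_x4` and the requirement that `n` is a multiple of 4 are assumptions made for the example. The point is only the shape of the loop (contiguous loads, four independent lanes, no branches), which is what a vectorizing compiler or a hand-written SIMD routine would work from.

```cpp
#include <cassert>
#include <cstddef>

// dst[i] = a[i] * k + b[i], processed four floats per iteration.
// Assumptions for this sketch: n is a multiple of 4 and the arrays
// do not overlap.
void mul_add_x4(float* dst, const float* a, const float* b, float k, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        // Four independent lanes, contiguous in memory: one 128-bit load
        // per input and one store per output on a 4-wide SIMD unit.
        dst[i + 0] = a[i + 0] * k + b[i + 0];
        dst[i + 1] = a[i + 1] * k + b[i + 1];
        dst[i + 2] = a[i + 2] * k + b[i + 2];
        dst[i + 3] = a[i + 3] * k + b[i + 3];
    }
}
```

Whether the compiler really emits vector instructions for this has to be checked in the generated code or in a profiler; the source does not show it.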

Know your target hardware

• Those were general rules
• But you are running on that particular piece of sh... hardware

Know your target hardware: PowerVR

• TBDR
– perfect hidden surface removal
– Alpha-Test/discard
• shader precision
• unified shaders
• Tegra / ATI-AMD / Adreno more common

Know your target hardware: ARM

• VFP = FPU on steroids (not real SIMD)
– scalar instructions at same speed as vectorized
• NEON = SIMD
– more registers
– awesome load/store instructions
– not as cool as Altivec but cool enough for mobiles

Know your target hardware: ARM

• Conditional execution of most instructions
• Fold shifts and rotates into the "data processing" instructions
– load structure from array by index
• Thumb + float = disaster
– switch back and forth between Thumb mode and regular 32-bit mode

Know your hardware: Resources

• RTR
• lots of whitepapers:
– PowerVR (Imagination Technologies), Tegra (NVIDIA), Adreno (Qualcomm)
– AMD/ATI: basically the same as X360, but much smaller tiles
• ARM dev center

Think

• Think about your data
• Think about your algorithms
• Think about your constraints
• Think about your hardware

Think Basics

• CPU vs GPU
– e.g. draw calls
• pure CPU cost
• CPU:
– memory vs arithmetic
• memory slower
• GPU:
– vprog vs fshader
– memory vs arithmetic

Think Memory

• fragmentation
• data organization
– AoS vs SoA
– hot/cold split
• data structures
– linear vs random
– array vs list
– map vs hashtable
– allocators
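A minimal sketch of the AoS vs SoA and hot/cold ideas above. All names (`ActorAoS`, `ActorsSoA`, `integrate`) and the choice of which fields count as hot or cold are hypothetical, picked only to illustrate the layout; a real engine would decide that from profiling data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// AoS, for comparison: the update loop needs only x and vx, but every
// cache line it pulls in also carries the cold fields.
struct ActorAoS {
    float x, vx;
    std::string debug_name;  // cold: assumed to be read only by tools
    int spawn_frame;         // cold: assumed to be read only on despawn
};

// SoA plus hot/cold split: hot fields are packed in their own arrays,
// cold fields live elsewhere and share only the index.
struct ActorsSoA {
    std::vector<float> x, vx;             // hot
    std::vector<std::string> debug_name;  // cold
    std::vector<int> spawn_frame;         // cold

    // Appends one actor and returns its index.
    std::size_t add(float x0, float vx0, std::string name, int frame) {
        x.push_back(x0);
        vx.push_back(vx0);
        debug_name.push_back(std::move(name));
        spawn_frame.push_back(frame);
        return x.size() - 1;
    }
};

// Linear walk over two tightly packed float arrays.
void integrate(ActorsSoA& a, float dt) {
    assert(a.x.size() == a.vx.size());
    for (std::size_t i = 0; i < a.x.size(); ++i) a.x[i] += a.vx[i] * dt;
}
```

The trade-off is that adding or removing an actor now touches several arrays, so this layout tends to pay off for loops that run every frame over many elements.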

Think Constraints

• GPU: will you see the difference?
– really?
– on mobile screen?
– on that one small thingy in the corner?
• CPU: will you need that?
– e.g. physics in casual game?
• Memory: will you need that?
– will you need more than XXX actors?

Measure

• you didn't optimize anything if you didn't measure the difference
• you can't optimize if you don't know what needs to be optimized
– if you can't measure what takes time

Measure Tools

• there are lots of tools
– Instruments (iOS)
– PerfHUD (Tegra)
– Adreno Profiler (Qualcomm)
– some more probably
• Poor-man profiler
– timers
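One possible shape for the "poor-man profiler" is a scoped timer that adds the time spent in a block to a running total. The class name `ScopedTimer` and the choice to accumulate milliseconds into a caller-owned `double` are assumptions for this sketch; the talk does not describe a specific implementation.

```cpp
#include <chrono>

// Adds the wall-clock time spent in the enclosing scope to `total_ms`.
// steady_clock is used because it never jumps backwards.
class ScopedTimer {
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedTimer(double& total_ms)
        : total_ms_(total_ms), start_(Clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        total_ms_ += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& total_ms_;
    Clock::time_point start_;
};
```

Typical use would be one accumulator per subsystem (say, a hypothetical `skinning_ms`), a `ScopedTimer t(skinning_ms);` at the top of the block being measured, and a printout of the totals every few hundred frames so that a before/after difference can be compared on the target device.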

Unity use case: random bits

• Mobile shaders
– specialized versions of the usual built-ins
• Skinning
– full NEON/VFP impl
• usually 10-15% of the C code's time
– and we are not done optimizing it ;-)
• Rej's baking of material to texture and, coming soon, BRDF baking to texture

Unity use case: random bits

• Remote Profiler
– run on target hw, data is transferred over wifi
– collect in Editor and show pretty graphs ;-)
• Sort alpha-test *after* opaque
• check *lots* of extensions
• LODs - almost done
• Vertex Cache optimization - after LODs ;-)

Closing Words

• Know hardware
• Know data
• Think data
• Think constraints
• Measure always
– You better know earlier
• You should always be optimizing

Questions
