Anatomy of a Mobile
Steven R. Bagley
Today
• Last week, considered the characteristics of a mobile device
• Today, look at what’s inside a mobile…
Device Characteristics
• CPU (~1GHz or less)
• RAM (128MB to 1GB)
• Flash storage (varies, sometimes external)
• Display (LCD, resolution < 1024x768)
• WiFi, 3G, Bluetooth, GPS
• User Input (Touch, accelerometer, etc)
This sounds like a Computer!
Mobile vs. Computer
• In many ways, the technology within a mobile device is conceptually similar to the technology in a computer
• But it is the way it is put together that is different
Inside a computer
• Several chips…
• CPU
• North bridge, South bridge
• RAM
• Several cards with extra bits on
Connections
• CPU has a bus that connects it to the other devices
• Originally, this would be an address bus, and a data bus alongside some control buses
• RAM chips etc. have similar connections
• Connect CPU to RAM/ROM
• Plus some control logic, and you have a PC
Connections
• These days, more advanced buses (e.g. HyperTransport) are used
• However, the number of pins used for the bus defines chip size
• Typical PC parts won’t fit in a mobile…
• Need a different approach
Transistors
• A CPU is a collection of transistors
• All digital logic devices are built out of Transistors
• 4 transistors will build a NAND gate
• From a NAND gate, you can build any logic circuit
Transistor counts:
• Intel 4004 has 2,300
• 8008 has 3,500
• 6502 had around 3,510
• 68000 CPU had 68k
• Intel Core i7 has 731m to over a billion, depending on CPU type
• ARM Cortex A9 has ~26m
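The point that any logic circuit can be built from NAND gates can be sketched in a few lines of Python. This is only an illustration: the function names (`nand`, `not_`, `and_`, `or_`, `xor_`, `half_adder`) are made up for this example, and bits are modelled as the integers 0 and 1.

```python
def nand(a, b):
    """NAND of two bits (0 or 1): the only primitive gate used below."""
    return 0 if (a and b) else 1

def not_(a):
    # Tying both NAND inputs together gives an inverter.
    return nand(a, a)

def and_(a, b):
    # AND is NAND followed by an inverter.
    return not_(nand(a, b))

def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b).
    return nand(not_(a), not_(b))

def xor_(a, b):
    # The classic four-NAND XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def half_adder(a, b):
    # Returns (sum, carry): the first step towards an arithmetic unit.
    return xor_(a, b), and_(a, b)
```

For example, `half_adder(1, 1)` gives a sum bit of 0 and a carry bit of 1, using nothing but NAND gates underneath.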
System on a Chip
• Only use a small number of the transistors to form a CPU
• The rest are used to form the other parts of the system
• External pins connect directly to hardware
• Called ‘System on a Chip’
System on a Chip
• Built block by block from descriptions of separate parts
• CPU block
• GPU block
• Etc…
System on a Chip
• Qualcomm Snapdragon
• Apple A4/A5
• Texas Instruments OMAP
• All share the same CPU architecture… (ARM, e.g. Cortex A8)
• But may have different companion technology
Typical Mobile CPU parts
from www.ifixit.com
Gives you some idea of scale: the middle is the SIM card tray
The A4 chip is actually a Package-on-Package device
OMAP 44x0
• Typical SoC used in mobile devices
• Motorola Droid Bionic, Samsung Galaxy Nexus
• Contains two ARM Cortex-A9 CPUs
• 3D GPU (PowerVR SGX)
• IVA Accelerator
• Image Signal Processor
Discuss the various parts of the chip
Package on Package
• What about RAM?
• Could add it to the SoC...
• But RAM uses a lot of space...
• Space we want to use for ‘useful’ stuff
• Separate package on top of the SoC
• Package-on-Package (or PoP)
from www.ifixit.com
RAM
• RAM also tends to be shared between the CPU and GPU, unlike in a computer
• Not all the RAM in the phone is available to the CPU...
• Write code assuming you don't have much...
Mobile CPUs
• Almost all Mobile Devices use ARM CPUs
• Originated from Acorn Computers in the late 1980s
• Acorn RISC Machine
• Spun out from Acorn in the early 1990s
• Now Advanced RISC Machines
ARM CPU
• 32-bit CPU
• Doesn’t always have an FPU
• Sold as a design, not a physical device
• Great for SoC usage!
• Various revisions of the instruction set
(current devices tend to use ARMv7, but ARMv6 is also about)
ARM CPU
• RISC design (Reduced Instruction Set Computer)
• Removes instructions that aren’t used that often
• E.g. Divide instructions
• If you need to divide, you roll your own in software
(or the compiler does)
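As a rough idea of what "rolling your own" divide involves, here is a sketch of unsigned shift-and-subtract division in Python. It is not what any particular compiler emits; the name `udiv32` and the 32-bit width are assumptions made for this example. It uses only shifts, compares and subtracts, which are the operations a CPU without a divide instruction does have.

```python
def udiv32(dividend, divisor):
    """Unsigned 32-bit division by shift-and-subtract.

    Returns (quotient, remainder). Illustrative only.
    """
    dividend &= 0xFFFFFFFF
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        raise ZeroDivisionError("division by zero")

    quotient = 0
    remainder = 0
    # Work from the most significant bit down, like long division in binary.
    for bit in range(31, -1, -1):
        # Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # If the divisor fits, subtract it and set this quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder
```

The loop always runs 32 times, which hints at why a software divide costs far more than a single-cycle add.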
ARM CPU
• 16 registers (+ shadows)
• Load/Store architecture
• Each instruction is 32 bits long
• Makes decoding easy
• But means loading a constant into a register can be tricky
Load and Store
• Data can only be moved
• From a register to Memory (store)
• From memory to a register (load)
• Not memory to memory
• Although a variety of addressing modes are allowed
Constants
• Constants must fit into the 32-bit instruction width
• So they can’t be a full 32 bits
• 8-bit value + a 4-bit shift (applied as a rotate) gives a wide range of values
• Also a Move Not (MVN) instruction, which loads the bitwise inverse
• If that doesn’t work, load from a literal pool
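The constant rules above can be checked mechanically. The sketch below assumes the classic ARM encoding, where the 4-bit field is doubled and used to rotate the 8-bit value right; the helper names (`encode_immediate`, `how_to_load`) are invented for this example and are not part of any assembler.

```python
MASK32 = 0xFFFFFFFF

def rotate_left32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def encode_immediate(value):
    """Return (imm8, rot4) if value == imm8 rotated right by 2*rot4, else None."""
    value &= MASK32
    for rot4 in range(16):
        # Rotating left undoes the rotate right the CPU would apply.
        imm8 = rotate_left32(value, 2 * rot4)
        if imm8 <= 0xFF:
            return imm8, rot4
    return None

def how_to_load(value):
    """Pick a strategy for getting a 32-bit constant into a register."""
    value &= MASK32
    if encode_immediate(value) is not None:
        return "MOV"           # fits directly in the instruction
    if encode_immediate(~value & MASK32) is not None:
        return "MVN"           # the inverted value fits instead
    return "literal pool"      # store it in memory and load it
```

For instance, `how_to_load(0xFF000000)` returns `"MOV"`, `how_to_load(0xFFFFFFFF)` returns `"MVN"`, and `how_to_load(0x12345678)` falls back to `"literal pool"`.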
Memory Addresses
• Also makes finding the addresses of things in memory tricky
• Done by adding a constant to the PC
• Sometimes requires two or three instructions to calculate
• Assembler and compiler have pseudo-instructions to do it for you
Note: the PC reads as 8 bytes ahead of the current instruction
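The "8 bytes ahead" detail matters when working out the constant to add to the PC. A minimal sketch, assuming ARM (not Thumb) state and using a hypothetical helper name:

```python
def pc_relative_offset(instruction_address, target_address):
    """Constant to add to the PC so the result is target_address.

    Assumes ARM state, where reading the PC gives the address of the
    current instruction plus 8.
    """
    pc_as_read = instruction_address + 8
    return target_address - pc_as_read
```

So an instruction at 0x8000 that wants the address 0x8010 adds 8, not 16, to the PC. If the resulting offset does not fit the constant rules, more than one instruction is needed.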
ARM Speed
• Running at 1GHz means it’s going to be slow, right?
• Not necessarily…
• GHz-speed tells us ‘cycles per second’
• Actual speed depends on how many cycles an instruction takes
• ARM aims for one cycle per instruction
ARM Speed
• x86 instructions can take many cycles
• If an ARM instruction takes 1 cycle at 1GHz
• And an x86 instruction takes 3 cycles at 3GHz
• Which is faster?
• Speed is not entirely down to clock speed
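The arithmetic behind the question can be written out directly. This is a simplified model (it ignores pipelining, caches and how much work each instruction does), and the function name is made up for the example.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Idealised throughput: clock rate divided by cycles per instruction."""
    return clock_hz / cycles_per_instruction

arm = instructions_per_second(1_000_000_000, 1)   # 1 cycle at 1GHz
x86 = instructions_per_second(3_000_000_000, 3)   # 3 cycles at 3GHz
```

In this model both come out at the same one billion instructions per second, which is the point of the slide: the clock speed alone does not tell you which is faster.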
ARM Conditional Execution
• Often want to conditionally execute some code
• Traditional approach is to execute a compare
• Then Branch if the condition is met
• Branches are ‘expensive’
(lots of clock cycles)
ARM Conditional Execution
• ARM has another trick up its sleeve…
• Any instruction can be made conditional
• Removing need for an expensive branch
• Faster program, smaller footprint
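A well-known illustration of conditional execution is a greatest common divisor loop, which in ARM assembly needs only a compare, two conditional subtracts and one branch. The Python below is a model of that idea rather than real ARM code; the function name and the instruction sequence in the comments are illustrative assumptions.

```python
def gcd_predicated(a, b):
    """Model of: CMP a, b / SUBGT a, a, b / SUBLT b, b, a / BNE loop."""
    if a <= 0 or b <= 0:
        raise ValueError("both inputs must be positive")
    while True:
        # CMP sets the flags once; both subtracts test the same flags.
        greater, less = a > b, a < b
        if greater:        # SUBGT: executes only when a > b
            a -= b
        if less:           # SUBLT: executes only when a < b
            b -= a
        if not (greater or less):
            return a       # BNE falls through when the values were equal
```

Each pass through the loop contains a single branch (the loop itself). A version without conditional instructions would need extra branches to skip over whichever subtract does not apply.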
Barrel Shifter
• ARM has a barrel shifter that can be applied to an operand of most data-processing instructions
• Shift/Rotate the bits within the binary number
• Good way of multiplying or dividing by powers of two in some cases
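A small sketch of how shifts stand in for multiplication and division by powers of two. The helpers model 32-bit registers in Python; their names are invented for this example, and the assembly in the comment shows the kind of single instruction being modelled.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left: multiplies by 2**amount (modulo 2**32)."""
    return (value << amount) & MASK32

def lsr(value, amount):
    """Logical shift right: unsigned division by 2**amount."""
    return (value & MASK32) >> amount

def times_five(x):
    # Models ADD r0, r1, r1, LSL #2, i.e. x + (x * 4), in one instruction.
    return (x + lsl(x, 2)) & MASK32
```

Because the shift is folded into the add, multiplying by 5 here costs one instruction rather than a separate multiply.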
ARM and Thumb
• Every ARM instruction is 32 bits long
• Can take up a lot of memory
• Memory accesses take time
• Can slow things down
• Enter Thumb…
Thumb
• 16-bit version of the ARM instruction set
• Variable length instruction set
• Took the most popular ARM instructions used and encoded them as 16-bit values
• Why?
Why Thumb?
• Speed…
• ARM instructions require 32 bits to be read for every instruction
• Memory reads take time
• Not all memory is 32 bits wide…
• On narrower memory, it takes multiple reads to get an instruction
Why Thumb?
• By making the instructions smaller, you gain a speed up.
• 8-bit memory, two reads per instruction
• 16-bit memory, one read per instruction
• 32-bit memory, one read gives two instructions
• iPhone SDK generates Thumb by default
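The read counts above follow from dividing the instruction size by the memory width. A minimal sketch, assuming instructions are fetched sequentially and that a wide read can deliver more than one instruction (the function name is hypothetical):

```python
from fractions import Fraction

def reads_per_instruction(instruction_bits, memory_width_bits):
    """Average memory reads needed to fetch one instruction."""
    return Fraction(instruction_bits, memory_width_bits)

# 32-bit ARM instructions versus 16-bit Thumb instructions.
for width in (8, 16, 32):
    arm = reads_per_instruction(32, width)
    thumb = reads_per_instruction(16, width)
    print(f"{width}-bit memory: ARM {arm} reads, Thumb {thumb} reads")
```

On 32-bit memory the Thumb figure is 1/2, matching "one read gives two instructions", while on 8-bit memory ARM needs four reads to Thumb's two.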
Floating Point and Thumb
• Thumb doesn’t encode FPU instructions
• CPU would have to branch to ARM instructions to execute them
• This takes time
• FPU heavy code is better compiled to ARM, not Thumb
• Assuming the device has an FPU!
Compilers
• Most of the time, the compiler will take advantage of this
• Although you may have to manually switch it to compile ARM rather than Thumb
ARM big.LITTLE
• ARM's latest processor contains two cores
• One optimised for performance (big core)
• One optimised for energy efficiency (LITTLE core)
• The two cores are architecturally consistent
• System can switch between the two as appropriate for the task in hand...