ARM Processor
Why ARM?
As of 2007, about 98% of the more than one billion mobile phones sold each year use at least one ARM processor.
As of 2009, ARM processors account for approximately 90% of all embedded 32-bit RISC processors
source: http://en.wikipedia.org/wiki/ARM_architecture
History
ARM was developed at Acorn Computers Limited of Cambridge, England between 1983 and 1985
RISC concept introduced in 1980 at Stanford and Berkeley
ARM Limited founded in 1990
ARM Cores
Licensed to partners to develop and fabricate new micro-controllers
Soft-core
ARM Architecture
Based upon RISC Architecture with enhancements to meet requirements of embedded applications
A large uniform register file
Load-store architecture, where data processing operations operate on register contents only
Uniform and fixed length instructions
32-bit processor
Instructions are 32 bits long
Good Speed/Power Consumption Ratio
High Code Density
Enhancement to Basic RISC Features
Variable cycle execution for certain instructions
Load-store-multiple instructions
Inline barrel shifter leading to more complex instructions
Preprocessing one of the input registers before use
Thumb 16-bit instruction set
Code density improved by 30% over 32-bit instructions
Enhanced DSP instructions
Support fast 16x16 multiplier operations
Enhancement to Basic RISC Features
Auto-increment and auto-decrement addressing modes to optimize program loops
Load and Store Multiple instructions to maximize data throughput
Conditional Execution of instruction to maximize execution throughput
ARM Architecture Versions
Version 1 (1983-85)
26 bit addressing, no multiply or co-processor
Version 2
Includes 32-bit result multiply and co-processor support
Version 3
32 bit addressing
Version 4
Adds signed, unsigned half-word and signed byte load and store instructions
Version 4T
16-bit Thumb compressed form of instruction introduced
ARM Architecture Versions
Version 5T
Superset of 4T adding new instructions
Version 5TE
Adds signal processing instruction set extensions
Examples:
ARM 6: v3
ARM 7: v3
ARM7TDMI: v4T
StrongARM: v4
ARM 9E-S: v5TE
Overview: Core Data Path
Data items are placed in register file
No data processing instructions directly manipulate data in memory
Instructions typically use two source registers and a single result (destination) register
A barrel shifter on the data path can pre-process data before it enters the ALU
Increment/decrement logic can update register content for sequential access independent of the ALU
Basic ARM Organization
Registers
General purpose registers hold either data or an address
All registers are 32 bits wide
In user mode 16 data registers and 2 status registers are visible
Data registers: r0 to r15
Three registers r13, r14, r15 perform special functions
r13: stack pointer
r14: link register
r15: program counter
Registers (2)
Depending upon context, registers r13 and r14 can also be used as GPRs
Any instruction which uses r0 can equally be used with any other GPR (r1-r13) (orthogonal)
In addition, there are two status registers
CPSR: current program status register
SPSR: saved program status register
Status Registers
CPSR: monitors and controls internal operations
ARM Status Bits
Every arithmetic, logical, or shifting operation can set the CPSR condition bits (when the S suffix is used): N (negative), Z (zero), C (carry), V (overflow).
Example: -1 + 1 = 0: NZCV = 0110.
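The NZCV example can be checked with a small Python model. This is a sketch for illustration, not ARM code; the helper name adds_flags is an assumption and does not come from the slides. It mimics a flag-setting 32-bit addition and reports the four condition bits.

```python
MASK32 = 0xFFFFFFFF

def adds_flags(a, b):
    """Model a flag-setting 32-bit add; return (result, 'NZCV' bit string)."""
    a &= MASK32                      # view both operands as unsigned 32-bit words
    b &= MASK32
    full = a + b                     # unbounded sum, may need 33 bits
    result = full & MASK32
    n = result >> 31                 # N: bit 31 of the result
    z = int(result == 0)             # Z: result is zero
    c = int(full > MASK32)           # C: carry out of bit 31
    # V: operands have the same sign but the result's sign differs
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, f"{n}{z}{c}{v}"

print(adds_flags(-1, 1))             # (0, '0110')
```

Here -1 is the word 0xFFFFFFFF, so adding 1 wraps to zero (Z = 1) with a carry out (C = 1) but no signed overflow (V = 0).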
Processor Modes
Processor modes determine
Which registers are active, and
Access rights to the CPSR register itself
Each processor mode is either
Privileged: full read-write access to the CPSR
Non-privileged: read-only access to the control field of CPSR but read-write access to the condition flags
Processor Modes (2)
ARM has seven modes
Privileged: abort, fast interrupt request, interrupt request, supervisor, system and undefined
Non-privileged: user
User mode is used for programs and applications
Privileged Modes
Abort
When there is a failed attempt to access memory
Fast Interrupt Request (FIQ) and Interrupt Request (IRQ)
Correspond to the interrupt levels available on ARM
Supervisor mode
State after reset and generally the mode in which the OS kernel executes
Privileged Modes (2)
System mode
Special version of user mode that allows full read-write access of CPSR
Undefined
When processor encounters an undefined instruction
Processor Modes
Banked Registers
Register file contains 37 registers in all
20 registers are hidden from program at different times
These registers are called banked registers
Banked registers are available only when the processor is in a particular mode
Processor modes (other than system mode) have a set of associated banked registers that are a subset of the 16 registers
Each banked register maps one-to-one onto a user mode register
SPSR
Each privileged mode (except system mode) has associated with it a Saved Program Status Register or SPSR
This SPSR is used to save the state of the CPSR (Current Program Status Register) when the privileged mode is entered, so that the user state can be fully restored when the user process is resumed
Mode Changing
Mode changes by writing directly to CPSR or by hardware when the processor responds to an exception or interrupt
To return to user mode a special return instruction is used that instructs the core to restore the original CPSR and banked registers
Instructions
Instructions process data held in registers and access memory with load and store instructions
Classes of instructions:
Data processing
Branch instructions
Load-store instructions
Software interrupt instructions
Program status register instructions
Features of ARM instruction set
3-address data processing instructions
Conditional execution of every instruction
Load and store multiple registers
Shift, ALU operation in a single instruction
ARM data instructions
Basic format:
ADD r0,r1,r2
Computes r1+r2, stores in r0.
Immediate operand:
ADD r0,r1,#2
Computes r1+2, stores in r0.
Data Processing
Manipulate data within registers
MOVE instructions
Arithmetic instructions
Logical instructions
Comparison instructions
Suffix S on data processing instructions updates flags in CPSR
Data Processing Instructions
Operands are 32 bits wide; they come from registers or are specified as a literal (immediate operand) in the instruction itself
Second operand sent to ALU via barrel shifter
32-bit result placed in register; long multiply instruction produces 64 bit result
Move instruction
MOV Rd, N
Rd: destination register
N: can be an immediate value or source register
Example: MOV r7, r5
MVN Rd, N
Move into Rd the bitwise NOT (inverse) of the 32-bit value from source
Using Barrel Shifter
Enables shifting the 32-bit operand in one of the source registers left or right by a specific number of positions
Basic barrel shifter operations
Shift left, shift right, rotate right
Facilitates fast multiply, division and increases code density
Example: MOV r7, r5, LSL #2
Multiplies content of r5 by 4 and puts result in r7
Barrel Shift Instructions
LSL, LSR: logical shift left/right
Vacated bits are filled with zeroes.
ASL, ASR: arithmetic shift left/right
ASR fills vacated bits with copies of the sign bit; ASL is the same operation as LSL.
ROR: rotate right
RRX: rotate right extended with C
Performs a 33-bit rotate, with the C bit from the CPSR placed above the sign bit.
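The shift operations can be modelled in a few lines of Python. This is a sketch under stated assumptions: the function names are made up for illustration, shift amounts are taken to be in the range 0 to 31, and the carry-out behaviour of the real shifter is only modelled for RRX.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeroes enter from the right."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeroes enter from the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32                 # reinterpret the word as a negative integer
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry):
    """33-bit rotate right by one; returns (result, new carry)."""
    x &= MASK32
    return ((carry & 1) << 31) | (x >> 1), x & 1

print(hex(lsr(0x80000000, 4)))       # 0x8000000
print(hex(asr(0x80000000, 4)))       # 0xf8000000
```

The last two lines show the difference between the logical and arithmetic right shift on a word with the sign bit set.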
Arithmetic Instructions
Implement 32 bit addition and subtraction
3-operand form
Examples
SUB r0, r1, r2
Subtract value stored in r2 from that of r1 and store in r0
SUBS r1, r1, #1
Subtract 1 from r1, store result in r1 and update the CPSR condition flags (N, Z, C, V)
Arithmetic Instructions
ADD, ADC: add (with carry)
SUB, SBC: subtract (with carry)
RSB, RSC: reverse subtract (with carry)
MUL, MLA: multiply (and accumulate)
Multiply Instructions
Multiply contents of a pair of registers
Long multiply generates 64 bit result
Examples:
MUL r0, r1, r2
Contents of r1 and r2 multiplied and put in r0
UMULL r0, r1, r2, r3
Unsigned multiply of r2 and r3 with the 64 bit result stored in r0 (low word) and r1 (high word)
Number of cycles taken for execution of multiply instruction depends upon processor implementation
Multiply and Accumulate
Result of multiplication can be accumulated with content of another register
MLA Rd, Rm, Rs, Rn
Rd = (Rm * Rs) + Rn
UMLAL Rdlo, Rdhi, Rm, Rs
[Rdhi, Rdlo] = [Rdhi, Rdlo] + (Rm * Rs)
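The register-pair arithmetic of the multiply instructions can be sketched in Python. The function names mirror the mnemonics but are illustrative helpers, not an ARM API; results are returned as (low word, high word) to match the Rdlo, Rdhi operand order.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mla(rm, rs, rn):
    """Rd = (Rm * Rs) + Rn, truncated to 32 bits."""
    return (rm * rs + rn) & MASK32

def umull(rm, rs):
    """Unsigned 32x32 multiply; returns (RdLo, RdHi) of the 64-bit product."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32

def umlal(rdlo, rdhi, rm, rs):
    """[RdHi, RdLo] = [RdHi, RdLo] + (Rm * Rs); returns the new (RdLo, RdHi)."""
    acc = ((rdhi & MASK32) << 32) | (rdlo & MASK32)
    acc = (acc + (rm & MASK32) * (rs & MASK32)) & MASK64
    return acc & MASK32, acc >> 32

print(umull(0xFFFFFFFF, 2))          # (4294967294, 1)
```

The printed pair is 0xFFFFFFFE in the low word and 1 in the high word, i.e. the 64-bit product 0x1FFFFFFFE.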
Logical Instructions
Bit-wise logical operations on the two source registers
Operators: AND, ORR, EOR (Ex-OR), BIC (bit clear)
Example: BIC r0, r1, r2
r2 contains a binary pattern where every binary 1 in r2 clears the corresponding bit of r1; the result is stored in r0
Useful in manipulating status flags and interrupt masks
With Barrel Shifter
Use of barrel shifter with arithmetic and logical instructions increases the set of possible available operations
Example:
ADD r0, r1, r1, LSL #1
Register r1 is shifted to the left by 1, then added to r1, and the result (3 times r1) is stored in r0.
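The shift-and-add trick generalises to other small constants. The Python sketch below models a shifted second operand for ADD and RSB; the helper names are assumptions for illustration, and the RSB case (multiply by 7) is an added example that does not appear on the slides.

```python
MASK32 = 0xFFFFFFFF

def add_lsl(rn, rm, shift):
    """Model ADD Rd, Rn, Rm, LSL #shift  (Rd = Rn + shifted Rm)."""
    return (rn + (rm << shift)) & MASK32

def rsb_lsl(rn, rm, shift):
    """Model RSB Rd, Rn, Rm, LSL #shift  (Rd = shifted Rm - Rn)."""
    return ((rm << shift) - rn) & MASK32

r1 = 7
print(add_lsl(r1, r1, 1))            # 21, as ADD r0, r1, r1, LSL #1
print(rsb_lsl(r1, r1, 3))            # 49, as RSB r0, r1, r1, LSL #3
```

Both cases replace a multiply by a single data processing instruction, which is the code density and speed benefit the slide refers to.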
Compare Instructions
Enable comparison of 32 bit values
Update CPSR flags but do not affect other registers
Examples
CMP r0, r9
Flags set as a result of r0 - r9
TEQ r0, r9
Flags set as a result of r0 ex-or r9
TST r0, r9
Flags set as a result of r0 & r9
Compare Instructions
CMP: compare
TST: bit-wise test
TEQ: XOR
These instructions set only the NZCV bits of CPSR.
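A Python sketch of how CMP derives the flags from a subtraction whose result is discarded. The name cmp_flags is hypothetical. One detail worth noting is that on ARM the C flag after a subtraction means "no borrow", so it is set when the first operand is greater than or equal to the second as unsigned values.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Model CMP a, b: compute a - b, keep only the 'NZCV' flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                 # N: result looks negative
    z = int(result == 0)             # Z: operands are equal
    c = int(a >= b)                  # C: no borrow occurred (unsigned a >= b)
    # V: operands differ in sign and the result's sign differs from a's
    v = (((a ^ b) & (a ^ result)) >> 31) & 1
    return f"{n}{z}{c}{v}"

print(cmp_flags(5, 5))               # 0110
print(cmp_flags(3, 5))               # 1000
```

No register is written; only the flag string changes, which matches the description of the compare instructions.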
Load-Store Instructions
Transfer data between memory and processor registers
Single register transfer
Data types supported are signed and unsigned words (32 bits), half-words, bytes
Multiple-register transfer
Transfer multiple registers between memory and the processor in a single instruction
Swap
Swaps content of a memory location with the contents of a register
Single Transfer Instructions
Load and store data on a boundary alignment
LDR, LDRH, LDRB: Load (word, half-word, byte)
STR, STRH, STRB: Store (word, half-word, byte)
Supports different addressing modes:
3 primary addressing modes
Preindex with writeback, Preindex, Postindex
About 9 derived addressing modes
Immediate, Register, Scaled register, …
Addressing Modes (1)
Preindex with writeback
LDR r0, [r1, #4]!
Updates the address base register with new address
Addressing Modes (2)
Preindex (Immediate Offset)
LDR r0, [r1, #4]
12-bit offset added to the base register
Addressing Modes (3)
Postindex
LDR r0, [r1], #4
Updates the address register after address is used
Example (1)
Initial:
r0 = 0x00000000
r1 = 0x00009000
mem32 [0x00009000] = 0x01010101
mem32 [0x00009004] = 0x02020202
Preindexing with writeback: LDR r0, [r1, #4]!
r0 = 0x02020202
r1 = 0x00009004
Preindexing: LDR r0, [r1, #4]
r0 = 0x02020202
r1 = 0x00009000
Example (2)
Initial:
r0 = 0x00000000
r1 = 0x00009000
mem32 [0x00009000] = 0x01010101
mem32 [0x00009004] = 0x02020202
Postindexing: LDR r0, [r1], #4
r0 = 0x01010101
r1 = 0x00009004
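The three primary addressing modes can be replayed with a small Python model using the same initial register and memory values. This is a sketch: registers and memory are plain dictionaries, and the mode names passed to the hypothetical ldr helper are labels chosen for illustration.

```python
def ldr(regs, mem, rd, rn, offset, mode):
    """Model LDR rd with an immediate offset in one of three addressing modes."""
    base = regs[rn]
    if mode == "preindex":               # LDR rd, [rn, #offset]
        addr = base + offset
    elif mode == "preindex_writeback":   # LDR rd, [rn, #offset]!
        addr = base + offset
        regs[rn] = addr                  # base register is updated
    elif mode == "postindex":            # LDR rd, [rn], #offset
        addr = base                      # old base is used for the access
        regs[rn] = base + offset         # then the base register is updated
    else:
        raise ValueError(mode)
    regs[rd] = mem[addr]

def run(mode):
    """Start from the initial state of the examples and apply one load."""
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    ldr(regs, mem, "r0", "r1", 4, mode)
    return regs["r0"], regs["r1"]

for mode in ("preindex_writeback", "preindex", "postindex"):
    r0, r1 = run(mode)
    print(f"{mode}: r0 = 0x{r0:08x}, r1 = 0x{r1:08x}")
```

The printed values agree with the results listed in Example (1) and Example (2).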
Derived Addressing Modes
Register indirect: LDR r0, [r1]
Register operation: LDR r0, [r1, -r2]
Calculated address uses base register and another register
Scaled: LDR r0, [r1, r2, LSL #2]
Address is calculated using the base address register and a barrel shift operation
Example: C assignments
C:
x = (a + b) - c;
Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of c
C assignment, cont’d.
SUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x
Example: C assignment
C:
y = a*(b+c);
Assembler:
ADR r4,b ; get address for b
LDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
C assignment, cont’d.
MUL r2,r0,r2 ; compute final value for y (Rd and Rm kept distinct for older cores)
ADR r4,y ; get address for y
STR r2,[r4] ; store y
Example: C assignment
C:
z = (a << 2) | (b & 15);
Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL #2 ; perform shift
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR
C assignment, cont’d.
ADR r4,z ; get address for z
STR r1,[r4] ; store value for z
Multiple Register Transfer
Load-store multiple instructions transfer multiple register contents between memory and the processor in a single instruction
More efficient for moving blocks of memory and saving and restoring context and stack
These instructions can increase interrupt latency
Usually instruction executions are not interrupted by ARM
On ARM 7: 2 + Nt cycles
N: number of registers to load
t: number of cycles required for each sequential access to memory.
Multiple Byte Load-Store
Any subset of current bank of registers can be transferred to memory or fetched from memory
LDM
STM
Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{^}
The base register Rn determines source or destination address
Load/Store Multiple Addressing
SWAP Instruction
Special case of load store instruction
Swap instructions:
SWP: swap a word between memory and register
SWPB: swap a byte between memory and register
SWAP Instruction
Useful for implementing synchronization primitives like semaphores
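A rough Python sketch of how a swap can serve as a lock primitive. The convention that 0 means free and 1 means taken, and the helper names, are assumptions for illustration. The model is sequential, so it only shows the logic; on real hardware the point of SWP is that the read and the write happen as one indivisible bus operation.

```python
def swp(mem, addr, value):
    """Model SWP: store value at addr and return the word that was there."""
    old = mem[addr]
    mem[addr] = value
    return old

def try_acquire(mem, lock_addr):
    """Write 1 (taken) into the lock; we own it only if it was 0 (free) before."""
    return swp(mem, lock_addr, 1) == 0

def release(mem, lock_addr):
    """Mark the lock as free again with an ordinary store."""
    mem[lock_addr] = 0

mem = {0x100: 0}                     # hypothetical lock word at address 0x100
print(try_acquire(mem, 0x100))       # True: lock was free
print(try_acquire(mem, 0x100))       # False: already taken
```

A second caller keeps seeing 1 come back from the swap until the owner releases the lock.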
Control Flow Instructions
Branch Instructions
Conditional Branches
Conditional Execution
Branch and Link Instructions
Subroutine Return Instructions
Branch Instruction
Branch instruction: B label
Example: B forward
Address label is stored in the instruction as a signed pc-relative offset
Conditional Branch: B<cond> label
Example: BNE loop
Branch has a condition associated with it and is executed if the condition codes have the correct value
Example: Block memory copy
Loop LDMIA r9!, {r0-r7}
STMIA r10!, {r0-r7}
CMP r9, r11
BNE Loop
r9 points to source of data, r10 points to start of destination data, r11 points to end of the source
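The copy loop can be traced with a Python model in which memory is a dictionary from word address to value. This is a sketch under the same assumption the assembly makes implicitly: the source length is a non-zero multiple of 32 bytes (8 words), otherwise r9 never equals r11. The function name block_copy is hypothetical.

```python
def block_copy(mem, src, dst, end, words_per_transfer=8):
    """Copy words from [src, end) to dst, 8 words per iteration."""
    r9, r10, r11 = src, dst, end
    step = 4 * words_per_transfer
    while True:
        # LDMIA r9!, {r0-r7}: load 8 words, then advance r9 past them
        block = [mem[r9 + 4 * i] for i in range(words_per_transfer)]
        r9 += step
        # STMIA r10!, {r0-r7}: store 8 words, then advance r10 past them
        for i, word in enumerate(block):
            mem[r10 + 4 * i] = word
        r10 += step
        # CMP r9, r11 / BNE Loop: stop once the source pointer reaches the end
        if r9 == r11:
            break
    return r9, r10

mem = {0x9000 + 4 * i: i for i in range(16)}      # 16 source words
final = block_copy(mem, 0x9000, 0xA000, 0x9040)
print([hex(p) for p in final])                    # ['0x9040', '0xa040']
```

Two iterations move the 16 words, and both pointers end up just past their blocks because of the writeback (the ! suffix).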
Conditional Execution
An unusual feature of ARM instruction set is that conditional execution applies not only to branches but to all ARM instructions
Example: ADDEQ r0, r1, r2
Instruction will only be executed when the zero flag is set to 1
Advantages
Reduces the number of branches
Reduces the number of pipeline flushes
Improves performance of the code
Increases code density
Rule of thumb: Whenever the conditional sequence is 3 instructions or fewer, exploit conditional execution rather than a branch (smaller and faster)
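How a condition suffix is evaluated against the NZCV flags can be sketched in Python. The table lists the standard ARM condition codes; the function names and the dictionary-based register file are assumptions made for illustration.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if the condition suffix holds for the given NZCV flags."""
    checks = {
        "EQ": z == 1,            "NE": z == 0,
        "CS": c == 1,            "CC": c == 0,
        "MI": n == 1,            "PL": n == 0,
        "VS": v == 1,            "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v,            "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,              # always
    }
    return checks[cond]

def addeq(regs, flags):
    """Model ADDEQ r0, r1, r2: the add takes effect only when Z is set."""
    if condition_passed("EQ", *flags):
        regs["r0"] = (regs["r1"] + regs["r2"]) & 0xFFFFFFFF
    return regs

print(addeq({"r0": 0, "r1": 2, "r2": 3}, (0, 1, 1, 0))["r0"])   # 5
print(addeq({"r0": 0, "r1": 2, "r2": 3}, (0, 0, 0, 0))["r0"])   # 0
```

When the condition fails the instruction behaves like a no-op, which is why a short conditional sequence can replace a branch without flushing the pipeline.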
Branch & Link Instruction
Perform a branch, save the address following the branch in the link register, r14
Example: BL subroutine
For nested subroutines, push r14 and any work registers that must be saved onto a stack in memory
Example:
BL sub1
…….
STMFD r13!, {r0-r2, r14}
BL sub2
Subroutine return instructions
No specific instructions
Example (1):
sub ……
MOV PC, r14
Example (2): when return address has been pushed to stack
sub2 …..
LDMFD r13!, {r0-r2, PC}
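The push and pop pair can be modelled in Python as a full descending stack. This is a sketch that assumes the pop restores the same slots that STMFD r13!, {r0-r2, r14} saved, with the saved link register loaded into the program counter (r15). The helper names and the addresses in the demo are hypothetical.

```python
def stmfd(mem, regs, names):
    """Model STMFD r13!, {names}: full descending push with writeback.
    names must be given in ascending register order."""
    sp = regs["r13"] - 4 * len(names)      # make room below the current top
    for i, name in enumerate(names):
        mem[sp + 4 * i] = regs[name]       # lowest register at lowest address
    regs["r13"] = sp

def ldmfd(mem, regs, names):
    """Model LDMFD r13!, {names}: pop in the same layout, with writeback."""
    sp = regs["r13"]
    for i, name in enumerate(names):
        regs[name] = mem[sp + 4 * i]
    regs["r13"] = sp + 4 * len(names)

regs = {"r0": 1, "r1": 2, "r2": 3, "r13": 0x10000, "r14": 0x8004, "r15": 0}
mem = {}
stmfd(mem, regs, ["r0", "r1", "r2", "r14"])        # entry: save work regs and return address
regs.update(r0=99, r1=99, r2=99, r14=0x9000)       # body clobbers them (e.g. a nested BL)
ldmfd(mem, regs, ["r0", "r1", "r2", "r15"])        # exit: restore and return in one step
print(hex(regs["r15"]), hex(regs["r13"]))          # 0x8004 0x10000
```

Loading the saved r14 value straight into r15 is what makes the load-multiple double as the return instruction.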
Thumb
Thumb encodes a subset of the 32 bit instruction set into a 16-bit subspace
Thumb has higher performance than ARM on a processor with a 16-bit data bus
Thumb has higher code density
For memory constrained embedded systems
On average, a Thumb implementation takes 30% less memory than the equivalent ARM implementation.
(source: ARM System Developer’s Guide)
Thumb Instruction Decoding
Each Thumb instruction is related to a 32-bit ARM instruction.
ARMv5E Extensions
Extensions to facilitate signal processing operations
Supports
Signed multiply accumulate instruction
Greater flexibility and efficiency when manipulating 16 bit values for applications such as 16 bit digital audio processing.
Summary
We have studied the instruction set of ARM processors
We discussed the use of barrel shifters
We studied various addressing modes
We have examined Thumb mode of operation