1824 arm integer cores - university of...
Post on 12-May-2020
ARM integer cores – v5 – 1
The University of Manchester (MANCHESTER 1824)
© 2005 PEVEIT Unit – ARM System Design
ARM integer cores
❏ Outline:
❍ the ARM 3-stage pipeline
❍ the ARM7TDMI core
❍ the ARM 5-stage pipeline
❍ the ARM9TDMI core
❍ the ARM10TDMI core
❍ StrongARM & XScale
❍ the ARM11 core
❍ Cortex
☞ hands-on: system software – interrupts
The 3-stage ARM pipeline
(The original architecture, which has affected the instruction set.)
❏ fetch
❍ the instruction is fetched from memory
❏ decode
❍ the instruction is decoded and the datapath control signals prepared for the next cycle
❏ execute
❍ the operands are read from the register bank, shifted, combined in the ALU and the result written back
The 3-stage ARM pipeline
❏ Single cycle instructions
❍ complete at a rate of one per clock cycle
❍ intended to use (a single) memory efficiently
[Figure: pipeline timing, instruction against time]
instruction 1:  fetch   decode  execute
instruction 2:          fetch   decode  execute
instruction 3:                  fetch   decode  execute
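The overlap on this slide can be sketched in a few lines of Python. This is only an illustrative model of an ideal 3-stage pipeline in which every stage takes exactly one cycle; the function names `pipeline_schedule` and `total_cycles` are made up for this sketch and do not come from the slides.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(n_instructions):
    """Return {instruction number: {stage: cycle}} for an ideal 3-stage pipeline."""
    schedule = {}
    for i in range(n_instructions):
        # Instruction i + 1 is fetched in cycle i + 1; each later stage
        # follows one cycle after the previous one.
        schedule[i + 1] = {stage: i + 1 + s for s, stage in enumerate(STAGES)}
    return schedule


def total_cycles(n_instructions):
    """Cycles until the last instruction has left the execute stage."""
    if n_instructions == 0:
        return 0
    # Two cycles to fill the pipeline, then one instruction completes per cycle.
    return n_instructions + len(STAGES) - 1


if __name__ == "__main__":
    for number, stages in pipeline_schedule(3).items():
        print(number, stages)
    print("total cycles:", total_cycles(3))
```

Under these assumptions the three instructions in the figure finish in five cycles: two to fill the pipeline, then one completion per cycle.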
The 3-stage ARM pipeline
❏ More complex instructions:
❍ ‘STR’ causes a stall while transfer occurs
[Figure: pipeline timing, instruction against time]
instruction 1:  fetch ADD  decode  execute
instruction 2:  fetch STR  decode  calc. addr.  data xfer
instruction 3:  fetch ADD  decode  (wait)  execute
instruction 4:  fetch ADD  (wait)  decode  execute
instruction 5:  fetch ADD  decode  execute
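The cost of the STR stall can be estimated with a small model. This is a simplified sketch, assuming that only one instruction occupies the execute stage at a time and that STR holds it for two cycles (address calculation, then data transfer). The table `EXECUTE_CYCLES` and the function `execute_slots` are hypothetical names used only here.

```python
# Cycles each instruction spends in the execute stage (assumed for this sketch).
EXECUTE_CYCLES = {"ADD": 1, "STR": 2}  # STR: calc. addr. + data xfer


def execute_slots(program):
    """Return [(mnemonic, first_cycle, last_cycle)] for the execute stage."""
    slots = []
    next_free = 3  # first instruction: fetch in cycle 1, decode in 2, execute in 3
    for mnemonic in program:
        cycles = EXECUTE_CYCLES[mnemonic]
        slots.append((mnemonic, next_free, next_free + cycles - 1))
        next_free += cycles  # following instructions wait until execute is free
    return slots


if __name__ == "__main__":
    for slot in execute_slots(["ADD", "STR", "ADD", "ADD", "ADD"]):
        print(slot)
```

For the five-instruction sequence in the figure, the model puts the last ADD in execute in cycle 8, one cycle later than it would be without the store.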
The 3-stage ARM pipeline
❏ PC behaviour
❍ r15 increments twice before an instruction executes
– due to pipeline operation
❍ therefore r15 = address of instruction + 8
– (+12 if used after first cycle, though this is architecturally undefined)
– in Thumb code the offset is +4
❍ normally the assembler makes the necessary adjustments, e.g. in branches
This behaviour is consistent for all ARMs, although the pipeline structures may vary.
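As an illustration of the adjustment the assembler makes, the sketch below computes the value an instruction sees in r15 and the 24-bit word offset field of an ARM-state `B` instruction, which is measured from the instruction address plus 8. The function names are hypothetical and chosen only for this example.

```python
def visible_pc(instr_addr, thumb=False):
    """Value read from r15 by the instruction at instr_addr."""
    return (instr_addr + (4 if thumb else 8)) & 0xFFFFFFFF


def arm_branch_offset_field(instr_addr, target_addr):
    """24-bit offset field for an ARM-state B/BL at instr_addr."""
    byte_offset = target_addr - (instr_addr + 8)  # relative to r15, not the instruction
    if byte_offset % 4 != 0:
        raise ValueError("ARM branch targets must be word aligned")
    word_offset = byte_offset // 4
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target out of range for a 24-bit word offset")
    return word_offset & 0xFFFFFF  # two's complement, truncated to 24 bits


if __name__ == "__main__":
    print(hex(visible_pc(0x8000)))                       # ARM state
    print(hex(visible_pc(0x8000, thumb=True)))           # Thumb state
    print(hex(arm_branch_offset_field(0x8000, 0x8000)))  # branch to self
```

A branch to its own address therefore carries an offset of -2 words (field value 0xFFFFFE), because r15 is already 8 bytes ahead when the branch executes.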
3-stage ARM organization
❏ ARM components:
❍ register bank
– 2 read ports, 1 write port
– plus additional read and write ports for r15
❍ barrel shifter
❍ ALU
❍ address register and incrementer
❍ memory data registers
❍ instruction decoder and control
3-stage ARM organization
❍ Separate address incrementer
❍ Two register read ports
❍ Barrel shifter in series with ALU
[Figure: 3-stage ARM datapath. Blocks: address register, incrementer, register bank, multiply register, barrel shifter, ALU, data out register, data in register, instruction decode and control. Buses and ports: A[31:0], D[31:0], A bus, B bus, ALU bus, PC, control.]
Why does this matter?
The internal structure of the processor can result in pipeline stalls (⇒ loss of performance) in some circumstances.
❏ Instruction dependencies
❍ pipeline needs flushing and refilling when a branch is taken
❍ alleviated by branch prediction
❏ Data dependencies
❍ instruction may have to wait for the result from a previous one
❍ alleviated by forwarding
❍ particular problem with loads (memory has a long latency)
– code reordering can help
The ARM7TDMI
❏ The ARM7TDMI is …
❍ an ARM7 3-stage pipeline core, with
❍ T - support for the Thumb instruction set
❍ D - support for debug
– the processor can stop on a debug event
❍ M - support for long multiplies
❍ I - the EmbeddedICE macrocell
– provides breakpoint and watchpoint hardware
• described later
ARM7TDMI organization
❍ Scan chains provide access to signals for debugging
[Figure: ARM7TDMI organization. Blocks: processor core, EmbeddedICE, JTAG TAP controller, bus splitter. Scan chains: scan chain 0, scan chain 1, scan chain 2. Signals: A[31:0], D[31:0], Din[31:0], Dout[31:0], extern0, extern1, other signals, JTAG control.]
The ARM7TDMI core interface signals
[Figure: ARM7TDMI core with its interface signals, grouped by function]
clock control: mclk, wait, eclk
configuration: bigend
interrupts: irq, fiq, isync
initialization: reset
bus control: enin, enout, enouti, abe, ale, ape, dbe, tbe, busen, highz, busdis, ecapclk
debug: dbgrq, breakpt, dbgack, exec, extern1, extern0, dbgen, rangeout0, rangeout1, dbgrqi, commrx, commtx
coprocessor interface: opc, cpi, cpa, cpb
power: Vdd, Vss
memory interface: A[31:0], Din[31:0], Dout[31:0], D[31:0], bl[3:0], r/w, mas[1:0], mreq, seq, lock
MMU interface: trans, mode[4:0], abort
state: Tbit
TAP information: tapsm[3:0], ir[3:0], tdoen, tck1, tck0, screg[3:0]
boundary scan extension: drivebs, ecapclkbs, icapclkbs, highz, pclkbs, rstclkbs, sdinbs, sdoutbs, shclkbs, shclk2bs
JTAG controls: TRST, TCK, TMS, TDI, TDO
ARM7TDMI core interface signals
❏ Memory interface
❍ ~mreq - memory request
❍ seq - sequential address
– together these signals indicate the sort of bus cycle which will happen next
– they are pipelined ahead to aid memory design
~mreq  seq  Cycle  Use
0      0    N      Non-sequential memory access
0      1    S      Sequential memory access
1      0    I      Internal cycle - bus and memory inactive
1      1    C      Coprocessor register transfer - memory inactive
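A memory controller model could decode these two signals with a simple lookup. The sketch below is illustrative only: the levels are taken as they appear in the table (so 0 in the first column means a memory request is being made), and the names `CYCLE_TYPES`, `decode_cycle` and `memory_active` are invented for this example.

```python
# (mreq level, seq level) -> (cycle code, use), exactly as listed in the table.
CYCLE_TYPES = {
    (0, 0): ("N", "non-sequential memory access"),
    (0, 1): ("S", "sequential memory access"),
    (1, 0): ("I", "internal cycle, bus and memory inactive"),
    (1, 1): ("C", "coprocessor register transfer, memory inactive"),
}


def decode_cycle(mreq, seq):
    """Return (code, description) for the bus cycle that will happen next."""
    return CYCLE_TYPES[(mreq, seq)]


def memory_active(mreq, seq):
    """True when the next cycle is a memory access (N or S)."""
    return decode_cycle(mreq, seq)[0] in ("N", "S")


if __name__ == "__main__":
    for levels in sorted(CYCLE_TYPES):
        print(levels, decode_cycle(*levels), memory_active(*levels))
```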
ARM7TDMI core interface signals
❏ Notes:
❍ request for memory is in clock cycle before transfer occurs
❍ wait state insertion is possible
[Figure: timing diagram showing mclk, A[31:0] & control, mreq and seq, Din[31:0], Dout[31:0] and abort.]
ARM7TDMI
❏ ARM7TDMI debug support
❍ the EmbeddedICE module
❍ supports breakpoints and watchpoints
– controlled via the JTAG test access port
– EmbeddedICE & JTAG are covered later
❏ ARM7TDMI characteristics:
Process       0.35 µm   Transistors  74,209        MIPS    60
Metal layers  3         Core area    2.1 mm²       Power   87 mW
Vdd           3.3 V     Clock        0 to 66 MHz   MIPS/W  690
A ‘modern’ ARM7 (0.18 µm) is < ½ mm²
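The MIPS/W figure in the characteristics table follows directly from the MIPS and power entries. A minimal sketch of the arithmetic, with `mips_per_watt` as a hypothetical helper name:

```python
def mips_per_watt(mips, power_mw):
    """Power efficiency in MIPS/W from MIPS and power in milliwatts."""
    return mips / (power_mw / 1000.0)


if __name__ == "__main__":
    # ARM7TDMI figures from the table: 60 MIPS at 87 mW.
    print(round(mips_per_watt(60, 87)))
```

With the table's values, 60 MIPS at 87 mW works out to roughly 690 MIPS/W, matching the quoted figure.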
Getting higher performance
❏ Increase the clock rate
❍ the clock rate is limited by the slowest pipeline stage
– decrease the logic complexity per stage
– increase the pipeline depth (number of stages)
❏ Improve the CPI (clocks per instruction)
❍ fewer wasted cycles
– better memory bandwidth
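The two levers on this slide, clock rate and CPI, combine in the usual relation: execution time = instruction count × CPI ÷ clock frequency. The sketch below shows that relation only; the instruction count, CPI values and clock rates in the example are hypothetical numbers chosen for illustration and are not measurements of any ARM core.

```python
def execution_time_s(instructions, cpi, clock_hz):
    """Execution time in seconds for a given instruction count, CPI and clock."""
    return instructions * cpi / clock_hz


if __name__ == "__main__":
    n = 1_000_000  # hypothetical instruction count
    baseline = execution_time_s(n, cpi=2.0, clock_hz=50e6)
    faster_clock = execution_time_s(n, cpi=2.0, clock_hz=100e6)
    better_cpi = execution_time_s(n, cpi=1.5, clock_hz=50e6)
    print(baseline, faster_clock, better_cpi)
```

Doubling the clock halves the run time only if the CPI does not get worse, which is why deeper pipelines need the stall-reducing techniques covered in the following slides.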
The 5-stage ARM pipeline
❏ Fetch
❏ Decode
❍ instruction decode and register read
❏ Execute
❍ shift and ALU
❏ Memory
❍ data memory access
❏ Write-back
The 5-stage ARM pipeline
❏ Reducing the CPI
❍ ARM7 uses the memory on nearly every clock cycle
– for either instruction fetch or data transfer
❍ therefore a reduced CPI requires more than one memory access per clock cycle
❏ Possible solutions are:
❍ separate instruction and data memories
❍ double-bandwidth memory (e.g. ARM8)
ARM9TDMI
❏ The ARM9TDMI is …
❍ a ‘classic’ Harvard architecture 5-stage pipeline
– separate instruction and data memory ports
❍ with full support for Thumb and EmbeddedICE debug
❍ aimed at significantly higher performance than the ARM7TDMI
– enhanced pipeline (then) operated at 100-200 MHz
– (now) up to 250 MHz
ARM9TDMI pipeline
❍ Thumb instructions are decoded directly
[Figure: pipeline comparison]
ARM7TDMI:  Fetch (instruction fetch) | Decode (Thumb decompress, ARM decode, reg. read) | Execute (shift/ALU, reg. write)
ARM9TDMI:  Fetch (instruction fetch) | Decode (decode, reg. read) | Execute (shift/ALU) | Memory (data mem. access) | Write (reg. write)
ARM9TDMI pipeline
❍ Pipeline introduces more dependencies
❍ alleviated with forwarding paths
[Figure: ARM9TDMI 5-stage datapath. Stages: FETCH, DECODE, EXECUTE, BUFFER/DATA, WRITE. Blocks: I cache, I decode, register read, immediate, reg shift, shifter, ALU, mul, forwarding, D-cache, byte repl., rot/sgnex, register write. PC logic: next PC, two +4 incrementers, mux, PC+4, PC+8 (r15), address with pre-index and post-index. Labelled paths: B, BL, MOV pc, SUB pc; LDR pc; LDM/STM.]
ARM9TDMI
❏ EmbeddedICE
❍ as ARM7TDMI, plus:
– hardware single-stepping
– breakpoints on exceptions
❏ On-chip coprocessor support
❍ for floating-point, DSP, and so on
Process       0.25 µm   Transistors  111,000     MIPS    220
Metal layers  3         Core area    2.1 mm²     Power   150 mW
Vdd           2.5 V     Clock        0-200 MHz   MIPS/W  1,500
ARM10TDMI
❏ The ARM10TDMI is …
❍ aimed at significantly higher performance than the ARM9TDMI
❍ achieved through use of:
– higher clock rate
– 64-bit I- and D-memory buses
– branch prediction
– hit-under-miss D-memory interface
ARM10TDMI pipeline
❏ 6-stage pipeline
❍ Additional time allowed for
– I- and D-memory accesses
– instruction decode
[Figure: ARM10TDMI pipeline. Stages: Fetch, Issue, Decode, Execute, Memory, Write. Blocks shown: instruction fetch, decode, decode, reg. read, shift/ALU, addr. calc., multiply, multiplier partial add, data mem. access, data write, reg. write.]
StrongARM & XScale
❏ Most ARM implementations are designed by ARM Ltd.
❏ Two notable exceptions:
❍ StrongARM
– designed by Digital, later bought by Intel
– now largely obsolete
❍ XScale
– designed by Intel
– current
– up to 800 MHz
Both of these were designed primarily for high performance
Used for proprietary devices
XScale™ pipeline
❍ another deep pipeline
❍ (unpredicted) branches are expensive
❍ data (memory) dependencies cause stalls
[Figure: XScale pipeline. Blocks shown: BTB, fetch, fetch2, decode, read/shift, ALU exec., state exec., ALU writeback, data mem. access, data writeback, multiply accumulate, early termination, MAC writeback.]
ARM11
❏ Most recent available ARM core (from ARM Ltd.)
❏ process portable
❏ high speed
❍ faster clock (~550 MHz)
❍ deeper pipeline
– also to accommodate DSP operations
❍ improved branch prediction
ARM11 pipeline
❏ 8-stage pipeline
❍ Parallel ALU and data access (especially during LDM/STM)
❍ ‘Hit under Miss’ in ‘data1’ alleviates cache miss stalls
[Figure: ARM11 pipeline]
Common stages:  Fetch 1 (instruction fetch) | Fetch 2 (branch prediction) | Decode (instruction decode) | Issue (register read)
ALU path:       shift | ALU | Sat. (saturation)
MAC path:       MAC1 | MAC2 | MAC3 (multiply accumulate)
Data path:      addr. (address generation) | data1 | data2 (data mem. access)
Final stage:    Writeback (register write)
ARM11 branch prediction
❍ Two-level dynamic branch prediction
– constant offset branches
– 128 entry, direct mapped (addr[9:3])
– cost: 1 or 0 cycles if successful (taken/not taken)
❍ Static branch prediction
– constant offset branches …
– … that miss the dynamic predictor
– cost: 4 cycles if successful
❍ Return stack
– three entries
– Pushes on BL or BLX (inc. BLX Rn)
– Pops on: { BX lr; MOV pc, lr; LDR pc, [sp], #n; LDMIA sp!, {…,pc} }
– cost: 4 cycles if successful
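The "128 entry, direct mapped (addr[9:3])" note means that seven address bits select the predictor entry. The following sketch shows how such an index could be extracted; `predictor_index` is a hypothetical name, and the code models only the indexing, not the prediction state or tags.

```python
INDEX_LOW_BIT = 3   # addr[9:3]: lowest bit used for the index
INDEX_BITS = 7      # bits 9..3 inclusive give 2**7 = 128 entries


def predictor_index(addr):
    """Entry number (0..127) selected by branch address bits [9:3]."""
    return (addr >> INDEX_LOW_BIT) & ((1 << INDEX_BITS) - 1)


if __name__ == "__main__":
    for addr in (0x0000, 0x0008, 0x03F8, 0x0400):
        print(hex(addr), predictor_index(addr))
```

Because the table is direct mapped, two branches whose addresses differ by a multiple of 1 KB (for example 0x0000 and 0x0400) select the same entry and can evict each other.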
ARM11 dependencies
❍ Data operations forward results
❍ Load operations impose an extra two cycle penalty (cache hit)
❍ Memory operations require their address registers early
Thus:
ADD r2, r1, r0 ; produce r2
ADD r4, r3, r2 ; consume r2 - no stall
LDR r2, [r1]   ; load r2
ADD r4, r3, r2 ; lose 2 cycles waiting
ADD r2, r1, r0 ; produce r2
LDR r4, [r2]   ; lose one cycle
LDR r2, [r1]   ;
LDR r4, [r2]   ; lose 3 cycles in total
– a few other dependencies may be seen from the pipeline figure
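The four cases above fit a simple additive rule: a load as producer costs two cycles, and a memory operation that consumes the value as its address register costs one more. The sketch below encodes that rule as an assumption that reproduces the slide's four examples; it is not a full ARM11 timing model, and `stall_cycles` is a hypothetical name.

```python
LOAD_RESULT_PENALTY = 2    # load result arrives two cycles late (cache hit)
EARLY_ADDRESS_PENALTY = 1  # memory operations need their address register early


def stall_cycles(producer, consumer):
    """Cycles lost when `consumer` immediately uses the result of `producer`.

    Both arguments are "alu" or "load". A "load" consumer is assumed to use
    the value as its address register; an "alu" consumer uses it as an operand.
    """
    penalty = 0
    if producer == "load":
        penalty += LOAD_RESULT_PENALTY
    if consumer == "load":
        penalty += EARLY_ADDRESS_PENALTY
    return penalty


if __name__ == "__main__":
    for pair in (("alu", "alu"), ("load", "alu"), ("alu", "load"), ("load", "load")):
        print(pair, stall_cycles(*pair))
```

A scheduler can use a table like this to decide when it is worth moving an independent instruction between a load and the first use of its result.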
Cortex™
❏ Three ‘flavours’:
❍ Cortex-A Series
– applications processors
– ARM, Thumb and Thumb-2 instruction sets
❍ Cortex-R Series
– embedded processors for real-time systems
– ARM, Thumb and Thumb-2 instruction sets
❍ Cortex-M Series
– deeply embedded/cost sensitive processors
– Thumb-2 instruction set only
Cortex-M3
❏ First of the line
❍ Thumb-2 only
❍ new design
– 3-stage pipeline
– Harvard core
❍ small core
– ~½ mm² (excluding cache)
❍ performance (0.18 µm):
– ~100 MHz
– ~8000 MIPS/W
Hands-on: system software – interrupts
❏ Look further into ARM system software issues
❍ Write an interrupt handler
❍ Use it to trigger a context switch
☞ Follow the ‘Hands-on’ instructions