nc state university center for embedded systems research (cesr) electrical & computer...
TRANSCRIPT
NC STATE UNIVERSITY
Center for Embedded Systems Research (CESR)
Electrical & Computer EngineeringNorth Carolina State University
Ali El-Haj-Mahmoud and Eric Rotenberg
Safely Exploiting Multithreaded Processors to Tolerate Memory Latency
in Real-Time Systems
CASES 2004 2El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Embedded Processor Trends
• More demanding applications and user expectations• Higher frequency
– ARM-11 (0.13µ): 500 MHz
– ARM-11 (0.10µ): 1 GHz
• Processor-memory speed gap
CASES 2004 3El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Memory Wall
1000
10
10000
100000
100
Per
form
ance
20051980
Years
How to capitalize on higher frequency?
CASES 2004 4El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Multithreading
• Switch-on-event coarse-grain multithreading– Multiple register contexts for fast switching– Switch to alternate task when current task accesses
memory– Overlap memory accesses with computation
• But what about multithreading in hard-real-time?– Require analyzability– Cannot rely on dynamic schemes
CASES 2004 5El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Exploiting Multithreading in Hard-Real-Time Systems
• Safety– Guarantee all tasks meet deadlines (worst-case)– Statically bound overlap under all scenarios
• Tractability– Confirm/disconfirm schedulability mathematically
using closed-form tests– Consider each task individually
CASES 2004 6El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Classic Real-Time Scheduling
25.08
2
A
AA period
WCETU
• Utilization-based schedulability test• No need to construct the schedule a priori
Example: Earliest Deadline First (EDF)
time
Task A
Task B
EDF
B1 B2 B3 B4
A1 A2
periodA
periodB
release A1 deadline A1release A2
release B1 deadline B1release B2
B1 A1 B2 B3 B4A2
EDF schedulability test
1i i
i
period
WCETU
75.04
3
B
BB period
WCETU
CASES 2004 7El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Performance vs. Tractability
• Performance: Memory overlap
• Tractability: Closed-form schedulability test
• Only basic parameters of tasks are known – WCET = C + M (from conventional WCET analysis)– Period = deadline
A
B
M
M M
CASES 2004 8El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M
M M
M M M
A
B
M
M M
M M M
A
B
M
M M
M M M
EDF (Classic Real-Time):
Low Performance but Tractable
1B
B
A
A
period
WCET
period
WCETU
• Memory overlap: No• Closed-form test: Yes
A
B
M
M M
M M M
DEADLINE MISSED
CASES 2004 9El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M M
M M
M
MA
B
M M
M M
M
MA
B
M M
M M
M
MA
B
M M
M M
M
MA
B
M M
M M
M
MA
B
M M
M M
M
M
Dynamic Switch on Memory Access: Possibly High Performance but Intractable
• Memory overlap: Possible (unfair for low priority)• Closed-form test: No
– Must examine memory positioning!
DEADLINE MISSED
CASES 2004 10El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M
M
M
M
M
M
A
B
M
M
M
M
M
M
A
B
M
M
M
M
M
M
Deterministic Switching:
High Performance and Tractable
• Memory overlap: Yes (fair and bounded)• Closed-form test: Yes
A
B
M
M
M
M
M
M
A
B
M
M
M
M
M
M
CASES 2004 11El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Tractability through Deterministic Switching
• Fully decouple independent tasks by forcing periodic switches– Every task gets a chance to initiate/overlap
memory accesses– No scheduling dependences– No specificity regarding memory positioning
CASES 2004 12El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Weighted-Round-Robin (WRR)
Pipeline
Virt. Proc. 1Virt. Proc. 2Virt. Proc. 3Virt. Proc. 4
round = memory latency
T1 T2
T3
T4
time
T1
T2
T3 T1 T2 T4T3 T1 T2 T3T4
T3
T1
T2
T4
T3
T1
T2
forced pre-emptiondilate WCET
T4T4 T4memory transfer operation
Round i Round i+1 Round i+2
duty cycle
CASES 2004 13El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Analytical Framework for WRR
d: duty cycle 0 < d 1
P: period = deadline
WCET: Worst-Case Execution Time
WCET = C + M
C: aggregate computation time
M: aggregate memory time
WCET’: dilated Worst-Case Execution Time
WCET’ = (C/d) + M
CASES 2004 14El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Schedulability Test
1
11
n
i ii
ii
PM
PC
2. Sum of all duty cycles less than or equal to 1
PM
PC
MP
Cd
1
PWCET '
1. Dilated task meets deadline on its virtual processor
PMd
C
MP
Cd
n
iid
1
1
CASES 2004 15El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Generalized Analytical Framework
12
22
1
11
t
tvp
t
vpvp
P
MdC
P
MdC
P
MdC
t
jjj
t
jjj
vp
PM
PC
d
1
1
1
Multiple tasks per VP:
CASES 2004 16El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Modeling Bus and Memory System
• Addressed in detail in paper
• Analysis accounts for:– Worst-case task serialization on memory bus– DRAM bank conflicts– Multiple VPs sharing single DRAM bank
CASES 2004 17El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
0
0.2
0.4
0.6
0.8
1
1.2
1GHz 2GHz 1GHz 2GHz 1GHz 2GHz
TASK-SETS8 tasks (2 tasks/VP for WRR)
wo
rst-
ca
se
uti
liza
tio
nEDF WRR EDF (no memory)
LOW MED HIGH
Results
50%
28%
memcomponent
CASES 2004 18El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Novel
Memory overlap
Formalism
(safety & tractability)
Classic real-time No Yes
Classic multithreading Yes No
Our real-time multithreading framework
Yes Yes
CASES 2004 19El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Useful
• Fully capitalize on high-frequency embedded microprocessors
• Exceed schedulability limit of conventional real-time theory for uniprocessors by analytically bounding WCET overlap
CASES 2004 20El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Deployable
• Software-only solution– Use Ubicom IP3023 8-thread
embedded microprocessor
– Analytical framework + scheduling policy
16
Local Register Files
(8 Banks) GPR: 16 x 32 Bits Addr: 8 x 32 Bits
Accum: 1 x 48 Bits Source-3: 1 x 32 Bits Interrupt Mask Reg. Control/Status Reg.
Global Register Thread Control Interrupt Status Debug Control
32-Bit CPU Core
Program Memory 256K SRAM
Data Memory 64K SRAM
SDRAM Memory
Controller
Parallel I/O
Serializer Deserializer
(serdes)
Serializer Deserializer
(serdes)
SPI Debug
CPU PLL
Peripheral PLL
Watchdog Timer
Reset & Brown-Out
Random # Generator
Data Memory (0-8MB)
External Flash
Memory (0-4MB)
8
10Base-T USB GPSI
UARTs
MII GPIO
CASES 2004 21El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Summary
Safely expose multithreading to hard-real-time schedulability analysis
Bound computation / memory overlap Offline closed-form schedulability test Safe Tractable
Scale “Memory Wall” in embedded systems Expose full benefits of high-frequency embedded
processors
CASES 2004 22El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Questions?