nc state university center for embedded systems research (cesr) electrical & computer...

22
NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and Eric Rotenberg Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems

Upload: catherine-powell

Post on 13-Dec-2015

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

NC STATE UNIVERSITY

Center for Embedded Systems Research (CESR)

Electrical & Computer EngineeringNorth Carolina State University

Ali El-Haj-Mahmoud and Eric Rotenberg

Safely Exploiting Multithreaded Processors to Tolerate Memory Latency

in Real-Time Systems

Page 2: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 2El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Embedded Processor Trends

• More demanding applications and user expectations• Higher frequency

– ARM-11 (0.13µ): 500 MHz

– ARM-11 (0.10µ): 1 GHz

• Processor-memory speed gap

Page 3: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 3El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Memory Wall

1000

10

10000

100000

100

Per

form

ance

20051980

Years

How to capitalize on higher frequency?

Page 4: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 4El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Multithreading

• Switch-on-event coarse-grain multithreading– Multiple register contexts for fast switching– Switch to alternate task when current task accesses

memory– Overlap memory accesses with computation

• But what about multithreading in hard-real-time?– Require analyzability– Cannot rely on dynamic schemes

Page 5: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 5El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Exploiting Multithreading in Hard-Real-Time Systems

• Safety– Guarantee all tasks meet deadlines (worst-case)– Statically bound overlap under all scenarios

• Tractability– Confirm/disconfirm schedulability mathematically

using closed-form tests– Consider each task individually

Page 6: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 6El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Classic Real-Time Scheduling

25.08

2

A

AA period

WCETU

• Utilization-based schedulability test• No need to construct the schedule a priori

Example: Earliest Deadline First (EDF)

time

Task A

Task B

EDF

B1 B2 B3 B4

A1 A2

periodA

periodB

release A1 deadline A1release A2

release B1 deadline B1release B2

B1 A1 B2 B3 B4A2

EDF schedulability test

1i i

i

period

WCETU

75.04

3

B

BB period

WCETU

Page 7: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 7El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Performance vs. Tractability

• Performance: Memory overlap

• Tractability: Closed-form schedulability test

• Only basic parameters of tasks are known – WCET = C + M (from conventional WCET analysis)– Period = deadline

A

B

M

M M

Page 8: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 8El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

A

B

M

M M

M M M

A

B

M

M M

M M M

A

B

M

M M

M M M

EDF (Classic Real-Time):

Low Performance but Tractable

1B

B

A

A

period

WCET

period

WCETU

• Memory overlap: No• Closed-form test: Yes

A

B

M

M M

M M M

DEADLINE MISSED

Page 9: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 9El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

A

B

M M

M M

M

MA

B

M M

M M

M

MA

B

M M

M M

M

MA

B

M M

M M

M

MA

B

M M

M M

M

MA

B

M M

M M

M

M

Dynamic Switch on Memory Access: Possibly High Performance but Intractable

• Memory overlap: Possible (unfair for low priority)• Closed-form test: No

– Must examine memory positioning!

DEADLINE MISSED

Page 10: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 10El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

A

B

M

M

M

M

M

M

A

B

M

M

M

M

M

M

A

B

M

M

M

M

M

M

Deterministic Switching:

High Performance and Tractable

• Memory overlap: Yes (fair and bounded)• Closed-form test: Yes

A

B

M

M

M

M

M

M

A

B

M

M

M

M

M

M

Page 11: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 11El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Tractability through Deterministic Switching

• Fully decouple independent tasks by forcing periodic switches– Every task gets a chance to initiate/overlap

memory accesses– No scheduling dependences– No specificity regarding memory positioning

Page 12: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 12El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Weighted-Round-Robin (WRR)

Pipeline

Virt. Proc. 1Virt. Proc. 2Virt. Proc. 3Virt. Proc. 4

round = memory latency

T1 T2

T3

T4

time

T1

T2

T3 T1 T2 T4T3 T1 T2 T3T4

T3

T1

T2

T4

T3

T1

T2

forced pre-emptiondilate WCET

T4T4 T4memory transfer operation

Round i Round i+1 Round i+2

duty cycle

Page 13: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 13El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Analytical Framework for WRR

d: duty cycle 0 < d 1

P: period = deadline

WCET: Worst-Case Execution Time

WCET = C + M

C: aggregate computation time

M: aggregate memory time

WCET’: dilated Worst-Case Execution Time

WCET’ = (C/d) + M

Page 14: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 14El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Schedulability Test

1

11

n

i ii

ii

PM

PC

2. Sum of all duty cycles less than or equal to 1

PM

PC

MP

Cd

1

PWCET '

1. Dilated task meets deadline on its virtual processor

PMd

C

MP

Cd

n

iid

1

1

Page 15: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 15El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Generalized Analytical Framework

12

22

1

11

t

tvp

t

vpvp

P

MdC

P

MdC

P

MdC

t

jjj

t

jjj

vp

PM

PC

d

1

1

1

Multiple tasks per VP:

Page 16: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 16El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Modeling Bus and Memory System

• Addressed in detail in paper

• Analysis accounts for:– Worst-case task serialization on memory bus– DRAM bank conflicts– Multiple VPs sharing single DRAM bank

Page 17: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 17El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

0

0.2

0.4

0.6

0.8

1

1.2

1GHz 2GHz 1GHz 2GHz 1GHz 2GHz

TASK-SETS8 tasks (2 tasks/VP for WRR)

wo

rst-

ca

se

uti

liza

tio

nEDF WRR EDF (no memory)

LOW MED HIGH

Results

50%

28%

memcomponent

Page 18: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 18El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Novel

Memory overlap

Formalism

(safety & tractability)

Classic real-time No Yes

Classic multithreading Yes No

Our real-time multithreading framework

Yes Yes

Page 19: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 19El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Useful

• Fully capitalize on high-frequency embedded microprocessors

• Exceed schedulability limit of conventional real-time theory for uniprocessors by analytically bounding WCET overlap

Page 20: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 20El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Deployable

• Software-only solution– Use Ubicom IP3023 8-thread

embedded microprocessor

– Analytical framework + scheduling policy

16

Local Register Files

(8 Banks) GPR: 16 x 32 Bits Addr: 8 x 32 Bits

Accum: 1 x 48 Bits Source-3: 1 x 32 Bits Interrupt Mask Reg. Control/Status Reg.

Global Register Thread Control Interrupt Status Debug Control

32-Bit CPU Core

Program Memory 256K SRAM

Data Memory 64K SRAM

SDRAM Memory

Controller

Parallel I/O

Serializer Deserializer

(serdes)

Serializer Deserializer

(serdes)

SPI Debug

CPU PLL

Peripheral PLL

Watchdog Timer

Reset & Brown-Out

Random # Generator

Data Memory (0-8MB)

External Flash

Memory (0-4MB)

8

10Base-T USB GPSI

UARTs

MII GPIO

Page 21: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 21El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Summary

Safely expose multithreading to hard-real-time schedulability analysis

Bound computation / memory overlap Offline closed-form schedulability test Safe Tractable

Scale “Memory Wall” in embedded systems Expose full benefits of high-frequency embedded

processors

Page 22: NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and

CASES 2004 22El-Haj-Mahmoud © 2004

NC STATE UNIVERSITY

Questions?