a mixed time-criticality sdram controller meaow30-09-2013 sven goossens, benny akesson, kees...

16
A Mixed Time-Criticality SDRAM Controller MeAOW 30-09-2013 Sven Goossens, Benny Akesson, Kees Goossens COBRA – CA104 NEST

Upload: francine-reeves

Post on 02-Jan-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

A Mixed Time-Criticality SDRAM Controller

MeAOW

30-09-2013

Sven Goossens, Benny Akesson, Kees Goossens

COBRA – CA104

NEST

www.compsoc.eu

Mixed Time-Criticality2/15

• Embedded multi-core systems are getting more complex:– Integrating more applications– Applications get more complex– (Functionality/Energy) ratio increases

• Driven by power, area and cost constraints

• Results in a mix of applications of different time-criticalities sharing hardware resources

– Firm real-time + Soft real-time = Mixed real-time

The hardware can no longer be tailored for a specific time-criticality class

www.compsoc.eu

SDRAM Controllers3/15

• DRAM: Most commonly used off-chip memory resource• Shared across FRT and SRT

• Performance metrics: bandwidth (throughput) and latency (response time)• Difficult to bound performance: locality dependent

Firm Real-Time Controllers

•Maximize worst-case performance• Simple / analyzable command scheduler•No attention for average-case performance•Do not exploit locality across requests

Soft Real-Time Controllers

•Maximize average-case performance•Complex high performance command scheduler•Guaranteeable performance is usually low•Exploit locality as much as possible

Mixed Real-Time Controllers: requirementsFor FRT: guarantee enough worst-case performance to satisfy requirementsFor SRT: maximizing the average-case performance

How can locality be exploited by a MRT controller?

www.compsoc.eu

Outline4/15

Introduction

Firm Real-Time Performance

Conclusions

Reconfigurable Controller Architecture

Conservative Open-Page Policy

Firm Real-Time Performance

• Our approach: do not schedule individual commands at run time, instead, use design-time computed command sequences called patterns, and schedule those.

• Select the right memory map / configuration for the mix of applications.

5/15

An example read pattern for a DDR3-800 in configuration (BI 2, BC 4):

Access Granularity (AG):Number of bytes read/written in a pattern:AG = BI BC BL IW∙ ∙ ∙

ACT0 RD04 ×

NOP3 ×

NOP RD0 3 × NOP RD0 2 ×

NOPACT

1 RD0 RD1 RD1 3 × NOP RD1 3 ×

NOP RD1

Each read command results in a burst data transfer. The number of burst per bank is called Burst Count (BC)

The number of Banks Interleaved (BI) in the access

Pattern length

3 × NOP

3 × NOP

(Gross) efficiency: fraction of time that the data bus is occupied in the worst-case

Burst Length (BL): The number of words per read command. They are transferred in BL/2 clock cycles.Interface width (IW)

• The designer can choose the bank interleaving and burst count.• Each configuration results in a different trade-off between bandwidth, latency and power‡

‡ Goossens, Kouters, Akesson, Goossens, Memory Map Selection for Firm Real-time SDRAM Controllers,Proc. DATE 2012

Firm Real-Time Performance6/15

0 0.2 0.4 0.6 0.8 1 1.20

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

0.1

0.3

0.4

0.6

0.7

0.9

1.1

1.41.72.22.83.75.49.7 42

1

2

4

8

16

32

2,1

2,2

2,4

4,1

4,2

8,1

8,2

8,64

1

2

4

8

16

32

2,1

2,2

2,4

4,1

4,2

8,1

8,64

1

2

4

8

16

32

64

2,1

2,2

2,4

2,8

4,1

4,2

4,4

8,1

8,2

8,64b net

AG

(G

B/s

)

Power (W)

128MB_DDR2-800

128MB_DDR3-800256MB_LPDDR2-800-S4

• Labels: BI,BC (BI 1 is omitted)• All memories in this graph run at 400MHz• Pareto optimal points are connected• Isolines denote energy efficiency in GB/J• Peak bandwidth (1.6GB/s)

0 0.2 0.4 0.6 0.8 1 1.20

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

0.1

0.3

0.4

0.6

0.7

0.9

1.1

1.41.72.22.83.75.49.7 42

1

2

4

8

16

32

2,1

2,2

2,4

4,1

4,2

8,1

8,2

8,64

1

2

4

8

16

32

2,1

2,2

2,4

4,1

4,2

8,1

8,64

1

2

4

8

16

32

64

2,1

2,2

2,4

2,8

4,1

4,2

4,4

8,1

8,2

8,64b net

AG

(G

B/s

)

Power (W)

128MB_DDR2-800

128MB_DDR3-800256MB_LPDDR2-800-S4

• Select the configuration based on the real-time requirements of the requestors, and their request sizes.

Band

wid

th (

GB/

s)

Power (W)

www.compsoc.eu

Outline7/15

Introduction

Firm Real-Time Performance

Conclusions

Reconfigurable Controller Architecture

Conservative Open-Page Policy

www.compsoc.eu

Open vs. Close-Page Policy8/15

• Color indicates locality (and request origin)• For the blue requestor the open-page policy:

• Increases the worst-case execution time• Reduces the average-case execution time• We would like to improve average case performance for SRT applications, without

hurting the FRT guarantees

A Read

A Read Read Read

P A Read P A Read P A Read P

A Read PP

A Read P

A Read

1 2 3Request arrivals:

Close-Page policy

Open-Page policy

Time ε

4

1

Conservative Open-Page Policy9/15

A Read

A Read Read Read

P A Read P A Read P A Read P

A Read PP

A Read P

A Read

1 2 3Request arrivals:

Close-Page policy

Open-Page policy

Time

A Read Read P A Read P A Read P A Read PConservative Open-Page policy

4

1

• Key idea:• Do not precharge if next request is known to target the open row• Precharge if next address is not known in time, or in case of a miss

ε

‡ Goossens, Akesson, Goossens, Conservative Open-Page Policy for Mixed Time-Criticality Memory Controllers, Proc. DATE 2013

• Conservative Open-Page policy‡ can be used in a MRT controller:• Worst-case guarantees are equal to a close-page policy.• Average-case performance is better, leading to lower execution times, lower average-case

latencies. SRT applications can even benefit indirectly from the quicker service to FRT requests!• The execution time reduction depends on the memory load of the application.

10/15

• Starting point is a predictable memory pattern set, with a “bypass” in case of a page hit• Use explicit precharges instead of auto-precharge flags

• postpones the precharge as long as possible, to increase the hit-window in which we can decide to bypass the precharge and activate.

A0

N N N N N N N A1

N R0

N N N R0

N N N R1

N N N R1

N N N N N P0

N N N N N N N P1N A0

N N N N

PRE-to-ACT = 10Hit window (28 cc)

A0

N N N N N N N A1

N R0

N N N R0

N N N R1

N N N R1

N N N N N N N N N N N N N N N A0

N N N N

ACT-to-ACT constraint = 38 cycles

Hit window (14 cc)Next request

Bank:Cmd:

Bank:Cmd:

Conservative Open-Page Policy

(DDR3-1600)

‡ Goossens, Akesson, Goossens, Conservative Open-Page Policy for Mixed Time-Criticality Memory Controllers, Proc. DATE 2013

www.compsoc.eu

Outline11/15

Introduction

Firm Real-Time Performance

Conclusions

Reconfigurable Controller Architecture

Conservative Open-Page Policy

Reconfigurable Back-end12/15

SDRAM back-end

Internal configuration busInternal configuration bus

PatternselectorPatternselectorRequest

type

Configuration data

Logical address

Pattern LUTPattern LUT

OffsetOffset

Commandplayer

Commandplayer

Refreshtimer

Refreshtimer

AddressgeneratorAddress

generator

Pattern memoryPattern memory

row/col,bank

RAS, CAS WE, etc

Address masks, shift-amounts

Pattern base-address, length

commands

• Patterns are reprogrammable at run time.• Can support all devices supported by the PHY (all DDR3 devices)

• Different pattern different worst-case bandwidth, latency and power trade-off. • Allows different trade-off per use case.

‡ Goossens, Kuijsten, Akesson, Goossens, A Reconfigurable Real-Time SDRAM Controller for Mixed Time-Criticality Systems, Proc. CODES+ISSS 2013

Reconfigurable Controller Architecture13/15

SDRAMback-endSDRAMback-end

Resource front-endResource front-end

Configuration BusConfiguration Bus

WidthConverter

WidthConverter

WidthConverter

WidthConverter

Req./Resp.queue

Req./Resp.queue

AtomizerAtomizer

TDMArbiterTDM

Arbiter

Req./Resp.queue

Req./Resp.queue

SD

RA

M

Re

so

ur c

e Bu

s

Memory client 1

Memory client 2

Configuration data

AtomizerAtomizer

PH

Y

Run-time reconfiguration infrastructure (memory mapped)

Reconfigurable TDM arbiter (predictable and composable during reconfiguration)

Reconfigurable back-end

• Implemented in SystemC, and on a ML605 Virtex-6 development board from Xilinx• 2-port instance: 13754 registers, 9543 LUTs and 1 BRAM • 4-port instance: 22065 registers, 14016 LUTs and 1 BRAM

• (Most registers are used in the req./resp. queue, that contain 256 bytes / port)‡ Goossens, Kuijsten, Akesson, Goossens, A Reconfigurable Real-Time SDRAM Controller forMixed Time-Criticality Systems, Proc. CODES+ISSS 2013

www.compsoc.eu

Outline14/15

Introduction

Firm Real-Time Performance

Conclusions

Reconfigurable Controller Architecture

Conservative Open-Page Policy

www.compsoc.eu

Conclusions

• Mixed-time criticality controllers should focus on:• For FRT: guarantee enough worst-case performance to satisfy requirements• For SRT: maximizing the average-case performance

• Choosing the right memory map / pattern configuration for the mix of applications:• Trade-offs exist between worst-case bandwidth, latency and power• Select the configuration that satisfies the firm real-time requirements

• Using a conservative open-page policy, some of the locality across requests can be exploited:• Decrease the gap between worst-case performance and average-case performance• Reduce average case latency and thus average case execution time• For soft real-time applications

• Reconfigurable architecture allows changing the memory map / configuration at run-time:• Select the right trade-off per use-case• Leads to other interesting challenges (see CODES2013 paper on predictable reconfiguration)

15/15

www.compsoc.eu

• For further information / a broader perspective:www.compsoc.eu

Sven Goossens <[email protected]>Benny Akesson <[email protected]>Kees Goossens <[email protected]>

Referred papers:www.svengoossens.nl

16/16

Electronic Systems GroupElectrical Engineering Faculty

Eindhoven University of Technology

5-tile compSOC platform: