
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory

Jeffrey Stuecheli (1, 2), Dimitris Kaseridis (1), Hillery C. Hunter (3) & Lizy K. John (1)

(1) ECE Department, The University of Texas at Austin

(2) IBM Corp., Austin

(3) IBM Thomas J. Watson Research Center

Laboratory for Computer Architecture 12/7/2010

MICRO-43


Overview/Summary

Refresh overhead is increasing with device density

Because the time to complete each refresh grows with density, performance is suffering

Current refresh scheduling methods are ineffective at hiding these delays

We propose more sophisticated mitigation methods

– Elastic Refresh Scheduling

Background

Basic DRAM/Refresh Info

Each bit is stored on a capacitor

A single access transistor isolates the capacitor to hold its charge

Leakage: cells lose charge over time

Refresh: rewrite each cell on a periodic basis

DDR3

– Refresh requirement is temperature dependent: 64 ms at 85°C, 32 ms at 95°C

– The DRAM device contains an internal refresh address counter

– JEDEC simply specifies the time interval between refresh commands (tREFI, time REFresh Interval): tREFI = 64 ms / 8192 = 7.8 μs (3.9 μs at 95°C)
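The tREFI arithmetic can be checked in a few lines of Python. This is a minimal sketch; `trefi_us` is a hypothetical helper, and it assumes 8192 refresh commands per retention window, the 8k figure used on these slides.

```python
REFRESH_COMMANDS = 8192  # 8k refresh commands per retention window


def trefi_us(retention_ms: float, commands: int = REFRESH_COMMANDS) -> float:
    """Average spacing of refresh commands (microseconds) for a retention window."""
    return retention_ms * 1000.0 / commands


print(f"85 degC: tREFI = {trefi_us(64):.2f} us")  # 64 ms retention window
print(f"95 degC: tREFI = {trefi_us(32):.2f} us")  # 32 ms retention window
```

Fitting the same 8192 commands into the 32 ms window required at 95°C is what halves tREFI to about 3.9 μs.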


Transition to denser devices

The 7.8 μs tREFI is based on 8k rows per bank

DRAM device density doubles every ~2 years

With one refresh per row, tREFI would halve each generation

Instead, multiple rows are refreshed with each command

Supply-current delivery constraints force an increase in tRFC (refresh completion time) with denser devices

[Figure: device comparison, 95 nm 512 Mbit vs. 42 nm 2 Gbit]
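The halving argument can be made concrete. The sketch below assumes, hypothetically, that rows per bank double with every density generation starting from 8k rows, and compares the tREFI that one-row-per-command refresh would need with the alternative of keeping tREFI fixed and refreshing more rows per command. `scaling_table` is an illustrative name.

```python
RETENTION_US = 64_000.0  # 64 ms retention window (85 C)
BASE_ROWS = 8192  # 8k rows per bank: the 7.8 us baseline
FIXED_TREFI_US = RETENTION_US / BASE_ROWS


def scaling_table(generations: int) -> list[tuple[int, float, int]]:
    """Return (rows per bank, tREFI if 1 row/command, rows/command at fixed tREFI).

    Assumption (hypothetical): rows per bank double every density generation.
    """
    table = []
    for gen in range(generations):
        rows = BASE_ROWS * 2 ** gen
        table.append((rows, RETENTION_US / rows, rows // BASE_ROWS))
    return table


for rows, trefi, per_cmd in scaling_table(4):
    print(f"{rows:6d} rows/bank: tREFI {trefi:.3f} us with 1 row/cmd, "
          f"or {per_cmd} rows/cmd at {FIXED_TREFI_US:.3f} us")
```

Keeping tREFI fixed means each command must refresh more rows, which is why the cost shows up as a longer tRFC rather than more frequent commands.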


“Stacked” Refresh Operations in a Single Command Example

Source: Micron Technical Note TN-47-16, Designing for High-Density DDR2 Memory (Introduction)


tRFC Growth with DRAM Density

DRAM type    Refresh completion time (tRFC)
512 Mbit     90 ns
1 Gbit       110 ns
2 Gbit       160 ns
4 Gbit       300 ns
8 Gbit       350 ns

In the most basic terms, tRFC should scale linearly with density

– Based strictly on the current available to charge the cell capacitance

– Roughly fixed charge per bit

This trend has been reflected in the DDR3 spec, with the exception of 8 Gbit

Net: even if DRAM vendors can slow the growth, the delay is already large today
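One way to compare the table against linear scaling is to compute the generation-over-generation tRFC ratio; strictly linear scaling would double tRFC with each density doubling. The snippet uses only the values from the table above; `growth_ratios` is an illustrative name.

```python
# tRFC values (ns) from the table above, keyed by density in Mbit.
TRFC_NS = {512: 90, 1024: 110, 2048: 160, 4096: 300, 8192: 350}


def growth_ratios(trfc: dict[int, int]) -> list[tuple[int, float]]:
    """Return (density_mbit, tRFC ratio vs. the previous generation) per doubling."""
    densities = sorted(trfc)
    return [(d, trfc[d] / trfc[p]) for p, d in zip(densities, densities[1:])]


for density, ratio in growth_ratios(TRFC_NS):
    print(f"{density:5d} Mbit: tRFC x{ratio:.2f} vs. previous generation "
          f"(linear scaling would be x2.00)")
```

The 8 Gbit step (about 1.17x) stands out against the 4 Gbit step (about 1.88x).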

Motivation

Slowdown Effects Observed in Simulation

Simics/GEMS

4 cores, two 1333 MHz channels, two DDR3 ranks per channel


Why it is so bad

[Figure: timeline of refreshes and reads on a rank, marking tRFC and tREFI; a DRAM read takes 26 ns, while a read that hits a refresh takes 326 ns in the worst case]

DRAM capacity   tRFC     Bandwidth overhead (95°C, per rank)   Latency overhead (95°C)
512 Mb          90 ns    2.7%                                  1.4 ns
1 Gb            110 ns   3.3%                                  2.1 ns
2 Gb            160 ns   5.0%                                  4.9 ns
4 Gb            300 ns   7.7%                                  11.5 ns
8 Gb            350 ns   9.0%                                  15.7 ns
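A first-order model helps explain these numbers. It rests on an assumption the slide does not state: reads arrive uniformly at random relative to refreshes. A read then lands in a refresh with probability tRFC/tREFI and waits tRFC/2 on average, giving an expected added latency of tRFC²/(2·tREFI), while the fraction of rank time lost is tRFC/tREFI. With tREFI = 3.9 μs this reproduces the 4 Gb and 8 Gb rows; for the smaller devices the table's figures are somewhat higher than the model gives, so the table presumably accounts for additional effects. Function names are hypothetical.

```python
TREFI_95C_NS = 3900.0  # tREFI at 95 C (3.9 us)


def bandwidth_overhead(trfc_ns: float, trefi_ns: float = TREFI_95C_NS) -> float:
    """Fraction of time a rank is busy refreshing."""
    return trfc_ns / trefi_ns


def expected_latency_ns(trfc_ns: float, trefi_ns: float = TREFI_95C_NS) -> float:
    """Mean extra read latency, assuming reads arrive uniformly at random.

    P(read lands in a refresh) = tRFC / tREFI; mean remaining wait = tRFC / 2.
    """
    return (trfc_ns / trefi_ns) * (trfc_ns / 2.0)


def worst_case_read_ns(trfc_ns: float, read_ns: float = 26.0) -> float:
    """Read arriving just as a refresh starts: wait a full tRFC, then read."""
    return trfc_ns + read_ns


for name, trfc in [("4Gb", 300.0), ("8Gb", 350.0), ("16Gb est.", 550.0)]:
    print(f"{name:>9}: bandwidth {bandwidth_overhead(trfc):.1%}, "
          f"mean latency {expected_latency_ns(trfc):.1f} ns, "
          f"worst case {worst_case_read_ns(trfc):.0f} ns")
```

The 300 ns case also matches the figure's worst case of 326 ns (tRFC plus a 26 ns read). The last entry plugs in the 16 Gbit estimate used in the simulation methodology (tRFC = 550 ns), giving roughly 14.1% bandwidth overhead and 38.8 ns mean added latency under this model.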


Postponing Refresh Operations

Each cell needs to be refreshed every 64 ms

Refresh command spacing is specified only as an average rate

As such, a cell does not fail if a refresh is not sent exactly when tREFI expires

The current DDR3 spec allows the controller to fall up to eight tREFI intervals behind (the backlog count)

– The cell refresh period is lengthened by only ~0.1% (8 in 8k)
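The 0.1% figure follows directly from the numbers above. The short sketch below is plain arithmetic at 85°C with illustrative variable names: it computes how far a refresh can fall behind and how much that lengthens each cell's refresh period.

```python
RETENTION_MS = 64.0  # cell retention window at 85 C
REFRESH_COMMANDS = 8192  # 8k refresh commands per window
MAX_POSTPONED = 8  # DDR3 allows eight postponed refreshes

trefi_us = RETENTION_MS * 1000.0 / REFRESH_COMMANDS  # 7.8125 us
max_slip_us = MAX_POSTPONED * trefi_us  # furthest a refresh can fall behind
stretch = max_slip_us / (RETENTION_MS * 1000.0)  # = 8 / 8192 of the refresh period

print(f"max slip {max_slip_us:.1f} us; refresh period lengthened by {stretch:.3%}")
```

Eight intervals of 7.8 μs is only 62.5 μs of slack against a 64 ms period, which is why postponement is safe.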


Current Approaches

Demand Refresh (DR)

– Most basic policy: sends refresh operations as high-priority operations every tREFI period

Delay Until Empty (DUE)

– Policy utilizes the DRAM's ability to postpone refreshes

– Refresh operations are postponed until no reads are queued or the maximum backlog count has been reached

Why these policies are ineffective

– DR: does nothing to hide refreshes

– DUE: too aggressive in sending refresh operations; in many cases it does not take advantage of the backlog
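Both baseline policies reduce to a simple per-rank decision. The sketch below is illustrative only: `RankState`, `demand_refresh_policy` and `due_policy` are hypothetical names, and the model assumes the backlog counter is incremented each time tREFI expires and decremented when a refresh is sent.

```python
from dataclasses import dataclass

MAX_BACKLOG = 8  # DDR3 limit on postponed refreshes


@dataclass
class RankState:
    backlog: int  # refreshes owed (incremented each time tREFI expires)
    reads_queued: int  # reads waiting for this rank


def demand_refresh_policy(s: RankState) -> bool:
    """DR: issue a refresh as soon as one is owed, ahead of any reads."""
    return s.backlog > 0


def due_policy(s: RankState) -> bool:
    """DUE: postpone until the read queue drains or the backlog hits the limit."""
    return s.backlog > 0 and (s.reads_queued == 0 or s.backlog >= MAX_BACKLOG)


print(demand_refresh_policy(RankState(1, 3)), due_policy(RankState(1, 3)))
print(due_policy(RankState(8, 3)), due_policy(RankState(1, 0)))
```

The last case (backlog of one, empty queue) is where DUE is "too aggressive": it refreshes the moment the queue drains, even though seven more postponements are still available.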


Elastic Refresh

Exploits:

– Non-uniform request distribution

– Refresh overhead only has to fit into free cycles

Initially not aggressive; converges with DUE as the refresh backlog grows

Latency-sensitive workloads often have lower bandwidth demand

Goal: decrease the probability of reads conflicting with refreshes


Idle Delay Function

[Figure: idle delay threshold vs. refresh backlog (1 to 8), with Constant, Proportional and High Priority regions]

Introduce a refresh-backlog-dependent idle threshold

With a low backlog, there is no reason to send a refresh command right away

With a bursty request stream, the probability of a new request arriving decreases the longer the rank has been idle

As the backlog grows, decrease this delay threshold
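The scheduling rule on this slide can be sketched as a per-rank check: once the rank has been idle for at least the threshold associated with the current backlog, the refresh is sent. This is an illustrative model, not the authors' hardware; the function name and the table of thresholds are hypothetical.

```python
from typing import Sequence

MAX_BACKLOG = 8  # DDR3 allows at most eight postponed refreshes


def elastic_should_refresh(backlog: int, idle_cycles: int,
                           idle_threshold: Sequence[int]) -> bool:
    """Return True if a refresh should be issued to this rank now.

    backlog        -- refreshes currently postponed (0..MAX_BACKLOG)
    idle_cycles    -- memory clocks the rank has been idle (0 while reads are queued)
    idle_threshold -- idle_threshold[b] is the idle delay threshold at backlog b,
                      non-increasing in b (hypothetical table representation)
    """
    if backlog == 0:
        return False  # nothing owed
    if backlog >= MAX_BACKLOG:
        return True  # limit reached: refresh even if reads are waiting
    # A threshold of 0 means high priority: refresh even with reads queued.
    return idle_cycles >= idle_threshold[backlog]


# Hypothetical thresholds (memory clocks) indexed by backlog 0..8.
THRESHOLDS = [0, 400, 400, 400, 300, 200, 100, 0, 0]
```

Compared with DUE, the only change is that "read queue empty" becomes "idle for at least the backlog-dependent threshold", so a short gap between bursts no longer triggers a refresh while the backlog is small.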


Tuning the Idle Delay Function

Parameter             Units                               Description
Max Delay             Memory clocks                       Sets the delay in the constant region
Proportional Slope    Memory clocks per postponed step    Sets the slope of the proportional region
High Priority Pivot   Postponed step                      Point where the idle delay goes to zero

The optimal shape of the idle delay function (IDF) is workload dependent

The IDF can be controlled with the listed parameters

Our system contains hardware to determine “good” values for two of them

– Max Delay and Proportional Slope
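The three parameters map naturally onto a piecewise-linear curve. The sketch below is one plausible reading of the slide, not the authors' exact formula: the threshold is the smaller of Max Delay and a line that falls by Proportional Slope clocks per postponed step and reaches zero at the High Priority Pivot. The parameter values in the example are hypothetical.

```python
def idle_delay(backlog: int, max_delay: int, prop_slope: int,
               high_priority_pivot: int) -> int:
    """Idle delay threshold (memory clocks) for a given refresh backlog.

    Assumed shape: a line falling by prop_slope clocks per postponed step that
    reaches zero at high_priority_pivot, capped at max_delay (constant region).
    """
    proportional = prop_slope * max(0, high_priority_pivot - backlog)
    return min(max_delay, proportional)


# Hypothetical parameters: backlog 1..8, pivot at step 7.
curve = [idle_delay(b, max_delay=400, prop_slope=100, high_priority_pivot=7)
         for b in range(1, 9)]
print(curve)  # [400, 400, 400, 300, 200, 100, 0, 0]
```

Here backlogs 1 to 3 fall in the constant region, 4 to 6 in the proportional region, and 7 to 8 in the high-priority region.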


Max Delay Circuit

[Figure: Max Delay circuit, updated on each DRAM read sent: a Current Idle Count register (14 bits), a Delay Accumulator (20 bits) and an Operation Count (10 bits), with carry and concatenation ("cat") logic producing Max Delay (10 bits), which feeds the idle delay function]

Circuit used to collect the average rank idle period

Conceptually, given an exponential-type distribution, the average can be used to find the tail

The calculated average is used as Max Delay

Circuit function:

– Accumulate idle delay over 1024 events

– The average is calculated by concatenation (bit selection) of the accumulator: taking its upper 10 bits divides the sum by 1024
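A behavioral model makes the circuit's arithmetic explicit. Register widths come from the figure; the saturating behavior of the idle counter and accumulator is our assumption (the slide does not say how overflow is handled), and the class and method names are hypothetical.

```python
IDLE_BITS, ACC_BITS, COUNT_BITS = 14, 20, 10  # register widths from the figure


class MaxDelayEstimator:
    """Behavioral model of the Max Delay circuit (assumptions noted inline)."""

    def __init__(self) -> None:
        self.accumulator = 0
        self.op_count = 0
        self.max_delay = 0

    def on_read_sent(self, idle_count: int) -> None:
        """Called when a DRAM read is sent, with the idle period that preceded it."""
        idle = min(idle_count, (1 << IDLE_BITS) - 1)  # 14-bit counter saturates (assumption)
        # Accumulator saturates at 20 bits (assumption).
        self.accumulator = min(self.accumulator + idle, (1 << ACC_BITS) - 1)
        self.op_count = (self.op_count + 1) & ((1 << COUNT_BITS) - 1)
        if self.op_count == 0:  # counter wrapped: 1024 reads have been seen
            # Dividing by 1024 = keeping the upper 10 bits of the 20-bit accumulator.
            self.max_delay = self.accumulator >> COUNT_BITS
            self.accumulator = 0
```

With a 20-bit accumulator and 1024 samples, the shift leaves exactly the 10 bits of the Max Delay register, so no divider is needed.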


Proportional Slope Circuit

[Figure: Proportional Slope circuit: Low and High counters, each with divide-by-2 logic, selected by whether the postponed count is below a threshold; the Low/High difference and its integral are weighted by w(p) and w(i) and summed into Prop Slope, which feeds the idle delay function]

Conceptually, the proportional region provides a graceful transition to high priority while using the full postponed range

The circuit works to balance utilization across the postponed range (High/Low counts)

A PI-type controller adjusts the slope to balance the High/Low counts
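The controller can be sketched in software as a textbook PI loop. This is a hypothetical model, not the authors' circuit: the weights, the Low/High split threshold, the sign convention and the point at which the counts are halved are all assumptions. In the proportional region a larger slope means longer idle thresholds, so the model raises the slope when too many refreshes are sent at low backlog.

```python
from dataclasses import dataclass


@dataclass
class SlopeController:
    """Hypothetical PI model of the Proportional Slope circuit.

    Refreshes issued while the postponed count is below `threshold` count as
    Low, the rest as High; the controller steers the slope to balance them.
    """
    w_p: float = 0.5  # proportional weight w(p) (hypothetical value)
    w_i: float = 0.05  # integral weight w(i) (hypothetical value)
    threshold: int = 4  # postponed-count split between Low and High (hypothetical)
    low: int = 0
    high: int = 0
    integral: float = 0.0
    slope: float = 0.0  # Prop Slope fed to the idle delay function

    def on_refresh(self, postponed: int) -> None:
        """Classify a refresh by how far behind the controller was when sending it."""
        if postponed < self.threshold:
            self.low += 1
        else:
            self.high += 1

    def update(self) -> float:
        """One control step: PI on (Low - High), then halve both counts."""
        error = self.low - self.high  # > 0: refreshing too eagerly at low backlog
        self.integral += error
        self.slope = max(0.0, self.w_p * error + self.w_i * self.integral)
        self.low //= 2  # divide-by-2 aging of both counts (assumed placement)
        self.high //= 2
        return self.slope
```

The integral term lets the slope settle at a nonzero value once the Low and High counts are balanced and the proportional error has gone to zero.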

Hardware Cost

Trivial integration into DUE-based policies

– Structure replaces the “empty” indication of DUE

Logic size

– ~100 latch bits for the static policy

– ~80 additional latch bits for the dynamic policy

Logic cycle time

– Low frequency compared to ALU functions in the processor core

– Infrequent updates could enable pipelined control

[Figure: memory controller organization: request input interface, input queue, bank queues (x8), rank queues (xN), a refresh queue and refresh scheduler driven by a tREFI counter, and output to the DRAM I/O drivers]


Simulation Methodology

Simics extended with GEMS model

– 1-, 4- and 8-core CMPs

– First-Ready, First-Come-First-Served (FR-FCFS) memory controller policy

– DDR3 1333 MHz 8-8-8 memory, 2 MCs, 2 ranks/MC

– tRFC = 550 ns, tREFI = 3.9 μs at 95°C (estimate for 16 Gbit devices)

– Refresh policies:

• Demand Refresh (DR)

• Defer Until Empty (DUE)

• Elastic Refresh policies

SPEC CPU2006 workloads

Results

[Figure: results, Integer workloads, 8 cores]


Related Work

B. Bhat and F. Mueller, “Making DRAM refresh predictable,” in Euromicro Conference on Real-Time Systems (ECRTS), 2010

M. Ghosh and H. S. Lee, “Smart Refresh: An enhanced memory controller design for reducing energy in conventional and 3D die-stacked DRAMs,” in MICRO-40, 2007

K. Toshiaki, P. Paul, H. David, K. Hoki, J. Golz, F. Gregory, R. Raj, G. John, R. Norman, C. Alberto, W. Matt, and I. Subramanian, “An 800 MHz embedded DRAM with a concurrent refresh mode,” in IEEE ISSCC Digest of Technical Papers, Feb. 2004


Conclusions

The significant performance degradation caused by refresh can be mitigated with low-overhead mechanisms

Commodity DRAM is cost driven

– Elastic refresh requires no DRAM changes

Future work:

– Coordinate refresh with other structures on the CMP

– Investigate refresh for future DRAM devices (DDR4)

• For example, dynamically select how many rows to refresh per command


Thank You. Questions?

Laboratory for Computer Architecture, The University of Texas at Austin

IBM Austin

IBM T. J. Watson Lab