Refrint: Intelligent Refresh to Minimize Power in On-Chip Multiprocessor Cache Hierarchies

Aditya Agrawal, Prabhat Jain, Amin Ansari and Josep Torrellas

University of Illinois at Urbana Champaign

http://iacoma.cs.uiuc.edu

Page 2: Refrint: Intelligent Refresh to Minimize Power in On-Chip Multiprocessor Cache Hierarchiesiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_hpca13_3.pdf · 2013-03-29 · Refrint: Intelligent

Motivation

• As Vdd decreases, leakage power becomes more important

• On-chip SRAM memories are a major contributor to leakage

• eDRAMs have low leakage power

– Already used as the LLC in IBM POWER7

• Problem: Refresh energy

• Goal: Only refresh the lines that will be used soon

Feb 26, 2013, HPCA 2013, Shenzhen, China

Contributions

• Refrint: Intelligent fine-grained refresh of eDRAMs

• Only refresh lines which will be used soon

• Don’t refresh lines that are inactive or frequently used

• Significant energy reductions with Refrint

$$\frac{E(\text{Refrint eDRAM Cache Hierarchy})}{E(\text{SRAM Cache Hierarchy})} = 0.30$$

$$\frac{E(\text{Conv. eDRAM Cache Hierarchy})}{E(\text{SRAM Cache Hierarchy})} = 0.56$$
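
As a quick derived check (not a number stated on the slide), dividing the two ratios above gives the energy of the Refrint hierarchy relative to the conventional eDRAM hierarchy. This assumes both ratios are measured against the same SRAM baseline.

```latex
\frac{E(\text{Refrint eDRAM Cache Hierarchy})}{E(\text{Conv. eDRAM Cache Hierarchy})}
  = \frac{0.30}{0.56} \approx 0.54
```

Under that assumption, Refrint would use roughly 46% less cache hierarchy energy than conventional eDRAM.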

Outline

• Motivation and Contribution

• Refrint

– Sources of Unnecessary Refreshes

– Time-based policy

– Data-based policy

• Implementation

• Evaluation Setup

• Results

• Conclusion

Unnecessary Refreshes

Two sources of unnecessary refreshes

• Cold lines

– Not accessed or accessed far apart in time

– Found in lower level caches like L3

– Propose: data-based policies (What to refresh?)

[Figure: timeline of a cold line showing the last access, the retention time, and the unnecessary refreshes that follow]

Unnecessary Refreshes

• Hot lines

– Actively accessed (and automatically refreshed)

– Found in upper level caches like L2

– Propose: time-based policies (When to refresh?)

[Figure: timeline of a hot line showing accesses, the retention time, the unnecessary refreshes, and the point where a refresh is required]

Time-Based Policy: Polyphase

• For hot lines: decides when to refresh

• The retention period is divided into Phases

• Each cache line records the phase when it was last accessed

• A line is refreshed only when the same phase arrives in the next retention period
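
A minimal sketch of the timing difference between Periodic and Polyphase refresh, under stated assumptions: time is measured in integer nanoseconds, the retention period divides evenly into phases, and refreshes are issued at the start of a phase. The function names are hypothetical and do not come from the slides.

```python
def local_phase(access_time, retention, num_phases):
    """Phase index recorded by a line accessed at `access_time`."""
    phase_len = retention // num_phases
    return (access_time % retention) // phase_len


def next_refresh_periodic(access_time, retention):
    """Periodic: every line is refreshed at each retention-period boundary."""
    return (access_time // retention + 1) * retention


def next_refresh_polyphase(access_time, retention, num_phases):
    """Polyphase: refresh when the recorded phase starts again, one period later."""
    phase_len = retention // num_phases
    phase_start = (access_time // phase_len) * phase_len
    return phase_start + retention


# Assumed example: 50 us retention (50_000 ns), 4 phases, access at t = 30_000 ns.
RETENTION_NS = 50_000
print(local_phase(30_000, RETENTION_NS, 4))             # 2
print(next_refresh_periodic(30_000, RETENTION_NS))      # 50000
print(next_refresh_polyphase(30_000, RETENTION_NS, 4))  # 75000
```

In this example Periodic refreshes the line only 20,000 ns after it was accessed, while Polyphase waits 45,000 ns, which is still within the 50,000 ns retention time.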

[Figure: one retention time divided into phases 0 to 3, marking an access and where the line is refreshed under Periodic versus under Polyphase]

Polyphase Effectiveness

Polyphase is effective when: frequency of accesses > refresh rate

• True for higher levels of the cache hierarchy, but their refresh energy is small

• LLCs have high refresh energy but few accesses

• LLCs can benefit from Polyphase under:

– Fine-grained sharing (repeated writebacks and reads)

– Significant conflict in higher level caches

– Accesses bypassing higher level caches

[Figure: Polyphase effectiveness, from low to high, plotted against frequency of accesses, with 1/Retention Time marked on the axis]

Data-Based Policy

• For cold lines: decides what to refresh

• 4 simple policies using the line state

All: All lines

Valid: Only valid lines (includes clean and dirty lines)

Dirty: Only dirty lines

WB(n,m): Refresh idle dirty lines n times before writeback, and refresh idle valid clean lines m times before invalidation
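
The three simple policies (All, Valid, Dirty) can be sketched as a small decision function. This is an illustrative model, not the authors' hardware: the `State` enum and `process_refresh` name are hypothetical, and the choice to invalidate clean lines under Dirty follows the later "Operation: Processing Refresh" slide.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    CLEAN = "valid clean"
    DIRTY = "valid dirty"


def process_refresh(policy, state):
    """Return (action, new_state) for one refresh request under a simple data policy."""
    if policy == "All":
        # Every line is refreshed regardless of its state.
        return "refresh", state
    if policy == "Valid":
        # Valid covers both clean and dirty lines.
        return ("refresh", state) if state is not State.INVALID else ("skip", state)
    if policy == "Dirty":
        if state is State.DIRTY:
            return "refresh", state
        if state is State.CLEAN:
            # Clean data can be refetched later, so the line is dropped instead.
            return "invalidate", State.INVALID
        return "skip", state
    raise ValueError(f"unknown policy: {policy}")


print(process_refresh("Dirty", State.CLEAN))
```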

Data-Based Policy: Effectiveness

Application categorization as seen from LLC, by footprint (small to large) and visibility (low to high):

Class 1: WB(n,m) with small n,m

Class 2: WB(n,m) with large n,m

Class 3: Valid

Outline

• Motivation and Contribution

• Refrint

• Implementation

– Key Ideas

– Hardware Support

– Operation

• Issuing Refresh Request

• Processing Refresh Request

• Evaluation Setup

• Results

• Conclusion

Key Ideas

• Retention time is divided into 2^N intervals: Global Phase

• Each cache line has N bits: Local Phase

• Phase Array:

– Hardware structure in cache controller

– Has N local phase bits and a copy of the valid bit of each cache line
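
To get a feel for the storage cost of the Phase Array, here is a rough sizing sketch. It assumes N local phase bits encode 2^N phases and that the array holds exactly N + 1 bits per line (phase bits plus the valid-bit copy). The function name is hypothetical, and the cache sizes are the ones listed in the evaluated configuration.

```python
def phase_array_bits(cache_bytes, line_bytes, num_phases):
    """Bits of Phase Array storage: N local phase bits plus one valid-bit copy per line."""
    if num_phases < 1 or num_phases & (num_phases - 1):
        raise ValueError("num_phases must be a power of two")
    n_bits = (num_phases - 1).bit_length()  # N such that 2**N == num_phases
    num_lines = cache_bytes // line_bytes
    return num_lines * (n_bits + 1)


# 64-byte lines and 4 phases (one of the swept values).
print(phase_array_bits(256 * 1024, 64, 4))   # private L2: 12288 bits
print(phase_array_bits(1024 * 1024, 64, 4))  # one L3 bank: 49152 bits
```

With these assumptions the array adds 3 bits per 512-bit line, which is under 1% of the data storage.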

Hardware Support

[Figure: block diagram with the labels State, Count, Data + Tag, Decision Logic, Cache Controller, Phase Array (Local Phase, V), GlobalPhase, comparators (=), R/W Request, and Refresh Request]

Operation: Issuing Refresh

# On normal rd or wr access
local phase = global phase

# At beginning of each global phase
hold all rd and wr requests
for (all the lines of the cache) {
    if ((global phase == local phase) && (line == Valid))
        issue a refresh request   # processing on next slide
}
release rd and wr requests
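
The issuing step above can be modeled in a few lines of software. This is only a sketch: the class and method names are hypothetical, an access is assumed to leave the line valid, and the holding and releasing of rd/wr requests is not modeled.

```python
from dataclasses import dataclass


@dataclass
class PhaseEntry:
    local_phase: int = 0
    valid: bool = False


class PhaseArray:
    def __init__(self, num_lines, num_phases):
        self.num_phases = num_phases
        self.global_phase = 0
        self.entries = [PhaseEntry() for _ in range(num_lines)]

    def on_access(self, line):
        """Normal rd or wr access: record the current global phase."""
        entry = self.entries[line]
        entry.valid = True  # assumption: the access leaves the line valid
        entry.local_phase = self.global_phase

    def advance_phase(self):
        """Begin the next global phase; return the lines that get a refresh request."""
        self.global_phase = (self.global_phase + 1) % self.num_phases
        return [i for i, e in enumerate(self.entries)
                if e.valid and e.local_phase == self.global_phase]


pa = PhaseArray(num_lines=4, num_phases=4)
pa.on_access(0)                               # line 0 touched in phase 0
print([pa.advance_phase() for _ in range(4)])  # [[], [], [], [0]]
```

Line 0 receives a refresh request only when phase 0 comes around again, one full retention period after the phase in which it was accessed.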

Operation: Processing Refresh

All: Refresh all lines

Valid: Refresh all valid lines

Dirty: Refresh all dirty lines; invalidate clean lines

WB(n,m): next slide

Processing Refresh for WB(n,m)

if (Count >= 1)
    refresh line
    Count--
else if (Dirty == 1)
    write back
    State = Valid Clean
    Count = m
else if (Valid == 1)
    invalidate
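
A runnable sketch of the WB(n,m) decision above, tracing one idle dirty line through its lifetime. The `Line` dataclass and function name are hypothetical. The slide does not show how Count is initialized on an access, so the example simply starts the line with Count = n as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    count: int = 0


def process_refresh_wb(line, m):
    """Apply one WB(n,m) refresh request to `line` and return the action taken."""
    if line.count >= 1:
        line.count -= 1
        return "refresh"
    if line.dirty:
        line.dirty = False  # State = Valid Clean after the write-back
        line.count = m
        return "writeback"
    if line.valid:
        line.valid = False
        return "invalidate"
    return "none"  # invalid line: nothing to do


# Assumed start: an idle dirty line under WB(2,1), with Count = n = 2.
line = Line(valid=True, dirty=True, count=2)
print([process_refresh_wb(line, m=1) for _ in range(5)])
# ['refresh', 'refresh', 'writeback', 'refresh', 'invalidate']
```

The trace shows the intended behavior: n refreshes while dirty, one write-back, m refreshes while clean, then invalidation.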

Outline

• Motivation and Contribution

• Refrint

• Implementation

• Evaluation Setup

• Results

• Conclusion

Architectural Parameters

Simulated Architectural Parameters

Chip: 16-core CMP

Core: MIPS32, 2-issue out-of-order

IL1 (SRAM): 32 KB, 2-way

DL1 (SRAM): 32 KB, 4-way, private

L2 (eDRAM): 256 KB, 8-way, private

L3 (eDRAM): 16 MB, 16 banks, shared

L3 bank: 1 MB, 8-way

Line size: 64 bytes

Network: 4 x 4 torus

Coherence: MESI directory protocol at L3

Technology Parameters & Tools

Technology Parameters

Technology node: 32 nm

Frequency: 1000 MHz

Device type: LOP (low operating power)

Tools and Applications

Architectural simulator: SESC

Timing and power: McPAT and CACTI

Applications: SPLASH-2 and PARSEC

Assumptions

Parameters for L2 and L3

eDRAM access time = SRAM access time

eDRAM access energy = SRAM access energy

eDRAM leakage power = 1/8 × SRAM leakage power

eDRAM line refresh time = eDRAM line access time

eDRAM line refresh energy = eDRAM line access energy

Parameter Sweep

Retention time: 50 us

Timing policy: Periodic, Polyphase (# of phases = 1), Polyphase (# of phases = 2), Polyphase (# of phases = 4)

Data policy: All, Valid, Dirty, WB(4,4), WB(8,8), WB(16,16), WB(32,32)

Total combinations: 29 (28 + baseline)
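
The count of 29 follows from crossing the 4 timing policies with the 7 data policies and adding the baseline. A short sketch that enumerates them is below. The list and function names are hypothetical, and the abbreviations P, PP1, PP2, PP4 are the ones used on the plots slide.

```python
from itertools import product

TIMING_POLICIES = ["P", "PP1", "PP2", "PP4"]
DATA_POLICIES = ["All", "Valid", "Dirty",
                 "WB(4,4)", "WB(8,8)", "WB(16,16)", "WB(32,32)"]


def sweep_configurations():
    """Return the baseline plus every (timing policy, data policy) pair."""
    configs = [("SRAM baseline", None)]
    configs += list(product(TIMING_POLICIES, DATA_POLICIES))
    return configs


print(len(sweep_configurations()))  # 29
```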

Outline

• Motivation and Contribution

• Refrint

• Implementation

• Evaluation Setup

• Results

– Cache hierarchy (+ DRAM access) energy

– Total energy

– Execution time

– Effectiveness of Polyphase

• Conclusion

Plots

• Retention time: 50 us

• 4 time-based policies

– Periodic (P)

– Polyphase with 1 phase (PP1)

• Lookup for invalid lines is done in the phase array, which saves cycles with respect to P

– Polyphase with 2 and 4 phases (PP2, PP4)

• 7 data-based policies

– All, Valid, Dirty

– WB(4,4), WB(8,8), WB(16,16), WB(32,32)

Note: Baseline is SRAM cache hierarchy

Cache Hierarchy Energy

• Very large reduction in refresh energy

• PP1, PP2 and PP4 do better than Periodic (P)

• PP1 is as good as PP2 and PP4

• WB(32,32) does better than the other data policies

Total On-chip Energy

• Same trends as cache hierarchy energy

• Conventional eDRAM hierarchy consumes 77% of baseline

• Refrint eDRAM hierarchy consumes 58% of baseline

Execution Time

• Conventional eDRAM hierarchy slows down by 25%

• Refrint eDRAM hierarchy slows down by only 6%

Effectiveness of Polyphase

• Kernel with fine-grained sharing: L3 sees frequent updates (more often than the refresh rate)

• Polyphase PP4 saves significant energy

• Across all data policies: PP4 > PP2 > PP1

Outline

• Motivation and Contribution

• Refrint

• Implementation

• Evaluation Setup

• Results

– L3 and L2 energy

– Total energy

– Execution time

– Effectiveness of Polyphase

• Conclusion

Conclusion

• eDRAM + Refrint shaves away most of the refresh energy

– Refrint eDRAM hierarchy

• Consumes 30% of baseline cache hierarchy energy

• Slowdown of 6%

– Conventional eDRAM hierarchy

• Consumes 56% of baseline cache hierarchy energy

• Slowdown of 25%

• Simple hardware implementation

Questions

Thank you!
