comparative analysis of contemporary cache power …
TRANSCRIPT
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Comparative Analysis ofContemporary Cache Power
Reduction Techniques
Ph.D. DissertationProposal
Samuel V. Rodriguez
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Motivation
Portable Devices
Power dissipation is important acrossthe board, not just portable devices!!
Mid-end (e.g. Desktops)
High-end
(e.g. servers)
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Motivation
-Thermal Design Power (TDP) is now apriority specification
-AMD currently can’t compete in “Thinand light” notebooks because of theirhigher TDP’s
-AMD’s power advantage in initial dual-core offerings
-An entire Intel Pentium 4 designrecently cancelled because of higherthan expected TDP’s
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Motivation
• Breakdown of power consumption for a 4-wide 200MHz 3.3V 0.35um processor with32kB/32kB/1MB caches
Photograph taken from Gurumurthi
Cache(44%)
CoreDatapath(22%)
Clock(32%)
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Motivation
• Fraction of die area and xsistorcount dedicated to caches isincreasing
Photograph taken from Weiss2002
Itanium
Die
Photo
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Presentation Outline
• Motivation (finished)• Background
– Power Dissipation– Cache/SRAM Implementation
• Contemporary Cache PowerReduction Schemes
• Proposed Work• Q&A
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
Physical Gate Length (um)
Normalized Total Chip Power
Year1990 1995 2000 2005 2010 2015 2020
0
50
100
150
200
250
300
0.000001
0.0001
0.01
1
100
Dynamic PowerSubthresholdLeakage Power
Phy.Gate Length
- Need to account for both dynamicand static power dissipation!
Graph from Kim2004
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
• Causes of dynamic power
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
• Causes of dynamic power
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
• Causes of dynamic power
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
• Powerdyn ∝ N x C x VDD2 x f
– ↑↑↑ N : Number of transistors– ↓↓ C : Device capacitance– ↓ VDD : supply voltage– ↑↑ f: Frequency
• Dynamic power trend: slow increase
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
• Causes of static power: leakagecurrents
SOURCE
GATE
DRAIN
BODY
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
• Subthreshold: 5x per generation• Gate leakage: 500x per generation!!!
SOURCE
GATE
DRAIN
BODY
SOURCE
GATE
DRAIN
BODY
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
• Subthreshold leakage is increasing:
Id,sat ∝ (Vgs – Vth) = (VDD – Vth)
• Increase: 5x per generation
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Power Dissipation)
• Gate leakage
• Gate leakage: 500x per generation!!!
SOURCE
GATE
DRAIN
BODY
Tox scaling resultingin increased gateleakage caused byoxide tunneling
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Presentation Outline
• Motivation (finished)• Background
– Power Dissipation– Cache/SRAM Implementation
• Contemporary Cache PowerReduction Schemes
• Proposed Work• Q&A
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
SA SA
TAG D ATA
SA SA
TAG D ATA
HIT
TLB
TLB _O UTOU TPU T
DATA
VIRTUA L ADD RESS
CLK
A D D R
AD S
RD /W RB
TAG W L
TA G BL/BLB
TA G O U T
TLB O U T
H IT
D ATA W L
D ATA BL/BLB
D ATA O U T
O U TPU T D ATA
Background (Cache Implementation)
2-way set-associativeCache Read
Note: (external signal)
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Cache Implementation)
• Full CMOS 6T Memory Cell
WL
BL BLB
BL BLB
WL
WL
BL/BLB
READ OP WRITE OP
PRECHARGE
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Cache Implementation)
BL BLB
WL
Simplified 8 x 8b SRAM array
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Cache Implementation)
Simplified 8 x 8b SRAM array
CLK
ADDR
WL
BL/BLB
DATA OUT
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Background (Cache Implementation)
SRAM partitioning: array is often divided into smaller “subarrays”
Addr Tag Addr Index
Subarray 3
Subarray 2
Subarray 1
Subarray 0
SubarrayID partOf ADDRIndex
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Presentation Outline
• Motivation (finished)• Background (finished)
– Power Dissipation– Cache/SRAM Implementation
• Contemporary Cache PowerReduction Schemes
• Proposed Work• Q&A
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power ReductionTechniques
NO
NO
YES
YES
NO
YES
YES
Exec-timeincrease?
N/AN/ADynamicData sizedetection
N/A55%DynamicWay-halting
N/AN/A **StaticNear-OPTprecharge
YES60-75%StaticDrowsycache
YES39%-59%StaticDRG-cache
NO80%StaticCachedecay
NON/A *StaticGated-Vdd
State-Retentive?
Est. PowerSavings
Dynamic /Static?
Scheme
* - paper only cites 62% energy-delay savings** - paper only cites 92% reduction of bitline discharge
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power ReductionTechniques (cont…)
YES
YES
NO
NO
YES
NO
NO
µARCHtransparent?
NO
NO
NO
YES
YES
NO
NO
Additionalnoiseproblems?
NOYESNOData sizedetection
NONO*NOWay-halting
YESNO*NONear-OPTprecharge
YESYESNODrowsycache
NOYESNODRG-cache
NOYESYESCachedecay
NOYESYESGated-Vdd
Variableload-hitlatency?
Accesstimeincrease?
MissRatioincrease?
Scheme
* - With proper design
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
First four techniques:Supply Gating
INPUTS
ACTIVE
OUTPUTS
INPUTS OUTPUTS
ACTIVE
ON
OFF
OFF
ON
OFF
Ileak1 Ileak2
Ileak2>>Ileak1
StackingEffect:
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
1. Gated-Vdd (circuit)
SLEEP
WL
BL BLB
High-Vt PMOS
-6TMC can be disabled by the gating transistor, resulting in less leakage
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
1. Gated-Vdd (microarchitecture : Dynamically ResIzable [DRI] Cache)
SA SA
TAG DATA
HIT
TLB
TLB_OUT
VIRTUAL ADDRESS
DATA OUT
MASK
MASKCONTROLCIRCUIT
- Mask out partof the index todynamically resizethe cache- Make this decisionbased on the cacheHit ratio- Energy-delay reduced by 62%
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
1. Gated-Vdd (microarchitecture : Dynamically ResIzable [DRI] Cache)
SA SA
TAG DATA
HIT
TLB
TLB_OUT
VIRTUAL ADDRESS
DATA OUT
MASK
MASKCONTROLCIRCUIT
Example:If MASK removesthe upper 2 bitsof the index, onlythe lower _ setsof the cache canbe accessed (allother sets aregated off)
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
2. Cache decay (concept)
Cache block first accessed
Cache block last accessed
Cache block evicted
time
-If we turn a cache block’s power offright after it is last accessed, we saveleakage power without any performancepenalty
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
2. Cache decay (circuit borrows
Gated-Vdd techniques)
SLEEP
WL
BL BLB
High-Vt PMOS
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
2. Cache decay (microarchitecture)
zero?
zero?
zero?
CLK
DEC
DEC
DEC
WORDLINE DECODER
CACHE LINES
- Static power reduced by 80%!!
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
3. Data Retention Ground (DRG) (circuit)
-DRG gates the ground of the MC’s-With careful sizing, state can be preserved!-Technique is transparent!!-Power is reduced by 39% to 59%
ROW DECODER
WL (Gated-ground control)
3/1.5 3/1.5
4/1
3/13/1
4/1
(6.5/1) x no. of columns
BL BLB
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
4. Drowsy caches (circuit)
VDD(1V) VDD(0.3V)
LowVolt LowVolt
VV DD
VDD
ACTIVE
- Drowsy caches (microarchitecture) : Simple algorithm – periodically put *every* cache line into drowsy mode- Static power reduced by 60% to 75%
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
5. Near-optimal Precharging
PRE#
-bitline leakage burns powereven in unused cache subarray(additional power is neededduring the precharge phase)
-For a given time interval, only asmall fraction of subarraysare actually used
-Bitline discharge reduced by 92%
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
5. Near-optimal Precharging
-Near-optimal precharging:stop precharging infrequently-used subarrays-Microarchitecture: countersto track subarray use, and asystem to handle variableload-hit latency
SA SA S A
PRECHARGE
GATED-PRECHARGING PRECHARGE CONTROLLER
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
6. Way-halting cache
-Perform early missdetection to stop access tocache ways that are certainto miss-Early miss detectionperformed by offloading afew tag bits into a fasterarray that performs tagcomparison early in theaccess-Power reduced by 55%
SA SA
TAG D ATA
SA SA
TAG DATA
HIT
TLB
TLB_O UTO UTPUT
DATA
VIRTUAL AD DRESS
EARLY M ISS DETECTO R
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power Reduction Techniques
7. Data Size Detection
-Not every operand uses up the maximum spaceprovided by the wordlength(e.g. ~94% of the operands in 64-bit AlphaSpecInt95 benchmarks use 32-bit or less)
-Keep track of this information to turn off theupper bits of the datapath (saving on wordline,bitline and sense-amp power)
32bits 64-bits
Plot from Brooks and Martonosi
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power ReductionTechniques
NO
NO
YES
YES
NO
YES
YES
Exec-timeincrease?
N/AN/ADynamicData sizedetection
N/A55%DynamicWay-halting
N/AN/A **StaticNear-OPTprecharge
YES60-75%StaticDrowsycache
YES39%-59%StaticDRG-cache
NO80%StaticCachedecay
NON/A *StaticGated-Vdd
State-Retentive?
Est. PowerSavings
Dynamic /Static?
Scheme
* - paper only cites 62% energy-delay savings** - paper only cites 92% reduction of bitline discharge
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Cache Power ReductionTechniques (cont…)
YES
YES
NO
NO
YES
NO
NO
µARCHtransparent?
NO
NO
NO
YES
YES
NO
NO
Additionalnoiseproblems?
NOYESNOData sizedetection
NONO*NOWay-halting
YESNO*NONear-OPTprecharge
YESYESNODrowsycache
NOYESNODRG-cache
NOYESYESCachedecay
NOYESYESGated-Vdd
Variableload-hitlatency?
Accesstimeincrease?
MissRatioincrease?
Scheme
* - With proper design
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Presentation Outline
• Motivation (finished)• Background (finished)
– Power Dissipation– Cache/SRAM Implementation
• Contemporary Cache PowerReduction Schemes
• Proposed Work• Q&A
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Proposed Work
• Detailed comparative study ofdiscussed low-power cachetechniques (and variouscombinations)
• Metrics of comparison:– Power dissipation (including
overheads)– Performance penalty (IPC and access
time)– Die area overhead– Complexity
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering
Proposed Work• Contributions
– Every scheme is put on the sameplaying field
– Schemes are made up to date withthe use of predictive 65nm/45nmtechnology
– Improved evaluation accuracy• Gate leakage is now accounted for• Careful accounting for overheads• Use of a state-of-the-art memory
system model
– Data Size Detection is proposed
University ofMaryland
Samuel RodriguezPh.D. Proposal
Department ofElectrical and
ComputerEngineering Q & A