Revisiting LP-NUCA Energy Consumption: Cache Access Policies and Adaptive Block Dropping
HIPEAC 2015, Amsterdam
D. Suárez-Gracia, Alexandra Ferrerón, L. Montesano, T. Monreal, and V. Viñals
Grupo de Arquitectura de Computadores (gaZ), Departamento de Informática e Ingeniería de Sistemas, Instituto de Investigación en Ingeniería de Aragón (I3A)
Universidad de Zaragoza
Instituto Universitario de Investigación de Ingeniería de Aragón
LP-NUCA Organization for Embedded Processing
• First- and second-level caches merged into a tiled fabric
• Specialized networks-in-cache:
  – Search
  – Transport
  – Replacement
• Single- or multi-thread programs
SEARCH: misses waste energy
DOMINOES REPLACEMENT: non-reused blocks waste energy
Objectives
• Reduce the energy consumption of LP-NUCA without sacrificing performance:
  – Is parallel access really useful or needed? Reconsider the search policy: energy reductions between 13.2% and 31.7%
  – Dominoes replacement: can we identify harmful blocks, i.e., those without short-term reuse? Track temporal locality and drop cache blocks from the RT (root tile) that have no reuse or only long-term reuse
• How can we track temporal locality? With an Adaptive Drop Ratio controller (ADR)
  – ADR minimizes replacements and migrations in low-locality phases: energy decreases by 22.7% (1 SMT) and 29% (2 SMT)
Agenda
• Search Policy
• Adaptive Replacement with ADR
• Methodology
• Evaluation
• Conclusions
Search Policy: Parallel vs. Serial Access
• Parallel: the tag and data arrays are read at the same time
• Serial: the tag array is read first; the data array is read only on a hit

             Energy (hit)    Energy (miss)    Latency (hit)
Parallel     tag + data ✔    tag + data ✖     max(tag, data) ✔
Serial       tag + data ✔    tag ✔            tag + data ✖

• High locality → parallel (latency)
• Low locality → serial (power)
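The table's trade-off can be made concrete with a small cost model. This is a minimal sketch, assuming each array has a fixed per-access energy and latency; `ArrayCost`, the function names, and the numbers below are illustrative, and the parallel miss latency (tag lookup only) is an assumption the slide does not state.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class ArrayCost:
    """Hypothetical per-access energy (arbitrary units) and latency (cycles)."""
    energy: float
    latency: int

Cost = Tuple[float, int]  # (energy, latency) of one access

def parallel_access(tag: ArrayCost, data: ArrayCost, hit: bool) -> Cost:
    # Tag and data arrays are read together, so both always spend energy.
    energy = tag.energy + data.energy
    # A hit finishes when the slower array does; a miss is assumed to be
    # known as soon as the tag comparison completes.
    latency = max(tag.latency, data.latency) if hit else tag.latency
    return energy, latency

def serial_access(tag: ArrayCost, data: ArrayCost, hit: bool) -> Cost:
    # The tag array is read first; the data array is read only on a hit.
    if hit:
        return tag.energy + data.energy, tag.latency + data.latency
    return tag.energy, tag.latency

def expected_energy(policy: Callable[[ArrayCost, ArrayCost, bool], Cost],
                    tag: ArrayCost, data: ArrayCost, hit_rate: float) -> float:
    """Average energy per access for a given hit rate."""
    hit_energy, _ = policy(tag, data, True)
    miss_energy, _ = policy(tag, data, False)
    return hit_rate * hit_energy + (1.0 - hit_rate) * miss_energy

tag, data = ArrayCost(1.0, 1), ArrayCost(4.0, 2)
print(expected_energy(parallel_access, tag, data, 0.5))  # 5.0
print(expected_energy(serial_access, tag, data, 0.5))    # 3.0
```

With these made-up costs and a 50% hit rate, the parallel policy averages 5.0 energy units per access and the serial one 3.0, while a parallel hit takes 2 cycles versus 3 for a serial hit: the high-locality/parallel, low-locality/serial split in miniature.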
Parallel vs. Serial Results
[Figure panels: 1 SMT, 2 SMT]
• Comparable performance
• The parallel policy wastes energy on high-RPKI workloads (RPKI: replacements per kilo-instruction)
Non-reused Blocks Waste Energy
• Dominoes replacement in a 3-level LP-NUCA
• Non-reused blocks:
  – are inserted and evicted up to 5 times
  – can evict other useful blocks, either from the same thread or from another thread
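To see how a block with no reuse keeps moving, here is a deliberately simplified model of dominoes replacement. It is a sketch under stated assumptions: each level is one fully associative LRU array (the real LP-NUCA spreads each level over several tiles, which is why a block can move more often than here), level 0 stands in for the RT, and `DominoesModel` and its counters are hypothetical names.

```python
from collections import OrderedDict

class DominoesModel:
    """Simplified dominoes replacement: one fully associative LRU array per level.

    Level 0 plays the role of the RT. A victim evicted from level i is inserted
    into level i + 1; a victim from the last level leaves for the next cache level.
    """

    def __init__(self, capacities):
        self.capacities = list(capacities)
        self.levels = [OrderedDict() for _ in self.capacities]
        self.migrations = 0    # insertions into levels below the RT
        self.evicted_out = []  # blocks pushed out of the whole structure

    def _insert(self, level, addr):
        # Cascade: each overflow pushes the LRU block one level further out.
        while level < len(self.levels):
            lru = self.levels[level]
            lru[addr] = True  # new entries go to the MRU end
            if level > 0:
                self.migrations += 1
            if len(lru) <= self.capacities[level]:
                return
            addr, _ = lru.popitem(last=False)  # LRU victim moves on
            level += 1
        self.evicted_out.append(addr)

    def access(self, addr):
        """Return the level that hit, or None on a miss."""
        for i, lru in enumerate(self.levels):
            if addr in lru:
                if i == 0:
                    lru.move_to_end(addr)  # RT hit: refresh LRU position
                else:
                    del lru[addr]          # transport the block to the RT
                    self._insert(0, addr)
                return i
        self._insert(0, addr)              # miss: fill into the RT
        return None
```

With one entry per level, streaming four never-reused blocks A to D already costs five migrations, and A is pushed through all three levels before leaving the structure; a later hit on B in the last level triggers another cascade of two migrations.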
Reducing Replacement Energy Waste
• Cache blocks from the RT with long-term reuse or no reuse should be dropped
• How? Selective block eviction from the RT that takes the program and its current phase into account:
  – Low locality → drop (clean blocks are discarded; dirty blocks are sent to the next cache level)
  – High locality → evict (usual behavior)
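The drop/evict choice for an RT victim can be sketched as follows. Assumptions: the drop rate acts as a probability checked against a random number (loosely mirroring the RNG and comparator of the hardware implementation), dropped addresses are recorded for the auxiliary tags, and `Block` and `handle_rt_victim` are hypothetical names rather than the authors' design.

```python
import random
from collections import deque
from dataclasses import dataclass

@dataclass
class Block:
    addr: int
    dirty: bool

def handle_rt_victim(block: Block, drop_rate: float,
                     rng: random.Random, aux_tags: deque) -> str:
    """Decide the fate of a block leaving the RT (illustrative sketch).

    Returns "insert" (usual dominoes behavior: move to the next LP-NUCA level),
    "discard" (clean block dropped) or "writeback" (dirty block dropped from
    LP-NUCA and sent to the next cache level).
    """
    # rng.random() is uniform in [0, 1): a rate of 0 never drops, 1 always drops.
    if rng.random() >= drop_rate:
        return "insert"
    aux_tags.append(block.addr)  # remember dropped addresses (auxiliary tags)
    return "writeback" if block.dirty else "discard"
```

A drop rate of 0 always gives the usual dominoes insertion, and a rate of 1 always drops: clean victims are discarded and dirty ones are written back to the next cache level.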
Temporal Locality Varies across Programs and Time
[Figure: block reuse over time, divided into epochs (axis marks at 1M, 2M, 3M); reuse intervals of < 1 Mcycles, 1–2 Mcycles, and > 2 Mcycles]
Temporal Locality Varies across Programs and Time
       Short-term reuse    Near reuse          Long-term or no reuse
ST     Keep                Keep                Drop all
MT     Keep                Careful dropping    Drop all
ST: single-thread mode; MT: multi-thread mode
[Figure: drop rate for short-term, near, long-term, and no reuse; benchmarks 473.astar, 456.hmmer, and 470.lbm]
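The keep/drop table can be encoded as a lookup. This is a sketch assuming the cycle bands from the previous slide (< 1, 1–2, > 2 Mcycles, with M read as 10^6) map onto short-term, near, and long-term reuse; that mapping and the names `Reuse`, `POLICY`, and `classify` are hypothetical, and only the table entries come from the slide.

```python
from enum import Enum
from typing import Optional

class Reuse(Enum):
    SHORT_TERM = "short-term"
    NEAR = "near"
    LONG_OR_NONE = "long-term or none"

# Table from the slide: (reuse class, mode) -> action for RT victims.
POLICY = {
    (Reuse.SHORT_TERM, "ST"): "keep",
    (Reuse.SHORT_TERM, "MT"): "keep",
    (Reuse.NEAR, "ST"): "keep",
    (Reuse.NEAR, "MT"): "careful dropping",
    (Reuse.LONG_OR_NONE, "ST"): "drop all",
    (Reuse.LONG_OR_NONE, "MT"): "drop all",
}

def classify(reuse_cycles: Optional[int]) -> Reuse:
    """HYPOTHETICAL mapping of a reuse distance (cycles) to a reuse class.

    None means the block was never reused.
    """
    if reuse_cycles is None or reuse_cycles > 2_000_000:
        return Reuse.LONG_OR_NONE
    if reuse_cycles < 1_000_000:
        return Reuse.SHORT_TERM
    return Reuse.NEAR
```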
Adaptive Drop Ratio Controller
• Periodic operation (epochs)
• State: drop rate + direction (↑, ↓)
• Local search problem:
  – Hill climbing
  – Memory and additional rules (to avoid local minima)
• Avoid on-chip power metering
• How do we know we are losing hits?
  – Auxiliary tags
[Figure: the ADR receives epoch control and performance & energy metrics, and sets the RT drop rate]
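A plain hill-climbing loop over a few discrete drop rates shows the basic idea of the controller state (drop rate + direction). It is a minimal sketch, assuming one score per epoch where higher is better and drop-rate steps of 0, 0.5, and 1 as in the ADR operation example; the memory and additional rules ADR uses to avoid local minima are deliberately left out, and the class name is hypothetical.

```python
class HillClimbingDropController:
    """Generic hill-climbing sketch of a per-thread drop-rate controller.

    State = index into a list of drop rates + a direction (+1 = up, -1 = down).
    The extra memory/rules that ADR uses to escape local optima are NOT modeled.
    """

    def __init__(self, rates=(0.0, 0.5, 1.0)):
        if len(rates) < 2:
            raise ValueError("need at least two drop rates to climb")
        self.rates = list(rates)
        self.index = 0
        self.direction = +1
        self.prev_score = None

    @property
    def drop_rate(self):
        return self.rates[self.index]

    def end_of_epoch(self, score):
        """Feed the score of the epoch just finished; return the next drop rate."""
        if self.prev_score is not None and score < self.prev_score:
            self.direction = -self.direction  # last move hurt: turn around
        self.prev_score = score
        nxt = self.index + self.direction
        if not 0 <= nxt < len(self.rates):    # hit a bound: bounce back
            self.direction = -self.direction
            nxt = self.index + self.direction
        self.index = nxt
        return self.drop_rate
```

Starting at 0 with epoch scores 100, 120, 90, 95, the controller moves to 0.5, then 1, reverses to 0.5 after the score falls, and keeps heading down to 0 while scores improve.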
ADR Operation
[Figure: timeline of epochs k to k + 5 (ages j and j + 1) for thread 0 and thread 1, alternating trial and reference epochs; drop-rate states 0, 0.5, and 1 with direction ↑/↓; example scores 120, 100, and 90 are compared ("evaluate scores")]
Scoring Function
• How do we know we are losing hits?
  – Auxiliary tags: insert the blocks we drop
• Scoring function: maximizes low-energy accesses and minimizes high-energy accesses
  – hits_i: low-energy accesses (LP-NUCA)
  – aux_tag_hits_i: high-energy accesses (accesses to dropped blocks)
  – k: energy cost ratio between both (constant)
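The slide defines the terms of the score but not its exact form. The sketch below assumes a linear combination, score = hits - k · aux_tag_hits, and a FIFO auxiliary tag array of 1024 entries (the size from the sensitivity analysis); the FIFO policy, the formula, and the names `AuxiliaryTags` and `epoch_score` are assumptions, not the authors' definition.

```python
from collections import deque

class AuxiliaryTags:
    """FIFO buffer of recently dropped block addresses (FIFO policy is ASSUMED)."""

    def __init__(self, entries: int = 1024):
        self.tags = deque(maxlen=entries)  # oldest entry falls out when full

    def insert(self, addr: int) -> None:
        self.tags.append(addr)

    def lookup(self, addr: int) -> bool:
        """True if this access would have hit had the block not been dropped."""
        return addr in self.tags

def epoch_score(hits: int, aux_tag_hits: int, k: float) -> float:
    # ASSUMED linear form: reward low-energy LP-NUCA hits and penalize
    # high-energy re-fetches of dropped blocks, weighted by k.
    return hits - k * aux_tag_hits
```

With 120 LP-NUCA hits, 5 auxiliary-tag hits, and k = 4, the assumed score is 120 - 4 · 5 = 100.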
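Hardware Implementation
[Figure: block diagram of the ADR hardware. Labeled components: processor, cache ports, RT, RT MSHR, RT replacement buffer, ADR controller (state and dir per thread, # threads, epoch id, epochs & age statistics), RNG, comparator (cmp), auxiliary tags. Signals: search message, replacement message, hits/evictions, reads, drop/insert, dropped address. Storage annotations: auxiliary tags, 1024 entries; 32 b per epoch; log2(th + 1); log2(states) + 1 per thread.]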
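Two of the storage annotations can be turned into quick arithmetic. This sketch assumes the per-thread controller state is log2(states) + 1 bits (drop-rate state plus a direction bit), rounded up to whole bits, and uses a hypothetical 20-bit tag width for the 1024-entry auxiliary tags, since the slides give no tag width; the function names are illustrative.

```python
import math

def adr_state_bits(threads: int, states: int) -> int:
    """Per-thread drop-rate state plus one direction bit, summed over threads.

    Follows the annotation log2(states) + 1 bits per thread, rounded up.
    """
    return threads * (math.ceil(math.log2(states)) + 1)

def aux_tag_bits(entries: int = 1024, tag_bits: int = 20) -> int:
    """Storage of the auxiliary tag array.

    tag_bits = 20 is a HYPOTHETICAL width chosen for illustration only.
    """
    return entries * tag_bits
```

For 2 threads and the two drop states chosen in the sensitivity analysis, the controller state is 4 bits; 1024 hypothetical 20-bit auxiliary tags take 20,480 bits (2,560 bytes).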
Methodology
• Processor model based on commercial SoCs (IBM/LSI PowerPC 476FP)
• Simulation
  – SMTScalar (SimpleScalar-based)
• Energy and delay
  – CACTI 6.5
  – Original LP-NUCA layout
• SPEC CPU2000 and CPU2006
  – SimPoints [Hamerly+06]
  – Benchmarks divided into two groups: high- and low-RPKI
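Splitting benchmarks by RPKI needs only the replacement and instruction counts. A minimal sketch; the threshold and the benchmark names in the usage note are hypothetical, since the slides do not give the cut-off between the high- and low-RPKI groups.

```python
from typing import Dict, List, Tuple

def rpki(replacements: int, instructions: int) -> float:
    """Replacements per kilo-instruction."""
    return replacements / (instructions / 1000)

def split_by_rpki(stats: Dict[str, Tuple[int, int]],
                  threshold: float) -> Tuple[List[str], List[str]]:
    """Split {benchmark: (replacements, instructions)} into (high, low) groups.

    threshold is HYPOTHETICAL: the slides do not state the cut-off used.
    """
    high: List[str] = []
    low: List[str] = []
    for name, (repl, insts) in sorted(stats.items()):
        (high if rpki(repl, insts) >= threshold else low).append(name)
    return high, low
```

For example, `split_by_rpki({"bench_a": (5000, 1_000_000), "bench_b": (200, 1_000_000)}, threshold=1.0)` returns `(["bench_a"], ["bench_b"])`, since their RPKI values are 5.0 and 0.2.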
ADR Sensitivity Analysis
Key controller parameters:
• Time- or event-based epoch? Time-based, 128 Kcycles
• Optimal number of drop states (2–9)? Two states (epoch-length dependent)
• Weight of auxiliary hits (k)? Little impact
• Auxiliary tag size? 1024 entries
• Controller delay? 50–100 cycles, no impact on performance
IPC/Energy Impact: 1 SMT
[Figure panels: IPC, Energy]
IPC/Energy Impact: 2 SMT
[Figure panels: IPC, Energy]
ED System Impact: 1 SMT
ED System Impact: 2 SMT
Conclusions
• LP-NUCA energy reduction based on adaptive mechanisms
• The serial search access policy reduces energy without hurting performance
• Replacement is optimized by means of an Adaptive Drop Ratio controller
  – Hill-climbing-based
  – Detects low-locality program phases and silently drops useless blocks during them
  – Total migrations reduced by 81%
  – Straightforward implementation, low area overhead
  – Energy reduced by 22.7% (1 SMT) and 29% (2 SMT)