


Hardware Architectures for Power and Energy Adaptation

Phillip Stanley-Marbell


Outline

Motivation

Related Research

Architecture

Experimental Evaluation

Extensions

Summary and Future work


Motivation

Power consumption is becoming a limiting factor as technology scales to smaller feature sizes
  Mobile and battery-powered computing applications
  Thermal issues in high-end servers

Low-power design is not enough: power- and energy-aware design
  Adapt to non-uniform application behavior
  Only use as many resources as the application requires

This talk: exploit the processor-memory performance gap to save power, with limited performance degradation


Related Research

Reducing power dissipation in on-chip caches
  Reducing instruction cache leakage power dissipation [Powell et al., TVLSI ’01]
  Reducing dynamic power in set-associative caches and on-chip buffer structures [Dropsho et al., PACT ’02]

Reducing power dissipation of the CPU core
  Compiler-directed dynamic voltage scaling of the CPU core [Hsu, Kremer, Hsiao, ISLPED ’01]


Target Application Class: Memory-Bound Applications

Memory-bound applications
  Limited by memory system performance

[Figure: CPU @ Vdd vs. CPU @ Vdd/2]

Single-issue in-order processors
  Limited overlap of main memory access and computation


Power-Performance Tradeoff

Detect memory-bound execution phases
  Maintain sufficient information to determine the compute / stall time ratio

Pros
  Scaling down the CPU core voltage yields significant energy savings (Energy ∝ Vdd²)

Cons
  Performance hit (delay increases as Vdd is lowered)
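
As a rough illustration of this tradeoff, the sketch below estimates relative runtime and core energy for one interval between stalls. It is not taken from the talk. It assumes, to first order, that clock frequency scales linearly with Vdd, that stall time is fixed by main memory, and that only dynamic core energy matters. The names `tradeoff_t` and `scale_core` are hypothetical.

```c
#include <assert.h>

/* Result of slowing the core, relative to running at full Vdd. */
typedef struct {
    double runtime; /* total time, relative to full speed */
    double energy;  /* dynamic core energy, relative to full Vdd */
} tradeoff_t;

/*
 * First-order model (an assumption, not the talk's power model):
 *   - the core runs `slowdown` times slower, with Vdd lowered by the same factor
 *   - stall cycles are set by main memory and do not change
 *   - energy per instruction scales as Vdd^2; leakage is ignored
 */
tradeoff_t scale_core(double compute_cycles, double stall_cycles, double slowdown)
{
    assert(slowdown >= 1.0);
    assert(compute_cycles + stall_cycles > 0.0);

    tradeoff_t r;
    r.runtime = (compute_cycles * slowdown + stall_cycles)
              / (compute_cycles + stall_cycles);
    r.energy = 1.0 / (slowdown * slowdown);
    return r;
}
```

Under these assumptions, an interval with 10 compute cycles and 100 stall cycles run at half speed (`scale_core(10, 100, 2.0)`) uses a quarter of the core energy while total runtime grows by a factor of 120/110, roughly 9%.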


Power Adaptation Unit (PAU)

Maintains information to determine the ratio of compute to stall time

Entries are allocated for instructions that cause CPU stalls

Intuitively, one table entry is required per program loop

[From S-M et al., PACS 2002]

Fields:
  State (I, A, T, V)
  Number of instructions executed (NINSTR)
  Distance between stalls (STRIDE)
  Saturating ‘Quality’ counter (Q)


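PAU Table Entry State Machine

[Figure: PAU table entry state machine, with the annotation "If CPU at-speed, slow it down"]

Slowdown factor, ∂, for a target 1% performance degradation:

$$\partial = \frac{0.01 \cdot \mathrm{STRIDE} + \mathrm{NINSTR}}{\mathrm{NINSTR}}$$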
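
The slowdown factor can be computed directly from an entry's STRIDE and NINSTR fields. The sketch below reads the slide's formula as (0.01 · STRIDE + NINSTR) / NINSTR and generalizes the 1% target to a parameter. The function name `pau_slowdown` and the `target` parameter are hypothetical, and STRIDE and NINSTR are assumed to be measured in the same unit.

```c
#include <assert.h>

/*
 * Slowdown factor for a target fractional performance degradation
 * (0.01 for the 1% target on the slide):
 *     (target * STRIDE + NINSTR) / NINSTR
 */
double pau_slowdown(double stride, double ninstr, double target)
{
    assert(ninstr > 0.0);
    assert(target >= 0.0);
    return (target * stride + ninstr) / ninstr;
}
```

For example, `pau_slowdown(1000.0, 10.0, 0.01)` is 2: stretching 10 units of compute to 20 adds 10 units, which is 1% of the 1000-unit stride.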


Example

    for (x = 100;;) {
        if (x-- > 0)
            a = i;   /* executed only during the first 100 iterations */
        b = *n;
        c = *p++;
    }

PAU table entries are created for each assignment

After 100 iterations, the assignment to a stops
  Entries for b or c can take over immediately


Experimental Methodology

Simulated the PAU as part of a single-issue embedded processor
  Used the Myrmigki simulator [S-M et al., ISLPED 2001]
  Models a Hitachi SH RISC embedded processor
    5-stage in-order pipeline
    8K unified L1, 100-cycle latency to main memory
  Empirical instruction power model, from an SH7708 device
  Voltage scaling penalty of 1024 cycles, 14 µJ

Investigated the effect of PAU table size on performance and power
  Intuitively, PAU table entries track program loops with repeated stalls


Effect of Table Size on Energy Savings

A single-entry PAU table provides a 27% reduction in energy, on average
Scaling up to a 64-entry PAU table provides only an additional 4%


Effect of Table Size on Performance

A single-entry PAU table incurs a 0.75% performance degradation, on average
A larger PAU table leads to more aggressive behavior and an increased penalty


Overall Effect of Table Size: Energy-Delay Product

Considering both performance and power, there is little benefit from larger PAU table sizes
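
To make the energy-delay product concrete, the sketch below combines an energy saving and a delay increase into a single figure relative to the baseline. The function name `relative_edp` is hypothetical. Feeding it the averages quoted in this talk only approximates the per-benchmark results, since a product of averages is not the average of the products.

```c
#include <assert.h>

/*
 * Energy-delay product relative to the baseline (1.0 = no change).
 * Both arguments are fractions, e.g. 0.27 for a 27% energy saving
 * and 0.0075 for a 0.75% increase in delay.
 */
double relative_edp(double energy_saving, double delay_increase)
{
    assert(energy_saving >= 0.0 && energy_saving <= 1.0);
    assert(delay_increase >= 0.0);
    return (1.0 - energy_saving) * (1.0 + delay_increase);
}
```

With the single-entry averages, `relative_edp(0.27, 0.0075)` is 0.73 × 1.0075, about 0.735 of the baseline energy-delay product.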


Extending the PAU structure

Multiprogramming environments

Superscalar architectures

Slowdown factor computation


PAU in Multiprogramming Environments

Only a single entry is necessary per application

Amortize memory-bound phase detection
  It would be wasteful to flush the PAU at each context switch (~10 ms)

Extend PAU entries with an ID field:
  CURID and IDMASK fields are written by the OS


PAU in Superscalar Architectures

Dependent computations are ‘stretched out’
  FUs with no dependent instructions are unduly slowed down

Maintain separate instruction counters per FU

Drawback: requires the ability to run FUs in the core at different voltages

[Figure: CPU @ Vdd vs. CPU @ Vdd/2]


Slowdown factor computation

Computation is only performed on an application phase change
  A hardware solution would be wasteful

Solution: computation by a software ISR
  Compute ∂, then look up a discrete Vdd/frequency pair by indexing into a lookup table

A similar software handler solution is proposed in [Dropsho et al., 2002]
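
The table lookup performed by such a handler might look like the sketch below. Everything specific in it is an assumption made for illustration: the table contents, the use of integer clock dividers as operating points, and the names `op_point_t`, `OP_TABLE`, and `select_op_point`. The talk does not specify how the index is derived from ∂.

```c
#include <assert.h>
#include <stddef.h>

/* One discrete operating point. All values below are made up. */
typedef struct {
    unsigned divider; /* clock divider relative to full speed */
    unsigned vdd_mv;  /* core supply voltage in millivolts */
} op_point_t;

static const op_point_t OP_TABLE[] = {
    { 1, 3300 }, { 2, 2400 }, { 3, 2000 }, { 4, 1800 },
};
#define OP_TABLE_LEN (sizeof OP_TABLE / sizeof OP_TABLE[0])

/*
 * Pick the operating point for slowdown factor `delta` by truncating it
 * to an integer divider and using that as the table index. Rounding down
 * keeps the applied slowdown at or below the computed one.
 */
const op_point_t *select_op_point(double delta)
{
    assert(delta == delta); /* reject NaN */

    if (delta < 1.0)
        return &OP_TABLE[0];
    if (delta >= (double)OP_TABLE_LEN)
        return &OP_TABLE[OP_TABLE_LEN - 1];
    return &OP_TABLE[(size_t)delta - 1]; /* divider n lives at index n - 1 */
}
```

Because the handler runs only on a phase change, a short software routine like this costs little compared with dedicating hardware to the division and lookup.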


Summary & Future Work

PAU: hardware identifies program regions (loops) with a compute / memory stall mismatch

Due to the nature of most programs, even a single-entry PAU is effective: it can achieve 27% energy savings with only 0.75% performance degradation

Proposed extensions to the PAU architecture

Future work
  Evaluations with smaller miss penalties
  Implementation of proposed extensions
  More extensive evaluation of implications of applications


Questions