
Power Management in Multicores

Minshu Zhao

Outline
Introduction
Review of power management techniques
Power management in multicores
◦ Identify multicore characteristics
◦ Apply power management techniques
Future of multicore

Review of low-power techniques
Clock gating
◦ + Can be applied at fine granularity
◦ + Saves dynamic power
◦ - Does not affect static power
Power gating
◦ + Saves both dynamic and static power
◦ - Needs microseconds to power up again
◦ - Loses data unless some form of state retention is used

[Figure: flip-flop (FF) with clock-gating enable (EN, CK) and flip-flop with power-gating enable (Vdd, EN)]

Review of low-power techniques
Voltage (frequency) scaling
◦ Scale down frequency and/or voltage, sacrificing performance for power
   $I \propto (V_{dd} - V_t) \approx V_{dd}$
   $f \propto V_{dd}$
   $P \propto C V_{dd}^2 f \propto V_{dd}^3$
Variable device threshold
◦ Use high-Vt transistors to reduce leakage
◦ + Reduces leakage
◦ - Vt is generally fixed for a given transistor
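As a rough illustration of the voltage-scaling relations above, the sketch below computes relative dynamic power and energy per task when voltage and frequency are scaled together. It assumes f ∝ Vdd and ignores leakage; the function names and the sample scaling factors are illustrative and do not come from the slides.

```python
def relative_dynamic_power(v_scale: float) -> float:
    """Dynamic power relative to nominal when Vdd is scaled by v_scale.

    Assumes P ∝ C * Vdd^2 * f with f ∝ Vdd, so P ∝ Vdd^3 (leakage ignored).
    """
    f_scale = v_scale  # assumption: frequency tracks voltage linearly
    return v_scale ** 2 * f_scale


def relative_energy_per_task(v_scale: float) -> float:
    """Energy for a fixed amount of work: power * time, with time ∝ 1 / f."""
    return relative_dynamic_power(v_scale) / v_scale


if __name__ == "__main__":
    for s in (1.0, 0.8, 0.5):
        print(f"Vdd x{s}: power x{relative_dynamic_power(s):.3f}, "
              f"energy x{relative_energy_per_task(s):.3f}")
```

Under these assumptions, halving the voltage cuts power to one eighth but energy per task only to one quarter, because the task also takes twice as long.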


Identify Multicore Characteristics
Half of the chip is cores
◦ Large dynamic power
◦ Unbalanced power consumption among cores
The other half of the chip is cache
◦ Large leakage power

Outline
Introduction
Review of power management techniques
Power management in multicores
◦ Identify multicore characteristics
◦ Apply power management techniques
   To cores
   To caches
Future of multicore

Traditional DVFS
Motivation
◦ Large computation/memory gap
Problems when applied to multicores
◦ Slow
   Microsecond timescales
◦ Coarse-grained adjustment
   Done in the operating system
◦ All cores share a single chip-wide VF setting
   Potential power savings are lost

[Figure: power supply and off-chip regulator feeding Core0, Core1, Core2, and Core3]

Per-core DVFS & on-chip regulator
On-chip vs. off-chip regulator
◦ Tens of nanoseconds vs. microseconds
Per-core vs. chip-wide DVFS
◦ Benefits heterogeneous workloads
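To make the benefit for heterogeneous workloads concrete, here is a minimal sketch comparing chip-wide and per-core DVFS. It assumes each core's power scales as the cube of its normalized frequency and ignores regulator losses and leakage; the function names and the demand values are hypothetical.

```python
def chip_wide_power(demands):
    """All cores run at the VF level needed by the most demanding core."""
    f = max(demands)
    return len(demands) * f ** 3


def per_core_power(demands):
    """Each core runs only as fast as its own workload needs."""
    return sum(f ** 3 for f in demands)


if __name__ == "__main__":
    # Normalized frequency each core needs (1.0 = full speed); made-up values.
    demands = [1.0, 0.5, 0.5, 0.5]
    print("chip-wide:", chip_wide_power(demands))
    print("per-core: ", per_core_power(demands))
```

With one busy core and three lightly loaded ones, the chip-wide setting forces all four cores to full speed, while per-core settings let the three light cores run at a fraction of that power.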

[Figure: power supply, off-chip regulator, and on-chip regulator feeding Core0, Core1, Core2, and Core3]

Wonyoung Kim, M. S. Gupta, Gu-Yeon Wei, and D. Brooks. 2008. System level analysis of fast, per-core DVFS using on-chip switching regulators. In High Performance Computer Architecture, 2008 (HPCA 2008).

Per-core DVFS & on-chip regulator
Application
◦ Multi-core global power management
   Monitor power and performance
   Apply policies through per-core DVFS
Problem
◦ Overhead is large

[Figure: activity over time for App A and App B, with low-IPC and high-IPC labels]

Thread Motion
Cores have different voltage-frequency settings
Migrate threads between cores
Apply the benefits of DVFS to program variability by observing microarchitectural events
Fast movement between cores creates an effective intermediate voltage level
Krishna K. Rangan, Gu-Yeon Wei, and David Brooks. 2009. Thread motion: fine-grained power management for multi-core systems. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09).

[Figure: High-VF and Low-VF cores]

Thread Motion
Application
◦ Thread Motion framework
   Evaluation driven by microarchitectural events
     Time-driven
     Miss-driven
   Predict IPC for the next interval
   Move the thread if needed
Problem
◦ Potential cache penalty
   Clustered multicore with a shared L1 cache within each cluster
◦ Register file transfer penalty
   Store the registers in the shared cache
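The predict-then-move loop of the framework above can be sketched as follows. This is only an illustration under stated assumptions: it uses a last-value IPC predictor and simply places the threads with the highest predicted IPC on the highest-VF cores, which is not necessarily the policy evaluated by Rangan et al. All names and sample values are hypothetical.

```python
def predict_ipc(history):
    """Last-value predictor: assume the next interval repeats the last one."""
    return history[-1]


def assign_threads(ipc_history, core_vf):
    """Map threads to cores so that higher predicted IPC gets a higher-VF core.

    ipc_history: dict thread -> list of past IPC samples
    core_vf:     dict core -> relative voltage-frequency level
    Returns a dict thread -> core.
    """
    threads = sorted(ipc_history,
                     key=lambda t: predict_ipc(ipc_history[t]),
                     reverse=True)
    cores = sorted(core_vf, key=core_vf.get, reverse=True)
    return dict(zip(threads, cores))


def moves_needed(current, target):
    """Threads whose core changes, i.e. the migrations to perform."""
    return {t: (current[t], target[t])
            for t in target if current[t] != target[t]}


if __name__ == "__main__":
    history = {"A": [1.8, 0.4], "B": [0.5, 1.6]}   # made-up IPC samples
    core_vf = {"core0": 1.0, "core1": 0.6}
    current = {"A": "core0", "B": "core1"}
    target = assign_threads(history, core_vf)
    print(moves_needed(current, target))
```

A real implementation would also weigh the cache and register-file transfer penalties listed above before deciding that a move is worthwhile.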

Heterogeneous Cores
Motivation
◦ Different applications have different resource requirements
   Large ILP -> VLIW
◦ Different power conditions
   Full battery vs. low battery
Combine existing processor architectures and select the core that minimizes energy
Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. 2004. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA '04).
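Core selection to minimize energy can be illustrated with a small sketch. It assumes that per-application IPC and average power have already been measured on each core type; the `CoreProfile` fields, the performance-floor parameter, and all numbers are hypothetical and are not taken from Kumar et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreProfile:
    name: str
    ipc: float        # instructions per cycle for this application
    freq_ghz: float   # clock frequency in GHz
    power_w: float    # average power in watts


def throughput(core):
    """Instructions per second."""
    return core.ipc * core.freq_ghz * 1e9


def energy_per_instruction(core):
    """Joules per instruction = power / throughput."""
    return core.power_w / throughput(core)


def select_core(cores, min_perf_fraction=0.0):
    """Pick the lowest-energy core that keeps at least the given fraction
    of the best available throughput."""
    best = max(throughput(c) for c in cores)
    eligible = [c for c in cores if throughput(c) >= min_perf_fraction * best]
    return min(eligible, key=energy_per_instruction)


if __name__ == "__main__":
    cores = [CoreProfile("big", ipc=2.0, freq_ghz=2.0, power_w=20.0),
             CoreProfile("little", ipc=0.8, freq_ghz=1.0, power_w=2.0)]
    print(select_core(cores).name)                         # no performance floor
    print(select_core(cores, min_perf_fraction=0.5).name)  # performance floor
```

The performance floor is one way to express the "full battery vs. low battery" trade-off: a low floor favors the frugal core, a high floor forces the fast one.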


Gated-Vdd cache
Use a high-Vt transistor to turn off the power supply
+ Reduces power when the cell is turned off
- Data stored in a cell is lost when it enters low-power mode
Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar. 2000. Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories. In Proceedings of the 2000 International Symposium on Low Power Electronics and Design (ISLPED '00). ACM, New York, NY, USA, 90-95.

[Figure: SRAM cell between Vdd and Gnd with a gated-Vdd control transistor]

Gated-Vdd cache
Application
◦ Dynamically resizable i-cache
   Evaluate the miss rate at every time interval and upsize or downsize the cache using gated-Vdd
Problem
◦ Data must be remapped on the fly
Se-Hyun Yang, Michael D. Powell, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar. 2001. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In High-Performance Computer Architecture, 2001 (HPCA).
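The interval-based resizing decision described above might look like the following sketch. It assumes a miss-count threshold per interval and power-of-two resizing within fixed bounds; the parameter names (`miss_bound`, `min_kb`, `max_kb`) and the sample trace are illustrative, and the remapping of data after a resize is not modeled.

```python
def resize_icache(size_kb, misses, miss_bound, min_kb, max_kb):
    """Return the i-cache size to use for the next interval.

    Upsize (double) when the interval's misses exceed miss_bound, downsize
    (halve) when they fall below it, and stay within [min_kb, max_kb].
    The disabled portion is assumed to be switched off with gated-Vdd.
    """
    if misses > miss_bound:
        return min(size_kb * 2, max_kb)
    if misses < miss_bound:
        return max(size_kb // 2, min_kb)
    return size_kb


if __name__ == "__main__":
    size = 64
    for misses in (10, 10, 500, 10):   # made-up miss counts per interval
        size = resize_icache(size, misses, miss_bound=100, min_kb=8, max_kb=64)
        print(size)
```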

Gated-Vdd cache
Application
◦ Cache decay
   Turn a cache line off if a set number of cycles has elapsed since its last access
   The decay interval can adapt to the program
Problem
◦ Data in a sleeping cache line is lost, so the next access to it suffers a cache miss
S. Kaxiras, Zhigang Hu, and M. Martonosi. 2001. Cache decay: exploiting generational behavior to reduce cache leakage power. In Proceedings of the 28th Annual International Symposium on Computer Architecture, pp. 240-251.
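A toy model of cache decay helps show the trade-off between powered-off lines and extra misses. The sketch assumes one idle counter per line, a fixed decay interval, and no tags (each line index stands for one block); the class and method names are hypothetical.

```python
class DecayCache:
    """Tracks which lines are powered; idle lines are turned off after
    decay_interval cycles and lose their contents."""

    def __init__(self, num_lines, decay_interval):
        self.decay_interval = decay_interval
        self.idle = [0] * num_lines          # cycles since last access
        self.powered = [False] * num_lines   # all lines start off (empty cache)

    def tick(self, cycles=1):
        """Advance time and gate off lines that have been idle too long."""
        for i in range(len(self.idle)):
            if self.powered[i]:
                self.idle[i] += cycles
                if self.idle[i] >= self.decay_interval:
                    self.powered[i] = False  # gated-Vdd: contents are lost

    def access(self, line):
        """Return True on a hit; a powered-off line always misses."""
        hit = self.powered[line]
        self.powered[line] = True            # refill and power the line
        self.idle[line] = 0
        return hit

    def powered_fraction(self):
        return sum(self.powered) / len(self.powered)


if __name__ == "__main__":
    cache = DecayCache(num_lines=4, decay_interval=100)
    print(cache.access(0))   # cold miss
    cache.tick(50)
    print(cache.access(0))   # still powered
    cache.tick(100)
    print(cache.access(0))   # decayed, so the data was lost
```

A shorter decay interval keeps fewer lines powered but turns more would-be hits into misses, which is why an interval that adapts to the program is attractive.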

ABB-Multi-threshold CMOS
Increase Vsb in sleep mode
Effectively increases Vth to reduce leakage
+ State is preserved in sleep mode
- Takes a long time to switch out of sleep mode
K. Nii et al. 1998. A low power SRAM using auto-backgate-controlled MT-CMOS. In Proc. of Int. Symp. Low Power Electronics and Design, pp. 293-298.

[Figure: ABB-MTCMOS SRAM cell with node voltage labels 1.0 V, 0 V, 1.0 V / 3.3 V, and 0 V / 1.0 V]

Drowsy Caches
Apply dynamic voltage scaling to the cache
+ Wake-up cost is small
+ State is preserved
- Does not save as much leakage power
Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, and Trevor Mudge. 2002. Drowsy caches: simple techniques for reducing leakage power. In Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA '02). IEEE Computer Society, Washington, DC, USA, 148-157.

[Figure: SRAM cell whose supply is switched between Vdd (1 V) and 0.3 V by the drowsy control signals]

Drowsy Caches
Application
◦ Simple policy
   Put all lines to sleep periodically and wake each one up when it is next accessed
◦ No-access policy
   Put to sleep only the lines that were not accessed in the window
◦ 90% of the lines can be in drowsy mode

Problem

       Normalized total energy   Normalized leakage energy   Run time increase
Avg    0.46                      0.29                        0.41%

                 Drowsy cache   Gated-Vdd
Leakage power    6.24 nW        0.02 nW
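The difference between the simple and no-access policies above can be sketched by counting wake-ups over an access trace split into windows. Because a drowsy line keeps its state, a wake-up costs extra latency rather than a miss. The sketch assumes all lines start drowsy; the function name and the sample trace are hypothetical.

```python
def count_wakeups(windows, policy):
    """Count wake-ups for an access trace split into windows.

    windows: list of lists of line indices accessed in each window
    policy:  "simple" (all lines go drowsy at each window boundary) or
             "noaccess" (only lines untouched in the window go drowsy)
    """
    awake = set()                      # assumption: every line starts drowsy
    wakeups = 0
    for accessed in windows:
        for line in accessed:
            if line not in awake:
                wakeups += 1           # drowsy line must be woken before use
                awake.add(line)
        if policy == "simple":
            awake = set()              # put every line into drowsy mode
        elif policy == "noaccess":
            awake = set(accessed)      # keep only lines touched in this window
        else:
            raise ValueError(f"unknown policy: {policy}")
    return wakeups


if __name__ == "__main__":
    trace = [[0, 1], [0, 1], [0, 2]]   # made-up accesses per window
    print("simple:   ", count_wakeups(trace, "simple"))
    print("no-access:", count_wakeups(trace, "noaccess"))
```

The simple policy keeps more lines drowsy but pays a wake-up for every line reused across windows, while the no-access policy avoids those wake-ups at the cost of leaving recently used lines at full voltage.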


Future multicore
Dark silicon (transistor under-utilization)
◦ Power constraints
   Transistors are powered down to reduce power
◦ Memory wall
   Computation stalls while waiting for memory
◦ Lack of parallelism
   There is not enough work to keep the transistors busy
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA '11).

Future multicore
Power constraints
◦ New device: FinFET
Memory wall
◦ New technology: 3D IC
Lack of parallelism
◦ Automatic parallelization

Thank you!