Power Management in Multicores
Minshu Zhao
Outline
Introduction
Review of power management techniques
Power management in multicore
◦ Identify multicore characteristics
◦ Apply power management techniques
Future of multicore
Review of Low-Power Techniques
Clock gating
◦ + Gating can be done at a fine grain
◦ + Saves dynamic power
◦ - Does not affect static power
Power gating
◦ + Saves both dynamic and static power
◦ - Needs microseconds to power up again
◦ - Loses data unless some form of state retention is used
[Figure: clock-gating and power-gating circuits (labels: FF, EN, CK, Vdd)]
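The trade-off between the two gating styles can be sketched with the usual first-order CMOS power model (dynamic power $\alpha C V_{dd}^2 f$ plus static power $V_{dd} I_{leak}$). The function names and parameter values below are hypothetical and chosen only for illustration; power gating is idealized as removing all power.

```python
# Minimal sketch, assuming the standard first-order CMOS power model:
#   P_dynamic = alpha * C * Vdd^2 * f   (switching power)
#   P_static  = Vdd * I_leak            (leakage power)
# All parameter values are hypothetical, chosen only for illustration.

def block_power(alpha, c_farads, vdd, f_hz, i_leak_amps):
    """Return (dynamic, static) power in watts for an active block."""
    dynamic = alpha * c_farads * vdd ** 2 * f_hz
    static = vdd * i_leak_amps
    return dynamic, static

def clock_gated(alpha, c_farads, vdd, f_hz, i_leak_amps):
    # Clock gating stops switching (activity factor -> 0), but the block
    # stays connected to Vdd, so leakage power is unchanged.
    return block_power(0.0, c_farads, vdd, f_hz, i_leak_amps)

def power_gated(**_params):
    # Power gating disconnects the supply; idealized here as zero dynamic
    # and zero static power. Unretained state is lost.
    return 0.0, 0.0

params = dict(alpha=0.2, c_farads=1e-9, vdd=1.0, f_hz=2e9, i_leak_amps=0.1)
for name, model in [("active", block_power), ("clock-gated", clock_gated),
                    ("power-gated", power_gated)]:
    dynamic, static = model(**params)
    print(f"{name:12s} dynamic={dynamic:.2f} W static={static:.2f} W")
```

Clock gating removes only the dynamic term, while power gating removes both, which is why it needs a slow wake-up and some form of state retention.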
Review of Low-Power Techniques
Voltage (frequency) scaling
◦ Scale down frequency and/or voltage, sacrificing performance for power:
$I \propto (V_{dd} - V_t) \sim V_{dd}$, $f \propto V_{dd}$, $P \propto C V_{dd}^2 f \propto V_{dd}^3$
Variable device threshold
◦ Use high-Vt transistors to reduce leakage
◦ + Reduces leakage
◦ - Vt is generally fixed for a given transistor
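The scaling rules above can be turned into a small worked example. This sketch assumes, as the slide does, that $f \propto V_{dd}$ and $P \propto V_{dd}^3$, and additionally assumes a purely compute-bound task whose run time is proportional to $1/f$; the function name `scaled` is hypothetical.

```python
# Sketch of the first-order DVFS scaling rules from the slide:
#   f ∝ Vdd  and  P ∝ C * Vdd^2 * f ∝ Vdd^3.
# Assumes frequency scales linearly with voltage and the workload is
# purely compute-bound (run time ∝ 1/f); both are simplifications.

def scaled(v_ratio):
    """Return relative (frequency, power, run time, energy) at v_ratio * Vdd."""
    f_ratio = v_ratio                      # f ∝ Vdd
    p_ratio = v_ratio ** 2 * f_ratio       # P ∝ V^2 f ∝ V^3
    t_ratio = 1.0 / f_ratio                # compute-bound: time ∝ 1/f
    e_ratio = p_ratio * t_ratio            # E = P * t ∝ V^2
    return f_ratio, p_ratio, t_ratio, e_ratio

for v in (1.0, 0.9, 0.8, 0.7):
    f, p, t, e = scaled(v)
    print(f"V={v:.1f}: f={f:.2f} P={p:.3f} time={t:.2f} energy={e:.2f}")
```

Under these assumptions, scaling voltage and frequency to 80% cuts power to about 51% and energy per task to 64%, at the cost of a 25% longer run time.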
Identify Multicore Characteristics
Roughly half of the chip is cores
◦ Large dynamic power
◦ Unbalanced power consumption among cores
The other half of the chip is cache
◦ Large leakage power
Apply Power Management Techniques
◦ To cores
◦ To caches
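Traditional DVFS
Motivation
◦ Large computation/memory gap: while a core waits on memory, lowering its voltage and frequency costs little performance
Problems when applied to multicore
◦ Slow: transitions take microseconds
◦ Coarse-grained: adjustment is made in the operating system
◦ Chip-wide: all cores must share a single V/F setting, losing potential power savings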
[Figure: a single off-chip regulator between the power supply and Core0 to Core3]
Per-Core DVFS & On-Chip Regulator
On-chip vs. off-chip regulator
◦ Voltage transitions take tens of nanoseconds vs. microseconds
Per-core vs. chip-wide DVFS
◦ Benefits heterogeneous workloads
[Figure: power supply and off-chip regulator feeding on-chip regulators for Core0 to Core3]
Wonyoung Kim, M. S. Gupta, Gu-Yeon Wei, and D. Brooks. 2008. System level analysis of fast, per-core DVFS using on-chip switching regulators. In High Performance Computer Architecture (HPCA 2008).
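Why per-core DVFS helps heterogeneous workloads can be shown with a toy model. It assumes (hypothetically) that each core needs a minimum normalized frequency to meet its performance target, that voltage scales with frequency so per-core power is proportional to $f^3$, and that leakage can be ignored; the core requirements are made-up numbers.

```python
# Sketch comparing chip-wide and per-core DVFS on a heterogeneous workload.
# Assumptions (hypothetical): each core needs a minimum normalized frequency
# to meet its performance target, voltage scales with frequency, and
# per-core power ∝ f^3 (from P ∝ V^3 with V ∝ f). Leakage is ignored.

# Memory-bound threads can run slowly; compute-bound threads need full speed.
required_freq = {"Core0": 1.0, "Core1": 0.6, "Core2": 0.5, "Core3": 1.0}

def core_power(f):
    return f ** 3  # normalized: 1.0 at full voltage/frequency

def chip_wide_power(reqs):
    # One shared off-chip regulator: every core runs at the highest request.
    f_chip = max(reqs.values())
    return sum(core_power(f_chip) for _ in reqs)

def per_core_power(reqs):
    # Per-core on-chip regulators: each core runs at its own requirement.
    return sum(core_power(f) for f in reqs.values())

chip = chip_wide_power(required_freq)
per_core = per_core_power(required_freq)
print(f"chip-wide: {chip:.3f}  per-core: {per_core:.3f}  "
      f"saving: {1 - per_core / chip:.1%}")
```

In this toy setting the two slow cores account for the entire difference; with a homogeneous workload (all requirements equal) the two schemes would consume the same power.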
Per-Core DVFS & On-Chip Regulator: Application
◦ Multi-core global power management: monitor power and performance, then apply policies through per-core DVFS
Problem
◦ Overhead is large
[Figure: activity over time for App A (low IPC) and App B (high IPC)]
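A global power manager of this kind could, for example, choose per-core V/F levels greedily under a chip power budget, always spending the next watt where it buys the most throughput. This is a hypothetical illustration, not the policy of any specific paper; the level tables stand in for values that would come from monitored counters and a power/performance model.

```python
# Hypothetical greedy policy for multi-core global power management:
# pick per-core DVFS levels that maximize total throughput under a chip
# power budget. Illustrative sketch only.

# Per-core (power_watts, throughput_bips) for each V/F level, lowest first.
# Values are made up.
LEVELS = {
    "A_low_ipc":  [(2.0, 0.50), (4.0, 0.60), (8.0, 0.65)],  # memory-bound
    "B_high_ipc": [(2.0, 1.00), (4.0, 1.80), (8.0, 3.00)],  # compute-bound
}

def assign_levels(levels, budget):
    """Return ({core: level_index}, total_power) chosen greedily."""
    choice = {core: 0 for core in levels}            # start at lowest level
    used = sum(levels[c][0][0] for c in levels)
    while True:
        best, best_gain = None, 0.0
        for core, lvl in choice.items():
            if lvl + 1 >= len(levels[core]):
                continue                             # already at top level
            p0, t0 = levels[core][lvl]
            p1, t1 = levels[core][lvl + 1]
            if used + (p1 - p0) > budget:
                continue                             # would exceed budget
            gain = (t1 - t0) / (p1 - p0)             # throughput per extra watt
            if gain > best_gain:
                best, best_gain = core, gain
        if best is None:
            return choice, used
        old_power = levels[best][choice[best]][0]
        choice[best] += 1
        used += levels[best][choice[best]][0] - old_power

choice, used = assign_levels(LEVELS, budget=12.0)
print(choice, used)
```

With a 12 W budget the sketch raises the high-IPC core to its top level first, because it gains far more throughput per watt than the memory-bound core.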
Thread Motion
◦ Cores have different voltage-frequency settings
◦ Migrate threads between cores to apply DVFS benefits to program variability, detected by observing microarchitectural events
◦ Fast migration creates effective intermediate voltage levels
Krishna K. Rangan, Gu-Yeon Wei, and David Brooks. 2009. Thread motion: fine-grained power management for multi-core systems. In Proceedings of the 36th annual international symposium on Computer architecture (ISCA '09).
[Figure: threads moving between High-VF and Low-VF cores]
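The "effective voltage level" idea can be illustrated as a time-weighted average: a thread that splits an interval between a high-VF and a low-VF core progresses as if it ran at an intermediate frequency. The two frequencies below are hypothetical, and migration overhead is ignored.

```python
# Sketch: fast thread migration yields an "effective" V/F level between
# the discrete settings of the cores. Assumes (hypothetically) that a
# thread's progress over an interval is the time-weighted average of the
# frequencies of the cores it ran on; migration overhead is ignored.

HIGH_VF_GHZ = 2.0   # hypothetical high-VF core frequency
LOW_VF_GHZ = 1.0    # hypothetical low-VF core frequency

def effective_frequency(schedule):
    """schedule: list of (frequency_ghz, duration) segments for one thread."""
    total_time = sum(d for _, d in schedule)
    return sum(f * d for f, d in schedule) / total_time

# 30% of the interval on the high-VF core, 70% on the low-VF core.
schedule = [(HIGH_VF_GHZ, 0.3), (LOW_VF_GHZ, 0.7)]
print(f"effective frequency: {effective_frequency(schedule):.2f} GHz")
```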
Thread Motion: Application
◦ Thread Motion framework: evaluation driven by microarchitectural events (time-driven or miss-driven); predict IPC for the next interval and move the thread if needed
Problem
◦ Potential cache penalty: use a clustered multicore with a shared L1 cache within each cluster
◦ Register file transfer penalty: store the registers in the shared cache
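A time-driven decision of this kind might look like the sketch below: at each interval boundary, predict every thread's next-interval IPC and place the highest-IPC threads on the high-VF cores, since they benefit most from a higher frequency. The last-value predictor and all names are assumptions for illustration, not the framework's actual predictor.

```python
# Hypothetical sketch of a time-driven, Thread Motion-style placement step.
# A last-value IPC predictor (an assumption) ranks threads; the top ones
# go to the high-VF cores.

def predict_ipc(history):
    """Last-value predictor: next interval's IPC = most recent IPC."""
    return history[-1]

def assign_threads(ipc_history, n_high_vf_cores):
    predicted = {t: predict_ipc(h) for t, h in ipc_history.items()}
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    high = set(ranked[:n_high_vf_cores])
    return {t: ("high-VF" if t in high else "low-VF") for t in ranked}

ipc_history = {
    "T0": [1.8, 1.7, 0.4],   # just entered a memory-bound phase
    "T1": [0.5, 0.6, 1.9],   # just entered a compute-bound phase
    "T2": [1.2, 1.1, 1.0],
    "T3": [0.3, 0.3, 0.2],
}
placement = assign_threads(ipc_history, n_high_vf_cores=2)
print(placement)
```

A thread moves only when its placement changes between intervals; a fuller design would also weigh the cache and register-file transfer penalties listed above before migrating.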
Heterogeneous Cores
Motivation
◦ Different applications have different resource requirements (e.g., large ILP -> VLIW)
◦ Different power conditions: full battery vs. low battery
Combine existing processor architectures on one chip and select the core that minimizes energy
Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. 2004. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance. In Proceedings of the 31st annual international symposium on Computer architecture (ISCA '04).
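Core selection to minimize energy can be sketched as choosing, for an application phase, the core with the lowest energy (power × run time), optionally subject to a run-time limit when performance also matters. The core types and per-phase numbers are hypothetical.

```python
# Hypothetical sketch of core selection on a heterogeneous multicore:
# run a phase on the core that minimizes energy = power * run time,
# optionally subject to a run-time limit. Names and numbers are made up.

CORES = {
    # core type: (average power in watts, run time in seconds) for this phase
    "big_out_of_order": (10.0, 1.0),
    "medium":           (4.0, 1.6),
    "little_in_order":  (1.0, 4.5),
}

def select_core(cores, max_time=None):
    """Return the minimum-energy core that satisfies max_time (if given)."""
    feasible = {
        name: power * time
        for name, (power, time) in cores.items()
        if max_time is None or time <= max_time
    }
    if not feasible:
        raise ValueError("no core meets the performance constraint")
    return min(feasible, key=feasible.get)

print(select_core(CORES))                # energy only
print(select_core(CORES, max_time=2.0))  # energy with a run-time limit
```

Without a constraint the slow, low-power core wins (4.5 J vs. 6.4 J and 10 J); adding a run-time limit shifts the choice to the medium core.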
Apply Power Management Techniques: To Caches
Gated-Vdd Cache
Use a high-Vt transistor to turn off the power supply
+ Reduces power when turned off
- Data stored in low-power mode are lost
Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar. 2000. Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories. In Proceedings of the 2000 international symposium on Low power electronics and design (ISLPED '00). ACM, New York, NY, USA, 90-95.
[Figure: SRAM cell between Vdd and Gnd with a gated-Vdd control transistor]
Gated-Vdd Cache: Application
◦ Dynamically resizable i-cache: evaluate the miss rate at every time interval and upsize/downsize the cache using gated-Vdd
Problem
◦ Data must be remapped on the fly
S. Yang, M. D. Powell, B. Falsafi, K. Roy, and T. N. Vijaykumar. 2001. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In High-Performance Computer Architecture (HPCA 2001).
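The interval-based resizing can be sketched as follows: at the end of each interval, compare the miss count with a miss bound; with few misses, halve the active size (gating off the unused portion), and with many misses, double it, within minimum and maximum sizes. The function and parameter names and values are hypothetical.

```python
# Sketch of an interval-based resizing policy for a dynamically resizable
# i-cache using gated-Vdd. Names and values are hypothetical.

def resize(active_kb, misses, miss_bound, min_kb, max_kb):
    """Return the active cache size (KB) for the next interval."""
    if misses < miss_bound and active_kb > min_kb:
        return max(min_kb, active_kb // 2)   # downsize: gate off unused sets
    if misses > miss_bound and active_kb < max_kb:
        return min(max_kb, active_kb * 2)    # upsize: power sets back on
    return active_kb                         # keep the current size

size = 64
for interval_misses in [100, 120, 900, 3000, 150]:
    size = resize(size, interval_misses, miss_bound=500, min_kb=4, max_kb=64)
    print(f"misses={interval_misses:5d} -> active size {size} KB")
```

Because changing the number of active sets changes which address bits select a set, blocks cached under the old size can map to different sets, which is the on-the-fly remapping problem noted above.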
Gated-Vdd Cache: Application
◦ Cache decay: turn a cache line off once a set number of cycles has elapsed since its last access
◦ The decay interval can adapt to the program
Problem
◦ Data in a sleeping cache line is lost, so the next access to it suffers a cache miss
S. Kaxiras, Zhigang Hu, and M. Martonosi. 2001. Cache decay: exploiting generational behavior to reduce cache leakage power. In Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA 2001), 240-251.
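Cache decay can be sketched as a direct-mapped cache in which each line records its last access cycle; a line idle for longer than the decay interval is treated as switched off, so its data is lost and the next access misses. Full timestamps are used here for clarity (a hardware design would use small per-line counters), and the class name is hypothetical.

```python
# Minimal sketch of cache decay in a direct-mapped cache. A line idle for
# more than `decay_interval` cycles is switched off (gated-Vdd) and loses
# its data. Full timestamps are a simplification of hardware counters.

class DecayCache:
    def __init__(self, n_lines, decay_interval):
        self.decay_interval = decay_interval
        self.tags = [None] * n_lines         # None = invalid or powered off
        self.last_access = [0] * n_lines

    def access(self, addr, cycle):
        """Look up `addr` at `cycle`; return True on hit, False on miss."""
        idx = addr % len(self.tags)
        tag = addr // len(self.tags)
        if cycle - self.last_access[idx] > self.decay_interval:
            self.tags[idx] = None            # line decayed: data lost
        hit = self.tags[idx] == tag
        if not hit:
            self.tags[idx] = tag             # refill and power the line on
        self.last_access[idx] = cycle
        return hit

    def powered_fraction(self, cycle):
        """Fraction of lines valid and not yet decayed at `cycle`."""
        on = sum(
            1 for t, last in zip(self.tags, self.last_access)
            if t is not None and cycle - last <= self.decay_interval
        )
        return on / len(self.tags)

cache = DecayCache(n_lines=4, decay_interval=1000)
print(cache.access(8, cycle=0))     # False (cold miss)
print(cache.access(8, cycle=500))   # True (within the decay interval)
print(cache.access(8, cycle=2000))  # False (line decayed and was turned off)
```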
ABB Multi-Threshold CMOS (ABB-MTCMOS)
Increase Vsb in sleep mode, which effectively increases Vth and reduces leakage
+ State is preserved in sleep mode
- Needs a long time to switch out of sleep mode
K. Nii et al. 1998. A low power SRAM using auto-backgate-controlled MT-CMOS. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '98), 293-298.
[Figure: ABB-MTCMOS SRAM bias voltages in active / sleep mode (1.0 V / 3.3 V, 0 V / 1.0 V)]
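The effect of raising Vsb can be quantified with the standard body-effect equation $V_{th} = V_{th0} + \gamma(\sqrt{2\phi_F + V_{sb}} - \sqrt{2\phi_F})$ combined with the exponential subthreshold-leakage model $I_{leak} \propto e^{-V_{th}/(n v_T)}$, for an NMOS device. The device parameters below are hypothetical, typical-looking values, not those of the cited design.

```python
import math

# Sketch: why raising the source-body voltage (Vsb) in sleep mode cuts
# leakage (NMOS). Body effect:
#   Vth = Vth0 + gamma * (sqrt(2*phi_F + Vsb) - sqrt(2*phi_F))
# Subthreshold leakage: I_leak ∝ exp(-Vth / (n * v_T)).
# All device parameters are hypothetical.

VTH0 = 0.30      # zero-bias threshold voltage (V)
GAMMA = 0.4      # body-effect coefficient (V^0.5)
TWO_PHI_F = 0.7  # 2 * Fermi potential (V)
N = 1.5          # subthreshold slope factor
V_T = 0.026      # thermal voltage kT/q near room temperature (V)

def vth(vsb):
    return VTH0 + GAMMA * (math.sqrt(TWO_PHI_F + vsb) - math.sqrt(TWO_PHI_F))

def relative_leakage(vsb):
    """Leakage at this Vsb relative to Vsb = 0 (active mode)."""
    return math.exp(-(vth(vsb) - vth(0.0)) / (N * V_T))

for vsb in (0.0, 0.5, 1.0):
    print(f"Vsb={vsb:.1f} V: Vth={vth(vsb):.3f} V, "
          f"leakage x{relative_leakage(vsb):.4f}")
```

With these values, a 1 V source-body bias raises Vth by roughly 0.19 V and cuts subthreshold leakage by about two orders of magnitude while the cell stays powered, so its state is kept.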
Drowsy Caches
Apply dynamic voltage scaling to the cache
+ Wake-up cost is small
+ State is preserved
- Does not save as much leakage power as gated-Vdd
Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, and Trevor Mudge. 2002. Drowsy caches: simple techniques for reducing leakage power. In Proceedings of the 29th annual international symposium on Computer architecture (ISCA '02). IEEE Computer Society, Washington, DC, USA, 148-157.
[Figure: SRAM cell whose supply switches between Vdd = 1 V and 0.3 V under a drowsy control signal]
Drowsy Caches: Application
◦ Simple policy: periodically put all lines into drowsy mode and wake lines up when they are accessed
◦ No-access policy: put lines that were not accessed in the window into drowsy mode
◦ 90% of the lines can be in drowsy mode
Results (average): normalized total energy 0.46, normalized leakage energy 0.29, run-time increase 0.41%
Problem
◦ Leakage power: drowsy cache 6.24 nW vs. gated-Vdd 0.02 nW
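The simple policy can be sketched as a trace-driven count: at every window boundary all lines go drowsy, and a line's first access in a window wakes it (paying a small penalty) and keeps it awake until the next sweep. The trace, window size, and 1-cycle penalty are hypothetical, and the penalty is tallied without delaying later accesses.

```python
# Sketch of the "simple" drowsy-cache policy. Every `window` cycles all
# lines are put into drowsy (low-voltage, state-preserving) mode; a drowsy
# line is woken on access and stays awake until the next sweep. The trace,
# window size, and 1-cycle penalty are hypothetical.

def simulate_simple_policy(n_lines, trace, window, total_cycles, wake_penalty=1):
    """trace: iterable of (cycle, line_index) accesses.

    Returns (drowsy_fraction, extra_cycles): the fraction of line-cycles
    spent in drowsy mode and the total wake-up penalty paid.
    """
    woken = set()              # (window_id, line) pairs already awake
    awake_line_cycles = 0
    extra_cycles = 0
    for cycle, line in trace:
        window_id = cycle // window
        if (window_id, line) not in woken:
            # First access to this line since the last sweep: it is drowsy.
            woken.add((window_id, line))
            extra_cycles += wake_penalty
            sweep = min((window_id + 1) * window, total_cycles)
            awake_line_cycles += sweep - cycle   # awake until the next sweep
    drowsy_fraction = 1 - awake_line_cycles / (n_lines * total_cycles)
    return drowsy_fraction, extra_cycles

trace = [(10, 0), (20, 0), (50, 1), (150, 0), (390, 0)]
frac, extra = simulate_simple_policy(n_lines=8, trace=trace,
                                     window=100, total_cycles=400)
print(f"drowsy fraction: {frac:.1%}, wake-up penalty: {extra} cycles")
```

Sparse reuse, as in this toy trace, is what lets most lines stay drowsy at the cost of only a few wake-up cycles.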
Future Multicore
Dark silicon (transistor under-utilization)
◦ Power constraints: transistors must be powered down to reduce power
◦ Memory wall: computation waits for memory before it can continue
◦ Lack of parallelism: not enough work to keep the transistors busy
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In Proceeding of the 38th annual international symposium on Computer architecture (ISCA '11).
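The power-constraint cause of dark silicon can be illustrated with back-of-the-envelope arithmetic. The sketch assumes (hypothetically) that each generation doubles the number of cores, that per-core power only falls to 70% of the previous generation, and that the chip power budget is fixed at 100 W; it is not the model from the cited paper.

```python
# Back-of-the-envelope sketch of dark silicon under a fixed power budget.
# Hypothetical assumptions: each generation doubles the number of cores on
# the same die, per-core power only drops to POWER_SCALE of the previous
# generation, and the chip power budget stays fixed.

POWER_SCALE = 0.7      # hypothetical per-generation power reduction per core
BUDGET_W = 100.0       # hypothetical fixed chip power budget

def active_fraction(budget_w, n_cores, core_power_w):
    """Fraction of cores that can run at full speed within the budget."""
    return min(1.0, budget_w / (n_cores * core_power_w))

n_cores, core_power = 16, 6.25   # generation 0 exactly fills the budget
for gen in range(4):
    frac = active_fraction(BUDGET_W, n_cores, core_power)
    print(f"gen {gen}: {n_cores:3d} cores, {frac:.0%} can be powered on")
    n_cores *= 2
    core_power *= POWER_SCALE
```

Under these assumptions the powered-on fraction falls to roughly 71%, 51%, and 36% over the next three generations; the rest of the chip must stay powered down.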
Future Multicore
Power constraints
◦ New device: FinFET
Memory wall
◦ New technology: 3D IC
Lack of parallelism
◦ Auto-parallelization