Green Clouds – Power Consumption as a First Order Criterion
Karsten Schwan, Sudhakar Yalamanchili, Ada Gavrilovska, Hrishikesh Amur, Bhavani Krishnan, Surabhi Diwan,
Nikhil Sathe, Minki Lee, Saibal Mukhopadhyay, … (CERCS)
Yogendra Joshi, Pramod Kumar, Emad Samadiani (CEETHERM)
Georgia Institute of Technology
Green Computing Initiative
• An eco-system of projects addressing multiple stack layers
• Multiple faculty and students involved
• ECE, ME, CS
Circuit level: DVFS, power states, clock gating (ECE)
Chip and Package: power multiplexing, spatiotemporal migration (SCS, ECE)
Board: VirtualPower, scheduling/scaling/operating system… (SCS, ME, ECE)
Rack: mechanical design, thermal and airflow analysis, VPMTokens, OS and management (ME, SCS)
Power distribution and delivery (ECE)
http://img.all2all.net/main.php?g2_itemId=157
Datacenter and beyond: design, IT management, HVAC control… (ME, SCS, OIT…)
Modeling and Control Across Entire Stack
• Seek a fundamental understanding of relationships between performance, power distribution, energy consumption, heat generation and cooling technologies at all levels of the stack
• Develop, model, and assess (new) principles for energy and thermal management
– Coordinated management across the entire stack
– Example concepts
• Couple cooling and workload generation: thermal flow control to respond to load conditions
• Couple power distribution and workload generation: adapt to power capacity (time of day?)
• Pro-active spatio-temporal migration driven by physics of heat generation/flow rather than reactive sensor driven techniques
Sample Projects
• Understanding of power distribution opportunities at the chip level
• Platform-level coordination of power management methods
– DVFS, scheduling, idle states
– Understanding of impact of these approaches
• Distributed power management methods
• IT & environmental factor management
– Temperature and air velocity, HVAC control
• Management architecture for virtualized platforms
Thermal and Power Scaling Limits-On-Chip
Power Limited Performance
Temperature Limited Performance
Mukhopadhyay and Yalamanchili (2008)
[Figure: thermal plots for the unmanaged and managed cases; scale marks 45°, 70°, 75°; 30 ms]
• 64 on-tiles
• 256 total tiles
• 100K-cycle time slice interval @ 3 GHz
Unmanaged Thermal Behavior
Spatial gradient: 10.5° @ 0.75 mm; temporal gradient: 2.5° @ 100K cycles
Managed Thermal Behavior (multiplexed power)
Spatial gradient: 2.5° @ 0.75 mm; temporal gradient: 1.99° @ 100K cycles
Courtesy: Nikhil Sathe
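The power multiplexing setup on this slide can be illustrated with a minimal sketch. It reads "64 on-tiles" as 64 of the 256 tiles being powered at any time, and it assumes a plain round-robin rotation of the active set once per time slice. Both the rotation policy and all function names are hypothetical; the slide does not state which migration policy was used.

```python
# Hypothetical sketch: rotate which 64 of 256 tiles are powered on,
# one rotation step per time slice. Round-robin is an assumed policy.

TOTAL_TILES = 256
ON_TILES = 64
SLICE_CYCLES = 100_000   # time slice length in cycles (from the slide)
CLOCK_HZ = 3e9           # 3 GHz clock (from the slide)


def slice_duration_us(cycles=SLICE_CYCLES, clock_hz=CLOCK_HZ):
    """Wall-clock length of one time slice in microseconds."""
    return cycles / clock_hz * 1e6


def active_tiles(slice_index, total=TOTAL_TILES, on=ON_TILES):
    """Return the set of tile ids powered on during the given slice."""
    start = (slice_index * on) % total
    return {(start + i) % total for i in range(on)}


def duty_cycle(num_slices, total=TOTAL_TILES, on=ON_TILES):
    """Fraction of the observed slices during which each tile was on."""
    counts = [0] * total
    for s in range(num_slices):
        for tile in active_tiles(s, total, on):
            counts[tile] += 1
    return [c / num_slices for c in counts]
```

Under these assumptions one slice lasts about 33 µs (100,000 cycles at 3 GHz), and after four slices every tile has been active exactly a quarter of the time, which is what spreads the heat load across the die.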
The Need for Feedback
Power distribution network
Spatiotemporal migration
Thermal profile
Co-design power distribution/architecture management@chip and multi-chip
Co-exploration of thermal management/architecture management
[Figure: four tiles, each labeled Local Memory, Cache, and core]
Towards Integrated Platform Power Management
• Multiple techniques exist for power management on platforms.
VirtualPower: Coordinated Power Management
• Coordinated power management (DVFS) + load management (migration) + CPU management (credit based soft scaling) -> cumulative 34% reduction in power resources without SLA degradation on the RUBiS benchmark
• (R. Nathuji, K. Schwan, SOSP0)
[Figure: VirtualPower architecture; labels: Hardware, Hypervisor, Dom0, guest VMs (Application, OS, PM Policy, VPM States), VPM Channel, VPM Rules, VPM Mechanisms]
Platform-level Power Management Methods: Costs and Opportunities
• Goals
– Identify and quantify the reasons for performance degradation associated with each power management method
– Estimate the power savings from each method
• Use this information to support better runtime management decisions
• Current focus: DVFS
– Bounds on degradation based on system profiling and runtime performance counter information
• Other methods next, including idle states
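One common way to bound DVFS-induced degradation is a two-component execution time model, sketched below. It assumes that the CPU-bound part of execution time scales with the inverse of frequency while the memory-bound part is unaffected. This model and the names `dvfs_slowdown`, `slowdown_bounds`, and `cpu_bound_fraction` are assumptions for illustration; the slides do not say which model the project uses. In practice the CPU-bound fraction would be estimated from profiling and performance counters.

```python
def dvfs_slowdown(f_max_ghz, f_new_ghz, cpu_bound_fraction):
    """Predicted slowdown T(f_new) / T(f_max) under the assumed model.

    cpu_bound_fraction is the share of execution time at f_max that
    scales with frequency; the remainder is treated as memory-bound.
    """
    if not 0.0 <= cpu_bound_fraction <= 1.0:
        raise ValueError("cpu_bound_fraction must be in [0, 1]")
    if not 0.0 < f_new_ghz <= f_max_ghz:
        raise ValueError("need 0 < f_new_ghz <= f_max_ghz")
    scaled = cpu_bound_fraction * (f_max_ghz / f_new_ghz)
    unscaled = 1.0 - cpu_bound_fraction
    return scaled + unscaled


def slowdown_bounds(f_max_ghz, f_new_ghz):
    """Lower and upper bound on slowdown: fully memory-bound (no
    slowdown) and fully CPU-bound (slowdown equals frequency ratio)."""
    return 1.0, f_max_ghz / f_new_ghz
```

For example, halving the frequency from 3.0 GHz to 1.5 GHz for a workload that is half CPU-bound gives a predicted slowdown of 1.5x under this model, between the bounds of 1.0x and 2.0x.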
Virtualization Support for Power Budgeting
[Figure: a group-level power budget split into per-platform power budgets, with VM1–VM5 hosted on the platforms]
Group level power budget distributed amongst underlying platforms based on system utilization, static priority, etc
Platform power cap directly affects VM performance
Goal: Develop system support for improving aggregate QoS/performance of VMs in distributed power budgeted environments
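The group-level budget distribution described above can be sketched as a weighted proportional split. The weighting (utilization multiplied by a static priority) and the name `distribute_budget` are assumptions chosen for illustration; the slide only says that utilization, static priority, "etc." feed into the decision.

```python
def distribute_budget(group_budget_w, platforms):
    """Split a group power budget (watts) across platforms.

    platforms maps a platform name to (utilization, priority), with
    utilization in [0, 1] and priority a positive weight. Each
    platform receives a share proportional to utilization * priority.
    """
    if not platforms:
        return {}
    weights = {name: util * prio for name, (util, prio) in platforms.items()}
    total = sum(weights.values())
    if total == 0:
        # All platforms idle: fall back to an equal split.
        share = group_budget_w / len(platforms)
        return {name: share for name in platforms}
    return {name: group_budget_w * w / total for name, w in weights.items()}
```

With a 1000 W group budget and two equal-priority platforms at 75% and 25% utilization, this split yields caps of 750 W and 250 W.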
VPMTokens Benefits
[Figure: Normalized Application QoS (y-axis, 0.4 to 1.1) vs. Platform Budget (%) (x-axis, 100% down to 60%); series: Trans (low rate), Trans (high rate), Trans (high rate w/VPM channel)]
Without VPM channel feedback, the high rate transaction application experiences a QoS impact
VPM input allows the system to dynamically move budget from the low rate to the high rate VM, reducing overall performance impact within budget constraints
• Experiment: High rate and low rate transaction VMs running on a P4 platform
• Both VMs have the same utility value, therefore equal allocation without VPM channel input
• (R. Nathuji, K. Schwan, HPDC08)
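The budget shift in this experiment can be sketched as a rebalancing step that moves unused allocation from VMs with slack to VMs with unmet demand while keeping the platform total fixed. This is a simplified illustration, not the VPM tokens mechanism itself: the assumption that each VM reports a numeric demand over the VPM channel, and the name `rebalance`, are hypothetical.

```python
def rebalance(allocations, demands):
    """Move spare budget from VMs with slack to VMs with unmet demand.

    allocations and demands map VM name -> budget units. The total
    allocation is preserved; only its distribution changes.
    """
    slack = {vm: max(allocations[vm] - demands[vm], 0.0) for vm in allocations}
    deficit = {vm: max(demands[vm] - allocations[vm], 0.0) for vm in allocations}
    pool = sum(slack.values())
    need = sum(deficit.values())
    if pool == 0 or need == 0:
        return dict(allocations)  # nothing to give or nothing needed
    moved = min(pool, need)
    result = {}
    for vm in allocations:
        give = moved * slack[vm] / pool    # donors give in proportion to slack
        take = moved * deficit[vm] / need  # receivers take in proportion to deficit
        result[vm] = allocations[vm] - give + take
    return result
```

Starting from the equal split used in the experiment (for instance 50 units each), a low rate VM that needs only 20 and a high rate VM that needs 70 end up with 30 and 70 units.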
CoolIT: Coordinated IT and environmental/thermal management
[Figure: data center layout with a CRAC unit, racks R1–R8, and a cold aisle]
Energy tradeoff
• Increased airflow rates provide improved ability to dissipate server heat loads
• Airflow required to meet maximum inlet temperature varies based upon server loads and physical distribution of heat
Use of power budgeting in IT
• Efficient operating point for cooling infrastructure implies load constraints
• Power capping capabilities enable dynamic compliance when DPM alone cannot meet constraints
CoolIT Approach
• Objectives:
– VM load distribution strategy:
  • Inlet temperature in the data center should be below 32 °C (or target temperature)
  • Server power consumption and cooling power are limited
– Heat load within limits based on cooling capacity
– Hot-spot avoidance in the face of load and platform heterogeneity
• Model output:
– target utilization for different platforms based on inlet temperature, air velocity, average utilization
• Offline profiles:
– power vs CPU utilization for different architectures
– other resources next
• Dynamic load management:
– Currently first fit bin-packing algorithm
– Driven by management server
• Input from mgt domains and thermocouple sensors
– OSIsoft PI server
  • dom0 -> PI server SNMP-based infrastructure
  • can also drive VM migration
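The first fit bin-packing step can be sketched as follows, using the model's per-platform target utilization as the bin capacity. The data layout (dictionaries of CPU load fractions and utilization caps) and the name `first_fit` are assumptions for illustration; the slides do not describe CoolIT's actual data structures.

```python
def first_fit(vm_loads, target_utilization):
    """Place each VM on the first host that can still absorb its load.

    vm_loads maps VM name -> CPU load as a fraction of one host.
    target_utilization maps host name -> utilization cap (hypothetical
    stand-in for the model output); hosts are tried in dict order.
    Returns (placement, unplaced).
    """
    used = {host: 0.0 for host in target_utilization}
    placement = {}
    unplaced = []
    for vm, load in vm_loads.items():
        for host, cap in target_utilization.items():
            # Small tolerance so float rounding does not reject an exact fit.
            if used[host] + load <= cap + 1e-9:
                used[host] += load
                placement[vm] = host
                break
        else:
            unplaced.append(vm)  # no host had room within its target
    return placement, unplaced
```

A VM that fits nowhere is reported rather than forced onto a host, since exceeding a target utilization would risk violating the inlet temperature or heat load limits the targets encode.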
Management Architecture
• Management brokers
– make and enforce ‘localized’ management decisions
  • within VMs
  • VMM-level – CPU scheduling, allocation of memory or device resources, ..
  • at hardware level
• Management channels
– enable inter-broker coordination through well-defined interfaces
– event and shared memory based
• Management VMs
– platform wide policies and cross-platform coordination
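The broker and channel roles can be illustrated with a small in-process sketch of the event-based case. The class names, the `notify` and `run_once` methods, and the policy callback are all hypothetical; a real channel would cross VM boundaries (for example through hypervisor event mechanisms or shared memory) rather than use a Python queue.

```python
from collections import deque


class ManagementChannel:
    """One-directional event queue standing in for a management channel."""

    def __init__(self):
        self._events = deque()

    def send(self, event_type, payload):
        self._events.append((event_type, payload))

    def receive(self):
        """Return the oldest pending event, or None if the queue is empty."""
        return self._events.popleft() if self._events else None


class Broker:
    """Makes a localized decision for each event arriving on its inbox."""

    def __init__(self, name, policy):
        self.name = name
        self.policy = policy  # callable: (event_type, payload) -> decision
        self.inbox = ManagementChannel()
        self.decisions = []

    def notify(self, other, event_type, payload):
        """Coordinate with another broker through its channel interface."""
        other.inbox.send(event_type, payload)

    def run_once(self):
        """Drain the inbox, applying the local policy to every event."""
        while (event := self.inbox.receive()) is not None:
            self.decisions.append(self.policy(*event))
        return self.decisions
```

In this sketch a broker inside a VM could call `notify` on a VMM-level broker, which then decides locally how to react, for instance by adjusting that VM's CPU share.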
CERCS Distributed Cloud Infrastructure