
Green Clouds – Power Consumption as a First Order Criterion

Karsten Schwan, Sudhakar Yalamanchili, Ada Gavrilovska, Hrishikesh Amur, Bhavani Krishnan, Surabhi Diwan, Nikhil Sathe, Minki Lee, Saibal Mukhopadhyay, … (CERCS)

Yogendra Joshi, Pramod Kumar, Emad Samadiani (CEETHERM)

Georgia Institute of Technology

Green Computing Initiative

• An eco-system of projects addressing multiple stack layers

• Multiple faculty and students involved

• ECE, ME, CS

Circuit level: DVFS, power states, clock gating (ECE)

Chip and package: power multiplexing, spatiotemporal migration (SCS, ECE)

Board: VirtualPower, scheduling/scaling/operating system… (SCS, ME, ECE)

Rack: mechanical design, thermal and airflow analysis, VPTokens, OS and management (ME, SCS)

Datacenter and beyond: design, IT management, HVAC control… (ME, SCS, OIT…)

Power distribution and delivery (ECE)

Image: http://img.all2all.net/main.php?g2_itemId=157


Modeling and Control Across Entire Stack

• Seek a fundamental understanding of relationships between performance, power distribution, energy consumption, heat generation and cooling technologies at all levels of the stack

• Develop, model, and assess (new) principles for energy and thermal management

– Coordinated management across the entire stack

– Example concepts

• Couple cooling and workload generation: thermal flow control to respond to load conditions

• Couple power distribution and workload generation: adapt to power capacity (time of day?)

• Proactive spatiotemporal migration driven by the physics of heat generation and flow, rather than reactive, sensor-driven techniques

Sample Projects

• Understanding of power distribution opportunities at the chip level

• Platform-level coordination of power management methods

– DVFS, scheduling, idle states

– Understanding of the impact of these approaches

• Distributed power management methods

• IT & environmental factor management

– Temperature and air velocity, HVAC control

• Management architecture for virtualized platforms


Thermal and Power Scaling Limits-On-Chip

Power-limited performance and temperature-limited performance (Mukhopadhyay and Yalamanchili, 2008)

[Figure: two thermal maps, unmanaged vs. managed (multiplexed power); scale labels 75°, 70°, 45°; time span 30 ms. Courtesy: Nikil Sathe]

• 64 on-tiles

• 256 total tiles

• 100K time slice interval @ 3 GHz

Unmanaged thermal behavior: spatial gradient 10.5° @ 0.75 mm; temporal gradient 2.5° @ 100K cycles

Managed thermal behavior (multiplexed power): spatial gradient 2.5° @ 0.75 mm; temporal gradient 1.99° @ 100K cycles
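To put the slide's numbers in wall-clock terms, the short calculation below converts the time slice to microseconds and computes the share of tiles powered at once. It assumes that "100K" means 100,000 clock cycles (suggested by the "@ 100K cycles" gradient labels) and that the 30 ms label is the length of the plotted window; both readings are assumptions, not statements from the slide.

```python
# Figures from the slide; reading "100K" as clock cycles is an assumption.
CLOCK_HZ = 3e9            # 3 GHz
SLICE_CYCLES = 100_000    # time slice interval, assumed to be in cycles
ON_TILES = 64             # tiles powered at any one time
TOTAL_TILES = 256
WINDOW_S = 0.030          # 30 ms, assumed to be the plotted time window

slice_seconds = SLICE_CYCLES / CLOCK_HZ
on_fraction = ON_TILES / TOTAL_TILES
slices_in_window = WINDOW_S / slice_seconds

print(f"time slice: {slice_seconds * 1e6:.1f} us")
print(f"powered tiles: {on_fraction:.0%}")
print(f"slices per window: {slices_in_window:.0f}")
```

Under these assumptions, a quarter of the tiles are on at any time, and power is re-multiplexed roughly every 33 microseconds.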

The Need for Feedback

Power distribution network

Spatiotemporal migration

Thermal profile

Co-design power distribution/architecture management@chip and multi-chip

Co-exploration of thermal management/architecture management

[Figure: four tiles, each labeled with a core, cache, and local memory]

Towards Integrated Platform Power Management

• Multiple techniques exist for power management on platforms.

VirtualPower: Coordinated Power Management

• Coordinated power management (DVFS) + load management (migration) + CPU management (credit-based soft scaling) -> cumulative reduction of 34% in power resources without SLA degradation on the RUBiS benchmark

• (R. Nathuji, K. Schwan, SOSP0)
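One way to picture how DVFS and credit-based soft scaling can be combined is sketched below. This is an illustrative policy, not the VirtualPower implementation: the P-state table, the `coordinate` function, and the rule "run the hardware at the lowest frequency that satisfies the most demanding VM, then cap the others in the scheduler" are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical hardware P-states, as fractions of the maximum frequency.
HW_PSTATES = [1.0, 0.8, 0.6, 0.4]


@dataclass
class Decision:
    hw_pstate: float            # frequency fraction chosen for the hardware
    cpu_caps: dict              # per-VM scheduler cap (share of CPU time)


def coordinate(requests: dict) -> Decision:
    """requests maps VM name -> requested performance fraction in (0, 1]."""
    need = max(requests.values())
    # Hard scaling: lowest hardware frequency that still satisfies the
    # most demanding VM.
    hw = min(p for p in HW_PSTATES if p >= need)
    # Soft scaling: VMs that asked for less get a proportionally smaller
    # CPU-time cap, emulating a lower frequency in the scheduler.
    caps = {vm: round(r / hw, 3) for vm, r in requests.items()}
    return Decision(hw, caps)


decision = coordinate({"vm1": 0.7, "vm2": 0.4})
print(decision.hw_pstate, decision.cpu_caps)
```

Here the hardware would run at the 0.8 state, and the scheduler cap makes up the difference for each VM that requested less than the hardware delivers.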

[Figure: VirtualPower architecture. Labels: Hardware; Hypervisor; two guest stacks (Application, OS, PM Policy, VPM States); Dom0; VPM Channel; VPM Rules; VPM Mechanisms]

Platform-level Power Management Methods: Costs and Opportunities

• Goals

– Identify and quantify the reasons for performance degradation associated with each power management method

– Estimate the power savings from each method

• Use this information to support better runtime management decisions

• Current focus: DVFS

– Bounds on degradation based on system profiling and runtime performance counter information

• Other methods next, including idle states
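A common first-order way to bound DVFS slowdown from counter data is to split execution time into a part that scales with core frequency and a memory-stall part that does not. The sketch below uses that textbook model; it is not necessarily the model used in this project, and the function names and the stall-fraction inputs are assumptions.

```python
def slowdown(f: float, f_max: float, mem_stall_fraction: float) -> float:
    """Relative execution time at frequency f versus f_max.

    mem_stall_fraction is the share of time at f_max spent stalled on
    memory (assumed to be derived from performance counters); that share
    is assumed not to scale with core frequency.
    """
    cpu_fraction = 1.0 - mem_stall_fraction
    return cpu_fraction * (f_max / f) + mem_stall_fraction


def slowdown_bounds(f: float, f_max: float, stall_lo: float, stall_hi: float):
    """Return (best case, worst case) slowdown for a range of stall fractions."""
    return slowdown(f, f_max, stall_hi), slowdown(f, f_max, stall_lo)


# Example: scaling from 3.0 GHz to 2.0 GHz with 20% to 50% memory stalls.
best, worst = slowdown_bounds(2.0, 3.0, stall_lo=0.2, stall_hi=0.5)
print(f"slowdown between {best:.2f}x and {worst:.2f}x")
```

The more memory-bound the workload, the tighter the bound sits to 1.0, which is why profiling and counter information can justify more aggressive frequency scaling.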

Virtualization Support for Power Budgeting

[Figure: a group power budget split into per-platform power budgets, with VM1 through VM5 running on the platforms]

Group-level power budget is distributed amongst the underlying platforms based on system utilization, static priority, etc.

The platform power cap directly affects VM performance

Goal: Develop system support for improving aggregate QoS/performance of VMs in distributed power budgeted environments
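As a rough illustration of distributing a group budget by utilization and static priority, the sketch below splits the budget in proportion to utilization times priority. The weighting rule, the function name, and the example numbers are assumptions for illustration only; the slide does not specify the actual policy.

```python
def distribute_budget(group_budget_w: float, platforms: dict) -> dict:
    """Split a group power budget across platforms.

    platforms maps name -> (utilization in [0, 1], static priority > 0).
    Each platform's share is proportional to utilization * priority
    (an assumed weighting, not taken from the slides).
    """
    weights = {name: util * prio for name, (util, prio) in platforms.items()}
    total = sum(weights.values())
    if total == 0:
        # All platforms idle: fall back to an equal split.
        return {name: group_budget_w / len(platforms) for name in platforms}
    return {name: group_budget_w * w / total for name, w in weights.items()}


shares = distribute_budget(
    600.0,
    {"A": (0.9, 2), "B": (0.6, 1), "C": (0.3, 2)},
)
for name, watts in shares.items():
    print(f"{name}: {watts:.0f} W")
```

A production policy would also need per-platform minimum and maximum power limits, which this sketch leaves out.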

VPMTokens Benefits

[Figure: normalized application QoS (y-axis, 0.4 to 1.1) vs. platform budget (x-axis, 100% down to 60%). Series: Trans (low rate), Trans (high rate), Trans (high rate w/ VPM channel)]

Without VPM channel feedback, the high-rate transaction application experiences a QoS impact

VPM input allows the system to dynamically move budget from the low-rate to the high-rate VM, reducing overall performance impact within budget constraints

• Experiment: high-rate and low-rate transaction VMs running on a P4 platform

• Both VMs have the same utility value, therefore equal allocation without VPM channel input

• (R. Nathuji, K. Schwan, HPDC08)
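The budget shift described on this slide can be pictured as a simple feedback step: a VM with spare allocation donates to one that reports a shortfall. The sketch below is a hypothetical illustration; the `rebalance` function, the step size, and the demand values stand in for whatever the VPM channel actually reports and are not taken from the paper.

```python
def rebalance(alloc: dict, demand: dict, step: float = 5.0) -> dict:
    """Move up to `step` units of budget from the VM with the most slack
    to the VM with the largest shortfall. The total budget is unchanged."""
    slack = {vm: alloc[vm] - demand[vm] for vm in alloc}
    donor = max(slack, key=slack.get)
    taker = min(slack, key=slack.get)
    if slack[donor] <= 0 or slack[taker] >= 0:
        return dict(alloc)  # nothing to give or nothing needed
    delta = min(step, slack[donor], -slack[taker])
    new_alloc = dict(alloc)
    new_alloc[donor] -= delta
    new_alloc[taker] += delta
    return new_alloc


# Equal initial split (same utility value), then feedback-driven shifts.
alloc = {"low_rate": 50.0, "high_rate": 50.0}
demand = {"low_rate": 30.0, "high_rate": 65.0}  # assumed feedback values
for _ in range(4):
    alloc = rebalance(alloc, demand)
print(alloc)
```

Moving the budget in bounded steps keeps each adjustment small, at the cost of taking several rounds to converge.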

CoolIT: Coordinated IT and environmental/thermal management

[Figure: data center layout with a CRAC unit, racks R1 to R4 and R5 to R8, and a cold aisle]

Energy tradeoff

• Increased airflow rates provide improved ability to dissipate server heat loads

• Airflow required to meet maximum inlet temperature varies based upon server loads and physical distribution of heat

Use of power budgeting in IT

• Efficient operating point for cooling infrastructure implies load constraints

• Power capping capabilities enable dynamic compliance when DPM alone cannot meet constraints


CoolIT Approach

• Objectives:
– VM load distribution strategy:
  • Inlet temperature in the data center should be below 32 °C (or target temperature)
  • Server power consumption and cooling power are limited
– Heat load within limits based on cooling capacity
– Hot-spot avoidance in the face of load and platform heterogeneity

• Model output:
– Target utilization for different platforms based on inlet temperature, air velocity, average utilization

• Offline profiles:
– Power vs. CPU utilization for different architectures
– Other resources next

• Dynamic load management:
– Currently a first-fit bin-packing algorithm
– Driven by management server

• Input from mgt domains and thermocouple sensors
– OSIsoft PI server
  • dom0 -> PI server SNMP-based infrastructure
  • Can also drive VM migration
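The first-fit placement mentioned above can be sketched as follows, with each platform's bin capacity set to the target utilization produced by the thermal model. The VM names, loads, and target values are made up for illustration, and the real system's inputs and ordering may differ.

```python
def first_fit(vm_loads: dict, target_util: dict):
    """Place each VM on the first platform whose target utilization
    would not be exceeded.

    vm_loads: VM name -> CPU demand as a fraction of one platform.
    target_util: platform name -> target utilization (bin capacity).
    Returns (placement, used); unplaceable VMs map to None.
    """
    used = {platform: 0.0 for platform in target_util}
    placement = {}
    for vm, load in vm_loads.items():
        placement[vm] = None
        for platform, cap in target_util.items():
            # Small tolerance guards against floating-point rounding.
            if used[platform] + load <= cap + 1e-9:
                used[platform] += load
                placement[vm] = platform
                break
    return placement, used


# Hypothetical targets: r1 sits in a warmer spot, so its target is lower.
targets = {"r1": 0.6, "r2": 0.8}
loads = {"a": 0.3, "b": 0.4, "c": 0.2, "d": 0.5}
placement, used = first_fit(loads, targets)
print(placement)
```

A VM left unplaced (mapped to `None`) signals that the current targets cannot absorb the load, which is the point where power capping or migration elsewhere would have to take over.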

Management Architecture

• Management brokers
– Make and enforce ‘localized’ management decisions
  • within VMs
  • VMM-level: CPU scheduling, allocation of memory or device resources, …
  • at hardware level

• Management channels
– Enable inter-broker coordination through well-defined interfaces
– Event and shared memory based

• Management VMs
– Platform-wide policies and cross-platform coordination
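To make the broker and channel roles concrete, here is a minimal event-based sketch: a VM-level broker reports load over a channel, and a VMM-level broker reacts by adjusting a CPU cap. All class names, the topic string, and the threshold policy are hypothetical and only illustrate the pattern; the shared-memory variant is not shown.

```python
from collections import defaultdict


class ManagementChannel:
    """Event-based channel: brokers publish and subscribe by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subscribers[topic]:
            handler(payload)


class VmBroker:
    """Broker inside a VM: makes local observations and reports them."""

    def __init__(self, name, channel):
        self.name = name
        self.channel = channel

    def report_load(self, load):
        self.channel.publish("vm.load", {"vm": self.name, "load": load})


class VmmBroker:
    """VMM-level broker: enforces CPU caps based on reported load."""

    def __init__(self, channel):
        self.caps = {}
        channel.subscribe("vm.load", self.on_load)

    def on_load(self, event):
        # Hypothetical policy: busy VMs get the full CPU, others half.
        self.caps[event["vm"]] = 1.0 if event["load"] > 0.8 else 0.5


channel = ManagementChannel()
vmm = VmmBroker(channel)
VmBroker("vm1", channel).report_load(0.9)
VmBroker("vm2", channel).report_load(0.3)
print(vmm.caps)
```

Keeping the brokers decoupled behind a channel interface is what lets each layer make its own localized decision while still coordinating with the others.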

CERCS Distributed Cloud Infrastructure
