IMPACT, Arizona State University

Sensor-Based Fast Thermal Evaluation Model For Energy Efficient High-Performance Datacenters

Q. Tang, T. Mukherjee, Sandeep K. S. Gupta, Department of Computer Sc. & Engg., Arizona State University & Phil Cayton, Intel Corp.

Post on 28-Dec-2015

Heating problem in Data Center

Power densities are increasing exponentially, in step with Moore's Law

Current cooling solutions at various levels
  Chip / component level
  Server / board level
  Rack level
  Data center level

Two steps of reducing heating effects

Design and deployment stage (Civil & Mechanical Engineering approach)
  Increasing air conditioner capacity
  Designing an optimized layout to facilitate air circulation

Operation stage (Computer Science approach)
  Example: dynamically assigning tasks to avoid overheated servers and to achieve thermal balancing
  Assigning tasks to servers that consume less energy

Thermal Management of Datacenter

Motivation and significance
  Compute-intensive applications (online gaming, computer movie animation, data mining) require increased utilization of the data center
  Maximizing computing capacity is a demanding requirement
  New blade servers can be packed more densely
  Energy cost is rising dramatically

Goal
  Improving thermal performance
  Lowering hardware failure rate
  Reducing energy cost

Typical layout of a datacenter

Rack outlet temperature Tout

Rack inlet temperature Tin

Air conditioner supply temperature Ts

Schematic View of Thermal Management

[Diagram: thermal management control loop. Labels: Control; Feedback; Transducer; Sensor Data Database; CFD simulation software; Policy Controller; Moab Scheduler; Other impact factors; Collecting environmental data and load information from sensors; Correlation of load & power; Cost Analysis; Scheduling Policy; Control Policy; Incoming task; Onsite survey; Map load to power consumption; Process Migration; History Sensor Data; Current Sensor Data; Datacenter; Abstract Heat Model; Target]

Thermal-Aware Scheduling versus Datacenter Energy Cost

Thermal Scheduling: Problem Statement

We present results of thermal-aware scheduling that improves the energy efficiency of (blade-server-based) datacenters

Given a total task C, how should it be divided among N server nodes so that the computing task finishes with minimal total energy cost?

Energy Conservation

Inlet airflow: a mixture of supplied cold air and recirculated hot air

$$Q_{in}^{i} = f_i C_p T_{in}^{i}$$

Outlet airflow

$$Q_{out}^{i} = f_i C_p T_{out}^{i}$$

Server power consumption P_i: depends on the amount of computing task

$$Q_{out}^{i} = Q_{in}^{i} + P_i$$
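The balance on this slide implies that the air passing through a server warms by its power divided by the product of airflow and specific heat. A minimal sketch of that arithmetic, assuming the airflow f_i is given as a mass flow rate in kg/s and that C_p is about 1005 J/(kg K) for air; the function name and the sample numbers are illustrative and do not come from the slides.

```python
# Specific heat of air at constant pressure, J/(kg*K); an assumed typical value.
CP_AIR = 1005.0


def outlet_temperature(t_in, power, mass_flow, cp=CP_AIR):
    """Outlet temperature implied by Q_out = Q_in + P with Q = f * C_p * T.

    t_in: inlet temperature (deg C)
    power: server power consumption P_i (W)
    mass_flow: airflow f_i as a mass flow rate (kg/s, an assumed unit)
    """
    if mass_flow <= 0:
        raise ValueError("mass_flow must be positive")
    return t_in + power / (mass_flow * cp)


# Hypothetical chassis: 2 kW load, 0.2 kg/s of air, 20 C inlet.
print(round(outlet_temperature(20.0, 2000.0, 0.2), 2))
```

Doubling the airflow halves the temperature rise, which is why the per-node product of airflow and specific heat matters as much as the power itself.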

Thermal Management

[Diagram: Task Assignment, Power Vector, Temperature Distribution, Cooling Cost; Cooling Cost + Computing Cost = Total Cost]

Different task assignments result in different power consumption distributions

Different power consumption distributions result in different temperature distributions

Different temperature distributions result in different total energy costs

Example

[Figure: inlet temperature distribution without cooling, and with cooling that lowers the inlet temperatures below the 25C redline threshold, shown for Scheduling 1 and Scheduling 2. Different scheduling results in a different inlet temperature distribution and a different demand for cooling load / energy.]

Total Energy Cost of Datacenter

Total energy cost = computing energy cost + cooling energy cost

Cooling must keep the maximal inlet temperature below the redline temperature of the devices (25C)

COP: Coefficient Of Performance

$$\mathrm{COP} = \frac{\text{amount of heat removed}}{\text{energy consumed by the cooling device}}$$
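To make the COP definition concrete, the sketch below adds the cooling energy to the computing energy. It assumes that all computing power ends up as heat the cooling device must remove, so cooling power equals computing power divided by COP; the function name and the sample values are hypothetical.

```python
def total_energy_cost(computing_power, cop):
    """Total power = computing power + cooling power.

    Assumes every watt of computing power becomes heat that the cooling
    device removes, so cooling power = heat removed / COP.
    """
    if cop <= 0:
        raise ValueError("COP must be positive")
    cooling_power = computing_power / cop
    return computing_power + cooling_power


# Hypothetical numbers: 100 kW of computing load at two COP values.
for cop in (2.0, 4.0):
    print(cop, total_energy_cost(100_000.0, cop))
```

For the same computing load, a higher COP means a lower total, so anything that lets the cooling device run more efficiently reduces the bill.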

Observation

Even with the same computing power dissipation, different temperature distributions may demand different cooling loads, resulting in different total energy costs

We can manipulate task scheduling to achieve the best temperature distribution and consequently minimize total energy cost

Naive Scheduling Algorithm

Uniform Outlet Profile

Why naive
  Based on observation and intuition
  No mathematical formalization

Uniform Outlet Profile (UOP)
  Assigning tasks so as to achieve a uniform outlet temperature distribution Tc
  Assigning more tasks to nodes with low inlet temperature (water-filling process)

[Figure: per-node inlet temperature, topped up by the temperature rise due to power consumption, reaching the common outlet level Tc]
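The water-filling idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: inlet temperatures are treated as fixed (recirculation feedback is ignored), each node's temperature rise is its power divided by `k[i]` (airflow times specific heat), and the function name and inputs are hypothetical.

```python
def uniform_outlet_profile(t_in, k, total_power):
    """Split total_power so every loaded node reaches the same outlet level Tc.

    t_in[i]: inlet temperature of node i (assumed fixed)
    k[i]: airflow times specific heat for node i (W/K), so rise = P_i / k[i]
    Nodes whose inlet is already above Tc receive no load (water filling).
    """
    if not t_in or total_power < 0:
        raise ValueError("need at least one node and non-negative power")
    # Candidate nodes, coolest inlet first.
    active = sorted(range(len(t_in)), key=lambda i: t_in[i])
    while True:
        k_sum = sum(k[i] for i in active)
        tc = (total_power + sum(k[i] * t_in[i] for i in active)) / k_sum
        if t_in[active[-1]] <= tc:
            break
        active.pop()  # hottest inlet sits above the water level; drop it
    power = [0.0] * len(t_in)
    for i in active:
        power[i] = k[i] * (tc - t_in[i])
    return tc, power


# Three hypothetical nodes with equal airflow and 1 kW to distribute.
print(uniform_outlet_profile([15.0, 17.0, 20.0], [200.0, 200.0, 200.0], 1000.0))
```

In this example the node with the 20 C inlet gets no load, because the common outlet level settles below its inlet temperature.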

Uniform Task

Uniform Task (UT)
  Assigning all chassis the same amount of tasks (power consumption)
  All nodes experience the same power consumption and temperature rise

Minimum Computing Energy

Minimum computing energy (cooling inlet)
  Assigning tasks so as to keep the number of active (powered-on) chassis as small as possible
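The contrast between Uniform Task and minimum computing energy can be sketched in a few lines. The per-chassis `capacity` parameter, the function names, and the order in which chassis are filled are assumptions made for illustration; the slides do not specify them.

```python
def uniform_task(total_tasks, n_chassis):
    """UT: every chassis receives the same share of the load."""
    return [total_tasks / n_chassis] * n_chassis


def minimum_computing_energy(total_tasks, n_chassis, capacity):
    """Pack the load onto as few chassis as possible; the rest stay off.

    capacity: tasks one chassis can hold (hypothetical parameter).
    Chassis are filled in index order, which is an arbitrary choice here.
    """
    if total_tasks > n_chassis * capacity:
        raise ValueError("load exceeds total capacity")
    loads, remaining = [], total_tasks
    for _ in range(n_chassis):
        share = min(capacity, remaining)
        loads.append(share)
        remaining -= share
    return loads


print(uniform_task(25, 5))
print(minimum_computing_energy(25, 5, 10))
```

With 25 tasks on five chassis of capacity 10, the uniform split keeps all five chassis on, while the packed assignment needs only three.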

Abstract Heat Flow Model & Cross Interference Coefficients

Abstract Heat Flow Model

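[Diagram: nodes N1, N2, N3 and the air conditioner (AC), with supply temperature Tsup, node inlet temperature Tin, node outlet temperature Tout, and AC inlet temperature TACin; recirculation paths between nodes are labeled with coefficients a12, a13, a21, a31, a11]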

Observation
  Airflow patterns are stable (confirmed through CFD simulation)

Hypothesis
  The amount of recirculated heat is stable and can be characterized
  Define aij as the percentage of recirculated heat from node i to node j

Cross Interference among Server Nodes

Cross Interference Coefficients (CIC)
  Define aij as the percentage of recirculated heat from node i to node j

Cross Interference Matrix
  Correlations among power consumption (utilization rate), temperature, and cross interference

$$K = \begin{bmatrix} f_1 C_p & 0 & \cdots & 0 \\ 0 & f_2 C_p & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f_n C_p \end{bmatrix}$$
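One way to relate power consumption, temperature, and cross interference is sketched below. It is a derivation under assumptions rather than a formula taken from the slide: $A = (a_{ij})$ collects the cross interference coefficients, $K = \mathrm{diag}(f_1 C_p, \dots, f_n C_p)$, $T_{in}$, $T_{out}$ and $P$ are vectors over the nodes, $\mathbf{1}$ is the all-ones vector, and the part of each node's inlet air that is not recirculated is assumed to arrive at the supply temperature $T_{sup}$.

```latex
\begin{aligned}
K\,T_{in} &= A^{\top} K\,T_{out} + (K - A^{\top}K)\,T_{sup}\mathbf{1}
  && \text{(inlet heat = recirculated heat + supplied cold air)} \\
K\,T_{out} &= K\,T_{in} + P
  && \text{(energy conservation at each node)} \\
T_{out} &= T_{sup}\mathbf{1} + (K - A^{\top}K)^{-1} P \\
T_{in} &= T_{sup}\mathbf{1} + \left[(K - A^{\top}K)^{-1} - K^{-1}\right] P
\end{aligned}
```

Under these assumptions the inlet temperatures are an affine function of the power vector, so once the coefficients are known a new power distribution can be evaluated with a single matrix-vector product.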

Fast Thermal Evaluation

Use a profiling process to calculate the cross interference coefficients

[Diagram: the thermal performance of a configuration of the distributed system can be evaluated by numerical simulation (hours) or by fast thermal evaluation (real time), which produces the temperature prediction]
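A minimal sketch of what a fast evaluation step could look like once the coefficients are profiled. It assumes a heat balance in which each node's inlet heat is the recirculated outlet heat of the other nodes plus supply air at `t_sup`, and each node's outlet heat is its inlet heat plus its power. The function name, the fixed-point solver, and the sample numbers are illustrative assumptions, not the authors' code.

```python
def predict_inlet_temps(a, k, power, t_sup, iters=500):
    """Predict inlet temperatures from cross interference coefficients.

    a[j][i]: fraction of node j's outlet heat recirculated to node i
    k[i]: airflow times specific heat for node i (W/K)
    power[i]: power consumption of node i (W)
    t_sup: supply air temperature (deg C)
    """
    n = len(power)
    # x[i] is the outlet temperature rise of node i above t_sup. It satisfies
    # k[i] * x[i] = power[i] + sum_j a[j][i] * k[j] * x[j]; iterate to a fixed point.
    x = [0.0] * n
    for _ in range(iters):
        x = [
            (power[i] + sum(a[j][i] * k[j] * x[j] for j in range(n))) / k[i]
            for i in range(n)
        ]
    # Inlet temperature = outlet temperature minus the node's own rise.
    return [t_sup + x[i] - power[i] / k[i] for i in range(n)]


# Two hypothetical nodes with weak mutual recirculation.
a = [[0.0, 0.2], [0.1, 0.0]]
print(predict_inlet_temps(a, [100.0, 100.0], [500.0, 1000.0], 15.0))
```

The iteration converges when only part of each node's outlet heat is recirculated. Each evaluation is a handful of arithmetic operations per node pair, in contrast with a full numerical simulation.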

Recirculation Minimized Scheduling: XInt

Formalizing optimization problem

To minimize cooling energy cost, we only need to minimize the maximal inlet temperature

The optimization problem, formalized on the basis of the abstract heat flow model, can be converted into an LP, an ILP, or another linear or nonlinear problem, depending on the models and policies used
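As an illustration of how the min-max objective can become a linear program, the sketch below makes two assumptions that are not stated on the slide: inlet temperatures are affine in the power vector, $T_{in}^{i} = T_{sup} + \sum_j d_{ij} P_j$ for some matrix $D = (d_{ij})$ derived from the cross interference coefficients, and each node's power is affine in its task share, $P_j = a + b\,c_j$, with hypothetical constants $a$ and $b$. The auxiliary variable $z$ stands for the maximal inlet temperature.

```latex
\begin{aligned}
\min_{c,\,z}\quad & z \\
\text{s.t.}\quad & T_{sup} + \sum_{j=1}^{N} d_{ij}\,(a + b\,c_j) \le z, && i = 1,\dots,N \\
& \sum_{i=1}^{N} c_i = C, \qquad c_i \ge 0, && i = 1,\dots,N
\end{aligned}
```

Every constraint is linear in $c$ and $z$, so this is an LP; requiring the $c_i$ to be integers (whole tasks) turns it into an ILP, and a nonlinear power model would give a nonlinear problem.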

Simulation Results

Simulation Environment

Two-row datacenter
Ten standard 42U racks
Each rack has five Dell 1855 blade servers
CFD simulation is used to evaluate the temperature distribution (Flovent from Flomerics)

DataCenter model

[Figure: datacenter model with labeled nodes: Node 1, Node 2, Node 5, Node 25, Node 30, Node 50]

Cross Interference Coefficients

Confirmed with datacenter reality

Strong interference to neighboring nodes

Fast Thermal Evaluation Results

Provides fast and accurate temperature prediction

Practical for online real-time thermal management

Simulation Results: Cooling Cost

Simulation Results: Analysis & Summary

XInt consistently outperforms all other scheduling algorithms

Compared with MinHR, XInt is more practicable
  Task-oriented scheduling vs. power-oriented scheduling
  Online, real-time

XInt is mathematically formalized

Future Works

Integrating with cluster management software platforms (Moab, Torque, etc.)

Considering task priorities and time constraints

Questions ?

Related Works

ConSil vs. Fast Thermal Evaluation
  Deduction vs. prediction
  Current vs. future: which is more important for proactive and preventive thermal management?

MinHR vs. XInt
  Both characterize recirculation at similar granularities
  Aggregated effects vs. point-to-point
  Offline vs. online
  Power-oriented vs. task-oriented

Supply Heat Index (SHI)

Roughly characterizes recirculation
Cannot differentiate between different temperature distributions that have the same SHI