Thermal Aware Data Management in Cloud based Data Centers Ling Liu College of Computing Georgia Institute of Technology SEEDM workshop, May 2-3, 2011

Thermal Aware Data Management in Cloud based Data Centers

Ling Liu, College of Computing

Georgia Institute of Technology

NSF SEEDM workshop, May 2-3, 2011

Thermal-aware Computing Era

• Power density increases
– Circuit density increases by a factor of 3 every 2 years
– Energy efficiency increases by a factor of 2 every 2 years
– Effective power density increases by a factor of 1.5 every 2 years

[Kenneth Brill: The Invisible Crisis in the Data Center]
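The 1.5 factor is the ratio of the two rates above (3 / 2). A minimal Python sketch, assuming both rates compound smoothly between the 2-year marks, shows how quickly effective power density grows over several product generations:

```python
# Growth rates per 2-year period, as stated on the slide.
CIRCUIT_DENSITY_PER_2Y = 3.0
ENERGY_EFFICIENCY_PER_2Y = 2.0


def growth_factor(rate_per_2y: float, years: float) -> float:
    """Compound a per-2-year growth rate over `years` years (assumes smooth compounding)."""
    return rate_per_2y ** (years / 2.0)


def power_density_factor(years: float) -> float:
    """Effective power density = circuit density growth / energy efficiency growth."""
    return growth_factor(CIRCUIT_DENSITY_PER_2Y, years) / growth_factor(
        ENERGY_EFFICIENCY_PER_2Y, years
    )


if __name__ == "__main__":
    for years in (2, 4, 6):
        print(f"{years} years: power density x{power_density_factor(years):.3f}")
```

Under these assumptions, six years of scaling multiplies power density by (3/2)^3 = 3.375.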

• Maintenance/TCO rising
– Data center TCO doubles every three years
– Three-year cost of electricity exceeds the purchase cost of the server
– Virtualization/consolidation is a one-time, short-term solution

[Uptime Institute]
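The electricity-versus-purchase-cost comparison is easy to check for a specific server. The sketch below is hypothetical: the server wattage, the PUE (power usage effectiveness, an assumed multiplier covering cooling and other facility overhead) and the electricity price are inputs you supply, not figures from the slides.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def electricity_cost(
    server_watts: float, pue: float, usd_per_kwh: float, years: float = 3.0
) -> float:
    """Cost of running one server continuously for `years` years.

    `pue` is an assumed facility-overhead multiplier (total facility power
    divided by IT power); it is not a figure from the slides.
    """
    it_kwh = server_watts / 1000.0 * HOURS_PER_YEAR * years
    return it_kwh * pue * usd_per_kwh


def tco_multiplier(years: float) -> float:
    """Growth of data center TCO if it doubles every three years."""
    return 2.0 ** (years / 3.0)


if __name__ == "__main__":
    # Hypothetical server: 400 W average draw, PUE 2.0, $0.10 per kWh.
    print(f"3-year electricity: ${electricity_cost(400, 2.0, 0.10):,.2f}")
    print(f"TCO after 6 years: x{tco_multiplier(6):.1f}")
```

With these hypothetical inputs the three-year electricity bill is about $2,102; whether that exceeds the purchase price depends on the server being compared.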

• Thermal management accounts for an increasing portion of expenses
– Thermal-aware computing and management solutions are becoming prominent
– Increasing need for thermal awareness

[VarsamopoulosGupta 2008]

Thermal-aware Task Scheduling in Data Centers

• Given a total task C, how should it be divided among N server nodes so that the computation finishes with minimal cooling energy cost?

• Self-interference and cross-interference raise the inlet air temperature and should be minimized

• Environment interference (room temperature) is not critical

• Task scheduling in the spatial domain

[VarsamopoulosGupta 2008]
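One way to make the division question concrete is to assume, hypothetically, that each node i has a fixed per-unit cooling cost weight c_i (for example, derived from how much its exhaust recirculates into other inlets) and a capacity cap_i. Minimizing the sum of c_i x_i subject to the x_i summing to C and 0 <= x_i <= cap_i is then solved exactly by filling the cheapest nodes first. The weights, capacities and function name below are illustrative assumptions, not the authors' method.

```python
from typing import List


def divide_task(total: float, cooling_cost: List[float], capacity: List[float]) -> List[float]:
    """Split `total` work units across nodes to minimize sum(cooling_cost[i] * share[i]).

    Assumes a linear cooling cost per unit of work on each node (hypothetical
    weights) and a per-node capacity. Under that model, greedily filling the
    cheapest nodes first is optimal.
    """
    if len(cooling_cost) != len(capacity):
        raise ValueError("cooling_cost and capacity must have the same length")
    if total > sum(capacity):
        raise ValueError("total work exceeds combined node capacity")
    shares = [0.0] * len(capacity)
    remaining = total
    # Visit nodes from lowest to highest per-unit cooling cost.
    for i in sorted(range(len(capacity)), key=lambda k: cooling_cost[k]):
        if remaining <= 0:
            break
        shares[i] = min(capacity[i], remaining)
        remaining -= shares[i]
    return shares


if __name__ == "__main__":
    # Node 1 is cheapest to cool, node 0 the most expensive.
    print(divide_task(10, cooling_cost=[0.3, 0.1, 0.2], capacity=[4, 4, 4]))
```

Because this model treats each node's cost independently, it does not capture cross-interference directly; a model in which one node's load changes another node's cooling cost would need a general optimizer rather than this greedy rule.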

Cooling-cost-aware Scheduling

[VarsamopoulosGupta 2008]

Energy Saving by Dynamic Load Distribution

Increasing the range of changes in the rack heat load

• The heat load distribution [30 kW, 5 kW, 5 kW, 20 kW] in the case study needs only 1.7 m/s (9,726 CFM) of cooling airflow

• This is 19% less than the 2.1 m/s needed by the uniform distribution, which dissipates the same 60 kW in total

• This could save ~$189,000 annually in a typical real-world data center

Uniform [15, 15, 15, 15] kW: 2.1 m/s; non-uniform [30, 5, 5, 20] kW: 1.7 m/s
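The 19% figure follows directly from the two required air velocities. A quick check, assuming the airflow cross-section is the same in both cases so that volumetric flow scales with velocity:

```python
uniform_kw = [15, 15, 15, 15]
nonuniform_kw = [30, 5, 5, 20]
uniform_velocity = 2.1     # m/s, from the slide
nonuniform_velocity = 1.7  # m/s, from the slide

# Both distributions dissipate the same total heat load (60 kW).
assert sum(uniform_kw) == sum(nonuniform_kw) == 60

# Assumes a fixed cross-section, so the airflow reduction equals the velocity reduction.
reduction = 1 - nonuniform_velocity / uniform_velocity
print(f"Airflow reduction: {reduction:.1%}")
```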

Temperature Contours Around Racks:

[Yogendra Joshi, Georgia Tech/CERCS]

Think Globally, Act Locally

Numerically

• Run simulations for a range of velocities
• Make a server heat load-inlet T variation matrix: entry (i, j) is the change in max. inlet T of server Si per unit change in the load of server Sj, for servers S1, S2, ..., Sn

Experimentally

• Vary the heat loads sequentially at the servers of a chosen unit cell and monitor the max. server inlet T
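The experimental procedure is a finite-difference estimate of the variation matrix: raise one server's load, record every server's max inlet T, and divide the change by the load step. A minimal sketch, in which `toy_sensor` is a hypothetical linear stand-in for real inlet measurements and `build_influence_matrix` is an illustrative name:

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def build_influence_matrix(
    measure_max_inlet: Callable[[Sequence[float]], List[float]],
    base_loads: Sequence[float],
    step: float,
) -> Matrix:
    """Estimate A[i][j]: change in max inlet T of server i per unit load change at server j.

    Mirrors the experimental procedure: raise one server's heat load at a time
    by `step`, re-measure every server's max inlet T, and take the difference.
    """
    baseline = measure_max_inlet(base_loads)
    n = len(base_loads)
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        perturbed = list(base_loads)
        perturbed[j] += step  # perturb server j only
        response = measure_max_inlet(perturbed)
        for i in range(n):
            matrix[i][j] = (response[i] - baseline[i]) / step
    return matrix


# Hypothetical stand-in for real sensors: a linear room model with a 291 K
# supply temperature and a known influence matrix (K per kW).
TRUE_A = [
    [2.0, 0.5, 0.0],
    [0.4, 1.5, 0.0],
    [0.0, 0.0, 1.0],
]


def toy_sensor(loads: Sequence[float]) -> List[float]:
    return [291.0 + sum(a * x for a, x in zip(row, loads)) for row in TRUE_A]


A = build_influence_matrix(toy_sensor, base_loads=[5.0, 5.0, 5.0], step=1.0)
```

For the linear toy model the estimate recovers `TRUE_A` exactly; a real room is only approximately linear, so the choice of `step` trades measurement noise against linearization error.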

Advantage:

The experimental approach does not require running simulations for different velocities.

Modifications:

Identify blocks of servers that have the same effect, or no effect, on the inlet T.

• This gives insight into the sparsity of the matrix.

• This reduces the computational work.
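Both modifications can be read off an estimated matrix: near-zero entries mark "no effect" pairs (sparsity), and servers whose columns match have the "same effect" and can be treated as one block. A sketch, where the tolerance `tol`, the rounding precision `decimals` and the example matrix are hypothetical choices:

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def sparsity(matrix: Matrix, tol: float = 1e-3) -> float:
    """Fraction of entries treated as 'no effect' (magnitude at most `tol`)."""
    entries = [value for row in matrix for value in row]
    return sum(1 for value in entries if abs(value) <= tol) / len(entries)


def same_effect_groups(matrix: Matrix, decimals: int = 3) -> List[List[int]]:
    """Group servers (columns) whose influence on every inlet T matches after rounding."""
    groups: Dict[Tuple[float, ...], List[int]] = {}
    for j in range(len(matrix[0])):
        column = tuple(round(row[j], decimals) for row in matrix)
        groups.setdefault(column, []).append(j)
    return list(groups.values())


example = [
    [2.0, 0.5, 0.5, 0.0],
    [0.4, 1.5, 1.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(sparsity(example))            # 0.5
print(same_effect_groups(example))  # [[0], [1, 2], [3]]
```

Grouping by rounded values is a simple heuristic: two nearly identical columns that straddle a rounding boundary land in different groups, so a tolerance-based clustering may be preferable with noisy measurements.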

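A Matrix

$$\max \sum_{i=1}^{n} l_i \quad \text{s.t.} \quad T = A\,l \le T_{cr}, \qquad l_{\min} \le l_i \le l_{\max}$$

Where:
$l_i$: server i load
$l_{\min}$: minimum load (startup)
$l_{\max}$: max. load (full utilization)
$T_{cr}$: max. inlet T allowed by ASHRAE
$T$: vector of server inlet temperatures; $A$: server heat load-inlet T variation matrix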
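The optimization on this slide, read as "maximize the total server load subject to T = A l <= T_cr and l_min <= l_i <= l_max", can be illustrated with a feasibility check for candidate load vectors. In the sketch below the 2x2 matrix, the 300 K limit and the 291 K `offset` (an optional constant added to A l; offset = 0 is the slide's form) are hypothetical; l_min = 0.8 kW and l_max = 7.5 kW are borrowed from the AILM server range in the example that follows.

```python
from typing import List, Sequence


def inlet_temperatures(
    A: Sequence[Sequence[float]], loads: Sequence[float], offset: float = 0.0
) -> List[float]:
    """T = A l, plus an optional hypothetical offset (offset=0 is the form on the slide)."""
    return [offset + sum(a * x for a, x in zip(row, loads)) for row in A]


def is_feasible(
    A: Sequence[Sequence[float]],
    loads: Sequence[float],
    t_cr: float,
    l_min: float,
    l_max: float,
    offset: float = 0.0,
) -> bool:
    """Check l_min <= l_i <= l_max for every server and T = A l <= T_cr elementwise."""
    if any(load < l_min or load > l_max for load in loads):
        return False
    return all(t <= t_cr for t in inlet_temperatures(A, loads, offset))


# Hypothetical two-server example: A in K per kW, 291 K offset, 300 K limit.
A = [[2.0, 0.5], [0.5, 2.0]]
for loads in ([3.0, 3.0], [4.0, 4.0], [0.5, 3.0]):
    ok = is_feasible(A, loads, t_cr=300.0, l_min=0.8, l_max=7.5, offset=291.0)
    print(loads, "total:", sum(loads), "feasible:", ok)
```

Because the objective and constraints are all linear, the maximization itself is a linear program, so a standard LP solver (for example, SciPy's `scipy.optimize.linprog`) could search over allocations instead of checking candidates one at a time.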

[Yogendra Joshi, Georgia Tech/CERCS]

• 68% increase in allowed heat dissipation (for the same CRAC velocity)

• 37.5% decrease in facilities energy consumption (for the same heat dissipation)

An Example

[Figure: Max. inlet T at servers (K) for AILM (0.8-7.5 kW server range) and for a uniform 5 kW server load, A and B racks, compared with the safe temperature limit; total data center load dissipation 298 kW and 297 kW; VCRAC = 5 m/s]

[Yogendra Joshi, Georgia Tech/CERCS]

Pertinence of Thermal Maps in Data Center Management

• Given an equipment utilization layout, find the temperature around the room

• Create a collection of thermal maps or a function to “predict” thermal behavior of a task assignment

• Use collection to decide on job placement (temporally and spatially)

[VarsamopoulosGupta 2008]
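The three bullets above can be sketched as: predict a thermal map for each candidate assignment, keep the collection, and place jobs using the map with the coolest hottest inlet. The sketch assumes a linear heat-recirculation model, T_in = t_supply + D p, where D[i][j] is a hypothetical coefficient for how much node j's power raises node i's inlet temperature; the matrix, powers and names are illustrative, not the authors' model.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def predict_thermal_map(D: Matrix, power: Sequence[float], t_supply: float) -> List[float]:
    """Predicted inlet temperature of each node: t_supply + sum_j D[i][j] * power[j]."""
    return [t_supply + sum(d * p for d, p in zip(row, power)) for row in D]


def choose_placement(candidates: Dict[str, Sequence[float]], D: Matrix, t_supply: float) -> str:
    """Pick the candidate assignment whose hottest predicted inlet is coolest."""
    return min(
        candidates,
        key=lambda name: max(predict_thermal_map(D, candidates[name], t_supply)),
    )


# Hypothetical 3-node room: D in K per W; every candidate places 800 W in total.
D = [
    [0.02, 0.01, 0.00],
    [0.01, 0.03, 0.01],
    [0.00, 0.01, 0.02],
]
candidates = {
    "packed_left": [400.0, 300.0, 100.0],
    "spread": [300.0, 200.0, 300.0],
    "edges": [400.0, 0.0, 400.0],
}
# The "collection of thermal maps", one per candidate assignment.
thermal_maps = {name: predict_thermal_map(D, p, t_supply=291.0) for name, p in candidates.items()}
best = choose_placement(candidates, D, t_supply=291.0)
print(best)  # edges
```

Here the middle node recirculates heat most strongly, so leaving it idle gives the lowest peak inlet temperature (299 K versus 303 K and 305 K) for the same total power.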

Thermal-aware Data Management

[Adapted from VarsamopoulosGupta 2008]

Thermal-aware Data Management

• Task profiling
– CPU utilization, I/O activity, etc.

• Equipment power profiling
– CPU power consumption, disk power consumption, etc.

• Heat recirculation modeling

• Task management technologies

Need for a comprehensive research framework
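Task profiling and equipment power profiling meet when a task's resource usage is turned into an estimated power draw. The sketch assumes a simple linear power model (idle power plus a CPU term and a disk term scaled by utilization); the model form, the coefficients and the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PowerProfile:
    """Hypothetical per-server equipment power profile (watts)."""
    idle_w: float          # draw with no work
    cpu_dynamic_w: float   # extra watts at 100% CPU utilization
    disk_active_w: float   # extra watts when the disk is busy all the time


@dataclass
class TaskProfile:
    """Hypothetical task profile from monitoring."""
    cpu_util: float  # fraction of CPU time used, 0..1
    io_util: float   # fraction of time the disk is busy, 0..1


def estimate_power(profile: PowerProfile, task: TaskProfile) -> float:
    """Linear power estimate: idle + CPU term + disk term (assumed model)."""
    for name, value in (("cpu_util", task.cpu_util), ("io_util", task.io_util)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (
        profile.idle_w
        + profile.cpu_dynamic_w * task.cpu_util
        + profile.disk_active_w * task.io_util
    )


if __name__ == "__main__":
    server = PowerProfile(idle_w=150.0, cpu_dynamic_w=100.0, disk_active_w=10.0)
    task = TaskProfile(cpu_util=0.5, io_util=0.2)
    print(f"Estimated draw: {estimate_power(server, task):.1f} W")
```

The resulting per-server watts are the kind of input a heat recirculation model needs; real servers are often non-linear in utilization, so the coefficients would have to be fitted from measurements.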