
The Center for Autonomic Computing is supported by the National Science Foundation under Grant No. 0758596.

China – US Software Workshop, September 2011

Green High Performance Computing Initiative at Rutgers University (GreenHPC@RU)

Manish Parashar, Ivan Rodero

Contact: {parashar, irodero}@cac.rutgers.edu
http://nsfcac.rutgers.edu/GreenHPC

Motivations for HPC Power Management

- Growing scales of high-end High Performance Computing systems
- Power consumption has become critical in terms of operational costs (a dominant part of IT budgets)
- Existing power/energy management is focused on single layers
  - May result in suboptimal solutions

Challenges in HPC Power Management

- GreenHPC
  - $/W/MFLOP, defining energy efficiency
  - Infrastructure: thermal sensors, instrumentation, monitoring, cooling, etc.
- Architectural challenges
  - Processor, memory, and interconnect technologies
  - Increased use of accelerators
  - Power-aware micro-architectures
- Compilers, OS, and runtime support
  - Power-aware code generation
  - Power-aware scheduling
  - DVFS, programming models, abstractions (a DVFS sketch follows this list)
- Considerations for system-level energy efficiency
  - Optimizing the CPU alone is not sufficient; the entire system/cluster must be considered
  - Application/workload-aware optimizations
  - Power-aware algorithm design
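To make the DVFS item above concrete, here is a minimal sketch of per-core frequency scaling through the Linux cpufreq sysfs interface. It is illustrative only and not part of the GreenHPC work: it assumes the "userspace" cpufreq governor is available, and the paths, frequencies, and policy are examples.

#include <stdio.h>

/* Write a target frequency (in kHz) for one core via sysfs.
 * Requires the cpufreq "userspace" governor and sufficient privileges. */
static int set_cpu_khz(int cpu, long khz)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
    f = fopen(path, "w");
    if (!f)
        return -1;   /* governor is not "userspace", or no permission */
    fprintf(f, "%ld\n", khz);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Example policy: run core 0 slowly through a memory-bound region,
     * then restore full speed for the compute-bound region. */
    set_cpu_khz(0, 1200000);   /* 1.2 GHz while bandwidth-limited */
    /* ... memory-bound program region ... */
    set_cpu_khz(0, 2400000);   /* 2.4 GHz for compute-intensive work */
    return 0;
}

A power-aware scheduler or runtime would issue the same kind of call per program region or per task, which is the role the compiler/OS/runtime bullet assigns to DVFS.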

Cross-layer Energy-Power Management: Motivations, Challenges, and Architecture

Component-based Power Management [HiPC’10/11]

- Application-centric, aggressive power management at the component level
- Workload profiling and characterization
  - HPC workloads (e.g., HPC Linpack)
- Use of low-power modes to configure subsystems (i.e., CPU, memory, disk, and NIC)
- Energy-efficient memory hierarchies
  - Memory power management for multi-/many-core systems
  - Multiple channels to main memory
  - Application-driven power control, i.e., ensuring bandwidth to main memory by leveraging channel affinity (see the sketch after this list)
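The sketch below illustrates the application-driven channel power control idea: given a profiled per-channel access breakdown for a program region, channels that see little traffic are put into a low-power state. set_channel_low_power() is a hypothetical platform hook, the threshold and profile values are made up, and this is not the HiPC’10/11 implementation.

#include <stdio.h>
#include <stdbool.h>

#define NUM_CHANNELS 4

/* Hypothetical hook into the memory controller / platform firmware. */
static void set_channel_low_power(int channel, bool low_power)
{
    printf("channel %d -> %s\n", channel, low_power ? "low-power" : "active");
}

/* access_pct[i]: share of this region's memory accesses hitting channel i,
 * obtained from offline profiling of channel accesses per program region. */
static void configure_channels(const double access_pct[NUM_CHANNELS],
                               double idle_threshold_pct)
{
    for (int i = 0; i < NUM_CHANNELS; i++)
        set_channel_low_power(i, access_pct[i] < idle_threshold_pct);
}

int main(void)
{
    /* Example region profile: nearly all traffic hits channels 0 and 1
     * (channel affinity), so channels 2 and 3 can be powered down. */
    double region_profile[NUM_CHANNELS] = { 55.0, 40.0, 3.0, 2.0 };
    configure_channels(region_profile, 5.0);
    return 0;
}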

Runtime Power Management

- Partitioned Global Address Space (PGAS) models
  - Implicit message passing
  - Unified Parallel C (UPC) so far (see the barrier sketch after this list)
- Target platforms
  - Many-core (i.e., the Intel SCC)
  - HPC clusters
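The savings targeted here come from the slack threads spend blocked in collective operations (cf. the upcr_wait2 trace near the end of the transcript). The sketch below shows the pattern with a pthread barrier standing in for the UPC runtime wait and a hypothetical scale_frequency() DVFS helper; it is not the actual UPC runtime modification.

#include <pthread.h>
#include <stdio.h>

/* Hypothetical DVFS helper (e.g., cpufreq-based, as in the earlier sketch). */
static void scale_frequency(long khz)
{
    printf("core frequency -> %ld kHz\n", khz);
}

/* Drop to a low frequency while blocked in the barrier, restore it afterwards. */
static void energy_aware_barrier(pthread_barrier_t *bar)
{
    scale_frequency(800000);      /* little work to do while waiting */
    pthread_barrier_wait(bar);    /* stand-in for the UPC runtime wait call */
    scale_frequency(2400000);     /* back to full speed for computation */
}

static pthread_barrier_t bar;

static void *worker(void *arg)
{
    (void)arg;
    /* ... per-thread computation ... */
    energy_aware_barrier(&bar);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    pthread_barrier_init(&bar, NULL, 4);
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&bar);
    return 0;
}

Build with -pthread; the same wrapping could in principle be applied inside a runtime around any blocking collective.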

Energy-aware Autonomic Provisioning [IGCC’10]

- Virtualized cloud infrastructures with multiple geographically distributed entry points
- Workloads composed of HPC applications
- Distributed Online Clustering (DOC)
  - Clusters job requests in the input stream based on their resource requirements (multiple dimensions, e.g., memory, CPU, and network requirements); a simplified clustering sketch follows this list
- Optimizing energy efficiency in the following ways:
  - Powering down subsystems when they are not needed
  - Efficient, just-right VM provisioning (reduces over-provisioning)
  - Efficient proactive provisioning and grouping (reduces re-provisioning)
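As a simplified stand-in for the clustering idea behind DOC (not the published algorithm), the sketch below assigns each incoming job request, described by a normalized CPU/memory/network demand vector, to the nearest existing VM class, or opens a new class when none is close enough; the radius and example profiles are arbitrary.

#include <math.h>
#include <stdio.h>

#define DIMS 3          /* cpu, memory, network demand, normalized to [0,1] */
#define MAX_CLASSES 32

typedef struct { double center[DIMS]; int count; } VmClass;

static VmClass classes[MAX_CLASSES];
static int num_classes = 0;

static double dist(const double *a, const double *b)
{
    double s = 0.0;
    for (int d = 0; d < DIMS; d++)
        s += (a[d] - b[d]) * (a[d] - b[d]);
    return sqrt(s);
}

/* Returns the VM class index the request was assigned to. */
static int classify_request(const double req[DIMS], double radius)
{
    int best = -1;
    double best_d = radius;

    for (int c = 0; c < num_classes; c++) {
        double d = dist(req, classes[c].center);
        if (d <= best_d) { best_d = d; best = c; }
    }
    if (best < 0 && num_classes < MAX_CLASSES) {     /* open a new VM class */
        best = num_classes++;
        for (int d = 0; d < DIMS; d++) classes[best].center[d] = req[d];
        classes[best].count = 0;
    }
    if (best >= 0) {
        classes[best].count++;
        /* Drift the centroid toward the new request (running mean). */
        for (int d = 0; d < DIMS; d++)
            classes[best].center[d] +=
                (req[d] - classes[best].center[d]) / classes[best].count;
    }
    return best;
}

int main(void)
{
    double jobs[][DIMS] = { {0.8, 0.2, 0.1}, {0.75, 0.25, 0.1}, {0.1, 0.9, 0.3} };
    for (int i = 0; i < 3; i++)
        printf("job %d -> VM class %d\n", i, classify_request(jobs[i], 0.2));
    return 0;
}

Grouping requests into a small number of VM classes is what enables the "just-right" provisioning above: each class maps to a VM size that closely fits its members, reducing over-provisioning.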

Energy and Thermal Autonomic Management

- Reactive thermal and energy management of HPC workloads
  - Autonomic decision making to react to thermal hotspots, considering multiple dimensions (i.e., energy and thermal efficiency) and using different techniques: VM migration, CPU DVFS, and CPU pinning (a control-loop sketch follows this list)
- Proactive energy-aware, application-centric VM allocation for HPC workloads
  - Strategy for proactive VM allocation based on VM consolidation while satisfying QoS guarantees
  - Based on application profiling (profiles are known in advance) and benchmarking on real hardware
- Abnormal operational-state detection (e.g., poor performance, hotspots)
  - Reactive and proactive approaches
    - Reacting to anomalies to return to the steady state
    - Predicting anomalies in order to avoid them
  - Workload/application awareness
    - Application profiling/characterization
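The control-loop sketch referenced above is given here. It is an illustrative reactive policy, not the actual GreenHPC implementation: read_core_temp(), apply_dvfs(), repin_vcpus(), and migrate_vm() are hypothetical placeholders, and the thresholds are arbitrary.

#include <stdio.h>

enum action { NONE, DVFS_DOWN, CPU_REPIN, VM_MIGRATE };

/* Hypothetical sensing and actuation hooks. */
static double read_core_temp(int core) { (void)core; return 78.0; }  /* stub, degrees C */
static void   apply_dvfs(int core)     { printf("DVFS down core %d\n", core); }
static void   repin_vcpus(int core)    { printf("re-pin vCPUs away from core %d\n", core); }
static void   migrate_vm(int core)     { printf("migrate VM off the host of core %d\n", core); }

/* Pick the least disruptive action expected to remove the hotspot. */
static enum action decide(double temp_c)
{
    if (temp_c < 75.0) return NONE;        /* within the thermal budget     */
    if (temp_c < 82.0) return DVFS_DOWN;   /* mild hotspot: slow the core   */
    if (temp_c < 90.0) return CPU_REPIN;   /* move load off the hot core    */
    return VM_MIGRATE;                     /* severe hotspot: evacuate host */
}

int main(void)
{
    for (int core = 0; core < 4; core++) {
        switch (decide(read_core_temp(core))) {
        case DVFS_DOWN:  apply_dvfs(core);   break;
        case CPU_REPIN:  repin_vcpus(core);  break;
        case VM_MIGRATE: migrate_vm(core);   break;
        case NONE:       break;
        }
    }
    return 0;
}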

Approach

[Figure: state diagram of the autonomic management approach. The system moves between a steady state and an abnormal state; transitions are evaluated along three dimensions: QoS, energy efficiency, and thermal efficiency.]

Goals

- Autonomic (self-monitored and self-managed) computing systems
- Optimizing energy efficiency, cost-effectiveness, and utilization while ensuring (maximizing) the performance/quality of service delivered
- Addressing both "traditional" and virtualized systems (i.e., data centers and GreenHPC in the Cloud)

[Figure: cross-layer autonomic management architecture. Each layer (physical environment, resources, virtualization, and application/workload) has its own sensors, observer, controller, and actuator. Cross-layer power management spans these layers over a virtualized, instrumented, application-aware infrastructure, and cross-infrastructure power management spans multiple clouds (private, public, hybrid, etc.).]
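The per-layer loop in the figure can be read as a small observe/decide/act cycle. The sketch below is one possible way to structure it; the Layer type, the stub power metric, and run_layer() are assumptions for illustration, not the project's actual interfaces.

#include <stdio.h>

typedef struct {
    const char *name;                       /* e.g., "virtualization"        */
    double (*sense)(void);                  /* sensor: read a metric         */
    int    (*decide)(double observation);   /* controller: choose an action  */
    void   (*act)(int action);              /* actuator: apply the action    */
} Layer;

static double sense_power(void)      { return 212.5; }         /* stub: Watts */
static int    decide_power(double w) { return w > 200.0; }     /* cap or not  */
static void   act_power(int cap)     { if (cap) printf("apply power cap\n"); }

static void run_layer(const Layer *l)
{
    double obs = l->sense();            /* observer aggregates sensor data  */
    int action = l->decide(obs);        /* controller evaluates the policy  */
    l->act(action);                     /* actuator enforces the decision   */
    printf("[%s] observation=%.1f action=%d\n", l->name, obs, action);
}

int main(void)
{
    Layer resources = { "resources", sense_power, decide_power, act_power };
    run_layer(&resources);              /* one iteration of the control loop */
    return 0;
}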

[Figure: percentage of channel accesses per memory channel across program regions, showing opportunities for powering down lightly used channels.]

[Figure: VM classes based on clusters of job requests, and the associated over-provisioning cost.]

[Figure: durations of calls to the UPC runtime function upcr_wait2 on node 1 during a CG.A.O3.16 execution, plotted over time. Example: potential savings during collective operations (e.g., barriers) on the SCC platform.]
