
Identifying Opportunities to Improve Efficiency in HPC Clusters

Jordi Blasco, Co-founder & CTO

HPC Advisory Council - Perth - August 2018

Quick introduction to HPCNow!


● Global HPC consulting company
● IT + scientific background
● HPC services and solutions
● User-oriented company
● Hardware agnostic

System Administrators and User Support

Top500 Supercomputer Users


Contributions to HPC Community


Public sector Private Companies



Motivation

Motivation

Are you familiar with these issues?


1 Risk of user dissatisfaction

2 Higher waiting time

3 Higher IO demanding workflows

4 Hardware no longer supported

5 Cluster contention


Buy a New Cluster

Large procurements usually involve a long and complex RfP process.

Motivation

Potential Solutions




Use Cloud

Extending the current compute capacity with cloud bursting is definitely the best option for accommodating peaks in demand. Unfortunately, ongoing and regular usage becomes expensive.


Use Cloud

Cloud Bursting Capabilities - Hybrid Cloud




Improve Efficiency

By improving performance and efficiency, you effectively free up capacity for new jobs with zero investment in additional hardware.


Improve Job Efficiency

Impact on Job Allocation and Resource Availability



How to identify tuning opportunities easily?

Job Efficiency Monitoring


Traditional tools like Ganglia are not capable of representing the metrics required to identify inefficient jobs.

● based load monitoring
● no link to user
● no link to job
● no link to other nodes allocated for the same job
● no information regarding efficiency in the allocated resources
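The missing links listed above (user, job, and the other nodes of the same job) can be illustrated with a small sketch. It assumes each node-level sample has already been tagged with a job ID and a user, which a node-centric tool does not provide. The field names and values are hypothetical and not taken from the talk.

```python
from collections import defaultdict

# Hypothetical node-level samples; field names are illustrative only.
samples = [
    {"node": "n01", "job_id": 1001, "user": "alice", "cpu_used": 22.0, "cpu_alloc": 24},
    {"node": "n02", "job_id": 1001, "user": "alice", "cpu_used": 2.0, "cpu_alloc": 24},
    {"node": "n03", "job_id": 1002, "user": "bob", "cpu_used": 12.0, "cpu_alloc": 12},
]


def job_view(samples):
    """Group per-node samples by job so all nodes of a job are seen together."""
    jobs = defaultdict(lambda: {"user": None, "nodes": [], "used": 0.0, "alloc": 0})
    for s in samples:
        job = jobs[s["job_id"]]
        job["user"] = s["user"]
        job["nodes"].append(s["node"])
        job["used"] += s["cpu_used"]
        job["alloc"] += s["cpu_alloc"]
    # Efficiency here means busy cores divided by cores allocated to the job.
    return {jid: {**j, "efficiency": j["used"] / j["alloc"]} for jid, j in jobs.items()}


for jid, info in job_view(samples).items():
    print(jid, info["user"], info["nodes"], f"{info['efficiency']:.0%}")
```

In this made-up data, node n01 looks healthy on its own, but the job-level view shows that job 1001 uses only half of the cores it holds across both nodes.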

Job Efficiency Monitoring Requirements


1 Review resources requested vs used

Key fundamental metrics to understand how well the requested resources are used.

2 Real Time Monitoring

Enables proactive job profiling and also makes it possible to trigger real time actions.

3 30 Seconds Resolution

30 seconds resolution is quite reasonable for the majority of HPC workloads.

The main goal is to identify opportunities to improve user workflows, user codes and applications, and to catch user mistakes.

Given the huge number of jobs and the large number of nodes, the solution requires a big data strategy.
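As a rough illustration of the "requested vs used" review, the sketch below computes used/requested ratios for CPU, memory and walltime and flags the ones under a threshold. The record fields, the sample values and the 50% threshold are assumptions made for this example; real scheduler accounting data will look different.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical fields; not the accounting schema of any specific scheduler.
    cores_requested: int
    elapsed_s: float
    cpu_time_s: float
    mem_requested_mb: float
    mem_peak_mb: float
    walltime_requested_s: float


def efficiency(job):
    """Return used/requested ratios for CPU, memory and walltime."""
    return {
        "cpu": job.cpu_time_s / (job.cores_requested * job.elapsed_s),
        "memory": job.mem_peak_mb / job.mem_requested_mb,
        "walltime": job.elapsed_s / job.walltime_requested_s,
    }


def flag_inefficient(job, threshold=0.5):
    """List the resources whose used/requested ratio is below the threshold."""
    return sorted(name for name, ratio in efficiency(job).items() if ratio < threshold)


job = JobRecord(cores_requested=16, elapsed_s=3600, cpu_time_s=14400,
                mem_requested_mb=64000, mem_peak_mb=48000,
                walltime_requested_s=7200)
print(efficiency(job))
print(flag_inefficient(job))
```

A job like this one, which keeps 16 cores for an hour but accumulates only four core-hours of CPU time, is the kind of candidate worth reviewing with the user.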


Architecture



Analysing a large number of events requires Big Data technologies. Data is gathered using custom codes and aggregated into Elasticsearch, an open source search and analytics engine with high reliability and proven scalability. Finally, the data is presented through Grafana and Kibana, which are leading tools for querying and visualizing large datasets and metrics.
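The custom collection code itself is not shown in the slides. As a minimal sketch of the aggregation step, the snippet below serialises a few events into the newline-delimited JSON body that the Elasticsearch bulk API accepts. The index name and the event fields are placeholders chosen for illustration.

```python
import json
import time


def to_bulk_body(events, index="hpc-job-metrics"):
    """Serialise events as newline-delimited JSON for the Elasticsearch bulk API.

    Each event becomes two lines: an action line and the document itself.
    The index name is a placeholder, not one used in the talk.
    """
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


now_ms = int(time.time() * 1000)
events = [
    {"@timestamp": now_ms, "job_id": 1001, "user": "alice", "node": "n01", "cpu_pct": 91.5},
    {"@timestamp": now_ms, "job_id": 1001, "user": "alice", "node": "n02", "cpu_pct": 8.3},
]
body = to_bulk_body(events)
print(body)
```

The resulting body would be sent with an HTTP POST to the cluster's `_bulk` endpoint using the `application/x-ndjson` content type. Sending many small events in one request is one way to keep the per-event overhead low.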

Traditional Stack Pipeline


Custom Monitoring Stack pipeline for HPC


Custom Monitoring Stack for HPC prototype


Standard vs Custom


Current statistics               Standard   Custom
Metrics per user (s)             8          8
Resolution (seconds)             30         30
Avg. events/cycle                380        380
Avg. size per package (bytes)    2000       400
Avg. TB/year in Elasticsearch    1.80       0.17
Theoretical limit (events/s)     50k        260k

Standard vs Custom vs Prototype


Current statistics               Standard   Custom   Prototype
Metrics per user (s)             8          8        318
Resolution (seconds)             30         30       10
Avg. events/cycle                380        380      380
Avg. size per package (bytes)    2000       400      6800
Avg. TB/year in Elasticsearch    1.80       0.17     8.5
Theoretical limit (events/s)     50k        260k     15k
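To see how resolution and package size drive storage, the sketch below estimates the raw payload volume per year from the table's inputs. It assumes that "events/cycle" means events per collection interval and that the package size applies to each event. It models only raw payload, so it will not necessarily reproduce the table's TB/year figures, which may include indexing or other overhead.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def raw_tb_per_year(events_per_cycle, bytes_per_event, resolution_s):
    """Raw payload volume in TB/year (1 TB = 1e12 bytes), ignoring any overhead."""
    cycles_per_year = SECONDS_PER_YEAR / resolution_s
    return events_per_cycle * bytes_per_event * cycles_per_year / 1e12


# (name, avg. size per package in bytes, resolution in seconds) from the table
setups = [("Standard", 2000, 30), ("Custom", 400, 30), ("Prototype", 6800, 10)]
for name, size, resolution in setups:
    print(f"{name}: {raw_tb_per_year(380, size, resolution):.2f} TB/year raw payload")
```

The point of the estimate is the scaling: going from 30 to 10 seconds triples the number of cycles, and the larger prototype package multiplies the volume again.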

The prototype setup is based on LXD containers allocated across two bare metal nodes with 24 cores (Intel Haswell), 32 GB of memory, 2 TB of SSD disks and 1 Gb Ethernet.


Job Efficiency Monitoring (prototype)

Additional Metrics and Features

● Going down to 10 seconds resolution
● Job level usage
● Task level usage
● Allocated CPU usage
● Memory usage
● IPC
● Disk IO
● Network (Infiniband)
● Cluster File System

● Read / Write calls
● inodes updates
● MB write / read
● Open / Close requests
● Walltime used / requested
● Memory used / requested
● Retention / purging policy
● Alerts and event correlation
● MPI stats (collectives)

Need to Scale Up?


01 Migrate to 10 Gb Ethernet

Which could increase the number of events digested by a factor of 10

02 Use buffers

Which could increase the number of events digested by a factor of 10

03 Add more Elasticsearch nodes

Virtually unlimited scalability
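The slides do not say which buffering mechanism is meant. As one possible reading, the sketch below shows a simple in-process buffer that accumulates events and hands them over in batches, so the indexing tier receives fewer and larger requests. The class name, the batch size and the sink are hypothetical.

```python
class EventBuffer:
    """Accumulate events and hand them over in batches.

    `sink` is any callable that accepts a list of events; in a real
    deployment it might ship the batch to the indexing tier.
    """

    def __init__(self, sink, max_events=500):
        self.sink = sink
        self.max_events = max_events
        self._pending = []

    def add(self, event):
        self._pending.append(event)
        if len(self._pending) >= self.max_events:
            self.flush()

    def flush(self):
        # Hand over whatever is pending and start a fresh batch.
        if self._pending:
            self.sink(self._pending)
            self._pending = []


batches = []
buf = EventBuffer(sink=batches.append, max_events=3)
for i in range(7):
    buf.add({"seq": i})
buf.flush()  # push the final partial batch
print([len(b) for b in batches])
```

A production setup would more likely place a dedicated queue between collectors and the indexing tier. The batching idea is the same, but this sketch makes no claim about the actual throughput gain.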


No performance penalty based on HPL results

Additional Metrics and Features

Custom Monitoring Stack for HPC prototype


Most Relevant Case Studies


Case Study             CPUTime (h)   Output
VASP user              4,265,883     Improved efficiency from 74% to 97.6%
ORCA user              2,300,033     Improved efficiency from 18% to 87% and IO.
R code user            1,670,402     Resilience issues (zombie tasks)
Fluent workflow user   1,401,825     Improved the efficiency 200%
Ansys Fluent user      1,391,951     x5 efficiency (100 vs 500 cores) + resilience
OMNeT++ user           1,253,462     Improved the efficiency from 6% to 96%
Custom CESM user       1,093,184     Improved the efficiency from 1% to 98%


Conclusions

Thanks to the job efficiency monitoring we have been able to improve

Efficiency

Scalability

Performance

Thanks to the job efficiency monitoring we have also been able to

Detect user mistakes early

Avoid massive waste of CPU time

Improve user workflows

Accelerate research

Improve reliability

Improve user satisfaction


“The best way to predict the future is to invent it.” -- Alan Kay



www.hpcnow.com

Marie Curie, 8 - 08042 Barcelona (Spain)

34 Fernly Rise, 2019 Auckland (New Zealand)

