A Review of SoA HPC power monitoring systems & future trend on fine-grain data analytics

EE in HPC @CINECA
Ostrava, IT4I, 30-Jan-2020
Andrea Bartolini (slides from Antonio Libri)
EU H2020 FETHPC project ANTAREX (g.a. 671623)



Outline

▪ Motivation – why reliable and high-res power/energy mon?
▪ SoA power monitoring systems
▪ Future trend on fine-grain measurements: data analytics

IIS - D-ITET - ETH Zurich, 22-Feb-2019, Antonio Libri

Several Challenges for HPC and Data-centers

Fine-Grain Power and Performance Measurements:
- Verify and classify node performance (& anomalies)
- In spec / out of spec behaviour
- Misconfiguration
- Aging and wear out
- Predictive maintenance

System Power Capping (reliable energy measurements)
- New Installations, Grid SLA, Power Shortage, Natural Disasters
- Ensures operating power below a maximum power consumption level

[Figure: coarse-grain vs. fine-grain monitoring; labels: Coarse grain, Fine grain, Node, CPU, ACC, DIMM, req, util, Job Scheduler]

Top500/Green500 Power Meas methodology [1]

▪ The document in [1] lists the power-measurement requirements for ranking an HPC system in the Top500/Green500

Requirement                     | Level 1 (Min Quality)             | Level 2                           | Level 3 (Best Quality)
Granularity                     | 1 S/s                             | 1 S/s                             | Continuously integrated energy; V & I sampled at least @5 kS/s for AC, @120 S/s for DC
Precision (1σ, relative error)  | 5%                                | 2%                                | Below 1%
Meas synch b/w different meters | Below sampling period (e.g., NTP) | Below sampling period (e.g., NTP) | Below sampling period (e.g., NTP)

[1] EEHPC WG, “Energy Efficient High Performance Computing Power Measurement Methodology”, v2.0 RC 1.0
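Level 3 asks for continuously integrated energy computed from sampled voltage and current. The sketch below shows one way this could look for a DC rail, assuming equally spaced samples and a trapezoidal rule. The function name and the 12 V / 10 A test load are illustrative assumptions, not part of the methodology document.

```python
def integrate_energy(voltage, current, sample_rate_hz):
    """Return energy in joules from equally spaced V and I samples.

    Uses the trapezoidal rule on instantaneous power p = v * i.
    (Assumption: the integration rule is our choice for illustration.)
    """
    if len(voltage) != len(current):
        raise ValueError("voltage and current must have the same length")
    if len(voltage) < 2:
        return 0.0
    dt = 1.0 / sample_rate_hz
    power = [v * i for v, i in zip(voltage, current)]
    # Average each pair of adjacent power samples over one sampling interval.
    return sum((p0 + p1) * 0.5 * dt for p0, p1 in zip(power, power[1:]))


# One second of a constant 12 V, 10 A load sampled at 120 S/s
# (the Level 3 minimum rate for DC). 121 samples span 120 intervals.
n = 121
energy_j = integrate_energy([12.0] * n, [10.0] * n, 120)
print(f"{energy_j:.1f} J")
```

For a constant 120 W load over one second the result should be 120 J, which makes the sketch easy to sanity-check before feeding it real samples.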

Some (CORRECT) terminology

▪ The terms accuracy and precision are often wrongly used interchangeably
▪ However:
  ▪ Accuracy → mean of the error (systematic offset; can be fixed by calibration)
  ▪ Precision → standard deviation of the error (random spread; no fix)

Note: These definitions apply to any set of measurements (e.g., power measurements, but also time-synchronization measurements)
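The distinction can be checked numerically. The sketch below, using made-up readings of a known 100 W reference load, computes the mean error (accuracy) and the standard deviation of the error (precision), then shows that subtracting the mean error fixes the first but leaves the second unchanged. All names and values are illustrative assumptions.

```python
import statistics


def accuracy_and_precision(readings, reference):
    """Return (mean error, 1-sigma spread of the error) against a reference."""
    errors = [r - reference for r in readings]
    bias = statistics.mean(errors)      # accuracy: systematic offset
    spread = statistics.stdev(errors)   # precision: random spread (sample std dev)
    return bias, spread


def calibrate(readings, bias):
    """Remove a known systematic offset from every reading."""
    return [r - bias for r in readings]


reference_w = 100.0
readings_w = [103.0, 105.0, 104.0, 106.0, 102.0]  # made-up meter readings

bias, spread = accuracy_and_precision(readings_w, reference_w)
corrected = calibrate(readings_w, bias)
bias_after, spread_after = accuracy_and_precision(corrected, reference_w)

print(f"before calibration: bias={bias:.2f} W, spread={spread:.2f} W")
print(f"after calibration:  bias={bias_after:.2f} W, spread={spread_after:.2f} W")
```

Dividing the spread by the reference value gives the relative 1σ error used in the precision requirements.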

Fine-grain Sync w. App Phases on HPC Clusters [1]

[Figure: HPC cluster monitoring and per-node time series aligned with the phases of a parallel application; annotations: μs resolved time stamps, Power @1s; labels: Rack, node1, ESoC_1 ... ESoC_n, CRAC, Cold air/water, Hot air/water, C˚, RPM FAN, Power, Perf counters, GPU, CPU1 ... CPUn, Clock, Several Metrics, P0 ... Pn, APP, MPI Synch, Node 1 ... Node n, Time, Temp, Cache Miss]

12-Nov-2018, Antonio Libri

[1] Libri et al., 2018, Evaluation of NTP / PTP Fine-Grain Synchronization Performance in HPC Clusters
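As background for the NTP/PTP comparison, both protocols estimate the clock offset from four timestamps exchanged between two clocks. The sketch below shows that standard two-way calculation, assuming a symmetric network path. It is not taken from the slides, and the function name and timestamp values are illustrative.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Two-way time transfer estimate (timestamps in the same unit).

    t1: request sent, read on the reference clock
    t2: request received, read on the local clock
    t3: reply sent, read on the local clock
    t4: reply received, read on the reference clock
    Assumes the one-way delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2   # local clock minus reference clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way path delay
    return offset, delay


# Synthetic exchange in nanoseconds: local clock 5 us ahead, 10 us path delay.
offset_ns, delay_ns = offset_and_delay(t1=0, t2=15_000, t3=20_000, t4=25_000)
print(f"offset = {offset_ns / 1000:.1f} us, delay = {delay_ns / 1000:.1f} us")
```

Any asymmetry between the two path delays shows up directly as an offset error, which is one reason hardware time stamping matters for μs-level synchronization.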


SoA HPC Monitoring Systems

▪ Current solutions collect measurements in-band or out-of-band (the latter with no overhead on the PE) via
  ▪ built-in tools (e.g., IPMI, Amester, RAPL → hw perf counters)
  ▪ custom sensors (e.g., HDEEM, HAEC → fine grain power meas)

SoA In-band HPC Power Monitoring systems

State-of-the-art:
1. Intel RAPL [1]

- Sampling period down to 1 ms
- Energy read via the RAPL MSR registers
- Synchronization via NTP/PTP (vendor dependent)
- Precision is vendor dependent
- Scalable

[Figure: node block diagram; labels: PE1-PE3, DC/DC, VR, PSU, BMC, User-Space, SW, HW, RAPL MSR Register (MSR_Safe @User Space)]

[1] M. Hahnel et al., “Measuring Energy Consumption for short Code Paths Using RAPL”
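RAPL exposes a cumulative energy counter, so power is obtained from the difference between two reads divided by the elapsed time. The sketch below illustrates this, including the counter wraparound. Note the assumption: instead of the MSR path named on the slide, it reads the Linux powercap sysfs files, and the directory used here is a guess for the package-0 domain that may differ or need elevated privileges on a given system.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain in the Linux powercap interface.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before_uj, after_uj, max_range_uj):
    """Energy in microjoules between two counter reads, tolerating one wraparound."""
    if after_uj >= before_uj:
        return after_uj - before_uj
    # The counter wrapped: add the part up to the maximum and the part after zero.
    return (max_range_uj - before_uj) + after_uj


def average_power_w(interval_s=0.1):
    """Average package power over a short interval, in watts."""
    max_range = int((RAPL_DIR / "max_energy_range_uj").read_text())
    e0 = int((RAPL_DIR / "energy_uj").read_text())
    t0 = time.monotonic()
    time.sleep(interval_s)
    e1 = int((RAPL_DIR / "energy_uj").read_text())
    t1 = time.monotonic()
    return energy_delta_uj(e0, e1, max_range) / 1e6 / (t1 - t0)


if __name__ == "__main__":
    try:
        print(f"package power: {average_power_w():.2f} W")
    except OSError as exc:
        # No RAPL domain at the assumed path, or not enough privileges to read it.
        print(f"RAPL counter not readable here: {exc}")
```

Because the measurement runs on the monitored CPU itself, the polling interval trades time resolution against in-band overhead.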

SoA Out-of-band HPC Power Monitoring systems

State-of-the-art:
1. BMC – IPMI [1]

- Slow sampling time (seconds)
- Unreliable time stamping
- Instantaneous power samples only: no integrated energy, prone to aliasing
- Precision is vendor dependent
- Scalable

[Figure: node block diagram; labels: PE1-PE3, DC/DC, VR, PSU, BMC, P(t), IPMI, User-Space, Remote Management Node, System Administrator, SW, HW]

[1] IPMI spec v2.0 rev1.1, April 2015
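The aliasing problem of slow instantaneous sampling can be reproduced with a synthetic load. In the sketch below, a made-up workload alternates between 300 W and 100 W once per second; a monitor taking one instantaneous reading per second always hits the same phase, while a 1 kS/s monitor integrates the true energy. The load shape and all names are illustrative assumptions.

```python
def load_power_w(t_ms):
    """Synthetic load: 300 W during the first 500 ms of every second, 100 W after."""
    return 300.0 if (t_ms % 1000) < 500 else 100.0


duration_s = 10

# Coarse monitor: one instantaneous reading per second, each assumed to hold for 1 s.
coarse_readings = [load_power_w(k * 1000) for k in range(duration_s)]
coarse_energy_j = sum(coarse_readings) * 1.0

# Fine monitor: one reading per millisecond, summed over 1 ms intervals.
fine_readings = [load_power_w(t) for t in range(duration_s * 1000)]
fine_energy_j = sum(fine_readings) * 0.001

error_pct = 100.0 * (coarse_energy_j - fine_energy_j) / fine_energy_j
print(f"coarse estimate: {coarse_energy_j:.0f} J")
print(f"fine estimate:   {fine_energy_j:.0f} J")
print(f"coarse error:    {error_pct:+.0f} %")
```

Here the coarse monitor only ever sees the 300 W phase, so its energy estimate is 50% too high; with a different phase alignment it would be too low instead.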

SoA Out-of-band HPC Power Monitoring systems

State-of-the-art:
1. BMC – IPMI
2. IBM Amester [1]

- BMC reads power measurements via the OCC (On-Chip Controller, within the PE)
- An in-band version is also available on the new Power9
- Time resolution down to 250 μs (8 kB buffers)
- Synchronization via NTP/PTP (vendor dependent)
- Precision is vendor dependent
- Scalable

[Figure: node block diagram; labels: PE1-PE3, DC/DC, OCC, PSU, BMC, IPMI, User-Space, Remote Management Node, System Administrator, SW, HW]

[1] T. Rosedahl et al., “Power/Performance Controlling Techniques in OpenPOWER”

SoA Out-of-band HPC Power Monitoring systems

State-of-the-art:
1. BMC – IPMI
2. IBM Amester
3. HDEEM [1]

- Time resolution down to 1 ms (VR on CPU, DDR) and 125 μs (plug)
- Precision of 2% and 3%, respectively
- Time synchronization at the ms level via NTP
- Scalable

[Figure: node block diagram; labels: PE1-PE3, DC/DC, VR, BMC, FPGA, P(t), IPMI, Database, User, User-Space, Remote Management Node, System Administrator, SW, HW]

[1] Ilsche et al., “Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution and Accuracy”

SoA Out-of-band HPC Power Monitoring systems

State-of-the-art:
1. BMC – IPMI
2. IBM Amester
3. HDEEM
4. HAEC [1]

- Two NI-DAQ units: one @7 kS/s (T = 143 μs, VR on CPU and DDR), one @500 kS/s (T = 2 μs, power plug)
- Precision below 2%
- Current monitoring with shunt resistors (Hall-effect (HE) sensors also tested)
- Time synchronization at the ms level via NTP
- Not scalable (single node only)

[Figure: node block diagram; labels: PE1-PE3, DC/DC, Shunt, Shunt Resistor, BMC, NI-DAQ, P(t), User, User-Space, Remote Management Node, System Administrator, SW, HW]

[1] Ilsche et al., “Power Measurement Techniques for Energy-Efficient Computing: Reconciling Scalability, Resolution and Accuracy”
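The sampling periods quoted for HAEC follow directly from the sampling rates (T = 1 / rate). A short check, with a helper name chosen here only for illustration:

```python
def sampling_period_us(rate_samples_per_s):
    """Sampling period in microseconds for a given sampling rate in S/s."""
    return 1e6 / rate_samples_per_s


for label, rate in [("VR on CPU, DDR", 7_000), ("power plug", 500_000)]:
    print(f"{label}: {rate} S/s -> T = {sampling_period_us(rate):.0f} us")
```

The same conversion applies to the other systems in this review, for example 1 kS/s for a 1 ms resolution.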

SoA Out-of-band HPC Power Monitoring systems

State-of-the-art:
1. BMC – IPMI
2. IBM Amester
3. HDEEM
4. HAEC
5. CRAY XC APM [1]

- Measurements via PMBus
- Time resolution down to 1 s (node power) and 10 s (CPU & memory power)
- Precision ±2.5% (by datasheet)
- Time synchronization at the ms level via NTP
- Scalable

[Figure: node block diagram; labels: PE1-PE3, DC/DC, VR, BMC, PMBUS, P(t), IPMI, Database, User, User-Space, Remote Management Node, System Administrator, SW, HW]

[1] Steven J. Martin et al., “Cray XC Advanced Power Management Updates”

SoA Out-of-band HPC Power Monitoring systems

State-of-the-art:
1. BMC – IPMI
2. IBM Amester
3. HDEEM
4. HAEC
5. CRAY XC APM
6. PowerInsight [1] / ArduPower [2]

- Open, low-cost embedded SoC (PowerInsight → Beaglebone; ArduPower → Arduino Mega 2560)
- Current monitoring with HE sensor
- Time resolution down to 1 ms on PowerInsight; ~2 ms on ArduPower
- Precision 1.8% on PowerInsight; not reported for ArduPower
- Time synchronization at the ms level via NTP
- Scalable

[Figure: node block diagram; labels: PE1-PE3, DC/DC, VR, BMC, ESoC, P(t), IPMI, User, User-Space, Remote Man. Node, Sys Admin, SW, HW; board photos: ArduPower (Arduino Mega 2560), PowerInsight (Beaglebone)]

[1] J. L. Laros et al., “PowerInsight – A commodity Power Measurement Capability”
[2] M. F. Dolz et al., “ARDUPOWER: A Low-cost Wattmeter to improve Energy Efficiency of HPC Applications”

SoA Out-of-band HPC Power Monitoring systems

State-of-the-art:
1. BMC – IPMI
2. IBM Amester
3. HDEEM
4. HAEC
5. CRAY XC APM
6. PowerInsight / ArduPower
7. DiG [1]

- Open, low-cost embedded platform (Beaglebone Black, ARM Cortex-A8)
- Tested with both HE sensors and shunt resistors on several architectures (Intel, ARM, IBM)
- Time resolution down to 20 μs at the plug; VRs are read at different rates depending on the architecture
- Precision below 1%
- Time synchronization at the μs level via PTP
- Scalable: big-data communication protocol (MQTT) + real-time edge analytics of the fine-grain measurements

Beaglebone Black (BBB):
- 1 GHz ARM Cortex-A8
- 12-bit 8-channel SAR ADC
- PTP HW enabled

[Figure: node block diagram; labels: PE1-PE3, DC/DC, VR, BMC, BBB, MQTT, P(t), Database, User, User-Space, Remote Man. Node, Sys Admin, SW, HW]

[1] Libri et al., “DiG: Enabling Out-of-Band Scalable High-Resolution Monitoring for Data-Center Analytics, Automation and Control”


Example of Fine-Grain Power Meas w. DiG

[Figure: power traces at different granularities; panel labels: Coarse Grain View, BB View, 1 Node - 20 min, 45 Nodes - 4 s, 45 Nodes - 1 s, BB @1s, BB @1ms]

Cluster Analytics – Data Collection SW Stack [1,2]

Back-end
• Send Pow and Perf measurements for cluster-level analytics

Front-end
• Exploit Cassandra (NoSQL DB)
• Data Visualization (Grafana) and Cluster Level ML (Spark, both RT & batch mode)

[Figure: data collection stack; labels: Target Facility (Meas), MQTT Brokers (Broker 1 ... Broker M), MQTT2Kairos, KairosDB, Cassandra node 1 ... node M (NoSQL), Grafana, Apache Spark, Applications (Python, Matlab)]

[1] https://github.com/EEESlab/examon
[2] F. Beneventi et al., “Continuous learning of HPC infrastructure models using big data analytics and in-memory processing tools”
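To make the MQTT-to-KairosDB step of the stack concrete, the sketch below converts one MQTT message into a data point in the JSON shape accepted by the KairosDB REST API (name, timestamp in milliseconds, value, tags). The topic layout `cluster/<cluster>/node/<node>/<metric>` and the `value;timestamp` payload are hypothetical choices made for this example and are not claimed to be the format used by the stack in [1].

```python
import json


def mqtt_to_kairos(topic, payload):
    """Map a (hypothetical) MQTT topic and payload to one KairosDB data point."""
    parts = topic.split("/")
    if len(parts) != 5 or parts[0] != "cluster" or parts[2] != "node":
        raise ValueError(f"unexpected topic layout: {topic}")
    _, cluster, _, node, metric = parts
    value_str, ts_str = payload.split(";")
    return {
        "name": metric,
        "timestamp": int(float(ts_str) * 1000),  # seconds -> milliseconds
        "value": float(value_str),
        "tags": {"cluster": cluster, "node": node},
    }


# Example message; the node name and values are made up.
point = mqtt_to_kairos("cluster/galileo/node/node001/power", "215.5;1549000000.250")
body = json.dumps([point])  # a list of points, ready to be sent to the database
print(body)
```

Keeping the cluster and node as tags rather than in the metric name lets the same metric be queried per node or aggregated across the cluster.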

DiG: High-Res Power Monitoring [1,2]

▪ Fourier transform of high-resolution power measurements as an example of a feature-extraction technique for time series

[Figure: two panels labeled Application 1 and Application 2]

[1] Libri et al., 2018, DiG: Enabling Out-of-Band Scalable High-Resolution Monitoring for Data-Center Analytics, Automation and Control
[2] Borghesi et al., 2018, “Online Anomaly Detection in HPC systems”
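A minimal sketch of Fourier-based feature extraction on one window of power samples follows. It assumes a 20 μs sampling period and a 40 ms window (the figures quoted elsewhere in these slides), uses a synthetic trace, and computes only the first few frequency bins with a plain DFT from the standard library rather than an optimized FFT. Function names and the test signal are illustrative.

```python
import cmath
import math

SAMPLE_RATE_HZ = 50_000   # 20 us sampling period
WINDOW_MS = 40            # analysis window length


def dft_magnitudes(samples, num_bins):
    """Amplitudes of DFT bins 1..num_bins after removing the mean (DC) value."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    magnitudes = []
    for k in range(1, num_bins + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        magnitudes.append(2 * abs(acc) / n)  # scale to the sinusoid amplitude
    return magnitudes


n_samples = SAMPLE_RATE_HZ * WINDOW_MS // 1000   # samples per window
bin_hz = 1000 / WINDOW_MS                        # frequency resolution of the window

# Synthetic trace: 200 W baseline with a 30 W oscillation at 100 Hz.
trace_w = [200.0 + 30.0 * math.sin(2 * math.pi * 100 * i / SAMPLE_RATE_HZ)
           for i in range(n_samples)]

features = dft_magnitudes(trace_w, num_bins=8)
dominant_hz = (features.index(max(features)) + 1) * bin_hz
print(f"{n_samples} samples, {bin_hz:.0f} Hz per bin, dominant component at {dominant_hz:.0f} Hz")
```

The resulting vector of bin amplitudes is the kind of compact feature set that could be passed to a classifier in place of the raw high-rate samples.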

Our Vision of Future Fine-Grain Analytics – The DiG Approach [1,2]

Leverage real-time analysis between:
▪ Embedded monitoring: high-resolution power/performance edge analytics
▪ Central monitoring: power/performance cluster analytics

[Figure: edge-to-cluster data flow; labels: Data-center Servers (PSU, DC/DC, PE), Embedded Computer (Power, I, V, Perf Mon), Edge Analytics @high rate, MQTT Pub, Pub(top, data), MQTT Broker, MQTT Sub, Sub(top), Central Mon, Cluster Analytics @low rate]

[1] Libri et al., 2018, DiG: Enabling Out-of-Band Scalable High-Resolution Monitoring for Data-Center Analytics, Automation and Control
[2] Borghesi et al., 2018, “Online Anomaly Detection in HPC systems”
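The split between high-rate edge analytics and low-rate cluster analytics can be sketched as follows: the embedded computer reduces each window of high-rate samples to a few statistics and publishes only that summary. The `publish` argument is a stand-in for the publish call of an MQTT client, and the topic string, node name, and summary fields are hypothetical.

```python
import json
import statistics


def summarize_window(samples_w, node, window_start_s):
    """Reduce one window of high-rate power samples to a small summary record."""
    return {
        "node": node,
        "t": window_start_s,
        "mean_w": statistics.fmean(samples_w),
        "max_w": max(samples_w),
        "std_w": statistics.pstdev(samples_w),
    }


def edge_loop(windows, node, publish):
    """Publish one summary per window; `publish(topic, payload)` is injected."""
    for start_s, samples in windows:
        summary = summarize_window(samples, node, start_s)
        publish(f"edge/{node}/power_summary", json.dumps(summary))


# Two 1 s windows of 1 kS/s samples (made-up values); messages are collected
# in a list here instead of being sent to a broker.
sent = []
windows = [(0, [100.0] * 500 + [300.0] * 500), (1, [150.0] * 1000)]
edge_loop(windows, "node001", lambda topic, payload: sent.append((topic, payload)))

for topic, payload in sent:
    print(topic, payload)
```

In this example 1000 samples per second shrink to one short message per second, which is the kind of rate reduction that keeps the central monitor and the broker lightly loaded.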

Take home messages

▪ Several challenges on HPC systems require reliable and high-resolution measurements (precise, fine-grained & synchronized)
▪ Several SoA methods exist: in-band and out-of-band, using built-in and custom sensors
▪ Fine-grain measurements can reveal valuable information for profiling applications and system behavior (e.g., detection of anomalies)
▪ Leverage the huge amount of fine-grain measurement data by splitting the analysis between edge-level and cluster-level analytics

Thanks for your interest

Contact:
▪ Antonio Libri, [email protected]

Acknowledgements:
▪ EU H2020 FETHPC project ANTAREX (g.a. 671623)
▪ Andrea Bartolini, Francesco Beneventi, Andrea Borghesi and Luca Benini

Backup Slides

DiG: SW Overhead, Scalability & ML Infer

▪ SW overhead: DiG CPU usage of the monitoring daemons < 46% (soon 0% thanks to co-processor offloading)
▪ Synch: at the μs level via PTP (below the 20 μs sampling period)
▪ MQTT scalability: tested on 512 nodes of GALILEO (CINECA) → suitable for large-scale systems
▪ Real-time ML inference, preliminary benchmarks:
  ▪ RT feature extraction via FFT on high-resolution measurements in a time window of 40 ms, with around 7% of DiG CPU usage
  ▪ RT ML inference via TensorFlow (TF) with a 16-layer ResNet with channels {16; 16; 32; 64}, respecting the real-time constraint of 40 ms (FFT time window)