HPC Performance Tools, on the Road to Exascale
TRANSCRIPT
David Tur, HPC Scientist
Maria Isabel Gandia, Communications Manager
EduPERT call, Barcelona, Dec 4th 2014
HPC Performance tools, on the road to Exascale
1. Introduction
2. HPC, On the Road to Exascale
3. Performance Tools
4. European Exa-Initiatives
5. Q&A
Introduction
Our goal is to share academic, scientific and management services among our institutions, improving efficiency by promoting synergies and economies of scale.
The Consortium
Introduction
• Management of e-infrastructures for the universities and research centres, like Anella Científica, the Regional Research and Education Network in Catalonia
• Improvement of library services through collaboration among universities
The Consortium
Communications
• 81 connected institutions
• 85 points of access
• Traffic (2014): 29.78 TB
• Several 10 Gbps links for projects
• 26 (+3) institutions in eduroam
• 25 members at CATNIX
• 10 Gbps interconnection
The Consortium
E-Administration
• Digital certification, e-voting, e-register, e-signature
• Digital certificates custody, interoperability, archive, logs
• 30 elections
• 8 universities
• 20,000 R (input)
• 11,000 R (output)
• +3,000 certificates
The Consortium
High Performance Computing
The Consortium
• HPC for scientific (academic) groups
• Drug Design Service
• HPC for industrial and engineering users
Introduction
Coming challenges in HPC
• Science in (Big) Data: Giga to Exabytes
  Big Data is not (only) about big storage, but about methods & HPC machines
• Scientific Simulations: Giga to Exaflops
HPC Performance tools, on the road to Exascale
1. Introduction
2. HPC, On the Road to Exascale
3. Performance Tools
4. European Exa-Initiatives
5. Q&A
economies d’escala
HPC, On the Road to Exascale
Measure of computer performance: Flops or Flop/s (FLoating-point Operations Per Second)
EXASCALE
Zettaflops = 10^21 Flop/s
Exaflops  = 10^18 Flop/s
Petaflops = 10^15 Flop/s
Teraflops = 10^12 Flop/s
Gigaflops = 10^9 Flop/s
Megaflops = 10^6 Flop/s
Kiloflops = 10^3 Flop/s
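As a back-of-the-envelope illustration (not from the slides), theoretical peak performance is usually estimated as cores × clock × floating-point operations per cycle; the machine parameters below are illustrative assumptions, not measured values:

```python
# Rough peak-performance estimate: cores * clock (Hz) * FLOPs per cycle.
# Real applications typically reach only a fraction of this figure.

def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak in Flop/s for a single node."""
    return cores * clock_hz * flops_per_cycle

# Hypothetical node: 16 cores at 2.5 GHz, 8 double-precision FLOPs/cycle.
peak = peak_flops(16, 2.5e9, 8)
print(f"{peak:.2e} Flop/s = {peak / 1e12:.1f} TFlop/s")
# prints 3.20e+11 Flop/s = 0.3 TFlop/s
```

Comparing such a per-node figure against the prefix table above shows how far a single node sits from the Exaflop target.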
HPC, On the Road to Exascale
What the U.S. Department of Energy concludes (and we Europeans should bear in mind) is:
“U.S. economic competitiveness will also be significantly enhanced as companies utilize the accelerated development of superior new products and the creativity spurred by modeling and simulation at unprecedented speed and fidelity. An important ingredient is the boost to continued international leadership of the U.S.-based information technology industry at all levels, from laptop to exaflop.”
State of the Art
HPC, On the Road to Exascale
• Adaptation to regional climate changes • Reduction of the carbon footprint
• Efficiency and safety of the nuclear energy sector
• Innovative designs for cost-effective renewable energy resources such as batteries, biofuels…
• Design of advanced experimental facilities, such as accelerators, and inertial confinement fusion
• First-principles understanding of the properties of fission and fusion reactions
• Reverse engineering of the human brain
• Design, control and manufacture of advanced materials
Why Exascale?
HPC, On the Road to Exascale
1945: first electronic general-purpose computer, ENIAC (Electronic Numerical Integrator And Computer)
Designed for the United States Army’s Ballistic Research Laboratory
1,000 times faster than electro-mechanical machines; 20 tons, 167 m²
150 kW, 100 kHz and ~400 Flop/s
Brief history of HPC
Altix UV 1000 (2010): Xeon X7542, 1344 cores, 6.14 TB of memory, 12.61 TFlop/s
HPC, On the Road to Exascale
Brief history of HPC
HPC, On the Road to Exascale
Power is the limitation: ~20 MW per system
Computational performance must grow ×200,
while power may only grow ×2 (×5 at most)
First Exascale system expected between 2018 and 2020
This means a paradigm change! Something must happen… something is happening!
The power constraint
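The arithmetic behind the 20 MW constraint can be checked directly; the petascale baseline figures below are illustrative assumptions, not data from the slides:

```python
# Energy efficiency required for an Exascale system under a ~20 MW power cap.
EXAFLOP = 1e18       # target performance, Flop/s
POWER_CAP_W = 20e6   # ~20 MW power budget

required_gflops_per_watt = EXAFLOP / POWER_CAP_W / 1e9
print(f"required efficiency: {required_gflops_per_watt:.0f} GFlop/s per watt")
# -> required efficiency: 50 GFlop/s per watt

# Illustrative 2014-era petascale baseline: ~10 PFlop/s at ~10 MW.
baseline = 10e15 / 10e6 / 1e9
print(f"baseline efficiency: {baseline:.0f} GFlop/s per watt")  # -> 1
print(f"improvement needed:  x{required_gflops_per_watt / baseline:.0f}")  # -> x50
```

A ×50 efficiency gain against only a ×2 power budget is exactly why the slides speak of a paradigm change rather than incremental scaling.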
HPC, On the Road to Exascale
New computing architectures
HPC system consumers (and providers)
‘forgot’ about energy but…
the mobile world didn’t!
They also forgot about vector computation but…
the gaming industry recovered it!
HPC, On the Road to Exascale
New computing architectures
Hardware (type of nodes)
• Multicore
• Many-core
• Low-power cores
HPC, On the Road to Exascale
New software paradigms
Hybrid software is also arriving
• OpenMP – works on a single node or with Intel MIC.
• MPI – works on a single node, multiple nodes and Intel MIC.
• MPI with OpenMP (hybrid) – works on a single node, multiple nodes, possibly on Intel MIC.
• OpenCL – on a single CPU node with an AMD/NVIDIA GPU.
• MPI with CUDA (hybrid) – works on multiple nodes with NVIDIA GP-GPUs.
• MPI with OpenCL (hybrid) – works on multiple AMD/NVIDIA GP-GPU and x86 nodes.
• OpenACC (Fortran/C) with MPI – works with multiple nodes and with GP-GPUs.
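The hybrid pattern above (distributed ranks, each doing shared-memory work inside a node) can be sketched structurally in plain Python with the stdlib multiprocessing module. This is only an analogy for MPI+OpenMP, not actual HPC code:

```python
# Structural sketch of the hybrid pattern: independent worker processes
# play the role of MPI ranks, each reducing its chunk of the data locally,
# and the final combine step plays the role of MPI_Reduce.
# multiprocessing stands in for MPI here purely for illustration.
from multiprocessing import Pool

def local_sum(chunk):
    # Inside a real rank, this loop would itself be parallelized
    # across the node's cores with OpenMP threads.
    return sum(chunk)

def hybrid_sum(data, n_ranks=4):
    chunks = [data[i::n_ranks] for i in range(n_ranks)]  # "scatter"
    with Pool(n_ranks) as pool:
        partials = pool.map(local_sum, chunks)           # per-rank work
    return sum(partials)                                 # "reduce"

if __name__ == "__main__":
    print(hybrid_sum(list(range(1000))))  # -> 499500
```

The same two-level decomposition (distribute across nodes, then thread within each node) is what MPI+OpenMP, MPI+CUDA and MPI+OpenCL all implement with different intra-node technologies.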
• Computing simulation: the 3rd pillar of Science (alongside Experiment & Theory)
• Less importance of Flop/s, more of other factors (bandwidth, overheads, latency…)
• Power consumption rules the Exaworld
• New paradigms need to be implemented (co-design): new software, numerical methods, codes, hardware architectures
• Heterogeneous & hybrid hardware solutions (CPU/GPU/ARM/…)
HPC, On the Road to Exascale
Conclusions
New (or evolved) performance tools are needed
Performance Tools
Supercomputing tools
What’s needed to perform simulations on a supercomputer
• OS
• Environment modules
• Batch system (Slurm, LSF, Maui, PBS, …)
• Mathematical Libraries (Trilinos, MKL, CUDA Toolkit, …)
• Parallel I/O Tools (NetCDF, HDF5, …)
• MPI Libraries (OpenMPI, “vendor” MPI, …)
• Programming Tools (Chapel, Intel TBB, GA, …)
• Compilers (GNU, Intel, PGI, “vendor”, …)
• Performance Tools
Performance Tools
Performance Tools
We are not reinventing the wheel!
VI-HPS: European initiative – http://www.vi-hps.org/
NERSC: U.S. Department of Energy – https://www.nersc.gov/
Performance Tools
• Debugging: tools used to investigate codes, searching for errors in the control flow
• Performance Analysis: provides information about the runtime behavior of an application, validating how efficiently the hardware is used
• Correctness: checking tools that detect and report error patterns
Performance Tools
Performance Tools
Debugging
• Debugging: these tools are used to investigate codes, searching for errors in the control flow
• Examination of variable values
• Live during execution, or post-mortem
Performance Tools
Debugging
• Allinea DDT: GUI-based parallel debugger, which can be used to debug serial, MPI, threaded (OpenMP) and GPU (CUDA) applications.
• TotalView: GUI-based source-code debugger and defect-analysis tool that allows debugging one or many processes and/or threads in a single window, with complete control over program execution.
• Intel Inspector XE: a memory and threading error debugger for C, C++, C# and Fortran applications.
Performance Tools
Performance analysis
• Performance Analysis: provides information about the runtime behavior of an application, validating how efficiently the hardware is used
• Data used by the tool can be obtained in various ways: from static code analysis, measurements or simulations
• It is important to explore the “vendor” (SGI, Cray, …) software
Performance Tools
Performance analysis
• Vendor software: tools for measuring and analyzing major components are offered by supercomputing vendors: CrayPat, SGI Performance Suite, …
• Intel VTune Amplifier XE: application for software performance analysis on 32- and 64-bit x86 and MIC-based machines.
Performance Tools
Performance analysis
Extrae / Paraver:
Paraver was developed to give a global perception of the application behavior by visual inspection. The modular structure of Paraver plays a significant role in achieving these targets.
Extrae is the tool devoted to generating the tracefiles that Paraver later analyzes. Extrae injects probes into the target application to gather information about its performance.
DIMEMAS: a performance-analysis tool for message-passing programs. It enables the user to develop and tune parallel applications while providing a prediction of their performance on the target parallel machine. Dimemas generates trace files suitable for Paraver and Vampir.
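The probe-injection idea behind Extrae can be illustrated with a toy tracer in Python. This is a conceptual sketch only: Extrae itself instruments the running binary and writes Paraver-format tracefiles, none of which is reproduced here.

```python
# Toy illustration of probe injection: wrap functions so that entry/exit
# events with timestamps are appended to an in-memory "tracefile".
# This mimics the concept only; it is not how Extrae is actually used.
import functools
import time

trace = []  # each entry: (timestamp, event, function name)

def probe(func):
    """Inject enter/exit probes around a function call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace.append((time.perf_counter(), "enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            trace.append((time.perf_counter(), "exit", func.__name__))
    return wrapper

@probe
def compute(n):
    return sum(i * i for i in range(n))

compute(10_000)
for ts, event, name in trace:
    print(f"{ts:.6f} {event} {name}")
```

A visualizer like Paraver consumes exactly this kind of timestamped event stream, at much larger scale and per MPI process/thread.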
Performance Tools
Performance analysis
TAU (Tuning and Analysis Utilities) is a profiling and tracing toolkit for performance analysis of parallel programs written in most programming languages. TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements, as well as event-based sampling. In addition, TAU can generate event traces that can be displayed with the Vampir, Paraver or JumpShot trace-visualization tools.
Performance Tools
Performance analysis
UNITE (UNiform Integrated Tool Environment): provides a robust, portable, and integrated environment for the debugging and performance analysis of parallel MPI, OpenMP, and hybrid MPI/OpenMP programs on HPC systems.
• Parallel Debugging
  • TotalView
  • DDT
• Parallel Performance Analysis
  • Extrae and Paraver
  • Marmot
  • mpiP
  • HPCToolkit
  • Intel Trace Collector and Analyzer
  • Kcachegrind
  • MUST
  • PerfSuite
  • Periscope
  • TAU
  • Scalasca
  • Vampir
Performance Tools
Performance Tools
For more detailed information visit www.numexas.eu or:
VI-HPS: European initiative http://www.vi-hps.org/
NERSC: U.S. Department of Energy https://www.nersc.gov/
Gaia European Network for Improved User Services
Gaia is a European Space Agency Cornerstone mission, launched in 2013 and aiming to produce the most accurate 3D map of the Milky Way to date. A pan-European consortium named DPAC, funded at the national level, is working on the implementation of the Gaia data processing, whose final result will be a catalogue and data archive containing one billion objects.
GENIUS aims to contribute significantly to the development of this archive, to help the exploitation of the Gaia results in order to maximize the scientific return, and to facilitate related outreach activities.
WP-350 Hardware considerations
WP-630 Archive testbed
- Simulations, aimed at testing and validating the system
- Data mining: querying the archive to obtain new knowledge
- Gaining expertise in the use of the technology (Hadoop cluster, Hive)
- Community portal (design, development and hosting)
European Exa-Initiatives
GENIUS
European Exa-Initiatives
Coordinator: Jülich
Cost: 18,500,000 €
Coordinator: BSC
Cost: 14,000,000 €
Coordinator: EPCC, The University of Edinburgh
Cost: 12,000,000 €
Coordinator: TOTAL S.A.
Cost: 640,000 €
Coordinator: CIMNE
HPC Exascale Projects
Numerical Methods and Tools for Key Exascale Computing Challenges in Engineering and Applied Sciences
The overall aim of the NUMEXAS project is to develop, implement and validate the next generation of numerical methods to be run under exascale computing architectures. This will be done by implementing a new paradigm for the development of advanced numerical methods to really exploit the intrinsic capabilities of the future exascale computing infrastructures. The main outcome of NUMEXAS will be a new set of numerical methods and codes that will allow industry, government and academia to solve exascale-class problems in engineering and applied sciences.
European Exa-Initiatives
NUMEXAS
T1.1 Report with scalability test
T1.2 Report with scalability assessment of available codes
T2.1 Study, evaluation and installation of supercomputing “generic tools”
T2.2 Available math libraries for the exascale
T2.3 Identification of commonality developments
T8.1 Evaluation and installation of benchmarking tools, including description of commonality developments
T8.2 Analysis of software, including initial scaling behaviour
T8.3 Software prototypes with hardware-specific enhancements for different platforms
T12.1 Collaborative Web Portal
European Exa-Initiatives
NUMEXAS
WP1: Survey of existing software
WP2: Exascale-oriented resources
WP3–7: Scalable pre-processing, solver and post-processing developments
WP8: Benchmarking, profiling and hardware optimization
NUMEXAS
European Exa-Initiatives
SCALABILITY TEST
HPL (High Performance Linpack) results
• Very high efficiency
• SMP slightly outperforms MPP
  – computing network or architecture efficiency
  – large number of cores
• LINPACK may not scale linearly
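Why a benchmark “may not scale linearly” can be made concrete with Amdahl’s law; the 5% serial fraction below is an illustrative assumption, not a measured NUMEXAS figure:

```python
# Amdahl's law: speedup on p processors when a fraction s of the work
# is inherently serial. Even a small serial fraction caps the speedup.
def amdahl_speedup(p, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

s = 0.05  # assumed 5% serial work
for p in (1, 16, 256, 4096):
    sp = amdahl_speedup(p, s)
    print(f"p={p:5d}  speedup={sp:7.1f}  efficiency={sp / p:.1%}")
# With s = 5%, speedup can never exceed 1/s = 20, however many cores are used.
```

Parallel efficiency (speedup divided by core count) therefore collapses at scale, which is the usual reason LINPACK-style runs fall short of linear scaling.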
NUMEXAS
HPL and HPCCG for Phi
• Technical difficulties did not allow testing on more than one node
SCALABILITY TEST
NUMEXAS
HPC, On the Road to Exascale
Conclusions
• Making the transition to Exascale poses numerous unavoidable scientific and technological challenges
• Reducing Power requirements
• Coping with run-time errors (billions of cores!)
• Exploiting massive parallelism
• The benefits of Exascale far outweigh the costs!
• Paradigm change: it is difficult to know what will happen, but science breakthroughs at Exascale will lead to exponential growth in new industries
Questions?
David Tur, HPC Scientist
Maria Isabel Gandia, Communications Manager
EduPERT call, Barcelona, Dec 4th 2014