HPC Performance Tools, on the Road to Exascale
TRANSCRIPT
David Tur, HPC Scientist
Maria Isabel Gandia, Communications Manager
EduPERT call, Barcelona, Dec 4th 2014
HPC Performance tools, on the road to Exascale
1. Introduction
2. HPC, On the Road to Exascale
3. Performance Tools
4. European Exa-Initiatives
5. Q&A
Introduction
Our goal is to share academic, scientific and management services among our institutions, improving efficiency by promoting synergies and economies of scale.
The Consortium
Introduction
• Management of e-infrastructures for the universities and research centres, like Anella Científica, the Regional Research and Education Network in Catalonia
• Improvement of library services through collaboration among universities
The Consortium
Communications
• 81 connected institutions
• 85 points of access
• Traffic (2014): 29.78 TB
• Several 10 Gbps links for projects
• 26 (+3) institutions in eduroam
• 25 members at CATNIX
• 10 Gbps interconnection
The Consortium
E-Administration
• Digital certification, e-voting, e-register, e-signature
• Digital certificates custody, interoperability, archive, logs
• 30 elections
• 8 universities
• 20,000 R (input)
• 11,000 R (output)
• +3,000 certificates
The Consortium
High Performance Computing
The Consortium
• HPC for scientific (academic) groups
• Drug Design Service
• HPC for industrial and engineering users
Introduction
Coming challenges in HPC
• Science in (Big) Data: Giga to Exabytes
  Big Data is not (only) about big storage, but about methods & HPC machines
• Scientific Simulations: Giga to Exaflops
HPC Performance tools, on the road to Exascale
1. Introduction
2. HPC, On the Road to Exascale
3. Performance Tools
4. European Exa-Initiatives
5. Q&A
economies d’escala
HPC, On the Road to Exascale
Measure of computer performance: Flops or Flop/s (FLoating-point Operations Per Second)
EXASCALE
Zettaflops = 10^21 Flop/s
Exaflops  = 10^18 Flop/s
Petaflops = 10^15 Flop/s
Teraflops = 10^12 Flop/s
Gigaflops = 10^9 Flop/s
Megaflops = 10^6 Flop/s
Kiloflops = 10^3 Flop/s
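As a back-of-the-envelope illustration (not from the slides), theoretical peak performance is usually estimated as cores × clock × floating-point operations per cycle; the machine parameters below are illustrative assumptions, not measured values:

```python
# Rough peak-performance estimate: cores * clock (Hz) * FLOPs per cycle.
# Real applications typically reach only a fraction of this figure.

def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak in Flop/s for a single node."""
    return cores * clock_hz * flops_per_cycle

# Hypothetical node: 16 cores at 2.5 GHz, 8 double-precision FLOPs/cycle.
peak = peak_flops(16, 2.5e9, 8)
print(f"{peak:.2e} Flop/s = {peak / 1e12:.1f} TFlop/s")
# prints 3.20e+11 Flop/s = 0.3 TFlop/s
```

Comparing such a per-node figure against the prefix table above shows how far a single node sits from the Exaflop target.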
HPC, On the Road to Exascale
What the U.S. Department of Energy concludes (and we Europeans should bear in mind) is:
“U.S. economic competitiveness will also be significantly enhanced as companies utilize the accelerated development of superior new products and the creativity spurred by modeling and simulation at unprecedented speed and fidelity. An important ingredient is the boost to continued international leadership of the U.S.-based information technology industry at all levels, from laptop to exaflop.”
State of the Art
HPC, On the Road to Exascale
• Adaptation to regional climate changes • Reduction of the carbon footprint
• Efficiency and safety of the nuclear energy sector
• Innovative designs for cost-effective renewable energy resources such as batteries, biofuels…
• Design of advanced experimental facilities, such as accelerators, and inertial confinement fusion
• First-principles understanding of the properties of fission and fusion reactions
• Reverse engineering of the human brain
• Design, control and manufacture of advanced materials
Why Exascale?
HPC, On the Road to Exascale
1945: first electronic general-purpose computer, ENIAC (Electronic Numerical Integrator And Computer)
Designed for the United States Army’s Ballistic Research Laboratory
1,000 times faster than electro-mechanical machines; 20 tons, 167 m²
150 kW, 100 kHz and ~400 Flop/s
Brief history of HPC
Altix UV 1000 (2010): Xeon X7542, 1344 cores, 6.14 TB of memory, 12.61 TFlop/s
HPC, On the Road to Exascale
Brief history of HPC
HPC, On the Road to Exascale
Power is the limitation: ~20 MW per system
Computational performance must grow ×200,
while power may only grow ×2 (×5 at most)
First Exascale system expected between 2018 and 2020
This means a paradigm change! Something must happen… something is happening!
The power constraint
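The arithmetic behind the 20 MW constraint can be checked directly; the petascale baseline figures below are illustrative assumptions, not data from the slides:

```python
# Energy efficiency required for an Exascale system under a ~20 MW power cap.
EXAFLOP = 1e18       # target performance, Flop/s
POWER_CAP_W = 20e6   # ~20 MW power budget

required_gflops_per_watt = EXAFLOP / POWER_CAP_W / 1e9
print(f"required efficiency: {required_gflops_per_watt:.0f} GFlop/s per watt")
# -> required efficiency: 50 GFlop/s per watt

# Illustrative 2014-era petascale baseline: ~10 PFlop/s at ~10 MW.
baseline = 10e15 / 10e6 / 1e9
print(f"baseline efficiency: {baseline:.0f} GFlop/s per watt")  # -> 1
print(f"improvement needed:  x{required_gflops_per_watt / baseline:.0f}")  # -> x50
```

A ×50 efficiency gain against only a ×2 power budget is exactly why the slides speak of a paradigm change rather than incremental scaling.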
HPC, On the Road to Exascale
New computing architectures
HPC system consumers (and providers)
‘forgot’ about energy but…
the mobile world didn’t!
They also forgot about vector computation but…
the gaming industry recovered it!
HPC, On the Road to Exascale
New computing architectures
Hardware (type of nodes)
• Multicore
• Many-core
• Low-power cores
HPC, On the Road to Exascale
New software paradigms
Hybrid software is also arriving
• OpenMP – works on a single node or with Intel MIC.
• MPI – works on a single node, multiple nodes and Intel MIC.
• MPI with OpenMP (hybrid) – works on a single node, multiple nodes, possibly on Intel MIC.
• OpenCL – on a single CPU node with an AMD/NVIDIA GPU.
• MPI with CUDA (hybrid) – works on multiple nodes with NVIDIA GP-GPUs.
• MPI with OpenCL (hybrid) – works on multiple AMD/NVIDIA GP-GPU and x86 nodes.
• OpenACC (Fortran/C) with MPI – works with multiple nodes and with GP-GPUs.
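The hybrid pattern above (distributed ranks, each doing shared-memory work inside a node) can be sketched structurally in plain Python with the stdlib multiprocessing module. This is only an analogy for MPI+OpenMP, not actual HPC code:

```python
# Structural sketch of the hybrid pattern: independent worker processes
# play the role of MPI ranks, each reducing its chunk of the data locally,
# and the final combine step plays the role of MPI_Reduce.
# multiprocessing stands in for MPI here purely for illustration.
from multiprocessing import Pool

def local_sum(chunk):
    # Inside a real rank, this loop would itself be parallelized
    # across the node's cores with OpenMP threads.
    return sum(chunk)

def hybrid_sum(data, n_ranks=4):
    chunks = [data[i::n_ranks] for i in range(n_ranks)]  # "scatter"
    with Pool(n_ranks) as pool:
        partials = pool.map(local_sum, chunks)           # per-rank work
    return sum(partials)                                 # "reduce"

if __name__ == "__main__":
    print(hybrid_sum(list(range(1000))))  # -> 499500
```

The same two-level decomposition (distribute across nodes, then thread within each node) is what MPI+OpenMP, MPI+CUDA and MPI+OpenCL all implement with different intra-node technologies.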
• Computing simulation: the 3rd pillar of Science (alongside Experiment & Theory)
• Less importance of Flop/s, more of other factors (bandwidth, overheads, latency…)
• Power consumption rules the Exaworld
• New paradigms need to be implemented (co-design): new software, numerical methods, codes, hardware architectures
• Heterogeneous & hybrid hardware solutions (CPU/GPU/ARM/…)
HPC, On the Road to Exascale
Conclusions
New (or evolved) performance tools are needed
Performance Tools
Supercomputing tools
What’s needed to perform simulations on a supercomputer
• OS
• Environment modules
• Batch system (Slurm, LSF, Maui, PBS, …)
• Mathematical Libraries (Trilinos, MKL, CUDA Toolkit, …)
• Parallel I/O Tools (NetCDF, HDF5, …)
• MPI Libraries (OpenMPI, “vendor” MPI, …)
• Programming Tools (Chapel, Intel TBB, GA, …)
• Compilers (GNU, Intel, PGI, “vendor”, …)
• Performance Tools
Performance Tools
Performance Tools
We are not reinventing the wheel!
VI-HPS: European initiative – http://www.vi-hps.org/
NERSC: U.S. Department of Energy – https://www.nersc.gov/
Performance Tools
• Debugging: tools used to investigate codes, searching for errors in the control flow
• Performance Analysis: provides information about the runtime behavior of an application, validating how efficiently the hardware is used
• Correctness: checking tools that detect and report error patterns
Performance Tools
Performance Tools
Debugging
• Debugging: these tools are used to investigate codes, searching for errors in the control flow
• Examination of variable values
• Live during execution, or post-mortem
Performance Tools
Debugging
• Allinea DDT: GUI-based parallel debugger, which can be used to debug serial, MPI, threaded (OpenMP) and GPU (CUDA) applications.
• TotalView: GUI-based source-code debugger and defect-analysis tool that allows debugging one or many processes and/or threads in a single window, with complete control over program execution.
• Intel Inspector XE: a memory and threading error debugger for C, C++, C# and Fortran applications.
Performance Tools
Performance analysis
• Performance Analysis: provides information about the runtime behavior of an application, validating how efficiently the hardware is used
• Data used by the tool can be obtained in various ways: from static code analysis, measurements or simulations
• It is important to explore the “vendor” (SGI, Cray, …) software
Performance Tools
Performance analysis
• Vendor software: tools for measuring and analyzing major components are offered by supercomputing vendors: CrayPat, SGI Performance Suite, …
• Intel VTune Amplifier XE: application for software performance analysis on 32- and 64-bit x86 and MIC-based machines.
Performance Tools
Performance analysis
Extrae / Paraver:
Paraver was developed to give a global perception of the application behavior by visual inspection. The modular structure of Paraver plays a significant role in achieving these targets.
Extrae is the tool devoted to generating the tracefiles that Paraver later analyzes. Extrae injects probes into the target application to gather information about its performance.
DIMEMAS: a performance-analysis tool for message-passing programs. It enables the user to develop and tune parallel applications while providing a prediction of their performance on the target parallel machine. Dimemas generates trace files suitable for Paraver and Vampir.
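The probe-injection idea behind Extrae can be illustrated with a toy tracer in Python. This is a conceptual sketch only: Extrae itself instruments the running binary and writes Paraver-format tracefiles, none of which is reproduced here.

```python
# Toy illustration of probe injection: wrap functions so that entry/exit
# events with timestamps are appended to an in-memory "tracefile".
# This mimics the concept only; it is not how Extrae is actually used.
import functools
import time

trace = []  # each entry: (timestamp, event, function name)

def probe(func):
    """Inject enter/exit probes around a function call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace.append((time.perf_counter(), "enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            trace.append((time.perf_counter(), "exit", func.__name__))
    return wrapper

@probe
def compute(n):
    return sum(i * i for i in range(n))

compute(10_000)
for ts, event, name in trace:
    print(f"{ts:.6f} {event} {name}")
```

A visualizer like Paraver consumes exactly this kind of timestamped event stream, at much larger scale and per MPI process/thread.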
Performance Tools
Performance analysis
TAU (Tuning and Analysis Utilities) is a profiling and tracing toolkit for performance analysis of parallel programs written in most programming languages. TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements, as well as event-based sampling. In addition, TAU can generate event traces that can be displayed with the Vampir, Paraver or JumpShot trace-visualization tools.
Performance Tools
Performance analysis
UNITE (UNiform Integrated Tool Environment): provides a robust, portable, and integrated environment for the debugging and performance analysis of parallel MPI, OpenMP, and hybrid MPI/OpenMP programs on HPC systems.
• Parallel Debugging
  • TotalView
  • DDT
• Parallel Performance Analysis
  • Extrae and Paraver
  • Marmot
  • mpiP
  • HPCToolkit
  • Intel Trace Collector and Analyzer
  • Kcachegrind
  • MUST
  • PerfSuite
  • Periscope
  • TAU
  • Scalasca
  • Vampir
Performance Tools
Performance Tools
For more detailed information visit www.numexas.eu or:
VI-HPS: European initiative http://www.vi-hps.org/
NERSC: U.S. Department of Energy https://www.nersc.gov/
Gaia European Network for Improved User Services
Gaia is a European Space Agency Cornerstone mission, launched in 2013 and aiming to produce the most accurate 3D map of the Milky Way to date. A pan-European consortium named DPAC, funded at the national level, is working on the implementation of the Gaia data processing, whose final result will be a catalogue and data archive containing one billion objects.
GENIUS aims to contribute significantly to the development of this archive, to help the exploitation of the Gaia results in order to maximize the scientific return, and to facilitate related outreach activities.
WP-350 Hardware considerations
WP-630 Archive testbed
- Simulations, aimed at testing and validating the system
- Data mining: querying the archive to obtain new knowledge
- Gaining expertise in the use of the technology (Hadoop cluster, Hive)
- Community portal (design, development and hosting)
European Exa-Initiatives
GENIUS
European Exa-Initiatives
Coordinator: Jülich
Cost: 18,500,000 €
Coordinator: BSC
Cost: 14,000,000 €
Coordinator: EPCC, The University of Edinburgh
Cost: 12,000,000 €
Coordinator: TOTAL S.A.
Cost: 640,000 €
Coordinator: CIMNE
HPC Exascale Projects
Numerical Methods and Tools for Key Exascale Computing Challenges in Engineering and Applied Sciences
The overall aim of the NUMEXAS project is to develop, implement and validate the next generation of numerical methods to be run under exascale computing architectures. This will be done by implementing a new paradigm for the development of advanced numerical methods to really exploit the intrinsic capabilities of the future exascale computing infrastructures. The main outcome of NUMEXAS will be a new set of numerical methods and codes that will allow industry, government and academia to solve exascale-class problems in engineering and applied sciences.
European Exa-Initiatives
NUMEXAS
T1.1 Report with scalability test
T1.2 Report with scalability assessment of available codes
T2.1 Study, evaluation and installation of supercomputing “generic tools”
T2.2 Available math libraries for the exascale
T2.3 Identification of commonality developments
T8.1 Evaluation and installation of benchmarking tools, including description of commonality developments
T8.2 Analysis of software, including initial scaling behaviour
T8.3 Software prototypes with hardware-specific enhancements for different platforms
T12.1 Collaborative Web Portal
European Exa-Initiatives
NUMEXAS
WP1: Survey of existing software
WP2: Exascale-oriented resources
WP3–7: Scalable pre-processing, solver and post-processing developments
WP8: Benchmarking, profiling and hardware optimization
NUMEXAS
European Exa-Initiatives
SCALABILITY TEST
HPL (High Performance Linpack) results
• Very high efficiency
• SMP slightly outperforms MPP
  – computing network or architecture efficiency
  – large number of cores
• LINPACK may not scale linearly
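Why a benchmark “may not scale linearly” can be made concrete with Amdahl’s law; the 5% serial fraction below is an illustrative assumption, not a measured NUMEXAS figure:

```python
# Amdahl's law: speedup on p processors when a fraction s of the work
# is inherently serial. Even a small serial fraction caps the speedup.
def amdahl_speedup(p, serial_fraction):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

s = 0.05  # assumed 5% serial work
for p in (1, 16, 256, 4096):
    sp = amdahl_speedup(p, s)
    print(f"p={p:5d}  speedup={sp:7.1f}  efficiency={sp / p:.1%}")
# With s = 5%, speedup can never exceed 1/s = 20, however many cores are used.
```

Parallel efficiency (speedup divided by core count) therefore collapses at scale, which is the usual reason LINPACK-style runs fall short of linear scaling.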
NUMEXAS
HPL and HPCCG for Phi
• Technical difficulties did not allow testing on more than one node
SCALABILITY TEST
NUMEXAS
HPC, On the Road to Exascale
Conclusions
• Making the transition to Exascale poses numerous unavoidable scientific and technological challenges
• Reducing Power requirements
• Coping with run-time errors (billions of cores!)
• Exploiting massive parallelism
• The benefits of Exascale far outweigh the costs!
• Paradigm change: it is difficult to know what will happen, but science breakthroughs at Exascale will lead to exponential growth in new industries
Questions?
David Tur, HPC Scientist
Maria Isabel Gandia, Communications Manager
EduPERT call, Barcelona, Dec 4th 2014