Programming the new KNL Cluster at LRZ Introduction Dr. Volker Weinberg (LRZ) January, 24-25, 2018, LRZ


Page 1: Programming the new KNL Cluster at LRZ – Introduction

Dr. Volker Weinberg (LRZ), January 24-25, 2018, LRZ

Page 2: LRZ in the HPC Environment

HLRS@Stuttgart, JSC@Jülich, LRZ@Garching

PRACE has 25 members, representing European Union Member States and Associated Countries.

Bavarian Contribution to National Infrastructure

German Contribution to European Infrastructure

24.-25.1.2018 Programming the new KNL Cluster at LRZ

Page 3: PRACE Advanced Training Centre (PATC) Courses

LRZ is part of the Gauss Centre for Supercomputing (GCS), which is one of the six PRACE Advanced Training Centres (PATCs) that started in 2012:

− Barcelona Supercomputing Center (Spain)
− CINECA – Consorzio Interuniversitario (Italy)
− CSC – IT Center for Science Ltd (Finland)
− EPCC at the University of Edinburgh (UK)
− Gauss Centre for Supercomputing (Germany)
− Maison de la Simulation (France)

Mission: Serve as European hubs and key drivers of advanced high-quality training for researchers working in the computational sciences.

http://www.training.prace-ri.eu/

Page 4: Tentative Agenda: Wednesday

● Wednesday, January 24, 2018, Kursraum 2, H.U.010 (course room)
● 09:00-10:00 Welcome & Introduction (Weinberg)
● 10:00-10:30 Overview of the Intel MIC architecture (Allalen)
● 10:30-11:00 Coffee break
● 11:00-11:30 Overview of Intel MIC programming & Using the LRZ Linux-Cluster (Allalen)
● 11:30-12:00 Hands-on
● 12:00-13:00 Lunch break
● 13:00-14:00 Guided CoolMUC3/SuperMUC Tour (Weinberg/Allalen)
● 14:00-15:00 Vectorisation and basic Intel Xeon Phi performance optimisation (Allalen)
● 15:00-15:30 Coffee break
● 15:30-16:00 Hands-on

Page 5: Tentative Agenda: Thursday

● Thursday, January 25, 2018, Kursraum 2, H.U.010 (course room)
● 09:00-10:30 Code optimization process for KNL (Iapichino)
● 10:30-11:00 Coffee break
● 11:00-12:00 Intel profiling tools and roofline model (Iapichino)
● 12:00-13:00 Lunch break
● 13:00-14:30 KNL Memory Modes and Cluster Modes, MCDRAM (Weinberg)
● 14:30-15:00 Coffee break
● 15:00-16:00 Hands-on
● 16:00-16:30 Wrap-up

Page 6: Information

● Lecturers:
− Dr. Momme Allalen (LRZ)
− Dr. Luigi Iapichino (LRZ)
− Dr. Volker Weinberg (LRZ)

● Complete lecture slides & exercise sheets:
− https://www.lrz.de/services/compute/courses/x_lecturenotes/knl_course_2018/
− https://tinyurl.com/y9hmln56

● Examples under:
− /lrz/sys/courses/KNL

Page 7: Intel Xeon Phi @ LRZ and EU

Page 8: Evaluating Accelerators at LRZ

Research at LRZ within PRACE & KONWIHR:

● CELL programming
− 2008-2009 Evaluation of CELL programming.
− IBM announced the discontinuation of CELL in Nov. 2009.

● GPGPU programming
− Regular GPGPU computing courses at LRZ since 2009.
− Evaluation of GPGPU programming languages:
− CAPS HMPP, PGI accelerator compiler → OpenACC, OpenMP 4.x
− CUDA, cuBLAS, cuFFT
− PyCUDA/R
− Regular Deep Learning Workshops since 2018

● Intel Xeon Phi programming
− Larrabee (2009) → Knights Ferry (2010) → Knights Corner → Intel Xeon Phi (2012) → KNL (2016)

Page 9: IPCC (Intel Parallel Computing Centre)

● New Intel Parallel Computing Centre (IPCC) since July 2014: Extreme Scaling on MIC/x86
● Chair of Scientific Computing at the Department of Informatics of the Technische Universität München (TUM) & LRZ
● https://software.intel.com/de-de/ipcc#centers
● https://software.intel.com/de-de/articles/intel-parallel-computing-center-at-leibniz-supercomputing-centre-and-technische-universit-t
● Codes:
− Simulation of Dynamic Ruptures and Seismic Motion in Complex Domains: SeisSol
− Numerical Simulation of Cosmological Structure Formation: GADGET
− Molecular Dynamics Simulation for Chemical Engineering: ls1 mardyn
− Data Mining in High Dimensional Domains Using Sparse Grids: SG++

Page 10: IPCC (Intel Parallel Computing Centre)

● Numerical Simulation of Cosmological Structure Formation: GADGET

● Code optimization through:
− Better threading parallelism (dynamic scheduling, lockless version),
− Data layout optimization (AoS → SoA),
− Promoting more efficient vectorization.

● Baruffa, Iapichino, Hammer & Karakasis: Performance optimisation of Smoothed Particle Hydrodynamics algorithms for multi/many-core architectures, Proceedings of HPCS 2017. Also available as https://arxiv.org/abs/1612.06090. Awarded as Outstanding Paper.
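The AoS → SoA data-layout change listed above can be illustrated with a small, hypothetical sketch (this is illustrative Python, not GADGET's actual code; the particle fields and the kinetic-energy reduction are made-up examples):

```python
from array import array

# Hypothetical particle data: (mass, vx, vy, vz) per particle.
n = 4
# Array of Structures (AoS): one record per particle, so the fields of
# different particles are interleaved in memory and a loop over a single
# field strides through memory.
aos = [(1.0 * i, 0.1 * i, 0.2 * i, 0.3 * i) for i in range(n)]

# Structure of Arrays (SoA): one contiguous array per field, giving the
# unit-stride access pattern that SIMD vectorization prefers.
soa = {
    "mass": array("d", (p[0] for p in aos)),
    "vx":   array("d", (p[1] for p in aos)),
    "vy":   array("d", (p[2] for p in aos)),
    "vz":   array("d", (p[3] for p in aos)),
}

def ekin_aos(ps):
    # Reduction over interleaved records.
    return 0.5 * sum(m * (vx*vx + vy*vy + vz*vz) for m, vx, vy, vz in ps)

def ekin_soa(s):
    # Same reduction over per-field contiguous arrays.
    return 0.5 * sum(m * (vx*vx + vy*vy + vz*vz)
                     for m, vx, vy, vz in zip(s["mass"], s["vx"], s["vy"], s["vz"]))

# Both layouts hold the same data, so the results agree.
assert abs(ekin_aos(aos) - ekin_soa(soa)) < 1e-12
```

The physics result is layout-independent; only the memory-access pattern (and hence vectorization efficiency on KNL) differs.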

Page 11: IPCC (Intel Parallel Computing Centre)

− Numerical Simulation of Cosmological Structure Formation: GADGET
− Improved speed-up of the isolated kernel on KNL wrt. the original version: 5.7x
− Parallel efficiency: 73%

Page 12: Intel Xeon Phi and GPU Training @ LRZ

28.-30.4.2014 @ LRZ (PATC): KNC+GPU
27.-29.4.2015 @ LRZ (PATC): KNC+GPU
3.-4.2.2016 @ IT4Innovations: KNC
27.-29.6.2016 @ LRZ (PATC): KNC+KNL
28.9.2016 @ PRACE Seasonal School, Hagenberg: KNC
7.-8.2.2017 @ IT4Innovations (PATC): KNC
26.-28.6.2017 @ LRZ (PATC): KNL
June 2018 @ LRZ (PATC tbc.): KNL

http://inside.hlrs.de/
inSiDE, Vol. 12, No. 2, p. 102, 2014
inSiDE, Vol. 13, No. 2, p. 79, 2015
inSiDE, Vol. 14, No. 1, p. 76f, 2016
inSiDE, Vol. 14, No. 2, p. 25ff, 2016
inSiDE, Vol. 15, No. 1, p. 48ff, 2017
inSiDE, Vol. 15, No. 2, p. 55ff, 2017

Page 13: CzeBaCCA Project

● Czech-Bavarian Competence Team for Supercomputing Applications (CzeBaCCA)
● BMBF-funded project (Jan. 2016 – Dec. 2017) to:
− Foster Czech-German collaboration in simulation supercomputing
− a series of workshops will initiate and deepen collaboration between Czech and German computational scientists
− Establish well-trained supercomputing communities
− a joint training program will extend and improve trainings on both sides
− Improve simulation software
− establish and disseminate role models and best practices of simulation software in supercomputing

Page 14: CzeBaCCA Trainings and Workshops

● Intel MIC Programming Workshop, 3 – 4 February 2016, Ostrava, Czech Republic

● Scientific Workshop: SeisMIC - Seismic Simulation on Current and Future Supercomputers, 5 February 2016, Ostrava, Czech Republic

● PRACE PATC Course: Intel MIC Programming Workshop, 27 - 29 June 2016, Garching, Germany

● Scientific Workshop: High Performance Computing for Water Related Hazards, 29 June - 1 July 2016, Garching, Germany

● PRACE PATC Course: Intel MIC Programming Workshop, 7 – 8 February 2017, Ostrava, Czech Republic

● Scientific Workshop: High performance computing in atmosphere modelling and air related environmental hazards, 9 February 2017, Ostrava, Czech Republic

● PRACE PATC Course: Intel MIC Programming Workshop, 26 – 28 June 2017, Garching, Germany

● Scientific Workshop: HPC for natural hazard assessment and disaster mitigation, 28 - 30 June 2017, Garching, Germany


Page 15: CzeBaCCA Trainings and Workshops

1st workshop series: February 2016 @ IT4I

https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
inSiDE, Vol. 14, No. 1, p. 76f, 2016 (http://inside.hlrs.de/)
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p. 27

Page 16: CzeBaCCA Trainings and Workshops

2nd workshop series: June 2016 @ LRZ

https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
inSiDE, Vol. 14, No. 2, p. 25ff, 2016 (http://inside.hlrs.de/)
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p. 27

Page 17: CzeBaCCA Trainings and Workshops

3rd workshop series: February 2017 @ IT4I

https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
inSiDE, Vol. 15, No. 1, p. 48ff, 2017 (http://inside.hlrs.de/)
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p. 27

Page 18: CzeBaCCA Trainings and Workshops

4th workshop series: June 2017 @ LRZ

https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
inSiDE, Vol. 15, No. 2, p. 55ff, 2017 (http://inside.hlrs.de/)
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p. 27

Page 19: Intel Xeon Phi @ Top500 November 2017

● https://www.top500.org/list/2017/11/
● #2: Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P, National Super Computer Center in Guangzhou, China
● #7: Trinity - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect, Cray Inc., DOE/NNSA/LANL/SNL, United States
● #8: Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect, Cray Inc., DOE/SC/LBNL/NERSC, United States
● #9: Oakforest-PACS - PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path, Fujitsu, Joint Center for Advanced High Performance Computing, Japan
● #12: Stampede2 - PowerEdge C6320P, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path, Dell, Texas Advanced Computing Center/Univ. of Texas, United States
● #14: Marconi, Intel Xeon Phi - CINECA Cluster, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path, Lenovo, CINECA, Italy
● #23: Tera-1000-2 Part 1 - Bull Sequana X1000, Intel Xeon Phi 7250 68C 1.4GHz, Bull BXI 1.2, Bull, Atos Group, Commissariat a l'Energie Atomique (CEA)
● … several non-European systems …
● #87: Salomon - SGI ICE X, Xeon E5-2680v3 12C 2.5GHz, Infiniband FDR, Intel Xeon Phi 7120P, HPE, IT4Innovations National Supercomputing Center, VSB-Technical University of Ostrava, Czech Republic

Page 20: PRACE: Best Practice Guides

● http://www.prace-ri.eu/best-practice-guides/

Page 21: Best Practice Guides - Overview

● The following 4 Best Practice Guides (BPGs) have been written within PRACE-4IP by 13 authors from 8 institutions and were published in PDF and HTML format in January 2017 on the PRACE website:
− Intel® Xeon Phi™ BPG: update of the PRACE-3IP BPG
− Haswell/Broadwell BPG: written from scratch
− Knights Landing BPG: written from scratch
− GPGPU BPG: update of the PRACE-2IP mini-guide

● Online under: http://www.prace-ri.eu/best-practice-guides/

Page 22: Intel MIC within PRACE: Intel Xeon Phi (KNC) Best Practice Guide

− Created within PRACE-3IP+4IP.
− Written in DocBook XML.
− 122 pages, 13 authors.
− Now includes information about existing Xeon Phi based systems in Europe: Avitohol @ BAS (NCSA), MareNostrum @ BSC, Salomon @ IT4Innovations, SuperMIC @ LRZ.
− http://www.prace-ri.eu/best-practice-guide-intel-xeon-phi-january-2017/
− http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-Intel-Xeon-Phi-1.pdf

Page 23: Intel MIC within PRACE: Knights Landing Best Practice Guide

− Created within PRACE-4IP.
− Written in DocBook XML.
− 85 pages, 3 authors.
− General information about the KNL architecture and programming environment.
− Benchmark & application performance results.
− http://www.prace-ri.eu/best-practice-guide-knights-landing-january-2017/
− http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-Knights-Landing.pdf

Page 24: Best Practice Guides - Dissemination

Page 25: DEEP/ER Project: Towards Exascale

● Design of an architecture leading to Exascale.
● Development of hardware: implementation of a Booster based on MIC processors and the EXTOLL interconnect.

Page 26: SuperMIC ∈ SuperMUC @ LRZ

Page 27: SuperMUC System Overview

Page 28: SuperMUC Phase 2: Moving to Haswell

Diagram: 6 Haswell islands with 512 nodes per island and warm-water cooling; Haswell-EP nodes with 24 cores/node and 2.67 GB/core. Mellanox FDR14 island switches (non-blocking within an island, pruned tree towards the spine InfiniBand switches); I/O servers providing GPFS for $WORK and $SCRATCH (weak coupling of Phases 1+2); Mellanox FDR10 island switches (non-blocking, pruned tree) connecting the thin and fat islands of SuperMUC Phase 1; links to the LRZ infrastructure (NAS, archive, visualization) and Internet/grid services.

Page 29: SuperMIC: Intel Xeon Phi Cluster

Page 30: SuperMIC: Intel Xeon Phi Cluster

Page 31: SuperMIC ∈ SuperMUC @ LRZ

● 32 compute nodes (diskless)
− SLES11 SP3
− 2 Ivy-Bridge host processors with 16 cores
− 2 Intel Xeon Phi 5110P coprocessors per node with 60 cores
− 64 GB (host) + 2 * 8 GB (Xeon Phi) memory
− 2 MLNX CX3 FDR PCIe cards attached to each CPU socket

● Interconnect
− Mellanox Infiniband FDR14
− Through the bridge interface all nodes and MICs are directly accessible

● 1 login and 1 management server (batch system, xCAT, …)
● Air-cooled
● Supports both native and offload mode
● Batch system: LoadLeveler

Page 32: SuperMIC Network Access

Page 33: SuperMIC Access

● Description of SuperMIC: https://www.lrz.de/services/compute/supermuc/supermic/

Page 34: CoolMUC3 @ Linux-Cluster @ LRZ

Page 35: First self-assembled Linux cluster (1999-2002)

Page 36: Cluster components (2012)

Page 37: SGI UltraViolet with air guides in front to improve cooling efficiency (2012)

Page 38: CooLMUC2 (2015): the six racks to the left

Page 39: LRZ Linux Cluster Overview

● The LRZ Linux Cluster consists of several segments with different types of interconnect and different sizes of shared memory. All systems have a (virtual) 64-bit address space:
− CooLMUC2: cluster with 28-way Haswell-based nodes and FDR14 Infiniband interconnect, used for both serial and parallel processing
− Intel Broadwell based 6 TByte shared-memory server HP DL580
− CooLMUC3: cluster with 64-way KNL 7210-F many-core processors and Intel Omnipath OPA1 interconnect, for parallel/vector processing

● Based on the various node types the LRZ Linux Cluster offers a wide span of capabilities:
− mixed shared and distributed memory
− large software portfolio
− flexible usage due to various available memory sizes
− parallelization by message passing (MPI)
− shared-memory parallelization with OpenMP or pthreads
− mixed (hybrid) programming with MPI and OpenMP
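The MPI/OpenMP distinction above follows one common pattern: decompose the domain across distributed-memory ranks, reduce locally, then combine. A small, hypothetical Python sketch of that pattern (illustration only; a real cluster code would use MPI ranks plus OpenMP threads, e.g. via mpi4py or C/Fortran):

```python
def rank_partial_sum(lo, hi):
    # Work of one "MPI rank": reduce its own contiguous sub-domain.
    # Within a rank, OpenMP threads and SIMD lanes would share this loop.
    return sum(i * i for i in range(lo, hi))

def decomposed_sum(n, ranks=4):
    # Domain decomposition: split [0, n) into one chunk per rank ...
    bounds = [(r * n // ranks, (r + 1) * n // ranks) for r in range(ranks)]
    partials = [rank_partial_sum(lo, hi) for lo, hi in bounds]
    # ... then combine the partial results (the MPI_Reduce analogue).
    return sum(partials)

# The decomposed reduction matches the straightforward one.
assert decomposed_sum(10_000) == sum(i * i for i in range(10_000))
```

On CooLMUC3 the same pattern would typically run one MPI rank per node (or per quadrant) with OpenMP threads filling the 64 cores.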

Page 40: Linux Cluster System Overview

− CoolMUC2 (queue mpp2; login lxlogin5/6/7.lrz.de): Intel Xeon E5-2697 v3 ("Haswell"), 28 cores and 64 GB RAM per node; 384 nodes, 10752 cores in total; max. job: 60 nodes, 1680 cores, 48 h walltime, 3.8 TB aggregate RAM
− CoolMUC2 serial segment (queue serial; login lxlogin5/6/7.lrz.de): Intel Xeon E5-2697 v3 ("Haswell"), 28 cores and 64 GB RAM per node; max. job: 1 node, 28 cores, 96 h walltime, 64 GB RAM
− Hugemem (queue hugemem; login lxlogin5/6/7.lrz.de): Intel Xeon E5-2660 v2 ("Sandy Bridge"), 20 cores and 240 GB RAM per node; 7 nodes, 220 cores; max. job: 1 node, 20 cores, 168 h walltime, 240 GB RAM
− Teramem (queue inter; login lxlogin5/6/7.lrz.de): Intel Xeon E7-8890 v4, 96 cores and 6144 GB RAM; 1 node; max. job: 1 node, 96 cores, 48 h walltime, 6.1 TB RAM
− Many-core cluster (queue mpp3; login lxlogin8.lrz.de): Intel Xeon Phi ("Knights Landing"), 64 cores, 96 GB RAM and 16 GB HBM per node; 148 nodes; max. job: 32 nodes, 24 h walltime, 2880 GB aggregate RAM and 512 GB HBM

Page 41: CoolMUC3 System Procurement Vision

● To supply a system to its users that is suited for processing highly vectorizable and thread-parallel applications,
● provides good scalability across node boundaries for strong scaling, and
● deploys state-of-the-art high-temperature water-cooling technologies for a system operation that avoids heat transfer to the ambient air of the computer room.

Page 42: CoolMUC3 System Overview

● Consists of 148 computational many-core Intel "Knights Landing" (KNL) nodes (Xeon Phi 7210-F hosts).
● Connected to each other via an Intel Omnipath high-performance network in fat-tree topology.
● Theoretical peak performance is 400 TFlop/s; the LINPACK performance of the complete system is 255 TFlop/s.
● Standard Intel® Xeon® login node for development work and job submission.

Page 43: CoolMUC3 System Overview

● CoolMUC-3 comprises three warm-water cooled racks, using an inlet temperature of at least 40 °C.
● Deployment of liquid-cooled power supplies and Omni-Path switches.
● Thermal isolation of the racks to suppress radiative losses.
● Liquid-cooled racks operate entirely without fans.
● A complementary rack for air-cooled components (e.g. management servers) uses less than 3% of the system's total power budget.
● With 4.96 GFlops/Watt (according to the strict Green500 level-3 measurement methodology) CooLMUC-3 is one of the most efficient x86 systems worldwide.
● Result of an established partnership between LRZ and Megware.

Page 44: CoolMUC3 (2017): Really Cool

Page 45: CoolMUC3 (2017): Really Cool

Page 46: CoolMUC3 System Overview

Hardware:
− Number of nodes: 148
− Cores per node: 64
− Hyperthreads per core: 4
− Core nominal frequency: 1.3 GHz
− Memory (DDR4) per node: 96 GB (bandwidth 80.8 GB/s)
− High Bandwidth Memory per node: 16 GB (bandwidth 460 GB/s)
− Bandwidth to interconnect per node: 25 GB/s (2 links)
− Number of Omnipath switches (100SWE48): 10 + 4 (each 48 ports)
− Bisection bandwidth of interconnect: 1.6 TB/s
− Latency of interconnect: 2.3 µs
− Peak performance of system: 394 TFlop/s

Infrastructure:
− Electric power of fully loaded system: 62 kVA
− Percentage of waste heat to warm water: 97%
− Inlet temperature range for water cooling: 30 … 50 °C
− Temperature difference between outlet and inlet: 4 … 6 °C

Software (OS and development environment):
− Operating system: SLES12 SP2 Linux
− MPI: Intel MPI 2017, alternatively OpenMPI
− Compilers: Intel icc, icpc, ifort 2017
− Performance libraries: MKL, TBB, IPP, DAAL
− Tools for performance and correctness analysis: Intel Cluster Tools
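From the memory bandwidths above one can derive a roofline-style machine balance per node, i.e. how many floating-point operations must be performed per byte loaded to saturate the chip. This is an illustrative estimate; the per-node peak uses the 64 cores × 1.3 GHz × 8 (SIMD) × 2 VPUs × 2 (FMA) factors quoted elsewhere in these slides:

```python
# Roofline-style machine balance for one CoolMUC-3 KNL node (estimate).
peak_gflops = 64 * 1.3 * 8 * 2 * 2       # cores * GHz * SIMD * VPUs * FMA = 2662.4
ddr4_bw = 80.8                            # GB/s per node, DDR4 (from the table)
mcdram_bw = 460.0                         # GB/s per node, MCDRAM/HBM (from the table)

balance_ddr4 = peak_gflops / ddr4_bw      # flop per byte needed from DDR4 (~33)
balance_mcdram = peak_gflops / mcdram_bw  # flop per byte needed from MCDRAM (~5.8)

print(f"peak:           {peak_gflops:.1f} GFlop/s")
print(f"balance DDR4:   {balance_ddr4:.1f} flop/byte")
print(f"balance MCDRAM: {balance_mcdram:.1f} flop/byte")
```

The roughly 5.7x gap between the two balances is why bandwidth-bound kernels should be placed in MCDRAM, a topic of the memory-modes session on Thursday.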

Page 47: Linux Cluster System Access

ssh -Y lxlogin5.lrz.de -l xxyyyzz   # Haswell (CooLMUC-2) login node
ssh -Y lxlogin6.lrz.de -l xxyyyzz   # Haswell (CooLMUC-2) login node
ssh -Y lxlogin7.lrz.de -l xxyyyzz   # Haswell (CooLMUC-2) login node
gsissh -Y lxgt2.lrz.de              # login node for GSI-SSH

ssh -Y lxlogin8.lrz.de -l xxyyyzz   # KNL cluster (CooLMUC-3) login node

Page 48: Linux Cluster System Documentation

● Linux-Cluster:
− https://www.lrz.de/services/compute/linux-cluster/

● CoolMUC-3:
− https://www.lrz.de/services/compute/linux-cluster/coolmuc3/initial_operation/
− https://www.lrz.de/services/compute/linux-cluster/coolmuc3/overview/

Page 49: CoolMUC3 SLURM

● Submit a job: sbatch --reservation=KNL_Course job.sh

● List own jobs: squeue -u a2c06aa

● Cancel a job: scancel jobid

● Interactive access:
salloc --nodes=1 --time=02:00:00 --constraint=flat,quad --reservation=KNL_Course
srun --pty --reservation=KNL_Course bash

Page 50: CoolMUC3 SLURM Simplest Batch File

-> /lrz/sys/courses/KNL/batch-flat-quad.sh

#!/bin/bash
#SBATCH -o /home/hpc/a2c06/a2c06aa/test.%j.%N.out
#SBATCH -D /home/hpc/a2c06/a2c06aa/
#SBATCH -J jobname
#SBATCH --clusters=mpp3
#SBATCH --get-user-env
#SBATCH --time=02:00:00
#SBATCH --constraint=flat,quad

commands

Page 51: KNL Testsystem

First login to the Linux-Cluster (directly reachable from the course PCs; use only account a2c06aa!):

ssh lxlogin8.lrz.de -l a2c06aa

Then:

ssh mcct03.cos.lrz.de   or   ssh mcct04.cos.lrz.de

Processor: Intel(R) Xeon Phi(TM) CPU 7210, 64 cores, 4 threads per core. Frequency: 1 - 1.5 GHz.

KNL: 64 cores x 1.3 GHz x 8 (SIMD) x 2 (VPUs) x 2 (FMA) = 2662.4 GFLOP/s
Compare with:
KNC: 60 cores x 1 GHz x 8 (SIMD) x 2 (FMA) = 960 GFLOP/s
Sandy Bridge: 2 sockets x 8 cores x 2.7 GHz x 4 (SIMD) x 2 (ALUs) = 345.6 GFLOP/s
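The peak numbers follow directly from multiplying the quoted factors; a quick check of the arithmetic (double precision, clock rates as given on the slide):

```python
# Verify the theoretical peak-performance arithmetic from the slide.
knl = 64 * 1.3 * 8 * 2 * 2    # cores * GHz * SIMD width * 2 VPUs * FMA
knc = 60 * 1.0 * 8 * 2        # cores * GHz * SIMD width * FMA
snb = 2 * 8 * 2.7 * 4 * 2     # sockets * cores * GHz * SIMD width * ALUs

print(f"KNL:          {knl:.1f} GFLOP/s")   # 2662.4
print(f"KNC:          {knc:.1f} GFLOP/s")   # 960.0
print(f"Sandy Bridge: {snb:.1f} GFLOP/s")   # 345.6
```

Note that the KNL figure uses the 1.3 GHz nominal frequency; under sustained AVX-512 load the actual clock may be lower.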

Page 52: Xeon Phi References

● Books:
− James Reinders, James Jeffers: Intel Xeon Phi Coprocessor High Performance Programming, Morgan Kaufmann Publ. Inc., 2013, http://lotsofcores.com; new KNL edition in July 2016
− Rezaur Rahman: Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers, Apress, 2013
− Parallel Programming and Optimization with Intel Xeon Phi Coprocessors, Colfax, 2013, http://www.colfaxintl.com/nd/xeonphi/book.aspx

● Training material by CAPS, TACC, EPCC
● Intel training material and webinars
● V. Weinberg (Editor) et al.: Best Practice Guide – Intel Xeon Phi v2, http://www.prace-ri.eu/best-practice-guide-intel-xeon-phi-january-2017/ and references therein
● Ole Widar Saastad (Editor) et al.: Best Practice Guide – Knights Landing, http://www.prace-ri.eu/best-practice-guide-knights-landing-january-2017/

Page 53: Acknowledgements

● IT4Innovations, Ostrava
● Partnership for Advanced Computing in Europe (PRACE)
● Intel
● BMBF (Federal Ministry of Education and Research)
● Dr. Karl Fürlinger (LMU)
● J. Cazes, R. Evans, K. Milfeld, C. Proctor (TACC)
● Adrian Jackson (EPCC)

Page 54: And now …

Enjoy the course!