Programming the new KNL Cluster at LRZ: Introduction
Dr. Volker Weinberg (LRZ), January 24-25, 2018, LRZ
LRZ in the HPC Environment
HLRS@Stuttgart JSC@Jülich LRZ@Garching
PRACE has 25 members, representing European Union Member States and Associated Countries.
Bavarian Contribution to National Infrastructure
German Contribution to European Infrastructure
PRACE Training: Advanced Training Centre (PATC) Courses
LRZ is part of the Gauss Centre for Supercomputing (GCS), which is one of the six PRACE Advanced Training Centres (PATCs) that started in 2012:
− Barcelona Supercomputing Center (Spain)
− CINECA − Consorzio Interuniversitario (Italy)
− CSC – IT Center for Science Ltd (Finland)
− EPCC at the University of Edinburgh (UK)
− Gauss Centre for Supercomputing (Germany)
− Maison de la Simulation (France)
Mission: Serve as European hubs and key drivers of advanced high-quality training for researchers working in the computational sciences.
http://www.training.prace-ri.eu/
Tentative Agenda: Wednesday
● Wednesday, January 24, 2018, Kursraum 2, H.U.010 (course room)
● 09:00-10:00 Welcome & Introduction (Weinberg)
● 10:00-10:30 Overview of the Intel MIC architecture (Allalen)
● 10:30-11:00 Coffee break
● 11:00-11:30 Overview of Intel MIC programming & Using the LRZ Linux-Cluster (Allalen)
● 11:30-12:00 Hands-on
● 12:00-13:00 Lunch break
● 13:00-14:00 Guided CoolMUC3/SuperMUC Tour (Weinberg/Allalen)
● 14:00-15:00 Vectorisation and basic Intel Xeon Phi performance optimisation (Allalen)
● 15:00-15:30 Coffee break
● 15:30-16:00 Hands-on
Tentative Agenda: Thursday
● Thursday, January 25, 2018, Kursraum 2, H.U.010 (course room)
● 09:00-10:30 Code optimization process for KNL (Iapichino)
● 10:30-11:00 Coffee break
● 11:00-12:00 Intel profiling tools and roofline model (Iapichino)
● 12:00-13:00 Lunch break
● 13:00-14:30 KNL Memory Modes and Cluster Modes, MCDRAM (Weinberg)
● 14:30-15:00 Coffee break
● 15:00-16:00 Hands-on
● 16:00-16:30 Wrap-up
Information
● Lecturers:
− Dr. Momme Allalen (LRZ)
− Dr. Luigi Iapichino (LRZ)
− Dr. Volker Weinberg (LRZ)
● Complete lecture slides & exercise sheets:
− https://www.lrz.de/services/compute/courses/x_lecturenotes/knl_course_2018/
− https://tinyurl.com/y9hmln56
● Examples under:
− /lrz/sys/courses/KNL
Intel Xeon Phi @ LRZ and EU
Evaluating Accelerators at LRZ
Research at LRZ within PRACE & KONWIHR:
● CELL programming
− 2008-2009: Evaluation of CELL programming.
− IBM announced the discontinuation of CELL in Nov. 2009.
● GPGPU programming
− Regular GPGPU computing courses at LRZ since 2009.
− Evaluation of GPGPU programming languages:
− CAPS HMPP
− PGI accelerator compiler
(the two directive-based models above evolved into OpenACC and OpenMP 4.x)
− CUDA, cuBLAS, cuFFT
− PyCUDA/R
− Regular Deep Learning Workshops since 2018.
● Intel Xeon Phi programming
− Larrabee (2009) → Knights Ferry (2010) → Knights Corner → Intel Xeon Phi (2012) → KNL (2016)
IPCC (Intel Parallel Computing Centre)
● New Intel Parallel Computing Centre (IPCC) since July 2014: Extreme Scaling on MIC/x86
● Chair of Scientific Computing at the Department of Informatics of the Technische Universität München (TUM) & LRZ
● https://software.intel.com/de-de/ipcc#centers
● https://software.intel.com/de-de/articles/intel-parallel-computing-center-at-leibniz-supercomputing-centre-and-technische-universit-t
● Codes:
− Simulation of Dynamic Ruptures and Seismic Motion in Complex Domains: SeisSol
− Numerical Simulation of Cosmological Structure Formation: GADGET
− Molecular Dynamics Simulation for Chemical Engineering: ls1 mardyn
− Data Mining in High Dimensional Domains Using Sparse Grids: SG++
IPCC (Intel Parallel Computing Centre)
● Numerical Simulation of Cosmological Structure Formation: GADGET
● Code optimization through:
− better threading parallelism (dynamic scheduling, lockless version),
− data layout optimization (AoS → SoA),
− promoting more efficient vectorization.
● Baruffa, Iapichino, Hammer & Karakasis: Performance optimisation of Smoothed Particle Hydrodynamics algorithms for multi/many-core architectures, Proceedings of HPCS 2017. Also available at https://arxiv.org/abs/1612.06090. Awarded as Outstanding Paper.
IPCC (Intel Parallel Computing Centre)
− Numerical Simulation of Cosmological Structure Formation: GADGET
− Improved speed-up of the isolated kernel on KNL with respect to the original version: 5.7x
− Parallel efficiency: 73%
Intel Xeon Phi and GPU Training @ LRZ
28.-30.4.2014 @ LRZ (PATC): KNC+GPU
27.-29.4.2015 @ LRZ (PATC): KNC+GPU
3.-4.2.2016 @ IT4Innovations: KNC
27.-29.6.2016 @ LRZ (PATC): KNC+KNL
28.9.2016 @ PRACE Seasonal School, Hagenberg: KNC
7.-8.2.2017 @ IT4Innovations (PATC): KNC
26.-28.6.2017 @ LRZ (PATC): KNL
June 2018 @ LRZ (PATC tbc.): KNL
References: http://inside.hlrs.de/ - inSiDE, Vol. 12, No. 2, p. 102, 2014; inSiDE, Vol. 13, No. 2, p. 79, 2015; inSiDE, Vol. 14, No. 1, p. 76f, 2016; inSiDE, Vol. 14, No. 2, p. 25ff, 2016; inSiDE, Vol. 15, No. 1, p. 48ff, 2017; inSiDE, Vol. 15, No. 2, p. 55ff, 2017
CzeBaCCA Project
● Czech-Bavarian Competence Team for Supercomputing Applications (CzeBaCCA)
● BMBF-funded project (Jan. 2016 - Dec. 2017) to:
− Foster Czech-German Collaboration in Simulation Supercomputing
− a series of workshops will initiate and deepen collaboration between Czech and German computational scientists
− Establish Well-Trained Supercomputing Communities
− a joint training program will extend and improve trainings on both sides
− Improve Simulation Software
− establish and disseminate role models and best practices of simulation software in supercomputing
CzeBaCCA Trainings and Workshops
● Intel MIC Programming Workshop, 3 – 4 February 2016, Ostrava, Czech Republic
● Scientific Workshop: SeisMIC - Seismic Simulation on Current and Future Supercomputers, 5 February 2016, Ostrava, Czech Republic
● PRACE PATC Course: Intel MIC Programming Workshop, 27 - 29 June 2016, Garching, Germany
● Scientific Workshop: High Performance Computing for Water Related Hazards, 29 June - 1 July 2016, Garching, Germany
● PRACE PATC Course: Intel MIC Programming Workshop, 7 – 8 February 2017, Ostrava, Czech Republic
● Scientific Workshop: High performance computing in atmosphere modelling and air related environmental hazards, 9 February 2017, Ostrava, Czech Republic
● PRACE PATC Course: Intel MIC Programming Workshop, 26 – 28 June 2017, Garching, Germany
● Scientific Workshop: HPC for natural hazard assessment and disaster mitigation, 28 - 30 June 2017, Garching, Germany
CzeBaCCA Trainings and Workshops
1st workshop series: February 2016 @ IT4I
https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
http://inside.hlrs.de/ - inSiDE, Vol. 14, No. 1, p. 76f, 2016
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p. 27
CzeBaCCA Trainings and Workshops
2nd workshop series: June 2016 @ LRZ
https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
http://inside.hlrs.de/ - inSiDE, Vol. 14, No. 2, p. 25ff, 2016
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p. 27
CzeBaCCA Trainings and Workshops
3rd workshop series: February 2017 @ IT4I
https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
http://inside.hlrs.de/ - inSiDE, Vol. 15, No. 1, p. 48ff, 2017
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p. 27
CzeBaCCA Trainings and Workshops
4th workshop series: June 2017 @ LRZ
https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
http://inside.hlrs.de/ - inSiDE, Vol. 15, No. 2, p. 55ff, 2017
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p. 27
Intel Xeon Phi @ Top500 November 2017
● https://www.top500.org/list/2017/11/
● #2: Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200 GHz, TH Express-2, Intel Xeon Phi 31S1P, National Super Computer Center in Guangzhou, China
● #7: Trinity - Cray XC40, Intel Xeon Phi 7250 68C 1.4 GHz, Aries interconnect, Cray Inc., DOE/NNSA/LANL/SNL, United States
● #8: Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4 GHz, Aries interconnect, Cray Inc., DOE/SC/LBNL/NERSC, United States
● #9: Oakforest-PACS - PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4 GHz, Intel Omni-Path, Fujitsu, Joint Center for Advanced High Performance Computing, Japan
● #12: Stampede2 - PowerEdge C6320P, Intel Xeon Phi 7250 68C 1.4 GHz, Intel Omni-Path, Dell, Texas Advanced Computing Center/Univ. of Texas, United States
● #14: Marconi, Intel Xeon Phi - CINECA Cluster, Intel Xeon Phi 7250 68C 1.4 GHz, Intel Omni-Path, Lenovo, CINECA, Italy
● #23: Tera-1000-2 Part 1 - Bull Sequana X1000, Intel Xeon Phi 7250 68C 1.4 GHz, Bull BXI 1.2, Bull, Atos Group, Commissariat a l'Energie Atomique (CEA)
● … several non-European systems …
● #87: Salomon - SGI ICE X, Xeon E5-2680v3 12C 2.5 GHz, Infiniband FDR, Intel Xeon Phi 7120P, HPE, IT4Innovations National Supercomputing Center, VSB-Technical University of Ostrava, Czech Republic
PRACE: Best Practice Guides
● http://www.prace-ri.eu/best-practice-guides/
Best Practice Guides - Overview
● The following four Best Practice Guides (BPGs) have been written within PRACE-4IP by 13 authors from 8 institutions and were published in PDF and HTML format on the PRACE website in January 2017:
− Intel® Xeon Phi™ BPG: update of the PRACE-3IP BPG
− Haswell/Broadwell BPG: written from scratch
− Knights Landing BPG: written from scratch
− GPGPU BPG: update of the PRACE-2IP mini-guide
● Online under: http://www.prace-ri.eu/best-practice-guides/
Intel MIC within PRACE: Intel Xeon Phi (KNC) Best Practice Guide
− Created within PRACE-3IP + PRACE-4IP.
− Written in DocBook XML.
− 122 pages, 13 authors.
− Now including information about existing Xeon Phi based systems in Europe: Avitohol @ BAS (NCSA), MareNostrum @ BSC, Salomon @ IT4Innovations, SuperMIC @ LRZ.
− http://www.prace-ri.eu/best-practice-guide-intel-xeon-phi-january-2017/
− http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-Intel-Xeon-Phi-1.pdf
Intel MIC within PRACE: Knights Landing Best Practice Guide
− Created within PRACE-4IP.
− Written in DocBook XML.
− 85 pages, 3 authors.
− General information about the KNL architecture and programming environment.
− Benchmark & application performance results.
− http://www.prace-ri.eu/best-practice-guide-knights-landing-january-2017/
− http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-Knights-Landing.pdf
Best Practice Guides - Dissemination
DEEP/ER Project: Towards Exascale
● Design of an architecture leading to Exascale.
● Development of hardware: implementation of a Booster based on MIC processors and the EXTOLL interconnect.
SuperMIC ∈ SuperMUC @ LRZ
SuperMUC System Overview
SuperMUC Phase 2: Moving to Haswell
System diagram (summary of the figure): 6 Haswell islands with 512 nodes per island and warm-water cooling; Haswell-EP nodes with 24 cores/node and 2.67 GB/core; Mellanox FDR14 island switches (non-blocking within an island, pruned tree towards the spine InfiniBand switches); I/O servers providing GPFS for $WORK and $SCRATCH (weak coupling of Phases 1 and 2); connections to the LRZ infrastructure (NAS, archive, visualization) and to Internet/Grid services; the thin and fat islands of SuperMUC Phase 1 attach via Mellanox FDR10 island switches (likewise non-blocking within an island, pruned tree).
SuperMIC: Intel Xeon Phi Cluster
SuperMIC ∈ SuperMUC @ LRZ
● 32 compute nodes (diskless)
− SLES11 SP3
− 2 Ivy Bridge host processors per node with 16 cores in total
− 2 Intel Xeon Phi 5110P coprocessors per node with 60 cores each
− 64 GB (host) + 2 x 8 GB (Xeon Phi) memory
− 2 MLNX CX3 FDR PCIe cards attached to each CPU socket
● Interconnect
− Mellanox InfiniBand FDR14
− Through the bridge interface, all nodes and MICs are directly accessible
● 1 login and 1 management server (batch system, xCAT, …)
● Air-cooled
● Supports both native and offload mode (see the sketch below)
● Batch system: LoadLeveler
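As an illustration of the native mode mentioned above, a KNC binary can be cross-compiled on the host and started directly on the coprocessor. A minimal sketch with hypothetical file and host names (-mmic is the Intel compiler's KNC cross-compilation switch; additional runtime libraries may need to be made available on the card):

icc -mmic -O2 native_app.c -o native_app.mic    # cross-compile for the Xeon Phi (KNC)
scp native_app.mic mic0:                        # copy the binary to the first coprocessor
ssh mic0 ./native_app.mic                       # run it natively on the card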
SuperMIC Network Access
SuperMIC Access
● Description of SuperMIC:● https://www.lrz.de/services/compute/supermuc/supermic/
CoolMUC3 @ Linux-Cluster @ LRZ
First self-assembled Linux cluster (1999-2002)
Cluster components (2012)
SGI UltraViolet with air guides in front to improve cooling efficiency (2012)
CooLMUC2 (2015): The six racks to the left
LRZ Linux Cluster Overview
● The LRZ Linux Cluster consists of several segments with different types of interconnect and different sizes of shared memory. All systems have a (virtual) 64-bit address space:
− CooLMUC2: cluster with 28-way Haswell-based nodes and FDR14 InfiniBand interconnect, used for both serial and parallel processing
− Intel Broadwell-based 6 TByte shared-memory server HP DL580
− CooLMUC3: cluster with 64-way KNL 7210-F many-core processors and Intel Omni-Path OPA1 interconnect, for parallel/vector processing
● Based on the various node types, the LRZ Linux Cluster offers a wide span of capabilities:
− mixed shared and distributed memory
− large software portfolio
− flexible usage due to various available memory sizes
− parallelization by message passing (MPI)
− shared-memory parallelization with OpenMP or pthreads
− mixed (hybrid) programming with MPI and OpenMP (see the sketch below)
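A minimal sketch of how such a hybrid MPI+OpenMP build and run could look on CoolMUC-3; the module names and the source file hybrid_hello.c are assumptions, while the Intel wrappers and flags correspond to the toolchain listed later in this introduction:

module load intel mpi.intel              # assumed module names
mpiicc -qopenmp -O2 -xMIC-AVX512 hybrid_hello.c -o hybrid_hello
export OMP_NUM_THREADS=16                # 16 OpenMP threads per MPI rank
mpiexec -n 4 ./hybrid_hello              # 4 ranks x 16 threads = 64 cores of one KNL node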
Linux Cluster System Overview
Fields per system: name; CPU; cores per node; RAM per node; total nodes/cores; max. job limits (nodes/cores/walltime/aggregated RAM); queue; login node (job submission).

− Linux-Cluster CoolMUC2: Intel Xeon E5-2697 v3 ("Haswell"), 28 cores per node, 64 GB RAM per node; total 384 nodes / 10752 cores; max. job: 60 nodes / 1680 cores, 48 h walltime, 3.8 TB aggregated RAM; queue: mpp2; login: lxlogin5.lrz.de, lxlogin6.lrz.de, lxlogin7.lrz.de
− Linux-Cluster CoolMUC2: Intel Xeon E5-2697 v3 ("Haswell"), 28 cores per node, 64 GB RAM per node; total 1 node / 28 cores; max. job: 1 node / 28 cores, 96 h walltime, 64 GB aggregated RAM; queue: serial; login: lxlogin5.lrz.de, lxlogin6.lrz.de, lxlogin7.lrz.de
− Linux-Cluster Hugemem: Intel Xeon E5-2660 v2 ("Sandy Bridge"), 20 cores per node, 240 GB RAM per node; total 7 nodes / 220 cores; max. job: 1 node / 20 cores, 168 h walltime, 240 GB aggregated RAM; queue: hugemem; login: lxlogin5.lrz.de, lxlogin6.lrz.de, lxlogin7.lrz.de
− Linux-Cluster Teramem: Intel Xeon E7-8890 v4, 96 cores per node, 6144 GB RAM per node; total 1 node / 96 cores; max. job: 1 node / 96 cores, 48 h walltime, 6.1 TB aggregated RAM; queue: inter; login: lxlogin5.lrz.de, lxlogin6.lrz.de, lxlogin7.lrz.de
− Linux-Cluster Many Core (CoolMUC3): Intel Xeon Phi ("Knights Landing"), 64 cores per node, 96 GB RAM + 16 GB HBM per node; total 148 nodes; max. job: 32 nodes, 24 h walltime, 2880 GB RAM / 512 GB HBM aggregated; queue: mpp3; login: lxlogin8.lrz.de
CoolMUC3 System Procurement Vision
● To supply its users with a system that is suited for processing highly vectorizable and thread-parallel applications,
● that provides good scalability across node boundaries for strong scaling, and
● that deploys state-of-the-art high-temperature water-cooling technologies so that system operation avoids heat transfer to the ambient air of the computer room.
CoolMUC3 System Overview
● Consists of 148 many-core Intel "Knights Landing" (KNL) compute nodes (Xeon Phi 7210-F hosts).
● The nodes are connected to each other via an Intel Omni-Path high-performance network in a fat-tree topology.
● The theoretical peak performance is 400 TFlop/s; the LINPACK performance of the complete system is 255 TFlop/s.
● A standard Intel® Xeon® login node is available for development work and job submission.
CoolMUC3 System Overview
● CoolMUC-3 comprises three warm-water-cooled racks, using an inlet temperature of at least 40 °C.
● Liquid-cooled power supplies and Omni-Path switches are deployed.
● The racks are thermally insulated to suppress radiative losses.
● The liquid-cooled racks operate entirely without fans.
● A complementary rack for air-cooled components (e.g. management servers) uses less than 3% of the system's total power budget.
● With 4.96 GFlops/Watt (according to the strict Green500 level-3 measurement methodology), CooLMUC-3 is one of the most energy-efficient x86 systems worldwide.
● The system is the result of an established partnership between LRZ and Megware.
CoolMUC3 (2017): Really Cool
CoolMUC3 System Overview
Hardware:
− Number of nodes: 148
− Cores per node: 64
− Hyperthreads per core: 4
− Core nominal frequency: 1.3 GHz
− Memory (DDR4) per node: 96 GB (bandwidth 80.8 GB/s)
− High-bandwidth memory per node: 16 GB (bandwidth 460 GB/s; see the check below)
− Bandwidth to interconnect per node: 25 GB/s (2 links)
− Number of Omni-Path switches (100SWE48): 10 + 4 (48 ports each)
− Bisection bandwidth of the interconnect: 1.6 TB/s
− Latency of the interconnect: 2.3 µs
− Peak performance of the system: 394 TFlop/s

Infrastructure:
− Electric power of the fully loaded system: 62 kVA
− Percentage of waste heat to warm water: 97%
− Inlet temperature range for water cooling: 30 … 50 °C
− Temperature difference between outlet and inlet: 4 … 6 °C

Software (OS and development environment):
− Operating system: SLES12 SP2 Linux
− MPI: Intel MPI 2017, alternatively OpenMPI
− Compilers: Intel icc, icpc, ifort 2017
− Performance libraries: MKL, TBB, IPP, DAAL
− Tools for performance and correctness analysis: Intel Cluster Tools
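In flat mode the 16 GB of MCDRAM listed above is exposed as a separate, CPU-less NUMA node, which can be checked directly on a compute node. A minimal sketch, assuming numactl is available there:

numactl -H                    # flat mode: NUMA node 1 is the 16 GB MCDRAM, without CPUs
lscpu | grep "Model name"     # reports the Intel Xeon Phi 7210 processor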
Linux Cluster System Access
ssh -Y lxlogin5.lrz.de -l xxyyyzz    Haswell (CooLMUC-2) login node
ssh -Y lxlogin6.lrz.de -l xxyyyzz    Haswell (CooLMUC-2) login node
ssh -Y lxlogin7.lrz.de -l xxyyyzz    Haswell (CooLMUC-2) login node
gsissh -Y lxgt2.lrz.de               login node for GSI-SSH
ssh -Y lxlogin8.lrz.de -l xxyyyzz    KNL cluster (CooLMUC-3) login node
Linux Cluster System Documentation
● Linux-Cluster:
− https://www.lrz.de/services/compute/linux-cluster/
● CoolMUC-3:
− https://www.lrz.de/services/compute/linux-cluster/coolmuc3/initial_operation/
− https://www.lrz.de/services/compute/linux-cluster/coolmuc3/overview/
CoolMUC3 SLURM
● Submit a job: sbatch --reservation=KNL_Course job.sh
● List your own jobs: squeue -u a2c06aa
● Cancel a job: scancel jobid
● Interactive access (a combined example follows below):
salloc --nodes=1 --time=02:00:00 --constraint=flat,quad --reservation=KNL_Course
srun --pty bash --reservation=KNL_Course
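Putting these commands together for the course batch file, a typical hands-on sequence might look as follows; the --clusters/-M flag may be needed on LRZ's multi-cluster setup to address the mpp3 (CoolMUC-3) cluster, and the job ID is a placeholder:

sbatch --reservation=KNL_Course /lrz/sys/courses/KNL/batch-flat-quad.sh
squeue -M mpp3 -u a2c06aa     # check the state of your jobs on CoolMUC-3
scancel -M mpp3 <jobid>       # cancel the job if necessary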
CoolMUC3 SLURM Simplest Batch File
-> /lrz/sys/courses/KNL/batch-flat-quad.sh
#!/bin/bash
#SBATCH -o /home/hpc/a2c06/a2c06aa/test.%j.%N.out   # standard output file (%j = job ID, %N = node name)
#SBATCH -D /home/hpc/a2c06/a2c06aa/                 # working directory of the job
#SBATCH -J jobname                                  # job name
#SBATCH --clusters=mpp3                             # submit to the CoolMUC-3 (KNL) cluster
#SBATCH --get-user-env                              # export the user environment into the job
#SBATCH --time=02:00:00                             # walltime limit
#SBATCH --constraint=flat,quad                      # request nodes in flat/quadrant mode
commands                                            # application start-up goes here (see the sketch below)
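A hedged sketch of what the "commands" placeholder could contain for a node in flat/quadrant mode; the module names and the executable ./my_app are assumptions:

module load intel mpi.intel               # assumed module names
export OMP_NUM_THREADS=64                 # one thread per physical core
numactl --membind=1 ./my_app              # flat mode: allocate in the 16 GB MCDRAM (NUMA node 1)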
KNL Testsystem
First log in to the Linux Cluster (directly reachable from the course PCs; use only the account a2c06aa!):
ssh lxlogin8.lrz.de -l a2c06aa
Then:
ssh mcct03.cos.lrz.de   or
ssh mcct04.cos.lrz.de
Processor: Intel(R) Xeon Phi(TM) CPU 7210, 64 cores, 4 threads per core. Frequency: 1 - 1.5 GHz.
Theoretical peak performance:
KNL: 64 cores x 1.3 GHz x 8 (SIMD) x 2 (VPUs) x 2 (FMA) = 2662.4 GFLOP/s
Compare with:
KNC: 60 cores x 1 GHz x 8 (SIMD) x 2 (FMA) = 960 GFLOP/s
Sandy Bridge: 2 sockets x 8 cores x 2.7 GHz x 4 (SIMD) x 2 (ALUs) = 345.6 GFLOP/s
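The factor products above can be checked quickly on the shell; a small sketch using bc:

echo "64 * 1.3 * 8 * 2 * 2" | bc -l    # KNL:          2662.4 GFLOP/s
echo "60 * 1.0 * 8 * 2" | bc -l        # KNC:           960.0 GFLOP/s
echo "2 * 8 * 2.7 * 4 * 2" | bc -l     # Sandy Bridge:  345.6 GFLOP/s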
Xeon Phi References
● Books:
− James Reinders, James Jeffers: Intel Xeon Phi Coprocessor High Performance Programming, Morgan Kaufmann Publ. Inc., 2013, http://lotsofcores.com; new KNL edition in July 2016
− Rezaur Rahman: Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers, Apress, 2013
− Parallel Programming and Optimization with Intel Xeon Phi Coprocessors, Colfax, 2013, http://www.colfaxintl.com/nd/xeonphi/book.aspx
● Training material by CAPS, TACC, EPCC
● Intel training material and webinars
● V. Weinberg (Editor) et al.: Best Practice Guide - Intel Xeon Phi v2, http://www.prace-ri.eu/best-practice-guide-intel-xeon-phi-january-2017/ and references therein
● Ole Widar Saastad (Editor) et al.: Best Practice Guide - Knights Landing, http://www.prace-ri.eu/best-practice-guide-knights-landing-january-2017/
Acknowledgements
● IT4Innovations, Ostrava.
● Partnership for Advanced Computing in Europe (PRACE)
● Intel
● BMBF (Federal Ministry of Education and Research)
● Dr. Karl Fürlinger (LMU)
● J. Cazes, R. Evans, K. Milfeld, C. Proctor (TACC)
● Adrian Jackson (EPCC)
And now …
Enjoy the course!