The US DOE Exascale Computing Project (ECP) Perspective for the HEP Community Douglas B. Kothe (ORNL), ECP Director Lori Diachin (LLNL), ECP Deputy Director Erik Draeger (LLNL), ECP Deputy Director of Application Development Tom Evans (ORNL), ECP Energy Applications Lead Blueprint Workshop on A Coordinated Ecosystem for HL-LHC Computing R&D Washington, DC October 23, 2019



The US DOE Exascale Computing Project (ECP) Perspective for the HEP Community

Douglas B. Kothe (ORNL), ECP Director
Lori Diachin (LLNL), ECP Deputy Director
Erik Draeger (LLNL), ECP Deputy Director of Application Development
Tom Evans (ORNL), ECP Energy Applications Lead

Blueprint Workshop on A Coordinated Ecosystem for HL-LHC Computing R&D
Washington, DC
October 23, 2019


DOE Exascale Program: The Exascale Computing Initiative (ECI)

ECI partners: US DOE Office of Science (SC) and National Nuclear Security Administration (NNSA)

ECI mission: Accelerate R&D, acquisition, and deployment to deliver exascale computing capability to DOE national labs by the early- to mid-2020s

ECI focus: Delivery of an enduring and capable exascale computing capability for use by a wide range of applications of importance to DOE and the US

Three Major Components of the ECI

• Exascale Computing Project (ECP)
• Exascale system procurement projects & facilities: ALCF-3 (Aurora), OLCF-5 (Frontier), ASC ATS-4 (El Capitan)
• Selected program office application development (BER, BES, NNSA)


ECP Mission and Vision
Enable US revolutions in technology development; scientific discovery; healthcare; energy, economic, and national security

ECP mission
• Develop exascale-ready applications and solutions that address currently intractable problems of strategic importance and national interest.
• Create and deploy an expanded and vertically integrated software stack on DOE HPC exascale and pre-exascale systems, defining the enduring US exascale ecosystem.
• Deliver US HPC vendor technology advances and deploy ECP products to DOE HPC pre-exascale and exascale systems.

ECP vision
Deliver exascale simulation and data science innovations and solutions to national problems that enhance US economic competitiveness, change our quality of life, and strengthen our national security.


Vision: Exascale Computing Project (ECP) Lifts all U.S. High Performance Computing to a New Trajectory

[Figure: Capability plotted against Time (axis years 2016, 2021 to 2027), with 10X and 5X annotations]


Relevant US DOE Pre-Exascale and Exascale Systems for ECP


The three technical areas in ECP have the necessary components to meet national goals

Performant mission and science applications @ scale; Aggressive RD&D Project; Mission apps & integrated S/W stack; Deployment to DOE HPC Facilities; Hardware tech advances

Application Development (AD)
• Develop and enhance the predictive capability of applications critical to the DOE
• 24 applications including national security, energy, earth systems, economic security, materials, and data

Software Technology (ST)
• Deliver expanded and vertically integrated software stack to achieve full potential of exascale computing
• 70 unique software products spanning programming models and run times, math libraries, data and visualization

Hardware and Integration (HI)
• Integrated delivery of ECP products on targeted systems at leading DOE HPC facilities
• 6 US HPC vendors focused on exascale node and system design; application integration and software deployment to facilities


ECP is a large, complex project
Effective project management with three technical focus areas designed to deliver a capable exascale ecosystem

• Project Management (PM): Measure progress and ensure execution within scope, schedule, and budget
• Application Development (AD): Develop and enhance predictive capability of applications critical to DOE across science, energy, and national security mission space
• Software Technology (ST): Build a comprehensive, coherent software stack that enables the productive development of highly parallel applications that effectively target diverse exascale architectures
• Hardware and Integration (HI): A capable exascale computing ecosystem made possible by integrating ECP applications, software and hardware innovations within DOE facilities

Distinctive characteristics

• RD&D and software development in nature

• Two sponsoring DOE programs

• Numerous participating institutions

• Decentralized cost system

• External project dependence

• Broad and qualitative mission need requirements

• Outcomes: Products and solutions

• Key performance parameters require innovation

• Application of scope contingency

• End of project transition


ECP by the Numbers

A seven-year, $1.8 B R&D effort that launched in 2016

Six core DOE National Laboratories: Argonne, Lawrence Berkeley, Lawrence Livermore, Oak Ridge, Sandia, Los Alamos

• Staff from most of the 17 DOE national laboratories take part in the project

More than 80 top-notch R&D teams

Four focus areas: Hardware and Integration, Software Technology, Application Development, Project Management

Hundreds of consequential milestones delivered on schedule and within budget since project inception

7 years · $1.7B · 6 core DOE labs · 4 focus areas · 81 R&D teams · 1000 researchers


ECP Organization

• Dan Hoag, Federal Project Director
• Barb Helland, ASCR Program Manager
• Thuc Hoang, ASC Program Manager

Core Laboratories Board of Directors
• Bill Goldstein, Chair (Director, LLNL)
• Thomas Zacharia, Vice Chair (Director, ORNL)

Laboratory Operations Task Force (LOTF)

DOE HPC Facilities

Industry Council
• Dave Kepczynski, GE, Chair

Exascale Computing Project
• Doug Kothe, ORNL, Project Director
• Lori Diachin, LLNL, Deputy Project Director
• Al Geist, ORNL, Chief Technology Officer
• Julia White, ORNL, Technical Operations
• Mike Bernhardt, ORNL, Communications
• Doug Collins, IT & Quality
• Monty Middlebrook, Project Controls & Risk

Project Management
• Kathlyn Boudwin, ORNL, Director
• Manuel Vigil, LANL, Deputy Director
• Doug Collins, ORNL, Associate Director

Application Development
• Andrew Siegel, ANL, Director
• Erik Draeger, LLNL, Deputy Director

Software Technology
• Mike Heroux, SNL, Director
• Jonathan Carter, LBNL, Deputy Director

Hardware & Integration
• Terri Quinn, LLNL, Director
• Susan Coghlan, ANL, Deputy Director

Project Office Support
• Megan Fielden, Human Resources
• Willy Besancenez, Procurement
• Sam Howard, Export Control Analyst
• Mike Hulsey, Business Management
• Kim Milburn, Finance Officer
• Susan Ochs, Partnerships
• Michael Johnson, Legal
• and Points of Contacts at the Core Laboratories


ECP Work Breakdown Structure (WBS)
Key leaders at WBS Level 1, 2, 3

2.0 Exascale Computing Project: Kothe (ORNL)

2.1 Project Management: Boudwin (ORNL)
• 2.1.1 Project Planning and Management: Boudwin (ORNL)
• 2.1.2 Project Controls and Risk Management: Middlebrook (ORNL)
• 2.1.3 Business Management: Hulsey (ORNL)
• 2.1.4 Procurement Management: Besancenez (ORNL)
• 2.1.5 Information Technology and Quality Management: Collins (ORNL)
• 2.1.6 Communications and Outreach: Bernhardt (ORNL)

2.2 Application Development: Siegel (ANL)
• 2.2.1 Chemistry and Materials Applications: Deslippe (LBL)
• 2.2.2 Energy Applications: Evans (ORNL)
• 2.2.3 Earth and Space Science Applications: Dubey (ANL)
• 2.2.4 Data Analytics and Optimization Applications: Hart (SNL)
• 2.2.5 National Security Applications: Francois (LANL)
• 2.2.6 Co-Design: Germann (LANL)

2.3 Software Technology: Heroux (SNL)
• 2.3.1 Programming Models and Runtimes: Thakur (ANL)
• 2.3.2 Development Tools: Vetter (ORNL)
• 2.3.3 Mathematical Libraries: McInnes (ANL)
• 2.3.4 Data and Visualization: Ahrens (LANL)
• 2.3.5 Software Ecosystem and Delivery: Munson (ANL)
• 2.3.6 NNSA Software Technologies: Neely (LLNL)

2.4 Hardware and Integration: Quinn (LLNL)
• 2.4.1 PathForward: de Supinski (LLNL)
• 2.4.2 Hardware Evaluation: Pakin (LANL)
• 2.4.3 Application Integration at Facilities: Hill (ORNL)
• 2.4.4 Software Deployment at Facilities: Adamson (ORNL)
• 2.4.5 Facility Resource Utilization: White (ORNL)
• 2.4.6 Training and Productivity: Barker (ORNL)

81 WBS L4 subprojects have set their FY20-23 performance baseline with scope and technical plans to execute on RD&D objectives in ECP’s Final Design


ECP High Level Schedule and Access to Systems


ECP applications target national problems in DOE mission areas

Health care
• Accelerate and translate cancer research (partnership with NIH)

Energy security
• Turbine wind plant efficiency
• Design and commercialization of SMRs
• Nuclear fission and fusion reactor materials design
• Subsurface use for carbon capture, petroleum extraction, waste disposal
• High-efficiency, low-emission combustion engine and gas turbine design
• Scale up of clean fossil fuel combustion
• Biofuel catalyst design

National security
• Next-generation, stockpile stewardship codes
• Reentry-vehicle-environment simulation
• Multi-physics science simulations of high-energy density physics conditions

Economic security
• Additive manufacturing of qualifiable metal parts
• Reliable and efficient planning of the power grid
• Seismic hazard risk assessment

Earth system
• Accurate regional impact assessments in Earth system models
• Stress-resistant crop analysis and catalytic conversion of biomass-derived alcohols
• Metagenomics for analysis of biogeochemical cycles, climate change, environmental remediation

Scientific discovery
• Cosmological probe of the standard model of particle physics
• Validate fundamental laws of nature
• Plasma wakefield accelerator design
• Light source-enabled analysis of protein and molecular structure and design
• Find, predict, and control materials and properties
• Predict and control magnetically confined fusion plasmas
• Demystify origin of chemical elements


Co-design Subprojects

Co-design helps to ensure that applications effectively utilize exascale systems
• Pulls software and hardware developments into applications
• Pushes application requirements into software and hardware RD&D
• Evolved from best practice to an essential element of the development cycle

CD Centers focus on a unique collection of algorithmic motifs invoked by ECP applications
• Motif: algorithmic method that drives a common pattern of computation and communication
• CD Centers must address all high priority motifs used by ECP applications, including the new motifs associated with data science applications

Efficient mechanism for delivering next-generation community products with broad application impact
• Evaluate, deploy, and integrate exascale hardware-aware software designs and technologies for key crosscutting algorithmic motifs into applications

Co-design centers address computational motifs common to multiple application projects
• ExaLearn: Machine Learning
• ExaGraph: Graph-based algorithms
• CEED: Finite element discretization
• AMReX: Block structured AMR
• COPA: Particles/mesh methods
• CODAR: Data and workflows


Department of Energy (DOE) Roadmap to Exascale Systems
An impressive, productive lineup of accelerated node systems supporting DOE’s mission

Pre-Exascale Systems (to date, only NVIDIA GPUs)
• 2012: Titan (12), ORNL, Cray/AMD/NVIDIA; Mira (24), ANL, IBM BG/Q; Sequoia (13), LLNL, IBM BG/Q
• 2016: Cori (14), LBNL, Cray/Intel Xeon/KNL; Theta (28), ANL, Cray/Intel KNL; Trinity (7), LANL/SNL, Cray/Intel Xeon/KNL
• 2018: Summit (1), ORNL, IBM/NVIDIA; Sierra (2), LLNL, IBM/NVIDIA
• 2020: NERSC-9 Perlmutter, LBNL, Cray/AMD/NVIDIA; LANL/SNL, TBD

First U.S. Exascale Systems* (three different types of accelerators!)
• 2021-2023: Aurora, ANL*, Cray/Intel; ORNL*, Cray/AMD; LLNL*, TBD


New hardware requires fully re-examining approaches

• Code Porting
• Algorithmic Restructuring
• Alternate Choice of Physical Models
• New Numerical Approaches

This is not just a porting exercise: codes are being redesigned with heterogeneous computing and portability in mind

Goal: Ensure exascale hardware impacts DOE science/engineering mission

Approach: Significant investment in scientific applications well in advance of exascale machines


ECP Software Technology Software Ecosystem

ECP Applications

ECP Software Technology
• Software Ecosystem & Delivery
• Development Tools
• Programming Models
• Runtimes
• Mathematical Libraries
• Data & Visualization

Collaborators: Facilities, Vendors, HPC Community

Details available publicly at https://www.exascaleproject.org/wp-content/uploads/2019/02/ECP-ST-CAR.pdf


Programming Models & Runtimes

• Enhance & prepare OpenMP and MPI programming models (hybrid programming models, deep memory copies) for exascale

• Development of performance portability tools (e.g., Kokkos and RAJA)

• Support alternate models for potential benefits and risk mitigation: PGAS (UPC++/GASNet), task-based models (Legion, PaRSEC)

• Libraries for deep memory hierarchy & power management

Development Tools

• Continued, multifaceted capabilities in portable, open-source LLVM compiler ecosystem to support expected ECP architectures, including support for F18

• Performance analysis tools that accommodate new architectures and programming models, e.g., PAPI, TAU

Math Libraries

• Linear algebra, iterative linear solvers, direct linear solvers, integrators and nonlinear solvers, optimization, FFTs, etc

• Performance on new node architectures; extreme strong scalability

• Advanced algorithms for multi-physics, multiscale simulation and outer-loop analysis

• Increasing quality, interoperability, complementarity of math libraries

Data and Visualization

• I/O libraries: HDF5, ADIOS, PnetCDF

• I/O via the HDF5 API

• Insightful, memory-efficient in-situ visualization and analysis

• Data reduction via scientific data compression

• Checkpoint restart

• Filesystem support for emerging solid state technologies.

Software Ecosystem

• Develop features in Spack necessary to support all ST products in E4S, and the AD projects that adopt it

• Development of Spack stacks for reproducible turnkey deployment of large collections of software

• Optimization and interoperability of containers on HPC systems

• Regular E4S releases of the ST software stack and SDKs with regular integration of new ST products

NNSA ST

• Projects that have both mission role and open science role

• Major technical areas: New programming abstractions, math libraries, data and viz libraries

• Cover most ST technology areas

• Open source NNSA Software projects

• Subject to the same planning, reporting and review processes

ECP software technologies are a fundamental underpinning in delivering on DOE’s exascale mission


Software Development Kits (SDKs): Key delivery vehicle for ECP
A collection of related software products (packages) where coordination across package teams improves usability and practices, and fosters community growth among teams that develop similar and complementary capabilities

• Domain scope: Collection makes functional sense
• Interaction model: How packages interact; compatible, complementary, interoperable
• Community policies: Value statements; serve as criteria for membership
• Meta-infrastructure: Invokes build of all packages (Spack), shared test suites
• Coordinated plans: Inter-package planning. Augments autonomous package planning
• Community outreach: Coordinated, combined tutorials, documentation, best practices

ECP ST SDKs: Grouping similar products for collaboration & usability

Programming Models & Runtimes Core

Tools & Technologies

Compilers & Support

Math Libraries (xSDK)

Viz Analysis and Reduction

Data mgmt., I/O Services & Checkpoint/ Restart

“Unity in essentials, otherwise diversity”


ECP ST SDKs will span all technology areas

Visualization Analysis and Reduction (9): zfp, VisIt, ASCENT, Cinema, Catalyst, VTK-m, SZ, ParaView, ROVER

xSDK (16): MAGMA, DTK, Tasmanian, TuckerMPI, SUNDIALS, PETSc/TAO, libEnsemble, STRUMPACK, SuperLU, ForTrilinos, SLATE, MFEM, Kokkos Kernels, Trilinos, hypre, FleCSI

PMR Core (17): UPC++, MPICH, Open MPI, Umpire, AML, RAJA, CHAI, PaRSEC*, DARMA, GASNet-EX, Qthreads, BOLT, SICM, Legion, Kokkos (support), QUO, Papyrus

Tools and Technology (11): PAPI, Program Database Toolkit, Search (random forests), Siboka, C2C, Sonar, Dyninst Binary Tools, Gotcha, Caliper, TAU, HPCToolkit

Compilers and Support (7): OpenMP V & V, Flang/LLVM Fortran comp, LLVM, CHiLL autotuning comp, LLVM OpenMP comp, OpenARC, Kitsune

Data management, I/O Services, Checkpoint restart (12): Parallel netCDF, ADIOS, Darshan, UnifyCR, VeloC, IOSS, HXHIM, ROMIO, Mercury (Mochi suite), HDF5, SCR, FAODEL

Ecosystem/E4S at-large (12): BEE, FSEFI, Kitten Lightweight Kernel, COOLR, NRM, ArgoContainers, Spack, MarFS, GUFI, Intel GEOPM, mpiFileUtils, TriBITS

Legend: Each column is an SDK as defined in the initial breakdown process using criteria developed for choosing logical and effective groupings based on experience with the xSDK. The colored background denotes the ST technical area for each product (Tools; PMR; Data and Vis; Ecosystems and delivery; Math Libraries).


ST Ecosystem: From products to SDKs to an integrated stack

ECP ST Open Product Integration Architecture: levels of integration, with product source and delivery

ST Products (ECP ST Individual Products)
• Standard workflow
• Existed before ECP
• Source: ECP L4 teams; Non-ECP Developers; Standards Groups
• Delivery: Apps directly; spack; vendor stack; facility stack

SDKs
• Group similar products
• Make interoperable
• Assure policy compliant
• Include external products
• Source: ECP SDK teams; Non-ECP Products (policy compliant, spackified)
• Delivery: Apps directly; spack install sdk; future: vendor/facility

E4S
• Build all SDKs
• Build complete stack
• Containerize binaries
• Source: ECP E4S team; Non-ECP Products (all dependencies)
• Delivery: spack install e4s; containers; CI Testing


Extreme-scale Scientific Software Stack (E4S)
A Spack-based distribution of ECP ST products and related and dependent software tested for interoperability and portability to multiple architectures
Lead: Sameer Shende, University of Oregon

• Provides distinction between SDK usability / general quality / community and deployment / testing goals

• Will leverage and enhance SDK interoperability thrust

• Releases:

– Oct: E4S 0.1: 24 full, 24 partial release products

– Jan: E4S 0.2: 37 full, 10 partial release products

• Current primary focus: Facilities deployment

http://e4s.io


Monte Carlo Transport on Accelerated Node Architectures
Recent efforts in the ECP ExaSMR Subproject

Thomas M. Evans

A Coordinated Ecosystem for HL-LHC Computing R&D

Catholic University, Oct 23, 2019


ExaSMR: Modeling and Simulation of Small Modular Reactors

• Small modular nuclear reactors present significant simulation challenges

– Small size invalidates existing low-order models

– Natural circulation flow requires high-fidelity fluid flow simulation

• ExaSMR will couple most accurate available methods to perform “virtual experiment” simulations

– Monte Carlo neutronics

– CFD with turbulence models

MC Neutronics
• Petascale: system-integrated responses; single physics; constant temperature; isotopic depletion on assemblies; reactor startup
• Exascale: pin-resolved (and sub-pin) responses; coupled with T/H; variable temperatures; isotopic depletion on full core; full-cycle modeling

CFD
• Petascale: single fuel assembly; RANS; within-core flow
• Exascale: full reactor core; hybrid LES/RANS; entire coolant loop

[Figure: Fuel assembly mixing vane. Reproduced with permission]


Physical Problem Characteristics

Problem Parameters

• Core Characteristics
– Full core representative SMR model containing 37 assemblies with 17 × 17 pins per assembly and 264 fuel pins per assembly
– $10^{10}$ particles per eigenvalue iteration
– Pin-resolved reaction rate with 3 radial tally regions and 50 to 100 axial levels
– O(150) nuclides and O(8) reactions per nuclide in each tally region

• Geometry Size
– $N_{\mathrm{cells}} = 1.9 \times 10^{6}$ to $8.8 \times 10^{6}$

• Tally Sizes
– $N_{t,\mathrm{cells}} = 4.8 \times 10^{5}$ to $5.9 \times 10^{6}$
– $N_{t,\mathrm{bins}} = 1.5 \times 10^{9}$ to $1.8 \times 10^{10}$

Pin radii: $r_f = 0.406$ cm (fuel, UO2); $r_g = 0.414$ cm (gap, He); $r_c = 0.475$ cm (clad, Zr)

Pin pitch = 1.26 cm; Assembly pitch = 21.5 cm; Height = 227.56 cm


Monte Carlo Neutron Transport Challenges

• MC neutronics is a stochastic method

– Independent random walks are not readily amenable to SIMT algorithms – on-node concurrency

– Sampling data is randomly accessed

– Sampling data is characterized by detailed structure

– Large variability in transport distributions both within and between particle histories


Developing GPU Continuous Energy Monte Carlo – Intra-Node

• Focus on high-level thread divergence

• Optimize for device occupancy

– Separate geometry and physics kernels to increase occupancy

– Boundary crossings (geometry)

– Collision (physics)

• Smaller kernels help address variability in particle transport distributions

• Partition macro cross section calculations between fuel and non-fuel regions – separate kernels for each

• Use of hardware atomics for tallies and direct sort addressing

• Judicious use of texture memory

– __ldg on data interpolation bounds

Simple Event-Based Transport Algorithm

get vector of source particles
while any particles are alive do
    for each living particle do
        move particle
            dist-to-collision
            dist-to-boundary
            move-to-next
    end for
    for each living particle do
        process particle collision
    end for
    source particles
    sort/consolidate surviving particles
end while
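The event-based loop above can be sketched on the CPU in a few lines of Python. This is an illustrative toy under stated assumptions, not the Shift implementation: the one-dimensional slab, the single material, the fixed absorption probability, and every name in the code (`event_based_transport`, `slab_width`, `sigma_t`, `absorb_prob`) are hypothetical. It also omits the step that refills the particle vector with new source particles.

```python
import random


def event_based_transport(n_particles, slab_width=10.0, sigma_t=1.0,
                          absorb_prob=0.3, seed=1234):
    """Toy 1D slab transport organised by event type, not by history.

    All physics here is a placeholder chosen for illustration only.
    """
    rng = random.Random(seed)
    # Get vector of source particles: slab centre, direction cosine +/-1.
    particles = [{"x": slab_width / 2.0, "mu": rng.choice((-1.0, 1.0))}
                 for _ in range(n_particles)]
    tallies = {"absorbed": 0, "leaked": 0, "collisions": 0}

    while particles:  # while any particles are alive
        collided = []
        # Geometry event: move every living particle.
        for p in particles:
            d_coll = rng.expovariate(sigma_t)                       # dist-to-collision
            d_bnd = slab_width - p["x"] if p["mu"] > 0 else p["x"]  # dist-to-boundary
            if d_coll < d_bnd:                                      # move-to-next
                p["x"] += p["mu"] * d_coll
                collided.append(p)
            else:
                tallies["leaked"] += 1  # left the slab; history ends
        # Physics event: process collisions for the particles that collided.
        survivors = []
        for p in collided:
            tallies["collisions"] += 1
            if rng.random() < absorb_prob:
                tallies["absorbed"] += 1
            else:
                p["mu"] = rng.choice((-1.0, 1.0))  # scatter forward or backward
                survivors.append(p)
        # Consolidate: keep only surviving particles for the next pass.
        particles = survivors
    return tallies


if __name__ == "__main__":
    print(event_based_transport(1000))
```

Each pass applies one kind of work to the whole particle vector, which is the property that lets the geometry and physics steps become separate, smaller GPU kernels.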


Production continuous-energy Monte Carlo transport solver on GPUs

• Petascale implementation did not use GPU hardware

• Enables three-dimensional, fully-depleted SMR core models simulated using continuous-energy physics and pin-resolved reaction rates with temperature-dependence

• Algorithmic improvements offer 10x speedup relative to initial implementation and nearly 60x per-node speedup over Titan

• Nearly perfect parallel scaling efficiency on ORNL’s Summit supercomputer

• GPU algorithm executes more than 20x faster than CPU algorithm on Summit (per full node)

• Paper describes first production MC solver implementation on GPUs

Hamilton, S.P., Evans, T.M., 2019. Continuous-energy Monte Carlo neutron transport on GPUs in the Shift code. Annals of Nuclear Energy 128, 236–247. https://doi.org/10.1016/j.anucene.2019.01.012

Total reaction rate in SMR core

Increase in particle tracking rate across GPU computing architectures


Cross section calculations

• Computing transport cross sections requires contributions from various constituents

$$\Sigma(E) = \sum_{m=1}^{M} N_m \, \sigma_m(E)$$

• Fuel compositions contain substantially more nuclides than non-fuel

• Partition mixtures into fuel and non-fuel

– Evaluate cross sections in separate kernels to reduce divergence
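As a minimal sketch of the two ideas on this slide, the Python below evaluates the macroscopic cross section as a sum over the nuclides of a mixture and splits particle indices into fuel and non-fuel batches. The function names, the dictionary layout, and all numerical values are assumptions made for illustration; they do not come from the Shift code.

```python
def macroscopic_xs(number_densities, micro_xs, energy):
    """Sigma(E) = sum over nuclides m of N_m * sigma_m(E) for one mixture.

    number_densities: {nuclide: N_m}, e.g. in atoms/(barn*cm)
    micro_xs: {nuclide: callable returning sigma_m(E)}, e.g. in barns
    """
    return sum(n_m * micro_xs[nuclide](energy)
               for nuclide, n_m in number_densities.items())


def partition_by_material(particle_cells, cell_is_fuel):
    """Split particle indices into fuel and non-fuel batches.

    Each batch then loops over a similar number of nuclides, which is the
    motivation for evaluating the two groups in separate kernels.
    """
    fuel, non_fuel = [], []
    for index, cell in enumerate(particle_cells):
        (fuel if cell_is_fuel[cell] else non_fuel).append(index)
    return fuel, non_fuel


if __name__ == "__main__":
    # Made-up, energy-independent cross sections (barns) for illustration.
    micro_xs = {"U235": lambda e: 10.0, "U238": lambda e: 2.0,
                "H1": lambda e: 20.0, "O16": lambda e: 4.0}
    fuel_mix = {"U235": 0.001, "U238": 0.02, "O16": 0.04}
    water_mix = {"H1": 0.05, "O16": 0.025}
    print(macroscopic_xs(fuel_mix, micro_xs, 1.0))
    print(macroscopic_xs(water_mix, micro_xs, 1.0))
    print(partition_by_material([0, 1, 0, 2], {0: True, 1: False, 2: False}))
```

A real depleted fuel mixture carries far more nuclides than the three shown here, so the per-particle loop length differs sharply between the two batches.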


Occupancy

• Flattened algorithm allows small, focused kernels

– Split geometry/physics components to reduce register usage

– Smaller kernels = higher occupancy

MC type            Algorithm       Registers  Occupancy
Multigroup         History-based   85         25%
Multigroup         Event-based     83         25%
Continuous-Energy  History-based   168        12.5%
Continuous-Energy  Event-based     62         50%
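The link between register count and occupancy in the table can be illustrated with a simple register-limited estimate. The sketch below is hypothetical: the 256-thread block size, the per-SM limits of 65,536 registers and 2,048 resident threads, and the 8-register allocation granularity are assumptions that are not stated on the slide, and the function name is invented. Under those assumptions the estimate agrees with the four rows above.

```python
def register_limited_occupancy(regs_per_thread, threads_per_block=256,
                               regs_per_sm=65536, max_threads_per_sm=2048,
                               reg_granularity=8):
    """Estimate occupancy when registers are the only limiting resource.

    Every default value is an assumption made for this illustration.
    """
    # Round the per-thread register count up to the allocation granularity.
    regs = -(-regs_per_thread // reg_granularity) * reg_granularity
    # Whole blocks that fit in the register file and in the thread limit.
    blocks_by_regs = regs_per_sm // (regs * threads_per_block)
    blocks_by_threads = max_threads_per_sm // threads_per_block
    resident_blocks = min(blocks_by_regs, blocks_by_threads)
    return resident_blocks * threads_per_block / max_threads_per_sm


if __name__ == "__main__":
    rows = [("Multigroup, history-based", 85),
            ("Multigroup, event-based", 83),
            ("Continuous-energy, history-based", 168),
            ("Continuous-energy, event-based", 62)]
    for label, regs in rows:
        print(f"{label}: {100 * register_limited_occupancy(regs):.1f}%")
```

Because whole blocks must fit, dropping from 85 to 83 registers changes nothing, while dropping from 168 to 62 quadruples the number of resident blocks.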


Effect of varying occupancy

• Artificially limit occupancy by allocating shared memory

– kernel<<<grids, blocks, shared_mem>>>(…)

Occupancy (%)  History-based  Event-based  Flattened event-based
12.5           3.7            3.4          8.2
25.0           -              5.8          13.3
37.5           -              -            14.5
50.0           -              -            16.9
62.5❊          -              -            18.0

❊ Only applied to the “distance to collision” kernel


CPU v GPU performance

[Figure: CPU tracking rate per core; GPU core equivalent]

GPU performance increases have outpaced corresponding CPU improvements


Device saturation

Depleted SMR core

Newest architectures remain unsaturated at 1M particles per GPU


Inter-node Scaling

Weak scaling on Summit – 1 GPU per MPI rank

Domain replication parallelism

[Figure: four ranks (0, 1, 2, 3), each with Num Particles = N / 4]
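Domain replication, as pictured above, divides the particle histories among ranks and combines the tallies afterward. The Python sketch below shows only that bookkeeping; the function names are hypothetical, and in an MPI code the final sum would typically be a reduction across ranks rather than a local loop.

```python
def split_histories(n_particles, n_ranks):
    """Return the number of histories assigned to each rank.

    Any remainder is spread over the lowest-numbered ranks so that counts
    differ by at most one.
    """
    base, extra = divmod(n_particles, n_ranks)
    return [base + (1 if rank < extra else 0) for rank in range(n_ranks)]


def combine_tallies(per_rank_tallies):
    """Sum tally vectors element-wise, standing in for a reduction."""
    return [sum(values) for values in zip(*per_rank_tallies)]


if __name__ == "__main__":
    print(split_histories(1000, 4))
    print(combine_tallies([[1, 2], [3, 4], [5, 6], [7, 8]]))
```

With N divisible by four this reproduces the N / 4 split in the figure.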

Multi-set domain decomposition topology (in development – GPU)

Intra-set non-uniform block out to address load balancing

Ellis, J.A., Evans, T.M., Hamilton, S.P., Kelley, C.T., Pandya, T.M., 2019. Optimization of processor allocation for domain decomposed Monte Carlo calculations. Parallel Computing 87, 77–86. https://doi.org/10.1016/j.parco.2019.06.001


On-the-Fly Doppler Broadening

• Cross section resonances significantly broaden due to thermal motion of nuclei

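• The cross section ($\sigma$) at any energy ($E$) and temperature ($T$) can be expressed as a summation over contributions from poles ($p_j$) and corresponding residues ($r_j$):

$$\sigma(E, T) = \frac{1}{E} \sqrt{\frac{A\pi}{k_B T}} \sum_j \Re\left[ r_j \, W\!\left( \left(\sqrt{E} - p_j\right) \sqrt{\frac{A}{k_B T}} \right) \right]$$

• A polynomial approximation can be used to reduce the number of $W(\cdot)$ evaluations:

$$\sigma(E, T) = \frac{1}{E} \sqrt{\frac{A\pi}{k_B T}} \sum_j \Re\left[ r_j \, W\!\left( \left(\sqrt{E} - p_j\right) \sqrt{\frac{A}{k_B T}} \right) \right] + \sum_{n=0}^{N-1} a_{w,n} \mathfrak{D}_n$$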


GPU Performance

• Performance testing with a quarter-core model of the awaited NuScale Small Modular Reactor (SMR)

• No significant sacrifice of accuracy compared to standard continuous energy (CE) data

• Each GPU thread does individual Faddeeva evaluations (no vectorization over nuclides)

• Factor of 2-3 performance penalty on both the CPU and GPU for arbitrary temperature resolution

2x IBM Power8 + 4x NVIDIA Tesla P100


Geant-based proxy pilot

Goals

• Research and develop design patterns suitable for HEP transport on GPUs

• Produce a proxy app with limited but representative physics processes

• Execute and profile the proxy app at the scale needed by next-generation HEP experiments

Challenges

• Choosing a scope small enough to digest that can still emulate the level of complexity of a real simulation

• Reconciling static (build-time) preference of GPU code with dynamic user requirements

• Effectively utilizing the GPU with a very broad, flat call graph (dozens of independent physics processes)


Geant-based proxy pilot

Complete

• Developed requirements document for the proxy app

• Constructed development framework (CMake/Docker/CUDA)

• Integrated CUDA-enabled VecGeom geometry

In progress

• Iterating on high-level code architecture and event loop

• Implementing physics kernels in CUDA

• Awaiting onboarding of postdoc...

Future work

• Explore HIP in preparation for Frontier

• Evaluate ClangJIT for GPU-friendly dynamism


Questions?
https://www.exascaleproject.org/