Why Next-gen Artificial Intelligence Needs Supercomputing
James D. Maltby, Ph.D.
Solutions Architect, Artificial Intelligence
OVERVIEW
• Advanced AI needs Supercomputing…
  • New, challenging applications
  • Models are getting larger
  • Deep Learning training needs to scale
• And Supercomputing needs new features for Advanced AI!
  • New I/O patterns
  • New languages and workflows
  • Hybrid systems with new processor types
What is a Supercomputer?
• 6,174 Cray XC nodes based on the Intel® Xeon® Scalable processor
• 197,568 cores
• 128 GB memory per node
• 790 TB total memory
• 7.2 petaflops peak
• Dragonfly interconnect
• 17.6 PB Lustre storage
• 2.8 MW electrical power
…plenty of room for AI and Big Data
What is Advanced AI?
• There are many definitions…
• AI is most commonly associated with Deep Learning (DL), though many other Machine Learning techniques are included.
• Deep Learning (DL) training is a classic high-performance computing problem which demands:
  • Large compute capacity in terms of FLOPs, memory capacity and bandwidth
  • A performant interconnect for fast communication of gradients and model parameters (a sketch of this step follows below)
  • Parallel I/O and storage with sufficient bandwidth to keep the compute fed at scale
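To make the interconnect demand concrete, here is a minimal sketch (not from the talk) of the gradient-averaging step at the heart of data-parallel training, using mpi4py; the gradient array is a hypothetical stand-in for real model gradients.

```python
# Minimal sketch of data-parallel gradient averaging over MPI.
# The gradient array below is a hypothetical stand-in for real gradients.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Each worker computes gradients on its own shard of the training data.
local_grad = np.random.rand(10_000_000).astype(np.float32)

# Sum gradients across all workers, then average. This allreduce is the
# communication step the interconnect must keep cheap at scale.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

# Every worker now applies the same averaged update, keeping replicas in sync.
```

The same pattern underlies frameworks such as Horovod, which hide the allreduce behind the deep learning framework's optimizer.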
Supercomputers are key for advanced simulation
• Largest-ever storm prediction model
  • Over 4 billion points used to simulate the landfall of Hurricane Sandy
  • Urban-scale grid resolution of 500 m (compared to the standard 3 km)
  • Enables researchers to understand fine-grained properties of hurricanes
• Determined the precise chemical structure of the HIV capsid
  • The protein shell that protects the virus’s genetic material and is a key to its virulence
  • Key to the development of new antiretroviral drugs
  • Requires the assembly of more than 1,300 identical proteins, in atomic-level detail
• Crop devastation by whiteflies is a major cause of hunger in East Africa
  • Understanding the DNA of the species by generating phylogenetic trees
  • With only 500 whiteflies in a genetic dataset, the possible relationships between these flies run into the octillions (10^27)
But Deep Learning can be compute intensive also
• Crop data is key to decision makers
  • Applying Deep Learning to satellite data, the two major crops can be distinguished with 95% accuracy just a few months after planting and well before harvest
  • More timely estimates could be used for a variety of applications, including supply-chain logistics, commodity market future projections, and more
• Cryo-electron microscopy (cryo-EM) provides 3D structural information on biological molecules and assemblies
  • Cryo-EM has improved structure resolution to near-atomic in just the past few years
  • Critical to advancing basic biology and to characterizing drugs and drug targets for improved drug discovery
• Development of systems for connected cars and autonomous technologies is key to enabling autonomous vehicles
  • These advances could not have been realized without the application of deep learning to object detection in images and full-motion video
Deep Learning is already being applied in traditional supercomputing fields
When Simulation Is Too Expensive
• Detailed simulation of subatomic particle interactions is essential to High Energy Physics at CERN
• The Monte Carlo approach is not fast enough for the needs of the High-Luminosity Large Hadron Collider
• A 3D convolutional GAN can generate realistic detector output >2000x faster (a rough sketch follows below)
Ref: Dr. Federico Carminati et al, CERN
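The CERN model itself is described in the reference above; purely to illustrate the idea of a 3D convolutional generator, here is a minimal Keras sketch whose layer sizes are invented for illustration and are not the CERN network.

```python
# Minimal, hypothetical sketch of a 3D convolutional GAN generator.
# Layer shapes are invented for illustration; this is not the CERN model.
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=128):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        layers.Dense(4 * 4 * 4 * 64, activation="relu"),
        layers.Reshape((4, 4, 4, 64)),
        # Transposed 3D convolutions upsample the latent volume into a
        # detector-like 3D grid of (non-negative) energy deposits.
        layers.Conv3DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv3DTranspose(16, 4, strides=2, padding="same", activation="relu"),
        layers.Conv3DTranspose(1, 4, strides=2, padding="same", activation="relu"),
    ])

generator = build_generator()
fake_showers = generator(tf.random.normal([8, 128]))  # 8 synthetic 32x32x32 volumes
```

Once trained against a discriminator on simulated detector output, sampling such a generator replaces a far more expensive Monte Carlo step.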
Maximizing Data Utilization
• Satellites create more data than can be assimilated
• Only a small % of available data is used today.
• “Deep learning object detection can be used to identify areas of atmospheric instability from satellite observation data, [and] focus extraction of observations on these regions of interest.”
Ref: Jebb Stewart, NOAA, 2018 ECMWF workshop on HPC in Meteorology
Saving Compute Cycles Through Improved Simulation Guidance
• Application of machine learning optimization techniques such as regularization and steering to determine the velocity model of Full Waveform Inversion (FWI) seismic imaging.
• Compared with manual velocity model determination and tuning, FWI with ML converges more quickly and efficiently.
Computational demands of Deep Learning are growing (fast)
Ref: OpenAI
Scaling is important for Deep Learning
• DL Training
  • We can strong-scale training time-to-accuracy provided:
    • Number of workers (e.g., # nodes) << number of training examples
    • Learning rate for the particular batch size / scale is known
• Hyper-Parameter Optimization (HPO)
  • Finding the best set of “tuning parameters” for a model
    • Learning rate schedule, momentum, batch size
  • Evolve the topology of the model itself if a good architecture is unknown
    • Layer types, width, number of filters, activation functions, drop-out rates
  • Sampling the parameter space requires many training sessions! (see the random-search sketch below)
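Because every sample of the hyper-parameter space is an independent training run, HPO parallelizes almost trivially across nodes. A minimal random-search sketch follows; the parameter ranges and the train_and_score stand-in are hypothetical.

```python
# Minimal random-search HPO sketch. train_and_score is a hypothetical
# stand-in for one full training session returning validation accuracy.
import random

def train_and_score(lr, momentum, batch_size):
    ...  # run one training session with these hyper-parameters
    return random.random()  # placeholder for the measured accuracy

search_space = {
    "lr": lambda: 10 ** random.uniform(-4, -1),
    "momentum": lambda: random.uniform(0.8, 0.99),
    "batch_size": lambda: random.choice([64, 128, 256, 512]),
}

# Each trial is independent, so trials can be farmed out one per node.
trials = [{name: draw() for name, draw in search_space.items()} for _ in range(32)]
best = max(trials, key=lambda params: train_and_score(**params))
print("best hyper-parameters found:", best)
```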
Scaling performance on CosmoFlow
• Achieved 77% scaling efficiency at 8,192 nodes
• Measured walltime per epoch (throughput); a sketch of the efficiency calculation follows below
  • Captures end-to-end capability, including communication, I/O, interconnect, single-node performance…
Note: batch size per node is constant (= 1), so the global batch size equals the number of nodes.
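For reference, here is a minimal sketch of how a scaling-efficiency figure can be derived from measured throughput; the throughput numbers are hypothetical placeholders chosen only to echo the efficiencies quoted on these slides.

```python
# Minimal sketch: scaling efficiency from measured training throughput.
# Ideal throughput grows linearly with node count; efficiency is
# measured over ideal. Numbers are hypothetical placeholders.
throughput = {1: 100.0, 1024: 59_000.0, 8192: 631_000.0}  # samples/s

base = throughput[1]
for nodes, samples_per_s in sorted(throughput.items()):
    efficiency = samples_per_s / (nodes * base)
    print(f"{nodes:5d} nodes: {efficiency:.0%} scaling efficiency")
```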
Scaling: the importance of I/O
• Poor scaling on Lustre beyond 512 nodes: 58% at 1,024 nodes
• Tests with dummy data (i.e., not read from the file system) showed the cause was I/O
• Using DataWarp (“burst buffer”) achieves 77% scaling efficiency at 8,192 nodes (a staging sketch follows below)
• Attributed to:
  • Higher available read bandwidth from DataWarp
  • SSDs more suited to the random, small-read pattern
  • Less heavily utilized resources
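One common way to exploit a burst buffer is to stage the training set onto it once, before training starts. Below is a minimal, hypothetical sketch: the paths are invented, and on a Cray system DataWarp capacity is normally requested through the workload manager (e.g., #DW job-script directives) rather than from Python.

```python
# Minimal, hypothetical sketch of staging training data from the parallel
# file system onto an SSD-backed burst-buffer tier before training.
# Paths are invented; DW_JOB_STRIPED is assumed to point at a DataWarp
# mount when the job requested one, and /tmp is a plain fallback.
import os
import shutil

LUSTRE_DATA = "/lus/scratch/project/training_data"        # hypothetical path
BURST_BUFFER = os.environ.get("DW_JOB_STRIPED", "/tmp")   # assumed env var

staged = os.path.join(BURST_BUFFER, "training_data")
if not os.path.exists(staged):
    shutil.copytree(LUSTRE_DATA, staged)

# Training then issues its random, small reads against the SSD tier
# instead of hammering Lustre from thousands of nodes at once.
print("reading training data from:", staged)
```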
Many new open source packages and languages are used in AI
• Currently available in the Urika-XC package:
  • Spark distributed in-memory processing for Big Data
  • Python-based packages such as Anaconda and Dask for distributed parallel computing (a Dask sketch follows below)
  • Jupyter Notebooks for interactive supercomputing
  • Integrated Deep Learning
    • There are many popular deep learning frameworks…
    • Intel® BigDL, a native deep learning library for Spark
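To give a flavor of the Python-based distributed computing mentioned above, here is a minimal Dask sketch; the scheduler address is hypothetical, and in an environment like Urika-XC the cluster would typically be launched for you by the site tooling.

```python
# Minimal Dask sketch: spread a large array computation across workers.
# The scheduler address is hypothetical; connect to your real cluster.
import dask.array as da
from dask.distributed import Client

client = Client("tcp://scheduler-host:8786")  # hypothetical address

# A large array, chunked so the pieces fit in individual workers' memory.
x = da.random.random((100_000, 100_000), chunks=(5_000, 5_000))

# Dask builds a task graph and executes it in parallel across the cluster.
result = (x + x.T).mean().compute()
print(result)
```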
AI often requires end-to-end, heterogeneous workflows – digital rock example

Workflow stages:
• Data Acquisition → Data Processing (Data Preparation: labor & CPU intensive)
• Model Training → Model Testing (Model Development: compute (GPU & CPU) intensive)
• Model Deployment → Model Inference (Model Implementation: business-process intensive)

Requirements landscape:
• Compute requirements (by stage and task): data preparation, machine learning training, deep learning training, inference
• Data requirements (size, type, performance): large-volume data stores, high-bandwidth fast IOPS, direct connectivity to external storage, HDF5/NetCDF, relational / document / key-value
• Performance requirements (by stage and task): scaling, throughput, user productivity, performant Python, collaborative notebooks, open-source framework support
• Training and inference processor landscape: Intel, AMD, ARM, NVIDIA, Google TPU v2, Graphcore, Habana, 30+ startups…
• Storage technology landscape: tape, HDD, SSD (SATA/SAS), SSD (NVMe), flash; on-node vs. off-node (DataWarp vs. NVMe-over-Fabrics)

Digital rock example (a slicing sketch follows below):
• Process and prepare 3D image data into 2D “slices”
• Develop a computer model of rock features of interest using training data
• Using the model, predict rock features on new image data that has been “sliced”
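A minimal sketch of the “slicing” step in such a workflow, assuming the scan is available as a NumPy volume; the array shape is a hypothetical stand-in for real 3D rock-image data.

```python
# Minimal sketch: prepare a 3D volume as 2D slices for CNN training.
# The volume below is a hypothetical stand-in for real 3D scan data.
import numpy as np

# Hypothetical rock volume: (depth, height, width).
volume = np.random.rand(256, 256, 256).astype(np.float32)

# Treat each depth plane as one 2D training image and add a channel
# axis, giving the (N, H, W, C) layout most CNN frameworks expect.
slices = volume[:, :, :, np.newaxis]   # shape (256, 256, 256, 1)

# With matching labels these slices feed model training; new volumes
# are sliced the same way before inference.
print(slices.shape)
```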
Multiple models are used in AI analyses
• Science data is heterogeneous:
  • Domain, source, type, …
  • Geospatial data: points, lines, grid, surface, mesh
  • Streaming data: time series
  • Graph data: associations

AI today vs. the multi-model future:
• Model: CNN, RNN, LSTM, GAN, etc. → domain-specific
• Baseline: humans, other ML algorithms → theory, science
• Use case: speech, image interpretation, hyper-personalization → computational steering, proxy models
• Figure of merit: time-to-accuracy, model size → plus interpretability and feasibility
• Model design: transfer/incremental learning → hyper-parameter optimization
• Model testing: A/B testing on a cadence → statistical rigor
Interactivity and supercomputing
• Traditionally, HPC has been batch scheduled…
• New interactive Big Data/AI applications have arisen:
  • Interactivity BOFs at SC17 and SC18
  • NERSC JupyterHub, RStudio
  • KAUST kslhub (Sam Kortas)
  • CSCS JupyterLab
  • Cray, Inc. Urika-XC
Future: system to eco-system/workflow thinking
• Performance thinking broadens in scope: component performance → node performance → multi-node performance → system performance → facility performance (including I/O)
• And it spans the full stack: hardware, software, and ecosystem
AURORA – The 1st US Exascale System
Exascale Computing – A Shared Vision
The Exascale era will be marked by data-centric workloads which combine AI and HPC for unprecedented innovation in commercial and research applications.
These workloads will be characterized by:
• Increasing heterogeneity in compute requirements
• A rapid evolution of software ecosystems
• A need for new approaches in system design to reach exascale
Aurora – Built on the Shasta Architecture
Aurora will be delivered in 2021 and will be built on the Cray Shasta architecture, utilizing a future generation of the Intel® Xeon® Scalable processor and the Intel Xe compute architecture.
• Aurora will consist of more than 200 cabinets of Shasta compute infrastructure to deliver sustained exaflop performance
• Utilizes the Cray Slingshot™ interconnect to scale workloads seamlessly
• Cray Shasta system software in combination with the Intel oneAPI framework will provide a unified application programming interface
• Aurora is the second major supercomputer win for Shasta in the last 6 months, the first being the NERSC-9 “Perlmutter” pre-exascale system
Summary
• Key aspects of what is coming:
  • New, highly compute-intensive applications
  • New languages, packages and approaches
  • Greater mingling of simulation and AI methodologies in scientific workflows
  • Scaling is still paramount
• Implications:
  • Need for new architectures, both hardware and software