Why Next-gen Artificial Intelligence Needs Supercomputing
James D. Maltby, Ph.D.
Solutions Architect, Artificial Intelligence
OVERVIEW
• Advanced AI needs Supercomputing…
  • New, challenging applications
  • Models are getting larger
  • Deep Learning training needs to scale
• And Supercomputing needs new features for Advanced AI!
  • New I/O patterns
  • New languages and workflows
  • Hybrid systems with new processor types
What is a Supercomputer?
• 6,174 Cray XC nodes based on the Intel® Xeon® Scalable processor
• 197,568 cores
• 128 GB memory per node
• 790 TB total memory
• 7.2 petaflops peak
• Dragonfly interconnect
• 17.6 PB Lustre storage
• 2.8 MW electrical power
…plenty of room for AI and Big Data
What is Advanced AI?
• There are many definitions…
• AI is most commonly associated with Deep Learning (DL), though many other Machine Learning techniques are included.
• Deep Learning (DL) training is a classic high-performance computing problem which demands:
  • Large compute capacity in terms of FLOPs, memory capacity and bandwidth
  • A performant interconnect for fast communication of gradients and model parameters (a sketch of this step follows below)
  • Parallel I/O and storage with sufficient bandwidth to keep the compute fed at scale
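To make the interconnect demand concrete, here is a minimal sketch (not from the talk) of the gradient-averaging step at the heart of data-parallel training, using mpi4py; the gradient array is a hypothetical stand-in for real model gradients.

```python
# Minimal sketch of data-parallel gradient averaging over MPI.
# The gradient array below is a hypothetical stand-in for real gradients.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Each worker computes gradients on its own shard of the training data.
local_grad = np.random.rand(10_000_000).astype(np.float32)

# Sum gradients across all workers, then average. This allreduce is the
# communication step the interconnect must keep cheap at scale.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

# Every worker now applies the same averaged update, keeping replicas in sync.
```

The same pattern underlies frameworks such as Horovod, which hide the allreduce behind the deep learning framework's optimizer.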
Supercomputers are key for advanced simulation
• Largest-ever storm prediction model
  • Over 4 billion points used to simulate the landfall of Hurricane Sandy
  • Urban-scale grid resolution of 500 m (compared to the standard 3 km)
  • Enables researchers to understand fine-grained properties of hurricanes
• Determined the precise chemical structure of the HIV capsid
  • The protein shell that protects the virus’s genetic material and is a key to its virulence
  • Key to the development of new antiretroviral drugs
  • Requires the assembly of more than 1,300 identical proteins, in atomic-level detail
• Crop devastation by whiteflies is a major cause of hunger in East Africa
  • Understanding the DNA of the species by generating phylogenetic trees
  • With only 500 whiteflies in a genetic dataset, the possible relationships between these flies run into the octillions (10^27)
But Deep Learning can be compute intensive also
• Crop data is key to decision makers
  • Applying Deep Learning to satellite data, the two major crops can be distinguished with 95% accuracy just a few months after planting and well before harvest
  • More timely estimates could be used for a variety of applications, including supply-chain logistics, commodity market future projections, and more
• Cryo-electron microscopy (cryo-EM) provides 3D structural information on biological molecules and assemblies
  • Cryo-EM has improved structure resolution to near-atomic in just the past few years
  • Critical to advancing basic biology and to characterizing drugs and drug targets for improved drug discovery
• Development of systems for connected cars and autonomous technologies is key to enabling autonomous vehicles
  • These advances could not have been realized without the application of deep learning to object detection in images and full-motion video
Deep Learning is already being applied in traditional supercomputing fields
When Simulation Is Too Expensive
• Detailed simulation of subatomic particle interactions is essential to High Energy Physics at CERN
• The Monte Carlo approach is not fast enough for the needs of the High-Luminosity Large Hadron Collider
• A 3D convolutional GAN can generate realistic detector output >2000x faster (a rough sketch follows below)
Ref: Dr. Federico Carminati et al, CERN
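The CERN model itself is described in the reference above; purely to illustrate the idea of a 3D convolutional generator, here is a minimal Keras sketch whose layer sizes are invented for illustration and are not the CERN network.

```python
# Minimal, hypothetical sketch of a 3D convolutional GAN generator.
# Layer shapes are invented for illustration; this is not the CERN model.
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=128):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        layers.Dense(4 * 4 * 4 * 64, activation="relu"),
        layers.Reshape((4, 4, 4, 64)),
        # Transposed 3D convolutions upsample the latent volume into a
        # detector-like 3D grid of (non-negative) energy deposits.
        layers.Conv3DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv3DTranspose(16, 4, strides=2, padding="same", activation="relu"),
        layers.Conv3DTranspose(1, 4, strides=2, padding="same", activation="relu"),
    ])

generator = build_generator()
fake_showers = generator(tf.random.normal([8, 128]))  # 8 synthetic 32x32x32 volumes
```

Once trained against a discriminator on simulated detector output, sampling such a generator replaces a far more expensive Monte Carlo step.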
Maximizing Data Utilization
• Satellites create more data than can be assimilated
• Only a small % of available data is used today.
• “Deep learning object detection can be used to identify areas of atmospheric instability from satellite observation data, [and] focus extraction of observations on these regions of interest.”
Ref: Jebb Stewart, NOAA, 2018 ECMWF workshop on HPC in Meteorology
Saving Compute Cycles Through Improved Simulation Guidance
• Application of machine learning optimization techniques such as regularization and steering to determine the velocity model of Full Waveform Inversion (FWI) seismic imaging.
• Compared with manual velocity model determination and tuning, FWI with ML converges more quickly and efficiently.
Computational demands of Deep Learning are growing (fast)
Ref: OpenAI
Scaling is important for Deep Learning
• DL Training
  • We can strong-scale training time-to-accuracy provided:
    • Number of workers (e.g., # nodes) << number of training examples
    • Learning rate for the particular batch size / scale is known
• Hyper-Parameter Optimization (HPO)
  • Finding the best set of “tuning parameters” for a model
    • Learning rate schedule, momentum, batch size
  • Evolve the topology of the model itself if a good architecture is unknown
    • Layer types, width, number of filters, activation functions, drop-out rates
  • Sampling the parameter space requires many training sessions! (see the random-search sketch below)
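Because every sample of the hyper-parameter space is an independent training run, HPO parallelizes almost trivially across nodes. A minimal random-search sketch follows; the parameter ranges and the train_and_score stand-in are hypothetical.

```python
# Minimal random-search HPO sketch. train_and_score is a hypothetical
# stand-in for one full training session returning validation accuracy.
import random

def train_and_score(lr, momentum, batch_size):
    ...  # run one training session with these hyper-parameters
    return random.random()  # placeholder for the measured accuracy

search_space = {
    "lr": lambda: 10 ** random.uniform(-4, -1),
    "momentum": lambda: random.uniform(0.8, 0.99),
    "batch_size": lambda: random.choice([64, 128, 256, 512]),
}

# Each trial is independent, so trials can be farmed out one per node.
trials = [{name: draw() for name, draw in search_space.items()} for _ in range(32)]
best = max(trials, key=lambda params: train_and_score(**params))
print("best hyper-parameters found:", best)
```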
Scaling performance on CosmoFlow
• Achieved 77% scaling efficiency at 8,192 nodes
• Measured walltime per epoch (throughput); a sketch of the efficiency calculation follows below
  • Captures end-to-end capability, including communication, I/O, interconnect, single-node performance…
Note: batch size per node is constant (= 1), so the global batch size equals the number of nodes.
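For reference, here is a minimal sketch of how a scaling-efficiency figure can be derived from measured throughput; the throughput numbers are hypothetical placeholders chosen only to echo the efficiencies quoted on these slides.

```python
# Minimal sketch: scaling efficiency from measured training throughput.
# Ideal throughput grows linearly with node count; efficiency is
# measured over ideal. Numbers are hypothetical placeholders.
throughput = {1: 100.0, 1024: 59_000.0, 8192: 631_000.0}  # samples/s

base = throughput[1]
for nodes, samples_per_s in sorted(throughput.items()):
    efficiency = samples_per_s / (nodes * base)
    print(f"{nodes:5d} nodes: {efficiency:.0%} scaling efficiency")
```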
Scaling: the importance of I/O
• Poor scaling on Lustre beyond 512 nodes: 58% at 1,024 nodes
• Tests with dummy data (i.e., not read from the file system) showed the cause was I/O
• Using DataWarp (“burst buffer”) achieves 77% scaling efficiency at 8,192 nodes (a staging sketch follows below)
• Attributed to:
  • Higher available read bandwidth from DataWarp
  • SSDs more suited to the random, small-read pattern
  • Less heavily utilized resources
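One common way to exploit a burst buffer is to stage the training set onto it once, before training starts. Below is a minimal, hypothetical sketch: the paths are invented, and on a Cray system DataWarp capacity is normally requested through the workload manager (e.g., #DW job-script directives) rather than from Python.

```python
# Minimal, hypothetical sketch of staging training data from the parallel
# file system onto an SSD-backed burst-buffer tier before training.
# Paths are invented; DW_JOB_STRIPED is assumed to point at a DataWarp
# mount when the job requested one, and /tmp is a plain fallback.
import os
import shutil

LUSTRE_DATA = "/lus/scratch/project/training_data"        # hypothetical path
BURST_BUFFER = os.environ.get("DW_JOB_STRIPED", "/tmp")   # assumed env var

staged = os.path.join(BURST_BUFFER, "training_data")
if not os.path.exists(staged):
    shutil.copytree(LUSTRE_DATA, staged)

# Training then issues its random, small reads against the SSD tier
# instead of hammering Lustre from thousands of nodes at once.
print("reading training data from:", staged)
```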
Many new open source packages and languages are used in AI
• Currently available in the Urika-XC package:
  • Spark distributed in-memory processing for Big Data
  • Python-based packages such as Anaconda and Dask for distributed parallel computing (a Dask sketch follows below)
  • Jupyter Notebooks for interactive supercomputing
  • Integrated Deep Learning
    • There are many popular deep learning frameworks…
    • Intel® BigDL, a native deep learning library for Spark
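To give a flavor of the Python-based distributed computing mentioned above, here is a minimal Dask sketch; the scheduler address is hypothetical, and in an environment like Urika-XC the cluster would typically be launched for you by the site tooling.

```python
# Minimal Dask sketch: spread a large array computation across workers.
# The scheduler address is hypothetical; connect to your real cluster.
import dask.array as da
from dask.distributed import Client

client = Client("tcp://scheduler-host:8786")  # hypothetical address

# A large array, chunked so the pieces fit in individual workers' memory.
x = da.random.random((100_000, 100_000), chunks=(5_000, 5_000))

# Dask builds a task graph and executes it in parallel across the cluster.
result = (x + x.T).mean().compute()
print(result)
```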
AI often requires end-to-end, heterogeneous workflows – digital rock example

Workflow stages:
• Data Acquisition → Data Processing (Data Preparation: labor & CPU intensive)
• Model Training → Model Testing (Model Development: compute (GPU & CPU) intensive)
• Model Deployment → Model Inference (Model Implementation: business-process intensive)

Requirements landscape:
• Compute requirements (by stage and task): data preparation, machine learning training, deep learning training, inference
• Data requirements (size, type, performance): large-volume data stores, high-bandwidth fast IOPS, direct connectivity to external storage, HDF5/NetCDF, relational / document / key-value
• Performance requirements (by stage and task): scaling, throughput, user productivity, performant Python, collaborative notebooks, open-source framework support
• Training and inference processor landscape: Intel, AMD, ARM, NVIDIA, Google TPU v2, Graphcore, Habana, 30+ startups…
• Storage technology landscape: tape, HDD, SSD (SATA/SAS), SSD (NVMe), flash; on-node vs. off-node (DataWarp vs. NVMe-over-Fabrics)

Digital rock example (a slicing sketch follows below):
• Process and prepare 3D image data into 2D “slices”
• Develop a computer model of rock features of interest using training data
• Using the model, predict rock features on new image data that has been “sliced”
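A minimal sketch of the “slicing” step in such a workflow, assuming the scan is available as a NumPy volume; the array shape is a hypothetical stand-in for real 3D rock-image data.

```python
# Minimal sketch: prepare a 3D volume as 2D slices for CNN training.
# The volume below is a hypothetical stand-in for real 3D scan data.
import numpy as np

# Hypothetical rock volume: (depth, height, width).
volume = np.random.rand(256, 256, 256).astype(np.float32)

# Treat each depth plane as one 2D training image and add a channel
# axis, giving the (N, H, W, C) layout most CNN frameworks expect.
slices = volume[:, :, :, np.newaxis]   # shape (256, 256, 256, 1)

# With matching labels these slices feed model training; new volumes
# are sliced the same way before inference.
print(slices.shape)
```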
Multiple models are used in AI analyses
• Science data is heterogeneous:
  • Domain, source, type, …
  • Geospatial data: points, lines, grid, surface, mesh
  • Streaming data: time series
  • Graph data: associations

AI today vs. the multi-model future:
• Model: CNN, RNN, LSTM, GAN, etc. → domain-specific
• Baseline: humans, other ML algorithms → theory, science
• Use case: speech, image interpretation, hyper-personalization → computational steering, proxy models
• Figure of merit: time-to-accuracy, model size → plus interpretability and feasibility
• Model design: transfer/incremental learning → hyper-parameter optimization
• Model testing: A/B testing on a cadence → statistical rigor
Interactivity and supercomputing
• Traditionally, HPC has been batch scheduled…
• New interactive Big Data/AI applications have arisen:
  • Interactivity BOFs at SC17 and SC18
  • NERSC JupyterHub, RStudio
  • KAUST kslhub (Sam Kortas)
  • CSCS JupyterLab
  • Cray, Inc. Urika-XC
Future: system to eco-system/workflow thinking
• Performance thinking broadens in scope: component performance → node performance → multi-node performance → system performance → facility performance (including I/O)
• And it spans the full stack: hardware, software, and ecosystem
AURORA – The 1st US Exascale System
Exascale Computing – A Shared Vision
The Exascale era will be marked by data-centric workloads which combine AI and HPC for unprecedented innovation in commercial and research applications.
These workloads will be characterized by:
• Increasing heterogeneity in compute requirements
• A rapid evolution of software ecosystems
• A need for new approaches in system design to reach exascale
Aurora – Built on the Shasta Architecture
Aurora will be delivered in 2021 and will be built on the Cray Shasta architecture, utilizing a future generation of the Intel® Xeon® Scalable processor and the Intel Xe compute architecture.
• Aurora will consist of more than 200 cabinets of Shasta compute infrastructure to deliver sustained exaflop performance
• Utilizes the Cray Slingshot™ interconnect to scale workloads seamlessly
• Cray Shasta system software in combination with the Intel oneAPI framework will provide a unified application programming interface
• Aurora is the second major supercomputer win for Shasta in the last 6 months, the first being the NERSC-9 “Perlmutter” pre-exascale system
Summary
• Key aspects of what is coming:
  • New, highly compute-intensive applications
  • New languages, packages and approaches
  • Greater mingling of simulation and AI methodologies in scientific workflows
  • Scaling is still paramount
• Implications:
  • Need for new architectures, both hardware and software