TRANSCRIPT
Tesla Accelerated Computing
YOUR PLATFORM FOR DISCOVERY
Tesla K40 GPU
Piz Daint: Fastest Supercomputer in Europe
NVLink
Jetson TK1
3 Most Popular Servers Incorporate Tesla GPUs
OpenPOWER Foundation
ENI: World’s Fastest Enterprise Supercomputer
World’s First GPU-Accelerated ARM Servers
World’s First POWER+GPU Server
GPU Sweeps ILSVRC Image Recognition Competition
IBM DB2 Plan for GPU-Acceleration
Top 15 Greenest Supercomputers
cuDNN: Machine Learning Library
CUDA 6 Unified Mem
Tesla Accelerated Computing Platform
Development | Data Center Infrastructure
Categories: Development Tools, Programming Languages, Software Solutions, System Solutions, Communication, Infrastructure Management
GPU Accelerators: GPU Boost …
Interconnect: GPU Direct, NVLink …
System Management: NVML …
Compiler Solutions: LLVM …
Profile and Debug: CUDA Debugging API …
Libraries: cuBLAS …
Tesla: Platform with Open Ecosystem
Common Programming Models Across Multiple CPUs
CPUs: x86
Libraries: AmgX, cuBLAS
Programming Languages
Compiler Directives
US to Build Two Flagship Supercomputers
Major Step Forward on the Path to Exascale
Partnership for Science
100-300 PFLOPS Peak Performance
10x in Scientific Applications
2017
SUMMIT SIERRA
Just 4 nodes in Summit would make the Top500 list of supercomputers today
Similar Power as Titan
5-10x Faster
1/5th the Size
150 PF = 3M Laptops
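The "150 PF = 3M Laptops" comparison implies a per-laptop figure that the slide does not state. A minimal sketch of that arithmetic follows; the variable names are illustrative, and the result is only what the slide's own two numbers imply, not a measured laptop benchmark.

```python
# Per-laptop performance implied by the slide's equivalence.
PETA = 10**15
GIGA = 10**9

summit_peak_flops = 150 * PETA   # "150 PF" from the slide
laptops = 3_000_000              # "3M Laptops" from the slide

# Integer arithmetic keeps the result exact.
per_laptop_gflops = summit_peak_flops // laptops // GIGA
print(per_laptop_gflops)  # 50
```

In other words, the comparison assumes a laptop delivering roughly 50 GFLOPS.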
ORNL is managed by UT-Battelle for the US Department of Energy
Future Leadership Computers at OLCF
Jeff Nichols, Associate Laboratory Director, Computing & Computational Sciences Directorate
Presented at: SC’14
8 SC’14 Summit - Bland
CORAL System
Our science requires that we continue to advance OLCF’s computational capability over the next decade on the roadmap to Exascale. Since clock-rate scaling ended in 2003, HPC performance has been achieved through increased parallelism. Jaguar scaled to 300,000 cores.
Titan and beyond deliver hierarchical parallelism with very powerful nodes: MPI plus thread-level parallelism through OpenACC or OpenMP, plus vectors.
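To make the three levels concrete, here is a minimal sketch of how one index range could be carved up across ranks, then threads, then vector-width chunks. It is plain Python that only computes the decomposition; it does not call MPI, OpenACC, or OpenMP and runs nothing in parallel. The function names `split` and `decompose` and the sizes used are hypothetical, chosen for illustration.

```python
def split(start, stop, parts):
    """Split the half-open range [start, stop) into `parts` contiguous,
    nearly equal sub-ranges."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for p in range(parts):
        # The first `extra` parts take one additional item each.
        hi = lo + base + (1 if p < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def decompose(n, ranks, threads, lanes):
    """Map n work items onto (rank, thread) pairs, each holding a list of
    chunks at most `lanes` items wide."""
    plan = {}
    for r, (rlo, rhi) in enumerate(split(0, n, ranks)):            # rank level
        for t, (tlo, thi) in enumerate(split(rlo, rhi, threads)):  # thread level
            plan[(r, t)] = [(v, min(v + lanes, thi))               # vector level
                            for v in range(tlo, thi, lanes)]
    return plan


plan = decompose(n=32, ranks=2, threads=2, lanes=4)
for key in sorted(plan):
    print(key, plan[key])
```

Each (rank, thread) pair ends up with its own contiguous slice, already grouped into chunks that a vector unit could process together.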
2010 Jaguar: 2.3 PF, Multi-core CPU, 7 MW
2012 Titan: 27 PF, Hybrid GPU/CPU, 9 MW
2017 Summit: 5-10x Titan, Hybrid GPU/CPU, 10 MW
2022 OLCF5: 5-10x Summit, ~20 MW
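The roadmap figures also imply an energy-efficiency trend. The sketch below derives peak GFLOPS per watt from the performance and power numbers on the slide; the Summit range simply applies the stated 5-10x multiplier to Titan's 27 PF and is a projection from the slide, not a measured value. The helper name is illustrative.

```python
def gflops_per_watt(petaflops, megawatts):
    """Peak efficiency; PF / MW equals GFLOPS / W because 1e15 / 1e6 = 1e9."""
    return petaflops / megawatts


jaguar = gflops_per_watt(2.3, 7)          # about 0.33
titan = gflops_per_watt(27, 9)            # 3.0
summit_low = gflops_per_watt(5 * 27, 10)  # 13.5
summit_high = gflops_per_watt(10 * 27, 10)  # 27.0

print(f"Jaguar: {jaguar:.2f} GFLOPS/W")
print(f"Titan:  {titan:.2f} GFLOPS/W")
print(f"Summit: {summit_low:.1f} to {summit_high:.1f} GFLOPS/W (projected)")
```

On these numbers, Titan is roughly 9x more efficient than Jaguar at peak, and Summit is projected at a further 4.5x to 9x over Titan.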
2017 OLCF Leadership System
Hybrid CPU/GPU architecture
Vendor: IBM (Prime) / NVIDIA™ / Mellanox Technologies®
At least 5X Titan’s Application Performance
Approximately 3,400 nodes, each with:
• Multiple IBM POWER9 CPUs and multiple NVIDIA Tesla® GPUs using the NVIDIA Volta architecture
• CPUs and GPUs completely connected with high speed NVLink
• Large coherent memory: over 512 GB (HBM + DDR4)
  – all directly addressable from the CPUs and GPUs
• An additional 800 GB of NVRAM, which can be configured as either a burst buffer or as extended memory
• Over 40 TF peak performance
Dual-rail Mellanox® EDR-IB full, non-blocking fat-tree interconnect
IBM Elastic Storage (GPFS™): 1 TB/s I/O and 120 PB disk capacity
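The per-node figures can be multiplied out to system-level estimates. The sketch below treats "approximately 3,400", "over 512 GB", and "over 40 TF" as exact values, so the results are rough lower bounds rather than official specifications; all constant names are illustrative, and decimal units (1 PB = 10^6 GB) are assumed.

```python
NODES = 3400          # "approximately 3,400 nodes"
NODE_PEAK_TF = 40     # "over 40 TF peak performance" per node
NODE_MEM_GB = 512     # "over 512 GB (HBM + DDR4)" per node
NODE_NVRAM_GB = 800   # "an additional 800 GB of NVRAM" per node
IO_GB_PER_S = 1000    # "1 TB/s I/O"

system_peak_pf = NODES * NODE_PEAK_TF / 1000      # TF -> PF
four_node_peak_tf = 4 * NODE_PEAK_TF
total_mem_pb = NODES * NODE_MEM_GB / 1e6          # GB -> PB
total_nvram_pb = NODES * NODE_NVRAM_GB / 1e6
# Rough time to stream all coherent memory to the file system once.
mem_dump_seconds = NODES * NODE_MEM_GB / IO_GB_PER_S

print(f"System peak:        {system_peak_pf:.0f} PF")
print(f"Four nodes:         {four_node_peak_tf} TF")
print(f"Coherent memory:    {total_mem_pb:.2f} PB")
print(f"NVRAM:              {total_nvram_pb:.2f} PB")
print(f"Full memory to I/O: {mem_dump_seconds / 60:.0f} minutes")
```

The 136 PF lower bound sits inside the 100-300 PFLOPS peak range quoted for the two flagship systems, and the roughly half-hour needed to write all memory at 1 TB/s suggests why node-local NVRAM usable as a burst buffer is part of the design.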
Scientific Progress at all Scales
Fusion Energy
A Princeton Plasma Physics Laboratory team led by C.S. Chang was able to simulate and explain nonlinear coherent turbulence structures on the plasma edge in a fusion reactor by exploiting the increased performance of the Titan architecture.
Turbomachinery
Ramgen Power Systems is using the Titan system to significantly reduce time to solution in the optimization of novel designs based on aerospace shock wave compression technology for gas compression systems, such as carbon dioxide compressors.
Nuclear Reactors
Consortium for Advanced Simulation of Light Water Reactors (CASL) investigators successfully performed full core physics power-up simulations of the Westinghouse AP1000 pressurized water reactor core using their Virtual Environment for Reactor Applications code, which has been designed on the Titan architecture.
Liquid Crystal Film Stability
ORNL postdoctoral fellow Trung Nguyen ran unprecedented large-scale molecular dynamics simulations on Titan to model the beginnings of ruptures in thin films wetting a solid substrate.
Earthquake Hazard Assessment
To better prepare California for large seismic events, SCEC joint researchers have simulated earthquakes at high frequencies, allowing structural engineers to conduct realistic physics-based probabilistic seismic hazard analyses.
Climate Science
Simulations on OLCF resources recreated the evolution of the climate state during the first half of the last deglaciation period, allowing scientists to explain the mechanisms that triggered the last period of Southern Hemisphere warming and deglaciation.
LLNL-PRES-XXXXXX This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
SC 14 November 17-21, 2014
Rob Neely, Associate Division Leader, Center for Applied Scientific Computing
Lawrence Livermore National Laboratory LLNL-PRES-xxxxxx 13
Strengthen the United States’ security by developing and applying world-class science, technology, and engineering.
Science and Technology on a Mission
Unmodified codes will run on Power® Architecture processor
Memory rich nodes; high node memory bandwidth
Volta™ GPUs provide substantial performance potential
Outstanding benchmark analysis by IBM + NVIDIA
Cost competitive; low risk solution; outstanding hardware reliability
LLNL selected the most compelling system for NNSA
NRE contract provides significant benefit
Center of Excellence - expert help with porting and optimizing actual applications
Motherboard design and novel cooling concept
GPU reliability; file system performance; open source compiler infrastructure
Advanced system diagnostics and scheduling; advanced networking capabilities
Notional Sierra node (diagram labels: MEM, Multi-core CPU, GPU, MEM)
Coherent memory capability
IBM POWER CPU: Most Powerful Serial Processor
NVIDIA NVLink: Fastest CPU-GPU Interconnect
NVIDIA Volta GPU: Most Powerful Parallel Processor
Accelerated Computing: 5x Higher Energy Efficiency
TESLA K80
WORLD’S FASTEST ACCELERATOR FOR DATA ANALYTICS AND SCIENTIFIC COMPUTING
Dual-GPU Accelerator for Max Throughput
2x Faster: 2.9 TF | 4992 Cores | 480 GB/s
Double the Memory: Designed for Big Data Apps (24GB; K40 12GB)
Labels: Oil & Gas, Data Analytics, HPC Viz
Maximum Performance: Dynamically Maximize Perf for Every Application
Chart: Deep Learning: Caffe (speedup axis 0x to 25x; bars: CPU, Tesla K40, Tesla K80)
Caffe Benchmark: AlexNet training throughput based on 20 iterations, CPU: E5-2697v2 @ 2.70GHz. 64GB System Memory, CentOS 6.2
Performance Lead Continues to Grow
Chart: Peak Double Precision FLOPS (GFLOPS, axis 0 to 3500), 2008 to 2014, NVIDIA GPU vs. x86 CPU
Chart: Peak Memory Bandwidth (GB/s, axis 0 to 600), 2008 to 2014, NVIDIA GPU vs. x86 CPU
GPU data points labeled across the two charts: M1060, M2090, K20, K40, K80
Enabling Visualization with the Tesla Platform
Award Winning Science Now Visualized
Chromatophore: Light Harvesting Organelle to Convert Light to Energy
HIV Virus: Unveiling the Structure of the HIV Capsid
Formation of Milky Way: Understanding How Our Galaxy Evolved Over Billions of Years
SC’14 Visualization and Data Analytics Showcase Finalist
2014 Gordon Bell Prize Finalist
2013 Front Cover of Nature Magazine
Interactive Scientific Visualization in the NVIDIA Booth at SC’14