
Tesla Accelerated Computing

YOUR PLATFORM FOR DISCOVERY

Tesla K40 GPU

Piz Daint: Fastest Supercomputer in Europe

NVLink

Jetson TK1

3 Most Popular Servers Incorporate Tesla GPUs

OpenPOWER Foundation

ENI: World’s Fastest Enterprise Supercomputer

World’s First GPU-Accelerated ARM Servers

World’s First POWER+GPU Server

GPU Sweeps ILSVRC Image Recognition Competition

IBM DB2 Plan for GPU-Acceleration

Top 15 Greenest Supercomputers

cuDNN: Machine Learning Library

CUDA 6 Unified Mem

Tesla Accelerated Computing Platform

[Diagram: platform layers]

Development: Programming Languages, Development Tools, Software Solutions

Data Center Infrastructure: Infrastructure Management, Communication, System Solutions

Components: GPU Accelerators (GPU Boost, …); Interconnect (GPU Direct, NVLink, …); System Management (NVML, …); Compiler Solutions (LLVM, …); Profile and Debug (CUDA Debugging API, …); Libraries (cuBLAS, …)

Tesla: Platform with Open Ecosystem

Common Programming Models Across Multiple CPUs

[Diagram: x86; Libraries (AmgX, cuBLAS); Compiler Directives]

US to Build Two Flagship Supercomputers

Major Step Forward on the Path to Exascale

Partnership for Science

100-300 PFLOPS Peak Performance

10x in Scientific Applications

2017

SUMMIT SIERRA

Just 4 nodes in Summit would make the Top500 list of supercomputers today

Similar Power as Titan | 5-10x Faster | 1/5th the Size

150 PF = 3M Laptops
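
The laptop comparison implies a per-laptop throughput that the slide does not state. A minimal sketch of that arithmetic follows; the function name is hypothetical, and only the 150 PF and 3M figures come from the slide.

```python
# Figures taken from the slide.
SYSTEM_PEAK_PFLOPS = 150
LAPTOP_COUNT = 3_000_000

GFLOPS_PER_PFLOPS = 1_000_000  # 1 PF = 10^6 GF


def implied_laptop_gflops(system_pflops, laptops):
    """Per-laptop GFLOPS implied by equating a system with N laptops."""
    return system_pflops * GFLOPS_PER_PFLOPS / laptops


if __name__ == "__main__":
    per_laptop = implied_laptop_gflops(SYSTEM_PEAK_PFLOPS, LAPTOP_COUNT)
    print(f"Implied laptop throughput: {per_laptop:.0f} GFLOPS")
```

The comparison therefore assumes a laptop of roughly 50 GFLOPS. It compares peak rates only and says nothing about memory, interconnect, or sustained application performance.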

ORNL is managed by UT-Battelle for the US Department of Energy

Future Leadership Computers at OLCF

Jeff Nichols Associate Laboratory Director Computing & Computational Sciences Directorate

Presented at: SC’14


CORAL System

Our Science requires that we continue to advance OLCF’s computational capability over the next decade on the roadmap to Exascale. Since clock-rate scaling ended in 2003, HPC performance has been achieved through increased parallelism. Jaguar scaled to 300,000 cores.

Titan and beyond deliver hierarchical parallelism with very powerful nodes: MPI plus thread-level parallelism through OpenACC or OpenMP, plus vectors.

2010 Jaguar: 2.3 PF, multi-core CPU, 7 MW

2012 Titan: 27 PF, hybrid GPU/CPU, 9 MW

2017 Summit: 5-10x Titan, hybrid GPU/CPU, 10 MW

2022 OLCF5: 5-10x Summit, ~20 MW
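
The roadmap figures can be read as an energy-efficiency trend. The sketch below computes GFLOPS per watt for Jaguar and Titan from the quoted peak and power numbers, then a projected range for Summit. The Summit range rests on an assumption the slide does not make: it applies the "5-10x Titan" factor to Titan's 27 PF peak, although the factor may refer to application performance. Function names are hypothetical.

```python
# (name, peak in PF, power in MW) as quoted on the roadmap.
MEASURED_SYSTEMS = [
    ("Jaguar", 2.3, 7.0),
    ("Titan", 27.0, 9.0),
]


def gflops_per_watt(peak_pf, power_mw):
    """1 PF = 1e6 GF and 1 MW = 1e6 W, so PF/MW equals GF/W."""
    return peak_pf / power_mw


def projected_range(base_pf, power_mw, factors=(5, 10)):
    """Efficiency range if a successor is `factors` times the base peak.

    ASSUMPTION: the speedup factor is applied to peak FLOPS.
    """
    low, high = factors
    return (gflops_per_watt(low * base_pf, power_mw),
            gflops_per_watt(high * base_pf, power_mw))


if __name__ == "__main__":
    for name, pf, mw in MEASURED_SYSTEMS:
        print(f"{name}: {gflops_per_watt(pf, mw):.2f} GF/W")
    low, high = projected_range(27.0, 10.0)
    print(f"Summit (projected): {low:.1f} to {high:.1f} GF/W")
```

Under these assumptions, Titan delivers about nine times Jaguar's peak efficiency, and Summit would need a further 4.5x to 9x gain to stay within its 10 MW budget.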


2017 OLCF Leadership System

Hybrid CPU/GPU architecture

Vendor: IBM (Prime) / NVIDIA™ / Mellanox Technologies®

At least 5X Titan’s Application Performance

Approximately 3,400 nodes, each with:
• Multiple IBM POWER9 CPUs and multiple NVIDIA Tesla® GPUs using the NVIDIA Volta architecture
• CPUs and GPUs completely connected with high speed NVLink
• Large coherent memory: over 512 GB (HBM + DDR4), all directly addressable from the CPUs and GPUs
• An additional 800 GB of NVRAM, which can be configured as either a burst buffer or as extended memory
• Over 40 TF peak performance

Dual-rail Mellanox® EDR-IB full, non-blocking fat-tree interconnect

IBM Elastic Storage (GPFS™): 1 TB/s I/O and 120 PB disk capacity
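
The per-node figures can be scaled to rough system totals. The sketch below multiplies the quoted node count by the per-node peak, memory, and NVRAM figures. Because the slide says "approximately" and "over", the results are approximate lower bounds rather than specifications, and the helper name is hypothetical.

```python
# Figures quoted on the slide; treat them as rough lower bounds.
NODES = 3400            # "approximately 3,400 nodes"
NODE_PEAK_TF = 40       # "over 40 TF peak performance"
NODE_MEMORY_GB = 512    # "over 512 GB (HBM + DDR4)"
NODE_NVRAM_GB = 800     # "an additional 800 GB of NVRAM"


def system_total(per_node, nodes, units_per_larger_unit):
    """Scale a per-node quantity to the full system and convert units."""
    return per_node * nodes / units_per_larger_unit


peak_pf = system_total(NODE_PEAK_TF, NODES, 1_000)            # TF -> PF
memory_pb = system_total(NODE_MEMORY_GB, NODES, 1_000_000)    # GB -> PB (decimal)
nvram_pb = system_total(NODE_NVRAM_GB, NODES, 1_000_000)      # GB -> PB (decimal)
four_node_peak_tf = 4 * NODE_PEAK_TF

if __name__ == "__main__":
    print(f"Aggregate peak:      {peak_pf:.0f} PF")
    print(f"Coherent memory:     {memory_pb:.2f} PB")
    print(f"NVRAM:               {nvram_pb:.2f} PB")
    print(f"Peak of four nodes:  {four_node_peak_tf} TF")
```

The aggregate of about 136 PF falls inside the 100-300 PFLOPS range quoted for the two flagship systems. The four-node figure of 160 TF is a peak rate, whereas the Top500 ranks measured Linpack performance, so the two are not directly comparable.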


Scientific Progress at all Scales

Fusion Energy
A Princeton Plasma Physics Laboratory team led by C.S. Chang was able to simulate and explain nonlinear coherent turbulence structures on the plasma edge in a fusion reactor by exploiting increased performance of the Titan architecture.

Turbomachinery
Ramgen Power Systems is using the Titan system to significantly reduce time to solution in the optimization of novel designs based on aerospace shock wave compression technology for gas compression systems, such as carbon dioxide compressors.

Nuclear Reactors
Center for the Advanced Simulation of Lightwater Reactors (CASL) investigators successfully performed full core physics power-up simulations of the Westinghouse AP1000 pressurized water reactor core using their Virtual Environment for Reactor Application code that has been designed on Titan architecture.

Liquid Crystal Film Stability
ORNL Postdoctoral fellow Trung Nguyen ran unprecedented large-scale molecular dynamics simulations on Titan to model the beginnings of ruptures in thin films wetting a solid substrate.

Earthquake Hazard Assessment
To better prepare California for large seismic events, SCEC joint researchers have simulated earthquakes at high frequencies, allowing structural engineers to conduct realistic physics-based probabilistic seismic hazard analyses.

Climate Science
Simulations on OLCF resources recreated the evolution of the climate state during the first half of the last deglaciation period, allowing scientists to explain the mechanisms that triggered the last period of Southern Hemisphere warming and deglaciation.

LLNL-PRES-XXXXXX This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

SC 14 November 17-21, 2014

Rob Neely, Associate Division Leader, Center for Applied Scientific Computing


Strengthen the United States’ security by developing and applying world-class science, technology, and engineering.

Science and Technology on a Mission


Unmodified codes will run on Power® Architecture processor

Memory rich nodes; high node memory bandwidth

Volta™ GPUs provide substantial performance potential

Outstanding benchmark analysis by IBM + NVIDIA

Cost competitive; low risk solution; outstanding hardware reliability

LLNL selected the most compelling system for NNSA

NRE contract provides significant benefit

Center of Excellence: expert help with porting and optimizing actual applications

Motherboard design and novel cooling concept

GPU reliability; file system performance; open source compiler infrastructure

Advanced system diagnostics and scheduling; advanced networking capabilities

[Figure: Notional Sierra node. Multi-core CPU and GPU, each with attached memory (MEM), with coherent memory capability]


IBM POWER CPU: Most Powerful Serial Processor

NVIDIA NVLink: Fastest CPU-GPU Interconnect

NVIDIA Volta GPU: Most Powerful Parallel Processor

Accelerated Computing: 5x Higher Energy Efficiency

TESLA K80 WORLD’S FASTEST ACCELERATOR FOR DATA ANALYTICS AND SCIENTIFIC COMPUTING

Dual-GPU Accelerator for Max Throughput

2x Faster | 2.9 TF | 4992 Cores | 480 GB/s

Double the Memory: Designed for Big Data Apps

24 GB (K40: 12 GB)

Oil & Gas | Data Analytics | HPC Viz

Maximum Performance: Dynamically Maximize Perf for Every Application

[Chart: Deep Learning: Caffe. Bars for CPU, Tesla K40, and Tesla K80; vertical axis 0x to 25x]

Caffe Benchmark: AlexNet training throughput based on 20 iterations, CPU: E5-2697v2 @ 2.70GHz. 64GB System Memory, CentOS 6.2

Performance Lead Continues to Grow

[Chart: Peak Double Precision FLOPS. Vertical axis GFLOPS, 0 to 3500; horizontal axis 2008 to 2014. Series: NVIDIA GPU, x86 CPU. Labeled GPUs: M1060, M2090, K20, K80]

[Chart: Peak Memory Bandwidth. Vertical axis GB/s, 0 to 600; horizontal axis 2008 to 2014. Series: NVIDIA GPU, x86 CPU. Labeled GPUs: M1060, M2090, K20, K40, K80]

Chromatophore Converts Light to Energy

Enabling Visualization with the Tesla Platform

Award Winning Science Now Visualized

Chromatophore: Light Harvesting Organelle to Convert Light to Energy

HIV Virus: Unveiling the Structure of the HIV Capsid

Formation of Milky Way: Understanding How Our Galaxy Evolved Over Billions of Years

SC’14 Visualization and Data Analytics Showcase Finalist

2014 Gordon Bell Prize Finalist

2013 Front Cover of Nature Magazine

Interactive Scientific Visualization in the NVIDIA Booth at SC’14

http://nvidianews.nvidia.com/ImageLibrary/detail.aspx?MediaDetailsID=2302

