GTC 2016 Opening Keynote
Post on 07-Jan-2017
2
Academia | Games | Finance | Manufacturing | Internet | Oil & Gas | National Labs | Automotive | Defense | M & E
2X Accelerated Systems, 96% of New Systems on NVIDIA
2X GTC Attendees | 4X CUDA Developers, 10X in Hyperscale + Auto
Auto | Internet | Gov't / Labs | Academia | M&E | Finance | Aerospace / Defense | Manufacturing | Oil & Gas | IT / HW / SW | Medical
LEAPS IN ADOPTION
[Chart: # accelerated systems, Nov 2013 / Nov 2014 / Nov 2015 (axis 0 to 120)]
[Charts, 2012 vs. 2016: 4x, 300K; 2,350 and 5,500]
4
NVIDIA GAMEWORKS
Volumetric Lighting | Voxel Accelerated Ambient Occlusion | Hybrid Frustum Traced Shadows
Available Now
HairWorks | WaveWorks | FlameWorks | PhysX
and other technologies such as: Clothing, VXGI, Flex, Destruction
GAMEWORKS | VRWORKS | DESIGNWORKS | COMPUTEWORKS | DRIVEWORKS | JETPACK
5
NVIDIA DESIGNWORKS
Adobe support of MDL | Siemens NX adopts Iray
MDL | OptiX | Path Rendering | Iray
and other technologies such as: GL Extensions, GRID, GPUDirect for Video, Mosaic, VXGI, Warp and Blend
6
NVIDIA VRWORKS
Oculus Rift and HTC Vive integration | Epic, Max Play and Unity game engines
Available Now
VR SLI | Context Priority | Warp and Blend | Multi-Res Shading
and other technologies such as: Direct Mode, GPUDirect for Video
7
NVIDIA COMPUTEWORKS
CUDA 8: Available June | cuDNN 5: Available April | nvGRAPH: Available June
IndeX plug-in for ParaView: Available May
CUDA | cuDNN | nvGRAPH | IndeX
and other technologies such as: AMGx, cuSOLVER, cuSPARSE, OpenACC, NSIGHT, THRUST
8
NVIDIA DRIVEWORKS
JPL: Available Now | EAP: Available Q2’16
General Release: Available Q1’17
Detection | Localization | HD Maps | SensorFusion
and other technologies such as: Driving, Planning
9
NVIDIA JETPACK
Jetson TX1: 24 images/s/W | GIE (GPU Inference Engine): Available May
DIGITS Workflow | VisionWorks | Jetson Media SDK | Deep Learning SDK
and other technologies such as: Linux4Tegra, NSIGHT EE, OpenCV4Tegra, OpenGL, System Trace, Visual Profiler, Vulkan
10
VR: A START OF A NEW PLATFORM
New York Times ships Cardboard to subscribers
Microsoft demonstrates Holoportation
Google announces Jump VR camera platform
Samsung, Oculus, HTC release headsets
VR Startups Raise $1.5B in funding
13
IRAY VR
Breakthrough Photoreal VR | Available Starting in June
Rasterize depth buffer at headset eye positions
Reconstruct image for new viewpoint from depth and multiple probes
Pre-render light probes surrounding region of interest
15
IRAY VR LITE
Available in June
1. Design in 3ds Max
2. Download Iray for 3ds Max Plug-in
3. Download Android Viewer
4. Get VR HMD
16
AN AMAZING YEAR IN AI
AlphaGo: Rivals a World Champion
Microsoft & Google: “Superhuman” Image Recognition
Microsoft: “Super Deep Network”
Berkeley’s Brett: One network, everything robotics
Deep Speech 2: One network, 2 languages
A New Computing Model Hits Pop Culture
17
A NEW COMPUTING MODEL
Deep Learning Object Detection: DNN + Data + HPC
Traditional Computer Vision: Experts + Time
Deep Learning Achieves “Superhuman” Results
[Chart: ImageNet, 2009 to 2016 (axis 0% to 100%); series: Traditional CV, Deep Learning]
19
[Chart legend: Ad Service | Technology | Investment | Media | Oil & Gas | Mfg | Retail | Other]
$500B OPPORTUNITY OVER 10 YRS
Deep Learning Software Revenue by Industry
Deep Learning Total Revenue by Segment
IBM: “Cognitive business represents a $2T opportunity”
SOURCE: “Deep Learning for Enterprise Applications,” 4Q 2015, Tractica
20
NVIDIA GPU FOR HYPERSCALE
10X Speed up | 20 images/s/W
Cloud Services Powered by AI
TESLA M40 + TESLA M4
21
Soumith Chintala, AI Research Engineer, Facebook
“Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.”
Soumith Chintala, Facebook AI Research; Alec Radford & Luke Metz, indico Research
23
150B XTORS | 5.3TF FP64 | 10.6TF FP32 | 21.2TF FP16 | 14MB SM RF | 4MB L2 Cache
TESLA P100
THE MOST ADVANCED HYPERSCALE DATACENTER GPU EVER BUILT
25
GIANT LEAPS IN EVERYTHING
3x GPU Mem BW | 3x Compute | 5x GPU-GPU BW
[Chart: Teraflops (FP32/FP16), axis 5 to 20; bars: K40, M40, P100 (FP32), P100 (FP16)]
[Chart: Bandwidth (GB/sec), axis 40 to 160; bars: K40, M40, P100]
[Chart: Bandwidth, axis 1x to 3x; bars: K40, M40, P100]
26
“This is a new era of computing. New approaches to the underlying technologies will be required for AI and cognitive. The combination of NVIDIA Pascal GPUs and IBM POWER accelerates Watson’s learning of new skills. Together, IBM and NVIDIA will advance the artificial intelligence industry.”
Dr. John Kelly III, SVP, Cognitive Solutions & IBM Research
“NVIDIA GPU is accelerating progress in AI. As neural nets become larger and larger, we not only need faster GPUs with larger and faster memory, but also much faster GPU-to-GPU communication, as well as hardware that can take advantage of reduced-precision arithmetic. This is precisely what Pascal delivers.”
Yann LeCun, Director of AI Research, Facebook
“Microsoft is developing super deep neural networks that are more than 1000 layers. NVIDIA Tesla P100’s impressive horsepower will enable Microsoft’s CNTK to accelerate AI breakthroughs.”
Xuedong Huang, Chief Speech Scientist, Microsoft Research
“AI computers are like space rockets: The bigger the better. Pascal’s throughput and interconnect will make the biggest rocket we’ve seen yet.”
Andrew Ng, Chief Scientist, Baidu
28
GPU-ACCELERATED DL FOR EVERY MARKET
IBM: “Cognitive business represents a $2T opportunity”
Deep Learning in the Cloud
Deep Learning for Enterprise
[Chart legend: Ad Service | Technology | Investment | Media | Oil & Gas | Mfg | Retail | Other]
SOURCE: “Deep Learning for Enterprise Applications,” 4Q 2015, Tractica
29
Engineered for deep learning | 170TF FP16 | 8x Tesla P100
NVLink hybrid cube mesh | Accelerates major AI frameworks
NVIDIA DGX-1
WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER
31
“250 SERVERS IN-A-BOX”
                      DUAL XEON      DGX-1
FLOPS (CPU + GPU)     3 TF           170 TF
AGGREGATE NODE BW     76 GB/s        768 GB/s
ALEXNET TRAIN TIME    150 HOURS      2 HOURS
TRAIN IN 2 HOURS      >250 NODES*    1 NODE
* “Caffe Training on Multi-node Distributed-memory Systems Based on Intel® Xeon® Processor E5 Family” (extrapolated). Gennady Fedorov (Intel), Vadim P. (Intel), October 29, 2015. https://software.intel.com/en-us/articles/caffe-training-on-multi-node-distributed-memory-systems-based-on-intel-xeon-processor-e5
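The comparison above reduces to a few ratios. The sketch below simply recomputes them from the slide's figures; the dictionary keys and the `ratios` helper are illustrative names, not anything from the keynote.

```python
# Figures copied from the "250 servers in-a-box" comparison.
# Key names and the helper below are hypothetical, chosen for illustration.
dual_xeon = {"flops_tf": 3.0, "node_bw_gb_s": 76.0, "alexnet_train_hours": 150.0}
dgx1 = {"flops_tf": 170.0, "node_bw_gb_s": 768.0, "alexnet_train_hours": 2.0}


def ratios(baseline, system):
    """Return how many times better `system` is than `baseline` per metric."""
    out = {}
    for key in baseline:
        if key.endswith("_hours"):
            # Training time: lower is better, so divide baseline by system.
            out[key] = baseline[key] / system[key]
        else:
            # Throughput-style metrics: higher is better.
            out[key] = system[key] / baseline[key]
    return out


dgx1_vs_xeon = ratios(dual_xeon, dgx1)
for name, value in dgx1_vs_xeon.items():
    print(f"{name}: {value:.1f}x")
```

The single-node training-time ratio (75x) is well below the ">250 nodes" figure. That gap presumably reflects less-than-linear multi-node scaling in the extrapolation the footnote cites, though the slide does not say so explicitly.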
32
12X SPEED-UP IN ONE YEAR
1.33 billion images/day
GTC 2015: 4 Maxwell GPUs, 25 Hours
GTC 2016: 8 Pascal GPUs, 2 Hours
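As a rough check on the headline number, the arithmetic can be written out as below. It assumes the 25-hour figure belongs to the 4-GPU Maxwell system and the 2-hour figure to the 8-GPU Pascal system, which is how the slide appears to pair them. The variable names are illustrative.

```python
# Assumed pairing of the slide's figures: 25 hours on 4 Maxwell GPUs (GTC 2015),
# 2 hours on 8 Pascal GPUs (GTC 2016).
gtc2015_hours, gtc2015_gpus = 25.0, 4
gtc2016_hours, gtc2016_gpus = 2.0, 8

# End-to-end speed-up in training time (the slide rounds this to 12X).
speedup = gtc2015_hours / gtc2016_hours

# Speed-up left over after accounting for the doubled GPU count.
per_gpu_speedup = speedup / (gtc2016_gpus / gtc2015_gpus)

# Convert the quoted daily throughput into images per second.
images_per_day = 1.33e9
images_per_second = images_per_day / (24 * 60 * 60)

print(f"speed-up: {speedup:.1f}x, per GPU: {per_gpu_speedup:.2f}x")
print(f"throughput: {images_per_second:,.0f} images/s")
```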
33
Bryan Catanzaro, Senior Researcher, Baidu
Recurrent Neural Nets: Model + Data Parallelism
[Diagram labels: Time series input, “Time series output”, GPU0, GPU1, Model Parallel, Data Parallel]
34
Persistent RNNs: Peak FLOPs at batch of 8
Add Model Parallelism over NVLINK
Compose with Data Parallelism
Strong scale to 32X more processors
[Diagram labels: weights (keep in registers), repeat ~300 times, GPU0, GPU1, GPU2, GPU3, Data Parallel]
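For readers unfamiliar with the term, data parallelism can be sketched as follows. This is a generic, CPU-only illustration with a toy one-parameter model. It is not the persistent-RNN or NVLink implementation described in the talk, and all names are hypothetical.

```python
# Simulated data parallelism: every "GPU" holds a full copy of the weight,
# computes a gradient on its own shard of the batch, and the shard gradients
# are then averaged (an all-reduce on real hardware).


def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_grad(w, xs, ys, num_workers):
    """Split the batch into equal shards, one per worker, then average."""
    shard = len(xs) // num_workers
    grads = []
    for rank in range(num_workers):
        lo, hi = rank * shard, (rank + 1) * shard
        grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))
    return sum(grads) / num_workers


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # targets generated by the "true" weight 2.0
w = 0.5

full = grad_mse(w, xs, ys)
parallel = data_parallel_grad(w, xs, ys, num_workers=4)
print(full, parallel)
```

With equal-sized shards, the averaged shard gradients match the full-batch gradient, which is why adding workers changes throughput but not the update itself.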
36
170TF | “250 servers in-a-box” | nvidia.com/dgx1
$129,000
NVIDIA DGX-1
WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER
37
PIONEERS IN AI RESEARCH
Frameworks for Multi-GPU Pascal
Large-scale Deep Learning
Reinforcement Learning
Unsupervised and Transfer Learning
Natural Language Understanding
Autonomous Driving
Medical Applications
38
DEEP LEARNING FOR MEDICINE
NVIDIA Founding Technology Partner of MGH Center of Clinical Data Science
10B Medical images on DGX-1 to advance radiology, pathology, genomics
39
TESLA FAMILY
Multi-App HPC | Hyperscale HPC | Strong-Scale HPC | Researchers / Early Adopters
M40 + M4 | K80
40
Uber Enters the Race
Toyota Invests $1B in AI Lab
Volvo Drive Me on Public Roads in 2017
NHTSA: Computer Counts as Driver
Tesla Model 3: 300K pre-orders
AN AMAZING YEAR FOR SELF-DRIVING CARS
Audi, BMW, Daimler Buy HERE
Tesla Model S Auto-pilot
Baidu Enters the Race
Honda, Nissan, Toyota Team Up
GM Buys Cruise
42
World’s first DL-powered car computing platform
One scalable architecture — from DNN training to cluster, infotainment, ADAS, autonomous driving, and mapping
Open platform
NVIDIA DRIVE PX AI CAR COMPUTER
Training on DGX-1
Driving with DriveWorks
KALDI
LOCALIZATION
MAPPING
DRIVENET
DAVENET
NVIDIA DGX-1 NVIDIA DRIVE PX
43
NVIDIA DRIVE PX PERCEPTION
NVIDIA DRIVENET
#1 accuracy score for KITTI car detection
44
NVIDIA DRIVE PX PERCEPTION
45
NEW END-TO-END HD MAPPING
47
NEW END-TO-END HD MAPPING
49
NEW AI DRIVING
51
WORLD’S FIRST AUTONOMOUS CAR RACE
10 teams, 20 identical cars | DRIVE PX 2: The “brain” of every car | 2016/17 Formula E season