TRANSCRIPT
S7204
Past, Present & Future: AI & HPC Infrastructure in Azure
@Karan_Batta
Senior Program Manager
Microsoft Azure Compute
Our Mission
“No compromise infrastructure”
Invest in scale out; hyper-scale workloads need low latency and high bandwidth networking
Close to bare-metal performance
Invest in eco-system of partners
True “HPC in the cloud”
Recap
Compute Virtual Machines (NC)
          NC6                NC12               NC24               NC24r
Cores     6                  12                 24                 24
GPU       1 K80 GPU          2 K80 GPUs         4 K80 GPUs         4 K80 GPUs
          (1/2 physical card) (1 physical card) (2 physical cards) (2 physical cards)
Memory    56 GB              112 GB             224 GB             224 GB
Disk      ~380 GB SSD        ~680 GB SSD        ~1.5 TB SSD        ~1.5 TB SSD
Network   Azure Network      Azure Network      Azure Network      InfiniBand
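A quick way to confirm which SKU a VM actually exposes is to query the driver from inside the guest. The sketch below is a hypothetical helper, not part of the talk; the `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` invocation is standard, but the sample output shown is illustrative.

```python
import subprocess

def parse_gpu_list(csv_text):
    """Parse nvidia-smi CSV output into (name, memory) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = (field.strip() for field in line.split(","))
        gpus.append((name, mem))
    return gpus

def query_gpus():
    """Run nvidia-smi inside the VM and return the visible GPUs."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_list(out)

# Illustrative output for an NC24 (4 K80 GPUs, i.e. 2 physical cards):
sample = """Tesla K80, 11441 MiB
Tesla K80, 11441 MiB
Tesla K80, 11441 MiB
Tesla K80, 11441 MiB"""
print(parse_gpu_list(sample))
```

On an NC6 the same query would return a single K80 entry, matching the half-card assignment in the table above.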
State of the Union
5000+ customer signups during preview
General Availability since December 1st
Huge demand for specialized hardware
GPU offerings at the forefront of hardware innovation
New first-party products built on the N-Series, such as Cris.ai
100s of external customers in production
Areas such as AI & Deep Learning driving growth
Under The Covers
The N-Series stack:
• Applications — Azure Developer & Platform Services; custom apps and services
• Client OS — custom images; Azure Marketplace
• GPU Provisioning — DDA (Discrete Device Assignment): the physical GPU is passed through directly to the VM, which is what enables close-to-bare-metal performance
• Host OS — Hyper-V
• Hardware — NVIDIA M60 GPU (Viz SKU); NVIDIA K80 GPU (Compute SKU)
Real World Case Studies
“By using GPU resources in Azure, we can
run simulations in days that would take a
month on CPU-based machines. This
speeds our progress toward the
development of lifesaving drugs.”
Dr. Nagarajan Vaidehi
Director
Computational Therapeutics Core
Beckman Research Institute
“We are not short on ideas,
just computers.”
City Of Hope
AudioBurst
Next-Gen Compute Virtual Machines (NC_v2)
          NC6s_v2        NC12s_v2       NC24s_v2       NC24rs_v2
Cores     6              12             24             24
GPU       1 x P100       2 x P100       4 x P100       4 x P100
Memory    112 GB         224 GB         448 GB         448 GB
Disk      ~700 GB SSD    ~1.4 TB SSD    ~3 TB SSD      ~3 TB SSD
Network   Azure Network  Azure Network  Azure Network  InfiniBand
HPC Workloads Performance Gains with P100
[Chart: speedup relative to a dual-Broadwell CPU system, 2x K80 vs. 4x P100 16GB, ranging up to ~40x]
CPU system: dual E5-2690v3 @ 2.6 GHz, 14 cores. GPU systems: same CPU host with 2x K80 or 4x P100 PCIe 16GB.
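The chart's metric is simple wall-clock speedup against the CPU baseline. A minimal sketch of that arithmetic, using made-up placeholder timings rather than measured Azure numbers (the 30-days-to-1-day figure echoes the City of Hope quote earlier, not a benchmark):

```python
def speedup(cpu_seconds, gpu_seconds):
    """Relative speedup: baseline wall-clock time over accelerated time."""
    return cpu_seconds / gpu_seconds

# e.g. a simulation taking 30 days on the CPU system and 1 day on 4x P100:
print(f"{speedup(30.0, 1.0):.0f}x")
```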
Artificial Intelligence
Seeing AI
Skype Translator
NOONUM
Algorithmia
Smart Refrigerator
The system's word error rate is reported to be 5.9 percent, "about equal" to that of professional transcriptionists asked to transcribe the same speech.
Cognitive Toolkit fastest on Azure & Pascal GPUs
Deep Learning Virtual Machines (ND)
          ND6s           ND12s          ND24s          ND24rs
Cores     6              12             24             24
GPU       1 x P40        2 x P40        4 x P40        4 x P40
Memory    112 GB         224 GB         448 GB         448 GB
Disk      ~700 GB SSD    ~1.4 TB SSD    ~3 TB SSD      ~3 TB SSD
Network   Azure Network  Azure Network  Azure Network  InfiniBand
Training Workloads Performance Gains with P40
[Chart: training throughput (0–5,000 images/sec), 4x K80 vs. 4x P40, for AlexnetOWT, Googlenet, InceptionV3, ResNet-50, and VGG16 on CNTK, and AlexnetOWT, ResNet-152, and ResNet-50 on Caffe]
Speed-up ranging to over 2x for training workloads
Up to 21x Inference Throughput with P40
[Chart: ResNet-50 inference throughput (images/second, 0–4,000) vs. batch size (1–128), K80 vs. P40; 21x speedup]
GPU: Ubuntu 14.04.5, TensorRT 2.1, CUDA 8.0.42, cuDNN 6.0.5; precision FP32 (K80), INT8 (P40).
Optimize performance with TensorRT and reduced precision
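The P40's inference gain comes largely from INT8 math. A conceptual sketch of the symmetric INT8 quantization idea behind reduced-precision inference — map float weights/activations into [-127, 127] with a per-tensor scale, then dequantize after the arithmetic. This illustrates the principle only; it is not TensorRT's actual calibration algorithm, and the `weights` values are invented placeholders.

```python
def quantize(values, scale):
    """Round floats to the int8 range using a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in qvalues]

weights = [0.5, -1.25, 0.02, 1.0]
scale = max(abs(w) for w in weights) / 127.0   # simple max-abs calibration
q = quantize(weights, scale)
recovered = dequantize(q, scale)
print(q)
print(recovered)
```

The recovered values differ from the originals by at most half a quantization step, which is why networks like ResNet-50 tolerate INT8 inference with little accuracy loss while the hardware runs far more operations per cycle.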
NVIDIA Tesla P40 Demo
Follow me @Karan_Batta
Thanks!