TRANSCRIPT
S7204
Past, Present & Future: AI & HPC Infrastructure in Azure
@Karan_Batta
Senior Program Manager
Microsoft Azure Compute
Our Mission
“No compromise infrastructure”
Invest in scale out; hyper-scale workloads need low latency and high bandwidth networking
Close to bare-metal performance
Invest in eco-system of partners
True “HPC in the cloud”
Recap
Compute Virtual Machines (NC)
          NC6                NC12               NC24               NC24r
Cores     6                  12                 24                 24
GPU       1 K80 GPU          2 K80 GPUs         4 K80 GPUs         4 K80 GPUs
          (1/2 physical card) (1 physical card) (2 physical cards) (2 physical cards)
Memory    56 GB              112 GB             224 GB             224 GB
Disk      ~380 GB SSD        ~680 GB SSD        ~1.5 TB SSD        ~1.5 TB SSD
Network   Azure Network      Azure Network      Azure Network      InfiniBand
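A quick way to confirm which SKU a VM actually exposes is to query the driver from inside the guest. The sketch below is a hypothetical helper, not part of the talk; the `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` invocation is standard, but the sample output shown is illustrative.

```python
import subprocess

def parse_gpu_list(csv_text):
    """Parse nvidia-smi CSV output into (name, memory) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = (field.strip() for field in line.split(","))
        gpus.append((name, mem))
    return gpus

def query_gpus():
    """Run nvidia-smi inside the VM and return the visible GPUs."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_list(out)

# Illustrative output for an NC24 (4 K80 GPUs, i.e. 2 physical cards):
sample = """Tesla K80, 11441 MiB
Tesla K80, 11441 MiB
Tesla K80, 11441 MiB
Tesla K80, 11441 MiB"""
print(parse_gpu_list(sample))
```

On an NC6 the same query would return a single K80 entry, matching the half-card assignment in the table above.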
State of the Union
5000+ customer signups during preview
General Availability since December 1st
Huge demand for specialized hardware
GPU offerings at the forefront of hardware innovation
New first-party products built on the N-Series, such as Cris.ai
100s of external customers in production
Areas such as AI & Deep Learning driving growth
Under The Covers
The N-Series stack:
• Applications — Azure Developer & Platform Services; custom apps and services
• Client OS — custom images; Azure Marketplace
• GPU Provisioning — DDA (Discrete Device Assignment): the physical GPU is passed through directly to the VM, which is what enables close-to-bare-metal performance
• Host OS — Hyper-V
• Hardware — NVIDIA M60 GPU (Viz SKU); NVIDIA K80 GPU (Compute SKU)
Real World Case Studies
“By using GPU resources in Azure, we can
run simulations in days that would take a
month on CPU-based machines. This
speeds our progress toward the
development of lifesaving drugs.”
Dr. Nagarajan Vaidehi
Director
Computational Therapeutics Core
Beckman Research Institute
“We are not short on ideas,
just computers.”
City Of Hope
AudioBurst
Next-Gen Compute Virtual Machines (NC_v2)
          NC6s_v2        NC12s_v2       NC24s_v2       NC24rs_v2
Cores     6              12             24             24
GPU       1 x P100       2 x P100       4 x P100       4 x P100
Memory    112 GB         224 GB         448 GB         448 GB
Disk      ~700 GB SSD    ~1.4 TB SSD    ~3 TB SSD      ~3 TB SSD
Network   Azure Network  Azure Network  Azure Network  InfiniBand
HPC Workloads Performance Gains with P100
[Chart: speedup relative to a dual-Broadwell CPU system, 2x K80 vs. 4x P100 16GB, ranging up to ~40x]
CPU system: dual E5-2690v3 @ 2.6 GHz, 14 cores. GPU systems: same CPU host with 2x K80 or 4x P100 PCIe 16GB.
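The chart's metric is simple wall-clock speedup against the CPU baseline. A minimal sketch of that arithmetic, using made-up placeholder timings rather than measured Azure numbers (the 30-days-to-1-day figure echoes the City of Hope quote earlier, not a benchmark):

```python
def speedup(cpu_seconds, gpu_seconds):
    """Relative speedup: baseline wall-clock time over accelerated time."""
    return cpu_seconds / gpu_seconds

# e.g. a simulation taking 30 days on the CPU system and 1 day on 4x P100:
print(f"{speedup(30.0, 1.0):.0f}x")
```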
Artificial Intelligence
Seeing AI
Skype Translator
NOONUM
Algorithmia
Smart Refrigerator
The system's word error rate is reported to be 5.9 percent, "about equal" to that of professional transcriptionists asked to transcribe the same speech.
Cognitive Toolkit fastest on Azure & Pascal GPUs
Deep Learning Virtual Machines (ND)
          ND6s           ND12s          ND24s          ND24rs
Cores     6              12             24             24
GPU       1 x P40        2 x P40        4 x P40        4 x P40
Memory    112 GB         224 GB         448 GB         448 GB
Disk      ~700 GB SSD    ~1.4 TB SSD    ~3 TB SSD      ~3 TB SSD
Network   Azure Network  Azure Network  Azure Network  InfiniBand
Training Workloads Performance Gains with P40
[Chart: training throughput (0–5,000 images/sec), 4x K80 vs. 4x P40, for AlexnetOWT, Googlenet, InceptionV3, ResNet-50, and VGG16 on CNTK, and AlexnetOWT, ResNet-152, and ResNet-50 on Caffe]
Speed-up ranging to over 2x for training workloads
Up to 21x Inference Throughput with P40
[Chart: ResNet-50 inference throughput (images/second, 0–4,000) vs. batch size (1–128), K80 vs. P40; 21x speedup]
GPU: Ubuntu 14.04.5, TensorRT 2.1, CUDA 8.0.42, cuDNN 6.0.5; precision FP32 (K80), INT8 (P40).
Optimize performance with TensorRT and reduced precision
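The P40's inference gain comes largely from INT8 math. A conceptual sketch of the symmetric INT8 quantization idea behind reduced-precision inference — map float weights/activations into [-127, 127] with a per-tensor scale, then dequantize after the arithmetic. This illustrates the principle only; it is not TensorRT's actual calibration algorithm, and the `weights` values are invented placeholders.

```python
def quantize(values, scale):
    """Round floats to the int8 range using a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in qvalues]

weights = [0.5, -1.25, 0.02, 1.0]
scale = max(abs(w) for w in weights) / 127.0   # simple max-abs calibration
q = quantize(weights, scale)
recovered = dequantize(q, scale)
print(q)
print(recovered)
```

The recovered values differ from the originals by at most half a quantization step, which is why networks like ResNet-50 tolerate INT8 inference with little accuracy loss while the hardware runs far more operations per cycle.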
NVIDIA Tesla P40 Demo
Follow me @Karan_Batta
Thanks!