
Machine Learning Architectures

Current and Future ML Design Options

Daniel Beveridge & Josh Simons

Office of the CTO, VMware

#vmworld #MLA2646BU

MLA2646BU

VMworld 2019 Content: Not for publication or distribution

©2019 VMware, Inc.

Disclaimer

This presentation may contain product features or functionality that are currently under development.

This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Technical feasibility and market demand will affect final delivery.

Pricing and packaging for any new features, functionality, or technology discussed or presented have not been determined.

The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.

EXPLORE & INSPIRE

1. Industry Thought Leader

2. Trusted Advisor & Innovation Partner

3. Force For Good

INTERNAL INFLUENCERS: IT, Business Units, Sales, OCTO, Corporate Development

EXTERNAL INFLUENCERS: Analysts, Customers, Ecosystem, Academia, Dell Technologies, Standards Bodies

INNOVATE, IMPACT THE FUTURE

OCTO Programs: Research & Innovation, Academia, Global Field, Sustainability, Open Source, Global Support

[Diagram: feedback loops connect the internal and external influencers with the OCTO programs and goals]

The Office of the CTO’s Purpose: To Look Over the Horizon. To EXPLORE, INSPIRE and INNOVATE to IMPACT THE FUTURE



“Machine Learning is the most important new workload to emerge in the Enterprise in the last ten years.”

Pat Gelsinger



AI Growth

Deloitte Global predicts the number of machine learning pilots and implementations will double in 2018 compared to 2017, and double again by 2020.

International Data Corporation (IDC) forecasts that spending on AI and ML will grow from $12B in 2017 to $57.6B by 2021.

Sources: InfoWorld, Deloitte, TechCrunch



This Person Does Not Exist

Images all generated by AI. They are not real people.

ThisPersonDoesNotExist.com

Created with GANs (Generative Adversarial Networks) using Nvidia GPUs.



AI Generated Storytelling

Based on Romance Novel Training Set

source: https://medium.com/@samim/generating-stories-about-images-d163ba41e4ed

“We were barely able to catch the breeze at the beach , and it felt as if someone stepped out of my mind. She was in love with him for the first time in months, so she had no intention of escaping. The sun had risen from the ocean, making her feel more alive than normal. She's beautiful, but the truth is that I don't know what to do. The sun was just starting to fade away, leaving people scattered around the Atlantic Ocean. I’d seen the men in his life, who guided me at the beach once more.”




Artificial Intelligence Examples



Technology Landscape

[Diagram: overlapping domains labeled Artificial Intelligence, Machine Learning, Deep Learning, Big Data, and Data Analytics]

Artificial Intelligence

Human intelligence demonstrated by machines

Machine Learning

Enable machines to learn by themselves using the provided data and make accurate predictions without being explicitly programmed

Deep Learning

Machine Learning using deep (many-level) neural networks

Data Analytics

Discovery, interpretation, and communication of meaningful patterns in data

Big Data

Capture, storage, and analysis of uncomfortably large data sets


(Definitions derived from those on Wikipedia)



ML Algorithm Landscape

Source: TechLeer. (2017) https://www.techleer.com/articles/203-machine-learning-algorithm-backbone-of-emerging-technologies/




Terminology

ML Training vs. ML Inference

ML Training: Model Building

The process of providing to a learning algorithm known examples of an input/output mapping of interest so as to create a mathematical model that can then be used to make predictions on new, unseen input data.

ML Inference: Model Execution

The process of applying an already-trained mathematical model to previously unseen input data to predict an output.
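The distinction can be illustrated with a minimal sketch, assuming a one-variable linear model fitted by ordinary least squares. The function names `train` and `infer` and the sample data are illustrative assumptions, not taken from the presentation.

```python
def train(examples):
    """Training: build a model (slope, intercept) from known input/output pairs."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    var = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def infer(model, x):
    """Inference: apply the already-trained model to a previously unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that follows y = 2x + 1
model = train([(0, 1), (1, 3), (2, 5), (3, 7)])
prediction = infer(model, 10)  # 10 does not appear in the training data
print(prediction)
```

Training runs once over the known examples and is the expensive step; inference reuses the resulting model for each new input.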



Now and in Two Years

Where ML Workloads are Deployed

Source: VMware Core Metrics Survey 2018 (N=1,600)

[Chart: Deployment of ML Workloads (% respondents), comparing In Use (N = 286) with Plan to Use (N = 411) across three locations: virtualized servers in your data center, public cloud infrastructure, and unvirtualized servers in your data center]



Machine Learning IT Landscape

[Diagram labels: PRIVATE and HYBRID/PUBLIC clouds (Data, Big Data, Data Analytics, Machine Learning, Deep Learning; training and inference in each); Edge or IoT sites (Compute Edge, Device Edge) with low-latency inference; Federated Learning; Remote Desktop]



Making AI Faster

Key Hardware Accelerator Technologies for AI

Graphics Processing Units:

• Hundreds to 5K cores, up to 47x faster than CPU, 27x on inference

• Large ML training jobs

• Rapid parallel computations

Multi-GPU Servers

• Many thousands of cores

• High speed bus communications

• Jobs are split across GPUs

• Motherboards with 20+ GPUs available

• Up to 544x speed of CPU server, 80K CUDA cores, 10K Tensor cores.

• Nvidia HGX-2: 2 petaFLOPS tensor ops

RDMA

• High bandwidth, low-latency fabrics

• High-scale, multi-host ML training acceleration

Custom AI Silicon

• Purpose built AI processors (IPU)

• Greater density and power efficiency than GPU

• Massive memory bandwidth over 7TB/sec on Colossus GC2 and 90TB/sec per server.

• 23 billion transistors per IPU

• Co-location of memory w/ logic cores yields big gains - 100x gains vs. GPU’s HBM2.

FPGAs

• High throughput: 10 TFLOPS

• Database and Hadoop acceleration

• ML applications


Software Innovation: CPU Based Inference

Using Binarization Methods for Acceleration on CPU: CPU transistors become accelerators

• State of the art efficiency and accuracy while minimizing CPU and memory utilization.

• ML models run in real time in any environment, from low-powered Arm processors to cloud-hosted GPUs.

• Extraordinary video analytics performance without need for GPU.

• Enable digital transformation and ‘brownfield’ Edge deployment of advanced ML.

• 15x less memory, 30x less energy consumed, 10x faster than typical GPU based or Neural accelerator.

• No need to move data to the Cloud for fast ML inference, no need to ‘break the bank’ at the Edge.

• Demo: https://youtu.be/Ql2iCQSYSE0

60 fps multi-object detection

Edge AI running on Solar
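The slide does not name the specific binarization method. As a hedged illustration of the general idea, one widely used approach replaces floating-point weights and activations with their signs, so a dot product reduces to a bitwise XNOR followed by a population count, both cheap CPU operations. The helper names below are assumptions made for this sketch.

```python
def pack_signs(values):
    """Pack a vector into an int: bit i is 1 if values[i] >= 0, else 0."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors of length n stored as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n              # matches minus mismatches


def sign_dot(a, b):
    """Reference implementation using explicit +1/-1 arithmetic."""
    sign = lambda v: 1 if v >= 0 else -1
    return sum(sign(x) * sign(y) for x, y in zip(a, b))


# Hypothetical weights and activations
weights = [0.5, -1.2, 3.0, -0.1]
activations = [1.0, 2.0, -0.5, -4.0]
fast = binary_dot(pack_signs(weights), pack_signs(activations), len(weights))
print(fast, sign_dot(weights, activations))
```

Storing one bit per value instead of a 32-bit float is also where the memory savings of such methods come from.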




Front end of the Data Scientist’s analysis pipeline

Experimentation with (subsets of) data to find appropriate techniques for solving the problem at hand

Often on laptops or desktops. Is that ok?

Data Science often involves confidential or sensitive data

Virtual Desktops offer two benefits

• For the Organization: Only pixels leave the data center – data is secure

• For the Data Scientist: Access to datacenter-class CPU, memory, accelerators, etc., for faster and more extensive exploration

Exploratory Analytics



Data volumes can be problematic

Model Building / Training

[Diagram: training data D moves from the Storage System to the learning algorithm on the Compute Host, repeated × E (#epochs) and × T (#tunings); the resulting model moves to the Deployment Systems (Model Deployment)]

D × E × T bytes transferred

Typical values:

• D: gigabytes to terabytes (or more)

• E: 10s to 100s (or more)

• T: 10s or more (1000s if researching new models)

=> Caching, high-speed storage, and compression recommended
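To get a feel for the D × E × T estimate, here is a small worked sketch. The specific values (100 GB data set, 50 epochs, 20 tunings, a 10 Gb/s link) are illustrative assumptions chosen from within the typical ranges above, and the calculation assumes no caching, so every epoch of every tuning run re-reads the full data set.

```python
GB = 10**9
TB = 10**12


def bytes_transferred(dataset_bytes, epochs, tunings):
    """Total bytes read from storage: D x E x T (assumes no caching)."""
    return dataset_bytes * epochs * tunings


def transfer_hours(total_bytes, link_gbit_per_s):
    """Hours needed to move total_bytes over a link of the given speed."""
    bytes_per_second = link_gbit_per_s * 10**9 / 8
    return total_bytes / bytes_per_second / 3600


total = bytes_transferred(100 * GB, epochs=50, tunings=20)
print(total / TB, "TB")                     # 100.0 TB
print(round(transfer_hours(total, 10), 1))  # 22.2 hours on a 10 Gb/s link
```

Even a modest data set turns into roughly a day of pure transfer time at these settings, which is why caching and fast storage matter.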


Device memory can be a limiting factor

• All model parameters and the current training data batch must be resident in device memory

• NVIDIA V100: 16GB or 32GB, T4: 16GB
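A rough sizing check can be sketched as follows. This is a deliberately simplified lower bound: it assumes 4-byte (fp32) parameters and counts only parameters plus one data batch, ignoring gradients, optimizer state, and activations, which add substantially to real training footprints. The function name and example sizes are hypothetical.

```python
GB = 10**9


def fits_on_device(num_params, batch_bytes, device_gb, bytes_per_param=4):
    """Return (fits, needed_bytes) for parameters plus one data batch.

    Lower bound only: gradients, optimizer state, and activations are ignored.
    """
    needed = num_params * bytes_per_param + batch_bytes
    return needed <= device_gb * GB, needed


# Hypothetical models with 1 billion and 5 billion parameters, 1 GB batch
small_fits, small_needed = fits_on_device(1_000_000_000, 1 * GB, device_gb=16)
large_fits_16, _ = fits_on_device(5_000_000_000, 1 * GB, device_gb=16)
large_fits_32, _ = fits_on_device(5_000_000_000, 1 * GB, device_gb=32)
print(small_fits, large_fits_16, large_fits_32)
```

A model that fails this check on a 16GB device but passes on a 32GB one is a case where the GPU model choice, not the GPU count, is the deciding factor.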

GPU model matters – cooling, power, available memory, cost

Accessible via passthrough (DirectPath I/O), NVIDIA vGPU, or Bitfusion technology

Data Scientists often request multiple GPUs for training, but for two very different reasons, so it is important to find out which one applies:

1. Building large models that cannot fit into a single GPU

• This is Deep Learning and it comes with the largest data set requirements.

2. Training single-GPU models more quickly

– Horovod (Uber) is one such framework for faster, parallel training across multiple hosts

3. For maximum performance, purchase multi-GPU servers with NVLINK

4. For maximum performance, Data Scientists may request RDMA interconnects

– For example, 100 Gb/s InfiniBand and RoCE (RDMA over Converged Ethernet)
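The second case (training a single-GPU model more quickly) is usually data parallelism: each worker computes gradients on its own shard of the data, and the gradients are averaged across workers before every update. The sketch below simulates that pattern in plain Python for a one-parameter model. It does not use Horovod's API; `allreduce_mean` is a stand-in for the collective averaging step, and the data is hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for the collective that averages gradients across workers."""
    return sum(values) / len(values)


def train(shards, lr=0.01, steps=200):
    """Synchronous data-parallel gradient descent; one shard per worker."""
    w = 0.0
    for _ in range(steps):
        grads = [local_gradient(w, shard) for shard in shards]  # parallel in practice
        w -= lr * allreduce_mean(grads)                         # identical update everywhere
    return w


# Hypothetical data following y = 3x, split evenly across two workers
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]
w = train(shards)
print(round(w, 6))
```

The averaging step runs every iteration, which is why fast interconnects such as NVLINK and RDMA fabrics matter for this style of training.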

Hardware Acceleration

Model Building / Training



Carbon Impact and Energy Consumption

Resource Implications of AI Training Today

Energy and Policy Considerations for Deep Learning in NLP, Strubell, E., et al. arXiv:1906.02243

Air travel, 1 passenger, NY > SF: 1,984 lbs CO2e

Car, lifetime including fuel: 126,000 lbs CO2e



Public and Private Cloud

Network Edge

“Things” and Endpoints

Device Edge

Compute Edge

Edge Computing

Moving from Data Centers to Centers of Data



Forces Driving Edge ML Adoption



Elements of Change

Factors Reshaping AI Design Assumptions

1. Data Gravity: IoT will generate huge volumes of data – too much to centralize.

2. Real-Time Insights: ML Inference will enable RT analytics on data generated at the Edge, shifting focus to Edge as center of gravity for short term analytics.

3. Data Privacy, GDPR: Increasingly complex data policy frameworks will make default centralization of data infeasible.

4. Data Freedom: ‘Own your Data’ is an increasingly popular sentiment. Users are reluctant to lose control of their data by placing it in walled gardens where they must pay to retrieve it.

5. GPU Alternatives: Advances in CPU based Inference and other lower cost hardware accelerators will enable ML training to move closer to the data, eventually at the place of data generation.

6. Cost: AI will no longer be an exotic capability justifying use of high-end Cloud resources; it will be the bread and butter of business and will need to happen at minimal cost. Owning AI accelerators will make sense to avoid high Opex and network-related fees.


Edge will drive tomorrow’s growth

Market Growth in AI Spend

• “Essentially, inference will be 3.5X bigger than training in terms of market potential, with most of that inference being driven at the edge. Most of this edge-based chipset revenue will come from mobile phones, smart speakers, drones, AR/VR headsets, robots, security cameras, and other devices, all of which are going to need AI-based edge processing.”

• By 2025, according to Tractica’s Deep Learning Chipsets report, while cloud-based AI chipsets will account for $14.6 billion in revenue, edge-based AI chipsets will account for $51.6 billion.

AI Chipset Revenue by Market Sector, World Markets: 2016-2025

https://www.tractica.com/research/deep-learning-chipsets/


The Modern Edge

Multiple Edge Layers: Fractal Design Pattern in the Modern Factory of the Future

Factory SDDC - Management, Data Lake

• Factory level data plane

• Highly reliable

• Offering Management by Dell/VMware (Dell at Edge)

• Send Key Insights to Cloud or Corporate datacenters for aggregation and trending analytics.

• Seamless Interoperability with VMC.

Production Line Clusters

• Local IoT Data Lake

• RT Analytics

• Line Resource monitoring

• Transfer Insights up to Data Lake

Robot Monitoring and Control

• RT Sensor – data capture

• RT Analytics

• Low Latency Control loops

• Transfer Key Events upstream


Even Smaller Form Factors

For collection of Real-time IoT Data and Analytics

• Pi 4b – 4GB Ram - $55 plus Case etc.

• Demonstrated Pi 3 compatibility at VMworld 2018

• Targeting compatibility with Pi 4b later this year

• Game changer for ESXi’s position within Edge

• Intel Compute Sticks CS525 – 4GB Ram, Core m5

• Demonstrated live vMotion over WiFi at VMware’s Empower Conference in April.

• Ideal form factor for running small workloads: monitoring robots, etc.


Design Norms and Directions

Future Trends in AI/ML

Today: ML Training gets done in the Cloud or central datacenter.
Future: Training gets done closer to the Edge due to data quantity and privacy restrictions. Federated analytics will offer insights while minimizing data movement.

Today: ML Inference is Cloud/datacenter focused because much of the data is centralized.
Future: ML Inference moves closer to the Edge as IoT grows and low inference latency grows in importance.

Today: GPUs are the mainstay of ML training. They offer highly parallel compute but are general purpose.
Future: Custom AI Silicon and FPGAs will be common, as they offer power efficiency, greater compute density, and low compute latency for Edge.

Today: Data is often centralized by default, with assumptions that analytics must be centralized.
Future: Data will be more distributed, with centralization as needed. Data security and multi-tenant access will be important.
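The idea of federated analytics, gaining insight while minimizing data movement, can be sketched with a global average computed from per-site summaries. Each site sends only a (sum, count) pair upstream and the raw readings never leave the site. The site names, readings, and function names are hypothetical.

```python
def local_summary(readings):
    """Runs at each edge site: reduce raw readings to a (sum, count) pair."""
    return sum(readings), len(readings)


def federated_mean(summaries):
    """Runs centrally: combine per-site summaries into a global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


# Hypothetical sensor readings held at two sites
sites = {
    "factory_a": [10.0, 12.0, 14.0],
    "factory_b": [20.0, 22.0],
}
summaries = [local_summary(readings) for readings in sites.values()]
global_mean = federated_mean(summaries)
print(global_mean)
```

Two numbers per site cross the network regardless of how many readings each site holds, and the result matches the mean that centralizing all the data would give.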


Machine Learning Pipeline

[Diagram: components of an ML system]

• ML Algorithms

• Configuration

• Data Collection

• Feature Extraction

• Data Verification

• Machine Resource Management

• Analysis Tools

• Process Management Tools

• Serving Infrastructure

• Monitoring

Based on: Hidden Technical Debt in Machine Learning Systems, Sculley, D., et al.

ML Engineer domain

Data Scientist domain


ML on VMware vSphere using Compute Accelerators

[Diagram: two ways to expose accelerators to virtual machines on VMware vSphere. Passthrough (DirectPath I/O): each Virtual Machine (Applications, Guest OS, GPU or device driver) is given a whole GPU or FPGA via pass-through. Nvidia GRID vGPU: the Nvidia GRID vGPU manager in vSphere shares each physical GPU among several Virtual Machines (Applications, Guest OS, GPU driver), each of which sees its own vGPU.]



ML on VMware vSphere using Compute Accelerators

Bitfusion Accelerator Remoting

[Diagram: a Server VM (Guest OS, GPU driver, Bitfusion) on VMware vSphere accesses the physical GPUs via pass-through; multiple Client VMs (Applications, Bitfusion, Guest OS) use those GPUs remotely through Bitfusion.]



Bitfusion Performance on vSphere

Networking Connectivity – 10GbE vmxnet3

• DellEMC PowerEdge R730

• NVIDIA P100 (Configured in DirectPath I/O)

• Intel Ethernet 10GbE (vmxnet3)

• vSphere 6.7

• VM size: 20 vCPUs and 200 GB memory

• Bitfusion 1.11.9

• Benchmark: tf_cnn_benchmarks + ImageNet

Baseline: run directly on the server VM with GPU, without Bitfusion

[Chart: "Remote, single client, single GPU"; y-axis: Performance Ratios (0 to 1); legend: Series1, Series2, Series3]



Bitfusion Performance on vSphere

Networking Connectivity – RoCE DirectPath I/O

• DellEMC PowerEdge R730

• NVIDIA P100 (Configured in DirectPath I/O)

• Mellanox Connect X-4 100 Gb/s (Configured in DirectPath I/O)

• vSphere 6.7

• VM size: 20 vCPUs and 200 GB memory

• Bitfusion 1.11.9

• Benchmark: tf_cnn_benchmarks + ImageNet

Baseline: run directly on the server VM with GPU, without Bitfusion

[Chart: "Remote, single client, single GPU"; y-axis: Performance Ratios (0 to 1); legend: Series1, Series2, Series3]



Accelerator Interface Evolution

[Diagram: successive ways of attaching accelerators (ACC) and GPUs to VMs]

• Passthrough

• vGPU (NVIDIA), SR-IOV (AMD)

• PCI P2P

• NVIDIA GPU Direct, NVLINK

• Bitfusion, rCUDA, AvA, etc. (remote accelerators reached over RDMA)

• GPU Direct RDMA

• Memory-centric (composable) architecture (e.g., Gen-Z), with VMs, GPUs, accelerators, and NVMe attached over RDMA



Internal Projects

Examples:

• Sales pipeline optimization

• Bug assignment & de-duplication

• Customer problem ticket routing & resolution

• Field feedback sentiment analysis

• Email communication prioritization

• Hybrid cloud performance debugging and root cause analysis

• Topic modeling on internal research corpora

• Time-series forecasting, anomaly detection, anomaly prediction



vMLP: Machine Learning on VMware Cloud Foundation

[Diagram: workflow stages Collect Data, Explore & Visualize, Transform & Clean, Build Model; platform components Data Science Notebook, Controller, Training Cluster, Model repository (models), Serving Cluster, and Apps; all running on Kubernetes over CPU, GPU, Storage, and Network]

Data Science in a VCF-based environment

• Data collection & cleaning

• Data cleansing & transformation

• Model training

• Model serving



Cloud Environment: On-premise, Hybrid/Public/SaaS

Common Use Cases:

• Proactive Support

• Capacity planning

• Intelligent storage management

• Diagnostic Guidance

• Adaptive auto-configuration

• Network traffic analysis

• Intrusion detection

• SLA Monitoring & Alerting

• Root cause analysis

• Workload pattern clustering

Intelligence Layer: Solution Evolution

• Descriptive Analytics: What happened?

• Diagnostic Analytics: Why did it happen?

• Predictive Analytics: What will happen?

• Prescriptive Analytics: What should I do?

ML/Analytics Use Cases

CONFIDENTIAL – Shared under NDA ONLY



Magna: Reverse-Engineering Component Effects

KPI

Internet Access Power

Compute Storage Network

Operating System

Virtualization / Hypervisor

Containers

Data Stores Platform Apps (MQ/proxy)

Distributed Frameworks (Mesos/Kubernetes)

Microservice Frameworks

Application Code



Self-Driving Data Center Vision

A self-driving data center requires infrastructure that:

• Self initiates

• Self secures

• Self tunes

• Self heals

• Self escalates

• Self explains

This is a primary focus of our product-related ML work



Conclusions

AI is making tremendous inroads into many industries, and Edge will drive the next wave of productivity gains.

More companies are adopting AI and increasingly, workloads will run across Cloud, Datacenter, and Edge.

New ML inference accelerators are moving to the Edge – FPGA, SSN’s, new software approaches will make ML affordable and high performance.

In the future, data will move less than it does today; analytics will be federated and available wherever data lives.

VMware has a wide set of options for customers including Cloud, on-premise, various types of GPU accelerators, AI based datacenter optimization (Magna), and ML as a platform for the community.
