Machine Learning Architectures
Current and Future ML Design Options
Daniel Beveridge & Josh Simons
Office of the CTO
VMware
#vmworld #MLA2646BU
MLA2646BU
VMworld 2019 Content: Not for publication or distribution
©2019 VMware, Inc.
Disclaimer
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined.
The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.
EXPLORE & INSPIRE
1. Industry Thought Leader
2. Trusted Advisor & Innovation Partner
3. Force For Good
INTERNAL INFLUENCERS: IT, Business Units, Sales, OCTO, Corporate Development
EXTERNAL INFLUENCERS: Analysts, Customers, Ecosystem, Academia, Dell Technologies, Standards Bodies
INNOVATE
IMPACT THE FUTURE
[Diagram labels: OCTO Programs, Research & Innovation, Academia, Global Field, Sustainability, Open Source, Global Support, Goals, Feedback loop (x2)]
The Office of the CTO’s Purpose: To Look Over the Horizon – To EXPLORE, INSPIRE and INNOVATE to IMPACT THE FUTURE
“Machine Learning is the most important new workload to emerge in the Enterprise in the last ten years.”
Pat Gelsinger
AI Growth
Deloitte Global predicts the number of machine learning pilots and implementations will double in 2018 compared to 2017, and double again by 2020.
International Data Corporation (IDC) forecasts that spending on AI and ML will grow from $12B in 2017 to $57.6B by 2021.
Sources: Infoworld, Deloitte, TechCrunch
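As a rough check on the IDC forecast above, the implied yearly growth rate can be worked out directly. This is a minimal sketch that assumes simple annual compounding over the four years from 2017 to 2021; the function name is illustrative and does not come from the presentation.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# IDC figures quoted above: $12B in 2017 growing to $57.6B by 2021 (4 years).
implied_cagr = compound_annual_growth_rate(12.0, 57.6, 2021 - 2017)
print(f"Implied annual growth: {implied_cagr:.1%}")
```

Under that assumption the forecast corresponds to roughly 48% growth per year.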
This Person Does Not Exist
Images all generated by AI. They are not real people.
ThisPersonDoesNotExist.com
Created with GANs (Generative Adversarial Networks) using Nvidia GPUs.
AI Generated Storytelling
Based on Romance Novel Training Set
source: https://medium.com/@samim/generating-stories-about-images-d163ba41e4ed
“We were barely able to catch the breeze at the beach , and it felt as if someone stepped out of my mind. She was in love with him for the first time in months, so she had no intention of escaping. The sun had risen from the ocean, making her feel more alive than normal. She's beautiful, but the truth is that I don't know what to do. The sun was just starting to fade away, leaving people scattered around the Atlantic Ocean. I’d seen the men in his life, who guided me at the beach once more.”
Artificial Intelligence Examples
Technology Landscape
Artificial Intelligence
Human intelligence demonstrated by machines
Machine Learning
Enable machines to learn by themselves using the provided data and make accurate predictions without being explicitly programmed
Deep Learning
Machine Learning using deep (many-level) neural networks
Data Analytics
Discovery, interpretation, and communication of meaningful patterns in data
Big Data
Capture, storage, and analysis of uncomfortably large data sets
(Definitions derived from those on Wikipedia)
ML Algorithm Landscape
Source: TechLeer. (2017) https://www.techleer.com/articles/203-machine-learning-algorithm-backbone-of-emerging-technologies/
Terminology
ML Training vs. ML Inference
ML Training (Model Building)
The process of providing known examples of an input/output mapping of interest to a learning algorithm, so as to create a mathematical model that can then be used to make predictions on new, unseen input data.
ML Inference (Model Execution)
The process of applying an already-trained mathematical model to previously unseen input data to predict an output.
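The two terms can be illustrated with a very small model. The sketch below fits a straight line by least squares (training) and then applies it to an input that was not in the training set (inference). The data and function names are hypothetical, chosen only to keep the example self-contained.

```python
def train(examples):
    """ML training: fit y = slope * x + intercept to known (x, y) pairs by least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    variance = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}

def infer(model, x):
    """ML inference: apply the already-trained model to a new, unseen input."""
    return model["slope"] * x + model["intercept"]

# Hypothetical training data that follows y = 2x + 1.
known_examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = train(known_examples)
prediction = infer(model, 10.0)  # 10.0 was not part of the training data
print(model, prediction)
```

Training touches every known example and is the expensive step. Inference evaluates only the fitted model, which is why the two stages can run on different hardware and in different places.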
Now and in Two Years
Where ML Workloads are Deployed
[Chart: Deployment of ML Workloads (% respondents), In Use (N = 286) vs. Plan to Use (N = 411), for three locations: virtualized servers in your data center, public cloud infrastructure, unvirtualized servers in your data center]
Source: VMware Core Metrics Survey 2018 (N=1,600)
Machine Learning IT Landscape
[Diagram labels: Federated Learning; Machine Learning, Deep Learning, Big Data, Data Analytics; PRIVATE and HYBRID/PUBLIC (Data, training, inference); Compute Edge and Device Edge (low-latency inference); Edge or IoT; Remote Desktop]
Making AI Faster
Key Hardware Accelerator Technologies for AI
Graphics Processing Units:
• Hundreds to 5K cores, up to 47x faster than CPU, 27x on inference
• Large ML training jobs
• Rapid parallel computations
Multi-GPU Servers
• Many thousands of cores
• High speed bus communications
• Jobs are split across GPUs
• Motherboards with 20+ GPUs available
• Up to 544x the speed of a CPU server, 80K CUDA cores, 10K Tensor cores.
• Nvidia HGX-2: 2 petaFLOPS tensor ops
RDMA
• High bandwidth, low-latency fabrics
• High-scale, multi-host ML training acceleration
Custom AI Silicon
• Purpose built AI processors (IPU)
• Greater density and power efficiency than GPU
• Massive memory bandwidth over 7TB/sec on Colossus GC2 and 90TB/sec per server.
• 23 billion transistors per IPU
• Co-location of memory w/ logic cores yields big gains - 100x gains vs. GPU’s HBM2.
FPGAs
• High throughput: 10 TFLOPS
• Database and Hadoop acceleration
• ML applications
Software Innovation: CPU Based Inference
Using Binarization Methods for Acceleration on CPU – CPU transistors become accelerators
• State of the art efficiency and accuracy while minimizing CPU and memory utilization.
• ML models run in real time in any environment, from low-powered Arm processors to cloud-hosted GPUs.
• Extraordinary video analytics performance without need for GPU.
• Enable digital transformation and ‘brownfield’ Edge deployment of advanced ML.
• 15x less memory, 30x less energy consumed, 10x faster than typical GPU based or Neural accelerator.
• No need to move data to the Cloud for fast ML inference, no need to ‘break the bank’ at the Edge.
• Demo: https://youtu.be/Ql2iCQSYSE0
60 fps multi-object detection
Edge AI running on Solar
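The slide names binarization only in general terms. One widely used formulation, shown here purely as an illustration and not necessarily what the demo uses, stores each weight and activation as a single bit (+1 or -1) and replaces multiply-accumulate with XOR and a bit count. All values and function names below are hypothetical.

```python
def binarize(values):
    """Pack real values into an integer bit mask: bit set for >= 0 (+1), clear for < 0 (-1)."""
    bits = 0
    for index, value in enumerate(values):
        if value >= 0:
            bits |= 1 << index
    return bits

def binary_dot(bits_a, bits_b, length):
    """Dot product of two {-1, +1} vectors stored as bit masks.

    Matching positions contribute +1 and differing positions -1,
    so the result is length - 2 * popcount(a XOR b).
    """
    differing = bin(bits_a ^ bits_b).count("1")
    return length - 2 * differing

def sign(value):
    """Reference binarization to an explicit +1 / -1 value."""
    return 1 if value >= 0 else -1

weights = [0.7, -1.2, 0.1, -0.4, 2.0, -0.3, 0.9, -0.8]      # hypothetical values
activations = [1.5, 0.2, -0.6, -0.9, 0.4, 0.3, -1.1, -0.2]  # hypothetical values

fast = binary_dot(binarize(weights), binarize(activations), len(weights))
reference = sum(sign(w) * sign(a) for w, a in zip(weights, activations))
print(fast, reference)
```

Because each value occupies one bit instead of 32, memory use drops sharply, and XOR plus popcount are cheap CPU operations. That is the intuition behind the memory and energy savings claimed above, although the exact figures depend on the implementation.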
Exploratory Analytics
Front end of the Data Scientist’s analysis pipeline
Experimentation with (subsets of) data to find appropriate techniques for solving the problem at hand
Often on laptops or desktops. Is that ok?
Data Science often involves confidential or sensitive data
Virtual Desktops offer two benefits
• For the Organization: Only pixels leave the data center, so the data stays secure
• For the Data Scientist: Access to datacenter-class CPU, memory, accelerators, etc., for faster and more extensive exploration
Data volumes can be problematic
Model Building / Training
[Diagram labels: Training data, D (Storage System); Learning algorithm (Compute Host); Model Deployment (Deployment Systems); × E, #epochs; × T, #tunings]
D × E × T bytes transferred
Typical values:
• D: gigabytes to terabytes (or more)
• E: 10s to 100s (or more)
• T: 10s or more (1000s if researching new models)
=> Caching, high-speed storage, compression recommended
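To give a feel for the D × E × T estimate, the sketch below plugs in one hypothetical job from within the typical ranges above and converts the total into transfer time. The 10 Gb/s link speed and the assumption that it is fully used are illustrative choices, not figures from the presentation.

```python
def bytes_transferred(dataset_bytes, epochs, tunings):
    """Total bytes read from storage during model building: D x E x T."""
    return dataset_bytes * epochs * tunings

def transfer_hours(total_bytes, bytes_per_second):
    """Time in hours to move total_bytes at a sustained throughput."""
    return total_bytes / bytes_per_second / 3600.0

TB = 10 ** 12
GB = 10 ** 9

# Hypothetical job: D = 1 TB, E = 50 epochs, T = 20 tuning runs.
total = bytes_transferred(dataset_bytes=1 * TB, epochs=50, tunings=20)
hours = transfer_hours(total, bytes_per_second=1.25 * GB)  # assumed 10 Gb/s link, fully used
print(f"{total / TB:.0f} TB moved, about {hours:.0f} hours at 10 Gb/s")
```

Even a modest 1 TB data set turns into about a petabyte of reads under these assumptions, which is why caching close to the compute host pays off so quickly.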
Model Building / Training
Hardware Acceleration
Device memory can be a limiting factor
• All model parameters and the training data batch must be resident
• NVIDIA V100: 16GB or 32GB, T4: 16GB
GPU model matters: cooling, power, available memory, cost
Accessible via passthrough (DirectPath I/O), NVIDIA vGPU, or Bitfusion technology
Data Scientists often require multiple GPUs for training, but for two very different reasons, so it is important to find out which one applies:
1. Building large models that cannot fit into a single GPU
• This is Deep Learning and it comes with the largest data set requirements.
2. Training single-GPU models more quickly
• Horovod (Uber) is one such framework for faster, parallel training across multiple hosts
For maximum performance:
• Purchase multi-GPU servers with NVLink
• Data Scientists may request RDMA interconnects, for example 100 Gb/s InfiniBand and RoCE (RDMA over Converged Ethernet)
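The residency requirement above lends itself to a quick back-of-the-envelope check. The sketch below estimates a training footprint and lists which of the GPUs named on the slide could hold it. The factor of three copies per parameter (weights, gradients, one optimizer buffer) and the example model size are assumptions for illustration. Activations are ignored, so real usage will be higher.

```python
GPU_MEMORY_GB = {"V100-16GB": 16, "V100-32GB": 32, "T4": 16}  # capacities from the slide

def training_footprint_gb(parameters, batch_bytes, bytes_per_value=4, copies_per_parameter=3):
    """Rough device-memory estimate for training, in GB.

    copies_per_parameter=3 is an assumption: weights, gradients, and one
    optimizer buffer. Activation memory is not counted.
    """
    parameter_bytes = parameters * bytes_per_value * copies_per_parameter
    return (parameter_bytes + batch_bytes) / 1e9

def gpus_that_fit(parameters, batch_bytes):
    """Return the GPU models whose memory can hold the estimated footprint."""
    needed = training_footprint_gb(parameters, batch_bytes)
    return [name for name, capacity in GPU_MEMORY_GB.items() if capacity >= needed]

# Hypothetical model: 1.5 billion FP32 parameters and a 2 GB training batch.
needed_gb = training_footprint_gb(1_500_000_000, 2 * 10**9)
candidates = gpus_that_fit(1_500_000_000, 2 * 10**9)
print(f"{needed_gb:.0f} GB needed; fits on: {candidates}")
```

When the candidate list comes back empty, the job falls under the first reason for multiple GPUs: the model has to be split across devices rather than merely sped up.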
Carbon Impact and Energy Consumption
Resource Implications of AI Training Today
Energy and Policy Considerations for Deep Learning in NLP, Strubell, E., et al. arXiv:1906.02243
Air travel, 1 passenger NY > SF: 1,984 lbs CO2e
Car, lifetime including fuel: 126,000 lbs CO2e
Public and Private Cloud
Network Edge
“Things” and Endpoints
Device Edge
Compute Edge
Edge Computing
Moving from Data Centers to Centers of Data
Forces Driving Edge ML Adoption
Elements of Change
Factors Reshaping AI Design Assumptions
1. Data Gravity: IoT will generate huge volumes of data – too much to centralize.
2. Real-Time Insights: ML Inference will enable RT analytics on data generated at the Edge, shifting focus to Edge as center of gravity for short term analytics.
3. Data Privacy, GDPR: Increasingly complex data policy frameworks will make default centralization of data infeasible.
4. Data Freedom: ‘Own your Data’ is an increasingly popular sentiment. Users are reluctant to lose control of their data by placing it in walled gardens where they must pay to retrieve it.
5. GPU Alternatives: Advances in CPU based Inference and other lower cost hardware accelerators will enable ML training to move closer to the data, eventually at the place of data generation.
6. Cost: AI will no longer be an exotic capability justifying use of high-end Cloud resources; it will be the bread and butter of business and need to happen at minimal cost. Owning AI accelerators will make sense to avoid high Opex and network related fees.
Edge will drive tomorrow’s growth
Market Growth in AI Spend
• “Essentially, inference will be 3.5X bigger than training in terms of market potential, with most of that inference being driven at the edge. Most of this edge-based chipset revenue will come from mobile phones, smart speakers, drones, AR/VR headsets, robots, security cameras, and other devices, all of which are going to need AI-based edge processing.”
• By 2025, according to Tractica’s Deep Learning Chipsets report, while cloud-based AI chipsets will account for $14.6 billion in revenue, edge-based AI chipsets will account for $51.6 billion.
AI Chipset Revenue by Market Sector, World Markets: 2016-2025
The Modern Edge
Multiple Edge Layers – Fractal Design Pattern in the Modern Factory of the Future
Factory SDDC - Management, Data Lake
• Factory level data plane
• Highly reliable
• Offering Management by Dell/VMware (Dell at Edge)
• Send Key Insights to Cloud or Corporate datacenters for aggregation and trending analytics.
• Seamless Interoperability with VMC.
Production Line Clusters
• Local IoT Data Lake
• RT Analytics
• Line Resource monitoring
• Transfer Insights up to Data Lake
Robot Monitoring and Control
• RT Sensor – data capture
• RT Analytics
• Low Latency Control loops
• Transfer Key Events upstream
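The pattern of analyzing sensor data locally and sending only key events upstream can be sketched as a simple filter. The example compares each reading with a short rolling average and forwards only large deviations. The window size, threshold, and sample data are hypothetical, and a real deployment would likely use a more robust detector.

```python
from collections import deque

def key_events(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent local average.

    All other readings stay at the edge; only these key events go upstream.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > threshold:
                yield index, value
        recent.append(value)

# Hypothetical robot temperature samples containing one spike.
samples = [40.0, 40.2, 39.9, 40.1, 40.0, 40.1, 47.5, 40.2, 40.0]
upstream = list(key_events(samples))
print(upstream)
```

Of the nine samples, only the spike is forwarded, so each layer passes a small fraction of its raw data to the layer above.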
Even Smaller Form Factors
For collection of Real-time IoT Data and Analytics
• Raspberry Pi 4B: 4GB RAM, $55 plus case, etc.
• Demonstrated Pi 3 compatibility at VMworld 2018
• Targeting compatibility with Pi 4B later this year
• Game changer for ESXi’s position within Edge
• Intel Compute Stick CS525: 4GB RAM, Core m5
• Demonstrated live vMotion over WiFi at VMware’s Empower Conference in April.
• Ideal form factor for running small workloads such as monitoring robots.
Design Norms and Directions
Future Trends in AI/ML
Today: ML Training gets done in the Cloud or central datacenter.
Future: Training gets done closer to the Edge due to data quantity and privacy restrictions. Federated analytics will offer insights while minimizing data movement.
Today: ML Inference is Cloud/datacenter focused because much of the data is centralized.
Future: ML Inference moves closer to the Edge as IoT grows and low inference latency grows in importance.
Today: GPUs are the mainstay of ML training. They offer highly parallel compute but are general purpose.
Future: Custom AI Silicon and FPGAs will be common as they offer power efficiency, greater compute density, and low compute latency for Edge.
Today: Data is often centralized by default, with assumptions that analytics must be centralized.
Future: Data will be more distributed, with centralization as needed. Data security and multi-tenant access will be important.
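The slides do not say how federated training would be implemented. One common approach is federated averaging, sketched below under that assumption: each site takes a training step on its own data, and only the resulting model weight, never the raw data, is sent back and averaged. The one-parameter model, the sites, and the learning rate are hypothetical.

```python
def local_update(weight, local_data, learning_rate=0.1):
    """One gradient step for the model y = weight * x on a site's own data (squared error)."""
    gradient = sum(2 * (weight * x - y) * x for x, y in local_data) / len(local_data)
    return weight - learning_rate * gradient

def federated_round(global_weight, sites):
    """Each site trains locally; only the updated weights are averaged centrally."""
    total_examples = sum(len(data) for data in sites)
    updates = [(local_update(global_weight, data), len(data)) for data in sites]
    return sum(w * n for w, n in updates) / total_examples

# Hypothetical edge sites, each holding private samples of y = 3x.
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (1.0, 3.0)],
]
weight = 0.0
for _ in range(50):
    weight = federated_round(weight, sites)
print(round(weight, 3))
```

The shared model converges toward the underlying slope of 3 even though no site ever reveals its samples, which is the property that makes this style of training attractive when data volume or privacy rules prevent centralization.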
Machine Learning Pipeline
[Diagram components: ML Algorithms, Configuration, Data Collection, Feature Extraction, Data Verification, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring]
[Diagram regions: Data Scientist domain, ML Engineer domain]
Based on: Hidden Technical Debt in Machine Learning Systems, Sculley, D., et al.
ML on VMware vSphere using Compute Accelerators
[Diagram: two ways to attach accelerators to virtual machines on VMware vSphere]
• Passthrough (DirectPath I/O): each GPU or FPGA is passed through to a single virtual machine (Guest OS, GPU or device driver, Applications)
• Nvidia GRID vGPU: the Nvidia GRID vGPU manager shares physical GPUs among multiple virtual machines, each with its own vGPU (Guest OS, GPU driver, Applications)
ML on VMware vSphere using Compute Accelerators
Bitfusion Accelerator Remoting
[Diagram: on VMware vSphere, a Server VM (Guest OS, GPU driver, Bitfusion) holds the GPUs via passthrough; multiple Client VMs (Guest OS, Bitfusion, Applications) access those GPUs through Bitfusion]
Bitfusion Performance on vSphere
Networking Connectivity – 10GbE vmxnet3
• DellEMC PowerEdge R730
• NVIDIA P100 (Configured in DirectPath I/O)
• Intel Ethernet 10GbE (vmxnet3)
• vSphere 6.7
• VM size: 20 vCPUs and 200G memory
• Bitfusion 1.11.9
• Benchmark: tf_cnn_benchmarks + ImageNet
Baseline: run directly on the server VM with GPU, without Bitfusion
[Chart: Performance Ratios (y-axis 0 to 1) for Remote, single client, single GPU; three unlabeled series]
Bitfusion Performance on vSphere
Networking Connectivity – RoCE DirectPath I/O
• DellEMC PowerEdge R730
• NVIDIA P100 (Configured in DirectPath I/O)
• Mellanox ConnectX-4 100 Gb/s (Configured in DirectPath I/O)
• vSphere 6.7
• VM size: 20 vCPUs and 200G memory
• Bitfusion 1.11.9
• Benchmark: tf_cnn_benchmarks + ImageNet
Baseline: run directly on the server VM with GPU, without Bitfusion
[Chart: Performance Ratios (y-axis 0 to 1) for Remote, single client, single GPU; three unlabeled series]
Accelerator Interface Evolution
[Diagram: stages in how VMs reach accelerators (ACC) and GPUs]
• Passthrough, vGPU (NVIDIA), SR-IOV (AMD)
• PCI P2P
• NVIDIA GPU Direct (PCI, NVLINK)
• Bitfusion, rCUDA, AvA, etc. (RDMA)
• GPU Direct RDMA
• Memory-centric (composable) architecture (e.g., Gen-Z): VMs, GPUs, accelerators, and NVMe connected over RDMA
Internal Projects
Examples:
• Sales pipeline optimization
• Bug assignment & de-duplication
• Customer problem ticket routing & resolution
• Field feedback sentiment analysis
• Email communication prioritization
• Hybrid cloud performance debugging and root cause analysis
• Topic modeling on internal research corpora
• Time-series forecasting, anomaly detection, anomaly prediction
vMLP: Machine Learning on VMware Cloud Foundation
[Diagram workflow steps: Collect Data, Explore & Visualize, Transform & Clean, Build Model]
[Diagram components: Data Science Notebook, Controller, Training Cluster, Model repository (Models), Serving Cluster, Apps]
[Diagram platform: Kubernetes on CPU, GPU, Storage, Network]
Data Science in a VCF-based environment
• Data collection & cleaning
• Data cleansing & transformation
• Model training
• Model serving
ML/Analytics Use Cases
Cloud Environment: On-premise, Hybrid/Public/SaaS
Common Use Cases: Proactive Support, Capacity planning, Intelligent storage management, Diagnostic Guidance, Adaptive auto-configuration, Network traffic analysis, Intrusion detection, SLA Monitoring & Alerting, Root cause analysis, Workload pattern clustering
Intelligence Layer
Solution Evolution:
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: What should I do?
CONFIDENTIAL – Shared under NDA ONLY
Magna: Reverse-Engineering Component Effects
KPI
Internet Access Power
Compute Storage Network
Operating System
Virtualization / Hypervisor
Containers
Data Stores Platform Apps (MQ/proxy)
Distributed Frameworks (Mesos/Kubernetes)
Microservice Frameworks
Application Code
Self-Driving Data Center Vision
A self-driving data center requires infrastructure that:
• Self initiates
• Self secures
• Self tunes
• Self heals
• Self escalates
• Self explains
This is a primary focus of our product-related ML work
Conclusions
AI is making tremendous inroads into many industries, and Edge will drive the next wave of productivity gains.
More companies are adopting AI, and workloads will increasingly run across Cloud, Datacenter, and Edge.
New ML inference accelerators are moving to the Edge: FPGAs, SSNs, and new software approaches will make ML affordable and high-performance.
In the future, data will move less than today; analytics will be federated and available wherever data lives.
VMware has a wide set of options for customers, including Cloud, on-premises, various types of GPU accelerators, AI-based datacenter optimization (Magna), and ML as a platform for the community.