TRANSCRIPT
OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS
Why GPUs? A Tale of Numbers

• Performance: 100x gains over traditional RDBMS / NoSQL / in-memory databases
• Cores: modern GPUs can consist of 3,000+ cores, compared to 32 in a CPU
• Costs: 75% reduction in infrastructure, licensing, staffing, etc.
• More with less: increase performance, throughput, and capability while minimizing the cost to support the business
Why a GPU Database?

• Leverage innovations in CPUs and GPUs
• Single hardware platform
• Simplified software stack
What are AI, ML, and Deep Learning?

• ML: predict y using a function on data x
• [Diagram: nested circles — Deep Learning inside ML, inside AI]
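The slide's one-line definition of ML — predict y using a function on data x — can be sketched with the simplest possible case: fitting a line by least squares and then predicting. This is a minimal illustration, not anything Kinetica-specific.

```python
# Minimal illustration of "predict y using a function on data x":
# fit y = a*x + b by ordinary least squares, then use it to predict.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]              # data drawn from y = 2x + 1
a, b = fit_line(xs, ys)

def predict(x):
    return a * x + b
```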
AI/ML/Deep Learning Cheat Sheet
No shortage of techniques and programming languages
ML Cheat Sheet
Python and SQL cover almost all of the algorithms in that scary spider chart — and Kinetica supports all Python libraries!
ML/AI/Deep Learning Lifecycle

• Create, extract, transform, and process big data: batch and streams
• Apply ML to data:
  • Model pre-processing
  • Model execution
  • Model post-processing
• Within an ecosystem of general analytics
• Supporting a range of human and machine consumers
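The three model stages named in the lifecycle — pre-processing, execution, post-processing — can be sketched as a tiny pipeline. All names and the threshold "model" below are illustrative stand-ins, not anything from the deck.

```python
# Hedged sketch of the lifecycle's three model stages.
def pre_process(rows):
    # Clean and transform: drop records with missing values, scale the feature.
    return [r["x"] / 100.0 for r in rows if r.get("x") is not None]

def model_execute(features, threshold=0.5):
    # Stand-in model: a simple threshold classifier.
    return [1 if f > threshold else 0 for f in features]

def post_process(scores):
    # Map class ids back to business-facing labels.
    labels = {0: "reject", 1: "accept"}
    return [labels[s] for s in scores]

rows = [{"x": 80}, {"x": 20}, {"x": None}, {"x": 60}]
result = post_process(model_execute(pre_process(rows)))
```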
Typical AI Process: High Latency, Rigid, Complex Tech Stack

• Data is subset and extracted from the database into specialized AI / data science tools
• Extracting data for AI is expensive and slow
• Enterprises struggle to make AI models available to the business: data scientists have the models, but business users have no clear path to them
Kinetica: A More Ideal AI Process

• Data scientists deploy models as in-database UDFs (e.g., Monte Carlo risk and other custom functions)
• An API exposes the custom functions, which can be made available to business users
Current Inefficient Use of Python

• Python alone is interpreted and single-threaded
• Clean, transform, pre-process, model execute, and post-process all run in a "for each member" loop of interpreted code
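The "for each member" flow the slide criticizes can be contrasted with a set-oriented alternative. A minimal sketch, assuming NumPy as the vectorized stand-in; both paths compute the same scores, but the loop pays interpreter overhead on every row.

```python
import numpy as np

# Row-at-a-time flow: one interpreted pass per member.
def score_per_member(balances):
    out = []
    for b in balances:                 # interpreter overhead on every row
        out.append(b * 1.05 + 10)
    return out

# Set-oriented alternative: one vectorized call over the whole column.
def score_vectorized(balances):
    return np.asarray(balances) * 1.05 + 10

balances = [100.0, 200.0, 300.0]
loop_result = score_per_member(balances)
vec_result = score_vectorized(balances).tolist()
```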
Optimized SQL and Python UDF with Kinetica

• Pre-process in SQL: declarative, runs as binary executable code with superior optimization
• Model execute in a Python UDF: only the essential imperative model code, not relational set processing
• Post-process in SQL: declarative, runs as binary executable code with superior optimization
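The SQL / UDF / SQL split above can be sketched end to end. This is a minimal illustration using sqlite3 as the SQL engine; the table names, columns, and threshold "model" are assumptions for the example, not Kinetica's API or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (member_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO txns VALUES (?, ?)",
                 [(1, 10.0), (1, 30.0), (2, 100.0)])

# Pre-process declaratively in SQL: a set-oriented aggregation per member.
features = conn.execute(
    "SELECT member_id, SUM(amount) FROM txns GROUP BY member_id"
).fetchall()

# Model execute in Python: only the essential imperative model code.
def model(total):
    return 1 if total > 50 else 0      # stand-in risk model

conn.execute("CREATE TABLE scores (member_id INTEGER, risk INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(mid, model(total)) for mid, total in features])

# Post-process declaratively in SQL again.
high_risk = conn.execute(
    "SELECT member_id FROM scores WHERE risk = 1").fetchall()
```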
Comprehensive Solution Architecture: Major U.S. Retailer

• Kinetica 10-node cluster: a head node plus Worker1 through Worker9
• Fact and dimension tables for various use cases; billions of rows
• Massive stream ingestion alongside massive, fast analytics
• Various ETL/ELT feeds driving full model pipelines 1 through N
• Apache Tomcat application servers: Spring endpoint-oriented architecture with horizontal elastic scaling
• Workloads span fast streaming projects, fast analytics projects, and the Prompts project
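The head-node / worker layout in the architecture above can be sketched as a scatter-gather: the head node fans a query out to the workers and merges their partial results. A minimal illustration with illustrative names and counts, not Kinetica's actual execution engine.

```python
from concurrent.futures import ThreadPoolExecutor

WORKERS = [f"worker{i}" for i in range(1, 10)]   # Worker1 .. Worker9

def worker_partial(worker, query):
    # Stand-in for a per-worker scan over its locally held shards.
    return {"worker": worker, "count": 1_000_000}

def head_node(query):
    # Scatter the query to all workers in parallel, then merge.
    with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
        partials = list(pool.map(lambda w: worker_partial(w, query), WORKERS))
    return sum(p["count"] for p in partials)     # merge step

total = head_node("SELECT COUNT(*) FROM facts")
```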
Use Case Example
MNIST: Simple Image Processing Use Case

A parametric model in Python using TensorFlow:
• Model training: a set of image files stored in a Kinetica database table, trained by a Python UDF in Kinetica using TensorFlow
• Model serving: a Python UDF in Kinetica using TensorFlow; input = the TFModel table, output = the mnist_inference_out table
• Model analytics: SQL!
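"Model analytics: SQL!" can be made concrete: once predictions land in a table like mnist_inference_out, accuracy is one aggregate query. A minimal sketch using sqlite3; the column names here are assumptions, not Kinetica's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mnist_inference_out "
    "(image_id INTEGER, actual INTEGER, predicted INTEGER)")
conn.executemany(
    "INSERT INTO mnist_inference_out VALUES (?, ?, ?)",
    [(0, 7, 7), (1, 2, 2), (2, 1, 1), (3, 0, 9)])   # 3 of 4 correct

# Model analytics in plain SQL: accuracy as an average over matches.
accuracy = conn.execute("""
    SELECT AVG(CASE WHEN actual = predicted THEN 1.0 ELSE 0.0 END)
    FROM mnist_inference_out
""").fetchone()[0]
```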
Model Training & Inference Data Model: MPP Sharding

[Diagram: the UDF train_nd_udf.py runs as one instance per shard. Across the cluster's machines and ranks, TOM 0 through TOM 7 each hold their shard (0–7) of the mnist_training, TFModel, mnist_inference, and mnist_inference_out tables, with a UDF executing locally against each shard.]
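The sharded layout in the diagram can be sketched as simple hash partitioning: each row key maps deterministically to one of the eight shards, so training and inference UDFs run data-local per shard. A minimal illustration; the hash choice and key names are assumptions, not Kinetica's actual sharding function.

```python
import hashlib

NUM_SHARDS = 8   # the diagram shows shards 0-7

def shard_of(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash so the same key always lands on the same shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Distribute 1,000 row keys across the shards.
rows = [f"image_{i}" for i in range(1000)]
shards = {}
for r in rows:
    shards.setdefault(shard_of(r), []).append(r)
```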
Thank You! Come get your copy of the O'Reilly book at Booth G.01!