OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS

TRANSCRIPT

Page 1:

OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS


Page 2:

Why GPUs?

A Tale of Numbers

• Performance: 100x gains over traditional RDBMS / NoSQL / in-memory databases (100x performance increase)
• Cores: modern GPUs can have 3,000+ cores, compared to 32 in a CPU (3,000 vs. 32)
• Costs: 75% reduction in infrastructure costs, licensing, staff, etc. (75% infrastructure cost savings)
• More with less: increase performance, throughput, and capability while minimizing the costs to support the business

Page 3:

Why a GPU Database?

• Leverage Innovations in CPUs and GPUs

• Single Hardware Platform
• Simplified Software Stack

Page 4:

What are AI, ML, and Deep Learning?

[Diagram: nested circles: AI ⊃ Machine Learning (ML) ⊃ Deep Learning. ML: predict y using a function on data x.]
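To make "predict y using a function on data x" concrete, here is a minimal sketch with scikit-learn on synthetic data; the library choice and the linear model are illustrative assumptions, not something the deck prescribes.

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y is roughly linear in x.
x = np.arange(100, dtype=float).reshape(-1, 1)   # data x (one feature per row)
y = 3.0 * x.ravel() + np.random.randn(100)       # observed target y

model = LinearRegression().fit(x, y)             # learn a function f with y ≈ f(x)
print(model.predict([[101.0]]))                  # predict y for an unseen x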

Page 5:

AI/ML/Deep Learning Cheat Sheet

No shortage of techniques and programming languages

Page 6:

ML Cheat Sheet

Python and SQL cover almost all of the algorithms in that scary spider diagram, and Kinetica supports all Python libraries!

Page 7:


ML/AI/Deep Learning Lifecycle

Page 8:

ML/AI/Deep Learning Lifecycle

• Create, extract, transform, and process big data: batch and streams
• Apply ML to the data:
  • Model pre-processing
  • Model execution
  • Model post-processing
• Within an ecosystem of general analytics
• Supporting a range of human and machine consumers

Page 9:

Typical AI Process: High Latency, Rigid, Complex Tech Stack

[Diagram: data scientists extract a subset of the data into specialized AI / data science tools. Extracting data for AI is expensive and slow, and enterprises struggle to make AI models available to business users.]

Page 10:

Kinetica: A More Ideal AI Process

[Diagram: data scientists build UDFs inside Kinetica (e.g. Monte Carlo Risk, Custom Function 2, Custom Function 3); the API exposes these custom functions, which can be made available to business users.]

Page 11:

Current Inefficient Use of Python

python =
• Interpreted
• Single-threaded
• Clean, transform
• Flow: for each member
  • Pre-process
  • Model execute
  • Post-process
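A hedged sketch of that per-member flow is below; the column names, helpers, and threshold are hypothetical rather than taken from the deck, and the point is only that every member pays interpreter overhead in a single-threaded loop.

import pandas as pd

def pre_process(member: pd.Series) -> list:
    # Hypothetical per-member cleaning/transform step.
    return [float(member["tenure"]), float(member["spend"])]

def post_process(member: pd.Series, score: float) -> dict:
    # Hypothetical per-member post-processing, e.g. thresholding the raw score.
    return {"member_id": member["member_id"], "flag": score > 0.5}

def score_members(members: pd.DataFrame, model) -> list:
    results = []
    for _, member in members.iterrows():              # one member at a time, single thread
        features = pre_process(member)                 # pre-process
        score = float(model.predict([features])[0])    # model execute
        results.append(post_process(member, score))    # post-process
    return results

Every trip through pre-process, predict, and post-process happens row by row in the interpreter, which is exactly the overhead the next slide pushes back into the database.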

Page 12:

Optimized SQL and Python UDF with Kinetica

SQL + Python UDF + SQL =

• Pre-process (SQL): binary executable code, superior optimization, declarative SQL
• Model execute (Python UDF): only the essential imperative model code, not relational set processing
• Post-process (SQL): binary executable code, superior optimization, declarative SQL
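A hedged sketch of that split follows; the table and column names (members, member_features, member_scores, churn_score) are assumptions for illustration, and the SQL is generic rather than Kinetica-specific.

import numpy as np

# Pre-processing stays declarative and set-based in the database.
PRE_PROCESS_SQL = """
CREATE TABLE member_features AS
SELECT member_id,
       tenure,
       spend / NULLIF(tenure, 0) AS spend_rate
FROM members
"""

# Only the essential imperative model code lives in the Python UDF,
# and it scores all rows in one vectorized call rather than per member.
def run_model(features: np.ndarray, model) -> np.ndarray:
    return model.predict(features)

# Post-processing is again declarative SQL over the scored output.
POST_PROCESS_SQL = """
CREATE TABLE high_risk_members AS
SELECT member_id
FROM member_scores
WHERE churn_score > 0.5
"""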

Page 13:

Comprehensive Solution Architecture: Major U.S. Retailer

[Diagram: a 10-node Kinetica cluster (head node plus Worker 1 through Worker 9) holds fact and dimension tables for various use cases, billions of rows. Various ETL/ELT feeds and full model pipelines (1 through N) drive massive stream ingestion and massive, fast analytics, serving the Prompts Project, fast-streaming projects, and fast-analytics projects. Apache Tomcat application servers front the cluster with a Spring endpoint-oriented architecture and horizontal elastic scaling.]

Page 14:

Use Case Example

Page 15:

MNIST: Simple Image Processing Use Case

A Parametric Model: Python Using TensorFlow

Model Training
• Set of image files stored in a Kinetica database table
• Python UDF in Kinetica using TensorFlow

Model Serving
• Python UDF in Kinetica using TensorFlow
• Input = table TFModel
• Output = table mnist_inference_out

Model Analytics
• SQL!
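For orientation, here is a minimal TensorFlow/Keras sketch of the kind of MNIST model the training UDF would build, plus the flavour of "Model Analytics: SQL". The Keras dataset loader, the network shape, and the label/prediction column names are assumptions; inside Kinetica the images would come from the mnist_training table via a Python UDF instead.

import tensorflow as tf

# Load MNIST from Keras for illustration (the deck stores the images in a
# Kinetica table and reads them inside a UDF instead).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0                      # scale pixels to [0, 1]

# A small parametric model: flatten, one hidden layer, softmax over 10 digits.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1)

# "Model Analytics: SQL!" -- e.g. per-digit accuracy over mnist_inference_out
# (label and prediction column names are assumed).
ANALYTICS_SQL = """
SELECT label,
       AVG(CASE WHEN prediction = label THEN 1.0 ELSE 0.0 END) AS accuracy
FROM mnist_inference_out
GROUP BY label
"""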

Page 16:

Model Training & Inference Data Model: MPP Sharding

[Diagram: the UDF train_nd_udf.py runs as one instance per shard, on TOM 0 through TOM 7, spread across the cluster's machines and ranks. Each TOM holds its local shard (0 through 7) of the tables mnist_training, TFModel, mnist_inference, and mnist_inference_out, and the UDF instance on that TOM trains and infers against its own shard only.]
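To show the shape of that MPP pattern without depending on Kinetica's UDF API, the sketch below simulates it with multiprocessing: the same inference function runs once per shard and each shard keeps its own output, which is how the diagram's per-TOM UDF instances behave. The data, shard count, and scoring function are placeholders, not the deck's train_nd_udf.py.

import numpy as np
from multiprocessing import Pool

NUM_SHARDS = 8  # matches the eight TOMs/shards in the diagram

def infer_on_shard(shard: np.ndarray) -> np.ndarray:
    # Placeholder for the per-shard inference UDF: it only ever sees its own
    # shard's rows and produces that shard's slice of mnist_inference_out.
    return shard.sum(axis=1)

if __name__ == "__main__":
    data = np.random.rand(80_000, 784)             # stand-in for mnist_inference rows
    shards = np.array_split(data, NUM_SHARDS)      # one slice per shard/TOM
    with Pool(NUM_SHARDS) as pool:
        results = pool.map(infer_on_shard, shards) # one worker per shard, in parallel
    for i, out in enumerate(results):
        print(f"shard {i}: {len(out)} inference outputs")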

Page 17:

[email protected]

Thank You! Come get your copy of the O'Reilly Book at Booth G.01!