OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS

TRANSCRIPT

Page 1:

OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS


Page 2:

Why GPUs?

A Tale of Numbers

• Performance: 100x gains over traditional RDBMS / NoSQL / in-memory databases (100x performance increase)
• Cores: modern GPUs can have 3,000+ cores, compared to 32 in a CPU (3,000 vs. 32)
• Costs: 75% reduction in infrastructure costs, licensing, staff, etc. (75% infrastructure cost savings)
• More with less: increase performance, throughput, and capability while minimizing the costs to support the business

Page 3:

Why a GPU Database?

• Leverage Innovations in CPUs and GPUs

• Single Hardware Platform
• Simplified Software Stack

Page 4:

What are AI, ML, and Deep Learning?

[Diagram: nested circles: AI ⊃ Machine Learning (ML) ⊃ Deep Learning. ML: predict y using a function on data x.]
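To make "predict y using a function on data x" concrete, here is a minimal sketch with scikit-learn on synthetic data; the library choice and the linear model are illustrative assumptions, not something the deck prescribes.

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y is roughly linear in x.
x = np.arange(100, dtype=float).reshape(-1, 1)   # data x (one feature per row)
y = 3.0 * x.ravel() + np.random.randn(100)       # observed target y

model = LinearRegression().fit(x, y)             # learn a function f with y ≈ f(x)
print(model.predict([[101.0]]))                  # predict y for an unseen x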

Page 5:

AI/ML/Deep Learning Cheat Sheet

No shortage of techniques and programming languages

Page 6:

ML Cheat Sheet

Python and SQL cover almost all of the algorithms in that scary spider diagram, and Kinetica supports all Python libraries!

Page 7:


ML/AI/Deep Learning Lifecycle

Page 8:

ML/AI/Deep Learning Lifecycle

• Create, extract, transform, and process big data: batch and streams
• Apply ML to the data:
  • Model pre-processing
  • Model execution
  • Model post-processing
• Within an ecosystem of general analytics
• Supporting a range of human and machine consumers

Page 9:

Typical AI Process: High Latency, Rigid, Complex Tech Stack

[Diagram: data scientists extract a subset of the data into specialized AI / data science tools. Extracting data for AI is expensive and slow, and enterprises struggle to make AI models available to business users.]

Page 10:

Kinetica: A More Ideal AI Process

[Diagram: data scientists build UDFs inside Kinetica (e.g. Monte Carlo Risk, Custom Function 2, Custom Function 3); the API exposes these custom functions, which can be made available to business users.]

Page 11:

Current Inefficient Use of Python

python =
• Interpreted
• Single-threaded
• Clean, transform
• Flow: for each member
  • Pre-process
  • Model execute
  • Post-process
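A hedged sketch of that per-member flow is below; the column names, helpers, and threshold are hypothetical rather than taken from the deck, and the point is only that every member pays interpreter overhead in a single-threaded loop.

import pandas as pd

def pre_process(member: pd.Series) -> list:
    # Hypothetical per-member cleaning/transform step.
    return [float(member["tenure"]), float(member["spend"])]

def post_process(member: pd.Series, score: float) -> dict:
    # Hypothetical per-member post-processing, e.g. thresholding the raw score.
    return {"member_id": member["member_id"], "flag": score > 0.5}

def score_members(members: pd.DataFrame, model) -> list:
    results = []
    for _, member in members.iterrows():              # one member at a time, single thread
        features = pre_process(member)                 # pre-process
        score = float(model.predict([features])[0])    # model execute
        results.append(post_process(member, score))    # post-process
    return results

Every trip through pre-process, predict, and post-process happens row by row in the interpreter, which is exactly the overhead the next slide pushes back into the database.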

Page 12:

Optimized SQL and Python UDF with Kinetica

SQL + Python UDF + SQL =

• Pre-process (SQL): binary executable code, superior optimization, declarative SQL
• Model execute (Python UDF): only the essential imperative model code, not relational set processing
• Post-process (SQL): binary executable code, superior optimization, declarative SQL
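A hedged sketch of that split follows; the table and column names (members, member_features, member_scores, churn_score) are assumptions for illustration, and the SQL is generic rather than Kinetica-specific.

import numpy as np

# Pre-processing stays declarative and set-based in the database.
PRE_PROCESS_SQL = """
CREATE TABLE member_features AS
SELECT member_id,
       tenure,
       spend / NULLIF(tenure, 0) AS spend_rate
FROM members
"""

# Only the essential imperative model code lives in the Python UDF,
# and it scores all rows in one vectorized call rather than per member.
def run_model(features: np.ndarray, model) -> np.ndarray:
    return model.predict(features)

# Post-processing is again declarative SQL over the scored output.
POST_PROCESS_SQL = """
CREATE TABLE high_risk_members AS
SELECT member_id
FROM member_scores
WHERE churn_score > 0.5
"""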

Page 13:

Comprehensive Solution Architecture: Major U.S. Retailer

[Diagram: a 10-node Kinetica cluster (head node plus Worker 1 through Worker 9) holds fact and dimension tables for various use cases, billions of rows. Various ETL/ELT feeds and full model pipelines (1 through N) drive massive stream ingestion and massive, fast analytics, serving the Prompts Project, fast-streaming projects, and fast-analytics projects. Apache Tomcat application servers front the cluster with a Spring endpoint-oriented architecture and horizontal elastic scaling.]

Page 14:

Use Case Example

Page 15:

MNIST: Simple Image Processing Use Case

A Parametric Model: Python Using TensorFlow

Model Training
• Set of image files stored in a Kinetica database table
• Python UDF in Kinetica using TensorFlow

Model Serving
• Python UDF in Kinetica using TensorFlow
• Input = table TFModel
• Output = table mnist_inference_out

Model Analytics
• SQL!
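For orientation, here is a minimal TensorFlow/Keras sketch of the kind of MNIST model the training UDF would build, plus the flavour of "Model Analytics: SQL". The Keras dataset loader, the network shape, and the label/prediction column names are assumptions; inside Kinetica the images would come from the mnist_training table via a Python UDF instead.

import tensorflow as tf

# Load MNIST from Keras for illustration (the deck stores the images in a
# Kinetica table and reads them inside a UDF instead).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0                      # scale pixels to [0, 1]

# A small parametric model: flatten, one hidden layer, softmax over 10 digits.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1)

# "Model Analytics: SQL!" -- e.g. per-digit accuracy over mnist_inference_out
# (label and prediction column names are assumed).
ANALYTICS_SQL = """
SELECT label,
       AVG(CASE WHEN prediction = label THEN 1.0 ELSE 0.0 END) AS accuracy
FROM mnist_inference_out
GROUP BY label
"""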

Page 16:

Model Training & Inference Data Model: MPP Sharding

[Diagram: the UDF train_nd_udf.py runs as one instance per shard, on TOM 0 through TOM 7, spread across the cluster's machines and ranks. Each TOM holds its local shard (0 through 7) of the tables mnist_training, TFModel, mnist_inference, and mnist_inference_out, and the UDF instance on that TOM trains and infers against its own shard only.]
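To show the shape of that MPP pattern without depending on Kinetica's UDF API, the sketch below simulates it with multiprocessing: the same inference function runs once per shard and each shard keeps its own output, which is how the diagram's per-TOM UDF instances behave. The data, shard count, and scoring function are placeholders, not the deck's train_nd_udf.py.

import numpy as np
from multiprocessing import Pool

NUM_SHARDS = 8  # matches the eight TOMs/shards in the diagram

def infer_on_shard(shard: np.ndarray) -> np.ndarray:
    # Placeholder for the per-shard inference UDF: it only ever sees its own
    # shard's rows and produces that shard's slice of mnist_inference_out.
    return shard.sum(axis=1)

if __name__ == "__main__":
    data = np.random.rand(80_000, 784)             # stand-in for mnist_inference rows
    shards = np.array_split(data, NUM_SHARDS)      # one slice per shard/TOM
    with Pool(NUM_SHARDS) as pool:
        results = pool.map(infer_on_shard, shards) # one worker per shard, in parallel
    for i, out in enumerate(results):
        print(f"shard {i}: {len(out)} inference outputs")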

Page 17:

[email protected]

Thank You! Come get your copy of the O'Reilly Book at Booth G.01!