
Page 1: Prediction DataBase: The Need of The Hour - USGIFusgif.org/system/uploads/4846/original/predictionDB_public.pdf · Prediction DataBase: The Need of The Hour Devavrat Shah Professor

Prediction DataBase: The Need of The Hour

Devavrat Shah

Professor, EECS

Director, Statistics & Data Science

Massachusetts Institute of Technology

Co-Founder, Chief Scientist

Celect, Inc.

Page 2

© 2015 Celect, Inc. All Rights Reserved. Use, reproduction, or disclosure is subject to restrictions set forth in Contract Number 2014-14031000011 and Sub Contract No. Celect 01.

An Ultimate Prediction Engine?

Prediction

Confidence

Provenance

Big Data

Heterogeneous Data

Sparse Data

Up-and-Running Instantly without a Team of Data Scientists

Add Data Incrementally

Stitch Different Data Sources for Better Predictions

Anybody (Excel user) can use it!

Page 3

Existing Paradigm of Statistics / Machine Learning

[Diagram components: Application; DataStore; Manual Data Processing; “Normalized Data”; Model Learning, Prediction; Predictive Queries]

Page 4

Existing Paradigm of Statistics / Machine Learning

[Same diagram as on Page 3, annotated “Getting Rid of This!”]

Page 5

Prediction DataBase

[Diagram components: Application; DataStore; Predictive Queries; “A Software Layer”; “No Manual Processing”]

Page 6

Prediction DataBase: A New Data Infrastructure

A Brief History of DataBase

1970s-80s: relational databases like MySQL and Postgres

1980s-90s: personal databases like Excel

1990s-00s: distributed databases like Cassandra

2000s-10s: search-engine databases like Elasticsearch

Now: prediction databases

Page 7

Formal Description

Schema-less DB: key : value

Atomic Prediction: key : ?

Name Table

1: ‘Vasudha Shivamoggi’

2: ‘Devavrat Shah’

3: ‘Vishal Doshi’

4: ‘Ying-zong Huang’

5: ‘John Andrews’

6: ‘Balaji Rengarajan’

7: ‘Ritesh Madan’

8: ‘Daniel Xu’

9: ?

Page 8

Formal Description

Schema-less DB: key : value

Atomic Prediction: key : ?

Name Table

1: ‘Vasudha Shivamoggi’

2: ‘Devavrat Shah’

3: ‘Vishal Doshi’

4: ‘Ying-zong Huang’

5: ‘John Andrews’

6: ‘Balaji Rengarajan’

7: ‘Ritesh Madan’

8: ‘Daniel Xu’

9: ‘John Tsitsiklis’

Gender Table

1: ‘Female’

2: ‘Male’

3: ‘Male’

4: ‘Male’

5: ‘Male’

6: ‘Male’

7: ‘Male’

8: ‘Male’

9: ?

Page 9

Formal Description

Same Name and Gender tables as on Page 8, with the Atomic Prediction query 9: ? in the Gender Table. The answer:

May be ‘Male’

Page 10

Formal Description

Schema-less Prediction DB: (key, table name) : value

Atomic Prediction: (key, table name) : ?

(1, ‘Name’): ‘Vasudha Shivamoggi’

(2, ‘Name’): ‘Devavrat Shah’

(3, ‘Name’): ‘Vishal Doshi’

(4, ‘Name’): ‘Ying-zong Huang’

(5, ‘Name’): ‘John Andrews’

(6, ‘Name’): ‘Balaji Rengarajan’

(7, ‘Name’): ‘Ritesh Madan’

(8, ‘Name’): ‘Daniel Xu’

(9, ‘Name’): ‘John Tsitsiklis’

(1, ‘Gender’): ‘Female’

(2, ‘Gender’): ‘Male’

(3, ‘Gender’): ‘Male’

(4, ‘Gender’): ‘Male’

(5, ‘Gender’): ‘Male’

(6, ‘Gender’): ‘Male’

(7, ‘Gender’): ‘Male’

(8, ‘Gender’): ‘Male’

(9, ‘Gender’): ?
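A minimal Python sketch of the (key, table name) formulation, assuming the store is an in-memory dict. The helpers `to_composite` and `atomic_predict` are hypothetical names, and the majority-vote predictor is a stand-in chosen only so the query has an answer; the slides do not describe the model Celect uses. The answer is returned with a confidence and a provenance list, echoing the "Confidence" and "Provenance" items on Page 2.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple

# (key, table name) -> value, as on the slide. Illustrative only.
Address = Tuple[Hashable, str]
Store = Dict[Address, object]


def to_composite(tables: Dict[str, Dict[Hashable, object]]) -> Store:
    """Flatten per-table key:value maps into one (key, table name):value map."""
    return {(key, table): value
            for table, rows in tables.items()
            for key, value in rows.items()}


def atomic_predict(store: Store, key: Hashable, table: str
                   ) -> Tuple[Optional[object], float, List[Address]]:
    """Answer (key, table) : ? with a (value, confidence, provenance) triple.

    Placeholder model (an assumption, not the method from the talk): predict
    the most frequent observed value in the same table. Confidence is that
    value's share of the observed cells; provenance lists the supporting cells.
    """
    if (key, table) in store:  # observed cell: return it as-is
        return store[(key, table)], 1.0, [(key, table)]
    observed = [(k, v) for (k, t), v in store.items() if t == table]
    if not observed:  # nothing to learn from in this table
        return None, 0.0, []
    value, count = Counter(v for _, v in observed).most_common(1)[0]
    support = [(k, table) for k, v in observed if v == value]
    return value, count / len(observed), support


# Data from the slides; key 9 has a name but no gender.
names = {1: "Vasudha Shivamoggi", 2: "Devavrat Shah", 3: "Vishal Doshi",
         4: "Ying-zong Huang", 5: "John Andrews", 6: "Balaji Rengarajan",
         7: "Ritesh Madan", 8: "Daniel Xu", 9: "John Tsitsiklis"}
genders = {1: "Female", **{k: "Male" for k in range(2, 9)}}
store = to_composite({"Name": names, "Gender": genders})

value, confidence, provenance = atomic_predict(store, 9, "Gender")
print(value, confidence, len(provenance))  # Male 0.875 7
```

On the slide data this answers (9, ‘Gender’): ? with ‘Male’ at confidence 7/8, in line with the “May be ‘Male’” answer on Page 9. The baseline ignores the other tables (such as Name), which a real prediction engine would presumably exploit.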

Page 11

Formal Description

Schema-less Prediction DB: (key, table name) : value, or (key1, key2, table name) : value

Atomic Prediction: (key, table name) : ?, or (key1, key2, table name) : ?

A value can be text, numeric, image, or GeoJSON.
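A Python sketch of the two address shapes and the four value kinds, assuming images are stored as raw bytes and GeoJSON as an already-parsed dict; both encodings, the `SchemalessStore` class, and the hypothetical "Location" table with placeholder coordinates are illustrative assumptions, not details from the talk.

```python
from typing import Dict, Hashable, Optional, Tuple

# GeoJSON object types (RFC 7946); used to recognize GeoJSON values.
GEOJSON_TYPES = {"Point", "MultiPoint", "LineString", "MultiLineString",
                 "Polygon", "MultiPolygon", "GeometryCollection",
                 "Feature", "FeatureCollection"}

Address = Tuple[Hashable, ...]  # (key, table) or (key1, key2, table)


def value_kind(value: object) -> str:
    """Classify a value into one of the four kinds listed on the slide.

    Encodings are assumptions: images as raw bytes, GeoJSON as parsed dicts.
    """
    if isinstance(value, str):
        return "text"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, (bytes, bytearray)):
        return "image"
    if isinstance(value, dict) and value.get("type") in GEOJSON_TYPES:
        return "geojson"
    raise TypeError(f"unsupported value: {value!r}")


class SchemalessStore:
    """Hypothetical store whose cells need no schema beyond their address."""

    def __init__(self) -> None:
        self.cells: Dict[Address, Tuple[str, object]] = {}

    def put(self, address: Address, value: object) -> None:
        # The last element names the table; one or two keys precede it.
        if len(address) not in (2, 3) or not isinstance(address[-1], str):
            raise ValueError("address must be (key, table) or (key1, key2, table)")
        self.cells[address] = (value_kind(value), value)

    def get(self, address: Address) -> Optional[Tuple[str, object]]:
        # None means the cell is unobserved, i.e. a candidate '?' query.
        return self.cells.get(address)


store = SchemalessStore()
store.put((9, "Name"), "John Tsitsiklis")
store.put((1, 2, "SMS"), "Meet @ Hyatt Dulles")
# Hypothetical table and placeholder coordinates, for illustration only.
store.put((1, "Location"), {"type": "Point", "coordinates": [0.0, 0.0]})
print(store.get((1, "Location"))[0])  # geojson
print(store.get((9, "Gender")))       # None
```

Keeping the kind next to each value lets a predictor choose a per-kind model without any table-level schema, which is one plausible reading of "schema-less" here.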

Page 12

Graph DB: A Special Case

Same Schema-less Prediction DB and Atomic Prediction definitions as on Page 11.

[Figure: a graph with nodes 1, 2, 3 and 4]

Page 13

Graph DB: A Special Case

The graph from Page 12 gains a cell between nodes 1 and 2:

(1, 2, retweet) : ‘GeoInt’

Page 14

Graph DB: A Special Case

The graph gains a second cell between nodes 1 and 2:

(1, 2, SMS) : ‘Meet @ Hyatt Dulles’

Page 15

Graph DB: A Special Case

The graph gains a single-key cell for node 1:

(1, name) : ‘Dev’
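One way to read the “special case” claim, sketched in Python under the assumption that cells live in a plain dict: two-part addresses become node attributes and three-part addresses become labeled edges, so two tables linking the same pair of nodes (retweet and SMS) yield a multigraph. `as_graph` is a hypothetical helper, not part of any Celect API.

```python
from typing import Dict, Hashable, List, Tuple

# Cells from the graph slides: edges use (key1, key2, table name),
# node attributes use (key, table name).
cells: Dict[Tuple[Hashable, ...], object] = {
    (1, 2, "retweet"): "GeoInt",
    (1, 2, "SMS"): "Meet @ Hyatt Dulles",
    (1, "name"): "Dev",
}

Edge = Tuple[Hashable, Hashable, str, object]


def as_graph(store: Dict[Tuple[Hashable, ...], object]
             ) -> Tuple[Dict[Hashable, Dict[str, object]], List[Edge]]:
    """Read the cells as a labeled multigraph: node attributes plus edges."""
    nodes: Dict[Hashable, Dict[str, object]] = {}
    edges: List[Edge] = []
    for address, value in store.items():
        if len(address) == 2:  # (key, table): an attribute of one node
            key, table = address
            nodes.setdefault(key, {})[table] = value
        elif len(address) == 3:  # (key1, key2, table): an edge between two nodes
            k1, k2, table = address
            nodes.setdefault(k1, {})
            nodes.setdefault(k2, {})
            edges.append((k1, k2, table, value))
    return nodes, edges


nodes, edges = as_graph(cells)
print(nodes)       # {1: {'name': 'Dev'}, 2: {}}
print(len(edges))  # 2
```

In this reading, an unobserved cell such as (1, 3, retweet) : ? would be a link-prediction query, and (2, name) : ? a node-attribute query, both answered through the same atomic prediction interface.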

Page 16

This is Not A Pipe Dream: Celect Has Built It

[Diagram components: Application; DataStore; Schema Definition; Predictive Queries]

Page 17

Celect in Retail: At Fortune 500 Scale

5% to 15% Increase in Revenue

System in Cloud Auto-Scales as Data Grows

100 Gbs/Day

100M+ Customers

100M+ Products

< 100 milliseconds API Response Time

Page 18

Celect Beyond Retail

[Figure: question marks]

Page 19

Parting Remarks

Prediction Database

New Paradigm for Modern Statistics and Machine Learning

Make Prediction a “Special” Database Query

Celect, Inc. Has Built Such an Infrastructure

Can Support Most (If Not All) Problems of Interest

Successful in Retail Industry

Intriguing Case Studies Beyond Retail

Handles Unstructured Data: Text, GeoSpatial, Image