velox: models in action

Presented by Dan Crankshaw

Henry Milner, Joseph Gonzalez, Peter Bailis, Haoyuan Li, Tomer Kaftan, Zhao Zhang, Ali Ghodsi, Michael Franklin, Michael Jordan, and Ion Stoica

https://amplab.cs.berkeley.edu/projects/velox/

Uploaded by dan-crankshaw on 13 July 2015



MODELS AT REST

[Diagram: Data, Model, and Predictions connected by the steps Train, Predict, and Observe; labeled "Well Studied".]

[Diagram: the learning lifecycle, with Data, Model, and Predictions connected by Training, Serving, and Feedback; labeled "Open Challenges".]

Velox Model Management System

Catify: Music for Cats

[Diagram: Node.js App Server, Apache Web Server, MongoDB]

MODELING TASK

[Figure: Ratings plotted against Songs, then overlaid with the model's Prediction.]


Catify: Music for Cats

[Diagram: Tachyon + HDFS, Pipeline]

CatID   Song   Score
1       16     2.1
1       14     3.7
3       273    4.2
4       14     1.9


[Diagram: Pipeline, Tachyon + HDFS, Node.js App Server, Apache Web Server, MongoDB]


[Diagram: Pipeline, Tachyon + HDFS, Node.js App Server, NGINX, MongoDB; labeled "Materialize all predictions".]

Catify: Music for Cats

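[Diagram: storing the model as separate Users and Songs factors takes O(users + songs) space, while materializing a prediction for every (user, song) pair takes O(users × songs).]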
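To make the gap concrete, here is a back-of-envelope count of stored values for both designs. It assumes (our assumption, not from the talk) a factored model with one d-dimensional vector per user and per song; the sizes in `main` are hypothetical.

```scala
// Back-of-envelope storage counts for the two designs on the slide.
// Counts are numbers of stored values; the sizes in main are hypothetical.
object StorageEstimate {
  // Factored model: one d-dimensional vector per user plus one per song.
  def factoredEntries(users: Long, songs: Long, d: Long): Long =
    (users + songs) * d

  // Materialized predictions: one score for every (user, song) pair.
  def materializedEntries(users: Long, songs: Long): Long =
    users * songs

  def main(args: Array[String]): Unit = {
    val (users, songs, d) = (1000000L, 100000L, 50L) // hypothetical sizes
    println(s"factored:     ${factoredEntries(users, songs, d)}")
    println(s"materialized: ${materializedEntries(users, songs)}")
  }
}
```

With these made-up sizes the factored model holds 55,000,000 values while the materialized table holds 100,000,000,000, roughly 1,800 times more.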


Catify: Music for Cats

[Diagram: Pipeline, Tachyon + HDFS, Node.js App Server, NGINX, MongoDB, now also labeled "Training Data" and "New Model".]

What’s wrong?

1. Built from scratch for each application
2. Different systems
3. Space inefficient
4. Stale predictions
5. The T-Swift effect: sample bias


The Missing Piece

[Diagram: Pipeline, Tachyon + HDFS, Web Application, and Velox, which consists of a Prediction Service and a Model Manager.]

BENEFITS

1. Low-latency and scalable predictions as a service
2. Integrated approach leads to fresher, better predictions
3. Easy translation to production predictions
4. Eases operational pain

PERSONALIZED MODELING

$\text{Rating} = w_u \cdot f(x; \theta)$

Shared basis feature models $f(x; \theta)$: change slowly

Personalized user model $w_u$: highly dynamic
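Written out in coordinates, and assuming (our assumption, not stated on the slide) that $w_u$ and $f(x; \theta)$ are both $d$-dimensional, the rating is a plain dot product. Only $w_u$ depends on the user; $\theta$ is shared by every user:

```latex
\text{Rating}(u, x) = w_u \cdot f(x; \theta) = \sum_{i=1}^{d} w_{u,i}\, f_i(x; \theta),
\qquad w_u \in \mathbb{R}^{d},\; f(x; \theta) \in \mathbb{R}^{d}
```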


VELOX

[Diagram: Pipeline, Tachyon + HDFS, Web Application, and Velox (Prediction Service, Model Manager); labeled "Predictions as a service".]

PREDICTION API

GET  /velox/catify/predict_top_k?userid=22&k=100

GET  /velox/catify/predict?userid=22&song=27632

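A minimal sketch of what `predict_top_k` might compute: score each candidate song for one user with the linear model and keep the k best. The in-memory `Map` of precomputed song features and the `TopK` object are hypothetical, not part of Velox.

```scala
// Hypothetical sketch of predict_top_k: score each candidate song for one user
// with w_u · f(x; θ) and keep the k highest-scoring songs.
// The in-memory Map of precomputed song features is an assumption for illustration.
object TopK {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def predictTopK(
      wu: Array[Double],                      // the user's weight vector w_u
      songFeatures: Map[Int, Array[Double]],  // song id -> f(song; θ)
      k: Int
  ): Seq[(Int, Double)] =
    songFeatures.toSeq
      .map { case (song, f) => song -> dot(wu, f) }
      .sortBy(p => -p._2) // highest score first
      .take(k)
}
```

A full sort costs O(n log n) in the catalog size; for large catalogs a bounded priority queue of size k would be the cheaper choice.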

PREDICTIONS

def predict(u: UUID, x: Context)

$w_u \cdot f(x; \theta)$

Look up user weight ($w_u$)

Compute features ($f(x; \theta)$)
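The two annotated steps can be sketched as follows. This assumes (hypothetically) that `Context` is just a song id, that user weights sit in an in-memory `Map`, and that the featurizer is any function standing in for $f(x; \theta)$; it is an illustration, not Velox's implementation.

```scala
import java.util.UUID

// Minimal sketch of predict(u, x) = w_u · f(x; θ). Assumptions not taken from the
// talk: Context is simply a song id, the user weights live in an in-memory Map,
// and the featurizer is an arbitrary function standing in for f(x; θ).
object PredictSketch {
  type Context = Int

  final case class Model(
      featurize: Context => Array[Double],   // f(x; θ): shared by all users, changes slowly
      userWeights: Map[UUID, Array[Double]]  // w_u: one vector per user, highly dynamic
  ) {
    // Returns None when no weights are stored for the user yet.
    def predict(u: UUID, x: Context): Option[Double] =
      userWeights.get(u).map { wu =>                 // 1. look up user weight
        val f = featurize(x)                         // 2. compute features
        wu.zip(f).map { case (w, fi) => w * fi }.sum // 3. dot product
      }
  }
}
```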

LOW-LATENCY PREDICTIONS

[Diagram: Partition 0, Partition 1, and Partition 2, each a Velox instance on Tachyon; labeled "Partition users" next to the "Look up user weight" and "Compute Features" steps of $w_u \cdot f(x; \theta)$.]
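One way to "partition users" is to route every request for a user to the node that stores that user's $w_u$, so the weight lookup stays local. The sketch below assumes hash partitioning; the slide does not say which scheme Velox uses, and `UserPartitioner` is a hypothetical name.

```scala
import java.util.UUID

// Hypothetical routing sketch: each user id maps deterministically to one
// partition, so requests for user u always reach the node that holds w_u.
// Hash partitioning is an assumption; the slide only says "Partition users".
object UserPartitioner {
  def partitionFor(u: UUID, numPartitions: Int): Int = {
    require(numPartitions > 0, "need at least one partition")
    Math.floorMod(u.hashCode, numPartitions) // non-negative even for negative hashes
  }
}
```

`Math.floorMod` is used instead of `%` because `hashCode` can be negative and `%` would then return a negative partition index.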

LOW-LATENCY PREDICTIONS

[Diagram: Velox on Tachyon with a Feature Cache; labeled "Features shared between users".]
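Because $f(x; \theta)$ does not depend on the user, features computed for one request can be reused for every other user. A minimal sketch, assuming features are keyed by song id; `FeatureCache` is a hypothetical name and the unbounded map is a simplification.

```scala
import scala.collection.mutable

// Minimal sketch of a feature cache keyed by song id. Because f(x; θ) does not
// depend on the user, a vector computed for one user's request can be reused for
// every other user. The unbounded HashMap is a simplification; a production cache
// would bound its size and evict entries.
final class FeatureCache(compute: Int => Array[Double]) {
  private val cache = mutable.HashMap.empty[Int, Array[Double]]
  private var computed = 0 // how many times f(x; θ) was actually evaluated

  def misses: Int = computed

  def features(song: Int): Array[Double] =
    cache.getOrElseUpdate(song, { computed += 1; compute(song) })

  // Drop everything when θ changes, e.g. after offline retraining.
  def invalidate(): Unit = cache.clear()
}
```

Invalidation is tied to θ because cached vectors are only valid for the basis functions that produced them.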


SIMPLE EXPLORATION

[Figure: Rating plotted against Songs, with the model's Prediction.]

Epsilon-greedy
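Epsilon-greedy can be sketched in a few lines: with probability epsilon, recommend a random song (explore); otherwise recommend the song with the highest predicted rating (exploit). The `EpsilonGreedy` object and the seeded `Random` are illustrative assumptions, not Velox code.

```scala
import scala.util.Random

// Sketch of epsilon-greedy recommendation: with probability epsilon pick a random
// song (explore); otherwise pick the song with the highest predicted rating (exploit).
// Names and the choice of epsilon are illustrative, not values from the talk.
object EpsilonGreedy {
  def choose(predictions: Map[Int, Double], epsilon: Double, rng: Random): Int = {
    require(predictions.nonEmpty && epsilon >= 0.0 && epsilon <= 1.0)
    if (rng.nextDouble() < epsilon) {
      val songs = predictions.keys.toVector
      songs(rng.nextInt(songs.size)) // explore: uniformly random song
    } else {
      predictions.maxBy(_._2)._1     // exploit: best predicted rating
    }
  }
}
```

Exploration here ignores how uncertain each prediction is, which is the gap the LinUCB approach addresses.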

ACTIVE LEARNING: LinUCB

[Figure: Rating plotted against Songs, showing each song's Prediction and its Uncertainty.]

Look at upper confidence bound

Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web (WWW '10). New York, NY, USA: ACM. doi:10.1145/1772690.1772758
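The upper-confidence-bound rule can be sketched as the disjoint LinUCB score from the cited paper, $\hat\theta^\top x + \alpha\sqrt{x^\top A^{-1} x}$, for a single user. In this sketch we keep $A^{-1}$ directly and update it with the Sherman-Morrison formula. The class name, `alpha`, and the use of song features as `x` are our assumptions.

```scala
// Sketch of disjoint LinUCB (Li et al., 2010) for a single user with d-dimensional
// song features x. A^{-1} is kept directly and updated with the Sherman-Morrison
// formula, so no explicit matrix inversion is needed. alpha is an assumed knob that
// scales the uncertainty bonus.
final class LinUCB(d: Int, alpha: Double) {
  // A = I + sum of x x^T starts as the identity, so A^{-1} does too.
  private val aInv: Array[Array[Double]] =
    Array.tabulate(d, d)((i, j) => if (i == j) 1.0 else 0.0)
  private val b: Array[Double] = Array.fill(d)(0.0) // b = sum of r * x

  private def aInvTimes(x: Array[Double]): Array[Double] =
    Array.tabulate(d)(i => (0 until d).map(j => aInv(i)(j) * x(j)).sum)

  private def dot(u: Array[Double], v: Array[Double]): Double =
    (0 until d).map(i => u(i) * v(i)).sum

  // Upper confidence bound: theta^T x + alpha * sqrt(x^T A^{-1} x), with theta = A^{-1} b.
  def ucb(x: Array[Double]): Double = {
    val theta = aInvTimes(b)
    dot(theta, x) + alpha * math.sqrt(dot(x, aInvTimes(x)))
  }

  // Recommend the candidate song with the largest upper confidence bound.
  def choose(candidates: Map[Int, Array[Double]]): Int = {
    require(candidates.nonEmpty)
    candidates.maxBy(kv => ucb(kv._2))._1
  }

  // Record observed reward r for features x: A += x x^T, b += r x.
  def update(x: Array[Double], r: Double): Unit = {
    val ax = aInvTimes(x)          // A^{-1} x (A^{-1} is symmetric)
    val denom = 1.0 + dot(x, ax)
    for (i <- 0 until d; j <- 0 until d)
      aInv(i)(j) -= ax(i) * ax(j) / denom
    for (i <- 0 until d) b(i) += r * x(i)
  }
}
```

Rarely observed directions keep a large $x^\top A^{-1} x$ term, so a song whose predicted rating is uncertain can outrank one that is known to be mediocre.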

Catify: Music for Cats

[Diagram: Pipeline, Tachyon + HDFS, Node.js App Server, NGINX, MongoDB, and Velox (Prediction Service, Model Manager).]

[Diagram: the learning lifecycle, with Data, Model, and Predictions connected by Training, Serving, and Feedback; labeled "Mgmt." and "Realtime Learning".]


USER-FACING API

GET  /velox/catify/predict?userid=22&song=27632

GET  /velox/catify/predict_top_k?userid=22&k=100

POST  /velox/catify/observe?userid=22&song=27632&score=3.7
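The observe request carries three query parameters. This sketch pulls them out with `java.net.URI`. The `ObserveRequest` helper and its `Observation` type are hypothetical; only the parameter names come from the example URL. Note that a second `?` in place of `&` makes the score unparseable.

```scala
import java.net.URI

// Hypothetical sketch: parse the observe request's query string into its fields.
// Parameter names follow the example URL; the helper itself is not Velox's API.
object ObserveRequest {
  final case class Observation(userId: String, song: Long, score: Double)

  def parse(url: String): Option[Observation] = {
    val query = Option(new URI(url).getRawQuery).getOrElse("")
    val params = query.split("&").iterator
      .map(_.split("=", 2))                   // "key=value" -> Array(key, value)
      .collect { case Array(k, v) => k -> v }
      .toMap
    for {
      u <- params.get("userid")
      s <- params.get("song").flatMap(_.toLongOption)
      y <- params.get("score").flatMap(_.toDoubleOption)
    } yield Observation(u, s, y)
  }
}
```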

ONLINE UPDATES

def observe(u: UUID, x: Context, y: Score)

$w_u \cdot f(x; \theta)$

Update $w_u$ with the new training point

Basis functions stay fixed
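One simple way to "update $w_u$ with the new training point" while the basis stays fixed is a single stochastic-gradient step on the squared error $\tfrac12 (y - w_u \cdot f)^2$. SGD and the learning rate are our assumptions; the talk does not specify Velox's update rule.

```scala
// Sketch of an online update to w_u only: f = f(x; θ) is held fixed, and one
// stochastic-gradient step on (y - w_u · f)^2 / 2 moves the user's weights toward
// the observed score y. SGD and the learning rate lr are assumptions, not Velox's method.
object OnlineUpdate {
  def sgdStep(wu: Array[Double], f: Array[Double], y: Double, lr: Double): Array[Double] = {
    require(wu.length == f.length, "weights and features must have the same dimension")
    val err = y - wu.zip(f).map { case (w, fi) => w * fi }.sum // residual y - w_u · f
    wu.zip(f).map { case (w, fi) => w + lr * err * fi }        // w_u += lr * err * f
  }
}
```

Because only the small per-user vector changes, each update is O(d) and can run at request time. Retraining θ is left to the offline path.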

[Diagram: the learning lifecycle, with Data, Model, and Predictions connected by Training, Serving, and Feedback; labeled "Mgmt." and "Realtime Learning + Offline Retraining".]


Velox Model Management System

[Diagram: Data, Model, and Predictions connected by Serving and Feedback, with Spark alongside.]

The future of research in scalable learning systems will be in the integration of the learning lifecycle:

[Diagram: Data, Model, and Predictions connected by Training, Serving, and Feedback.]

SUMMARY

• Model training and predictions rely on ad-hoc, manual processes spread across multiple systems.

• The Velox system automatically maintains multiple models while providing low-latency, scalable, and personalized predictions.

• Velox is part of BDAS (the Berkeley Data Analytics Stack) and is coming soon: https://amplab.cs.berkeley.edu/projects/velox/

BACKUP MATERIAL

RETRAIN OFFLINE

def retrainOffline(sc: SparkContext, trainingData: RDD)

$w_u \cdot f(x; \theta)$

Retrain feature functions

Use Spark for batch retrain