PredictionIO – A Machine Learning Server in Scala – SF Scala

Upload: predictionio

Post on 14-Jul-2015

TRANSCRIPT

Page 1: PredictionIO – A Machine Learning Server in Scala – SF Scala

A Machine Learning Server in Scala

Building and deploying ML applications in production in a fraction of the time.

Page 2: PredictionIO – A Machine Learning Server in Scala – SF Scala

Available Tools

Processing Framework • e.g. Apache Spark, Apache Hadoop

Algorithm Libraries • e.g. MLlib, Mahout

Data Storage • e.g. HBase, Cassandra

Page 3: PredictionIO – A Machine Learning Server in Scala – SF Scala

What is Missing?

Integrate everything together nicely and move from prototyping to production.

Page 4: PredictionIO – A Machine Learning Server in Scala – SF Scala

A Classic Recommender Example…

You have a mobile app.

You need a Recommendation Engine: predict products that a customer will like, and show them.

App: Predict products

Predictive model - based on users' previous behaviors

Algorithm - You don't need to write your own: Spark MLlib - ALS algorithm

Page 5: PredictionIO – A Machine Learning Server in Scala – SF Scala

A Classic Recommender Example prototyping…

def pseudocode() {
  // Read training data
  val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { …. })

  // Build a predictive model with an algorithm
  val model = ALS.train(trainingData, 10, 20, 0.01)

  // Make predictions
  allUsers.foreach { user => model.recommendProducts(user, 5) }
}
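The elided `match` in the pseudocode turns each line of text into a rating record. A minimal sketch of what that step might look like in plain Scala (no Spark), assuming a hypothetical `user,item,rating` line format and a hypothetical `RatingRow` type, neither of which appears in the slides:

```scala
import scala.util.Try

// Hypothetical record type; the slides do not show the real one.
final case class RatingRow(user: String, item: String, rating: Double)

object TrainingLineParser {
  // Assumes each line looks like "user,item,rating".
  // Lines with the wrong number of fields or a non-numeric rating yield None.
  def parse(line: String): Option[RatingRow] =
    line.split(',') match {
      case Array(user, item, rating) =>
        Try(rating.trim.toDouble).toOption
          .map(value => RatingRow(user.trim, item.trim, value))
      case _ => None
    }
}
```

Returning an `Option` lets malformed lines be dropped instead of failing the whole training run, which is one reasonable choice for prototyping.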

Page 6: PredictionIO – A Machine Learning Server in Scala – SF Scala

Beyond Prototyping

• How do you deploy a scalable service that responds to dynamic prediction queries?

• How do you persist the predictive model in a distributed environment?

• How do you make HBase, Spark and the algorithms talk to each other?

• How should you prepare, or transform, the data for model training?

• How do you update the model with new data without downtime?

• Where should you add business logic?

• How do you make the code configurable, reusable and maintainable?

• How do you build all of this with separation of concerns (SoC)?

Page 7: PredictionIO – A Machine Learning Server in Scala – SF Scala

A Classic Recommender Example in production…

Mobile App

Data: User Actions

Event Server (data storage)

Query via REST: User ID

Engine

Predicted Result: A list of Product IDs

Page 8: PredictionIO – A Machine Learning Server in Scala – SF Scala

PredictionIO

• PredictionIO is a machine learning server for building and deploying predictive engines in production in a fraction of the time.

• Built on Apache Spark, MLlib and HBase.

Page 9: PredictionIO – A Machine Learning Server in Scala – SF Scala

Event Server

Mobile App

Data: User Actions

Event Server (data storage)

Query via REST: User ID

Engine

Predicted Result: A list of Product IDs

Page 10: PredictionIO – A Machine Learning Server in Scala – SF Scala

Event Server: Collecting Data

• $ pio eventserver

• Event-based

client.create_event(
    event="rate",
    entity_type="user",
    entity_id="user_123",
    target_entity_type="item",
    target_entity_id="item_100",
    properties={ "rating" : 5.0 }
)
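The same "rate" event can be pictured as a JSON document sent to the Event Server. The sketch below builds such a document in plain Scala. The `RateEvent` type and `EventJson` helper are hypothetical, and the camelCase field names are an assumption about the REST payload rather than something shown on the slide, so check them against the Event Server documentation before relying on them:

```scala
// Mirrors the arguments of the create_event call on the slide.
// Hypothetical type, not part of any PredictionIO SDK.
final case class RateEvent(
  event: String,
  entityType: String,
  entityId: String,
  targetEntityType: String,
  targetEntityId: String,
  rating: Double
)

object EventJson {
  // Hand-rolled JSON for illustration only.
  // Assumes the string values contain nothing that needs escaping.
  def toJson(e: RateEvent): String =
    s"""{"event":"${e.event}","entityType":"${e.entityType}","entityId":"${e.entityId}",""" +
    s""""targetEntityType":"${e.targetEntityType}","targetEntityId":"${e.targetEntityId}",""" +
    s""""properties":{"rating":${e.rating}}}"""
}
```

In real code a JSON library would be a safer choice than string interpolation, since it handles escaping and number formatting.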

Page 11: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine

Mobile App

Data: User Actions

Event Server (data storage)

Query via REST: User ID

Engine

Predicted Result: A list of Product IDs

Page 12: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: Building an Engine with Separation of Concerns (SoC)

• DASE - the "MVC" for Machine Learning

• Data: Data Source and Data Preparator

• Algorithm(s)

• Serving

• Evaluator
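The separation of concerns behind DASE can be sketched without Spark. The traits below are simplified, hypothetical stand-ins and are not PredictionIO's actual classes. The Evaluator is left out, and all algorithms share one model type to keep the sketch short:

```scala
// Simplified, Spark-free stand-ins for the D, A and S roles of DASE.
trait DataSource[TD] { def readTraining(): TD }
trait Preparator[TD, PD] { def prepare(trainingData: TD): PD }
trait Algorithm[PD, M, Q, P] {
  def train(prepared: PD): M
  def predict(model: M, query: Q): P
}
trait Serving[Q, P] { def serve(query: Q, predictions: Seq[P]): P }

// Wires the components together: train once, then answer queries.
final class MiniEngine[TD, PD, M, Q, P](
  dataSource: DataSource[TD],
  preparator: Preparator[TD, PD],
  algorithms: Seq[Algorithm[PD, M, Q, P]],
  serving: Serving[Q, P]
) {
  // One model per algorithm, in the same order as `algorithms`.
  def train(): Seq[M] = {
    val prepared = preparator.prepare(dataSource.readTraining())
    algorithms.map(_.train(prepared))
  }

  // Every algorithm predicts with its own model; Serving picks the final answer.
  def query(models: Seq[M], q: Q): P =
    serving.serve(q, algorithms.zip(models).map { case (a, m) => a.predict(m, q) })
}

object MiniEngineDemo {
  // Toy components: the "model" is just the mean of the positive ratings.
  val engine = new MiniEngine[Seq[Double], Seq[Double], Double, String, Double](
    new DataSource[Seq[Double]] {
      def readTraining(): Seq[Double] = Seq(4.0, 5.0, 3.0)
    },
    new Preparator[Seq[Double], Seq[Double]] {
      def prepare(trainingData: Seq[Double]): Seq[Double] = trainingData.filter(_ > 0.0)
    },
    Seq(new Algorithm[Seq[Double], Double, String, Double] {
      def train(prepared: Seq[Double]): Double = prepared.sum / prepared.size
      def predict(model: Double, query: String): Double = model
    }),
    new Serving[String, Double] {
      def serve(query: String, predictions: Seq[Double]): Double = predictions.head
    }
  )
}
```

Because each role sits behind its own interface, a data source or an algorithm can be swapped without touching the others, which is the point of the "MVC" comparison.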

Page 13: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: Functions of an Engine

A. Train deployable predictive model(s)

B. Respond to dynamic query

C. Evaluation

Page 14: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: A. Train predictive model(s)

class DataSource(…) extends PDataSource
  def readTraining(sc: SparkContext) ==> trainingData

class Preparator(…) extends PPreparator
  def prepare(sc: SparkContext, trainingData: TrainingData) ==> preparedData

class Algorithm1(…) extends PAlgorithm
  def train(preparedData: PreparedData) ==> Model

$ pio train

Page 15: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: A. Train predictive model(s)

class DataSource(…) extends PDataSource

  override def readTraining(sc: SparkContext): TrainingData = {
    // Read the events collected by the Event Server
    val eventsDb = Storage.getPEvents()
    val eventsRDD: RDD[Event] = eventsDb.find(….)(sc)

    // Convert each event into a rating
    val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
      val rating = try {
        val ratingValue: Double = event.event match {….}
        Rating(event.entityId, event.targetEntityId.get, ratingValue)
      } catch {…}
      rating
    }
    new TrainingData(ratingsRDD)
  }

Page 16: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: A. Train predictive model(s)

class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm

  def train(preparedData: PreparedData): Model1 = {
    val mllibRatings = data….
    // Train with MLlib's ALS using the engine's algorithm parameters
    val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda)
    new Model1(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures
    )
  }

Page 17: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: A. Train predictive model(s)

Event Server

Engine

Data Source

TrainingData

Data Preparator

PreparedData

Algorithm 1, Algorithm 2, Algorithm 3

Model 1, Model 2, Model 3

Page 18: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: B. Respond to dynamic query

• Query (Input):

$ curl -H "Content-Type: application/json" -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json

case class Query(
  val user: String,
  val num: Int
) extends Serializable

Page 19: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: B. Respond to dynamic query

• Predicted Result (Output):

{"itemScores":[{"item":"22","score":4.072304374729956},{"item":"62","score":4.058482414005789},{"item":"75","score":4.046063009943821}]}

case class PredictedResult(
  val itemScores: Array[ItemScore]
) extends Serializable

case class ItemScore(
  item: String,
  score: Double
) extends Serializable
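To show how a query's `num` field relates to the `itemScores` array in the result, here is a minimal sketch that ranks items by score and keeps the best `num`. The case classes follow the slides (minus `Serializable`), while the `TopN` helper and its `scores` map are hypothetical. In the real engine the scores would come from the trained model:

```scala
final case class Query(user: String, num: Int)
final case class ItemScore(item: String, score: Double)
final case class PredictedResult(itemScores: Array[ItemScore])

object TopN {
  // Ranks hypothetical per-item scores and keeps at most `query.num` of them,
  // highest score first.
  def recommend(scores: Map[String, Double], query: Query): PredictedResult = {
    val ranked = scores.toSeq
      .sortWith { case ((_, a), (_, b)) => a > b }
      .take(query.num)
      .map { case (item, score) => ItemScore(item, score) }
    PredictedResult(ranked.toArray)
  }
}
```

For example, `TopN.recommend(Map("22" -> 4.07, "62" -> 4.05, "9" -> 1.0), Query("1", 2))` keeps items "22" and "62", in that order.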

Page 20: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: B. Respond to dynamic query

Query via REST

class Algorithm1(…) extends PAlgorithm
  def predict(model: ALSModel, query: Query) ==> predictedResult

class Serving extends LServing
  def serve(query: Query, predictedResults: Seq[PredictedResult]) ==> predictedResult
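Serving receives one predicted result per algorithm and must return a single one. The slides do not say how results are combined, so the sketch below picks one possible strategy: average each item's score across the algorithms that returned it. `ScoredItem`, `Prediction` and `AveragingServing` are hypothetical stand-ins for the engine's own types:

```scala
final case class ScoredItem(item: String, score: Double)
final case class Prediction(items: Seq[ScoredItem])

object AveragingServing {
  // Averages each item's score over the algorithms that returned it,
  // then keeps the `num` items with the highest average.
  def serve(num: Int, predictions: Seq[Prediction]): Prediction = {
    val averaged = predictions
      .flatMap(_.items)
      .groupBy(_.item)
      .map { case (item, scored) =>
        ScoredItem(item, scored.map(_.score).sum / scored.size)
      }
      .toSeq
      .sortWith(_.score > _.score)
    Prediction(averaged.take(num))
  }
}
```

Serving is also a natural place for business logic such as filtering out unavailable items, since it sees the final candidate list before it is returned to the client.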

Page 21: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: B. Respond to dynamic query

class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm

  def predict(model: ALSModel, query: Query): PredictedResult = {
    model….{ userInt =>
      val itemScores = model.recommendProducts(…).map(….)
      new PredictedResult(itemScores)
    }.getOrElse{….}
  }

Page 22: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: B. Respond to dynamic query

Mobile App

Query (input)

Engine

Algorithm 1, Model 1

Algorithm 2, Model 2

Algorithm 3, Model 3

Predicted Results

Serving

Predicted Result (output)

Page 23: PredictionIO – A Machine Learning Server in Scala – SF Scala

Engine: DASE Factory

object RecEngine extends IEngineFactory {
  def apply() = {
    new Engine(
      classOf[DataSource],
      classOf[Preparator],
      Map("algo1" -> classOf[Algorithm1]),
      classOf[Serving])
  }
}

Page 24: PredictionIO – A Machine Learning Server in Scala – SF Scala

Running on Production

• Install PredictionIO

$ bash -c "$(curl -s http://install.prediction.io/install.sh)"

• Start the Event Server

$ pio eventserver

• Deploy an Engine

$ pio build; pio train; pio deploy

• Update Engine Model with New Data

$ pio train; pio deploy

Page 25: PredictionIO – A Machine Learning Server in Scala – SF Scala

Deploy on Production

Website

Mobile App

Email Campaign

Data

Event Server (data storage)

Engine 1

Engine 2

Engine 3

Engine 4

Query via REST

Predicted Result

Page 26: PredictionIO – A Machine Learning Server in Scala – SF Scala

The Next Step

• Quickstart with an Engine Template!

• Follow on Github: github.com/predictionio/

• Learn PredictionIO: prediction.io/

• Learn Scala! Scala for the Impatient

• Contribute!

Page 27: PredictionIO – A Machine Learning Server in Scala – SF Scala

Thanks.

Simon

@PredictionIO

prediction.io (Newsletters)

github.com/predictionio