![Page 1: [Research] deploying predictive models with the actor framework - Brian Gawalt](https://reader031.vdocument.in/reader031/viewer/2022030310/58f9a9a5760da3da068b70cf/html5/thumbnails/1.jpg)
PAPIs 2015
Akka & Data Science: Making real-time predictions
Brian Gawalt
2nd International Conference on Predictive APIs and Apps
August 7, 2015
[A] Sometimes, data scientists need to worry about throughput.
[B] One way to increase throughput is with concurrency.
[C] The Actor Model is an easy way to build a concurrent system.
[D] Scala+Akka provides an easy-to-use Actor Model context.
[A + B + C + D ⇒ E] Data scientists should check out Scala+Akka.
Consider:
● building a model,
● vs. using a model
Lots of ways to practice building a model
The Classic Process
1. Load your data set’s raw materials
2. Produce feature vectors:
o Training,
o Validation,
o Testing
3. Build the model with training and validation vectors
4. Use the model on test/new feature vectors
The Classic Process: One-time Testing
Build: Load train/valid./test materials → Make train/valid./test feature vectors → Train model
Use: Make test predictions
The Classic Process: Repeated Testing
Build: Load train/valid. materials → Make train/valid. feature vectors → Train model → (saved model)
Use, repeated every K minutes with the saved model: Load test/new materials → Make test/new feature vectors → Make test/new predictions
Sometimes my tasks work like that, too!
But this talk is about the other kind of tasks.
[A] Sometimes, data scientists need to worry about throughput.
Example: Freelancer availability on Upwork
Hiring Freelancers on Upwork
1. Post a job
2. Search for freelancers
3. Find someone you like
4. Ask them to interview
o Request Accepted!
o or rejected/ignored...
THE TASK:
Look at recent freelancer behavior and predict, at Step 2, who’s likely to accept an interview invitation at Step 4.
Building this model is business as usual:
Building Availability Model
1. Load raw materials:
o Examples of accepts/rejects
o Histories of freelancer site activity
Job applications sent or received
Hours worked
Click logs
Profile updates
2. Produce feature vectors:
(Data sources pictured: Greenplum, Amazon S3, Internal Service)
Using Availability Model
Load train/valid. materials → Make train/valid. feature vectors → Train model → (saved model)
Repeated every 60 minutes with the saved model: Load test/new materials → Make test/new feature vectors → Make test/new predictions
Using Availability Model
Repeated every 60 minutes with the saved model: Load test/new materials → Make test/new feature vectors → Make test/new predictions
Loading the test/new materials breaks down into:
● Load job app data (4 min.)
● Load click log data (30 min.)
● Load work hours data (5 min.)
● Load profile data (20 ms/profile)
Using Availability Model
● Left with under 21 minutes to collect profile data
○ Rate limit: 20 ms/profile
○ At most, 63K profiles per hour
● Six million freelancers need availability predictions: expect ~95 hours between re-scorings of any individual
● Still need to spend time actually building vectors and exporting scores!
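The hourly budget can be checked with a little arithmetic. Below is a minimal Python sketch using the figures above (a 60-minute cycle, 4 + 30 + 5 minutes of bulk loads, 20 ms per profile, six million freelancers); the function names are illustrative, and it assumes freelancers are re-scored round-robin.

```python
def hourly_profile_budget(cycle_min=60, bulk_load_min=(4, 30, 5), ms_per_profile=20):
    """Minutes left after the bulk loads, and how many profiles fit in them."""
    remaining_min = cycle_min - sum(bulk_load_min)           # time left for profiles
    profiles = remaining_min * 60 * 1000 // ms_per_profile    # rate-limited fetches
    return remaining_min, profiles


def hours_between_rescores(population, profiles_per_hour):
    """Assumes round-robin scoring: everyone waits for the whole population."""
    return population / profiles_per_hour


remaining, per_hour = hourly_profile_budget()
print(remaining, per_hour)                                    # 21 63000
print(round(hours_between_rescores(6_000_000, per_hour), 1))  # 95.2
```

This ignores the vector-building and export time from the last bullet, so the real interval is longer still.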
[B] One way to increase throughput is with concurrency.
Expensive Option: Major infrastructure overhaul
… but that takes a lot of time, attention, and cooperation…
Simpler Option: The Actor Model
[C] The Actor Model is an easy way to build a concurrent system.
An Actor
● Imagine a mailbox with a brain
● Computation only begins when/if a message arrives
● Keeps its thoughts private:
○ No other actor can actively read this actor’s state
○ Other actors have to wait to hear a message from this actor
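The "mailbox with a brain" idea fits in a few lines. This is a minimal Python sketch, not Akka: each hypothetical `Actor` owns a private `queue.Queue` and one thread that sits idle until a message arrives, then handles messages one at a time. The hypothetical `Tally` keeps its count private (by convention only, since Python can't enforce it) and shares it solely by sending a reply message.

```python
import queue
import threading


class Actor:
    """Hypothetical minimal actor: a private mailbox drained by one private thread."""

    _STOP = object()  # sentinel that shuts the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, msg):
        """The only way to talk to an actor: drop a message in its mailbox."""
        self._mailbox.put(msg)

    def stop(self):
        """Finish every message already queued, then end the thread."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            msg = self._mailbox.get()  # blocked (no computation) until a message arrives
            if msg is self._STOP:
                return
            self.receive(msg)          # each message handled to completion, in order

    def receive(self, msg):
        raise NotImplementedError


class Tally(Actor):
    """Hypothetical actor: counts "inc" messages, reports the count on request."""

    def __init__(self):
        self._count = 0                # private state, set before the thread starts
        super().__init__()

    def receive(self, msg):
        if msg == "inc":
            self._count += 1
        elif isinstance(msg, tuple) and msg[0] == "report":
            msg[1].tell(self._count)   # answer with a message, not a shared variable


class Inbox(Actor):
    """Hypothetical actor that collects replies; read them only after stop()."""

    def __init__(self):
        self.received = []
        super().__init__()

    def receive(self, msg):
        self.received.append(msg)


inbox = Inbox()
tally = Tally()
for _ in range(3):
    tally.tell("inc")
tally.tell(("report", inbox))
tally.stop()            # Tally finishes its queue, including the report
inbox.stop()            # then Inbox finishes handling the reply
print(inbox.received)   # [3]
```

One thread per actor is the simplest arrangement, but it doesn't scale to thousands of actors; sharing threads among actors is the job of an execution context.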
The Actor Model of Concurrency
● Lots of actors, and each has:
○ A private message queue
○ Private state, shared only by sending more messages
● Execution context:
○ Manages threading of each actor’s computation
○ Handles asynchronous message routing
○ Can send prescheduled messages
● Each received message’s computation is fully completed before the actor moves on to the next message in its queue
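An execution context lets many actors share a small thread pool while still honoring the one-message-at-a-time rule. The following Python sketch works under stated assumptions: `ExecutionContext`, `PooledActor`, and `Counter` are hypothetical names, and an actor is handed to a `ThreadPoolExecutor` only when its mailbox goes from idle to non-empty, so at most one pool thread runs its `receive` at any moment. Six sender threads then hammer three counters with no lock around `self.n += 1`, and every count still comes out exact. Prescheduled messages are left out.

```python
import threading
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor


class ExecutionContext:
    """Hypothetical: multiplexes many actors onto a small, fixed pool of threads."""

    def __init__(self, threads=4):
        self.pool = ThreadPoolExecutor(max_workers=threads)

    def shutdown(self):
        self.pool.shutdown(wait=True)


class PooledActor:
    def __init__(self, ctx):
        self._ctx = ctx
        self._mailbox = deque()
        self._lock = threading.Lock()   # guards the mailbox and the flag only
        self._scheduled = False         # True while a pool thread owns this actor

    def tell(self, msg):
        with self._lock:
            self._mailbox.append(msg)
            if self._scheduled:         # someone is already draining this mailbox
                return
            self._scheduled = True
        self._ctx.pool.submit(self._drain)

    def _drain(self):
        # Only one _drain per actor runs at a time, so receive() never overlaps
        # with itself even though the pool has several threads.
        while True:
            with self._lock:
                if not self._mailbox:
                    self._scheduled = False
                    return
                msg = self._mailbox.popleft()
            self.receive(msg)

    def receive(self, msg):
        raise NotImplementedError


class Counter(PooledActor):
    def __init__(self, ctx):
        super().__init__(ctx)
        self.n = 0                      # touched only inside receive(): no lock needed

    def receive(self, msg):
        if msg == "inc":
            self.n += 1
        elif isinstance(msg, Future):
            msg.set_result(self.n)      # reply to an "ask"


ctx = ExecutionContext(threads=4)
counters = [Counter(ctx) for _ in range(3)]


def spam(counter):
    for _ in range(10_000):
        counter.tell("inc")


senders = [threading.Thread(target=spam, args=(c,)) for c in counters for _ in range(2)]
for s in senders:
    s.start()
for s in senders:
    s.join()

answers = []
for c in counters:
    f = Future()
    c.tell(f)                           # queued behind every "inc" already sent
    answers.append(f.result(timeout=10))
print(answers)                          # [20000, 20000, 20000]
ctx.shutdown()
```

Akka's dispatchers also cap how many messages one actor handles before giving up its thread (the `throughput` setting); this sketch drains each mailbox completely, and an exception inside `receive` would leave that actor stuck.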
The Actor Model of Concurrency
(Diagram labeled "Execution Context")
Parallelizing predictions
● Refresh job apps, Refresh work hours, Refresh click log: each scheduled to fetch once per hour, sending raw data to the Vectorizer
● Fetch 10 profiles: scheduled to fetch every 300 ms
● Vectorizer:
○ Keep copies of raw data
○ Emit vector for each new profile received
● Apply model; export prediction
Serial processing (repeat every 60 minutes)
● 55 min of loading:
○ Refresh job apps (4 min)
○ Refresh work hours (5 min)
○ Refresh click log (30 min)
○ Fetch ~50K profiles in what's left: 55 - 4 - 5 - 30 = 16 min
● 5 min: make feature vectors, export predictions
Throughput: 48K users/hr
Parallel Processing with Actors
(Timeline of messages sent and received across four threads, Thr. 1 to Thr. 4; message processing time not to scale)
● Refresh job apps, Refresh click log, Refresh work hrs.: 1/hr. each; received raw data is stored
● Fetch profiles: 3/sec.
● Vectorize and export each profile as it's received
Throughput: 180K users/hr
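Both throughput figures follow from the 20 ms/profile rate limit. This quick Python check shows that the serial loop leaves only 16 minutes per hour for profile fetches, while the actor version's 180K users/hr equals the limit kept busy for the full hour (3,600 s ÷ 20 ms). The re-scoring intervals assume the six million freelancers are visited round-robin, which the talk doesn't state.

```python
MS_PER_PROFILE = 20        # rate limit from the talk
POPULATION = 6_000_000     # freelancers needing availability predictions


def profiles_per_hour(fetch_minutes_per_hour):
    """Profiles fetchable when the rate limit is busy for this many minutes."""
    return fetch_minutes_per_hour * 60 * 1000 // MS_PER_PROFILE


serial = profiles_per_hour(55 - 4 - 5 - 30)   # serial loop: 16 min/hr for profiles
actors = 180_000                              # throughput reported on the slide
ceiling = profiles_per_hour(60)               # rate limit saturated all hour

print(serial, ceiling)                        # 48000 180000
print(actors / serial)                        # 3.75
print(POPULATION / serial, round(POPULATION / actors, 1))  # 125.0 33.3
```

In other words, the actor version stops wasting the rate limit during the hourly bulk loads, roughly a 3.75x gain with no new infrastructure.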
[D] Scala+Akka provides an easy-to-use Actor Model context.
Message passing, scheduling, & computation behavior defined in 445 lines.
Scala+Akka Actors
● Create Scala class, mix in Actor trait
● Implement the required partial function: receive: PartialFunction[Any, Unit]
● Define family of message objects this actor’s planning to handle
● Define behavior for each message case in receive
Scala+Akka Actors
Annotations on the export actor's code:
● Mix in the same code used for export in the non-Actor version
● Private, mutable state: stored scores
● Private, mutable state: time of last export
● If receiving new scores: store them!
● If storing lots of scores, or if it’s been a while: upload what’s stored, then erase it
● If told to shut down, stop accepting new scores
Scala+Akka Pros
● Easy to get productive in the Scala language
● SBT dependency management makes it easy to move to any box with a JRE
● No global interpreter lock!
Scala+Akka Cons
● Moderate Scala learning curve
● Object representation on the JVM has pretty lousy memory efficiency
● Not a lot of great options for building models in Scala (compared to R, Python, Julia)
[A] Sometimes, data scientists need to worry about throughput.
[B] One way to increase throughput is with concurrency.
[C] The Actor Model is an easy way to build a concurrent system.
[D] Scala+Akka provides an easy-to-use Actor Model context.
[A + B + C + D ⇒ Z] Data scientists should check out Scala+Akka.
Thanks! Questions?
bgawalt@{upwork, gmail}.com
twitter.com/bgawalt