August 22, 2012
Scaling Mobile Millennium with BDAS
Timothy Hunter, Teodor Moldovan, Matei Zaharia, Samy Merzgui, Justin Ma, Michael J. Franklin, Pieter Abbeel, Alexandre M. Bayen
UC Berkeley
August 22, 2012 MM on BDAS - AMP Camp 2012 2/34
Machine learning at scale
● Combining A, M and P in a real application:
  ● Complex models (car traffic estimation)
  ● Crowd-sourced data (mobile phones)
  ● Computations on the cloud
● How we run Spark inside Mobile Millennium
Plan
● Why car traffic estimation
● Overview of Mobile Millennium
● 2 minutes of applied Machine Learning
● Programming with the Spark framework
● Conclusion: the good, the bad, the not so beautiful
Need for good traffic estimation
● Traffic congestion affects everyone
● Up-to-date estimation is critical
● Complex for urban streets (arterial roads)
Real-time processing of fleet data
● Input: sampled position of taxicabs
● Observed every minute
Estimating the travel times
● Input: sampled position of taxicabs
● Observed every minute
● Covers the whole SF Bay
● 0.5 million points / day (60M / day total)
● 0.1 Million road links
Filtering of fleet data
Preprocessing:
● Recovering trajectories from GPS points
Mobile Millennium
● A cyberphysical system for participatory sensing
Today: batch jobs outsourced to the cloud
Estimation of arterial traffic
● Input:
  ● Pieces of trajectories between GPS points
● Output: probability distributions of travel time
  ● For each link
  ● Parametrized by vector θ (mean and variance of link travel time)
© Google, Inc.
The way things work
● Example road network
● Associated link travel times:
Measurement sent
Life is not so simple
● Long time between observations
● Solution: sample!
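The sampling idea can be sketched in plain Scala. This is a minimal illustration, not the sampler used in Mobile Millennium: it assumes each link's travel time is Gaussian with the mean and variance in θ, and it splits one observed total time across the links it covers by drawing a time per link and rescaling the draws so they add up to the observation. The names `LinkParam`, `TravelTimeSampling` and `sampleSplit` are hypothetical.

```scala
import scala.util.Random

// Hypothetical per-link parameters: mean and variance of the link travel time (seconds).
final case class LinkParam(mean: Double, variance: Double)

object TravelTimeSampling {
  /** Draw one candidate split of `totalTime` over the links of an observation.
    * Assumption: each link gets a Gaussian draw, floored at a small positive
    * value, and the draws are rescaled so that they sum to the observed total. */
  def sampleSplit(links: Seq[Int],
                  params: Map[Int, LinkParam],
                  totalTime: Double,
                  rng: Random): Seq[(Int, Double)] = {
    require(links.nonEmpty, "an observation must cover at least one link")
    val raw = links.map { id =>
      val p = params(id)
      math.max(1e-3, p.mean + math.sqrt(p.variance) * rng.nextGaussian())
    }
    // Rescale so the partial travel times are consistent with the measurement.
    val scale = totalTime / raw.sum
    links.zip(raw.map(_ * scale))
  }
}
```

Calling `sampleSplit` many times for the same observation gives a set of plausible partial travel times per link, which is the kind of input the learning step consumes.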
Machine learning without saying it
● Procedure called Expectation Maximization
● Iterative in nature:
  ● Alternates between sampling (E step) and learning (M step)
● Some figures:
  ● 50k road links ( parameters)
  ● 50M observations (15 GB, avg. 4 links / observation)
  ● 200M partial travel times
  ● x1000 samples per partial travel time
![Page 24: Scaling Mobile Millennium - UC Berkeley AMP Campampcamp.berkeley.edu/.../tim-hunter-amp-camp-2012-mobile-millenni… · Scaling Mobile Millennium with BDAS Timothy Hunter, Teodor](https://reader033.vdocument.in/reader033/viewer/2022060408/5f0ff1857e708231d446a934/html5/thumbnails/24.jpg)
August 22, 2012 MM on BDAS - AMP Camp 2012 24/34
System workflow
Start link parameters (on master node)
Observations (distributed, persisted across nodes)
Network parameters (distributed over the nodes)
Travel time samples for each observation link
Travel time samples aggregated on a link basis
New parameters are generated that maximize the likelihood of the sampled travel times for each link.
The master collects the vector of new parameters.
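The aggregation and re-estimation steps of this workflow can be illustrated locally with ordinary Scala collections. The sketch below assumes a Gaussian model per link, for which the most likely parameters are the sample mean and the (biased) sample variance. The deck does not state the distribution family, so this is an assumption, and `LinkEstimate`, `LinkAggregation` and `fitPerLink` are hypothetical names.

```scala
// Hypothetical result type: fitted mean and variance of one link's travel time.
final case class LinkEstimate(mean: Double, variance: Double)

object LinkAggregation {
  /** Group (linkId, travelTime) samples by link, then fit one estimate per link.
    * Assumption: Gaussian maximum likelihood, i.e. sample mean and biased variance. */
  def fitPerLink(samples: Seq[(Int, Double)]): Map[Int, LinkEstimate] =
    samples.groupBy(_._1).map { case (linkId, pairs) =>
      val times = pairs.map(_._2)
      val mean = times.sum / times.size
      val variance = times.map(t => (t - mean) * (t - mean)).sum / times.size
      linkId -> LinkEstimate(mean, variance)
    }
}
```

The resulting map plays the role of the vector of new parameters that the master collects before the next iteration.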
Using the Spark programming model
Main loop of the program:

    val observations = spark.textFile("hdfs:...")
      .map(parseObservation _)
      .cache()
    var params = // Initialize model parameters
    while (!converged) {
      // Step 1 (E step): sample travel times for every observation
      val samples = observations.flatMap(
        obs => generateSamples(obs, params))
      // Step 2 (M step): re-estimate the parameters of each link
      params = samples.groupByKey(false).map {
        case (linkId, vals) => mostLikelyParam(linkId, vals)
      }.collect()
    }
The good
● Before using Spark:
  ● 3.5x slower than real-time
  ● Could not even handle all the data
● With Spark:
  ● Similar programming interface (methods on Scala collections)
  ● Very good scalability (near linear)
  ● Each iteration 3x faster than reloading from disk
[Figure: runtime vs. number of cores. NERSC cluster: quad-core Xeon, 4X QDR InfiniBand interconnect]
Efficient utilization of memory
● The observation data is stored in memory:
  ● Be careful with the memory footprint
  ● Look at logs to monitor GC status
● We cache pointer-based structures
  ● Significant overhead in the JVM
● Workaround: use compact collection structures (arrays) and make liberal use of .toArray()
● Workaround: RDDs of serialized data
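As a small illustration of the first workaround, the sketch below converts a pointer-based `List` of pairs into two primitive arrays. A `List[(Int, Double)]` allocates one list cell and one tuple object per element, whereas `Array[Int]` and `Array[Double]` are stored as flat JVM primitive arrays. The name `CompactSamples` and the column layout are assumptions made for this example, not the layout used in the project.

```scala
object CompactSamples {
  /** Pack (linkId, travelTime) pairs into two parallel primitive arrays.
    * Element i of both arrays describes the same sample. */
  def pack(samples: List[(Int, Double)]): (Array[Int], Array[Double]) =
    (samples.map(_._1).toArray, samples.map(_._2).toArray)
}
```

Keeping the two arrays aligned by index trades some convenience for a smaller footprint and fewer objects for the garbage collector to trace.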
Broadcast of large parameters
● Need to share data between all workers:
  ● At the start of the job (network description, > 40MB)
  ● Between iterations (updated parameters θ)
● Using Spark's broadcast
  ● Data loading time reduced by 79%
    // With broadcast
    val network = // load network
    val bc_net = spark.broadcast(network)
    val observations = spark.textFile("...")
      .map(parseObservation(_, bc_net.get()))

    // No broadcast
    val network = // load network
    val observations = spark.textFile("...")
      .map(parseObservation(_, network))
[Figure: data loading time in seconds (axis from 0 to 2500), no broadcast vs. BT broadcast]
Conclusion
● An application of Spark:
  ● Real-world ML problem
  ● Crowd-sourced data
● Implementation now (much) faster than real time
● Not limited by computations:
  ● We can use more complex ML tools than before
Thank you