TRANSCRIPT
Carnegie Mellon
A Framework for Machine Learning and Data Mining in the Cloud
Yucheng Low, Aapo Kyrola, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Joe Hellerstein
Big Data is Everywhere
• YouTube: 72 hours of video uploaded per minute
• Wikipedia: 28 million pages
• Facebook: 900 million users
• Flickr: 6 billion photos
“… data a new class of economic asset, like currency or gold.”
“…growing at 50 percent a year…”
Shift Towards Use of Parallelism in ML
GPUs, Multicore, Clusters, Clouds, Supercomputers
ML experts repeatedly solve the same parallel design challenges:
Race conditions, distributed state, communication…
The resulting code is very specialized: difficult to maintain, extend, and debug…
[Figure: graduate students]
Avoid these problems by using high-level abstractions
MapReduce – Map Phase
[Figure: CPU 1–CPU 4 each independently compute a value for every element of their own data partition]
Embarrassingly parallel: independent computation, no communication needed.
MapReduce – Reduce Phase
[Figure: CPU 1 and CPU 2 fold the mapped values into aggregate totals (e.g. 2226.26 and 1726.31)]
[Example: map extracts image features from faces labeled A (attractive) or U (ugly); reduce aggregates the attractive-face and ugly-face statistics]
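To make the two phases concrete, here is a minimal C++ sketch of the pattern (the feature function and the data values are invented for illustration): the map step is an independent per-element computation, and the reduce step folds the results into one statistic.

#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical per-element "map" step: independent, no communication.
double extract_feature(double image) { return image * 2.0; }

int main() {
  std::vector<double> images = {12.9, 42.3, 21.3, 25.8};  // stand-in data

  // Map phase: each element could be processed on a different CPU.
  std::vector<double> features;
  for (double img : images) features.push_back(extract_feature(img));

  // Reduce phase: fold the mapped values into one aggregate statistic.
  double total = std::accumulate(features.begin(), features.end(), 0.0);
  std::cout << "aggregate statistic: " << total << "\n";
}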
MapReduce for Data-Parallel ML
Excellent for large data-parallel tasks!

Data-Parallel (MapReduce):
• Cross Validation
• Feature Extraction
• Computing Sufficient Statistics

Graph-Parallel:
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Semi-Supervised Learning: Label Propagation, CoEM
• Graph Analysis: PageRank, Triangle Counting
• Collaborative Filtering: Tensor Factorization

Is there more to Machine Learning?
[Figure: a social network whose users are labeled Hockey or Scuba Diving, with an Underwater Hockey user bridging the two groups]
Graphs are Everywhere
• Users – Movies (Netflix): Collaborative Filtering
• Docs – Words (Wiki): Text Analysis
• Social Network: Probabilistic Analysis
Properties of Computation on Graphs:
• Dependency Graph (my interests depend on my friends' interests)
• Local Updates
• Iterative Computation
ML Tasks Beyond Data-Parallelism

Data-Parallel (MapReduce):
• Cross Validation
• Feature Extraction
• Computing Sufficient Statistics

Graph-Parallel:
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Semi-Supervised Learning: Label Propagation, CoEM
• Graph Analysis: PageRank, Triangle Counting
• Collaborative Filtering: Tensor Factorization
Bayesian Tensor Factorization, Gibbs Sampling, Matrix Factorization, Lasso, SVM, Belief Propagation, PageRank, CoEM, SVD, LDA, Linear Solvers, Splash Sampler, Alternating Least Squares, …many others…
2010: Shared Memory → Distributed Cloud
• Distributing State
• Data Consistency
• Fault Tolerance
Unlimited amount of computation resources! (up to funding limitations)
The GraphLab Framework
• Graph-Based Data Representation
• Update Functions (User Computation)
• Consistency Model
Data Graph: data associated with vertices and edges
Vertex Data:
• User profile
• Current interest estimates
Edge Data:
• Relationship (friend, classmate, relative)
Graph:
• Social Network
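As a rough illustration, the vertex and edge payloads above might look like this in C++ (the struct names and the toy network are assumptions, not taken from the talk; GraphLab's actual graph container stores user-defined data on vertices and edges in just this fashion):

#include <string>
#include <utility>
#include <vector>

// Assumed vertex payload: profile plus current interest estimates.
struct vertex_data {
  std::string profile;
  std::vector<double> interests;
};

// Assumed edge payload: the relationship type.
enum class relation { friend_, classmate, relative };
struct edge_data { relation rel; };

int main() {
  // Toy social network: data lives on vertices and edges, as in the slide.
  std::vector<vertex_data> vertices = {
      {"alice", {0.1, 0.9}},
      {"bob",   {0.8, 0.2}},
  };
  std::vector<std::pair<std::pair<int, int>, edge_data>> edges = {
      {{0, 1}, {relation::friend_}},  // alice and bob are friends
  };
  (void)vertices;
  (void)edges;
}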
Distributed Graph
[Figure: a partitioned graph with “ghost” vertices along the cut]
Ghost vertices maintain the adjacency structure and replicate remote data.
Distributed Graph
The graph is cut efficiently using HPC graph-partitioning tools (ParMetis / Scotch / …).
The GraphLab Framework
• Graph-Based Data Representation
• Update Functions (User Computation)
• Consistency Model
Update Functions: user-defined programs, applied to a vertex, that transform the data in the scope of the vertex.

Pagerank(scope) {
  // Update the current vertex data
  vertex.PageRank = α
  foreach (inpage in scope.in_neighbors())
    vertex.PageRank += (1 − α) × inpage.PageRank
  // Reschedule Neighbors if needed
  if vertex.PageRank changes then
    reschedule_all_neighbors;
}

Dynamic computation: the update function is applied (asynchronously) in parallel until convergence.
Many schedulers are available to prioritize computation.
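A minimal runnable version of this update in plain C++ (the toy graph, alpha = 0.15, and the tolerance are illustrative assumptions; GraphLab's engine supplies the scope, scheduling, and consistency machinery):

#include <cmath>
#include <vector>

// Toy graph: in-neighbors, out-degrees, and the PageRank value per vertex.
struct Graph {
  std::vector<std::vector<int>> in_nbrs;
  std::vector<int> out_degree;
  std::vector<double> pagerank;
};

// Update function: reads the scope (the vertex and its in-neighbors),
// writes the vertex. Returns true if neighbors should be rescheduled.
bool pagerank_update(Graph& g, int v, double alpha = 0.15, double tol = 1e-6) {
  double pr = alpha;
  for (int u : g.in_nbrs[v])
    pr += (1.0 - alpha) * g.pagerank[u] / g.out_degree[u];
  bool changed = std::fabs(pr - g.pagerank[v]) > tol;
  g.pagerank[v] = pr;
  return changed;
}

int main() {
  Graph g{{{1, 2}, {2}, {0}}, {1, 1, 2}, {1.0, 1.0, 1.0}};
  // Plain sweep to convergence; a scheduler would instead apply updates
  // only where they are still needed (dynamic computation).
  bool changed = true;
  while (changed) {
    changed = false;
    for (int v = 0; v < 3; ++v)
      if (pagerank_update(g, v)) changed = true;
  }
}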
Shared Memory Dynamic Schedule
[Figure: CPU 1 and CPU 2 pull vertices (a, b, h, i, …) from a shared scheduler queue, run their updates, and push rescheduled neighbors]
The process repeats until the scheduler is empty.
Distributed Scheduling
[Figure: the graph is partitioned across machines, each with its own queue of scheduled vertices (a, h, f, g, j, b, c, i, …)]
Each machine maintains a schedule over the vertices it owns.
Distributed consensus is used to identify completion.
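On a single machine that loop is essentially the following sketch (reusing the hypothetical Graph and pagerank_update from the sketch above; the distributed engine runs one such queue per machine, plus a consensus protocol to detect that every queue is empty):

#include <queue>
#include <vector>
// Assumes the Graph struct and pagerank_update() from the sketch above.

// Dynamic scheduling: pop a vertex, apply its update, and push any
// neighbors the update reschedules; repeat until the queue is empty.
void run_scheduler(Graph& g, const std::vector<std::vector<int>>& out_nbrs) {
  std::queue<int> sched;
  for (int v = 0; v < (int)g.pagerank.size(); ++v) sched.push(v);
  while (!sched.empty()) {
    int v = sched.front();
    sched.pop();
    if (pagerank_update(g, v))                 // value still changing?
      for (int u : out_nbrs[v]) sched.push(u); // reschedule neighbors
  }
}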
The GraphLab Framework
• Graph-Based Data Representation
• Update Functions (User Computation)
• Consistency Model
Racing Collaborative Filtering
[Plot: RMSE vs. time on a log scale, comparing a d=20 run with a racing d=20 run]
Serializability
For every parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: updates interleaved on CPU 1 and CPU 2 over time, equivalent to some single-CPU sequential ordering]
Serializability Example
[Figure: read/write scopes of two update functions running one vertex apart]
Update functions one vertex apart can be run in parallel.
Edge Consistency: overlapping regions are only read.
Stronger / weaker consistency levels are available.
User-tunable consistency levels trade off parallelism vs. consistency.
Edge Consistency via Graph Coloring
Vertices of the same color are all at least one vertex apart. Therefore, all vertices of the same color can be run in parallel!
Chromatic Distributed Engine
[Timeline: each machine executes tasks on all its vertices of color 0, then ghost synchronization completes and a barrier is reached; the same repeats for color 1, and so on]
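A sketch of that engine loop, assuming a precomputed coloring; sync_ghosts() and barrier() are invented stand-ins for GraphLab's internal communication primitives:

#include <vector>

void sync_ghosts() {}          // stand-in: flush vertex data to remote ghosts
void barrier()     {}          // stand-in: wait for all machines
void run_update(int /*v*/) {}  // stand-in: user update on one vertex

// Within one color no two adjacent vertices appear, so every vertex of
// that color can run in parallel while preserving edge consistency.
void chromatic_engine(const std::vector<std::vector<int>>& by_color,
                      int num_sweeps) {
  for (int sweep = 0; sweep < num_sweeps; ++sweep) {
    for (const auto& color_set : by_color) {
      for (int v : color_set)  // this loop is safe to parallelize
        run_update(v);
      sync_ghosts();           // ghost synchronization completion
      barrier();               // + barrier before the next color
    }
  }
}

int main() {
  // Two colors over a toy 4-vertex graph (coloring assumed precomputed).
  chromatic_engine({{0, 2}, {1, 3}}, 3);
}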
Matrix Factorization: Netflix Collaborative Filtering
Alternating Least Squares Matrix Factorization
Model: 0.5 million nodes, 99 million edges
[Figure: bipartite Netflix graph of users and movies, factored with dimension d]
Problems:
• Requires a graph coloring to be available.
• Frequent barriers make it extremely inefficient for highly dynamic systems where only a small number of vertices are active in each round.
Consistency Through Locking
• Multicore setting: PThread RW-locks
• Distributed setting: distributed locks
Challenge: latency
[Figure: vertices A, B, C, D partitioned across Machine 1 and Machine 2; lock requests must cross the network]
Solution: Pipelining
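In the multicore setting, edge consistency maps naturally onto reader-writer locks. A sketch, with C++17 std::shared_mutex standing in for the pthread RW-locks (the lock-ordering scheme and all names are illustrative, and self-loops are assumed absent):

#include <algorithm>
#include <shared_mutex>
#include <vector>

std::vector<std::shared_mutex> locks(4);  // one RW-lock per vertex (toy size)

// Edge consistency: write-lock the center vertex, read-lock its
// neighbors, acquiring in vertex-id order so two overlapping scopes
// cannot deadlock. Assumes no self-loops.
void with_edge_consistency(int v, std::vector<int> scope, void (*update)(int)) {
  scope.push_back(v);
  std::sort(scope.begin(), scope.end());
  for (int u : scope)
    u == v ? locks[u].lock() : locks[u].lock_shared();
  update(v);  // runs with exclusive access to v, shared access to neighbors
  for (int u : scope)
    u == v ? locks[u].unlock() : locks[u].unlock_shared();
}

void my_update(int /*v*/) { /* user computation over the locked scope */ }

int main() { with_edge_consistency(1, {0, 2}, my_update); }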
No Pipelining
[Timeline: lock scope 1 → process request 1 → scope 1 acquired → update_function 1 → release scope 1 → process release 1; the CPU waits while each request is in flight]
Pipelining / Latency Hiding
Hide latency using pipelining:
[Timeline: lock requests for scopes 1, 2, and 3 are issued back to back; scope 1 is acquired, update_function 1 runs and releases it while the scope 2 and 3 requests are still being processed, then update_function 2 follows once scope 2 is acquired]
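A toy illustration of the idea, using std::async as a stand-in for asynchronous distributed lock requests (the pipeline depth and every name here are invented for illustration):

#include <future>
#include <queue>
#include <vector>

struct Scope { int vertex; };

// Stand-in for a remote lock acquisition with network latency.
Scope acquire_scope(int v) { return Scope{v}; }
void update_function(const Scope&) { /* user computation */ }
void release_scope(const Scope&)   { /* release the locks */ }

int main() {
  std::vector<int> schedule = {1, 2, 3, 4, 5, 6};
  const std::size_t depth = 3;  // max lock requests kept in flight
  std::queue<std::future<Scope>> inflight;
  std::size_t next = 0;

  // Keep several scope-lock requests outstanding so that lock latency
  // overlaps with computation instead of stalling the CPU.
  while (!inflight.empty() || next < schedule.size()) {
    while (inflight.size() < depth && next < schedule.size())
      inflight.push(std::async(std::launch::async, acquire_scope,
                               schedule[next++]));
    Scope s = inflight.front().get();  // wait for the oldest acquisition
    inflight.pop();
    update_function(s);
    release_scope(s);
  }
}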
The GraphLab Framework
• Graph-Based Data Representation
• Update Functions (User Computation)
• Consistency Model
Snapshot Performance
[Plot: progress over time for No Snapshot vs. Snapshot runs with one slow machine; annotations mark the snapshot time and the slow machine]
Because we have to stop the world, one slow machine slows everything down!
Checkpointing
1985: Chandy-Lamport invented an asynchronous snapshotting algorithm for distributed systems.
[Figure: the snapshot wavefront separates the snapshotted from the not-yet-snapshotted parts of the system]

Checkpointing
Fine-grained Chandy-Lamport: easily implemented within GraphLab as an update function!
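A sketch of the snapshot phrased as an update function (the storage hooks and flags below are hypothetical stand-ins; the update is assumed to run under edge consistency so the neighbor flags it reads are stable):

#include <vector>

struct Vertex {
  bool snapshotted = false;
  std::vector<int> nbrs;
};

std::vector<Vertex> graph;

void save_vertex(int /*v*/)          {}  // stand-in: persist vertex data
void save_edge(int /*v*/, int /*u*/) {}  // stand-in: persist edge data
void schedule(int /*v*/)             {}  // stand-in: reschedule via the engine

// Snapshot as an update function: a vertex saves its own data and its
// edges to not-yet-snapshotted neighbors, then spreads the snapshot by
// scheduling those neighbors. No stop-the-world synchronization needed.
void snapshot_update(int v) {
  if (graph[v].snapshotted) return;  // already inside the snapshot
  save_vertex(v);
  for (int u : graph[v].nbrs) {
    if (!graph[u].snapshotted) {
      save_edge(v, u);
      schedule(u);                   // propagate the snapshot wavefront
    }
  }
  graph[v].snapshotted = true;
}

int main() {
  graph = {{false, {1}}, {false, {0}}};
  snapshot_update(0);  // the engine would then run the scheduled neighbors
}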
Async. Snapshot Performance
[Plot: progress over time for No Snapshot vs. Snapshot runs with one slow machine]
No penalty is incurred by the slow machine!