Concurrency Control for Scalable Bayesian Inference
Concurrency Control for Scalable Bayesian Inference. Joseph E. Gonzalez, Postdoc, UC Berkeley AMPLab; Co-founder, GraphLab Inc. (jegonzal@eecs.berkeley.edu). ISBA 2014.
![Page 1: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/1.jpg)
Concurrency Control for Scalable Bayesian Inference
Joseph E. Gonzalez, Postdoc, UC Berkeley AMPLab; Co-founder, GraphLab Inc.; jegonzal@eecs.berkeley.edu
ISBA 2014
![Page 2: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/2.jpg)
A Systems Approach to Scalable Bayesian Inference
![Page 3: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/3.jpg)
http://www.domo.com/learn/data-never-sleeps-2
![Page 4: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/4.jpg)
![Page 5: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/5.jpg)
![Page 6: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/6.jpg)
![Page 7: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/7.jpg)
Data Velocity is an opportunity for Bayesian Nonparametrics.
How do we scale Bayesian inference?
![Page 8: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/8.jpg)
Opposing Forces
Accuracy: the ability to estimate the posterior distribution.
Scalability: the ability to effectively use parallel resources.
Figure: Serial Inference at the accuracy end; Coordination-Free Samplers (Parameter Server) at the scalability end.
![Page 9: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/9.jpg)
Serial Inference
Figure: Data, Model State
![Page 10: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/10.jpg)
Coordination-Free Parallel Inference
Figure: Data, Model State, Processor 1, Processor 2
![Page 11: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/11.jpg)
Coordination-Free Parallel Inference
Figure: Data, Model State, Processor 1, Processor 2
Keep Calm and Carry On.
![Page 12: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/12.jpg)
Parameter Servers
A system for coordination-free inference:
D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. NIPS’07.
A. Smola and S. Narayanamurthy. An architecture for parallel topic models. VLDB’10.
A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. WSDM’12.
Q. Ho et al. More effective distributed ML via a stale synchronous parallel parameter server. NIPS’13.
![Page 13: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/13.jpg)
Hierarchical Clustering
Global Variables
Local Variables
![Page 14: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/14.jpg)
Example: Topic Modeling with LDA
Global variables (word distributions by topic): maintained by the parameter server.
Local variables (documents and their tokens): maintained by the worker nodes.
![Page 15: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/15.jpg)
Ex: Collapsed Gibbs Sampler for LDA
Partitioning the model and data
Figure: three parameter servers own the word ranges W1:10K, W10K:20K, and W20K:30K; the tokens (w1 through w9) are partitioned across workers.
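A minimal sketch of this partitioning follows. It assumes 1-based integer word ids, shard boundaries at exactly 10,000 and 20,000, and contiguous blocks of documents per worker. `server_for` and `partition_documents` are hypothetical helpers, not part of any real parameter-server API.

```python
import bisect

# Hypothetical shard layout mirroring the slide: server 0 owns word ids
# 1..10,000, server 1 owns 10,001..20,000, server 2 owns 20,001..30,000.
SHARD_UPPER_BOUNDS = [10_000, 20_000, 30_000]


def server_for(word_id):
    """Return the index of the parameter server that owns `word_id` (1-based)."""
    if not 1 <= word_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"word id {word_id} is outside the vocabulary")
    # bisect_left finds the first shard whose upper bound is >= word_id.
    return bisect.bisect_left(SHARD_UPPER_BOUNDS, word_id)


def partition_documents(docs, num_workers):
    """Split documents into contiguous, nearly equal blocks, one per worker."""
    size = -(-len(docs) // num_workers)  # ceiling division
    return [docs[i * size:(i + 1) * size] for i in range(num_workers)]
```

Under these assumptions, a worker that samples a token of word 15,000 sends its count updates to `server_for(15_000)`, which is server 1.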
![Page 16: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/16.jpg)
Ex: Collapsed Gibbs Sampler for LDA
Partitioning the model and data
Figure: three parameter servers (W1:10K, W10K:20K, W20K:30K) and four workers. Each worker keeps a parameter cache of the words it uses: {Car, Dog, Pig, Bat}, {Cat, Gas, Zoo, VW}, {Car, Rim, bmw, $$}, {Mac, iOS, iPod, Cat}.
![Page 17: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/17.jpg)
Ex: Collapsed Gibbs Sampler for LDA
Inconsistent model replicas
Figure: the same parameter servers and worker caches. The word Cat is cached by two workers, and their copies hold inconsistent values.
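The replica inconsistency can be reproduced with a toy parameter server. This is a hypothetical sketch: `ParameterServer` and `Worker` are illustrative classes, and counts are kept per word rather than per (word, topic) pair to keep it short. Each worker samples against its own cached copy and pushes only additive deltas on `sync`, so between synchronizations two caches of the same word disagree.

```python
from collections import Counter


class ParameterServer:
    """Hypothetical minimal server holding global counts (per word, not per topic)."""

    def __init__(self):
        self.counts = Counter()

    def pull(self, words):
        # Return a snapshot copy; later server updates do not reach this copy.
        return Counter({w: self.counts[w] for w in words})

    def push(self, delta):
        self.counts.update(delta)  # Counter.update adds counts


class Worker:
    """Keeps a local parameter cache and accumulates unsent updates."""

    def __init__(self, server, words):
        self.server = server
        self.cache = server.pull(words)
        self.delta = Counter()

    def increment(self, word):
        # The sampler reads and writes only its local replica, without blocking.
        self.cache[word] += 1
        self.delta[word] += 1

    def sync(self):
        self.server.push(self.delta)
        self.delta = Counter()
        self.cache = self.server.pull(list(self.cache))
```

If worker A increments Cat once and worker B increments it twice, the two caches read 1 and 2 while the server still reads 0. After both sync, the server holds the correct total of 3, but A's cache still reads 1 until its next pull.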
![Page 18: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/18.jpg)
Parallel Gibbs Sampling: Incorrect Posterior
Dependent variables cannot, in general, be sampled simultaneously.
Figure: two variables with strong positive correlation, updated over time steps t=0 to t=3. Sequential execution preserves the strong positive correlation; parallel execution produces a strong negative correlation.
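A toy simulation makes this concrete. The sketch assumes two binary variables whose conditionals each copy the other variable with probability `p`. Under that target, P(x1 = x2) = p. A sequential Gibbs sweep recovers this agreement rate. Updating both variables simultaneously from the previous state does not: for this model, the simultaneous chain's stationary distribution is uniform over the four states, so its agreement rate drifts toward 0.5. The function names are illustrative.

```python
import random


def sweep_sequential(x, p, rng):
    """Gibbs sweep: x[1] sees the value of x[0] written earlier in the same sweep."""
    x[0] = x[1] if rng.random() < p else 1 - x[1]
    x[1] = x[0] if rng.random() < p else 1 - x[0]


def sweep_parallel(x, p, rng):
    """Simultaneous update: both variables read the state from the previous step."""
    old0, old1 = x
    x[0] = old1 if rng.random() < p else 1 - old1
    x[1] = old0 if rng.random() < p else 1 - old0


def agreement_rate(sweep, p=0.95, steps=20_000, seed=0):
    """Fraction of steps in which the two binary variables agree."""
    rng = random.Random(seed)
    x = [0, 1]  # start in a disagreeing state
    agree = 0
    for _ in range(steps):
        sweep(x, p, rng)
        agree += x[0] == x[1]
    return agree / steps
```

With `p = 0.95`, the sequential sampler agrees about 95% of the time. The parallel one oscillates between correlated and anti-correlated regimes and agrees only about half the time.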
![Page 19: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/19.jpg)
Issues with Nonparametrics
It is difficult to introduce new clusters asynchronously; doing so leads to too many clusters!
Figure: two workers concurrently send "Create Cluster" to the parameter servers, and both create a cluster labeled 7.
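The following sketch illustrates the duplicate-cluster problem. It assumes 1-D points and a DP-means-style rule: a point opens a new cluster when its squared distance to every known center exceeds `lam`. A serial pass sees each newly created cluster. Coordination-free workers see only a stale snapshot, so two nearby points can both open clusters. Both helpers are hypothetical.

```python
def serial_new_clusters(points, centers, lam):
    """Serial creation: each new cluster is visible to every later point."""
    centers = list(centers)
    created = []
    for x in points:
        if all((x - c) ** 2 > lam for c in centers):
            centers.append(x)
            created.append(x)
    return created


def coordination_free_new_clusters(points, centers, lam):
    """Each point is handled by a different worker that sees only the stale snapshot."""
    return [x for x in points if all((x - c) ** 2 > lam for c in centers)]
```

For points 5.0 and 5.1 with an existing center at 0.0 and `lam = 1.0`, the serial pass creates one cluster, while the coordination-free pass creates two overlapping ones.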
![Page 20: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/20.jpg)
Opposing Forces
Figure: Serial Inference at the accuracy end and Asynchronous Samplers (Parameter Server) at the scalability end.
![Page 21: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/21.jpg)
Opposing Forces
Figure: accuracy and scalability drawn as two separate axes.
![Page 22: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/22.jpg)
Opposing Forces
Figure: Serial Inference and Asynchronous Samplers (Parameter Server) placed on the accuracy and scalability axes.
![Page 23: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/23.jpg)
Opposing Forces
Figure: Serial Inference and Asynchronous Samplers (Parameter Server) placed on the accuracy and scalability axes.
Concurrency Control?
![Page 24: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/24.jpg)
Concurrency Control
Coordination free (Parameter Server):
Provably fast, and correct under key assumptions.
Concurrency control:
Provably correct, and fast under key assumptions.
Systems ideas to improve efficiency
![Page 25: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/25.jpg)
Concurrency Control
Opposing Forces
Figure: Serial Inference, Mutual Exclusion, Optimistic Concurrency Control, and Asynchronous Samplers (Parameter Server) placed on the accuracy and scalability axes; regions labeled Safe and Unsafe.
![Page 26: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/26.jpg)
Mutual Exclusion: Conditional Independence
J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees. AISTATS’11
Exploit the Markov random field for parallel Gibbs sampling:
• Graph coloring
• R/W lock mutual exclusion
![Page 27: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/27.jpg)
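Mutual Exclusion through Scheduling
Chromatic Gibbs Sampler
• Compute a k-coloring of the graphical model.
• Sample all variables with the same color in parallel.
Serial equivalence: variables of the same color share no edges, so they are conditionally independent given all other variables. Sampling them simultaneously is therefore equivalent to sampling them one after another. (Figure: colors processed in sequence over time.)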
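A compact sketch of the chromatic sampler follows. It assumes an Ising-style MRF with spins in {-1, +1} and a single `coupling` strength, and it uses greedy coloring, which is one common choice and not necessarily the coloring used in the paper. The loop over a color's `block` runs serially here, but its updates never read one another, so it could be split across processors.

```python
import math
import random


def greedy_coloring(adj):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def chromatic_sweep(x, adj, color, coupling, rng):
    """One chromatic Gibbs sweep over an Ising-style MRF with spins in {-1, +1}.

    Colors are processed one after another. Within a color no two variables
    are adjacent, so each conditional reads only variables of other colors.
    """
    for c in sorted(set(color.values())):
        block = [v for v in adj if color[v] == c]
        for v in block:  # parallelizable: these updates do not read each other
            field = coupling * sum(x[u] for u in adj[v])
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(x_v = +1 | neighbors)
            x[v] = 1 if rng.random() < p_plus else -1
```

On a 4-cycle, greedy coloring finds 2 colors, so one sweep takes two parallel phases regardless of how many variables share each color.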
![Page 28: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/28.jpg)
Theorem: Chromatic Sampler
• Ergodic: converges to the correct distribution.
» Based on a graph coloring of the Markov random field.
• Quantifiable acceleration in mixing. Time to update all variables once: $O\left(\frac{n}{p} + k\right)$, where $n$ = # variables, $k$ = # colors, $p$ = # processors.
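Assuming the bound on this slide is the O(n/p + k) sweep time from the AISTATS’11 paper, the arithmetic behind it is short. Each color class of size n_c needs ⌈n_c / p⌉ parallel steps, and ⌈n_c / p⌉ ≤ n_c / p + 1. Summing over k colors therefore gives at most n/p + k. The helper names below are hypothetical.

```python
import math


def chromatic_sweep_steps(color_class_sizes, p):
    """Parallel steps for one sweep: colors run one after another, and the
    variables of each color are split evenly across p processors."""
    return sum(math.ceil(n_c / p) for n_c in color_class_sizes)


def sweep_bound(color_class_sizes, p):
    """n/p + k, from ceil(n_c / p) <= n_c / p + 1 for each of the k colors."""
    return sum(color_class_sizes) / p + len(color_class_sizes)
```

For example, 1,000 variables in three color classes of sizes 500, 300, and 200 on 16 processors take 32 + 19 + 13 = 64 steps, within the bound 1000/16 + 3 = 65.5.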
![Page 29: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/29.jpg)
Mutual Exclusion Through Locking
Figure: Data, Model State, Processor 1, Processor 2
Introduce locking (scheduling) protocols to identify potential conflicts.
![Page 30: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/30.jpg)
Mutual Exclusion Through Locking
Figure: Data, Model State, Processor 1, Processor 2 (✗ marks a conflicting access)
Enforce serialization of computation that could conflict.
![Page 31: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/31.jpg)
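Markov Blanket Locks
Read/write locks: the variable being sampled takes a write (W) lock, and each variable in its Markov blanket takes a read (R) lock.
Figure: two variables marked W, each surrounded by neighbors marked R.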
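The locking rule can be sketched as a non-blocking try-lock over the MRF's adjacency lists. This is a single-threaded illustration with a hypothetical `MarkovBlanketLocks` class. A real concurrent version would need atomic lock operations, or would acquire locks in a canonical order to avoid deadlock. Two variables can be sampled together only if neither is in the other's Markov blanket. Shared read locks on a common neighbor are allowed.

```python
class MarkovBlanketLocks:
    """Write-lock a variable and read-lock its Markov blanket (all or nothing)."""

    def __init__(self, adj):
        self.adj = adj
        self.readers = {v: 0 for v in adj}     # number of read locks held on v
        self.writer = {v: False for v in adj}  # whether v is write-locked

    def try_acquire(self, v):
        """Return True and take the locks, or return False and take nothing."""
        if self.writer[v] or self.readers[v] > 0:
            return False  # someone is sampling v or reading it as a neighbor
        if any(self.writer[u] for u in self.adj[v]):
            return False  # a neighbor is being sampled right now
        self.writer[v] = True
        for u in self.adj[v]:
            self.readers[u] += 1
        return True

    def release(self, v):
        self.writer[v] = False
        for u in self.adj[v]:
            self.readers[u] -= 1
```

On the chain 0-1-2-3-4, variables 1 and 3 can be locked together because they only share the read-locked neighbor 2. Variables 0 and 2 must wait until 1 is released.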
![Page 32: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/32.jpg)
Markov Blanket Locks
• Eliminate the fixed schedule and global coordination.
• Support more advanced block sampling.
Expected parallelism: a function of the number of processors, the number of variables, and the maximum degree.
![Page 33: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/33.jpg)
A System for Mutual Exclusion on Markov Random Fields
GraphLab/PowerGraph [UAI’10, OSDI’12]:
• Chromatic Sampling
• Markov Blanket Locks + Block Sampling
![Page 34: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/34.jpg)
Limitation: Densely Connected MRFs
• V-structures: observations couple many variables.
• Collapsed models: clique-like MRFs.
Mutual exclusion pessimistically serializes computation that could interfere.
Can we be optimistic and serialize only computation that does interfere?
![Page 35: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/35.jpg)
Opposing Forces
Figure: Serial Inference, Mutual Exclusion, Optimistic Concurrency Control, and Asynchronous Samplers (Parameter Server) placed on the accuracy and scalability axes; regions labeled Safe and Unsafe.
![Page 36: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/36.jpg)
Optimistic Concurrency Control: assume the best and correct
X. Pan, J. Gonzalez, S. Jegelka, T. Broderick, and M. Jordan. Optimistic Concurrency Control for Distributed Unsupervised Learning. NIPS’13
Xinghao Pan, Tamara Broderick, Stefanie Jegelka, Michael Jordan
![Page 37: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/37.jpg)
Optimistic Concurrency Control
Classic idea from database systems:
Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981.
Assume most operations won’t conflict:
• Execute operations without blocking: the frequent case is fast.
• Identify and resolve conflicts after they occur: the infrequent case, with potentially costly resolution.
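A minimal sketch of the read, validate, and write cycle follows. It simplifies Kung and Robinson's read-set/write-set validation to a per-key version number (a compare-and-swap style check). `VersionedStore` and `optimistic_increment` are hypothetical names. Reads never block. A commit succeeds only if no one has committed to the key since it was read. On failure, the local result is discarded and the work is redone.

```python
class VersionedStore:
    """Key-value store where every committed write bumps the key's version."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def read(self, key):
        """Non-blocking read: return (value, version)."""
        return self.values.get(key, 0), self.versions.get(key, 0)

    def commit(self, key, new_value, read_version):
        # Validation: succeed only if nobody committed since our read.
        if self.versions.get(key, 0) != read_version:
            return False
        self.values[key] = new_value
        self.versions[key] = read_version + 1
        return True


def optimistic_increment(store, key, max_retries=10):
    for _ in range(max_retries):
        value, version = store.read(key)           # execute without blocking
        if store.commit(key, value + 1, version):  # validate, then write
            return True
        # Resolution: discard the local result (rollback) and redo.
    return False
```

If two transactions both read version 0 and the first commits, the second fails validation and must redo its work against the new value.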
![Page 38: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/38.jpg)
Optimistic Concurrency Control
Figure: Data, Model State, Processor 1, Processor 2
Allow computation to proceed without blocking.
![Page 39: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/39.jpg)
Optimistic Concurrency Control
Figure: Data, Model State, Processor 1, Processor 2 (the access is checked: ?, then ✔)
Validate potential conflicts.
Valid outcome
![Page 40: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/40.jpg)
Optimistic Concurrency Control
Figure: Data, Model State, Processor 1, Processor 2 (both accesses are checked: ? ?, then ✗ ✗)
Validate potential conflicts.
Invalid outcome
![Page 41: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/41.jpg)
Optimistic Concurrency Control
Figure: Data, Model State, Processor 1, Processor 2 (✗ ✗)
Take a compensating action: amend the value.
![Page 42: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/42.jpg)
Optimistic Concurrency Control
Figure: Data, Model State, Processor 1, Processor 2 (✗ ✗)
Validate potential conflicts.
Invalid outcome
![Page 43: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/43.jpg)
Optimistic Concurrency Control
Figure: Data, Model State, Processor 1, Processor 2 (✗ ✗)
Take a compensating action: roll back and redo.
![Page 44: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/44.jpg)
Optimistic Concurrency Control
Figure: Data, Model State, Processor 1, Processor 2 (rollback and redo)
Non-blocking computation: provides concurrency.
Validation (identify errors) and resolution (correct errors): preserve accuracy.
Requirements: the non-blocking path must be fast, and resolution must be infrequent.
![Page 45: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/45.jpg)
Optimistic Concurrency Control for Bayesian Inference
Nonparametric models [Pan et al., NIPS’13]:
• OCC DP-Means: Dirichlet process clustering
• OCC BP-Means: Beta process feature learning
Conditional sampling (in progress):
• Collapsed Gibbs LDA
• Retrospective sampling for HDP
![Page 46: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/46.jpg)
DP-Means Algorithm
Start with DP Gaussian mixture model:
small variance limit
[Kulis and Jordan, ICML’12]
![Page 47: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/47.jpg)
DP-Means Algorithm
Start with DP Gaussian mixture model:
small variance limit redefine :
[Kulis and Jordan, ICML’12]
Decreases rapidly
![Page 48: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/48.jpg)
DP-Means Algorithm
Corresponding Gibbs sampler conditionals:
Taking the small variance limit
[Kulis and Jordan, ICML’12]
![Page 49: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/49.jpg)
DP-Means Algorithm
Gibbs updates become deterministic:
Taking the small variance limit
[Kulis and Jordan, ICML’12]
![Page 50: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/50.jpg)
DP-Means Algorithm
Gibbs updates become deterministic:
[Kulis and Jordan, ICML’12]
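The slides' formulas did not survive extraction. For reference, the small-variance limit in Kulis and Jordan (ICML’12) yields the objective and hard assignment rule below. This is a restatement of their published result, not a reconstruction of the slide itself. Here λ is compared with the squared Euclidean distance, and $\ell_c$ denotes the set of points in cluster $c$.

```latex
% DP-means objective: k-means plus a penalty lambda per cluster
\min_{\{\ell_c\}_{c=1}^{k}} \; \sum_{c=1}^{k} \sum_{x \in \ell_c} \lVert x - \mu_c \rVert^2 \;+\; \lambda k,
\qquad \mu_c = \frac{1}{|\ell_c|} \sum_{x \in \ell_c} x

% Deterministic assignment step for point x_i
d_{ic} = \lVert x_i - \mu_c \rVert^2, \qquad
z_i =
\begin{cases}
k + 1 \ \text{(new cluster with } \mu_{k+1} = x_i\text{)} & \text{if } \min_c d_{ic} > \lambda,\\
\arg\min_c d_{ic} & \text{otherwise.}
\end{cases}
```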
![Page 51: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/51.jpg)
DP-Means Algorithm
Computing cluster membership
Figure: threshold λ around the existing cluster centers.
![Page 52: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/52.jpg)
DP-Means Algorithm
Updating cluster centers:
[Kulis and Jordan, ICML’12]
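A minimal pure-Python sketch of serial DP-means follows. It alternates the assignment and update steps. It assumes points are tuples of floats and that λ (`lam`) is compared with squared Euclidean distance, as in Kulis and Jordan. It starts from one global cluster and drops clusters that lose all their points. The helper names are illustrative.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dp_means(X, lam, max_iters=100):
    """Serial DP-means sketch (after Kulis & Jordan, ICML'12)."""
    centers = [mean(X)]  # start with a single global cluster
    z = [0] * len(X)
    for _ in range(max_iters):
        changed = False
        # Assignment step: nearest center, or a new cluster if all are > lam away.
        for i, x in enumerate(X):
            d2 = [sq_dist(x, c) for c in centers]
            best = min(range(len(centers)), key=d2.__getitem__)
            if d2[best] > lam:
                centers.append(tuple(x))
                best = len(centers) - 1
            if best != z[i]:
                z[i] = best
                changed = True
        # Update step: relabel non-empty clusters and recompute their means.
        used = sorted(set(z))
        relabel = {old: new for new, old in enumerate(used)}
        z = [relabel[c] for c in z]
        centers = [mean([x for x, c in zip(X, z) if c == k]) for k in range(len(used))]
        if not changed:
            break
    return centers, z
```

On two well-separated groups of three points with `lam = 4.0`, the sketch opens one cluster per group and converges after the second pass.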
![Page 53: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/53.jpg)
DP-Means Parallel Execution
Computing cluster membership in parallel:
Figure: CPU 1 and CPU 2 each create a new cluster, and the two new centers are closer than λ.
Cannot introduce overlapping clusters in parallel.
![Page 54: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/54.jpg)
Optimistic Concurrency Control for Parallel DP-Means
Optimistic assumption: no new cluster is created nearby.
Validation: verify that new clusters don’t overlap (centers closer than λ).
Resolution: assign the new cluster center to the existing cluster.
Figure: CPU 1, CPU 2
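One OCC epoch of the DP-means assignment step can be sketched as two phases. First, workers assign points against a shared, possibly stale snapshot of the centers and propose new clusters without blocking. Second, a single validator processes the proposals serially. It accepts a proposal only if it is farther than λ from every accepted center; otherwise it resolves the proposal by assigning the point to the overlapping center. This is a sketch after Pan et al. (NIPS’13), not their implementation. It assumes tuple points, squared-distance thresholds, and round-robin sharding simulated in one process.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x, centers):
    """Return (index, squared distance) of the closest center, or (None, inf)."""
    best, best_d2 = None, float("inf")
    for j, c in enumerate(centers):
        d2 = sq_dist(x, c)
        if d2 < best_d2:
            best, best_d2 = j, d2
    return best, best_d2


def occ_dp_means_assign(X, centers, lam, num_workers):
    """One OCC epoch of the DP-means assignment step (sketch)."""
    snapshot = list(centers)  # all workers read the same, possibly stale, centers
    z = [None] * len(X)
    proposals = []            # (point index, point) pairs that want a new cluster
    # Phase 1: in a real system each shard runs on its own worker without blocking.
    for w in range(num_workers):
        for i in range(w, len(X), num_workers):
            j, d2 = nearest(X[i], snapshot)
            if d2 > lam:
                proposals.append((i, X[i]))  # optimistic: assume no nearby new cluster
            else:
                z[i] = j
    # Phase 2: a single validator processes proposals serially.
    accepted = list(centers)
    for i, x in proposals:
        j, d2 = nearest(x, accepted)
        if d2 > lam:
            accepted.append(x)        # validation passed: a genuinely new cluster
            z[i] = len(accepted) - 1
        else:
            z[i] = j                  # resolution: join the overlapping new cluster
    return accepted, z
```

Starting from no centers, four points in two tight pairs all propose new clusters in phase 1, but validation keeps only two. A coordination-free version would have kept all four.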
![Page 55: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/55.jpg)
OCC DP-means
Correctness. Theorem: OCC DP-means is serializable and therefore preserves the theoretical properties of DP-means.
Concurrency. Theorem: Assuming well-spaced clusters, the expected overhead of OCC DP-means does not depend on the data size.
![Page 56: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/56.jpg)
Empirical Validation Failure Rate: λ-Separable Clusters
Figure: OCC overhead (points failing validation) vs. dataset size, one curve per level of parallelism (2, 4, 8, 16, and 32 processors).
![Page 57: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/57.jpg)
Empirical Validation Failure Rate: Overlapping Clusters
Figure: OCC overhead (points failing validation) vs. dataset size, one curve per level of parallelism (2, 4, 8, 16, and 32 processors).
![Page 58: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/58.jpg)
Distributed Evaluation on Amazon EC2
Figure: runtime in seconds per complete pass over the data vs. number of machines, for OCC DP-means and projected linear scaling; ~140 million data points on 1, 2, 4, and 8 machines.
2× machines ≈ ½× runtime
![Page 59: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/59.jpg)
Optimistic Future for Optimistic Concurrency Control
Figure: Serial Inference, Optimistic Concurrency Control, and Parameter Server placed on the accuracy and scalability axes; regions labeled Safe and Unsafe.
![Page 61: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/61.jpg)
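OCC Collapsed Gibbs for LDA
Maintain ε-intervals on the conditional:
• The system ensures conditionals are ε-accurate.
• Validate: accept draws that land outside the intervals.
• Resolution: serially resample rejected tokens.
Figure: the unit interval from 0 to 1, split at the conditional's cumulative boundaries, with an ε-wide band on each side of every boundary and a uniform draw u.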
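The validation rule can be sketched with inverse-CDF sampling. Assume each cumulative boundary of the stale, approximate conditional is within ε of the true boundary. Then a uniform draw `u` that lies more than ε from every approximate boundary selects the same topic under the true conditional. Draws inside a band are rejected and left for serial resampling. `occ_draw` and `occ_sweep` are hypothetical names, and the conditionals are given directly rather than computed from LDA counts.

```python
import bisect
import random
from itertools import accumulate


def occ_draw(approx_probs, eps, u):
    """Inverse-CDF draw from an approximate conditional, validated against eps.

    Returns a topic index, or None if u lies within eps of a cumulative
    boundary, where the true conditional might pick a different topic.
    """
    total = sum(approx_probs)
    cdf = [c / total for c in accumulate(approx_probs)]
    # Validation: only the interior boundaries matter (0 and 1 are exact).
    if any(abs(u - b) <= eps for b in cdf[:-1]):
        return None
    return bisect.bisect_right(cdf, u)


def occ_sweep(conditionals, eps, seed=0):
    """Optimistic pass over tokens; rejected tokens are left for serial resampling."""
    rng = random.Random(seed)
    accepted, rejected = {}, []
    for token, probs in enumerate(conditionals):
        topic = occ_draw(probs, eps, rng.random())
        if topic is None:
            rejected.append(token)
        else:
            accepted[token] = topic
    return accepted, rejected
```

With conditional [0.2, 0.3, 0.5] and ε = 0.02, a draw of u = 0.3 is accepted as topic 1, while u = 0.21 falls within ε of the boundary at 0.2 and is deferred. The expected fraction of rejected tokens grows with ε times the number of interior boundaries.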
![Page 62: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/62.jpg)
OCC Collapsed Gibbs for LDA
![Page 63: Concurrency Control for Scalable Bayesian Inference](https://reader036.vdocument.in/reader036/viewer/2022081603/56814e78550346895dbc11df/html5/thumbnails/63.jpg)
Optimism for Optimistic Concurrency Control
Coordination free: fast + easy (frequent case)
Validation
Resolution: slow + complex (rare case)
Enable decades of work in serial Bayesian inference to be extended to the parallel setting.