
Computation and Minimax Risk

• The most challenging topic…
• Some recent progress:

– tradeoffs between time and accuracy via convex relaxations (Chandrasekaran & Jordan, 2013)

– constraints on computation via optimization oracles (Duchi, McMahan & Jordan, 2014)

– parallelization via optimistic concurrency control (Pan et al., 2014)

Concurrency Control for Distributed Machine Learning

Michael I. Jordan

University of California, Berkeley

(with Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick and Joseph Bradley)

Distributed Computing Meets Large-Scale Statistical Inference

• In many areas of statistics, parallel/distributed approaches are increasingly essential (e.g., to provide time/sample tradeoffs)

• Many methods, either optimization-based or integration-based, involve exploring models having variable structure

• This leads to a core problem: how do we ensure that statistical consistency and coherence are maintained when multiple processors are making structural changes to a model?

[Diagram: Serial Inference. A single processor reads the data and updates the model state.]

[Diagram: Coordination-Free Parallel Inference. Processor 1 and Processor 2 each read the data and update a shared model state without coordinating.]

Keep Calm and Carry On.

[Chart: accuracy (low to high) vs. scalability (low to high). Serial methods sit at high accuracy but low scalability; coordination-free methods at high scalability but lower accuracy; concurrency control targets the high-accuracy, high-scalability corner.]

Concurrency Control

• Database mechanisms:
– Guarantee correctness
– Maximize concurrency

• Mutual exclusion
• Optimistic CC

Mutual Exclusion Through Locking

[Diagram: Processor 1 and Processor 2 access the shared data and model state through locks.]

Introduce locking (scheduling) protocols to identify potential conflicts, and enforce serialization of computation that could conflict. A minimal sketch of this approach follows.
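As an illustration only (not the talk's implementation; `LockedModel` and `add_cluster` are invented names), a coarse-grained lock serializes every structural change:

```python
import threading

class LockedModel:
    """Toy shared model state protected by a single coarse-grained lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.clusters = []  # the mutable structure processors contend on

    def add_cluster(self, center):
        # Mutual exclusion: structural changes are serialized by the lock,
        # so conflicting updates can never interleave, at the cost of
        # blocking every other processor while the lock is held.
        with self._lock:
            self.clusters.append(center)
```

Correctness is guaranteed, but processors block even when their updates would not actually have conflicted.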

Optimistic Concurrency Control

[Diagram sequence: Processor 1 and Processor 2 update the shared data and model state concurrently.]

• Allow computation to proceed without blocking.
• Validate potential conflicts: a valid outcome (✔) is kept as-is.
• On an invalid outcome (✗), take a compensating action: amend the value, or rollback and redo.

Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981.

Requirements:

• Non-blocking computation: maximizes concurrency
• Validation (identify errors): must be fast and accurate
• Resolution (correct errors): must be infrequent

Concurrency Control

• Coordination-free: provably fast; correct under key assumptions.
• Concurrency control: provably correct; fast under key assumptions.

Systems ideas are used to improve efficiency. A generic sketch of the optimistic pattern follows.
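The optimistic pattern can be made concrete with a small sketch (illustrative only; `VersionedState` and `occ_update` are invented names, and a real multi-threaded version would need an atomic validate-and-commit step such as compare-and-swap):

```python
class VersionedState:
    """Toy shared state with a version counter used for validation."""
    def __init__(self, value=0.0):
        self.value = value
        self.version = 0

def occ_update(state, compute):
    """One optimistic update: compute without blocking, validate against
    the version observed at read time, and rollback-and-redo on conflict."""
    while True:
        read_version = state.version        # non-blocking read, no locks
        proposal = compute(state.value)     # optimistic computation
        if state.version == read_version:   # validation: any conflicting commit?
            state.value = proposal          # valid outcome: commit
            state.version += 1
            return proposal
        # invalid outcome: compensating action (here, rollback and redo)

state = VersionedState(1.0)
occ_update(state, lambda v: v + 1)          # state.value is now 2.0
```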

Examples

[Figure: three running examples. (1) Keyword queries A–H with per-item costs ($2, $5, $1, $2, $5, $1, $4, $2) and values ($2, $2, $4, $4, $3, $6, $5, $1). (2) A graphical model with parameters θ1–θ6 and ϕ1–ϕ4. (3) Clustered data.]

• Clustering: DP-means
• Submodularity: Double Greedy
• Bayesian Nonparametrics: Chinese Restaurant Process

Clustering with DP-means

Bayesian Nonparametrics Meets Optimization

• A methodology whereby optimization functionals arise when “small-variance asymptotics” are applied to Bayesian models based on combinatorial stochastic process priors

• Inspiration: the venerable, scalable K-means algorithm can be derived as the limit of an Expectation-Maximization algorithm for fitting a mixture model

• We do something similar in spirit, taking limits of various Bayesian nonparametric models:
– Dirichlet process mixtures
– hierarchical Dirichlet process mixtures
– beta processes and hierarchical beta processes
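To make the methodology concrete, here is the standard small-variance-asymptotics derivation in outline (reconstructed from Kulis and Jordan, 2012, not verbatim from the slides): in a Gaussian mixture with covariance σ²I, the EM objective collapses to the K-means objective as σ² → 0, and applying the same limit to a Dirichlet process mixture, with concentration parameter scaled as α = exp(−λ/(2σ²)), yields the DP-means objective:

```latex
% K-means objective (sigma^2 -> 0 limit of a Gaussian mixture, K fixed):
\min_{\{\ell_c\},\,\{\mu_c\}} \; \sum_{c=1}^{K} \sum_{x_i \in \ell_c} \|x_i - \mu_c\|^2
% DP-means objective (same limit applied to a DP mixture, with
% alpha = exp(-lambda / (2 sigma^2)); K is now optimized, penalized by lambda):
\min_{K,\,\{\ell_c\},\,\{\mu_c\}} \; \sum_{c=1}^{K} \sum_{x_i \in \ell_c} \|x_i - \mu_c\|^2 \;+\; \lambda K
```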

DP-Means Algorithm

Computing cluster membership, with threshold parameter λ [Kulis and Jordan, 2012]:
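The equation on this slide did not survive extraction; the standard DP-means assignment rule (Kulis and Jordan, 2012) assigns each point to its nearest existing mean unless all means are farther than the threshold, in which case the point starts a new cluster:

```latex
% Cluster membership for point x_i given current means mu_1, ..., mu_K:
z_i =
\begin{cases}
\displaystyle \arg\min_{c \in \{1,\dots,K\}} \|x_i - \mu_c\|^2,
  & \text{if } \min_{c} \|x_i - \mu_c\|^2 \le \lambda, \\[6pt]
K + 1 \quad (\text{new cluster with } \mu_{K+1} = x_i),
  & \text{otherwise.}
\end{cases}
```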

DP-Means Algorithm

Updating cluster centers [Kulis and Jordan, ICML 2012]:
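Again reconstructing the missing equation: as in K-means, each center is updated to the mean of the points currently assigned to it:

```latex
% Center update for cluster c with assigned point set ell_c:
\mu_c = \frac{1}{|\ell_c|} \sum_{x_i \in \ell_c} x_i
```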

DP-Means Parallel Execution

Computing cluster membership in parallel:

[Diagram: CPU 1 and CPU 2 assign points concurrently.]

Constraint: processors cannot be allowed to introduce overlapping clusters in parallel.

Optimistic Concurrency Control for Parallel DP-Means

• Optimistic assumption: no new cluster is created nearby
• Validation: verify that new clusters don’t overlap
• Resolution: assign the proposed new cluster center to the existing cluster

[Diagram: CPU 1 and CPU 2 propose new clusters concurrently.] A sketch of one OCC DP-means epoch follows.
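A minimal sketch of one epoch under these assumptions (illustrative, not the authors' distributed implementation; `occ_dp_means_epoch`, `lam`, and `n_workers` are invented names, and the parallel phase is simulated by chunking):

```python
import numpy as np

def occ_dp_means_epoch(points, centers, lam, n_workers=4):
    """One epoch: workers optimistically propose new clusters in parallel;
    proposals are then validated serially against already-accepted centers.
    `points` is an (n, d) array; `centers` is a list of d-vectors."""
    chunks = np.array_split(points, n_workers)

    # Parallel phase (optimistic): each worker assigns its points against a
    # snapshot of `centers`, proposing a new cluster when no center is close.
    proposals = []
    for chunk in chunks:  # in a real system, one task per worker
        for x in chunk:
            if not centers or min(float(np.sum((c - x) ** 2)) for c in centers) > lam:
                proposals.append(x)  # optimistic: assume no nearby new cluster

    # Serial validation/resolution phase: accept a proposal only if it does
    # not overlap a cluster committed since the snapshot was taken.
    for x in proposals:
        if not centers or min(float(np.sum((c - x) ** 2)) for c in centers) > lam:
            centers.append(x)  # validated: commit the new cluster
        # else: resolution -- the point is simply assigned to the existing
        # nearby cluster, and no new center is created.
    return centers
```

Because only new-cluster proposals need validation, the serial phase stays cheap whenever proposals are rare.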

Concurrency Control for DP-Means

Correctness:

Theorem: OCC DP-means is serializable, i.e. equivalent to some sequential execution.

Corollary: OCC DP-means preserves theoretical properties of DP-means.

Concurrency:

Theorem: Assuming well-spaced clusters, the expected overhead of OCC DP-means, in terms of the number of rejected proposals, does not depend on the size of the data set.

Empirical Validation: Failure Rate

[Plot: OCC overhead, measured as the number of points failing validation, vs. dataset size, for λ-separable clusters, with 2, 4, 8, 16, and 32 processors.]

Result: the failure rate is independent of dataset size.

Empirical Validation: Failure Rate

[Plot: OCC overhead, measured as the number of points failing validation, vs. dataset size, for overlapping clusters, with 2, 4, 8, 16, and 32 processors.]

Result: only weak dependence on dataset size.

Distributed Evaluation: Amazon EC2

[Plot: runtime in seconds per complete pass over the data vs. number of machines (1–8), comparing OCC DP-means runtime with projected linear scaling.]

2x #machines ≈ ½x runtime

~140 million data points; 1, 2, 4, 8 machines

Summary

                    | Accuracy                         | Scalability
Sequential          | Appealing theoretical properties | Little
Coordination-free   | Approximate, under assumptions   | Always fast
Concurrency Control | Always correct                   | Good, under assumptions

• The coordination-free approach guarantees speed, and its analysis focuses on showing accuracy under assumptions.
• Our approach guarantees accuracy, and its analysis focuses on showing speed under assumptions.

Conclusions

• Many conceptual and mathematical challenges arise when the problem of “Big Data” is taken seriously

• Facing these challenges will require a rapprochement between computer science and statistics, bringing them together at the level of their foundations and thus reshaping both disciplines
