
Computation and Minimax Risk

• The most challenging topic…
• Some recent progress:

– tradeoffs between time and accuracy via convex relaxations (Chandrasekaran & Jordan, 2013)

– constraints on computation via optimization oracles (Duchi, McMahan & Jordan, 2014)

– parallelization via optimistic concurrency control (Pan et al., 2014)

Concurrency Control for Distributed Machine Learning

Michael I. Jordan

University of California, Berkeley

(with Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick and Joseph Bradley)

Distributed Computing Meets Large-Scale Statistical Inference

• In many areas of statistics, parallel/distributed approaches are increasingly essential (e.g., to provide time/sample tradeoffs)

• Many methods, either optimization-based or integration-based, involve exploring models having variable structure

• This leads to a core problem: how do we ensure that statistical consistency and coherence are maintained when multiple processors are making structural changes to a model?

[Diagram: Serial Inference. A single processor reads the data and updates the model state.]

[Diagram: Coordination-Free Parallel Inference. Processor 1 and Processor 2 each read the data and update a shared model state without coordinating.]

Keep Calm and Carry On.

[Chart: accuracy (low to high) vs. scalability (low to high). Serial methods sit at high accuracy but low scalability; coordination-free methods at high scalability but lower accuracy; concurrency control targets the high-accuracy, high-scalability corner.]

Concurrency Control

• Database mechanisms:
– Guarantee correctness
– Maximize concurrency

• Mutual exclusion
• Optimistic CC

Mutual Exclusion Through Locking

[Diagram: Processor 1 and Processor 2 access the shared data and model state through locks.]

Introduce locking (scheduling) protocols to identify potential conflicts, and enforce serialization of computation that could conflict. A minimal sketch of this approach follows.
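As an illustration only (not the talk's implementation; `LockedModel` and `add_cluster` are invented names), a coarse-grained lock serializes every structural change:

```python
import threading

class LockedModel:
    """Toy shared model state protected by a single coarse-grained lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.clusters = []  # the mutable structure processors contend on

    def add_cluster(self, center):
        # Mutual exclusion: structural changes are serialized by the lock,
        # so conflicting updates can never interleave, at the cost of
        # blocking every other processor while the lock is held.
        with self._lock:
            self.clusters.append(center)
```

Correctness is guaranteed, but processors block even when their updates would not actually have conflicted.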

Optimistic Concurrency Control

[Diagram sequence: Processor 1 and Processor 2 update the shared data and model state concurrently.]

• Allow computation to proceed without blocking.
• Validate potential conflicts: a valid outcome (✔) is kept as-is.
• On an invalid outcome (✗), take a compensating action: amend the value, or rollback and redo.

Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981.

Requirements:

• Non-blocking computation: maximizes concurrency
• Validation (identify errors): must be fast and accurate
• Resolution (correct errors): must be infrequent

Concurrency Control

• Coordination-free: provably fast; correct under key assumptions.
• Concurrency control: provably correct; fast under key assumptions.

Systems ideas are used to improve efficiency. A generic sketch of the optimistic pattern follows.
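The optimistic pattern can be made concrete with a small sketch (illustrative only; `VersionedState` and `occ_update` are invented names, and a real multi-threaded version would need an atomic validate-and-commit step such as compare-and-swap):

```python
class VersionedState:
    """Toy shared state with a version counter used for validation."""
    def __init__(self, value=0.0):
        self.value = value
        self.version = 0

def occ_update(state, compute):
    """One optimistic update: compute without blocking, validate against
    the version observed at read time, and rollback-and-redo on conflict."""
    while True:
        read_version = state.version        # non-blocking read, no locks
        proposal = compute(state.value)     # optimistic computation
        if state.version == read_version:   # validation: any conflicting commit?
            state.value = proposal          # valid outcome: commit
            state.version += 1
            return proposal
        # invalid outcome: compensating action (here, rollback and redo)

state = VersionedState(1.0)
occ_update(state, lambda v: v + 1)          # state.value is now 2.0
```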

Examples

[Figure: three running examples. (1) Keyword queries A–H with per-item costs ($2, $5, $1, $2, $5, $1, $4, $2) and values ($2, $2, $4, $4, $3, $6, $5, $1). (2) A graphical model with parameters θ1–θ6 and ϕ1–ϕ4. (3) Clustered data.]

• Clustering: DP-means
• Submodularity: Double Greedy
• Bayesian Nonparametrics: Chinese Restaurant Process

Clustering with DP-means

Bayesian Nonparametrics Meets Optimization

• A methodology whereby optimization functionals arise when “small-variance asymptotics” are applied to Bayesian models based on combinatorial stochastic process priors

• Inspiration: the venerable, scalable K-means algorithm can be derived as the limit of an Expectation-Maximization algorithm for fitting a mixture model

• We do something similar in spirit, taking limits of various Bayesian nonparametric models:
– Dirichlet process mixtures
– hierarchical Dirichlet process mixtures
– beta processes and hierarchical beta processes
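To make the methodology concrete, here is the standard small-variance-asymptotics derivation in outline (reconstructed from Kulis and Jordan, 2012, not verbatim from the slides): in a Gaussian mixture with covariance σ²I, the EM objective collapses to the K-means objective as σ² → 0, and applying the same limit to a Dirichlet process mixture, with concentration parameter scaled as α = exp(−λ/(2σ²)), yields the DP-means objective:

```latex
% K-means objective (sigma^2 -> 0 limit of a Gaussian mixture, K fixed):
\min_{\{\ell_c\},\,\{\mu_c\}} \; \sum_{c=1}^{K} \sum_{x_i \in \ell_c} \|x_i - \mu_c\|^2
% DP-means objective (same limit applied to a DP mixture, with
% alpha = exp(-lambda / (2 sigma^2)); K is now optimized, penalized by lambda):
\min_{K,\,\{\ell_c\},\,\{\mu_c\}} \; \sum_{c=1}^{K} \sum_{x_i \in \ell_c} \|x_i - \mu_c\|^2 \;+\; \lambda K
```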

DP-Means Algorithm

Computing cluster membership, with threshold parameter λ [Kulis and Jordan, 2012]:
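The equation on this slide did not survive extraction; the standard DP-means assignment rule (Kulis and Jordan, 2012) assigns each point to its nearest existing mean unless all means are farther than the threshold, in which case the point starts a new cluster:

```latex
% Cluster membership for point x_i given current means mu_1, ..., mu_K:
z_i =
\begin{cases}
\displaystyle \arg\min_{c \in \{1,\dots,K\}} \|x_i - \mu_c\|^2,
  & \text{if } \min_{c} \|x_i - \mu_c\|^2 \le \lambda, \\[6pt]
K + 1 \quad (\text{new cluster with } \mu_{K+1} = x_i),
  & \text{otherwise.}
\end{cases}
```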

DP-Means Algorithm

Updating cluster centers [Kulis and Jordan, ICML 2012]:
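Again reconstructing the missing equation: as in K-means, each center is updated to the mean of the points currently assigned to it:

```latex
% Center update for cluster c with assigned point set ell_c:
\mu_c = \frac{1}{|\ell_c|} \sum_{x_i \in \ell_c} x_i
```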

DP-Means Parallel Execution

Computing cluster membership in parallel:

[Diagram: CPU 1 and CPU 2 assign points concurrently.]

Constraint: processors cannot be allowed to introduce overlapping clusters in parallel.

Optimistic Concurrency Control for Parallel DP-Means

• Optimistic assumption: no new cluster is created nearby
• Validation: verify that new clusters don’t overlap
• Resolution: assign the proposed new cluster center to the existing cluster

[Diagram: CPU 1 and CPU 2 propose new clusters concurrently.] A sketch of one OCC DP-means epoch follows.
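A minimal sketch of one epoch under these assumptions (illustrative, not the authors' distributed implementation; `occ_dp_means_epoch`, `lam`, and `n_workers` are invented names, and the parallel phase is simulated by chunking):

```python
import numpy as np

def occ_dp_means_epoch(points, centers, lam, n_workers=4):
    """One epoch: workers optimistically propose new clusters in parallel;
    proposals are then validated serially against already-accepted centers.
    `points` is an (n, d) array; `centers` is a list of d-vectors."""
    chunks = np.array_split(points, n_workers)

    # Parallel phase (optimistic): each worker assigns its points against a
    # snapshot of `centers`, proposing a new cluster when no center is close.
    proposals = []
    for chunk in chunks:  # in a real system, one task per worker
        for x in chunk:
            if not centers or min(float(np.sum((c - x) ** 2)) for c in centers) > lam:
                proposals.append(x)  # optimistic: assume no nearby new cluster

    # Serial validation/resolution phase: accept a proposal only if it does
    # not overlap a cluster committed since the snapshot was taken.
    for x in proposals:
        if not centers or min(float(np.sum((c - x) ** 2)) for c in centers) > lam:
            centers.append(x)  # validated: commit the new cluster
        # else: resolution -- the point is simply assigned to the existing
        # nearby cluster, and no new center is created.
    return centers
```

Because only new-cluster proposals need validation, the serial phase stays cheap whenever proposals are rare.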

Concurrency Control for DP-Means

Correctness:

Theorem: OCC DP-means is serializable, i.e. equivalent to some sequential execution.

Corollary: OCC DP-means preserves theoretical properties of DP-means.

Concurrency:

Theorem: Assuming well-spaced clusters, the expected overhead of OCC DP-means, in terms of the number of rejected proposals, does not depend on the size of the data set.

Empirical Validation: Failure Rate

[Plot: OCC overhead, measured as the number of points failing validation, vs. dataset size, for λ-separable clusters, with 2, 4, 8, 16, and 32 processors.]

Result: the failure rate is independent of dataset size.

Empirical Validation: Failure Rate

[Plot: OCC overhead, measured as the number of points failing validation, vs. dataset size, for overlapping clusters, with 2, 4, 8, 16, and 32 processors.]

Result: only weak dependence on dataset size.

Distributed Evaluation: Amazon EC2

[Plot: runtime in seconds per complete pass over the data vs. number of machines (1–8), comparing OCC DP-means runtime with projected linear scaling.]

2x #machines ≈ ½x runtime

~140 million data points; 1, 2, 4, 8 machines

Summary

                    | Accuracy                         | Scalability
Sequential          | Appealing theoretical properties | Little
Coordination-free   | Approximate, under assumptions   | Always fast
Concurrency Control | Always correct                   | Good, under assumptions

• The coordination-free approach guarantees speed, and its analysis focuses on showing accuracy under assumptions.
• Our approach guarantees accuracy, and its analysis focuses on showing speed under assumptions.

Conclusions

• Many conceptual and mathematical challenges arise when the problem of “Big Data” is taken seriously

• Facing these challenges will require a rapprochement between computer science and statistics, bringing them together at the level of their foundations and thus reshaping both disciplines
