Convex Optimization:
from Real-Time Embedded
to Large-Scale Distributed
Stephen Boyd
Neal Parikh, Eric Chu, Yang Wang, Jacob Mattingley
Electrical Engineering Department, Stanford University
Lytle Lecture, University of Washington, 10/29/2013
Outline
Convex Optimization
Real-Time Embedded Optimization
Large-Scale Distributed Optimization
Summary
Convex Optimization 3
Convex optimization — Classical form
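$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& Ax = b
\end{array}
$$
◮ variable $x \in \mathbf{R}^n$
◮ $f_0, \ldots, f_m$ are convex: for $\theta \in [0, 1]$,
$$ f_i(\theta x + (1-\theta) y) \le \theta f_i(x) + (1-\theta) f_i(y) $$
i.e., the $f_i$ have nonnegative (upward) curvature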
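The convexity inequality can be spot-checked numerically on sampled points. The sketch below is only an illustration: the helper name `violates_convexity` and the three test functions are assumptions made here, and a passing check merely fails to find a counterexample; it does not prove convexity.

```python
import random

def violates_convexity(f, points, thetas, tol=1e-12):
    """Return True if some sampled (x, y, theta) breaks
    f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)."""
    for x in points:
        for y in points:
            for theta in thetas:
                lhs = f(theta * x + (1 - theta) * y)
                rhs = theta * f(x) + (1 - theta) * f(y)
                if lhs > rhs + tol:
                    return True
    return False

random.seed(0)
points = [random.uniform(-3.0, 3.0) for _ in range(25)]
thetas = [i / 10 for i in range(11)]

# x^2 and |x| are convex on R; x^3 is not (it is concave for x < 0).
square_ok = not violates_convexity(lambda x: x * x, points, thetas)
abs_ok = not violates_convexity(abs, points, thetas)
cubic_fails = violates_convexity(lambda x: x ** 3, points, thetas)
```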
Convex optimization — Cone form
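$$
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & x \in K \\
& Ax = b
\end{array}
$$
◮ variable $x \in \mathbf{R}^n$
◮ $K \subset \mathbf{R}^n$ is a proper cone
  ◮ $K$ nonnegative orthant → LP
  ◮ $K$ Lorentz cone → SOCP
  ◮ $K$ positive semidefinite matrices → SDP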
◮ the ‘modern’ canonical form
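To make the constraint $x \in K$ concrete, here is a minimal sketch of membership tests for two of the cones listed above. The function names, the tolerance, and the convention that the last entry of the vector is the "height" of the Lorentz cone are assumptions made for this illustration; the positive semidefinite case is omitted because it needs an eigenvalue or factorization routine.

```python
import math

def in_nonnegative_orthant(x, tol=1e-12):
    """K = R^n_+: every component is nonnegative (the LP case)."""
    return all(xi >= -tol for xi in x)

def in_lorentz_cone(x, tol=1e-12):
    """K = {(u, t) : ||u||_2 <= t}, with t stored as the last entry (the SOCP case)."""
    *u, t = x
    return math.sqrt(sum(ui * ui for ui in u)) <= t + tol

on_boundary = in_lorentz_cone([3.0, 4.0, 5.0])   # ||(3, 4)||_2 = 5
outside = in_lorentz_cone([3.0, 4.0, 4.9])
# Cones are closed under nonnegative scaling.
scaled = in_lorentz_cone([2.0 * v for v in [3.0, 4.0, 6.0]])
```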
Why
◮ beautiful, nearly complete theory
  ◮ duality, optimality conditions, ...
◮ effective algorithms, methods (in theory and practice)
  ◮ get global solution (and optimality certificate)
  ◮ polynomial complexity
◮ conceptual unification of many methods
◮ lots of applications (many more than previously thought)
Application areas
◮ machine learning, statistics
◮ finance
◮ supply chain, revenue management, advertising
◮ control
◮ signal and image processing, vision
◮ networking
◮ circuit design
◮ combinatorial optimization
◮ quantum mechanics
Applications — Machine learning
◮ parameter estimation for regression and classification
  ◮ least squares, lasso regression
  ◮ logistic, SVM classifiers
  ◮ ML and MAP estimation for exponential families
◮ modern ℓ1 and other sparsifying regularizers
  ◮ compressed sensing, total variation reconstruction
◮ k-means, EM, auto-encoders (bi-convex)
Example — Support vector machine
◮ data $(a_i, b_i)$, $i = 1, \ldots, m$
  ◮ $a_i \in \mathbf{R}^n$ feature vectors; $b_i \in \{-1, 1\}$ Boolean outcomes
◮ prediction: $\hat{b} = \mathrm{sign}(w^T a - v)$
  ◮ $w \in \mathbf{R}^n$ is weight vector; $v \in \mathbf{R}$ is offset
◮ SVM: choose $w$, $v$ via (convex) optimization problem
$$ \text{minimize} \quad L + (\lambda/2)\|w\|_2^2 $$
where $L = (1/m)\sum_{i=1}^m \left(1 - b_i(w^T a_i - v)\right)_+$ is the average loss
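The SVM objective can be evaluated directly from its definition. The following is a sketch under stated assumptions: the function names, the tiny data set, and the tie-breaking rule in `predict` are all made up for illustration, and the code only evaluates the objective; it does not solve the optimization problem.

```python
def svm_objective(w, v, A, b, lam):
    """Average hinge loss plus (lam/2)*||w||_2^2.
    A is a list of feature vectors a_i, b a list of labels in {-1, +1}."""
    m = len(A)
    margins = [bi * (sum(wj * aj for wj, aj in zip(w, ai)) - v)
               for ai, bi in zip(A, b)]
    avg_loss = sum(max(0.0, 1.0 - mi) for mi in margins) / m
    return avg_loss + 0.5 * lam * sum(wj * wj for wj in w)

def predict(w, v, a):
    """sign(w^T a - v); a value of exactly zero is mapped to +1 (arbitrary choice)."""
    return 1 if sum(wj * aj for wj, aj in zip(w, a)) - v >= 0 else -1

# Hypothetical, linearly separable toy data.
A = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]]
b = [1, 1, -1, -1]
w, v, lam = [1.0, 1.0], 0.0, 0.1

value = svm_objective(w, v, A, b, lam)                   # every margin is 2, so the loss term is 0
value_at_zero = svm_objective([0.0, 0.0], 0.0, A, b, lam)  # every margin is 0, so the loss term is 1
```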
SVM
[Figure: $w^T z - v = 0$ (solid); $|w^T z - v| = 1$ (dashed)]
Sparsity via ℓ1 regularization
◮ adding ℓ1-norm regularization
$$ \lambda\|x\|_1 = \lambda(|x_1| + |x_2| + \cdots + |x_n|) $$
to objective results in sparse $x$
◮ $\lambda > 0$ controls trade-off of sparsity versus main objective
◮ preserves convexity, hence tractability
◮ used for many years, in many fields
  ◮ sparse design
  ◮ feature selection in machine learning (lasso, SVM, ...)
  ◮ total variation reconstruction in signal processing
  ◮ compressed sensing
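One way to see why the ℓ1 term produces exact zeros is the scalar problem of minimizing $(1/2)(x - a)^2 + \lambda|x|$, whose minimizer is the soft-thresholding function $\mathrm{sign}(a)\max(|a| - \lambda, 0)$. This scalar example is not taken from the slides; it is a standard fact added here as an illustration, with a made-up input vector.

```python
def soft_threshold(a, lam):
    """Minimizer of 0.5*(x - a)**2 + lam*abs(x) over scalar x (lam >= 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # every |a| <= lam is mapped to exactly zero

a = [3.0, -0.2, 0.5, -2.0, 0.05]
x = [soft_threshold(ai, 1.0) for ai in a]
num_nonzero = sum(1 for xi in x if xi != 0.0)
```

Entries with magnitude at most $\lambda$ are set to exactly zero, while larger entries are shrunk toward zero by $\lambda$.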
Example — Lasso
◮ regression problem with ℓ1 regularization:
$$ \text{minimize} \quad (1/2)\|Ax - b\|_2^2 + \lambda\|x\|_1 $$
with $A \in \mathbf{R}^{m \times n}$
◮ useful even when $n \gg m$ (!!); does feature selection
◮ cf. ℓ2 regularization ('ridge regression'):
$$ \text{minimize} \quad (1/2)\|Ax - b\|_2^2 + \lambda\|x\|_2^2 $$
◮ lasso, ridge regression have same computational cost
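A small sketch of the lasso problem follows. It uses proximal gradient descent (ISTA), chosen here only because it fits in a few lines of plain Python; it is a stand-in and not the solver used for the results in these slides. The function names, the step-size requirement stated in the docstring, and the toy data (with $A = I$) are assumptions for illustration.

```python
def soft_threshold(a, t):
    """Scalar soft-thresholding: sign(a) * max(|a| - t, 0)."""
    return max(a - t, 0.0) - max(-a - t, 0.0)

def lasso_ista(A, b, lam, step, iters=500):
    """Proximal gradient (ISTA) for 0.5*||Ax - b||_2^2 + lam*||x||_1.
    `step` should be at most 1 / (largest eigenvalue of A^T A)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # residual Ax - b
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]         # gradient A^T r
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x

# With A = I the lasso solution is soft-thresholding of b, and the ridge
# solution of 0.5*||x - b||^2 + lam*||x||^2 is b / (1 + 2*lam).
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, -0.5, 1.5]
lam = 1.0
x_lasso = lasso_ista(A, b, lam, step=1.0)
x_ridge = [bi / (1.0 + 2.0 * lam) for bi in b]
```

In this toy case the lasso solution has an exact zero, whereas every ridge coefficient is shrunk but remains nonzero.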
Example — Lasso
◮ m = 200 examples, n = 1000 features
◮ examples are noisy linear measurements of true x
◮ true x is sparse (30 nonzeros)
[Figure: two panels, true x and ℓ2 reconstruction, each plotted over indices 1 to 1000 with values between −1 and 1]
Example — Lasso
[Figure: two panels, true x and ℓ1 (lasso) reconstruction, each plotted over indices 1 to 1000 with values between −1 and 1]
State of the art — Medium scale solvers
◮ 1000s–10000s variables, constraints
◮ reliably solved by interior-point methods on single machine
◮ exploit problem sparsity
◮ not quite a technology, but getting there
State of the art — Modeling languages
◮ (new) high level language support for convex optimization
  ◮ describe problem in high level language
  ◮ description is automatically transformed to cone problem
  ◮ solved by standard solver, transformed back to original form
◮ enables rapid prototyping (for small and medium problems)
◮ ideal for teaching (can do a lot with short scripts)
CVX
◮ parser/solver written in Matlab (M. Grant, 2005)
◮ SVM:
$$ \text{minimize} \quad L + (\lambda/2)\|w\|_2^2 $$
where $L = (1/m)\sum_{i=1}^m \left(1 - b_i(w^T a_i - v)\right)_+$ is the average loss
◮ CVX specification:
cvx_begin
    variables w(n) v                         % weight, offset
    L = (1/m)*sum(pos(1 - b.*(A*w - v)));    % avg. loss
    minimize (L + (lambda/2)*sum_square(w))
cvx_end
Real-Time Embedded Optimization 18
Motivation
◮ in many applications, need to solve the same problem repeatedly with different data
  ◮ control: update actions as sensor signals, goals change
  ◮ finance: rebalance portfolio as prices, predictions change
◮ used now when solve times are measured in minutes, hours
  ◮ supply chain, chemical process control, trading
◮ (using new techniques) can be used for applications with solve times measured in milliseconds or microseconds
Example — Disk head positioning
[Figure: disk head/arm with applied force F]
◮ force $F(t)$ moves disk head/arm modeled as 3 masses (2 vibration modes)
◮ goal: move head to commanded position as quickly as possible, with $|F(t)| \le 1$
◮ reduces to a (quasi-) convex problem
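Quasiconvex problems such as minimum-time control are commonly handled by bisection on the objective value, solving a convex feasibility problem at each trial value. The slides do not say which method was used, so the sketch below is an assumption. The feasibility oracle `reachable_in` is a closed-form toy for a single unit mass (not the 3-mass model above), standing in for a real convex feasibility solve.

```python
def bisect_min_feasible(is_feasible, lo, hi, tol=1e-9):
    """Smallest t in [lo, hi] with is_feasible(t) True,
    assuming feasibility is monotone in t (infeasible below, feasible above)."""
    if not is_feasible(hi):
        raise ValueError("upper bound is infeasible")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

d = 4.0  # hypothetical commanded displacement

def reachable_in(T):
    # Toy oracle: a unit mass starting at rest can travel at most T^2/4 and
    # come to rest again within time T when |F| <= 1 (bang-bang force).
    return T * T / 4.0 >= d

t_min = bisect_min_feasible(reachable_in, 0.0, 10.0)  # analytic answer: 2*sqrt(d)
```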
Optimal force profile
[Figure: two panels, position (0 to 3) and force F(t) (−1 to 1), each plotted against time t from −2 to 12]
Embedded solvers — Requirements
◮ high speed
  ◮ hard real-time execution limits
◮ extreme reliability and robustness
  ◮ no floating point exceptions
  ◮ must handle poor quality data
◮ small footprint
  ◮ no complex libraries
Embedded solvers
◮ (if a general solver works, use it)
◮ otherwise, develop custom code
  ◮ by hand
  ◮ automatically via code generation
◮ can exploit known sparsity pattern, data ranges, required tolerance at solver code development time
◮ typical speed-up over general solver: 100–10000×
Parser/solver vs. code generator
[Diagram] parser/solver: problem instance → parser/solver → x⋆
[Diagram] code generator: problem family description → generator → source code → compiler → custom solver; problem instance → custom solver → x⋆
CVXGEN code generator
◮ handles small, medium size problems transformable to QP (J. Mattingley, 2010)
◮ uses primal-dual interior-point method
◮ generates flat library-free C source
CVXGEN example specification — SVM
dimensions
  m = 50   % training examples
  n = 10   % dimensions
end
parameters
  a[i] (n), i = 1..m   % features
  b[i], i = 1..m       % outcomes
  lambda positive
end
variables
  w (n)   % weights
  v       % offset
end
minimize
  (1/m)*sum[i = 1..m](pos(1 - b[i]*(w'*a[i] - v))) + (lambda/2)*quad(w)
end
CVXGEN sample solve times
problem              SVM       Disk
variables            61        590
constraints          100       742
CVX, Intel i3        270 ms    2100 ms
CVXGEN, Intel i3     230 µs    4.8 ms
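The speed-up implied by these timings is a one-line calculation. The sketch below only restates the four solve times from the table in seconds; the dictionary names are arbitrary.

```python
# Solve times from the table, converted to seconds.
cvx_s = {"SVM": 270e-3, "Disk": 2100e-3}
cvxgen_s = {"SVM": 230e-6, "Disk": 4.8e-3}

# Ratio of general parser/solver time to generated-solver time.
speedup = {name: cvx_s[name] / cvxgen_s[name] for name in cvx_s}
```

This gives roughly 1170× for the SVM problem and about 440× for the disk problem, both within the 100–10000× range quoted for custom solvers.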
Large-Scale Distributed Optimization 28
Motivation and goal
motivation:
◮ want to solve arbitrary-scale optimization problems
  ◮ machine learning/statistics with huge datasets
  ◮ dynamic optimization on large-scale networks
goal:
◮ ideally, a system that
  ◮ has CVX-like interface
  ◮ targets modern large-scale computing platforms
  ◮ scales arbitrarily
... not there yet, but there's promising progress
Distributed optimization
◮ devices/processors/agents coordinate to solve large problem, by passing relatively small messages
◮ can split variables, constraints, objective terms among processors
◮ variables that appear in more than one processor called 'complicating variables' (same for constraints, objective terms)
Example — Distributed optimization
$$ \text{minimize} \quad f_1(x_1, x_2) + f_2(x_2, x_3) + f_3(x_1, x_3) $$
[Figure: graph connecting the objective terms f1, f2, f3 to the variables x1, x2, x3]
Distributed optimization methods
◮ dual decomposition (Dantzig-Wolfe, 1950s–)
◮ subgradient consensus (Tsitsiklis, Bertsekas, Nedic, Ozdaglar, Jadbabaie, 1980s–)
◮ alternating direction method of multipliers (1980s–)
  ◮ equivalent to many other methods (e.g., Douglas-Rachford splitting)
  ◮ well suited to modern systems and problems
Consensus optimization
◮ want to solve problem with $N$ objective terms
$$ \text{minimize} \quad \sum_{i=1}^N f_i(x) $$
e.g., $f_i$ is the loss function for $i$th block of training data
◮ consensus form:
$$
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^N f_i(x_i) \\
\text{subject to} & x_i - z = 0
\end{array}
$$
  ◮ $x_i$ are local variables
  ◮ $z$ is the global variable
  ◮ $x_i - z = 0$ are consistency or consensus constraints
Consensus optimization via ADMM
with $\bar{x}^k = (1/N)\sum_{i=1}^N x_i^k$ (average over local variables)
$$
\begin{aligned}
x_i^{k+1} &:= \mathop{\mathrm{argmin}}_{x_i} \left( f_i(x_i) + (\rho/2)\|x_i - \bar{x}^k + u_i^k\|_2^2 \right) \\
u_i^{k+1} &:= u_i^k + (x_i^{k+1} - \bar{x}^{k+1})
\end{aligned}
$$
◮ get global minimum, under very general conditions
◮ $u^k$ is running sum of inconsistencies (PI control)
◮ minimizations carried out independently and in parallel
◮ coordination is via averaging of local variables $x_i$
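The iteration can be traced on a toy problem where each local minimization has a closed form. The sketch below assumes scalar $x$ and $f_i(x) = (1/2)(x - a_i)^2$, a choice made here purely for illustration; setting the derivative of the local objective to zero gives $x_i = (a_i + \rho(\bar{x} - u_i))/(1 + \rho)$. The function name and the data are hypothetical.

```python
def consensus_admm_quadratic(a, rho=1.0, iters=100):
    """Consensus ADMM for f_i(x) = 0.5*(x - a[i])**2 with scalar x.
    Returns the local variables and their average after `iters` iterations."""
    N = len(a)
    u = [0.0] * N
    xbar = 0.0
    x = [0.0] * N
    for _ in range(iters):
        # Local x-updates: independent of each other, so they could run in parallel.
        x = [(a[i] + rho * (xbar - u[i])) / (1.0 + rho) for i in range(N)]
        # Coordination: average the local variables.
        xbar = sum(x) / N
        # Each u_i accumulates the inconsistency x_i - xbar.
        u = [u[i] + x[i] - xbar for i in range(N)]
    return x, xbar

a = [1.0, 2.0, 6.0]
x, xbar = consensus_admm_quadratic(a)
```

For this choice of $f_i$ the global minimizer is the mean of the $a_i$ (here 3), and both the average and every local variable converge to it.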
Statistical interpretation
◮ $f_i$ is negative log-likelihood (loss) for parameter $x$ given $i$th data block
◮ $x_i^{k+1}$ is MAP estimate under prior $\mathcal{N}(\bar{x}^k - u_i^k, (1/\rho) I)$ (covariance $(1/\rho) I$, i.e., precision $\rho I$)
◮ processors only need to support a Gaussian MAP method
  ◮ type or number of data in each block not relevant
  ◮ consensus protocol yields global ML estimate
◮ privacy preserving: agents never reveal data to each other
Example — Consensus SVM
◮ baby problem with $n = 2$, $m = 400$ to illustrate
◮ examples split into $N = 20$ groups, in worst possible way: each group contains only positive or negative examples
Iteration 1
[Figure: plot at this iteration, horizontal axis from −3 to 3, vertical axis from −10 to 10]
Iteration 5
[Figure: plot at this iteration, horizontal axis from −3 to 3, vertical axis from −10 to 10]
Iteration 40
[Figure: plot at this iteration, horizontal axis from −3 to 3, vertical axis from −10 to 10]
Example — Distributed lasso
◮ example with dense $A \in \mathbf{R}^{400000 \times 8000}$ (∼30 GB of data)
  ◮ distributed solver written in C using MPI and GSL
  ◮ no optimization or tuned libraries (like ATLAS, MKL)
  ◮ split into 80 subsystems across 10 (8-core) machines on Amazon EC2
◮ computation times
  loading data                              30 s
  factorization (5000 × 8000 matrices)      5 min
  subsequent ADMM iterations                0.5–2 s
  total time (about 15 ADMM iterations)     5–6 min
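The problem dimensions quoted above can be cross-checked with simple arithmetic. The only assumption in this sketch is 8 bytes per matrix entry (double precision); the slides do not state the storage format, so the resulting sizes are estimates.

```python
rows, cols = 400_000, 8_000
subsystems, machines, cores_per_machine = 80, 10, 8

rows_per_block = rows // subsystems            # rows of A held by each subsystem
total_cores = machines * cores_per_machine     # one subsystem per core

bytes_per_entry = 8                            # assumption: double precision
total_gb = rows * cols * bytes_per_entry / 1e9
block_mb = rows_per_block * cols * bytes_per_entry / 1e6
```

Each subsystem holds a 5000 × 8000 block, matching the factorization size in the timing list, and there is one subsystem per core. Under the double-precision assumption the full matrix takes about 25.6 GB, the same order as the quoted ∼30 GB, with about 320 MB per block.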
Summary 41
Summary
convex optimization problems
◮ arise in many applications
◮ can be solved effectively
  ◮ small problems at microsecond/millisecond time scales
  ◮ medium-scale problems using general purpose methods
  ◮ arbitrary-scale problems using distributed optimization
References
◮ Convex Optimization (Boyd & Vandenberghe)
◮ CVX: Matlab software for disciplined convex programming (Grant & Boyd)
◮ CVXGEN: A code generator for embedded convex optimization (Mattingley & Boyd)
◮ Distributed optimization and statistical learning via the alternating direction method of multipliers (Boyd, Parikh, Chu, Peleato, & Eckstein)
all available (with code) from stanford.edu/~boyd