![Page 1: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/1.jpg)
Accelerated, Parallel and PROXimal coordinate descent
IPAM, February 2014
APPROX
Peter Richtárik
(Joint work with Olivier Fercoq - arXiv:1312.5799)
![Page 2: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/2.jpg)
Contributions
![Page 3: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/3.jpg)
Variants of Randomized Coordinate Descent Methods
• Block: can operate on “blocks” of coordinates, as opposed to just individual coordinates
• General: applies to “general” (= smooth convex) functions, as opposed to special ones such as quadratics
• Proximal: admits a “nonsmooth regularizer” that is kept intact in solving subproblems (the regularizer is neither smoothed nor approximated)
• Parallel: operates on multiple blocks / coordinates in parallel, as opposed to just 1 block / coordinate at a time
• Accelerated: achieves O(1/k^2) convergence rate for convex functions, as opposed to O(1/k)
• Efficient: avoids adding two full feature vectors
![Page 4: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/4.jpg)
Brief History of Randomized Coordinate Descent Methods
+ new long stepsizes
![Page 5: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/5.jpg)
Introduction
![Page 6: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/6.jpg)
I. Block Structure
II. Block Sampling
III. Proximal Setup
IV. Fast or Normal?
![Page 7: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/7.jpg)
I. Block Structure
![Page 8: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/8.jpg)
I. Block Structure
![Page 9: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/9.jpg)
I. Block Structure
![Page 10: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/10.jpg)
I. Block Structure
![Page 11: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/11.jpg)
I. Block Structure
![Page 12: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/12.jpg)
I. Block Structure
N = # coordinates (variables)
n = # blocks
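A minimal sketch of the block structure, assuming the N coordinates are split into n contiguous blocks of near-equal size. The contiguous partition and the names `blocks` and `block_of` are illustrative assumptions; the slides only fix the meaning of N and n.

```python
import numpy as np

N, n = 10, 4  # N coordinates (variables), n blocks

# Partition the coordinate indices 0..N-1 into n contiguous blocks.
blocks = np.array_split(np.arange(N), n)

x = np.arange(N, dtype=float)


def block_of(x, i):
    """Return the i-th block of x, i.e. the sub-vector of x on block i."""
    return x[blocks[i]]


block_sizes = [len(idx) for idx in blocks]
```

With N = n every block has size 1 and the method operates on individual coordinates.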
![Page 13: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/13.jpg)
II. Block Sampling
Block sampling
Average # blocks selected by the sampling
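The sketch below illustrates two common ways to sample a random set of blocks and checks the average number of blocks selected empirically. The function names, and the choice of a "tau-nice" sampling (a uniformly random subset of exactly tau blocks) and an independent sampling, are assumptions made for illustration.

```python
import numpy as np


def tau_nice_sampling(n, tau, rng):
    """Pick exactly tau of the n blocks, uniformly at random, without replacement."""
    return rng.choice(n, size=tau, replace=False)


def independent_sampling(p, rng):
    """Include block i independently with probability p[i]."""
    p = np.asarray(p, dtype=float)
    return np.flatnonzero(rng.random(p.size) < p)


rng = np.random.default_rng(0)
n, tau = 10, 3

# A tau-nice sampling always selects tau blocks, so its average size is tau.
avg_nice = np.mean([len(tau_nice_sampling(n, tau, rng)) for _ in range(1000)])

# An independent sampling selects sum(p) blocks on average.
p = np.full(n, tau / n)
avg_indep = np.mean([len(independent_sampling(p, rng)) for _ in range(20000)])
```

Both samplings select tau blocks on average, but only the first selects exactly tau blocks in every iteration.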
![Page 14: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/14.jpg)
III. Proximal Setup
Convex & Smooth Convex & Nonsmooth
Loss Regularizer
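The setup on this slide is commonly written as the composite problem below. The formula did not survive extraction, so the symbols $F$, $f$, $\psi$, $\psi_i$ and $x^{(i)}$ (the $i$-th block of $x$) are assumed notation rather than a transcription.

$$
\min_{x \in \mathbb{R}^N} \; F(x) = f(x) + \psi(x), \qquad \psi(x) = \sum_{i=1}^{n} \psi_i\big(x^{(i)}\big),
$$

where $f$ is convex and smooth (the loss) and $\psi$ is convex, possibly nonsmooth, and separable across blocks (the regularizer).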
![Page 15: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/15.jpg)
III. Proximal Setup
Loss Functions: Examples
Quadratic loss
L-infinity
L1 regression
Exponential loss
Logistic loss
Square hinge loss
BKBG’11, RT’11b, TBRS’13, RT’13a, FR’13
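A short sketch of the smooth losses from the list, written as scalar functions. The parameterization is an assumption: `r` stands for a residual such as a^T x - b and `m` for a classification margin such as y * a^T x; the exact scaling used on the slide is not recoverable. The L-infinity and L1 regression losses are nonsmooth and are not covered here.

```python
import numpy as np


def quadratic_loss(r):
    """0.5 * r^2, with r a residual."""
    return 0.5 * np.square(r)


def logistic_loss(m):
    """log(1 + exp(-m)), computed stably, with m a margin."""
    return np.logaddexp(0.0, -m)


def exponential_loss(m):
    """exp(-m), with m a margin."""
    return np.exp(-m)


def square_hinge_loss(m):
    """0.5 * max(0, 1 - m)^2, with m a margin."""
    return 0.5 * np.square(np.maximum(0.0, 1.0 - m))
```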
![Page 16: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/16.jpg)
III. Proximal Setup
Regularizers: Examples
• No regularizer
• Weighted L1 norm (e.g., LASSO)
• Weighted L2 norm
• Box constraints (e.g., SVM dual)
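Each regularizer above has a cheap closed-form proximal operator, which is what lets it be "kept intact" in the subproblems. The sketch below assumes coordinate-wise weights `w`, a step `t > 0`, and, for the L2 case, the squared form 0.5 * sum(w_i * x_i^2); the slide's exact formulas are not available, so these are illustrative choices.

```python
import numpy as np


def prox_none(u, t):
    """No regularizer: the prox is the identity."""
    return u


def prox_weighted_l1(u, t, w):
    """Prox of t * sum(w_i * |x_i|): soft-thresholding at level t * w_i."""
    return np.sign(u) * np.maximum(np.abs(u) - t * w, 0.0)


def prox_weighted_l2_squared(u, t, w):
    """Prox of t * 0.5 * sum(w_i * x_i^2): coordinate-wise shrinkage."""
    return u / (1.0 + t * w)


def prox_box(u, lower, upper):
    """Prox of the indicator of the box [lower, upper]: projection (clipping)."""
    return np.clip(u, lower, upper)
```

All four operate coordinate by coordinate, so they can be applied to a single block without touching the rest of the vector.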
![Page 17: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/17.jpg)
The Algorithm
![Page 18: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/18.jpg)
APPROX
Olivier Fercoq and P.R. Accelerated, parallel and proximal coordinate descent, arXiv:1312.5799, December 2013
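The algorithm itself appears only as an image on this slide. The sketch below follows the APPROX iteration as described in arXiv:1312.5799, specialized to LASSO with blocks of size 1 and a tau-nice sampling. Several things are assumptions: the function names, the LASSO test problem, and the stepsize parameters `v`, for which a conservative choice (beta times the coordinate-wise Lipschitz constants) is used instead of the paper's refined ones. It also uses full-vector operations for readability, so it is not the efficient implementation.

```python
import numpy as np


def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)


def lasso_objective(A, b, lam, x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def approx_lasso(A, b, lam, tau=1, iters=2000, seed=0):
    """Sketch of APPROX for 0.5*||Ax - b||^2 + lam*||x||_1 (blocks of size 1)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)                   # coordinate-wise Lipschitz constants
    omega = int((A != 0).sum(axis=1).max())    # max # nonzeros in a row of A
    beta = 1.0 + (omega - 1) * (tau - 1) / max(1, n - 1)
    v = beta * L                               # conservative stepsize parameters (assumption)
    x = np.zeros(n)
    z = x.copy()
    theta = tau / n
    for _ in range(iters):
        y = (1.0 - theta) * x + theta * z
        S = rng.choice(n, size=tau, replace=False)   # tau-nice sampling
        grad_S = A[:, S].T @ (A @ y - b)             # partial derivatives at y
        step = tau / (n * theta * v[S])
        z_new = z.copy()
        z_new[S] = soft_threshold(z[S] - step * grad_S, lam * step)
        x = y + (n / tau) * theta * (z_new - z)
        z = z_new
        theta = 0.5 * (np.sqrt(theta ** 4 + 4.0 * theta ** 2) - theta ** 2)
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
lam = 0.1
x_hat = approx_lasso(A, b, lam, tau=2)
```

Only the sampled coordinates of z change in each iteration; the sequence theta_k shrinks roughly like 1/k, which is the source of the accelerated rate.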
![Page 19: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/19.jpg)
Part B: GRADIENT METHODS
B1 GRADIENT DESCENT
B2 PROJECTED GRADIENT DESCENT
B3 PROXIMAL GRADIENT DESCENT (ISTA)
B4 FAST PROXIMAL GRADIENT DESCENT (FISTA)
Part C: RANDOMIZED COORDINATE DESCENT
C1 PROXIMAL COORDINATE DESCENT
C2 PARALLEL COORDINATE DESCENT
C3 DISTRIBUTED COORDINATE DESCENT
C4 FAST PARALLEL COORDINATE DESCENT (new)
Olivier Fercoq and P.R. Accelerated, parallel and proximal coordinate descent, arXiv:1312.5799, Dec 2013
![Page 20: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/20.jpg)
PCDM
P.R. and Martin Takac. Parallel coordinate descent methods for big data optimization, arXiv:1212.0873, December 2012
IMA Fox Prize in Numerical Analysis, 2013
![Page 21: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/21.jpg)
2D Example
![Page 22: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/22.jpg)
Convergence Rate
![Page 23: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/23.jpg)
Convergence Rate
average # coordinates updated / iteration
# blocks
# iterations
implies
Theorem [Fercoq & R. 12/2013]
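The statement of the theorem is an image on the slide. For reference, the bound as I recall it from arXiv:1312.5799 is reproduced below; treat it as a recollection to be checked against the paper, with $\tau$ the average number of blocks updated per iteration, $n$ the number of blocks, $k$ the number of iterations, and $v$ the stepsize parameters.

$$
\mathbb{E}\big[F(x_k) - F^*\big] \;\le\; \frac{4n^2}{\big((k-1)\tau + 2n\big)^2}\, C,
\qquad
C = \Big(1 - \frac{\tau}{n}\Big)\big(F(x_0) - F^*\big) + \frac{1}{2}\,\|x_0 - x^*\|_v^2 .
$$

Solving the bound for $k$ gives the iteration count:

$$
k \;\ge\; \frac{2n}{\tau}\left(\sqrt{\frac{C}{\epsilon}} - 1\right) + 1
\quad\Longrightarrow\quad
\mathbb{E}\big[F(x_k) - F^*\big] \le \epsilon .
$$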
![Page 24: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/24.jpg)
Special Case: Fully Parallel Variant
all blocks are updated in each iteration
# normalized weights (summing to n)
# iterations
implies
![Page 25: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/25.jpg)
New Stepsizes
![Page 26: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/26.jpg)
Expected Separable Overapproximation (ESO): How to Choose Block Stepsizes?
P.R. and Martin Takac. Parallel coordinate descent methods for big data optimization, arXiv:1212.0873, December 2012
Olivier Fercoq and P.R. Smooth minimization of nonsmooth functions by parallel coordinate descent methods, arXiv:1309.5885, September 2013 (SPCDM)
P.R. and Martin Takac. Distributed coordinate descent methods for learning with big data, arXiv:1310.2059, October 2013
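The ESO inequality itself is not in the extracted text. In the notation of the cited papers it is usually stated as follows for a uniform sampling $\hat{S}$ with $\tau = \mathbb{E}|\hat{S}|$; the symbols $h_{[\hat{S}]}$ (the vector $h$ with all blocks outside $\hat{S}$ set to zero) and $v = (v_1, \dots, v_n)$ are assumed notation.

$$
\mathbb{E}\Big[f\big(x + h_{[\hat{S}]}\big)\Big] \;\le\; f(x) + \frac{\tau}{n}\Big(\langle \nabla f(x), h\rangle + \frac{1}{2}\,\|h\|_v^2\Big),
\qquad
\|h\|_v^2 = \sum_{i=1}^{n} v_i\,\big\|h^{(i)}\big\|^2 .
$$

The parameters $v_i$ act as inverse block stepsizes: a smaller admissible $v_i$ allows a longer step on block $i$.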
![Page 27: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/27.jpg)
Assumptions: Function f
Example:
(a)
(b)
(c)
![Page 28: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/28.jpg)
Visualizing Assumption (c)
![Page 29: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/29.jpg)
New ESO
Theorem (Fercoq & R. 12/2013)
(i)
(ii)
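The theorem's formulas are images, so the sketch below relies on my recollection of the stepsize parameters from the cited papers, for a function of the form f(x) = sum_j phi_j(a_j^T x) with data matrix A, blocks of size 1, and a tau-nice sampling. Both formulas and the function name `eso_stepsizes` are assumptions to be checked against arXiv:1212.0873 and arXiv:1312.5799. The older rule uses the maximum row sparsity omega; the newer rule uses the sparsity omega_j of each row separately.

```python
import numpy as np


def eso_stepsizes(A, tau):
    """Return (v_old, v_new) for a tau-nice sampling; smaller v means longer steps."""
    n = A.shape[1]
    sq = A ** 2
    omega_j = (A != 0).sum(axis=1)      # # nonzeros in each row of A
    denom = max(1, n - 1)
    # Older rule: one factor beta, driven by the densest row.
    beta = 1.0 + (omega_j.max() - 1) * (tau - 1) / denom
    v_old = beta * sq.sum(axis=0)
    # Newer rule: a separate factor beta_j for each row.
    beta_j = 1.0 + (omega_j - 1) * (tau - 1) / denom
    v_new = (beta_j[:, None] * sq).sum(axis=0)
    return v_old, v_new


rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20)) * (rng.random((40, 20)) < 0.1)
A[0, :] = rng.standard_normal(20)       # one dense row inflates the older rule
v_old, v_new = eso_stepsizes(A, tau=8)
```

Because omega_j never exceeds omega, v_new is never larger than v_old, and a single dense row no longer penalizes every coordinate.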
![Page 30: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/30.jpg)
Comparison with Other Stepsizes for Parallel Coordinate Descent Methods
Example:
![Page 31: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/31.jpg)
Complexity for New Stepsizes
Average degree of separability
“Average” of the Lipschitz constants
With the new stepsizes, we have:
![Page 32: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/32.jpg)
Work in 1 Iteration
![Page 33: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/33.jpg)
Cost of 1 Iteration of APPROX
Assume N = n (all blocks are of size 1) and that
Sparse matrix
Scalar function: derivative = O(1)
Then the average cost of 1 iteration of APPROX is
arithmetic ops
= average # nonzeros in a column of A
![Page 34: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/34.jpg)
Bottleneck: Computation of Partial Derivatives
maintained
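To illustrate why maintaining a quantity makes partial derivatives cheap, here is a sketch for the quadratic loss 0.5*||Ax - b||^2 with plain (non-accelerated) serial coordinate descent. The residual r = Ax - b is maintained, so a partial derivative and a coordinate update each touch only the nonzeros of one column of A. The problem, the column storage in `rows`/`vals`, and the function names are assumptions; APPROX's efficient implementation maintains different quantities.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 20
A = rng.standard_normal((m, n)) * (rng.random((m, n)) < 0.1)  # sparse data
b = rng.standard_normal(m)

# Column-wise sparse storage: row indices and values of the nonzeros.
rows = [np.flatnonzero(A[:, i]) for i in range(n)]
vals = [A[rows[i], i] for i in range(n)]

x = np.zeros(n)
r = A @ x - b  # maintained residual, computed in full only once


def partial_derivative(i):
    """i-th partial derivative of 0.5*||Ax - b||^2, cost O(nnz of column i)."""
    return vals[i] @ r[rows[i]]


def update_coordinate(i, delta):
    """Apply x_i += delta and refresh the residual, cost O(nnz of column i)."""
    x[i] += delta
    r[rows[i]] += delta * vals[i]


for _ in range(200):
    i = int(rng.integers(n))
    L_i = vals[i] @ vals[i]
    if L_i > 0:
        update_coordinate(i, -partial_derivative(i) / L_i)
```

No full matrix-vector product is needed inside the loop.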
![Page 35: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/35.jpg)
Preliminary Experiments
![Page 36: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/36.jpg)
L1 Regularized L1 Regression
Dorothea dataset:
Gradient Method
Nesterov’s Accelerated Gradient Method
SPCDM
APPROX
![Page 37: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/37.jpg)
L1 Regularized L1 Regression
![Page 38: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/38.jpg)
L1 Regularized Least Squares (LASSO)
KDDB dataset:
PCDM
APPROX
![Page 39: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/39.jpg)
Training Linear SVMs
Malicious URL dataset:
![Page 40: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/40.jpg)
Importance Sampling
![Page 41: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/41.jpg)
with Importance Sampling
Zheng Qu and P.R. Accelerated coordinate descent with importance sampling, Manuscript 2014
P.R. and Martin Takac. On optimal probabilities in stochastic coordinate descent methods, arXiv:1310.3438, 2013
![Page 42: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/42.jpg)
Convergence Rate
Theorem [Qu & R. 2014]
![Page 43: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/43.jpg)
Serial Case: Optimal Probabilities
Nonuniform serial sampling:
Optimal Probabilities
Uniform Probabilities
![Page 44: Accelerated, Parallel and PROXimal coordinate descent IPAM February 2014 APPROX Peter Richtárik (Joint work with Olivier Fercoq - arXiv:1312.5799)](https://reader038.vdocument.in/reader038/viewer/2022103021/56649c755503460f94929117/html5/thumbnails/44.jpg)
Extra 40 Slides