Optimization Methods in Machine Learning, Lecture 22
Katya Scheinberg, Lehigh University, [email protected]


Page 1: Optimization Methods in Machine Learning, Lecture 22 · 2016. 4. 11.

Optimization Methods in Machine Learning

Lecture 22


Katya Scheinberg Lehigh University

[email protected]

Page 2:

Splitting, alternating linearization and alternating direction

methods

Page 3:

Augmented Lagrangian

Augmented Lagrangian method

Augmented Lagrangian function
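The augmented Lagrangian function itself did not survive extraction. For reference, the standard form for the split problem used in the following slides is (generic notation, not necessarily the slides' exact symbols):

```latex
\min_{x,y}\; f(x) + g(y) \quad \text{s.t.}\quad x = y,
\qquad
\mathcal{L}_{\mu}(x, y; \lambda)
  = f(x) + g(y) + \lambda^{\top}(x - y) + \frac{1}{2\mu}\,\|x - y\|_2^2 .
```

The augmented Lagrangian method alternates minimizing L_μ over the primal variables with the multiplier update λ ← λ + (x − y)/μ.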

Page 4:

•  Consider: min over x, y of f(x) + g(y), subject to x = y

•  Relax the constraint via the augmented Lagrangian technique

Alternating directions (splitting) method

Assume that f(x) and g(y) are such that the resulting subproblems are easy to optimize over x or over y.
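As a concrete sketch (not the slides' own code), the alternating-direction iteration for the lasso split min ½‖Ax − b‖² + λ‖y‖₁ s.t. x = y alternates an easy x-subproblem (a linear solve), an easy y-subproblem (shrinkage), and a multiplier update. Function names and the fixed penalty `rho` are assumptions:

```python
import numpy as np

def soft_threshold(v, t):
    # elementwise shrinkage: proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    # alternating direction method for min 0.5||Ax-b||^2 + lam||y||_1, s.t. x = y
    n = A.shape[1]
    x = np.zeros(n); y = np.zeros(n); u = np.zeros(n)  # u: scaled multiplier
    AtA = A.T @ A + rho * np.eye(n)                    # x-step matrix, fixed
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA, Atb + rho * (y - u))  # minimize over x (smooth part)
        y = soft_threshold(x + u, lam / rho)           # minimize over y (shrinkage)
        u = u + x - y                                  # multiplier update
    return y
```

With A = I the iterates converge to the prox of the l1 norm, soft_threshold(b, lam), illustrating that each subproblem stays easy.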

Page 5:

Alternating direction method (ADM)

A widely used method, for which no iteration-complexity bounds were known:

Combettes and Wajs, '05
Eckstein and Bertsekas, '92
Eckstein and Svaiter, '08
Glowinski and Le Tallec, '89
Kiwiel, Rosa, and Ruszczynski, '99
Lions and Mercier, '79

Page 6:

A slight modification of ADM

This turns out to be equivalent to…

Goldfarb, Ma and S, ’10

Page 7:

Alternating linearization method (ALM)

Goldfarb, Ma, S, ‘10

Page 8:

Convergence rate for ALM

Theorem: If μ ≤ 1/L, then ALM finds an ε-optimal solution in O(L/ε) iterations.

Goldfarb, Ma, S, ’10

Page 9:

Theorem: If μ ≤ 1/L, then fast ALM finds an ε-optimal solution in O(√(L/ε)) iterations.

Convergence rate for fast ALM

Goldfarb, Ma, S, ’10

Page 10:

Alternating linearization method for nonsmooth g

This is not true for ||x||₁: for nonsmooth g, Q_g(x, y) may fail to be an upper approximation of F(x)!

Goldfarb, Ma, S, ’10

Idea: with a line search, the method can accept different μ values for the g-step, including μ = 0 (which skips the step)
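The half-step that linearizes smooth f while keeping g = λ‖·‖₁ exact is always a valid upper model when μ ≤ 1/L, and it reduces to a shrinkage (prox-gradient) step. A minimal sketch under that assumption (all names hypothetical):

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1: elementwise shrinkage
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def f_half_step(x, grad_f, mu, lam):
    # minimize the model f(x) + <grad_f(x), y - x> + ||y - x||^2 / (2 mu) + lam ||y||_1
    return soft_threshold(x - mu * grad_f(x), mu * lam)

# toy smooth part: f(x) = 0.5 ||x - b||^2, gradient x - b, Lipschitz constant L = 1
b = np.array([3.0, 0.1, -2.0])
x = np.zeros(3)
for _ in range(50):
    x = f_half_step(x, lambda z: z - b, 1.0, 1.0)  # mu = 1/L = 1
# converges to soft_threshold(b, 1) = [2, 0, -1]
```

Reversing the roles (linearizing g) is exactly the step that can fail for ‖x‖₁, which is what the line-search / μ = 0 safeguard above addresses.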

Page 11:

Examples of applications of alternating linearization method

Page 12:

Sparse Inverse Covariance Selection

Shrinkage: O(n²) ops

Eigenvalue decomposition: O(n³) ops, the same cost as one gradient of f(X)

(objective split as f(X) + g(X))
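For the standard formulation with smooth part f(X) = −log det X + ⟨S, X⟩, the proximal subproblem has a closed form via one eigenvalue decomposition, matching the O(n³) cost the slide quotes. A sketch under that assumed formulation (function name hypothetical):

```python
import numpy as np

def prox_logdet(Y, S, mu):
    # closed-form minimizer of -logdet(X) + <S, X> + ||X - Y||_F^2 / (2 mu):
    # eigendecompose W = Y - mu*S, then solve gamma - mu/gamma = d per eigenvalue
    d, V = np.linalg.eigh(Y - mu * S)
    gamma = (d + np.sqrt(d ** 2 + 4.0 * mu)) / 2.0  # positive root of g^2 - d g - mu = 0
    return (V * gamma) @ V.T                        # V diag(gamma) V^T
```

The result is symmetric positive definite by construction, and the dominant cost is the single eigendecomposition, the same as evaluating one gradient of f.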

Page 13:

Sparse Inverse Covariance Selection

Eigenvalue decomposition: O(n³) ops, the same cost as one gradient of f(X)

(objective split as f(X) + g(X))

Page 14:

Lasso or group Lasso

Shrinkage: O(n²) ops

Eigenvalue decomposition: O(n³) ops, the same cost as one gradient of f(X)

(objective split as f(x) + g(x))
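For the group-lasso penalty the shrinkage step is a blockwise version of the scalar case: each group's subvector is scaled toward zero by its norm. A minimal sketch (the group-indexing scheme is an assumption):

```python
import numpy as np

def group_soft_threshold(v, groups, t):
    # blockwise shrinkage: proximal operator of t * sum_g ||v_g||_2
    out = np.zeros_like(v)
    for g in groups:
        ng = np.linalg.norm(v[g])
        if ng > t:
            out[g] = (1.0 - t / ng) * v[g]  # shrink the whole block by its norm
        # else: the block is set to zero (already zero in out)
    return out
```

With each group a singleton this reduces to the ordinary elementwise soft-thresholding used for lasso, still O(n) to O(n²) work per step.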

Page 15:

Robust PCA

Shrinkage: O(n²) ops

Eigenvalue decomposition: O(n³) ops, the same cost as one gradient of f(X)

(objective split as f(X) + g(X))

Page 16:

Recall Collaborative Prediction?

Closed form solution!

O(n³) effort
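The closed-form solution referenced here, for nuclear-norm-regularized subproblems as in Robust PCA and collaborative prediction, is singular value thresholding: one SVD at O(n³) cost, with shrinkage applied to the singular values. A minimal sketch (function name assumed):

```python
import numpy as np

def svt(Y, t):
    # proximal operator of t * ||X||_* (nuclear norm):
    # one SVD, then soft-threshold the singular values
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt
```

Because small singular values are set exactly to zero, the result is low rank, which is precisely why this prox step drives the matrix-completion and Robust PCA iterates toward low-rank solutions.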