matrix sketching for alternating direction method of ... · explicit+implicit dimension reduction...

18
MATRIX S KETCHING FOR ALTERNATING DIRECTION METHOD OF MOMENTS OPTIMIZATION Daniel J. McDonald Indiana University, Bloomington mypage.iu.edu/~dajmcdon Nonlinear Dimension Reduction (SDSS) 17 May 2018 1

Upload: others

Post on 27-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

MATRIX SKETCHING FOR ALTERNATING DIRECTION METHOD

OF MOMENTS OPTIMIZATION

Daniel J. McDonaldIndiana University, Bloomington

mypage.iu.edu/~dajmcdon

Nonlinear Dimension Reduction (SDSS)17 May 2018

1

Page 2: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

EXPLICIT+IMPLICIT DIMENSION REDUCTION

Modern statistical applications — genomics, neural image analysis, text analysis, weatherprediction — have large numbers of covariates p

Also frequently have lots of observations n.

Need algorithms which can handle these kinds of data sets. With good statistical properties

2

Page 3: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

MOTIVATING EXAMPLES

1. Localizing groups of genes thatpredict disease

2. Finding global temperaturetrends using satellite imagery

3. Detecting outliers in fMRIscans

3

Page 4: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

ESTIMATORS

1. Sparse PCR (BT04, PBHT08, DM17)

V = argminV ∈Fd

− 1n

tr(X>XV ) + λ∑ij

|Vij |

θ = argminθ

∥∥∥Y −XV θ∥∥∥2

2.

2. `1-trend filtering (KKBG09, TT12, T14, MK18)

θ = argminθ−L(Y | θ) + λ ‖Dθ‖1

3. PCA leverage (MNECL16, MMD18)

U = argminU∈Fd

− 1n

tr(XX>U

)+ λ

∥∥∥∥∥∥D∑j

|Vij |

∥∥∥∥∥∥1

4

Page 5: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

GENERIC CONVEX OPTIMIZATION

Many estimators have the form:minxf(x) + g(x)

Consider f(x) as the negative log-likelihood and g(x) as some kind of penalty thatpreferences useful structure.

The negative likelihood is convex and differentiable.The penalty may be neither.Sometimes relax the penalty to something convex to get approximate structure:

Example:minβ‖Y −Xβ‖22 + λ ‖β‖0 −→ min

β‖Y −Xβ‖22 + λ ‖β‖1

5

Page 6: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

ALTERNATING DIRECTION METHOD OF MULTIPLIERSOne way to solve optimization problems like this is to restate the problem

Original Equivalent

minx

f(x) + g(x)minx,z

f(x) + g(z)

s.t. x− z = 0

Then, iterate the following with ρ > 0

x← argminx

f(x) + ρ

2 ‖x− z + u‖22

z ← argminz

g(z) + ρ

2 ‖x− z + u‖22

u← u+ x− z

6

Page 7: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

WHY WOULD YOU DO THIS?It decouples f and g: this can be easierIf f and g have the right structure, the individual updates can be parallelizedThe algorithm converges under very general conditionsThere are often many ways to decouple a problem

minβ‖Y −Xβ‖22 + λ ‖β‖1

The individual minimizations don’t have to be solved in closed form

Example:β ← (X>X + ρI)−1(X>Y + ρ(α− u))α← Sλ/ρ(β + u)u← u+ β − α

[Sa(b)]k = sgn(bk)(|bk| − a)+

7

Page 8: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

CONDITIONS FOR CONVERGENCE

When the updates are exact (as with lasso), all you need for convergence is

1. f , g are convex, extended real valued.2. f(x) + g(z) + u>(x− z) has a saddle point.

The convergence rate is not well understood.

It turns out, you can solve the minimizations approximately.

∞∑k=1

∥∥∥Π(yk)− Π(yk)∥∥∥

2<∞

Source: Eckstein and Bertsekas 1992

8

Page 9: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

WHY APPROXIMATE?

In our Example, the first step involved a matrix inversion (X>X + ρI)−1

The same is true for the real data cases above: we need matrixdecompositions/inversions.

Focus on two methods of “approximate eigendecomposition”

1. Nyström extension2. Column sampling

9

Page 10: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

A QUICK SKETCH OF THE INTUITION

Both methods fall into a larger class

Suppose we want to approximate S = 1nX>X ∈ Rp×p

S is symmetric and positive semi-definite

Choose t and form a “sketching” matrix Φ ∈ Rp×t

Then writeS ≈ (SΦ)(Φ>SΦ)†(SΦ)>

10

Page 11: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

SPECIAL CASES

Nyström and column sampling correspond to particular ΦBut they are easy to implement without extra multiplicationsRandomly choose t entries in 1, . . . , p andThen partition the matrix so the selected portion is S11

S =[S11 S12S21 S22

]Nyström

S ≈[S11S21

]S†11

[S11 S12

]Column sampling

S ≈ U([S11S21

])Λ([S11S21

])U

([S11S21

])>11

Page 12: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

A SHORT LIST OF RELATED WORK

Rokhlin, Tygert, (2008).Drineas, Mahoney, Muthukrishnan, Sarlós (2011).Halko, Martinsson, Tropp (2011).Gittens, Mahoney (2013).Woodruff (2014).Pourkamali (2014).Homrighausen, McDonald (2016).Wang, Gittens, Mahoney (2017)

12

Page 13: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

ADMM FOR GENETICS

Goal is to find clusters of genes which predict the response.

The approach is semi-supervised: like PCR, but we assume that the eigenvectors are “rowsparse”.

1. This allows for consistent estimation when p n.2. Matches our assumption that only a few genes are predictive: ‖Vi‖2 = 0⇒ βi = 0.

V ← ΠFd

(Y − U + 1

nρX>X

)Y ← Sλ/ρ(V + U)U ← U + V − Y

13

Page 14: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

PROJECTING ONTO THE FANTOPEGiven an eigen decomposition of A =

∑i γiaia

>i .

ΠFd(A) =∑i

γ+i (θ)aia>i

γ+i (θ) = min(max(γi − θ, 0), 1), θ s.t.

∑i

γ+i (θ) = d

The γ-θ stuff solves a monotone, piecewise linear equation.For our data, S is 105 × 105.And we have to do the decomposition at every iteration.The fMRI outlier detection problem involves a similar step but the matrix isnvoxels × nvoxels.

Source: Vu, Cho, Lei, and Rohe 2013

14

Page 15: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

SIMULATION FOR GENES

0.2

0.3

0.4

0.5

elnet fps fps_approx lasso ols ridge

method

mea

n sq

uare

pre

dict

ion

erro

r

n = 1000, p = 2000, 100 true genes, 3 principal components

15

Page 16: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

A NOD TOWARD THEORY

At each iteration, we use column sampling with t = 1000Could also use “Nyström approximation”

These approximations are accurate: something like O(ε−1) if t = Ω((1− ε)−2)Need t→ p as k →∞ to guarantee convergence, though seems unnecessary inpractice.

Source: See Alex’s work as well as that of Mahoney, Woodroofe, Drineas, others

16

Page 17: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

CONCLUSION

This talk summarized some methodology for analyzing large data sets.

Making these methods work requires computational approximations.

These ideas combined algorithmic dimension reduction with nonlinear dimensionreduction.

Current work develops more detailed theoretical results for these methods.

17

Page 18: Matrix Sketching for Alternating Direction Method of ... · EXPLICIT+IMPLICIT DIMENSION REDUCTION Modern statistical applications — genomics, neural image analysis, text analysis,

COLLABORATORS AND FUNDING

18