Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory
Peter Richtárik
Modern Convex Optimization and Applications Workshop in honour of Arkadi Nemirovski's 70th birthday
Fields Institute, Toronto, July 4-7, 2017
A System of Linear Equations

$$Ax = b, \qquad A \in \mathbb{R}^{m \times n},\ x \in \mathbb{R}^n,\ b \in \mathbb{R}^m$$

$m$ equations in $n$ unknowns.

Assumption: the system is consistent (i.e., a solution exists).
Part I: Stochastic Reformulations

P.R. and Martin Takáč. Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory. arXiv:1706.01108, 2017.
Stochastic Reformulations of Linear Systems

Starting from $Ax = b$, we consider four reformulations:

1. Stochastic Optimization
2. Stochastic Linear System
3. Stochastic Fixed Point
4. Probabilistic Intersection

Theorem:
a) These 4 problems have the same solution sets.
b) Necessary & sufficient conditions for the solution set to be equal to $\{x : Ax = b\}$.
The reformulations are parameterized by:

- $B$: an $n \times n$ positive definite matrix
- $\mathcal{D}$: a distribution over $m \times q$ matrices

Example: $B = I$ (identity), $\mathcal{D}$ = uniform over $e_1, \ldots, e_m$ (the unit basis vectors in $\mathbb{R}^m$).
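To make the example concrete, the following minimal Python/NumPy sketch sets up this choice of $B$ and $\mathcal{D}$; the matrix sizes, seed, and names are illustrative assumptions, not from the talk.

```python
import numpy as np

# A minimal sketch of the example setting: B is the n x n identity and D is
# uniform over the unit basis vectors e_1, ..., e_m (so q = 1).
# All sizes and names are illustrative.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)   # consistent by construction: b is in range(A)

B = np.eye(n)                    # B: n x n positive definite

def sample_S():
    """Draw S ~ D: a uniformly random column e_i of the m x m identity."""
    i = rng.integers(m)
    return np.eye(m)[:, [i]]     # an m x 1 sketch matrix
```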
Reformulation 1: Stochastic Optimization

Minimize $f(x) := \mathbb{E}_{S \sim \mathcal{D}}[f_S(x)]$, where

$$f_S(x) = \tfrac{1}{2}\|x - \Pi^B_{\mathcal{L}_S}(x)\|_B^2 = \tfrac{1}{2}(Ax - b)^\top H (Ax - b),$$

with $H = S(S^\top A B^{-1} A^\top S)^\dagger S^\top$ and $\mathcal{L}_S = \{x : S^\top A x = S^\top b\}$.
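As a sketch of how $f_S$ can be evaluated, assuming $B = I$ and Gaussian sketch matrices (both illustrative choices, not from the talk):

```python
import numpy as np

# A sketch of evaluating f_S(x) = (1/2)(Ax - b)^T H (Ax - b), with
# H = S (S^T A B^{-1} A^T S)^† S^T. Gaussian sketches and all sizes
# are illustrative assumptions.
rng = np.random.default_rng(0)
m, n, q = 10, 5, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)      # consistent system
B_inv = np.eye(n)                   # B = I for simplicity
S = rng.standard_normal((m, q))     # one draw of the sketch matrix

def f_S(x):
    H = S @ np.linalg.pinv(S.T @ A @ B_inv @ A.T @ S) @ S.T
    r = A @ x - b
    return 0.5 * r @ H @ r

x = rng.standard_normal(n)
print(f_S(x))                       # nonnegative; zero iff x lies in L_S
```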
Reformulation 2: Stochastic Linear System

Instead of $Ax = b$, we solve the preconditioned system

$$B^{-1} A^\top \mathbb{E}_{S \sim \mathcal{D}}[H]\, A x = B^{-1} A^\top \mathbb{E}_{S \sim \mathcal{D}}[H]\, b,$$

where $H = S(S^\top A B^{-1} A^\top S)^\dagger S^\top$. Instead of the preconditioner $B^{-1} A^\top \mathbb{E}[H] A$ itself, we have access to $B^{-1} A^\top H A$, an unbiased estimate of it.
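A quick Monte Carlo check of this unbiasedness in the example setting ($B = I$, $S = e_i$ uniform), where $H$ and $\mathbb{E}[H]$ have simple closed forms; all sizes are illustrative:

```python
import numpy as np

# A sketch checking unbiasedness in the example setting (B = I, S = e_i
# uniform): here H = e_i e_i^T / ||a_i||^2, so E[H] = (1/m) diag(1/||a_i||^2).
# A Monte Carlo average of H should approach this closed form.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))

EH = np.diag(1.0 / (A * A).sum(axis=1)) / m   # closed-form E[H]

N = 50_000
H_avg = np.zeros((m, m))
for _ in range(N):
    i = rng.integers(m)
    H_avg[i, i] += 1.0 / (A[i] @ A[i])        # H = e_i e_i^T / ||a_i||^2
H_avg /= N

print(np.abs(H_avg - EH).max())               # small; shrinks as N grows
```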
Reformulation 3: Stochastic Fixed Point Problem

Solve

$$x = \mathbb{E}_{S \sim \mathcal{D}}\big[\Pi^B_{\mathcal{L}_S}(x)\big],$$

where $\Pi^B_{\mathcal{L}_S}$ is the projection in the $B$-norm onto $\mathcal{L}_S = \{x : S^\top A x = S^\top b\}$.
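The projection admits a closed form, $\Pi^B_{\mathcal{L}_S}(x) = x - B^{-1} A^\top S (S^\top A B^{-1} A^\top S)^\dagger S^\top (Ax - b)$ (a standard derivation via Lagrange multipliers). A sketch, with illustrative sizes, Gaussian sketches, and $B = I$:

```python
import numpy as np

# A sketch of the B-norm projection onto L_S, using its closed form
#   Pi(x) = x - B^{-1} A^T S (S^T A B^{-1} A^T S)^† S^T (Ax - b).
# Sizes, the Gaussian sketch, and B = I are illustrative assumptions.
rng = np.random.default_rng(0)
m, n, q = 10, 5, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
B_inv = np.eye(n)
S = rng.standard_normal((m, q))

def project(x):
    M = np.linalg.pinv(S.T @ A @ B_inv @ A.T @ S)
    return x - B_inv @ A.T @ S @ M @ S.T @ (A @ x - b)

x = rng.standard_normal(n)
p = project(x)
print(np.allclose(S.T @ A @ p, S.T @ b))   # True: p solves the sketched system
```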
Reformulation 4: Probabilistic Intersection Problem

Find $x \in \mathbb{R}^n$ such that $\mathbb{P}(x \in \mathcal{L}_S) = 1$, where $\mathcal{L}_S = \{x : S^\top A x = S^\top b\}$ is the sketched system. For discrete $S$,

$$\{x : \mathbb{P}(x \in \mathcal{L}_S) = 1\} = \bigcap_S \mathcal{L}_S.$$
Part II: Randomized Algorithms
Viewpoint 1: Stochastic Optimization

Stochastic Gradient Descent (a key method in machine learning), with $S \sim \mathcal{D}$ and constant stepsize $\omega$:

$$x^{t+1} = x^t - \omega \nabla f_S(x^t),$$

where $\nabla f_S(x^t)$ is the stochastic gradient.
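In the example setting ($B = I$, $S = e_i$ uniform), the stochastic gradient step reduces to a relaxed randomized Kaczmarz update; a minimal sketch, with illustrative sizes:

```python
import numpy as np

# In the example setting (B = I, S = e_i uniform), the stochastic gradient is
# (a_i^T x - b_i) / ||a_i||^2 * a_i, so SGD becomes relaxed randomized
# Kaczmarz. Sizes and iteration count are illustrative.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)            # consistent system
omega = 1.0                               # constant stepsize

x = np.zeros(n)
for t in range(2000):
    i = rng.integers(m)                   # S = e_i, drawn uniformly
    a_i = A[i]
    x -= omega * (a_i @ x - b[i]) / (a_i @ a_i) * a_i

print(np.linalg.norm(A @ x - b))          # should be near zero
```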
Stochastic “Newton” Descent, with $S \sim \mathcal{D}$ and constant stepsize $\omega$:

$$x^{t+1} = x^t - \omega (\nabla^2 f_S)^\dagger_B \nabla f_S(x^t),$$

where $(\nabla^2 f_S)^\dagger_B$ is the $B$-pseudoinverse of the stochastic Hessian and $\nabla f_S(x^t)$ is the stochastic gradient.
Stochastic Proximal Point Method, with $S \sim \mathcal{D}$:

$$x^{t+1} = \arg\min_{x \in \mathbb{R}^n} \left\{ f_S(x) + \frac{1-\omega}{2\omega}\|x - x^t\|_B^2 \right\}.$$
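Since $f_S$ is quadratic, the proximal subproblem can be solved exactly from its first-order optimality condition. The sketch below (assuming $B = I$ and the penalty coefficient $\frac{1-\omega}{2\omega}$ as written above) checks numerically that the proximal step coincides with the SGD step, illustrating the equivalence of the viewpoints:

```python
import numpy as np

# A numerical check (B = I) that the stochastic proximal point step
#   argmin_x f_S(x) + (1-w)/(2w) ||x - x_t||^2
# coincides with the SGD step x_t - w * grad f_S(x_t). Illustrative setup.
rng = np.random.default_rng(0)
m, n, q = 10, 5, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
S = rng.standard_normal((m, q))
H = S @ np.linalg.pinv(S.T @ A @ A.T @ S) @ S.T
w = 0.7
x_t = rng.standard_normal(n)

# Proximal step: solve the first-order optimality condition, a linear system
# (A^T H A + (1-w)/w I) x = A^T H b + (1-w)/w x_t.
lhs = A.T @ H @ A + (1 - w) / w * np.eye(n)
rhs = A.T @ H @ b + (1 - w) / w * x_t
prox_step = np.linalg.solve(lhs, rhs)

sgd_step = x_t - w * A.T @ H @ (A @ x_t - b)
print(np.allclose(prox_step, sgd_step))   # True: the two viewpoints agree
```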
Viewpoint 3: Stochastic Fixed Point

Stochastic Fixed Point Method, with $S \sim \mathcal{D}$, relaxation parameter $\omega$, and stochastic fixed point mapping $\Pi^B_{\mathcal{L}_S}$:

$$x^{t+1} = \omega\, \Pi^B_{\mathcal{L}_S}(x^t) + (1 - \omega)\, x^t.$$
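A minimal sketch of the full iteration, with $B = I$, Gaussian sketches, and illustrative sizes:

```python
import numpy as np

# A minimal sketch of the stochastic fixed point method with relaxation:
# x_{t+1} = w * Pi_{L_S}(x_t) + (1 - w) * x_t, with B = I and Gaussian
# sketches S. All sizes and parameter values are illustrative.
rng = np.random.default_rng(0)
m, n, q = 10, 5, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
w = 1.0                                          # relaxation parameter

x = np.zeros(n)
for t in range(500):
    S = rng.standard_normal((m, q))              # S ~ D
    M = np.linalg.pinv(S.T @ A @ A.T @ S)
    proj = x - A.T @ S @ M @ S.T @ (A @ x - b)   # projection onto L_S
    x = w * proj + (1 - w) * x

print(np.linalg.norm(A @ x - b))                 # should be near zero
```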
Part III: Complexity

Basic Method
Basic Method: Complexity

With $H = S(S^\top A B^{-1} A^\top S)^\dagger S^\top$ and the eigendecomposition

$$W = B^{-1/2} A^\top \mathbb{E}_{S \sim \mathcal{D}}[H]\, A B^{-1/2} = U \Lambda U^\top,$$

the expected iterates satisfy

$$\mathbb{E}\big[U^\top B^{1/2}(x^t - x^*)\big] = (I - \omega \Lambda)^t\, U^\top B^{1/2}(x^0 - x^*),$$

where $\omega$ is the stepsize / relaxation parameter.
Basic Method: Complexity

Convergence of expected iterates:
- $\omega = 1/\lambda_{\max}$: $\;t \geq \frac{\lambda_{\max}}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right) \;\Rightarrow\; \|\mathbb{E}[x^t - x^*]\|_B^2 \leq \epsilon$
- $\omega = 1$: $\;t \geq \frac{1}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right) \;\Rightarrow\; \|\mathbb{E}[x^t - x^*]\|_B^2 \leq \epsilon$

L2 convergence:
- $\omega = 1$: $\;t \geq \frac{1}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right) \;\Rightarrow\; \mathbb{E}\big[\|x^t - x^*\|_B^2\big] \leq \epsilon$
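In the example setting ($B = I$, $S = e_i$ uniform), $W$ has the closed form $\frac{1}{m} \sum_i \frac{a_i a_i^\top}{\|a_i\|^2}$ (so its eigenvalues sum to 1), and the bound is easy to evaluate; a sketch with illustrative sizes:

```python
import numpy as np

# A sketch computing W = B^{-1/2} A^T E[H] A B^{-1/2} and the iteration bound
# for omega = 1, in the example setting (B = I, S = e_i uniform), where
# E[H] = (1/m) diag(1/||a_i||^2). Sizes are illustrative.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))

EH = np.diag(1.0 / (A * A).sum(axis=1)) / m
W = A.T @ EH @ A
lam = np.linalg.eigvalsh(W)                # ascending eigenvalues of W
lam_max = lam[-1]
lam_min_plus = lam[lam > 1e-12].min()      # smallest nonzero eigenvalue

eps = 1e-6
print("lambda_max:", lam_max)
print("iterations needed (omega = 1):",
      int(np.ceil(np.log(1 / eps) / lam_min_plus)))
```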
Parallel Method

$$x^{t+1} = \frac{1}{\tau} \sum_{i=1}^{\tau} \phi_\omega(x^t, S^t_i), \qquad S^t_i \sim \mathcal{D} \text{ i.i.d.},$$

where $\phi_\omega(x^t, S)$ denotes one step of the basic method from $x^t$. In words: “Run 1 step of the basic method from $x^t$ several times independently, and average the results.”
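A minimal sketch of the parallel update in the Kaczmarz setting ($B = I$, $S = e_i$ uniform); $\tau$ and the sizes are illustrative:

```python
import numpy as np

# A minimal sketch of one parallel-method update: average tau independent
# basic-method steps from the same point x_t (B = I, S = e_i uniform).
rng = np.random.default_rng(0)
m, n, tau = 10, 5, 4
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
w = 1.0

def basic_step(x):
    """One basic-method step phi_w(x, S) with S = e_i drawn uniformly."""
    i = rng.integers(m)
    a_i = A[i]
    return x - w * (a_i @ x - b[i]) / (a_i @ a_i) * a_i

x = np.zeros(n)
for t in range(500):
    x = np.mean([basic_step(x) for _ in range(tau)], axis=0)

print(np.linalg.norm(A @ x - b))          # should be near zero
```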
Parallel Method: Complexity

L2 convergence, $\mathbb{E}\big[\|x^t - x^*\|_B^2\big] \leq \epsilon$:
- $\tau = 1$: $\;t \geq \frac{1}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right)$
- $\tau = +\infty$: $\;t \geq \frac{\lambda_{\max}}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right)$

(Since $\lambda_{\max} \leq 1$, more parallelism improves the bound.)
Accelerated Method

$$x^{t+1} = \gamma\, \phi_\omega(x^t, S^t) + (1 - \gamma)\, \phi_\omega(x^{t-1}, S^{t-1}), \qquad S^t, S^{t-1} \sim \mathcal{D} \text{ (independent)},$$

where $\gamma$ is the acceleration parameter (between 1 and 2), and $\phi_\omega(x^t, S^t)$ and $\phi_\omega(x^{t-1}, S^{t-1})$ are one step of the basic method from $x^t$ and from $x^{t-1}$, respectively.
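A minimal sketch in the Kaczmarz setting ($B = I$, $S = e_i$ uniform); the value of $\gamma$ is an illustrative guess within the stated range, and a fresh independent sketch is drawn for each of the two terms, as the slide's independence note suggests:

```python
import numpy as np

# A minimal sketch of the accelerated method,
#   x_{t+1} = gamma * phi_w(x_t, S_t) + (1 - gamma) * phi_w(x_{t-1}, S_{t-1}),
# in the Kaczmarz setting (B = I, S = e_i uniform). gamma here is a guess;
# the talk only says it lies between 1 and 2.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
w, gamma = 1.0, 1.2                        # illustrative parameter values

def basic_step(x):
    i = rng.integers(m)                    # a fresh S ~ D for each call
    a_i = A[i]
    return x - w * (a_i @ x - b[i]) / (a_i @ a_i) * a_i

x_prev = np.zeros(n)
x = basic_step(x_prev)                     # one plain step to initialize
for t in range(2000):
    x, x_prev = gamma * basic_step(x) + (1 - gamma) * basic_step(x_prev), x

print(np.linalg.norm(A @ x - b))          # should be small
```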
Accelerated Method: Complexity

Convergence of expected iterates:

$$t \geq \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}^+}} \log\left(\frac{1}{\epsilon}\right) \;\Rightarrow\; \|\mathbb{E}[x^t - x^*]\|_B^2 \leq \epsilon.$$

For comparison, the basic method requires on the order of $\frac{\lambda_{\max}}{\lambda_{\min}^+}$ iterations (depending on $\omega$); acceleration improves this to its square root.
Detailed Complexity Results

Part IV: Conclusion
Contributions

- 4 equivalent stochastic reformulations of a linear system:
  - Stochastic optimization
  - Stochastic fixed point problem
  - Stochastic linear system
  - Probabilistic intersection
- 3 algorithms:
  - Basic (SGD, stochastic Newton method, stochastic fixed point method, stochastic proximal point method, stochastic projection method, ...)
  - Parallel
  - Accelerated
- Iteration complexity guarantees for various measures of success:
  - Expected iterates (closed form)
  - L1/L2 convergence
  - Convergence of $f$; ergodic ...
Related Work

Basic method with unit stepsize and full rank $A$:
Robert Mansel Gower and P.R. Randomized Iterative Methods for Linear Systems. SIAM J. Matrix Analysis & Applications 36(4):1660-1690, 2015.
- 2017 IMA Fox Prize (2nd Prize) in Numerical Analysis
- Most downloaded SIMAX paper

Removal of full rank assumption + duality:
Robert Mansel Gower and P.R. Stochastic Dual Ascent for Solving Linear Systems. arXiv:1512.06890, 2015.

Inverting matrices & connection to quasi-Newton updates:
Robert Mansel Gower and P.R. Randomized Quasi-Newton Methods are Linearly Convergent Matrix Inversion Algorithms. arXiv:1602.01768, 2016.

Computing the pseudoinverse:
Robert Mansel Gower and P.R. Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse. arXiv:1612.06255, 2016.

Application in machine learning:
Robert Mansel Gower, Donald Goldfarb and P.R. Stochastic Block BFGS: Squeezing More Curvature out of Data. ICML 2016.
THE END