Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory
Peter Richtárik
Modern Convex Optimization and Applications Workshop in honour of Arkadi Nemirovski's 70th birthday
Fields Institute, Toronto, July 4-7, 2017
A System of Linear Equations

$$Ax = b, \qquad A \in \mathbb{R}^{m \times n},\ x \in \mathbb{R}^n,\ b \in \mathbb{R}^m$$

$m$ equations in $n$ unknowns.

Assumption: the system is consistent (i.e., a solution exists).
Part I: Stochastic Reformulations

P.R. and Martin Takáč. Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory. arXiv:1706.01108, 2017.
Stochastic Reformulations of Linear Systems

Starting from $Ax = b$, we consider four reformulations:

1. Stochastic Optimization
2. Stochastic Linear System
3. Stochastic Fixed Point
4. Probabilistic Intersection

Theorem:
a) These 4 problems have the same solution sets.
b) Necessary & sufficient conditions for the solution set to be equal to $\{x : Ax = b\}$.
The reformulations are parameterized by:

- $B$: an $n \times n$ positive definite matrix
- $\mathcal{D}$: a distribution over $m \times q$ matrices

Example: $B = I$ (identity), $\mathcal{D}$ = uniform over $e_1, \ldots, e_m$ (the unit basis vectors in $\mathbb{R}^m$).
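To make the example concrete, the following minimal Python/NumPy sketch sets up this choice of $B$ and $\mathcal{D}$; the matrix sizes, seed, and names are illustrative assumptions, not from the talk.

```python
import numpy as np

# A minimal sketch of the example setting: B is the n x n identity and D is
# uniform over the unit basis vectors e_1, ..., e_m (so q = 1).
# All sizes and names are illustrative.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)   # consistent by construction: b is in range(A)

B = np.eye(n)                    # B: n x n positive definite

def sample_S():
    """Draw S ~ D: a uniformly random column e_i of the m x m identity."""
    i = rng.integers(m)
    return np.eye(m)[:, [i]]     # an m x 1 sketch matrix
```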
Reformulation 1: Stochastic Optimization

Minimize $f(x) := \mathbb{E}_{S \sim \mathcal{D}}[f_S(x)]$, where

$$f_S(x) = \tfrac{1}{2}\|x - \Pi^B_{\mathcal{L}_S}(x)\|_B^2 = \tfrac{1}{2}(Ax - b)^\top H (Ax - b),$$

with $H = S(S^\top A B^{-1} A^\top S)^\dagger S^\top$ and $\mathcal{L}_S = \{x : S^\top A x = S^\top b\}$.
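As a sketch of how $f_S$ can be evaluated, assuming $B = I$ and Gaussian sketch matrices (both illustrative choices, not from the talk):

```python
import numpy as np

# A sketch of evaluating f_S(x) = (1/2)(Ax - b)^T H (Ax - b), with
# H = S (S^T A B^{-1} A^T S)^† S^T. Gaussian sketches and all sizes
# are illustrative assumptions.
rng = np.random.default_rng(0)
m, n, q = 10, 5, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)      # consistent system
B_inv = np.eye(n)                   # B = I for simplicity
S = rng.standard_normal((m, q))     # one draw of the sketch matrix

def f_S(x):
    H = S @ np.linalg.pinv(S.T @ A @ B_inv @ A.T @ S) @ S.T
    r = A @ x - b
    return 0.5 * r @ H @ r

x = rng.standard_normal(n)
print(f_S(x))                       # nonnegative; zero iff x lies in L_S
```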
Reformulation 2: Stochastic Linear System

Instead of $Ax = b$, we solve the preconditioned system

$$B^{-1} A^\top \mathbb{E}_{S \sim \mathcal{D}}[H]\, A x = B^{-1} A^\top \mathbb{E}_{S \sim \mathcal{D}}[H]\, b,$$

where $H = S(S^\top A B^{-1} A^\top S)^\dagger S^\top$. Instead of the preconditioner $B^{-1} A^\top \mathbb{E}[H] A$ itself, we have access to $B^{-1} A^\top H A$, an unbiased estimate of it.
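A quick Monte Carlo check of this unbiasedness in the example setting ($B = I$, $S = e_i$ uniform), where $H$ and $\mathbb{E}[H]$ have simple closed forms; all sizes are illustrative:

```python
import numpy as np

# A sketch checking unbiasedness in the example setting (B = I, S = e_i
# uniform): here H = e_i e_i^T / ||a_i||^2, so E[H] = (1/m) diag(1/||a_i||^2).
# A Monte Carlo average of H should approach this closed form.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))

EH = np.diag(1.0 / (A * A).sum(axis=1)) / m   # closed-form E[H]

N = 50_000
H_avg = np.zeros((m, m))
for _ in range(N):
    i = rng.integers(m)
    H_avg[i, i] += 1.0 / (A[i] @ A[i])        # H = e_i e_i^T / ||a_i||^2
H_avg /= N

print(np.abs(H_avg - EH).max())               # small; shrinks as N grows
```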
Reformulation 3: Stochastic Fixed Point Problem

Solve

$$x = \mathbb{E}_{S \sim \mathcal{D}}\big[\Pi^B_{\mathcal{L}_S}(x)\big],$$

where $\Pi^B_{\mathcal{L}_S}$ is the projection in the $B$-norm onto $\mathcal{L}_S = \{x : S^\top A x = S^\top b\}$.
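The projection admits a closed form, $\Pi^B_{\mathcal{L}_S}(x) = x - B^{-1} A^\top S (S^\top A B^{-1} A^\top S)^\dagger S^\top (Ax - b)$ (a standard derivation via Lagrange multipliers). A sketch, with illustrative sizes, Gaussian sketches, and $B = I$:

```python
import numpy as np

# A sketch of the B-norm projection onto L_S, using its closed form
#   Pi(x) = x - B^{-1} A^T S (S^T A B^{-1} A^T S)^† S^T (Ax - b).
# Sizes, the Gaussian sketch, and B = I are illustrative assumptions.
rng = np.random.default_rng(0)
m, n, q = 10, 5, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
B_inv = np.eye(n)
S = rng.standard_normal((m, q))

def project(x):
    M = np.linalg.pinv(S.T @ A @ B_inv @ A.T @ S)
    return x - B_inv @ A.T @ S @ M @ S.T @ (A @ x - b)

x = rng.standard_normal(n)
p = project(x)
print(np.allclose(S.T @ A @ p, S.T @ b))   # True: p solves the sketched system
```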
Reformulation 4: Probabilistic Intersection Problem

Find $x \in \mathbb{R}^n$ such that $\mathbb{P}(x \in \mathcal{L}_S) = 1$, where $\mathcal{L}_S = \{x : S^\top A x = S^\top b\}$ is the sketched system. For discrete $S$,

$$\{x : \mathbb{P}(x \in \mathcal{L}_S) = 1\} = \bigcap_S \mathcal{L}_S.$$
Part II: Randomized Algorithms
Viewpoint 1: Stochastic Optimization

Stochastic Gradient Descent (a key method in machine learning), with $S \sim \mathcal{D}$ and constant stepsize $\omega$:

$$x^{t+1} = x^t - \omega \nabla f_S(x^t),$$

where $\nabla f_S(x^t)$ is the stochastic gradient.
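In the example setting ($B = I$, $S = e_i$ uniform), the stochastic gradient step reduces to a relaxed randomized Kaczmarz update; a minimal sketch, with illustrative sizes:

```python
import numpy as np

# In the example setting (B = I, S = e_i uniform), the stochastic gradient is
# (a_i^T x - b_i) / ||a_i||^2 * a_i, so SGD becomes relaxed randomized
# Kaczmarz. Sizes and iteration count are illustrative.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)            # consistent system
omega = 1.0                               # constant stepsize

x = np.zeros(n)
for t in range(2000):
    i = rng.integers(m)                   # S = e_i, drawn uniformly
    a_i = A[i]
    x -= omega * (a_i @ x - b[i]) / (a_i @ a_i) * a_i

print(np.linalg.norm(A @ x - b))          # should be near zero
```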
Stochastic “Newton” Descent, with $S \sim \mathcal{D}$ and constant stepsize $\omega$:

$$x^{t+1} = x^t - \omega (\nabla^2 f_S)^\dagger_B \nabla f_S(x^t),$$

where $(\nabla^2 f_S)^\dagger_B$ is the $B$-pseudoinverse of the stochastic Hessian and $\nabla f_S(x^t)$ is the stochastic gradient.
Stochastic Proximal Point Method, with $S \sim \mathcal{D}$:

$$x^{t+1} = \arg\min_{x \in \mathbb{R}^n} \left\{ f_S(x) + \frac{1-\omega}{2\omega}\|x - x^t\|_B^2 \right\}.$$
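Since $f_S$ is quadratic, the proximal subproblem can be solved exactly from its first-order optimality condition. The sketch below (assuming $B = I$ and the penalty coefficient $\frac{1-\omega}{2\omega}$ as written above) checks numerically that the proximal step coincides with the SGD step, illustrating the equivalence of the viewpoints:

```python
import numpy as np

# A numerical check (B = I) that the stochastic proximal point step
#   argmin_x f_S(x) + (1-w)/(2w) ||x - x_t||^2
# coincides with the SGD step x_t - w * grad f_S(x_t). Illustrative setup.
rng = np.random.default_rng(0)
m, n, q = 10, 5, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
S = rng.standard_normal((m, q))
H = S @ np.linalg.pinv(S.T @ A @ A.T @ S) @ S.T
w = 0.7
x_t = rng.standard_normal(n)

# Proximal step: solve the first-order optimality condition, a linear system
# (A^T H A + (1-w)/w I) x = A^T H b + (1-w)/w x_t.
lhs = A.T @ H @ A + (1 - w) / w * np.eye(n)
rhs = A.T @ H @ b + (1 - w) / w * x_t
prox_step = np.linalg.solve(lhs, rhs)

sgd_step = x_t - w * A.T @ H @ (A @ x_t - b)
print(np.allclose(prox_step, sgd_step))   # True: the two viewpoints agree
```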
Viewpoint 3: Stochastic Fixed Point

Stochastic Fixed Point Method, with $S \sim \mathcal{D}$, relaxation parameter $\omega$, and stochastic fixed point mapping $\Pi^B_{\mathcal{L}_S}$:

$$x^{t+1} = \omega\, \Pi^B_{\mathcal{L}_S}(x^t) + (1 - \omega)\, x^t.$$
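A minimal sketch of the full iteration, with $B = I$, Gaussian sketches, and illustrative sizes:

```python
import numpy as np

# A minimal sketch of the stochastic fixed point method with relaxation:
# x_{t+1} = w * Pi_{L_S}(x_t) + (1 - w) * x_t, with B = I and Gaussian
# sketches S. All sizes and parameter values are illustrative.
rng = np.random.default_rng(0)
m, n, q = 10, 5, 3
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
w = 1.0                                          # relaxation parameter

x = np.zeros(n)
for t in range(500):
    S = rng.standard_normal((m, q))              # S ~ D
    M = np.linalg.pinv(S.T @ A @ A.T @ S)
    proj = x - A.T @ S @ M @ S.T @ (A @ x - b)   # projection onto L_S
    x = w * proj + (1 - w) * x

print(np.linalg.norm(A @ x - b))                 # should be near zero
```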
Part III: Complexity

Basic Method
Basic Method: Complexity

With $H = S(S^\top A B^{-1} A^\top S)^\dagger S^\top$ and the eigendecomposition

$$W = B^{-1/2} A^\top \mathbb{E}_{S \sim \mathcal{D}}[H]\, A B^{-1/2} = U \Lambda U^\top,$$

the expected iterates satisfy

$$\mathbb{E}\big[U^\top B^{1/2}(x^t - x^*)\big] = (I - \omega \Lambda)^t\, U^\top B^{1/2}(x^0 - x^*),$$

where $\omega$ is the stepsize / relaxation parameter.
Basic Method: Complexity

Convergence of expected iterates:
- $\omega = 1/\lambda_{\max}$: $\;t \geq \frac{\lambda_{\max}}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right) \;\Rightarrow\; \|\mathbb{E}[x^t - x^*]\|_B^2 \leq \epsilon$
- $\omega = 1$: $\;t \geq \frac{1}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right) \;\Rightarrow\; \|\mathbb{E}[x^t - x^*]\|_B^2 \leq \epsilon$

L2 convergence:
- $\omega = 1$: $\;t \geq \frac{1}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right) \;\Rightarrow\; \mathbb{E}\big[\|x^t - x^*\|_B^2\big] \leq \epsilon$
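In the example setting ($B = I$, $S = e_i$ uniform), $W$ has the closed form $\frac{1}{m} \sum_i \frac{a_i a_i^\top}{\|a_i\|^2}$ (so its eigenvalues sum to 1), and the bound is easy to evaluate; a sketch with illustrative sizes:

```python
import numpy as np

# A sketch computing W = B^{-1/2} A^T E[H] A B^{-1/2} and the iteration bound
# for omega = 1, in the example setting (B = I, S = e_i uniform), where
# E[H] = (1/m) diag(1/||a_i||^2). Sizes are illustrative.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))

EH = np.diag(1.0 / (A * A).sum(axis=1)) / m
W = A.T @ EH @ A
lam = np.linalg.eigvalsh(W)                # ascending eigenvalues of W
lam_max = lam[-1]
lam_min_plus = lam[lam > 1e-12].min()      # smallest nonzero eigenvalue

eps = 1e-6
print("lambda_max:", lam_max)
print("iterations needed (omega = 1):",
      int(np.ceil(np.log(1 / eps) / lam_min_plus)))
```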
Parallel Method

$$x^{t+1} = \frac{1}{\tau} \sum_{i=1}^{\tau} \phi_\omega(x^t, S^t_i), \qquad S^t_i \sim \mathcal{D} \text{ i.i.d.},$$

where $\phi_\omega(x^t, S)$ denotes one step of the basic method from $x^t$. In words: “Run 1 step of the basic method from $x^t$ several times independently, and average the results.”
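A minimal sketch of the parallel update in the Kaczmarz setting ($B = I$, $S = e_i$ uniform); $\tau$ and the sizes are illustrative:

```python
import numpy as np

# A minimal sketch of one parallel-method update: average tau independent
# basic-method steps from the same point x_t (B = I, S = e_i uniform).
rng = np.random.default_rng(0)
m, n, tau = 10, 5, 4
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
w = 1.0

def basic_step(x):
    """One basic-method step phi_w(x, S) with S = e_i drawn uniformly."""
    i = rng.integers(m)
    a_i = A[i]
    return x - w * (a_i @ x - b[i]) / (a_i @ a_i) * a_i

x = np.zeros(n)
for t in range(500):
    x = np.mean([basic_step(x) for _ in range(tau)], axis=0)

print(np.linalg.norm(A @ x - b))          # should be near zero
```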
Parallel Method: Complexity

L2 convergence, $\mathbb{E}\big[\|x^t - x^*\|_B^2\big] \leq \epsilon$:
- $\tau = 1$: $\;t \geq \frac{1}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right)$
- $\tau = +\infty$: $\;t \geq \frac{\lambda_{\max}}{\lambda_{\min}^+} \log\left(\frac{1}{\epsilon}\right)$

(Since $\lambda_{\max} \leq 1$, more parallelism improves the bound.)
Accelerated Method

$$x^{t+1} = \gamma\, \phi_\omega(x^t, S^t) + (1 - \gamma)\, \phi_\omega(x^{t-1}, S^{t-1}), \qquad S^t, S^{t-1} \sim \mathcal{D} \text{ (independent)},$$

where $\gamma$ is the acceleration parameter (between 1 and 2), and $\phi_\omega(x^t, S^t)$ and $\phi_\omega(x^{t-1}, S^{t-1})$ are one step of the basic method from $x^t$ and from $x^{t-1}$, respectively.
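A minimal sketch in the Kaczmarz setting ($B = I$, $S = e_i$ uniform); the value of $\gamma$ is an illustrative guess within the stated range, and a fresh independent sketch is drawn for each of the two terms, as the slide's independence note suggests:

```python
import numpy as np

# A minimal sketch of the accelerated method,
#   x_{t+1} = gamma * phi_w(x_t, S_t) + (1 - gamma) * phi_w(x_{t-1}, S_{t-1}),
# in the Kaczmarz setting (B = I, S = e_i uniform). gamma here is a guess;
# the talk only says it lies between 1 and 2.
rng = np.random.default_rng(0)
m, n = 10, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)
w, gamma = 1.0, 1.2                        # illustrative parameter values

def basic_step(x):
    i = rng.integers(m)                    # a fresh S ~ D for each call
    a_i = A[i]
    return x - w * (a_i @ x - b[i]) / (a_i @ a_i) * a_i

x_prev = np.zeros(n)
x = basic_step(x_prev)                     # one plain step to initialize
for t in range(2000):
    x, x_prev = gamma * basic_step(x) + (1 - gamma) * basic_step(x_prev), x

print(np.linalg.norm(A @ x - b))          # should be small
```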
Accelerated Method: Complexity

Convergence of expected iterates:

$$t \geq \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}^+}} \log\left(\frac{1}{\epsilon}\right) \;\Rightarrow\; \|\mathbb{E}[x^t - x^*]\|_B^2 \leq \epsilon.$$

For comparison, the basic method requires on the order of $\frac{\lambda_{\max}}{\lambda_{\min}^+}$ iterations (depending on $\omega$); acceleration improves this to its square root.
Detailed Complexity Results

Part IV: Conclusion
Contributions

- 4 equivalent stochastic reformulations of a linear system:
  - Stochastic optimization
  - Stochastic fixed point problem
  - Stochastic linear system
  - Probabilistic intersection
- 3 algorithms:
  - Basic (SGD, stochastic Newton method, stochastic fixed point method, stochastic proximal point method, stochastic projection method, ...)
  - Parallel
  - Accelerated
- Iteration complexity guarantees for various measures of success:
  - Expected iterates (closed form)
  - L1/L2 convergence
  - Convergence of $f$; ergodic ...
Related Work

Basic method with unit stepsize and full rank $A$:
Robert Mansel Gower and P.R. Randomized Iterative Methods for Linear Systems. SIAM J. Matrix Analysis & Applications 36(4):1660-1690, 2015.
- 2017 IMA Fox Prize (2nd Prize) in Numerical Analysis
- Most downloaded SIMAX paper

Removal of full rank assumption + duality:
Robert Mansel Gower and P.R. Stochastic Dual Ascent for Solving Linear Systems. arXiv:1512.06890, 2015.

Inverting matrices & connection to quasi-Newton updates:
Robert Mansel Gower and P.R. Randomized Quasi-Newton Methods are Linearly Convergent Matrix Inversion Algorithms. arXiv:1602.01768, 2016.

Computing the pseudoinverse:
Robert Mansel Gower and P.R. Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse. arXiv:1612.06255, 2016.

Application in machine learning:
Robert Mansel Gower, Donald Goldfarb and P.R. Stochastic Block BFGS: Squeezing More Curvature out of Data. ICML 2016.
THE END