Stochastic Approximation Algorithms with
Set-Valued Maps
Shalabh Bhatnagar
Computer Science and Automation
Indian Institute of Science
Bangalore
November 10, 2019
Outline
1 Optimization under noise
2 Stochastic approximation algorithms
3 Random Directions Stochastic Approximation (RDSA)
4 Gradient algorithms with set-valued maps and their analysis
5 Ongoing and future work
Example: Parameter Optimization [1]
Consider a repeated experiment that gives i.i.d. input-output pairs $(X_n, Y_n)$, $n \ge 0$, in real time
Goal: Find a best parameterized fit
$$Y_n = f_w(X_n) + \epsilon_n,$$
i.e., one with the least $g(w) = \frac{1}{2} E\left[\|\epsilon_n\|^2\right]$
$f_w$ could correspond to polynomials, neural networks, splines, wavelets, etc.
Note that $\nabla g(w) = -E\left[\langle Y_n - f_w(X_n), \nabla f_w(X_n)\rangle\right]$
[1] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge Univ. Press, 2008
Example: Parameter Optimization
Problem: Cannot compute the expectation (the distribution of $(X_n, Y_n)$ is unknown)
Solution: Drop the expectation!
Gradient Scheme with Noise:
$$w_{n+1} = w_n + a(n)\,\langle Y_n - f_{w_n}(X_n), \nabla f_{w_n}(X_n)\rangle = w_n + a(n)\left(-\nabla g(w_n) + M_{n+1}\right),$$
where $M_{n+1}$ is the noise term
Algorithms of this type are called Stochastic Approximation Algorithms; a minimal runnable sketch is given below
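To make the scheme concrete, here is a minimal sketch of the noisy gradient recursion for a streaming linear fit, assuming the model $f_w(x) = w^\top x$; the "true" parameter `w_true`, the noise level, and the step-size choice are all illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])   # assumed "true" parameter generating the data
w = np.zeros(2)                  # initial guess w_0

for n in range(1, 100001):
    a_n = 1.0 / n                        # step sizes: sum a(n) = inf, sum a(n)^2 < inf
    X = rng.normal(size=2)               # input X_n
    Y = w_true @ X + 0.1 * rng.normal()  # output Y_n = f_{w*}(X_n) + noise
    # For f_w(x) = w^T x we have grad_w f_w(X) = X, so the update is
    # w <- w + a(n) <Y_n - f_w(X_n), grad f_w(X_n)>
    w = w + a_n * (Y - w @ X) * X

print(w)  # approaches w_true as n grows
```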
Stochastic Approximation [2]
Objective: Solve the equation $J(x) = 0$ when the analytical form of $J$ is not known; however, 'noisy' measurements $J(x) + M_{n+1}$ can be obtained
[Figure: a black box that, given an input $x$, returns the noisy measurement $J(x) + M_{n+1}$ of $J(x)$]
The Robbins-Monro Algorithm:
$$x_{n+1} = x_n + a(n)\left(J(x_n) + M_{n+1}\right) \qquad (1)$$
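A minimal sketch of recursion (1) on a toy root-finding problem; the choice $J(x) = 1 - x$ (root $x^* = 1$, and the associated ODE $\dot{x} = 1 - x$ is stable there) and the Gaussian noise are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 0.0
for n in range(1, 100001):
    a_n = 1.0 / n
    noisy_J = (1.0 - x) + rng.normal()  # observe J(x_n) + M_{n+1}; J itself is unknown
    x = x + a_n * noisy_J               # Robbins-Monro update (1)

print(x)  # converges almost surely to the root x* = 1
```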
[2] H. Robbins and S. Monro, Annals of Mathematical Statistics, 22:400–407, 1951
A Convergence Result [3]
(C1) $J : \mathbb{R}^N \to \mathbb{R}^N$ is Lipschitz continuous
(C2) $\sum_n a(n) = \infty$, $\ \sum_n a(n)^2 < \infty$
(C3) $\{M_{n+1}\}$, $n \ge 0$, is a martingale difference sequence w.r.t. $\mathcal{F}_n \triangleq \sigma(x_m, M_m,\ m \le n)$. Further, for some $K > 0$, $E\left[\|M_{n+1}\|^2 \mid \mathcal{F}_n\right] \le K\left(1 + \|x_n\|^2\right)$
(C4) $\sup_n \|x_n\| < \infty$ almost surely
Let $x^*$ be the unique globally asymptotically stable attractor for the ODE $\dot{x}(t) = J(x(t))$. Then:

Theorem (Convergence of SAA): Under (C1)–(C4), $\{x_n\}$ converges almost surely to $x^*$.
[3] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge Univ. Press, 2008
Optimization under Noise
Let $J : \mathbb{R}^N \to \mathbb{R}$ be a given objective function having the form $J(x) = E_\mu[h(x, \mu)]$, where $\mu$ denotes 'noise' and $E_\mu[\cdot]$ is the expectation under that noise
[Figure: noisy sample curves $h(x, \mu_1)$, $h(x, \mu_2)$, $h(x, \mu_3)$ scattered around their mean $J(x)$]
Goal: Find $x^*$ s.t. $J(x^*) = \min_{x \in \mathbb{R}^N} J(x)$
Gradient Estimation using RDSA [4]
Run two simulations with parameters
$$x + \delta d = \begin{pmatrix} x_1 + \delta d_1 \\ x_2 + \delta d_2 \\ \vdots \\ x_N + \delta d_N \end{pmatrix}, \qquad x - \delta d = \begin{pmatrix} x_1 - \delta d_1 \\ x_2 - \delta d_2 \\ \vdots \\ x_N - \delta d_N \end{pmatrix},$$
where $d_1, \ldots, d_N$ are independent random variables with distribution $U[-\eta, \eta]$
Gradient Estimator:
$$\widehat{\nabla} J(x) = \frac{3}{\eta^2}\, d \left[\frac{J(x + \delta d) - J(x - \delta d)}{2\delta}\right]$$
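A minimal sketch of this two-simulation estimator; the function name and argument conventions are illustrative, and `J` is any black-box objective:

```python
import numpy as np

def rdsa_gradient(J, x, delta, eta, rng):
    """Two-simulation RDSA estimate of grad J(x) with uniform perturbations."""
    d = rng.uniform(-eta, eta, size=x.shape)  # d_i ~ U[-eta, eta], independent
    forward = J(x + delta * d)                # first simulation
    backward = J(x - delta * d)               # second simulation
    return (3.0 / eta**2) * d * (forward - backward) / (2.0 * delta)
```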
[4] L. A. Prashanth, S. Bhatnagar, M. Fu and S. Marcus, IEEE Transactions on Automatic Control, 62(5):2223–2238, 2017
Hessian Estimator for RDSA
Hessian Estimator:
$$\widehat{\nabla}^2 J(x) = \frac{9}{2\eta^4}\, R \left[\frac{J(x + \delta d) + J(x - \delta d) - 2 J(x)}{\delta^2}\right],$$
where
$$R = \begin{bmatrix} \frac{5}{2}\left(d_1^2 - \frac{\eta^2}{3}\right) & \cdots & d_1 d_N \\ d_2 d_1 & \cdots & d_2 d_N \\ \vdots & \ddots & \vdots \\ d_N d_1 & \cdots & \frac{5}{2}\left(d_N^2 - \frac{\eta^2}{3}\right) \end{bmatrix}.$$
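A corresponding sketch of the Hessian estimate, under the same illustrative conventions as the gradient sketch above; note it needs a third simulation, at $x$ itself:

```python
import numpy as np

def rdsa_hessian(J, x, delta, eta, rng):
    """RDSA estimate of the Hessian of J at x with uniform perturbations."""
    d = rng.uniform(-eta, eta, size=x.shape)
    R = np.outer(d, d)                                # off-diagonal entries d_i d_j
    np.fill_diagonal(R, 2.5 * (d**2 - eta**2 / 3.0))  # diagonal: (5/2)(d_i^2 - eta^2/3)
    second_diff = (J(x + delta * d) + J(x - delta * d) - 2.0 * J(x)) / delta**2
    return (9.0 / (2.0 * eta**4)) * R * second_diff
```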
Main Convergence Result
RDSA Algorithm:
$$x_{n+1} = x_n - a(n)\, \Gamma\!\left(\widehat{\nabla}^2 J(x_n)\right)^{-1} \widehat{\nabla} J(x_n),$$
except that $\delta$ is replaced with $\delta_n \downarrow 0$ and
$$\sum_n a(n) = \infty, \qquad \sum_n \left(\frac{a(n)}{\delta_n}\right)^2 < \infty \qquad (2)$$
Let $x^*$ be the unique globally asymptotically stable equilibrium of the ODE
$$\dot{x}(t) = -\Gamma\!\left(\nabla^2 J(x(t))\right)^{-1} \nabla J(x(t))$$
Let $a(n) = 1/n^\alpha$ and $\delta_n = 1/n^\gamma$ with $\alpha - \gamma > 0.5$ and $\beta \triangleq \alpha - 2\gamma > 0$
Theorem: Under (C1) on $\nabla J$, (2), (C3) and (C4):
1. $x_n \xrightarrow{\text{a.s.}} x^*$
2. $n^{\beta/2}(x_n - x^*) \xrightarrow{\text{dist}} \mathcal{N}(\mu, \Omega)$
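Putting the pieces together, here is a hedged sketch of the full Newton RDSA loop, reusing the `rdsa_gradient` and `rdsa_hessian` sketches above. The slides leave $\Gamma$ abstract; projecting eigenvalues onto $[\text{floor}, \infty)$ is one common way to enforce positive definiteness, and the exponents $\alpha = 1$, $\gamma = 0.1$ (so $\alpha - \gamma > 0.5$, $\beta = 0.8 > 0$) and the quadratic test objective are assumed choices:

```python
import numpy as np

def gamma_project(H, floor=1e-4):
    """One choice of Gamma: project a symmetric matrix so all eigenvalues >= floor."""
    H = 0.5 * (H + H.T)
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.maximum(vals, floor)) @ vecs.T

def newton_rdsa(J, x0, eta=1.0, alpha=1.0, gamma=0.1, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n**alpha        # a(n) = 1/n^alpha
        delta_n = 1.0 / n**gamma    # delta_n = 1/n^gamma, shrinking perturbations
        grad = rdsa_gradient(J, x, delta_n, eta, rng)
        hess = rdsa_hessian(J, x, delta_n, eta, rng)
        # Newton step with the projected Hessian estimate
        x = x - a_n * np.linalg.solve(gamma_project(hess), grad)
    return x

# Assumed test objective: a quadratic with minimum at (1, -2)
J = lambda z: (z[0] - 1.0)**2 + 2.0 * (z[1] + 2.0)**2
print(newton_rdsa(J, np.zeros(2)))  # lands near [1, -2]
```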
Sufficient Conditions for Stability of SA [5]
(C5)
(i) $J_c(x) \triangleq J(cx)/c$, $c \ge 1$, satisfies $J_c \to J_\infty$ uniformly on compacts, for some $J_\infty : \mathbb{R}^N \to \mathbb{R}^N$
(ii) The origin in $\mathbb{R}^N$ is a unique globally asymptotically stable equilibrium for the ODE $\dot{x}(t) = J_\infty(x(t))$
(iii) There is a unique globally asymptotically stable equilibrium $x^* \in \mathbb{R}^N$ for the ODE $\dot{x}(t) = J(x(t))$
The Stability Theorem: Under (C1)–(C3) and (C5), for any initial condition $x_0 \in \mathbb{R}^N$, $\sup_n \|x_n\| < \infty$ a.s. Further, $x_n \xrightarrow{\text{a.s.}} x^*$.
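As an illustrative check of (C5) (an assumed example, not from the slides), take $J$ affine:
$$J(x) = Ax + b \;\Longrightarrow\; J_c(x) = \frac{J(cx)}{c} = Ax + \frac{b}{c} \;\xrightarrow{c \to \infty}\; J_\infty(x) = Ax,$$
so (C5)(ii) asks that $A$ be Hurwitz; in that case $\dot{x}(t) = Ax(t) + b$ has the unique globally asymptotically stable equilibrium $x^* = -A^{-1}b$, as (C5)(iii) requires.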
[5] V. S. Borkar and S. P. Meyn, SIAM Journal on Control and Optimization, 38(2):447–469, 2000
Analysis of Algorithms with Biases [6], [7]
Let $\widehat{\nabla} J(x)$ denote an estimator for $\nabla J(x)$ s.t. $\|\widehat{\nabla} J(x) - \nabla J(x)\| \le \epsilon(\delta) \to 0$ as $\delta \to 0$
[Figure: the biased estimate $\widehat{\nabla} J(x)$ lies within an $\epsilon$-tube around the true gradient $\nabla J(x)$]
Consider the recursion $x_{n+1} = x_n - a(n)\left(\nabla J(x_n) + \varepsilon_n\right)$, where $\|\varepsilon_n\| \le \epsilon$ $\forall n$

[6] A. Ramaswamy and S. Bhatnagar, IEEE Transactions on Automatic Control, 63(5):1465–1471, 2018
[7] A. Ramaswamy and S. Bhatnagar, Mathematics of Operations Research, 42(3):648–661, 2017
Marchaud Map
[Figure: a set-valued map $h$ from $\mathbb{R}^n$ to subsets of $\mathbb{R}^m$; points $x_1, x_2, x_3$ map to sets $h(x_1), h(x_2), h(x_3)$]
A set-valued map $h$ is called Marchaud if:
$h(x)$ is convex and compact for each $x$
$\sup_{w \in h(x)} \|w\| \le K(1 + \|x\|)$ for each $x$
$h$ is upper semicontinuous, i.e., given $\{x_n\} \subset \mathbb{R}^n$ and $\{y_n\} \subset \mathbb{R}^m$ with $x_n \to x$ and $y_n \to y$ with $y_n \in h(x_n)$, $\forall n$, we have $y \in h(x)$
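An illustrative example (an assumption for exposition, not from the slides): subgradient descent on $J(x) = |x|$ gives the set-valued map
$$h(x) = -\partial |x| = \begin{cases} \{-1\}, & x > 0, \\ [-1, 1], & x = 0, \\ \{+1\}, & x < 0, \end{cases}$$
which is Marchaud: its values are convex and compact, $\sup_{w \in h(x)} \|w\| \le 1 \le 1 + \|x\|$, and its graph is closed, so it is upper semicontinuous.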
Differential Inclusion
Consider the differential inclusion (DI) in $\mathbb{R}^d$:
$$\dot{x}(t) \in H(x(t)), \qquad (3)$$
where $H : \mathbb{R}^d \to \{\text{subsets of } \mathbb{R}^d\}$ is Marchaud. Then the above DI has at least one solution $x$, and each solution is absolutely continuous. [8]
[8] J. Aubin and A. Cellina, Differential Inclusions: Set-Valued Maps and Viability Theory, Springer, 1984
Semiflow
The set-valued semiflow $\Phi$ associated with (3) is defined on $[0, \infty) \times \mathbb{R}^d$ as
$$\Phi(t, x) = \{\mathbf{x}(t) \mid \mathbf{x} \in \Sigma,\ \mathbf{x}(0) = x\},$$
where $\Sigma$ is the set of all absolutely continuous solutions to (3).
[Figure: the set of points $\Phi(t, x)$ reachable at time $t$ from $x$ along solutions of the DI]
For $B \times L \subset [0, \infty) \times \mathbb{R}^d$, let
$$\Phi(B, L) = \bigcup_{t \in B,\, x \in L} \Phi(t, x)$$
Invariant Set
$M \subset \mathbb{R}^d$ is invariant if for every $x \in M$ there exists a solution $\mathbf{x} \in \Sigma$ with $\mathbf{x}(0) = x$ such that $\mathbf{x}(t) \in M$ $\forall t$
[Figure: a trajectory starting at $x \in M$ that remains inside the invariant set $M$]
Attractor of a DI
$A \subset \mathbb{R}^d$ is attracting if it is compact and there exists a neighborhood $U$ of $A$ such that for any $\epsilon > 0$, $\exists\, T(\epsilon) \ge 0$ with $\Phi([T(\epsilon), \infty), U) \subset N_\epsilon(A)$
[Figure: trajectories starting in the neighborhood $U$ of $A$ enter the $\epsilon$-neighborhood $N_\epsilon(A)$ after time $T(\epsilon)$]
If the above $A$ is invariant, it is called an attractor
An Alternative View of the Algorithm
Recall the recursion
$$x_{n+1} = x_n - a(n)\left(\nabla J(x_n) + \varepsilon_n\right),$$
where $\|\varepsilon_n\| \le \epsilon$ $\forall n$
Alternatively, consider
$$x_{n+1} = x_n - a(n)\, g(x_n), \qquad (4)$$
where $g(x_n) \in G(x_n)$ $\forall n$ and $G(x) \triangleq \nabla J(x) + \bar{B}_\epsilon(0)$, i.e., the gradient estimate lies in an $\epsilon$-ball around the true gradient (see the sketch below)
[Figure: the set-valued map $G(x)$, an $\epsilon$-tube around $\nabla J(x)$]
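A minimal sketch of recursion (4), with an assumed toy objective $J(x) = \|x\|^2/2$ (so $\nabla J(x) = x$ and the minimum set is $M = \{0\}$) and a randomly rotating bias of norm $\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05
x = np.array([5.0, -3.0])
for n in range(1, 50001):
    a_n = 1.0 / n
    bias = rng.normal(size=2)
    bias *= eps / np.linalg.norm(bias)  # an arbitrary point on the eps-sphere
    g = x + bias                        # g(x_n) in G(x_n) = grad J(x_n) + B_eps(0)
    x = x - a_n * g                     # recursion (4)

print(np.linalg.norm(x))  # settles in a small neighborhood N_delta(M) of the origin
```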
Assumptions
(A1) $\nabla J$ is a continuous function s.t. $\|\nabla J(x)\| \le K(1 + \|x\|)$ for all $x \in \mathbb{R}^d$, for some $K > 0$
(A2) $a(n) > 0$ $\forall n$, with $\sum_n a(n) = \infty$, $\sum_n a(n)^2 < \infty$
One can show that $G$ is upper semicontinuous
Let $G_c(x) \triangleq \left\{\frac{y}{c} \,\middle|\, y \in G(cx)\right\}$
Let $G_\infty(x) \triangleq \overline{co}\left(\mathrm{Limsup}_{c \to \infty}\, G_c(x)\right)$, where $\mathrm{Limsup}_{c \to \infty}\, G_c(x) \triangleq \{y \mid \liminf_{c \to \infty} d(y, G_c(x)) = 0\}$
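As an illustrative computation (assuming a quadratic $J$, so that $\nabla J(x) = Ax$; not from the slides): since $G(x) = \nabla J(x) + \bar{B}_\epsilon(0)$,
$$G_c(x) = Ax + \bar{B}_{\epsilon/c}(0) \;\xrightarrow{c \to \infty}\; G_\infty(x) = \{Ax\},$$
i.e., the bias ball vanishes under the scaling, so the stability of the DI $\dot{x}(t) \in -G_\infty(x(t))$ reduces to that of the unperturbed ODE $\dot{x}(t) = -Ax(t)$.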
An Important Result
Lemma 1: The map $x \mapsto G_\infty(x)$ is Marchaud
Thus $\dot{x}(t) \in -G_\infty(x(t))$ has at least one solution, and every solution is absolutely continuous
(A3) $\dot{x}(t) \in -G_\infty(x(t))$ has an attractor set $A$ such that $A \subset B_a(0)$ for some $a > 0$, and $B_a(0)$ is a fundamental neighborhood of $A$
(A4) Let $\{c_n\}$, $c_n \ge 1$, be an increasing sequence of integers with $c_n \uparrow \infty$. If $x_n \to x$ and $y_n \to y$ as $n \uparrow \infty$, with $y_n \in G_{c_n}(x_n)$ $\forall n$, then $y \in G_\infty(x)$
The Stability Result
Theorem 1: Under (A1)–(A4), the iterates given by (4) are stable, i.e., $\sup_n \|x_n\| < \infty$ a.s.
Now recall that $G(x) = \nabla J(x) + \bar{B}_\epsilon(0)$
Let the minimum set $M$ of $J$ be the global attractor of $\dot{x}(t) = -\nabla J(x(t))$
It can be shown that any compact set $K$ with $M \subset K \subset \mathbb{R}^d$ is a fundamental neighborhood of $M$
From Theorem 1, $x(t) \in K_0$ $\forall t \ge 0$ for some (possibly sample-path dependent) compact set $K_0$, which is then a fundamental neighborhood of $M$
The Main Result
Theorem 2: Given $\delta > 0$, there exists $\epsilon(\delta) > 0$ such that (4) converges to $N_\delta(M)$, provided $\epsilon \le \epsilon(\delta)/2$
[Figure: the interpolated trajectory $x(t)$ compared with DI solutions over successive time intervals $T_1, T_2, T_3, T_4, \ldots$]
Related Recent Work
A general stochastic recursion with set-valued maps and Markov noise:
$$x_{n+1} = x_n + a(n)\left(h(x_n, Z_n) + M_{n+1}\right)$$
General convergence with $Z_n$ a non-ergodic, iterate-dependent Markov process [V. Yaji and S. B., Stochastics, 2018]

Two-timescale stochastic recursions with Markov noise:
$$x_{n+1} = x_n + a(n)\left(h(x_n, y_n, Z_n^1) + M_{n+1}^1\right)$$
$$y_{n+1} = y_n + b(n)\left(g(x_n, y_n, Z_n^2) + M_{n+1}^2\right)$$
$Z_n^1, Z_n^2$ are independent, non-ergodic, iterate-dependent Markov processes; $M_{n+1}^1, M_{n+1}^2$ are independent martingale differences; $a(n) = o(b(n))$; $h, g$ are point-to-point maps. Analysis and application to reinforcement learning [P. Karmakar and S. B., Math of OR, 2018]

Analysis under set-valued $h, g$ [V. Yaji and S. B., Math of OR, 2019]
Ongoing and Future Work
Finding minima of non-differentiable functions under noise
Algorithms for convergence to global minima
Asynchronous update algorithms
Reinforcement learning algorithms for partially observed Markov
decision processes
Analysis of deep reinforcement learning algorithms
Applications in robotics, microgrids, vehicular traffic control, etc.