
Page 1:

Stochastic Quasi-Gradient Methods

Roger J-B Wets

University of California, Davis

February 15, 2005

Page 2: Stochastic optimization

Formulation:

$$\min\ E\{f(\xi,x)\}\ \text{so that}\ x \in S \subset \mathbb{R}^n, \qquad E\{f(\xi,x)\} = \int_{\Xi \subset \mathbb{R}^N} f(\xi,x)\,P(d\xi) = Ef(x)$$

Properties:

$\xi \mapsto f(\xi,x)$ is measurable (continuous)

$x \mapsto f(\xi,x)$ is lsc, convex $\Rightarrow Ef(\cdot)$ convex

$S$ is a closed convex set

[Figure: the convex feasible set $S$]

Page 3: Subgradients of convex functions

$$v(\xi) \in \partial_x f(\xi,\bar{x}) \iff \forall x:\ f(\xi,x) \ge f(\xi,\bar{x}) + \langle v(\xi),\, x - \bar{x} \rangle$$

$$v(\xi) = \nabla f(\xi,\bar{x}) \iff \partial_x f(\xi,\bar{x})\ \text{a singleton}$$

Page 4: Minimization algorithms

Step type 1: step along a subgradient direction,

$$x^{\nu+1} = x^\nu - \lambda_\nu \hat{v}, \qquad \hat{v} \in \partial Ef(x^\nu),\ \lambda_\nu \ge 0.$$

[Figure: the half-line $\{x^\nu + \lambda(-\hat{v}) : \hat{v} \in \partial Ef(x^\nu),\ \lambda \ge 0\}$ and the feasible set $S$]

Page 5: Minimization algorithms

Step type 2: step, then project back onto $S$,

$$x^{\nu+1} = \mathrm{proj}_S\!\left(x^\nu - \lambda_\nu \hat{v}\right), \qquad \hat{v} \in \partial Ef(x^\nu),\ \lambda_\nu \ge 0.$$

[Figure: the point $x^\nu - \lambda_\nu \hat{v}$ projected back onto the feasible set $S$]

Page 6: "Repeated" projections

$$x^{\nu+1} = \operatorname*{argmin}_{x \in S}\ \mathrm{dist}^2\!\left(x^\nu - \lambda_\nu v^\nu(\xi),\, x\right)$$

The projection is itself a convex program with a quadratic objective function, and a quadratic program when $S$ is a polyhedral set.

In many applications $S = [l,u] \cap \{x : \sum_{j=1}^n a_j(x_j) \le \beta\}$ with the $a_j$ non-negative, convex, and bounded away from 0; then the projection is simple/efficient.
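Since the projection is computed at every iteration, it is worth seeing how cheap it can be. Below is a minimal Python (NumPy) sketch for the linear special case $a_j(x_j) = a_j x_j$ with $a_j > 0$ (an assumption; the slide allows general convex $a_j$), bisecting on the Lagrange multiplier of the budget constraint. The function name and tolerance are illustrative, and $S$ is assumed non-empty, i.e. $a^\top l \le \beta$.

```python
import numpy as np

def project_box_budget(y, l, u, a, beta, tol=1e-10):
    """Project y onto S = {x : l <= x <= u, a @ x <= beta}, with a > 0.

    The KKT conditions give proj(y) = clip(y - mu * a, l, u) for the
    smallest multiplier mu >= 0 making the budget constraint hold; since
    mu -> a @ clip(y - mu * a, l, u) is non-increasing, bisection works.
    Assumes S is non-empty, i.e. a @ l <= beta.
    """
    x = np.clip(y, l, u)
    if a @ x <= beta:          # box projection already satisfies the budget
        return x
    lo, hi = 0.0, 1.0
    while a @ np.clip(y - hi * a, l, u) > beta:   # bracket the multiplier
        hi *= 2.0
    while hi - lo > tol:                          # bisect on mu
        mu = 0.5 * (lo + hi)
        if a @ np.clip(y - mu * a, l, u) > beta:
            lo = mu
        else:
            hi = mu
    return np.clip(y - hi * a, l, u)
```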

Page 7: SQG iterates

Basic strategy:

$$x^{\nu+1} = \mathrm{proj}_S\!\left(x^\nu - \lambda_\nu \varsigma^\nu\right), \qquad -\varsigma^\nu \ \text{an estimated descent direction}, \qquad \lambda_\nu \downarrow 0 \ \text{as}\ \nu \to \infty,$$

where $\varsigma^\nu$ is a stochastic quasi-gradient:

$$Ef(x^*) \ge Ef(x^\nu) + \left\langle E\{\varsigma^\nu \mid x^0, \ldots, x^{\nu-1}\},\, x^* - x^\nu \right\rangle + \gamma_\nu.$$

"Asymptotically": $E\{\varsigma^\nu \mid x^0, \ldots, x^{\nu-1}\} \in \partial Ef(x^\nu)$.

Page 8: SQG: Stochastic Optimization

SQG:

$$v^\nu(\xi^s) \in \partial f(\xi^s, x^\nu) \quad \text{or} \quad = \nabla f(\xi^s, x^\nu), \qquad \xi^s \ \text{a sample of the random vector}\ \xi,$$

or, averaging over $\kappa$ samples,

$$v^\nu(\xi^s) = \frac{1}{\kappa} \sum_{l=1}^{\kappa} v^{\nu l}(\xi^{sl}), \qquad v^{\nu l}(\xi^{sl}) \in \partial f(\xi^{sl}, x^\nu).$$

Justification:

$$\partial Ef(x) = \int_{\Xi} \partial f(\xi, x)\, P(d\xi) \quad \text{(generally)}.$$
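A sketch of the batch-averaged estimator, with the same hypothetical callables as above; averaging over $\kappa$ independent samples reduces the variance of the quasi-gradient.

```python
import numpy as np

def averaged_quasi_gradient(x, sample_xi, subgrad, kappa=32):
    """v^nu = (1/kappa) * sum_l v^{nu,l}, each v^{nu,l} in ∂f(xi^{s,l}, x^nu)."""
    return np.mean([subgrad(sample_xi(), x) for _ in range(kappa)], axis=0)
```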

Page 9: SQG: Stochastic Optimization

Value estimate:

$$\eta^\nu(\xi^s) = f(\xi^s, x^\nu), \qquad \xi^s \ \text{a sample, or} \qquad \eta^\nu(\xi^s) = \frac{1}{\kappa} \sum_{l=1}^{\kappa} f(\xi^{sl}, x^\nu).$$

Justification: $Ef(x^\nu) = E\{f(\xi, x^\nu)\}$ and the Law of Large Numbers.
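The corresponding value estimate is a plain Monte Carlo average; a sketch with illustrative names:

```python
import numpy as np

def value_estimate(x, sample_xi, f, kappa=1000):
    """eta^nu = (1/kappa) * sum_l f(xi^{s,l}, x^nu); consistent for Ef(x^nu)
    by the Law of Large Numbers."""
    return np.mean([f(sample_xi(), x) for _ in range(kappa)])
```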

Page 10: A (simple) location problem

Population size of the 12 districts: 11 to 26. Shortage cost: 4; holding cost (excess): 0.5. Decision: location of facilities (shopping malls). Probabilistic choice of shopping district:

$$p_{ij} = \frac{e^{-\lambda c_{ij}}}{\sum_{k=1}^{12} e^{-\lambda c_{ik}}} \quad \text{(probability of traveling from $i$ to shop in $j$)}, \qquad \lambda = 0.1,$$

with $c_{ij}$ reflecting distance and other factors.
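A sketch of the choice probabilities; the function name is illustrative, and the row normalization mirrors the formula above (the matrix `c` is the "preferences" table of the next slide):

```python
import numpy as np

def choice_probabilities(c, lam=0.1):
    """p[i, j] = exp(-lam * c[i, j]) / sum_k exp(-lam * c[i, k])."""
    w = np.exp(-lam * np.asarray(c, dtype=float))
    return w / w.sum(axis=1, keepdims=True)
```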

Page 11: "Preferences" table

$c_{ij}$ (partial):

0  1  3  4  6  7  8  ...
2  0  1  1  3  5  5  ...
7  1  0  1  2  6  5  4  ...

Page 12: Formulation

From the objective:

$$\min\ E\left\{ \sum_{i=1}^{12} f_i(\xi_i, x_i) \right\}, \qquad f_i(\xi_i, x_i) = \max\left[\, q_s(\xi_i - x_i),\ q_e(x_i - \xi_i) \,\right],$$

with $\xi = (\xi_1, \ldots, \xi_{12})$; the probability of a sample is determined by customer behavior ($c_{ij}$ to $p_{ij}$). The subgradients are immediate:

$$\partial f_i(\xi_i, x_i) \ni -q_s \ \text{if}\ \xi_i \ge x_i; \quad = q_e \ \text{otherwise}.$$
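A sketch of the two problem-specific ingredients, using the slides' constants $q_s = 4$, $q_e = 0.5$. The population vector `pop` is hypothetical, and modeling each district's customers as a multinomial draw over the rows of $p$ is one plausible reading of "probability of sample determined by customer behavior".

```python
import numpy as np

rng = np.random.default_rng(0)
q_s, q_e = 4.0, 0.5                     # shortage and holding (excess) costs

def sample_demand(pop, p):
    """One sample of xi: the pop[i] customers of district i each pick a
    shopping district according to row p[i] of the choice matrix."""
    xi = np.zeros(p.shape[1])
    for i, n in enumerate(pop):
        xi += rng.multinomial(n, p[i])
    return xi

def subgrad(xi, x):
    """Componentwise subgradient of f_i(xi_i, x_i):
    -q_s where xi_i >= x_i (shortage), q_e otherwise (excess)."""
    return np.where(xi >= x, -q_s, q_e)
```

Plugged into the `sqg` loop sketched after Page 7 (with, say, `project = lambda x: np.maximum(x, 0.0)` if one only keeps facility sizes non-negative), this realizes the iteration $x^{\nu+1} = x^\nu - v^\nu(\xi^s)/\nu$ of the next slides.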

Page 13: Objective value: iterates

$$\lambda_\nu = 1/\nu, \qquad x^{\nu+1} = x^\nu - v^\nu(\xi^s)/\nu, \qquad v_i^\nu = -q_s \ \text{or}\ = q_e.$$

[Plot: estimate of the objective value per iterate]

Page 14: Objective value (2): iterates

Same iteration: $\lambda_\nu = 1/\nu$, $x^{\nu+1} = x^\nu - v^\nu(\xi^s)/\nu$, $v_i^\nu = -q_s$ or $= q_e$.

[Plot: estimate of the objective value per iterate]

Facilities:  18.57 15.90 19.13 16.35 27.25 20.75 21.88 17.81 19.11 17.52 18.62 19.60
Distr. pop.: 14    11    14    13    26    23    22    11    14    12    18    10

Page 15: Objective value (3): iterates

Same iteration: $\lambda_\nu = 1/\nu$, $x^{\nu+1} = x^\nu - v^\nu(\xi^s)/\nu$, $v_i^\nu = -q_s$ or $= q_e$.

Facilities:  24 22 23 20 26 22 23 22 22 20 22 25  (total: 271)
Distr. pop.: 19 16 19 16 27 21 22 18 19 18 19 20  (total: 234)

Page 16: a.s. convergence

For now, let $x^* \in S$ be a presumed optimal solution. At iteration $\nu$: iterate $x^\nu$, sample $\xi^\nu$, $v^\nu \in \partial f(\xi^\nu, x^\nu)$, $\lambda_\nu = 1/\nu$,

$$x^{\nu+1} = \mathrm{prj}_S\!\left(x^\nu - \lambda_\nu v^\nu\right), \qquad \mathcal{F}^\nu = \sigma\{x^0, \ldots, x^\nu\}.$$

Since the projection is non-expansive, it implies

$$\|x^{\nu+1} - x^*\|^2 \le \|x^\nu - x^*\|^2 - 2\lambda_\nu \langle v^\nu,\, x^\nu - x^* \rangle + \lambda_\nu^2 \|v^\nu\|^2.$$

Page 17: a.s. convergence

Taking conditional expectation w.r.t. $\mathcal{F}^\nu$, writing $E^\nu\{g\} = E\{g \mid \mathcal{F}^\nu\}$:

$$E^\nu\{\|x^{\nu+1} - x^*\|^2\} \le \|x^\nu - x^*\|^2 + \lambda_\nu^2\, E^\nu\{\|v^\nu(\xi)\|^2\} - 2\lambda_\nu \left\langle E^\nu\{v^\nu(\xi)\},\, x^\nu - x^* \right\rangle.$$

Assumption (a.): with $\gamma_\nu$ $\mathcal{F}^\nu$-measurable,

$$Ef(x^*) - Ef(x^\nu) \ge \gamma_\nu + \left\langle E^\nu\{v^\nu(\xi)\},\, x^* - x^\nu \right\rangle,$$

and since $Ef(x^*) - Ef(x^\nu) \le 0$,

$$\Rightarrow \quad \left\langle E^\nu\{v^\nu(\xi)\},\, x^\nu - x^* \right\rangle \ge \gamma_\nu \ge -|\gamma_\nu|.$$

Page 18: a.s. convergence

Hence

$$E^\nu\{\|x^{\nu+1} - x^*\|^2\} \le \|x^\nu - x^*\|^2 + \lambda_\nu^2\, E^\nu\{\|v^\nu(\xi)\|^2\} + 2\lambda_\nu |\gamma_\nu| \le \|x^\nu - x^*\|^2 + \rho_\nu,$$

where

$$\rho_\nu = \lambda_\nu^2\, E^\nu\{\|v^\nu(\xi)\|^2\} + 2\lambda_\nu |\gamma_\nu|.$$

Assumption (b.): $\sum_{\nu=0}^{\infty} \rho_\nu < \infty$ (which in particular forces $\sum_{\nu=0}^{\infty} \lambda_\nu^2 < \infty$).

With

$$X_\nu = \|x^\nu - x^*\|^2 + \sum_{k=\nu}^{\infty} \rho_k$$

one gets a non-negative supermartingale:

$$E^\nu\{X_{\nu+1}\} \le X_\nu,\ X_\nu \ge 0 \ \Rightarrow\ X_\nu \xrightarrow{\text{a.s.}} X, \quad EX \le EX_0.$$

Page 19: a.s. convergence

Recursively,

$$E\{\|x^{\nu+1} - x^*\|^2\} \le E\{\|x^0 - x^*\|^2\} + \sum_{k=0}^{\nu} \lambda_k^2\, E\{\|v^k\|^2\} - 2 \sum_{k=0}^{\nu} \lambda_k\, E\{\langle v^k,\, x^k - x^* \rangle\},$$

and from (a.),

$$E\{\|x^{\nu+1} - x^*\|^2\} - E\{\|x^0 - x^*\|^2\} - \sum_{k=0}^{\nu} E\{\rho_k\} \le 2 \sum_{k=0}^{\nu} \lambda_k\, E\big[\, Ef(x^*) - Ef(x^k) \,\big].$$

Page 20: a.s. convergence

Thus, since the left side is bounded below,

$$\sum_{k=0}^{\infty} \lambda_k\, E\big[\, Ef(x^k) - Ef(x^*) \,\big] < \infty.$$

Assumption (c.): $\lambda_\nu \ge 0$, $\sum_{\nu=0}^{\infty} \lambda_\nu = \infty$. Since $Ef(x^*) \le Ef(x^k)$ for every $k$, there exists a subsequence $\{\nu_k\}$ such that

$$Ef(x^{\nu_k}) - Ef(x^*) \to 0.$$

Page 21: Review of assumptions

(a.) $\ Ef(x^*) - Ef(x^\nu) \ge \gamma_\nu + \left\langle E^\nu\{v^\nu(\xi)\},\, x^* - x^\nu \right\rangle$

(b.) $\ \sum_{\nu=0}^{\infty} \left( \lambda_\nu^2\, E^\nu\{\|v^\nu(\xi)\|^2\} + 2\lambda_\nu |\gamma_\nu| \right) < \infty$

(c.) $\ \lambda_\nu \ge 0, \quad \sum_{\nu=0}^{\infty} \lambda_\nu = \infty$

Page 22: "Stumbling" blocks

Projection.

Step size: adaptive; adjust (increase, decrease) based on the variance of the stochastic quasi-gradient.

Stopping criterion: as for the step size, but more generally a comparison of estimates of the objective value over the last $M+1$ iterates:

$$\frac{1}{M+1} \sum_{l=\nu-M}^{\nu} f(\xi^l, x^l) \approx Ef(x^\nu).$$
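A sketch of such a stopping test; the window length $M$ and tolerance are illustrative choices, not from the slides.

```python
from collections import deque

class StoppingTest:
    """Track the moving average of the last M+1 values f(xi^l, x^l) as an
    estimate of Ef(x^nu); report 'stop' when successive estimates stabilize."""
    def __init__(self, M=100, tol=1e-3):
        self.window = deque(maxlen=M + 1)
        self.prev, self.tol = None, tol

    def update(self, f_value):
        self.window.append(f_value)
        est = sum(self.window) / len(self.window)
        stop = self.prev is not None and abs(est - self.prev) < self.tol
        self.prev = est
        return est, stop
```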

Page 23: A short history

Stochastic approximation methods: Robbins & Monro, Kiefer & Wolfowitz ('50).

SQG theory: Shor, Poljak, Ermoliev, Fabian ('60); Kushner ('70); Pflug, Ruszczynski ('80).

Implementation: Gaivoronski, Gupal, Norkin ('80 ... 2005).