TRANSCRIPT
-
Claudia Sagastizábal, IMECC-UNICAMP, Brazil
One World Optimization Seminar, UniVie, June 22nd, 2020
-
Context: when to use decomposition methods?
- problems too difficult to solve directly: mixed 0-1, linear, quadratic, or nonlinear programs; stochastic UC with nonconvex HPF
- problems with partially separable structure: complicating variables (block-diagonal 2nd-stage constraints coupled with the 1st stage); complicating constraints with a separable objective
- problems of problems: equilibrium problems, games, variational inequalities; energy markets
- information not accessible: big data, machine learning, commercial oracles
-
Which decomposition method?
It all depends on the output of interest!
-
Decomposition: what and how?
- Primal
- Dual
- Primal-Dual
It all depends on the output of interest!
-
Illustration with a simple example

min f_T(y_T) + f_H(y_H)
s.t. y_T ∈ S_T, y_H ∈ S_H
     y_T + y_H = d

Two power plants:
- thermal: y_T ∈ S_T, cost 〈F,x〉 + f_T(y_T), with x ∈ {0,1} and y_T ≤ x y^up
- hydro: y_H ∈ S_H, cost f_H(y_H)
- demand: y_T + y_H = d
-
A less simple example

min 〈F,x〉 + f_T(y_T) + f_H(y_H)
s.t. y_T ∈ S_T, y_H ∈ S_H
     y_T + y_H = d
     x ∈ {0,1} and y_T ≤ x y^up ⇐⇒ (x,y_T) ∈ S_T

Two power plants:
- thermal: (x,y_T) ∈ S_T, cost 〈F,x〉 + f_T(y_T)
- hydro: y_H ∈ S_H, cost f_H(y_H)
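The two-plant example can be made concrete with toy data. Everything numeric below (the fixed cost F, capacities y_up and cap_H, demand d, a quadratic thermal cost and a linear hydro cost) is an illustrative assumption, not data from the talk; a brute-force grid search stands in for a real mixed-integer solver.

```python
# Toy instance of the two-plant problem.  All numbers and cost shapes are
# illustrative assumptions (quadratic thermal cost, linear hydro cost).
F, y_up, cap_H, d = 5.0, 4.0, 3.0, 6.0   # fixed cost, capacities, demand

def f_T(y):          # thermal variable cost (assumed quadratic)
    return 0.5 * y * y

def f_H(y):          # hydro cost (assumed linear)
    return 1.0 * y

def solve_primal(step=0.01):
    """Brute-force min F*x + f_T(y_T) + f_H(y_H)
    s.t. 0 <= y_T <= x*y_up, 0 <= y_H <= cap_H, y_T + y_H = d, x in {0,1}."""
    best = (float("inf"), None)
    for x in (0, 1):
        n = int(x * y_up / step) + 1 if x else 1
        for i in range(n):
            y_T = min(i * step, x * y_up)
            y_H = d - y_T                  # meet demand exactly
            if 0.0 <= y_H <= cap_H:
                cost = F * x + f_T(y_T) + f_H(y_H)
                if cost < best[0]:
                    best = (cost, (x, y_T, y_H))
    return best

cost, (x, y_T, y_H) = solve_primal()   # optimum: plant on, y_T = y_H = 3
```

On this instance hydro alone cannot cover the demand, so the thermal plant must be committed despite its fixed cost.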
-
Primal scissors (Benders)
- Good: splits the 0-1 variables from possibly nonlinear (NLP) relations
- Bad: all technologies are dealt with together
For instance,
min 〈F,x〉 + f_T(y_T) + f_H(y_H)
s.t. (x^k, y_T) ∈ S_T, y_H ∈ S_H
     y_T + y_H = d
⇒ an operational subproblem in both y_T and y_H, for each given x^k (a master program defines x^{k+1})
To separate technologies we need dual scissors acting on the demand constraint
-
Dual scissors: Lagrangian relaxation

min f_T(x,y_T) + f_H(y_H)
s.t. (x,y_T) ∈ S_T, y_H ∈ S_H
     y_T + y_H = d  ↔ u

⇒ L(x,y,u) = f_T(x,y_T) + f_H(y_H) + 〈u, d − y_T − y_H〉
           = L_T(x,y_T,u) + L_H(y_H,u) + 〈u,d〉

DUAL: max-min replaces min-max:
max_u min_y L(x,y,u) s.t. (x,y_T) ∈ S_T, y_H ∈ S_H
⇒ max_u θ_T(u) + θ_H(u) + 〈u,d〉
for θ_T(u) := min { L_T(x,y_T,u) : (x,y_T) ∈ S_T }
    θ_H(u) := min { L_H(y_H,u) : y_H ∈ S_H }
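Dual decomposition can be run on the same toy instance (same illustrative data as before, with closed-form inner minimizations that are assumptions for this toy, not from the talk): for each multiplier u the Lagrangian separates, each subproblem is solved independently, and the dual is maximized by a grid search over u.

```python
# Dual decomposition on the illustrative two-plant instance: for each u
# the subproblems theta_T(u) and theta_H(u) are solved independently,
# then theta(u) = theta_T(u) + theta_H(u) + u*d is maximized over a grid.
F, y_up, cap_H, d = 5.0, 4.0, 3.0, 6.0

def theta_T(u):
    """min{ F*x + 0.5*y**2 - u*y : x in {0,1}, 0 <= y <= x*y_up }."""
    y = min(max(u, 0.0), y_up)     # minimizer of 0.5*y**2 - u*y on [0, y_up]
    return min(0.0, F + 0.5 * y * y - u * y)   # x = 0 gives value 0

def theta_H(u):
    """min{ 1.0*y - u*y : 0 <= y <= cap_H } (linear, so an endpoint)."""
    return min(0.0, (1.0 - u) * cap_H)

def solve_dual(step=0.001):
    best_val, best_u = -float("inf"), None
    for i in range(5001):          # multipliers u on a grid of [0, 5]
        u = i * step
        val = theta_T(u) + theta_H(u) + u * d
        if val > best_val:
            best_val, best_u = val, u
    return best_val, best_u

dual_val, u_star = solve_dual()
# Weak duality: dual_val <= 12.5 (the primal optimum); the 0-1 variable
# leaves a small duality gap, so dual_val lands strictly below 12.5.
```

The maximizing u_star plays the role of the shadow price of demand; the gap between dual_val and the primal optimum is exactly what the augmentation discussed later is meant to close.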
-
Dual scissors: Lagrangian relaxation à la bundle

min f_T(y_T) + f_H(y_H)
s.t. x ∈ {0,1}, y_T ∈ S_T, y_T ≤ x y^up
     y_H ∈ S_H
     y_T + y_H = d  ↔ u

⇒ L(y,u) = f_T(y_T) + f_H(y_H) + 〈u, d − y_T − y_H〉
         = L_T(y_T,u) + L_H(y_H,u) + 〈u,d〉

DUAL: max-min replaces min-max:
max_u min_y L(x,y,u) s.t. (x,y_T) ∈ S_T, y_H ∈ S_H
⇒ max_u θ_T(u) + θ_H(u) + 〈u,d〉
for θ_T(u) ≈ min { L_T(x,y_T,u) : (x,y_T) ∈ S_T }
    θ_H(u) ≈ min { L_H(y_H,u) : y_H ∈ S_H }
(the ≈ signs: bundle methods work with inexact subproblem solutions)
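A bundle method builds a cutting-plane model of θ from oracle calls and tolerates inexactness; as a bare-bones stand-in (an assumption for illustration, not the method in the talk), plain subgradient ascent on the toy dual already shows the mechanics: each oracle call solves the two subproblems separately and returns the residual d − y_T − y_H as a subgradient.

```python
# Bare-bones stand-in for the dual iteration: plain subgradient ascent on
# the illustrative toy dual.  A bundle method would instead keep a
# cutting-plane model of theta; here the steps are simply diminishing.
F, y_up, cap_H, d = 5.0, 4.0, 3.0, 6.0

def oracle(u):
    """Solve both subproblems at u; return theta(u) and a subgradient."""
    # thermal: min F*x + 0.5*y**2 - u*y over x in {0,1}, 0 <= y <= x*y_up
    y = min(max(u, 0.0), y_up)
    on_val = F + 0.5 * y * y - u * y
    yT, vT = (y, on_val) if on_val < 0.0 else (0.0, 0.0)
    # hydro: min (1 - u)*y over 0 <= y <= cap_H (linear, so an endpoint)
    yH = cap_H if u > 1.0 else 0.0
    vH = (1.0 - u) * yH
    return vT + vH + u * d, d - yT - yH   # value, subgradient of theta

u, best = 0.0, -float("inf")
for k in range(1, 201):
    val, g = oracle(u)
    best = max(best, val)
    u += (0.5 / k) * g                    # diminishing-step ascent
# u oscillates toward the dual maximizer (about sqrt(10) on this toy)
# and best approaches the dual optimal value from below.
```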
-
Dual scissors
- Good: separates technologies, and copes with inexact information
- Good: if the output of interest is u (shadow price)
- Bad: if the output of interest is x, y: the final ones may be infeasible (the dual approach solves the bi-dual of the original problem)
-
Scissor features
- Primal scissors: primal feasibility
- Dual scissors: separability
Can we have both (x,y) and u?
We need primal-dual scissors
-
Primal-dual scissors: Sharp Augmented Lagrangians

L_r(x,y,u) = L(x,y,u) + (r/2) ‖d − y_T − y_H‖²
- Good: closes the duality gap
- Bad: puts the technologies back together
- Also: inexact calculations make the parameter r difficult to manage
We need to sharpen our scissors:
L#(x,y,u,r) = L(x,y,u) + r |d − y_T − y_H|_1, where r is a dual variable (no longer a parameter)
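A quick numerical check of the sharp term on the toy instance (same illustrative data as above, an assumption for this sketch): with u = 0 and a large enough "dual variable" r, the minimum of the sharp Lagrangian is attained at a feasible point and equals the primal optimum, so the gap left by the ordinary Lagrangian dual is closed.

```python
# Sharp augmented Lagrangian on the illustrative instance, with the
# l1 term acting on the demand residual h = d - y_T - y_H.
F, y_up, cap_H, d = 5.0, 4.0, 3.0, 6.0

def sharp_L(x, yT, yH, u, r):
    h = d - yT - yH                          # demand-constraint residual
    return F * x + 0.5 * yT * yT + yH + u * h + r * abs(h)

def min_sharp(u, r, step=0.05):
    """Brute-force min of the sharp Lagrangian over the box/0-1 set X."""
    best = (float("inf"), None)
    for x in (0, 1):
        nT = int(x * y_up / step) + 1 if x else 1
        for i in range(nT):
            yT = i * step
            for j in range(int(cap_H / step) + 1):
                yH = j * step
                v = sharp_L(x, yT, yH, u, r)
                if v < best[0]:
                    best = (v, (x, yT, yH))
    return best

val, (x, yT, yH) = min_sharp(u=0.0, r=10.0)
# val is (numerically) 12.5, the primal optimum, and the minimizer is
# feasible: for this (u, r) no duality gap is left.
```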
-
Generalized Augmented Lagrangians (GAL): the perturbation function p

p(u) = inf f_T(x,y_T) + f_H(y_H)
       s.t. (x,y_T) ∈ S_T, y_H ∈ S_H
            y_T + y_H = d + u
⇐⇒ p(u) = inf { ϕ(x) : x ∈ X, h(x) = u }

If in the example f_T, f_H are quadratic, p is the minimum of two quadratic functions.
[Figure: p(0) above its biconjugate value p**(0); the distance between them is the duality gap]
We need a "wedge" to close the duality gap and bring the magenta curve closer to the blue one.
-
The GAL according to R&W

p(u) = inf { ϕ(x) : x ∈ X, h(x) = u } = inf_{x∈X} D(x,u), where D(x,u) = ϕ(x) + I_{h(x)−u=0}(x)

- p is a marginal function of D
- D is the conjugate of the Lagrangian: D* = L
- "Fix" D by adding a "σ-term": D_σ = D + (1/r) σ*; conjugating back combines D* = L with rσ, giving L_σ, the GAL
- L_σ(x,u,r) = L(x,u) + r σ(h(x))
- p_σ is the marginal function of D_σ; here σ = |·|_1
[Figure: p_σ(0) coincides with p(0): no duality gap!]
-
Sharp and Proximal GAL according to R&W

L_σ(x,u,r): sharp, L(x,u) + r |h(x)|_1, and proximal, L(x,u) + (r/2) |h(x)|_2²
[Figure: graphs of L_σ in the x- and r-components]

No duality gap, but two catches:
- the separable L(x,u) = L_T(x_T,u) + L_H(x_H,u) is turned into the non-separable L_σ(x,u,r) = L(x,u) + r |x_T + x_H − d|_1
- it requires exact evaluation of the dual function θ_σ(u,r) = min_{x∈X} L_σ(x,u,r), a global optimization problem
Our work is a proposal to address these issues
-
GAL according to …

The GAL of
  min ϕ(x) s.t. x ∈ X, h(x) = 0
≡ the Lagrangian of
  min ϕ(x) s.t. x ∈ X, h(x) = 0, σ(h(x)) ≤ 0

From now on we consider L(x,u,r) for the problem on the right, to examine the relations of approximate solutions to min_x L(x,u,r)
- with ε-subgradients of the dual function θ(u,r)
- with numerical schemes à la bundle
- with solutions to the problem on the left
Any CQ (constraint qualification) for the original problem yields multipliers (u,r) for the σ-augmented problem
-
Everett's theorem (σ continuous, non-negative, with unique minimizer at 0)

Fix (ũ, r̃) and consider evaluating θ(ũ, r̃) = min_x L(x, ũ, r̃): what primal problem is solved by the minimizer x̃?

NLP Everett's: a stationary point x̃ for θ(ũ, r̃), i.e., solving
  0 ∈ ∂_x L(x, ũ, r̃) + N_X(x), with x ∈ X,
is a stationary point for the original problem ⇐⇒ h(x̃) = 0.

ε-Everett's: an ε-solution x̃ for θ(ũ, r̃), i.e., satisfying
  L(x̃, ũ, r̃) ≤ L(x, ũ, r̃) + ε for all x ∈ X,
approximately solves the original problem (ϕ(x̃) ≤ ϕ* + ε) ⇐⇒ h(x̃) = 0.

Summing up:
- augmentation (approximately) closes the duality gap, provided h(x̃) = 0
- h(x) = 0 ⇐⇒ σ(h(x)) = 0
- an inexact bundle method drives σ(h(x^k)) to 0
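ε-Everett can be checked numerically on the toy instance with σ = |·|_1 (same illustrative data as before, an assumption for this sketch): when the minimizer of L(·, ũ, r̃) over X happens to be feasible, its primal cost matches the primal optimum; when r̃ is too small the minimizer is infeasible and the theorem certifies nothing.

```python
# Numerical check of epsilon-Everett on the illustrative instance.
F, y_up, cap_H, d = 5.0, 4.0, 3.0, 6.0

def phi(x, yT, yH):                          # original objective
    return F * x + 0.5 * yT * yT + yH

def minimize_L(u, r, step=0.05):
    """argmin over X of phi + u*h + r*|h|, with h = d - yT - yH."""
    best = (float("inf"), None)
    for x in (0, 1):
        for i in range(int(x * y_up / step) + 1 if x else 1):
            yT = i * step
            for j in range(int(cap_H / step) + 1):
                yH = j * step
                h = d - yT - yH
                v = phi(x, yT, yH) + u * h + r * abs(h)
                if v < best[0]:
                    best = (v, (x, yT, yH))
    return best[1]

x1 = minimize_L(u=0.0, r=0.5)    # penalty too weak: infeasible minimizer
x2 = minimize_L(u=0.0, r=10.0)   # large enough: feasible minimizer
h1 = d - x1[1] - x1[2]           # big residual: Everett certifies nothing
h2 = d - x2[1] - x2[2]           # zero residual: x2 solves the original problem
```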
-
GAL and the σ-simplex (recall that h(x) = 0 ⇐⇒ σ(h(x)) = 0)

θ(ũ, r̃) = min_{x∈X} L(x, ũ, r̃) = min_{x∈X} ϕ(x) + 〈ũ, h(x)〉 + r̃ σ(h(x))
⇒ (h(x_min), σ(h(x_min))) ∈ ∂_ε θ(ũ, r̃)
(suppose X compact; an explicit calculus rule is given in the reference)

⇒ σ-simplex = conv{ σ(h(x^i)) : x^i approximate minimizers }

Theorem: 0 ∈ σ-simplex is equivalent to
- 0 ∈ ∂_ε θ(ũ, r̃)
- one x^{i_best} in the σ-simplex is primal feasible: h(x^{i_best}) = 0
⇒ x^{i_best} is an approximate solution for the original problem
-
PDBM: a primal-dual bundle method for GAL

Init: choose (u^1, r^1) and compute x^1 ≈ min_{x∈X} L(x, u^1, r^1)
Dual: solve the bundle QP with a model for θ to obtain (u^+, r^+)
Noise?: if there is too much noise, adjust the prox-parameter and go to Dual
Primal: compute x^+ ≈ min_{x∈X} L(x, u^+, r^+); stop if σ(h(x^+)) is small
Bundle: classify (u^+, r^+) as a serious or null step, update the model, loop to Dual

Theorem:
1. The noise-attenuation loop is finite.
2. There is a primal feasible limit point x̄^{i_best} that is approximately optimal.
3. If the dual sequence has accumulation points, they approximately solve the dual problem.
4. For the existence of dual accumulation points, see "Convex proximal bundle methods in depth" (MP 2014).
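The loop above can be sketched on the toy instance. This is not the actual method: the bundle QP, the noise test and the serious/null-step logic are replaced by a crude subgradient-style update of (u, r) (an assumption for illustration); only the primal step and the stopping test on σ(h(x^+)) are kept.

```python
# Schematic, simplified sketch of a PDBM-style loop on the illustrative
# two-plant instance (not the real bundle mechanics).
F, y_up, cap_H, d = 5.0, 4.0, 3.0, 6.0

def oracle(u, r, step=0.05):
    """x+ ~ argmin_{x in X} L_sigma(x, u, r), by brute-force grid search."""
    best = (float("inf"), None)
    for x in (0, 1):
        for i in range(int(x * y_up / step) + 1 if x else 1):
            yT = i * step
            for j in range(int(cap_H / step) + 1):
                yH = j * step
                h = d - yT - yH
                v = F * x + 0.5 * yT * yT + yH + u * h + r * abs(h)
                if v < best[0]:
                    best = (v, (x, yT, yH, h))
    return best[1]

u, r = 0.0, 1.0                      # Init
for _ in range(50):
    x, yT, yH, h = oracle(u, r)      # Primal step
    if abs(h) < 1e-6:                # stop: sigma(h(x+)) is small
        break
    u += 0.5 * h                     # crude stand-in for the Dual (bundle QP) step
    r *= 1.5                         # sharpen the penalty instead of a noise test
# On this instance the loop stops at the feasible primal point
# (x, yT, yH) = (1, 3, 3) with multiplier u = 3.
```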
-
Solving difficult problems with PDBM for GAL

DC problems with explicit nonconvexity (exact solution of subproblems):

  min_{x ∈ X = IR^n}  ϕ(x) := (1/2)⟨x, Qx⟩ + ⟨x, q⟩ − max_{i∈{1,…,N}} {⟨x, αi⟩ + βi}
  s.t.  h(x) := Ax − b = 0

Augmented with σ(·) = (1/2)‖·‖₂², this yields an easy dual function θ(u, r) = min_{i∈{1,…,N}} θi(u, r).

Each θi is a QP having the same quadratic term for all i:

  θi(u, r) = min_x (1/2)⟨x, Qx⟩ + ⟨x, q⟩ − ⟨x, αi⟩ − βi + ⟨Ax − b, u⟩ + (r/2)‖Ax − b‖₂²

              PDBM              MSM (Gas02)       ENUM
              avg      stdev    avg      stdev    avg    stdev
  ∆ϕ         -1E-03    3E-03   -7E-02    3E-01    0      0
  h(x̄)        4E-08    3E-08    1E-03    6E-03
  #primal     25       42       105      115
  CPU (s)     5        8        21       36       209    515
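Since every piece θi shares the quadratic term Q + rAᵀA, evaluating θ(u, r) costs one matrix factorization plus N cheap solves. A minimal numpy sketch, assuming Q + rAᵀA positive definite and using the sign convention in which θi carries −⟨x, αi⟩ − βi from the concave max-term (function and variable names are illustrative):

```python
import numpy as np

def dual_theta(u, r, Q, q, A, b, alphas, betas):
    """theta(u, r) = min_i theta_i(u, r) for the DC example, where
         theta_i(u, r) = min_x 0.5<x,Qx> + <x,q> - <x,a_i> - b_i
                               + <Ax-b, u> + (r/2)||Ax-b||^2.
       All pieces share the quadratic term H = Q + r*A'A (assumed positive
       definite), so H is factored once and reused for every i."""
    H = Q + r * (A.T @ A)
    L = np.linalg.cholesky(H)                    # one factorization for all i
    rhs_common = -q - A.T @ u + r * (A.T @ b)
    best = np.inf
    for a_i, b_i in zip(alphas, betas):
        rhs = rhs_common + a_i                   # stationarity: H x = rhs
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        res = A @ x - b
        val = (0.5 * x @ (Q @ x) + q @ x - a_i @ x - b_i
               + res @ u + 0.5 * r * (res @ res))
        best = min(best, val)
    return best
```

By weak duality, θ(u, r) computed this way lower-bounds ϕ(x) at any feasible x, which is a handy sanity check.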
-
Solving difficult problems with PDBM for GAL

Unit-commitment problems (inexact solution of subproblems, using ADMM):

  min  Σ_{i∈I} (⟨F, xi⟩ + Ci(yi))              min  Σ_{i∈I} ϕi(xi, yi)
  s.t. (xi, yi) ∈ Si                    ⟺      s.t. (xi, yi) ∈ Si
       Σ_i yi = D                                   Σ_i zi = D
                                                    h(x, y) = z − y = 0

Augmented with σ(·) = |·|₁, this gives

  L(x, y, z, u, r) = L0-1(x, y, u) + Lcont(z, u) + r|y − z|₁
                   ≈ [L0-1(x, y, u) + (r/2)|y − zfixed|₁] + [Lcont(z, u) + (r/2)|yfixed − z|₁]

where the first bracket splits into Σ_i min over (xi, yi) ∈ Si and the second is a min over Σ_i zi = D.
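The alternating structure above can be sketched on a tiny instance. This is a heavily simplified illustration under stated assumptions, not the implementation from the talk: the 0-1 unit oracle is a grid search, the z step is solved as an epigraph LP, and the (u, r) update is a crude multiplier step where PDBM would use the bundle QP; all names and data are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def uc_split_sketch(F, c, cap, D, r=2.0, rounds=30, grid=201):
    """Alternating (ADMM-style) minimization for the split UC toy model.
       Unit i: on/off x_i with fixed cost F_i; if on, y_i in [0, cap_i] with
       cost C_i(y_i) = c_i*y_i**2; copies z with sum(z) = D; the coupling
       z - y = 0 is relaxed with multiplier u and augmented by r*|y - z|_1."""
    F, c, cap = map(np.asarray, (F, c, cap))
    n = len(F)
    u = np.zeros(n)
    z = np.full(n, D / n)
    y = np.zeros(n)
    for _ in range(rounds):
        # y-step: per unit, min F_i*x_i + c_i*y^2 - u_i*y + (r/2)|y - z_i|
        for i in range(n):
            ys = np.linspace(0.0, cap[i], grid)            # candidates, x_i = 1
            on = F[i] + c[i] * ys**2 - u[i] * ys + 0.5 * r * np.abs(ys - z[i])
            off = 0.5 * r * abs(z[i])                      # x_i = 0 => y_i = 0
            y[i] = ys[np.argmin(on)] if on.min() < off else 0.0
        # z-step: min <u,z> + (r/2)|y - z|_1 s.t. sum(z) = D, 0 <= z <= cap,
        # written as an LP in (z, t) with t_i >= |y_i - z_i|
        cost = np.concatenate([u, 0.5 * r * np.ones(n)])
        A_ub = np.block([[np.eye(n), -np.eye(n)], [-np.eye(n), -np.eye(n)]])
        b_ub = np.concatenate([y, -y])
        A_eq = np.concatenate([np.ones(n), np.zeros(n)]).reshape(1, -1)
        lp = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[D],
                     bounds=[(0.0, cap[i]) for i in range(n)] + [(0.0, None)] * n)
        z = lp.x[:n]
        u = u + 0.5 * (z - y)      # crude multiplier step (PDBM: bundle QP)
    return y, z, u
```

The z iterate stays feasible for the demand constraint by construction, while the alternating steps work to close the gap z − y that the L1 augmentation penalizes.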
-
Solving difficult problems with PDBM for GAL

Unit-commitment problems (inexact solution of subproblems, using ADMM)

Very good performance, provided r0 is well chosen (not too large).

Results for 56 synthetic instances, horizon from 1 to 7 days, hourly discretization:

              PDBM                MSM (Gas02)
              avg       stdev     avg       stdev
  Gap (%)     4.5       6.5       14.6      12.1
  h(x̄)        4.5E-03   7.2E-04   5.3E-03   1.1E-03
  #primal     105       52        208       131
  CPU (s)     96        121       148       140
-
Some works on GAL

I E. Golshtein and N. Tretyakov, Modified Lagrangians and Monotone Maps in Optimization, Wiley 1996
I R. Rockafellar and R. Wets, Variational Analysis, Springer 1998
I A. M. Rubinov, B. M. Glover and X. Q. Yang, “Decreasing Functions with Applications to Penalization”, SiOPT 1999
I R. N. Gasimov, “Augmented Lagrangian Duality and Nondifferentiable Optimization Methods in Nonconvex Programming”, JOGO 2002 ← MSM
I X. X. Huang and X. Q. Yang, “A Unified Augmented Lagrangian Approach to Duality and Exact Penalization”, MOR 2003
I R. Burachik, R. Gasimov, N. Ismayilova and C. Kaya, “On a Modified Subgradient Algorithm for Dual Problems via Sharp Augmented Lagrangian”, JOGO 2006
I R. Burachik and A. Rubinov, “Abstract Convexity and Augmented Lagrangians”, SiOPT 2007
I A. Nedić, A. Ozdaglar and A. Rubinov, “Abstract convexity for nonconvex optimization duality”, OPT 2007
I R. Burachik, A. N. Iusem and J. G. Melo, “A primal dual modified subgradient algorithm with sharp Lagrangian”, JOGO 2010
I R. Burachik, “On primal convergence for augmented Lagrangian duality”, OPT 2011
I R. Burachik, A. Iusem and J. Melo, “An Inexact Modified Subgradient Algorithm for Primal-Dual Problems via Augmented Lagrangians”, JOTA 2013
I N. L. Boland and A. C. Eberhard, “On the augmented Lagrangian dual for integer programming”, MP 2015
I M. J. Feizollahi, S. Ahmed and A. Sun, “Exact augmented Lagrangian duality for mixed integer linear programming”, MP 2017
I A. M. Bagirov, G. Ozturk and R. Kasimbeyli, “A sharp augmented Lagrangian-based method in constrained non-convex optimization”, OMS 2018
I X. Gu, S. Ahmed and S. S. Dey, “Exact Augmented Lagrangian Duality for Mixed Integer Quadratic Programming”, SiOPT 2020

Questions?