Chapter 1
Stochastic Linear and Nonlinear Programming
1.1 Optimal land usage under stochastic uncertainties
1.1.1 Extensive form of the stochastic decision program
We consider a farmer who has a total of 500 acres of land available for growing wheat, corn, and sugar beets. We denote by x1, x2, x3 the amount of acres of land devoted to wheat, corn, and sugar beets, respectively.
The planting costs per acre are $ 150, $ 230, and $ 260 for wheat, corn, and sugar beets.
The farmer needs at least 200 tons (T) of wheat and 240 T of corn for cattle feed, which can be grown on the farm or bought from a wholesaler. We refer to y1, y2 as the amounts of wheat resp. corn (in tons) purchased from a wholesaler. The purchase prices are $ 238 per ton for wheat and $ 210 per ton for corn.
The amounts of wheat and corn produced in excess will be sold at prices of $ 170 per ton for wheat and $ 150 per ton for corn. For sugar beets there is a quota on production, which is 6000 T for the farmer. Any amount of sugar beets up to the quota can be sold at $ 36 per ton, whereas any amount in excess of the quota sells at only $ 10 per ton. We denote by w1 and w2 the amounts in tons of wheat resp. corn sold, and by w3, w4 the amounts of sugar beets sold at the favorable price and the reduced price, respectively.
The farmer knows that the average yield on his land is 2.5 T, 3.0 T, and 20.0 T per acre for wheat, corn, and sugar beets, respectively.
The data are shown in Table 1.
Table 1. Data for optimal land usage
                         Wheat    Corn    Sugar Beets
Yield (T/acre)            2.5      3.0     20.0
Planting cost ($/acre)    150      230      260
Purchase price ($/T)      238      210       –
Selling price ($/T)       170      150       36 (under 6000 T)
                                             10 (above 6000 T)
Minimum requirement (T)   200      240       –
Total available land: 500 acres
Ronald H.W. Hoppe
The farmer wants to maximize his profit. Based on the above data, this amounts to the solution of the following linear program (profit maximization is written as minimization of costs minus revenues):
minimize   150x1 + 230x2 + 260x3 + 238y1 − 170w1                    (1.1)
           + 210y2 − 150w2 − 36w3 − 10w4
subject to x1 + x2 + x3 ≤ 500 ,
2.5x1 + y1 − w1 ≥ 200 ,
3.0x2 + y2 − w2 ≥ 240 ,
w3 + w4 ≤ 20x3 ,
w3 ≤ 6000 ,
x1, x2, x3, y1, y2, w1, w2, w3, w4 ≥ 0 .
The solution of (1.1) is shown in Table 2.
Table 2. Solution of the linear program (’average yields’)
Culture           Wheat   Corn   Sugar Beets
Surface (acres)    120     80      300
Yield (T)          300    240     6000
Purchases (T)       –      –        –
Sales (T)          100     –      6000
Maximum profit: $ 118,600
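Table 2 can be reproduced with any LP solver. The following sketch (not part of the original notes; it assumes Python with `scipy` is available) encodes (1.1) for `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

# Variables: z = [x1, x2, x3, y1, y2, w1, w2, w3, w4]
# Objective (1.1): planting and purchase costs minus sales revenue.
c = [150, 230, 260, 238, 210, -170, -150, -36, -10]

# Inequality constraints A_ub @ z <= b_ub (the ">=" rows are negated).
A_ub = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],        # total land <= 500
    [-2.5, 0, 0, -1, 0, 1, 0, 0, 0],    # wheat balance >= 200
    [0, -3.0, 0, 0, -1, 0, 1, 0, 0],    # corn balance >= 240
    [0, 0, -20, 0, 0, 0, 0, 1, 1],      # beet sales <= production
    [0, 0, 0, 0, 0, 0, 0, 1, 0],        # favorable-price sales <= quota
]
b_ub = [500, -200, -240, 0, 6000]

res = linprog(c, A_ub=A_ub, b_ub=b_ub)  # default bounds are z >= 0
print(res.x[:3])   # acreage: 120, 80, 300 (Table 2)
print(-res.fun)    # maximum profit: 118600
```

The "≥" rows are negated because `linprog` expects "≤" constraints; the default variable bounds already enforce z ≥ 0.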
The yield is sensitive to, e.g., weather conditions. We refer to the previously determined optimal solution as the one based on 'average yields' and consider two more scenarios, namely 'above average yields' and 'below average yields' by a margin of ± 20 %. The associated optimal solutions are depicted in Table 3 and Table 4.
Table 3. Solution of the linear program (’above average yields’)
Culture           Wheat    Corn   Sugar Beets
Surface (acres)   183.33   66.67    250
Yield (T)          550      240    6000
Purchases (T)       –        –       –
Sales (T)          350       –     6000
Maximum profit: $ 167,667
Optimization Theory II, Spring 2007; Chapter 1
Table 4. Solution of the linear program (’below average yields’)
Culture           Wheat   Corn   Sugar Beets
Surface (acres)    100     25      375
Yield (T)          200     60     6000
Purchases (T)       –     180       –
Sales (T)           –      –      6000
Maximum profit: $ 59,950
The mean profit, i.e., the average of the maximum profits of the three scenarios, is $ 115,406.
The problem for the farmer is that he has to decide on the land assignment, i.e., to determine x1, x2, x3, without knowing which of the three scenarios is going to happen with regard to the purchases y1, y2 and sales w1, w2, w3, w4, which depend on the yield. The variables x1, x2, x3 are called the first stage decision variables.
Hence, the decisions depend on the scenarios, which are indexed by j = 1, 2, 3 with j = 1 referring to 'above average yields', j = 2 to 'average yields', and j = 3 to 'below average yields'. We introduce corresponding new variables yij, 1 ≤ i ≤ 2, 1 ≤ j ≤ 3, and wij, 1 ≤ i ≤ 4, 1 ≤ j ≤ 3. For instance, w31 represents the amount of sugar beets sold at the favorable price in case of 'above average yields'. The decision variables yij, wij are referred to as the second stage decision variables.
We assume that the three scenarios occur with the same probability 1/3. If the objective is to maximize the long-run profit, we are led to the following problem:
minimize   150x1 + 230x2 + 260x3                                           (1.2)
           − (1/3)(170w11 − 238y11 + 150w21 − 210y21 + 36w31 + 10w41)
           − (1/3)(170w12 − 238y12 + 150w22 − 210y22 + 36w32 + 10w42)
           − (1/3)(170w13 − 238y13 + 150w23 − 210y23 + 36w33 + 10w43)
subject to x1 + x2 + x3 ≤ 500 ,
3.0x1 + y11 − w11 ≥ 200 ,
3.6x2 + y21 − w21 ≥ 240 ,
w31 + w41 ≤ 24x3 ,
w31 ≤ 6000 ,
2.5x1 + y12 − w12 ≥ 200 ,
3.0x2 + y22 − w22 ≥ 240 ,
w32 + w42 ≤ 20x3 ,
w32 ≤ 6000 ,
2.0x1 + y13 − w13 ≥ 200 ,
2.4x2 + y23 − w23 ≥ 240 ,
w33 + w43 ≤ 16x3 ,
w33 ≤ 6000 ,
x, y, w ≥ 0 .
The optimization problem (1.2) is called a stochastic decision problem. In particular, (1.2) is said to be the extensive form of the stochastic program. The reason for this notation is that it explicitly describes the second stage variables for all possible scenarios. Its optimal solution is shown in Table 5.
Table 5. Solution of the stochastic decision problem (1.2)
                               Wheat   Corn   Sugar Beets
First stage   Surface (acres)   170     80      250
s = 1         Yield (T)         510    288     6000
(above        Purchases (T)      –      –        –
average)      Sales (T)         310     48     6000
s = 2         Yield (T)         425    240     5000
(average)     Purchases (T)      –      –        –
              Sales (T)         225     –      5000
s = 3         Yield (T)         340    192     4000
(below        Purchases (T)      –      48       –
average)      Sales (T)         140     –      4000
Maximum profit: $ 108,390
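The extensive form (1.2) can likewise be assembled scenario by scenario and handed to an LP solver. The sketch below (again assuming `scipy`; not part of the original notes) recovers the first-stage decision and the expected profit of Table 5:

```python
import numpy as np
from scipy.optimize import linprog

# Variables: x1..x3, then per scenario j a block (y1j, y2j, w1j, w2j, w3j, w4j).
yields = [(3.0, 3.6, 24.0), (2.5, 3.0, 20.0), (2.0, 2.4, 16.0)]  # s = 1, 2, 3
p = 1.0 / 3.0
n = 3 + 6 * 3
c = np.zeros(n)
c[:3] = [150, 230, 260]                      # deterministic planting costs
A, b = [], []
for j, (t1, t2, t3) in enumerate(yields):
    o = 3 + 6 * j                            # offset of this scenario's block
    c[o:o+6] = p * np.array([238, 210, -170, -150, -36, -10])
    r = np.zeros(n); r[0] = -t1; r[o] = -1; r[o+2] = 1; A.append(r); b.append(-200)
    r = np.zeros(n); r[1] = -t2; r[o+1] = -1; r[o+3] = 1; A.append(r); b.append(-240)
    r = np.zeros(n); r[2] = -t3; r[o+4] = 1; r[o+5] = 1; A.append(r); b.append(0)
    r = np.zeros(n); r[o+4] = 1; A.append(r); b.append(6000)
r = np.zeros(n); r[:3] = 1; A.append(r); b.append(500)   # total land
res = linprog(c, A_ub=A, b_ub=b)             # default bounds give nonnegativity
print(res.x[:3])   # first stage (Table 5): 170, 80, 250
print(-res.fun)    # expected maximum profit: 108390
```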
We see that the solution differs from those obtained in case of perfect a priori information. The distinctive feature is that in a stochastic setting the decisions have to be hedged against the various possible scenarios (cf. Tables 2, 3, and 4). We also see that the expected maximum profit ($ 108,390) differs from the mean value ($ 115,406) of the maximum profits of the three scenarios in case of perfect a priori
information. The difference of $ 7,016 is called the Expected Value of Perfect Information (EVPI).
A variant of the above stochastic decision problem is that the farmer makes the first stage decision (allocation of land) on the basis of 'average yields' according to Table 2. If the yields are again random with 20 % above resp. below average, it has to be observed that the planting costs are deterministic, but the purchases and sales depend on the yield. This leads to a reduced stochastic decision program whose maximum profit turns out to be $ 107,240, which in this case is less than the maximum profit $ 108,390 of the stochastic decision program (1.2). The difference of $ 1,150 is called the Value of the Stochastic Solution (VSS), reflecting the possible gain by solving the full stochastic model.
1.1.2 Two-stage stochastic program with recourse
For a stochastic decision program, we denote by x ∈ Rn1, x ≥ 0, the vector of first stage decision variables. It is subject to the constraints
(1.3) Ax ≤ b ,
where A ∈ Rm1×n1, b ∈ Rm1 are a fixed matrix and vector, respectively. In the optimal land usage problem, x = (x1, x2, x3)T represents the amount of acres devoted to the three different crops. Here, m1 = 1 and A = (1 1 1), b = 500.
We further denote by ξ a random vector whose realizations provide information on the second stage decisions y, which is a random vector with realizations in Rn2_+. In the optimal land usage problem, ξ = (t1, t2, t3)T with ti = ti(s), 1 ≤ i ≤ 3, where s ∈ {1, 2, 3} stands for the possible scenarios ('above average', 'average', and 'below average'). In other words, ti(s) represents the yield of crop i under scenario s. Moreover, y = (y1, y2, y3 = w1, y4 = w2, y5 = w3, y6 = w4)T is the random vector whose realizations y(s), s ∈ {1, 2, 3}, are the second stage decisions on the amount of crop to be purchased or sold in scenario s. The relationship between x and y can be expressed according to
(1.4) Wy = h − Tx ,
where h ∈ Rm2 is a fixed vector, W ∈ Rm2×n2 is a fixed matrix, and T is a random matrix with realizations T(s) ∈ Rm2×n1. W is called the recourse matrix and T is referred to as the technology matrix.
The second stage decision problems can be stated as
minimize   qT y    (1.5)
subject to Wy + Tx ≥ h,
y ≥ 0
for given q ∈ Rn2. We set
(1.6) Q(x, ξ) := min { qT y | Wy + Tx ≥ h , y ≥ 0 } .
In the optimal land usage problem, a second stage decision problem for scenario s can be written as
minimize 238y1 − 170w1 + 210y2 − 150w2 − 36w3 − 10w4(1.7)
subject to t1(s)x1 + y1 − w1 ≥ 200 ,
t2(s)x2 + y2 − w2 ≥ 240 ,
t3(s)x3 − w3 − w4 ≥ 0 ,
w3 ≤ 6000 ,
y, w ≥ 0 .
Altogether, the stochastic program can be formulated according to
minimize   cT x + Eξ Q(x, ξ)    (1.8)
subject to Ax ≤ b ,
x ≥ 0 ,
where c ∈ Rn1 is given and Eξ stands for the expectation.
The problem (1.8) is called a two-stage stochastic program with recourse. It represents the implicit representation of the original stochastic decision problem (1.2). Finally, we refer to the function
(1.9) Q(x) := EξQ(x, ξ)
as the value function or recourse function. Then, in even more compact form, (1.8) can be written as
minimize   cT x + Q(x)    (1.10)
subject to Ax ≤ b ,
x ≥ 0 ,
1.1.3 Continuous random variables: The news vendor problem
As an example of a stochastic problem with continuous random variables we consider the so-called news vendor problem. The setting of the problem is as follows: Every morning, a news vendor goes to the publisher and buys x newspapers at a price of c per paper. This number is bounded from above by xmax. The vendor tries to sell as many newspapers as possible at a selling price q. Any unsold newspapers can be returned to the publisher at a return price of r < c. The demand for newspapers varies from day to day and is described by a continuous random variable ξ with
probability distribution F = F(ξ), i.e., P(a ≤ ξ ≤ b) = ∫_a^b dF(ξ) and ∫_{−∞}^{+∞} dF(ξ) = 1.
The objective is to maximize the vendor's profit. To this end, we define y as the effective sales and w as the number of remittents (returned copies). Then, the problem can be stated as
minimize   J(x) := cx + Q(x)    (1.11)
subject to 0 ≤ x ≤ xmax ,
where
Q(x) := EξQ(x, ξ) ,(1.12)
Q(x, ξ) := min − qy(ξ)− rw(ξ) ,
subject to y(ξ) ≤ ξ ,
y(ξ) + w(ξ) ≤ x ,
y(ξ), w(ξ) ≥ 0 .
Note that −Q(x) is the expected profit on sales and returns, and −Q(x, ξ) stands for the profit on sales and returns in case the demand is given by ξ.
We see that, like the optimal land usage problem, (1.11) represents another two-stage stochastic linear program with fixed recourse.
The optimal solution of (1.11) can be easily computed: when the demand ξ is known in the second stage, the optimal solution is given according to
y∗(ξ) = min(ξ, x) , w∗(ξ) = max(x− ξ, 0) ,
and hence, the second stage expected value function turns out to be
Q(x) = Eξ [−q min(ξ, x) − r max(x− ξ, 0)] .
The second stage expected value function can be computed by means of the probability distribution F(ξ):
Q(x) = ∫_{−∞}^{x} ( −qξ − r(x − ξ) ) dF(ξ) + ∫_{x}^{+∞} ( −qx ) dF(ξ)
     = −(q − r) ∫_{−∞}^{x} ξ dF(ξ) − rxF(x) − qx(1 − F(x)) .
Integration by parts yields
∫_{−∞}^{x} ξ dF(ξ) = xF(x) − ∫_{−∞}^{x} F(ξ) dξ ,
whence
Q(x) = −qx + (q − r) ∫_{−∞}^{x} F(ξ) dξ .
It follows that Q is differentiable in x with
Q′(x) = −q + (q − r)F (x) .
From Optimization I we know that the optimal solution x∗ of (1.11) satisfies the variational inequality
(v − x∗) J′(x∗) ≥ 0 for all v ∈ K := { x | 0 ≤ x ≤ xmax } ,
whose solution is given by
x∗ = 0 , if J ′(0) > 0 ,
x∗ = xmax , if J ′(xmax) < 0 ,
J ′(x∗) = 0 , otherwise .
Since J ′(x) = c + Q′(x), we find
x∗ = 0 ,   if (q − c)/(q − r) < F(0) ,
x∗ = xmax ,   if (q − c)/(q − r) > F(xmax) ,
x∗ = F⁻¹((q − c)/(q − r)) ,   otherwise .
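The case distinction above can be checked numerically. For illustration, assume a demand uniformly distributed on [0, D]; then F⁻¹((q − c)/(q − r)) = D(q − c)/(q − r), which the sketch below (with hypothetical parameter values, not from the notes) compares with a brute-force minimization of J(x) = cx + Q(x):

```python
import numpy as np

# Hypothetical parameters: cost c, selling price q, return price r < c,
# demand uniform on [0, D], purchases bounded by xmax.
c, q, r, D, xmax = 1.0, 2.0, 0.5, 100.0, 100.0

F = lambda xi: np.clip(xi / D, 0.0, 1.0)   # distribution function of the demand

def J(x):
    # J(x) = cx + Q(x) with Q(x) = -qx + (q - r) * int_0^x F(xi) dxi
    xs = np.linspace(0.0, x, 2001)
    vals = F(xs)
    integral = (vals.sum() - 0.5 * (vals[0] + vals[-1])) * (x / 2000.0)
    return (c - q) * x + (q - r) * integral

grid = np.linspace(0.0, xmax, 2001)
x_num = grid[np.argmin([J(x) for x in grid])]
x_star = D * (q - c) / (q - r)   # F^{-1}((q - c)/(q - r)) for the uniform law
print(x_star, x_num)             # both approximately 66.67
```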
1.2 Two-stage stochastic linear programs with fixed recourse
1.2.1 Formulation of the problem and basic properties
Recalling the example concerning optimal land usage in case of stochastic uncertainties, we give the general formulation of a two-stage stochastic linear program with fixed recourse.
Definition 1.1 (Two-stage stochastic linear program with fixedrecourse)
Let I ⊂ N be an index set, A ∈ Rm1×n1 and W ∈ Rm2×n2 fixed (deterministic) matrices, T(ω) ∈ Rm2×n1, ω ∈ I, a random matrix, c ∈ Rn1, b ∈ Rm1 fixed (deterministic) vectors, and q(ω) ∈ Rn2, h(ω) ∈ Rm2, ω ∈ I, random vectors with respect to a probability space (Ω, A, P). Let further ξ(ω)T = (q(ω)T, h(ω)T, T1(ω), · · · , Tm2(ω)) ∈ RN, N := n2 + m2 + (m2 × n1), ω ∈ I, where Ti(ω), 1 ≤ i ≤ m2, are the rows of T(ω). Then, the problem
minimize   cT x + Eξ (min q(ω)T y(ω))    (1.13)
subject to Ax = b ,
T (ω)x + Wy(ω) = h(ω) a.s. ,
x ≥ 0 , y(ω) ≥ 0 a.s. ,
is called a two-stage stochastic linear program with fixed re-course. The matrix W is said to be the recourse matrix and thematrix T (ω) is referred to as the technology matrix.
Definition 1.2 (Deterministic Equivalent Program (DEP))
The linear program
minimize cT x + Q(x)(1.14)
Ax = b ,
x ≥ 0 ,
where
Q(x) := Eξ (Q(x, ξ(ω))) ,    (1.15)
Q(x, ξ(ω)) := min_y { q(ω)T y | Wy = h(ω) − T(ω)x , y ≥ 0 } ,
is called the Deterministic Equivalent Program (DEP) associatedwith (1.13). The function Q is said to be the recourse function orexpected second-stage value function.
Definition 1.3 (Feasible sets)
The sets
K1 := { x ∈ Rn1 | Ax = b } ,    (1.16)
K2 := { x ∈ Rn1 | Q(x) < ∞ }
are called the first stage feasible set and the second stage feasible set, respectively.
Let Σ ⊂ RN be the support of ξ in the sense that P(ξ ∈ Σ) = 1. If Σ is finite, Q(x) is the weighted sum of finitely many values Q(x, ξ). We use the convention that if the values ±∞ occur, then +∞ + (−∞) = +∞. The sets
K2(ξ) := { x ∈ Rn1 | Q(x, ξ) < ∞ } ,    (1.17)
K2^P := { x ∈ Rn1 | for all ξ ∈ Σ there exists y ≥ 0 such that Wy = h − Tx } = ∩_{ξ∈Σ} K2(ξ)
are called the elementary second stage feasible set and the possibility interpretation of the second stage feasible set, respectively.
Definition 1.4 (Relatively complete, complete, and simplerecourse)
The stochastic program (1.13) is said to have
• relatively complete recourse, if K1 ⊂ K2,
• complete recourse, if for all z ∈ Rm2 there exists y ≥ 0 such that Wy = z,
• simple recourse, if the recourse matrix W has the structure W = [I −I].
In case of simple recourse, we partition y and q according to y = (y+, y−) and q = (q+, q−). Then, the optimal values (yi+(ω), yi−(ω)) only depend on the sign of hi(ω) − Ti(ω)x, provided qi = qi+ + qi− ≥ 0 with probability one. Moreover, if hi has an associated distribution function Fi and mean value h̄i, there holds
(1.18) Qi(x) = qi+ h̄i − (qi+ − qi Fi(Ti x)) Ti x − qi ∫_{hi ≤ Ti x} hi dFi(hi) .
Theorem 1.1 (Characterization of second stage feasible sets)
(i) For each ξ, the elementary second stage feasible set K2(ξ) is a closed convex polyhedron, which implies that K2^P is a closed convex set.
(ii) Moreover, if Σ is finite, then K2^P = K2.
Proof: The proof of (i) is obvious. For the proof of (ii), assume x ∈ K2. Then, Q(x) is bounded from above. Hence, Q(x, ξ) is bounded from above for each ξ, which shows x ∈ K2(ξ) for all ξ, whence x ∈ K2^P. Conversely, assume x ∈ K2^P. Then, Q(x, ξ) is bounded from above for all ξ. We deduce that Q(x) is bounded from above and hence x ∈ K2. □
We note that similar results hold true in case ξ is a continuous random variable with finite second moments. For details we refer to [6] and [9].
Theorem 1.2 (Properties of the second stage value function)
Assume Q(x, ξ) > −∞. Then, there holds
(i) Q(x, ξ) is piecewise linear and convex in (h, T ).
(ii) Q(x, ξ) is piecewise linear and concave in q.
(iii) Q(x, ξ) is piecewise linear and convex in x for all x ∈ K1 ∩K2.
Proof: The piecewise linearity in (i)-(iii) follows from the existence of finitely many optimal bases for the second stage program. For details we refer to [7].
For the proof of the convexity in (h, T) resp. in x, it suffices to prove that the function
g(z) := min { qT y | Wy = z , y ≥ 0 }
is convex in z. For λ ∈ [0, 1] and z1, z2 with z1 ≠ z2, we consider z(λ) := λz1 + (1 − λ)z2 and denote by y∗i, 1 ≤ i ≤ 2, optimal solutions of the minimization problem for z = z1 and z = z2, respectively. Then, y∗(λ) := λy∗1 + (1 − λ)y∗2 is a feasible solution for z = z(λ). If y∗λ is the corresponding optimal solution, we obtain
g(z(λ)) = qT y∗λ ≤ qT y∗(λ) = λ qT y∗1 + (1 − λ) qT y∗2 = λ g(z1) + (1 − λ) g(z2) .
The proof of the concavity in q is left as an exercise. □
For similar results in case ξ is a continuous random variable with finite second moments we again refer to [8].
1.2.2 Optimality conditions
For the derivation of the optimality conditions (KKT conditions), weassume that (1.14) has a finite optimal value. We refer to [8] for con-ditions that guarantee finiteness of the optimal value.
Theorem 1.3 (Optimality conditions)
Assume that (1.14) has a finite optimal value. A solution x∗ ∈ K1 of (1.14) is optimal if and only if there exist λ∗ ∈ Rm1 and µ∗ ∈ Rn1_+ with (µ∗)T x∗ = 0 such that
(1.19) −c + AT λ∗ + µ∗ ∈ ∂Q(x∗) ,
where ∂Q(x∗) denotes the subdifferential of the recourse function Q.
Proof: As we know from Optimization I, the minimization problem
minimize J(x) := cT x + Q(x) ,
subject to Ax = b , x ≥ 0
is a convex optimization problem with a closed convex constraint setwhich can be equivalently written as
(1.20) inf_{x ∈ Rn1} sup_{λ ∈ Rm1, µ ∈ Rn1_+} L(x, λ, µ) ,
where the Lagrangian is given by
L(x, λ, µ) := J(x) − λT (Ax− b) − µT x .
The optimality condition for (1.20) reads
0 ∈ ∂L(x∗, λ∗, µ∗) .
The subdifferential of the Lagrangian turns out to be
∂L(x∗, λ∗, µ∗) = c + ∂Q(x∗) − AT λ∗ − µ∗ ,
which results in (1.19). □
Obviously, the difficult task is to evaluate the subdifferential of the recourse function. The following result shows that it can be decomposed into subgradients of the recourse for each realization of ξ.
Theorem 1.4 (Decomposition of the subgradient of the re-course function)
For x ∈ K2 there holds
∂Q(x) = Eω ∂Q(x, ξ(ω)) + N(K2, x) ,    (1.21)
N(K2, x) = { v ∈ Rn1 | vT y ≤ 0 for all y such that x + y ∈ K2 } ,
where N(K2, x) is the normal cone of the second stage feasible set K2.
Proof: The subdifferential calculus of random convex functions with finite expectations [10] yields
∂Q(x) = Eω ∂Q(x, ξ(ω)) + rec(∂Q(x)) ,
where rec(∂Q(x)) is the recession cone of the subdifferential according to
rec(∂Q(x)) = { v ∈ Rn1 | u + λv ∈ ∂Q(x) for all λ ≥ 0 , u ∈ ∂Q(x) } .
The recession cone can be equivalently written as
rec(∂Q(x)) = { v ∈ Rn1 | yT(u + λv) + Q(x) ≤ Q(x + y) for all λ ≥ 0 , y ∈ Rn1 } .
Consequently, we have
v ∈ rec(∂Q(x)) ⟺ yT v ≤ 0 for all y such that Q(x + y) < ∞ .
Recalling the definition of K2, we conclude. □
Corollary 1.5 (Optimality conditions in case of relatively com-plete recourse)
Assume that (1.14) has relatively complete recourse. Then, a solution x∗ ∈ K1 of (1.14) is optimal if and only if there exist λ∗ ∈ Rm1 and µ∗ ∈ Rn1_+ with (µ∗)T x∗ = 0 such that
(1.22) −c + AT λ∗ + µ∗ ∈ Eω ∂Q(x∗, ξ(ω)) .
Proof: Taking into account that under the assumption of relatively complete recourse there holds
N(K2, x) ⊂ N(K1, x) = { v ∈ Rn1 | v = AT λ + µ , µ ≥ 0 , µT x = 0 } ,
the result follows from Theorems 1.3 and 1.4. □
Corollary 1.6 (Optimality conditions in case of simple re-course)
Assume that (1.14) has simple recourse. Then, a solution x∗ ∈ K1 of (1.14) is optimal if and only if there exist λ∗ ∈ Rm1, µ∗ ∈ Rn1_+ with (µ∗)T x∗ = 0, and π∗ ∈ Rm2 with
−(qi+ − qi Fi(Ti x∗)) ≤ πi∗ ≤ −(qi+ − qi Fi+(Ti x∗)) ,
where Fi+(h) := lim_{t→h+} Fi(t), such that
(1.23) −c + AT λ∗ + µ∗ − TT π∗ = 0 .
Proof: We deduce from (1.18) that
∂Qi(x) = { πi (Ti)T | −(qi+ − qi Fi(Ti x)) ≤ πi ≤ −(qi+ − qi Fi+(Ti x)) } .
Then, (1.23) follows readily from Theorem 1.3. □
1.2.3 The value of information
Definition 1.5 (Expected value of perfect information)
For a particular realization ξ = ξ(ω), ω ∈ I, we consider the objective functional
J(x, ξ) := cT x + min { qT y | Wy = h − Tx , y ≥ 0 }
and the associated minimization problem
min_{x ∈ K1} Eξ J(x, ξ) ,   K1 := { x ∈ Rn1_+ | Ax = b } .
The optimal solution is sometimes referred to as the here-and-now solution (cf. (1.13)). We denote the optimal value of this recourse problem by
(1.24) RP := min_{x ∈ K1} Eξ J(x, ξ) .
Another related minimization problem is to find the optimal solution for all possible scenarios and to consider the expected value of the associated optimal values
(1.25) WS := Eξ [ min_{x ∈ K1} J(x, ξ) ] .
The optimal solution of (1.25) is called the wait-and-see solution. The difference between the optimal values of the here-and-now solution and the wait-and-see solution
(1.26) EVPI := RP − WS
is referred to as the expected value of perfect information.
Example: In the optimal land usage problem, we found
WS = −$ 115,406 ,   RP = −$ 108,390 ,
so that EVPI = $ 7,016. This amount represents the value of perfect information w.r.t. the weather conditions for the next season.
The computation of the wait-and-see solution requires a considerable amount of computational work. Replacing all random variables by their expectations leads us to a quantity which is called the value of the stochastic solution.
Definition 1.6 (Value of the stochastic solution)
We denote by ξ̄ := E(ξ) the expectation of ξ and consider the minimization problem
(1.27) EV := min_{x ∈ K1} J(x, ξ̄) ,
which is dubbed the expected value problem or mean value problem. An optimal solution of (1.27) is called the expected value solution. Denoting such an optimal solution by x̄(ξ̄), the expected value
(1.28) EEV := Eξ J(x̄(ξ̄), ξ)
is referred to as the expected result of using the expected value solution. The difference
(1.29) VSS := EEV − RP
is called the value of the stochastic solution. It measures the performance of x̄(ξ̄) w.r.t. second stage decisions optimally chosen as functions of x̄(ξ̄) and ξ.
Example: In the optimal land usage problem, we have
EEV = −$ 107,240 ,   RP = −$ 108,390 ,
so that VSS = $ 1,150. This amount represents the cost of ignoring uncertainty in the choice of a decision.
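Both quantities follow by simple arithmetic from the numbers computed earlier in this section (using the rounded scenario profits of Tables 2-4):

```python
# Rounded maximum profits from Tables 2-4 (perfect information) and the
# optimal values of the recourse and mean-value problems quoted in the text.
WS  = -(167_667 + 118_600 + 59_950) / 3   # wait-and-see optimal value
RP  = -108_390                            # here-and-now (recourse) optimal value
EEV = -107_240                            # expected result of the mean-value solution

EVPI = RP - WS    # expected value of perfect information
VSS  = EEV - RP   # value of the stochastic solution
print(round(EVPI), VSS)   # 7016 1150
```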
Theorem 1.7 (Fundamental inequalities, Part I)
Let RP , WS, and EEV be given by (1.24), (1.25) and (1.28), respec-tively. Then, there holds
(1.30) WS ≤ RP ≤ EEV .
Proof: If x∗ denotes the optimal solution of the recourse problem (1.24) and x̄(ξ) denotes the wait-and-see solution for the realization ξ, we have
J(x̄(ξ), ξ) ≤ J(x∗, ξ) .
Taking the expectation on both sides results in the left inequality in (1.30). Since x∗ is the optimal solution of (1.24), whereas x̄(ξ̄) is just one feasible point of the recourse problem, we arrive at the second inequality in (1.30). □
Theorem 1.8 (Fundamental inequalities, Part II)
Let WS and EV be given by (1.25) and (1.27). Then, in case of fixed(deterministic) objective coefficients and fixed (deterministic) technol-ogy matrix T , there holds
(1.31) EV ≤ WS .
Proof: We define
f(ξ) := min_{x ∈ K1} J(x, ξ) .
Recalling (1.25) and (1.27), we see that (1.31) is equivalent to
(1.32) f(E(ξ)) ≤ E(f(ξ)) .
Since (1.32) holds true for convex functions according to Jensen's inequality, the only thing we have to prove is the convexity of f. In order to do that, by linear programming duality we have
min_{x ∈ K1} J(x, ξ) = max_{σ,π} { σT b + πT h | σT A + πT T ≤ cT , πT W ≤ qT } .
We observe that the constraints of the dual problem remain unchanged for all realizations ξ = h. Hence, f is the pointwise maximum of the linear functions σT b + πT h over all dual feasible (σ, π), and epi f is the intersection of the epigraphs of these linear functions. The latter are obviously convex, and so is then epi f. We know from Optimization I that a function is convex if and only if its epigraph is convex. □
Theorem 1.9 (Fundamental inequalities, Part III)
Let RP and EEV be given by (1.24) and (1.28). Assume further that x∗ is an optimal solution of (1.24) and that x̄(ξ̄) is a solution of the expected value problem (1.27). Then, there holds
(1.33) RP ≥ EEV + (x∗ − x̄(ξ̄))T η ,   η ∈ ∂Eξ J(x̄(ξ̄), ξ) .
Proof: The proof is left as an exercise. ¤
We finally derive an upper bound for the optimal value RP of the recourse problem (1.24), which is based on the observation
RP = min_{x ∈ K1} Eξ J(x, ξ) ,    (1.34)
J(x, ξ) := cT x + min { qT y | Wy ≥ h(ξ) − Tx , y ≥ 0 } .    (1.35)
Note that in (1.35) the second stage constraints are inequalities.
Theorem 1.10 (Fundamental inequalities, Part IV)
Assume that h(ξ) is bounded from above, i.e., there exists hmax such that h(ξ) ≤ hmax for all possible realizations of ξ. Let xmax be an optimal solution of
min_{x ∈ K1} J(x, hmax) .
Then, there holds
(1.36) RP ≤ J(xmax, hmax) .
Proof: We see that for any ξ ∈ Σ and x ∈ K1, a feasible solution of Wy ≥ hmax − Tx, y ≥ 0, is also a feasible solution of Wy ≥ h(ξ) − Tx, y ≥ 0. Consequently, we have
J(x, hmax) ≥ J(x, h(ξ)) ⟹ J(x, hmax) ≥ Eξ J(x, h(ξ)) ,
whence
J(x, hmax) ≥ min_{x ∈ K1} Eξ J(x, h(ξ)) = RP . □
There is no universal relationship between EVPI and VSS. For a discussion of this issue we refer to [1].
1.3 Numerical solution of two-stage stochastic linear programs with fixed recourse
1.3.1 The L-shaped method
We consider the deterministic equivalent program of a two-stage stochastic linear program with fixed recourse (cf. (1.13) and (1.14))
minimize cT x + Q(x)(1.37)
subject to Ax = b , x ≥ 0 ,
where
Q(x) := Eξ (Q(x, ξ(ω))) ,    (1.38)
Q(x, ξ(ω)) := min_y { q(ω)T y | Wy = h(ω) − T(ω)x , y ≥ 0 } .
The computational burden w.r.t. the DEP (1.37),(1.38) is the solution of all second stage recourse linear programs. If the random vector ξ only has a finite number of, say, K realizations with probabilities pk, 1 ≤ k ≤ K, the computational work can be significantly reduced by associating one set of second stage decisions yk with each realization of ξ, i.e., with each realization of qk, hk, and Tk, 1 ≤ k ≤ K. In other words, we consider the following extensive form:
minimize   cT x + Σ_{k=1}^{K} pk qkT yk    (1.39)
subject to Ax = b ,
Tkx + Wyk = hk , 1 ≤ k ≤ K ,
x ≥ 0 , yk ≥ 0 , 1 ≤ k ≤ K .
The constraint matrix of the extensive form (1.39) has the block structure

    A
    T1   W
    T2        W
    ·              ·
    TK                  W

Fig. 1.1. Block structure of the L-shaped method
The L-shaped method is an iterative process with feasibility cuts and optimality cuts according to the block structure of the extensive program illustrated in Fig. 1.1, which gives the method its name.
L-shaped algorithm
Step 0 (Initialization): Set r = s = ν = 0 and Dr = 0, dr = 0, Es = 0, es = 0.
Step 1 (Iteration loop): Set ν = ν+1 and solve the linear program
minimize   J(x, θ) := cT x + θ    (1.40a)
subject to Ax = b ,    (1.40b)
           Dℓ x ≥ dℓ , ℓ = 1, · · · , r ,    (1.40c)
           Eℓ x + θ ≥ eℓ , ℓ = 1, · · · , s ,    (1.40d)
           x ≥ 0 , θ ∈ R .    (1.40e)
If no constraints (1.40d) are present, θ is omitted from (1.40a) and we set θν = −∞. Denote an optimal solution by (xν, θν).
Step 2 (Feasibility cuts): For k = 1, · · · , K, until an index k with optimal value J(y, v+, v−) > 0 is found, solve the linear program
minimize   J(y, v+, v−) := eT v+ + eT v−    (1.41a)
subject to Wy + v+ − v− = hk − Tk xν ,    (1.41b)
           y ≥ 0 , v+ ≥ 0 , v− ≥ 0 ,    (1.41c)
where e := (1, · · · , 1)T. For the first k, 1 ≤ k ≤ K, with J(y, v+, v−) > 0, let σν be the associated Lagrange multiplier and define the feasibility cut
Dr+1 := (σν)T Tk ,    (1.42a)
dr+1 := (σν)T hk .    (1.42b)
Set r = r + 1 and go back to Step 1. If J(y, v+, v−) = 0 for all 1 ≤ k ≤ K, go to Step 3.
Step 3 (Optimality cuts): For k = 1, · · · , K solve the linearprogram
minimize   J(y) := qT y    (1.43a)
subject to Wy = hk − Tk xν ,    (1.43b)
           y ≥ 0 .    (1.43c)
Let πνk be the Lagrange multipliers associated with an optimal solution of (1.43) and define the optimality cut
Es+1 := Σ_{k=1}^{K} pk (πνk)T Tk ,    (1.44a)
es+1 := Σ_{k=1}^{K} pk (πνk)T hk .    (1.44b)
Set Jν := es+1 − Es+1 xν. If θν ≥ Jν, stop the iteration: xν is an optimal solution. Otherwise, set s = s + 1 and go back to Step 1.
1.3.2 Illustration of feasibility and optimality cuts
Example (Optimality cuts): We consider the minimization problem
minimize   Q(x)    (1.45)
subject to 0 ≤ x ≤ 10 ,
where
(1.46) Q(x, ξ) := ξ − x for x ≤ ξ and Q(x, ξ) := x − ξ for x ≥ ξ , i.e., Q(x, ξ) = |x − ξ| ,
where ξ1 = 1, ξ2 = 2, ξ3 = 4 are the possible realizations of ξ with probabilities pk = 1/3, 1 ≤ k ≤ 3. Note that W = 1, qk = 1, and, for 1 ≤ k ≤ 3,
Tk = 1 , hk = ξk   if x ≤ ξk ,
Tk = −1 , hk = −ξk   if x > ξk .
Set r = s = 0, Dr = 0, dr = 0, Es = 0, es = 0, and ν = 1, x1 = 0, θ1 = −∞, and begin the iteration with Step 2:
Iteration 1: In Step 2, we find J(y, v+, v−) = 0, 1 ≤ k ≤ 3, since x1 = 0 is feasible. In Step 3, the solution of (1.43) yields y = (1, 2, 4)T with π1k = 1, 1 ≤ k ≤ 3. Hence, (1.44a),(1.44b) give rise to
E1 = 1 , e1 = 7/3 , J1 = 7/3 .
Set s = 1 and begin Iteration 2.
Iteration 2: In Step 1, the solution of the minimization problem
minimize   θ
subject to θ ≥ 7/3 − x ,
           0 ≤ x ≤ 10 , θ ∈ R
is x2 = 10, θ2 = −23/3.
Step 2 does not result in a feasibility cut, since x2 is feasible. In Step 3, the solution of (1.43) yields y = (9, 8, 6)T with π2k = 1, 1 ≤ k ≤ 3. Hence, from (1.44a),(1.44b) we obtain
E2 = −1 , e2 = −7/3 , J2 = 23/3 .
Set s = 2 and begin Iteration 3.
Iteration 3: In Step 1, the solution of the minimization problem
minimize   θ
subject to θ ≥ 7/3 − x ,
           θ ≥ x − 7/3 ,
           0 ≤ x ≤ 10 , θ ∈ R
is x3 = 7/3, θ3 = 0.
Step 2 does not result in a feasibility cut, since x3 is feasible. In Step 3, the solution of (1.43) gives y = (4/3, 1/3, 5/3)T with π3k = 1, 1 ≤ k ≤ 3. The equations (1.44a),(1.44b) imply
E3 = −1/3 , e3 = 1/3 , J3 = 10/9 .
Set s = 3 and begin Iteration 4.
Iteration 4: In Step 1, the solution of the minimization problem
minimize   θ
subject to θ ≥ 7/3 − x ,
           θ ≥ x − 7/3 ,
           θ ≥ x/3 + 1/3 ,
           0 ≤ x ≤ 10 , θ ∈ R
is x4 = 3/2, θ4 = 5/6.
Step 2 does not result in a feasibility cut, since x4 is feasible. In Step 3, the solution of (1.43) results in y = (1/2, 1/2, 5/2)T with π4k = 1, 1 ≤ k ≤ 3. The equations (1.44a),(1.44b) imply
E4 = 1/3 , e4 = 5/3 , J4 = 7/6 .
Set s = 4 and begin Iteration 5.
Iteration 5: In Step 1, the solution of the minimization problem
minimize   θ
subject to θ ≥ 7/3 − x ,
           θ ≥ x − 7/3 ,
           θ ≥ x/3 + 1/3 ,
           θ ≥ 5/3 − x/3 ,
           0 ≤ x ≤ 10 , θ ∈ R
is x5 = 2, θ5 = 1.
Step 2 does not result in a feasibility cut, since x5 is feasible. In Step 3, the solution of (1.43) is y = (1, 0, 2)T with π5k = 1, 1 ≤ k ≤ 3. The equations (1.44a),(1.44b) yield
E5 = 1/3 , e5 = 5/3 , J5 = 1 .
Since J5 = θ5, we have found the optimal solution.
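The five iterations above can be reproduced with a short script. This is only a sketch (not part of the original notes): the second-stage problems (1.43) are solved in closed form, since all multipliers πνk equal 1 for this example, and the master problem (1.40) is solved with `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

xi = np.array([1.0, 2.0, 4.0])   # realizations of xi, probability 1/3 each
p = np.full(3, 1.0 / 3.0)

def optimality_cut(x):
    # (1.44a)/(1.44b) with pi_k = 1: E = sum_k p_k T_k, e = sum_k p_k h_k
    T = np.where(x <= xi, 1.0, -1.0)
    h = np.where(x <= xi, xi, -xi)
    return p @ T, p @ h

x, theta, cuts = 0.0, -np.inf, []
for nu in range(100):
    E, e = optimality_cut(x)
    J = e - E * x                    # J_nu = Q(x_nu)
    if theta >= J - 1e-9:            # stopping test of Step 3
        break
    cuts.append((E, e))
    # Step 1: minimize theta subject to theta >= e_l - E_l x, 0 <= x <= 10
    A_ub = [[-El, -1.0] for El, el in cuts]
    b_ub = [-el for El, el in cuts]
    res = linprog([0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, 10.0), (None, None)])
    x, theta = res.x

print(x, theta)   # x = 2, theta = Q(x) = 1 after five iterations
```

Note the explicit bound `(None, None)` for θ: `linprog` otherwise defaults to nonnegative variables, which would silently alter the master problem.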
Example (Feasibility cuts): We consider the minimization problem
minimize 3x1 + 2x2 + Eξ(15y1 + 12y2)(1.47)
subject to 3y1 + 2y2 ≤ x1 ,
2y1 + 5y2 ≤ x2 ,
0.8ξ1 ≤ y1 ≤ ξ1 ,
0.8ξ2 ≤ y2 ≤ ξ2 ,
x ≥ 0 , y ≥ 0 ,
where ξ = (ξ1, ξ2)T with ξ1 ∈ {4, 6}, ξ2 ∈ {4, 8}, independently with probability 1/2 each.
This example represents an investment decision in two resources x1 and x2 which are needed in the second stage decision to cover 80 % of the demand. Note that
c = (3, 2)T ,   pk = 1/2 , 1 ≤ k ≤ 2 ,   q = (15, 12)T ,

W = ( 3  2        T = ( −1   0
      2  5 ) ,           0  −1 ) .
Consider the realization ξ = (6, 8)T. Set r = s = 0, Dr = 0, dr = 0, Es = 0, es = 0, and ν = 1, x1 = (0, 0)T, θ1 = −∞, and begin the iteration with Step 2, which results in a first feasibility cut
3x1 + x2 ≥ 123.2 .
The associated first-stage solution provided by Step 1 is
x1 = (41.067, 0)T .
The following Step 2 gives rise to the feasibility cut
x2 ≥ 22.4 .
Going back to Step 1 and computing the associated first-stage solutiongives
x2 = (33.6, 22.4)T .
Step 2 results in a third feasibility cut
x2 ≥ 41.6
with the associated first-stage solution
x3 = (27.2, 41.6)T
which guarantees feasible second-stage decisions.
Remark: This example illustrates that the formal application of the feasibility cuts can lead to an inefficient procedure. A closer look at the problem at hand shows that in case ξ1 = 6 and ξ2 = 8, feasibility requires
x1 ≥ 27.2 , x2 ≥ 41.6 .
In other words, a reasonable initial program is given by
minimize 3x1 + 2x2 + Q(x) ,
subject to x1 ≥ 27.2 ,
x2 ≥ 41.6 ,
which guarantees second-stage feasibility.
Another particular case where second-stage feasibility is guaranteed (and thus Step 2 of the L-shaped method can be skipped) is a two-stage stochastic linear program with complete recourse, i.e., for every t ∈ Rm2 there exists y ≥ 0 such that Wy = t.
For further specific cases where the structure of the program simplifies second-stage feasibility, we refer to [1].
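The bounds x1 ≥ 27.2 and x2 ≥ 41.6 quoted in the remark can be verified with two small LPs: for the worst-case demand ξ = (6, 8), the smallest adequate capacities are the minima of 3y1 + 2y2 and 2y1 + 5y2 over the demand window 0.8ξ ≤ y ≤ ξ (a sketch using `scipy.optimize.linprog`):

```python
from scipy.optimize import linprog

xi1, xi2 = 6.0, 8.0                            # worst-case demand scenario
bounds = [(0.8 * xi1, xi1), (0.8 * xi2, xi2)]  # 0.8*xi <= y <= xi

# Smallest x1 such that 3y1 + 2y2 <= x1 admits a feasible y:
x1_min = linprog([3.0, 2.0], bounds=bounds).fun
# Smallest x2 such that 2y1 + 5y2 <= x2 admits a feasible y:
x2_min = linprog([2.0, 5.0], bounds=bounds).fun
print(round(x1_min, 1), round(x2_min, 1))   # 27.2 41.6
```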
Example: We illustrate the implementation of the L-shaped method for the following example:
n1 = 1 , n2 = 6 , m1 = 0 , m2 = 3 ,
c = 0 , W =
1 −1 −1 −1 0 00 1 0 0 1 00 0 1 0 0 1
.
The random variable ξ has K = 2 independent realizations with prob-ability 1/2 each which are given by
ξ1 = (q1, h1, T1)T , ξ2 = (q2, h2, T2)
T ,
where
q1 = (1, 0, 0, 0, 0, 0)T , q2 = (3/2, 0, 2/7, 1, 0, 0)T ,
h1 = (−1, 2, 7)T , h2 = (0, 2, 7)T ,
T1 = (1, 0, 0)T , T2 = T1 .
We note that for ξ1, the recourse function is given by
Q1(x) = { −x − 1 , x ≤ −1
          0 ,      x ≥ −1 } ,
whereas for ξ2 we obtain
Q2(x) = { −1.5x ,        x ≤ 0
          0 ,            0 ≤ x ≤ 2
          (2/7)(x − 2) , 2 ≤ x ≤ 9
          x − 7 ,        x ≥ 9 } .
We further impose the constraints
−20 ≤ x ≤ +20 .
A closer look at the problem reveals that x = 0 is an optimal solution.
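The optimality of x = 0 can be checked numerically: since c = 0, the first-stage objective equals the expected recourse Q(x) = (Q1(x) + Q2(x))/2, which is nonnegative everywhere and vanishes on [0, 2]. A sketch in Python, transcribing the piecewise formulas from the text:

```python
# Recourse functions Q1, Q2 of the example and the expected recourse
# Q(x) = (Q1(x) + Q2(x)) / 2.  With c = 0 the overall objective is Q itself,
# so any x with Q(x) = 0 (e.g. x = 0) is optimal.

def Q1(x):
    return -x - 1.0 if x <= -1 else 0.0

def Q2(x):
    if x <= 0:
        return -1.5 * x
    if x <= 2:
        return 0.0
    if x <= 9:
        return 2.0 / 7.0 * (x - 2)
    return x - 7.0

def Q(x):
    return 0.5 * (Q1(x) + Q2(x))

print(Q(0.0))    # 0.0: x = 0 attains the minimum
print(Q(-2.0))   # 2.0: (Q1(-2) + Q2(-2)) / 2 = (1 + 3) / 2
```

Both Q1 and Q2 are nonnegative, so Q ≥ 0 on the whole constraint interval [−20, 20], confirming the claim.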
Step 0 (Initialization): We choose x0 ≤ −1.
Iteration 1: We obtain
x^1 = −2 , θ^1 is omitted , New cut: θ ≥ −0.5 − 1.25x .
Iteration 2: The second iteration yields
x^2 = 20 , θ^2 = −25.5 , New cut: θ ≥ −3.5 + 0.5x .
Iteration 3: The computations result in
x^3 = 12/7 , θ^3 = −37/14 , New cut: θ ≥ 0 .
Optimization Theory II, Spring 2007 ; Chapter 1 25
Iteration 4: We get
x^4 ∈ [−2/5, 7] , θ^4 = 0 .
If we choose x^4 ∈ [0, 2], iteration 4 terminates. Otherwise, more iterations are needed.
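The aggregated optimality cuts of iterations 1 and 2 can be reproduced by linearizing each scenario recourse function at the current iterate and averaging with the probabilities p_k = 1/2. A sketch with the values and subgradients read off the piecewise formulas for Q1 and Q2 in the text (function names are ours):

```python
# Single-cut (aggregated) optimality cut of the L-shaped method:
# theta >= sum_k p_k * (Q_k(x0) + s_k * (x - x0)),  s_k a subgradient of Q_k.

def linearize_Q1(x):
    """(value, subgradient) of Q1 at x."""
    return (-x - 1.0, -1.0) if x <= -1 else (0.0, 0.0)

def linearize_Q2(x):
    if x <= 0:
        return (-1.5 * x, -1.5)
    if x <= 2:
        return (0.0, 0.0)
    if x <= 9:
        return (2.0 / 7.0 * (x - 2), 2.0 / 7.0)
    return (x - 7.0, 1.0)

def single_cut(x):
    """Coefficients (a, b) of the aggregated cut theta >= a + b*x built at x."""
    a = b = 0.0
    for lin in (linearize_Q1, linearize_Q2):
        val, slope = lin(x)
        a += 0.5 * (val - slope * x)   # p_k = 1/2 for both realizations
        b += 0.5 * slope
    return a, b

print(single_cut(-2))   # (-0.5, -1.25): the cut theta >= -0.5 - 1.25x of iteration 1
print(single_cut(20))   # (-3.5, 0.5):   the cut theta >= -3.5 + 0.5x of iteration 2
```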
1.3.3 Finite termination property
In this section, we prove that the L-shaped method terminates after a finite number of steps, provided ξ is a finite random variable. In particular, we show that
• a finite number of feasibility cuts (1.40b) is required either to provide a feasible vector within the second-stage feasible set K2 = {x ∈ R^n1 | Q(x) < ∞} (cf. (1.16)) or to detect infeasibility of the problem,
• a finite number of optimality cuts (1.40c) is needed to end up with an optimal solution of (1.37),(1.38), provided there exist feasible points x ∈ K2.
We first recall the definition of Q(x) in (1.37):
Q(x) = E_ω Q(x, ξ(ω)) ,
Q(x, ξ(ω)) = min_y { q(ω)^T y | Wy = h(ω) − T(ω)x , y ≥ 0 } .
Lemma 1.11 (Representation of subgradients)
Let π_k^ν, k ∈ {1, · · · , K}, be an optimal multiplier associated with an optimal solution x^ν of the minimization problem (1.43). Then, there holds
(1.48) −(π_k^ν)^T Tk ∈ ∂Q(x^ν, ξk) .
Proof: The proof is left as an exercise. ¤
Proposition 1.12 (Finite termination of Step 2)
After a finite number of sub-steps, Step 2 (feasibility cuts) of the L-shaped method either terminates with feasible points x ∈ K2 or detects infeasibility of (1.37),(1.38).
Proof: We introduce the set
(1.49) pos W := { t ∈ R^m2 | t = Wy , y ≥ 0 }
and note that
(1.50) x ∈ K2 ⇐⇒ hk − Tk x ∈ pos W , 1 ≤ k ≤ K .
Given a first-stage solution x^ν, in Step 2 the minimization problem (1.41) tests whether hk − Tk x^ν ∈ pos W for all 1 ≤ k ≤ K, or if hk − Tk x^ν ∉ pos W for some k ∈ {1, · · · , K}. In the latter case, there exists a hyperplane separating hk − Tk x^ν and pos W, i.e., there exists σ ∈ R^m2 such that
σ^T t ≤ 0 , t ∈ pos W , and σ^T (hk − Tk x^ν) > 0 .
We remark that σ can be chosen as a multiplier σ_k^ν associated with (1.41), since
(σ_k^ν)^T W ≤ 0 and (σ_k^ν)^T (hk − Tk x^ν) > 0 .
On the other hand, we have
(1.51) x ∈ K2 =⇒ (σ_k^ν)^T (hk − Tk x) ≤ 0 , 1 ≤ k ≤ K .
Due to the finiteness of ξ, there is only a finite number of optimal bases of problem (1.41) and hence, there are only finitely many constraints (σ_k^ν)^T (hk − Tk x) ≤ 0. Consequently, after a finite number of sub-steps we either find feasible points x ∈ K2 or detect infeasibility. ¤
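The characterization (1.50) can be made concrete for the 2×2 matrix W of the investment example (1.47): W is nonsingular, so t = Wy has the unique solution y = W^{-1}t, and t ∈ pos W exactly when this y is componentwise nonnegative. A small sketch (the second test value is hypothetical):

```python
# Membership test  t in pos W = { W y : y >= 0 }  for the invertible matrix
# W = (3 2; 2 5) of the investment example: solve W y = t by Cramer's rule
# and check nonnegativity of the unique solution.

def in_pos_W(t1, t2):
    det = 3 * 5 - 2 * 2                 # det W = 11
    y1 = (5 * t1 - 2 * t2) / det        # first component of W^{-1} t
    y2 = (-2 * t1 + 3 * t2) / det       # second component of W^{-1} t
    return y1 >= 0 and y2 >= 0

print(in_pos_W(27.2, 41.6))   # True: generated by y = (4.8, 6.4) >= 0
print(in_pos_W(1.0, -1.0))    # False: the solution has y2 < 0
```

For a general, possibly singular or rectangular W this test becomes the phase-one linear program (1.41), whose dual multiplier supplies the separating hyperplane of the proof.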
Proposition 1.13 (Finite termination of Step 3)
Assume feasibility of (1.37),(1.38). Then, after a finite number of sub-steps, Step 3 (optimality cuts) of the L-shaped method terminates with an optimal solution.
Proof: We first observe that (1.37),(1.38) can be equivalently stated as
minimize c^T x + Θ ,   (1.52)
subject to Q(x) ≤ Θ ,
x ∈ K1 ∩ K2 ,
where K1 is the first-stage feasible set K1 := { x ∈ R^n1 | Ax = b } (cf. (1.16)).
We further note that in Step 3 of the L-shaped method we compute the solution of (1.43) along with an associated multiplier π_k^ν. We know from the duality theory of linear programming (cf., e.g., Theorem 1.3(ii) in [4]) that
(1.53) Q(x^ν, ξk) = (π_k^ν)^T (hk − Tk x^ν) , 1 ≤ k ≤ K .
Moreover, taking
v ∈ ∂Q(x^ν, ξk) ⇐⇒ v^T (x − x^ν) + Q(x^ν, ξk) ≤ Q(x, ξk) , x ∈ R^n1 ,
into account, it follows from (1.48) in Lemma 1.11 that
(1.54) (π_k^ν)^T Tk (x^ν − x) + Q(x^ν, ξk) ≤ Q(x, ξk) .
Using (1.53) in (1.54), we find
(1.55) Q(x, ξk) ≥ (π_k^ν)^T (hk − Tk x) .
Denoting by T, h, and π^ν the random variables with realizations Tk, hk, and π_k^ν, 1 ≤ k ≤ K, respectively, and taking expectations in (1.53) and (1.55), we get
Q(x^ν) = E (π^ν)^T (h − T x^ν) = Σ_{k=1}^K pk (π_k^ν)^T (hk − Tk x^ν)
and
Q(x) ≥ E (π^ν)^T (h − T x) = Σ_{k=1}^K pk (π_k^ν)^T (hk − Tk x) .
It follows that a pair (x, Θ) which is feasible for (1.52) satisfies
Θ ≥ Q(x) ≥ E (π^ν)^T (h − T x) ,
which corresponds to (1.40d).
On the other hand, if a pair (x^ν, Θ^ν) is optimal for (1.52), then
Q(x^ν) = Θ^ν = E (π^ν)^T (h − T x^ν) .
Consequently, at each sub-step of Step 3 we either find Θ^ν ≥ Q(x^ν), which means that we have found an optimal solution, or we have Θ^ν < Q(x^ν), which means that we have to continue with a new first-stage solution x^{ν+1} and associated multipliers π_k^{ν+1}, 1 ≤ k ≤ K, for (1.43). Since there is only a finite number of optimal bases associated with (1.43), there can be only a finite number of different combinations of the multipliers and hence, Step 3 must terminate after a finite number of sub-steps with an optimal solution for (1.37),(1.38). ¤
Unifying the results of Proposition 1.12 and Proposition 1.13, we arrive at the following finite convergence result:
Theorem 1.14 (Finite convergence of the L-shaped method)
Assume that ξ is a finite random variable. Then, after a finite number of steps the L-shaped method either terminates with an optimal solution or proves infeasibility of (1.37),(1.38).
1.3.4 The multicut version of the L-shaped method
As an alternative to Step 3 of the L-shaped method, where optimality cuts are computed with respect to the K realizations of the second-stage program and then aggregated into one cut (cf. (1.44a),(1.44b)), one can impose multiple cuts, which leads to the following multicut L-shaped algorithm:
Multicut L-shaped algorithm
Step 0: Set r = ν = 0 and s(k) = 0, 1 ≤ k ≤ K.
Step 1: Set ν = ν + 1 and solve the linear program
minimize z(x) := c^T x + Σ_{k=1}^K θk ,   (1.56a)
subject to Ax = b ,   (1.56b)
Dℓ x ≥ dℓ , ℓ = 1, · · · , r ,   (1.56c)
E_{ℓ(k)} x + θk ≥ e_{ℓ(k)} , ℓ(k) = 1, · · · , s(k) , 1 ≤ k ≤ K ,   (1.56d)
x ≥ 0 .   (1.56e)
Let (x^ν, θ_1^ν, · · · , θ_K^ν) be an optimal solution of (1.56a)-(1.56e). In case there are no constraints (1.56d) for some k ∈ {1, · · · , K}, we set θ_k^ν = −∞.
Step 2 (Feasibility cuts): Step 2 is performed as in Step 2 of the L-shaped method.
Step 3 (Optimality cuts): For 1 ≤ k ≤ K solve the linear programs (1.43) and denote by π_k^ν the optimal multiplier associated with the k-th problem. Check whether
(1.57) θ_k^ν < pk (π_k^ν)^T (hk − Tk x^ν) .
If (1.57) is satisfied, define
E_{s(k)+1} = pk (π_k^ν)^T Tk ,   (1.58a)
e_{s(k)+1} = pk (π_k^ν)^T hk ,   (1.58b)
set s(k) = s(k) + 1, and return to Step 1.
If (1.57) does not hold true for any 1 ≤ k ≤ K, stop the algorithm: x^ν is an optimal solution.
Example: We consider the same example as in Chapter 1.3.2:
Step 0 (Initialization): We choose x0 ≤ −1.
Iteration 1: We compute
x^1 = −2 , θ_1^1 and θ_2^1 are omitted ,
New cuts: θ1 ≥ −0.5 − 0.5x ,
θ2 ≥ −(3/4)x .
Iteration 2: We obtain
x^2 = 20 , θ_1^2 = −10.5 , θ_2^2 = −15 ,
New cuts: θ1 ≥ 0 ,
θ2 ≥ −3.5 + 0.5x .
Iteration 3: The computations yield
x^3 = 2.8 , θ_1^3 = 0 , θ_2^3 = −2.1 ,
New cut: θ2 ≥ (1/7)(x − 2) .
Iteration 4: We get
x^4 = 0.32 , θ_1^4 = 0 , θ_2^4 = −0.24 ,
New cut: θ2 ≥ 0 .
Iteration 5: This iteration reveals
x^5 = 0 , θ_1^5 = 0 , θ_2^5 = 0 .
The algorithm terminates with x^5 = 0 as an optimal solution.
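The disaggregated cuts of these iterations can be reproduced by linearizing each scenario recourse function separately, weighted by its probability p_k = 1/2. A sketch using the piecewise formulas for Q1 and Q2 from the text (function names are ours):

```python
# Multicut variant: one optimality cut per scenario,
# theta_k >= p_k * (Q_k(x0) + s_k * (x - x0)),  s_k a subgradient of Q_k at x0.

def linearize(k, x):
    """(value, subgradient) of Q_k at x for the two scenarios of the example."""
    if k == 1:
        return (-x - 1.0, -1.0) if x <= -1 else (0.0, 0.0)
    if x <= 0:
        return (-1.5 * x, -1.5)
    if x <= 2:
        return (0.0, 0.0)
    if x <= 9:
        return (2.0 / 7.0 * (x - 2), 2.0 / 7.0)
    return (x - 7.0, 1.0)

def multicut(k, x, p=0.5):
    """Coefficients (a, b) of the cut theta_k >= a + b*x generated at x."""
    val, slope = linearize(k, x)
    return p * (val - slope * x), p * slope

print(multicut(1, -2))   # (-0.5, -0.5):  theta_1 >= -0.5 - 0.5x  (iteration 1)
print(multicut(2, -2))   # (0.0, -0.75):  theta_2 >= -(3/4)x      (iteration 1)
print(multicut(2, 20))   # (-3.5, 0.5):   theta_2 >= -3.5 + 0.5x  (iteration 2)
```

Compared with the aggregated single cut, the multicut version carries more information per iteration at the price of a larger master program.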
1.3.5 Inner linearization methods
We consider the dual linear program with respect to (1.40a)-(1.40e) in Steps 1-3 of the L-shaped method: Find (ρ, σ, π) such that
maximize ζ = ρ^T b + Σ_{ℓ=1}^r σℓ dℓ + Σ_{ℓ=1}^s πℓ eℓ ,   (1.59a)
subject to ρ^T A + Σ_{ℓ=1}^r σℓ Dℓ + Σ_{ℓ=1}^s πℓ Eℓ ≤ c^T ,   (1.59b)
Σ_{ℓ=1}^s πℓ = 1 , σℓ ≥ 0 , 0 ≤ ℓ ≤ r , πℓ ≥ 0 , 0 ≤ ℓ ≤ s .   (1.59c)
The dual program (1.59a)-(1.59c) involves
• multipliers σℓ, 0 ≤ ℓ ≤ r, on extreme rays (directions of recession) of the duals of the subproblems,
• multipliers πℓ, 0 ≤ ℓ ≤ s, on the expectations of extreme points of the duals of the subproblems.
Indeed, let us consider the following dual linear program with respect to (1.43a)-(1.43c) in Step 3 (optimality cuts) of the L-shaped method:
maximize w = π^T (hk − Tk x^ν) ,   (1.60a)
subject to π^T W ≤ q^T .   (1.60b)
From the duality theory of linear programming we know (cf. Theorem 1.3 and Theorem 1.4 in [4]):
• If (1.60a)-(1.60b) is unbounded for some k, then there exists a multiplier σ^ν such that
(σ^ν)^T W ≤ 0 , (σ^ν)^T (hk − Tk x^ν) > 0 ,
and the primal problem (1.41a)-(1.41c) does not have a feasible solution.
• If (1.60a)-(1.60b) is bounded for all k, then (1.60a)-(1.60b) is feasible and (1.41a)-(1.41c) has an optimal (primal) solution.
In other words, Step 2 of the L-shaped method is equivalent to checking whether (1.60a)-(1.60b) is unbounded for any k. If so, D_{r+1} and d_{r+1} are computed according to (1.42a) and (1.42b) of the L-shaped method and added to the constraints (feasibility cuts).
Next, consider the case when (1.60a)-(1.60b) has a finite optimal value for all k, i.e., (1.41a)-(1.41c) is solvable for all k. In Step 3 of the L-shaped method, we then compute E_{s+1} and e_{s+1} according to (1.44a) and (1.44b) and add them to the constraints (optimality cuts). In the dual approach (1.59a)-(1.59c) we proceed in the same way.
Conclusion: Steps 1-3 of the L-shaped method are equivalent to solving (1.59a)-(1.59c) as a master program and the maximization problems (1.60a)-(1.60b) as subproblems.
This leads to the following so-called inner linearization algorithm:
Step 0: Set r = s = ν = 0.
Step 1: Set ν = ν + 1. Compute (ρ^ν, σ^ν, π^ν) as the solution of (1.59a)-(1.59c) and (x^ν, θ^ν) as the associated dual solution.
Step 2: For 1 ≤ k ≤ K solve the subproblems (1.60a)-(1.60b).
If all subproblems are solvable, go to Step 3.
If an infeasible subproblem (1.60a)-(1.60b) is found, stop the algorithm (the stochastic program is ill-posed).
If an unbounded solution with extreme ray σ^ν is found for some k, compute
D_{r+1} := (σ^ν)^T Tk ,   (1.61a)
d_{r+1} := (σ^ν)^T hk ,   (1.61b)
set r = r + 1, and return to Step 1.
Step 3: Compute E_{s+1} and e_{s+1} according to
E_{s+1} := Σ_{k=1}^K pk (π_k^ν)^T Tk ,   (1.62a)
e_{s+1} := Σ_{k=1}^K pk (π_k^ν)^T hk .   (1.62b)
If
(1.63) e_{s+1} − E_{s+1} x^ν − θ^ν ≤ 0 ,
then stop: (ρ^ν, σ^ν, π^ν) and (x^ν, θ^ν) are optimal solutions.
On the other hand, if
(1.64) e_{s+1} − E_{s+1} x^ν − θ^ν > 0 ,
set s = s + 1 and return to Step 1.
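The stopping test (1.63)/(1.64) is a one-line computation: the candidate optimality cut is added only while it is violated at the current pair (x^ν, θ^ν). A minimal sketch (all numerical values are hypothetical and only illustrate the sign test):

```python
# Stopping test (1.63)/(1.64) of the inner linearization algorithm:
# the cut (E_row, e) is added only if  e - E_row . x - theta > 0.

def cut_violated(E_row, e, x, theta):
    """True iff (1.64) holds, i.e. the new cut cuts off (x, theta)."""
    return e - sum(a * b for a, b in zip(E_row, x)) - theta > 0

# Hypothetical data for illustration only:
print(cut_violated([1.0, 2.0], e=5.0, x=[1.0, 1.0], theta=1.0))   # True:  5 - 3 - 1 > 0
print(cut_violated([1.0, 2.0], e=5.0, x=[1.0, 1.0], theta=2.0))   # False: 5 - 3 - 2 = 0
```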
Remark: The name inner linearization algorithm stems from the fact that (1.59a)-(1.59c) can be interpreted as an inner linearization of the dual program of the original L-shaped method in the sense of the Dantzig-Wolfe decomposition of large-scale linear programs [3]. Since we solve dual problems instead of primal ones, finite convergence follows directly from the corresponding property of the original L-shaped method.
Remark: With regard to the dimensionality of the problems, in many applied cases we have n1 ≫ m1. Then, the primal L-shaped method has basis matrices of order at most m1 + m2, compared to basis matrices of order n1 + n1 for the dual version. Therefore, the original (primal) L-shaped method is usually preferred.
The inner linearization method can be applied directly to the primal problem (1.14), if the technology matrix T is deterministic. In this case, (1.14) can be replaced by
minimize z = c^T x + Ψ(χ) ,   (1.65)
subject to Ax = b ,
Tx − χ = 0 ,
x ≥ 0 ,
where
Ψ(χ) := E_ξ ψ(χ, ξ(ω)) ,   (1.66)
ψ(χ, ξ(ω)) := min_{y≥0} { q(ω)^T y | Wy = h(ω) − χ } .
The idea is to construct an inner linearization of the substitute Ψ(χ) of the recourse function using the generalized programming approach from [2], replacing Ψ(χ) by the convex hull of points Ψ(χ^ℓ) computed within the iterations of the algorithm. In particular, each iteration generates an extreme point of a region of linearity for Ψ. We define the recession function Ψ_0^+(ζ) as follows:
(1.67) Ψ_0^+(ζ) := lim_{α→∞} ( Ψ(χ + αζ) − Ψ(χ) ) / α .
Generalized programming algorithm for two-stage stochastic linear programs:
Step 0: Set s = r = ν = 0.
Step 1: Set ν = ν + 1 and solve the master linear program
minimize z^ν = c^T x + Σ_{i=1}^r μi Ψ_0^+(ζ^i) + Σ_{i=1}^s λi Ψ(χ^i) ,   (1.68a)
subject to Ax = b ,   (1.68b)
Tx − Σ_{i=1}^r μi ζ^i − Σ_{i=1}^s λi χ^i = 0 ,   (1.68c)
Σ_{i=1}^s λi = 1 , λi ≥ 0 , 1 ≤ i ≤ s ,   (1.68d)
x ≥ 0 , μi ≥ 0 , 1 ≤ i ≤ r .   (1.68e)
If (1.68a)-(1.68e) is infeasible or unbounded, stop the algorithm. Otherwise, compute the solution (x^ν, μ^ν, λ^ν) and the dual solution (σ^ν, π^ν, ρ^ν).
Step 2: Solve the subproblem
(1.69) minimize Ψ(χ) + (π^ν)^T χ − ρ^ν over χ .
If (1.69) has a solution χ^{s+1}, go to Step 3.
On the other hand, if (1.69) is unbounded, there exists a recession direction ζ^{r+1} such that for some χ
Ψ(χ + αζ^{r+1}) + (π^ν)^T (χ + αζ^{r+1}) → −∞ as α → +∞ .
In this case, we define
(1.70) Ψ_0^+(ζ^{r+1}) := lim_{α→+∞} ( Ψ(χ + αζ^{r+1}) − Ψ(χ) ) / α .
We set r = r + 1 and return to Step 1.
Step 3: Check whether
(1.71) Ψ(χ^{s+1}) + (π^ν)^T χ^{s+1} − ρ^ν ≥ 0 .
If (1.71) holds true, stop the algorithm: (x^ν, μ^ν, λ^ν) is an optimal solution of (1.65).
Otherwise, set s = s + 1 and return to Step 1.
Remark: In the case of a two-stage stochastic linear program, the subproblem (1.69) can be reformulated as
minimize Σ_{k=1}^K pk qk^T yk + (π^ν)^T χ − ρ^ν ,   (1.72a)
subject to W yk + χ = hk , 1 ≤ k ≤ K ,   (1.72b)
yk ≥ 0 , 1 ≤ k ≤ K .   (1.72c)
In general, for k ∈ {1, · · · , K} the subproblem (1.72) cannot be further separated into different subproblems, so that the original L-shaped method should be preferred. However, for problems with simple recourse, the function Ψ(χ) is separable into components for each k, and (1.72) can be split into K independent subproblems.
Finite termination of the generalized programming algorithm will be established by means of the following result:
Proposition 1.15 (Characterization of extreme points)
Every optimal extreme point (y_1^*, · · · , y_K^*, χ^*) of the feasible region of (1.72) corresponds to an extreme point χ^* of
(1.73) { χ | Ψ(χ) = (π^*)^T χ + θ } ,
where π^* = Σ_{k=1}^K π_k^* and each π_k^*, 1 ≤ k ≤ K, is an extreme point of
(1.74) { πk | πk^T W ≤ qk^T } .
Proof. Let (y_1^*, · · · , y_K^*, χ^*) be an optimal extreme point of (1.72). Then, we have
(1.75) qk^T y_k^* ≤ qk^T yk for all yk with W yk = hk − χ^* .
We claim that
(1.76) y_k^* is also an extreme point of { yk | W yk = hk − χ^* , yk ≥ 0 } .
Indeed, if (1.76) were not true, we could write y_k^* as the arithmetic mean of two distinct feasible points y_k^1 and y_k^2.
It follows from (1.75) and (1.76) that y_k^* has a complementary dual solution π_k^*, i.e.,
(1.77) π_k^* is an extreme point of { πk | πk^T W ≤ qk^T } and (qk^T − (π_k^*)^T W) y_k^* = 0 .
The proof of the assertion is now provided by a contradiction argument: We assume that (y_1^*, · · · , y_K^*, χ^*) is not an extreme point of the linearity region
(1.78) Ψ(χ) = (π^*)^T χ + θ , θ = Ψ(χ^*) − (π^*)^T χ^* , π^* = Σ_{k=1}^K π_k^* .
Then, χ^* must be a convex combination of two points χ^i, 1 ≤ i ≤ 2, i.e., χ^* = λχ^1 + (1 − λ)χ^2, 0 < λ < 1, where
Ψ(χ^i) = (π^*)^T χ^i + θ , 1 ≤ i ≤ 2 .
We also claim that
(1.79) Ψ(χ^i) = Σ_{k=1}^K qk^T y_k^i , where qk^T y_k^i = (π_k^*)^T (hk − χ^i) , 1 ≤ i ≤ 2 .
Indeed, if (1.79) did not hold true, due to the feasibility of π_k^* we would have
qk^T y_k^i > (π_k^*)^T (hk − χ^i) , 1 ≤ i ≤ 2 ,
which would imply
Ψ(χ^i) > (π^*)^T χ^i + θ .
We remark that (1.79) also implies
(1.80) ((π_k^*)^T W − qk^T)(λ y_k^1 + (1 − λ) y_k^2) = 0 ,
and hence, due to the fact that y_k^* is an extreme point of the feasible set of the k-th recourse problem,
(1.81) y_k^* = λ y_k^1 + (1 − λ) y_k^2 .
It follows from (1.81) that
(y_1^*, · · · , y_K^*, χ^*) = λ (y_1^1, · · · , y_K^1, χ^1) + (1 − λ)(y_1^2, · · · , y_K^2, χ^2) ,
which contradicts the assumption that (y_1^*, · · · , y_K^*, χ^*) is an extreme point. ¤
Proposition 1.16 (Characterization of extreme rays)
Any extreme ray associated with subproblem (1.72) is an extreme ray of a region of linearity of Ψ(χ).
Proof. The proof is left as an exercise. ¤
Theorem 1.17 (Finite convergence of the generalized programming algorithm)
The application of the generalized programming algorithm to problem (1.65) with subproblem (1.72) converges after a finite number of steps.
Proof. Each solution of subproblem (1.72) generates an extreme point or extreme ray of a new region of linearity. A new extreme ray ζ^{r+1} satisfies
(1.82) Ψ_0^+(ζ^{r+1}) + (π^ν)^T ζ^{r+1} < 0 ,
whereas
(1.83) Ψ_0^+(ζ^i) + (π^ν)^T ζ^i ≥ 0 , 1 ≤ i ≤ r .
As far as a new extreme point χ^{s+1} is concerned, such a point is added to the constraints only if
(1.84) Ψ(χ^{s+1}) + (π^ν)^T χ^{s+1} − ρ^ν < 0 ,
whereas
(1.85) Ψ(χ^i) + (π^ν)^T χ^i − ρ^ν ≥ 0 , 1 ≤ i ≤ s .
Since the number of regions of linearity is finite and each region has a finite number of extreme rays and extreme points, the algorithm must terminate after a finite number of steps. ¤
1.4 Two-stage stochastic nonlinear programs with recourse
In this section, we consider a generalization of stochastic two-stage linear programs with recourse to problems involving nonlinear functions.
Definition 1.7 (Two-stage stochastic nonlinear program with recourse)
Let I ⊂ N be an index set, f^1 : R^n1 → R, g_i^1 : R^n1 → R, 1 ≤ i ≤ m1, and f^2(·, ω) : R^n2 → R, g_i^2(·, ω) : R^n2 → R, 1 ≤ i ≤ m2, t_i^2(·, ω) : R^n1 → R, 1 ≤ i ≤ m2, functions that are continuous for any fixed ω ∈ I and measurable in ω for any fixed first argument. Then, a minimization problem of the form
minimize z = f^1(x) + Q(x) ,   (1.86)
subject to g_i^1(x) ≤ 0 , 1 ≤ i ≤ m̄1 ,
g_i^1(x) = 0 , m̄1 + 1 ≤ i ≤ m1 ,
where Q(x) = E_ω[Q(x, ω)] and
Q(x, ω) = inf f^2(y(ω), ω) ,   (1.87)
subject to t_i^2(x, ω) + g_i^2(y(ω), ω) ≤ 0 , 1 ≤ i ≤ m̄2 ,
t_i^2(x, ω) + g_i^2(y(ω), ω) = 0 , m̄2 + 1 ≤ i ≤ m2 ,
is called a two-stage stochastic nonlinear program with recourse function Q(x). Here, m̄1 ≤ m1 and m̄2 ≤ m2 separate the inequality constraints from the equality constraints.
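Definition 1.7 can be illustrated on a toy instance whose second stage has a closed form. All data below are hypothetical and chosen only to make the structure visible: take f^2(y, ω) = y², a single constraint t^2(x, ω) + g^2(y, ω) ≤ 0 with t^2(x, ω) = ω − x and g^2(y, ω) = −y, i.e. y ≥ ω − x, so that Q(x, ω) = max(0, ω − x)².

```python
# Toy two-stage stochastic nonlinear program in the sense of Definition 1.7:
# second-stage objective f2(y) = y^2, constraint y >= w - x, hence the
# closed-form recourse Q(x, w) = max(0, w - x)^2; Q(x) averages over
# finitely many equally likely realizations w.

SCENARIOS = [1.0, 2.0, 4.0]            # hypothetical realizations of w

def Q_scenario(x, w):
    y = max(0.0, w - x)                 # optimal second-stage decision
    return y * y

def Q(x):
    return sum(Q_scenario(x, w) for w in SCENARIOS) / len(SCENARIOS)

# Q inherits convexity from the data (cf. Theorem 1.18): the midpoint value
# never exceeds the average of the endpoint values.
for a, b in [(-1.0, 3.0), (0.0, 5.0)]:
    assert Q(0.5 * (a + b)) <= 0.5 * (Q(a) + Q(b)) + 1e-12

print(Q(0.0))   # 7.0 = (1 + 4 + 16) / 3
```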
Remark: We note that the assumptions in Definition 1.7 imply that Q(x, ω) is measurable in ω for all x ∈ R^n1 and hence, the recourse function Q(x) is well defined.
Definition 1.8 (First- and second-stage feasible sets)
The set
(1.88) K1 := { x ∈ R^n1 | g_i^1(x) ≤ 0 , 1 ≤ i ≤ m̄1 , g_i^1(x) = 0 , m̄1 + 1 ≤ i ≤ m1 }
is called the first-stage feasible set, whereas the sets
(1.89) K2(ω) := { x ∈ R^n1 | there exists y(ω) such that
t_i^2(x, ω) + g_i^2(y(ω), ω) ≤ 0 , 1 ≤ i ≤ m̄2 ,
t_i^2(x, ω) + g_i^2(y(ω), ω) = 0 , m̄2 + 1 ≤ i ≤ m2 } ,
K2 := { x ∈ R^n1 | Q(x) < ∞ }
are referred to as the second-stage feasible sets.
Remark: We note that unlike the situation in Chapter 1.2 we do not consider fixed recourse, in order to retain as much generality as possible. We also remark that in the case of fixed recourse the optimality conditions depend on the form of the objective and constraint functions anyway.
In order to ensure necessary and sufficient optimality conditions for (1.86),(1.87), we impose the following assumptions on the objective and constraint functions:
A1 (Convexity):
• The functions f^1, g_i^1 : R^n1 → R, 1 ≤ i ≤ m̄1, are convex,
• the functions g_i^1 : R^n1 → R, m̄1 + 1 ≤ i ≤ m1, are affine,
• the functions f^2(·, ω), g_i^2(·, ω) : R^n2 → R, 1 ≤ i ≤ m̄2, are convex for all ω ∈ I,
• the functions g_i^2(·, ω) : R^n2 → R, m̄2 + 1 ≤ i ≤ m2, are affine for all ω ∈ I,
• the functions t_i^2(·, ω) : R^n1 → R, 1 ≤ i ≤ m̄2, are convex for all ω ∈ I,
• the functions t_i^2(·, ω) : R^n1 → R, m̄2 + 1 ≤ i ≤ m2, are affine for all ω ∈ I.
A2 (Slater Condition):
• If Q(x) < ∞, then for almost all ω ∈ I there exists y(ω) such that
t_i^2(x, ω) + g_i^2(y(ω), ω) < 0 , 1 ≤ i ≤ m̄2 ,
and
t_i^2(x, ω) + g_i^2(y(ω), ω) = 0 , m̄2 + 1 ≤ i ≤ m2 .
Theorem 1.18 (Convexity of the recourse function)
Assume that assumptions (A1) and (A2) hold true. Then, the recourse function Q(x, ω) is a convex function of x for all ω ∈ I.
Proof. Suppose that y^i, 1 ≤ i ≤ 2, are solutions of (1.87) with respect to x^i, 1 ≤ i ≤ 2, respectively. We have to show that for λ ∈ [0, 1]
(1.90) y = λ y^1 + (1 − λ) y^2 is feasible for (1.87) with x := λ x^1 + (1 − λ) x^2
as well. This is an easy consequence of the assumptions and left as an exercise. ¤
Theorem 1.19 (Lower semicontinuity of the recourse function)
If the second-stage feasible set K2(ω) is bounded for all ω ∈ I, then the recourse function Q(·, ω) is lower semicontinuous for all ω ∈ I.
Proof. We have to show that for any x̄ ∈ R^n1 and ω ∈ I we have
(1.91) Q(x̄, ω) ≤ lim inf_{x→x̄} Q(x, ω) .
Suppose that (x^ν)_{ν∈N} is a sequence in R^n1 such that x^ν → x̄ as ν → ∞. Without restriction of generality, we may assume that Q(x^ν, ω) < ∞, ν ∈ N, since otherwise we pass to a subsequence N′ with that property.
By our assumptions, we find y^ν(ω), ν ∈ N, such that
t_i^2(x^ν, ω) + g_i^2(y^ν(ω), ω) ≤ 0 , 1 ≤ i ≤ m̄2 ,
t_i^2(x^ν, ω) + g_i^2(y^ν(ω), ω) = 0 , m̄2 + 1 ≤ i ≤ m2 .
The boundedness assumption and the continuity of the functions imply that the sequence (y^ν(ω))_{ν∈N} is bounded, and hence, there exist ȳ(ω) and a subsequence N′ ⊂ N such that y^ν(ω) → ȳ(ω) as N′ ∋ ν → ∞ and
t_i^2(x̄, ω) + g_i^2(ȳ(ω), ω) ≤ 0 , 1 ≤ i ≤ m̄2 ,
t_i^2(x̄, ω) + g_i^2(ȳ(ω), ω) = 0 , m̄2 + 1 ≤ i ≤ m2 .
Consequently, x̄ is feasible and
Q(x̄, ω) ≤ f^2(ȳ(ω), ω) = lim_{N′∋ν→∞} f^2(y^ν(ω), ω) = lim_{N′∋ν→∞} Q(x^ν, ω) ,
which gives the assertion. ¤
Corollary 1.20 (Further properties of the feasible set and the recourse function)
The feasible set K2 is a closed, convex set, and the expected recourse function Q is a lower semicontinuous convex function in x.
Proof. The proof is an immediate consequence of the assumptions andthe previous results. ¤
Remark: In general, it is difficult to decompose the feasible set K2 according to
(1.92) K2 = ∩_{ω∈I} K2(ω) .
A particular example where such a decomposition can be realized is for quadratic objective functionals f^2.
Theorem 1.21 (Optimality conditions)
Suppose that there exists
(1.93) x ∈ ri(dom(f^1)) ∩ ri(dom(Q)) ,
where ri stands for the relative interior, and further assume that
g_i^1(x) < 0 , 1 ≤ i ≤ m̄1 ,   (1.94)
g_i^1(x) = 0 , m̄1 + 1 ≤ i ≤ m1 .   (1.95)
Then, x^* ∈ R^n1 is optimal for (1.86) if and only if x^* ∈ K1 and there exist multipliers μ_i^* ≥ 0, 1 ≤ i ≤ m̄1, and λ_i^*, m̄1 + 1 ≤ i ≤ m1, such that
0 ∈ ∂f^1(x^*) + ∂Q(x^*) + Σ_{i=1}^{m̄1} μ_i^* ∂g_i^1(x^*) + Σ_{i=m̄1+1}^{m1} λ_i^* ∂g_i^1(x^*) ,   (1.96)
μ_i^* g_i^1(x^*) = 0 , 1 ≤ i ≤ m̄1 .   (1.97)
Proof. The assertions can be deduced readily by applying the general theory of nonlinear programming (cf., e.g., Chapter 2 in [4]). ¤
Remark: As far as decompositions of the subgradient ∂Q(x) into subgradients of Q(x, ω) are concerned, in much the same way as in Theorem 1.4 of Chapter 1.2 one can show that
(1.98) ∂Q(x) = E_ω[∂Q(x, ω)] + N(K2, x) ,
where N(K2, x) stands for the normal cone (cf. Chapter 2 in [4]). Note that (1.98) reduces to
∂Q(x) = E_ω[∂Q(x, ω)]
in the case of relatively complete recourse, i.e., if K1 ⊂ K2.
For the derivation of optimality conditions for problems with explicit constraints on non-anticipativity we refer to Theorem 39 in [1].
1.5 Piecewise quadratic form of the L-shaped method
Although a systematic approach like SQP (Sequential Quadratic Programming) for deterministic nonlinear problems does not exist in a stochastic environment, one might be tempted to reduce a general two-stage stochastic nonlinear program to the successive solution of two-stage stochastic quadratic programs. In this section, we consider a piecewise quadratic form of the L-shaped method for such two-stage stochastic quadratic programs, which are of the form:
minimize z(x) = c^T x + (1/2) x^T C x + E_ξ[ min ( q^T(ω) y(ω) + (1/2) y^T(ω) D(ω) y(ω) ) ] ,   (1.99)
subject to Ax = b ,
T(ω)x + W y(ω) = h(ω) ,
x ≥ 0 , y(ω) ≥ 0 .
Here, A ∈ R^{m1×n1}, C ∈ R^{n1×n1}, W ∈ R^{m2×n2} are fixed matrices, and c ∈ R^n1 is a fixed vector. Moreover, D ∈ R^{n2×n2}, T ∈ R^{m2×n1} are random matrices and q ∈ R^n2, h ∈ R^m2 are random vectors.
For a given realization ξ(ω), ω ∈ I, the associated recourse function can be defined according to
(1.100) Q(x, ξ(ω)) := min { q^T(ω) y(ω) + (1/2) y^T(ω) D(ω) y(ω) | T(ω)x + W y(ω) = h(ω) , y(ω) ≥ 0 } ,
which may attain the values −∞ or +∞ if the problem is unbounded or infeasible, respectively.
The expected recourse function is given by
(1.101) Q(x) := E_ξ Q(x, ξ) .
We use the convention +∞ + (−∞) = +∞. The first-stage and second-stage feasible sets K1 and K2 are defined as in the previous section.
We impose the following assumptions on the data of the two-stage stochastic quadratic program:
A3:
• The random vector ξ has a discrete distribution.
A4:
• The matrix C is positive semi-definite and the matrices D(ω) are positive semi-definite for all ω ∈ I,
• The matrix W has full row rank.
Remark: We note that (A3) implies the decomposability of the second-stage feasible set K2, whereas (A4) ensures convexity of the recourse functions.
An important feature of the problem is that the recourse function Q(x) is piecewise quadratic, i.e., the second-stage feasible set K2 can be decomposed into polyhedral sets, called cells, such that Q(x) is quadratic on each cell.
Example: We consider the following two-stage stochastic quadratic program:
minimize z(x) = 2x1 + 3x2 + E_ξ min { −6.5y1 − 7y2 + (1/2)y1² + y1 y2 + (1/2)y2² } ,   (1.102)
subject to 3x1 + 2x2 ≤ 15 ,
x1 + 2x2 ≤ 8 ,
y1 ≤ x1 , y2 ≤ x2 ,
y1 ≤ ξ1 , y2 ≤ ξ2 ,
x1 + x2 ≥ 0 , x ≥ 0 , y ≥ 0 .
We assume that ξ1 ∈ {2, 4, 6} and ξ2 ∈ {1, 3, 5} are independent random variables with probability 1/3 for each realization.
The problem can be interpreted as a portfolio problem where the issue is to minimize quadratic penalties on deviations from a mean value.
In the second stage of the problem, for small values of the assets xi, 1 ≤ i ≤ 2, it is optimal to sell, i.e., yi = xi, 1 ≤ i ≤ 2. Indeed, for
(x1, x2) ∈ C1 := { (x1, x2) | 0 ≤ x1 ≤ 2 , 0 ≤ x2 ≤ 1 }
the optimal solution of the second stage is yi = xi, 1 ≤ i ≤ 2, for all possible values of ξ, whence
Q(x) = Q(x, ξ) = −6.5x1 − 7x2 + (1/2)x1² + x1 x2 + (1/2)x2² , (x1, x2) ∈ C1 .
Definition 1.9 (Finite closed convex complex)
A finite closed convex complex K is a finite collection of closed convex sets Cν, 1 ≤ ν ≤ M, called the cells of K, such that
int(C_{ν1} ∩ C_{ν2}) = ∅ , ν1 ≠ ν2 .
Definition 1.10 (Piecewise convex program)
A piecewise convex program is a convex program of the form
(1.103) inf { z(x) | x ∈ S } ,
where z : R^n → R is convex and S is a closed convex subset of dom(z) with int(S) ≠ ∅.
Definition 1.11 (Piecewise quadratic function)
Consider a piecewise convex program and assume that K is a finite closed convex complex with cells Cν, 1 ≤ ν ≤ M, such that
S ⊆ ∪_{ν=1}^M Cν ,   (1.104a)
either z ≡ −∞, or for each cell Cν, 1 ≤ ν ≤ M, there exists a convex function zν : S → R which is continuously differentiable on an open set containing Cν such that   (1.104b)
z(x) = zν(x) , x ∈ Cν , 1 ≤ ν ≤ M ,
∇zν(x) ∈ ∂z(x) , x ∈ Cν , 1 ≤ ν ≤ M .
A piecewise quadratic function z : S → R is a piecewise convex function where on each cell Cν, 1 ≤ ν ≤ M, the function zν is a quadratic form.
Example: In the above example, both Q(x) and z(x) are piecewise quadratic. In particular, we have
Q(x) = −6.5x1 − 7x2 + (1/2)x1² + x1 x2 + (1/2)x2² , (x1, x2) ∈ C1 ,
z(x) = −4.5x1 − 4x2 + (1/2)x1² + x1 x2 + (1/2)x2² , (x1, x2) ∈ C1 .
The numerical solution of two-stage stochastic piecewise quadratic programs is handled by the following PQP algorithm:
Step 0 (Initialization): Compute a decomposition of the state space S according to (1.104a) into cells Cν, 1 ≤ ν ≤ M, set S_1 = S, and choose x^0 ∈ S_1.
Step 1 (Iteration loop): For μ ≥ 1:
Step 1.1 (Determination of current cell): Determine C_μ such that x^{μ−1} ∈ C_μ and specify the quadratic form z_μ(·) on C_μ according to (1.104b).
Step 1.2 (Solution of minimization subproblems): Compute
x^μ = arg min_{x∈S_μ} z_μ(x) ,   (1.105a)
w^μ = arg min_{x∈C_μ} z_μ(x) .   (1.105b)
If w^μ is the limiting point of a ray on which z_μ(·) is decreasing to −∞, stop the algorithm: the original PQP is unbounded. Otherwise, continue with Step 1.3.
Step 1.3 (Optimality check): Check the optimality condition
(1.106) (∇z_μ(w^μ))^T (x^μ − w^μ) = 0 .
If (1.106) is satisfied, stop the algorithm: w^μ is the optimal solution of the PQP. Otherwise, continue with Step 1.4.
Step 1.4 (Update of state space): Compute
(1.107) S_{μ+1} := S_μ ∩ { x | (∇z_μ(w^μ))^T x ≤ (∇z_μ(w^μ))^T w^μ } ,
set μ := μ + 1, and go to Step 1.1.
Theorem 1.22 (Finite termination of the PQP algorithm)
Under assumptions (A3) and (A4), the PQP algorithm terminates after a finite number of steps with the solution of the two-stage stochastic piecewise quadratic program.
Proof. We refer to [5]. ¤
Remark: Details concerning the appropriate construction of finite closed convex complexes K satisfying (1.104a),(1.104b) can be found in [5].
Example: We illustrate the implementation of the PQP algorithm for the piecewise quadratic program (1.102).
Step 0 (Initialization): We choose the cells Cν, 1 ≤ ν ≤ 8, as shown in Fig. 1.2. We further define
S_1 = S = { x ∈ R² | 3x1 + 2x2 ≤ 15 , x1 + 2x2 ≤ 8 , x1, x2 ≥ 0 } ,
choose
x^0 = (0, 0)^T ,
and set μ = 1.
[Figure: the iterates x^0, . . . , x^5, the cell minimizers w^1, . . . , w^5, and the PQP cuts oc1-oc4 in the (x1, x2)-plane]
Fig. 1.2. Finite closed convex complex and PQP cuts
Iteration 1: The cell containing x^0 is
C1 = { x ∈ R² | 0 ≤ x1 ≤ 2 , 0 ≤ x2 ≤ 1 } ,
and the quadratic function z1 on C1 is
z1(x) = −4.5x1 − 4x2 + (1/2)x1² + x1 x2 + (1/2)x2² .
Solving (1.105a),(1.105b) by means of the KKT conditions results in
x^1 = (4.5, 0)^T , w^1 = (2, 1)^T ∈ C1 ,
whence
∇z1(w^1) = (−1.5, −1)^T , (∇z1(w^1))^T (x^1 − w^1) = −2.75 ≠ 0 ,
and
S_2 = S_1 ∩ { x ∈ R² | −1.5x1 − x2 ≤ −4 } .
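The numbers of iteration 1 are easy to verify: on C1 both partial derivatives of z1 are negative, so the cell minimizer is the upper-right corner w^1 = (2, 1); evaluating the gradient there yields the cut coefficients and the nonzero slack of (1.106). A short check in Python:

```python
# Numerical check of PQP iteration 1 for the program (1.102):
# z1 on C1 = [0,2] x [0,1], its cell minimizer w1, the gradient at w1,
# and the optimality-check slack (1.106) at x1 = (4.5, 0).

def z1(x1, x2):
    return -4.5 * x1 - 4 * x2 + 0.5 * x1 ** 2 + x1 * x2 + 0.5 * x2 ** 2

def grad_z1(x1, x2):
    return (-4.5 + x1 + x2, -4.0 + x1 + x2)

# On C1 we have x1 + x2 <= 3, so both partials are <= -1.5 and -1.0:
# z1 decreases in both coordinates, and w1 = (2, 1) is the cell minimizer.
g = grad_z1(2.0, 1.0)
print(g)                                          # (-1.5, -1.0)
print(g[0] * (4.5 - 2.0) + g[1] * (0.0 - 1.0))    # -2.75, so (1.106) fails
```

The cut added to S_2 is g^T x ≤ g^T w^1, i.e. −1.5x1 − x2 ≤ −4, exactly as stated above.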
Iteration 2: The cell containing x^1 is
C2 = { x ∈ R² | 4 ≤ x1 ≤ 6 , 0 ≤ x2 ≤ 1 , x1 + x2 ≤ 6.5 } .
The quadratic function z2 on C2 is
z2(x) = −29/3 − (1/6)x1 − 2x2 + (1/6)x1² + (1/3)x1 x2 + (1/2)x2² .
The solution of (1.105a),(1.105b) by means of the KKT conditions gives
x^2 = (22/19, 43/19)^T , w^2 = (4, 2/3)^T ∈ C2 ,
whence
∇z2(w^2) = (25/18, 0)^T , (∇z2(w^2))^T (x^2 − w^2) ≠ 0 ,
and
S_3 = S_2 ∩ { x ∈ R² | (25/18)x1 ≤ 100/18 } .
Iteration 3: The cell containing x^2 is
C3 = { x ∈ R² | 0 ≤ x1 ≤ 2 , 1 ≤ x2 ≤ 3 } .
The quadratic function z3 on C3 is
z3(x) = −13/6 − (25/6)x1 − (5/3)x2 + (1/2)x1² + (2/3)x1 x2 + (1/3)x2² .
Via the solution of (1.105a),(1.105b) we obtain
x^3 = (4, 0)^T , w^3 = w^1 = (2, 1)^T ,
whence
S_4 = S_3 ∩ { x ∈ R² | −(3/2)x1 + (1/3)x2 ≤ −8/3 } .
Iteration 4: The cell containing x^3 is
C4 = { x ∈ R² | 2 ≤ x1 ≤ 4 , 0 ≤ x2 ≤ 1 } .
The quadratic function z4 on C4 is
z4(x) = −11/3 − (7/3)x1 − (10/3)x2 + (1/3)x1² + (2/3)x1 x2 + (1/2)x2² .
The solution of (1.105a),(1.105b) yields
x^4 ≈ (2.18, 1.81)^T , w^4 = (2.5, 1)^T ,
whence
S_5 = S_4 ∩ { x ∈ R² | −(2/3)x2 ≤ −2/3 } .
Iteration 5: The cell containing x^4 is
C5 = { x ∈ R² | 2 ≤ x1 ≤ 4 , 1 ≤ x2 ≤ 3 } ∩ S .
The quadratic function z5 on C5 is
z5(x) = −101/18 − (19/9)x1 − (11/9)x2 + (1/3)x1² + (4/9)x1 x2 + (1/3)x2² .
The solution of (1.105a),(1.105b) yields
x^5 = w^5 = (2.5, 1)^T ,
which is an optimal solution of the problem.
References
[1] J.R. Birge and F. Louveaux; Introduction to Stochastic Programming. Springer, Berlin-Heidelberg-New York, 1997
[2] G.B. Dantzig; Linear Programming and Extensions. Princeton University Press, Princeton, NJ, 1963
[3] G.B. Dantzig and P. Wolfe; The decomposition principle for linear programs. Operations Research, 8, 101–111, 1960
[4] R.H.W. Hoppe; Optimization I. Handout of the course held in Fall 2006. See http://www.math.uh.edu
[5] F.V. Louveaux; Piecewise convex programs. Math. Programming, 15, 53–62, 1978
[6] D. Walkup and R.J.-B. Wets; Stochastic programs with recourse. SIAM J. Appl. Math., 15, 1299–1314, 1967
[7] D. Walkup and R.J.-B. Wets; Stochastic programs with recourse II: on the continuity of the objective. SIAM J. Appl. Math., 17, 98–103, 1969
[8] R.J.-B. Wets; Characterization theorems for stochastic programs. Math. Programming, 2, 166–175, 1972
[9] R.J.-B. Wets; Stochastic programs with fixed recourse: the equivalent deterministic problem. SIAM Rev., 16, 309–339, 1974
[10] R.J.-B. Wets; Stochastic programming. In: Optimization (G.L. Nemhauser et al.; eds.), Handbooks in Operations Research and Management Science, Vol. 1, North-Holland, Amsterdam, 1990