Successive Linear Approximation Solution
of Infinite Horizon Dynamic Stochastic
Programs
John R. Birge ∗ and Gongyun Zhao †
June 30, 2006
∗ The University of Chicago Graduate School of Business, Chicago, Illinois, USA ([email protected]). His work was supported in part by the National Science Foundation under Grant DMI-0200429 and by The University of Chicago Graduate School of Business.
† Department of Mathematics, National University of Singapore, Singapore ([email protected]). His work was supported in part by NUS Academic Research Grant R-146-000-057-112.
Abstract
Models for long-term planning often lead to infinite horizon stochastic pro-
grams that offer significant challenges for computation. Finite-horizon approxi-
mations are often used in these cases but they may also become computationally
difficult. In this paper, we directly solve for value functions of infinite horizon
stochastic programs. We show that a successive linear approximation method
converges to an optimal value function for the case with convex objective, linear
dynamics, and feasible continuation.
Keywords: stochastic programming, dynamic programming, infinite horizon, lin-
ear approximation, cutting planes
AMS Classification: 65K05, 90C15, 90C39, 91B28
1 Introduction
Many long-term planning problems can be expressed as infinite horizon stochastic
programs. The infinite horizon often arises because of uncertainty about any
specific endpoint (e.g., the lifetime of an individual or organization). Solving
such problems with multiple decision variables and random parameters presents
obvious computational difficulties. A common technique is to use a finite-horizon
approximation, but even these problems become quite difficult for practical size.
The approach in this paper is to assume stationary data and to solve for the
infinite horizon value function directly. A motivating example is an infinite hori-
zon portfolio problem, which involves decisions on amounts to invest in different
assets and amounts to consume over time. For simple cases, such as those in
Samuelson [17] for discrete-time and Merton [12] for continuous-time, optimality
conditions can be solved directly but general transaction costs and constraints
on consumption or investment require more complex versions of the form in the
infinite horizon problems considered here.
The problems considered here also relate to the dynamic programming liter-
ature (see, for example, Bertsekas [3]) and particularly to methods for solving
partially observed Markov decision processes (see the survey in Lovejoy [10]). Our
method is most similar to the piecewise linear construction in Smallwood and
Sondik [18] for finite horizons and the bounding approximations used in Lovejoy
[11] for both finite and infinite horizons. The main differences between our model
and those in [11] and [18] are that we do not assume a finite action space and
do not use finite horizon approximations. Our method also does not explicitly
find a policy or approximate the state space with a discrete grid; instead, we use
the convexity of the value function and the contraction properties of the dynamic
programming operator in a form of approximate value iteration.
Our approach also is similar to the linear programming approach to approx-
imate dynamic programming (LP-ADP, see, e.g., De Farias and Van Roy [4]),
but that approach again focuses on discrete state and action spaces and uses a
pre-selected set of linear basis functions to approximate the value function. Our
method uses linear support functions as an outer linearization in contrast to inner
linearization in the LP-ADP approach. Our support functions are also gener-
ated within the method and achieve arbitrary accuracy. To obtain this result,
our method takes advantage of continuity and convexity properties and, on all
iterations, maintains bounds not available in the LP-ADP framework.
In the next section, we describe the general problem setting. Section 3 describes
the algorithm and its convergence properties. Section 4 discusses the construction
of the value function domain as required for algorithm convergence. Section 5
describes two examples and implementation of the algorithm. Section 6 concludes
with a discussion of further issues.
2 Problem Setting
We seek to find the value function V∗ of the infinite horizon problem

V∗(x) = min_{y_0, y_1, ...} E_{ξ_0, ξ_1, ...} Σ_{t=0}^∞ δ^t c_t(x_t, y_t)   (2.1)
        s.t. x_{t+1} = A_t x_t + B_t y_t + b_t, for t = 0, 1, 2, ...,
             x_0 = x.
In this problem, ξ_t = (A_t, B_t, b_t) is the random data for stage t = 0, 1, 2, ...; A_t and B_t are matrices, with A_t square, and b_t is a vector. The equation x_{t+1} = A_t x_t + B_t y_t + b_t characterizes the transition of the state from stage t to t + 1, the dynamics of the problem. The controls (decision variables) are y_0, y_1, .... The state x_t and control y_t are continuous variables in certain finite-dimensional Euclidean spaces X and Y, respectively. The cost function c_t : X × Y → R ∪ {+∞} is a generalized
function. The static constraints in each stage, e.g., (xt, yt) ∈ Gt for some subset
Gt of X × Y , are not explicitly formulated. They are implicitly captured by the
effective domain of ct, say dom(ct) = Gt. The number 0 < δ < 1 is a discount
factor.
The above problem can be represented as

min_{y_0} { c_0(x_0, y_0) + δ E_{ξ_0} min_{y_1} { c_1(x_1, y_1) + δ E_{ξ_1} min_{y_2} { c_2(x_2, y_2) + ... } } }
s.t. x_{t+1} = A_t x_t + B_t y_t + b_t, for t = 0, 1, 2, ...,
x_0 = x.

Here E_{ξ_t} has the same meaning as E_{x_{t+1}}.
In this paper we consider a simple version of (2.1), in which c_t = c for all t and the (A_t, B_t, b_t) = (A, B, b), t = 0, 1, 2, ..., are independently and identically distributed random variables. For the presentation below, we assume that ξ = (A, B, b) is a discrete random vector with p_i = Prob(ξ = (A_i, B_i, b_i)), i = 1, ..., L. (The general algorithm does not require finitely many realizations, but practical implementations make this assumption necessary.) The equality constraints in (2.1) characterize the dynamics of the problem. The static constraints in each stage (e.g., y_t ≥ 0) are implicitly captured by the effective domain of the generalized function c.
The value function V∗ defined by (2.1) is a solution of V = M(V), where the map M (often called the dynamic programming operator) is defined by

M(V)(x) = min_y { c(x, y) + δ E_ξ V(Ax + By + b) }
        = min_y { c(x, y) + δ Σ_{i=1}^L p_i V(A_i x + B_i y + b_i) }.   (2.2)
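For a finite scenario set, M(V)(x) can be evaluated pointwise by solving the inner minimization in (2.2) numerically. The following sketch is illustrative only: the quadratic stage cost, the scenario data, and the cut representation of V are assumptions of ours, not data from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (our assumptions, not from the paper): 1-D state/control.
delta = 0.9
scenarios = [(0.5, 0.8, 1.0, 0.1),    # (p_i, A_i, B_i, b_i)
             (0.5, 1.1, 1.0, -0.1)]

def V(z, cuts):
    """Piecewise linear lower approximation V(z) = max_i (Q_i z + q_i)."""
    return max(Q * z + q for Q, q in cuts)

def cost(x, y):
    """A convex stage cost chosen only for illustration."""
    return 0.5 * y ** 2 + (x - y) ** 2

def M(x, cuts):
    """One evaluation of the dynamic programming operator (2.2) at x."""
    obj = lambda y: cost(x, y[0]) + delta * sum(
        p * V(A * x + B * y[0] + b, cuts) for p, A, B, b in scenarios)
    return minimize(obj, x0=[0.0], method="Nelder-Mead").fun

cuts = [(0.0, 0.0)]   # the trivial initial cut V^0 = 0
print(M(1.0, cuts))   # one application of M at x = 1
```

With the trivial cut, the expectation term vanishes, so M reduces to minimizing the stage cost alone; each later cut would tighten the continuation value.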
Note: The problem of finding a solution of V =M(V ) and the infinite horizon
problem (2.1) are different. The value function V ∗ defined by the infinite horizon
problem is a solution to V = M(V ); however, the equation V = M(V ) may have
many solutions. We will discuss this later. For the time being, we only need to
know that a solution of V = M(V ) is equal to V ∗ if the effective domain of the
solution coincides with dom(V ∗).
More precisely, let D∗ = dom(V∗) be compact and convex. Let B(D∗) be the Banach space of all functions finite on D∗, equipped with the norm ‖f‖_{D∗} = sup_{x∈D∗} |f(x)|.
Theorem 2.1 M is a contraction on B(D∗).
Proof. The proof can be found, e.g., in Theorem 3.8 of [9].
The above theorem ensures that the equation V =M(V ) has a unique solution
on B(D∗). Since V ∗ solves V =M(V ) and V ∗ ∈ B(D∗), V ∗ is the unique solution
of V =M(V ) on B(D∗).
An assumption throughout this paper is that the cost function c is convex.
This assumption ensures the convexity of the value function as shown below and
enables the use of a cutting plane method.
Lemma 2.2 M(V ) is convex if V is convex. Consequently, V ∗ is convex.
Proof. See Lemma 3.10 and Corollary 3.11 in [9].
In the next section, we propose a cutting plane method to construct a piecewise linear value function V^k that approximately solves the problem V = M(V). The cut generated at iteration k is a supporting plane of M(V^k) at a selected point x^k. The convexity of M(V^k), guaranteed by Lemma 2.2, ensures the existence of such cuts.
For any functions f, g : R^n → R ∪ {+∞}, we say f ≥ g if f(z) ≥ g(z) for all z ∈ R^n.
3 A cutting plane method
In this section, we assume that the domain D∗ = dom(V ∗) is known and is a
compact polyhedral set. All functions are regarded as elements in B(D∗); thus,
we will only define values of functions on D∗.
3.1 An algorithm
Our cutting plane method is based on the following observations.
Lemma 3.1 For any V, V̄ ∈ B(D∗), V ≤ V̄ implies M(V) ≤ M(V̄).
Proof. For any x ∈ D∗,

M(V)(x) = min_y { c(x, y) + δ E V(Ax + By + b) }
        ≤ min_y { c(x, y) + δ E V̄(Ax + By + b) }
        = M(V̄)(x).
Theorem 3.2 Let V ∈ B(D∗) with V ≤ V ∗. Then
(i) M(V ) ≤ V ∗;
(ii) V = V ∗ if M(V ) ≤ V .
Proof. (i) By Lemma 3.1 and V ≤ V ∗, we have
M(V ) ≤M(V ∗) = V ∗.
(ii) It follows from M(V ) ≤ V and Lemma 3.1 that
V ≥ M(V) ≥ M^2(V) ≥ · · · ≥ M^k(V) ≥ · · · .
Since M is a contraction on B(D∗) by Theorem 2.1, Mk(V ) → V ∗. This shows
that V ≥ V ∗. Now we have V ∗ ≥ V ≥ V ∗ which implies V = V ∗.
We will employ a cutting plane algorithm to construct piecewise linear functions V^k, defined by a set of cuts { t ≥ Q_i x + q_i : i = 1, ..., p_k } in the form

V^k(x) = max{ Q_i x + q_i : i = 1, ..., p_k },

which approximate V∗ from below. Theorem 3.2 suggests adding a cut to V^k cutting off an area where M(V^k)(x) > V^k(x), so that V^{k+1} can move up towards V∗. The algorithm stops when there is no x ∈ D∗ such that M(V^k)(x) > V^k(x).
Algorithm 3.1
1. Initialization: Find a piecewise linear convex function V^0 ∈ B(D∗) satisfying V^0 ≤ V∗. Set k ← 0.
2. If V^k ≥ M(V^k), stop; V^k is the solution. Otherwise, find a point x^k ∈ D∗ with V^k(x^k) < M(V^k)(x^k).
3. Find a supporting hyperplane of M(V^k) at x^k, say t = Q_{k+1} x + q_{k+1}. Define

V^{k+1}(x) = max{ V^k(x), Q_{k+1} x + q_{k+1} }.

Set k ← k + 1 and go to Step 2.
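In outline, Algorithm 3.1 can be organized as the following loop. The helpers `eval_M`, `find_violating_point`, and `subgradient` stand for the subproblems of Steps 2 and 3 and are assumed to be supplied by the user; this is a schematic sketch of ours, not the paper's implementation.

```python
import numpy as np

def slam(cuts, eval_M, find_violating_point, subgradient,
         tol=1e-6, max_iter=1000):
    """Schematic form of Algorithm 3.1 (SLAM).

    cuts: list of (Q, q) pairs, so V^k(x) = max_i (Q_i . x + q_i).
    eval_M(x, cuts): evaluates M(V^k)(x) (Step 2, Part 1).
    find_violating_point(cuts): returns x with M(V^k)(x) > V^k(x), or None.
    subgradient(x, cuts): a subgradient of M(V^k) at x (Step 3).
    """
    V = lambda x: max(np.dot(Q, x) + q for Q, q in cuts)
    for _ in range(max_iter):
        xk = find_violating_point(cuts)
        if xk is None or eval_M(xk, cuts) - V(xk) <= tol:
            return cuts                         # V^k >= M(V^k): stop
        Qk = np.asarray(subgradient(xk, cuts))
        qk = eval_M(xk, cuts) - np.dot(Qk, xk)  # supporting hyperplane at xk
        cuts.append((Qk, qk))                   # V^{k+1} = max{V^k, new cut}
    return cuts
```

Because each new cut is tangent to M(V^k) from below and V^{k+1} keeps all earlier cuts, the iterates increase monotonically toward V∗, as shown in Lemma 3.3.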
3.2 Details of the algorithm
Step 1: Usually, we can find V^0 easily. For instance, if c(x, y) ≥ c_0 for all (x, y) in its domain, then we can choose V^0(x) = c_0/(1 − δ), a constant function on D∗. It is clear that

V∗(x) ≥ Σ_{t=0}^∞ δ^t c_0 = c_0/(1 − δ) = V^0(x),  ∀x ∈ D∗.
Step 2 consists of two parts. Part 1 is the evaluation of M(V^k)(x), and Part 2 describes how to find a point x^k ∈ D∗ with V^k(x^k) < M(V^k)(x^k).
Part 1. Assume that V^k is defined by k linear cuts, i.e., for any x ∈ D∗,

V^k(x) = max{ Q_i x + q_i : i = 1, ..., k }
       = min{ θ | θ ≥ Q_i x + q_i, i = 1, ..., k }.
Then

M(V^k)(x) = min_y { c(x, y) + δ Σ_{j=1}^L p_j V^k(A_j x + B_j y + b_j) }
          = min_{y,θ} { c(x, y) + δ Σ_{j=1}^L p_j θ_j | θ_j ≥ Q_i z_j + q_i, ∀ i = 1, ..., k;
                        z_j = A_j x + B_j y + b_j ∈ D∗, j = 1, ..., L }.   (3.1)
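When the stage cost c is itself linear (or polyhedral), problem (3.1) is a finite linear program in (y, θ_1, ..., θ_L). A minimal sketch for a one-dimensional state and control, with illustrative data of ours (not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data (our assumptions): 1-D state/control, two scenarios.
delta = 0.9
x = 0.5
cuts = [(0.0, 0.0), (1.0, 0.0)]                        # V^k(z) = max(0, z)
scen = [(0.5, 0.8, 1.0, 0.1), (0.5, 1.1, 1.0, -0.1)]   # (p_j, A_j, B_j, b_j)
lo, hi = -1.0, 2.0                                     # D* = [lo, hi]

# Decision vector [y, theta_1, theta_2]; stage cost c(x, y) = y with y >= 0.
obj = [1.0] + [delta * p for p, _, _, _ in scen]
A_ub, b_ub = [], []
for j, (p, A, B, b) in enumerate(scen):
    for Q, q in cuts:                  # theta_j >= Q z_j + q, z_j substituted
        row = [Q * B, 0.0, 0.0]
        row[1 + j] = -1.0
        A_ub.append(row)
        b_ub.append(-Q * (A * x + b) - q)
    A_ub.append([B, 0.0, 0.0]); b_ub.append(hi - A * x - b)   # z_j <= hi
    A_ub.append([-B, 0.0, 0.0]); b_ub.append(A * x + b - lo)  # z_j >= lo

res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (None, None), (None, None)])
print("M(V^k)(x) =", res.fun)
```

The state constraint z_j ∈ D∗ and the cut constraints are all linear once z_j is substituted out, which is exactly what makes the evaluation of M(V^k)(x) tractable.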
Part 2. We seek to find x^k by approximately minimizing V^k(x) − M(V^k)(x) on D∗. Notice that V^k − M(V^k) is a d.c. function (a difference of two convex functions) on D∗. There is a rich literature on solving d.c. programs, originating with Horst and Tuy [7]. In Appendix 1, we describe a method based on [6]. This method can find an exact global minimizer. For Algorithm 3.1 to work, however, an approximate minimizer x^k suffices (see Theorem 3.4). Selecting a computationally low-cost method for finding a sufficiently good x^k is important for the efficiency of the algorithm. The method described in Appendix 1 is certainly not the only choice for finding x^k. In the second numerical example, we take a small sample of 5 points and choose the best point as x^k. This simple strategy works well for that example.
Step 3: Suppose that the polytope D∗ is defined by a system of inequalities Fx ≤ f for some matrix F and vector f, i.e., D∗ = {x | Fx ≤ f}. Let

V^k(x) = { min{ θ | Q^k x + q^k ≤ θe }   if Fx ≤ f,
         { +∞                            otherwise,

where Q^k is a matrix, and q^k and e = (1, ..., 1)^T are vectors. Q^k x + q^k represents the set of cuts generated up to the k-th iteration.
Let (y^k, θ^k) and { (λ_j^k, μ_j^k) : j = 1, ..., L } be an optimal solution and optimal multipliers of the problem:

min_{y,θ} { c(x^k, y) + δ Σ_{j=1}^L p_j θ_j | Q^k z_j + q^k ≤ θ_j e, F z_j ≤ f, z_j = A_j x^k + B_j y + b_j, j = 1, ..., L }.

Let ζ^k = (ζ_x^k; ζ_y^k) be a subgradient of c at (x^k, y^k). Then

ξ^k = ζ_x^k + Σ_{j=1}^L ( (λ_j^k)^T Q^k A_j + (μ_j^k)^T F A_j )

is a subgradient of M(V^k) at x^k. (See Appendix 2 for the proof.)
A supporting hyperplane of M(V^k) at x^k is

t = M(V^k)(x^k) + ξ^k (x − x^k).

That is,

Q_{k+1} = ξ^k,  q_{k+1} = M(V^k)(x^k) − ξ^k x^k.
3.3 A modification of the algorithm
As we can see, global search takes considerable computational effort. A more
efficient algorithm should combine local and global searches. We can perform a
number of local search iterations between two global searches. A simple modifica-
tion of the algorithm follows.
Algorithm 3.2
1. Initialization: Find a piecewise linear convex function V 0 ∈ B(D∗) satisfying
V 0 ≤ V ∗. Set k ← 0.
2. If V k ≥ M(V k), stop, V k is the solution. Otherwise, find a point x ∈ D∗
with V k(x) < M(V k)(x). Let V ← V k.
3. Find a supporting hyperplane ofM(V ) at x, say t = Qx+q. Define V +(x) =
max{V (x), Qx+ q}.
4. If the improvement of V + over V is larger than a certain tolerance, then let
V ← V + and repeat Step 3. Otherwise, let V k+1 ← V +, k ← k + 1, and go
to Step 2.
The time required for Step 3 is negligible compared with that for Step 2. Two m-files are attached; the modification is in the file optutildeepcut.m. The implementation on the first example in the paper shows a reduction by 3/4 in the number of iterations of Step 2, i.e., the modified algorithm requires only 1/4 of the original number of iterations. Furthermore, it obtains better approximate solutions.
3.4 Comparison with other methods
We compare from two perspectives: the improvement per iteration and the representation of an approximate value function V^k.
We refer to our method as the successive linear approximation method (SLAM).
Comparison between SLAM and PIM
The policy iteration method (PIM) is widely used for solving Markovian de-
cision problems, cf. [16]. Thus we shall make a comparison between our method
SLAM and PIM.
At the k-th iteration, for a given V^k, PIM finds y^k : D∗ → Y (the policy, in the notation of our problem setting) by

y^k(x) = argmin_y { c(x, y) + δ E V^k(Ax + By + b) },  ∀x ∈ D∗,   (3.2)

and then finds V^{k+1} by solving

V^{k+1}(x) = c(x, y^k(x)) + δ E V^{k+1}(Ax + B y^k(x) + b),  ∀x ∈ D∗.   (3.3)
SLAM performs the k-th iteration as follows: given V^k, find a point x^k ∈ D∗ by approximately solving

max_{x ∈ D∗} M(V^k)(x) − V^k(x);   (3.4)

then construct V^{k+1} by adding to V^k a cutting plane at x^k.
Since

M(V^k)(x) = min_y { c(x, y) + δ E V^k(Ax + By + b) },
finding the action y^k(x) at a single point x takes the same computational effort as evaluating M(V^k) at x; thus, the determination of y^k, i.e., the evaluation of y^k(x) at all x ∈ D∗, is more expensive than the determination of x^k. The construction of V^{k+1} in PIM requires solving an infinite-dimensional equation, which is again more difficult than determining a cutting plane in SLAM. In return, one can expect each iteration of PIM to improve the value function more than an iteration of SLAM does.
Comparison of approximation methods for the continuous-state DP
All existing approximation methods for continuous-state DP can be roughly
categorized into discrete approximation (DA) and parametric approximation (PA),
cf. [1]. The former approximates V on a grid of D∗ and the latter approximates V
by a linear combination of a set of basis functions. Our method, cutting plane ap-
proximation (CPA), approximates V by the maximum function of a set of cutting
planes.
The CPA method is similar to PA in the sense that both methods use a set of
continuous functions to approximate V . The superiority of PA over DA is that, in
many cases, one can obtain a good global approximation to V using a small number
of basis functions, whereas in high-dimensional problems the DA approach requires
a very large number of grid points to obtain a comparably accurate approximation,
see [1]. CPA has the same advantages as PA over DA, and, moreover, is more
flexible than PA, because cuts are generated where they are most needed in CPA,
whereas basis functions are pre-selected in PA. A disadvantage of the CPA method, however, is that cuts can accumulate rapidly. A computational challenge is to find effective ways to delete unnecessary cuts.
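As a hedged illustration of one such deletion strategy (an assumption of ours, not a method from the paper), cuts that are never active at a finite sample of points of D∗ can be dropped heuristically:

```python
import numpy as np

def prune_cuts(cuts, samples, keep_tol=1e-9):
    """Heuristic cut deletion (our assumption, not the paper's method): keep a
    cut only if it attains V^k(x) = max_i (Q_i . x + q_i) at some sample point.
    Cuts never active on the samples are dropped, which can in principle
    discard cuts that are active elsewhere in D*."""
    cuts = [(np.asarray(Q), q) for Q, q in cuts]
    active = set()
    for x in samples:
        vals = [np.dot(Q, x) + q for Q, q in cuts]
        vmax = max(vals)
        active.update(i for i, v in enumerate(vals) if v >= vmax - keep_tol)
    return [cuts[i] for i in sorted(active)]

# Usage: V(z) = max(z, -z, -0.5) on D* = [-1, 1]; the third cut is dominated.
cuts = [([1.0], 0.0), ([-1.0], 0.0), ([0.0], -0.5)]
samples = [np.array([t]) for t in np.linspace(-1.0, 1.0, 21)]
print(len(prune_cuts(cuts, samples)))   # -> 2
```

An exact redundancy test would instead solve one LP per cut, maximizing the gap between that cut and the remaining ones over D∗; the sampling test trades that cost for a risk of deleting a useful cut.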
3.5 Convergence
If Algorithm 3.1 stops at an iteration with V K ≥M(V K), then we have V K = V ∗
by Theorem 3.2.
If the algorithm does not terminate in a finite number of iterations, then the
convergence of V k to V ∗ is not so obvious. We notice that each cut is related to
a testing point x. Such a process can lead to pointwise convergence, but not necessarily uniform convergence. A further catch is that the limit of a pointwise convergent sequence {V^k} may not be V∗, because V^k may be updated only on part of the domain rather than over the full domain. Our convergence analysis answers the following two questions. Under what conditions does the sequence {V^k} converge uniformly? If {V^k} converges uniformly, is the limit function equal to V∗?
Our main condition for uniform convergence is that D∗ is a polytope. Indeed, if
D∗ is an arbitrary convex compact set, one can construct a monotone increasing
sequence of convex functions {V k} which converges pointwise but not uniformly.
The assumption that D∗ is a polytope ensures the continuity of the limit function
of {V k}. Then, by Theorem 7.13 in [15], {V k} converges uniformly; see details in
the proof of Theorem 3.4. For the second question, we will give a positive answer.
The result is presented in Theorem 3.4.
Lemma 3.3 V k ≤ V k+1 ≤ V ∗.
Proof. From Step 2 of Algorithm 3.1, it is obvious that V k ≤ V k+1.
The initial function V 0 ≤ V ∗. Suppose V k ≤ V ∗; then, by Theorem 3.2 (i),
M(V k) ≤ V ∗; thus, for any x ∈ D∗,
V k+1(x) ≤ max{V k(x), M(V k)(x)} ≤ V ∗(x).
Theorem 3.4 Suppose that D∗ is a polytope and suppose that, at the k-th itera-
tion, a point xk ∈ D∗ is selected such that
M(V k)(xk)− V k(xk) ≥ αmax{M(V k)(x)− V k(x) | x ∈ D∗}
for some constant α > 0; then, V k → V ∗ uniformly.
Proof. First, suppose that Algorithm 3.1 stops at a finite iteration K with V K ≥
M(V K). By Lemma 3.3 V K ≤ V ∗. Thus, by Theorem 3.2, V K = V ∗.
Now suppose Algorithm 3.1 generates a sequence {V k} which converges to V
pointwise.
Because V^k ≤ V^{k+1} → V, epi(V) = ∩_k epi(V^k), which is a closed set since every V^k is closed. Thus, V is a closed convex function on D∗. By our assumption, D∗ is a polytope; thus, by Theorem 10.2 in [13], V is continuous on D∗. By Theorem 7.13 in [15], V^k → V uniformly on D∗.
Assume that there exists an x ∈ D∗ such that
M(V )(x)− V (x) = 2σ > 0.
By Theorem 2.1, M(V^k) → M(V) uniformly on D∗ since V^k → V uniformly; thus, there exists a k̄ such that M(V^k)(x) ≥ M(V)(x) − σ for all k ≥ k̄. This yields

M(V^k)(x) − V^k(x) ≥ M(V)(x) − σ − V(x) = σ.

Since the supporting hyperplane at iteration k satisfies Q_{k+1} x^k + q_{k+1} = M(V^k)(x^k), we have

V^{k+1}(x^k) − V^k(x^k) = M(V^k)(x^k) − V^k(x^k)
                        ≥ α [M(V^k)(x) − V^k(x)]
                        ≥ ασ.

Thus

V(x^k) − V^k(x^k) ≥ ασ,  ∀ k ≥ k̄.

This contradicts the uniform convergence of V^k → V on D∗.
The contradiction implies that
M(V )(x) ≤ V (x), ∀x ∈ D∗.
Then, by Theorem 3.2, V = V ∗. This means that V k converges to V ∗ uniformly
on D∗.
4 Construction of the domain D∗
Recall that D∗ = dom(V∗), where V∗ is the value function defined by (2.1). This section discusses how to find D∗. We do not have a complete answer to this problem; nevertheless, the proposed method can find D∗ in a finite number of iterations in many cases.
4.1 An algorithm
Although the domain of a solution of V = M(V ) does not necessarily coincide
with D∗, it does provide useful information for finding D∗. We first represent the
domain of M(V ).
From (2.2) one can see that x ∈ dom(M(V)) if and only if there exists a y such that c(x, y) < +∞ and A_i x + B_i y + b_i ∈ dom(V) for all i = 1, ..., L. Denote

D_c(x) = { y | c(x, y) < +∞ },
G(x, D) = { y | A_i x + B_i y + b_i ∈ D, ∀ i = 1, ..., L }.
10
Then, x ∈ dom(M(V)) if and only if D_c(x) ∩ G(x, dom(V)) ≠ ∅.
Denote

Γ(D) = { x | D_c(x) ∩ G(x, D) ≠ ∅ }.

Then,

dom(M(V)) = Γ(dom(V)).
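When dom(c) = {(x, y) : Tx + Wy ≤ r} and D = {z : Fz ≤ f} are polyhedral, the test x ∈ Γ(D) is a linear feasibility problem in y. A sketch using scipy's linprog with a zero objective; the data instantiate Example 1 of Section 4.2 with assumed values α = 2, β = 0.5:

```python
import numpy as np
from scipy.optimize import linprog

def in_Gamma(x, T, W, r, F, f, scenarios):
    """Test x in Gamma(D): is there y with T x + W y <= r and
    F (A_i x + B_i y + b_i) <= f for every scenario i?"""
    A_ub = [W] + [F @ B for _, B, _ in scenarios]
    b_ub = [r - T @ x] + [f - F @ (A @ x + b) for A, _, b in scenarios]
    res = linprog(np.zeros(W.shape[1]),
                  A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  bounds=[(None, None)] * W.shape[1])
    return res.status == 0        # status 0 means a feasible y was found

# Example 1 with assumed alpha = 2, beta = 0.5 (deterministic dynamics):
# dom(c): x in [-1,1]^2, y in [-0.5, 0.5];  D = [-1, 1]^2.
T = np.vstack([np.eye(2), -np.eye(2), np.zeros((2, 2))])
W = np.array([[0.0], [0.0], [0.0], [0.0], [1.0], [-1.0]])
r = np.array([1.0, 1.0, 1.0, 1.0, 0.5, 0.5])
F = np.vstack([np.eye(2), -np.eye(2)])
f = np.ones(4)
scenarios = [(2.0 * np.eye(2), np.array([[1.0], [0.0]]), np.zeros(2))]
print(in_Gamma(np.array([0.2, 0.2]), T, W, r, F, f, scenarios))   # feasible
print(in_Gamma(np.array([0.2, 0.8]), T, W, r, F, f, scenarios))   # infeasible
```

The second point fails because αx_2 = 1.6 leaves D no matter which y is chosen, matching the shrinking x_2-range derived for Γ(D) in Example 1.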
The following are some basic properties of the operator Γ.
Lemma 4.1 (i) If D1 ⊆ D2, then Γ(D1) ⊆ Γ(D2).
(ii) Γ(D∗) = D∗.
(iii) If D∗ ⊆ D, then D∗ ⊆ Γ(D).
Proof. (i) If D_1 ⊆ D_2, then, for any x, G(x, D_1) ⊆ G(x, D_2). Now, for any x ∈ Γ(D_1), D_c(x) ∩ G(x, D_1) ≠ ∅. This implies D_c(x) ∩ G(x, D_2) ≠ ∅. Therefore, x ∈ Γ(D_2).
(ii) It follows from M(V ∗) = V ∗ that dom(M(V ∗)) = dom(V ∗), which yields
Γ(D∗) = D∗ by the definition.
(iii) If D∗ ⊆ D, then, by (i), Γ(D∗) ⊆ Γ(D); hence, by (ii), we have D∗ ⊆ Γ(D).
The following lemma shows that, if dom(V ) ⊆ dom(M(V )), then dom(V ) ⊆
dom(V ∗).
Lemma 4.2 Suppose c is bounded on its domain. If D ⊆ Γ(D), then D ⊆ D∗.
Proof. For any xt ∈ D, we have xt ∈ Γ(D); thus,
∃yt ∈ Dc(xt) : Aixt +Biyt + bi ∈ D, ∀ i = 1, . . . , L. (4.1)
For any x ∈ D, let x0 = x. There exists y0 satisfying (4.1). For each realization
of ξ = (A,B, b) (where the subscript is omitted for ease of exposition), let x1 =
Ax0 + By0 + b. By (4.1), x1 ∈ D. Since D ⊆ Γ(D), x1 ∈ Γ(D). Therefore, there
exists y1 satisfying (4.1), and so on; so, we obtain a sequence {(xt, yt) : t = 0, 1, . . .}
(here (xt, yt) are random vectors) satisfying (xt, yt) ∈ dom(c) because yt ∈ Dc(xt).
Since c is bounded on its domain, E Σ_{t=0}^∞ δ^t c(x_t, y_t) < ∞; thus, V∗(x), defined by (2.1), is finite, i.e., x ∈ D∗. Therefore, D ⊆ D∗.
Suppose we use a cutting plane method to construct D∗ and start with a set
D ⊇ D∗. The above lemma suggests that one should cut off a portion of D \ Γ(D) if D ⊄ Γ(D).
Because

D∗ ⊆ D_cx := { x | D_c(x) ≠ ∅ } = { x | ∃ y such that c(x, y) < +∞ },

we start with D_cx to find D∗. A generic cutting plane method that constructs D∗ with this idea is as follows:
Algorithm 4.1
1. Let D0 = Dcx. k = 0.
2. If Dk ⊆ Γ(Dk), stop. Otherwise, find a cut F k+1x ≤ fk+1 which cuts off a
portion of Dk \ Γ(Dk).
3. Let Dk+1 = Dk ∩ {x | F k+1x ≤ fk+1}. k ← k + 1. Repeat.
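Algorithm 4.1 can be organized schematically as follows; `subset_of_Gamma`, `generate_cut`, and the polyhedron interface `add_cut` are assumed interfaces of ours, not code from the paper:

```python
def construct_domain(D0, subset_of_Gamma, generate_cut, max_iter=50):
    """Schematic form of Algorithm 4.1.

    subset_of_Gamma(D): tests whether D is contained in Gamma(D).
    generate_cut(D): returns (F, f) cutting off part of D minus Gamma(D).
    D supports an add_cut(F, f) method returning the cut polyhedron
    (an assumed interface, not code from the paper)."""
    D = D0
    for _ in range(max_iter):
        if subset_of_Gamma(D):
            return D              # D within Gamma(D): by Lemma 4.2, D = D*
        F, f = generate_cut(D)
        D = D.add_cut(F, f)
    return D                      # may not have converged; see Remark (iii)
```

The loop mirrors the finite-termination guarantee of Theorem 4.3: if the subset test ever succeeds, the returned set equals D∗; otherwise the iteration cap reflects the possibility of non-convergence noted after the theorem.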
The algorithm stops when a D^k with D^k ⊆ Γ(D^k) is found. Question: is this terminal D^K equal to D∗?
Theorem 4.3 If the algorithm terminates at a finite iteration K, then DK = D∗.
Proof. The algorithm starts with Dcx which contains D∗. By Lemma 4.1, D∗ ⊆
Γ(Dk) for k = 0, 1, . . . ,K; thus, no cut cuts off any point of D∗. This implies
DK ⊇ D∗. When the algorithm stops with DK ⊆ Γ(DK), Lemma 4.2 yields
DK ⊆ D∗; thus, DK = D∗.
Unfortunately, if Algorithm 4.1 does not stop in a finite number of iterations,
the sequence {Dk} need not converge to D∗ (see Remark (iii) after Example 1).
How to modify the algorithm to guarantee convergence is still an open question.
4.2 Generating cuts
Now we discuss Step 2 in detail. First, we propose a method which finds a point
x ∈ Dk \ Γ(Dk) and generates a cut which cuts off a portion of Dk \ Γ(Dk).
For simplicity we only consider the case in which dom(c) is a polyhedral set. More precisely, let dom(c) = { (x, y) : Tx + Wy ≤ r }, and let D^k = { x : F^i x ≤ f^i, i = 1, ..., k }.
x̄ ∉ Γ(D^k) if and only if the system

T x̄ + W y ≤ r,
F^i (A_j x̄ + B_j y + b_j) ≤ f^i,  i = 1, ..., k;  j = 1, ..., L,

has no feasible solution y; by Farkas' Theorem, this holds if and only if

Σ_{i=1}^k Σ_{j=1}^L π_{ij} F^i B_j + λ^T W = 0,
Σ_{i=1}^k Σ_{j=1}^L π_{ij} [ F^i (A_j x̄ + b_j) − f^i ] + λ^T (T x̄ − r) > 0,   (4.2)
π ≥ 0, λ ≥ 0,

has a solution (π, λ).
Thus, finding a point x̄ ∈ D^k such that x̄ ∉ Γ(D^k) is equivalent to finding a triple (x̄, λ, π) satisfying x̄ ∈ D^k and (4.2). (Note that finding a solution to (4.2) is equivalent to determining the sign of the supremum of an indefinite quadratic function subject to linear constraints, which would also require global optimization methods as in Horst et al. [6].)
Once a solution (x̄, λ, π) is found, one can construct a feasibility cut:

Σ_{i=1}^k Σ_{j=1}^L π_{ij} [ F^i (A_j x + b_j) − f^i ] + λ^T (T x − r) ≤ 0,   (4.3)

i.e.,

F^{k+1} = Σ_{i=1}^k Σ_{j=1}^L π_{ij} F^i A_j + λ^T T,
f^{k+1} = Σ_{i=1}^k Σ_{j=1}^L π_{ij} [ f^i − F^i b_j ] + λ^T r.

We can add an objective to find the "best" cut in the sense, e.g., that F^{k+1} is the normal direction of a facet of D∗, or, perhaps more realistically, a facet of Γ(D^k).
Let us look at a simple example to see how Γ^k(D_cx) approximates D∗.
Example 1
Suppose

dom(c) = { (x, y) : x ∈ [−1, 1]^2, y ∈ [−β, β] },
Ax + By + b = αx + e_1 y  for x ∈ R^2, y ∈ R^1 (deterministic).

Here α ∈ R^1 and e_1 = (1, 0)^T, meaning that A = αI ∈ R^{2×2}, B = e_1 ∈ R^{2×1}, and b = (0, 0)^T. For this example, we have

D_c(x) = { y : (x, y) ∈ dom(c) } = [−β, β] if x ∈ [−1, 1]^2, and ∅ otherwise;
D_cx = { x : D_c(x) ≠ ∅ } = [−1, 1]^2;
G(x, D) = { y : αx + e_1 y ∈ D }  for D ⊂ R^2.

Thus,

D_c(x) ∩ G(x, D) ≠ ∅  ⟺  x ∈ [−1, 1]^2 and [−β, β] ∩ { y : αx + e_1 y ∈ D } ≠ ∅
                      ⟺  x ∈ [−1, 1]^2 and (αx + e_1 [−β, β]) ∩ D ≠ ∅,   (4.4)

where

αx + e_1 [−β, β] = { αx + y e_1 : y ∈ [−β, β] } = { (αx_1 + y, αx_2)^T : y ∈ [−β, β] }.
For any D = { |x_1| ≤ p, |x_2| ≤ q } with p, q ≥ 0,

Γ(D) = { x ∈ [−1, 1]^2 : (αx + e_1 [−β, β]) ∩ D ≠ ∅ }
     = { x ∈ [−1, 1]^2 : [αx_1 − β, αx_1 + β] ∩ [−p, p] ≠ ∅, |αx_2| ≤ q }
     = { x ∈ [−1, 1]^2 : |x_1| ≤ (p + β)/α, |x_2| ≤ q/α }
     = { |x_1| ≤ min{1, (p + β)/α}, |x_2| ≤ min{1, q/α} }.

For 0 < α ≤ 1,

Γ(D_cx) = [−1, 1]^2 = D_cx.

Thus, by Lemma 4.2, D_cx ⊆ D∗; since D∗ ⊆ D_cx always holds, D∗ = D_cx = [−1, 1]^2.
Note: For the case α = 1, any subset D of [−1, 1]^2 of the form { |x_1| ≤ 1, |x_2| ≤ t } for some 0 ≤ t ≤ 1 satisfies D = Γ(D). Thus, M is a contraction on the Banach space B(D), and V = M(V) has a solution (fixed point) V_D in B(D). This shows that the equation V = M(V) may have many solutions.
For 1 < α ≤ 1 + β,

Γ(D_cx) = { |x_1| ≤ 1, |x_2| ≤ 1/α },
Γ^2(D_cx) = Γ(Γ(D_cx)) = { |x_1| ≤ 1, |x_2| ≤ 1/α^2 },
...
Γ^k(D_cx) = { |x_1| ≤ 1, |x_2| ≤ 1/α^k };

thus, D∗ = lim_{k→∞} Γ^k(D_cx) = { |x_1| ≤ 1, x_2 = 0 } ≠ D_cx.
For α > 1 + β,

Γ(D_cx) = { |x_1| ≤ (1 + β)/α, |x_2| ≤ 1/α },
Γ^2(D_cx) = { |x_1| ≤ (1 + β + αβ)/α^2, |x_2| ≤ 1/α^2 },
...
Γ^k(D_cx) = { |x_1| ≤ (1 + β + αβ + ... + α^{k−1}β)/α^k, |x_2| ≤ 1/α^k };

thus, D∗ = lim_{k→∞} Γ^k(D_cx) = { |x_1| ≤ β/(α − 1), x_2 = 0 } ≠ D_cx.
Remarks:
(i) In some cases (α ≤ 1), D∗ = D_cx, which can be obtained directly.
(ii) In some cases (α > 1), infinitely many iterations are required to reach D∗.
(iii) Suppose the cutting plane algorithm generates only one-sided cuts in the case α > 1, e.g., only the cuts x_2 ≥ −1/α^k; then, D^k → { |x_1| ≤ 1, 0 ≤ x_2 ≤ 1 } ≠ D∗.
(iv) If we can directly generate the cut x2 ≥ 0 instead of infinitely many cuts
{x2 ≥ −1/αk : k = 1, 2, . . .}, then we can construct D∗ in 2 iterations for the
problem with 1 < α ≤ 1+β and 4 iterations for the problem with α > 1+β.
4.3 Generating the deepest cut
A cut generated by (4.3) only cuts off points in D^k \ Γ(D^k); i.e., it does not cut off any point of Γ(D^k). As shown in Example 1, the cutting plane method using such cuts may fail to approximate D∗ even with infinitely many iterations. In order to fulfill the goal in Remark (iv), we must find a deeper cut (a smaller f^{k+1}), which cuts into Γ(D^k), hopefully reaching the boundary of D∗.
Given a set D, suppose that we have obtained a cut d^T x ≤ t_0 which cuts off a portion of D \ Γ(D) (but does not cut into Γ(D)). Denote D_t := D ∩ { d^T x ≤ t }. If the plane d^T x = t_0 has touched the boundary of Γ(D), then no point of D_{t_0} \ Γ(D) can be cut off by any cut of the form d^T x ≤ t for arbitrary t. However, since Γ(D_{t_0}) is smaller than Γ(D), a portion of D_{t_0} \ Γ(D_{t_0}) may be cut off by some cut d^T x ≤ t. We wish to find t̄ < t_0 such that no point of D_{t̄} \ Γ(D_{t̄}) can be cut off by any cut of the form d^T x ≤ t. In other words, the plane d^T x = t̄ touches the boundary of Γ(D_{t̄}) (it is a supporting plane of Γ(D_{t̄})). Thus, for any t > t̄, the plane d^T x = t does not touch Γ(D_t), and, for any t ≤ t̄, the half-space d^T x ≥ t intersects Γ(D_t). The latter means that there exists x ∈ Γ(D_t) satisfying d^T x ≥ t, which suggests determining t̄ by the following linear program:

t̄ = max { t | d^T x ≥ t, x ∈ Γ(D_t) }.   (4.5)

Because d^T x ≤ t_0 does not cut into Γ(D) (hence does not cut into Γ(D_t)), there exists no x satisfying d^T x ≥ t and x ∈ Γ(D_t) if t > t_0. This implies that t̄ ≤ t_0. Therefore, the cut d^T x ≤ t̄ is deeper than the cut d^T x ≤ t_0.
On the other hand, the following lemma guarantees that the cut d^T x ≤ t̄ does not cut off any point of D∗.

Lemma 4.4 Suppose that D∗ ⊂ D. Let t̄ be the optimal objective value of problem (4.5); then, d^T x ≤ t̄ is satisfied by all x ∈ D∗.
Proof. Let

x∗ = argmax{ d^T x | x ∈ D∗ },  t∗ = d^T x∗.

Because x∗ ∈ D∗, there exist { (x_t, y_t) : t = 0, 1, 2, ... } such that

V∗(x∗) = E[ Σ_{t=0}^∞ δ^t c(x_t, y_t) ] < ∞,
x_{t+1} = A x_t + B y_t + b,  x_0 = x∗.
For any integer K ≥ 0, let x̄_l = x_{l+K} and ȳ_l = y_{l+K}. Because E[ Σ_{t=K}^∞ δ^{t−K} c(x_t, y_t) ] < ∞, we have

V∗(x_K) ≤ E[ Σ_{l=0}^∞ δ^l c(x̄_l, ȳ_l) ] < ∞.

Therefore, x_K ∈ D∗.
Now y_0 ∈ D_c(x∗) follows from c(x∗, y_0) < ∞, and y_0 ∈ G(x∗, D∗) follows from x_1 = A_i x∗ + B_i y_0 + b_i ∈ D∗ for every i = 1, ..., L. Thus x∗ ∈ Γ(D∗). Because D∗ ⊆ D and D∗ ⊆ { x : d^T x ≤ t∗ }, we have D∗ ⊆ D_{t∗}. Therefore, x∗ ∈ Γ(D_{t∗}). This, together with d^T x∗ = t∗, shows that (x∗, t∗) is a feasible point of (4.5); thus t∗ ≤ t̄, from which the claim of the lemma follows.
The following example shows the effect of the deepest cut.
Example 1 (Continued)
Consider α > 1 and a cut of the form −x_2 ≤ t for some t ∈ R to be determined, so d = (0, −1)^T in (4.5). Start with D = D_cx = [−1, 1]^2; then

D_t = { x ∈ [−1, 1]^2 : −x_2 ≤ t }.

Linear program (4.5) is

t̄ = max t
    s.t. −x_2 ≥ t,
         −1 ≤ αx_1 + y ≤ 1,
         −t ≤ αx_2 ≤ 1,
         −β ≤ y ≤ β.

A feasible solution must satisfy

−t/α ≤ x_2 ≤ −t.

This can only be satisfied when t ≤ 0, since α > 1. Thus, t̄ = 0. The cut −x_2 ≤ 0 reaches the bottom of D∗. With one more cut from above (d = (0, 1)^T), D∗ is completely determined for the case 1 < α ≤ 1 + β.
For the case α > 1 + β, suppose d = (1, 0)^T. Then

D_t = { x ∈ [−1, 1]^2 : x_1 ≤ t }.

Linear program (4.5) is

t̄ = max t
    s.t. x_1 ≥ t,
         −1 ≤ αx_1 + y ≤ t,
         −1 ≤ αx_2 ≤ 1,
         −β ≤ y ≤ β.

Feasible solutions must satisfy

t ≤ x_1 ≤ (t + β)/α.

This implies

t ≤ β/(α − 1).

Thus t̄ = β/(α − 1); so we obtain a cut x_1 ≤ β/(α − 1), which cuts exactly to the boundary of D∗ on the right. One more cut from the left (d = (−1, 0)^T) completely determines D∗; hence, in total, we need only 4 cuts.
5 Examples
5.1 Infinite horizon portfolio
As noted in the introduction, this work was motivated by solving infinite horizon
investment problems that face long-enduring institutions. We will demonstrate
how the algorithm performs on a small example where an infinite horizon optimum
can be found analytically (as done, for example, in [17] and [12]). The goal is to
maximize the discounted expected utility of consumption over an infinite horizon.
The decisions in each period are how much to consume and how much to invest
in a risky asset (or in a variety of assets).
The state variable x in this case corresponds to wealth or the current market
value of all assets. The control variable y has two components, y1, which corre-
sponds to consumption, and y2, which corresponds to the amount invested in a
risky asset with random return ξ. The assumption in this model is that any re-
maining funds, after consuming y1 and investing y2 in the risky asset, are invested
in a riskfree asset (e.g., U.S. Treasury bills) with known rate of return r. For this
model, c(x, y) is either ∞ if x < 0 or −yγ1/γ, for some non-zero parameter γ < 1,
giving (the negative of) the common utility function with constant relative risk
aversion (i.e., such that risk preferences do not depend on the level of wealth).
With these assumptions, M(V) takes the following form (for x ≥ 0):

M(V)(x) = min_y { −y_1^γ/γ + δ Σ_{i=1}^L p_i V((1 + r)x − (1 + r)y_1 + (ξ_i − r)y_2) },   (5.6)

where ξ_i is the i-th realization of the random return, with probability p_i. The solution V∗ of M(V) = V can be found analytically by observing that the optimal value function is proportional to x^γ (by, for example, considering the limiting case
of a finite horizon problem). We then have that

Σ_{i=1}^L p_i V((1 + r)x − (1 + r)y_1 + (ξ_i − r)y_2)
  = −K Σ_{i=1}^L p_i ((1 + r)x − (1 + r)y_1 + (ξ_i − r)y_2)^γ
  = −K (x − y_1)^γ Σ_{i=1}^L p_i ((1 + r) + (ξ_i − r)[y_2/(x − y_1)])^γ
  = −K (x − y_1)^γ Σ_{i=1}^L p_i ((1 + r) + (ξ_i − r)z)^γ,

where z = y_2/(x − y_1) is the fractional risky investment after consuming y_1 and K is some positive constant. The optimal z∗ then must solve

Σ_{i=1}^L p_i γ (ξ_i − r) ((1 + r) + (ξ_i − r)z)^{γ−1} = 0,   (5.7)
which is independent of y1. With V ∗ = −∑L
i=1 pi((1 + r) + (ξi − r)z∗)γ , optimal
y∗1 now must solve
−yγ−11 + δγKV ∗(x− y1)
γ−1 = 0 (5.8)
or
y∗1 = x(δγKV ∗)1/(γ−1)
1 + (δγKV ∗)1/(γ−1)= xw∗, (5.9)
for an optimal consumption fraction w∗. The last step is to find K from M(V) = V, using

−x^γ ((w∗)^γ/γ + δKV∗(1 − w∗)^γ) = M(V)(x) = V(x) = −Kx^γ   (5.10)

to obtain K = ((δV∗)^{1/(γ−1)} − 1)^{γ−1} / (δγV∗).
While this function can be found explicitly (up to solving the nonlinear equa-
tion (5.7)), various other constraints on investment (such as transaction costs and
limits on consumption changes from period to period) make analytical solutions
impossible. We use the analytical solution here to observe Algorithm 3.1's
performance and convergence behavior. We also use the analytical solution to derive
initial upper bounding approximations. For our test, we use the linear supports of
V ∗ at x = 0.1 and x = 10 as initial cuts and restrict our search in x to the interval
[0.1, 10], although the feasible region is unbounded.
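Up to the root of (5.7), the closed form above is easy to evaluate numerically. The sketch below is our illustration, not the paper's code: γ, r, and δ = 1/1.25 match the reported test values, but the three-point return distribution is a made-up stand-in for the lognormal discretization used in the experiments.

```python
# Illustrative data: gamma, r, delta match the paper's test values; the
# three-point return distribution is a made-up stand-in for the paper's
# lognormal discretization (mean 0.08, s.d. 0.4).
gamma, r, delta = 0.03, 0.05, 1 / 1.25
xi = [-0.20, 0.08, 0.40]          # risky-return realizations
p  = [0.25, 0.50, 0.25]           # their probabilities

def foc(z):
    # Left-hand side of the first-order condition (5.7).
    return sum(pi * gamma * (e - r) * ((1 + r) + (e - r) * z) ** (gamma - 1)
               for pi, e in zip(p, xi))

# Bisection for z*; on [0, 4] every term (1+r) + (xi_i - r)z stays positive,
# foc(0) > 0 (positive mean excess return) and foc(4) < 0.
lo, hi = 0.0, 4.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if foc(mid) > 0:
        lo = mid
    else:
        hi = mid
z_star = 0.5 * (lo + hi)

# Scalar V*, consumption fraction w* from (5.9), and K from (5.10).
V_star = sum(pi * ((1 + r) + (e - r) * z_star) ** gamma for pi, e in zip(p, xi))
a = (delta * V_star) ** (1 / (gamma - 1)) - 1
w_star = a / (1 + a)
K = a ** (gamma - 1) / (delta * gamma * V_star)
# The optimal value function is then V(x) = -K * x**gamma.
```

With the actual lognormal discretization, K works out to the value 155.6 reported below; with these stand-in data it lands in the same neighborhood.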
For our test, we used γ = 0.03, r = 0.05, and ξi chosen as a discrete approxima-
tion of the lognormal return distribution with mean return of 0.08 and standard
[Plot omitted; legend: V^0, V^5, V^10, V^20, V^50, V^100, V∗.]
Figure 1: Value function approximations for the portfolio example, δ = 1/1.25. The horizontal
axis shows the state variable x and the vertical axis the value function V.
deviation of 0.4. Algorithm 3.1 was implemented in MATLAB using fmincon to
solve the optimization subproblems and a linesearch to find x in Step 2. We tried
different values for the discount factor δ. The results for δ = 1/1.25 appear in Figure
1, which includes V^0, V^5, V^10, V^20, V^50, V^100, and V∗. In this case, after 100
iterations, the approximation almost perfectly matches the true infinite-horizon
value function.
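For reference, a single evaluation of the operator in (5.6) over a cut-based approximation — the subproblem solved repeatedly inside Algorithm 3.1 — might be sketched as follows, with scipy's minimize standing in for fmincon. The cut data and the bounds on (y_1, y_2) are illustrative assumptions, not the paper's implementation.

```python
from scipy.optimize import minimize

gamma, r, delta = 0.03, 0.05, 1 / 1.25
xi = [-0.20, 0.08, 0.40]; p = [0.25, 0.50, 0.25]   # made-up 3-point returns
cuts = [(-4.7, -150.9), (-0.5, -161.7)]            # made-up (slope, intercept) cuts

def V(w):
    # Upper-bounding approximation by linear supports (cuts).
    return max(a * w + c for a, c in cuts)

def M_of_V(x):
    def obj(y):
        y1, y2 = y
        # Next-period wealth in each return scenario, as in (5.6).
        wealth = [(1 + r) * x - (1 + r) * y1 + (e - r) * y2 for e in xi]
        return -y1 ** gamma / gamma + delta * sum(
            pi * V(w) for pi, w in zip(p, wealth))
    # Consume part of current wealth, invest a nonnegative amount in the
    # risky asset; these bounds are an illustrative restriction.
    res = minimize(obj, x0=[0.2 * x, 0.2 * x],
                   bounds=[(1e-6, x), (0.0, 5.0 * x)])
    return res.fun
```

One such evaluation, plus a cut at the minimizer, is what each iteration of the algorithm adds to the approximation.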
With a larger δ, the value of K increases rapidly as (δV∗)^{1/(γ−1)} approaches
one. For δ = 1/1.25, K is 155.6, while, for δ = 1/1.07, K is 466.3. The result is that
larger δ values (corresponding to lower discount rates) require additional iterations
of Algorithm 3.1 to approach V∗. The results for the same data as in Figure 1,
except with δ = 1/1.07, appear in Figure 2. After 500 iterations, the approximating
V^500 agrees relatively well with V∗, as shown in the figure, but has not converged to
nearly the same accuracy as the approximations (with fewer iterations) in Figure
1.
For this example, Algorithm 3.2 gains a notable speed-up. For the instance
with δ = 1/1.25, it requires only 25 iterations to obtain the approximating V^25, which
[Plot omitted; legend: V^0, V^10, V^50, V^500, V∗.]
Figure 2: Value function approximations for the portfolio example, δ = 1/1.07. The horizontal
axis shows the state variable x and the vertical axis the value function V.
is as good as the V^100 obtained by Algorithm 3.1. For the case of δ = 1/1.07, Algorithm
3.2 requires only about 1/4 of the iterations needed by Algorithm 3.1. Moreover,
Algorithm 3.2 can generate more accurate approximating value functions.
Understanding the numerical behavior of Algorithm 3.1 and finding mechanisms
to speed convergence should be topics for further investigation. Comparing
V^k to V∗ in the figures shows how the algorithm forces a closer approximation in some
areas of the curve than in others. The behavior of the algorithm is generally to move
along the curve V^k to create tighter cuts and then to repeat that process, with
less improvement, on a new sequence of iterations. These observations suggest
that procedures with multiple cut generation and tightening tolerances should be
considered for accelerating convergence.
5.2 Quadratic-linear stochastic control problems
This example is intended to test the algorithm on high-dimensional problems.
Quadratic-linear stochastic control problems are well known; we use these
examples following Section 6.5 of [2]. The value function in a quadratic-linear
stochastic control problem is defined as follows:
V∗(x) = min_{y_0,y_1,...} E_{ξ_0,ξ_1,...} ∑_{t=0}^∞ δ^t [x_t^T Q x_t + y_t^T R y_t]
        s.t. x_{t+1} = A x_t + B y_t + b, for t = 0, 1, 2, . . . ,
             x_0 = x.
In the above expression, x_t ∈ ℝ^n, y_t ∈ ℝ^m, and the matrices A, B, Q, R are
given and have appropriate dimensions. We assume that Q is a symmetric positive
semidefinite matrix and that R is also symmetric and positive definite. The
disturbances ξ_t = b form independent random vectors with given probability distributions
that do not depend on x_t, y_t. Furthermore, the vector b has zero mean
and finite second moment. The controls yt are unconstrained. The problem above
represents a popular formulation of a regulation problem whereby we desire to
keep the state of the system close to the origin. Such problems are common in the
theory of automatic control of a motion or a process. For computational purposes,
we assume ξ = b is a discrete random vector with p_i = Prob(ξ = b_i), i = 1, . . . , L.
The value function V∗ is the solution of M(V) = V, where the mapping M is
determined by

M(V)(x) = min_y { x^T Q x + y^T R y + δ ∑_{i=1}^L p_i V(Ax + By + b_i) }.
We choose the quadratic-linear stochastic control problem because its exact
solution on ℝ^n can be computed. In the following, to be consistent with standard
notation for these problems, we re-use the notation K and c as a matrix and a constant,
respectively. To find the exact solution for V∗, we first compute a symmetric
positive semi-definite matrix K by solving the algebraic matrix Riccati equation:

K = A^T [δK − δ^2 KB(δB^T KB + R)^{−1} B^T K] A + Q.
Then, the exact value function V∗ is given by

V∗(x) = x^T Kx + c,

where

c = (δ/(1 − δ)) ∑_{i=1}^L p_i b_i^T K b_i.
The above formula determines the exact value function on ℝ^n; however, a
numerical method can only determine a value function on a bounded set. In this
example, we will find the value function on the box D∗ = [−t, t]^n. The solutions of
M(V) = V on B(ℝ^n) and B([−t, t]^n) need not be the same; however, for reasonably
large t, the two solutions should be close. Thus, we still use the exact value function
on B(ℝ^n) as the reference for numerically generated value functions.
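The computation just described can be sketched by iterating the discounted Riccati recursion to a fixed point. The instance below (A, B, Q, R, and the disturbance support) is a small made-up example, not one of the paper's test instances:

```python
import numpy as np

delta = 0.9
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.eye(2)
Q = np.eye(2)
R = 0.5 * np.eye(2)
bs = [np.array([0.1, -0.1]), np.array([-0.1, 0.1])]   # zero-mean support of b
ps = [0.5, 0.5]

def riccati_rhs(K):
    # Right-hand side of K = A'[dK - d^2 K B (d B'KB + R)^{-1} B'K] A + Q.
    inner = delta * K - delta ** 2 * K @ B @ np.linalg.inv(
        delta * B.T @ K @ B + R) @ B.T @ K
    return A.T @ inner @ A + Q

# Fixed-point (value) iteration from K = 0; converges for delta < 1 when
# the discounted system is stabilizable.
K = np.zeros((2, 2))
for _ in range(2000):
    K_next = riccati_rhs(K)
    if np.max(np.abs(K_next - K)) < 1e-12:
        K = K_next
        break
    K = K_next

c = delta / (1 - delta) * sum(pi * b @ K @ b for pi, b in zip(ps, bs))
# Exact value function: V*(x) = x' K x + c
```

The quadratic V∗ produced this way then serves as the reference against which the cut-based approximations are measured.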
Finding a point x which maximizes M(V)(x) − V(x) is computationally very
expensive. The method for minimizing a d.c. function described in Appendix 1
can be applied to finding x; however, it is difficult to program and computationally
expensive. Thus, we use the following approach. We randomly sample J
points, x_j, j = 1, . . . , J (e.g., J = 5). At each point x_j, we evaluate M(V)(x_j)
and compute the gradient ξ_j of M(V) (or a subgradient if M(V) is not smooth),
which generates at that point a supporting plane H_j of M(V). Then we find a
point x̄_j by maximizing H_j(x) − V(x), which can be viewed as a local approximation
of M(V)(x) − V(x). Suppose that the current approximate value function is

V(x) = max{Q_i x + q_i : i = 1, . . . , p},  x ∈ [−t, t]^n.

Using the supporting plane formula from Subsection 3.2, we have

H_j(x) = M(V)(x_j) + ξ_j(x − x_j).

Thus, maximizing H_j(x) − V(x) can be formulated as a linear program

σ_j = max{ M(V)(x_j) + ξ_j(x − x_j) − θ : x ∈ [−t, t]^n, Q_i x + q_i ≤ θ, i = 1, . . . , p };

then, x is chosen as the point among x̄_j, j = 1, . . . , J, with the largest σ_j.
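For a single sampled point, this selection step might be sketched as follows, with scipy's HiGHS LP solver standing in for whatever LP code is used; the cut data and the values of M(V)(x_j) and ξ_j are made up:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data: n = 2, box [-t, t]^n, and p = 3 cuts V(x) = max_i Q_i x + q_i.
n, t = 2, 5.0
Qc = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0]])   # cut gradients Q_i
qc = np.array([0.0, -1.0, 0.5])                         # cut intercepts q_i

def candidate(M_xj, xi_j, x_j):
    """Solve  sigma_j = max { M(V)(x_j) + xi_j(x - x_j) - theta :
                              x in [-t,t]^n,  Q_i x + q_i <= theta }
    as an LP in the variables (x, theta)."""
    c = np.concatenate([-xi_j, [1.0]])                  # minimize -xi_j x + theta
    A_ub = np.hstack([Qc, -np.ones((len(qc), 1))])      # Q_i x - theta <= -q_i
    res = linprog(c, A_ub=A_ub, b_ub=-qc,
                  bounds=[(-t, t)] * n + [(None, None)], method="highs")
    sigma = -res.fun + M_xj - xi_j @ x_j                # add back the constant
    return sigma, res.x[:n]

x_j = np.array([0.5, 0.5])
sigma, x_bar = candidate(M_xj=2.0, xi_j=np.array([0.2, -0.3]), x_j=x_j)
# x would then be the x_bar with the largest sigma over the J sampled points.
```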
We implemented the successive linear approximation method (SLAM) for the
quadratic-linear stochastic control problem in various dimensions and summarize
the results in the tables below. The accuracy of an approximate solution relative
to the exact solution is measured by the relative error

‖V∗ − V‖_{D∗} / ‖V∗‖_{D∗}.
Although the exact value function V∗(x) = x^T Kx + c is known, computing the
approximation error is still difficult due to the high dimension of D∗ = [−t, t]^n.
To estimate the error, we generate a sample of 5000 points, plus points on the line
of the eigenvector corresponding to the maximum eigenvalue of K, and estimate
‖f‖_{D∗} by max{|f(x)| : for all sample points x}. This sample size is generally
small for high-dimensional problems, causing potential inaccuracy in the errors
shown in the table; the results should nonetheless indicate the general trend. For
each dimension, we computed 3 instances. The errors shown in the table are the
averages over the 3 instances.
                    Number of Iterations
  (n,m)        10       25       50       100      200
  (1,1)      0.3540   0.1238   0.0473   0.0125   0.0037
  (2,2)      0.1680   0.0750   0.0466   0.0191   0.0105
  (4,2)      0.1444   0.0649   0.0541   0.0309   0.0196
  (4,4)      0.1140   0.0512   0.0380   0.0190   0.0153
  (10,5)     0.1311   0.1255   0.0785   0.0499   0.0455
  (10,10)    0.1429   0.1403   0.0868   0.0751   0.0670

                    Relative Errors
The experiments show that fairly good solutions can be obtained for low-dimensional
instances. For the high-dimensional cases, errors are reduced rapidly in
the first several iterations but show no clear further reduction between the initial
iterations and the 200th iteration. We display only the first iterations for the examples
of dimensions (n,m) = (20, 20) and (50, 50).
  # Iterations     1        2        3        4        6        8       200
  (20,20)       1.0000   0.3935   0.3926   0.1383   0.1221   0.1159   0.1126
  (50,50)       1.0000   0.7774   0.3442   0.1794   0.0939   0.0925   0.0925

                    Relative Errors
A positive aspect of these tests is that the method appears to work well for high-dimensional
problems in the sense that it can find a relatively good solution, with
relative errors reduced to around 0.1, in the first few iterations. Considering that
a direct state-discretization method cannot work for a 20-dimensional
problem (it would require q^20 points if each dimension is discretized into q
points), our method can be considered a useful alternative for solving high-dimensional
stochastic dynamic programs for which the curse of dimensionality is
a major concern.
The negative result from this example is that, after a few iterations, the method
converges very slowly, in particular for high-dimensional problems. This can be
seen from the examples of dimensions (n,m) = (20, 20) and (50, 50). Useful cuts
continue to be added in these examples but, while they reduce local errors around the
iteration points, errors in other portions of the domain remain largely unaffected.
6 Observations and future issues
We have described an algorithm for solving a general class of discrete-time convex
infinite horizon optimization problems and demonstrated the method on two
simple examples. As the first example demonstrates, many iterations may be
required to achieve accuracy for highly nonlinear value functions. The second example,
however, demonstrates that the approximations converge very quickly at the
beginning and can reach a relative error of around 0.1, even for high-dimensional
problems.
Since each iteration involves additional optimization steps, selection of x (or
potentially multiple points on each iteration) has a critical effect on performance.
We mentioned the d.c. methods as one possibility for finding good points, but
other methods that require fewer function evaluations may also be useful. In the
second example, we select x from a small sample, and the results are encouraging.
More numerical experiments are needed to determine whether this selection is
effective in general, and the effect of different options also requires further study.
Our method relies on identification of the feasible domain, D∗, which leaves us
with the following questions:
(i) Under what condition is D∗ a polytope?
(ii) Can Algorithm 4.1 terminate in a finite number of iterations if D∗ is a
polytope?
(iii) How can one modify the algorithm if D∗ is not a polytope?
These questions and further implementation issues are additional subjects for
future research.
7 Appendix
Appendix 1
This appendix describes a method for finding a minimizer of the problem

min_{x ∈ D∗} V(x) − M(V)(x),

where both V and M(V) are convex. This problem is equivalent to

min { x_{n+1} : V(x) − M(V)(x) − x_{n+1} ≤ 0, x ∈ D∗ },

which is equivalent to

min { x_{n+1} : V(x) − x_{n+1} − x_{n+2} ≤ 0, M(V)(x) − x_{n+2} ≥ 0, x ∈ D∗ }.

Suppose that

V(x) = max{Q_i x + q_i : i = 1, . . . , k},  ∀x ∈ D∗ ⊂ ℝ^n;

then, the above problem is equivalent to

min { x_{n+1} : Q_i x + q_i − x_{n+1} − x_{n+2} ≤ 0, i = 1, . . . , k, x ∈ D∗, M(V)(x) − x_{n+2} ≥ 0 }.
The first k + 1 constraints define a polyhedral set, denoted by D. (In order to
use the algorithm described in [6], D should be bounded; this can be arranged by
adding appropriate lower and upper bounds on x_{n+1} and x_{n+2} without changing
the solution of the minimization problem.) The function in the (k+2)-nd constraint
is convex; denote it by g. Such a program can be solved by Algorithm 4.1 in [6].
Here we describe it briefly. The algorithm solves

min { c^T x : x ∈ D, g(x) ≥ 0 }.   (7.1)
Initialization:
Solve min{c^T x : x ∈ D} to obtain x_0 ∈ D. Assume g(x_0) < 0 (otherwise,
x_0 is an optimal solution to (7.1)). Construct a simple polytope S_0, e.g., a simplex,
containing D, such that x_0 is a vertex of S_0; compute the vertex set V(S_0) of S_0.
Set j ← 0.
Iterations:
Choose z ∈ V(S_j) satisfying g(z) = max{g(x) : x ∈ V(S_j)}.
If g(z) = 0 and z ∈ D, then stop; z is an optimal solution.
Otherwise, apply the simplex algorithm with starting vertex z to solve min{c^T x :
x ∈ S_j} until an edge [u, v] of S_j is found such that g(u) ≥ 0, g(v) < 0, and
c^T v < c^T u; compute the intersection point s of [u, v] with {x : g(x) = 0}.
If s ∈ D, then S_{j+1} ← S_j ∩ {x : c^T x ≤ c^T s}.
If s ∉ D, then S_{j+1} ← S_j ∩ {x : l(x) ≤ 0}, where l(x) ≤ 0 is one of the linear
constraints defining D satisfying l(s) > 0.
Set j ← j + 1 and repeat.
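As one concrete piece of this scheme, the intersection point s of the edge [u, v] with {x : g(x) = 0} can be computed by bisection on the segment parameter, since g(u) ≥ 0 > g(v) and g is convex along the edge. The helper below is our own sketch, not code from [6]:

```python
def edge_intersection(g, u, v, tol=1e-10):
    """Return s on the segment [u, v] with g(s) ~= 0, given g(u) >= 0 > g(v).

    g is continuous along the segment, so bisection on the parameter
    lam (with s = u + lam * (v - u)) brackets the zero crossing."""
    point = lambda lam: [ui + lam * (vi - ui) for ui, vi in zip(u, v)]
    lo, hi = 0.0, 1.0                     # g(point(lo)) >= 0 > g(point(hi))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(point(mid)) >= 0:
            lo = mid
        else:
            hi = mid
    return point(0.5 * (lo + hi))
```

Because g is convex, it changes sign exactly once on such an edge, so the bisection locates the unique crossing.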
Appendix 2
Define the function ρ = M(V) by

ρ(x) = min_{y,θ} { c(x, y) + δ ∑_j p_j θ_j : Q z_j + q ≤ θ_j e, F z_j ≤ f,
                   z_j = A_j x + B_j y + b_j, j = 1, . . . , L },

and let (ȳ, θ̄) and (λ̄, μ̄) be an optimal solution and multiplier of the above minimization
problem at x = x̄. We will show that

ξ = ζ̄_x + ∑_{j=1}^L (λ̄_j^T Q A_j + μ̄_j^T F A_j)

is a subgradient of ρ at x̄, where ζ̄ = (ζ̄_x, ζ̄_y) is a subgradient of c at (x̄, ȳ).
We can write

ρ(x) = min_{y,θ} { c(x, y) + δ ∑_{j=1}^L p_j θ_j : Q(A_j x + B_j y + b_j) + q ≤ θ_j e,
                   F(A_j x + B_j y + b_j) ≤ f, j = 1, . . . , L }
     = max { h(λ, μ; x) : λ_j^T e = δ p_j, j = 1, . . . , L, λ ≥ 0, μ ≥ 0 },   (7.2)

where λ = (λ_1, . . . , λ_L), μ = (μ_1, . . . , μ_L), and

h(λ, μ; x) = min_y c(x, y) + ∑_{j=1}^L [λ_j^T (Q(A_j x + B_j y + b_j) + q)
             + μ_j^T (F(A_j x + B_j y + b_j) − f)].   (7.3)
Let (λ̄, μ̄) be the optimal solution of the problem (7.2) with x = x̄. Then

ρ(x̄) = h(λ̄, μ̄; x̄).

The necessary and sufficient condition for ȳ to be an optimal solution of the
problem (7.3) (given (λ̄, μ̄; x̄)) is that there exists a subgradient ζ̄ = (ζ̄_x, ζ̄_y) of c
at (x̄, ȳ) such that

ζ̄_y + ∑_{j=1}^L [λ̄_j^T Q B_j + μ̄_j^T F B_j] = 0.   (7.4)
For fixed (λ, μ) = (λ̄, μ̄) and for any x, denote by y_x an optimal solution of (7.3).
Then,

ρ(x) = max_{λ,μ} { h(λ, μ; x) : λ_j^T e = δ p_j, j = 1, . . . , L, λ ≥ 0, μ ≥ 0 }
     ≥ h(λ̄, μ̄; x)
     = c(x, y_x) + ∑_{j=1}^L [λ̄_j^T (Q(A_j x + B_j y_x + b_j) + q)
       + μ̄_j^T (F(A_j x + B_j y_x + b_j) − f)].

Because c is convex,

c(x, y) ≥ c(x̄, ȳ) + ζ̄_x(x − x̄) + ζ̄_y(y − ȳ).

Note that y_{x̄} = ȳ and

ρ(x̄) = h(λ̄, μ̄; x̄)
      = c(x̄, ȳ) + ∑_{j=1}^L [λ̄_j^T (Q(A_j x̄ + B_j ȳ + b_j) + q)
        + μ̄_j^T (F(A_j x̄ + B_j ȳ + b_j) − f)].
Thus,

ρ(x) ≥ ρ(x̄) + ζ̄_x(x − x̄) + ζ̄_y(y_x − ȳ)
       + ∑_{j=1}^L [λ̄_j^T (Q A_j (x − x̄) + Q B_j (y_x − ȳ)) + μ̄_j^T (F A_j (x − x̄) + F B_j (y_x − ȳ))]
     = ρ(x̄) + { ζ̄_x + ∑_{j=1}^L (λ̄_j^T Q A_j + μ̄_j^T F A_j) } (x − x̄),

where the last equality uses (7.4). The above inequality shows that

ξ = ζ̄_x + ∑_{j=1}^L (λ̄_j^T Q A_j + μ̄_j^T F A_j)

is a subgradient of ρ at x̄.
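When c is linear, this subgradient can be recovered numerically from LP dual multipliers. The instance below (scalar state and control, two scenarios, c(x, y) = y, cuts ±z ≤ θ so that V(z) = |z|, and feasibility constraints |z| ≤ 10) is entirely made up; we also assume scipy's HiGHS interface, whose res.ineqlin.marginals are the negated nonnegative multipliers of the ≤-constraints:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: c(x, y) = y, two scenarios z_j = x + y + b_j, delta = 0.8.
delta, p, b = 0.8, [0.5, 0.5], [1.0, -1.0]

def rho_and_subgrad(x):
    c = np.array([1.0, delta * p[0], delta * p[1]])   # min y + delta sum p_j th_j
    A_ub = np.array([
        #  y    th1   th2
        [ 1.0, -1.0,  0.0],    #  z_1 <= theta_1
        [-1.0, -1.0,  0.0],    # -z_1 <= theta_1
        [ 1.0,  0.0, -1.0],    #  z_2 <= theta_2
        [-1.0,  0.0, -1.0],    # -z_2 <= theta_2
        [ 1.0,  0.0,  0.0],    #  z_1 <= 10
        [-1.0,  0.0,  0.0],    # -z_1 <= 10
        [ 1.0,  0.0,  0.0],    #  z_2 <= 10
        [-1.0,  0.0,  0.0],    # -z_2 <= 10
    ])
    ax = np.array([1, -1, 1, -1, 1, -1, 1, -1], float)  # coefficient of x per row
    rhs0 = np.array([-b[0], b[0], -b[1], b[1],
                     10 - b[0], 10 + b[0], 10 - b[1], 10 + b[1]])
    res = linprog(c, A_ub=A_ub, b_ub=rhs0 - ax * x,
                  bounds=[(None, None)] * 3, method="highs")
    mults = -res.ineqlin.marginals     # nonnegative multipliers (lambda, mu)
    # With zeta_x = 0 (c does not depend on x) and A_j = 1, the formula
    # xi = zeta_x + sum_j (lam_j' Q A_j + mu_j' F A_j) reduces to mults @ ax.
    xi = float(mults @ ax)
    return res.fun, xi

rho0, xi = rho_and_subgrad(0.0)
rho1, _ = rho_and_subgrad(0.5)
# Subgradient inequality: rho1 >= rho0 + xi * 0.5
```

On this made-up instance ρ(x) = −1.8 − x near x = 0, so the recovered ξ is −1 and the subgradient inequality holds with equality.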
References
[1] H. Benitez-Silva, G. Hall, G.J. Hitsch, G. Pauletto, and J. Rust, A compar-
ison of discrete and parametric approximation methods for continuous-state
dynamic programming problems, Research report, Department of Economics,
Yale University, New Haven, CT, 2000 (also Working Paper No. 24, Com-
putation in Economics and Finance, Society for Computational Economics
2000).
[2] D.P. Bertsekas, Dynamic Programming and Stochastic Control, Academic
Press, 1976.
[3] D.P. Bertsekas, Dynamic Programming and Optimal Control, Volume II,
Athena Scientific, Belmont, MA, 1995.
[4] D.P. De Farias and B. Van Roy, The linear programming approach to approx-
imate dynamic programming, Operations Research, 51 (2003), pp. 850–865.
[5] S.D. Flam and R.J-B Wets, Existence results and finite horizon approximates
for infinite horizon optimization problems, Econometrica, 55 (1987), pp. 1187–
1209.
[6] R. Horst, P.M. Pardalos, and N.V. Thoai, Introduction to Global Optimiza-
tion, 2nd edition, Kluwer Academic Publishers, Dordrecht, 2000.
[7] R. Horst and H. Tuy, Global Optimization: Deterministic Approaches, 2nd
revised edition, Springer-Verlag, Heidelberg, 1993.
[8] R.C. Grinold, Finite horizon approximations of infinite horizon linear pro-
grams, Math. Program., 12 (1977), pp. 1–17.
[9] L.A. Korf, An approximation framework for infinite horizon stochastic dy-
namic optimization problems with discounted costs, Research report, De-
partment of Mathematics, Washington University, Seattle, WA, 2000.
[10] W.S. Lovejoy, A survey of algorithmic methods for partially observed Markov
decision processes, Ann. Oper. Res., 28 (1991), pp. 47–66.
[11] W.S. Lovejoy, Computationally feasible bound for partially observed Markov
decision processes, Oper. Res., 39 (1991), pp. 162–175.
[12] R.C. Merton, Lifetime portfolio selection under uncertainty: the continuous-
time case, The Review of Economics and Statistics, 51 (1969), pp. 247–257.
[13] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton,
NJ, 1970.
[14] R.T. Rockafellar and R.J-B Wets, Variational Analysis, Springer-Verlag,
Berlin, 1998.
[15] W. Rudin, Principles of Mathematical Analysis, 3rd edition, McGraw-Hill
Inc., 1984.
[16] J. Rust, A comparison of policy iteration methods for solving continuous-
state, infinite-horizon Markovian decision problems using random, quasi-
random, and deterministic discretizations, Working Paper, Yale University,
April 1997 (also in Computational Economics, Economics Working Paper
Archive at WUSTL).
[17] P.A. Samuelson, Lifetime portfolio selection by dynamic stochastic program-
ming, The Review of Economics and Statistics, 51 (1969), pp. 239–246.
[18] R.D. Smallwood and E.J. Sondik, The optimal control of partially observable
Markov processes over a finite horizon, Oper. Res., 21 (1973), pp. 1071–1088.