Successive Linear Approximation Solution
of Infinite Horizon Dynamic Stochastic
Programs
John R. Birge ∗ and Gongyun Zhao †
June 30, 2006
∗ The University of Chicago Graduate School of Business, Chicago, Illinois, USA ([email protected]). His work was supported in part by the National Science Foundation under Grant DMI-0200429 and by The University of Chicago Graduate School of Business.
† Department of Mathematics, National University of Singapore, Singapore ([email protected]). His work was supported in part by NUS Academic Research Grant R-146-000-057-112.
Abstract
Models for long-term planning often lead to infinite horizon stochastic pro-
grams that offer significant challenges for computation. Finite-horizon approxi-
mations are often used in these cases but they may also become computationally
difficult. In this paper, we directly solve for value functions of infinite horizon
stochastic programs. We show that a successive linear approximation method
converges to an optimal value function for the case with convex objective, linear
dynamics, and feasible continuation.
Keywords: stochastic programming, dynamic programming, infinite horizon, lin-
ear approximation, cutting planes
AMS Classification: 65K05, 90C15, 90C39, 91B28
1 Introduction
Many long-term planning problems can be expressed as infinite horizon stochastic
programs. The infinite horizon often arises because of uncertainty about any
specific endpoint (e.g., the lifetime of an individual or organization). Solving
such problems with multiple decision variables and random parameters presents
obvious computational difficulties. A common technique is to use a finite-horizon
approximation, but even these problems become quite difficult for practical size.
The approach in this paper is to assume stationary data and to solve for the
infinite horizon value function directly. A motivating example is an infinite hori-
zon portfolio problem, which involves decisions on amounts to invest in different
assets and amounts to consume over time. For simple cases, such as those in
Samuelson [17] for discrete-time and Merton [12] for continuous-time, optimality
conditions can be solved directly but general transaction costs and constraints
on consumption or investment require more complex versions of the form in the
infinite horizon problems considered here.
The problems considered here also relate to the dynamic programming liter-
ature (see, for example, Bertsekas [3]) and particularly to methods for solving
partially observed Markov decision processes (see the survey in Lovejoy [10]). Our
method is most similar to the piecewise linear construction in Smallwood and
Sondik [18] for finite horizons and the bounding approximations used in Lovejoy
[11] for both finite and infinite horizons. The main differences between our model
and those in [11] and [18] are that we do not assume a finite action space and
do not use finite horizon approximations. Our method also does not explicitly
find a policy or approximate the state space with a discrete grid; instead, we use
the convexity of the value function and the contraction properties of the dynamic
programming operator in a form of approximate value iteration.
Our approach also is similar to the linear programming approach to approx-
imate dynamic programming (LP-ADP, see, e.g., De Farias and Van Roy [4]),
but that approach again focuses on discrete state and action spaces and uses a
pre-selected set of linear basis functions to approximate the value function. Our
method uses linear support functions as an outer linearization in contrast to inner
linearization in the LP-ADP approach. Our support functions are also gener-
ated within the method and achieve arbitrary accuracy. To obtain this result,
our method takes advantage of continuity and convexity properties and, on all
iterations, maintains bounds not available in the LP-ADP framework.
In the next section, we describe the general problem setting. Section 3 describes
the algorithm and its convergence properties. Section 4 discusses the construction
of the value function domain as required for algorithm convergence. Section 5
describes two examples and implementation of the algorithm. Section 6 concludes
with a discussion of further issues.
2 Problem Setting
We seek to find the value function V∗ of the infinite horizon problem

V∗(x) = min_{y_0, y_1, ...} E_{ξ_0, ξ_1, ...} Σ_{t=0}^∞ δ^t c_t(x_t, y_t)   (2.1)
        s.t. x_{t+1} = A_t x_t + B_t y_t + b_t, for t = 0, 1, 2, ...,
             x_0 = x.
In this problem, ξ_t = (A_t, B_t, b_t) is the random data for stage t = 0, 1, 2, ...; A_t and B_t are matrices, with A_t square, and b_t is a vector. The equation x_{t+1} = A_t x_t + B_t y_t + b_t characterizes the transition of the state from stage t to t + 1, the dynamics of the problem. The controls (decision variables) are y_0, y_1, .... The state x_t and control y_t are continuous variables in certain finite-dimensional Euclidean spaces X and Y, respectively. The cost function c_t : X × Y → R ∪ {+∞} is a generalized
function. The static constraints in each stage, e.g., (xt, yt) ∈ Gt for some subset
Gt of X × Y , are not explicitly formulated. They are implicitly captured by the
effective domain of ct, say dom(ct) = Gt. The number 0 < δ < 1 is a discount
factor.
The above problem can be represented as

min_{y_0} { c_0(x_0, y_0) + δ E_{ξ_0} min_{y_1} { c_1(x_1, y_1) + δ E_{ξ_1} min_{y_2} { c_2(x_2, y_2) + ... } } }
s.t. x_{t+1} = A_t x_t + B_t y_t + b_t, for t = 0, 1, 2, ...,
x_0 = x.

Here E_{ξ_t} has the same meaning as E_{x_{t+1}}.
In this paper we consider a simple version of (2.1), in which c_t = c for all t and the (A_t, B_t, b_t) = (A, B, b), t = 0, 1, 2, ..., are independently and identically distributed random variables. For the presentation below, we assume that ξ = (A, B, b) is a discrete random vector with p_i = Prob(ξ = (A_i, B_i, b_i)), i = 1, ..., L. (The general algorithm does not require finitely many realizations, but practical implementations make this assumption necessary.) The equality constraints in (2.1) characterize the dynamics of the problem. The static constraints in each stage (e.g., y_t ≥ 0) are implicitly captured by the effective domain of the generalized function c.
The value function V∗ defined by (2.1) is a solution of V = M(V), where the map M (often called the dynamic programming operator) is defined by

M(V)(x) = min_y { c(x, y) + δ E_ξ V(Ax + By + b) }
        = min_y { c(x, y) + δ Σ_{i=1}^L p_i V(A_i x + B_i y + b_i) }.   (2.2)
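For a finite scenario set, M(V)(x) can be evaluated pointwise by solving the inner minimization in (2.2) numerically. The following sketch is illustrative only: the quadratic stage cost, the scenario data, and the cut representation of V are assumptions of ours, not data from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (our assumptions, not from the paper): 1-D state/control.
delta = 0.9
scenarios = [(0.5, 0.8, 1.0, 0.1),    # (p_i, A_i, B_i, b_i)
             (0.5, 1.1, 1.0, -0.1)]

def V(z, cuts):
    """Piecewise linear lower approximation V(z) = max_i (Q_i z + q_i)."""
    return max(Q * z + q for Q, q in cuts)

def cost(x, y):
    """A convex stage cost chosen only for illustration."""
    return 0.5 * y ** 2 + (x - y) ** 2

def M(x, cuts):
    """One evaluation of the dynamic programming operator (2.2) at x."""
    obj = lambda y: cost(x, y[0]) + delta * sum(
        p * V(A * x + B * y[0] + b, cuts) for p, A, B, b in scenarios)
    return minimize(obj, x0=[0.0], method="Nelder-Mead").fun

cuts = [(0.0, 0.0)]   # the trivial initial cut V^0 = 0
print(M(1.0, cuts))   # one application of M at x = 1
```

With the trivial cut, the expectation term vanishes, so M reduces to minimizing the stage cost alone; each later cut would tighten the continuation value.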
Note: The problem of finding a solution of V =M(V ) and the infinite horizon
problem (2.1) are different. The value function V ∗ defined by the infinite horizon
problem is a solution to V = M(V ); however, the equation V = M(V ) may have
many solutions. We will discuss this later. For the time being, we only need to
know that a solution of V = M(V ) is equal to V ∗ if the effective domain of the
solution coincides with dom(V ∗).
More precisely, let D∗ = dom(V∗) be compact and convex. Let B(D∗) be the Banach space of all functions finite on D∗, equipped with the norm ‖f‖_{D∗} = sup_{x∈D∗} |f(x)|.
Theorem 2.1 M is a contraction on B(D∗).
Proof. The proof can be found, e.g., in Theorem 3.8 of [9].
The above theorem ensures that the equation V =M(V ) has a unique solution
on B(D∗). Since V ∗ solves V =M(V ) and V ∗ ∈ B(D∗), V ∗ is the unique solution
of V =M(V ) on B(D∗).
An assumption throughout this paper is that the cost function c is convex.
This assumption ensures the convexity of the value function as shown below and
enables the use of a cutting plane method.
Lemma 2.2 M(V ) is convex if V is convex. Consequently, V ∗ is convex.
Proof. See Lemma 3.10 and Corollary 3.11 in [9].
In the next section, we propose a cutting plane method to construct a piecewise linear value function V^k that approximately solves the problem V = M(V). The cut generated at iteration k is a supporting plane of M(V^k) at a selected point x^k. The convexity of M(V^k), guaranteed by Lemma 2.2, ensures the existence of such cuts.
For any functions f, g : R^n → R ∪ {+∞}, we say f ≥ g if f(z) ≥ g(z) for all z ∈ R^n.
3 A cutting plane method
In this section, we assume that the domain D∗ = dom(V ∗) is known and is a
compact polyhedral set. All functions are regarded as elements in B(D∗); thus,
we will only define values of functions on D∗.
3.1 An algorithm
Our cutting plane method is based on the following observations.
Lemma 3.1 For any V, V̄ ∈ B(D∗), V ≤ V̄ implies M(V) ≤ M(V̄).
Proof. For any x ∈ D∗,

M(V)(x) = min_y { c(x, y) + δ E V(Ax + By + b) }
        ≤ min_y { c(x, y) + δ E V̄(Ax + By + b) }
        = M(V̄)(x).
Theorem 3.2 Let V ∈ B(D∗) with V ≤ V ∗. Then
(i) M(V ) ≤ V ∗;
(ii) V = V ∗ if M(V ) ≤ V .
Proof. (i) By Lemma 3.1 and V ≤ V ∗, we have
M(V ) ≤M(V ∗) = V ∗.
(ii) It follows from M(V ) ≤ V and Lemma 3.1 that
V ≥ M(V) ≥ M^2(V) ≥ · · · ≥ M^k(V) ≥ · · · .
Since M is a contraction on B(D∗) by Theorem 2.1, Mk(V ) → V ∗. This shows
that V ≥ V ∗. Now we have V ∗ ≥ V ≥ V ∗ which implies V = V ∗.
We will employ a cutting plane algorithm to construct piecewise linear functions V^k, defined by a set of cuts { t ≥ Q_i x + q_i : i = 1, ..., p_k } in the form

V^k(x) = max{ Q_i x + q_i : i = 1, ..., p_k },

which approximate V∗ from below. Theorem 3.2 suggests adding a cut to V^k cutting off an area where M(V^k)(x) > V^k(x), so that V^{k+1} can move up towards V∗. The algorithm stops when there is no x ∈ D∗ such that M(V^k)(x) > V^k(x).
Algorithm 3.1
1. Initialization: Find a piecewise linear convex function V^0 ∈ B(D∗) satisfying V^0 ≤ V∗. Set k ← 0.
2. If V^k ≥ M(V^k), stop; V^k is the solution. Otherwise, find a point x^k ∈ D∗ with V^k(x^k) < M(V^k)(x^k).
3. Find a supporting hyperplane of M(V^k) at x^k, say t = Q_{k+1} x + q_{k+1}. Define

V^{k+1}(x) = max{ V^k(x), Q_{k+1} x + q_{k+1} }.

Set k ← k + 1 and go to Step 2.
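In outline, Algorithm 3.1 can be organized as the following loop. The helpers `eval_M`, `find_violating_point`, and `subgradient` stand for the subproblems of Steps 2 and 3 and are assumed to be supplied by the user; this is a schematic sketch of ours, not the paper's implementation.

```python
import numpy as np

def slam(cuts, eval_M, find_violating_point, subgradient,
         tol=1e-6, max_iter=1000):
    """Schematic form of Algorithm 3.1 (SLAM).

    cuts: list of (Q, q) pairs, so V^k(x) = max_i (Q_i . x + q_i).
    eval_M(x, cuts): evaluates M(V^k)(x) (Step 2, Part 1).
    find_violating_point(cuts): returns x with M(V^k)(x) > V^k(x), or None.
    subgradient(x, cuts): a subgradient of M(V^k) at x (Step 3).
    """
    V = lambda x: max(np.dot(Q, x) + q for Q, q in cuts)
    for _ in range(max_iter):
        xk = find_violating_point(cuts)
        if xk is None or eval_M(xk, cuts) - V(xk) <= tol:
            return cuts                         # V^k >= M(V^k): stop
        Qk = np.asarray(subgradient(xk, cuts))
        qk = eval_M(xk, cuts) - np.dot(Qk, xk)  # supporting hyperplane at xk
        cuts.append((Qk, qk))                   # V^{k+1} = max{V^k, new cut}
    return cuts
```

Because each new cut is tangent to M(V^k) from below and V^{k+1} keeps all earlier cuts, the iterates increase monotonically toward V∗, as shown in Lemma 3.3.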
3.2 Details of the algorithm
Step 1: Usually, we can find V^0 easily. For instance, if c(x, y) ≥ c_0 for all (x, y) in its domain, then we can choose V^0(x) = c_0/(1 − δ), a constant function on D∗. It is clear that

V∗(x) ≥ Σ_{t=0}^∞ δ^t c_0 = c_0/(1 − δ) = V^0(x),  ∀x ∈ D∗.
Step 2 consists of two parts. Part 1 is the evaluation of M(V^k)(x), and Part 2 describes how to find a point x^k ∈ D∗ with V^k(x^k) < M(V^k)(x^k).
Part 1. Assume that V^k is defined by k linear cuts, i.e., for any x ∈ D∗,

V^k(x) = max{ Q_i x + q_i : i = 1, ..., k }
       = min{ θ | θ ≥ Q_i x + q_i, i = 1, ..., k }.
Then

M(V^k)(x) = min_y { c(x, y) + δ Σ_{j=1}^L p_j V^k(A_j x + B_j y + b_j) }
          = min_{y,θ} { c(x, y) + δ Σ_{j=1}^L p_j θ_j | θ_j ≥ Q_i z_j + q_i, ∀ i = 1, ..., k;
                        z_j = A_j x + B_j y + b_j ∈ D∗, j = 1, ..., L }.   (3.1)
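When the stage cost c is itself linear (or polyhedral), problem (3.1) is a finite linear program in (y, θ_1, ..., θ_L). A minimal sketch for a one-dimensional state and control, with illustrative data of ours (not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data (our assumptions): 1-D state/control, two scenarios.
delta = 0.9
x = 0.5
cuts = [(0.0, 0.0), (1.0, 0.0)]                        # V^k(z) = max(0, z)
scen = [(0.5, 0.8, 1.0, 0.1), (0.5, 1.1, 1.0, -0.1)]   # (p_j, A_j, B_j, b_j)
lo, hi = -1.0, 2.0                                     # D* = [lo, hi]

# Decision vector [y, theta_1, theta_2]; stage cost c(x, y) = y with y >= 0.
obj = [1.0] + [delta * p for p, _, _, _ in scen]
A_ub, b_ub = [], []
for j, (p, A, B, b) in enumerate(scen):
    for Q, q in cuts:                  # theta_j >= Q z_j + q, z_j substituted
        row = [Q * B, 0.0, 0.0]
        row[1 + j] = -1.0
        A_ub.append(row)
        b_ub.append(-Q * (A * x + b) - q)
    A_ub.append([B, 0.0, 0.0]); b_ub.append(hi - A * x - b)   # z_j <= hi
    A_ub.append([-B, 0.0, 0.0]); b_ub.append(A * x + b - lo)  # z_j >= lo

res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (None, None), (None, None)])
print("M(V^k)(x) =", res.fun)
```

The state constraint z_j ∈ D∗ and the cut constraints are all linear once z_j is substituted out, which is exactly what makes the evaluation of M(V^k)(x) tractable.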
Part 2. We seek to find x^k by approximately minimizing V^k(x) − M(V^k)(x) on D∗. Notice that V^k − M(V^k) is a d.c. function (a difference of two convex functions) on D∗. There is a rich literature on solving d.c. programs, originating with Horst and Tuy [7]. In Appendix 1, we describe a method based on [6]. This method can find an exact global minimizer. For Algorithm 3.1 to work, however, an approximate minimizer x^k suffices (see Theorem 3.4). Selecting a computationally low-cost method for finding a sufficiently good x^k is important for the efficiency of the algorithm. The method described in Appendix 1 is certainly not the only choice for finding x^k. In the second numerical example, we take a small sample of 5 points and choose the best point as x^k. This simple strategy works well for that example.
Step 3: Suppose that the polytope D∗ is defined by a system of inequalities Fx ≤ f for some matrix F and vector f, i.e., D∗ = {x | Fx ≤ f}. Let

V^k(x) = { min{ θ | Q^k x + q^k ≤ θe }   if Fx ≤ f,
         { +∞                            otherwise,

where Q^k is a matrix, and q^k and e = (1, ..., 1)^T are vectors. Q^k x + q^k represents the set of cuts generated up to the k-th iteration.
Let (y^k, θ^k) and { (λ_j^k, μ_j^k) : j = 1, ..., L } be an optimal solution and optimal multipliers of the problem:

min_{y,θ} { c(x^k, y) + δ Σ_{j=1}^L p_j θ_j | Q^k z_j + q^k ≤ θ_j e, F z_j ≤ f, z_j = A_j x^k + B_j y + b_j, j = 1, ..., L }.

Let ζ^k = (ζ_x^k; ζ_y^k) be a subgradient of c at (x^k, y^k). Then

ξ^k = ζ_x^k + Σ_{j=1}^L ( (λ_j^k)^T Q^k A_j + (μ_j^k)^T F A_j )

is a subgradient of M(V^k) at x^k. (See Appendix 2 for the proof.)
A supporting hyperplane of M(V^k) at x^k is

t = M(V^k)(x^k) + ξ^k (x − x^k).

That is,

Q_{k+1} = ξ^k,  q_{k+1} = M(V^k)(x^k) − ξ^k x^k.
3.3 A modification of the algorithm
As we can see, global search takes considerable computational effort. A more
efficient algorithm should combine local and global searches. We can perform a
number of local search iterations between two global searches. A simple modifica-
tion of the algorithm follows.
Algorithm 3.2
1. Initialization: Find a piecewise linear convex function V 0 ∈ B(D∗) satisfying
V 0 ≤ V ∗. Set k ← 0.
2. If V k ≥ M(V k), stop, V k is the solution. Otherwise, find a point x ∈ D∗
with V k(x) < M(V k)(x). Let V ← V k.
3. Find a supporting hyperplane ofM(V ) at x, say t = Qx+q. Define V +(x) =
max{V (x), Qx+ q}.
4. If the improvement of V + over V is larger than a certain tolerance, then let
V ← V + and repeat Step 3. Otherwise, let V k+1 ← V +, k ← k + 1, and go
to Step 2.
The time required for Step 3 is negligible compared with that for Step 2. Two m-files are attached; the modification is in the file optutildeepcut.m. The implementation on the first example in the paper shows a reduction by 3/4 in the number of iterations of Step 2, i.e., the modified algorithm requires only 1/4 of the original number of iterations. Furthermore, it obtains better approximate solutions.
3.4 Comparison with other methods
We compare from two perspectives: the improvement per iteration and the representation of an approximate value function V^k.
We refer to our method as the successive linear approximation method (SLAM).
Comparison between SLAM and PIM
The policy iteration method (PIM) is widely used for solving Markovian de-
cision problems, cf. [16]. Thus we shall make a comparison between our method
SLAM and PIM.
At the k-th iteration, for a given V^k, PIM finds y^k : D∗ → Y (the policy, in the notation of our problem setting) by

y^k(x) = argmin_y { c(x, y) + δ E V^k(Ax + By + b) },  ∀x ∈ D∗,   (3.2)

and then finds V^{k+1} by solving

V^{k+1}(x) = c(x, y^k(x)) + δ E V^{k+1}(Ax + B y^k(x) + b),  ∀x ∈ D∗.   (3.3)
SLAM performs the k-th iteration as follows: given V^k, find a point x^k ∈ D∗ by approximately solving

max_{x ∈ D∗} M(V^k)(x) − V^k(x);   (3.4)

then construct V^{k+1} by adding to V^k a cutting plane at x^k.
Since

M(V^k)(x) = min_y { c(x, y) + δ E V^k(Ax + By + b) },
finding the action y^k(x) at a single point x takes the same computational effort as evaluating M(V^k) at x; thus, the determination of y^k, i.e., the evaluation of y^k(x) at all x ∈ D∗, is more expensive than the determination of x^k. The construction of V^{k+1} in PIM requires solving an infinite-dimensional equation, which is again more difficult than determining a cutting plane in SLAM. In return, one can expect each iteration of PIM to improve the value function more than an iteration of SLAM does.
Comparison of approximation methods for the continuous-state DP
All existing approximation methods for continuous-state DP can be roughly
categorized into discrete approximation (DA) and parametric approximation (PA),
cf. [1]. The former approximates V on a grid of D∗ and the latter approximates V
by a linear combination of a set of basis functions. Our method, cutting plane ap-
proximation (CPA), approximates V by the maximum function of a set of cutting
planes.
The CPA method is similar to PA in the sense that both methods use a set of
continuous functions to approximate V . The superiority of PA over DA is that, in
many cases, one can obtain a good global approximation to V using a small number
of basis functions, whereas in high-dimensional problems the DA approach requires
a very large number of grid points to obtain a comparably accurate approximation,
see [1]. CPA has the same advantages as PA over DA, and, moreover, is more
flexible than PA, because cuts are generated where they are most needed in CPA,
whereas basis functions are pre-selected in PA. A disadvantage of the CPA method, however, is that cuts can accumulate rapidly. A computational challenge is to find effective ways to delete unnecessary cuts.
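As a hedged illustration of one such deletion strategy (an assumption of ours, not a method from the paper), cuts that are never active at a finite sample of points of D∗ can be dropped heuristically:

```python
import numpy as np

def prune_cuts(cuts, samples, keep_tol=1e-9):
    """Heuristic cut deletion (our assumption, not the paper's method): keep a
    cut only if it attains V^k(x) = max_i (Q_i . x + q_i) at some sample point.
    Cuts never active on the samples are dropped, which can in principle
    discard cuts that are active elsewhere in D*."""
    cuts = [(np.asarray(Q), q) for Q, q in cuts]
    active = set()
    for x in samples:
        vals = [np.dot(Q, x) + q for Q, q in cuts]
        vmax = max(vals)
        active.update(i for i, v in enumerate(vals) if v >= vmax - keep_tol)
    return [cuts[i] for i in sorted(active)]

# Usage: V(z) = max(z, -z, -0.5) on D* = [-1, 1]; the third cut is dominated.
cuts = [([1.0], 0.0), ([-1.0], 0.0), ([0.0], -0.5)]
samples = [np.array([t]) for t in np.linspace(-1.0, 1.0, 21)]
print(len(prune_cuts(cuts, samples)))   # -> 2
```

An exact redundancy test would instead solve one LP per cut, maximizing the gap between that cut and the remaining ones over D∗; the sampling test trades that cost for a risk of deleting a useful cut.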
3.5 Convergence
If Algorithm 3.1 stops at an iteration with V K ≥M(V K), then we have V K = V ∗
by Theorem 3.2.
If the algorithm does not terminate in a finite number of iterations, then the
convergence of V k to V ∗ is not so obvious. We notice that each cut is related to
a testing point x. Such a process can lead to pointwise convergence, but not necessarily uniform convergence. A further catch is that the limit of a pointwise convergent sequence {V^k} may not be V∗, because V^k may be updated only on part of the domain rather than over the full domain. Our convergence analysis answers the following two questions. Under what conditions does the sequence {V^k} converge uniformly? If {V^k} converges uniformly, is the limit function equal to V∗?
Our main condition for uniform convergence is that D∗ is a polytope. Indeed, if
D∗ is an arbitrary convex compact set, one can construct a monotone increasing
sequence of convex functions {V k} which converges pointwise but not uniformly.
The assumption that D∗ is a polytope ensures the continuity of the limit function
of {V k}. Then, by Theorem 7.13 in [15], {V k} converges uniformly; see details in
the proof of Theorem 3.4. For the second question, we will give a positive answer.
The result is presented in Theorem 3.4.
Lemma 3.3 V k ≤ V k+1 ≤ V ∗.
Proof. From Step 2 of Algorithm 3.1, it is obvious that V k ≤ V k+1.
The initial function V 0 ≤ V ∗. Suppose V k ≤ V ∗; then, by Theorem 3.2 (i),
M(V k) ≤ V ∗; thus, for any x ∈ D∗,
V k+1(x) ≤ max{V k(x), M(V k)(x)} ≤ V ∗(x).
Theorem 3.4 Suppose that D∗ is a polytope and suppose that, at the k-th itera-
tion, a point xk ∈ D∗ is selected such that
M(V k)(xk)− V k(xk) ≥ αmax{M(V k)(x)− V k(x) | x ∈ D∗}
for some constant α > 0; then, V k → V ∗ uniformly.
Proof. First, suppose that Algorithm 3.1 stops at a finite iteration K with V K ≥
M(V K). By Lemma 3.3 V K ≤ V ∗. Thus, by Theorem 3.2, V K = V ∗.
Now suppose Algorithm 3.1 generates a sequence {V k} which converges to V
pointwise.
Because V^k ≤ V^{k+1} → V, epi(V) = ∩_k epi(V^k), which is a closed set since every V^k is closed. Thus, V is a closed convex function on D∗. By our assumption, D∗ is a polytope; thus, by Theorem 10.2 in [13], V is continuous on D∗. By Theorem 7.13 in [15], V^k → V uniformly on D∗.
Assume that there exists an x ∈ D∗ such that
M(V )(x)− V (x) = 2σ > 0.
By Theorem 2.1, M(V^k) → M(V) uniformly on D∗ since V^k → V uniformly; thus, there exists a k̄ such that M(V^k)(x) ≥ M(V)(x) − σ for all k ≥ k̄. This yields

M(V^k)(x) − V^k(x) ≥ M(V)(x) − σ − V(x) = σ.

Since the supporting hyperplane at iteration k satisfies Q_{k+1} x^k + q_{k+1} = M(V^k)(x^k), we have

V^{k+1}(x^k) − V^k(x^k) = M(V^k)(x^k) − V^k(x^k)
                        ≥ α [M(V^k)(x) − V^k(x)]
                        ≥ ασ.

Thus

V(x^k) − V^k(x^k) ≥ ασ,  ∀ k ≥ k̄.

This contradicts the uniform convergence of V^k → V on D∗.
The contradiction implies that
M(V )(x) ≤ V (x), ∀x ∈ D∗.
Then, by Theorem 3.2, V = V ∗. This means that V k converges to V ∗ uniformly
on D∗.
4 Construction of the domain D∗
Recall that D∗ = dom(V∗), where V∗ is the value function defined by (2.1). This section discusses how to find D∗. We do not have a complete answer to this problem; nevertheless, the proposed method can find D∗ in a finite number of iterations in many cases.
4.1 An algorithm
Although the domain of a solution of V = M(V ) does not necessarily coincide
with D∗, it does provide useful information for finding D∗. We first represent the
domain of M(V ).
From (2.2) one can see that x ∈ dom(M(V)) if and only if there exists a y such that c(x, y) < +∞ and A_i x + B_i y + b_i ∈ dom(V) for all i = 1, ..., L. Denote

D_c(x) = { y | c(x, y) < +∞ },
G(x, D) = { y | A_i x + B_i y + b_i ∈ D, ∀ i = 1, ..., L }.
10
Then, x ∈ dom(M(V)) if and only if D_c(x) ∩ G(x, dom(V)) ≠ ∅.
Denote

Γ(D) = { x | D_c(x) ∩ G(x, D) ≠ ∅ }.

Then,

dom(M(V)) = Γ(dom(V)).
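When dom(c) = {(x, y) : Tx + Wy ≤ r} and D = {z : Fz ≤ f} are polyhedral, the test x ∈ Γ(D) is a linear feasibility problem in y. A sketch using scipy's linprog with a zero objective; the data instantiate Example 1 of Section 4.2 with assumed values α = 2, β = 0.5:

```python
import numpy as np
from scipy.optimize import linprog

def in_Gamma(x, T, W, r, F, f, scenarios):
    """Test x in Gamma(D): is there y with T x + W y <= r and
    F (A_i x + B_i y + b_i) <= f for every scenario i?"""
    A_ub = [W] + [F @ B for _, B, _ in scenarios]
    b_ub = [r - T @ x] + [f - F @ (A @ x + b) for A, _, b in scenarios]
    res = linprog(np.zeros(W.shape[1]),
                  A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  bounds=[(None, None)] * W.shape[1])
    return res.status == 0        # status 0 means a feasible y was found

# Example 1 with assumed alpha = 2, beta = 0.5 (deterministic dynamics):
# dom(c): x in [-1,1]^2, y in [-0.5, 0.5];  D = [-1, 1]^2.
T = np.vstack([np.eye(2), -np.eye(2), np.zeros((2, 2))])
W = np.array([[0.0], [0.0], [0.0], [0.0], [1.0], [-1.0]])
r = np.array([1.0, 1.0, 1.0, 1.0, 0.5, 0.5])
F = np.vstack([np.eye(2), -np.eye(2)])
f = np.ones(4)
scenarios = [(2.0 * np.eye(2), np.array([[1.0], [0.0]]), np.zeros(2))]
print(in_Gamma(np.array([0.2, 0.2]), T, W, r, F, f, scenarios))   # feasible
print(in_Gamma(np.array([0.2, 0.8]), T, W, r, F, f, scenarios))   # infeasible
```

The second point fails because αx_2 = 1.6 leaves D no matter which y is chosen, matching the shrinking x_2-range derived for Γ(D) in Example 1.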
The following are some basic properties of the operator Γ.
Lemma 4.1 (i) If D1 ⊆ D2, then Γ(D1) ⊆ Γ(D2).
(ii) Γ(D∗) = D∗.
(iii) If D∗ ⊆ D, then D∗ ⊆ Γ(D).
Proof. (i) If D_1 ⊆ D_2, then, for any x, G(x, D_1) ⊆ G(x, D_2). Now, for any x ∈ Γ(D_1), D_c(x) ∩ G(x, D_1) ≠ ∅. This implies D_c(x) ∩ G(x, D_2) ≠ ∅. Therefore, x ∈ Γ(D_2).
(ii) It follows from M(V ∗) = V ∗ that dom(M(V ∗)) = dom(V ∗), which yields
Γ(D∗) = D∗ by the definition.
(iii) If D∗ ⊆ D, then, by (i), Γ(D∗) ⊆ Γ(D); hence, by (ii), we have D∗ ⊆ Γ(D).
The following lemma shows that, if dom(V ) ⊆ dom(M(V )), then dom(V ) ⊆
dom(V ∗).
Lemma 4.2 Suppose c is bounded on its domain. If D ⊆ Γ(D), then D ⊆ D∗.
Proof. For any xt ∈ D, we have xt ∈ Γ(D); thus,
∃yt ∈ Dc(xt) : Aixt +Biyt + bi ∈ D, ∀ i = 1, . . . , L. (4.1)
For any x ∈ D, let x0 = x. There exists y0 satisfying (4.1). For each realization
of ξ = (A,B, b) (where the subscript is omitted for ease of exposition), let x1 =
Ax0 + By0 + b. By (4.1), x1 ∈ D. Since D ⊆ Γ(D), x1 ∈ Γ(D). Therefore, there
exists y1 satisfying (4.1), and so on; so, we obtain a sequence {(xt, yt) : t = 0, 1, . . .}
(here (xt, yt) are random vectors) satisfying (xt, yt) ∈ dom(c) because yt ∈ Dc(xt).
Since c is bounded on its domain, E Σ_{t=0}^∞ δ^t c(x_t, y_t) < ∞; thus, V∗(x), defined by (2.1), is finite, i.e., x ∈ D∗. Therefore, D ⊆ D∗.
Suppose we use a cutting plane method to construct D∗ and start with a set
D ⊇ D∗. The above lemma suggests that one should cut off a portion of D \ Γ(D) if D ⊄ Γ(D).
Because

D∗ ⊆ D_cx := { x | D_c(x) ≠ ∅ } = { x | ∃ y such that c(x, y) < +∞ },

we start with D_cx to find D∗. A generic cutting plane method that constructs D∗ with this idea is as follows:
Algorithm 4.1
1. Let D0 = Dcx. k = 0.
2. If Dk ⊆ Γ(Dk), stop. Otherwise, find a cut F k+1x ≤ fk+1 which cuts off a
portion of Dk \ Γ(Dk).
3. Let Dk+1 = Dk ∩ {x | F k+1x ≤ fk+1}. k ← k + 1. Repeat.
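Algorithm 4.1 can be organized schematically as follows; `subset_of_Gamma`, `generate_cut`, and the polyhedron interface `add_cut` are assumed interfaces of ours, not code from the paper:

```python
def construct_domain(D0, subset_of_Gamma, generate_cut, max_iter=50):
    """Schematic form of Algorithm 4.1.

    subset_of_Gamma(D): tests whether D is contained in Gamma(D).
    generate_cut(D): returns (F, f) cutting off part of D minus Gamma(D).
    D supports an add_cut(F, f) method returning the cut polyhedron
    (an assumed interface, not code from the paper)."""
    D = D0
    for _ in range(max_iter):
        if subset_of_Gamma(D):
            return D              # D within Gamma(D): by Lemma 4.2, D = D*
        F, f = generate_cut(D)
        D = D.add_cut(F, f)
    return D                      # may not have converged; see Remark (iii)
```

The loop mirrors the finite-termination guarantee of Theorem 4.3: if the subset test ever succeeds, the returned set equals D∗; otherwise the iteration cap reflects the possibility of non-convergence noted after the theorem.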
The algorithm stops when a D^k with D^k ⊆ Γ(D^k) is found. Question: is this terminal D^K equal to D∗?
Theorem 4.3 If the algorithm terminates at a finite iteration K, then DK = D∗.
Proof. The algorithm starts with Dcx which contains D∗. By Lemma 4.1, D∗ ⊆
Γ(Dk) for k = 0, 1, . . . ,K; thus, no cut cuts off any point of D∗. This implies
DK ⊇ D∗. When the algorithm stops with DK ⊆ Γ(DK), Lemma 4.2 yields
DK ⊆ D∗; thus, DK = D∗.
Unfortunately, if Algorithm 4.1 does not stop in a finite number of iterations,
the sequence {Dk} need not converge to D∗ (see Remark (iii) after Example 1).
How to modify the algorithm to guarantee convergence is still an open question.
4.2 Generating cuts
Now we discuss Step 2 in detail. First, we propose a method which finds a point
x ∈ Dk \ Γ(Dk) and generates a cut which cuts off a portion of Dk \ Γ(Dk).
For simplicity we only consider the case in which dom(c) is a polyhedral set. More precisely, let dom(c) = { (x, y) : Tx + Wy ≤ r }, and let D^k = { x : F^i x ≤ f^i, i = 1, ..., k }.
x̄ ∉ Γ(D^k) if and only if the system

T x̄ + W y ≤ r,
F^i (A_j x̄ + B_j y + b_j) ≤ f^i,  i = 1, ..., k;  j = 1, ..., L,

has no feasible solution y; by Farkas' Theorem, this holds if and only if

Σ_{i=1}^k Σ_{j=1}^L π_{ij} F^i B_j + λ^T W = 0,
Σ_{i=1}^k Σ_{j=1}^L π_{ij} [ F^i (A_j x̄ + b_j) − f^i ] + λ^T (T x̄ − r) > 0,   (4.2)
π ≥ 0, λ ≥ 0,

has a solution (π, λ).
Thus, finding a point x̄ ∈ D^k such that x̄ ∉ Γ(D^k) is equivalent to finding a triple (x̄, λ, π) satisfying x̄ ∈ D^k and (4.2). (Note that finding a solution to (4.2) is equivalent to determining the sign of the supremum of an indefinite quadratic function subject to linear constraints, which would also require global optimization methods as in Horst et al. [6].)
Once a solution (x̄, λ, π) is found, one can construct a feasibility cut:

Σ_{i=1}^k Σ_{j=1}^L π_{ij} [ F^i (A_j x + b_j) − f^i ] + λ^T (T x − r) ≤ 0,   (4.3)

i.e.,

F^{k+1} = Σ_{i=1}^k Σ_{j=1}^L π_{ij} F^i A_j + λ^T T,
f^{k+1} = Σ_{i=1}^k Σ_{j=1}^L π_{ij} [ f^i − F^i b_j ] + λ^T r.

We can add an objective to find the "best" cut in the sense, e.g., that F^{k+1} is the normal direction of a facet of D∗, or, perhaps more realistically, a facet of Γ(D^k).
Let us look at a simple example to see how Γ^k(D_cx) approximates D∗.
Example 1
Suppose

dom(c) = { (x, y) : x ∈ [−1, 1]^2, y ∈ [−β, β] },
Ax + By + b = αx + e_1 y  for x ∈ R^2, y ∈ R^1 (deterministic).

Here α ∈ R^1 and e_1 = (1, 0)^T, meaning that A = αI ∈ R^{2×2}, B = e_1 ∈ R^{2×1}, and b = (0, 0)^T. For this example, we have

D_c(x) = { y : (x, y) ∈ dom(c) } = [−β, β] if x ∈ [−1, 1]^2, and ∅ otherwise;
D_cx = { x : D_c(x) ≠ ∅ } = [−1, 1]^2;
G(x, D) = { y : αx + e_1 y ∈ D }  for D ⊂ R^2.

Thus,

D_c(x) ∩ G(x, D) ≠ ∅  ⟺  x ∈ [−1, 1]^2 and [−β, β] ∩ { y : αx + e_1 y ∈ D } ≠ ∅
                      ⟺  x ∈ [−1, 1]^2 and (αx + e_1 [−β, β]) ∩ D ≠ ∅,   (4.4)

where

αx + e_1 [−β, β] = { αx + y e_1 : y ∈ [−β, β] } = { (αx_1 + y, αx_2)^T : y ∈ [−β, β] }.
For any D = { |x_1| ≤ p, |x_2| ≤ q } with p, q ≥ 0,

Γ(D) = { x ∈ [−1, 1]^2 : (αx + e_1 [−β, β]) ∩ D ≠ ∅ }
     = { x ∈ [−1, 1]^2 : [αx_1 − β, αx_1 + β] ∩ [−p, p] ≠ ∅, |αx_2| ≤ q }
     = { x ∈ [−1, 1]^2 : |x_1| ≤ (p + β)/α, |x_2| ≤ q/α }
     = { |x_1| ≤ min{1, (p + β)/α}, |x_2| ≤ min{1, q/α} }.

For 0 < α ≤ 1,

Γ(D_cx) = [−1, 1]^2 = D_cx.

Thus, by Lemma 4.2, D_cx ⊆ D∗; since D∗ ⊆ D_cx always holds, D∗ = D_cx = [−1, 1]^2.
Note: For the case α = 1, any subset D of [−1, 1]^2 of the form { |x_1| ≤ 1, |x_2| ≤ t } for some 0 ≤ t ≤ 1 satisfies D = Γ(D). Thus, M is a contraction on the Banach space B(D), and V = M(V) has a solution (fixed point) V_D in B(D). This shows that the equation V = M(V) may have many solutions.
For 1 < α ≤ 1 + β,

Γ(D_cx) = { |x_1| ≤ 1, |x_2| ≤ 1/α },
Γ^2(D_cx) = Γ(Γ(D_cx)) = { |x_1| ≤ 1, |x_2| ≤ 1/α^2 },
...
Γ^k(D_cx) = { |x_1| ≤ 1, |x_2| ≤ 1/α^k };

thus, D∗ = lim_{k→∞} Γ^k(D_cx) = { |x_1| ≤ 1, x_2 = 0 } ≠ D_cx.
For α > 1 + β,

Γ(D_cx) = { |x_1| ≤ (1 + β)/α, |x_2| ≤ 1/α },
Γ^2(D_cx) = { |x_1| ≤ (1 + β + αβ)/α^2, |x_2| ≤ 1/α^2 },
...
Γ^k(D_cx) = { |x_1| ≤ (1 + β + αβ + ... + α^{k−1}β)/α^k, |x_2| ≤ 1/α^k };

thus, D∗ = lim_{k→∞} Γ^k(D_cx) = { |x_1| ≤ β/(α − 1), x_2 = 0 } ≠ D_cx.
Remarks:
(i) In some cases (α ≤ 1), D∗ = D_cx, which can be obtained directly.
(ii) In some cases (α > 1), infinitely many iterations are required to reach D∗.
(iii) Suppose the cutting plane algorithm generates only one-sided cuts in the case α > 1, e.g., only the cuts x_2 ≥ −1/α^k; then, D^k → { |x_1| ≤ 1, 0 ≤ x_2 ≤ 1 } ≠ D∗.
(iv) If we can directly generate the cut x2 ≥ 0 instead of infinitely many cuts
{x2 ≥ −1/αk : k = 1, 2, . . .}, then we can construct D∗ in 2 iterations for the
problem with 1 < α ≤ 1+β and 4 iterations for the problem with α > 1+β.
4.3 Generating the deepest cut
A cut generated by (4.3) only cuts off points in D^k \ Γ(D^k); i.e., it does not cut off any point of Γ(D^k). As shown in Example 1, the cutting plane method using such cuts may fail to approximate D∗ even with infinitely many iterations. In order to fulfill the goal in Remark (iv), we must find a deeper cut (a smaller f^{k+1}), which cuts into Γ(D^k), hopefully reaching the boundary of D∗.
Given a set D, suppose that we have obtained a cut d^T x ≤ t_0 which cuts off a portion of D \ Γ(D) (but does not cut into Γ(D)). Denote D_t := D ∩ { d^T x ≤ t }. If the plane d^T x = t_0 has touched the boundary of Γ(D), then no point of D_{t_0} \ Γ(D) can be cut off by any cut of the form d^T x ≤ t for arbitrary t. However, since Γ(D_{t_0}) is smaller than Γ(D), a portion of D_{t_0} \ Γ(D_{t_0}) may be cut off by some cut d^T x ≤ t. We wish to find t̄ < t_0 such that no point of D_{t̄} \ Γ(D_{t̄}) can be cut off by any cut of the form d^T x ≤ t. In other words, the plane d^T x = t̄ touches the boundary of Γ(D_{t̄}) (it is a supporting plane of Γ(D_{t̄})). Thus, for any t > t̄, the plane d^T x = t does not touch Γ(D_t), and, for any t ≤ t̄, the half-space d^T x ≥ t intersects Γ(D_t). The latter means that there exists x ∈ Γ(D_t) satisfying d^T x ≥ t, which suggests determining t̄ by the following linear program:

t̄ = max { t | d^T x ≥ t, x ∈ Γ(D_t) }.   (4.5)

Because d^T x ≤ t_0 does not cut into Γ(D) (hence does not cut into Γ(D_t)), there exists no x satisfying d^T x ≥ t and x ∈ Γ(D_t) if t > t_0. This implies that t̄ ≤ t_0. Therefore, the cut d^T x ≤ t̄ is deeper than the cut d^T x ≤ t_0.
On the other hand, the following lemma guarantees that the cut d^T x ≤ t̄ does not cut off any point of D∗.

Lemma 4.4 Suppose that D∗ ⊂ D. Let t̄ be the optimal objective value of problem (4.5); then, d^T x ≤ t̄ is satisfied by all x ∈ D∗.
Proof. Let

x∗ = argmax{ d^T x | x ∈ D∗ },  t∗ = d^T x∗.

Because x∗ ∈ D∗, there exist { (x_t, y_t) : t = 0, 1, 2, ... } such that

V∗(x∗) = E[ Σ_{t=0}^∞ δ^t c(x_t, y_t) ] < ∞,
x_{t+1} = A x_t + B y_t + b,  x_0 = x∗.
For any integer K ≥ 0, let x̄_l = x_{l+K} and ȳ_l = y_{l+K}. Because E[ Σ_{t=K}^∞ δ^{t−K} c(x_t, y_t) ] < ∞, we have

V∗(x_K) ≤ E[ Σ_{l=0}^∞ δ^l c(x̄_l, ȳ_l) ] < ∞.

Therefore, x_K ∈ D∗.
Now y_0 ∈ D_c(x∗) follows from c(x∗, y_0) < ∞, and y_0 ∈ G(x∗, D∗) follows from x_1 = A_i x∗ + B_i y_0 + b_i ∈ D∗ for every i = 1, ..., L. Thus x∗ ∈ Γ(D∗). Because D∗ ⊆ D and D∗ ⊆ { x : d^T x ≤ t∗ }, we have D∗ ⊆ D_{t∗}. Therefore, x∗ ∈ Γ(D_{t∗}). This, together with d^T x∗ = t∗, shows that (x∗, t∗) is a feasible point of (4.5); thus t∗ ≤ t̄, from which the claim of the lemma follows.
The following example shows the effect of the deepest cut.
Example 1 (Continued)
Consider α > 1 and a cut of the form −x_2 ≤ t for some t ∈ R to be determined, so d = (0, −1)^T in (4.5). Start with D = D_cx = [−1, 1]^2; then

D_t = { x ∈ [−1, 1]^2 : −x_2 ≤ t }.

Linear program (4.5) is

t̄ = max t
    s.t. −x_2 ≥ t,
         −1 ≤ αx_1 + y ≤ 1,
         −t ≤ αx_2 ≤ 1,
         −β ≤ y ≤ β.

A feasible solution must satisfy

−t/α ≤ x_2 ≤ −t.

This can only be satisfied when t ≤ 0, since α > 1. Thus, t̄ = 0. The cut −x_2 ≤ 0 reaches the bottom of D∗. With one more cut from above (d = (0, 1)^T), D∗ is completely determined for the case 1 < α ≤ 1 + β.
For the case α > 1 + β, suppose d = (1, 0)^T. Then

D_t = { x ∈ [−1, 1]^2 : x_1 ≤ t }.

Linear program (4.5) is

t̄ = max t
    s.t. x_1 ≥ t,
         −1 ≤ αx_1 + y ≤ t,
         −1 ≤ αx_2 ≤ 1,
         −β ≤ y ≤ β.

Feasible solutions must satisfy

t ≤ x_1 ≤ (t + β)/α.

This implies

t ≤ β/(α − 1).

Thus t̄ = β/(α − 1); so we obtain a cut x_1 ≤ β/(α − 1), which cuts exactly to the boundary of D∗ on the right. One more cut from the left (d = (−1, 0)^T) completely determines D∗; hence, in total, we need only 4 cuts.
5 Examples
5.1 Infinite horizon portfolio
As noted in the introduction, this work was motivated by solving infinite horizon
investment problems that face long-enduring institutions. We will demonstrate
how the algorithm performs on a small example where an infinite horizon optimum
can be found analytically (as done, for example, in [17] and [12]). The goal is to
maximize the discounted expected utility of consumption over an infinite horizon.
The decisions in each period are how much to consume and how much to invest
in a risky asset (or in a variety of assets).
The state variable x in this case corresponds to wealth or the current market
value of all assets. The control variable y has two components, y1, which corre-
sponds to consumption, and y2, which corresponds to the amount invested in a
risky asset with random return ξ. The assumption in this model is that any re-
maining funds, after consuming y1 and investing y2 in the risky asset, are invested
in a riskfree asset (e.g., U.S. Treasury bills) with known rate of return r. For this
model, c(x, y) is either ∞ if x < 0 or −yγ1/γ, for some non-zero parameter γ < 1,
giving (the negative of) the common utility function with constant relative risk
aversion (i.e., such that risk preferences do not depend on the level of wealth).
With these assumptions, M(V) takes the following form (for x ≥ 0):

M(V)(x) = min_y { −y_1^γ/γ + δ Σ_{i=1}^L p_i V((1 + r)x − (1 + r)y_1 + (ξ_i − r)y_2) },   (5.6)

where ξ_i is the i-th realization of the random return, with probability p_i. The solution V∗ of M(V) = V can be found analytically by observing that the optimal value function is proportional to x^γ (by, for example, considering the limiting case
of a finite horizon problem). We then have that

Σ_{i=1}^L p_i V((1 + r)x − (1 + r)y_1 + (ξ_i − r)y_2)
  = −K Σ_{i=1}^L p_i ((1 + r)x − (1 + r)y_1 + (ξ_i − r)y_2)^γ
  = −K (x − y_1)^γ Σ_{i=1}^L p_i ((1 + r) + (ξ_i − r)[y_2/(x − y_1)])^γ
  = −K (x − y_1)^γ Σ_{i=1}^L p_i ((1 + r) + (ξ_i − r)z)^γ,

where z = y_2/(x − y_1) is the fractional risky investment after consuming y_1 and K is some positive constant. The optimal z∗ then must solve

Σ_{i=1}^L p_i γ (ξ_i − r) ((1 + r) + (ξ_i − r)z)^{γ−1} = 0,   (5.7)
which is independent of y1. With V ∗ = −∑L
i=1 pi((1 + r) + (ξi − r)z∗)γ , optimal
y∗1 now must solve
−yγ−11 + δγKV ∗(x− y1)
γ−1 = 0 (5.8)
or
y∗1 = x(δγKV ∗)1/(γ−1)
1 + (δγKV ∗)1/(γ−1)= xw∗, (5.9)
for an optimal consumption fraction w∗. The last step is to find K from M(V) = V, using

−x^γ ((w∗)^γ/γ + δKV∗(1 − w∗)^γ) = M(V)(x) = V(x) = −Kx^γ   (5.10)

to obtain K = ((δV∗)^{1/(γ−1)} − 1)^{γ−1} / (δγV∗).
While this function can be found explicitly (up to solving the nonlinear equa-
tion (5.7)), various other constraints on investment (such as transaction costs and
limits on consumption changes from period to period) make analytical solutions
impossible. We use the analytical solution here to observe Algorithm 3.1's
performance and convergence behavior. We also use the analytical solution to derive
initial upper bounding approximations. For our test, we use the linear supports of
V ∗ at x = 0.1 and x = 10 as initial cuts and restrict our search in x to the interval
[0.1, 10], although the feasible region is unbounded.
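Up to the root of (5.7), the closed form above is easy to evaluate numerically. The sketch below is our illustration, not the paper's code: γ, r, and δ = 1/1.25 match the reported test values, but the three-point return distribution is a made-up stand-in for the lognormal discretization used in the experiments.

```python
# Illustrative data: gamma, r, delta match the paper's test values; the
# three-point return distribution is a made-up stand-in for the paper's
# lognormal discretization (mean 0.08, s.d. 0.4).
gamma, r, delta = 0.03, 0.05, 1 / 1.25
xi = [-0.20, 0.08, 0.40]          # risky-return realizations
p  = [0.25, 0.50, 0.25]           # their probabilities

def foc(z):
    # Left-hand side of the first-order condition (5.7).
    return sum(pi * gamma * (e - r) * ((1 + r) + (e - r) * z) ** (gamma - 1)
               for pi, e in zip(p, xi))

# Bisection for z*; on [0, 4] every term (1+r) + (xi_i - r)z stays positive,
# foc(0) > 0 (positive mean excess return) and foc(4) < 0.
lo, hi = 0.0, 4.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if foc(mid) > 0:
        lo = mid
    else:
        hi = mid
z_star = 0.5 * (lo + hi)

# Scalar V*, consumption fraction w* from (5.9), and K from (5.10).
V_star = sum(pi * ((1 + r) + (e - r) * z_star) ** gamma for pi, e in zip(p, xi))
a = (delta * V_star) ** (1 / (gamma - 1)) - 1
w_star = a / (1 + a)
K = a ** (gamma - 1) / (delta * gamma * V_star)
# The optimal value function is then V(x) = -K * x**gamma.
```

With the actual lognormal discretization, K works out to the value 155.6 reported below; with these stand-in data it lands in the same neighborhood.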
For our test, we used γ = 0.03, r = 0.05, and ξi chosen as a discrete approxima-
tion of the lognormal return distribution with mean return of 0.08 and standard
[Plot omitted; legend: V^0, V^5, V^10, V^20, V^50, V^100, V∗.]
Figure 1: Value function approximations for the portfolio example, δ = 1/1.25. The horizontal
axis shows the state variable x and the vertical axis the value function V.
deviation of 0.4. Algorithm 3.1 was implemented in MATLAB using fmincon to
solve the optimization subproblems and a linesearch to find x in Step 2. We tried
different values for the discount factor δ. The results for δ = 1/1.25 appear in Figure
1, which includes V^0, V^5, V^10, V^20, V^50, V^100, and V∗. In this case, after 100
iterations, the approximation almost perfectly matches the true infinite-horizon
value function.
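For reference, a single evaluation of the operator in (5.6) over a cut-based approximation — the subproblem solved repeatedly inside Algorithm 3.1 — might be sketched as follows, with scipy's minimize standing in for fmincon. The cut data and the bounds on (y_1, y_2) are illustrative assumptions, not the paper's implementation.

```python
from scipy.optimize import minimize

gamma, r, delta = 0.03, 0.05, 1 / 1.25
xi = [-0.20, 0.08, 0.40]; p = [0.25, 0.50, 0.25]   # made-up 3-point returns
cuts = [(-4.7, -150.9), (-0.5, -161.7)]            # made-up (slope, intercept) cuts

def V(w):
    # Upper-bounding approximation by linear supports (cuts).
    return max(a * w + c for a, c in cuts)

def M_of_V(x):
    def obj(y):
        y1, y2 = y
        # Next-period wealth in each return scenario, as in (5.6).
        wealth = [(1 + r) * x - (1 + r) * y1 + (e - r) * y2 for e in xi]
        return -y1 ** gamma / gamma + delta * sum(
            pi * V(w) for pi, w in zip(p, wealth))
    # Consume part of current wealth, invest a nonnegative amount in the
    # risky asset; these bounds are an illustrative restriction.
    res = minimize(obj, x0=[0.2 * x, 0.2 * x],
                   bounds=[(1e-6, x), (0.0, 5.0 * x)])
    return res.fun
```

One such evaluation, plus a cut at the minimizer, is what each iteration of the algorithm adds to the approximation.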
With a larger δ, the value of K increases rapidly as (δV∗)^{1/(γ−1)} approaches
one. For δ = 1/1.25, K is 155.6, while, for δ = 1/1.07, K is 466.3. The result is that
larger δ values (corresponding to lower discount rates) require additional iterations
of Algorithm 3.1 to approach V∗. The results for the same data as in Figure 1,
except with δ = 1/1.07, appear in Figure 2. After 500 iterations, the approximating
V^500 agrees relatively well with V∗, as shown in the figure, but has not converged to
nearly the same accuracy as the approximations (with fewer iterations) in Figure
1.
For this example, Algorithm 3.2 gains a notable speed-up. For the instance
with δ = 1/1.25, it requires only 25 iterations to obtain the approximating V^25, which
[Plot omitted; legend: V^0, V^10, V^50, V^500, V∗.]
Figure 2: Value function approximations for the portfolio example, δ = 1/1.07. The horizontal
axis shows the state variable x and the vertical axis the value function V.
is as good as the V^100 obtained by Algorithm 3.1. For the case of δ = 1/1.07, Algorithm
3.2 requires only about 1/4 of the iterations needed by Algorithm 3.1. Moreover,
Algorithm 3.2 can generate more accurate approximating value functions.
Understanding the numerical behavior of Algorithm 3.1 and finding mechanisms
to speed convergence should be topics for further investigation. Comparing
V^k to V∗ in the figures shows how the algorithm forces a closer approximation in some
areas of the curve than in others. The behavior of the algorithm is generally to move
along the curve V^k to create tighter cuts and then to repeat that process, with
less improvement, on a new sequence of iterations. These observations suggest
that procedures with multiple cut generation and tightening tolerances should be
considered for accelerating convergence.
5.2 Quadratic-linear stochastic control problems
This example is intended to test the algorithm on high-dimensional problems.
Quadratic-linear stochastic control problems are well known; we use these
examples following Section 6.5 of [2]. The value function in a quadratic-linear
stochastic control problem is defined as follows:
V∗(x) = min_{y_0,y_1,...} E_{ξ_0,ξ_1,...} ∑_{t=0}^∞ δ^t [x_t^T Q x_t + y_t^T R y_t]
        s.t. x_{t+1} = A x_t + B y_t + b, for t = 0, 1, 2, . . . ,
             x_0 = x.
In the above expression, x_t ∈ ℝ^n, y_t ∈ ℝ^m, and the matrices A, B, Q, R are
given and have appropriate dimensions. We assume that Q is a symmetric positive
semidefinite matrix and that R is also symmetric and positive definite. The
disturbances ξ_t = b form independent random vectors with given probability distributions
that do not depend on x_t, y_t. Furthermore, the vector b has zero mean
and finite second moment. The controls yt are unconstrained. The problem above
represents a popular formulation of a regulation problem whereby we desire to
keep the state of the system close to the origin. Such problems are common in the
theory of automatic control of a motion or a process. For computational purposes,
we assume ξ = b is a discrete random vector with p_i = Prob(ξ = b_i), i = 1, . . . , L.
The value function V∗ is the solution of M(V) = V, where the mapping M is
determined by

M(V)(x) = min_y { x^T Q x + y^T R y + δ ∑_{i=1}^L p_i V(Ax + By + b_i) }.
We choose the quadratic-linear stochastic control problem because its exact
solution on ℝ^n can be computed. In the following, to be consistent with standard
notation for these problems, we re-use the notation K and c as a matrix and a constant,
respectively. To find the exact solution for V∗, we first compute a symmetric
positive semi-definite matrix K by solving the algebraic matrix Riccati equation:

K = A^T [δK − δ^2 KB(δB^T KB + R)^{−1} B^T K] A + Q.
Then, the exact value function V∗ is given by

V∗(x) = x^T Kx + c,

where

c = (δ/(1 − δ)) ∑_{i=1}^L p_i b_i^T K b_i.
The above formula determines the exact value function on ℝ^n; however, a
numerical method can only determine a value function on a bounded set. In this
example, we will find the value function on the box D∗ = [−t, t]^n. The solutions of
M(V) = V on B(ℝ^n) and B([−t, t]^n) need not be the same; however, for reasonably
large t, the two solutions should be close. Thus, we still use the exact value function
on B(ℝ^n) as the reference for numerically generated value functions.
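The computation just described can be sketched by iterating the discounted Riccati recursion to a fixed point. The instance below (A, B, Q, R, and the disturbance support) is a small made-up example, not one of the paper's test instances:

```python
import numpy as np

delta = 0.9
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.eye(2)
Q = np.eye(2)
R = 0.5 * np.eye(2)
bs = [np.array([0.1, -0.1]), np.array([-0.1, 0.1])]   # zero-mean support of b
ps = [0.5, 0.5]

def riccati_rhs(K):
    # Right-hand side of K = A'[dK - d^2 K B (d B'KB + R)^{-1} B'K] A + Q.
    inner = delta * K - delta ** 2 * K @ B @ np.linalg.inv(
        delta * B.T @ K @ B + R) @ B.T @ K
    return A.T @ inner @ A + Q

# Fixed-point (value) iteration from K = 0; converges for delta < 1 when
# the discounted system is stabilizable.
K = np.zeros((2, 2))
for _ in range(2000):
    K_next = riccati_rhs(K)
    if np.max(np.abs(K_next - K)) < 1e-12:
        K = K_next
        break
    K = K_next

c = delta / (1 - delta) * sum(pi * b @ K @ b for pi, b in zip(ps, bs))
# Exact value function: V*(x) = x' K x + c
```

The quadratic V∗ produced this way then serves as the reference against which the cut-based approximations are measured.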
Finding a point x which maximizes M(V)(x) − V(x) is computationally very
expensive. The method for minimizing a d.c. function described in Appendix 1
can be applied to finding x; however, it is difficult to program and computationally
expensive. Thus, we use the following approach. We randomly sample J
points, x_j, j = 1, . . . , J (e.g., J = 5). At each point x_j, we evaluate M(V)(x_j)
and compute the gradient ξ_j of M(V) (or a subgradient if M(V) is not smooth),
which generates at that point a supporting plane H_j of M(V). Then we find a
point x̄_j by maximizing H_j(x) − V(x), which can be viewed as a local approximation
of M(V)(x) − V(x). Suppose that the current approximate value function is

V(x) = max{Q_i x + q_i : i = 1, . . . , p},  x ∈ [−t, t]^n.

Using the supporting plane formula from Subsection 3.2, we have

H_j(x) = M(V)(x_j) + ξ_j(x − x_j).

Thus, maximizing H_j(x) − V(x) can be formulated as a linear program

σ_j = max{ M(V)(x_j) + ξ_j(x − x_j) − θ : x ∈ [−t, t]^n, Q_i x + q_i ≤ θ, i = 1, . . . , p };

then, x is chosen as the point among x̄_j, j = 1, . . . , J, with the largest σ_j.
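For a single sampled point, this selection step might be sketched as follows, with scipy's HiGHS LP solver standing in for whatever LP code is used; the cut data and the values of M(V)(x_j) and ξ_j are made up:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data: n = 2, box [-t, t]^n, and p = 3 cuts V(x) = max_i Q_i x + q_i.
n, t = 2, 5.0
Qc = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -1.0]])   # cut gradients Q_i
qc = np.array([0.0, -1.0, 0.5])                         # cut intercepts q_i

def candidate(M_xj, xi_j, x_j):
    """Solve  sigma_j = max { M(V)(x_j) + xi_j(x - x_j) - theta :
                              x in [-t,t]^n,  Q_i x + q_i <= theta }
    as an LP in the variables (x, theta)."""
    c = np.concatenate([-xi_j, [1.0]])                  # minimize -xi_j x + theta
    A_ub = np.hstack([Qc, -np.ones((len(qc), 1))])      # Q_i x - theta <= -q_i
    res = linprog(c, A_ub=A_ub, b_ub=-qc,
                  bounds=[(-t, t)] * n + [(None, None)], method="highs")
    sigma = -res.fun + M_xj - xi_j @ x_j                # add back the constant
    return sigma, res.x[:n]

x_j = np.array([0.5, 0.5])
sigma, x_bar = candidate(M_xj=2.0, xi_j=np.array([0.2, -0.3]), x_j=x_j)
# x would then be the x_bar with the largest sigma over the J sampled points.
```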
We implemented the successive linear approximation method (SLAM) for the
quadratic-linear stochastic control problem in various dimensions and summarize
the results in the tables below. The accuracy of an approximate solution relative
to the exact solution is measured by the relative error

‖V∗ − V‖_{D∗} / ‖V∗‖_{D∗}.
Although the exact value function V∗(x) = x^T Kx + c is known, computing the
approximation error is still difficult due to the high dimension of D∗ = [−t, t]^n.
To estimate the error, we generate a sample of 5000 points, plus points on the line
of the eigenvector corresponding to the maximum eigenvalue of K, and estimate
‖f‖_{D∗} by max{|f(x)| : for all sample points x}. This sample size is generally
small for high-dimensional problems, causing potential inaccuracy in the errors
shown in the table; the results should nonetheless indicate the general trend. For
each dimension, we computed 3 instances. The errors shown in the table are the
averages over the 3 instances.
                    Number of Iterations
  (n,m)        10       25       50       100      200
  (1,1)      0.3540   0.1238   0.0473   0.0125   0.0037
  (2,2)      0.1680   0.0750   0.0466   0.0191   0.0105
  (4,2)      0.1444   0.0649   0.0541   0.0309   0.0196
  (4,4)      0.1140   0.0512   0.0380   0.0190   0.0153
  (10,5)     0.1311   0.1255   0.0785   0.0499   0.0455
  (10,10)    0.1429   0.1403   0.0868   0.0751   0.0670

                    Relative Errors
The experiments show that fairly good solutions can be obtained for low-dimensional
instances. For the high-dimensional cases, errors are reduced rapidly in
the first several iterations but show no clear further reduction between the initial
iterations and the 200th iteration. We display only the first iterations for the examples
of dimensions (n,m) = (20, 20) and (50, 50).
  # Iterations     1        2        3        4        6        8       200
  (20,20)       1.0000   0.3935   0.3926   0.1383   0.1221   0.1159   0.1126
  (50,50)       1.0000   0.7774   0.3442   0.1794   0.0939   0.0925   0.0925

                    Relative Errors
A positive aspect of these tests is that the method appears to work well for high-dimensional
problems in the sense that it can find a relatively good solution, with
relative errors reduced to around 0.1, in the first few iterations. Considering that
a direct state-discretization method cannot work for a 20-dimensional
problem (it would require q^20 points if each dimension is discretized into q
points), our method can be considered a useful alternative for solving high-dimensional
stochastic dynamic programs for which the curse of dimensionality is
a major concern.
The negative result from this example is that, after a few iterations, the method
converges very slowly, in particular for high-dimensional problems. This can be
seen from the examples of dimensions (n,m) = (20, 20) and (50, 50). Useful cuts
continue to be added in these examples but, while they reduce local errors around the
iteration points, errors in other portions of the domain remain largely unaffected.
6 Observations and future issues
We have described an algorithm for solving a general class of discrete-time convex
infinite horizon optimization problems and demonstrated the method on two
simple examples. As the first example demonstrates, many iterations may be
required to achieve accuracy for highly nonlinear value functions. The second example,
however, demonstrates that the approximations converge very quickly at the
beginning and can reach a relative error of around 0.1, even for high-dimensional
problems.
Since each iteration involves additional optimization steps, selection of x (or
potentially multiple points on each iteration) has a critical effect on performance.
We mentioned the d.c. methods as one possibility for finding good points, but
other methods that require fewer function evaluations may also be useful. In the
second example, we select x from a small sample, and the results are encouraging.
More numerical experiments are needed to determine whether this selection is
effective in general, and the effect of different options also requires further study.
Our method relies on identification of the feasible domain, D∗, which leaves us
with the following questions:
(i) Under what condition is D∗ a polytope?
(ii) Can Algorithm 4.1 terminate in a finite number of iterations if D∗ is a
polytope?
(iii) How can one modify the algorithm if D∗ is not a polytope?
These questions and further implementation issues are additional subjects for
future research.
7 Appendix
Appendix 1
This appendix describes a method for finding a minimizer of the problem

min_{x ∈ D∗} V(x) − M(V)(x),

where both V and M(V) are convex. This problem is equivalent to

min { x_{n+1} : V(x) − M(V)(x) − x_{n+1} ≤ 0, x ∈ D∗ },

which is equivalent to

min { x_{n+1} : V(x) − x_{n+1} − x_{n+2} ≤ 0, M(V)(x) − x_{n+2} ≥ 0, x ∈ D∗ }.

Suppose that

V(x) = max{Q_i x + q_i : i = 1, . . . , k},  ∀x ∈ D∗ ⊂ ℝ^n;

then, the above problem is equivalent to

min { x_{n+1} : Q_i x + q_i − x_{n+1} − x_{n+2} ≤ 0, i = 1, . . . , k, x ∈ D∗, M(V)(x) − x_{n+2} ≥ 0 }.
The first k + 1 constraints define a polyhedral set, denoted by D. (In order to
use the algorithm described in [6], D should be bounded; this can be arranged by
adding appropriate lower and upper bounds on x_{n+1} and x_{n+2} without changing
the solution of the minimization problem.) The function in the (k+2)-nd constraint
is convex; denote it by g. Such a program can be solved by Algorithm 4.1 in [6].
Here we describe it briefly. The algorithm solves

min { c^T x : x ∈ D, g(x) ≥ 0 }.   (7.1)
Initialization:
Solve min{c^T x : x ∈ D} to obtain x_0 ∈ D. Assume g(x_0) < 0 (otherwise,
x_0 is an optimal solution to (7.1)). Construct a simple polytope S_0, e.g., a simplex,
containing D, such that x_0 is a vertex of S_0; compute the vertex set V(S_0) of S_0.
Set j ← 0.
Iterations:
Choose z ∈ V(S_j) satisfying g(z) = max{g(x) : x ∈ V(S_j)}.
If g(z) = 0 and z ∈ D, then stop; z is an optimal solution.
Otherwise, apply the simplex algorithm with starting vertex z to solve min{c^T x :
x ∈ S_j} until an edge [u, v] of S_j is found such that g(u) ≥ 0, g(v) < 0, and
c^T v < c^T u; compute the intersection point s of [u, v] with {x : g(x) = 0}.
If s ∈ D, then S_{j+1} ← S_j ∩ {x : c^T x ≤ c^T s}.
If s ∉ D, then S_{j+1} ← S_j ∩ {x : l(x) ≤ 0}, where l(x) ≤ 0 is one of the linear
constraints defining D satisfying l(s) > 0.
Set j ← j + 1 and repeat.
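As one concrete piece of this scheme, the intersection point s of the edge [u, v] with {x : g(x) = 0} can be computed by bisection on the segment parameter, since g(u) ≥ 0 > g(v) and g is convex along the edge. The helper below is our own sketch, not code from [6]:

```python
def edge_intersection(g, u, v, tol=1e-10):
    """Return s on the segment [u, v] with g(s) ~= 0, given g(u) >= 0 > g(v).

    g is continuous along the segment, so bisection on the parameter
    lam (with s = u + lam * (v - u)) brackets the zero crossing."""
    point = lambda lam: [ui + lam * (vi - ui) for ui, vi in zip(u, v)]
    lo, hi = 0.0, 1.0                     # g(point(lo)) >= 0 > g(point(hi))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(point(mid)) >= 0:
            lo = mid
        else:
            hi = mid
    return point(0.5 * (lo + hi))
```

Because g is convex, it changes sign exactly once on such an edge, so the bisection locates the unique crossing.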
Appendix 2
Define the function ρ = M(V) by

ρ(x) = min_{y,θ} { c(x, y) + δ ∑_j p_j θ_j : Q z_j + q ≤ θ_j e, F z_j ≤ f,
                   z_j = A_j x + B_j y + b_j, j = 1, . . . , L },

and let (ȳ, θ̄) and (λ̄, μ̄) be an optimal solution and multiplier of the above minimization
problem at x = x̄. We will show that

ξ = ζ̄_x + ∑_{j=1}^L (λ̄_j^T Q A_j + μ̄_j^T F A_j)

is a subgradient of ρ at x̄, where ζ̄ = (ζ̄_x, ζ̄_y) is a subgradient of c at (x̄, ȳ).
We can write

ρ(x) = min_{y,θ} { c(x, y) + δ ∑_{j=1}^L p_j θ_j : Q(A_j x + B_j y + b_j) + q ≤ θ_j e,
                   F(A_j x + B_j y + b_j) ≤ f, j = 1, . . . , L }
     = max { h(λ, μ; x) : λ_j^T e = δ p_j, j = 1, . . . , L, λ ≥ 0, μ ≥ 0 },   (7.2)

where λ = (λ_1, . . . , λ_L), μ = (μ_1, . . . , μ_L), and

h(λ, μ; x) = min_y c(x, y) + ∑_{j=1}^L [λ_j^T (Q(A_j x + B_j y + b_j) + q)
             + μ_j^T (F(A_j x + B_j y + b_j) − f)].   (7.3)
Let (λ̄, μ̄) be the optimal solution of the problem (7.2) with x = x̄. Then

ρ(x̄) = h(λ̄, μ̄; x̄).

The necessary and sufficient condition for ȳ to be an optimal solution of the
problem (7.3) (given (λ̄, μ̄; x̄)) is that there exists a subgradient ζ̄ = (ζ̄_x, ζ̄_y) of c
at (x̄, ȳ) such that

ζ̄_y + ∑_{j=1}^L [λ̄_j^T Q B_j + μ̄_j^T F B_j] = 0.   (7.4)
For fixed (λ, μ) = (λ̄, μ̄) and for any x, denote by y_x an optimal solution of (7.3).
Then,

ρ(x) = max_{λ,μ} { h(λ, μ; x) : λ_j^T e = δ p_j, j = 1, . . . , L, λ ≥ 0, μ ≥ 0 }
     ≥ h(λ̄, μ̄; x)
     = c(x, y_x) + ∑_{j=1}^L [λ̄_j^T (Q(A_j x + B_j y_x + b_j) + q)
       + μ̄_j^T (F(A_j x + B_j y_x + b_j) − f)].

Because c is convex,

c(x, y) ≥ c(x̄, ȳ) + ζ̄_x(x − x̄) + ζ̄_y(y − ȳ).

Note that y_{x̄} = ȳ and

ρ(x̄) = h(λ̄, μ̄; x̄)
      = c(x̄, ȳ) + ∑_{j=1}^L [λ̄_j^T (Q(A_j x̄ + B_j ȳ + b_j) + q)
        + μ̄_j^T (F(A_j x̄ + B_j ȳ + b_j) − f)].
Thus,

ρ(x) ≥ ρ(x̄) + ζ̄_x(x − x̄) + ζ̄_y(y_x − ȳ)
       + ∑_{j=1}^L [λ̄_j^T (Q A_j (x − x̄) + Q B_j (y_x − ȳ)) + μ̄_j^T (F A_j (x − x̄) + F B_j (y_x − ȳ))]
     = ρ(x̄) + { ζ̄_x + ∑_{j=1}^L (λ̄_j^T Q A_j + μ̄_j^T F A_j) } (x − x̄),

where the last equality uses (7.4). The above inequality shows that

ξ = ζ̄_x + ∑_{j=1}^L (λ̄_j^T Q A_j + μ̄_j^T F A_j)

is a subgradient of ρ at x̄.
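When c is linear, this subgradient can be recovered numerically from LP dual multipliers. The instance below (scalar state and control, two scenarios, c(x, y) = y, cuts ±z ≤ θ so that V(z) = |z|, and feasibility constraints |z| ≤ 10) is entirely made up; we also assume scipy's HiGHS interface, whose res.ineqlin.marginals are the negated nonnegative multipliers of the ≤-constraints:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: c(x, y) = y, two scenarios z_j = x + y + b_j, delta = 0.8.
delta, p, b = 0.8, [0.5, 0.5], [1.0, -1.0]

def rho_and_subgrad(x):
    c = np.array([1.0, delta * p[0], delta * p[1]])   # min y + delta sum p_j th_j
    A_ub = np.array([
        #  y    th1   th2
        [ 1.0, -1.0,  0.0],    #  z_1 <= theta_1
        [-1.0, -1.0,  0.0],    # -z_1 <= theta_1
        [ 1.0,  0.0, -1.0],    #  z_2 <= theta_2
        [-1.0,  0.0, -1.0],    # -z_2 <= theta_2
        [ 1.0,  0.0,  0.0],    #  z_1 <= 10
        [-1.0,  0.0,  0.0],    # -z_1 <= 10
        [ 1.0,  0.0,  0.0],    #  z_2 <= 10
        [-1.0,  0.0,  0.0],    # -z_2 <= 10
    ])
    ax = np.array([1, -1, 1, -1, 1, -1, 1, -1], float)  # coefficient of x per row
    rhs0 = np.array([-b[0], b[0], -b[1], b[1],
                     10 - b[0], 10 + b[0], 10 - b[1], 10 + b[1]])
    res = linprog(c, A_ub=A_ub, b_ub=rhs0 - ax * x,
                  bounds=[(None, None)] * 3, method="highs")
    mults = -res.ineqlin.marginals     # nonnegative multipliers (lambda, mu)
    # With zeta_x = 0 (c does not depend on x) and A_j = 1, the formula
    # xi = zeta_x + sum_j (lam_j' Q A_j + mu_j' F A_j) reduces to mults @ ax.
    xi = float(mults @ ax)
    return res.fun, xi

rho0, xi = rho_and_subgrad(0.0)
rho1, _ = rho_and_subgrad(0.5)
# Subgradient inequality: rho1 >= rho0 + xi * 0.5
```

On this made-up instance ρ(x) = −1.8 − x near x = 0, so the recovered ξ is −1 and the subgradient inequality holds with equality.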
References
[1] H. Benitez-Silva, G. Hall, G.J. Hitsch, G. Pauletto, and J. Rust, A compar-
ison of discrete and parametric approximation methods for continuous-state
dynamic programming problems, Research report, Department of Economics,
Yale University, New Haven, CT, 2000 (also Working Paper No. 24, Com-
putation in Economics and Finance, Society for Computational Economics
2000).
[2] D.P. Bertsekas, Dynamic Programming and Stochastic Control, Academic
Press, 1976.
[3] D.P. Bertsekas, Dynamic Programming and Optimal Control, Volume II,
Athena Scientific, Belmont, MA, 1995.
[4] D.P. De Farias and B. Van Roy, The linear programming approach to approx-
imate dynamic programming, Operations Research, 51 (2003), pp. 850–865.
[5] S.D. Flam and R.J-B Wets, Existence results and finite horizon approximates
for infinite horizon optimization problems, Econometrica, 55 (1987), pp. 1187–
1209.
[6] R. Horst, P.M. Pardalos, and N.V. Thoai, Introduction to Global Optimiza-
tion, 2nd edition, Kluwer Academic Publishers, Dordrecht, 2000.
[7] R. Horst and H. Tuy, Global Optimization: Deterministic Approaches, 2nd
revised edition, Springer-Verlag, Heidelberg, 1993.
[8] R.C. Grinold, Finite horizon approximations of infinite horizon linear pro-
grams, Math. Program., 12 (1977), pp. 1–17.
[9] L.A. Korf, An approximation framework for infinite horizon stochastic dy-
namic optimization problems with discounted costs, Research report, De-
partment of Mathematics, Washington University, Seattle, WA, 2000.
[10] W.S. Lovejoy, A survey of algorithmic methods for partially observed Markov
decision processes, Ann. Oper. Res., 28 (1991), pp. 47–66.
[11] W.S. Lovejoy, Computationally feasible bound for partially observed Markov
decision processes, Oper. Res., 39 (1991), pp. 162–175.
[12] R.C. Merton, Lifetime portfolio selection under uncertainty: the continuous-
time case, The Review of Economics and Statistics, 51 (1969), pp. 247–257.
[13] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton,
NJ, 1970.
[14] R.T. Rockafellar and R.J-B Wets, Variational Analysis, Springer-Verlag,
Berlin, 1998.
[15] W. Rudin, Principles of Mathematical Analysis, 3rd edition, McGraw-Hill
Inc., 1984.
[16] J. Rust, A comparison of policy iteration methods for solving continuous-
state, infinite-horizon Markovian decision problems using random, quasi-
random, and deterministic discretizations, Working Paper, Yale University,
April 1997 (also in Computational Economics, Economics Working Paper
Archive at WUSTL).
[17] P.A. Samuelson, Lifetime portfolio selection by dynamic stochastic program-
ming, The Review of Economics and Statistics, 51 (1969), pp. 239–246.
[18] R.D. Smallwood and E.J. Sondik, The optimal control of partially observable
Markov processes over a finite horizon, Oper. Res., 21 (1973), pp. 1071–1088.