
Multiperiod Optimization: Dynamic Programming vs. Optimal Control
(Harry Mapp, Oklahoma State University, Presiding)

Dynamic Programming: Has Its Day Arrived?

Oscar R. Burt

Oscar R. Burt is a professor in the Department of Agricultural Economics and Economics at Montana State University.

A look at John Kennedy's recent review article on dynamic programming (DP) applications in agriculture, forestry, and fisheries leaves no doubt that DP is a useful analytical and numerical technique [Kennedy]. However, the method appears to be in more common use among Australian than American agricultural economists, particularly in the farm management/production economics area. When John Allison and I published "Farm Management Decisions with Dynamic Programming" [Burt and Allison] twenty years ago, I thought DP would be as routinely used by now as linear programming (LP), but this is obviously not the case. After the fundamental principles of DP are outlined, the apparent lack of its popularity among research agricultural economists will be addressed.

For simplicity in exposition, consider a dynamic process which can be described by a single state variable and a first order difference equation,

(1) $x_{t+1} = h(u_t, x_t, \varepsilon_t)$

where $u$, $x$, and $\varepsilon$ are the decision variable, state variable, and a random variable, respectively. Frequently, the economic criterion is a periodic cash flow, the expectation of which can be written as a function of $u_t$ and $x_t$, i.e., $G(u_t, x_t)$. A one-period discount factor appropriately defined for the length of time period $t$ is denoted $\beta$. Since the conditional expected periodic return function, $G(u_j, x_j)$, is a random variable for periods $j > t$ from the decision agent's point of view at time $t$, a decision function $D_t(x_t)$ is sought which maximizes the expected present value criterion

(2) $E\left[\sum_{t=1}^{\infty} \beta^t G(D_t(x_t), x_t)\right]$

subject to (1) and an initial value $x_0$, where $E(\cdot)$ is the mathematical expectation operator.

The dynamic optimization model can be restated as a recursive equation by defining $v_n(x)$ as the expected present value of net returns from an $n$-period decision process when the optimal decision rule is followed and the initial state of the process is $x$. Then application of Bellman's "principle of optimality" [Bellman 1957a] gives

(3) $v_n(x) = \max_u \left[ G(u,x) + \beta E\{v_{n-1}(h(u,x,\varepsilon))\} \right]$.

The subscript $n$ is a reverse ordering with respect to chronological time $t$, and is convenient for considering limits as the process is extended indefinitely. The number of periods' duration of the process is called the number of stages; thus (3) refers to an $n$-stage process.

When $G(\cdot)$, $h(\cdot)$, and the distribution of $\varepsilon$ are independent of stages, $v_n(\cdot)$ converges to a limiting function as $n \to \infty$ and the decision rule is invariant over stages. Therefore, the limiting case of (3) is a functional equation where $v(\cdot)$ replaces $v_n(\cdot)$ and $v_{n-1}(\cdot)$. Many applications involve a finite number of periods and functions that change over time (stages), but the simpler case is used here for discussion purposes. The basic recurrence equation in (3) generalizes in an obvious way to allow $u$, $x$, and $\varepsilon$ to be vectors with $h(\cdot)$ a vector function. The nonstochastic case occurs when the random variable $\varepsilon$ is deleted from $h(\cdot)$, i.e., the state variable changes from stage to stage as an exact function instead of randomly.

The most common method of solving (3) is by successive approximations to $v_{n-1}(\cdot)$. An arbitrary value is assigned to $v_0(\cdot)$, say identically zero; then (3) is solved for $v_1(\cdot)$ by solving the optimization problem on the right hand side for many discrete values of the state variable, then $v_2(\cdot)$ is obtained with the approximation for $v_1(\cdot)$, then $v_3(\cdot)$, and so on. Using discrete values of a continuous state variable as an approximation converts (3) into a finite Markovian decision process, which was the type of DP problem analyzed in Ronald Howard's pioneering work [Howard].
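As a concrete illustration of this successive-approximation scheme, the sketch below iterates (3) on a discretized state grid in Python. The return function, transition equation, discount factor, and shock distribution are hypothetical stand-ins chosen only so the code runs; they are not taken from any application discussed here.

```python
import numpy as np

# Illustrative stand-ins for G, h, beta, and the shock distribution.
beta = 0.95
x_grid = np.linspace(0.0, 10.0, 51)        # discretized state variable
u_grid = np.linspace(0.0, 2.0, 21)         # candidate decisions
eps_vals = np.array([-0.5, 0.0, 0.5])      # discrete random shock
eps_prob = np.array([0.25, 0.5, 0.25])

def G(u, x):                               # expected periodic return
    return np.sqrt(u * x + 1e-9) - 0.1 * u

def h(u, x, eps):                          # state transition as in (1)
    return np.clip(x - u + 0.3 + eps, 0.0, 10.0)

v = np.zeros_like(x_grid)                  # v_0 identically zero
for n in range(500):                       # successive approximations to (3)
    # For each grid state, maximize G(u,x) + beta * E v_{n-1}(h(u,x,eps)) over u.
    q = np.empty((len(x_grid), len(u_grid)))
    for j, u in enumerate(u_grid):
        x_next = h(u, x_grid[:, None], eps_vals[None, :])   # (states, shocks)
        Ev = np.interp(x_next, x_grid, v) @ eps_prob        # expectation over eps
        q[:, j] = G(u, x_grid) + beta * Ev
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new

policy = u_grid[q.argmax(axis=1)]          # stationary decision rule D(x)
```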

Obstacles to Implementation

Although the computational and input/output problems with data are substantial in getting solutions to empirical formulations of (3), the greatest obstacle to applications seems to be conceptualization difficulties, i.e., understanding how to formulate an empirical situation as a DP model. The conceptual and computational difficulties compound one another because the most intuitive and direct way to structure the model is often infeasible computationally, or at least cumbersome and expensive in computer resources. Clifford Hildreth remarked 25 years ago in the context of this same type of decision theory model: ". . . although theoretical discussion is concerned almost exclusively with solving given decision problems, formulating appropriate problems is one of the most difficult tasks in successful application" [Hildreth].

It is important to recognize that the solution technique is relatively unimportant compared to formulation of the practical problem in such a way that it is amenable to analysis by the particular technique chosen. An outstanding example of imaginative use of stochastic simulation in conjunction with statistical response surface exploration methods is the early application of Zusman and Amiad to a stochastic DP problem which appeared too complex for a direct DP solution. The recent analysis of U.S. wheat storage policies by Taylor and Talpaz is another clever use of two techniques together: approximate certainty equivalence DP and stochastic simulation.

The primary objective in all modeling is to capture the essential aspects of the phenomenon under study and yet keep the model as simple as possible. The tendency for stochastic sequential decision problems to mushroom in complexity requires more imagination in the application of this basic principle of modeling. Modern computers and cheap rates at many research facilities have encouraged the substitution of computer simulation in a cavalier manner for well constructed economic models. The result has been more numerical output than can be assimilated by the analyst, let alone presented in an objective form for the professional literature. Many papers are being published which try to subjectively summarize voluminous output from simulation models which are a complete black box to the prospective audience.

Although I do not understand why, there seems to be more difficulty than with most other mathematical modeling techniques in making the transition from a rudimentary knowledge of the method of DP to original formulations of problems. Dreyfus and Law begin the preface of their text:

It is our conviction, based on considerable experience teaching the subject, that the art of formulating and solving problems using dynamic programming can be learned only through active participation by the student. No amount of passive listening to lectures or of reading text material prepares the student to formulate and solve novel problems. The student must first discover, by experience, that proper formulation is not quite as trivial as it appears when reading a textbook solution. Then, by considerable practice with solving problems on his own, he will acquire the feel for the subject that ultimately renders proper formulation easy and natural.

The above statement makes an excellent premise on which to teach any technical course such as econometrics or operations research techniques, but in a relative sense, it is especially important in DP. My personal experiences in teaching the subject support the view of Dreyfus and Law.

In my opinion, part of the lack of popularity of DP in agricultural economics research emanates from its cursory treatment in graduate courses which survey various operations research techniques. The many topics which compete for time in such courses, combined with the limited credit hours devoted to quantitative methods, discourage instructors from covering the subject in much depth. Another factor is the generally poor background in mathematics and statistics of our graduate students, which discourages them in their study of abstract topics such as DP, particularly the stochastic models. A final obstacle to the students learning the subject is the poor quantitative training of the instructors themselves in many of these courses. When these quantitative courses are offered in specialized departments on campus with well qualified instruction, that is where the students should be enrolled. Technical competence in our profession is still suffering from the "relevancy binge" taken back in the late 1960's and early 70's.

The use of control theory in its popular deterministic, continuous time version, associated with the Pontryagin maximum principle, has received a great deal of exposure in the general economics literature during the past decade. However, numerical solution of applications is a rarity because these papers have been almost entirely of a theoretical nature where the general mathematical structure of the solution is the objective. Explicit solution of control theory problems is seldom possible except for trivial exercises. Most agricultural economists, particularly the younger ones, have had some exposure to this type of dynamic economic analysis. Control theory applications in the agricultural economics literature tend to mimic the theoretical research in general economics by stating the necessary conditions for an optimum, a la the Pontryagin maximum principle, and then giving an intuitive economic interpretation, but theorems on the structure of the solution are rare.

The discrete time characteristic of DP is actually more realistic in most economic applications than continuous time modeling. One may wonder why the discrete time analogues of Pontryagin's maximum principle are not more widely used in general economics. The reason is apparently the more awkward mathematical analysis and less opportunity to exploit results already proved in the physical and mathematical sciences. The cost of using continuous approximations to discrete time phenomena is that certain tacit assumptions are made about the optimal solution to the actual discrete time model. For example, certain types of oscillatory behavior of the state variables are ruled out in the continuous time version, which could affect the solution. In a sense, the necessity of making tacit assumptions to apply continuous time models to discrete phenomena is the dual of the assumption required to derive the Pontryagin maximum principle from Bellman's principle of optimality. Mathematically, the relaxed assumptions needed to prove the maximum principle were the main contribution of Pontryagin over Bellman's results (see Intriligator, pp. 329, 356, and footnote 5 for the nature of the assumption required to use Bellman's approach).

When the dynamic model is nonstochastic and the objective of the analysis is to set out the necessary conditions for an optimal solution with a little intuitive economic interpretation, it is largely irrelevant whether we think of the necessary conditions as having been derived through the calculus of variations, control theory, or DP; the result is the same. If further mathematical analysis to delineate the structure of the solution is attempted, it appears that mathematical economists usually prefer classic variational methods over DP in deterministic models.

Bellman introduced the term "curse of dimensionality" to describe the extremely large computer storage requirements resulting when a direct computational procedure is applied to the recurrence relation of (3). Unfortunately, the term has tended to be associated with DP and not recognized as an inherent characteristic of dynamic optimization problems in general when the number of state variables is large (more than 3 or 4). Many methods which are used with the idea of avoiding the curse, such as simulation, tacitly pretend that it does not exist. The result of ignoring the true complexity of the optimization problem is a partial, approximate, or at worst, no solution to the problem.

Most solution methods associated with the control theory or calculus of variations formulation reduce the dimensionality problem by obtaining the solution to a very specific statement of the optimization, viz., a specific initial state of the process, instead of getting the solution to an entire family of problems associated with all possible initial states as is obtained with DP. The latter method also yields the entire family of solutions for all possible lengths of planning horizons in the context of many applications. In empirical agricultural economics research, the objective is to learn much more about a dynamic process than merely an optimal solution path starting from some initial state, in contrast to many applications such as in the space program or individual firm investment decisions.

DP has a certain generality and simplicity for numerical solution of low dimension problems in that concavity/convexity assumptions are not required and inequality and integer constraints actually simplify the computations. But good numerical algorithms have been developed in recent years for some classes of continuous state and decision variable models, and these algorithms provide powerful alternatives when appropriate for the problem at hand. When one can obtain the optimal time path of the state vector for an initial state with one of these algorithms at a relatively low cost, the structure of the optimal decision rule could be approximated by using the principles of experimental design to strategically choose a relatively small number of initial states. The data thus generated would provide an empirical basis for fitting a continuous vector function of the state variables as an approximation to the decision rule, i.e., $u_t = D_t(x_t)$, where $u$ and $x$ are vectors.

The above discussion has been with reference to deterministic models; when the state variable difference (differential) equation is stochastic as in (1), DP appears to be the only viable method for numerical solution. The stochastic version of the maximum principle does not appear to be very useful in suggesting solution algorithms [Kushner and Schweppe]. The method used recently by Dixon and Howitt (attributed to Athans) has serious logical problems in that a deterministic solution path is used as a reference locus for minimizing expected squared deviations. Various approximation methods applied directly to (3) would appear to give better results; some of these are discussed below. The basic difficulty in stochastic control models is that the decision at period $t$ must be made after the random outcomes of the state variables are known for periods $j < t$, sometimes referred to as a "wait and see" solution; otherwise valuable information is thrown away. In the parlance of control theory, the decision rule needs to be in "closed loop" form, where the decision variable is expressed as a function of the state variables. This is the natural way in which the decision rule is obtained from a DP solution procedure; see (1) through (3).

Making DP More Operational

The primary consideration in formulating problems in a DP model is the choice of state variables, which must jointly describe the entire history of the process. The first order difference equation in (1) requires that the state and decision vectors on the right hand side summarize all pertinent information contributing to optimization of the dynamic process under study. Deficiencies in state variables can directly discard information, since a decision rule makes the decision variables functions of the state variables at each stage, or the loss of information can be indirect by causing $h(\cdot)$ in (1) to be inaccurate and create errors in the time paths of the included state variables. Frequently, information is lost in both ways when a relevant state variable is omitted from the model.

The expected present value function, $v_n(x)$, ultimately determines the importance of a state variable; $x$ is now defined as a vector. If $\partial v_n(\cdot)/\partial x_j = 0$, then $x_j$ is a redundant state variable which could be deleted. Going from this extreme case, the importance of $x_j$ in the decision process is closely related to the magnitude of $\partial v_n(\cdot)/\partial x_j$, particularly in a relative sense to the other state variables. We can avoid the scale of measurement problem by converting $\partial v_n(\cdot)/\partial x_j$ to an elasticity, denoted $\eta_j$. Then $|\eta_j|$ relative to $|\eta_k|$ tells us something about the importance of $x_j$ compared to $x_k$.

Another important consideration is the amount of independent variation in $x_j$ taken jointly with the other state variables. The ideas here are much like those in multivariate statistical models, factor analysis in particular. If all the independent variation of the $\{x_j\}$ takes place in $q < m$ dimensions and $x$ is an $m$-component vector, then there exists a transformation from $m$ to $q$ dimensions such that the new state variables $y_1, \ldots, y_q$ describe the process equally well. The meaning of independence is linear and stochastic for deterministic and stochastic processes, respectively, at least in an operational sense. Most likely, the transformation would be linear in either case, which implies the normal distribution for a stochastic process, possibly after a previous transformation of the state variables such as logarithmic.

The above considerations about the state variables would not be very useful if economic modeling were an exact science. Approximations are always necessary, and these ideas can be used to choose among various ways of simplifying the model. This is particularly true in conjunction with a method of deriving proximate decision rules given by Arrow for DP formulations. Observe that (3) would be a static optimization problem if the function $v_{n-1}(x)$ were known. The basic idea of Arrow's is to use an approximation to $v_{n-1}(\cdot)$ which provides a proximate decision rule when the one-stage maximization problem in (3) is solved.

Reductions in the State Vector Dimension

There is no set way to arrive at an approximation to $v_{n-1}(x)$. In some types of applications, such as firm growth models, dynamic linear programming coupled with the standard parametric programming options of the algorithms could be used to estimate a present value function starting from various states. Then $v_{n-1}(x)$ could be approximated by fitting a continuous function, such as a polynomial, to the points generated by parametric linear programming. The form of (3) from which the final proximate decision rule is derived could be either deterministic or stochastic.

In a recent wheat storage study [Burt, Koo, and Dudley], a simplified model containing only two state variables was used to estimate $v_{n-1}(x)$ with the other state variables suppressed. Then the special nature of the additional state variables, viz., higher order lags in econometric equations, permitted their gradual phasing into the expected present value function over six additional stages. Each additional iteration contained an increment to the number of state variables and provided an improved approximation to $v_{n-1}(x)$ in (3). The model was stochastic and the final iteration implicitly contained 14 state variables. This example illustrates the importance of exploiting any special structure in the application.

The method used in [Burt, et al.] to suppress all of the state variables except two was a very informal application of the principles discussed above on independent variation among the state variables, which were economic time series variables and highly collinear. Subjective estimates of the relative magnitudes of the $\{\eta_j\}$ were also used in deciding which state variables to retain for the two state variable model. Another consideration in this application was the role of a commodity market equilibrium where the lagged variables in the econometric equations converge to equilibrium values; the two state variable model could be interpreted as an "equilibrium neighborhood" model.


Robert Taylor just completed a study of wild oats control in the Northern Great Plains which solved a stochastic DP model containing five state variables. The proximate solution procedure of Arrow was used by first solving a three state variable problem which kept the most important variables based on a priori reasoning. Then the two remaining state variables were introduced in sequence with two more iterations. The model contained four stochastic and one deterministic state variables. Here too, the key to successful solution with relative ease on a modest size computer was exploitation of the special structure of the application.

Taylor's approach to simplifying the problem was an informal, subjective use of relative elasticities, $|\eta_j|$; two state variables were suppressed in getting the estimate of $v_{n-1}(\cdot)$. Some limited validation of the a priori assumption about relative elasticities can be performed after the final model when results for all five state variables have been estimated, but care must be taken in recognizing the influence of the approximating procedure itself on empirical estimates of the $\{\eta_j\}$. Also, the size of $|\eta_j|$ must be considered jointly with the amount of variation in $x_j$ during the process; if $x_j$ is constant over the process it is a redundant variable.

Giaver's early study of dairy cow replacement was a formal application of multivariate normal theory to reduce the number of state variables by a linear transformation to a lower dimension space [Giaver]. An important set of state variables in dairy cow replacement decisions is lactation history; if a cow is kept for as many as six lactations, then six state variables are involved in describing her production history. Giaver assumed a multivariate normal distribution on production by lactation and used a single state variable to replace the entire history. This variable was a linear combination of individual lactations obtained from the linear regression coefficients of a predictor equation for the next lactation beyond the current history. Since the conditional distribution of a normal variate, given a linear constraint on a subset of variates with which it is jointly distributed, is also normally distributed, the distribution for stochastic state transitions was easily derived from the multivariate distribution over all lactations. The particular linear combination chosen had the intuitive appeal of being the conditional mean of the lactation for the next stage of the decision process, given the cow's lactation history.

Although Giaver's method has a certain amount of intuitive appeal, a better choice for a single linear combination of the state variables would be the eigenvector associated with the largest principal component of the estimated covariance matrix. Point estimation is not the objective, but instead, a linear relation which captures the most information with respect to the conditional distribution, given the linear restriction on the subset of multivariate normal variates. In the normal distribution, the most information is synonymous with minimum variance in the conditionally distributed random variable. The principal component method also generalizes directly to two or more linear combinations of the state variables to get a smaller set of state variables which captures the most information. In distributions which are not multivariate normal, the principal component method, which minimizes conditional variance, would seem to be a good approximation to an optimal transformation, although it cannot be justified as well in a formal statistical model.
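A minimal numerical sketch of this suggestion, with simulated data standing in for an estimated lactation (or other state) history: the leading eigenvector of the sample covariance matrix defines the single linear combination retained as a state variable. All numbers here are illustrative assumptions, not estimates from any study cited.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical history of m = 6 collinear state variables, simulated here
# only to have something to decompose.
common = rng.normal(size=(500, 1))
X = common + 0.3 * rng.normal(size=(500, 6))

S = np.cov(X, rowvar=False)              # estimated covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
w = eigvecs[:, -1]                       # eigenvector of the largest component
y = (X - X.mean(axis=0)) @ w             # single reduced state variable

share = eigvals[-1] / eigvals.sum()      # variance captured by y
print(f"leading component captures {share:.1%} of the variation")
```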

The above discussion of transforming multivariate normal state variables into a smaller set, and thus throwing away information, illustrates a general method of reducing the dimension of the state vector in stochastic DP which is much easier to defend philosophically than its counterpart in deterministic models. A subset of the original state variables, or a reduced set obtained by transformations, is used as the state variables in the DP model, and the probability distribution associated with the state transition from stage to stage has greater variance. The reduced model uses a conditional distribution based on the smaller set of state variables, which connotes less information in the decision process. Since we are dealing with probability distributions, there is no "error" or "bias" in the modeling process, just less information relative to the ideal.

In deterministic models, we can only speak of approximations and orders of magnitude on the errors created by a reduction in the dimension of the state vector. In the rare instance that an exact linear combination exists among the state variables, the dimension can be reduced without error, but any approximate linear combination will introduce some error if used to reduce the number of state variables. As a practical matter, the situation is much like that in stochastic models, but nothing as precise as a less informative conditional probability distribution can be used.

Taylor's Series Approximations to $v_{n-1}(\cdot)$

When continuous state variables are approximated by discrete segments and the transitions for that variable are deterministic, some form of interpolation between the discrete points is necessary. Usually linear interpolation is adequate if the intervals are not too wide. Let the random variable $\varepsilon$ be suppressed in (3) and suppose the one continuous state variable $x$ is approximated by a set of discrete values, $x^1, x^2, \ldots, x^p$. Then the function $v_{n-1}(\cdot)$ is approximated by a table of values $v_{n-1}(x^i)$, $i = 1, 2, \ldots, p$. In general, $h(u,x)$ in the right hand side of (3) will fall between two of the discrete values, say $x^j$ and $x^{j+1}$, which requires some sort of interpolation scheme between the two table values. A method of interpolation for deterministic state variables which comprise a subset of the total in a stochastic model is also necessary in most cases.
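For the one-variable case, the interpolation step amounts to no more than the following sketch, where the grid, the table of $v_{n-1}$ values, and the query point are all hypothetical:

```python
import numpy as np

x_pts = np.linspace(0.0, 1.0, 11)        # discrete values x^1, ..., x^p
v_tab = np.log1p(x_pts)                  # stand-in table of v_{n-1}(x^i)

def v_interp(x_next):
    """Linear interpolation between the two bracketing table values."""
    return np.interp(x_next, x_pts, v_tab)

# h(u, x) generally falls between grid points, e.g.:
print(v_interp(0.537))                   # between v_{n-1}(x^6) and v_{n-1}(x^7)
```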

Bellman has suggested that the curse of dimensionality can be partially overcome by approximating $v_n(x)$ in (3) as a polynomial on the relevant interval [Bellman 1961, p. 244]. It is easily shown that the table of values used to approximate $v_n(x)$ is replaced by the coefficients of a polynomial. It might take 50 discrete values with linear interpolation to give the same accuracy as a 5th degree polynomial with only 6 parameters. The method readily extends to several state variables, and some orthogonal form of polynomial such as Legendre polynomials could be used. These ideas could be used in fitting spline functions instead of polynomials if empirical evidence suggested a better approximation with the same number of parameters.
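A sketch of Bellman's suggestion in Python: the tabular approximation is replaced by the six coefficients of a 5th degree polynomial fitted in a Legendre basis. The value table being fitted is a made-up stand-in for $v_n$.

```python
import numpy as np

x_pts = np.linspace(0.0, 10.0, 50)            # fine tabular approximation
v_tab = np.sqrt(x_pts) + 0.02 * x_pts**2      # stand-in values of v_n(x^i)

# Replace the 50-entry table by the 6 coefficients of a 5th-degree
# polynomial fitted in the Legendre basis on the relevant interval.
v_poly = np.polynomial.Legendre.fit(x_pts, v_tab, deg=5)

x = 3.7
print(v_poly(x))                              # evaluate anywhere on the interval
print(np.max(np.abs(v_poly(x_pts) - v_tab)))  # check the fit against the table
```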

The polynomial approximation of $v_n(x)$ would appear relatively more powerful in stochastic DP where the state variables (or a subset) are continuous, although I have not seen it suggested. For a continuous single state variable, let $h(u,x,\varepsilon) = x + \phi(u,x,\varepsilon)$ without any loss in generality. Then $E v_{n-1}(h(u,x,\varepsilon))$ is expanded in a Taylor's series around the point $x$,

(4) $E v_{n-1}(x + \phi(u,x,\varepsilon)) = v_{n-1}(x) + E[\phi(u,x,\varepsilon)]\, v_{n-1}'(x) + E[\phi(u,x,\varepsilon)^2]\, v_{n-1}''(x)/2! + \cdots$

If a polynomial approximation is used on $v_{n-1}(\cdot)$, the derivatives evaluated at $x$ are easily calculated, and only the moment functions $E[\phi(u,x,\varepsilon)^j]$ are required to an arbitrary order of approximation desired for the maximization operation in (3). The main attraction of this method in the stochastic model is avoidance of discrete approximations by means of a finite state Markov chain model [Howard]; only a few moments of $\phi(u,x,\varepsilon)$ are required which contain $u$ and $x$ as arguments. But these moment functions are calculated once and for all, and are just part of the data for the model.
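The following sketch shows how (4) would be used: the derivatives come from a polynomial approximation to $v_{n-1}$, and the moment functions of $\phi$ are evaluated once. The particular polynomial coefficients and moment functions are hypothetical placeholders.

```python
import numpy as np

# Polynomial approximation to v_{n-1} (coefficients hypothetical) and the
# derivatives that (4) needs at the current state x.
v_poly = np.polynomial.Polynomial([0.0, 1.2, -0.05])
v1, v2 = v_poly.deriv(1), v_poly.deriv(2)

def phi_moments(u, x):
    """Stand-in moment functions E[phi^j(u,x,eps)], j = 1, 2; in a real
    application these are computed once and for all as part of the data."""
    m1 = 0.3 - 0.1 * u                    # E phi
    m2 = m1**2 + 0.04                     # E phi^2
    return m1, m2

def Ev_next(u, x):
    """Second-order Taylor approximation (4) to E v_{n-1}(x + phi)."""
    m1, m2 = phi_moments(u, x)
    return v_poly(x) + m1 * v1(x) + 0.5 * m2 * v2(x)

print(Ev_next(u=1.0, x=5.0))
```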

The use of (4) in the right hand side of (3) does not depend in any essential way on $v_{n-1}(\cdot)$ being globally approximated by a polynomial. The method is operational if the first few derivatives of $v_{n-1}(\cdot)$ can be calculated at an arbitrary point $x$. This could be done by an interpolation formula using a sufficient number of points and degree of polynomial applied to a tabular approximation of $v_{n-1}(\cdot)$, as is traditional in deterministic DP. The interpolation would get rather complex with several state variables, which would make the global polynomial approximation of $v_{n-1}(\cdot)$ look relatively more attractive. The transition from deterministic to stochastic DP might not be as onerous as most of us have thought if this method were as powerful as it appears.

An analytical approach to getting approximately optimal decision rules for certain types of DP applications has been quite successful [Burt 1964a, 1976, 1981; Burt and Cummings]. The ideas are similar to those just discussed in relation to substituting (4) into the right hand side of (3), except that approximations to the derivatives are also derived. In nonstochastic models the approximations become exact in a neighborhood of state space centered on the equilibrium state of the process. For decision processes where the state variables change slowly over time in conjunction with discounting, the approximation seems to give good results at any interior point in state space. In applications to ground water [Burt 1964a] and soil conservation [Burt 1981], the approximations were excellent except when the state variables were close to a nonnegativity boundary, which is not surprising in that the derivation of the approximations assumed interior solutions in the optimization.

The method can be used in either deterministic or stochastic applications. Application of the approximation requires solution of a system of equations for a given state of the process; a subset of the variables in the solution constitutes the approximately optimal values of the decision variables for the given state. As an example, the one decision and state variable model requires two and four simultaneous equations for the first and second order approximations, respectively, although one of the variables can be simply eliminated in the first order case. In the first order case the variables are the decision variable, $u$, and $v'(x)$, while in the second order approximation they are $u$, $v'(x)$, $v''(x)$, and $\partial u/\partial x$. These additional variables besides the decision variable are themselves of economic interest; $v'(x)$ is the marginal expected present value of the state variable and $v''(x)$ is its rate of change. The $\partial u/\partial x$ is a measure of the sensitivity of the decision variable to changes in the state variable.

The most general treatment of this method is found in [Burt 1976], where the model is stochastic with multiple decision and state variables. Most of the analysis is for a first order approximation, but the general second order case with one decision and state variable is analyzed in the appendix. Although the paper is oriented around multiaquifer ground water management, the mathematical results are in general terms.

When the assumptions are met for certainty equivalence DP [Simon; Theil], the second order approximation simplifies to the standard results; this was demonstrated in [Burt 1967] with a certainty equivalence model for intertemporal allocation of ground water. The second order approximation uses a locally quadratic approximation to the functional equation $v(x)$, and when the assumptions are met for the standard certainty equivalence theorem, $v(x)$ is apparently globally quadratic.

Algorithms for Discrete DP Markov Processes

For purposes of numerical solution, the recursive DP equation in (3) is usually approximated as a finite state Markov decision process,

(5) $v_n(i) = \max_k \left[ q_i^k + \beta \sum_{j=1}^{M} p_{ij}^k v_{n-1}(j) \right], \quad i = 1, 2, \ldots, M; \; k = 1, 2, \ldots, K$

in the following notation:

$i$ = one of $M$ possible states
$k$ = one of $K$ possible decisions
$q_i^k$ = discrete valued approximation to $G(u,x)$
$p_{ij}^k$ = conditional probability of going to state $j$ under decision $k$ when the current state is $i$
$v_n(i)$ = discrete valued approximation to $v_n(x)$.

Efficient solution methods to obtain the asymptotic decision rule in (5) as $n \to \infty$ are discussed below.

Several modifications of Bellman's standard successive approximation algorithm [Bellman 1957b] have been published, but their usefulness is difficult to assess. The following discussion draws heavily on an unpublished paper [Hendrikx, Van Nunen, and Wessels] which gives an excellent summary of the many modified algorithms and some comparative computation times on four applications. Hendrikx, et al. emphasize the importance of choosing a combination of variants in algorithms which exploits the special structure in an application, but some variants performed uniformly well across all four of their applications. Emphasis here will be placed on these more promising algorithms.

The method of successive approximation applied to (5) yields monotonic improvement in the decision rule as well as the value function $v_n(\cdot)$. An unequivocal convergence criterion comparable to that for Howard's policy improvement algorithm is not available for the decision rule. The relevant criterion in value iteration is to find a decision rule such that $v_n(i)$ is sufficiently close to $v(i)$, the maximum expected present value possible. The basis for such a convergence criterion was provided by MacQueen, who derived formulae for upper and lower bounds on $v(i)$ that can be calculated at stage $n$. The iterations on (5) are terminated when the difference between the upper and lower bounds is less than some small value $\delta$ times $|v_n(i)|$ for all $i$.
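A sketch of value iteration on (5) with a MacQueen-type stopping rule follows; for discount factor $\beta$ the bounds take the form $v_n(i) + \beta(1-\beta)^{-1}\min_j d_n(j) \le v(i) \le v_n(i) + \beta(1-\beta)^{-1}\max_j d_n(j)$ with $d_n = v_n - v_{n-1}$, so iteration stops when the gap between the bounds is small. The returns and transition probabilities below are randomly generated placeholders, not data from any cited application.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, beta = 20, 4, 0.95                  # states, decisions, discount factor

# Hypothetical data for (5): returns q[i,k] and transitions p[k,i,j].
q = rng.uniform(0.0, 1.0, size=(M, K))
p = rng.uniform(size=(K, M, M))
p /= p.sum(axis=2, keepdims=True)         # each row a probability vector

v = np.zeros(M)
delta = 1e-6
for n in range(10_000):
    # One sweep of (5): maximize over the K decisions for every state i.
    action_values = q + beta * np.einsum("kij,j->ik", p, v)
    v_new = action_values.max(axis=1)
    d = v_new - v
    # MacQueen-type bounds on v(i); stop when upper minus lower is small
    # relative to the size of v_n (a simple version of the delta criterion).
    gap = beta / (1.0 - beta) * (d.max() - d.min())
    v = v_new
    if gap < delta * np.abs(v).max():
        break

policy = action_values.argmax(axis=1)     # asymptotic decision rule
```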

Algorithms should always include this type of convergence criterion, although the best strategy may be not to apply it at each iteration because of the extra computation. There is a tradeoff between calculating extra iterations and the computations required to check for convergence. When the number of alternative decisions at each stage is large relative to the number of states, convergence should be checked at each iteration, but as the ratio K/M gets sufficiently small, every iteration is too frequent. A stochastic strategy could be used to advantage by checking only a subset of the M states for convergence at each iteration, with the subset chosen by random sampling. The best size sample would depend directly on K/M.

Most modified algorithms require extra computations at each iteration and would have more advantage when the maximization operation in (5) is relatively costly. However, some policy improvement variants, such as the Gauss-Seidel, can give impressive increases in convergence with essentially the same amount of computation at each iteration [Porteus]. The Gauss-Seidel variant exploits a near upper or lower triangular form in the matrix of transition probabilities under a given decision rule (common in replacement applications).

The bisection variant [Bartman] requires considerable extra computation at each iteration, but can greatly accelerate convergence when the policy improvement uniformly closes in one direction the gap between the upper and lower bounds on $v(i)$ for all $i$ at a given stage $n$. When the application has the right structure to exploit the Gauss-Seidel variant, the computation times reported in [Hendrikx, et al.] suggest that these two variants of the successive approximation algorithm complement one another. In one of these applications, the computation time was reduced by a factor of 25 and in another by 10.

A simple device which is advantageous when the number of decision alternatives is large relative to the number of states is to iterate (5) several times with the same decision rule [Van Nunen]. This procedure is somewhat of a combination between Howard's policy and Bellman's value iteration algorithms. Since the former tends to require fewer iterations but more difficult computations at each iteration than the latter, Van Nunen's variant would seem to be a good strategy on some types of problems. A serious limitation of Howard's policy iteration algorithm is the large dimension of the system of linear equations that must be solved at each iteration, which is equal to the number of states.

The following remarks on computational efficiency are applicable to generalizations of (5) where the $q_i^k$ and $p_{ij}^k$ are functions of the stage $n$, and thus, the decision rule is dependent on the stage and the maximum number of stages is finite. A large proportion of the elements in transition probability matrices are zero in most applications, and usually the positive elements are clustered together in adjacent columns for a given row. Both computer storage and computational requirements can be greatly reduced by structuring the algorithm to avoid the zeros. Taking $i$ and $k$ as given, let $S(i,k)$ be the smallest integer $j$ for which $p_{ij}^k > 0$ and let $L(i,k)$ be the largest. Then the lower and upper limits on the summation in (5) to take expected values can be replaced by $S(i,k)$ and $L(i,k)$, respectively. In many problems, the computations are reduced to a small fraction of what they would have been with (5). The cost of this technique is separate calculation and storage of the transition probabilities with $S(i,k)$ and $L(i,k)$ as leading entries in the array of probabilities for each $i$ and $k$ combination, but the storage space for $S(\cdot)$ and $L(\cdot)$ is usually trivial compared to the zero elements of the $\{p_{ij}^k\}$ for which storage space is eliminated.
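A sketch of this storage scheme: only the band of positive probabilities between $S(i,k)$ and $L(i,k)$ is stored, and expectations sum over that band alone. The dimensions and the randomly generated bands are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 100, 5

# Hypothetical sparse transitions: for each (state i, decision k), only
# columns S[i,k] .. L[i,k] are positive, and only that band is stored.
S = rng.integers(0, 95, size=(M, K))
L = np.minimum(S + rng.integers(1, 5, size=(M, K)), M - 1)
prob = [[rng.dirichlet(np.ones(L[i, k] - S[i, k] + 1))
         for k in range(K)] for i in range(M)]

def expected_value(i, k, v):
    """E v_{n-1}(j) under decision k from state i, summing only the band of
    positive probabilities instead of all M columns as written in (5)."""
    return prob[i][k] @ v[S[i, k]:L[i, k] + 1]

print(expected_value(0, 0, np.arange(M, dtype=float)))
```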

In order to keep the number of states $M$ to a manageable number, the lengths of the intervals into which continuous state variables are divided sometimes get rather large. The usual procedure in calculating the $\{p_{ij}^k\}$ is to take the initial state $i$ as the midpoint of the interval in a one state variable case, or the center of the multidimension cell if $i$ is associated with a combination of several state variables. The following discussion uses a single state variable, but the generalization is obvious.

As the interval gets larger and larger, using the midpoint of the interval as a reference point tends to bias the $\{p_{ij}^k\}$ in many decision processes. The bias occurs because the interval is too large relative to the mean and variance of the random variable which underlies the process. The problem can be largely overcome by specifying a uniform probability distribution for the position within the $i$th interval instead of using the midpoint of the interval as a deterministic reference point. Calculation of the $\{p_{ij}^k\}$ is more onerous, but this is a once and for all calculation. Suppose the interval is divided into ten subintervals and each subinterval has an equal conditional probability of occurring, given that the large interval is the current state $i$. Then the probability $p_{ij}^k$ is calculated ten times, once for each of the midpoints of the ten subintervals taken as the given state $i$; the final probability is the simple average of these ten values. The large intervals will still yield an inferior decision rule compared to smaller intervals because less precise information is available on which to base decisions, but the compounding of probabilities will remove most of the systematic bias from using midpoints of the large intervals.
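The sketch below implements this averaging for a hypothetical single state variable whose transition is a normal displacement of the current position; ten subinterval midpoints are treated as equally likely starting points and the resulting transition rows are averaged. The grid, drift, and variance are assumptions for illustration only.

```python
import numpy as np
from math import erf

# Standard normal cdf, vectorized so it accepts arrays of edge points.
Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / 2.0**0.5)))

edges = np.linspace(0.0, 10.0, 11)        # 10 wide state intervals
mu_k, sigma = 0.8, 0.5                    # hypothetical drift under decision k

def p_row(i, n_sub=10):
    """Transition probabilities out of interval i, averaged over n_sub equally
    likely subinterval midpoints instead of the single interval midpoint."""
    mids = np.linspace(edges[i], edges[i + 1], 2 * n_sub + 1)[1::2]
    row = np.zeros(len(edges) - 1)
    for x0 in mids:                       # each subinterval midpoint in turn
        cdf = Phi((edges - (x0 + mu_k)) / sigma)
        cell = np.diff(cdf)               # mass landing in each interval
        cell[0] += cdf[0]                 # tail mass below the grid
        cell[-1] += 1.0 - cdf[-1]         # tail mass above the grid
        row += cell
    return row / n_sub                    # simple average of the n_sub rows

print(p_row(3).round(3))
```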

Another device that can be used in some applications to allow wider intervals in discrete approximations to state variables is lengthening the time period which specifies the stage of the process. This method worked well in an intertemporal ground water application [Burt 1964b] because changes in stocks (the state variable) are a slow, evolutionary process under an optimal decision rule. The stage was taken as a five-year period instead of one year, which prevents the decision variable from changing more frequently than every five years. Intertemporal allocation problems in many natural resources would appear amenable to this type of modeling device, and in some respects, the restriction on frequency of changes in decision variables makes economic sense. In the case of water, the restriction forces a certain stability on the supplies of water available in the production process.

Computation of Ancillary Results

Although the optimal decision rule based on an expected value criterion is the primary objective in most applications of stochastic DP, the finite Markovian form in (5) can be used to calculate much additional information about the problem. Various decision rules, whatever the source, can be compared with the optimal rule. Variance, or higher moments, of the return function can be calculated for any decision rule. The measure of returns can be present value or periodic (by stage), the latter in association with the equilibrium probability vector corresponding to the decision rule.

Starting from any initial state, the state probability vector for any future period (stage) can be calculated in a straightforward manner. Then the same kind of statistical measures on returns as for the equilibrium state probability vector can be calculated for the future period. This auxiliary analysis could be done quite easily at a higher level of aggregation than that used to derive the optimal decision rule, i.e., the interval lengths in the state variables used to get (5) could be increased.
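For a fixed decision rule with transition matrix P (hypothetical numbers below), the state probability vector is simply pushed forward one stage at a time, after which moments of the periodic return follow directly:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 8
# Hypothetical transition matrix under some fixed decision rule, and the
# periodic returns r(i) earned in each state.
P = rng.uniform(size=(M, M))
P /= P.sum(axis=1, keepdims=True)
r = rng.uniform(0.0, 1.0, size=M)

pi = np.zeros(M)
pi[0] = 1.0                               # start with certainty in state 0
for t in range(25):                       # state probabilities 25 stages out
    pi = pi @ P

mean_r = pi @ r                           # expected periodic return
var_r = pi @ (r - mean_r) ** 2            # its variance under pi
print(mean_r, var_r)
```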

This type of computation for Markov processes is often overlooked, but could satisfy the apparent preference of many agricultural economists for the flexibility of simulation. Its advantage over simulation is the orderly structure of the modeling effort, and the availability of an optimal decision rule based on maximum expected value as a benchmark. Even without an optimal decision rule as a guide, the Markov process computations would appear to be more efficient than stochastic simulation if very many alternative decision rules are evaluated.

A Neglected Area of Application

Although most agricultural economists, even those well indoctrinated to DP, are pessimistic about the usefulness of DP models applied to farm-firm growth and farm planning in general [Kennedy], I am optimistic about the opportunities for application of stochastic DP in this area. Clearly, such applications will not be routine formulations and will require a major research effort, but DP is still the most promising analytical model for the conceptual problems involved. The trick for success is an astute choice of a few state variables at an aggregated level of abstraction. What would appear to be serious omissions in state variables must be taken into account by nesting the omitted state variables into the model implicitly by various conditional optimizations. Various aspects of the firm growth process must be classified into purely static and short, intermediate, and long run dynamic components. All but the long run dynamic aspects of the process would be dealt with by various suboptimizations and approximations.

My limited experience with a simple firm growth model about ten years ago [Larson, Stauber, Burt] convinced me that stochastic DP is the most promising method for empirical analysis of the firm growth process. This model had only two state variables, (1) cash reserves (debt) divided by value of capacity, and (2) capacity measured in acres of cropland (one state was reserved for bankruptcy). The decision variable was the amount of land bought or sold. The primary criterion was maximum expected value of net worth at the end of 25 years, but some experimentation with a concave utility function of terminal net worth was also done. Crop prices were taken as fixed and the main source of risk was from the variation in dryland wheat yields.

In spite of the simplicity and apparent naivete of the model, it provided much information on the risks faced by operators in various financial positions, and the nature of their best strategies for survival and growth. Some contrasting results with and without government subsidy programs illustrated the dynamic reduction in risk from the government program. A necessary generalization under current world grain price instability is another state variable associated with an autoregressive model of wheat prices. Also, more sophistication in the way decisions are made to acquire and dispose of land is needed.

A particularly attractive safety first criterion which is easily used in stochastic DP is to minimize the probability of net worth falling below a specified level at the end of a finite planning horizon. This criterion minimizes the sum of a subset of the probabilities of being in various states at the termination of the process, and is a simple modification of Bellman's original application of DP to Markov processes [Bellman 1957b]. The most conservative version of the criterion would minimize the probability of bankruptcy.
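A sketch of this safety first criterion as a backward recursion over the finite Markovian form: the terminal "below the floor" indicator plays the role of the value function, and each stage minimizes the probability of ending in one of those states. States, decisions, and probabilities are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
M, K, N = 12, 3, 25                       # states, decisions, horizon length
p = rng.uniform(size=(K, M, M))
p /= p.sum(axis=2, keepdims=True)

bad = np.arange(M) < 3                    # hypothetical states where terminal
w = bad.astype(float)                     # net worth falls below the floor

policy = np.empty((N, M), dtype=int)
for n in reversed(range(N)):              # backward over the finite horizon
    # Probability of ending in a failure state, minimized over decisions.
    probs = np.einsum("kij,j->ik", p, w)  # (state, decision)
    policy[n] = probs.argmin(axis=1)
    w = probs.min(axis=1)

print(w.round(3))                         # minimized failure probabilities
```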

References

Arrow, Kenneth J. "Statistics and Economic Policy." Econometrica, 25(1957):523-31.

Athans, M. "The Discrete Time Linear-Quadratic-Gaussian Stochastic Control Problem." Annals of Economic and Social Measurement, 1(1972):449-91.

Bartman, D. "A Method of Bisection for Discounted Markov Decision Problems." Zeitschrift für Operations Research, 23(1979):275-87.

Bellman, Richard. Dynamic Programming. Princeton: Princeton University Press, 1957a.

Bellman, Richard. "A Markovian Decision Process." Journal of Mathematics and Mechanics, 6(1957b):679-84.

Bellman, Richard. Adaptive Control Processes: A Guided Tour. Princeton: Princeton University Press, 1961.

Burt, Oscar R. "Optimal Resource Use Over Time With an Application to Ground Water." Management Science, 11(1964a):80-93.

Burt, Oscar R. "The Economics of Conjunctive Use of Ground and Surface Water." Hilgardia, 36(1964b):31-111.

Burt, Oscar R. "Groundwater Management Under Quadratic Criterion Functions." Water Resources Research, 3(1967):673-82.

Burt, Oscar R. "Ground Water Management and Surface Water Development for Irrigation." In Economic Modeling for Water Policy Evaluation, edited by R. M. Thrall, et al. New York: North-Holland, 1976.

Burt, Oscar R. "Farm Level Economics of Soil Conservation in the Palouse Area of the Northwest." American Journal of Agricultural Economics, 63(1981):83-92.

Burt, Oscar R. and John R. Allison. "Farm Management Decisions with Dynamic Programming." Journal of Farm Economics, 45(1963):121-36.

Burt, Oscar R. and Ronald G. Cummings. "Natural Resources Management, the Steady State, and Approximately Optimal Decision Rules." Land Economics, 53(1977):1-22.

Burt, Oscar R., Won W. Koo, and Norman J. Dudley. "Optimal Stochastic Control of U.S. Wheat Stocks and Exports." American Journal of Agricultural Economics, 62(1980):173-87.

Dixon, Bruce L. and Richard E. Howitt. "Resource Production Under Uncertainty: A Stochastic Control Approach to Timber Harvest Scheduling." American Journal of Agricultural Economics, 62(1980):499-507.

Dreyfus, Stuart E. and Averill M. Law. The Art and Theory of Dynamic Programming. New York: Academic Press, 1977.

Giaver, H. Optimal Dairy Cow Replacement Policies. Unpublished Ph.D. thesis, University of California, Berkeley, 1966.

Hendrikx, Marcel, Jo Van Nunen, and Jaap Wessels. "Some Notes on Iterative Optimization of Structured Markov Decision Processes With Discounted Rewards." COSOR Memorandum 80-20, Eindhoven University of Technology, Eindhoven, The Netherlands, 1980.

Hildreth, Clifford. "Problems of Uncertainty in Farm Planning." Journal of Farm Economics, 39(1957):1430-1441.

Howard, Ronald A. Dynamic Programming and Markov Processes. New York: John Wiley and MIT Press, 1960.

Intriligator, Michael D. Mathematical Optimization and Economic Theory. Englewood Cliffs: Prentice-Hall, 1971.

Kennedy, John O. S. "Applications of Dynamic Programming to Agriculture, Forestry, and Fisheries: Review and Prognosis." Review of Marketing and Agricultural Economics, 49 (Dec. 1981).

Kushner, Harold J. and Fred C. Schweppe. "A Maximum Principle for Stochastic Control Systems." Journal of Mathematical Analysis and Applications, 8(1964):287-302.

Larson, Donald K., M. S. Stauber, and O. R. Burt. Economic Analysis of Farm Firm Growth in Northcentral Montana. Research Report 62, Montana Agricultural Experiment Station, Bozeman, Montana, 1974.

MacQueen, J. "A Modified Dynamic Method for Markovian Decision Problems." Journal of Mathematical Analysis and Applications, 14(1966):38-43.

Van Nunen, J. A. E. E. "A Set of Successive Approximation Methods for Discounted Markovian Decision Problems." Zeitschrift für Operations Research, 20(1976):203-08.

Porteus, E. L. "Bounds and Transformations for Discounted Finite Markov Decision Chains." Operations Research, 23(1975):761-84.
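Simon, Herbert A. "Dynamic Programming Under Uncertainty With a Quadratic Criterion Function." Econometrica, 24(1956):74-81.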

Taylor, C. Robert, and Hovav Talpaz. "Approximately Optimal Carryover Levels for Wheat in the United States." American Journal of Agricultural Economics, 61(1979):32-40.

Theil, H. "A Note on Certainty Equivalence in Dynamic Programming." Econometrica, 25(1957):346-49.

Zusman, Pinhas and Amotz Amiad. "Simulation: A Tool for Farm Planning Under Conditions of Weather Uncertainty." Journal of Farm Economics, 47(1965):574-94.
