
Optim Eng (2014) 15:119–136, DOI 10.1007/s11081-012-9204-4

Dynamic programming approach to the numerical solution of optimal control with paradigm by a mathematical model for drug therapies of HIV/AIDS

Bao-Zhu Guo · Bing Sun

Received: 8 June 2011 / Accepted: 7 November 2012 / Published online: 17 November 2012
© Springer Science+Business Media New York 2012

Abstract In this paper, we present a new numerical algorithm to find the optimal control for general nonlinear lumped systems without state constraints. The dynamic programming-viscosity solution (DPVS) approach is developed, and numerical solutions of both the approximate optimal control and the trajectory are produced. To show the effectiveness and efficiency of the new algorithm, we apply it to an optimal control problem of two types of drug therapies for human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS). The quality of the obtained optimal control and trajectory pair is checked through comparison with the costs under arbitrarily selected different controls. The results illustrate the effectiveness of the algorithm.

Keywords Optimal control · Viscosity solution · Dynamic programming · Numerical solution

This work was supported by the National Natural Science Foundation of China (11001012, 61273129, 11171011), the National Basic Research Program of China (2011CB808002), the Natural Science Foundation of Beijing (1102010), and the National Research Foundation of South Africa.

B.-Z. Guo
Academy of Mathematics and Systems Science, Academia Sinica, Beijing 100190, P.R. China

B. Sun (✉)
School of Mathematics, Beijing Institute of Technology, Beijing 100081, P.R. China
e-mail: [email protected]

B.-Z. Guo · B. Sun
School of Computational and Applied Mathematics, University of the Witwatersrand, Private Bag 3 Wits 2050, Johannesburg, South Africa

1 Introduction


Optimal control has long been one of the main topics in modern control theory, which has developed along two lines: abstract theory and computational methods. From the computational aspect, on the one hand, there are essentially three types of approaches for seeking the numerical solution of optimal control of a continuous control system (Sargent 2000). The first is to use a necessary condition for optimality, such as the Pontryagin maximum principle, and solve the resulting two-point boundary value problem, mainly by the multiple shooting method. The second converts the original continuous problem into a finite-dimensional nonlinear programming problem through full discretization. The last parameterizes the control trajectory to obtain a nonlinear programming problem, which is then treated by appropriate further steps (von Stryk and Bulirsch 1992). All these methods have their own drawbacks. The Pontryagin maximum principle provides only a necessary condition for the optimal control, and it is difficult to translate into feedback form. Moreover, for the two-point boundary value problem obtained from the necessary condition for optimality, the shooting method suffers from having to "guess" initial data to start the iterative numerical process (Stoer and Bulirsch 1993). The only exception is that when the control has a valid bound, the forward-backward sweep iterative method presented in Hackbusch (1978) works for any initial guess. For the other two approaches, the simplification of the original problem reduces reliability and accuracy, and when the degrees of discretization and parameterization are very high, the computational burden becomes substantial and the solution process suffers from the "curse of dimensionality" (Bryson 1996). On the other hand, from the application point of view, two effective approaches for solving optimal control problems have not received enough attention (Lin and Arora 1994). The first is based on the solution of the optimality conditions obtained by the calculus of variations. The second is the dynamic programming approach, based on the principle of optimality. By this principle, the Hamilton-Jacobi-Bellman (HJB) equation satisfied by the value function of the optimal control problem can be obtained. Furthermore, the differential dynamic programming approach was developed in a special way based on the HJB equation. Its basic idea is to successively approximate the solution using the first- and/or second-order terms of the Taylor expansion of the performance index about the nominal control and state trajectories (Jacobson and Mayne 1970; Mayne and Polak 1975).

In this paper, different from the numerical approaches mentioned above, a method based on dynamic programming is constructed and elaborated, with a mathematical model for drug therapies of HIV/AIDS as a paradigm, to obtain the approximate (or sub-) optimal control. It is generally considered that finding the optimal feedback control is the Holy Grail of control theory (Ho 2005). Bellman's dynamic programming approach provides a way of finding the optimal feedback control. However, it has been known since Pontryagin's time that, by dynamic programming, the value function of an optimal control system satisfies a nonlinear partial differential equation called the HJB equation. If the value function and its gradient are known, then the optimal feedback control can be obtained analytically. Unfortunately, no matter how smooth the coefficients of the HJB equation are, a classical solution may fail to exist. Moreover, even when a classical solution exists, it may not be unique. That is why, for a long time, dynamic programming played no important role in finding optimal controls.

The situation changed when the viscosity solution was introduced by M.G. Crandall and P.L. Lions in the early 1980s. By the viscosity solution theory (Bardi and Capuzzo-Dolcetta 1997), the value function is usually the unique viscosity solution to the associated HJB equation. However, finding an analytical viscosity solution of the HJB equation is usually impossible, so a numerical solution is almost the only choice for finding the optimal control.

The objective of this paper is to demonstrate how to develop a completely new algorithm to find numerical solutions of the approximate optimal control based on dynamic programming, without computing the value function over its whole domain. Moreover, our numerical solution is a single closed-loop solution starting from the initial state (Richardson and Wang 2006).

Unlike the existing methods, in this paper a novel algorithm based on the dynamic programming approach is developed for a rather general class of optimal control problems. To show its effectiveness and efficiency, the new algorithm is applied, as a paradigm, to an optimal control problem of two types of drug therapies for HIV/AIDS to obtain the approximate optimal control strategy. Although the solution obtained is a single closed-loop solution starting from the initial state (Richardson and Wang 2006), which cannot be considered a complete feedback form, the method has the potential to synthesize the optimal feedback control if we are able to obtain the value function everywhere instead of only the discrete point solutions that we search along some directions. From the aspect of general control theory, feedback control has merits such as disturbance attenuation and robustness, and it can decrease errors caused by measurement and modeling (Zurakowski and Teel 2006). For instance, in the investigation of HIV control, it has been shown that feedback control can overcome unplanned treatment interruptions, inaccurate or incomplete data, and imperfect model specification (David et al. 2011).

We proceed as follows. In Sect. 2, which consists of two subsections, some preliminaries related to the theoretical background of the DPVS approach are provided; then the new algorithm for finding the approximate optimal control is constructed step by step for optimal control problems of general nonlinear lumped systems without state constraints. As a paradigm, in Sect. 3 the algorithm is applied to an optimal control problem of two types of drug therapies for HIV/AIDS, which indicates the effectiveness of the new algorithm. Finally, in Sect. 4, the contributions of this work are briefly summarized, and the potential of the algorithm for attacking other optimal control problems in higher dimensions is addressed.

2 The DPVS approach

2.1 Preliminary

Consider the following control system:
$$
\begin{cases}
y'(t) = f(y(t), u(t)), & t \in (0, T],\ T > 0,\\
y(0) = z,
\end{cases}
\tag{2.1}
$$
where $y(\cdot) \in \mathbb{R}^n$ is the state, $u(\cdot) \in \mathcal{U}[0, T] = L^\infty([0, T]; U)$ is the admissible control with compact set $U \subset \mathbb{R}^m$, and $z$ is the initial value. Assume that $f : \mathbb{R}^n \times U \to \mathbb{R}^n$ is continuous in its variables.


Given a running cost $L(t, y, u)$ and a terminal cost $\psi(y)$, the optimal control problem for the system (2.1) is to seek an optimal control $u^*(\cdot) \in \mathcal{U}[0, T]$ such that
$$
J\big(u^*(\cdot)\big) = \inf_{u(\cdot) \in \mathcal{U}[0, T]} J\big(u(\cdot)\big), \tag{2.2}
$$
where $J$ is the cost functional given by
$$
J\big(u(\cdot)\big) = \int_0^T L\big(\tau, y(\tau), u(\tau)\big)\, d\tau + \psi\big(y(T)\big).
$$

The dynamic programming principle due to Bellman is fundamental in modern optimal control theory. Instead of considering the optimal control problem (2.1)–(2.2), the principle proposes to deal with a family of optimal control problems initiating from (2.1)–(2.2). That is, consider the optimal control problem for the following system for any $(t, x) \in [0, T) \times \mathbb{R}^n$:
$$
\begin{cases}
y'_{t,x}(s) = f(y_{t,x}(s), u(s)), & s \in (t, T],\\
y_{t,x}(t) = x,
\end{cases}
$$
with the cost functional
$$
J_{t,x}\big(u(\cdot)\big) = \int_t^T L\big(\tau, y_{t,x}(\tau), u(\tau)\big)\, d\tau + \psi\big(y_{t,x}(T)\big).
$$

Define the value function
$$
V(t, x) = \inf_{u(\cdot) \in \mathcal{U}[t, T]} J_{t,x}\big(u(\cdot)\big), \quad \forall\, (t, x) \in [0, T) \times \mathbb{R}^n,
$$
with the terminal value
$$
V(T, x) = \psi(x) \quad \text{for all } x \in \mathbb{R}^n.
$$

It is well known that if $V$ is smooth enough, say $V \in C^1([0, T] \times \mathbb{R}^n)$, then $V$ satisfies the following HJB equation (Bardi and Capuzzo-Dolcetta 1997):
$$
\begin{cases}
V_t(t, x) + \inf_{u \in U}\{f(x, u) \cdot \nabla_x V(t, x) + L(t, x, u)\} = 0, & (t, x) \in [0, T) \times \mathbb{R}^n,\\
V(T, x) = \psi(x), & x \in \mathbb{R}^n,
\end{cases}
\tag{2.3}
$$
where $\nabla_x V(t, x)$ stands for the gradient of $V$ in $x$.

The following two propositions show the important role of the value function in characterizing the optimal feedback law (Bardi and Capuzzo-Dolcetta 1997).

Proposition 1 Let $V \in C^1([0, T] \times \mathbb{R}^n)$ be the value function. Then if there exists a control $u^*(\cdot) \in \mathcal{U}[0, T]$ such that
$$
f\big(y^*(t), u^*(t)\big) \cdot \nabla_x V\big(t, y^*(t)\big) + L\big(t, y^*(t), u^*(t)\big) = \inf_{u \in U}\big\{f\big(y^*(t), u\big) \cdot \nabla_x V\big(t, y^*(t)\big) + L\big(t, y^*(t), u\big)\big\},
$$
then $u^*(\cdot)$ is an optimal control, where $y^*$ is the state corresponding to $u^*$.

As usual, we denote the optimal control as (Bardi and Capuzzo-Dolcetta 1997)
$$
u^*(t) \in \arg\inf_{u \in U}\big\{f\big(y^*(t), u\big) \cdot \nabla_x V\big(t, y^*(t)\big) + L\big(t, y^*(t), u\big)\big\}
$$
for almost all $t \in [0, T]$.

Proposition 2 Let $V(t, x) \in C^1([0, T] \times \mathbb{R}^n)$ be the value function. Then $(u^*(\cdot), y^*(\cdot))$ is an optimal control-trajectory pair in feedback form if and only if
$$
V_t\big(t, y^*(t)\big) + f\big(y^*(t), u^*(t)\big) \cdot \nabla_x V\big(t, y^*(t)\big) + L\big(t, y^*(t), u^*(t)\big) = 0
$$
for almost all $t \in [0, T)$.

By Propositions 1 and 2, we have the following Theorem 1, which leads to the construction of the feedback law via the value function.

Theorem 1 Let $V(t, x) \in C^1([0, T] \times \mathbb{R}^n)$ be the value function. Suppose $u(t, x)$ satisfies
$$
f\big(x, u(t, x)\big) \cdot \nabla_x V(t, x) + L\big(t, x, u(t, x)\big) = \inf_{u \in U}\big\{f(x, u) \cdot \nabla_x V(t, x) + L(t, x, u)\big\}.
$$
Then
$$
u^*_z(t) = u\big(t, y_z(t)\big)
$$
is the feedback law of the optimal control problem (2.1)–(2.2), where $y_z(t)$ satisfies
$$
y'_z(t) = f\big(y_z(t), u\big(t, y_z(t)\big)\big), \quad y_z(0) = z,\ t \in [0, T].
$$

We see from Theorem 1 that, in order to find the feedback control law, not only the value function $V$ itself but also its gradient $\nabla_x V$ is needed.

Equation (2.3) generally has no classical solution, regardless of the smoothness of the functions $f$ and $L$. Fortunately, under some basic assumptions on $f$ and $L$, the value function $V$ is the unique viscosity solution of (2.3). However, it is usually not possible to find an analytical solution of (2.3) for general nonlinear functions $f$, $L$. It therefore becomes very important to solve (2.3) numerically, particularly in applications. Indeed, several difference schemes have been proposed for finding the viscosity solutions (Fleming and Soner 1993; Huang et al. 2000, 2004; Wang et al. 2003).

Once a viscosity solution of (2.3) is obtained numerically, we are able to construct a numerical solution of the feedback law by Theorem 1.

2.2 Algorithm for finding the optimal feedback law

In this subsection, we follow Theorem 1 to construct an algorithm for finding numerical solutions of the optimal feedback control and optimal trajectory pair. The algorithm consists of two coupled discretization steps. The first step is to discretize the HJB equation (2.3) to find the feedback law, and the second is to discretize the state equation (2.1) to find the optimal trajectory.

In the last two decades, many different approximation schemes have been developed for the numerical solution of (2.3), such as the upwind finite difference scheme (Wang et al. 2000), the method of vanishing viscosity (Crandall and Lions 1984), and the parallel algorithm based on the domain decomposition technique (Falcone et al. 1994), to name just a few. As for the numerical solution of the state equation (2.1), there are numerous classical methods available, such as the Euler method, the Runge-Kutta method, and the Hamming algorithm (Stoer and Bulirsch 1993).

Notice that when we use Theorem 1 to find the optimal feedback law, it is the directional derivative $\nabla_x V \cdot f$, not the gradient $\nabla_x V$ itself, that is needed. This fact greatly facilitates our search for numerical solutions. The key step is to approximate $\nabla_x V \cdot f$ by its natural definition as follows:
$$
\nabla_x V(t, x) \cdot f(x, u) = \Big[\nabla_x V(t, x) \cdot \frac{\eta f(x, u)}{1 + \|f(x, u)\|}\Big]\,\frac{1 + \|f(x, u)\|}{\eta} \approx \frac{V\big(t, x + \frac{\eta f(x, u)}{1 + \|f(x, u)\|}\big) - V(t, x)}{\eta}\,\big(1 + \|f(x, u)\|\big), \tag{2.4}
$$

where $\eta > 0$ is a small number and $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^n$.
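As a concrete illustration, the approximation (2.4) translates directly into code. The following C fragment is a minimal sketch under stated assumptions: valueV (an evaluator for $V(t,x)$), dynF (the dynamics $f(x,u)$), the vector-control signature, and the fixed buffer size are all our own illustrative choices, not from the paper:

```c
#include <math.h>

/* Euclidean norm of an n-vector. */
static double norm2(const double *v, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += v[i] * v[i];
    return sqrt(s);
}

/* Approximate the directional derivative grad_x V(t,x) . f(x,u) via (2.4):
 * [V(t, x + eta*f/(1+|f|)) - V(t,x)] / eta * (1 + |f|), with eta > 0 small. */
double dir_deriv(double t, const double *x, const double *u, int n, double eta,
                 double (*valueV)(double, const double *),
                 void (*dynF)(const double *, const double *, double *))
{
    double f[16], xs[16];                    /* sketch assumes n <= 16 */
    dynF(x, u, f);
    double nf = norm2(f, n);
    for (int i = 0; i < n; i++)
        xs[i] = x[i] + eta * f[i] / (1.0 + nf);
    return (valueV(t, xs) - valueV(t, x)) / eta * (1.0 + nf);
}
```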

Based on observation (2.4), the HJB equation (2.3) can thereby be approximated by a finite difference scheme in time and space:

$$
\begin{cases}
\dfrac{V^{j+1}_i - V^j_i}{\Delta t} + \dfrac{V^j_{i+1} - V^j_i}{\eta}\,\big(1 + \|f^j_i\|\big) + L^j_i = 0,\\[2mm]
u^{j+1}_i \in \arg\inf_{u \in U}\Big\{\dfrac{V^{j+1}_{i+1} - V^{j+1}_i}{\eta}\,\big(1 + \|f(x_i, u)\|\big) + L(t_{j+1}, x_i, u)\Big\},
\end{cases}
\tag{2.5}
$$

for $i = 0, 1, \ldots, M$ and $j = 0, 1, \ldots, N - 1$, where $V^j_i = V(t_j, x_i)$, $f^j_i = f(x_i, u^j_i)$, and $L^j_i = L(t_j, x_i, u^j_i)$. Meanwhile, it is assumed that

$$
\frac{|\Delta t|}{\eta}\Big(1 + \max_{i,j}\big\|f^j_i\big\|\Big) \le 1,
$$
which is a necessary and sufficient condition for the stability of the finite difference scheme (Press et al. 2002).
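In code, this amounts to a one-line check before the backward sweep is started; the helper below is our own sketch, with fmax standing for $\max_{i,j}\|f^j_i\|$:

```c
#include <math.h>

/* Stability check |dt|/eta * (1 + max_{i,j} |f_i^j|) <= 1 for scheme (2.5). */
int scheme_is_stable(double dt, double eta, double fmax) {
    return fabs(dt) / eta * (1.0 + fmax) <= 1.0;
}
```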

It is pointed out that the above approximation (2.4) brings obvious advantages to the algorithm presented in this paper. If we tried to work out the viscosity solution of (2.3) first, we would most likely suffer "the curse of dimensionality" for high-dimensional problems, since we would have to obtain data for all grid points in the whole region. Perhaps that is why the numerical experiments on the viscosity solution of (2.3) reported in the literature are mostly limited to 1-D or 2-D problems (e.g., Wang et al. 2000). On the other hand, since our scheme searches for the optimal control only along the direction of $f$, not over the whole region, the new algorithm involves much less data. This idea is also applicable to infinite-dimensional systems (Guo and Sun 2005). To the best of our knowledge, there has been no previous effort along this direction to find the optimal control by dynamic programming.

Based on the above discussion, we now construct the algorithm for the numerical solution of the optimal feedback control-trajectory pairs.

Step 1: Initial partition in time and space. Select two positive integers $N$ and $M$. Let $t_j = T + j\Delta t$, $j = 0, 1, \ldots, N$, be a backward partition of $[0, T]$, where $\Delta t = -T/N$. For any initially given $u \in U$, let the initial state be $x_0 = z$ and
$$
x_i = x_{i-1} + \frac{\eta f(x_{i-1}, u)}{1 + \|f(x_{i-1}, u)\|}, \quad i = 1, 2, \ldots, M. \tag{2.6}
$$

Step 2: Initialization of the value function and control. Let
$$
\begin{cases}
V^0_i = \psi(x_i),\\[1mm]
u^0_i \in \arg\inf_{u \in U}\Big\{\dfrac{V^0_{i+1} - V^0_i}{\eta}\,\big(1 + \|f(x_i, u)\|\big) + L(T, x_i, u)\Big\}, \quad i = 0, 1, \ldots, M.
\end{cases}
\tag{2.7}
$$

Step 3: Iteration for the HJB equation. By (2.5) and Step 2, we obtain all $\{\{V^j_i\}_{i=0}^M\}_{j=0}^N$ and $\{\{u^j_i\}_{i=0}^M\}_{j=0}^N$:
$$
\begin{cases}
V^{j+1}_i = \Big(1 + \dfrac{\Delta t}{\eta}\big(1 + \|f^j_i\|\big)\Big)V^j_i - \dfrac{\Delta t}{\eta}\big(1 + \|f^j_i\|\big)V^j_{i+1} - \Delta t\, L^j_i,\\[2mm]
u^{j+1}_i \in \arg\inf_{u \in U}\Big\{\dfrac{V^{j+1}_{i+1} - V^{j+1}_i}{\eta}\,\big(1 + \|f(x_i, u)\|\big) + L(t_{j+1}, x_i, u)\Big\}.
\end{cases}
$$

Here, $(u^N_0, y_0) = (u^N_0, y(0)) = (u(0), z)$ is the first optimal feedback control-trajectory pair.

Step 4: Iteration for the state equation. For $j = 0, 1, 2, \ldots, N - 1$, solve the state equation
$$
\frac{y_{j+1} - y_j}{-\Delta t} = f\big(y_j, u(t_{N-j})\big)
$$
to obtain $y_{j+1} = y(t_{N-j-1})$. Replace $(u, z)$ in Step 1 by $(u(t_{N-j}), y_{j+1})$ and go to Steps 2 and 3 to obtain $u^{N-j-1}_0$. Then $(u^{N-j-1}_0, y_{j+1}) = (u(t_{N-j-1}), y(t_{N-j-1}))$ is the $(j+2)$-th optimal feedback control-trajectory pair. Continue the iteration to obtain all $\{(u(t_{N-j}), y(t_{N-j}))\}_{j=0}^N$.

After Steps 1–4, we finally get all the desired optimal feedback control-trajectory pairs:
$$
\big(u(t_{N-j}), y(t_{N-j})\big), \quad j = 0, 1, \ldots, N.
$$
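To fix ideas, the following C sketch assembles Steps 1–4 for a system with a scalar control, with the control set $U$ discretized into a finite grid of candidates. It is only a minimal illustration under stated assumptions: the routines dynF, runCost and termCost, the grid Ucand, the dimensions NDIM, M, NT and NU, and the copy treatment of the topmost mesh point (which the steps above leave unspecified) are all our own placeholder choices; the authors' actual C program is not reproduced here, and the two-drug HIV example of Sect. 3 would use a two-dimensional control grid instead.

```c
#include <math.h>
#include <float.h>
#include <string.h>

#define NDIM 2          /* state dimension n                 */
#define M    5          /* space mesh points x_0..x_M        */
#define NT   640        /* backward time levels N            */
#define NU   32         /* candidate controls on the U grid  */

extern void   dynF(const double x[], double u, double f[]);   /* f(x,u)   */
extern double runCost(double t, const double x[], double u);  /* L(t,x,u) */
extern double termCost(const double x[]);                     /* psi(x)   */
extern const double Ucand[NU];                                /* grid on U */

static double nrm(const double v[]) {
    double s = 0.0;
    for (int i = 0; i < NDIM; i++) s += v[i] * v[i];
    return sqrt(s);
}

/* argmin over the grid of (Vnext - Vhere)/eta * (1+|f|) + L(t,x,u). */
static double best_u(double t, const double x[],
                     double Vhere, double Vnext, double eta) {
    double ubest = Ucand[0], jbest = DBL_MAX, f[NDIM];
    for (int k = 0; k < NU; k++) {
        dynF(x, Ucand[k], f);
        double j = (Vnext - Vhere) / eta * (1.0 + nrm(f))
                 + runCost(t, x, Ucand[k]);
        if (j < jbest) { jbest = j; ubest = Ucand[k]; }
    }
    return ubest;
}

/* Steps 1-3: build the mesh from state z with control guess ug, then
 * sweep `levels` backward time levels from t = T; return the control
 * at the bottom mesh point of the last level reached. */
static double dpvs_control(double T, int levels, double eta,
                           double ug, const double z[]) {
    double x[M + 1][NDIM], V[M + 1], Vn[M + 1], uc[M + 1], f[NDIM];
    double dt = -T / NT;                           /* Delta t < 0 */
    memcpy(x[0], z, sizeof x[0]);
    for (int i = 1; i <= M; i++) {                 /* Step 1: mesh (2.6) */
        dynF(x[i - 1], ug, f);
        double nf = nrm(f);
        for (int d = 0; d < NDIM; d++)
            x[i][d] = x[i - 1][d] + eta * f[d] / (1.0 + nf);
    }
    for (int i = 0; i <= M; i++) V[i] = termCost(x[i]);   /* Step 2 (2.7) */
    for (int i = 0; i < M; i++)
        uc[i] = best_u(T, x[i], V[i], V[i + 1], eta);
    uc[M] = uc[M - 1];                             /* boundary: copy (ours) */
    for (int j = 0; j < levels; j++) {             /* Step 3: sweep (2.5) */
        for (int i = 0; i < M; i++) {
            dynF(x[i], uc[i], f);
            double a = dt / eta * (1.0 + nrm(f));
            Vn[i] = (1.0 + a) * V[i] - a * V[i + 1]
                  - dt * runCost(T + j * dt, x[i], uc[i]);
        }
        Vn[M] = Vn[M - 1];                         /* boundary: copy (ours) */
        for (int i = 0; i < M; i++)
            uc[i] = best_u(T + (j + 1) * dt, x[i], Vn[i], Vn[i + 1], eta);
        uc[M] = uc[M - 1];
        memcpy(V, Vn, sizeof V);
    }
    return uc[0];
}

/* Step 4: march the state forward, calling Steps 1-3 at every node. */
void dpvs_solve(double T, double eta, double u0,
                double y[], double uOut[], double yOut[][NDIM]) {
    double h = T / NT, f[NDIM], u = u0;
    for (int k = 0; k <= NT; k++) {
        u = dpvs_control(T, NT - k, eta, u, y);    /* u(t_{N-k}) */
        uOut[k] = u;
        memcpy(yOut[k], y, NDIM * sizeof(double));
        if (k == NT) break;
        dynF(y, u, f);                             /* forward Euler step */
        for (int d = 0; d < NDIM; d++) y[d] += h * f[d];
    }
}
```

Note that dpvs_control sweeps only the backward time levels remaining between the current node and the terminal time, so the whole run solves the state equation once and the HJB scheme $N$ times, in line with the description above.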

It is worth noting that the focus of the above algorithm is not to solve the HJB equation, nor even to obtain the value function itself. This differs from most of the literature in the field (e.g., Huang et al. 2000, 2004; Wang et al. 2003), which focuses on solving the HJB equations. Our ultimate aim is to find the numerical solutions of both the optimal feedback control and the corresponding optimal trajectory. The whole algorithm consists of solving the state equation one time and the HJB equation $N$ times.


According to our numerical experiments reported in Guo and Sun (2005, 2007) and the example presented in the next section, the mesh point sequence in space generated by the recursive relation (2.6) does not cause oscillation of the space variable in subregions of the given space, even when $f$ changes its sign. This is because (2.4) is the more natural definition of the directional derivative, which allows us to search for the optimal control along its natural direction $f$.

If only the solution of the HJB equation (2.3) is of concern, for instance computing the values of $V$ in the polyhedron $\prod_{i=1}^n [a_i, b_i]$ of $\mathbb{R}^n$, we have to produce a monotone mesh point sequence in space with the recursive relation (2.6). To this end, we have to change the searching direction forcibly. In this case, we suggest using the following approximation of the directional derivative instead of the approximation (2.4). Specifically, for every fixed $(x, u)$, with $x = (x^1, x^2, \ldots, x^n)$ and $f = (f^1, f^2, \ldots, f^n)$,

$$
\begin{aligned}
\nabla_x V(t, x) \cdot f(x, u) &= \sum_{p=1}^n \Big[V_{x^p}(t, x) \cdot \frac{\eta f^p(x, u)\,\mathrm{sgn}\big(f^p(x, u)\big)}{1 + \|f^p(x, u)\|}\Big]\,\frac{1 + \|f^p(x, u)\|}{\eta\,\mathrm{sgn}\big(f^p(x, u)\big)}\\[1mm]
&\approx \sum_{p=1}^n \frac{V\Big(t, x + \frac{\eta f^p(x, u)\,\mathrm{sgn}(f^p(x, u))}{1 + \|f^p(x, u)\|}\, I_p\Big) - V(t, x)}{\eta}\cdot\frac{1 + \|f^p(x, u)\|}{\mathrm{sgn}\big(f^p(x, u)\big)},
\end{aligned}
$$

where
$$
\mathrm{sgn}(\wp) = \begin{cases} 1, & \text{if } \wp \ge 0,\\ -1, & \text{if } \wp < 0, \end{cases}
$$
$I_p$ is the $n$-dimensional unit vector with $p$-th component equal to 1, and $V_{x^p}$ is the partial derivative of the value function $V$ with respect to $x^p$.

In this way, the possible oscillation caused by sign changes of the function $f$ can be avoided when we use formula (2.6) to perform the spatial mesh partition. Accordingly, the mesh point sequence in space generated by the recursive relation (2.6) now becomes
$$
x^p_i = x^p_{i-1} + \frac{\eta\,\mathrm{sgn}\big(f^p(x_{i-1}, u)\big)\, f^p(x_{i-1}, u)}{1 + \|f^p(x_{i-1}, u)\|}, \quad p = 1, 2, \ldots, n,\ i = 1, 2, \ldots, M,
$$
where $x_k = (x^1_k, x^2_k, \ldots, x^n_k)$, $k = i - 1, i$.
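A sketch of this sign-corrected mesh update, reusing the placeholder dynF (scalar control) of the previous sketch; the function name and layout are again our own:

```c
#include <math.h>

/* Componentwise, sign-corrected mesh update: each coordinate moves in the
 * direction sgn(f^p), so the generated mesh sequence stays monotone even
 * when a component of f changes sign. */
static void mesh_step_signed(const double xprev[NDIM], double u,
                             double eta, double xnext[NDIM])
{
    double f[NDIM];
    dynF(xprev, u, f);
    for (int p = 0; p < NDIM; p++) {
        double s = (f[p] >= 0.0) ? 1.0 : -1.0;     /* sgn(f^p) */
        xnext[p] = xprev[p] + eta * s * f[p] / (1.0 + fabs(f[p]));
    }
}
```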

To end this section, we point out that, by the algorithm, the solution we obtain is a single closed-loop solution starting from the initial state (Richardson and Wang 2006). This fact can be seen from the execution process of the algorithm.

(a) In the beginning, evaluate $V^0_i$, $i = 0, 1, \ldots, M$, using the terminal condition, and obtain $u^0_i$ by Eq. (2.7). For any given initial $u$, we set one initial state $x_0 = z$ to start the computation. By the finite difference scheme (2.5) for the HJB equation (2.3), we obtain $\{\{V^j_i\}_{i=0}^M\}_{j=0}^N$ and $\{\{u^j_i\}_{i=0}^M\}_{j=0}^N$ (or, in the continuous case, $V(t, z)$ and $u(t, z)$). Then $(u^N_0, x_0) = (u(0), z)$ is the first optimal control-trajectory pair.


(b) Then substitute $u(0)$, $z$ into the finite difference scheme for the state equation and obtain the new state $y_1$ at time $t_{N-1}$. Replace $u$, $z$ in (a) by $u(0)$, $y_1$ to obtain $u^{N-1}_0$ using the difference scheme (2.5). We now have the second optimal control-trajectory pair $(u^{N-1}_0, y_1) = (u(t_{N-1}), y(t_{N-1}))$.

(c) Proceed with the computation until we obtain all $(u(t_{N-j}), y(t_{N-j}))$, $j = 0, 1, \ldots, N$, which constitute the optimal control-trajectory pairs. At this point, the computation is done.

Note that during the execution of the whole algorithm, the finite difference scheme for the HJB equation is called once each time a node is updated in the difference scheme for the state equation. When the algorithm finishes, we have called the state equation (actually, its finite difference scheme) one time and the HJB equation (likewise, its finite difference scheme) $N$ times overall. The computation is driven by the initial state $z$, and the solution we obtain starts precisely from this initial datum. Only when we apply the same algorithm and complete the computations for all initial states in the domain of definition is the feedback map of the optimal control problem completely constructed.

3 A paradigm on optimal control of HIV/AIDS

In this section, to show how to utilize the new algorithm, we apply it to an optimal control problem of two types of drug therapies for HIV/AIDS and obtain the approximate optimal control strategy. With the aid of this paradigm, we aim to demonstrate the validity and effectiveness of the new algorithm.

Adopting mathematical methods to understand the dynamics of HIV/AIDS has been pursued by many researchers; examples can be found in Nowak and May (2000), The INSIGHT-ESPRIT Study Group and SILCAAT Scientific Committee (2009). For the use of mathematics to understand HIV immune dynamics, we refer to Kirschner (1996). In order to understand the dynamics of HIV-1 infection in vivo, Perelson and Nelson (1999) study how dynamical modeling and parameter estimation techniques can be used to discover important features of HIV pathogenesis. In Nowak and May (2000), virus dynamics is discussed and the mathematical principles of immunology and virology are investigated. The works Craig and Xia (2001), Craig et al. (2004) introduce HIV/AIDS education into the curriculum of a university, and the model of HIV/AIDS is analyzed from the control engineering point of view. Using control theory, the question of when to initiate HIV therapy is studied in Jeffrey et al. (2003); it is concluded that therapy is best initiated when the viral load is easier to control. By the Pontryagin maximum principle, some optimal chemotherapy strategies are obtained in Butler et al. (1997), Felippe de Souza et al. (2000), Kirschner et al. (1997). Several methods of stable control of the HIV-1 population using an external feedback control are developed in Brandt and Chen (2001), where it is shown, by a feedback control approach, how the immune system components can be bolstered against the virus. An interesting study of dynamic multi-drug therapies for HIV is presented in Wein et al. (1997), where a dynamic but not optimal policy is proposed. Adopting the model predictive control approach, Zurakowski and Teel (2006) derive treatment schedules for HIV therapy, which form a closed-loop control solution of the modified Wodarz-Nowak model. A neighboring-optimal control policy is presented in Stengel et al. (2002). Based on linearization of the nonlinear model at the steady state, Radisavljevic-Gajic (2009) proposes a control strategy for the HIV-virus dynamics, formulated further as a linear-quadratic optimal control problem; a controller based on minimization of the square of the error between the actual and desired (equilibrium) values is obtained. For a fractional-order HIV-immune system with memory, Ding et al. (2012) discuss the necessary conditions for optimality of a general fractional optimal control problem; the fractional-order two-point boundary value problem is solved numerically and the effects of mathematically optimal therapy are demonstrated. In Adams et al. (2007), the researchers fit a nonlinear dynamical mathematical model of HIV infection to longitudinal clinical data for individual patients, and a statistically-based censored-data method is combined with inverse problem techniques to estimate dynamic parameters. The important works Adams et al. (2004, 2005) investigate the optimal control strategy of HIV by the Pontryagin maximum principle. Since the number of elements of the control set Λ in Adams et al. (2004, 2005) is finite, the optimal structured treatment interruption control problems are considered via, first, a crude direct search approach involving simple comparisons, then a 5-day segment strategy to reduce the number of iterations, and finally a subperiod method to further alleviate the computational burden. The suboptimal structured treatment interruption therapy protocols are derived without mentioning the HJB equation or a concrete algorithm.

The optimal controls in the aforementioned works are obtained through the Pontryagin maximum principle, and they are hence inherently open-loop controls. The Pontryagin maximum principle is a necessary condition; most often, "those sophisticated necessary conditions rarely give an insight into the structure of the optimal controls" (Rubio 1986). Moreover, the open-loop control characterized by the Pontryagin maximum principle solves the problem (when it does!) only for specified initial data, yet an optimal control problem usually needs a solution for a multitude of initial data. In addition, some simple examples in Lenhart and Workman (2007) show that the optimal controls and trajectories corresponding to different initial data may have different structures. This brings difficulties in finding the numerical solution (Mirica 2008). Indeed, it is remarked in Boltyanskii (1971) that "the use of the Pontryagin maximum principle to solve concrete problems is very difficult since it contains two different simultaneous mathematical problems: integration of the canonical differential system and, simultaneously, the maximization of the Hamiltonian".

Different from the maximum principle, in this paper we apply the DPVS approach to an optimal control problem of two types of drug therapies for HIV/AIDS to obtain the approximate optimal control strategy. Let us now introduce the HIV/AIDS therapy model investigated in this section. Let $x_1$ represent the concentration of uninfected CD4$^+$T cells and $x_2$ the free infectious virus particle population, respectively. Then for any $t_f > 0$, the two types of drug treatment of the HIV/AIDS model can be described by the following system of ordinary differential equations (Joshi 2002;


Kirschner and Webb 1998):
$$
\begin{cases}
x_1'(t) = s_1 - \dfrac{s_2 x_2(t)}{B_1 + x_2(t)} - \mu x_1(t) - k x_2(t) x_1(t) + u_1(t) x_1(t), & t \in (0, t_f),\\[2mm]
x_2'(t) = \dfrac{g(1 - u_2(t)) x_2(t)}{B_2 + x_2(t)} - c x_2(t) x_1(t), & t \in (0, t_f),\\[2mm]
(x_1(0), x_2(0)) = z = (z_1, z_2),
\end{cases}
\tag{3.8}
$$

where $u(\cdot) = (u_1(\cdot), u_2(\cdot)) \in \mathcal{U}[0, t_f] = L^\infty((0, t_f); U)$, the admissible control set, in which $u_1$, $u_2$ represent the immune-boosting and viral-suppressing drugs, respectively. Here we may use IL-2 immunotherapy as the immune-system-enhancing drug $u_1$. By the term $u_1(t)x_1(t)$ it is assumed that the enhancement of the immune system through IL-2 results in an increase in the CD4$^+$T cells proportional to the population of these cells, at the rate $u_1(t)$. The set $U = [c_{1\ell}, c_{1r}] \times [c_{2\ell}, c_{2r}]$ is the control constraint set for some constants $c_{i\ell}, c_{ir}$, $i = 1, 2$. In the model equation, the term $s_1 - \frac{s_2 x_2(t)}{B_1 + x_2(t)}$ represents the source/proliferation of uninfected CD4$^+$T cells, which includes both an external (non-plasma) contribution of cells from sources such as the thymus and lymph nodes, and an internal (plasma) contribution from CD4$^+$T cell differentiation. This T-cell source deteriorates during disease progression, with limiting value $s_1 - s_2$. Here $s_1, s_2$ are constant source/production rates of CD4$^+$T cells, $\mu$ is the death rate of the uninfected CD4$^+$T cells, $k$ is the rate at which CD4$^+$T cells become infected by free virus $x_2$, $g$ is the input rate of the external viral source, $c$ is the loss rate of virus, and $B_1, B_2$ are half-saturation constants.
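For use in the algorithm of Sect. 2, the right-hand side of (3.8) translates directly into code; the following C fragment is a sketch using the parameter values listed later in this section, with the function name hivF and the renamings kk, cc (to avoid single-letter clashes) being our own choices:

```c
/* Right-hand side F(x,u) of the HIV model (3.8); parameter values as
 * listed below in this section (borrowed from Kirschner and Webb 1998). */
static const double s1 = 2.0, s2 = 1.5, mu = 0.002, kk = 2.5e-4,
                    g  = 30.0, cc = 0.007, B1 = 14.0, B2 = 1.0;

static void hivF(const double x[2], const double u[2], double f[2])
{
    f[0] = s1 - s2 * x[1] / (B1 + x[1]) - mu * x[0]
         - kk * x[1] * x[0] + u[0] * x[0];        /* x1': CD4+ T cells */
    f[1] = g * (1.0 - u[1]) * x[1] / (B2 + x[1])
         - cc * x[1] * x[0];                      /* x2': free virus   */
}
```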

This simple model of HIV progression considers only the uninfected CD4$^+$T cell population and the free virus population interacting in the plasma. A few newer models have become available as HIV research develops, which include more state variables, such as the class of infected CD4$^+$T cells. Some of them reflect late findings on IL-2 coadministration for HIV therapy, such as The INSIGHT-ESPRIT Study Group and SILCAAT Scientific Committee (2009), which covers almost all the recent advances in this field. However, even in terms of its biological meaning, the adopted model can show that the dynamics of HIV progression in the plasma can be based upon simple assumptions about the interactions of uninfected CD4$^+$T cells and free virus. Since there are extensive data for these two populations during disease progression, the model simulations can be compared with data. The mathematical model (3.8) of HIV/AIDS progression was first presented in Kirschner and Webb (1998) to characterize the interaction of CD4$^+$T cells and HIV particles in the immune system, where one can find more details on its biological background; Joshi (2002) utilized the Pontryagin maximum principle to investigate the optimal control problem for this model. In this paper, we aim to develop a completely new algorithm to find numerical solutions of the approximate optimal control based on dynamic programming, without computing the value function everywhere; the problem of drug combination in virus treatment is given as an application. The simple HIV/AIDS model adopted in this paper, due to its mathematical properties, makes it easier to illustrate our purpose. The proposed algorithm has the ability to treat more complex and practical models like those considered in Rong et al. (2007), Rong and Perelson (2009), where drug resistance, immune response and HIV latency become critical issues leading to HIV persistence in the presence of long-term effective therapy.


Now we propose a new optimal control problem for the system (3.8): seek an optimal control $u^*(\cdot) \in \mathcal{U}[0, t_f]$ such that
$$
J\big(u^*(\cdot)\big) = \inf_{u(\cdot) \in \mathcal{U}[0, t_f]} J\big(u(\cdot)\big), \tag{3.9}
$$
where $J$ is the cost functional given by
$$
J\big(u(\cdot)\big) = \int_0^{t_f} \big[A_1 u_1^2(\tau) + A_2 u_2^2(\tau) - x_1(\tau)\big]\, d\tau + A_3 x_2^2(t_f) \tag{3.10}
$$

with positive constants $A_1, A_2, A_3$. It should be emphasized that we require $A_3 \neq 0$ in (3.10), which is different from Joshi (2002), although our method can certainly be used to treat the case $A_3 = 0$ as well. The running cost in the cost functional above includes the benefit of CD4$^+$T cells and the costs of the drug therapies. The square of the final virus particle count composes the terminal cost term. Because the drugs are toxic to the body when administered in high doses, we use quadratic terms in the running cost. Since the final virus particle count is one of the main quantities to be suppressed in the treatment, the square term is adopted in the same way. Our goal here is to minimize drug use while maximizing the uninfected cell count and achieving a minimal virus particle count in the final stage of the whole investigational treatment period. By using the terminal cost, we successfully avoid a typical problem in the existing literature (see for instance Figs. 13, 15 of Adams et al. (2005) and Figs. 3, 5 of Adams et al. (2004)), where the viral load is allowed to rebound simply because the horizon is ending.
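In code, the running and terminal costs of (3.10) read as follows; a sketch with $A_3 = 100$ picked as the default (the paper also runs $A_3 = 0$), the names hivRunCost/hivTermCost being ours:

```c
static const double A1 = 250000.0, A2 = 75.0, A3 = 100.0;  /* A3 = 0 also tested */

/* Running cost L(t,x,u) = A1*u1^2 + A2*u2^2 - x1 of (3.10). */
static double hivRunCost(double t, const double x[2], const double u[2])
{
    (void)t;                  /* the integrand does not depend on t here */
    return A1 * u[0] * u[0] + A2 * u[1] * u[1] - x[0];
}

/* Terminal cost psi(x) = A3*x2^2: square of the final viral load. */
static double hivTermCost(const double x[2])
{
    return A3 * x[1] * x[1];
}
```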

By (2.3), if the value function of the optimal control problem (3.9) is sufficiently smooth, one can derive the following HJB equation satisfied by the value function:
$$
\begin{cases}
V_t(t, x) + \inf_{u \in U}\{V_x(t, x) \cdot F(x, u) + A_1 u_1^2 + A_2 u_2^2 - x_1\} = 0,\\
V(t_f, x) = A_3 x_2^2,
\end{cases}
$$
where $t \in [0, t_f)$ and $x = (x_1, x_2) \in \mathbb{R}^2$. The function $F(x, u) = (F_1(x, u), F_2(x, u))^T$ is defined by
$$
F(x, u) = \begin{pmatrix} s_1 - \dfrac{s_2 x_2}{B_1 + x_2} - \mu x_1 - k x_2 x_1 + u_1 x_1\\[2mm] \dfrac{g(1 - u_2) x_2}{B_2 + x_2} - c x_2 x_1 \end{pmatrix}.
$$

In the following, we generate the numerical solutions of the optimal control problem (3.8)–(3.9), exactly following the algorithm presented in Sect. 2. The program is developed in C, and the results are plotted with MATLAB. The parameters used in the computation are listed as follows:
$$
N = 640, \quad M = 5, \quad A_1 = 250000, \quad A_2 = 75, \quad t_f = 40, \quad \eta = 0.02,
$$
$$
z_1 = 400.0, \quad z_2 = 3.5, \quad c_{1\ell} = 0.0, \quad c_{1r} = 0.02, \quad c_{2\ell} = 0.0, \quad c_{2r} = 0.9,
$$
$$
s_1 = 2.0, \quad s_2 = 1.5, \quad \mu = 0.002, \quad k = 2.5 \times 10^{-4}, \quad g = 30.0, \quad c = 0.007,
$$
$$
B_1 = 14.0, \quad B_2 = 1.0,
$$


and $A_3$ takes the two values 100 and 0, respectively, but this is only for numerical purposes. Among the parameters and constants listed above, some are directly borrowed from Kirschner and Webb (1998), in which the model (3.8) was first constructed.

It should be pointed out that the difference in magnitude between $A_1$ and $A_2$ serves to balance the size of the terms: the control bound for $u_1$ is $[0, 2.0 \times 10^{-2}]$, whose square is $[0, 4.0 \times 10^{-4}]$, while the control bound for $u_2$ is $[0, 0.9]$, whose square is $[0, 0.81]$. The controls $u_1, u_2$ denote the drug administration schedules. The value $u_1 = 0.02$ means that the drug $u_1$ is administered at full scale per day; similarly, the drug $u_2$ is administered at full scale per day if $u_2 = 0.9$.

We compute the approximate optimal control-trajectory pairs for the cases $A_3 = 100$ and $A_3 = 0$, respectively. The obtained results are plotted in two figures. Figure 1 presents the computed numerical solution of the approximate optimal control-trajectory pair when $A_3 = 100$: it plots the approximate optimal control components $u^*_1$, $u^*_2$ and the computed corresponding trajectories of the CD4$^+$T cells and the HIV particles, respectively. In order to make the necessity of the terminal cost stand out, the case $A_3 = 0$ is plotted in Fig. 2 in the same way.

It should be pointed out that although the results we obtained seem not to be optimal (the optimal control should be continuous by Theorems 6.1 and 6.2 of Fleming and Rishel 1975), they are approximately optimal (or suboptimal) in the sense that they are very effective in suppressing the HIV particles and boosting the function of the immune system.

Fig. 1 Numerical solution of the approximate optimal control-trajectory pair with $A_3 = 100$


Fig. 2 Numerical solution of the approximate optimal control-trajectory pair with $A_3 = 0$

Even from the point of view of numerical analysis, our results are satisfactory. To check this point, we performed some numerical experiments: under the same initial condition, we compare the cost corresponding to the approximate optimal control-trajectory pair with the costs of arbitrarily selected admissible controls and their corresponding trajectories.

The arbitrarily selected admissible controls are several possible combinations of the two controls. Each control is chosen to be one of five different functions:

$$
u_1(t) = \begin{cases}
0.02, & \text{Case a},\\
-0.02t/40 + 0.02, & \text{Case b},\\
0.02t/40, & \text{Case c},\\
0, & \text{Case d},\\
0.02|\sin 30t|, & \text{Case e},
\end{cases}
\qquad
u_2(t) = \begin{cases}
0.9, & \text{Case A},\\
-0.9t/40 + 0.9, & \text{Case B},\\
0.9t/40, & \text{Case C},\\
0, & \text{Case D},\\
0.9|\sin 50t|, & \text{Case E}.
\end{cases}
$$

We then compute the corresponding cost functionals (J values) for these controls.The results are listed in Table 1.
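For reference, these test controls are straightforward to implement; the following C fragment transcribes the ten cases (the case-selector convention is our own):

```c
#include <math.h>

/* Test controls for the comparison experiment: Cases a-e for u1, A-E for u2. */
static double u1_case(char c, double t)
{
    switch (c) {
    case 'a': return 0.02;
    case 'b': return -0.02 * t / 40.0 + 0.02;
    case 'c': return 0.02 * t / 40.0;
    case 'd': return 0.0;
    default : return 0.02 * fabs(sin(30.0 * t));   /* Case e */
    }
}

static double u2_case(char C, double t)
{
    switch (C) {
    case 'A': return 0.9;
    case 'B': return -0.9 * t / 40.0 + 0.9;
    case 'C': return 0.9 * t / 40.0;
    case 'D': return 0.0;
    default : return 0.9 * fabs(sin(50.0 * t));    /* Case E */
    }
}
```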

Remark 1 It is clearly seen from Table 1 that the cost corresponding to the approximate optimal control-trajectory pair is less than that under any other control $(u_1, u_2)$, no matter what value $A_3$ takes.


Table 1 Costs corresponding to different control-trajectory pairs

Control (u1, u2)                     J value with A3 = 100    J value with A3 = 0
Case (a, A)                          −19037.364346            −19037.364346
Case (b, B)                          −16091.030761            −19455.373985
Case (b, C)                          −18927.367136            −18927.595354
Case (a, D)                          −18365.038145            −19997.793922
Case (c, B)                          −13315.540793            −16752.069485
Case (c, C)                          −16250.916681            −16251.276633
Case (d, A)                          −14479.391625            −14479.394837
Case (d, D)                          −5464.061856             −15709.633929
Case (e, E)                          −18057.659758            −18094.224090
Approximate optimal (u1*, u2*)       −19308.169228            −20177.695872

In particular, when $A_3 = 100$, the cost corresponding to the control (d, D) is maximal; this corresponds to the case without any treatment. When $A_3 = 0$, the maximal cost occurs in the case (d, A), which tells us that the viral-suppressing drug $u_2$ alone cannot give a good therapeutic effect if the immune-boosting drug $u_1$ is absent. When $A_3 = 100$, the case (a, A), which corresponds to full therapy, has the second smallest cost; namely, the patient takes the two drugs at full scale over the whole observed period. This tells us that receiving the full therapy is also a good treatment option if no better scheme is available. In both cases (d, A) and (d, D), where the immune-boosting drug $u_1$ is absent, the corresponding costs are very high whether or not the viral-suppressing drug $u_2$ is present. So the immune-boosting drug is very important for obtaining the ideal therapeutic effect.

Remark 2 To show that our approach is effective, we compared it with existing references. The comparison shows that our results agree with those in Joshi (2002), Kirschner et al. (1997); that is to say, the formats of the optimal solutions are similar. Among Joshi (2002), Kirschner et al. (1997) and our work, the objects of investigation are different: in Kirschner et al. (1997), the model is different from ours, and only one control is applied there; moreover, the cost functionals of Joshi (2002), Kirschner et al. (1997) do not include a terminal cost. In order to have a proper comparison with the existing results, we calculated the case with no terminal cost ($A_3 = 0$) and plotted it in Fig. 3. The format of the obtained results does agree with those in the references.

In Joshi (2002), it is also said that "the format of the optimal controls do agree with those in the papers Brandt and Chen (2001), Bryson (1996), Craig et al. (2004) with only one control". Because we do not know the initial values used in Joshi (2002) (the optimal control depends on the initial value), we are not able to reproduce exactly the same graph as in Joshi (2002). However, we used the initial value (400.0, 3.5) for the 50-day simulation, as in Joshi (2002). The first control is almost the same as that in Joshi (2002), and the format of the CD4$^+$T cell curve does agree with that in Joshi (2002) (see Fig. 3: Comparison with existing results). As for the second control and the HIV particles, our results are more like Fig. 4 of Kirschner et al. (1997) on p. 789.


Fig. 3 Comparison with existing results

4 Concluding remarks

In this paper, a new algorithm is presented to find optimal single closed-loop solutions starting from the initial state (Richardson and Wang 2006) by the dynamic programming approach. The algorithm is based on two observations: (a) the value function of the optimal control problem considered is the viscosity solution of the associated Hamilton-Jacobi-Bellman (HJB) equation; and (b) the gradient of the value function appears in the HJB equation only in the form of a directional derivative. The algorithm proposes a discretization method for seeking approximate optimal control-trajectory pairs, based on a finite difference scheme in time, through solving the HJB equation and the state equation. We apply the algorithm to an HIV/AIDS optimal control problem. The results illustrate that the method is valid and effective, and the algorithm can be applied to other optimal control problems in higher dimensions.

From the HIV/AIDS model example, our numerical computations show that the influence of different initial conditions is not remarkable. What makes a big difference in the results is the terminal cost. It seems that adding the terminal cost to the cost functional is necessary, which is the main difference between the results presented in this paper and the existing literature.

Finally, we indicate that the solution we obtained might be a local minimum due to the choice of the starting control. This can be ruled out through comparison with "arbitrarily chosen" admissible controls for a practical problem. Further investigation of this problem is needed.

Acknowledgements The authors would like to thank the Editor and the anonymous reviewers for their valuable comments and suggestions, which improved the paper substantially.

References

Adams BM, Banks HT, Kwon H-D, Tran HT (2004) Dynamic multidrug therapies for HIV: optimal and STI control approaches. Math Biosci Eng 1(2):223–241
Adams BM, Banks HT, Davidian M, Kwon H-D, Tran HT, Wynne SN, Rosenberg ES (2005) HIV dynamics: modeling, data analysis, and optimal treatment protocols. J Comput Appl Math 184(1):10–49
Adams BM, Banks HT, Davidian M, Rosenberg ES (2007) Estimation and prediction with HIV-treatment interruption data. Bull Math Biol 69(2):563–584
Bardi M, Capuzzo-Dolcetta I (1997) Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Birkhäuser, Boston
Boltyanskii VG (1971) Mathematical methods of optimal control. Balakrishnan-Neustadt series. Holt, Rinehart and Winston, New York-Montreal-London. Translated from the Russian by K.N. Trirogoff, edited by Ivin Tarnove
Brandt ME, Chen GR (2001) Feedback control of a biodynamical model of HIV-1. IEEE Trans Biomed Eng 48(7):754–759
Bryson AE Jr (1996) Optimal control—1950 to 1985. IEEE Control Syst 16:26–33
Butler S, Kirschner D, Lenhart S (1997) Optimal control of chemotherapy affecting the infectivity of HIV. In: Advances in mathematical population dynamics—molecules, cells and man. Series in mathematical biology and medicine, vol 6. World Scientific, River Edge, pp 557–569
Craig IK, Xia X (2001) Can HIV/AIDS be controlled? Applying control engineering concepts outside traditional fields. IEEE Control Syst Mag 25(1):80–83
Craig IK, Xia X, Venter JW (2004) Introducing HIV/AIDS education into the electrical engineering curriculum at the University of Pretoria. IEEE Trans Ed 47(1):65–73
Crandall MG, Lions PL (1984) Two approximations of solutions of Hamilton-Jacobi equations. Math Comput 43:1–19
David J, Tran H, Banks HT (2011) Receding horizon control of HIV. Optim Control Appl Methods 32(6):681–699
Ding Y, Wang Z, Ye H (2012) Optimal control of a fractional-order HIV-immune system with memory. IEEE Trans Control Syst Technol 20(3):763–769
Falcone M, Lanucara P, Seghini A (1994) A splitting algorithm for Hamilton-Jacobi-Bellman equations. Appl Numer Math 15:207–218
Felippe de Souza JAM, Caetano MAL, Yoneyama T (2000) Optimal control theory applied to the anti-viral treatment of AIDS. In: Proceedings of the 39th IEEE conference on decision and control, Sydney, Australia, December 2000, pp 4839–4844
Fleming WH, Rishel RW (1975) Deterministic and stochastic optimal control. Springer, Berlin-New York
Fleming WH, Soner HM (1993) Controlled Markov processes and viscosity solutions. Springer, New York
Guo BZ, Sun B (2005) Numerical solution to the optimal birth feedback control of a population dynamics: a viscosity solution approach. Optim Control Appl Methods 26:229–254
Guo BZ, Sun B (2007) Numerical solution to the optimal feedback control of continuous casting process. J Glob Optim 39:171–195
Hackbusch W (1978) A numerical method for solving parabolic equations with opposite orientations. Computing 20(3):229–240
Ho YC (2005) On centralized optimal control. IEEE Trans Autom Control 50(4):537–538
Huang CS, Wang S, Teo KL (2000) Solving Hamilton-Jacobi-Bellman equations by a modified method of characteristics. Nonlinear Anal 40:279–293
Huang CS, Wang S, Teo KL (2004) On application of an alternating direction method to Hamilton-Jacobi-Bellman equations. J Comput Appl Math 166:153–166
Jacobson DH, Mayne DQ (1970) Differential dynamic programming. Elsevier, New York
Jeffrey AM, Xia X, Craig IK (2003) When to initiate HIV therapy: a control theoretic approach. IEEE Trans Biomed Eng 50(11):1213–1220
Joshi HR (2002) Optimal control of an HIV immunology model. Optim Control Appl Methods 23(4):199–213
Kirschner D (1996) Using mathematics to understand HIV immune dynamics. Not Am Math Soc 43(2):191–202
Kirschner D, Webb GF (1998) Immunotherapy of HIV-1 infection. J Biol Syst 6(1):71–83
Kirschner D, Lenhart S, Serbin S (1997) Optimal control of the chemotherapy of HIV. J Math Biol 35:775–792
Lenhart S, Workman JT (2007) Optimal control applied to biological models. Chapman & Hall/CRC mathematical and computational biology series. Chapman & Hall/CRC, Boca Raton
Lin TC, Arora JS (1994) Differential dynamic programming technique for optimal control. Optim Control Appl Methods 15(2):77–100
Mayne DQ, Polak E (1975) First-order strong variation algorithms for optimal control. J Optim Theory Appl 16(3/4):277–301
Mirica S (2008) MR2316829 (2008f:49001). Mathematical review of Lenhart and Workman (2007). Available at http://www.ams.org/mathscinet/
Nowak MA, May RM (2000) Virus dynamics: mathematical principles of immunology and virology. Oxford University Press, Oxford
Perelson AS, Nelson PW (1999) Mathematical analysis of HIV-1 dynamics in vivo. SIAM Rev 41(1):3–44
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2002) Numerical recipes in C++: the art of scientific computing, 2nd edn. Cambridge University Press, Cambridge
Radisavljevic-Gajic V (2009) Optimal control of HIV-virus dynamics. Ann Biomed Eng 37(6):1251–1261
Richardson S, Wang S (2006) Numerical solution of Hamilton-Jacobi-Bellman equations by an exponentially fitted finite volume method. Optimization 55(1–2):121–140
Rong L, Perelson AS (2009) Modeling HIV persistence, the latent reservoir, and viral blips. J Theor Biol 260(2):308–331
Rong L, Feng Z, Perelson AS (2007) Emergence of HIV-1 drug resistance during antiretroviral treatment. Bull Math Biol 69(6):2027–2060
Rubio JE (1986) Control and optimization: the linear treatment of nonlinear problems. Nonlinear science: theory and applications. Manchester University Press, Manchester
Sargent RWH (2000) Optimal control. J Comput Appl Math 124:361–371
Stengel RF, Ghigliazza RM, Kulkarni NV (2002) Optimal enhancement of immune response. Bioinformatics 18(9):1227–1235
Stoer J, Bulirsch R (1993) Introduction to numerical analysis. Springer, New York
The INSIGHT-ESPRIT Study Group and SILCAAT Scientific Committee (2009) Interleukin-2 therapy in patients with HIV infection. N Engl J Med 361(16):1548–1559
von Stryk O, Bulirsch R (1992) Direct and indirect methods for trajectory optimization. Ann Oper Res 37:357–373
Wang S, Gao F, Teo KL (2000) An upwind finite-difference method for the approximation of viscosity solutions to Hamilton-Jacobi-Bellman equations. IMA J Math Control Inf 17:167–178
Wang S, Jennings LS, Teo KL (2003) Numerical solution of Hamilton-Jacobi-Bellman equations by an upwind finite volume method. J Glob Optim 27:177–192
Wein LM, Zenios SA, Nowak M (1997) Dynamic multidrug therapies for HIV: a control theoretic approach. J Theor Biol 185:15–29
Zurakowski R, Teel AR (2006) A model predictive control based scheduling method for HIV therapy. J Theor Biol 238(2):368–382
