dynamicprogrammingmodelsandalgorithmsfor ......fer from the bank account to the portfolio. there is...

MANAGEMENT SCIENCEVol. 56, No. 5, May 2010, pp. 801–815issn 0025-1909 �eissn 1526-5501 �10 �5605 �0801

informs ®

doi 10.1287/mnsc.1100.1143©2010 INFORMS

Dynamic Programming Models and Algorithms forthe Mutual Fund Cash Balance Problem

Juliana Nascimento, Warren PowellDepartment of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

{[email protected], [email protected]}

Fund managers have to decide the amount of a fund’s assets that should be kept in cash, considering thetrade-off between being able to meet shareholder redemptions and minimizing the opportunity cost from lost

investment opportunities. In addition, they have to consider redemptions by individuals as well as institutionalinvestors, the current performance of the stock market and interest rates, and the pattern of investments andredemptions that are correlated with market performance. We formulate the problem as a dynamic program, butthis encounters the classic curse of dimensionality. To overcome this problem, we propose a provably convergentapproximate dynamic programming algorithm. We also adapt the algorithm to an online environment, requiringno knowledge of the probability distributions for rates of return and interest rates. We use actual data for marketperformance and interest rates, and demonstrate the quality of the solution (compared to the optimal) for thetop 10 mutual funds in each of nine fund categories. We show that our results closely match the optimal solution(in considerably less time), and outperform two static (newsvendor) models. The result is a simple policy thatdescribes when money should be moved into and out of cash based on market performance.

Key words : mutual fund cash balance; approximate dynamic programmingHistory : Received May 18, 2007; accepted December 12, 2009, by Michael Fu, stochastic models andsimulation. Published online in Articles in Advance March 23, 2010.

1. IntroductionMutual fund managers have to determine how muchcash to keep on hand, striking a balance betweenthe cost of meeting redemption requests against theopportunity cost of holding cash. The academic litera-ture typically ignores several dimensions of the prob-lem, such as the characteristics of the demands forredemptions. For example, the mutual fund has toconsider both individual and institutional investors.Not only are redemption requests by institutionalinvestors much larger, they have to be satisfied rightaway, producing short-term borrowing costs if there isinsufficient cash. The manager has to consider trans-action costs and bid-ask spreads. He also has to takeinto account whether the market is under- or over-performing long-term averages, as well as the patternof deposits and redemptions that are correlated withboth market performance and interest rates.In Wermers (2000) and Edelen (1999), the authors

present empirical evidence that supports the valueof active fund management, as expenses and trans-actions costs are considerable factors in the fund netreturns. In particular, these papers point out the sig-nificant cost of holding cash in a mutual fund.Variations of the cash balance problem have been

studied in the academic community in different forms.A survey of deterministic cash balance problems canbe found in Srinivasan and Kim (1986). The deter-ministic problem assumes demands, market returns,

and interest rates are known, but introduces issuessuch as fixed costs for a transaction and the modelingof continuous time processes. A similar deterministicversion of this problem is the cash balance problembased on Sethi and Thompson (1970) and Sethi (1973),and later described in Sethi and Thompson (2000).The problem is represented by a system of differentialequations and the optimal policy is determined study-ing the associated Hamiltonian and adjoint functions.Golden et al. (1979) and Jorjani and Lamar (1994) pro-pose network flow optimization models to deal withefficient management of cash, investments, and short-term loans. One of the earliest investigations intostochastic cash management is given in Eppen andFama (1971) (see also Eppen and Fama 1969), butthese inventory-theoretic models of cash balance donot model the “state of the world” captured by marketperformance and interest rates (Bensoussan et al. 2009is a more modern example of this). Schmid (2002) usesstochastic programming where uncertainty is repre-sented in scenario trees. This approach can handleother types of information, but the scenario trees growexponentially with the number of time periods, requir-ing complex algorithmic strategies.Myopic and stationary solutions for a stochas-

tic cash balance problem are presented in Penttinen(1991).A related problem is corporate cash holding, which

is addressed in Kim et al. (2001), Almeida et al. (2004),

801

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Nascimento and Powell: Mutual Fund Cash Balance Problem802 Management Science 56(5), pp. 801–815, © 2010 INFORMS

and Faulkender and Wang (2006). These paperspresent a qualitative analysis indicating that the morevolatile the cash flow and the the higher the cost offinance, the more cash holding is observed. Almeidaet al. (2004) also analyzes cash flow sensitivity.A cash holding problem specifically applied to

mutual funds is discussed in Yan (2006). Like thepapers on corporate cash holdings, this paper onlydeals with qualitative aspects. The author developsa simplified static model for the problem and thenproceeds with a regression analysis on actual mutualfund data that validates the model predictions. Theauthor only hints, based on Constantinides (1986),that the holding of cash should be kept within a cer-tain range and it should be adjusted only when thecash level is too low or too high.Hinderer and Waldmann (2001) present a broader

cash management model that includes a basic modelof the mutual fund cash holding problem as a spe-cial case. The authors propose a dynamic program-ming formulation but do not address the computa-tional problems that arise as a result of the curse ofdimensionality.The mutual fund cash holding problem can be

viewed as a more complex inventory control problem(Graves et al. 1993). The complications arise becauseof the presence of side variables not present in astandard inventory context, such as interest rates andmarket rates of return. Even inventory problems withone-dimensional state spaces may lead to prohibitivecomputational requirements, because of the cardinal-ity of the state space. In the context of a single-itemstochastic lot-sizing problem, Halman et al. (2009)develops approximation algorithms to deal with it.A related problem in the economic literature is the

commodity storage problem (Wright and Williams1982, 1984). This problem assumes an infinite plan-ning horizon where a crop is harvested each year anda decision regarding the allocation between consump-tion and changes in inventory must be taken at eachperiod. Both a dynamic programming analysis anda competitive equilibrium analysis are discussed inJudd (1998).We provide a model of the mutual fund cash man-

agement problem that captures several dimensions ofa realistic setting. In fact, our model is based on ane-mail sent by an actual fund manager describing theissues he was facing and that he could not find theanswers in the literature to approach them. We pro-pose both finite and infinite horizon models. We thendescribe several methods for solving these models:(a) we give two solutions based on newsvendor mod-els suggested by the mutual fund manager in hise-mail, (b) we give an exact algorithm using back-ward dynamic programming (the most detailed ver-sion requires three days to solve), and (c) we provide

an approximate dynamic programming (ADP) algo-rithm, which is based on the successive projectiveapproximation routine (SPAR) for maintaining con-cavity of the value function. Our ADP algorithm,dubbed the “SPAR-Mutual” algorithm, is based on aprocedure described in Powell et al. (2004) that adap-tively learns a piecewise linear function giving thevalue of cash as a function of market return and inter-est rates. The SPAR-Mutual algorithm is provably con-vergent for finite horizon problems (the proof is givenin Nascimento and Powell 2009), without requiringthe use of an explicit exploration strategy. The algo-rithm can be used in a model-based implementation(where we assume that we know the distribution ofmarket returns and interest rates) over a finite hori-zon or a model-free implementation if we are willingto assume steady state, using actual (rather than sim-ulated) observations of the exogenous information.

2. Problem Description andAssumptions

Our problem is to determine how much cash tokeep on hand to strike a balance between havingenough cash to handle redemptions and maximizingthe return on invested assets. We consider both finiteand infinite horizon versions of this problem. Theobjective is to minimize discounted expected costs,where costs include early liquidation costs (if we haveto liquidate assets to meet a redemption request), bor-rowing costs (if a redemption request has to be satis-fied before assets can be liquidated), transaction costs,and the lost return on funds that are not investedin the market. During periods of market decline, thelost return is negative. We disregard the use of cashto pay management fees and other expenses andto make dividends and capital gains distributions,because they are typically a fixed percentage of thetotal assets under consideration.There are two types of shareholders, namely, large

institutional investors and small retail customers. Wedenote by Dl

t and Dst the demand for redemption

at time t from institutional (large) and retail (small)investors, respectively. We denote by Di

t the inflow ofmoney from new investors, treated as a single, aggre-gate quantity. We assume that the stochastic processesdescribing Dl

t , Dst , and Di

t are Markovian. Moreover,they are integer, bounded, and positive.We denote by Rt the cash level at period t after new

deposits Dit have been added. If the total demand for

redemption Dlt + Ds

t at t is larger than Rt , then partof the fund portfolio must be liquidated to meet thedemand. The cost involved will be denoted by �sh,the shortfall cost, given by a deterministic fraction ofthe liquidating amount. We note that we can easilyaccommodate increasing costs for larger transactions,

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Nascimento and Powell: Mutual Fund Cash Balance ProblemManagement Science 56(5), pp. 801–815, © 2010 INFORMS 803

which might arise from the cost of liquidating illiquidassets. If the institutional demand alone is higherthan Rt , then a financial cost (interest rate), denotedby P

ft , is also charged, because the demand must be

satisfied immediately through short-term loans whilesecurities are liquidated (a process that can requireseveral days). On the other hand, if too much cash ismaintained, the portfolio is losing investment oppor-tunities. This loss is measured using the fund’s portfo-lio rate of return, denoted by Pr

t . The prices Pft and P r

t

are Markov processes (not necessarily independent)that are exogenous to the system. We also assumethey have finite support. Continuous processes couldbe discretized appropriately to apply the algorithms.We let Wt = P r

t � Pft �D

it�D

lt�D

st � be the vector of

exogenous information that becomes available atevery time period t. We assume that Wt is indepen-dent of the cash level. Moreover, P r

t � Pft � is condition-

ally independent of Dit�D

lt�D

st � given Wt−1, that is,

Ɛ�P rt P

ft D

itD

ltD

st �Wt−1�= Ɛ�P r

t Pft �Wt−1�Ɛ�Di

tDltD

st �Wt−1�.

We note that more general structures, other than theMarkovian assumption, could be considered for theprocesses. As a consequence, we would have to aug-ment Wt with additional features.Knowing Wt and Rt , the cash rebalancing deci-

sion xt = xt1�xt2� is made, where xt1 is the amountof money to transfer from the portfolio to the bankaccount, whereas xt2 is the amount of money to trans-fer from the bank account to the portfolio. Thereis a limit on the amount of the portfolio that canbe liquidated in one period, so we impose the con-straint 0≤ xt1 ≤M , where M is a deterministic bound.Normally, M would simply be chosen large enoughthat it never constrains the optimal solution, but itcan also be justified by the 1940 Investment Com-pany Act (http://www.sec.gov/about/laws.shtml#invcoact1940), which allows redemptions to be satis-fied using shares of stock instead of cash if a redemp-tion request is too large.It is obvious that 0≤ xt2 ≤max 0�Rt −Dl

t −Dst �.

We denote by � Wt�Rt� the feasible region forxt = xt1�xt2�. The fund incurs transaction costs �tr

for each dollar moved into or out of the fund,which means that total transaction costs are given by�tr xt1 + xt2�. Moreover, we assume that the shortfallcost �sh is larger than the transaction cost �tr . Finally,we assume that Ɛ�P f

t � Pft−1� is positive and Ɛ�P r

t � Prt−1�

is greater than −�sh and −�tr . Clearly, �sh and �tr mustbe positive.We denote by Rx

t the cash level at the end ofperiod t after the decision is taken. This meansthat Rx

t = max 0�Rt − Dlt − Ds

t � + xt1 − xt2 andRt+1 =Rx

t +Dit+1. Note that because we assume that

Dlt , D

st , and Di

t are integer and bounded, then Rt , xt1,xt2, and Rx

t are also integer and bounded. The state of

the system before the decision is made is denoted bySt = Wt�Rt�, whereas the state of the system after thedecision is made is denoted by Sx

t = Wt�Rxt �.

The one period cost is given by

Ct St� xt�

= �sh Dlt +Ds

t −Rt�1�Dlt+Ds

t≥Rt�+ P

ft D

lt −Rt�1�Dl

t≥Rt�

+Prt Rt −Dl

t −Dst �1�Dl

t+Dst<Rt�

+�tr xt1+ xt2�� (1)

The first term is the shortfall cost, whereas the secondterm is the financial cost. The third term is the oppor-tunity cost and the last term is the transaction cost.Let X�

t St� be a decision function that determines xt

given the information in St . We assume that we havea family of functions X�

t �� ∈�. We consider both sta-tionary and nonstationary policies in this paper, sofor the purpose of properly representing nonstation-ary policies, the decision function is indexed by time.When we refer to a policy �, we mean the decisionfunction X�

t for some policy � ∈�. Later, we providemore specific meaning to a specific policy.Our problem is to solve

inf�∈�

ƐT∑

t=0�tCt St�X

�t St��

where � is a discount factor between 0 and 1. Thereare two strategies we can use to solve the problem (wetest both in our experimental work). The first is anoffline strategy where we use prior history to updateour forecasts at each time period t, from which wegenerate what is typically a nonstationary forecast ofthe future. For example, we may feel that a suddendrop in the market will be followed by a fairly fastreturn to normal levels over the next few days. Wewould implement such a strategy using a finite hori-zon model (T might be 5 or 10 days).The second strategy is an online implementation

where we assume that all processes are stationary. Inthis case, we would use an infinite horizon objective(T = �). We demonstrate how a steady-state modelsuch as this can be implemented very easily, withoutrequiring an explicit update to short-term forecasts.

3. Mutual Fund Cash Holding ModelsWe first introduce a dynamic programming modelwhere decisions at time t consider their impact on thefuture. This model is compared to two static modelsthat use myopic policies. The first one is a straight-forward simplification of the dynamic model, in thesense that the only difference between them is that thevalue function will not play a role when a decision istaken. This way, we are able to measure the impor-tance of the role of the value function, because it addssignificant computational burden to solve the prob-lem. The second static model, proposed by Yan (2006),

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


is even simpler. It does not consider transaction costs,and there is no distinction between the two types ofdemand.The policy obtained from minimizing the cost is

the same as the one obtained from maximizing theprofit. Because the original approximate algorithmdeals with profit maximization, we use this termi-nology when describing the models and algorithmicstrategies.

3.1. The Dynamic ModelWe start this section discussing the optimal valuefunctions for the cash holding problem. In §2,we introduced the state of the system, at time t,before and after the decision is taken, denoted bySt = Wt�Rt� and Sx

t = Wt�Rxt �, respectively. We let

V ∗t St� be the optimal value of being in state St , where

V ∗t St� = max

x∈� Wt�Rt�−Ct Wt�Rt�x�

+ sup�∈�

Ɛ

[ T∑t′=t+1

−�t′−tCt′ St′�X�t St′�� St�x�

]�

Similarly, we let V ∗�xt Sx

t � be the optimal value of beingin the postdecision state Sx

t , where

V ∗�xt Sx

t �=sup�∈�

Ɛ

[ T∑t′=t+1

−�t′−tCt′ St′�X�t St′�� Sx

t

]�

Equivalently, the optimal value functions can bedefined recursively. Using the value functions aroundthe pre- and postdecision states, we break Bellman’sequation into two steps:

V ∗�xt−1 Wt−1�R

xt−1�=Ɛ�V ∗

t Wt�Rt� � Wt−1�Rxt−1�� (2)

V ∗t Wt�Rt�= max


+�V ∗�xt Wt�R

xt �� (3)

At time t=T , because it is the end of the plan-ning horizon, V ∗�x

T WT �RxT �=0. At time t−1, for

t=T ��1, the value of being in any postdecisionstate Wt−1�Rx

t−1� does not involve the solution of anoptimization problem; it involves only an expectation,because the next predecision state Wt�Rt� dependsonly on the exogenous information that first becomesavailable at t, as in (2). On the other hand, the valueof being in any predecision state Wt�Rt� at t doesnot involve expectations, because the next postdeci-sion state Wt�R

xt � is a deterministic function of Wt , it

requires only the solution of an optimization problem,as in (2).If we substitute (3) into (2), we get

V ∗�xt−1 Wt−1�R

xt−1�

=Ɛ

[max


+�V ∗�xt Wt�R

xt � � Wt−1�R

xt−1�

]� (4)

For algorithmic reasons, throughout this paper, weonly use (4), instead of (3) and (2), i.e., we onlyconsider the value function around the postdecisionstate. Its main feature is the inversion of the optimiza-tion/expectation order in the value function formula.See Powell (2007, Chap. 4) for a complete discussionof postdecision states.To simplify notation, we will just drop the super-

script x in the value function notation. We perform aqualitative analysis of (4) in order to provide insightsabout the optimal policy. This analysis is also thefoundation for the SPAR-Mutual algorithm.Even without computing the expectation in (4),

given the integrality assumptions on Dit , Dl

t , andDs

t , the function V ∗t−1 Wt−1�·�! �0��→� is piece-

wise linear with integer break points. Thus, dis-regarding its value at Wt−1�0�, the function canbe identified uniquely by its slopes �v∗

t−1 Wt−1�1��v∗t−1 Wt−1�2��v∗

t−1 Wt−1�Bt−1��, where Bt−1 is theupper bound on the cash level Rx

t−1 and

v∗t−1 Wt−1�R

xt−1�=V ∗

t−1 Wt−1�Rxt−1�−V ∗

t−1 Wt−1�Rxt−1−1��

Assuming the slopes are monotone decreasing inthe cash level dimension, that is,

vt Wt�Rxt �≥vt Wt�R

xt +1�� (5)

for all states Wt�Rxt �, we give a simple and intu-

itive proof that the optimal cash holding policy onlyrebalances the cash holdings when the cash levelgoes outside a certain range. To describe the pol-icy, we define the following ranges: �1 Wt�=�R!�vt Wt�R�>�tr �, �2 Wt�=�R! −�tr ≤�vt Wt�R�≤�tr �,and �3 Wt�=�R! �vt Wt�R�<−�tr �.

Theorem 1. Given a predecision state Wt�Rxt �,

assume (5) for all possible cash levels. Regions �1 Wt�,�2 Wt�, and�3 Wt� determine a decision rule for the opti-mization problem

maxxt∈� Wt�Rt�

−Ct Wt�Rt�xt�+�Vt Wt� R+xt1−xt2�� (6)

where R=max 0�Rt−Dl

t−Dst �

and

Vt Wt�R�=R∑

r=1vt Wt�r��

The optimal policy is as follows.Rule 1. If R∈�1 Wt�, x

∗t1=min M�max�xt1! R+xt1∈

�1 Wt��, x∗t2=0.

Rule 2. If R∈�2 Wt�, x∗t1=0, x∗

t2=0.Rule 3. If R∈�3 Wt�, x∗

t1=0�x∗t2=max R�min�xt2! R−xt2∈�2 Wt��.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


Proof of Theorem 1. For any predecision state Wt�Rt�, the optimal decision x∗

t for (6) is the sameas the optimal decision x∗

t of the following linearproblem:

maxxt�y�z

−�tr xt1+xt2�+�M∑i=1

vt Wt� R+i�yi

+� R∑

j=1vt Wt� R−j� 1−zj�

subject toM∑i=1

yi=xt1

R∑j=1

zj =xt2 0≤y≤1 0≤z≤1�

Note that even without imposing integrality, eachcomponent of y and z is either equal to 0 or 1. Eachyi=1, for i=1��M , represents the decision to trans-fer one dollar from the portfolio to the bank account.On the other hand, each zj =1, for j=1�� R repre-sents the transfer of one dollar from the bank accountto the portfolio. Moreover, given (5), if for some i,y∗i =1 then y∗

i′ =1 for all i′<i, y∗i′′ =0 for all i′′>i

and the entire vector z is equal to 0. Furthermore,if for some j , z∗j =1 then z∗j ′ =1 for all j ′<j�z∗j ′′ =0for all j ′′>j and the entire vector y is equal to 0.This follows the logical reasoning that if money istransferred from the portfolio to the bank accountit makes no sense to transfer money from the bankaccount to the portfolio. Clearly, if R∈�1 Wt�, thenthe optimal vector z∗ is equal to 0 and y∗

i =1 for all isuch that �vt Wt� R+i�>�tr , that is, we keep trans-ferring money to the bank account while the transac-tion cost is smaller than the marginal value of havingone more dollar in cash or until the upper boundM is reached. Symmetrically, if R∈�3 Wt�, then theoptimal vector y∗ is equal to 0 and z∗j =1 for all j

such that �vt Wt� R−j�<−�tr , that is, we keep trans-ferring money to the portfolio while the transactioncost is smaller than the marginal value of having oneless dollar in cash or until there is no more cash tobe transferred. Finally, if R∈�2 Wt�, then y∗=0 andz∗=0, since the transaction cost is too high to justifyany transfer of money, proving Rules 1–3. �

We next give an expression for the optimal slopesand show that they are monotonically decreasing, aproperty that we exploit in the SPAR-Mutual algo-rithm. This property not only accelerates the rate ofconvergence of the algorithm, but it is also central toits pure exploitation nature, resulting in a simpler andfaster procedure.

Theorem 2. For t=T ��1 and all states Wt−1�Rxt−1�,

the optimal slopes are given by


xt−1�=Ɛ� �G Wt�Rt�v

∗t � � Wt−1�R

xt−1�� (7)

where

�G Wt�Rt�v∗t �

=�sh1�Dlt+Ds

t≥Rt�+P

ft 1�Dl

t≥Rt�−P r

t 1�Dlt+Ds

t<Rt�

+max �tr ��v∗t Wt�Rt−Dl

t−Dst +Mt��

·1��v∗t Wt�Rt−Dlt−Ds

t �>�tr �1�Dlt+Ds

t<Rt�

+�v∗t Wt�Rt−Dl

t−Dst �1�−�tr≤�v∗t Wt�Rt−Dl

t−Dst �≤�tr �

·1�Dlt+Ds

t<Rt�−�tr1��v∗t Wt�Rt−Dl

t−Dst �<−�tr �1�Dl

t+Dst<Rt�

�

Moreover, the optimal slopes are monotone decreasing inthe cash level dimension, that is,


xt−1�≥v∗

t−1 Wt−1�Rxt−1+1��

implying concavity of the optimal value function.

The proof of Theorem 2 is given in the appendix.

3.2. The Static ModelsIn our first static model, given the time period t andthe predecision state Wt�Rt�, the objective is to finda decision that maximizes the expected profit in thenext time period, where the profit is the negativeof the cost given by (1). The second model is sim-ilar to the one proposed in Yan (2006). It does notconsider transaction costs and there is no distinctionbetween the two types of demand. Because of the lat-ter simplification, the finance and the shortfall costsare collapsed together. As in the first static model,given the time period t and the predecision state Wt�Rt�, the objective is to find a decision that max-imizes the expected profit in the next time period,where the expression for the profit takes into consid-eration the simplifications mentioned above.Let Dt+1=Dl

t+1+Dst+1−Di

t+1 be the net flow ofmoney. For the first and second model, the optimaldecision x∗

t is thus found by solving, respectively, themaximization problems


Ɛ�−Ct+1 Wt+1�Rxt +Di

t+1�xt� � Wt�Rt�� (8)


Ɛ�− �sh+Pft+1� Dt+1−Rx

t �1�Dt+1≥Rxt �

−Prt+1 R

xt −Dt+1�1�Dt+1<Rx

t �� Wt�Rt�� (9)

where Rxt =max 0�Rt−Dl

t−Dst �+xt1−xt2.

4. Algorithmic StrategiesIn this section, we present algorithms to actuallycompute the decisions implied by each cash holdingmodel. For the dynamic setting, once we computethe slopes of the value functions, a solution is easilydetermined following the decision rule described inTheorem 1. We propose two approaches to find the

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


slopes. The first one is traditional backward dynamicprogramming. Even though this approach is sim-ple and straightforward, it is computationally verydemanding.The second approach is through ADP. The ADP

algorithm replaces the computation of the expectedvalue by iteratively observing sample path real-izations and, most importantly, by exploiting thestructural property of the optimal slopes, namely,the monotone decreasing property. Of course, ADPreplaces the computational burden of the exact solu-tion with the statistical errors of a Monte Carlo basedprocedure. The upside is that the slopes do convergeto the optimal ones in the limit. The idea is that if thenumber of iterations is large enough, the approximatesolutions are very close to the optimal ones. The ADPapproach is discussed in full in the next section.The optimal decision for the static models is

obtained in a similar fashion as the optimal decisionfor the dynamic model at time period T −1, becauseno downstream effects are taken into considerationat the end of the planning horizon T . Therefore, ourprocedure to determine the optimal decision for thestatic models follows the reasoning of the proof ofTheorem 1, i.e., we make the decision comparing thetransaction cost with the marginal value of transfer-ring one dollar to/from the bank account.Given the predecision state Wt�Rt�, let f1 Wt�R

xt �

and f2 Wt�Rxt � denote the marginal value of transfer-

ring one dollar to the bank account for the first andsecond static models, respectively, when the cash levelis Rx

t =max 0�Rt−Dlt−Ds

t �+xt1−xt2. From (8) and (9),we can easily obtain

f1 Wt�Rxt � = Ef��Dl

t+1−Dit+1≥Rx

t � Wt�Rt��

+ �sh+Er��Dt+1≥Rxt � Wt�Rt��−Er�

f 2 Wt�Rxt � = Ef +�sh+Er��Dt+1≥Rx

t � Wt�Rt��−Er�

where Ef =Ɛ�Pft+1 �Wt�, Er =Ɛ�P r

t+1 �Wt�, and Dt+1=Dl

t+1+Dst+1−Di

t+1. Symmetrically, the marginal valueof transferring one dollar from the bank account isgiven by −f1 Wt�R

xt � and −f2 Wt�R

xt �, respectively.

The optimal decision for the first model can befound using the following procedure:

STEP 0: Initialize xt1=xt2=0.STEP 1: While f1 Wt�R

xt �>�tr and xt1<Mt do

xt1=xt1+1.STEP 2: While f1 Wt�R

xt �<−�tr and xt2<max 0�

Rt−Dlt−Ds

t � do xt2=xt2+1.Because there is no transaction cost involved in thesecond model, its solution is determined using a sim-ilar procedure, replacing �tr by 0 and f1 Wt�R

xt � by

f2 Wt�Rxt �.

Figure 1 Optimal Value Function and the Constructed Approximation

Important region

Optimal—UnknownV*t (Wt,.) Vn

t (Wt,.)Approximation—Constructed

Asset dimensionRx

t1 Rxt2 Rx

t3

�*t(W t, R

xt2)

�*t (Wt , R xt5 )�

* t(W

t,R

x t1)

�n

t(W t,

Rxt2) � n

t (Wt , R x

t5 )

�n t(W

t,R

x t1)

�nt (Wt, Rx

t3)�*t (Wt, Rx

t3) �*t (Wt , R x

t4 )� nt (W

t , R xt4 )

5. The Approximate DynamicProgramming Approach

The main idea of the algorithm is to constructconcave and piecewise linear function approxima-tions V n

t Wt�·�, learning its slopes vnt Wt�R

xt1��

vnt Wt�R

xtN � over the iterations. At each iteration, our

decision function looks like

X�t S

nt �= argmax

xt∈� Wnt �Rn

t �

−Ct Snt �xt�+� V n−1

t Wnt �R

xt ��

where V n−1t Wt�·� is the piecewise linear value func-

tion approximation computed using information upthrough iteration n−1. We now see that our policy isparameterized by the slopes vn−1

t Wt�r�, r=1�2�� foreach possible value of Wt . Thus, when we refer to apolicy �, we are actually referring to a specific set ofslopes.The catch is that the algorithm does not try to learn

the slopes for the whole state space, but only forparts close to optimal cash levels, which are deter-mined by the algorithm itself. Figure 1 illustrates theidea.At each iteration n and time t, instead of com-

puting the expectation in (4), the algorithm observesone sample realization of the information vector.After that, the sample realization and the currentvalue function approximation are used to take a deci-sion xn

t , leading the system to a postdecision state.Sample information around the new postdecisionstate is gathered and is used to update the approxi-mate slopes vn−1

t−1 . A projection operation is then per-formed in case a violation of the concavity propertyoccurs.

5.1. The SPAR-Mutual AlgorithmBefore we present the algorithm, some notationis necessary. A general postdecision state at t isdenoted by Sx

t or Wt�Rxt �. The two of them are used

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


Figure 2 SPAR-Mutual Algorithm

STEP 0: Algorithm initialization:STEP 0a: Initialize v0t Wt�R

xt � for all t and Wt�R

xt �

monotone decreasing in Rxt .

STEP 0b: Pick N , the total number of iterations.STEP 0c: Set n=1.

STEP 1: Planning horizon initialization: Observe theinitial cash level Rx�n

−1 .Do for t=0��T :STEP 2: Sample/Observe P

f �nt , P r�n

t , Di�nt , Dl�n

t , and Ds�nt .

STEP 3: Compute the predecision cash level: Rnt =Rx�n

t−1+Di�nt �

STEP 4: Slope update procedure:If t>0 thenSTEP 4a: Observe �vn

t Rx�nt−1� and �vn

t Rx�nt−1+1�.

STEP 4b: For all possible states Sxt−1:

znt−1 S

xt−1�= 1− ,n

t−1 Sxt−1��v

n−1t−1 S

xt−1�+ ,n

t−1 Sxt−1��vn

t Rxt−1��

STEP 4c: Perform the projection operation

vnt−1=��Wn

t−1�Rx�nt−1 z

nt−1�. See (10).

STEP 5: Find the optimal solution xnt of

maxx∈� Wnt �Rn

t �−Ct Snt �x�+� V n−1

t Wnt �R

xt ��

STEP 6: Compute the postdecision cash level:Rx�n

t =max 0�Rnt −Dl�n

t −Ds�nt �+xn

t1−xnt2.

STEP 7: If n<N increase n by one and go to Step 1.Else, return vN .

interchangeably. We use Sx�nt to denote the actual post-

decision state visited by the algorithm at iteration nand time t. The same notation convention holds forthe predecision states. At iteration n and time t, theactual decision taken by the algorithm is denotedby xn

t , whereas the value function approximation isdenoted by V n

t Wt�·�. The corresponding slopes aredenoted by vn

t Wt�= vnt Wt�1��vn

t Wt�Bt��. Clearly, V nt Wt�R

xt �=

∑Rxt

r=1 vnt Wt�r�. We refer interchangeably

to the value function itself and to its slopes.The SPAR-Mutual algorithm is presented in

Figure 2. As described in Step 0, the algorithm inputsare piecewise linear value function approximationsrepresented by their slopes v0t Wt�R

xt �. The initial

slopes must be decreasing in the cash level dimen-sion. A slope vector that is equal to zero for all statesand time periods is a valid input. Because we knowthat v∗

T WT �RxT �=0 for all states WT �R

xT �, then we use

vnT WT �R

xT �=0 for all iterations n.

At each iteration n, the algorithm starts by observ-ing the initial cash level, denoted by Rx�n

−1 , as inStep 1. The initial cash level must be a positive inte-ger. After that, the algorithm proceeds over time peri-ods t=0��T . At the beginning of time period t,the algorithm generates a sample of the interest rateP

f �nt , rate of return P r�n

t , money inflow Di�nt , institu-

tional redemption Dl�nt , and retail redemption Ds�n

t , asin Step 2. These are Monte Carlo samples followingthe probability distribution of the information processWt given that Wn

t−1= Pf �nt−1 �P

r�nt−1�D

i�nt−1�D

l�nt−1�D

s�nt−1�. After

that, the predecision cash level Rnt is computed, as in

Step 3.Before the decision at time period t is taken, the

algorithm uses the sample information to update theslopes of time period t−1. Steps 4a–4c describesthe procedure, and Figure 3 illustrates it. We firstobserve slopes relative to the postdecision states Wn

t−1�Rx�nt−1� and Wn

t−1�Rx�nt−1+1� (see Step 4a and

Figure 3(a)). Then, these sample slopes are used toupdate the approximation slopes vn−1

t−1 through a tem-porary slope vector zn

t−1 (see Step 4b and Figure 3(b)).This step requires the use of a step-size rule thatis state dependent, denoted by ,n

t Sxt �. We have

that ,nt S

xt �=,n

t 1�Wt=Wnt � 1�Rx

t =Rx�nt �+1�Rx

t =Rx�nt +1��, where

0<,nt ≤1 and ,n

t can depend only on informationthat became available up through iteration n andtime t. For example, it is valid to use ,n

t =1/ N Sx�nt ��,

where N Sx�nt � is the number of visits to state Sx�n

t upuntil iteration n. The updating scheme may produceslopes that violate the property of being monotoni-cally decreasing. In this case, a projection operationis performed to restore the property and obtain theupdated approximation slopes vn

t−1 (see Step 4c, illus-trated inFigure 3(c)).Next, a decision xn

t is made given the current stateat time t. This decision is the optimal solution withrespect to the current predecision state Wn

t �Rnt � and

value function approximation V n−1t Wn

t �·�, as statedin Step 5. This decision can be easily calculated fol-lowing the decision rule described in Theorem 1.We just need to consider the current predecisionstate Wn

t �Rnt � and the current slope approximation

vn−1t Wn

t �.Finally, the postdecision state Rx�n

t is computed, asin Step 6, and we advance the clock to time t+1. Asthe algorithm reaches the planning horizon t=T , ifthe number of iterations has not reached its limit N ,then the iteration counter is incremented, as in Step 7,and a new iteration is started from Step 1. Otherwise,the algorithm is finished returning the current slopeapproximation vN

t Wt�Rxt � for all t and Wt�R

xt �.

We obtain sample slopes by replacing the expecta-tion and the optimal slope v∗

t in (7) by the sample real-ization Wn

t and the current slope approximation vn−1t ,

respectively. Thus, for t=1��T , the sample slope is�vnt R�= �G Wn

t �R�vn−1t �.

The projection operator ��Wnt �Rx�n

tmaps a vector zn

t ,that may not be monotone decreasing in the cashlevel dimension, into another vector vn

t that has thisstructural property. The operator imposes the prop-erty by forcing the newly updated slope at Wn

t �Rx�nt �

to be greater than or equal to the newly updated slopeat Wn

t �Rx�nt +1� and then forcing the other violating

slopes to be equal to the newly updated ones. For any

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


Figure 3 Slopes Update Procedure, Where Ft−1�Snt−1� V n−1

t−1 �x=−Ct−1�Snt−1�x+� V n−1

t−1 �Wnt−1�R

xt−1

(a) Current approximate function, optimal decision and sampled slopes

Exact

Rnt–1 Rn

t

xATxn

t

�n

t +1–

ˆ� n

t +1+ˆ

FW

n ,R

nV

n–1

(x)

tt

t–1,

�n–1 (Wnt , Rn

t )t

Slopes of Vn–1 (Wnt )t

(b) Temporary approximate function with violation of concavity

Concavity violation

Exact

zn (Wnt , Rn

t )t

Slopes of zn (Wnt )t

Slopes of Vn (Wnt )

(c) Level projection operation: updated approximate function with convacity restored

Exact

x

�n (Wnt , Rn

t )t

t

Frn

,Pn

,Rn

Vn

(x,y

n )t

t–1,

tt

tˆ

state Wt�Rxt �, the projection is given by

��Wnt �Rx�n

t z� Wt�R

xt �

=

z Wnt �R

x�nt �+z Wn

t �Rx�nt +1�

2if C1�

��Wnt �Rx�n

t z� Wn

t �Rx�nt � if C2�

��Wnt �Rx�n

t z� Wn

t �Rx�nt +1� if C3�

z Wt�Rxt � otherwise,

(10)

where the conditions C1, C2, and C3 are

C1: Wt=Wnt � Rx

t = Rx�nt or Rx�n

t +1��z Wn

t �Rx�nt �<z Wn

t �Rx�nt +1�

C2: Wt=Wnt � Rx

t <Rx�nt �

z Wt�Rxt �≤��Wn

t �Rx�nt

z� Wnt �R

x�nt �

C3: Wt=Wnt � Rx

t >Rx�nt +1�

z Wt�Rxt �≥��Wn

t �Rx�nt

z� Wnt �R

x�nt +1��

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


5.2. Convergence AnalysisWe state two theorems that prove that the algorithmconverges to an optimal policy, that is, the algorithmdoes learn the optimal decision to be taken at eachstate that can possibly be reached by an optimal pol-icy. The proofs are based on more general results inNascimento et al. (2007); hence, only a sketch is pro-vided here.We start with some remarks. We denote by �Sn

t �n≥0=� Wn

t �Rnt ��n≥0 the sequence of predecision states vis-

ited by the algorithm at time t. Likewise, we denoteby �xn

t �n≥0 the sequence of decisions taken by the algo-rithm, and by �Sx�n

t �n≥0=� Wnt �R

x�nt ��n≥0 the sequence

of postdecision states visited by the algorithm. Eachone of these sequences has at least one accumulationpoint. This result derives from the fact that the infor-mation process and the cash level have finite support.Moreover, the decisions are constrained to compactsets. We denote by S∗

t , x∗t , and Sx�∗

t the respective accu-mulation points.We denote by �vn

t Wt��n≥0 the sequence of slopesgenerated by the algorithm. Note that there is onesuch sequence for each vector Wt . This sequence alsohas at least one accumulation point, because the pro-jection operation guarantees, for all n, that vn

t Wt�belongs to a compact set, namely, the set of vectorsthat are monotone decreasing in the cash level dimen-sion and are bounded (because of the finite supportassumption of the information process). We denote byv∗t Wt� an accumulation point of this sequence.

Theorem 3. On the event that W ∗t �R

x�∗t � is an accu-

mulation point of � Wnt �R

x�nt ��n≥0, if

�∑n=0

,nt W

∗t �R

x�∗t �=�

and �∑n=0

,nt W

∗t �R

x�∗t ��2<B<� a�s��

thenvnt W

∗t �R

x�∗t �→v∗

t W∗t �R

x�∗t �

and

vnt W

∗t �R

x�∗t +1�→v∗

t W∗t �R

x�∗t +1� a�s� (11)

As a byproduct of the previous theorem, we obtainthe next theorem:

Theorem 4. For t=0��T , on the event that W ∗

t �R∗t �v

∗t �x

∗t � is an accumulation of � Wn

t �Rnt �

vn−1t �xn

t ��n≥1, if the step-size condition of Theorem 3 issatisfied, then, with probability one, x∗

t is an optimalsolution of maxxt∈� W ∗

t �R∗t �Ft W

∗t �R

∗t �V

∗t �xt�, where

Ft W∗t �R

∗t �V

∗t �xt�

=−Ct W∗t �R

∗t �xt�+�V ∗

t

· W ∗t �max 0�R

∗t −Dl�∗

t −Ds�∗t �+xt1−xt2�� (12)

Equation (12) implies that the algorithm haslearned an optimal decision for all states that can bereached by an optimal policy. This implication can beeasily justified as follows. We start with t=0. For eachaccumulation point W ∗

0 �R∗0� of � W

n0 �R

n0��n≥0, (12) tells

us that the accumulation points x∗0 of �x

n0 �n≥0 along the

iterations with initial predecision state W ∗0 �R

∗0� are in

fact an optimal decision for period 0 when the pre-decision state is W ∗

0 �R∗0�. This implies that all accu-

mulation points Rx�∗0 =max 0�R∗

0−Dl�∗0 −Ds�∗

0 �+x∗01−

x∗02 of �Rx�n

0 �n≥0 are postdecision cash levels that canbe reached by an optimal policy. When t=1, for eachRx�∗0 and each accumulation point W ∗

1 of �Wn1 �n≥0,

R∗1=Rx�∗

0 +Di�∗1 is a predecision cash level that can be

reached by an optimal policy. Once again, (12) tellsus that the accumulation points x∗

1 of �xn1 �n≥0 along

the iterations with Wn1 �R

n1�= W ∗

1 �R∗1� are indeed an

optimal decision for period 1 when the predecisionstate is W ∗

1 �R∗1�. As before, the accumulation points

Rx�∗1 =max 0�R∗

1−Dl�∗1 −Ds�∗

1 �+x∗11−x∗

12 of �Rx�n1 �n≥0 are

postdecision cash levels that can be reached by anoptimal policy. The same reasoning can be applied fort=2��T .

6. The Infinite Horizon CashHolding Problem

The infinite horizon problem arises when we believethat the exogenous information processes (prices,interest rates, deposits, and redemptions) are station-ary. The only change is that we drop the indexing bytime of all variables. We adjust the algorithms to theinfinite horizon case in a fairly straightforward way.For the static models, we use the procedure describedin §4, dropping the time index.The ADP algorithm for the infinite horizon case is

similar to the SPAR-Mutual algorithm described inFigure 2, except that we do not loop over the differ-ent time periods and again the time index is dropped.The modified SPAR-Mutual algorithm for the infinitehorizon method brings a nice twist. It can be consid-ered an online algorithm in the sense that probabilitydistributions do not need to be known or estimated.For the finite horizon case, the sample realizations forthe interest rate, rate of return, deposits, and with-drawals are obtained from a sample generator thatfollows a known distribution. On the other hand,actual daily historical values of the interest rate, rateof return, inflow money, and demand for redemp-tion can be used for the infinite horizon case, with-out any need to estimate the probability distributionunderlying them. Moreover, as new daily informationbecomes available, it can be used to update the cur-rent slope approximations. We do not have a proof ofconvergence for this online algorithm, but we showthat the policies produced by this algorithm outper-form the static ones.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


7. Numerical ExperimentsThe purpose of this section is to analyze the behav-ior of the different algorithmic approaches. We studythe effect of discretization on central processing unit(CPU) time and solution quality for the exact and theADP algorithms. Moreover, we quantify how muchwe gain by considering the impact a decision hason the future, instead of just using a myopic policy.Finally, for the infinite horizon case, we can observethe behavior of an online algorithm and the errorsincurred in constructing probability distributions andestimating their parameters.We want to make sure we compare the algorithmic

strategies when they are applied to realistic environ-ments. To achieve this goal, the probability distribu-tions are estimated using real data, including bankprime loan rates, rates of return of top-performingfunds, total asset values, and redemption rates of abroad range of funds.We start by describing the instances considered and

the data used to construct them. After that, we discussimplementation details. We then present and analyzethe results for the finite and infinite horizon cases.

7.1. Problem InstancesWe start with the interest rate models. The Vasicekand the Cox–Ingersoll–Ross processes are commonlyused to model interest rates (Cairns 2004). The pro-cesses are described by drt=, 1−rt�dt+2dBt anddrt=, 1−rt�dt+2

√rtdBt , respectively, where rt rep-

resents the interest rate process, Bt is a standardBrownian motion, and ,, 1, and 2 are param-eters that must be estimated. We used discretetime versions of these models, capturing the behav-ior that decisions are made once each day. Thus,we use rt+1=rt+, 1−rt�3t+2

√3tZt and rt+1=rt+

, 1−rt�3t+2√rt3tZt , respectively, where Zt is nor-

mally distributed with mean 0 and variance 1, and3t is the length of a single day (in whatever units inwhich we are measuring time).We applied the maximum likelihood estimation

(MLE) method to fit these parameters, using weeklyprime rate historical data (from July 2001 to June 2006)from the Federal Reserve Board website (http://www.federalreserve.gov/releases), because bank prime loanis one of several base rates used by banks to priceshort-term business loans.For the portfolio rate of return, we used the Vasicek

process to model the rate of return. We consider ninedifferent sets of funds. Each set represents a mar-ket capitalization category (large, mid, small cap) anda style (blend, growth, value). Moreover, each setis composed of the 10 best-performing funds in thegiven category/style over the past three years accord-ing to Yahoo! Finance (http://finance.yahoo.com/).

We identify each set using the first letters of the cor-responding category and style. For example, LB rep-resents the set of large blend funds.Daily rates of return (from July 2001 to June 2006)

for each fund were collected from the Center forResearch on Security Prices (CRSP) Mutual FundDatabase (http://www.crsp.com/) and were aver-aged across the 10 funds on each style/category. Theresulting daily rates and the MLE method were usedto estimate the parameters of the Vasicek process.For each time period t, we assume that the arrival

of shareholders investing (redeeming) money followsa compound Poisson process with rate 5i

t (5ot ). More-

over, si% of these shareholders are institutions invest-ing di (redeeming do) units of dollars, whereas 100−si(100−so)% are retail shareholders investing (redeem-ing) one unit of dollar. Data from 8,460 differentmutual funds that have both institutional and retailinvestors and that did not have any mergers or acqui-sitions from July 2005 to June 2006 were collected fromthe CRSP Mutual Fund Database. They show that it isreasonable to consider 5i

t (5ot ) as a linear function of

the rate of return Prt ; see Figure 4(a).

The redemption rate, as reported by the InvestmentCompany Institute website (http://www.ici.org/), iscalculated by taking the sum of regular redemptionsand redemption exchanges for a given period as apercentage of the average total net assets at the begin-ning and end of this period. Considering a groupof 4,623 stock funds, the redemption rate for theJuly 2005–June 2006 period was 25�2%. Therefore, weapproximate the redemption amount for each instanceover this period by multiplying their total net assetsby the redemption rate. The resulting redemptionamount is displayed in Figure 4(b). Because the cashholding problem objective value is directly relatedto the redemption amount involved, we can use thenumbers in Figure 4(b) to quantify the actual amountof dollars that can be saved when each algorithmicstrategy is employed to produce a policy.Finally, we picked the transaction cost �tr and

the shortfall cost �sh to be equal to 0�1% and 0�2%,respectively. For the finite horizon case, the planninghorizon is set to T =5 working days.7.2. Implementation IssuesOne of our goals is to compare the performance of anADP-based algorithm to an exact solution obtainedusing traditional backward dynamic programmingmethods. This requires that we address the fact thatthe interest rate and rate of return processes areunbounded and continuous, but an exact solutionrequires that they be bounded and discrete.To obtain an exact solution, we had to create a dis-

cretized version of the interest rate and return pro-cesses. We did this by first discretizing and truncating

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


Figure 4 Net Flow and Redemptions

LB LG LV MB MG MV SB SG SV0

50

100

150

200

250

300

350

400

450

500

Instance

Red

empt

ion

(in

mill

ions

of

dolla

rs)

–4 –3 –2 –1 0 1 2 3 4–20

–15

–10

–5

0

5

10

15

Monthly rate of return

Mon

thly

net

flo

wActual dataLinear regression

(a) Average net flow (b) Approximate redemption amount

Note. LB, Large blend; LG, large growth; LV, large value; MB, mid blend; MG, mid growth; MV, mid value; SB, small blend; SG, small growth; SV, smallvalue.

the exogenous changes to interest rate and returnprocesses. Using this modified process, we createda probability distribution for these processes thatreflected both the discretization and the truncation.We then chose the finest level of discretization thatcould be solved with reasonable execution times (weallowed run times to span days) and defined thisto be the exact solution. This solution could thenbe compared to solutions obtained using backwarddynamic programming for coarser levels of aggrega-tion, as well as solutions obtained using the SPAR-Mutual algorithm.When using the SPAR-Mutual algorithm, we simu-

lated the original continuous state (that is, the inter-est rate and returns—movements of cash were alwaysassumed to occur in discrete quantities). Sampleobservations of changes in interest rates and returnswere generated by either using the original contin-uous distributions or sample observations from his-tory (without discretization). Only the value functionapproximation was discretized. This strategy allowedus to estimate the errors produced using approximatedynamic programming in a realistic setting, but westill compared our performance against an “exact”model that assumed a very fine level of discretization.A second important implementation issue was the

choice of the right step-size rule, which is a key ingre-dient to faster rates of convergence. Even though thestep-size rule ,n=1/N Sx�n

t � produces a provably con-vergent algorithm, the associated rate of convergenceis poor, because this rule goes to zero too quickly. Theproblem is that while learning the slope for one state,the updating procedure depends on another slopeapproximation, which is steadily changing, becausethe algorithm is also learning this other slope, leadingto a nonstationary process. Moreover, the rate of con-vergence for each slope is different and, ideally, the

step-size rule would reflect that. For example, theslower the convergence of a given slope, the largerits step size should be. Because we cannot tune astep-size rule for each slope, an adaptive step-sizerule works best. In our implementation, we use theadaptive stepsize rule proposed in George and Pow-ell (2006). It is given by ,n

t =1− 2nt S

x�nt ��2/ 7m

t Sx�nt ��,

where m is the number of visits to state Sx�nt up to

iteration n. The variance 2nt S

x�nt ��2 is an estimate of

the variance of the observation error, and 7mt S

x�nt � is

an estimate of the total squared variation betweenthe observation and the estimate, which includes bothbias (if our estimates are consistently high or low)as well as the observation error. These are computedusing

7mt S

x�nt � = 1−9m−1�7

m−1t Sx�n

t �

+9m−1 �vnt+1 R

x�nt �− vn−1

t Sx�nt ��2

:mt S

x�nt � = 1−9m−1�:

m−1t Sx�n

t �

+9m−1 �vnt+1 R

x�nt �− vn−1

t Sx�nt ��

2nt S

x�nt ��2 = 7m

t Sx�nt �− :m

t Sx�nt ��2

1+5m−1t Sx�n

t ��

Here, 9m=10/ 10+m� is a step-size formula.The other ingredient to faster rates of convergence

is the choice of discretization increments. It is intu-itive to expect that coarser discretization increments(smaller state space) lead to faster rates of conver-gence in the initial iterations but poorer results inthe long run, whereas finer discretization increments(larger state space) lead to slower rates of conver-gence in the initial iterations but more accurate resultsin the long run. The reasoning behind this intuitionis that, when the discretization is fine, in the initialiterations, most of the state space (in the information

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


dimension Wt) does not have enough observations toproduce a reasonable slope approximation. Becausethe SPAR-Mutual algorithm depends on the slopeapproximation to make a decision and this deci-sion determines the further course of the algorithm,the rate of convergence can suffer from this lack ofinformation.We start the SPAR-Mutual algorithm (for the infi-

nite horizon case) considering a coarse discretizationincrement, for the rate of return. Then, after N1 itera-tions, we switch the discretization increment, increas-ing the state space by a factor of two. We repeat thesame procedure after N2 and N3 iterations. The valuesof N1, N2, and N3 are a rough estimate of the numberof iterations necessary to observe a wide spectrum ofvalues of Wt .We close this section describing how the numerical

experiments were conducted. All the algorithms wereimplemented in Java on a 3.06 GHz Intel P4 machinewith 2 GB of memory running Linux. To evaluate thepolicies generated by the different algorithmic strate-gies, we created, for each instance, a unique testingset. For the finite horizon case, the testing set consistsof 1�000 different sample paths that were randomlygenerated following the processes described in §7.1.For the infinite horizon case, the testing set consistsof actual daily prime rates and rates of return fromJuly 2005 to June 2006. It is worth mentioning thatthe testing set is not used as part of the sample datarequired to learn the slopes.Unless otherwise noted, we adopt as optimal the

policy obtained using traditional dynamic program-ming with discretization increments 0�001 and 0�0001for the interest rate and rate of return, respectively.For i=1��1�000, let F �

i be the value of followingpolicy named � for the ith sample path ;i in the test-ing set, given by

F �i =

T∑t=0

�tCt St ;i��X�t St ;i��

Let F ∗i be the value of following the optimal policy

labeled by �∗. Moreover, when the approximate algo-rithms are considered, we add the superscript n to thenotation, indicating that F n

i is measuring the policyobtained after n iterations of the algorithm. To takeinto account the randomness involved in the approx-imate approaches, the policy considered to obtain F n

i

is in fact an average over 10 runs of the SPAR-Mutualalgorithm, each starting with a different random seed.Finally, we measure our distance from the optimal

policy using

<n=100×∑1�000

i=1 F ni − F ∗

i �

�∑1�000i=1 F ∗

i �� (13)

Figure 5 Initial Time Period and Fixed Interest Rate—Instance LB

0.4

0.3

0.2

0.1

0.0

–0.1

–0.1

0.10 0.15 Cash level

(a) Optimal value function

(b) Corresponding optimal slopes

Rate of return

Cash levelRate of re

turn

–0.050.050

1000

2.0× 10–3

1.51.00.50.00.5

–1.0–1.5–2.0

200300

400 500

Region I

Region II

Region III

0.150.10

0.05

–0.05–0.10

–�tr

�tr

0

–0.2

–0.3

–0.4

–0.5

Val

ue f

ucnt

ion

Slop

e

100 200 300 400 500 600

We note that it is possible for F ni to be negative

because one of the costs of holding cash is the lostreturn from not having the money properly invested.This cost can be negative during a market downturn,when we actually make money relative to the market.

7.3. Finite Horizon ResultsWe start by presenting the optimal value function (Fig-ure 5(a)) and its corresponding slopes (Figure 5(b))for time period t=0 and a fixed interest rate wheninstance LB is considered.Figure 5(b) also shows the three regions considered

in Theorem 1 as well as the planes that define them.Note that as the rate of return decreases the thres-holds between Regions I (�1 Wt�) and II (�2 Wt�) andbetween Regions II and III (�3 Wt�) increase, indi-cating that more cash should be held. This is consis-tent with the intuition that lower opportunity costsand market timing when the return is low lead tomore cash holdings. Region II represents the main roleplayed by the transaction cost �tr . When the cash levelis in this region, it means that moving money from/tothe portfolio is not justified, because the transaction

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


Table 1 Traditional Dynamic Programming—Performance forSelected Levels of Discretization

Rate of return increment—% from optimal(CPU time in seconds)

Instance 0.0001 0.0005 0.0010 0.0050

LB Opt (44,348.40) −0�15 (1,794.78) −0�29 (462.29) −0�85 (23.47)LG Opt (55,178.34) −0�07 (2,231.72) −0�31 (573.75) −1�41 (29.36)LV Opt (31,306.78) −0�01 (1,261.36) −0�23 (329.41) −0�81 (18.51)MB Opt (61,641.28) −0�00 (2,486.63) 0�04 (635.70) −0�41 (29.35)MG Opt (49,096.07) −0�10 (2,013.07) −0�30 (531.73) −0�61 (23.68)MV Opt (37,511.26) −0�11 (1,521.59) −0�25 (394.57) −0�55 (18.49)SB Opt (64,957.03) −0�08 (2,556.25) −0�24 (669.67) −0�62 (29.50)SG Opt (53,718.82) −0�10 (2,243.49) −0�10 (571.22) −0�07 (25.87)SV Opt (41,232.56) −0�01 (1,689.76) −0�03 (506.88) 0�00 (19.32)

Note. Abbreviations are defined as in Figure 4.

cost �tr is too high compared to the marginal gain ofincreasing/decreasing the cash level.Although not depicted in the figure, the interest

rate plays a less significant role in the cash holdingdecision. When the interest rate is increased, the opti-mal slopes increase slightly, indicating that it is a littlebit more valuable to hold cash. It is consistent withthe intuition that higher financial costs lead to highercash balances.Table 1 shows the percentage distance and the asso-

ciated CPU time for selected discretization levels forthe rate of return. The interest rate increment wasfixed at 0�001, because policies obtained at this levelare comparable to the ones obtained at 0�0005 and thecomputational time is reduced. As expected, the qual-ity of the solution improves as the discretization ofthe rate of return process is made finer. Of course,this accuracy comes at a cost. The curse of dimen-sionality is clearly observed, because the less accuratepolicy (smallest state space) is computed in less than20 seconds, whereas it takes almost a day to computethe optimal one (largest state space). It is importantto note that in this problem class, an improvementof 0�1% implies that millions of dollars can be savedin one year, because savings are proportional to theredemption amount as depicted in Figure 4(b).Table 2 presents the percentage distance from the

optimal policy for static and ADP approaches. TheADP algorithm was run for 300 thousand iterations,and the reported results are for the last iteration. Thediscretization increment for the interest rate was 0�001and for the rate of return was 0�0005.Figure 6 shows the rate of convergence as a function

of CPU time and as a function of number of iterationsfor the large style funds. These graphs illustrate thatthe algorithm has very fast initial convergence witha long tail. However, the convergence path is quitestable.The substantial gap between the two static meth-

ods shows that the second model is oversimplified

Table 2 Static and ADP Approaches—Distance from Optimal

SPAR-Mutual ADP algorithm

Instance Static 1 Static 2 % from optimal CPU time (in sec.)

LB −1�60 −7�75 −0�12 27�35LG −1�65 −12�45 −0�26 28�48LV −2�11 −8�66 −0�26 27�95MB −0�79 −6�77 0�06 26�65MG −2�01 −11�27 −0�37 27�71MV −2�14 −7�09 −0�36 29�26SB −2�54 −5�07 −0�52 28�51SG −1�75 −12�07 0�04 29�25SV −2�09 −11�97 −0�59 29�50

Note. Abbreviations are defined as in Figure 4.

and indicates that it is important to distinguish thetwo types of demand (individual and institutional) inaddition to considering transaction costs. Traditionaldynamic programming has running times compara-ble with the ADP approach when the discretizationlevel (the discretization of prices and interest rates) is0�005. However, the accuracy of its policy is inferiorto the one produced by an ADP algorithm.

7.4. Infinite Horizon ResultsIn the finite horizon case, we compared the per-formance of different algorithms using an assumedprobability distribution for the portfolio return andinterest rates. In the infinite horizon case, the ADPalgorithm is applied without any knowledge of thedistributions, because actual data can be used in anonline learning setting. By contrast, the exact infi-nite horizon model and the static model require thatparameters first be estimated from the model. Actualdata is also used to measure the policies, indepen-dently of the algorithm. Therefore, modeling andfitting the parameters of the underlying distribu-tions does introduce errors in the resulting policieswhen value iteration and the static approaches areconsidered.Table 3 presents the percentage of increased cost

when the static models are considered relative to theADP approach. Formula (13) multiplied by −1 wasused to obtain the numbers. Even though the ADPalgorithm was run for 200 thousand iterations and thetotal CPU time was around six seconds, it convergedafter 25,000 iterations. Because we used actual rates ofreturn and interest rates from July 2001 to June 2005,we did not have 200 thousand sample realizations asrequired by the ADP algorithm. We circumvented thisproblem by going back to the July 2001 data after theJune 2005 data was reached.We can infer from this table that the savings using a

dynamic policy instead of a static one were even moredramatic for the infinite horizon case. Other than theintrinsic dynamic versus static factor, we can infer

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


Figure 6 Rate of Convergence of the SPAR-Mutual Algorithm for Large Style Funds

0 5 10 15 20 25 30

–12

–10

–8

–6

–4

–2

0

CPU time (seconds)

Perc

enta

ge a

way

fro

m th

e op

timal

LargeMid-capSmall

0 0.5 1.0 1.5 2.0 2.5 3.0

× 105

–12

–10

–8

–6

–4

–2

0

Number of iterations

Perc

enta

ge a

way

fro

m th

e op

timal

Table 3 Percentage Increased Cost Produced by the Static ModelsRelative to the ADP Approach from July 2005 to June 2006

Blend Growth ValueMarketcapitalization Static 1 Static 2 Static 1 Static 2 Static 1 Static 2

Large 14�95 91�14 14�28 97�19 17�59 2�49Mid 0�48 78�46 16�69 101�07 9�89 68�56Small 13�80 72�81 8�75 93�94 15�18 84�48

that the main reason behind the bigger difference ismodeling errors. We conclude that real gains can beobserved when an online algorithm is used insteadof one that requires the estimation of the underlyingdistributions.

8. Insights and ConclusionsWe proposed static and dynamic models for themutual fund cash balance problem as well as algo-rithmic strategies to solve them. We were able todevelop a provably convergent approximate dynamicprogramming algorithm that replaces the compu-tation of expected values with the observation ofsample realizations. Moreover, in an infinite hori-zon environment, our approach is an online algo-rithm that requires no estimation of probabilitydistributions.We were able to show experimentally that the

SPAR-Mutual approximate dynamic programmingalgorithm delivers policies very close to the opti-mal ones in a short amount of time. Moreover, itis very robust relative to state space size, eliminat-ing the curse of dimensionality. For the infinite hori-zon case, the SPAR-Mutual algorithm does not requirethe knowledge of the underlying probability distri-butions, eliminating the errors introduced by modelselection and parameter estimation.We proved that the optimal value functions are

piecewise linear and concave in the cash level

dimension, implying that dealing with the mono-tone decreasing value function slopes is equivalent todealing with the function itself. This structural resultallowed us to prove in a very intuitive way that theoptimal policy does not produce transactions unlessthe cash level falls outside of a specific range. How-ever, this range depends on the performance of themarket and, to a lesser degree, interest rates.We found that the rate of return is more influen-

tial than the interest rate when making a decision.Furthermore, transaction costs and differentiatingredemptions originating from institutional and retailinvestors are two factors that should be taken intoconsideration when solving the problem, because thepolicy obtained from the model that disregards thesefactors is inferior to the policies produced by the mod-els that do consider them. Finally, dynamic policiesoutperform static ones by a significant margin, imply-ing that the downstream effect of the decisions doesplay an important role and the increased computa-tional burden introduced with the consideration ofthe value functions does pay off.

AcknowledgmentsThe authors are grateful for the very helpful commentsof the reviewers, as well as additional comments by thedepartment editor, Michael Fu, which clarified this paper.This work was also supported in part by Air Force Officeof Scientific Research Grant FA9550-06-1-0496.

AppendixProof of theorem 2. We prove the theorem using a

backward induction on t. We start with t=T −1. By defi-nition, we have that v∗

T−1 WT−1�RxT−1�=V ∗

T−1 WT−1�RxT−1�−

V ∗t−1 WT−1�Rx

T−1−1�. Using Equation (4), a straightforwardcalculation gives us

v∗T−1 WT−1�R

xT−1�

=Ɛ[−CT WT �RT �0�+CT WT �RT −1�0� � WT−1�R

xT−1�

]=Ɛ

[�sh1�Dl

T +DsT ≥RT �+P

fT 1�Dl

T ≥RT �

−PrT 1�Dl

T +DsT <RT � � WT−1�R

xT−1�

]�

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.


because the optimal decision x∗T inside the expectation in (4),

given any predecision state WT �RT �, is equal to 0, becausethe transaction cost �tr is charged for all decisions differentfrom 0 and the optimal value function at T is 0 for all states WT �R

xT �. Based on the assumption that Ɛ�P

ft �Pf

t−1� is positiveand Ɛ�P r

t �Prt−1� is greater than −�sh and −�tr , it is easy to see

that v∗T−1 WT−1�Rx

T−1�≥v∗T−1 WT−1�Rx

T−1+1�.Now we state the induction hypothesis. Suppose, for

any t=T −2��1 and all postdecision states Wt�Rxt �

that v∗t Wt�R

xt �≥v∗

t Wt�Rxt +1�. We shall prove that

v∗t−1 Wt−1�Rx

t−1� is given by (7) and v∗t−1 Wt−1�Rx

t−1�≥v∗t−1 Wt−1�Rx

t−1+1� for all postdecision states Wt−1�Rxt−1�.

To obtain (7), we plug in (4) in the definition ofv∗t−1. Because the first three terms of the cost function

Ct Wt�Rt�x� are independent of the decision x, the first threeterms of (7) are obtained in the same fashion as we did fort=T −1. The last three terms of (7) are obtained pluggingin the solution determined by Rules 1–3 of Theorem 1.Finally, v∗

t−1 Wt−1�Rxt−1�≥v∗

t−1 Wt−1�Rxt−1+1� follows from

the fact that v∗t is monotone decreasing and the assumption

that �sh≥�tr . �

ReferencesAlmeida, H., M. Campello, M. Weisbach. 2004. The cash flow sen-

sitivity of cash. J. Finance 59(4) 1777–1804.Bensoussan, A., A. Chutani, S. P. Sethi. 2009. Optimal cash man-

agement under uncertainty. Oper. Res. Lett. 37(6) 425–429.Cairns, A. 2004. Interest Rate Models: An Introduction. Princeton

University Press, Princeton, NJ.Constantinides, G. 1986. Capital-market equilibrium with transac-

tion costs. J. Political Econom. 94(4) 842–862.Edelen, R. M. 1999. Investor flows and the assessed performance of

open-end mutual funds. J. Financial Econom. 53(3) 439–466.Eppen, G. D., E. F. Fama. 1969. Cash balance and simple dynamic

portfolio problems with proportional costs. Internat. Econom.Rev. 10(2) 119–133.

Eppen, G. D., E. F. Fama. 1971. Three asset cash balance anddynamic portfolio problems. Management Sci. 17(5) 311–319.

Faulkender, M., R. Wang. 2006. Corporate financial policy and thevalue of cash. J. Finance 61(4) 1957–1990.

George, A., W. B. Powell. 2006. Adaptive stepsizes for recursiveestimation with applications in approximate dynamic pro-gramming. Machine Learn. 65(1) 167–198.

Golden, B., M. Liberatore, C. Lieberman. 1979. Models and solutiontechniques for cash flow management. Comput. Oper. Res. 6(1)13–20.

Graves, S., A. R. Kan, P. Zipkin. 1993. Handbooks in OperationsResearch and Management Science: Logistics of Production andInventory. North-Holland, Amsterdam.

Halman, N., D. Klabjan, M. Mostagir, J. Orlin, D. Simchi-Levi. 2009.A fully polynomial-time approximation scheme for single-itemstochastic inventory control with discrete demand. Math. Oper.Res. 34(3) 674–685.

Hinderer, K., K. Waldmann. 2001. Cash management in a randomlyvarying environment. Eur. J. Oper. Res. 130(18) 468–485.

Jorjani, S., B. Lamar. 1994. Cash flow management network modelswith quantity discounting. Omega-Internat. J. Management Sci.22(2) 149–155.

Judd, K. 1998. Numerical Methods in Economics. MIT Press, Cam-bridge, MA.

Kim, C., D. Mauer, A. Sherman. 2001. The determinants of corpo-rate liquidity: Theory and evidence. J. Financial Quant. Anal.135(3) 335–359.

Nascimento, J. M., W. B. Powell. 2009. An optimal approximatedynamic programming algorithm for the lagged asset acquisi-tion problem. Math. Oper. Res. 34(1) 210–237.

Nascimento, J., W. B. Powell, A. Ruszczynski. 2007. Optimalapproximate dynamic programming algorithms for a gen-eral class of storage problems. Technical report, PrincetonUniversity, Princeton, NJ.

Penttinen, M. 1991. Myopic and stationary solutions for stochasticcash balance problems. Eur. J. Oper. Res. 52(2) 155–166.

Powell, W. B. 2007. Approximate Dynamic Programming: Solving theCurses of Dimensionality. John Wiley & Sons, New York.

Powell, W. B., A. Ruszczynski, H. Topaloglu. 2004. Learning algo-rithms for separable approximations of stochastic optimizationproblems. Math. Oper. Res. 29(4) 814–836.

Schmid, O. 2002. A multistage stochastic optimization modelfor the cash management problem. E. J. Kontoghiorghes,B. Rustem, S. Siokos, eds. Computational Methods in Decision-Making Economics and Finance. Kluwer Academic, Dordrecht,The Netherlands, 49–75.

Sethi, S. 1973. Note on modeling simple dynamic cash balanceproblems. J. Financial Quant. Anal. 8(4) 685–687.

Sethi, S., G. Thompson. 1970. Applications of mathematical con-trol theory to finance—Modeling simple dynamic cash balanceproblems. J. Financial Quant. Anal. 5(4–5) 381–394.

Sethi, S., G. Thompson. 2000. Optimal Control Theory: Applications toManagement Science and Economics, 2nd ed. Kluwer Academic,Norwell, MA.

Srinivasan, V. Y. Kim. 1986. Deterministic cash flow management—State-of-the-art and research directions. Omega-Internat. J. Man-agement Sci. 14(2) 145–166.

Wermers, R. 2000. Mutual fund performance: An empirical decom-position into stock-picking talent, style, transaction costs, andexpenses. J. Finance 55(4) 1655–1695.

Wright, B., J. Williams. 1982. The economic role of commodity stor-age. Econom. J. 92(367) 596–614.

Wright, B., J. Williams. 1984. The welfare effects of the introductionof storage. Quart. J. Econom. 99(1) 169–192.

Yan, X. 2006. The determinants and implications of mutual fundcash holdings: Theory and evidence. Financial Management35(2) 67–91.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

dynamicprogrammingmodelsandalgorithmsfor ......fer from the bank account to the portfolio. there is...

Documents