fast universilization of investmnet strategies

8/13/2019 Fast Universilization of Investmnet Strategies

1/22

FAST UNIVERSALIZATION OF INVESTMENT STRATEGIES

KARHAN AKCOGLU, PETROS DRINEAS, AND MING-YANG KAO

SIAM J. COMPUT. c 2004 Society for Industrial and Applied MathematicsVol. 34, No. 1, pp. 122

Abstract. A universalization of a parameterized investment strategy is an online algorithmwhose average daily performance approaches that of the strategy operating with the optimal param-eters determined offline in hindsight. We present a general framework for universalizing investmentstrategies and discuss conditions under which investment strategies are universalizable. We presentexamples of common investment strategies that fit into our framework. The examples include bothtrading strategies that decide positions in individual stocks, and portfolio strategies that allocatewealth among multiple stocks. This work extends in a natural way Covers universal p ortfolio work.We also discuss the runtime efficiency of universalization algorithms. While a straightforward im-plementation of our algorithms runs in time exponential in the number of parameters, we show thatthe efficient universal portfolio computation technique of Kalai and Vempala [Proceedings of the41st Annual IEEE Symposium on Foundations of Computer Science, Redondo Beach, CA, 2000,pp. 486491] involving the sampling of log-concave functions can be generalized to other classes ofinvestment strategies, thus yielding provably good approximation algorithms in our framework.

Key words. universal portfolios, computational finance, portfolio optimization, investment

strategies, portfolio strategies, trading strategies, constantly rebalanced portfolios

AMS subject classifications. 91B28, 68Q32, 68W40

DOI. 10.1137/S0097539702405619

1. Introduction. An age-old question in finance deals with how to managemoney on the stock market to obtain an acceptable return on investment. Aninvestment strategy is an online algorithm that attempts to address this questionby applying a given set of rules to determine how to invest capital. Typically, aninvestment strategy is parameterized by a vectorw R = i=1Ri that dictates howthe strategy operates. The optimal parameters that maximize the strategys returnare unknown when the algorithm is run, and the parameters are usually chosen quite

arbitrarily. Auniversalizationof an investment strategy is an online algorithm basedon the strategy whose average daily performance approaches that of the strategyoperating with the optimal parameters determined offline in hindsight.

Consider the constantly rebalanced portfolio (CRP) investment strategy univer-salized by Cover [5] and the subject of several extensions and generalizations [3, 6, 11,14, 16, 20]. The CRP strategy maintains a constant proportion of total wealth in eachstock, where the proportions are dictated by the parameters given to the strategy. In

Received by the editors April 12, 2002; accepted for publication (in revised form) March 27, 2004;published electronically October 1, 2004. A preliminary version of this work appeared in LectureNotes in Computer Science 2380: Proceedings of the 29th International Colloquium on Automata,Languages, and Programming, M. Hennessy and P. Widmayer, eds., Springer-Verlag, New York,2002, pp. 888900.

http://www.siam.org/journals/sicomp/34-1/40561.htmlThe Goldman Sachs Group, New York, NY 10004 ([email protected]). This work

was done while the author was a graduate student at Yale University and was supported in part byNSF grant CCR-9988376.

Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180 ([email protected]). This work was done while the author was a graduate student at Yale University and wassupported in part by NSF grant CCR-9896165.

Department of Computer Science, Northwestern University, Evanston, IL 60201 ([email protected]). This authors work was supported in part by NSF grant CCR-9988376.

1


2/22

2 KARHAN AKCOGLU, PETROS DRINEAS, AND MING-YANG KAO

a stock market with m stocks, the parameter space for the CRP strategy is

Wm =

w [0, 1]m

m

i=1wi = 1

,

the set of vectors in Rm whose components are between 0 and 1 and add up to 1.Given aportfolio vector w= (w1, . . . , wm) Wm,wi tells us the proportion of wealthto invest in stock i for 1 i m. At the beginning of each day, the holdings arerebalanced; i.e., money is taken out of some stocks and put into others, so that thedesired proportions are maintained in each stock. As an example of the robustness ofthe CRP strategy, consider the following market with two stocks [11, 16]. The priceof one stock remains constant, while the other stock doubles and halves in price onalternate days. Investing in a single stock will at most double our money. With aCRP (12 ,

12) strategy, however, our wealth will increase exponentially, by a factor of

(12 1 + 12 2) (12 1 + 12 12) = 32 34 = 98 every two days.Cover developed an investment strategy that effectively distributes wealth uni-

formly over all portfolio vectors w

Wm on the first day and executes the CRP

strategy with daily rebalancing according to each w on the (infinitesimally small)proportion of wealth initially allocated to each w. Cover showed that the averagedaily log-performance1 of such a strategy approaches that of the CRP strategy oper-ating with the optimal return-maximizing parameters chosen with hindsight.

This paper generalizes previous results and introduces a framework that allowsuniversalizations of other parameterized investment strategies. As we see in section 2,investment strategies typically fall under two categories: trading strategies operateon a single stock and dictate when to buy and short2 the stock; portfolio strategies,such as CRP, operate on the stock market as a whole and dictate how to allocatewealth among multiple stocks. We present several examples of common trading andportfolio strategies that can be universalized in our framework. We discuss our univer-salization framework in section 3. The proofs of our results are very general, and, aswith previous universal portfolio results, we make no assumptions on the underlying

distribution of the stock prices; our results are applicable for all sequences of stock re-turns and market conditions. The running times of universalization algorithms are, ingeneral, exponential in the number of parameters used by the underlying investmentstrategy. Kalai and Vempala [14] presented an efficient implementation of the CRPalgorithm that runs in time polynomial in the number of parameters. In section 4, wepresent general conditions on investment strategies under which the universalizationalgorithm can be efficiently implemented. We also give some investment strategiesthat satisfy these conditions. Section 5 concludes with directions for further research.

2. Types of investment strategies. Suppose we would like to distribute ourwealth amongmstocks.3 Investment strategiesare general classes of rules that dictatehow to invest capital. At timet >0, a strategyStakes as input anenvironment vectorEt and aparameter vector w, and returns an investment descriptionSt(w) specifyinghow to allocate our capital at time t. The environment vectorEt contains historic

1The average daily log-performance is the average of the logarithms of the factors by which ourwealth changes on a daily basis. This notion is discussed further in section 3.1.

2A short position in a stock, discussed in section 2.1, allows us to earn a profit when the stockdeclines in value.

3We use the term stocks in order to keep our terminology consistent with previous work, butwe actually mean a broader range of investment instruments, including both long and short positionsin stocks.


3/22

FAST UNIVERSALIZATION OF INVESTMENT STRATEGIES 3

market information, including stock price history, trading volumes, etc.; the parametervector w is independent ofEt and specifies exactly how the strategy Sshould operate;the investment description St(w) = (St1(w), . . . , S tm(w)) is a vector specifying theproportion of wealth to put in each stock, where we put a fraction Sti(w) of our

holdings in stock i, for 1 i m. For example, CRP is an investment strategy;coupled with a portfolio vector w, it tells us to rebalance our portfolio on a dailybasis according to w; its investment description, CRPt(w) = w, is independent ofthe market environmentEt.

There are two general types of investment strategies that we focus upon in thispaper. Trading strategies tell us whether we should take a long (bet that the stockprice will rise) or a short(bet that the stock price will fall) position on a given stock.Portfolio strategies tell us how to distribute our wealth among various stocks. Weshould note here that these two classes do not exhaust all investment strategies; thereexist strategies that take both long and short positions in several stocks (as in [21]).Trading strategies are denoted by T, and portfolio strategies are denoted by P. WeuseS to denote either kind of strategy. For k 2, let

Wk =

w= (w1, . . . , wk) [0, 1]k ki=1

wi= 1

.(2.1)

Remark 1.Wk is a(k1)-dimensional simplex inRk. The investment strategiesthat we describe below are parameterized by vectors inWk = Wk W k (times)

for somek 2 and 1. We may write w Wk in the form w = (w1, . . . , w),where w= (w1, . . . , wk) for 1 .

2.1. Trading strategies. Suppose that our market contains a single stock. Wehave m = 2 potential investments: either a long position or a short position in thestock. To take a long position, we buy shares in hopes that the share price will rise.We close a long positionby selling the shares. The money we use to buy the sharesis our investment in the long position; the valueof the investment is the money we

get when we close the position. If we let pt denote the stock price at the beginning ofdayt, the value of our investment will change by a factor ofxt =

pt+1pt

from dayt tot + 1.

To take a short position, we borrow shares from our broker and sell them on themarket in hopes that the share price will fall. Weclose a short positionby buying theshares back and returning them to our broker. As collateral for the borrowed shares,our broker has a margin requirement: a fraction of the value of the borrowed sharesmust be deposited in amargin account. Should the price of the security rise sufficiently,the collateral in our margin account will not be enough, and the broker will issue amargin call, requiring us to deposit more collateral. The margin requirement is ourinvestment in the short position; the valueof the investment is the money we getwhen we close the position.

Lemma 2.1. Let the margin requirement for a short position be

(0, 1]. Sup-

pose that a short position is opened on dayt and that the price of the underlying stockchanges by a factor of xt =

pt+1pt

< 1 + during the day. Then the value of our

investment in the short position changes by a factor ofxt = 1 + 1xt

during the day.

Proof. Suppose that we have $v to deposit in the zero-interest margin account.Using this as our investment in the short position, we can sell $v/ worth of shares.Combining the proceeds of the stock sale with our margin account balance, we willhave a total ofv+ v/ dollars. At the end of the day, it will costxtv/ dollars to


4/22


buy the shares back, and we will be left with v+ v xt v dollars, which is positive

since xt < 1 + . Thus, our investment of $v in the short position has changed by afactor of 1 + 1xt

, as claimed.

Should the price of the underlying stock change by a factor greater than 1 +,

we will lose more money than we initially put in. We will assume that the marginrequirement is sufficiently large that the daily price change of the stock is alwaysless than 1 + .

Remark 2. This assumption can be eliminated by purchasing a call option onthe stock with some strike price p < (1 + )pt. Should the stock price get too high,the call allows us to purchase the stock back for $p. Though its price detracts fromthe performance of our short trading strategy, the call protects us from potentiallyunlimited losses due to rising stock price.

If a short position is held for several days, assume that it is rebalanced at thebeginning of each day: either part of the short is closed (if xt > 1) or additionalshares are shorted (ifxt 0, investors should put aD fraction of their moneyin the long position and a 1 D fraction in cash; ifD < 0, investors should investD in the short position and 1 D in cash; if D = 0, investors should avoid thestock completely and keep al l their money in cash. From a practical standpoint, it isdesirable for the trading strategy to be decisive, i.e.,|D| = 1, so that our allocationof money to the stock is always fully invested in the stock (either as a long or a shortposition). We show in section 3 that investment strategies that are continuous in

their parameter spaces are universalizable. Though decisive trading strategiesT arediscontinuous, they can be approximated by continuous strategies whose investmentdescriptions converge almost everywhere to Tt as t (see, for example, (2.3)below).

We now describe some commonly used and researched trading strategies [4, 10,18, 22] and show how they can be parameterized.

MA[k]: Moving average cross-over with k-day memory. In traditional applica-tions [10] of this rule, we compare the current stock price with the moving averageover, say, the previous 200 days: if the price is above the moving average, we take along position; otherwise we take a short position. Some generalizations of this rulehave been made, where we compare a fast moving average (over, for example, thepast 5 to 20 days) with a slow moving average (over the past 50 to 200 days). Wegeneralize this rule further. Given day t

0, let v

t = (v

t1, . . . , v

tk) be the price-

history vector over the previous k days, where vtj is the stock price on day t j.Assume that the stock prices have been normalized such that 0 < vtj 1. Let(wF, wS) W2k (whereWk is defined in (2.1)) be the weights used to compute thefast moving and slow moving averages, so that these averages on day t are given bywF vt and wSvt, respectively. Since the prices have been normalized to the interval(0, 1],1 (wF wS) vt 1, let g : [1, 1] [0, 1] be the long/short allocation

function. The idea is that g((wF wS) vt) represents the proportion of wealth that


5/22


we invest in a long position. The full investment description for the MA = MA[k]trading strategy is

MAt(wF, wS) = g((wF wS) vt), 1 g((wF wS) vt).Note that the dimension of the parameter space for MA[k] is 2(k1) since each ofwFand wSare taken from (k 1)-dimensional spaces. Possible functions forg include

gs(x) =

0 ifx


6/22


invest in a long position. The full investment description for the SR = SR[ k] tradingstrategy is

SRt(w) = h(pt rt, pt st), 1 h(pt rt, pt st).The value ofh need only be defined on {(x, y) [1, 1]2 | x y} since, by Lemma 2.2,st rt. A possible function forh is

hs(x, y) =

0 ifx y 0,1

+1 ifx


7/22


kj=1 fj(v) = 1 for all v R. The parameter space isWkm; the investment descrip-

tion is CRP-St(w1, . . . , wk) =k

j=1 fj(vt)wj, where vt is the indicator vector forday t. Under such a scheme, we have the flexibility of splitting our wealth amongmultiple sets of portfolios w1, . . . , wk on any given day, rather than being forced to

choose a single one. For example, assume that v is ak-dimensional vector, with eachvi corresponding to portfolio wi. Define f : R

k [0, 1]k byfi(vt) = vtik=1vt

, so that

our allocation is biased towards portfolios corresponding to higher indicators whilestill maintaining a position in the others.

IA[k]: k-way indicator aggregation. For each dayt 0, suppose that each stock ihas a set ofk indicatorsvti= (vti1, . . . , vtik), where each vtij (0, 1] and, for 1 jk,vt1j , . . . , vtmj have been normalized such that there is at least onei such thatvtij =1. Examples of possible indicators include historic stock performance and tradingvolumes, and company fundamentals. Our goal is to aggregate the indicators for eachstock to get a measure of the stocks attractiveness and put a greater proportion of ourwealth in stocks that are more attractive. We will aggregate the indicators by takingtheir weighted average, where the weights will be determined by the parameters. The

parameter space isW

= Wk, and the investment description isIAt(w) =

w vt1mi=1 w vti

, . . . , w vtmmi=1 w vti

.

3. Universalization of investment strategies.

3.1. Universalization defined. In a typical stock market, wealth grows geo-metrically. On day t0, let xt be the return vector for day t, the vector of factorsby which stock prices change on day t. The return vector corresponding to a tradingstrategy on a single stock is (xt, 1 +

1xt

), where xt is the factor by which the price

of the stock changes and 1 + 1xt

is the factor by which our investment in a shortposition changes, as described in Lemma 2.1; the return vector corresponding to aportfolio strategy is (xt1, . . . , xtm), wherexti is the factor by which the price of stocki changes, where 1 i m. Henceforth, we do not make a distinction betweenreturn vectors corresponding to trading and portfolio strategies; we assume that xtis appropriately defined to correspond to the investment strategy in question. For aninvestment strategy S with parameter vector w, the return of S(w) during the tthdaythe factor by which our wealth changes on the tth day when invested accordingto S(w)is St(w) xt =

mi=1 Sti(w) xti. (Recall that St(w) is the investment

description ofS(w) for day t, which is a vector specifying the proportion of wealth to

put in each stock.) Given timen >0, letRn(S(w)) =n1

t=0 St(w) xt be thecumu-lative return ofS(w) up to timen; we may writeRn(w) in place ofRn(S(w)) ifSis obvious from context. We analyze the performance ofSin terms of the normalizedlog-returnLn(w) = Ln(S(w)) = 1nlog Rn(w) of the wealth achieved.

For investment strategy S, let wn = arg maxwR Rn(S(w)) be the parametersthat maximize the return ofSup to dayn.

4

An investment strategyUuniversalizes(oris universal for)S if5

Ln(U) = Ln(S(wn)) o(1)

4As mentioned above, wn

can be computed only with hindsight.5Unlike previously discussed investment strategies, the behavior ofUis fully defined without an

additional parameter vector w .


8/22


for all environment vectorsEn. That is, U is universal for S if the average dailylog-return ofUapproaches the optimal average daily log-return ofSas the lengthnof the time horizon grows, regardless of stock price sequences.

3.2. General techniques for universalization. Given an investment strat-

egy S, let W be the parameter space for S and let be the uniform measure overW. Our universalization algorithm forS,U(S), is a generalization of Covers originalresult [5]; we note that a similar algorithm appeared in [20] under the name Aggre-gating Algorithm. The investment descriptionUt(S) for the universalization ofSonday t > 0 is a weighted average ofSt(w) over w W, with greater weight given toparametersw that have performed better in the past (i.e., Rt(w) is larger). Formally,the investment description is

Ut(S) =W

St(w)Rt(w)d(w)W

Rt(w)d(w) =W

St(w)Rt(S(w))d(w)W

Rt(S(w))d(w) ,(3.1)

where we takeR0(w) = 1 for all w W.6 Equivalently, the above integral mightbe interpreted as splitting our money equally among all the different strategies and

letting it sit. In the following lemma, we will prove that this strategy has the sameexpected gain as picking one strategy at random.

Remark 5. The definition of universalization can be expanded to include mea-sures other than, but we consider only in our results.

Lemma 3.1 (see [3, 6]). The cumulativen-day return ofU(S) is

Rn(U(S)) =W

Rn(w)d(w) = ERn(w),

which is the-weighted average of the cumulative returns of the investment strategies{S(w) | w W}.

Proof. The return ofU(S) on day t isUt(S) xt, where xt is the return vector fordayt. The cumulative n-day return ofU(S) is

Rn(U(S)) =n1t=0

Ut(S) xt =n1t=0

W St(w)Rt(w)d(w)

WRt(w)d(w) xt

=n1t=0

W

(St(w) xt)Rt(w)d(w)W

Rt(w)d(w) =n1t=0

W

Rt+1(w)d(w)W

Rt(w)d(w) .

The result follows from the fact that this product telescopes.Rather than directly universalizing a given investment strategy S, we instead

focus on a modified version ofSthat puts a nonzero fraction of wealth into each ofthem stocks. Define the investment strategy Sby

St(w) =

1

2(t + 1)2

St(w) +

2m(t + 1)2

for t 0 and some fixed 0 < < 1. Rather than universalizing S, we insteaduniversalize S. Lemma 3.2 tells us that we do not lose much by doing this.

Lemma 3.2. For alln 0,(1) Rn(U(S)) (1)Rn(U(S))and(2) Ln(U(S)) =Ln(U(S)) o(n)n . (3)IfU(S)is a universalization ofS, thenU(S)is a universalizationofS as well.

6Covers algorithm is a special case of this, replacing St(w) with w .


9/22


Proof. Statements (2) and (3) follow directly from (1). Statement (1) follows from

the fact that for all w W, Rn(S(w)) =n1

t=0 St(w) xtn1

t=0(1 2(t+1)2 )St(w) xt (1

n1t=0

2(t+1)2 )Rn(S(w)) (1 )Rn(S(w)).

Remark 6. Henceforth, we assume that suitable modifications have been madeto S to ensure thatSti(w) 2m(t+1)2 for all 1 i m and t 0.

Theorem 3.3. Given an investment strategy S, letW =Wk (for some k 2and 1) be its parameter space. For 1 i m, 1 , and 1 j k, assumethat there is a constantc such that

Sti(w)wj c(t + 1)for all w W. ThenU(S) is

a universalization ofS.To prove Theorem 3.3, we first prove some preliminary results; the proof follows

the same general strategy as in [5, 6].Lemma 3.4. For nonnegative vector a and strictly positive vectors b and x,

mini

aibi

a xb x maxi

aibi

.

Proof. Assume that the components ofa and b are strictly positive. Otherwise,the lemma holds trivially. Letimax= arg maxi aibi andimin= arg mini

aibi

, so that

aibi

aimaxbimax

aiaimax

bibimax

and ai

bi aimin

bimin ai

aimin bi

bimin.

Then

aimin(ximin+

i=iminai

aiminxi)

bimin(ximin+

i=iminbi

biminxi)

= a xb x =

aimax(ximax+

i=imaxai

aimaxxi)

bimax(ximax+

i=imaxbi

bimaxxi)

.

Therefore,

aimin

bimin a xb x

aimax

bimax.

Our next two results are related to the (k1)-dimensional volumes of some subsetsofRk.

Lemma 3.5. The (k 1)-dimensional volume of the simplexWk = {w [0, 1]k |ki=1 wi = 1}, defined in (2.1), is k(k1)! .

Proof. By induction on k , it can be shown that the k-dimensional volume of the

solid Wk(s) ={w |k

i=1 wi s} is sk

k! . Written in terms of the lengthr of the line

segment passing between the origin and ( sk

, . . . , sk

) Rk, the volume is 1k!r

kkk2 since

s = r

k. Upon differentiation with respect to r, 1(k1)!rk1k

k2 = 1(k1)!

ksk1, we

arrive at the (k 1)-dimensional volume of the simplexWk(s) = {w |

ki=1 wi = s}.

Settings = 1 yields the desired result.

Lemma 3.6. The (k 1)-dimensional volume of a (k 1)-dimensional ball ofradius embedded inWk is

k12 k1

( k12 +1), where

() = ( 1)! and

+1

2

=

1

2

3

2

1

2

.

Proof. This result is proven in Folland [8, Corollary 2.56].


10/22


Proof of Theorem 3.3. From Lemma 3.1, the return ofU(S) is the average ofthe cumulative returns of the investment strategies{S(w) | w W}. Let w =arg maxwWRn(S(w)) be the parameters that maximize the return ofS. We showthat there is a set B of nonzero volume around w such that for w B the returnRn(w) is close to the optimal returnRn(w). We then show that the contributionofB to the average return is sufficiently large to ensure universalizability. We beginby bounding the magnitude of the gradient vector Rn(w). From Remark 6 and ourassumption in the statement of the theorem, for allw, t, i, , and j ,Sti(w)wj

Sti(w)

cm(t + 1)3,

where c = 2c

. Using this fact and Lemma 3.4, the partial derivative of the return

functionRn(w) = Rn(S(w)) =n1

t=0 rt(S(w)) with respect to parameter wj is

Rn(w)wj Rn(w)n1t=0

(St(w)xt)

wj St(w) xt Rn(w)

n1t=0

mi=1

Sti(w)wj xti

mi=1 Sti(w) xti

Rn(w)n1t=0

cm(t + 1)3 cRn(w)mn4

and

|Rn(w)| cRn(w)mn4

k.(3.2)

We would like to take our set B to be some d-dimensional ball around w; unfor-tunately, if w is on (or close to) an edge ofW, the reasoning introduced at thebeginning of this proof is not valid. We instead perturb w to a point w that is atleast

=

cmn4k2

away from all edges, where 0 < < 1 is a constant, and such thatRn(w) is closetoRn(w). To illustrate the perturbation, let w = (w1, . . . , w ), where w =(w1, . . . , w

k) and w

k = 1

k1i=1 w

i for 1 . We perturb each w in the

same way. Let w0 = w . For 1 j k, given wj1 , define wj as follows. Let

jmax be the index of the maximum coordinate ofwj1 . If 0 wjj < , definewjj = w

j1j +, w

jjmax

= wj1jmax and leave all other coordinates unchanged.Otherwise, let wjj0 = w

j1j0

. The final perturbation is w = (w1, . . . ,w), where

w = wk. By construction, w W, w is at least away from the edges ofW and

|wj

wj

| k for all and j. We bound Rn(w

)

Rn( w

)

by the multivariate mean value

theorem and the CauchySchwarz inequality:

Rn(w) = Rn(w) + Rn(w) Rn(w) Rn(w) |Rn(w) (w w)| (for somew between w andw) Rn(w) |Rn(w)| |w w| Rn(w) cRn(w)mn4

k k

k

Rn(w) cRn(w)mn4

k k

k Rn(w)(1 ).


11/22


For 0 let C ={w Rk | |w w| }. From the construction ofw,B= CWk is a (k1)-dimensional ball of radius. Let w = argmaxwB Rn(w),and let w = (w1, . . . ,w

) be the profit-maximizing parameters inB = B1 B.

For w B,Rn(w) = Rn(w) + Rn(w) Rn(w)

Rn(w) |Rn(w)| |w w| (for somew between w and w) Rn(w) cRn(w)mn4

k 2

Rn(w)(1 )

Rn(w)(1 2).By Lemma 3.1,

Rn(U(S)) =W

Rn(S(w))d(w) B

Rn(w)d(w) (1 2)Rn(w)B

d(w)

(1 2)Rn(w)B

dwW

dw

= (1 2)Rn(w) k12 k1

(k12 + 1)(k

1)!

k

(from Lemmas 3.5 and 3.6)

= Rn(w)(,m,k,)n4k,where is some constant depending on , m, k , and. Therefore,

Ln(w) Ln(U(S)) log (,m,k,)n

+ 4klog n

n =

o(n)

n ,(3.3)

as claimed.Remark 7. The techniques used in the proof of Theorem3.3can be generalized to

other investment strategies with bounded parameter spacesWthat are not necessarilyof the formWk.

3.3. Increasing the number of parameters with time. The reader may

notice from the proof of Theorem 3.3 that an investment strategy S may be uni-versalizable even if the dimensions of its parameter space W grow with time. Infact, even if the dimension of the parameter space (the coefficient of logn

n in (3.3))

isO( n(n) logn), where (n) is a monotone increasing function, the strategy is still

universalizable. This introduces an interesting possibility for investment strategieswhose parameter spaces grow with time as more information becomes available. As asimple example, considerdynamic universalization, which allows us to track a higher-return benchmark than basic universalization. Partition the time intervalI= [0, n)into =O( n

(n) logn ) subintervalsI1, . . . ,I, and let wIj be the parameters thatoptimize the return duringIj . InI1, we run the universalization algorithm given by(3.1) over the basic parameter space W ofS. InI2, we run the algorithm over WW;to compute the investment description for a day t I2 using (3.1), we compute thereturn

Rt(w1, w2) as the product of the returns we would have earned in

I1 usingw1

and what we would have earned up to day t inI2 using w2. We proceed similarly inintervalsI3 throughI. This will allow us to track the strategy that uses the optimalparameterswIj corresponding to eachIj . Such a strategy is useful in environmentswhere optimal investment styles (and the optimal investment strategy parametersthat go with them) change with time. Finally, we note that similar ideas appear inthe area of tracking the best expert in the theory of prediction with expert advice;we refer the reader to [12, 19] for more details.


12/22


3.4. Applications to trading strategies. By proving an upper bound onTti(w)wj

for our trading strategies T, we show that they are universalizable.Corollary 3.7. The moving average cross-over trading strategy, MA[k], is

universalizable for the long/short allocation functions g(t)(x) and g(x) defined in

(2.3) and (2.4), respectively.Proof. The parameters for MA[k] are of the form wF = (wF1, . . . , wF(k1), 1

wF1 wF(k1)) and wS = (wS1, . . . , wS(k1), 1 wS1 wS(k1)). Usingthe long/short allocation functiong(t)(x) defined in (2.3), the partial derivative of theinvestment description with respect to a parameter wFj (or similarly wSj ) isMAti(wF, wS)wFj

=g((wF wS) vt)wFj

t2 (vtj vtk) t2 ,where 1 j < k and i {1, 2}. Similarly, we can show that using the long/shortallocation function g(x) defined in (2.4),

MAti(wF,wS)wFj

12 .Corollary 3.8. The support and resistance breakout trading strategy, SR[k], is

universalizable for the long/short allocation functionsh(t)(x, y) andhp(x, y) definedin (2.6) and (2.7), respectively.Proof. We arrive at the result by differentiating the long/short allocation functions

h(t)(x, y) and hp(x, y) with respect to an arbitrary parameter wj and showing thatthe partial derivative isO(t), as in the proof of Corollary 3.7.

3.5. Applications to portfolio strategies.

Corollary 3.9. The constantly rebalanced portfolio, CRP, and CRP with sideinformation, CRP-S, portfolio strategies are universalizable.

Proof. The partial derivatives of CRPti and CRP-Sti with respect to an arbitraryparameterwj are at most 1.

Corollary 3.10. The k-way indicator aggregation portfolio strategy, IA[k], isuniversalizable.

Proof. First, we show thatm

=1 w vt 1

k for all t. Sincek

j=1 wj = 1, thereexistsj0 such thatwj0 1k . Then

m=1 w vt

m=1 wj0 vtj0 1k

m=1 vtj0 1k ,

since the{vtj0}1m have been normalized such that there is at least one 0 suchthatvt0j0 = 1.

Now, letS= IA[k]. By Theorem 3.3, we need only show that Sti(w)wj

= O(t) for1j k 1. For t0 and 1im recall that Sti(w) = wvtim

=1wvt . Then, for1 j k 1, since w = (w1, . . . , wk1, 1 (w1+ + wk1)),

Sti(w)

wj=

vtij vtikm=1 w vt

w vti(m

=1 w vt)2

m=1

(vtj vtk)

1m=1 w vt+

m

(m=1 w vt)2 k+ mk2,

as we wanted to show.

4. Fast computation of universal investment strategies.

4.1. Approximation by sampling. The running time of the universalizationalgorithm depends on the time needed to compute the integral in (3.1). A straightfor-ward evaluation takes time exponential in the number of parameters. Following Kalai


13/22


and Vempala [14], we propose to approximate this integral by sampling the param-eters according to a biased distribution, giving greater weight to better performingparameters. Define the measuret on Wby

dt(w) = Rt(S(w))W

Rt(S(w))d(w) d(w).

Lemma 4.1 (see [14]). The investment descriptionUt(S) for universalization isthe average ofSt(w) with respect to thet measure.

Proof. The average ofSt(w) with respect to t is

Ew(W,t)(St(w)) =

W

St(w)dt(w)

=

W

St(w) Rt(S(w))W

Rt(S(w))d(w) d(w) =Ut(S),

where the final equality follows from (3.1).

We now briefly outline our approach, which follows the lines of [2, 14]. The maintechnical complication is that sampling efficiently with respect to t is not, in general,an easy problem. As a result, we will need some (rather generic) assumption on theinvestment strategies from which we can sample efficiently.

Investment strategies with log-concavity properties. In section 4.3, we usestraightforward manipulations to prove that any investment strategySwhichis linear in the vector of parametersw (such strategies include MA[k], SR[k],CRP, and CRP-S) has a cumulative return functionRt(S(w)) that is log-concave. Our efficient sampling techniques are applicable only on such strate-gies.

Approximating t by t. In section 4.2, we show that for strategies whosecumulative return function is log-concave, it is possible to efficiently samplefrom a distributiont that is close tot. This distribution approximation

incurs some small, bounded error (see Lemma 4.2). Approximating the integral fort via sampling. With such sampling abilities,

it is easy to approximate the average ofSt(w) with respect to t: simply pickNt (as defined in Lemma 4.3) sample parameter vectorsw with respect to tand compute their average. The error incurred by this approximation of theaverage can be bounded in a straightforward manner using Chernoff bounds.

Sampling with respect to t. The critical issue (addressed in section 4.2) ishow to pick vectorsw W with respect tot. In order to tackle this problem,we discretize it by placing a grid on W, and then we perform a Metropolisrandom walk. The convergence properties of this random walk are discussedin Theorems 4.12 and 4.13.

In section 4.2, we show that for certain strategies we can efficiently sample froma distribution t that is close to t; i.e., given t > 0, we generate samples from tsuch that

W

t(w) t(w) d(w) t.(4.1)Assume for now that we can sample from t, with t =

2

4m(t+1)4 , where is the

constant appearing in Remark 6. Let Ut(S) =W

St(w)dt(w) be the corresponding


14/22


approximation toU(S). Lemma 4.2 tells us that we do not lose much by samplingfrom t.

Lemma 4.2. For alln 0,(1)Rn( U(S)) (1 )Rn(U(S))and (2)ifU(S) isa universalization ofS, then

U(S) is a universalization ofSas well.

Proof. Statement (2) follows directly from (1). To see (1), we need only showthat the fraction of wealth that we put into each stock i on dayt under U(S) is withina 1 2(t+1)2 factor of the corresponding amount underU(S); i.e., Uti(S) (1

2(t+1)2 )Uti(S) for 0 t < n and 1 i m. For w W, let t(w) = |t(w) t(w)|,so that

W

t(w)dw= t 24m(t+1)4 . We have

Uti(S) =W

Sti(w)t(w)d(w) W

Sti(w)(t(w) t(w))d(w)

=Uti(S) W

Sti(w)t(w)d(w) Uti(S) t (since Sti(w) 1)

1

2(t + 1)2Uti(S)

sinceUti(S) minw

S(w) 2m(t + 1)2

andt 24m(t+1)4

,

as we wanted to show.

By sampling fromt, we use a generalization of the Chernoff bound to get an ap-proximation U(S) to U(S) such that with high probability Uti(S) (1 2(t+1)2 ) Uti(S)for 0 t < n and 1 i m. Using an argument similar to that in the proof ofLemma 4.2, we see that if U(S) is a universalization ofS, then such a U(S) is a univer-salization ofSas well. Choose w1, . . . , wNt Wat random according to distributiont, and let Uti(S) = 1Nt

Nti=1 Sti(wi). Lemma 4.3 discusses the number of samplesNt

required to get a sufficiently good approximation to

Ut(S).

Lemma 4.3. Given 0 < < 1, use Nt 8m2(t+1)84 log 2m(t+1)2

samples to

compute Ut(S), where is the constant appearing in Remark 6. With probability1 , Uti(S) (1 2(t+1)2 ) Uti(S) for all 1 i m andt 0.

Proof. Hoeffding [13] proves a general version of the Chernoff bound. For random

variables 0 Xi 1 with E(Xi) = and X = 1NN

i=1 Xi, the bound states that

Pr(X (1 )) e2N22 . In our case, we would like Uti (1 2(t+1)2 ) Uti.As this must hold for 1 i m and t 0 with total probability 1 , we requirePr( Uti (1 2(t+1)2 ) Uti) 2m(t+1)2 for each i and t. From our assumption statedin Remark 6, = Uti 2m(t+1)2 , and the desired probability bound is achieved withNt 8m

2(t+1)8

4 log 2m(t+1)

2

samples.

4.2. Efficient sampling. We now discuss how to sample from W =Wk =Wk W k according to distribution t() Rt() = Rt(S()). W is a convex setof diameterd =

2. We focus on a discretization of the sampling problem. Choose

an orthogonal coordinate system on each Wk, and partition it into hypercubes of sidelength t, where t is a constant chosen below. Let be the set of centers of cubesthat intersect W, and choose the partition such that the coordinates of w aremultiples of t. For w , let C(w) be the cube with center w. We show how to


15/22


choose w with probability close to

t(w) = Rt(w)

w Rt(w).

In particular, we sample from a distribution t that satisfies

w

|t(w) t(w)| t = 2

4m(t + 1)4.(4.2)

Note that this is a discretization of (4.1). We will also have that for each w ,t(w)

t(w) 2.(4.3)

We would like to choose t sufficiently small thatRt is nearly constant over C(w);i.e., there is a small constant >0 such that

(1 + )1

Rt(w) Rt(w) (1 + )Rt(w)(4.4)for all w C(w). Such a t can be chosen for investment strategies S that havebounded derivative, as we see in Lemma 4.4.

Lemma 4.4. Suppose that investment strategySsatisfies the condition for uni-

versalizability given in Theorem 3.3; i.e.,Sti(w)

wj

c(t+ 1). Given > 0, lett = t() =

3cmt4k

, wherec is defined in the proof of Theorem3.3. For w, wWsuch that|wij wij | t() for all 1 i and 1 j k, (1 + )1Rt(w)Rt(w) (1 + )Rt(w).

Proof. Note that |w w| t

k. Letw be the parameters that maximize thereturn on the line between w and w. By the multivariate mean value theorem andthe bound for|Rt| given in (3.2),

Rt(w) = Rt(w) + Rt(w) Rt(w) Rt(w) + |Rt(wm)| |w w| (for somewm between w and w) Rt(w) + cRt(wm)mn4

k t

k Rt(w) + Rt(w)

3

Rt(w) Rt(w)

1 3

Rt(w)

1

3

so thatRt(w) (1 + )Rt(w). By similar reasoning,Rt(w) = Rt(w) + Rt(w) Rt(w)

Rt(w) |Rt(wm)| |w w| (for somewm between w and w ) Rt(w)

1

3

Rt(w)

1

3

Rt(w)(1 + )1,

completing the proof.We use a Metropolis algorithm [15] to sample from t. We generate a random

walk on according to a Markov chain whose stationary distribution is t. Begin byselecting a point w0 according to either t1 or t2;7 Remark 8 explains howto do this.

7Ideally, we would like to begin with a point selected according to t1, but, as discussed inRemark 8, this is not always possible.


16/22


Remark 8. We can select a point according to t1 by saving our sam-ples that were generated at time t1. By Lemma 4.3, we would have generatedNt1 8m2t84 log 2mt

2

samples at time t 1, which is not enough to generate the

Nt

8m2(t+1)8

4 log 2m(t+1)

2

samples necessary at time t. Instead, we can save

samples that were generated at times t 1 and t 2. For sufficiently large t, NtNt1 + Nt2 and our initial point w0 would be picked according to either t1 ort2. As we see in the proof of Lemma 4.10, this distinction is not important.

If wis the position of our random walk at time 0, we pick its position attime + 1 as follows. Note thatw has 2(k 1) neighbors, two along each axis inthe Cartesian product of (k 1)-dimensional spaces. Let w be a neighbor ofw,selected uniformly at random. Ifw , set

w+1 =

w with probabilityp = min(1, Rt(w)Rt(w)),

w with probability 1 p.If w , let w+1 = w. It is well known that the stationary distribution of thisrandom walk is t. We must determine how many steps of the walk are necessary

before the distribution has gotten sufficiently close to stationary. Letpbe the dis-tribution attained aftersteps of the random walk. That is,p(w) is the probabilityof being at w after steps.

Remark 9. A distinction should be made between t and . We use t to referto the time step in our universalization algorithm. We use to refer to sub- timesteps used in the Markov chain to sample fromt. When t is clear from context, wemay drop it from the subscripts in our notation.

Applegate and Kannan [2] show that if the desired distribution t is proportionalto a log-concave function F (i.e., log Fis concave), then the Markov chain is rapidlymixingand reaches its steady state in polynomial time. Frieze and Kannan [9] give animproved upper bound on the mixing time using logarithmic Sobolev inequalities [7].

Theorem 4.5 (Theorem 1 of [9]). Assume the diameter d of W satisfies dt

k and that the target distribution is proportional to a log-concave function.

There is an absolute constant >0 such that

2

w

|(w) p(w)|2

e2t

kd2 log 1

+

M ekd2

2t,(4.5)

where = minw (w), M= maxwp0(w)(w) log

p0(w)(w)

, p0() is the initial distribu-tion on , e =

we (w), and e ={w | Vol(C(w) W) < Vol(C(w))}.

(The e in the subscripts ofe and e stands for edge.)In the random walk described above, ifwis on an edge of , so that it has many

neighbors outside , the walk may get stuck at wfor a long time, as seen in thee term of Theorem 4.5. We must ensure that the random walk has a low probabilityof reaching such edge points. We do this by applying a damping function toRt,which becomes exponentially small near the edges ofW. For 1

i

, 1

j

k ,

and w = (w1, . . . , w) = ((w11, . . . , w1k), . . . , (w1, . . . , wk)) Wletfij(w) = e

min(+wij ,0),(4.6)

where >0 and > 2 are constants that we choose below, and let

Ft(w) = Rt(w)

i=1

kj=1

fij(w).


17/22


Lemma 4.6. Ft is log-concave if and only ifRt is log-concave.8Proof. This follows from the fact that log-concave functions are closed under mul-

tiplication and the fact that log fij(w) = min(+ wij , 0), which is concave.Choose = 1

kt(

t2 ), where t(

) is defined in Lemma 4.4 and t is defined in

(4.2). LetF Ft be the probability measure proportional toFt. We need to showthat, for our purposes, sampling fromFis not much different than sampling fromt.By Lemma 4.2, we can do this by showing that

W

|t(w) F(w)|dw t, which wedo in Lemma 4.7.

Remark 10. Before continuing, we show how W can be scaled, which will beuseful in future proofs. Take p= ( 1

k, . . . , 1

k) Wk; given (1, 1), let

w() = (1 + )(w p) + p,

and let

W()k = {w() | w Wk}

be a scaled version ofWk about p, where the scaling factor is 1 +. To extend thisscaling to W = Wk, given w= (w1, . . . , w) W, let w() = (w()1 , . . . , w() ) and

W() = {w() | w W}.

A fact we use is that for 1 i , 1 j k, and w= (w1, . . . , w) W,

|w()ij wij| =(1 + )

wij1

k

+

1

k wij

||.Lemma 4.7.

W

|t(w) F(w)|dw t.Proof. Let W = W(k) be the scaled-in version ofW, as defined in Remark 10.

By Lemma 4.4, since|wij wij | k = t(t

2) for all i and j ,Rt(w)

1

1+t2 Rt(

w)and

W

Rt(w)dw 11 + t2

W

Rt(w)dw.(4.7)

Let Weq ={w W | Ft(w) =Rt(w)} be the subset ofW where Ft() andRt()are equal; W Weq since, by construction of w, wij for all i and j. LetW+ = {w W | F(w) t(w)} be the subset ofW where F() is at least t() andlet W= WW+. We bound

W

|F(w) t(w)|dw=W+

(F(w) t(w))dw +W

(t(w) F(w))dw

by boundingW

(t F), which also gives a bound forW+

(F t), sinceW+

(F t) =

1 W

F

1 W

t

=

W

(t F).

8We characterize investment strategies for which Rt is log-concave in Theorem 4.14.


18/22


Since Ft Rt,W

FtW

Rt and F(w) = Ft(w)WFt

Rt(w)WRt = t(w) for w Weq;

thus W Weq W+ and WWW. We have

W

(t(w) F(w))dw WW

t(w)dw=WW Rt(w)dw

WRt(w)dw = 1

W Rt(w)dwW

Rt(w)dw 1 1

1 + t2 t

2,

where the second-to-last inequality follows from (4.7). This completes the proof.

Henceforth, we are concerned with sampling from W with probability proportionaltoFt(). We use the Metropolis algorithm described above, replacingRt() withFt();we must refine our grid spacing t so that (4.4) is satisfied by Ft; let

t be the new

grid spacing.

Lemma 4.8. Suppose that the conditions of Lemma 4.4 are satisfied. Given >0, lett() =

t =

3cmt4k = t(

), where appears in (4.6). For w, w

Wsuch that

|wij

wij |

t

() for all 1

i

and 1

j

k, (1 + )1Ft(w)Ft(w) (1 + )Ft(w).

Proof. By Lemma 4.4,Rt(w) andRt(w) differ by at most a factor of 1 + .For each i and j, fij(w) and fij(w

) differ by at most a factor ofe

t(), and hencei=1

kj=1 fij(w) and

i=1

kj=1 fij(w

) differ by at most a factor of ek

t() =

e

3cmt4 . Hence, for 2 and sufficiently larget, Ft(w) and Ft(w) differ by at mosta factor of 1 + .

We are now ready to use Theorem 4.5 to select so that the resulting distributionpsatisfies (4.2) (Theorem 4.12) and (4.3) (Theorem 4.13), withpin place of t andFt in place ofRt. We begin with some preliminary lemmas.

Lemma 4.9. There is a constant > 0 such that log 1

k+ k log t

+t log 2mt

2

, where is defined in Remark 6.

Proof. Take such that the number of points in is at most (

t )(k

1)

. Forw1, w2 , the ratio of single-day returns on day t using w1 and w2 is

St(w1) xtSt(w2) xt

2m(t+ 1)2,

by Remark 6 and Lemma 3.4. The ratio of the cumulative returns up to day t is

Rt(w1)Rt(w2)

2mt2

t,

and thus Rt(w)

wRt(w) ( t

)(k1)

2mt2

t. Factoring in the maximum dampening

effect of the fij , ek(

t)(k1)

2mt2

t and log 1 k + k log t +

t log 2mt2

.

Lemma 4.10. M 4 2m(t+1)2

2log 2m(t+1)

2

.

Proof. As stated in Remark 8, the initial distribution is either p0 = t1 or t2.It turns out that the worst case happens when p0 = t2. For all w , t2(w)t2(w) 2by (4.3) and the following:


19/22


t2(w)t(w)

= Ft2(w)w Ft2(w)

w Ft(w)Ft(w)

Ft2(w)Ft(w)

Ft(w)

Ft2(w) by Lemma 3.4, where w

= arg maxw

Ft(w)

Ft2(w)

=Rt2(w)

Rt(w) Rt(w)Rt2(w) (since the{fij()}i,j remain constant with time)

=(St(w

) xt)(St1(w) xt1)(St(w) xt)(St1(w) xt1)

2m(t + 1)2

2,

where the final inequality follows from the discussion in the proof of Lemma 4.9. This

proves the result since t2(w)t(w)

= t2(w)t2(w)

t2(w)t(w)

.

Lemma 4.11. e (1 + )4(1 + t2)e, where appears in the definition oftin Lemma 4.8, t appears in (4.2), and and appear in (4.6).

Proof. Extend our t-hypercube partition ofW to the hyperplane containing W,and let be the set of centers of the hypercubes in this extended partition. ForK

R

k, let Kbe the set of grid points w

such that C(w)

K

=

, so that

= W. By Lemma 4.8, for K W,

1

1 +

wK

Ft(w)Vol(C(w) K) K

Ft(w)dw (1 + )

wKFt(w)Vol(C(w) K).

(4.8)

Using the notation of Lemma 4.7, let W = W(k) be a scaled-in version ofW; weshowed in Lemma 4.7 that for w W, Ft(w) = Rt(w), and that

W

Ft(w)dw=

W

Rt(w)dw 11 + t2

W

Rt(w)dw.(4.9)

Let W = W(

t()) be a scaled-out version ofW, and extend the domains ofFt() and

Rt(

) to W by definingFt(w) = Ft(w) and

Rt(w

) =

Rt(w

) for w

W

W,

where w is the point where the line between w andp = (p, . . . , p) W intersectsthe boundary ofW. By Lemma 4.8 and the construction of the extension ofRt,Rt(w) (1 + )Rt(w) and

W

Rt(w)dw (1 + )W

Rt(w)dw.(4.10)

By construction ofW, C(w) W for w e; from the definition ofFt and thechoice oft, Ft(w) (1 + )eRt(w) for w e. Using these facts,

e =

we Ft(w)w Ft(w)

(k1)t

(k1)t

(1 + )e

we Rt(w)w Ft(w)

(1 + )ewW Vol(C(w) W

)Rt(w)wW Vol(C(w) W)Ft(w) since Vol(C(w)) =

(k1)t

(1 + )e (1 + )W

Rt(w)dw1

(1+)

W

Ft(w)dw(by (4.8))

(1 + )3eW

Rt(w)dwW

Ft(w)dw (1 + )4

1 +

t2

e

(by (4.9) and (4.10)).


20/22


Remark 11. We simplify notation below by usingO() notation, which ignoreslogarithmic and constant terms. For our purposes, f() =O(g()) if there exists aconstantC 0 such thatf() =O(g()logC(kmt/)). The values derived above inthis notation aret =

O(

2

mt4), t =

O(

mt4k), =

O(

2

m2t8k2), t =

O( mt4k ),

log 1 = O(k+ t), M= O(m2

t4

2 ), ande= O(e).Theorem 4.12. Letting =O( 1

) =O(m2t8k2

2 ), the random walk reaches a

distribution that satisfies (4.2) after= O(k76m6t2424

) steps.Proof. We show how to bound the right-hand side of (4.5), where the grid spacing

t has been replaced by t. The second term,

Mekd2

t2 , can be made exponentially

small in by choosing =O( 1

). The value of stated in the theorem is large

enough to make the first term, et

2

kd2 log 1

, exponentially small in.Theorem 4.13. Suppose that the distributionp0 obtained after0 steps satisfies

w|(w) p0(w)| t.

After0 00log 1log 1t log 1

= O(0(k + t))steps, the resulting distributionp0satisfies

maxw

p0(w)

(w) 1 1,

which implies (4.3).

Proof. Letd() = 12

w |(w) p(w)| and d() = maxw p(w)(w) 1 so thatd(0) 12t. Aldous and Fill [1, (5) and (6)] prove that if 1log 1 , then d() 1,where = minw t(w) is as defined in the statement of Theorem 4.5 and is thesecond-largest eigenvalue of the steady-state transition matrix P oft.

To prove the bound on 0, we show that 0log 1

log 1

t0 = 1

log 1

+log 1

t0 .

We do this by appealing to a result from Sinclair [17, Proposition 1(i)], which statesthat

0log 1

+ log 1

t

1 .9

Solving for yields the bound for 0. TheO() bound comes from the fact that= O(1) and that log 1

tand log 1

are low-order terms relative to the0 obtained

in Theorem 4.12.

4.3. Application to investment strategies. The efficient sampling techniquesof this section are applicable to investment strategies Swhose return functions Rn(S())are log-concave. Theorem 4.14 and Corollary 4.15 characterize such functions.

Theorem 4.14. Given investment strategy S, suppose that S is linear on w,

or, more formally, that for all parameters wi and wj, 2Swiwj

= 0. ThenRt(w) =Rt(S(w)) is log-concave.

9Strictly speaking, this result pertains to max, the second-largest absolute value of the eigenval-ues ofP, but as Sinclair discusses [17, p. 355], the smallest eigenvalue is unimportant, as P can bemodified so that all eigenvalues are positive without affecting mixing times beyond a constant factor.


21/22


Proof. Let rt(w) =St(w) xt, so thatRn(w) =n1

t=0 rt(w). Since log-concavefunctions are closed under multiplication, we need only show thatrt(w) is log-concave.

The gradient vector of log rt(w) has ith element log rt(w)

wi= 1

rt(w)rt(w)wi

, and the

matrix of second derivatives has (i, j)th element

1rt(w)2

rt(w)

wi

rt(w)

wj+

1

rt(w)

2rt(w)

wiwj= 1

rt(w)2rt(w)

wi

rt(w)

wj,

since 2rt(w)

wiwj=m

=12St(w)wiwj

xt = 0 by assumption. The matrix of second deriva-tives is negative semidefinite, implying that log rt(w) is a concave function.

Corollary 4.15. Universalizations of the following investment strategies can becomputed using the sampling techniques of this section:

1. the trading strategies MA[k] and SR[k] with long/short allocation functionsg(x) andhp(x, y), respectively, and

2. the portfolio strategies CRP and CRP-S.Proof. The result follows from a straightforward differentiation of the investment

descriptions of these strategies.

5. Further research. We have introduced in this paper a general frameworkfor universalizing parameterized investment strategies. It would be interesting torelax the condition of Theorem 3.3 and generalize the theorem. Likewise, it would beinteresting to see whether the proof of Theorem 3.3 can be optimized so that existinguniversal portfolio proofs for CRP [3, 5, 6] are a special case of Theorem 3.3. Theseproofs not only prove thatLn(U(CRP)) converges toLn(CRP(wn)), but also provea bound on the rate of convergence,

Rn(CRP(wn))Rn(U(CRP))

n + m 1

m 1

(n + 1)m1.

It would also be interesting to study other trading and portfolio strategies that fitinto our universalization framework and to see how our universalization algorithms

perform in empirical tests.

Acknowledgments. We would like to thank two anonymous reviewers for theircomments, which significantly improved the presentation of our work.

REFERENCES

[1] D. Aldous and J. A. Fill, AdvancedL2 techniques for bounding mixing times, in ReversibleMarkov Chains and Random Walks on Graphs, unpublished monograph, 1999; availableonline at stat-www.berkeley.edu/users/aldous/book.html.

[2] D. Applegate and R. Kannan, Sampling and integration of near log-concave functions, inProceedings of the 23rd Annual ACM Symposium on Theory of Computing, New Orleans,LA, 1991, pp. 156163.

[3] A. Blum and A. Kalai, Universal portfolios with and without transaction costs, MachineLearning, 35 (1999), pp. 193205.

[4] W. Brock, J. Lakonishok, and B. LeBaron,Simple technical trading rules and the stochasticproperties of stock returns, J. Finance, 47 (1992), pp. 17311764.

[5] T. M. Cover, Universal portfolios, Math. Finance, 1 (1991), pp. 129.[6] T. M. Cover and E. Ordentlich, Universal portfolios with side information, IEEE Trans.

Inform. Theory, 42 (1996), pp. 348363.[7] P. Diaconis and L. Saloff-Coste, Logorathmic Sobolev inequalities for finite Markov chains,

Ann. Appl. Probab., 6 (1996), pp. 695750.[8] G. B. Folland, Real Analysis: Modern Techniques and Their Applications, John Wiley &

Sons, New York, 1984.


22/22


[9] A. Frieze and R. Kannan, Log-Sobolev inequalities and sampling from log-concave distribu-tions, Ann. Appl. Probab., 9 (1999), pp. 1426.

[10] H. M. Gartley, Profits in the Stock Market, Lambert Gann Publishing Company, Pomeroy,WA, 1935.

[11] D. P. Helmbold, R. E. Schapire, Y. Singer, and M. K. Warmuth , On-line portfolio selec-

tion using multiplicative updates, Math. Finance, 8 (1998), pp. 325347.[12] M. Herbster and M. K. Warmuth, Tracking the best expert, Machine Learning, 32 (1998),

pp. 151178.[13] W. Hoeffding,Probability inequalities for sums of bounded random variables, J. Amer. Statist.

Assoc., 58 (1963), pp. 1330.[14] A. Kalai and S. Vempala, Efficient algorithms for universal portfolios, J. Mach. Learn. Res.,

3 (2002), pp. 423440.[15] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller ,

Equation of state calculation by fast computing machines, J. Chem. Phys., 21 (1953),pp. 10871092.

[16] E. Ordentlich and T. M. Cover, Online portfolio selection, in Proceedings of the 9th AnnualConference on Computational Learning Theory, Desanzano sul Garda, Italy, 1996, ACM,New York, pp. 310313.

[17] A. Sinclair, Improved bounds for mixing rates of Markov chains and multicommodity flow,Combin. Probab. Comput., 1 (1992), pp. 351370.

[18] R. Sullivan, A. Timmermann, and H. White, Data-snooping, technical trading rules and the

bootstrap, J. Finance, 54 (1999), pp. 16471692.[19] V. Vovk, Derandomizing stochastic prediction strategies, Machine Learning, 35 (1999),pp. 247282.

[20] V. Vovk, Competitive on-line statistics, Internat. Statist. Rev., 69 (2001), pp. 213248.[21] V. G. Vovk and C. J. H. C. Watkins , Universal portfolio selection, in Proceedings of the

11th Conference on Computational Learning Theory, Madison, WI, 1998, pp. 1223.[22] R. Wyckoff, Studies in Tape Reading, Fraser Publishing Company, Burlington, VT, 1910.

fast universilization of investmnet strategies

Documents