stock price prediction & optimal portfolio selection using

50
Stock Price Prediction & Optimal Portfolio Selection using Signal Processing A Project Report submitted by NISHIDH BIYANI in partial fulfillment of the requirements for the award of the degree of DUAL DEGREE IN ELECTRICAL ENGINEERING DEPARTMENT OF ELECTRICAL ENGINEERING INDIAN INSTITUTE OF TECHNOLOGY MADRAS. MAY 2013

Upload: others

Post on 15-Feb-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Stock Price Prediction & Optimal Portfolio Selection using

Signal Processing

A Project Report

submitted by

NISHIDH BIYANI

in partial fulfillment of the requirements

for the award of the degree of

DUAL DEGREE IN ELECTRICAL ENGINEERING

DEPARTMENT OF ELECTRICAL ENGINEERING

INDIAN INSTITUTE OF TECHNOLOGY MADRAS.

MAY 2013

THESIS CERTIFICATE

This is to certify that the thesis titled STOCK PRICE PREDICTION & OPTIMAL

PORTFOLIO SELECTION USING SIGNAL PROCESSING, submitted by NISHIDH

BIYANI, to the Indian Institute of Technology Madras, Chennai for the award of the degree of

Masters in Electrical Engineering, is a bona fide record of the research work done by him

under our supervision. The contents of this thesis, in full or in parts, have not been submitted to

any other Institute or University for the award of any degree or diploma.

Dr. Bharath Bhikkaji

Research Guide

Professor

Department of Electrical Engineering Place: Chennai

IIT-Madras, 600 036

Date: 7th

May 2013

i

ACKNOWLEDGEMENTS

It is indeed a great privilege for me to present this work, I take this opportunity to thank

all those who made this endeavor a success.

Foremost, I would like to express my gratitude to my advisor Prof. Bharath Bhikkaji for

the continuous support, innumerable suggestions, good guidance and cooperation, right from the

beginning and during the project. I am fortunate to work under him.

I thank the team of Santa Fe Research- Arun, Jerin and Shravan who helped and

contributed great ideas and advices. I also thank Dr. Girish Ganesan for his support and

guidance.

Lastly, I thank my parents for giving constant support and encouragement to pursue my

studies.

ii

ABSTRACT

KEYWORDS: HMM, Gaussian Mixture Models, CRP, Universal Portfolio Theory,

Commission, Side Information.

Stock market analysis and prediction is one of the interesting areas in which past data

could be used to anticipate and predict information about future. Technically speaking, this area

is of high importance for professionals in the industry of finance and stock exchange as they can

lead and direct future trends or manage crises over time. In this assignment, we try to take

advantage of Hidden Markov Models (HMM) to address some interesting problems regarding

stock market analysis. Specifically, stock price prediction is done in this assignment. First, a set

of past data is loaded and analyzed; then, an HMM is modeled and trained for the problem

model. Afterwards, similar past data are distinguished and used to predict future stock market

values. Stock market data that are used in this assignment are the data from NSE (National Stock

Exchange). Basically, each stock market data is a quadruple (open; low; high; close) carrying the

meaning that each day the stock market starts its activity, it starts with some opening after which

during the day it reaches its highest or drops down to its lowest of the day and then it will stop

with a close value. Such data seems to be very sensitive for stock traders and business

shareholders to predict future stock trends. In this assignment, we try to estimate the future day's

close values as precisely as possible.

This project also deals with the portfolio theory developed by Thomas M. Cover. Our

main goal is to understand the link of information theory to the theory of optimal investments in

a stock market. Firstly, we analyze the Constant Rebalanced Portfolio Theory (CRP) given by

Cover. A CRP is an investment strategy which keeps the same distribution of wealth among a set

of stocks from period to period. That is, the proportion of total wealth in a given stock is the

same at the beginning of each period. It is proven that CRP is the best portfolio strategy and no

other portfolio strategy can outperform it. But there are some practical issues in implementing

CRP in real stock market. To overcome these issues, Cover gave Universal Portfolio Theory

which asymptotically performs same as CRP and is more practically feasible. We analyze the

Universal Portfolio Theory and simulate it on a two stock portfolio.

iii

The Universal Portfolio Theory does not take into account transaction fees, which

could be the bane for an investor as according to the Universal Portfolio Theory portfolio has to

be updated daily. We provide a simple analysis which naturally extends to the case of a fixed

percentage transaction cost (commission). In addition, we present a simple implementation on

real stock market. Inclusion of transaction costs in the Universal Portfolio Theory makes it the

best and practically feasible portfolio strategy on paper. But the market varies in such a manner

that there is possibility of outperforming this strategy by having insightful information about the

market. Portfolio management based on instincts and extra information can outperform this, but

algorithmically it is the best strategy. It requires no extra information and no expertise on stock

market. Any person with no expertise in stock market can use this strategy and get way better

returns than any other strategies available to him.

Lastly, we tried to incorporate extra information (side information) or credible

expert opinion into our algorithm to improve our returns. We worked on state constant

rebalanced portfolio and clubbed the same concept with the universal portfolio to use the side

information. In our case we looked into stocks listed on both NYSE and NSE. And used the

performance of the stocks on NYSE as the side information and updated the portfolio in NSE

accordingly. This surpasses the returns of the universal portfolio without side information. We

also looked into the upper bound on the increment in the returns due to the side information. The

increment in the returns resulting from the side information is upper-bounded by the mutual

information between the stock vector and the side information.

iv

TABLE OF CONTENTS

ACKNOWLEDGEMENTS i

ABSTRACT ii

LIST OF FIGURES vi

ABBREVIATIONS vii

CHAPTER 1. STOCK PRICE PREDICTION USING HMM 1

1.1 Introduction to Markov Models …………………………………

1.2 Hidden Markov Models…………………………………………

1.2.1 Three fundamental problems for HMMs………

1.2.2 The Forward Algorithm………………………

1.2.3 The Backward Algorithm………………………

1.2.4 Baum Welch Algorithm………………………

1.3 HMM as a Predictor………………………………………………

1.4 Implementation…………………………………………………

1.4.1 HMM Parameters…………………………….

1.4.2 Initialization………………………………….

1.4.3 Prediction…………………………………….

1.5 Results………………………………………………………….

CHAPTER 2. LOG OPTIMAL PORTFOLIO 15

2.1 Stock market, Portfolio and Wealth………………………………

2.2 Growth Rate and Log-optimal Portfolio……………………….

2.3 Kuhn-Tucker Characterization of the Log-optimal Portfolio………

2.4 Asymptotic Optimality of the Log-optimal Portfolio………………

v

CHAPTER 3. UNIVERSAL PORTFOLIO THEORY 26

3.1 Universal Portfolio Theory without Commission………………

3.2 Universal Portfolio with Commission ………………………….

3.3 Implementation and Results…………………………………….

CHAPTER 4. SIDE INFORMATION 32

4.1 Side Information and the Doubling Rate………………………

4.2 State Constant Rebalanced Portfolios……………………………

4.3 Example explaining impact of Side Information………………….

4.4 Universal Portfolio with Side Information………………………

4.5 Implementation on real Stock market……………………………

REFERENCES 40

vi

LIST OF FIGURES

1.1. The Forward Procedure…………………………………………………………………. 4

1.2. The Backward Procedure……………………………………………………………… 5

1.3. Plot comparing the Predicted Closing Price to the Actual Closing Price for IBM…… 12

1.4. Plot comparing the Predicted Closing Price to the Actual Closing Price for Dell…… 12

1.5. Plot comparing the Predicted Closing Price to the Actual Closing Price for Southwest

Airlines………………………………………………………………………………… 13

1.6. Plot comparing the Predicted Closing Price to the Actual Closing Price for Ryanair

Holdings……………………………………………………………………………… 13

1.7. Plot comparing the Predicted Closing Price to the Actual Closing Price for Apple Inc. 14

2.1. Sharpe Markowitz Theory 16

3.1. Comparing the results of Universal Portfolio Strategy with other strategies…………… 30

3.2. Effect of Transaction Cost on the returns of Universal Portfolio Theory……………… 31

4.1. Comparison of Universal Portfolio with and without Side Information……………… 39

vii

ABBREVIATIONS

HMM Hidden Markov Model

CRP Constant Rebalanced Portfolio (Log-Optimal Portfolio)

i.i.d. Independent and Identically distributed

NSE National Stock Exchange

NYSE New York Stock Exchange

MAP Maximum a Posteriori

MAPE Mean Absolute Percentage Error

1

CHAPTER 1

STOCK PRICE PREDICTION USING HMM

Stock market analysis and prediction is one of the interesting areas in which past data

could be used to anticipate and predict information about future. Technically speaking, this area

is of high importance for professionals in the industry of finance and stock exchange as they can

lead and direct future trends or manage crises over time. In this assignment, we try to take

advantage of Hidden Markov Models (HMM) to address some interesting problems regarding

stock market analysis. Specifically, stock price prediction is done in this assignment. Hidden

Markov models are especially known for their application in temporal pattern recognition such

as speech, handwriting, gesture recognition. A hidden Markov model can be considered a

generalization of a mixture model where the hidden variables, which control the mixture

component to be selected for each observation, are related through a Markov process rather than

independent of each other.

1.1 Introduction to Markov Models

Markov models are used to train and recognize sequential data, such as speech utterances,

temperature variations, biological sequences, and other sequence data. In a Markov model, each

observation in the data sequence depends on previous elements in the sequence. Consider a

system where there are a set of distinct states { }. At each discrete time slot t, the

system moves to one of the states according to a set of state transition probabilities . We denote

the state at time t as . Since the state transition is independent of time, we can have the

following state transition matrix :

is state transition probability, hence:

2

Also we need to know the probability to start from a certain state, the initial state distribution:

Therefore, ∑ .

1.2 Hidden Markov Models

A HMM is a doubly stochastic process with an underlying stochastic process that is not

observable (it is hidden), but can only be observed through another set of stochastic processes

that produce the sequence of observed symbols. In an HMM, one does not know anything about

what generates the observation sequence. The number of states, the transition probabilities, and

from which state observation is generated all are unknown. Each state of the HMM is associated

with a probabilistic function. At time , an observation is generated by a probabilistic

function , which is associated with state , with the probability:

A HMM is composed of five tuple:

{ } is the set of states. The state at time is denoted by .

number of distinct observation symbols per state (observation symbols correspond

to the physical output of the system being modeled)

Initial state distribution { } is defined as

State transition probability distribution { } .

Observation symbol probability distribution . The probabilistic function for

each state is :

3

The overall HMM model is denoted by

After modeling a problem as an HMM, and assuming that some set of data was generated

by the HMM, we are able to calculate the probabilities of the observation sequence and the

probable underlying state sequences. Also we can train the model parameters based on the

observed data and get a more accurate model. Then use the trained model to predict unseen data.

1.2.1 Three Fundamental Problems for HMMs

1. Given the model how do we compute , the probability of

occurrence of the observation sequence .

2. Given the observation sequence and a model , how do we choose a state sequence

that best explains the observations.

3. Given the observation sequence and a space of models found by varying the model

parameters , and , how do we find the model that best explains the observed

data.

There are established algorithms to solve the above questions [3]. In our task we have

used the forward-backward algorithm to compute the and Baum-Welch algorithm to

train the HMM.

1.2.2 The Forward Algorithm

The forward variable is defined as:

stores the total probability of ending up in state at time , given the observation sequence

.

4

Fig.1.1 – The Forward Procedure

The forward procedure:

1. Initialization

2. Induction

[∑

]

3. Update time by setting ; Return to step 2 if ; Otherwise, terminate

algorithm

4. Termination

5

1.2.3 The Backward Algorithm

The backward procedure calculates the probability of the partial observation sequence

from to the end, given the model and state at time . The backward variable is

defined as:

The backward procedure:

1. Initialization

2. Induction

3. Update time by setting ; Return to step 2 if ; Otherwise terminate the

algorithm

4. Termination

Fig.1.2 – The Backward Procedure

6

1.2.4 Baum Welch Algorithm

The last and most difficult problem about HMMs is that of parameter estimation. Given

an observation sequence, we want to find the model parameters that best explains

the observation sequence. The problem can be reformulated as find the parameters that maximize

the following probability:

There is no known analytic method to choose to maximize but we can use a

local maximization algorithm to find the highest probability. This algorithm is also called the

Baum Welch. It is a special case of the Expectation Maximization method [1]. It works

iteratively to improve the likelihood of . This iterative process is called the training of the

model. The Baum-Welch algorithm is numerically stable with the likelihood non-decreasing of

each iteration. It has linear convergence to a local optima.

To work out the optimal model iteratively, we will need to define a few

intermediate variables. Define as follows:

∑ ∑

This is the probability of being at state at time , and at state at time , given the model

and the observation .

7

Then define . This is the probability of being at state at time , given the observation and

the model :

The above equation holds because is the expected number of transition from state

and is the expected number of transitions from state to .

Given the above definitions we begin with an initial model and run the training data

through the current model to estimate the expectations of each model parameter. Then we can

change the model to maximize the values of the paths that are used. By repeating this process we

hope to converge on the optimal values for the model parameters.

The re-estimation formulas of the model are:

8

1.3 HMM as a Predictor

We use a continuous Hidden Markov Model to model the stock data as a time series. An

HMM can be written as . Where is the transition matrix whose elements give the

probability of a transition from one state to another, is the emission matrix giving the

probability of observing when in state , and gives the initial probabilities of the states

at . Further for a continuous HMM the emission probabilities are modeled as Gaussian

Mixture Models (GMMs):

∑ )

where:

is the number of Gaussian Mixture components.

is the weight of the mixture component in state .

is the mean vector for the component in the state.

is the probability of observing vector in the multi-dimensional

Gaussian distribution.

is the Covariance matrix for the mixture component in state

Training of the above HMM from given sequences of observations is done using the

Baum-Welch algorithm which uses Expectation-Maximization (EM) to arrive at the optimal

parameters for the HMM. In our model the observations are the daily stock data in the form of

the 4-dimensional vector,

(

)

9

Here open is the day opening value, close is the day closing value, high is the day high, and low

is the day low. We use fractional changes along to model the variation in stock data which

remains constant over the years [4].

Once the model is trained, testing is done using an approximate Maximum a Posteriori

(MAP) approach. We assume a latency of days while forecasting future stock values. Hence,

the problem becomes as follows - given the

HMM model and the stock values for days along with the stock open value

for the day, we need to compute the close value for the day. This is

equivalent to estimating the fractional change

for the day. For this, we

compute the MAP estimate of the observation vector .

Let be the MAP estimate of the observation on the day, given the values

of the first days. Then,

The observation vector is varied over all possible values. Since the denominator is

constant with respect to , the MAP estimate becomes,

The joint probability value can be computed using the forward-

backward algorithm for HMMs. In practice, we compute the probability over a discrete set of

possible values of and find the maximum. The computational complexity of the forward-

backward algorithm for finding the likelihood of a given observation is , where is the

number of states in the HMM and is the latency. This procedure is repeated over the discrete

set of possible values of . In our case and there are possible

10

values of . The closing value of a particular day can be computed by using the day opening

value and the predicted fractional change for that day.

1.4 Implementation

1.4.1 HMM Parameters

The HMM parameters are set to the following values:

Number of underlying Hidden States

Number of mixture components for each state

Dimension of observations

Ergodic HMM (all transitions are possible).

Latency days.

These were obtained by varying the parameters over suitable ranges and choosing the values

which give the minimum error between forecasted and actual stock values [2].

1.4.2 Initialization

For initialization of the model parameters the prior probabilities and transition

probabilities are assumed to be uniform across all states. To initialize the mean, variance and

weights of the Gaussian mixture components we use k-means algorithm. Each cluster found from

k-means is assumed to be a separate mixture component from which the mean and variance are

computed. Weights of the components are assumed to be weights of the clusters, which are

divided equally between the states to obtain the initial emission probabilities.

1.4.3 Prediction

To compute the MAP estimate , we compute the probability values over a range of

possible values of the tuple

and find the maximum. A higher

precision is used for the fractional change values since these are ultimately used for the stock

prediction. The range of values is listed in table below.

11

Observation Range Number of Points

Min. Max.

-0.1 0.1 50

0 0.1 10

0 0.1 10

1.5 Results

The metric used to evaluate the performance of the algorithm is Mean Absolute

Percentage Error (MAPE) in accuracy. MAPE is the average absolute error between the actual

stock values and the predicted stock values in percentage.

where is the actual stock value, is the predicted stock value on day and is the number of

days for which the data is tested. The table below lists the MAPE values for the four stocks using

our developed algorithm.

Stock Name MAPE

IBM 1.067%

Dell 1.654%

Southwest Airlines 1.857%

Ryanair Holdings 1.944%

Apple Inc. 6.057%

12

We took 1 year data for training the model and predicted the closing price for next 90 days. The

following plots compare the predicted closing price for the above mentioned 5 stocks with the

actual closing price for that day.

Fig.1.3 – Plot comparing the Predicted Closing Price to the Actual Closing Price for IBM

Fig.1.4 – Plot comparing the Predicted Closing Price to the Actual Closing Price for Dell

75

80

85

90

95

100

105

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85

Sto

ck P

rice

fo

r IB

M

Prediction Day

Stock Price Prediction for IBM

Actual Closing Price

Predicted Closing Price

0

5

10

15

20

25

30

35

40

45

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85

Sto

ck P

rice

fo

r D

ell

Prediction Day

Stock Price Prediction for Dell

Actual Closing Price

Predicted Closing Price

13

Fig.1.5 – Plot comparing the Predicted Closing Price to the Actual Closing Price for Southwest

Airlines

Fig.1.6 – Plot comparing the Predicted Closing Price to the Actual Closing Price for Ryanair

Holdings

0

2

4

6

8

10

12

14

16

18

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81Sto

ck P

rice

fo

r So

uw

est

Air

line

s

Prediction Day

Stock Price Prediction for Southwest Airlines

Actual Closing Price

Predicted Closing Price

0

10

20

30

40

50

60

1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70Sto

ck P

rice

fo

r R

yan

air

Ho

ldin

gs

Prediction Day

Stock Price Prediction for Ryanair Holdings

Actual Closing Price

Predicted Closing Price

14

Fig.1.7 – Plot comparing the Predicted Closing Price to the Actual Closing Price for Apple Inc.

For above results we took non-volatile stocks in a very stable market situation. The

problem with the proposed model is that it does not take into account changes in the market due

to unpredictable and unquantifiable factors such as, policies, market conditions etc. This model

entirely depends on the past training data and it is not always necessary that stock prices follow

the same trend as it did in past, which is quite visible in the stock prediction plot for Apple Inc.

For other 4 stocks it approximately predicted the stock price, but for Apple it did not. Stock Price

is not only a function of past prices in real market.

This is a very basic model which approaches the problem of stock price prediction using

HMM. The model can be improved by giving some meaning to the states defined. To give

meaning to state one should analyze the stock market and try to define the states in a different

manner.

In the current approach, we also assumed that the model for one particular stock is

independent of the other stocks in the market, however in reality these stocks are heavily

correlated to each other and, to some extent, to stocks in other markets too. As a future work, it

might be intuitive to try and build a model which takes into consideration these correlations.

Also, currently the data is quantized to form observation vectors for a full day. A performance

improvement might be achieved by removing this quantization and instead taking the full range

of minute-by-minute or hour-by-hour stock values.

0

10

20

30

40

50

60

70

80

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85Sto

ck P

rice

Pre

dic

tio

n f

or

Ap

ple

Inc.

Prediction Day

Stock Price Prediction for Apple Inc.

Actual Closing Price

Predicted Closing Price

15

CHAPTER 2

LOG-OPTIMAL PORTFOLIO THEORY

The duality between the growth rate of wealth in the stock market and the entropy rate of

the market is striking. In particular, we shall find the competitively optimal and growth rate

optimal portfolio strategies. They are the same, just as the Shannon code is optimal both

competitively and in expected value in data compression. We shall also find the asymptotic

doubling rate for an ergodic stock market process. I would like to remark that the portfolio

theory based on Information Theory is completely adapted from the work done in this field by

Thomas M. Cover and Joy A. Thomas [10].

2.1 Stock Market, Portfolio and Wealth

Definition 2.1 A stock market is a random return vector,

where, is the number of stocks in the stock market, is the price relative or return of stock

i.e., the ratio of the price at the end of the day to the price at the beginning of the day and

is the joint distribution of the vector of price relatives.

Definition 2.2 A portfolio is a vector,

16

Where, is the portfolio weights i.e., the fraction of the investor's capital in stock .

Definition 2.3 Let be a portfolio and a stock market, the resulting random wealth (relative)

at the end of the day is

We wish to maximize in some sense. But is a random variable, so there is controversy

over the choice of the best distribution for S. The standard theory of stock market investment is

based on the consideration of the first and second moments of S. The objective is to maximize

the expected value of S, subject to a constraint on the variance. Since it is easy to calculate these

moments, the theory is simpler than the theory that deals with the entire distribution of S. The

mean-variance approach is the basis of the Sharpe-Markowitz theory of investment in the stock

market and is used by business analysts and others [11]. The figure 2.1 illustrates the set of

achievable mean-variance pairs using various portfolios. The set of portfolios on the boundary of

this region corresponds to the un-dominated portfolios: these are the portfolios which have the

highest mean for a given variance. This boundary is called the efficient frontier, and if one is

interested only in mean and variance, then one should operate along this boundary.

Fig.2.1 – Sharpe Markowitz Theory

17

Normally the theory is simplified with the introduction of a risk-free asset, e.g., cash or

Treasury bonds, which provide a fixed interest rate with variance 0. This stock corresponds to a

point on the Y axis in the figure. By combining the risk-free asset with various stocks, one

obtains all points below the tangent from the risk-free asset to the efficient frontier. This line

now becomes part of the efficient frontier. The concept of the efficient frontier also implies that

there is a true value for a stock corresponding to its risk. This theory of stock prices is called the

Capital Assets Pricing Model and is used to decide whether the market price for a stock is too

high or too low.

2.2 Growth Rate and Log-optimal Portfolio

Our objective is to find the largest wealth . To motivate this, we define the growth rate

and the log-optimal portfolio. Looking at the mean of a random variable gives information about

the long term behavior of the sum of i.i.d. versions of the random variable. But in the stock

market, one normally reinvests every day, so that the wealth at the end of n days is the product of

factors, one for each day of the market. The behavior of the product is determined not by the

expected value but by the expected logarithm. This leads us to define the doubling rate as

follows:

Definition 2.4 The growth rate of a portfolio with respect to a stock distribution is the

expected value of the logarithm of wealth,

The reason why we do not define the growth rate as , is the multiplicative growth of

wealth Assume that one invests days. The resulting wealth at the end of days would be

18

According to the strong law of large numbers, the expected value of describe the dominant

behavior of ∏ i.e., ∏

Definition 2.5 The optimal doubling rate is defined as

Where, the maximum is over all possible portfolios ∑

Definition 2.6 A portfolio that achieves the maximum of is called a log optimal

portfolio.

The definition of doubling rate is justified by the following theorem,

Theorem 2.1 Let be i.i.d. according to . Let

Be the wealth after days using the constant rebalanced portfolio . Then

with probability 1.

Proof:

with probability 1,

by the strong law of large numbers. Hence,

Lemma 2.1 is concave in and linear in . is convex in

Proof: The doubling rate is

19

∫ .

Since the integral is linear in so is .

Since

by the concavity of the logarithm, it follows, by taking expectations, that is concave

in .

Finally, to prove the convexity of as a function of , let and be two

distributions on the stock market and let the corresponding optimal portfolios be

and respectively. Let the log-optimal portfolio corresponding to

be . Then by linearity of with respect to we have

,

Since maximizes and maximizes .

Lemma 2.2 The set of log-optimal portfolios forms a convex set.

Proof: Let and

be any two portfolios in the set of log-optimal portfolios. By the previous

lemma, the convex combination of and

has a doubling rate greater than or equal to the

doubling rate of or

, and hence the convex combination also achieves the maximum

doubling rate. Hence the set of portfolios that achieves the maximum doubling rate. Hence the

set of portfolios that achieves the maximum is the convex.

20

2.3 Kuhn-Tucker Characterization of the Log-optimal Portfolio

The determination that achieves is a problem of maximization of a concave function

over a convex set . The maximum may lie on the boundary.

Theorem 2.2 The log-optimal portfolio for a stock market , i.e., the portfolio that

maximizes the doubling rate , satisfies the following necessary and sufficient conditions:

(

) if

(

) if

Proof: The doubling rate is concave in , where ranges over the simplex

of portfolios. It follows that is log-optimum iff the directional derivative of in the

direction from to any alternative portfolio is non-positive. Thus, letting

for , we have

.

These conditions can be reduced since the one-sided derivative at of is

(

)

(

( (

)))

(

) ,

21

where the interchange of limit and expectation can be justified using the dominated convergence

theorem. Thus

(

)

for all .

If the line segment from to can be extended beyond in the simplex, then the two-

sided derivative at of vanishes and the above equation holds with equality. If the

line segment from to cannot be extended, then we have an inequality in the above equation.

This theorem has a few immediate consequences. One surprising result is expressed in

the following theorem:

Theorem 2.3 Let be the random wealth resulting from the log-optimal portfolio .

Let be the wealth resulting from any other portfolio . Then

(

)

Conversely, if (

) for all portfolios , then * (

)+ for all .

This theorem can be stated more symmetrically as

* (

)+ for all

⇔ (

) , for all .

Proof: From the previous theorem, it follows that for a log-optimal portfolio ,

(

)

for all . Multiplying this equation by and summing over , we have

22

∑ (

) ∑

which is equivalent to

(

) (

) .

The converse follows from Jensen’s inequality, since

* (

)+ (

) .

Thus expected log ratio optimality is equivalent to expected ratio optimality.

Maximizing the expected algorithm was motivated by the asymptotic growth rate. But we

have shown that the log- optimal portfolio, in addition to maximizing the asymptotic growth rate,

also maximizes the wealth relative for one day.

Another consequence of Kuhn-Tucker characterization of the log-optimal portfolio is the

fact that the expected proportion of wealth in each stock under the log-optimal portfolio is

unchanged from day to day. Consider the stocks at the end of first day. The initial allocation of

wealth is . The proportion of the wealth in stock at the end of the day is ⁄ , and the

expected value of this proportion is

(

)

(

)

Hence the expected proportion of wealth in stock at the end of the day is same as the proportion

invested in stock at the beginning of the day.

2.4 Asymptotic Optimality of the Log-optimal Portfolio

In this section we prove that with probability 1, the conditionally log-optimal investor will not do

any worse than any other investor who uses a casual investment strategy.

23

We first consider an i.i.d. stock market, i.e., are i.i.d. according to .

Let

be the wealth after days for an investor who uses portfolio on day Let

be the maximal doubling rate and let be a portfolio that achieves the maximum doubling rate.

We only allow portfolios that depend casually on the past and are independent of the

future values of the stock market.

From the definition of , it immediately follow that the log-optimal portfolio

maximizes the expected log of the final wealth. This is stated in the following lemma.

Lemma 2.3 Let be the wealth after days for the investor using the log-optimal strategy on

i.i.d. stocks, and let be the wealth of any other investor using a causal portfolio strategy

Then

Proof:

24

,

and the maximum is achieved by a constant portfolio strategy .

So far, we have proved two simple consequences of the definition of log optimal

portfolios, i.e., that maximizes the expected log wealth and that the wealth is equal to first

order in the exponent, with high probability.

Now, we will prove that exceeds the wealth (to first order in the exponent) of any

other investor for almost every sequence of outcomes from the stock market.

Theorem 2.4 Let be a sequence of i.i.d stock vectors drawn according to

Let ∏

, where is the log-optimal portfolio, and let ∏

be the

wealth resulting from any other causal portfolio. Then

with probability 1.

Proof: From the Kuhn-Tucker conditions, we have

.

Hence by Markov’s inequality, we have

(

)

.

Hence

.

25

Setting and summing over , we have

.

Then, by the Borel-Cantelli lemma,

infinitely often .

This implies that for almost every sequence from the stock market, there exists an such that for

all

. Thus

with probability .

The theorem proves that the log-optimal portfolio will do as well or better than any other

portfolio to first order in exponent.

The remaining question, how we can compute the log-optimal portfolio, can be solved by

the following algorithm.

Algorithm 2.1 Generate a sequence of portfolio vectors recursively according to

where . The sequence { } remains in the simplex and converges to the log-optimal

portfolio

The biggest drawback of the above algorithm is that it requires the knowledge of the

return distribution which is generally not available. The accuracy of log-optimal portfolio

depends on the accuracy of the distribution, which is very difficult to predict with high accuracy.

Hence, Cover proposed alternative portfolio theory, Universal Portfolio Theory, which gives

asymptotically same results as log-optimal portfolio.

26

CHAPTER 3

UNIVERSAL PORTFOLIO THEORY

In the previous chapter we saw that finding out the best constant ratio for the investment

according to CRP is practically infeasible. In this chapter, we give a simple analysis of the

universal algorithm of Cover, without commission. We also look into a strategy which takes

commission into account and foretells us the feasibility of investment according to Universal

Portfolio Theory [7].

3.1 Universal Portfolio Theory without Commission

Let us first consider an easier question. Suppose we just want a strategy that is

competitive with respect to the best single stock. In other words, we want to maximize the worst-

case ratio of our wealth to that of the best stock. In this case, a good strategy is simply to divide

our money among the stocks and let it sit. We will always have at least times as much

money as the best stock. Note that this deterministic strategy achieves the expected wealth of the

randomized strategy that just places all its money in a random stock. Now consider the problem

of competing with the best CRP. Cover's universal portfolio algorithm is similar to the above. It

splits its money evenly among all CRPs and lets it sit in these CRP strategies. (It does not

transfer money between the strategies.) Likewise, it always achieves the expected wealth of the

randomized strategy which invests all its money in a random CRP.

The proposed universal adaptive portfolio strategy is the performance weighted strategy

specified by

(

)

27

And the integration is over the set of dimensional portfolios

{ ∑

}

The wealth resulting from the universal portfolio is

Thus the initial universal portfolio is uniform over the stocks, and the portfolio at time is

the performance weighted average of all portfolios .

3.2 Universal Portfolio with Commission

One of the biggest limitations of Universal Portfolio is the transaction costs associated

with the rebalancing which can overcome the return. So we develop a strategy which involves

the transaction cost. According to this strategy we compute the returns from Universal portfolio

for different rebalancing period (all ) for some period without actually investing in the stocks.

Through this we’ll be able to find the optimal rebalancing period (as low as possible) for which

the effect of transaction cost is negligible. And we rebalance our portfolio at this calculated time

period according to Universal algorithm.

We consider the case of fixed percentage commission . For simplicity, we will

assume that the commission is charged only for purchases and not for sales. Alternatively, one

can imagine having two commissions, and , for buying and selling. Our theoretical

results will still hold for because one rupee in a single stock can be transferred

to rupees in a different stock. According to our assumption one rupee in a

single stock can be transferred to rupees in a different stock. And

, if .

28

Thus, if we consider the different selling and buying commissions our wealth will be greater than

what we compute using only buying commission . Hence, to avoid complexity

we consider only for buying and .

We now need to specify how an investor, who has a target distribution of wealth, pays for

these transaction costs, each period. In our model, the investor must pay for all transaction costs

by selling stock. Since we are comparing ourselves to the best CRP, it is natural to assume that

the CRP investor makes the optimal trades so as to rebalance his portfolio and pay for his

transaction costs. For example, suppose there is a hefty 40% commission on each purchase. Say,

at the end of a period, an investor has Rs.200 in stock A, and Rs.800 in stock B, and this investor

wishes to rebalance to a (1/2, 1/2) portfolio for the start of the next period. The optimal investor

would first sell Rs.100 of stock B to cover the upcoming transaction costs. He now has Rs.200 in

stock A, Rs.700 in stock B, and Rs.100 in cash. When he trades Rs.250 of stock B for stock A to

get Rs.450 in each stock, he then pays him (40%) Rs.250 = Rs.100 in transaction costs. This is

better than someone who naively tries to rebalance to Rs.500 in each stock and then must sell

Rs.60 worth of each stock to pay his (40%) Rs.300 = Rs.120 in transaction costs, leaving Rs.440

in each stock. For our analysis, we do not need to know the specifics of this optimal re-balancer.

But for the sake of completeness, let's look at how to compute these optimal costs. Say we start

with one rupee distributed according to and we would like to re-balance to , optimally with

respect to commission. If we know the the largest amount that we can have after rebalancing, ,

then it is easy. In order to achieve rupees distributed according to , we sell the difference of

every stock for which and buy the others. Optimality requires that is the solution of

For reasonable commission costs, we can easily approximate the optimal rebalance described

above by paying for the transactions proportionally from each stock, i.e.

29

This is not optimal rebalancing but still it gives us the lower limit to our wealth which could be

slightly improved by doing optimal rebalancing.

3.3 Implementation and Results

We now test the universal algorithm on real data. For our analysis we took 2 stock

portfolio (L&T and Infosys listed on NSE) from 2005-2010. Since, we already know the future

for these stocks (2005-2010) we can calculate best CRP by varying for its all possible values.

We invested Re.1 in this 2 stock portfolio for 5 years. We calculated the returns

according to constant rebalanced portfolio theory. In our case, . We varied from

0 to 1 in steps of 0.05, giving us 21 different portfolios. We calculated the returns for all these 21

portfolios according to CRP and whichever gave the best result at the end of the test period is our

best constant rebalanced portfolio . In our example turns out to be (0.45, 0.55) and wealth

is .

We also invested in the same stocks for same period according to Universal Portfolio

Theory. Starting with (1/2, 1/2) in both the stocks and then rebalancing according to the

algorithm. And the wealth according to this algorithm is . We compared these returns

with the best buy and hold strategy which would have been L&T, with return in the

plot below (Fig.3.1). The best buy and hold strategy is a passive investment strategy in which an

investor buys stocks and holds them for a long period of time, regardless of fluctuations in the

market. An investor who employs a buy-and-hold strategy actively selects stocks, but once in a

position, is not concerned with short-term price movements and technical indicators. The table

below shows returns from all the strategies.

Strategy Wealth (of Re.1)

Constant Rebalanced Portfolio 2.6112

Universal Portfolio ) 2.2934

L&T (best buy and hold) 2.0017

Infosys (buy and hold) 1.6413

30

Fig.3.1- Comparing the results of Universal Portfolio Strategy with other strategies.

We also took the transaction costs into account. We calculated the reduced wealth after

considering the commission (Section 3.2) for different rebalancing time periods and different

commission costs. We varied commission for 0 to 0.5 in steps of 0.02 (i.e. 0 to 50%) and

rebalancing time period from 1 day to 30 days. (Note: In our model which means

that in real market scenario we varied the commissions from 0 to 25% only.)

The results for which are shown in the plot below (Fig.3.2). In our test portfolio the

wealth doesn’t change very significantly even though we balance it every day in 5 years because

in our portfolio the value stock wealth ratio always remains in close proximity of 0.5 and thus,

the transaction cost affects the stock very less. Therefore, there is no need of increasing the

rebalancing time period as the effect of the transaction costs on returns is negligible.

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

51

59

11

7

17

5

23

3

29

1

34

9

40

7

46

5

52

3

58

1

63

9

69

7

75

5

81

3

87

1

92

9

98

7

10

45

11

03

11

61

12

19

12

77

13

35

13

93

14

51

Re

turn

Days

Performance of Universal Portfolio

L&T

CRP

Infosys

Universal

31

Fig.3.2 – Effect of Transaction Cost on the returns of Universal Portfolio Theory

We expect the wealth to monotonically decrease as we increase our rebalancing period

but the plot above (Fig.3.2) doesn’t follow the trend. This is because the rebalancing done more

frequently takes care of all the abrupt changes and hence averages the risk. If the rebalancing is

not done frequently then we can lie either on the peak or on the low of the graph. But if we have

some extra information which can provide us accurate future market trends, then we can

rebalance our portfolio accordingly, in that case we can actually outperform CRP and Universal

both. A small discussion on this idea has been done in the next chapter.

Universal Portfolio Theory with commission is completely feasible and can provide far

better results as compared to other strategies. The best advantage of this strategy is that it

requires zero knowledge of market condition and is not susceptible to any abrupt changes in the

market. Although there is possibility of getting returns more than above strategy, but that

increases the risk. Overall, the above strategy is a very sound long term investment strategy with

far better returns for much lesser risk.

1.95

2

2.05

2.1

2.15

2.2

2.25

2.3

2.35

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Re

turn

s

Rebalancing Period

Effect of Transaction Cost

c=0.14

c=0.16

c=0.18

c=0.20

c=0.22

c=0.24

c=0.26

c=0.28

c=0.30

32

CHAPTER 4

SIDE INFORMATION

Investors use various sources of side information to adjust their portfolios. We model this

side information as a finite valued variable made available at the start of each investment

period. The portfolio choice can then incorporate knowledge of for that period. Thus, the

formal domain of our market model is a sequence of pairs where, is the stock vector

for period and { } denotes the state of the side information at time .

The side information can arise in numerous ways. For example, sophisticated trading

strategies often develop signaling algorithms that indicate the nature of the investment

opportunity about to be faced. The signal would constitute the side information. Side information

could be world events, the behavior of a correlated market, or past information on previous stock

market data.

4.1 Side Information and the Doubling Rate

Theorem 4.1 Let be drawn i.i.d. from Let the log-optimal portfolio

corresponding to and let be the log-optimal portfolio corresponding to some other

density Then the increase in doubling rate by using instead of

is bounded by

( ) (

)

Note:

and

Proof: We have

∫ ∫

33

where, (a) follows from Jensen’s inequality and (b) follows from the Kuhn- Tucker conditions

and the fact that is log optimal for .

Note: ∫

(also known as Kullback–Leibler divergence)

Theorem 4.2 The increase in doubling rate due to side information is bounded by

Proof: Given side information the log-optimal investor uses conditional log-optimal

portfolio for the conditional distribution . Hence, conditional on , we have,

from Theorem 4.1,

34

Averaging this over possible values of we have

∫ ∫

Hence the increase in doubling rate is bounded above by the mutual information between the

side information and the stock market [10].

Above result is quite intuitive because mutual information can also be represented

as , where is the entropy. If Entropy is regarded as a

measure of uncertainty about a random variable, then is a measure of

what does not say about . This is "the amount of uncertainty remaining about after is

known", and thus the right side of the first of these equalities can be read as "the amount of

uncertainty in , minus the amount of uncertainty in which remains after is known", which is

equivalent to "the amount of uncertainty in which is removed by knowing ". This

corroborates the intuitive meaning of mutual information as the amount of information (that is,

reduction in uncertainty) that knowing either variable provides about the other. And we expect

the growth rate of our portfolio to be directly proportional to the reduction in uncertainty in

arose due to the knowledge of , which is proven by the above theorem.

4.2 State Constant Rebalanced Portfolios

The constant rebalanced portfolio is extended to the state constant rebalanced portfolio by

allowing the portfolio decisions to vary with the side information . A state constant rebalanced

35

portfolio specifies portfolios and uses portfolio at time when

the side information state takes on value { }. The choice of results in wealth

on the stock sequence and side information . The collection of state-constant rebalanced

portfolios with states will be denoted by

For a sequence of stock vectors and side information states we can determine the

best state constant rebalanced portfolio as the one achieving the maximum wealth. We denote

this portfolio by where,

And the maximum is overall portfolio assignments Let

Denote the maximum wealth. Thus the best state constant rebalanced portfolio strategy uses

portfolio and achieves a wealth of [15].

The number of degrees of freedom in a state constant rebalanced portfolio will be useful

in characterizing the subsequent results. A state constant rebalanced portfolio for states and

stocks has degrees of freedom. degrees of freedom for each of the portfolios

which must be specified. The requirement that the entries sum to one gives each portfolio

vector degrees of freedom, rather than , where is the number of stocks.

36

4.3 Example explaining impact of Side Information

We now present a simple example illustrating impact of side information on constant rebalanced

portfolio [15].

Let , and let

(

)

(

)

be the sequence of stock market vectors. Note that the first component of the stock vector

at time is constantly equal to for . This first component represents a risk free

asset (or cash). On the other hand, the second stock is highly volatile, jumping up and down

by a factor of or

each investment day. A buy and hold strategy in stock results in

∏ ; a buy and hold of stock results in ∏ , when is even. Also, the

sequence has been maliciously chosen to perform contrary to naïve expectation. For example,

whenever stock has outperformed stock in past, it plunges by a factor of

.

Now consider the behavior of a constant rebalanced portfolio on this sequence.

Then, for even,

.

Setting the derivatives to , we find the maximum wealth is achieved by rebalancing each time to

,

resulting in wealth

√ ,

for even). Since (

) (

) , for , the wealth grows

exponentially to infinity.

37

Now consider side information with sates:

Thus indicates whether the running price of stock exceeds stock (cash) at time . The

sequences look like this:

(

)

(

)

Note that the simple calculation based on the past yields side information that gives prefect

investment information. An investor knowing would make perfect investment decisions,

and hence the best state constant rebalanced portfolio is

By investing in the best stock each time, the wealth gained by the best state constant rebalanced

portfolio

,

38

for even. Of course this is much greater than the result from constant rebalanced portfolio.

4.4 Universal portfolio with Side Information

Universal portfolio with side information is defined similar to universal portfolio by

using a fresh universal portfolio on each subsequence of corresponding to the

times at which the side information takes on a given value [15].

(

)

where, is the wealth obtained by the constant rebalanced portfolio along the

subsequence { }, and is given by

∏ and .

4.5 Implementation on real Stock Market

As we mentioned earlier, side information can be any information which indicates the

future trends in stock market. In our case we looked into 2 stocks (HDFC and Infosys) which are

listed on both NYSE and NSE. As there is a time lag between U.S. and India, we can make our

decision in NSE by looking into the performance of the stock in NYSE. We used the

performance of the stocks on NYSE as the side information and updated the portfolio in NSE

accordingly. The universal portfolio with side information surpasses the returns of the universal

portfolio without side information. Test period taken for this exercise is 2 years. The returns of

universal portfolio theory with and without side information are compared in the plot below

(Fig.4.1).

39

Fig.4.1 – Comparison of Universal Portfolio with and without Side Information

Strategy Returns (for Re.1)

Universal Portfolio without side information 1.70

Universal Portfolio with side information 2.12

Clearly, credible side information can give an edge to your portfolio but the challenge is to have

access to 100% credible side information, which is near to impossible in real life situation. Most

of the time, we have to speculate the future trends and develop it into side information which

may or may not go in our favor.

0

0.5

1

1.5

2

2.5

1

22

43

64

85

10

6

12

7

14

8

16

9

19

0

21

1

23

2

25

3

27

4

29

5

31

6

33

7

35

8

37

9

40

0

42

1

44

2

46

3

Re

turn

Days

Plot showing the effect of side information

Without Side Info

With Side Info

40

REFERENCES

[1] “Hidden Markov models and the Baum-Welch Algorithm”. IEEE Information theory

society newsletter, Dec 2003.

[2] B. Nobakht, C.E.J. Dippel, and B. Loni. “Stock market analysis and prediction using

hidden markov models”, unpublished.

[3] L.R. Rabiner. “A tutorial on hidden markov models and selected applications in speech

recognition.” Proceedings of the IEEE, pages 257–286, 1989.

[4] Rafiul Hassan and Baikunth Nath. “Stockmarket forecasting using hidden markov model:

A new approach”. IEEE Computer Society, 2005.

[5] Wikipedia. Hidden markov model. http://en.wikipedia.org/wiki/Hidden_Markov_model.

[6] L. R. Rabiner and B. H. Juang. “An introduction to hidden Markov models”. IEEE ASSP

Mag., June: 4-16, 1986.

[7] T. M. Cover, “Universal Portfolios," Mathematical Finance, Vol 1, No.1, 1-29, January

1991.

[8] T. M. Cover, “Log Optimal Portfolios," Chapter in “Gambling Research:Gambling and

Risk Taking," Seventh International Conference, Vol 4:Quantitative Analysis and

Gambling, ed. by W.E. Eadington, Reno,Nevada, 1987.

[9] T. M. Cover, “An Algorithm for Maximizing Expected Log Investment Return,” IEEE

Transactions on Information Theory, Vol. IT-30, No. 2, March 1984.

[10] T. Cover and J. Thomas, Elements of Information Theory, Wiley, New York, 2nd

edition,

2006.

[11] Harry Markowitz, “Portfolio selection”, Journal of Finance, vol. 7, no. 1, pp. 77–91,

1952.

[12] J Kelly, “A new interpretation of information rate”, IEEE Transactions on Information

Theory, vol. 2, no. 3, pp. 185–189, 1956.

[13] R. Bell and T. M. Cover, “Competitive optimality of logarithmic investment”,

Mathematics of Operations Research, vol. 5, no. 2, May 1980.

[14] T. Cover, “An algorithm for maximizing expected log investment return”, Information

Theory, IEEE Transactions on, vol. 30, no. 2, pp. 369 – 373, mar 1984.

41

[15] T. Cover and E. Ordentlich, “Universal portfolios with short sales and margin”, in

Information Theory, 1998. Proceedings. 1998 IEEE International Symposium on, aug

1998, p. 174.

[16] T.M. Cover and E. Ordentlich, “Universal portfolios with side information”,

Information Theory, IEEE Transactions on, vol. 42, no. 2, pp. 348 –363, mar 1996.

[17] T. Cover and D. Julian, “Performance of universal portfolios in the stock market”, in

Information Theory, 2000. Proceedings. IEEE International Symposium on, 2000, p. 232.