markowitz whitepaper

ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIOTHEORY AND KELLY

TOM STARKE

0

ILLUSTRATING THE LINK BETWEEN MODERN PORTFOLIO THEORY AND KELLY 1

1. Introduction

This article was hugely inspired by Ernie Chan’s work, so thanks Ernie forproviding us with so much insight.

If you are active in the algorithmic trading field you will have come acrossErnie Chan’s books1. He has a sizeable section dedicated to portfolio managementand describes how to use the Kelly formula to apply the optimum leverage inorder to maximise returns. This approach was first applied and popularised byThorpe2. Recently, Ernie has posted a very interesting piece of work on his blog3

which unifies the portfolio management approaches of Kelly and Markowitz andhis modern portfolio theory. This post is based on his work, please have a look athis post now, before you read on.

Markowitz is generally regarded as a ’serious’ academic theory whereas Thorpewas just a mathematically-minded Black-Jack player who also made a couple ofhundred millions on the stock market. The Kelly formula works on the assumptionthat, with respect to your bankroll, there is an optimum fraction you can bet tomaximise your profits. However, both theories work on the (mostly inaccurate)assumption that returns are normally distributed. In reality we often have fattertails, which means that we should at least reduce the amount of leverage that weuse. Even though we may maximise our profits with straight-Kelly, such a strategycan also suffer from enormous drawdowns and anyone who experienced the darkerside of trading psychology knows what that means.

If you had a look at his post you may have realised that there is a lot of algebrahappening and some of you may want to know how they can use this in practicalterms. In this post I will walk you through his work and provide you with codeexamples that illustrate the concepts. I hope you enjoy it and get a little moreenlightened in the process. In this post I used random data rather than actualstock data, which will hopefully help you to get a sense of how to use modellingand simulation to improve your understanding of the theoretical concepts. Don‘tforget that the skill of an algo-trader is to put mathematical models into code andthis example is great practise.

As a final note I would like to point out that from my experience there area lot of people with IT experience reading these articles who do not alwayshave the in-depth mathematical background. For that reason I try not to jumptoo many steps to increase the educational value. So let‘s start with import-ing a few modules, which we need later and produce a series of normally dis-tributed returns. CVXOPT is a convex solver which you can easily download withsudo pip install cvxopt.

1http://epchan.blogspot.com.au2http://www.eecs.harvard.edu/cs286r/courses/fall12/papers/Thorpe_

KellyCriterion2007.pdf3http://epchan.blogspot.com.au/2014/08/kelly-vs-markowitz-portfolio.html

2 TOM STARKE

from numpy import ∗from pylab import plot , show , x labe l , y l ab e l

from cvxopt import matrix

from cvxopt . b l a s import dotfrom cvxopt . s o l v e r s import qp

import numpy as np

from sc ipy import opt imize

n = 5 # number o f a s s e t s in the p o r t f o l i opbar = [ ] ; r e t v e c = [ ]

f o r i in range (n) :

r e t v e c . append ( random . randn (1000) +0.01)pbar . append (mean( r e t v e c [−1]) )

Listing 1. Producing normally distributed return series.

These return series can be used to create a wide range of portfolios, which allhave different returns and risks (standard deviation). We can produce a wide rangeof random weight vectors and plot those portfolios. In this example I introducean upward trending bias by adding 0.01, which makes things easier for illustrationbut you can set it to zero and see what happens. You can also see that there isa filter that only allows to plot portfolios with a standard deviation of < 1.2 forbetter illustration.

de f rand weight (n) :k = random . rand int (−101 ,101 ,n)re turn k/ f l o a t (sum(k ) )

de f ca l c mean re t ( r e t vec , n , f ) :means = [ ] ; s td s = [ ]

f o r i in range (3000) :w = f (n)i f s td ( array ( r e t v e c ) .T∗w) < 1 . 2 :

means . append ( dot ( matrix ( pbar ) , matrix (w) ) )s td s . append ( sq r t ( dot ( matrix (w) , matrix ( cov ( array ( r e t v e c ) ) ) ∗matrix (w) ) ) )

re turn means , s td s

means , s td s = ca l c mean re t ( r e t vec , n , rand weight )p l o t ( stds , means , ’ o ’ )

y l ab e l ( ’mean ’ )x l ab e l ( ’ s td ’ )

Listing 2. Producing normally distributed return series.

Upon plotting those you will observe that they form a characteristic parabolicshape called the ‘Markowitz bullet‘ with the boundaries being called the ‘efficientfrontier‘, where we have the lowest variance for a given expected return as shownin Figure 1. In the code you will notice the calculation of the return with:

(1) R = pTw


where R is the expected return, pT is the transpose of the vector for the meanreturns for each time series and w is the weight vector of the portfolio. p is a Nx1column vector, so pT turns into a 1xN row vector which can be multiplied with theNx1 weight (column) vector w to give a scalar result. This is equivalent to the dotproduct used in the code. Keep in mind that Python has a reversed definition ofrows and columns and the accurate Numpy version of the previous equation wouldbe R = matrix(w)*matrix(p).T

Next, we calculate the standard deviation with

(2) σ =√wTCw

where C is the covariance matrix of the returns which is a NxN matrix. Pleasenote that if we simply calculated the simple standard deviation with the appropri-ate weighting using std(array(ret_vec).T*w) we would get a slightly different’bullet’. This is because the simple standard deviation calculation would not takecovariances into account. In the covariance matrix, the values of the diagonalrepresent the simple variances of each asset while the off-diagonals are the vari-ances between the assets. By using ordinary std() we effectively only regard thediagonal and miss the rest. A small but significant difference.

Figure 1. Simulation of portfolios with varying weight, plot of theefficient frontier and the Kelly portfolios with varying θ.

Once we have a good representation of our portfolios as the blue dots in Figure1 show we can calculate the efficient frontier Markowitz-style. This is done byminimising

(3) wTCw

4 TOM STARKE

for w on the expected portfolio return RTw whilst keeping the sum of all theweights equal to 1:

(4)∑i

wi = 1

Here we parametrically run through RTw = µ and find the minimum variancefor different µ‘s. This can be done with scipy.optimise.minimize but we haveto define quite a complex problem with bounds, constraints and a Lagrange mul-tiplier. Conveniently, the cvxopt package, a convex solver, does that all for us.I used one of their examples4 with some modifications as shown below. You willnotice that there are some conditioning expressions in the code. They are simplyneeded to set up the problem. For more information please have a look at thecvxopt example.

The mus vector produces a series of expected return values µ in a non-linear andmore appropriate way. We will see later that we don‘t need to calculate a lot ofthese as they perfectly fit a parabola, which can safely be extrapolated for highervalues.

## CONVERSION TO MATRIXk = array ( r e t v e c )

S = matrix ( cov (k ) )pbar = matrix ( pbar )

## CONDITIONING OPTIMIZER

G = matrix ( 0 . 0 , (n , n ) )G [ : : n+1] = −1.0h = matrix ( 0 . 0 , (n , 1 ) )

A = matrix ( 1 . 0 , (1 , n ) )b = matrix ( 1 . 0 )

## CALCULATING EXPECTED RETURNSN = 100mus = [ 10∗∗ (5 .0∗ t /N−1.0) f o r t in range (N) ]

## CALCULATING THE WEIGHTS FOR EFFICIENT FRONTIERp o r t f o l i o s = [ qp (mu∗S , −pbar , G, h , A, b) [ ’ x ’ ] f o r mu in mus ]

## CALCULATING RISKS AND RETURNS FOR FRONTIER

r e t s = [ dot ( pbar , x ) f o r x in p o r t f o l i o s ]

r i s = [ sq r t ( dot (x , S∗x ) ) f o r x in p o r t f o l i o s ]p l o t ( r i s , r e t s , ’ y−o ’ )

Listing 3. Calculating the efficient frontier.

You can see the result in Figure 1 as the parabolic fitting curve to the Markowitzbullet. So far, there is no magic here. But now comes the interesting part: ErnieChan shows in his article after some interesting algebra that the Markowitz optimalportfolio is proportional to the Kelly optimal leverage portfolio by a factor which

4http://cvxopt.org/examples/book/portfolio.html


I will call θ. When I read this article I was intrigued and wanted to see this formyself, which gave me the motivation for this article.

So let‘s start with calculating the optimal Kelly portfolio. This is very simplydone by the equation:

(5) w = C−1R

which in pythonic form look like this:

ww = np . l i n a l g . inv (np . matrix (S) ) ∗np . matrix ( pbar )

Listing 4. Calculation of optimal Kelly portfolio.

Now all we have to do is to apply a factor θ and see how we go with this. In mylittle example below I actually plotted the lines for both θ and −θ, which is quiteinteresting. The result of this can be seen as the black, dotted lines in Figure 1 andit can be seen that the upper line is exactly the tangent of our efficient frontier.Bingo! By getting that plot we can see that somewhere along the black Kelly linewe will find the optimal Markowitz portfolio. Since we know that all the pointson the efficient frontier have a leverage of 1 we can see that Ernie Chan is indeedcorrect with his calculations. The black line with the negative slope can also be aKelly-tangent since the tangent to the efficient frontier has actually two solutionsas we will see below.

rks = [ ] ; r e s = [ ] ;

f o r theta in arange ( 0 . 0 5 , 2 0 , 0 . 0 001 ) :w = ww/ thetarks . append ( dot ( pbar , matrix (w) ) )

r e s . append ( sq r t ( dot ( matrix (w) , S∗matrix (w) ) ) )p l o t ( res , rks , ’ ko ’ , markers i ze=3)p l o t ( res , array ( rks ) ∗−1, ’ ko ’ , markers i ze=3)

show ( )

Listing 5. Calculation of the Kelly tangent.

Once we know the optimal Kelly portfolio, we can readily find the weights forMarkowitz with w∑

i wi. However, the diligent investigator may wonder how accurate

that Kelly-derived weight vector actually is. In order to work that out we firsthave to extend our efficient frontier a little by doing a second-order polynomial fitas shown in the code example below. At the same time we are also fitting the linefor the Kelly portfolios. An example of these fits is shown in Figure 2. For ease ofcalculation I have reversed the axis here.

## FITTING THE MARKOWITZ PARABOLA

m1=p o l y f i t ( r e t s , r i s , 2 )

x = l i n s p a c e (min ( r e t s ) ,max( r e t s ) ∗2 ,1000)y = po lyva l (m1, x )

## FITTING THE KELLY WEIGHTSm2 = p o l y f i t ( res , rks , 1 )

6 TOM STARKE

Listing 6. Fitting the efficient frontier.

Figure 2. Fitting the efficient frontier and the Kelly-tangent (bluecrosses) and plotting the optimum portfolio determined with Kelly(green) and the and using the tangent calculation (black). The ac-tual tangent to the efficient frontier is shown as solid black line. Notethat the axis of risk and return are reverse to Figure 1

Next, we calculate the tangent to the efficient frontier and the point wheretangent and parabola meet. You can easily calculate this yourself knowing that thederivative of the point where both lines touch is equal to the slope of the tangent.The equation for the efficient frontier can be modelled by a simple polynomial:

(6) f(x) = ax2 + bx+ c

where the slope of the tangent is:

(7) f ′(x) = 2ax+ b

Since we know that our tangent goes through the origin, it‘s equation will bet(x) = xf ′(x) which gives us

(8) t(x) = (2ax+ b)x

We set this equal to the efficient frontier:

(9) (2ax+ b) = ax2 + bx+ c

and after some algebra we can calculate the x-value for the intercept:

(10) x = ±√c

a


and the slope of the tangent at this point:

(11) m = ±2√ca+ b

Notice that the for the tangent we obtain 2 solutions and my code is written toobtain the positive one. If you change the bias on the returns to negative youwill notice that the bullet will also shift towards the negative tangent and you cancalculate the solutions for this case.

p lo t (x , ( 2 ∗ ( s q r t (m1[ 0 ] ∗m1 [ 2 ] ) )+m1 [ 1 ] ) ∗x , ’ k ’ )x1 = sq r t (m1[ 2 ] /m1 [ 0 ] )

y1 = m1[ 0 ] ∗ x1∗∗2 + m1[ 1 ] ∗ x1 + m1 [ 2 ]

p l o t ( [ x1 ] , [ y1 ] , ’ ko ’ , markers i ze=10)

Listing 7. Plotting tangent and interception point.

When we compare the slopes of the Kelly and the real tangent to the parabolawe find that they are slightly different. I do not have a satisfying explanation forthis right now other then attributing it to rounding errors, particularly when weare dealing with portfolios where most of the expected returns are close to zero.However, in most cases the curves are close enough to conclude that Ernie Chan‘sderivations are correct.

Finally, we compare the optimum portfolio point obtained from the real tangentwith that from the Kelly-tangent. For this we use cvxopt again to calculate theweights at the optimum Kelly point are a bit different but relatively close.

wt = qp( x1∗S , −pbar , G, h , A, b) [ ’ x ’ ]p l o t ( dot ( pbar , matrix (wt ) ) , s q r t ( dot ( matrix (wt ) , S∗matrix (wt ) ) ) , ’ go ’ , markers i ze=10)

Listing 8. Calculating the interception point based on the Kelly-tangent.

Well done, if you have followed me up to this point. I hope you got a little bitenlightened and please let me know if you have any questions or suggestions. Tomake it really easy for you I finally put it all together in one big block of code.But remember, just running the code does not help to understand the depth ofthe problem described here.

from numpy import ∗from pylab import plot , show , x labe l , y l ab e l

from cvxopt import matrixfrom cvxopt . b l a s import dotfrom cvxopt . s o l v e r s import qpimport numpy as np

## NUMBER OF ASSETS

n = 4

## PRODUCING RANDOM WEIGHTSde f rand weight (n) :

k = random . rand int (−101 ,101 ,n)

8 TOM STARKE

r e turn k/ f l o a t (sum(k ) )

## PRODUCING NORMALLY DISTRIBUTED RETURNS

pbar = [ ] ; r e t v e c = [ ]f o r i in range (n) :

r e t v e c . append ( random . randn (1000) +0.01)

pbar . append (mean( r e t v e c [−1]) )

## CREATING PORTFOLIOS FUNCTIONdef ca l c mean re t ( r e t vec , n , f ) :

means = [ ] ; s td s = [ ]

f o r i in range (3000) :w = f (n)

i f s td ( array ( r e t v e c ) .T∗w) < 1 . 2 :

means . append ( dot ( matrix ( pbar ) , matrix (w) ) )s td s . append ( sq r t ( dot ( matrix (w) , matrix ( cov ( array ( r e t v e c ) ) ) ∗matrix (w) ) ) )

re turn means , s td s

## RUNNING AND PLOTTING PORTFOLIOS

means , s td s = ca l c mean re t ( r e t vec , n , rand weight )p l o t ( stds , means , ’ o ’ )y l ab e l ( ’mean ’ )

x l ab e l ( ’ s td ’ )

## CONVERSION TO MATRIXk = array ( r e t v e c )S = matrix ( cov (k ) )

pbar = matrix ( pbar )

## CONDITIONING OPTIMIZERG = matrix ( 0 . 0 , (n , n ) )G [ : : n+1] = −1.0h = matrix ( 0 . 0 , (n , 1 ) )A = matrix ( 1 . 0 , (1 , n ) )

b = matrix ( 1 . 0 )

## CALCULATING EXPECTED RETURNS

N = 100

mus = [ 10∗∗ (5 .0∗ t /N−1.0) f o r t in range (N) ]

## CALCULATING THE WEIGHTS FOR EFFICIENT FRONTIER

p o r t f o l i o s = [ qp (mu∗S , −pbar , G, h , A, b) [ ’ x ’ ] f o r mu in mus ]

## CALCULATING RISKS AND RETURNS FOR FRONTIERr e t s = [ dot ( pbar , x ) f o r x in p o r t f o l i o s ]r i s = [ sq r t ( dot (x , S∗x ) ) f o r x in p o r t f o l i o s ]

p l o t ( r i s , r e t s , ’ y−o ’ )

## PLOTTING THE KELLY PORTFOLIOSww = np . l i n a l g . inv (np . matrix (S) ) ∗np . matrix ( pbar )

rks = [ ] ; r e s = [ ] ;f o r i in arange ( 0 . 0 5 , 2 0 , 0 . 0 001 ) :

w = ww/ i

rks . append ( dot ( pbar , matrix (w) ) )r e s . append ( sq r t ( dot ( matrix (w) , S∗matrix (w) ) ) )


## PLOTTING THE KELLY PORTFOLIOS FOR DIFF LEVERAGEplo t ( res , rks , ’ ko ’ , markers i ze=3)

p l o t ( res , array ( rks ) ∗−1, ’ ko ’ , markers i ze=3)

show ( )

## FITTING THE MARKOWITZ PARABOLA

m1=p o l y f i t ( r e t s , r i s , 2 )x = l i n s p a c e (min ( r e t s ) ,max( r e t s ) ∗2 ,1000)

y = po lyva l (m1, x )

## FITTING THE KELLY WEIGHTS

m2 = p o l y f i t ( res , rks , 1 )

## m1 IS THE PARABOLIC FIT AND m2 THE LINEAR FIT

pr in t m1, m2, ww

## PLOTTING THE TANGENT TO THE PARABOLA

plo t (x , ( 2 ∗ ( s q r t (m1[ 0 ] ∗m1 [ 2 ] ) )+m1 [ 1 ] ) ∗x , ’ k ’ )

## PLOTTING THE KELLY PORTFOLIOS FOR DIFFERENT FACTORSp lo t ( rks , res , ’ x ’ )

## PLOTTING THE MARKOWITZ PARABOLAplo t ( r e t s , r i s , ’ r−o ’ )

## PLOTTING THE FITTING FUNCTION FOR THE MARKOWITZ PARABOLAplo t (x , y )

## FINDING OPTIMAL MARKOWITZ POINTSx1 = sq r t (m1[ 2 ] /m1 [ 0 ] )

x2 = −s q r t (m1[ 2 ] /m1 [ 0 ] )y1 = m1[ 0 ] ∗ x1∗∗2 + m1[ 1 ] ∗ x1 + m1 [ 2 ]y2 = m1[ 0 ] ∗ x2∗∗2 + m1[ 1 ] ∗ x2 + m1 [ 2 ]

## PLOTTING THE OPTIMUM MARKOWITZ POINT

p lo t ( [ x1 ] , [ y1 ] , ’ ko ’ , markers i ze=10)

x l ab e l ( ’mean ’ )y l ab e l ( ’ s td ’ )

## FINDING OPTIMUM PORTFOLIO WITH CVXOPTwt = qp( x1∗S , −pbar , G, h , A, b) [ ’ x ’ ]p l o t ( dot ( pbar , matrix (wt ) ) , s q r t ( dot ( matrix (wt ) , S∗matrix (wt ) ) ) , ’ go ’ , markers i ze=10)

show ( )

Listing 9. Putting it all together.

markowitz whitepaper

Documents