University of Amsterdam
Master Thesis
Pairs Trading:
An implementation of the Stochastic Spread and
Cointegration Approach
Author:
Nick Huurman
5631335
Supervisors:
Prof. dr. C.G.H. Diks
Dr. S.A. Broda
August 10, 2012
Contents
1 Introduction
2 Cointegration approach
2.1 Integration, cointegration and error correction
2.2 Theoretical framework for pairs trading
2.3 Johansen cointegration test
3 Stochastic spread model
3.1 The state-space model
3.2 Kalman Filter
3.3 The EM Algorithm
4 Trading design
4.1 Trading period
4.2 Pairs selection
4.2.1 Cointegration approach
4.2.2 Stochastic spread approach
4.3 Mean-Variance optimization
5 Evaluation
5.1 Sharpe ratio
5.2 Results
5.2.1 Stochastic Spread Approach
5.2.2 Cointegration Approach
5.3 Results using DAX index
6 Conclusion
Chapter 1
Introduction
History shows us that using a market neutral trading strategy can be a good way to invest your
money. Typically, such a strategy performs in a steady manner, regardless of whether the market
goes up or down, and returns come with low volatility (Vidyamurthy, 2004). These favourable
characteristics are achieved by trading a market neutral portfolio, which can be constructed by
going long and short in two assets that have the same beta (hence, a portfolio with zero beta),
which is also referred to as a spread portfolio.
This thesis will evaluate one particular market neutral trading strategy that has already been
used (and proved its value) for 25 years on Wall Street, namely pairs trading. Recent studies
tell us that pairs trading performs exceptionally well in turbulent markets, where mispricing of
stocks is more common (Gatev et al., 2006; Do et al., 2006; Baronyan et al., 2010). Baronyan
et al. (2010) even reported a 40 per cent net annual profit in the first year (2008) of the financial
crisis. This result shows that pairs trading, despite its 25-year existence, is still profitable and
therefore very relevant to investigate, especially with the recent turbulent stock market.
The concept of pairs trading is relatively simple and can be summarized as follows. To begin,
an investor has to find two securities of which the prices have historically moved together and are
therefore in a "relative equilibrium". Then, when the price difference between the two securities
widens, hence the securities are out of the relative equilibrium, the trader takes a long position
in the cheap security and a short position in the expensive security. Based on the past price
dynamics, the expectation of the investor is that the prices will converge back to their relative
equilibrium. If so, the long and short position are unwound and a profit is made.
The main difficulties of constructing a profitable pairs trading strategy lie evidently in using
the right method for selecting a suitable pair of securities and how and when to take a position
in the selected pair. A recent thesis by Yakop (2011) investigates a broad range of selection and
trading methods which are appropriate for pairs trading. He concludes that the model based
approaches perform best. Therefore, this thesis will investigate and analyse two different model
based approaches for pairs trading. The first approach is the cointegration approach, which is
based on the error correction model. The second method is the "stochastic spread" approach,
as introduced in Elliott et al. (2005).
The results of the selected pairs of both methods will be calculated with the use of two
different trading strategies. The first is a dynamic model for the number of positions taken in
the spread that is based on the mean-variance optimization procedure discussed in the paper of
Markowitz (1952). The second is the two-standard-deviation approach, which is commonly used
in earlier literature (Yakop (2011), Gatev et al. (2006), Vidyamurthy (2004)). The main objective
of this thesis is to compare the performance of the cointegration approach with the stochastic
spread approach when implemented with the two aforementioned pairs trading strategies.
The thesis is organized as follows. Chapters 2 and 3 give an outline of the two different
approaches used for modelling the behaviour of a pair. Chapter 4 describes the different trading
strategies and Chapter 5 provides an evaluation of the results of both models
with the different trading strategies. The last chapter contains the conclusions of this thesis.
Chapter 2
Cointegration approach
The notion of cointegrated time series was first introduced by Engle and Granger (1987) and
is the work for which Granger shared the Nobel Prize in economics in 2003. Cointegrated
time series possess characteristics that are very useful for pairs trading, such as a long-term
equilibrium with the associated property of mean reversion. In the first section of this chapter
the definitions of integration, cointegration and the error correction model (ECM) for a time
series are given. The second section gives the theoretical framework for pairs trading and in the
third section, the cointegration test proposed by Johansen (1991) is discussed.
2.1 Integration, cointegration and error correction
To begin the theory of cointegration, the definitions of weak stationarity and integrated
time series are given first:
Definition. An ordered sequence of random variables, i.e., a time series or process $\{x_t\}$, is
weakly stationary or second-order stationary if the first two moments of the distribution of
$\{x_t\}$ are constant and independent of time.
Definition. A time series which has a stationary, invertible ARMA representation after
differencing $d$ times is said to be integrated of order $d$, denoted $\{x_t\} \sim I(d)$.
The two definitions above become tangible through the example of a simple VAR model. Consider a
$k$-dimensional VAR($p$) time series $\{x_t\}$ with possible time trend, so that the model is
$$x_t = \mu_t + \Phi_1 x_{t-1} + \dots + \Phi_p x_{t-p} + a_t,$$
or
$$\Phi(B) x_t = \mu_t + a_t,$$
with
$$\Phi(B) = I - \Phi_1 B - \dots - \Phi_p B^p,$$
where the innovation $a_t$ is assumed to be Gaussian and $\mu_t = \mu_0 + \mu_1 t$, where $\mu_0$ and $\mu_1$ are
$k$-dimensional constant vectors. From the definition of weak stationarity, it follows that the
VAR($p$) system above is unit-root stationary, i.e. not integrated (an $I(0)$ process), if all zeros
of the determinant $|\Phi(B)|$ lie outside the unit circle (Tsay, 2010). The definition of
cointegration as stated in Engle and Granger (1987) is given next.
Definition. The components of the vector $\{x_t\}$ are said to be cointegrated of order $d, b$,
denoted $\{x_t\} \sim CI(d, b)$, if (i) all components of $\{x_t\}$ are $I(d)$; (ii) there exists a vector $\Pi \neq 0$
so that $z_t = \Pi' x_t \sim I(d - b)$, $b > 0$. The vector $\Pi$ is called the cointegrating vector.
Considering the case where $d = b = 1$, cointegration means that the equilibrium error
$z_t$ is $I(0)$: it will rarely drift far from its mean and will often cross it (Engle
and Granger, 1987). A convenient way of representing the vector $\{x_t\}$ as a stationary series is
the error correction model (ECM) representation, which avoids the issue of overdifferencing
(Tsay, 2010, p. 431). The definition of the ECM is given next (Engle and Granger, 1987):
Definition. A vector time series $\{x_t\}$ has an error correction representation if it can be
expressed as:
$$A(B)(1 - B)x_t = -\gamma z_{t-1} + u_t,$$
where $u_t$ is a stationary multivariate disturbance, with $A(0) = I$, $A(1)$ has all elements finite
and $\gamma \neq 0$.
In this representation of the ECM, only the disequilibrium in the last period is an explanatory
variable. However, by rearranging terms, any set of lags can be written in this form.
Therefore, this representation of the ECM permits any type of gradual adjustment towards a
new equilibrium (Engle and Granger, 1987).
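The stationarity condition for the VAR model earlier in this section can be checked numerically. The following is a minimal sketch, assuming a bivariate VAR(1) without trend: the zeros of $|I - \Phi_1 z|$ are the reciprocals of the eigenvalues of $\Phi_1$, so the condition is equivalent to all eigenvalues lying strictly inside the unit circle. The function names and the coefficient matrices are illustrative assumptions, not part of the thesis.

```python
import cmath

def eigenvalues_2x2(phi):
    """Eigenvalues of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = phi
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def is_stationary_var1(phi):
    """True if all eigenvalues of Phi_1 lie strictly inside the unit circle.

    This is equivalent to all zeros of |I - Phi_1 z| lying outside it.
    """
    return all(abs(lam) < 1.0 for lam in eigenvalues_2x2(phi))

stationary_phi = [[0.5, 0.1], [0.2, 0.3]]   # hypothetical coefficients
unit_root_phi = [[1.0, 0.0], [0.0, 0.5]]    # first component is a random walk
```

The second matrix has an eigenvalue equal to one, so the corresponding system contains a unit root and is not $I(0)$.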
2.2 Theoretical framework for pairs trading
Define the observed price of stock $i$ at time $t$ as $P_{it}$ and let $p_{it} = \ln(P_{it})$ be the
corresponding log price. A common assumption about $\{p_{it}\}$ in the literature (Tsay, 2010;
Vidyamurthy, 2004) is that the time series has a unit root and follows a random walk:
$p_{it} = p_{i,t-1} + r_{it}$, where $\{r_{it}\}$ is the return (this unit-root assumption is tested
with the Augmented Dickey-Fuller (ADF) unit-root test).
Based on the arbitrage pricing theory (APT), if two stocks have similar risk factors, they
should have similar returns. If this is the case, $\{p_{1t}\}$ and $\{p_{2t}\}$ are likely to be driven by a
common component and are therefore cointegrated (Tsay, 2010). In formula, there exists a
linear combination $w_t = \beta' p_t = p_{1t} - \beta p_{2t}$ which is unit-root stationary and mean reverting.
These two price series $\{p_{1t}\}$ and $\{p_{2t}\}$ can also be written in an ECM form:
$$\begin{pmatrix} p_{1t} - p_{1,t-1} \\ p_{2t} - p_{2,t-1} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (w_{t-1} - \mu_w) + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}, \tag{2.1}$$
where $\mu_w = E[w_t]$ denotes the mean of $\{w_t\}$, which is referred to as the spread between the
two log stock prices.
The left-hand side of the ECM form represents the log returns of both price series.
Furthermore, the equation states that the returns depend on the stationary series $w_{t-1} - \mu_w$ and
are therefore also stationary. Specifically, $w_{t-1} - \mu_w$ denotes the deviation from the long-term
equilibrium between the two stocks. So, the returns of the stocks (left side of 2.1) depend on the
past deviations from the equilibrium. The coefficients $\alpha_1$ and $\alpha_2$ respectively show the effect of
these past deviations on the returns $\{r_{1t}\}$ and $\{r_{2t}\}$. In practice, the coefficients $\alpha_1$ and $\alpha_2$ will
have opposite signs, indicating the mean reversion behaviour of the stationary series.
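To make the mean reversion explicit, subtract $\beta$ times the second row of (2.1) from the first row. This gives $w_t - \mu_w = (1 + \alpha_1 - \beta\alpha_2)(w_{t-1} - \mu_w) + (\epsilon_{1t} - \beta\epsilon_{2t})$, so the spread follows an AR(1) process that reverts to $\mu_w$ when $|1 + \alpha_1 - \beta\alpha_2| < 1$. The sketch below simulates (2.1) under this reading; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def simulate_ecm(alpha1, alpha2, beta, mu_w, sigma, n, p1_0, p2_0, seed=0):
    """Simulate two log prices from the ECM (2.1); returns the spread path."""
    rng = random.Random(seed)
    p1, p2 = p1_0, p2_0
    spread = [p1 - beta * p2]
    for _ in range(n):
        dev = spread[-1] - mu_w                      # w_{t-1} - mu_w
        p1 += alpha1 * dev + sigma * rng.gauss(0.0, 1.0)
        p2 += alpha2 * dev + sigma * rng.gauss(0.0, 1.0)
        spread.append(p1 - beta * p2)
    return spread

# Hypothetical parameters: alpha1 < 0 < alpha2 pulls the spread back.
alpha1, alpha2, beta, mu_w = -0.05, 0.03, 1.2, 0.1
rho = 1.0 + alpha1 - beta * alpha2                   # implied AR(1) coefficient

# Without noise, the deviation from mu_w shrinks by the factor rho each step.
path = simulate_ecm(alpha1, alpha2, beta, mu_w, 0.0, 50, p1_0=3.0, p2_0=2.0)
noisy = simulate_ecm(alpha1, alpha2, beta, mu_w, 0.01, 500, p1_0=3.0, p2_0=2.0)
```

With these values the implied AR(1) coefficient is 0.914, so a deviation from the equilibrium decays by roughly 8.6 per cent per period in the noise-free path.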
2.3 Johansen cointegration test
For testing purposes, the ECM representation for a $k$-dimensional VAR($p$) time series $\{x_t\}$
becomes:
$$\Delta x_t = \mu d_t + \Pi x_{t-1} + \Phi^*_1 \Delta x_{t-1} + \dots + \Phi^*_{p-1} \Delta x_{t-p+1} + a_t,$$
where the deterministic regressor $\{d_t\}$ (constant/trend) is added and $t = p + 1, \ldots, T$.
Furthermore,
$$\Phi^*_j = -\sum_{i=j+1}^{p} \Phi_i, \quad j = 1, \ldots, p - 1,$$
and
$$\Pi = \alpha\beta' = \Phi_p + \Phi_{p-1} + \dots + \Phi_1 - I = -\Phi(1).$$
The term $\Pi x_{t-1}$ is referred to as the error correction term, which plays a key role in the
cointegration study (Tsay, 2010). If we assume that $\{x_t\}$ is at most $I(1)$, $\Delta x_t$ is an $I(0)$ process. Now,
one can consider three cases of interest of the ECM, namely:
1. Rank($\Pi$) = 0. Hence, $\Pi = 0$ and $x_t$ is not cointegrated.
2. Rank($\Pi$) = $k$. Hence, $|\Phi(1)| \neq 0$ and $x_t$ contains no unit roots, so one can just look at $x_t$
(which is $I(0)$).
3. $0 <$ Rank($\Pi$) $= m < k$. Hence, $x_t$ has $m$ linearly independent cointegration vectors and $k - m$
unit roots. If one writes $\Pi = \alpha\beta'$, $\alpha$ and $\beta$ are $k \times m$ matrices with Rank($\alpha$) = Rank($\beta$) = $m$.
As can be seen from the above three cases, the rank of the matrix $\Pi$ is sufficient for knowing
whether the time series $\{x_t\}$ is cointegrated. Therefore, a likelihood ratio (LR) test for
determining the rank of $\Pi$ is described next, which is called the Johansen cointegration test.
The hypotheses of this test can be formulated as $H_0$: Rank($\Pi$) $= m$ versus $H_a$: Rank($\Pi$) $> m$.
The value of $m$ starts at zero and is increased by one each time the null hypothesis is rejected.
If the null hypothesis is rejected for every $m < k$, $\{x_t\}$ has the properties of the second case
specified above.
The LR test statistic proposed by Johansen is defined as
$$LR_{tr}(m) = -(T - p) \sum_{i=m+1}^{k} \ln(1 - \hat{\lambda}_i),$$
where the $\hat{\lambda}_i$ (which should be small for $i > m$) are the squared canonical correlations
between $u_t$ and $v_t$, the residuals obtained from regressing $\Delta x_t$ and $x_{t-1}$, respectively, on the
deterministic terms and the lagged differences. This test is also referred to as the trace
cointegration test. The asymptotic null distribution of this test is not $\chi^2$, but a
Dickey-Fuller-type distribution, which depends on $k - m$ and the deterministic components (Tsay, 2010).
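The sequential testing procedure can be sketched as follows. The code takes the squared canonical correlations as given and only evaluates the trace statistic and the stopping rule; the eigenvalues and the critical values below are hypothetical placeholders (in practice the critical values must be taken from tables that match $k - m$ and the deterministic terms), and the function names are assumptions made for this illustration.

```python
import math

def trace_statistic(eigenvalues, n_obs, n_lags, m):
    """LR_tr(m) = -(T - p) * sum_{i=m+1}^{k} ln(1 - lambda_i).

    `eigenvalues` are the squared canonical correlations; they are sorted
    from largest to smallest. `m` is the rank under the null hypothesis.
    """
    lam = sorted(eigenvalues, reverse=True)
    return -(n_obs - n_lags) * sum(math.log(1.0 - l) for l in lam[m:])

def select_rank(eigenvalues, n_obs, n_lags, critical_values):
    """Sequential testing: raise m while H0: Rank(Pi) = m is rejected."""
    k = len(eigenvalues)
    for m in range(k):
        if trace_statistic(eigenvalues, n_obs, n_lags, m) <= critical_values[m]:
            return m          # first null hypothesis that is not rejected
    return k                  # every null rejected: x_t is I(0)

# Hypothetical eigenvalues and critical values, for illustration only.
eigs = [0.20, 0.01]
crit = [15.0, 4.0]
rank = select_rank(eigs, n_obs=257, n_lags=1, critical_values=crit)
```

In this example the hypothesis $m = 0$ is rejected and $m = 1$ is not, which is the pattern a suitable pair should show.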
Chapter 3
Stochastic spread model
In this chapter I will describe a mean reverting Gaussian Markov chain model for the spread,
namely the stochastic spread model, which is based on the paper by Elliott et al. (2005). Later in
this thesis the returns of this stochastic spread approach, when implemented as a pairs trading
strategy, are compared with those of the cointegration approach described above, using historical data.
3.1 The state-space model
At any given time, a pairs trading portfolio is associated with a quantity called the spread,
which is the difference between the quoted prices of the securities used. If the spread of the
portfolio is significantly different from the mean, a position in both securities is taken with the
expectation that the spread will revert to its mean (Vidyamurthy, 2004).
To explicitly model the mean reverting behaviour of the spread, a state process
$\{x_k \mid k = 0, 1, 2, \ldots\}$ is introduced, where $x_k$ denotes the value of some variable at time
$t_k = k\tau$ for $k = 0, 1, 2, \ldots$. We assume that $\{x_k\}$ is mean reverting:
$$x_{k+1} - x_k = b\left(\frac{a}{b} - x_k\right)\tau + \sigma\sqrt{\tau}\,\epsilon_{k+1}, \tag{3.1}$$
where $\sigma \geq 0$, $b > 0$, $a \in \mathbb{R}$ and $\epsilon_k \sim N(0, 1)$. The above equation is a discretized
Ornstein-Uhlenbeck process: $dX(t) = (a - bX(t))dt + \sigma dW(t)$.
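Furthermore, it is easy to see that $x_k \sim N(\mu_k, \sigma_k^2)$, with
$$\mu_k = E(x_k) = a\tau + (1 - b\tau)\mu_{k-1} = a\tau + (1 - b\tau)[a\tau + (1 - b\tau)\mu_{k-2}] = \dots = \frac{a}{b} - \frac{a}{b}(1 - b\tau)^k + (1 - b\tau)^k \mu_0,$$
and
$$\sigma_k^2 = Var(x_k) = (1 - b\tau)^2 \sigma_{k-1}^2 + \sigma^2\tau = \dots = \sigma^2\tau\,\frac{1 - (1 - b\tau)^{2k}}{1 - (1 - b\tau)^2} + (1 - b\tau)^{2k}\sigma_0^2.$$
From these two equations the long-term mean and variance can be derived.
For $k \to \infty$ (and provided that $|1 - b\tau| < 1$):
$$\mu_k \to \frac{a}{b}, \qquad \sigma_k^2 \to \frac{\sigma^2\tau}{1 - (1 - b\tau)^2}.$$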
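As a quick numerical check of these limits, the recursions for the mean and the variance can be iterated directly. This is a small sketch under assumed parameter values (daily steps with $\tau = 1/252$ and hypothetical $a$, $b$, $\sigma$); the function names are not from the thesis.

```python
def ou_moments(a, b, sigma, tau, mu0, var0, k):
    """Iterate the mean and variance recursions of (3.1) for k steps."""
    mu, var = mu0, var0
    for _ in range(k):
        mu = a * tau + (1.0 - b * tau) * mu
        var = (1.0 - b * tau) ** 2 * var + sigma ** 2 * tau
    return mu, var

def ou_long_run(a, b, sigma, tau):
    """Limits of the recursions, valid when |1 - b*tau| < 1."""
    return a / b, sigma ** 2 * tau / (1.0 - (1.0 - b * tau) ** 2)

# Hypothetical parameter values with daily steps (tau = 1/252).
a, b, sigma, tau = 0.5, 2.0, 0.3, 1.0 / 252.0
mu_k, var_k = ou_moments(a, b, sigma, tau, mu0=1.0, var0=0.0, k=5000)
mu_inf, var_inf = ou_long_run(a, b, sigma, tau)
```

After enough steps the iterated moments agree with the closed-form limits regardless of the starting values `mu0` and `var0`.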
The state equation can be rewritten in the following way:
$$x_k = A + Bx_{k-1} + C\epsilon_k, \tag{3.2}$$
where $A = a\tau$, $B = (1 - b\tau)$ and $C = \sigma\sqrt{\tau}$.
The latent variable $\{x_k\}$ defined above is used in the measurement equation, which defines
the observed spread $\{y_k\}$ as a mean reverting process with noise:
$$y_k = x_k + D\omega_k, \tag{3.3}$$
where $D > 0$ and $\omega_k \sim N(0, 1)$.
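The state and measurement equations can be simulated in a few lines. The sketch below is only an illustration: the helper names, the seed and the parameter values are assumptions, and the noise terms are drawn independently as in (3.2) and (3.3).

```python
import math
import random

def to_state_space(a, b, sigma, tau):
    """Map (a, b, sigma, tau) to A = a*tau, B = 1 - b*tau, C = sigma*sqrt(tau)."""
    return a * tau, 1.0 - b * tau, sigma * math.sqrt(tau)

def simulate_spread(A, B, C, D, n, x0, seed=0):
    """Draw a latent path from (3.2) and noisy observations from (3.3)."""
    rng = random.Random(seed)
    x, xs, ys = x0, [], []
    for _ in range(n):
        x = A + B * x + C * rng.gauss(0.0, 1.0)      # state equation (3.2)
        xs.append(x)
        ys.append(x + D * rng.gauss(0.0, 1.0))       # measurement equation (3.3)
    return xs, ys

A, B, C = to_state_space(a=0.5, b=2.0, sigma=0.3, tau=1.0 / 252.0)  # hypothetical
xs, ys = simulate_spread(A, B, C, D=0.05, n=250, x0=0.25)

# With C = D = 0 the path is deterministic: x_k = A + B * x_{k-1}.
xs0, ys0 = simulate_spread(1.0, 0.5, 0.0, 0.0, 3, x0=0.0)
```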
The model described above has three major advantages from a theoretical point of view.
The first one is rather obvious, namely the model is mean reverting. This is exactly what is
required of the spread between two stocks to implement a successful pairs trading strategy.
The second advantage is that the model for the spread is continuous in time, such that it is
convenient for forecasting purposes. Critical questions for pairs trading, such as the expected
holding period of the portfolio and the expected return of the strategy, can therefore be answered.
The third advantage is that the model is completely tractable. All the parameters can be
estimated using the Kalman filter and a maximum likelihood procedure called the EM algorithm.
In the next two sections, the Kalman filter and the EM algorithm will be discussed in detail.
3.2 Kalman Filter
To estimate the above dynamical system of the stochastic spread model, a very useful tool
called the Kalman filter (named after R.E. Kalman (Kalman, 1960))
is introduced. The Kalman filter is an algorithm for calculating linear least squares forecasts
of the state vector on the basis of data observed through time $t$,
$$\hat{x}_{t+1|t} = E[x_{t+1} \mid \mathcal{Y}_t],$$
where $\mathcal{Y}_t = (y_t, y_{t-1}, \ldots, y_1)$ collects the observations up to time $t$. The Kalman filter
calculates these forecasts recursively, generating $\hat{x}_{1|0}, \hat{x}_{2|1}, \ldots, \hat{x}_{t|t-1}$ in succession (Hamilton, 1994).
In this thesis, the Kalman filter is described as a four-step procedure and is based on the
description given in chapter 13 of the book of Hamilton (1994) and the paper of Elliott et al.
(2005). For convenience, the key features of a general state-space system are given first:
$$x_{t+1} = A + Bx_t + C\epsilon_{t+1}, \tag{3.4}$$
$$y_t = x_t + D\omega_t, \tag{3.5}$$
where $\omega$ and $\epsilon$ are both white noise processes.
For now it is assumed that the values of $A$, $B$, $C$ and $D$ are known, but later these parameters
are estimated with the use of the EM algorithm from Shumway and Stoffer (1982).
To begin the Kalman filtering, the starting point of the recursion has to be set. Typically,
the starting point of the recursion is set as $\hat{x}_{1|0} = E[x_1]$, which is just the unconditional mean
of $x_1$. The associated Mean Squared Error (MSE) of this starting point is therefore
$P_{1|0} = E[(x_1 - \hat{x}_{1|0})^2]$.
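After defining the starting point, the next step is to calculate the forecast of the state for
the following points in time:
$$\hat{x}_{k+1|k} = E[x_{k+1} \mid \mathcal{Y}_k] = A + B\hat{x}_{k|k}, \tag{3.6}$$
and the corresponding MSE is:
$$P_{k+1|k} = E[(x_{k+1} - \hat{x}_{k+1|k})^2] = B^2 P_{k|k} + C^2. \tag{3.7}$$
The second step of the Kalman filter is to forecast the observation $y_k$:
$$\hat{y}_{k|k-1} = E[y_k \mid \mathcal{Y}_{k-1}] = \hat{x}_{k|k-1}. \tag{3.8}$$
The MSE of this forecast is therefore equal to:
$$E[(y_k - \hat{y}_{k|k-1})^2] = P_{k|k-1} + D^2. \tag{3.9}$$
Next the inference about the current value of $x_k$ is updated on the basis of the observation
$y_k$ to produce
$$\hat{x}_{k|k} = E(x_k \mid y_k, \mathcal{Y}_{k-1}) = E(x_k \mid \mathcal{Y}_k). \tag{3.10}$$
Using the formula for updating a linear projection (Hamilton, 1994, p. 379) results in:
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + E[(x_k - \hat{x}_{k|k-1})(y_k - \hat{y}_{k|k-1})]\,\big(E[(y_k - \hat{y}_{k|k-1})^2]\big)^{-1}\,(y_k - \hat{y}_{k|k-1}), \tag{3.11}$$
which, shifted one period ahead, reads
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}(y_{k+1} - \hat{x}_{k+1|k}), \tag{3.12}$$
where $K_{k+1}$ stands for the Kalman gain and is given by:
$$K_{k+1} = P_{k+1|k}/(P_{k+1|k} + D^2). \tag{3.13}$$
The associated MSE is $P_{k+1|k+1} = (1 - K_{k+1})P_{k+1|k}$, which feeds back into (3.7).
The estimate $\hat{x}_{k+1|k+1}$ denotes the best estimate of $x_{k+1}$ given $\mathcal{Y}_{k+1}$.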
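The recursion can be written compactly for the scalar model. The following is a minimal sketch, not the implementation used in the thesis: the observations and parameter values are hypothetical, and the MSE update $P_{k|k} = (1 - K_k)P_{k|k-1}$ is the standard companion of (3.12) that closes the recursion in (3.7).

```python
def kalman_filter(ys, A, B, C, D, x0, P0):
    """Scalar Kalman filter for x_{k+1} = A + B x_k + C eps, y_k = x_k + D omega.

    x0 and P0 are the filtered mean and MSE before the first observation.
    Returns the filtered means x_{k|k}, MSEs P_{k|k} and gains K_k.
    """
    x_filt, P_filt = x0, P0
    means, mses, gains = [], [], []
    for y in ys:
        x_pred = A + B * x_filt                   # (3.6)
        P_pred = B * B * P_filt + C * C           # (3.7)
        K = P_pred / (P_pred + D * D)             # (3.13)
        x_filt = x_pred + K * (y - x_pred)        # (3.12)
        P_filt = (1.0 - K) * P_pred               # updated MSE (standard step)
        means.append(x_filt)
        mses.append(P_filt)
        gains.append(K)
    return means, mses, gains

observations = [0.30, 0.22, 0.28, 0.15, 0.21]     # hypothetical spread values
means, mses, gains = kalman_filter(observations, A=0.002, B=0.99,
                                   C=0.05, D=0.02, x0=0.25, P0=0.01)

# With D = 0 the observations are noise free and the gain equals one.
means0, _, gains0 = kalman_filter(observations, 0.002, 0.99, 0.05, 0.0, 0.25, 0.01)
```

The gain lies between zero and one: a small measurement noise $D$ pushes it towards one, so the filtered state then follows the observed spread closely.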
3.3 The EM Algorithm
The Kalman filter assumes that the parameters in the state-space model are specified in advance.
Normally, this is not the case and these parameters have to be estimated. One widely used
estimation method is described in the paper of Shumway and Stoffer (1982) and will also be
used in this thesis. In that paper the parameters are estimated
by maximum likelihood using the EM algorithm. Next, I will discuss this estimation
method.
In order to estimate the parameters of the state-space model defined by (3.4) and (3.5), the joint
log likelihood has to be specified for this model. The dependence on the unobserved time series
$\{x_k\}$ makes the specification of the likelihood function not straightforward. To
solve this problem, the EM algorithm works with the expectation of the log likelihood conditional
on the observed time series $y_1, \ldots, y_n$. Let us define the estimated parameters at the
$(r + 1)$st iterate as the values $\vartheta = (A, B, C^2, D^2)$ which maximize:
$$G(\vartheta) = E_r[\log L \mid y_1, \ldots, y_n], \tag{3.14}$$
where the conditional expectation $E_r$ is taken under the $r$th iterate values $A(r)$, $B(r)$, $C^2(r)$ and
$D^2(r)$. Furthermore, $\log L$ is the joint log likelihood of the complete data. The conditional
means and covariances specified by the Kalman filter are conditioned on the full
dataset, which gives smoothed estimators of $\{x_k\}$:
$$\hat{x}_{k|n} = E(x_k \mid \mathcal{Y}_n),$$
$$P_{k|n} = E[(x_k - \hat{x}_{k|n})^2],$$
$$P_{k,k-1|n} = E[(x_k - \hat{x}_{k|n})(x_{k-1} - \hat{x}_{k-1|n})].$$
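The EM algorithm is a two-step iterative procedure that finds a stationary value $\hat{\vartheta}$ of the
likelihood function in the following way:
step 1 (the E-step): Compute (with $\tilde{\vartheta} = \vartheta_j$):
$$Q(\vartheta, \tilde{\vartheta}) = E_{\tilde{\vartheta}}[\log L \mid y_1, \ldots, y_n],$$
step 2 (the M-step): Find
$$\vartheta_{j+1} \in \arg\max_{\vartheta} Q(\vartheta, \tilde{\vartheta}).$$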
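One EM iteration for the scalar model can be sketched as below. This is an illustrative implementation in the spirit of the smoother-based approach, not the code used in the thesis. It assumes a fixed initial state with mean `x0` and variance `P0` (the thesis does not state how the initial state is treated), uses the identity $P_{k,k-1|n} = J_{k-1}P_{k|n}$ with smoother gain $J_{k-1} = BP_{k-1|k-1}/P_{k|k-1}$ for the lag-one covariance, and is run on simulated data with hypothetical parameter values.

```python
import random

def em_step(ys, A, B, C2, D2, x0, P0):
    """One EM iteration for x_k = A + B x_{k-1} + C eps_k, y_k = x_k + D omega_k."""
    n = len(ys)
    # Forward pass: Kalman filter (index 0 holds the initial state).
    xp, Pp = [0.0] * (n + 1), [0.0] * (n + 1)
    xf, Pf = [x0] + [0.0] * n, [P0] + [0.0] * n
    for k in range(1, n + 1):
        xp[k] = A + B * xf[k - 1]
        Pp[k] = B * B * Pf[k - 1] + C2
        K = Pp[k] / (Pp[k] + D2)
        xf[k] = xp[k] + K * (ys[k - 1] - xp[k])
        Pf[k] = (1.0 - K) * Pp[k]
    # Backward pass: smoothed means, variances and lag-one covariances.
    xs, Ps, Pcs = xf[:], Pf[:], [0.0] * (n + 1)
    for k in range(n, 0, -1):
        J = B * Pf[k - 1] / Pp[k]
        xs[k - 1] = xf[k - 1] + J * (xs[k] - xp[k])
        Ps[k - 1] = Pf[k - 1] + J * J * (Ps[k] - Pp[k])
        Pcs[k] = J * Ps[k]                        # Cov(x_k, x_{k-1} | y_1..y_n)
    # E-step moments entering the expected complete-data log likelihood.
    s1, s0 = sum(xs[1:]), sum(xs[:-1])
    S11 = sum(xs[k] ** 2 + Ps[k] for k in range(1, n + 1))
    S00 = sum(xs[k] ** 2 + Ps[k] for k in range(0, n))
    S10 = sum(xs[k] * xs[k - 1] + Pcs[k] for k in range(1, n + 1))
    # M-step: closed-form maximisers.
    B_new = (n * S10 - s1 * s0) / (n * S00 - s0 * s0)
    A_new = (s1 - B_new * s0) / n
    C2_new = (S11 + n * A_new ** 2 + B_new ** 2 * S00 - 2 * A_new * s1
              - 2 * B_new * S10 + 2 * A_new * B_new * s0) / n
    D2_new = sum((ys[k - 1] - xs[k]) ** 2 + Ps[k] for k in range(1, n + 1)) / n
    return A_new, B_new, C2_new, D2_new

# Simulated data with hypothetical true parameters.
rng = random.Random(1)
x, ys = 0.0, []
for _ in range(300):
    x = 0.02 + 0.9 * x + 0.1 * rng.gauss(0.0, 1.0)
    ys.append(x + 0.05 * rng.gauss(0.0, 1.0))

params = (0.0, 0.5, 0.05, 0.05)                   # starting values A, B, C^2, D^2
for _ in range(20):
    params = em_step(ys, *params, x0=0.0, P0=1.0)
A_hat, B_hat, C2_hat, D2_hat = params
```

In practice the iteration would be repeated until the likelihood or the parameter values stop changing; the fixed count of 20 iterations here is only for illustration.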
Figure 3.1 shows a generated spread (with the parameters in Elliott et al. (2005)) and
the fitted values of this spread using the stochastic spread approach.
[Figure 3.1: The fitted values of the Stochastic Spread approach (green line) and the simulated
spread (blue line). Horizontal axis: Days; vertical axis: Spread.]
Chapter 4
Trading design
This chapter discusses the trading strategy used in this thesis. In the first section, the trading
period is described. The second section sets out the pairs selection criteria for the two model
based approaches described in the previous chapters. In the third section, the mean-variance
optimization theory of Markowitz (1952) for determining the optimal number of positions in the
spread is discussed.
4.1 Trading period
The data used in this thesis consist of daily closing prices of the stocks of the Amsterdam Stock
Exchange (AEX) in the period from 1 January 2006 until 30 December 2011 and were
obtained from Thomson Reuters through Datastream Advance. Since an equilibrium between two
stocks is not very likely to persist over the whole period covered by the dataset, the data are
divided into short blocks of formation periods and adjacent trading periods. The number of days
in the formation period is chosen arbitrarily and set to 128, 256 and 512 days. The adjacent
trading period is set to half of the trading days of the formation period, as is done in earlier
literature (Gatev et al. (2006), Yakop (2011)). In the trading period, positions in the spread are
opened following either the mean-variance optimization procedure (discussed at the end of this
chapter) or the two-standard-deviation strategy. Any remaining open positions in the spread are
closed at the end of the trading period.
A rolling window of 40 trading days will be used to start a new formation period. The result
of implementing a rolling window is that after the first 128, 256 or 512 days (which are the
different lengths of the formation periods), all the remaining days in the dataset will be used
for trading and no opportunities are lost.
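One possible reading of this scheme is sketched below: a new formation period starts every 40 trading days, and each is followed by a trading period of half its length. The thesis does not spell out the exact indexing, so the function name, the exclusive end indices and the truncation at the end of the sample are assumptions.

```python
def rolling_windows(n_days, formation_len, step=40):
    """Return (formation_start, trading_start, trading_end) index triples.

    The trading period is half the formation period and starts right after
    it; a new formation period starts every `step` days. End indices are
    exclusive, and the last trading period is truncated at the sample end.
    """
    trading_len = formation_len // 2
    windows = []
    start = 0
    while start + formation_len < n_days:
        trading_start = start + formation_len
        trading_end = min(trading_start + trading_len, n_days)
        windows.append((start, trading_start, trading_end))
        start += step
    return windows

windows = rolling_windows(n_days=400, formation_len=128)
```

With a hypothetical sample of 400 days and a 128-day formation period, every day after the first 128 falls into at least one trading period.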
4.2 Pairs selection
This section describes the criteria for selecting a suitable pair for the different methods.
4.2.1 Cointegration approach
As mentioned in chapter 2, $\{p_{it}\}$ is assumed to have a unit root and to follow a random walk
model: $p_{it} = p_{i,t-1} + r_{it}$. This assumption is tested with the ADF test and if the null hypothesis
(a unit root) is not rejected, the series $\{p_{it}\}$ is selected.
After selecting the time series $\{p_{it}\}$, all the different combinations of pairs are tested for
cointegration by the Johansen test procedure. With $x_t = (p_{1t}, p_{2t})'$, the model specified for testing is:
$$\Delta x_t = \alpha(\beta' x_{t-1} - \mu_w) + c_0 + \Phi_1 \Delta x_{t-1} + \epsilon_t,$$
where $\mu_w$ is the intercept and $c_0$ the deterministic trend.
Pairs for which the first null hypothesis, $m = 0$, is rejected and the second, $m = 1$, is not
rejected are selected as suitable pairs and have a mean reverting spread $w_t$ with mean $\mu_w$. The
spread portfolio is $w_t = p_{1t} - \beta p_{2t}$. So against one share of stock 1, $\beta$ shares of stock 2 are
held; $\alpha$ is the speed-of-adjustment parameter.
4.2.2 Stochastic spread approach
To select a pair suitable for trading, all the different combinations of spreads are estimated with
the EM algorithm and Kalman filter as discussed in chapter 3. After estimating the parameters
of the model, the parameter $B$ of the state equation is evaluated. If $0 < B < 1$,
the spread shows mean reverting behaviour and the pair is selected for trading. The number of
positions taken in the spread is again obtained using the mean-variance optimization strategy
discussed below.
4.3 Mean-Variance optimization
This section will describe the mean-variance optimization procedure (MV), used for determining
the number of positions in a pairs trade. The concept of mean-variance optimization was first
introduced by Markowitz (1952). The main purpose of Markowitz's paper was to mathematically
explain the behaviour of investors to diversify their portfolio. Markowitz claims that investors
do not only maximize the expected return of a portfolio, but also consider the variance of
the returns. In this thesis I will use Markowitz's "expected returns-variance of returns" rule to
optimize the number of positions held in a spread portfolio.
The rationale behind the optimization of the number of positions in a spread portfolio lies in the
mean reverting behaviour of the spread of a pairs trade. No matter how big the deviation from
the mean, the spread is always expected to revert back to its long-term equilibrium value. In
earlier literature about pairs trading, a fixed position in the portfolio is opened after the spread
hits a pre-set threshold some distance away (two standard deviations) from the long-term mean
(Yakop (2011), Gatev et al. (2006), Vidyamurthy (2004)). After hitting the threshold value, the
position is held until the spread reverts back to the mean. When this happens, the position is
unwound and a profit is made. In the time that has passed between opening and closing the
position, the spread could have been significantly larger than it was when the trader first opened
the position. If this is the case, the trader can generate a much bigger profit by taking on more
positions proportional to the size of the spread.
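The fixed-position threshold rule described above can be sketched as a small state machine. The function name, the sign convention (+1 long the spread, -1 short the spread) and the example path are assumptions for illustration; the mean and standard deviation would come from the formation period.

```python
def two_sigma_positions(spread, mean, std, threshold=2.0):
    """Position in the spread at each time: +1 long, -1 short, 0 flat."""
    position, positions = 0, []
    for s in spread:
        if position == 0:
            if s > mean + threshold * std:
                position = -1        # spread too high: short the spread
            elif s < mean - threshold * std:
                position = 1         # spread too low: long the spread
        elif position == -1 and s <= mean:
            position = 0             # reverted to the mean: unwind
        elif position == 1 and s >= mean:
            position = 0
        positions.append(position)
    return positions

spread = [0.0, 1.0, 2.5, 1.5, 0.5, -0.1, -2.5, -1.0, 0.2]   # hypothetical
positions = two_sigma_positions(spread, mean=0.0, std=1.0)
```

The position size never changes while a trade is open, which is exactly the limitation the mean-variance rule in this section tries to address.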
In this thesis, the opportunity to generate a higher profit in a trade is explored by varying the
number of positions. The positions taken in a spread are optimized by using a utility function
based on the aforementioned principle of the expected returns-variance of returns by Markowitz
(1952), namely:
$$U_t(wp_{t+1}) = E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] - \gamma\, Var_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right],$$
where $wp_t$ is the amount of wealth of the portfolio at time $t$ and $\gamma$ is a constant that
measures the risk aversion of the trader (and is set to one when the strategy is evaluated). In
the paper of Markowitz (1952) it is stressed that finding reasonable values for
$E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right]$ and $Var_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right]$ by using reliable statistical techniques is
essential. Both the stochastic spread and the cointegration approach have these favourable
characteristics. Now, let us define $return_{t+1}$ as the value of a portfolio at time $t + 1$ that
invested one dollar in the spread at time $t$. Using this definition for $return_{t+1}$, the expected
return and variance can be evaluated using the following equations:
$$E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] = z_t \times E_t\left[\frac{return_{t+1}}{wp_t}\right],$$
$$Var_t[r_{t+1}] = z_t^2 \times Var_t\left[\frac{return_{t+1}}{wp_t}\right],$$
where $z_t$ represents the number of positions taken in the spread portfolio. The value of
$E_t[return_{t+1}]$ is calculated with the use of the parameters estimated in the formation period.
The value of $Var_t[return_{t+1}]$ is estimated in the formation period and is assumed to be constant
in the trading period.
The number of positions taken in the spread at any point in time can now be calculated by
maximizing the utility function with respect to $z_t$. The first order condition is given by:
$$\frac{\partial U_t(z_t)}{\partial z_t} = E_t\left[\frac{return_{t+1}}{wp_t}\right] - 2\gamma z_t Var_t\left[\frac{return_{t+1}}{wp_t}\right] = 0.$$
Since the second derivative of the utility function is always negative ($\gamma > 0$, $Var_t[return_{t+1}] > 0$),
solving this first order condition for $z_t$ gives the number of positions to be taken in the spread
that maximizes the utility function. This optimal value of $z_t$ at any point in time is given by:
$$z_t = \frac{E_t[return_{t+1}]}{2\gamma Var_t[return_{t+1}]} \times wp_t.$$
The return $r_{t+1}$ of this strategy is given by:
$$r_{t+1} = \frac{wp_{t+1} - wp_t}{wp_t} = z_t \times \frac{return_{t+1}}{wp_t}.$$
When the optimal value of $z_t$ is used, the return of the strategy is as follows:
$$r_{t+1} = \frac{wp_{t+1} - wp_t}{wp_t} = \frac{E_t[return_{t+1}]}{2\gamma Var_t[return_{t+1}]} \times return_{t+1}.$$
It can be seen that the returns of this strategy do not depend on the value of $wp_t$.
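The optimisation can be checked with a few lines of code. The sketch below evaluates the utility function, the closed-form optimum and the resulting strategy return; the forecast values are hypothetical and the function names are assumptions. It uses $Var_t[return_{t+1}/wp_t] = Var_t[return_{t+1}]/wp_t^2$, which holds because $wp_t$ is known at time $t$.

```python
def utility(z, exp_ret, var_ret, wealth, gamma=1.0):
    """U = z E[return]/wp - gamma z^2 Var[return]/wp^2."""
    return z * exp_ret / wealth - gamma * z * z * var_ret / wealth ** 2

def optimal_positions(exp_ret, var_ret, wealth, gamma=1.0):
    """z_t = E[return] / (2 gamma Var[return]) * wp_t."""
    return exp_ret / (2.0 * gamma * var_ret) * wealth

def strategy_return(realised_ret, exp_ret, var_ret, wealth, gamma=1.0):
    """r_{t+1} = z_t * return_{t+1} / wp_t at the optimal z_t."""
    z = optimal_positions(exp_ret, var_ret, wealth, gamma)
    return z * realised_ret / wealth

# Hypothetical one-step forecasts for the spread.
exp_ret, var_ret = 0.004, 0.0016
z_small = optimal_positions(exp_ret, var_ret, wealth=100.0)
z_large = optimal_positions(exp_ret, var_ret, wealth=1000.0)
```

The number of positions scales with wealth, while the strategy return for a given realised spread return is the same at both wealth levels.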
Chapter 5
Evaluation
This chapter gives an evaluation of the results of the two model based approaches discussed in
chapters 2 and 3. The structure of this chapter is as follows. First, the definition of the Sharpe
ratio is given and a few concerns with the calculation of Sharpe ratios, as explained in the master
thesis of Yakop (2011), are discussed. In the second section, the results for both approaches are
given. The last section gives out-of-sample results of the different pairs trading strategies.
5.1 Sharpe ratio
A common way to compare the returns of different trading strategies is to calculate
the "reward-to-variability" ratio, nowadays also called the Sharpe ratio, introduced by Sharpe (1966).
The Sharpe ratio gives the ratio of the excess expected return of an investment to its return
volatility. In formula,
$$SR = \frac{E[r_t] - r_f}{\sigma}, \tag{5.1}$$
where $E[r_t]$ and $\sigma$ are the expected return and standard deviation of the return series $\{r_t\}$, and $r_f$
is the average return earned by the benchmark in the evaluated period. The risk-free rate is
usually assumed to be an adequate benchmark for comparing the returns of the strategy. As
discussed in Yakop (2011), an adequate benchmark should act as an appropriate substitute for
pairs trading. Therefore, Yakop (2011) did not use the risk-free rate, but the composite index
of the stocks, in this case the AEX index. When calculating the Sharpe ratio with equation 5.1,
$r_f$ is therefore set to zero. Afterwards, the calculated Sharpe ratios of the different trading
strategies are compared to the Sharpe ratios of the AEX index.
The estimate of the Sharpe ratio, $\widehat{SR}$, is found by substituting $\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} r_t$ for $E[r_t]$
and $\hat{\sigma} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(r_t - \hat{\mu})^2}$ for $\sigma$, which are the estimated mean and standard deviation of the
return series. As discussed in Yakop (2011), since $\widehat{SR}$ is based on $\hat{\mu}$ and $\hat{\sigma}$ (which are estimated
with some error), $\widehat{SR}$ is also estimated with some error. Denoting the vector $(\hat{\mu}\ \hat{\sigma})'$ by $\hat{\theta}$ and
the SR formula in equation 5.1 by $g(\theta)$, Lo (2002) shows that the asymptotic distribution of the
SR estimator is given by:
$$\sqrt{T}(\widehat{SR} - SR) \overset{a}{\sim} N(0, V_{GMM}), \qquad V_{GMM} = \frac{\partial g}{\partial \theta}\,\Sigma\,\frac{\partial g}{\partial \theta'}.$$
The estimation of $\frac{\partial g}{\partial \theta}$ and $\Sigma$ and the derivation of the asymptotic distribution are not done in
this thesis. Interested readers are referred to Appendix A of Yakop (2011).
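The point estimate itself is straightforward to compute. The sketch below follows the $1/T$ definitions of the estimated mean and standard deviation given above, with the benchmark return set to zero by default; the return series is hypothetical and the function name is an assumption.

```python
import math

def sharpe_ratio(returns, benchmark=0.0):
    """Estimated Sharpe ratio using the 1/T mean and standard deviation."""
    T = len(returns)
    mu = sum(returns) / T
    sigma = math.sqrt(sum((r - mu) ** 2 for r in returns) / T)
    return (mu - benchmark) / sigma

daily_returns = [0.01, 0.03, 0.02, 0.02]      # hypothetical return series
sr = sharpe_ratio(daily_returns)
```

This is the ratio per observation period; no annualisation is applied in this sketch.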
Furthermore, Yakop (2011) discusses two limitations of the use of Sharpe ratios. The first
limitation of the Sharpe ratio is that it implicitly assumes the return series to be normally
distributed or at least approximately so. In practice, pairs trading strategies produce frequent
small positive returns with occasional large losses, which will overstate the Sharpe ratios
because of the excess skewness and kurtosis (Lo, 2002).
The second limitation of the use of the Sharpe ratio is that it ignores any underlying serial
correlation, which is frequently present in financial time series. The consequence of the serial
correlation is, again, that it results in overestimation of the Sharpe ratios (Lo, 2002). To resolve
this issue, the standard deviation of the return series, $\sigma$, has to be estimated by the Newey-West
(1987) heteroskedasticity and autocorrelation consistent estimator of variance (HAC estimator). The
HAC estimator is used when calculating the Sharpe ratios of the return series. The derivation of
the HAC estimator is not done in this thesis, but can also be found in Appendix A of
Yakop (2011).
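A Newey-West variance estimate with Bartlett weights can be sketched as follows. This is a generic textbook form, not necessarily the exact variant derived in Yakop (2011): the truncation lag `max_lag`, the way the estimate enters the Sharpe ratio and the example series are assumptions.

```python
import math

def newey_west_variance(returns, max_lag):
    """Newey-West long-run variance with Bartlett weights 1 - j/(max_lag + 1)."""
    T = len(returns)
    mu = sum(returns) / T
    dev = [r - mu for r in returns]
    lrv = sum(d * d for d in dev) / T                       # gamma_0
    for j in range(1, max_lag + 1):
        gamma_j = sum(dev[t] * dev[t - j] for t in range(j, T)) / T
        lrv += 2.0 * (1.0 - j / (max_lag + 1.0)) * gamma_j
    return lrv

def hac_sharpe_ratio(returns, max_lag, benchmark=0.0):
    """Sharpe ratio with the HAC standard deviation in the denominator."""
    mu = sum(returns) / len(returns)
    return (mu - benchmark) / math.sqrt(newey_west_variance(returns, max_lag))

series = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]      # hypothetical
```

With `max_lag=0` the estimator reduces to the ordinary $1/T$ variance; positive autocorrelation at the included lags raises the estimate and therefore lowers the Sharpe ratio.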
5.2 Results
In this section the results of pairs trading with the Stochastic Spread approach and the Cointe-
gration approach are given.
5.2.1 Stochastic Spread Approach
As mentioned in the third chapter, the Stochastic Spread model has three major advantages
from a theoretical point of view. The model captures mean reversion, is continuous in time and
is completely tractable. Despite these promising properties of the model, the empirical
results turn out to be less favourable.
First of all, it takes a long time to estimate the parameters of one spread, let alone those of
the 276 different spreads available in the AEX (consisting of 24 stocks). To give an indication
of the time needed to estimate these spreads: a single formation period already takes forty-two
minutes. There are seven formation periods in this dataset. So the estimation of all the different
pairs in the dataset would take roughly five hours.
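The numbers in this paragraph follow from simple arithmetic, reproduced here as a check. The variable names are illustrative; the inputs (24 stocks, 42 minutes, seven formation periods) are the ones stated in the text.

```python
import math

n_stocks = 24
n_pairs = math.comb(n_stocks, 2)          # unordered pairs of distinct stocks

minutes_per_formation_period = 42
n_formation_periods = 7
total_hours = minutes_per_formation_period * n_formation_periods / 60.0
```

This gives 276 pairs and a total of 4.9 hours, in line with the "roughly five hours" quoted above.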
This first disadvantage is inconvenient, but can be overcome by the use of
faster computers (or patience). However, another disadvantage is more problematic. After the
estimation of all the different spreads, the number of pairs found suitable for pairs trading was
minimal. For example, the first formation period resulted in five suitable pairs. This is not
much, given that there are 276 different pairs available.
Also, the parameters estimated for the pairs selected by this method suggest that the
model can be simplified to a simple AR(1) model for the spread. Specifically, the parameter D
in the state-space equations is estimated to be at most 0.001. This suggests that the state-space model
can be reduced to the state equation, which is just a simple AR(1) model for the spread.
This AR(1) model has already been tested extensively in the context of pairs trading in Yakop
(2011) and will therefore not be analysed further in this thesis.
So, despite the favourable theoretical properties, the use of the stochastic spread model for
pairs trading, which was suggested by Elliot et al. (2005), does not turn out to be a good
approach for pairs trading in practice.
Parameters of selected pairs        Values
Number of possible pairs            276
Average number of selected pairs    5
A                                   0.0062
B                                   0.9845
C                                   0.2660
D                                   0.0007
Table 5.1: Estimation results of Stochastic Spread Approach
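To illustrate why such a small D makes the measurement equation nearly redundant, the sketch
below simulates the model with the average estimates from table 5.1. It assumes the discrete-time
form of Elliot et al. (2005), x_{k+1} = A + B x_k + C e_{k+1} and y_k = x_k + D w_k with
standard normal noise terms e and w; the function name, the seed and the sample length are
choices made for this illustration only.

```python
import numpy as np

# Average estimates reported in table 5.1.
A, B, C, D = 0.0062, 0.9845, 0.2660, 0.0007


def simulate_spread(n, seed=0):
    """Simulate the hidden state x and the observed spread y (assumed model form)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = A / (1.0 - B)  # start at the long-run mean of the state
    for k in range(n - 1):
        x[k + 1] = A + B * x[k] + C * rng.standard_normal()
    y = x + D * rng.standard_normal(n)  # measurement equation
    return x, y


x, y = simulate_spread(512)

# Share of the observed variance that is due to measurement noise.
noise_share = np.var(y - x) / np.var(y)
long_run_mean = A / (1.0 - B)
half_life = np.log(0.5) / np.log(B)  # periods until a deviation halves

print("measurement-noise share of variance:", noise_share)
print("long-run mean:", long_run_mean)
print("half-life in days:", half_life)
```

With these values the long-run mean A/(1 - B) equals 0.4 and the half-life ln(0.5)/ln(B) is
roughly 44 trading days, while the observed series is practically indistinguishable from the
AR(1) state.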
(a) Fitted values of spread in FP, with B = 0.9827
(b) Actual spread in the trading period
Figure 5.1: Example of a Pair selected with the Stochastic Spread Approach
5.2.2 Cointegration Approach
In contrast to the stochastic spread approach, the results of the cointegration approach are useful
for evaluating a pairs trading strategy. To begin the evaluation of the cointegration approach, an
overview of the specifics of the dataset and the parameters used in the analyses is given in table
5.2. As can be seen from table 5.2, results are estimated for three different lengths of the formation period
(128, 256 and 512 days, respectively) and the adjacent trading periods. For each of these
lengths, all the possible combinations of pairs (in this case 276 pairs) are tested with
the Johansen cointegration trace test described in section 2.3 (with a significance level of 0.05). The
average number of pairs found by this test for the different formation periods is also stated in
table 5.2.
Parameter  Description                 Values
D          Number of trading days      1316
S          Number of stocks            23
RW         Rolling window              40
FP         Formation period            128 days   256 days   512 days
TP         Trading period              64 days    128 days   256 days
NT         Number of trading periods   28         23         13
NP         Average number of pairs     19         28         35
Table 5.2: Parameters for trading strategy
The graphs in figure 5.2 show the behaviour of two different pairs during the
formation and trading period. As can be seen from the graphs, both pairs show periods
of divergence and convergence during the formation period. This mean-reverting behaviour is
the key to a profitable pairs trading strategy and is present in all pairs selected in the formation
period. Unfortunately, some of the pairs selected during the formation period will not display
the same behaviour during the trading period (see graph d). As a result, losses will be made
on these pairs. For the pairs trading strategy to be a success, the pairs that do show
mean-reverting behaviour should make up for the probable losses incurred on these "bad" pairs.
(a) Spread FP: Aegon, Heineken
(b) Spread TP: Aegon, Heineken
(c) Spread FP: PostNL, Unibail-Rodamco
(d) Spread TP: PostNL, Unibail-Rodamco
Figure 5.2: Example of Pairs (horizontal axis: Days, vertical axis: Spread)
I now present the main results of the cointegration approach. Table 5.3 contains the
calculated Sharpe ratios of the cointegration approach using the mean-variance optimization
trading strategy. The Sharpe ratios are calculated on the basis of the daily returns and therefore
look small. Conversion of these daily SRs to annual SRs is commonly done by multiplying the
SRs by $\sqrt{250}$. This is known as 'time aggregation' within finance. Lo (2002), however, shows that
this rule is statistically incorrect because of the serial correlation underlying financial
returns, which can result in extreme overestimation of the SRs. Therefore, only the estimated
daily SRs are included in this thesis.
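The effect of serial correlation on time aggregation can be sketched with the scaling factor
from Lo (2002), eta(q) = q / sqrt(q + 2 * sum_{k=1}^{q-1} (q - k) * rho_k), where rho_k is the
k-th order autocorrelation of the returns and the aggregated Sharpe ratio is eta(q) times the
daily one. The function name and the geometrically decaying autocorrelations with coefficient
0.1 are illustrative assumptions, not estimates from the thesis data.

```python
import numpy as np


def lo_scaling_factor(q, autocorr):
    """Factor that converts a one-period Sharpe ratio into a q-period one (Lo, 2002)."""
    rho = np.asarray(autocorr, dtype=float)
    if rho.size != q - 1:
        raise ValueError("need exactly q - 1 autocorrelations")
    k = np.arange(1, q)
    return q / np.sqrt(q + 2.0 * np.sum((q - k) * rho))


q = 250  # trading days per year, as in the text

# Without serial correlation the factor equals the usual square-root rule.
factor_iid = lo_scaling_factor(q, np.zeros(q - 1))

# Hypothetical positive serial correlation: rho_k = 0.1 ** k.
factor_ar1 = lo_scaling_factor(q, 0.1 ** np.arange(1, q))

print("factor without serial correlation:", factor_iid)
print("factor with rho_k = 0.1**k       :", factor_ar1)
```

With positive autocorrelation the factor falls below the square root of 250, so applying the
square-root rule to such returns would overstate the annual Sharpe ratio.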
Furthermore, it has to be noted that the calculation of the daily returns did not incorporate
transaction costs. Including transaction costs in the investigation would require some
creativity, since the difference between the bid and ask price of a stock is not reported (only
the daily closing prices are). The fee for making a transaction is also not commonly known.
Therefore, the inclusion of transaction costs within pairs trading warrants a study of
its own and is not dealt with further in this thesis.
As can be seen from the average Sharpe ratios of this strategy, the mean-variance optimization
suffers large losses for all the different formation period lengths. This is a remarkable result,
since this strategy is supposed to maximize the value of the portfolio. However, one critical
assumption of this strategy is that the selected pairs are mean reverting. If this
assumption is not met and a pair drifts away, the number of positions taken in the spread will
increase dramatically and huge losses will be incurred. The results show that there are too many
pairs that show this behaviour. Therefore, the average Sharpe ratios for the different formation
periods are negative.
Benchmark  Descriptive statistics SRs                Count
SR(AEX)    FP   Average  Max     Min      Std. Dev.  pos. SR  SR > SR(AEX)  Significant at 5%
0.0073     128  -0.0447  0.1461  -0.1523  0.0583     13       5             5
0.0046     256  -0.0444  0.0585  -0.1079  0.0436     8        3             3
0.0439     512  -0.0376  0.0041  -0.0665  0.0227     2        1             0
Table 5.3: SRs for different FP with MV optimized positions
On the other hand, the histogram in figure 5.3 (which shows the distribution of the SRs over
the different TPs) shows that if there are enough pairs that do mean revert in one formation
period, the SR of that period can be high (an SR of 0.15). Unfortunately, this does not happen
often enough and the overall results of this strategy are disappointing.
Figure 5.3: Histogram of the estimated SRs of the MV strategy for a formation period length of
128 days (horizontal axis: SRs, vertical axis: Frequency)
To compare the mean-variance strategy with a less risky strategy, I also calculated the Sharpe
ratios using the common two standard deviation (2STD) strategy for opening a position. This
strategy is not as risky as the mean-variance optimization, because it will only open one position
at a time. The results of this strategy are stated in table 5.4. It can be seen that the 2STD
strategy returns positive average Sharpe ratios for the three different formation periods, where
the formation period of 128 days has the highest average. In contrast to the mean-variance
strategy, the pairs that do not converge and drift away from the equilibrium will only incur
a loss of two times the standard deviation. These losses are clearly outweighed by the gains on all
the pairs that do behave as expected, which results in the positive average Sharpe ratios for all
the different trading periods.
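A minimal sketch of such a two standard deviation rule is given below. It assumes that the
spread has already been standardized with the mean and standard deviation from the formation
period, that a position is closed as soon as the spread crosses its mean, and that at most one
unit is held at a time; the function name and the example numbers are hypothetical and not
taken from the thesis.

```python
import numpy as np


def two_std_positions(z, entry=2.0):
    """Position per time step: +1 long the spread, -1 short the spread, 0 flat."""
    z = np.asarray(z, dtype=float)
    positions = np.zeros(z.size, dtype=int)
    current = 0
    for t, value in enumerate(z):
        if current == 0:
            if value >= entry:        # spread far above its mean: short it
                current = -1
            elif value <= -entry:     # spread far below its mean: go long
                current = 1
        elif (current == -1 and value <= 0.0) or (current == 1 and value >= 0.0):
            current = 0               # spread is back at its mean: close
        positions[t] = current
    return positions


# Hypothetical standardized spread over ten days.
z = np.array([0.5, 1.8, 2.3, 1.1, 0.2, -0.1, -2.4, -1.0, 0.3, 0.1])
positions = two_std_positions(z)

# Profit per day in units of the standardized spread:
# the position held at the end of day t earns the change from day t to t + 1.
pnl = positions[:-1] * np.diff(z)

print("positions:", positions)
print("total profit:", pnl.sum())
```

Because the rule holds at most one unit per pair, a pair that keeps drifting away adds to the
loss only through the spread itself, not through a growing number of positions.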
Benchmark  Descriptive statistics SRs                 Count
SR(AEX)    FP   Average  Max     Min      Std. Dev.  pos. SR  SR > SR(AEX)  Significant at 5%
0.0073     128  0.0209   0.0667  -0.0227  0.0233     25       15            11
0.0046     256  0.0147   0.0469  -0.0167  0.0221     19       12            9
0.0439     512  0.0081   0.0204  -0.0045  0.0092     10       2             2
Table 5.4: SR results of different FP with 2STD trigger opening of a position
5.3 Results using DAX index
To see if the results of the cointegration approach are robust, a second estimation of the
cointegration approach for both trading strategies is done. The second dataset consists of the
daily closing prices from the last five years of the DAX index (which includes the thirty biggest
listed German companies). The results of both trading strategies are given in table 5.5
below.
As can be seen in table 5.5, the MV strategy performs even worse in this dataset than it
did in the AEX dataset. The average daily SRs of the MV strategy for the different periods are
all negative and only in one TP does the MV strategy significantly outperform the DAX index
(FP: 128 days). The 2STD strategy (again) performs better than the MV strategy and generates
small positive average SRs in all the trading periods. The results of both pairs trading strategies
are much alike across the two datasets. Therefore, it can be concluded that the results obtained are
robust.
          Benchmark  Descriptive statistics SRs                 Count
Strategy  SR(DAX)    FP   Average  Max      Min      Std. Dev.  pos. SR  SR > SR(AEX)  Significant at 5%
MV        0.0551     128  -0.0843  -0.0119  -0.1829  0.0486     0        3             1
          0.0596     256  -0.0636  0.0165   -0.1056  0.0426     1        3             0
          0.0316     512  -0.0429  -0.0225  -0.0536  0.0105     0        1             0
2STD      0.0551     128  0.0210   0.0689   -0.0181  0.0215     17       8             5
          0.0596     256  0.0135   0.0429   -0.0069  0.0159     14       3             3
          0.0316     512  0.0069   0.0148   -0.0045  0.0030     6        3             3
Table 5.5: SR results of MV and 2STD for DAX index
Chapter 6
Conclusion
In this thesis two different model-based approaches for pairs trading were discussed and tested
with the use of two different trading strategies. Results were generated for the daily closing
prices of the stocks in the AEX index over the last five years. Furthermore, an out-of-sample
estimation was done to verify whether the results were robust.
The first approach for modelling the behaviour of a pair, the stochastic spread, was first
suggested (but not yet tested) by Elliot et al. (2005). From a theoretical point of view, the
stochastic spread has three major advantages: the model captures mean-reversion, is continuous
in time and is completely tractable. Despite these theoretical advantages, the empirical results
turn out to be less favourable. First of all, the stochastic spread approach found hardly
any pairs suitable for trading. Secondly, the estimated parameters of the state-space form of
the model suggested that the model could be simplified to only the state equation (which is just
an AR(1) model). This renders the estimation of the parameters with the EM algorithm and
Kalman filter unnecessary, since the AR(1) model is embedded in the other approach discussed
in this thesis. Therefore, this thesis presents only a few estimates and graphs of the spread
for this approach, and no actual pairs trading results.
The second approach for modelling the behaviour of a pair is the cointegration approach.
The idea of cointegration was already used for pairs trading in earlier work (Yakop, 2011;
Vidyamurhty, 2004). The approach in this earlier work, however, is more ad hoc and not
based on the error correction model (ECM), which is normally used in econometric research. In
this thesis the cointegration approach is based on the ECM and the pairs are tested with the
use of the Johansen cointegration test.
Subsequently, two trading strategies for taking a position in the spread were used to calculate
the results. The first one is the two standard deviations strategy (2STD). This strategy is
commonly used in the literature (Yakop, 2011; Vidyamurhty, 2004; Gatev et al., 2006). The
concept of this strategy is very simple: one takes a position in the spread if it is far enough
(two standard deviations) away from the mean and closes the position when the spread returns
to the equilibrium value. The second strategy is called the mean-variance approach (MV). As
the name suggests, the number of positions taken in the spread is determined by a trade-off
between the deviation of the spread from its mean and the variance of the spread. The spread
is expected to revert back to the mean and the MV strategy uses this assumption to maximize
the portfolio value by varying the number of positions taken in the spread.
The results of both strategies are in tables 5.3 and 5.4. The 2STD strategy generated
small positive average SRs over all the different formation periods. This result is typical for a pairs
trading strategy and is thus what one would expect. In contrast, the MV strategy generates
large negative SRs in all the formation periods. This is not what one would expect, because
this strategy aims to maximize the portfolio value by varying the number of positions in the
spread and should, consequently, perform well. However, one crucial assumption for the success
of this strategy, namely mean reversion, is not met by a large number of pairs. The number
of positions increases drastically in these pairs and the losses are substantial. This leads me to
the conclusion that the MV strategy might be too risky (in this case, at least) for pairs trading.
The estimation on the second dataset (DAX index) confirms this, because similar results were
generated. Given that two indices produced similar results, one can conclude that these
results are robust.
Further research in pairs trading should focus on other ways to optimize the trading strategy,
since the MV procedure did not generate the desired results. Furthermore, the inclusion of
transaction costs within pairs trading is a relevant topic that should be taken into account,
but has not yet been investigated. One could also investigate the concept of pairs trading for
more than two securities, such as "triple" or "quadruple trading". The cointegration approach
discussed in this thesis could be a good way of investigating this topic, since the existence of a
cointegration relation between three or four stocks can easily be tested within this framework.
Bibliography
Baronyan, S. R., Boduroglu, I. I., and Sener, E. (2010). Investigation of Stochastic Pairs Trading
Strategies under different Volatility Regimes. The Manchester School, pages 114–134.
Broda, S. (2011). Financial econometrics slides.
Do, B., Faff, R., and Hamza, K. (2006). A New Approach to Modeling and Estimation for Pairs
Trading. Working Paper, pages 1–30.
Elliot, M. J., van der Hoek, J., and Malcolm, W. (2005). Pairs Trading. Quantitative Finance,
5(3):271–276.
Engle, R. F. and Granger, C. W. (1987). Co-integration and Error Correction: Representation,
Estimation and Testing. Econometrica, 55(2):251–276.
Gatev, E., Goetzmann, W. N., and Rouwenhorst, K. G. (2006). Pairs Trading: Performance of
a Relative-Value Arbitrage Rule. Review of Financial Studies, 19(3):797–827.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.
Johansen, S. (1991). Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian
Vector Autoregressive Models. Econometrica, 59(6):1551–1580.
Kalman, R. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of
Basic Engineering, 82:35–45.
Lo, A. (2002). The Statistics of Sharpe Ratios. Financial Analysts Journal, July/August:36–52.
Markowitz, H. (1952). Portfolio Selection. Journal of Finance, 7(1):77–91.
Sharpe, W. (1966). Mutual Fund Performance. The Journal of Business, 39(1):119–138.
Shumway, R. and Stoffer, D. (1982). An Approach to Time Series Smoothing and Forecasting
using the EM Algorithm. Journal of Time Series Analysis, 3:253–264.
Tsay, R. S. (2010). Analysis of Financial Time Series. John Wiley and Sons, Inc., third
edition.
Vidyamurhty, G. (2004). Pairs Trading, Quantitative Methods and Analysis. John Wiley and
Sons, Inc.
Yakop, M. (2011). A Comparative Analysis of Pairs Trading. Master’s thesis, University of
Amsterdam.