1
High Frequency Futures Data
Ewan Kirk, CEO, Cantab Capital Partners LLP
Jan 2008
2
Introduction
A vast amount of information is produced in the financial markets every day
Typically the vast majority of the data is either discarded or ignored by both practitioners and researchers
With powerful computers and massive data storage more of this data can be analysed
But…
We’ve lived in a Gaussian or near-Gaussian financial world for our entire professional lives, and much of this data is not even close to Gaussian.
Concepts like “return” and “volatility” cease to have meaning.
There are huge amounts of data: 200 MB per contract per day.
In the univariate case, the data on a single contract is bursty and not evenly spaced.
In the multivariate case, the data is not contemporaneous.
There is no obvious intellectual framework.
But…
The financial opportunities are large.
3
A Descent into the data
20 years of SP500 data. Eyeball statistics tell you that this is log-normal with a drift. And, to a first approximation, it is.
But think hard about this. What is this data? Does it bear any resemblance to the market on the day? There are 5000 data points here and this is probably about as much as you can reasonably hope to work with.
[Chart: S&P 500 Spot Index; y-axis 200 to 1600; x-axis 15Jan88 to 1Jan06]
4
What about a month’s worth of data?
29 days of SP500 data. It’s pretty obvious here that you can’t say much statistically.
Oh, and don’t forget that there are weekends, holidays and early closing days in this graph.
[Chart: S&P 500 Spot Index; y-axis 1380 to 1520; x-axis 1Dec07 to 21Jan08]
5
Could we use more frequent data?
The futures market generates a lot of data, but there are lots of little issues with futures.
They’re not the same as the spot index (and in some cases, like oil, there isn’t a spot index). They also expire and roll. There is lots of tough stuff to worry about here, but let’s ignore all these issues and just look at the hourly data.
[Chart: E-Mini S&P 500 Index CME Nrby and S&P 500 Spot Index; y-axis 1380 to 1540; x-axis 1Dec07 to 21Jan08]
6
So here’s hourly data
Oh no! What’s gone wrong?
Well, there are zeros in the data stream.
[Chart: hloc2(ccp_rt_esh8.trdprc_1,-60); y-axis 0 to 1600; x-axis 12/1 7:30:00 to 1/31 7:30:00]
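A minimal sketch of the kind of cleaning step this calls for, assuming zero prices are placeholders for "no data" rather than real trades. The `zap_zeros` helper, the `(timestamp, price)` record layout and the sample values are all hypothetical, not the tooling used for the charts.

```python
from typing import Iterable, List, Tuple

# (timestamp label, price); this record layout is an assumption for illustration.
Tick = Tuple[str, float]


def zap_zeros(ticks: Iterable[Tick]) -> List[Tick]:
    """Drop records whose price is not strictly positive (treated as missing)."""
    return [(ts, px) for ts, px in ticks if px > 0.0]


# Hypothetical hourly prints with two zero placeholders mixed in.
raw = [("07:30", 1472.25), ("08:30", 0.0), ("09:30", 1471.50), ("10:30", 0.0)]
clean = zap_zeros(raw)
print(clean)
```

Dropping the zeros is only one choice. Carrying the last good price forward is another, and the two give different bar counts and different statistics.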
7
Let’s clean it up
Looking a bit better, but there are some pretty odd things happening here.
And let’s not forget that I’ve just said “hourly” but what does that mean? Average price over the hour (argh!), highest price, lowest price, last price, first price? When does an hour start and end? Last traded price or mean of last bid and offer? How much do you weight a price at 11pm on a Friday compared to 2pm on a Wednesday? What does 11pm and 2pm mean in this context?
680 data points
[Chart: zapz(hloc2(ccp_rt_esh8.trdprc_1,-60)); y-axis 1360 to 1540; x-axis 12/1 7:30:00 to 1/31 7:30:00]
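To make the "what does hourly mean?" question concrete, here is a small sketch that builds bars from ticks and keeps several candidate "hourly prices" side by side. The `make_bars` function, the epoch-second timestamps and the sample ticks are assumptions for illustration. The bar boundary convention (start inclusive, end exclusive) is just one possible answer to "when does an hour start and end?".

```python
from typing import Dict, Iterable, Tuple


def make_bars(ticks: Iterable[Tuple[float, float]], width_s: int = 3600) -> Dict[int, dict]:
    """Group (epoch_seconds, price) ticks into bars covering [start, start + width_s)."""
    bars: Dict[int, dict] = {}
    # Stable sort by time only, so ticks sharing a timestamp keep arrival order.
    for t, px in sorted(ticks, key=lambda rec: rec[0]):
        start = int(t // width_s) * width_s
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"open": px, "high": px, "low": px, "close": px, "sum": px, "n": 1}
        else:
            bar["high"] = max(bar["high"], px)
            bar["low"] = min(bar["low"], px)
            bar["close"] = px
            bar["sum"] += px
            bar["n"] += 1
    for bar in bars.values():
        # Unweighted mean of prints; a time- or volume-weighted mean would differ.
        bar["mean"] = bar["sum"] / bar["n"]
    return bars


# Hypothetical ticks: three in the first hour, one in the second.
ticks = [(0, 100.0), (1800, 103.0), (3599, 101.0), (3600, 99.0)]
bars = make_bars(ticks)
for start, bar in bars.items():
    print(start, bar["open"], bar["high"], bar["low"], bar["close"], round(bar["mean"], 3))
```

Even on four ticks, the first hour's open, high, close and mean are all different numbers, so "the hourly price" is not defined until one of them is chosen.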
8
Let’s zoom in some more. One Day
Hourly: 24 (?) points. 15 minute: 93 points. Minutely: 1387 points. Note that 4 days of minutely data is more than the 20 year S&P graph!
Clearly minutely data gives a lot more information than hourly and there are some interesting bits of structure here. But there are lots of times when not much is happening. How do we deal with this? Oh and the kurtosis is 45…
Even with minute data, we’re throwing away more than 99.5% of the data.
[Chart: one day sampled at three frequencies (Hourly, 15 Minutes, Minutes); y-axis 1370 to 1420; x-axis 1/15 0:00:00 to 1/16 0:00:00]
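A short sketch of how long quiet stretches punctuated by a few bursts produce a very large kurtosis. The `excess_kurtosis` function uses the plain population-moment definition (fourth central moment over squared variance, minus 3). The text does not say which convention its figure of 45 uses. The return series below is synthetic and purely illustrative.

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """Log returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def excess_kurtosis(xs: Sequence[float]) -> float:
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for a constant series")
    return m4 / (m2 * m2) - 3.0


# Synthetic "minutely returns": 98 minutes where nothing happens, 2 with a jump.
quiet_with_bursts = [0.0] * 98 + [1.0, -1.0]
print(excess_kurtosis(quiet_with_bursts))
```

With 98 zeros and two unit jumps the excess kurtosis comes out at about 47, the same order as the figure quoted above, without any exotic distribution being involved.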
9
Tick Data
Every time something happens on a futures exchange, the exchange sends out a message to every subscriber saying what has happened. So what can happen?
A trade. We get trade time (to the nearest millisecond, but there are lags), trade volume and, obviously, trade price.
The best bid can change and the size on the best bid can change.
The best offer can change and the size on the best offer can change.
In addition, there is “level data”, which is the next best bids and offers down 5 (or more) levels.
This screen flashes continuously pretty much all day…
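The message types described above can be sketched as a small event model. This is an illustrative structure only: the class names, the field names and the millisecond timestamps are assumptions, not the exchange's or any data vendor's actual message format, and level data beyond the best bid and offer is left out.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Trade:
    ts_ms: int      # exchange timestamp in milliseconds (subject to lags)
    price: float
    volume: int


@dataclass(frozen=True)
class QuoteUpdate:
    ts_ms: int
    side: str       # "bid" or "ask"
    price: float
    size: int


Event = Union[Trade, QuoteUpdate]


@dataclass
class TopOfBook:
    """Current best bid and offer plus last trade, rebuilt by replaying events."""
    bid: Optional[float] = None
    bid_size: Optional[int] = None
    ask: Optional[float] = None
    ask_size: Optional[int] = None
    last_trade: Optional[float] = None

    def apply(self, ev: Event) -> None:
        if isinstance(ev, Trade):
            self.last_trade = ev.price
        elif ev.side == "bid":
            self.bid, self.bid_size = ev.price, ev.size
        elif ev.side == "ask":
            self.ask, self.ask_size = ev.price, ev.size
        else:
            raise ValueError(f"unknown side: {ev.side!r}")


# Hypothetical message sequence; timestamps are made up.
book = TopOfBook()
events = [
    QuoteUpdate(0, "bid", 1397.25, 69),
    QuoteUpdate(0, "ask", 1397.50, 186),
    Trade(1, 1397.25, 1),
    QuoteUpdate(2, "bid", 1397.25, 68),   # only the size changed
]
for ev in events:
    book.apply(ev)
print(book)
```

Note that a size-only change is a full event in its own right, which is one reason the message count dwarfs the trade count.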
10
Tick Data
This is starting to look more interesting.
[Chart: zapz(ccp_rt_esh8.trdprc_1), zapz(ccp_rt_esh8.bid), zapz(ccp_rt_esh8.ask); y-axis 1370 to 1420; x-axis 1/15 0:00:00.000 to 1/16 0:00:00.000]
11
Zoom in (30 minutes)
This is a randomly chosen 30 minute window from 10:30 EST to 11:00 EST on the 15th of January.
[Chart: Trade, Bid, Ask; y-axis 1390 to 1400; x-axis 1/15 10:30:00.000 to 1/15 11:00:00.000]
What can we say about this data? The digital nature of the system is starting to become more obvious.
Oh and let’s not forget that there are 1400 trades in this period and 43488 changes of the bid or offer price or size.
In this single half hour, there is more data than in the entire history of the S&P series since 1945… And we get a new set every 30 minutes…
12
Zoom in even more (2 minutes)
[Chart: Trade, Bid, Ask (price axis 1395.5 to 1397.5); Traded Volume, Bid Size, Ask Size (size axis 0 to 1400); x-axis 1/15 10:38:00.000 to 1/15 10:40:00.000]
13
Zoom in even more (10 seconds)
[Chart: Trade, Bid, Ask (price axis 1396.75 to 1397.5); Traded Volume, Bid Size, Ask Size (size axis 0 to 650); x-axis 1/15 10:38:10.000 to 1/15 10:38:20.000]
14
Zoom in even more (1 second)
[Chart: Trade, Bid, Ask (price axis 1396.75 to 1397.5); Traded Volume, Bid Size, Ask Size (size axis 0 to 650); x-axis 1/15 10:38:13.000 to 1/15 10:38:14.000]
Rule of thumb, one second of data is equivalent to about 6 months of daily data.
Look at the interesting structure. Artifacts too!
Did the bid size change here? Nope
Trades happening on the bid
Simultaneous trades at bid and offer? Then nothing for over 2/10 of a second! Relative calm!
15
Here’s that same second in tabular format
Time      Trade    Bid      Ask      Traded Volume  Bid Size  Ask Size
10:38:13  1397.25  1397.25  1397.50  1              69        186
10:38:13  -        1397.25  1397.50  -              68        189
10:38:13  1397.50  1397.25  1397.50  20             68        180
10:38:13  -        1397.25  1397.50  -              71        181
10:38:13  -        1397.25  1397.50  -              76        173
10:38:13  -        1397.25  1397.50  -              76        160
10:38:13  1397.25  1397.25  1397.50  76             76        170
10:38:13  1397.00  1397.00  1397.25  250            134       9
10:38:13  1397.00  1397.00  1397.25  8              133       93
10:38:13  1397.00  1397.00  1397.25  5              111       122
10:38:13  1397.00  1397.00  1397.25  2              100       136
10:38:13  1397.00  -        -        1              -         -
10:38:13  1397.00  1397.00  1397.25  1              91        138
10:38:13  1397.00  1397.00  1397.25  20             91        146
10:38:13  1397.00  1397.00  1397.25  1              69        126
10:38:13  1397.00  1397.00  1397.25  1              68        126
10:38:13  1397.00  1397.00  1397.25  1              2         122
10:38:13  1397.00  1396.75  1397.00  2              611       58
10:38:13  -        1396.75  1397.00  -              596       59
10:38:13  -        1396.75  1397.00  -              564       35
10:38:13  -        1396.75  1397.00  -              544       50
Note that Excel (which was used to reformat the data) doesn’t understand times less than one second.
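One simple way to read "trades happening on the bid" off rows like these is to compare each trade price with the bid and ask shown on the same row. The sketch below does that for four of the trade rows from this second. The `classify` rule and its labels are an illustrative convention, and it assumes the quotes on a row are the ones that prevailed at the trade, which the lags mentioned earlier make far from certain.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (trade price, bid, ask, traded volume), taken from four rows of the second shown.
rows: List[Tuple[float, float, float, int]] = [
    (1397.25, 1397.25, 1397.50, 1),
    (1397.50, 1397.25, 1397.50, 20),
    (1397.00, 1397.00, 1397.25, 250),
    (1397.00, 1396.75, 1397.00, 2),
]


def classify(trade: float, bid: float, ask: float) -> str:
    """Label a trade by where it printed relative to the quoted bid and ask."""
    if trade <= bid:
        return "at_bid"      # a seller hit the bid
    if trade >= ask:
        return "at_ask"      # a buyer lifted the offer
    return "inside"


volume_by_side: Dict[str, int] = defaultdict(int)
labels = []
for trade, bid, ask, vol in rows:
    side = classify(trade, bid, ask)
    labels.append(side)
    volume_by_side[side] += vol

print(labels)
print(dict(volume_by_side))
```

The same price (1397.00) is classified as a bid-side trade in one row and an ask-side trade in another, because the quotes moved in between. That is exactly the sort of structure a one-second bar hides.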
16
Shall we make it more complicated? One minute in equities land.
I’ve removed bids/asks and sizes (but don’t forget the richness of that data)
FTSE and the S&P are both equities so they should be related but how? Not sure that the “correlation of returns” is really going to help here…
[Chart: SP500 Trade, FTSE Trade, Dax, CAC, STOXX; y-axis -2 to 6; x-axis 1/15 10:00:00.000 to 1/15 10:01:00.000]
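Before any "correlation of returns" can even be computed, the two tick streams have to be put on a common clock, because their trades do not happen at the same instants. One common convention is previous-tick (as-of) sampling: at each grid time, take the last price seen at or before it. The sketch below assumes that convention. The `asof` and `align` helpers, the one-second grid and all sample prices are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def asof(times: Sequence[float], values: Sequence[float], t: float) -> Optional[float]:
    """Last value observed at or before time t, or None if nothing has printed yet."""
    i = bisect_right(times, t) - 1
    return values[i] if i >= 0 else None


def align(grid: Sequence[float],
          a: Sequence[Tuple[float, float]],
          b: Sequence[Tuple[float, float]]) -> List[Tuple[float, float, float]]:
    """Sample two time-sorted (time, price) series on a shared grid."""
    a_t, a_v = zip(*a)
    b_t, b_v = zip(*b)
    out = []
    for t in grid:
        va, vb = asof(a_t, a_v, t), asof(b_t, b_v, t)
        if va is not None and vb is not None:   # skip until both have traded
            out.append((t, va, vb))
    return out


# Hypothetical trade times (seconds into the minute) and prices.
spx = [(0.10, 1396.75), (0.35, 1397.00), (2.40, 1397.25)]
ftse = [(0.80, 6025.0), (1.90, 6025.5)]
aligned = align([0.0, 1.0, 2.0, 3.0], spx, ftse)
print(aligned)
```

The grid spacing is a modelling choice with consequences. On a fine grid most sampled "returns" are zero simply because nothing traded, which tends to push measured correlation toward zero regardless of how related the markets are.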
17
Oh and different assets look different at micro scale
One minute, quite a lot of moves but maybe not as many trades?
[Chart: Crude Oil, Crude Bid, Crude Ask; y-axis 91.92 to 92.03; x-axis 1/15 10:00:00.000 to 1/15 10:01:00.000]
18
Here’s another one
The bund moves a lot less in one minute
[Chart: Bund, Bund Bid, Bund Ask (price axis 115.77 to 115.79); FGBLH8.BIDSIZE, FGBLH8.ASKSIZE (size axis 0 to 1000); x-axis 1/15 10:00:00.000 to 1/15 10:01:00.000]
19
Let’s not lose sight of the Amazonian rain forest for the trees
There are literally thousands of futures contracts. Hundreds of them produce this much data every second.
A conservative estimate is that since the start of electronic trading in 2002, the crude oil contract has produced 30 billion data points (5 levels deep bids and asks x 12 contracts x 255 days x 5 years).
Include equity indices, bonds, currency futures and other commodities, and it is close to 10 trillion data points.
This is a hugely richer data set than the usual SP500, Lehman Bond Index and “Oil” daily data that most people seem to do research on.
Apart from high energy physics, there probably aren’t very many areas where there is this much data which needs to be modelled.
But why do we want to model it?
20
Either “make money” or “don’t lose money”
Make Money (Statistical “Arbitrage”):
With all this information, can we predict where the next trade will be? Can we identify short term trends or short term mean reversion? Does the intra-day information tell us something about the next tick, the next 30 seconds, the next hour, the next day?
Forget your GARCH models: intra-day volatility is a lot better than GARCH at forecasting vol tomorrow.
Costs in these markets are tiny! FX is the best, with $1m of notional costing $3 to trade, which is 1/30th of the tightest bid-offer spread. FX spreads are often <1bp. Futures costs are considerably less than one tick and the markets are 1 tick wide most of the time. So if you can get a 2 tick move, you’re making money.
But how do you back test strategies? Just because you saw a trade at the bid doesn’t mean that you would have got done at the bid in your backtest. Queues, latency etc.
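The cost arithmetic in this section can be checked in a few lines. The sketch below converts the quoted FX cost into basis points and works out the net result of a 2 tick move when entering and exiting across a 1 tick wide market. The function names and the per-side cost of 0.1 tick are assumptions chosen to match "considerably less than one tick", not figures from the talk.

```python
def cost_in_bp(cost_usd: float, notional_usd: float) -> float:
    """Trading cost expressed in basis points of notional (1bp = 0.01%)."""
    return cost_usd / notional_usd * 1e4


def net_ticks(move_ticks: float, spread_ticks: float, cost_ticks_per_side: float) -> float:
    """Net P&L in ticks for a round trip that crosses the spread on entry and exit.

    Buying at the ask and later selling at the bid gives up one full spread
    in total, plus explicit costs on each of the two trades.
    """
    return move_ticks - spread_ticks - 2.0 * cost_ticks_per_side


fx_bp = cost_in_bp(3.0, 1_000_000.0)
print(f"FX cost: {fx_bp:.2f} bp, i.e. about 1/{1.0 / fx_bp:.0f} of a 1bp spread")

# Assumed per-side cost of 0.1 tick (hypothetical).
pnl = net_ticks(move_ticks=2.0, spread_ticks=1.0, cost_ticks_per_side=0.1)
print(f"Net on a 2 tick move: {pnl:.1f} ticks")
```

The $3 per $1m works out to 0.03bp, roughly a thirtieth of a 1bp spread, which is consistent with the figures quoted. The calculation assumes fills at the displayed prices, which is precisely what the backtesting caveat calls into question.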
Don’t Lose Money (Algorithmic Trading):
If the bid market depth is 50 lots at 100, 50 lots at 99 and 250 lots at 98, and I have to sell 350 lots, I know I can do this right now at a WAP of about 98.43. Can I do better? What about if I need to do 3500 lots? There is a risk trade-off here.
And it gets even more complicated because I might need to do 100 trades (“program trading” as it is known). I can wait but might miss my price. Market impact, order arrival, agents… This has spawned the whole VWAP, TWAP, Iceberg… etc. industry.
There are LOTS of computers in the market doing this on the back of some pretty hokey modelling.
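The weighted average price in the 350 lot example comes from walking down the bid side of the book until the order is filled. A minimal sketch, assuming the visible depth is all that is available and nothing changes while the order executes (the `sell_wap` name is hypothetical):

```python
from typing import Sequence, Tuple


def sell_wap(bids: Sequence[Tuple[float, int]], qty: int) -> float:
    """Weighted average price from selling qty lots into the bids, best price first.

    Raises ValueError if the visible depth cannot absorb the whole order.
    """
    remaining = qty
    notional = 0.0
    for price, size in bids:
        take = min(size, remaining)
        notional += take * price
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError(f"book too thin: {remaining} lots left unfilled")
    return notional / qty


# Depth from the example: 50 lots at 100, 50 at 99, 250 at 98.
book_bids = [(100.0, 50), (99.0, 50), (98.0, 250)]
wap = sell_wap(book_bids, 350)
print(f"WAP for 350 lots: {wap:.4f}")

try:
    sell_wap(book_bids, 3500)
except ValueError as err:
    print("3500 lots:", err)
```

The 350 lot order fills at 34450 / 350, roughly 98.43 (98.42 if truncated rather than rounded). The 3500 lot order cannot be filled from the visible book at all, which is where the trade-off between waiting and paying market impact begins.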
21
When Algorithms Go Bad…
10 very very bad seconds for some statistician at a bank…
[Chart: ESH8.TRDPRC_1, ESH8.BID, ESH8.ASK (price axis 1355 to 1435); ESH8.ACVOL_1 (volume axis 20000 to 80000); x-axis 1/14 2:01:10.000 to 1/14 2:01:20.000]
22
Next Steps
I’m not presenting a model
I’m presenting a problem.
A big problem.
If anybody is interested in the problem then I’m happy to talk through it in more detail.
I was going to hand out a CD with 600 MB of data on it. This was the data for 15Jan08 for the front month FTSE, SP500, 10y Bund and Crude oil.
But there are licensing issues with our data provider. If you want access to this data then get in touch and we’ll work out a way to do it.
23
Disclaimer
This document is issued by Cantab Capital Partners (“CCP”), authorised and regulated by the FSA in relation to shares (the “Shares”) in the CCP Quantitative Fund (the “Fund”). The Fund will not be a recognised collective investment scheme under the Financial Services Act 1986 (the “Act”) and accordingly, investors in the Fund will not benefit from the rules and regulations made under the Act for the protection of investors, nor from the UK Investors’ Compensation Scheme.
CCP are regulated by FSA. This Brochure is issued only to persons falling within article 11(3) of the Financial Services Act 1986 (Investment Advertisements) (Exemptions) Order 1996 and may not be passed on to any other person. It does not constitute an offer or solicitation of an offer of any investment or investment service.
The value of the Shares, and any income from them, may go down as well as up and an investor may not receive back, on redemption of his Shares, the amount which he invested. Changes in rates of exchange between the US Dollar and the currencies in which the investments of the Fund are denominated may cause the value of the Shares to go up or down. The Shares will not be dealt in on a recognised or designated investment exchange for the purposes of the Act, nor will there be a market maker in the Shares, and it may therefore be difficult for an investor to dispose of his Shares otherwise than by way of redemption or to obtain reliable information about the extent of the risks to which his investment is exposed.
This document does not constitute or form part of any offer to issue or sell, or any solicitation of any offer to subscribe or purchase, the Shares, nor shall it or the fact of its distribution form the basis of or be relied on in connection with, any contract therefore. Recipients of this document who intend to apply for Shares following the publication of the prospectus to be issued by the Fund are reminded that any such application must be made solely on the basis of the information and opinions contained in the prospectus which may be different from the information and opinions contained herein. Neither CCP, nor their directors or employees warrant the accuracy, adequacy or completeness of the information contained herein and CCP expressly disclaims liability for errors or omissions in such information. No warranty of any kind implied, express or statutory is given by CCP or any of its directors or employees in connection with the information contained herein. Under no circumstances may this document, or any part thereof, be copied, reproduced or redistributed without the express permission of a partner of CCP. Registered in England No. OC317557. Registered office: Daedalus House, Station Road, Cambridge, CB1 2RE © Cantab Capital Partners LLP