MIDAS Predicting Volatility at Different
Frequencies
Wensi Shi Supervisor: Lars Forsberg
Uppsala University
2010‐5‐18
Abstract
Keywords: realized volatility; MIDAS regression; realized power; absolute return; intra-day data
I compare various MIDAS (mixed data sampling) regression models for predicting volatility at horizons from one week to one month, using different regressors built from records of the Chinese Shanghai Composite Index. The main regressors are of two types: realized power (based on 5-min absolute returns) and quadratic variation (computed from squared returns). Realized power performs best at all forecast horizons. I also compare the effect of the number of lags in the regression, from 1 to 200; the fit changes little beyond 50 lags. At the 3-week and 1-month horizons, the fit as a function of the lag number follows a wave-like pattern for all regressors, which implies a seasonal effect in the lagged variables with the same period as the forecast horizon. Finally, the out-of-sample and in-sample results for RV and RAV are quite similar, but in some cases the out-of-sample fit is better.
1 Introduction
The study of forecasting future volatility started with Engle's (1982) ARCH class of models, which
successfully capture the return variance with a simple parametric model. The ARCH/GARCH
models of Engle and Bollerslev forecast future variance from past squared returns. Alternatively,
other researchers have looked for variables other than squared returns that are related to future
volatility and useful in forecasting.
Ding et al. (1993) suggested that absolute returns might capture the low-frequency
components of volatility better than squared returns. Daily ranges are also good predictors
according to Alizadeh et al. (2002) and Gallant et al. (1999). Andersen and Bollerslev (1998) focus
on data-driven models of realized volatility computed from high-frequency intra-day data. Mixed
data sampling (MIDAS), introduced by Ghysels, Santa-Clara and Valkanov (2002a,b), provides
a method for finding the best predictor among these variables at different frequencies and
forecast horizons.
In general, the MIDAS model is a robust, simple and parsimonious framework for forecasting future
realized volatility at different horizons from samples at different frequencies. My work uses
intra-day data to predict volatility at daily, weekly and monthly horizons, because these
are the horizons used most in option pricing, portfolio management and
hedging applications. Using Chinese stock data, I predict volatility from volatility measures
computed from intra-day data, which is my way of studying the performance of 5-min data
in MIDAS regressions. I compare the results for two regressors, RV (realized
volatility) and RAV (realized absolute variation), both computed from high-frequency
returns as daily volatility measures. According to Eric Ghysels and Pedro Santa-Clara (2003),
realized power (also called realized volatility) dominates the other variables; and, as proposed
by Ole E. Barndorff-Nielsen and Neil Shephard (2001), the absolute variation also
predicts well.
To fix the notation, I use $r_{t,t-1} = \log(P_t) - \log(P_{t-1})$ to denote the daily return between
time $t-1$ and time $t$, where $P_t$ is the price at time $t$. At a higher frequency, $m$ times a day,
as with 5-min intra-day data, the interval return is written
$$r_{t,t-1/m} = \log(P_t) - \log(P_{t-1/m}).$$
The sequence of $m$ intra-day returns on a fixed day $t$ is defined as
$$r_j(t) = \log(P_{j\delta}) - \log(P_{(j-1)\delta}), \quad j = 1, 2, \dots, m.$$
RV (realized volatility) on day $t$ is defined as $Q^{(m)}_{t,t-1} = \sum_{j=1}^{m} r_j(t)^2$, and RAV (realized absolute
variation) as $P^{(m)}_{t,t-1} = \sum_{j=1}^{m} |r_j(t)|$. The notation for RV and RAV was proposed by Barndorff-Nielsen
and Shephard (2001).
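As an illustration of these two definitions, the following is a minimal R sketch that computes RV and RAV for one trading day. The helper name `realized_measures` and the toy prices are assumptions made for this example; they are not part of the thesis code or data.

```r
# Hypothetical helper (not from the thesis code): daily RV and RAV
# from the intra-day price records of a single trading day.
realized_measures <- function(prices) {
  stopifnot(length(prices) >= 2, all(prices > 0))
  r <- diff(log(prices))               # interval log returns r_j(t)
  c(RV = sum(r^2), RAV = sum(abs(r)))  # Q and P in the notation above
}

# Toy prices for one day with m = 4 intervals (made-up numbers)
toy_prices <- c(100, 101, 100.5, 102, 101)
toy_measures <- realized_measures(toy_prices)
print(toy_measures)
```

With the 48 five-minute intervals per day used in this study, `prices` would hold the price records of one day, and the function would be applied day by day.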
I am interested in intra-day high-frequency data and use them to predict daily volatility. The
whole study is based on the Chinese Shanghai Composite Index from 26 July 1999 to 26 April 2010,
more than 10 years of data. The Shanghai Composite Index includes all the shares (A shares
and B shares) traded on the Shanghai Stock Exchange. The base period is 19 December 1990,
the base value is the total market capitalization of the constituents on that day, and the index has
been applied since 15 July 1991 (see footnote 1).
I compare different regressors (absolute and squared returns; intra-day return data
and daily volatility) at forecast horizons from one day to one month. The results of these
regressions show that the absolute return outperforms the others, performing even better than the
daily realized power computed from absolute returns. Although the weight parameters show that
no particular lagged variables carry a large weight among all the lags, the
results of the MIDAS regression are reasonable and fit well when absolute returns are used.
Section 2 introduces the MIDAS regression models, Section 3 gives brief descriptive statistics of
the variables RV and RAV, and Section 4 presents the results of all the regressions by
regressor and frequency.
2 The MIDAS model
The MIDAS regression model with daily regressors is written as:
$$V^{(Hm)}_{t+H,t} = \mu_H + \phi_H \sum_{k=0}^{k^{max}} b_H(k,\theta)\, X^{(m)}_{t-k,t-k-1} + \varepsilon_{Ht}$$
In this paper, the index $t$ refers to daily sampling and $H$ is the forecast horizon. For instance,
$H = 5$ is used for weekly volatility and $H = 20$ for monthly volatility,
because there are 5 trading days in a week and, with 4 weeks in a month, 20 trading days in a
month. At all horizons, $m$ is the number of intra-day intervals. In my study of high-frequency
data there are 48 records per trading day, from the market open at 9:30 to the close at 15:00,
with a 90-minute break at noon. On the left-hand side, $V^{(Hm)}_{t+H,t}$ is
the future volatility of interest, that is, the weekly or monthly volatility. I use
two ways to measure daily volatility: the realized volatility, which is the quadratic
variation of returns, and the absolute volatility, which is based on the absolute value of returns.
The index $Hm$ in $V^{(Hm)}_{t+H,t}$ means that $Hm$ records are summed in calculating the future
volatility. The realized volatility is calculated as $\sum_{j=1}^{Hm} [r_{(t+H)-(j-1)/m,\,(t+H)-(j-2)/m}]^2$.
In the notation above, the returns are computed as $r_{t,t-1/m} = \log(P_t) - \log(P_{t-1/m})$.
Because of the log transformation, extreme observations receive less weight, which can yield
better in-sample and out-of-sample variance forecasts, as reported in
previous papers including Andersen et al. (2003).
On the right-hand side, $X^{(m)}_{t-k,t-k-1}$ is a volatility measure, calculated in one of two ways,
quadratic or absolute. Note that $X^{(m)}_{t-k,t-k-1}$ and $V^{(Hm)}_{t+H,t}$ can be calculated at different frequencies.
Footnote 1: A detailed description is available at http://en.wikipedia.org/wiki/SSE_Composite_Index
The weight function $b_H(k,\theta)$ multiplies $X^{(m)}_{t-k,t-k-1}$. It has two properties: the weights are
normalized to add up to one, which makes it possible to estimate the scale parameter $\phi_H$, and they are
non-negative, which guarantees a non-negative volatility process. There are other ways to
specify $b_H(k,\theta)$, such as the “exponential Almon lag” in Ghysels et al. (2004). In my work, $b_H(k,\theta)$ is
specified through the Beta function and parameterized by $k$ and $\theta$. Here $k$ indexes the lags of
$X^{(m)}_{t-k,t-k-1}$, and $\theta = [\theta_1; \theta_2]$ holds the two parameters that shape the weights.
The full function, which produces the $k^{max}$ weights paired with the lags of $X^{(m)}_{t-k,t-k-1}$, is:
$$b_H(k;\theta) = \frac{f\left(\frac{k}{k^{max}}, \theta_1; \theta_2\right)}{\sum_{j=1}^{k^{max}} f\left(\frac{j}{k^{max}}, \theta_1; \theta_2\right)}$$
where $f(z,a,b) = \frac{z^{a-1}(1-z)^{b-1}}{\beta(a,b)}$, and $\beta(a,b)$ is based on the Gamma function, that is,
$\beta(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$. This specification of $b_H(k,\theta)$ was introduced in Ghysels et al. (2002b, 2004)
and has several useful characteristics. 1) It provides positive coefficients, so all the
forecasting variables $X^{(m)}_{t-k,t-k-1}$ have a positive weight. 2) When $\theta_1 = 1$ and $\theta_2 > 1$, the
weights decay slowly, so that more recent volatility values carry a
heavier weight in the regression. 3) When $\theta_1 = 1$ and $\theta_2 = 1$, it produces equal weights,
so every forecasting variable $X^{(m)}_{t-k,t-k-1}$ has the same weight in the MIDAS regression. Figure 1
shows the shape of the Beta weights for different values of $\theta_2$ with $\theta_1 = 1$ fixed.
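The Beta lag weights can be sketched in a few lines of R. This is a minimal illustration under stated assumptions: the function name `beta_lag_weights` is hypothetical and is not the `beta_weights` function called in the thesis code, and `dbeta` is used because the Beta density equals $f(z,a,b)$ as defined here.

```r
# Hypothetical sketch of the normalized Beta lag weights b_H(k; theta).
# dbeta(z, a, b) = z^(a-1) * (1-z)^(b-1) / B(a, b), which matches f(z, a, b).
beta_lag_weights <- function(kmax, theta1, theta2) {
  z <- seq_len(kmax) / kmax                      # k / kmax for k = 1..kmax
  f <- dbeta(z, shape1 = theta1, shape2 = theta2)
  f / sum(f)                                     # normalize to sum to one
}

w_decay <- beta_lag_weights(10, 1, 5)   # theta1 = 1, theta2 > 1: decaying weights
w_equal <- beta_lag_weights(10, 1, 1)   # theta1 = theta2 = 1: equal weights
print(round(w_decay, 4))
```

Note that with $\theta_2 < 1$ the density is infinite at $z = 1$, so this sketch is only meant for $\theta_2 \ge 1$, the cases discussed in the text.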
Throughout the paper, I focus on predicting $V^{(Hm)}_{t+H,t}$, the future realized volatility from one day
($H=1$) and one week ($H=5$) to one month ($H=20$), because the week-to-month horizons matter most
for option pricing and portfolio management. To compare the effect of high-frequency data and
daily volatility, the regressors are obtained in two different ways. One way is to use 5-min data
directly as the regressors; at each horizon of predicted future volatility, $k$ is
the number of lagged regressors. The other way is to calculate daily volatility by summing the 5-min
data within a day, and then to combine the daily volatilities with the weights to get
$$\sum_{k=0}^{k^{max}} b_H(k,\theta)\, X^{(m)}_{t-k,t-k-1},$$
which is used to predict future volatility at different horizons. In my work, all the
predicted future quantities $V^{(Hm)}_{t+H,t}$ are sums of (future) squared returns, namely $Q^{(Hm)}_{t+H,t}$.
There are 2 different daily regressors. One is the past squared returns, $Q^{(m)}_{t,t-1}$, which is the
usual regressor in autoregressive conditional volatility models and is advocated by Andersen et
al. (2001, 2002, 2003). The other is the sum of high-frequency absolute returns, also called
“realized absolute power” variation, defined as:
$$P^{(m)}_{t,t-1} = \sum_{j=1}^{m} |r_{t-(j-1)/m,\,t-(j-2)/m}|$$
Realized power variation is suggested by Barndorff-Nielsen and Shephard (2003b, 2004)
and Woerner (2002).
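To make the estimation step concrete, here is a minimal R sketch of fitting a MIDAS regression with $\theta_1 = 1$ by nonlinear least squares on simulated data. Everything in it is an assumption made for illustration: the simulated series, the true parameter values, the helper `beta_w`, and the profiling approach (for each $\theta_2$, $\mu_H$ and $\phi_H$ are obtained by OLS and $\theta_2$ is found with `optimize`). It is not the estimation routine used in the thesis.

```r
set.seed(42)

# Beta lag weights with theta1 fixed at 1 (hypothetical helper)
beta_w <- function(kmax, theta2) {
  f <- dbeta(seq_len(kmax) / kmax, 1, theta2)
  f / sum(f)
}

n <- 500; kmax <- 20
x <- rexp(n + kmax)              # simulated stand-in for a daily volatility measure
X <- embed(x, kmax)[1:n, ]       # each row: current value, then kmax - 1 lags
# Simulated target with assumed true values mu = 0.1, phi = 2, theta2 = 4
y <- 0.1 + 2 * drop(X %*% beta_w(kmax, 4)) + rnorm(n, sd = 0.05)

# Sum of squared residuals as a function of theta2 only:
# for a given theta2 the model is linear in (mu, phi), so OLS gives them.
sse <- function(theta2) {
  z <- drop(X %*% beta_w(kmax, theta2))
  sum(lm.fit(cbind(1, z), y)$residuals^2)
}

opt <- optimize(sse, interval = c(1, 20))
z_hat <- drop(X %*% beta_w(kmax, opt$minimum))
coefs <- lm.fit(cbind(1, z_hat), y)$coefficients
est <- c(mu = coefs[[1]], phi = coefs[[2]], theta2 = opt$minimum)
print(round(est, 3))
```

Profiling out the linear parameters keeps the search one-dimensional, which is convenient for a sketch; a joint nonlinear least squares fit over all three parameters is an equally valid choice.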
3 Distributional properties of realized volatility and returns
My data set consists of daily returns and realized volatilities for the Shanghai Composite Index
of the Chinese stock market from 26 July 1999 to 26 April 2010. All the raw data are from a Chinese
data company, Biaopuyonghua. First, I calculate the returns from the high-frequency data as:
$$r_{t,t-1/m} = \log(P_t) - \log(P_{t-1/m})$$
The log transformation helps to reduce the effect of extreme
values. Then, summing all the 5-min squared returns in a day gives the daily realized
volatility:
$$Q^{(m)}_{t,t-1} = \sum_{j=1}^{m} r_j(t)^2$$
The daily absolute volatility is formed by summing the 5-min absolute returns in a day:
$$P^{(m)}_{t,t-1} = \sum_{j=1}^{m} |r_j(t)|$$
Leaving out the holidays and weekends in the sample period, there are 124464 trading records over
2593 days, with 48 trading records per day. Time series plots of the variables, including the
returns, realized volatility and absolute volatility, are given below in Figures 2 to 5.
Figure 2: The time series plot of RV (realized volatility)
Figure 3: The plot of square root of RV (realized volatility)
Figure 4: The plot of series RAV
The figures above show that volatility is higher from 2007 to 2010. All 3
daily volatility measures have a similar shape and a few extreme values, and
their histograms show heavy-tailed distributions. The
series of RAV and of the square root of RV display the volatility in more detail, with less weight on
extreme values. In the plot of RV, the extreme values
stand out more than in the other plots because of the squaring in the quadratic variation. I think
this is why the absolute return explains the regression best among the regressors.
Figure 5 shows the log return data, calculated by the formula
$r_{t,t-1} = \log(P_t) - \log(P_{t-1})$. The plot shows a wider range between 2007 and 2010,
at the end of the series. The histogram shows that the data are concentrated symmetrically
around 0 but are not normally distributed.
Figure 5: The plot of the log returns computed from prices
Table 1 gives descriptive statistics of RV, RAV and the returns. The means and medians of these
variables are close to each other. Both the ranges and the variances of the series
give the impression of tightly concentrated data.
Table 1: Brief descriptive statistics of RV, RAV and the returns. RV, the square root of RV, and RAV are daily volatilities summed from intra-day data.
           RV                   Square root of RV          RAV                  Log return
           $Q^{(m)}_{t,t-1}$    $\sqrt{Q^{(m)}_{t,t-1}}$   $P^{(m)}_{t,t-1}$    $r_{t,t-1/m}$
Mean       1.78×10^-4           0.0114                     0.06                 4.99×10^-6
Median     9.24×10^-5           0.00961                    0.05                 -7.18×10^-6
Maximum    3.87×10^-3           0.0623                     0.36                 0.088
Minimum    3.72×10^-6           0.00193                    0.01                 -0.043
Variance   6.31×10^-8           4.75×10^-5                 0.014                5.08×10^-6
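The rows of Table 1 correspond to standard summary functions in base R. The sketch below shows how such a row set could be computed for one series; the helper name `describe` and the toy vector are assumptions for illustration, and the numbers are not the thesis data.

```r
# Hypothetical helper: the five statistics reported per column of Table 1
describe <- function(x) {
  c(mean = mean(x), median = median(x),
    maximum = max(x), minimum = min(x), variance = var(x))
}

toy_series <- c(1, 2, 3, 4, 10)   # made-up numbers, not index data
toy_stats <- describe(toy_series)
print(toy_stats)
```

Applied to the daily RV series, its square root, the RAV series and the 5-min returns in turn, this would give the four columns of the table.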
4 The result of regressions
4.1 The results of daily volatility forecast with high‐frequency sample
Two MIDAS regressions are compared in the daily volatility forecast:
$$Q^{(Hm)}_{t+H,t} = \mu_H + \phi_H \sum_{k=0}^{k^{max}} b_H(k,\theta)\,[r^{(m)}_{t-k/m,\,t-(k-1)/m}]^2 + \varepsilon_{Ht} \qquad (1)$$
$$Q^{(Hm)}_{t+H,t} = \mu_H + \phi_H \sum_{k=0}^{k^{max}} b_H(k,\theta)\,|r^{(m)}_{t-k/m,\,t-(k-1)/m}| + \varepsilon_{Ht} \qquad (2)$$
$Q^{(Hm)}_{t+H,t}$ is the predicted quadratic variation, defined as:
$$Q^{(Hm)}_{t+H,t} = \sum_{j=1}^{Hm} [r_{(t+H)-(j-1)/m,\,(t+H)-(j-2)/m}]^2$$
In models (1) and (2), $r_{t,t-1/m} = \log(P_t) - \log(P_{t-1/m})$ is the return, specified as
in all the models above, and $P_t$ is the price at time $t$.
When past daily realized volatility and realized power are used instead, the regression models
become:
$$Q^{(Hm)}_{t+H,t} = \mu_H + \phi_H \sum_{k=0}^{k^{max}} b_H(k,\theta)\, Q^{(m)}_{t-k,t-k-1} + \varepsilon_{Ht} \qquad (3)$$
$$Q^{(Hm)}_{t+H,t} = \mu_H + \phi_H \sum_{k=0}^{k^{max}} b_H(k,\theta)\, P^{(m)}_{t-k,t-k-1} + \varepsilon_{Ht} \qquad (4)$$
In this part, I use 5-min data to forecast daily volatility directly, which means that for $H = 1$ the
left-hand side $Q^{(Hm)}_{t+H,t}$ (and likewise $P^{(Hm)}_{t+H,t}$) is summed from the 48 five-minute records of a
trading day. I compare the results for different numbers of lagged days and look for the
lag number with the smallest MSE. The lags are the vector
1, 5, 10, 15, ..., 260, 53 values in total. When the lag is 1, there are only 48 regressors
$r^{(m)}_{t-k/m,\,t-(k-1)/m}$ in models (1) and (2), and $k^{max} = 1$ in models (3) and (4) (see endnote i).
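The lag grid described here can be written down directly in R. This is a small sketch; the variable names `lag_days` and `n_regressors_5min` are chosen for illustration and do not come from the thesis code.

```r
# Lag lengths in days: 1, 5, 10, ..., 260
lag_days <- c(1, seq(5, 260, by = 5))
length(lag_days)                      # 53

# Number of 5-min regressors in models (1) and (2): 48 per lagged day
n_regressors_5min <- 48 * lag_days
range(n_regressors_5min)              # 48 12480
```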
The in-sample results use the whole data period, recorded from 26 July 1999 to
26 April 2010. Because the left-hand side is the same, the 4 models can be compared by MSE, as
shown in Figure 6. According to the left panel, model (1) gives the better result. The figure also
shows that the fit follows the same trend as the lag increases: starting from a lag of 1 day, the fit
improves dramatically at 5 lagged days. There is a drop at 145 lagged days, which may be due to
the half-year reports in the stock market, while lags of more than one year show no significant
annual-report effect in the regressions. The right panel shows the results for lagged daily volatility,
with a trend similar to the left one. Both have the big drop at 145. However, using daily
volatility gives a smaller MSE. Comparing the two plots, RAV (realized absolute variation)
outperforms the others.
In all the regressions, the estimated values of $\mu_H$ and $\phi_H$ change little. Detailed
results are given in the appendix tables.
Figure 6: In-sample results at the 1-day forecast horizon with regressors |r|, $r^2$, RAV and RV
The two panels show the results of the four regression models over the period from 26 July 1999 to 26 April 2010. In these models H=1.
Figure 7 shows the out-of-sample MSE for all 4 regressors at the 1-day forecast
horizon. The out-of-sample exercise uses a ten-year data set from 26 July 1999 to 26 July 2009. I
use the first 9 years of data to estimate the parameters and calculate the difference between the
actual and the forecast volatilities in the 10th year. In the left plot, RV and RAV give
very similar results, and the absolute 5-min return performs much better than the squared
5-min return. The right plot shows that RAV performs better than RV. Both
plots become steady once the lagged days reach 10, and neither shows the
drop at about 145 lagged days seen in the in-sample results. Compared with the in-sample
results, the out-of-sample MSE is larger. Overall, at the 1-day forecast horizon, RAV performs
best.
Figure 7: Out-of-sample results at the 1-day forecast horizon with regressors |r|, $r^2$, RAV and RV
The two panels show the out-of-sample results. I use the data from 26 July 1999 to 26 July 2008 to estimate the model parameters and calculate the
MSE (mean squared error) over the period from 26 July 2008 to 26 July 2009. The forecast horizon is 1 day, that is, H=1 in the models. The left panel shows the results for
all the regressors, but RV and RAV cannot be distinguished because their difference is small relative to the y-axis scale. The right panel compares RV and
RAV, of which RAV performs better.
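The out-of-sample procedure (estimate on an initial window, then compute the MSE of forecasts over a hold-out window) can be sketched as follows. This is a deliberately simplified stand-in, not the thesis procedure: the series is simulated, the regressor is an equal-weight average of the last `kmax` days rather than a fitted Beta-weighted sum, and the window sizes are arbitrary assumptions.

```r
set.seed(7)
n <- 1000; n_train <- 900; kmax <- 10
x <- rexp(n)   # simulated stand-in for a daily volatility series

# Equal-weight mean of x[t - kmax + 1 .. t]; NA for the first kmax - 1 days
lagmean <- as.numeric(stats::filter(x, rep(1 / kmax, kmax), sides = 1))

# Pair each regressor value at day t with the next-day target x[t + 1]
df <- data.frame(y = x[(kmax + 1):n], z = lagmean[kmax:(n - 1)])

# Rows whose target falls in the first n_train days form the estimation sample
is_train <- seq_len(nrow(df)) <= n_train - kmax
fit <- lm(y ~ z, data = df[is_train, ])

# Forecast the hold-out targets with the fixed estimated parameters
pred <- predict(fit, newdata = df[!is_train, ])
mse_out <- mean((df$y[!is_train] - pred)^2)
mse_in <- mean(residuals(fit)^2)
print(c(in_sample = mse_in, out_of_sample = mse_out))
```

The key design point mirrored from the text is that the parameters are estimated once on the early window and then held fixed while forecasting the final window.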
4.2 The comparison of regression fitness at different horizons with fixed regressor
In general, there are 2 ways to forecast weekly volatility in my work. One is to use past daily
volatility measures, such as realized volatility and realized power. The other is to use 5-min data
directly, which gives the regression a large number of lags.
I want to predict future weekly volatility, which means $H = 5$ in equations (3) and (4). The
right-hand side of model (3) is $Q^{(m)}_{t-k,t-k-1} = \sum_{j=1}^{m} [r_{(t-k)-(j-1)/m,\,(t-k)-(j-2)/m}]^2$, with
$r_{t,t-1/m} = \log(P_t) - \log(P_{t-1/m})$. Since
there are 48 five-minute records in one day, $m = 48$ in the models. On the left-hand side,
$Q^{(Hm)}_{t+H,t} = \sum_{j=1}^{H} Q^{(m)}_{t+j,t+j-1}$ is the sum of the daily volatilities. In equation (4), the right-hand side is calculated
as $P^{(m)}_{t-k,t-k-1} = \sum_{j=1}^{m} |r_{(t-k)-(j-1)/m,\,(t-k)-(j-2)/m}|$, by summing intra-day absolute returns, and the left-hand side is
the same as in equation (3).
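Building the $H$-day target from a daily series amounts to a forward rolling sum. The sketch below is one way to do this in R; the function name `future_sum` and the toy input are assumptions for illustration.

```r
# Hypothetical helper: for each day t, the sum of q over days t+1 .. t+H
future_sum <- function(q, H) {
  n <- length(q)
  stopifnot(H >= 1, H < n)
  cs <- c(0, cumsum(q))               # cs[i + 1] = q[1] + ... + q[i]
  t_idx <- seq_len(n - H)             # days with a full H-day future window
  cs[t_idx + H + 1] - cs[t_idx + 1]
}

q_toy <- c(1, 2, 3, 4, 5, 6)          # made-up daily values
future_sum(q_toy, 2)                  # 5 7 9 11
```

With `H = 5` this gives the weekly target and with `H = 20` the monthly target, under the trading-day convention used in the text.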
I first choose one week of daily lags, which means $k^{max} = 5$ in equations (3) and (4), with 5 daily
volatility values on the right-hand side. In equations (1) and (2), $k^{max} = 48 \times 5 = 240$, because
5-min data are used. Then I enlarge the number of lags to 50 daily volatilities, following Ghysels et
al. (2003), so $k^{max} = 50$ in (3) and (4) and $k^{max} = 48 \times 50 = 2400$ in (1) and (2). Finally, I choose 200
lags on the right-hand side, so $k^{max} = 200$ in (3) and (4), with 4800 5-min data in (1) and (2). I
estimate the parameters from 9 years of data and then compute the MSE of the 4 models at lags 10, 20, 30,
..., 200, in steps of 10. The results are detailed in the appendix, and I draw figures from them
to give a direct impression.
In this part, I focus on how the in-sample fit behaves at different forecast horizons.
Figures 8 to 11 cover the 4 regressors: absolute return,
squared return, daily realized volatility and daily realized power. For each regressor, I give 2
plots showing the fit by MSE and by R-squared. All the figures show a similar
pattern.
In terms of MSE, shorter forecast horizons have smaller mean squared errors. The monthly
volatility is summed over the largest number of returns, so more
regression error accumulates, which results in the largest mean squared error. R-squared
may therefore be the better measure for comparing fit across forecast horizons.
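The scale argument can be illustrated with a few lines of R. In this sketch, multiplying both the target and the fitted values by 20 is a crude stand-in for moving to a longer horizon (it is an assumption for illustration, since summing over more days is not exactly a rescaling); the helper `fit_stats` and the numbers are made up.

```r
# Hypothetical helper: MSE and R-squared for a set of fitted values
fit_stats <- function(y, yhat) {
  sse <- sum((y - yhat)^2)
  c(MSE = sse / length(y), R2 = 1 - sse / sum((y - mean(y))^2))
}

y_toy    <- c(1.0, 2.0, 1.5, 3.0, 2.5)   # made-up targets
yhat_toy <- c(1.2, 1.8, 1.6, 2.7, 2.6)   # made-up fitted values

s1  <- fit_stats(y_toy, yhat_toy)
s20 <- fit_stats(20 * y_toy, 20 * yhat_toy)
print(rbind(s1, s20))   # MSE grows by a factor of 20^2, R2 is unchanged
```

Because R-squared divides by the variation of the target, it is comparable across targets of different magnitude, whereas MSE is not.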
The R-squared panels in Figures 8 and 9 show that the monthly horizon has the highest value, with
the 2-week horizon second. The 1-week and 2-week horizons become steady once the
lags reach 30, and the 2-week horizon fits better than the 1-week horizon in all the figures. The
3-week and 4-week horizons show a seasonal pattern in the plots. The absolute return and
squared return perform best at the monthly horizon, and perform quite similarly at the 2-week and
3-week horizons. Although the smallest MSE occurs at the 1-week horizon, R-squared shows the
opposite: it is lowest at the 1-week horizon. The squared return gives results similar to the
absolute return, but with a wide dispersion at the 3-week horizon. Both daily volatilities, RV and
RAV, fit well from 2 weeks to 4 weeks. At the 3-week horizon, the period of the seasonal effect is
15 days, the same as the forecast horizon, so the MSE is lowest at lags 10, 25, 40, and so on.
At the 1-month horizon the period is 20 days, and the error is smallest at lags 10, 30 and 50.
Comparing the MSE and R-squared values, RAV has the smallest in-sample MSE
at all horizons. The detailed values can be found in the appendix.
Figure 8: The In-Sample MSE Fitness of 4 Forecast Horizons With regressor |r|
Figure 9: The In-Sample Fit of 4 Forecast Horizons With regressor $r^2$
Figure 10: The In-Sample Fit of 4 Forecast Horizons With regressor RV
Figure 11: The In-Sample Fit of 1 Month Forecast Horizons With regressor RAV
4.3 Out-of-sample comparison of different regressors with fixed predict horizons
In this part, I focus on the out-of-sample performance of the different regressors in all
models.
Figure 12 gives a brief view of how the out-of-sample fit changes as the number of lags
increases. Comparing the 4 regressors at all forecast horizons, RAV performs
best. In all the plots, RV and RAV have very close MSE values. In the first 2 plots, the MSE
values become steady once the lags reach 20. The squared 5-min return performs worst,
with the largest MSE values. At the 3-week and 1-month horizons, the third and fourth plots show a
seasonal pattern for all regressors. The last 2 plots show the
seasonal pattern of the absolute 5-min return, which is too small to be seen in the first 4 plots. At
the 3-week horizon, my plots show a period of 30 days for both the absolute return and the squared
return, but in fact the smallest period is 15 days, the same as the forecast horizon. At the monthly
horizon, the period is 20 days. However, models (1) and (2), which use 5-min returns, have opposite
best-fitting seasonal lags. For example, at the 3-week horizon the absolute 5-min return attains its
smallest MSE values at lags 20, 35, 50, ..., while the squared return attains its largest periodic MSE
values at the same points.
When the lags reach 130, the MSE values for the absolute 5-min return (see the last
2 plots) decline slightly at the 3-week and 1-month forecast horizons. Figure 13
shows the trend of RV and RAV performance.
Figure 12: The MSE Out-of-Sample Fit of Weeks Forecast with different lags
The results are obtained using the sample from 26 July 1999 to 26 April 2010 and shown as MSE values.
Figure 13 plots RV and RAV at all horizons. Figure 12 compares
5-min data with daily volatility and shows that the daily volatilities perform better.
Figure 13 shows more clearly how the daily volatilities work in MIDAS models. In all the plots, RAV
performs better than RV, and the two volatilities follow the same trend in fit. At the 3-week
and 1-month horizons, RV is not steady as the number of lagged days increases, while RAV is steadier.
Both show a seasonal pattern at the 3-week and 1-month horizons, with the same period as
the forecast horizon: 15 days at 3 weeks and 20 days at 1 month.
In general, Figures 8 and 9 show that the MSE values of all
the regressors increase with the forecast horizon. Increasing the number of lags does not
improve the fit significantly, especially when RAV is used. The 50 lags used in
Ghysels et al. (2004) are enough for prediction. In the out-of-sample results, RAV gives the
best regression result.
Figure 13: The Out-of-Sample MSE Fitness of all Forecast Horizons with RV and RAV
The results are obtained using the sample from 26 July 1999 to 26 April 2010 and shown as MSE values.
5 Conclusions
MIDAS regression is a very good method for forecasting future volatility. My approach
compares forecasting models that differ in the measure of volatility, the sampling frequency and
the lag length. The main focus of this paper is forecasting with a high-frequency sample, that is,
using 5-min data directly. Because the MIDAS framework applies to a wide range of empirical
investigations, I can compare high-frequency measures with daily measures; and because it allows
different measures of volatility, different regressors can be compared directly.
Across all the comparisons in this paper, realized power has the best fit and is
simple, robust and parsimonious.
There are several findings on the predictability of daily to monthly realized volatility in the Chinese
market. First, squared returns outperform absolute returns in forecasting daily
volatility from high-frequency data with lags of 1 to 260 days. Although the fit is not
especially good, the MIDAS method is quite robust, in that the results differ little
across frequencies and forecast horizons. I think that the
regression reaches a steady fit with 50 lags, and that short lag lengths are useful in the daily
forecast regression, since longer lags improve the fit only very slowly. Second, the intra-day data used in
MIDAS forecasts are reliable for realized power and absolute returns. Using daily
realized power is more reliable than using 5-min data directly or the other regressors. Third, at the
3-week and 4-week horizons, all the regressors show a seasonal effect with the same period
as the forecast horizon. My suggestion is therefore to try different lags near 50 (long enough
to give a good fit) and pick the best-fitting lag. Last, for a fixed regressor,
the fit differs across forecast horizons. The conclusion is that the 2-week to
1-month horizons outperform the 1-week horizon, even though the 1-week horizon has the lowest MSE
values in all the models.
References
[1] Alizadeh, S., M. Brandt and F. X. Diebold (2002), “Range-based estimation of stochastic volatility models”, Journal of Finance, 57, 1047-1091.
[2] Andersen, T. and T. Bollerslev (1998), “Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts”, International Economic Review, 39, 885-905.
[3] Andersen, T., T. Bollerslev, F. X. Diebold and P. Labys (2001), “The Distribution of Exchange Rate Volatility”, Journal of the American Statistical Association, 96, 42-55.
[4] Andersen, T. G., T. Bollerslev and F. X. Diebold (2002), “Parametric and Nonparametric Volatility Measurement”, in L. P. Hansen and Y. Ait-Sahalia (eds.), Handbook of Financial Econometrics, Amsterdam: North-Holland, forthcoming.
[5] Andersen, T., T. Bollerslev, F. X. Diebold and P. Labys (2003), “Modeling and Forecasting Realized Volatility”, Econometrica, 71, 579-625.
[6] Barndorff-Nielsen, O. and N. Shephard (2001), “Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics (with discussion)”, Journal of the Royal Statistical Society, Series B, 63, 167-241.
[7] Barndorff-Nielsen, O. and N. Shephard (2002a), “Econometric analysis of realized volatility and its use in estimating stochastic volatility models”, Journal of the Royal Statistical Society, Series B, 64, 253-280.
[8] Barndorff-Nielsen, O. and N. Shephard (2003a), “How accurate is the asymptotic approximation to the distribution of realized volatility?”, in D. W. K. Andrews, J. Powell, P. Ruud and J. Stock (eds.), Identification and Inference for Econometric Models, A Festschrift for Tom Rothenberg, Cambridge University Press.
[9] Barndorff-Nielsen, O. and N. Shephard (2003b), “Realised power variation and stochastic volatility”, Bernoulli, 9, 243-265.
[10] Barndorff-Nielsen, O. and N. Shephard (2004), “Power and bipower variation with stochastic volatility and jumps (with discussion)”, Journal of Financial Econometrics, 2, 1-48.
[11] Ding, Z., C. W. J. Granger and R. F. Engle (1993), “A long memory property of stock market returns and a new model”, Journal of Empirical Finance, 1, 83-106.
[12] Engle, R. F. and G. Gallo (2003), “A Multiple Indicator Model for Volatility Using Intra Daily Data”, Discussion Paper, NYU and University di Firenze.
[13] Engle, R. F. (1982), “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation”, Econometrica, 50, 987-1008.
[14] Gallant, A. R., C.-T. Hsu and G. Tauchen (1999), “Using Daily Range Data to Calibrate Volatility Diffusions and Extract the Forward Integrated Volatility”, Review of Economics and Statistics, 81, 617-631.
[15] Ghysels, E., P. Santa-Clara and R. Valkanov (2002a), “There is a Risk-return Tradeoff after all”, Journal of Financial Economics.
[16] Ghysels, E., P. Santa-Clara and R. Valkanov (2002b), “The MIDAS Touch: Mixed Data Sampling Regression”.
[17] Ghysels, E., P. Santa-Clara, A. Sinko and R. Valkanov (2004), “MIDAS Regressions: Results and New Directions”.
[18] Woerner, J. (2002), “Variational sums and power variation: a unifying approach to model selection and estimation in semimartingale models”, Discussion Paper, Oxford University.
Appendix
Table 1: The detailed in-sample fit of regressions (1) and (2) at the 1-day predict horizon (H=1)
Table 2: The in-sample result of 1 day predict horizon with RV and RAV
Table 3: The out-of-sample result of 1 day predict horizon with RAV, squared return, RV, and absolute return
Table 4: The in-sample results of 1 week predicted horizon with regressors |r|, $r^2$, RAV, RV
Table 5: The in-sample results of 2 weeks predicted horizon with regressors |r|, $r^2$, RAV, RV
Table 6: The in-sample results of 3 weeks predicted horizon with regressors |r|, $r^2$, RAV, RV
Table 7: The in-sample results of 1 month predicted horizon with regressors |r|, $r^2$, RAV, RV
Table 8: The out-of-sample MSE results by RV and RAV at all forecast horizons
Table 9: The out-of-sample MSE results by |r|, $r^2$ at all forecast horizons
R code
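########## OUT OF SAMPLE ###################
# Out-of-sample MSE with 5-min absolute returns as regressors.
# l_day: vector of lag lengths in days; ylag: forecast horizon in days.
# return_diff_l, datageneration_x, datageneration_y, bnls_restr and
# beta_weights are not shown in this excerpt.
outresult_absr=function(l_day,ylag){
  MSE=c()
  ret=c(0,return_diff_l)
  for(j in 1:length(l_day)){
    agy=ylag*48            # horizon in 5-min intervals (48 per day)
    agx=l_day[j]*48        # lag length in 5-min intervals
    error=c()
    # estimation sample: first 2167 days = 104016 5-min returns
    x=datageneration_x(ret[1:104016],agy,agx)
    y=datageneration_y(ret[1:104016],agy,agx)
    a=bnls_restr(x,y)      # presumably a[1]=mu, a[2]=phi, a[3]=theta2
    for(i in 1:11664){     # evaluation sample: 243 days = 11664 intervals
      hv=a[1]+a[2]*(t(as.matrix(ret[(104015+i):(104016+i-agx)]))
        %*%as.matrix(beta_weights(agx,1,a[3])))
      # NOTE: kept as in the original. The indices start at 11663+i rather
      # than after observation 104016, unlike the target in outresult_rv.
      v=sum(ret[(11663+i):(11664+i+agy)])
      error[i]=hv-v
    }
    MSE[j]=sum(error^2)/243   # divides by 243 days, not by length(error)
  }
  return(MSE)
}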
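# Out-of-sample MSE with daily realized volatility (RV) as regressor.
outresult_rv=function(l_day,ylag){
  MSE=c()
  for(j in 1:length(l_day)){
    agy=ylag               # forecast horizon in days
    agx=l_day[j]           # number of daily lags
    error=c()
    x=datageneration_x(rv[1:2167],agy,agx)     # estimation sample: first 2167 days
    y=datageneration_y_2(rv[1:2167],agy,agx)
    a=bnls_restr(x,y)
    for(i in 1:243){       # evaluation sample: 243 days
      # forecast from the agx most recent daily values, newest first
      hv=a[1]+a[2]*(t(as.matrix(rv[(2166+i):(2167+i-agx)]))
        %*%as.matrix(beta_weights(agx,1,a[3])))
      v=sum(rv[(2167+i):(2166+i+agy)])         # realized target over the next agy days
      error[i]=hv-v
    }
    MSE[j]=sum(error^2)/243
  }
  return(MSE)
}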
# Endnote i: In my R code, the regression command is "agy=48, agx=48"
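# Out-of-sample MSE with 5-min squared returns as regressors.
outresult_sqrr=function(l_day,ylag){
  MSE=c()
  ret=c(0,return_diff_l)
  for(j in 1:length(l_day)){
    agy=ylag*48            # horizon in 5-min intervals (48 per day)
    agx=l_day[j]*48        # lag length in 5-min intervals
    error=c()
    x=datageneration_x_2(ret[1:104016],agy,agx)
    y=datageneration_y(ret[1:104016],agy,agx)
    a=bnls_restr(x,y)
    for(i in 1:11664){     # evaluation sample: 243 days = 11664 intervals
      hv=a[1]+a[2]*(t(as.matrix(ret[(104015+i):(104016+i-agx)]))
        %*%as.matrix(beta_weights(agx,1,a[3])))
      # NOTE: kept as in the original; see the note in outresult_absr
      v=sum(ret[(11663+i):(11664+i+agy)])
      error[i]=hv-v
    }
    MSE[j]=sum(error^2)/243   # divides by 243 days, not by length(error)
  }
  return(MSE)
}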
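# Out-of-sample MSE with daily realized absolute variation (RAV) as
# regressor; the target is still built from rv.
outresult_rav=function(l_day,ylag){
  MSE=c()
  for(j in 1:length(l_day)){
    agy=ylag               # forecast horizon in days
    agx=l_day[j]           # number of daily lags
    error=c()
    x=datageneration_x(rav[1:2167],agy,agx)
    y=datageneration_y_2(rv[1:2167],agy,agx)
    a=bnls_restr(x,y)
    for(i in 1:243){       # evaluation sample: 243 days
      hv=a[1]+a[2]*(t(as.matrix(rav[(2166+i):(2167+i-agx)]))%*%
        as.matrix(beta_weights(agx,1,a[3])))
      v=sum(rv[(2167+i):(2166+i+agy)])
      error[i]=hv-v
    }
    MSE[j]=sum(error^2)/243
  }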
return(MSE)