paolo zaffaroni ∗ imperial college london

LARGE-SCALE VOLATILITY MODELS:THEORETICAL PROPERTIES OF PROFESSIONALS’ PRACTICE

Paolo Zaffaroni ∗

Imperial College London

This draft: October 2007

∗I thank Alessio Sancetta, Fabio Fornari and M. Hashem Pesaran for stimulating con-versations and Tim Bollerslev for kindly providing me with the foreign exchange rate data.Address correspondence to:Tanaka Business School, Imperial College London, South Kensington Campus, SW7 2AZLondon, tel. + 44 207 594 9186, email [email protected]

Abstract

This paper examines the way in which GARCH models are esti-mated and used for forecasting by practitioners in particular usingthe highly popular RiskmetricsTM approach. Although it permits siz-able computational gains and provide a simple way to impose positivesemi-definitiveness of multivariate version of the model, we show thatthis approach delivers non-consistent parameter’ estimates. The noveltheoretical result is corroborated by a set of Montecarlo exercises. Aset of empirical applications suggest that this could cause, in general,unreliable forecasts of conditional volatilities and correlations.

Keywords: GARCH, RiskmetricsTM , estimation, forecasting, multivariate volatilitymodels.

2

1 Introduction

Accurate forecasting of volatility and correlations of financial asset returns

is essential for optimal asset allocation, managing portfolio risk, derivative

pricing and dynamic hedging. Volatility and correlations are not directly

observable but can only be estimated using historical data on asset returns.

Financial institutions typically face the problem of estimating time-varying

conditional volatilities and correlations for a large number of assets. More-

over, fast and computationally efficient methods are required. Therefore,

parametric models of changing volatility are those most commonly used,

rather than semi- and non-parametric methods. In particular, the autore-

gressive conditional heteroskedasticity (ARCH) model of Engle (1982), and

the generalized ARCH (GARCH) of Bollerslev (1986), represent the most

relevant paradigms. GARCH models are easy to estimate and fit financial

data remarkably well (see Andersen and Bollerslev (1998)). In fact, GARCH

models do account for several of the empirical regularities of asset returns

(see Bollerslev, Engle, and Nelson (1994)), in particular dynamic conditional

heteroskedasticity.

The popularity of GARCH models among practitioners in part stems

from their close analogies with linear time series models such as autore-

gressive integrated moving average models (ARIMA), as well as with other,

a-theoretical, models such as the exponentially weighted moving average

(EWMA) model. Precisely by exploiting such analogies has permitted a

feasible and computationally fast method for evaluating the conditional time-

varying covariance matrix for a large number of assets, of the order of the

hundreds. This method, which can be viewed as a highly restricted multi-

variate GARCH, has been popularized under the name of RiskMetrics TM

approach (see J.P.Morgan/Reuters (1996)). In this paper we shall call this

1

the common approach, acknowledging that it has been, and still is, the

dominant paradigm used by most financial analysts in the last years (see

J.P.Morgan/Reuters (1996) and Litterman and Winkelmann (1998)).

Given the widespread evidence of practitioners using the common ap-

proach, this paper examines its theoretical underpinnings and effective per-

formance. Surprisingly, very little theoretical research has been carried out

on this topic. The only notable exception is Cheng, Fan, and Spokoiny

(2003) who nest the common approach within a wide class of filtering prob-

lems. They show that, under mild conditions, the filtering performance of the

common approach does not depend on whether one uses (observed) square

returns rather than the (unobserved) volatility process. Their result is com-

plementary to this paper that focuses, instead, on the estimation part of

the filtering. Our main contribution is to show how the estimation method,

embedded within the common approach, delivers non consistent estimates

of the model parameters. A MonteCarlo exercise describes the finite-sample

properties of the estimator, indicating that its poor performance does not

only arise asymptotically. Consequently, misleading forecasts are likely to

occur. More importantly, conditional cross-covariances and correlations are

poorly estimated, possibly leading to unexpected risk exposure when the

estimated conditional covariance matrix is used to calculate dynamic hedge-

ratios, Value-at-Risk performance and mean-variance efficient portfolios.

Adopting the common approach contrasts with the use of correctly spec-

ified GARCH models which we will be referring to as the correct approach.

Practical applications of the correct approach for large scale problems (in-

volving a large number of assets) is limited by the large number of parameters

involved. As a consequence, the several proposed versions of multivariate

GARCH models entail strong forms of parametric simplification, in order

2

to achieve computational feasibility. Recent advances include the models

with time-varying conditional correlation of Engle (2002) and Tse and Tsui

(2002) that generalize the constant conditional correlation model of Boller-

slev (1990). Bauwens, Laurent, and Rombouts (2003) provides a complete

survey of this recent literature.

This paper proceeds as follows. Section 2.1 describes the way in which

multivariate GARCH models are routinely specified and estimated by prac-

titioners. Section 2.2 recalls the correct specification of GARCH models and

related estimation issue. Section 2.3 looks at the small-sample properties

of the estimation nested within the common approach by means of a set

of Montecarlo exercises. A comparison of the predictive ability of the two

approaches is described in sections 3.1 and 3.2, based on the Olsen’s data set

of the spot Mark/Dollar and Yen/Dollar foreign exchange rates. Concluding

remarks are in Section 4. Section 5 contains a mathematical appendix.

2 Large-scale volatility models: specification

and estimation

Typical situations faced by practitioners, such as optimal asset allocations

and portfolio risk diversification, involve many assets. J.P.Morgan/Reuters

(1996, p.97), for instance, considers the problem of estimating the conditional

covariance matrix of 480 time series. In such circumstances, as discussed in

Section 1, use of the correct approach is not a possibility and some form of

restrictions are required in order to achieve computational feasibility. The

common approach, instead, is always applicable in such situations, which

motivates its widespread use for real time applications. The common ap-

proach, and the corresponding estimation procedure, is spelled out in the

following section. Next, we then briefly recall the correct approach.

3

2.1 Common approach

Let Pi,t be the speculative price of the generic ith asset, for i = 1, ..., m, at

date t and define the continuously compounded one-period rate of return as

ri,t = ln(Pi,t/Pi,t−1). To focus on the volatility dynamics, assume for the sake

of simplicity that the rt = (r1,t, ..., ri,t, ..., rm,t)′ are martingale differences:

E(rt | Ft−1) = 0, (1)

where Ft defines the sigma-algebra induced by the rs, s ≤ t. The simplest

estimator of the conditional covariance E(ri,trj,t | Ft−1) = σij,t is the weighted

rolling estimator, with window of length n:

σij,t =n∑

s=1

ws(n)ri,t−srj,t−s, i, j = 1, ..., m, (2)

where the weights ws(n) satisfy ws(n) ≥ 0, limn→∞∑n

s=1 ws(n) = 1, (see

J.P.Morgan/Reuters (1996, Table 5.4) and Litterman and Winkelmann (1998,

eq.(2)) among others). σ2ij,t is a function of n but we will not make this ex-

plicit for simplicity’s sake. Important particular cases of (2) are the equally

weighted estimator, for ws(n) = 1/n, and the exponentially weighted estima-

tor, for

ws(n) = (1− λ0) λs−10 , (3)

for constant 0 < λ0 < 1, known as the decay factor. The weights (3) yield

the popular EWMA estimator

σij,t = (1− λ0)n∑

s=1

λs−10 ri,t−srj,t−s. (4)

The practical appeal of the EWMA estimator (4) lies in the fact that, by

suitably choosing λ0, the estimate will be more sensitive to newer observa-

tions than to older observations (cf. J.P.Morgan/Reuters (1996, p.80) and

4

Litterman and Winkelmann (1998, p.15)). Computationally, EWMA is not

more burdensome than simple averages, thanks to the recursion

σij,t = λ0σij,t−1 + (1− λ0)ri,t−1rj,t−1, σij,t−n = 0. (5)

Note that the parameter λ0 does not vary with i, j and, indeed, this choice

guarantees that the m × m matrix Σt = [σij,t] (1 ≤ i, j ≤ m), solution of

Σt = λ0Σt−1 + (1 − λ0)rt−1r′t−1, Σ0 = 0, is positive semi-definite, setting

rt = (r1,t, ..., rm,t)′ (cf. J.P.Morgan/Reuters (1996, p. 97)). Practical imple-

mentation of (4) is often based on

σij,t =(1− λ0)

(1− λn0 )

n∑s=1

λs−10 ri,t−srj,t−s, (6)

ensuring that∑n

s=1 ws(n) = 1 for any finite n, with the recursion (in matrix

notation)

Σt = λ0Σt−1 +(1− λ0)

(1− λn0 )

rt−1r′t−1 −

(1− λ0)

(1− λn0 )

λn−10 rt−n−1r

′t−n−1. (7)

Typically, the initialization of such recursion is based on the sample covari-

ance matrix of a pre-sample of data. Alternatively, assuming to observe a

sample r1, ..., rT of data and for a given n, one can evaluate (7) for t ≤ n with

an expanding window (replacing n with t − 1), and returning to (7) when

t > n.

When the weights ws(n) vary suitably with n, the rolling estimator (2),

though a-theoretical, can be justified as a non-parametric estimator of the

conditional variance. This rules out the possibility that σij,t nests the EWMA

(4) which, however, is closely related to parametric time series models, such

as GARCH, as illustrated below. The weights of the (rolling) EWMA (7)

vary with n but without converging towards zero as n grows to infinity, again

ruling out the non-parametric interpretation.

5

We now establish the analogy of EWMA (4) with multivariate GARCH(1, 1).

The latter, in its most general specification, is

rt = Σ12t zt, (8)

vech(Σt) = Ω0 + A0vech(Σt−1) + B0vech(rt−1r′t−1), (9)

where vech(.) denotes the column stacking operator of the lower portion of

a symmetric matrix, zt = (z1,t, ..., zm,t)′ is an m-valued i.i.d. sequence with

Ezt = 0 and Eztz′t = Im where Im is the identity matrix of dimension m×m,

Ω0 is an m(m + 1)/2× 1 vector and A0, B0 are m(m + 1)/2×m(m + 1)/2

matrices of coefficients (see Bollerslev, Engle, and Wooldridge (1988, eq. 4)).

Imposing diagonality and constancy of the diagonal terms of A0, B0 yields

A0 = α0Im(m+1)/2, B0 = β0Im(m+1)/2. (10)

Further imposing

α0 + β0 = 1, (11)

one obtains the multivariate integrated GARCH(1, 1) (hereafter IGARCH(1, 1):

σij,t = ωij,0 + α0ri,t−1rj,t−1 + (1− α0)σij,t−1, i, j = 1, ...,m. (12)

Despite the ri,t are not covariance stationary, the conditional correlations

ρij,t = σij,t/σi,tσj,t are well defined. Finally, imposing

ωij,0 = 0, i, j = 1, ...,m, (13)

and σij,t−n = 0, i, j = 1, ..., m, the EWMA (5) and the multivariate IGARCH(1, 1)

(12) coincide. In matrix notation, the latter is

rt = Σ12t zt, Σt = α0rt−1r

′t−1 + (1− α0)Σt−1. (14)

GARCH have a close connection with ARMA (cf. Bollerslev (1986) for the

univariate case). In fact, set νt = rtr′t −Σt yielding vech(νt) = vech(rtr

′t)−

6

vech(Σt) = (Σ12t ⊗Σ

12t )vech(ztz

′t−Im) by (8). The multivariate GARCH(1, 1),

as from (9), implies a multivariate ARMA(1, 1) representation for vech(rtr′t):

vech(rtr′t) = Ω0 + (A0 + B0)vech(rt−1r

′t−1) + vech(νt)−A0vech(νt),

with martingale difference vech(νt) (not necessarily with finite variance). Un-

der (10)-(11) one gets a multivariate ARIMA(0, 1, 1)

vech(rtr′t) = Ω0 + vech(rt−1r

′t−1) + vech(νt)− α0vech(νt), (15)

Therefore, the diagonal elements of rtr′t, that is the r2

i,t, i = 1, ...,m, display

a unit root with non-negative drift. It is well known that for standard,

linear, unit root models, with positive drift, the series diverges a.s. to infinity.

In turn, this suggested to focus on IGARCH(1, 1) models satisfying (13)

(cf. J.P.Morgan/Reuters (1996, eq. (5.37)) and Litterman and Winkelmann

(1998, eq. (4))). However, despite its appeal this analogy is false. In fact,

when m = 1 (Nelson 1990, Theorem 1 and 2) showed that, for any value

ωii,0, under

E ln(β0 + α0z2t ) < 0. (16)

univariate GARCH(1, 1) are strictly stationary and ergodic. It is easy to

see that IGARCH(1, 1), with E z2t = 1, satisfies (16) (see Nelson (1990,

p.321))). Therefore, in contrast to standard linear unit root models, such as

the classical random walk, the r2i,t are strictly stationary and do not exhibit

any type of explosive behaviour.

In order to introduce estimation, let us introduce the parameterized model

for the ith conditional variance σ2ii,t(α) = αr2

i,t−1+(1−α)σ2ii,t−1(α), σ2

ii,0(α) =

0 with α ∈ (0, 1). Given the emphasis on forecasting, the parameter α0 is

routinely estimated by least squares (LS), i.e. minimizing the mean square

7

error (MSE) of the predictions:

MSEiT (α) =1

T

T∑t=2

(r2i,t − σ2

ii,t(α))2

(17)

yielding

αiT = argminα∈[α,α]MSEiT (α), (18)

for given 0 < α < α < 1 (cf. J.P.Morgan/Reuters (1996, section 5.3.2.1)).

The α, α can be chosen arbitrarily, yet the minimization in (18) necessarily

requires the definition of a compact interval, bounded away from both zero

and unity.

As far as estimation is concerned, the common approach consists of a two-

stage approach: (i) estimate univariate EWMA for each asset by minimizing

the univariate MSE (cf. (17)), yielding the α(i)T (i = 1, ..., m); (ii) get the final

estimate of α0 by a weighted average of the former yielding

αT =m∑

i=1

φ(i)T α

(i)T , (19)

with weights that penalize assets whose estimated coefficients have a large

MSE, φ(i)T = (θ

(i)T )−1/

∑mj=1(θ

(j)T )−1, i = 1, ..., m, setting

θ(i)T =

√MSEiT (α

(i)T )(

√MSE1T (α1T )+...+

√MSEmT (αmT ))−1, i = 1, ..., m,

(see J.P.Morgan/Reuters (1996, section 5.3.2.2)).

We now show that specifying GARCH models by imposing (13) has dra-

matic implications for two-stage estimation procedure of the common ap-

proach. Therefore, exploiting the analogies of GARCH models with both

EWMA and ARIMA, could lead to misleading inferences and forecasting

results. First of all, we recall that setting (13) is not innocuous for the prob-

abilistic properties of the model. In fact, when (10)-(11)-(13) hold, by simple

8

substitution one gets

Σt = Σ12t−1At−1Σ

12t−1, (20)

setting At = (α0Im + (1− α0)ztz′t). From (20) it follows that when Σt = 0

then Σs = 0 and rs = 0 for any s > t, in analogy with the univariate case,

suggesting that

tr(Σt) → 0 a.s. for t →∞, (21)

where tr(·) is the trace operator. Indeed, for m = 1 it has been established

that (11)-(13) imply

σ2t → 0 a.s. for t →∞ (22)

(see Nelson (1990, Theorem 1)). The impact of this result can also be viewed

by means of a simulation, displayed in Figure 1 (left panel), based on setting

α = 0.95. It turns out that the smaller α is, the sooner the conditional

variance will converge towards zero. The right panel of Figure 1 reports

the results of a simulation exercise showing that (21) really also occurs for

m > 1. Result (22) is well known. However, its implication for estimation

are not known. In principle, this asymptotic degenerateness might, not be

important for estimation and forecasting. It turns out that this statement is

false as stated by the following theorem, whose proof is in the appendix.

Theorem 1 For univariate IGARCH(1, 1), when (13) holds,

(a)

supα∈[α,α]

MSEiT (α) → 0 a.s. for T →∞ (23)

for any 0 < α < α < 1 and i = 1, ..., m.

(b) Under the same conditions

infα∈[α,α]

E(MSEiT (α)|σ2

0

) →∞ for T →∞. (24)

9

Remark 1.1 The first part of Theorem 1 implies that the LS estimator αiT is

non-consistent for α0 and is therefore meaningless. In fact (23) says that the

MSE, non negative by construction, is (asymptotically) minimized for any

value of α ∈ (0, 1). It follows that α0 is globally (asymptotically) unidentified.

One might wonder whether this is merely the symptom of a different rate of

convergence, namely whether T bMSEiT (α) would converge to a non-random

expression for some b 6= 0, uniquely minimized at α0. Simple inspection of

the proof of Theorem 1 shows that T MSEiT (α) is bounded a.s. and non

zero but the limit is random and model identification fails.

Since non-consistency holds for all i = 1, ..., m, it follows that the final

estimator αT is also non-consistent for α0.

Remark 1.2 Cheng, Fan, and Spokoiny (2003) show that no differences

arises when estimating the common approach through (17) with respect to

the ideal, but unfeasible, case when the true volatility σ2ii,t is observed and

one minimizes T−1∑T

t=2

(σ2

ii,t − σ2ii,t(α)

)2.

Remark 1.3 Establishing almost sure convergence plays an important role.

In fact, taking the (conditional) expectation of MSEiT (α) yields (24) which

seems to contradict part (a) of Theorem 1. This (apparent) contradiction

is a by-product of the non stationarity of IGARCH. Basically, the asymp-

totic behaviour of the average of moments does not reflect the asymptotic

behaviour of the average of the underlying random variables. Recall that for

IGARCH(1, 1) condition (16) holds, implying that it is both strictly station-

ary and ergodic.

2.2 Correct approach

When m = 1, IGARCH(1, 1) have been shown to be strictly stationary and

ergodic under (16). (No corresponding results have been provided for case

10

m > 1.) It follows that, despite the ARIMA(0, 1, 1) representation, there is

no harm in imposing

ωii,0 > 0. (25)

Indeed, Nelson (1990, Theorem 2) has shown that under (16) and (25) σ2ii,t−

uσ2ii,t → 0 a.s. for t →∞ setting uσ

2ii,t = ωii,0α

−10 +α0

∑∞s=1(1−α0)

s−1r2i,t−s.

Here uσ2ii,t defines the unconditional process, in contrast to σ2

ii,t which de-

fines the conditional process as it depends on the initial condition σ2ii,t−n.

Moreover, it has been shown that the uσ2ii,t are strictly stationary and ergodic,

with a well-defined non degenerate probability measure on [ωii,0/α0,∞). Fig-

ure 1 (middle panel), based on setting α = 0.95, provides a typical sample

path for the process. Now the model conditional variance is always bounded

away from zero and the process is never degenerate. IGARCH(1, 1) are,

however, not covariance stationary. In fact, under (11) the ri,t have infinite

variance although the sample path will not be explosive, thanks to the strict

stationarity and ergodicity. For this reason, the IGARCH(1, 1) parameters

ωii,0 and α0 cannot be estimated by LS. However, the model is well speci-

fied and can be estimated in various ways, the most common of which is by

pseudo maximum likelihood (PML). The univariate PML estimator (PMLE)

is characterized by standard asymptotic statistical properties (see Lumsdaine

(1996)), and it is given by (ωpmleii,T , αpmle

iT ) = argminω,α∈[α,α]×[ω,ω] LiT (ω, α) for

constants 0 < α < α < 1 and 0 < ω < ω < ∞, setting

LiT (ω, α) =1

T

T∑t=1

ln σ2ii,t(ω, α) +

1

T

T∑t=1

r2i,t

σ2ii,t(ω, α)

(26)

with the parameterized conditional variance σ2ii,t(ω, α) = ω + αr2

i,t−1 + (1 −α)σ2

ii,t−1(ω, α).

Obviously, when m > 1, it would be preferable to estimate model pa-

rameters using a multivariate estimator, to afford greater efficiency. In

11

principle, the multivariate PMLE, could be used whereby (ΩpmleT , αpmle

T ) =

argminΩ,α∈O×[α,α] LT (Ω, α) for

LT (Ω, α) =1

T

T∑t=1

ln | Σt(Ω, α) | + 1

T

T∑t=1

r′tΣ−1t (Ω, α)rt (27)

with the parameterized conditional covariance matrix vech(Σt(Ω, α)) = Ω+

α vech(Σt−1(Ω, α))+(1−α)vech(rt−1r′t−1). However, when even moderately

large values of m are considered, say 5 ≤ m ≤ 10, minimization of (27)

becomes an arduous computational task, certainly prohibitive when m is of

order of the hundreds. This is because (27)is highly nonlinear in terms of

the elements of Ω, α. Suitably restricted multivariate GARCH models might

afford a relatively larger m although, for instance Bollerslev (1990) presents

an application of its constant conditional correlation model with m = 5

whereas Engle (2002) reports some applications with m = 3 of his dynamic

conditional correlation model.

2.3 Small-sample performance

Table 1 reports a Montecarlo exercise in order to evaluate the performance

of the LS estimator αT and of the PMLE αpmleT used, respectively, to esti-

mate the common and the correct approach. We focus here on the case

m = 1 for simplicity. One can also compare the common and the correct

approaches using the same estimator, in particular the PMLE. However, we

feel that it is more relevant to compare the two models using the correspond-

ing estimation procedure most frequently used by practitioners. Moreover,

the asymptotic degeneratedness (22) would make the implementation of the

PMLE problematic for the common approach.

We simulated IGARCH(1, 1) imposing (13), with α0 = 0.05, 0.75 and 0.95.

We consider samples of length 15, 000 and estimate the model considering the

12

first 1, 000 observations, the first 5, 000 observations and, lastly, all 15, 000

observations. The MSE is minimized by numerical methods with a MatLab

code, starting from an arbitrary value equal to 0.5. Such choice should be

completely irrelevant for a well specified model.

The non-consistency of αT clearly emerges when comparing the Monte-

carlo variances (column three) for different sample sizes, which do not vary

with T . On the other hand, when looking at the Montecarlo means (column

two), it seems that the LS estimator is reliable for large values of α0. (For

case α0 = 0.05 also the mean indicates the non consistency). Likewise, look-

ing at Figure 2 (top three panels), it clearly appears that the LS estimator is

both non-consistent and non-centered (biased) with respect to the true value.

Interestingly, note how the behaviour of the LS estimator for case α0 = 0.95

(top right panel) is heavily influenced by the initial, very persistent, observa-

tions. As a result, the estimate is close to the trsue value although in reality

this is independent of its asymptotic statistical properties. It is thus possible

that a mis-specified model achieves a good forecasting performance. This

possibility is investigated in section 3.1.

Columns seven to twelve of Table 1 report the small-sample properties of

the PMLE, based on samples of size T = 1, 000, 5, 000 and 15, 000. Again the

optimization started from the same arbitrary value of 0.5 for α. The estimates

are all centered around the true value and their variance decreases as the

sample size increases, at rate 1/T . The average of (26) across replications,

evaluated at the PMLE (column twelve of Table 1), does not significantly

change with T . The empirical distribution of the estimates are reported in

Figure 2 (bottom three panels).

13

3 Empirical illustration

Our theoretical result indicates the inherent difficulties for estimation of the

common approach. We now present two simple empirical applications aimed

at providing some evidence on the forecasting implications of the theoretical

result.

3.1 Comparing forecasting performances (univariate)

We compare the predictive capability of univariate IGARCH(1, 1) models

described in section 2.1 and 2.2, that is with and without condition (13).

Hereafter, we denote the former as model 2 and the latter as model 1. Since

m = 1 we drop any reference to the index i = 1, ..., m. We consider the

Diebold and Mariano (1995) test of predictive accuracy

DMS(L) =1√AS

S∑s=1

(L(σ2

s ,(1)σ2

s|s−p)− L(σ2s ,

(2)σ2s|s−p)

)(28)

where S defines the number of p-step ahead forecasts employed and L(a, b)

defines a generic loss function. (1)σ2s|s−p and (2)σ2

s|s−p express the two, com-

peting, p-step ahead forecasts of the true conditional volatility σ2s defined in

(30) and (31). The normalizing quantity AS is a consistent estimate of the

variance of the numerator of (28), robust to autocorrelation of unknown form

(see Diebold and Mariano (1995, section 1.1) for details), such that under

suitable regularity conditions DMS(L) converges in distribution to a stan-

dard normal (for S → ∞ ), under the null hypothesis of equal forecasting

performance

E(L(σ2

s ,(1)σ2

s|s−p)− L(σ2s ,

(2)σ2s|s−p)

)= 0. (29)

We employ the well-known data set of Olsen & Ass., frequently used to com-

pare predictive performance of volatility models since Andersen and Boller-

14

slev (1998). In particular, the data consist of daily and intra-daily returns

for the Mark/Dollar spot exchange rate (henceforth Mark/Dollar returns),

from July 1, 1974, through September 30, 1992 - 4, 573 observations - for

daily data, and from October 1, 1992, through September 30, 1993 - 74, 880

observations - for intra-daily data (five-minute returns). We then obtain 260

observations of the so-called realized volatility, obtained by summing squared

intra-daily returns (288 intra-daily observations per day), as indicated by An-

dersen and Bollerslev (1998). In this exercise we assume that the realized

volatility expresses true (daily) volatility, denoted by σ2t , with a negligible

error.

The results are reported in Table 2. The data in levels (returns) have

been preliminary filtered with an AR(1) model. The two volatility models

are then estimated, first, using 4, 573 daily observations, then 4, 573 + 1

observations, and so on up to 4, 573 + 259 observations. West (1996) noted

that the asymptotic distribution of DMS(L) might depend on the sample

variability of parameter estimates. However he established that no effect

arises when the length of the estimation sample dominates the length of the

evaluation sample. Since in our case the former (4, 573) is nearly twenty times

the latter (260), we proceed ignoring the effect of parameter uncertainty. The

averages of the 260 different point estimates of α0 for both models (estimates

of ω0 for model 1 are not reported for the sake of simplicity) are reported

in column two and column three for model 2 and model 1 respectively. For

both models estimation starts from an initial value α = 0.95.

We consider two different types of loss function, the square-rooted ab-

solute difference L(a, b) =| a − b | 12 and the loss function implicit in the

Gaussian log likelihood L(a, b) = ln(b)+a/b, (see Bollerslev, Engle, and Nel-

son (1994, section 7)), whose results are reported in columns 4−7 and 8−11

15

respectively. Four different forecasting horizons are considered: at 1 day, 1

week, 1 month and 6 months ahead. The forecasting function for model 1

(correct), is

(1)σ2t+p|t = E((1)σ2

t+p|Ft) = pω0 + α0r2t + (1− α0)σ

2t . (30)

For model 2 (common) the forecast function is constant and equal to

(2)σ2t+p|t = E((2)σ2

t+p|Ft) = α0r2t + (1− α0)σ

2t , (31)

for any p ≥ 1. Obviously the preceding expressions are in practice evalu-

ated at the estimated, rather than true, parameter values. The choice made

for the loss functions reflects the estimation methods employed for model 1

and model 2 respectively. The square-rooted absolute difference loss func-

tion should potentially favor model 2 whereas the Gaussian likelihood loss

function should favor model 1. Comparing the results for the two loss func-

tions should avoid biases when assessing the forecasting performance of the

competing models.

There are two main findings. First, the forecasting performance of model

2 is significantly worse than that of model 1 in most cases, or at most not

significantly different from it, for short and medium-run horizons (1 day, 1

week and 1 month). By contrast, for longer horizon (6 months), model 2

outperforms model 1 for the square-root loss function and it is not signifi-

cantly different from model 2 for the Gaussian loss function. This outcome is

not completely surprising. In fact, the asymptotic degenerateness of model

2 implies a bounded forecasting function for all horizons, whereas for (the

correctly specified) model 1 the forecasting function diverges to infinity, as

the forecast horizon grows to infinity.

Second, the forecasting performance of model 2 is extremely dependent

on the initial value for σ20(α) chosen when estimating the model. The first

16

four rows of Table 2 refer to different values for σ20(α), all arbitrarily chosen,

except for row four for which σ20(α) has been estimated jointly with α. Nev-

ertheless, even for this last case, the poor forecasting performance of model

2 emerges (except for the long horizon). In fact, different initial conditions

imply extremely different point estimates for α, as indicated in the second

column reporting the average of the αT , and thus different forecasts. The

performance of the two models is comparable when setting σ20(α) = 0.1 in

the estimation of model 2, yielding an average of the αT equal to 0.881, not

surprisingly close to the average of the αpmleT of 0.851.

The last three rows report the result obtained setting σ20(α) equal to

the sample variance of the data, and setting α = 0.94. This is the value

suggested by J.P.Morgan/Reuters (1996, p.100) and, presumably, close by to

any other values used by practitioners. As before, no significative difference

arises between the two competing models for all but the 6 months horizon.

3.2 Comparing forecasting performances (bivariate)

This section compares the predictive performance of the common approach

versus the correct approach in a bivariate (m = 2) exercise.

We employ the Mark/Dollar spot exchange rate series described in section

3.1 as well as the time series of the Yen/Dollar spot exchange rate. (The

data are described in Andersen and Bollerslev (1998).) Considering only

the observations which share the same trading periods yields 4, 573 daily

observations - from July 1, 1974, through September 30, 1992 - and 74, 304

intra-daily (five-minute returns) observations - from October 1, 1992, through

September 30, 1993. Table 3 describes the results. The return data have been

first filtered with an AR(1) model. For this bivariate exercise we consider

the loss functions L1(A, B) =| a11 − b11 | 12 + | a22 − b22 | 12 + | a12 − b12 | 12

17

, L2(A,B) =| a12/√

a11a22 − b12/√

b11b22 |, for any pair of 2 × 2 symmetric

matrices

A =

(a11 a12

a12 a22

), B =

(b11 b12

b12 b22

).

The first type of loss function generalizes the square-rooted absolute differ-

ence used in section 3.1. The second type of loss function compares (condi-

tional) cross-correlations. The forecasting function for model 1 is, for p ≥ 1,

(1)Σt+p|t = pΩ0 + α0rtr′t + (1− α0)Σt.

As for the univariate case, the forecast function of model 2 is constant and

equal to

(2)Σt+p|t = α0rt r′t + (1− α0)Σt,

for any p.

The results of Table 3 confirm that model 1 significantly outperforms

model 2 in most cases. The results are less conclusive for longer horizons.

The better performance of model 1 is particularly evident when comparing

conditional cross-correlations (columns 8− 11). Second, we find the extreme

sensibility of the estimates and forecasting performance of model 2 from ini-

tial conditions Σ0(α) = σ20(α)I2. Again, the performance of the two models

is never significantly different when σ20(α) = 0.1. When deriving σ0(α) jointly

with αT , the performance of model 2 is significantly worse than that of model

1 for most cases. Finally, note that the estimates αT are only marginally dif-

ferent the ones reported in Table 2. (In both cases, estimation started from

α = 0.95.) Summarizing, the common approach is markedly inferior, at all

horizons, when the smoothing parameter is estimated.

18

4 Concluding remarks

Practitioners face the problem of estimating in real time conditional volatil-

ities and cross-correlations for a large set of asset returns. The so-called

common approach of specifying, estimating and forecasting GARCH mod-

els provides a simple and feasible way to achieve this task. Unfortunately,

as this paper shows, the estimates obtained in this way lack of the usual

(asymptotic) statistical properties. The Monte-Carlo experiments indicates

that such estimation procedure is invalid even in small-sample. The empiri-

cal applications here presented suggest that this can have a non trivial effect

on the forecasting performance of conditional variances and correlations.

5 Appendix

Proof of Theorem 1. We drop for simplicity any reference to the index i,

since it plays no role in the proof.

(a) Impose (13). Substituting for t times, recursively, r2t = z2

t σ2t into σ2

t =

ω0 + α0r2t−1 + (1− α0)σ

2t−1 (cf. Nelson (1990, eq.(6))) yields

σ2t = σ2

0

t∏i=1

(1− α0 + α0z2t−i). (32)

which is bounded a.s. for any t < ∞ whenever σ20 < ∞. Moreover, there

exists a random integer K < ∞ a.s. such that (cf. Nelson (1990, p.320))

t∏s=1

(1− α0 + α0z2t−i) ≤ Ct a.s. for any t > K, (33)

setting C = eγ2 < 1, with γ = E ln z2

0 < 0. Next, by Markov’s inequality, for

arbitrary ε > 0, Pr(t−2 z2t ≥ ε) ≤ ε−1t−2 yielding by Borel-Cantelli lemma

z2t = o(t2) a.s. for t →∞. (34)

19

Finally, using (a − b)2 ≤ 2(a2 + b2) for any real a, b yields (assume T > K

with no loss of generality)

MSET (α) ≤ 2

T

T∑t=2

r4t +

2

T

T∑t=2

σ4t (α). (35)

For the first term on the right-hand side of (35), using (33) and (34),

T∑t=2

r4t =

K∑t=2

r4t +

T∑t=K+1

r4t

≤K∑

t=2

r4t + σ4

0

T∑t=K+1

C2tz4t ≤

K∑t=2

r4t + σ4

0

∞∑t=1

C2tz4t < ∞ a.s.,

yielding 1T

∑Tt=2 r4

t = O(

1T

)a.s. for T → ∞. Note that this term is inde-

pendent of α.

For the second term on the right-hand side of (35)∑T

t=2 σ4t (α) =

∑K+1t=2 σ4

t (α)+∑T

t=K+2 σ4t (α) and

T∑t=K+2

σ4t (α) ≤ 2α2

T∑t=K+2

(t−K−1∑

s=1

(1− α)s−1r2t−s

)2

+2α2

T∑t=K+2

(t−1∑

s=t−K

(1− α)s−1r2t−s

)2

(36)

using σ2t (α) = α

∑t−1s=1(1 − α)s−1r2

t−s. For the first term on the right hand

side of (36), using the cr inequality, viz. (∑m

i=1 ai)2 ≤ m (

∑mi=1 a2

i ) for any

sequence ai,

α2

T∑t=K+2

(t−K−1∑

s=1

(1− α)s−1r2t−s

)2

≤ σ40α

2

T∑t=K+2

t

(t−K−1∑

s=1

(1− α)2(s−1)C2(t−s)z4t−s

)

≤ σ40α

2

T∑t=2

t

|t/2|∑s=1

(1− α)2(s−1)C2(t−s)z4t−s

+σ4

0α2

T∑t=2

t

t−1∑

s=|t/2|+1

(1− α)2(s−1)C2(t−s)z4t−s

≤ σ40α

2

T∑t=2

t Ct

|t/2|∑s=1

(1− α)2(s−1)z4t−s

+ σ4

0α2

T∑t=2

t (1− α)t

t−1∑

s=|t/2|+1

C2(t−s)z4t−s

20

≤ σ40α

2

T∑t=2

t Ct

(t−1∑s=1

z4t−s

)+ σ4

0α2

T∑t=2

t (1− α)t

(t−1∑s=1

z4t−s

)≤ 2σ4

0α2

T∑t=2

Bt

(t−1∑s=1

z4t−s

)

= 2σ40α

2

T−1∑t=1

z4t

(T∑

s=t+1

Bs

)≤ 2σ4

0α2

1−B

T−1∑t=1

z4t B

t+1 ≤ 2σ40α

2

1−B

∞∑t=1

z4t B

t+1 < ∞ a.s.

for a constant B satisfying sup [ (1− α), C] < B < 1. For the second term

on the right-hand side of (36)

2α2

T∑t=K+2

(t−1∑

s=t−K

(1− α)s−1r2t−s

)2

≤ α2 K max1≤s≤K r4i

T∑t=K+2

(t−1∑

s=t−K

(1− α)2(s−1)

)

= α2 K max1≤s≤Kr4i

T∑t=K+2

(1− α)2(t−K−1)

(K−1∑s=0

(1− α)2s

)

≤ α2 K max1≤s≤K r4i

( ∞∑t=0

(1− α)2t

)2

= K max1≤s≤K r4i

(α

1− (1− α)2

)2

< ∞ a.s.

Finally, collecting terms supα∈[α,α]2T

∑Tt=2 σ4

t (α) = O(

1T

)a.s. for T →∞. 2

(b) E(MSET (α) | σ20) = T−1

∑Tt=1 (E(r4

t | σ20) + E(σ4

t (α) | σ20)− 2E(r2

t σ2t (α) | σ2

0)) .

Easy calculations yield E(r4t | σ2

0) = σ40δ

t0 setting δ0 := E(1 − α0 + α0z

2t )

2.

Next, setting κ0 := E(1−α0 + α0z2t )z

2t and E(r2

t σ2t (α) | σ2

0) = κ0α∑t−1

j=1(1−α)(j−1)δt−j

0 , one gets that E(σ4t (α) | σ2

0) equals

σ40α

2

t−1∑j=1

(1− α)2(j−1)δt−j0 + 2σ4

0α2κ0

t−2∑j2=1

(1− α)j2−1

t−1∑j1=j2+1

(1− α)j1−1δt−j10 .

Tedious calculations yield E(r4t | σ2

0) + E(σ4t (α) | σ2

0) − 2E(r2t σ

2t (α) | σ2

0) =

δt0ct(α), where the sequence of positive constants ct(α) satisfy ct(α) → c(α) <

∞ as t →∞, setting

c(α) =

(1 +

α2

(δ0 − (1− α)2)+

2α2κ0(1− α)

(δ0 − (1− α))(δ0 − (1− α)2)− 2ακ0

(δ0 − (1− α))

).

Note that ct(α), c(α) depend also on δ0, κ0 but we are not making this explicit

for simplicity. By simple manipulations, noting that δ0 = 1+α20(µ4−1), κ0 =

21

1+α0(µ4−1) for µ4 := Ez40 , one gets c(α) = α(µ4−1)2(δ0− (1−α)2)−1(δ0−

(1 − α))−1, implying infα∈[α,α] c(α) = c > 0. Hence (24) easily follows since

δ0 > 1. 2

References

Andersen, A., and T. Bollerslev (1998): “Answering the skeptics: Yes,

standard volatility models do provide accurate forecasts,” International

Economic Review, 39, 885–905.

Bauwens, L., S. Laurent, and J. Rombouts (2003): “Multivariate

GARCH models: a survey,” CORE , Preprint.

Bollerslev, T. (1986): “Generalized autoregressive conditional het-

eroskedasticity,” Journal of Econometrics, 31, 302–327.

(1990): “Modelling the coherence in short-run nominal exchange

rates: a multivariate generalized ARCH,” Review of Economics and Statis-

tics, 72, 498–505.

Bollerslev, T., R. Engle, and D. Nelson (1994): “ARCH models,”

in Handbook of Econometrics vol. IV, ed. by R. Engle, and D. McFadden.

Amsterdam: North Holland.

Bollerslev, T., R. Engle, and J. Wooldridge (1988): “A capital

asset pricing model with time varying covariances,” Journal of Political

Economy, 96, 116–131.

Cheng, M., J. Fan, and V. Spokoiny (2003): “Dynamic nonparametric

filtering with application to finance,” Weiesestrass Insitute and Humboldt

University Berlin, Preprint.

22

Diebold, F., and R. Mariano (1995): “Comparing predictive accuracy,”

Journal of Business and Economic Statistics, 13, 253–263.

Engle, R. (2002): “Dynamic conditional correlation - a simple class of mul-

tivariate generalized autoregressive conditional heteroskedasticity mod-

els,” Journal of Business & Economic Statistics, 20, 339–350.

Engle, R. F. (1982): “Autoregressive conditional heteroskedasticity with

estimates of the variance of the United Kingdom,” Econometrica, 50, 987–

1007.

J.P.Morgan/Reuters (1996): “RiskMetricsTM - Technical Document,

Fourth Edition,” .

Litterman, R., and K. Winkelmann (1998): “Estimating Covariance

Matrices,” Risk management series, Goldman Sachs & Co.

Lumsdaine, R. (1996): “Consistency and asymptotic normality of the

quasi-maximum likelihood estimator in IGARCH(1,1) and covariance sta-

tionary GARCH(1,1) models,” Econometrica, 64/3, 575–596.

Nelson, D. (1990): “Stationarity and persistence in the GARCH(1,1)

model,” Econometric Theory, 6, 318–334.

Tse, Y. K., and A. Tsui (2002): “A Multivariate Generalized Autore-

gressive Conditional Heteroscedasticity Model With Time-Varying Corre-

lations,” Journal of Business and Economic Statistics, 20, 351–362.

West, K. (1996): “Asymptotic inference about predictive ability,” Econo-

metrica, 64, 1067–1084.

23

Table 1: univariate estimation of α0

ω0 = 0 ω0 = 1

Value Mean Stand. Dev. Maximum Minimum Obj. fun. Value Mean Stand. Dev. Maximum Minimum Obj. fun.

T = 1, 000

0.05 0.21 0.22 0.96 0.00 2.50 0.05 0.05 0.01 0.10 0.00 2.200.75 0.74 0.16 0.93 0.02 101.13 0.75 0.74 0.02 0.80 0.68 4.580.95 0.95 0.02 1.00 0.50 19.78 0.95 0.95 0.01 0.99 0.91 6.55

T = 5, 000

0.05 0.36 0.16 0.96 0.00 0.50 0.05 0.04 0.01 0.07 0.00 2.200.75 0.74 0.14 0.93 0.06 20.22 0.75 0.74 0.01 0.77 0.71 4.630.95 0.95 0.03 0.98 0.50 69.57 0.95 0.94 0.001 0.96 0.93 7.35

T = 15, 000

0.05 0.51 0.10 0.96 0.00 0.16 0.05 0.05 0.01 0.06 0.00 2.200.75 0.73 0.15 0.93 0.23 6.70 0.75 0.75 0.01 0.76 0.73 4.640.95 0.95 0.02 0.98 0.50 28.44 0.95 0.95 0.01 0.95 0.94 7.55This table reports the results of two Monte-Carlo experiments (1, 000 replications): columns

one to six reports the summary statistics of the LS estimator αT for IGARCH(1, 1) when

(13) holds (common approach). Columns seven to twelve reports the summary statistics of

the PML estimator αpmleT for IGARCH(1, 1) when (25) holds (correct approach).

Obj. func. is the average (across 1000 replications) of the MSE (17) for LS estimator and of

the Gaussian log likelihood (26) for PML estimator, respectively.

24

Table 2: Testing predictive accuracy (univariate)

1260

∑260t=p+1

∣∣∣σ2t − σ2

t|t−p

∣∣∣12 1

260

∑260t=p+1

(ln(σ2

t|t−p) + σ2t

σ2t|t−p

)

σ20(α) αT αpmle

T p: 1 5 20 120 1 5 20 120

0.10 0.88 0.85 0.82 1.47 1.95 8.88 -0.91 -0.95 -0.38 2.640.50 0.51 ” -5.10 -2.31 -0.25 4.33 -4.27 -3.59 -2.23 -0.211.00 0.30 ” -4.47 -7.47 -0.89 3.90 -2.91 -3.04 -3.40 -1.080.22 0.49 ” -5.06 -2.50 -0.31 4.21 -4.45 -3.44 -2.44 -0.31s2 0.94 ” 0.06 0.52 1.82 9.79 1.288 0.38 0.54 4.91

s2, n = 15 0.94 ” 0.68 0.68 0.81 5.22 -0.49 -0.55 -0.58 -0.36s2, n = 30 0.94 ” 0.78 2.02 1.98 6.41 -0.55 -1.33 0.11 2.45

Columns 4 to 11 report the Diebold and Mariano (28) test statistic that is N(0, 1) under the null

hypothesis (29). Columns 1 and 2 report the average (across 260 replications) of the LS estimates

of σ20 , α0. Column 3 reports the average of the PMLE of α0 (the average of the PMLE of ω0 is

9× 10−7). s2 indicates the sample covariance matrix of the data and n is the time-window (in days).

Table 3: Testing predictive accuracy (bivariate)

1258

∑258t=p+1

(e(11),t + e(22),t + e(12),t

)1

258

∑258t=p+1

∣∣ρ12,t − ρ12,t|t−p

∣∣

e(ij),t =∣∣∣σ2

ij,t − σ2ij,t|t−p

∣∣∣12

ρ12,t|t−p =σ212,t|t−p

σ11,t|t−pσ22,t|t−p

σ20(α) αT αpmle

T p: 1 5 20 120 1 5 20 120

0.10 0.88 0.85 0.48 0.81 4.12 8.50 1.01 0.87 -0.12 0.670.50 0.51 ” -5.28 -2.66 -0.21 4.81 -6.54 -7.30 -5.97 -8.931.00 0.31 ” -7.26 -4.94 -0.68 4.28 -8.24 -8.60 -7.73 -8.170.22 0.49 ” -5.43 -2.81 -0.28 4.68 -6.67 -7.30 -6.14 -8.86s2 0.94 ” 0.63 1.02 4.24 8.64 1.07 0.73 -0.34 0.40

s2, n = 15 0.94 ” -0.49 1.20 1.31 7.09 -3.56 -2.71 -1.87 -0.83s2, n = 30 0.94 ” 0.64 1.35 2.83 8.34 0.27 -0.17 -0.85 -0.04

Columns 4 to 11 report the Diebold and Mariano (28) test statistic. Columns 1 and 2 report

the average (across 258 replications) of the LS estimates of σ20 , α0. Column 3 reports the average

of the PMLE of α0 (the average of the PMLE of Ω0 = (ω0,11, ω0,12, ω0,22) is (9× 10−7, 3× 10−7, 10× 10−7)).

s2 indicates the sample covariance matrix of the data and n is the time-window (in days).

25

Fig

ure

1

-15

-10-505101520

020

0040

0060

0080

0010

000

1200

014

000

-500

-400

-300

-200

-1000

100

200

300

400

500

020

0040

0060

0080

0010

000

1200

014

000

01234567

020

0040

0060

0080

0010

000

1200

014

000

Not

e:Sim

ula

ted

pat

hof

GA

RC

H(1

,1)

r tw

ith

α0

=0.

95an

dω

0=

0(fi

rst

pan

el)

and

ω0

=1

(sec

ond

pan

el)

*an

dof

tr(Σ

t)(c

f.(1

4))

wit

hΩ

0=

0an

dα

0=

0.95

(thir

dpan

el).

Fig

ure

2

0

20

0 0.0

00

.10

0.2

00.3

00

.40

0.5

00.6

00.7

00

.80

0.9

0

IUHTXHQF\

T=

1.0

00

T=

5.0

00

T=

15

.00

0

0

10

20

30

40

50

60

70

80

90

10

0 0.0

00

.10

0.2

00.3

00

.40

0.5

00.6

00.7

00

.80

0.9

01.0

0

IUHTXHQF\

T=

15

.00

0

T=

5.0

00

T=

1.0

00

0

50

10

0 0.9

00

.91

0.9

20.9

30

.94

0.9

50.9

60.9

70

.98

0.9

91.0

0

IUHTXHQF\

T=

15

.000

T=

5.0

00

T=

1.0

00

0

10

0

20

0

30

0

40

0

50

0

60

0 0.0

00

.02

0.0

40

.06

0.0

80

.10

0.1

20

.14

IUHTXHQF\

T=

15

.00

0

T=

5.0

00

T=

1.0

00

0

10

0

20

0

30

0

40

0

50

0

60

0 0.6

50

.67

0.6

90.7

10

.73

0.7

50.7

70.7

90

.81

0.8

3

IUHTXHQF\

T=

15

.00

0

T=

5.0

00

T=

1.0

00

0

50

10

0

15

0

20

0

25

0

30

0

35

0

40

0 0.9

00

.91

0.9

20.9

30

.94

0.9

50.9

60.9

70

.98

0.9

9

IUHTXHQF\

T=

15

.00

0

T=

5.0

00

T=

1.0

00

Not

e:M

onte

carl

odis

trib

uti

on(1

,000

replica

tion

s)of

the

LS

esti

mat

orα

Tw

ith

true

valu

esω

0=

0an

dα

0=

0.05

(firs

tpan

el),

α0

=0.

75*

(sec

ond

pan

el),

α0

=0.

95(t

hir

dpan

el)

and

ofof

the

PM

Les

tim

ator

αpm

leT

wit

htr

ue

valu

esω

0=

1an

dα

0=

0.05

(fou

rth

pan

el),

α0

=0.

75*

(fift

hpan

el),

α0

=0.

95(s

ixth

pan

el).

Est

imat

ion

star

tsfr

oman

init

ialva

lue

ofα

=0.

5.T

he

dot

ted

line

refe

rsto

sam

ple

sof

T=

1,00

0*

obse

rvat

ions,

the

grey

line

tosa

mple

sof

T=

5,00

0ob

serv

atio

ns

and

the

bla

ckline

tosa

mple

sof

T=

15,0

00ob

serv

atio

ns.

paolo zaffaroni ∗ imperial college london

Documents