


GARCH MODELS OF DYNAMIC VOLATILITY AND CORRELATION

David S. Matteson and David Ruppert

School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14853

Economic and financial time series typically exhibit time-varying conditional (given the past) standard deviations and correlations. The conditional standard deviation is also called the volatility. Higher volatilities increase the risk of assets, and higher conditional correlations cause an increased risk in portfolios. Therefore, models of time-varying volatilities and correlations are essential for risk management. GARCH (Generalized AutoRegressive Conditional Heteroscedastic) processes are dynamic models of conditional standard deviations and correlations. This tutorial begins with univariate GARCH models of conditional variance, including univariate APARCH (Asymmetric Power ARCH) models that feature the leverage effect often seen in asset returns. The leverage effect is the tendency of negative returns to increase the conditional variance more than do positive returns of the same magnitude. Multivariate GARCH models potentially suffer from the curse of dimensionality, because there are d(d + 1)/2 variances and covariances for a d-dimensional process, but most multivariate GARCH models reduce the dimensionality in some way. A number of multivariate GARCH models are reviewed: the EWMA model, the diagonal VEC model, the dynamic conditional correlations model, orthogonal GARCH, and the dynamic orthogonal components model. In a case study, the dynamic orthogonal components model was found to provide the best fit.

INTRODUCTION

The volatility of a time series Y_1, Y_2, . . . at time n is the conditional standard deviation of Y_n given Y_1, . . . , Y_{n−1}. A characteristic feature of economic and financial time series is volatility clustering, where periods of high and of low volatility occur in the data. Typically, the changes between periods of low, medium, and high volatility do not exhibit any systematic patterns and seem best modeled as occurring randomly.

Volatility clustering is illustrated in Figure 1, which is a time series plot of daily S&P 500 log returns from January 1981 through April 1991. Let P_n be the price of an asset at time n. The net return on the asset at time n is the relative change since time n − 1, specifically (P_n − P_{n−1})/P_{n−1}, and is usually expressed as a percentage. Financial analysts often use the log-return, which is log(P_n/P_{n−1}). Using the approximation log(1 + x) ≈ x, it is easy to show that the return and the log-return are close to each other, provided that they are both small, say under 5%, which is typical of daily returns; see Chapter 2 of [1] for more information about returns. In Figure 1, the very high volatility in October 1987 is obvious, especially on and after October 19 (Black Monday), when the return on the S&P 500 index was −22.8%, and other periods of high or low volatility can also be seen.
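The closeness of the net return and the log-return for small price moves is easy to check numerically. A short Python sketch (illustrative only; the prices are invented):

```python
import math

p_prev, p_now = 100.0, 98.0  # hypothetical consecutive daily closing prices
net_return = (p_now - p_prev) / p_prev       # (P_n - P_{n-1}) / P_{n-1}
log_return = math.log(p_now / p_prev)        # log(P_n / P_{n-1})
# For a 2% move the two differ only in the fourth decimal place,
# consistent with the approximation log(1 + x) ≈ x.
```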

Fig. 1. S&P 500 daily log returns from January 1981 through April 1991. The returns are expressed as percentages.

Further evidence of volatility clustering can be seen in the sample autocorrelation function (ACF) of the squared returns, which is plotted on the right in Figure 2. The ACF of the returns on the left side of the figure shows little autocorrelation. The squared returns show substantial positive autocorrelation because of volatility clustering, which causes the values of squared returns on days t and t − k to be similar, at least for small values of k.

To model volatility clustering, one needs a stochastic process with a nonconstant conditional (given the past) variance. GARCH and stochastic volatility models have this property. GARCH models are easier to implement and are more widely used than stochastic volatility models, so, in this tutorial, we will cover only GARCH models. Stochastic volatility models are used in options pricing; see [2].

We start with univariate GARCH models and then cover the multivariate case. Multivariate GARCH models are of obvious importance to many areas of finance, e.g., managing the


Fig. 2. ACF of S&P 500 daily returns and squared returns.

risk of a portfolio of assets. Even in the case of a single asset, volatility clustering complicates risk management, since the level of risk varies with the conditional standard deviation of the returns on the stock. With a portfolio of stocks, risk management is even more challenging because: (1) the conditional standard deviations of the returns on the stocks in the portfolio will co-evolve; and (2) the correlations between the returns will also vary. Multivariate GARCH models accommodate both of these effects.

UNIVARIATE GARCH MODELS

To understand GARCH models, it is useful to review ARMA models. Throughout this article, {ϵ_t} is an independent, identically distributed (iid) sequence with mean zero and finite variance σ²_ϵ. An ARMA(p, q) model for the process {Y_t} is

(Y_t − µ) = ϕ₁(Y_{t−1} − µ) + · · · + ϕ_p(Y_{t−p} − µ) + ϵ_t + θ₁ϵ_{t−1} + · · · + θ_qϵ_{t−q}.

The conditional mean of Y_t, given the past, is µ + ϕ₁(Y_{t−1} − µ) + · · · + ϕ_p(Y_{t−p} − µ) + θ₁ϵ_{t−1} + · · · + θ_qϵ_{t−q}. The conditional variance of Y_t is the constant σ²_ϵ, which explains why an ARMA process is not suitable for modeling volatility clustering.

In a manner analogous to the way an ARMA process models the conditional mean, a GARCH model expresses the conditional variance at time t as a linear combination of past values of the conditional variance and of the squared process. The simplest GARCH model is the ARCH(1) process a₁, a₂, . . . where

a_t = σ_t ϵ_t  and  σ²_t = ω + α₁ a²_{t−1}.

In this model, a large value of |a_{t−1}|, which is an indication of high volatility of a_{t−1}, increases σ_t, the volatility of a_t. For this reason, volatilities cluster in an ARCH(1) model, with higher values of α₁ causing more volatility clustering. It is required that ω > 0 and α₁ ≥ 0 so that the conditional variance stays positive. Also, α₁ < 1 is required for stationarity, specifically to keep σ²_t from exploding.

Unfortunately, the ARCH(1) model does not have enough parameters to provide a good fit to many financial time series. A simple solution to this problem is to allow σ²_t to depend not only on a²_{t−1} but also on σ²_{t−1}. This is done with the GARCH(1,1) model, where a_t = σ_t ϵ_t as before, but now

σ²_t = ω + α₁ a²_{t−1} + β₁ σ²_{t−1}.
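Volatility clustering emerges directly from this recursion, which is easy to see by simulation. The following Python sketch (illustrative only; the tutorial's own computations use R, and the parameter values here are invented) simulates a GARCH(1,1) path with Gaussian ϵ_t:

```python
import random
import statistics

def simulate_garch11(n, omega, alpha, beta, seed=42):
    """Simulate a_t = sigma_t * eps_t with sigma_t^2 = omega + alpha*a_{t-1}^2 + beta*sigma_{t-1}^2."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)     # start at the unconditional variance
    a_prev = var ** 0.5 * rng.gauss(0.0, 1.0)
    path = [a_prev]
    for _ in range(n - 1):
        var = omega + alpha * a_prev ** 2 + beta * var
        a_prev = var ** 0.5 * rng.gauss(0.0, 1.0)
        path.append(a_prev)
    return path

a = simulate_garch11(50_000, omega=0.05, alpha=0.10, beta=0.85)
# The sample variance should be near omega/(1 - alpha - beta) = 1.0,
# and the squared series should be positively autocorrelated (clustering).
print(round(statistics.pvariance(a), 2))
```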

The GARCH(1,1) model can be generalized easily; {a_t} is a GARCH(p, q) process if a_t = σ_t ϵ_t, where

σ²_t = ω + Σ_{i=1}^{p} α_i a²_{t−i} + Σ_{i=1}^{q} β_i σ²_{t−i},

α₁, . . . , α_p and β₁, . . . , β_q are all nonnegative, ω > 0, and Σ_{i=1}^{max(p,q)} (α_i + β_i) < 1 for stationarity. Without a constraint, the parameters in the GARCH(p, q) model are not uniquely defined, because one can multiply each of ω, α₁, . . . , α_p, and β₁, . . . , β_q by some positive constant and divide σ²_t by that constant without changing the model. To make the parameters uniquely defined, we add the constraint that σ²_ϵ equals 1.

Let F_n be the information set at time n, or, more precisely, the set σ⟨ϵ_n, ϵ_{n−1}, . . .⟩ of all information contained in {ϵ_t : t = −∞, . . . , n}, or, equivalently, in {a_t : t = −∞, . . . , n}. Then, for any random variable X, E(X | F_{n−1}) is the conditional expectation of X given {ϵ_t : t = −∞, . . . , n − 1}. It is easy to check that E(a_t | F_{t−1}) = 0. More importantly, var(a_t | F_{t−1}) = σ²_t, so the conditional variance of a_t is a linear combination of: (1) past conditional variances; and (2) past values of a²_t. Feature (1) keeps the conditional variance smooth, while feature (2) allows it to adapt to changes in the observed volatility of the process.

Because {ϵ_t} is stationary, so also is {a_t}, and so its unconditional variance is constant and, in fact, equal to σ²_a := ω / {1 − Σ_{i=1}^{max(p,q)} (α_i + β_i)}, as will be shown soon. Here, to simplify notation, we define α_i = 0 if i > p and β_i = 0 if i > q. Therefore, a GARCH process has a constant unconditional variance (as it must, since it is a stationary process) but a nonconstant conditional variance; we say that it is homoscedastic but conditionally heteroscedastic. The name "GARCH" is an acronym for "generalized autoregressive conditional heteroscedasticity." Engle [3] proposed ARCH(p) processes, where q = 0 and the conditional heteroscedasticity is modeled as an autoregression in past values of a²_t. Bollerslev [4] extended Engle's idea to GARCH processes, that is, q ≥ 1.
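The stationarity constraint and the unconditional-variance formula can be packaged in a small helper. A Python sketch (a hypothetical function written for this tutorial, not part of any GARCH library):

```python
def garch_unconditional_variance(omega, alphas, betas):
    """Return omega / (1 - sum(alpha_i + beta_i)), with alpha_i = 0 for i > p
    and beta_i = 0 for i > q, after checking the GARCH(p, q) constraints."""
    m = max(len(alphas), len(betas))
    a = list(alphas) + [0.0] * (m - len(alphas))
    b = list(betas) + [0.0] * (m - len(betas))
    s = sum(ai + bi for ai, bi in zip(a, b))
    if not (omega > 0 and all(x >= 0 for x in a + b) and s < 1):
        raise ValueError("parameters violate positivity or stationarity constraints")
    return omega / (1.0 - s)

# GARCH(1,1) with omega = 0.05, alpha_1 = 0.10, beta_1 = 0.85:
print(garch_unconditional_variance(0.05, [0.10], [0.85]))  # approximately 1.0
```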

For any k ≥ 1, we have

E(a_t a_{t−k}) = E{E(a_t a_{t−k} | F_{t−1})} = E{a_{t−k} E(a_t | F_{t−1})} = 0,


so that {a_t} is an uncorrelated process, although it is not an independent sequence, since var(a_t | F_{t−1}) = σ²_t depends on past values of the process. An uncorrelated process with a constant unconditional mean and variance, such as {a_t}, is called a weak white noise process.

A GARCH model for {a_t} is also an ARMA model for {a²_t}. To see this, let η_t = a²_t − σ²_t, which is a weak white noise process, and, as before, let σ²_a = ω / {1 − Σ_{i=1}^{max(p,q)} (α_i + β_i)}. Straightforward algebra shows that

a²_t − σ²_a = Σ_{i=1}^{max(p,q)} (α_i + β_i)(a²_{t−i} − σ²_a) − Σ_{i=1}^{q} β_i η_{t−i} + η_t,

so that a²_t is an ARMA(max(p, q), q) process with unconditional mean σ²_a. Therefore, σ²_a is the unconditional variance of a_t, as was claimed before.

The ACF of {a²_t} is readily obtained as the ACF of the corresponding ARMA process. For example, if {a_t} is an ARCH(1) process, then {a²_t} is an AR(1) process, and therefore the ACF of {a²_t} is ρ_{a²}(h) = α₁^{|h|}. If {a_t} is a GARCH(1,1) process, then {a²_t} is an ARMA(1,1) process, and the ACF of {a²_t} is

ρ_{a²}(1) = α₁(1 − α₁β₁ − β₁²) / (1 − 2α₁β₁ − β₁²)  and  ρ_{a²}(k) = (α₁ + β₁)^{k−1} ρ_{a²}(1), k ≥ 2.

These results can be derived from the ACF of the ARMA(1,1) model Y_t = ϕY_{t−1} + θϵ_{t−1} + ϵ_t, which is

ρ_Y(1) = (1 + ϕθ)(ϕ + θ) / (1 + θ² + 2ϕθ)  and  ρ_Y(k) = ϕ^{k−1} ρ_Y(1), for k ≥ 2,

by replacing ϕ by α₁ + β₁ and θ by −β₁.

The parameters of a GARCH model can be estimated by
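The substitution ϕ = α₁ + β₁ and θ = −β₁ can be verified numerically; a quick Python check (parameter values are arbitrary):

```python
alpha1, beta1 = 0.10, 0.85
phi, theta = alpha1 + beta1, -beta1

# Lag-1 ACF of the squared GARCH(1,1) process
rho_garch = alpha1 * (1 - alpha1 * beta1 - beta1 ** 2) / (1 - 2 * alpha1 * beta1 - beta1 ** 2)
# Lag-1 ACF of the ARMA(1,1) model Y_t = phi*Y_{t-1} + theta*eps_{t-1} + eps_t
rho_arma = (1 + phi * theta) * (phi + theta) / (1 + theta ** 2 + 2 * phi * theta)

assert abs(rho_garch - rho_arma) < 1e-12
# Higher lags decay geometrically at rate alpha1 + beta1:
rho2 = (alpha1 + beta1) * rho_garch
```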

maximum likelihood (see [5]), and p and q can be selected by minimizing AIC or BIC. AIC and BIC are criteria designed to select models that provide a good fit to the data without an excessive number of parameters; such models are called parsimonious. If θ is the vector of all parameters, dim(θ) is the length of θ, θ̂ is the MLE, and L(θ) is the log-likelihood, then AIC = −2L(θ̂) + 2 dim(θ), and BIC = −2L(θ̂) + log(n) dim(θ), where n is the sample size and log is the natural logarithm. Smaller values of AIC or BIC indicate better models. A small value of −2L(θ̂) indicates a good fit, while 2 dim(θ) and log(n) dim(θ) penalize the number of parameters. Since log(n) > 2 in any practical setting, BIC tends to select more parsimonious models than AIC. See Section 5.12 of [1] for more information about AIC and BIC. For the S&P 500 daily returns, all models with 1 ≤ p ≤ 3 and 1 ≤ q ≤ 5 were compared by AIC and BIC. AIC was minimized by the GARCH(1,4) model, and BIC was minimized by the GARCH(1,2) model. AIC has a tendency to overfit, that is, to select too many parameters, so the GARCH(1,2) model will be used for further analysis.
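The two criteria differ only in their penalty terms, which a small Python helper (illustrative, not the fGarch implementation) makes explicit:

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2*loglik + 2*dim(theta)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*loglik + log(n)*dim(theta)."""
    return -2.0 * loglik + math.log(n_obs) * n_params

# For any sample larger than e^2 (about 8 observations), BIC penalizes an
# extra parameter more heavily than AIC, so it favors smaller models.
print(bic(0.0, 1, 2500) > aic(0.0, 1))  # True
```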

The standardized residuals are a_t/σ_t and estimate the {ϵ_t} process. They can be used for model checking. If the ACF of the squared standardized residuals shows little autocorrelation, then this is a sign of a good fit. For the S&P 500 returns, the ACFs of the standardized residuals and the squared standardized residuals are shown in Figure 3. The squared standardized residuals show little autocorrelation, which is evidence that the GARCH(1,2) model selected by BIC is adequate and the more complex model selected by AIC is not needed.

Many software packages such as SAS and Matlab have functions for fitting GARCH models. The GARCH models in this section, as well as the ARMA/GARCH and APARCH models discussed later, were estimated using the garchFit command in the fGarch package of R [6]. R has the advantages that it is very widely used by statisticians and that it is available free of charge. Examples of R code for fitting GARCH models can be found in [1] and on the web site accompanying that book: http://people.orie.cornell.edu/~davidr/SDAFE/index.html

Fig. 3. S&P 500 daily returns. ACFs of the standardized residuals and the squared standardized residuals from a GARCH(1,2) model.

COMBINING GARCH AND ARMA MODELS

There is some first-order autocorrelation in the standardized residuals in Figure 3, and this suggests that the returns are correlated and so are not a GARCH process. This problem is remedied by combining an ARMA model for the conditional mean with a GARCH model for the conditional standard deviation. The combined model, which will be called an ARMA(p_A, q_A)/GARCH(p_G, q_G) model, is

(Y_t − µ) = ϕ₁(Y_{t−1} − µ) + · · · + ϕ_{p_A}(Y_{t−p_A} − µ) + a_t + θ₁a_{t−1} + · · · + θ_{q_A}a_{t−q_A},

where {a_t} is a GARCH(p_G, q_G) process.


An ARMA(0,1)/GARCH(1,2) process was fit to the S&P 500 returns, and neither the standardized residuals nor the squared standardized residuals showed evidence of autocorrelation. The conditional standard deviations estimated by the model are shown in Figure 4 for the second half of 1987. The unconditional standard deviation is shown as a horizontal red line. The conditional standard deviations have a sharp peak on the day after Black Monday. Notice, however, that they had been rising steadily before Black Monday, and on Black Monday the conditional standard deviation was over twice as large as the unconditional standard deviation.

Fig. 4. S&P 500 daily returns. Estimated conditional standard deviations during the second half of 1987. The horizontal red line shows the unconditional standard deviation. The sharp peak occurs on the day after Black Monday.

BLACK MONDAY

How probable was the large negative return on October 19, 1987, when the S&P 500 lost 22.8% of its value? Define a "Black Monday event" as a daily return less than or equal to −22.8%. The probability of a Black Monday event calculated using a Gaussian distribution and the unconditional variance is 4.1 × 10⁻⁹⁸, so we would expect a Black Monday event only once in 9.63 × 10⁹⁴ years! Obviously, something must be wrong with this calculation. In fact, two things are wrong: the Gaussian assumption and the use of the unconditional variance. Black Monday's return was 21 unconditional standard deviations below 0, but only 9.9 conditional standard deviations below 0. Moreover, daily stock returns have much heavier tails than the Gaussian distribution and have been observed to be approximately t-distributed with between 3 and 8 degrees of freedom (DF). A t-distribution was fit to the standardized residuals, and the MLE of DF was 6. The probability that a t₆ random variable is below −9.9 is small, and only one daily return this small can be expected every 1560 years. However, Black Monday is much more unlikely if one ignores the conditional heteroscedasticity and calculates the probability using the t₆-distribution with the unconditional standard deviation; then only one Black Monday event is expected every 21,600 years.
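The order of magnitude of the Gaussian calculation can be reproduced with a short Python sketch. Two cautions: the extreme tail must be computed with erfc (the naive normal CDF cancels to zero), and the conversion to years assumes roughly 253 trading days per year, an assumption not stated in the text:

```python
import math

def normal_left_tail(z):
    """P(Z < z) for standard normal Z; erfc avoids cancellation for very negative z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Black Monday was about 21 unconditional standard deviations below zero.
p = normal_left_tail(-21.0)                # on the order of 1e-98
years_per_event = 1.0 / (p * 253)          # assumes ~253 trading days per year
print(p < 1e-90)  # True: essentially impossible under a Gaussian model
```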

The standardized residuals also show some left skewness, and we should take that into account. If we fit a skew-t distribution [7] to the standardized residuals, then the estimated probability of a Black Monday event increases, and we can expect one event every 417 years. In summary, if we take into account conditional heteroscedasticity, heavy tails, and left skewness, a Black Monday event changes from being completely implausible to merely a very rare event. We do not expect many Black Monday events, but having one in the 20th century was plausible.

The probabilities reported in this section were calculated using parameters estimated from the entire series, including Black Monday and the period of very high volatility that followed. The results are somewhat different if only the data prior to Black Monday are used. The estimated conditional standard deviation for Black Monday is 37% higher using the entire series compared to using only the data prior to Black Monday. Using only the prior data, the degrees of freedom of the t-distribution fitted to the standardized residuals is 8, rather than 6. A larger degrees-of-freedom parameter implies lighter tails, making extreme returns less likely. One conclusion from these comparisons is that estimation of extreme risks using only data from periods of low to moderate volatility can be misleading. An analyst might be reluctant to use data from 1987 when managing risks, since such data seem too old to be still relevant. However, periods of extreme volatility are rare, and perhaps data from that far back are needed to estimate extreme risks.

APARCH MODELS

It has often been observed that a large negative return increases volatility more than does a positive return of the same size. This phenomenon is called the leverage effect and makes sense, since we can expect investors to become more nervous after a large negative return than after a positive return of the same magnitude. The GARCH models introduced so far cannot model the leverage effect, because σ_t is a function only of past values of a²_t, so information about the sign of a_t is lost. To model the leverage effect, the APARCH (asymmetric power ARCH) models replace the square function with a more flexible class of nonnegative functions that includes asymmetric functions. The APARCH(p, q) model for the conditional standard deviation is

σ^δ_t = ω + Σ_{i=1}^{p} α_i (|a_{t−i}| − γ_i a_{t−i})^δ + Σ_{j=1}^{q} β_j σ^δ_{t−j},

where δ > 0 and −1 < γ_i < 1 for i = 1, . . . , p. The parameters ω, α₁, . . . , α_p, and β₁, . . . , β_q satisfy the same constraints as


for a GARCH model. Note that δ = 2 and γ₁ = · · · = γ_p = 0 give a standard GARCH model. A positive value of γ_i indicates a leverage effect, because for a positive γ, as |x| increases, the function |x| − γx increases more quickly when x is negative than when x is positive; this causes the conditional variance to increase more for negative than for positive values of a_{t−i}.
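The asymmetry is easy to see by evaluating |x| − γx directly. A one-line Python illustration, using the value γ = 0.346 reported for the S&P 500 fit:

```python
def aparch_news_impact(x, gamma):
    """The APARCH shock term |x| - gamma*x: larger for negative x when gamma > 0."""
    return abs(x) - gamma * x

gamma = 0.346  # MLE for the S&P 500 daily returns, per the text
neg = aparch_news_impact(-1.0, gamma)  # about 1.346 for a unit negative shock
pos = aparch_news_impact(+1.0, gamma)  # about 0.654 for a unit positive shock
# A negative shock raises volatility roughly twice as much as a positive one.
```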

Fig. 5. The function |x| − γx with γ = 0 (no leverage) and γ = 0.346 (MLE for the S&P 500 daily returns).

When an ARMA(0,1)/APARCH(1,2) model was fit to the S&P 500 daily returns, the MLE of γ₁ was 0.346, with a 95% confidence interval of (0.17, 0.52). Since this interval is well above 0, there is a statistically significant leverage effect. The size of the leverage effect is illustrated in Figure 5, which plots |x| − γx with γ = 0 and with γ = 0.346. The MLE of δ was 1.87, with a 95% confidence interval of (1.3, 2.4), which includes the value δ = 2 of the standard GARCH model. The ARMA(0,1)/APARCH(1,2) model had the smallest value of BIC of all models considered, despite having two more parameters (γ₁ and δ) than the ARMA(0,1)/GARCH(1,2) model and three more parameters than the GARCH(1,2) model.

MULTIVARIATE GARCH MODELS

As financial asset returns evolve, they tend to move together. Their respective volatilities also tend to move together over time, across both assets and markets. Modeling a time-varying conditional covariance matrix, referred to as the volatility matrix, is important in many financial applications, including asset pricing, portfolio selection, hedging, and risk management. See [8] and [9] for a thorough literature review.

Multivariate volatility models all face major difficulties. First, the curse of dimensionality: there are d(d + 1)/2 variances and covariances for a d-dimensional process, e.g., 55 for d = 10, all of which are latent and may vary with time. Many parameterizations for the evolution of the volatility matrix use such a large number of parameters that estimation becomes infeasible for d > 10. In addition to empirical adequacy (i.e., goodness of fit of the model to the data), ease and feasibility of estimation are important considerations for any practitioner.

Analogous to positivity constraints in univariate GARCH models, a well-defined multivariate volatility matrix process must be positive definite at every time point. This property should extend to model-based forecasts as well. From a practical perspective, the inverse of a volatility matrix is often needed for applications. Additionally, a positive conditional variance estimate for a portfolio's return (i.e., a linear combination of asset returns) is essential; fortunately, this is guaranteed by positive definiteness.

Multivariate Conditional Heteroscedasticity

Figures 6a and 6b are time series plots of daily returns (in percentage) for IBM stock and the Center for Research in Security Prices (CRSP) value-weighted index, including dividends, from January 3, 1989 to December 31, 1998, respectively. Each series clearly exhibits volatility clustering. Let y_t denote the vector time series of these returns.

Fig. 6. Daily returns (in percentage) for (a) IBM stock and (b) the CRSP value-weighted index, including dividends.

Figures 7a and 7b are the sample ACF plots for the IBM stock and CRSP index returns, respectively. These returns will be used as an example for the remainder of the paper. There is some evidence of minor serial correlation at low lags. Cross-dependence between processes is also important in multivariate time series. We first consider the lead-lag linear relationship between pairs of returns. Figure 7c is the


sample cross-correlation function (CCF) between IBM and CRSP. The lag-zero estimate for contemporaneous correlation of 0.49 is very significant. There is also some evidence of minor cross-correlation at low lags. A Box-Ljung [10] test can be applied to simultaneously test that the first m autocorrelations, as well as the lagged cross-correlations in the multivariate case [11], are all zero. The multivariate Box-Ljung test statistic at lag five is Q_{d=2}(y_t, m = 5) = 50.12, which has a p-value very close to zero. A p-value [1, p. 618] represents the probability of obtaining a test statistic at least as extreme, 50.12 or greater in this case, assuming that the null hypothesis, zero serial correlation for lags 1, . . . , m in this case, is true. Here, the p-value very close to zero provides strong evidence to reject the null hypothesis and indicates there is significant serial correlation for the vector process.
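In the univariate case, the Ljung-Box statistic is simple to compute from sample autocorrelations; a self-contained Python sketch (the multivariate version of [11] additionally pools the lagged cross-correlations, which is not shown here):

```python
import random

def sample_acf(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
    return ck / c0

def ljung_box(x, m):
    """Q(m) = n(n+2) * sum_{k=1..m} rho_k^2 / (n-k); under the null of no serial
    correlation, Q(m) is approximately chi-square with m degrees of freedom."""
    n = len(x)
    return n * (n + 2) * sum(sample_acf(x, k) ** 2 / (n - k) for k in range(1, m + 1))

rng = random.Random(7)
white_noise = [rng.gauss(0, 1) for _ in range(2000)]
q = ljung_box(white_noise, m=5)
# Compare q to the chi-square(5) critical value (about 11.07 at the 5% level).
```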

Fig. 7. ACFs for (a) the IBM stock and (b) CRSP index returns; (c) CCF between IBM and CRSP returns.

For simplicity, we use ordinary least squares to fit a vector AR(1) model to remove the minor serial correlation and focus on the conditional variance and covariance. Let ê_t denote the estimated residuals from the regression; ê_t estimates the innovation process e_t, which is described more fully in the next section. The multivariate Box-Ljung test statistic at lag five is now Q₂(ê_t, 5) = 16.21, which has a p-value of 0.704, indicating there is no significant serial correlation in the vector residual process.

Although the innovation process e_t is serially uncorrelated, Figure 8 shows it is not an independent process. Figures 8a and 8b are sample ACF plots for the squared processes e²_{it}. They both show substantial positive autocorrelation because of the volatility clustering. Figure 8c is the sample CCF for the squared processes; this figure shows there is a dynamic relationship between the squared processes at low lags. Figure 8d is the sample ACF for the product process e_{1t}e_{2t} and

shows that there is also positive autocorrelation in the conditional covariance series. The multivariate volatility models described below attempt to account for these forms of dependence exhibited in the vector time series of innovations.

Fig. 8. ACFs of squared innovations for (a) IBM and (b) CRSP; (c) CCF between squared innovations; (d) ACF for the product of the innovations.

Basic Setting

Let y_t = (y_{1t}, . . . , y_{dt})′ denote a d-dimensional vector process and let F_t denote the information set up to time index t, i.e., σ⟨y_t, y_{t−1}, . . .⟩. We partition the process as

y_t = µ_t + e_t,

in which µ_t = E(y_t | F_{t−1}) is the conditional mean and e_t is the mean-zero, serially uncorrelated innovation vector. Let Σ_e = Cov(e_t) be the unconditional covariance matrix of e_t, and let Σ_t = Cov(e_t | F_{t−1}) = Cov(y_t | F_{t−1}) denote the conditional covariance matrix. For a stationary process the unconditional mean and covariance matrix are constant, even though the conditional mean and covariance matrix may be nonconstant. Multivariate time series modeling is concerned with the time evolutions of µ_t and Σ_t, the conditional mean and covariance matrix.

In our analysis we assume that µ_t follows a vector autoregression µ_t = ϕ₀ + Σ_{i=1}^{P} ϕ_i y_{t−i}, where P is a nonnegative integer, and ϕ₀ and the ϕ_i are d × 1 and d × d coefficient matrices, respectively. We took P = 1 in the current example. The relationship between the innovation process and the volatility process is defined by

e_t = Σ_t^{1/2} ε_t,  ε_t iid∼ F(0, I_d),

in which Σ_t^{1/2} is the matrix square root of Σ_t and the standardized iid innovations ε_t are from a standardized multivariate distribution F. The models below describe dynamic evolutions for the volatility matrix Σ_t. Throughout the rest of this paper, all figures and comparisons are based on in-sample estimates, which use the entire sample to calculate. Out-of-sample comparisons are also of interest, but there is no clear metric to measure their adequacy, because the conditional distribution is changing over time and there is only one observation available at each time index, making it impossible to directly calculate conditional variances and covariances at each time point.
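One convenient choice for the matrix square root Σ_t^{1/2} is a Cholesky factor. A minimal Python sketch for the 2 × 2 case (illustrative only; any factor L with LL′ = Σ_t serves the same purpose):

```python
import math

def chol2(sigma):
    """Lower-triangular L with L L' = sigma, for a 2x2 positive-definite matrix."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

sigma = [[2.0, 0.6], [0.6, 1.0]]
L = chol2(sigma)
# Transforming iid standardized shocks eps by e = L @ eps yields Cov(e) = sigma.
```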

EWMA Model

The simplest matrix generalization of a univariate volatility model is the exponentially weighted moving average (EWMA) model. It is indexed by a single parameter λ ∈ (0, 1) and is defined by the recursion

Σ_t = (1 − λ) e_{t−1} e′_{t−1} + λ Σ_{t−1} = (1 − λ) Σ_{m=1}^{∞} λ^{m−1} e_{t−m} e′_{t−m}.

When the recursion is initialized with a positive-definite (p.d.) matrix, the sequence remains p.d. This single-parameter model is simple to estimate regardless of the dimension, with large values of λ indicating high persistence in the volatility process. However, the dynamics can be too restrictive in practice, since the component-wise evolutions all have the same discounting factor (or persistence parameter) λ.
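The recursion is only a few lines of code. A Python sketch for general dimension d (a hypothetical helper; λ = 0.985 is the value estimated for the IBM/CRSP example):

```python
def ewma_update(sigma, e, lam=0.985):
    """One EWMA step: Sigma_t = (1 - lam) * e e' + lam * Sigma_{t-1},
    with matrices represented as nested lists."""
    d = len(e)
    return [[(1.0 - lam) * e[i] * e[j] + lam * sigma[i][j] for j in range(d)]
            for i in range(d)]

sigma = [[1.0, 0.5], [0.5, 1.0]]          # positive-definite starting value
sigma = ewma_update(sigma, e=[2.0, -1.0])  # one step given yesterday's innovation
# The update preserves symmetry and positive definiteness.
```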

Figure 9 shows the in-sample fitted EWMA model for e_t, assuming a multivariate standard normal distribution for ε_t and using conditional maximum likelihood estimation. The estimated conditional standard deviations are shown in (a) and (d), and the conditional covariances and implied conditional correlations are shown in (b) and (c), respectively. The persistence parameter λ was estimated as 0.985.

Diagonal VEC Model

The vectorization model of [12] specifies a vector ARMAtype model for a vector process containing all the conditionalvariances and covariances. The diagonal form of this model(DVEC) simply specifies univariate representations for eachcomponent process. Analogous to the GARCH(1,1) model,the (1,1) form of the DVEC model is given by the recursions

σ²_{it} = ω_i + α_i e²_{i,t−1} + β_i σ²_{i,t−1},

σ_{ij,t} = ω_{ij} + α_{ij} e_{i,t−1} e_{j,t−1} + β_{ij} σ_{ij,t−1},

for i ≠ j, in which σ²_{it} = Σ_{ii,t} and σ_{ij,t} = Σ_{ij,t} = Σ_{ji,t}.

Although univariate models are specified, the model still requires joint estimation of all the parameters. Combining contemporaneous marginal estimates of the conditional variances and covariances does not, in general, assure that the covariance matrix is positive definite (p.d.). Conditions for

[Figure 9: panels (a) σ_{1,t}, (b) σ_{12,t}, (c) ρ_{12,t}, (d) σ_{2,t}, each plotted against year, 1990–1998.]

Fig. 9. A fitted EWMA model with λ = 0.985. The red line in (c) is the sample correlation estimate over the previous six months for comparison.

p.d. estimates from the DVEC model are straightforward to check, but difficult to incorporate into a constrained estimation problem. Figure 10 shows the fitted DVEC model for e_t using the multivariate standard normal distribution and conditional maximum likelihood estimation. The estimated conditional standard deviations are shown in (a) and (d), and the conditional covariances and implied conditional correlations are shown in (b) and (c), respectively.
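As an illustration of the DVEC(1,1) recursions, all d(d + 1)/2 element-wise updates can be written as a single Hadamard-product recursion (a hedged Python sketch; the function name is ours, and the parameter values in the test are taken from the fitted values reported in Fig. 10):

```python
import numpy as np

def dvec_filter(e, omega, alpha, beta, Sigma0):
    """Filter a first-order DVEC model.

    Each element of Sigma_t follows its own univariate recursion:
      Sigma_{ij,t} = omega_ij + alpha_ij * e_{i,t-1} e_{j,t-1}
                     + beta_ij * Sigma_{ij,t-1}.

    e                  : (n, d) innovations
    omega, alpha, beta : (d, d) symmetric parameter matrices
    Sigma0             : (d, d) symmetric initialization
    """
    n, d = e.shape
    Sigma = np.empty((n, d, d))
    Sigma[0] = Sigma0
    for t in range(1, n):
        # '*' is element-wise, so this updates every variance and covariance at once
        Sigma[t] = omega + alpha * np.outer(e[t - 1], e[t - 1]) + beta * Sigma[t - 1]
    return Sigma
```

Symmetric parameters and a symmetric start keep each Σ_t symmetric, but, as noted above, positive definiteness is not automatic and must be checked separately.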

A true matrix generalization of the univariate GARCH model is given by [13]; it is commonly referred to as the BEKK model, in reference to its authors. Its parameterization assures a positive-definite estimated covariance matrix. However, the model parameters are difficult to interpret, and they are difficult to estimate even for very low-dimensional processes, so we will not discuss these models further.

Dynamic Conditional Correlation Models

Nonlinear combinations of univariate volatility models have been proposed to allow for time-varying correlations, a feature that has been documented in financial applications. Both [14] and [15] generalize the constant correlation model of [16] to allow for such dynamic conditional correlations (DCC).

Analogously to the GARCH(1,1) model, the first-order form of the DCC model in [15] may be represented by the


[Figure 10: panels (a) σ_{1,t}, (b) σ_{12,t}, (c) ρ_{12,t}, (d) σ_{2,t}, each plotted against year, 1990–1998.]

Fig. 10. A fitted first-order DVEC model with (ω_1, α_1, β_1) = (0.013, 0.023, 0.974), (ω_2, α_2, β_2) = (0.011, 0.046, 0.933), and (ω_12, α_12, β_12) = (0.009, 0.033, 0.952). The red line in (c) is the sample correlation estimate over the previous six months for comparison.

equations

σ²_{it} = ω_i + α_i e²_{i,t−1} + β_i σ²_{i,t−1},

D_t = diag{…, σ_{it}, …},   ϵ_t = D_t^{−1} e_t,

Q_t = (1 − λ) ϵ_{t−1} ϵ′_{t−1} + λ Q_{t−1},

R_t = diag{Q_t}^{−1/2} Q_t diag{Q_t}^{−1/2},

Σ_t = D_t R_t D_t.

The main idea is to first model the conditional variance σ²_{it} of each individual series, compute the standardized residuals ϵ_t from these models, and then focus on modeling the conditional correlation matrix R_t. These are combined at each time point t to estimate the volatility matrix Σ_t. The recursion for Q_t is an EWMA applied to the scaled residuals ϵ_t; it is indexed by a single parameter λ ∈ (0, 1). The matrix Q_t must be rescaled to form a proper correlation matrix R_t, with the value one for all elements on the main diagonal.

This model's parameters can be estimated consistently in two stages using quasi-maximum likelihood. First, a univariate GARCH(1,1) model is fit to each series to obtain σ²_{it}. Then, given ϵ_t, λ is estimated by maximizing the components of the quasi-likelihood that depend only on the correlations. This is justified since the squared residuals do not depend on the correlation parameters.

In the form above, the variance components condition only on their own individual lagged returns and not on the joint returns. Also, the dynamics for each of the conditional correlations are constrained to have equal persistence parameters, similar to the EWMA model. [However, equal persistence parameters may be more suitable for the scale-free correlation matrix (as in the DCC model) than for the covariance matrix (as in the EWMA model).] An explicit parameterization of the conditional correlation matrix R_t, with flexible dynamics, is just as difficult to estimate in high dimensions as Σ_t itself. Figure 11 shows a fitted DCC model for e_t using quasi-maximum likelihood estimation. The estimated conditional standard deviations are shown in (a) and (d), and the conditional covariances and conditional correlations are shown in (b) and (c), respectively.

[Figure 11: panels (a) σ_{1,t}, (b) σ_{12,t}, (c) ρ_{12,t}, (d) σ_{2,t}, each plotted against year, 1990–1998.]

Fig. 11. A fitted first-order DCC model with (ω_1, α_1, β_1) = (0.0741, 0.0552, 0.9233), (ω_2, α_2, β_2) = (0.0206, 0.0834, 0.8823), and λ = 0.9876. The red line in (c) is the sample correlation estimate over the previous six months for comparison.

Orthogonal GARCH Model

Several factor and orthogonal models have been proposed to reduce the number of parameters and parameter constraints by imposing a common dynamic structure on the elements of the volatility matrix. The orthogonal GARCH (O-GARCH) model of [17] is among the most popular because of its simplicity. It is assumed that the innovations e_t can be decomposed into orthogonal components z_t via a linear transformation U. This is done in conjunction with principal component


analysis (PCA) as follows. Let Υ be the matrix of eigenvectors and Λ the diagonal matrix of the corresponding eigenvalues of Σ_e. Then take U = Λ^{−1/2} Υ′, and let

z_t = U e_t.

The components are constructed such that Cov{z_t} = I_d. The sample estimate of Σ_e is typically used to estimate U.

Next, univariate GARCH(1,1) models are individually fit to each orthogonal component to estimate the conditional covariance O_t = Cov{z_t | F_{t−1}}. Let

σ²_{it} = ω_i + α_i z²_{i,t−1} + β_i σ²_{i,t−1},

O_t = diag{…, σ²_{it}, …},

Σ_t = U^{−1} O_t U^{−1}′.

In summary, a linear transformation U is estimated, using PCA, such that the components of z_t = U e_t have zero (unconditional) correlation. It is then assumed that the conditional correlations of z_t are also zero; however, this is not at all assured to be true. Under this additional, stronger assumption, the conditional covariance matrix O_t of z_t is diagonal. For simplicity, univariate models are then fit to model the conditional variance σ²_{it} for each component of z_t.

The main drawback of this model is that the orthogonal components are uncorrelated unconditionally, but they may still be conditionally correlated; the O-GARCH model implicitly assumes the conditional correlations of z_t are zero. Figure 12 shows a fitted O-GARCH model for e_t using PCA followed by univariate conditional maximum likelihood estimation. The estimated conditional standard deviations are shown in (a) and (d), and the conditional covariances and conditional correlations are shown in (b) and (c), respectively. The implied conditional correlations do not appear adequate for this fitted model compared with the sample correlation estimate over the previous six months (used as a proxy for the conditional correlation).

Dynamic Orthogonal Components Models

To properly apply univariate modeling after estimating a linear transformation in the spirit of the O-GARCH model above, the resulting component processes s_t must not only be orthogonal contemporaneously; the conditional correlations must be zero, and the lagged cross-correlations of the squared components must also be zero. In [18], components of a time series s_t satisfying these conditions are called dynamic orthogonal components (DOCs) in volatility.

Let s_t = (s_{1t}, …, s_{dt})′ denote a vector time series of DOCs. Without loss of generality, s_t is assumed to be standardized such that E{s_{it}} = 0 and Var{s_{it}} = 1 for i = 1, …, d. A Ljung-Box statistic, defined below, is used to test for the existence of DOCs in volatility. Including lag zero in

[Figure 12: panels (a) σ_{1,t}, (b) σ_{12,t}, (c) ρ_{12,t}, (d) σ_{2,t}, each plotted against year, 1990–1998.]

Fig. 12. A fitted first-order orthogonal GARCH model with (ω_1, α_1, β_1) = (0.0038, 0.0212, 0.9758), (ω_2, α_2, β_2) = (0.0375, 0.0711, 0.8913), and U^{−1} = ((1.7278, 0.2706), (0.2706, 0.7241)). The red line in (c) is the sample correlation estimate over the previous six months for comparison.

the test implies that the conditional covariance between stationary DOCs is zero, since the Cauchy-Schwarz inequality gives

|Cov(s_{it}s_{jt}, s_{i,t−ℓ}s_{j,t−ℓ})| ≤ Var(s_{it}s_{jt}) = E(s²_{it} s²_{jt}),

since E(s_{it}s_{jt}) = E(s_{i,t−ℓ}s_{j,t−ℓ}) = 0 by the assumption of a DOC model. Let ρ^{s²}_{ij}(ℓ) = Corr{s²_{i,t}, s²_{j,t−ℓ}}. The joint lag-m null and alternative hypotheses to test for the existence of DOCs in volatility are

H_0: ρ^{s²}_{ij}(ℓ) = 0 for all i ≠ j, ℓ = 0, …, m,

H_A: ρ^{s²}_{ij}(ℓ) ≠ 0 for some i ≠ j, ℓ = 0, …, m.

The corresponding Ljung-Box test statistic is

Q⁰_d(m) = n Σ_{i<j} ρ^{s²}_{ij}(0)² + n(n + 2) Σ_{k=1}^{m} Σ_{i≠j} ρ^{s²}_{ij}(k)² / (n − k).

Under H_0, Q⁰_d(m) is asymptotically χ²-distributed with d(d − 1)/2 + m d(d − 1) degrees of freedom. The null hypothesis is rejected for a large value of Q⁰_d(m). When H_0 is rejected, one must seek an alternative modeling procedure.

As expected, the DOCs in volatility hypothesis is rejected for the innovations; the test statistic is Q⁰_2(e², 5) = 356.926


with a p value near zero. DOCs in volatility is also rejected for the principal components used in the O-GARCH model; the test statistic is Q⁰_2(z², 5) = 135.492, with a p value near zero. Starting with the uncorrelated principal components z_t, [18] propose estimating an orthogonal matrix W such that the components s_t = W z_t are as close to DOCs in volatility as possible for a particular sample. This is done by minimizing a reweighted version of the Ljung-Box test statistic, defined above, with respect to the separating matrix W. The null hypothesis of DOCs in volatility is accepted for the estimated components s_t; the test statistic is Q⁰_2(s², 5) = 7.845 with a

p value equal to 0.727.

After DOCs are identified, a univariate volatility model is considered for each process, with conditional variance σ²_{it}. The following model was fit:

e_t = M s_t = M V_t^{1/2} ε_t,

V_t = diag{σ²_{1t}, …, σ²_{dt}},   ε_{it} ~ iid t_{ν_i}(0, 1),

σ²_{it} = ω_i + α_i s²_{i,t−1} + β_i σ²_{i,t−1},

Σ_t = M V_t M′,

in which t_{ν_i}(0, 1) denotes the standardized Student-t distribution with ν_i degrees of freedom. Each Σ_t is positive definite if σ²_{it} > 0 for all components. The fundamental motivation is that, empirically, the dynamics of e_t can often be well approximated by an invertible linear combination of DOCs, e_t = M s_t, in which M = U^{−1} W′ by definition. In summary, U is estimated by PCA to uncorrelate e_t; W is estimated to minimize a reweighted version of Q⁰_d(m), defined above (giving more weight to lower lags); and U and W are combined to estimate the DOCs s_t, to which univariate volatility modeling may then be appropriately applied. This approach allows modeling of a d-dimensional multivariate volatility process with d univariate volatility models, greatly reducing both the number of parameters and the computational cost of estimation while maintaining adequate empirical performance.

Figure 13 shows a fitted DOCs in volatility GARCH model for e_t using generalized decorrelation followed by univariate conditional maximum likelihood estimation. The estimated conditional standard deviations are shown in (a) and (d), and the conditional covariances and implied correlations are shown in (b) and (c), respectively. Unlike the O-GARCH fit, the implied conditional correlations appear adequate compared with the rolling estimator.

Model Checking

For a fitted volatility sequence Σ_t, the standardized residuals are defined as

ε_t = Σ_t^{−1/2} e_t.

To verify the adequacy of a fitted volatility model, the lagged cross-correlations of the squared standardized residuals should be zero. The product process ε_{it}ε_{jt} should also have no serial correlation. Additional diagnostic checks for time series

[Figure 13: panels (a) σ_{1,t}, (b) σ_{12,t}, (c) ρ_{12,t}, (d) σ_{2,t}, each plotted against year, 1990–1998.]

Fig. 13. A fitted first-order DOCs in volatility GARCH model with (ω_1, α_1, β_1, ν_1) = (0.0049, 0.0256, 0.9703, 4.3131), (ω_2, α_2, β_2, ν_2) = (0.0091, 0.0475, 0.9446, 5.0297), and M = ((1.5350, 0.838), (0.0103, 0.7730)). The red line in (c) is the sample correlation estimate over the previous six months for comparison.

are considered in [11]. Since the standardized residuals are estimated and not observed, all p values given in this section are only approximate.
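The standardized residuals can be computed with the symmetric inverse square root of each fitted Σ_t (a Python sketch; using the eigendecomposition for the inverse square root is one common choice, and the function name is ours):

```python
import numpy as np

def standardized_residuals(e, Sigma):
    """Compute eps_t = Sigma_t^{-1/2} e_t using the symmetric inverse square root."""
    n, d = e.shape
    eps = np.empty((n, d))
    for t in range(n):
        # Sigma_t^{-1/2} = V diag(lambda^{-1/2}) V' from the eigendecomposition
        evals, evecs = np.linalg.eigh(Sigma[t])
        inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
        eps[t] = inv_sqrt @ e[t]
    return eps
```

Box-Ljung tests are then applied to the squared residuals ε²_{it} and to the products ε_{it}ε_{jt} to check the two conditions above.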

To check the first condition, we apply a multivariate Box-Ljung test to the squared standardized residuals. For the EWMA model, Q_2(ε²_t, 5) = 26.40 with a p value of 0.153, implying no significant serial correlation. For the DVEC model, Q_2(ε²_t, 5) = 21.14 with a p value of 0.389, implying no significant serial correlation. For the DCC model, Q_2(ε²_t, 5) = 10.54 with a p value of 0.957, implying no significant serial correlation. For the O-GARCH model, Q_2(ε²_t, 5) = 30.77 with a p value of 0.058; in this case, there is some minor evidence of serial correlation. For the DOC in volatility model, Q_2(ε²_t, 5) = 18.68 with a p value of 0.543, implying no significant serial correlation.

The multivariate Box-Ljung test for the squared standardized residuals is not sensitive to misspecification of the conditional correlation structure. To check this condition, we apply univariate Box-Ljung tests to the product of each pair of standardized residuals. For the EWMA model, Q(ε_{1t}ε_{2t}, 5) = 16.45 with a p value of 0.006; this model has not adequately accounted for the time-varying conditional correlation. For the DVEC model, Q(ε_{1t}ε_{2t}, 5) = 7.76 with a p value of 0.170, implying no significant serial correlation. For the DCC model, Q(ε_{1t}ε_{2t}, 5) = 8.37 with a p value of 0.137, implying


no significant serial correlation. For the O-GARCH model, Q(ε_{1t}ε_{2t}, 5) = 63.09 with a p value near zero; this model also fails to account for the observed time-varying conditional correlation. For the DOC in volatility model, Q(ε_{1t}ε_{2t}, 5) = 9.07 with a p value of 0.106, implying no significant serial correlation.

Other Multivariate GARCH Models

The assumption of a multivariate normal conditional distribution is typically justified by the proof in [19] of the strong consistency of the Gaussian quasi-maximum likelihood estimator for multivariate GARCH models. However, consistent estimates of the first two conditional moments may not be sufficient for some applications, and there may be non-negligible bias in small samples. It may be necessary to relax the distributional assumption, at the risk of inconsistent estimates. Flexible distributions that account for the excess kurtosis or skewness exhibited in the unconditional moments of asset returns, without significantly increasing the estimation burden, are desirable. However, few multivariate extensions are tractable.

[20] and [21] have proposed copula GARCH models, in which a conditional copula function is specified, extending the conditional dependence beyond conditional correlation. This approach allows for flexible joint distributions in the bivariate case, but its usefulness in higher dimensions has not been studied; in particular, the dynamics of the copula function would also need to be constrained to keep estimation feasible. It is also desirable to incorporate asymmetry into a multivariate model: depending on the application, variances and covariances may react differently to a negative than to a positive innovation. [22] and [5] propose two such multivariate GARCH models, which include these so-called "leverage effects."

FURTHER READING

Further discussion of univariate GARCH models and more examples can be found in [1]. A more advanced treatment, including multivariate GARCH models, is available in [23].

ACKNOWLEDGMENTS

David Matteson was supported by NSF Grant CMMI-0926814, and David Ruppert was supported by NSF Grant DMS-0805975. The S&P 500 returns are in the SP500 data set and the IBM and CRSP returns are in the CRSPday data set in R's Ecdat package. The R packages fGarch and mnormt were also used in our analysis.

AUTHORS

David S. Matteson ([email protected]) received a B.S. in Business from the Carlson School of Management at the University of Minnesota in 2003, with concentrations in Finance, Mathematics, and Statistics, and a Ph.D. in Statistics from the University of Chicago in 2008. He has been a Visiting Assistant Professor in the School of Operations Research and Information Engineering at Cornell University since 2008, and he was a New Researcher Fellow at the Statistical and Applied Mathematical Sciences Institute in Durham, NC from 2009 to 2010. His research interests include time series, dimension reduction, and emergency medical services information systems.

David Ruppert ([email protected]) received a B.A. in Mathematics from Cornell University in 1970, an M.A. in Mathematics from the University of Vermont in 1973, and a Ph.D. in Statistics from Michigan State University in 1977. He was Assistant and then Associate Professor in the Department of Statistics at the University of North Carolina at Chapel Hill from 1977 to 1987 and, since 1987, he has been Professor in the School of Operations Research and Information Engineering at Cornell University, where he is now the Andrew Schultz, Jr., Professor of Engineering. He is a Fellow of the American Statistical Association and of the Institute of Mathematical Statistics. He won the Wilcoxon Award in 1986 and is an ISI highly cited researcher. Professor Ruppert is currently Editor of the Electronic Journal of Statistics. He was formerly an Associate Editor for The American Statistician, Journal of the American Statistical Association, Biometrics, and Annals of Statistics, and formerly Editor of the IMS's Lecture Notes–Monographs Series.

Professor Ruppert has written five books: Transformation and Weighting in Regression; Measurement Error in Nonlinear Models: A Modern Perspective, 2nd ed.; Semiparametric Regression; Statistics and Finance: An Introduction; and Statistics and Data Analysis for Financial Engineering. His research interests include functional data analysis, nonparametric and semiparametric modeling, the effects of measurement error, calibration of engineering models, and astrostatistics.

REFERENCES

[1] David Ruppert, Statistics and Data Analysis for Financial Engineering, Springer, 2010.

[2] Jim Gatheral, The Volatility Surface: A Practitioner's Guide, Wiley, Hoboken, NJ, 2006.

[3] Robert F. Engle, "Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation," Econometrica, vol. 50, no. 4, pp. 987–1008, 1982.

[4] Tim Bollerslev, "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, vol. 31, pp. 307–327, 1986.

[5] Ruey S. Tsay, "Multivariate volatility models," Institute of Mathematical Statistics Lecture Notes–Monograph Series, vol. 52, pp. 210–222, 2006.

[6] Ross Ihaka and Robert Gentleman, "R: A language for data analysis and graphics," Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299–314, 1996.

[7] Adelchi Azzalini and Antonella Capitanio, "Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution," Journal of the Royal Statistical Society, Series B, vol. 65, pp. 367–389, 2003.

[8] Luc Bauwens, Sebastien Laurent, and Jeroen V. K. Rombouts, "Multivariate GARCH models: A survey," Journal of Applied Econometrics, vol. 21, no. 1, pp. 79–109, 2006.

[9] A. Silvennoinen and T. Terasvirta, "Multivariate GARCH models," Handbook of Financial Time Series, pp. 201–229, 2009.

[10] Greta M. Ljung and George E. P. Box, "On a measure of lack of fit in time series models," Biometrika, vol. 65, no. 2, pp. 297–303, 1978.

[11] Wai Keung Li, Diagnostic Checks in Time Series, Chapman & Hall/CRC, Boca Raton, 2004.

[12] Tim Bollerslev, Robert F. Engle, and Jeffrey M. Wooldridge, "A capital asset pricing model with time varying covariances," Journal of Political Economy, vol. 96, pp. 116–131, 1988.

[13] Robert F. Engle and Kenneth F. Kroner, "Multivariate simultaneous generalized ARCH," Econometric Theory, vol. 11, pp. 122–150, 1995.

[14] Yiu Kay Tse and Albert K. C. Tsui, "A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations," Journal of Business & Economic Statistics, vol. 20, no. 3, pp. 351–362, 2002.

[15] Robert F. Engle, "Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models," Journal of Business & Economic Statistics, vol. 20, no. 3, pp. 339–350, 2002.

[16] Tim Bollerslev, "Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH approach," Review of Economics and Statistics, vol. 72, pp. 498–505, 1990.

[17] Carol O. Alexander, Market Models, John Wiley & Sons, 2001.

[18] David S. Matteson and Ruey S. Tsay, "Dynamic orthogonal components for multivariate time series," Submitted, 2010.

[19] Thierry Jeantheau, "Strong consistency of estimators for multivariate ARCH models," Econometric Theory, vol. 14, pp. 70–86, 1998.

[20] Andrew Patton, "Modelling asymmetric exchange rate dependence," International Economic Review, vol. 47, no. 2, pp. 527–556, 2006.

[21] Eric Jondeau and Michael Rockinger, "The copula-GARCH model of conditional dependencies: An international stock-market application," Journal of International Money and Finance, vol. 25, pp. 827–853, 2006.

[22] Kenneth F. Kroner and Victor K. Ng, "Modeling asymmetric comovements of asset returns," The Review of Financial Studies, vol. 11, pp. 817–844, 1998.

[23] Ruey S. Tsay, Analysis of Financial Time Series, 3rd ed., Wiley–Interscience, 2010.