
Inference in ARCH and GARCH Models with Heavy-Tailed Errors
Author(s): Peter Hall and Qiwei Yao
Source: Econometrica, Vol. 71, No. 1 (Jan., 2003), pp. 285-317
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/3082047




Econometrica, Vol. 71, No. 1 (January, 2003), 285-317

INFERENCE IN ARCH AND GARCH MODELS WITH HEAVY-TAILED ERRORS

BY PETER HALL AND QIWEI YAO¹

ARCH and GARCH models directly address the dependency of conditional second moments, and have proved particularly valuable in modelling processes where a relatively large degree of fluctuation is present. These include financial time series, which can be particularly heavy tailed. However, little is known about properties of ARCH or GARCH models in the heavy-tailed setting, and no methods are available for approximating the distributions of parameter estimators there. In this paper we show that, for heavy-tailed errors, the asymptotic distributions of quasi-maximum likelihood parameter estimators in ARCH and GARCH models are nonnormal, and are particularly difficult to estimate directly using standard parametric methods. Standard bootstrap methods also fail to produce consistent estimators. To overcome these problems we develop percentile-t, subsample bootstrap approximations to estimator distributions. Studentizing is employed to approximate scale, and the subsample bootstrap is used to estimate shape. The good performance of this approach is demonstrated both theoretically and numerically.

KEYWORDS: Autoregression, bootstrap, dependent data, domain of attraction, financial data, limit theory, percentile-t bootstrap, quasi-maximum likelihood, semiparametric inference, stable law, studentize, subsample bootstrap, time series.

1. INTRODUCTION

IN CONTRAST TO TRADITIONAL time series analysis, which focuses on modelling the conditional first moment, an ARCH or GARCH model takes the dependency of the conditional second moments explicitly into consideration. See, for example, Engle (1982), Bollerslev (1986), and Taylor (1986). The practical motivation for doing so lies in the increasingly important need to explain and model risk and uncertainty in, for example, financial time series. Early successes of ARCH/GARCH modelling of financial time series were confined to the case of Normal errors, for which an explicit conditional likelihood function is readily available to facilitate estimation of parameters in the model. Investigation of non-Normal cases has been partly driven by empirical evidence that financial time series can be very heavy-tailed (e.g., Mittnik, Rachev, and Paolella (1998); Mittnik and Rachev (2000)).

This leads to semiparametric ARCH and GARCH models in which the error distributions are unknown (Engle and Gonzalez-Rivera (1991)). Nevertheless, conditional Gaussian likelihood functions still motivate parameter estimators, which might be called quasi-maximum likelihood estimators. See, for example, Bollerslev and Wooldridge (1992), and Chapter 4 of Gourieroux (1997). Other

¹ We are grateful to two reviewers for their helpful comments.



methods include adaptive estimation for ARCH models (Linton (1993)), and Whittle estimation for a general ARCH($\infty$) process (Giraitis and Robinson (2001)).

It is known that, provided the error distribution has finite fourth moment, quasi-maximum likelihood estimators are asymptotically Normally distributed in the case of an ARCH model (Weiss (1986)), and also for a GARCH(1, 1) model (Lee and Hansen (1994); Lumsdaine (1996)). However, little more than consistency is available in other settings, least of all in the case of relatively heavy-tailed error distributions that are of particular interest in applications to finance. In this paper we develop a very general account of theory for estimators in ARCH and GARCH models, paying special attention to the heavy-tailed case. There the limit distributions that arise are multivariate stable laws, and are particularly difficult to estimate directly. While this is arguably the most interesting aspect of our work, even in the case of finite fourth moment (for example, in the setting of GARCH($p$, $q$) models with $(p, q) \ne (1, 1)$) our results are new. Moreover, it is possible to obtain Normal limiting distributions without assuming finite fourth moment.

We suggest bootstrap methods for estimating parameter distributions. Now, it is well known that in settings where the limiting distribution of a statistic is not Normal, standard bootstrap methods are generally not consistent when used to approximate the distribution of the statistic. See, for example, Mammen (1992), Athreya (1987a, 1987b), Knight (1989b), and Hall (1990). To some extent, subsampling methods can be used to overcome the problem of inconsistency. See Bickel, Götze, and van Zwet (1995) and Politis, Romano, and Wolf (1999) for recent accounts of the subsampling method. However, while this approach consistently approximates the distribution of a statistic, it does so only for a value of sample size that is smaller than the size of the actual sample. The surrogate that it uses for sample size is the size of the subsample, which has to be an order of magnitude less than the sample size. As a result, the "scale" of the distribution that is approximated by the subsample bootstrap is generally an order of magnitude larger than that for the true sample size. Therefore, a confidence or prediction procedure based directly on the subsample bootstrap can be very conservative. In the absence of an accurate method for adjusting scale, subsampling can be unattractive.

To overcome this problem we suggest a new approach based on a percentile-t form of the subsample bootstrap. The percentile-t method is usually employed in order to attain a high order of accuracy in approximations where the limiting distribution is Normal. That is not our main goal in the present setting. Instead, we use a form of the percentile-t subsample bootstrap to ensure consistent distribution estimation in a particularly wide range of settings, where the limiting distribution can be either Normal or non-Normal. We studentize primarily to determine the scale of the test statistic. The subsample bootstrap can then be employed to estimate just the shape of the distribution, rather than both shape and scale. In this way we avoid the difficulties noted in the previous paragraph.



In more regular cases, where relatively high-order moments of the error distribution are finite, conventional percentile-t methods can be developed. They differ from the techniques discussed in the present paper in that they use the standard bootstrap rather than the subsample bootstrap, and they studentize using an estimator of the square root of the covariance matrix of the vector of parameter estimators. (In this paper we studentize using a scalar quantity; the covariance matrix is generally not well defined in the heavy-tailed case.) When sufficiently high-order moments can be assumed, the method for implementing conventional bootstrap approximations closely parallels that used for the bootstrap in linear time series; see for example Bose (1988). For the sake of brevity we shall not discuss such methods further here.

Classical work on financial time series with non-Normal errors includes that of Mandelbrot (1963) and Fama (1965), although of course without the benefit of critical recent developments, particularly in extreme value analysis. Even ARCH or GARCH models with Normal errors can have heavy tails; see for example Kesten (1973), Goldie (1991), Embrechts, Klüppelberg, and Mikosch (1997), Davis and Mikosch (1998), and Mikosch and Starica (2000). Statistical aspects of financial modelling of heavy-tailed data are discussed by, for example, Shephard (1996) and Rydberg (2000).

Our results are in the spirit of some of the findings of Mikosch and Straumann (2001), who show that poor rates of convergence can occur if $X_t$ is a GARCH process and E(XI) is infinite. Part (a) of Theorem 2.1 in Section 2.3 below has also been obtained by Berkes, Horvath, and Kokoszka (2001) under different conditions; see also Comte and Lieberman (2000).

2. MAIN THEORETICAL RESULTS

2.1. Model

Assume $X_t = \sigma_t \varepsilon_t$ for $-\infty < t < \infty$, where the random variables $\varepsilon_t$ are independent and identically distributed with zero mean and unit variance, $\varepsilon_t$ is independent of $\{X_{t-i}, i \ge 1\}$,

(2.1) $\sigma_t^2 = c + \sum_{i=1}^{p} a_i X_{t-i}^2 + \sum_{j=1}^{q} b_j \sigma_{t-j}^2$,

$c > 0$, $a_i \ge 0$, $b_j \ge 0$, $p \ge 0$, and $q \ge 0$, the latter two quantities of course being integers. If $q = 0$, then the model is of autoregressive conditional heteroscedastic, or ARCH, type. If $q \ge 1$ it is of generalized ARCH, or GARCH, form. To avoid pathological cases we shall assume throughout that $p \ge 1$ and $a_p > 0$, and $b_q > 0$ when $q \ge 1$.

It is known that a necessary and sufficient condition for the process {X_t, -∞



see Nelson (1990), Bougerol and Picard (1992), and Giraitis, Kokoszka, and Leipus (2000), and also Bollerslev (1986). In this case, $E(X_t) = 0$ and

$E(X_t^2) = c \left(1 - \sum_{i=1}^{p} a_i - \sum_{j=1}^{q} b_j\right)^{-1}$.

We shall assume (2.2) throughout, and that the process is in its stationary distribution. In this case it may be shown that (2.1) implies

(2.3) $\sigma_t^2 = \frac{c}{1 - \sum_{j=1}^{q} b_j} + \sum_{i=1}^{p} a_i X_{t-i}^2 + \sum_{i=1}^{p} a_i \sum_{k=1}^{\infty} \sum_{j_1=1}^{q} \cdots \sum_{j_k=1}^{q} b_{j_1} \cdots b_{j_k} X_{t-i-j_1-\cdots-j_k}^2$,

where the multiple series vanishes if $q = 0$. Since each $a_i$ and $b_j$ is nonnegative, and since the expected value of the multiple series is finite, then the series converges with probability 1. In this notation we may write $\sigma_t = \sigma_t(a, b, c)$, expressing $\sigma_t$ as a function of $a = (a_1, \ldots, a_p)$, $b = (b_1, \ldots, b_q)$, $c$, and the data $X_t$.

In practice the data are observed only over a finite time interval, say $1 \le t \le n$, and $\sigma_t^2$ has to be approximated by a truncated series. For simplicity and clarity, however, we shall assume for the present that insofar as calculation of $\sigma_t$ is concerned we may use values of $X_u$ for $-\infty < u \le n$, even though in other respects our inference will be confined to $X_t$ for $1 \le t \le n$. The contrary case will be discussed in Section 2.4. There we shall show that our main results do not change when an appropriately truncated approximation is employed.
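The model at (2.1) and the variance formula above can be illustrated with a short simulation. The following is a minimal sketch rather than code from the paper. It assumes that condition (2.2), whose statement is not preserved in this copy, is the usual requirement $\sum_i a_i + \sum_j b_j < 1$. The function names, the burn-in length, and the use of Gaussian errors in the usage lines are illustrative choices only.

```python
import math
import random


def is_stationary(a, b):
    """Assumed form of (2.2): sum of a_i plus sum of b_j is below 1."""
    return sum(a) + sum(b) < 1.0


def stationary_variance(c, a, b):
    """E(X_t^2) = c / (1 - sum a_i - sum b_j), valid under the assumed (2.2)."""
    if not is_stationary(a, b):
        raise ValueError("parameters violate the assumed condition (2.2)")
    return c / (1.0 - sum(a) - sum(b))


def simulate_garch(n, c, a, b, draw_error, burn_in=500):
    """Simulate X_t = sigma_t * eps_t with sigma_t^2 following (2.1).

    draw_error is a zero-argument callable returning one error with zero
    mean and unit variance. The burn-in (a hypothetical default) is a
    practical stand-in for starting in the stationary distribution.
    """
    p, q = len(a), len(b)
    start = stationary_variance(c, a, b)
    x2 = [start] * p  # most recent X^2 values, newest first
    s2 = [start] * q  # most recent sigma^2 values, newest first
    xs = []
    for _ in range(n + burn_in):
        sigma2 = (c
                  + sum(ai * v for ai, v in zip(a, x2))
                  + sum(bj * v for bj, v in zip(b, s2)))
        x = math.sqrt(sigma2) * draw_error()
        x2 = ([x * x] + x2)[:p]
        s2 = ([sigma2] + s2)[:q]
        xs.append(x)
    return xs[burn_in:]


if __name__ == "__main__":
    random.seed(1)
    sample = simulate_garch(1000, 1.0, [0.5], [0.4],
                            lambda: random.gauss(0.0, 1.0))
    print(len(sample), stationary_variance(1.0, [0.5], [0.4]))
```

Setting `b = []` gives the ARCH case $q = 0$; heavy-tailed errors can be supplied through `draw_error`, provided they are rescaled to unit variance.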

2.2. Estimators

Conditional maximum likelihood estimators in problems of this type were discussed by Engle (1982) and Bollerslev (1986). They can be motivated by temporarily assuming that the errors $\varepsilon_t$ are Gaussian, which would imply that if $(a, b, c)$ took their true values, then the variables $X_t/\sigma_t(a, b, c)$ would be independent and identically distributed as $N(0, 1)$. Therefore, without requiring Gaussian errors from this point on, it is suggested that we minimize

(2.4) $L(a, b, c) = \sum_{t=1}^{n} \left\{ \frac{X_t^2}{\sigma_t(a, b, c)^2} + \log \sigma_t(a, b, c)^2 \right\}$

with respect to the $r = p + q + 1$ variables in $(a, b, c)$. There is an extensive literature on using Normal-based methods for non-Normal data, including data with infinite variance. See, for example, Cline (1989) and references cited therein. The derivatives of $L(a, b, c)$ may be deduced from the formulae

(2.5) $\frac{\partial \sigma_t^2}{\partial c} = \frac{1}{1 - \sum_{j} b_j}$,

(2.6) $\frac{\partial \sigma_t^2}{\partial a_i} = X_{t-i}^2 + \sum_{k=1}^{\infty} \sum_{j_1=1}^{q} \cdots \sum_{j_k=1}^{q} b_{j_1} \cdots b_{j_k} X_{t-i-j_1-\cdots-j_k}^2$,

(2.7) $\frac{\partial \sigma_t^2}{\partial b_j} = \frac{c}{(1 - \sum_{i} b_i)^2} + \sum_{i=1}^{p} a_i \sum_{k=0}^{\infty} (k+1) \sum_{j_1=1}^{q} \cdots \sum_{j_k=1}^{q} b_{j_1} \cdots b_{j_k} X_{t-i-j-j_1-\cdots-j_k}^2$.

To interpret (2.7) relative to (2.3), note that in the latter result we could have dropped the second term on the right-hand side if we had summed from $k = 0$ to $\infty$, instead of from $k = 1$ to $\infty$, in the multiple series.

2.3. General Central Limit Theorem

Let $U = U(a, b, c)$ denote the $r$-vector of first derivatives of $\sigma_1^2 = \sigma_1(a, b, c)^2$ with respect to the components of $a$, $b$, and $c$. Put $M = E_0(\sigma_1^{-4} U U^T)$, an $r \times r$ matrix, where $E_0$ denotes expectation when $a$, $b$, and $c$ take their true values $a^0$, $b^0$, and $c^0$. It may be deduced from (2.3) and (2.5)-(2.7) that if none of $a_1, \ldots, a_p$ vanishes, and if (for $q \ge 1$) none of $b_1, \ldots, b_q$ vanishes, then each component of $\sigma_1^{-2} U$ has all its moments finite, and in particular there exists a constant $C > 0$ such that $E(\sigma_1^{-2} \|U\|)^{\nu} \le \nu!\, C^{\nu}$ for all integers $\nu \ge 1$, where $\|\cdot\|$ denotes the Euclidean norm. See Section 2.5 for discussion. Hence, the existence and finiteness of the components of $M$ are guaranteed. We shall assume $M$ is nonsingular. Let $(\hat a, \hat b, \hat c)$ be any local minimum of $L(a, b, c)$ that occurs within radius $\eta$ of $(a^0, b^0, c^0)$, for sufficiently small but fixed $\eta > 0$, and write $\hat\theta$ for the column vector of length $r$ whose components are those of $(\hat a, \hat b, \hat c)$. Likewise let $\theta$ denote the column vector of components of $(a, b, c)$, and let $\theta^0$ be the version of $\theta$ for the true parameter values.

Recall that we assume throughout that the distribution of $\varepsilon$ has zero mean and unit variance, and in particular that $E(\varepsilon^2) < \infty$. When $E(\varepsilon^4) = \infty$, but the distribution of $\varepsilon^2$ is in the domain of attraction of the Normal law, put

(2.8) $H(\lambda) = E\{\varepsilon^4 I(\varepsilon^2 \le \lambda)\}$ and $\lambda_n = \inf\{\lambda > 0 : nH(\lambda) \le \lambda^2\}$.

The function $H$ is slowly varying at infinity; see Section IX.8 of Feller (1966).

Next consider the case where the distribution of $\varepsilon^2$ is in the domain of attraction of a stable law with exponent $\alpha \in [1, 2)$. Redefine $\lambda_n$ by

(2.9) $\lambda_n = \inf\{\lambda > 0 : nP(\varepsilon^2 > \lambda) \le 1\}$.

The properties of a stable law imply that $\lambda_n$ is regularly varying at infinity with exponent $1/\alpha$. That is, $\lambda_n = n^{1/\alpha} \ell(n)$ where $\ell$ is a slowly varying function, meaning that an appropriate extension of $\ell$ to the real line satisfies $\ell(cn)/\ell(n) \to 1$ as $n \to \infty$, for each $c > 0$. Examples of slowly varying functions include polynomials in the logarithm function, or in iterates of that function.

Let $Y_1, Y_2, \ldots$ represent the infinite extension of the multicomponent joint extreme-value distribution of the first type, with exponent $\alpha$. That is, for each $k$ the distribution of $(Y_1, \ldots, Y_k)$ is the limiting joint distribution of the $k$ largest values of a sample of size $n$ drawn from a distribution in the domain of attraction of the first type of extreme-value distribution, after appropriate normalization for scale. As is standard, we assume the normalization is chosen so that $Y_1$ has distribution function $\exp(-y^{-\alpha})$ for $y > 0$. Then it may be shown that for each $k \ge 1$ the marginal distribution function of $Y_k$ is given by

k-1(2.10) Fk(Y) = exp(_y-a) y-ja/I!, y > 0.j=0Hall (1978a) formulated a representation of the distribution of the full processy1, Y2, ...Let Vl, V2,. . . be independent and identically distributed as ol2M 1U, wherewe take 0 = 00, and let them be independent also of Y1,Y2, .When 1 < a An)} (Note that E(Y1) = oo when a = 1; hence theneed to work with both WOand W1.) Convergence of the infinite series at (2.11)and (2.12) is guaranteed; see part (e) of the theorem below. When a = 1, An iSan unbounded, slowly varying function of n. Let y denote Euler's constant.By Theorem 1.4.5 of Samorodnitsky and Taqqu (1994, pp. 28), the marginaldistributions of WOand W1 are stable with exponent a. It follows from thatproperty, and characterizations of multivariate stable laws, that the multivariatedistributions of WOand W1 are multivariate stable, although the characteristicfunctions in this setting are awkward to write down and the distributions aremore difficult still to estimate directly. The characteristic functions of the jthcomponents of WOand W1are respectively(2.13) E{exp(itW("))} = exp[-sjItla{1 -i 3j sgnttan(a-7n/2)}],

E{exp(itW(")) I = exp[iojt -sjltl{ 1 + i,l3(21 7r)sgnt log ltl}],where sgn t denotes the sign of t (taken equal to 1 if t = 0), s, = El V(I)Ia/Ca,,8jEIV(j)Ia= E{IV(i)IasgnV(i)}, V(i) denotes a random variable with the distribu-tion of the jth component of Vk,C1= 2/I7r, Ca = (1- a)/{F(2 - a) cos(alr/2)}for 0 < a < 1, and 4j may be deduced from Samorodnitsky and Taqqu (1994,pp. 28-29). In interpreting sj and fj in the case of (2.13), note that a= 1 there.

THEOREM 2.1: Assume $M$ is nonsingular, that $p \ge 1$, that all of $a_1, \ldots, a_p$ are nonzero, that if $q \ge 1$ then all of $b_1, \ldots, b_q$ are nonzero, that $c > 0$, and that $\eta$ (employed in the definition of $\hat\theta$) is strictly positive and sufficiently small. (a) If $E(\varepsilon^4)$



be calculated exactly. However, $\sigma_t(a, b, c)^2$ can be computed in an approximate, truncated form, as follows. For $2 \le t \le n$, define

(2.15) $\tilde\sigma_t(a, b, c)^2 = \frac{c}{1 - \sum_{j} b_j} + \sum_{i=1}^{\min(p,\, t-1)} a_i X_{t-i}^2 + \sum_{i=1}^{p} a_i \sum_{k=1}^{\infty} \sum_{j_1=1}^{q} \cdots \sum_{j_k=1}^{q} b_{j_1} \cdots b_{j_k} X_{t-i-j_1-\cdots-j_k}^2 \, I(t - i - j_1 - \cdots - j_k \ge 1)$.

The indicator function here (denoted by $I$), and the truncation of the first series over $i$, ensure that the definition of $\tilde\sigma_t(a, b, c)^2$ uses only the data $X_1, \ldots, X_t$. However, for small $t$ the accuracy of this approximation to $\sigma_t^2$ will be severely curtailed, suggesting that when conducting inference we should avoid early terms in the series. Thus, a practicable version of $L$ might be defined by

(2.16) $L_\nu(a, b, c) = \sum_{t=\nu}^{n} \left\{ \frac{X_t^2}{\tilde\sigma_t(a, b, c)^2} + \log \tilde\sigma_t(a, b, c)^2 \right\}$,

where the integer $\nu = \nu(n)$ diverges with $n$ but at a rate sufficiently slow to ensure $\nu/n \to 0$ as $n \to \infty$. Our next result shows that for appropriate choice of $\nu$, the results summarized by Theorem 2.1 continue to hold if estimators are computed using the truncated likelihood $L_\nu$.
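The truncated quantities at (2.15) and (2.16) can be computed without expanding the multiple series. The sketch below is an illustration under stated assumptions, not the authors' implementation. It uses the observation that the truncated series part $h_t$ of (2.15) satisfies $h_t = \sum_{i \le \min(p, t-1)} a_i X_{t-i}^2 + \sum_j b_j h_{t-j}$ with $h_s = 0$ for $s \le 1$, so that $\tilde\sigma_t^2 = c/(1 - \sum_j b_j) + h_t$. The function names are hypothetical.

```python
import math


def truncated_sigma2(x, c, a, b):
    """Return sigma-tilde_t^2 of (2.15) for t = 2, ..., n as a dict keyed by t.

    x holds X_1, ..., X_n, so x[0] is X_1. The multiple series is evaluated
    through the recursion h_t = sum_i a_i X_{t-i}^2 + sum_j b_j h_{t-j},
    where only indices t - i >= 1 contribute and h_s = 0 for s <= 1.
    """
    n, p, q = len(x), len(a), len(b)
    base = c / (1.0 - sum(b))
    h = {}
    for t in range(2, n + 1):
        arch = sum(a[i - 1] * x[t - i - 1] ** 2
                   for i in range(1, min(p, t - 1) + 1))
        garch = sum(b[j - 1] * h.get(t - j, 0.0) for j in range(1, q + 1))
        h[t] = arch + garch
    return {t: base + h[t] for t in h}


def truncated_quasi_likelihood(x, c, a, b, v):
    """L_v(a, b, c) of (2.16): sum over t = v, ..., n of
    X_t^2 / sigma-tilde_t^2 + log sigma-tilde_t^2."""
    if v < 2:
        raise ValueError("v must be at least 2")
    s2 = truncated_sigma2(x, c, a, b)
    return sum(x[t - 1] ** 2 / s2[t] + math.log(s2[t])
               for t in range(v, len(x) + 1))


if __name__ == "__main__":
    data = [1.0, 2.0, 0.5, -1.0, 0.3, 1.7]
    print(truncated_quasi_likelihood(data, 1.0, [0.5], [0.4], 3))
```

Minimizing `truncated_quasi_likelihood` over nonnegative `(a, b, c)` with any numerical optimizer would give the estimator discussed next; the choice of optimizer is left open here.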

THEOREM 2.2: Assume the initial conditions of Theorem 2.1, as well as the additional conditions for any one of parts (a)-(c) of that theorem. Suppose too that $\nu = \nu(n)$ satisfies $\nu/\log n \to \infty$ and $\nu/n \to 0$ as $n \to \infty$. Then, if the estimator $\hat\theta$ is defined by minimizing $L_\nu$, defined at (2.16), instead of $L$ at (2.4), the respective conclusions of parts (a)-(c) of Theorem 2.1 hold.

There is also a version of part (d) of Theorem 2.1 in the context of truncated likelihood. However, it involves a new centering constant that depends on $\nu$. For brevity we do not give it here. Our development of bootstrap methods in Section 3 will be founded on estimators calculated using truncated likelihood.

2.5. Moments of $\sigma_1^{-2} U$

It is readily seen that the components of $\sigma_1^{-2} U$ that correspond to derivatives of $\sigma_1(a, b, c)^2$ with respect to the components of $a$, or with respect to $c$, are bounded. Therefore, to show that all moments of $\sigma_1^{-2} \|U\|$ are finite it suffices to show that the components of $\sigma_1^{-2} U$ that correspond to derivatives with respect to components of $b$ have all moments finite. To this end, given a weight function $w$, define

$W(w) = \sum_{i=1}^{p} a_i \sum_{k=1}^{\infty} w(k) \sum_{j_1=1}^{q} \cdots \sum_{j_k=1}^{q} b_{j_1} \cdots b_{j_k} X_{t-i-j_1-\cdots-j_k}^2$

and put $V = W(w)$ for $w(k) \equiv 1$ and $V_K = W(w)$ for $w(k) \equiv k\, I(k \ge K)$. Then it is sufficient to prove that $E\{V_1/(V + 1)\}^{\nu} \le \nu!\, C_1^{\nu}$ for a constant $C_1 > 0$ and all integers $\nu \ge 1$. This will follow if we show that

(2.17) $P\{V_1 > K(V + 1)\} \le C_2 C_3^{K}$

for constants $C_2 > 0$ and $C_3 \in (0, 1)$, and all integers $K \ge 1$. Now, $V_1 > K(V + 1)$ implies $V_K > K$, and

$K\, P(V_K > K) \le E(V_K) \le C_4 \sum_{k=K}^{\infty} k\, C_5^{k} \le C_6 K C_5^{K}$,

where $C_4, C_6 > 0$ and $C_5 = \sum_j b_j < 1$. Therefore (2.17) holds with $(C_2, C_3) = (C_6, C_5)$.

Similarly it may be proved that if $U_r$ denotes the vector of $r$th derivatives of $\sigma_1(a, b, c)^2$, then $E(\sigma_1^{-2} \|U_r\|)^{\nu} < \infty$ for all $\nu \ge 1$. Likewise, all moments of the supremum of $\sigma_1^{-2} \|U_r\|$ over values of $(a, b, c)$ in a sufficiently small neighborhood of the true values of these parameters are finite.

3. BOOTSTRAP METHODS

3.1. Determining Scale by Studentizing

First we discuss the scales associated with the limit results described by different parts of Theorem 2.1. It may be deduced from the theorem that if the distribution of $\varepsilon^2$ lies in the domain of attraction of a stable law (including the Normal law) having exponent strictly greater than 1, and if we define $\ell_n$ to be the infimum of values $\lambda > 0$ such that $nH(\lambda) \le \lambda^2$, where $H(\lambda) = E\{\varepsilon^4 I(\varepsilon^2 \le \lambda)\}$ (the same definition as at (2.8)), then

(3.1) $n \ell_n^{-1} (\hat\theta - \theta^0)$ has a proper, nondegenerate limiting distribution.

Indeed, if $\tau^2 = \mathrm{var}(\varepsilon^2) < \infty$, then $\ell_n \sim n^{1/2} (\tau^2 + 1)^{1/2}$, and so by part (a) of Theorem 2.1 the limit distribution claimed at (3.1) is Normal with zero mean and variance matrix $(1 + \tau^{-2})^{-1} M^{-1}$. If $\tau^2 = \infty$ but $\varepsilon^2$ is in the domain of attraction of the Normal law, then $\ell_n$ is identical to $\lambda_n$ defined at (2.8), and so by part (b) of the theorem the limit is Normal $N(0, M^{-1})$. And if the stable law for the domain of attraction has exponent $\alpha \in (1, 2)$, then $\ell_n$ is asymptotic to a constant multiple of the quantity $\lambda_n$ defined at (2.9), and so the limiting distribution claimed at (3.1) is a rescaled form of that given in part (c) of Theorem 2.1. In each of these cases we shall let $W$ denote a random variable with the limiting distribution of $n \ell_n^{-1} (\hat\theta - \theta^0)$.

Result (3.1) implies that in very general circumstances the scale of $(\hat\theta - \theta^0)$ is accurately described by $n^{-1} \ell_n$. In particular, the assertion $\hat\theta - \theta^0 = O_p(n^{-1} \ell_n)$ gives an accurate account of the order of magnitude of $\hat\theta - \theta^0$. However, the size of $\ell_n$ depends intimately on the particular law in whose domain of attraction the distribution of $\varepsilon^2$ lies. The law is unknown, and so it is quite awkward to determine the scale empirically; this is the root of the difficulty of accurately approximating the distribution of $\hat\theta - \theta^0$.

This difficulty can be overcome by observing that, if we define

(3.2) $\hat\tau^2 = \frac{1}{n} \sum_{t=1}^{n} \varepsilon_t^4 - \left( \frac{1}{n} \sum_{t=1}^{n} \varepsilon_t^2 \right)^2$,

then $n^{1/2} \hat\tau$ has scale $\ell_n$. Indeed, we claim that in very general circumstances $n^{1/2} \ell_n^{-1} \hat\tau$ converges in distribution to a proper, nonzero limit as $n \to \infty$. If $\tau < \infty$, then the limiting distribution is clearly degenerate at a positive constant. The same holds true if $\tau = \infty$ but the distribution of $\varepsilon^2$ lies in the domain of attraction of the Normal law. The limiting constant here is in fact 1; this follows from the so-called "weak law of large numbers with infinite mean" (e.g., Theorem 3, page 223 of Feller (1966)). If $\tau = \infty$ and the distribution of $\varepsilon^2$ lies in the domain of attraction of a stable distribution with exponent $\alpha \in (1, 2)$, then $(n^{1/2} \ell_n^{-1} \hat\tau)^2$ converges in distribution to a strictly positive stable law with exponent $\frac{1}{2}\alpha$; this may be deduced from Section IX.8 of Feller (1966).

Not only do $n \ell_n^{-1} (\hat\theta - \theta^0)$ and $n^{1/2} \ell_n^{-1} \hat\tau$ both have proper limiting distributions, but their weak convergence is joint, as our next result shows. There, and in the other results in this section, it is assumed that all parameter estimators (including bootstrap estimators) are constructed by minimizing the negative log-likelihood within a fixed but sufficiently small distance, $\eta > 0$, of the vector of true parameter values.

THEOREM 3.1: Assume the initial conditions of Theorem 2.1, as well as the additional conditions for any one of parts (a)-(c) of that result. Then

(3.3) $\ell_n^{-1} \left( n(\hat\theta - \theta^0),\; n^{1/2} \hat\tau \right) \to (W, S)$

in distribution, where the random variable $S$ satisfies $P(0 < S < \infty) = 1$.

In the contexts of cases (a) or (b) of Theorem 2.1, (3.3) follows trivially from the theorem, since (as noted two paragraphs above) $S$ is then degenerate at a positive constant. In case (c) of the theorem it can be shown that we may write $(W, S) = (c_1 W_0, c_2 Y)$, where $c_1$, $c_2$ are strictly positive constants, $W_0$ is given by (2.11), and $Y^2 = \sum_{k \ge 1} Y_k^2$, with $Y_k$ being the same as at (2.11). The method of proof is similar to that of part (c) of Theorem 2.1. The distribution of $Y^2$ is identical to the limiting distribution of $\lambda_n^{-2} \sum_t \varepsilon_t^4$, where $\lambda_n$ is defined at (2.9), and has a stable law with exponent $\frac{1}{2}\alpha$; see the corollary of Hall (1978b). The constants $c_1$ and $c_2$ may differ from 1 only because the divisor $\ell_n$ at (3.3) may differ from the norming constants used in Theorem 2.1. In particular, if in the context of part (c) of the theorem we replace $\ell_n$ by $\lambda_n$ at (3.3), then that result holds with $(W, S) = (W_0, Y)$ and $Y$ as defined just above.



Of course, (3.1) implies that

(3.4) $n^{1/2} (\hat\theta - \theta^0)/\hat\tau \to W/S$

in distribution. Comparing (3.1) and (3.4) we see that in normalizing by $\hat\tau$ we have eliminated the unknown scale factor $\ell_n$ from the distribution of $\hat\theta - \theta^0$. The limiting distribution in the case of (3.4) is unknown only in terms of shape, not scale. In view of this result it would be straightforward to approximate the distribution of the left-hand side of (3.4) using the subsample bootstrap, except that the errors $\varepsilon_t$ used to compute $\hat\tau$ are unknown. However, they may be replaced by residuals, which we introduce in the next section.
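The studentizing step can be made concrete with a few lines of code. This is a minimal sketch, assuming the errors (or residuals standing in for them) are available as a list, and assuming that (3.2) is the empirical variance of the squared errors, as the surrounding text indicates. The function names and the toy numbers in the usage lines are hypothetical.

```python
import math


def tau_hat(errors):
    """Square root of n^{-1} sum e^4 - (n^{-1} sum e^2)^2, as at (3.2)."""
    n = len(errors)
    m2 = sum(e ** 2 for e in errors) / n
    m4 = sum(e ** 4 for e in errors) / n
    return math.sqrt(m4 - m2 ** 2)


def studentized(theta_hat, theta0, errors):
    """n^{1/2} (theta_hat - theta0) / tau_hat, componentwise; compare (3.4).

    theta_hat and theta0 are sequences of equal length; tau_hat is a scalar,
    so every component is divided by the same quantity.
    """
    n = len(errors)
    scale = tau_hat(errors)
    return [math.sqrt(n) * (est - true) / scale
            for est, true in zip(theta_hat, theta0)]


if __name__ == "__main__":
    toy_errors = [0.3, -1.2, 2.5, -0.4, 0.9, -3.1, 0.1, 1.4]
    print(tau_hat(toy_errors))
    print(studentized([1.1, 0.45], [1.0, 0.5], toy_errors))
```

Because the divisor is a scalar, this differs from conventional percentile-t methods, which studentize with an estimated covariance matrix.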

3.2. Resampling from a GARCH Process

Suppose we are given a sample $\mathcal{X} = \{X_1, \ldots, X_n\}$, generated by the model described in Section 2.1. Define the truncated version $\tilde\sigma_t^2$ of $\sigma_t^2$, and the truncated version $L_\nu$ of $L$, by (2.15) and (2.16) respectively. Both are functions only of the data in $\mathcal{X}$. Choose $(\hat a, \hat b, \hat c)$ to minimize $L_\nu(a, b, c)$ over nonnegative parameter values. (Our theory permits the minimization to take place only over $(a, b, c)$ within some radius $\eta$ of $(a^0, b^0, c^0)$, although the same sufficiently small $\eta$ may be used throughout the bootstrap algorithm. In numerical practice, however, it appears that the minimization may be done globally.) Put $\hat\sigma_t = \tilde\sigma_t(\hat a, \hat b, \hat c)$. In this notation the "raw" residuals are $\tilde\varepsilon_t = X_t/\hat\sigma_t$ for $\nu \le t \le n$. They can be standardized for location and scale, by defining

(3.5) $\hat\varepsilon_t = \frac{\tilde\varepsilon_t - n_1^{-1} \sum_u \tilde\varepsilon_u}{\left\{ n_1^{-1} \sum_u \tilde\varepsilon_u^2 - \left( n_1^{-1} \sum_u \tilde\varepsilon_u \right)^2 \right\}^{1/2}}, \quad \nu \le t \le n$,

where $n_1 = n - \nu + 1$ and each sum over $u$ is taken over $\nu \le u \le n$.

Draw $\varepsilon_t^*$, for $-\infty < t < \infty$, by sampling randomly, with replacement, from the centered residuals $\hat\varepsilon_\nu, \ldots, \hat\varepsilon_n$. (The series of $\hat\varepsilon_t$'s has already been truncated, but it could be further reduced if necessary, to remove suspected edge effects. Our theory requires only that the number of $\hat\varepsilon_t$'s exceed a positive constant multiple of $n$, for all sufficiently large $n$. In practical implementation we draw $\varepsilon_t^*$ for $-K \le t \le n$, where $K > 0$ is a sufficiently large integer.) Consider the stationary process (conditional on $\mathcal{X}$) defined by $X_t^* = \sigma_t^* \varepsilon_t^*$ for $-\infty < t < \infty$, where, by analogy with (2.1),

$(\sigma_t^*)^2 = \hat c + \sum_{i=1}^{p} \hat a_i (X_{t-i}^*)^2 + \sum_{j=1}^{q} \hat b_j (\sigma_{t-j}^*)^2$



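$p \ge 1$ and $q \ge 1$ respectively), the same properties carry over to $\hat a$, $\hat b$, and $\hat c$, with probability converging to 1. Likewise, for each $\eta > 0$ the probability that

$\left| \sum_i \hat a_i + \sum_j \hat b_j - \left( \sum_i a_i + \sum_j b_j \right) \right| \le \eta$

converges to 1 as $n \to \infty$. Therefore, provided the original process $X_t$ was stationary, i.e. (2.2) holds, it will also be true that the probability, conditional on $\mathcal{X}$, of $X_t^*$ being stationary converges to 1 as $n \to \infty$. Here we have used a conditional form of a result of Giraitis, Kokoszka, and Leipus (2000).

Next we introduce a version of the m-out-of-n bootstrap. Let $m < n$, and compute estimators $(\hat a^*, \hat b^*, \hat c^*)$ of $(\hat a, \hat b, \hat c)$ using the dataset $\mathcal{X}^* = \{X_1^*, \ldots, X_m^*\}$ and the truncated likelihood approach described in Section 2.4. In particular, $(\hat a^*, \hat b^*, \hat c^*)$ are defined in the same way as functions of $\mathcal{X}^*$, as were $(\hat a, \hat b, \hat c)$ as functions of $\mathcal{X}$. We can use the same value of $\nu$ as before, provided $\nu/m \to 0$. Let $\hat\theta$ and $\hat\theta^*$ denote the vectors formed by concatenating the components of $(\hat a, \hat b, \hat c)$ and $(\hat a^*, \hat b^*, \hat c^*)$, respectively, and put $m_1 = m - \nu + 1$,

$\hat\tau^2 = \frac{1}{n_1} \sum_{t=\nu}^{n} \hat\varepsilon_t^4 - \left( \frac{1}{n_1} \sum_{t=\nu}^{n} \hat\varepsilon_t^2 \right)^2, \quad (\hat\tau^*)^2 = \frac{1}{m_1} \sum_{t=\nu}^{m} (\hat\varepsilon_t^*)^4 - \left( \frac{1}{m_1} \sum_{t=\nu}^{m} (\hat\varepsilon_t^*)^2 \right)^2$,

these being the empirical and bootstrap versions, respectively, of $\hat\tau^2$ defined at (3.2).

3.3. Bootstrap Approximation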

The distribution of $m^{1/2} (\hat\tau^*)^{-1} (\hat\theta^* - \hat\theta)$, conditional on $\mathcal{X}$, is our bootstrap approximation to the unconditional distribution of $n^{1/2} \hat\tau^{-1} (\hat\theta - \theta^0)$. Both distributions enjoy the limit property at (3.4); this follows from Theorem 3.2 below. Here it is necessary to assume that the size $m$ of the bootstrap subsample $\mathcal{X}^*$ is strictly smaller than the sample size $n$, although still diverging to infinity with $n$.

THEOREM 3.2: Assume the conditions of Theorem 3.1. Suppose too that $m = m(n)$ satisfies $m \to \infty$ and $m/n \to 0$ as $n \to \infty$, and that the truncation point $\nu$ used to construct the likelihood $L_\nu$ at (2.16) satisfies $\nu/\log n \to \infty$ and $\nu/m \to 0$. Then

$\ell_n^{-1} \left( n(\hat\theta - \theta^0),\; n^{1/2} \hat\tau \right) \to (W, S)$

in distribution, and

$P\left\{ \ell_m^{-1} \left( m(\hat\theta^* - \hat\theta),\; m^{1/2} \hat\tau^* \right) \in [w_1, w_2] \times [s_1, s_2] \,\middle|\, \mathcal{X} \right\} \to P\{ (W, S) \in [w_1, w_2] \times [s_1, s_2] \}$

in probability for each $-\infty < w_1 < w_2 < \infty$ and all continuity points $0 < s_1 < s_2 < \infty$ of the distribution of $S$, where the random vector $(W, S)$ is as at (3.3).



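COROLLARY: Assume the conditions of Theorem 3.2. Then for all convex subsets $\mathcal{C}$,

$P\{ m^{1/2} (\hat\tau^*)^{-1} (\hat\theta^* - \hat\theta) \in \mathcal{C} \mid \mathcal{X} \} - P\{ n^{1/2} \hat\tau^{-1} (\hat\theta - \theta^0) \in \mathcal{C} \} \to 0$

in probability as $n \to \infty$.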
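The requirement $m/n \to 0$ in Theorem 3.2 can be motivated by a small calculation on ties among the largest resampled values. The sketch below is an illustration only and is not taken from the paper. It assumes the $n$ sample values are distinct, and computes the exact probability that the two largest values in a with-replacement resample of size $m$ coincide. The function name is hypothetical.

```python
def prob_top_two_tied(n, m):
    """Exact probability that the two largest values in a resample of size m,
    drawn with replacement from n distinct values, are equal.

    The two largest coincide unless the resample maximum is attained exactly
    once. If the maximum is the value of rank r, that event has probability
    m * (1/n) * ((r - 1)/n)^(m - 1).
    """
    if m < 2:
        raise ValueError("need m >= 2")
    once = sum(m / n * ((r - 1) / n) ** (m - 1) for r in range(1, n + 1))
    return 1.0 - once


if __name__ == "__main__":
    n = 1000
    for m in (1000, 500, 100, 30):
        print(m, prob_top_two_tied(n, m))
```

With $m = n$ the probability stays bounded away from zero as $n$ grows, whereas it shrinks as $m/n$ decreases, which is the $k = 2$ case of the tie behaviour that distinguishes the two bootstrap schemes.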

An outline proof of Theorem 3.2 will be given in Section 5.3. The corollary follows from Theorem 3.2 on noting that the distribution of $W/S$ is continuous.

When the distribution of $\varepsilon^2$ is in the domain of attraction of the Normal distribution (i.e. in cases (a) and (b) of Theorem 2.1), Theorem 3.2 holds without requiring $m/n \to 0$. In particular, in that setting it holds for $m = n$. However, the condition $m/n \to 0$, along with $m \to \infty$, is essential in the heavy-tailed case, where the error distribution is in the domain of attraction of a non-Normal stable law. Hall (1990) has given necessary and sufficient conditions for the standard, n-out-of-n bootstrap to produce consistent distribution estimators in the simpler case of estimating the distribution of a sample mean, and analogous results may be derived in the present setting.

Of course, these remarks address only asymptotic results. The way in which the bootstrap approximation depends on $m$ for finite sample sizes $n$ is not explicitly clear from Theorem 3.2 and its Corollary. Section 4 will take up this issue directly, and show that performance of the bootstrap approximation is robust against variation in $m$.

In conclusion we explain intuitively why, in the heavy-tailed case, the n-out-of-n bootstrap gives inconsistent results. In effect, it fails to accurately model relationships among extreme order statistics in the sample. For example, for each fixed $k \ge 2$ the probability that the $k$ largest values in a resample are equal does not converge to 0 in the case of the n-out-of-n bootstrap. The probability does converge to 0 for the m-out-of-n bootstrap, provided $m/n \to 0$. And of course, it converges to 0 for the sample itself. In the case of heavy-tailed error distributions the limit properties of parameter estimators are dictated by the behavior of extreme order statistics. In particular this is why, in the heavy-tailed case, the distributions of the limit variables $W$ and $S$ are expressed in terms of extreme-value distributions.

3.4. Confidence Regions

In principle, simultaneous multivariate confidence regions for the components of $\theta^0$ can be developed using the asymptotic approximation suggested by the Corollary. However, such regions can be difficult to interpret, and moreover their construction requires a determination of region shape. In the present general setting it is unclear how to do this. Therefore we shall consider only one-sided confidence intervals for individual parameter components. Two-sided intervals may be obtained in the usual way, on taking the intersection of two one-sided intervals.



The vectors $\theta^0$, $\hat\theta$, and $\hat\theta^*$ are each of length $r = p + q + 1$. Use the superscript notation $(k)$ to denote the $k$th component, where $1 \le k \le r$. Given $\pi \in [0, 1]$, for example $\pi = 0.90$ or $0.95$, put

$\hat u_\pi = \inf\{ u : P[ m^{1/2} (\hat\tau^*)^{-1} (\hat\theta^* - \hat\theta)^{(k)} \le u \mid \mathcal{X} ] \ge \pi \}$.

We may of course compute $\hat u_\pi$ to arbitrary numerical accuracy by Monte Carlo simulation of the bootstrap distribution. Let $J_\pi = [\hat\theta^{(k)} - n^{-1/2} \hat\tau \hat u_\pi, \infty)$ be a potential confidence region for $(\theta^0)^{(k)}$. It follows from the Corollary that $J_\pi$ has nominal coverage $\pi$, and that this coverage is asymptotically correct in the sense that, under the conditions assumed in the Corollary, $P\{ (\theta^0)^{(k)} \in J_\pi \} \to \pi$ as $n \to \infty$.

Our approach can be employed to construct consistent confidence regions even when the error distribution does not lie in any domain of attraction. Compare, for example, Hall and LePage (1996). However, on the present occasion such a degree of generality would be a significant distraction, and we do not pursue it.

4. NUMERICAL PROPERTIES

We report results of a simulation study of ARCH(2) and GARCH(1, 1) models. The latter are the most commonly found GARCH models in the literature, and enjoy significant application in the finance setting. In both cases we took the errors $\varepsilon_t$ to have Student's $t$ distribution with $d$ degrees of freedom, for $d = 3$, 4, or 5. Note that $E|\varepsilon_t|^d = \infty$. For ARCH(2) models we employed $c = 1$, $a_1 = 0.5$, and $a_2 = 0.4$. We used the same $c$ and $a_1$ for GARCH(1, 1) models, and took $b_1 = 0.4$. It follows that our ARCH and GARCH processes both have the same variance.

We drew 1000 samples of size $n = 500$ and 1000, respectively, in each setting. We truncated likelihood functions at $\nu = p + 1 = 2$ in the case of the ARCH model, and $\nu = 20$ for the GARCH model. Parameters were estimated by maximizing the likelihood $L$ at (2.4). Boxplots of the average absolute errors (AAEs) are presented in Figure 1. The AAE is defined as $\frac{1}{3}(|\hat c - c| + |\hat a_1 - a_1| + |\hat a_2 - a_2|)$ for ARCH(2), and $\frac{1}{3}(|\hat c - c| + |\hat a_1 - a_1| + |\hat b_1 - b_1|)$ for GARCH(1, 1). The AAE is larger when the tail of the error distribution is heavier (i.e., $d$ is smaller), although the deterioration is only slight. Moreover, the deterioration of estimator performance as we pass from the ARCH model to the relatively complex GARCH case is also only slight.

Bootstrap confidence intervals were constructed for each parameter in each model. For the sake of simplicity we give results only for the one-sided intervals $[\hat\theta^{(k)} - n^{-1/2}\hat\tau\,\hat u_\pi, \infty)$ introduced in Section 3.4, in the case $\pi = 0.9$. Here, $\hat u_\pi$ is the 10% quantile of a bootstrap sample drawn from the conditional distribution of $m^{1/2}(\hat\theta^* - \hat\theta)/\hat\tau^*$ for sample size $m$. We took $m = 250, 300, 350, 400$, and 500 when $n = 500$, and $m = 500, 600, 700, 800$, and 1000 when $n = 1000$. Thus, we included the case $m = n$ in our simulations. Each bootstrap sampling step was repeated $B = 1000$ times, and 1000 samples were drawn for each configuration of parameters. The relative frequency of the event that a bootstrap interval covers the true value of the parameter was taken as our approximation to the true confidence level of the bootstrap interval.

FIGURE 1.- Simulated quasi-maximum likelihood estimates. Panels (a) and (b) show boxplots of average absolute errors of estimators in the cases of (a) ARCH(2) and (b) GARCH(1, 1) models, respectively, with errors distributed as Student's $t$ with $d = 3, 4, 5$ degrees of freedom, and sample sizes $n = 500$ and 1000.

Figure 2 displays approximate coverage levels for parameters $c$, $a_1$, and $a_2$ in the ARCH(2) model. In each case the level is close to its nominal value, 0.90, although accuracy is noticeably greater for the larger sample size. It can be seen that in the ARCH(2) case, distributions with lighter tails tend to produce relatively conservative confidence intervals. Nevertheless, only when $n = 500$, and for the parameter $a_1$, would the anticonservatism of the extreme heavy-tailed case (i.e. $d = 3$) be a potential problem. Note particularly that coverage error is quite robust against changes in $m$. The finite sample properties in the case $m = n$, as demonstrated in the figure, are broadly similar to those for $m < n$.

Figure 3 shows coverage levels in the case of the GARCH(1, 1) model. The method is having somewhat greater difficulty in this more complex setting, although serious problems occur only when constructing confidence intervals for $b_1$. As in the ARCH case, lighter-tailed distributions tend to produce relatively conservative confidence regions, and coverage error is again robust against varying $m$.

The method is less robust against choice of $m$ when the error distribution is asymmetric. To illustrate this point we simulated samples with $n = 500$ and 1000 from ARCH(2) and GARCH(1, 1) models when the error distribution was Pareto, with density $3/(1 + x)^4$ for $x > 0$ except that it was recentered and rescaled so as to have zero mean and unit variance. The results are depicted in Figure 4.
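The simulation design described in this section can be sketched in code. What follows is an illustrative sketch under stated assumptions rather than the authors' implementation: all function names are hypothetical, the Student's $t$ errors are rescaled to unit variance (an assumption; the text does not say how they were normalized), and pre-sample values are initialized at the stationary variance with a burn-in period (also an assumption). The Pareto constants follow from the density $3/(1+x)^4$, whose mean is $1/2$ and whose variance is $3/4$.

```python
import numpy as np

PARETO_MEAN = 0.5   # mean of the density 3/(1+x)^4 on x > 0
PARETO_VAR = 0.75   # variance of the same density

def pareto_errors(size, rng):
    """Asymmetric errors: Pareto draws recentred and rescaled to mean 0, variance 1."""
    u = rng.random(size)
    x = (1.0 - u) ** (-1.0 / 3.0) - 1.0   # inverse of F(x) = 1 - (1+x)^(-3)
    return (x - PARETO_MEAN) / np.sqrt(PARETO_VAR)

def student_t_errors(d, size, rng):
    """Student's t errors, rescaled to unit variance (assumed normalization, d > 2)."""
    return rng.standard_t(d, size=size) / np.sqrt(d / (d - 2.0))

def stationary_variance(c, a, b):
    """c / (1 - sum a_i - sum b_j), the unconditional variance when it exists."""
    return c / (1.0 - np.sum(a) - np.sum(b))

def simulate(c, a, b, eps, burn=200):
    """X_t = sigma_t * eps_t with sigma_t^2 = c + sum a_i X_{t-i}^2 + sum b_j sigma_{t-j}^2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    eps = np.asarray(eps, dtype=float)
    lag = max(a.size, b.size)
    v0 = stationary_variance(c, a, b)
    s2 = np.full(eps.size + lag, v0)   # sigma_t^2; pre-sample values set to v0
    x2 = np.full(eps.size + lag, v0)   # X_t^2; pre-sample values set to v0
    x = np.empty(eps.size)
    for t in range(eps.size):
        k = t + lag
        s2[k] = (c + sum(a[i] * x2[k - 1 - i] for i in range(a.size))
                   + sum(b[j] * s2[k - 1 - j] for j in range(b.size)))
        x[t] = np.sqrt(s2[k]) * eps[t]
        x2[k] = x[t] ** 2
    return x[burn:]   # discard the burn-in observations
```

With the parameter values quoted in the text, `stationary_variance(1.0, [0.5, 0.4], [])` and `stationary_variance(1.0, [0.5], [0.4])` agree, which is the sense in which the ARCH(2) and GARCH(1, 1) processes have the same variance. A sample of size 500 from the ARCH(2) model would be obtained by passing 700 errors to `simulate` with the default burn-in.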


[Figure 2: six panels, for $c$, $a_1$, and $a_2$ at $n = 500$ and at $n = 1000$; horizontal axis $m$, vertical axis confidence level.]

FIGURE 2.- Confidence levels of bootstrap intervals for ARCH(2) models with symmetric errors. The estimated confidence levels are plotted against $m$. The labels "3," "4," and "5" correspond to the number of degrees of freedom, $d$, of the error distribution. The dotted line indicates the nominal confidence level, 0.9.

For either model, minimal coverage error was obtained with $m$ approximately 350 when $n = 500$, and with $m \approx 700$ or 800 when $n = 1000$.

Results for one-sided confidence intervals of the opposite parity, i.e. of the form $[0, \hat\theta^{(k)} - n^{-1/2}\hat\tau\,\hat u_{1-\pi}]$, are broadly similar. For the ARCH(2) model they show a tendency towards relatively less conservatism. By way of contrast, in the case of the GARCH(1, 1) model, lower-tailed confidence intervals for $b_1$ have greater coverage, and substantially greater coverage accuracy, than those in the upper tail (the latter are addressed in Figure 3). However, lower-tailed intervals for $c$ and $a_1$ have reduced coverage relative to their upper-tailed counterparts.


[Figure 3: panels for $c$, $a_1$, and $b_1$ at $n = 500$, with a second row of three panels; horizontal axis $m$.]

5.1. Proof of Theorem 2.1

Step (i): Preliminary Expansion. Recall that $(a, b, c)$ has been concatenated into a vector $\theta$ of length $r = p + q + 1$. Let $\rho_t(\theta)$ denote the $r$-vector whose $i$th component is $\sigma_t(\theta)^{-4}\,\partial\sigma_t(\theta)^2/\partial\theta_i$. In this notation the likelihood equations, defining the extremum of the negative log-likelihood at (2.4), are

(5.1)  $\sum_{t=1}^{n} \{X_t^2 - \sigma_t(\theta)^2\}\,\rho_t(\theta) = 0$.
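To see where the likelihood equations come from, it may help to differentiate a Gaussian quasi-likelihood directly. The form of the negative log-likelihood written on the first line below is an assumption, since (2.4) is not reproduced in this part of the paper; the vector whose $i$th component is $\sigma_t(\theta)^{-4}\,\partial\sigma_t(\theta)^2/\partial\theta_i$ is written $\rho_t(\theta)$ here.

```latex
\begin{align*}
L(\theta) &= \sum_{t=1}^{n}\Big\{\frac{X_t^2}{\sigma_t(\theta)^2} + \log\sigma_t(\theta)^2\Big\}
  \qquad\text{(assumed form of (2.4))},\\
\frac{\partial L(\theta)}{\partial\theta_i}
  &= \sum_{t=1}^{n}\Big\{-\frac{X_t^2}{\sigma_t(\theta)^4} + \frac{1}{\sigma_t(\theta)^2}\Big\}
     \frac{\partial\sigma_t(\theta)^2}{\partial\theta_i}
   = -\sum_{t=1}^{n}\{X_t^2 - \sigma_t(\theta)^2\}\,\sigma_t(\theta)^{-4}
     \frac{\partial\sigma_t(\theta)^2}{\partial\theta_i},\\
\{X_t^2 - \sigma_t(\theta^0)^2\}\,\rho_t(\theta^0)
  &= (\varepsilon_t^2 - 1)\,\sigma_t(\theta^0)^2\rho_t(\theta^0)
  \qquad\text{since } X_t = \sigma_t(\theta^0)\,\varepsilon_t .
\end{align*}
```

Setting the derivative to zero for $i = 1, \ldots, r$ gives the likelihood equations. The last line shows why, at the true parameter value, the summands reduce to $(\varepsilon_t^2 - 1)$ times the vector $\sigma_t(\theta^0)^2\rho_t(\theta^0)$, which is the quantity the proof goes on to call $w_t$.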


[Figure 4: six panels, for $c$, $a_1$, and $a_2$ or $b_1$ at $n = 500$ and at $n = 1000$; horizontal axis $m$.]

FIGURE 4.- Confidence levels of bootstrap intervals for ARCH(2) and GARCH(1, 1) models with asymmetric errors. Symbols "a" and "g" indicate results in ARCH and GARCH cases, respectively. The dotted line indicates the nominal confidence level, 0.9.

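Let $A_t(\theta)$ and $B_t(\theta)$ be the $r$-vector and $r \times r$ matrix, respectively, of derivatives of $\sigma_t(\theta)^2$ and $\rho_t(\theta)$, respectively, with respect to $\theta$. Using the fact that $p \ge 1$, that none of $a_1, \ldots, a_p$ vanish, and that if $q \ge 1$, then none of $b_1, \ldots, b_q$ vanish; and noting (2.5)-(2.7) and the results in Section 2.5; it may be shown by Taylor expansion that

(5.2)  $\sigma_t(\theta)^2 = \sigma_t(\theta^0)^2 + A_t(\theta^0)^T(\theta - \theta^0) + \|\theta - \theta^0\|^2 R_{1t}(\theta)\,\sigma_t(\theta^0)^2$,

$\rho_t(\theta) = \rho_t(\theta^0) + B_t(\theta^0)(\theta - \theta^0) + \|\theta - \theta^0\|^2 R_{2t}(\theta)\,\sigma_t(\theta^0)^{-2}$,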



where $R_{1t}(\theta)$ and $R_{2t}(\theta)$ are an $r$-vector and an $r \times r$ matrix, respectively, and for a constant $C > 0$ not depending on $\eta$ provided the latter is sufficiently small, and with $R_t = R_{1t}$ and $R_t = R_{2t}$,

(5.3)  $P\big\{ n^{-1} \sum_{t=1}^{n} \sup_{\|\theta - \theta^0\| \le \eta} |R_t(\theta)| \le C \big\} \to 1$,

[...] not depending on $\eta$ provided the latter is sufficiently small,

(5.5)  $P\big\{ \sup_{\|\theta - \theta^0\| \le \eta} |R(\theta)| \le C \big\} \to 1$.

110-0011



where the "$o_p(1)$" term does not depend on $\theta$, $w_t = \sigma_t(\theta^0)^2\rho_t(\theta^0)$, and $R$ satisfies (5.5). Since $\eta > 0$ is arbitrarily small, although fixed, then this implies that

(5.9)  $\{M + o_p(1)\}(\hat\theta - \theta^0) = n^{-1} \sum_{t=1}^{n} (\varepsilon_t^2 - 1)\,w_t$.

To derive (5.9) from (5.8), let M1= M + op(l) be identical to the term withinbraces on the left-hand side of (5.8). Multiply both sides of (5.8) on the leftby M-1; denote the resulting right-hand side by '- OP(n-1/2); and put S(6) =MT1R(6). Let C1 equal the constant C at (5.5) multiplied by twice the inverseof the absolute value of the smallest eigenvalue of M; observe that by (5.5),P sup IS(0)I< Ci} 1110-0011



ARCH AND GARCH MODELS 305where v denotes the integer part of nWand 0 < g < 1. Then the summandswith indices t and u are independent if It - uI > v, whence it follows that thevariance of the series on the left-hand side of (5.10), after the summands havebeen modified, equals O(nv). Moreover, the differences between the originalseries and its modified counterpart, and between the expected values of thosetwo series, both equal 0(n-C) for all C > 0. Result (5.10) follows on combiningthese properties.) Therefore, by (5.4) and since An/n is a slowly varyingfunctionof n,(5.11) nAn1{I + n-1AA,M-1Ml ? Op(An/n)}(0 -0)nM_lA_l (E2 _ 1)Wt + Op(nA 110 611 2) + op(1).t=1

Result (5.11), in company with the assumption that n-AAnpn2 0, and thepropertyn(5.12) M-1A1 D(E -2)w +nE(V1) - yE(V1)

t=1converges in distribution to W1 as n -oo,which we shall establish in step (iii) of the present proof, implies that 0-0 =Op(An,Anl/n).Again using the fact that n-1AnAn2-* 0 we may now deduce firstthat nA11160-600112_O0 in probability, and thence, from (5.11), that

nnAk-1(6 -0) = M-1A-1 (E - 1)wt + op(l).t=1

The result claimed in part (d) of the theorem follows from the latter expansion and (5.12).

Step (ii): Parts (a) and (b) of Theorem. Case (a) in the theorem follows directly from (5.9); the series at (5.9) may be expressed as a multivariate square-integrable martingale, and a martingale central limit theorem such as the multivariate form of that of Brown (1971) may be used to obtain the result. Alternatively, case (a) is effectively covered by our treatment of case (b), which we give next.

Define $D_t$ to equal any linear combination of the components of the vector $w_t$ at (5.9). We shall prove that under the conditions for case (b), or indeed if $E(\varepsilon^4) < \infty$,

(5.13)  $\lambda_n^{-1} \sum_t (\varepsilon_t^2 - 1)D_t$ is asymptotically Normally distributed with variance $\mathrm{var}(\varepsilon^2)E(D_1^2)$.

Part (b) of the theorem follows from this property via the Cramér-Wold device.



306 P. HALL AND Q. YAOLet A, be as at (2.8) and define Itn= I(Et2-11 A)/H(A) -O 0as A -* 00. See (8.5), of Feller (1966, p. 303). It therefore follows from the defi-nition of Anat (2.8) that(5.16) nP(E2 -1 > An) nAH(An)-1P(E2 - > An) -O 0as n -* oo, and so P(S3 = 0) -* 1. Analogously to (5.15) it may be proved thatAE{jE2 -ljI(jE2 - 11> A)}/H(A) > 0 as A -* oo, and so analogously to (5.16),

ni8ni < nE{E2 -_ jI(1E2 11 > AX)} = o{nA,-jH(An)} = o(A0).Therefore, I8nS41= Op(nlI8l) = Op(An).Combining the results in the paragraphwe deduce that(5.17) S1= S2 + Op(An).

Let Dt(k) be the random variable obtained by setting to 0 each E"for u < t - k,in the formula for Dt. Now, {Dt(k), -oo < t < oo} is a stationary time series, andfor a constant C1> 0, ID1Iand ID1(k) Iare both less than C1with probability1 forall k > 1. Define Qt = (E2 )Itn-n Stn(k) = QtDt(k) Vtn(k) = Qt{Dt-Dt (k)}S6 = Et Stn(k), and S7 = Et Vtn(k). Then(5.18) S2=S6+S7.

Given f > 0, choose ko so large that E{D1 - D1(k)}2 < f for all k > ko. Notethat for a constant C2> 0 not depending on k or n, E(Q2) < C2H(An). Observetoo that E(Vtnvun)= 0 if t 0 u. Therefore if k > koE(S2) = nE{vln(k)2}

= nE(Q )E{D1-D1j(k)}2 0, then(5.19) lim limsupE(S7/An)2 = 0.k--oo n--oo



Next we prove a central limit theorem for $S_6$. Distribute the summands $s_{tn}(k)$, $1 \le t \le n$, among blocks of alternating lengths $k$ and $\ell$. Let the sums of $s_{tn}(k)$ within the respective blocks be $T_{1u}$ and $T_{2u}$, in the cases of blocks of lengths $k$ and $\ell$ respectively. Thus, excepting a possible residual block at the very end, each $T_{1u}$ or $T_{2u}$ is a sum of $k$ or $\ell$, respectively, adjacently indexed values of $s_{tn}(k)$. Denote by $n_1$ and $n_2$ the numbers of indices $u$ for $T_{1u}$ and $T_{2u}$, respectively. Then $|n_1 - n_2| \le 1$, both $n_1$ and $n_2$ are asymptotic to $n/(k + \ell)$, and

(5.20)  $S_6 = \sum_{u=1}^{n_1} T_{1u} + \sum_{u=1}^{n_2} T_{2u}$.

Note too that for a constantC3(k) > 0,E(TA21)< k2E{sll(t)2} < 2k2Cl2E{(E2- 1)2Il1} < C3(k)H(An).

If t > k, then the variables T1u,for u > 1, are independent, and so

E(A-' ET1) = An-2 EE(T72u) - nA-2(k + f)-1E(TA2)u u - H(An)'1(k + e)-1E(TA21).

Therefore, for each fixed k > 1,/ ~~~~2

(5.21) lim limsupE( E T1U - 0.The summands stn(k) are k-dependent, and so the variables T2U,for 1 < u < n2,are independent. Excepting a possible residual block at the end, they are alsoidentically distributedwith finite variance, although the distribution and variancedepend on n. Lindeberg's central limit theorem may be applied to the seriesA`- Eu T2Uto show that it is asymptoticallyNormally distributed with zero mean

and variance ,3(k, ?)2, say. (When showing that Lindeberg's condition is satisfied,note that the functionH is slowly varying.)It may be proved by elementarycalculus that for each fixed k > 1, ,B(k, e)2 _ 12 = E(D2) as f -x . Therefore,writing Z for a variable with the standard Normal random distribution,we havefor each fixed k,lim lim sup P( A` E T2U 'x)-P(Z,B



Step (iii): Parts (c)-(e) of Theorem. We shall show only that each component of $n\lambda_n^{-1}(\hat\theta - \theta^0)$ converges weakly to the corresponding component of the distribution of $W_0$ or $W_1$ (after the appropriate location change if $\alpha = 1$). Our argument has a straightforward multivariate version, in which a multivariate metric between distributions is used in place of the Lévy distance that we employ. The longer argument differs only in notational complexity. Part (e) of the theorem may be proved as in Samorodnitsky and Taqqu (1994, Theorem 1.4.5). In particular,

(5.22)  $S^{(2)} = \sum_{k=1}^{\infty} (Y_k v_k - EY_k\,ED_1)$

converges almost surely when $1 < \alpha < 2$.

Let $D_t$ equal any one of the components of the vector $M^{-1}w_t$, where $w_t$ is as at (5.9). Take $I_{tn} = I(|\varepsilon_t^2 - 1| \le C_4\lambda_n)$, where $\lambda_n$ has the meaning it assumes in parts (c) and (d) of the theorem and $C_4 > 0$ will be taken small and fixed. In this new notation, define $J_{tn}$ and $\delta_n$ by (5.14), and let

$S_1 = \sum_{t=1}^{n} (\varepsilon_t^2 - 1)D_t$,  $S_2 = \sum_{t=1}^{n} (\varepsilon_t^2 - 1)J_{tn}D_t$,

$S_3 = \sum_{t=1}^{n} \{(\varepsilon_t^2 - 1)I_{tn} - \delta_n\}D_t$,  $S_4 = \sum_{t=1}^{n} D_t$.

Then,

(5.23)  $S_1 = S_2 + S_3 + \delta_n S_4$.

We shall first prove that

(5.24)  $\lim_{C_4 \to 0} \limsup_{n \to \infty} E(S_3/\lambda_n)^2 = 0$,

(5.25)  $\delta_n S_4/\lambda_n = -\Delta + o_p(1)$,

where, defining

f3 =3(n C4 = a(a - 1)-'C4h- if 1< a



and so $E(S_3/\lambda_n)^2 \to \alpha(2 - \alpha)^{-1}C_4^{2-\alpha}E(D_1^2)$ as $n \to \infty$. Since $\alpha < 2$ then this implies (5.24).

Repeated use of KT and the uniform convergence theorem for slow variation(denoted by UCT, say; see Bingham, Goldie, and Teugels (1987, Section 1.2))allows us to prove that(5.27) E{E2I(C4An< E2 < C4An+ 1)} = O(An/n),

-5n = E{E21(E2 > C4AX)}+ o(An/n).When 1 < a < 2, nE{E21(E2 > C4AX)}- f3Anby KT and UCT, and so in viewof (5.27), -nnl/An -- /3; call this property (P). The variables D1 and D, areasymptotically uncorrelated as itt -> x, and the process {DJ} is stationary andessentially bounded, so(5.28) n-1S4 -- E(D1) in probability.Result (5.25), for 1 < a < 2, follows from this property and (P). (Result (5.28) canbe derived using the argument leading to (5.10); see the parenthetical remarksbelow that formula.)To complete the proof of (5.25) when a = 1, note that by KT and UCT,

E{E2I(C4A4 < E2 < A)} = -(An/n) log C4+o(An/n).Hence by (5.27),

-5n/An = E{E2I(E2 > A)} - n-1 log C4+ o(n-1).Result (5.25) follows from this formula, the fact that E{E2I(E2 > An)} is slowlyvarying in n, and (5.27).For large n, Jtn = I(Et > C4An+ 1) for all 1 < t < n. Furthermore, for eachC4 > 0,(5.29) lim liminf P(no more than k out of E21..., En exceed C4A4)= 1.k-*oo n-*ooTherefore the number of nonzero terms in S2 equals Op(l) as n -> x. Hence,if we take E62) > ... > E2 to be the ordered values of E21... vEn if welet D(n), . . . D(1) be the concomitant values of D1, . . . , Dn, and if we put Z(t) =(2)/An then

n(5.30) S2/An= Z(t)DWII(Z(t) > C4) + op(1)t=1Recall that Dt denotes a particular component of M-1wt, say the sth; let Vkbe the sth component of Vk. Then Vk and Dt have the same distribution, and

in particular, E(vk) = E(D,). The joint limiting distribution of (Z(n), D(n)), ... v(Z(n-k+l), D(n-k+l)) is the joint distribution of (Y1, v1), * **, (Yk, Vk), for each



fixed $k$. (Call this result (R); we shall outline a proof in the next paragraph.) Note too that the variables $Z_{(t)}$ are nonincreasing with $t$. Combining these properties and (5.29), and applying Lemma 5.1 below to the series on the right-hand side of (5.30), we deduce that

(5.31)  $S_2/\lambda_n \to S^{(1)} \equiv \sum_{k=1}^{\infty} Y_k v_k I(Y_k > C_4)$,

where the convergence is in distribution as $n \to \infty$. The infinite series here converges because, reflecting (5.29), with probability 1 it contains only a finite number of nonzero terms. By (5.26) and (5.31),

(5.32)  $\lim_{C_4 \to 0} \limsup_{n \to \infty} L\{S_1/\lambda_n,\ S^{(1)} - \Delta\} = 0$.

Next we outline a derivation of result (R). Let $r_j$, a random integer, denote the index $r$ such that $\varepsilon^2_{(n-j+1)} = \varepsilon^2_r$. If (R) did not hold, then there would exist a subsequence of values of $n$ along which the joint distribution of $(Z_{(n)}, D_{(n)}), \ldots, (Z_{(n-k+1)}, D_{(n-k+1)})$ converged, as $n \to \infty$, but to a sub-distribution limit that did not have the form claimed under (R). Then, noting that the separations of the integers $r_j$ diverge with sample size, we could choose a sub-subsequence $\{n_m\}$, say, for which there existed a sequence of positive integers $\nu_n$ diverging to infinity and with the property that, as $n$ increased along the sub-subsequence, $|r_{j_1} - r_{j_2}|/\nu_n$ diverged to infinity with probability 1 whenever $1 \le j_1 \ne j_2 \le k$. Call this property (P). Consider the version of the problem in which each $D_t$ is replaced by its approximant $D_t'$, the latter defined by replacing by 0 each $\varepsilon_s$ for which $s < t - \nu_n$. Let $D'_{(n-j)}$ denote the corresponding concomitant of $Z_{(n-j)}$. Note that $D'_{t_1}$ and $D'_{t_2}$ are independent if $|t_1 - t_2| > \nu_n$. It may be shown using this result and property (P) that $(Z_{(n)}, D'_{(n)}), \ldots, (Z_{(n-k+1)}, D'_{(n-k+1)})$ converges to the limit distribution claimed for $(Z_{(n)}, D_{(n)}), \ldots, (Z_{(n-k+1)}, D_{(n-k+1)})$ under (R). However, $D'_{(n-j)} - D_{(n-j)}$ converges in probability to zero for each $0 \le j \le k - 1$, and so (R) must be true at least along the sub-subsequence $\{n_m\}$. This contradiction completes the proof of (R).

The proof of Lemma 5.1 is given in the Appendix.

LEMMA 5.1: Let $(U_{ni}, V_{ni})$, for $1 \le i \le n < \infty$, denote a triangular array of random 2-vectors, and let $(U_1, V_1), (U_2, V_2), \ldots$ be an infinite sequence of 2-vectors. Assume that for each $k \ge 1$ the joint distribution of the ordered sequence $(U_{ni}, V_{ni})$, $1 \le i \le k$, converges weakly to the joint distribution of $(U_i, V_i)$, $1 \le i \le k$, as $n \to \infty$. Suppose too that for each $k$, $(V_1, \ldots, V_k)$ has a continuous distribution, and that for each $C > 0$,

(5.33)  $\lim_{k \to \infty} \liminf_{n \to \infty} P(\text{no more than the first } k \text{ of } V_{n1}, \ldots, V_{nn} \text{ exceed } C) = 1$,

$\lim_{k \to \infty} P(\text{no more than the first } k \text{ of } V_1, V_2, \ldots \text{ exceed } C) = 1$.

Then for each $C > 0$,

$\sum_{i=1}^{n} U_{ni} I(V_{ni} > C) \to \sum_{i=1}^{\infty} U_i I(V_i > C)$

in distribution as $n \to \infty$.

Reverting to the case $1 < \alpha < 2$, and with $S^{(1)}$ and $S^{(2)}$ given by (5.31) and (5.22), we may write

(5.34)  $S^{(1)} = S^{(2)} - S^{(3)} + S^{(4)}$,

where

(5.35)  $S^{(3)} = \sum_{k=1}^{\infty} [Y_k v_k I(Y_k \le C_4) - E\{Y_k I(Y_k \le C_4)\}E(D_1)]$,

(5.36)  $S^{(4)} = E(D_1) \sum_{k=1}^{\infty} E\{Y_k I(Y_k > C_4)\}$.

The argument used to establish convergence of the infinite series $S^{(2)}$ may be employed to prove that $S^{(3)}$ also converges (as an infinite series) with probability 1. Note too that $E\{S^{(3)}\} = 0$ and $\mathrm{var}\{S^{(3)}\} \to 0$ as $C_4 \to 0$. Therefore $S^{(3)} \to 0$ in probability as $C_4 \to 0$. Combining these results with (5.32) we deduce that

(5.37)  $\lim_{C_4 \to 0} \limsup_{n \to \infty} L\{S_1/\lambda_n,\ S^{(2)} + S^{(4)} - \Delta\} = 0$.

Continuing to assume $1 < \alpha < 2$, and defining $F_k$ as at (2.10), we have

(5.38)  $S^{(4)}/E(D_1) = \sum_{k=1}^{\infty} \int_{C_4}^{\infty} y\,dF_k(y) = \int_{C_4}^{\infty} \Big[\sum_{k=1}^{\infty} \{1 - F_k(y)\}\Big]\,dy + C_4 \sum_{k=1}^{\infty} \{1 - F_k(C_4)\}$.

Now, $\sum_{k \ge 1} \{1 - F_k(y)\} = y^{-\alpha}$. It may therefore be shown that if $1 < \alpha < 2$, then $S^{(4)}/E(D_1) = \beta$. Hence, (5.37) is equivalent to: $\lim_{n \to \infty} L\{S_1/\lambda_n, S^{(1)}\} = 0$, which implies that $S_1/\lambda_n$ converges in distribution to $S^{(1)}$. This proves part (c) of the theorem, in a component-wise sense.

Next we treat $\alpha = 1$. Continue to define $S^{(2)}$, $S^{(3)}$, and $S^{(4)}$ by (5.22), (5.35), and (5.36), except that now the series should be taken only over $k \ge 2$. (We also define $S^{(1)}$ by (5.31), in this case continuing to take the sum over $k \ge 1$.) Then instead of (5.34),

$S^{(1)} = S^{(2)} - S^{(3)} + S^{(4)} + Y_1 v_1 I(Y_1 > C_4)$.

We have as before that $S^{(3)} \to 0$ in probability as $C_4 \to 0$. Therefore, from (5.32) we deduce in place of (5.37) that

(5.39)  $\lim_{C_4 \to 0} \limsup_{n \to \infty} L\{S_1/\lambda_n,\ S^{(2)} + S^{(4)} - \Delta + Y_1 v_1\} = 0$,

where $\gamma$ denotes Euler's constant.



Result (5.38) continues to hold when $\alpha = 1$, provided now that both series on the right-hand side are taken over $k \ge 2$. Since (in the case $\alpha = 1$)

$\sum_{k=2}^{\infty} \{1 - F_k(y)\} = F_1(y) + y^{-1} - 1 = \exp(-y^{-1}) + y^{-1} - 1$,

and

$\gamma = \int_0^{\infty} \{\exp(-y^{-1}) + \min(y^{-1} - 1, 0)\}\,dy$,

then

$\int_{C_4}^{\infty} \Big[\sum_{k=2}^{\infty} \{1 - F_k(y)\}\Big]\,dy + C_4 \sum_{k=2}^{\infty} \{1 - F_k(C_4)\} = \gamma - \log C_4 + o(1)$

as $C_4 \to 0$. Therefore,

(5.40)  $S^{(4)} = (\gamma - \log C_4)E(D_1) + o(1)$

as $C_4 \to 0$. Noting (5.39), (5.40), and the fact that $\Delta = (\mu_n - \log C_4)E(D_1)$ in the case $\alpha = 1$, we deduce that

$\lim_{n \to \infty} L\{S_1/\lambda_n,\ S^{(2)} + Y_1 v_1 + \gamma E(D_1) - \mu_n E(D_1)\} = 0$,

which implies that $(S_1/\lambda_n) + \mu_n E(D_1) - \gamma E(D_1)$ converges in distribution to $S^{(2)} + Y_1 v_1$. This proves the component-wise form of (5.12), and as argued at the beginning of the current step, a proof of the vector form is virtually identical. We showed, at the end of step (i), how part (d) of the theorem follows from (5.12).
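The two tail-sum identities used in this step, $\sum_{k \ge 1}\{1 - F_k(y)\} = y^{-\alpha}$ and, for $\alpha = 1$, $\sum_{k \ge 2}\{1 - F_k(y)\} = \exp(-y^{-1}) + y^{-1} - 1$, can be checked numerically. The sketch below rests on an assumption, since (2.10) is not reproduced in this part of the paper: $F_k$ is taken to be the distribution function of $\Gamma_k^{-1/\alpha}$, where $\Gamma_k$ is the $k$th arrival time of a unit-rate Poisson process. This choice is consistent with the formula $F_1(y) = \exp(-y^{-1})$ that the text uses when $\alpha = 1$. All function names are hypothetical.

```python
import math

def gamma_cdf(k, x):
    """P(Gamma_k <= x) for a unit-rate gamma variable with integer shape k.

    Uses P(Gamma_k <= x) = 1 - P(N(x) <= k - 1), with N(x) Poisson with mean x.
    """
    term = math.exp(-x)   # P(N(x) = 0)
    below = term          # running value of P(N(x) <= j)
    for j in range(1, k):
        term *= x / j
        below += term
    return 1.0 - below

def tail_sum(y, alpha, kmin=1, kmax=400):
    """Truncated sum over kmin <= k <= kmax of 1 - F_k(y), under the assumed F_k."""
    x = y ** (-alpha)     # 1 - F_k(y) = P(Gamma_k < y^(-alpha))
    return sum(gamma_cdf(k, x) for k in range(kmin, kmax + 1))

def beta_from_tail(alpha, c4):
    """Integral of y^(-alpha) over (c4, infinity) plus c4 * c4^(-alpha), 1 < alpha < 2."""
    return c4 ** (1.0 - alpha) / (alpha - 1.0) + c4 ** (1.0 - alpha)
```

Under this assumption the first identity is the statement that the expected number of Poisson arrivals before time $y^{-\alpha}$ equals $y^{-\alpha}$, and `beta_from_tail` reproduces the value $\alpha(\alpha - 1)^{-1}C_4^{1-\alpha}$ obtained by substituting that identity into the right-hand side of (5.38).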

5.2. Outline Proof of Theorem 2.2

Recall the definitions (2.4) and (2.15) of $\sigma_t^2$ and $\tilde\sigma_t^2$. The property $\nu/\log n \to \infty$, and the fact that $E(X_t^2) < \infty$, ensure that for all $C > 0$,

(5.41) sup sup I&t(a,b, C)2 - o-t(a, b, c)21= Op(n-C),(a, b,c)EN v



5.3. Outline Proof of Theorem 3.2

Recall that $\hat\sigma_t = \tilde\sigma_t(\hat a, \hat b, \hat c)$. Starting from (2.15), in which we take $(a, b, c)$ equal to $(a^0, b^0, c^0)$ in one instance and to $(\hat a, \hat b, \hat c)$ in another, and noting that each component of the estimators $\hat a$, $\hat b$, $\hat c$ differs from its true value by $O_p(n^{-\delta})$, for some $\delta > 0$ (see Theorem 2.2), it may be proved that

(5.42)  $\hat\sigma_t/\tilde\sigma_t(a^0, b^0, c^0) = 1 + O_p(n^{-\delta})$

uniformly in $\nu \le t \le n$, for some $\delta > 0$. The argument uses the assumption that (if $p \ge 1$) at least one of $a_1, \ldots, a_p$ is nonzero, and (if $q \ge 1$) at least one of $b_1, \ldots, b_q$ is nonzero. It also requires $c > 0$.

Note too that, in view of the results obtained in Section 5.2, and writing $\sigma_t$ to denote $\sigma_t(a^0, b^0, c^0)$, we have

(5.43)  $\sigma_t/\tilde\sigma_t(a^0, b^0, c^0) = 1 + O_p(n^{-C})$

uniformly in $\nu \le t \le n$, for all $C > 0$. Combining (5.42) and (5.43) we deduce that $\hat\sigma_t/\sigma_t = 1 + O_p(n^{-\delta})$ uniformly in $\nu \le t \le n$. Equivalently, $|(\tilde\varepsilon_t/\varepsilon_t) - 1| = O_p(n^{-\delta})$ uniformly in $\nu \le t \le n$. From this property, and the definition of $\hat\varepsilon_t$ at (3.5), it may be deduced that for some $\delta > 0$,

(5.44)  $\hat\varepsilon_t = \varepsilon_t\{1 + O_p(n^{-\delta})\} + O_p(n^{-\delta})$,

where both "$O_p$" terms are of the stated orders uniformly in $\nu \le t \le n$.

Using (5.44) we may rework the proof of Theorem 2.1 in the bootstrap case, to obtain Theorem 3.2. When error variance is finite the argument hardly alters; when error variance is infinite but the error distribution is in the domain of attraction of the Normal law, we borrow an argument from Hall (1990); and when the error distribution is in the domain of attraction of a stable law with exponent $\alpha \in (1, 2)$ we note that, in view of (5.44), large values of $\hat\varepsilon_t^2$ are identified with large values of $\varepsilon_t^2$. For example, for each fixed $k$ we may show that with probability converging to 1 as $n \to \infty$, the indices $t_1, \ldots, t_k$ at which the $k$ largest values of $\varepsilon_t^2$ (for $1 \le t \le n$) occur are identical to the indices at which the $k$ largest values of $\hat\varepsilon_t^2$ (for $\nu \le t \le n$) occur.

Centre for Mathematics and its Applications, Australian National University, Canberra, ACT 0200, Australia, and Department of Statistics, London School of Economics and Political Science, Houghton Street, London WC2A 2AE, United Kingdom.

Manuscript received May, 2001; final revision received February, 2002.

APPENDIX: PROOF OF LEMMA 5.1

Let $L(\cdot, \cdot)$ denote the Lévy metric, interpreted as in Step (iii) of Section 5.1. Suppose $\delta > 0$ is given. Using properties (5.33), choose $k(\delta) \ge 1$ so large that for all sufficiently large $n$,

(A.1)  $P\{\text{no more than the first } k(\delta) \text{ of } V_{n1}, \ldots, V_{nn} \text{ exceed } C\} > 1 - \delta$,

$P\{\text{no more than the first } k(\delta) \text{ of } V_1, V_2, \ldots \text{ exceed } C\} > 1 - \delta$.



The assumptions in the lemma imply that, if we define

$Q_k(n) = \sum_{i=1}^{k} U_{ni} I(V_{ni} > C)$ and $Q_k = \sum_{i=1}^{k} U_i I(V_i > C)$,

then for each fixed $k$,

(A.2)  $Q_k(n) \to Q_k$

in distribution as $n \to \infty$. Result (A.1) implies that for sufficiently large $n$,

P{Qk(S)(n) $ Qn(n)} < 8 and P{Qk(S) QJ



REFERENCES

BOLLERSLEV, T., AND J. M. WOOLDRIDGE (1992): "Quasi Maximum Likelihood Estimation and Inference in Dynamic Models with Time Varying Covariances," Econometric Reviews, 11, 143-172.
BOSE, A. (1988): "Edgeworth Correction by Bootstrap in Autoregressions," Annals of Statistics, 16, 1709-1722.
BOUGEROL, P., AND N. PICARD (1992): "Stationarity of GARCH Processes and of some Nonnegative Time Series," Journal of Econometrics, 52, 115-127.
BROWN, B. M. (1971): "Martingale Central Limit Theorems," Annals of Mathematical Statistics, 42, 59-66.
CHAN, N. H. (1990): "Inference for Near-Integrated Time Series with Infinite Variance," Journal of the American Statistical Association, 85, 1069-1074.
CHAN, N. H. (1993): "On the Noninvertible Moving Average Time Series with Infinite Variance," Econometric Theory, 9, 680-685.
CHAN, N. H., AND L. T. TRAN (1989): "On the First-Order Autoregressive Process with Infinite Variance," Econometric Theory, 5, 354-362.
CLINE, D. B. H. (1989): "Consistency for Least-Squares Regression Estimators with Infinite Variance Data," Journal of Statistical Planning and Inference, 23, 163-179.
CLINE, D. B. H., AND P. J. BROCKWELL (1985): "Linear Prediction of ARMA Processes with Infinite Variance," Stochastic Processes and their Applications, 19, 281-296.
COMTE, F., AND O. LIEBERMAN (2000): "Asymptotic Theory for Multivariate GARCH Processes," Preprint.
DARLING, D. A. (1952): "The Influence of the Maximum Term in the Addition of Independent Random Variables," Transactions of the American Mathematical Society, 73, 95-107.
DAVIS, R. A., AND T. MIKOSCH (1998): "The Sample Autocorrelations of Heavy-Tailed Processes with Applications to ARCH," Annals of Statistics, 26, 2049-2080.
DAVIS, R. A., AND S. RESNICK (1985a): "Limit Theory for Moving Averages of Random Variables with Regularly Varying Tail Probabilities," Annals of Probability, 13, 179-195.
DAVIS, R. A., AND S. RESNICK (1985b): "More Limit Theory for the Sample Correlation Function of Moving Averages," Stochastic Processes and their Applications, 20, 257-279.
DAVIS, R. A., AND S. RESNICK (1989): "Basic Properties and Prediction of Max-ARMA Processes," Advances in Applied Probability, 21, 781-803.
DAVIS, R. A., AND S. RESNICK (1996): "Limit Theory for Bilinear Processes with Heavy-Tailed Noise," Annals of Applied Probability, 6, 1191-1210.
DWASS, M. (1966): "Extremal Processes. II," Illinois Journal of Mathematics, 10, 381-391.
EMBRECHTS, P., C. KLUPPELBERG, AND T. MIKOSCH (1997): Modelling Extremal Events. Berlin: Springer-Verlag.
ENGLE, R. F. (1982): "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica, 50, 987-1007.
ENGLE, R. F., AND G. GONZALEZ-RIVERA (1991): "Semiparametric ARCH Models," Journal of Economics and Business Statistics, 9, 345-359.
FAMA, E. (1965): "The Behaviour of Stock Market Prices," Journal of Business, 38, 34-105.
FELLER, W. (1966): An Introduction to Probability Theory and its Applications. New York: Wiley.
GOLDIE, C. M. (1991): "Implicit Renewal Theory and Tails of Solutions of Random Equations," Annals of Applied Probability, 1, 126-166.
GIRAITIS, L., P. KOKOSZKA, AND R. LEIPUS (2000): "Stationary ARCH Models: Dependence Structure and Central Limit Theorem," Econometric Theory, 16, 3-22.
GIRAITIS, L., AND P. M. ROBINSON (2001): "Whittle Estimation of ARCH Models," Econometric Theory, 17, 608-623.
GOURIEROUX, C. (1997): ARCH Models and Financial Applications. New York: Springer-Verlag.
GROSS, S., AND W. L. STEIGER (1979): "Least Absolute Deviation Estimates in Autoregression with Infinite Variance," Journal of Applied Probability, 16, 104-116.
HALL, P. (1978a): "Representations and Limit Theorems for Extreme Value Distributions," Journal of Applied Probability, 15, 639-644.
HALL, P. (1978b): "On the Extreme Terms of a Sample from the Domain of Attraction of a Stable Law," Journal of the London Mathematical Society, 18, 181-191.


HALL, P. (1990): "Asymptotic Properties of the Bootstrap for Heavy-Tailed Distributions," Annals of Probability, 18, 1342-1360.
HALL, P., AND R. LEPAGE (1996): "On Bootstrap Estimation of the Distribution of the Studentized Mean," Annals of the Institute of Statistical Mathematics, 48, 403-421.
HANNAN, E. J., AND M. KANTER (1977): "Autoregressive Processes with Infinite Variance," Journal of Applied Probability, 14, 411-415.
HSING, T. (1999): "On the Asymptotic Distributions of Partial Sums of Functionals of Infinite-Variance Moving Averages," Annals of Probability, 27, 1579-1599.
KANTER, M., AND W. STEIGER (1974): "Regression and Autoregression with Infinite Variance," Advances in Applied Probability, 6, 768-783.
KESTEN, H. (1973): "Random Difference Equations and Renewal Theory for Products of Random Matrices," Acta Mathematica, 131, 207-248.
KNIGHT, K. (1987): "Rate of Convergence of Centred Estimates of Autoregressive Parameters for Infinite Variance Autoregressions," Journal of Time Series Analysis, 8, 51-60.
KNIGHT, K. (1989a): "Consistency of Akaike's Information Criterion for Infinite Variance Autoregressive Processes," Annals of Statistics, 17, 824-840.
KNIGHT, K. (1989b): "On the Bootstrap of the Sample Mean in the Infinite Variance Case," Annals of Statistics, 17, 1168-1175.
KNIGHT, K. (1993): "Estimation in Dynamic Linear Regression Models with Infinite Variance Errors," Econometric Theory, 9, 570-588.
KOKOSZKA, P. S., AND T. MIKOSCH (1997): "The Integrated Periodogram for Long-Memory Processes with Finite or Infinite Variance," Stochastic Processes and their Applications, 66, 55-78.
KOKOSZKA, P. S., AND M. S. TAQQU (1994): "Infinite Variance Stable ARMA Processes," Journal of Time Series Analysis, 15, 203-220.
KOKOSZKA, P. S., AND M. S. TAQQU (1996a): "Parameter Estimation for Infinite Variance Fractional ARIMA," Annals of Statistics, 24, 1880-1913.
KOKOSZKA, P. S., AND M. S. TAQQU (1996b): "Infinite Variance Stable Moving Averages with Long Memory," Journal of Econometrics, 73, 79-99.
KOKOSZKA, P. S., AND M. S. TAQQU (1999): "Discrete Time Parametric Models with Long Memory and Infinite Variance," Mathematical and Computer Modelling, 29, 203-215.
KOUL, H. L., AND D. SURGAILIS (2001): "Asymptotics of Empirical Processes of Long Memory Moving Averages with Infinite Variance," Stochastic Processes and their Applications, 91, 309-336.
LEE, S.-W., AND B. E. HANSEN (1994): "Asymptotic Theory for the GARCH(1, 1) Quasi-Maximum Likelihood Estimator," Econometric Theory, 10, 29-52.
LEIPUS, R., AND M.-C. VIANO (2000): "Modelling Long-Memory Time Series with Finite or Infinite Variance: A General Approach," Journal of Time Series Analysis, 21, 61-74.
LINTON, O. (1993): "Adaptive Estimation in ARCH Models," Econometric Theory, 9, 539-569.
LUMSDAINE, R. L. (1996): "Consistency and Asymptotic Normality of the Quasi-Maximum Likelihood Estimator in IGARCH(1, 1) and Covariance Stationary GARCH(1, 1) Models," Econometrica, 64, 575-596.
MAMMEN, E. (1992): When Does Bootstrap Work? Asymptotic Results and Simulations, Lecture Notes in Statistics, 77. New York: Springer.
MANDELBROT, B. (1963): "The Variation of Certain Speculative Prices," Journal of Business, 36, 394-419.
MIKOSCH, T., T. GADRICH, C. KLUPPELBERG, AND R. ADLER (1995): "Parameter Estimation for ARMA Models with Infinite Variance Innovations," Annals of Statistics, 23, 305-326.
MIKOSCH, T., AND C. KLUPPELBERG (1995): "On Strong Consistency of Estimators for Infinite Variance Time Series," Theory of Probability and Mathematical Statistics, 53, 127-136.
MIKOSCH, T., AND C. STARICA (2000): "Limit Theory for the Sample Autocorrelations and Extremes of a GARCH(1, 1) Process," Annals of Statistics, 28, 1427-1451.
MIKOSCH, T., AND D. STRAUMANN (2001): "Whittle Estimation in a Heavy-Tailed GARCH(1, 1) Model," Manuscript.
MITTNIK, S., AND S. T. RACHEV (2000): Stable Paretian Models in Finance. New York: Wiley.


MITTNIK, S., S. T. RACHEV, AND M. S. PAOLELLA (1998): "Stable Paretian Modeling in Finance: Some Empirical and Theoretical Aspects," in A Practical Guide to Heavy Tails, ed. by R. J. Adler, R. E. Feldman, and M. S. Taqqu. Boston: Birkhauser, pp. 79-110.
NELSON, D. B. (1990): "Stationarity and Persistence in the GARCH(1, 1) Model," Econometric Theory, 6, 318-334.
PHILLIPS, P. C. B. (1990): "Time Series Regression with a Unit Root and Infinite-Variance Errors," Econometric Theory, 6, 44-62.
POLITIS, D., J. P. ROMANO, AND M. WOLF (1999): Subsampling. New York: Springer.
RESNICK, S. I. (1986): "Point Processes, Regular Variation and Weak Convergence," Advances in Applied Probability, 18, 66-138.
RESNICK, S. I. (1987): Extreme Values, Regular Variation, and Point Processes. New York: Springer-Verlag.
RYDBERG, T. (2000): "Realistic Statistical Modelling of Financial Data," International Statistical Review, 68, 233-258.
SAMORODNITSKY, G., AND M. S. TAQQU (1994): Stable Non-Gaussian Random Processes. Stochastic Models with Infinite Variance. Stochastic Modeling. New York: Chapman and Hall.
SHEPHARD, N. (1996): "Statistical Aspects of ARCH and Stochastic Volatility," in Time Series Models in Econometrics, Finance and Other Fields, ed. by D. R. Cox, D. V. Hinkley, and O. E. Barndorff-Nielsen. London: Chapman and Hall, pp. 1-67.
TAYLOR, S. J. (1986): Modelling Financial Time Series. Chichester: John Wiley.
WEISS, A. (1986): "Asymptotic Theory for ARCH Models: Estimation and Testing," Econometric Theory, 2, 107-131.
YOHAI, V. J., AND R. A. MARONNA (1977): "Asymptotic Behavior of Least-Squares Estimates for Autoregressive Processes with Infinite Variances," Annals of Statistics, 5, 554-560.
