
Research Collection

Working Paper

Nonparametric GARCH models

Author(s): Bühlmann, Peter Lukas; McNeil, Alexander J.

Publication Date: 1999

Permanent Link: https://doi.org/10.3929/ethz-a-004159348

Rights / License: In Copyright - Non-Commercial Use Permitted

This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.

ETH Library


Nonparametric GARCH Models

by

Peter Bühlmann

and

Alexander J. McNeil¹

Research Report No. 90
July 2000

Seminar für Statistik

and

Department of Mathematics

Eidgenössische Technische Hochschule (ETH)

CH-8092 Zurich, Switzerland

¹ A. McNeil is Swiss Re Research Fellow at ETH Zurich and gratefully acknowledges the financial support of Swiss Re Research.


Nonparametric GARCH Models

Peter Bühlmann

and

Alexander J. McNeil

Seminar für Statistik
ETH Zentrum

CH-8092 Zurich, Switzerland

July 2000

Abstract

In this paper we describe a nonparametric GARCH model of first order and propose a simple iterative algorithm for its estimation from data. We provide a theoretical justification for this algorithm and give examples of its application to stationary time series data showing stochastic volatility. We observe that our nonparametric procedure often gives better estimates of the unobserved latent volatility process than parametric GARCH(1,1) modelling, particularly when asymmetries are present in the data. We show how the basic iterative idea may be extended to more complex time series models combining ARMA or GARCH features of possibly higher order.

1 Introduction

Stationary time series data showing fluctuating volatility and, in particular, financial return series have provided the impetus for the study of a whole series of econometric time series models that may be grouped under the general heading of GARCH (generalized autoregressive conditionally heteroscedastic) models. Examples include the original ARCH model of Engle (1982), the standard GARCH model, integrated GARCH (IGARCH), exponential GARCH (EGARCH) and threshold GARCH (TGARCH), to name but a very few. Two review articles giving details of these models and their many variants are Bollerslev, Chou, and Kroner (1992) and Shephard (1996). In all of these models the hidden variable volatility depends parametrically on lagged values of the process and lagged values of volatility. The standard approach to fitting these models to data involves nonlinear maximum likelihood estimation.

In this paper we take a nonparametric approach to GARCH modelling which is less sensitive to model misspecification. We concentrate on a model, motivated by the natural idea in finance that the hidden volatility depends nonparametrically on one lagged volatility and one lagged value of the process; we term this model a nonparametric GARCH(1,1) process. Nonparametric estimation is a feasible alternative because, in a sense, volatility is only partially hidden. It is possible to devise an iterative scheme to estimate the volatility process and thus, perhaps surprisingly, to overcome the problem of latency. This scheme makes use of a bivariate smoother and is thus very easy to implement. Using a software package that provides bivariate smoothing, such as S-Plus, we are simply required to program an additional loop.


We argue that such an iterative bivariate smoothing scheme has the same rate of estimation accuracy as one classical bivariate smoothing step, which is usually of the order $n^{-2/3}$ with $n$ denoting the sample size; see Stone (1982). Although the nonparametric GARCH(1,1) volatility process depends on the infinite past of the process, in a certain but not fully arbitrary nonparametric way, its estimation can be achieved to within reasonable accuracy. In contrast to the parametric case, one should not use an ARCH approximation scheme in the nonparametric framework: expanding the nonparametric GARCH(1,1) model into a nonparametric ARCH($\infty$) model and estimating the latter with a high order nonparametric ARCH($p$) model is prohibitive. The curse of dimensionality would at best lead to an estimation rate of order $n^{-4/(4+p)}$ with $p$ large (Stone 1982). One way to deal with the curse of dimensionality is to consider multiplicative ARCH($p$) models with estimation rate usually of order $n^{-4/5}$ (Yang, Härdle, and Nielsen 1999, Hafner 1998). But from an interpretational point of view, we strongly prefer the nonparametric GARCH(1,1) model: volatility depends in a fully nonparametric way only on one-lagged values of the time series and the volatility.

The basic idea of our iterative estimation algorithm allows for many extensions to other time series models with some form of latency. These extensions include nonparametric GARCH models of higher order, nonparametric ARMA models for prediction of conditional expectations and nonparametric GARCH models with a general additive structure for the conditional variance; we give examples in Section 4. To begin with, we present in Section 2 the basic first order nonparametric model and its estimation algorithm, for which we provide a justification. In Section 3 we provide examples of the use of the algorithm on both simulated and real data.

2 The Basic Model

In this paper we consider stationary stochastic processes $\{X_t;\ t \in \mathbb{Z}\}$ adapted to the filtration $\{\mathcal{F}_t;\ t \in \mathbb{Z}\}$ with $\mathcal{F}_t = \sigma(\{X_s;\ s \le t\})$, and having the form

$$X_t = \sigma_t Z_t, \qquad \sigma_t^2 = f(X_{t-1}, \sigma_{t-1}^2), \tag{1}$$

where $\{Z_t;\ t \in \mathbb{Z}\}$ is an iid innovation series with zero mean, unit variance and a finite fourth moment. $Z_t$ is assumed independent of $\{X_s;\ s < t\}$, and $f : \mathbb{R} \times \mathbb{R}^+ \to \mathbb{R}^+$ is a strictly positive-valued function; $\sigma_t^2$ is then the conditional variance $\mathrm{Var}[X_t \mid \mathcal{F}_{t-1}]$ and $\sigma_t$ is known as the volatility.

Equations (1) specify a general discrete time stochastic volatility process of first order that includes the parametric ARCH(1) and GARCH(1,1) models, but which also allows for a more complicated dependence of the present volatility on the past. For example, for the parametric GARCH(1,1) model,

$$f(x, \sigma^2) = \alpha_0 + \alpha_1 x^2 + \beta \sigma^2, \qquad \alpha_0, \alpha_1, \beta > 0,$$

and for a more unusual asymmetric model we might consider

$$f(x, \sigma^2) = \alpha_0 + \alpha_1 x^2 + \beta 1_{\{x \le 0\}}\, \sigma^2.$$

In this paper we leave the exact form of $f$ unspecified and attempt to estimate it by nonparametric means.

To this end we observe first that the model (1) can be written in terms of an additive noise as follows:

$$X_t^2 = f(X_{t-1}, \sigma_{t-1}^2) + V_t, \qquad V_t = f(X_{t-1}, \sigma_{t-1}^2)\,(Z_t^2 - 1),$$


where $V_t$ is a martingale difference series with $E[V_t] = E[V_t \mid \mathcal{F}_{t-1}] = 0$ and $\mathrm{Cov}[V_s, V_t] = \mathrm{Cov}[V_s, V_t \mid \mathcal{F}_{t-1}] = 0$ for $s < t$. It follows that

$$E[X_t^2 \mid \mathcal{F}_{t-1}] = f(X_{t-1}, \sigma_{t-1}^2),$$

and

$$\mathrm{Var}[X_t^2 \mid \mathcal{F}_{t-1}] = f^2(X_{t-1}, \sigma_{t-1}^2)\,(E[Z_t^4] - 1).$$
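As a quick numerical sanity check of this additive-noise representation (our own illustration, not part of the paper), one can simulate a parametric GARCH(1,1) process, a special case of (1), and verify that $V_t = X_t^2 - \sigma_t^2$ behaves like a martingale difference: its sample mean and lag-one sample correlation are both close to zero. The parameter values and seed below are arbitrary choices.

```python
import numpy as np

# Simulate a parametric GARCH(1,1) process, a special case of model (1),
# with illustrative parameters alpha_0 = 0.1, alpha_1 = 0.2, beta = 0.7.
rng = np.random.default_rng(42)
n = 20000
a0, a1, b = 0.1, 0.2, 0.7
x = np.zeros(n)
sig2 = np.full(n, a0 / (1 - a1 - b))   # start at the stationary variance
for t in range(1, n):
    sig2[t] = a0 + a1 * x[t - 1]**2 + b * sig2[t - 1]
    x[t] = np.sqrt(sig2[t]) * rng.standard_normal()

# The additive noise V_t = X_t^2 - sigma_t^2 = sigma_t^2 (Z_t^2 - 1)
v = x**2 - sig2
lag1 = np.corrcoef(v[1:], v[:-1])[0, 1]
print(v.mean(), lag1)   # both should be close to zero
```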

This suggests that we could estimate $f$ by regressing $X_t^2$ on the lagged variables $X_{t-1}$ and $\sigma_{t-1}^2$ using a nonparametric smoothing technique. The conditional heteroscedasticity of the series $X_t^2$ suggests that a weighted regression might be used. The principal problem is that the volatility $\sigma_{t-1}$ is an unobserved latent variable. This problem is overcome in the following algorithm.

2.1 An estimation algorithm

Assume we have a data sample $\{X_t;\ 1 \le t \le n\}$, ideally from a process satisfying (1).

1. Calculate a first estimate of volatility $\{\hat\sigma_{t,0};\ 1 \le t \le n\}$ by fitting an ordinary parametric GARCH(1,1) model by standard maximum likelihood. Set $m = 1$.

2. Regress $\{X_t^2;\ 2 \le t \le n\}$ against $\{X_{t-1};\ 2 \le t \le n\}$ and $\{\hat\sigma_{t-1,m-1}^2;\ 2 \le t \le n\}$ using a nonparametric procedure to obtain an estimate $\hat f_m$ of $f$. For a weighted regression use the regression weights $\{\hat\sigma_{t,m-1}^{-2};\ 2 \le t \le n\}$.

3. Calculate $\{\hat\sigma_{t,m}^2 = \hat f_m(X_{t-1}, \hat\sigma_{t-1,m-1}^2);\ 2 \le t \le n\}$ and impute some sensible value for $\hat\sigma_{1,m}^2$, for example $\hat\sigma_{1,m-1}^2$.

4. Increment $m$ and return to step 2 if $m < M$.

Generally after a few iterations the algorithm has converged and there is little to choose between the volatility estimates $\hat\sigma_{t,m}$ for various values of $m$. The estimates can, however, often be improved by averaging over the final $K$ such estimates to obtain

$$\hat\sigma_{t,*} = \frac{1}{K} \sum_{m=M-K+1}^{M} \hat\sigma_{t,m},$$

and then performing a final regression of $X_t^2$ against $X_{t-1}$ and $\hat\sigma_{t-1,*}^2$ to get final estimates $\hat f$ of $f$ and $\hat\sigma_t^2 = \hat f(X_{t-1}, \hat\sigma_{t-1,*}^2)$; we refer to this as a final smoothing step.

The parameters of the estimation algorithm are $n$, the sample size; $M$, the number of basic iterations; and $K$, the number of estimates used in the final smoothing step. There will also be a bandwidth parameter to choose in the nonparametric regression method.
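As a concrete illustration, the algorithm can be sketched in a few lines of code. This is a minimal sketch under our own assumptions, not the authors' implementation: the paper fits a parametric GARCH(1,1) model by maximum likelihood in step 1 and uses a loess smoother in step 2, whereas here we substitute a crude EWMA variance as starting values and a weighted bivariate Nadaraya-Watson kernel smoother with an arbitrary bandwidth `h`.

```python
import numpy as np

def nw_smooth(x1, x2, y, w=None, h=0.5):
    """Weighted bivariate Nadaraya-Watson regression of y on (x1, x2),
    evaluated at the design points themselves."""
    if w is None:
        w = np.ones_like(y)
    s1, s2 = x1.std() + 1e-12, x2.std() + 1e-12   # per-coordinate scaling
    d1 = (x1[:, None] - x1[None, :]) / (h * s1)
    d2 = (x2[:, None] - x2[None, :]) / (h * s2)
    k = np.exp(-0.5 * (d1**2 + d2**2)) * w[None, :]
    return (k @ y) / np.maximum(k.sum(axis=1), 1e-12)

def np_garch11(x, M=8, K=5, h=0.5):
    """Iterative estimation algorithm of Section 2.1; the EWMA start values
    stand in for the parametric GARCH(1,1) ML fit of step 1 (an assumption)."""
    n = len(x)
    sig2 = np.empty(n)
    sig2[0] = x.var()
    for t in range(1, n):                       # step 1 stand-in: EWMA variance
        sig2[t] = 0.9 * sig2[t - 1] + 0.1 * x[t - 1]**2
    iterates = []
    for m in range(M):
        # step 2: weighted smooth of X_t^2 on (X_{t-1}, sigma^2_{t-1,m-1})
        fitted = nw_smooth(x[:-1], sig2[:-1], x[1:]**2, w=1.0 / sig2[1:], h=h)
        # step 3: update, imputing the previous value at t = 1
        sig2 = np.concatenate([[sig2[0]], np.maximum(fitted, 1e-8)])
        iterates.append(sig2)
    # final smoothing step: average the last K iterates, then smooth once more
    sig2_star = np.mean(iterates[-K:], axis=0)
    fitted = nw_smooth(x[:-1], sig2_star[:-1], x[1:]**2,
                       w=1.0 / sig2_star[1:], h=h)
    final = np.concatenate([[sig2_star[0]], np.maximum(fitted, 1e-8)])
    return np.sqrt(final)                       # estimated volatilities

# tiny demonstration on white noise (illustrative only)
rng = np.random.default_rng(7)
demo_x = rng.standard_normal(300)
demo_vol = np_garch11(demo_x, M=3, K=2)
```

The kernel matrices are O(n²), which is fine for a sample of about a thousand points; for longer series a loess or binned smoother, as used in the paper, would scale better.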

2.2 Justification of the algorithm

The following assumption is crucial for our justification of the consistency of the estimation algorithm.

Assumption A1 (contraction with respect to the hidden variable).

$$\sup_{x \in \mathbb{R}} |f(x, \sigma^2) - f(x, \tau^2)| \le D\,|\sigma^2 - \tau^2| \quad \text{for some } 0 < D < 1, \text{ for all } \sigma^2, \tau^2 \in \mathbb{R}^+.$$

A key feature of the algorithm can be understood by simplifying first to the case of no estimation error. We recursively define

$$f_{t,m}(x, \sigma^2) = E[X_t^2 \mid X_{t-1} = x,\ \sigma_{t-1,m-1}^2 = \sigma^2],$$
$$\sigma_{t,m}^2 = f_{t,m}(X_{t-1}, \sigma_{t-1,m-1}^2), \qquad m = 1, 2, \ldots;\ t \in \mathbb{Z},$$

where the $\sigma_{t,0}^2$ are some starting values assumed to be elements of $\mathcal{F}_{t-1}$ for all $t \in \mathbb{Z}$, i.e. they are independent of $\{X_s;\ s \ge t\}$. The $\sigma_{t,m}^2$ are the theoretical quantities corresponding to the estimates $\hat\sigma_{t,m}^2$ of the algorithm. By iteration, $\sigma_{t-1,m-1}^2 \in \mathcal{F}_{t-2}$ is independent of $\{X_s;\ s \ge t-1\}$. Since $Z_t$ is also independent of $\{X_s;\ s < t\}$, we can write

$$\sigma_{t,m}^2 = E[X_t^2 \mid X_{t-1}, \sigma_{t-1,m-1}^2] = E\{f(X_{t-1}, \sigma_{t-1}^2) \mid X_{t-1}, \sigma_{t-1,m-1}^2\}, \qquad m = 1, 2, \ldots;\ t \in \mathbb{Z}.$$

The following argument quantifies the error due to wrong starting values:

$$E\{(\sigma_t^2 - \sigma_{t,m}^2)^2\} = E\big(\big[f(X_{t-1}, \sigma_{t-1}^2) - E\{f(X_{t-1}, \sigma_{t-1}^2) \mid X_{t-1}, \sigma_{t-1,m-1}^2\}\big]^2\big)$$
$$\le E\big[\{f(X_{t-1}, \sigma_{t-1}^2) - f(X_{t-1}, \sigma_{t-1,m-1}^2)\}^2\big] \le D^{2m}\,E\big[\{\sigma_{t-m}^2 - \sigma_{t-m,0}^2\}^2\big]. \tag{2}$$

The first inequality holds since $E\{f(X_{t-1}, \sigma_{t-1}^2) \mid X_{t-1}, \sigma_{t-1,m-1}^2\}$ is the best mean square error predictor of $\sigma_t^2 = f(X_{t-1}, \sigma_{t-1}^2)$ as a function of $X_{t-1}$ and $\sigma_{t-1,m-1}^2$; the second inequality holds by iteration and assumption A1. Formula (2) shows that, in the case of no estimation error, iteration from arbitrary starting values $\sigma_{t,0}^2$ ($t \in \mathbb{Z}$) leads exponentially fast to the true squared volatility $\sigma_t^2$, due to the contraction property in assumption A1. In the presence of estimation error we consider

$$\tilde f_{t,m}(x, \sigma^2) = E[X_t^2 \mid X_{t-1} = x,\ \hat\sigma_{t-1,m-1}^2 = \sigma^2], \qquad \tilde\sigma_{t,m}^2 = \tilde f_{t,m}(X_{t-1}, \hat\sigma_{t-1,m-1}^2),$$

the latter being a true conditional expectation as a function of $X_{t-1}$ and the estimate $\hat\sigma_{t-1,m-1}^2$. The estimation error in the $m$th smoothing step of the algorithm is then described by

$$\Delta_{t;n,m} = \hat\sigma_{t,m}^2 - \tilde\sigma_{t,m}^2, \qquad m = 1, 2, \ldots,\ t = m+2, \ldots, n.$$

Denote by

$$\Delta_n^{(L_2)} = \sup_{m \ge 1} \max_{m+2 \le t \le n} (E\Delta_{t;n,m}^2)^{1/2}.$$

We sometimes assume the following.

Assumption A2.

$$E[\{\tilde\sigma_{t,m}^2 - \sigma_{t,m}^2\}^2] = E[\{\tilde f_{t,m}(X_{t-1}, \hat\sigma_{t-1,m-1}^2) - f_{t,m}(X_{t-1}, \sigma_{t-1,m-1}^2)\}^2] \le G^2\,E[\{\hat\sigma_{t-1,m-1}^2 - \sigma_{t-1,m-1}^2\}^2] \quad \text{for some } 0 < G < 1,$$

for $t = m+2, m+3, \ldots$ and $m = 1, 2, \ldots$

Assumption A3.

$$\max_{2 \le t \le n} E\{(\hat\sigma_{t,0}^2 - \sigma_{t,0}^2)^2\} \le K_1 < \infty, \qquad \max_{2 \le t \le n} E\{(\sigma_{t,0}^2 - \sigma_t^2)^2\} \le K_2 < \infty \quad \text{for all } n.$$

Assumption A4.

$$\Delta_n^{(L_2)} \to 0 \quad \text{as } n \to \infty.$$

Theorem 1. Assume that $\{X_t\}_t$ is as in (1), satisfying assumptions A1 and A2. Denote by $\|\cdot\|_2$ the $L_2$-norm, i.e. $\|Y\|_2 = (EY^2)^{1/2}$. Then the estimator $\hat\sigma_{t,m}^2$ in the $m$th step of the algorithm satisfies

$$\|\hat\sigma_{t,m}^2 - \sigma_t^2\|_2 \le \sum_{j=0}^{m-1} G^j \|\Delta_{t-j;n,m-j}\|_2 + G^m \|\hat\sigma_{t-m,0}^2 - \sigma_{t-m,0}^2\|_2 + D^m \|\sigma_{t-m,0}^2 - \sigma_{t-m}^2\|_2,$$

$t = m+2, m+3, \ldots, n$.


If additionally assumption A3 holds, then

$$\limsup_{m \to \infty}\ \max_{m+2 \le t \le n} \|\hat\sigma_{t,m}^2 - \sigma_t^2\|_2 \le \Delta_n^{(L_2)} / (1 - G).$$

If assumptions A1-A4 hold, and by choosing $m_n = C\{-\log(\Delta_n^{(L_2)})\}$ for some $C \ge \max\{-1/\log(D), -1/\log(G)\}$, we have

$$\max_{m_n+2 \le t \le n} \|\hat\sigma_{t,m_n}^2 - \sigma_t^2\|_2 = O(\Delta_n^{(L_2)}) \quad \text{as } n \to \infty.$$

A proof is given in the Appendix. Note that similar asymptotic results also hold for the final smoothed estimate $\hat\sigma_t^2$. The last statement of Theorem 1 says that the accuracy of the estimation algorithm is of the same order as the maximal error $\Delta_n^{(L_2)}$ in one smoothing step. Since the smoothing problem is bivariate, we expect $\Delta_n^{(L_2)} = O(n^{-2/3})$ with a second order kernel, provided appropriate smoothness conditions on the function $f$ hold and a rate-optimal bandwidth choice is used (Stone 1982), although the volatility of the nonparametric GARCH model at time $t$ is a function of the infinite past $X_{t-1}, X_{t-2}, \ldots$.

According to Theorem 1, the minimal number of iteration steps needed to achieve the order $\Delta_n^{(L_2)}$ in the last statement of the Theorem is

$$m_{n;\min} = \max\{\log(\Delta_n^{(L_2)})/\log(D),\ \log(\Delta_n^{(L_2)})/\log(G)\}.$$

Assuming in the remaining part of the section that $\Delta_n^{(L_2)} \sim \mathrm{const.}\, n^{-2/3}$, we obtain $m_{n;\min} \sim \mathrm{const.}\, \log(n)$, where the constant depends on the unknown $D$ and $G$. We propose to use

$$m_n = a \log(n) \quad \text{with } a \in [2, 3].$$

This range of $a$ covers the values $m_{n;\min}$ for $D, G \in [0.72, 0.80]$, and $m_n = 3\log(n)$ is a valid choice for the last statement in Theorem 1 for all $D, G \le 0.8$.

We complete this section with a discussion of the assumptions and of the difficulties of a rigorous theoretical analysis. Assumption A1 is crucial to our reasoning, and we conjecture that without a suitable contraction property of the function $f$ with respect to the second, hidden argument there is no hope of obtaining a consistent estimate with our iterative scheme. Formula (2) states a clean mathematical result when there are no estimation errors; it thus indicates that it is indeed possible to solve the problem of nonparametric GARCH estimation with a simple iterative smoothing scheme. Assumptions A2, A3 and A4 are all needed exclusively to deal with the case of estimation error. Assumption A2 concerns differences between projections: it requires that the $L_2$-norm of the difference between the projections is strictly smaller than the $L_2$-norm of the difference between the functions $\hat\sigma_{t-1,m-1}^2$ and $\sigma_{t-1,m-1}^2$ (which are the spanning elements causing the distinct projection spaces). Although such an assumption is uncommon, it is natural in the framework of $L_2$-Hilbert space theory. Assumption A2 can also be replaced by:

Assumption A2'.

$$\sup_{m \ge 1}\ \max_{m+2 \le t \le n} E\big\{\big(E[X_t^2 \mid X_{t-1}, \hat\sigma_{t-1,m-1}^2] - E[f(X_{t-1}, \sigma_{t-1}^2) \mid X_{t-1}, \hat\sigma_{t-1,m-1}^2]\big)^2\big\} \le \Gamma_n^2, \qquad \Gamma_n \to 0 \text{ as } n \to \infty.$$

This assumption quantifies the effect that $\hat\sigma_{t-1,m-1}^2 \notin \mathcal{F}_{t-1}$, so that the two conditional expectations are not the same. If assumptions A1, A2', A3 and A4 hold, then, similarly to Theorem 1,

$$\|\hat\sigma_{t,m}^2 - \sigma_t^2\|_2 \le (\Gamma_n + \Delta_n^{(L_2)}) \sum_{j=0}^{m-1} D^j + \mathrm{const.}\, D^m,$$


which yields consistency; the same rate as with assumption A2 is obtained if $\Gamma_n = O(\Delta_n^{(L_2)})$. Assumption A3 is a very mild condition on the initial values. Assumption A4 is plausible: a rigorous justification under appropriate regularity conditions for the process $\{X_t\}_t$ seems tedious, involving the difficulty of quantifying the effect of plugging in estimated random variables. Concluding this discussion, we believe that formula (2) and Theorem 1 reflect the correct asymptotic behaviour of our estimation algorithm, although a rigorous justification of assumptions A2 (or A2') and A4 is missing and remains an open problem for future research in theoretical statistics. Our belief in the correctness of the asymptotics given here is also based on the satisfactory results in numerical examples, as discussed in the next section.

3 Illustrative Examples

3.1 Simulated Example

We illustrate the method using simulated data from a process satisfying (1). For the volatility surface we take

$$f(x, \sigma^2) = 5 + 0.2 x^2 + (0.75 \cdot 1_{\{x > 0\}} + 0.1 \cdot 1_{\{x \le 0\}})\, \sigma^2,$$

which is depicted in Figure 1, and we assume that the innovation distribution is standard normal. Note the asymmetry that has been built into the GARCH effect; the strength of the GARCH effect (that is, the magnitude of the $\sigma^2$ coefficient) varies with the sign of $x$. A realisation of $n = 1000$ points from this process is shown in Figure 2.
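For reference, such a realisation can be generated in a few lines. This is our own sketch: the burn-in length, starting value and seed are our choices, not specified in the paper.

```python
import numpy as np

def simulate_asym_garch(n, rng):
    """Simulate n points from model (1) with the asymmetric volatility surface
    f(x, s2) = 5 + 0.2 x^2 + (0.75*1{x>0} + 0.1*1{x<=0}) s2 and standard
    normal innovations. Burn-in and start value are illustrative choices."""
    burn = 200
    x = np.zeros(n + burn)
    sig2 = np.zeros(n + burn)
    sig2[0] = 5.0 / (1.0 - 0.5)          # rough stationary-level guess
    x[0] = np.sqrt(sig2[0]) * rng.standard_normal()
    for t in range(1, n + burn):
        beta = 0.75 if x[t - 1] > 0 else 0.1
        sig2[t] = 5.0 + 0.2 * x[t - 1]**2 + beta * sig2[t - 1]
        x[t] = np.sqrt(sig2[t]) * rng.standard_normal()
    return x[burn:], np.sqrt(sig2[burn:])   # series and true volatilities

x_sim, sigma_sim = simulate_asym_garch(1000, np.random.default_rng(0))
```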

We apply the algorithm to these data, setting the number of basic iterations to $M = 8$ and performing a final smooth based on the $K = 5$ final iterations. We use the default value of the smoothing parameter in the S-Plus implementation of a bivariate loess smoother.

In Figure 3 the estimated surfaces $\hat f_m$ calculated at the first 8 iterations and the surface calculated at the final smoothing stage are shown. Comparison with Figure 1 indicates that the nonparametric algorithm recovers the essential features of the volatility surface. This series of nine pictures illustrates the first envisaged use of the algorithm: as an exploratory graphical tool for investigating the dependence of the present volatility on the immediate past.

To provide a quantitative assessment of the accuracy of the volatility estimates we calculate both a mean squared estimation error and a mean absolute estimation error. Because estimates of volatility at the first few time points may be unreliable, we generally omit $r = 10$ values from this calculation. Our measures of estimation error are

$$\mathrm{MSE}(\hat\sigma_{\cdot,m}) = \frac{1}{n-r} \sum_{t=r+1}^{n} (\hat\sigma_{t,m} - \sigma_t)^2, \qquad \mathrm{MAE}(\hat\sigma_{\cdot,m}) = \frac{1}{n-r} \sum_{t=r+1}^{n} |\hat\sigma_{t,m} - \sigma_t|.$$
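In code, these two error measures are straightforward (our own helper; with 0-indexed arrays, discarding the first $r$ entries corresponds to the sum from $t = r+1$ to $n$):

```python
import numpy as np

def vol_errors(sigma_hat, sigma_true, r=10):
    """Mean squared and mean absolute error of a volatility estimate,
    discarding the first r time points, whose estimates may be unreliable."""
    d = np.asarray(sigma_hat)[r:] - np.asarray(sigma_true)[r:]
    return np.mean(d**2), np.mean(np.abs(d))

# toy check with made-up numbers: differences after r=1 are -1 and 2
mse_demo, mae_demo = vol_errors([0.0, 1.0, 3.0], [9.0, 2.0, 1.0], r=1)
```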

In Table 1 we see that the parametric GARCH(1,1) estimate of volatility can be improved upon through additional iterations of the nonparametric algorithm. Even a single iteration reduces both errors by around a third, and after 2 or 3 iterations no further improvement is obtained; thereafter the errors oscillate within a band. In this example the final smooth gives the lowest errors of all.

In Figure 4 the volatility estimates are compared with the true volatility trajectory for an arbitrary section of 70 observations. In the left-hand picture it is clear that the parametric estimate derived from GARCH(1,1) modelling is unable to follow the true volatility closely through its peaks and troughs; on the other hand, in the right-hand picture we see that the nonparametric GARCH estimate derived from the final smoothing round is fairly faithful to the true volatility.


Iteration (m)       MSE($\hat\sigma_{\cdot,m}$)   MAE($\hat\sigma_{\cdot,m}$)
0 (parametric)      0.48                           0.56
1                   0.31                           0.57
2                   0.25                           0.40
3                   0.21                           0.37
4                   0.19                           0.33
5                   0.20                           0.32
6                   0.20                           0.32
7                   0.20                           0.32
8                   0.20                           0.32
* (final smooth)    0.18                           0.31

Table 1: Volatility estimation errors for a realisation of n = 1000 points.

Essentially it is the strong asymmetry of the simulated process that degrades the performance of ordinary parametric GARCH as a volatility estimator. For more symmetric processes the difference in performance is less obvious. For simple additive surfaces of the form

$$f(x, \sigma^2) = g(x) + h(\sigma^2),$$

where $g : \mathbb{R} \to \mathbb{R}^+$ is a positive-valued function satisfying $g(x) = g(-x)$ (such as $g(x) = \alpha|x|$, $0 < \alpha < 1$) and $h : \mathbb{R}^+ \to \mathbb{R}^+$ is a positive-valued non-decreasing function (such as $h(\sigma^2) = \beta\sigma$, $0 < \beta < 1$), the performance of ordinary GARCH(1,1) is often as good as or slightly better than that of the nonparametric procedure, even though GARCH(1,1) assumes an erroneous parametric form. For complicated additive functions we note that our procedure can be adapted to fit general nonparametric additive models; for an outline see Section 4.2.

When the true process is exactly GARCH(1,1) we would expect the parametric procedure to outperform the nonparametric one, and it does so. Our simulation experiments suggest that it is often in situations of asymmetry that the nonparametric procedure has advantages.

3.2 Real Example

For an example with real data we take a 1000-day excerpt from the time series of daily percentage returns on the BMW share price (see Figure 5). Our time series contains the period of high volatility around the 1987 stock market crash.

In Figure 6 we show again the graphical output of the algorithm for the first 8 iterations and the final smoothing step. The pictures have been rotated so that we view them along the line $x = 0$. These estimated volatility surfaces show some evidence of an asymmetric effect, which depends on the sign of the last observation: as long as volatility is low, a large positive return appears to have a much more modest effect on the next day's volatility than does a large negative return. This asymmetric effect of new information is a well-known phenomenon in financial time series and our method clearly picks it up.

Finally, in Figure 7 we show the estimated volatilities derived from parametric modelling and from the final-smooth stage of nonparametric modelling. Although we do not know the true volatility process for these data, a simple comparison of the estimators can be obtained by calculating the mean squared "error" statistic

$$\frac{1}{n-r} \sum_{t=r+1}^{n} (\hat\sigma_t^2 - X_t^2)^2$$

for the two volatility estimates. This statistic is based on the additive representation of the squared series as $X_t^2 = \sigma_t^2 + V_t$ introduced in Section 2; $V_t$ is a martingale difference series with mean zero, which we regard as the error term. The mean squared error statistic takes the values 106 and 98.3 for the parametric and nonparametric estimators respectively. Thus the error is about 8% smaller for the nonparametric estimate.

4 Extensions

The iterative idea of our estimation algorithm can be extended in a variety of ways andcombined with other nonparametric modelling techniques.

4.1 Nonparametric GARCH(p,q)

The estimation algorithm in Section 2.1 and its justification easily extend to the nonparametric GARCH($p,q$) model with $0 \le p, q < \infty$. Equation (1) is now generalised to

$$X_t = \sigma_t Z_t, \qquad \sigma_t^2 = f(X_{t-1}, \ldots, X_{t-p}, \sigma_{t-1}^2, \ldots, \sigma_{t-q}^2).$$

The $m$th iteration step in the estimation algorithm is then

$$\hat\sigma_{t,m}^2 = \hat f_m(X_{t-1}, \ldots, X_{t-p}, \hat\sigma_{t-1,m-1}^2, \ldots, \hat\sigma_{t-q,m-1}^2),$$

where $\hat f_m$ is based on a $(p+q)$-variate smooth of $X_t^2$ versus $X_{t-1}, \ldots, X_{t-p}, \hat\sigma_{t-1,m-1}^2, \ldots, \hat\sigma_{t-q,m-1}^2$.

Justification of such an extension can be given as in Section 2.2 by replacing assumption A1 with

$$\sup_{x \in \mathbb{R}^p} |f(x_1, \ldots, x_p, \sigma_1^2, \ldots, \sigma_q^2) - f(x_1, \ldots, x_p, \tau_1^2, \ldots, \tau_q^2)| \le \sum_{i=1}^{q} d_i\,|\sigma_i^2 - \tau_i^2|,$$

for some $0 < d_1, \ldots, d_q < 1$ with $\sum_{i=1}^{q} d_i = D < 1$, for all $\sigma^2, \tau^2 \in (\mathbb{R}^+)^q$.

4.2 Nonparametric additive GARCH

The model in (1), or the higher order model of Section 4.1, can be modified to the case where $\sigma_t^2$ is a general additive function of lagged values and volatilities. The estimation algorithm is simply adjusted: the smoothing operations are performed according to the additive structure of $\sigma_t^2$ by using the backfitting algorithm. For generalized additive models and their estimation see Hastie and Tibshirani (1990).

4.3 Nonparametric ARMA-GARCH of first order

The estimation algorithm also extends to hybrid ARMA-GARCH models; for notational simplicity we focus on the first order case. Consider

$$X_t = \mu_t + \sigma_t Z_t, \qquad \mu_t = g(X_{t-1}, \mu_{t-1}), \qquad \sigma_t^2 = f\big((X_{t-1} - \mu_{t-1}), \sigma_{t-1}^2\big),$$


where $g : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is a nonparametric function giving the conditional mean of the process; all other elements of the model are as in (1).

Observe that

$$E[X_t \mid \mathcal{F}_{t-1}] = g(X_{t-1}, \mu_{t-1}),$$

and

$$\mathrm{Var}[X_t \mid \mathcal{F}_{t-1}] = f\big((X_{t-1} - \mu_{t-1}), \sigma_{t-1}^2\big).$$

This suggests that a similar scheme can be applied to estimate both $g$ and $f$. We start with estimates $\hat\mu_{t,0}$ and $\hat\sigma_{t,0}$ of $\mu_t$ and $\sigma_t$, perhaps derived by parametric means. We regress $X_t$ against $X_{t-1}$ and $\hat\mu_{t-1,0}$, weighting possibly with $\hat\sigma_{t,0}^{-1}$, to estimate $g$ and hence to re-estimate $\mu_t$ by $\hat\mu_{t,1}$. We then regress $(X_t - \hat\mu_{t,1})^2$ against $(X_{t-1} - \hat\mu_{t-1,1})$ and $\hat\sigma_{t-1,0}^2$, weighting possibly with $\hat\sigma_{t,0}^{-2}$, to estimate $f$ and hence to re-estimate $\sigma_t$ by $\hat\sigma_{t,1}$. We iterate this procedure to improve the estimates of conditional mean and volatility.
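A single iteration of this hybrid scheme might be sketched as follows. This is our own illustration: the bivariate kernel smoother and bandwidth `h` are stand-ins for whatever smoother is actually used, and in practice the starting values `mu` and `sig2` would come from a parametric ARMA-GARCH fit.

```python
import numpy as np

def nw2(x1, x2, y, h=1.0, w=None):
    """Weighted bivariate Nadaraya-Watson smoother at the design points."""
    if w is None:
        w = np.ones_like(y)
    s1, s2 = x1.std() + 1e-12, x2.std() + 1e-12
    d1 = (x1[:, None] - x1[None, :]) / (h * s1)
    d2 = (x2[:, None] - x2[None, :]) / (h * s2)
    k = np.exp(-0.5 * (d1**2 + d2**2)) * w[None, :]
    return (k @ y) / np.maximum(k.sum(axis=1), 1e-12)

def arma_garch_step(x, mu, sig2, h=1.0):
    """One iteration of the Section 4.3 scheme: re-estimate the conditional
    mean, then the volatility, from the current estimates mu and sig2."""
    # regress X_t on (X_{t-1}, mu_{t-1,0}), weighted with sigma_{t,0}^{-1}
    mu_new = mu.copy()
    mu_new[1:] = nw2(x[:-1], mu[:-1], x[1:], h=h, w=1.0 / np.sqrt(sig2[1:]))
    # regress (X_t - mu_{t,1})^2 on (X_{t-1} - mu_{t-1,1}, sigma^2_{t-1,0}),
    # weighted with sigma_{t,0}^{-2}
    resid = x - mu_new
    sig2_new = sig2.copy()
    sig2_new[1:] = np.maximum(
        nw2(resid[:-1], sig2[:-1], resid[1:]**2, h=h, w=1.0 / sig2[1:]), 1e-8)
    return mu_new, sig2_new

# tiny demonstration with flat starting values (illustrative only)
rng = np.random.default_rng(1)
demo_x = rng.standard_normal(300)
demo_mu, demo_sig2 = arma_garch_step(demo_x, np.zeros(300), np.ones(300))
```

Feeding the updated `demo_mu` and `demo_sig2` back into `arma_garch_step` implements the iteration.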

5 Discussion

In this paper we have proposed and justified an algorithm for fitting nonparametric GARCH models of first order, and we have suggested extensions to other time series models. The first use we see for our procedure in practical applications is an exploratory one: assuming the availability of a bivariate smoother (such as loess in S-Plus), it is a simple matter to implement our iterative algorithm to investigate visually the dependence of volatility on lagged values of the time series and the volatility itself.

The second envisaged use for our method is as a volatility estimator in situations where the functional relationship between volatility and the lagged series differs markedly from standard parametric GARCH(1,1), particularly in situations where asymmetries appear to be present.

Of course the GARCH framework is not the only way to approach volatility estimation. Another stream of models is the stochastic volatility (SV) models; see Shephard (1996). In contrast to our model (1), the SV model has a fully latent volatility process with no observable structure. It can be formalized as a state space model

$$X_t = \sigma_t Z_t, \qquad \{\sigma_t^2\} = \text{a stationary stochastic process of a pre-specified form}, \tag{3}$$

with $Z_t$ as in (1), and $\sigma_t^2$ conditionally independent of $\{X_s;\ s < t\}$ given $\{\sigma_s^2;\ s < t\}$. Note that this conditional independence structure does not hold for our nonparametric GARCH model. This is why we refer to partially hidden volatility in (1): the lagged value $X_{t-1}$ contributes to the partial observability of $\sigma_t^2$. The SV model in (3) cannot be fitted with our iterative procedure. If the process $\{\sigma_t^2\}$ and the innovation distribution of $Z_t$ are of finite parametric form, techniques from nonlinear Kalman filtering can be used, e.g. Markov chain Monte Carlo or Gibbs sampling; see for example Shephard and Pitt (1997).

A semiparametric approach to estimation of the model in (3) is given in Franke, Härdle, and Kreiss (1998). The dynamics of $\{\sigma_t^2\}$ are assumed nonparametric, but the innovation distribution of $Z_t$ is taken to be standard normal. They use a deconvolution estimator that relies crucially on this assumption of normality and prove that the convergence rate of the estimator is $\log(n)^{-2}$. Convergence is thus extremely slow, although this is a common rate for such deconvolution techniques.

A further positive feature of our model is that it is fully nonparametric in the sense that not only is the exact functional relationship between volatility and the one-lagged values of the time series and volatility left unspecified, but also no assumptions are made regarding the distributional form of the innovation distribution. (We assume only the existence of a finite fourth moment to justify the use of weighted nonparametric regression.)

Much work in empirical finance suggests that GARCH-type models with Gaussianinnovations cannot capture the leptokurtosis and conditional heteroscedasticity of typicalfinancial return data. This work suggests that heavier-tailed innovations are required;see McNeil and Frey (1999). It is thus attractive to look at flexible fitting methods suchas our nonparametric GARCH algorithm that do not require us to fix the form of theinnovation distribution.

Appendix

Proof of Theorem 1. Write

$$\|\hat\sigma_{t,m}^2 - \sigma_t^2\|_2 \le \|\Delta_{t;n,m}\|_2 + \|\tilde\sigma_{t,m}^2 - \sigma_{t,m}^2\|_2 + \|\sigma_{t,m}^2 - \sigma_t^2\|_2.$$

The third term on the right-hand side is bounded with formula (2); for the second term we obtain, with assumption A2,

$$\|\tilde\sigma_{t,m}^2 - \sigma_{t,m}^2\|_2 \le G\,\|\hat\sigma_{t-1,m-1}^2 - \sigma_{t-1,m-1}^2\|_2 \le G\big(\|\Delta_{t-1;n,m-1}\|_2 + \|\tilde\sigma_{t-1,m-1}^2 - \sigma_{t-1,m-1}^2\|_2\big)$$
$$\le \ldots \le \sum_{j=1}^{m-1} G^j \|\Delta_{t-j;n,m-j}\|_2 + G^m \|\hat\sigma_{t-m,0}^2 - \sigma_{t-m,0}^2\|_2.$$

These bounds then yield the first assertion in Theorem 1. The other assertions follow easily by using assumptions A3 and A4, respectively. □

References

Bollerslev, T., R. Chou, and K. Kroner (1992): "ARCH modeling in finance," Journal of Econometrics, 52, 5–59.

Engle, R. (1982): "Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation," Econometrica, 50, 987–1008.

Franke, J., W. Härdle, and J.-P. Kreiss (1998): "Nonparametric estimation in a stochastic volatility model," Preprint.

Hafner, C. (1998): "Estimating high-frequency foreign exchange rate volatility with nonparametric ARCH models," J. Statist. Plann. Infer., 68, 247–269.

Hastie, T., and R. Tibshirani (1990): Generalized Additive Models. Chapman & Hall, New York.

McNeil, A., and R. Frey (1999): "Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach," preprint, ETH Zurich.

Shephard, N. (1996): "Statistical Aspects of ARCH and Stochastic Volatility," in Time Series Models in Econometrics, Finance and other Fields, ed. by D. Cox, D. Hinkley, and O. Barndorff-Nielsen, pp. 1–55, London. Chapman & Hall.

Shephard, N., and M. Pitt (1997): "Likelihood analysis of non-Gaussian measurement time series," Biometrika, 84, 653–667.

Stone, C. (1982): "Optimal global rates of convergence for nonparametric regression," Annals of Statistics, 10, 1040–1053.


Yang, L., W. Härdle, and J. Nielsen (1999): "Nonparametric autoregression with multiplicative volatility and additive mean," Journal of Time Series Analysis, 20, 579–604.


Figure 1: Simulation experiment. The volatility surface $f(x, \sigma^2) = 5 + 0.2x^2 + (0.75 \cdot 1_{\{x>0\}} + 0.1 \cdot 1_{\{x \le 0\}})\sigma^2$; note the asymmetry depending on the sign of $x$. [Surface plot over $x$ and $\sigma^2$; image not reproduced in this text version.]

Figure 2: Simulation experiment. The data. [Time series plot of the realisation, $n = 1000$; image not reproduced in this text version.]


[Nine perspective plots, panel titles "Iteration 1" through "Iteration 8" and "Iteration final smooth"; axes x, sigma.sq, Estimated f.]

Figure 3: Simulation experiment. The estimated surfaces f_m calculated at the first 8 iterations and the surface f estimated at the final smoothing stage.
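The panels of Figure 3 suggest the general shape of the estimation scheme: starting from a crude volatility estimate, smooth the squared returns X_t² against the pair (X_{t−1}, σ̂²_{t−1}) to obtain an updated surface f_m, use f_m to refresh the volatility estimates, and iterate. A rough sketch under stated assumptions: a k-nearest-neighbour smoother and a flat initial volatility stand in for the paper's actual smoother and initialization.

```python
import numpy as np

def knn_smooth(pred_pts, train_x, train_y, k=30):
    """Crude 2-D k-nearest-neighbour regression estimate of E[y | x];
    a stand-in smoother, not the one used in the paper."""
    scale = train_x.std(axis=0)
    scale = np.where(scale > 0, scale, 1.0)  # guard against constant coordinates
    d = np.linalg.norm((pred_pts[:, None, :] - train_x[None, :, :]) / scale, axis=2)
    idx = np.argsort(d, axis=1)[:, :k]       # indices of the k nearest points
    return train_y[idx].mean(axis=1)

def np_garch_iterate(x, n_iter=8, k=30):
    """Iteratively re-estimate sigma_t^2 = f(X_{t-1}, sigma_{t-1}^2)."""
    s2 = np.full(len(x), x.var())            # flat initial volatility (assumption)
    for _ in range(n_iter):
        train_x = np.column_stack([x[:-1], s2[:-1]])
        train_y = x[1:] ** 2                 # squared returns as volatility proxy
        fitted = knn_smooth(train_x, train_x, train_y, k=k)
        s2 = np.concatenate([[s2[0]], np.maximum(fitted, 1e-8)])  # keep positive
    return s2
```

The point of the sketch is the fixed-point structure visible in Figure 3: each pass re-smooths against volatilities produced by the previous surface, so successive estimates f_m stabilize before the final smoothing stage.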


[Two time series plots over an index of 0 to 70; vertical axes "GARCH Estimate" and "NP-GARCH Estimate", values 3 to 6.]

Figure 4: Simulation experiment. For an arbitrary section of 70 observations from the simulated time series, the left-hand plot shows true volatility (solid line) compared with a parametric GARCH(1,1) estimate (dotted line) and the right-hand plot shows true volatility compared with the nonparametric GARCH estimate obtained after a final smooth (dashed line).

[Time series plot; horizontal axis Time with date labels 02.06.86 to 02.03.90; vertical axis −15 to 10.]

Figure 5: Real data. A 1000-day excerpt from the time series of daily returns on the BMW share price.


[Nine perspective plots, panel titles "Iteration 1" through "Iteration 8" and "Iteration final smooth"; axes x, sigma.sq, Est. f.]

Figure 6: Real data. The estimated surfaces f_m calculated at the first 8 iterations and the surface f estimated at the final smoothing stage.


[Two time series plots; horizontal axis Time with date labels 02.06.86 to 02.03.90; vertical axes "GARCH Estimate" and "NP-GARCH Estimate", values 2 to 6.]

Figure 7: Real data. A comparison of volatilities estimated by ordinary parametric GARCH(1,1) modelling and nonparametric estimation (after the final smooth).
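The parametric benchmark in Figures 4 and 7 is an ordinary GARCH(1,1) model, in which the fitted volatilities follow the recursion σ_t² = α₀ + α₁X²_{t−1} + β₁σ²_{t−1}. A minimal filter illustrating that recursion; the parameter values and the initialization are illustrative, not estimates from the paper:

```python
import numpy as np

def garch11_filter(x, alpha0=0.1, alpha1=0.1, beta1=0.8):
    """GARCH(1,1) volatility recursion
    sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2 + beta1 * sigma_{t-1}^2.
    Parameter values here are illustrative, not fitted to the BMW data."""
    s2 = np.empty(len(x))
    s2[0] = x.var()                        # a common initialization choice
    for t in range(1, len(x)):
        s2[t] = alpha0 + alpha1 * x[t - 1] ** 2 + beta1 * s2[t - 1]
    return np.sqrt(s2)                     # return on the volatility scale

vol = garch11_filter(np.random.default_rng(0).standard_normal(250))
```

In practice α₀, α₁, β₁ would be obtained by (quasi-)maximum likelihood before filtering; the recursion itself is what produces the smooth volatility paths plotted on the left of Figures 4 and 7.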
