CONSISTENCY OF AKAIKE'S INFORMATION CRITERION
FOR INFINITE VARIANCE AUTOREGRESSIVE PROCESSES

by

Keith Knight

TECHNICAL REPORT No. 97
January 1987

Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Consistency of Akaike's Information Criterion for Infinite Variance Autoregressive Processes

Keith Knight
University of British Columbia and University of Washington
ABSTRACT
Suppose {X_n} is a p-th order autoregressive process with innovations in the domain of attraction of a stable law and the true order p unknown. The estimate of p, p̂, is chosen to minimize Akaike's Information Criterion over the integers 0, 1, ..., K. It is shown that p̂ is weakly consistent and the consistency is retained if K → ∞ as N → ∞ at a certain rate depending on the index of the stable law.
March 14, 1987
This research was supported by ONR Contract N00014-84-C-0169 and by NSF Grant MCS 83-01807 and was part of the author's Ph.D. dissertation completed at the University of Washington. The author would like to thank his supervisor R. Martin for his support and encouragement in preparing this paper.
O. Introduction
Consider a stationary p-th order autoregressive (AR(p)) process {X_n}:

X_n = β_1 X_{n−1} + ... + β_p X_{n−p} + e_n

where {e_n} are independent, identically distributed (i.i.d.) random variables. The parameters β_1, ..., β_p satisfy the usual stationarity constraints, namely all zeroes of the polynomial

z^p − β_1 z^{p−1} − ... − β_p

have modulus less than 1.
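For concreteness, the stationarity condition can be checked numerically by computing the zeroes of the polynomial above. This sketch is ours, not the paper's; numpy and the helper name are assumptions.

```python
import numpy as np

def is_stationary(beta):
    """Check the AR(p) stationarity condition: all zeroes of
    z^p - beta_1 z^(p-1) - ... - beta_p must have modulus < 1."""
    beta = np.asarray(beta, dtype=float)
    # Coefficients of z^p - beta_1 z^(p-1) - ... - beta_p, highest degree first.
    poly = np.concatenate(([1.0], -beta))
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) < 1.0))
```

For example, β = (0.5, 0.3) satisfies the condition while β = (1.1) does not.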
Now assume that the true order p is unknown but bounded by some finite constant K(N). Our main purpose here will be to estimate p by p̂, where p̂ will be obtained by minimizing a particular version of Akaike's Information Criterion (AIC) (Akaike, 1973) over the integers {0, 1, ..., K(N)}. Because we should be willing to examine a greater range of possible orders for our estimate as the number of observations increases, it makes sense to allow K(N) to increase with N. In the finite variance case with K(N) ≡ K, AIC does not provide a consistent estimate of p; in that case p̂ has a nondegenerate limiting distribution concentrated on the integers p, p + 1, ..., K (Shibata, 1976).
AIC for a model with a k-dimensional parameter vector is defined as follows:

AIC(k) = −2 ln(maximized likelihood) + 2k.

For autoregressive models, AIC is usually defined in terms of a Gaussian likelihood; so for a k-th order AR model it takes the form

Φ(k) = N ln σ̂²(k) + 2k

where σ̂²(k) is the estimate of the innovations variance obtained from the Yule-Walker (YW) estimating equations. We will choose as our estimate of p the order that minimizes Φ(k) for k between 0 and K, that is,

p̂ = argmin_{0 ≤ k ≤ K} Φ(k).
In the case where two or more orders achieve the minimum, we will take the smallest of
those to be our estimate.
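As an illustration of this selection rule (ours, not part of the original paper), σ̂²(k) can be computed from the sample autocovariances by the Levinson-Durbin recursion. The sketch below assumes numpy and a known zero location; the function name is an assumption.

```python
import numpy as np

def aic_order(x, K):
    """Select the AR order by minimizing Phi(k) = N ln sigma2(k) + 2k
    over k = 0, ..., K, where sigma2(k) is the Yule-Walker innovations
    variance estimate (computed by the Levinson-Durbin recursion).
    Ties are broken in favour of the smallest order."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Sample autocovariances c_0, ..., c_K (divisor N, zero location assumed).
    c = np.array([np.dot(x[: N - j], x[j:]) / N for j in range(K + 1)])
    sigma2 = c[0]
    phi = np.zeros(0)                     # current Yule-Walker coefficients
    best_k, best_crit = 0, N * np.log(sigma2)
    for k in range(1, K + 1):
        # rho is the k-th sample partial autocorrelation.
        rho = (c[k] - phi @ c[1:k][::-1]) / sigma2
        phi = np.concatenate((phi - rho * phi[::-1], [rho]))
        sigma2 *= 1.0 - rho ** 2
        crit = N * np.log(sigma2) + 2 * k
        if crit < best_crit:
            best_k, best_crit = k, crit
    return best_k
```

On a long realization of an AR(1) process with β close to 1, the minimizer is 1 with high probability.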
For certain reasons, we may also want the autoregressive parameters to vary (with N)
over some region of the parameter space. For example, consider the following hypothesis
testing problem:
H₀ : X_n is white noise (i.e., p = 0)

versus

Hₐ : X_n is a nondegenerate autoregressive process.

We can then consider a sequence of parameters {β^(N)} converging to zero as N → ∞ and examine the power of the resulting statistical test. For this to make sense, the parameter set over which the stationarity condition holds must be specified.
Rather than work with the β-parametrization, for which the stationarity region is awkward to describe, one can parametrize an AR(p) process by its partial autocorrelations (ρ_1, ..., ρ_p), each lying in the interval (−1, 1), so that the parameter set is

{ (ρ_1, ..., ρ_p) : ρ_j ∈ (−1, 1), j = 1, ..., p }.

Moreover, for an AR(p) process, ρ_j = 0 for all j > p. For order selection, the ρ-parametrization is somewhat more natural than the β-parametrization. That is, the "distance" between two autoregressive models with different orders is more easily seen in the ρ-parametrization.
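To make the ρ-parametrization concrete, here is a sketch (ours, not the paper's) of the standard Levinson-type map from partial autocorrelations in (−1, 1)^p to AR coefficients; numpy is assumed.

```python
import numpy as np

def pacf_to_ar(rho):
    """Map partial autocorrelations (rho_1, ..., rho_p), each in (-1, 1),
    to AR coefficients (beta_1, ..., beta_p) via the Levinson recursion:
    at step k, beta_j <- beta_j - rho_k * beta_{k-j}, then append rho_k."""
    beta = np.zeros(0)
    for r in rho:
        beta = np.concatenate((beta - r * beta[::-1], [r]))
    return beta
```

Every point of the product set (−1, 1)^p maps to a stationary model, which is what makes this parametrization convenient.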
1. Infinite variance autoregressions
We will be interested in the case where the innovations {e_n} are in the domain of attraction of a stable law with index α ∈ (0, 2). If E(|e_n|) < ∞ then we will assume that E(e_n) = 0.

Recall that given observations X_1, ..., X_N and known order p, it is possible to consistently estimate the AR parameters β_1, ..., β_p. In fact, for the LS estimates β̂_1(l), ..., β̂_l(l), where l ≥ p,

N^{1/δ} (β̂_k(l) − β_k) → 0 almost surely for δ > α,

where β_k = 0 for k > p. For YW estimates, a slightly weaker result holds: convergence to 0 is in probability rather than almost sure.
We may also wish to consider AR models of the form

X_n − μ = Σ_{k=1}^p β_k (X_{n−k} − μ) + e_n

where μ is unknown and we retain the same assumptions on the β_k's and {e_n}. It can be shown (Knight, 1987) that if we center the observed series by subtracting the sample mean X̄ (i.e., Y_n = X_n − X̄, n = 1, ..., N), we will still have N^{1/δ}(β̂_k − β_k) → 0 in probability for δ > max(1, α); the convergence is almost sure for LS estimates. More generally, we may center by subtracting some other location estimate μ̂. Depending on the choice of μ̂, we may need conditions on its rate of convergence.
As stated earlier, we will want to vary the autoregressive parameters with N. For this
reason, we will consider a triangular array of random variables
X_1^(1)
X_1^(2)  X_2^(2)
  ⋮
X_1^(N)  X_2^(N)  ⋯  X_N^(N)

where each row is a finite realization of an AR(p) process:

X_n^(N) = Σ_{j=1}^p β_j^(N) X_{n−j}^(N) + e_n^(N).

The corresponding triangular array of innovations, {e_n^(N)}_{n ≤ N}, consists of row-wise independent random variables sampled from a common distribution which is in the domain of attraction of a stable law. Given a single i.i.d. sequence {e_n}, we could construct each element of the triangular array as follows:

X_n^(N) = Σ_{j=0}^∞ c_j(β^(N)) e_{n−j}.
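The coefficients c_j(β) of the linear process representation satisfy the recursion c_0 = 1, c_j = Σ_{k=1}^{min(j,p)} β_k c_{j−k}. A small sketch (our notation, numpy assumed):

```python
import numpy as np

def ma_coefficients(beta, m):
    """First m coefficients c_0, ..., c_{m-1} of the MA(infinity)
    representation X_n = sum_j c_j(beta) e_{n-j} of a stationary AR(p),
    via the recursion c_j = sum_k beta_k c_{j-k}, with c_0 = 1."""
    beta = np.asarray(beta, dtype=float)
    p = len(beta)
    c = np.zeros(m)
    c[0] = 1.0
    for j in range(1, m):
        kmax = min(j, p)
        # c_j = beta_1 c_{j-1} + ... + beta_kmax c_{j-kmax}
        c[j] = np.dot(beta[:kmax], c[j - kmax : j][::-1])
    return c
```

For an AR(1) with coefficient β, this reproduces the geometric sequence c_j = β^j.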
We will require that β^(N) = (β_1^(N), ..., β_p^(N)) be such that the corresponding partial autocorrelations (ρ_1^(N), ..., ρ_p^(N)) lie in a compact subset of the parameter set. Since p̂ is consistent, we can attempt to shrink β_p^(N) to zero as N tends to infinity and still consistently estimate p at the same time. (In the hypothesis testing formulation, this corresponds to a test that is consistent against a sequence of local alternatives.) Intuitively, it would seem that the smaller |β_p^(N)| is, the more difficult it should be to distinguish between a p-th order and a lower order AR model. From
simulations, this does seem to be the case. This is the real motivation for allowing the parameters to vary with N. Consider the following example. Suppose we observe a p-th order AR process which has ρ_p very close to zero (say ρ_p = 0.1). To estimate the order of the process, we use a procedure which we know to be consistent. So for N large enough, we will select the true order with arbitrarily high probability. However, for moderate sized N, the probability of underestimating p may be very high. Conversely, if |ρ_p| is close to 1, then even for small N there will be a high probability of selecting the true order. So by allowing ρ_p (= β_p^(N)) to shrink to zero with N, we may get some idea of the relative sample sizes needed to get the same probability of correct order selection for two different sets of AR parameters. If we view order selection as a hypothesis testing problem (say testing a null hypothesis of white noise versus autoregressive alternatives), shrinking β_p^(N) to zero is similar in spirit to the sequence of contiguous alternative hypotheses to a null hypothesis considered in Pitman efficiency calculations.
We should note that the partial autocorrelations do not have their usual finite variance interpretation; however, they can be unambiguously defined in terms of the ordinary autocorrelations, which themselves can be unambiguously defined in terms of the linear process coefficients (see Davis and Resnick, 1985). Moreover, they can be estimated by the recursive YW estimates exactly as in the finite variance case. If we include an unknown location, μ, in the model, we do not assume that it is of any intrinsic interest; in a sense, it is merely a nuisance parameter in this situation.
We will provide an answer to the following question: under what conditions (if any) on K(N) and (β_1^(N), ..., β_p^(N)) does AIC provide a consistent estimate p̂ of p? Bhansali (1983) conjectures that AIC may provide a consistent estimate of the order of an autoregressive process based on the rapid convergence of the parameter estimates. However, he seems to conclude, from Monte Carlo results, that this may not be the case. If K(N) is allowed to grow too fast then we may wind up severely overfitting much of the time; for example, p̂ could equal K(N) with high probability.
2. Theoretical Results
The main result of this paper is contained in Theorem 7; the first six results provide the
necessary machinery for Theorem 7. We begin by stating two results dealing with r-th
moments of martingales and submartingales.
Theorem 1. (Esseen and von Bahr, 1965) Let S_n = Σ_{k=1}^n X_k. If E(X_n | S_{n−1}) = 0 for 2 ≤ n ≤ N and X_n ∈ L^r for 1 ≤ r ≤ 2, then

E(|S_N|^r) ≤ 2 Σ_{n=1}^N E(|X_n|^r).

(Note that {S_n, σ(S_n); n ≥ 1} is a martingale.)

Theorem 2. (cf. Chung, 1974, p. 346) If {X_n, σ(X_n); n ≥ 1} is a nonnegative L^r-submartingale for some r > 1, then

E( max_{1 ≤ n ≤ N} X_n^r ) ≤ ( r/(r − 1) )^r E(X_N^r).
The following lemma will allow us to ignore the dependence on N of the moments of {X_n^(N)}, by virtue of being able to bound the moments uniformly for parameters within a compact set.

Lemma 3. Let {X_n(β)} be a stationary AR(p) process with parameter β and innovations {e_n} in the domain of attraction of a stable law with index α. Then E(|X_n(β)|^δ) is bounded uniformly over compact subsets of the parameter space for 0 < δ < α.
Proof. Write X_n(β) = Σ_{j=0}^∞ c_j(β) e_{n−j}, where c_j(β) is a continuous function of β for all j. Now

|X_n(β)| ≤ Σ_{j=0}^∞ |c_j(β)| |e_{n−j}| ≤ Σ_{j=0}^∞ a_j |e_{n−j}|

where a_j = sup |c_j(β)|, the supremum being taken over the compact set. It can be shown that |a_j| ≤ C j^p |x|^j for some constant C and some |x| < 1, and so Σ_{j=0}^∞ |a_j|^γ < ∞ for all γ > 0. Under this summability condition, it follows from Cline (1983) that the random variable

X = Σ_{j=0}^∞ a_j |e_j|

is finite almost surely with

lim_{x→∞} P(X > x) / P(|e_1| > x) = Σ_{j=0}^∞ a_j^α < ∞.

This implies that E(X^δ) is finite for all 0 < δ < α and the result follows. □
The following lemma will allow us to treat the moments of Σ X_n the same as the moments of Σ e_n when α > 1.

Lemma 4. Let {X_n} be a zero mean stationary AR(p) process whose innovations {e_n} are in the domain of attraction of a stable law with index α > 1. Then for 1 < r < α,

(a) E[ |Σ_{n=1}^N X_n|^r ] = O(N);
(b) E[ max_{1 ≤ k ≤ N} |Σ_{n=1}^k X_n|^r ] = O(N).

Proof. We can write

Σ_{n=1}^N X_n = C Σ_{n=1}^N e_n + R_N

where |R_N| ≤ [max_{1 ≤ k ≤ p} |β_k|] p(p + 1) [max_{1−p ≤ n ≤ N} |X_n|] and C = [1 − Σ_{k=1}^p β_k]^{−1}. Thus by Minkowski's Inequality,

E^{1/r}[ |Σ_{n=1}^N X_n|^r ] ≤ |C| E^{1/r}[ |Σ_{n=1}^N e_n|^r ] + E^{1/r}[ |R_N|^r ]

and part (a) follows from Theorem 1 by noting that E[|R_N|^r] = O(N). (It can actually be shown to be o(N) by a uniform integrability argument.) Part (b) follows similarly from Theorem 2, by noting that

max_{1 ≤ k ≤ N} |Σ_{n=1}^k X_n| ≤ |C| max_{1 ≤ k ≤ N} |Σ_{n=1}^k e_n| + max_{1 ≤ k ≤ N} |R_k|

and using Minkowski's Inequality. □
The following theorem deals with uniform convergence of both the LS and YW autoregressive parameter estimates in the case where location is known.

Theorem 5. Assume known location μ. Let K(N) = O(N^κ) for some 0 < κ < 1/2, and let ||v|| denote the Euclidean norm of the vector v. Then for δ > α,

(a) N^{1/δ} max_{p ≤ l ≤ K(N)} ||β̂(l) − β^(N)|| → 0 in probability, where β_k^(N) ≡ 0 for k > p;

(b) N^{1/δ} max_{1 ≤ l ≤ K(N)} ||β̂(l) − β̃(l)|| → 0 in probability,

where β̂(l) and β̃(l) denote the LS and YW estimates respectively. Note that the vectors here are not of fixed length but may vary with N.
Proof. (a) The style of proof will mimic Hannan and Kanter (1977). For convenience we suppress the notation indicating the dependence of {X_n}, {e_n} and β on N. For l ≥ p the LS estimating equations can be re-expressed as follows:

C_l (β̂(l) − β) = Δ*_l

where Δ*_l(j) = Σ_{n=l+1}^N e_n X_{n−j}. Fix 0 < γ < α/2 and set K = K(N) = O(N^κ). For each l, C_l is non-negative definite, and so it suffices to show that

(i) N^{−2/δ} max_{1 ≤ j ≤ K} |Δ*_l(j)| → 0 in probability, and

(ii) the minimum eigenvalue of N^{−2/δ} C_l → ∞ in probability, uniformly over 1 ≤ l ≤ K,

since then max_{l ≤ K} ||β̂(l) − β|| → 0 in probability at the required rate.
To prove (i), it suffices to show that

E[ | N^{−2/δ} Σ_{n=l+1}^N e_n X_{n−j} |^{2γ} ] → 0

for some γ < α/2, uniformly over j between 1 and K(N). Now

E[ | Σ_{n=l+1}^N e_n X_{n−j} |^{2γ} ] ≤ 2^{2γ} E[ | Σ_{n=1}^N e_n X_{n−j} |^{2γ} ] + 2^{2γ} E[ | Σ_{n=1}^l e_n X_{n−j} |^{2γ} ].

If 2γ < 1 then, since e_n is independent of X_{n−j} for j ≥ 1,

E[ | Σ_{n=1}^N e_n X_{n−j} |^{2γ} ] ≤ Σ_{n=1}^N E(|e_n|^{2γ}) E(|X_{n−j}|^{2γ}) = O(N)

uniformly over j between 1 and K(N).
If 2γ ≥ 1 then α > 1, and so S_{k,j} = Σ_{n=1}^k e_n X_{n−j} is a martingale for each j; |S_{k,j}| is an L^{2γ}-submartingale and so by Theorems 1 and 2,

E[ max_{1 ≤ k ≤ N} |S_{k,j}|^{2γ} ] = O(N)

uniformly over j.
Similarly it can be shown that, for all permissible values of γ, W_{N,j} = O(N) uniformly over j between 1 and K(N). Thus for a given sequence K(N) = O(N^κ), by taking γ sufficiently close to α/2 and δ sufficiently close to α, we will have the bound in (i) tending to zero, as desired.
To prove (ii), we define X_{n,v} and e_{n,v} as follows:

X_{n,v} = Σ_{k=1}^K v_k X_{n−k},  e_{n,v} = Σ_{k=1}^K v_k e_{n−k}

where Σ_{k=1}^K v_k² = 1. It suffices to establish (ii) with the quadratic form evaluated at an arbitrary such unit vector v. Now note that X_{n,v} = Σ_{j=1}^p β_j X_{n−j,v} + e_{n,v}. By the triangle inequality,

( Σ_{n=K+1}^N X_{n,v}² )^{1/2} ≥ ( Σ_{n=K+1}^N e_{n,v}² )^{1/2} − ( Σ_{n=K+1}^N [ Σ_{k=1}^p β_k X_{n−k,v} ]² )^{1/2}

and, by the Cauchy-Schwarz inequality,

Σ_{n=K+1}^N [ Σ_{k=1}^p β_k X_{n−k,v} ]² ≤ Σ_{j=1}^p β_j² Σ_{k=1}^p Σ_{n=K+1}^N X_{n−k,v}².

It remains only to show that N^{−2/δ} Σ_{n=K+1}^N e_{n,v}² → ∞ in probability. If this is true then N^{−2/δ} Σ X_{n,v}² → ∞ in probability, since the probability that this quantity stays bounded clearly must tend to zero. Now

Σ_{n=K+1}^N e_{n,v}² = Σ_{k=1}^K v_k² Σ_{n=K+1}^N e_{n−k}² + 2 Σ_{k=2}^K Σ_{j=1}^{k−1} v_k v_j Σ_{n=K+1}^N e_{n−j} e_{n−k}.
Thus, since

N^{−2/δ} Σ_{n=K+1}^N e_n² → ∞ in probability,

we need only show that the cross-product terms are negligible. Now

| Σ_{k=2}^K Σ_{j=1}^{k−1} v_k v_j Σ_{n=K+1}^N e_{n−j} e_{n−k} | ≤ Σ_{k=2}^K |v_k| Σ_{j=1}^{k−1} |v_j| | Σ_{n=K+1}^N e_{n−j} e_{n−k} |.

Now take γ < α and consider the sums with j ≠ k. If γ < 1 then

E[ | Σ_{n=K+1}^N e_{n−j} e_{n−k} |^γ ] ≤ Σ_{n=K+1}^N E[ |e_{n−j} e_{n−k}|^γ ] = O(N).

If γ ≥ 1 then necessarily α > 1. Thus S_l = Σ_{n=K+1}^l e_{n−j} e_{n−k} is an L^γ-martingale and hence

E[ | Σ_{n=K+1}^N e_{n−j} e_{n−k} |^γ ] = O(N)

uniformly over j ≠ k, by Theorem 1. The cross-product contribution is then negligible after normalization, since |v_k| ≤ 1 for all k.
(b) From the definitions of C_l, C̃_l, Δ_l and Δ̃_l, it is easy to see that the differences between C_l and C̃_l and between Δ_l and Δ̃_l involve only a fixed number of boundary terms, each of which is uniformly negligible after normalization.
Now from the definitions of β̂(l) and β̃(l), we get

N^{1/δ} max_{1 ≤ l ≤ K} ||β̂(l) − β̃(l)|| → 0 in probability

uniformly in l by equation (3b). Finally we must show that the minimum eigenvalue of N^{−2/δ} C_l tends in probability to infinity uniformly in l, since ||(N^{−2/δ} C_l)^{−1}|| is (in the case of symmetric positive definite matrices) merely the reciprocal of this minimum eigenvalue. Note that for unit vectors v,

v′ (N^{−2/δ} C_l) v → ∞ in probability

uniformly over l and unit vectors v by condition (ii) of the proof of part (a) of this theorem and equation (3a) above. Therefore the result follows as required. □
In the case where we have an unknown location parameter and we estimate it with some location estimate μ̂, we can obtain the following corollary.

Corollary 6.
1. If μ̂ − μ = O_p(N^γ) for γ ≤ min[ (1/δ − 1/2 + 1/α), 0 ] uniformly over all compact subsets of the parameter space, then Theorem 5 still holds.
2. If α > 1, the sample mean X̄ satisfies this condition, and Theorem 5 holds with K(N) = O(N^κ).
Proof. 1. (a) Assume without loss of generality that μ = 0. We can again re-express the LS estimating equations as follows:

C_l (β̂(l) − β) = Δ**_l

where now

Δ*_l(j) = Σ_{n=l+1}^N e_n X_{n−j}

and

Δ**_l(j) = Δ*_l(j) + (N − l) [ 1 − Σ_{k=1}^p β_k ] μ̂².

By methods similar to those used in the proof of Theorem 5, it is easy to show that the conclusion of part (a) continues to hold for some κ.
(b) Defining T_N and S_N analogously to the proof of Theorem 5, we again obtain the conclusion of part (b) for some κ. □

Theorem 7. If the conclusions (a) and (b) of Theorem 5 hold (for some sequence K(N)), then p̂ → p in probability.
Proof. First we note that since p̂ is integer-valued, p̂ → p in probability is equivalent to P[p̂ = p] → 1 (as N → ∞). From here on, we will refer to K(N) as K and to β_k^(N) as β_k, thus suppressing the dependence on N. Moreover, we will assume that the observations X_n are already centered; that is, we have subtracted out the location estimate μ̂ (if we are assuming unknown location).
We now use the fact that

σ̂²(k) = σ̂²(0) Π_{l=1}^k ( 1 − β̂_l²(l) )  for k ≥ 1,

where σ̂²(0) = (1/N) Σ_{n=1}^N X_n² and β̂_k(l) is the YW estimate of β_k for an l-th order fit. Since, for k < p, we can write

Φ(k) − Φ(p) = −N Σ_{l=k+1}^p ln( 1 − β̂_l²(l) ) − 2(p − k),

and −ln(1 − x) ≥ x for 0 ≤ x < 1, it follows that

P[ p̂ < p ] ≤ P[ Φ(k) ≤ Φ(p) for some k < p ] ≤ P[ N β̂_p²(p) ≤ 2p ]

so

lim sup_{N→∞} P[ N β̂_p²(p) ≤ 2p ] = 0

since lim inf_{N→∞} N |β_p^(N)|² > 2p.
We also have that

P[ p̂ > p ] ≤ P[ Φ(k) < Φ(k − 1) for some p < k ≤ K ]

≤ P[ N min_{p < k ≤ K} ln( 1 − β̂_k²(k) ) < −2 ].
3. Simulation results
For illustrative purposes, a small simulation study was carried out using four symmetric stable innovations distributions with α = 0.5, 1.2, 1.9 and 2.0 (the latter being the normal distribution). The underlying processes were AR(1) processes with AR parameter β = 0.1, 0.5 and 0.9. The sample sizes considered were 100 and 900. For N = 100, the maximum order K was taken to be 10, while for N = 900, K was taken to be 15. 100 replications were made for each of the 24 possible arrangements of α, β and N. The results of the study are given in Tables 1 through 8.
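As a guide to replicating such a study (not part of the original report), symmetric stable innovations can be generated by the Chambers-Mallows-Stuck method. The sketch below assumes numpy; the scaling is that of the raw formula (a normal law with variance 2 at α = 2, standard Cauchy at α = 1).

```python
import numpy as np

def sym_stable(alpha, size, rng=None):
    """Simulate symmetric alpha-stable variates (Chambers-Mallows-Stuck):
    X = sin(alpha U)/cos(U)^(1/alpha) * (cos((1-alpha)U)/W)^((1-alpha)/alpha)
    with U uniform on (-pi/2, pi/2) and W standard exponential."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def ar1(beta, e):
    """Build an AR(1) path X_n = beta * X_{n-1} + e_n from innovations e."""
    x = np.empty(len(e))
    prev = 0.0
    for n, en in enumerate(e):
        prev = beta * prev + en
        x[n] = prev
    return x
```

Feeding `ar1(0.9, sym_stable(1.2, 900))` to an AIC routine, say, reproduces one replication of one cell of the design.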
Estimated        AR parameter
order         0.1    0.5    0.9
0              89      0      1
1               4     91     87
2               4      3      2
3               0      1      1
4               1      0      0
5               0      3      2
6               1      1      2
7               0      0      3
8               0      0      1
9               1      1      0
10              0      0      1

Table 1: Frequency of selected order for AR(1) process. N = 100, α = 0.5.
Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              93     95     91
2               0      0      0
3               0      0      1
4               0      0      0
5               0      0      0
6               0      0      0
7               4      0      0
8               1      5      2
9               0      0      1
10-15           2      0      5

Table 2: Frequency of selected order for AR(1) process. N = 900, α = 0.5.
The results are much as expected. We can see that for N = 100 and β = 0.1, AIC underestimates the true order with high probability. For N = 900, the probabilities of selecting the true order increase over those for N = 100.
Estimated        AR parameter
order         0.1    0.5    0.9
0              70      0      0
1              15     86     86
2               7      7      4
3               3      3      3
4               1      1      3
5               0      1      1
6               0      0      0
7               2      0      2
8               2      1      1
9               0      1      0
10              0      0      0

Table 3: Frequency of selected order for AR(1) process. N = 100, α = 1.2.
Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              80     87     90
2               6      3      3
3               6      0      1
4               1      4      3
5               1      2      0
6               0      0      0
7               2      2      1
8               1      0      0
9               0      0      0
10-15           3      2      2

Table 4: Frequency of selected order for AR(1) process. N = 900, α = 1.2.
Estimated        AR parameter
order         0.1    0.5    0.9
0              57      0      0
1              25     76     71
2               5      8     10
3               2      5      9
4               3      6      6
5               2      1      2
6               3      1      0
7               1      1      1
8               1      0      0
9               0      0      1
10              1      2      0

Table 5: Frequency of selected order for AR(1) process. N = 100, α = 1.9.
Estimated        AR parameter
order         0.1    0.5    0.9
0               5      0      0
1              78     74     75
2               7     14      9
3               2      2      7
4               2      5      2
5               1      1      2
6               1      1      0
7               3      1      3
8               0      0      0
9               0      0      1
10-15           1      2      1

Table 6: Frequency of selected order for AR(1) process. N = 900, α = 1.9.
Estimated        AR parameter
order         0.1    0.5    0.9
0              63      0      0
1              25     75     75
2               4      3     12
3               1      6      2
4               0      7      5
5               2      2      2
6               2      4      3
7               1      0      0
8               1      1      1
9               0      2      0
10              1      0      0

Table 7: Frequency of selected order for AR(1) process. N = 100, normal distribution.
Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              83     79     80
2               3      3     11
3               4      4      3
4               4      6      0
5               2      3      3
6               0      0      0
7               0      4      0
8               4      1      1
9               0      0      2
10-15           0      0      0

Table 8: Frequency of selected order for AR(1) process. N = 900, normal distribution.
4. Comments
Bhansali and Downham (1977) propose a generalization of AIC: minimize Φ′(k) = N ln σ̂²(k) + γk, where γ ∈ (0, 4). It is easy to see from the proof of the above result that their criterion will also lead to consistent estimates of p under similar conditions on K(N) and β_p^(N). In fact, if γ = γ(N) > 0 satisfies γ(N)/N → 0, then the criterion corresponding to Φ″(k) = N ln σ̂²(k) + γ(N)k will consistently estimate p. Specifically, with known location, the estimate will be consistent provided

lim inf_{N→∞} (N / γ(N)) |β_p^(N)|² > p

with γ(N) bounded away from zero and with the same conditions on K(N). With an appropriate choice of γ(N), this criterion will also be consistent in the finite variance case. However, if γ(N) grows too quickly with N then the criterion may seriously underestimate the true order p in small samples, in both the finite and infinite variance cases. In an application such as autoregressive spectral density estimation (assuming now finite variance), underestimation is more serious than overestimation since, if the order is underestimated, the resulting spectral density estimate may lack important features which do in fact exist.
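The criteria Φ(k), Φ′(k) and Φ″(k) differ only through the penalty multiplier, so a single routine covers AIC (γ = 2) and, for example, a BIC-like consistent choice γ(N) = ln N. A sketch under the same Yule-Walker/Levinson-Durbin conventions as before (ours, numpy assumed):

```python
import numpy as np

def criterion_path(x, K, gamma):
    """Values of N ln sigma2(k) + gamma * k for k = 0, ..., K, with
    sigma2(k) the Yule-Walker innovations variance from Levinson-Durbin.
    gamma = 2 recovers AIC; gamma = ln N gives a BIC-like consistent rule."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Sample autocovariances c_0, ..., c_K (divisor N, zero location assumed).
    c = np.array([np.dot(x[: N - j], x[j:]) / N for j in range(K + 1)])
    out = np.empty(K + 1)
    sigma2, phi = c[0], np.zeros(0)
    out[0] = N * np.log(sigma2)
    for k in range(1, K + 1):
        rho = (c[k] - phi @ c[1:k][::-1]) / sigma2
        phi = np.concatenate((phi - rho * phi[::-1], [rho]))
        sigma2 *= 1.0 - rho ** 2
        out[k] = N * np.log(sigma2) + gamma * k
    return out
```

The selected order is the smallest minimizer, e.g. `np.argmin(criterion_path(x, K, np.log(len(x))))`.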
References

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle, in: Second International Symposium on Information Theory (eds. B. Petrov and F. Csaki), Akademiai Kiado, Budapest, 267-81.

Barndorff-Nielsen, O. and Schou, G. (1973) On the parametrization of autoregressive models by partial autocorrelations. J. Multivar. Anal. 3 408-19.

Bhansali, R. (1983) Order determination for processes with infinite variance, in: Robust and Nonlinear Time Series Analysis (eds. J. Franke, W. Hardle, R.D. Martin), Springer-Verlag, New York, 17-25.

Bhansali, R. and Downham, D. (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion. Biometrika 64 547-51.

Chung, K.L. (1974) A Course in Probability Theory. Academic Press, New York.

Cline, D. (1983) Infinite series of random variables with regularly varying tails. Technical Report 83-24, Institute of Applied Mathematics and Statistics, University of British Columbia.

Davis, R. and Resnick, S. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. Ann. Prob. 13 179-95.

Esseen, C. and von Bahr, B. (1965) Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Ann. Math. Statist. 36 299-303.

Hannan, E. and Kanter, M. (1977) Autoregressive processes with infinite variance. J. Appl. Prob. 14 411-15.

Knight, K. (1987) Rate of convergence of centred estimates of autoregressive parameters for infinite variance autoregressions. J. Time Ser. Anal. 8 51-60.

Shibata, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63 117-26.

Current address: Department of Statistics, University of British Columbia, 2021 West Mall, Vancouver, B.C., Canada V6T 1W5.