CONSISTENCY OF AKAIKE'S INFORMATION CRITERION
FOR INFINITE VARIANCE AUTOREGRESSIVE PROCESSES

by

Keith Knight

TECHNICAL REPORT No. 97
January 1987

Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Consistency of Akaike's Information Criterion for Infinite Variance Autoregressive Processes

Keith Knight
University of British Columbia and University of Washington
ABSTRACT
Suppose {X_n} is a p-th order autoregressive process with innovations in the domain of attraction of a stable law and the true order p unknown. The estimate of p, p̂, is chosen to minimize Akaike's Information Criterion over the integers 0, 1, ..., K. It is shown that p̂ is weakly consistent and the consistency is retained if K → ∞ as N → ∞ at a certain rate depending on the index of the stable law.
March 14, 1987
This research was supported by ONR Contract N00014-84-C-0169 and by NSF Grant MCS 83-01807 and was part of the author's Ph.D. dissertation completed at the University of Washington. The author would like to thank his supervisor R. Martin for his support and encouragement in preparing this paper.
O. Introduction
Consider a stationary p-th order autoregressive (AR(p)) process {X_n}:

X_n = β_1 X_{n−1} + ... + β_p X_{n−p} + e_n

where {e_n} are independent, identically distributed (i.i.d.) random variables. The parameters β_1, ..., β_p satisfy the usual stationarity constraints, namely all zeroes of the polynomial

z^p − β_1 z^{p−1} − ... − β_p

have modulus less than 1.
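For concreteness, the stationarity condition can be checked numerically by computing the zeroes of the polynomial above. This sketch is ours, not the paper's; numpy and the helper name are assumptions.

```python
import numpy as np

def is_stationary(beta):
    """Check the AR(p) stationarity condition: all zeroes of
    z^p - beta_1 z^(p-1) - ... - beta_p must have modulus < 1."""
    beta = np.asarray(beta, dtype=float)
    # Coefficients of z^p - beta_1 z^(p-1) - ... - beta_p, highest degree first.
    poly = np.concatenate(([1.0], -beta))
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) < 1.0))
```

For example, β = (0.5, 0.3) satisfies the condition while β = (1.1) does not.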
Now assume that the true order p is unknown but bounded by some finite constant K(N). Our main purpose here will be to estimate p by p̂, where p̂ will be obtained by minimizing a particular version of Akaike's Information Criterion (AIC) (Akaike, 1973) over the integers {0, 1, ..., K(N)}. Because we should be willing to examine a greater range of possible orders for our estimate as the number of observations increases, it makes sense to allow K(N) to increase with N. In the finite variance case with K(N) ≡ K, AIC does not provide a consistent estimate of p; in that case p̂ has a nondegenerate limiting distribution concentrated on the integers p, p + 1, ..., K (Shibata, 1976).
AIC for a model with a k-dimensional parameter vector is defined as follows:

AIC(k) = −2 ln(maximized likelihood) + 2k.

For autoregressive models, AIC is usually defined in terms of a Gaussian likelihood; so for a k-th order AR model it takes the form

Φ(k) = N ln σ̂²(k) + 2k

where σ̂²(k) is the estimate of the innovations variance obtained from the Yule-Walker (YW) estimating equations. We will choose as our estimate of p the order that minimizes Φ(k) for k between 0 and K, that is,

p̂ = argmin_{0 ≤ k ≤ K} Φ(k).
In the case where two or more orders achieve the minimum, we will take the smallest of
those to be our estimate.
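As an illustration of this selection rule (ours, not part of the original paper), σ̂²(k) can be computed from the sample autocovariances by the Levinson-Durbin recursion. The sketch below assumes numpy and a known zero location; the function name is an assumption.

```python
import numpy as np

def aic_order(x, K):
    """Select the AR order by minimizing Phi(k) = N ln sigma2(k) + 2k
    over k = 0, ..., K, where sigma2(k) is the Yule-Walker innovations
    variance estimate (computed by the Levinson-Durbin recursion).
    Ties are broken in favour of the smallest order."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Sample autocovariances c_0, ..., c_K (divisor N, zero location assumed).
    c = np.array([np.dot(x[: N - j], x[j:]) / N for j in range(K + 1)])
    sigma2 = c[0]
    phi = np.zeros(0)                     # current Yule-Walker coefficients
    best_k, best_crit = 0, N * np.log(sigma2)
    for k in range(1, K + 1):
        # rho is the k-th sample partial autocorrelation.
        rho = (c[k] - phi @ c[1:k][::-1]) / sigma2
        phi = np.concatenate((phi - rho * phi[::-1], [rho]))
        sigma2 *= 1.0 - rho ** 2
        crit = N * np.log(sigma2) + 2 * k
        if crit < best_crit:
            best_k, best_crit = k, crit
    return best_k
```

On a long realization of an AR(1) process with β close to 1, the minimizer is 1 with high probability.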
For certain reasons, we may also want the autoregressive parameters to vary (with N)
over some region of the parameter space. For example, consider the following hypothesis
testing problem:
H₀ : X_n is white noise (i.e., p = 0)

versus

Hₐ : X_n is a nondegenerate autoregressive process.

We can then consider a sequence of parameters {β^(N)} converging to zero as N → ∞ and examine the power of the resulting statistical test. For this to make sense, the parameter set over which the stationarity condition holds must be specified.
Rather than work with the β-parametrization, for which the stationarity region is awkward to describe, one can parametrize an AR(p) process by its partial autocorrelations (ρ_1, ..., ρ_p), each lying in the interval (−1, 1), so that the parameter set is

{ (ρ_1, ..., ρ_p) : ρ_j ∈ (−1, 1), j = 1, ..., p }.

Moreover, for an AR(p) process, ρ_j = 0 for all j > p. For order selection, the ρ-parametrization is somewhat more natural than the β-parametrization. That is, the "distance" between two autoregressive models with different orders is more easily seen in the ρ-parametrization.
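To make the ρ-parametrization concrete, here is a sketch (ours, not the paper's) of the standard Levinson-type map from partial autocorrelations in (−1, 1)^p to AR coefficients; numpy is assumed.

```python
import numpy as np

def pacf_to_ar(rho):
    """Map partial autocorrelations (rho_1, ..., rho_p), each in (-1, 1),
    to AR coefficients (beta_1, ..., beta_p) via the Levinson recursion:
    at step k, beta_j <- beta_j - rho_k * beta_{k-j}, then append rho_k."""
    beta = np.zeros(0)
    for r in rho:
        beta = np.concatenate((beta - r * beta[::-1], [r]))
    return beta
```

Every point of the product set (−1, 1)^p maps to a stationary model, which is what makes this parametrization convenient.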
1. Infinite variance autoregressions
We will be interested in the case where the innovations {e_n} are in the domain of attraction of a stable law with index α ∈ (0, 2). If E(|e_n|) < ∞ then we will assume that E(e_n) = 0.

Recall that given observations X_1, ..., X_N and known order p, it is possible to consistently estimate the AR parameters β_1, ..., β_p. In fact, for the LS estimates β̂_1(l), ..., β̂_l(l), where l ≥ p,

N^{1/δ} (β̂_k(l) − β_k) → 0 almost surely for δ > α,

where β_k = 0 for k > p. For YW estimates, a slightly weaker result holds: convergence to 0 is in probability rather than almost sure.
We may also wish to consider AR models of the form

X_n − μ = Σ_{k=1}^p β_k (X_{n−k} − μ) + e_n

where μ is unknown and we retain the same assumptions on the β_k's and {e_n}. It can be shown (Knight, 1987) that if we center the observed series by subtracting the sample mean X̄ (i.e., Y_n = X_n − X̄, n = 1, ..., N), we will still have N^{1/δ}(β̂_k − β_k) → 0 in probability for δ > max(1, α); the convergence is almost sure for LS estimates. More generally, we may center by subtracting some other location estimate μ̂. Depending on the choice of μ̂, we may need conditions on its rate of convergence.
As stated earlier, we will want to vary the autoregressive parameters with N. For this
reason, we will consider a triangular array of random variables
X_1^(1)
X_1^(2)  X_2^(2)
  ⋮
X_1^(N)  X_2^(N)  ⋯  X_N^(N)

where each row is a finite realization of an AR(p) process:

X_n^(N) = Σ_{j=1}^p β_j^(N) X_{n−j}^(N) + e_n^(N).

The corresponding triangular array of innovations, {e_n^(N)}_{n ≤ N}, consists of row-wise independent random variables sampled from a common distribution which is in the domain of attraction of a stable law. Given a single i.i.d. sequence {e_n}, we could construct each element of the triangular array as follows:

X_n^(N) = Σ_{j=0}^∞ c_j(β^(N)) e_{n−j}.
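The coefficients c_j(β) of the linear process representation satisfy the recursion c_0 = 1, c_j = Σ_{k=1}^{min(j,p)} β_k c_{j−k}. A small sketch (our notation, numpy assumed):

```python
import numpy as np

def ma_coefficients(beta, m):
    """First m coefficients c_0, ..., c_{m-1} of the MA(infinity)
    representation X_n = sum_j c_j(beta) e_{n-j} of a stationary AR(p),
    via the recursion c_j = sum_k beta_k c_{j-k}, with c_0 = 1."""
    beta = np.asarray(beta, dtype=float)
    p = len(beta)
    c = np.zeros(m)
    c[0] = 1.0
    for j in range(1, m):
        kmax = min(j, p)
        # c_j = beta_1 c_{j-1} + ... + beta_kmax c_{j-kmax}
        c[j] = np.dot(beta[:kmax], c[j - kmax : j][::-1])
    return c
```

For an AR(1) with coefficient β, this reproduces the geometric sequence c_j = β^j.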
We will require that β^(N) = (β_1^(N), ..., β_p^(N)) be such that the corresponding partial autocorrelations (ρ_1^(N), ..., ρ_p^(N)) lie in a compact subset of the parameter set. Since p̂ is consistent, we can attempt to shrink β_p^(N) to zero as N tends to infinity and still consistently estimate p at the same time. (In the hypothesis testing formulation, this corresponds to a test that is consistent against a sequence of local alternatives.) Intuitively, it would seem that the smaller |β_p^(N)| is, the more difficult it should be to distinguish between a p-th order and a lower order AR model. From
simulations, this does seem to be the case. This is the real motivation for allowing the parameters to vary with N. Consider the following example. Suppose we observe a p-th order AR process which has ρ_p very close to zero (say ρ_p = 0.1). To estimate the order of the process, we use a procedure which we know to be consistent. So for N large enough, we will select the true order with arbitrarily high probability. However, for moderate sized N, the probability of underestimating p may be very high. Conversely, if |ρ_p| is close to 1, then even for small N there will be a high probability of selecting the true order. So by allowing ρ_p (= β_p^(N)) to shrink to zero with N, we may get some idea of the relative sample sizes needed to get the same probability of correct order selection for two different sets of AR parameters. If we view order selection as a hypothesis testing problem (say testing a null hypothesis of white noise versus autoregressive alternatives), shrinking β_p^(N) to zero is similar in spirit to the sequence of contiguous alternative hypotheses to a null hypothesis considered in Pitman efficiency calculations.
We should note that the partial autocorrelations do not have their usual finite variance interpretation; however, they can be unambiguously defined in terms of the ordinary autocorrelations, which themselves can be unambiguously defined in terms of the linear process coefficients (see Davis and Resnick, 1985). Moreover, they can be estimated by the recursive YW estimates exactly as in the finite variance case. If we include an unknown location, μ, in the model, we do not assume that it is of any intrinsic interest; in a sense, it is merely a nuisance parameter in this situation.
We will provide an answer to the following question: under what conditions (if any) on K(N) and (β_1^(N), ..., β_p^(N)) does AIC provide a consistent estimate p̂ of p? Bhansali (1983) conjectures that AIC may provide a consistent estimate of the order of an autoregressive process based on the rapid convergence of the parameter estimates. However, he seems to conclude, from Monte Carlo results, that this may not be the case. If K(N) is allowed to grow too fast then we may wind up severely overfitting much of the time; for example, p̂ could equal K(N) with high probability.
2. Theoretical Results
The main result of this paper is contained in Theorem 7; the first six results provide the
necessary machinery for Theorem 7. We begin by stating two results dealing with r-th
moments of martingales and submartingales.
Theorem 1. (Esseen and von Bahr, 1965) Let S_n = Σ_{k=1}^n X_k. If E(X_n | S_{n−1}) = 0 for 2 ≤ n ≤ N and X_n ∈ L^r for 1 ≤ r ≤ 2, then

E(|S_N|^r) ≤ 2 Σ_{n=1}^N E(|X_n|^r).

(Note that {S_n, σ(S_n); n ≥ 1} is a martingale.)

Theorem 2. (cf. Chung, 1974, p. 346) If {X_n, σ(X_n); n ≥ 1} is a nonnegative L^r-submartingale for some r > 1, then

E( max_{1 ≤ n ≤ N} X_n^r ) ≤ ( r/(r − 1) )^r E(X_N^r).
The following lemma will allow us to ignore the dependence on N of the moments of {X_n^(N)}, by virtue of being able to bound the moments uniformly for parameters within a compact set.

Lemma 3. Let {X_n(β)} be a stationary AR(p) process with parameter β and innovations {e_n} in the domain of attraction of a stable law with index α. Then E(|X_n(β)|^δ) is bounded uniformly over compact subsets of the parameter space for 0 < δ < α.
Proof. Write X_n(β) = Σ_{j=0}^∞ c_j(β) e_{n−j}, where c_j(β) is a continuous function of β for all j. Now

|X_n(β)| ≤ Σ_{j=0}^∞ |c_j(β)| |e_{n−j}| ≤ Σ_{j=0}^∞ a_j |e_{n−j}|

where a_j = sup |c_j(β)|, the supremum being taken over the compact set. It can be shown that |a_j| ≤ C j^p |x|^j for some constant C and some |x| < 1, and so Σ_{j=0}^∞ |a_j|^γ < ∞ for all γ > 0. Under this summability condition, it follows from Cline (1983) that the random variable

X = Σ_{j=0}^∞ a_j |e_j|

is finite almost surely with

lim_{x→∞} P(X > x) / P(|e_1| > x) = Σ_{j=0}^∞ a_j^α < ∞.

This implies that E(X^δ) is finite for all 0 < δ < α and the result follows. □
The following lemma will allow us to treat the moments of Σ X_n the same as the moments of Σ e_n when α > 1.

Lemma 4. Let {X_n} be a zero mean stationary AR(p) process whose innovations {e_n} are in the domain of attraction of a stable law with index α > 1. Then for 1 < r < α,

(a) E[ |Σ_{n=1}^N X_n|^r ] = O(N);
(b) E[ max_{1 ≤ k ≤ N} |Σ_{n=1}^k X_n|^r ] = O(N).

Proof. We can write

Σ_{n=1}^N X_n = C Σ_{n=1}^N e_n + R_N

where |R_N| ≤ [max_{1 ≤ k ≤ p} |β_k|] p(p + 1) [max_{1−p ≤ n ≤ N} |X_n|] and C = [1 − Σ_{k=1}^p β_k]^{−1}. Thus by Minkowski's Inequality,

E^{1/r}[ |Σ_{n=1}^N X_n|^r ] ≤ |C| E^{1/r}[ |Σ_{n=1}^N e_n|^r ] + E^{1/r}[ |R_N|^r ]

and part (a) follows from Theorem 1 by noting that E[|R_N|^r] = O(N). (It can actually be shown to be o(N) by a uniform integrability argument.) Part (b) follows similarly from Theorem 2, by noting that

max_{1 ≤ k ≤ N} |Σ_{n=1}^k X_n| ≤ |C| max_{1 ≤ k ≤ N} |Σ_{n=1}^k e_n| + max_{1 ≤ k ≤ N} |R_k|

and using Minkowski's Inequality. □
The following theorem deals with uniform convergence of both the LS and YW autoregressive parameter estimates in the case where location is known.

Theorem 5. Assume known location μ. Let K(N) = O(N^κ) for some 0 < κ < 1/2, and let ||v|| denote the Euclidean norm of the vector v. Then for δ > α,

(a) N^{1/δ} max_{p ≤ l ≤ K(N)} ||β̂(l) − β^(N)|| → 0 in probability, where β_k^(N) ≡ 0 for k > p;

(b) N^{1/δ} max_{1 ≤ l ≤ K(N)} ||β̂(l) − β̃(l)|| → 0 in probability,

where β̂(l) and β̃(l) denote the LS and YW estimates respectively. Note that the vectors here are not of fixed length but may vary with N.
Proof. (a) The style of proof will mimic Hannan and Kanter (1977). For convenience we suppress the notation indicating the dependence of {X_n}, {e_n} and β on N. For l ≥ p the LS estimating equations can be re-expressed as follows:

C_l (β̂(l) − β) = Δ*_l

where Δ*_l(j) = Σ_{n=l+1}^N e_n X_{n−j}. Fix 0 < γ < α/2 and set K = K(N) = O(N^κ). For each l, C_l is non-negative definite, and so it suffices to show that

(i) N^{−2/δ} max_{1 ≤ j ≤ K} |Δ*_l(j)| → 0 in probability, and

(ii) the minimum eigenvalue of N^{−2/δ} C_l → ∞ in probability, uniformly over 1 ≤ l ≤ K,

since then max_{l ≤ K} ||β̂(l) − β|| → 0 in probability at the required rate.
To prove (i), it suffices to show that

E[ | N^{−2/δ} Σ_{n=l+1}^N e_n X_{n−j} |^{2γ} ] → 0

for some γ < α/2, uniformly over j between 1 and K(N). Now

E[ | Σ_{n=l+1}^N e_n X_{n−j} |^{2γ} ] ≤ 2^{2γ} E[ | Σ_{n=1}^N e_n X_{n−j} |^{2γ} ] + 2^{2γ} E[ | Σ_{n=1}^l e_n X_{n−j} |^{2γ} ].

If 2γ < 1 then, since e_n is independent of X_{n−j} for j ≥ 1,

E[ | Σ_{n=1}^N e_n X_{n−j} |^{2γ} ] ≤ Σ_{n=1}^N E(|e_n|^{2γ}) E(|X_{n−j}|^{2γ}) = O(N)

uniformly over j between 1 and K(N).
If 2γ ≥ 1 then α > 1, and so S_{k,j} = Σ_{n=1}^k e_n X_{n−j} is a martingale for each j; |S_{k,j}| is an L^{2γ}-submartingale and so by Theorems 1 and 2,

E[ max_{1 ≤ k ≤ N} |S_{k,j}|^{2γ} ] = O(N)

uniformly over j.
Similarly it can be shown that, for all permissible values of γ, W_{N,j} = O(N) uniformly over j between 1 and K(N). Thus for a given sequence K(N) = O(N^κ), by taking γ sufficiently close to α/2 and δ sufficiently close to α, we will have the bound in (i) tending to zero, as desired.
To prove (ii), we define X_{n,v} and e_{n,v} as follows:

X_{n,v} = Σ_{k=1}^K v_k X_{n−k},  e_{n,v} = Σ_{k=1}^K v_k e_{n−k}

where Σ_{k=1}^K v_k² = 1. It suffices to establish (ii) with the quadratic form evaluated at an arbitrary such unit vector v. Now note that X_{n,v} = Σ_{j=1}^p β_j X_{n−j,v} + e_{n,v}. By the triangle inequality,

( Σ_{n=K+1}^N X_{n,v}² )^{1/2} ≥ ( Σ_{n=K+1}^N e_{n,v}² )^{1/2} − ( Σ_{n=K+1}^N [ Σ_{k=1}^p β_k X_{n−k,v} ]² )^{1/2}

and, by the Cauchy-Schwarz inequality,

Σ_{n=K+1}^N [ Σ_{k=1}^p β_k X_{n−k,v} ]² ≤ Σ_{j=1}^p β_j² Σ_{k=1}^p Σ_{n=K+1}^N X_{n−k,v}².

It remains only to show that N^{−2/δ} Σ_{n=K+1}^N e_{n,v}² → ∞ in probability. If this is true then N^{−2/δ} Σ X_{n,v}² → ∞ in probability, since the probability that this quantity stays bounded clearly must tend to zero. Now

Σ_{n=K+1}^N e_{n,v}² = Σ_{k=1}^K v_k² Σ_{n=K+1}^N e_{n−k}² + 2 Σ_{k=2}^K Σ_{j=1}^{k−1} v_k v_j Σ_{n=K+1}^N e_{n−j} e_{n−k}.
Thus, since

N^{−2/δ} Σ_{n=K+1}^N e_n² → ∞ in probability,

we need only show that the cross-product terms are negligible. Now

| Σ_{k=2}^K Σ_{j=1}^{k−1} v_k v_j Σ_{n=K+1}^N e_{n−j} e_{n−k} | ≤ Σ_{k=2}^K |v_k| Σ_{j=1}^{k−1} |v_j| | Σ_{n=K+1}^N e_{n−j} e_{n−k} |.

Now take γ < α and consider the sums with j ≠ k. If γ < 1 then

E[ | Σ_{n=K+1}^N e_{n−j} e_{n−k} |^γ ] ≤ Σ_{n=K+1}^N E[ |e_{n−j} e_{n−k}|^γ ] = O(N).

If γ ≥ 1 then necessarily α > 1. Thus S_l = Σ_{n=K+1}^l e_{n−j} e_{n−k} is an L^γ-martingale and hence

E[ | Σ_{n=K+1}^N e_{n−j} e_{n−k} |^γ ] = O(N)

uniformly over j ≠ k, by Theorem 1. The cross-product contribution is then negligible after normalization, since |v_k| ≤ 1 for all k.
(b) From the definitions of C_l, C̃_l, Δ_l and Δ̃_l, it is easy to see that the differences between C_l and C̃_l and between Δ_l and Δ̃_l involve only a fixed number of boundary terms, each of which is uniformly negligible after normalization.
Now from the definitions of β̂(l) and β̃(l), we get

N^{1/δ} max_{1 ≤ l ≤ K} ||β̂(l) − β̃(l)|| → 0 in probability

uniformly in l by equation (3b). Finally we must show that the minimum eigenvalue of N^{−2/δ} C_l tends in probability to infinity uniformly in l, since ||(N^{−2/δ} C_l)^{−1}|| is (in the case of symmetric positive definite matrices) merely the reciprocal of this minimum eigenvalue. Note that for unit vectors v,

v′ (N^{−2/δ} C_l) v → ∞ in probability

uniformly over l and unit vectors v by condition (ii) of the proof of part (a) of this theorem and equation (3a) above. Therefore the result follows as required. □
In the case where we have an unknown location parameter and we estimate it with some location estimate μ̂, we can obtain the following corollary.

Corollary 6.
1. If μ̂ − μ = O_p(N^γ) for γ ≤ min[ (1/δ − 1/2 + 1/α), 0 ] uniformly over all compact subsets of the parameter space, then Theorem 5 still holds.
2. If α > 1, the sample mean X̄ satisfies this condition, and Theorem 5 holds with K(N) = O(N^κ).
Proof. 1. (a) Assume without loss of generality that μ = 0. We can again re-express the LS estimating equations as follows:

C_l (β̂(l) − β) = Δ**_l

where now

Δ*_l(j) = Σ_{n=l+1}^N e_n X_{n−j}

and

Δ**_l(j) = Δ*_l(j) + (N − l) [ 1 − Σ_{k=1}^p β_k ] μ̂².

By methods similar to those used in the proof of Theorem 5, it is easy to show that the conclusion of part (a) continues to hold for some κ.
(b) Defining T_N and S_N analogously to the proof of Theorem 5, we again obtain the conclusion of part (b) for some κ. □

Theorem 7. If the conclusions (a) and (b) of Theorem 5 hold (for some sequence K(N)), then p̂ → p in probability.
Proof. First we note that since p̂ is integer-valued, p̂ → p in probability is equivalent to P[p̂ = p] → 1 (as N → ∞). From here on, we will refer to K(N) as K and to β_k^(N) as β_k, thus suppressing the dependence on N. Moreover, we will assume that the observations X_n are already centered; that is, we have subtracted out the location estimate μ̂ (if we are assuming unknown location).
We now use the fact that

σ̂²(k) = σ̂²(0) Π_{l=1}^k ( 1 − β̂_l²(l) )  for k ≥ 1,

where σ̂²(0) = (1/N) Σ_{n=1}^N X_n² and β̂_k(l) is the YW estimate of β_k for an l-th order fit. Since, for k < p, we can write

Φ(k) − Φ(p) = −N Σ_{l=k+1}^p ln( 1 − β̂_l²(l) ) − 2(p − k),

and −ln(1 − x) ≥ x for 0 ≤ x < 1, it follows that

P[ p̂ < p ] ≤ P[ Φ(k) ≤ Φ(p) for some k < p ] ≤ P[ N β̂_p²(p) ≤ 2p ]

so

lim sup_{N→∞} P[ N β̂_p²(p) ≤ 2p ] = 0

since lim inf_{N→∞} N |β_p^(N)|² > 2p.
We also have that

P[ p̂ > p ] ≤ P[ Φ(k) < Φ(k − 1) for some p < k ≤ K ]

≤ P[ N min_{p < k ≤ K} ln( 1 − β̂_k²(k) ) < −2 ].
3. Simulation results
For illustrative purposes, a small simulation study was carried out using four symmetric stable innovations distributions with α = 0.5, 1.2, 1.9 and 2.0 (the latter being the normal distribution). The underlying processes were AR(1) processes with AR parameter β = 0.1, 0.5 and 0.9. The sample sizes considered were 100 and 900. For N = 100, the maximum order K was taken to be 10, while for N = 900, K was taken to be 15. 100 replications were made for each of the 24 possible arrangements of α, β and N. The results of the study are given in Tables 1 through 8.
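As a guide to replicating such a study (not part of the original report), symmetric stable innovations can be generated by the Chambers-Mallows-Stuck method. The sketch below assumes numpy; the scaling is that of the raw formula (a normal law with variance 2 at α = 2, standard Cauchy at α = 1).

```python
import numpy as np

def sym_stable(alpha, size, rng=None):
    """Simulate symmetric alpha-stable variates (Chambers-Mallows-Stuck):
    X = sin(alpha U)/cos(U)^(1/alpha) * (cos((1-alpha)U)/W)^((1-alpha)/alpha)
    with U uniform on (-pi/2, pi/2) and W standard exponential."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def ar1(beta, e):
    """Build an AR(1) path X_n = beta * X_{n-1} + e_n from innovations e."""
    x = np.empty(len(e))
    prev = 0.0
    for n, en in enumerate(e):
        prev = beta * prev + en
        x[n] = prev
    return x
```

Feeding `ar1(0.9, sym_stable(1.2, 900))` to an AIC routine, say, reproduces one replication of one cell of the design.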
Estimated        AR parameter
order         0.1    0.5    0.9
0              89      0      1
1               4     91     87
2               4      3      2
3               0      1      1
4               1      0      0
5               0      3      2
6               1      1      2
7               0      0      3
8               0      0      1
9               1      1      0
10              0      0      1

Table 1: Frequency of selected order for AR(1) process. N = 100, α = 0.5.
Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              93     95     91
2               0      0      0
3               0      0      1
4               0      0      0
5               0      0      0
6               0      0      0
7               4      0      0
8               1      5      2
9               0      0      1
10-15           2      0      5

Table 2: Frequency of selected order for AR(1) process. N = 900, α = 0.5.
The results are much as expected. We can see that for N = 100 and β = 0.1, AIC underestimates the true order with high probability. For N = 900, the probabilities of selecting the true order increase over those for N = 100.
Estimated        AR parameter
order         0.1    0.5    0.9
0              70      0      0
1              15     86     86
2               7      7      4
3               3      3      3
4               1      1      3
5               0      1      1
6               0      0      0
7               2      0      2
8               2      1      1
9               0      1      0
10              0      0      0

Table 3: Frequency of selected order for AR(1) process. N = 100, α = 1.2.
Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              80     87     90
2               6      3      3
3               6      0      1
4               1      4      3
5               1      2      0
6               0      0      0
7               2      2      1
8               1      0      0
9               0      0      0
10-15           3      2      2

Table 4: Frequency of selected order for AR(1) process. N = 900, α = 1.2.
Estimated        AR parameter
order         0.1    0.5    0.9
0              57      0      0
1              25     76     71
2               5      8     10
3               2      5      9
4               3      6      6
5               2      1      2
6               3      1      0
7               1      1      1
8               1      0      0
9               0      0      1
10              1      2      0

Table 5: Frequency of selected order for AR(1) process. N = 100, α = 1.9.
Estimated        AR parameter
order         0.1    0.5    0.9
0               5      0      0
1              78     74     75
2               7     14      9
3               2      2      7
4               2      5      2
5               1      1      2
6               1      1      0
7               3      1      3
8               0      0      0
9               0      0      1
10-15           1      2      1

Table 6: Frequency of selected order for AR(1) process. N = 900, α = 1.9.
Estimated        AR parameter
order         0.1    0.5    0.9
0              63      0      0
1              25     75     75
2               4      3     12
3               1      6      2
4               0      7      5
5               2      2      2
6               2      4      3
7               1      0      0
8               1      1      1
9               0      2      0
10              1      0      0

Table 7: Frequency of selected order for AR(1) process. N = 100, normal distribution.
Estimated        AR parameter
order         0.1    0.5    0.9
0               0      0      0
1              83     79     80
2               3      3     11
3               4      4      3
4               4      6      0
5               2      3      3
6               0      0      0
7               0      4      0
8               4      1      1
9               0      0      2
10-15           0      0      0

Table 8: Frequency of selected order for AR(1) process. N = 900, normal distribution.
4. Comments
Bhansali and Downham (1977) propose a generalization of AIC: minimize Φ′(k) = N ln σ̂²(k) + γk, where γ ∈ (0, 4). It is easy to see from the proof of the above result that their criterion will also lead to consistent estimates of p under similar conditions on K(N) and β_p^(N). In fact, if γ = γ(N) > 0 satisfies γ(N)/N → 0, then the criterion corresponding to Φ″(k) = N ln σ̂²(k) + γ(N)k will consistently estimate p. Specifically, with known location, the estimate will be consistent provided

lim inf_{N→∞} (N / γ(N)) |β_p^(N)|² > p

with γ(N) bounded away from zero and with the same conditions on K(N). With an appropriate choice of γ(N), this criterion will also be consistent in the finite variance case. However, if γ(N) grows too quickly with N then the criterion may seriously underestimate the true order p in small samples, in both the finite and infinite variance cases. In an application such as autoregressive spectral density estimation (assuming now finite variance), underestimation is more serious than overestimation since, if the order is underestimated, the resulting spectral density estimate may lack important features which do in fact exist.
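The criteria Φ(k), Φ′(k) and Φ″(k) differ only through the penalty multiplier, so a single routine covers AIC (γ = 2) and, for example, a BIC-like consistent choice γ(N) = ln N. A sketch under the same Yule-Walker/Levinson-Durbin conventions as before (ours, numpy assumed):

```python
import numpy as np

def criterion_path(x, K, gamma):
    """Values of N ln sigma2(k) + gamma * k for k = 0, ..., K, with
    sigma2(k) the Yule-Walker innovations variance from Levinson-Durbin.
    gamma = 2 recovers AIC; gamma = ln N gives a BIC-like consistent rule."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Sample autocovariances c_0, ..., c_K (divisor N, zero location assumed).
    c = np.array([np.dot(x[: N - j], x[j:]) / N for j in range(K + 1)])
    out = np.empty(K + 1)
    sigma2, phi = c[0], np.zeros(0)
    out[0] = N * np.log(sigma2)
    for k in range(1, K + 1):
        rho = (c[k] - phi @ c[1:k][::-1]) / sigma2
        phi = np.concatenate((phi - rho * phi[::-1], [rho]))
        sigma2 *= 1.0 - rho ** 2
        out[k] = N * np.log(sigma2) + gamma * k
    return out
```

The selected order is the smallest minimizer, e.g. `np.argmin(criterion_path(x, K, np.log(len(x))))`.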
References

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle, in: Second International Symposium on Information Theory (eds. B. Petrov and F. Csaki), Akademiai Kiado, Budapest, 267-81.

Barndorff-Nielsen, O. and Schou, G. (1973) On the parametrization of autoregressive models by partial autocorrelations. J. Multivar. Anal. 3 408-19.

Bhansali, R. (1983) Order determination for processes with infinite variance, in: Robust and Nonlinear Time Series Analysis (eds. J. Franke, W. Hardle, R.D. Martin), Springer-Verlag, New York, 17-25.

Bhansali, R. and Downham, D. (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion. Biometrika 64 547-51.

Chung, K.L. (1974) A Course in Probability Theory. Academic Press, New York.

Cline, D. (1983) Infinite series of random variables with regularly varying tails. Technical Report 83-24, Institute of Applied Mathematics and Statistics, University of British Columbia.

Davis, R. and Resnick, S. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. Ann. Prob. 13 179-95.

Esseen, C. and von Bahr, B. (1965) Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Ann. Math. Statist. 36 299-303.

Hannan, E. and Kanter, M. (1977) Autoregressive processes with infinite variance. J. Appl. Prob. 14 411-15.

Knight, K. (1987) Rate of convergence of centred estimates of autoregressive parameters for infinite variance autoregressions. J. Time Ser. Anal. 8 51-60.

Shibata, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63 117-26.

Current address: Department of Statistics, University of British Columbia, 2021 West Mall, Vancouver, B.C., Canada V6T 1W5.