
CONSISTENCY OF AKAIKE'S INFORMATION CRITERION
FOR INFINITE VARIANCE AUTOREGRESSIVE PROCESSES

by

Keith Knight

TECHNICAL REPORT No. 97
January 1987

Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA

Consistency of Akaike's Information Criterion for Infinite Variance Autoregressive Processes

Keith Knight

University of British Columbia and University of Washington

    ABSTRACT

Suppose $\{X_n\}$ is a p-th order autoregressive process with innovations in the domain of attraction of a stable law and the true order p unknown. The estimate of p, $\hat{p}$, is chosen to minimize Akaike's Information Criterion over the integers $0, 1, \ldots, K$. It is shown that $\hat{p}$ is weakly consistent, and the consistency is retained if $K \to \infty$ as $N \to \infty$ at a certain rate depending on the index of the stable law.

    March 14, 1987

This research was supported by ONR Contract N00014-84-C-0169 and by NSF Grant MCS 83-01807, and was part of the author's Ph.D. dissertation completed at the University of Washington. The author would like to thank his supervisor R. Martin for his support and encouragement in preparing this report.

0. Introduction

Consider a stationary p-th order autoregressive (AR(p)) process $\{X_n\}$:
$$X_n = \beta_1 X_{n-1} + \cdots + \beta_p X_{n-p} + e_n,$$

where $\{e_n\}$ are independent, identically distributed (i.i.d.) random variables. The parameters $\beta_1, \ldots, \beta_p$ satisfy the usual stationarity constraint, namely that all zeroes of the polynomial $z^{p} - \beta_1 z^{p-1} - \cdots - \beta_p$ have modulus less than 1.

Now assume that the true order p is unknown but bounded by some finite constant K(N). Our main purpose here will be to estimate p by $\hat{p}$, where $\hat{p}$ will be obtained by minimizing a particular version of Akaike's Information Criterion (AIC) (Akaike, 1973) over the integers $\{0, 1, \ldots, K(N)\}$. Because we should be willing to examine a greater range of possible orders for our estimate as the number of observations increases, it makes sense to allow K(N) to increase with N. In the finite variance case with $K(N) \equiv K$, AIC does not give a consistent estimate of p; instead there is a nondegenerate limiting distribution of $\hat{p}$ concentrated on the integers $p, p+1, \ldots, K$ (Shibata, 1976).

It should be noted that AIC for a model with a k-dimensional parameter vector is defined as follows:
$$\mathrm{AIC} = -2 \ln(\text{maximized likelihood}) + 2k.$$

For autoregressive models, AIC is usually defined in terms of a Gaussian likelihood; so for a k-th order model it is usually defined as follows:
$$\phi(k) = N \ln \hat\sigma^2(k) + 2k,$$
where $\hat\sigma^2(k)$ is the estimate of the innovations variance obtained from the Yule-Walker (YW) estimating equations. We will choose as our estimate of p the order that minimizes $\phi(k)$ for k between 0 and K; that is,
$$\hat{p} = \arg\min_{0 \le k \le K} \phi(k).$$
In the case where two or more orders achieve the minimum, we will take the smallest of those to be our estimate.
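To make the selection rule concrete, here is a minimal sketch in Python (ours, not part of the original report): it computes the sample autocovariances, runs the Durbin-Levinson recursion to obtain the YW innovations-variance estimates $\hat\sigma^2(k)$, and returns the smallest minimizer of $\phi(k)$. The name aic_order is hypothetical.

    import numpy as np

    def aic_order(x, K):
        """Select an AR order by minimizing phi(k) = N ln sigma2(k) + 2k,
        where sigma2(k) is the Yule-Walker innovations-variance estimate
        obtained from the Durbin-Levinson recursion.  Ties go to the
        smallest order, as in the text."""
        x = np.asarray(x, dtype=float)
        N = len(x)
        # sample autocovariances gamma(0), ..., gamma(K)
        g = np.array([np.dot(x[:N - h], x[h:]) / N for h in range(K + 1)])
        sigma2 = g[0]                      # sigma2(0): white-noise fit
        phi = [N * np.log(sigma2)]         # phi(0)
        beta = np.zeros(0)
        for k in range(1, K + 1):
            # YW estimate of the k-th partial autocorrelation
            rho = (g[k] - np.dot(beta, g[k - 1:0:-1])) / sigma2
            beta = np.concatenate([beta - rho * beta[::-1], [rho]])
            sigma2 *= 1.0 - rho ** 2       # sigma2(k) = sigma2(k-1)(1 - rho^2)
            phi.append(N * np.log(sigma2) + 2 * k)
        return int(np.argmin(phi))         # argmin returns the first minimizer

The recursion also makes visible the identity $\hat\sigma^2(k) = \hat\sigma^2(0) \prod_{l=1}^{k} (1 - \tilde\beta_l^2(l))$, which is used in the proof of Theorem 7 below.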

For certain reasons, we may also want the autoregressive parameters to vary (with N) over some region of the parameter space. For example, consider the following hypothesis testing problem:

$H_0$: $\{X_n\}$ is white noise ($p = 0$)

versus

$H_a$: $\{X_n\}$ is a nondegenerate autoregressive process.

We can consider a sequence $\{\beta^{(N)}\}$ of parameter vectors converging to zero as $N \to \infty$, in the sense of the local alternatives used to study a statistical test. A convenient way to express the stationarity condition is through the partial autocorrelations: one can parametrize an AR(p) process by its partial autocorrelations $(\rho_1, \ldots, \rho_p)$, each $\rho_j$ lying in the interval $(-1, 1)$, so that the parameter set is
$$\{(\rho_1, \ldots, \rho_p) : \rho_j \in (-1, 1),\ j = 1, \ldots, p\}$$
(Barndorff-Nielsen and Schou, 1973). Moreover, one can show that for an AR(p) process, $\rho_p \ne 0$. For order selection, the ρ-parametrization is somewhat more natural than the β-parametrization; that is, the "distance" between two autoregressive models with different orders is more easily seen in the ρ-parametrization.
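As a small illustration (ours; the name pacf_to_ar is hypothetical), the Durbin-Levinson recursion maps any point of this parameter set to the coefficients of a stationary AR(p) model, which is what makes the ρ-parametrization convenient:

    import numpy as np

    def pacf_to_ar(rho):
        """Map partial autocorrelations (rho_1, ..., rho_p), each in (-1, 1),
        to AR coefficients (beta_1, ..., beta_p) via the Durbin-Levinson
        recursion; every such rho yields a stationary model."""
        beta = np.zeros(0)
        for r in rho:
            beta = np.concatenate([beta - r * beta[::-1], [r]])
        return beta

    # Example: rho = (0.5, 0.2) corresponds to the AR(2) model with
    # beta_1 = 0.4 and beta_2 = 0.2.
    print(pacf_to_ar([0.5, 0.2]))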

  • -4-

    1. Infinite variance autoregressions

We will be interested in the case where the innovations $\{e_n\}$ are in the domain of attraction of a stable law with index $\alpha \in (0, 2)$. If $E(|e_n|) < \infty$ then we will assume that $E(e_n) = 0$.

Recall that given observations $X_1, \ldots, X_N$ and known order p, it is possible to consistently estimate the AR parameters $\beta_1, \ldots, \beta_p$. In fact, for the LS estimates $\hat\beta_1(l), \ldots, \hat\beta_l(l)$, where $l \ge p$, we have $N^{1/\delta}(\hat\beta_k(l) - \beta_k) \to 0$ almost surely for $\delta > \alpha$, where $\beta_k = 0$ for $k > p$ (Hannan and Kanter, 1977). For YW estimates, a slightly weaker result holds: the convergence to 0 is in probability rather than almost sure.
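To get a feel for these rates, one can simulate them; the sketch below (ours) draws symmetric α-stable innovations by the standard Chambers-Mallows-Stuck construction and tracks the LS error for an AR(1) model, which should shrink roughly like $N^{-1/\alpha}$, faster than the familiar $N^{-1/2}$. All names and constants are illustrative assumptions.

    import numpy as np

    def sym_stable(alpha, size, rng):
        """Symmetric alpha-stable variates via the Chambers-Mallows-Stuck
        formula (symmetric case, alpha != 1)."""
        u = rng.uniform(-np.pi / 2, np.pi / 2, size)
        w = rng.exponential(1.0, size)
        return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
                * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

    rng = np.random.default_rng(0)
    beta1 = 0.5
    for N in (100, 1000, 10000):
        e = sym_stable(1.2, N, rng)
        x = np.zeros(N)
        for n in range(1, N):              # AR(1): X_n = beta1 X_{n-1} + e_n
            x[n] = beta1 * x[n - 1] + e[n]
        b_ls = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])   # LS estimate
        print(N, abs(b_ls - beta1))        # error decays roughly like N**(-1/1.2)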

We may also wish to consider AR models of the form
$$X_n - \mu = \sum_{j=1}^{p} \beta_j (X_{n-j} - \mu) + e_n,$$
where $\mu$ is unknown and we retain the same assumptions on the $\beta_k$'s and $\{e_n\}$. It can be shown (Knight, 1987) that if we center the observed series by subtracting the sample mean $\bar X$ (i.e., $\tilde X_n = X_n - \bar X$, $n = 1, \ldots, N$), we will still have $N^{1/\delta}(\hat\beta_k - \beta_k) \overset{p}{\to} 0$ for $\delta > \max(1, \alpha)$; the convergence is almost sure for the LS estimates. More generally, we can center the series by subtracting some other location estimate $\hat\mu$; depending on the rate of convergence of $\hat\mu$, we may be able to improve on this rate.

As stated earlier, we will want to vary the autoregressive parameters with N. For this reason, we will consider a triangular array of random variables
$$\begin{array}{llll} X_1^{(1)} & & & \\ X_1^{(2)} & X_2^{(2)} & & \\ \vdots & & \ddots & \\ X_1^{(N)} & X_2^{(N)} & \cdots & X_N^{(N)} \end{array}$$
where each row is a finite realization of an AR(p) process:
$$X_n^{(N)} = \sum_{j=1}^{p} \beta_j^{(N)} X_{n-j}^{(N)} + e_n^{(N)}.$$
The corresponding triangular array of innovations, $\{e_n^{(N)}\}_{n \le N}$, consists of row-wise independent random variables sampled from a common distribution which is in the domain of attraction of a stable law. Given a single i.i.d. sequence $\{e_n\}$, we could construct each element of the triangular array as follows:
$$X_n^{(N)} = \sum_{j=0}^{\infty} c_j(\beta^{(N)})\, e_{n-j}.$$
We will require that $\beta^{(N)} = (\beta_1^{(N)}, \ldots, \beta_p^{(N)})$ be such that $(\rho_1^{(N)}, \ldots, \rho_p^{(N)})$ lie in a compact subset of the parameter set. Since $\beta_p^{(N)} = 0$ if and only if $\rho_p^{(N)} = 0$, we can attempt to shrink $\beta_p^{(N)}$ to zero as N tends to infinity and still consistently estimate p at the same time. (In the testing problem above, this amounts to asking for a test that is consistent against a sequence of local alternatives.)
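A minimal sketch (ours) of this construction follows; it truncates the infinite moving average at a fixed lag, which is harmless here because the coefficients $c_j(\beta)$ decay geometrically under the stationarity constraint. The name ar_row and the truncation point are our own choices.

    import numpy as np

    def ar_row(beta_N, e, trunc=200):
        """One row of the triangular array, X_1^(N), ..., X_N^(N), built from
        a single innovation sequence via the truncated MA(infinity)
        representation X_n = sum_{j=0}^{trunc} c_j(beta^(N)) e_{n-j}.  The
        array `e` holds innovations e_{1-trunc}, ..., e_N, so len(e) = N + trunc."""
        p = len(beta_N)
        c = np.zeros(trunc + 1)
        c[0] = 1.0
        for j in range(1, trunc + 1):      # c_j = sum_{i=1}^{p} beta_i c_{j-i}
            c[j] = sum(beta_N[i] * c[j - 1 - i] for i in range(min(p, j)))
        # full convolution; keep the entries corresponding to n = 1, ..., N
        return np.convolve(e, c)[trunc:len(e)]

Each row uses its own parameter vector beta_N, so shrinking $\beta_p^{(N)}$ across rows is just a matter of passing a different argument.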

Intuitively, it would seem that the smaller $|\rho_p|$ is, the more difficult it should be to distinguish between a p-th order and a lower order AR model. From simulations, this does seem to be the case. This is the real motivation for allowing the parameters to vary with N. Consider the following example. Suppose we observe a p-th order AR process which has $\rho_p$ very close to zero (say $\rho_p = 0.1$). To estimate the order of the process, we use a procedure which we know to be consistent. So for N large enough, we will select the true order with arbitrarily high probability. However, for moderate sample sizes N, the probability of underestimating p may be very high. Conversely, if $|\rho_p|$ is close to 1, then even for small N there will be a high probability of selecting the true order. So by allowing $\rho_p = \rho_p^{(N)}$ to shrink to zero with N, we may get some idea of the relative sample sizes needed to get the same probability of correct order selection for two different sets of AR parameters. If we view order selection as a hypothesis testing problem (say testing a null hypothesis of white noise versus autoregressive alternatives), shrinking $\rho_p^{(N)}$ to zero is similar in spirit to the sequence of contiguous alternative hypotheses considered in Pitman efficiency calculations.

We should note that the partial autocorrelations do not have their usual finite variance interpretation; however, they can be unambiguously defined in terms of the regular autocorrelations, which themselves can be unambiguously defined in terms of the linear process coefficients (see Davis and Resnick, 1985). Moreover, they can be consistently estimated by the recursive YW estimates just as in the finite variance case.

If we include an unknown location, $\mu$, in the model, we will not treat it as a parameter to be selected; in a sense, $\mu$ is a nuisance parameter in this situation.

We will provide an answer to the following question: under what conditions (if any) on K(N) and $(\beta_1^{(N)}, \ldots, \beta_p^{(N)})$ does AIC provide a consistent estimate $\hat{p}$ of p? Bhansali (1983) conjectures that AIC may provide a consistent estimate of the order of an infinite variance autoregressive process, based on the rapid convergence of the parameter estimates. However, he seems to conclude, from Monte Carlo results, that this may not be the case. If K(N) is allowed to grow too fast, then we may wind up severely overfitting much of the time; for example, $\hat{p}$ could equal K(N) with high probability.


    2. Theoretical Results

    The main result of this paper is contained in Theorem 7; the first six results provide the

    necessary machinery for Theorem 7. We begin by stating two results dealing with r-th

    moments of martingales and submartingales.

Theorem 1. (Esseen and von Bahr, 1965) Let $S_n = \sum_{k=1}^{n} X_k$. If $E(X_n \mid S_{n-1}) = 0$ for $2 \le n \le N$ and $X_n \in L^r$ for $1 \le r \le 2$, then
$$E(|S_N|^r) \le 2 \sum_{n=1}^{N} E(|X_n|^r).$$
(Note that $\{S_n, \sigma(S_n); n \ge 1\}$ is a martingale.)

Theorem 2. (cf. Chung, 1974, p. 346) If $\{X_n, \sigma(X_n); n \ge 1\}$ is a nonnegative $L^r$-submartingale for some $r > 1$, then
$$E\Big[\max_{1 \le n \le N} X_n^r\Big] \le \Big(\frac{r}{r-1}\Big)^r E(X_N^r).$$
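To indicate how Theorems 1 and 2 will be combined below (our gloss, not the report's text): if $S_k = \sum_{n=1}^{k} Y_n$ is a martingale with $Y_n \in L^r$ for some $1 < r \le 2$, then $|S_k|$ is a nonnegative $L^r$-submartingale, and the two results chain together as
$$E\Big[\max_{1 \le k \le N} |S_k|^r\Big] \le \Big(\frac{r}{r-1}\Big)^r E\big(|S_N|^r\big) \le 2\Big(\frac{r}{r-1}\Big)^r \sum_{n=1}^{N} E\big(|Y_n|^r\big),$$
which is the form in which they are applied in the proof of Theorem 5.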

The following lemma will allow us to ignore the dependence on N of the moments of $\{X_n^{(N)}\}$, by virtue of being able to bound the moments uniformly for parameters within a compact set.

Lemma 3. Let $\{X_n(\beta)\}$ be a stationary AR(p) process with parameter $\beta$ and innovations $\{e_n\}$ in the domain of attraction of a stable law with index $\alpha$. Then, uniformly over $\beta$ in any compact subset of the parameter space, $E(|X_n(\beta)|^{\delta}) < \infty$ for $0 < \delta < \alpha$.

Proof. Write $X_n(\beta) = \sum_{j=0}^{\infty} c_j(\beta)\, e_{n-j}$, where $c_j(\beta)$ is a continuous function of $\beta$ for all j. Now
$$|X_n(\beta)| \le \sum_{j=0}^{\infty} |c_j(\beta)|\,|e_{n-j}| \le \sum_{j=0}^{\infty} a_j |e_{n-j}|,$$
where $a_j = \sup_{\beta} |c_j(\beta)|$, the supremum being over the compact set. It can be shown that $a_j \le C j^{p} |x|^{j}$ for some $|x| < 1$, and so $\sum_{j=0}^{\infty} a_j^{\gamma} < \infty$ for all $\gamma > 0$. Under this summability condition, it follows from Cline (1983) that the random variable
$$X = \sum_{j=0}^{\infty} a_j |e_j|$$
is finite almost surely with
$$\lim_{x \to \infty} \frac{P(X > x)}{P(|e_1| > x)} = \sum_{j=0}^{\infty} a_j^{\alpha} < \infty.$$
This implies that $E(X^{\delta})$ is finite for all $0 < \delta < \alpha$, and the result follows. □

The following lemma will allow us to treat the moments of $\sum X_n$ the same as the moments of $\sum e_n$ when $\alpha > 1$.

Lemma 4. Let $\{X_n\}$ be a zero mean stationary AR(p) process whose innovations $\{e_n\}$ are in the domain of attraction of a stable law with index $\alpha > 1$. Then for $1 < r < \alpha$,

(a) $E\big[\,\big|\sum_{n=1}^{N} X_n\big|^{r}\big] = O(N)$;

(b) $E\big[\max_{1 \le k \le N} \big|\sum_{n=1}^{k} X_n\big|^{r}\big] = O(N)$.

Proof. Summing the defining equation of the process gives
$$\sum_{n=1}^{N} X_n = C \sum_{n=1}^{N} e_n + R_N,$$
where $C = \big[1 - \sum_{k=1}^{p} \beta_k\big]^{-1}$ and
$$|R_N| \le \Big[\max_{1 \le k \le p} |\beta_k|\Big]\, p\,(p+1)\, \Big[\max_{-p \le k \le N} |X_k|\Big].$$
Thus by Minkowski's Inequality,
$$E\Big[\Big|\sum_{n=1}^{N} X_n\Big|^{r}\Big]^{1/r} \le |C|\, E\Big[\Big|\sum_{n=1}^{N} e_n\Big|^{r}\Big]^{1/r} + E\big[|R_N|^{r}\big]^{1/r},$$
and part (a) follows from Theorem 1 by noting that $E[|R_N|^{r}] = O(N)$. (It can actually be shown to be $o(N)$ by a uniform integrability argument.) Part (b) follows in the same way from Theorem 2, by noting that
$$\max_{1 \le k \le N} \Big|\sum_{n=1}^{k} X_n\Big| \le |C| \max_{1 \le k \le N} \Big|\sum_{n=1}^{k} e_n\Big| + |R_N|$$
and using Minkowski's Inequality. □

    The following theorem deals with uniform convergence of both LS and YW

    autoregressive parameter estimates in the case where location is known.

Theorem 5. Assume known location $\mu$. Let $K(N) = O(N^{\delta})$ for some sufficiently small $\delta > 0$ (the admissible rate depends on the index $\alpha$), and let $\|v\|$ denote the Euclidean norm of the vector $v$. Then

(a) $\sqrt{N}\, \max_{p \le l \le K(N)} \|\hat\beta(l) - \beta^{(N)}\| \overset{p}{\to} 0$, where $\beta_k^{(N)} \equiv 0$ for $k > p$;

(b) $\sqrt{N}\, \max_{1 \le l \le K(N)} \|\hat\beta(l) - \tilde\beta(l)\| \overset{p}{\to} 0$, where $\tilde\beta(l)$ denotes the YW estimate.

Note that the vectors are not of fixed length but may vary with N.

Proof. (a) The style of proof will mimic Hannan and Kanter (1977). For convenience we suppress the notation indicating the dependence of $\{X_n\}$, $\{e_n\}$ and $\beta$ on N. For $l \ge p$ the LS estimating equations can be reexpressed as follows:
$$C_l\,\big(\hat\beta(l) - \beta\big) = \Delta_l^{*},$$
where $C_l$ is the matrix of sums of squares and cross-products of the lagged observations and $\Delta_l^{*}(j) = \sum_{n=l+1}^{N} e_n X_{n-j}$. Fix $\kappa < 2/\alpha$ and set $K = K(N) = O(N^{\delta})$. For each $l$, $C_l$ is non-negative definite, and so it suffices to show that

(i) $N^{-\kappa} \max_{1 \le j \le K} |\Delta_l^{*}(j)| \overset{p}{\to} 0$, and

(ii) $N^{-\kappa} \inf_{\|v\| = 1} v' C_l\, v \overset{p}{\to} \infty$ uniformly in $l \le K$;

together (i) and (ii) give $\max_{p \le l \le K} \|\hat\beta(l) - \beta\| \overset{p}{\to} 0$ at the rate claimed.

To prove (i), it suffices to show that
$$E\big[\,|\Delta_l^{*}(j)|^{2\gamma}\big] = O(N) \quad \text{uniformly over } j \text{ between 1 and } K(N)$$
for some $\gamma < \alpha/2$. Now
$$E\Big[\Big|\sum_{n=l+1}^{N} e_n X_{n-j}\Big|^{2\gamma}\Big] \le c_\gamma\, E\big[|S_{N,j}|^{2\gamma}\big] + c_\gamma\, E\big[|W_{N,j}|^{2\gamma}\big],$$
where $S_{N,j} = \sum_{n=1}^{N} e_n X_{n-j}$ and $W_{N,j} = \sum_{n=1}^{l} e_n X_{n-j}$. If $2\gamma \ge 1$ then $\alpha > 1$, and $S_{k,j} = \sum_{n=1}^{k} e_n X_{n-j}$ is a martingale for each j; $|S_{k,j}|$ is an $L^{2\gamma}$-submartingale, and so by Theorems 1 and 2,
$$E\Big[\max_{1 \le k \le N} |S_{k,j}|^{2\gamma}\Big] \le \Big(\frac{2\gamma}{2\gamma - 1}\Big)^{2\gamma} 2 \sum_{n=1}^{N} E\big[|e_n X_{n-j}|^{2\gamma}\big] = O(N)$$
uniformly over $j$ between 1 and $K(N)$ (the moments on the right are finite and free of $j$ by Lemma 3 and the independence of $e_n$ and $X_{n-j}$; if $2\gamma < 1$ the same bound follows directly from $E|\sum Y_n|^{2\gamma} \le \sum E|Y_n|^{2\gamma}$). Similarly it can be shown that for all permissible values of $\gamma$, $E[\,|W_{N,j}|^{2\gamma}] = O(N)$ uniformly over $j$ between 1 and $K(N)$. Thus
$$P\Big[N^{-\kappa} \max_{1 \le j \le K} |\Delta_l^{*}(j)| > \varepsilon\Big] \le \varepsilon^{-2\gamma} N^{-2\gamma\kappa} \sum_{j=1}^{K} E\big[|\Delta_l^{*}(j)|^{2\gamma}\big] \le c\, \varepsilon^{-2\gamma} K N^{1 - 2\gamma\kappa}.$$
Thus for a given sequence $K(N) = O(N^{\delta})$, by taking $\kappa$ sufficiently close to $2/\alpha$ and $\gamma$ sufficiently close to $\alpha/2$, we will have
$$K N^{1 - 2\gamma\kappa} \to 0,$$
as desired.

To prove (ii), we define $X_{n,v}$ and $e_{n,v}$ as follows:
$$X_{n,v} = \sum_{k=1}^{K} v_k X_{n-k}, \qquad e_{n,v} = \sum_{k=1}^{K} v_k e_{n-k},$$
where $v$ is a unit vector: $\sum_{k=1}^{K} v_k^2 = 1$. It suffices to show that $N^{-\kappa} \sum_{n=K+1}^{N} X_{n,v}^2 \overset{p}{\to} \infty$ uniformly over unit vectors $v$.

Now note that $X_{n,v} = \sum_{j=1}^{p} \beta_j X_{n-j,v} + e_{n,v}$. By the triangle inequality,
$$\Big[\sum_{n=K+1}^{N} e_{n,v}^2\Big]^{1/2} \le \Big[\sum_{n=K+1}^{N} X_{n,v}^2\Big]^{1/2} + \Big[\sum_{n=K+1}^{N} \Big(\sum_{k=1}^{p} \beta_k X_{n-k,v}\Big)^{2}\Big]^{1/2}.$$
Now
$$\sum_{n=K+1}^{N} \Big[\sum_{k=1}^{p} \beta_k X_{n-k,v}\Big]^{2} \le \sum_{j=1}^{p} \beta_j^{2}\; \sum_{k=1}^{p} \sum_{n=K+1}^{N} X_{n-k,v}^{2}.$$
It remains only to show that $N^{-\kappa} \sum_{n=K+1}^{N} e_{n,v}^2 \overset{p}{\to} \infty$ uniformly in $v$. If this is true then $N^{-\kappa} \sum_{n=K+1}^{N} X_{n,v}^2 \overset{p}{\to} \infty$, since the probability that this quantity stays bounded clearly must tend to zero.

Now
$$\sum_{n=K+1}^{N} e_{n,v}^2 = \sum_{k=1}^{K} v_k^2 \sum_{n=K+1}^{N} e_{n-k}^2 + 2 \sum_{k=2}^{K} \sum_{j=1}^{k-1} v_j v_k \sum_{n=K+1}^{N} e_{n-j} e_{n-k} \ge \sum_{n=K+1}^{N-K} e_n^2 + 2 \sum_{k=2}^{K} \sum_{j=1}^{k-1} v_j v_k \sum_{n=K+1}^{N} e_{n-j} e_{n-k}.$$
The first term dominates, since
$$N^{-\kappa} \sum_{n=K+1}^{N-K} e_n^2 \overset{p}{\to} \infty$$
for $\kappa < 2/\alpha$ ($\sum_n e_n^2$ is in the domain of attraction of a positive stable law of index $\alpha/2$ and grows like $N^{2/\alpha}$). Thus we need only show that the cross-product term is uniformly negligible. Now
$$\Big|\sum_{k=2}^{K} \sum_{j=1}^{k-1} v_j v_k \sum_{n=K+1}^{N} e_{n-j} e_{n-k}\Big| \le \sum_{k=2}^{K} |v_k| \sum_{j=1}^{k-1} |v_j| \Big|\sum_{n=K+1}^{N} e_{n-j} e_{n-k}\Big|.$$
Now take $\gamma < \alpha$ and note that $E\big[|e_{n-j} e_{n-k}|^{\gamma}\big] < \infty$ for $j \ne k$. If $\gamma < 1$ then
$$E\Big[\Big|\sum_{n=K+1}^{N} e_{n-j} e_{n-k}\Big|^{\gamma}\Big] \le \sum_{n=K+1}^{N} E\big[|e_{n-j} e_{n-k}|^{\gamma}\big] = O(N).$$
If $\gamma \ge 1$ then necessarily $\alpha > 1$; thus $S_l = \sum_{n=K+1}^{l} e_{n-j} e_{n-k}$ is an $L^{\gamma}$-martingale, and hence
$$E\Big[\Big|\sum_{n=K+1}^{N} e_{n-j} e_{n-k}\Big|^{\gamma}\Big] = O(N)$$
uniformly over $j \ne k$, by Theorem 1. Now, using $|v_k| \le 1$ for all $k$ and summing these bounds over the $O(K^2)$ pairs $j < k$, the cross-product term is $o_p(N^{\kappa})$ uniformly over unit vectors $v$ provided $K(N) = O(N^{\delta})$ with $\delta$ sufficiently small. This proves (ii).

(b) From the definitions of $C_l$, $\tilde C_l$, $\Delta_l$ and $\tilde\Delta_l$ (the tilde denoting the YW versions), it is easy to see that the corresponding entries differ only by boundary terms, and bounding these as above yields

(3a) $N^{-\kappa} \max_{1 \le i, j \le K} \big|C_l(i, j) - \tilde C_l(i, j)\big| \overset{p}{\to} 0$, and

(3b) $N^{-\kappa} \max_{1 \le j \le K} \big|\Delta_l(j) - \tilde\Delta_l(j)\big| \overset{p}{\to} 0$,

uniformly in $l$ between 1 and $K(N)$.

Now from the definitions of $\hat\beta(l)$ and $\tilde\beta(l)$, we get
$$\sqrt{N}\, \big\|\hat\beta(l) - \tilde\beta(l)\big\| \overset{p}{\to} 0$$
uniformly in $l$ by equation (3b). Finally we must show that the minimum eigenvalue of $N^{-\kappa} C_l$ tends in probability to infinity uniformly in $l$, since $\|(N^{-\kappa} C_l)^{-1}\|$ is (in the case of symmetric positive definite matrices) merely the reciprocal of this minimum eigenvalue. Note that for unit vectors $v$,
$$N^{-\kappa}\, v' C_l\, v \ge N^{-\kappa} \sum_{n=K+1}^{N} X_{n,v}^2 - o_p(1) \overset{p}{\to} \infty$$
uniformly over $l$ and unit vectors $v$, by condition (ii) of the proof of part (a) of this theorem and equation (3a) above. Therefore
$$\big\|(N^{-\kappa} C_l)^{-1}\big\| \overset{p}{\to} 0$$
as required. □

In the case where we have an unknown location parameter and we estimate it with some location estimate $\hat\mu$, we can obtain the following corollary.

Corollary 6.

1. If $\hat\mu - \mu = O_p(N^{\gamma})$ with $\gamma \le 0$ sufficiently small (the admissible rate depends on $\alpha$ and on the growth rate of $K(N)$), uniformly over all compact subsets of the parameter space, then Theorem 5 still holds.

2. If $\alpha > 1$, the sample mean $\bar X$ satisfies this condition, and Theorem 5 holds with $K(N) = O(N^{\delta})$ as before.

Proof. 1. (a) Assume without loss of generality that $\mu = 0$. We can again reexpress the LS estimating equations as in Theorem 5, with the observations centered at $\hat\mu$, where now
$$\Delta_l^{**}(j) = \Delta_l^{*}(j) + (N - l)\Big[1 - \sum_{k=1}^{p} \beta_k\Big]\hat\mu^2 + \text{remainder terms that are linear in } \hat\mu.$$
By similar methods to those used in the proof of Theorem 5, it is easy to show that for some $\kappa < 2/\alpha$ these extra terms are $o_p(N^{\kappa})$ uniformly over $j$ between 1 and $K$, given the assumed rate for $\hat\mu$.

(b) Defining $T_N$ and $S_N$ analogously to the proof of Theorem 5, we again get that for some $\kappa < 2/\alpha$, conclusions (a) and (b) of Theorem 5 continue to hold. □

The main result can now be stated.

Theorem 7. Suppose that
$$\liminf_{N \to \infty} N \big(\beta_p^{(N)}\big)^2 > 2p$$
and that conclusions (a) and (b) of Theorem 5 hold for some $K = K(N)$. Then $\hat{p} \overset{p}{\to} p$.

Proof. First we note that since $\hat{p}$ is integer-valued, $\hat{p} \overset{p}{\to} p$ is equivalent to $P[\hat{p} = p] \to 1$ (as $N \to \infty$). From here on, we will refer to $K(N)$ as $K$ and to $\beta_k^{(N)}$ as $\beta_k$, thus suppressing the dependence on N. Moreover we will assume that the observations $X_n$ are already centered; that is, we have subtracted out the location estimate $\hat\mu$ (if we are assuming unknown location).

We now use the fact that
$$\hat\sigma^2(k) = \hat\sigma^2(0) \prod_{l=1}^{k} \big(1 - \tilde\beta_l^2(l)\big) \qquad \text{for } k \ge 1,$$
where $\hat\sigma^2(0) = \frac{1}{N} \sum_{n=1}^{N} X_n^2$ and $\tilde\beta_k(l)$ is the YW estimate of $\beta_k$ from an $l$-th order fit. Since $\phi(k) = N \ln \hat\sigma^2(k) + 2k$, we can write
$$\phi(k) - \phi(k-1) = N \ln\big(1 - \tilde\beta_k^2(k)\big) + 2,$$
so, using $\ln(1 - x) \le -x$,
$$P[\hat{p} < p] \le P\big[\phi(p) \ge \phi(k) \text{ for some } k < p\big] \le P\big[N \tilde\beta_p^2(p) \le 2p\big].$$
Now
$$\limsup_{N \to \infty} P\big[N \tilde\beta_p^2(p) \le 2p\big] = 0$$
since $\liminf_{N \to \infty} N (\beta_p^{(N)})^2 > 2p$ and $\sqrt{N}\,(\tilde\beta_p(p) - \beta_p^{(N)}) \overset{p}{\to} 0$ by Theorem 5. We also have that
$$P[\hat{p} > p] \le P\big[\phi(k) < \phi(k-1) \text{ for some } p < k \le K\big] \le P\Big[N \min_{p < k \le K} \ln\big(1 - \tilde\beta_k^2(k)\big) < -2\Big],$$
and the right side tends to 0 since $N \max_{p < k \le K} \tilde\beta_k^2(k) \overset{p}{\to} 0$ by conclusions (a) and (b) of Theorem 5. Thus $P[\hat{p} = p] \to 1$. □

3. Simulation results

For illustrative purposes, a small simulation study was carried out using four symmetric stable innovations distributions with α = 0.5, 1.2, 1.9 and 2.0 (the latter being the normal distribution). The underlying processes were AR(1) processes with AR parameter β = 0.1, 0.5 and 0.9. The sample sizes considered were 100 and 900. For N = 100, the maximum order K was taken to be 10, while for N = 900, K was taken to be 15. 100 replications were made for each of the 24 possible combinations of α, β and N. The results of the study are given in Tables 1 through 8.
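One cell of such a study can be replicated along the following lines (our sketch; it reuses the hypothetical sym_stable and aic_order helpers sketched in Sections 0 and 1, and the seed is arbitrary):

    import numpy as np

    # One cell: N = 100, alpha = 1.2, beta_1 = 0.5, K = 10, 100 replications.
    rng = np.random.default_rng(1)
    counts = np.zeros(11, dtype=int)
    for rep in range(100):
        e = sym_stable(1.2, 100, rng)      # innovations (sketched in Section 1)
        x = np.zeros(100)
        for n in range(1, 100):            # AR(1) with beta_1 = 0.5
            x[n] = 0.5 * x[n - 1] + e[n]
        counts[aic_order(x, 10)] += 1      # selector (sketched in Section 0)
    print(counts)                          # frequency of each selected order 0..10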

    Estimated "C AR parameterorder 0.1 0.5 0.9

    0 89 0 11 4 91 872 4 3 23 0 1 14 1 0 0

    I 5 0 3 26 1 1 27 0 0 38 0 0 19 1 1 010 0 0 1

    Table 1: Frequency of selected order for AR(I)process. N =100 a =0.5

Estimated        AR parameter (β₁)
order          0.1    0.5    0.9
 0               0      0      0
 1              93     95     91
 2               0      0      0
 3               0      0      1
 4               0      0      0
 5               0      0      0
 6               0      0      0
 7               4      0      0
 8               1      5      2
 9               0      0      1
10-15            2      0      5

Table 2: Frequency of selected order for AR(1) process. N = 900, α = 0.5.

The results are much as expected. We can see that for N = 100 and β₁ = 0.1, AIC underestimates the true order with high probability. For N = 900, the probabilities of selecting the true order increase over those for N = 100.

Estimated        AR parameter (β₁)
order          0.1    0.5    0.9
 0              70      0      0
 1              15     86     86
 2               7      7      4
 3               3      3      3
 4               1      1      3
 5               0      1      1
 6               0      0      0
 7               2      0      2
 8               2      1      1
 9               0      1      0
10               0      0      0

Table 3: Frequency of selected order for AR(1) process. N = 100, α = 1.2.

Estimated        AR parameter (β₁)
order          0.1    0.5    0.9
 0               0      0      0
 1              80     87     90
 2               6      3      3
 3               6      0      1
 4               1      4      3
 5               1      2      0
 6               0      0      0
 7               2      2      1
 8               1      0      0
 9               0      0      0
10-15            3      2      2

Table 4: Frequency of selected order for AR(1) process. N = 900, α = 1.2.

Estimated        AR parameter (β₁)
order          0.1    0.5    0.9
 0              57      0      0
 1              25     76     71
 2               5      8     10
 3               2      5      9
 4               3      6      6
 5               2      1      2
 6               3      1      0
 7               1      1      1
 8               1      0      0
 9               0      0      1
10               1      2      0

Table 5: Frequency of selected order for AR(1) process. N = 100, α = 1.9.

Estimated        AR parameter (β₁)
order          0.1    0.5    0.9
 0               5      0      0
 1              78     74     75
 2               7     14      9
 3               2      2      7
 4               2      5      2
 5               1      1      2
 6               1      1      0
 7               3      1      3
 8               0      0      0
 9               0      0      1
10-15            1      2      1

Table 6: Frequency of selected order for AR(1) process. N = 900, α = 1.9.

Estimated        AR parameter (β₁)
order          0.1    0.5    0.9
 0              63      0      0
 1              25     75     75
 2               4      3     12
 3               1      6      2
 4               0      7      5
 5               2      2      2
 6               2      4      3
 7               1      0      0
 8               1      1      1
 9               0      2      0
10               1      0      0

Table 7: Frequency of selected order for AR(1) process. N = 100, Normal distribution.

Estimated        AR parameter (β₁)
order          0.1    0.5    0.9
 0               0      0      0
 1              83     79     80
 2               3      3     11
 3               4      4      3
 4               4      6      0
 5               2      3      3
 6               0      0      0
 7               0      4      0
 8               4      1      1
 9               0      0      2
10-15            0      0      0

Table 8: Frequency of selected order for AR(1) process. N = 900, Normal distribution.

4. Comments

Bhansali and Downham (1977) propose a generalization of AIC: they propose to minimize $\phi'(k) = N \ln \hat\sigma^2(k) + \gamma k$, where $\gamma \in (0, 4)$. It is easy to see from the proof of the above result that their criterion will also lead to consistent estimates of p under similar conditions on $K(N)$ and $\beta_p^{(N)}$. In fact, if $\gamma = \gamma(N) > 0$ satisfies $\gamma(N)/N \to 0$, then the criterion corresponding to $\phi''(k) = N \ln \hat\sigma^2(k) + \gamma(N)\, k$ will consistently estimate p. Specifically, with known location, the estimate will be consistent provided
$$\liminf_{N \to \infty} \frac{N\, |\beta_p^{(N)}|^2}{\gamma(N)} > p,$$
with $\gamma(N)$ bounded away from zero and with the same conditions on $K(N)$. With an appropriate choice of $\gamma(N)$, this criterion will also be consistent in the finite variance case. However, if $\gamma(N)$ grows too quickly with N, then the criterion may seriously underestimate the true order p in small samples, in both the finite and infinite variance cases. In an application such as autoregressive spectral density estimation (assuming now finite variance), underestimation is more serious than overestimation since, if the order is underestimated, the resulting spectral density estimate may be lacking important features which may indeed exist.
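In code, the generalization is a one-line change to the AIC sketch given earlier (ours; the name gic_order is hypothetical): the penalty 2k becomes γ(N)k, so one can experiment with choices such as γ(N) = ln N, which gives a BIC-like rule.

    import numpy as np

    def gic_order(x, K, gamma):
        """Order selection by the generalized criterion
        phi'(k) = N ln sigma2(k) + gamma k  (Bhansali and Downham, 1977);
        gamma = 2 recovers AIC."""
        x = np.asarray(x, dtype=float)
        N = len(x)
        g = np.array([np.dot(x[:N - h], x[h:]) / N for h in range(K + 1)])
        sigma2, beta = g[0], np.zeros(0)
        crit = [N * np.log(sigma2)]
        for k in range(1, K + 1):
            rho = (g[k] - np.dot(beta, g[k - 1:0:-1])) / sigma2
            beta = np.concatenate([beta - rho * beta[::-1], [rho]])
            sigma2 *= 1.0 - rho ** 2
            crit.append(N * np.log(sigma2) + gamma * k)
        return int(np.argmin(crit))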

References

Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle, in: Second International Symposium on Information Theory (eds. B. Petrov and F. Csaki), Akademiai Kiado, Budapest, 267-81.

Barndorff-Nielsen, O. and Schou, G. (1973) On the parametrization of autoregressive models by partial autocorrelations. J. Multivar. Anal. 3, 408-19.

Bhansali, R. (1983) Order determination for processes with infinite variance, in: Robust and Nonlinear Time Series Analysis (eds. J. Franke, W. Hardle, R.D. Martin), Springer-Verlag, New York, 17-25.

Bhansali, R. and Downham, D. (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion. Biometrika 64, 547-51.

Chung, K.L. (1974) A Course in Probability Theory. Academic Press, New York.

Cline, D. (1983) Infinite series of random variables with regularly varying tails. Technical Report 83-24, Institute of Applied Mathematics and Statistics, University of British Columbia.

Davis, R. and Resnick, S. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. Ann. Prob. 13, 179-95.

Esseen, C. and von Bahr, B. (1965) Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Ann. Math. Statist. 36, 299-303.

Hannan, E. and Kanter, M. (1977) Autoregressive processes with infinite variance. J. Appl. Prob. 14, 411-15.

Knight, K. (1987) Rate of convergence of centred estimates of autoregressive parameters for infinite variance autoregressions. J. Time Ser. Anal. 8, 51-60.

Shibata, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63, 117-26.

Current address: Department of Statistics, University of British Columbia, 2021 West Mall, Vancouver, B.C., Canada V6T 1W5.