finite sample criteria for autoregressive model order selection


7/29/2019 Finite sample criteria for autoregressive model order selection

    Finite sample criteria for autoregressive order selection

This document details autoregressive model selection criteria following Broersen.1 Emphasis is placed on converting the formulas into forms efficiently computable when evaluating a single model. When evaluating a hierarchy of models, computing with intermediate results may be more efficient.

    Setting

An AR(K) process and its AR(p) model are given by

    x_n + a_1 x_{n-1} + \cdots + a_K x_{n-K} = \varepsilon_n
    x_n + \hat{a}_1 x_{n-1} + \cdots + \hat{a}_p x_{n-p} = \hat{\varepsilon}_n    (1)

in which \varepsilon_n \sim N(0, \sigma_\varepsilon^2) and \hat{\varepsilon}_n \sim N(0, \hat{\sigma}_\varepsilon^2). Model selection criteria for evaluating which of several candidates most parsimoniously fits an AR(K) process generally have the form

    criterion(v_method, N, p, \alpha) = \ln residual(p, v_method) + overfit(criterion, v_method, N, p, \alpha).    (2)

Among all candidates and using a given criterion, the best model minimizes the criterion. Here, N represents the number of samples used to estimate model parameters, p denotes the order of the estimated model, v_method = v_method(N, i) is the method-specific estimation variance for model order i, and \alpha is an optional factor with a criterion-dependent meaning. When estimating \hat{a}_1, \ldots, \hat{a}_p given sample data x_n, the residual variance is

    residual(v_method, p) = residual(p) = \hat{\sigma}_\varepsilon^2.

Therefore the left term in (2) penalizes misfitting the data independently of the estimation method used. One may therefore distinguish among criteria using only the overfitting penalty term, namely overfit(criterion, v_method, N, p, \alpha).
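The selection rule implied by (2) can be sketched as follows. The function and variable names here are illustrative, not Broersen's notation, and the generic `overfit` callable stands in for any of the penalty terms discussed below:

```python
import math

def select_order(residuals, overfit, max_p):
    """Return the order p in 0..max_p minimizing ln(residual(p)) + overfit(p)."""
    best_p, best_value = 0, math.inf
    for p in range(max_p + 1):
        value = math.log(residuals[p]) + overfit(p)
        if value < best_value:
            best_p, best_value = p, value
    return best_p

# Toy data: the residual variance stops improving after order 2, so any
# penalty increasing in p selects p = 2.
residuals = [4.0, 2.0, 1.0, 0.99, 0.985]
best = select_order(residuals, lambda p: 0.1 * p, max_p=4)
```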

In Broersen's work, the penalty term depends upon the model estimation method used through the estimation variance v:

    v_YuleWalker(N, i) = (N - i) / (N (N + 2)),    i \neq 0
    v_Burg(N, i)       = 1 / (N + 1 - i),          i \neq 0
    v_LSFB(N, i)       = 1 / (N + 1.5 - 1.5 i),    i \neq 0
    v_LSF(N, i)        = 1 / (N + 2 - 2 i),        i \neq 0    (3)
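When evaluating a single model, these estimation variances translate directly into code. A minimal sketch, with function names of my choosing rather than Broersen's:

```python
def v_yule_walker(N, i):
    # (N - i) / (N (N + 2)), valid for i != 0
    return (N - i) / (N * (N + 2))

def v_burg(N, i):
    # 1 / (N + 1 - i), valid for i != 0
    return 1.0 / (N + 1 - i)

def v_lsfb(N, i):
    # 1 / (N + 1.5 - 1.5 i), valid for i != 0
    return 1.0 / (N + 1.5 - 1.5 * i)

def v_lsf(N, i):
    # 1 / (N + 2 - 2 i), valid for i != 0
    return 1.0 / (N + 2 - 2 * i)
```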

1 Broersen, P. M. T. Finite sample criteria for autoregressive order selection. IEEE Transactions on Signal Processing 48 (December 2000): 3550-3558. http://dx.doi.org/10.1109/78.887047.


Here LSFB and LSF are shorthand for least squares estimation minimizing both the forward and backward prediction or only the forward prediction, respectively. The estimation variance for i = 0 depends only on whether or not the sample mean has been subtracted:

    v(N, 0) = 1/N    sample mean subtracted
    v(N, 0) = 0      sample mean retained    (4)

    Infinite sample overfit penalty terms

    The method-independent generalized information criterion (GIC) has overfitting penalty

    overfit(GIC, N, p, \alpha) = \alpha p / N

independent of v_method. The Akaike information criterion (AIC) has

    overfit(AIC, N, p) = overfit(GIC, N, p, 2)    (5)

while the consistent criterion BIC and minimally consistent criterion (MCC) have

    overfit(BIC, N, p) = overfit(GIC, N, p, \ln N)    (6)
    overfit(MCC, N, p) = overfit(GIC, N, p, 2 \ln \ln N)    (7)

Additionally, Broersen uses \alpha = 3 with GIC, referring to the result as GIC(p,3). The asymptotically-corrected Akaike information criterion (AICC) of Hurvich and Tsai2 is

    overfit(AICC, N, p) = 2p / (N - p - 1).
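Assuming the conventions above, the infinite sample penalty terms are one-liners. The function names are mine, chosen for this sketch:

```python
import math

def overfit_gic(N, p, alpha):
    return alpha * p / N

def overfit_aic(N, p):
    return overfit_gic(N, p, 2.0)                          # (5)

def overfit_bic(N, p):
    return overfit_gic(N, p, math.log(N))                  # (6)

def overfit_mcc(N, p):
    return overfit_gic(N, p, 2.0 * math.log(math.log(N)))  # (7)

def overfit_aicc(N, p):
    return 2.0 * p / (N - p - 1)
```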

    Finite sample overfit penalty terms

    Finite information criterion3

The finite information criterion (FIC) is an extension of GIC meant to account for finite sample size and the estimation method employed. The FIC overfit penalty term is

    overfit(FIC, v_method, N, p, \alpha) = \alpha \sum_{i=0}^{p} v_method(N, i)
                                         = \alpha ( v(N, 0) + \sum_{i=1}^{p} v_method(N, i) )

2 Hurvich, Clifford M. and Chih-Ling Tsai. Regression and time series model selection in small samples. Biometrika 76 (June 1989): 297-307. http://dx.doi.org/10.1093/biomet/76.2.297.

3 FIC is mistakenly called the finite sample information criterion on page 3551 of Broersen 2000 but referred to correctly as the finite information criterion on page 187 of Broersen's 2006 book.


where v(N, 0) is evaluated using (4) and v_method(N, i) using (3). The factor \alpha may be chosen as in (5), (6), or (7). Again, Broersen uses \alpha = 3, calling the result FIC(p,3).

    By direct computation one finds the following:

    overfit(FIC, v_YuleWalker, N, p, \alpha) = \alpha ( v(N, 0) - p (1 - 2N + p) / (2 N (N + 2)) )

    overfit(FIC, v_Burg, N, p, \alpha) = \alpha ( v(N, 0) + \psi(N + 1) - \psi(N + 1 - p) )

    overfit(FIC, v_LSFB, N, p, \alpha) = \alpha ( v(N, 0) + (2/3) ( \psi((3 + 2N)/3) - \psi((3 + 2N - 3p)/3) ) )

    overfit(FIC, v_LSF, N, p, \alpha) = \alpha ( v(N, 0) + (1/2) ( \psi((2 + N)/2) - \psi((2 + N - 2p)/2) ) )
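The Yule-Walker closed form involves no special functions and is easily checked against the defining sum. A small numerical sketch, with function names of my own choosing:

```python
def v_yule_walker(N, i):
    return (N - i) / (N * (N + 2))

def fic_yw_sum(N, p, alpha, v0):
    # Defining form: alpha times the summed estimation variances, orders 0..p.
    return alpha * (v0 + sum(v_yule_walker(N, i) for i in range(1, p + 1)))

def fic_yw_closed(N, p, alpha, v0):
    # Closed form: alpha * ( v(N,0) - p (1 - 2N + p) / (2 N (N + 2)) )
    return alpha * (v0 - p * (1 - 2 * N + p) / (2 * N * (N + 2)))

N, p, alpha = 50, 6, 3.0
direct = fic_yw_sum(N, p, alpha, 1.0 / N)
closed = fic_yw_closed(N, p, alpha, 1.0 / N)
```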

The simplifications underneath the Burg, LSFB, and LSF results use that

    \sum_{i=1}^{p} 1/(N + a - a i) = \sum_{i=0}^{p-1} 1/(N - a i)
                                   = (1/a) \sum_{i=0}^{p-1} 1/(N/a - i)
                                   = (1/a) ( \psi(N/a + 1) - \psi(N/a - p + 1) )

holds for all nonzero a \in \mathbb{R} because the digamma function \psi telescopes according to

    \psi(x + 1) = 1/x + \psi(x)    \implies    \psi(x + k) - \psi(x) = \sum_{i=0}^{k-1} 1/(x + i).

For strictly positive abscissae, \psi may be numerically evaluated following Bernardo.4
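When no library \psi is at hand, a short recurrence-plus-asymptotic-series evaluation (in the spirit of, though not identical to, Bernardo's AS 103) suffices to confirm the telescoping identity numerically. This is a sketch for x > 0 only:

```python
import math

def digamma(x):
    """psi(x) for x > 0 via psi(x) = psi(x + 1) - 1/x plus an asymptotic tail."""
    assert x > 0
    result = 0.0
    while x < 10.0:            # shift the argument upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x \
              - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def lhs(N, a, p):
    # Direct sum over 1/(N + a - a i), i = 1..p
    return sum(1.0 / (N + a - a * i) for i in range(1, p + 1))

def rhs(N, a, p):
    # Telescoped digamma form
    return (digamma(N / a + 1) - digamma(N / a - p + 1)) / a
```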

    Finite sample information criterion

The finite sample information criterion (FSIC) is a finite sample approximation to the Kullback-Leibler discrepancy.5 FSIC has the overfit penalty term

    overfit(FSIC, v_method, N, p) = \prod_{i=0}^{p} (1 + v_method(N, i)) / (1 - v_method(N, i)) - 1
                                  = ((1 + v(N, 0)) / (1 - v(N, 0))) \prod_{i=1}^{p} (1 + v_method(N, i)) / (1 - v_method(N, i)) - 1.    (9)

4 Bernardo, J. M. Algorithm AS 103: Psi (digamma) function. Journal of the Royal Statistical Society. Series C (Applied Statistics) 25 (1976). http://www.jstor.org/stable/2347257.

5 Presumably FSIC could be related, through the Kullback symmetric divergence, to the KICc and AKICc criteria proposed by Seghouane, A. K. and M. Bekara. A Small Sample Model Selection Criterion Based on Kullback's Symmetric Divergence. IEEE Transactions on Signal Processing 52 (December 2004): 3314-3323. http://dx.doi.org/10.1109/TSP.2004.837416.


The product in the context of the Yule-Walker estimation may be reexpressed as

    \prod_{i=1}^{p} (1 + v_YuleWalker(N, i)) / (1 - v_YuleWalker(N, i))
        = \prod_{i=1}^{p} (N^2 + 3N - i) / (N^2 + N + i)
        = (-1)^p (1 - 3N - N^2)_p / (1 + N + N^2)_p
        = (N^2 + 3N - p)_p / (1 + N + N^2)_p    (10)

where the rising factorial is denoted by the Pochhammer symbol

    (x)_k = \Gamma(x + k) / \Gamma(x).

When x is a negative integer and \Gamma is therefore undefined, the limiting value of the ratio is implied. The product in the context of the Burg, LSFB, or LSF estimation methods becomes

    \prod_{i=1}^{p} (1 + v_{Burg|LSFB|LSF}(N, i)) / (1 - v_{Burg|LSFB|LSF}(N, i))
        = \prod_{i=1}^{p} (N + a(1 - i) + 1) / (N + a(1 - i) - 1)
        = (-(1 + N)/a)_p / ((1 - N)/a)_p    (11)

where a \in \mathbb{R} is a placeholder for a method-specific constant. Routines for computing the Pochhammer symbol may be found in, for example, SLATEC6 or the GNU Scientific Library.7 In particular, both suggested sources handle negative integer input correctly.
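When neither library is available, a rising factorial adequate for the finite products above is a short loop, and (10) can then be confirmed numerically for sample values. The helper names are mine:

```python
def poch(x, k):
    """Rising factorial (x)_k = x (x + 1) ... (x + k - 1).

    For integer k >= 0 this finite product is defined even where
    Gamma(x + k) / Gamma(x) is formally undefined (negative integer x).
    """
    result = 1.0
    for j in range(k):
        result *= x + j
    return result

def yw_product(N, p):
    # Left-hand side of (10), evaluated term by term.
    result = 1.0
    for i in range(1, p + 1):
        result *= (N * N + 3 * N - i) / (N * N + N + i)
    return result

N, p = 10, 3
lhs10 = yw_product(N, p)
rhs10 = poch(N * N + 3 * N - p, p) / poch(1 + N + N * N, p)
```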

By direct substitution of (10) or (11) into (9) one obtains:

    overfit(FSIC, v_YuleWalker, N, p) = ((1 + v(N, 0)) / (1 - v(N, 0))) \cdot (N^2 + 3N - p)_p / (1 + N + N^2)_p - 1

    overfit(FSIC, v_Burg, N, p) = ((1 + v(N, 0)) / (1 - v(N, 0))) \cdot (-1 - N)_p / (1 - N)_p - 1

    overfit(FSIC, v_LSFB, N, p) = ((1 + v(N, 0)) / (1 - v(N, 0))) \cdot ((-2 - 2N)/3)_p / ((2 - 2N)/3)_p - 1

    overfit(FSIC, v_LSF, N, p) = ((1 + v(N, 0)) / (1 - v(N, 0))) \cdot ((-1 - N)/2)_p / ((1 - N)/2)_p - 1
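As a spot check, the Burg closed form agrees with direct evaluation of the product in (9). A sketch with helper names of my own:

```python
def poch(x, k):
    # Rising factorial (x)_k as a finite product
    result = 1.0
    for j in range(k):
        result *= x + j
    return result

def fsic_burg_direct(N, p, v0):
    # Equation (9) with v_Burg(N, i) = 1/(N + 1 - i)
    prod = (1 + v0) / (1 - v0)
    for i in range(1, p + 1):
        vi = 1.0 / (N + 1 - i)
        prod *= (1 + vi) / (1 - vi)
    return prod - 1.0

def fsic_burg_closed(N, p, v0):
    # Closed form: (1 + v0)/(1 - v0) * (-1 - N)_p / (1 - N)_p - 1
    return (1 + v0) / (1 - v0) * poch(-1.0 - N, p) / poch(1.0 - N, p) - 1.0

N, p = 20, 3
direct = fsic_burg_direct(N, p, 1.0 / N)
closed = fsic_burg_closed(N, p, 1.0 / N)
```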

    Combined information criterion

The combined information criterion (CIC) takes the behavior of FIC(p,3) at low orders and FSIC at high orders. For any estimation method, CIC has the overfit penalty term

    overfit(CIC, v_method, N, p) = max{ overfit(FSIC, v_method, N, p), overfit(FIC, v_method, N, p, 3) }.
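Evaluated by direct sums and products, no closed forms required, CIC is then just a maximum. A sketch for the Burg case assuming the sample mean has been subtracted; the function name is mine:

```python
def cic_burg(N, p):
    v0 = 1.0 / N                        # v(N, 0), sample mean subtracted
    v = [1.0 / (N + 1 - i) for i in range(1, p + 1)]
    fic3 = 3.0 * (v0 + sum(v))          # FIC(p,3) penalty, direct sum
    fsic = (1 + v0) / (1 - v0)          # FSIC penalty, direct product
    for vi in v:
        fsic *= (1 + vi) / (1 - vi)
    fsic -= 1.0
    return max(fsic, fic3)
```

At low orders the FIC(p,3) term dominates the maximum, while at orders approaching N the FSIC product grows much faster, which is exactly the switch in behavior CIC is designed to capture.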

6 Vandevender, W. H. and K. H. Haskell. The SLATEC mathematical subroutine library. ACM SIGNUM Newsletter 17 (September 1982): 16-21. http://dx.doi.org/10.1145/1057594.1057595.

7 M. Galassi et al. GNU Scientific Library Reference Manual (3rd Ed.), ISBN 0954612078. http://www.gnu.org/software/gsl/.
