finite sample criteria for autoregressive model order selection


7/29/2019 Finite sample criteria for autoregressive model order selection

    Finite sample criteria for autoregressive order selection

This document details autoregressive model selection criteria following Broersen.1 Emphasis is placed on converting the formulas into forms efficiently computable when evaluating a single model. When evaluating a hierarchy of models, computing with intermediate results may be more efficient.

    Setting

An AR(K) process and its AR(p) model are given by

    x_n + a_1 x_{n-1} + \cdots + a_K x_{n-K} = \varepsilon_n
    x_n + \hat{a}_1 x_{n-1} + \cdots + \hat{a}_p x_{n-p} = \hat{\varepsilon}_n    (1)

in which \varepsilon_n \sim N(0, \sigma_\varepsilon^2) and \hat{\varepsilon}_n \sim N(0, \hat{\sigma}_\varepsilon^2). Model selection criteria for evaluating which of several candidates most parsimoniously fits an AR(K) process generally have the form

    criterion(v_method, N, p, \alpha) = \ln residual(p, v_method) + overfit(criterion, v_method, N, p, \alpha).    (2)

Among all candidates and using a given criterion, the best model minimizes the criterion. Here, N represents the number of samples used to estimate model parameters, p denotes the order of the estimated model, v_method = v_method(N, i) is the method-specific estimation variance for model order i, and \alpha is an optional factor with a criterion-dependent meaning. When estimating \hat{a}_1, \ldots, \hat{a}_p given sample data x_n, the residual variance is

    residual(v_method, p) = residual(p) = \hat{\sigma}_\varepsilon^2.

Therefore the left term in (2) penalizes misfitting the data independently of the estimation method used. One may therefore distinguish among criteria using only the overfitting penalty term, namely overfit(criterion, v_method, N, p, \alpha).
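The selection rule implied by (2) can be sketched as follows. The function and variable names here are illustrative, not Broersen's notation, and the generic `overfit` callable stands in for any of the penalty terms discussed below:

```python
import math

def select_order(residuals, overfit, max_p):
    """Return the order p in 0..max_p minimizing ln(residual(p)) + overfit(p)."""
    best_p, best_value = 0, math.inf
    for p in range(max_p + 1):
        value = math.log(residuals[p]) + overfit(p)
        if value < best_value:
            best_p, best_value = p, value
    return best_p

# Toy data: the residual variance stops improving after order 2, so any
# penalty increasing in p selects p = 2.
residuals = [4.0, 2.0, 1.0, 0.99, 0.985]
best = select_order(residuals, lambda p: 0.1 * p, max_p=4)
```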

In Broersen's work, the penalty term depends upon the model estimation method used through the estimation variance v:

    v_YuleWalker(N, i) = (N - i) / (N (N + 2)),    i \neq 0
    v_Burg(N, i)       = 1 / (N + 1 - i),          i \neq 0
    v_LSFB(N, i)       = 1 / (N + 1.5 - 1.5 i),    i \neq 0
    v_LSF(N, i)        = 1 / (N + 2 - 2 i),        i \neq 0    (3)
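When evaluating a single model, these estimation variances translate directly into code. A minimal sketch, with function names of my choosing rather than Broersen's:

```python
def v_yule_walker(N, i):
    # (N - i) / (N (N + 2)), valid for i != 0
    return (N - i) / (N * (N + 2))

def v_burg(N, i):
    # 1 / (N + 1 - i), valid for i != 0
    return 1.0 / (N + 1 - i)

def v_lsfb(N, i):
    # 1 / (N + 1.5 - 1.5 i), valid for i != 0
    return 1.0 / (N + 1.5 - 1.5 * i)

def v_lsf(N, i):
    # 1 / (N + 2 - 2 i), valid for i != 0
    return 1.0 / (N + 2 - 2 * i)
```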

1 Broersen, P. M. T. Finite sample criteria for autoregressive order selection. IEEE Transactions on Signal Processing 48 (December 2000): 3550-3558. http://dx.doi.org/10.1109/78.887047.


Here LSFB and LSF are shorthand for least squares estimation minimizing both the forward and backward prediction or only the forward prediction, respectively. The estimation variance for i = 0 depends only on whether or not the sample mean has been subtracted:

    v(N, 0) = 1/N    sample mean subtracted
    v(N, 0) = 0      sample mean retained    (4)

    Infinite sample overfit penalty terms

    The method-independent generalized information criterion (GIC) has overfitting penalty

    overfit(GIC, N, p, \alpha) = \alpha p / N

independent of v_method. The Akaike information criterion (AIC) has

    overfit(AIC, N, p) = overfit(GIC, N, p, 2)    (5)

while the consistent criterion BIC and minimally consistent criterion (MCC) have

    overfit(BIC, N, p) = overfit(GIC, N, p, \ln N)    (6)
    overfit(MCC, N, p) = overfit(GIC, N, p, 2 \ln \ln N)    (7)

Additionally, Broersen uses \alpha = 3 with GIC, referring to the result as GIC(p,3). The asymptotically-corrected Akaike information criterion (AICC) of Hurvich and Tsai2 is

    overfit(AICC, N, p) = 2p / (N - p - 1).
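Assuming the conventions above, the infinite sample penalty terms are one-liners. The function names are mine, chosen for this sketch:

```python
import math

def overfit_gic(N, p, alpha):
    return alpha * p / N

def overfit_aic(N, p):
    return overfit_gic(N, p, 2.0)                          # (5)

def overfit_bic(N, p):
    return overfit_gic(N, p, math.log(N))                  # (6)

def overfit_mcc(N, p):
    return overfit_gic(N, p, 2.0 * math.log(math.log(N)))  # (7)

def overfit_aicc(N, p):
    return 2.0 * p / (N - p - 1)
```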

    Finite sample overfit penalty terms

    Finite information criterion3

The finite information criterion (FIC) is an extension of GIC meant to account for finite sample size and the estimation method employed. The FIC overfit penalty term is

    overfit(FIC, v_method, N, p, \alpha) = \alpha \sum_{i=0}^{p} v_method(N, i)
                                         = \alpha ( v(N, 0) + \sum_{i=1}^{p} v_method(N, i) )

2 Hurvich, Clifford M. and Chih-Ling Tsai. Regression and time series model selection in small samples. Biometrika 76 (June 1989): 297-307. http://dx.doi.org/10.1093/biomet/76.2.297.

3 FIC is mistakenly called the finite sample information criterion on page 3551 of Broersen 2000 but referred to correctly as the finite information criterion on page 187 of Broersen's 2006 book.


where v(N, 0) is evaluated using (4) and v_method(N, i) using (3). The factor \alpha may be chosen as in (5), (6), or (7). Again, Broersen uses \alpha = 3, calling the result FIC(p,3).

    By direct computation one finds the following:

    overfit(FIC, v_YuleWalker, N, p, \alpha) = \alpha ( v(N, 0) - p (1 - 2N + p) / (2 N (N + 2)) )

    overfit(FIC, v_Burg, N, p, \alpha) = \alpha ( v(N, 0) + \psi(N + 1) - \psi(N + 1 - p) )

    overfit(FIC, v_LSFB, N, p, \alpha) = \alpha ( v(N, 0) + (2/3) ( \psi((3 + 2N)/3) - \psi((3 + 2N - 3p)/3) ) )

    overfit(FIC, v_LSF, N, p, \alpha) = \alpha ( v(N, 0) + (1/2) ( \psi((2 + N)/2) - \psi((2 + N - 2p)/2) ) )
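The Yule-Walker closed form involves no special functions and is easily checked against the defining sum. A small numerical sketch, with function names of my own choosing:

```python
def v_yule_walker(N, i):
    return (N - i) / (N * (N + 2))

def fic_yw_sum(N, p, alpha, v0):
    # Defining form: alpha times the summed estimation variances, orders 0..p.
    return alpha * (v0 + sum(v_yule_walker(N, i) for i in range(1, p + 1)))

def fic_yw_closed(N, p, alpha, v0):
    # Closed form: alpha * ( v(N,0) - p (1 - 2N + p) / (2 N (N + 2)) )
    return alpha * (v0 - p * (1 - 2 * N + p) / (2 * N * (N + 2)))

N, p, alpha = 50, 6, 3.0
direct = fic_yw_sum(N, p, alpha, 1.0 / N)
closed = fic_yw_closed(N, p, alpha, 1.0 / N)
```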

The simplifications underneath the Burg, LSFB, and LSF results use that

    \sum_{i=1}^{p} 1/(N + a - a i) = \sum_{i=0}^{p-1} 1/(N - a i)
                                   = (1/a) \sum_{i=0}^{p-1} 1/(N/a - i)
                                   = (1/a) ( \psi(N/a + 1) - \psi(N/a - p + 1) )

holds for all nonzero a \in \mathbb{R} because the digamma function \psi telescopes according to

    \psi(x + 1) = 1/x + \psi(x)    \implies    \psi(x + k) - \psi(x) = \sum_{i=0}^{k-1} 1/(x + i).

For strictly positive abscissae, \psi may be numerically evaluated following Bernardo.4
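When no library \psi is at hand, a short recurrence-plus-asymptotic-series evaluation (in the spirit of, though not identical to, Bernardo's AS 103) suffices to confirm the telescoping identity numerically. This is a sketch for x > 0 only:

```python
import math

def digamma(x):
    """psi(x) for x > 0 via psi(x) = psi(x + 1) - 1/x plus an asymptotic tail."""
    assert x > 0
    result = 0.0
    while x < 10.0:            # shift the argument upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x \
              - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def lhs(N, a, p):
    # Direct sum over 1/(N + a - a i), i = 1..p
    return sum(1.0 / (N + a - a * i) for i in range(1, p + 1))

def rhs(N, a, p):
    # Telescoped digamma form
    return (digamma(N / a + 1) - digamma(N / a - p + 1)) / a
```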

    Finite sample information criterion

The finite sample information criterion (FSIC) is a finite sample approximation to the Kullback-Leibler discrepancy.5 FSIC has the overfit penalty term

    overfit(FSIC, v_method, N, p) = \prod_{i=0}^{p} (1 + v_method(N, i)) / (1 - v_method(N, i)) - 1
                                  = ((1 + v(N, 0)) / (1 - v(N, 0))) \prod_{i=1}^{p} (1 + v_method(N, i)) / (1 - v_method(N, i)) - 1.    (9)

4 Bernardo, J. M. Algorithm AS 103: Psi (digamma) function. Journal of the Royal Statistical Society. Series C (Applied Statistics) 25 (1976). http://www.jstor.org/stable/2347257.

5 Presumably FSIC could be related, through the Kullback symmetric divergence, to the KICc and AKICc criteria proposed by Seghouane, A. K. and M. Bekara. A Small Sample Model Selection Criterion Based on Kullback's Symmetric Divergence. IEEE Transactions on Signal Processing 52 (December 2004): 3314-3323. http://dx.doi.org/10.1109/TSP.2004.837416.


The product in the context of the Yule-Walker estimation may be reexpressed as

    \prod_{i=1}^{p} (1 + v_YuleWalker(N, i)) / (1 - v_YuleWalker(N, i))
        = \prod_{i=1}^{p} (N^2 + 3N - i) / (N^2 + N + i)
        = (-1)^p (1 - 3N - N^2)_p / (1 + N + N^2)_p
        = (N^2 + 3N - p)_p / (1 + N + N^2)_p    (10)

where the rising factorial is denoted by the Pochhammer symbol

    (x)_k = \Gamma(x + k) / \Gamma(x).

When x is a negative integer and \Gamma is therefore undefined, the limiting value of the ratio is implied. The product in the context of the Burg, LSFB, or LSF estimation methods becomes

    \prod_{i=1}^{p} (1 + v_{Burg|LSFB|LSF}(N, i)) / (1 - v_{Burg|LSFB|LSF}(N, i))
        = \prod_{i=1}^{p} (N + a(1 - i) + 1) / (N + a(1 - i) - 1)
        = (-(1 + N)/a)_p / ((1 - N)/a)_p    (11)

where a \in \mathbb{R} is a placeholder for a method-specific constant. Routines for computing the Pochhammer symbol may be found in, for example, SLATEC6 or the GNU Scientific Library.7 In particular, both suggested sources handle negative integer input correctly.
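When neither library is available, a rising factorial adequate for the finite products above is a short loop, and (10) can then be confirmed numerically for sample values. The helper names are mine:

```python
def poch(x, k):
    """Rising factorial (x)_k = x (x + 1) ... (x + k - 1).

    For integer k >= 0 this finite product is defined even where
    Gamma(x + k) / Gamma(x) is formally undefined (negative integer x).
    """
    result = 1.0
    for j in range(k):
        result *= x + j
    return result

def yw_product(N, p):
    # Left-hand side of (10), evaluated term by term.
    result = 1.0
    for i in range(1, p + 1):
        result *= (N * N + 3 * N - i) / (N * N + N + i)
    return result

N, p = 10, 3
lhs10 = yw_product(N, p)
rhs10 = poch(N * N + 3 * N - p, p) / poch(1 + N + N * N, p)
```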

By direct substitution of (10) or (11) into (9) one obtains:

    overfit(FSIC, v_YuleWalker, N, p) = ((1 + v(N, 0)) / (1 - v(N, 0))) \cdot (N^2 + 3N - p)_p / (1 + N + N^2)_p - 1

    overfit(FSIC, v_Burg, N, p) = ((1 + v(N, 0)) / (1 - v(N, 0))) \cdot (-1 - N)_p / (1 - N)_p - 1

    overfit(FSIC, v_LSFB, N, p) = ((1 + v(N, 0)) / (1 - v(N, 0))) \cdot ((-2 - 2N)/3)_p / ((2 - 2N)/3)_p - 1

    overfit(FSIC, v_LSF, N, p) = ((1 + v(N, 0)) / (1 - v(N, 0))) \cdot ((-1 - N)/2)_p / ((1 - N)/2)_p - 1
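As a spot check, the Burg closed form agrees with direct evaluation of the product in (9). A sketch with helper names of my own:

```python
def poch(x, k):
    # Rising factorial (x)_k as a finite product
    result = 1.0
    for j in range(k):
        result *= x + j
    return result

def fsic_burg_direct(N, p, v0):
    # Equation (9) with v_Burg(N, i) = 1/(N + 1 - i)
    prod = (1 + v0) / (1 - v0)
    for i in range(1, p + 1):
        vi = 1.0 / (N + 1 - i)
        prod *= (1 + vi) / (1 - vi)
    return prod - 1.0

def fsic_burg_closed(N, p, v0):
    # Closed form: (1 + v0)/(1 - v0) * (-1 - N)_p / (1 - N)_p - 1
    return (1 + v0) / (1 - v0) * poch(-1.0 - N, p) / poch(1.0 - N, p) - 1.0

N, p = 20, 3
direct = fsic_burg_direct(N, p, 1.0 / N)
closed = fsic_burg_closed(N, p, 1.0 / N)
```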

    Combined information criterion

The combined information criterion (CIC) takes the behavior of FIC(p,3) at low orders and FSIC at high orders. For any estimation method, CIC has the overfit penalty term

    overfit(CIC, v_method, N, p) = max{ overfit(FSIC, v_method, N, p), overfit(FIC, v_method, N, p, 3) }.
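Evaluated by direct sums and products, no closed forms required, CIC is then just a maximum. A sketch for the Burg case assuming the sample mean has been subtracted; the function name is mine:

```python
def cic_burg(N, p):
    v0 = 1.0 / N                        # v(N, 0), sample mean subtracted
    v = [1.0 / (N + 1 - i) for i in range(1, p + 1)]
    fic3 = 3.0 * (v0 + sum(v))          # FIC(p,3) penalty, direct sum
    fsic = (1 + v0) / (1 - v0)          # FSIC penalty, direct product
    for vi in v:
        fsic *= (1 + vi) / (1 - vi)
    fsic -= 1.0
    return max(fsic, fic3)
```

At low orders the FIC(p,3) term dominates the maximum, while at orders approaching N the FSIC product grows much faster, which is exactly the switch in behavior CIC is designed to capture.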

6 Vandevender, W. H. and K. H. Haskell. The SLATEC mathematical subroutine library. ACM SIGNUM Newsletter 17 (September 1982): 16-21. http://dx.doi.org/10.1145/1057594.1057595.

7 M. Galassi et al. GNU Scientific Library Reference Manual (3rd Ed.), ISBN 0954612078. http://www.gnu.org/software/gsl/.
