Modeling with ARMA Processes: I
• goal: determine which ARMA(p, q) process is best model for observed time series x_1, ..., x_n
• tasks at hand include
  − determining p and q (order selection)
  − estimating process mean (easily done!), coefficients φ_j & θ_j (not so easy) and white noise variance σ² (relatively easy)
  − subjecting selected model to goodness-of-fit tests
• note: will assume that, if need be, series x_1, ..., x_n has been adjusted so that it can be regarded as realization of zero mean stationary process (usual procedure: take sample mean x̄′ = (1/n) Σ_{t=1}^n x′_t of original series x′_1, ..., x′_n and set x_t = x′_t − x̄′)
BD–137, CC–149, SS–121 XIII–1
Modeling with ARMA Processes: II
• with p & q assumed initially to be known, will advocate Gaussian maximum likelihood (ML) estimators for φ_j, θ_j & σ²
• requires use of nonlinear optimization procedure, for which need good initial estimates of coefficients φ_j & θ_j
• can base initial estimates on easier-to-compute estimators
  − Yule–Walker (Y–W) estimator (good for AR(p) case)
  − Burg estimator (for AR(p) also)
  − innovations algorithm (handles MA(q) and ARMA(p, q))
  − Hannan–Rissanen (adapts Y–W to handle ARMA(p, q))
• will now describe these estimators, along with some preliminary discussion about order selection
BD–138, CC–149, SS–121 XIII–2
Yule–Walker Estimation: I
• assume causal AR(p) model φ(B)X_t = Z_t, i.e.,
    X_t − Σ_{j=1}^p φ_j X_{t−j} = Z_t,  (∗)
  with {Z_t} ∼ WN(0, σ²)
• can develop set of p linear equations linking
    φ = [φ_1, ..., φ_p]′
  to ACVF values γ(0), γ(1), ..., γ(p−1)
• Y–W estimators gotten by substituting usual estimator γ̂(h) for γ(h) in p equations (so-called 'moment matching')
• one additional equation needed to estimate σ² via this scheme
BD–139, CC–149, SS–121 XIII–3
Yule–Walker Estimation: II
• with h ≥ 0, multiply both sides of (∗) by X_{t−h}:
    X_t X_{t−h} − Σ_{j=1}^p φ_j X_{t−j} X_{t−h} = Z_t X_{t−h}
• take expectations to get
    γ(h) − Σ_{j=1}^p φ_j γ(h−j) = E{Z_t X_{t−h}} = { σ², h = 0;  0, h ≥ 1,  (∗∗)
  because causality allows us to write
    X_t = Σ_{j=0}^∞ ψ_j Z_{t−j} and hence E{Z_t X_{t−h}} = Σ_{j=0}^∞ ψ_j E{Z_t Z_{t−h−j}}
  (recall that ψ_0 = 1)
BD–139, CC–149, SS–121 XIII–4
Yule–Walker Estimation: III
• leads to Σ_{j=1}^p φ_j γ(h−j) = γ(h) for h = 1, ..., p:
    φ_1 γ(0)   + φ_2 γ(1)   + ··· + φ_p γ(p−1) = γ(1)
    φ_1 γ(1)   + φ_2 γ(0)   + ··· + φ_p γ(p−2) = γ(2)
    ...
    φ_1 γ(p−1) + φ_2 γ(p−2) + ··· + φ_p γ(0)   = γ(p)
• in matrix notation we have Γ_p φ = γ_p, where
    Γ_p = [ γ(0)    γ(1)    ···  γ(p−1)
            γ(1)    γ(0)    ···  γ(p−2)
            ⋮       ⋮       ⋱    ⋮
            γ(p−1)  γ(p−2)  ···  γ(0) ],
    φ = [φ_1, φ_2, ..., φ_p]′,  γ_p = [γ(1), γ(2), ..., γ(p)]′
• to (mis)quote Yogi Berra: 'This is like déjà vu all over again!'
BD–139, CC–149, SS–121 XIII–5
Yule–Walker Estimation: IV
• exactly same matrix equation arose when trying to find coefficients of best linear predictor of X_{n+1} given X_n, ..., X_1 for a general stationary process {X_t} (i.e., not necessarily AR(p))
• given time series x_1, ..., x_n, form usual estimate of ACVF:
    γ̂(h) = (1/n) Σ_{t=1}^{n−|h|} x_t x_{t+|h|}
• with Γ̂_p & γ̂_p formed by replacing γ(h)'s in Γ_p & γ_p by γ̂(h)'s, Y–W estimator φ̂ of AR(p) coefficients φ given by
    φ̂ = Γ̂_p^{−1} γ̂_p,
  where inverse Γ̂_p^{−1} exists as long as time series isn't 'boring'
BD–139, 140, CC–149, SS–121 XIII–6
Yule–Walker Estimation: V
• can solve equation using Levinson–Durbin recursions
• note: provides Y–W estimates not only for order p, but also for all lower orders 1, ..., p−1
• once φ̂ has been computed, can return to h = 0 case of (∗∗), namely,
    σ² = γ(0) − Σ_{j=1}^p φ_j γ(j), to get estimator σ̂² = γ̂(0) − Σ_{j=1}^p φ̂_j γ̂(j)
  (similar equation arose for getting MSE of best linear predictor)
• fitted model
    X_t − φ̂_1 X_{t−1} − ··· − φ̂_p X_{t−p} = Z_t,  {Z_t} ∼ WN(0, σ̂²),
  is guaranteed to be causal!
BD–139, 140, CC–149, SS–121 XIII–7
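The two overheads above translate directly into a few lines of R; the sketch below is a minimal illustration (not the numerically smarter Levinson–Durbin route), where x is assumed to be a demeaned series and p the assumed order:

    # Yule-Walker estimates for a zero-mean AR(p): solve Gamma.hat phi = gamma.hat
    yule_walker <- function(x, p) {
      n <- length(x)
      # sample ACVF gamma.hat(0), ..., gamma.hat(p) with the 1/n convention
      gam <- sapply(0:p, function(h) sum(x[1:(n - h)] * x[(1 + h):n]) / n)
      phi <- solve(toeplitz(gam[1:p]), gam[2:(p + 1)])   # phi.hat
      sigma2 <- gam[1] - sum(phi * gam[2:(p + 1)])       # sigma.hat^2
      list(phi = phi, sigma2 = sigma2)
    }

As a sanity check, yule_walker(x, p)$phi should agree with ar.yw(x, order.max = p, aic = FALSE)$ar; the variance reported by ar.yw uses a slightly different convention (a point the recruitment example returns to).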
Yule–Walker Estimation: VI
• fitted model has theoretical ACVF that is identical to estimates γ̂(h) at lags h = 0, 1, ..., p, but in general is different at higher lags
• as an example, let's revisit the sunspot time series:
  − fit AR models of orders p = 1, ..., 8 & then 29 using Y–W
  − compare sample ACVF to the theoretical ACVFs corresponding to 9 fitted AR models
BD–141, CC–149, SS–121 XIII–8
Sunspots (1749–1963)
[figure: sunspot series x_t vs. year, 1749–1963]
BD–99 I–7
Sample PACF for Sunspots
[figure: sample PACF vs. h (lag), h = 0 to 40]
BD–99 XII–13
Sample and Fitted AR(1) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(1) ACVF vs. h (lag)]
XIII–9
Sample and Fitted AR(2) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(2) ACVF vs. h (lag)]
XIII–10
Sample and Fitted AR(3) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(3) ACVF vs. h (lag)]
XIII–11
Sample and Fitted AR(4) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(4) ACVF vs. h (lag)]
XIII–12
Sample and Fitted AR(5) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(5) ACVF vs. h (lag)]
XIII–13
Sample and Fitted AR(6) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(6) ACVF vs. h (lag)]
XIII–14
Sample and Fitted AR(7) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(7) ACVF vs. h (lag)]
XIII–15
Sample and Fitted AR(8) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(8) ACVF vs. h (lag)]
XIII–16
Sample and Fitted AR(29) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(29) ACVF vs. h (lag)]
XIII–17
Yule–Walker Estimation: VII
• distribution of Y–W estimators φ̂ is approximately multivariate normal with mean φ & covariance σ²Γ_p^{−1}/n for large n
• large sample distribution of ML estimators is the same
• don't even need to worry about inverting Γ̂_p: can show that
    σ²Γ_p^{−1} = A′A − B′B = AA′ − BB′,
  where A and B are p×p lower triangular matrices whose first columns are, respectively,
    [1, −φ_1, ..., −φ_{p−1}]′ and [−φ_p, −φ_{p−1}, ..., −φ_1]′;
  A has the same element along any given diagonal, and B has a similar structure (sometimes referred to as a Toeplitz structure)
BD–141, CC–161, SS–122 XIII–18
Confidence Intervals and Regions for φ
• can use large sample distribution to get approximate confidence intervals for individual φ_j's or confidence region for vector φ
• approximate 95% confidence interval for φ_j given by
    [φ̂_j − 1.96 v̂_{j,j}^{1/2}/√n, φ̂_j + 1.96 v̂_{j,j}^{1/2}/√n],
  where v̂_{j,j} is jth diagonal element of σ̂²Γ̂_p^{−1}
• letting χ²_{0.95}(p) denote 95% quantile of chi-squared distribution with p degrees of freedom, approximate 95% confidence region for φ is the set of all φ's such that
    (φ̂ − φ)′ Γ̂_p (φ̂ − φ) ≤ χ²_{0.95}(p) σ̂²/n
BD–142, 143 XIII–19
Yule–Walker Estimation and Order Selection: I
• when Y–W is used to estimate coefficients for AR(h) model
    X_t − φ_1 X_{t−1} − ··· − φ_h X_{t−h} = Z_t,
  estimate φ̂_h is same as φ̂_{h,h} (hth member of sample PACF)
• as noted before, large sample theory suggests that φ̂_{h,h} is approximately N(0, 1/n) for h > p (the true AR model order)
• given estimates φ̂_{h,h} out to some maximum order, say H, Brockwell & Davis suggest setting p to be smallest m such that |φ̂_{h,h}| < 1.96/√n for m < h ≤ H
• obvious danger: sampling variability might result in p being set too high
  − with H = 40 in sunspot example, would select p = 29, which might not be a reasonable choice
BD–96, 141, CC–115, SS–122 XIII–20
Yule–Walker Estimation and Order Selection: II
• another approach is to select order that minimizes AICC statistic (bias-corrected version of Akaike's information criterion):
    AICC = −2 ln(L(φ̂, S(φ̂)/n)) + 2(p+1)n/(n−p−2),
  where L is Gaussian likelihood function, and S(φ) is defined below
• given zero-mean Gaussian AR(p) time series X_n with covariance matrix Γ_n (implicitly dependent on φ & σ²), can write
    L(Γ_n) = (2π)^{−n/2} (det Γ_n)^{−1/2} exp(−(1/2) X′_n Γ_n^{−1} X_n)
  and hence
    −2 ln(L(Γ_n)) = n ln(2π) + ln(det Γ_n) + X′_n Γ_n^{−1} X_n
BD–141, 142, 158, CC–130, SS–53, 153 XIII–21
Yule–Walker Estimation and Order Selection: III
• when considering ML estimation later on, will argue that
    −2 ln(L(Γ_n)) = n ln(2π) + ln(det Γ_n) + X′_n Γ_n^{−1} X_n
  can be rewritten in AR(p) case as
    −2 ln L(φ, σ²) = n ln(2πσ²) + Σ_{j=0}^{p−1} ln(r_j) + Σ_{j=1}^n (X_j − X̂_j)²/(σ² r_{j−1}),
  where r_j ≡ v_j/σ² (note: r_j = 1 for j ≥ p)
• dependence on φ is through r_j's and coefficients determining X̂_j (can get these from φ using reverse L–D recursions)
• can remove σ² by replacing it with S(φ)/n, where
    S(φ) = Σ_{j=1}^n (X_j − X̂_j)²/r_{j−1}
BD–141, 142, 158, CC–130, SS–53, 153 XIII–22
Yule–Walker Estimation and Order Selection: IV
• with removal of σ², AICC statistic becomes
    AICC = C_n + n ln(Σ_{j=1}^n (X_j − X̂_j)²/r_{j−1}) + Σ_{j=0}^{p−1} ln(r_j) + 2(p+1)n/(n−p−2),
  where C_n ≡ n + n ln(2π/n)
• note: will discuss other order selection statistics (BIC etc.) later
• let's see what order the AICC picks out for sunspot series (a code sketch follows this overhead)
BD–141, 142, 158, CC–130, SS–53, 153 XIII–23
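As a hedged sketch of AICC-based order selection in R: the loop below evaluates the criterion at ML fits produced by arima() (the overheads evaluate the likelihood at the Y–W estimates instead, so the numbers will differ slightly); x is assumed to be a demeaned series:

    # AICC for AR(p) fits, p = 1, ..., 20, using arima()'s Gaussian ML
    aicc_ar <- function(x, orders = 1:20) {
      n <- length(x)
      sapply(orders, function(p) {
        fit <- arima(x, order = c(p, 0, 0), include.mean = FALSE, method = "ML")
        -2 * fit$loglik + 2 * (p + 1) * n / (n - p - 2)
      })
    }
    # e.g., which.min(aicc_ar(x)) gives the selected order among 1, ..., 20

For high orders arima() can be slow or fail to converge, which is one reason the quickly computed Y–W fits are attractive for a first pass.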
AICC for Sunspots
[figure: AICC vs. model order, orders 0 to 40]
XIII–24
Sample and Fitted AR(9) ACVFs for Sunspots
[figure: sample ACVF and fitted AR(9) ACVF vs. h (lag)]
XIII–25
Example – Recruitment Time Series: I
• monthly measure of number of new fish entering Pacific Ocean (453 months covering 1950–87; Shumway & Stoffer got it from Roy Mendelssohn, NOAA/PFEL, who got it from Pierre Kleiber, NOAA/NMFS, who generated measures using a model ...)
SS–7 XIII–26
Recruitment Time Series (1950–1987)
[figure: recruitment series x_t vs. t (months starting with Jan 1950)]
SS–8 XIII–27
Sample ACF for Recruitment Series
[figure: sample ACF vs. h (lag), h = 0 to 40]
SS–109 XIII–28
Sample PACF for Recruitment Series
[figure: sample PACF vs. h (lag), h = 0 to 40]
SS–109 XIII–29
Sample & Fitted AR(2) ACFs for Recruitment Series
[figure: sample ACF and fitted AR(2) ACF vs. h (lag)]
XIII–30
Example – Recruitment Time Series: II
• for AR(2) model, Y–W estimates are
    φ̂ = [φ̂_1, φ̂_2]′ ≐ [1.3316, −0.4445]′ and σ̂² ≐ 94.171
  (note: R function ar gives σ̂² ≐ 94.799 ... hmmm)
• using large sample approximation that φ̂ is multivariate normal with mean φ and covariance σ²Γ_2^{−1}/n, can get 95% confidence intervals (CIs) and regions based upon
    σ̂²Γ̂_2^{−1} = σ̂² [ γ̂(0) γ̂(1); γ̂(1) γ̂(0) ]^{−1} = [ v̂_{1,1} v̂_{1,2}; v̂_{2,1} v̂_{2,2} ] ≐ [ 0.8024 −0.7396; −0.7396 0.8024 ]
• using [φ̂_j − 1.96 v̂_{j,j}^{1/2}/√n, φ̂_j + 1.96 v̂_{j,j}^{1/2}/√n] yields 95% CIs
    [1.2491, 1.4141] for φ_1 and [−0.5270, −0.3621] for φ_2
XIII–31
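These numbers can be reproduced in R; a sketch assuming the recruitment series is available as rec from the astsa package (any version of the series works the same way):

    # Y-W AR(2) fit to the recruitment series, with large-sample 95% CIs
    library(astsa)
    fit <- ar.yw(rec, order.max = 2, aic = FALSE)   # demeans by default
    fit$ar                              # approx. 1.33, -0.44
    fit$var.pred                        # ar()'s sigma.hat^2 (the 'hmmm' above)
    se <- sqrt(diag(fit$asy.var.coef))  # large-sample standard errors
    cbind(lower = fit$ar - 1.96 * se, upper = fit$ar + 1.96 * se)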
AICC for Recruitment Series
[figure: AICC vs. model order, orders 0 to 40]
XIII–32
Sample & Fitted Y–W AR(13) ACFs
[figure: sample ACF and fitted Y–W AR(13) ACF vs. h (lag)]
XIII–33
Burg’s Algorithm: I
• Y–W estimator φ̂ of φ is based on L–D recursions with γ(h) replaced by γ̂(h)
• given φ̂_{k−1} & v̂_{k−1}, recursion gives us φ̂_k & v̂_k via 3 steps
  1. get kth order partial autocorrelation:
       φ̂_{k,k} = (γ̂(k) − Σ_{j=1}^{k−1} φ̂_{k−1,j} γ̂(k−j)) / v̂_{k−1}
  2. get remaining φ̂_{k,j}'s:
       [φ̂_{k,1}, ..., φ̂_{k,k−1}]′ = [φ̂_{k−1,1}, ..., φ̂_{k−1,k−1}]′ − φ̂_{k,k} [φ̂_{k−1,k−1}, ..., φ̂_{k−1,1}]′
  3. get kth order MSE: v̂_k = v̂_{k−1}(1 − φ̂²_{k,k})
BD–147, 148 XIII–34
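The three steps map directly into code; a minimal sketch (the input gam is an ACVF vector (γ(0), ..., γ(P)) with P ≥ 2 — feed it the sample ACVF to recover the Y–W fits of every order up to P):

    # Levinson-Durbin: phi[k, 1:k] and v[k] for orders k = 1, ..., P
    levinson_durbin <- function(gam) {
      P <- length(gam) - 1
      phi <- matrix(0, P, P); v <- numeric(P)
      phi[1, 1] <- gam[2] / gam[1]              # phi_{1,1} = gamma(1)/gamma(0)
      v[1] <- gam[1] * (1 - phi[1, 1]^2)
      for (k in 2:P) {
        # step 1: kth partial autocorrelation
        phi[k, k] <- (gam[k + 1] - sum(phi[k - 1, 1:(k - 1)] * gam[k:2])) / v[k - 1]
        # step 2: remaining coefficients
        phi[k, 1:(k - 1)] <- phi[k - 1, 1:(k - 1)] - phi[k, k] * phi[k - 1, (k - 1):1]
        # step 3: kth order MSE
        v[k] <- v[k - 1] * (1 - phi[k, k]^2)
      }
      list(phi = phi, v = v)
    }

The diagonal phi[k, k] is then the (sample) PACF, which ties this recursion to the order-selection discussion above.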
Burg’s Algorithm: II
• when k = p, get Y–W estimators φ̂ = φ̂_p and σ̂² = v̂_p
• start procedure by setting
    φ̂_1 = [φ̂_{1,1}] = [γ̂(1)/γ̂(0)] and v̂_1 = γ̂(0)(1 − φ̂²_{1,1})
• sample ACVF comes into play in forming φ̂_1 and in 1st step of L–D recursions, but not in 2nd and 3rd steps
• sample ACVF just used to get PACF estimates φ̂_{1,1}, ..., φ̂_{p,p}
• kth component of PACF is a correlation coefficient:
    φ_{k,k} = corr{X_k − X̂_k, X_0 − X̂_{0|k−1}}
• Burg's algorithm is based on estimating φ_{k,k} in keeping with the above rather than via sample ACVF
BD–147, 148 XIII–35
Burg’s Algorithm: III
• let φ̂_{k−1} = [φ̂_{k−1,1}, ..., φ̂_{k−1,k−1}]′ be Burg estimator of coefficients for AR(k−1) process based on X_1, ..., X_n
• calculate forward & backward observed innovations:
    U_t^f(k−1) ≡ X_t − Σ_{j=1}^{k−1} φ̂_{k−1,j} X_{t−j},  k ≤ t ≤ n
    U_{t−k}^b(k−1) ≡ X_{t−k} − Σ_{j=1}^{k−1} φ̂_{k−1,j} X_{t−k+j},  k+1 ≤ t ≤ n+1
• can show that, for any estimator φ̂_{k,k} with φ̂_{k,1}, ..., φ̂_{k,k−1} generated by step 2 of L–D, have, for k+1 ≤ t ≤ n
    U_t^f(k) = U_t^f(k−1) − φ̂_{k,k} U_{t−k}^b(k−1)
    U_{t−k}^b(k) = U_{t−k}^b(k−1) − φ̂_{k,k} U_t^f(k−1)
BD–147, 148 XIII–36
Burg’s Algorithm: IV
• Burg’s idea: choose �k,k that minimizes
SSk(�k,k) ⌘nX
t=k+1
�!U 2
t (k) + �U 2
t�k(k)
• yields Burg’s estimator
�k,k ⌘Pn
t=k+1�!Ut(k � 1)
�Ut�k(k � 1)
12Pn
t=k+1�!U 2
t (k � 1) + �U 2
t�k(k � 1)
• compare above to following expression:
�k,k = corr {Xk � bXk,X0 � bX0|k�1}
=cov {Xk � bXk,X0 � bX0|k�1}�
var {Xk � bXk} var {X0 � bX0|k�1}�1/2
BD–147, 148 XIII–37
Burg’s Algorithm: V
• initialize with U_t^f(0) ≡ X_t and U_{t−1}^b(0) ≡ X_{t−1}
• guaranteed to have |φ̂_{k,k}| ≤ 1
• if |φ̂_{p,p}| ≠ 1, Burg estimators φ̂ = φ̂_p of coefficients φ always correspond to stationary & causal AR(p) process (same is true for Y–W, except that |φ̂_{p,p}| = 1 can't happen)
• large sample distribution for Burg same as for Y–W and ML, but Monte Carlo studies show Burg outperforming Y–W
• fitted model has theoretical ACVF that need not be identical to sample ACVF, as another visit to sunspot time series shows
BD–147, 148 XIII–38
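In R the two preliminary fits sit side by side; a short usage sketch (x is any demeaned series, order 2 as an illustration):

    # Burg vs. Yule-Walker AR(2) fits for a series x
    yw <- ar.yw(x, order.max = 2, aic = FALSE)
    bg <- ar.burg(x, order.max = 2, aic = FALSE)
    rbind(yw = yw$ar, burg = bg$ar)            # coefficient estimates
    c(yw = yw$var.pred, burg = bg$var.pred)    # white noise variance estimates
    bg$partialacf                              # Burg estimate of the PACF

The partialacf component is the Burg-based alternative to the sample PACF that the recruitment example below compares.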
Sample and Fitted AR(5) ACVFs for Sunspots
[figure: sample ACVF and fitted Burg AR(5) ACVF vs. h (lag)]
XIII–39
Sample and Fitted AR(10) ACVFs for Sunspots
[figure: sample ACVF and fitted Burg AR(10) ACVF vs. h (lag)]
XIII–40
Sample and Fitted AR(15) ACVFs for Sunspots
[figure: sample ACVF and fitted Burg AR(15) ACVF vs. h (lag)]
XIII–41
Sample and Fitted AR(20) ACVFs for Sunspots
[figure: sample ACVF and fitted Burg AR(20) ACVF vs. h (lag)]
XIII–42
Example – Recruitment Time Series: III
• reconsider recruitment time series, this time using Burg's algorithm
• can use Burg to get estimate of PACF that is an alternative to sample PACF (latter is based on Y–W)
XIII–43
Sample PACF for Recruitment Series
[figure: sample PACF vs. h (lag), h = 0 to 40]
SS–109 XIII–29
Burg Estimate of PACF for Recruitment Series
[figure: Burg PACF estimate vs. h (lag), h = 0 to 40]
XIII–44
Sample & Fitted AR(2) ACFs for Recruitment Series
[figure: sample ACF and fitted AR(2) ACF vs. h (lag)]
XIII–45
Example – Recruitment Time Series: IV
• for AR(2) model, Burg estimates are
    φ̂ = [φ̂_1, φ̂_2]′ ≐ [1.3515, −0.4620]′ and σ̂² ≐ 89.337,
  as compared to Y–W estimates:
    φ̂ = [φ̂_1, φ̂_2]′ ≐ [1.3316, −0.4445]′ and σ̂² ≐ 94.171
• can determine Burg estimator γ̂(h) of ACVF by feeding φ̂ into one of the methods for computing theoretical ARMA ACVFs
• yields
    σ̂²Γ̂_2^{−1} = σ̂² [ γ̂(0) γ̂(1); γ̂(1) γ̂(0) ]^{−1} = [ v̂_{1,1} v̂_{1,2}; v̂_{2,1} v̂_{2,2} ] ≐ [ 0.7866 −0.7271; −0.7271 0.7866 ]
  & 95% CIs [1.2698, 1.4332] for φ_1 & [−0.5436, −0.3803] for φ_2
XIII–46
95% Confidence Regions for φ (Y–W and Burg)
[figure: confidence ellipses in the (φ_1, φ_2) plane]
XIII–47
95% Confidence Regions and Causality Region for φ
[figure: confidence ellipses and AR(2) causality region in the (φ_1, φ_2) plane]
XIII–48
AICC for Recruitment Series
[figure: AICC vs. model order, orders 0 to 40]
XIII–49
Sample & Fitted Burg AR(13) ACFs
[figure: sample ACF and fitted Burg AR(13) ACF vs. h (lag)]
XIII–50
Sample & Fitted Y–W AR(13) ACFs
[figure: sample ACF and fitted Y–W AR(13) ACF vs. h (lag)]
XIII–33
Moment Matching and MA(q) Processes: I
• Y–W & Burg give preliminary estimates of φ & σ² for AR(p)
• Y–W estimator based on moment matching
• relates φ_1, ..., φ_p and σ² to ACVF values γ(0), ..., γ(p) via p+1 linear equations
• solve p equations to get φ̂, namely,
    φ̂_1 γ̂(0)   + φ̂_2 γ̂(1)   + ··· + φ̂_p γ̂(p−1) = γ̂(1)
    φ̂_1 γ̂(1)   + φ̂_2 γ̂(0)   + ··· + φ̂_p γ̂(p−2) = γ̂(2)
    ...
    φ̂_1 γ̂(p−1) + φ̂_2 γ̂(p−2) + ··· + φ̂_p γ̂(0)   = γ̂(p)
  after which get σ̂² via
    σ̂² = γ̂(0) − φ̂_1 γ̂(1) − ··· − φ̂_p γ̂(p)
• Q: is a similar scheme viable to estimate θ & σ² for MA(q)?
BD–145, 146, CC–150, SS–123 XIII–51
Moment Matching and MA(q) Processes: II
• consider invertible MA(1) model: X_t = Z_t + θZ_{t−1} with |θ| < 1 and {Z_t} ∼ WN(0, σ²)
• ACVF given by
    γ(h) = { σ²(1 + θ²), h = 0;  σ²θ, h = ±1;  0, otherwise }
• using γ(0) = σ²(1 + θ²) and γ(1) = σ²θ to express θ in terms of ACVF leads to solving nonlinear equation
    γ(1)/γ(0) = ρ(1) = θ/(1 + θ²), i.e., need to find roots of ρ(1)θ² − θ + ρ(1) = 0
BD–145, 146, CC–150, SS–123 XIII–52
Moment Matching and MA(q) Processes: III
• when ρ(1) ≠ 0, possible solutions are
    θ = (1 ± √(1 − 4ρ²(1))) / (2ρ(1)),
  which requires −1/2 ≤ ρ(1) ≤ 1/2 for θ to be real-valued (need −1/2 < ρ(1) < 1/2 to satisfy |θ| < 1)
• since ρ̂(1) need not obey this constraint, moment matching can fail to give viable estimators of θ for MA(q) process
• failure of moment matching might suggest MA(q) model is inappropriate!
• if viable moment matching estimator θ̂ exists in MA(1) case, estimator of σ² is
    σ̂² = γ̂(0)/(1 + θ̂²)
BD–145, 146, CC–150, SS–123 XIII–53
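A tiny sketch of this root computation in R (rho1 stands for a lag-one sample autocorrelation; nothing else is assumed):

    # moment-matching theta.hat for an invertible MA(1), given rho.hat(1) != 0
    ma1_moment <- function(rho1) {
      if (abs(rho1) >= 0.5) stop("no real invertible solution: |rho(1)| >= 1/2")
      (1 - sqrt(1 - 4 * rho1^2)) / (2 * rho1)   # the root with |theta| < 1
    }
    ma1_moment(-0.3659)   # about -0.4352, as in the atomic clock example below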
Innovations Algorithm: I
• alternative to moment matching is innovations algorithm (IA), which involves linear manipulation of γ̂(h)'s
• innovations representation for stationary process {X_t} is
    X_{n+1} = Σ_{j=0}^n θ_{n,j} U_{n−j+1},  n = 0, 1, 2, ...,
  where θ_{n,0} = 1, other θ_{n,j}'s given by IA, U_1 = X_1 and
    U_{n+1} = X_{n+1} − X̂_{n+1}, with X̂_{n+1} = Σ_{j=1}^n θ_{n,j} U_{n−j+1},  n = 1, 2, ...,
  and X̂_1 = 0
• v_n = var{U_{n+1}} = E{(X_{n+1} − X̂_{n+1})²} is associated MSE
BD–73, 150, 151, SS–114 XIII–54
Innovations Algorithm: II
• when {X_t} is invertible MA(q) process with coefficients θ_1, ..., θ_q & white noise variance σ², innovations representation simplifies:
    X_{n+1} = Σ_{j=0}^q θ_{n,j} U_{n−j+1},  n = 0, 1, 2, ...,
  where θ_{n,j} → θ_j and v_n → σ² as n → ∞
• convergence can be rapid or painfully slow, depending on how close roots of θ(B) = 0 are to unit circle
• as examples, let's consider three MA(3) processes:
    X_t = Z_t + 0.4Z_{t−1} + 0.2Z_{t−2} + 0.1Z_{t−3}    (1)
    X_t = Z_t + 0.8Z_{t−1} + 0.8Z_{t−2} + 0.8Z_{t−3}    (2)
    X_t = Z_t + 0.84Z_{t−1} + 0.88Z_{t−2} + 0.92Z_{t−3}  (3)
BD–150, 151, SS–114 XIII–55
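A compact sketch of the algorithm itself, applied to process (1); the recursions follow B&D's statement of the IA, and the demo builds the exact MA(3) ACVF (σ² = 1) rather than a sample ACVF:

    # innovations algorithm: gam = (gamma(0), ..., gamma(N)) ->
    # theta[n, j] = theta_{n,j} and one-step MSEs v_0, ..., v_N
    innovations <- function(gam) {
      N <- length(gam) - 1
      theta <- matrix(0, N, N); v <- numeric(N + 1)
      v[1] <- gam[1]                                  # v_0 = gamma(0)
      for (n in 1:N) {
        for (k in 0:(n - 1)) {
          s <- 0
          if (k > 0) for (j in 0:(k - 1))
            s <- s + theta[k, k - j] * theta[n, n - j] * v[j + 1]
          theta[n, n - k] <- (gam[n - k + 1] - s) / v[k + 1]
        }
        v[n + 1] <- gam[1] - sum(theta[n, n:1]^2 * v[1:n])
      }
      list(theta = theta, v = v)
    }
    psi <- c(1, 0.4, 0.2, 0.1)                        # MA(3) process (1)
    gam <- sapply(0:20, function(h)                   # gamma(h) with sigma^2 = 1
      if (h <= 3) sum(psi[1:(4 - h)] * psi[(1 + h):4]) else 0)
    ia <- innovations(gam)
    round(ia$theta[20, 1:3], 3)   # theta_{20,j}, converging to 0.4, 0.2, 0.1
    round(ia$v[21], 3)            # v_20, converging to sigma^2 = 1

Swapping in the coefficients of processes (2) and (3) shows the much slower convergence that the next overheads plot.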
Root Plot for First MA(3) Process
[figure: roots of θ(z) = 0 plotted against the unit circle]
XIII–56
Convergence of θ_{n,j}'s to θ_j for First MA(3) Process
[figure: θ_{n,j} vs. n, n = 5 to 20]
XIII–57
Root Plot for Second MA(3) Process
[figure: roots of θ(z) = 0 plotted against the unit circle]
XIII–58
Convergence of θ_{n,j}'s to θ_j for Second MA(3) Process
[figure: θ_{n,j} vs. n, n = 5 to 20]
XIII–59
Root Plot for Third MA(3) Process
[figure: roots of θ(z) = 0 plotted against the unit circle]
XIII–60
Convergence of θ_{n,j}'s to θ_j for Third MA(3) Process
[figure: θ_{n,j} vs. n, n = 5 to 20]
XIII–61
Innovations Algorithm: III
• IA munches on γ(h)'s and spits out θ_{n,j}'s and v_n's
• IA estimators of θ and σ² for MA(q) process gotten by letting IA munch on γ̂(h)'s instead
• let θ̂_{n,j} and v̂_n denote what IA gives when handed γ̂(h)'s
• IA estimators θ̂ & σ̂² of θ & σ² given by θ̂_j = θ̂_{n,j} & σ̂² = v̂_n
• large sample theory for θ̂ & σ̂² much more complicated than that for Y–W and Burg (see B&D, p. 151)
BD–150, 151 XIII–62
Innovations Algorithm: IV
• theory says: for invertible MA(q) process satisfying certain regularity conditions and for any fixed k > 0, normalized vector
    n^{1/2} [θ̂_{m,1} − θ_1, ..., θ̂_{m,k} − θ_k]
  converges to multivariate normal with mean vector 0 and covariance matrix A whose (i, j)th element is
    A_{i,j} = Σ_{l=1}^{min{i,j}} θ_{i−l} θ_{j−l};  in particular, var{θ̂_{m,i}} ≈ A_{i,i}/n = (1/n) Σ_{l=0}^{i−1} θ²_l
  (here θ_0 ≡ 1 and θ_l ≡ 0 for l > q)
• need m ≪ n, but m must be large enough so E{θ̂_{m,i}} ≈ θ_i
BD–151 XIII–63
Confidence Intervals and Regions for θ
• can use large sample distribution to get approximate confidence intervals for individual θ_j's or confidence region for vector θ
• with θ̂_0 ≡ 1, approx. 95% confidence interval for θ_j given by
    [θ̂_j − 1.96 (Σ_{l=0}^{j−1} θ̂²_l / n)^{1/2}, θ̂_j + 1.96 (Σ_{l=0}^{j−1} θ̂²_l / n)^{1/2}]
• approx. 95% confidence region for θ is set of all θ's such that
    (θ̂ − θ)′ Â^{−1} (θ̂ − θ) ≤ χ²_{0.95}(q)/n,
  where q×q matrix Â has (i, j)th element given by
    Â_{i,j} = Σ_{l=1}^{min{i,j}} θ̂_{i−l} θ̂_{j−l}
BD–152, 153 XIII–64
Order Selection for MA(q) Processes
1. overhead IX–52: for large n, sample ACF RVs for MA(q) time series at lags h > q are approximately N(0, w_{h,h}/n), where
     w_{h,h} = 1 + 2[ρ²(1) + ··· + ρ²(q)]
   (ρ̂(h) should fall between ±1.96(ŵ_{h,h}/n)^{1/2} with prob. ≈ 95%)
2. in a similar manner, can base order selection on IA estimator θ̂, using its large sample theory to assess variability
3. can also select q to be minimizer of AICC statistic, with likelihood for each order being evaluated using IA estimator θ̂
BD–152, CC–110, SS–524, 525 XIII–65
Atomic Clock Time Series
[figure: x_t vs. t, t = 0 to 1000]
IX–53
Sample ACF for Atomic Clock Series
[figure: sample ACF vs. h (lag), h = 0 to 40]
IX–54
Sample PACF for Atomic Clock Series
[figure: sample PACF vs. h (lag), h = 0 to 40]
BD–99 XIII–66
PACF for MA(1) Process with θ = −0.5
[figure: theoretical PACF vs. h (lag), h = 0 to 40]
BD–99 XIII–67
Moment Matching for Atomic Clock Series
• assuming q = 1, can do moment matching since ρ̂(1) ≐ −0.3659 and hence |ρ̂(1)| < 0.5, as required
• possible estimates are
    θ̂ = (1 ± √(1 − 4ρ̂²(1))) / (2ρ̂(1)); i.e., either θ̂ ≐ −2.2978 or θ̂ ≐ −0.4352
• two estimates are reciprocals of one another
• choice θ̂ ≐ −0.4352 corresponds to invertible MA(1) model
• corresponding estimate of σ² is
    σ̂² = γ̂(0)/(1 + θ̂²) ≐ 23.919
• assuming q = 2, ... hmmm ...
XIII–68
Convergence of ✓n,j’s for Atomic Clock Series
0 10 20 30 40
−0.6
−0.4
−0.2
0.0
0.1
n
e nj
j = 1 j = 2 j = 3 j = 4
XIII–69
Convergence of vn’s for Atomic Clock Series
0 10 20 30 40
2224
2628
n
v n
XIII–70
Innovations Algorithm for Atomic Clock
• appears to have converged by n = 15 (sooner?)
• base estimates & 95% CIs for θ_1, ..., θ_4 on θ̂_{15,1}, ..., θ̂_{15,4}:
    j   θ̂_j      lower bound   upper bound
    1   −0.5879   −0.6491       −0.5266
    2   −0.1316   −0.2026       −0.0605
    3   −0.0690   −0.1405        0.0025
    4   −0.0044   −0.0760        0.0672
• moment matching for MA(1) model gave θ̂ ≐ −0.4352, which is not within 95% CI based on θ̂_1
• CIs suggest MA(2) model is appropriate
• here σ̂² = v̂_15 ≐ 20.782 (got σ̂² ≐ 23.919 from MA(1) moment matching)
BD–151 XIII–71
95% Confidence Region for θ
[figure: confidence ellipse in the (θ_1, θ_2) plane]
XIII–72
95% Confidence Region & Invertibility Region for θ
[figure: confidence ellipse and MA(2) invertibility region in the (θ_1, θ_2) plane]
XIII–73
AICC for Atomic Clock Series
[figure: AICC vs. model order, orders 0 to 40]
XIII–74
Parameter Estimation for Mixed ARMA(p, q) Models
• so far have discussed estimation techniques appropriate for pure AR(p) models and pure MA(q) models
• can handle mixed ARMA models (i.e., p > 0 and q > 0) using
  − innovations algorithm
  − higher-order Yule–Walker method with innovations algorithm
  − Hannan–Rissanen algorithm, an example of a so-called least squares estimator
XIII–75
Innovations Algorithm for Mixed ARMA Models: I
• assume causal ARMA process
    X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p} = Z_t + θ_1 Z_{t−1} + ··· + θ_q Z_{t−q},
  where {Z_t} ∼ WN(0, σ²)
• because of causality, can write
    X_t = Σ_{j=0}^∞ ψ_j Z_{t−j},
  where ψ_0 = 1 and
    ψ_j = θ_j + Σ_{i=1}^{min{p,j}} φ_i ψ_{j−i},  j ≥ 1,  (∗)
  for which we define θ_j = 0 for j > q
BD–154, 155 XIII–76
Innovations Algorithm for Mixed ARMA Models: II
• knowing ψ_1, ..., ψ_{p+q}, can solve for φ_i's & θ_j's, as follows
• equation (∗) for j = 1, ..., q gives
    ψ_1 = θ_1 + φ_1, ..., ψ_q = θ_q + Σ_{i=1}^{min{p,q}} φ_i ψ_{q−i}  (•)
• (∗) for j = q+1, ..., q+p does not involve θ_j's directly:
    ψ_{q+1} = Σ_{i=1}^{min{p,q+1}} φ_i ψ_{q+1−i}, ..., ψ_{q+p} = Σ_{i=1}^p φ_i ψ_{q+p−i}  (†)
• use (†) to solve for φ_i's, after which (•) gives
    θ_j = ψ_j − Σ_{i=1}^{min{p,j}} φ_i ψ_{j−i},  j = 1, ..., q
BD–154, 155 XIII–77
Innovations Algorithm for Mixed ARMA Models: III
• IA takes ACVF and gives θ_{m,j}'s, where θ_{m,j} → ψ_j as m → ∞
• to get estimates of φ_i's and θ_j's,
  1. use IA with sample ACVF to get estimates θ̂_{m,j} (with m chosen large enough to ensure convergence)
  2. set ψ̂_j equal to estimate θ̂_{m,j}
  3. use ψ̂_j's with p+q equations to get φ̂_i's and θ̂_j's
• B&D note that
  − resulting φ̂_i need not correspond to a causal process
  − order selection using sample ACVF and PACF dicey with mixed models because no clear patterns to distinguish between, e.g., ARMA(2,1) and ARMA(1,2)
  − order selection can still be done using AICC
BD–154, 155 XIII–78
Innovations Algorithm for Mixed ARMA Models: IV
• can base estimate of σ² on normalized one-step-ahead MSEs:
    σ̂² = (1/n) Σ_{t=1}^n (X_t − X̂_t)²/r_{t−1}, where r_{t−1} = E{(X_t − X̂_t)²}/σ²,
  and X̂_t is predictor of X_t based upon X_{t−1}, ..., X_1
• can get σ̂² by using γ̂(0) (sample variance for time series) along with estimates φ̂_i & θ̂_j to calculate ACVF for theoretical ARMA(p, q) process and feeding this ACVF into L–D recursions – desired estimate σ̂² is nth order MSE v_n
• alternatively, once ARMA(p, q) ACVF has been determined, apply IA to it with m set large enough so that v_m is stable, and then use σ̂² = v_m
BD–154, 155 XIII–79
Example – Atomic Clock Series: I
• as an example, let’s use IA to fit an ARMA(1,1) model
Xt � �Xt�1 = Zt + ✓Zt�1, {Zt} ⇠WN(0,�2),
to atomic clock series
• the p + q = 2 relevant equations are
1 = ✓ + � & 2 = � 1, yielding � = 2
1& ✓ = 1 � �
• basing our estimates of 1 and 2 on
✓15,1.= �0.5879 and ✓15,2
.= �0.1316
(see overhead XIII–71) yields
�.= 0.2238 and ✓
.= �0.8117
(corresponds to a causal and invertible ARMA(1,1) model)
• get �2 = 20.860 (compared to �2 .= 20.782 for MA(2) model)
XIII–80
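The arithmetic here is a two-line check in R, using the overhead XIII–71 values:

    psi1 <- -0.5879; psi2 <- -0.1316      # theta.hat_{15,1}, theta.hat_{15,2}
    phi <- psi2 / psi1; phi               # 0.2238...
    theta <- psi1 - phi; theta            # -0.8117...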
Estimation of σ² via v̂_100 from Innovations Algorithm
[figure: v̂_m vs. m, m = 0 to 100]
XIII–81
Sample & ARMA(1,1) ACF for Atomic Clock
[figure: sample ACF and fitted ARMA(1,1) ACF vs. h (lag)]
XIII–82
Sample & ARMA(1,1) PACF for Atomic Clock
[figure: sample PACF and fitted ARMA(1,1) PACF vs. h (lag), h = 1 to 20]
XIII–83
AICC for Atomic Clock Series
[figure: AICC vs. number of parameters in model]
XIII–84
Higher-Order Yule–Walker Method: I
• alternative to IA algorithm for handling mixed ARMA models is based on structure of ACVF {γ(h)} for such models
• as noted before (overhead IX–20), ARMA(p, q) ACVF satisfies
    γ(k) − φ_1 γ(k−1) − ··· − φ_p γ(k−p) = 0
  for all k ≥ q+1
• does not involve MA coefficients
• can use so-called higher-order Y–W equations to get φ_i's:
    φ_1 γ(q)     + φ_2 γ(q−1)   + ··· + φ_p γ(q−p+1) = γ(q+1)
    φ_1 γ(q+1)   + φ_2 γ(q)     + ··· + φ_p γ(q−p+2) = γ(q+2)
    ...
    φ_1 γ(q+p−1) + φ_2 γ(q+p−2) + ··· + φ_p γ(q)     = γ(q+p)
BD–145 XIII–85
Higher-Order Yule–Walker Method: II
• with �i’s known, can filter time series X1, . . . , Xn and getoutput Yp+1, . . . , Yn with MA(q) structure:
Yt ⌘ Xt � �1Xt�1 � · · ·� �pXt�p
= Zt + ✓1Zq�1 + · · · + ✓qZt�q
• higher-order Y–W method with IA thus consists of
� substituting �(h)’s into higher-order Y–W equations andsolving to get estimates �i
� using �i’s to filter time series to get output, say Y 0t� forming sample ACVF for Y 0t ’s and using these as input to
IA to estimate MA coe�cients ✓j
BD–145 XIII–86
Example – Atomic Clock Series: II
• as an example, let’s use scheme to fit an ARMA(1,1) model toatomic clock series
• relevant higher-order Y–W equation is ��(1) = �(2), yielding
� =�(2)
�(1).= 0.1823, as compared to �
.= 0.2238 using IA
• estimate corresponds to a causal process, but might not happenfor other time series (no reason why �(1) ⇡ 0 can’t occur)
• forming sample ACVF for Y 0t = Xt � �Xt�1, t = 2, . . . , n,and feeding it into IA yields ✓n,j’s and vn’s shown on nextoverheads
XIII–87
Convergence of ✓n,j’s for Y 0t ’s
0 10 20 30 40
−0.8
−0.6
−0.4
−0.2
0.0
n
e nj
j = 1 j = 2 j = 3 j = 4
XIII–88
Convergence of vn’s for Y 0t ’s
0 10 20 30 40
2022
2426
2830
32
n
v n
XIII–89
Example – Atomic Clock Series: III
• using θ̂_{15,1} to estimate θ in ARMA(1,1) model, get θ̂ ≐ −0.7748 compared to θ̂ ≐ −0.8117 using IA by itself (overhead XIII–80)
• using v̂_15 to estimate σ² yields σ̂² ≐ 20.631 as compared to IA-based σ̂² ≐ 20.860
• sampling theory for θ̂_{n,j}'s suggests that those for j = 2, 3 and 4 are not significantly different from zero; i.e., ARMA(1, q) model with q > 1 not indicated
• AICC for fitted ARMA(1,1) model is 6022.7, so model is less likely than IA-based model with AICC of 6016.2
• next overheads
  − show AICC compared to ones for IA-based MA(q) models
  − compare theoretical and sample ACVFs and PACFs
XIII–90
AICC for Higher-Order Y–W Method
[figure: AICC vs. number of parameters in model]
XIII–91
Sample & ARMA(1,1) ACF for Atomic Clock
[figure: sample ACF and fitted ARMA(1,1) ACF vs. h (lag)]
XIII–92
Sample & ARMA(1,1) PACF for Atomic Clock
[figure: sample PACF and fitted ARMA(1,1) PACF vs. h (lag), h = 1 to 20]
XIII–93
Least Squares Estimators: I
• as prelude to Hannan–Rissanen algorithm, consider least squares (LS) estimators for AR(p) coefficients
• express AR(p) model as
    X_t = φ_1 X_{t−1} + ··· + φ_p X_{t−p} + Z_t,  {Z_t} ∼ WN(0, σ²)
• above looks like a multiple regression model, except explanatory variables are lagged versions of dependent variable X_t
CC–154, SS–126 XIII–94
Least Squares Estimators: II
• given time series X_1, ..., X_n, have n−p observations for which we can also get required explanatory variables:
    X_{p+1} = φ_1 X_p     + ··· + φ_p X_1     + Z_{p+1}
    X_{p+2} = φ_1 X_{p+1} + ··· + φ_p X_2     + Z_{p+2}
    ...
    X_n     = φ_1 X_{n−1} + ··· + φ_p X_{n−p} + Z_n
• in matrix formulation, can write as X = Aφ + Z, where
    X = [X_{p+1}, X_{p+2}, ..., X_n]′,
    A = [ X_p      ···  X_1
          X_{p+1}  ···  X_2
          ⋮             ⋮
          X_{n−1}  ···  X_{n−p} ],
    φ = [φ_1, φ_2, ..., φ_p]′,  Z = [Z_{p+1}, Z_{p+2}, ..., Z_n]′
CC–154, SS–126 XIII–95
Least Squares Estimators: III
• use ordinary LS to estimate φ
• by definition, LS estimators minimize
    S_f(φ) ≡ Σ_{t=p+1}^n (X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p})²
• S_f is a quadratic function of φ, so any minimizing φ must satisfy, for i = 1, ..., p,
    ∂S_f(φ)/∂φ_i = −2 Σ_{t=p+1}^n (X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p}) X_{t−i} = 0
• yields set of p so-called normal equations:
    Σ_{t=p+1}^n (φ_1 X_{t−1} X_{t−i} + ··· + φ_p X_{t−p} X_{t−i}) = Σ_{t=p+1}^n X_t X_{t−i}
CC–154, SS–126 XIII–96
Least Squares Estimators: IV
• in matrix formulation, normal equations become A′Aφ = A′X
• denote solution as φ̂_f – this is the LS estimator of φ
• φ̂_f need not correspond to causal AR(p) model, and A′A need not be positive definite
• interesting connection between φ̂_f and Y–W estimator φ̂ (see the sketch below):
  − take time series and add p zeros before X_1 and p zeros after X_n to create a time series, say {X′_t}, with n+2p values
  − LS estimator for {X′_t} is identical to Y–W estimator of {X_t}!
  − in particular, A′Aφ = A′X reduces to Γ̂_p φ = γ̂_p
  − solution φ̂ must correspond to causal AR(p) model, and A′A = Γ̂_p is always positive definite – adding zeros acts as regularization procedure!
CC–154, SS–126 XIII–97
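A minimal sketch of the forward LS fit via ordinary regression (x a demeaned series, p ≥ 1 assumed); the trailing comment illustrates the zero-padding connection, reusing the yule_walker() sketch from earlier:

    # forward LS fit of an AR(p): regress X_t on X_{t-1}, ..., X_{t-p}
    ls_ar <- function(x, p) {
      n <- length(x)
      A <- sapply(1:p, function(j) x[(p + 1 - j):(n - j)])  # lagged regressors
      coef(lm(x[(p + 1):n] ~ A - 1))                        # no intercept
    }
    # zero padding turns LS into Y-W:
    # ls_ar(c(rep(0, p), x, rep(0, p)), p) agrees with yule_walker(x, p)$phi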
Least Squares Estimators: V
• in view of
    S_f(φ) = Σ_{t=p+1}^n (X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p})²,
  can regard φ̂_f as arising from minimization of sum of squared forward prediction errors
• in the same spirit as Burg's algorithm, can also consider backward prediction errors:
    S_b(φ) = Σ_{t=1}^{n−p} (X_t − φ_1 X_{t+1} − ··· − φ_p X_{t+p})²
• leads to forward/backward LS estimator φ̂_fb, which is whichever φ minimizes
    S_fb(φ) = S_f(φ) + S_b(φ)
XIII–98
Hannan–Rissanen Algorithm: I
• Hannan–Rissanen (H–R) algorithm extends LS estimator to work with ARMA(p, q) processes
• reexpress ARMA model to mimic a multiple regression model:
    X_t = φ_1 X_{t−1} + ··· + φ_p X_{t−p} + θ_1 Z_{t−1} + ··· + θ_q Z_{t−q} + Z_t,  {Z_t} ∼ WN(0, σ²)
• explanatory variables now also include lagged versions of errors Z_t, which can't be directly observed
• H–R gets around this by creating surrogates Ẑ_t for unobservable Z_t's and then forging ahead with LS procedure
BD–156, 157 XIII–99
Hannan–Rissanen Algorithm: II
• start by fitting high-order AR(m) model to time series X_1, ..., X_n using Y–W, where m > max{p, q}
• idea is that high-order AR model might be able to closely mimic covariance structure of low-order ARMA model
• estimate Z_t using
    Ẑ_t = X_t − φ̂_{m,1} X_{t−1} − ··· − φ̂_{m,m} X_{t−m},  t = m+1, ..., n
• estimate β = [φ_1, ..., φ_p, θ_1, ..., θ_q]′ by minimizing
    S(β) = Σ_{t=m+1+q}^n (X_t − φ_1 X_{t−1} − ··· − φ_p X_{t−p} − θ_1 Ẑ_{t−1} − ··· − θ_q Ẑ_{t−q})²
• let β̂ = [φ̂_1, ..., φ̂_p, θ̂_1, ..., θ̂_q]′ denote resulting estimator
BD–156, 157 XIII–100
Hannan–Rissanen Algorithm: III
• to get estimates of white noise component, recursively set
    Ẑ_t = { 0,  1 ≤ t ≤ max{p, q};
            X_t − Σ_{i=1}^p φ̂_i X_{t−i} − Σ_{j=1}^q θ̂_j Ẑ_{t−j},  max{p, q} < t ≤ n
• since Ẑ_t's might not be quite white, recursively set
    V_t = { 0,  1 ≤ t ≤ max{p, q};
            Σ_{i=1}^p φ̂_i V_{t−i} + Ẑ_t,  max{p, q} < t ≤ n
  and
    W_t = { 0,  1 ≤ t ≤ max{p, q};
            −Σ_{j=1}^q θ̂_j W_{t−j} + Ẑ_t,  max{p, q} < t ≤ n
• note: φ̂(B)V_t = Ẑ_t & θ̂(B)W_t = Ẑ_t, so φ̂(B)V_t = θ̂(B)W_t
BD–157, 158 XIII–101
Hannan–Rissanen Algorithm: IV
• estimate β = [φ_1, ..., φ_p, θ_1, ..., θ_q]′ by minimizing
    S̃(β) = Σ_{t=max{p,q}+1}^n (Ẑ_t − Σ_{i=1}^p φ_i V_{t−i} − Σ_{j=1}^q θ_j W_{t−j})²
• let β̂† = [φ̂†_1, ..., φ̂†_p, θ̂†_1, ..., θ̂†_q]′ denote resulting estimator
• H–R estimator is β̃ = β̂ + β̂†, but B&D stick with just β̂
• three comments
  1. can handle both pure MA(q) & mixed ARMA(p, q) models
  2. usual formulation of H–R calls for use of Y–W, but Burg is a better choice (in particular, Ẑ_t's are computed as part of Burg's algorithm)
  3. as in IA, choice of m > max{p, q} requires some care
BD–155, 156, 157 XIII–102
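A minimal sketch of the first two H–R steps (the refinement above is omitted); x is assumed to be a demeaned series with p, q ≥ 1:

    # Hannan-Rissanen steps 1-2: long AR for surrogate Z's, then regression
    hannan_rissanen <- function(x, p, q, m = 15) {
      z <- ar.yw(x, order.max = m, aic = FALSE)$resid   # Z.hat_t (NA for t <= m)
      n <- length(x)
      t0 <- (m + 1 + q):n                               # rows with all lags available
      X <- sapply(1:p, function(i) x[t0 - i])           # lagged X's
      Z <- sapply(1:q, function(j) z[t0 - j])           # lagged Z.hat's
      coef(lm(x[t0] ~ cbind(X, Z) - 1))                 # (phi.hat's, theta.hat's)
    }
    # e.g., hannan_rissanen(x, p = 1, q = 1, m = 15) for the ARMA(1,1) fits below

As comment 2 notes, swapping ar.yw for ar.burg in the first step is arguably the better choice.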
Example – Atomic Clock Series: IV
• as an example, let’s use H–R to fit an MA(4) model to atomicclock series to compare with IA results
• next two overheads look at
1. dependence of estimates ✓1, . . . , ✓4 on order m for approx-imating AR process; m = 5 to 40; m = 15 looks like goodchoice; dotted lines indicate ✓j = ✓15,j from IA
2. same, but now for refinement ✓j; will use m = 15 again
j ✓HR,j ✓HR,j ✓IA,j lower bound upper bound1 �0.5860 �0.5890 �0.5879 �0.6491 �0.52662 �0.1426 �0.1509 �0.1316 �0.2026 �0.06053 �0.0723 �0.0616 �0.0690 �0.1405 0.00254 �0.0058 �0.0110 �0.0044 �0.0760 0.0672
XIII–103
Convergence of ✓j’s for Atomic Clock Series
0 10 20 30 40
−0.6
−0.4
−0.2
0.0
0.1
m
e j
j = 1 j = 2 j = 3 j = 4
XIII–104
Convergence of ✓j’s for Atomic Clock Series
0 10 20 30 40
−0.6
−0.4
−0.2
0.0
0.1
m
e~ j
j = 1 j = 2 j = 3 j = 4
XIII–105
AICC for MA(q) Models for Atomic Clock Series
0 10 20 30 40
6020
6040
6060
6080
model order q
AIC
C
IA HR
XIII–106
Example – Atomic Clock Series: V
• now let’s use H–R to fit an ARMA(1,1) model
Xt � �Xt�1 = Zt + ✓Zt�1, {Zt} ⇠WN(0,�2),
• H–R yields �.= 0.2730 and ✓
.= �0.8662 (m = 15)
• IA yields �.= 0.2238 and ✓
.= �0.8117
• AICC for IA model is 6016.2, while that for H–R is 6011.5; i.e.,H–R model is more likely
XIII–107
AICC for MA(q) Models for Atomic Clock Series
[figure: AICC vs. model order q, IA and HR fits, with HR ARMA(1,1) marked]
XIII–108
Sample & ARMA(1,1) ACF for Atomic Clock
[figure: sample ACF and fitted ARMA(1,1) ACF vs. h (lag)]
XIII–109
Sample & ARMA(1,1) PACF for Atomic Clock
[figure: sample PACF and fitted ARMA(1,1) PACF vs. h (lag), h = 1 to 20]
XIII–110
Maximum Likelihood Estimation: I
• as noted before (XIII–21), likelihood for Gaussian zero-mean stationary time series X_n = [X_1, ..., X_n]′ with covariance matrix Γ_n is given by
    L(Γ_n) = (2π)^{−n/2} (det Γ_n)^{−1/2} exp(−(1/2) X′_n Γ_n^{−1} X_n)
  and hence
    −2 ln(L(Γ_n)) = n ln(2π) + ln(det Γ_n) + X′_n Γ_n^{−1} X_n
• for ARMA(p, q) time series, parameters φ, θ & σ² set Γ_n
• given X_n, can assess likelihood of various parameter settings
• maximum likelihood estimators (MLEs) of parameters are settings such that L(Γ_n) is maximized
• note: L(Γ_n) is maximized when −2 ln(L(Γ_n)) is minimized
BD–158, CC–158, SS–124 XIII–111
Maximum Likelihood Estimation: II
• key to evaluating L(Γ_n) is knowing how to compute det Γ_n and X′_n Γ_n^{−1} X_n for various parameter settings
• can appeal to IA equation to get manageable expressions
• key equation behind IA is X_n = C_n U_n, where
    C_n = [ 1            0            0    ···  0          0
            θ_{1,1}      1            0    ···  0          0
            θ_{2,2}      θ_{2,1}      1    ···  0          0
            ⋮            ⋮                 ⋱    ⋮          ⋮
            θ_{n−2,n−2}  θ_{n−2,n−3}  ···       1          0
            θ_{n−1,n−1}  θ_{n−1,n−2}  ···       θ_{n−1,1}  1 ]
  is an n×n lower triangular matrix, and U_n = [U_1, ..., U_n]′ is a vector of innovations (one-step-ahead prediction errors)
BD–158, 159, CC–158, SS–124 XIII–112
Maximum Likelihood Estimation: III
• innovations in U_n are uncorrelated RVs with variances v_0, v_1, ..., v_{n−1}
• covariance matrix D_n for U_n is thus a diagonal matrix, with diagonal elements given by v_j's
• since X_n = C_n U_n, standard result from theory of random vectors (B&D, Equation (A.2.5)) says that covariance Γ_n for X_n can be written as C_n D_n C′_n
• can argue (why?) that
    det Γ_n = (det C_n)(det D_n)(det C′_n) = Π_{j=0}^{n−1} v_j,
  which gives us the required manageable expression for det Γ_n
BD–158, 159, CC–158, SS–124 XIII–113
Maximum Likelihood Estimation: IV
• to get a manageable expression for X′_n Γ_n^{−1} X_n, note that, since
    X_n = C_n U_n implies C_n^{−1} X_n = U_n;
  since
    U_n = X_n − X̂_n, where X̂_n = [X̂_1, ..., X̂_n]′;
  and since
    Γ_n = C_n D_n C′_n implies Γ_n^{−1} = (C′_n)^{−1} D_n^{−1} C_n^{−1};
  it follows that
    X′_n Γ_n^{−1} X_n = X′_n (C′_n)^{−1} D_n^{−1} C_n^{−1} X_n = U′_n D_n^{−1} U_n,
  i.e.,
    X′_n Γ_n^{−1} X_n = Σ_{j=1}^n U²_j/v_{j−1} = Σ_{j=1}^n (X_j − X̂_j)²/v_{j−1}
BD–158, 159, CC–158, SS–124 XIII–114
Maximum Likelihood Estimation: V
• since φ, θ & σ² determine Γ_n and since v_j = r_j σ², we can reexpress −2 ln(L(Γ_n)) as
    −2 ln(L(φ, θ, σ²)) = n ln(2π) + ln(det Γ_n) + X′_n Γ_n^{−1} X_n
                       = n ln(2π) + ln(Π_{j=0}^{n−1} v_j) + Σ_{j=1}^n (X_j − X̂_j)²/v_{j−1}
                       = n ln(2πσ²) + Σ_{j=0}^{n−1} ln(r_j) + (1/σ²) Σ_{j=1}^n (X_j − X̂_j)²/r_{j−1}
                       ≡ n ln(2πσ²) + Σ_{j=0}^{n−1} ln(r_j) + S(φ, θ)/σ²,
  where we note that r_j's and S(φ, θ) do not depend on σ²
BD–160, CC–158, SS–124 XIII–115
Maximum Likelihood Estimation: VI
• differentiating −2 ln(L(φ, θ, σ²)) with respect to σ² and setting resulting expression to zero yields MLE
    σ̂² = S(φ, θ)/n = (1/n) Σ_{j=1}^n (X_j − X̂_j)²/r_{j−1}
• substituting σ̂² for σ² in −2 ln(L(φ, θ, σ²)) yields a so-called profile likelihood, which does not depend on σ²
BD–160, CC–158, SS–124 XIII–116
Maximum Likelihood Estimation: VII
• profile likelihood takes the form
    −2 ln(L(φ, θ)) = n ln(2πσ̂²) + Σ_{j=0}^{n−1} ln(r_j) + S(φ, θ)/σ̂²
                   = n ln(2πS(φ, θ)/n) + Σ_{j=0}^{n−1} ln(r_j) + S(φ, θ)/(S(φ, θ)/n)
                   = n + n ln(2π/n) + n ln(S(φ, θ)) + Σ_{j=0}^{n−1} ln(r_j)
BD–160, CC–158, SS–124 XIII–117
Maximum Likelihood Estimation: VIII
• to evaluate −2 ln(L(φ, θ)) for a particular ARMA(p, q) model, here are the steps we need to take
  1. compute ACVF for model out to lag n−1, setting σ² = 1
  2. get r_j's and compute 1-step-ahead predictions via recursions
       X̂_{j+1} = { Σ_{k=1}^j θ_{j,k} U_{j−k+1},  1 ≤ j < m;
                    Σ_{i=1}^p φ_i X_{j−i+1} + Σ_{k=1}^q θ_{j,k} U_{j−k+1},  m ≤ j ≤ n−1,
     where r_j's & θ_{j,k}'s come from IA applied to model ACVF (note: usually v_j = r_j σ², so setting σ² = 1 gives r_j = v_j); U_j = X_j − X̂_j (as usual!); and m = max{p, q}
  3. compute S(φ, θ) = Σ_{j=1}^n U²_j/r_{j−1} and Σ_{j=0}^{n−1} ln(r_j)
• MLEs are settings φ̂ and θ̂ that minimize −2 ln(L(φ, θ))
• side calculation gives corresponding σ̂² = S(φ̂, θ̂)/n
BD–160, CC–158, SS–124 XIII–118
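In practice this optimization comes packaged: R's arima() maximizes the same Gaussian likelihood (via an equivalent state-space/Kalman recursion). A usage sketch for the ARMA(1,1) fit to a demeaned series x, including the AICC of overhead XIII–121:

    fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE, method = "ML")
    fit$coef                 # phi.hat, theta.hat
    fit$sigma2               # sigma.hat^2 = S(phi.hat, theta.hat)/n
    n <- length(x); k <- length(fit$coef)            # k = p + q
    -2 * fit$loglik + 2 * (k + 1) * n / (n - k - 2)  # AICC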
Maximum Likelihood Estimation: IX
• can take simpler approach to evaluate −2 ln(L(φ)) in special case of AR(p) model
  1. use reverse L–D to generate r_j's and coefficients φ_{j,k} for best linear predictors of orders j = p−1, ..., 1, along with r_0
  2. compute 1-step-ahead predictions via recursions
       X̂_{j+1} = { Σ_{k=1}^j φ_{j,k} X_{j−k+1},  1 ≤ j < p;
                    Σ_{i=1}^p φ_i X_{j−i+1},  p ≤ j ≤ n−1,
     along with innovations U_{j+1} = X_{j+1} − X̂_{j+1}
  3. compute S(φ) = Σ_{j=1}^p U²_j/r_{j−1} + Σ_{j=p+1}^n U²_j and Σ_{j=0}^{p−1} ln(r_j)
• MLEs are settings φ̂ that minimize −2 ln(L(φ))
• side calculation gives corresponding σ̂² = S(φ̂)/n
XIII–119
ML-Based Least Squares Estimation
• can argue that, for large n,
    −2 ln(L(φ, θ)) = n + n ln(2π/n) + n ln(S(φ, θ)) + Σ_{j=0}^{n−1} ln(r_j)
  depends mainly on n ln(S(φ, θ)), not Σ_j ln(r_j), in part because r_j → 1 (for AR(p) models r_j = 1 for j ≥ p)
• with Σ_j ln(r_j) dropped, minimization of
    n + n ln(2π/n) + n ln(S(φ, θ))
  is equivalent to minimization of S(φ, θ), with minimizers φ̃ and θ̃ defining ML-based least squares estimators
• corresponding estimator for σ² is σ̃² = S(φ̃, θ̃)/(n − p − q)
• nonlinear optimization still required, but sometimes easier to find LS estimates than MLEs
BD–161, CC–158, SS–124 XIII–120
Order Selection
• order selection can be based on AICC statistic:
    AICC = −2 ln(L(φ̂, θ̂)) + 2(p + q + 1)n/(n − p − q − 2),
  which, for a given model, is necessarily minimized by MLEs
• B&D, C&C and S&S discuss other order selection criteria (FPE for AR(p) models, AIC, BIC)
• see also Choi (1992), McQuarrie & Tsai (1998) and Stoica &Selen (2004)
BD–161, 169–174, CC–130, 131, SS–52, 53 XIII–121
Large Sample Distribution of MLEs: I
• consider causal & invertible ARMA(p, q) model
    φ(B)X_t = θ(B)Z_t,  {Z_t} ∼ WN(0, σ²)
• let {U_t} and {V_t} be AR(p) and AR(q) processes satisfying
    φ(B)U_t = Z_t and θ(B)V_t = Z_t,
  and let U_t = [U_t, ..., U_{t+1−p}]′ and V_t = [V_t, ..., V_{t+1−q}]′
• letting β = [φ_1, ..., φ_p, θ_1, ..., θ_q]′ and letting β̂ denote the corresponding vector of MLEs, have, for large n,
    β̂ ≈ N(β, V(β)/n),
  where
    V(β) = σ² [ E{U_t U′_t}  E{U_t V′_t}
                E{V_t U′_t}  E{V_t V′_t} ]^{−1}
  (q = 0: V(β) = σ² [E{U_t U′_t}]^{−1};  p = 0: V(β) = σ² [E{V_t V′_t}]^{−1})
BD–161, 162, CC–161, SS–133, 134 XIII–122
Large Sample Distribution of MLEs: II
• easy to get E{U_t U′_t} & E{V_t V′_t}; E{V_t U′_t} is more work
• B&D give V(β) for five special cases:
    AR(1):  [1 − φ²]
    AR(2):  [ 1 − φ₂²      −φ₁(1 + φ₂)
              −φ₁(1 + φ₂)   1 − φ₂² ]
    MA(1):  [1 − θ²]
    MA(2):  [ 1 − θ₂²      θ₁(1 − θ₂)
              θ₁(1 − θ₂)   1 − θ₂² ]
    ARMA(1,1):  ((1 + φθ)/(φ + θ)²) [ (1 − φ²)(1 + φθ)    −(1 − φ²)(1 − θ²)
                                      −(1 − φ²)(1 − θ²)    (1 − θ²)(1 + φθ) ]
• can plug in estimates φ̂_i & θ̂_j (OK for large n & small p + q)
BD–162, 163, CC–161, SS–133, 134 XIII–123
Example – Atomic Clock Series: VI
    model      suggested by  fitted by   φ̂     θ̂_1    θ̂_2    θ̂_3    AICC
    MA(1)      ACF CIs       IA          —     −0.59  —      —      6067.05
                             ML          —     −0.73  —      —      6051.33
    MA(2)      IA CIs        IA          —     −0.59  −0.13  —      6024.04
                             ML          —     −0.61  −0.18  —      6016.66
    MA(3)      IA AICC       IA          —     −0.59  −0.13  −0.07  6013.40
                             ML          —     −0.60  −0.14  −0.08  6012.65
    ARMA(1,1)  ACF/PACF      H–R         0.27  −0.87  —      —      6011.53
                             ML          0.27  −0.86  —      —      6011.52
• ML AICCs for
  − AR(1), AR(2) & AR(3): 6190.78, 6136.49, 6095.82
  − ARMA(1,2) & ARMA(2,1): 6013.20, 6013.14
XIII–124
Diagnostic Checking
• diagnostic tests based on normalized innovations corresponding to fitted model:
    W_t = U_t/√r_{t−1} = (X_t − X̂_t)/√r_{t−1},  t = 1, ..., n
• if fitted model is good representation for time series, W_t's should resemble zero-mean white noise (but won't be exactly so)
• tests include
  − informal assessment based on plot of W_t's
  − sample ACF and PACF for W_t's
  − checks on hypothesis of randomness
• let's illustrate diagnostic checking by looking at fitted ARMA(1,1) model for atomic clock series
BD–164 XIII–125
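A hedged sketch of the same checks in R, applied to the arima() fit from the earlier sketch; residuals(fit) gives the innovations, and dividing by σ̂ only approximately matches the W_t above (it ignores the r_{t−1} factors):

    w <- residuals(fit) / sqrt(fit$sigma2)   # roughly normalized innovations
    plot(w); acf(w); pacf(w)                 # informal white noise checks
    # Ljung-Box portmanteau test at lag h = 20, df adjusted for p + q
    Box.test(w, lag = 20, type = "Ljung-Box", fitdf = length(fit$coef))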
ARMA(1,1) Residuals W_t for Atomic Clock
[figure: W_t vs. t, t = 0 to 1000]
XIII–126
Sample ACF for ARMA(1,1) Residuals W_t
[figure: sample ACF vs. h (lag), h = 0 to 40]
XIII–127
Sample PACF for ARMA(1,1) Residuals W_t
[figure: sample PACF vs. h (lag), h = 0 to 40]
XIII–128
Portmanteau Tests of ARMA(1,1) Residuals W_t
[figure: Q_LB vs. h along with 95% quantile of reference chi-squared distribution]
XIII–129
Example – Atomic Clock Series: VII
    test             expected value   test statistic   p-value
    turning point    681.3            709              0.040
    difference-sign  511.5            515              0.705
    rank             261888           262439           0.920
    runs             513.0            506              0.662

    AR method     AICC order   AIC order
    Yule–Walker   0            0
    Burg          0            0
    OLS           0            3
    MLE           0            0
XIII–130
Atomic Clock Time Series
[figure: x_t vs. t, t = 0 to 1000]
IX–53
Simulated Atomic Clock Time Series: I
[figure: simulated x_t vs. t, t = 0 to 1000]
XIII–131
Simulated Atomic Clock Time Series: II
[figure: simulated x_t vs. t, t = 0 to 1000]
XIII–132
Simulated Atomic Clock Time Series: III
[figure: simulated x_t vs. t, t = 0 to 1000]
XIII–133
Forecasting Atomic Clock Time Series
[figure: series and forecasts (asterisks) for t = 1000 to 1030]
XIII–134
References
• B. Choi (1992), ARMA Model Identification, New York: Springer-Verlag
• R. A. Johnson and D. W. Wichern (1998), Applied Multivariate Statistical Analysis(Fourth Edition), Upper Saddle River, New Jersey: Prentice Hall
• A. D. R. McQuarrie and C.-L. Tsai (1998), Regression and Time Series Model Selection,Singapore: World Scientific
• P. Stoica and Y. Selen (2004), ‘Model-Order Selection: A Review of Information CriterionRules,’ IEEE Signal Processing Magazine, 21, pp. 36–47
XIII–135