forecasting with varma models · the successful use of univariate arma models for forecasting has...
TRANSCRIPT
EUROPEAN UNIVERSITY INSTITUTE DEPARTMENT OF ECONOMICS
EUI Working Paper ECO No. 2004 /25
Forecasting with VARMA Models
HELMUT LÜTKEPOHL
BADIA FIESOLANA, SAN DOMENICO (FI)
All rights reserved. No part of this paper may be reproduced in any form
Without permission of the author(s).
©2004 Helmut Lütkepohl
Published in Italy in September 2004 European University Institute
Badia Fiesolana I-50016 San Domenico (FI)
Italy
*Revisions of this work can be found at: http://www.iue.it/Personal/Luetkepohl/.
August 30, 2004
Forecasting with VARMA Models
by
Helmut Lutkepohl
Department of Economics, European University Institute, Via della Piazzuola 43, I-50133
Firenze, ITALY, email: [email protected]
Contents
1 Introduction and Overview 1
2 VARMA Processes 42.1 Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.2 Cointegrated I(1) Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.3 Linear Transformations of VARMA Processes . . . . . . . . . . . . . . . . . 72.4 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.1 General Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.4.2 Forecasting Aggregated Processes . . . . . . . . . . . . . . . . . . . . 12
2.5 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.5.1 Deterministic Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.5.2 More Unit Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.5.3 Non-Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . 20
3 Specifying and Estimating VARMA Models 203.1 The Echelon Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.1 Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.1.2 I(1) Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Estimation of VARMA Models for Given Lag Orders and Cointegrating Rank 253.2.1 ARMAE Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.2.2 EC-ARMAE Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Testing for the Cointegrating Rank . . . . . . . . . . . . . . . . . . . . . . . 283.4 Specifying the Lag Orders and Kronecker Indices . . . . . . . . . . . . . . . 293.5 Diagnostic Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4 Forecasting with Estimated Processes 314.1 General Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.2 Aggregated Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Conclusions 33
0
EUI WP ECO 2004/24
1 Introduction and Overview
In this chapter linear models for the conditional mean of a stochastic process are considered.
These models are useful for producing linear forecasts of time series variables. Suppose that
K related time series variables are considered, y1t, . . . , yKt, say. Defining yt = (y1t, . . . , yKt)′,
a linear model for the conditional mean of the data generation process (DGP) of the observed
series may be of the vector autoregressive (VAR) form,
yt = A1yt−1 + · · ·+ Apyt−p + ut, (1.1)
where the Ai’s (i = 1, . . . , p) are (K × K) coefficient matrices and ut is a K-dimensional
error term. If ut is independent over time (i.e., ut and us are independent for t 6= s), the
conditional mean of yt, given past observations, is
yt|t−1 ≡ E(yt|yt−1, yt−2, . . . ) = A1yt−1 + · · ·+ Apyt−p.
Thus, the model can be used directly for forecasting one period ahead and forecasts with
larger horizons can be computed recursively. Therefore, variants of this model will be the
basic forecasting models in this chapter.
For practical purposes the simple VAR model of order p may have some disadvantages,
however. The Ai parameter matrices will be unknown and have to be replaced by estimators.
For an adequate representation of the DGP of a set of time series of interest a rather large
VAR order p may be required. Hence, a large number of parameters may be necessary for
an adequate description of the data. Given limited sample information this will usually
result in low estimation precision and also forecasts based on VAR processes with estimated
coefficients may suffer from the uncertainty in the parameter estimators. Therefore it is
useful to consider the larger model class of vector autoregressive moving-average (VARMA)
models which may be able to represent the DGP of interest in a more parsimonious way. In
this chapter the analysis of models from that class will be discussed although special case
results for VAR processes will occasionally be noted explicitly.
The VARMA class has the further advantage of being closed with respect to linear trans-
formations, that is, a linearly transformed finite order VARMA process has again a finite
order VARMA representation. Therefore linear aggregation issues can be studied within
this class. In this chapter special attention will be given to results related to forecasting
contemporaneously and temporally aggregated processes.
VARMA models can be parameterized in different ways. In other words, different param-
eterizations describe the same stochastic process. Although this is no problem for forecasting
1
EUI WP ECO 2004/24
purposes because we just need to have one adequate representation of the DGP, nonunique
parameters are a problem at the estimation stage. Therefore the echelon form of a VARMA
process is presented as a unique representation. Estimation and specification of this model
form will be considered.
These models have first been developed for stationary variables. In economics and also
other fields of applications many variables are generated by nonstationary processes, however.
Often they can be made stationary by considering differences or changes rather than the
levels. A variable is called integrated of order d (I(d)) if it is still nonstationary after
taking differences d − 1 times but it can be made stationary or asymptotically stationary
by differencing d times. In most of the following discussion the variables will be assumed to
be stationary (I(0)) or integrated of order 1 (I(1)) and they may be cointegrated. In other
words, there may be linear combinations of I(1) variables which are I(0). If cointegration is
present, it is often advantageous to separate the cointegration relations from the short-run
dynamics of the DGP. This can be done conveniently by allowing for an error correction or
equilibrium correction (EC) term in the models and EC echelon forms will also be considered.
The model setup for stationary and integrated or cointegrated variables will be presented
in the next section where also forecasting with VARMA models will be considered under the
assumption that the DGP is known. In practice it is, of course, necessary to specify and
estimate a model for the DGP on the basis of a given set of time series. Model specification,
estimation and model checking are discussed in Section 3 and forecasting with estimated
models is considered in Section 4. Conclusions follow in Section 5.
Historical Notes
The successful use of univariate ARMA models for forecasting has motivated researchers to
extend the model class to the multivariate case. It is plausible to expect that using more
information by including more interrelated variables in the model improves the forecast
precision. This is actually the idea underlying Granger’s influential definition of causality
(Granger (1969)). It turned out, however, that generalizing univariate models to multi-
variate ones is far from trivial in the ARMA case. Early on Quenouille (1957) considered
multivariate VARMA models. It became quickly apparent, however, that the specification
and estimation of such models was much more difficult than for univariate ARMA mod-
els. The success of the Box-Jenkins modelling strategy for univariate ARMA models in the
1970s (Box & Jenkins (1976)) triggered further attempts of using the corresponding mul-
2
EUI WP ECO 2004/24
tivariate models and developing estimation and specification strategies. In particular, the
possibility of using autocorrelations, partial autocorrelations and cross-correlations between
the variables for model specification were explored. Because modelling strategies based on
such quantities had been to some extent successful in the univariate Box-Jenkins approach,
it was plausible to try multivariate extensions. Examples of such attempts are Tiao & Box
(1981), Tiao & Tsay (1983, 1989), Tsay (1989a, b), Wallis (1977), Zellner & Palm (1974),
Granger & Newbold (1977, Chapter 7), Jenkins & Alavi (1981). It became soon clear, how-
ever, that these strategies were at best promising for very small systems of two or perhaps
three variables. Moreover, the most useful setup of multiple time series models was under
discussion because VARMA representations are not unique or, to use econometric terminol-
ogy, they are not identified. Important early discussions of the related problems are due to
Hannan (1970, 1976, 1979, 1981), Dunsmuir & Hannan (1976) and Akaike (1974). A rather
general solution to the structure theory for VARMA models was later presented by Hannan
& Deistler (1988). Understanding the structural problems contributed to the development
of complete specification strategies. By now textbook treatments of modelling, analyzing
and forecasting VRAMA processes are available (Lutkepohl (1991), Reinsel (1993)).
The problems related to VARMA models were perhaps also relevant for a parallel de-
velopment of pure VAR models as important tools for economic analysis and forecasting.
Sims (1980) launched a general critique of classical econometric modelling and proposed
VAR models as alternatives. A short while later the concept of cointegration was developed
by Granger (1981) and Engle & Granger (1987). It is conveniently placed into the VAR
framework as shown by Johansen (1995a). Therefore it is perhaps not surprising that VAR
models dominate time series econometrics although the methodology and software for work-
ing with more general VARMA models is nowadays available. A recent previous overview of
forecasting with VARMA processes is given by Lutkepohl (2002). The present review draws
partly on that article and on a monograph by Lutkepohl (1987).
Notation, Terminology, Abbreviations
The following notation and terminology is used in this chapter. The lag operator also some-
times called backshift operator is denoted by L and it is defined as usual by Lyt ≡ yt−1. The
differencing operator is denoted by ∆, that is, ∆yt ≡ yt − yt−1. For a random variable or
random vector x, x ∼ (µ, Σ) signifies that its mean (vector) is µ and its variance (covariance
matrix) is Σ. The (K × K) identity matrix is denoted by IK and the determinant and
3
EUI WP ECO 2004/24
trace of a matrix A are denoted by det A and trA, respectively. For quantities A1, . . . , Ap,
diag[A1, . . . , Ap] denotes the diagonal or block-diagonal matrix with A1, . . . , Ap on the diag-
onal. The natural logarithm of a real number is signified by log. The symbols Z, N and C
are used for the integers, the positive integers and the complex numbers, respectively.
DGP stands for data generation process. VAR, AR, MA, ARMA and VARMA are used
as abbreviations for vector autoregressive, autoregressive, moving-average, autoregressive
moving-average and vector autoregressive moving-average (process). Error correction is ab-
breviated as EC and VECM is short for vector error correction model. The echelon forms
of VARMA and EC-VARMA processes are denoted by ARMAE and EC-ARMAE, respec-
tively. OLS, GLS, ML and RR abbreviate ordinary least squares, generalized least squares,
maximum likelihood and reduced rank, respectively. LR and MSE are used to abbreviate
likelihood ratio and mean squared error.
2 VARMA Processes
2.1 Stationary Processes
Suppose the DGP of the K-dimensional multiple time series, y1, . . . , yT , is stationary, that
is, its first and second moments are time invariant. It is a (finite order) VARMA process if
it can be represented in the general form
A0yt = A1yt−1 + · · ·+Apyt−p +M0ut +M1ut−1 + · · ·+Mqut−q, t = 0,±1,±2, . . . , (2.1)
where A0, A1, . . . , Ap are (K ×K) autoregressive parameter matrices while M0,M1, . . . , Mq
are moving average parameter matrices also of dimension (K×K). Defining the VAR and MA
operators, respectively, as A(L) = A0−A1L−· · ·−ApLp and M(L) = M0+M1L+· · ·+MqL
q,
the model can be written in more compact notation as
A(L)yt = M(L)ut, t = 0,±1,±2, . . . . (2.2)
Here ut is a white-noise process with zero mean, nonsingular, time-invariant covariance
matrix E(utu′t) = Σu and zero covariances, E(utu
′t−h) = 0 for h = ±1,±2, . . . . The zero-
order matrices A0 and M0 are assumed to be nonsingular. They will often be identical,
A0 = M0, and in many cases they will be equal to the identity matrix, A0 = M0 = IK . To
indicate the orders of the VAR and MA operators, the process (2.1) is sometimes called a
VARMA(p, q) process. Notice, however, that so far we have not made further assumptions
4
EUI WP ECO 2004/24
regarding the parameter matrices so that some or all of the elements of the Ai’s and Mj’s
may be zero. In other words, there may be a VARMA representation with VAR or MA
orders less than p and q, respectively. Obviously, the VAR model (1.1) is a VARMA(p, 0)
special case with A0 = IK and M(L) = IK . It may also be worth pointing out that there
are no deterministic terms such as nonzero mean terms in our basic VARMA model (2.1).
These terms are ignored here for convenience although they are important in practice. The
necessary modifications for deterministic terms will be discussed in Section 2.5.
The matrix polynomials in (2.2) are assumed to satisfy
det A(z) 6= 0, |z| ≤ 1, and det M(z) 6= 0, |z| ≤ 1 for z ∈ C. (2.3)
The first of these conditions ensures that the VAR operator is stable and the process is
stationary. Then it has a pure MA representation
yt =∞∑
j=0
Φiut−i (2.4)
with MA operator Φ(L) = Φ0 +∑∞
i=1 ΦiLi = A(L)−1M(L). Notice that Φ0 = IK if A0 = M0
and in particular if both zero order matrices are identity matrices. In that case (2.4) is just
the Wold MA representation of the process and, as we will see later, the ut are just the
one-step ahead forecast errors. Some of the forthcoming results are valid for more general
stationary processes with Wold representation (2.4) which may not come from a finite order
VARMA representation. In that case, it is assumed that the Φi’s are absolutely summable
so that the infinite sum in (2.4) is well-defined.
The second part of condition (2.3) is the usual invertibility condition for the MA operator
which implies the existence of a pure VAR representation of the process,
yt =∞∑i=1
Ξiyt−i + ut, (2.5)
where A0 = M0 is assumed and Ξ(L) = IK − ∑∞i=1 ΞiL
i = M(L)−1A(L). Occasionally
invertibility of the MA operator will not be a necessary condition. In that case, it is assumed
without loss of generality that det M(z) 6= 0, for |z| < 1. In other words, the roots of the
MA operator are outside or on the unit circle. There are still no roots inside the unit circle,
however. This assumption can be made without loss of generality because it can be shown
that for an MA process with roots inside the complex unit circle an equivalent one exists
which has all its roots outside and on the unit circle.
5
EUI WP ECO 2004/24
It may be worth noting at this stage already that every pair of operators A(L), M(L)
which leads to the same transfer functions Φ(L) and Ξ(L) defines an equivalent VARMA
representation for yt. This nonuniqueness problem of the VARMA representation will become
important when parameter estimation is discussed in Section 3.
As specified in (2.1), we are assuming that the process is defined for all t ∈ Z. For sta-
ble, stationary processes this assumption is convenient because it avoids considering issues
related to initial conditions. Alternatively one could define yt to be generated by a VARMA
process such as (2.1) for t ∈ N, and specify the initial values y0, . . . , y−p+1, u0, . . . , u−p+1 sep-
arately. Under our assumptions they can be defined such that yt is stationary. Alternatively,
one may define fixed initial values or perhaps even y0 = · · · = y−p+1 = u0 = · · · = u−p+1 = 0.
In general, such an assumption implies that the process is not stationary but just asymptot-
ically stationary, that is, the first and second order moments converge to the corresponding
quantities of the stationary process obtained by specifying the initial conditions accordingly
or defining yt for t ∈ Z. The issue of defining initial values properly becomes more important
for the nonstationary processes discussed in Section 2.2.
Both the MA and the VAR representations of the process will be convenient to work
with in particular situations. Another useful representation of a stationary VARMA process
is the state space representation which will not be used in this review, however. State space
representations of VARMA processes are considered, for example, by Aoki (1987), Hannan
& Deistler (1988), and Wei (1990). In more general terms state space models are discussed
in Chapter ??? of this Handbook.
2.2 Cointegrated I(1) Processes
If the DGP is not stationary but contains some I(1) variables, the levels VARMA form
(2.1) is not the most convenient one for inference purposes. In that case, det A(z) = 0 for
z = 1. Therefore we write the model in EC form by subtracting A0yt−1 on both sides and
re-arranging terms as follows:
A0∆yt = Πyt−1 + Γ1∆yt−1 + · · ·+ Γp−1∆yt−p+1
+M0ut + M1ut−1 + · · ·+ Mqut−q, t ∈ N,(2.6)
where Π = −(A0 − A1 − · · · − Ap) = −A(1) and Γi = −(Ai+1 + · · ·+ Ap) (i = 1, . . . , p− 1)
(Lutkepohl & Claessen (1997)). Here Πyt−1 is the EC term and r = rk(Π) is the cointe-
grating rank of the system which specifies the number of linearly independent cointegration
relations. The process is assumed to be started at time t = 1 from some initial values
6
EUI WP ECO 2004/24
y0, . . . , y−p+1, u0, . . . , u−p+1 to avoid infinite moments. Thus, the initial values are now of
some importance. Assuming that they are zero is convenient because in that case the pro-
cess is easily seen to have a pure EC-VAR or VECM representation of the form
∆yt = Π∗yt−1 +t−1∑j=1
Θj∆yt−j + A−10 M0ut, t ∈ N, (2.7)
where Π∗ and Θj (j = 1, 2, . . . ) are such that
IK∆− Π∗L−∞∑
j=1
Θj∆Lj = A−10 M0M(L)−1(A0∆− ΠL− Γ1∆L− · · · − Γp−1∆Lp−1).
A similar representation can also be obtained if nonzero initial values are permitted (see
Saikkonen & Lutkepohl (1996)). Furthermore, Bauer & Wagner (2003) present a state space
representation which is especially suitable for cointegrated processes.
2.3 Linear Transformations of VARMA Processes
As mentioned in the introduction, a major advantage of the class of VARMA processes is that
it is closed with respect to linear transformations. In other words, linear transformations of
VARMA processes have again a finite order VARMA representation. These transformations
are very common and are useful to study problems of aggregation, marginal processes or
averages of variables generated by VARMA processes etc.. In particular, the following result
from Lutkepohl (1984) is useful in this context. Let
yt = ut + M1ut−1 + · · ·+ Mqut−q
be a K-dimensional invertible MA(q) process and let F be an (M ×K) matrix of rank M .
Then the M -dimensional process zt = Fyt has an invertible MA(q) representation with q ≤ q.
An interesting consequence of this result is that if yt is a stable and invertible VARMA(p, q)
process as in (2.1), then the linearly transformed process zt = Fyt has a stable and invertible
VARMA(p, q) representation with p ≤ (K − M + 1)p and q ≤ (K − M)p + q (Lutkepohl
(1987, Chapter 4)).
These results are directly relevant for contemporaneous aggregation of VARMA processes
and they can also be used to study temporal aggregation problems. To see this suppose we
wish to aggregate the variables yt generated by (2.1) over m subsequent periods. For instance,
m = 3 if we wish to aggregate monthly data to quarterly figures. To express the temporal
7
EUI WP ECO 2004/24
aggregation as a linear transformation we define
yϑ =
ym(ϑ−1)+1
ym(ϑ−1)+2
...
ymϑ
and uϑ =
um(ϑ−1)+1
um(ϑ−1)+2
...
umϑ
(2.8)
and specify the process
A0yϑ = A1yϑ−1 + · · ·+APyϑ−P +M0uϑ +M1uϑ−1 + · · ·+MQuϑ−Q, (2.9)
where
A0 =
A0 0 0 . . . 0
−A1 A0 0 . . . 0
−A2 −A1 A0...
......
.... . .
−Am−1 −Am−2 −Am−3 . . . A0
,
Ai =
Aim Aim−1 . . . Aim−m+1
Aim+1 Aim . . . Aim−m+2
......
. . ....
Aim+m−1 Aim+m−2 . . . Aim
, i = 1, . . . , P,
with Aj = 0 for j > p and M0, . . . ,MQ defined in an analogous manner. The order
P = minn ∈ N|nm ≥ p and Q = minn ∈ N|nm ≥ q. Notice that the time subscript of
yϑ is different from that of yt. The new time index ϑ refers to another observation frequency
than t. For example, if t refers to months and m = 3, ϑ refers to quarters.
Using the process (2.9), temporal aggregation over m periods can be represented as a
linear transformation. In fact, different types of temporal aggregation can be handled. For
instance, the aggregate may be the sum of subsequent values or it may be their average.
Furthermore, temporal and contemporaneous aggregation can be dealt with simultaneously.
In all of these cases the aggregate has a finite order VARMA representation if the original
variables are generated by a finite order VARMA process and its structure can be ana-
lyzed using linear transformations. For another approach to study temporal aggregates see
Marcellino (1999).
8
EUI WP ECO 2004/24
2.4 Forecasting
2.4.1 General Results
When forecasting a set of variables is the objective, it is useful to think about a loss function
or an evaluation criterion for the forecast performance. Given such a criterion, optimal fore-
casts may be constructed. VARMA processes are particularly useful for producing forecasts
that minimize the forecast MSE. Therefore this criterion will be used here and the reader is
referred to Chapter ??? for a discussion of other forecast evaluation criteria.
Forecasts of the variables of the VARMA process (2.1) are obtained easily from the pure
VAR form (2.5). Assuming an independent white noise process ut, an optimal, minimum
MSE h-step forecast at time τ is the conditional expectation given the yt, t ≤ τ ,
yτ+h|τ ≡ E(yτ+h|yτ , yτ−1, . . . ).
It may be determined recursively for h = 1, 2, . . . , as
yτ+h|τ =∞∑i=1
Ξiyτ+h−i|τ , (2.10)
where yτ+j|τ = yτ+j for j ≤ 0. If the ut do not form an independent but only uncorrelated
white noise sequence, the forecast obtained in this way is still the best linear forecast although
it may not be the best in a larger class of possibly nonlinear functions of past observations.
For given initial values, the ut can also be determined under the present assumption of a
known process. Hence, the h-step forecasts may be determined alternatively as
yτ+h|τ = A−10 (A1yτ+h−1|τ + · · ·+ Apyτ+h−p|τ ) + A−1
0
q∑
i=h
Miuτ+h−i, (2.11)
where, as usual, the sum vanishes if h > q.
Both ways of computing h-step forecasts from VARMA models rely on the availability
of initial values. In the pure VAR formula (2.10) all infinitely many past yt are in principle
necessary if the VAR representation is indeed of infinite order. In contrast, in order to use
(2.11), the ut’s need to be known which are unobserved and can only be obtained if all
past yt or initial conditions are available. If only y1, . . . , yτ are given, the infinite sum in
(2.10) may be truncated accordingly. For large τ , the approximation error will be negligible
because the Ξi’s go to zero quickly as i → ∞. Alternatively, precise forecasting formulas
based on y1, . . . , yτ may be obtained via the so-called Multivariate Innovations Algorithm of
Brockwell & Davis (1987, §11.4).
9
EUI WP ECO 2004/24
Under our assumptions, the properties of the forecast errors for stable, stationary pro-
cesses are easily derived by expressing the process (2.1) in Wold MA form,
yt = ut +∞∑i=1
Φiut−i, (2.12)
where A0 = M0 is assumed (see (2.4)). In terms of this representation the optimal h-step
forecast may be expressed as
yτ+h|τ =∞∑
i=h
Φiuτ+h−i. (2.13)
Hence, the forecast errors are seen to be
yτ+h − yτ+h|τ = uτ+h + Φ1uτ+h−1 + · · ·+ Φh−1uτ+1. (2.14)
Thus, the forecast is unbiased (i.e., the forecast errors have mean zero) and the MSE or
forecast error covariance matrix is
Σy(h) ≡ E[(yτ+h − yτ+h|τ )(yτ+h − yτ+h|τ )′] =
h−1∑j=0
ΦjΣuΦ′j.
If ut is normally distributed (Gaussian), the forecast errors are also normally distributed,
yτ+h − yτ+h|τ ∼ N(0, Σy(h)). (2.15)
Hence, forecast intervals etc. may be derived from these results in the familiar way under
Gaussian assumptions.
It is also interesting to note that the forecast error variance is bounded by the covariance
matrix of yt,
Σy(h) →h→∞ Σy ≡ E(yty′t) =
∞∑j=0
ΦjΣuΦ′j. (2.16)
Hence, forecast intervals will also have bounded length as the forecast horizon increases.
The situation is different if there are integrated variables. The formula (2.11) can again be
used for computing the forecasts. Their properties will be different from those for stationary
processes, however. Although the Wold MA representation does not exist for integrated
processes, the Φj coefficient matrices can be computed in the same way as for stationary
processes from the power series A(z)−1M(z) which still exists for z ∈ C with |z| < 1. Hence,
the forecast errors can still be represented as in (2.14) (see Lutkepohl (1991, Chapter 11)).
Thus, formally the forecast errors look quite similar to those for the stationary case. Now
10
EUI WP ECO 2004/24
the forecast error MSE matrix is unbounded, however, because the Φj’s in general do not
converge to zero as j →∞. Despite this general result, there may be linear combinations of
the variables which can be forecast with bounded precision if the forecast horizon gets large.
This situation arises if there is cointegration. For cointegrated processes it is of course also
possible to base the forecasts directly on the EC form. For instance, using (2.6),
∆yτ+h|τ = A−10 (Πyτ+h−1|τ +Γ1∆yτ+h−1|τ +· · ·+Γp−1∆yτ+h−p+1|τ )+A−1
0
q∑
i=h
Miuτ+h−i, (2.17)
and yτ+h|τ = yτ+h−1|τ + ∆yτ+h|τ can be used to get a forecast of the levels variables.
As an illustration of forecasting cointegrated processes consider the following bivariate
VAR model which has cointegrating rank 1: y1t
y2t
=
0 1
0 1
y1,t−1
y2,t−1
+
u1t
u2t
. (2.18)
For this process
A(z)−1 = (I2 − A1z)−1 =∞∑
j=0
Aj1z
j =∞∑
j=0
Φjzj
exists only for |z| < 1 because Φ0 = I2 and
Φj = Aj1 =
0 1
0 1
, j = 1, 2, . . . ,
does not converge to zero for j →∞. The forecast MSE is
Σy(h) =h−1∑j=0
ΦjΣuΦ′j = Σu + (h− 1)
σ2
2 σ22
σ22 σ2
2
, h = 1, 2, . . . ,
where σ22 is the variance of u2t. The conditional expectations are yk,τ+h|τ = y2,τ (k = 1, 2).
Assuming normality of the white noise process, (1−γ)100% forecast intervals are easily seen
to be
y2,τ ± c1−γ/2
√σ2
k + (h− 1)σ22, k = 1, 2,
where c1−γ/2 is the (1− γ/2)100 percentage point of the standard normal distribution. The
lengths of these intervals increase without bounds for h →∞.
The EC representation of (2.18) is easily seen to be
∆yt =
−1 1
0 0
yt−1 + ut.
11
EUI WP ECO 2004/24
Thus, rk(Π) = 1 so that the two variables are cointegrated and some linear combinations can
be forecasted with bounded forecast intervals. For the present example, multiplying (2.18)
by
1 −1
0 1
gives
1 −1
0 1
yt =
0 0
0 1
yt−1 +
1 −1
0 1
ut.
Obviously, the cointegration relation zt = y1t − y2t = u1t − u2t is zero mean white noise
and the forecast intervals for zt for any forecast horizon h ≥ 1 are of constant length,
zτ+h|τ ± c1−γ/2σz(h) or [−c1−γ/2σz, c1−γ/2σz]. Note that zτ+h|τ = 0 for h ≥ 1 and σ2z =
Var(u1t) + Var(u2t)− 2Cov(u1t, u2t) is the variance of zt.
As long as theoretical results are discussed one could consider the first differences of the
process, ∆yt, which also have a VARMA representation. If there is genuine cointegration,
then ∆yt is overdifferenced in the sense that its VARMA representation has MA unit roots
even if the MA part of the levels yt is invertible.
2.4.2 Forecasting Aggregated Processes
We have argued in Section 2.3 that linear transformations of VARMA processes are often of
interest, for example, if aggregation is studied. Therefore forecasts of transformed processes
are also of interest. Here we present some forecasting results for transformed and aggregated
processes from Lutkepohl (1987) where also proofs and further references can be found. We
begin with general results which have immediate implications for contemporaneous aggrega-
tion. Then we will also present some results for temporally aggregated processes which can
be obtained via the process representation (2.9).
Linear Transformations and Contemporaneous Aggregation
Suppose yt is a stationary VARMA process with pure, invertible Wold MA representation
(2.4), that is, yt = Φ(L)ut with Φ0 = IK , F is an (M ×K) matrix with rank M and we are
interested in forecasting the transformed process zt = Fyt. It was discussed in Section 2.3
that zt also has a VARMA representation so that the previously considered techniques can
12
EUI WP ECO 2004/24
be used for forecasting. Suppose that the corresponding Wold MA representation is
zt =∞∑i=0
Ψivt−i = Ψ(L)vt. (2.19)
From (2.13) the optimal h-step predictor for zt at origin τ , based on its own past, is then
zτ+h|τ =∞∑
i=h
Ψivτ+h−i. (2.20)
In general this predictor will differ from one based on forecasting yt and then transforming
the forecast,
zoτ+h|τ ≡ Fyτ+h|τ . (2.21)
Before we compare the two forecasts zoτ+h|τ and zτ+h|τ it may be of interest to draw
attention to yet another possible forecast. If the dimension K of the vector yt is large, it
may be difficult to construct a suitable VARMA model for the underlying process and one
may consider forecasting the individual components of yt by univariate methods and then
transforming the univariate forecasts. Because the component series of yt can be obtained by
linear transformations, they also have ARMA representations. Denoting the corresponding
Wold MA representations by
ykt =∞∑i=0
θkiwk,t−i = θk(L)wkt, k = 1, . . . , K, (2.22)
the optimal univariate h-step forecasts are
yuk,τ+h|τ =
∞∑
i=h
θkiwk,τ+h−i, k = 1, . . . , K. (2.23)
Defining yuτ+h|τ = (yu
1,τ+h|τ , . . . , yuK,τ+h|τ )
′, these forecasts can be used to obtain an h-step
forecast
zuτ+h|τ ≡ Fyu
τ+h|τ (2.24)
of the variables of interest.
We will now compare the three forecasts (2.20), (2.21) and (2.24) of the transformed
process zt. In this comparison we denote the MSE matrices corresponding to the three
forecasts by Σz(h), Σoz(h) and Σu
z (h), respectively. Because zoτ+h|τ uses the largest information
set, it is not surprising that it has the smallest MSE matrix and is hence the best one out
of the three forecasts,
Σz(h) ≥ Σoz(h) and Σu
z (h) ≥ Σoz(h), (2.25)
13
EUI WP ECO 2004/24
where “≥” means that the difference between the left-hand and right-hand matrices is pos-
itive semidefinite. Thus, forecasting the original process yt and then transforming the fore-
casts is generally more efficient than forecasting the transformed process directly or trans-
forming univariate forecasts. It is possible, however, that some or all of the forecasts are
identical. Actually, for I(0) processes, all three predictors always approach the same long-
term forecast of zero. Consequently,
Σz(h), Σoz(h), Σu
z (h) → Σz ≡ E(ztz′t) as h →∞. (2.26)
Moreover, it can be shown that if the one-step forecasts are identical, then they will also be
identical for larger forecast horizons. More precisely we have,
zoτ+1|τ = zτ+1|τ ⇒ zo
τ+h|τ = zτ+h|τ h = 1, 2, . . . , (2.27)
zuτ+1|τ = zτ+1|τ ⇒ zu
τ+h|τ = zτ+h|τ h = 1, 2, . . . , (2.28)
and, if Φ(L) and Θ(L) are invertible,
zoτ+1|τ = zu
τ+1|τ ⇒ zoτ+h|τ = zu
τ+h|τ h = 1, 2, . . . . (2.29)
Thus, one may ask whether the one-step forecasts can be identical and it turns out that this
is indeed possible. The following proposition which summarizes results of Tiao & Guttman
(1980), Kohn (1982) and Lutkepohl (1984), gives conditions for this to happen.
Proposition 1. Let yt be a K-dimensional stochastic process with MA representation as in
(2.4) with Φ0 = IK and F an (M ×K) matrix with rank M . Then, using the notation intro-
duced in the foregoing and defining Θ(L) = diag[θ1(L), . . . , θK(L)], the following relations
hold:
zoτ+1|τ = zτ+1|τ ⇐⇒ FΦ(L) = Ψ(L)F, (2.30)
zuτ+1|τ = zτ+1|τ ⇐⇒ FΘ(L) = Ψ(L)F (2.31)
and, if Φ(L) and Θ(L) are invertible,
zoτ+1|τ = zu
τ+1|τ ⇐⇒ FΦ(L)−1 = FΘ(L)−1. (2.32)
¤
14
EUI WP ECO 2004/24
There are several interesting implications of this proposition. First, if yt consists of
independent components (Φ(L) = Θ(L)) and zt is just their sum, i.e., F = (1, . . . , 1), then
zoτ+1|τ = zτ+1|τ ⇐⇒ θ1(L) = · · · = θK(L). (2.33)
In other words, forecasting the individual components and summing up the forecasts is
strictly more efficient than forecasting the sum directly whenever the components are not
generated by identical stochastic processes. Second, forecasting the univariate components
of yt individually can only be as efficient a forecast for yt as forecasting on the basis of the
multivariate process if and only if Φ(L) is a diagonal matrix operator. Related to this result
is a well-known condition for Granger-noncausality. For a bivariate process yt = (y1t, y2t)′,
y2t is said to be Granger-causal for y1t if it is helpful for improving the forecasts of the latter
variable. In terms of the previous notation this may be stated by specifying F = (1, 0)
and defining y2t as being Granger-causal for y1t if zoτ+1|τ = Fyτ+1|τ = yo
1,τ+1|τ is a better
forecast than zτ+1|τ = y1,τ+1|τ . From (2.30) it then follows that y2t is not Granger-causal for
y1t if and only if φ12(L) = 0, where φ12(L) denotes the upper right hand element of Φ(L).
This characterization of Granger-noncausality is well-known in the related literature (e.g.,
Lutkepohl (1991, Section 2.3.1)).
It may also be worth noting that in general there is no unique ranking of the forecasts
zτ+1|τ and zuτ+1|τ . Depending on the structure of the underlying process yt and the trans-
formation matrix F , either Σz(h) ≥ Σuz (h) or Σz(h) ≤ Σu
z (h) will hold and the relevant
inequality may be strict in the sense that the left-hand and right-hand matrices are not
identical.
Some but not all the results in this section carry over to nonstationary I(1) processes.
For example, the result (2.26) will not hold in general if some components of yt are I(1)
because in this case the three forecasts do not necessarily converge to zero as the forecast
horizon gets large. On the other hand, the conditions in (2.30) and (2.31) can be used for
the differenced processes. For these results to hold, the MA operator may have roots on the
unit circle and hence overdifferencing is not a problem.
The previous results on linearly transformed processes can also be used to compare
different predictors for temporally aggregated processes by setting up the corresponding
process (2.9). Some related results will be summarized next.
15
EUI WP ECO 2004/24
Temporal Aggregation
Different forms of temporal aggregation are of interest, depending on the types of variables
involved. If yt consists of stock variables, then temporal aggregation is usually associated
with systematic sampling, sometimes called skip-sampling or point-in-time sampling. In other
words, the process
sϑ = ymϑ (2.34)
is used as an aggregate over m periods. Here the aggregated process sϑ has a new time index
which refers to another observation frequency than the original subscript t. For example, if
t refers to months and m = 3, then ϑ refers to quarters. In that case the process sϑ consists
of every third variable of the yt process. This type of aggregation contrasts with temporal
aggregation of flow variables where a temporal aggregate is typically obtained by summing
up consecutive values. Thus, aggregation over m periods gives the aggregate
zϑ = ymϑ + ymϑ−1 + · · ·+ ymϑ−m+1. (2.35)
Now if, for example, t refers to months and m = 3, then three consecutive observations are
added to obtain the quarterly value. In the following we again assume that the disaggregated
process yt is stationary and invertible and has a Wold MA representation as in (2.4), yt =
Φ(L)ut with Φ0 = IK . As we have seen in Section 2.3, this implies that sϑ and zϑ are also
stationary and have Wold MA representations. We will now discuss forecasting stock and
flow variables in turn. In other words, we consider forecasts for sϑ and zϑ.
Suppose first that we wish to forecast sϑ. Then the past aggregated values sϑ, sϑ−1, . . . may be used to obtain an h-step forecast sϑ+h|ϑ as in (2.13) on the basis of the MA repre-
sentation of sϑ. If the disaggregate process yt is available, another possible forecast results
by systematically sampling forecasts of yt which gives soϑ+h|ϑ = ymϑ+mh|mϑ. Using the results
for linear transformations, the latter forecast generally has a lower MSE than sϑ+h|ϑ and the
difference vanishes if the forecast horizon h → ∞. For special processes the two predictors
are identical, however. It follows from relation (2.30) of Proposition 1 that the two predictors
are identical for h = 1, 2, . . . , if and only if
Φ(L) =
( ∞∑i=0
ΦimLim
) (m−1∑i=0
ΦiLi
)(2.36)
(Lutkepohl (1987, Proposition 7.1)). Thus, there is no loss in forecast efficiency if the MA
operator of the disaggregate process has the multiplicative structure in (2.36). This condition
16
EUI WP ECO 2004/24
is, for instance, satisfied if yt is a purely seasonal process with seasonal period m such that
yt =∞∑i=0
Φimut−im. (2.37)
It also holds if yt has a finite order MA structure with MA order less than m. Interestingly,
it also follows that there is no loss in forecast efficiency if the disaggregate process yt is a
VAR(1) process, yt = A1yt−1 + ut. In that case, the MA operator can be written as
Φ(L) =
( ∞∑i=0
Aim1 Lim
)(m−1∑i=0
Ai1L
i
)
and, hence, it has the required structure.
Now consider the case of a vector of flow variables yt for which the temporal aggregate
is given in (2.35). For forecasting the aggregate zϑ one may use the past aggregated values
and compute an h-step forecast zϑ+h|ϑ as in (2.13) on the basis of the MA representation
of zϑ. Alternatively, we may again forecast the disaggregate process yt and aggregate the
forecasts. This forecast is denoted by zoϑ+h|ϑ, that is,
zoϑ+h|ϑ = ymϑ+mh|mϑ + ymϑ+mh−1|mϑ + · · ·+ ymϑ+mh−m+1|mϑ. (2.38)
Again the results for linear transformations imply that the latter forecast generally has a
lower MSE than zϑ+h|ϑ and the difference vanishes if the forecast horizon h → ∞. In this
case equality of the two forecasts holds for small forecast horizons h = 1, 2, . . . , if and only
if
(1 + L + · · ·+ Lm−1)
( ∞∑i=0
ΦiLi
)
=
( ∞∑j=0
(Φjm + · · ·+ Φjm−m+1)Ljm
) (m−1∑i=0
(Φ0 + Φ1 + · · ·+ Φi)Li
), (2.39)
where Φj = 0 for j < 0 (Lutkepohl (1987, Proposition 8.1)). In other words, the two forecasts
are identical and there is no loss in forecast efficiency from using the aggregate directly if
the MA operator of yt has the specified multiplicative structure upon multiplication by
(1 + L + · · ·+ Lm−1). This condition is also satisfied if yt has the purely seasonal structure
(2.37). However, in contrast to what was observed for stock variables, the two predictors are
generally not identical if the disaggregate process yt is generated by an MA process of order
less than m.
It is perhaps also interesting to note that if there are both stock and flow variables in one
system, then even if the underlying disaggregate process yt is the periodic process (2.37), a
17
EUI WP ECO 2004/24
forecast based on the disaggregate data may be better than directly forecasting the aggregate
(Lutkepohl (1987, pp. 177-178)). This result is interesting because for the purely seasonal
process (2.37) using the disaggregate process will not result in superior forecasts if a system
consisting either of stock variables only or of flow variables only is considered.
So far we have considered temporal aggregation of stationary processes. Most of the
results can be generalized to I(1) processes by considering the stationary process ∆yt instead
of the original process yt. Recall that forecasts for yt can then be obtained from those of
∆yt. Moreover, in this context it may be worth taking into account that in deriving some of
the conditions for forecast equality, the MA operator of the considered disaggregate process
may have unit roots resulting from overdifferencing. A result which does not carry over
to the I(1) case, however, is the equality of long horizon forecasts based on aggregate or
disaggregate variables. The reason is again that optimal forecasts of I(1) variables do not
settle down at zero eventually when h →∞.
Clearly, so far we have just discussed forecasting of known processes. In practice, the
DGPs have to be specified and estimated on the basis of limited sample information. In
that case quite different results may be obtained and, in particular, forecasts based on
disaggregate processes may be inferior to those based on the aggregate directly. This issue
is taken up again in Section 4.2 when forecasting estimated processes is considered.
Forecasting temporally aggregated processes has been discussed extensively in the liter-
ature. Early examples of treatments of temporal aggregation of time series are Abraham
(1982), Amemiya & Wu (1972), Brewer (1973), Lutkepohl (1986a, b), Stram & Wei (1986),
Telser (1967), Tiao (1972), Wei (1978) and Weiss (1984) among many others. More recently,
Breitung & Swanson (2002) have studied the implications of temporal aggregation when the
number of aggregated time units goes to infinity.
2.5 Extensions
So far we have considered processes which are too simple in some respects to qualify as
DGPs of most economic time series. This was mainly done to simplify the exposition. Some
important extensions will now be considered. In particular, we will discuss deterministic
terms, higher order integration and seasonal unit roots as well as non-Gaussian processes.
18
EUI WP ECO 2004/24
2.5.1 Deterministic Terms
An easy way to integrate deterministic terms in our framework is to simply add them to the
stochastic part. In other words, we consider processes
yt = µt + xt,
where µt is a deterministic term and xt is the purely stochastic part which is assumed to
have a VARMA representation of the type considered earlier. The deterministic part can, for
example, be a constant, µt = µ0, a linear trend, µt = µ0 + µ1t, or a higher order polynomial
trend. Furthermore, seasonal dummy variables or other dummies may be included.
From a forecasting point of view, deterministic terms are easy to handle because by
their very nature their future values are precisely known. Thus, in order to forecast yt, we
may forecast the purely stochastic process xt as discussed earlier and then simply add the
deterministic part corresponding to the forecast period. In this case, the forecast errors and
MSE matrices are the same as for the purely stochastic process. Of course, in practice the
deterministic part may contain unknown parameters which have to be estimated from data.
For the moment this issue is ignored because we are considering known processes. It will
become important, however, in Section 4, where forecasting estimated processes is discussed.
2.5.2 More Unit Roots
In practice the order of integration of some of the variables can be greater than one and
det A(z) may have roots on the unit circle other than z = 1. For example, there may be
seasonal unit roots. Considerable research has been done on these extensions of our basic
models. See, for instance, Johansen (1995b, 1997), Gregoir & Laroque (1994) and Haldrup
(1998) for discussions of the I(2) and higher order integration frameworks, and Johansen &
Schaumburg (1999) and Gregoir (1999a, b) for research on processes with roots elsewhere
on the unit circle. Bauer & Wagner (2003) consider state space representations for VARMA
models with roots at arbitrary points on the unit circle.
As long as the processes are assumed to be known these issues do not create additional
problems for forecasting because we can still use the general forecasting formulas for VARMA
processes. Extensions are important, however, when it comes to model specification and
estimation. In these steps of the forecasting procedure taking into account extensions in the
methodology may be useful.
19
EUI WP ECO 2004/24
2.5.3 Non-Gaussian Processes
If the DGP of a multiple time series is not normally distributed, point forecasts can be
computed as before. They will generally still be best linear forecasts and may in fact be
minimum MSE forecasts if ut is independent white noise, as discussed in Section 2.4. In
setting up forecast intervals the distribution has to be taken into account, however. If the
distribution is unknown, bootstrap methods can be used to compute interval forecasts (e.g.,
Findley (1986), Masarotto (1990), Grigoletto (1998), Kabaila (1993), Pascual, Romo & Ruiz
(2004)).
3 Specifying and Estimating VARMA Models
As we have seen in the previous section, for forecasting purposes the pure VAR or MA rep-
resentations of a stochastic process are quite useful. These representations are in general of
infinite order. In practice, they have to be replaced by finite dimensional parameterizations
which can be specified and estimated from data. VARMA processes are such finite dimen-
sional parameterizations. Therefore, in practice, a VARMA model such as (2.1) or even a
pure finite order VAR as in (1.1) will be specified and estimated as a forecasting tool.
As mentioned earlier, the operators A(L) and M(L) of the VARMA model (2.2) are not
unique or not identified, as econometricians sometimes say. This nonuniqueness is prob-
lematic if the process parameters have to be estimated because a unique representation is
needed for consistent estimation. Before we discuss estimation and specification issues re-
lated to VARMA processes we will therefore present identifying restrictions. More precisely,
the echelon form of VARMA and EC-VARMA models will be presented. Then estimation
procedures, model specification and diagnostic checking will be discussed.
3.1 The Echelon Form
Any pair of operators A(L) and M(L) that gives rise to the same VAR operator Ξ(L) =
IK −∑∞
i=1 ΞiLi = M(L)−1A(L) or MA operator Φ(L) = A(L)−1M(L) defines an equivalent
VARMA process for yt. Here A0 = M0 is assumed. Clearly, if we premultiply A(L) and
M(L) by some invertible operator D(L) = D0 + D1L + · · · + DqLq satisfying det(D0) 6= 0
and det D(z) 6= 0 for |z| ≤ 1, an equivalent VARMA representation is obtained. Thus, a
first step towards finding a unique representation is to cancel common factors in A(L) and
M(L). We therefore assume that the operator [A(L) : M(L)] is left-coprime. To define this
20
EUI WP ECO 2004/24
property, recall that a matrix polynomial D(z) and the corresponding operator D(L) are
unimodular if det D(z) is a constant which does not depend on z. Examples of unimodular
operators are
D(L) = D0 or D(L) =
1 δL
0 1
(3.1)
(see Lutkepohl (1996) for definitions and properties of matrix polynomials). A matrix oper-
ator [A(L) : M(L)] is called left-coprime if only unimodular operators D(L) can be factored.
In other words, [A(L) : M(L)] is left-coprime if for operators A(L), M(L) an operator D(L)
exists such that [A(L) : M(L)] = D(L)[A(L) : M(L)] holds, then D(L) must be unimodular.
Although considering only left-coprime operators [A(L) : M(L)] does not fully solve the
nonuniqueness problem of VARMA representations it is a first step in the right direction
because it excludes many possible redundancies. It does not rule out premultiplication by
some nonsingular matrix, for example, and thus, there is still room for improvement. Even
if A0 = M0 = IK is assumed, uniqueness of the operators is not achieved because there
are unimodular operators D(L) with zero-order matrix IK , as seen in (3.1). Premultiplying
[A(L) : M(L)] by such an operator maintains left-coprimeness. Therefore more restrictions
are needed for uniqueness. The echelon form discussed in the next subsections provides
sufficiently many restrictions in order to ensure uniqueness of the operators. We will first
consider stationary processes and then turn to EC-VARMA models.
3.1.1 Stationary Processes
We assume that [A(L) : M(L)] is left-coprime and we denote the kl-th elements of A(L) and
M(L) by αkl(L) and mkl(L), respectively. Let pk be the maximum polynomial degree in the
k-th row of [A(L) : M(L)], k = 1, . . . , K, and define
pkl =
min(pk + 1, pl) for k > l,
min(pk, pl) for k < l,k, l = 1, . . . , K.
The VARMA process is said to be in echelon form or, briefly, ARMAE form if the operators
A(L) and M(L) satisfy the following restrictions (Lutkepohl & Claessen (1997), Lutkepohl
(2002)):
mkk(L) = 1 +
pk∑i=1
mkk,iLi, for k = 1, . . . , K, (3.2)
21
EUI WP ECO 2004/24
mkl(L) =
pk∑i=pk−pkl+1
mkl,iLi, for k 6= l, (3.3)
and
αkl(L) = αkl,0 −pk∑i=1
αkl,iLi, with αkl,0 = mkl,0 for k, l = 1, . . . , K. (3.4)
Here the row degrees pk (k = 1, . . . , K) are called the Kronecker indices (see Hannan &
Deistler (1988), Lutkepohl (1991)). In the following we denote an echelon form with Kro-
necker indices p1, . . . , pK by ARMAE(p1, . . . , pK). Notice that it corresponds to a VARMA(p, p)
representation in (2.1) with p = max(p1, . . . , pK). An ARMAE form may have more zero
coefficients than those specified by the restrictions from (3.2)-(3.4). In particular, there may
be models where the AR and MA orders are not identical due to further zero restrictions.
Such over-identifying constraints are not ruled out by the echelon form. It does not need
them to ensure uniqueness of the operator [A(L) : M(L)] for a given VAR operator Ξ(L)
or MA operator Φ(L), however. Note also that every VARMA process can be written in
echelon form. Thus, the echelon form does not exclude any VARMA processes.
The present specification of the echelon form does not restrict the autoregressive operator
except for the maximum row degrees imposed by the Kronecker indices and the zero order
matrix (A0 = M0). Additional identifying zero restrictions are placed on the moving average
coefficient matrices attached to low lags of the error process ut. This form of the echelon form
was proposed by Lutkepohl & Claessen (1997) because it can be combined conveniently with
the EC representation of a VARMA process, as we will see shortly. Thus, it is particularly
useful for processes with cointegrated variables. A slightly different echelon form is usually
used in the literature on stationary processes. Typically the restrictions on low order lags
are imposed on the VAR coefficient matrices (e.g., Hannan & Deistler (1988), Lutkepohl
(1991)).
To illustrate the present ARMAE form we consider the following three-dimensional pro-
cess from Lutkepohl (2002) with Kronecker indices (p1, p2, p3) = (1, 2, 1). It is easy to derive
the pkl,
[pkl] =
• 1 1
1 • 1
1 2 •
.
22
EUI WP ECO 2004/24
Hence, we get the ARMAE(1, 2, 1) form:
1 0 0
0 1 0
0 α32,0 1
yt =
α11,1 α12,1 α13,1
α21,1 α22,1 α23,1
α31,1 α32,1 α33,1
yt−1 +
0 0 0
α21,2 α22,2 α23,2
0 0 0
yt−2
+
1 0 0
0 1 0
0 α32,0 1
ut +
m11,1 m12,1 m13,1
0 m22,1 0
m31,1 m32,1 m33,1
ut−1
+
0 0 0
m21,2 m22,2 m23,2
0 0 0
ut−2.
(3.5)
Although in this example the zero order matrix is lower triangular, it will often be an identity
matrix in an echelon form. In fact, it will always be an identity matrix if the Kronecker indices
are ordered from smallest to largest.
3.1.2 I(1) Processes
If the EC form of the ARMAE model is set up as in (2.6), the autoregressive short-run
coefficient matrices Γi (i = 1, . . . , p − 1) satisfy similar identifying constraints as the Ai’s
(i = 1, . . . , p). More precisely, Γi obeys the same zero restrictions as Ai+1 for i = 1, . . . , p−1.
This structure follows from the specific form of the zero restrictions on the Ai’s. If αkl,i is
restricted to zero by the echelon form this implies that the corresponding element αkl,j of Aj
is also zero for j > i. Similarly, the zero restrictions on Π are the same as those on A0−A1.
Because the echelon form does not impose zero restrictions on A1 if all Kronecker indices
pk ≥ 1 (k = 1, . . . , K), there are no echelon form zero restrictions on Π if all Kronecker
indices are greater than zero. On the other hand, if there are zero Kronecker indices, this
has consequences for the rank of Π and, hence, for the integration and cointegration structure
of the variables. In fact, denoting by % the number of zero Kronecker indices, it can be shown
that
rk(Π) ≥ %. (3.6)
This result is useful to remember when procedures for specifying the cointegrating rank of a
VARMA system are considered.
In the following we use the acronym EC-ARMAE for an EC-VARMA model which sat-
isfies the echelon form restrictions. To illustrate its structure, we follow again Lutkepohl
23
EUI WP ECO 2004/24
(2002) and use the model (3.5). The corresponding EC-ARMAE form is
1 0 0
0 1 0
0 α32,0 1
∆yt =
π11 π12 π13
π21 π22 π23
π31 π32 π33
yt−1 +
0 0 0
γ21,1 γ22,1 γ23,1
0 0 0
∆yt−1
+
1 0 0
0 1 0
0 α32,0 1
ut +
m11,1 m12,1 m13,1
0 m22,1 0
m31,1 m32,1 m33,1
ut−1
+
0 0 0
m21,2 m22,2 m23,2
0 0 0
ut−2.
The following three-dimensional ARMAE(0, 0, 1) model is another example from Lutkepohl
(2002):
yt =
0 0 0
0 0 0
α31,1 α32,1 α33,1
yt−1 + ut +
0 0 0
0 0 0
m31,1 m32,1 m33,1
ut−1. (3.7)
Note that in this case A0 = I3. Now two of the Kronecker indices are zero and, hence,
according to (3.5), the cointegrating rank of this system must be at least 2. The EC-ARMAE
form is
∆yt =
1 0 0
0 1 0
π31 π32 π33
yt−1 + ut +
0 0 0
0 0 0
m31,1 m32,1 m33,1
ut−1.
The rank of
Π =
1 0 0
0 1 0
π31 π32 π33
is clearly at least two.
Because we now have a unique representation of VARMA models we can discuss estima-
tion of such models. Of course, to estimate an ARMAE or EC-ARMAE form we need to
specify the Kronecker indices and possibly the cointegrating rank. We will discuss parameter
estimation first and then consider model specification issues.
24
EUI WP ECO 2004/24
Before we go on with these topics, we mention that there are other ways to achieve
uniqueness or identification of a VARMA representation. For example, Zellner & Palm (1974)
and Wallis (1977) consider a final form representation which also solves the identification
problem. It often results in rather heavily parameterized models and has therefore not
gained much popularity. Tiao & Tsay (1989) propose so-called scalar component models
to overcome the identification problem. The idea is to consider linear combinations of the
variables which can reveal simplifications of the general VARMA structure. The interested
reader is referred to the aforementioned article. We have presented the echelon form here in
some detail because it often results in parsimonious representations.
3.2 Estimation of VARMA Models for Given Lag Orders and
Cointegrating Rank
For given Kronecker indices the ARMAE form of a VARMA DGP can be set up and esti-
mated. We will consider this case first and then study estimation of EC-ARMAE models for
which the cointegrating rank is given in addition to the Kronecker indices. Specification of
the Kronecker indices and the cointegrating rank will be discussed in Sections 3.4 and 3.3,
respectively.
3.2.1 ARMAE Models
Suppose the white noise process ut is normally distributed (Gaussian), ut ∼ N(0, Σu). Given
a sample y1, . . . , yT and presample values y0, . . . , yp−1, u0, . . . , uq−1, the log-likelihood function
of the VARMA model (2.1) is
l(θ) =T∑
t=1
lt(θ). (3.8)
Here θ represents the vector of all parameters to be estimated and
lt(θ) = −K
2log 2π − 1
2log det Σu − 1
2u′tΣ
−1u ut,
where
ut = M−10 (A0yt − A1yt−1 − · · · − Apyt−p −M1ut−1 − · · · −Mqut−q).
It is assumed that the uniqueness restrictions of the ARMAE form are imposed and θ contains
the freely varying parameters only. The initial values are assumed to be fixed and if the ut
25
EUI WP ECO 2004/24
(t ≤ 0) are not available, they may be replaced by zero without affecting the asymptotic
properties of the estimators.
Maximization of l(θ) is a nonlinear optimization problem which is complicated by the
inequality constraints that ensure invertibility of the MA operator. Iterative optimization
algorithms may be used here. Start-up values for such algorithms may be obtained as follows:
An unrestricted long VAR model of order hT , say, is fitted by OLS in a first step. Denoting
the estimated residuals by ut, the ARMAE form can be estimated when all lagged ut’s are
replaced by ut’s. If A0 6= IK , then unlagged ujt in equation k (k 6= j) may also be replaced
by estimated residuals from the long VAR. The resulting parameter estimates can be used
as starting values for an iterative algorithm.
If the DGP is stable and invertible and the parameters are identified, the ML estimator
θ has standard limiting properties, that is, θ is consistent and
√T (θ − θ)
d→ N(0, Σθ),
whered→ signifies convergence in distribution and Σθ is the inverse asymptotic information
matrix. This result holds even if the true distribution of the ut’s is not normal but satisfies
suitable moment conditions. In that case the estimators are just quasi ML estimators, of
course.
There has been some discussion of the likelihood function of VARMA models and its
maximization (Tunnicliffe Wilson (1973), Nicholls & Hall (1979), Hillmer & Tiao (1979)).
Unfortunately, optimization of the Gaussian log-likelihood is not a trivial exercise. Therefore
other estimation methods have been proposed in the literature (e.g., Koreisha & Pukkila
(1987), Kapetanios (2003), Poskitt (2003)). Of course, it is also straightforward to add
deterministic terms to the model and estimate the associated parameters along with the
VARMA coefficients.
3.2.2 EC-ARMAE Models
If the cointegrating rank r is given and the DGP is a pure, finite order VAR(p) process, the
corresponding VECM,
∆yt = αβ′yt−1 + Γ1∆yt−1 + · · ·+ Γp−1∆yt−p+1 + ut, (3.9)
can be estimated conveniently by RR regression, as shown in Johansen (1995a). Concentrat-
ing out the short-run dynamics by regressing ∆yt and yt−1 on ∆Y ′t−1 = [∆y′t−1, . . . , ∆y′t−p+1]
26
EUI WP ECO 2004/24
and denoting the residuals by R0t and R1t, respectively, the EC term can be estimated by
RR regression from
R0t = αβ′R1t + uct . (3.10)
Because the decomposition Π = αβ′ is not unique, the estimators for α and β are not
consistent whereas the resulting ML estimator for Π is consistent. However, because the
matrices α and β have rank r, one way to make them unique is to choose
β′ = [Ir : β′(K−r)], (3.11)
where β(K−r) is a ((K−r)×r) matrix. This normalization is always possible upon a suitable
ordering of the variables. The ML estimator of β(K−r) can be obtained by post-multiplying
the RR estimator β of β by the inverse of its first r rows and using the last K−r rows as the
estimator β(K−r) of β(K−r). This estimator is not only consistent but even superconsistent
meaning that it converges at a faster rate than the usual√
T to the true parameter matrix
β(K−r). In fact, it turns out that T (β(K−r) − β(K−r)) converges in distribution. As a result
inference for the other parameters can be done as if the cointegration matrix β were known.
Other estimation procedures that can be used here as well were proposed by Ahn &
Reinsel (1990) and Saikkonen (1992). In fact, in the latter article it was shown that the
procedure can even be justified if the true DGP is an infinite order VAR process and only a
finite order model is fitted, as long as the order goes to infinity with growing sample size. This
result is convenient in the present situation where we are interested in VARMA processes,
because we can estimate the cointegration relations in a first step on the basis of a finite
order VECM without MA part. Then the estimated cointegration matrix can be used in
estimating the remaining VARMA parameters. That is, the short-run parameters including
the loading coefficients α and MA parameters of the EC-ARMAE form can then be estimated
by ML conditional on the estimator for β. Because of the superconsistency of the estimator
for the cointegration parameters this procedure maintains the asymptotic efficiency of the
Gaussian ML estimator. Except for the cointegration parameters, the parameter estimators
have standard asymptotic properties which are equivalent to those of the full ML estimators
(Yap & Reinsel (1995)). If the Kronecker indices are given, the echelon VARMA structure
can also be taken into account in estimating the cointegration matrix.
As mentioned earlier, before a model can be estimated, the Kronecker indices and possibly
the cointegrating rank have to be specified. These issues are discussed next.
27
EUI WP ECO 2004/24
3.3 Testing for the Cointegrating Rank
A wide range of proposals exists for determining the cointegrating ranks of pure VAR pro-
cesses (see Hubrich, Lutkepohl & Saikkonen (2001) for a recent survey). The most popular
approach is due to Johansen (1995a) who derives likelihood ratio (LR) tests for the cointe-
grating rank of a pure VAR process. Because ML estimation of unrestricted VECMs with a
specific cointegrating rank r is straightforward for Gaussian processes, the LR statistic for
testing the pair of hypotheses H0 : r = r0 versus H1 : r > r0 is readily available by comparing
the likelihood maxima for r = r0 and r = K. The asymptotic distributions of the LR statis-
tics are nonstandard and depend on the deterministic terms included in the model. Tables
with critical values for various different cases are available in Johansen (1995a, Chapter 15).
The cointegrating rank can be determined by checking sequentially the null hypotheses
H0 : r = 0, H0 : r = 1, . . . , H0 : r = K − 1
and choosing the cointegrating rank for which the first null hypothesis cannot be rejected in
this sequence.
For our present purposes it is of interest that Johansen’s LR tests can be justified even
if a finite-order VAR process is fitted to an infinite order DGP, as shown by Lutkepohl &
Saikkonen (1999). It is assumed in this case that the order of the fitted VAR process goes
to infinity with the sample size and Lutkepohl & Saikkonen (1999) discuss the choice of
the VAR order in this approach. Because the Kronecker indices are usually also unknown,
choosing the cointegrating rank of a VARMA process by fitting a long VAR process is an
attractive approach which avoids knowledge of the VARMA structure at the stage where the
cointegrating rank is determined. So far the theory for this procedure seems to be available
for processes with nonzero mean term only and not for other deterministic terms such as
linear trends. It seems likely, however, that extensions to more general processes are possible.
An alternative way to proceed in determining the cointegrating rank of a VARMA process
was proposed by Yap & Reinsel (1995). They extended the likelihood ratio tests to VARMA
processes under the assumption that an identified structure of A(L) and M(L) is known.
For these tests the Kronecker indices or some other identifying structure has to be specified
first. If the Kronecker indices are known already, a lower bound for the cointegrating rank
is also known (see (3.6)). Hence, in testing for the cointegrating rank, only the sequence of
null hypotheses H0 : r = %,H0 : r = % + 1, . . . , H0 : r = K − 1, is of interest. Again, the
rank may be chosen as the smallest value for which H0 cannot be rejected.
28
EUI WP ECO 2004/24
3.4 Specifying the Lag Orders and Kronecker Indices
A number of proposals for choosing the Kronecker indices of ARMAE models were made, see,
for example, Hannan & Kavalieris (1984), Poskitt (1992), Nsiri & Roy (1992) and Lutkepohl
& Poskitt (1996) for stationary processes and Lutkepohl & Claessen (1997), Claessen (1995),
Poskitt & Lutkepohl (1995) and Poskitt (2003) for cointegrated processes. The strategies
for specifying the Kronecker indices of cointegrated ARMAE processes presented in this
section are proposed in the latter two papers. Poskitt (2003, Proposition 3.3) presents a
result regarding the consistency of the estimators of the Kronecker indices. A simulation
study of the small sample properties of the procedures was performed by Bartel & Lutkepohl
(1998). They found that the methods work reasonably well in small samples for the processes
considered in their study. This section draws partly on Lutkepohl (2002, Section 8.4.1).
The specification method proceeds in two stages. In the first stage a long reduced-
form VAR process of order hT , say, is fitted by OLS giving estimates of the unobservable
innovations ut as in the previously described estimation procedure. In a second stage the
estimated residuals are substituted for the unknown lagged ut’s in the ARMAE form. A range
of different models is estimated and the Kronecker indices are chosen by model selection
criteria.
There are different possibilities for doing so within this general procedure. For example,
one may search over all models associated with Kronecker indices which are smaller than
some prespecified upper bound pmax, (p1, . . . , pK)|0 ≤ pk ≤ pmax, k = 1, . . . , K. The set
of Kronecker indices is then chosen which minimizes the preferred model selection criterion.
For systems of moderate or large dimensions this procedure is rather computer intensive
and computationally more efficient search procedures have been suggested. One idea is to
estimate the individual equations separately by OLS for different lag lengths. The lag length
is then chosen so as to minimize a criterion of the general form
Λk,T (n) = log σ2k,T (n) + CT n/T , n = 0, 1, . . . , PT ,
where CT is a suitable function of the sample size T and T σ2k,T (n) is the residual sum of
squares from a regression of ykt on (ujt − yjt) (j = 1, . . . , K, j 6= k) and yt−s and ut−s
(s = 1, . . . , n). The maximum lag length PT is also allowed to depend on the sample size.
In this procedure the echelon structure is not explicitly taken into account because the
equations are treated separately. The k-th equation will still be misspecified if the lag order
is less than the true Kronecker index. Moreover, the k-th equation will be correctly specified
but may include redundant parameters and variables if the lag order is greater than the
29
EUI WP ECO 2004/24
true Kronecker index. This explains why the criterion function Λk,T (n) will possess a global
minimum asymptotically when n is equal to the true Kronecker index, provided CT is chosen
appropriately. In practice, possible choices of CT are CT = hT log T or CT = h2T (see Poskitt
(2003) for more details on the procedure). Poskitt & Lutkepohl (1995) and Poskitt (2003)
also consider a modification of this procedure where coefficient restrictions derived from
those equations in the system which have smaller Kronecker indices are taken into account.
The important point to make here is that procedures exist which can be applied in a fully
computerized model choice. Thus, model selection is feasible from a practical point of view
although the small sample properties of these procedures are not clear in general, despite
some encouraging but limited small sample evidence by Bartel & Lutkepohl (1998). Other
procedures for specifying the Kronecker indices for stationary processes were proposed by
Akaike (1976), Cooper & Wood (1982), Tsay (1989b) and Nsiri & Roy (1992), for example.
The Kronecker indices found in a computer automated procedure for a given time series
should only be viewed as a starting point for a further analysis of the system under consid-
eration. Based on the specified Kronecker indices a more efficient procedure for estimating
the parameters may be applied (see Section 3.2) and the model may be subjected to a range
of diagnostic tests. If such tests produce unsatisfactory results, modifications are called for.
Tools for checking the model adequacy will be briefly summarized in the following section.
3.5 Diagnostic Checking
As noted in Section 3.2, the estimators of an identified version of a VARMA model have
standard asymptotic properties. Therefore the usual t- and F -tests can be used to decide
on possible overidentifying restrictions. When a parsimonious model without redundant
parameters has been found, the residuals can be checked. According to our assumptions
they should be white noise and a number of model-checking tools are tailored to check
this assumption. For this purpose one may consider individual residual series or one may
check the full residual vector at once. The tools range from visual inspection of the plots
of the residuals and their autocorrelations to formal tests for residual autocorrelation and
autocorrelation of the squared residuals to tests for nonnormality and nonlinearity (see, e.g.,
Lutkepohl (1991), Doornik & Hendry (1997)). It is also advisable to check for structural
shifts during the sample period. Possible tests based on prediction errors are considered in
Lutkepohl (1991). Moreover, when new data becomes available, out-of-sample forecasts may
be checked. Model defects detected at the checking stage should lead to modifications of the
30
EUI WP ECO 2004/24
original specification.
4 Forecasting with Estimated Processes
4.1 General Results
To simplify matters suppose that the generation process of a multiple time series of interest
admits a VARMA representation with zero order matrices equal to IK ,
yt = A1yt−1 + · · ·+ Apyt−p + ut + M1ut−1 + · · ·+ Mqut−q, (4.1)
that is, A0 = M0 = IK . Recall that in the echelon form framework this representation can
always be obtained by premultiplying by A−10 if A0 6= IK . We denote by yτ+h|τ the h-step
forecast at origin τ given in Section 2.4, based on estimated rather than known coefficients.
For instance, using the pure VAR representation of the process,
yτ+h|τ =h−1∑i=1
Ξiyτ+h−i|τ +∞∑
i=h
Ξiyτ+h−i. (4.2)
For practical purposes one may truncate the infinite sum at i = τ . For the moment we will,
however, consider the infinite sum. For this predictor the forecast error is
yτ+h − yτ+h|τ = (yτ+h − yτ+h|τ ) + (yτ+h|τ − yτ+h|τ ),
where yτ+h|τ is the optimal forecast based on known coefficients and the two terms on the
right-hand side are uncorrelated if only data up to period τ are used for estimation. In that
case the first term can be written in terms of ut’s with t > τ and the second one contains
only yt’s with t ≤ τ . Thus, the forecast MSE becomes
Σy(h) = MSE(yτ+h|τ ) + MSE(yτ+h|τ − yτ+h|τ )
= Σy(h) + E[(yτ+h|τ − yτ+h|τ )(yτ+h|τ − yτ+h|τ )′]. (4.3)
The MSE(yτ+h|τ − yτ+h|τ ) can be approximated by Ω(h)/T , where
Ω(h) = E
[∂yτ+h|τ
∂θ′Σθ
∂y′τ+h|τ∂θ
], (4.4)
θ is the vector of estimated coefficients, and Σθ is its asymptotic covariance matrix (see
Yamamoto (1980), Baillie (1981) and Lutkepohl (1991) for more detailed expressions for
Ω(h) and Hogue, Magnus & Pesaran (1988) for an exact treatment of the AR(1) special
31
EUI WP ECO 2004/24
case). If ML estimation is used, the covariance matrix Σθ is just the inverse asymptotic
information matrix. Clearly, Ω(h) is positive semidefinite and the forecast MSE,
Σy(h) = Σy(h) +1
TΩ(h), (4.5)
for estimated processes is larger (or at least not smaller) than the corresponding quantity
for known processes, as one would expect. The additional term depends on the estimation
efficiency because it includes the asymptotic covariance matrix of the parameter estimators.
Therefore, estimating the parameters of a given process well is also important for forecasting.
On the other hand, for large sample sizes T , the additional term will be small or even
negligible.
It may be worth noting that deterministic terms can be accommodated easily, as discussed
in Section 2.5. In the present situation the uncertainty in the estimators related to such terms
can also be taken into account like that of the other parameters. If the deterministic terms are
specified such that the corresponding parameter estimators are asymptotically independent
of the other estimators, an additional term for the estimation uncertainty stemming from
the deterministic terms has to be added to the forecast MSE matrix (4.5). For deterministic
linear trends in univariate models more details are presented in Kim, Leybourne & Newbold
(2004).
Various extensions of the previous results have been discussed in the literature. For
example, Lewis & Reinsel (1985) and Lutkepohl (1985) consider the forecast MSE for the
case where the true process is approximated by a finite order VAR, thereby extending earlier
univariate results by Bhansali (1978). Reinsel & Lewis (1987), Basu & Sen Roy (1987),
Engle & Yoo (1987), Sampson (1991) and Reinsel & Ahn (1992) present results for processes
with unit roots. Stock (1996) and Kemp (1999) assume that the forecast horizon h and the
sample size T both go to infinity simultaneously. Clements & Hendry (1998, 2001) consider
various other sources of possible forecast errors. Taking into account the specification and
estimation uncertainty in multi-step forecasts, it makes also sense to construct a separate
model for each specific forecast horizon h. This approach is discussed in detail by Bhansali
(2002).
4.2 Aggregated Processes
In Section 2.4 we have compared different forecasts for aggregated time series. It was found
that generally forecasting the disaggregate process and aggregating the forecasts (zoτ+h|τ ) is
32
EUI WP ECO 2004/24
more efficient than forecasting the aggregate directly (zτ+h|τ ). In this case, if the sample
size is large enough, the part of the forecast MSE due to estimation uncertainty will even-
tually be so small that the estimated zoτ+h|τ is again superior to the corresponding zτ+h|τ .
There are cases, however, where the two forecasts are identical for known processes. Now
the question arises whether in these cases the MSE term due to estimation errors will make
one forecast preferable to its competitors. Indeed if estimated instead of known processes
are used, it is possible that zoτ+h|τ looses its optimality relative to zτ+h|τ because the MSE
part due to estimation may be larger for the former than for the latter. Consider the case,
where a number of series are simply added to obtain a univariate aggregate. Then it is
possible that a simple parsimonious univariate ARMA model describes the aggregate well,
whereas a large multivariate model is required for an adequate description of the multivari-
ate disaggregate process. Clearly, it is conceivable that the estimation uncertainty in the
multivariate case becomes considerably more important than for the univariate model for the
aggregate. Lutkepohl (1987) shows that this may indeed happen in small samples. In fact,
similar situations can not only arise for contemporaneous aggregation but also for temporal
aggregation. Generally, if two predictors based on known processes are nearly identical, the
estimation part of the MSE becomes important and generally the predictor based on the
smaller model is then to be preferred.
There is also another aspect which is important for comparing forecasts. So far we
have only taken into account the effect of estimation uncertainty on the forecast MSE. This
analysis still assumes a known model structure and only allows for estimated parameters.
In practice, model specification usually precedes estimation and usually there is additional
uncertainty attached to this step in the forecasting procedure. It is also possible to explicitly
take into account the fact that in practice models are only approximations to the true DGP
by considering finite order VAR and AR approximations to infinite order processes. This
has also been done by Lutkepohl (1987). Under these assumptions it is again found that
the forecast zoτ+h|τ looses its optimality and forecasting the aggregate directly or forecasting
the disaggregate series with univariate methods and aggregating univariate forecasts may
become preferable.
5 Conclusions
VARMA models are a powerful tool for producing linear forecasts for a set of time series
variables. They utilize the information not only in the past values of a particular variable
33
EUI WP ECO 2004/24
of interest but also allow for information in other, related variables. We have mentioned
conditions under which the forecasts from these models are optimal under an MSE criterion
for forecast performance. Even if the conditions for minimizing the forecast MSE in the
class of all functions are not satisfied the forecasts will be best linear forecasts under general
assumptions. These appealing theoretical features of VARMA models make them attractive
tools for forecasting.
Special attention has been paid to forecasting linearly transformed and aggregated pro-
cesses. Both contemporaneous as well as temporal aggregation have been studied. It was
found that generally forecasting the disaggregated process and aggregating the forecasts is
more efficient than forecasting the aggregate directly and thereby ignoring the disaggregate
information. Moreover, for contemporaneous aggregation, forecasting the individual compo-
nents with univariate methods and aggregating these forecasts was compared to the other
two possible forecasts. Forecasting univariate components separately may lead to better
forecasts than forecasting the aggregate directly. It will be inferior to aggregating forecasts
of the fully disaggregated process, however. These results hold if the DGPs are known.
In practice the relevant model for forecasting a particular set of time series will not be
known, however, and it is necessary to use sample information to specify and estimate a
suitable candidate model from the VARMA class. We have discussed estimation methods
and specification algorithms which are suitable at this stage of the forecasting process for
stationary as well as integrated processes. The nonuniqueness or lack of identification of
general VARMA representations turned out to be a major problem at this stage. We have
focussed on the echelon form as one possible parameterization that allows to overcome the
identification problem. The echelon form has the advantage of providing a relatively parsi-
monious VARMA representation in many cases. Moreover, it can be extended conveniently
to cointegrated processes by including an EC term. It is described by a set of integers called
Kronecker indices. Statistical procedures were presented for specifying these quantities. We
have also presented methods for determining the cointegrating rank of a process if some or
all of the variables are integrated. This can be done by applying standard cointegrating rank
tests for pure VAR processes because these tests maintain their usual asymptotic properties
even if they are performed on the basis of an approximating VAR process rather than the
true DGP. We have also briefly discussed issues related to checking the adequacy of a par-
ticular model. Overall a coherent strategy for specifying, estimating and checking VARMA
models has been presented. Finally, the implications of using estimated rather than known
processes for forecasting have been discussed.
34
EUI WP ECO 2004/24
If estimation and specification uncertainty are taken into account it turns out that fore-
casts based on a disaggregated multiple time series may not be better and may in fact be
inferior to forecasting an aggregate directly. This situation is in particular likely to occur if
the DGPs are such that efficiency gains from disaggregation do not exist or are small and
the aggregated process has a simple structure which can be captured with a parsimonious
model.
Clearly, VARMA models also have some drawbacks as forecasting tools. First of all,
linear forecasts may not always be the best choice (see Chapter ??? for a discussion of
forecasting with nonlinear models). Second, adding more variables in a system does not
necessarily increase the forecast precision. Higher dimensional systems are typically more
difficult to specify than smaller ones. Thus, considering as many series as possible in one
system is clearly not a good strategy. The increase in estimation and specification uncertainty
may offset the advantages of using additional information. VARMA models appear to be
most useful for analyzing small sets of time series. Choosing the best set of variables for a
particular forecasting exercise may not be an easy task. In conclusion, although VARMA
models are an important forecasting tool, the actual success very much depends on the skills
of the user of these tools.
References
Abraham, B. (1982). Temporal aggregation and time series, International Statistical Review50: 285–291.
Ahn, S. K. & Reinsel, G. C. (1990). Estimation of partially nonstationary multivariate autoregres-sive models, Journal of the American Statistical Association 85: 813–823.
Akaike, H. (1974). Stochastic theory of minimal realization, IEEE Transactions on AutomaticControl AC-19: 667–674.
Akaike, H. (1976). Canonical correlation analysis of time series and the use of an informationcriterion, in R. K. Mehra & D. G. Lainiotis (eds), Systems Identification: Advances and CaseStudies, Academic Press, New York, pp. 27–96.
Amemiya, T. & Wu, R. Y. (1972). The effect of aggregation on prediction in the autoregressivemodel, Journal of the American Statistical Association 67: 628–632.
Aoki, M. (1987). State Space Modeling of Time Series, Springer-Verlag, Berlin.
Baillie, R. T. (1981). Prediction from the dynamic simultaneous equation model with vectorautoregressive errors, Econometrica 49: 1331–1337.
Bartel, H. & Lutkepohl, H. (1998). Estimating the Kronecker indices of cointegrated echelon formVARMA models, Econometrics Journal 1: C76–C99.
35
EUI WP ECO 2004/24
Basu, A. K. & Sen Roy, S. (1987). On asymptotic prediction problems for multivariate autore-gressive models in the unstable nonexplosive case, Calcutta Statistical Association Bulletin36: 29–37.
Bauer, D. & Wagner, M. (2003). A canonical form for unit root processes in the state spaceframework, Diskussionsschriften 03-12, Universitat Bern.
Bhansali, R. J. (1978). Linear prediction by autoregressive model fitting in the time domain, Annalsof Statistics 6: 224–231.
Bhansali, R. J. (2002). Multi-step forecasting, in M. P. Clements & D. F. Hendry (eds), A Com-panion to Economic Forecasting, Blackwell, Oxford, pp. 206–221.
Box, G. E. P. & Jenkins, G. M. (1976). Time Series Analysis: Forecasting and Control, Holden-Day,San Francisco.
Breitung, J. & Swanson, N. R. (2002). Temporal aggregation and spurious instantaneous causalityin multiple time series models, Journal of Time Series Analysis 23: 651–665.
Brewer, K. R. W. (1973). Some consequences of temporal aggregation and ssystematic samplingfor ARMA and ARMAX models, Journal of Econometrics 1: 133–154.
Brockwell, P. J. & Davis, R. A. (1987). Time Series: Theory and Methods, Springer-Verlag, NewYork.
Claessen, H. (1995). Spezifikation und Schatzung von VARMA-Prozessen unter besondererBerucksichtigung der Echelon Form, Verlag Joseph Eul, Bergisch-Gladbach.
Clements, M. P. & Hendry, D. F. (1998). Forecasting Economic Time Series, Cambridge UniversityPress, Cambridge.
Clements, M. P. & Hendry, D. F. (1999). Forecasting Non-stationary Economic Time Series, MITPress, Cambridge, Ma.
Cooper, D. M. & Wood, E. F. (1982). Identifying multivariate time series models, Journal of TimeSeries Analysis 3: 153–164.
Doornik, J. A. & Hendry, D. F. (1997). Modelling Dynamic Systems Using PcFiml 9.0 for Windows,International Thomson Business Press, London.
Dunsmuir, W. T. M. & Hannan, E. J. (1976). Vector linear time series models, Advances in AppliedProbability 8: 339–364.
Engle, R. F. & Granger, C. W. J. (1987). Cointegration and error correction: Representation,estimation and testing, Econometrica 55: 251–276.
Engle, R. F. & Yoo, B. S. (1987). Forecasting and testing in cointegrated systems, Journal ofEconometrics 35: 143–159.
Findley, D. F. (1986). On bootstrap estimates of forecast mean square errors for autoregressive pro-cesses, in D. M. Allen (ed.), Computer Science and Statistics: The Interface, North-Holland,Amsterdam, pp. 11–17.
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectralmethods, Econometrica 37: 424–438.
36
EUI WP ECO 2004/24
Granger, C. W. J. (1981). Some properties of time series data and their use in econometric modelspecification, Journal of Econometrics 16: 121–130.
Granger, C. W. J. & Newbold, P. (1977). Forecasting Economic Time Series, Academic Press, NewYork.
Gregoir, S. (1999a). Multivariate time series with various hidden unit roots, part I: Integral operatoralgebra and representation theorem, Econometric Theory 15: 435–468.
Gregoir, S. (1999b). Multivariate time series with various hidden unit roots, part II: Estimationand test, Econometric Theory 15: 469–518.
Gregoir, S. & Laroque, G. (1994). Polynomial cointegration: Estimation and test, Journal ofEconometrics 63: 183–214.
Grigoletto, M. (1998). Bootstrap prediction intervals for autoregressions: Some alternatives, Inter-national Journal of Forecasting 14: 447–456.
Haldrup, N. (1998). An econometric analysis of I(2) variables, Journal of Economic Surveys12: 595–650.
Hannan, E. J. (1970). Multiple Time Series, John Wiley, New York.
Hannan, E. J. (1976). The identification and parameterization of ARMAX and state space forms,Econometrica 44: 713–723.
Hannan, E. J. (1979). The statistical theory of linear systems, in P. R. Krishnaiah (ed.), Develop-ments in Statistics, Academic Press, New York, pp. 83–121.
Hannan, E. J. (1981). Estimating the dimension of a linear system, Journal of Multivariate Analysis11: 459–473.
Hannan, E. J. & Deistler, M. (1988). The Statistical Theory of Linear Systems, Wiley, New York.
Hannan, E. J. & Kavalieris, L. (1984). Multivariate linear time series models, Advances in AppliedProbability 16: 492–561.
Hillmer, S. C. & Tiao, G. C. (1979). Likelihood function of stationary multiple autoregressivemoving average models, Journal of the American Statistical Association 74: 652–660.
Hogue, A., Magnus, J. & Pesaran, B. (1988). The exact multi-period mean-square forecast errorfor the first-order autoregressive model, Journal of Econometrics 39: 327–346.
Hubrich, K., Lutkepohl, H. & Saikkonen, P. (2001). A review of systems cointegration tests,Econometric Reviews 20: 247–318.
Jenkins, G. M. & Alavi, A. S. (1981). Some aspects of modelling and forecasting multivariate timeseries, Journal of Time Series Analysis 2: 1–47.
Johansen, S. (1995a). Likelihood-based Inference in Cointegrated Vector Autoregressive Models,Oxford University Press, Oxford.
Johansen, S. (1995b). A statistical analysis of cointegration for I(2) variables, Econometric Theory11: 25–59.
Johansen, S. (1997). Likelihood analysis of the I(2) model, Scandinavian Journal of Statistics24: 433–462.
37
EUI WP ECO 2004/24
Johansen, S. & Schaumburg, E. (1999). Likelihood analysis of seasonal cointegration, Journal ofEconometrics 88: 301–339.
Kabaila, P. (1993). On bootstrap predictive inference for autoregressive processes, Journal of TimeSeries Analysis 14: 473–484.
Kapetanios, G. (2003). A note on an iterative least-squares estiamtion method for ARMA andVARMA models, Economics Letters 79: 305–312.
Kemp, G. C. R. (1999). The behovior of forecast errors from a nearly integrated AR(1) model asboth sample size and forecast horizon become large, Econometric Theory 15: 238–256.
Kim, T. H., Leybourne, S. J. & Newbold, P. (2004). Asymptotic mean-squared forecast errorwhen an autoregression with linear trend is fitted to data generated by an I(0) or I(1) process,Journal of Time Series Analysis 25: 583–602.
Kohn, R. (1982). When is an aggregate of a time series efficiently forecast by its past?, Journal ofEconometrics 18: 337–349.
Koreisha, S. G. & Pukkila, T. M. (1987). Identification of nonzero elements in the polynomialmatrices of mixed VARMA processes, Journal of the Royal Statistical Society B49: 112–126.
Lewis, R. & Reinsel, G. C. (1985). Prediction of multivarate time series by autoregressive modelfitting, Journal of Multivariate Analysis 16: 393–411.
Lutkepohl, H. (1984). Linear transformations of vector arma processes, Journal of Econometrics26: 283–293.
Lutkepohl, H. (1985). The joint asymptotic distribution of multistep prediction errors of estimatedvector autoregressions, Economics Letters 17: 103–106.
Lutkepohl, H. (1986a). Forecasting temporally aggregated vector ARMA processes, Journal ofForecasting 5: 85–95.
Lutkepohl, H. (1986b). Forecasting vector ARMA processes with systematically missing observa-tions, Journal of Business & Economic Statistics 4: 375–390.
Lutkepohl, H. (1987). Forecasting Aggregated Vector ARMA Processes, Springer-Verlag, Berlin.
Lutkepohl, H. (1991). Introduction to Multiple Time Series Analysis, Springer Verlag, Berlin.
Lutkepohl, H. (1996). Handbook of Matrices, John Wiley & Sons, Chichester.
Lutkepohl, H. (2002). Forecasting cointegrated VARMA processes, in M. P. Clements & D. F.Hendry (eds), A Companion to Economic Forecasting, Blackwell, Oxford, pp. 179–205.
Lutkepohl, H. & Claessen, H. (1997). Analysis of cointegrated VARMA processes, Journal ofEconometrics 80: 223–239.
Lutkepohl, H. & Poskitt, D. S. (1996). Specification of echelon form VARMA models, Journal ofBusiness & Economic Statistics 14: 69–79.
Lutkepohl, H. & Saikkonen, P. (1999). Order selection in testing for the cointegrating rank of aVAR process, in R. F. Engle & H. White (eds), Cointegration, Causality, and Forecasting. AFestschrift in Honour of Clive W.J. Granger, Oxford University Press, Oxford, pp. 168–199.
38
EUI WP ECO 2004/24
Marcellino, M. (1999). Some consequences of temporal aggregation in empirical analysis, Journalof Business & Economic Statistics 17: 129–136.
Masarotto, G. (1990). Bootstrap prediction intervals for autoregressions, International Journal ofForecasting 6: 229–239.
Nicholls, D. F. & Hall, A. D. (1979). The exact likelihood of multivariate autoregressive movingaverage models, Biometrika 66: 259–264.
Nsiri, S. & Roy, R. (1992). On the identification of ARMA echelon-form models, Canadian Journalof Statistics 20: 369–386.
Pascual, L., Romo, J. & Ruiz, E. (2004). Bootstrap predictive inference for ARIMA processes,Journal of Time Series Analysis 25: 449–465.
Poskitt, D. S. (1992). Identification of echelon canonical forms for vector linear processes usingleast squares, Annals of Statistics 20: 196–215.
Poskitt, D. S. (2003). On the specification of cointegrated autoregressive moving-average forecastingsystems, International Journal of Forecasting 19: 503–519.
Poskitt, D. S. & Lutkepohl, H. (1995). Consistent specification of cointegrated autoregressivemoving average systems, Discussion Paper 54, SFB 373, Humboldt-Universitat zu Berlin.
Quenouille, M. H. (1957). The Analysis of Multiple Time-Series, Griffin, London.
Reinsel, G. C. (1993). Elements of Multivariate Time Series Analysis, Springer-Verlag, New York.
Reinsel, G. C. & Ahn, S. K. (1992). Vector autoregressive models with unit roots and reduced rankstructure: Estimation, likelihood ratio test, and forecasting, Journal of Time Series Analysis13: 353–375.
Reinsel, G. C. & Lewis, A. L. (1987). Prediction mean square error for non-stationary multivariatetime series using estimated parameters, Economics Letters 24: 57–61.
Saikkonen, P. (1992). Estimation and testing of cointegrated systems by an autoregressive approx-imation, Econometric Theory 8: 1–27.
Saikkonen, P. & Lutkepohl, H. (1996). Infinite order cointegrated vector autoregressive processes:Estimation and inference, Econometric Theory 12: 814–844.
Sampson, M. (1991). The effect of parameter uncertainty on forecast variances and confidence in-tervals for unit root and trend stationary time-series models, Journal of Applied Econometrics6: 67–76.
Sims, C. A. (1980). Macroeconomics and reality, Econometrica 48: 1–48.
Stock, J. H. (1996). VAR, error correction and pretest forecasts at long horizons, Oxford Bulletinof Economics and Statistics 58: 685–701.
Stram, D. O. & Wei, W. W. S. (1986). Temporal aggregation in the ARIMA process, Journal ofTime Series Analysis 7: 279–292.
Telser, L. G. (1967). Discrete samples and moving sums in stationary stochastic processes, Journalof the American Statistical Association 62: 484–499.
39
EUI WP ECO 2004/24
Tiao, G. C. (1972). Asymptotic behaviour of temporal aggregates of time series, Biometrika59: 525–531.
Tiao, G. C. & Box, G. E. P. (1981). Modeling multiple time series with applications, Journal ofthe American Statistical Association 76: 802–816.
Tiao, G. C. & Guttman, I. (1980). Forecasting contemporal aggregates of multiple time series,Journal of Econometrics 12: 219–230.
Tiao, G. C. & Tsay, R. S. (1983). Multiple time series modeling and extended sample cross-correlations, Journal of Business & Economic Statistics 1: 43–56.
Tiao, G. C. & Tsay, R. S. (1989). Model specification in multivariate time series (with discussion),Journal of the Royal Statistical Society B51: 157–213.
Tsay, R. S. (1989a). Identifying multivariate time series models, Journal of Time Series Analysis10: 357–372.
Tsay, R. S. (1989b). Parsimonious parameterization of vector autoregressive moving average models,Journal of Business & Economic Statistics 7: 327–341.
Tunnicliffe Wilson, G. (1973). Estimation of parameters in multivariate time series models, Journalof the Royal Statistical Society B35: 76–85.
Wallis, K. F. (1977). Multiple time series analysis and the final form of econometric models,Econometrica 45: 1481–1497.
Wei, W. W. S. (1978). Some consequences of temporal aggregation in seasonal time series models, inA. Zellner (ed.), Seasonal Analysis of Economic Time Series, U.S. Department of Commerce,Bureau of the Census, pp. 433–444.
Wei, W. W. S. (1990). Time Series Analysis: Univariate and Multivariate Methods, Addison-Wesley, Redwood City, Ca.
Weiss, A. A. (1984). Systematic sampling and temporal aggregation in time series models, Journalof Econometrics 26: 271–281.
Yamamoto, T. (1980). On the treatment of autocorrelated errors in the multiperiod prediction ofdynamic simultaneous equation models, International Economic Review 21: 735–748.
Yap, S. F. & Reinsel, G. C. (1995). Estimation and testing for unit roots in a partially nonstationaryvector autoregressive moving average model, Journal of the American Statistical Association90: 253–267.
Zellner, A. & Palm, F. (1974). Time series analysis and simultaneous equation econometric models,Journal of Econometrics 2: 17–54.
40
EUI WP ECO 2004/24