forecasting with varma models · the successful use of univariate arma models for forecasting has...

EUROPEAN UNIVERSITY INSTITUTE DEPARTMENT OF ECONOMICS

EUI Working Paper ECO No. 2004 /25

Forecasting with VARMA Models

HELMUT LÜTKEPOHL

BADIA FIESOLANA, SAN DOMENICO (FI)

All rights reserved. No part of this paper may be reproduced in any form

Without permission of the author(s).

©2004 Helmut Lütkepohl

Published in Italy in September 2004 European University Institute

Badia Fiesolana I-50016 San Domenico (FI)

Italy

*Revisions of this work can be found at: http://www.iue.it/Personal/Luetkepohl/.

August 30, 2004

Forecasting with VARMA Models

by

Helmut Lutkepohl

Department of Economics, European University Institute, Via della Piazzuola 43, I-50133

Firenze, ITALY, email: [email protected]

Contents

1 Introduction and Overview 1

2 VARMA Processes 42.1 Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.2 Cointegrated I(1) Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.3 Linear Transformations of VARMA Processes . . . . . . . . . . . . . . . . . 72.4 Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.4.1 General Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.4.2 Forecasting Aggregated Processes . . . . . . . . . . . . . . . . . . . . 12

2.5 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.5.1 Deterministic Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.5.2 More Unit Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.5.3 Non-Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . 20

3 Specifying and Estimating VARMA Models 203.1 The Echelon Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1.1 Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.1.2 I(1) Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2 Estimation of VARMA Models for Given Lag Orders and Cointegrating Rank 253.2.1 ARMAE Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.2.2 EC-ARMAE Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.3 Testing for the Cointegrating Rank . . . . . . . . . . . . . . . . . . . . . . . 283.4 Specifying the Lag Orders and Kronecker Indices . . . . . . . . . . . . . . . 293.5 Diagnostic Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4 Forecasting with Estimated Processes 314.1 General Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.2 Aggregated Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5 Conclusions 33

0

EUI WP ECO 2004/24

1 Introduction and Overview

In this chapter linear models for the conditional mean of a stochastic process are considered.

These models are useful for producing linear forecasts of time series variables. Suppose that

K related time series variables are considered, y1t, . . . , yKt, say. Defining yt = (y1t, . . . , yKt)′,

a linear model for the conditional mean of the data generation process (DGP) of the observed

series may be of the vector autoregressive (VAR) form,

yt = A1yt−1 + · · ·+ Apyt−p + ut, (1.1)

where the Ai’s (i = 1, . . . , p) are (K × K) coefficient matrices and ut is a K-dimensional

error term. If ut is independent over time (i.e., ut and us are independent for t 6= s), the

conditional mean of yt, given past observations, is

yt|t−1 ≡ E(yt|yt−1, yt−2, . . . ) = A1yt−1 + · · ·+ Apyt−p.

Thus, the model can be used directly for forecasting one period ahead and forecasts with

larger horizons can be computed recursively. Therefore, variants of this model will be the

basic forecasting models in this chapter.

For practical purposes the simple VAR model of order p may have some disadvantages,

however. The Ai parameter matrices will be unknown and have to be replaced by estimators.

For an adequate representation of the DGP of a set of time series of interest a rather large

VAR order p may be required. Hence, a large number of parameters may be necessary for

an adequate description of the data. Given limited sample information this will usually

result in low estimation precision and also forecasts based on VAR processes with estimated

coefficients may suffer from the uncertainty in the parameter estimators. Therefore it is

useful to consider the larger model class of vector autoregressive moving-average (VARMA)

models which may be able to represent the DGP of interest in a more parsimonious way. In

this chapter the analysis of models from that class will be discussed although special case

results for VAR processes will occasionally be noted explicitly.

The VARMA class has the further advantage of being closed with respect to linear trans-

formations, that is, a linearly transformed finite order VARMA process has again a finite

order VARMA representation. Therefore linear aggregation issues can be studied within

this class. In this chapter special attention will be given to results related to forecasting

contemporaneously and temporally aggregated processes.

VARMA models can be parameterized in different ways. In other words, different param-

eterizations describe the same stochastic process. Although this is no problem for forecasting

1

EUI WP ECO 2004/24

purposes because we just need to have one adequate representation of the DGP, nonunique

parameters are a problem at the estimation stage. Therefore the echelon form of a VARMA

process is presented as a unique representation. Estimation and specification of this model

form will be considered.

These models have first been developed for stationary variables. In economics and also

other fields of applications many variables are generated by nonstationary processes, however.

Often they can be made stationary by considering differences or changes rather than the

levels. A variable is called integrated of order d (I(d)) if it is still nonstationary after

taking differences d − 1 times but it can be made stationary or asymptotically stationary

by differencing d times. In most of the following discussion the variables will be assumed to

be stationary (I(0)) or integrated of order 1 (I(1)) and they may be cointegrated. In other

words, there may be linear combinations of I(1) variables which are I(0). If cointegration is

present, it is often advantageous to separate the cointegration relations from the short-run

dynamics of the DGP. This can be done conveniently by allowing for an error correction or

equilibrium correction (EC) term in the models and EC echelon forms will also be considered.

The model setup for stationary and integrated or cointegrated variables will be presented

in the next section where also forecasting with VARMA models will be considered under the

assumption that the DGP is known. In practice it is, of course, necessary to specify and

estimate a model for the DGP on the basis of a given set of time series. Model specification,

estimation and model checking are discussed in Section 3 and forecasting with estimated

models is considered in Section 4. Conclusions follow in Section 5.

Historical Notes

The successful use of univariate ARMA models for forecasting has motivated researchers to

extend the model class to the multivariate case. It is plausible to expect that using more

information by including more interrelated variables in the model improves the forecast

precision. This is actually the idea underlying Granger’s influential definition of causality

(Granger (1969)). It turned out, however, that generalizing univariate models to multi-

variate ones is far from trivial in the ARMA case. Early on Quenouille (1957) considered

multivariate VARMA models. It became quickly apparent, however, that the specification

and estimation of such models was much more difficult than for univariate ARMA mod-

els. The success of the Box-Jenkins modelling strategy for univariate ARMA models in the

1970s (Box & Jenkins (1976)) triggered further attempts of using the corresponding mul-

2

EUI WP ECO 2004/24

tivariate models and developing estimation and specification strategies. In particular, the

possibility of using autocorrelations, partial autocorrelations and cross-correlations between

the variables for model specification were explored. Because modelling strategies based on

such quantities had been to some extent successful in the univariate Box-Jenkins approach,

it was plausible to try multivariate extensions. Examples of such attempts are Tiao & Box

(1981), Tiao & Tsay (1983, 1989), Tsay (1989a, b), Wallis (1977), Zellner & Palm (1974),

Granger & Newbold (1977, Chapter 7), Jenkins & Alavi (1981). It became soon clear, how-

ever, that these strategies were at best promising for very small systems of two or perhaps

three variables. Moreover, the most useful setup of multiple time series models was under

discussion because VARMA representations are not unique or, to use econometric terminol-

ogy, they are not identified. Important early discussions of the related problems are due to

Hannan (1970, 1976, 1979, 1981), Dunsmuir & Hannan (1976) and Akaike (1974). A rather

general solution to the structure theory for VARMA models was later presented by Hannan

& Deistler (1988). Understanding the structural problems contributed to the development

of complete specification strategies. By now textbook treatments of modelling, analyzing

and forecasting VRAMA processes are available (Lutkepohl (1991), Reinsel (1993)).

The problems related to VARMA models were perhaps also relevant for a parallel de-

velopment of pure VAR models as important tools for economic analysis and forecasting.

Sims (1980) launched a general critique of classical econometric modelling and proposed

VAR models as alternatives. A short while later the concept of cointegration was developed

by Granger (1981) and Engle & Granger (1987). It is conveniently placed into the VAR

framework as shown by Johansen (1995a). Therefore it is perhaps not surprising that VAR

models dominate time series econometrics although the methodology and software for work-

ing with more general VARMA models is nowadays available. A recent previous overview of

forecasting with VARMA processes is given by Lutkepohl (2002). The present review draws

partly on that article and on a monograph by Lutkepohl (1987).

Notation, Terminology, Abbreviations

The following notation and terminology is used in this chapter. The lag operator also some-

times called backshift operator is denoted by L and it is defined as usual by Lyt ≡ yt−1. The

differencing operator is denoted by ∆, that is, ∆yt ≡ yt − yt−1. For a random variable or

random vector x, x ∼ (µ, Σ) signifies that its mean (vector) is µ and its variance (covariance

matrix) is Σ. The (K × K) identity matrix is denoted by IK and the determinant and

3

EUI WP ECO 2004/24

trace of a matrix A are denoted by det A and trA, respectively. For quantities A1, . . . , Ap,

diag[A1, . . . , Ap] denotes the diagonal or block-diagonal matrix with A1, . . . , Ap on the diag-

onal. The natural logarithm of a real number is signified by log. The symbols Z, N and C

are used for the integers, the positive integers and the complex numbers, respectively.

DGP stands for data generation process. VAR, AR, MA, ARMA and VARMA are used

as abbreviations for vector autoregressive, autoregressive, moving-average, autoregressive

moving-average and vector autoregressive moving-average (process). Error correction is ab-

breviated as EC and VECM is short for vector error correction model. The echelon forms

of VARMA and EC-VARMA processes are denoted by ARMAE and EC-ARMAE, respec-

tively. OLS, GLS, ML and RR abbreviate ordinary least squares, generalized least squares,

maximum likelihood and reduced rank, respectively. LR and MSE are used to abbreviate

likelihood ratio and mean squared error.

2 VARMA Processes

2.1 Stationary Processes

Suppose the DGP of the K-dimensional multiple time series, y1, . . . , yT , is stationary, that

is, its first and second moments are time invariant. It is a (finite order) VARMA process if

it can be represented in the general form

A0yt = A1yt−1 + · · ·+Apyt−p +M0ut +M1ut−1 + · · ·+Mqut−q, t = 0,±1,±2, . . . , (2.1)

where A0, A1, . . . , Ap are (K ×K) autoregressive parameter matrices while M0,M1, . . . , Mq

are moving average parameter matrices also of dimension (K×K). Defining the VAR and MA

operators, respectively, as A(L) = A0−A1L−· · ·−ApLp and M(L) = M0+M1L+· · ·+MqL

q,

the model can be written in more compact notation as

A(L)yt = M(L)ut, t = 0,±1,±2, . . . . (2.2)

Here ut is a white-noise process with zero mean, nonsingular, time-invariant covariance

matrix E(utu′t) = Σu and zero covariances, E(utu

′t−h) = 0 for h = ±1,±2, . . . . The zero-

order matrices A0 and M0 are assumed to be nonsingular. They will often be identical,

A0 = M0, and in many cases they will be equal to the identity matrix, A0 = M0 = IK . To

indicate the orders of the VAR and MA operators, the process (2.1) is sometimes called a

VARMA(p, q) process. Notice, however, that so far we have not made further assumptions

4

EUI WP ECO 2004/24

regarding the parameter matrices so that some or all of the elements of the Ai’s and Mj’s

may be zero. In other words, there may be a VARMA representation with VAR or MA

orders less than p and q, respectively. Obviously, the VAR model (1.1) is a VARMA(p, 0)

special case with A0 = IK and M(L) = IK . It may also be worth pointing out that there

are no deterministic terms such as nonzero mean terms in our basic VARMA model (2.1).

These terms are ignored here for convenience although they are important in practice. The

necessary modifications for deterministic terms will be discussed in Section 2.5.

The matrix polynomials in (2.2) are assumed to satisfy

det A(z) 6= 0, |z| ≤ 1, and det M(z) 6= 0, |z| ≤ 1 for z ∈ C. (2.3)

The first of these conditions ensures that the VAR operator is stable and the process is

stationary. Then it has a pure MA representation

yt =∞∑

j=0

Φiut−i (2.4)

with MA operator Φ(L) = Φ0 +∑∞

i=1 ΦiLi = A(L)−1M(L). Notice that Φ0 = IK if A0 = M0

and in particular if both zero order matrices are identity matrices. In that case (2.4) is just

the Wold MA representation of the process and, as we will see later, the ut are just the

one-step ahead forecast errors. Some of the forthcoming results are valid for more general

stationary processes with Wold representation (2.4) which may not come from a finite order

VARMA representation. In that case, it is assumed that the Φi’s are absolutely summable

so that the infinite sum in (2.4) is well-defined.

The second part of condition (2.3) is the usual invertibility condition for the MA operator

which implies the existence of a pure VAR representation of the process,

yt =∞∑i=1

Ξiyt−i + ut, (2.5)

where A0 = M0 is assumed and Ξ(L) = IK − ∑∞i=1 ΞiL

i = M(L)−1A(L). Occasionally

invertibility of the MA operator will not be a necessary condition. In that case, it is assumed

without loss of generality that det M(z) 6= 0, for |z| < 1. In other words, the roots of the

MA operator are outside or on the unit circle. There are still no roots inside the unit circle,

however. This assumption can be made without loss of generality because it can be shown

that for an MA process with roots inside the complex unit circle an equivalent one exists

which has all its roots outside and on the unit circle.

5

EUI WP ECO 2004/24

It may be worth noting at this stage already that every pair of operators A(L), M(L)

which leads to the same transfer functions Φ(L) and Ξ(L) defines an equivalent VARMA

representation for yt. This nonuniqueness problem of the VARMA representation will become

important when parameter estimation is discussed in Section 3.

As specified in (2.1), we are assuming that the process is defined for all t ∈ Z. For sta-

ble, stationary processes this assumption is convenient because it avoids considering issues

related to initial conditions. Alternatively one could define yt to be generated by a VARMA

process such as (2.1) for t ∈ N, and specify the initial values y0, . . . , y−p+1, u0, . . . , u−p+1 sep-

arately. Under our assumptions they can be defined such that yt is stationary. Alternatively,

one may define fixed initial values or perhaps even y0 = · · · = y−p+1 = u0 = · · · = u−p+1 = 0.

In general, such an assumption implies that the process is not stationary but just asymptot-

ically stationary, that is, the first and second order moments converge to the corresponding

quantities of the stationary process obtained by specifying the initial conditions accordingly

or defining yt for t ∈ Z. The issue of defining initial values properly becomes more important

for the nonstationary processes discussed in Section 2.2.

Both the MA and the VAR representations of the process will be convenient to work

with in particular situations. Another useful representation of a stationary VARMA process

is the state space representation which will not be used in this review, however. State space

representations of VARMA processes are considered, for example, by Aoki (1987), Hannan

& Deistler (1988), and Wei (1990). In more general terms state space models are discussed

in Chapter ??? of this Handbook.

2.2 Cointegrated I(1) Processes

If the DGP is not stationary but contains some I(1) variables, the levels VARMA form

(2.1) is not the most convenient one for inference purposes. In that case, det A(z) = 0 for

z = 1. Therefore we write the model in EC form by subtracting A0yt−1 on both sides and

re-arranging terms as follows:

A0∆yt = Πyt−1 + Γ1∆yt−1 + · · ·+ Γp−1∆yt−p+1

+M0ut + M1ut−1 + · · ·+ Mqut−q, t ∈ N,(2.6)

where Π = −(A0 − A1 − · · · − Ap) = −A(1) and Γi = −(Ai+1 + · · ·+ Ap) (i = 1, . . . , p− 1)

(Lutkepohl & Claessen (1997)). Here Πyt−1 is the EC term and r = rk(Π) is the cointe-

grating rank of the system which specifies the number of linearly independent cointegration

relations. The process is assumed to be started at time t = 1 from some initial values

6

EUI WP ECO 2004/24

y0, . . . , y−p+1, u0, . . . , u−p+1 to avoid infinite moments. Thus, the initial values are now of

some importance. Assuming that they are zero is convenient because in that case the pro-

cess is easily seen to have a pure EC-VAR or VECM representation of the form

∆yt = Π∗yt−1 +t−1∑j=1

Θj∆yt−j + A−10 M0ut, t ∈ N, (2.7)

where Π∗ and Θj (j = 1, 2, . . . ) are such that

IK∆− Π∗L−∞∑

j=1

Θj∆Lj = A−10 M0M(L)−1(A0∆− ΠL− Γ1∆L− · · · − Γp−1∆Lp−1).

A similar representation can also be obtained if nonzero initial values are permitted (see

Saikkonen & Lutkepohl (1996)). Furthermore, Bauer & Wagner (2003) present a state space

representation which is especially suitable for cointegrated processes.

2.3 Linear Transformations of VARMA Processes

As mentioned in the introduction, a major advantage of the class of VARMA processes is that

it is closed with respect to linear transformations. In other words, linear transformations of

VARMA processes have again a finite order VARMA representation. These transformations

are very common and are useful to study problems of aggregation, marginal processes or

averages of variables generated by VARMA processes etc.. In particular, the following result

from Lutkepohl (1984) is useful in this context. Let

yt = ut + M1ut−1 + · · ·+ Mqut−q

be a K-dimensional invertible MA(q) process and let F be an (M ×K) matrix of rank M .

Then the M -dimensional process zt = Fyt has an invertible MA(q) representation with q ≤ q.

An interesting consequence of this result is that if yt is a stable and invertible VARMA(p, q)

process as in (2.1), then the linearly transformed process zt = Fyt has a stable and invertible

VARMA(p, q) representation with p ≤ (K − M + 1)p and q ≤ (K − M)p + q (Lutkepohl

(1987, Chapter 4)).

These results are directly relevant for contemporaneous aggregation of VARMA processes

and they can also be used to study temporal aggregation problems. To see this suppose we

wish to aggregate the variables yt generated by (2.1) over m subsequent periods. For instance,

m = 3 if we wish to aggregate monthly data to quarterly figures. To express the temporal

7

EUI WP ECO 2004/24

aggregation as a linear transformation we define

yϑ =

ym(ϑ−1)+1

ym(ϑ−1)+2

...

ymϑ

and uϑ =

um(ϑ−1)+1

um(ϑ−1)+2

...

umϑ

(2.8)

and specify the process

A0yϑ = A1yϑ−1 + · · ·+APyϑ−P +M0uϑ +M1uϑ−1 + · · ·+MQuϑ−Q, (2.9)

where

A0 =

A0 0 0 . . . 0

−A1 A0 0 . . . 0

−A2 −A1 A0...

......

.... . .

−Am−1 −Am−2 −Am−3 . . . A0

,

Ai =

Aim Aim−1 . . . Aim−m+1

Aim+1 Aim . . . Aim−m+2

......

. . ....

Aim+m−1 Aim+m−2 . . . Aim

, i = 1, . . . , P,

with Aj = 0 for j > p and M0, . . . ,MQ defined in an analogous manner. The order

P = minn ∈ N|nm ≥ p and Q = minn ∈ N|nm ≥ q. Notice that the time subscript of

yϑ is different from that of yt. The new time index ϑ refers to another observation frequency

than t. For example, if t refers to months and m = 3, ϑ refers to quarters.

Using the process (2.9), temporal aggregation over m periods can be represented as a

linear transformation. In fact, different types of temporal aggregation can be handled. For

instance, the aggregate may be the sum of subsequent values or it may be their average.

Furthermore, temporal and contemporaneous aggregation can be dealt with simultaneously.

In all of these cases the aggregate has a finite order VARMA representation if the original

variables are generated by a finite order VARMA process and its structure can be ana-

lyzed using linear transformations. For another approach to study temporal aggregates see

Marcellino (1999).

8

EUI WP ECO 2004/24

2.4 Forecasting

2.4.1 General Results

When forecasting a set of variables is the objective, it is useful to think about a loss function

or an evaluation criterion for the forecast performance. Given such a criterion, optimal fore-

casts may be constructed. VARMA processes are particularly useful for producing forecasts

that minimize the forecast MSE. Therefore this criterion will be used here and the reader is

referred to Chapter ??? for a discussion of other forecast evaluation criteria.

Forecasts of the variables of the VARMA process (2.1) are obtained easily from the pure

VAR form (2.5). Assuming an independent white noise process ut, an optimal, minimum

MSE h-step forecast at time τ is the conditional expectation given the yt, t ≤ τ ,

yτ+h|τ ≡ E(yτ+h|yτ , yτ−1, . . . ).

It may be determined recursively for h = 1, 2, . . . , as

yτ+h|τ =∞∑i=1

Ξiyτ+h−i|τ , (2.10)

where yτ+j|τ = yτ+j for j ≤ 0. If the ut do not form an independent but only uncorrelated

white noise sequence, the forecast obtained in this way is still the best linear forecast although

it may not be the best in a larger class of possibly nonlinear functions of past observations.

For given initial values, the ut can also be determined under the present assumption of a

known process. Hence, the h-step forecasts may be determined alternatively as

yτ+h|τ = A−10 (A1yτ+h−1|τ + · · ·+ Apyτ+h−p|τ ) + A−1

0

q∑

i=h

Miuτ+h−i, (2.11)

where, as usual, the sum vanishes if h > q.

Both ways of computing h-step forecasts from VARMA models rely on the availability

of initial values. In the pure VAR formula (2.10) all infinitely many past yt are in principle

necessary if the VAR representation is indeed of infinite order. In contrast, in order to use

(2.11), the ut’s need to be known which are unobserved and can only be obtained if all

past yt or initial conditions are available. If only y1, . . . , yτ are given, the infinite sum in

(2.10) may be truncated accordingly. For large τ , the approximation error will be negligible

because the Ξi’s go to zero quickly as i → ∞. Alternatively, precise forecasting formulas

based on y1, . . . , yτ may be obtained via the so-called Multivariate Innovations Algorithm of

Brockwell & Davis (1987, §11.4).

9

EUI WP ECO 2004/24

Under our assumptions, the properties of the forecast errors for stable, stationary pro-

cesses are easily derived by expressing the process (2.1) in Wold MA form,

yt = ut +∞∑i=1

Φiut−i, (2.12)

where A0 = M0 is assumed (see (2.4)). In terms of this representation the optimal h-step

forecast may be expressed as

yτ+h|τ =∞∑

i=h

Φiuτ+h−i. (2.13)

Hence, the forecast errors are seen to be

yτ+h − yτ+h|τ = uτ+h + Φ1uτ+h−1 + · · ·+ Φh−1uτ+1. (2.14)

Thus, the forecast is unbiased (i.e., the forecast errors have mean zero) and the MSE or

forecast error covariance matrix is

Σy(h) ≡ E[(yτ+h − yτ+h|τ )(yτ+h − yτ+h|τ )′] =

h−1∑j=0

ΦjΣuΦ′j.

If ut is normally distributed (Gaussian), the forecast errors are also normally distributed,

yτ+h − yτ+h|τ ∼ N(0, Σy(h)). (2.15)

Hence, forecast intervals etc. may be derived from these results in the familiar way under

Gaussian assumptions.

It is also interesting to note that the forecast error variance is bounded by the covariance

matrix of yt,

Σy(h) →h→∞ Σy ≡ E(yty′t) =

∞∑j=0

ΦjΣuΦ′j. (2.16)

Hence, forecast intervals will also have bounded length as the forecast horizon increases.

The situation is different if there are integrated variables. The formula (2.11) can again be

used for computing the forecasts. Their properties will be different from those for stationary

processes, however. Although the Wold MA representation does not exist for integrated

processes, the Φj coefficient matrices can be computed in the same way as for stationary

processes from the power series A(z)−1M(z) which still exists for z ∈ C with |z| < 1. Hence,

the forecast errors can still be represented as in (2.14) (see Lutkepohl (1991, Chapter 11)).

Thus, formally the forecast errors look quite similar to those for the stationary case. Now

10

EUI WP ECO 2004/24

the forecast error MSE matrix is unbounded, however, because the Φj’s in general do not

converge to zero as j →∞. Despite this general result, there may be linear combinations of

the variables which can be forecast with bounded precision if the forecast horizon gets large.

This situation arises if there is cointegration. For cointegrated processes it is of course also

possible to base the forecasts directly on the EC form. For instance, using (2.6),

∆yτ+h|τ = A−10 (Πyτ+h−1|τ +Γ1∆yτ+h−1|τ +· · ·+Γp−1∆yτ+h−p+1|τ )+A−1

0

q∑

i=h

Miuτ+h−i, (2.17)

and yτ+h|τ = yτ+h−1|τ + ∆yτ+h|τ can be used to get a forecast of the levels variables.

As an illustration of forecasting cointegrated processes consider the following bivariate

VAR model which has cointegrating rank 1: y1t

y2t

=

0 1

0 1

y1,t−1

y2,t−1

+

u1t

u2t

. (2.18)

For this process

A(z)−1 = (I2 − A1z)−1 =∞∑

j=0

Aj1z

j =∞∑

j=0

Φjzj

exists only for |z| < 1 because Φ0 = I2 and

Φj = Aj1 =

0 1

0 1

, j = 1, 2, . . . ,

does not converge to zero for j →∞. The forecast MSE is

Σy(h) =h−1∑j=0

ΦjΣuΦ′j = Σu + (h− 1)

σ2

2 σ22

σ22 σ2

2

, h = 1, 2, . . . ,

where σ22 is the variance of u2t. The conditional expectations are yk,τ+h|τ = y2,τ (k = 1, 2).

Assuming normality of the white noise process, (1−γ)100% forecast intervals are easily seen

to be

y2,τ ± c1−γ/2

√σ2

k + (h− 1)σ22, k = 1, 2,

where c1−γ/2 is the (1− γ/2)100 percentage point of the standard normal distribution. The

lengths of these intervals increase without bounds for h →∞.

The EC representation of (2.18) is easily seen to be

∆yt =

−1 1

0 0

yt−1 + ut.

11

EUI WP ECO 2004/24

Thus, rk(Π) = 1 so that the two variables are cointegrated and some linear combinations can

be forecasted with bounded forecast intervals. For the present example, multiplying (2.18)

by

1 −1

0 1

gives

1 −1

0 1

yt =

0 0

0 1

yt−1 +

1 −1

0 1

ut.

Obviously, the cointegration relation zt = y1t − y2t = u1t − u2t is zero mean white noise

and the forecast intervals for zt for any forecast horizon h ≥ 1 are of constant length,

zτ+h|τ ± c1−γ/2σz(h) or [−c1−γ/2σz, c1−γ/2σz]. Note that zτ+h|τ = 0 for h ≥ 1 and σ2z =

Var(u1t) + Var(u2t)− 2Cov(u1t, u2t) is the variance of zt.

As long as theoretical results are discussed one could consider the first differences of the

process, ∆yt, which also have a VARMA representation. If there is genuine cointegration,

then ∆yt is overdifferenced in the sense that its VARMA representation has MA unit roots

even if the MA part of the levels yt is invertible.

2.4.2 Forecasting Aggregated Processes

We have argued in Section 2.3 that linear transformations of VARMA processes are often of

interest, for example, if aggregation is studied. Therefore forecasts of transformed processes

are also of interest. Here we present some forecasting results for transformed and aggregated

processes from Lutkepohl (1987) where also proofs and further references can be found. We

begin with general results which have immediate implications for contemporaneous aggrega-

tion. Then we will also present some results for temporally aggregated processes which can

be obtained via the process representation (2.9).

Linear Transformations and Contemporaneous Aggregation

Suppose yt is a stationary VARMA process with pure, invertible Wold MA representation

(2.4), that is, yt = Φ(L)ut with Φ0 = IK , F is an (M ×K) matrix with rank M and we are

interested in forecasting the transformed process zt = Fyt. It was discussed in Section 2.3

that zt also has a VARMA representation so that the previously considered techniques can

12

EUI WP ECO 2004/24

be used for forecasting. Suppose that the corresponding Wold MA representation is

zt =∞∑i=0

Ψivt−i = Ψ(L)vt. (2.19)

From (2.13) the optimal h-step predictor for zt at origin τ , based on its own past, is then

zτ+h|τ =∞∑

i=h

Ψivτ+h−i. (2.20)

In general this predictor will differ from one based on forecasting yt and then transforming

the forecast,

zoτ+h|τ ≡ Fyτ+h|τ . (2.21)

Before we compare the two forecasts zoτ+h|τ and zτ+h|τ it may be of interest to draw

attention to yet another possible forecast. If the dimension K of the vector yt is large, it

may be difficult to construct a suitable VARMA model for the underlying process and one

may consider forecasting the individual components of yt by univariate methods and then

transforming the univariate forecasts. Because the component series of yt can be obtained by

linear transformations, they also have ARMA representations. Denoting the corresponding

Wold MA representations by

ykt =∞∑i=0

θkiwk,t−i = θk(L)wkt, k = 1, . . . , K, (2.22)

the optimal univariate h-step forecasts are

yuk,τ+h|τ =

∞∑

i=h

θkiwk,τ+h−i, k = 1, . . . , K. (2.23)

Defining yuτ+h|τ = (yu

1,τ+h|τ , . . . , yuK,τ+h|τ )

′, these forecasts can be used to obtain an h-step

forecast

zuτ+h|τ ≡ Fyu

τ+h|τ (2.24)

of the variables of interest.

We will now compare the three forecasts (2.20), (2.21) and (2.24) of the transformed

process zt. In this comparison we denote the MSE matrices corresponding to the three

forecasts by Σz(h), Σoz(h) and Σu

z (h), respectively. Because zoτ+h|τ uses the largest information

set, it is not surprising that it has the smallest MSE matrix and is hence the best one out

of the three forecasts,

Σz(h) ≥ Σoz(h) and Σu

z (h) ≥ Σoz(h), (2.25)

13

EUI WP ECO 2004/24

where “≥” means that the difference between the left-hand and right-hand matrices is pos-

itive semidefinite. Thus, forecasting the original process yt and then transforming the fore-

casts is generally more efficient than forecasting the transformed process directly or trans-

forming univariate forecasts. It is possible, however, that some or all of the forecasts are

identical. Actually, for I(0) processes, all three predictors always approach the same long-

term forecast of zero. Consequently,

Σz(h), Σoz(h), Σu

z (h) → Σz ≡ E(ztz′t) as h →∞. (2.26)

Moreover, it can be shown that if the one-step forecasts are identical, then they will also be

identical for larger forecast horizons. More precisely we have,

zoτ+1|τ = zτ+1|τ ⇒ zo

τ+h|τ = zτ+h|τ h = 1, 2, . . . , (2.27)

zuτ+1|τ = zτ+1|τ ⇒ zu

τ+h|τ = zτ+h|τ h = 1, 2, . . . , (2.28)

and, if Φ(L) and Θ(L) are invertible,

zoτ+1|τ = zu

τ+1|τ ⇒ zoτ+h|τ = zu

τ+h|τ h = 1, 2, . . . . (2.29)

Thus, one may ask whether the one-step forecasts can be identical and it turns out that this

is indeed possible. The following proposition which summarizes results of Tiao & Guttman

(1980), Kohn (1982) and Lutkepohl (1984), gives conditions for this to happen.

Proposition 1. Let yt be a K-dimensional stochastic process with MA representation as in

(2.4) with Φ0 = IK and F an (M ×K) matrix with rank M . Then, using the notation intro-

duced in the foregoing and defining Θ(L) = diag[θ1(L), . . . , θK(L)], the following relations

hold:

zoτ+1|τ = zτ+1|τ ⇐⇒ FΦ(L) = Ψ(L)F, (2.30)

zuτ+1|τ = zτ+1|τ ⇐⇒ FΘ(L) = Ψ(L)F (2.31)

and, if Φ(L) and Θ(L) are invertible,

zoτ+1|τ = zu

τ+1|τ ⇐⇒ FΦ(L)−1 = FΘ(L)−1. (2.32)

¤

14

EUI WP ECO 2004/24

There are several interesting implications of this proposition. First, if yt consists of

independent components (Φ(L) = Θ(L)) and zt is just their sum, i.e., F = (1, . . . , 1), then

zoτ+1|τ = zτ+1|τ ⇐⇒ θ1(L) = · · · = θK(L). (2.33)

In other words, forecasting the individual components and summing up the forecasts is

strictly more efficient than forecasting the sum directly whenever the components are not

generated by identical stochastic processes. Second, forecasting the univariate components

of yt individually can only be as efficient a forecast for yt as forecasting on the basis of the

multivariate process if and only if Φ(L) is a diagonal matrix operator. Related to this result

is a well-known condition for Granger-noncausality. For a bivariate process yt = (y1t, y2t)′,

y2t is said to be Granger-causal for y1t if it is helpful for improving the forecasts of the latter

variable. In terms of the previous notation this may be stated by specifying F = (1, 0)

and defining y2t as being Granger-causal for y1t if zoτ+1|τ = Fyτ+1|τ = yo

1,τ+1|τ is a better

forecast than zτ+1|τ = y1,τ+1|τ . From (2.30) it then follows that y2t is not Granger-causal for

y1t if and only if φ12(L) = 0, where φ12(L) denotes the upper right hand element of Φ(L).

This characterization of Granger-noncausality is well-known in the related literature (e.g.,

Lutkepohl (1991, Section 2.3.1)).

It may also be worth noting that in general there is no unique ranking of the forecasts

zτ+1|τ and zuτ+1|τ . Depending on the structure of the underlying process yt and the trans-

formation matrix F , either Σz(h) ≥ Σuz (h) or Σz(h) ≤ Σu

z (h) will hold and the relevant

inequality may be strict in the sense that the left-hand and right-hand matrices are not

identical.

Some but not all the results in this section carry over to nonstationary I(1) processes.

For example, the result (2.26) will not hold in general if some components of yt are I(1)

because in this case the three forecasts do not necessarily converge to zero as the forecast

horizon gets large. On the other hand, the conditions in (2.30) and (2.31) can be used for

the differenced processes. For these results to hold, the MA operator may have roots on the

unit circle and hence overdifferencing is not a problem.

The previous results on linearly transformed processes can also be used to compare

different predictors for temporally aggregated processes by setting up the corresponding

process (2.9). Some related results will be summarized next.

15

EUI WP ECO 2004/24

Temporal Aggregation

Different forms of temporal aggregation are of interest, depending on the types of variables

involved. If yt consists of stock variables, then temporal aggregation is usually associated

with systematic sampling, sometimes called skip-sampling or point-in-time sampling. In other

words, the process

sϑ = ymϑ (2.34)

is used as an aggregate over m periods. Here the aggregated process sϑ has a new time index

which refers to another observation frequency than the original subscript t. For example, if

t refers to months and m = 3, then ϑ refers to quarters. In that case the process sϑ consists

of every third variable of the yt process. This type of aggregation contrasts with temporal

aggregation of flow variables where a temporal aggregate is typically obtained by summing

up consecutive values. Thus, aggregation over m periods gives the aggregate

zϑ = ymϑ + ymϑ−1 + · · ·+ ymϑ−m+1. (2.35)

Now if, for example, t refers to months and m = 3, then three consecutive observations are

added to obtain the quarterly value. In the following we again assume that the disaggregated

process yt is stationary and invertible and has a Wold MA representation as in (2.4), yt =

Φ(L)ut with Φ0 = IK . As we have seen in Section 2.3, this implies that sϑ and zϑ are also

stationary and have Wold MA representations. We will now discuss forecasting stock and

flow variables in turn. In other words, we consider forecasts for sϑ and zϑ.

Suppose first that we wish to forecast sϑ. Then the past aggregated values sϑ, sϑ−1, . . . may be used to obtain an h-step forecast sϑ+h|ϑ as in (2.13) on the basis of the MA repre-

sentation of sϑ. If the disaggregate process yt is available, another possible forecast results

by systematically sampling forecasts of yt which gives soϑ+h|ϑ = ymϑ+mh|mϑ. Using the results

for linear transformations, the latter forecast generally has a lower MSE than sϑ+h|ϑ and the

difference vanishes if the forecast horizon h → ∞. For special processes the two predictors

are identical, however. It follows from relation (2.30) of Proposition 1 that the two predictors

are identical for h = 1, 2, . . . , if and only if

Φ(L) =

( ∞∑i=0

ΦimLim

) (m−1∑i=0

ΦiLi

)(2.36)

(Lutkepohl (1987, Proposition 7.1)). Thus, there is no loss in forecast efficiency if the MA

operator of the disaggregate process has the multiplicative structure in (2.36). This condition

16

EUI WP ECO 2004/24

is, for instance, satisfied if yt is a purely seasonal process with seasonal period m such that

yt =∞∑i=0

Φimut−im. (2.37)

It also holds if yt has a finite order MA structure with MA order less than m. Interestingly,

it also follows that there is no loss in forecast efficiency if the disaggregate process yt is a

VAR(1) process, yt = A1yt−1 + ut. In that case, the MA operator can be written as

Φ(L) =

( ∞∑i=0

Aim1 Lim

)(m−1∑i=0

Ai1L

i

)

and, hence, it has the required structure.

Now consider the case of a vector of flow variables yt for which the temporal aggregate

is given in (2.35). For forecasting the aggregate zϑ one may use the past aggregated values

and compute an h-step forecast zϑ+h|ϑ as in (2.13) on the basis of the MA representation

of zϑ. Alternatively, we may again forecast the disaggregate process yt and aggregate the

forecasts. This forecast is denoted by zoϑ+h|ϑ, that is,

zoϑ+h|ϑ = ymϑ+mh|mϑ + ymϑ+mh−1|mϑ + · · ·+ ymϑ+mh−m+1|mϑ. (2.38)

Again the results for linear transformations imply that the latter forecast generally has a

lower MSE than zϑ+h|ϑ and the difference vanishes if the forecast horizon h → ∞. In this

case equality of the two forecasts holds for small forecast horizons h = 1, 2, . . . , if and only

if

(1 + L + · · ·+ Lm−1)

( ∞∑i=0

ΦiLi

)

=

( ∞∑j=0

(Φjm + · · ·+ Φjm−m+1)Ljm

) (m−1∑i=0

(Φ0 + Φ1 + · · ·+ Φi)Li

), (2.39)

where Φj = 0 for j < 0 (Lutkepohl (1987, Proposition 8.1)). In other words, the two forecasts

are identical and there is no loss in forecast efficiency from using the aggregate directly if

the MA operator of yt has the specified multiplicative structure upon multiplication by

(1 + L + · · ·+ Lm−1). This condition is also satisfied if yt has the purely seasonal structure

(2.37). However, in contrast to what was observed for stock variables, the two predictors are

generally not identical if the disaggregate process yt is generated by an MA process of order

less than m.

It is perhaps also interesting to note that if there are both stock and flow variables in one

system, then even if the underlying disaggregate process yt is the periodic process (2.37), a

17

EUI WP ECO 2004/24

forecast based on the disaggregate data may be better than directly forecasting the aggregate

(Lutkepohl (1987, pp. 177-178)). This result is interesting because for the purely seasonal

process (2.37) using the disaggregate process will not result in superior forecasts if a system

consisting either of stock variables only or of flow variables only is considered.

So far we have considered temporal aggregation of stationary processes. Most of the

results can be generalized to I(1) processes by considering the stationary process ∆yt instead

of the original process yt. Recall that forecasts for yt can then be obtained from those of

∆yt. Moreover, in this context it may be worth taking into account that in deriving some of

the conditions for forecast equality, the MA operator of the considered disaggregate process

may have unit roots resulting from overdifferencing. A result which does not carry over

to the I(1) case, however, is the equality of long horizon forecasts based on aggregate or

disaggregate variables. The reason is again that optimal forecasts of I(1) variables do not

settle down at zero eventually when h →∞.

Clearly, so far we have just discussed forecasting of known processes. In practice, the

DGPs have to be specified and estimated on the basis of limited sample information. In

that case quite different results may be obtained and, in particular, forecasts based on

disaggregate processes may be inferior to those based on the aggregate directly. This issue

is taken up again in Section 4.2 when forecasting estimated processes is considered.

Forecasting temporally aggregated processes has been discussed extensively in the liter-

ature. Early examples of treatments of temporal aggregation of time series are Abraham

(1982), Amemiya & Wu (1972), Brewer (1973), Lutkepohl (1986a, b), Stram & Wei (1986),

Telser (1967), Tiao (1972), Wei (1978) and Weiss (1984) among many others. More recently,

Breitung & Swanson (2002) have studied the implications of temporal aggregation when the

number of aggregated time units goes to infinity.

2.5 Extensions

So far we have considered processes which are too simple in some respects to qualify as

DGPs of most economic time series. This was mainly done to simplify the exposition. Some

important extensions will now be considered. In particular, we will discuss deterministic

terms, higher order integration and seasonal unit roots as well as non-Gaussian processes.

18

EUI WP ECO 2004/24

2.5.1 Deterministic Terms

An easy way to integrate deterministic terms in our framework is to simply add them to the

stochastic part. In other words, we consider processes

yt = µt + xt,

where µt is a deterministic term and xt is the purely stochastic part which is assumed to

have a VARMA representation of the type considered earlier. The deterministic part can, for

example, be a constant, µt = µ0, a linear trend, µt = µ0 + µ1t, or a higher order polynomial

trend. Furthermore, seasonal dummy variables or other dummies may be included.

From a forecasting point of view, deterministic terms are easy to handle because by

their very nature their future values are precisely known. Thus, in order to forecast yt, we

may forecast the purely stochastic process xt as discussed earlier and then simply add the

deterministic part corresponding to the forecast period. In this case, the forecast errors and

MSE matrices are the same as for the purely stochastic process. Of course, in practice the

deterministic part may contain unknown parameters which have to be estimated from data.

For the moment this issue is ignored because we are considering known processes. It will

become important, however, in Section 4, where forecasting estimated processes is discussed.

2.5.2 More Unit Roots

In practice the order of integration of some of the variables can be greater than one and

det A(z) may have roots on the unit circle other than z = 1. For example, there may be

seasonal unit roots. Considerable research has been done on these extensions of our basic

models. See, for instance, Johansen (1995b, 1997), Gregoir & Laroque (1994) and Haldrup

(1998) for discussions of the I(2) and higher order integration frameworks, and Johansen &

Schaumburg (1999) and Gregoir (1999a, b) for research on processes with roots elsewhere

on the unit circle. Bauer & Wagner (2003) consider state space representations for VARMA

models with roots at arbitrary points on the unit circle.

As long as the processes are assumed to be known these issues do not create additional

problems for forecasting because we can still use the general forecasting formulas for VARMA

processes. Extensions are important, however, when it comes to model specification and

estimation. In these steps of the forecasting procedure taking into account extensions in the

methodology may be useful.

19

EUI WP ECO 2004/24

2.5.3 Non-Gaussian Processes

If the DGP of a multiple time series is not normally distributed, point forecasts can be

computed as before. They will generally still be best linear forecasts and may in fact be

minimum MSE forecasts if ut is independent white noise, as discussed in Section 2.4. In

setting up forecast intervals the distribution has to be taken into account, however. If the

distribution is unknown, bootstrap methods can be used to compute interval forecasts (e.g.,

Findley (1986), Masarotto (1990), Grigoletto (1998), Kabaila (1993), Pascual, Romo & Ruiz

(2004)).

3 Specifying and Estimating VARMA Models

As we have seen in the previous section, for forecasting purposes the pure VAR or MA rep-

resentations of a stochastic process are quite useful. These representations are in general of

infinite order. In practice, they have to be replaced by finite dimensional parameterizations

which can be specified and estimated from data. VARMA processes are such finite dimen-

sional parameterizations. Therefore, in practice, a VARMA model such as (2.1) or even a

pure finite order VAR as in (1.1) will be specified and estimated as a forecasting tool.

As mentioned earlier, the operators A(L) and M(L) of the VARMA model (2.2) are not

unique or not identified, as econometricians sometimes say. This nonuniqueness is prob-

lematic if the process parameters have to be estimated because a unique representation is

needed for consistent estimation. Before we discuss estimation and specification issues re-

lated to VARMA processes we will therefore present identifying restrictions. More precisely,

the echelon form of VARMA and EC-VARMA models will be presented. Then estimation

procedures, model specification and diagnostic checking will be discussed.

3.1 The Echelon Form

Any pair of operators A(L) and M(L) that gives rise to the same VAR operator Ξ(L) =

IK −∑∞

i=1 ΞiLi = M(L)−1A(L) or MA operator Φ(L) = A(L)−1M(L) defines an equivalent

VARMA process for yt. Here A0 = M0 is assumed. Clearly, if we premultiply A(L) and

M(L) by some invertible operator D(L) = D0 + D1L + · · · + DqLq satisfying det(D0) 6= 0

and det D(z) 6= 0 for |z| ≤ 1, an equivalent VARMA representation is obtained. Thus, a

first step towards finding a unique representation is to cancel common factors in A(L) and

M(L). We therefore assume that the operator [A(L) : M(L)] is left-coprime. To define this

20

EUI WP ECO 2004/24

property, recall that a matrix polynomial D(z) and the corresponding operator D(L) are

unimodular if det D(z) is a constant which does not depend on z. Examples of unimodular

operators are

D(L) = D0 or D(L) =

1 δL

0 1

(3.1)

(see Lutkepohl (1996) for definitions and properties of matrix polynomials). A matrix oper-

ator [A(L) : M(L)] is called left-coprime if only unimodular operators D(L) can be factored.

In other words, [A(L) : M(L)] is left-coprime if for operators A(L), M(L) an operator D(L)

exists such that [A(L) : M(L)] = D(L)[A(L) : M(L)] holds, then D(L) must be unimodular.

Although considering only left-coprime operators [A(L) : M(L)] does not fully solve the

nonuniqueness problem of VARMA representations it is a first step in the right direction

because it excludes many possible redundancies. It does not rule out premultiplication by

some nonsingular matrix, for example, and thus, there is still room for improvement. Even

if A0 = M0 = IK is assumed, uniqueness of the operators is not achieved because there

are unimodular operators D(L) with zero-order matrix IK , as seen in (3.1). Premultiplying

[A(L) : M(L)] by such an operator maintains left-coprimeness. Therefore more restrictions

are needed for uniqueness. The echelon form discussed in the next subsections provides

sufficiently many restrictions in order to ensure uniqueness of the operators. We will first

consider stationary processes and then turn to EC-VARMA models.

3.1.1 Stationary Processes

We assume that [A(L) : M(L)] is left-coprime and we denote the kl-th elements of A(L) and

M(L) by αkl(L) and mkl(L), respectively. Let pk be the maximum polynomial degree in the

k-th row of [A(L) : M(L)], k = 1, . . . , K, and define

pkl =

min(pk + 1, pl) for k > l,

min(pk, pl) for k < l,k, l = 1, . . . , K.

The VARMA process is said to be in echelon form or, briefly, ARMAE form if the operators

A(L) and M(L) satisfy the following restrictions (Lutkepohl & Claessen (1997), Lutkepohl

(2002)):

mkk(L) = 1 +

pk∑i=1

mkk,iLi, for k = 1, . . . , K, (3.2)

21

EUI WP ECO 2004/24

mkl(L) =

pk∑i=pk−pkl+1

mkl,iLi, for k 6= l, (3.3)

and

αkl(L) = αkl,0 −pk∑i=1

αkl,iLi, with αkl,0 = mkl,0 for k, l = 1, . . . , K. (3.4)

Here the row degrees pk (k = 1, . . . , K) are called the Kronecker indices (see Hannan &

Deistler (1988), Lutkepohl (1991)). In the following we denote an echelon form with Kro-

necker indices p1, . . . , pK by ARMAE(p1, . . . , pK). Notice that it corresponds to a VARMA(p, p)

representation in (2.1) with p = max(p1, . . . , pK). An ARMAE form may have more zero

coefficients than those specified by the restrictions from (3.2)-(3.4). In particular, there may

be models where the AR and MA orders are not identical due to further zero restrictions.

Such over-identifying constraints are not ruled out by the echelon form. It does not need

them to ensure uniqueness of the operator [A(L) : M(L)] for a given VAR operator Ξ(L)

or MA operator Φ(L), however. Note also that every VARMA process can be written in

echelon form. Thus, the echelon form does not exclude any VARMA processes.

The present specification of the echelon form does not restrict the autoregressive operator

except for the maximum row degrees imposed by the Kronecker indices and the zero order

matrix (A0 = M0). Additional identifying zero restrictions are placed on the moving average

coefficient matrices attached to low lags of the error process ut. This form of the echelon form

was proposed by Lutkepohl & Claessen (1997) because it can be combined conveniently with

the EC representation of a VARMA process, as we will see shortly. Thus, it is particularly

useful for processes with cointegrated variables. A slightly different echelon form is usually

used in the literature on stationary processes. Typically the restrictions on low order lags

are imposed on the VAR coefficient matrices (e.g., Hannan & Deistler (1988), Lutkepohl

(1991)).

To illustrate the present ARMAE form we consider the following three-dimensional pro-

cess from Lutkepohl (2002) with Kronecker indices (p1, p2, p3) = (1, 2, 1). It is easy to derive

the pkl,

[pkl] =

• 1 1

1 • 1

1 2 •

.

22

EUI WP ECO 2004/24

Hence, we get the ARMAE(1, 2, 1) form:

1 0 0

0 1 0

0 α32,0 1

yt =

α11,1 α12,1 α13,1

α21,1 α22,1 α23,1

α31,1 α32,1 α33,1

yt−1 +

0 0 0

α21,2 α22,2 α23,2

0 0 0

yt−2

+

1 0 0

0 1 0

0 α32,0 1

ut +

m11,1 m12,1 m13,1

0 m22,1 0

m31,1 m32,1 m33,1

ut−1

+

0 0 0

m21,2 m22,2 m23,2

0 0 0

ut−2.

(3.5)

Although in this example the zero order matrix is lower triangular, it will often be an identity

matrix in an echelon form. In fact, it will always be an identity matrix if the Kronecker indices

are ordered from smallest to largest.

3.1.2 I(1) Processes

If the EC form of the ARMAE model is set up as in (2.6), the autoregressive short-run

coefficient matrices Γi (i = 1, . . . , p − 1) satisfy similar identifying constraints as the Ai’s

(i = 1, . . . , p). More precisely, Γi obeys the same zero restrictions as Ai+1 for i = 1, . . . , p−1.

This structure follows from the specific form of the zero restrictions on the Ai’s. If αkl,i is

restricted to zero by the echelon form this implies that the corresponding element αkl,j of Aj

is also zero for j > i. Similarly, the zero restrictions on Π are the same as those on A0−A1.

Because the echelon form does not impose zero restrictions on A1 if all Kronecker indices

pk ≥ 1 (k = 1, . . . , K), there are no echelon form zero restrictions on Π if all Kronecker

indices are greater than zero. On the other hand, if there are zero Kronecker indices, this

has consequences for the rank of Π and, hence, for the integration and cointegration structure

of the variables. In fact, denoting by % the number of zero Kronecker indices, it can be shown

that

rk(Π) ≥ %. (3.6)

This result is useful to remember when procedures for specifying the cointegrating rank of a

VARMA system are considered.

In the following we use the acronym EC-ARMAE for an EC-VARMA model which sat-

isfies the echelon form restrictions. To illustrate its structure, we follow again Lutkepohl

23

EUI WP ECO 2004/24

(2002) and use the model (3.5). The corresponding EC-ARMAE form is

1 0 0

0 1 0

0 α32,0 1

∆yt =

π11 π12 π13

π21 π22 π23

π31 π32 π33

yt−1 +

0 0 0

γ21,1 γ22,1 γ23,1

0 0 0

∆yt−1

+

1 0 0

0 1 0

0 α32,0 1

ut +

m11,1 m12,1 m13,1

0 m22,1 0

m31,1 m32,1 m33,1

ut−1

+

0 0 0

m21,2 m22,2 m23,2

0 0 0

ut−2.

The following three-dimensional ARMAE(0, 0, 1) model is another example from Lutkepohl

(2002):

yt =

0 0 0

0 0 0

α31,1 α32,1 α33,1

yt−1 + ut +

0 0 0

0 0 0

m31,1 m32,1 m33,1

ut−1. (3.7)

Note that in this case A0 = I3. Now two of the Kronecker indices are zero and, hence,

according to (3.5), the cointegrating rank of this system must be at least 2. The EC-ARMAE

form is

∆yt =

1 0 0

0 1 0

π31 π32 π33

yt−1 + ut +

0 0 0

0 0 0

m31,1 m32,1 m33,1

ut−1.

The rank of

Π =

1 0 0

0 1 0

π31 π32 π33

is clearly at least two.

Because we now have a unique representation of VARMA models we can discuss estima-

tion of such models. Of course, to estimate an ARMAE or EC-ARMAE form we need to

specify the Kronecker indices and possibly the cointegrating rank. We will discuss parameter

estimation first and then consider model specification issues.

24

EUI WP ECO 2004/24

Before we go on with these topics, we mention that there are other ways to achieve

uniqueness or identification of a VARMA representation. For example, Zellner & Palm (1974)

and Wallis (1977) consider a final form representation which also solves the identification

problem. It often results in rather heavily parameterized models and has therefore not

gained much popularity. Tiao & Tsay (1989) propose so-called scalar component models

to overcome the identification problem. The idea is to consider linear combinations of the

variables which can reveal simplifications of the general VARMA structure. The interested

reader is referred to the aforementioned article. We have presented the echelon form here in

some detail because it often results in parsimonious representations.

3.2 Estimation of VARMA Models for Given Lag Orders and

Cointegrating Rank

For given Kronecker indices the ARMAE form of a VARMA DGP can be set up and esti-

mated. We will consider this case first and then study estimation of EC-ARMAE models for

which the cointegrating rank is given in addition to the Kronecker indices. Specification of

the Kronecker indices and the cointegrating rank will be discussed in Sections 3.4 and 3.3,

respectively.

3.2.1 ARMAE Models

Suppose the white noise process ut is normally distributed (Gaussian), ut ∼ N(0, Σu). Given

a sample y1, . . . , yT and presample values y0, . . . , yp−1, u0, . . . , uq−1, the log-likelihood function

of the VARMA model (2.1) is

l(θ) =T∑

t=1

lt(θ). (3.8)

Here θ represents the vector of all parameters to be estimated and

lt(θ) = −K

2log 2π − 1

2log det Σu − 1

2u′tΣ

−1u ut,

where

ut = M−10 (A0yt − A1yt−1 − · · · − Apyt−p −M1ut−1 − · · · −Mqut−q).

It is assumed that the uniqueness restrictions of the ARMAE form are imposed and θ contains

the freely varying parameters only. The initial values are assumed to be fixed and if the ut

25

EUI WP ECO 2004/24

(t ≤ 0) are not available, they may be replaced by zero without affecting the asymptotic

properties of the estimators.

Maximization of l(θ) is a nonlinear optimization problem which is complicated by the

inequality constraints that ensure invertibility of the MA operator. Iterative optimization

algorithms may be used here. Start-up values for such algorithms may be obtained as follows:

An unrestricted long VAR model of order hT , say, is fitted by OLS in a first step. Denoting

the estimated residuals by ut, the ARMAE form can be estimated when all lagged ut’s are

replaced by ut’s. If A0 6= IK , then unlagged ujt in equation k (k 6= j) may also be replaced

by estimated residuals from the long VAR. The resulting parameter estimates can be used

as starting values for an iterative algorithm.

If the DGP is stable and invertible and the parameters are identified, the ML estimator

θ has standard limiting properties, that is, θ is consistent and

√T (θ − θ)

d→ N(0, Σθ),

whered→ signifies convergence in distribution and Σθ is the inverse asymptotic information

matrix. This result holds even if the true distribution of the ut’s is not normal but satisfies

suitable moment conditions. In that case the estimators are just quasi ML estimators, of

course.

There has been some discussion of the likelihood function of VARMA models and its

maximization (Tunnicliffe Wilson (1973), Nicholls & Hall (1979), Hillmer & Tiao (1979)).

Unfortunately, optimization of the Gaussian log-likelihood is not a trivial exercise. Therefore

other estimation methods have been proposed in the literature (e.g., Koreisha & Pukkila

(1987), Kapetanios (2003), Poskitt (2003)). Of course, it is also straightforward to add

deterministic terms to the model and estimate the associated parameters along with the

VARMA coefficients.

3.2.2 EC-ARMAE Models

If the cointegrating rank r is given and the DGP is a pure, finite order VAR(p) process, the

corresponding VECM,

∆yt = αβ′yt−1 + Γ1∆yt−1 + · · ·+ Γp−1∆yt−p+1 + ut, (3.9)

can be estimated conveniently by RR regression, as shown in Johansen (1995a). Concentrat-

ing out the short-run dynamics by regressing ∆yt and yt−1 on ∆Y ′t−1 = [∆y′t−1, . . . , ∆y′t−p+1]

26

EUI WP ECO 2004/24

and denoting the residuals by R0t and R1t, respectively, the EC term can be estimated by

RR regression from

R0t = αβ′R1t + uct . (3.10)

Because the decomposition Π = αβ′ is not unique, the estimators for α and β are not

consistent whereas the resulting ML estimator for Π is consistent. However, because the

matrices α and β have rank r, one way to make them unique is to choose

β′ = [Ir : β′(K−r)], (3.11)

where β(K−r) is a ((K−r)×r) matrix. This normalization is always possible upon a suitable

ordering of the variables. The ML estimator of β(K−r) can be obtained by post-multiplying

the RR estimator β of β by the inverse of its first r rows and using the last K−r rows as the

estimator β(K−r) of β(K−r). This estimator is not only consistent but even superconsistent

meaning that it converges at a faster rate than the usual√

T to the true parameter matrix

β(K−r). In fact, it turns out that T (β(K−r) − β(K−r)) converges in distribution. As a result

inference for the other parameters can be done as if the cointegration matrix β were known.

Other estimation procedures that can be used here as well were proposed by Ahn &

Reinsel (1990) and Saikkonen (1992). In fact, in the latter article it was shown that the

procedure can even be justified if the true DGP is an infinite order VAR process and only a

finite order model is fitted, as long as the order goes to infinity with growing sample size. This

result is convenient in the present situation where we are interested in VARMA processes,

because we can estimate the cointegration relations in a first step on the basis of a finite

order VECM without MA part. Then the estimated cointegration matrix can be used in

estimating the remaining VARMA parameters. That is, the short-run parameters including

the loading coefficients α and MA parameters of the EC-ARMAE form can then be estimated

by ML conditional on the estimator for β. Because of the superconsistency of the estimator

for the cointegration parameters this procedure maintains the asymptotic efficiency of the

Gaussian ML estimator. Except for the cointegration parameters, the parameter estimators

have standard asymptotic properties which are equivalent to those of the full ML estimators

(Yap & Reinsel (1995)). If the Kronecker indices are given, the echelon VARMA structure

can also be taken into account in estimating the cointegration matrix.

As mentioned earlier, before a model can be estimated, the Kronecker indices and possibly

the cointegrating rank have to be specified. These issues are discussed next.

27

EUI WP ECO 2004/24

3.3 Testing for the Cointegrating Rank

A wide range of proposals exists for determining the cointegrating ranks of pure VAR pro-

cesses (see Hubrich, Lutkepohl & Saikkonen (2001) for a recent survey). The most popular

approach is due to Johansen (1995a) who derives likelihood ratio (LR) tests for the cointe-

grating rank of a pure VAR process. Because ML estimation of unrestricted VECMs with a

specific cointegrating rank r is straightforward for Gaussian processes, the LR statistic for

testing the pair of hypotheses H0 : r = r0 versus H1 : r > r0 is readily available by comparing

the likelihood maxima for r = r0 and r = K. The asymptotic distributions of the LR statis-

tics are nonstandard and depend on the deterministic terms included in the model. Tables

with critical values for various different cases are available in Johansen (1995a, Chapter 15).

The cointegrating rank can be determined by checking sequentially the null hypotheses

H0 : r = 0, H0 : r = 1, . . . , H0 : r = K − 1

and choosing the cointegrating rank for which the first null hypothesis cannot be rejected in

this sequence.

For our present purposes it is of interest that Johansen’s LR tests can be justified even

if a finite-order VAR process is fitted to an infinite order DGP, as shown by Lutkepohl &

Saikkonen (1999). It is assumed in this case that the order of the fitted VAR process goes

to infinity with the sample size and Lutkepohl & Saikkonen (1999) discuss the choice of

the VAR order in this approach. Because the Kronecker indices are usually also unknown,

choosing the cointegrating rank of a VARMA process by fitting a long VAR process is an

attractive approach which avoids knowledge of the VARMA structure at the stage where the

cointegrating rank is determined. So far the theory for this procedure seems to be available

for processes with nonzero mean term only and not for other deterministic terms such as

linear trends. It seems likely, however, that extensions to more general processes are possible.

An alternative way to proceed in determining the cointegrating rank of a VARMA process

was proposed by Yap & Reinsel (1995). They extended the likelihood ratio tests to VARMA

processes under the assumption that an identified structure of A(L) and M(L) is known.

For these tests the Kronecker indices or some other identifying structure has to be specified

first. If the Kronecker indices are known already, a lower bound for the cointegrating rank

is also known (see (3.6)). Hence, in testing for the cointegrating rank, only the sequence of

null hypotheses H0 : r = %,H0 : r = % + 1, . . . , H0 : r = K − 1, is of interest. Again, the

rank may be chosen as the smallest value for which H0 cannot be rejected.

28

EUI WP ECO 2004/24

3.4 Specifying the Lag Orders and Kronecker Indices

A number of proposals for choosing the Kronecker indices of ARMAE models were made, see,

for example, Hannan & Kavalieris (1984), Poskitt (1992), Nsiri & Roy (1992) and Lutkepohl

& Poskitt (1996) for stationary processes and Lutkepohl & Claessen (1997), Claessen (1995),

Poskitt & Lutkepohl (1995) and Poskitt (2003) for cointegrated processes. The strategies

for specifying the Kronecker indices of cointegrated ARMAE processes presented in this

section are proposed in the latter two papers. Poskitt (2003, Proposition 3.3) presents a

result regarding the consistency of the estimators of the Kronecker indices. A simulation

study of the small sample properties of the procedures was performed by Bartel & Lutkepohl

(1998). They found that the methods work reasonably well in small samples for the processes

considered in their study. This section draws partly on Lutkepohl (2002, Section 8.4.1).

The specification method proceeds in two stages. In the first stage a long reduced-

form VAR process of order hT , say, is fitted by OLS giving estimates of the unobservable

innovations ut as in the previously described estimation procedure. In a second stage the

estimated residuals are substituted for the unknown lagged ut’s in the ARMAE form. A range

of different models is estimated and the Kronecker indices are chosen by model selection

criteria.

There are different possibilities for doing so within this general procedure. For example,

one may search over all models associated with Kronecker indices which are smaller than

some prespecified upper bound pmax, (p1, . . . , pK)|0 ≤ pk ≤ pmax, k = 1, . . . , K. The set

of Kronecker indices is then chosen which minimizes the preferred model selection criterion.

For systems of moderate or large dimensions this procedure is rather computer intensive

and computationally more efficient search procedures have been suggested. One idea is to

estimate the individual equations separately by OLS for different lag lengths. The lag length

is then chosen so as to minimize a criterion of the general form

Λk,T (n) = log σ2k,T (n) + CT n/T , n = 0, 1, . . . , PT ,

where CT is a suitable function of the sample size T and T σ2k,T (n) is the residual sum of

squares from a regression of ykt on (ujt − yjt) (j = 1, . . . , K, j 6= k) and yt−s and ut−s

(s = 1, . . . , n). The maximum lag length PT is also allowed to depend on the sample size.

In this procedure the echelon structure is not explicitly taken into account because the

equations are treated separately. The k-th equation will still be misspecified if the lag order

is less than the true Kronecker index. Moreover, the k-th equation will be correctly specified

but may include redundant parameters and variables if the lag order is greater than the

29

EUI WP ECO 2004/24

true Kronecker index. This explains why the criterion function Λk,T (n) will possess a global

minimum asymptotically when n is equal to the true Kronecker index, provided CT is chosen

appropriately. In practice, possible choices of CT are CT = hT log T or CT = h2T (see Poskitt

(2003) for more details on the procedure). Poskitt & Lutkepohl (1995) and Poskitt (2003)

also consider a modification of this procedure where coefficient restrictions derived from

those equations in the system which have smaller Kronecker indices are taken into account.

The important point to make here is that procedures exist which can be applied in a fully

computerized model choice. Thus, model selection is feasible from a practical point of view

although the small sample properties of these procedures are not clear in general, despite

some encouraging but limited small sample evidence by Bartel & Lutkepohl (1998). Other

procedures for specifying the Kronecker indices for stationary processes were proposed by

Akaike (1976), Cooper & Wood (1982), Tsay (1989b) and Nsiri & Roy (1992), for example.

The Kronecker indices found in a computer automated procedure for a given time series

should only be viewed as a starting point for a further analysis of the system under consid-

eration. Based on the specified Kronecker indices a more efficient procedure for estimating

the parameters may be applied (see Section 3.2) and the model may be subjected to a range

of diagnostic tests. If such tests produce unsatisfactory results, modifications are called for.

Tools for checking the model adequacy will be briefly summarized in the following section.

3.5 Diagnostic Checking

As noted in Section 3.2, the estimators of an identified version of a VARMA model have

standard asymptotic properties. Therefore the usual t- and F -tests can be used to decide

on possible overidentifying restrictions. When a parsimonious model without redundant

parameters has been found, the residuals can be checked. According to our assumptions

they should be white noise and a number of model-checking tools are tailored to check

this assumption. For this purpose one may consider individual residual series or one may

check the full residual vector at once. The tools range from visual inspection of the plots

of the residuals and their autocorrelations to formal tests for residual autocorrelation and

autocorrelation of the squared residuals to tests for nonnormality and nonlinearity (see, e.g.,

Lutkepohl (1991), Doornik & Hendry (1997)). It is also advisable to check for structural

shifts during the sample period. Possible tests based on prediction errors are considered in

Lutkepohl (1991). Moreover, when new data becomes available, out-of-sample forecasts may

be checked. Model defects detected at the checking stage should lead to modifications of the

30

EUI WP ECO 2004/24

original specification.

4 Forecasting with Estimated Processes

4.1 General Results

To simplify matters suppose that the generation process of a multiple time series of interest

admits a VARMA representation with zero order matrices equal to IK ,

yt = A1yt−1 + · · ·+ Apyt−p + ut + M1ut−1 + · · ·+ Mqut−q, (4.1)

that is, A0 = M0 = IK . Recall that in the echelon form framework this representation can

always be obtained by premultiplying by A−10 if A0 6= IK . We denote by yτ+h|τ the h-step

forecast at origin τ given in Section 2.4, based on estimated rather than known coefficients.

For instance, using the pure VAR representation of the process,

yτ+h|τ =h−1∑i=1

Ξiyτ+h−i|τ +∞∑

i=h

Ξiyτ+h−i. (4.2)

For practical purposes one may truncate the infinite sum at i = τ . For the moment we will,

however, consider the infinite sum. For this predictor the forecast error is

yτ+h − yτ+h|τ = (yτ+h − yτ+h|τ ) + (yτ+h|τ − yτ+h|τ ),

where yτ+h|τ is the optimal forecast based on known coefficients and the two terms on the

right-hand side are uncorrelated if only data up to period τ are used for estimation. In that

case the first term can be written in terms of ut’s with t > τ and the second one contains

only yt’s with t ≤ τ . Thus, the forecast MSE becomes

Σy(h) = MSE(yτ+h|τ ) + MSE(yτ+h|τ − yτ+h|τ )

= Σy(h) + E[(yτ+h|τ − yτ+h|τ )(yτ+h|τ − yτ+h|τ )′]. (4.3)

The MSE(yτ+h|τ − yτ+h|τ ) can be approximated by Ω(h)/T , where

Ω(h) = E

[∂yτ+h|τ

∂θ′Σθ

∂y′τ+h|τ∂θ

], (4.4)

θ is the vector of estimated coefficients, and Σθ is its asymptotic covariance matrix (see

Yamamoto (1980), Baillie (1981) and Lutkepohl (1991) for more detailed expressions for

Ω(h) and Hogue, Magnus & Pesaran (1988) for an exact treatment of the AR(1) special

31

EUI WP ECO 2004/24

case). If ML estimation is used, the covariance matrix Σθ is just the inverse asymptotic

information matrix. Clearly, Ω(h) is positive semidefinite and the forecast MSE,

Σy(h) = Σy(h) +1

TΩ(h), (4.5)

for estimated processes is larger (or at least not smaller) than the corresponding quantity

for known processes, as one would expect. The additional term depends on the estimation

efficiency because it includes the asymptotic covariance matrix of the parameter estimators.

Therefore, estimating the parameters of a given process well is also important for forecasting.

On the other hand, for large sample sizes T , the additional term will be small or even

negligible.

It may be worth noting that deterministic terms can be accommodated easily, as discussed

in Section 2.5. In the present situation the uncertainty in the estimators related to such terms

can also be taken into account like that of the other parameters. If the deterministic terms are

specified such that the corresponding parameter estimators are asymptotically independent

of the other estimators, an additional term for the estimation uncertainty stemming from

the deterministic terms has to be added to the forecast MSE matrix (4.5). For deterministic

linear trends in univariate models more details are presented in Kim, Leybourne & Newbold

(2004).

Various extensions of the previous results have been discussed in the literature. For

example, Lewis & Reinsel (1985) and Lutkepohl (1985) consider the forecast MSE for the

case where the true process is approximated by a finite order VAR, thereby extending earlier

univariate results by Bhansali (1978). Reinsel & Lewis (1987), Basu & Sen Roy (1987),

Engle & Yoo (1987), Sampson (1991) and Reinsel & Ahn (1992) present results for processes

with unit roots. Stock (1996) and Kemp (1999) assume that the forecast horizon h and the

sample size T both go to infinity simultaneously. Clements & Hendry (1998, 2001) consider

various other sources of possible forecast errors. Taking into account the specification and

estimation uncertainty in multi-step forecasts, it makes also sense to construct a separate

model for each specific forecast horizon h. This approach is discussed in detail by Bhansali

(2002).

4.2 Aggregated Processes

In Section 2.4 we have compared different forecasts for aggregated time series. It was found

that generally forecasting the disaggregate process and aggregating the forecasts (zoτ+h|τ ) is

32

EUI WP ECO 2004/24

more efficient than forecasting the aggregate directly (zτ+h|τ ). In this case, if the sample

size is large enough, the part of the forecast MSE due to estimation uncertainty will even-

tually be so small that the estimated zoτ+h|τ is again superior to the corresponding zτ+h|τ .

There are cases, however, where the two forecasts are identical for known processes. Now

the question arises whether in these cases the MSE term due to estimation errors will make

one forecast preferable to its competitors. Indeed if estimated instead of known processes

are used, it is possible that zoτ+h|τ looses its optimality relative to zτ+h|τ because the MSE

part due to estimation may be larger for the former than for the latter. Consider the case,

where a number of series are simply added to obtain a univariate aggregate. Then it is

possible that a simple parsimonious univariate ARMA model describes the aggregate well,

whereas a large multivariate model is required for an adequate description of the multivari-

ate disaggregate process. Clearly, it is conceivable that the estimation uncertainty in the

multivariate case becomes considerably more important than for the univariate model for the

aggregate. Lutkepohl (1987) shows that this may indeed happen in small samples. In fact,

similar situations can not only arise for contemporaneous aggregation but also for temporal

aggregation. Generally, if two predictors based on known processes are nearly identical, the

estimation part of the MSE becomes important and generally the predictor based on the

smaller model is then to be preferred.

There is also another aspect which is important for comparing forecasts. So far we

have only taken into account the effect of estimation uncertainty on the forecast MSE. This

analysis still assumes a known model structure and only allows for estimated parameters.

In practice, model specification usually precedes estimation and usually there is additional

uncertainty attached to this step in the forecasting procedure. It is also possible to explicitly

take into account the fact that in practice models are only approximations to the true DGP

by considering finite order VAR and AR approximations to infinite order processes. This

has also been done by Lutkepohl (1987). Under these assumptions it is again found that

the forecast zoτ+h|τ looses its optimality and forecasting the aggregate directly or forecasting

the disaggregate series with univariate methods and aggregating univariate forecasts may

become preferable.

5 Conclusions

VARMA models are a powerful tool for producing linear forecasts for a set of time series

variables. They utilize the information not only in the past values of a particular variable

33

EUI WP ECO 2004/24

of interest but also allow for information in other, related variables. We have mentioned

conditions under which the forecasts from these models are optimal under an MSE criterion

for forecast performance. Even if the conditions for minimizing the forecast MSE in the

class of all functions are not satisfied the forecasts will be best linear forecasts under general

assumptions. These appealing theoretical features of VARMA models make them attractive

tools for forecasting.

Special attention has been paid to forecasting linearly transformed and aggregated pro-

cesses. Both contemporaneous as well as temporal aggregation have been studied. It was

found that generally forecasting the disaggregated process and aggregating the forecasts is

more efficient than forecasting the aggregate directly and thereby ignoring the disaggregate

information. Moreover, for contemporaneous aggregation, forecasting the individual compo-

nents with univariate methods and aggregating these forecasts was compared to the other

two possible forecasts. Forecasting univariate components separately may lead to better

forecasts than forecasting the aggregate directly. It will be inferior to aggregating forecasts

of the fully disaggregated process, however. These results hold if the DGPs are known.

In practice the relevant model for forecasting a particular set of time series will not be

known, however, and it is necessary to use sample information to specify and estimate a

suitable candidate model from the VARMA class. We have discussed estimation methods

and specification algorithms which are suitable at this stage of the forecasting process for

stationary as well as integrated processes. The nonuniqueness or lack of identification of

general VARMA representations turned out to be a major problem at this stage. We have

focussed on the echelon form as one possible parameterization that allows to overcome the

identification problem. The echelon form has the advantage of providing a relatively parsi-

monious VARMA representation in many cases. Moreover, it can be extended conveniently

to cointegrated processes by including an EC term. It is described by a set of integers called

Kronecker indices. Statistical procedures were presented for specifying these quantities. We

have also presented methods for determining the cointegrating rank of a process if some or

all of the variables are integrated. This can be done by applying standard cointegrating rank

tests for pure VAR processes because these tests maintain their usual asymptotic properties

even if they are performed on the basis of an approximating VAR process rather than the

true DGP. We have also briefly discussed issues related to checking the adequacy of a par-

ticular model. Overall a coherent strategy for specifying, estimating and checking VARMA

models has been presented. Finally, the implications of using estimated rather than known

processes for forecasting have been discussed.

34

EUI WP ECO 2004/24

If estimation and specification uncertainty are taken into account it turns out that fore-

casts based on a disaggregated multiple time series may not be better and may in fact be

inferior to forecasting an aggregate directly. This situation is in particular likely to occur if

the DGPs are such that efficiency gains from disaggregation do not exist or are small and

the aggregated process has a simple structure which can be captured with a parsimonious

model.

Clearly, VARMA models also have some drawbacks as forecasting tools. First of all,

linear forecasts may not always be the best choice (see Chapter ??? for a discussion of

forecasting with nonlinear models). Second, adding more variables in a system does not

necessarily increase the forecast precision. Higher dimensional systems are typically more

difficult to specify than smaller ones. Thus, considering as many series as possible in one

system is clearly not a good strategy. The increase in estimation and specification uncertainty

may offset the advantages of using additional information. VARMA models appear to be

most useful for analyzing small sets of time series. Choosing the best set of variables for a

particular forecasting exercise may not be an easy task. In conclusion, although VARMA

models are an important forecasting tool, the actual success very much depends on the skills

of the user of these tools.

References

Abraham, B. (1982). Temporal aggregation and time series, International Statistical Review50: 285–291.

Ahn, S. K. & Reinsel, G. C. (1990). Estimation of partially nonstationary multivariate autoregres-sive models, Journal of the American Statistical Association 85: 813–823.

Akaike, H. (1974). Stochastic theory of minimal realization, IEEE Transactions on AutomaticControl AC-19: 667–674.

Akaike, H. (1976). Canonical correlation analysis of time series and the use of an informationcriterion, in R. K. Mehra & D. G. Lainiotis (eds), Systems Identification: Advances and CaseStudies, Academic Press, New York, pp. 27–96.

Amemiya, T. & Wu, R. Y. (1972). The effect of aggregation on prediction in the autoregressivemodel, Journal of the American Statistical Association 67: 628–632.

Aoki, M. (1987). State Space Modeling of Time Series, Springer-Verlag, Berlin.

Baillie, R. T. (1981). Prediction from the dynamic simultaneous equation model with vectorautoregressive errors, Econometrica 49: 1331–1337.

Bartel, H. & Lutkepohl, H. (1998). Estimating the Kronecker indices of cointegrated echelon formVARMA models, Econometrics Journal 1: C76–C99.

35

EUI WP ECO 2004/24

Basu, A. K. & Sen Roy, S. (1987). On asymptotic prediction problems for multivariate autore-gressive models in the unstable nonexplosive case, Calcutta Statistical Association Bulletin36: 29–37.

Bauer, D. & Wagner, M. (2003). A canonical form for unit root processes in the state spaceframework, Diskussionsschriften 03-12, Universitat Bern.

Bhansali, R. J. (1978). Linear prediction by autoregressive model fitting in the time domain, Annalsof Statistics 6: 224–231.

Bhansali, R. J. (2002). Multi-step forecasting, in M. P. Clements & D. F. Hendry (eds), A Com-panion to Economic Forecasting, Blackwell, Oxford, pp. 206–221.

Box, G. E. P. & Jenkins, G. M. (1976). Time Series Analysis: Forecasting and Control, Holden-Day,San Francisco.

Breitung, J. & Swanson, N. R. (2002). Temporal aggregation and spurious instantaneous causalityin multiple time series models, Journal of Time Series Analysis 23: 651–665.

Brewer, K. R. W. (1973). Some consequences of temporal aggregation and ssystematic samplingfor ARMA and ARMAX models, Journal of Econometrics 1: 133–154.

Brockwell, P. J. & Davis, R. A. (1987). Time Series: Theory and Methods, Springer-Verlag, NewYork.

Claessen, H. (1995). Spezifikation und Schatzung von VARMA-Prozessen unter besondererBerucksichtigung der Echelon Form, Verlag Joseph Eul, Bergisch-Gladbach.

Clements, M. P. & Hendry, D. F. (1998). Forecasting Economic Time Series, Cambridge UniversityPress, Cambridge.

Clements, M. P. & Hendry, D. F. (1999). Forecasting Non-stationary Economic Time Series, MITPress, Cambridge, Ma.

Cooper, D. M. & Wood, E. F. (1982). Identifying multivariate time series models, Journal of TimeSeries Analysis 3: 153–164.

Doornik, J. A. & Hendry, D. F. (1997). Modelling Dynamic Systems Using PcFiml 9.0 for Windows,International Thomson Business Press, London.

Dunsmuir, W. T. M. & Hannan, E. J. (1976). Vector linear time series models, Advances in AppliedProbability 8: 339–364.

Engle, R. F. & Granger, C. W. J. (1987). Cointegration and error correction: Representation,estimation and testing, Econometrica 55: 251–276.

Engle, R. F. & Yoo, B. S. (1987). Forecasting and testing in cointegrated systems, Journal ofEconometrics 35: 143–159.

Findley, D. F. (1986). On bootstrap estimates of forecast mean square errors for autoregressive pro-cesses, in D. M. Allen (ed.), Computer Science and Statistics: The Interface, North-Holland,Amsterdam, pp. 11–17.

Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectralmethods, Econometrica 37: 424–438.

36

EUI WP ECO 2004/24

Granger, C. W. J. (1981). Some properties of time series data and their use in econometric modelspecification, Journal of Econometrics 16: 121–130.

Granger, C. W. J. & Newbold, P. (1977). Forecasting Economic Time Series, Academic Press, NewYork.

Gregoir, S. (1999a). Multivariate time series with various hidden unit roots, part I: Integral operatoralgebra and representation theorem, Econometric Theory 15: 435–468.

Gregoir, S. (1999b). Multivariate time series with various hidden unit roots, part II: Estimationand test, Econometric Theory 15: 469–518.

Gregoir, S. & Laroque, G. (1994). Polynomial cointegration: Estimation and test, Journal ofEconometrics 63: 183–214.

Grigoletto, M. (1998). Bootstrap prediction intervals for autoregressions: Some alternatives, Inter-national Journal of Forecasting 14: 447–456.

Haldrup, N. (1998). An econometric analysis of I(2) variables, Journal of Economic Surveys12: 595–650.

Hannan, E. J. (1970). Multiple Time Series, John Wiley, New York.

Hannan, E. J. (1976). The identification and parameterization of ARMAX and state space forms,Econometrica 44: 713–723.

Hannan, E. J. (1979). The statistical theory of linear systems, in P. R. Krishnaiah (ed.), Develop-ments in Statistics, Academic Press, New York, pp. 83–121.

Hannan, E. J. (1981). Estimating the dimension of a linear system, Journal of Multivariate Analysis11: 459–473.

Hannan, E. J. & Deistler, M. (1988). The Statistical Theory of Linear Systems, Wiley, New York.

Hannan, E. J. & Kavalieris, L. (1984). Multivariate linear time series models, Advances in AppliedProbability 16: 492–561.

Hillmer, S. C. & Tiao, G. C. (1979). Likelihood function of stationary multiple autoregressivemoving average models, Journal of the American Statistical Association 74: 652–660.

Hogue, A., Magnus, J. & Pesaran, B. (1988). The exact multi-period mean-square forecast errorfor the first-order autoregressive model, Journal of Econometrics 39: 327–346.

Hubrich, K., Lutkepohl, H. & Saikkonen, P. (2001). A review of systems cointegration tests,Econometric Reviews 20: 247–318.

Jenkins, G. M. & Alavi, A. S. (1981). Some aspects of modelling and forecasting multivariate timeseries, Journal of Time Series Analysis 2: 1–47.

Johansen, S. (1995a). Likelihood-based Inference in Cointegrated Vector Autoregressive Models,Oxford University Press, Oxford.

Johansen, S. (1995b). A statistical analysis of cointegration for I(2) variables, Econometric Theory11: 25–59.

Johansen, S. (1997). Likelihood analysis of the I(2) model, Scandinavian Journal of Statistics24: 433–462.

37

EUI WP ECO 2004/24

Johansen, S. & Schaumburg, E. (1999). Likelihood analysis of seasonal cointegration, Journal ofEconometrics 88: 301–339.

Kabaila, P. (1993). On bootstrap predictive inference for autoregressive processes, Journal of TimeSeries Analysis 14: 473–484.

Kapetanios, G. (2003). A note on an iterative least-squares estiamtion method for ARMA andVARMA models, Economics Letters 79: 305–312.

Kemp, G. C. R. (1999). The behovior of forecast errors from a nearly integrated AR(1) model asboth sample size and forecast horizon become large, Econometric Theory 15: 238–256.

Kim, T. H., Leybourne, S. J. & Newbold, P. (2004). Asymptotic mean-squared forecast errorwhen an autoregression with linear trend is fitted to data generated by an I(0) or I(1) process,Journal of Time Series Analysis 25: 583–602.

Kohn, R. (1982). When is an aggregate of a time series efficiently forecast by its past?, Journal ofEconometrics 18: 337–349.

Koreisha, S. G. & Pukkila, T. M. (1987). Identification of nonzero elements in the polynomialmatrices of mixed VARMA processes, Journal of the Royal Statistical Society B49: 112–126.

Lewis, R. & Reinsel, G. C. (1985). Prediction of multivarate time series by autoregressive modelfitting, Journal of Multivariate Analysis 16: 393–411.

Lutkepohl, H. (1984). Linear transformations of vector arma processes, Journal of Econometrics26: 283–293.

Lutkepohl, H. (1985). The joint asymptotic distribution of multistep prediction errors of estimatedvector autoregressions, Economics Letters 17: 103–106.

Lutkepohl, H. (1986a). Forecasting temporally aggregated vector ARMA processes, Journal ofForecasting 5: 85–95.

Lutkepohl, H. (1986b). Forecasting vector ARMA processes with systematically missing observa-tions, Journal of Business & Economic Statistics 4: 375–390.

Lutkepohl, H. (1987). Forecasting Aggregated Vector ARMA Processes, Springer-Verlag, Berlin.

Lutkepohl, H. (1991). Introduction to Multiple Time Series Analysis, Springer Verlag, Berlin.

Lutkepohl, H. (1996). Handbook of Matrices, John Wiley & Sons, Chichester.

Lutkepohl, H. (2002). Forecasting cointegrated VARMA processes, in M. P. Clements & D. F.Hendry (eds), A Companion to Economic Forecasting, Blackwell, Oxford, pp. 179–205.

Lutkepohl, H. & Claessen, H. (1997). Analysis of cointegrated VARMA processes, Journal ofEconometrics 80: 223–239.

Lutkepohl, H. & Poskitt, D. S. (1996). Specification of echelon form VARMA models, Journal ofBusiness & Economic Statistics 14: 69–79.

Lutkepohl, H. & Saikkonen, P. (1999). Order selection in testing for the cointegrating rank of aVAR process, in R. F. Engle & H. White (eds), Cointegration, Causality, and Forecasting. AFestschrift in Honour of Clive W.J. Granger, Oxford University Press, Oxford, pp. 168–199.

38

EUI WP ECO 2004/24

Marcellino, M. (1999). Some consequences of temporal aggregation in empirical analysis, Journalof Business & Economic Statistics 17: 129–136.

Masarotto, G. (1990). Bootstrap prediction intervals for autoregressions, International Journal ofForecasting 6: 229–239.

Nicholls, D. F. & Hall, A. D. (1979). The exact likelihood of multivariate autoregressive movingaverage models, Biometrika 66: 259–264.

Nsiri, S. & Roy, R. (1992). On the identification of ARMA echelon-form models, Canadian Journalof Statistics 20: 369–386.

Pascual, L., Romo, J. & Ruiz, E. (2004). Bootstrap predictive inference for ARIMA processes,Journal of Time Series Analysis 25: 449–465.

Poskitt, D. S. (1992). Identification of echelon canonical forms for vector linear processes usingleast squares, Annals of Statistics 20: 196–215.

Poskitt, D. S. (2003). On the specification of cointegrated autoregressive moving-average forecastingsystems, International Journal of Forecasting 19: 503–519.

Poskitt, D. S. & Lutkepohl, H. (1995). Consistent specification of cointegrated autoregressivemoving average systems, Discussion Paper 54, SFB 373, Humboldt-Universitat zu Berlin.

Quenouille, M. H. (1957). The Analysis of Multiple Time-Series, Griffin, London.

Reinsel, G. C. (1993). Elements of Multivariate Time Series Analysis, Springer-Verlag, New York.

Reinsel, G. C. & Ahn, S. K. (1992). Vector autoregressive models with unit roots and reduced rankstructure: Estimation, likelihood ratio test, and forecasting, Journal of Time Series Analysis13: 353–375.

Reinsel, G. C. & Lewis, A. L. (1987). Prediction mean square error for non-stationary multivariatetime series using estimated parameters, Economics Letters 24: 57–61.

Saikkonen, P. (1992). Estimation and testing of cointegrated systems by an autoregressive approx-imation, Econometric Theory 8: 1–27.

Saikkonen, P. & Lutkepohl, H. (1996). Infinite order cointegrated vector autoregressive processes:Estimation and inference, Econometric Theory 12: 814–844.

Sampson, M. (1991). The effect of parameter uncertainty on forecast variances and confidence in-tervals for unit root and trend stationary time-series models, Journal of Applied Econometrics6: 67–76.

Sims, C. A. (1980). Macroeconomics and reality, Econometrica 48: 1–48.

Stock, J. H. (1996). VAR, error correction and pretest forecasts at long horizons, Oxford Bulletinof Economics and Statistics 58: 685–701.

Stram, D. O. & Wei, W. W. S. (1986). Temporal aggregation in the ARIMA process, Journal ofTime Series Analysis 7: 279–292.

Telser, L. G. (1967). Discrete samples and moving sums in stationary stochastic processes, Journalof the American Statistical Association 62: 484–499.

39

EUI WP ECO 2004/24

Tiao, G. C. (1972). Asymptotic behaviour of temporal aggregates of time series, Biometrika59: 525–531.

Tiao, G. C. & Box, G. E. P. (1981). Modeling multiple time series with applications, Journal ofthe American Statistical Association 76: 802–816.

Tiao, G. C. & Guttman, I. (1980). Forecasting contemporal aggregates of multiple time series,Journal of Econometrics 12: 219–230.

Tiao, G. C. & Tsay, R. S. (1983). Multiple time series modeling and extended sample cross-correlations, Journal of Business & Economic Statistics 1: 43–56.

Tiao, G. C. & Tsay, R. S. (1989). Model specification in multivariate time series (with discussion),Journal of the Royal Statistical Society B51: 157–213.

Tsay, R. S. (1989a). Identifying multivariate time series models, Journal of Time Series Analysis10: 357–372.

Tsay, R. S. (1989b). Parsimonious parameterization of vector autoregressive moving average models,Journal of Business & Economic Statistics 7: 327–341.

Tunnicliffe Wilson, G. (1973). Estimation of parameters in multivariate time series models, Journalof the Royal Statistical Society B35: 76–85.

Wallis, K. F. (1977). Multiple time series analysis and the final form of econometric models,Econometrica 45: 1481–1497.

Wei, W. W. S. (1978). Some consequences of temporal aggregation in seasonal time series models, inA. Zellner (ed.), Seasonal Analysis of Economic Time Series, U.S. Department of Commerce,Bureau of the Census, pp. 433–444.

Wei, W. W. S. (1990). Time Series Analysis: Univariate and Multivariate Methods, Addison-Wesley, Redwood City, Ca.

Weiss, A. A. (1984). Systematic sampling and temporal aggregation in time series models, Journalof Econometrics 26: 271–281.

Yamamoto, T. (1980). On the treatment of autocorrelated errors in the multiperiod prediction ofdynamic simultaneous equation models, International Economic Review 21: 735–748.

Yap, S. F. & Reinsel, G. C. (1995). Estimation and testing for unit roots in a partially nonstationaryvector autoregressive moving average model, Journal of the American Statistical Association90: 253–267.

Zellner, A. & Palm, F. (1974). Time series analysis and simultaneous equation econometric models,Journal of Econometrics 2: 17–54.

40

EUI WP ECO 2004/24

forecasting with varma models · the successful use of univariate arma models for forecasting has...

Documents