forecasting us interest rates with cointegrated varma models

Forecasting US interest rates with Cointegrated

VARMA Models. ∗

Christian Kascha†‡ Carsten Trenkler§

University of Zurich University of Mannheim

August 30, 2011

Abstract

We bring together some recent advances in the literature on vec-tor autoregressive moving-average models creating a relatively simplespecification and estimation strategy for the cointegrated case. Weshow that in the cointegrated case with fixed initial values there existsa so-called final moving representation which is usually simpler but notas parsimonious than the usual Echelon form. Furthermore, we proofthat our specification strategy is consistent also in the case of cointe-grated series. In order to show the potential usefulness of the method,we apply it to US interest rates and find that it generates forecastssuperior to methods which do not allow for moving-average terms.

Keywords: Cointegration, VARMA Models, Forecasting

1 Introduction

In this paper, we propose a relatively simple specification and estimationstrategy for the cointegrated vector-autoregressive moving-average (VARMA)models using the estimators given in Yap & Reinsel (1995) and Poskitt(2003) and the identified forms proposed by Dufour & Pelletier (2008). In

∗We thank Francesco Ravazzolo, Kyusang Yu, Michael Vogt and seminars participantsat the Tinbergen Institute Amsterdam, University of Bonn, University of Konstanz, Uni-versity of Mannheim and University of Zurich as well as session participants at the ISF2011, Prague, and at the ESEM 2011, Oslo, for very helpful comments.

†University of Zurich, Chair for Statistics and Empirical Economic Research,Zurichbergstrasse 14, 8032 Zurich, Switzerland; [email protected]

‡Norges Bank, Research Department, P.O. Box 1179 Sentrum, 0107 Oslo, Norway§Address of corresponding author: University of Mannheim, Department of Eco-

nomics, Chair of Empirical Economics, L7, 3-5, 68131 Mannheim, Germany; [email protected]

1

order to show its potential usefulness, we apply the procedure in a forecast-ing exercise for US interest rates and find promising results.

Existing specification and estimation procedures for cointegrated VARMAmodels can be found in Yap & Reinsel (1995), Lutkepohl & Claessen (1997),Poskitt (2003, 2006) and also Poskitt (2009). Common to these papers isthe use of the reverse “Echelon-Form”, a set of parameter restrictions whichmake sure that the remaining coefficients are identified with respect to thelikelihood function. A related but different approach uses so-called “scalar-component” representations originally proposed by Tiao & Tsay (1989) andembedded in a complete estimation procedure by Athanasopoulos & Vahid(2008). While both structures can be quite parsimonious representations ofa given process, they can display relatively complex structures. Instead, weextend the simpler identified representations of Dufour & Pelletier (2008) tothe cointegrated case with fixed initial values. Furthermore, we propose tospecify the model using Dufour & Pelletier’s (2008) order selection criteriaapplied to the model estimated in levels. We proof a.s. consistency of theestimated orders in this case. While we believe that our proposed specifica-tion and estimation procedure for this class of models stands out because ofits simplicity and robustness, this is not to say that our procedure should bepreferred to the alternative methods mentioned above. This is an empiricalquestion that goes beyond the scope of this paper.

Finally, we apply the methods to the problem of predicting U.S. treasurybill and bond interest rates with different maturities taking cointegration asgiven. We find quite promising results relative to a multivariate randomwalk and the standard vector error correction model (VECM). We displaythe forecasting performance of the VECM and cointegrated VARMA modelrelative to the multivariate random walk over time. These plots show thatthe VARMA model delivers good forecasts apart from a period stretchingfrom the mid-nineties to 2000.

The motivation for looking at this particular model class stems fromthe well-known theoretical advantages of VARMA models over pure vector-autoregressive (VAR) processes; see e.g. Lutkepohl (2005). In contrast toVAR models, the class of VARMA models is closed under linear transforma-tions. For example, a subset of variables generated by a VAR process is typi-cally generated by a VARMA, not by a VAR process (Lutkepohl 1984a,b). Itis well known that linearized dynamic stochastic general equilibrium (DSGE)models imply that the variables of interest are generated by a finite-orderVARMA process. Fernandez-Villaverde, Rubio-Ramırez, Sargent & Watson(2007) show formally how DSGE models and VARMA processes are linked.Cooley & Dwyer (1998) claim that modeling macroeconomic time series sys-tematically as pure VARs is not justified by the underlying economic theory.A comparsion of structural identification using VAR, VARMA and statespace representations is provided by Kascha & Mertens (2009).

Our particular application is part of a vast literature on the term struc-

2

ture of interest rates. Following the seminal paper of Campbell & Shiller(1987), the cointegration approach has become a widespread tool for termstructure analysis in the field of applied econometrics. If the rational expec-tation hypothesis of the term structure holds (REHTS) then the spread oftwo interest rates is stationary. Accordingly, there should exist K−1 cointe-gration relations in a system of K (nonstationary) interest rates. While thereis strong empirical evidence for K − 1 cointegration relations among moneymarket rates and medium-term bond yields, see e.g. Hall, Anderson & Granger(1992), Engsted & Tanggaard (1994), Cuthbertson (1996), Hassler & Wolters(2001), a smaller number of cointegration relations is usually found if long-term interest rates are considered in addition. This has been found e.g. byShea (1992), Carstensen (2003) and Cavaliere, Rahbek & Taylor (2010).

Starting with Campbell & Shiller (1987), many studies have also ana-lyzed whether the spreads help to predict individual interest rates by clas-sical regression models including the spread or the use of vector autore-gressive models. Given the widespread use of cointegration techniques totest for the REHTS, it is, however, surprising that only a few papers applycorresponding multivariate models for cointegration like the VECM for pre-dicting interest rates; see Hall et al. (1992), Hassler & Wolters (2001) and,more recently, Clarida, Sarno, Taylor & Valente (2006)

In contrast, the results on the forecasting performance of (cointegrated)VARMA models are very sparse. Using an identified form different fromours, Yap & Reinsel (1995) apply a cointegrated VARMA model to U.S. in-terest rates but do not evaluate its forecasting performance. An interestingcontribution is the one by Feunou (2009). He uses a VARMA model for mod-eling the whole yield curve imposing no-arbitrage restrictions in a stationarymodel instead of cointegration restrictions. Another study is provided byMonfort & Pegoraro (2007) using switching VARMA term structure mod-els, again in a stationary context. The applied part of this paper adds tothis literature.

The rest of the paper is organized as follows. Section 2 presents thepaper’s contribution and the application. Section 3 gives details on theproposed methodology. Section 4 concludes. Programs and data can befound on the homepages of the authors.

2 Cointegrated VARMA models

This section summarizes the model framework and the results of the paper.The technical details are given in section 3 and the proofs are provided inthe appendix.

This paper mainly assembles and extends elements of the work of Yap & Reinsel(1995), Poskitt (2003) and Dufour & Pelletier (2008) in order to constructa reasonably easy and fast strategy for the specification and estimation of

3

cointegrated VARMA models. The considered model for a time series ofdimension K, yt = (yt,1, . . . , yt,K)′, is

yt = µ0 +p∑

j=1

Ajyt−j + ut +q∑

j=1

Mjut−j for t = 1, . . . , T (1)

|A(z)| 6= 0 for |z| < 1|M(z)| 6= 0 for |z| ≤ 1

given fixed initial values y0, . . . , y−p+1, A(L) = IK −A1L− . . .−ApLp and

M(L) = IK +M1L+ . . .+MqLq, L denotes the lag operator, | · | refers to the

determinant. The error terms ut are assumed to be i.i.d. with mean zero andpositive definite covariance matrix, Σu, and at least finite second moments(Poskitt 2003, Assumptions A.2, A.3). We are interested in the case in whichthe process has s unit roots such that |A(z)| = ast(z)(1− z)s for 0 < s ≤ Kand |ast(z)| 6= 0 for z ≤ 1. Then it is said that the cointegration rank ofyt is r = K − s and we can decompose Π :=

∑pj=1 Aj − IK as Π = αβ′,

where α and β are (n× r) matrices with full column rank r. Furthermore,the constant is assumed to take the form µ0 = −αρ such that a trend in thedifferences is ruled out and one can write

∆yt = αβ′(yt−1 − ρ) +k∑

j=1

Γj∆yt−j + ut +q∑

j=j

Mjut−j (2)

Now, it is well known, that one has to impose certain restrictions on theparameter matrices in order to achieve uniqueness. That is, given a se-ries (yt)T

−p+1, there is generally more than one pair of finite polynomials[A(z), M(z)] such that (1) is satisfied. Therefore, one has to restrict theset of considered pairs [A(z), M(z)] to a subset such that every processsatisfying (1) is represented by exactly one pair in this subset.

Poskitt (2003) proposes a complete modelling strategy using the Ech-elon form which is based on so-called Kronecker indices. Here, we usethe much simpler final moving-average (FMA) representation proposed byDufour & Pelletier (2008) in the context of stationary VARMA models. Thisrepresentation imposes restrictions on the moving-average polynomial only.More precisely, we consider only polynomials [A(z), M(z)], such that

M(L) = m(L)IK , m(L) = 1 + m1L + . . . + mqLq. (3)

is true.1 As already noted by Dufour & Pelletier (2008), this identification1 Dufour & Pelletier (2008) also propose another representation that restricts attention

to pairs with diagonal moving-average polynomials such as

M(L) =

K⊕

k=1

mk(L), mk(L) = 1 + mk,1L + . . . mk,qLqk (4)

where m(z) and the mk(z) are scalar polynomials. This form actually delivered resultssimilar to the ones for the FMA form and will therefore not be discussed in the paper.

4

strategy is valid despite A(z) having roots on the unit circle. The reasonis that the polynomial M−1(L)A(L) can be uniquely related to M(L) andA(L) (Dufour & Pelletier 2008, Lemma 3.8 and Theorem 3.9). What is left,is only to show the definition of the FMA form in the non-stationary contextwith fixed initial values. Analogous to the results in Poskitt (2006), we canshow that in this particular case the resulting pair of polynomials does nothave to be left-coprime anymore.

Prior to estimation and specification, we subtract the sample mean fromthe observations, that is, we actually apply the methods to yt−T−1

∑Ts=1 ys

in the VARMA case. However, the notation will not distinguish between rawand adjusted data and we simply write

yt =p∑

j=1

Ajyt−j + ut +q∑

j=1

Mjut−j , (1’)

for example. The used estimation methods remain valid, provided the con-stant can indeed be absorbed in the cointegrating relation; see Yap & Reinsel(1995, section 6.) and Poskitt (2003, section 2).

Dufour & Pelletier (2008) propose an information criterion for choosingthe integers p and q in the context of stationary version of (1’) identified via(3). The model is estimated for different orders via a long autoregression toobtain an estimate of the unknown residuals and estimates of the VARMAmodel are then obtained by a generalized least squares procedure. We usetheir procedure to choose these integers for the cointegrated VARMA modelin levels.

Having determined the orders p and q, we use the algorithm describedin Poskitt (2003) to obtain an initial estimate of the cointegrated model.This step amounts to one OLS and one GLS regression. Poskitt’s (2003)estimates are then updated using the algorithm described in Yap & Reinsel(1995) in order to obtain efficient estimates in a Gaussian setting. This laststep is another GLS regression.

The complete procedure for a given cointegrating rank is

1. Subtract the sample mean from the yt

2. Estimate (1’) for different orders p and q imposing the FMA formusing OLS. The order estimate (p, q) is the pair which minimizes theinformation criterion in (14).

3. Get a preliminary estimate via Poskitt’s method.

4. Update the preliminary estimates by the method described by Yapand Reinsel.

For the proofs and the forecasting exercise, we take the cointegrating rankas given. However, one might use the results in section 6 of Yap & Reinsel

5

(1995) to specify the cointegrating rank at the last two steps of the proce-dure.

We show that this modeling strategy is potentially interesting by ap-plying it to a prediction exercise for US interest rates and comparing theresulting forecasts to those of the random walk (RW) model and a VECMthat has only autoregressive terms and whose order is chosen by minimiz-ing the BIC. Hence, we compare two modeling strategies rather than twomodels: one, which is allowing for nonzero moving average terms and in-cluding the special case of a pure VAR and one, which exclusively considersthe latter case. We take monthly averages of interest rate data for trea-sury bills and bonds from the FRED database of the Federal Reserve Bankof St. Louis. The used data are the series TB3MS, TB6MS, GS1, GS5 andGS10 with maturities, 3 months, 6 months, 1 year, 5 years and 10 years,respectively. Our vintage starts in 1970:1 and ends in 2010:1 and comprisesT = 482 data points. We use the first Ts = 200 observations for estimationsuch that, for example, we are left with 282 observations for evaluating 1-month ahead forecasts. Denote by Rt,mk

the annualized interest rate for thek-th maturity mk. Throughout we analyze yt,k := 100 ln(1 + Rt,mk

).Table 1 and Table 2 contain the main results for the RW model, the

VECM, the cointegrated VARMA estimated via Poskitt’s (2003) initial es-timator (VARMA) and the the cointegrated VARMA with estimates up-dated by one iteration via the algorithm presented in Yap & Reinsel (1995)(VARMA YP). The first table gives the mean square prediction errors(MSPEs) series by series for different systems and horizons. The MSPEis defined in a standard way. The second table gives results for the deter-minant of the MSPE matrix for different horizons and systems; that is,

|MSPEh| =∣∣∣∣∣

1T − Ts − h + 1

T−h∑

t=Ts

(yt+h − yt+h|t)(yt+h − yt+h|t)′∣∣∣∣∣ ,

using an obvious notation and omitting the dependence on the specific modeland system. The last criterion serves as a criterion to measure joint forecast-ing precision as we do not want to enter the discussion of how to obtain thebest density forecast. In Table 1, the maturities of the systems are given onthe left column; that is, the first two columns stand for the bivariate systemwith interest rates for maturities 3 and 6 months. The forecast horizons are1, 3, 6 and 12 months. Table 2 is structured similarly. For both tables, theentries for the RW model are absolute while the entries for the other modelsare always relative to the corresponding entry for the RW model. For exam-ple, the first entry in the first row tells us that the random walk producesa one-step-ahead MSPE of 0.042 which corresponds to

√0.042 ' 0.205 per-

centage points. In the same row, the entry for the VECM at h = 1 tells usthat this model produces one-step-ahead forecasts of the 3-month interestrate that have a MSPE which is roughly 20 % lower than the MSPE of the

6

RW model.Table 1 shows that the cointegrated models are more advantageous rela-

tive to the RW model for the bivariate systems than for the larger systems.Apparently, cointegrated VAR or VARMA models can be very advantegousat one-month and three-month horizons while the RW becomes more com-petitive for longer horizons and longer maturities - at least when individualMSPEs are considered. The cointegrated models also work better whenmaturities which are close to each other are grouped together. When com-paring the MSPE figures for the VECM and VARMA model (VARMA andVARMA YP) one can see that the VARMA model is generally performingbetter than the VECM model, sometimes quite clearly. To give an exam-ple, the gain in forecasting precision can amount to more than 20% for thebivariate systems at short horizons. Typically, a VARMA(1,1) model is pre-ferred by the information criterion over pure VAR models, while the BICusually picks two autoregressive lags. An exception is the system consistingof five variables. Here, the lag selection criterion almost always chooses nomoving-average terms and thus the “VARMA results” are actually resultsfor the pure VECM model when estimated with the algorithm by Poskitt(2003) or Yap & Reinsel (1995), respectively. Therefore, the comparison forthe five-variable system amounts to a comparison of different estimation al-gorithms for the same model and it turns out that in this case maximumlikelihood is largely preferable to the approximative methods in terms of theMSPE measure.

The results in Table 2 largely reflect those in Table 1. That is, thecointegrated models’ forecasts are usually more precise than the forecastsgenerated by the random walk and the VARMA predictions are usuallymore accurate than the VAR predictions apart from the special case of thefive-dimensional system as discussed above. However, in contrast to thesingle MSPE results, the forecasts generated by the cointegrated models aresuperior - in terms of joint criterion - to those of the random walk even atlonger horizons, in particular for h = 12.

To get a complete picture of the performance of the cointegrated modelsvis-a-vis the RW for h-step-ahead forecasts for the k-th series in the systemwe compute cumulative sums of squared prediction errors as defined as

t∑

s=T s+h

e2s,RW,k,h − e2

s,M,k,h t = T s + h, . . . , T (5)

where M stands for either the VECM or the cointegrated VARMA model(VARMA YP) and et,RW,k,h, et,M,k,h are the forecast errors from predictingyt,k based on information up to t−h, i.e. et,M,k,h = yt,k− yt,k|t−h,M. Ideally,we should see the above sum steadily increasing over time if forecastingmethod M is indeed preferable to the RW. The pictures are given in Figure2 for the system with maturities 3 months and 1 year and forecast horizons

7

h = 1, 6, 12.First, Figure 2 of course mirrors the results of Table 1 as the Table

contains the end-of-sample results. Second, the forecasting advantage ofboth models is relatively consistent through time. While there are occasional“jumps” when the cointegrated models perform much better than the RWmodel, these jumps do not appear to dominate the results in Table 1. Theforecasting advantage of the VARMA model appears however marginallymore stable over time. Third, there is also a period roughly from the midnineties to 2000 when the RW model performed better than the cointegratedmultivariate models. Interestingly, a similar finding is also obtained byde Pooter, Ravazzolo & van Dijk (2010) in a different context. In sum, thepictures support the view that the forecasting advantage of both the VECMand VARMA model over the RW is systematic.

3 Methodological Details

3.1 VARMA Modelling

We start discussing the general VARMA process

A0yt =p∑

j=1

Ajyt−j + M0ut +p∑

j=1

Mjut−j , for t = 1, . . . , T, (6)

where, slightly abusing notation, p denotes the maximum of the autoregres-sive and moving average lag order in this section. Using the notation fromPoskitt (2006) with minor modifications, we define deg[A(z), M(z)] as themaximum row degree max1≤k≤K degk[A(z), M(z)] where degk[A(z), M(z)]denotes the polynomial degree of the kth row of [A(z), M(z)]. Then we candefine a class of processes by {[AM ]}p := {[A(z), M(z)]| deg[A(z), M(z)] =p}.

For the moment we just assume

Assumption 3.1 The K-dimensional series (yt)Tt=1−p admits a VARMA

representation as in (6) with A0 = M0, A0 invertible, [A(z), M(z)] ∈{[AM ]}p and fixed initial values y1−p, . . . , y0;

but impose additional restrictions of the general model as needed. Theproofs are given in the appendix.

Identification

The identification of the parameters of the FMA form follows from the ob-servation that any process that satisfies (6) can always be written as

yt =t+p−1∑

s=1

Πsyt−s + ut + nt, t = 1− p, . . . , T, (7)

8

where it holds, by construction of the series Πi and nt, that

0 =p∑

j=0

MjΠi−j , i ≥ p + 1 (8)

0 =p∑

j=0

Mjnt−j , t = 1, . . . , T. (9)

On the other hand, given a process satisfying (7) and existence of matri-ces M0, M1, . . . , Mp such that conditions (8) and (9) are true, the processhas a VARMA representation as above. These statements are made precisein the following theorem which is just a restatement of the correspondingtheorem in Poskitt (2006).

Theorem 3.1 The process (yt)Tt=1−p admits a VARMA representation as

in (6) with [A(z), M(z)] ∈ {[A M ]}p and initial conditions y0, . . . , y1−p ifand only if (yt)T

t=1−p admits an autoregressive representation

yt =t+p−1∑

s=1

Πsyt−s + ut + nt, t = 1− p, . . . , T,

in which the conditions (8) and (9) are satisfied.

Now, one assigns to the autoregressive representation a unique VARMArepresentation. Although not necessary for the derivations that follow im-mediately, we assume that M(z) is invertible

Assumption 3.2 |M(z)| 6= 0 for |z| ≤ 1.

Because of the properties of the adjoint, Mad(z)M(z) = |M(z)|, equa-tions (8) and (9) imply

0 =q∑

j=0

mjΠi−j , i ≥ q + 1 (10)

0 =q∑

j=0

mjnt−j , t = q − p + 1, . . . , T. (11)

Here, |M(z)| ≡ m(z) = m0 + m1z + . . . + mqzq is a scalar polynomial and

q = p ·K is its maximal order.Therefore, one can define a pair in final moving-average form as in (3)

[A(z), m(z)IK ], provided the stated assumptions and that T ≥ q − p + 1.This representation, however, is not the only representation of this form. Toachieve uniqueness, we select the representation of the form [A(z), m(z)IK ]with the lowest possible degree of the scalar polynomial m(z) such that thefirst coefficient is one and (10) and (11) are satisfied.

9

Theorem 3.2 Assume that the process (yt)Tt=1−p satisfies Assumptions 3.1

and 3.2. Then for T ≥ q − p + 1 there exists a unique, observationallyequivalent, representation in terms of a pair [A0(z), m0(z)IK ] with ordersp0 and q0, respectively, commencing from some t0 ≥ 1− p.

In contrast to the discussion in Dufour & Pelletier (2008), the specialfeature in the non-stationary case with fixed initial values is that the FMArepresentation does not need to be left-coprime, in particular the autore-gressive and moving average polynomial can have the same roots. This is aconsequence of condition (11) and is not very surprising given the results ofPoskitt (2006) on the Echelon form representation in the same setting.

Now, if we assume normality and independence, i.e. ut ∼ i.i.d.N(0,Σu)with Σu positive definite, and under our assumptions, the parameters of themodel are identified via the Gaussian partial likelihood function

f(yTt0 |yt0−1

1−p , λ)

where yTt0 = (y′t0 , . . . , y

′T )′, yt0−1

1−p = (y′1−p, . . . , y′t0−1)

′ and λ being the param-eter vector of the final moving average form. This just follows from Poskitt(2006, section 2.2) and the observation that assumptions 2.1 and 2.2 of thispaper are satisfied in the present case.

If, in addition to the assumptions above we are interested in the casein which there is cointegration among the components of yt such that theprocess has s unit roots; that is, |A(z)| = ast(z)(1− z)s for 0 < s ≤ K and|ast(z)| 6= 0 for z ≤ 1, then the corresponding error-correction representation

yt = Πyt−1 +p0−1∑

i=1

Γj∆yt−j + m(L)ut (12)

with the same initial conditions as above is identified as there exists a oneto one mapping between this representation and the presentation in levels(cf. Poskitt 2006, section 4.1).

Specification

Since a legitimate critique of VARMA modeling is the increased specificationuncertainty, we think that a serious forecast comparison has to involve mod-eling uncertainty. Therefore, we chose to select the orders data-dependentin our forecast study as described in the following.

For the specification and estimation of the cointegrated model (1), thesample mean is subtracted from the observations as justified above. In orderto determine the lag orders p and q of the VARMA model (1) with the FMAstructure in (3) we apply a two-step approach similar to Dufour & Pelletier(2008) and use the information criterion they have suggested. Our procedureworks as follows.

10

1. Fit a long VAR regression with hT lags to the mean-adjusted series as

yt =hT∑

i=1

ΠhTi yt−i + uhT

t . (13)

Denote the estimated residuals from (13) by uhTt .

2. Regress yt on φhTt−1 = [y′t−1, . . . , y

′t−p, u

hT ′t−1, . . . , u

hT ′t−q]

′, t = sT + 1, . . . T ,imposing the FMA restriction in (3) for all combinations of p = k +1 ≤ pT and q ≤ qT with sT = max(pT , qT ) + hT using OLS. De-note the estimate of the corresponding covariance error matrix byΣT (p, q) = (1/N)

∑TsT +1 zt(p, q)z′t(p, q), where zt(p, q) are the OLS

residuals. Compute the information criterion

DP (p, q) = ln |ΣT (p, q)|+ dim(γ(p,q))(lnN)1+ν

N, ν > 0 (14)

where N = T − sT and dim(γ(p,q)) is the dimension of the vector offree parameters of the corresponding VARMA(p, q) model in levels.

3. Choose the AR and MA orders by (p, q)IC = argmin(p,q) DP (p, q),where the minimization is over {1, 2, . . . , pT } × {0, 1, . . . , qT }.

In order to show consistency we make the following assumption which isequivalent to Assumption A.2 in Poskitt (2003).

Assumption 3.3 The true error term vectors ut = (u′t,1, u′t,2, . . . , u

′t,K), t =

1−p, . . . , 0, 1, . . . , T , form an independent, identically distributed zero meanwhite noise sequence with positive definite variance-covariance matrix Σu.Furthermore, the moment condition E

(||ut||δ1)

< ∞ for some δ1 > 2, where|| · || denotes the Euclidean norm, and growth rate ||ut|| = O

((log t)1−δ2)

)almost surely (a.s.) for some 0 < δ2 < 1 also hold.

Using Assumption 3.3 we obtain the following theorem on the consistencyof the order estimators.

Theorem 3.3 If Assumptions 3.1-3.3 hold, if hT = [c(lnT )a] is the integerpart of c(lnT )a for some c > 0, a > 1, and if max(pT , qT ) < hT , then theorders chosen according to (14) converge a.s. to their true values.

Theorem 3.3 is the counterpart to Dufour & Pelletier (2008, Theorem 5.1),dealing with the stationary VARMA setup, and, to some extent, to Poskitt(2003, Proposition 3.2), referring to cointegrated VARMA models identi-fied via the echelon form. Note, that we can use the same penalty termCT = (lnN)1+ν , ν > 0, as in the stationary VARMA case. However,

11

the assumptions on the error terms have to be strengthened. In partic-ular, an iid assumption is needed in contrast to the strong mixing as-sumption employed by Dufour & Pelletier (2008). Existing formal results ofPoskitt & Lutkepohl (1995) and Huang & Guo (1990) show that weakeningAssumption 3.3, e.g. making an appropriate martingale difference sequenceassumption on ut, leads to too low convergence orders for the estimatorsobtained in steps 1. and 2.. As a consequence, the penalty term needs to bestronger, e.g. one may set it to C ′

T = hT CT . Nevertheless, Poskitt (2003)argues that it is likely that the needed convergence orders can be obtainedunder weaker conditions than stated in Assumption 3.3.

The practitioner has to chose values for ν, hT , pT , and qT satisfy-ing the conditions contained in Theorem 3.3. We set ν = 0.2 followingDufour & Pelletier (2008). As pointed out by Poskitt (2003) and Lutkepohl(2005, Section 14) no clear guideline exists on how to select hT for the non-stationary case. We adopt the rule hT = max

(max(pT , qT ) + 1, (lnT )1.25

)from Poskitt (2003) with pT = qT = 4. Choosing larger values for pT andqT left the results virtually unchanged. Alternatively, one may use Akaike’sinformation criterion (AIC) to determine hT resulting in the estimator hAIC

T ,say. While Poskitt (2003) conjectures that hAIC

T satisfies the condition onhT given in Theorem 3.3, the latter has only been proven for the BIC byBauer & Wagner (2005, Corollary 1). The BIC, however, may select a toosmall lag order for the long autoregression (13) in finite samples. Therefore,we decided to use the above specified rule for hT .

Estimation

Conditioned on the (estimated) orders and using the estimation results ofthe long autoregression (13) with hT as specified in Theorem 3.3 we canobtain Poskitt’s (2003) initial estimator as described in the following.

The cointegrated VARMA model (2) can be conveniently written as

∆yt = Π′yt−1 + [Γ M]Zt−1 + ut, (15)

where Γ = vec[Γ1, . . . , Γk], M = vec[M1, . . . , Mq] andZt−1 = [∆y′t−1, . . . , ∆y′t−k, u

′t−1, . . . , u

′t−q]

′.Let ZhT

t be the matrix obtained from Zt by replacing the ut by uhTt . Let

γ1 be the vector of free parameters in vec[Γ M] and the augmented vectorγ2 = (vec(Π)′, γ′1)

′. Identification restrictions are imposed by defining suit-able matrices R, R2 such that vec([Γ M]) = Rγ1 and vec([Π Γ M]) = R2γ2,respectively. Equipped with these definitions, one can write

∆yt =(y′t−1 ⊗ IK , ZhT

t−1

′)vec([Π Γ M]) + ut

=(y′t−1 ⊗ IK , ZhT

t−1

′)R2γ2 + ut

= Xtγ2 + ut (16)

12

Poskitt’s (2003) initial estimator is the feasible GLS estimator

γ2 =

T∑

hT +1

X ′t(Σ∆,T )−1Xt

−1

T∑

hT +1

X ′t(Σ∆,T )−1∆yt, (17)

which is strongly consistent (Poskitt 2003, Propositions 4.1 and 4.2) givenAssumptions 3.1-3.3 and Σ∆,T is an estimate of Σu obtained from OLSestimation of (16).2 The estimated matrices are denoted by Π, Γ, M . Toexploit the reduced rank structure in Π = αβ′, β is normalized such thatβ = [Ir, β

∗′]′. Then α is estimated as the first r rows of Π such that

α = Π[., 1 : r], (18)

β∗ =(

α′(M(1)ΣT M(1)′

)−1α

)−1

×(

α′(M(1)ΣT M(1)′

)−1Π[., r + 1 : K]

). (19)

These estimates are taken as starting values for the procedure described inYap & Reinsel (1995). We perform here one iteration of a conditional max-imum likelihood estimation procedure to obtain fully efficient estimates ina Gaussian setting. Define the vector of free parameters, given the cointe-gration restrictions, as δ := (vec((β∗)′)′, vec(α)′, γ′1)

′ and its value at the jthiteration as δ(j). The elements of the initial vector δ(0) = δ correspond to(17) - (19). Compute u

(j)t and Σ(j)

u according to

M (j)(L)u(j)t = ∆yt − α(j)(β(j))′yt−1 − Γ(j)(L)∆yt−1, (20)

Σ(j)u =

1T

T∑t

u(j)t (u(j)

t )′ (21)

For the calculation, it is assumed yt = ∆yt = ut = 0 for t ≤ 0. Only

W(j)t := −∂u

(j)t

∂δ′t

is needed for computing one iteration of the proposed

Newton-Raphson iteration. Also W(j)t can be calculated iteratively as

(W (j)t )′ =

[(y′t−1H ⊗ α), (y′t−1β ⊗ IK), ((Z(j)

t−1)′ ⊗ IK)R

]−

q∑

i=1

Mi(W(j)t−i)

′

where H ′ := [0((K−r)×r), IK−r]. The estimate is then updated according to

δ(j+1) − δ(j) =

(T∑

t=1

W(j)t (Σ(j)

u )−1(W (j)t )′

)−1 T∑

t=1

W(j)t (Σ(j)

u )−1u u

(j)t ,

2Our formulation differs from his because we formulate the models in differencesthroughout. The procedures yield identical results.

13

which amounts to a GLS estimation step. The estimates of the residualsand their covariance can be updated according to (20) and (21). The one-step iteration estimator δ(1) is consistent and fully efficient asymptoticallyaccording to Yap & Reinsel (1995, Theorem 2) given the strong consistencyof the initial estimator γ2 in (17).

Given estimates of the parameters and innovations, forecasts are ob-tained by using the implied VARMA form in levels. Finally, the samplemean, which was subtracted earlier, is added to the forecasts.

3.2 Benchmark Models

There are two benchmark models in the forecasting exercise. The first isthe multivariate random walk yt = yt−1 + ut, ut ∼ i.i.d.(0K , Σu), wherethe notation means that ut is an independent white noise process. Pointforecasts are obtained in a standard way.

The second benchmark model is the VECM

∆yt = µ0 + Πyt−1 +k∑

j=1

Γj∆yt−j + ut, t = k + 2, . . . , T, (22)

with the assumptions on the initial values, parameters and the ut are analo-gous to the assumptions made for the VARMA (1). The lag length is chosenby using the Bayesian information criterion, pBIC = argminp BIC(p), wherethe minimization is over p = 1, . . . , pT and kBIC = pBIC − 1. From the re-sults in Bauer & Wagner (2005), we take pT = [(T/ log T )1/2]. Paulsen(1984) shows that the standard order selection criteria are consistent formultivariate autoregressive processes with unit roots. The BIC is

BIC(p) = ln |Σ(p)|+ lnN(pK + 1)K

N, (23)

where N = T −pT , Σ(p) =∑T

t=pT +1 utu′t/N is an estimate of the error term

covariance matrix Σ and the ut are obtained by estimating an unrestrictedVAR model of order p using ypT +1−p, . . . , yT by OLS. After that, the param-eters of (22) are estimated by reduced rank maximum likelihood estimation(Johansen 1988, 1991, 1996). Forecasts are obtained iteratively by using theimplied estimated VAR form.

4 Conclusion

In this paper, we tie together some recent advances in the literature onVARMA models creating a relatively simple specification and estimationstrategy for the cointegrated case. In order to show its potential usefulness,we applied the procedure in a forecasting exercise for US interest rates andfound promising results.

14

There are a couple of issues which could be followed up. For example,the intercept term in the cointegrating relation is treated by subtractingthe sample mean from the series and it would be desirable to have a moreefficient method in this case. Also, it would be good to augment the modelby time-varying conditional variance. Finally, the development of modeldiagnostic tests appropriate for the cointegrated VARMA case would be ofinterest.

References

Athanasopoulos, G. & Vahid, F. (2008), ‘VARMA versus VAR for macroe-conomic forecasting’, Journal of Business & Economic Statistics 26, 237–252.

Bauer, D. & Wagner, M. (2005), Autoregressive approximations of multiplefrequency I(1) processes, Economics Series 174, Institute for AdvancedStudies.

Campbell, J. Y. & Shiller, R. J. (1987), ‘Cointegration and tests of presentvalue models’, Journal of Political Economy 95(5), 1062–88.

Carstensen, K. (2003), ‘Nonstationary term premia and cointegration of theterm structure’, Economics Letters 80(3), 409–413.

Cavaliere, G., Rahbek, A. & Taylor, A. (2010), ‘Co-integration rank testingunder conditional heteroskedasticity’, Econometric Theory 26(6), 1719–1760.

Clarida, R. H., Sarno, L., Taylor, M. P. & Valente, G. (2006), ‘The roleof asymmetries and regime shifts in the term structure of interest rates’,Journal of Business 79(3), 1193–1224.

Cooley, T. F. & Dwyer, M. (1998), ‘Business cycle analysis without muchtheory. A look at structural VARs’, 83, 57–88.

Cuthbertson, K. (1996), ‘The expectations hypothesis of the term structure:The uk interbank market’, The Economic Journal 106(436), 578–592.

de Pooter, M., Ravazzolo, F. & van Dijk, D. (2010), ‘Term structure forecast-ing using macro factors and forecast combination’, Norges Bank WorkingPaper 2010/01 .

Dufour, J. M. & Pelletier, D. (2008), ‘Practical methods for modellingweak VARMA processes: Identification, estimation and specificationwith a macroeconomic application’, Discussion Paper, McGill University,CIREQ and CIRANO .

15

Engsted, T. & Tanggaard, C. (1994), ‘Cointegration and the us term struc-ture’, Journal of Banking & Finance 18(1), 167–181.

Fernandez-Villaverde, J., Rubio-Ramırez, J. F., Sargent, T. J. & Watson,M. W. (2007), ‘A,B,C’s (and D)’s of understanding VARs’, AmericanEconomic Review 97(3), 1021–1026.

Feunou, B. (2009), ‘A no-arbitrage VARMA term structure model withmacroeconomic variables’, Working Paper, Duke University . June 2009,Mimeo, Duke University.

Guo, L., Chen, H.-F. & Zhang, J.-F. (1989), ‘Consistent order estimationfor linear stochastic feedback control systems (carma model)’, Automatica25(1), 147–151.

Hall, A. D., Anderson, H. M. & Granger, C. W. J. (1992), ‘A cointegrationanalysis of treasury bill yields’, The Review of Economics and Statistics74(1), 116–26.

Hassler, U. & Wolters, J. (2001), Forecasting money market rates in theunified germany, in R. Friedmann, L. Knuppel & H. Lutkepohl, eds,‘Econometric Studies: A Festschrift in Honour of Joachim Frohn’, LIT,pp. 185–201.

Huang, D. & Guo, L. (1990), ‘Estimation of nonstationary ARMAX mod-els based on the Hannan-Rissanen method’, The Annals of Statistics18(4), 1729–1756.

Johansen, S. (1988), ‘Statistical analysis of cointegration vectors’, Journalof Economic Dynamics and Control 12(2-3), 231–254.

Johansen, S. (1991), ‘Estimation and hypothesis testing of cointegration vec-tors in Gaussian vector autoregressive models’, Econometrica 59(6), 1551–1580.

Johansen, S. (1996), Likelihood-based Inference in Cointegrated Vector Au-toregressive Models, Oxford University Press, Oxford.

Kascha, C. & Mertens, K. (2009), ‘Business cycle analysis and VARMAmodels’, Journal of Economic Dynamics and Control 33(2), 267–282.

Lai, T. L. & Wei, C. Z. (1982), ‘Asymptotic properties of projections withapplications to stochastic regression problems’, Journal of MultivariateAnalysis 12, 346–370.

Lutkepohl, H. (1984a), ‘Linear aggregation of vector autoregressive movingaverage processes’, Economics Letters 14(4), 345–350.

16

Lutkepohl, H. (1984b), ‘Linear transformations of vector ARMA processes’,Journal of Econometrics 26(3), 283–293.

Lutkepohl, H. (1996), Handbook of Matrices, Chichester: Wiley.

Lutkepohl, H. (2005), New Introduction to Multiple Time Series Analysis,Berlin: Springer-Verlag.

Lutkepohl, H. & Claessen, H. (1997), ‘Analysis of cointegrated VARMAprocesses’, Journal of Econometrics 80(2), 223–239.

Monfort, A. & Pegoraro, F. (2007), Switching VARMA term structure mod-els - extended version, Working Paper 2007-19, Centre de Recherche enEconomie et Statistique.

Nielsen, B. (2006), Order determination in general vector autoregressions, inH.-C. Ho, C.-K. Ing & T. L. Lai, eds, ‘Time Series and Related Topics: InMemory of Ching-Zong Wei’, Vol. 52 of IMS Lecture Notes and MonographSeries, Institute of Mathematical Statistics, pp. 93–112.

Paulsen, J. (1984), ‘Order determination of multivariate autoregressive timeseries with unit roots’, Journal of Time Series Analysis 5(2), 115–127.

Poskitt, D. (2006), ‘On the identification and estimation of nonstationaryand cointegrated ARMAX systems’, Econometric Theory 22(6), 1138–1175.

Poskitt, D. S. (2003), ‘On the specification of cointegrated autoregressivemoving-average forecasting systems’, International Journal of Forecasting19(3), 503–519.

Poskitt, D. S. (2009), Vector autoregressive moving-average identificationfor macroeconomic modeling: Algorithms and theory, Working Paper12/2009, Monash University.

Poskitt, D. S. & Lutkepohl, H. (1995), Consistent specification of cointe-grated autoregressive moving average systems, Discussion Paper 1996-74,Humboldt Universitat zu Berlin, SFB 373.

Shea, G. S. (1992), ‘Benchmarking the expectations hypothesis of theinterest-rate term structure: An analysis of cointegration vectors’, Journalof Business & Economic Statistics 10(3), 347–66.

Tiao, G. C. & Tsay, R. S. (1989), ‘Model specificationi in multivariate timeseries’, Journal of the Royal Statistical Society, B 51(2), 157–213.

Yap, S. F. & Reinsel, G. C. (1995), ‘Estimation and testing for unit rootsin a partially nonstationary vector autoregressive moving average model’,Journal of the American Statistical Association 90(429), 253–267.

17

Appendix

Proof of Theorem 3.1:

⇒: Suppose (yt)Tt=1−p satisfies (6) given initial conditions. One can view

the sequence (ut)Tt=1−p as a solution to (6) viewed as system of equations

for the errors and given initial conditions u0, . . . , u1−p. Then we know that(ut)T

t=1−p is the sum of a particular solution and the appropriately chosensolution of the corresponding homogeneous system of equations, ut = uP

t +(−nt), say.

Define the series (Πi)i∈N0 by the recursive relations A0 = −M0Π0 and

Ai =i∑

j=0

MjΠi−j , for i = 1, . . . , p (24)

0 =p∑

j=0

MjΠi−j for i = p + 1, . . . (25)

Define now (uPt )T

t=1−p by uPt := yt −

∑t+p−1s=1 Πsyt−s, where

∑0s=1 Πsyt−s :=

0. Then, (uPt )T

t=1−p is indeed a particular solution as for t ≥ 1

p∑

j=0

MjuPt−j =

p∑

j=0

Mj

(yt−j −

t−j+p−1∑

s=1

Πsyt−s−j

)

= A0yt −p∑

j=1

Ajyt−j .

Further, define (nt)Tt=1−p by nt = up

t − ut for t = 1 − p, . . . , 0 and 0 =∑pi=0 Mjnt−i, for t = 1, . . . , T .By construction of nt, yt =

∑t+p−1s=1 Πsyt−s + ut + nt for t = 1− p, . . . , 0.

Then, we also have ut = upt −nt for t ≥ 1 as (−nt)T

t=1−p represents a solutionto the homogeneous system.

⇐ : Conversely, suppose (yt)Tt=1−p admits an autoregressive representation

as in (7) and there exist (K × K) matrices Mj j = 0, . . . , p such that0 =

∑pj=0 MjΠi−j for i ≥ p + 1 and 0 =

∑pj=0 Mjnt−j , for t = 1, . . . , T..

Then, for t = 1, . . . , T , it holds that

p∑

j=0

Mjyt−j =p∑

j=0

Mj

t−j+p−1∑

s=1

Πsyt−j−s +p∑

j=0

Mjut−j +p∑

j=0

Mjnt−j

18

Which leads to

−t+p−1∑

v=0

min(v,p)∑

j=0

MjΠv−jyt−v = −p∑

v=0

v∑

j=0

MjΠv−jyt−v

= A0yt −p∑

v=1

Avyt−v = M(L)ut

where the last line defines the Av’s.


From Theorem 3.1, (yt)Tt=1−p has a autoregressive representation. One can

express conditions (10) and (11) for a suitable pair [A(z), m(z)IK ] by defin-ing the polynomials Π(z) = −(Π0 +Π1z+ . . .) and n(z) = n1−p +n−pz+ . . ..One also defines the (stochastic) polynomial o(z) = o0 + o1z + . . . + ot0z

t

which captures that∑max(q,t−p+1)

j=0 mjnt−j 6= 0 for t ≤ t for some t.

A(z) = m(z)Π(z) (26)o(z) = m(z)n(z) (27)

Then, for given polynomials (Π(z), n(z)) one defines the set of all scalar poly-nomials m(z) with the first coefficient normalized to one for which there existfinite polynomials A(z) and o(z) such that the (26) and (27) are satisfied.Denote this set by S. Since the (normalized) determinant of M(z) satisfiesthe above conditions, S is not empty. Denote one solution to

minm(L)∈S

deg(m(L)),

by m0(z) with degree q0, where deg : S → N is the function that assignsthe degree to every polynomial in S. Denote the associated polynomials byA0(z), o0(z) with degrees p0, t0, respectively.

Suppose, there is another solution of the same degree m1(z) = 1+m1,1z+. . . + m1,q0z

q0 with

A1(z) = m1(z)Π(z)o1(z) = m1(z)n(z)

Since both polynomials are of degree q0, a = m0,q0/m1,q0 exists and one gets

(A0(z)− aA1(z)) = (m0(z)− am1(z))Π(z)(o0(z)− ao1(z)) = (m0(z)− am1(z))n(z)

Then, normalization of the first non-zero coefficient of (m0(z) − am1(z))would give a polynomial in S with degree smaller than q0, a contradiction.Thus m0(z) is unique.

Then, condition (26) alone would imply left-coprimeness of [A0(z), m0(z)IK ]but if n(z) 6= 0 the minimal orders p0, q0 might well be above those of theleft-coprime solution to (26).

19


Similar to Guo, Chen & Zhang (1989), we proof (pT , qT ) → (p0, q0) a.s. byshowing that the only limit point of (pT , qT ) is indeed (p0, q0) with proba-bility one, where p0 and q0 are the true lag orders. Thus the convergenceof pT and qT follows, which is equivalent to joint convergence. In order toshow this, we demonstrate that the events “(pT , qT ) has a limit point (p, q)with p + q > p0 + q0 ” (assuming p ≥ p0, q ≥ q0) and “(pT , qT ) has a limitpoint (p, q) with p < p0 or q < q0 ” both have probability zero.

Following Huang & Guo (1990) we rely on the spectral norm in orderto analyze the convergence behaviour of various sample moments. To thisend, we frequently make use of properties of the spectral norm, which arecollected e.g. in Lutkepohl (1996, Ch. 8).

Case 1: p ≥ p0, q ≥ q0, p + q > p0 + q0

Using T instead of N in our lag selection criterion (14) we obtain

DP (p0, q0)−DP (p, q) = ln det ΣT (p0, q0)− ln det ΣT (p, q)

−c(lnT )1+v

T,

where c is a constant.We have to show that DP (p0, q0) − DP (p, q) has a positive limit for

any pair p, q with p0 ≤ p ≤ pT , q ≤ q ≤ qT , and p + q > p0 + q0.Following Nielsen (2006, Proof of Theorem 2.5), it is sufficient to show thatT (ΣT (p0, q0) − ΣT (p, q)) = o{g(T )} such that (lnT )1+v/g(T ) → ∞ for p, qas just defined since c > 0 in the current setup.

Let us introduce the following notation with p0 and q0 being the true lagorders:

φ0t (p, q) = [y′t, . . . , y

′t−p+1, u

′t, . . . u

′t−q+1]

′

φhTt (p, q) = [y′t, . . . , y

′t−p+1, (u

hTt )′, . . . , (uhT

t−q+1)′]′

YT = [y′1, . . . , y′T ]′

UT = [u′1, . . . , u′T ]′

x0t (p, q) = [(φ0

t−1(p, q)′ ⊗ IK)R]′

xhTt (p, q) = [(φhT

t−1(p, q)′ ⊗ IK)R]′

X0T (p, q) = [x0

1(p, q), . . . , x0T (p, q)]′

XT (p, q) = [xhT1 (p, q), . . . , xhT

T (p, q)]′

γ(p, q) = [vec(A1, A2, . . . , Ap)′,m1,m2, . . . ,mq]′,

where the (K2 · p + q × 1) vector of free coefficients, γ(p, q) is the vector oftrue parameters such that Ai = 0 and mj = 0 for i > p0, j > q0, respectively.

20

Then, one can write

yt =p∑

i=1

Aiyt−i + ut +q∑

i=1

Miut−i

= [A1, . . . , Ap,M1, . . . Mq]φ0t−1(p, q) + ut

= (φ0t−1(p, q)′ ⊗ IK)vec[A1, . . . , Ap,M1, . . . Mq] + ut

= (φ0t−1(p, q)′ ⊗ IK)Rγ(p, q) = x0

t (p, q)′γ(p, q) + ut

in order to summarize the model in matrix notation by

YT = X0T (p, q)γ(p, q) + UT

= XT (p, q)γ(p, q) + [X0T (p, q)−XT (p, q)]γ(p, q) + UT

= XT (p, q)γ(p, q) + RT + UT , (28)

whereRT := [X0

T (p, q)−XT (p, q)]γ(p, q).

RT does not depend on p, q for p ≥ p0, q ≥ q0 and can be decomposedas RT = [r′0, r

′1, . . . , r

′T−1]

′, where rt, t = 0, 1, . . . , T − 1, is a K × 1 vector.Let ZT (p, q) = [z1(p, q)′, . . . , zT (p, q)′]′ be the OLS residuals obtained fromregressing YT on XT (p, q), i.e.

ZT (p, q) = YT −XT (p, q)[XT (p, q)′XT (p, q)

]−1XT (p, q)′YT

= XT (p, q)γ(p, q) + RT + UT −XT (p, q)[XT (p, q)′XT (p, q)

]−1XT (p, q)′

×(XT (p, q)γ(p, q) + RT + UT )

= [RT + UT ]−XT (p, q)[XT (p, q)′XT (p, q)

]−1XT (p, q)′[RT + UT ].

The estimator of the error covariance matrix Σ in dependence on p andq is given by ΣT (p, q) = T−1

∑Tt=1 zt(p, q)zt(p, q)′. Furthermore, note that

ΣT (p0, q0)− ΣT (p, q) is positive semidefinite since p ≥ p0 and q ≥ q0 in thecurrent setup. Hence, we have

‖ ΣT (p0, q0)− ΣT (p, q) ‖= λmax

(ΣT (p0, q0)− ΣT (p, q)

)

≤ tr(ΣT (p0, q0)− ΣT (p, q)

)= tr

(ΣT (p0, q0)

)− tr

(ΣT (p, q)

)

= T−1ZT (p0, q0)′ZT (p0, q0)− T−1ZT (p, q)′ZT (p, q) (29)

= T−1[RT + UT ]′XT (p, q)[XT (p, q)′XT (p, q)

]−1XT (p, q)′[RT + UT ]

− T−1[RT + UT ]′XT (p0, q0)[XT (p0, q0)′XT (p0, q0)

]−1XT (p0, q0)′[RT + UT ]

We have for the terms on the right-hand side (r.h.s.) of the last equality

21

in (29)

[RT + UT ]′XT (p, q)[XT (p, q)′XT (p, q)

]−1XT (p, q)′[RT + UT ]

= O(|| [XT (p, q)′XT (p, q)

]−1/2XT (p, q)′[RT + UT ]||2

)

= O(|| [XT (p, q)′XT (p, q)

]−1/2XT (p, q)′RT ||2

)

+ O(|| [XT (p, q)′XT (p, q)

]−1/2XT (p, q)′UT ||2

),

(30)

where the result holds for all p ≥ p0 and q ≥ q0.As in Poskitt & Lutkepohl (1995, Proof of Theorem 3.2), we obtain from

Lai & Wei (1982, Theorem 3) for any s = max(p, q)

|| [XT (p, q)′XT (p, q)]−1/2

XT (p, q)′UT ||2

= O

(max

{1, ln+

(s∑

n=1

∑t

||yt−n||2 + ||uhTt−n||2

)})(31)

= O(ln s) + O

(ln

(O

{∑t

||yt||2 + ||uhTt ||2

}))a.s.,

where ln+(x) denotes the positive part of ln(x). Then, we have that∑

t ||yt||2 =O(T g) due to Assumption 3.3, where the constant and growth rate are inde-pendent of s, see Poskitt & Lutkepohl (1995, Proof of Theorem 3.2, Proofof Lemma 3.1). Therefore, the second term on the r.h.s. of (31) is O(lnT )a.s. uniformly in s. Hence, the left-hand side (l.h.s.) of (31) is O(lnT ) a.s.since s ≤ sT ≤ hT = [c(lnT )a], c > 0, a > 1.

Similar to Poskitt & Lutkepohl (1995, Proof of Theorem 3.2) we obtainfrom a standard result in least squares

|| [XT (p, q)′XT (p, q)]−1/2

XT (p, q)′RT ||2 ≤T−1∑

t=0

K∑

i=1

r2i,t

≤ ||γ(p, q)||2 ·q∑

n=1

∑t

||ut−n − uhTt−n||2 (32)

= O(lnT ) a.s.,

where the last line follows from Poskitt (2003, Proposition 3.1) due to As-sumption 3.3, our choice of hT and since ||γ(p, q)|| = constant < ∞ indepen-dent of (p, q). Hence, we have T−1[RT +UT ]′XT (p, q) [XT (p, q)′XT (p, q)]−1×XT (p, q)′[RT + UT ] = O(lnT/T ) a.s. uniformly in (p, q).

22

Using the result of (31) and (32) with respect to (29) we finally obtain

maxp0<p≤pTq0<q≤qT

‖ ΣT (p0, q0)− ΣT (p, q) ‖

≤ 2 maxp0≤p≤pTq0≤q≤qT

∣∣∣T−1[RT + UT ]′XT (p, q)[XT (p, q)′XT (p, q)

]−1XT (p, q)′[RT + UT ]

∣∣∣

= O

(lnT

T

)a.s.,

which gives the desired result.

Case 2: (p, q) with p < p0 or q < q0

For (p, q) with p < p0 or q < q0, write

D(p, q)−D(p0, q0) = ln |IK + (ΣT (p, q)− ΣT (p0, q0))Σ−1T (p0, q0))|+ o(1)

Thus, it suffices to show that lim inf λmax(ΣT (p, q)− ΣT (p0, q0)) > 0 , as inNielsen (2006). To do so, let us introduce some further notation:

γT (p, q) =[X ′

T (p, q)XT (p, q)]−1

X ′T (p, q)YT (33)

= [vec(A1, A2, . . . , Ap)′, m1, m2, . . . , mq]′.

and, defining sp = max(p, p0) and sq = max(q, q0),

γ0T (p, q) = [vec(A1, A2, . . . , Asp)

′, m1, m2, . . . , msq ]′ (34)

with Ai = 0 for i > p and mi = 0 for i > q. Then, we get

ZT (p, q) = YT −XT (p, q)γT (p, q) = YT −XT (sp, sq)γ0T (p, q)

= YT −XT (sp, sq)γ(sp, sq) + XT (sp, sq)[γ(sp, sq)− γ0T (p, q)]

= UT + XT γ(sp, sq) + XT (sp, sq)γT (p, q),

where γ(p, q) is defined as above in Case 1,

XT := (X0T (sp, sq)−XT (sp, sq)), (35)

and γT (p, q) = γ(sp, sq)− γ0T (p, q).

Let x′t and x′t(sp, sq) be the typical K × (pK2 + q) (sub)matrices of theTK × (pK2 + q) matrices XT and XT (sp, sq), respectively, i.e. the partitionof XT and XT (sp, sq) is analogous to XT (p, q) above. Then, for p < p0 or

23

q < q0, the residual covariance matrix can be written as

ΣT (p, q) =1T

T∑

t=1

zt(p, q)zt(p, q)′

=1T

T∑

t=1

x′t(sp, sq)γT (p, q)γ′T (p, q)xt(sp, sq)

+1T

T∑

t=1

(x′t(sp, sq)γT (p, q)

) (ut + x′tγ(sp, sq)

)′

+1T

T∑

t=1

(ut + x′tγ(sp, sq)

) (x′t(sp, sq)γT (p, q)

)′

+1T

T∑

t=1

(ut + x′tγ(sp, sq)

) (ut + x′tγ(sp, sq)

)′

= D1,T + (D2,T + D′2,T ) + D3,T ,

where D1,T , D2,T , and D3,T are equal to the square products and crossproducts in the above equations, respectively.

Similarly, the residual covariance matrix based on the true orders p0 andq0 can be expressed by

ΣT (p0, q0) =1T

T∑

t=1

zt(p, q)z′t(p, q)

=1T

T∑

t=1

x′t(p0, q0)γT (p0, q0)γ′T (p0, q0)x′t(p0, q0)

+1T

T∑

t=1

(x′t(p0, q0)γT (p0, q0)

) (ut + x′tγ(p0, q0)

)′

+1T

T∑

t=1

(ut + x′tγ(p0, q0)

) (x′t(p0, q0)γT (p0, q0)

)′

+1T

T∑

t=1

(ut + x′tγ(p0, q0)

) (ut + x′tγ(p0, q0)

)′

= D01,T + (D0

2,T + (D02,T )′) + D0

3,T ,

where D01,T , D0

2,T , and D03,T are defined analogously. Then,

ΣT (p, q)− ΣT (p0, q0) = D1,T +(D2,T + D′

2,T −D01,T −D0

2,T − (D02,T )′

)

+(D3,T −D0

3,T

). (36)

It is easily seen that D3,T and D03,T both converge to Σu a.s.. Therefore,

the third term in (36) is o(1). We will further show that D2,T , D01,T , and

24

D02,T are o(1) a.s. and that lim inf λmax(D1,T ) > 0 while noting that D1,T is

p.s.d. by construction, showing lim inf λmax(ΣT (p, q) − ΣT (p0, q0)) > 0, thedesired result.

D1,T : Since D1,T is positive semidefinite by construction, it has at leastone nonzero eigenvalue if

λmax(D1,T ) = λmax

(1T

T∑

t=1

x′t(sp, sq)γT (p, q)γ′T (p, q)xt(sp, sq)

)

= λmax

(ΓT (p, q)

(1T

T∑

t=1

φt(sp, sq)φ′t(sp, sq)

)Γ′T (p, q)

)

≥ λmin

(1T

T∑

t=1

φt(sp, sq)φ′t(sp, sq)

)||ΓT (p, q)||2,

> 0,

where ΓT (p, q) = [A1, . . . , Mq] is γ(p, q) augmented to matrix form. Then,according to Poskitt & Lutkepohl (1995, Proof of Theorem 3.2) one getslimT→∞ inf T−1λmin(

∑Tt=1 φt(sp, sq)φ′t(sp, sq)) > 0 a.s. and we also have

||ΓT (p, q)||2 = constant > 0 from Huang & Guo (1990, p. 1753). This giveslim inf λmax(D1,T ) > 0.

D01,T : We have

γT (p0, q0) = γ(p0, q0)− γ0T (p0, q0) (37)

= − [X ′

T (p0, q0)XT (p0, q0)]−1

X ′T (p0, q0)

[XT γ(p0, q0) + UT

]

due to (28), (33), (34), and (35). Therefore,

|| 1T

T∑

t=1

x′t(p0, q0)γT (p0, q0)γT (p0, q0)′xt(p0, q0)||

≤ 1T

T∑

t=1

||x′t(p0, q0)γT (p0, q0)γT (p0, q0)′xt(p0, q0)||

=1T

T∑

t=1

γT (p0, q0)′xt(p0, q0)x′t(p0, q0)γT (p0, q0)

=1T

γT (p0, q0)′(

T∑

t=1

xt(p0, q0)x′t(p0, q0)

)γT (p0, q0)

=1T

γT (p0, q0)′(X ′

T (p0, q0)XT (p0, q0))γT (p0, q0),

25

and, using the above result on γT ,

=1T

[XT γ(p0, q0) + UT

]′XT (p0, q0)

[X ′

T (p0, q0)XT (p0, q0)]−1

× X ′T (p0, q0)

[XT γ(p0, q0) + UT

]

=1T

O(lnT ) a.s.,

where the last line follows from (30-32) of the first part of the proof; comparealso Huang & Guo (1990, pp. 1754).

D2,T : For D2,T , we have

‖ 1T

T∑

t=1

(x′t(sp, sq)γT (p, q))(ut + x′tγ(sp, sq))′ ‖

=‖ 1T

T∑

t=1

ΓT (p, q)φt(sp, sq)(ut + x′tγ(sp, sq))′ ‖

=‖ 1T

ΓT (p, q)(Φ′T ΦT )1/2(Φ′T ΦT )−1/2T∑

t=1

φt(sp, sq)(ut + x′tγ(sp, sq))′ ‖,

where ΦT := [φ0(sp, sq), . . . , φT−1(sp, sq)]′ is a T×(sp+sq)·K matrix. Then,similar to the approach in Huang & Guo (1990),

|| 1T

T∑

t=1

(x′t(sp, sq)γT (p, q))(ut + x′tγ(sp, sq))′||

≤ 1T||ΓT (p, q)(Φ′T ΦT )1/2|| ||(Φ′T ΦT )−1/2Φ′T (UT + XT )||,

where UT := [u1, . . . , uT ]′, XT := [x′1γ(sp, sq), . . . , x′T γ(sp, sq)]′. Now, te-dious but straightforward calculations lead to

1T||ΓT (p, q)(Φ′T ΦT )1/2|| ||(Φ′T ΦT )−1/2Φ′T (UT + XT )||

≤(

1T

ΓT (p, q)(Φ′T ΦT )Γ′T (p, q))1/2

(38)

×(

1T

(UT + XT γ)′(ΦT ⊗ IK)((Φ′T ΦT )−1 ⊗ IK)(Φ′T ⊗ IK)(UT + XT γ))1/2

= [O(1)]1/2

[O

(1T

ln T

)]1/2

= o(1) a.s.

following from the results on D1,T and again from (30-32) of the first partof the proof. Note in this respect that the results in (31) and (32) also hold

26

when using the regressor matrix ΦT ⊗ IK appearing in (38). This is dueto the fact that the relevant properties of linear projections and OLS donot depend on whether the restricted or unrestricted form of the regressormatrix is used.

D02,T : Similar to the arguments used for D2,T , we can write

|| 1T

T∑

t=1

(x′tγT (p0, q0))(ut + x′tγ(p0, q0)||

≤ 1T

T∑

t=1

||(x′tγT (p0, q0))(ut + x′tγ(p0, q0)′||

≤ 1T

T∑

t=1

[γ′T (p0, q0)xt(p0, q0)x′t(p0, q0)γT (p0, q0)]1/2

× [(ut + x′tγ(p0, q0))′(ut + x′tγ(p0, q0))]1/2

≤ 1T

[γT (p0, q0)′(X ′T (p0, q0)XT (p0, q0))γT (p0, q0)]1/2

× [(UT + X ′T γ(p0, q0))′(UT + X ′

T γ(p0, q0))]1/2

=[O

(1T

ln T

)]1/2

[O(1)]1/2 = o(1) a.s.,

using arguments identical to those used to evaluate D01,T and noting that

T−1(UT + X ′T γ(p0, q0))′(UT + X ′

T γ(p0, q0)) = T−1U ′T UT + o(1) = O(1) a.s.

due to the results of Poskitt & Lutkepohl (1995, Proof of Theorem 3.2) andapplying Poskitt (2003, Proposition 3.1). This completes the proof.

27

Tab

le1:

Com

pari

son

ofm

ean

squa

red

pred

icti

oner

rors

RW

VE

CM

VA

RM

AVA

RM

AY

P1

36

121

36

121

36

121

36

123M

0.04

20.

228

0.68

12.

100

0.79

10.

834

0.88

00.

924

0.74

00.

793

0.83

40.

888

0.73

80.

789

0.82

20.

878

6M0.

042

0.23

50.

676

2.01

50.

832

0.95

41.

001

1.00

40.

794

0.90

30.

944

0.96

60.

780

0.88

60.

926

0.95

43M

0.04

20.

228

0.68

12.

100

0.86

00.

912

0.87

30.

877

0.72

50.

746

0.77

40.

808

0.72

80.

736

0.75

10.

783

1Y0.

055

0.28

50.

756

2.06

70.

899

1.08

91.

058

1.05

10.

793

0.93

40.

978

0.99

10.

786

0.90

80.

949

0.96

43M

0.04

20.

228

0.68

12.

100

1.06

21.

065

0.96

10.

896

0.81

70.

886

0.89

90.

847

0.80

30.

908

0.91

20.

825

5Y0.

065

0.28

20.

591

1.12

00.

930

1.07

61.

042

1.07

80.

879

0.98

71.

030

1.06

30.

875

0.97

91.

016

1.03

23M

0.04

20.

228

0.68

12.

100

1.09

21.

070

0.98

80.

894

0.84

30.

955

0.96

20.

867

0.84

71.

003

1.00

10.

861

10Y

0.05

30.

206

0.42

00.

755

0.93

91.

069

1.05

31.

053

0.91

51.

015

1.04

41.

052

0.91

41.

006

1.02

71.

020

3M0.

042

0.22

80.

681

2.10

00.

779

0.80

00.

889

0.93

40.

795

0.82

10.

877

0.90

20.

794

0.82

90.

878

0.89

96M

0.04

20.

235

0.67

62.

015

0.76

90.

897

1.00

01.

009

0.81

80.

900

0.97

10.

972

0.82

30.

911

0.97

30.

969

1Y0.

055

0.28

50.

756

2.06

70.

846

1.02

71.

131

1.12

60.

884

0.99

71.

076

1.07

50.

887

1.00

41.

074

1.07

11Y

0.05

50.

285

0.75

62.

067

1.07

01.

347

1.30

61.

031

0.96

21.

155

1.11

20.

994

0.97

21.

175

1.14

11.

010

5Y0.

065

0.28

20.

591

1.12

00.

954

1.04

61.

020

0.96

10.

917

0.99

60.

959

0.97

40.

935

1.00

60.

977

0.99

110

Y0.

053

0.20

60.

420

0.75

50.

955

1.00

60.

959

0.92

60.

924

0.98

30.

952

0.97

30.

940

0.99

00.

968

0.99

33M

0.04

20.

228

0.68

12.

100

0.91

11.

088

1.17

31.

013

0.72

80.

783

0.82

90.

838

0.73

00.

802

0.85

60.

838

1Y0.

055

0.28

50.

756

2.06

70.

979

1.29

71.

372

1.16

70.

822

1.00

41.

042

0.99

90.

830

1.02

91.

070

0.99

610

Y0.

053

0.20

60.

420

0.75

50.

960

1.07

31.

093

1.08

50.

912

1.00

61.

025

1.04

80.

911

1.00

21.

013

1.01

83M

0.04

20.

228

0.68

12.

100

0.99

81.

136

1.08

90.

884

1.21

11.

154

1.00

70.

855

1.24

21.

181

1.02

10.

853

6M0.

042

0.23

50.

676

2.01

51.

027

1.18

61.

162

0.95

11.

193

1.16

11.

057

0.91

31.

222

1.18

51.

071

0.91

11Y

0.05

50.

285

0.75

62.

067

1.07

21.

217

1.19

41.

010

1.19

51.

164

1.07

90.

967

1.21

61.

183

1.09

10.

966

5Y0.

065

0.28

20.

591

1.12

01.

023

1.04

51.

019

0.98

81.

059

1.01

20.

959

0.96

01.

065

1.01

60.

961

0.95

710

Y0.

053

0.20

60.

420

0.75

51.

011

1.01

40.

982

0.96

01.

017

0.98

60.

948

0.93

91.

017

0.98

40.

943

0.93

3

Note

:T

he

table

report

sm

ean

square

dpre

dic

tion

erro

rsfo

rsy

stem

sw

ith

diff

eren

tm

atu

riti

esand

diff

eren

tm

odel

s.VA

RM

Are

fers

toth

eVA

RM

Am

odel

esti

mate

dvia

Posk

itt’

s(2

003)

met

hod

and

VA

RM

AY

Pre

fers

toth

esa

me

model

esti

mate

dvia

the

met

hod

of

Yap

&R

einse

l(1

995).

28

Tab

le2:

Com

pari

son

ofde

term

inan

tof

mea

nsq

uare

dpr

edic

tion

erro

rm

atri

ces

RW

VE

CM

VA

RM

AVA

RM

AY

P1

36

121

36

121

36

121

36

123M

,6M

0.00

00.

004

0.01

50.

079

0.80

50.

788

0.75

70.

559

0.74

60.

767

0.75

50.

549

0.75

50.

816

0.79

30.

560

3M,1

Y0.

001

0.01

30.

067

0.34

40.

857

0.86

00.

707

0.55

10.

688

0.73

40.

657

0.49

50.

705

0.77

40.

666

0.48

43M

,5Y

0.00

20.

041

0.23

81.

129

1.00

01.

107

0.95

80.

803

0.74

00.

895

0.90

50.

759

0.72

50.

929

0.92

60.

740

3M,1

0Y0.

002

0.03

70.

213

1.11

71.

032

1.17

21.

067

0.86

00.

794

1.02

61.

046

0.84

00.

792

1.07

31.

076

0.81

73M

,6M

,1Y

0.00

00.

000

0.00

00.

002

0.75

00.

737

0.80

10.

603

0.75

10.

686

0.74

50.

562

0.75

80.

694

0.74

80.

561

3M,1

Y,1

0Y0.

000

0.00

00.

002

0.01

11.

118

1.35

91.

171

0.81

60.

931

1.09

01.

030

0.84

70.

958

1.13

31.

065

0.85

71Y

,5Y

,10Y

0.00

00.

001

0.00

60.

055

0.93

91.

158

0.99

50.

490

0.74

90.

913

0.81

70.

461

0.77

61.

009

0.85

70.

437

3M-1

0Y0.

000

0.00

00.

000

0.00

00.

927

1.05

10.

943

0.44

91.

147

1.05

70.

928

0.49

11.

170

1.06

90.

923

0.47

0

Note

:T

he

table

report

sth

edet

erm

inant

of

the

mea

nsq

uare

dpre

dic

tion

erro

rm

atr

ices

for

syst

ems

wit

hdiff

eren

tm

atu

riti

esand

diff

eren

tm

odel

s.VA

RM

Are

fers

toth

eVA

RM

Am

odel

esti

mate

dvia

Posk

itt’

s(2

003)

met

hod

and

VA

RM

AY

Pre

fers

toth

esa

me

model

esti

mate

dvia

the

met

hod

ofY

ap

&R

einse

l(1

995).

29

Figure 1: US treasury bills and bonds yields. See text for definitions.

30

(a) Forecasting horizon: 1 month

(b) Forecasting horizon: 6 month

(c) Forecasting horizon: 12 month

Figure 2: Cumulative squared prediction errors of the VECM and VARMAmodel for different horizons.

31

forecasting us interest rates with cointegrated varma models

Documents