researchersresearchers-sbe.unimaas.nl/alainhecq/wp-content/uploads/... · 2016. 3. 8. · testing...

Testing for Granger Causality in Large

Mixed-Frequency VARs

Thomas B. Gotz∗1, Alain Hecq†2, and Stephan Smeekes‡2

1Deutsche Bundesbank2Maastricht University

March 3, 2016

Abstract

We analyze Granger causality testing in a mixed-frequency VAR, where the differencein sampling frequencies of the variables is large, implying parameter proliferation problemsin case we attempt to estimate the model unrestrictedly. We propose several tests basedon reduced rank restrictions, including bootstrap versions thereof to account for factor es-timation uncertainty and improve the finite sample properties of the tests, and a BayesianVAR extended to mixed frequencies. We compare these methods to a test based on anaggregated model, the max-test (Ghysels et al., 2016a) and an unrestricted VAR-based test(Ghysels et al., 2016b) using Monte Carlo simulations. An empirical application illustratesthe techniques.JEL Codes: C11, C12, C32JEL Keywords: Granger Causality, Mixed Frequency VAR, Bayesian VAR, Reduced RankModel, Bootstrap Test

∗Correspondence to: Thomas B. Gotz, Deutsche Bundesbank, Macroeconomic Analysis and Projection Divi-sion, Wilhelm-Epstein-Strasse 14, 60431 Frankfurt am Main. Email: [email protected], Tel: +49-69-9566-6991, Fax: +49-69-9566-6734. The views expressed in this paper are ours and do not necessarily reflectthe views of the Deutsche Bundesbank or its staff. Any errors or omissions are our responsibility.

†Maastricht University, School of Business and Economics, Department of Quantitative Economics, P.O. Box616, 6200 MD Maastricht, The Netherlands. Email: [email protected]

‡Maastricht University, School of Business and Economics, Department of Quantitative Economics, P.O. Box616, 6200 MD Maastricht, The Netherlands. Email: [email protected]

1

1 Introduction

Economic time series are published at various frequencies. While higher frequency variables

used to be aggregated (e.g., Silvestrini and Veredas, 2008), it has become more and more popular

to consider models that take into account the difference in frequencies of the processes under

consideration. As argued extensively in the mixed-frequency literature (among others, Ghysels

et al., 2007, Clements and Galvao, 2008, Ghysels et al., 2006, Marcellino and Schumacher, 2010),

working in a mixed-frequency setup instead of a common low-frequency one is advantageous due

to the potential loss of information in the latter scenario and feasibility of the former through

MI(xed) DA(ta) S(ampling) regressions (Ghysels et al., 2004), even in the presence of many

high-frequency variables compared to the number of observations.

Until recently, the MIDAS literature was limited to the single-equation framework, in which

one of the low-frequency variables is chosen as the dependent variable and the high-frequency

ones are in the regressors. Since the work of Ghysels (2015) for stationary series and the ex-

tension of Gotz et al. (2013) and Ghysels and Miller (2015) for the non-stationary and possibly

cointegrated case, we can analyze the link between high- and low-frequency series in a VAR

system treating all variables as endogenous. Note that the VAR vector in the model of Ghy-

sels (2015) is specified in terms of the low frequency by stacking the high-frequency variables

and aligning them with the low-frequency ones. Another approach of incorporating mixed

frequencies into VAR models is to write the entire model in high frequency by treating the

corresponding observations of the low-frequency series as latent. Noteworthy contributions to

this branch of the mixed-frequency VAR literature are, among others, Schorfheide and Song

(2015), Kuzin et al. (2011) or Eraker et al. (2015). We refer the reader to Foroni et al. (2013)

for an excellent survey on VAR models with mixed-frequency data.

An often investigated feature within VAR models is Granger causality, generally introduced

in Granger (1969). It is well known, though, that Granger causality is not invariant to temporal

2

aggregation (Granger, 1988, Granger and Lin, 1995 or Marcellino, 1999). While most work on

this issue focused on cases in which all series are aggregated, Ghysels et al. (2016b) investigate

Granger causality in a setting, where variables are available at varying frequencies. Starting

from an underlying high-frequency process, it is shown that mixed-frequency Granger causality

tests better recover causal patterns in the data generating process than traditional low-frequency

tests do. However, decent size and power properties of their mixed-frequency tests depend on

a relatively small difference in sampling frequencies of the variables involved. Indeed, if the

number of high-frequency observations within a low-frequency period is large, size distortions

and loss of power may be expected.

In this paper we take the model of Ghysels (2015) and analyze the finite sample behavior

of Granger non-causality tests in the spirit of Dufour and Renault (1998) when the number

of high-frequency observations per low-frequency period is large as, e.g., in a month/working

day-example. To avoid the proliferation of parameters we consider two parameter reduction

techniques: reduced rank regressions and a Bayesian mixed-frequency VAR. As far as the

latter is concerned, we show precisely how to extend the dummy observation approach of

Banbura et al. (2010), itself built on the work of Sims and Zha (1998), to the presence of mixed

frequencies.1 Importantly, due to stacking the high-frequency variables in the mixed-frequency

VAR (Ghysels, 2015), their approach cannot be applied directly such that a properly adapted

choice of auxiliary dummy variables is required, which marks a significant contribution to the

respective literature by itself. With respect to reduced rank regressions, the associated factors

are typically not observable and must be estimated, which obviously affects the distribution

of the Wald tests to detect directions of Granger causalities. Consequently, another important

contribution of this paper is the introduction of bootstrap versions of these tests (also for the

unrestricted VAR), which have correct size even for large VARs and a small sample size.

1Banbura et al. (2010) refer to these variables as dummy observations. To avoid confusion between high- andlow-frequency observations and auxiliary variables, we use the term ’auxiliary dummy variables’ henceforth.

3

Independently and simultaneously to this work, Ghysels et al. (2016a) introduced another

approach to testing for Granger causality in a mixed-frequency VAR characterized by a large

discrepancy in frequencies of the series involved. In particular, their method is based on a

sequence of parsimonious regression models where the low-frequency variable is regressed on

its own lags and only one specific lead or lag of the high-frequency series. The test statistic

itself is then computed as the maximum of these individual-regression-based estimators scaled

and weighted properly; hence the name “max-test”. The authors show that, in a mixed-

frequency VAR a la Ghysels (2015), this procedure leads to correct identification of the test

in the presence or absence of Granger causality, which implies consistency against causality of

all forms. Its simplicity and applicability to large dimensional systems makes the max-test the

main competitor for our tests.

Both reduced rank regressions and the Bayesian estimation of the mixed-frequency VAR,

are aiming at reducing the amount of parameters to be estimated before testing for Granger

causality. The former does so by imposing a certain set of restrictions leading to the computation

of a low-dimensional set of factors; the latter uses parameter shrinkage to a prior that is centered

at a restricted version of the VAR of Ghysels (2015). While these methods are very appealing

from an applied perspective, they are, much like the MIDAS polynomials, entirely ad hoc and

possibly misspecified. Indeed, if the restrictions imposed are not satisfied in the true data

generating process, i.e., unless we impose (over)identified restrictions, the resulting model is

not correctly specified. This, in turn, may result in potential inability of our methods to

detect causality, and in particular cause our methods to lose consistency against certain forms

of causality that cannot be detected under the misspecified restrictions. In contrast, Granger

causality tests based on the unrestricted VAR (Ghysels et al., 2016b) and the max-test (Ghysels

et al., 2016a) are not based on (possibly invalid) restrictions implying that they have asymptotic

power of one against all forms of causality. However, in situations where our tests are based

on a correctly specified model, our approaches may very well outperform the test based on

4

the unrestricted VAR or the max-test due to the way in which the parameter proliferation

problem in our large dimensional system is addressed. Moreover, even in situations in which

the restrictions are not valid, in finite samples our methods may still allow one to pick up

causality with high power due to the dimensionality reduction which may offset the effect of

misspecification in small samples.

The rest of the paper is organized as follows. In Section 2 notations are introduced, the

mixed-frequency VAR (MF-VAR hereafter) for our specific case at hand is presented and

Granger (non-)causality is defined. Section 3 discusses the approaches to reduce the num-

ber of parameters to be estimated, whereby reduced rank restrictions (Section 3.1) as well as

Bayesian MF-VARs (Section 3.2) are presented in detail. The finite sample performance of

testing for Granger causality within the aforementioned restricted models is analyzed via a

Monte Carlo experiment in Section 4. We compare our tests with the one based on the unre-

stricted VAR, the max-test, and an asymptotic Wald test based on a common low-frequency

VAR obtained by temporally aggregating the high-frequency variable (Breitung and Swanson,

2002). An empirical example with U.S. data on the monthly industrial production index and

daily volatility in Section 5 illustrates the results. Section 6 concludes.

2 Causality in a Mixed-frequency VAR

2.1 Notation

Let us start from a two variable mixed-frequency system, where yt, t = 1, . . . , T, is the low-

frequency variable and x(m)t−i/m are the high-frequency variables with m high-frequency observa-

tions per low-frequency period t. Throughout this paper we assume m to be rather large as in

a year/month- or month/working day-example. We also assume m to be constant rather than

varying with t.2 The value of i indicates the specific high-frequency observation under con-

2As long as m is deterministic, even time-varying frequency discrepancies do not pose a problem on a theo-retical level. However, the assumption of constant m simplifies the notation greatly (Ghysels, 2015).

5

sideration, ranging from the beginning of each t-period (x(m)t−(m−1)/m) until the end (x

(m)t with

i = 0). These notational conventions have become standard in the mixed-frequency literature

and are similar to the ones in Gotz et al. (2014), Clements and Galvao (2008, 2009) or Miller

(2014).

Furthermore, let Wt = (W ′t−1, W ′t−2, . . ., W ′t−p)′ denote the last p low-frequency lags of

any process W stacked. Finally, 0i×j (1i×j) denotes an (i × j)-matrix of zeros (ones), Ii is an

identity matrix of dimension i, ⊗ represents the Kronecker product and vec corresponds to the

operator stacking the columns of a matrix.

Remark 1 Extensions towards representations of higher dimensional multivariate systems as

in Ghysels et al. (2016b) can be considered, but are left for further research here. Analyzing

Granger causality among more than two variables inherently leads to multi-horizon causality

(see Lutkepohl, 1993 among others). The latter implies the potential presence of a causal chain:

for example, in a trivariate system, X may cause Y through an auxiliary variable Z. To abstract

from that scenario, Ghysels et al. (2016b) often consider cases in which high- and low-frequency

variables are grouped and causality patterns between these groups, viewed as a bivariate system,

are analyzed. They study the presence of a causal chain and multi-horizon causality in a Monte

Carlo analysis though.

2.2 MF-VARs

Considering each high-frequency variable such that

X(m)t = (x

(m)t , x

(m)t−1/m, . . . , x

(m)t−(m−1)/m)′, (1)

a dynamic structural equations model for the stationary multivariate process Zt = (yt, X(m)′t )′

is given by

AcZt = c+A1Zt−1 + . . .+ApZt−p + εt. (2)

6

Note that the parameters in Ac are related to the ones in A1 due to stacking the high-frequency

observations X(m)t in Zt (Ghysels, 2015).3 Explicitly for a lag length of p = 1, the model reads

as:

1 β1 β2 . . . βm

δ1 1 −ρ1 . . . −ρm−1

δ2 0 1 . . . −ρm−2

......

.... . .

...

δm 0 0 . . . 1

yt

x(m)t

x(m)t−1/m

...

x(m)t−(m−1)/m

=

c1

c2

...

cm+1

+

ρy θ1 θ2 . . . θm

ψ2 ρm . . . . . . 0

ψ3 ρm−1 ρm . . . 0

......

.... . .

...

ψm+1 ρ1 ρ2 . . . ρm

yt−1

x(m)t−1

x(m)t−1−1/m

...

x(m)t−1−(m−1)/m

+

ε1t

ε2t

...

ε(m+1)t

.

(3)

In general, we assume the high-frequency process to follow an AR(q) process with q < m

(we set q = m in the equation above; if q < m, one should set ρq+1 = . . . = ρm = 0). For

non-zero values of βj or δj the matrix Ac links contemporaneous values of y and x, a feature

referred to as nowcasting causality (Gotz and Hecq, 2014).4

Pre-multiplying (3) by A−1c we get to the mixed-frequency reduced-form VAR(p) model:

Zt = µ+ Γ1Zt−1 + . . .+ ΓpZt−p + ut

= µ+B′Zt + ut.(4)

3Compared to Ghysels (2015) we simply reverse the mixed-frequency vector Zt and put the low-frequencyvariable first.

4βj 6= 0 implies that yt is affected by incoming observations of X(m)t , whereas δj 6= 0 implies that the

high-frequency observations are influenced by yt (see Gotz and Hecq, 2014). The latter becomes interestingfor studying policy analysis, where the high-frequency policy variable(s) may react to current low-frequencyconditions (see Ghysels, 2015 for details). Note that Gotz et al. (2015) consider a noncausal MF-VAR in whichone allows for non-zero elements in the lead coefficients of the Ac matrix.

7

Consequently, µ = A−1c c, Γi = A−1

c Ai and ut = A−1c εt. Let B = (Γ1,Γ2, . . . ,Γp)

′ and B(z) =

Im+1 −∑p

j=1 Γjzj . We then make the following assumptions on the MF-VAR:

Assumption 2 Zt is generated by the MF-VAR(p) in (4), for which it holds that (i) the roots

of the matrix polynomial B(z) all lie outside the unit circle; (ii) ut is independent and identically

distributed (i.i.d.) with E(ut) = 0, E(utu′t) = Σu, with Σu positive definite, and E(||ut||4) <∞,

where || · || is the Frobenius norm.

Assumption 2(i) ensures that the MF-VAR is I(0), while (ii) is a standard assumption to

ensure validity of the bootstrap for VAR models, see, e.g., Paparoditis (1996), Kilian (1998) or

Cavaliere et al. (2012).

Remark 3 The i.i.d. assumption on ut is not strictly necessary to derive the limit distributions

of the test statistics and could be weakened to a martingale difference sequence (mds) assumption

(see, e.g., Ghysels et al., 2016b). However, it allows us to directly use the aforementioned

literature on the validity of our bootstrap method that uses i.i.d. resampling of the VAR residuals,

as considered in the simulation study in Section 4. Asymptotic validity of the bootstrap can also

be established for heteroskedastic error terms, provided one adapts the bootstrap method to a

wild bootstrap as we do in the empirical application in Section 5. Asymptotic validity of the

wild bootstrap for univariate autoregressions is established in Goncalves and Kilian (2004) and

extended to VAR models by Bruggemann et al. (2016). Section 3.1.4 provides further details

on our bootstrap methods and their validity.

Remark 4 Assumption 2 implies that the data are truly generated at mixed frequencies. Hence,

similar to Ghysels et al. (2016a) but unlike Ghysels et al. (2016b), we do not start with a com-

mon high-frequency data generating process (DGP hereafter).5 The latter would imply causality

5Given such a situation it would seem natural to cast the model in state space form and estimate the param-eters using the Kalman filter. However, this amounts to a ’parameter-driven’ model (Cox et al., 1981), whichcontains latent processes, i.e., the high-frequency observations of y. The latter is a feature we try to avoid in

8

patterns to arise at the high frequency. Consequently, we do not investigate which causal re-

lationships at high frequency get preserved when moving to the mixed- or low-frequency case.

Indeed, an extension of our methods along these lines would demand a careful analysis of the

mixed- and low-frequency systems corresponding to their latent high-frequency counterpart (see

Ghysels et al., 2016b for the unrestricted VAR case).

Equation (4) is easy to estimate for small m, yet becomes a rather large system as the latter

grows. For example, in a MF-VAR(1),

yt

x(m)t

x(m)t−1/m

...

x(m)t−(m−1)/m

=

µ1

µ2

...

µm+1

+

γ1,1 γ1,2 · · · γ1,m+1

γ2,1 γ2,3 · · · γ2,m+1

......

. . ....

γm+1,1 γm+1,2 · · · γm+1,m+1

︸︷︷︸

Γ1

×

yt−1

x(m)t−1

x(m)t−1−1/m

...

x(m)t−1−(m−1)/m

+

u1t

u2t

...

u(m+1)t

,

(5)

uti.i.d.∼ (0((m+1)×1),Σu), Σu =

σ1,1 σ1,2 . . . σ1,m+1

σ2,1 σ2,2 . . ....

......

. . ....

σm+1,1 . . . . . . σm+1,m+1

, (6)

our MF-VAR: say we are interested in the impact of shocks to one or several variables on the whole system.Using a high-frequency DGP with missing observations implies that shocks to these latent processes are alsolatent and unobservable. This is undesirable given that, e.g., policy shocks are, of course, observable (Foroni andMarcellino, 2014).

9

there are (m + 1)2 parameters to estimate in the matrix Γ1. Additional lags would further

complicate the issue.

2.3 Granger Causality in MF-VARs

Let Ωt represent the information set available at moment t and let ΩWt denote the corresponding

set excluding information about the stochastic process W . With P [X(m)t+h|Ω] being the best linear

forecast of X(m)t+h based on Ω, Granger non-causality is defined as follows (Dufour and Renault,

1998):

Definition 5 y does not Granger cause X(m) if

P [X(m)t+1 |Ω

yt ] = P [X

(m)t+1 |Ωt]. (7)

Similarly, X(m) does not Granger cause y if

P [yt+1|ΩX(m)

t ] = P [yt+1|Ωt]. (8)

In other words, y does not Granger cause X(m) if past information of the low-frequency

variable do not help in predicting current (or future) values of the high-frequency variables and

vice versa. In terms of (5), testing for Granger non-causality implies the following null (and

alternative) hypotheses:

• y does not Granger cause X(m)

H0 : γ2,1 = γ3,1 = . . . = γm+1,1 = 0,

HA : γi,1 6= 0 for at least one i = 2, . . . ,m+ 1.(9)

10

• X(m) does not Granger cause y

H0 : γ1,2 = γ1,3 = . . . = γ1,m+1 = 0,

HA : γ1,i 6= 0 for at least one i = 2, . . . ,m+ 1.(10)

3 Parameter Reduction

This section presents techniques that we have considered, and evaluated through a Monte Carlo

exercise, with the aim to reduce the amount of parameters to be estimated in the MF-VAR

model. Two approaches are discussed in detail, reduced rank restrictions and a Bayesian MF-

VAR.

There are many alternative approaches to reduce the number of parameters among which are

principal components, Lasso (Tibshirani, 1996) or ridge regressions (Hoerl and Kennard, 1970,

among others). However, using principal components does not necessarily preserve the dynamics

of the VAR under the null: nothing prevents, e.g., the first and only principal component to be

loading exclusively on y implying that the remaining dynamics enter the error term. In other

words, the autoregressive matrices in (4) may and will most likely not be preserved, which

naturally affects the block of parameters we test on for Granger non-causality. As for Lasso

and ridge regressions, it is well known that they may be interpreted in a Bayesian context.

In particular, the latter is equivalent to imposing a normally distributed prior with mean zero

on the parameter vector (Vogel, 2002, among others), while the former may be replicated

using a zero-mean Laplace prior distribution (Park and Casella, 2008). Given that our set of

models contains a Bayesian VAR for mixed-frequency data, we abstract from its connection to

regularized versions of least squares at this stage.

11

3.1 Reduced Rank Restrictions

3.1.1 Reduced Rank Regression Model

In order to reduce the number of parameters to estimate in the MF-VAR, we propose the

following reduce rank regression model, for which we make the following assumption:

Assumption 6 Let BX(m) be the matrix obtained from B′ in (4) by excluding the first columns

of Γ1, Γ2, . . ., Γp. The rank of this matrix is smaller than the number of high-frequency

observations within Zt, i.e.,

rk(B′X(m)) = r < m.

The model then reads as follows:

Zt = µ+ γ·,1yt + α∑p

i=1 δ′iX

(m)t−i + νt

= µ+ γ·,1yt + αδ′X(m)t + νt,

(11)

where γ·,1 is the (m + 1) × p matrix containing the first columns of Γ1,Γ2, . . . ,Γp, and α and

δ = (δ′1, . . . , δ′p)′ are (m+ 1)× r and pm× r matrices, respectively. Note that (11) can also be

written in terms of Zt. Let us define Γi ≡ (γ(i)·,1 , αδ

′i), where γ

(i)·,1 , i = 1, . . . , p, corresponds to

the ith column of γ·,1. Then, (11) is equivalent to

Zt = µ+B′Zt + νt, (12)

12

where B = (Γ1, . . . ,Γp)′. For p = 1 the model becomes

yt

x(m)

x(m)t−1/m

...

x(m)t−(m−1)/m

= µ+

γ1,1

γ2,1

...

γm+1,1

yt−1 +

α1

α2

...

αm+1

δ′1

x(m)t−1

x(m)t−1−1/m

...

x(m)t−1−(m−1)/m

+ νt

= µ+

γ1,1

γ2,1

...

γm+1,1

,

α1

α2

...

αm+1

δ′1

︸︷︷︸

Γ1

×

yt−1

x(m)t−1

x(m)t−1−1/m

...

x(m)t−1−(m−1)/m

+ νt,

(13)

where each αi, i = 1, . . . ,m + 1, is of dimension 1 × r and where δ1 is an m × r matrix.

Hence, we could call δ′X(m)t a vector of r high-frequency factors. Note that r = m − s, where

s represents the rank reduction we are able to achieve within X(m)t . In terms of parameter

reduction, the unrestricted VAR in (4) requires p(m + 1)2 coefficients to be estimated in the

autoregressive matrices, whereas the VAR under reduced rank restrictions in (11) or (12) needs

p(m + 1) + r(m + 1) + prm parameter estimates. As an example, assume p = 1 and m = 20.

Then, if r = 1, 2 or 3, there are, respectively, 62, 103 or 144 coefficients to be estimated in

Γ1 instead of 441 in Γ1. While being reduced by a large amount, the number of parameters

to be estimated remains considerable implying that the tests may still incur small sample size

distortions. This is one of the reasons why we also consider bootstrap versions of these tests.

Note that we do not require yt−1 to be included in the same transmission mechanism as the x

variables.

Remark 7 There are several ways to justify the reduced rank feature of the autoregressive

13

matrix BX(m). First, at the model representation level we may assume that, in the structural

model (3), x follows an AR(q) process with q < m and that the last m − q elements of each

X(m)t−i , i = 1, . . . , p, have a zero coefficient in the equation for yt. Plugging these restrictions

into (4) results in a reduced rank of B′X(m) because the matrices Γi = A−1

c Ai have the rank of

Ai. Second, at the empirical level one can interpret the MF-VAR as an approximation of the

VARMA obtained after block marginalization of a high-frequency VAR for each variable. In

this situation, reduced rank matrices may empirically not be rejected by the data because of the

combinations of many elements. Thus, a small number of dynamic factors can approximate

more complicated (possibly nonlinear) dynamics. Finally, the way one typically restricts the

MF-VAR in (4), i.e., assuming the high-frequency series to follow an ARX-process (Ghysels,

2015), actually implies a reduced rank representation. Looking slightly ahead, consider Equation

(25) or the matrix in (29), which we will use in our Monte Carlo section, for the special case

p = 1. It is obvious that for the first order MF-VAR, Zt = Γ1Zt−1 + ut, with

Γ1 =

γ1,1 γ1,2 γ1,3 . . . γ1,21

γ∗2,1 ρ20 0 . . . 0

γ∗3,1 ρ19... . . . 0

......

.... . .

...

γ∗21,1 ρ 0 . . . 0

we have, due to the block of zeros, a reduced rank matrix BX(m) with two (restricted coefficient)

14

factors:

rk

γ1,2 γ1,3 . . . γ1,21

ρ20 0 . . . 0

ρ19... . . . 0

......

. . ....

ρ 0 . . . 0

= rk

1 0

0 ρ20

0 ρ19

......

0 ρ

AA−1

γ1,2 γ1,3 . . . γ1,21

1 0 . . . 0

= 2,

where A is a 2 × 2 full rank rotation matrix, the identity in this particular example. Hence,

under HA, e.g., two factors (although unrestricted in our setting) will capture the reduced rank

feature. Note that we could impose (overidentifying) restrictions to retain the large block of

zeros in Γ1. In that sense, the model with r = 2 would not be misspecified but “suboptimal”.

On the other hand, with merely one factor we would not be able to impose the block of zeros

implying the model to be misspecified. Under the null hypothesis the model is not misspecified

with either one or two factors, because the test tries to capture some dynamics that is not there.

Note that our focus is to compare the performance of different approaches including their

bootstrap versions (whenever applicable) and that in this light we view reduced rank regressions

as just one particular specification. In particular, next to the Bayesian MF-VAR and the max-

test, our bootstrap implementation has also made the unrestricted MF-VAR (which is not

subject to rank misspecification) a viable option to compare the reduced rank regressions with.

3.1.2 Testing for Granger Non-Causality

Given r and Assumption 6, under the condition that the high-frequency factors δ′X(m)t are

observable, we can estimate (11) by OLS. Letting Γ = (µ, γ·,1, α)′ denote the corresponding

OLS estimates, we can test for Granger non-causality by defining R as the matrix that picks

the set of coefficients we want to do inference on, i.e., Rvec(Γ). For a general construction of

15

the matrix R in the presence of several low- and high-frequency variables,6 we refer the reader

to Ghysels et al. (2016b). The Wald test is constructed as

ξW =[Rvec(Γ)

]′(RΩR′)−1

[Rvec(Γ)

], (14)

with

Ω = (W ′W )−1 ⊗ Σν , (15)

where Σν = 1T

∑Tt=p+1 νtν

′t, νt are the OLS residuals of (11), and W = (W1, . . . ,WT )′ is the

regressor set consisting of Wt = (1, y′t, δ′X

(m)′t )′.7 As illustrated in Ghysels et al. (2016b), ξW is

asymptotically χ2rank(R). A robust version of (14) is also implemented in the empirical section.

Of course, the factors are typically not observable and must be estimated (unless we im-

pose specific factors). Using estimated rather than observed factors in the regression (11) will

obviously affect the distribution of ξW , certainly in small samples. For this reason we consider

bootstrap versions of the tests. Before going into the details on the bootstrap we describe how

we estimate or impose the factors.

3.1.3 Estimating the Factors

We use three ways to estimate the factors, canonical correlation analysis (CCA hereafter),

partial least squares (PLS hereafter) and heterogeneous autoregressive (HAR hereafter) type

restrictions. We briefly present the algorithms that are used to extract those factors.

CCA is based on analyzing the eigenvalues and corresponding eigenvectors of

Σ−1

X(m)

X(m)ΣX

(m)Z

Σ−1ZZ

ΣZX

(m) (16)

6Within the unrestricted MF-VAR in (4), though.7A sample size correction, i.e., using Σν = 1

T−KW

∑Tt=p+1 νtν

′t, where KW is the amount of elements in W ,

may alleviate size distortions in finite samples of tests using asymptotic critical values.

16

or, similarly, of the symmetric matrix

Σ−1/2

X(m)

X(m)ΣX

(m)Z

Σ−1ZZ

ΣZX

(m)Σ−1/2

X(m)

X(m) . (17)

For a detailed discussion we refer the reader to Anderson (1951) or, for the application to com-

mon dynamics, to Vahid and Engle (1993). Note that Σij represents the empirical covariance

matrix of processes i and j. Furthermore, Z and X(m)

indicate Zt and X(m)t , respectively, to

be concentrated out by the variables that do not enter in the reduced rank regression, i.e., the

intercept and yt−1. Denoting by V = (v1, v2, . . . , vr), with v′ivj = 1 for i = j and 0 otherwise,

the eigenvectors corresponding to the r largest eigenvalues of the matrix in (17), we obtain the

estimators of α and δ as:

α = Σ−1ZZ

ΣZX

(m)Σ−1/2

X(m)

X(m) V

δ = Σ−1/2

X(m)

X(m) V .

(18)

Note that the estimation of the eigenvectors obtained from the canonical correlation analysis

in (16) or (17) may, however, perform poorly with high-dimensional systems, because inversions

of the large variance matrices Σ−1ZZ

and Σ−1

X(m)

X(m) are required. As an alternative to CCA we

use a PLS algorithm similar to the one used in Cubadda and Hecq (2011) or Cubadda and

Guardabascio (2012). In order to make the solution of this eigenvalue problem invariant to

scale changes of individual elements, we compute the eigenvectors associated with the largest

eigenvalues of the matrix

D−1/2

X(m)

X(m)ΣX

(m)ZD−1ZZ

ΣZX

(m)D−1/2

X(m)

X(m)

with DX

(m)X

(m) and DZZ being diagonal matrices having the diagonal elements of, respectively,

ΣX

(m)X

(m) and ΣZZ as their entries. The computation of α and δ works in a similar fashion as

with CCA-based factors.

17

Finally, we may impose the presence of r = 3p factors,8 inspired by the HAR-model (Corsi,

2009). For i = 1, . . . , p:

δi =

03(i−1)×m

1 01×(m−1)

11×( 14m) 0(1× 3

4m)

11×m

03(p−i)×m

′

⇒ δ′X(m)t =

x(m)t−1∑ 1

4m−1

i=0 x(m)t−1−i/m∑m−1

i=0 x(m)t−1−i/m...

x(m)t−p∑ 1

4m−1

i=0 x(m)t−p−i/m∑m−1

i=0 x(m)t−p−i/m

. (19)

For p = 1 and m = 20 this corresponds to

δ′X(m)t =

x

(20)t−1∑4

i=0 x(20)t−1−i/20∑19

i=0 x(20)t−1−i/20

≡

xDt−1

xWt−1

xMt−1

, (20)

with xDt , xWt and xMt denoting daily, weekly and monthly measures, respectively.

Remark 8 As noted in Ghysels and Valkanov (2012) and Ghysels et al. (2007), these HAR-

type restrictions are a special case of MIDAS with step functions introduced in Forsberg and

Ghysels (2007). Considering partial sums of regressors x as Xt(K,m) =∑K

i=0 x(m)t−i/m, a MIDAS

regression with M steps reads as yt = µ +∑M

j=1 βjXt(Kj ,m) + εt, where K1 < . . . < KM .

Alternatively, we could use MIDAS restrictions as introduced in Ghysels et al. (2004), for the

difference between the latter and HAR-type restrictions has been shown to be small (Ghysels

and Valkanov, 2012). However, step functions have the advantage of not requiring non-linear

estimation methods, since the distributed lag pattern is approximated by a number of discrete

8To ensure that r < m we assume that p < 13m at this stage.

18

steps, which simplifies the analysis. Furthermore, and more crucially in the context of this paper,

implementing MIDAS restrictions and testing for Granger non-causality implies the well-known

Davies (1987) problem, i.e., the parameters determining the MIDAS weights (see Ghysels et al.,

2004 for details) are not identified under the null hypothesis.9

3.1.4 Bootstrap Tests

Despite the dimensionality reduction achieved by imposing the factor structure, a considerable

number of parameters remains to be estimated for conducting the Wald tests such that the tests

may still be subject to considerable small sample size distortions. Moreover, the estimation of

the factors will have a major effect on the distribution of the Wald test statistic. We therefore

also consider a bootstrap implementation of these tests to improve the properties of the tests

in finite samples.

Let B = Γδ′ denote the estimates of B obtained by one of the reduced rank methods, and

assume that δ is normalized such that its upper r × r block is equal to the identity matrix.10

The first bootstrap method we consider is a standard “unrestricted” bootstrap, that may have

superior power properties in certain cases (cf. Paparoditis and Politis, 2005). Its algorithm

looks as follows.

1. Obtain the residuals from the MF-VAR(p)

νt = Zt − µ− B′Zt, t = p+ 1, . . . , T. (21)

9To properly test for Granger non-causality in this case, a grid for the weight specifying parameter vectorhas to be considered and the corresponding Wald tests for each candidate have to be computed. Subsequently,one can calculate the supremum of these tests (Davies, 1987) and obtain an ’asymptotic p-value’ using bootstraptechniques (see Hansen, 1996 or Ghysels et al., 2007 for details). While this approach is feasible, it is computa-tionally more demanding. Admittedly, HAR-type restrictions provide less flexibility than the MIDAS approach,yet their simplicity makes them very appealing from an applied perspective.

10This ensures that the rotation of the factors is the same for the original sample and the bootstrap sample,which in turn makes sure the correct bootstrap null hypothesis is imposed for the first bootstrap method. In thesimulations we experimented with different normalizations which did not change the results.

19

2. Draw the bootstrap errors ν∗1 , . . . , ν∗T with replacement from νy,p+1, . . . , νy,T .

3. Letting Z∗t = (Z∗′t−1, . . . , Z∗′t−p)

′, construct the bootstrap sample Z∗t recursively as

Z∗t = µ+ B′Z∗t + ν∗t , t = 1, . . . , T, Z∗0 = 0. (22)

Note that the null hypothesis in the bootstrap is not imposed which has consequences for

the next step.

4. Estimate the bootstrap equivalent of (11), where the factors are estimated using the same

method as for the original sample, and obtain the bootstrap Wald test statistic ξ∗W . As the

“true” bootstrap parameters governing Granger causality are different from zero, we need

to adapt the bootstrap Wald test such that it tests the correct null hypothesis. Observing

that those parameters form a subset of the estimates in Γ used in (22), the appropriate

Wald test statistic is

ξ∗W =[Rvec(Γ

∗)− Rvec(Γ)

]′(RΩ∗R′)−1

[Rvec(Γ

∗)− Rvec(Γ)

], (23)

where all quantities with a superscript ‘*’ are calculated analogously to their sample

counterparts but then using the bootstrap sample.

5. Repeat steps 2 to 4 B times, and calculate the bootstrap p-value as the proportion of

bootstrap samples for which ξ∗W > ξW .

While this bootstrap method has the advantage that it can be used for testing Granger

causality in both directions (as only the matrix R needs to change), it also has some drawbacks.

In particular, even with the dimensionality reduction provided by the reduced rank estimation,

it still relies on the generation of a bootstrap sample based on an (m + 1)-dimensional MF-

VAR(p) that requires p(m+ 1) + r(m+ 1) + prm parameters to estimate. This may make the

20

bootstrap unstable and prone to generate outlying samples, with potential size distortions or a

loss of power as a result.

Therefore we consider a second bootstrap method that imposes the null hypothesis of no

Granger causality and in doing so achieves a further significant reduction of dimensionality. This

method is very similar to the parametric wild bootstrap proposed in Ghysels et al. (2016a). A

downside of this method is that it requires separate bootstrap algorithms for testing in the two

opposite directions. We describe the procedure here for the null hypothesis that X(m) does not

Granger cause y, and comment on the test in the opposite direction below.

1. Estimate an AR(p) model for y and obtain the residuals

uy,t = yt − µ−p∑j=1

ρy,jyt−j , t = p+ 1, . . . , T.

2. Draw the bootstrap errors u∗1, . . . , u∗T with replacement from uy,p+1, . . . , uy,T .

3. Construct the bootstrap sample y∗t recursively as

y∗t =

p∑j=1

ρy,jy∗t−1 + u∗t , t = 1, . . . , T, y∗0 = 0,

thus imposing the null of no Granger causality from X(m) to y.

4. Letting X(m)∗t = X

(m)t , use the bootstrap sample to estimate the MF-VAR using the same

estimation method as for the original estimator, and obtain the corresponding Wald test

statistic ξ∗W . As the null hypothesis of no Granger causality has been imposed in the

bootstrap, ξ∗W now has the same form as ξW in (14).

Note that only p parameters need to be estimated in order to generate the bootstrap sam-

ple, most likely making the procedure more stable than the unrestricted bootstrap above. A

similar bootstrap procedure can be implemented for the reverse direction of causality by only

21

resampling X(m)t while keeping yt fixed. As one can resample X

(m)t independently of yt, it can

be done in the original high-frequency as a univariate process. Said differently, it only requires

fitting an AR(q) process to x(m)t , again achieving a large reduction of dimensionality.

Given the simplicity and apparent advantages of the second bootstrap method, one may

wonder why one should consider the first, unrestricted, bootstrap method at all. However,

in our simulation study we find that the unrestricted method has better size properties than

the restricted method for testing causality from y to X(m), in particular in the presence of

nowcasting causality.11

It is also possible in step 2 of both algorithms to use the wild bootstrap to be robust against

heteroskedasticity. In this case the bootstrap errors are generated as u∗t = ξ∗t ut, where we

generate ξ∗1 , . . . , ξ∗T as independent standard normal random variables. This would constitute

the so-called “recursive wild bootstrap” scheme that Goncalves and Kilian (2004) for univariate

autoregressions and Bruggemann et al. (2016) for VAR models prove to be asymptotically valid

under conditionally heteroskedastic errors. In particular, employing the wild bootstrap then

allows us to replace the i.i.d. assumption in Assumption 2 with a martingale difference sequence

assumption. We implement the wild bootstrap version in the empirical application in Section

5.

Remark 9 Kilian (1998) and Paparoditis (1996) prove the asymptotic validity of the unre-

stricted VAR bootstrap under Assumption 2. The validity of the restricted bootstrap can be

derived from the results for AR(p) processes in Bose (1988), and the observation that condi-

tioning is a valid approach in the absence of Granger causality. Van Giersbergen and Kiviet

(1996) provide a detailed examination of the two different types of bootstraps in the context of

autoregressive distributed lag models and demonstrate that the restricted bootstrap even works

well in the absence of strong exogeneity. Dufour et al. (2006) apply the bootstrap to causality

11Simulation results explicitly comparing the two bootstrap methods are available upon request.

22

testing in VAR models and relax the assumption of i.i.d. error terms.12

Remark 10 Goncalves and Perron (2014) consider factor-augmented regression models, with

factors estimated by principal components. In this setting the asymptotic impact of the factor

estimation depends on which asymptotic framework is assumed. Under the asymptotic frame-

work, in which factor estimation has a non-negligible effect on the limit distribution of the

regression estimators, the bootstrap correctly mimics this effect and is therefore asymptotically

valid. Consequently, the bootstrap provides a much more accurate approximation to the finite

sample distribution of the regression estimators than the asymptotic approximation, that as-

sumes a negligible effect of factor estimation. We expect the bootstrap to have similar properties

in our setting where the factors are estimated using CCA or PLS.

3.2 Bayesian MF-VARs

3.2.1 Restricted MF-VARs

Ghysels (2015) discusses the issue of parsimony in MF-VAR models by specifying the high-

frequency process in such a way as to allow a number of parameters that is independent from

m. Indeed, while it seems reasonable to leave the equation for yt unrestricted,13 it is less clear

for the remaining ones. Hence, let us make the following assumption (see Ghysels, 2015 or the

numerical examples in Ghysels et al., 2016a).

Assumption 11 The high-frequency process x(m)t follows an AR(1) model with one lag of the

12If the test statistic (14) is asymptotically pivotal, we may expect the bootstrap to provide asymptoticrefinements as well and, hence, reduce small sample size distortions (see Bose, 1988 or Horowitz, 2001).

13In fact, viewed as single equation, it boils down to an unrestricted MIDAS model (Foroni et al., 2015) withoutcontemporaneous observations of the high-frequency variable. MIDAS restrictions (Ghysels et al., 2004) can beimposed here as an alternative. However, doing so implies leaving the linear framework, which is needed to applythe auxiliary dummy variable approach presented below. Furthermore, even after having drawn the MIDAShyperparameters, one still faces the aforementioned Davies (1987) problem when attempting to test for Grangernon-causality from X(m) to y.

23

low-frequency variable in the regressor set

x(m)t−i/m = µi+2 + ρx

(m)t−(i+1)/m + πyt−1 + υ(i+2)t (24)

for i = 0, . . . ,m− 1.

Completing the system with the equation for yt leads to the following restricted MF-VAR:

yt

x(m)t

x(m)t−1/m

...

x(m)t−(m−1)/m

=

µ1∑m−1i=0 ρiµ2+i∑m−2i=0 ρiµ3+i

...

µm+1

+

γ(1)1,1 γ

(1)1,2 γ

(1)1,3 . . . γ

(1)1,m+1

π∑m−1

i=0 ρi ρm

0m×(m−1)

π∑m−2

i=0 ρi ρm−1

......

π ρ

×

yt−1

x(m)t−1

x(m)t−1−1/m

...

x(m)t−1−(m−1)/m

+∑p

k=2

γ(k)1,1 γ

(k)1,2 . . . γ

(k)1,m+1

0m×(m+1)

×

yt−k

x(m)t−k...

x(m)t−k−(m−1)/m

+

υ1t∑m−1i=0 ρiυ(m+1−i)t∑m−2i=0 ρiυ(m+1−i)t

...

υ(m+1)t

︸︷︷︸

υ∗t

,

(25)

where γ(k)i,j corresponds to the (i, j)-element of matrix Γk in (4). As for the error terms υit, i =

1, . . . ,m+1, we set E(υitυit) = σHH , E(υ1tυit) = σHL for i = 2, . . . ,m+1, and E(υ1tυ1t) = σLL.

Furthermore, each error term is assumed to possess a zero mean and to be normally distributed.

24

Consequently, υ∗t ∼ N(0((m+1)×1),Συ∗), whereby we refer the reader to Ghysels (2015) for the

exact composition of Συ∗ .

3.2.2 The Auxiliary Dummy Variable Approach for MF Data

As pointed out by Carriero et al. (2011), Bayesian methods allow the imposition of restrictions

such as the ones in Assumption 11, while also admitting influence of the data. Consequently,

Bayesian shrinkage has become a standard tool when being faced with large-dimensional esti-

mation problems such as large VARs (e.g., Banbura et al., 2010, Kadiyala and Karlsson, 1997).

As for MF-VARs, Ghysels (2015) describes a way to sample the MIDAS hyperparameters and

subsequently formulates prior beliefs for the remaining parameters.14 Once these hyperparam-

eters are taken care of, the Bayesian analyses of mixed- and common-frequency VAR models

are quite similar and, hence, traditional Bayesian VAR techniques (e.g., Kadiyala and Karlsson,

1997, Litterman, 1986) can be applied.

We follow the approach of Banbura et al. (2010), which in turn is built on the work of

Sims and Zha (1998), showing that adding a set of auxiliary dummy variables to the system

is equivalent to imposing a normal inverted Wishart prior. The specification of prior beliefs is

derived from the Minnesota prior in Litterman (1986), whereby we center the prior distributions

14McCracken et al. (2015) use a Sims-Zha shrinkage prior and the algorithm in Waggoner and Zha (2003) tosolve the parameter proliferation problem with Bayesian estimation techniques. Bayesian methods within mixed-frequency VARs are also considered by Schorfheide and Song (2015). However, they use a latent high-frequencyVAR instead of the mixed-frequency system a la Ghysels (2015).

25

of the coefficients in B around the restricted MF-VAR in (25):

E[γ(k)i,j ] =

ρm if i = j = k = 1

ρm+j−i if k = 1, j = 2, i > 1

0 else

,

V ar[γ(k)i,j ] =

φλ

2

k2SLH for i = 1, j > 1

φλ2

k2SHL for j = 1, i > 1

λ2

k2else

,

(26)

where all γ(k)i,j are assumed to be a priori independent and normally distributed. The covariance

matrix of the residuals is for now assumed diagonal and fixed, i.e., Σu = Σ = Σd with Σd =

diag(σ2L, σ

2H , . . . , σ

2H) of dimension m + 1. The tightness of the prior distributions around the

AR(1) specification in (24) is determined by λ,15 the influence of low- on high-frequency data

and vice versa is controlled by φ and, finally, Sij =σ2i

σ2j, i, j = L,H, governs the difference in

scaling between y and the x-variables. For µ we take a diffuse prior.

Remark 12 As for the common-frequency case, the expressions for V ar[γ(k)i,j ] in (26) imply

that more recent (low-frequency) lags provide more reliable information than more distant ones.

However, due to the stacked nature of X(m)t , the coefficients could be shrunk according to their

high-frequency time difference instead. This implies specifying the lag associated with each

coefficient in fractions of the low-frequency time index: yt and x(m)t−1−i/m, e.g., are 1 + i/m low-

frequency time periods apart such that the denominator in the corresponding coefficient’s prior

variance would equal (1 + i/m)2.

However, in the context of Granger causality testing such a specification is problematic as

the coefficients to test on are shrunk in ”opposite directions”: yt and x(m)t−1−1/2 are 1.5 t-periods

apart, whereas x(m)t−1−1/2 and yt−1 are separated by only 0.5 low-frequency periods. Consequently,

15λ ≈ 0 results in the posterior coinciding with the prior, whereas λ =∞ causes the posterior mean to coincidewith the OLS estimate of the unrestricted VAR in (4).

26

for a given λ, the coefficients are shrunk much more when testing for Granger causality from

X(m) to y, especially for large m. This makes it difficult to control the size of the respective

tests in one or the other direction. The common-frequency handling of the prior variances in

(26) mitigates this issue, although some loss in power has to be assumed.

Note that a treatment of the mixed-frequency nature of the variables along the lines outlined

above may well be advantageous in other circumstances (e.g., forecasting) such that we present

this approach in detail in Appendix A.16

Given the prior beliefs specified before, the analysis is very similar to the one in Banbura

et al. (2010). In short, let us write the MF-VAR as

Z = ZB∗ + E,

where Z = (Zµ1 , . . . , ZµT )′ with Zµt = (Z ′t, 1)′, Z = (Z1, . . . , ZT )′, E = (u1, . . . , uT )′ and B∗ =

(B′, µ)′. Then, one can show that augmenting the model by two dummy variables, Yd and Xd,

is equivalent to imposing a normal inverted Wishart prior that satisfies the moments in (26).

Finally, estimating the augmented model by ordinary least squares gives us the posterior mean

of the coefficients, on which we can do inference as outlined in the next subsection.

Note that Xd is, in fact, identical to the matrix for the common-frequency VAR (Banbura

et al., 2010); Yd, however, is slightly different due to the prior means being centered around the

restricted VAR in (25):

16When extending the approach we stick to the standard structure of the Minnesota prior in Litterman (1986).It is, however, also feasible to address the aforementioned issue by introducing, say, three hyperparameters: λ1

governing the prior tightness for the coefficients determining Granger causality from y to X(m)t , λ2 for parameters

controlling causality in the reverse direction and λ3 taking care of the remaining coefficients. This strategy is,however, a rather straightforward extension of the theory presented in Appendix A such that we do not presentit here.

27

Yd︸︷︷︸((m+1)(p+1)+1)×(m+1)

=

ρmσL/λ 0 0 . . . 0

0 ρmσH/λ ρm−1σH/λ . . . ρσH/λ

0((m+1)p−2)×(m+1)

diag(σL, σH , . . . , σH)

01×(m+1)

,

Augmenting the model is then achieved by setting Z∗ = (Z ′, Y ′d)′ and Z∗ = (Z ′, X ′d)′.

3.2.3 Testing for Granger Non-Causality

In terms of analyzing the Granger non-causality testing behavior within the Bayesian MF-

VAR, the auxiliary dummy variable approach is quite appealing, as it provides a closed-form

solution for the posterior mean of the coefficients. It is thus straightforward to compare the

testing behavior with the ones from alternative approaches using the Wald test.17 The latter

is derived analogously to the one in (14), and is, given Assumption 11, also asymptotically

χ2rank(R)-distributed.

Remark 13 Because the parameter vector vec(B∗) is not assumed fixed but random, construc-

tion of the test statistic, and especially its interpretation, need to be treated with special care. We

consider Bayesian confidence intervals, in particular highest posterior density (HPD) confidence

intervals (Bauwens et al., 2000), the tightness of which is expressed by α, not coincidentally

the same letter that denotes the significance level in the frequentist’s framework. Here, it is

interpreted such that (1 − α)% of the probability mass falls within the respective interval. In

other words, the probability that a model parameter falls within the bounds of the interval is

equal to (1− α)%. The interval centered at the modal value for unimodal symmetric posterior

17There is a large sample correspondence between classical Wald and Bayesian posterior odds tests (Andrews,1994). For certain choices of the prior distribution, the posterior odds ratio is approximately equal to the Waldstatistic. Andrews (1994) shows that for any significance level α there exist priors such that the aforementionedcorrespondence holds, and vice versa.

28

densities is then called the HPD (Zellner, 1996 or Bauwens et al., 2000). The connection to

Wald tests is now immediate using the equivalence between confidence intervals and respective

test statistics, whereby similar care is demanded when interpreting results.

3.3 Benchmark Models

3.3.1 Low-Frequency VAR

Before the introduction of MIDAS regression models, high-frequency variables were usually

aggregated to the low frequency in order to obtain a common frequency for all variables

appearing in a regression (Silvestrini and Veredas, 2008 or Marcellino, 1999). Likewise for

systems, a monthly variable, for example, was usually aggregated to, say, the quarterly fre-

quency such that a VAR could be estimated in the resulting common low frequency. Formally,

xt = W (L1/m)x(m)t , where W (L1/m) denotes a high-frequency lag polynomial of order A, i.e.,

W (L1/m)x(m)t =

∑Ai=0wix

(m)t−i/m (Silvestrini and Veredas, 2008).18 As far as testing for Granger

non-causality is concerned, we can rely on the asymptotic Wald statistic in (14), where the set

of regressors, the matrices Ω and R as well as the coefficient matrix are suitably adjusted.19

Remark 14 Naturally, such temporal aggregation leads to a great reduction in parameters.

After all, each set of m high-frequency variables per t-period is aggregated into one low-frequency

observation. Of course, this decrease in parameters comes at the cost of disregarding information

embedded in the high-frequency process. As argued in Miller (2011), if the aggregation scheme

employed is different from the true one underlying the DGP, potentially crucial high-frequency

information will be forfeited. Additionally, aggregating a high-frequency variable may lead to

“spurious” (non-)causality in the common low-frequency setup (Breitung and Swanson, 2002),

18This generic specification nests the two dominating aggregation schemes in the literature, Point-in-Time(A = 0, w0 = 1) and Average sampling (A = m− 1, wi = 1/m ∀i), where the former is usually applied to stockand the latter to flow variables. In view of the high-frequency variable we consider in our empirical application,the natural logarithm of bipower variation, we focus on Average sampling in this paper.

19We only consider the asymptotic Wald test in our simulations as it turns out to be correctly sized; hencethere is no need for the bootstrap.

29

since causality is a property which is not invariant to temporal aggregation (Marcellino, 1999

or Sims, 1971).

3.3.2 The max-test

Independently and simultaneously to this work, Ghysels et al. (2016a) have developed a new

Granger non-causality testing framework, whose parsimonious structure makes it very appealing

in a situation, where m is large relative to the sample size. In short, the idea is to focus on the

first line of the MF-VAR in (4), but rather than estimating that (U)-MIDAS equation (Foroni

et al., 2015) and test for Granger non-causality in the direction from X(m) to y, the authors

propose to compute the OLS estimates of βj in the following h separate regression models:

yt = µ+

q∑k=1

αk,jyt−k + βj+1x(m)t−1−j/m + vj , j = 0, . . . , h− 1,

whereby h needs to be set “sufficiently large” (to achieve h > pm). The corresponding max-

test statistic is then the properly scaled and weighted maximum of β21 , . . . , β

2h. Although it

has a non-standard limit distribution under H0, an asymptotic p-value may be obtained in a

similar way as when overcoming the Davies (1987) problem (see Remark 8). Testing for Granger

non-causality in the reverse direction works analogously in the following regression model:

yt = µ+

q∑k=1

αk,jyt−k +h∑k=1

βk,jx(m)t−1−k/m + γjx

(m)t+j/m + vj , j = 1, . . . , l.

Again, the corresponding max-test statistic is the maximum of γ21 , . . . , γ

2l scaled and weighted

properly.20 Ghysels et al. (2016a) show that, in a mixed-frequency VAR a la Ghysels (2015), the

sequential procedure underlying the max-test leads to correct identification of the test in the

20A MIDAS polynomial (Ghysels et al., 2004) on∑hk=1 βk,jx

(m)

t−1−k/m may be imposed to increase parsimony.Note that the aforementioned Davies problem does not occur due to testing for Granger non-causality by lookingat the OLS estimates of γj .

30

presence or absence of Granger causality. Hence, the max-test is consistent against causality.

3.3.3 Unrestricted VARs

Finally, we can attempt to estimate the full MF-VAR in (4) ignoring the possibility that the

amount of parameters may be too large to perform accurate estimations or adequate Granger

non-causality tests. To this end we just estimate the MF-VAR using ordinary least squares. In

this sense the comparison is related to the one of U-MIDAS (Foroni et al., 2015) and MIDAS

regression models for large m. We can test for Granger non-causality using the Wald statistic

in (14), adequately adjusting W , Ω, R and B. We also consider a bootstrap version of the un-

restricted MF-VAR, which we expect to alleviate size distortions, but which cannot solve power

issues due to the proliferation of parameters. Indeed, as shown by Davidson and MacKinnon

(2006), the power of a bootstrap test can only equal the size-corrected power of the original

test; as the parameter proliferation causes the original Wald test to be grossly oversized, the

amount of size-correcting needed implies the remaining power to be low.

4 Monte Carlo Simulations

In order to assess the finite sample performance of our different parameter reduction techniques,

we conduct a Monte Carlo experiment. In light of our empirical investigation we set m = 20,

i.e., as in a month/ working day-example.21 Furthermore, we start by investigating the case

where p = 1 and keep the analysis of higher lag orders for future research.

As far as investigating the size of our Granger non-causality tests is concerned, we assume

21See Section 5 for a justification of the time-invariance of m within this setup. The merits of our methodsshould hold for larger values of m; for smaller values, however, we may assume our approaches to perform worse.Indeed, the smaller m becomes, the less is the need to reduce the amount of parameters via (possibly misspecified)reduction techniques.

31

that the data are generated as a mixed-frequency white noise process, i.e.,

Zt = ut. (27)

Remark 15 Additionally, we have considered three alternative DGPs for size, all based on the

restricted VAR in (25): (i) γ1,i = γ∗i,1 = 0 (’diagonal’), (ii) γ1,i = 0 and γ∗i,1 6= 0 (’only Granger

non-causality from X(m) to y’) and (iii) γ∗i,1 = 0 and γ1,i 6= 0 (’only Granger non-causality

from y to X(m)’) ∀i = 2, . . . , 21, where the parameters γ1,i and γ∗i,1 refer to (29). However,

with the respect to the respective testing direction, the outcomes do not differ qualitatively and

quantitative differences are very small. Results are available upon request.

To analyze power we generate two different DGPs that are closely related to the restricted

VAR in (25):

Zt = ΓPZt−1 + ut (28)

with

ΓP =

γ1,1 γ1,2 γ1,3 . . . γ1,21

γ∗2,1 ρ20 0 . . . 0

γ∗3,1 ρ19... . . . 0

......

.... . .

...

γ∗21,1 ρ 0 . . . 0

, (29)

where γ1,1 = 0.5, ρ = 0.6 and γ1,j = βw∗j−1(−0.01) for j > 1. Note that wi(ψ) = exp(ψi2)∑20i=1 exp(ψi

2)

corresponds to the two-dimensional exponential Almon lag polynomial with the first parameter

set to zero (Ghysels et al., 2007). Now, for the first power DGP we simply set w∗i (ψ) = wi(ψ),

whereas in the second power DGP w∗i (ψ) = wi(ψ)− w(ψ) with the horizontal bar symbolizing

the arithmetic mean. As far as γ∗j,1 is concerned we take γ∗j,1 = γj,1 (first power DGP) and

γ∗j,1 = γj,1 − γ·,1 (second power DGP) for j = 2, . . . , 21, whereby γj,1 = (γj−1,1)1.11 for j =

32

3, . . . , 21.22 As an example, Figure 1 plots βw∗j−1 and γ∗j,1 for β = 2 and γ2,1 = 0.25.

[INSERT FIGURE 1 HERE]

Remark 16 Due to the zero-mean feature of w∗j−1(ψ) and γ∗j,1, j = 2, . . . , 21, in the second

power DGP, we expect the presence of Granger causality to be “hidden” when Average sampling

the high-frequency variable (see Section 3.3.1). It is thus a situation, in which classical temporal

aggregation is expected not to preserve the causality patterns in the data (Marcellino, 1999). The

first power DGP serves as a benchmark in the sense that we do not a priori expect one approach

to be better or worse than the others.

For the methods in Section 3 we analyze size and power for T = 50, 250 and 500, corre-

sponding to roughly 4, 21 and 42 years of monthly data. Note that an additional 100 monthly

observations are used to initialize the process. As far as the error term is concerned, we as-

sume uti.i.d.∼ N(0(21×1),Σu), where Σu has the same structure as the covariance matrix of the

restricted VAR in (25) with σLL = 0.5 and σHH = 1.23

For CCA and PLS we consider r = 1, 2 and leave the analysis of higher factor dimensions

for further research. Recall from Remark 7 that under H0 the model is not misspecified with

either amount of factors. Under HA, however, the model is misspecified for r = 1, but not

for r = 2. In that sense, we implicitly analyze the impact of misspecification, stemming from

choosing r too small, on the behavior of the respective test statistics. As far as the Bayesian

approach is concerned, we experimented with various values for the hyperparameter and found

λ = 0.175 to be a good choice. Furthermore, we approximate ρ by running a corresponding

univariate regression. For the max-test we take q = 1 and h = l = m = 20.

22The parameter values have been chosen to mimic part of the structure of the restricted VAR in (25) and toensure stability of the system. In the first DGP late xt−1-observations have a larger impact on yt, and positivevalues of yt−1 increase x towards the end of the current period, but hardly have an impact at the beginning. Inthe second power DGP a zero-mean feature is imposed on the coefficients w∗j−1(ψ) and γ∗j,1 without changingthe general pattern of their evolvement.

23These values resemble the data in the empirical section, where σLL = 0.55 and σHH = 1.09.

33

We consider different versions of our size and power DGPs by varying the values of σHL

and β. To be more precise, we choose σHL = 0,−0.05,−0.1 when analyzing size properties

of our tests in both directions.24 A value of zero implies the absence of nowcasting causality,

whereas the other two values imply some degree of correlation between the low- and high-

frequency series. For both power DGPs we fix σHL = −0.05 though. When investigating

Granger causality testing from X(m) to y in the first power DGP we consider β = 0.5, 0.8, 2;

in the second power DGP we only take β = 0.5. For causality in the reverse direction we take

β = 0.5 and γ2,1 = 0.2 in both power DGPs, because the results do not seem sensitive in this

respect.

The figures in the tables below represent the percentage amount of rejections at α = 5%.25

All figures are based on 2,500 replications and are computed using GAUSS12. For the bootstrap

versions we use B = 499.

4.1 Size

Tables 1 and 2 contain the size results for Granger non-causality tests within the following

approaches: the low-frequeny VAR (LF), the unrestricted VAR (UNR), reduced rank restric-

tions using canonical correlations analysis (CCA) or partial least squares (PLS), and with three

imposed factors using the HAR model (HAR), the max-test (max) and the Bayesian mixed-

frequency VAR (BMF). With respect to CCA and PLS, the outcomes for r = 2 turn out to

be qualitatively very similar to the r = 1 case, showing that the methods are apparently not

affected by the presence of misspecification. Consequently, we only show the outcomes for r = 1

to save space. Furthermore, aside from the results of the standard, i.e., asymptotic, Wald tests,

we only display the outcomes for the best performing bootstrap variant, which are denoted with

24σHL turns out to be equal to −0.22 in our empirical analysis, yet we decrease its value here so as to achievepositive-definiteness of Σu. We checked the robustness of our outcomes to σHL being positive (we chose thecorresponding values 0.05 and 0.1) as this situation is more common in economic applications. However, theresults remain broadly unchanged and are available upon request.

25Outcomes for α = 10% and α = 1% are available upon request.

34

a superscript ‘∗’ (e.g., CCA∗ is the bootstrap CCA-based Wald test). In the case of testing

causality in the direction from the high- to the low-frequency series this turns out to be the

bootstrap that imposes the null hypothesis, whereas the unrestricted variant dominates for the

reverse direction.26 For both methods we set the lag length within the bootstrap equal to p = 1.

When testing causality from y to X(m) size distortions may occur due to computing a joint

test on mp parameters from m different equations in the system. In order to address this issue,

we take the maximum of the Wald statistics computed equation by equation. We denote these

tests by adding a subscript ’b’, e.g., CCAb. For the tests with asymptotic critical values we

apply a Bonferroni correction to control the size under multiple testing (Dunn, 1961). In similar

spirit as in White (2000), the bootstrap implementations of these tests automatically provide

an implicit Bonferroni-type correction for multiple testing, as we calculate the corresponding

maximum of the single-equation Wald tests within each bootstrap iteration and compare that

to the maximum of the original tests. We will refer to all these tests as ‘Bonferroni-type’ tests

to avoid confusion with the max-test, and present the corresponding outcomes in Table 3.

[INSERT TABLE 1 HERE]

Let us start by analyzing the testing behavior from X(m) to y. First, LF has size close to

the nominal one, which is not surprising given that the flat aggregation scheme is correct in this

white noise DGP (no matter the value of σHL). The unrestricted VAR, however, incurs some

size distortions for small T due to parameter proliferation. Reduced rank restrictions based

on CCA and PLS yield considerable size distortions, whereby the imposed HAR-type factor

structure delivers good results (despite being slightly oversized for T = 50). Finally, max

and BMF show almost perfect size, whereby the former is marginally under- and the latter

marginally oversized.

26The unrestricted bootstrap is seriously oversized when testing causality from X(m) to y, while for the reversedirection the bootstrap under the null is mildly oversized for small samples. Hence, we recommend the use of therestricted bootstrap for testing causality from X(m) to y, but using the unrestricted bootstrap for the oppositedirection.

35

Turning to the outcomes of the bootstrap versions we observe that, independent of the value

of σHL, even for T = 50 the actual size of the Wald tests within the unrestricted VAR and

reduced rank restriction models with CCA- or HAR-type factors is very close to the nominal one,

such that the bootstrap tests clearly dominate their asymptotic counterparts. For PLS-based

factors this conclusion only holds in the absence of nowcasting causality, and size distortions

arise quickly as we increase the contemporaneous correlation between X(m) and y.



Many of the aforementioned statements carry over to testing causality in the reverse direc-

tion: LF delivering nearly perfect size (as it is the correct model under the null), max performing

very well (being only marginally oversized for small T ) and BMF being a bit oversized (by a

slightly larger degree than in Table 1).27 The size distortions incurred by UNR, CCA, PLS

and HAR are, however, amplified. The use of the Bonferroni-corrected UNRb, CCAb, PLSb and

HARb can only mitigate this effect, yet not fully eradicate it. The bootstrap version does elim-

inate the size distortions, though: actual size of UNR∗, PLS∗ and HAR∗ becomes oftentimes

very close to 5%; only CCA∗ remains a bit oversized for T = 50. The Bonferroni-type tests

UNR∗b , CCA∗b , PLS∗b and HAR∗b all have size very close to the nominal one for all sample sizes.

To sum up, the asymptotic Granger non-causality tests with size close to the nominal one are

LF, max and BMF to some degree. As far as the bootstrap tests are concerned, UNR∗, CCA∗,

PLS∗ (with at most ”mild” nowcasting causality) and HAR∗ when testing the direction from

X(m) to y, and UNR∗, CCA∗, PLS∗ and HAR∗ as well as their Bonferroni-type counterparts

when testing the reverse direction deliver good size outcomes. Consequently, we focus on these

cases when analyzing power (and when dealing with real data in Section 5).

27Strangely, BMF seems to be rejected less and less for growing σHL, with the effect being strongest for smallT .

36

4.2 Power

Tables 4 and 5 contain the corresponding outcomes under the alternative hypothesis. Recall

that for the direction from X(m) to y we consider three different values of β within the first

power DGP, β = 0.5, 0.8 and 2. For the second power DGP we fix β = 0.5. For the reverse

direction we always keep γ2,1 = 0.2.


Let us again start with the direction from the high- to the low-frequency variable and first

focus on the outcomes for the first power DGP, i.e., the top three blocks of Table 4. Observe (i)

how power reaches one asymptotically for all approaches, and (ii) how larger values of β imply

an increase in the rejection frequencies for a given T . Naturally, for a large enough β all tests

have power equal to one, irrespective of the sample size (as nearly happens for β = 2). So, in

order to compare our different tests let us focus on β = 0.5.

The asymptotic Wald test within the low-frequency VAR still performs very well, with the

highest rejection frequency for T = 50. Note, however, that the Granger causality feature

in the data does not get averaged out by temporal aggregation in the first power DGP. The

bootstrapped version of the Wald test after HAR-type restrictions have been imposed in a

reduced rank regression perform almost as well as LF. max, BMF and PLS∗ appear to fall a

bit short, but catch up quickly as either T or β grows. A similar observation holds for UNR∗

and CCA∗, whereby they seem to be markedly less powerful for small T .

These conclusions generally carry over to the second power DGP, with two exceptions: first

and foremost, the outcomes of LF show that Average sampling of X(m) annihilates the causality

feature between the series for all sample sizes (power actually almost coincides with the nominal

size of 5%). Second, the rejection frequencies are lower than they were in the corresponding

setting of the first power DGP.


37

Focusing on the direction from y to X(m), it suffices to analyze the results of the first power

DGP; the aforementioned statements regarding the second power DGP carry over to the reverse

direction of causality being tested. It turns out that both, max and BMF, have higher power

than the bootstrapped Wald tests corresponding to the reduced rank restrictions approaches.

Recall, though, that BMF and max were somewhat oversized, especially for small T , which

may inflate their power. Also, so far we did not consider the Bonferroni-type tests. Given the

way in which max is constructed it seems more natural to compare it to the Bonferroni-type

counterparts of the bootstrap tests. Indeed, both approaches rely on a statistic that is computed

as the maximum of a set of test statistics.28 We find that for a given T almost all approaches are

more powerful than max. The fact that UNR∗b , PLS∗b and HAR∗b are all slightly undersized for

small T , while max is marginally oversized, even strengthens these conclusions. Also note that

the bootstrap tests based on the reduced rank restrictions seem to be slightly more powerful

than UNR∗b . Hence, while the bootstrap is able to correct size of the unrestricted MF-VAR

approach, the proliferation of parameters continues to have a negative effect on power.

Combining the power and size outcomes, it seems that BMF, HAR∗, max (for large enough

β when T is small) and PLS∗ (in the absence of nowcasting causality) are the dominant Granger

non-causality testing approaches as far as the direction from X(m) to y is concerned. For the

reverse direction of causality CCA∗b , PLS∗b and HAR∗b as well as UNR∗b and max (though both to

a lesser degree) appear superior. Despite the competitiveness of our proposed tests, asymptotic

or bootstrapped, in the situation at hand, one should keep in mind that the underlying param-

eter reduction techniques are ultimately ad hoc. Hence, in contrast to the max-test and testing

within the unrestricted VAR, which do not involve possibly misspecified dimension reduction

approaches, our tests can ultimately not have an asymptotic power of one. As such, the good

performance of our proposed tests may be specific to our simulation setup and should not be

28There is a difference in terms of the underlying regression to obtain the various test statistics, of course.Ghysels et al. (2016a) consider univariate MIDAS regressions with leads, whereas the Bonferroni-type tests arebased on estimated coefficients from different equations, i.e., the ones for X(m), of the corresponding system.

38

generalized to all possible causality patterns for which our imposed restrictions may not hold.

5 Application

We apply the approaches described in Section 3 to a MF-VAR consisting of the monthly growth

rate of the U.S. industrial production index (ipi hereafter), a measure of business cycle fluctua-

tions, and the logarithm of daily bipower variation (bv hereafter) of the S&P500 stock index, a

realized volatility measure more robust to jumps than the realized variance. While the degree

to which macroeconomic variables can help to predict volatility movements has been investi-

gated widely in the literature (see Schwert, 1989b, Hamilton and Gang, 1996, or Engle and

Rangel, 2008, among others), the reverse, i.e., whether the future path of the economy can be

predicted using return volatility, has been granted comparably little attention (examples are

Schwert, 1989a, Mele, 2007, and Andreou et al., 2000). Instead of using an aggregate mea-

sure of volatility such as, e.g., monthly realized volatilities (Chauvet et al., 2015) or monthly

GARCH estimated variances, we use daily bipower variation computed on 5-minute returns ob-

tained from the database in Heber et al. (2009). With the bv-series being available at a higher

frequency than most indicators of business cycle fluctuations, we obtain the mixed-frequency

framework analyzed in this paper.

The sample covers the period from January 2000 to June 2012 yielding a sample size of

T = 150. We take m = 20 as it is the maximum amount of working days that is available

in every month throughout the sample we deal with.29 Consequently, we have BV(20)t =

(bv(20)t , bv

(20)t−1/20, . . . , bv

(20)t−19/20)′. Figure 2 plots the data.

[INSERT FIGURE 2 HERE]

29Whenever a month contains more than 20 working days we disregard the corresponding amount of days atthe beginning of the month. In June 2012, e.g., there are 21 working days such that we do not consider June1. For May 2012 we disregard the first three days. An alternative (balanced) strategy would have been to takethe maximum number of days in a particular month (i.e., 23, usually in July, August or October) and to createadditional values for non-existing days in other months whenever necessary. As far as the treatment of dailydata is concerned we have also taken bv

(20)t = bv

(20)

t−1/20 when there are no quotations for bv(20)t .

39

Table 6 contains the outcomes of Granger non-causality tests for all approaches discussed in

Section 3. As mentioned before, we disregard the cases that showed considerable size distortions

in the Monte Carlo analysis. Note that a lag length of p = 1 and 2 is considered (the same

for the bootstrap) and that the numbers represent p-values (in percentages). For reduced rank

restrictions using CCA and PLS we consider one up to three factors, whereby we only show

the outcomes for r = 1 and r = 2 for representational ease; r = 3 gives similar results. For

the non-bootstrap tests, except the max-test and the Wald-test within the Bayesian mixed-

frequency VAR,30 we also consider a heteroscedasticity consistent variant of (14) by computing

a robust estimator of Ω (see Ravikumar et al., 2000) to account for the potential presence of a

time varying multivariate process:

ΩR = T ((W ′W )−1 ⊗ Im+1)S0((W ′W )−1 ⊗ Im+1), (30)

where

S0 =1

T

T∑t=1

(Wt ⊗ ut)′(Wt ⊗ ut). (31)

For the bootstrap tests we achieve the robustness to heteroskedasticity by implementing the

wild bootstrap version as described in Section 3.1.4.


If we consider the test statistics that perform best in our Monte Carlo experiment, the out-

comes above clearly point towards Granger-causality from BV (20) to ipi. This result supports

Andreou et al. (2000) concluding that ”[...] volatilities may also be useful [...] indicators for

both the growth and volatility of industrial production” (p. 15). The situation is less clear

cut when looking at Granger-causality from the low-frequency variable to the high-frequency

30Note that, as discussed in Section 3.2.2, there is a one-to-one correspondence between the OLS estimator ofthe augmented model and the prior setting we consider in our Bayesian framework. As we did not discuss a priorspecification that corresponds to the robust version of the Wald test we disregard from that test variant here.

40

volatility measures. Indeed, max and most Bonferroni-type tests seem to reject the null of no

Granger-causality, whereas the joint tests do not. However, the higher power obtained on the

maximum of the individual Wald-type tests would favor the presence of Granger-causality in

the direction from business cycle movements to financial uncertainties as well. A more careful

investigation, out of the scope of this paper, could be done on different subsamples in order

to analyze whether, e.g., periods before, during or after the financial crises lead to similar

conclusions.

6 Conclusion

We investigate Granger non-causality testing in a mixed-frequency VAR, where the mismatch

between the sampling frequencies of the variables under consideration is large, causing estima-

tion and inference to be potentially problematic. To avoid this issue we discuss two parameter

reduction techniques in detail, reduced rank restrictions and a Bayesian MF-VAR approach,

and compare them to (i) a common low-frequency VAR, (ii) the max-test approach and (iii) the

unrestricted VAR in terms of their Granger non-causality testing behavior. To further improve

their finite sample test properties we also consider two bootstrap variants for the reduced rank

regression approaches (and the unrestricted VAR).

For both directions of causality we find a different set of tests that result in the best Granger

non-causality testing behavior. For the direction from the high- to the low-frequency series,

standard testing within the Bayesian mixed-frequency VAR, the max-test of Ghysels et al.

(2016a), and the restricted bootstrap version of the Wald test in a reduced rank regression

after HAR-type factors have been imposed or PLS-type factors have been computed, the latter

of which being restricted to nowcasting non-causality (Gotz and Hecq, 2014), perform best.

For the reverse direction, the unrestricted bootstrap variants of the Bonferroni-corrected Wald

tests within the following models dominate: the unrestricted VAR and reduced rank regressions

41

with CCA-, PLS- or HAR-based factors.

An application investigating the presence of a causal link between business cycle fluctuations

and uncertainty in financial markets illustrates the practical usefulness of these approaches.

While Granger causality from uncertainty in financial markets to business cycle fluctuations

was clearly supported by the data, evidence for causality in the reverse direction only comes

from a subset of the tests, yet the more powerful ones according to our simulation results.

Acknowledgements

We want to particularly thank Eric Ghysels for many fruitful discussions, comments and sug-

gestions on earlier versions of the paper. Moreover, we thank two anonymous referees, one

of them deserving special gratitude for providing us with MATLAB code for the max-test.

We also thank Daniela Osterrieder, Lenard Lieb, Jorg Breitung, Roman Liesenfeld, Jan-Oliver

Menz, Klemens Hauzenberger, Martin Mandler and participants of the Workshop on Advances

in Quantitative Economics in Maastricht 2013, the CEIS Seminar in Rome 2013, the 33rd Inter-

national Symposium on Forecasting in Seoul 2013, the 24th (EC)2 conference in Cyprus 2013,

the 22th SNDE Annual Symposium in New York 2014, the 8th ECB Workshop on Forecasting

Techniques in Frankfurt 2014, the Graduate School Research Seminar in Cologne 2014, the 68th

European Meeting of the Econometric Society in Toulouse 2014, the 8th International Confer-

ence on Computational and Financial Econometrics in Pisa 2014 and the Norges Bank Research

Seminar in Oslo 2015 for useful suggestions and comments on earlier versions of the paper. The

third author would like to thank the Netherlands Organisation for Scientific Research (NWO)

for financial support.

42

A Bayesian VAR Estimation with MF Prior Variances

Properly accounting for the high-frequency time difference between the variables involved im-

plies the following prior variances:

V ar[γ(k)i,j ] =

φ λ2

(k+(j−2)/m)2SLH for i = 1, j > 1

φ λ2

(k+(2−i)/m)2SHL for j = 1, i > 1

λ2

(k+(j−i)/m)2else

, (32)

Now, define Z = (Zµ1 , . . . , ZµT )′, where Zµt = (Z ′t, 1)′, and let us re-write the MF-VAR in (4)

in the following way:

vec(Z)︸︷︷︸(m+1)T×1

= Z∗︸︷︷︸(m+1)T×(m+1)n

vec(B∗)︸︷︷︸(m+1)n×1

+ vec(E)︸︷︷︸(m+1)T×1

, (33)

where Z = (Z1, . . . , ZT )′, Z∗ = Im+1⊗Z, E = (u1, . . . , uT )′, B∗ = (B′, µ)′ and n = (m+1)p+1.

In order to drop the undesirable feature of a fixed and diagonal covariance matrix Σ, we impose

a normal inverted Wishart prior (at the cost of having to set φ = 1) with the following form

(Kadiyala and Karlsson, 1997):

vec(B∗)|Σ ∼ N(vec(B∗0), [Z∗′0 (Σ−1 ⊗ IT )Z∗0 ]−1) and Σ ∼ iW (V0, v0), (34)

where Z∗0 = Im+1⊗Z0 with Z0 being a (T ×n)-matrix. B∗0 and Z0 have to be chosen so as to let

expectations and variances of the elements in B∗ coincide with the moments in (32). Likewise,

V0 and v0 need to be set such that E[Σ] = Σd.

Similar to Banbura et al. (2010), we can show that adding auxiliary dummy variables Yd and

Xd, the precise composition of which is given in Appendix B, to (33) is equivalent to imposing

43

the normal inverted Wishart prior in (34). To this end, let

B∗aux0 = (X ′dXd)−1X ′dYd,

Zaux0 = Xd

V0 = (Yd −XdB∗aux0 )′(Yd −XdB

∗aux0 ) and

v0 = m+ 3 = Td − naux + 2,

(35)

where naux = m(2p+ 1) and Td = naux + (m+ 1), and subsequently set

vec(B∗0) = S′vec(B∗aux0 ) and

V ar[vec(B∗)] = S′[Z∗aux′0 (Σ−1 ⊗ ITd)Z∗aux0 ]−1S,(36)

with Z∗aux0 = Im+1 ⊗ Zaux0 and where S is an [(m + 1)naux × (m + 1)n]-dimensional selection

matrix, the precise construction of which is given in Appendix C.

Remark 17 Intuitively speaking, Zaux0 is an auxiliary matrix constructed as the “union” of the

Z0 matrices corresponding to the different columns of B∗. The non-random matrix S selects,

for each column of B∗, the corresponding elements of Zaux0 in order to let the variance of each

element in B∗ match the corresponding prior variance. Likewise, B∗aux0 is an auxiliary matrix

from which we derive B∗0 .

In order to augment the model in (33), let us define Zdj = (Z ′, (XdSj)′)′, where Sj is the

jth block of the selection matrix S (see Appendix C for details), j = 1, . . . ,m+ 1.31 Then, the

augmented system becomes

vec(Z∗)︸︷︷︸(m+1)(T+Td)×1

= Z∗∗︸︷︷︸(m+1)(T+Td)×(m+1)n

vec(B∗)︸︷︷︸(m+1)n×1

+ vec(E∗)︸︷︷︸(m+1)(T+Td)×1

, (37)

where Z∗ = (Z ′, Y ′d)′, E∗ = (E′, E′d)′ and Z∗∗ is block diagonal with Z∗∗ = diagZd1, Zd2, . . . , Zdm+1.

31Here, Sj picks the elements of Zaux0 = Xd corresponding to column j of B∗.

44

The posterior then has the form

vec(B∗)|Σ, Z ∼ N(vec(B∗), [Z∗′∗ (Σ−1 ⊗ IT+Td)Z∗∗]−1) and Σ|Z ∼ iW (V , T +m+ 3), (38)

where V = E′∗E∗ with vec(E∗) = vec(Z∗)− Z∗∗vec(B∗).32 Note that the posterior mean of the

coefficients boils down to

vec(B∗) = [Z∗′∗ (Σ−1 ⊗ IT+Td)Z∗∗]−1Z∗′∗ (Σ−1 ⊗ IT+Td)vec(Z∗), (39)

i.e., the GLS estimate of a SUR regression of vec(Z∗) on Z∗∗. As for the common-frequency

case, it can be checked that it also coincides with the posterior mean for the prior setup in (32).

An example with p = 1 and m = 2 illustrates the auxiliary dummy variables approach and is

provided in Appendix D.

B The Auxiliary Dummy Variables with MF Prior Variances

The auxiliary dummy variables that imply a matching of the prior moments turn out to be

Yd︸︷︷︸Td×(m+1)

=

02(m−1)×1

Dρ ⊗ (0, σH/λ)′σLρm/λ

0

0(m(2p−1)−1)×(m+1)

diag(σL, σH , . . . , σH)

01×(m+1)

, (40)

32In practice, we estimate Σ in the standard way, i.e., Σ = 1T+Td

Eols′∗ Eols∗ , where Eols∗ denotes the OLS

residuals of the system in (37). Again, a sample size correction alleviates potential size distortions (see Section3.1.2). The ith column of Eols∗ , i = 1, . . . ,m + 1, corresponds to the residuals of a regression of (Z′·,i, Y

′d,i)′ on

Zdi , where Z·,i and Yd,i denote the ith columns of Z and Yd, respectively.

45

Xd︸︷︷︸Td×naux

=

J1p ⊗ diag(σL, σH)/λ 02pm×(m−1) 02pm×1

0(m−1)×2pm J2pσH/λ 0(m−1)×1

0(m+1)×2pm 0(m+1)×(m−1) 0(m+1)×1

01×2pm 01×(m−1) ε

, (41)

where Dρ = diaga(ρm, m−1m ρm−1, . . . , 2

mρ2, 1mρ), with diaga(·) denoting an anti-diagonal matrix.

Furthermore,

J1p = diag( 1

m ,2m , . . . , 1,

m+1m , . . . , 2, . . . , . . . , p),

J2p = diag(p+ 1

m , p+ 2m , . . . , p+ m−1

m ).(42)

The last line of both, Yd and Xd, corresponds to the diffuse prior for the intercept (ε is a very

small number), the block above imposes the prior for Σ and the remaining blocks set the priors

for the coefficients γ(k)i,j . As in Banbura et al. (2010), we set σ2

i = s2i , i = L,H, where s2

i is the

variance of a residual from an AR(p), respectively an AR(mp), model for yt, respectively for

x(m)t .

C Construction of the Selection Matrix S

Let us investigate the variance of vec(B∗)|Σ more closely. Noting that we have to choose Z0,

or Z∗0 in (34), in such a way so as to let the variances of the corresponding coefficients coincide

with the prior variances in (32), it turns out that we need to set

V ar[vec(B∗)|Σ] =

σ2LΩ

(2)0 0n×n . . . . . . 0n×n

0n×n σ2HΩ

(2)0 0n×n . . . 0n×n

... 0n×n σ2HΩ

(3)0 . . .

...

......

.... . . 0n×n

0n×n 0n×n . . . 0n×n σ2HΩ

(m+1)0

, (43)

46

where

Ω(i)0 = diag( λ2

(1+(2−i)/m)2σ2L, λ2

(1+(2−i)/m)2σ2H, λ2

(1+(3−i)/m)2σ2H, . . . , λ2

(1+1+(1−i)/m)2σ2H, . . . ,

. . . , λ2

(p+(2−i)/m)2σ2L, λ2

(p+(2−i)/m)2σ2H, . . . , λ2

(p+1+(1−i)/m)2σ2H, 1ε2

)(44)

for i = 2, . . . ,m+ 1.

Hence, unlike in the common-frequency case, where Ω(i)0 = Ω0 ∀i (Banbura et al., 2010),

the set of variances changes due to the stacked nature of the vector Zt and the specific lag

structure for each coefficient; see the variances in (32). Let us form an auxiliary matrix Ωaux0

that contains the union of all elements in Ω(2)0 , . . . ,Ω

(m+1)0 . Each matrix Ω

(i)0 contains p + 1

new elements compared to Ω(i−1)0 for i > 2. As there are n elements in Ω

(2)0 , we end up with a

dimension of m(2p+ 1) = naux for the square matrix Ωaux0 :

Ωaux0 = diag( λ2

(1/m)2σ2L, λ2

(1/m)2σ2H, . . . , λ2

12σ2L, λ2

12σ2H, λ2

(1+1/m)2σ2L, λ2

(1+1/m)2σ2H, . . . , λ2

22σ2H,

. . . , . . . , λ2

p2σ2L, λ2

p2σ2H, λ2

(p+1/m)2σ2H, . . . , λ2

(p+1−1/m)2σ2H, 1ε2

).(45)

All that remains is to define a [(m + 1)naux × (m + 1)n]-dimensional selection matrix S such

that S′(Σd ⊗ Ωaux0 )S = V ar[vec(B∗)|Σ] . Note that from Ωaux

0 we can then derive Zaux0 = Xd

by using Ωaux0 = (Zaux

′0 Zaux0 )−1.

Let us denote by 1i an naux-dimensional column vector with a one in row i and zeros

elsewhere. It turns out that S is a block-diagonal matrix, i.e., S = diagS1, S2, . . . , Sm+1,

where each off-diagonal block is 0naux×n. Each Sj , j = 1, . . . ,m+1, can be described as follows:

Sj = (S1j , S

2j , 1naux) for j = 2, . . . ,m+ 1, (46)

47

where

S1j = ( 12m−(2j−3), 12m−(2j−4), 12m−(2j−6), . . . , 14m−(2j−2), 14m−(2j−3), . . . ,

16m−(2j−2), . . . , . . . , 12pm−(2j−3), 12pm−(2j−4), . . . , 12pm),(47)

S2j =

∅ for j = m+ 1

(12pm+1, 12pm+2, . . . , 12pm+m−(j−1)) else(48)

and

S1 = S2. (49)

Each of the indices in Sj reveals which row element of the jth column of Baux0 gets selected by

S. Put differently, the indices that are missing in Sj correspond to elements of Baux0 (in column

j) that will not get chosen by S. This implies that the values of these elements do not play a

role for the computation of vec(B∗0). Consequently, when constructing Baux0 , and subsequently

also Yd, we assign these elements a value of zero for simplicity.

D Example: the Auxiliary Dummy Variables Approach for MF Data

Let p = 1 and m = 2. The prior beliefs in (32) corresponding to this setup, and written in

terms of vec(B∗) = vec((B′, µ)′), are given by

E(vec(B∗)) = vec

ρ2 0 0

0 ρ2 ρ

0 0 0

0 0 0

48

and

V ar(vec(B∗)) = λ2

1 0 . . . 0

0 SLH 0 . . . 0

0 0 1(3/2)2

SLH 0 . . . 0

0 . . . 0σ2L

ε2λ2 0 . . . 0

0 . . . 0 SHL 0 . . . 0

0 . . . 0 1 0 . . . 0

0 . . . 0 1(3/2)2

0 . . . 0

0 . . . 0σ2H

ε2λ2 0 . . . 0

0 . . . 0 1(1/2)2

SHL 0 . . . 0

0 . . . 0 1(1/2)2

0 0

0 . . . 0 1 0

0 . . . 0σ2H

ε2λ2

.

(50)

With Td = 9 and naux = 6 the auxiliary dummy variables look as follows:

Yd =

0 0 0

0 012ρσHλ

ρ2σLλ 0 0

0 ρ2σHλ 0

0 0 0

σL 0 0

0 σH 0

0 0 σH

0 0 0

, Xd =

12σLλ 0 0 0 0 0

012σHλ 0 0 0 0

0 0 σLλ 0 0 0

0 0 0 σHλ 0 0

0 0 0 032σHλ 0

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 ε

.

Let us demonstrate that these auxiliary dummy variables truly imply a matching of the

prior moments in (34) and the prior beliefs given above. Simple algebra gives us Zaux0 = Xd,

49

v0 = 5 as well as

B∗aux0 = (X ′dXd)−1X ′dYd =

0 0 0

0 0 ρ

ρ2 0 0

0 ρ2 0

0 0 0

0 0 0

,

V0 = (Yd −XdB∗aux0 )′(Yd −XdB

∗aux0 ) =

σ2L 0

0 σ2H 0

0 0 σ2H

.

To obtain vec(B∗0) we require the selection matrix S. Following the guidelines in Appendix C

yields S = diagS1, S2, S3, where the off-diagonal blocks are of dimension 6× 4 and

S1 = S2 =

0 0 0 0

0 0 0 0

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 1

, S3 =

1 0 0 0

0 1 0 0

0 0 0 0

0 0 1 0

0 0 0 0

0 0 0 1

.

Now it is easy to show that, indeed,

vec(B∗0) = S′vec(B∗aux0 ) = vec

ρ2 0 0

0 ρ2 ρ

0 0 0

0 0 0

= E(vec(B∗))

50

and S′[Σ⊗ (Zaux′0 Zaux0 )−1]S = V ar(vec(B∗)) as in equation (50).

Having shown that Yd and Xd capture the prior moments, we need to augment the MF-VAR.

Let us start by re-writing it into the format presented in equation (33):

vec

y2 x

(2)2 x

(2)2−1/2

......

...

yT x(2)T x

(2)T−1/2

=

I3 ⊗

y1 x

(2)1 x

(2)1/2 1

......

......

yT−1 x(2)T−1 x

(2)T−1−1/2 1

×

vec

γ1,1 γ2,1 γ3,1

γ1,2 γ2,2 γ3,2

γ1,3 γ2,3 γ3,3

µ1 µ2 µ3

︸︷︷︸

vec(B∗)

+vec(E).

Following the steps outlined after Remark 17 gives us the augmented system:

vec

y2 x(2)2 x

(2)2−1/2

......

...

yT x(2)T x

(2)T−1/2

0 0 0

0 012ρσHλ

ρ2σLλ 0 0

0 ρ2σHλ 0

0 0 0

σL 0 0

0 σH 0

0 0 σH

0 0 0

= Z∗∗vec(B∗) + vec(E∗),

51

Z∗∗ =

y1 x(2)1 x

(2)1/2

1

0(T+9)×4 0(T+9)×4

......

......

yT−1 x(2)T−1 x

(2)T−1−1/2

1

0 0 0 0

0 0 0 0

σLλ

0 0 0

0 σHλ

0 0

0 032σHλ

0

03×4

0 0 0 ε

0(T+9)×4

y1 x(2)1 x

(2)1/2

1

0(T+9)×4

......

......

yT−1 x(2)T−1 x

(2)T−1−1/2

1

0 0 0 0

0 0 0 0

σLλ

0 0 0

0 σHλ

0 0

0 032σHλ

0

03×4

0 0 0 ε

0(T+9)×4 0(T+9)×4

y1 x(2)1 x

(2)1/2

1

......

......

yT−1 x(2)T−1 x

(2)T−1−1/2

112σLλ

0 0 0

012σHλ

0 0

0 0 0 0

0 0 σHλ

0

0 0 0 0

03×4

0 0 0 ε

.

The GLS estimate of vec(B∗) in the regression above is then a closed-form solution for the

posterior mean of the coefficients.

52

References

Anderson, T. W., 1951, Estimating linear restrictions on regression coefficients for multivariate

normal distributions. The Annals of Mathematical Statistics 22, 327–351.

Andreou, E., D. R. Osborn and M. Sensier, 2000, A comparison of the statistical properties of

financial variables in the USA, UK and Germany over the business cycle. Manchester School

68, 396–418.

Andrews, D. W. K., 1994, The large sample correspondence between classical hypothesis tests

and Bayesian posterior odds tests. Econometrica 62, 1207–1232.

Banbura, M., D. Giannone and L. Reichlin, 2010, Large Bayesian vector autoregressions.

Journal of Applied Econometrics 25, 71–92.

Bauwens, L., M. Lubrano and J.-F. Richard, 2000, Bayesian inference in dynamic econometric

models. Oxford University Press.

Bose, A., 1988, Edgeworth correction by bootstrap in autoregressions. Annals of Statistics 16,

1709–1722.

Breitung, J. and N. R. Swanson, 2002, Temporal aggregation and spurious instantaneous

causality in multiple time series models. Journal of Time Series Analysis 23, 651–666.

Bruggemann, R., C. Jentsch and C. Trenkler, 2016, Inference in VARs with conditional het-

eroskedasticity of unknown form. Journal of Econometrics 191, 69–85.

Carriero, A., G. Kapetanios and M. Marcellino, 2011, Forecasting large datasets with bayesian

reduced rank multivariate models. Journal of Applied Econometrics 26, 735–761.

Cavaliere, G., A. Rahbek and A. M. R. Taylor, 2012, Bootstrap determination of the co-

integration rank in vector autoregressive models. Econometrica 80, 1721–1740.

53

Chauvet, M., Z. Senyuz and E. Yoldas, 2015, What does financial volatility tell us about

macroeconomic fluctuations? Journal of Economic Dynamics and Control 52, 340–360.

Clements, M. and A. B. Galvao, 2008, Macroeconomic forecasting with mixed-frequency data.

Journal of Business & Economic Statistics 26, 546–554.

Clements, M. and A. B. Galvao, 2009, Forecasting U.S. output growth using leading indicators:

an appraisal using MIDAS models. Journal of Applied Econometrics 24, 1187–1206.

Corsi, F., 2009, A simple approximate long-memory model of realized volatility. Journal of

Financial Econometrics 7, 174–196.

Cox, D. R., L. Bondesson, G. Gudmundsson, E. Harsaae, K. Juselius, P. Laake, S. L. Lau-

ritzen and G. Lindgren, 1981, Statistical analysis of time series: some recent developments.

Scandinavian Journal of Statistics 8, 93–115.

Cubadda, G. and B. Guardabascio, 2012, A medium-N approach to macroeconomic forecasting.

Economic Modelling 29, 1099–1105.

Cubadda, G. and A. Hecq, 2011, Testing for common autocorrelation in data rich environments.

Journal of Forecasting 30, 325–335.

Davidson, R. and J. G. MacKinnon, 2006, The power of bootstrap and asymptotic tests. Journal

of Econometrics 133, 421–441.

Davies, R. B., 1987, Hypothesis testing when a nuisance parameter is present only under the

alternatives. Biometrika 74, 33–43.

Dufour, J.-M., D. Pelletier and E. Renault, 2006, Short run and long run causality in time

series: inference. Journal of Econometrics 132, 337–362.

Dufour, J.-M. and E. Renault, 1998, Short run and long run causality in time series: theory.

Econometrica 66, 1099–1126.

54

Dunn, O. J., 1961, Multiple comparisons among means. Journal of the American Statistical

Association 56, 52–64.

Engle, R. F. and J. G. Rangel, 2008, The spline-garch model for low-frequency volatility and

its global macroeconomic causes. Review of Financial Studies 21, 1187–1222.

Eraker, B., C. W. J. Chiu, A. T. Foerster, T. B. Kim and H. D. Seoane, 2015, Bayesian mixed

frequency VARs. Journal of Financial Econometrics 13, 698–721.

Foroni, C., E. Ghysels and M. Marcellino, 2013, Mixed-frequency vector autoregressive models,

in: T. B. Fomby, L. Kilian and A. Murphy, (Eds.), Advances in Econometrics, Vol. 32.

Emerald Group Publishing Limited, pp. 247–272.

Foroni, C. and M. Marcellino, 2014, Mixed frequency structural models: identification, estima-

tion, and policy analysis. Journal of Applied Econometrics 29, 1118–1144.

Foroni, C., M. Marcellino and C. Schumacher, 2015, Unrestricted mixed data sampling (MI-

DAS): MIDAS regressions with unrestricted lag polynomials. Journal of the Royal Statistical

Society Series A 178, 57–82.

Forsberg, L. and E. Ghysels, 2007, Why do absolute returns predict volatility so well? Journal

of Financial Econometrics 5, 31–67.

Ghysels, E., 2015, Macroeconomics and the reality of mixed frequency data. Working Paper,

Dept. of Economics, University of North Carolina at Chapel Hill.

Ghysels, E., J. B. Hill and K. Motegi, 2016a, Simple Granger causality tests for mixed frequency

data. Working paper, Dept. of Economics, University of North Carolina at Chapel Hill.

Ghysels, E., J. B. Hill and K. Motegi, 2016b, Testing for Granger causality with mixed frequency

data. Journal of Econometrics 192, 207–230.

55

Ghysels, E. and J. I. Miller, 2015, Testing for cointegration with temporally aggregated and

mixed-frequency time series. Journal of Time Series Analysis 36, 797–816.

Ghysels, E., P. Santa-Clara and R. Valkanov, 2004, The midas touch: Mixed data sampling

regression models. CIRANO Working Papers 2004s-20, CIRANO.

Ghysels, E., P. Santa-Clara and R. Valkanov, 2006, Predicting volatility: getting the most out

of return data sampled at different frequencies. Journal of Econometrics 131, 59–95.

Ghysels, E., A. Sinko and R. Valkanov, 2007, Midas regressions: further results and new

directions. Econometric Reviews 26, 53–90.

Ghysels, E. and R. Valkanov, 2012, Forecasting volatility with MIDAS, in: L. Bauwens, C.

Hafner and S. Laurent, (Eds.), Handbook of Volatility Models and Their Applications, Chap-

ter 16. John Wiley & Sons, Inc., Hoboken, NJ, USA, pp. 383–401.

Goncalves, S. and L. Kilian, 2004, Bootstrapping autoregressions with conditional heteroskedas-

ticity of unknown form. Journal of Econometrics 123, 89–120.

Goncalves, S. and B. Perron, 2014, Bootstrapping factor-augmented regression models. Journal

of Econometrics 182, 156–173.

Gotz, T. B. and A. Hecq, 2014, Nowcasting causality in mixed frequency vector autoregressive

models. Economics Letters 122, 74–78.

Gotz, T. B., A. Hecq and L. Lieb, 2015, Real-time mixed-frequency vars: nowcasting, back-

casting and granger causality. Working Paper.

Gotz, T. B., A. Hecq and J.-P. Urbain, 2013, Testing for common cycles in non-stationary VARs

with varied frecquency data, in: T. B. Fomby, L. Kilian and A. Murphy, (Eds.), Advances in

Econometrics, Vol. 32. Emerald Group Publishing Limited, pp. 361–393.

56

Gotz, T. B., A. Hecq and J.-P. Urbain, 2014, Forecasting mixed frequency time series with

ECM-MIDAS models. Journal of Forecasting 33, 198–213.

Granger, C. W. J., 1969, Investigating causal relations by econometric models and cross-spectral

methods. Econometrica 37, 424–438.

Granger, C. W. J., 1988, Some recent development in a concept of causality. Journal of

Econometrics 39, 199–211.

Granger, C. W. J. and J.-L. Lin, 1995, Causality in the long run. Econometric Theory 11,

530–536.

Hamilton, J. D. and L. Gang, 1996, Stock market volatility and the business cycle. Journal of

Applied Econometrics 11, 573–93.

Hansen, B. E., 1996, Inference when a nuisance parameter is not identified under the null

hypothesis. Econometrica 64, 413–30.

Heber, G., A. Lunde, N. Shephard and K. Sheppard, 2009, Oxford-man institute’s realized

library (version 0.2). Oxford-Man Institute, University of Oxford.

Hoerl, A. E. and R. W. Kennard, 1970, Ridge regression: biased estimation for nonorthogonal

problems. Technometrics 12, 55–67.

Horowitz, J. L., 2001, The bootstrap, in: J. J. Heckman and E. E. Leamer, (Eds.), Handbook

of Econometrics, Vol. 5, Chapter 52. North Holland Publishing, Amsterdam, pp.3159–3228.

Kadiyala, K. R. and S. Karlsson, 1997, Numerical methods for estimation and inference in

Bayesian VAR-models. Journal of Applied Econometrics 12, 99–132.

Kilian, L., 1998, Small-sample confidence intervals for impulse response functions. Review of

Economics and Statistics 80, 218–230.

57

Kuzin, V., M. Marcellino, and C. Schumacher, 2011, MIDAS vs. mixed-frequency VAR: now-

casting GDP in the euro area. International Journal of Forecasting 27, 529–542.

Litterman, R. B., 1986, Forecasting with Bayesian vector autoregressions - Five years of expe-

rience. Journal of Business & Economic Statistics 4, 25–38.

Lutkepohl, H., 1993, Testing for causation between two variables in higher dimensional VAR

models, in: H. Schneewei, Hans und K. F. Zimmermann, (Eds.), Studies in Applied Econo-

metrics. Springer-Verlag, Heidelberg, pp. 75–91.

Marcellino, M., 1999, Some consequences of temporal aggregation in empirical analysis. Journal

of Business & Economic Statistics 17, 129–136.

Marcellino, M. and C. Schumacher, 2010, Factor MIDAS for Nowcasting and Forecasting with

Ragged-Edge Data: a Model Comparison for German GDP. Oxford Bulletin of Economics

and Statistics 72, 518–550.

McCracken, M. W., M. T. Owyang and T. Sekhposyan, 2015, Real-Time forecasting with a

large, mixed frequency, Bayesian VAR. Working Paper 2015-30, Federal Reserve Bank of St.

Louis.

Mele, A., 2007, Asymmetric stock market volatility and the cyclical behavior of expected

returns. Journal of Financial Economics 86, 446–478.

Miller, J. I., 2011, Conditionally efficient estimation of long-run relationships using mixed-

frequency time series. Working Paper 1103, Department of Economics, University of Missouri.

Miller, J. I., 2014, Mixed-frequency cointegrating regressions with parsimonious distributed lag

structures. Journal of Financial Econometrics 12, 584–614.

Paparoditis, E., 1996, Bootstrapping autoregressive and moving average parameter estimates of

infinite order vector autoregressive processes. Journal of Multivariate Analysis 57, 277–296.

58

Paparoditis, E. and D. N. Politis, 2005, Bootstrap hypothesis testing in regression models.

Statistics & Probability Letters 74, 356–365.

Park, T. and G. Casella, 2008, The Bayesian Lasso. Journal of American Statistical Association

103, 681–686.

Ravikumar, B., S. Ray and N. E. Savin, 2000, Robust Wald tests in SUR systems with adding-

up restrictions. Econometrica 68, 715–720.

Schorfheide, F. and D. Song, 2015, Real-time forecasting with a mixed-frequency VAR. Journal

of Business & Economics Statistics 33, 366–380.

Schwert, G. W., 1989a, Business cycles, financial crises, and stock volatility. Carnegie-Rochester

Conference Series on Public Policy 31, 83–125.

Schwert, G. W., 1989b, Why does stock market volatility change over time? Journal of Finance

44, 1115–53.

Silvestrini, A. and D. Veredas, 2008, Temporal aggregation of univariate and multivariate time

series models: a survey. Journal of Economic Surveys 22, 458–497.

Sims, C. A., 1971, Discrete approximations to continuous time distributed lags in econometrics.

Econometrica 39, 545–563.

Sims, C. A. and T. Zha, 1998, Bayesian methods for dynamic multivariate models. International

Economic Review 39, 949–68.

Tibshirani, R., 1996, Regression shrinkage and selection via the Lasso. Journal of the Royal

Statistical Society (Series B) 58, 267–288.

Vahid, F. and Engle, R. F., 1993, Common trends and common cycles. Journal of Applied

Econometrics 8, 341–360.

59

Van Giersbergen, N. P. A. and J. F. Kiviet, 1996, Bootstrapping a stable AD model: weak vs

strong exogeneity. Oxford Bulletin of Economics and Statistics 58, 631–656.

Vogel, C. R., 2002, Computational methods for inverse problems. Society for Industrial and

Applied Mathematics, Philadelphia.

Waggoner, D. F. and T. Zha, 2003, A Gibbs sampler for structural vector autoregressions.

Journal of Economic Dynamics and Control 28, 349–366.

White, H., 2000, A reality check for data snooping. Econometrica 68, 1097–1126.

Zellner, A., 1996, An introduction to bayesian inference in econometrics. John Wiley and Sons,

Inc.

60

E Tables

61

Tab

le1:

Siz

eof

Gra

nge

rN

on-C

ausa

lity

Tes

tsfr

omX

(m)

toy

X(m

)toy

TL

FU

NR

CC

AP

LS

HA

Rmax

BM

FU

NR∗

CC

A∗

PL

S∗

HA

R∗

σHL

=0

505.4

13.4

18.8

11.4

6.9

4.3

5.7

5.0

4.6

4.9

5.2

250

4.7

6.4

28.6

12.4

5.4

4.4

5.2

4.9

4.7

5.3

5.3

500

4.9

5.6

29.2

12.8

5.2

4.1

5.3

5.1

5.2

6.2

5.3

σHL

=−

0.0

550

5.3

12.6

20.9

16.7

6.6

4.6

5.6

4.7

5.6

8.2

5.0

250

4.8

5.8

28.3

17.1

5.5

4.4

5.2

4.7

4.8

9.2

5.4

500

4.8

6.2

30.0

16.9

4.4

4.2

5.3

5.6

5.2

8.5

4.5

σHL

=−

0.1

505.0

12.5

20.0

32.6

6.3

4.6

5.0

4.8

4.8

19.6

4.8

250

4.6

7.2

28.8

35.7

5.5

4.8

5.1

5.8

5.8

23.2

5.2

500

4.7

5.4

28.9

36.5

5.3

4.5

5.2

4.8

6.3

23.0

5.2

Note

:F

or

test

ing

Gra

nger

non-c

ausa

lity

fromX

(m)

toy

the

figure

sre

pre

sent

per

centa

ges

of

reje

ctio

ns

of

the

asy

mpto

tic

Wald

test

stati

stic

sand

the

rest

rict

edb

oots

trap

vari

ants

(indic

ate

dw

ith

asu

per

scri

pt

’*’)

of

the

follow

ing

appro

ach

es:

The

low

-fre

quen

yV

AR

(LF

),th

eunre

stri

cted

appro

ach

(UN

R),

reduce

dra

nk

rest

rict

ions

wit

hone

fact

or

usi

ng

canonic

al

corr

elati

ons

analy

sis

(CC

A)

or

part

ial

least

square

s(P

LS),

and

wit

hth

ree

imp

ose

dfa

ctors

usi

ng

the

HA

Rm

odel

(HA

R),

themax

-tes

t(max

)and

the

Bay

esia

nm

ixed

-fre

quen

cyappro

ach

(BM

F).

The

lag

length

of

the

esti

mate

dV

AR

sis

equal

toone.

The

under

lyin

gD

GP

isfo

und

in(2

7),

wher

eth

eva

riance

-cov

ari

ance

matr

ixof

the

erro

rte

rmis

equal

toΣu

wit

hσHL

=0,−

0.0

5,−

0.1

.

62

Tab

le2:

Siz

eof

Gra

nge

rN

on-C

ausa

lity

Tes

tsfr

omy

toX

(m)

ytoX

(m)

TL

FU

NR

CC

AP

LS

HA

Rmax

BM

FU

NR∗

CC

A∗

PL

S∗

HA

R∗

σHL

=0

505.7

91.1

82.0

58.6

59.8

6.1

7.9

4.6

8.8

5.0

5.2

250

5.5

11.5

11.8

10.6

10.2

4.6

10.4

5.2

5.3

5.0

5.3

500

5.2

6.7

7.4

6.8

6.6

5.9

7.8

4.6

4.8

4.6

4.8

σHL

=−

0.0

550

5.6

92.4

83.8

62.0

60.6

5.7

4.8

5.6

9.4

4.9

5.6

250

5.4

10.9

13.2

11.9

10.3

4.8

8.9

4.7

4.5

4.4

4.8

500

5.0

7.4

9.2

8.7

7.3

5.5

7.3

5.4

5.4

5.3

5.2

σHL

=−

0.1

505.7

92.1

87.1

64.2

58.5

5.7

0.5

5.5

9.2

4.0

4.4

250

4.7

11.0

20.5

14.8

10.5

5.0

5.4

5.2

6.3

5.1

5.3

500

5.3

7.4

13.8

10.7

7.2

5.5

5.0

5.3

5.7

4.2

5.2

Note

:F

or

test

ing

Gra

nger

non-c

ausa

lity

fromy

toX

(m)

the

figure

sre

pre

sent

per

centa

ges

of

reje

ctio

ns

of

the

asy

mpto

tic

Wald

test

stati

stic

sand

the

unre

stri

cted

boots

trap

vari

ants

(indic

ate

dw

ith

asu

per

scri

pt

’*’)

of

the

follow

ing

appro

ach

es:

The

low

-fre

quen

cyV

AR

(LF

),th

eunre

stri

cted

appro

ach

(UN

R),

reduce

dra

nk

rest

rict

ions

wit

hone

fact

or

usi

ng

canonic

al

corr

elati

ons

analy

sis

(CC

A)

or

part

ial

least

square

s(P

LS),

and

wit

hth

ree

imp

ose

dfa

ctors

usi

ng

the

HA

Rm

odel

(HA

R),

themax

-tes

t(max

)and

the

Bay

esia

nm

ixed

-fre

quen

cyappro

ach

(BM

F).

The

lag

length

of

the

esti

mate

dV

AR

sis

equal

toone.

The

under

lyin

gD

GP

isfo

und

in(2

7),

wher

eth

eva

riance

-cov

ari

ance

matr

ixof

the

erro

rte

rmis

equal

toΣu

wit

hσHL

=0,−

0.0

5,−

0.1

.

63

Tab

le3:

Siz

eof

Gra

nge

rN

on-C

ausa

lity

Tes

tsfr

omy

toX

(m)

(Bon

ferr

oni)

ytoX

(m)

TU

NRb

CC

Ab

PL

Sb

HA

Rb

BM

Fb

UN

R∗ b

CC

A∗ b

PL

S∗ b

HA

R∗ b

σHL

=0

5016

.917

.316

.413

.92.

93.

44.

63.

93.

725

012

.713

.613

.112

.75.

14.

65.

05.

04.

950

012

.412

.412

.112

.24.

45.

05.

04.

64.

9

σHL

=−

0.0

550

17.9

18.9

18.3

15.2

2.1

3.8

4.8

4.0

4.0

250

13.4

14.5

14.4

12.8

4.8

5.4

5.2

5.4

5.4

500

12.3

13.7

14.8

12.5

4.0

5.0

5.2

5.0

4.6

σHL

=−

0.1

5017

.220

.221

.615

.40.

43.

35.

04.

34.

425

012

.015

.417

.012

.23.

34.

95.

34.

84.

750

011

.914

.917

.011

.73.

35.

45.

25.

15.

2

Note

:F

or

test

ing

Gra

nger

non-c

ausa

lity

fromy

toX

(m)

the

figure

sre

pre

sent

per

centa

ges

of

reje

ctio

ns

of

the

Bonfe

rroni-

typ

ete

sts

(indic

ate

dw

ith

asu

bsc

ript

’b’)

for

both

the

asy

mpto

tic

Wald

test

stati

stic

sand

the

unre

stri

cted

boots

trap

vari

ants

(indic

ate

dw

ith

asu

per

scri

pt

’*’)

of

the

follow

ing

appro

ach

es:

The

unre

stri

cted

appro

ach

(UN

R),

reduce

dra

nk

rest

rict

ions

wit

hone

fact

or

usi

ng

canonic

al

corr

elati

ons

analy

sis

(CC

A)

or

part

ial

least

square

s(P

LS),

and

wit

hth

ree

imp

ose

dfa

ctors

usi

ng

the

HA

Rm

odel

(HA

R)

and

the

Bay

esia

nm

ixed

-fre

quen

cyappro

ach

(BM

F).

For

the

low

-fre

quen

cyV

AR

(LF

)and

themax

-tes

t(max

)th

eB

onfe

rroni-

typ

ete

stis

not

applica

ble

.T

he

lag

length

of

the

esti

mate

dV

AR

sis

equal

toone.

The

under

lyin

gD

GP

isfo

und

in(2

7),

wher

eth

eva

riance

-cov

ari

ance

matr

ixof

the

erro

rte

rmis

equal

toΣu

wit

hσHL

=0,−

0.0

5,−

0.1

.

64

Tab

le4:

Pow

erof

Gra

nge

rN

on-C

ausa

lity

Tes

tsfr

omX

(m)

toy

X(m

)toy

TL

Fmax

BM

FU

NR∗

CC

A∗

PL

S∗

HA

R∗

Pow

er-D

GP

1,

β=

0.5

5064

.139

.652

.823

.112

.754

.062

.625

099

.910

0.0

99.8

99.8

96.0

98.7

100.

050

010

0.0

100.

010

0.0

100.

010

0.0

100.

010

0.0

Pow

er-D

GP

1,

β=

0.8

5093

.679

.492

.254

.626

.677

.993

.025

010

0.0

100.

010

0.0

100.

010

0.0

100.

010

0.0

500

100.

010

0.0

100.

010

0.0

100.

010

0.0

100.

0

Pow

er-D

GP

1,

β=

2

5010

0.0

100.

010

0.0

100.

085

.398

.410

0.0

250

100.

010

0.0

100.

010

0.0

100.

010

0.0

100.

050

010

0.0

100.

010

0.0

100.

010

0.0

100.

010

0.0

Pow

er-D

GP

2,

β=

0.5

505.

814

.521

.99.

78.

025

.122

.325

05.

075

.979

.271

.258

.879

.789

.450

04.

798

.699

.098

.387

.697

.399

.8

Note

:F

or

test

ing

Gra

nger

non-c

ausa

lity

fromX

(m)

toy

the

figure

sre

pre

sent

per

centa

ges

of

reje

ctio

ns

of

the

asy

mpto

tic

Wald

test

stati

stic

sof

the

low

-fre

quen

yV

AR

(LF

),th

emax

-tes

t(max

)and

the

Bay

esia

nm

ixed

-fre

quen

cyappro

ach

(BM

F).

Furt

her

more

,it

conta

ins

per

centa

ges

of

reje

ctio

ns

of

the

rest

rict

edb

oots

trap

vari

ants

(indic

ate

dw

ith

asu

per

scri

pt

’*’)

of

the

unre

stri

cted

appro

ach

(UN

R),

reduce

dra

nk

rest

rict

ions

wit

hone

fact

or

usi

ng

canonic

al

corr

elati

ons

analy

sis

(CC

A)

or

part

ial

least

square

s(P

LS),

and

wit

hth

ree

imp

ose

dfa

ctors

usi

ng

the

HA

Rm

odel

(HA

R).

The

lag

length

of

the

esti

mate

dV

AR

sis

equal

toone.

The

under

lyin

gD

GP

isfo

und

in(2

8)

or

(29),

wher

eth

eva

riance

-cov

ari

ance

matr

ixof

the

erro

rte

rmis

equal

toΣu

wit

hσHL

=−

0.0

5.

Inp

ower

-DG

P1

the

Gra

nger

causa

lity

det

erm

inin

gco

effici

ents

do

not

sum

up

toze

ro,

wher

eas

they

do

soin

pow

er-D

GP

2.

See

Sec

tion

4fo

rdet

ails.β

=0.5,0.8,2

inp

ower

-DG

P1

andβ

=0.5

inp

ower

-DG

P2.

65

Tab

le5:

Pow

erof

Gra

nge

rN

on-C

ausa

lity

Tes

tsfr

omy

toX

(m)

ytoX

(m)

TL

Fmax

BM

FU

NR∗

CC

A∗

PL

S∗

HA

R∗

UN

R∗ b

CC

A∗ b

PL

S∗ b

HA

R∗ b

Pow

er-D

GP

150

6.9

8.0

15.8

5.6

9.9

6.2

6.2

6.8

11.4

10.7

10.8

250

19.9

24.2

31.0

17.7

20.0

20.3

19.3

51.0

54.3

54.7

53.8

500

36.4

53.2

49.1

39.3

44.7

43.9

42.4

83.1

85.2

85.2

84.6

Pow

er-D

GP

250

4.7

7.9

18.4

5.7

10.6

5.7

5.3

5.9

10.0

8.9

8.0

250

5.0

22.9

25.1

13.5

14.6

14.3

14.3

32.2

34.7

34.2

32.6

500

5.2

49.9

33.7

26.6

29.9

28.4

27.7

57.3

62.2

62.2

59.3

Note

:F

or

test

ing

Gra

nger

non-c

ausa

lity

fromy

toX

(m)

the

figure

sre

pre

sent

per

centa

ges

of

reje

ctio

ns

of

the

asy

mpto

tic

Wald

test

stati

stic

sof

the

low

-fre

quen

yV

AR

(LF

),th

emax

-tes

t(max

)and

the

Bay

esia

nm

ixed

-fre

quen

cyappro

ach

(BM

F).

Furt

her

more

,it

conta

ins

per

centa

ges

of

reje

ctio

ns

of

the

unre

stri

cted

boots

trap

vari

ants

(indic

ate

dw

ith

asu

per

scri

pt

’*’)

of

the

unre

stri

cted

appro

ach

(UN

R),

reduce

dra

nk

rest

rict

ions

wit

hone

fact

or

usi

ng

canonic

al

corr

elati

ons

analy

sis

(CC

A)

or

part

ial

least

square

s(P

LS),

and

wit

hth

ree

imp

ose

dfa

ctors

usi

ng

the

HA

Rm

odel

(HA

R).

For

the

latt

erse

tof

appro

ach

es,

itals

osh

ows

the

outc

om

esfo

rth

eB

onfe

rroni-

typ

ete

sts

(indic

ate

dw

ith

asu

bsc

ript

’b’)

.T

he

lag

length

of

the

esti

mate

dV

AR

sis

equal

toone.

The

under

lyin

gD

GP

isfo

und

in(2

8)

or

(29),

wher

eth

eva

riance

-cov

ari

ance

matr

ixof

the

erro

rte

rmis

equal

toΣu

wit

hσHL

=−

0.0

5.

Inp

ower

-DG

P1

the

Gra

nger

causa

lity

det

erm

inin

gco

effici

ents

do

not

sum

up

toze

ro,

wher

eas

they

do

soin

pow

er-D

GP

2.

See

Sec

tion

4fo

rdet

ails.β

=0.5

andγ2,1

=0.2

inb

oth

pow

er-D

GP

s.

66

Table 6: Testing for Granger Causality between BV (20) and ipi

BV (20) to ipip = 1 p = 2

Wald Robust Wald Robust

LF 0.0 0.2 0.1 2.1max 0.1 ./. 0.0 ./.BMF 14.4 ./. 21.8 ./.

UNR∗ 15.0 17.4 7.3 8.7CCA1∗ 2.9 7.9 12.4 20.4CCA2∗ 14.5 16.2 19.2 25.1PLS1∗ 0.1 0.7 0.2 1.9PLS2∗ 0.6 2.8 0.1 1.4HAR∗ 1.2 2.2 1.5 2.4

ipi to BV (20) p = 1 p = 2Wald Robust Wald Robust

LF 0.1 1.5 0.4 5.0max 4.4 ./. 1.1 ./.BMF 23.5 ./. 42.5 ./.

UNR∗ 28.9 34.2 25.8 26.4CCA1∗ 25.8 33.5 20.5 38.4CCA2∗ 35.9 46.1 17.6 34.3PLS1∗ 15.8 24.6 11.4 24.4PLS2∗ 18.3 26.9 10.3 18.0HAR∗ 19.2 27.0 11.4 18.6

UNR∗b 5.4 8.9 20.0 22.9CCA1∗b 4.1 10.5 7.8 16.9CCA2∗b 5.0 10.8 7.7 15.1PLS1∗b 1.5 4.3 1.5 9.5PLS2∗b 1.2 5.2 1.1 4.0HAR∗b 2.3 6.5 6.2 14.1

Note: For testing Granger non-causality between BV (20) and ipi the figures represent p-values (in percentages)of the asymptotic Wald test statistics, the restricted bootstrap variants for the direction from BV (20) to ipi andthe unrestricted bootstrap variants for the direction from ipi to BV (20) (bootstrap variants are indicated with asuperscript ’*’, the Bonferroni-type tests with a subscript ’b’). For all approaches but the max-test (max) andthe Bayesian MF-VAR (BMF) a robustified version is computed as well. The lag length of the estimated VARsis equal to one or two. For CCA and PLS the results for one and two factors are displayed.

67

F Figures

Figure 1: Parameter values for 2w∗j−1(−0.01) and γ∗j,1 (γ2,1 = 0.25)

68

Figure 2: ipi and BV (20)

Note: This figure shows the monthly growth rate of the U.S. industrial production index (lower line), i.e., ipi,

and the logarithm of daily bipower variation of the S&P500 stock index (top lines), i.e., BV (20), for the time

period from January 2000 to June 2012. The graph for the former is shifted downwards to enable a visual

separation of ipi from the bv-lines.

69

researchersresearchers-sbe.unimaas.nl/alainhecq/wp-content/uploads/... · 2016. 3. 8. · testing...

Documents