Barcelona GSE Working Paper Series
Working Paper nº 723
Nets: Network Estimation for Time Series
Matteo Barigozzi Christian Brownlees
October 2013
Nets:
Network Estimation for Time Series
Matteo Barigozzi‡ Christian Brownlees†
This Draft: October, 2013
Abstract
This work proposes novel network analysis techniques for multivariate time series. We define the network of a multivariate time series as a graph where vertices denote the components of the process and edges denote non–zero long run partial correlations. We then introduce a two step lasso procedure, called nets, to estimate high–dimensional sparse Long Run Partial Correlation networks. This approach is based on a var approximation of the process and allows us to decompose the long run linkages into the contribution of the dynamic and contemporaneous dependence relations of the system. The large sample properties of the estimator are analysed and we establish conditions for consistent selection and estimation of the non–zero long run partial correlations. The methodology is illustrated with an application to a panel of U.S. bluechips.
Keywords: Networks, Multivariate Time Series, Long Run Covariance, Lasso
JEL: C01, C32, C52
‡ Department of Statistics, London School of Economics and Political Science, e-mail: [email protected];
† Department of Economics and Business, Universitat Pompeu Fabra and Barcelona GSE, e-mail: [email protected];
The proofs of the main results of the paper are contained in an appendix available online from the authors' websites. The procedures presented in this paper are available in the package nets for R.
1 Introduction
In recent years, network analysis has become increasingly popular in economics and
finance. In a nutshell, network analysis aims at representing the interconnections of a
large multivariate system as a graph, and the graph representation is then used to study
the properties of the system. In finance, these methods have been applied to study
interconnections among financial institutions in order to identify channels of contagion.
Applications in this area have attracted considerable interest in the aftermath of the
2007–2009 financial crisis. Examples of contributions on network analysis in the literature
include, inter alia, Billio, Getmansky, Lo, and Pelizzon (2012), Diebold and Yilmaz (2013) and
Hautsch, Schaumburg, and Schienle (2012, 2013).
In this work we propose a novel network analysis technique to represent the cross–
sectional conditional dependence structure of a high–dimensional multivariate time series.
This entails two tasks. We first introduce a measure of cross–sectional conditional depen-
dence for time series that is going to be used to define the network. We then develop an
estimation approach that allows us to estimate the dependence measure from the data.
The standard network dependence measure used in the statistics and graphical mod-
elling literature is partial correlation (see e.g. Dempster, 1972; Meinshausen and Buhlmann,
2006). The partial correlation network is defined as an undirected weighted graph where
vertices denote the components of the process and the presence of an edge between the
i-th and j-th vertices indicates that the i-th and j-th components are partially correlated
given all other components. The network edges are then associated with the value of the
partial correlation between the vertices they connect to express the weight of the link.
Partial correlation networks are typically motivated by the analysis of i.i.d. Gaussian data.
In this setting, absence of partial correlation implies conditional independence, thus the
partial correlation network exhaustively characterizes cross–sectional dependence. This
definition however has limitations in the analysis of time series processes exhibiting serial
dependence. Partial correlation only captures contemporaneous cross–sectional depen-
dence, while in a time series setting cross–sectional dependence can arise at different
leads or lags. In fact, in the time series network literature Billio et al. (2012) and
Diebold and Yilmaz (2013) have proposed network definitions that take into account the
dynamic properties of the data.
In this work we propose a network definition based on long run partial correlation,
a dependence measure constructed from the long run covariance matrix of the process.
The Long Run Partial Correlation network is defined analogously to the (simple) partial
correlation network by using the long run partial correlation as a measure of dependence
between two time series. Long run partial correlation is a comprehensive and model–
free measure of cross–sectional conditional dependence. It captures contemporaneous
correlation as well as lead/lag effects. Moreover, it is model–free in the sense that it does
not hinge on specific modelling assumptions on the process. These are appealing features
for economic and financial applications where dependence across series is not necessarily
only contemporaneous and correct model specification is often a concern. Long run partial
correlation can also be interpreted as a special case of partial spectral coherence, a cross–
sectional conditional dependence measure based on the spectral density matrix used in
frequency domain analysis (see Brillinger, 1981; Dahlhaus, 2000; Eichler, 2007).
In order to estimate the Long Run Partial Correlation network from the data, we pro-
pose an estimation approach that combines ideas from the high–dimensional sparse graph
estimation literature in statistics (see Meinshausen and Buhlmann, 2006; Peng, Wang,
Zhou, and Zhu, 2009) and the heteroskedasticity and autocorrelation consistent (hac)
covariance estimation literature in econometrics (see White, 1984; Gallant, 1987; Newey and
West, 1987; Andrews and Monahan, 1992; Den Haan and Levin, 1994). Following Mein-
shausen and Buhlmann (2006) and Peng et al. (2009), we are concerned with the case in
which the network is sparse and the main inferential problem of interest consists of select-
ing and estimating the non–zero long run partial correlations of the system. Graphically
this corresponds to detecting the network linkages and estimating their weight. It can be
shown that long run partial correlations are a function of the inverse of the long run co-
variance matrix (i.e. the long run concentration matrix). Moreover, the long run (i, j)-th
partial correlation is zero if and only if the (i, j)-th element of the long run concentration
matrix is zero. This implies that network estimation can be formulated as a sparse long
run concentration matrix estimation problem. A sufficiently regular multivariate time
series can be represented as a Vector Autoregression (var) of infinite order and the long
run covariance matrix of such representation is equal to the long run covariance of the
original process. Furthermore, the long run covariance is available in closed form and
its inverse is a “sandwich” form made up of two components: (i) a matrix that depends
on the var parameters that summarises the long run Granger causality structure of the
process (which we name Granger component) and (ii) the concentration matrix of the
var innovations that synthesises contemporaneous dependence conditionally on the past
information (contemporaneous component). These two components can be further
associated with two networks representing predictive and contemporaneous association among
series. From a graphical perspective such representation is appealing in that the long
run partial correlation linkages can be interpreted as a (nontrivial) combination of the
Granger and contemporaneous links.
We propose a natural regression procedure based on the var representation of the data
generating process to estimate the Long Run Partial Correlation network. The algorithm consists
of estimating the var parameters and the concentration matrix of the var residuals using
lasso regressions. The outputs of the procedure are the Long Run Partial Correlation
network as well as the Granger and Contemporaneous networks. We name this procedure
nets (Network Estimator for Time Series). The properties of this estimation strategy
are established under the assumption that the var representation is sparse. The spar-
sity of the var representation determines in turn the sparsity structure of the Granger,
Contemporaneous, and Long Run Partial Correlation networks. We establish conditions
for consistent selection of the network edges and consistent estimation of edges’ weights.
An important highlight of the theory developed in this work is that results are derived
in a high–dimensional setting that allows for the number of series and the order of the var
to increase with the sample size. This implies that the number of possible linkages is
allowed to be larger than the number of observations available. The theory is developed for
a zero–mean, purely–nondeterministic, weakly stationary process. Standard results from
the hac literature (see Den Haan and Levin, 1994) lead us to conjecture that the results
should also hold for more general classes of processes that allow, among other features, for
heteroskedasticity.
We illustrate this technique with an application to financial network analysis based
on a panel of monthly equity returns for a set of 41 bluechips across different
industry groups between 1990 and 2010. Building on standard models used in empirical
finance, we model the returns using a capm type one factor model: the returns of each
firm are a linear function of the market return and an idiosyncratic shock. We then
use nets to analyse the network structure of the idiosyncratic shocks. We find that
the market accounts on average for 25% of the variation of returns while links of the
idiosyncratic network account on average for another 15%. Moreover, a number of network
analysis indices are used to summarise the properties of the network and we find that
the idiosyncratic risk network shares several of the empirical regularities found in social
network analysis.
Our work relates to different strands of literature. This paper is inspired and motivated
by the econometric literature on the analysis of financial networks that has developed in
the aftermath of the 2007–2009 financial crisis. Contributions in this area include Billio
et al. (2012), Diebold and Yilmaz (2013), Hautsch et al. (2012, 2013), Dungey, Luciani,
and Veredas (2012). Our contribution builds on the statistical literature on networks
and graphical models (see Lauritzen, 1996). In particular, the relation between partial
correlation and the inverse of the covariance matrix was first put forward in Dempster
(1972). Recently, research in the area was boosted by a number of contributions developing
techniques for high–dimensional partial correlation network estimation for independent
data using the lasso, inter alia Meinshausen and Buhlmann (2006), Friedman, Hastie,
and Tibshirani (2008), Peng et al. (2009). See also, for instance, Peterson, Vannucci,
Karakas, Choi, Ma, and Maletic-Savatic (2013) for a contribution on graph modelling
from a Bayesian perspective. Also, our work relates to the research of De Mol, Gian-
none, and Reichlin (2008), Song and Bickel (2011), Jarocinski and Mackowiak (2011),
Davis, Zang, and Zheng (2012), Kock (2012), Kock and Callot (2012), and Medeiros
and Mendes (2012) on sparse estimation for large var models. The estimation approach
developed in this paper is inspired by the hac covariance estimation literature, which
includes contributions by White (1984), Gallant (1987), Newey and West (1987), Andrews
and Monahan (1992), Den Haan and Levin (1994). The estimation approach developed in
this work is high–dimensional, in the sense that we allow for the number of parameters to
be estimated to be larger than the number of observations available. Contributions and
surveys in the high–dimensional literature include Belloni, Chernozhukov, and Hansen
(2011) and Buhlmann and van de Geer (2011) among others. This research also relates to
the literature on regularized covariance estimation by Bickel and Levina (2008), Lam and
Fan (2009), Ledoit and Wolf (2004), Pourahmadi (2011), Fan, Liao, and Mincheva (2011,
2013), and Bai and Liao (2012). Finally, theoretical research on networks in economics
and finance, also in relation to the financial crisis, was put forward by Allen and Gale
(2000) and Freixas, Parigi, and Rochet (2000). More recent influential contributions
include Acemoglu, Carvalho, Ozdaglar, and Tahbaz-Salehi (2012) and Acemoglu, Ozdaglar,
and Tahbaz-Salehi (2013).
The paper is structured as follows. Section 2 introduces the definition of network for
time series and Section 3 presents the estimation approach. Section 4 presents simulation
results. Section 5 describes the empirical application to a panel of equity returns for
U.S. bluechips. Concluding remarks follow in Section 6.
2 Networks for Time Series
We are concerned with using a graphical model to represent the cross–sectional conditional
dependence structure of an n–dimensional multivariate time series
$$ y_t = (y_{t\,1}, \ldots, y_{t\,n})'. $$
2.1 Partial Correlation Network
First let us assume that yt is a white noise process. The standard network definition com-
monly used in the statistics literature is the partial correlation network, (see Dempster,
1972; Meinshausen and Buhlmann, 2006). The process yt is represented as an undirected
weighted graph N = (V , E) where V is the set of vertices and E is the set of edges. The
set of vertices V is {1, ..., n} where each element corresponds to a component of yt. The
set of edges E is a subset of V × V such that the pair (i, j) is in E if and only if the
components i and j are partially correlated conditionally on all other components of the
process. Partial correlation is a linear measure of cross–sectional conditional dependence
and it is defined as the correlation between $y_{t\,i}$ and $y_{t\,j}$ conditionally on all other variables
in the system, that is
$$ \rho^{ij} = \mathrm{Cor}(y_{t\,i}, y_{t\,j} \mid \{y_{t\,k} : k \neq i, j\}). $$
The set of edges of the partial correlation network can then be expressed as
$$ \mathcal{E} = \{ (i, j) \in \mathcal{V} \times \mathcal{V} : \rho^{ij} \neq 0 \}, $$
and $\rho^{ij}$ is the weight associated with the edge between i and j. It is typically assumed
that the partial correlation structure of the process is sparse (see e.g. Meinshausen and
Buhlmann, 2006; Peng et al., 2009). The objective of network estimation is to use a real-
ization of the process to detect which edges are present and to estimate the corresponding
partial correlations.
It is useful to recall a number of properties of partial correlation. First, if the white
noise is also Gaussian, then absence of partial correlation implies conditional indepen-
dence. Second, partial correlation is related to linear regression. Consider the linear
regression model where variable i is regressed on all other remaining variables in the
system, that is
$$ y_{t\,i} = c + \beta_{1i}\, y_{t\,1} + \ldots + \beta_{(i-1)i}\, y_{t\,i-1} + \beta_{(i+1)i}\, y_{t\,i+1} + \ldots + \beta_{ni}\, y_{t\,n} + e_{t\,i} $$
where $E(e_{t\,i}) = 0$, $\mathrm{Var}(e_{t\,i}) = \sigma_i^2$ and $E(y_{t\,j}\, e_{t\,i}) = 0$ for all $j \neq i$. It can be shown that, for
any $j \neq i$, the partial correlation $\rho^{ij}$ is different from zero if and only if $\beta_{ij}$ is different from
zero. In other words, partial correlation signals that yt j has additional information for
predicting yt i after conditioning on all the other variables in the system (and vice versa).
Partial correlation is also related to (simple) correlation. If in the partial correlation
network there exists a path between the two vertices, then the two vertices are correlated
(and vice versa). Finally, partial correlation can be expressed as a function of the inverse
of the covariance matrix of the process. Let $\Sigma = \mathrm{Cov}(y_t, y_t)$ and let $K \equiv \Sigma^{-1}$, which is
also commonly referred to as the concentration matrix. Let $k_{ij}$ denote the (i, j)-th entry
of K. Then, the (i, j)-th partial correlation can be expressed as
$$ \rho^{ij} = \frac{-k_{ij}}{\sqrt{k_{ii}\, k_{jj}}}. \qquad (1) $$
This relation implies that the structure of the partial correlation network is fully char-
acterized by the concentration matrix of the process. This property is important for
estimation, in that network estimation is typically reformulated as a sparse concentration
matrix estimation problem.
Example 1. Let yt be a zero mean Gaussian white noise process with n = 5 and with
concentration matrix
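$$ K \equiv \Sigma^{-1} = \begin{pmatrix}
 1.0 & -0.2 & 0.0 & 0.0 & 0.0 \\
-0.2 &  1.0 & 0.5 & -0.3 & 0.0 \\
 0.0 &  0.5 & 1.0 & 0.0 & 0.0 \\
 0.0 & -0.3 & 0.0 & 1.0 & -0.7 \\
 0.0 &  0.0 & 0.0 & -0.7 & 1.0
\end{pmatrix}. $$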
The partial correlation network associated with this process is plotted in Figure 1.
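As an illustration of equation (1), the following minimal sketch computes the partial correlations implied by the concentration matrix of Example 1 and lists the resulting edges. The function names `partial_correlations` and `edge_list` and the tolerance `tol` are our own illustrative choices and are not part of the nets package.

```python
import math

# Concentration matrix K from Example 1 (n = 5).
K = [
    [ 1.0, -0.2,  0.0,  0.0,  0.0],
    [-0.2,  1.0,  0.5, -0.3,  0.0],
    [ 0.0,  0.5,  1.0,  0.0,  0.0],
    [ 0.0, -0.3,  0.0,  1.0, -0.7],
    [ 0.0,  0.0,  0.0, -0.7,  1.0],
]

def partial_correlations(K):
    """Equation (1): rho_ij = -k_ij / sqrt(k_ii * k_jj) for i != j."""
    n = len(K)
    rho = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rho[i][j] = -K[i][j] / math.sqrt(K[i][i] * K[j][j])
    return rho

def edge_list(rho, tol=1e-12):
    """Undirected edges (1-based vertex labels) whose weight is non-zero."""
    n = len(rho)
    return [(i + 1, j + 1, rho[i][j])
            for i in range(n) for j in range(i + 1, n)
            if abs(rho[i][j]) > tol]

rho = partial_correlations(K)
edges = edge_list(rho)
for i, j, w in edges:
    print(f"{i} -- {j}: rho = {w:+.2f}")
```

Since the diagonal of K is equal to one in this example, each edge weight is simply the negative of the corresponding off-diagonal entry of K.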
2.2 Long Run Partial Correlation Network
Partial correlation is an exhaustive measure of conditional dependence for white noise pro-
cesses. However, this measure has limitations when considering more general time series
processes. The main shortcoming is that partial correlation captures only contemporane-
ous dependence between the variables in the system. In a time series context, however,
cross–sectional dependence is not necessarily just contemporaneous.
In this work we propose a network definition for time series that attempts to overcome
the limitations of partial correlations. The network definition we adopt is based on long
run partial correlation, a measure of cross–sectional conditional dependence constructed
from the long run covariance matrix of the process. The long run covariance matrix of
the process yt can be defined as
$$ \Sigma_L \equiv \lim_{M \to \infty} \frac{1}{M}\, \mathrm{Cov}\left( \sum_{t=1}^{M} y_t, \sum_{t=1}^{M} y_t \right), \qquad (2) $$
assuming the limit exists. The main highlight of the definition in equation (2) is that the
rescaled limiting covariance of the aggregated process summarises cross–sectional depen-
dences at any lead/lag. Also, our definition simplifies to the standard definition of partial
correlation network in case the time series process is a white noise. Clearly, our measure
of dependence in (2) only captures linear dependence, and it does not capture other types
of cross–sectional dependence.
We present a stylized example for a 2–dimensional process in order to point out the
differences between the (simple) covariance and the long run covariance matrix.
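Example 2. Let $y_t$ be a vma process defined as
$$ \begin{pmatrix} y_{t\,1} \\ y_{t\,2} \end{pmatrix} = \begin{pmatrix} \varepsilon_{t\,1} \\ \varepsilon_{t\,2} \end{pmatrix} + \begin{pmatrix} 0 & \psi \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \varepsilon_{t-1\,1} \\ \varepsilon_{t-1\,2} \end{pmatrix} \quad \text{with} \quad \begin{pmatrix} \varepsilon_{t\,1} \\ \varepsilon_{t\,2} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix} \right), $$
and $\mathrm{Cov}(\varepsilon_{t\,j}, \varepsilon_{t-h\,i}) = 0$, for any i, j and $h \neq 0$. The contemporaneous correlation between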
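the two components is zero
$$ \mathrm{Cor}(y_{t\,1}, y_{t\,2}) = 0. $$
The correlation of the aggregated process over M = 12 periods is
$$ \mathrm{Cor}\left( \sum_{t=1}^{12} y_{t\,1}, \sum_{t=1}^{12} y_{t\,2} \right) = \left( \frac{11}{12} \right) \frac{\psi}{\sqrt{1 + \psi^2}}, $$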
where it is easy to see that the magnitude of the correlation is increasing in the absolute value of the
spillover parameter ψ. The long run correlation between the two processes is
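$$ \lim_{M \to \infty} \mathrm{Cor}\left( \sum_{t=1}^{M} y_{t\,1}, \sum_{t=1}^{M} y_{t\,2} \right) = \lim_{M \to \infty} \frac{M - 1}{M}\, \frac{\psi}{\sqrt{1 + \psi^2}} = \frac{\psi}{\sqrt{1 + \psi^2}}. \qquad (3) $$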
This example highlights a number of important properties. The (simple) covariance
does not exhaustively characterize cross–sectional dependence in a dynamic setting. On
the other hand, the covariance of the aggregated process provides a more comprehensive
description of the conditional dependence structure. Also, the example shows that if the
memory of the process has a sufficiently fast rate of decay, then the convergence of the
long run covariance to its limit is fast. The bottom line of the example is that in a
serially dependent multivariate system the cross–sectional dependence structure can be
richer than what is measured by the covariance.
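The speed of convergence discussed in Example 2 can be checked numerically. The sketch below is only illustrative: it derives the autocovariances of the vma directly from its definition and uses the identity $\mathrm{Cov}(\sum_t y_t, \sum_t y_t) = \sum_{|h| < M} (M - |h|)\, \mathrm{Cov}(y_t, y_{t-h})$. The function names and the value ψ = 0.8 are our own assumptions.

```python
import math

def autocov(h, psi, sigma2=1.0):
    """Cov(y_t, y_{t-h}) for the bivariate vma of Example 2."""
    if h == 0:
        return [[sigma2 * (1 + psi ** 2), 0.0], [0.0, sigma2]]
    if h == 1:
        return [[0.0, psi * sigma2], [0.0, 0.0]]
    if h == -1:
        return [[0.0, 0.0], [psi * sigma2, 0.0]]
    return [[0.0, 0.0], [0.0, 0.0]]

def aggregated_corr(M, psi, sigma2=1.0):
    """Correlation between the two components of sum_{t=1}^M y_t."""
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for h in range(-(M - 1), M):
        g = autocov(h, psi, sigma2)
        for a in range(2):
            for b in range(2):
                cov[a][b] += (M - abs(h)) * g[a][b]
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

psi = 0.8  # hypothetical spillover parameter
limit = psi / math.sqrt(1 + psi ** 2)
for M in (1, 12, 120, 1200):
    print(M, aggregated_corr(M, psi), limit)
```

For M = 1 the function returns the contemporaneous correlation, which is zero, and for M = 12 it agrees with the closed form (11/12) ψ/√(1 + ψ²).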
Our network definition has a general interpretation in terms of the spectral density matrix
of a covariance stationary vector process, which is defined as
$$ s_y(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{+\infty} \Gamma_{y\,h}\, e^{-ih\omega}, \qquad \omega \in [-\pi, \pi], $$
where $\Gamma_{y\,h} = \mathrm{Cov}(y_t, y_{t-h})$ and $i = \sqrt{-1}$. As is well known, the spectral density
is related to the long run covariance in that the spectrum evaluated at zero frequency
corresponds to the long run covariance:
$$ \Sigma_L = 2\pi\, s_y(0) = \sum_{h=-\infty}^{+\infty} \Gamma_{y\,h}, $$
which highlights how ΣL contains the linear dependences of yt at every lead and lag.
Example 3. For the vma defined in the previous example we have:
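$$ \Gamma_{y\,0} = \begin{pmatrix} \sigma^2(1 + \psi^2) & 0 \\ 0 & \sigma^2 \end{pmatrix}, \quad \Gamma_{y\,1} = \begin{pmatrix} 0 & \psi\sigma^2 \\ 0 & 0 \end{pmatrix}, \quad \Gamma_{y\,-1} = \Gamma_{y\,1}', $$
and $\Gamma_{y\,h} = 0$ for $|h| > 1$. The spectral density matrix is, for $\omega \in [-\pi, \pi]$,
$$ 2\pi\, s_y(\omega) = \Gamma_{y\,0} + \Gamma_{y\,1}\, e^{-i\omega} + \Gamma_{y\,-1}\, e^{i\omega}, $$
which can also be computed as
$$ 2\pi\, s_y(\omega) = \left[ I + \begin{pmatrix} 0 & \psi \\ 0 & 0 \end{pmatrix} e^{-i\omega} \right] \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix} \left[ I + \begin{pmatrix} 0 & 0 \\ \psi & 0 \end{pmatrix} e^{i\omega} \right]. $$
In both cases we have
$$ \Sigma_L = 2\pi\, s_y(0) = \begin{pmatrix} \sigma^2(1 + \psi^2) & \sigma^2\psi \\ \sigma^2\psi & \sigma^2 \end{pmatrix}, $$
and the long run correlation in (3) follows.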
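The two expressions for the spectral density in Example 3 can be compared numerically. The following sketch, under our own choice of function names and of the hypothetical values ψ = 0.5 and σ² = 2, evaluates both at an arbitrary frequency and recovers the long run correlation at frequency zero.

```python
import cmath

def matmul(A, B):
    """Product of two small matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def spectrum_from_autocov(omega, psi, sigma2):
    """2*pi*s_y(omega) = Gamma_0 + Gamma_1 e^{-i omega} + Gamma_{-1} e^{i omega}."""
    e_minus, e_plus = cmath.exp(-1j * omega), cmath.exp(1j * omega)
    return [
        [sigma2 * (1 + psi ** 2), psi * sigma2 * e_minus],
        [psi * sigma2 * e_plus, sigma2],
    ]

def spectrum_from_ma(omega, psi, sigma2):
    """2*pi*s_y(omega) = (I + Psi e^{-i omega}) Gamma_eps (I + Psi' e^{i omega})."""
    e_minus, e_plus = cmath.exp(-1j * omega), cmath.exp(1j * omega)
    left = [[1.0, psi * e_minus], [0.0, 1.0]]
    gamma_eps = [[sigma2, 0.0], [0.0, sigma2]]
    right = [[1.0, 0.0], [psi * e_plus, 1.0]]
    return matmul(matmul(left, gamma_eps), right)

psi, sigma2 = 0.5, 2.0  # hypothetical parameter values
S0 = spectrum_from_ma(0.0, psi, sigma2)  # long run covariance matrix
long_run_corr = (S0[0][1] / cmath.sqrt(S0[0][0] * S0[1][1])).real
print(S0)
print(long_run_corr, psi / (1 + psi ** 2) ** 0.5)
```

At ω = 0 both routes return the long run covariance matrix, and the implied correlation coincides with the limit in (3).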
As for contemporaneous partial correlations, it is possible to characterize the network
on the basis of the inverse of the long run covariance matrix. Let the long run concentration
matrix be defined as
$$ K_L \equiv \Sigma_L^{-1}, $$
and let $k_{L\,ij}$ denote its (i, j)-th entry. We can then express the long run partial correlation
coefficient for series i and j as
$$ \rho_L^{ij} = \frac{-k_{L\,ij}}{\sqrt{k_{L\,ii}\, k_{L\,jj}}}. \qquad (4) $$
The Long Run Partial Correlation network is defined as a weighted undirected network
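$\mathcal{N}_L = (\mathcal{V}, \mathcal{K})$ where the set of edges $\mathcal{K}$ is defined as
$$ \mathcal{K} = \{ (i, j) \in \mathcal{V} \times \mathcal{V} : \rho_L^{ij} \neq 0 \}, \qquad (5) $$
and $\rho_L^{ij}$ is the weight associated with the edge between i and j.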
In this work we develop an estimation approach for KL which is inspired by the
literature on hac covariance estimators (Newey and West, 1987; Andrews and Monahan,
1992; Den Haan and Levin, 1994). Under appropriate assumptions on the data generating
process, we can approximate yt with an infinite order var:
$$ y_t = \sum_{k=1}^{\infty} A_k\, y_{t-k} + \varepsilon_t, \qquad \varepsilon_t \sim \text{w.n.}(0, \Gamma_\varepsilon), \qquad (6) $$
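where $A_k$ are $n \times n$ matrices such that the process is stable. The spectral density matrix
of representation (6) is
$$ s_y(\omega) = \frac{1}{2\pi} \left( I - \sum_{k=1}^{\infty} A_k\, e^{-ik\omega} \right)^{-1} \Gamma_\varepsilon \left( I - \sum_{k=1}^{\infty} A_k'\, e^{ik\omega} \right)^{-1}, \qquad \omega \in [-\pi, \pi]. \qquad (7) $$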
Therefore, the long run covariance and the long run concentration matrix are given by
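$$ \Sigma_L = \left( I - \sum_{k=1}^{\infty} A_k \right)^{-1} \Gamma_\varepsilon \left( I - \sum_{k=1}^{\infty} A_k' \right)^{-1}, $$
$$ K_L \equiv \Sigma_L^{-1} = \left( I - \sum_{k=1}^{\infty} A_k' \right) \Gamma_\varepsilon^{-1} \left( I - \sum_{k=1}^{\infty} A_k \right). \qquad (8) $$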
This expression has an appealing interpretation in that it factorizes the long run concentration
in a sandwich form determined by the term $I - \sum_{k=1}^{\infty} A_k$, which captures long run
dynamic relations of the system, and the term $\Gamma_\varepsilon^{-1}$, which accounts for the contemporaneous
dependence of the system innovations. For ease of notation we express (8) more
compactly as
$$ K_L = (I - G)'\, C\, (I - G), \qquad (9) $$
where $G \equiv \sum_{k=1}^{\infty} A_k$ and $C \equiv \Gamma_\varepsilon^{-1}$, and we are going to refer to the two matrices
respectively as the Granger and Contemporaneous components of the long run concentration
matrix.
The formula in (9) has an appealing graphical interpretation. In order to better appre-
ciate this, let us first introduce the Granger and Contemporaneous networks associated
with the var representation of equation (6). We define $g_{ij}$ as the (i, j)-th entry of G,
and $c_{ij}$ as the (i, j)-th entry of C. Notice also that in general $g_{ij} \neq g_{ji}$ while $c_{ij} = c_{ji}$.
The Granger network is defined as a weighted directed network $\mathcal{N}_G = (\mathcal{V}, \mathcal{G})$ where the
presence of an edge from i to j denotes that i Granger causes j in the long run, that is
$$ \mathcal{G} = \{ (i, j) \in \mathcal{V} \times \mathcal{V} : g_{ji} \neq 0 \}, \qquad (10) $$
and $g_{ji}$ is the weight associated with the edge from i to j. The contemporaneous network
is defined as the partial correlation network of the var innovations. It is a weighted
undirected network $\mathcal{N}_C = (\mathcal{V}, \mathcal{C})$ where an edge between i and j denotes that i is partially
correlated with j conditionally on the past, that is
$$ \mathcal{C} = \{ (i, j) \in \mathcal{V} \times \mathcal{V} : \rho^{ij} \neq 0 \}, \qquad (11) $$
where $\rho^{ij} = -c_{ij} / \sqrt{c_{ii}\, c_{jj}}$ is the weight associated with the edge between i and j. The
Long Run Partial Correlation network can be interpreted as a combination of the Granger
and Contemporaneous networks.
Building on Meinshausen and Buhlmann (2006), in this work we are concerned with
the case in which the Long Run Partial Correlation network is sparse. However, we do
not formulate sparsity assumptions on the $K_L$ matrix directly. Instead, we
formulate sparsity assumptions on the approximating var model. Precise details on the
sparsity assumptions are given in the theoretical analysis in Section 3 and the Appendix.
Under such assumptions, the sparsity of the var model determines the sparsity structure
of the G and C components and of the long run concentration matrix KL. Due to the
sandwich form of $K_L$ its sparsity structure is not trivial. The generic (i, j)-th element of
$K_L$ is given by
$$ k_{L\,ij} = \sum_{h=1}^{n} \sum_{k=1}^{n} g_{ih}\, c_{hk}\, g_{jk}, \qquad i, j = 1, \ldots, n. $$
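In order to see which relations are mapped into an edge of the Long Run Partial Correlation
network, we decompose, in the case that $i \neq j$, the previous sums as follows
$$ \begin{aligned}
k_{L\,ij} &= \sum_{h=1}^{n} \sum_{k=1}^{n} g_{ih}\, c_{hk}\, g_{jk} \\
&= g_{ii} \sum_{\ell=1}^{n} c_{i\ell}\, g_{j\ell} + g_{jj} \sum_{h=1}^{n} g_{ih}\, c_{hj}\, (1 - \delta_{hi}) + \sum_{h=1}^{n} \sum_{k=1}^{n} g_{ih}\, c_{hk}\, g_{jk}\, (1 - \delta_{hi})(1 - \delta_{kj}) \\
&= g_{ii}\, c_{ij}\, g_{jj} + g_{ii}\, c_{ii}\, g_{ji} + g_{ij}\, c_{jj}\, g_{jj} \\
&\quad + g_{ii} \sum_{\ell=1}^{n} c_{i\ell}\, g_{j\ell}\, (1 - \delta_{\ell i})(1 - \delta_{\ell j}) + g_{jj} \sum_{\ell=1}^{n} g_{i\ell}\, c_{\ell j}\, (1 - \delta_{\ell i})(1 - \delta_{\ell j}) \\
&\quad + \sum_{h=1}^{n} \sum_{k=1}^{n} g_{ih}\, c_{hk}\, g_{jk}\, (1 - \delta_{hi})(1 - \delta_{kj})(1 - \delta_{hk}) \\
&\quad + \sum_{\ell=1}^{n} g_{i\ell}\, c_{\ell\ell}\, g_{j\ell}\, (1 - \delta_{\ell i})(1 - \delta_{\ell j}),
\end{aligned} $$
where $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ii} = 1$. Using this formula we see that there is an edge between
nodes i and j, i.e. $k_{L\,ij} \neq 0$ if
1. i and j are contemporaneously partially correlated, i.e. $c_{ij} = c_{ji} \neq 0$;
2. i Granger causes j in the long run, i.e. $g_{ji} \neq 0$;
3. j Granger causes i in the long run, i.e. $g_{ij} \neq 0$;
4. there exists another node k such that $g_{ik} \neq 0$ and $c_{kj} \neq 0$ or $g_{jk} \neq 0$ and $c_{ki} \neq 0$.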
Being linked in either the Granger or the Contemporaneous network is a sufficient condition to
be linked in the long run, but it is not a necessary one. The last condition shows that the set of
links of the Long Run Partial Correlation network is larger than the union of the links of
the Granger and Contemporaneous networks.
Notice that, by looking at the formula of the spectral density matrix of a var process
(see equation (7)) it is clear that with our method we can estimate not only the Long Run
Partial Correlation network but also the inverse of the spectrum at different frequencies.
The sparsity structure will be the same at all frequencies but the weights attached to the
edges will change with the frequency.
We conclude this Section with two examples showing the sparseness of the Long Run
Partial Correlation network as a combination of the two components.
Example 4. Consider a case with n = 3 and $g_{11} \neq 0$, $g_{23} \neq 0$ and $c_{13} = c_{31} \neq 0$, while
$g_{13} = g_{31} = g_{12} = g_{21} = c_{12} = c_{21} = c_{23} = c_{32} = 0$. We can show that still $k_{L\,12} \neq 0$.
Indeed:
$$ \begin{aligned}
k_{L\,12} &= g_{11} c_{12} g_{22} + g_{11} c_{11} g_{21} + g_{12} c_{22} g_{22} + g_{11} c_{13} g_{23} + g_{13} c_{32} g_{22} \\
&\quad + g_{12} c_{21} g_{21} + g_{12} c_{23} g_{23} + g_{13} c_{31} g_{21} + g_{13} c_{33} g_{23} = g_{11} c_{13} g_{23}.
\end{aligned} $$
The intuition is that, since 1 and 3 are contemporaneously partially correlated ($c_{13} = c_{31} \neq 0$),
and 3 Granger causes 2 in the long run ($g_{23} \neq 0$), then, because 1 has serial
dependence ($g_{11} \neq 0$), we have also that 1 indirectly Granger causes series 2 in the long
run, therefore 1 and 2 are partially correlated in the long run given 3. The resulting Long
Run Partial Correlation network is shown graphically in Figure 2.
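The cancellation in Example 4 can be verified with a few lines of code. The sketch below evaluates the double sum for $k_{L\,12}$ exactly as it is printed above. The numerical values are hypothetical: they are only chosen to respect the zero pattern of the example, and the entries that the example leaves unrestricted are set arbitrarily.

```python
def k_long_run(g, c, i, j):
    """Double sum from the text: sum_h sum_k g[i][h] * c[h][k] * g[j][k]."""
    n = len(g)
    return sum(g[i][h] * c[h][k] * g[j][k] for h in range(n) for k in range(n))

# Hypothetical values respecting the zero pattern of Example 4 (0-based indices).
g = [
    [0.5, 0.0, 0.0],   # g11 != 0, g12 = g13 = 0
    [0.0, 0.3, 0.4],   # g21 = 0, g23 != 0 (g22 is unrestricted)
    [0.0, 0.0, 0.2],   # g31 = 0 (g32 and g33 are unrestricted)
]
c = [
    [ 1.0, 0.0, -0.3],  # c13 = c31 != 0
    [ 0.0, 1.0,  0.0],  # c12 = c21 = c23 = c32 = 0
    [-0.3, 0.0,  1.0],
]

k12 = k_long_run(g, c, 0, 1)
print(k12, g[0][0] * c[0][2] * g[1][2])
```

Of the nine terms in the sum only $g_{11} c_{13} g_{23}$ survives, so the two printed numbers coincide and are different from zero.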
Example 5. Consider a var(1) with n = 6
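$$ y_t = A_1\, y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \Gamma_\varepsilon), $$
where
$$ A_1 = \begin{pmatrix}
0.7 & 0.0 & 0.0 & 0.0 & 0.0 & 0.2 \\
0.0 & 0.6 & 0.0 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.1 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.4 & 0.2 & -0.3 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.0 & 0.3 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.4
\end{pmatrix}, $$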
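and
$$ \Gamma_\varepsilon^{-1} = \begin{pmatrix}
1.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
0.0 & 1.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 1.0 & -0.2 & 0.0 & -0.3 \\
0.0 & 0.0 & -0.2 & 1.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.0 & 1.0 & 0.0 \\
0.0 & 0.0 & -0.3 & 0.0 & 0.0 & 1.0
\end{pmatrix}. $$
Then, the long run concentration matrix is
$$ K_L = \begin{pmatrix}
0.1 & 0.0 & 0.0 & 0.0 & 0.0 & -0.1 \\
0.0 & 0.2 & 0.0 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 1.1 & -0.5 & -0.2 & -0.2 \\
0.0 & 0.0 & -0.5 & 0.6 & 0.2 & 0.0 \\
0.0 & 0.0 & -0.2 & 0.2 & 0.6 & 0.0 \\
-0.1 & 0.0 & -0.2 & 0.0 & 0.0 & 0.4
\end{pmatrix}. $$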
Figure 3 displays the Long Run Partial Correlation network, as well as the Granger and
Contemporaneous networks. Note that vertices 3 and 5 are connected neither in the
Contemporaneous nor in the Granger network; however, they are connected in the Long Run
Partial Correlation network. Notice that the strength of the Long Run Partial Correlation
network links differs from the strength of the Granger and Contemporaneous network links.
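The matrices of Example 5 can be combined through equation (9) to recover the three networks. The sketch below is a minimal illustration with our own helper names. For a var(1) the Granger component G equals A1, and self loops are left out of the Granger edge list for readability, which is a reporting choice of ours.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

G = [  # Granger component: G = A1 for a var(1)
    [0.7, 0.0, 0.0, 0.0,  0.0, 0.2],
    [0.0, 0.6, 0.0, 0.0,  0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0,  0.0, 0.0],
    [0.0, 0.0, 0.4, 0.2, -0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0,  0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0,  0.0, 0.4],
]
C = [  # Contemporaneous component: inverse covariance of the innovations
    [1.0, 0.0,  0.0,  0.0, 0.0,  0.0],
    [0.0, 1.0,  0.0,  0.0, 0.0,  0.0],
    [0.0, 0.0,  1.0, -0.2, 0.0, -0.3],
    [0.0, 0.0, -0.2,  1.0, 0.0,  0.0],
    [0.0, 0.0,  0.0,  0.0, 1.0,  0.0],
    [0.0, 0.0, -0.3,  0.0, 0.0,  1.0],
]
n = len(G)
I_minus_G = [[(1.0 if i == j else 0.0) - G[i][j] for j in range(n)]
             for i in range(n)]

# Equation (9): K_L = (I - G)' C (I - G)
K_L = matmul(matmul(transpose(I_minus_G), C), I_minus_G)

tol = 1e-12
# Directed edge from i to j when g_ji != 0 (equation (10)), self loops omitted.
granger_edges = [(i + 1, j + 1) for i in range(n) for j in range(n)
                 if i != j and abs(G[j][i]) > tol]
contemporaneous_edges = [(i + 1, j + 1) for i in range(n)
                         for j in range(i + 1, n) if abs(C[i][j]) > tol]
long_run_edges = [(i + 1, j + 1) for i in range(n)
                  for j in range(i + 1, n) if abs(K_L[i][j]) > tol]

print(granger_edges)
print(contemporaneous_edges)
print(long_run_edges)
```

Rounding the entries of `K_L` to one decimal gives the matrix reported in the example. The pair (3, 5) appears in `long_run_edges` although it is absent from the other two lists.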
3 Network Estimation
3.1 LASSO Estimation
We obtain sparse estimators using the lasso (Least Absolute Shrinkage and Selection
Operator), an estimation technique introduced in the statistics literature by Tibshirani
(1996). Consider a general linear regression model
$$ Y_i = \vartheta_0' X_i + e_i, \qquad i = 1, \ldots, T, $$
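with $X_i$ and $\vartheta_0$ being P-dimensional vectors, and $e_i \sim \text{i.i.d.}(0, s^2)$. The lasso estimator of
this model is defined as the minimizer of the $\ell_1$–penalised least squares objective function
$$ \hat{\vartheta}_{T,L} = \arg\min_{\vartheta} \frac{1}{T} \sum_{i=1}^{T} (Y_i - \vartheta' X_i)^2 + \frac{\lambda_T}{T} \sum_{j=1}^{P} |\vartheta_j|, \qquad (12) $$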
where λT ≥ 0 is a tuning parameter that determines the degree of the penalization
and ϑj is the j–th component of the parameters vector ϑ. The lasso is a shrinkage
type estimation procedure that shrinks the ols estimate towards zero. The amount of
shrinkage is determined by the value of the penalty λT : the greater the λT the greater the
shrinkage effect. An important feature of the $\ell_1$–penalty is that the solution of equation
(12) delivers estimates of the parameter vector that contain exact zeros, i.e. a “sparse”
solution. Under appropriate conditions on the model and the degree of penalization, it
can be shown that if the true parameter vector ϑ0 is sparse, then the lasso consistently
estimates the non–zero parameters while the others are shrunk to exact zero. Another
advantage of the lasso (which is also shared by other shrinkage estimators) is the fact
that the estimator can be well defined even when the number of parameters P is larger
than the number of observations T (see Fan and Peng, 2004). This property is crucial in
high–dimensional modelling problems where this is indeed the case.
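To make the role of the penalty concrete, the following sketch minimizes the objective in (12) by cyclic coordinate descent with soft thresholding. This is a standard algorithm for the lasso, but it is our own illustrative choice and not necessarily the one used in the nets package. The data are a small hypothetical design with orthogonal columns, chosen so that the solution can be checked by hand.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, Y, lam, n_sweeps=200):
    """Coordinate descent for (1/T) * sum (Y_i - theta'X_i)^2 + (lam/T) * sum |theta_j|.

    Assumes every column of X has a non-zero sum of squares.
    """
    T, P = len(X), len(X[0])
    theta = [0.0] * P
    for _ in range(n_sweeps):
        for j in range(P):
            rho = 0.0
            for i in range(T):
                # Partial residual: leave out the contribution of coordinate j.
                r_i = Y[i] - sum(theta[k] * X[i][k] for k in range(P) if k != j)
                rho += X[i][j] * r_i
            norm_sq = sum(X[i][j] ** 2 for i in range(T))
            # First order condition of the one-dimensional problem in theta_j.
            theta[j] = soft_threshold(rho, lam / 2.0) / norm_sq
    return theta

# Hypothetical data: Y = 2 * x1 + 0.1 * x2 with orthogonal columns.
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
Y = [2.1, 1.9, 2.1, 1.9]

theta = lasso_cd(X, Y, lam=2.0)
ols = lasso_cd(X, Y, lam=0.0)
print(theta, ols)
```

With λ = 0 the procedure returns the ols estimate, while with λ = 2 the small coefficient is set exactly to zero and the large one is shrunk from 2 to 1.75.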
Several variants of the lasso have been proposed in the literature in order to consider
the case of dependent data. In particular, in this work we make use of the Adaptive
lasso estimator proposed by Zou (2006), and defined as
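$$ \hat{\vartheta}_{T,AL} = \arg\min_{\vartheta} \frac{1}{T} \sum_{i=1}^{T} (Y_i - \vartheta' X_i)^2 + \frac{\lambda_T}{T} \sum_{j=1}^{P} w_j\, |\vartheta_j|, $$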
where $w_j$ is a weight on the penalty for $\vartheta_j$ and is typically chosen as the reciprocal
of the absolute value of a pre-estimator of $\vartheta_j$. Among the possible choices of the
pre-estimator, the most popular are the ols estimator or, when P is larger than T, the ridge
estimator.
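The effect of the weights can be seen in a small sketch of the Adaptive lasso, again solved by cyclic coordinate descent (an illustrative choice of ours). The pre-estimator is ols, computed column by column, which is valid here only because the hypothetical design has orthogonal columns. The sketch assumes that no pre-estimate is exactly zero.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def adaptive_lasso_cd(X, Y, lam, pre_estimate, n_sweeps=200):
    """Coordinate descent for the weighted l1 objective with w_j = 1 / |pre_estimate_j|."""
    T, P = len(X), len(X[0])
    weights = [1.0 / abs(b) for b in pre_estimate]  # assumes b != 0
    theta = [0.0] * P
    for _ in range(n_sweeps):
        for j in range(P):
            rho = 0.0
            for i in range(T):
                r_i = Y[i] - sum(theta[k] * X[i][k] for k in range(P) if k != j)
                rho += X[i][j] * r_i
            norm_sq = sum(X[i][j] ** 2 for i in range(T))
            # The threshold is coordinate specific: lam * w_j / 2.
            theta[j] = soft_threshold(rho, lam * weights[j] / 2.0) / norm_sq
    return theta

# Hypothetical data: Y = 2 * x1 + 0.1 * x2 with orthogonal columns.
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
Y = [2.1, 1.9, 2.1, 1.9]

# ols pre-estimator, column by column (orthogonal design only).
pre = [sum(x[j] * y for x, y in zip(X, Y)) / sum(x[j] ** 2 for x in X)
       for j in range(2)]

theta_al = adaptive_lasso_cd(X, Y, lam=0.2, pre_estimate=pre)
print(pre, theta_al)
```

In this example the small coefficient receives a large weight and is set exactly to zero, while the large coefficient is shrunk only from 2 to 1.9875. A plain lasso would need λ ≥ 0.8 to remove the small coefficient, which would shrink the large one to 1.9 or below.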
3.2 NETS
The var approximation has two appealing features in this setting. The expression of the
long run concentration matrix in (8) lends itself to relatively straightforward estimation.
Moreover, estimators based on this representation deliver consistent estimates of the long
run concentration matrix for quite general classes of time series processes, including, for
instance, heteroskedastic processes.
In this Section we propose an algorithm called nets (Network Estimator for Time
Series) to estimate sparse Long Run Partial Correlation networks. The nets procedure
consists of reformulating the estimation of the long run concentration matrix of the var
representation of the process in (9) as two lasso regression problems. Our strategy consists
of estimating the two components G and C of the network and then combining the
two to obtain an estimator of $K_L$. In the rest of this Section we describe the estimation
procedure and provide its large sample properties. In particular, the method we propose
delivers not only an estimate of the Long Run Partial Correlation network but also consis-
tent estimates of its two components: (i) the Granger network that summarises the long
run Granger causality structure of the process and (ii) the Contemporaneous network of
the var innovations that synthesises contemporaneous dependence conditionally on the
past information. Depending on the data at hand both components may contain interesting
information and it might be worth considering them separately. Detailed assumptions
and proofs of the results of this Section are in the Appendix.
In what follows we assume that a sample of T observations is available for the yt
process. Assumptions 1, 2, 3 and 4 part (a) characterize the process. In more detail,
assumption 1 states that yt is a zero mean, purely stochastic process belonging to the
class of weakly dependent processes. This is a general class of processes introduced by
Doukhan and Louhichi (1999) which nests several types of mixing processes as well as
covariance stationary processes. Indeed, in assumption 2 we assume a covariance
stationary process whose spectral density and covariance matrices are positive definite.
This assumption is in turn used to establish the existence of a var(∞) representation in
the same way as in Den Haan and Levin (1994). In particular, assumption 3 specifies the
rate of decay of the autocovariances which allows us to compute the asymptotic bias due
to the truncation error when estimating a var(p).
Finally, assumption 4 part (a) contains assumptions on the rate of growth of
the dimension of the model with respect to the sample size. More specifically, the
number of series grows at the rate $n = O(T^{\zeta_1})$ where $\zeta_1 > 0$. The data is then approximated
by a finite p-th order var
$$y_t = \sum_{k=1}^{p} A_k y_{t-k} + \varepsilon_t, \qquad \varepsilon_t \sim \text{w.n.}(0, \Gamma_\varepsilon), \qquad t = 1, \ldots, T, \tag{13}$$
where the order of the autoregression is allowed to grow with the sample size at the rate
$p = O(T^{\zeta_2})$ where $\zeta_2 > 0$. Notice that, due to the truncation at lag p, the var is biased and the
bias depends on the number of series n, the lag p and the rate of decay of the autocovariances.
Assumption 4 part (a.iii) gives the relation between the relative rates of n and p in order
to have a truncation error which is asymptotically negligible. In particular, for a given
rate of decay of the autocovariances, the larger n is, the larger p must be.
Hereafter, we adopt the following conventions. Although both n and p depend on
T we omit this dependence for notational convenience while we keep the dependence on
T for the estimated quantities, the penalization constant, and the number of non–zero
parameters to be introduced below. We use the index 0 to denote the true values of the
parameters and we use a hat, as in $\hat{\alpha}_{T\,i}$, to denote estimated quantities.
Granger Network Estimator. The Granger component matrix $G_0$ with generic elements
$g_{0\,ij}$ is estimated by fitting a p-th order var model via the Adaptive lasso.
As in least squares, estimation of the var can be decomposed into n separate regressions, each
with T observations and np variables, which is typically more convenient for the numerical
implementation. For each of the n equations, let $\alpha_i = (a_{1\,i1} \ldots a_{1\,in} \ldots a_{p\,i1} \ldots a_{p\,in})'$, where $a_{k\,ij}$
is the (i, j)-th element of $A_k$. The Adaptive lasso estimator of $\alpha_i$ is then defined as
$$\hat{\alpha}_{T\,i} = \arg\min_{\alpha_i} \; \frac{1}{T} \sum_{t=1}^{T} \left( y_{t\,i} - \alpha_i' z_t \right)^2 + \frac{\lambda^G_T}{T} \sum_{j=1}^{np} \frac{|\alpha_{ij}|}{|\tilde{\alpha}_{T\,ij}|}, \qquad i = 1, \ldots, n, \tag{14}$$
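where $\tilde{\alpha}_{T\,i}$ is a pre-estimator. We denote the set of non–zero coefficients of one var
equation as $\mathcal{A}_i$, which has $q^G_{T\,i}$ elements. In assumption 4 part (b) we require for any i
$$q^G_{T\,i} = o\!\left( \sqrt{\frac{T}{\log T}} \right), \qquad \frac{\lambda^G_T}{T} \sqrt{q^G_{T\,i}} = o(1), \qquad \text{and} \qquad \frac{\lambda^G_T}{T} \sqrt{\frac{T}{\log T}} \to \infty.$$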
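The per-equation problem in (14) can be illustrated with a minimal coordinate descent sketch. This is not the implementation of the nets package: the function names, the argument layout and the use of coordinate descent are assumptions made for illustration, and the regressor rows are assumed to stack the p lags of all n series.

```python
def soft_threshold(x, thr):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if x > thr:
        return x - thr
    if x < -thr:
        return x + thr
    return 0.0


def adaptive_lasso(y, Z, lam, pre, n_iter=200):
    """Minimise (1/T) sum_t (y_t - a'z_t)^2 + (lam/T) sum_j |a_j| / |pre_j|.

    y   : list of T responses (one var equation)
    Z   : list of T rows of regressors (assumed to be the stacked lags)
    lam : penalty constant (hypothetical tuning value)
    pre : pre-estimates; a zero pre-estimate keeps the coefficient at zero
    """
    T, m = len(y), len(Z[0])
    alpha = [0.0] * m
    resid = list(y)  # residuals y - Z alpha, with alpha = 0 at the start
    scale = [sum(Z[t][j] ** 2 for t in range(T)) / T for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if pre[j] == 0.0 or scale[j] == 0.0:
                continue  # infinite weight (or empty regressor): stay at zero
            # covariance of regressor j with the partial residual
            rho = sum(Z[t][j] * (resid[t] + Z[t][j] * alpha[j]) for t in range(T)) / T
            weight = (lam / T) / abs(pre[j])
            new = soft_threshold(rho, weight / 2.0) / scale[j]
            if new != alpha[j]:
                delta = new - alpha[j]
                for t in range(T):
                    resid[t] -= Z[t][j] * delta
                alpha[j] = new
    return alpha
```

Calling the function with `lam = 0` and unit pre-estimates returns the least squares fit, which is one possible choice of pre-estimator in this sketch.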
The following theorem establishes the large sample properties of the Adaptive lasso
estimator (14).
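Theorem 1. Under assumptions 1, 2, 3, 4 parts (a) and (b), 5, and condition 1 in the
Appendix, the following propositions are true for any i = 1, . . . , n.
(a) (Consistent Estimation of the Restricted Problem) Consider the constrained minimizer
$\hat{\alpha}^{\mathcal{A}_i}_{T\,i}$ of (14) with $\hat{\alpha}^{\mathcal{A}_i}_{T\,ij} = 0$ for $j \in \mathcal{A}_i^c$. Then for T large enough and for any
η > 0 the estimator exists and there exists a constant $\kappa_G$ such that
$$\left\| \hat{\alpha}^{\mathcal{A}_i}_{T\,i} - \alpha_{0\,i} \right\|_2 \le \kappa_G \frac{\lambda^G_T}{T} \sqrt{q^G_{T\,i}},$$
and, for any $j \in \mathcal{A}_i$, $\mathrm{sign}(\hat{\alpha}^{\mathcal{A}_i}_{T\,ij}) = \mathrm{sign}(\alpha_{0\,ij})$, with probability at least $1 - O(T^{-\eta})$.
(b) (Consistent Selection of the Unrestricted Problem) Consider the minimizer $\hat{\alpha}_{T\,i}$ of
(14). Then for T large enough and for any η > 0
$$\hat{\alpha}_{T\,ij} = 0 \quad \text{for } j \in \mathcal{A}_i^c,$$
with probability at least $1 - O(T^{-\eta})$.
(c) (Oracle) Consider the minimizer $\hat{\alpha}_{T\,i}$ of (14). Then for T large enough and for any
η > 0
$$\left\| \hat{\alpha}_{T\,i} - \alpha_{0\,i} \right\|_2 \le \kappa_G \frac{\lambda^G_T}{T} \sqrt{q^G_{T\,i}},$$
and, for any j, $\mathrm{sign}(\hat{\alpha}_{T\,ij}) = \mathrm{sign}(\alpha_{0\,ij})$, with probability at least $1 - O(T^{-\eta})$.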
The first statement of the theorem says that the Adaptive lasso estimator, when restricted
to the non–zero coefficients, is consistent (as $\frac{\lambda^G_T}{T}\sqrt{q^G_{T\,i}} = o(1)$) and that the signs of these
coefficients are also estimated consistently. The second statement says that the unrestricted lasso estimator
correctly selects the non–zero coefficients asymptotically. Finally, the last statement says
that the unrestricted estimator is also consistent.
The strategy adopted to prove the previous theorem follows the literature on lasso
estimation of large dimensional linear time series models (see e.g. Fan and Peng, 2004;
Meinshausen and Bühlmann, 2006; Peng et al., 2009). We contribute to the existing
results in two ways. First, we consider a large dimensional panel of time series, which
requires Adaptive lasso estimation. Second, we address the var(∞) case and the bias
due to truncation at lag p. This bias is of order $O(n^2/p^\beta)$, where β is the rate of decay of
the autocovariances, and it is controlled by assumption 4 part (a.iii). While the
first contribution is similar to those of Kock (2012), Kock and Callot (2012) and Medeiros
and Mendes (2012), the second contribution is, to our knowledge, new to the literature.
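The estimator of the var matrices constructed from $\hat{\alpha}_{T\,i}$ is denoted by $\hat{A}_{T\,k}$ and we
have
$$\hat{G}_T = \sum_{k=1}^{p} \hat{A}_{T\,k},$$
with generic entries $\hat{g}_{T\,ij}$. The estimator of the set of the Granger network edges, $\mathcal{G}$ as
defined in (10), is
$$\hat{\mathcal{G}}_T = \{(i, j) \in V \times V : \hat{g}_{T\,ij} \neq 0\}.$$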
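As a small illustration of this step, the sketch below sums a list of estimated lag matrices and reads off the edge set. The function name and the `tol` argument are hypothetical; the edge set follows the definition above literally, so pairs with i = j are included whenever the diagonal entry is non-zero.

```python
def granger_network(A_list, tol=0.0):
    """Sum the var lag matrices into G and collect the pairs (i, j) with g_ij != 0.

    A_list : list of p square matrices (lists of rows), one per lag
    tol    : hypothetical numerical tolerance for treating an entry as zero
    """
    n = len(A_list[0])
    G = [[sum(A[i][j] for A in A_list) for j in range(n)] for i in range(n)]
    edges = {(i, j) for i in range(n) for j in range(n) if abs(G[i][j]) > tol}
    return G, edges
```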
Under additional assumptions on the structure of the coefficients $a_{0\,k\,ij}$ of the var
approximation, we are able to establish consistency of the estimated Granger network. In
particular, assumption 7 imposes that (i) if for each lag k < p we have $a_{0\,k\,ij} = 0$, then
for each k > p we require $a_{0\,k\,ij} = 0$; and that (ii) if for any lag k, $a_{0\,k\,ij} \neq 0$, then, for any
h, we have that $\sum_{k=1}^{h} a_{0\,k\,ij} \neq 0$. These are reasonable assumptions that respectively rule
out the cases of (i) zero coefficients followed by non–zero ones and (ii) exact cancellations
of non–zero coefficients. We then have the result on the Granger network estimation.
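To see why exact cancellations have to be excluded, consider a hypothetical pair (i, j) in a var with two lags whose coefficients have opposite signs. The numbers below are purely illustrative and do not come from the paper.

```latex
a_{0\,1\,ij} = 0.2, \qquad a_{0\,2\,ij} = -0.2
\quad \Longrightarrow \quad
g_{0\,ij} = \sum_{k=1}^{2} a_{0\,k\,ij} = 0 .
```

In this configuration series j enters the equation of series i at both lags, yet the corresponding entry of the Granger component matrix is zero, so the edge could not be recovered from the summed coefficients. Part (ii) of assumption 7 rules this case out.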
Corollary 1. Under assumptions 1, 2, 3, 4 parts (a), (b), 5, 7 and condition 1 in the
Appendix, for any η > 0 and for T large enough
$$\hat{\mathcal{G}}_T = \mathcal{G},$$
with probability at least $1 - O(T^{-\eta})$. Moreover, for each i, j = 1, . . . , n
$$\operatorname*{p-lim}_{T \to \infty} \hat{g}_{T\,ij} = g_{0\,ij}.$$
Contemporaneous Network Estimator. The contemporaneous component matrix
$C_0$, which has generic elements $c_{0\,ij}$, is estimated using a strategy put forward in Peng
et al. (2009). It consists of estimating the concentration matrix of the innovation term in
the var representation (6). Define for any var equation $\hat{\varepsilon}_{t\,i} = y_{t\,i} - \hat{\alpha}_{T\,i}' z_t$. Then the
entries of the inverse correlation matrix, $\rho^{ij}_0 = -c_{0\,ij}/\sqrt{c_{0\,ii}\,c_{0\,jj}}$, are the coefficients of
the regression
$$\hat{\varepsilon}_{t\,i} = \sum_{j \neq i} \rho^{ij}_0 \sqrt{\frac{\tilde{c}_{T\,jj}}{\tilde{c}_{T\,ii}}} \, \hat{\varepsilon}_{t\,j} + u_{t\,i}, \qquad i = 1, \ldots, n, \tag{15}$$
where $\tilde{c}_{T\,ii}$ are pre-estimators of $c_{0\,ii}$. This estimation problem is a regression of n variables
on $n(n-1)/2$ explanatory variables. Given sparsity of $C_0$, we estimate the n equations in (15)
jointly by means of the lasso estimator
$$\hat{\rho}_T = \arg\min_{\rho} \; \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{n} \left( \hat{\varepsilon}_{t\,i} - \sum_{j \neq i} \rho^{ij} \sqrt{\frac{\tilde{c}_{T\,jj}}{\tilde{c}_{T\,ii}}} \, \hat{\varepsilon}_{t\,j} \right)^2 + \frac{\lambda^C_T}{T} \sum_{i=2}^{n} \sum_{j=1}^{i-1} |\rho^{ij}|. \tag{16}$$
The estimator of the set of the Contemporaneous network edges, $\mathcal{C}$ as defined in (11), is
$$\hat{\mathcal{C}}_T = \{(i, j) \in V \times V : \hat{\rho}^{ij}_T \neq 0\},$$
while the estimator of the matrix $C_0$ is denoted as $\hat{C}_T$ and has generic entries $\hat{c}_{T\,ij} =
-\hat{\rho}^{ij}_T \sqrt{\hat{c}_{T\,ii}\,\hat{c}_{T\,jj}}$. Finally, the pre–estimators $\tilde{c}_{T\,ii}$ must satisfy condition 2 in the Appendix
and are given by
$$\tilde{c}_{T\,ii} = \left[ \frac{1}{T-1} \sum_{t=1}^{T} \hat{u}_{t\,i}^2 \right]^{-1} \tag{17}$$
where $\hat{u}_{t\,i}$ are the residuals of regression (15). Estimation is achieved by iterating between
(16) and (17).
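The two bookkeeping steps around the lasso problem, namely the diagonal pre-estimator in (17) and the mapping from estimated partial correlations back to the entries of the concentration matrix, can be sketched as follows. The function names are hypothetical, the lasso step (16) itself is not implemented here, and the same diagonal values are reused for the back-transformation as a simplifying assumption.

```python
import math


def precision_diagonal(U):
    """Pre-estimator (17): c_ii = [ (1/(T-1)) * sum_t u_ti^2 ]^(-1).

    U : list of T rows, each holding the n residuals of regression (15)
    """
    T, n = len(U), len(U[0])
    return [(T - 1) / sum(U[t][i] ** 2 for t in range(T)) for i in range(n)]


def contemporaneous_matrix(rho, c_diag):
    """Rebuild C from partial correlations: c_ij = -rho_ij * sqrt(c_ii * c_jj)."""
    n = len(c_diag)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                C[i][j] = c_diag[i]  # diagonal entries are taken as given
            else:
                C[i][j] = -rho[i][j] * math.sqrt(c_diag[i] * c_diag[j])
    return C
```

In an iterative scheme of the kind described above, `precision_diagonal` would be recomputed from the residuals of each new lasso fit until the estimates stabilise.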
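If we denote the total number of non–zero parameters, i.e. the number of elements of
$\mathcal{C}$, by $q^C_T$, then in assumption 4 part (c) we require
$$q^C_T = o\!\left( \sqrt{\frac{T}{\log T}} \right), \qquad \frac{\lambda^C_T}{T} \sqrt{q^C_T} = o(1), \qquad \text{and} \qquad \frac{\lambda^C_T}{T} \sqrt{\frac{T}{\log T}} \to \infty.$$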
The following theorem establishes the large sample properties of the lasso estimator
(16).
Theorem 2. Under assumptions 1, 2, 3, 4 parts (a), (b), (c), and (d), 5, 6, and conditions
1, 2 in the Appendix, the following propositions are true.
(a) (Consistent Estimation of the Restricted Problem) Consider the constrained minimizer
$\hat{\rho}^{\mathcal{C}}_T$ of (A-5) with $\hat{\rho}^{ij\,\mathcal{C}}_T = 0$ for $(i, j) \in \mathcal{C}^c$. Then for T large enough and for
any η > 0 the estimator exists and there exists a constant $\kappa_C$ such that
$$\left\| \hat{\rho}^{\mathcal{C}}_{T\,\mathcal{C}} - \rho_{0\,\mathcal{C}} \right\|_2 \le \kappa_C \frac{\lambda^C_T}{T} \sqrt{q^C_T},$$
and, for any $(i, j) \in \mathcal{C}$, $\mathrm{sign}(\hat{\rho}^{ij\,\mathcal{C}}_T) = \mathrm{sign}(\rho^{ij}_0)$, with probability at least $1 - O(T^{-\eta})$.
(b) (Consistent Selection of the Unrestricted Problem) Consider the minimizer $\hat{\rho}_T$ of
(A-5). Then for T large enough and for any η > 0
$$\hat{\rho}^{ij}_T = 0 \quad \text{for } (i, j) \in \mathcal{C}^c,$$
with probability at least $1 - O(T^{-\eta})$.
(c) (Oracle) Consider the minimizer $\hat{\rho}_T$ of (A-5). Then for T large enough and for any
η > 0
$$\left\| \hat{\rho}_T - \rho_0 \right\|_2 \le \kappa_C \frac{\lambda^C_T}{T} \sqrt{q^C_T},$$
and, for any (i, j), $\mathrm{sign}(\hat{\rho}^{ij}_T) = \mathrm{sign}(\rho^{ij}_0)$, with probability at least $1 - O(T^{-\eta})$.
The first statement of the theorem says that the lasso estimator, when restricted to
the non–zero coefficients, is consistent (as $\frac{\lambda^C_T}{T}\sqrt{q^C_T} = o(1)$) and that the signs of these
coefficients are also estimated consistently. The second statement says that the unrestricted lasso estimator correctly
selects the non–zero coefficients asymptotically. Finally, the last statement says that the
unrestricted estimator is also consistent.
In the proof, which is analogous to that of theorem 1, we also have to take into account both
the truncation bias due to the var(p) approximation, which as explained before is
controlled by assumption 4 part (a), and the bias due to the first step estimation, which is
of order $O(n \sqrt{q^G_{T\,i}} \, \lambda^G_T / T)$ and can be controlled by assumption 4 part (d). In particular,
as also explained in remark 1 in the Appendix, this last assumption implies that if we want a
consistent estimator of both network components there must be a tradeoff between
the number of non–zero parameters in the var and the total number of series n: the
higher the former, the smaller the latter must be (and vice versa). To our knowledge,
the analysis of the consistency of the high–dimensional sparse partial correlation network
estimator when the data is subject to this type of measurement error is novel in the
literature.
Finally, the following corollary is a straightforward consequence of theorem 2.
Corollary 2. Under assumptions 1, 2, 3, 4 parts (a), (b), (c), and (d), 5, 6, 7, and
conditions 1, 2 in the Appendix, for any η > 0 and for T large enough
$$\hat{\mathcal{C}}_T = \mathcal{C},$$
with probability at least $1 - O(T^{-\eta})$. Moreover, for each i, j = 1, . . . , n
$$\operatorname*{p-lim}_{T \to \infty} \hat{\rho}^{ij}_T = \rho^{ij}_0.$$
Long Run Partial Correlation Network Estimator. Finally, we can estimate the
long run concentration matrix $K_{L0}$ with generic entries $k_{L0\,ij}$, by combining the sparse
estimators of the Granger and Contemporaneous networks
$$\hat{K}_{LT} = (I - \hat{G}_T)' \, \hat{C}_T \, (I - \hat{G}_T),$$
which has generic entries $\hat{k}_{LT\,ij}$. Then the generic entries of the Long Run Partial Correlation
network and their estimators are respectively
$$\rho^{ij}_{L0} = -\frac{k_{L0\,ij}}{\sqrt{k_{L0\,ii}\,k_{L0\,jj}}} \qquad \text{and} \qquad \hat{\rho}^{ij}_{LT} = -\frac{\hat{k}_{LT\,ij}}{\sqrt{\hat{k}_{LT\,ii}\,\hat{k}_{LT\,jj}}}.$$
The estimator of the set of the Long Run Partial Correlation network edges, $\mathcal{K}$ as
defined in (5), is
$$\hat{\mathcal{K}}_T = \{(i, j) \in V \times V : \hat{\rho}^{ij}_{LT} \neq 0\},$$
and we have the final result on network estimation.
Corollary 3. Under assumptions 1, 2, 3, 4 parts (a), (b), (c), and (d), 5, 6, 7, and
conditions 1, 2 in the Appendix, for any η > 0 and for T large enough
$$\hat{\mathcal{K}}_T = \mathcal{K},$$
with probability at least $1 - O(T^{-\eta})$. Moreover, for each i, j = 1, . . . , n
$$\operatorname*{p-lim}_{T \to \infty} \hat{\rho}^{ij}_{LT} = \rho^{ij}_{L0}.$$
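The combination step that precedes Corollary 3 can be sketched in a few lines. The function names and the `tol` argument are hypothetical, matrices are plain lists of rows, and edges are reported as unordered pairs with i < j, which is a simplification of the set notation used in the text.

```python
import math


def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def long_run_network(G, C, tol=1e-12):
    """Compute K = (I - G)' C (I - G), its partial correlations and the edge set."""
    n = len(G)
    M = [[(1.0 if i == j else 0.0) - G[i][j] for j in range(n)] for i in range(n)]
    Mt = [[M[j][i] for j in range(n)] for i in range(n)]
    K = matmul(matmul(Mt, C), M)
    rho = [[0.0 if i == j else -K[i][j] / math.sqrt(K[i][i] * K[j][j])
            for j in range(n)] for i in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n) if abs(rho[i][j]) > tol}
    return K, rho, edges
```

With G = [[0, 0], [0.5, 0]] and C equal to the identity, the function returns one edge between the two series. This shows that a Granger link alone can generate a long run partial correlation even when the innovations are contemporaneously unrelated.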
4 Simulation
In this Section we provide a simple illustration of our estimation algorithm using simulated
data. The exercise consists of simulating a multivariate time series process and using the
nets algorithm to detect the linkages of the Long Run Partial Correlation network. We
replicate the experiment 1000 times and report the Monte Carlo estimates of the roc
curves of the procedure to assess the performance of the edge detection algorithm.
In line with our empirical application, we simulate a 40 dimensional system. Recall
that the number of possible links in a 40 dimensional system is 780. We assume that the
system is generated by a var(1). The sparsity structure of the Granger and Contemporaneous
networks is determined by the Erdős–Rényi random graph model. The parameters
of the var(1) are set as
$$A_1 = 0.3 \, \iota_G \qquad \text{and} \qquad \Gamma_\varepsilon^{-1} = I - 0.2 \, \iota_C,$$
where $\iota_G$ and $\iota_C$ are adjacency matrices obtained from the Erdős–Rényi random graph
model. The networks are kept fixed across simulations and in the simulation results we
report below the number of Granger links is 78 and the number of contemporaneous links
is 44. For each replication we run the nets algorithm over a range of pairs of $\lambda^G_T$ and
$\lambda^C_T$ values. The sequence of pairs is such that $\lambda^G_T$ and $\lambda^C_T$ increase simultaneously. We
perform the experiment for different sample sizes T = 250, 500, 750, 2500.
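A minimal sketch of how the parameter matrices of this design could be generated is given below. The edge probabilities, the seed, the absence of self-loops and the choice of a directed graph for the Granger component are all assumptions, since the text does not report them. The sketch does not check that the resulting var(1) is stationary or that the concentration matrix is positive definite.

```python
import random


def erdos_renyi(n, prob, rng, symmetric):
    """Adjacency matrix without self-loops; each admissible pair is linked with probability prob."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or (symmetric and j < i):
                continue
            if rng.random() < prob:
                adj[i][j] = 1
                if symmetric:
                    adj[j][i] = 1
    return adj


def simulation_design(n=40, prob_g=0.05, prob_c=0.05, seed=0):
    """Return A1 = 0.3 * iota_G and the innovation concentration matrix I - 0.2 * iota_C."""
    rng = random.Random(seed)
    iota_g = erdos_renyi(n, prob_g, rng, symmetric=False)  # directed links (assumption)
    iota_c = erdos_renyi(n, prob_c, rng, symmetric=True)   # undirected links
    A1 = [[0.3 * iota_g[i][j] for j in range(n)] for i in range(n)]
    conc = [[(1.0 if i == j else 0.0) - 0.2 * iota_c[i][j] for j in range(n)]
            for i in range(n)]
    return A1, conc
```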
We replicate the simulation 1000 times for each sample size and in each replication we
compute the proportion of type 1 and type 2 errors. The simulation results are used to
compute the roc curve, that is the plot of the true positive rate (tpr) versus the false
positive rate (fpr). Note that the penalization constants determine the true positive
rate and the false positive rate. When $\lambda^G_T$ and $\lambda^C_T$ are small (large), the proportion of
type 1 errors is high (low) while the proportion of type 2 errors is low (high). Unfortunately,
except in special cases, the mapping between the penalization constants and the
true positive rate and false positive rate is unknown.
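For a single pair of penalization constants, one point of the roc curve can be computed from the true and the estimated edge sets as in the sketch below. The function name and the representation of edges as sets of index pairs are assumptions made for illustration.

```python
def roc_point(true_edges, est_edges, n_possible):
    """Return (tpr, fpr) for one estimated network.

    true_edges, est_edges : sets of pairs (i, j) with i < j
    n_possible            : number of possible links, e.g. n * (n - 1) // 2
    """
    tp = len(true_edges & est_edges)   # correctly detected links
    fp = len(est_edges - true_edges)   # type 1 errors
    positives = len(true_edges)
    negatives = n_possible - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr
```

Repeating this over the grid of penalization pairs and averaging across replications traces out a Monte Carlo estimate of the curve.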
Figure 4 reports the roc curves for the different sample sizes. The curves show that as
the sample size T increases the performance of the classifier, which is measured by the area
underneath the roc curves, increases rapidly. Overall, the simulation results indicate that
the estimation algorithm performs satisfactorily in a sparse high–dimensional setting.
5 Empirical Application
We illustrate the methodology proposed in this paper with an application to financial
risk networks. The application is motivated by the 2007/2009 financial crisis and is
connected to the strand of literature concerned with the development of quantitative
techniques for the measurement of systemic risk in the economy (see also Adrian and
Brunnermeier, 2009; Brownlees and Engle, 2012; Bisias, Flood, Lo, and Valavanis, 2012;
Brunnermeier and Oehmke, 2012). One of the lessons learnt from the crisis is that high
levels of interconnectedness between companies can be damaging for the entire economy.
When the degree of codependence is high, a large individual shock to a small set of firms
can have potentially vast rippling effects to all other interconnected institutions.
We use stock returns to construct a market based estimate of the network of intercon-
nections for a panel of companies. The application is inspired by the recent contributions
of Billio et al. (2012) and Diebold and Yilmaz (2013). Billio et al. (2012) use a network
definition based on bivariate Granger causality and, similarly to this application, analyse
the network of interconnections among firms using monthly returns. On the other hand,
Diebold and Yilmaz (2013) use a network definition based on variance decompositions
and analyse the network of interconnections among volatilities using volatility measures.
We consider a panel of U.S. bluechips across different industry sectors. The list of
company names and industry groups is provided in Table 1. We work with compound
monthly returns from January 1990 to December 2010, which corresponds to a
sample size of 251 observations. Classical asset pricing theory models like the capm
or apt imply that the unexpected return of these risky assets can be expressed as a
linear function of a few common factors and an idiosyncratic component. The presence of
common factors is however in contrast with the assumption of sparsity. Common factors
represent systematic components through which all asset returns are correlated. Thus in
order to estimate the Long Run Partial Correlation network from the data we first have
to control for common sources of variation. We consider here the case when factors are
observable and in particular we rely on a simple one factor model, that is
$$r_{t\,i} = \beta_i \, r_{t\,m} + z_{t\,i}, \tag{18}$$
where rt i is the rate of return for firm i, rtm is the return of the market factor and zt i is the
idiosyncratic shock of firm i. The objective is then to estimate the network structure of
the idiosyncratic shocks zt i. As is commonly assumed in empirical finance, the market
factor is treated as observed and here we use the monthly rate of return on the S&P 500
index. In this setting, our asymptotic results will also depend on the estimation error of
βi which can be easily accounted for. In some cases, the common factors are unobservable
and need to be estimated (see e.g. Forni, Hallin, Lippi, and Reichlin, 2000; Stock and
Watson, 2002; Bai, 2003). However, in this case estimation of the network will depend
also on the estimation error of the unobservable factors. We plan to address this issue in
future research.
The idiosyncratic risk network is estimated as follows. For each asset in the panel,
we first estimate the βi coefficient of equation (18) by least squares and construct the
series of residuals. We then run nets on the panel of residuals. The order of the var
approximation p is set to one. The lasso penalties $\lambda^G_T$ and $\lambda^C_T$ are determined on the
basis of the bic by searching over a grid of possible values.
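The first step of this procedure, the least squares fit of equation (18) for each asset, can be sketched as follows. The function name and the data layout are assumptions. The regression has no intercept because equation (18) has none, and the bic grid search over the two penalties is not implemented here.

```python
def factor_residuals(returns, market):
    """Least squares fit of r_ti = beta_i * r_tm + z_ti without intercept, asset by asset.

    returns : list of T rows, each with the n asset returns
    market  : list of T market factor returns
    """
    T, n = len(returns), len(returns[0])
    denom = sum(m * m for m in market)
    betas = [sum(returns[t][i] * market[t] for t in range(T)) / denom for i in range(n)]
    resid = [[returns[t][i] - betas[i] * market[t] for i in range(n)] for t in range(T)]
    return betas, resid
```

The rows of `resid` form the panel of idiosyncratic shocks on which the network is then estimated.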
The nets algorithm estimate of the idiosyncratic risk network contains 57 edges out of
820 possible ones. The estimated long run covariance matrix associated with the network
is also positive definite. Approximately 88% of the edges come from the Contemporaneous
network component. The vast majority of the partial correlations are positive and the
ones that are negative have small magnitudes from an economic perspective. Interestingly,
the linkages detected in the idiosyncratic risk network account for a significant portion
of the overall variation in stock returns. While the variability explained by the market
factor is 25% on average, the detected linkages account for an extra 15% on average.
The estimated idiosyncratic risk network is reported in Figure 5. In the figure, the
vertex diameter is proportional to the size of the firm and the vertex color denotes the industry
group. We can group vertices in two categories: vertices that belong to a connected com-
ponent and vertices that are not connected to any other company. The vertices that are
not interconnected can be associated with industry groups that can be considered as more
peripheral from an idiosyncratic risk perspective. In fact, all Consumer Discretionary companies, such as
Disney (DIS), McDonalds (MCD) and Home Depot (HD), are disconnected. Figure 6
shows the subnetwork of the connected component. In this subgraph many of the links
are within the same industry group, a phenomenon that in social network analysis
is called similarity: similar firms tend to be connected. Besides the intra–industry linkages,
the network gives insights on the inter–industry linkages and shows that Financial
and Technology companies are the ones that are most interconnected with the other
groups.
Network plots can be visually rich and it is sometimes challenging to grasp all the
information encoded in these graphs. To this end, it is useful to use techniques introduced
in the network literature to decode and synthesise the information in the graph. In
particular, in this analysis we focus on centrality and clustering.
The objective of centrality analysis is to determine which vertices are more central in
the network. Notions of centrality have been associated with systemic risk: a shock to a
node that is very central can potentially have vast rippling effects. This is put forward
in Dungey et al. (2012). Several centrality indices have been proposed in the literature.
Among other proposals, a simple measure of centrality is the number of edges of a node:
the higher the number of edges, the more interconnected and central a node is. Another
popular measure of centrality is the measure of eigenvector centrality, which is at the basis
of the PageRank algorithm used by Google to rank search results (Dungey et al., 2012). In
Figure 7, we report another plot of the connected component of the idiosyncratic risk network.
In this plot, however, we set the diameter of the vertices proportional to the number
of edges. Interestingly, the plot shows that Financial companies and some Technology
companies are central in the network: AIG (AIG), Bank of America (BAC), Citigroup
(C) and Apple (AAPL). In order to get further insights on the drivers of centrality, we
regress firm network characteristics on firm characteristics, that is we regress the number
of edges of each firm on industry dummies and (log) size. The results are reported in
Table 6. Results show that the only significant effects are the Financial, Material and
Technology Fixed Effects. Interestingly, results show that size does not matter. This
might be driven by the fact that size is not comparable across different industries. Also,
we acknowledge that regression results should be interpreted with caution given the small
sample size of 41 firms. Lastly, Figure 8 reports the Lorenz curve associated with the
number of edges in the network. The plot shows that there is a moderate degree of
“network inequality”, in the sense that the top 20% of vertices in terms of number of edges
account for more than 50% of the interconnections in the network.
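The degree-based centrality measure and the concentration statistic read off the Lorenz curve can be computed as in the sketch below. The function names are hypothetical and the rounding rule for the number of top vertices is an assumption.

```python
def degrees(edges, n):
    """Number of edges attached to each of the n vertices."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def top_share(deg, fraction=0.2):
    """Share of all edge endpoints held by the top `fraction` of vertices by degree."""
    k = max(1, int(fraction * len(deg) + 1e-9))  # number of top vertices, rounded down
    total = sum(deg)
    if total == 0:
        return 0.0
    return sum(sorted(deg, reverse=True)[:k]) / total
```

For a star graph on five vertices the single hub holds half of the edge endpoints, so `top_share` with `fraction=0.2` equals 0.5.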
Another interesting network characteristic that is often encountered in social networks
is clustering. Clustering refers to the fact that neighbouring vertices tend to be connected
to a similar set of neighbours: if A is connected to C and B is connected to C, then it is
likely that A and B are also connected. Again, different types of indices can be used to
measure clustering. Here we use a local measure of clustering: for each pair of vertices,
we compute the proportion of common neighbours of the two vertices. We compute this
index for the idiosyncratic risk network. Out of 820 pairs, 172 have a clustering index
greater than zero (approximately 20%). We report the histogram of the index in Figure
9 (conditioning on the pairs with positive clustering). The histogram shows that the
density of ties in the network can be quite high. For those pairs whose clustering index
is greater than zero, the average proportion of common neighbours is 0.246. High degree
of clustering implies that the network has “small–world” characteristics: if clustering is
large then the number of edges connecting any two vertices is going to be small. From
a risk perspective, ceteris paribus, higher clustering implies that a shock to an individual
node is going to have larger effects in the network.
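A pairwise index of this kind can be sketched as follows. The text does not spell out the normalisation, so the version below assumes that the proportion is the number of common neighbours divided by the number of vertices adjacent to at least one member of the pair, excluding the pair itself. The function name is hypothetical.

```python
def common_neighbour_index(edges, n):
    """Proportion of common neighbours for every pair of vertices (i, j) with i < j."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    index = {}
    for i in range(n):
        for j in range(i + 1, n):
            union = (nbrs[i] | nbrs[j]) - {i, j}  # neighbours of either vertex
            common = nbrs[i] & nbrs[j]            # neighbours of both vertices
            index[(i, j)] = len(common) / len(union) if union else 0.0
    return index
```

Counting the pairs with a positive index and averaging the index over those pairs gives summary statistics of the kind reported in the paragraph above.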
Overall, the results show that the idiosyncratic risk network estimated by the nets
algorithm provides interesting insights on the conditional dependence structure of id-
iosyncratic shocks in the panel. The network exhibits several of the empirical regularities
typically encountered in social networks.
6 Conclusions
In this work we have introduced novel network techniques for the analysis of high–
dimensional multivariate time series data. We provide a network definition that is specif-
ically tailored for multivariate time series based on long run partial correlations. We pro-
pose a simple estimation procedure based on the lasso that we name nets to estimate
Long Run Partial Correlation networks. The large sample properties of the procedure
are analysed and we provide conditions for consistent estimation of the sparse high–
dimensional networks. Finally, we illustrate the methodology with a financial network
analysis application.
Acknowledgements
We would like to thank Tony Hall, Nikolaus Hautsch, Julien Idier, Clifford Lam, Marco Lippi, Gabor Lugosi, Tommaso
Proietti, Kimmo Soramaki for their helpful comments. Special thanks go also to Giorgio Calzolari. We also thank the
participants to the workshop “Recent Developments in Econometrics” at Universitat Rovira i Virgili (October 26, 2012),
the UPF Finance Lunch Seminar (November 9, 2012), the conference “New Tools for Financial Regulation” at Banque
de France (November 15-16, 2012), the CORE–ECORE/LSM Seminar at UCLouvain (December 17, 2012), the conference
ICEEE 2013 at Universita di Genova (January 16–18, 2013), the ULB–ECARES Statistics seminar (February 8, 2013), the
Bundesbank seminar (February 12, 2013), the LSE Joint Econometrics and Statistics workshop (March 1, 2013), the London
School of Hygiene and Tropical Medicine Statistics seminar (March 5, 2013), the 3rd Humboldt–Copenhagen Conference
on Financial Econometrics (March 16–18, 2013), the Economics seminar at IHS in Vienna (March 26, 2013), the Economics
seminar at Universita di Pisa (April 15, 2013), the Economics seminar at IAE/UAB (April 17, 2013), the workshop on
“High Dimensional Time Series in Macroeconomics and Finance” at IHS in Vienna 2013 (May 2–4, 2013), the Financial
Econometrics conference at the Toulouse School of Economics (May 17–18, 2013), the Economics seminar at Universidad
de Alicante (May 26, 2013), the Barcelona GSE Summer Forum on Time Series Analysis in Macro and Finance workshop
(June 10–11, 2013), the Italian Statistical Society conference (June 19–21, 2013), the EUI Measuring and Modeling Financial
Risk with High Frequency Data workshop (June 27–29, 2013), the 2013 NBER-NSF Time Series Conference at the Federal
Reserve Board (September 26–27, 2013), the Economics seminar at Koc University (October 2, 2013). All mistakes are
ours.
References
Acemoglu, D., Carvalho, V., Ozdaglar, A., and Tahbaz-Salehi, A. (2012). The Network Origins of Aggregate Fluctuations. Econometrica, 80, 1977–2016.
Acemoglu, D., Ozdaglar, A., and Tahbaz-Salehi, A. (2013). Systemic Risk and Stability in Financial Networks. Technical report.
Adrian, T. and Brunnermeier, M. K. (2009). CoVaR. Technical report, FRB of New York. Staff Report No. 348.
Allen, F. and Gale, D. (2000). Financial Contagion. Journal of Political Economy, 108, 1–33.
Andrews, D. W. K. and Monahan, J. C. (1992). An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator. Econometrica, 60, 953–966.
Bai, J. (2003). Inferential Theory for Factor Models of Large Dimensions. Econometrica, 71, 135–171.
Bai, J. and Liao, Y. (2012). Efficient Estimation of Approximate Factor Models via Regularized Maximum Likelihood. arXiv 1209.5911v2.
Belloni, A., Chernozhukov, V., and Hansen, C. (2011). Inference for High–Dimensional Sparse Econometric Models. Advances in Economics and Econometrics. 10th World Congress of Econometric Society.
Bickel, P. J. and Levina, E. (2008). High Dimensional Inference and Random Matrices. The Annals of Statistics, 36, 2577–2604.
Billio, M., Getmansky, M., Lo, A., and Pellizzon, L. (2012). Econometric Measures of Connectedness and Systemic Risk in the Finance and Insurance Sectors. Journal of Financial Economics, 104, 535–559.
Bisias, D., Flood, M., Lo, A. W., and Valavanis, S. (2012). A survey of systemic risk analytics. Working paper #0001, Office of Financial Research.
Brillinger, D. (1981). Time Series: Data Analysis and Theory. Holt, Rinehart and Winston, New York.
Brownlees, C. and Engle, R. (2012). Volatility, Correlations and Tails for Systemic Risk Measurement. Technical report, New York University.
Brunnermeier, M. K. and Oehmke, M. (2012). Bubbles, financial crises, and systemic risk. Technical report, Princeton.
Bühlmann, P. and van de Geer, S. (2011). Statistics for High–Dimensional Data: Methods, Theory and Applications. Springer, New York.
Dahlhaus, R. (2000). Graphical Interaction Models for Multivariate Time Series. Metrika, 51, 157–172.
Davis, R. A., Zang, P., and Zheng, T. (2012). Sparse Vector Autoregressive Modeling. arXiv 1207.0520v1.
De Mol, C., Giannone, D., and Reichlin, L. (2008). Forecasting Using a Large Number of Predictors: Is Bayesian Shrinkage a Valid Alternative to Principal Components? Journal of Econometrics, 146, 318–328.
Dedecker, J., Doukhan, P., Lang, G., León, J., Louhichi, S., and Prieur, C. (2007). Weak Dependence: With Examples and Applications. Springer, New York.
Dempster, A. P. (1972). Covariance selection. Biometrics, 28, 157–175.
Den Haan, W. J. and Levin, A. (1994). Inferences from Parametrics and Non–Parametric Covariance Matrix Estimation Procedures. Technical report, UCSD and Federal Reserve Board.
Diebold, F. and Yilmaz, K. (2013). On the Network Topology of Variance Decompositions: Measuring the Connectedness of Financial Firms. Journal of Econometrics, forthcoming.
Doukhan, P. and Louhichi, S. (1999). A New Weak Dependence Condition and Applications to Moment Inequalities. Stochastic Processes and their Applications, 84, 313–342.
Doukhan, P. and Neumann, M. H. (2007). Probability and Moment Inequalities for Sums of Weakly Dependent Random Variables, with Applications. Stochastic Processes and their Applications, 117, 878–903.
Doukhan, P. and Neumann, M. H. (2008). The Notion of ψ–weak Dependence and its Applications to Bootstrapping Time Series. Probability Surveys, 5, 146–168.
Dungey, M., Luciani, M., and Veredas, D. (2012). Ranking Systemically Important Financial Institutions. Technical report, ECARES.
Eichler, M. (2007). Granger Causality and Path Diagrams for Multivariate Time Series. Journal of Econometrics, 137, 334–353.
Fan, J. and Peng, H. (2004). Nonconcave Penalized Likelihood with a Diverging Number of Parameters. The Annals of Statistics, 32, 928–961.
Fan, J., Liao, Y., and Mincheva, M. (2011). High Dimensional Covariance Matrix Estimation in Approximate Factor Models. The Annals of Statistics, 39, 3320–3356.
Fan, J., Liao, Y., and Mincheva, M. (2013). Large Covariance Estimation by Thresholding Principal Orthogonal Complements. Journal of the Royal Statistical Society, Series B, forthcoming.
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000). The Generalized Dynamic Factor Model: Identification and Estimation. The Review of Economics and Statistics, 82, 540–554.
Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse Inverse Covariance Estimation with the Graphical Lasso. Biostatistics, 9, 432–441.
Gallant, R. A. (1987). Nonlinear Methods in Econometrics. In J. Eatwell, M. Milgate, and P. Newman, editors, The New Palgrave, volume 3 (K–P). Stockton Press New York.
Hautsch, N., Schaumburg, J., and Schienle, M. (2012). Financial Network Systemic Risk Contributions. Technical report, Discussion Paper 2012-053, CRC 649, Humboldt-Universität zu Berlin.
Hautsch, N., Schaumburg, J., and Schienle, M. (2013). Forecasting Systemic Impact in FinancialNetworks. Technical report, Discussion Paper 2013-008, CRC 649, Humboldt-Universit•at zuBerlin.
Jaroci�nski, M. and Ma�ckowiak, B. (2011). Choice of Variables in Vector Autoregressions. Tech-nical report.
Kock, A. B. (2012). On the Oracle Property of the Adaptive Lasso in Stationary and Non-stationary Autoregressions. CREATES Research Papers 2012-05, School of Economics andManagement, University of Aarhus.
Kock, A. B. and Callot, L. A. (2012). Oracle Inequalities for High Dimensional Vector Au-toregressions. CREATES Research Papers 2012-16, School of Economics and Management,University of Aarhus.
Lam, C. and Fan, J. (2009). Sparsistency and Rates of Convergence in Large Covariance MatrixEstimation. The Annals of Statistics, 37, 4254{4278.
Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford.
Ledoit, O. and Wolf, M. (2004). A Well-Conditioned Estimator for Large{Dimensional Covari-ance Matrices. Journal of Multivariate Analysis, 88, 365{411.
Medeiros, C. M. and Mendes, E. (2012). Estimating High{Dimensional Time Series Models.Technical report, Ponti�cal Catholic University of Rio de Janeiro.
Meinshausen, N. and B•uhlmann, P. (2006). High Dimensional Graphs and Variable Selectionwith the Lasso. The Annals of Statistics, 34, 1436{1462.
Newey, W. K. and West, K. D. (1987). A simple, positive semi-de�nite, heteroskedasticity andautocorrelation consistent covariance matrix. Econometrica, 55, 703{708.
Peng, J., Wang, P., Zhou, N., and Zhu, J. (2009). Partial Correlation Estimation by Joint SparseRegression Models. Journal of the American Statistical Association, 104, 735{746.
Peterson, C., Vannucci, M., Karakas, C., Choi, W., Ma, L., and Maleti�c-Savati�c, M. (2013). In-ferring Metabolic Networks Using Baysian Adaptive Graphical Lasso with Informative Priors.Statistics and Its Interface, page forthcoming.
Pourahmadi, M. (2011). Covariance Estimation: The GLM and the Regularization Perspectives.Statistical Science, 26, 369{387.
Song, S. and Bickel, P. J. (2011). Large Vector Auto Regressions. arXiv 1106.3915v1.
Stock, J. H. and Watson, M. W. (2002). Macroeconomic Forecasting Using Diffusion Indexes. Journal of Business and Economic Statistics, 20, 147–162.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58, 267–288.
White, H. (1984). Asymptotic Theory for Econometricians. New York: Academic Press.
Xavier, F., Parigi, B. M., and Rochet, J.-C. (2000). Systemic Risk, Interbank Relations and Liquidity Provision by the Central Bank. Journal of Money, Credit and Banking, 32, 611–638.
Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101, 1418–1429.
Table 1: U.S. Bluechips
ID  Tick  Name                                    Sector
1   BRK   Berkshire Hathaway B                    Financial
2   C     Citigroup Inc.                          Financial
3   BAC   Bank of America Corp.                   Financial
4   AXP   American Express Co.                    Financial
5   AIG   American International Group Inc.       Financial
6   MCD   McDonalds Corp.                         Consumer Discretionary
7   DIS   Walt Disney Co.                         Consumer Discretionary
8   HD    Home Depot Inc.                         Consumer Discretionary
9   PG    Procter & Gamble Co.                    Consumer Staples
10  KO    Coca-Cola Co.                           Consumer Staples
11  WMT   Wal-Mart Stores Inc.                    Consumer Staples
12  PEP   PepsiCo Inc.                            Consumer Staples
13  SLB   Schlumberger Ltd.                       Energy
14  OXY   Occidental Petroleum Corp.              Energy
15  PFE   Pfizer Inc.                             Healthcare
16  JNJ   Johnson & Johnson                       Healthcare
17  MRK   Merck & Co. Inc.                        Healthcare
18  ABT   Abbott Laboratories                     Healthcare
19  AMGN  Amgen Inc.                              Healthcare
20  GE    General Electric Co.                    Industrials
21  UTX   United Technologies Corp.               Industrials
22  UNP   Union Pacific Corp.                     Industrials
23  MMM   3M Co.                                  Industrials
24  CAT   Caterpillar Inc.                        Industrials
25  BA    Boeing Co.                              Industrials
26  DD    E.I. DuPont de Nemours & Co.            Materials
27  FCX   Freeport-McMoRan Copper & Gold Inc.     Materials
28  DOW   Dow Chemical Co.                        Materials
29  NEM   Newmont Mining Corp.                    Materials
30  PPG   PPG Industries Inc.                     Materials
31  AAPL  Apple Inc.                              Technology
32  IBM   International Business Machines Corp.   Technology
33  MSFT  Microsoft Corp.                         Technology
34  T     AT&T Inc.                               Technology
35  ORCL  Oracle Corp.                            Technology
36  CSCO  Cisco Systems Inc.                      Technology
37  INTC  Intel Corp.                             Technology
38  DUK   Duke Energy Corp.                       Utilities
39  SO    Southern Co.                            Utilities
40  D     Dominion Resources Inc. (Virginia)      Utilities
41  AEP   American Electric Power Co. Inc.        Utilities
List of company names and sectors.
Table 2: Network Regression
Variable              Estimate
Consumer Staples FE   0.62
Energy FE             1.15
Financial FE          5.66**
Healthcare FE         2.05
Industrials FE        1.33
Materials FE          4.91*
Technology FE         2.76*
Utilities FE          1.23
(Log) Size            0.06
R²                    0.40
Regression of node degree on firm characteristics (sector fixed effects and log size).
Figure 1: Partial Correlation Network Example.
[Figure: network graph on nodes 1 to 5 with weighted edges.]
The figure displays the partial correlation network associated with Example 1.
Figure 2: Long Run Partial Correlation Network Example.
[Figure: three network graphs, each on nodes 1 to 3.]
The figure displays the Granger (top), Contemporaneous (middle), and Long Run Partial Correlation (bottom) networks associated with Example 4.
Figure 3: Long Run Partial Correlation Network Example.
[Figure: three network graphs, each on nodes 1 to 6, with weighted edges.]
The figure displays the Granger (top), Contemporaneous (middle), and Long Run Partial Correlation (bottom) networks associated with Example 5.
Figure 4: Monte Carlo Evidence
[Figure: four panels of ROC curves, for T=250, T=500, T=750, and T=2500.]
The figure displays the ROC curves associated with the Monte Carlo experiment of Section 4. The set of penalization constants is λ = {l 0.1T : l = 1, ..., 20} for both lasso regressions.
Figure 5: Idiosyncratic Risk Network.
Plot of the full idiosyncratic risk network.
Figure 6: Idiosyncratic Risk Network.
Plot of the connected component of the idiosyncratic risk network.
Figure 7: Idiosyncratic Risk Network.
Plot of the connected component of the idiosyncratic risk network. The node size denotes centrality in the network.
Figure 8: Edge Concentration.
The figure shows the Lorenz curve of the edge concentration.
Figure 9: Clustering
The figure shows the histogram of the number of common neighbours for all node pairs with at least one neighbour in common.