bioinformatics applications note doi:10.1093/bioinformatics… · 2017-04-04 · smyth,g.k. (2005)...
TRANSCRIPT
Vol 30 no 4 2014 pages 571ndash573BIOINFORMATICS APPLICATIONS NOTE doi101093bioinformaticsbtt705
Gene expression Advance Access publication December 3 2013
Cascade a R package to study predict and simulate the diffusion
of a signal through a temporal gene networkNicolas Jung12 Frederic Bertrand2 Seiamak Bahram1 Laurent Vallat1 andMyriam Maumy-Bertrand2
1INSERM UMR S_1109 Labex Transplantex FMTS Hopitaux and Faculte de Medecine Universite de Strasbourg67085 Strasbourg Cedex and 2IRMA CNRS UMR 7501 Labex IRMIA Universite de Strasbourg 67084 StrasbourgCedex France
Associate Editor Ziv Bar-Joseph
ABSTRACT
Summary Temporal gene interactions in response to environmental
stress form a complex system that can be efficiently described using
gene regulatory networks They allow highlighting the more influential
genes and spotting some targets for biological intervention experi-
ments Despite that many reverse engineering tools have been
designed the Cascade package is an integrated solution adding
several new and original key features such as the ability to predict
changes in gene expressions after a biological perturbation in the
network and graphical outputs that allow monitoring the spread of a
signal through the network
Availability and implementation The R package Cascade is
available online at httpwww-mathu-strasbgfrgenpredspipphp
rubrique4
Contact fbertranmathunistrafr
Supplementary information Supplementary data are available at
Bioinformatics online
Received on June 6 2013 revised on October 4 2013 accepted on
November 28 2013
1 INTRODUCTION
Since the emergence of high-throughput technologies that allow
measuring simultaneously expression of thousands of genes
many tools have been developed to learn gene expression profiles
and reverse engineer their underlying gene regulatory network
(GRN) (Bar-Joseph et al 2012 Hecker et al 2009) These
tools are either based on static coexpression methods or if the
biological phenomenon shows any temporality time-dependent
methods Although the former relies on the assumption that coex-
pressed genes share some biological characteristics the latter
infers a directed network with temporal dependencies In this
last case another important distinction should be made between
exogenous stress (eg growth response) and endogenous phenom-
enon (eg cell cycle) (Yosef et al 2011 Zhu et al 2007) This
leads to different network topologies in exogenous stress net-
worksrsquo topologies seem to have larger hubs and shorter paths
through temporal-dependent transcriptional waves (Luscombe
et al 2004) This results in a quick response to environmental
modifications (Luscombe et al 2004) The Cascade package
is designed to model such lsquocascade networksrsquo taking advantageof the assignment of genes to temporal clusters which adds
temporal causality in the network
2 DETAILS ON THE PACKAGE FEATURES
This package has been designed to analyze temporal microarraydatasets allowing gene selection temporal cluster assignmentreverse engineering the GRN using a penalized regression
model and predicting the effect of biological intervention experi-ments It also features a temporal synthetic cascade simulation
tool The biological interpretations are facilitated thanks toseveral graphical outputs More insight about the statisticaltools as well as benchmarks is provided in Vallat et al (2013)
21 Gene selection and cluster assignment
Selecting the genes for reverse engineering is a crucial stepBesides selecting genes with high-differential expressions the
Cascade package allows enriching the selection with genesfeaturing specific temporal patterns As pointed out by Hao
and Baltimore (2009) several temporal gene expression wavescorresponding to specific cellular functions can be individualizedafter stimulation of the cellular environment In this pulsed bio-
logical response some relevant genes may have low but system-atic differential expressions This selection step mostly relies on
the Bioconductor R package limma (Smyth et al 2005)Each gene must be then assigned to one of the time clusters
This can be automatically performed (according to the first time
when the gene is differentially expressed) Alternatively the timeclusters can be user-provided
22 Reverse engineering of the network
The reverse engineering algorithm is the Lasso-penalized estima-tion of a linear regression model described in Vallat et al (2013)The Lasso penalty ensures sparsity which is a well-known fea-
ture of most biological networks (Barabasi 2003) Furthermorethe temporal gene clusters are taken into account using a set of
matrices F to describe how genes interact
Y frac14XN
ifrac141
FmethXiTHORNmethYTHORNiXi thorn eth1THORN
where Y is the regulated gene and the Xi are potential regulatorgenes the i determine the strength of the link between Xi and Y
methTHORN is the function thatmaps a gene to its temporal cluster and isTo whom correspondence should be addressed
The Author 2013 Published by Oxford University Press All rights reserved For Permissions please e-mail journalspermissionsoupcom 571
a noise Some further constraints are set to ensure a temporal
causality andweuse the Lasso estimator to achieve some sparsityIt is common knowledge that biological networks are scale-
free (Barabasi 2003) the distribution of the outgoing edges in
the networks follows a power law distribution As a consequence
using a statistical test from Clauset et al (2009) we derived a
cutoff value for the coefficients It was established by a simu-
lation study that such a procedure greatly improves F-scores
(Van Rijsbergen 1979) A graphical output Supplementary
Material S1 shows the modification of the network topology
when this cutoff varies For a given cutoff a graphical output
Supplementary Material S2 shows how the stimulated transcrip-
tional response spreads through the network If time clusters are
heterogeneous matrices F and values are iteratively estimated
in a coordinate ascendant approach On the contrary if all the
time clusters are homogeneous enough the estimation of the
matrices Fmay be achieved using all the genes in each of the clus-
ters instead of using only those pointed out by their values
This results in a non-iterative algorithm matrices F and values
are only estimated once
23 Prediction
We can predict changes in gene expressions using Equation (1)
after a gene intervention experiment at the first time point as
validated in silico and biologically in Vallat et al (2013)
24 Simulation
The Cascade package provides two simulation tools On the first
hand a random network can be simulated following the prefer-
ential attachment theory (Barabasi 2003) with some constraints
to ensure that the result is a temporal cascade network On the
other hand the model Equation (1) can be used to simulate
gene expressions from any given network
3 EXAMPLES
Two packagersquos vignettes detail the comprehensive analysis of two
example datasets A first dataset extracted from GSE39411 is
based on the transcriptional response of healthy lymphocytes
B-cells after antigenic stimulation (Vallat et al 2007) The
second dataset (E-MTAB-1475) has a different experimental
design and is based on the transcriptional response of murine
lymphocytes T-cells after an in vitro stimulation that sustains
cellular differentiation (van den Ham et al 2013) In both
cases gene expressions measured at different time points after
cell stimulation are used to select genes with specific temporal
patterns or high differential expressions which are then assigned
to time clusters (Fig 1 for GSE39411 and Supplementary
Material S3 and S4 for E-MTAB-1475) The reverse engineering
of the GRN highlights the most influential genes in the temporal
cascade (Fig 2 and Supplementary Material S3ndashS5) The impact
in the GRN of a knockdown experiment of one influential gene
is predicted (Fig 3 and Supplementary Material S3ndashS6)
ACKNOWLEDGEMENT
Nicolas Jung Frederic Bertrand Laurent Vallat and Myriam
Maumy-Bertrand have contributed equally to this article
Funding Strasbourg High Throughput Next Generation
Sequencing facility (GENOMAX) INSERM UMR_S 1109
CNRS UMR 7501 University of Strasbourg INSERM
Fig 3 Step 3 predicted perturbations in the network at the second time
point after gene expression modulation at an early time in the temporal
GRN of GSE39411 The green influential gene is supposed to be knocked
down Color scale legend from downregulated (blue) to upregulated (red)
genes
Fig 2 Step 2 Reverse engineering of the network in GSE39411 Nodes
represent genes and the arrows statistical links between the genes
Arrowsrsquo thickness depicts the intensity of the link
Fig 1 Step 1 gene selection in GSE39411 and assignment to a time
cluster
572
NJung et al
(ITMO cancer Systems Biology plan cancer 2009-2013) CNRS(PEPS-BMI interdisciplinarity 2013)
Conflict of Interest none declared
REFERENCES
BarabasiAL (2003) Emergence of scaling in complex networks In BornholdtS
and SchusterHG (eds) Handbook of Graphs and Networks From the Genome
to the Internet Wiley-VCH Weinheim pp 69ndash84
Bar-JosephZ et al (2012) Studying and modelling dynamic biological processes
using time-series gene expression data Nat Rev Genet 13 552ndash564
ClausetA et al (2009) Power-law distributions in empirical data SIAM Rev 51
661ndash703
HaoS and BaltimoreD (2009) The stability of mRNA influences the temporal
order of the induction of genes encoding inflammatory molecules Nat
Immunol 10 281ndash288
HeckerM et al (2009) Gene regulatory network inference data integration in
dynamic models-a review Biosystems 96 86ndash103
LuscombeNM et al (2004) Genomic analysis of regulatory network dynamics
reveals large topological changes Nature 431 308ndash312
SmythGK (2005) Limma linear models for microarray data In GentlemanR
et al (eds) Bioinformatics and Computational Biology Solutions Using R and
Bioconductor Springer New York pp 397ndash420
van den HamHJ et al (2013) Early divergence of Th1 and Th2 transcriptomes
involves a small core response and sets of transiently expressed genes Eur J
Immunol 43 1074ndash1084
Van RijsbergenCJ (1979) Information Retrieval 2nd edn Butterworth London
VallatL et al (2007) Temporal genetic program following B-cell receptor cross-
linking altered balance between proliferation and death in healthy and malig-
nant B cells Blood 109 3989ndash3997
VallatL et al (2013) Reverse-engineering the genetic circuitry of a cancer cell with
predicted intervention in chronic lymphocytic leukemia Proc Natl Acad Sci
USA 110 459ndash464
YosefN and RegevA (2011) Impulse control temporal dynamics in gene
transcription Cell 144 886ndash896
ZhuX et al (2007) Getting connected analysis and principles of biological
networks Genes Dev 21 1010ndash1024
573
Cascade
a noise Some further constraints are set to ensure a temporal
causality andweuse the Lasso estimator to achieve some sparsityIt is common knowledge that biological networks are scale-
free (Barabasi 2003) the distribution of the outgoing edges in
the networks follows a power law distribution As a consequence
using a statistical test from Clauset et al (2009) we derived a
cutoff value for the coefficients It was established by a simu-
lation study that such a procedure greatly improves F-scores
(Van Rijsbergen 1979) A graphical output Supplementary
Material S1 shows the modification of the network topology
when this cutoff varies For a given cutoff a graphical output
Supplementary Material S2 shows how the stimulated transcrip-
tional response spreads through the network If time clusters are
heterogeneous matrices F and values are iteratively estimated
in a coordinate ascendant approach On the contrary if all the
time clusters are homogeneous enough the estimation of the
matrices Fmay be achieved using all the genes in each of the clus-
ters instead of using only those pointed out by their values
This results in a non-iterative algorithm matrices F and values
are only estimated once
23 Prediction
We can predict changes in gene expressions using Equation (1)
after a gene intervention experiment at the first time point as
validated in silico and biologically in Vallat et al (2013)
24 Simulation
The Cascade package provides two simulation tools On the first
hand a random network can be simulated following the prefer-
ential attachment theory (Barabasi 2003) with some constraints
to ensure that the result is a temporal cascade network On the
other hand the model Equation (1) can be used to simulate
gene expressions from any given network
3 EXAMPLES
Two packagersquos vignettes detail the comprehensive analysis of two
example datasets A first dataset extracted from GSE39411 is
based on the transcriptional response of healthy lymphocytes
B-cells after antigenic stimulation (Vallat et al 2007) The
second dataset (E-MTAB-1475) has a different experimental
design and is based on the transcriptional response of murine
lymphocytes T-cells after an in vitro stimulation that sustains
cellular differentiation (van den Ham et al 2013) In both
cases gene expressions measured at different time points after
cell stimulation are used to select genes with specific temporal
patterns or high differential expressions which are then assigned
to time clusters (Fig 1 for GSE39411 and Supplementary
Material S3 and S4 for E-MTAB-1475) The reverse engineering
of the GRN highlights the most influential genes in the temporal
cascade (Fig 2 and Supplementary Material S3ndashS5) The impact
in the GRN of a knockdown experiment of one influential gene
is predicted (Fig 3 and Supplementary Material S3ndashS6)
ACKNOWLEDGEMENT
Nicolas Jung Frederic Bertrand Laurent Vallat and Myriam
Maumy-Bertrand have contributed equally to this article
Funding Strasbourg High Throughput Next Generation
Sequencing facility (GENOMAX) INSERM UMR_S 1109
CNRS UMR 7501 University of Strasbourg INSERM
Fig 3 Step 3 predicted perturbations in the network at the second time
point after gene expression modulation at an early time in the temporal
GRN of GSE39411 The green influential gene is supposed to be knocked
down Color scale legend from downregulated (blue) to upregulated (red)
genes
Fig 2 Step 2 Reverse engineering of the network in GSE39411 Nodes
represent genes and the arrows statistical links between the genes
Arrowsrsquo thickness depicts the intensity of the link
Fig 1 Step 1 gene selection in GSE39411 and assignment to a time
cluster
572
NJung et al
(ITMO cancer Systems Biology plan cancer 2009-2013) CNRS(PEPS-BMI interdisciplinarity 2013)
Conflict of Interest none declared
REFERENCES
BarabasiAL (2003) Emergence of scaling in complex networks In BornholdtS
and SchusterHG (eds) Handbook of Graphs and Networks From the Genome
to the Internet Wiley-VCH Weinheim pp 69ndash84
Bar-JosephZ et al (2012) Studying and modelling dynamic biological processes
using time-series gene expression data Nat Rev Genet 13 552ndash564
ClausetA et al (2009) Power-law distributions in empirical data SIAM Rev 51
661ndash703
HaoS and BaltimoreD (2009) The stability of mRNA influences the temporal
order of the induction of genes encoding inflammatory molecules Nat
Immunol 10 281ndash288
HeckerM et al (2009) Gene regulatory network inference data integration in
dynamic models-a review Biosystems 96 86ndash103
LuscombeNM et al (2004) Genomic analysis of regulatory network dynamics
reveals large topological changes Nature 431 308ndash312
SmythGK (2005) Limma linear models for microarray data In GentlemanR
et al (eds) Bioinformatics and Computational Biology Solutions Using R and
Bioconductor Springer New York pp 397ndash420
van den HamHJ et al (2013) Early divergence of Th1 and Th2 transcriptomes
involves a small core response and sets of transiently expressed genes Eur J
Immunol 43 1074ndash1084
Van RijsbergenCJ (1979) Information Retrieval 2nd edn Butterworth London
VallatL et al (2007) Temporal genetic program following B-cell receptor cross-
linking altered balance between proliferation and death in healthy and malig-
nant B cells Blood 109 3989ndash3997
VallatL et al (2013) Reverse-engineering the genetic circuitry of a cancer cell with
predicted intervention in chronic lymphocytic leukemia Proc Natl Acad Sci
USA 110 459ndash464
YosefN and RegevA (2011) Impulse control temporal dynamics in gene
transcription Cell 144 886ndash896
ZhuX et al (2007) Getting connected analysis and principles of biological
networks Genes Dev 21 1010ndash1024
573
Cascade
(ITMO cancer Systems Biology plan cancer 2009-2013) CNRS(PEPS-BMI interdisciplinarity 2013)
Conflict of Interest none declared
REFERENCES
BarabasiAL (2003) Emergence of scaling in complex networks In BornholdtS
and SchusterHG (eds) Handbook of Graphs and Networks From the Genome
to the Internet Wiley-VCH Weinheim pp 69ndash84
Bar-JosephZ et al (2012) Studying and modelling dynamic biological processes
using time-series gene expression data Nat Rev Genet 13 552ndash564
ClausetA et al (2009) Power-law distributions in empirical data SIAM Rev 51
661ndash703
HaoS and BaltimoreD (2009) The stability of mRNA influences the temporal
order of the induction of genes encoding inflammatory molecules Nat
Immunol 10 281ndash288
HeckerM et al (2009) Gene regulatory network inference data integration in
dynamic models-a review Biosystems 96 86ndash103
LuscombeNM et al (2004) Genomic analysis of regulatory network dynamics
reveals large topological changes Nature 431 308ndash312
SmythGK (2005) Limma linear models for microarray data In GentlemanR
et al (eds) Bioinformatics and Computational Biology Solutions Using R and
Bioconductor Springer New York pp 397ndash420
van den HamHJ et al (2013) Early divergence of Th1 and Th2 transcriptomes
involves a small core response and sets of transiently expressed genes Eur J
Immunol 43 1074ndash1084
Van RijsbergenCJ (1979) Information Retrieval 2nd edn Butterworth London
VallatL et al (2007) Temporal genetic program following B-cell receptor cross-
linking altered balance between proliferation and death in healthy and malig-
nant B cells Blood 109 3989ndash3997
VallatL et al (2013) Reverse-engineering the genetic circuitry of a cancer cell with
predicted intervention in chronic lymphocytic leukemia Proc Natl Acad Sci
USA 110 459ndash464
YosefN and RegevA (2011) Impulse control temporal dynamics in gene
transcription Cell 144 886ndash896
ZhuX et al (2007) Getting connected analysis and principles of biological
networks Genes Dev 21 1010ndash1024
573
Cascade