Dr. Carlo Cosentino, Carnegie Mellon University, Pittsburgh, 2008
Modeling Biological Networks
Dr. Carlo Cosentino
School of Computer and Biomedical Engineering
Department of Experimental and Clinical Medicine
Università degli Studi Magna Graecia
Catanzaro
http://bioingegneria.unicz.it/~cosentino
Outline
Classification of biological networks
Modeling metabolic networks
Modeling gene regulatory networks
Inferring gene regulatory networks
Types of Biological Network
Several different kinds of biological network can be distinguished at the molecular level
Gene regulatory
Metabolic
Signal transduction
Protein–protein interaction
Moreover, other networks can be considered at different levels of description, e.g.
Immunological
Ecological
Here we will focus exclusively on molecular processes that take place within the cell
Goals
A major challenge is to identify, with reasonable accuracy, the complex macromolecular interactions at the gene, metabolite and protein levels
Once identified, the network model can be used to
simulate the process it represents
predict the features of its dynamical behavior
extrapolate cellular phenotypes
Graphs
Graphs are a very useful formal tool for describing and visualizing biological networks
A graph, or undirected graph, is an ordered pair G=(V,E), where V is the set of vertices, or nodes, and E is the set of unordered pairs of distinct vertices, called edges or lines
For each edge {u,v}, the nodes u and v are said to be adjacent
We have a directed graph, or digraph, if E is a set of ordered pairs
In digraphs, the in–degree kin (out–degree kout) of a node is the number of edges pointing to (leaving from) that node
Barabasi et al, Nature Reviews Genetics 5(2), 101–113, 2004
Topological Characteristics
The degree distribution, P(k), gives the probability that a randomly selected node has exactly k links
It allows us to distinguish between different classes of networks (see next slide)
The clustering coefficient of a node i, Ci, measures how densely its neighbors are interconnected, i.e. the number of “triangles” passing through node i, relative to the maximum possible number
C(k) is the average clustering coefficient of all nodes with k links
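A minimal Python sketch of these measures, assuming an undirected graph stored as a dictionary of neighbor sets. The toy graph is hypothetical, and Ci is computed with the standard normalization Ci = 2ni / (ki(ki - 1)), where ni is the number of links among the ki neighbors of node i.

```python
from collections import Counter
from itertools import combinations


def degree_distribution(adj):
    """P(k): fraction of nodes having exactly k links (undirected graph)."""
    n = len(adj)
    counts = Counter(len(neigh) for neigh in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def clustering_coefficient(adj, node):
    """C_i = 2 n_i / (k_i (k_i - 1)), n_i = number of links among the neighbors of i."""
    neigh = adj[node]
    k = len(neigh)
    if k < 2:
        return 0.0  # undefined for k < 2; reported as 0 by convention here
    n_i = sum(1 for u, v in combinations(neigh, 2) if v in adj[u])
    return 2 * n_i / (k * (k - 1))


def clustering_by_degree(adj):
    """C(k): average clustering coefficient of all nodes with k links."""
    groups = {}
    for node, neigh in adj.items():
        groups.setdefault(len(neigh), []).append(clustering_coefficient(adj, node))
    return {k: sum(cs) / len(cs) for k, cs in sorted(groups.items())}


# Hypothetical toy graph: triangle A-B-C plus node D attached to C
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(degree_distribution(adj))   # {1: 0.25, 2: 0.5, 3: 0.25}
print(clustering_by_degree(adj))  # {1: 0.0, 2: 1.0, 3: 0.333...}
```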
Barabasi et al, Nature Reviews Genetics 5(2), 101–113, 2004
Erdős–Rényi Random Networks
The Erdős–Rényi model of a random network starts with N nodes and connects each pair of nodes with probability p
The degree follows a Poisson distribution, thus most nodes have a number of links close to the average degree <k>
The tail decreases exponentially, which indicates that nodes with k very different from the average are rare
The clustering coefficient is independent of a node’s degree
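A direct sketch of this construction: every pair of nodes is linked independently with probability p, so each node's degree is binomially distributed with mean p(N-1), which is close to a Poisson distribution for large N and small p. The values of N, p and the random seed are arbitrary choices for illustration.

```python
import random


def erdos_renyi(n, p, seed=None):
    """G(N, p): each of the N(N-1)/2 node pairs is linked independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


n, p = 1000, 0.01
adj = erdos_renyi(n, p, seed=1)
degrees = [len(neigh) for neigh in adj.values()]
mean_k = sum(degrees) / n
print(f"<k> = {mean_k:.2f}, expected p(N-1) = {p * (n - 1):.2f}")
print("min/max degree:", min(degrees), max(degrees))  # both stay fairly close to <k>
```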
Barabasi et al, Nature Reviews Genetics 5(2), 101–113, 2004
Scale–Free Networks
Scale–free networks are characterized by a power–law degree distribution
The probability that a node has k links follows $P(k) \sim k^{-\gamma}$, where γ is the degree exponent
Highly connected nodes (hubs) are much more likely to occur than in a random graph
In the Barabási–Albert model, at each time step a new node with M links is added to the network, and each link connects to an already existing node i with probability $\Pi_i = k_i / \sum_j k_j$
The underlying mechanism is that nodes with many links have a higher probability of acquiring more (this is also referred to as preferential attachment)
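A minimal sketch of preferential attachment. It keeps a list in which every node appears once per link it owns, so drawing uniformly from that list selects node i with probability ki / Σj kj. The small complete starting graph, the rejection of duplicate targets and the chosen sizes are simplifying assumptions of this sketch.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph: each new node attaches m links to distinct existing nodes,
    choosing node i with probability proportional to its degree k_i."""
    rng = random.Random(seed)
    # start from a complete graph on m + 1 nodes so every node has degree >= 1
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # each node appears in the pool once per link it has
    pool = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:  # duplicates are simply redrawn
            chosen.add(rng.choice(pool))
        adj[new] = set()
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            pool.extend([new, t])  # both endpoints gain one link
    return adj


adj = barabasi_albert(5000, 2, seed=42)
degrees = sorted((len(v) for v in adj.values()), reverse=True)
print("five largest degrees:", degrees[:5])  # hubs
print("median degree:", degrees[len(degrees) // 2])
```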
Barabasi et al, Nature Reviews Genetics 5(2), 101–113, 2004
Hierarchical Networks
A hierarchical structure arises in systems that combine modularity and scale–free topology
The hierarchical model is based on the replication of a small, densely linked cluster of four nodes (the central ones in the figure)
The external nodes of the replicas are linked to the central node of the original cluster
The resulting network has a power–law degree distribution, thus it is scale–free
The average clustering coefficient scales with the degree following $C(k) \sim k^{-1}$
Barabasi et al, Nature Reviews Genetics 5(2), 101–113, 2004
Graphs of Biological Networks
Depending on the kind of biological network, the edges and nodes of the graph have different meanings
Metabolic network
nodes: metabolites; edge: a reaction transforming metabolite A into metabolite B
Transcriptional regulation network (protein–DNA)
nodes: genes and proteins; edge: a transcription factor (TF) regulates a gene
Protein – protein network
nodes: proteins; edge: interaction between proteins
Gene regulatory networks (functional association network)
nodes: genes; edge: the expression levels of genes A and B are correlated
Topology of Biological Networks
Albert (2005) published an extensive review of the literature on the topology of different kinds of biological networks
Experimental evidence is reviewed for metabolic, transcriptional regulatory, signal transduction and functional association networks
All of the networks considered approximately exhibit a power–law degree distribution, at least for the in– or for the out–degree
For instance, transcriptional regulation networks exhibit a scale–free out–degree distribution, signifying the potential of transcription factors to regulate multiple targets
On the other hand, their in–degree distribution follows a more restricted, exponential function, suggesting that combinatorial regulation by several TFs is less frequent
Albert, Scale–free networks in cell biology, Journal of Cell Science 118(21), 4947–4957, 2005
P–P Interaction Network in Yeast
This network is based on yeast two–hybrid experiments
Few highly connected nodes (hubs) hold the network together
Barabasi et al, Nature Reviews Genetics 5(2), 101–113, 2004
The color of a node indicates the phenotypic effect of removing the corresponding protein
red: lethal
green: non–lethal
orange: slow growth
yellow: unknown
Outline
Classification of biological networks
Modeling metabolic networks
Modeling gene regulatory networks
Inferring gene regulatory networks
Metabolic Reactions
Living cells require energy and material for
building up membranes
storing molecules
replenishing enzymes
replication and repair of DNA
movement
Metabolic reactions can be divided into two categories
Catabolic reactions: breakdown of complex compounds to obtain energy and building blocks
Anabolic reactions: assembly of the compounds used by the cellular mechanisms
Basic Concepts of Metabolism
Historically, metabolism is the part of cell functioning that has been studied most thoroughly over the last decades
This implies that several well-established mathematical tools exist for describing this kind of network
Enzyme kinetics investigates the dynamic properties of the individual reactions in isolation
Stoichiometric analysis deals with the balance of compound production and degradation at the network level
Metabolic control analysis describes the effect of perturbations in the network, in terms of changes of metabolite concentrations
Most of the tools used in the quantitative study of metabolic networks can also be applied to other types of networks
Glycolysis
We will use glycolysis in yeast as a case study to illustrate the theoretical concepts introduced hereafter
The pathway shown below is part of the glycolysis process
Hynne et al, Full–scale model of glycolysis in Saccharomyces cerevisiae (2001) Biophys. Chem. 94, 121–163
List of Reactions
v1: hexokinase
v2: consumption of glucose–6–phosphate in other pathways
v3: phosphoglucoisomerase
v4: phosphofructokinase
v5: aldolase
v6: ATP production in lower glycolysis
v7: ATP consumption in other pathways
v8: adenylate kinase
ODE Model of Glycolysis
The system of ODEs describing the pathway is
ODE Model with Constant Glucose
The kinetic rates as functions of reactants can be derived by applying the models presented in the previous lecture
Model Parameters
Stoichiometric Analysis
The basic elements considered in stoichiometric analysis of metabolic networks are
The concentrations of the various species
The reactions or transport processes affecting such concentrations
The stoichiometric coefficients denote the proportion of substrate and product molecules involved in a reaction
For instance, if we consider the reaction
$S_1 + S_2 \rightarrow 2P$
the stoichiometric coefficients of S1, S2, P are -1, -1, 2, respectively (substrates are counted negatively, products positively)
Stoichiometric Analysis
The change of concentrations in time can be described by means of ODEs
For the simple reaction above we have
$\frac{dS_1}{dt} = -v, \quad \frac{dS_2}{dt} = -v, \quad \frac{dP}{dt} = 2v$
This means that the degradation of S1 with rate v is accompanied by the degradation of S2 at the same rate and by the production of P at twice that rate
Stoichiometric Matrix
In general, for a system of m substances and r reactions, the system dynamics are described by
$\frac{dS_i}{dt} = \sum_{j=1}^{r} n_{ij} v_j, \quad i = 1, \dots, m$
The number nij is the stoichiometric coefficient of the i-th metabolite in the j-th reaction
For the sake of simplicity, we assume that concentrations change only because of reactions (i.e. we neglect the effects of convection and diffusion)
We can then define the stoichiometric matrix
$N = \{n_{ij}\}, \quad i = 1, \dots, m, \; j = 1, \dots, r$
in which columns correspond to reactions and rows to metabolites (concentration variations)
Stoichiometric Model
The mathematical description of the metabolic network can be given in matrix form as
$\frac{dS}{dt} = N v$
where
S=(S1,…,Sm)T is the vector of concentration values
v=(v1,…,vr)T is the vector of reaction rates
If the system is at steady–state (that is, dSi/dt = 0 for i=1,…,m), we can also define the vector of steady–state fluxes, J=(J1,…,Jr)T, which satisfies $N J = 0$
Finally, the model involves a certain number of parameters, so we can also define a parameter vector, p=(p1,…,pη)T
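A minimal sketch of the matrix form dS/dt = Nv for the single reaction S1 + S2 -> 2P. The column of N is (-1, -1, 2): substrates are counted negatively, so S1 and S2 are consumed at rate v while P is produced at rate 2v. The mass-action rate law v = k·S1·S2, the value of k and the initial concentrations are assumptions made only for illustration.

```python
def dS_dt(N, v):
    """Right-hand side of dS/dt = N v (rows of N: metabolites, columns: reactions)."""
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]


# Reaction S1 + S2 -> 2P; rows ordered (S1, S2, P), one column for the single reaction
N = [[-1], [-1], [2]]


def rates(S, k=0.5):
    """Assumed mass-action rate law v = k * S1 * S2 (k is a hypothetical value)."""
    s1, s2, _ = S
    return [k * s1 * s2]


S = [1.0, 2.0, 0.0]  # hypothetical initial concentrations (S1, S2, P)
dt = 0.001
for _ in range(10_000):  # explicit Euler integration up to t = 10
    S = [s + dt * d for s, d in zip(S, dS_dt(N, rates(S)))]

print([round(s, 4) for s in S])
print("S1 - S2 =", round(S[0] - S[1], 9), "| S1 + S2 + P =", round(sum(S), 9))
```

The combinations S1 - S2 and S1 + S2 + P stay constant along the trajectory, whatever rate law is assumed, because they are fixed by the stoichiometry alone.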
Stoichiometric Model of Glycolysis
For the glycolysis model we have
Analysis of the Stoichiometric Matrix
A relevant piece of information that can be readily derived from the matrix N is which combinations of individual fluxes are possible at steady–state
The system of algebraic equations $N J = 0$ admits a nontrivial solution only if rank(N)<r
Every possible set of steady–state fluxes can be expressed as a linear combination of the basis vectors of the kernel (null space) of N, collected as the columns of a matrix K such that N·K=0
Therefore, denoting by ki the i-th column of K,
$J = \sum_{i=1}^{r - \mathrm{rank}(N)} \alpha_i k_i$
An Example
Let us consider the simple network
The stoichiometric matrix is N=(1 1 1)
and the steady–state fluxes are described by the linear combination
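The kernel basis can be computed exactly with a few lines of Python. The sketch below is an illustrative routine (not a specific library function) that performs Gauss–Jordan elimination on rational numbers from the fractions module, so rank(N) and the columns ki of K come out without rounding. Applied to N = (1 1 1), it returns r - rank(N) = 3 - 1 = 2 basis vectors; any other basis of the same kernel is equally valid.

```python
from fractions import Fraction


def nullspace(M):
    """Exact basis of {x : M x = 0}, by Gauss-Jordan elimination on Fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivots = []  # pivot column of each reduced row
    for c in range(cols):
        r = len(pivots)
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it corresponds to a free variable
        A[r], A[p] = A[p], A[r]
        A[r] = [v / A[r][c] for v in A[r]]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
    basis = []
    for f in [c for c in range(cols) if c not in pivots]:
        x = [Fraction(0)] * cols
        x[f] = Fraction(1)
        for r, pc in enumerate(pivots):
            x[pc] = -A[r][f]
        basis.append(x)
    return basis


N = [[1, 1, 1]]  # stoichiometric matrix of the example
K = nullspace(N)
print("rank(N) =", len(N[0]) - len(K))
print([[str(v) for v in k] for k in K])  # [['-1', '1', '0'], ['-1', '0', '1']]
# every steady-state flux vector is J = a1*k1 + a2*k2; check that N k_i = 0
print(all(sum(n * v for n, v in zip(N[0], k)) == 0 for k in K))
```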
Null Rates at Steady–State
For the glycolysis model we have r=8 and rank(N)=5, thus a basis of the null space of N is composed of three vectors
Note that the entries in the last row of K are all zero; this means that the net rate of the corresponding reaction (v8) is zero at steady–state
Hence, at steady–state we can neglect the reaction v8
Unbranched Pathways
Another property that can be readily derived is the presence of unbranched pathways
In this case, the net rates of all the reactions in the pathway must be equal
The entries for the second and third reactions in the matrix K are always equal
This implies that the fluxes through reactions 2 and 3 must be equal at steady–state
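Both properties, reactions that must carry zero net flux and reactions forced to carry the same flux, can be read off the rows of K. The sketch below computes K exactly with a small Gauss–Jordan routine on fractions and applies it to a hypothetical network (not the glycolysis model): v1 feeds A, v2 converts A into B, v3 drains B, and v4 converts B into C, which nothing consumes. It reports rows of K that are entirely zero and groups of reactions with identical rows; proportional rows, which arise with non-unit stoichiometries, are deliberately not handled in this sketch.

```python
from fractions import Fraction


def nullspace(M):
    """Exact basis of {x : M x = 0}, by Gauss-Jordan elimination on Fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivots = []
    for c in range(cols):
        r = len(pivots)
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        A[r] = [v / A[r][c] for v in A[r]]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
    basis = []
    for f in [c for c in range(cols) if c not in pivots]:
        x = [Fraction(0)] * cols
        x[f] = Fraction(1)
        for r, pc in enumerate(pivots):
            x[pc] = -A[r][f]
        basis.append(x)
    return basis


# Hypothetical network; rows = metabolites (A, B, C), columns = reactions v1..v4
N = [[1, -1, 0, 0],   # A: produced by v1, consumed by v2
     [0, 1, -1, -1],  # B: produced by v2, consumed by v3 and v4
     [0, 0, 0, 1]]    # C: produced by v4, never consumed
K = nullspace(N)
r = len(N[0])
rows_of_K = [tuple(k[j] for k in K) for j in range(r)]  # row j of K <-> reaction j+1

zero_flux = [j + 1 for j, row in enumerate(rows_of_K) if all(v == 0 for v in row)]
groups = {}
for j, row in enumerate(rows_of_K):
    if any(v != 0 for v in row):
        groups.setdefault(row, []).append(j + 1)
coupled = [g for g in groups.values() if len(g) > 1]
print("zero net flux at steady state:", zero_flux)  # [4]
print("equal steady-state fluxes:", coupled)        # [[1, 2, 3]]
```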
Elementary Flux Modes
A pathway can be defined as a set of metabolic reactions linked by common metabolites
It is not straightforward to recognize pathways in metabolic maps that have been reconstructed from experimental evidence
This problem is formalized as that of finding the Elementary Flux Modes (EFMs)
The aim is to find the admissible direct routes for producing a certain metabolite starting from another one
To appreciate the usefulness of such mathematical methods, we can take a glimpse at a typical whole–organism–scale metabolic network
Metabolic Network in Yeast
Palsson, Systems Biology: Properties of Reconstructed Networks, 2006
Elementary Flux Modes
Without going into the mathematical details, we can gain further insight by looking at the elementary flux modes of two simple networks
A factor that greatly influences the EFMs is the reversibility of the single reactions
Applications of EFM Analysis
EFMs can be used to
infer the range of metabolic pathways in the network
test a set of enzymes for the production of a desired compound, and find the most convenient pathway
reconstruct metabolism from annotated genome sequences and analyze the effects of enzyme deficiency
reduce drug effects and identify drug targets
Flux Balance Analysis
Flux Balance Analysis (FBA) deals with the problem of finding the operative modes of metabolic networks subject to three kinds of constraints
1) The operative mode is assumed to be at steady–state
2) The operative mode must respect the (ir)reversibility of the reactions
3) The enzyme catalytic activity in each reaction is limited to an admissible range, i.e. αi ≤ vi ≤ βi
Additional constraints may be imposed by biomass composition or other external conditions
Flux Balance Constrained Optimization
Such constraints confine the steady–state fluxes to a feasible set, but usually do not yield a unique solution
Hence, the determination of a particular metabolic flux distribution can be cast as a linear optimization problem
Maximize an objective function
subject to the constraints given above
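In practice the optimization is solved as a linear program. The sketch below uses SciPy's scipy.optimize.linprog on a small hypothetical network (not the glycolysis model): an uptake reaction v1 feeds metabolite A, which is split between a product branch (v2: A -> B, v4: B exported) and a by-product branch (v3: A -> C, v5: C exported). The bounds (the αi and βi) and the choice of maximizing v4 are assumptions; since linprog minimizes, the objective is negated.

```python
from scipy.optimize import linprog

# Hypothetical network; rows = metabolites (A, B, C), columns = reactions v1..v5
# v1: -> A (uptake), v2: A -> B, v3: A -> C, v4: B -> (product), v5: C -> (by-product)
N = [[1, -1, -1, 0, 0],
     [0, 1, 0, -1, 0],
     [0, 0, 1, 0, -1]]
b_eq = [0, 0, 0]  # constraint 1: steady state, N v = 0
# constraints 2 and 3: irreversibility (lower bounds 0) and assumed capacities beta_i
bounds = [(0, 10), (0, 6), (0, None), (0, None), (0, None)]
c = [0, 0, 0, -1, 0]  # maximize v4  <=>  minimize -v4

res = linprog(c, A_eq=N, b_eq=b_eq, bounds=bounds, method="highs")
print("status:", res.status, "(0 = optimal solution found)")
print("max v4 =", -res.fun)
print("one optimal flux distribution:", res.x)
```

Here the optimal value (v4 = 6, set by the capacity of v2) is unique, but the flux distribution is not: any uptake between 6 and 10, with the excess routed through v3 and v5, is equally optimal.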
Conservation Relations
If a substance is neither added to nor removed from the reaction system, its total concentration remains constant
This property can be derived by analyzing the null space of $N^T$, spanned by the rows of a matrix G such that
$G N = 0$
The latter implies
$G \frac{dS}{dt} = G N v = 0 \;\Rightarrow\; G S = \mathrm{const}$
The dimension of this null space, i.e. the number of independent conservation relations, is m-rank(N)
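Conservation relations can be computed as the null space of N^T. The sketch below uses an illustrative exact routine on fractions and applies it to a hypothetical reduced subnetwork containing only simplified versions of the three adenine-nucleotide reactions of the glycolysis example: v6 regenerates ATP from ADP, v7 converts ATP back to ADP, and v8 (adenylate kinase) turns 2 ADP into ATP + AMP. The single vector found, (1, 1, 1), states that ATP + ADP + AMP is constant, and the number of such vectors equals m - rank(N) = 3 - 2 = 1.

```python
from fractions import Fraction


def nullspace(M):
    """Exact basis of {x : M x = 0}, by Gauss-Jordan elimination on Fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivots = []
    for c in range(cols):
        r = len(pivots)
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        A[r] = [v / A[r][c] for v in A[r]]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
    basis = []
    for f in [c for c in range(cols) if c not in pivots]:
        x = [Fraction(0)] * cols
        x[f] = Fraction(1)
        for r, pc in enumerate(pivots):
            x[pc] = -A[r][f]
        basis.append(x)
    return basis


# Hypothetical subnetwork; rows = (ATP, ADP, AMP), columns = (v6, v7, v8)
N = [[1, -1, 1],   # ATP
     [-1, 1, -2],  # ADP
     [0, 0, 1]]    # AMP
N_T = [list(col) for col in zip(*N)]
G = nullspace(N_T)                        # each vector g satisfies g^T N = 0
rank_N = len(N[0]) - len(nullspace(N))    # rank from the right null space
print("conservation vectors:", [[str(v) for v in g] for g in G])  # [['1', '1', '1']]
print(len(G), "==", len(N) - rank_N)      # 1 == 1
```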
Conservation in Glycolysis
For the glycolysis example we have
which means the sum of concentrations of AMP, ADP, ATP remains constant
The conservation relations can be used to simplify the dynamical model: the algebraic equations expressing the conservation constraints allow some variables to be written as functions of the others
Metabolic Control Analysis
Metabolic Control Analysis (MCA) deals with the sensitivity of the steady–state properties of the network to small parameter changes
It can also be applied to models of other kinds of network, such as signaling pathways or gene expression
Issues addressed by MCA
Predict properties of the network from knowledge of individual components
Find which specific step has the greatest influence on a flux or steady–state concentration or reaction rate
Find which is the best target reaction to treat a metabolic disorder
These questions are very relevant in biotechnological production processes and health care
Basic Concepts of MCA
The relations between steady–state properties and model parameters are usually highly nonlinear
There is no general theory predicting the effect of large parameter changes
The MCA approach deals with small parameter changes
Under this assumption, the model can be approximated, in the neighborhood of the steady–state, with a linear one
Given the linearized model, it is possible to derive indexes describing the properties mentioned above, e.g. elasticity coefficients, control coefficients and response coefficients
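One of these indexes, the scaled elasticity coefficient of a rate v with respect to a concentration S, is defined as ε = (S/v)·∂v/∂S = ∂ln v/∂ln S, a local (linearized) sensitivity. The sketch below estimates it by a central finite difference in ln S for a Michaelis–Menten rate law with hypothetical parameters Vmax = 10 and Km = 2, and compares it with the analytical value Km/(Km + S).

```python
import math


def michaelis_menten(S, Vmax=10.0, Km=2.0):
    """v = Vmax * S / (Km + S); Vmax and Km are hypothetical values."""
    return Vmax * S / (Km + S)


def scaled_elasticity(rate, S, h=1e-6):
    """epsilon = (S/v) dv/dS = d ln v / d ln S, by a central difference in ln S."""
    lnS = math.log(S)
    up = math.log(rate(math.exp(lnS + h)))
    down = math.log(rate(math.exp(lnS - h)))
    return (up - down) / (2 * h)


for S in (0.2, 2.0, 20.0):
    numeric = scaled_elasticity(michaelis_menten, S)
    exact = 2.0 / (2.0 + S)  # Km / (Km + S) for the assumed Km = 2
    print(f"S = {S:5.1f}  numeric = {numeric:.6f}  analytic = {exact:.6f}")
```

The elasticity approaches 1 at low substrate levels (first-order regime) and 0 near saturation, which is the kind of local information MCA builds on.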
Outline
Classification of biological networks
Modeling metabolic networks
Modeling gene regulatory networks
Inferring gene regulatory networks
Gene Regulatory Networks
A protein synthesized from a gene can serve as a transcription factor for another gene, as an enzyme catalyzing a metabolic reaction, or as a component of a signal transduction pathway
Apart from DNA transcription regulation, gene expression may be controlled during RNA processing and transport, RNA translation, and the post–translational modification of proteins
Therefore, gene regulatory networks (GRNs) involve interactions between DNA, RNA, proteins and other molecules
A suitable way to handle this complexity is to use functional association networks
In these networks, the edges of the corresponding graph do not represent chemical interactions, but functional influences of one gene on another
Example of a GRN
A toy regulatory network of three genes is depicted in the cartoon below
De Jong, Modeling and simulation of genetic regulatory systems, INRIA - RR4032, 2000
Modeling GRNs
In what follows we will present an overview of the models used to describe GRNs
Two main issues have to be taken into account when choosing a modeling framework
Computational requirements for simulation
Available methods for inferring the network topology
Bayesian Networks
In the formalism of Bayesian Networks, the structure of a genetic regulatory system is modeled by a directed acyclic graph G= ⟨V,E⟩
The vertices i∈V, i=1,…,n, represent gene expression levels and correspond to random variables Xi
For each Xi, a conditional distribution p(Xi |parents(Xi)) is defined, where parents(Xi) denotes the direct regulators of i
The graph G and the set of conditional distributions uniquely specify a joint probability distribution p(X)
Independency in BN
If Xi is independent of Y given Z, where Y and Z are sets of variables, we can state the conditional independency
$I(X_i; Y \mid Z)$
For every node i in G, the following independency holds:
$I(X_i; \mathrm{nondescendants}(X_i) \mid \mathrm{parents}(X_i))$
Hence, the joint probability distribution can be decomposed into
$p(X) = \prod_{i=1}^{n} p(X_i \mid \mathrm{parents}(X_i))$
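The factorization can be checked numerically on a hypothetical three-gene chain X1 -> X2 -> X3 with binary states and made-up conditional probability tables. The sketch builds the full joint table as the product of the local conditionals, verifies that it sums to one, and checks the local independency for X3: additionally conditioning on its non-descendant X1 does not change P(X3 = 1 | X2 = 1).

```python
from itertools import product

# Hypothetical chain X1 -> X2 -> X3 of binary genes (1 = expressed, 0 = not expressed)
order = ["X1", "X2", "X3"]
parents = {"X1": [], "X2": ["X1"], "X3": ["X2"]}
# cpt[node][values of parents] = P(node = 1 | parents); the numbers are made up
cpt = {"X1": {(): 0.3}, "X2": {(0,): 0.1, (1,): 0.8}, "X3": {(0,): 0.2, (1,): 0.9}}


def joint(assign):
    """p(X) = prod_i p(X_i | parents(X_i))."""
    p = 1.0
    for node in order:
        p1 = cpt[node][tuple(assign[q] for q in parents[node])]
        p *= p1 if assign[node] == 1 else 1.0 - p1
    return p


table = {vals: joint(dict(zip(order, vals))) for vals in product((0, 1), repeat=3)}


def prob(event):
    """Probability of a partial assignment, e.g. {"X2": 1}, summed from the joint table."""
    return sum(p for vals, p in table.items()
               if all(vals[order.index(k)] == v for k, v in event.items()))


print("total probability:", round(sum(table.values()), 12))  # 1.0
# X3 is independent of its non-descendant X1 given its parent X2
print(prob({"X3": 1, "X2": 1}) / prob({"X2": 1}),
      prob({"X3": 1, "X2": 1, "X1": 0}) / prob({"X2": 1, "X1": 0}))
```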
Example of BN
Here we illustrate the formulation of the BN model for a simple network
Two graphs are said to be equivalent if they imply the same set of independencies; they cannot be distinguished by observations on X
De Jong, Modeling and simulation of genetic regulatory systems, INRIA - RR4032, 2000
Features of BNs
There is no need to specify a single value for each parameter of the model; rather, a distribution over the admissible range of values is assigned
This characteristic helps in avoiding overfitting, which is common in the presence of a small data set and a large number of parameters
It is a statistical modeling approach, which nicely fits the stochastic nature of biological systems
BNs are static models, although dynamical aspects can be taken into account through an extension of this theory, namely dynamic Bayesian networks (DBNs)
Boolean Networks
In the framework of Boolean networks, the expression level of a gene can attain only two values, that is, active (on, 1) or inactive (off, 0)
Accordingly, the interactions between elements of the network are represented by Boolean functions
Smolen, Baxter, Byrne, Mathematical modeling of gene networks, Neuron 26, 567–580, 2000
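A minimal simulation of a Boolean network, assuming synchronous updating (all genes switch at the same time; asynchronous schemes also exist) and a hypothetical three-gene ring in which C represses A, A activates B and B activates C. Because the state space is finite (2^3 states), every trajectory eventually enters a cycle or a fixed point, the attractor, which the sketch detects as the first repeated state.

```python
# Hypothetical 3-gene ring: C represses A, A activates B, B activates C
genes = ["A", "B", "C"]
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}


def step(state):
    """Synchronous update: all genes are recomputed from the previous state."""
    s = dict(zip(genes, state))
    return tuple(int(bool(rules[g](s))) for g in genes)


def attractor(state):
    """Iterate until a state repeats and return the cycle eventually reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


cycle = attractor((1, 0, 0))
print(len(cycle), cycle)
# 6 [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)]
```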
Features of Boolean Networks
Deterministic description
Very easy to build the model and to simulate it, even for very large networks
They provide only a coarse–grained description of the network behavior, and are thus not suited to a more detailed analysis of the regulatory mechanisms
ODE Models
We have seen that the mechanistic ODE approach has been widely exploited since the beginning of the last century for modeling biochemical reactions
When the order of the system increases, classical nonlinear ODE models become hard to handle, in terms of parametric analysis, numerical simulation and especially identification
In order to overcome these limitations, alternative modeling approaches have been devised for application to biological networks
Power–Law Models
The basic concept underlying power–law models is the approximation of classical ODE models by means of a uniform mathematical structure
S–Systems
S–systems are a particular class of power–law models in which fluxes are aggregated
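$\frac{dX_i(t)}{dt} = \alpha_i \prod_{j=1}^{n} X_j(t)^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j(t)^{h_{ij}}, \quad i = 1, \dots, n$
where αi, βi are rate constants and gij, hij are kinetic orders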
Features of S–Systems
S–systems feature low computational requirements
Their structural homogeneity makes it easy to identify the model parameters from steady–state data by means of logarithmic linearization
Generalized aggregation may introduce a loss of accuracy
Violation of biochemical flux conservation
It may conceal important structural features of the network
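Because each term is a product of power laws, the steady-state condition αi ∏j Xj^gij = βi ∏j Xj^hij becomes linear in the logarithms yj = ln Xj, namely Σj (gij - hij) yj = ln(βi/αi). The sketch below illustrates this on a hypothetical two-variable S-system with made-up rate constants and kinetic orders: it solves the 2×2 linear system in log coordinates and checks that both derivatives vanish. It shows the mechanism behind logarithmic linearization, not a full parameter-identification procedure.

```python
import math

# Hypothetical S-system: dX_i/dt = alpha_i prod_j X_j^g_ij - beta_i prod_j X_j^h_ij
alpha = [2.0, 1.0]
beta = [1.0, 1.5]
g = [[0.0, -0.5],  # X2 inhibits the production of X1
     [0.5, 0.0]]   # X1 activates the production of X2
h = [[0.8, 0.0],
     [0.0, 0.6]]


def rhs(X):
    """Right-hand side of the S-system equations."""
    return [alpha[i] * math.prod(X[j] ** g[i][j] for j in range(2))
            - beta[i] * math.prod(X[j] ** h[i][j] for j in range(2)) for i in range(2)]


# Steady state in log coordinates y = ln X: sum_j (g_ij - h_ij) y_j = ln(beta_i / alpha_i)
A = [[g[i][j] - h[i][j] for j in range(2)] for i in range(2)]
b = [math.log(beta[i] / alpha[i]) for i in range(2)]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
y = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,   # Cramer's rule for the 2x2 system
     (A[0][0] * b[1] - A[1][0] * b[0]) / det]
X_ss = [math.exp(v) for v in y]
print("steady state:", [round(v, 4) for v in X_ss])
print("residual dX/dt:", rhs(X_ss))
```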
Piecewise–Linear Models
Another class of approximate models based on ODEs is that of piecewise-linear (PWL) models
The basic idea is to approximate sigmoidal curves through step functions
The model takes the general form
where
and the functions bil(·) are Boolean-valued regulation functions expressed in terms of step functions
Casey, De Jong, Gouzé, J. Math. Biol. 52, 27–56, 2006
Features of PWL Models
Numerical simulation studies have shown that PWL models properly approximate the behavior of the corresponding original nonlinear ones
A drawback of this class of systems is that their behavior is very difficult to analyze rigorously
Indeed, PWL models can exhibit singular steady–states, that is, equilibrium points lying on the threshold surfaces
Moreover, it is known that the stability of switching systems cannot be reduced to the analysis of the stability of the linear systems in each sub-space
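A singular steady state already appears in a one-gene example. The sketch assumes a hypothetical self-repressing gene dx/dt = κ s⁻(x, θ) - γx, where s⁻ is a decreasing step function replacing a steep sigmoid, with made-up parameters such that κ/γ > θ. Below θ the flow points toward κ/γ, above θ toward 0, so neither region contains its own equilibrium and the trajectory is trapped at the threshold x = θ. The small oscillation around θ is a numerical artefact of explicit Euler steps across the discontinuity.

```python
# Hypothetical PWL model of a self-repressing gene: dx/dt = kappa * s_minus(x, theta) - gamma * x
kappa, gamma, theta = 2.0, 1.0, 1.0


def s_minus(x, th):
    """Decreasing step function: 1 below the threshold, 0 at or above it."""
    return 1.0 if x < th else 0.0


x, dt = 0.0, 0.01
trace = []
for _ in range(2000):  # explicit Euler integration up to t = 20
    x += dt * (kappa * s_minus(x, theta) - gamma * x)
    trace.append(x)

# The flow pushes x toward theta from both sides: a singular steady state on the threshold
print(round(min(trace[-200:]), 3), round(max(trace[-200:]), 3))
```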
Outline
Classification of biological networks
Modeling metabolic networks
Modeling gene regulatory networks
Inferring gene regulatory networks
Inferring Bayesian Networks
In order to reverse-engineer a Bayesian network model of a gene network, we must find the directed acyclic graph that best describes the data
To do this, a scoring function is chosen to evaluate the candidate graphs G with respect to the data set D
The score can be defined using Bayes' rule
$P(G \mid D) = \frac{P(D \mid G)\, P(G)}{P(D)}$
If the topology of the network is partially known, the a priori knowledge can be included in P(G)
The most popular scores are the Bayesian Information Criterion (BIC) and the Bayesian Dirichlet equivalence (BDe) score
They incorporate a penalty for complexity to cope with overfitting
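A minimal sketch of structure scoring with BIC, written here as the maximized log-likelihood minus (d/2)·ln N, where d is the number of free parameters and N the number of samples (so higher is better; other texts use -2 times this quantity). The binary data set and the two candidate graphs for a pair of genes, G0 with no edge and G1 with X -> Y, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical binary observations (x, y): 40 x (1,1), 10 x (1,0), 10 x (0,1), 40 x (0,0)
data = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 40
n = len(data)


def loglik(values):
    """Maximized log-likelihood of i.i.d. discrete values (relative-frequency estimates)."""
    counts = Counter(values)
    return sum(c * math.log(c / len(values)) for c in counts.values())


def bic(ll, n_params, n_samples):
    """BIC score: log-likelihood minus (d/2) ln N."""
    return ll - 0.5 * n_params * math.log(n_samples)


xs = [x for x, _ in data]
ys = [y for _, y in data]

# G0: no edge, p(X) p(Y), 1 + 1 free parameters for binary variables
ll0 = loglik(xs) + loglik(ys)
# G1: X -> Y, p(X) p(Y | X), 1 + 2 free parameters
ll1 = loglik(xs) + sum(loglik([y for x, y in data if x == xv]) for xv in (0, 1))
print(f"BIC(G0) = {bic(ll0, 2, n):.2f}, BIC(G1) = {bic(ll1, 3, n):.2f}")
```

The reversed graph Y -> X would receive exactly the same score here, consistent with the notion of equivalent graphs.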
Inferring Bayesian Networks
The evaluation of all possible networks involves checking all possible combinations of interactions among the nodes
This problem is NP-hard, therefore heuristic methods are used, such as greedy hill-climbing, Markov Chain Monte Carlo methods, or Simulated Annealing
A software tool for inferring both BNs and DBNs is Banjo, developed by the group of Hartemink (http://www.cs.duke.edu/~amink/software/banjo)
Yu et al, Advances to bayesian network inference for generating causal networks from observational biological data, Bioinformatics 20: 3594-3603, 2004
Information–Theoretic Approaches
Information–theoretic approaches use a generalization of the Pearson correlation coefficient used in hierarchical clustering, namely the Mutual Information (MI), which is computed as
$MI(X;Y) = H(X) + H(Y) - H(X,Y)$
where the marginal and joint entropies are defined, respectively, as
$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$
$H(X,Y) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} p(x,y) \log p(x,y)$
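These definitions translate directly into a naive plug-in estimate from discretized samples (relative frequencies, natural logarithm). The three gene profiles below are hypothetical; g2 copies g1, while g3 is balanced against g1 so that every combination of levels occurs equally often. Plug-in estimates are biased for small samples, so practical tools need more careful estimation.

```python
import math
from collections import Counter


def entropy(values):
    """H = -sum p log p, with p estimated by relative frequencies (natural log)."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# Hypothetical discretized expression levels (0 = low, 1 = high)
g1 = [0, 0, 1, 1, 0, 1, 0, 1]
g2 = [0, 0, 1, 1, 0, 1, 0, 1]  # identical to g1: MI = H(g1) = ln 2
g3 = [0, 1, 0, 1, 0, 0, 1, 1]  # each (g1, g3) combination occurs twice: MI = 0
print(mutual_information(g1, g2), mutual_information(g1, g3))
```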
Information–Theoretic Approaches
From the definitions above it follows that
MI becomes zero if the two variables are statistically independent
A high value of MI indicates that the variables are non–randomly associated to each other
Since MIij=MIji, the reconstructed graph is undirected
An important characteristic is that, since the approach is based on the independence of samples, it is not suitable for application to time–series (it can be applied only to steady–state data sets)
A software tool based on Mutual–Information theory is ARACNE, described in
Basso et al, Reverse engineering of regulatory networks in human B cells, Nature Genetics 37(4): 382-90, 2005
Inference of ODE Models
The identification of the structure and parameters of mechanistic nonlinear ODE models is a very demanding task for non–trivial networks, both from a theoretical point of view and in terms of computational requirements
A feasible approach is based on the use of linearized dynamical models, which yield good results when applied to data sets obtained through perturbation experiments
Several methods have been developed by the groups of Gardner and di Bernardo, dealing both with steady–state (NIR, MNI) and time–series data (TSNI)
Time–Series Network Identification
The TSNI algorithm is based on the linearized model
$\dot{x}_i(t_k) = \sum_{j=1}^{N} a_{ij} x_j(t_k) + b_i u(t_k), \quad i = 1, \dots, N, \; k = 1, \dots, M$
where u(t_k) is the perturbation applied at time t_k
The data set consists of the expression levels of N genes, sampled at M time points with a fixed sampling interval
The experimental data are derived from perturbation experiments (e.g. treatment with a compound or gene overexpression/downregulation)
A linear regression algorithm is used to estimate the coefficients of the dynamical matrix, aij, and those of the input matrix, bi
A nonzero coefficient aij indicates a directed edge from node j to node i, whereas a nonzero bi indicates that node i is directly affected by the perturbation
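A bare sketch of the regression step only, using NumPy least squares on noise-free synthetic data; it is not the published TSNI implementation. The three-gene network, the random perturbation profile u(t_k) and the sampling interval are hypothetical. Derivatives are approximated by forward differences and regressed on [x1 … xN, u], so each row of the coefficient estimate gives the incoming edges of one gene.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-gene network: gene 1 activates gene 2, gene 2 represses gene 3,
# every gene is degraded (diagonal -1), and only gene 1 is hit by the perturbation
A_true = np.array([[-1.0, 0.0, 0.0],
                   [0.8, -1.0, 0.0],
                   [0.0, -0.6, -1.0]])
b_true = np.array([1.0, 0.0, 0.0])

dt, M = 0.1, 40
u = rng.uniform(0.0, 1.0, size=M)   # assumed perturbation profile u(t_k)
X = np.zeros((M + 1, 3))
for k in range(M):                  # noise-free synthetic data (Euler-discretized model)
    X[k + 1] = X[k] + dt * (A_true @ X[k] + b_true * u[k])

# Regression step: finite-difference derivatives regressed on [x_1 ... x_N, u]
dXdt = (X[1:] - X[:-1]) / dt        # shape (M, N)
Phi = np.column_stack([X[:-1], u])  # shape (M, N + 1)
theta, *_ = np.linalg.lstsq(Phi, dXdt, rcond=None)
A_hat, b_hat = theta[:3].T, theta[3]
print(np.round(A_hat, 3))
print(np.round(b_hat, 3))
```

With noisy, short time series the same regression becomes ill-conditioned, which is why real data usually need additional processing.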
Bansal, Della Gatta, di Bernardo, Bioinformatics 22: 815–822, 2006
Features of TSNI
For small networks (tens of genes), TSNI is able to correctly infer the network structure
Besides topological inference, ODE-based methods are also well–suited for uncovering unknown targets of perturbations, even in complex networks
It is not possible to exploit prior knowledge about the network topology, because this would require the exact knowledge of non–physical parameters
LMI-based Inference Approach
The basic idea is improving linear ODE–based methods by exploiting available prior knowledge about the network topology (as in BNs)
The identification of the parameters aij, bij is cast as a convex optimization problem, in the form of linear matrix inequalities (LMIs)
This formulation makes it possible to reduce the admissible solution space by assigning sign constraints to the coefficients corresponding to known interactions
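Figure: four-node example network (x1 to x4) and the corresponding table of prior knowledge on the interactions, where each entry is either unknown (?) or a sign constraint (> or <); legend: activation, inhibition
Cosentino et al, IET Systems Biology 1(3): 164–173, 2007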
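The LMI formulation itself calls for a semidefinite-programming solver, so the sketch below swaps in a simpler convex problem with the same flavor: row-wise least squares with sign bounds on the coefficients, solved by scipy.optimize.lsq_linear. It is not the method of Cosentino et al.; the three-gene network, the noise level, the short time series and the prior sign table ("+" known activation, "-" known inhibition, "?" unknown) are all hypothetical. It only illustrates how known signs restrict the admissible solution space.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(1)
n_genes = 3
A_true = np.array([[-1.0, 0.0, 0.0],
                   [0.8, -1.0, 0.0],
                   [0.0, -0.6, -1.0]])
b_true = np.array([1.0, 0.0, 0.0])
dt, M = 0.1, 12                      # deliberately few samples
u = rng.uniform(0.0, 1.0, size=M)
X = np.zeros((M + 1, n_genes))
for k in range(M):
    X[k + 1] = X[k] + dt * (A_true @ X[k] + b_true * u[k])
X_meas = X + rng.normal(0.0, 0.01, size=X.shape)  # assumed measurement noise

dXdt = (X_meas[1:] - X_meas[:-1]) / dt
Phi = np.column_stack([X_meas[:-1], u])

# Prior knowledge: "+" known activation, "-" known inhibition, "?" unknown
prior = [["?", "?", "?"],
         ["+", "?", "?"],
         ["?", "-", "?"]]

A_hat = np.zeros((n_genes, n_genes))
for i in range(n_genes):             # one sign-constrained least-squares fit per gene
    lb = np.full(n_genes + 1, -np.inf)
    ub = np.full(n_genes + 1, np.inf)
    for j in range(n_genes):
        if prior[i][j] == "+":
            lb[j] = 0.0              # a_ij >= 0
        elif prior[i][j] == "-":
            ub[j] = 0.0              # a_ij <= 0
    res = lsq_linear(Phi, dXdt[:, i], bounds=(lb, ub))
    A_hat[i] = res.x[:n_genes]
print(np.round(A_hat, 2))
```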
Features of the LMI-based Approach
Numerical tests show that exploiting prior knowledge greatly improves the reconstruction performance
The method can exploit qualitative a priori knowledge, as well as quantitative information
Such knowledge is exploited within the reconstruction, not for a posteriori evaluation
The optimization problem is convex, therefore the optimal solution, in terms of data interpolation, can always be found
The latter feature, on the other hand, implies a higher tendency to overfitting
Hard to apply to large–scale networks (more than 100 nodes), due to the computational load deriving from the high number of constraints
Choice of the Inference Algorithms
In a recent study, Bansal et al have compared the performance obtained using different modeling formalisms (BNs, MI, hierarchical clustering, ODE-based models)
Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007
Results on Experimental Data Sets
Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007
Results Discussion
The different techniques considered in the review infer networks that overlap by only 10% in the best case
Furthermore, the edges predicted by more than one method are not more accurate than those inferred by a single one
On the other hand, taking the union of the interactions found by all the methods would yield an even larger number of false positives
Local perturbation experiments (i.e. affecting one or a few genes) seem to yield better results than global ones (perturbations of a high number of genes)
Remarks on Inference Algorithms
A relevant issue, common to all inference algorithms, is that the problem is very often under–determined
All modeling formalisms, indeed, involve a large number of parameters, whereas the number of samples is usually limited (curse of dimensionality)
Possible solutions
Devise methods to exploit different data sets
Reduce the dimensionality of the problem, via data pre–processing, e.g.
clustering algorithms
elimination of statistically non–expressed nodes
Concluding Remarks
Regardless of the adopted formalism, good inference performance can be achieved only by exploiting the available prior knowledge from the biological literature
Despite the great attention paid to the topological characterization of biological networks, much remains to be done in exploiting such features in the inference process
Several other approaches exist, both for modeling and inferring biological networks (discrete events, formal languages, machine learning methods, etc.)
References
Klipp et al, Systems Biology in Practice, Wiley-VCH, 2005
Palsson, Systems Biology: Properties of Reconstructed Networks, Cambridge University Press, 2006
Barabasi, Oltvai, Network Biology: Understanding the Cell’s Functional Organization, Nature Reviews Genetics 5(2), 101–113, 2004
Hynne et al, Full–scale model of glycolysis in Saccharomyces cerevisiae, Biophys. Chem. 94, 121–163, 2001
De Jong, Modeling and simulation of genetic regulatory systems, INRIA - RR4032, 2000
Smolen, Baxter, Byrne, Mathematical modeling of gene networks, Neuron 26, 567–580, 2000
Casey et al, Piecewise linear models of genetic regulatory networks: equilibria and their stability, J. Math. Biol. 52, 27–56, 2006
Bansal et al, Inference of gene regulatory networks and compound mode of action from time course gene expression profiles, Bioinformatics 22, 815–822, 2006
Bansal et al, How to infer gene networks from expression profiles, Molecular Systems Biology 3:78, 2007
Cosentino et al, Linear matrix inequalities approach to reconstruction of biological networks, IET Systems Biology 1(3), 164–173, 2007