*Corresponding author. Email: [email protected]
TECHNICAL REPORT
March 13, 2012
Department of Computer Engineering
Middle East Technical University
Construction of signaling pathways from PPI and RNAi data using Linear
Programming
Oyku Eren Ozsoy (a)* and Tolga Can (b)
(a) Informatics Institute, Middle East Technical University, Ankara, TURKEY;
(b) Department of Computer Engineering, Middle East Technical University, Ankara, TURKEY
For the reconstruction of signaling pathways from RNAi data, an integer linear optimization
model is proposed. The aim is to reconstruct the signaling network from a given protein-protein
interaction (PPI) network so that it satisfies the RNAi data while making the minimum number of
changes to the given network. For evaluation, 1000 reference PPI networks each with seven, eight,
or nine proteins, and RNAi data for each of the regular proteins in the network, were generated
randomly. The solutions were examined to obtain a general overview of the reconstruction of
signaling networks from RNAi data with the proposed method.
Keywords: Computational biology; bioinformatics; integer programming; graph theory and
networks
1. Introduction
Finding the relationship between the genes in signal transduction networks and gene
regulatory networks/protein-protein interaction (PPI) networks is a very important problem in
systems biology. Signal transduction is a series of biochemical reactions involving proteins;
therefore, PPI data can be used as a source for the reconstruction of signaling pathways. While
signaling pathways are considered to be directed, high-throughput protein-protein interaction
data is undirected. It is possible to transform this undirected data to directed pathways [3].
For the reconstruction of signaling pathways from PPI data, several methods have been
developed, such as the color coding algorithm [11] and the Netsearch algorithm [12].
RNA interference (RNAi) is a technique used for finding the genes in a pathway [2].
For large-scale RNAi screens, the readouts are generally based on single reporters [1]. High-
content, high-throughput image-based screens at a genome-wide scale are developing rapidly
[10]. Identification of genes associated with a particular phenotype can be done by the RNAi
technique. This technique investigates the downstream effects of a silenced gene. However,
their placement in space and time in the respective cellular pathways remains a problem [9].
It is possible to place genes in the respective pathways by interrogation of databases and
literature [7]. In the cases when there is insufficient information about the genes in the
pathways, automated approaches that place these genes in the network have to be developed.
A survey of the methods developed for such purposes is given by Kaderali and Radde
[5]. The developed methods use Boolean models, correlation-based models and associative
network approaches, Bayesian networks, differential equation models and similar techniques.
Several methods use microarray gene expression as data and aim to generate gene regulation
networks using time dependent or static data. Some methods like Bayesian networks allow
the integration of biological prior information. Despite all these developed methods, the
temporal and spatial placement of the genes in a signaling pathway is still a challenging
problem.
For the construction of signal transduction networks using RNA interference, only a
few methods are available. Markowetz et al. [8] proposed Nested Effect Models for this
problem. Such models construct the signal transduction networks by using the nested
structure of observed perturbation effects. Although they are suitable models for effect-result
kind of data, such as RNAi data, they require several kinds and relatively high number of
readouts per knockdown. This prevents the usage of the results of RNAi experiments using
one messenger gene. Kaderali et al. [6] developed a probabilistic method that can reconstruct
network topologies using the single gene knockdown data by generating topologies consistent
with this data. However, because of the computational complexity of this method, only small
networks can be solved in a reasonable time and also it is not possible to use the data other
than the RNAi data, such as time series and real time screening data. They show that still
some information about the pathway topology can be retrieved, and results can be used to
design additional experiments to resolve the topology further.
In order to identify the network topologies from the given knockdown data, Kaderali
et al. [6] consider a deterministic model, where a node i is activated, if any node j with an
edge j-i is active. They simulate data with this model, and use complete enumeration of all
network topologies with a given number of nodes to determine how many topologies are
compatible with the data.
The problem here is that, as the number of nodes increases, the number of compatible
topologies increases exponentially. For n different components in the network, there are $n^2$
edges, each of which can be present or absent in a network, leading to $2^{n \times n}$
different possible
network topologies. On the other hand, single-gene knockdowns with a single phenotypic
readout per knockdown will yield only n bits of information for network reconstruction.
When distinguishing activating and deactivating influences, this problem grows even further.
Then additional information is needed to be able to reconstruct a unique topology, such as
further observable nodes, stimulation of other nodes, time series measurements or
combinatorial knockdowns. They counter this problem using a prior distribution on model
parameters that drives the network inference to sparse networks. In addition, they can make
use of multiple (e.g. double) knockdowns if available, or of multiple different stimulations.
Using Boolean threshold functions in a Bayesian network, they sample from the posterior
over model parameters. For larger networks, they use a likelihood approximation based on
stochastic simulation. The model allows inclusion of further observable nodes, stimulation of
other nodes, time series measurements or combinatorial knockdowns.
In this study, for the reconstruction of signal transduction pathways from the RNAi
data, a linear program based model is proposed. The major difference of our approach from
the others is that we formulate the problem so as to edit a given reference network with
respect to given RNAi data. This reference network is a PPI network in our approach.
Assume that a network, which might already have been constructed from experiments carried
out in the past, is given as a reference network. If new information about the network becomes
available, e.g. new RNAi data, this network has to be updated with respect to the new data.
In our approach, from the given reference network, a new network is reconstructed that
satisfies RNAi data by applying minimum changes on the given network. The network is
represented as a graph consisting of a set of nodes and edges. First, a binary state variable is
assumed for each edge: the state variable is 1 if the edge is present in the network, and 0
otherwise. Then, the result of each knockdown is formulated as a set of linear constraints after
enumerating all possible paths from the source node s to the sink node t. This is done by
considering whether a path transmits the signal to the sink node or not. If the signal is
transmitted, then at least one path is complete and, for that path, the sum of the state variables
of its edges must be equal to the number of edges in the path. Otherwise, all possible paths are
incomplete and, for every path, the sum of the state variables of its edges must be smaller than
the number of edges in the path. The objective of this linear problem is to minimize the sum of
the absolute values of the differences between the state variables of each edge in the reference
network and in the new network. We solved this binary integer linear programming problem by
generating the corresponding LP file automatically and using CPLEX v12. The LP file is created by
a program written in C from the supplied data, i.e. the reference network, the number of nodes,
and the knockdown data. We considered several problems with different reference networks,
numbers of nodes and knockdown data. Our experiments show that the topology of a network with
10 nodes can be constructed by our approach in a reasonable time.
This article is organized as follows. In Section 2, the model that we developed for the
reconstruction of signal transduction pathways from the RNAi data is explained in detail. In
Section 3, information about the data sets that we used to evaluate our model is given. The
results for the problems solved by using the method and a discussion on the results are given
in Section 4.
2. Methods
Consider a given directed graph G(V,E) where V represents the node set and E
represents the edge set, with a source node s and a sink node t. This graph may be taken from
any protein-protein interaction (PPI) network database, where each node represents a
protein; the RNAi data is assumed to be available from the results of RNAi experiments. Although
PPI networks have undirected edges, they can be transformed to directed edges [3]. The aim
is to reconstruct a new network from the given network satisfying RNAi data by making
minimum changes on the given network. The approach would be to formulate this problem as
a linear optimization problem, which will provide a network satisfying the RNAi data with a
minimum change applied on the given network.
Let $x_{ij}$ be the binary variable representing the presence of the directed edge from node i
to node j in the given network. If the edge is present, then the value of $x_{ij}$ is 1, otherwise
it is 0. Similarly, let $w_{ij}$ represent the edges in the network that is to satisfy the given
RNAi data. We will call these binary variables "the state variables". The RNAi data consists of
the information whether the signal is transferred from the source node s to the sink node t, or
not, after the knockdown of a single node or multiple nodes. The goal is to reconstruct the given
network with respect to the RNAi data by minimizing the changes that have to be applied to the
given reference network. The objective function for this linear problem is therefore the sum of
the absolute values of the differences between $w_{ij}$ and $x_{ij}$, i.e. $\sum_{i,j} |x_{ij} - w_{ij}|$.
If an edge is present both before and after the optimization, or absent both before and after,
then the difference is 0, which means no change is made on the corresponding edge in the network.
However, if the difference is 1, the edge is either taken out of the network or inserted into the
network. Therefore, minimizing the sum of these differences results in a network that is obtained
by making the minimum number of changes while satisfying the
constraints obtained from the knockdowns.
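Because the reference values $x_{ij}$ are known constants, the absolute value in the objective can be written without auxiliary variables. The following identity is a small worked step, not taken from the original report, showing why the objective stays linear in $w_{ij}$:

```latex
\sum_{i,j} |x_{ij} - w_{ij}|
  \;=\; \sum_{(i,j):\, x_{ij}=0} w_{ij}
  \;+\; \sum_{(i,j):\, x_{ij}=1} \left(1 - w_{ij}\right),
  \qquad x_{ij},\, w_{ij} \in \{0, 1\}
```

Each edge absent from the reference network costs 1 when it is inserted, and each edge present in the reference network costs 1 when it is removed.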
The result of each knockdown can be formulated as a linear constraint after
enumerating all possible paths from the source node s to the sink node t. If the signal is not
observed at the sink node after knockdown of a node (protein), then any path from source s to
sink t excluding the knockdown node should not be complete, i.e., the path has to be broken
somewhere between the source and the sink. If it is observed, then at least one of the possible
paths not including the knockdown node should be complete. If a path is not complete, then
at least one of the edges on the path should not be present. Therefore, the state variable
corresponding to that edge must be 0. If a path is complete, then all edges on the path must be
present and the corresponding state variables take the value 1.
To visualize the discussion above, consider a network consisting of 5 nodes, two of
which are the source node s and the sink node t, as in (Figure 1).
Figure 1. (a) A 5-node network with all possible edges, (b) given initial network: solid lines show the connected
edges, dashed lines show the disconnected edges.
Note that there are no edges going into the source node s and no edges coming out of the
sink node t. Also, self-edges are not allowed and there is no direct edge from source to sink.
Even with these assumptions, if we disregard the sign of an edge (whether it activates or
inactivates its target node), the number of possible network topologies is still exponential in
$n^2$, namely $2^{(n-1)(n-2)}$.
The topology shown in (Figure 1a) includes all possible paths from the source to the sink.
However, in the RNAi experiments, a node is knocked down and its edges are disabled.
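To make the size of the search space concrete, the short Python sketch below counts the admissible edges under the restrictions just listed (no edge into s, none out of t, no self-edges, no direct s-t edge). The function name `possible_edges` is hypothetical and the code is only an illustration of the counting argument, not part of the original C implementation.

```python
def possible_edges(n):
    """Admissible directed edges in an n-node network with one source,
    one sink and n - 2 intermediate nodes."""
    m = n - 2  # intermediate nodes
    # source -> intermediate, intermediate -> sink,
    # intermediate -> intermediate (no self-edges)
    return m + m + m * (m - 1)

for n in (5, 7, 8, 9):
    e = possible_edges(n)
    # columns: nodes, admissible edges, restricted topologies, unrestricted 2^(n*n)
    print(n, e, 2 ** e, 2 ** (n * n))
```

For the 5-node example this gives 12 admissible edges, so an exhaustive search over the restricted topologies is still feasible at that size.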
Let (s-3-t) be a given path on this network which is taken from a PPI network
database, as in (Figure 1b). All other edges are inactive and it is known that this network
consists of 5 nodes. Let the knockdown data be as given in (Table 1). According to this data,
knockdown of protein 1 (node 1) causes the sink node t to be not activated. This result can be
written as mathematical constraints considering all possible paths which do not include node
1. Since no activation of the sink node is observed, none of these paths can transmit the
signal, i.e. they must all be broken at one edge or some edges. Such constraint can be
satisfied by only setting at least one of the state variables of the edges on these paths to 0.
Therefore, for a non-transmitting path, the product of the state variables must be 0.
Knockdown    Effect on sink node t
Node 1       Not activated
Node 2       Activated
Node 3       Activated
Table 1. Artificial knockdown data. Source gene s is activated, while all other genes are inactive. Readout is
done at gene t.
Now, all the paths which do not include an edge connected to node 1 should be
determined. For our problem with the knockdown of node 1, these non-transmitting paths are
(s-2-t), (s-3-t), (s-2-3-t), and (s-3-2-t). All of these paths must be broken, i.e. at least one
of the $w_{ij}$ on each of these paths must be zero. This condition can be formulated for our
problem as follows:
    $w_{s2} + w_{2t} \le 1$, (1)
    $w_{s3} + w_{3t} \le 1$, (2)
    $w_{s2} + w_{23} + w_{3t} \le 2$, (3)
    $w_{s3} + w_{32} + w_{2t} \le 2$. (4)
Since at least one of the state variables at the left hand sides of the inequalities is zero, the
corresponding sum must be less than the number of terms in the inequality. There is a logical
“AND” relationship between all of these constraints. They must all be satisfied at the same
time; otherwise the signal would be transmitted from the source node to the sink node.
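The path enumeration behind constraints of this kind can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `simple_paths` and the string node labels are assumptions made for the example.

```python
def simple_paths(nodes, source, sink, knocked_down=None):
    """Yield every loop-free path source -> ... -> sink over the complete
    admissible edge set, skipping the knocked-down node."""
    inner = [v for v in nodes if v not in (source, sink, knocked_down)]

    def extend(path):
        # A path needs at least one intermediate node, because the
        # direct source -> sink edge is not admissible.
        if len(path) > 1:
            yield path + [sink]
        for v in inner:
            if v not in path:
                yield from extend(path + [v])

    yield from extend([source])

nodes = ["s", "1", "2", "3", "t"]
paths_kd1 = ["-".join(p) for p in simple_paths(nodes, "s", "t", "1")]
print(paths_kd1)
```

For the knockdown of node 1 this produces the four paths s-2-t, s-3-t, s-2-3-t and s-3-2-t, one per "AND" constraint.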
Next, the second knockdown data is to be written as a constraint for our linear
programming problem. Knockdown of node 2 results in an activation of the sink node t. This
observation means that at least one path that does not include node 2 must be complete, i.e.
not broken, so that it is possible to transmit the signal from source to the sink. If a path
transmits the signal, then all of the state variables of the edges on that path must be 1. For our
problem with the knockdown of node 2, at least one path that does not include an edge
connected to node 2 must be present in the network. These paths are (s-1-t), (s-3-t), (s-1-3-t),
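and (s-3-1-t). At least one of these paths has all of its $w_{ij}$ equal to 1. This condition can
be formulated for this problem as follows:
    $w_{s1} + w_{1t} \ge 2$, or (5)
    $w_{s3} + w_{3t} \ge 2$, or (6)
    $w_{s1} + w_{13} + w_{3t} \ge 3$, or (7)
    $w_{s3} + w_{31} + w_{1t} \ge 3$. (8)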
Note that the inequality signs (greater than or equal to) can be replaced by equalities, since
the state variables are binary variables. Between these constraints, there is a logical “OR”
condition, i.e. at least one of them must be satisfied. Note that constraint (6) cannot be
satisfied because of the constraint stated in (2), therefore it can be omitted from the
formulation.
Similarly, the knockdown of node 3 implies that at least one of the paths (s-1-t), (s-2-t),
(s-1-2-t), and (s-2-1-t) must be present in the network. The formulation of this constraint is
as follows:
    $w_{s1} + w_{1t} \ge 2$, or (9)
    $w_{s2} + w_{2t} \ge 2$, or (10)
    $w_{s1} + w_{12} + w_{2t} \ge 3$, or (11)
    $w_{s2} + w_{21} + w_{1t} \ge 3$. (12)
Here, the constraint (10) cannot be satisfied because of the constraint (1), therefore it can be
excluded from the formulation.
After combining the objective function and all these constraints, the problem is stated as
follows:
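Minimize $\sum_{i,j} |x_{ij} - w_{ij}|$ (13)
Subject to
    $w_{s2} + w_{2t} \le 1$, (14)
    $w_{s3} + w_{3t} \le 1$, (15)
    $w_{s2} + w_{23} + w_{3t} \le 2$, (16)
    $w_{s3} + w_{32} + w_{2t} \le 2$, (17)
    $w_{s1} + w_{1t} \ge 2$, or (18)
    $w_{s3} + w_{3t} \ge 2$, or (19)
    $w_{s1} + w_{13} + w_{3t} \ge 3$, or (20)
    $w_{s3} + w_{31} + w_{1t} \ge 3$, (21)
    $w_{s1} + w_{1t} \ge 2$, or (22)
    $w_{s2} + w_{2t} \ge 2$, or (23)
    $w_{s1} + w_{12} + w_{2t} \ge 3$, or (24)
    $w_{s2} + w_{21} + w_{1t} \ge 3$. (25)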
Now, it is necessary to convert the constraints related with “OR” (18)-(25) into linear
form. This can be done by using the conversion explained below [4]:
2.1 Either/Or constraints
Suppose that at least one of the following inequalities must hold:
    Either $f_1(x) \le d_1$, (26)
    or $f_2(x) \le d_2$. (27)
Using a sufficiently large positive number M, an equivalent set of constraints can be
formulated as
    $f_1(x) \le d_1 + M y$, (28)
    $f_2(x) \le d_2 + M (1 - y)$, (29)
where y is a binary variable. Since y must be either 1 or 0, both (28) and (29) are satisfied if
at least one of (26) and (27) is satisfied. A more general case when K out of N constraints
must hold is explained below [4].
K out of N Constraints must hold:
If there are several constraints only some of which must hold, the formulation of the either/or
constraint explained above can be extended to account for such a requirement. Assume that
there are N constraints and K of them must hold, where K < N. The formulation is stated as
    $f_i(x) \le d_i + M y_i$, i = 1, ..., N,
    $\sum_{i=1}^{N} y_i = N - K$, with each $y_i$ binary.
The equality in $\sum_{i=1}^{N} y_i = N - K$ can be replaced by the inequality ≤ if at least K
constraints are required to be satisfied.
If $y_i$ = 0, then the original constraint is obtained. However, if $y_i$ = 1, because of the
large positive number M, even if the original constraint is not satisfied, the new constraint is
satisfied. Since K out of N constraints must hold, the summation of the $y_i$ should be equal to
N-K.
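As a sanity check of this conversion, the following Python sketch verifies by exhaustive enumeration that the big-M form is equivalent to the logical "OR" for the four paths that avoid node 2 in the 5-node example. It is an illustration under stated assumptions: the constraints are written in the "≥" direction used for transmitting paths (so M times $y_i$ is added to the left-hand side), M = 3 is assumed to be sufficiently large because no path has more than three edges, and all names are hypothetical.

```python
from itertools import product

# Edge variables appearing in the four paths that avoid node 2.
EDGES = ["s1", "1t", "s3", "3t", "13", "31"]
PATHS = [["s1", "1t"], ["s3", "3t"], ["s1", "13", "3t"], ["s3", "31", "1t"]]
M = 3  # assumption: any M >= the longest path length is large enough

def or_holds(w):
    """Original OR condition: at least one path has all of its edges present."""
    return any(sum(w[e] for e in p) >= len(p) for p in PATHS)

def big_m_feasible(w):
    """Is there a binary y with sum(y) <= N - 1 that satisfies every
    relaxed constraint sum(w) + M * y_i >= (number of edges)?"""
    for y in product((0, 1), repeat=len(PATHS)):
        if sum(y) > len(PATHS) - 1:
            continue
        if all(sum(w[e] for e in p) + M * yi >= len(p)
               for p, yi in zip(PATHS, y)):
            return True
    return False

mismatches = 0
for bits in product((0, 1), repeat=len(EDGES)):
    w = dict(zip(EDGES, bits))
    if or_holds(w) != big_m_feasible(w):
        mismatches += 1
print("mismatches:", mismatches)
```

The two formulations agree on all 64 assignments of the six edge variables, which is what the conversion requires.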
Now, we can reformulate the constraints (18)–(25) using the method described above
to transform “OR” conditions into “AND” conditions. The constraints (18)-(21) and (22)-(25)
then take the form
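    $w_{s1} + w_{1t} + M y_1 \ge 2$, (30)
    $w_{s3} + w_{3t} + M y_2 \ge 2$, (31)
    $w_{s1} + w_{13} + w_{3t} + M y_3 \ge 3$, (32)
    $w_{s3} + w_{31} + w_{1t} + M y_4 \ge 3$, (33)
    $y_1 + y_2 + y_3 + y_4 \le 3$. (34)
    $w_{s1} + w_{1t} + M y_5 \ge 2$, (35)
    $w_{s2} + w_{2t} + M y_6 \ge 2$, (36)
    $w_{s1} + w_{12} + w_{2t} + M y_7 \ge 3$, (37)
    $w_{s2} + w_{21} + w_{1t} + M y_8 \ge 3$, (38)
    $y_5 + y_6 + y_7 + y_8 \le 3$. (39)
Since it is required that at least one of the constraints (30)-(33) and at least one of (35)-(38)
must hold, (34) and (39) are written as inequalities with N - K = 4 - 1 = 3. Eliminating the
constraints that cannot be satisfied due to the constraints obtained from knockdown of node
1, the problem can be stated as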
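Minimize $\sum_{i,j} |x_{ij} - w_{ij}|$ (40)
Subject to
    $w_{s2} + w_{2t} \le 1$, (41)
    $w_{s3} + w_{3t} \le 1$, (42)
    $w_{s2} + w_{23} + w_{3t} \le 2$, (43)
    $w_{s3} + w_{32} + w_{2t} \le 2$, (44)
    $w_{s1} + w_{1t} + M y_1 \ge 2$, (45)
    $w_{s1} + w_{13} + w_{3t} + M y_3 \ge 3$, (46)
    $w_{s3} + w_{31} + w_{1t} + M y_4 \ge 3$, (47)
    $y_1 + y_3 + y_4 \le 2$. (48)
    $w_{s1} + w_{1t} + M y_5 \ge 2$, (49)
    $w_{s1} + w_{12} + w_{2t} + M y_7 \ge 3$, (50)
    $w_{s2} + w_{21} + w_{1t} + M y_8 \ge 3$, (51)
    $y_5 + y_7 + y_8 \le 2$. (52)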
Let us return to the original problem, where the given network consists of the path s-3-t.
The state variables for this network are $x_{s3}$ = 1, $x_{3t}$ = 1, and all the remaining are 0.
The optimum solution to this problem gives 3 as the objective function value and the state
variables are found as $w_{3t}$ = 1, $w_{s1}$ = 1, $w_{1t}$ = 1, and the remaining are 0. Therefore,
a total of 3 changes are applied to satisfy the given constraints; edge s-3 is removed and edges
s-1 and 1-t are added to the network. The final network structure is given in (Figure 2).
Figure 2. The resulting network satisfying the RNAi data as given in (Table 1) and therefore the constraints
(41)-(52).
The solution makes sense considering the given constraints. In order to prevent signal
transmission with the knockdown of node 1, either the edge s-3 or the edge 3-t must be
removed from the network. Also, in order for the signal to be transmitted after the
knockdown of node 2 or 3, the path s-1-t must be present in the network. As it can be
understood, there may be more than one optimum solution for this problem. The solution
shown here is obtained by the software CPLEX v12.3, therefore only one of the possible
solutions is obtained. Instead of removing the edge s-3 and keeping the edge 3-t, removing
the edge 3-t and keeping the edge s-3 results in another optimum solution.
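For a network this small, the optimum can be cross-checked without a solver by enumerating all $2^{12}$ admissible edge sets. The Python sketch below is such a brute-force check, not the CPLEX-based procedure of the report. It tests the knockdown data through graph reachability, which for this purpose is equivalent to asking whether some complete path avoids the knocked-down node. All identifiers are hypothetical.

```python
from itertools import product

INNER = ["1", "2", "3"]
# Admissible edges: none into s, none out of t, no self-edges, no direct s-t edge.
EDGES = ([("s", v) for v in INNER] + [(v, "t") for v in INNER]
         + [(u, v) for u in INNER for v in INNER if u != v])
REFERENCE = {("s", "3"), ("3", "t")}      # reference network s-3-t
KNOCKDOWN = {"1": 0, "2": 1, "3": 1}      # Table 1: 1 = sink activated

def reaches_sink(edges, removed):
    """Depth-first search from s to t that never enters the knocked-down node."""
    seen, stack = {"s"}, ["s"]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b != removed and b not in seen:
                seen.add(b)
                stack.append(b)
    return "t" in seen

def consistent(edges):
    """True if every simulated knockdown reproduces the observed readout."""
    return all(reaches_sink(edges, k) == bool(obs) for k, obs in KNOCKDOWN.items())

best, optima = None, []
for bits in product((0, 1), repeat=len(EDGES)):
    edges = {e for e, b in zip(EDGES, bits) if b}
    if not consistent(edges):
        continue
    changes = len(edges ^ REFERENCE)      # symmetric difference = sum |x - w|
    if best is None or changes < best:
        best, optima = changes, [edges]
    elif changes == best:
        optima.append(edges)

print(best, [sorted(o) for o in optima])
```

The enumeration finds a minimum of 3 changes and exactly two optimal networks, both containing the path s-1-t and differing only in whether edge s-3 or edge 3-t is removed, in agreement with the discussion above.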
2.2 Automatic generation of constraints
In order to solve the network problem described in the previous section by using the
software CPLEX, it is necessary to generate the constraints automatically as the number of
constraints becomes very large even for pathways with small number of nodes. An LP-format
file which can be read by CPLEX is created by a code written in C. The code generates the
objective function with the input data, i.e. a given reference PPI network. The objective
function consists of all the edge values $w_{ij}$ that are to be found and the values $x_{ij}$ which
are the edge values for the given initial network. Then, for RNA interference data, the constraints are
generated. Firstly, the constraints for knockdown of the nodes which do not activate the sink
node t are generated. The constraints include all possible paths from the source node s to the
sink node t which do not include the knockdown node and there exists an “AND” relation
between them as explained above. These paths have a minimum of 2 edges and a maximum
of n-1 edges for a network consisting of n nodes since a direct edge from source to sink and
paths with loops are not allowed. For all path lengths, i.e. paths consisting of 2 edges,
3 edges, …, n-1 edges, the constraints are
written to the LP file. Next, the constraints for the knockdown of the nodes which activate
the sink node t are generated. The constraints include all possible paths from the source node
s to the sink node t which do not include the knockdown node, and there exists an "OR"
relation between them as explained above. They also have a minimum of 2 edges and a
maximum of n-1 edges. Since they are related by an "OR" condition, $y_i$ and M values are
added to the constraints, and the number of $y_i$ values is counted to write the additional
constraints as in (48) and (52). Some of these paths are already dealt with when considering
the nodes which do not activate the sink node; the corresponding constraints
cannot be satisfied and they are excluded. In fact, this is the reason why the constraints for
the nodes which do not activate the sink node are generated first. Lastly, the variables $w_{ij}$
and $y_i$ are defined as binary variables in the LP file.
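The generation procedure described above can be sketched as follows. This is a minimal Python illustration, not the authors' C program: the function names, the variable naming scheme (`w_i_j`, `y1`, ...), the choice of M and the omission of the objective's constant term are all assumptions. The section keywords `Minimize`, `Subject To`, `Binary` and `End` follow the CPLEX LP file format.

```python
from itertools import permutations

def paths_avoiding(inner, k):
    """All loop-free s-t paths through intermediate nodes other than k."""
    rest = [v for v in inner if v != k]
    for r in range(1, len(rest) + 1):
        for mid in permutations(rest, r):
            yield ("s",) + mid + ("t",)

def lhs(path):
    """Sum of the edge variables along a path, e.g. 'w_s_2 + w_2_t'."""
    return " + ".join(f"w_{a}_{b}" for a, b in zip(path, path[1:]))

def build_lp(inner, reference, knockdown):
    """Return LP-format text for one problem instance.
    reference: set of (i, j) edges with x_ij = 1
    knockdown: node -> 1 if the sink is still activated, else 0"""
    big_m = len(inner) + 1  # assumption: at least the longest path length
    edges = [(a, b) for a in ["s"] + inner for b in inner + ["t"]
             if a != b and (a, b) != ("s", "t")]
    # |x - w| equals w when x = 0 and 1 - w when x = 1 (constant dropped).
    terms = [("- " if e in reference else "+ ") + f"w_{e[0]}_{e[1]}" for e in edges]
    out = ["Minimize", " obj: " + " ".join(terms), "Subject To"]
    broken, ys, count = set(), [], 0

    # AND constraints first: every path avoiding an affecting node is broken.
    for k, observed in knockdown.items():
        if observed == 0:
            for p in paths_avoiding(inner, k):
                if p not in broken:
                    broken.add(p)
                    count += 1
                    out.append(f" c{count}: {lhs(p)} <= {len(p) - 2}")

    # OR constraints: one of the remaining paths must be complete.
    for k, observed in knockdown.items():
        if observed == 1:
            group = [p for p in paths_avoiding(inner, k) if p not in broken]
            if not group:
                raise ValueError(f"no admissible path for knockdown of {k}")
            names = []
            for p in group:
                names.append(f"y{len(ys) + len(names) + 1}")
                count += 1
                out.append(f" c{count}: {lhs(p)} + {big_m} {names[-1]} >= {len(p) - 1}")
            count += 1
            out.append(f" c{count}: {' + '.join(names)} <= {len(names) - 1}")
            ys += names

    out += ["Binary"] + [" " + t[2:] for t in terms] + [" " + y for y in ys] + ["End"]
    return "\n".join(out)

lp_text = build_lp(["1", "2", "3"], {("s", "3"), ("3", "t")}, {"1": 0, "2": 1, "3": 1})
print(lp_text)
```

For the 5-node example the sketch emits 12 constraints, matching the count of (41)-(52). Unlike that listing, the auxiliary variables here are numbered consecutively after the excluded paths are dropped.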
3. Data Sets
To evaluate the proposed formulation, we randomly generated 1,000 reference
networks each with seven, eight, or nine genes, including the receptor and reporter genes.
Each edge in a network is an outcome of a Bernoulli trial with probability 0.5. Therefore,
most of the initial PPI networks are dense networks. Then, for each PPI network we generate
RNAi data for each of the regular genes, i.e. each node between the source and the sink, with
p(affecting gene = 1) = 0.5: the knockdown result of each such node is simulated by
randomly assigning it a value of 0 or 1. These values represent the observable results at the
sink node after each knockdown. If the observable value is 0 at the sink node, the knockdown
node is an affecting protein. If it is 1, then the corresponding node is a non-affecting protein.
From this RNAi data, the constraints for our ILP are constructed. If the knockdown result is 0,
"and" constraints are created; if it is 1, "or" constraints are written down.
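The data generation described in this section can be sketched as follows. This is a Python illustration rather than the original C code. It assumes that only admissible edges (none into s, none out of t, no self-edges, no direct s-t edge) are sampled, and the function name, node labels and the `seed` parameter are hypothetical.

```python
import random

def random_instance(n, p_edge=0.5, p_affecting=0.5, seed=None):
    """Draw one random reference network and one set of knockdown results."""
    rng = random.Random(seed)
    inner = [str(i) for i in range(1, n - 1)]        # the n - 2 regular genes
    admissible = [(a, b) for a in ["s"] + inner for b in inner + ["t"]
                  if a != b and (a, b) != ("s", "t")]
    # Each admissible edge is present with probability p_edge (Bernoulli trial).
    reference = {e for e in admissible if rng.random() < p_edge}
    # 0 = sink not activated (affecting gene), 1 = sink activated (non-affecting).
    knockdown = {v: 0 if rng.random() < p_affecting else 1 for v in inner}
    return inner, reference, knockdown

inner, reference, knockdown = random_instance(7, seed=0)
n_affecting = sum(1 for obs in knockdown.values() if obs == 0)
print(len(reference), n_affecting)
```

Instances in which every regular gene is affecting would be skipped before solving, as discussed in the next section.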
4. Results and Discussion
Given a reference PPI network and a set of RNAi constraints, the corresponding
integer linear program is solved by CPLEX 12 (64-bit). For each type of network, i.e. 7-, 8-
and 9-node networks, six different relationships are inspected, namely, number of affecting
genes vs. average solution time, number of affecting genes vs. number of changes applied on the
initial network, number of edges in the initial network vs. average solution time, number of
edges in the initial network vs. number of changes applied on the initial network, number of
constraints vs. number of affecting genes, and difference in the number of "and" and "or"
constraints vs. number of affecting genes. The number of affecting genes for an n-node network
can be at most n-2, i.e. all the nodes between the source node and the sink node are affecting
genes. If all of the nodes are affecting nodes, we do not solve the problem, since the
formulation described above is valid only when there is at least one complete path from the
source node to the sink node. The presence of a complete path in this formulation is guaranteed
by having a non-affecting node, because at least one of the paths that do not include the
non-affecting node must be complete.
First, 7-node networks are considered. The number of 7-node networks created
randomly by the C code is shown in (Figure 3) with respect to the number of affecting genes. The
figure shows a bell-shaped (binomial) distribution, since the networks are created randomly with
a 0.5 probability of a node being an affecting node. In some of the networks, all genes (i.e. 5
nodes) are affecting nodes. In that case, the problem is not solved. Similar graphs for 8- and
9-node networks are shown in (Figure 10) and (Figure 17). (Figure 4) shows the average solution
times of the ILP for different numbers of affecting genes for 7-node networks. The red bars
represent the standard deviation in solution time and are very large. This is because the
solution time depends not only on the number of affecting genes but also on many other
variables, such as the number of constraints and the number of edges in the initial network. The
average solution time also follows a bell-shaped profile over the number of affecting genes. This
can be explained by the numbers of "and" and "or" constraints. Although the total number of
constraints decreases with the increasing number of affecting genes, as shown in (Figure 8), the
difference between the numbers of "and" and "or" constraints behaves inversely to the average
solution time, as shown in (Figure 9). If the difference between the numbers of "and" and "or"
constraints is small, the average solution time is high; if the difference is large, then the
average solution time is low. From this observation, we can conclude that as the "and" and "or"
constraints mix more, the solution time gets higher, since finding the edge values that minimize
the objective function and at the same time satisfy both the "and" and "or" constraints
gets harder.
In (Figure 5), the number of changes made on the initial network with respect to the
number of affecting genes is shown. As the number of affecting genes increases, the number
of changes made on the initial network also increases. This is due to the increase in the
number of "and" constraints with the number of affecting genes. Satisfying all the "and"
constraints is more difficult than satisfying only one of the "or" constraints, because while
satisfying only one "or" constraint is enough, all the "and" constraints must be satisfied at the
same time. Therefore, more changes are made on the initial network as the number of
affecting genes increases.
As mentioned before, the average solution time may also depend on the number of
edges in the initial network. Such a dependence is shown in (Figure 6). Here, the number of
edges in the initial networks spans from 7 to 23. This is because these networks are created
randomly and each edge has a 0.5 probability of being present in the initial network. (Figure
6) shows that the average solution time slightly increases with the number of edges in the
initial network. This is as expected, because as the number of edges in the initial network
increases, more edges have to be removed from the network to satisfy the constraints and
therefore the time required to find such edges increases. Consequently, as shown in (Figure
7), the number of changes made in the initial network also increases with the increasing
number of edges in the initial network. In these two figures, while the average solution time
deviates considerably, the standard deviation of the number of changes in the initial network is
smaller and proportional to the number of edges in the initial network.
The above discussion is also valid for 8-node networks, (Figure 10) to (Figure 16),
and 9-node networks, (Figure 17) to (Figure 23). The number of affecting genes in the created
networks again follows a bell-shaped distribution. As the number of nodes increases, the total
number of constraints increases exponentially, which in turn results in an exponential increase
in the solution time. Similarly, the number of edges in the initial network and the number of
changes imposed on the initial network increase with the number of nodes in the network.
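The growth in the number of constraints can be made concrete by counting the loop-free paths that one knockdown generates. The Python sketch below is an illustrative count, not taken from the report: it considers path constraints only, before any exclusions and without the extra constraint on the $y_i$, and `paths_per_knockdown` is a hypothetical name.

```python
from math import factorial

def paths_per_knockdown(n):
    """Loop-free s-t paths that avoid one knocked-down node in an n-node
    network: ordered selections of 1..r of the r = n - 3 remaining
    intermediate nodes."""
    r = n - 3
    return sum(factorial(r) // factorial(r - k) for k in range(1, r + 1))

for n in (5, 7, 8, 9):
    per_node = paths_per_knockdown(n)
    # columns: nodes, paths per knockdown, paths over all n - 2 knockdowns
    print(n, per_node, per_node * (n - 2))
```

Under these assumptions one knockdown yields 64, 325 and 1956 path constraints for 7-, 8- and 9-node networks respectively, i.e. at most 320, 1950 and 13692 over all regular genes.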
Figure 3. Number of 7-node networks that are created randomly for different number of
affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 5), y-axis "# of Networks Created".]
Figure 4. Average solution times for the 7-node networks with different number of affecting
genes. [Plot: x-axis "# of Affecting Genes" (0 to 4), y-axis "Average Solution Time"; series: avg time, stdev.]
Figure 5. Average number of changes applied on the reference 7-node networks with
different number of affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 4), y-axis "# of Changes in Initial Network"; series: # of chg, stdev.]
Figure 6. Average solution times for the 7-node networks with different number of edges in
the reference networks. [Plot: x-axis "# of Edges in Initial Network" (7 to 23), y-axis "Average Solution Time"; series: avg time, std dev.]
Figure 7. Average number of changes applied on the reference 7-node networks with
different number of edges in the reference networks. [Plot: x-axis "# of Edges in Initial Network" (7 to 23), y-axis "# of Changes in Initial Network"; series: # of chg, std dev.]
Figure 8. Number of AND and OR constraints for 7-node networks with different number of
affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 4), y-axis "# of Constraints"; series: # of OR constraints, # of AND constraints.]
Figure 9. Difference between AND and OR constraints for 7-node networks with different
number of affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 4), y-axis "Difference Between Constraints".]
Figure 10. Number of 8-node networks that are created randomly for different number of
affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 6), y-axis "# of Networks".]
Figure 11. Average solution times for the 8-node networks with different number of affecting
genes. [Plot: x-axis "# of Affecting Genes" (0 to 5), y-axis "Average Solution Time"; series: avg time, stdev.]
Figure 12. Average number of changes applied on the reference 8-node networks with
different number of affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 5), y-axis "# of Changes in Initial Network"; series: chg in soln, stdev.]
Figure 13. Average solution times for the 8-node networks with different number of edges in
the reference networks. [Plot: x-axis "# of Edges in Initial Network" (11 to 31), y-axis "Average Solution Time"; series: avg time, std dev.]
Figure 14. Average number of changes applied on the reference 8-node networks with
different number of edges in the reference networks. [Plot: x-axis "# of Edges in Initial Network" (11 to 31), y-axis "# of Changes in Initial Network"; series: # of chg, std dev.]
Figure 15. Number of AND and OR constraints for 8-node networks with different number of
affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 5), y-axis "# of Constraints"; series: # of OR constraints, # of AND constraints.]
Figure 16. Difference between AND and OR constraints for 8-node networks with different
number of affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 5), y-axis "Difference Between Constraints".]
Figure 17. Number of 9-node networks that are created randomly for different number of
affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 7), y-axis "# of Networks".]
Figure 18. Average solution times for the 9-node networks with different number of affecting
genes. [Plot: x-axis "# of Affecting Genes" (0 to 6), y-axis "Average Solution Time"; series: avg time, std dev.]
Figure 19. Average number of changes applied on the reference 9-node networks with
different number of affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 6), y-axis "# of Changes in Initial Network"; series: # of chg, std dev.]
Figure 20. Average solution times for the 9-node networks with different number of edges in
the reference networks. [Plot: x-axis "# of Edges in Initial Network" (18 to 41), y-axis "Average Solution Time"; series: avg time, std dev.]
Figure 21. Average number of changes applied on the reference 9-node networks with
different number of edges in the reference networks. [Plot: x-axis "# of Edges in Initial Network" (18 to 41), y-axis "# of Changes in Initial Network"; series: # of chg, std dev.]
Figure 22. Number of AND and OR constraints for 9-node networks with different number of
affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 6), y-axis "# of Constraints"; series: # of OR constraints, # of AND constraints.]
Figure 23. Difference between AND and OR constraints for 9-node networks with different
number of affecting genes. [Plot: x-axis "# of Affecting Genes" (0 to 6), y-axis "Difference Between Constraints".]
5. Acknowledgement
The support of the European Union in the 7th Framework Programme through SysPatho, grant
260429, is greatly appreciated.
6. References
[1] A.L. Brass, D.M. Dykxhoorn, Y. Benita, N. Yan, A. Engelman, R.J. Xavier, J.
Lieberman and S.J. Elledge, Identification of host proteins required for HIV infection
through a functional genomic screen, Science 319 (2008), pp. 817–824.
[2] A. Fire, S. Xu, M.K. Montgomery, S.A. Kostas, S.E. Driver and C.C. Mello, Potent and
specific genetic interference by double-stranded RNA in Caenorhabditis elegans,
Nature 391 (1998), pp. 806–811.
[3] A. Gitter, J. Klein-Seetharaman, A. Gupta and Z. Bar-Joseph, Discovering pathways by
orienting edges in protein interaction networks, Nucl. Acids Res. 39(4) (2011).
[4] F.S. Hillier and G.J. Lieberman, Introduction to Operations Research, McGraw-Hill,
New York, NY, 2001.
[5] L. Kaderali and N. Radde, Inferring gene regulatory networks from expression data,
Computational Intelligence in Bioinformatics (2008), pp. 33–74.
[6] L. Kaderali, E. Dazert, U. Zeuge, M. Frese and R. Bartenschlager, Reconstructing
signaling pathways from RNAi data using probabilistic Boolean threshold networks,
Bioinformatics 25 (17) (2009), pp. 2229–2235.
[7] R. König, Y. Zhou, D. Elleder, T.L. Diamond, G.M.C. Bonamy, J.T. Irelan, C.
Chiang, B.P. Tu, P.D. De Jesus, C.E. Lilley, S. Seidel, A.M. Opaluch, J.S. Caldwell,
M.D. Weitzman, K.L. Kuhen, S. Bandyopadhyay, T. Ideker, A.P. Orth, L.J. Miraglia,
F.D. Bushman, J.A. Young and S.K. Chanda, Global analysis of host-pathogen
interactions that regulate early-stage HIV-1 replication, Cell 135 (2008), pp. 49–60.
[8] F. Markowetz, D. Kostka, O.G. Troyanskaya and R. Spang, Nested effects models for
high-dimensional phenotyping screens, Bioinformatics 23 (2007), pp. 305–312.
[9] J. Moffat and D.M. Sabatini, Building mammalian signaling pathways with RNAi
screens, Nat. Rev. Mol. Cell Biol. 7 (2006), pp. 177–187.
[10] R. Sacher, L. Stergiou and L. Pelkmans, Lessons from genetics: interpreting complex
phenotypes in RNAi screens, Current Opinion in Cell Biology 20 (2008), pp. 483–489.
[11] J. Scott, T. Ideker, R. Karp and R. Sharan, Efficient algorithms for detecting signaling
pathways in protein interaction networks, J. Comput. Biol. 13 (2006), pp. 133–144.
[12] M. Steffen, A. Petti, J. Aach, P. D'haeseleer and G. Church, Automated modelling of
signal transduction networks, BMC Bioinformatics 3/34 (2002).