1 / 49
Quantitative Study of Biological Systems using the Stochastic CLS
Guido Scatena
for the course
"Metodi e strumenti per la verifica" (Methods and Tools for Verification) by Prof. Andrea Maggiolo-Schettini
Pisa, 12 November 2009
2 / 49
Outline
1 Introduction
   Formal Modelling of Biological Systems
   Qualitative Modelling with the Calculus of Looping Sequences
   Quantitative Modelling with the Stochastic CLS
2 The Stochastic CLS Simulator
3 Studied Biological Systems
3 / 49
Biological Systems : an example
5 / 49
Computational Systems Biology
Biologists need to integrate the knowledge about single constituents of living organisms into a system view.
⇓
Biological systems are modelled as stochastic concurrent systems and analyzed by simulation and model checking.
A 747 plane is composed of 6 million pieces. Do you think that by inspecting every single component individually you would be able to say what the plane will look like?
"In many ways, biological systems are like massively parallel, highly complex, error-prone computer systems."
complex parallel computer systems
complex parallel biological systems
6 / 49
Languages for Biological Systems
Biological System = Entities + Interactions
We consider systems composed of
• n species $\{S_1, S_2, \dots, S_n\}$
• m reactions $\{R_1, R_2, \dots, R_m\}$.
Each reaction $R_i$ is represented by a reaction equation
$$l^i_1 S_1 + l^i_2 S_2 + \dots + l^i_n S_n \xrightarrow{k^i} r^i_1 S_1 + r^i_2 S_2 + \dots + r^i_n S_n$$
where $l^i_j$ = reactant stoichiometric coefficients, $r^i_j$ = product stoichiometric coefficients, $k^i$ = kinetic rate.
For chemical reactions, languages on multisets are sufficient.
For biological systems we need languages that deal with structured objects.
The Calculus of Looping Sequences is a language based on term rewriting that aims to describe biological systems.
A biological system is represented as a term (= a tree) describing its structure, and its evolution as a set of rewriting rules.
8 / 49
Calculus of Looping Sequences: state
Syntax
A possibly infinite alphabet of elements $E$ and a neutral element $\epsilon$ representing the empty sequence.
Terms $T$ and sequences $S$ of CLS are given by the following grammar:
$$T ::= S \;\big|\; (S)^L \rfloor T \;\big|\; T \mid T \qquad\qquad S ::= \epsilon \;\big|\; a \;\big|\; S \cdot S$$
where $a$ is a generic element of $E$.
Thus we have the following operators:
• Sequencing $\cdot$
• Looping and containment $(\,)^L \rfloor$
• Parallel composition $\mid$
9 / 49
Calculus of Looping Sequences: state
Structural Congruence
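The structural congruence relations $\equiv_S$ and $\equiv_T$ are the least congruence relations on sequences and on terms, respectively, satisfying the following rules:
$$S_1 \cdot (S_2 \cdot S_3) \equiv_S (S_1 \cdot S_2) \cdot S_3 \qquad S \cdot \epsilon \equiv_S \epsilon \cdot S \equiv_S S$$
$$S_1 \equiv_S S_2 \ \text{implies}\ S_1 \equiv_T S_2 \ \text{and}\ (S_1)^L \rfloor T \equiv_T (S_2)^L \rfloor T$$
$$T_1 \mid T_2 \equiv_T T_2 \mid T_1 \qquad T_1 \mid (T_2 \mid T_3) \equiv_T (T_1 \mid T_2) \mid T_3 \qquad T \mid \epsilon \equiv_T T \qquad (\epsilon)^L \rfloor \epsilon \equiv_T \epsilon$$
$$(S_1 \cdot S_2)^L \rfloor T \equiv_T (S_2 \cdot S_1)^L \rfloor T$$
These rules state:
• the associativity of $\mid$ and $\cdot$
• the commutativity of $\mid$
• the neutral role of $\epsilon$
• that a looping sequence can be rotated (last rule)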
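The congruence rules above can be checked mechanically by mapping each term to a canonical form. The following is a minimal sketch, not the simulator's actual representation: the constructors `seq`, `loop`, `par` and the function `canon` are hypothetical names introduced here. Sequences are stored as flat tuples (so associativity of · and the neutral ε come for free), looping sequences are normalised to their smallest rotation, and parallel compositions are flattened and sorted.

```python
def seq(*elems):
    """A sequence a1 · a2 · ... ; seq() stands for the empty sequence ε."""
    return ("seq", tuple(elems))

def loop(looping_seq, content):
    """The term (S)^L ⌋ T."""
    return ("loop", looping_seq, content)

def par(*terms):
    """The parallel composition T1 | T2 | ..."""
    return ("par", tuple(terms))

def canon(term):
    """Return a sorted tuple of components; congruent terms give equal tuples."""
    kind = term[0]
    if kind == "seq":
        elems = term[1]
        return (("seq", elems),) if elems else ()      # T | ε ≡ T
    if kind == "loop":
        elems = term[1][1]
        inner = canon(term[2])
        if not elems and not inner:
            return ()                                  # (ε)^L ⌋ ε ≡ ε
        rotations = [elems[i:] + elems[:i] for i in range(len(elems))] or [()]
        return (("loop", min(rotations), inner),)      # rotation of the loop
    if kind == "par":
        parts = [c for sub in term[1] for c in canon(sub)]
        return tuple(sorted(parts, key=repr))          # | is assoc. and comm.
    raise ValueError(f"unknown term kind: {kind!r}")

def congruent(t1, t2):
    return canon(t1) == canon(t2)
```

For instance, `congruent(loop(seq("a", "b", "c"), seq("d")), loop(seq("b", "c", "a"), seq("d")))` holds because both loops share the same smallest rotation, while `congruent(seq("a", "b"), seq("b", "a"))` does not, since sequencing is not commutative.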
10 / 49
Calculus of Looping Sequences: events
We assume three sets of typed variables: term variables $TV$ ranged over by $X, Y, Z, \dots$, sequence variables $SV$ ranged over by $\tilde{x}, \tilde{y}, \tilde{z}, \dots$, and element variables $\mathcal{X}$ ranged over by $x, y, z, \dots$.
Patterns $P$ and sequence patterns $SP$ of CLS are given by the following grammar:
$$P ::= SP \;\big|\; (SP)^L \rfloor P \;\big|\; P \mid P \;\big|\; X$$
$$SP ::= \epsilon \;\big|\; a \;\big|\; SP \cdot SP \;\big|\; \tilde{x} \;\big|\; x$$
where $a$ is a generic element of $E$, and $X$, $\tilde{x}$ and $x$ are generic elements of $TV$, $SV$ and $\mathcal{X}$, respectively.
The set of all variables is $V = TV \cup SV \cup \mathcal{X}$, and $Var(P)$ denotes the set of variables appearing in $P$.
An instantiation is a partial function $\sigma : V \to T$ that respects the type of variables.
11 / 49
Calculus of Looping Sequences: events
Rewrite Rules
A rewrite rule is a pair $(P_1, P_2)$ of patterns such that $P_1 \not\equiv \epsilon$ and $Var(P_2) \subseteq Var(P_1)$.
A rewrite rule $(P_1, P_2)$ states that a term $P_1\sigma$, obtained by instantiating the variables in $P_1$ with an instantiation function $\sigma$, can be transformed into the term $P_2\sigma$.
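To make the two side conditions and the instantiation step concrete, here is a small sketch restricted to sequence patterns with sequence variables only. It is an illustration under that simplifying assumption; `Var`, `make_rule`, `instantiate` and `apply_rule` are hypothetical names, and the rule used in the example is invented for the purpose.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A sequence variable; plain strings in a pattern are constant elements."""
    name: str

def variables(pattern):
    """Var(P): the set of variable names appearing in the pattern."""
    return {item.name for item in pattern if isinstance(item, Var)}

def make_rule(lhs, rhs):
    """Build (P1, P2), enforcing P1 ≢ ε and Var(P2) ⊆ Var(P1)."""
    if not lhs:
        raise ValueError("P1 must not be the empty sequence")
    if not variables(rhs) <= variables(lhs):
        raise ValueError("every variable of P2 must also appear in P1")
    return (tuple(lhs), tuple(rhs))

def instantiate(pattern, sigma):
    """Compute P·σ: replace each variable by the sequence bound to it."""
    out = []
    for item in pattern:
        if isinstance(item, Var):
            out.extend(sigma[item.name])
        else:
            out.append(item)
    return tuple(out)

def apply_rule(rule, sigma):
    """Return the pair (P1σ, P2σ): the redex and the term it is rewritten into."""
    lhs, rhs = rule
    return instantiate(lhs, sigma), instantiate(rhs, sigma)

# Hypothetical rule a · x̃ · b  ->  b · x̃ · a
swap = make_rule(["a", Var("x"), "b"], ["b", Var("x"), "a"])
before, after = apply_rule(swap, {"x": ("c", "d")})
```

With the binding x̃ = c · d, `before` is the sequence a · c · d · b and `after` is b · c · d · a. Finding the instantiations that make P1σ occur inside a larger term is the pattern matching problem, which this sketch does not address.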
12 / 49
Calculus of Looping Sequences: model
CLS Model
A CLS model is a pair $(T_0, R)$ where
• $T_0$ is the starting term (the state: entities), and
• $R$ is a finite set of rewriting rules (the events: interactions).
Given a CLS model we can compute an LTS semantics; then we can check the reachability of certain states or test the behavioral equivalence of different systems.
From the course material of "Probabilistic Model Checking" by Dave Parker, Oxford University Computing Laboratory
14 / 49
Quantitative Modelling of Biological Systems
Knowing the amount of every species at time t0, we want to know the amounts at time t1 (a timed trajectory).
Deterministic: numerical solution of differential equations
  Char.: amounts are continuous; mean of trajectories
  Pros: low complexity
  Cons: correct only with millions of molecules
Stochastic: each reaction is explicitly simulated
  Char.: amounts are discrete
  Pros: always correct
  Cons: high complexity
Example
$A \xrightarrow{k_1} 2A$
$A + B \xrightarrow{k_2} 2B$
$B \xrightarrow{k_3} \emptyset$
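For the deterministic reading of the example, and assuming mass-action kinetics (an assumption not stated on the slide), each reaction contributes its rate times the net change in each species. Writing $[A]$ and $[B]$ for the continuous amounts, this gives the classical Lotka–Volterra equations:

```latex
\begin{aligned}
\frac{d[A]}{dt} &= k_1 [A] - k_2 [A][B] \\
\frac{d[B]}{dt} &= k_2 [A][B] - k_3 [B]
\end{aligned}
```

The first reaction adds one $A$ at rate $k_1[A]$, the second converts one $A$ into one $B$ at rate $k_2[A][B]$, and the third removes one $B$ at rate $k_3[B]$.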
15 / 49
Adding Stochasticity ..
Stochastic modeling of biological systems works by associating arate number to each active reaction (or, in general, interaction)representing the frequency or the propensity of interactions.
$$l^i_1 S_1 + l^i_2 S_2 + \dots + l^i_n S_n \xrightarrow{k^i} r^i_1 S_1 + r^i_2 S_2 + \dots + r^i_n S_n$$
where $l^i_j$ = reactant stoichiometric coefficients, $r^i_j$ = product stoichiometric coefficients, $k^i$ = kinetic rate.
16 / 49
Stochastic CLS
Stochastic Rewrite Rule
A stochastic rewrite rule is a triple $(P, P', f)$, denoted $P \overset{f}{\mapsto} P'$, where $P$ and $P'$ are patterns, $P \not\equiv \epsilon$ and $Var(P') \subseteq Var(P)$, and $f$ is the rewriting rate function, which maps an instantiation $\sigma$ to a rate in $\mathbb{R}_{\ge 0}$.
Stochastic CLS Model
A stochastic CLS model is a pair $(T_0, R)$ where
• $T_0$ is the starting term, representing the state of the system, and
• $R$ is a finite set of stochastic rewrite rules, representing the stochastic events.
Semantics
• Stochastic application of term rewriting rules
• Given a finite set of rewrite rule schemata $R$, its semantics corresponds to a stochastic automaton (with a huge number of states)
17 / 49
Stochastic Simulation of CLS
We apply Gillespie's stochastic simulation algorithm (1977), which is based on the theory of collisions and generates a statistically correct trajectory in the state space of the system evolution.
All active reactions undergo a (stochastic) race condition, and the fastest one is executed.
The state of the simulation is a pair $(T, t)$ where
• $T$ is the current term
• $t \in \mathbb{R}_{\ge 0}$ is a global clock
A step of the simulation consists of
• choosing the next state $s'$ randomly, with a probability proportional to the rate of the transition multiplied by the number of places in the term in which the corresponding rule can be applied
• incrementing the global clock by a random quantity which is exponentially distributed with the exit rate of the current state $s$ as parameter
This produces a temporal trace with variable delay between states.
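The simulation step described above can be sketched on a flat multiset of species instead of a CLS term, which removes the pattern matching problem and leaves only the stochastic part. This is a simplified illustration, not the SCLS simulator: `propensity` and `gillespie` are hypothetical names, the "number of places where the rule applies" becomes the number of reactant combinations, and the numeric rates and initial amounts are assumed values chosen only to make the example run.

```python
import math
import random

def propensity(rate, reactants, state):
    """Rate constant times the number of distinct reactant combinations."""
    combos = 1
    for species, needed in reactants.items():
        combos *= math.comb(state.get(species, 0), needed)
    return rate * combos

def gillespie(initial, reactions, t_end, rng):
    """Return a trace [(time, state), ...] of one stochastic trajectory."""
    state = dict(initial)
    t = 0.0
    trace = [(t, dict(state))]
    while True:
        weights = [propensity(k, lhs, state) for k, lhs, _ in reactions]
        exit_rate = sum(weights)
        if exit_rate == 0:          # no reaction is enabled any more
            break
        t += rng.expovariate(exit_rate)   # exponentially distributed delay
        if t > t_end:
            break
        # pick a reaction with probability proportional to its propensity
        _, lhs, rhs = rng.choices(reactions, weights=weights)[0]
        for species, n in lhs.items():
            state[species] -= n
        for species, n in rhs.items():
            state[species] = state.get(species, 0) + n
        trace.append((t, dict(state)))
    return trace

# A -> 2A, A + B -> 2B, B -> (nothing); rates are illustrative assumptions
REACTIONS = [
    (10.0, {"A": 1}, {"A": 2}),
    (0.1, {"A": 1, "B": 1}, {"B": 2}),
    (10.0, {"B": 1}, {}),
]
trace = gillespie({"A": 100, "B": 100}, REACTIONS, t_end=0.05,
                  rng=random.Random(0))
```

Each entry of `trace` is one state of the trajectory together with the time at which it was entered, so the delays between consecutive entries vary as described above.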
19 / 49
Implementation of Stochastic Simulation of CLS
To get the correct probability of each exit transition we have to solve a tree pattern matching problem (NP-complete).
Example
21 / 49
The Simulator : Representation of SCLS Terms
Example
$$a \cdot b \mid a \cdot b \mid c \cdot d \mid (a \cdot x \cdot b)^L \rfloor \big( (a)^L \rfloor (X \mid a) \mid (a)^L \rfloor (a \mid X) \mid b \big)$$
Advantages
• More compact trees (structurally equivalent subtrees are grouped together).
• The structural congruence becomes trivial.
22 / 49
The Simulator : Pattern Matching Algorithm
Matches sequences with an NFA and terms with a tree automaton.
Pre-processing
• enumerate patterns and parts of patterns
• build an NFA and a tree automaton transition table
Matching
• visit the term tree bottom-up, annotating each node with information about all patterns and parts of patterns matched
• nodes with the code of the root of some pattern correspond to a match
Advantages
• In the worst case, walks the term tree a single time to get all possible matches.
• In the average case, recomputes only the part of the matching information that has been changed by a rule application.
23 / 49
Sequence Pattern NFA Example
The NFA built for matching the sequence pattern a.x.y.z.x.y.a.
With "> var" on a transition arrow we mean that all the elements read are bound to the variable identifier var.
With "var" we mean that the transition can be taken if the next n symbols in the input sequence are the same as the sequence of n symbols bound to var in the binding associated with the start state.
Constant elements are indicated between quotes.
24 / 49
The Simulator : Tree Automata
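An effective way of representing and managing tree languages, with a strong theory and efficient algorithms.
String automata
[Figure: string automaton with states even (start) and odd, and transitions labelled a and b]
$even \xrightarrow{a} odd$, $odd \xrightarrow{a} even$, ...
Tree automata (parallel, bottom-up)
[Figure: tree with nodes labelled a and b, each annotated with the state even or odd]
$(even, odd) \xrightarrow{a} even$, $(even, even) \xrightarrow{a} odd$, ...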
30 / 49
The Simulator : Tree Automata (Example)
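Binary Boolean Expression Evaluator
0 = false, 1 = true
Annotated tree (each node is followed by the state assigned to it):
∧ q1
  ∨ q1
    ∧ q0
      0 q0
      1 q1
    ∨ q1
      1 q1
      0 q0
  ∧ q1
    ∨ q1
      0 q0
      1 q1
    ∧ q1
      1 q1
      1 q1
$\Sigma = \{\wedge(2), \vee(2), 0(0), 1(0)\}$, $Q = \{q_0, q_1\}$, $F = \{q_1\}$
Visit the tree bottom-up:
– transitions for constants give the starting states
– mark a node with a state that is deduced from the type of operator and the states of the children
Transitions:
$\epsilon \xrightarrow{0} q_0$ , $\epsilon \xrightarrow{1} q_1$
$(q_1, q_1) \xrightarrow{\wedge} q_1$ , $(q_0, q_1) \xrightarrow{\vee} q_1$
$(q_0, q_1) \xrightarrow{\wedge} q_0$ , $(q_1, q_0) \xrightarrow{\vee} q_1$
$(q_1, q_0) \xrightarrow{\wedge} q_0$ , $(q_1, q_1) \xrightarrow{\vee} q_1$
$(q_0, q_0) \xrightarrow{\wedge} q_0$ , $(q_0, q_0) \xrightarrow{\vee} q_0$
Also top-down: $q_1$ initial on the root; acceptance by condition on leaves
$q_0 \xrightarrow{0} accept$ , $q_1 \xrightarrow{1} accept$
We store binding information in the node attributes.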
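The bottom-up run of the Boolean evaluator can be written down directly from the transition table. The sketch below is only an illustration of the automaton on this slide, not of the simulator's matching automaton; the tuple encoding of trees and the names `DELTA`, `run` and `accepts` are assumptions made for the example.

```python
# Transition table: (symbol, states of the children) -> state of the node
DELTA = {
    ("0", ()): "q0",
    ("1", ()): "q1",
    ("and", ("q1", "q1")): "q1",
    ("and", ("q0", "q1")): "q0",
    ("and", ("q1", "q0")): "q0",
    ("and", ("q0", "q0")): "q0",
    ("or", ("q0", "q1")): "q1",
    ("or", ("q1", "q0")): "q1",
    ("or", ("q1", "q1")): "q1",
    ("or", ("q0", "q0")): "q0",
}
FINAL = {"q1"}

def run(tree):
    """Visit the tree bottom-up and return the state assigned to its root."""
    symbol, children = tree
    child_states = tuple(run(child) for child in children)
    return DELTA[(symbol, child_states)]

def accepts(tree):
    return run(tree) in FINAL

ZERO, ONE = ("0", ()), ("1", ())

# The expression shown on the slide
TREE = ("and", (
    ("or", (("and", (ZERO, ONE)), ("or", (ONE, ZERO)))),
    ("and", (("or", (ZERO, ONE)), ("and", (ONE, ONE)))),
))
```

Running `run(TREE)` assigns q1 to the root, so the expression is accepted, in agreement with the annotated tree. Each node is visited once, which is the property the simulator relies on when it annotates term nodes with matching information.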
33 / 49
The simulator : other
• Gillespie's algorithm runs on the match set of structured rule schemata.
• Rate functions can be expressed in C# and are implemented through reflection.
• Caching and node-sharing techniques are used.
• Binding information is stored in a compressed representation.
5000 lines of F# + 1000 lines of C#
Currently at http://www.cli.di.unipi.it/~scatena/pages/downloads/SCLSm/SCLSm-v1.11.zip
Soon at http://www.di.unipi.it/msvbio/wiki/
34 / 49
The software : example of input file
LHSs RHSs Rate Functions
36 / 49
Chemical Simulations
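Lotka–Volterra
$Y_1 \overset{10}{\mapsto} Y_1 \mid Y_1$
$Y_1 \mid Y_2 \overset{0.1}{\mapsto} Y_2 \mid Y_2$
$Y_2 \overset{10}{\mapsto} \epsilon$
Brusselator
$X \overset{5000}{\mapsto} X \mid Y$
$Y_1 \overset{50}{\mapsto} Y_2$
$Y_1 \mid Y_1 \mid Y_2 \overset{0.00005}{\mapsto} Y_1 \mid Y_1 \mid Y_1$
$Y_1 \overset{5}{\mapsto} \epsilon$
Sorbitol Dehydrogenase
$E \mid NADH \underset{33}{\overset{0.0000062}{\rightleftharpoons}} ENADH$
$ENADH \mid F \underset{227}{\overset{0.000000002}{\rightleftharpoons}} ENAD^+ \mid S$
$ENAD^+ \underset{0.0000006}{\overset{50}{\rightleftharpoons}} E \mid NAD^+$
$E \overset{0.0019}{\mapsto} \epsilon$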
37 / 49
Lactose Operon Regulation System in Escherichia Coli
Description
38 / 49
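Lactose Operon Regulation System in Escherichia Coli
SCLS Model
(S1) $lacI \cdot \tilde{x} \overset{0.02}{\mapsto} lacI \cdot \tilde{x} \mid Irna$
(S2) $Irna \overset{0.1}{\mapsto} Irna \mid repr$
(S3) $polym \mid \tilde{x} \cdot lacP \cdot \tilde{y} \overset{0.1}{\mapsto} \tilde{x} \cdot PP \cdot \tilde{y}$
(S4) $\tilde{x} \cdot PP \cdot \tilde{y} \overset{0.01}{\mapsto} polym \mid \tilde{x} \cdot lacP \cdot \tilde{y}$
(S5) $\tilde{x} \cdot PP \cdot lacO \cdot \tilde{y} \overset{20.0}{\mapsto} polym \mid Rna \mid \tilde{x} \cdot lacP \cdot lacO \cdot \tilde{y}$
(S6) $Rna \overset{0.1}{\mapsto} Rna \mid betagal \mid perm \mid transac$
(S7) $repr \mid \tilde{x} \cdot lacO \cdot \tilde{y} \overset{1.0}{\mapsto} \tilde{x} \cdot RO \cdot \tilde{y}$
(S8) $\tilde{x} \cdot RO \cdot \tilde{y} \overset{0.01}{\mapsto} repr \mid \tilde{x} \cdot lacO \cdot \tilde{y}$
(S9) $repr \mid LACT \overset{0.005}{\mapsto} RLACT$
(S10) $RLACT \overset{0.1}{\mapsto} repr \mid LACT$
(S11) $(\tilde{x})^L \rfloor (perm \mid X) \overset{0.1 \cdot f_1}{\mapsto} (perm \cdot \tilde{x})^L \rfloor X$
(S12) $LACT \mid (perm \cdot \tilde{x})^L \rfloor X \overset{0.0001 \cdot f_2}{\mapsto} (perm \cdot \tilde{x})^L \rfloor (LACT \mid X)$
(S13) $betagal \mid LACT \overset{0.00001}{\mapsto} betagal \mid GLU \mid GAL$
(S14) $perm \overset{0.001}{\mapsto} \epsilon$
(S15) $betagal \overset{0.001}{\mapsto} \epsilon$
(S16) $transac \overset{0.001}{\mapsto} \epsilon$
(S17) $repr \overset{0.002}{\mapsto} \epsilon$
(S18) $Irna \overset{0.01}{\mapsto} \epsilon$
(S19) $Rna \overset{0.01}{\mapsto} \epsilon$
(S20) $RLACT \overset{0.002}{\mapsto} LACT$
(S21) $(perm \cdot \tilde{x})^L \rfloor X \overset{0.001 \cdot f_2}{\mapsto} (\tilde{x})^L \rfloor X$
(S1-S10): Transcription; (S11-S13, S21): Effect of lactose degradation enzymes; (S14-S19): Degradation
where $f_1(\sigma) = occ(perm, \sigma(X)) + 1$, $f_2(\sigma) = occ(perm, \sigma(\tilde{x})) + 1$, and $occ(a, T)$ returns the number of elements $a$ syntactically occurring in the term $T$.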
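The rate functions of rules (S11) and (S12) depend on the instantiation through `occ`. A minimal sketch of how they could be computed follows; it is written in Python for illustration (the simulator itself expresses rate functions in C#), and the term encoding, the `Loop` class and the dictionary keys `"X"` and `"x~"` are assumptions made here. A term is a tuple of components, each being either a sequence (a tuple of element names) or a `Loop`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loop:
    """The term (membrane)^L ⌋ content."""
    membrane: tuple   # the looping sequence, as a tuple of element names
    content: tuple    # the term inside, as a tuple of components

def occ(element, term):
    """Number of syntactic occurrences of `element` in `term`."""
    total = 0
    for component in term:
        if isinstance(component, Loop):
            total += component.membrane.count(element)
            total += occ(element, component.content)
        else:
            total += component.count(element)
    return total

def f1(sigma):
    # occ(perm, σ(X)) + 1, where σ(X) is a term
    return occ("perm", sigma["X"]) + 1

def f2(sigma):
    # occ(perm, σ(x~)) + 1, where σ(x~) is a single sequence
    return occ("perm", (sigma["x~"],)) + 1

def rate_s11(sigma):
    return 0.1 * f1(sigma)

def rate_s12(sigma):
    return 0.0001 * f2(sigma)

# Example instantiation: two more perm inside the cell, one perm on the membrane
sigma = {
    "X": (("perm",), ("perm",), ("betagal",)),
    "x~": ("perm", "m"),
}
```

With this instantiation `f1(sigma)` is 3 and `f2(sigma)` is 2, so the rate of each rule grows with the number of permease molecules that its variable can match.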
39 / 49
Lactose Operon Regulation System in Escherichia Coli
$Ecoli ::= (m)^L \rfloor (lacI{-}A \mid 30 \times polym \mid 100 \times repr)$
$EcoliLact ::= Ecoli \mid 10000 \times LACT$
40 / 49
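Quorum Sensing in Pseudomonas aeruginosa
Description
Quorum sensing is the ability that many bacteria have of monitoring their population density and modulating their gene expression according to this density.
Pseudomonas aeruginosa is a pathogen that regulates its virulence through quorum sensing.
SCLS Model
(S1) $lasO \cdot lasR \cdot lasI \overset{20}{\mapsto} lasO \cdot lasR \cdot lasI \mid LasR$
(S2) $lasO \cdot lasR \cdot lasI \overset{5}{\mapsto} lasO \cdot lasR \cdot lasI \mid LasI$
(S3) $LasI \overset{8}{\mapsto} LasI \mid 3oxo$
(S4) $3oxo \mid LasR \overset{0.25}{\mapsto} 3R$
(S5) $3R \overset{400}{\mapsto} 3oxo \mid LasR$
(S6) $3R \mid lasO \cdot lasR \cdot lasI \overset{0.25}{\mapsto} 3RO \cdot lasR \cdot lasI$
(S7) $3RO \cdot lasR \cdot lasI \overset{10}{\mapsto} 3R \mid lasO \cdot lasR \cdot lasI$
(S8) $lasO \cdot lasR \cdot lasI \overset{1200}{\mapsto} lasO \cdot lasR \cdot lasI \mid LasI$
(S9) $lasO \cdot lasR \cdot lasI \overset{300}{\mapsto} lasO \cdot lasR \cdot lasI \mid LasR$
(S10) $(m)^L \rfloor (3oxo \mid X) \overset{30}{\mapsto} 3oxo \mid (m)^L \rfloor X$
(S11) $3oxo \mid (m)^L \rfloor X \overset{1}{\mapsto} (m)^L \rfloor (3oxo \mid X)$
(S12) $LasI \overset{1}{\mapsto} \epsilon$
(S13) $LasR \overset{1}{\mapsto} \epsilon$
(S14) $3oxo \overset{1}{\mapsto} \epsilon$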
41 / 49
Quorum Sensing in Pseudomonas aeruginosa
$Bact ::= (m)^L \rfloor (lasO \cdot lasR \cdot lasI)$
$SignalBact ::= (m)^L \rfloor (lasO \cdot lasR \cdot lasI \mid signal)$
autoinducer inside: $[\; \mid (m)^L \rfloor (lasO \cdot lasR \cdot lasI \mid signal \mid X) \mid \;] * occ(3oxo, X)$
[Plots for the initial terms: SignalBact; SignalBact | Bact × 19; SignalBact | Bact × 4; SignalBact | Bact × 99]
42 / 49
Summary . . .
CLS as a language for modelling biological systems
. . . not only qualitative analysis . . .
. . . extensions . . .
• for modelling protein links
• for modelling spatial information
. . . encodings . . .
• to P-Systems
• to Maude
• to SMSR
. . . open problems . . .
• kinetic constants?
• complexity and scalability?
• biologist-oriented interface?
[1] [3] [2] [4]
43 / 49
References
[1] Roberto Barbuti, Giulio Caravagna, Andrea Maggiolo-Schettini, Paolo Milazzo, and Giovanni Pardini. The calculus of looping sequences. Formal Methods for Computational Systems Biology, SFM 2008, pages 387–423, 2008.
[2] Roberto Barbuti, Andrea Maggiolo-Schettini, Paolo Milazzo, and Angelo Troina. Bisimulations in calculi modeling membranes. Formal Aspects of Computing, 2008.
[3] Daniel T. Gillespie. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25):2340–2361, 1977.
[4] G. Scatena. Development of a stochastic simulator for biological systems based on Calculus of Looping Sequences. Master's thesis, Dipartimento di Informatica, Università di Pisa, 2007.
44 / 49
Thanks
45 / 49
APPENDIX
46 / 49
Details of Differential Semantics
47 / 49
Details about Gillespie’s Algorithm
48 / 49
Details of CLS Stochastic Semantics
. . . see the paper on SFM 2008 for details . . .