Research Collection
Doctoral Thesis
Formal reductions of stochastic rule-based models of biochemical systems
Author(s): Petrov, Tatjana
Publication Date: 2013
Permanent Link: https://doi.org/10.3929/ethz-a-010006341
Rights / License: In Copyright - Non-Commercial Use Permitted
This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.
ETH Library
Diss. ETH No. 21269
Formal reductions of stochastic
rule-based models of biochemical
systems
A dissertation submitted to
ETH ZURICH
for the degree of
Doctor of Sciences
presented by
Tatjana Petrov
M. Sc. Computer Science, University of Novi Sad
born 27.12.1983
citizen of Serbia
accepted on the recommendation of
Prof. Dr. Heinz W. Koppl, examiner
Prof. Dr. Thomas A. Henzinger, co-examiner
Dr. Jérôme Feret, co-examiner
2013
© Copyright by Tatjana Petrov, 2013.
All Rights Reserved.
Abstract
Understanding the principles behind a cell's functioning is one of the most fundamental
topics of science today. However, realistically explaining the variety and complexity
observed in biological systems results in highly complex models with a huge
number of possible state configurations. Reducing the complexity of these models,
while preserving a realistic model description, represents a major challenge.
Domain-specific formal languages have been proposed in order to facilitate
knowledge representation and to aid model analysis. One of them, the rule-based
language, enables molecular interactions to be specified compactly, by maintaining the
internal protein structure in the form of a site-graph, and by allowing interactions
to happen upon testing only patterns, i.e., local contexts of molecular species. The
executions of rule-based models are traces of a continuous-time Markov chain
(CTMC), defined according to the principles of chemical kinetics.
In this thesis, we study formal reductions of rule-based models. The idea of reduction
is that, if the rules are executed upon testing patterns (instead of full
molecular species), then the stochastic executions of the whole model can be described
in terms of a carefully chosen set of patterns, called fragments, that are
much fewer than the molecular species. Our method aligns with the principle of static
program analysis – the CTMC traces (the semantics) are considered only virtually,
while the actual operations are performed over the rule-set (the source code, that is,
a set of site-graph-rewrite rules). To this end, we study separately the mathematical
relations between rule-sets, and what these relations imply for their respective
CTMC's.
We provide a general model reduction procedure that is efficient – of complexity
linear in the description of the rule-set – and automatic – it applies to any well-defined
rule-based program. The formal relation between the respective CTMC's
is guaranteed within two frameworks. In the framework for exact reductions, the
set of fragments is enforced and the precise relation between the respective CTMC's is
guaranteed. In the framework for approximate reductions, the set of fragments can
vary, and, for a given time limit of a trace, the error in terms of the Kullback-Leibler
divergence between the trace distributions of the CTMC's is computed. Both frameworks
rely on a unifying mathematical theory of exact and approximate Markov chain
aggregation, which takes a major part of the thesis. The theory is instantiated
with three toy examples and two large-scale case studies.
Zusammenfassung
Das Verständnis über die Prinzipien und die Funktionsweise von Zellen stellt
eines der grundlegendsten Themen der gegenwärtigen Wissenschaft dar. Allerdings
führt die realistische Beschreibung der Vielfalt und der Komplexität solcher
biologischen Systeme zu hochkomplexen Modellen mit einer enormen Anzahl an
möglichen Zustandskonfigurationen. Die Reduktion der Modellkomplexität stellt,
unter Wahrung der realistischen Modellbeschreibung, eine grosse Herausforderung
dar.
Zur Erleichterung der Wissensrepräsentation und zur Unterstützung der Modellanalyse
wurden domänenspezifische, formale Sprachen vorgeschlagen. Eine davon,
die regelbasierte Sprache, ermöglicht eine kompakte Spezifikation der molekularen
Wechselwirkungen, durch die Wahrung der inneren Proteinstruktur in Form eines
site-graph, und indem Wechselwirkungen bereits nach dem Testen von Mustern,
lokalen Kontexten molekularer Spezies, erfolgen können. Die Ausführung eines
regelbasierten Modells liefert einen Pfad einer zeitstetigen Markov-Kette (Englisch:
continuous-time Markov chain, CTMC), welche durch die Prinzipien der chemischen
Kinetik definiert ist.
In dieser Arbeit untersuchen wir formale Reduktionen von regelbasierten Modellen.
Die Idee der Reduktion besteht darin, dass, wenn die Regeln auf Muster
angewandt werden (anstatt auf die gesamte molekulare Spezies), die stochastische
Ausführung des gesamten Modells durch eine sorgfältig gewählte Menge von
Mustern, sogenannte Fragmente, beschrieben werden kann, deren Anzahl viel kleiner
ist als die der molekularen Spezies. Unsere Methode richtet sich nach dem Prinzip
der statischen Programmanalyse – die CTMC-Spuren (Semantik) werden nur virtuell
berücksichtigt, während die eigentlichen Operationen über den Regelsatz
durchgeführt werden (Quellcode, als Menge der site-graph-rewrite-Regeln). Zu
diesem Zweck untersuchen wir separat die mathematischen Beziehungen zwischen
den Regelwerken und deren Bedeutung für die jeweiligen CTMC's.
Wir schlagen ein allgemeines Verfahren zur Modellreduktion vor, welches linear mit
der Grösse des Regelsatzes skaliert und des Weiteren auf beliebige regelbasierte
Programme anwendbar ist. Die formale Beziehung zwischen den jeweiligen CTMC's
ist innerhalb zweier Szenarien gewährleistet. Im Kontext exakter Reduktionen ist die
Menge der Fragmente eindeutig und die genaue Beziehung zwischen den jeweiligen
CTMC's gewährleistet. Im Kontext genäherter Reduktionen kann die Menge der
Fragmente variieren, und für ein gegebenes Zeitlimit eines Pfades wird der Fehler
anhand der Kullback-Leibler-Divergenz zwischen den Verteilungen berechnet. Beide
Ansätze basieren auf einer vereinheitlichten mathematischen Theorie der
exakten und approximierten Markov-Ketten-Aggregation, die einen wesentlichen
Teil der Arbeit darstellt. Die Theorie wird mit drei einfachen Beispielen und zwei
umfangreichen Fallstudien instanziiert.
Résumé
Comprendre les principes sous-jacents au fonctionnement de la cellule est un
des sujets fondamentaux en sciences aujourd'hui. Cependant, expliquer de façon
réaliste la variété et la complexité observées dans les systèmes biologiques conduit
à des modèles très complexes contenant un grand nombre de configurations possibles.
Réduire la complexité de ces modèles, tout en préservant une description de
modèle fidèle à la réalité, représente un défi majeur.
Des langages formels dédiés ont été proposés afin de représenter les connaissances
ainsi que pour faciliter l'analyse de modèle. L'un d'entre eux, le langage à base
de règles, permet de définir de manière concise les interactions moléculaires, en
décrivant la structure interne de la protéine sous la forme d'un graphe et en
permettant aux interactions de se produire après avoir testé uniquement des motifs,
contextes locaux d'espèces moléculaires. Les exécutions des modèles à base de
règles sont les traces d'une chaîne de Markov à temps continu (CMTC), définie
selon les principes de la cinétique chimique.
Dans cette thèse, nous étudions les réductions formelles des modèles à base de
règles. L'idée de la réduction est que si les règles sont exécutées après avoir
évalué les motifs (plutôt que les espèces moléculaires complètes), alors l'exécution
stochastique du modèle complet peut être décrite par un ensemble de motifs
bien choisis, appelés fragments, qui sont bien moins nombreux que les espèces
moléculaires. Notre méthode correspond au principe d'analyse statique de programmes
– les traces de la CMTC (sémantique) sont considérées uniquement de
manière virtuelle, tandis que les opérations sont effectuées sur l'ensemble des règles,
c'est-à-dire un ensemble de règles de réécriture. À cette fin, nous étudions
séparément les relations mathématiques entre les ensembles de règles et ce que ces
relations impliquent sur leurs CMTC respectives.
Nous proposons une procédure générale de réduction de modèle, efficace – de
complexité linéaire en la taille de l'ensemble de règles – et automatique – applicable à
tout programme à base de règles bien défini. La relation formelle entre les CMTC
respectives est garantie au sein de deux cadres. Dans le cadre des réductions exactes,
l'ensemble des fragments est imposé et la relation précise entre les CMTC
respectives est garantie. Dans le cadre des réductions approchées, l'ensemble des
fragments peut varier, et, pour un temps limite de trace donné, l'erreur en termes
de divergence de Kullback-Leibler pour les distributions de trace des CMTC
est calculée. Ces deux cadres reposent sur une théorie mathématique unifiant
agrégation de chaînes de Markov exacte et approchée, qui constitue une grande
partie de la thèse. La théorie est appliquée à trois exemples jouets et à deux
études de cas à grande échelle.
Sommario
Comprendere i meccanismi alla base del funzionamento delle cellule è uno degli
obiettivi principali della scienza moderna. Tuttavia, spiegare in maniera realistica
la varietà e la complessità osservate nei sistemi biologici richiede l'utilizzo di
modelli molto complessi con un numero elevatissimo di possibili configurazioni di
stato. Ridurre la complessità di tali modelli, mantenendo al tempo stesso una
descrizione realistica dei sistemi in esame, è quindi un obiettivo molto importante.
Allo scopo di facilitare l'analisi e la rappresentazione del contenuto informativo
dei sistemi, negli ultimi anni sono stati proposti diversi linguaggi formali dominio-specifici.
Uno di questi, il rule-based language, permette di rappresentare in modo
compatto le interazioni tra molecole, mantenendo la struttura proteica interna in
forma di grafo dei siti, e verificando la possibilità di accadimento di determinate
interazioni tramite il riconoscimento di particolari pattern, contesti locali di specie
molecolari. Le esecuzioni dei modelli rule-based sono realizzazioni di catene
di Markov a tempo continuo (CTMC), definite secondo i principi della cinetica
chimica.
L'obiettivo di questa tesi è la riduzione formale dei modelli rule-based. L'idea
alla base della riduzione è la seguente: se le regole sono eseguite verificando
dei pattern (invece delle intere specie molecolari), allora l'esecuzione stocastica
dell'intero modello può essere descritta in termini di un appropriato insieme di
pattern, chiamati frammenti, il cui numero è di molto inferiore a quello delle specie
molecolari. Il metodo utilizzato in questa tesi si allinea con i principi dell'analisi
statica di programmi – i percorsi della CTMC (semantica) sono considerati solo
virtualmente, mentre le operazioni reali vengono eseguite attraverso il set di regole
(il codice sorgente, che è l'insieme delle regole di riscrittura di grafi dei siti). A
tale scopo ci occupiamo di studiare separatamente le relazioni matematiche tra i
diversi insiemi di regole e cosa queste relazioni implichino per le rispettive catene
di Markov a tempo continuo.
In questo lavoro, presentiamo una procedura generale di riduzione di modello che è
efficiente – di complessità lineare nella descrizione del set di regole – ed automatica –
si applica a qualunque rule-based program che sia ben definito. La relazione formale
tra le rispettive CTMC è garantita per mezzo di due framework. Nel framework
per la riduzione esatta, l'insieme di frammenti è imposto a priori e la precisa
relazione che intercorre tra le diverse CTMC è garantita. Per quanto riguarda
invece il framework per la riduzione approssimata, l'insieme di frammenti può
variare, e, dato un tempo limite di una realizzazione, è possibile calcolare l'errore in
termini di divergenza di Kullback-Leibler rispetto alla distribuzione delle realizzazioni
della CTMC. Entrambi i framework si basano su una teoria matematica della
aggregazione esatta ed approssimata delle catene di Markov, la cui presentazione
è parte fondamentale del presente lavoro. La teoria qui presentata è esemplificata
attraverso l'uso di tre esempi numerici e di due casi di studio su larga scala.
Acknowledgements
First of all, I would like to thank Professor Heinz Koppl, who invited me
to work on this PhD project and who introduced me to rule-based modeling of
biochemical systems. With his knowledge and professionalism, he has been a great
advisor, giving me the freedom to pursue my ideas and guiding me wisely, always
towards better and finer results. His steady vision and belief in this project largely
influenced the thesis to its final form. I have learnt a lot from him, and from the
unique blend of collaborators he brought into this project. Thank you very much
for your ongoing support, as well as your encouragement to this day.
It was an incredible honor for me to start the PhD under the supervision of Pro-
fessor Tom Henzinger. I highly appreciate his guidance, starting at a time when
I was taking his course on model checking at EPFL and finally as my thesis co-
advisor and co-examiner. Thank you very much for your consistent interest in my
work, for your encouragement and support, and for welcoming me at IST on many
occasions. I look forward to continuing our collaboration in the future.
I consider myself very lucky to have worked with Jérôme Feret, the inventor of
‘fragments’, who has directly or indirectly inspired many of the results presented in
this thesis, and who has largely shaped my thinking about abstract interpretation
and model reduction. Thanks a lot for being generous with your time for discus-
sions, for always being supportive, for hosting me at ENS, and for co-examining
the final exam.
Much of the enthusiasm and inspiration embedded into this project were created
from discussions with the ‘Kappa people’, Vincent Danos, Walter Fontana, Russ
Harmer and Jean Krivine, who always generously shared their time and thoughts
about rule-based modeling with me. I am especially grateful to Vincent, for hosting
me in 2009 in his lab in Edinburgh at the beginning of my work on this project;
those were two intense weeks that gave an important impulse to this project. I
was fortunate to pursue a one-semester long research stay in 2010, in the lab of
Professor Walter Fontana. Thank you very much for hosting me, and for the
memorable conversations on information propagation in molecular signaling. The
stay in Boston was an incredibly stimulating experience that largely influenced
the later stages of my PhD work.
I would like to thank the postdocs Arnab Ganguly, who taught me Kurtz's
theorem and the asymptotic behavior of Markov chains, and Loïc Paulevé and
Michael Klann, for valuable discussions on spatial rule-based modeling in the last
year of my PhD. I am very grateful to Loïc Paulevé for proof-reading the final
version of this thesis. My gratitude further goes to Professor Prakash Panangaden,
for his support and for the discussions at workshops in Bertinoro and Barbados.
It was great to have conference companions and colleagues Sasa Misailovic, Aurelian
Rizk, Nicholas Stroustrup, Daniel Schultz, Cheng-Zhong Zhang, Ferdinanda
Camporesi and Norman Ferns, with whom to lead inspiring discussions on model
reduction and beyond. Finally, I am thankful to my students Marica Stojanov, Eirini Arvaniti
and Zahra Karimadini, for choosing to work with me at ETH on topics related to
this thesis.
As I first started this PhD journey at EPFL, I would like to thank the former
MTC group at EPFL, and in particular Barbara Jobstmann, Laurent Doyen,
Viktor Kuncak, and Ruzica Piskac, for introducing me to different challenges of
formal verification. I thank Daniel Kroening, for hosting me in his lab in Oxford,
and Georg Weissenbacher, for introducing me to counterexample-guided
abstraction, and for showing me the Oxford colleges in their full charm. I am
particularly indebted to Verena Wolf, for turning my attention to the Markov chain
reduction problem, and to Maria Mateescu, a PhD companion, for her friendship
and advice. My deepest thanks to Professor Martin Hasler and Professor Sebastian
Maerkl, for providing office space and for their hospitality during the transition
from EPFL to ETHZ.
I would not have ended up at EPFL (and later at ETHZ) without my thesis
advisors at the University of Novi Sad, Professor Dragan Masulovic and Professor
Igor Dolinka, who were unconditionally supportive when I was applying – thank you
very much. I would like to thank Professor Viktor Kuncak, for welcoming me
in Lausanne, and for his support during the early days at EPFL. It is possible
that I would not have even thought of studying abroad had I not attended the
seminars at the Petnica Science Center from my early high-school years – my sincere
gratitude goes to Petnica overall, as well as to my very first co-authors and very
special friends, Zeljka Dobricic and Lazar Krstic.
The abstract of this thesis is written in English, and was translated into the three
official languages of Switzerland thanks to Zoran Vidakovic and Christoph Zechner
(German), Raphael Barazzuti and Aurelian Rizk (French), and Davide Martino
Raimondo (Italian). I am very grateful to all of them, for helping with
the translations on one night's notice.
The success of this project owes much to all the lab mates from the BISON group
at ETHZ, and to the whole Automatic Control Lab, whose friendly atmosphere
made my transition from Lausanne to Zurich smooth and pleasant. Christoph,
Michael, Sunil, Preetam, thanks for being the best office mates. Kolega Gabriele,
Riki, Mike, Khoa, Marianne, Stefan, Claudia, Andreas, Costas, Maria, Martin,
Vedrana, Christian, thanks for the friendship and joyful lunch breaks. Zurich is a
wonderful city, but it was even nicer with Davide, Miruska, Mica, Jelena, Marko,
Rafal, Nawal, Afonso, Riki, Stefano, Stephan and Manuela, Sofia and Mathias,
Simonetta and Alberto. Thanks for all the great times, there is more to come.
I once read that moving to a foreign country puts you in the psychological
state of an infant. Marija, Tamara, Zorana, Miruska, rue Marterey brings back
lovely memories and feels like growing up among sisters. Marija, thank you for
demonstrating the power of friendship. Nedeljko, Andrijana, Raphael, Alex, Ana,
Roberto, Nevena, Dejan, Nikodin, Marko, Nikola, Marica, Milos, Ivana, Bojana,
Mihailo, Mirabela, Mahdi, Manos, Yannis, I am proud of all of you, thank you for
making Lausanne such a happy and dynamic place.
I would like to thank my friends at home, for all the Skype conversations, the good
times when back in Serbia, and for visiting me in Switzerland.
Some things hardly obey scientific laws. I am very grateful to Destiny, for introduc-
ing me to my boyfriend Davide, whose love, support and constant encouragement,
especially during the final phases of writing this thesis, were priceless.
Finally, my gratitude and love go to my family: my Mum and my Dad, my Sister,
for their love and support; to ’zet’ Mihajlo, to my grandparents, to uncles, aunts
and dear cousins, and especially to the new generation, born while this thesis was
being created – to Mateja, Aleksandar, Vasilije, Tara and Natalija.
I devote this thesis to my parents Ranko and Mira and to my sister Goca.
This thesis work was financed by SystemsX, the Swiss Initiative for Systems Bi-
ology.
Contents
Acknowledgements ix
1 Preliminaries 9
1.1 Probability spaces and random variables . . . . . . . . . . . . . . . . . 12
1.2 Distance between probability measures . . . . . . . . . . . . . . . . . . 14
1.2.1 Entropy and mutual information . . . . . . . . . . . . . . . . . 16
1.2.2 Relative entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.1 Markov chains and Markov graphs . . . . . . . . . . . . . . . . 20
1.4 Discrete-time Markov chains . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4.1 Transient distributions . . . . . . . . . . . . . . . . . . . . . . . 22
1.4.2 Stationary behavior . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5 Continuous-time Markov chains . . . . . . . . . . . . . . . . . . . . . . 23
1.5.1 A discussion on constructing the CTMC . . . . . . . . . . . . 24
1.5.2 Transient distribution . . . . . . . . . . . . . . . . . . . . . . . . 26
1.5.3 Uniformization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5.4 Stationary behavior . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5.5 Finite-dimensional marginal probabilities . . . . . . . . . . . . 28
2 Rule-based modeling of biochemical networks 29
2.1 Chemical kinetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.1.1 Stochastic chemical kinetics . . . . . . . . . . . . . . . . . . . . 31
2.1.2 Classical chemical kinetics . . . . . . . . . . . . . . . . . . . . . 33
2.1.3 Deterministic and stochastic rate constants . . . . . . . . . 34
2.1.4 Random time change model and the thermodynamical limit 35
2.2 Site-graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3 Rule-based models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4 Site-graph rigidity and counting automorphisms . . . . . . . . . . . . 41
2.5 Individual-based and species-based semantics of rule-based programs 43
2.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3 Automated reductions of rule-based models 51
3.1 Stochastic fragments: Motivating example . . . . . . . . . . . . . . . . 53
3.2 Fragments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3 Fragment-based semantics . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4 Reduction with fragments . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5 Computing fragment-based semantics . . . . . . . . . . . . . . . . . . 61
3.5.1 Translating the contact map . . . . . . . . . . . . . . . . . . . . 62
3.5.2 Translating the rule-based program . . . . . . . . . . . . . . . 63
4 Exact aggregation of Markov chains 65
4.1 Lumpability and invertibility . . . . . . . . . . . . . . . . . . . . . 66
4.2 Discrete-time case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2.1 Forward criterion . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2.2 Backward criterion . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2.3 Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2.4 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.3 Continuous-time case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.4 Trace semantics of stochastic processes . . . . . . . . . . . . . . . . . . 82
4.4.1 Trace semantics: discrete-time . . . . . . . . . . . . . . . . . . 82
4.4.2 Trace semantics: continuous-time . . . . . . . . . . . . . . . . . 83
4.5 Trace semantics interpretation of exact aggregations . . . . . . . . . . 83
4.5.1 Discrete-time case . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.5.2 Continuous-time case . . . . . . . . . . . . . . . . . . . . . . . . 84
4.6 Matrix representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5 Exact automatic reductions of stochastic rule-based models 88
5.1 Exact fragment-based reduction . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Computing the fragment-based semantics . . . . . . . . . . . . . . . . 95
5.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6 Approximate aggregation of Markov chains 99
6.1 KL divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.2 Error measure: Discrete time . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.1 Lifting: Discrete case . . . . . . . . . . . . . . . . . . . . . . . . 104
6.3 Error measure: Continuous time . . . . . . . . . . . . . . . . . . . . . . 106
6.3.1 Lifting: Continuous case . . . . . . . . . . . . . . . . . . . . . . 109
6.4 Trace semantics interpretation of approximate aggregations . . . . . 111
6.5 Matrix representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7 Approximate automatic reductions of stochastic rule-based models 113
7.1 Approximate reductions and error bound . . . . . . . . . . . . . . . . 115
7.2 Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8 Case studies 118
8.1 EGF/insulin receptor pathway . . . . . . . . . . . . . . . . . . . . . . . 118
8.1.1 Model description . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.1.2 Exact fragment-based reduction . . . . . . . . . . . . . . . . . 121
8.2 HOG pathway in yeast . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.2.1 Model description . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.2.2 Reachable species . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.2.3 Exact fragment-based reduction and model decomposition . 124
9 Conclusions and Discussion 126
mami, tati i Goci
Introduction
Recent advances in high-resolution imaging, microfluidic technology and fluorescent
biomarkers for proteins have made it possible to obtain measurements at the level
of single cells and even single proteins, for hundreds of cells at a time [25, 38].
However, measurements alone do not explain the underlying mechanisms, and
appropriate mechanistic theories are sought.
Systems biology research focuses on mechanistic, quantitative models, which aim
to explain the function of the subject under study – molecule, cell, organism or
entire species [57]. Following the laws of chemical kinetics, under mild simplifying
assumptions, molecular dynamics is appropriately modeled by a continuous-time
Markov chain (CTMC), in which one state corresponds to one reaction mixture,
encoded as a multi-set of chemical species. For example, a state can be
x = {2 S1, 3 S2, 5 S3}, where S1, S2, S3 are chemical species. Then, upon a reaction,
for example S1 + S2 → S3 with stochastic rate constant c, the system can move from
the state x to the state x′ = {S1, 2 S2, 6 S3}, at a stochastic rate whose existence is
justified by the laws of physical chemistry. The number of states of this CTMC
grows exponentially with the species' abundances.
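Such a CTMC can be sampled directly with Gillespie's stochastic simulation algorithm. The following is a minimal, illustrative sketch (not taken from the thesis) for the single mass-action reaction S1 + S2 → S3 above; the rate constant and time horizon are placeholder values:

```python
import random

# Minimal Gillespie direct-method sketch for the single mass-action
# reaction S1 + S2 -> S3, started in x = {2 S1, 3 S2, 5 S3}.
# The rate constant c and the horizon t_end are illustrative.
def ssa(x0, c, t_end, seed=0):
    rng = random.Random(seed)
    x = dict(x0)
    t, trace = 0.0, [(0.0, dict(x))]
    while True:
        a = c * x["S1"] * x["S2"]      # mass-action propensity
        if a == 0.0:
            break                      # no reaction can fire any more
        t += rng.expovariate(a)        # exponentially distributed waiting time
        if t >= t_end:
            break
        x["S1"] -= 1                   # one firing of S1 + S2 -> S3
        x["S2"] -= 1
        x["S3"] += 1
        trace.append((t, dict(x)))
    return trace

trace = ssa({"S1": 2, "S2": 3, "S3": 5}, c=1.0, t_end=10.0)
```

Each run is one sample trace of the CTMC; here only three states are reachable, but in general the state space grows combinatorially, which is what motivates the reductions studied in this thesis.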
The above-mentioned reason motivates omitting details in the model specification
and adding assumptions to the model. A popular approach is to use the deterministic
limit of the CTMC model, where the abundance of every species is scaled to infinity,
while a concentration (multiplicity per unit volume) of constant order is maintained [60].
Then, a set of coupled ordinary differential equations describes the deterministic
evolution of the continuous species' concentrations. The number of equations is equal
to the number of species. However, in many applications in cellular biology, a
to the number of species. However, in many applications in cellular biology, a
deterministic model is unsatisfactory due to the low multiplicities of some molec-
ular species [47, 63, 64, 68]. Then, a stochastic description of chemical reactions
is mandatory to analyze the behavior of the system. To this end, two major
approaches are used to analyze the CTMC. The first approach is statistical
estimation of the trace distribution and of event probabilities of the CTMC, by generating
many sample traces [42]. The second approach comprises the efforts to understand
the transient evolution of the probability of each state of the CTMC, referred
to as the transient distribution. The transient distribution evolves according
to the Kolmogorov forward equation (the chemical master equation in the chemistry
literature), and, as the forward equations are typically very difficult to solve (except
for the simplest systems), sophisticated numerical algorithms have been designed to
solve the forward equation numerically for larger systems [52, 66].
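For a chain as small as the S1 + S2 → S3 example, the forward equation can be solved directly. As an illustration only (not an algorithm from the thesis), the sketch below computes the transient distribution by uniformization, i.e. the truncated series p(t) = Σ_k e^(−λt)(λt)^k/k! · p(0)P^k with P = I + Q/λ; the generator encodes the three reachable states, ordered by number of firings, with c = 1:

```python
import math

# Generator of the 3-state CTMC for S1 + S2 -> S3 from x = {2S1, 3S2, 5S3}:
# state 0 fires at rate 2*3 = 6, state 1 at rate 1*2 = 2, state 2 is absorbing.
Q = [[-6.0, 6.0, 0.0],
     [0.0, -2.0, 2.0],
     [0.0, 0.0, 0.0]]

def transient(Q, p0, t, n_terms=200):
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n)) or 1.0
    # uniformized DTMC: P = I + Q / lam
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pk = list(p0)                  # current term p0 * P^k
    pt = [0.0] * n
    w = math.exp(-lam * t)         # Poisson weight e^{-lam t} (lam t)^k / k!
    for k in range(n_terms):
        for i in range(n):
            pt[i] += w * pk[i]
        pk = [sum(pk[i] * P[i][j] for i in range(n)) for j in range(n)]
        w *= lam * t / (k + 1)
    return pt

pt = transient(Q, [1.0, 0.0, 0.0], t=1.0)
```

For such a linear chain the result can be checked against the closed-form solution of the forward equation; for realistic state-space sizes this direct approach is exactly what becomes infeasible.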
Orthogonal to solving the mathematical equations that describe the temporal
evolution of the modeled state, domain-specific formal languages have been proposed
in order to facilitate knowledge representation and to aid model analysis.
Models written in those languages can be executed by a prescribed operational
semantics, regardless of the size and complexity of the system [35]. An early
formalism designed for computations by multi-set rewriting was named Gamma
[4, 5]. Today, many such modeling frameworks are used for specifying CTMC’s
of biochemical reaction networks: rule-based models, stochastic Petri nets [45],
stochastic process algebras [12, 74], probabilistic Boolean networks [81], to name
a few.
Yet another source of complexity characterizes protein interactions. Each species
can be, for instance, a protein, its phosphorylated form, or a protein complex
that consists of several proteins bound to each other. Then, especially in cellular
signal transduction, the number of such distinct species can be combinatorially
large [53, 84]. To exemplify, one model of the early signaling events in the epidermal
growth factor receptor (EGFR) network, with only 8 different proteins, gives rise
to 2748 different molecular species [7], while the full model of the same network
has ≈ 10^20 different molecular species [20].
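The combinatorial blow-up itself is elementary arithmetic: independently modifiable sites multiply. A one-line illustration (the site counts below are hypothetical, not taken from the cited EGFR models):

```python
# Number of internal states of a single protein whose n sites can each
# independently take a fixed number of states (e.g. unphosphorylated vs.
# phosphorylated). The site counts used here are hypothetical.
def n_forms(n_sites, states_per_site=2):
    return states_per_site ** n_sites

print(n_forms(9))                 # a protein with 9 binary sites: 2**9 = 512 forms
print(n_forms(9) * n_forms(9))    # a dimer of two such proteins: 262144 combinations
```

Complex formation multiplies these counts further, which is how a handful of proteins reaches the species numbers quoted above.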
Motivation
The rule-based language is designed to naturally capture the protein-centric and
concurrent nature of biochemical signaling. The idea of a rule-based formalism was
discussed in [76], [36], before it was formally introduced in 2003 [24]. Kappa [34]
and BioNetGen [6] are two examples of rule-based modeling platforms. In a rule-based
model, the internal protein structure is maintained in the form of a site-graph,
and interactions can happen upon testing only patterns, local contexts of molecular
species. A site-graph is a graph in which each node contains different types
of sites, and edges can emerge from these sites. Nodes typically encode proteins
and their sites are the protein binding-domains or modifiable residues; the edges
indicate bonds between proteins. Then, every species is a connected site-graph,
and a reaction mixture is a multi-set of connected site-graphs. The executions of
rule-based models are traces of a continuous-time Markov chain (CTMC), defined
according to the principles of chemical kinetics. Rule-based models testify that the
success and efficiency of model analysis largely depend on the choice of syntax
[1, 51]. First, the explicit graphical representation of molecular complexes makes
models easy to read, write or edit. Moreover, the description of interactions is
compact and models can trivially be composed, by simply merging two collections
of rules. Finally, a rule set can be executed, or subjected to formal static analy-
sis: for example, it provides efficient simulations [21], automated answers about
the reachability of a particular molecular complex [23] or about causal relations
between rule executions [19].
If a rule-set is expanded to its equivalent species-level description, its quantitative
analysis remains prohibitive. But, if the rules are executed upon testing patterns
(instead of full molecular species), then the executions of the whole model can
be described in terms of a carefully chosen set of patterns, which are much fewer
than the molecular species. More specifically, one region of a molecular species being
in a particular state may or may not influence the state of another region of
that molecular species. Such a notion of influence can be formalized by a binary
relation among the sites of molecular species. As the mentioned correlation can
be detected by looking only at the contexts of rules, one can efficiently, by a single
processing of the rule-set, obtain the set of coarse-grained species, called fragments.
The described method aligns with the principle of static program analysis [16] –
the model executions (the semantics) are considered only virtually, while the actual
operations are performed over the rule-set (the source code).
The idea of fragment-based reduction was first exploited in [31], where the au-
thors propose how to obtain a set of fragments which self-consistently describe
the dynamics of the model in its deterministic limit. The method was applied to a model of the interplay between epidermal growth-factor receptor (EGFR) and insulin crosstalk, and a reduction from a set of 2899 ODEs to a set of 208 ODEs was demonstrated [20]. Furthermore, the full EGFR model was reduced to only 2 ⋅ 10^5 equations, instead of 2 ⋅ 10^19. Yet, as the deterministic limit is a particular limiting behavior of the ground stochastic model, the obtained ‘differential’ fragments do not always correctly describe stochastic kinetics [30], nor do they capture the inherently stochastic dynamics of chemical reactions.
Problem and Contribution
In this thesis, we study the fragment-based reductions of rule-based models. We set up the following framework (Figure 1). The original model R is assigned a continuous-time Markov chain (CTMC) Xt over the state space X of reachable multi-sets of species. A class of fragment sets to be considered is formally defined as emerging from a set of particular equivalence relations defined among the domains (sites) of each protein. For each particular set of fragments, we propose a new rule set R′, which is referred to as the reduced model. The reduced model is such that the assigned CTMC Y′t operates over the state space of reachable multi-sets of fragments, denoted by Y. As several species may conform to the description of the same fragment, the species-based state space projects onto the fragment-based state space by a partition function ϕ ∶ X → Y. Let Yt be the process obtained by projecting the samples of Xt by the function ϕ:

Yt = y iff Xt ∈ {x ∈ X ∣ ϕ(x) = y}, for all t ≥ 0.

Then, if the CTMC Y′t is equivalent to the projection Yt, the reduction is said to be exact. Otherwise, the reduction is approximate.
The CTMC traces are considered only virtually, while the actual operations are performed over the rule sets. To this end, we study separately the mathematical relations between R and R′, and what these relations imply for their respective CTMCs, Xt and Y′t. Two scenarios are investigated, depending on the type of formal relation to be proven between the respective CTMCs: (1) exact reductions, where the set of fragments is automatically derived subject to the guarantee that the reduction is exact; (2) approximate reductions, where the set of fragments is given by the user, and, for a given time limit of a trace, the goal is to quantify the error between the trace distributions of the processes Yt and Y′t.
Every set of fragments defines a partition ϕ over the state space of Xt. Aiming at a procedure for correlating sites of a rule-based model, while guaranteeing an exact reduction, we identified three general situations of the process Xt with respect to the partition ϕ. In Figure 2, we illustrate the three situations on a simple discrete-time Markov chain. The resulting procedure (Algorithm 2) correlates any two sites which are related directly or indirectly within a left-hand side or a right-hand side of a rule, and it hence enforces a ‘strong’ independence notion between the uncorrelated sites, analogous to the one in Figure 2c. In turn, precisely such strong independence makes it possible to effectively reconstruct the transient semantics of the original system. Motivated by the dependency on the initial condition, we also investigated the asymptotic distribution in the situation when the initial distribution is not in accordance with the invariant distribution among the lumped states.
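The situation in Figure 2b, where the lumped process remains Markov for every initial distribution, corresponds to the standard strong-lumpability condition: for every block of the partition, the total probability of jumping from a state into that block must depend only on the block containing the state. A minimal sketch of this check in Python; the 4-state chain and the partition below are illustrative examples, not the matrices of Figure 2:

```python
# Strong-lumpability check for a DTMC: a partition is strongly lumpable
# when, for every block B', the total probability of jumping from a state
# s into B' depends only on the block containing s.

def is_strongly_lumpable(P, partition, tol=1e-12):
    """P: dict-of-dicts of transition probabilities; partition: list of state sets."""
    for target in partition:
        for block in partition:
            # probability of jumping into `target` must be constant over `block`
            sums = [sum(P[s].get(t, 0.0) for t in target) for s in block]
            if max(sums) - min(sums) > tol:
                return False
    return True

# States 'b' and 'c' behave identically towards every block, so lumping
# {'b', 'c'} preserves the Markov property for all initial distributions:
P = {'a': {'b': 0.3, 'c': 0.7},
     'b': {'a': 0.5, 'd': 0.5},
     'c': {'a': 0.5, 'd': 0.5},
     'd': {'b': 0.2, 'c': 0.8}}
partition = [{'a'}, {'b', 'c'}, {'d'}]
print(is_strongly_lumpable(P, partition))    # True

# Breaking the symmetry between 'b' and 'c' destroys strong lumpability:
Q = dict(P, c={'a': 0.9, 'd': 0.1})
print(is_strongly_lumpable(Q, partition))    # False
```

When the condition fails, the lumped process may still be Markov for particular initial distributions, which is the weakly lumpable situation of Figure 2c.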
Despite the strong correlation notion, examples and case studies confirmed that the reduction can be significant, even exponential. However, if the reduced system remains of prohibitive size, approximate reduction becomes necessary. In the framework for approximate reduction, the set of all possible fragment sets is arranged into a partially ordered set. Each fragment set is positioned according to its potential of
expressing quantities which influence self-consistency.

[Figure 1 diagram: the rule set R (species) is assigned the CTMC Xt; the reduced rule set R′ (fragments) is assigned the CTMC Y′t; projecting Xt yields Yt, and the reduction error is measured between Yt and Y′t.]

Figure 1: Problem setup. The presented arrows serve for illustration purposes: the double arrow denotes the assignment of a CTMC to a rule-based model; the dotted arrows illustrate operations which are never performed.

[Figure 2 diagrams: three four-state discrete-time chains over states a, b, c, d, with states b and c lumped together.]

Figure 2: Three general situations of the process Xt with respect to the partition ϕ. a) The lumped process Yn is not Markov, time-homogeneous. b) The lumped process Yn is Markov, time-homogeneous for all initial distributions. c) The lumped process Yn is Markov, time-homogeneous whenever the ratio of probabilities between states c and b at the initial state is equal to 0.2/0.8 = 0.25.

Among the various metrics
for stochastic processes (see [39] for an overview), we decided on the Kullback-Leibler (KL) divergence [27]. The main reason for employing the KL divergence was that it has particularly suitable properties when applied to the probability space of traces generated by Markov sources. More concretely, it can be computed efficiently, as a function of only the corresponding matrix description and the transient distribution of the original process [3, 13]. As the measure can be obtained only between two CTMCs defined on the same state space, an upper bound on the error is proposed instead, with a technique inspired by the work in [27]. The inequality is guaranteed by a standard result of information theory.
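The basic properties that make the KL divergence usable here, non-negativity, asymmetry, and the need for absolute continuity between the compared measures, can be illustrated on plain finite distributions; the computation over trace distributions of CTMCs is developed in Chapter 6. A minimal sketch:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i ln(p_i/q_i), in nats.
    Requires p << q: any outcome with p_i > 0 must have q_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return float('inf')   # absolute continuity violated
            total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))                           # ≈ 0.0253 (> 0)
print(kl_divergence(p, p))                           # 0.0
print(kl_divergence(p, q) == kl_divergence(q, p))    # False: not symmetric
```

By the Gibbs inequality, the divergence is zero exactly when the two distributions coincide, which is what makes it usable as a (non-metric) notion of reduction error.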
Outline
Chapter 1 reviews the basic concepts of Markov chain theory.
Chapter 2 introduces the stochastic chemical kinetics and rule-based models.
The main purpose of Chapter 3 is to formalize fragment-based reductions as a notion independent of the semantics under study. In particular, the dimension and expressiveness of a fragment set are defined as a means to compare two different fragment sets, and it is demonstrated on a toy example that fragment-based reductions can reduce the number of states of a CTMC exponentially. Moreover, a general algorithm for reducing a rule-based program with respect to any given fragment set is presented.
Chapter 4 and Chapter 5 contain the results on the exact reduction problem. Chapter 4 outlines the results related to exact Markov chain aggregation, independently of the application to rule-based models. Discrete-time and continuous-time Markov chains are studied separately. All properties are summarized in Theorem 4.28, where the relation between the trace semantics and the transient semantics of the original and the aggregated process is comprehensively presented.
In Chapter 5, the focus is on the practical implications of the theory presented
in Chapter 4, in the context of fragment-based reductions of rule-based models.
We propose an algorithm for obtaining the set of fragments for a given rule-based
program (Algorithm 2), and in Theorem 5.10, we prove that the suggested set
of fragments guarantees an exact reduction. Moreover, we show how to compute the probability of a species-based state, P(Xt = x), given the probability of the fragment-based state, P(Yt = ϕ(x)).
Chapter 6 and Chapter 7 present the results on the approximate reduction problem. Chapter 6 defines the reduction error as the Kullback-Leibler distance between the trace distributions of the projected process Yt and the reduced chain Y′t, and shows how to compute the error as a function of the respective generator matrices and transient distributions (Theorem 6.12). Moreover, it is shown that an upper bound on the error can be evaluated when only the generator matrix and the transient distribution of the original model are known (Theorem 6.19). In Chapter 7, the framework is instantiated on three examples. Simulation results indicate how the error can be used to discriminate between fragment sets of equal dimension.
In Chapter 8, the framework of exact reductions is discussed over two large-scale
case studies.
Parts of the ideas in this thesis are reflected in the Kappa modeling environment [34], in the ‘complx’ toolbox. The implementation of the approximate reductions framework within the Kappa modeling environment is work in progress; for that reason, we leave the analysis of the approximate framework on large-scale case studies to future work.
Related work
The principle of drawing conclusions about a system’s dynamics by analyzing its model description originates from, and is exhaustively studied in, the field of formal program verification and model checking [11, 16], and it has recently been gaining recognition in the context of programs used for modeling biochemical networks. Examples include the aforementioned related work on detecting fragments for reducing deterministic rule-based models [31], detecting the information flow in ODE models of biochemical signaling [8, 50], and the reaction network theory [18].
To the best of our knowledge, the presented method is the only static analysis technique for reducing stochastic models of protein interactions. To this end, we distinguish the fragment-based approach from model reduction techniques based on, for example, separating time-scales [44, 55, 75], or numerical algorithms that focus on efficiently solving the chemical master equation [52, 66]. Still, once a fragment-based rule set is obtained, it is amenable to any further analysis.
In contrast, the Markov chain aggregation problem has been extensively studied in theory and applications. Put in the context of our problem formulation, strong lumpability refers to the property of Xt that there exists an exact aggregation by the partition ϕ for any initial distribution. Tian and Kannan [82] extended the notion of strong lumpability to continuous-time Markov chains. The more general situation of weak lumpability refers to the property of Xt that there exists an exact aggregation by the partition ϕ for a subset of initial distributions. The notion first appeared in [56], and subsequent papers [62, 77, 78] focused on developing an algorithm for characterizing the desired set of initial distributions. This reconstruction property, demonstrated to be efficiently realizable in our framework, is not addressed explicitly for weakly lumpable chains in the previous literature. A variant of our condition can be found in [10], where the author considered backward bisimulation over a class of weighted automata (finite automata where weights and labels are assigned to transitions). Moreover, we proved that even if the initial distribution is not in accordance with the invariant distribution among the lumped states, it will be so asymptotically. These convergence results have, to the best of our knowledge, not been discussed before.
Parts of this thesis were built on results that were previously published in collab-
oration with colleagues: [32, 33, 37, 72]. Published work related to the topic of
this thesis but not discussed or only cited is [70, 71].
Chapter 1
Preliminaries
In this chapter, we review the basic concepts of probability theory which are needed for developing the later analysis of Markov processes. More elaborate discussions and proofs of the statements can be found in standard measure theory and probability theory textbooks (for example, [29]).
We start with the general concepts of measure theory.
Let E be a set. We denote by P(E) the set of all subsets of E (the power set), by Ec the complement of E, and by ∣E∣ the number of elements in E (it is not defined when E is not countable, and may be infinite when E is countable). The range of values taken by a function f ∶ E → E′ is denoted by R(f). A partition of a set E is given by an equivalence relation on E; we refer to the corresponding equivalence classes as partition classes. We use N to denote the set of natural numbers including zero.
Definition 1.1. A σ-algebra E on E is a non-empty set of subsets of E which is closed under complement and countable unions, that is,

(i) ∅ ∈ E,
(ii) for all A ∈ E, Ac ∈ E,
(iii) for every sequence (Ai)i∈N of elements of E, ⋃i Ai ∈ E.
Definition 1.2. Let E be a set and E a σ-algebra on E. The pair (E, E) is called a measurable space, and each A ∈ E is called a measurable set. A measure µ on (E, E) is a function µ ∶ E → [0,∞), such that µ(∅) = 0 and, for any sequence (Ai)i∈N of disjoint elements of E,

µ(⋃i Ai) = ∑i µ(Ai),

that is, µ satisfies the countable additivity property. The triple (E, E, µ) is called a measure space. In particular, we will say that a measure is σ-finite if E is a countable union of measurable sets of finite measure.
Let A be a collection of subsets of E (A ⊆ P(E)). The smallest σ-algebra on E
which contains all elements of A trivially exists and it is denoted by σ(A). We
call it the σ-algebra generated by A.
When E is a countable set, the measure on (E, P(E)) which assigns to each measurable set A its number of elements, that is, m(A) = ∣A∣, is called the counting measure. When E is not countable, we work with Borel sets.
Definition 1.3. Given a topological space E, the σ-algebra generated by the set of open sets of E is called the Borel σ-algebra of E, denoted by B(E), and its elements are called Borel sets. A measure µ on (E, B(E)) is called a Borel measure on E.
We will denote by B the Borel σ-algebra on R. If not specified differently, we assume the Borel measure space (R, B, µ), where µ is the Lebesgue measure. Recall that the Lebesgue measure on R is the Borel measure on R such that, for all a, b ∈ R with a < b, µ((a, b]) = b − a.
Definition 1.4. Given two measurable spaces (E, E) and (E′, E′), a function f ∶ E → E′ is measurable if the inverse image of every A ∈ E′, defined by f⁻¹(A) = {e ∈ E ∣ f(e) ∈ A}, is in E. A measurable function is often called a measurement on (E, E).
For example, a measurement on (E, E) is the indicator function of a set A ∈ E, defined by

1A(e) = 1 if e ∈ A, and 1A(e) = 0 otherwise.
Each measurable function f gives rise to a measure on (E, E) through the concept of the integral. The right intuition for the integral is to think of the volume under the graph of f.
If the range of a measurable function is a finite set, the function is said to be simple. Observe that, if R(f) = {a1, . . . , ak}, then by setting Ai ∶= f⁻¹({ai}) for i = 1, . . . , k, we obtain a partition of E. A measurable function on a Borel σ-algebra of E is called a Borel function.
Definition 1.5. Let f ∶ E → E′ be a nonnegative measurable function. If f is a simple function with R(f) = {a1, . . . , ak} and Ai = f⁻¹({ai}), the integral of f is ∫ f dµ ∶= ∑_{i=1}^{k} ai µ(Ai). If f is a Borel function (not necessarily simple), the integral of f is the supremum

∫ f dµ ∶= sup {∫ g dµ ∣ g is a simple function such that g(e) ≤ f(e) for all e ∈ E},

if it exists. The integral is also called the mean or expectation of f with respect to the measure µ.
Then, for a given measurable set A ∈ E, the integral of f on A is ∫ f 1A dµ, written ∫A f dµ. It can be shown that ν(A) ∶= ∫A f dµ defines a measure on (E, E), and hence the triple (E, E, ν) is a well-defined measure space. In fact, f is then called the density of ν with respect to µ. The following theorem states the existence of the density function. We rephrase the theorem because, as it defines the density for general Borel spaces, it will be useful for introducing the density of a probability measure over traces of a continuous-time Markov chain, which will be important when defining the error measure in the framework for approximate aggregations.
As mentioned, intuitively, the integral represents the volume of f in reference to µ. Clearly, a measure ν obtained by integrating against µ should be dominated by µ, in the sense that whenever a set has µ-measure 0, its ν-measure should also be 0.

Definition 1.6. If ν and µ are two measures on the same measurable space, then µ is said to be absolutely continuous with respect to ν, or dominated by ν, written µ ≪ ν, if µ(A) = 0 for every set A ∈ E for which ν(A) = 0.
Theorem 1.7. (Radon-Nikodym theorem) Let µ and ν be two σ-finite measures on (E, E). If ν ≪ µ, then there exists a nonnegative measurable function f on E, unique up to µ-null sets, such that

ν(A) = ∫A f dµ, for all A ∈ E.
The function f is called the Radon-Nikodym derivative or density of ν with respect
to µ and is denoted by dν/dµ. If ∫ fdµ = 1 for an f ≥ 0, then ν is a probability
measure and f is called its probability density function (pdf) with respect to µ.
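For discrete measures dominated by another discrete measure, the Radon-Nikodym derivative reduces to a pointwise ratio of masses, and the defining identity ν(A) = ∫A f dµ becomes a finite sum. A toy illustration, with made-up measures µ and ν on a three-point space:

```python
# For discrete measures on a countable E, the Radon-Nikodym derivative of
# nu with respect to mu (when nu << mu) is simply f(a) = nu({a}) / mu({a}),
# and nu(A) = sum over a in A of f(a) * mu({a}).

mu = {'x': 2.0, 'y': 1.0, 'z': 0.0}
nu = {'x': 0.5, 'y': 0.5, 'z': 0.0}     # nu << mu: nu vanishes wherever mu does

f = {a: (nu[a] / mu[a] if mu[a] > 0 else 0.0) for a in mu}

def integrate(f, mu, A):
    """nu(A) recovered as the integral of the density f over A w.r.t. mu."""
    return sum(f[a] * mu[a] for a in A)

for A in [{'x'}, {'y', 'z'}, {'x', 'y', 'z'}]:
    assert abs(integrate(f, mu, A) - sum(nu[a] for a in A)) < 1e-12
print(f)   # {'x': 0.25, 'y': 0.5, 'z': 0.0}
```

Since ν(E) = 1 here, f is the probability density (mass) function of ν with respect to µ, in the sense of the remark above.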
1.1 Probability spaces and random variables
Definition 1.8. A probability space is a measure space (Ω, F, P), with a measure P such that P(Ω) = 1. It provides a model for an experiment whose outcome is subject to chance, and has the following interpretation:

(i) Ω is the set of outcomes of the experiment, called samples;
(ii) F is a set of observable sets of outcomes, called events;
(iii) P(A) is the probability of the event A ∈ F.
We further assume to be given a probability space (Ω,F ,P).
Definition 1.9. A random variable X is a measurement on (Ω, F). We call X a discrete random variable if R(X) is a countable set, and a continuous random variable otherwise.
A random variable is a means to reduce the probability space to the observations of interest. If X is a random variable from (Ω, F) to (E, E), we say that (E, E) is the measurable space generated by X. The measurability of X ensures that the outputs of the random variable naturally inherit their own probability measure PX. For example, PX(A), denoted also by P(X ∈ A), amounts to

PX(A) = P({ω ∈ Ω ∣ X(ω) ∈ A}).
Definition 1.10. The law or distribution of a random variable X is the image measure PX ∶ E → [0,1], such that PX = P ∘ X⁻¹. The range or sample space of a discrete random variable is also called the alphabet of the random variable X, and a random variable with alphabet E is also said to be an E-valued random variable.
The expectation or mean of a random variable X is the integral ∫ X dP, denoted by E[X]. The variance of X is the expectation of the squared deviation from the mean, Var(X) = E[(X − E[X])²].
For discrete measurements, the probability density function fX ∶ {a1, a2, . . .} → [0,1] is determined with respect to the counting measure on (E, E, PX), and is more commonly termed the probability mass function (pmf) of X. It is defined by

pX(ai) ∶= P(X = ai) for all ai ∈ E.

Then, by the definition of the integral for discrete measurements, the probability of an event A ∈ P(E) is PX(A) = ∫A fX dm = ∑_{ai∈A} pX(ai), which is indeed the intuitive result. Similarly, the expectation of X is E[X] = ∫ x fX(x) dm(x), which agrees with the intuition for the mean of a discrete measurement: E[X] = ∑_{ai∈R(X)} ai pX(ai).
Example 1.1. The simplest discrete random variables are:

(i) A Bernoulli random variable Xp ∶ Ω → {0,1}, defined by P(Xp = 1) = p and P(Xp = 0) = 1 − p. The mean of a Bernoulli random variable is E[Xp] = p, and the variance is Var(Xp) = p(1 − p). For example, the indicator function of an event A ∈ F is a Bernoulli random variable with parameter P(A).

(ii) A binomial random variable X(n,p) ∶ Ω → {0, . . . , n}, defined by P(X(n,p) = k) = (n choose k) p^k (1 − p)^(n−k). The mean of the binomial random variable is E[X(n,p)] = np, and the variance is Var(X(n,p)) = np(1 − p). A binomial random variable with parameters n and p represents the total number of successful outcomes when repeating n independent Bernoulli trials with parameter p.

(iii) A geometric random variable Xp ∶ Ω → {1, 2, . . .}, defined by P(Xp = k) = (1 − p)^(k−1) p. The mean of a geometric random variable is E[Xp] = 1/p, and the variance is Var(Xp) = (1 − p)/p². A geometric random variable with parameter p represents the number of trials until the first success.

(iv) A Poisson random variable Poλ ∶ Ω → {0, 1, 2, . . .}, defined by P(Poλ = k) = λ^k e^(−λ)/k!. The mean is E[Poλ] = λ and the variance is Var(Poλ) = λ.
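The closed-form means and variances listed above can be cross-checked by summing directly over the probability mass functions; a small script with arbitrarily chosen parameters, truncating the infinite geometric and Poisson sums at a negligible tail:

```python
from math import comb, exp, factorial

def mean_var(pmf):
    """Mean and variance of a pmf given as a list of (value, probability) pairs."""
    m = sum(x * p for x, p in pmf)
    v = sum((x - m) ** 2 * p for x, p in pmf)
    return m, v

n, p, lam = 10, 0.3, 4.0
binom = [(k, comb(n, k) * p**k * (1 - p)**(n - k)) for k in range(n + 1)]
geom  = [(k, (1 - p)**(k - 1) * p) for k in range(1, 500)]   # tail is negligible
poiss = [(k, lam**k * exp(-lam) / factorial(k)) for k in range(60)]

m, v = mean_var(binom)
assert abs(m - n * p) < 1e-9 and abs(v - n * p * (1 - p)) < 1e-9
m, v = mean_var(geom)
assert abs(m - 1 / p) < 1e-6 and abs(v - (1 - p) / p**2) < 1e-6
m, v = mean_var(poiss)
assert abs(m - lam) < 1e-9 and abs(v - lam) < 1e-9
print("closed forms confirmed")
```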
For continuous measurements, the density function of the measure PX is typically taken to be the one with respect to the Lebesgue measure µ on (R, B). If it exists, it is called the probability density function (pdf) of PX. Then, the probability is determined by

PX(A) = ∫A f(x) dµ(x) = ∫A f(x) dx,

where the latter is the standard Lebesgue integral. Moreover, for all non-negative Borel functions g, E[g(X)] = ∫R g(x) fX(x) dx. In particular, the expectation of X is computed as E[X] = ∫ X dP = ∫ x dPX(x) = ∫R z fX(z) dz.
If X is a random variable into the Borel space (R, B), the measure PX is uniquely determined by its values on the intervals {(−∞, x] ∣ x ∈ R}. The function FX ∶ R → [0,1] defined by FX(x) = PX((−∞, x]) = P({ω ∈ Ω ∣ X(ω) ≤ x}) = P(X ≤ x) is called the cumulative distribution function (cdf) of X (or of PX). It is also common to say that fX is the pdf of X (and not of PX, as introduced before).
Example 1.2. The two most important random variables for constructing a continuous-time Markov chain are Poisson random variables and exponential random variables. The exponential random variable Expλ ∶ Ω → [0,∞) has the probability density function

fExpλ(x) = λ e^(−λx), for x ≥ 0.

The mean is E[Expλ] = 1/λ and the variance is Var(Expλ) = 1/λ².
Definition 1.11. (memoryless property) A random variable X ∶ Ω → R is said to have the memoryless property if, for all x, y ∈ R(X),

P(X > x + y ∣ X > x) = P(X > y).
It can be shown that geometric random variables are the only discrete random variables satisfying the memoryless property, and that exponential random variables are the only continuous random variables satisfying the memoryless property.
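For the geometric distribution, the memoryless property follows directly from the tail probabilities P(X > k) = (1 − p)^k, and is easy to verify numerically; a quick check with an arbitrary p:

```python
# Numerical check of the memoryless property for a geometric random variable:
# P(X > x + y | X > x) = P(X > y), using the closed form P(X > k) = (1-p)^k.

p = 0.3

def tail(k):
    """P(X > k) for a geometric random variable with parameter p."""
    return (1 - p) ** k

for x in range(1, 6):
    for y in range(1, 6):
        conditional = tail(x + y) / tail(x)   # P(X > x+y | X > x)
        assert abs(conditional - tail(y)) < 1e-12
print("geometric distribution is memoryless")
```

The analogous check for the exponential distribution replaces the tail by e^(−λk).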
1.2 Distance between probability measures
We introduce an information-theoretic measure of similarity between probability distributions, called the Kullback-Leibler divergence (KL divergence). It will be used later as a distance measure between the trace semantics generated by two different Markov chains over the same state space.

Relative information, cross entropy, or Kullback-Leibler divergence (KL divergence) is a generalization of entropy. The KL divergence is always non-negative, but it is not a metric: it is non-symmetric, and it does not satisfy the triangle inequality. It is nevertheless often used as a measure of similarity between probability distributions. A common technical interpretation is that the KL divergence is the coding penalty associated with selecting a candidate distribution to approximate the correct distribution [17].
Jensen’s inequality and log-sum inequality are useful for providing bounds about
information-theoretic measures.
Theorem 1.12. (Jensen’s inequality) Let (Ω, F, P) be a probability space, f an integrable real-valued random variable, and φ a convex function. Then,

φ(E[f]) ≤ E[φ(f)].
Proof. We discuss the inequality only for discrete measurements f, by induction on the cardinality of R(f) = {x1, . . . , xn}. Let pi = Pf(xi). For n = 2, φ(E[f]) = φ(p1x1 + p2x2) and E[φ(f)] = p1φ(x1) + p2φ(x2). Since p2 = 1 − p1 and φ is convex, φ(p1x1 + p2x2) ≤ p1φ(x1) + p2φ(x2), so the inequality follows. Assume that the inequality holds for all k < n. Then, E[φ(f)] = ∑_{i=1}^{n} piφ(xi) = pnφ(xn) + ∑_{i=1}^{n−1} piφ(xi). Since p1 + . . . + pn−1 = 1 − pn, the sequence p′i = pi/(1 − pn), i = 1, . . . , n − 1, is a probability distribution and, by the induction hypothesis, ∑_{i=1}^{n−1} p′iφ(xi) ≥ φ(∑_{i=1}^{n−1} p′ixi). Therefore, we obtain E[φ(f)] ≥ pnφ(xn) + (1 − pn)φ(∑_{i=1}^{n−1} p′ixi). Finally, pnφ(xn) + (1 − pn)φ(∑_{i=1}^{n−1} p′ixi) ≥ φ(∑_{i=1}^{n} pixi) = φ(E[f]), by convexity of φ.
Theorem 1.13. (log-sum inequality) Let n ≥ 2, let a1, . . . , an and b1, . . . , bn be non-negative real numbers, and let a = ∑i ai, b = ∑i bi. Then

∑_{i=1}^{n} ai ln(ai/bi) ≥ a ln(a/b),

with equality if and only if a1/b1 = . . . = an/bn. In particular, in the special case when (ai) and (bi) are probability distributions, we obtain the Gibbs inequality:

∑_{i=1}^{n} ai ln(ai/bi) ≥ 0.
Proof. Let φ(x) = x ln x, and let the random variable f have the output values xi = ai/bi with distribution pi = P(f = xi) = bi/b. Since φ′′(x) = 1/x > 0 for x > 0, the function φ is convex, and Jensen’s inequality applies. Then, E[φ(f)] = ∑i (bi/b)(ai/bi) ln(ai/bi) = (1/b) ∑i ai ln(ai/bi), and φ(E[f]) = φ(∑i pixi) = (∑i pixi) ln(∑i pixi) = (∑i ai/b) ln(∑i ai/b) = (1/b) a ln(a/b). Equality holds exactly when f is constant.
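Both the inequality and its equality case are easy to spot-check numerically; the following sketch samples random non-negative vectors and also verifies the equality case ai/bi = constant:

```python
import math, random

# Spot-check of the log-sum inequality on random non-negative inputs:
# sum_i a_i ln(a_i/b_i) >= a ln(a/b), with equality when a_i/b_i is constant.

random.seed(0)
for _ in range(100):
    a = [random.uniform(0.01, 1.0) for _ in range(5)]
    b = [random.uniform(0.01, 1.0) for _ in range(5)]
    lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
    rhs = sum(a) * math.log(sum(a) / sum(b))
    assert lhs >= rhs - 1e-12

# Equality case: a_i proportional to b_i.
b = [0.1, 0.2, 0.3]
a = [2 * bi for bi in b]
lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
rhs = sum(a) * math.log(sum(a) / sum(b))
assert abs(lhs - rhs) < 1e-12
print("log-sum inequality holds on all samples")
```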
1.2.1 Entropy and mutual information
In the following, assume given a probability space (Ω, F, P) with a discrete measurement f, which induces the probability measure Pf and the probability mass function pf.
Definition 1.14. The entropy of f is defined by

HP(f) = − ∑_{a∈R(f)} P(f = a) ln P(f = a) = − ∑_{a∈R(f)} pf(a) ln pf(a).
By convention, 0 ln 0 = 0, and the logarithm basis is arbitrary. In particular, if
the logarithm base is 2, the units for entropy are ‘bits’, and if it is the natural
logarithm, the units are ‘nats’. The subscript of HP will be omitted when clear from context.
Intuitively, entropy represents the measure of uncertainty related to a random variable. More concretely, let the information contained in the outcome a ∈ R(f) be given by If(a) = ln(1/Pf(a)) = − ln P({ω ∣ f(ω) = a}), that is, the number of bits used to encode the fraction of outcomes measured with a (e.g., for encoding 0.25, exactly two bits are needed). Then, entropy can be interpreted as exactly the average information contained in the distribution: H(f) = EPf[If] = EPf[− ln Pf] (see [80] for further reference).
Theorem 1.15. The entropy value ranges over 0 ≤ H(f) ≤ ln ∣R(f)∣, where the lower bound is attained if and only if f is constant (no uncertainty), and the upper bound is attained if and only if f is uniformly distributed.
Proof. By applying Jensen’s inequality (Theorem 1.12) for R(f) = 1, . . . , n,
ai = Pf(i) and bi = 1, it follows that −∑i ai lnaibi
≤ −1 1∑i bi
= ln ∣n∣, with equal-
ity only if a1 = . . . = an. The upper bound can be shown from Gibbs inequality
(Theorem 1.13).
In the following, we use the notation ⟨f, g⟩(ω) for ⟨f(ω), g(ω)⟩, and 1a=b for the function that equals 1 if a = b, and 0 otherwise.
Theorem 1.16. For two discrete random variables f and g on a common probability space, it holds that max{H(f), H(g)} ≤ H(f, g) ≤ H(f) + H(g), where the left equality holds if and only if f = g, and the right equality holds if and only if f and g are independent.
For the proof, we refer to [46]. The latter inequality motivates the notion of mutual
information between two measurements.
Definition 1.17. The mutual information between two measurements f and g is defined by I(f; g) = H(f) + H(g) − H(f, g).
The mutual information of f with itself is equal to its entropy, because H(f, f) = H(f). More concretely, since P(f = b ∣ f = a) = 1a=b, it follows that

H(f, f) = − ∑_{a,b∈R(f)} P(f,f)((a, b)) ln P(f,f)((a, b)) = − ∑_{a,b∈R(f)} Pf(a) 1a=b (ln Pf(a) + ln 1a=b) = H(f).
1.2.2 Relative entropy
Assume now given a measurable space (Ω, F) with two different probability measures, P and M. Let f be a discrete measurement, and denote its respective probability measures by Pf and Mf, and its probability mass functions by pf and mf.
Definition 1.18. If Pf is absolutely continuous with respect to Mf, that is, Pf ≪ Mf, the relative entropy of f with measure Pf, with respect to the measure M, is

HP∣∣M(f) = ∑_{a∈R(f)} pf(a) ln (pf(a)/mf(a)).

Otherwise, HP∣∣M(f) = ∞.
An immediate corollary of the Gibbs inequality (Theorem 1.13) is that relative entropy is always nonnegative. Since entropy is not a function of the particular output values a ∈ R(f), but only of the distribution of the random variable, it is useful to view the entropy as a function of the partition induced on the original probability space. Let R(f) = {a1, a2, . . .} and Qi = f⁻¹({ai}). If Q = {Q1, Q2, . . .} denotes the partition of Ω induced by f, we may write HP(Q) = −∑i P(Qi) ln P(Qi).
Definition 1.19. We will say that a measurement f′ refines a measurement f, written f′ ⪯ f, if for all a ∈ R(f′) there exists a b ∈ R(f) such that f′⁻¹({a}) ⊆ f⁻¹({b}). In other words, if the induced partitions are denoted by Q′ and Q, every element of Q′ needs to be contained in some element of Q.
The next theorem shows that applying a further measurement to a discrete random variable can only lower its relative entropy and its entropy. The generalization of this result to continuous measurements will be used for providing the error bound in the approximate aggregation framework.
Theorem 1.20. If f ⪯ g, then HP∣∣M(f) ≥HP∣∣M(g) and HP(f) ≥HP(g).
Proof. Let f ∶ Ω → D1 and g ∶ Ω → D2. Since f ⪯ g, there exists a function θ ∶ D1 → D2 (which is a measurement on the probability space (D1, P(D1), Pf)) such that g = θ ∘ f.

If HP∣∣M(f) = ∞, the theorem trivially holds. If HP∣∣M(g) = ∞, then also HP∣∣M(f) = ∞, and the theorem holds. This is because, if HP∣∣M(g) = ∞, then there exists an element b ∈ D2 such that Pg(b) > 0 and Mg(b) = 0. But Mg(b) = 0 implies that Mf(a) = 0 for all a ∈ θ⁻¹(b), while Pg(b) ≠ 0 implies that Pf(a) > 0 for some a ∈ θ⁻¹(b). The remaining case is when Pf is dominated by Mf and Pg is dominated by Mg. Then,

HP∣∣M(f) = ∑_{a∈D1} pf(a) ln (pf(a)/mf(a))
= ∑_{b∈D2} [ ∑_{a∈θ⁻¹(b)} P(f = a) ln (P(f = a) / M(f = a)) ]
≥ ∑_{b∈D2} [ ( ∑_{a∈θ⁻¹(b)} P(f = a) ) ln ( ∑_{a∈θ⁻¹(b)} P(f = a) / ∑_{a∈θ⁻¹(b)} M(f = a) ) ]
= ∑_{b∈D2} Pg(b) ln (Pg(b)/Mg(b)) = HP∣∣M(g),
where the inequality step relies on the log-sum inequality applied to the bracketed
expression. The proof for entropy is similar.
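Theorem 1.20 is in essence a data-processing inequality: coarsening a discrete measurement through a map θ can only decrease relative entropy. A numerical sketch, with two arbitrarily chosen distributions pf and mf over four outcomes, and θ merging pairs of outcomes:

```python
import math

def rel_entropy(p, m):
    """Relative entropy sum_i p_i ln(p_i/m_i), assuming p << m."""
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)

pf = [0.4, 0.1, 0.3, 0.2]
mf = [0.25, 0.25, 0.25, 0.25]
theta = {0: 'A', 1: 'A', 2: 'B', 3: 'B'}   # g = theta ∘ f

def push(p):
    """Distribution of g = theta∘f from the distribution of f."""
    out = {}
    for a, pa in enumerate(p):
        out[theta[a]] = out.get(theta[a], 0.0) + pa
    return [out[b] for b in sorted(out)]

pg, mg = push(pf), push(mf)
assert rel_entropy(pf, mf) >= rel_entropy(pg, mg) - 1e-12
print(rel_entropy(pf, mf), rel_entropy(pg, mg))
```

In the aggregation framework, f plays the role of the fine-grained (species-based) state and θ the role of the partition function ϕ, which is why coarse-graining can only lose discriminating power between the two measures.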
1.3 Markov chains
The theory of Markov processes has a wide variety of applications ranging from
engineering to biological sciences. In systems biology appropriate Markov pro-
cesses are used in stochastic modeling of biochemical reaction systems, especially
where the constituent species are present in low abundance. In this chapter, we
first recall some general notions related to stochastic processes, and then we review
the concepts about discrete-time and continuous-time Markov chains which will
be necessary in the rest of the thesis.
Let S be a countable set, and (T,<) a totally ordered set.
Definition 1.21. A stochastic process or random process with state space S and parameter set T is a collection of random variables {Xt, t ∈ T} defined on a common probability space (Ω, F, P). If T is countable, the process is said to be discrete, and otherwise it is continuous.

The index t usually represents time, and one then thinks of Xt as the state of the process at time t. For any finite subset T′ = {t1 < . . . < tn} ⊆ T, the probability distribution Pt1,...,tn = P ∘ (Xt1, . . . , Xtn)⁻¹ of the random vector (Xt1, . . . , Xtn) ∶ Ω → Sⁿ is called a finite-dimensional marginal distribution of the process {Xt ∶ t ∈ T}.
Definition 1.22. Two random processes are equivalent if they agree on all finite-dimensional marginal distributions.
For every fixed ω ∈ Ω, the mapping t ↦ Xt(ω) defines a trace, also called a
realization, trajectory, sample path or sample function of the process. Additional
structure is assumed on a stochastic model in order to render the model analysis
easier.
Definition 1.23. Given a stochastic process Xt on a countable state space S, let HX(t) denote all the information about the process Xt up to time t ∈ T. The process Xt satisfies the Markov property if, for all states s, s′ ∈ S and all times t + h > t,

P(Xt+h = s′ ∣ HX(t)) = P(Xt+h = s′ ∣ Xt).
The process Xt is said to be time-homogeneous if, being in the state s, the probability that the next state is s′ is the same no matter for how long the system has been observed:

P(Xt+h = s′ ∣ Xt = s) = P(Xh = s′ ∣ X0 = s).
Example 1.3. A simple example of a continuous-time process, which plays an
important role in constructing the continuous-time Markov chain, is the counting
process or Poisson process. The Poisson process with intensity λ is a continuous-
time process ξt taking values in the set N, such that (i) ξ(0) = 0, (ii) the numbers
of events in disjoint time intervals are independent, and (iii) ξ(s + t) − ξ(t) ∼ Po(λs),
that is, P(ξ(s + t) − ξ(t) = k) = e−λs (λs)k / k!, for s ≥ 0.
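Though not part of the formal development, property (iii) can be checked numerically. The following Python sketch (illustrative only) simulates the counting process by summing i.i.d. Exp(λ) interarrival times and compares the empirical mean count to λs:

```python
import random

def poisson_count(lam, t, rng):
    """Number of jumps of a rate-lam Poisson process in [0, t],
    obtained by summing i.i.d. Exp(lam) interarrival times."""
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)
    return n

rng = random.Random(0)
lam, t, trials = 2.0, 3.0, 20000
mean = sum(poisson_count(lam, t, rng) for _ in range(trials)) / trials
# By property (iii), the count over [0, t] is Po(lam * t), with mean 6 here.
```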
1.3.1 Markov chains and Markov graphs
When S is a finite set, it is useful to switch to vector notation. Under some
ordering of the state space, we denote by P(t) the transition probability matrix
at time t, with entries p(t)(s, s′) = P(Xt = s′ ∣ X0 = s). It can be observed that any
Markov, time-homogeneous process satisfies the Chapman-Kolmogorov equations:
P(t+h) = P(t)P(h) for all t, h ∈ T . (1.1)
The Chapman-Kolmogorov equations ensure that any finite-dimensional marginal
distribution of a Markov, time-homogeneous process can be determined through
P(1) (denoted by P) in the case of discrete time, and through the derivative
(d/dt)P(t)∣t=0 (denoted by Q) in the case of continuous time. For that reason, a
Markov, time-homogeneous stochastic process on a countable set can be concisely
represented in terms of a chain (graph).
For our analysis, it will be useful to explicitly keep track of the state space of the
process, and the distribution at which the process is initiated.
Definition 1.24. A Markov graph (MG) is a triple (S,w, p0), such that
(i) S is a countable state space,
(ii) w ∶ S × S → R, defines the transition weights,
(iii) p0 ∶ S → [0,1] is such that ∑s∈S p0(s) = 1.
We later assign a Markov process to a Markov graph. The process assigned
to a Markov graph will be either a discrete-time Markov chain (DTMC) or a
continuous-time Markov chain (CTMC). We separate the graph description from
the process itself because we will later make statements about Markov graphs
independently of the process assigned to them.
Definition 1.25. A discrete-time Markov chain (DTMC) is a discrete-time random
process {Xn}n∈N which satisfies the Markov and time-homogeneity properties.
Definition 1.26. A continuous-time Markov chain (CTMC) is a continuous-time
random process {Xt}t∈R≥0 which satisfies the Markov and time-homogeneity
properties.
1.4 Discrete-time Markov chains
Depending on whether the process assigned to a Markov graph is discrete or
continuous, we will call it a discrete or a continuous Markov graph, respectively.
We start by defining a discrete-time Markov graph. We will say that w(s, ⋅) is a
probability distribution if w(s, s′) ≥ 0 for all s′ ∈ S, and ∑s′∈S w(s, s′) = 1.
Definition 1.27. A discrete-time Markov graph M = (S,w, p0) is such that for all
s ∈ S, w(s, ⋅) is a probability distribution. Then, a process {Xn} assigned to M is
a DTMC, and it is such that, for all s, s′ ∈ S,
(i) P(X0 = s) = p0(s), and
(ii) P(X1 = s′ ∣X0 = s) = w(s, s′).
Notice that the literature often implicitly assumes that a DTMC {Xn} is defined
by its one-step transition matrix alone. In our treatment, a DTMC assigned to
(S,w, p0) has a transition matrix determined by w, it operates over a state space
S, and it has a fixed initial probability distribution p0.
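A DTMC assigned to a discrete-time Markov graph can be sampled directly from (S,w, p0). The following Python sketch (names illustrative) draws X0 from p0 and each subsequent state from w(Xk, ⋅):

```python
import random

def sample_dtmc(S, w, p0, n_steps, rng):
    """Draw one trace of the DTMC assigned to the discrete-time
    Markov graph (S, w, p0): X0 ~ p0, then X_{k+1} ~ w(X_k, .)."""
    def draw(dist):
        u, acc = rng.random(), 0.0
        for s in S:
            acc += dist(s)
            if u < acc:
                return s
        return S[-1]
    trace = [draw(lambda s: p0[s])]
    for _ in range(n_steps):
        x = trace[-1]
        trace.append(draw(lambda s: w[(x, s)]))
    return trace

S = [1, 2, 3]
w = {(s, s2): 0.0 for s in S for s2 in S}
w.update({(1, 2): 0.5, (2, 3): 0.5, (3, 1): 0.5,
          (1, 1): 0.5, (2, 2): 0.5, (3, 3): 0.5})
p0 = {1: 1/3, 2: 1/3, 3: 1/3}
trace = sample_dtmc(S, w, p0, 10, random.Random(1))
```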
1.4.1 Transient distributions
For a given DTMC {Xn}, the matrix P(1) = P is called the (one-step) transition
matrix. Due to the Chapman-Kolmogorov equations, P(n) = Pn. The marginal
distribution of Xn is also called the transient distribution at time n. We use the
row vector notation π(n) for the transient distribution at time n, so that
π(n)(s) = P(Xn = s). Then, the transient distribution computes to
π(n) = π(n−1)P = . . . = π(0)Pn.
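As a quick numerical illustration (assuming NumPy is available), the recursion π(n) = π(n−1)P agrees with the closed form π(0)Pn:

```python
import numpy as np

# A 3-state one-step transition matrix and a uniform initial distribution.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
pi0 = np.full(3, 1/3)

# Recursive computation pi(n) = pi(n-1) P ...
pi = pi0.copy()
for _ in range(5):
    pi = pi @ P
# ... agrees with the closed form pi(n) = pi(0) P^n.
closed = pi0 @ np.linalg.matrix_power(P, 5)
```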
Remark 1.28. It is worth realizing here that two DTMC’s which are indistinguishable
by their transient distributions may have different distributions of traces.
For example, knowing the marginal distributions of X0 and X1 is not enough for
reconstructing the distribution of their joint (X0,X1): take two DTMC’s {Xn}
and {X′n}, with S = {1,2,3} and p0 = (1/3, 1/3, 1/3), and let the weight functions
be as follows: w(1,2) = w(2,3) = w(3,1) = w(1,1) = w(2,2) = w(3,3) = 0.5
and w′(1,3) = w′(3,2) = w′(2,1) = w′(1,1) = w′(2,2) = w′(3,3) = 0.5. The
marginal distributions of either of these chains are uniform, while, for example,
P((X0,X1) = (1,2)) = 1/3 ⋅ 1/2 = 1/6 and P((X′0,X′1) = (1,2)) = 0.
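The two chains of the remark can be checked with exact rational arithmetic: both keep the uniform marginal at every step, yet P((X0,X1) = (1,2)) = p0(1) ⋅ w(1,2) = 1/6 for the first chain and 0 for the second (a sketch; names illustrative):

```python
from fractions import Fraction

S = [1, 2, 3]
p0 = {s: Fraction(1, 3) for s in S}
half = Fraction(1, 2)
w  = {(1, 2): half, (2, 3): half, (3, 1): half,
      (1, 1): half, (2, 2): half, (3, 3): half}
w2 = {(1, 3): half, (3, 2): half, (2, 1): half,
      (1, 1): half, (2, 2): half, (3, 3): half}

def marginal1(weights):
    """Distribution of X1: P(X1 = s') = sum_s p0(s) w(s, s')."""
    return {s2: sum(p0[s] * weights.get((s, s2), 0) for s in S) for s2 in S}

def joint(weights, s, s2):
    """P((X0, X1) = (s, s'))."""
    return p0[s] * weights.get((s, s2), 0)

# Both chains have the uniform distribution at time 1,
# yet the joint distribution of (X0, X1) differs:
j1 = joint(w, 1, 2)   # 1/3 * 1/2 = 1/6
j2 = joint(w2, 1, 2)  # 0
```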
1.4.2 Stationary behavior
Let P be a transition matrix.
Definition 1.29. A probability distribution µ ∶ S → [0,1] is called a stationary
probability distribution of P, if
µ(s) = ∑s′∈S µ(s′)P(s′, s) for all s ∈ S,
or, in matrix notation, a non-negative vector µ is an invariant measure if µP = µ.
The stationary distribution is also termed equilibrium or steady state distribution
in the related Markov chain literature. However, in the application to model-
ing biological systems, the term ‘equilibrium’ or ‘steady state’ often refers to the
stationarity with respect to the deterministic model, and thus differs from the
probabilistic equilibrium.
A stationary distribution can be interpreted as a fixed point for the Markov chain,
in the sense that µ = µP = . . . = µPn. However, simply knowing a fixed point
exists does not guarantee that the system will converge to it, or that it is unique.
There exists a criterion for proving the existence, uniqueness and convergence
of the stationary distribution – it suffices to show that the transition matrix is
irreducible and aperiodic.
Definition 1.30. We say that s communicates to s′, written s → s′, if there exists
n ≥ 0 such that p(n)(s, s′) > 0. Let s ↔ s′ iff s → s′ and s′ → s. The equivalence
classes of ↔ are called communication classes.
Definition 1.31. A transition matrix is irreducible if there is only one communi-
cation class. That is, if s↔ s′ for all s, s′ ∈ S. Otherwise, it is called reducible.
Definition 1.32. The period of state s ∈ S is defined by d(s) = gcd{n ≥ 1 ∶ p(n)(s, s) > 0}, where gcd stands for greatest common divisor. If d(s) = 1, then we say that s
is aperiodic, and if d(s) > 1, we say that s is periodic with period d(s).
Theorem 1.33. Suppose that a DTMC {Xn} has an irreducible and aperiodic
transition matrix P. Then, there is a unique stationary distribution µ with positive
values at all components, such that
P(Xn = s) → µ(s) as n → ∞ for all s,
or, equivalently, limn→∞ π(0)Pn = µ.
We refer to ([67], Theorem 1.7.7) for the proof.
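For a finite chain, the convergence in Theorem 1.33 can be observed numerically by power iteration: π(0)Pn approaches µ regardless of π(0). A NumPy sketch, using an irreducible, aperiodic, doubly stochastic matrix (whose stationary distribution is therefore uniform):

```python
import numpy as np

# Irreducible and aperiodic; columns also sum to 1, so mu is uniform.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

pi = np.array([1.0, 0.0, 0.0])  # any initial distribution works
for _ in range(200):             # pi(0) P^n -> mu as n -> infinity
    pi = pi @ P
mu = pi
```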
1.5 Continuous-time Markov chains
Reasoning about continuous-time processes is more subtle, loosely because the
parameter set T is uncountable, and we cannot assign a probability to an
uncountable union of events. In this respect, it is useful to restrict attention
to right-continuous processes, meaning that for all ω ∈ Ω and t ≥ 0
there exists ε > 0 such that Xs(ω) = Xt(ω) for s ∈ [t, t + ε]. This assumption
allows reasoning about any finite-dimensional distribution of the process.
For example, one can find the marginal probability P(Xt = s), or the reacha-
bility of the state s, P(Xt = s for some t), or the finite-dimensional probability
P(Xt0 = s0, . . . ,Xtn = sn).
Every trace of a right-continuous process remains constant for a while, and then
‘jumps’ to a new state. We will assume in our analysis that a process is regular,
that is, in all finite intervals [0, t), only finitely many jumps may occur (otherwise,
the process is said to be explosive). The restriction to non-explosive processes
is justified because the stochastic models of biochemical networks are trivially
non-explosive, if modeled at a proper resolution. Each right-continuous process is
associated with random variables
(i) ξ0, ξ1, . . . ∈ R≥0 – the jump times of Xt (absolute time instants at which
jumps occur), defined by
ξ0 = 0, ξn+1 = inf{t > ξn ∣ Xt ≠ Xξn},
(ii) τ0, τ1, . . . ∈ R≥0 – the waiting times, such that τi = ξi+1 − ξi (waiting times
relative to the last occurred jump), and
(iii) Z0, Z1, . . . ∈ S – the sequence of states visited by jumps, given by Zi = Xξi.
The process Zn, n = 0,1, . . . defines the jump chain, also called the embedded
discrete process or the skeleton process.
1.5.1 A discussion on constructing the CTMC
Before the formal description, we briefly discuss which parameters specify a CTMC.
Assume that we want to construct a continuous-time Markov time-homogeneous
process on a countable state space S. To start with, apart from knowing the
initial distribution, we need also to specify how long the process waits in each of
the states. Since we want the process to be Markov and time-homogeneous, the
waiting time must follow a memoryless distribution. The only memoryless
continuous distribution is the exponential – hence, the waiting time must be
exponentially distributed. The distribution parameter, often called the activity
of s0, should depend solely on the state s0: (ξ1 ∣ Z0 = s0) ∼ Exp(a(s0)). Notice
that the expected waiting time is E[ξ1 ∣ Z0 = s0] = 1/a(s0), which is consistent with
the intuition: the bigger the activity of a state, the faster the state will be left (on
average). Moreover, it can be shown that the choice of the state visited by the next
jump, P(Z1 = s1 ∣ Z0 = s0, ξ1 < t), should depend on the current state, but it should
not depend on the amount of time spent in the current state (see [67], for example).
Loosely, this is because one would otherwise need to keep track, all along the time
between two jumps, of how long ago the previous jump occurred, which would in
turn violate the targeted properties. Finally, letting ps,s′ = P(Z1 = s′ ∣ Z0 = s) and
assigning weights to the transitions by w(s, s′) ∶= a(s)ps,s′, the weights of the
transitions leaving state s must sum up to the activity of that state:
∑s′ w(s, s′) = a(s), since ps,⋅ is a probability distribution.
Definition 1.34. A continuous-time Markov graph M = (S,w, p0) is such that for
all s, s′ ∈ S, if s ≠ s′, then w(s, s′) ≥ 0. The function w is also called the rate function.
For any s ∈ S, set also w(s, s) = −a(s), where
a(s) = ∑s′≠s w(s, s′)
is called the activity of the state s. A process Xt assigned to (S,w, p0) is a CTMC,
and it is such that, for all s, s′ ∈ S and for all t ≥ 0,
(i) P(X0 = s) = p0(s),
(ii) P(Xh = s′ ∣ X0 = s) = w(s, s′)h + o(h) if s ≠ s′, and
P(Xh = s ∣ X0 = s) = 1 − a(s)h + o(h),
where f ∈ o(h) iff limh→0 f(h)/h = 0.
There are many different ways of defining the CTMC. For example, instead of
defining the probability of transitions in a small interval [0, h), an equivalent
definition, which is more helpful for simulating the CTMC, is given by asking that:
(i) P(X0 = s) = p0(s), for all s ∈ S,
(ii) P(ξ1 < t ∣ Z0 = s) = 1 − e−a(s)t, for all s ∈ S and t ≥ 0,
(iii) P(Z1 = s′ ∣ Z0 = s) = w(s, s′)/a(s), for all s, s′ ∈ S.
The intuition is the following: assume being in state s, and having set an independent
alarm clock for each s′ such that w(s, s′) > 0, each with exponentially distributed
expiration time with parameter w(s, s′). The state whose alarm expires
first is the next chosen state. This interpretation is consistent, due to the following
simple property of exponential distributions.
Lemma 1.35. If X1, . . . ,Xn are independent random variables with Xi ∼ Exp(λi),
then X ≡ mini Xi ∼ Exp(λ), where λ = λ1 + . . . + λn. Moreover, the index of the
minimizing Xi is a discrete random variable i with P(i = j) = λj/λ.
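Lemma 1.35 underlies the alarm-clock interpretation: racing independent Exp(w(s, s′)) clocks leaves state s after a time with mean 1/a(s) and chooses s′ with probability w(s, s′)/a(s). A Monte Carlo sketch (illustrative names):

```python
import random

def next_jump(s, w, states, rng):
    """Race of independent alarm clocks: each s' with w(s, s') > 0
    gets an Exp(w(s, s')) clock; the earliest ring wins."""
    clocks = {s2: rng.expovariate(w[(s, s2)])
              for s2 in states if w.get((s, s2), 0) > 0}
    winner = min(clocks, key=clocks.get)
    return winner, clocks[winner]

rng = random.Random(2)
states = ["a", "b", "c"]
w = {("a", "b"): 3.0, ("a", "c"): 1.0}   # activity a("a") = 4.0
trials, wins_b, total_wait = 20000, 0, 0.0
for _ in range(trials):
    s2, t = next_jump("a", w, states, rng)
    wins_b += (s2 == "b")
    total_wait += t

frac_b = wins_b / trials         # should approach w(a,b)/a(a) = 0.75
mean_wait = total_wait / trials  # should approach 1/a(a) = 0.25
```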
1.5.2 Transient distribution
Similarly as in the discrete-time case, we may compute the transient distributions
recursively. Recall the notation p(t)(s, s′) = P(Xt = s′ ∣ X0 = s). Then, for s′ ≠ s,
(d/dt) p(t)(s, s′)
= limh→0 [p(t+h)(s, s′) − p(t)(s, s′)] / h
= limh→0 (1/h) (∑s′′∈S p(t)(s, s′′) p(h)(s′′, s′) − p(t)(s, s′)), by (1.1),
= limh→0 (1/h) (∑s′′∈S∖{s′} p(t)(s, s′′) p(h)(s′′, s′) + p(t)(s, s′) p(h)(s′, s′) − p(t)(s, s′))
= limh→0 (1/h) (∑s′′∈S∖{s′} p(t)(s, s′′) p(h)(s′′, s′) − p(t)(s, s′)[1 − p(h)(s′, s′)])
= limh→0 (1/h) (∑s′′∈S∖{s′} p(t)(s, s′′)[w(s′′, s′)h + o(h)] − p(t)(s, s′)[a(s′)h + o(h)])
= ∑s′′∈S∖{s′} w(s′′, s′) p(t)(s, s′′) − a(s′) p(t)(s, s′),
where we used Definition 1.34 to evaluate p(h)(s′′, s′) and p(h)(s′, s′). Therefore,
the marginal distribution of Xt computes to:
(d/dt) p(t)(s) = −a(s) p(t)(s) + ∑s′≠s w(s′, s) p(t)(s′), (1.2)
also known as the Kolmogorov forward equations for the stochastic process, or the
chemical master equation in the biology literature.
For a given CTMC Xt, the matrix Q = (d/dt)P(t)∣t=0, with entries q(s, s′) = w(s, s′) ∈ R,
is called the generator matrix. Equation (1.2) translates to
(d/dt) π(t) = π(t)Q,
which solves to π(t) = π(0)etQ. The result involves etQ, the standard matrix
exponential defined by etQ = ∑∞n=0 tnQn/n!. The solution is always valid when the
state space is finite.
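As a hedged numerical sketch (assuming NumPy; the plain truncated series used here is adequate only for modest ∥tQ∥), π(0)etQ for a two-state generator matches the known closed-form transient of that chain:

```python
import numpy as np
from math import exp

def expm(A, terms=60):
    """Truncated power series for the matrix exponential e^A;
    adequate here because the norm of A is small."""
    result, term = np.eye(len(A)), np.eye(len(A))
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    return result

# Two-state generator: rate a from state 0 to state 1, rate b back.
a, b = 2.0, 1.0
Q = np.array([[-a, a],
              [b, -b]])
pi0 = np.array([1.0, 0.0])
t = 5.0

pi_t = pi0 @ expm(t * Q)  # transient distribution pi(t) = pi(0) e^{tQ}
# Closed form for this chain: P(X_t = 1) = a/(a+b) * (1 - e^{-(a+b)t}).
p1_exact = a / (a + b) * (1.0 - exp(-(a + b) * t))
```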
1.5.3 Uniformization
Definition 1.36. Suppose there exists r such that r ≥ sups∈S ∣Qs,s∣, and let
M = Q/r + I (I is the identity matrix of dimension ∣S∣). The DTMC with transition
matrix M is called the subordinated process of Xt with uniformization constant r.
Let {Zn} be a DTMC on S, with transition matrix M. The DTMC {Zn} is
also called a uniformized or randomized chain, used in principle for numerical
computation of the marginal transient distribution of Xt, since
π(t) = π(0) ∑∞n=0 tnQn/n!
= π(0) ∑∞n=0 tn(r(M − I))n/n!
= π(0) e−rt ∑∞n=0 Mn (rt)n/n!.
The practical convenience of the method is that the error of truncating the sum
can be bounded a priori:
∥π(t) − ∑kn=0 π(0)Mn (rt)n/n! e−rt∥ = ∥∑∞n=k+1 π(0)Mn (rt)n/n! e−rt∥ ≤ 1 − ∑kn=0 (rt)n/n! e−rt,
since all components of the probability vectors π(0)Mn are at most 1. Let ξt be a
Poisson process with intensity r, independent of {Zn}. Then, it can be shown that
Xt and Zξ(t) are equivalent in distribution. Consequently, the CTMC Xt and its
subordinated DTMC {Zn} have identical stationary distributions. The uniformized
chain will be useful when discussing the stationary properties of a CTMC in the
context of exact Markov chain aggregation.
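The truncated uniformization sum and its a priori error bound can be sketched as follows (assuming NumPy; the two-state generator admits a closed-form transient for comparison):

```python
import numpy as np
from math import exp, factorial

a, b = 2.0, 1.0
Q = np.array([[-a, a],
              [b, -b]])
r = max(abs(np.diag(Q)))   # uniformization constant
M = Q / r + np.eye(2)      # subordinated DTMC transition matrix

def transient(pi0, t, k):
    """pi(t) ~= e^{-rt} * sum_{n=0}^{k} pi0 M^n (rt)^n / n!  (truncated)."""
    acc = np.zeros_like(pi0)
    vec = pi0.copy()
    for n in range(k + 1):
        acc = acc + vec * exp(-r * t) * (r * t) ** n / factorial(n)
        vec = vec @ M
    return acc

pi0 = np.array([1.0, 0.0])
t, k = 1.0, 40
approx = transient(pi0, t, k)

# A priori error bound: the truncated Poisson(rt) tail.
bound = 1.0 - sum(exp(-r * t) * (r * t) ** n / factorial(n)
                  for n in range(k + 1))

# Exact transient of the two-state chain, for comparison.
exact1 = a / (a + b) * (1.0 - exp(-(a + b) * t))
```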
1.5.4 Stationary behavior
In the following, we assume that we are given a generator matrix Q.
Definition 1.37. A probability distribution µ ∶ S → [0,1] is called a stationary,
equilibrium or steady-state probability distribution of Q, if for all s ∈ S and t ≥ 0,
µ(s) = ∑s′∈S µ(s′)P(Xt = s ∣ X0 = s′) = ∑s′∈S µ(s′) p(t)(s′, s).
In matrix notation, a non-negative vector µ is an invariant measure if µQ = 0.
Definition 1.38. We will say that the generator Q is irreducible if the transition
matrix of the embedded DTMC is irreducible.
Theorem 1.39. Suppose that Xt is a non-explosive CTMC, and that it has an
irreducible generator matrix Q. Then, there is a unique stationary distribution µ
with positive values at all components, such that
P(Xt = s)→ µ(s) as t→∞ for all s.
1.5.5 Finite-dimensional marginal probabilities
Assume we are interested in the finite-dimensional marginal probability
P(Z0 = s0, τ0 < δ0, . . . , Zk−1 = sk−1, τk−1 < δk−1, Zk = sk). The joint probability can be
decomposed into
P(Z0 = s0) ∏k−1i=0 [P(τi < δi ∣ Zi = si) P(Zi+1 = si+1 ∣ Zi = si)],
which is the probability of observing the sequence of states s0, . . . , sk
in the embedded DTMC, multiplied by the product of the probabilities
P(τi < δi ∣ Zi = si) for i = 0, . . . , k − 1. Let ρs ∼ (τ0 ∣ Z0 = s) denote the waiting time
in state s. Then, the cdf of ρs is given by Fρs(δ) = 1 − e−a(s)δ and the corresponding
pdf is fρs(δ) = a(s)e−a(s)δ. The cdf of the joint random variable
(Z0, . . . , Zk, τ0, . . . , τk−1) at the point (s0, . . . , sk, δ0, . . . , δk−1) evaluates to
P(Z0 = s0, . . . , Zk = sk) ∏k−1i=0 P(ρsi < δi) = p0(s0) ∏k−1i=0 [(1 − e−a(si)δi) w(si, si+1)/a(si)],
yielding the corresponding pdf p0(s0) ∏k−1i=0 [fρsi(δi) w(si, si+1)/a(si)].
Chapter 2
Rule-based modeling of biochemical
networks
In this Chapter, we introduce a rule-based language, a form of site-graph-rewrite
grammar, tailored for modeling low-level bio-molecular interactions. The need for
such a formalism was first discussed in [76], [36], before it was formally introduced
in 2003 [24]. Kappa [34] and BioNetGen [6] are two rule-based modeling platforms
that have appeared to date.
A simple rule-based model is sketched in Figure 2.1. Informally, an agent of type
B can form a bond with either an agent of type A or an agent of type C, via specific
(typed) site variables (a, b or c). A transition can be triggered upon local tests on
an agent’s interface – omitting the site c of agent B in rule R1 (or R−1) means that
the conformation of site c is irrelevant for executing rule R1 (or R−1) (sometimes
referred to as the don’t care, don’t write agreement). Typically, agent types encode
proteins and site types encode the respective protein domains. The executions of
rule-based models – programs written in a rule-based language – are defined
according to the principles established in the physical chemistry and molecular
physics domains.
Figure 2.1: A simple rule-based model
A rule-based model can be understood as a compact, symbolic encoding of a set
of biochemical reactions. In this sense, rule-based models are just a ‘syntactic’
shift with respect to traditional models, but the impact of this simple idea goes
far beyond that. Being visually intuitive, but at the same time formal (and
hence executable), rule-based models become a powerful alternative to the traditional
approaches. First, for a modeler, the site-graph representation of molecular
complexes renders models easy to read, write or edit. Moreover, the description of
interactions is compact and models can trivially be composed, by simply merging
two collections of rules. Finally, unlike the non-formal interaction diagrams, a rule
set can be executed according to its prescribed semantics, or subjected to static
(preprocessing) analysis. Questions such as the reachability of a particular molecular
complex, causal relations between rule executions, or quantitative analysis with
respect to the underlying chemical kinetics, can be automated.
We start by outlining the classical (deterministic) and stochastic chemical kinet-
ics, which are fundamental to biochemical reaction network analysis. These two
mathematical models will serve as a reference when defining the semantics of rule-
based models. Then, site-graphs, rule-based models and their stochastic semantics
are introduced. At the end of the chapter, we introduce an example of a rule set
which will facilitate the illustrations throughout the thesis.
2.1 Chemical kinetics
Building a model involves two important choices, related to (i) how to represent
the model (syntax), (ii) how to interpret, ‘execute’ the model (semantics). Popu-
lation models are widely used in modeling interactions among a set of individuals,
distinguishable only by the class of species they belong to. Any population model
can be represented in terms of reactions of the form
A + 2B´¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¶reactant species
k→ C®product species
,
where k denotes the rate or a speed at which the change occurs. A model of
population dynamics can be
(i) either discrete or continuous, depending on whether the population quantity
is modeled as a discrete or a continuous value, and
(ii) either deterministic or stochastic, depending on whether the output trajec-
tory is fully determined by the initial state (deterministic), or if different
trajectories can emerge, each associated with a certain probability (stochas-
tic).
Definition 2.1. A reaction system is a pair (S,R), such that
(i) S = {S1, S2, . . . , Sn} is a finite set of species,
(ii) R = {r1, . . . , rr} is a finite set of reactions. Each reaction is a triple
rj ≡ (aj, νj, kj) ∈ Nn × Zn × R≥0, written down in the following form:
a1jS1, . . . , anjSn —kj→ a′1jS1, . . . , a′njSn, such that a′ij = aij + νij.
The vectors aj and a′j are often called respectively the consumption and
production vectors of reaction rj, and kj is the kinetic rate of reaction rj.
2.1.1 Stochastic chemical kinetics
Numerous studies have shown that stochastic effects generate phenotypic hetero-
geneity in cell behavior and that cells can functionally exploit variability for in-
creased fitness ([64] is an early review on the subject). As many genes, RNAs and
proteins are present in low copy numbers, deterministic models are insufficiently
informative or even wrong.
Consider, for example, a simple birth-death model ∅ —k1→ S1, S1 —k2→ ∅. The
deterministic solution z(t) = k1/k2 + (z(0) − k1/k2)e−k2t is interpreted as the mean
population of species S1 through time. Any additional experimental observation,
such as the degree of deviation around the average value, or the probability of
extinction of the species at a given time, cannot be deduced. In more complex
examples, the observation that the population exhibits a bimodal response cannot
be made unless a stochastic model is employed.
Before introducing the stochastic model of a biochemical reaction system, we state
in the next Theorem a result of molecular physics, called the fundamental premise
of stochastic chemical kinetics [42].
Theorem 2.2. Consider a set of species interacting in a finite volume V. If
the system is well-mixed and the temperature is constant, then for a reaction rj of
the form S1, ..., Sl → products, there exists a stochastic rate constant cj, such that
cj dt ≈ the probability that a randomly chosen tuple of molecules of S1, . . . , Sl
will react according to rj in the next infinitesimal time dt. (2.1)
A discrete, stochastic model of a biochemical reaction system reacting in a well-
stirred mixture of volume V and in thermal equilibrium is shown in Definition 2.3.
It can be derived from the fundamental premise as shown in, for example, ([41],
Section 5.3.B).
Definition 2.3. Let (S,R) be a reaction system and x0 = (x1, ..., xn) ∈ Nn an
initial state of the system. Then, the discrete, stochastic model is a continuous-
time Markov chain (CTMC) Xt with Markov graph (S,w, p0), such that
(i) S = {x ∣ x is reachable from x0 in R},
(ii) p0(x0) = 1,
(iii) w(x,y) = ∑rj=1 λj(x) 1{y = x + νj}.
The family of functions {λj ∶ Nn → R≥0 ∣ j = 1, . . . , r}, called also the stochastic
reaction rates, is defined by
λj(x) = cj ∏ni=1 C(xi, aij). (2.2)
The binomial coefficient C(xi, aij) counts the number of ways of choosing aij
molecules of species Si out of the xi available ones.
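Equation (2.2) is straightforward to evaluate; a small Python sketch (illustrative names):

```python
from math import comb

def propensity(c, x, a):
    """lambda_j(x) = c_j * prod_i C(x_i, a_ij), as in Equation (2.2)."""
    val = c
    for xi, aij in zip(x, a):
        val *= comb(xi, aij)
    return val

# Reaction A + 2B -> C with stochastic rate constant c = 0.1,
# in state x = (4 copies of A, 5 of B, 0 of C):
lam = propensity(0.1, (4, 5, 0), (1, 2, 0))  # 0.1 * C(4,1) * C(5,2)
```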
In the following, we use the vector notation Xt for the marginal distribution of the process Xt at time t.
Theorem 2.2 is said to be fundamental to stochastic chemical kinetics because
the remaining theory follows from the fundamental premise via the laws of
probability. In particular, the remaining theory relates to
(i) computing the transient probability distribution of Xt via the Kolmogorov
forward equation, also known as the chemical master equation (CME) in the
chemistry literature. Concretely, denoting by p(t)(x) = P(Xt = x), the CME for
state x ∈ Nn is
(d/dt) p(t)(x) = ∑rj=1, x−νj∈S λj(x − νj) p(t)(x − νj) − ∑rj=1 λj(x) p(t)(x). (2.3)
(ii) the simulation of traces of Xt, known as the stochastic simulation algo-
rithm (SSA) in chemical literature [40].
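A minimal sketch of the SSA in its direct-method form: the waiting time is drawn from an exponential distribution whose rate is the total propensity, and a reaction is then chosen with probability proportional to its propensity. The model and all names are illustrative:

```python
import random

def ssa(x0, rates, changes, t_end, rng):
    """Gillespie's direct method for a one-species system: sample the
    waiting time from Exp(total propensity), pick a reaction
    proportionally to its propensity, apply its state change."""
    x, t = x0, 0.0
    while True:
        props = [r(x) for r in rates]
        total = sum(props)
        if total == 0.0:
            return x
        t += rng.expovariate(total)
        if t > t_end:
            return x
        u = rng.random() * total
        for p, nu in zip(props, changes):
            if u < p:
                x += nu
                break
            u -= p

# Birth-death model: 0 -> S1 at rate k1, S1 -> 0 at rate k2 * x.
k1, k2 = 10.0, 1.0
rates = [lambda x: k1, lambda x: k2 * x]
changes = [+1, -1]

rng = random.Random(3)
t_end, trials = 5.0, 4000
mean = sum(ssa(0, rates, changes, t_end, rng) for _ in range(trials)) / trials
# For this linear model the mean population approaches k1/k2 = 10.
```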
Notice that the CME implies that the expectation of the marginal distribution of
Xt satisfies the equations
(d/dt) E(Xt) = ∑x∈S x (d/dt) p(t)(x)
= ∑x∈S ∑rj=1 ((x + νj) − x) λj(x) p(t)(x)
= ∑rj=1 νj (∑x λj(x) p(t)(x))
= ∑rj=1 νj E(λj(Xt)).
To see the second equality, observe a transition from x to x + νj: the term
λj(x) p(t)(x) appears exactly once in the sum for the state x, as the outflow
probability, and exactly once in the sum for the state x + νj, as the inflow
probability. This gives the term (x + νj) − x = νj multiplying λj(x) p(t)(x).
It is worth noting that, upon scaling the rate constants as will be explained in Sec-
tion 2.1.3, the equations for E(Xt) are equivalent to (2.4) only if all rate functions
are linear, that is, when all reactions are unimolecular.
2.1.2 Classical chemical kinetics
Conventional chemical kinetics handles ensembles of molecules with large numbers
of particles, 10^20 and more. The chemist uses concentrations rather than particle
numbers, [N] = N/(NA ⋅ V), where NA = 6.022 ⋅ 10^23 mol−1 is Avogadro’s number
and V is the volume (in dm3). When the pressure and temperature are constant,
the following continuous, deterministic model is appropriate.
Definition 2.4. Let (S,R) be a reaction system and z0 = (z1, ..., zn) ∈ Rn an initial
state of the system. Then, the continuous, deterministic model is the solution of
the set of n coupled differential equations given by
(d/dt) zi(t) = ∑rj=1 νij λj(z(t)), for i = 1,2, . . . , n, (2.4)
satisfying the initial condition z0. The family of functions {λj ∶ Rn → R ∣ j = 1, . . . , r},
called also the deterministic reaction rates, is defined by
λj(z) = kj ∏ni=1 zi^aij. (2.5)
The fact that the speed of a chemical reaction is proportional to the quantity of
the reacting substances is known as the law of mass action.
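For the birth-death system ∅ → S1 (rate k1), S1 → ∅ (rate k2), the deterministic model (2.4) reduces to dz/dt = k1 − k2z, which can be integrated numerically and compared against the closed form. A crude explicit-Euler sketch (any standard ODE solver would do):

```python
from math import exp

def euler(z0, deriv, t_end, dt=1e-4):
    """Explicit Euler integration of dz/dt = deriv(z); a crude sketch
    that suffices for this smooth one-dimensional problem."""
    z, t = z0, 0.0
    while t < t_end:
        z += dt * deriv(z)
        t += dt
    return z

# Mass-action ODE for the birth-death system: dz/dt = k1 - k2 * z,
# with closed form z(t) = k1/k2 + (z0 - k1/k2) e^{-k2 t}.
k1, k2, z0, t = 10.0, 1.0, 0.0, 2.0
z_num = euler(z0, lambda z: k1 - k2 * z, t)
z_exact = k1 / k2 + (z0 - k1 / k2) * exp(-k2 * t)
```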
2.1.3 Deterministic and stochastic rate constants
We mentioned above the existence of both the reaction rate constant kj and the
stochastic rate constant cj. Deterministic and stochastic rate constants are not
equivalent. When switching between the stochastic and the deterministic model,
a conversion of rates must be performed. In particular, the stochastic rate constant
depends on the volume and the arity of a reaction. In general, the conversion is
such that the stochastic rate function (2.2) applied to a state x ∈ Nn, and the
deterministic rate function (2.5) applied to the concentration vector xV −1 ∈ Rn,
relate as
λj(xV −1) = λj(x)V −1. (2.6)
The careful study of the above conversions is outlined in [41]. Intuitively, recall
(2.1) and observe that, as unimolecular reactions represent a spontaneous conversion
of a molecule, they should not be volume-dependent. In bimolecular reactions,
the stochastic rate cj will be proportional to 1/V, reflecting that two molecules
have a harder time finding each other within a larger volume. In general, whenever
no more than one copy of each species is consumed in a reaction rj of arity ∣aj∣, then
cj = kjV −(∣aj∣−1). The approximation cj ≈ kjV −(∣aj∣−1) also holds when all species are
highly abundant. More concretely, when each species is highly abundant, the
stochastic rate function for a reaction rj of arity ∣aj∣ can be approximated by
λj(x) = cj ∏ni=1 C(xi, aij) ≈ cj ∏ni=1 xi^aij; the deterministic law for its conversion
to concentration units is then λj(xV −1) = kj ∏ni=1 xi^aij V −aij = kjV −∣aj∣ ∏ni=1 xi^aij.
2.1.4 Random time change model and the thermodynamic limit
The so-called ‘thermodynamic limit’ is defined as the limit in which the reactant
populations xi and the system volume V all become infinitely large, but in such
a way that the reactant concentrations xi/V stay fixed. Importantly, the
thermodynamic limit is not a limit that the system actually approaches as a
consequence of its natural temporal evolution, nor as a result of experimental
intervention – it is an idealized state that is useful because it provides a convenient
approximation to macroscopic systems. Even though deterministic models
historically appeared first, they represent a particular approximation of the
stochastic model. We sketch the derivation of [2].
Denote by Rj(t) the number of times that the j-th reaction has happened by
time t. Then, the state of the system at time t is Xt = X0 + ∑rj=1 Rj(t)νj. The
value of Rj(t) is a random variable that can be described by a non-homogeneous
Poisson process with parameter ∫t0 λj(Xs)ds, that is, Rj(t) = ξj(∫t0 λj(Xs)ds).
Finally, the expression
Xt = X0 + ∑rj=1 ξj(∫t0 λj(Xs)ds) νj (2.7)
represents the evolution of the state Xt.
Denote by V the size of the volume in which the reactions take place. Introducing
the scaled states XVt = V −1Xt ∈ Rn and the scaled propensities
λ̃j(XV) = V −1λj(VXV), and denoting by ξ̃j(t) = ξj(t) − t the centered Poisson
process, the scaled version of (2.7) can be written as
XVt = XV0 + ∑rj=1 νj V −1 ξj(V ∫t0 λ̃j(XVs)ds) (2.8)
= XV0 + ∑rj=1 νj ∫t0 λ̃j(XVs)ds + V −1 ∑rj=1 νj ξ̃j(V ∫t0 λ̃j(XVs)ds). (2.9)
Letting V → ∞, the law of large numbers for the Poisson process ([2], Lemma
1.2) implies that V −1ξ̃j(V t) ≈ 0, and the process XVt follows – according to the
remaining terms – an ordinary differential equation, equivalent to the reaction-rate
equation (2.4). The performed limit is referred to as the thermodynamic limit.
To summarize, classical chemical kinetics faithfully describes the mean population
sizes either when all reactions are unimolecular [65], or when the system is in the
‘thermodynamic limit’ [43, 60]. When the assumptions underlying the classical
kinetics break down, the deterministic models can be not only less informative,
but also misleading [79].
2.2 Site-graphs
We now introduce site-graphs, which will facilitate the formal definition of a rule-
based model.
A site-graph is an undirected graph where typed nodes have sites, and edges form
a partial matching on the sites. Moreover, the sites which do not serve for forming
edges are called internal, and they are assigned a value from a predefined set. The
nodes of a site-graph usually represent protein names, and the sites of a node stand
for protein binding domains. Internal states are used to encode post-translational
modifications.
Let S denote the set of site labels, and I the set of internal values that can be
assigned to sites. The function I ∶ S → P(I) gives the set of internal values that
a site s can take: a site s can be evaluated only to the predefined set of values
I(s). If the site is used for creating a bond, its set of predefined internal values is
empty.
A rule-based model is defined over a fixed contact map, reflecting that a mod-
eler fixes the assumptions on the structure of proteins, which are intended to be
included in the model.
Definition 2.5. A contact map (CM) is a tuple (A, Σ, E, I), where A is the set
of node types, each node type being equipped with a set of sites, defined by a
signature map Σ ∶ A → P(S). The set E ⊆ {((A, s), (A′, s′)) ∣ A,A′ ∈ A, s ∈ Σ(A),
s′ ∈ Σ(A′), I(s) = I(s′) = ∅} is a set of predefined edge types.
In the following, we assume that site-graphs are defined over a contact map
C = (A, Σ, E, I).
Definition 2.6. A site-graph is a tuple G = (V, Type, I, E, ψ) with
(i) a set of nodes V,
(ii) a node type function Type ∶ V → A,
(iii) a node interface function I ∶ V → P(S), such that for v ∈ V,
I(v) ⊆ Σ(Type(v)),
(iv) a set of edges E ⊆ E, which is
symmetric: ((v, s), (v′, s′)) ∈ E iff ((v′, s′), (v, s)) ∈ E,
injective: if ((v, s), (v′, s′)) ∈ E and ((v, s), (v′′, s′′)) ∈ E, then (v′, s′) = (v′′, s′′),
irreflexive: for all v ∈ V, s ∈ S, ((v, s), (v, s)) ∉ E,
(v) a site evaluation function ψ ∶ {(v, s) ∣ v ∈ V, s ∈ I(v)} → I ∪ {ε}, such that
if I(s) ≠ ∅ then ψ(v, s) ∈ I(s). In words, if site s is an internal site, the
function ψ assigns an internal value to the node-site combination.
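A minimal Python encoding of Definition 2.6, checking the symmetry, injectivity and irreflexivity of the edge set (the names and representation are illustrative sketches, not Kappa or BioNetGen syntax):

```python
class SiteGraph:
    """Illustrative encoding of Definition 2.6."""
    def __init__(self, nodes, node_type, interface, edges, psi):
        self.nodes = nodes          # set of node identifiers
        self.node_type = node_type  # node -> agent type
        self.interface = interface  # node -> set of sites
        self.edges = set(edges)     # set of ((v, s), (v', s')) pairs
        self.psi = psi              # (v, s) -> internal value

    def is_valid(self):
        ends = {}
        for (p, q) in self.edges:
            # symmetric: every edge appears with its mirror image
            if (q, p) not in self.edges:
                return False
            # irreflexive: no port is linked to itself
            if p == q:
                return False
            # injective: each port is used by at most one edge
            if p in ends and ends[p] != q:
                return False
            ends[p] = q
        return True

# Two agents, of types A and B, bound via sites x and y:
g = SiteGraph(
    nodes={"u", "v"},
    node_type={"u": "A", "v": "B"},
    interface={"u": {"x"}, "v": {"y"}},
    edges={(("u", "x"), ("v", "y")), (("v", "y"), ("u", "x"))},
    psi={},
)
```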
Site-graphs will be used in three different contexts: (i) to model physically existing
groups of interacting complexes, termed reaction mixtures, and the connected
components within them, called species, (ii) to specify the local interaction
patterns – rewrite rules, (iii) to model those patterns/motifs whose quantities
within a reaction soup are to be tracked as the state of the stochastic model –
fragments. The interface function I in the above definition tracks the sites assigned
to a particular node of a site-graph. Reaction mixtures (and species) must have all
interfaces complete, in the sense that all the sites of a node’s signature are listed
in its interface. The patterns appearing in rules or fragments typically have
non-complete interfaces.
Notice that a CM is itself not necessarily a site-graph, since an edge may occur
between two copies of the same site. The set P = {(v, s) ∣ v ∈ V, s ∈ I(v)} ⊆ V × S
is called the set of ports. Given an edge e = (p, p′) ∈ E, we denote by ē the
symmetric edge (p′, p).
Definition 2.7. A site-graph G = (V,Type, I,E,ψ), such that for all v ∈ V ,
I(v) = Σ(Type(v)) is called a reaction mixture.
Definition 2.8. Given a site-graph G = (V, Type, I, E, ψ), a sequence of edges
(e1, . . . , ek) ∈ Ek, ei = ((vi, si), (v′i, s′i)), such that v′i = vi+1 and s′i ≠ si+1 for
i = 1, . . . , k − 1, is called a path between the nodes v1 and vk.
Definition 2.9. A site-graph G is connected if there exists a path between every two vertices v and v′.
Definition 2.10. A connected site-graph is called a pattern. A connected reaction
mixture is called a species.
Note that, unlike in the reaction system model (Definition 2.1), the set of all species may be infinite. For example, consider a set of sites S = {x, y}, such that ℐ(x) = ℐ(y) = ∅. Hence, both x and y serve as binding sites. Moreover, let A = {A, B}, Σ(A) = Σ(B) = {x, y}, and let the set of edge types be ℰ = {((A, y), (B, x)), ((A, x), (B, y))}. Potentially infinitely many connected reaction mixtures can be formed over this contact map.
Two site-graphs can be related by an embedding function, which is important for assigning the stochastic process to a rule-based model. The symmetry of a site-graph is formalized as a bijective embedding of a site-graph into itself, called an automorphism.
Definition 2.11. An embedding σ between site-graphs G = (V, Type, I, E, ψ) and G′ = (V′, Type′, I′, E′, ψ′) is induced by a support function σ∗ ∶ V → V′ if

(i) σ∗ is injective: for all v, v′ ∈ V, [σ∗(v) = σ∗(v′) ⟹ v = v′];

(ii) for all v ∈ V, Type(v) = Type′(σ∗(v)), and for all s ∈ S such that ℐ(s) ≠ ∅, ψ′(σ∗(v), s) = ψ(v, s);

(iii) for all v ∈ V, [s ∈ I(v) ⟹ s ∈ I′(σ∗(v))];

(iv) for all (v, s) ∈ V × S, if ℐ(s) = ∅, then ψ(v, s) = ψ′(σ∗(v), s); otherwise, for all (v′, s′) ∈ V × S, [if ((v, s), (v′, s′)) ∈ E, then σ∗(v) ∈ V′ and ((σ∗(v), s), (σ∗(v′), s′)) ∈ E′].
Notice that requirement (iv) enforces that both nodes forming an edge in a site-graph G are embedded in the site-graph G′. If σ∗ is bijective, then σ is an isomorphism. We denote that G is isomorphic to G′ by G ≅ G′. An isomorphism between G and itself is called an automorphism. If σ ∶ G1 → G2 is an isomorphism, we write G2 = σ(G1). The set of embeddings between site-graphs G and G′ will be denoted by Emb(G, G′), the set of isomorphisms between G and G′ by Iso(G, G′), and the set of automorphisms of a site-graph G by Aut(G). Set cardinality will be denoted by ∣ ⋅ ∣.
2.3 Rule-based models
A rule-based language is a formalism for specifying biochemical reaction systems in which the internal structure of molecular species is represented by site-graphs, where modifications of protein residues and bonds are explicitly encoded. The site-graph-rewrite language presented here is inspired by Kappa [34], even though the formalism does not fully coincide with that of Kappa, and is often referred to as a kernel of Kappa. For example, the release of a dangling bond (in Kappa syntax, written as an expression A(x!)→ A(x)) cannot be expressed in our framework.
Site-graphs are specifically designed for modeling molecular interactions and, to our knowledge, have not previously been studied within classical graph theory. On the other hand, graph rewrite grammars and graph rewrite transformation systems (the difference being that a transformation system has no initial state) have been studied in computer science from the late 1960s [73] to today [28, 83].
A rule-based program is given by an initial reaction mixture and a collection of rules over a fixed contact map. We first define a rule over a contact map C = (A, Σ, ℰ, ℐ). The shorthand notation ψA will be used to denote a mapping ψA ∶ Σ(A) → ℐ ∪ {ε}, a full valuation function for a node of type A in which the binding sites are evaluated to ε.
Definition 2.12. Let G, G′ be site-graphs, and let c ∈ R be a non-negative real number. The triple (G, G′, c), also denoted by G →c G′, is called a rule. A rule is well-defined if G′ = (V′, Type′, I′, E′, ψ′) can be derived from G = (V, Type, I, E, ψ) by a finite number of applications of five elementary site-graph transformations:

(i) adding an edge: δae(G, e) = (V, Type, I, E ∪ {e, ē}, ψ), where e = ((v, s), (v′, s′)) is such that v, v′ ∈ V, s ∈ I(v), s′ ∈ I(v′);

(ii) deleting an edge: δde(G, e) = (V, Type, I, E ∖ {e, ē}, ψ), where e, ē ∈ E;

(iii) changing a state value: δci(G, v′, s′, i′) = (V, Type, I, E, ψ′), where s′ ∈ I(v′), i′ ∈ ℐ(s′), and ψ′(v, s) = i′ if v = v′ and s = s′, and ψ′(v, s) = ψ(v, s) otherwise;

(iv) deleting a node: δdn(G, v) = (V′, Type, I, E′, ψ′), such that V′ = V ∖ {v}, ψ′ = ψ∣V′ (the restriction of the function ψ to the set of nodes V′), and E′ = E ∖ {e ∣ e or ē is incident to a port (v, s), for some site s};

(v) adding a node: δan(G, A, ψA) = (V ∪ {v′}, Type, I, E, ψ′), such that v′ ∉ V, Type(v′) = A, I(v′) = Σ(A), and ψ′(v, s) = ψA(s) if v = v′, and ψ′(v, s) = ψ(v, s) otherwise.
The interface function I is unaltered under any of the transformations: a site cannot be added to or deleted from a node's interface by a rule. Adding a node requires an evaluation of all internal sites in the signature of that node.
Definition 2.13. A rule-based program over a contact map C is a tuple (R, G0), where

(i) R = {R1, . . . , Rn} is a set of well-defined site-graph rewrite rules over the contact map C,

(ii) G0 is the initial reaction mixture over the contact map C.
Since our analysis will not be tailored specifically to the observable site-graphs
(those whose quantities are to be observed during model execution), we do not
include the set of observable site-graphs into the definition of a rule-based program.
Instead, for a proposed fragment set (to be defined later), we will assume that the
observables are all fragments from that set.
A rule-set is defined over a fixed contact map. However, if it is not explicitly given, the contact map can be inferred from a rule-set, as the union of the contact maps inferred from the lhs and rhs of each rule.
Definition 2.14. The contact map inferred from a site-graph G = (V, Type, I, E, ψ) is C = (A, Σ, ℰ, ℐ), where A = {Type(v) ∣ v ∈ V}, Σ(A) = ⋃{I(v) ∣ v ∈ V, Type(v) = A}, ℰ = {((A, s), (A′, s′)) ∣ ((v, s), (v′, s′)) ∈ E, Type(v) = A, Type(v′) = A′}, and ℐ(s) = {i ∣ ∃v ∈ V, s ∈ I(v), ψ(v, s) = i}.
Definition 2.15. Given contact maps C1 = (A1, Σ1, ℰ1, ℐ1) and C2 = (A2, Σ2, ℰ2, ℐ2), their union is the contact map C = (A, Σ, ℰ, ℐ), such that A = A1 ∪ A2, Σ(A) = Σ1(A) ∪ Σ2(A) for all A ∈ A, ℰ = ℰ1 ∪ ℰ2, and ℐ = ℐ1 ∪ ℐ2. If the signatures are such that Σ1(A) ∩ Σ2(A) = ∅ for all A ∈ A, we write C = C1 ⊎ C2.
Given an initial reaction mixture, the continuous-time Markov chain (CTMC)
assigned to a rule-based model takes values in the set of reaction mixtures reachable
from the initial one.
A rule Ri = (Gi, G′i, ci) ∈ R can be applied to a reaction mixture G if there exists an embedding between Gi and G. The application of a rule to a reaction mixture via a given embedding is formalized by the function

δi ∶ 𝒢 × (Vi → V) → P(𝒢),

which takes as input a reaction mixture G and a candidate support function σ∗ for an embedding between the lhs of the rule Ri and G. The result equals ∅ if σ∗ does not induce an embedding between Gi and G. Otherwise, if the rule does not include node creation, the result δi(G, σ∗) is uniquely determined for the chosen embedding, as is rigorously shown in [20]. Intuitively, the part of the mixture into which the lhs of the rule is embedded is transformed by the rule, whereas the rest of it remains unchanged (Figure 2.2). Finally, if one of the elementary transformations defining a rule includes the creation of a node of type A, there are countably many results of applying the rule to a reaction mixture: the node is added as defined in Definition 2.12. The name of the newly created node must be chosen from a predefined set of node names, which will be denoted by N. It will also be useful to adopt a naming convention, a bijective mapping ζ ∶ A × ℕ → N, which reflects a total order among the names reserved for nodes of the same type. Then, if the initial reaction mixture has k copies of nodes of type A, the nodes used to encode them will be ζ(A, 1), ζ(A, 2), . . . , ζ(A, k). If the current node set of a reaction mixture is V and a new node of type A is created, the corresponding node name will be ζ(A, j), where j is chosen uniformly at random from the set ℕ ∖ {i ∣ ζ(A, i) ∈ V}. For example, the node naming convention can be ζ(A, i) ∶= v_A^i, where A ∈ A, i ∈ ℕ.
2.4 Site-graph rigidity and counting automorphisms
Each embedding of the lhs of a rule into a reaction mixture is one randomly chosen combination of species which conforms to the lhs description. We will therefore be interested in the number of embeddings of a pattern F into a reaction mixture G. Loosely speaking,
Figure 2.2: Rule application. Rule R can be applied to a reaction mixture G via the embedding indicated by the dotted arrows. The result is the reaction mixture G′ = δi(G, {u ↦ v2, v ↦ v3}).
it is the number of distinct occurrences of the motif described by F inside the reaction mixture G.
Lemma 2.16. Let F be a pattern and G a reaction mixture over the contact map C. The number of embeddings of F into G that are distinct up to automorphism is denoted by mG(F), and it is given by

mG(F) ∶= ∣Emb(F, G)∣ / ∣Aut(F)∣.

Proof. (Sketch) Each embedding of the site-graph F into the site-graph G comes together with exactly ∣Aut(F)∣ embeddings, all of which determine exactly the same connected sub-site-graph of G. Therefore, the number of occurrences of the motif described by F in the reaction mixture G equals the total number of embeddings in Emb(F, G) divided by the number of automorphisms in Aut(F).
It is much easier to count the embeddings between two connected site-graphs than between general graphs. For ordinary graphs, deciding whether there exists an embedding between two graphs (the subgraph isomorphism problem) is known to be NP-complete. On the other hand, site-graphs enjoy the rigidity property, which ensures that an embedding between two connected site-graphs is fully defined by the image of one node.
Theorem 2.17. Let G = (V,Type, I,E,ψ) and G′ = (V ′,Type′, I ′,E′, ψ′) be two
connected site-graphs and let σ1 and σ2 be two embeddings between G and G′.
Then, for any node v ∈ V , we have: [σ∗1(v) = σ∗2(v) Ô⇒ σ1 = σ2].
Proof. Let us assume that we have two connected site-graphs G and G′, two embeddings σ1 and σ2 between G and G′, and a node v ∈ V such that σ∗1(v) = σ∗2(v). Let us consider another node v′ ∈ V. Since G is connected, there exists a path p = (e1, . . . , ek) between the nodes v and v′. Let p′ be the image of the path p under σ1. Since σ1 is an embedding, p′ is indeed a path in G′. By induction over n between 1 and k, writing en = ((vn, sn), (vn+1, sn+1)), it holds that σ∗1(vn+1) = σ∗2(vn+1) (and thus, for n = k, we get σ∗1(v′) = σ∗2(v′)). The induction step is proved as follows. Since σ1 and σ2 are both embeddings, we know that ((σ∗1(vn), sn), (σ∗1(vn+1), sn+1)) and ((σ∗2(vn), sn), (σ∗2(vn+1), sn+1)) are two edges of G′; by injectivity (Def. 2.6), σ∗1(vn) = σ∗2(vn) implies σ∗1(vn+1) = σ∗2(vn+1).
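Rigidity makes computing an embedding between connected site-graphs cheap: once the image of one node is fixed, the rest of the support function is forced by following edges port by port. The following is a minimal sketch under simplifying assumptions (graphs given as port maps, type and internal-state checks omitted); all names are illustrative:

```python
from collections import deque

# Rigidity sketch (Theorem 2.17): on a connected site-graph, an embedding
# is forced once the image of a single node is fixed. Graphs are given as
# port maps {(v, s): (v', s')}, each bond stored in both directions.

def extend_embedding(edges_g, edges_h, seed_v, seed_image):
    """Propagate sigma* from seed_v; return the forced node map or None."""
    sigma = {seed_v: seed_image}
    queue = deque([seed_v])
    while queue:
        v = queue.popleft()
        for (u, s), (w, t) in edges_g.items():
            if u != v:
                continue
            # the edge (v,s)-(w,t) must be matched at port (sigma[v], s)
            target = edges_h.get((sigma[v], s))
            if target is None or target[1] != t:
                return None        # no matching edge: not an embedding
            if w not in sigma:
                sigma[w] = target[0]
                queue.append(w)
            elif sigma[w] != target[0]:
                return None        # inconsistent images: injectivity fails
    return sigma

# G: a dimer a(r)-b(l); H: the same dimer with concrete node names v1, v2.
G = {("a", "r"): ("b", "l"), ("b", "l"): ("a", "r")}
H = {("v1", "r"): ("v2", "l"), ("v2", "l"): ("v1", "r")}
sigma = extend_embedding(G, H, "a", "v1")
assert sigma == {"a": "v1", "b": "v2"}
```

Because each node of the connected pattern is reached exactly once, the traversal is linear in the number of edges, in contrast to the exponential search needed for general subgraph isomorphism.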
Recall that a species is a connected reaction mixture. Let S = {S1, S2, . . .} represent the possibly infinite set of pairwise non-isomorphic species that can be formed in a given rule-based model. In other words, S captures the set of species up to isomorphism. The following result states that the quantity of any pattern can be expressed as a linear combination of the quantities of the species.
Lemma 2.18. For every pattern Fi and all reaction mixtures G ∈ 𝒢,

mG(Fi) = ∑_{Sj∈S} mG(Sj) mSj(Fi).
Proof. Let Fi = (Vi, Typei, Ii, Ei, ψi) and G = (V, Type, I, E, ψ). Take σ ∈ Emb(Fi, G). Let v ∈ Vi, and denote by ccG(σ∗(v)) the connected component of G which contains σ∗(v) ∈ V. It is isomorphic to some species Sj ∈ S; denote the isomorphism by σ′ ∶ Sj → ccG(σ∗(v)). Then, by construction, σ′⁻¹ ∘ σ ∶ Fi → Sj is an embedding between Fi and Sj. Therefore, each embedding σ ∈ Emb(Fi, G) uniquely determines a species Sj and σ′ ∈ Emb(Sj, G) such that σ′⁻¹ ∘ σ ∈ Emb(Fi, Sj). Conversely, for each embedding σ′ ∈ Emb(Sj, G) and σ1 ∈ Emb(Fi, Sj), we have σ′ ∘ σ1 ∈ Emb(Fi, G), so the claim follows.
2.5 Individual-based and species-based semantics of rule-based programs

Assume given a rule-based program (R, G0) over the contact map C = (A, Σ, ℰ, ℐ), where the initial mixture is written under the node naming convention ζ ∶ A × ℕ → {v_A^i ∣ A ∈ A, i ∈ ℕ}, defined by ζ(A, i) ∶= v_A^i. We now define the Markov graph assigned to a rule-based program.
Definition 2.19. A Markov graph (𝒢, w, p0) assigned to a rule-based program (R, G0), where G0 respects the naming convention ζ, is such that

(i) 𝒢 denotes the set of reachable reaction mixtures:

𝒢 = {G ∣ G is reachable by a finite number of applications of rules from R to G0};

(ii) the initial distribution is p0(G) = 1/∣𝒢 ∩ Iso(G0)∣ if G ∈ 𝒢 ∩ Iso(G0), and p0(G) = 0 otherwise;

(iii) for every G, G′ ∈ 𝒢,

w(G, G′) = ∑_{Ri∈R} ci ⋅ ∣{σ∗ ∣ G′ ∈ δi(G, σ∗)}∣.

Recall that, in the case of no node creation in rule Ri, there is a unique reaction mixture G′ ∈ δi(G, σ∗) for a chosen embedding (see [20] for a rigorous derivation). If the rule Ri includes the creation of a species, there are countably many reaction mixtures G′ ∈ δi(G, σ∗), which differ in the choice of names for the freshly created nodes, but all of them respect the naming convention ζ. Notice that each new node name is thus chosen uniformly at random from the set {ζ(A, i) ∣ i = 1, 2, . . .} ∖ V, where V is the set of nodes in the reaction mixture G.
Definition 2.20. Let Xt be the CTMC assigned to (𝒢, w, p0). Then, Xt is referred to as the individual-based (stochastic) semantics of (R, G0).
Since rules operate over node types, rather than individual, concrete node names, it is natural to turn to a species-based view of the reaction mixture: to identify all of those mixtures which are equivalent up to isomorphism:

G1 ∼ G2 iff there exists σ ∈ Iso(G1, G2) such that σ(G1) = G2.

Assuming a finite set of species produced by a rule-set, S = {S1, . . . , Sn}, each partition class is uniquely represented by a multi-set of species.
Definition 2.21. Define ϕS ∶ 𝒢 → X ⊂ N^n, so that

ϕS(G) = (x1, . . . , xn), with xi = mG(Si).

Conversely, let x ∈ N^n be a multi-set of species over the set of node types A = {A1, . . . , AN}. Every reaction mixture G = (V, Type, I, E, ψ) such that ϕS(G) = x contains the same copy numbers of nodes of types A1, A2, etc. The reaction mixtures which have fixed node names and node types, but possibly different site evaluations and edges, define the set ϕ⁻¹S(x):

ϕ⁻¹S,V,Type(x) = {G′ = (V, Type, I′, E′, ψ′) ∣ there is G = (V, Type, I, E, ψ) ∈ 𝒢 such that ϕS(G) = x, and ϕS(G′) = x}.

Then, the process X̄t over the state space X, defined by [X̄t = x iff Xt ∈ ϕ⁻¹S(x)], is referred to as the species-based semantics of (R, G0).
We will discuss the properties of the process X̄t in Chapter 5. It will then also be useful to know how many different reaction mixtures are lumped into a given species-based state x.
Theorem 2.22. Let x be a multiset of species with S = {S1, . . . , Sn} and A = {A1, . . . , AN}. Then,

∣ϕ⁻¹S(x)∣ = (a1! a2! ⋯ aN!) / (∏_{i=1}^{n} xi! ∣Aut(Si)∣^{xi}),   (2.10)

where ai denotes the number of nodes of type Ai in any site-graph G ∈ ϕ⁻¹S(x).
Proof. Let G ∈ 𝒢, with a set of nodes V, be a reaction mixture such that ϕS(G) = x, that is, it has xi different embeddings of the species Si, i = 1, . . . , n, up to automorphism. Assume that G contains ai nodes of type Ai, i = 1, . . . , N. Any other G′ ∈ ϕ⁻¹S(x) can be obtained from G by an isomorphism σ ∶ G → G′; denote the set of all such isomorphisms by Γx,G, so that ∣ϕ⁻¹S(x)∣ equals the number of distinct images σ(G) over Γx,G. The support function σ∗ ∶ V → V of an isomorphism σ ∈ Γx,G maps each node v to a node of the same type: Type(v) = Type(σ∗(v)). Denote the set of all maps with this property by Γ∗. The set Γ∗ has a1! ⋯ aN! elements. However, some support functions from Γ∗ determine exactly the same image, that is, for some σ∗, σ∗′ ∈ Γ∗, we have σ(G) = σ′(G) = G′. We now determine how much over-counting was done. Firstly, consider a connected subgraph of G which forms a species Si, and denote its set of nodes by Vi. Every two maps σ∗, σ′∗ ∈ Γ∗ whose restrictions to Vi differ by an automorphism of Si determine the same image; there are ∏_{i=1}^{n} ∣Aut(Si)∣^{xi} such maps. Moreover, consider the species Si, and let Vi1, . . . , Vixi denote the node sets of all subgraphs of G which form the species Si. Let σ∗∣Vi1, . . . , σ∗∣Vixi be the restrictions of σ∗ to the corresponding node sets. Then, every map σ∗′ whose sequence of restrictions (σ∗′∣Vi1, . . . , σ∗′∣Vixi) is a permutation of (σ∗∣Vi1, . . . , σ∗∣Vixi) determines the same image. There are xi! such permutations for each species Si, and ∏_{i=1}^{n} xi! such maps over all species.
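Formula (2.10) is easy to evaluate; the following sketch uses as input the two species-states {SAB, SB, SBC} and {SABC, 2SB} that appear in Example 2.1 below (all names are illustrative):

```python
from math import factorial

# A sketch of formula (2.10): the number of concrete reaction mixtures
# lumped into one species-based state x. `type_counts` gives a_i per node
# type; `state` lists (copy number x_i, |Aut(S_i)|) per species present.

def lumped_mixtures(type_counts, state):
    numerator = 1
    for a in type_counts.values():
        numerator *= factorial(a)
    denominator = 1
    for copies, auts in state:
        denominator *= factorial(copies) * auts ** copies
    return numerator // denominator

# Both states use one A-node, three B-nodes and one C-node, and every
# involved species has a trivial automorphism group.
counts = {"A": 1, "B": 3, "C": 1}
x3 = [(1, 1), (1, 1), (1, 1)]   # S_AB, S_B, S_BC: one copy each
x4 = [(1, 1), (2, 1)]           # S_ABC once, S_B twice
assert lumped_mixtures(counts, x3) == 6
assert lumped_mixtures(counts, x4) == 3
```

The 6 : 3 ratio of lumped mixtures for these two states is exactly the 2 : 1 probability ratio recovered in equation (3.4) of Chapter 3.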
2.6 Examples
We now illustrate how the classical deterministic and stochastic models are assigned to a rule-based program. The rule-based program is first expanded into its equivalent reaction system.
Example 2.1. (Simple scaffold) Scaffold protein B recruits the proteins A and C independently. These assumptions are captured by a set of rules, {R1, R2, R−1, R−2}, depicted in Figure 2.3. Adding the rules R3, R4 accelerates the unbinding whenever the bond is within a trimer complex (that is, the bonds are made less stable within a trimer).
Figure 2.3: Rule-set for Example 2.1.
The corresponding reaction system is (S, R), where S = {SA, SB, SC, SAB, SBC, SABC} and R = {rA.B, rB.C, rA.BC, rAB.C, rA..B, rB..C, rA..BC, rAB..C}, defined by

rA.B ∶ SA, SB →k1 SAB
rA.BC ∶ SA, SBC →k1 SABC
rB.C ∶ SB, SC →k2 SBC
rAB.C ∶ SAB, SC →k2 SABC
rA..B ∶ SAB →k1− SA, SB
rA..BC ∶ SABC →k1− SA, SBC
rB..C ∶ SBC →k2− SB, SC
rAB..C ∶ SABC →k2− SAB, SC.
The consumption vectors and change vectors are the column vectors of the matrices P and C, respectively:

P =
⎛ 1 1 0 0 0 0 0 0 ⎞
⎜ 1 0 1 0 0 0 0 0 ⎟
⎜ 0 0 1 1 0 0 0 0 ⎟
⎜ 0 0 0 1 1 0 0 0 ⎟
⎜ 0 1 0 0 0 0 1 0 ⎟
⎝ 0 0 0 0 0 1 0 1 ⎠

C =
⎛ −1 −1  0  0  1  1  0  0 ⎞
⎜ −1  0 −1  0  1  0  1  0 ⎟
⎜  0  0 −1 −1  0  0  1  1 ⎟
⎜  1  0  0 −1 −1  0  0  1 ⎟
⎜  0 −1  1  0  0  1 −1  0 ⎟
⎝  0  1  0  1  0 −1  0 −1 ⎠
while, according to the mass-action law, the rate function has the following form:

λ(z) = (k1 zA zB, k1 zA zBC, k2 zB zC, k2 zAB zC, k1− zAB, k1− zABC, k2− zBC, k2− zABC).
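As a sanity check, the matrices P and C can be rebuilt from the reaction list: each consumption column counts the reactants, and each change column is products minus reactants. A short sketch (all names illustrative):

```python
# Rebuild P and C of Example 2.1 from the reaction list. Species order
# (SA, SB, SC, SAB, SBC, SABC); reactions in the column order of P and C.

species = ["A", "B", "C", "AB", "BC", "ABC"]
reactions = [  # (reactants, products)
    (["A", "B"], ["AB"]),   (["A", "BC"], ["ABC"]),
    (["B", "C"], ["BC"]),   (["AB", "C"], ["ABC"]),
    (["AB"], ["A", "B"]),   (["ABC"], ["A", "BC"]),
    (["BC"], ["B", "C"]),   (["ABC"], ["AB", "C"]),
]

def vec(mult):
    """Multiset of species names -> count vector in the fixed species order."""
    return [mult.count(s) for s in species]

P_cols = [vec(react) for react, _ in reactions]
C_cols = [[p - r for r, p in zip(vec(react), vec(prod))]
          for react, prod in reactions]

P_rows = [list(row) for row in zip(*P_cols)]
C_rows = [list(row) for row in zip(*C_cols)]

assert P_rows[0] == [1, 1, 0, 0, 0, 0, 0, 0]     # SA consumed by rA.B, rA.BC
assert C_rows[5] == [0, 1, 0, 1, 0, -1, 0, -1]   # SABC row of C
# every reaction conserves the number of B-nodes (one in SB, SAB, SBC, SABC)
for col in C_cols:
    assert col[1] + col[3] + col[4] + col[5] == 0
```

The final loop checks a conservation law that the change matrix must satisfy: no rule creates or destroys a B-node.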
Deterministic model. Denote by z ∈ R^6 the vector of concentrations of the species in S. For transparency, let zA denote the concentration of species SA, zB the concentration of SB, etc. The continuous, deterministic model is given
by the set of ordinary differential equations:
dzA/dt = −k1 zA zB − k1 zA zBC + k1− zAB + k1− zABC
dzB/dt = −k1 zA zB − k2 zB zC + k1− zAB + k2− zBC
dzC/dt = −k2 zB zC − k2 zAB zC + k2− zBC + k2− zABC
dzAB/dt = k1 zA zB − k1− zAB − k2 zAB zC + k2− zABC
dzBC/dt = k2 zB zC − k2− zBC − k1 zA zBC + k1− zABC
dzABC/dt = k1 zA zBC + k2 zAB zC − k1− zABC − k2− zABC.
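The ODE system can be integrated numerically; below is a crude forward-Euler sketch (the rate values match those of Figure 2.5, the step size is arbitrary). Since no reaction creates or destroys nodes, the totals zA + zAB + zABC and zB + zAB + zBC + zABC must stay constant along the trajectory:

```python
# Forward-Euler integration of the deterministic model of Example 2.1.

def rhs(z, k1, k2, k1m, k2m):                   # k1m stands for k1-
    zA, zB, zC, zAB, zBC, zABC = z
    return [-k1*zA*zB - k1*zA*zBC + k1m*zAB + k1m*zABC,
            -k1*zA*zB - k2*zB*zC + k1m*zAB + k2m*zBC,
            -k2*zB*zC - k2*zAB*zC + k2m*zBC + k2m*zABC,
            k1*zA*zB - k1m*zAB - k2*zAB*zC + k2m*zABC,
            k2*zB*zC - k2m*zBC - k1*zA*zBC + k1m*zABC,
            k1*zA*zBC + k2*zAB*zC - k1m*zABC - k2m*zABC]

z = [1.0, 3.0, 1.0, 0.0, 0.0, 0.0]              # z(0)
dt = 1e-3
for _ in range(10000):                          # integrate up to t = 10
    dz = rhs(z, k1=1.0, k2=0.2, k1m=2.0, k2m=0.3)
    z = [zi + dt*dzi for zi, dzi in zip(z, dz)]

assert abs(z[0] + z[3] + z[5] - 1.0) < 1e-9          # A-nodes conserved
assert abs(z[1] + z[3] + z[4] + z[5] - 3.0) < 1e-9   # B-nodes conserved
```

The conservation checks hold exactly (up to floating-point rounding) because the corresponding rows of the right-hand side cancel term by term, even under the discretized update.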
Stochastic model. Assume that there are initially three copies of agent B, one copy of agent A and one copy of agent C, represented by the population state x0 = (1, 3, 1, 0, 0, 0). For transparency, we will represent states as multi-sets, for example, x0 ≡ {A, 3B, C}. The stochastic model is a CTMC Xt with a Markov graph (S, w, p0), such that p0(x0) = 1, S = {x0, x1, x2, x3, x4}, and the weights are as depicted in Figure 2.4.
Figure 2.4: Markov graph for x0 ≡ {A, 3B, C}. The states are x0 ≡ {A, 3B, C}, x1 ≡ {AB, 2B, C}, x2 ≡ {A, 2B, BC}, x3 ≡ {AB, B, BC} and x4 ≡ {ABC, 2B}.
Denoting p(t)(x) = P(Xt = x), the CME is represented by the following system of equations (the superscript (t) is omitted):

dp(x0)/dt = c1− p(x1) + c2− p(x2) − p(x0)(3c1 + 3c2)
dp(x1)/dt = 3c1 p(x0) + c2− p(x3) + c2− p(x4) − p(x1)(c1− + 2c2 + c2)
dp(x2)/dt = 3c2 p(x0) + c1− p(x3) + c1− p(x4) − p(x2)(c2− + 2c1 + c1)
dp(x3)/dt = 2c2 p(x1) + 2c1 p(x2) − p(x3)(c2− + c1−)
dp(x4)/dt = c2 p(x1) + c1 p(x2) − p(x4)(c2− + c1−).
Figure 2.5: Deterministic and stochastic models for Example 2.1. a) For volume V = 20v, the solution z(t) of the deterministic model with initial state z(0) = (1, 3, 1, 0, 0, 0)v, and one scaled trajectory of a stochastic simulation x^(V/v)(t), for initial state x(0) = (20, 60, 20, 0, 0, 0) (number of molecules). Rate values are set to k1 = 1v−1s−1, k2 = 0.2v−1s−1, k1− = 2v−1s−1, k2− = 0.3v−1s−1, and c1 = 1s−1(V/v)−1, c2 = 0.2s−1(V/v)−1, c1− = 2s−1, c2− = 0.3s−1. b) We integrated the CME for two initial states: x1(0) = (1, 3, 1, 0, 0, 0) (the five equations of the model presented in Figure 2.4) and x2(0) = (20, 60, 20, 0, 0, 0) (a set of 3113 equations). The three plots represent: (solid lines) the solution z(t) of the deterministic model with initial state z(0) = (1, 3, 1, 0, 0, 0)v, (dashed lines) the scaled mean population of each species for initial state x1(0), that is, (1/3) E[X1(t)], and (dotted lines) the scaled mean population of each species for initial state x2(0), that is, (1/20) E[X2(t)].
In Figure 2.5a, we show the solution of the model in the deterministic limit, and one trajectory of the stochastic model scaled with the volume. In Figure 2.5b, we illustrate that, due to the bimolecular reactions, the mean population size does not coincide with the solution in the deterministic limit. The rate constant values used are not inspired by real data. A volume unit is denoted by v. In order to compare the deterministic and stochastic models, we assumed that the volume scales with the total molecule number; more precisely, that one volume unit corresponds to five molecules. Therefore, for the initial state of the stochastic model x(0) = (20, 60, 20, 0, 0, 0) (molecules), the total of 100 molecules occupies 20 units – V = 20v.
Two more working examples are introduced. The first, a model of two-sided polymerization, will be used to demonstrate that, with fragment-based reductions, the state space of the fragment-based Markov graph can be exponentially smaller than the state space of the species-based Markov graph. The second example has only one node type with three sites, and it will be used to discuss the difference between deterministic and stochastic fragments.
Example 2.2. (Two-sided polymerization) The rule-based model shown in Figure 2.6 describes an alternating polymerization between nodes A and B. The rules R3, R−3, R4 and R−4 describe the binding and unbinding events. Moreover, each node has two activation levels, modelled by an internal state (a for node A and b for node B), which are regulated independently of the bindings (rules R1, R−1, R2, R−2). We assume that the binding between the nodes can be accelerated if both nodes are in the active mode. This is incorporated by rule R∗3. Hence, the total rate of binding between an activated A and an activated B is c3 + c∗3.
Figure 2.6: The rule-set described in Example 2.2. The shaded site over node type A represents a different internal state.
Example 2.3. (Conditional independence) Assume an agent A with three sites: l (left), r (right) and c (control), such that each of them has two internal modifications, denoted by 0 and 1. The site c can change its internal value independently of the other two sites. The sites l and r may change their internal value only if the site c has value 1. The model is sketched in Figure 2.7.
Figure 2.7: The model described in Example 2.3.
Chapter 3

Automated reductions of rule-based models
This chapter outlines general properties of fragment-based reductions. These reductions were termed fragment-based by Feret and co-authors, who used them for automatically reducing the deterministic semantics of rule-based models [31]. The main focus of our work is using the fragment-based technique for reducing the stochastic semantics of rule-based models, that is, characterizing the stochastic fragments and computing their dynamics.
The motivation for performing fragment-based reductions is the following. A small number of rules can generate a system with an astronomically large state space [1, 54], often rendering the expansion to the species-based description infeasible even to write down. For example, if proteins D and E can bind via one bond, and each has n domains which can all take two possible internal states, then there are 2^n + 2^n + 2^n ⋅ 2^n different molecular species formed by these two molecules alone (for instance, equal to 16640 already for n = 7). With such a huge state space, it becomes prohibitive to analyze the rule-set by expanding it to its equivalent species description. However, since the huge state space emerges from a small number of rules operating over patterns, there is hope to capture the dynamics of a rule-set compactly, as a function of patterns, which are much fewer than full molecular species. For that reason, we turn to detecting those patterns, called fragments, which can faithfully describe the dynamics of a rule-set. The term ‘fragment’ is chosen in the sense that it is syntactically represented as a fragment of a full species, which is opposite to the extensional characterization of a fragment as the set of species into which it can embed.
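The species count above can be checked with a one-line function (the name is illustrative):

```python
# A protein with n two-state domains has 2**n internal configurations, so
# D alone, E alone, and the bound D-E dimer give 2**n + 2**n + 2**n * 2**n
# distinct molecular species in total.

def species_count(n):
    return 2**n + 2**n + 2**n * 2**n

assert species_count(7) == 16640   # the number quoted in the text
```

The dominant term 2^n ⋅ 2^n = 4^n comes from the bound dimer, whose internal state combines the configurations of both proteins.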
To exemplify, consider Example 2.1, and a projection from a system state z(t) to a state z̄(t) with three components zA, zB?, zAB?, such that

zA(t) = zA(t),                        (3.1)
zB?(t) = zB(t) + zBC(t),
zAB?(t) = zAB(t) + zABC(t).

Looking back at the system of ODEs in Section 2.6, since differentiation is a linear operator, the derivatives of the new variables compute to

dzA/dt = −k1 zA zB? + k1− zAB?,       (3.2)
dzB?/dt = −k1 zA zB? + k1− zAB?,
dzAB?/dt = k1 zA zB? − k1− zAB?.

The system (3.2) operates only over the variables zA, zB?, zAB?; that is, it self-consistently describes their dynamics. By solving the smaller system (3.2), the full dynamics of the concrete system is not recovered, but meaningful information about the original system is obtained.
The system (3.2) is exactly the deterministic semantics of the reaction model

FA, FB? →k1 FAB?                      (3.3)
FAB? →k1− FA, FB?

operating over three ‘abstract species’, denoted by FA, FB? and FAB?. These ‘abstract species’ are called fragments. In particular, notice that, for example, the contribution to fragment FB? from rule R2 is zero: SB is consumed at rate k2 zB zC, while SBC is produced at the same rate. In total, the two terms cancel out, and we say that rule R2 is silent with respect to FB?.
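The exactness of the lumping (3.1)–(3.2) can be checked numerically: integrating the full six-species system and the three-fragment system side by side, the fragment variables must track the lumped sums, because the projection commutes with the dynamics. A sketch with arbitrary rate and step values:

```python
# Full six-species model vs. the reduced fragment model (3.2) of Example 2.1.

k1, k2, k1m, k2m = 1.0, 0.2, 2.0, 0.3           # k1m stands for k1-

def full_rhs(z):
    zA, zB, zC, zAB, zBC, zABC = z
    return [-k1*zA*zB - k1*zA*zBC + k1m*zAB + k1m*zABC,
            -k1*zA*zB - k2*zB*zC + k1m*zAB + k2m*zBC,
            -k2*zB*zC - k2*zAB*zC + k2m*zBC + k2m*zABC,
            k1*zA*zB - k1m*zAB - k2*zAB*zC + k2m*zABC,
            k2*zB*zC - k2m*zBC - k1*zA*zBC + k1m*zABC,
            k1*zA*zBC + k2*zAB*zC - k1m*zABC - k2m*zABC]

def frag_rhs(y):
    zA, zBq, zABq = y                           # (z_A, z_B?, z_AB?)
    return [-k1*zA*zBq + k1m*zABq,
            -k1*zA*zBq + k1m*zABq,
            k1*zA*zBq - k1m*zABq]

z = [1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
y = [1.0, 3.0, 0.0]                             # projection (3.1) of z(0)
dt = 1e-3
for _ in range(5000):
    z = [a + dt*b for a, b in zip(z, full_rhs(z))]
    y = [a + dt*b for a, b in zip(y, frag_rhs(y))]

assert abs(y[1] - (z[1] + z[4])) < 1e-9         # z_B?  = z_B  + z_BC
assert abs(y[2] - (z[3] + z[5])) < 1e-9         # z_AB? = z_AB + z_ABC
```

The agreement holds to floating-point precision at every step, not just asymptotically, which is the self-consistency property that defines an exact fragmentation.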
Fragment-based reductions aim to derive the system (3.3) immediately, in contrast to first expanding the equivalent species-based description and then detecting symmetries in the equations. It is therefore important to distinguish fragment-based reductions from other principled model simplification techniques, based on, for example, separating time-scales [44, 55, 75] or exploiting conservation laws [9, 14]. In fragment-based reductions, the species-based system is considered only for the purpose of proving the relation between the reduced and the original model. Still, once a fragment-based rule set is obtained, it is amenable to any further analysis.
To this end, fragment-based reductions follow the idea of static program analysis by abstract interpretation. Abstract interpretation is a unifying formal framework for providing partial answers about mathematical structures when the actual problem is computationally expensive or even undecidable [16], while static analysis is a preprocessing analysis which draws conclusions about program semantics before execution. The theory of abstract interpretation is independent of a particular application. In general, the applicability of the abstract interpretation framework has two constraints: (i) the choice of the abstract domain needs to ensure the desired relation between the concrete and abstract semantics (soundness), and (ii) the way of computing the abstract semantics should be tractable.
The choice of the abstract domain in our case is the fragment-based domain. The
relation between the species-based and fragment-based domain will be discussed
in Chapter 5 and Chapter 7. Computing the abstract semantics is done by trans-
lating the rule-set, so that the concrete semantics of the new rule-set provides the
abstract semantics. Such an approach is tractable, and also practically appealing,
since it can be fed into any general-purpose rule-based quantitative analysis tool
(simulation engine).
3.1 Stochastic fragments: Motivating example
We start by elaborating the idea of stochastic fragments on Example 2.1. In Figure 3.1a, the stochastic model for initially one copy of free SA, one copy of free SC and three copies of free SB is represented. The description in terms of the fragments FA, FB?, FAB?, FC, F?BC means that the states x3 and x4 are indistinguishable. Let
x34 ∶= x3 + x4. Then, we can compute the evolution of the fragment-based states:

dp(x34)/dt = dp(x3)/dt + dp(x4)/dt
           = 3c2 p(x1) + 3c1 p(x2) − (c2− + c1−)(p(x3) + p(x4))
           = 3c2 p(x1) + 3c1 p(x2) − (c2− + c1−) p(x34),
dp(x1)/dt = 3c1 p(x0) + c2− p(x3) + c2− p(x4) − p(x1)(c1− + 2c2 + c2)
          = 3c1 p(x0) + c2− p(x34) − p(x1)(c1− + 2c2 + c2),
dp(x2)/dt = 3c2 p(x0) + c1− p(x3) + c1− p(x4) − p(x2)(c2− + 2c1 + c1)
          = 3c2 p(x0) + c1− p(x34) − p(x2)(c2− + 2c1 + c1).
As the above set of equations is self-consistent, the CTMC in Figure 3.1b can be
used to compute the transient distribution of the lumped process. In the later
development of the theory, it will be shown that it can also be used to compute
the trace distribution of the lumped process.
Another property which will be discussed is that the conditional probability of being in state x3 or x4 can be recovered from that of x34 (introduced as ‘invertibility’ in Chapter 4). In particular, the theory will imply that the ratio between the probabilities p(t)(x3) and p(t)(x4) can be reconstructed as a ratio of automorphism counts of the site-graphs which represent the states x4 and x3, respectively:

p(t)(x3) / p(t)(x4) = ∣Aut({SABC, 2SB})∣ / ∣Aut({SAB, SB, SBC})∣ = 2/1.   (3.4)
We show that (3.4) holds. Let ∆(t) ∶= (1/2) p(t)(x3) − p(t)(x4). Then,

d∆(t)/dt = −(c2− + c1−) ∆(t)

has the unique solution ∆(t) = ∆(0) e^{−(c2− + c1−) t}, meaning that the probability of being in state x3 converges to exactly two times the probability of being in state x4; combined with the self-consistency derivation, it follows that p(t)(x3) = (2/3) p(t)(x34). If ∆(0) = 0, the ratio between the probabilities holds at all times, and otherwise it holds asymptotically.
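The decay of ∆(t) can be checked numerically even for ∆(0) ≠ 0. The sketch below integrates the five-state chain of Figure 2.4 (state order x0, . . . , x4, with x3 ≡ {AB, B, BC} and x4 ≡ {ABC, 2B}) from an arbitrary initial distribution; rate and step values are arbitrary:

```python
import math

# Check that Delta(t) = p(x3)/2 - p(x4) decays as Delta(0) e^{-(c2- + c1-) t},
# independently of what the rest of the distribution does.

c1, c2, c1m, c2m = 1.0, 0.2, 2.0, 0.3           # c1m stands for c1-
Q = {(0, 1): 3*c1, (0, 2): 3*c2, (1, 0): c1m, (1, 3): 2*c2, (1, 4): c2,
     (2, 0): c2m, (2, 3): 2*c1, (2, 4): c1, (3, 1): c2m, (3, 2): c1m,
     (4, 1): c2m, (4, 2): c1m}

def step(p, dt):
    out = [p[k]*sum(r for (i, _), r in Q.items() if i == k) for k in range(5)]
    inn = [sum(p[i]*r for (i, j), r in Q.items() if j == k) for k in range(5)]
    return [pk + dt*(ik - ok) for pk, ik, ok in zip(p, inn, out)]

p = [0.2, 0.2, 0.2, 0.4, 0.0]                   # Delta(0) = 0.4/2 - 0 = 0.2
dt, n = 1e-4, 10000                             # integrate up to t = 1
for _ in range(n):
    p = step(p, dt)

delta = p[3]/2 - p[4]
predicted = 0.2 * math.exp(-(c2m + c1m) * 1.0)  # Delta(0) e^{-(c2- + c1-) t}
assert abs(delta - predicted) < 1e-3
```

Because ∆ satisfies a closed linear equation, the forward-Euler iteration reproduces the exponential decay up to discretization error, regardless of the initial distribution chosen.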
Finally, notice that if, for example, the rate of unbinding SABC were stronger than the rates of unbinding SAB or SBC separately, it would not be possible to write the equations for dp(x1)/dt and dp(x2)/dt as functions of p(x34). In this case,
Figure 3.1: Stochastic fragments: motivating example. a) The Markov graph for x0 ≡ {SA, 3SB, SC}; b) the fragment-based Markov graph.
the proposed fragmentation is not expressive enough, since it cannot express a
quantity which is necessary for the correct description of fragments’ dynamics.
Consequently, any proposed reduction with the same choice of fragments will not
be exact.
The goal of exact fragment-based reductions of stochastic rule-based models is to generalize these observations, so that the presented reduction can be detected and performed on any rule-based program. The input to the fragmentation process is (i) the set of observables: species, patterns, or their combinations within a reaction soup (for example, we may be interested in the average copy number of SA and SC, or in the probability of being in a state with 100 copies of pattern FAB? and 100 copies of pattern F?BC), and (ii) the rule-set. The fragments should be chosen so that the dynamics of the observables can be correctly and self-consistently computed from the fragment-based description.
The detection of fragments involves characterizing the states of the CTMC that can be lumped, and boils down to detecting groups of sites that a rule-set must simultaneously 'know' in order to execute the rules correctly. For example, executing rule R3 in Example 2.1 demands determining whether the species SABC embeds into the current reaction mixture, implying that the correlation between the values of sites a and c on node type B must be maintained.
In the rest of this chapter, we address three general questions: (i) what are fragments, and what is the fragment-based semantics of a rule-based model; (ii) how to evaluate the reduction with fragments; and (iii) how to compute the fragment-based semantics efficiently.
3.2 Fragments
One way to formalize how fragments emerge from a rule-set is by grouping sites of
the same node type according to a binary relation. In this work, we will require
that this binary relation is an equivalence relation.
Definition 3.1. An annotation of a contact map (A, Σ, E, I) is a family of equivalence relations {∼A ⊆ Σ(A) × Σ(A) ∣ A ∈ A}. Let C ∶ A → P(P(S)) be such that C(A) = Σ(A)/∼A, for A ∈ A. The elements of C(A) will be called the annotation classes of node type A.
The informal meaning of s ∼A s′ is that the correlation between the values of sites s and s′ in a node of type A should be maintained. Recall that a pattern is a connected site-graph over a contact map C.
Definition 3.2. Let P be the set of all patterns over a contact map C. The set of fragments induced by the annotation {∼A}A∈A with contact map C is

F = {F ∈ P ∣ F = (V, Type, I, E, ψ) such that, for all v ∈ V, if Type(v) = A then I(v) ∈ C(A)}.
In words, the interface of a node in a fragment must equal exactly one of the classes induced by the annotation. When all sites of a node type A are correlated by ∼A, the set of fragments equals the set of species; therefore, whenever C(A) = {Σ(A)}, we deal with the species-based description. On the other extreme, the biggest reduction is achieved when the relation ∼A is the diagonal, that is, when each site is correlated only with itself.
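An annotation class computation amounts to closing a set of correlated site pairs under transitivity. A small sketch (the site names follow Example 2.1, where node type B carries sites a and c; the function name is illustrative):

```python
# Compute annotation classes C(A) = Sigma(A)/~A from a list of correlated
# site pairs, by merging overlapping groups (a tiny union-find).
def annotation_classes(sites, pairs):
    classes = [{s} for s in sites]
    for s, t in pairs:
        cs = next(c for c in classes if s in c)
        ct = next(c for c in classes if t in c)
        if cs is not ct:
            classes.remove(ct)
            cs |= ct           # merge the two groups
    return sorted(map(sorted, classes))

# Node type B of Example 2.1, with sites a and c:
print(annotation_classes(['a', 'c'], [('a', 'c')]))  # species-based: [['a', 'c']]
print(annotation_classes(['a', 'c'], []))            # diagonal:      [['a'], ['c']]
```

The first call corresponds to the annotation of Figure 3.2a (sites a and c correlated, yielding the six species), the second to Figure 3.2b (sites separated, yielding the six fragments).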
In Example 2.1, there are exactly two possible annotations, both represented in
Figure 3.2.
There are different ways of understanding fragments. In a static context, in the
intensional view, a fragment is a conceptual definition, which does not include
global knowledge about the rule-based system in which this fragment appears. On
the other hand, in the extensional view, a fragment stands for the set of species
which conform to the description of this fragment within the given rule-set. For
Figure 3.2: The two possible annotations for Example 2.1 and the respective sets of fragments. a) The annotation on the contact map which defines the six species. b) The annotation where the sites a and c of node type B are separated and six fragments emerge.
example,

intensional view ∶ fragment = pattern FB?
extensional view ∶ fragment = set of species {SB, SBC}.
The extensional view will be useful for proving properties related to the concrete, species-based world. In algorithms, we will use the intensional view, because we want to avoid the expansion to the reaction- and species-based world.
The dynamical notion of fragments is related to the semantics of interest. For example, the fragmentation {FA, FB?, FAB?} with system (3.2) is appropriate for describing the deterministic semantics, because (3.1) holds; but this does not directly imply that the same set of fragments is appropriate for describing the stochastic semantics.
3.3 Fragment-based semantics
In analogy to the definition of the species-based semantics of a rule-based program, the fragment-based view lumps all the reaction mixtures which are indistinguishable when counting the number of occurrences of each fragment. Recall that the number of distinct embeddings of fragment Fi in the mixture G is mG(Fi) = ∣Emb(Fi, G)∣ / ∣Aut(Fi)∣.
Definition 3.3. Given a set of fragments F = {F1, F2, . . . , Fm}, let ϕF ∶ G → Y ⊂ N^m be such that

ϕF(G) = (y1, . . . , ym), with yi = mG(Fi),

and define

ϕ−1F(y) = {G′ ∈ G ∣ ϕF(G′) = y}.

The process Yt over the state space Y, defined by [Yt = y iff Xt ∈ ϕ−1F(y)], is referred to as the fragment-based semantics of (R, G0).
3.4 Reduction with fragments
Fragments provide potentially much smaller models than their species-based coun-
terparts. We investigate how much reduction can be achieved in relation to the
traditional, species-based view. Two parameters will be considered: (i) The di-
mension of the fragmentation; (ii) The number of states in a CTMC, for initially
n interacting particles.
The dimension of a fragmentation refers to the number of equations to be solved in
the deterministic model, or the dimension of the CTMC assigned to a rule-based
program. For example, the number of both species and fragments in Example 2.1
equals six, but the dimension of the fragmentation is five, because the quantity of
the fragment FAB? can be inferred from the quantities of other fragments:
mG(FAB?) =mG(F?B) +mG(F?BC) −mG(FB?),
This follows from the simple fact that the total number of nodes of type B can be expressed either as the sum of those bound to A and those not bound to A, or as the sum of those bound to C and those not bound to C. The actual size of the CTMC can greatly
vary depending on the configuration of the initial state, that is, the distribution
of the n particles among the node types. In Example 2.1, if there is only one node
of type SB initially, the species-based CTMC will always have at most five states,
and the fragment-based CTMC will have at most four states, even if the particles
SA and SC are highly abundant. For that reason, we assume the case where the
number of each node type is initially equal, because it often provides the maximal
variety among the reachable states.
We first introduce the fragment-based state space Y as a lumping of the species-based state space X under a function ϕ. Recall the functions ϕS ∶ G → X and ϕF ∶ G → Y.
Theorem 3.4. Let ϕ ∶ X → Y be such that
ϕ(x) = ϕF(G), where G ∈ ϕ−1S,V,Type(x),
for some node set V and type function Type ∶ V → A. Then, ϕ is unambiguously
defined for all x ∈ X .
Proof. Assume that for some G1, G2 ∈ G, ϕS(G1) = ϕS(G2) = (x1,⋯, xn), and let ϕF(G1) = (y1, . . . , ym). Recall that, by Lemma 2.18 (Chapter 2), for every fragmentation F and for all reaction mixtures G ∈ G,

mG(Fj) = ∑Si mG(Si) mSi(Fj). (3.5)

Then,

yj = mG1(Fj) = ∑Si mG1(Si) mSi(Fj), by (3.5)
             = ∑Si xi mSi(Fj), by the assumption
             = ∑Si mG2(Si) mSi(Fj) = mG2(Fj), by (3.5).
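Equation (3.5) says that fragment counts are a linear image of species counts. For Example 2.1 this linear map is a small matrix-vector product; the sketch below assumes the species ordered as (SA, SB, SC, SAB, SBC, SABC) and the fragments as (FA, FB?, F?B, FC, FAB?, F?BC):

```python
# Fragment counts y = Pi x, with Pi(F, S) = m_S(F) (Lemma 2.18 / eq. (3.5)).
# Rows: FA, FB?, F?B, FC, FAB?, F?BC; columns: SA, SB, SC, SAB, SBC, SABC.
Pi = [
    [1, 0, 0, 0, 0, 0],  # FA   : a free A occurs only in SA
    [0, 1, 0, 0, 1, 0],  # FB?  : B free at its a-site, in SB and SBC
    [0, 1, 0, 1, 0, 0],  # F?B  : B free at its c-site, in SB and SAB
    [0, 0, 1, 0, 0, 0],  # FC   : a free C occurs only in SC
    [0, 0, 0, 1, 0, 1],  # FAB? : A bound to B, in SAB and SABC
    [0, 0, 0, 0, 1, 1],  # F?BC : B bound to C, in SBC and SABC
]

def fragment_counts(x):
    return [sum(pij * xj for pij, xj in zip(row, x)) for row in Pi]

# Mixture x3 = {SAB, SB, SBC}:
x = [0, 1, 0, 1, 1, 0]
print(fragment_counts(x))  # [0, 2, 2, 0, 1, 1]
```

Note that the result also satisfies the conservation law mG(FAB?) = mG(F?B) + mG(F?BC) − mG(FB?) discussed above (1 = 2 + 1 − 2).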
Definition 3.5. Let Π ∶ F × S → N, such that Π(F,S) ∶= mS(F ). The expres-
siveness of F is span(Π), and the dimension of a fragmentation F is equal to
rank(Π). We say that the fragmentation F1 is more expressive than fragmentation
F2, written F1 ⪯ F2, if span(Π1) ⊇ span(Π2).
Expressiveness refers to all linear combinations which can be derived from fragment quantities; the ground, species-based fragmentation is more expressive than any other fragmentation, since all fragment quantities can be described as linear combinations of species quantities. The dimension of a fragmentation can also be computed
Figure 3.3: Fragmentation lattice. a) Hasse diagram of a fragmentation lattice for Example 2.2, when c4 = c4− = 0; b) Hasse diagram of a fragmentation lattice for Example 2.3.
directly from the annotation map, since the number of new conservation laws is
correlated with the number of annotation classes over each agent. The fragmentation proposed with Example 2.1 is assigned a matrix Π with rank five (rows indexed by the fragments A, B?, ?B, C, AB?, ?BC; columns by the species SA, SB, SC, SAB, SBC, SABC):

              SA  SB  SC  SAB  SBC  SABC
       A    ⎛  1   0   0   0    0    0  ⎞
       B?   ⎜  0   1   0   0    1    0  ⎟
  Π =  ?B   ⎜  0   1   0   1    0    0  ⎟
       C    ⎜  0   0   1   0    0    0  ⎟
       AB?  ⎜  0   0   0   1    0    1  ⎟
       ?BC  ⎝  0   0   0   0    1    1  ⎠
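The rank of Π can be checked mechanically; a quick sketch (the column ordering of the species is an assumption of this sketch):

```python
import numpy as np

# Pi for Example 2.1: rows A, B?, ?B, C, AB?, ?BC;
# columns SA, SB, SC, SAB, SBC, SABC.
Pi = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
])
print(np.linalg.matrix_rank(Pi))  # 5
```

The rank deficiency of one reflects exactly the conservation law among the six fragments: row(B?) − row(?B) + row(AB?) − row(?BC) = 0.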
Whether one fragment set is more expressive than another can easily be detected by directly looking at their respective contact map annotations. Informally, we will say that one annotation refines another one if it induces a more expressive fragment set.
Definition 3.6. A contact map annotation {∼1A ∣ A ∈ A} refines a contact map annotation {∼2A ∣ A ∈ A} if, for all A ∈ A, ∼1A ⊇ ∼2A.
Lemma 3.7. If ∼1A refines ∼2A, then F1 ⪯ F2.

Proof. It suffices to show that the quantity mG(F2), of a fragment F2 = (V, Type, I, E, ψ) ∈ F2, can be expressed as a linear combination of quantities of fragments in F1. Let F′ ≡ (V′, Type′, I′, E′, ψ′) ∈ F1 be such that Emb(F′, F2) ≠ ∅. At least one such F′ must exist, since ∼1A ⊇ ∼2A. Let now F′1 = {F′′ ≡ (V′′, Type′′, I′′, E′′, ψ′′) ∣ V′′ = V′, Type′′ = Type′, I′′ = I′}, that is, the set of fragments over the same set of nodes
Example                    annot.       dim.  size of CTMC        n = 3   n = 10   n = 100
simple scaffold            F1 ≡ S       3     (n+1)(n+2)(n+3)/6   20      286      ∼ 10e5
(3 node types)             F2           2     (n+1)^2             16      121      10201
polymerization             F1 ≡ S       n     > 3P(n)             > 9     > 126    > 10e10
(2 node types)             F2           2     (n+1)^2             16      121      10201
conditional activation     F1 ≡ S       8     C(n,8)              120     19448    ∼ 10e10
(1 node type)              F2, F3, F4   5     C(n,2)C(n,4)        80      3146     ∼ 10e7
                           F5           4     C(n,2)^3            64      1331     ∼ 10e6

Table 3.1: Summary of the reduction for different annotations in the examples. The model of Example 2.2 is analyzed only with respect to the rules R3, R3−, R4, R4−. The number of partitions of n is denoted by P(n) (the approximate formula is P(n) ≈ (1/(4n√3)) e^(π√(2n/3)) [48]). C(n,k) = (n+k−1 choose k−1) equals the number of ways of writing n as a sum of k non-negative integers. The size of the CTMC for each example is estimated under the assumption of initially having n copies of each node type.
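The closed-form entries of Table 3.1 can be reproduced from binomial coefficients; a quick check:

```python
from math import comb

def C(n, k):
    # number of ways of writing n as a sum of k non-negative integers
    return comb(n + k - 1, k - 1)

# simple scaffold, species-based: (n+1)(n+2)(n+3)/6 states
assert (3+1)*(3+2)*(3+3)//6 == 20 and (10+1)*(10+2)*(10+3)//6 == 286
# conditional activation, species-based: C(n,8) states
assert C(3, 8) == 120 and C(10, 8) == 19448
# conditional activation, fragmentations F2, F3, F4: C(n,2)*C(n,4) states
assert C(3, 2)*C(3, 4) == 80 and C(10, 2)*C(10, 4) == 3146
print("table entries reproduced")
```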
and interfaces as F ′. Then,
mG(F2) = ∑F ′′∈F ′1
mG(F ′′)mF ′′(F2).
Since the set of all fragmentations over a given contact map is a partially ordered
set with respect to ⪯, all possible annotations of a considered contact map can be
presented by a lattice. In Figure 3.3, we present a lattice of fragmentations related
to the examples introduced in Section 2.6. Moreover, in Table 3.1, we compare the
dimensions and the size of the CTMC for different fragmentations. In particular,
in the example of polymerization, it is demonstrated that the fragmentation can
lead to an exponentially smaller state space. The detailed computation for the
entries in the table can be found in [37].
3.5 Computing fragment-based semantics
Given a rule-based program (R, G0) over the contact map C, our approach for obtaining the fragment-based semantics is to construct a new rule-based program (R̃, G̃0) over a modified contact map C̃, so that the species-based semantics of (R̃, G̃0) coincides with the fragment-based semantics of (R, G0). The modification of the contact map and of the rule-set is done according to the annotation classes.
In the following, we define how to transform a rule-based program, for any given
fragment set. The relation between the semantics of the original and the translated
model will be discussed in Chapter 5 and Chapter 7.
3.5.1 Translating the contact map
Assume a rule-based program (R, G0) over the contact map C = (A, Σ, E, I). Recall that, given an annotation {∼A ∣ A ∈ A}, the annotation classes are captured by a function C ∶ A → P(P(S)), such that C(A) = Σ(A)/∼A.
Figure 3.4: Translating a rule, according to the annotation such that C(A) = {{b}}, C(B) = {{a}, {c}}, C(C) = {{b}}. a) Translation of a rule; b) Translation of a reaction mixture.
Definition 3.8. Given an annotation {∼A ∣ A ∈ A}, the new contact map is C̃ = (Ã, Σ̃, Ẽ, Ĩ), such that

(i) Ã = {AC ∣ A ∈ A and C ∈ C(A)},
(ii) Σ̃(AC) = Σ(A)∣C, for C ∈ C(A),
(iii) Ẽ = {((AC, s), (A′C′, s′)) ∣ s ∈ C, s′ ∈ C′ and ((A, s), (A′, s′)) ∈ E},
(iv) Ĩ = I.
In words, for each equivalence class C ∈ C (A) assigned to a node type A ∈ A, a
new node type is created. The interface of the new node type and the new edge
types are naturally inherited from the original one.
In the graphical representation of a contact map, each node type A is 'cut' into as many parts as there are classes in the annotation of A. We now define how to project a site-graph over the contact map C to a site-graph over the new contact map C̃. Let Ḡ be the set of all site-graphs over C (notice that, since G denotes all reachable reaction mixtures in a rule-based program, G ⊆ Ḡ), and let G̃ be the set of all site-graphs over C̃.
Definition 3.9. The function τ ∶ Ḡ → G̃ maps a site-graph G = (V, Type, I, E, ψ) ∈ Ḡ to the site-graph G̃ = (Ṽ, T̃ype, Ĩ, Ẽ, ψ̃) ∈ G̃, that is, G̃ = τ(G), where

(i) Ṽ = {vC ∣ v ∈ V and C ∈ C(Type(v))},
(ii) T̃ype(vC) = [Type(v)]C,
(iii) Ĩ(vC) = I(v)∣C,
(iv) Ẽ = {((vC, s), (v′C′, s′)) ∣ s ∈ C, s′ ∈ C′ and ((v, s), (v′, s′)) ∈ E},
(v) ψ̃(vC, s) = ψ(v, s), if s ∈ C.
Similar to how the contact map is translated according to the contact map annotation, each node of a site-graph is cut into as many parts as there are classes in its annotation.
In Figure 3.4b, we show an application of τ to one reaction mixture of Example 2.1. The transformation from a reaction mixture G to G̃ will not be done explicitly over the reaction mixtures; G̃ will arise as a reachable mixture in the translated rule-based program.
3.5.2 Translating the rule-based program
A rule-based program (R, G0) over a contact map C can be translated to a new rule-based program (R̃, G̃0) over the transformed contact map C̃. The translation of a rule-set is performed by translating each of the rules separately.
In Algorithm 1, we present how to translate a rule to a rule over the new contact
map. The lhs and rhs of a rule are translated according to the function τ , and
then the rate in the new rule is corrected. One rule translation is illustrated in
Figure 3.4a.
Definition 3.10. Given an annotation {∼A} and a rule-based program (R, G0) over the contact map C, the reduced rule-based program is (R̃, G̃0) over the contact map C̃ (Definition 3.8), such that
Algorithm 1: Translating a rule.

Input : A rule R = (G, G′, c) over the contact map C and an annotation {∼A}A∈A; the current state (reaction mixture).
Output: A rule R̃ = (G̃, G̃′, c̃) over the contact map C̃.

G̃ ∶= τ(G); G̃′ ∶= τ(G′); c̃ ∶= c;
for all A ∈ A do
    nA ∶= the total count of nodes of type A in the current state;
    if there exists a node of type A in G then
        c̃ ∶= c̃ ⋅ nA;
        for all equivalence classes in Σ(A)/∼A do
            c̃ ∶= c̃ / nA;
(i) G0 = τ(G0),
(ii) The translation of each rule is outlined in Algorithm 1.
We explain the translation of a rule-based program intuitively. Formal analysis will follow in Chapter 5 and Chapter 7. Assume that the process assigned to (R, G0) is in state G, and that the process assigned to (R̃, G̃0) is in state τ(G) = G̃. Two requirements arise. First, the application of any rule Ri ∈ R to G should be mimicked by the application of a rule R̃i ∈ R̃ in the following way: if the result of applying a rule Ri to a mixture G ∈ G is G′, then the result of applying the rule R̃i to the mixture G̃ should be G̃′, such that G̃′ = τ(G′). To this end, whenever a copy of a node type AC is consumed (resp. produced), a copy of a node type of each other class C′ ∈ C(A) must be consumed (resp. produced). This is achieved by defining the translation from Ri to R̃i by translating both the lhs and the rhs of a rule by τ. Second, the rate must be appropriately adjusted, so that the number of embeddings between the lhs of R̃i ∈ R̃ and G̃ = τ(G) approximates (or, when possible, equals) the number of embeddings between the lhs of Ri ∈ R and G.
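The rate correction of Algorithm 1 multiplies c by nA once per node type A occurring in the lhs, and divides by nA once per annotation class of A, i.e. c̃ = c ⋅ ∏A nA^(1 − kA), where kA is the number of classes of A. A sketch (function and variable names are illustrative, not part of the thesis formalism):

```python
# Rate correction of Algorithm 1: for each node type A occurring in the lhs
# of a rule, multiply by n_A once, then divide by n_A once per annotation class.
def corrected_rate(c, lhs_types, counts, num_classes):
    for A in lhs_types:
        c *= counts[A]                 # one multiplication per type in the lhs
        for _ in range(num_classes[A]):
            c /= counts[A]             # one division per annotation class of A
    return c

# Rule R1 of Figure 3.4a: the lhs mentions types A and B; B is cut into two
# classes ({a} and {c}) while A keeps a single class.  With n_A = 3, n_B = 5:
c_tilde = corrected_rate(2.0, ['A', 'B'], {'A': 3, 'B': 5}, {'A': 1, 'B': 2})
print(c_tilde)  # 0.4, i.e. c / n_B, matching the c1/nB correction in Figure 3.4a
```

Types with a single annotation class are left untouched (the factor nA^(1−1) = 1), so only the 'cut' node types contribute a correction.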
Chapter 4
Exact aggregation of Markov chains
Throughout this chapter, we consider a process Xt (either discrete- or continuous-time), assigned to a Markov graph (S, w, p0), and a partitioning of the countable set S, induced by a surjective function g ∶ S → S̃, where S̃ = {A1, . . . , AM} and M < ∣S∣. The partition classes, induced by the inverse map g−1 ∶ S̃ → P(S) with g−1(A) = {s ∣ g(s) = A}, will be denoted by [A]g, or only [A] when the partition function is clear from the context. The set of probability distributions over S will be denoted by D(S).
Definition 4.1. The g-projection of a stochastic process Xt over the state space S is the stochastic process Yt over the state space S̃ defined by

Yt = g(Xt).
We can now define the exact Markov chain aggregation problem.

Problem 1. Construct a Markov graph (S̃, w̃, p̃0) with process Y′t, so that Y′t is equivalent to the g-projection of Xt.
To this end, we introduce the notion of a g-aggregation, which is any Markov graph on S̃. In other words, an aggregation is a candidate solution to the aggregation problem. We call the aggregation exact if the aggregated Markov graph defines exactly the projected process. Observe that the notion of projection refers to the process Xt, while the notion of aggregation refers to the graph (S, w, p0).
Definition 4.2. Any Markov graph (S̃, w̃, p̃0) such that p̃0(A) = ∑s∈[A] p0(s) is called a g-aggregation of (S, w, p0). Let Y′t be the stochastic process assigned to (S̃, w̃, p̃0). If the g-projection of Xt is equivalent to Y′t, the aggregation is said to be exact. Otherwise, the aggregation is approximate.
Naturally, the projected process is not necessarily a time-homogeneous Markov process, and the exact aggregation may not exist.
In this chapter, we study sufficient criteria for performing exact aggregations, and we show how to construct the exact aggregation (when the criteria are met). In particular, we study lumpability, a property which guarantees the existence of an exact aggregation, and invertability, the property of being able to invert, i.e. reconstruct, the original transient probabilities from the aggregate ones. Discrete- and continuous-time Markov chains are studied separately. Each of the two cases is summarized in a single theorem on how the trace distribution of the aggregate process relates to the trace distribution of the original process.
4.1 Lumpability and invertability
Definition 4.3. If the g-projection of a process Xt assigned to a Markov graph
is again a Markov, time-homogeneous process, then Xt is said to be lumpable
with respect to the partition g. If Xt is lumpable with respect to g for every
initial distribution, it is also said to be strongly lumpable, and if it is lumpable for
some initial distribution, it is said to be weakly lumpable with respect to g.
In practice, it is useful to characterize lumpability from the graph description of the process alone. The criteria we present relate to the Markov graph, and they apply both to DTMCs and CTMCs.

Let (S, w, p0) be a Markov graph.
Forward criterion. Define a function δ+ ∶ S × S̃ → R≥0 by

δ+(s, A) = ∑s′∈[A] w(s, s′),

and specify the following condition.

(Cond1) For all A, A′ ∈ S̃ and all s, s′ ∈ [A], δ+(s, A′) = δ+(s′, A′).
The condition can be interpreted as follows. Consider being in some state s ∈ [A] and transitioning to a state in [A′]. In the discrete-time case, the value δ+(s, A′) represents the probability of this event, more precisely P(Xn+1 ∈ [A′] ∣ Xn = s), and (Cond1) states that this probability is the same no matter which state s in [A] is taken as initial.
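(Cond1) can be checked mechanically on a transition matrix: within each class, all rows must induce the same aggregate row. A sketch over a hypothetical 3-state chain with partition {{0}, {1, 2}} (not one of the thesis examples):

```python
# Check (Cond1): for every class [A] and target class [A'], the aggregate
# weight delta+(s, A') must not depend on the representative s in [A].
def check_cond1(w, partition):
    for block in partition:
        for target in partition:
            # rounding guards against floating-point noise in the sums
            sums = {round(sum(w[s][t] for t in target), 12) for s in block}
            if len(sums) > 1:
                return False
    return True

w = [[0.0, 0.5, 0.5],
     [0.3, 0.3, 0.4],
     [0.3, 0.2, 0.5]]
partition = [[0], [1, 2]]
print(check_cond1(w, partition))   # True: delta+ is constant on each class

w[1][0] = 0.4                      # perturb one transition; the two rows of
w[1][1] = 0.2                      # class [1, 2] now disagree on class [0]
print(check_cond1(w, partition))   # False
```

When the check succeeds, the common values δ+(s, A′) directly give the transition matrix of the exact aggregation (Theorem 4.7 below).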
Backward criterion. Let α ∶ S̃ → D(S) be a family of probability measures on S, such that α(A, s) = 0 for s ∉ [A]. Intuitively, one can think of α(A, s) as the probability of the process Xt being in state s, conditioned on the projected process g(Xt) being in state A = g(s). Define δ− ∶ S̃ × S → R≥0 by

δ−(A, s′) = (∑s∈[A] α(A, s) w(s, s′)) / α(A′, s′), where s′ ∈ [A′],

and specify the following condition.

(Cond2) For all A, A′ ∈ S̃ and all s, s′ ∈ [A′], δ−(A, s) = δ−(A, s′).
An intuitive interpretation is the following. Consider a transition from a state in [A] to a state in [A′], and fix two states s′1, s′2 ∈ [A′]. The condition (Cond2) states that the proportion of ending in either of the two states should always be the same. For example, in discrete time this means that, for all n = 0, 1, 2, . . ., for all A, A′ ∈ S̃, and for all s′1, s′2 ∈ [A′],

P(Xn+1 = s′1 ∣ Xn ∈ [A]) / P(Xn+1 = s′2 ∣ Xn ∈ [A]) = α(A′, s′1) / α(A′, s′2).

The value δ−(A, s′1) represents the probability P(Xn+1 = s′1 ∣ Xn ∈ [A]) divided by the fraction α(A′, s′1).
Definition 4.4. Given a probability distribution π ∈ D(S), if

π(s) / ∑s′∈[A] π(s′) = α(A, s), for all A ∈ S̃ and all s ∈ [A],

we say that π respects α.

Note that whenever ∣S̃∣ = M > 1, there are infinitely many distributions which respect α.
It sometimes happens that the transient probability of being in a concrete state, conditioned on being in the respective aggregate state, is invariant in time. Then, one can invert, i.e. reconstruct, the original transient probabilities from the aggregate ones.
Definition 4.5. If the transient distributions of Xt respect α for all t ∈ R≥0, we
say that Xt is invertible with respect to α.
Remark 4.6. Before continuing, we informally discuss the derivation of the two conditions in discrete time. The projected process is Markov and time-homogeneous if the probability P(Xn+1 ∈ [A′] ∣ Xn ∈ [A]) depends on no particular states s ∈ [A], s′ ∈ [A′], nor on the time n. Notice that

P(Xn+1 ∈ [A′] ∣ Xn ∈ [A]) = ∑s′∈[A′] P(Xn+1 = s′ ∣ Xn ∈ [A])
 = ∑s∈[A] ∑s′∈[A′] P(Xn+1 = s′ ∣ Xn = s) P(Xn = s ∣ Xn ∈ [A])
 = ∑s∈[A] P(Xn = s ∣ Xn ∈ [A]) ∑s′∈[A′] w(s, s′).

When asking for strong lumpability, the conditional probability P(Xn = s ∣ Xn ∈ [A]) can vary, since the initial distribution should not influence lumpability. Then, one must ask for (Cond1), which implies

P(Xn+1 ∈ [A′] ∣ Xn ∈ [A]) = ∑s∈[A] P(Xn = s ∣ Xn ∈ [A]) δ+(s, A′) = δ+(s, A′), for any s ∈ [A].

For ensuring weak lumpability, one may assume that P(Xn = s ∣ Xn ∈ [A]) = α(A, s) holds. Then, (Cond2) must be imposed to guarantee that the property is preserved also at time n + 1, in which case

P(Xn+1 ∈ [A′] ∣ Xn ∈ [A]) = ∑s′∈[A′] α(A′, s′) δ−(A, s′) = δ−(A, s′), for any s′ ∈ [A′].
4.2 Discrete-time case
We outline two simple criteria for proving that a given Markov chain is lumpable
with respect to a given partition.
Let (S, w, p0) be a Markov graph with DTMC Xn.
4.2.1 Forward criterion
Theorem 4.7. Suppose that (Cond1) holds. For each A, A′ ∈ S̃, fix some s ∈ [A] and let w̃(A, A′) = δ+(s, A′). Then, the aggregation (S̃, w̃, p̃0) is well-defined and exact.

Proof. We show only the well-definedness part. Notice that, by (Cond1), the aggregation is unambiguously defined. Moreover, for every state A ∈ S̃, w̃(A, ⋅) is a probability distribution: let s ∈ [A]. Then,

∑A′∈S̃ w̃(A, A′) = ∑A′∈S̃ δ+(s, A′) = ∑A′∈S̃ ∑s′∈[A′] w(s, s′) = ∑s′∈S w(s, s′) = 1.
Corollary 1. The process Xn is strongly lumpable with respect to g. Moreover,
the process Xn is strongly lumpable with respect to g only if (Cond1) holds.
4.2.2 Backward criterion
Definition 4.8. Suppose that (Cond2) holds. For each A, A′ ∈ S̃, fix some s′ ∈ [A′] and let w̃(A, A′) = δ−(A, s′). Then, the aggregation (S̃, w̃, p̃0) is called the (g, α)-aggregation of (S, w, p0). If the partition g is clear from the context, we write only α-aggregation.
Theorem 4.9. The α-aggregation is well-defined.

Proof. By (Cond2), the aggregation is unambiguously defined. Notice that, by (Cond2),

α(A′, s′) w̃(A, A′) = ∑s∈[A] α(A, s) w(s, s′).

Summing over s′ ∈ [A′], we have

w̃(A, A′) = ∑s∈[A] ∑s′∈[A′] α(A, s) w(s, s′).

It follows that

∑A′∈S̃ w̃(A, A′) = ∑s∈[A] ∑s′∈S α(A, s) w(s, s′) = 1.
Notice that α-aggregation is well-defined whenever (Cond2) holds, regardless of the
initial distribution of Xn. This will be important when discussing the asymptotic
behavior.
Theorem 4.10. If (Cond2) holds and p0 respects α, the α-aggregation (S̃, w̃, p̃0) is exact.
The proof is outlined together with the proof of Theorem 4.12.
Corollary 2. If (Cond2) holds and p0 respects α, the process Xn is weakly
lumpable with respect to g.
Example 4.1. We demonstrate in Figure 4.1 that it may happen that (Cond2)
holds, but (Cond1) doesn’t. This is consistent with the intuition of (Cond1) being
a more restrictive condition, since it implies strong lumpability. However, we
also demonstrate that it may happen that (Cond1) holds, but (Cond2) doesn’t.
Therefore, neither of the conditions is strictly more restrictive than the other.
On a practical note, the observation suggests that, when building algorithms for
detecting lumpable partitions, it makes sense to check both conditions (Cond1)
and (Cond2). The observation is summarized in the following Lemma.
Lemma 4.11. Given a Markov graph (S, w, p0) with process Xn, let X denote the set of all possible partition functions on S. Define the sets

PSX = {g ∈ X ∣ Xn is strongly lumpable with respect to g},
PWX = {g ∈ X ∣ Xn is weakly lumpable with respect to g},
CSX = {g ∈ X ∣ (S, w, p0) satisfies (Cond1) with respect to g},
CWX = {g ∈ X ∣ (S, w, p0) satisfies (Cond2) with respect to g and some α}.

Then, (i) for all X, CSX = PSX ⊂ PWX; (ii) for all X, CWX ⊂ PWX; (iii) there exist X such that CSX ∖ CWX ≠ ∅; and (iv) there exist X such that CWX ∖ CSX ≠ ∅. In other words, (Cond1) is sufficient and necessary for the process Xn to be strongly lumpable with respect to g; (Cond2) is sufficient, but not necessary, for the process Xn to be weakly lumpable with respect to g; and (Cond1) is neither a necessary nor a sufficient condition for (Cond2).
Figure 4.1: An example of a DTMC and three possible aggregations. Let S = {x, y1, y2, y3, z1, z2, z3}, with the transition matrix as specified in the graph above. Let g1, g2, g3 denote the three partitions shown in a), b) and c) (the states y1 and y2 map to the state y12, etc.). Then, g1 meets both properties (Cond1) and (Cond2) (by taking α(y12, y1) = α(y12, y2) = 0.5 and α(z12, z1) = α(z12, z2) = 0.5). Furthermore, g2 does not satisfy the property (Cond1), because, for example, δ+(y1, z12) ≠ δ+(y3, z12), but it does satisfy the property (Cond2), for α(y, y1) = α(y, y2) = α(y, y3) = 1/3 and α(z12, z1) = α(z12, z2) = 0.5. Finally, g3 satisfies (Cond1), but (Cond2) fails, because, for example, δ−(y12, z1) ≠ δ−(y12, z3).
The coming discussion on invertability explains why (Cond2) is sometimes more
restrictive than (Cond1).
4.2.3 Invertability
The condition (Cond2) enforces more than lumpability: one can also invert, or reconstruct, the transient probabilities from the aggregated process.

Let (S, w, p0) be a Markov graph with a DTMC Xn. Recall that Xn is invertible with respect to α if all the transient distributions of X0, X1, . . . respect α.
Theorem 4.12. If (Cond2) holds and p0 respects α, then Xn is invertible with
respect to α.
Instead of proving Theorem 4.12 directly, we first show a stronger statement which relates the transient distributions of the original and the aggregated process. Notice that the following theorem does not yet show that the α-aggregation is exact (Theorem 4.10), because it does not show that the projected process is Markov time-homogeneous.
Theorem 4.13. Let Y′n denote the process assigned to (S̃, w̃, p̃0). If p0 respects α, then

(i) P(Xn ∈ [A]) = P(Y′n = A),
(ii) P(Xn = s) = P(Y′n = A) α(A, s), where A = g(s).
We will prove Theorem 4.13 with the help of two lemmas.
Lemma 4.14. Assume that P(Xn−1 = s ∣ Xn−1 ∈ [A]) = α(A, s) for all A ∈ S̃ and all s ∈ S. Then, P(Xn ∈ [A′] ∣ Xn−1 ∈ [A]) = w̃(A, A′).

Proof. Notice that the joint probability P(Xn ∈ [A′], Xn−1 ∈ [A]) can be written as w̃(A, A′) P(Xn−1 ∈ [A]):

P(Xn ∈ [A′], Xn−1 ∈ [A]) = ∑s∈[A] ∑s′∈[A′] P(Xn = s′, Xn−1 = s)
 = ∑s∈[A] ∑s′∈[A′] P(Xn−1 = s) P(Xn = s′ ∣ Xn−1 = s)
 = ∑s∈[A] ∑s′∈[A′] P(Xn−1 = s) w(s, s′)
 = P(Xn−1 ∈ [A]) ∑s′∈[A′] (∑s∈[A] α(A, s) w(s, s′)), by the hypothesis
 = P(Xn−1 ∈ [A]) w̃(A, A′) ∑s′∈[A′] α(A′, s′), by (Cond2)
 = w̃(A, A′) P(Xn−1 ∈ [A]).
Lemma 4.15. Assume that p0 respects α. Then, P(Xn = s ∣ Xn ∈ [A]) = α(A, s).

Proof. We use induction. Suppose that the statement holds for k = n − 1. First observe that if s ∉ [A], then both sides equal zero. So assume that s ∈ [A]. Then, by Lemma 4.14, we have that P(Xn ∈ [A′] ∣ Xn−1 ∈ [A]) = w̃(A, A′), which is used to show that

P(Xn ∈ [A′]) = ∑A∈S̃ P(Xn−1 ∈ [A]) P(Xn ∈ [A′] ∣ Xn−1 ∈ [A]) = ∑A∈S̃ P(Xn−1 ∈ [A]) w̃(A, A′).

Next, note that, for s′ ∈ [A′],

P(Xn = s′) = ∑s∈S P(Xn−1 = s) w(s, s′)
 = ∑A∈S̃ ∑s∈[A] P(Xn−1 ∈ [A]) P(Xn−1 = s ∣ Xn−1 ∈ [A]) w(s, s′)
 = ∑A∈S̃ ∑s∈[A] P(Xn−1 ∈ [A]) α(A, s) w(s, s′), by the hypothesis
 = ∑A∈S̃ P(Xn−1 ∈ [A]) α(A′, s′) w̃(A, A′), by (Cond2)
 = α(A′, s′) ∑A∈S̃ P(Xn−1 ∈ [A]) w̃(A, A′).

The claim is obtained by dividing the two expressions.
Proof (Theorem 4.13). We use induction. Notice that both statements hold for n = 0. Assume that (i) and (ii) hold for n − 1. Then P(Xn−1 = s ∣ Xn−1 ∈ [A]) = α(A, s), and hence, by Lemma 4.14, P(Xn ∈ [A′] ∣ Xn−1 ∈ [A]) = w̃(A, A′). Therefore,

P(Y′n = A′) = ∑A∈S̃ P(Y′n−1 = A) w̃(A, A′)
 = ∑A∈S̃ P(Xn−1 ∈ [A]) P(Xn ∈ [A′] ∣ Xn−1 ∈ [A])
 = P(Xn ∈ [A′]).

This proves (i). Next, notice that Lemma 4.15 implies

P(Xn = s) = α(A, s) P(Xn ∈ [A]) = α(A, s) P(Y′n = A), by (i).
Proof (Theorem 4.12). The claim follows immediately from Theorem 4.13.
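Theorem 4.13 can be illustrated numerically: evolve a chain satisfying (Cond2) from an initial distribution that respects α, and check at every step that P(Xn = s) = α(A, s) P(Y′n = A). A sketch with a hypothetical 3-state chain (classes {0} and {1, 2}, α uniform on the two-state class; the lumped matrix is the α-aggregation of Definition 4.8):

```python
# Verify Theorem 4.13 on a small chain satisfying (Cond2).
w = [[0.2, 0.4, 0.4],           # classes: [0] and [1, 2]
     [0.5, 0.3, 0.2],
     [0.5, 0.2, 0.3]]
# alpha-aggregation (Definition 4.8): w~(A, A') = delta-(A, s'), any s' in [A'].
w_agg = [[0.2, 0.8],
         [0.5, 0.5]]
alpha = {1: 0.5, 2: 0.5}        # uniform on class [1, 2]; alpha(., 0) = 1

p = [0.2, 0.4, 0.4]             # p0 respects alpha: 0.4/0.8 = 0.5 on each of 1, 2
q = [0.2, 0.8]                  # aggregated initial distribution

for _ in range(50):
    p = [sum(p[s] * w[s][t] for s in range(3)) for t in range(3)]
    q = [sum(q[a] * w_agg[a][b] for a in range(2)) for b in range(2)]
    assert abs(p[0] - q[0]) < 1e-12                 # (i): aggregate marginals agree
    assert abs(p[1] - alpha[1] * q[1]) < 1e-12      # (ii): p(s) = alpha(A, s) q(A)
    assert abs(p[2] - alpha[2] * q[1]) < 1e-12
print("Theorem 4.13 holds at every step on this example")
```

In particular, the concrete transient distribution is recovered from the aggregate one at every time point, which is exactly the invertability of Theorem 4.12.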
We now return to Theorem 4.10. In Theorem 4.13, we showed that the transient distributions of the g-projection of Xn and of its aggregation are equivalent. We still need to show that the g-projection of Xn is a Markov time-homogeneous process.
Theorem 4.16. If (Cond2) holds and p0 respects α, then, for n = 0, 1, . . . and for all sequences of states A0, . . . , An ∈ S̃,

P(X0 ∈ [A0], . . . , Xn ∈ [An]) = p̃0(A0) ∏i=0..n−1 w̃(Ai, Ai+1).
Similarly to the case of Theorem 4.13, we first show a helpful lemma.
Lemma 4.17. Define the following two properties:

Φ1(n): for all states A0, . . . , An−2, A, A′ ∈ S̃,

P(Xn ∈ [A′] ∣ Xn−1 ∈ [A], Xn−2 ∈ [An−2], . . . , X0 ∈ [A0]) = w̃(A, A′).

Φ2(n): for all states A0, . . . , An−1, A ∈ S̃ and all s ∈ S,

P(Xn = s ∣ Xn ∈ [A], Xn−1 ∈ [An−1], . . . , X0 ∈ [A0]) = α(A, s).

Then,

(i) Φ2(n − 1) implies Φ1(n), and
(ii) Φ1(n) and Φ2(n − 1) imply Φ2(n).
Proof. Assume that Φ2(n − 1) holds. Denoting the history up to time n − 2, that is, Xn−2 ∈ [An−2], . . . ,X0 ∈ [A0], by HXn−2, we obtain

P(Xn ∈ [A′],Xn−1 ∈ [A] ∣ HXn−2)
= ∑s′∈[A′] ∑s∈[A] P(Xn = s′,Xn−1 = s ∣ HXn−2)
= ∑s′∈[A′] ∑s∈[A] P(Xn = s′ ∣ Xn−1 = s,HXn−2)P(Xn−1 = s ∣ HXn−2)
= ∑s′∈[A′] ∑s∈[A] w(s, s′)α(A, s)P(Xn−1 ∈ [A] ∣ HXn−2), since Xn is Markov and by Φ2(n − 1),
= P(Xn−1 ∈ [A] ∣ HXn−2) ∑s′∈[A′] ∑s∈[A] w(s, s′)α(A, s)
= P(Xn−1 ∈ [A] ∣ HXn−2)w(A,A′) ∑s′∈[A′] α(A′, s′), by (Cond2),
= P(Xn−1 ∈ [A] ∣ HXn−2)w(A,A′).

Dividing by P(Xn−1 ∈ [A] ∣ HXn−2) yields Φ1(n).
Second, we assume that Φ1(n) and Φ2(n − 1) hold, and we need to show that Φ2(n) holds. Notice that

P(Xn = s′,Xn−1 ∈ [A] ∣ HXn−2)
= ∑s∈[A] P(Xn = s′,Xn−1 = s ∣ HXn−2)
= ∑s∈[A] P(Xn−1 ∈ [A] ∣ HXn−2)P(Xn−1 = s ∣ Xn−1 ∈ [A],HXn−2)w(s, s′)
= ∑s∈[A] P(Xn−1 ∈ [A] ∣ HXn−2)α(A, s)w(s, s′), since Φ2(n − 1) holds,
= P(Xn−1 ∈ [A] ∣ HXn−2) ∑s∈[A] α(A, s)w(s, s′)
= P(Xn−1 ∈ [A] ∣ HXn−2)α(A′, s′)w(A,A′), by (Cond2),
= α(A′, s′)P(Xn ∈ [A′],Xn−1 ∈ [A] ∣ HXn−2), since Φ1(n) holds,

where A′ = g(s′). The claim Φ2(n) is obtained by dividing the two expressions.
Notice that the statement Φ2(0) holds since p0 respects α. Then, by (i), Φ1(1) holds; further, by (ii), Φ2(1) holds, and so on.
Corollary 3. If (Cond2) holds and p0 respects α, then Φ1(n) and Φ2(n) hold for
n = 0,1,2, . . ..
Proof. (Theorem 4.16 and Theorem 4.10) Let Y′n denote the aggregated process and Yn the g-projection of the process Xn. By Corollary 3, since Φ1(n) holds for all n ∈ N, the process Yn is Markov. Moreover, its transition matrix equals that of Y′n, which also concludes the proof of Theorem 4.10.
The forward and backward criteria for lumpability of DTMCs were first observed by Kemeny and Snell [56]. Here, these results were adapted to our terminology of projection and aggregation (in order to have a unifying framework for exact and approximate aggregations). From this point on, we present several extensions of the theory: (i) the invertibility property, and invertibility in the case that the initial distribution does not respect α, (ii) the continuous-time case, and (iii) an interpretation of the results in terms of trace distributions of the original and the aggregated chain (both in discrete and continuous time).
4.2.4 Convergence
In the previous section, we proved that, if the initial distribution respects α, then (Cond2) implies lumpability (with respect to g) and invertibility (with respect to α). We now investigate the case when the initial distribution does not respect α. We assume throughout that we are given a DTMC Xn over the Markov graph (S,w, p0). Moreover, we assume that (Cond2) is satisfied for some α, and we denote the α-aggregation by (S, w, p0).
Theorem 4.18. Let Yn be a process assigned to (S, w, p0). Then,
(i) If Xn is irreducible, then so is Yn.
(ii) If state s is aperiodic in Xn, then the state g(s) is aperiodic in Yn.
We start with the following Lemma.
Lemma 4.19. For every A,A′ ∈ S and every s′ ∈ [A′],
P(Yn = A′ ∣ Y0 = A) = (1/α(A′, s′)) ∑s∈[A] α(A, s)P(Xn = s′ ∣ X0 = s).
Proof. We use induction. Notice that, for n = 1, the assertion is true by the definition of w. Assume that the statement holds for some n. Notice that

P(Xn+1 = s′ ∣ X0 = s) = ∑s′′∈S P(X1 = s′′ ∣ X0 = s)P(Xn+1 = s′ ∣ X1 = s′′)
= ∑A′′∈S ∑s′′∈[A′′] w(s, s′′)P(Xn+1 = s′ ∣ X1 = s′′).

By multiplying with α(A, s) for all s ∈ [A], and summing up, we obtain

∑s∈[A] α(A, s)p(n+1)(s, s′) = ∑s∈[A] α(A, s) ∑A′′∈S ∑s′′∈[A′′] w(s, s′′)p(n)(s′′, s′)
= ∑A′′∈S ∑s′′∈[A′′] α(A′′, s′′)w(A,A′′)p(n)(s′′, s′), by (Cond2),
= ∑A′′∈S w(A,A′′) ∑s′′∈[A′′] α(A′′, s′′)p(n)(s′′, s′)
= ∑A′′∈S w(A,A′′)α(A′, s′)P(Yn+1 = A′ ∣ Y1 = A′′), by the induction hypothesis,

where the shorthand p(n)(s, s′) was used for P(Xn = s′ ∣ X0 = s). The last expression equals α(A′, s′) ∑A′′∈S w(A,A′′)P(Yn+1 = A′ ∣ Y1 = A′′), which is exactly α(A′, s′)P(Yn+1 = A′ ∣ Y0 = A).
Proof. (Theorem 4.18) A consequence of Lemma 4.19 is that, for s ∈ [A] and s′ ∈ [A′], if s → s′ (s communicates with s′, Definition 1.30), then A → A′ in the DTMC Yn. This implies (i). The claim (ii) follows from

{n ∣ P(Xn = s,X0 = s) > 0} ⊆ {n ∣ P(Yn = A,Y0 = A) > 0}.
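Lemma 4.19 can also be validated numerically. The sketch below (a toy chain with hypothetical numbers, chosen so that (Cond2) holds) compares the n-step probabilities of the aggregated chain with the α-weighted n-step probabilities of the original chain:

```python
# Toy chain and its alpha-aggregation (hypothetical numbers; (Cond2) holds).
w = [[0.5, 0.25, 0.25],
     [0.3, 0.35, 0.35],
     [0.3, 0.35, 0.35]]
w_bar = [[0.5, 0.5],          # aggregated chain over blocks 0 -> {0}, 1 -> {1,2}
         [0.3, 0.7]]
alpha = [[1.0, 0.0, 0.0],     # alpha[A][s]
         [0.0, 0.5, 0.5]]
blocks = [[0], [1, 2]]

def matpow(m, n):
    """Naive n-th power of a square matrix given as nested lists."""
    dim = len(m)
    out = [[float(i == j) for j in range(dim)] for i in range(dim)]
    for _ in range(n):
        out = [[sum(out[i][k] * m[k][j] for k in range(dim))
                for j in range(dim)] for i in range(dim)]
    return out

n = 3
pn = matpow(w, n)        # p^(n)(s, s') of the original chain
wn = matpow(w_bar, n)    # n-step probabilities of the aggregated chain

# Lemma 4.19: P(Yn = A'|Y0 = A) = (1/alpha(A',s')) sum_{s in [A]} alpha(A,s) p^(n)(s,s')
for A in range(2):
    for Ap in range(2):
        sp = blocks[Ap][0]   # any representative s' in [A'] works, by (Cond2)
        rhs = sum(alpha[A][s] * pn[s][sp] for s in blocks[A]) / alpha[Ap][sp]
        assert abs(wn[A][Ap] - rhs) < 1e-9
```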
Theorem 4.20. Let Yn be the process assigned to (S, w, p0), and let Xn be irreducible and aperiodic. Then
(i) The process Xn has a unique stationary distribution µ, and µ respects α.
(ii) The process Yn has a unique stationary distribution µ̄, and µ̄(A) = ∑s∈[A] µ(s).
(iii) The g-aggregation of (S,w,µ) to (S, w, µ̄) is exact.
Fact (iii) tells us that using the aggregation (S, w, p0) is justified in the limit.
Proof. The process Xn has a unique stationary distribution µ, by Theorem 1.33. Assume that X′n is assigned to a DTMC (S,w, p′0) and that p′0 respects α. Since µ is the unique stationary distribution for every chain with transition probabilities defined by w, the transient distribution of X′n converges to µ. By Theorem 4.10, since all transient distributions respect α, µ also respects α.
By Theorem 4.18, the chain Yn is also aperiodic and irreducible, so it has a unique stationary distribution µ̄. Since µ respects α, (ii) and (iii) immediately follow.
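The convergence statement can be illustrated numerically. In the sketch below (toy chain, hypothetical numbers), power iteration is started from an initial distribution that does not respect α; the limit nevertheless respects α, as the theorem asserts:

```python
# Toy irreducible, aperiodic DTMC satisfying (Cond2) (hypothetical numbers).
w = [[0.5, 0.25, 0.25],
     [0.3, 0.35, 0.35],
     [0.3, 0.35, 0.35]]
blocks = [[0], [1, 2]]
alpha = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.5]]

# Power iteration from an initial distribution that does NOT respect alpha.
mu = [1.0, 0.0, 0.0]
for _ in range(200):
    mu = [sum(mu[s] * w[s][sp] for s in range(3)) for sp in range(3)]

mu_bar = [sum(mu[s] for s in blk) for blk in blocks]

# The stationary distribution respects alpha: mu(s) = alpha(A,s) * mu_bar(A).
for A, blk in enumerate(blocks):
    for s in blk:
        assert abs(mu[s] - alpha[A][s] * mu_bar[A]) < 1e-9
```

For these numbers the limit is µ = (0.375, 0.3125, 0.3125), so µ̄ = (0.375, 0.625) and each state of the second block carries half of its block mass, as α prescribes.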
4.3 Continuous-time case
Let (S,w, p0) be a Markov graph with a CTMC Xt. In addition, we assume that there exists an r > 0 such that supi a(si) < r. Notice that this assumption does not mean that the state space of the chain is finite, but that there are finitely many transitions with non-zero weights originating in any particular state of S. A chain satisfying such a condition is said to be finitely branching.
Then, the analogues of Theorem 4.7 and Corollary 1 hold trivially.
Theorem 4.21. Suppose that (Cond1) holds. For each A ∈ S, fix s ∈ [A] and let w(A,A′) = δ+(s,A′). Then, the aggregation (S, w, p0) is well-defined and exact.
Corollary 4. The process Xt is strongly lumpable with respect to g. Moreover, the process Xt is strongly lumpable with respect to g only if (Cond1) holds.
We now discuss lumpability under the condition (Cond2).
Definition 4.22. Suppose that (Cond2) holds. For each A′ ∈ S, fix s′ ∈ [A′] and let w(A,A′) = δ−(A, s′). Then, the aggregation (S, w, p0) is called the (g,α)-aggregation. If the partition g is clear from the context, we write only α-aggregation.
Recall that Xt is invertible with respect to α if the transient distributions of Xt respect α for all t ∈ R≥0. We now merge the results on lumpability and invertibility in continuous time in the following theorem.
Theorem 4.23. If (Cond2) holds and p0 respects α, the process Xt is lumpable
with respect to g. Moreover, the g-aggregation (S, w, p0) is well-defined, exact and
invertible with respect to α.
The theorem will be shown by constructing a uniformized discrete-time Markov chain out of Xt. We first need to show that aggregating and uniformizing a CTMC commute. An illustration is provided in Figure 4.2.
Figure 4.2: Illustration for Theorem 4.24.
Theorem 4.24. Let Q be the generator matrix with sups ∣qs∣ < r for some r, and let

(i) Xt be the CTMC assigned to the Markov graph (S,w, p0),
(ii) Y′t be the CTMC assigned to the (g,α)-aggregation of (S,w, p0),
(iii) Zn be the uniformized chain of Xt with parameter r,
(iv) Z̄n be the uniformized chain of Y′t with parameter r,
(v) Z′n be the DTMC assigned to the (g,α)-aggregation of Zn.

Then Z′n ≡ Z̄n.
Proof. The process Zn with transition matrix M meets the condition (Cond2) with respect to α. Denote by δ−M and δ−Q the backward conditions with respect to the matrices M, with entries m(s, s′), and Q, with entries q(s, s′), respectively. Since Zn is a uniformization of Xt with constant r, M = Q/r + IN. For s′ ∉ [A],

δ−M(A, s′) = ∑s∈[A] α(A, s)m(s, s′)/α(A′, s′) = (1/r) ∑s∈[A] α(A, s)q(s, s′)/α(A′, s′) = (1/r) δ−Q(A, s′),
and, if s′ ∈ [A],

δ−M(A, s′) = ∑s∈[A] α(A, s)m(s, s′)/α(A′, s′)
= ∑s∈[A]∖{s′} α(A, s)m(s, s′)/α(A′, s′) + (1 + q(s′, s′)/r)
= 1 + (1/r) ( ∑s∈[A]∖{s′} α(A, s)q(s, s′)/α(A′, s′) + q(s′, s′) )
= 1 + (1/r) ∑s∈[A] α(A, s)q(s, s′)/α(A′, s′)
= 1 + (1/r) δ−Q(A, s′).
The value δ−Q(A, s′) does not depend on s′, since, by assumption, (Cond2) holds
for matrix Q. The rest of the claim follows trivially.
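The commutation argument around M = Q/r + IN can be replayed numerically. The sketch below uses a hypothetical generator, chosen so that (Cond2) holds for the indicated partition, and checks that uniformizing and then aggregating gives the same transition matrix as aggregating and then uniformizing:

```python
# Hypothetical CTMC generator satisfying (Cond2) for the partition A={0}, B={1,2}.
Q = [[-1.0, 0.5, 0.5],
     [0.6, -1.3, 0.7],
     [0.6, 0.7, -1.3]]
r = 4.0                      # uniformization rate, r > sup_s |q(s,s)|

V = [[1, 0], [0, 1], [0, 1]]             # block-membership matrix
U = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]   # alpha, written as the matrix U^alpha

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def uniformize(gen, rate):
    """M = Q/r + I, entrywise."""
    n = len(gen)
    return [[gen[i][j] / rate + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

# Path 1: uniformize Q, then aggregate the resulting DTMC: U * M * V.
M = uniformize(Q, r)
path1 = matmul(matmul(U, M), V)

# Path 2: aggregate Q (Q_bar = U * Q * V), then uniformize with the same r.
Q_bar = matmul(matmul(U, Q), V)
path2 = uniformize(Q_bar, r)

assert all(abs(path1[i][j] - path2[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```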
We next show the analogue of Theorem 4.13.
Theorem 4.25. Let Y ′t denote the process assigned to (S, w, p0). If p0 respects
α, then
(i) P(Xt ∈ [A]) = P(Y ′t = A),
(ii) P(Xt = s) = P(Y ′t = A)α(A, s).
Proof. Notice that

P(Y′t = A) = ∑n≥0 P(Z̄n = A) e^{−rt}(rt)^n/n!
= ∑n≥0 P(Z′n = A) e^{−rt}(rt)^n/n!
= ∑n≥0 P(Zn ∈ [A]) e^{−rt}(rt)^n/n!
= ∑n≥0 ∑s∈[A] P(Zn = s) e^{−rt}(rt)^n/n!
= ∑s∈[A] P(Xt = s) = P(Xt ∈ [A]),
where the second equality is by Theorem 4.24 while the third is by Theorem 4.10.
It is similarly shown that P(Xt = s ∣Xt ∈ [A]) = α(A, s).
Proof. (Proof sketch, Theorem 4.23) The well-definedness follows by the same arguments as for Theorem 4.9. By Theorem 4.25, the process Xt is invertible with respect to α, and, moreover, the processes Yt and Y′t agree in all transient distributions. It remains to show that Yt is a CTMC, that is, for all A,A′ and for all t, h ≥ 0,

P(Yt+h = A′ ∣ Yt = A,HYt) = P(Yh = A′ ∣ Y0 = A).

This can be proven by induction on the number of jumps that occurred in the history HYt, analogously to the proof of Lemma 4.17.
We complement the proof by showing that the g-projection Yt meets the Chapman-Kolmogorov equations. To that end, notice that, for all t, h ∈ R≥0,

∑A′∈S P(Yt = A′ ∣ Y0 = A)P(Yh = A′′ ∣ Y0 = A′)
= ∑A′∈S ∑n≥0 ∑k=0,...,n P(Zk = A′ ∣ Z0 = A)P(Zn−k = A′′ ∣ Z0 = A′) e^{−r(t+h)} r^n t^k h^{n−k}/(k!(n − k)!)
= ∑n≥0 P(Zn = A′′ ∣ Z0 = A) e^{−r(t+h)} (r^n/n!) ∑k=0,...,n (n choose k) t^k h^{n−k}, since Zn is a DTMC,
= ∑n≥0 P(Zn = A′′ ∣ Z0 = A) e^{−r(t+h)} (r(t + h))^n/n!
= P(Yt+h = A′′ ∣ Y0 = A),

where the last equality follows from Theorem 4.24.
Let µ be a stationary distribution of the CTMC Xt. Then we have the corre-
sponding analogue of Theorem 4.20.
Theorem 4.26. Let Xt be a CTMC assigned to a Markov graph (S,w, p0) with an irreducible generator Q and with supsi ∣a(si)∣ < r for some r > 0, and let µ be the stationary distribution for Q. Moreover, let Yt be the (g,α)-aggregation of Xt, with stationary distribution µ̄. Then,

(i) The process Xt has a unique stationary distribution µ, and µ respects α.
(ii) The process Yt has a unique stationary distribution µ̄, and µ̄(A) = ∑s∈[A] µ(s).
(iii) The (g,α)-aggregation from (S,w,µ) to (S, w, µ̄) is exact.
Proof. We first consider the uniformized chain Zn corresponding to Xt, with transition matrix M = Q/r + IN. Note that µ is the stationary distribution for M. It follows by Theorem 4.20 that the aggregated distribution µ̄, with µ̄(A) = ∑s∈[A] µ(s), is stationary for the aggregation of Zn, hence for Y′t.
4.4 Trace semantics of stochastic processes
We now define a probability space directly on the traces of a given stochastic
process, which will be helpful for reasoning about approximate reductions.
Let Xt, t ∈ T, be a stochastic process. Define a measurable space (Γ,FΓ), such that Γ contains all traces of the process Xt and FΓ is a suitably chosen set of measurable sets of traces. A trace γ ∈ Γ is a mapping from T to S, and the set Γ lies in the space of all functions from T to S, denoted by (T → S) or ST. The probability measure PΓ is then inherited from the finite-dimensional marginal distributions of Xt, and one has to take care to define it properly. The probability space (Γ,FΓ,PΓ) is sometimes called a directly given random process, or a Kolmogorov model for the random process [46].
4.4.1 Trace semantics: discrete-time
For example, in a directly given random process of a discrete-time process Xn, n ∈ N, a trace is an infinite sequence of states from S:

γ ≡ s0 → s1 → s2 → . . . ∈ SN,

and it is also called the output sequence or a discrete signal. We refer to the i-th state of trace γ by sγi. For a given k ∈ N0 and a sequence of states s0, . . . , sk ∈ S, let Prefix(s0, . . . , sk) denote the set of all the traces with prefix (s0, . . . , sk):

Prefix(s0, . . . , sk) = {γ ∈ Γ ∣ sγi = si, for all i = 0,1, . . . , k}.
Let the event space FSN be the smallest σ-algebra containing all the finite prefix sets of traces. The elements of FSN are sometimes called the rectangle sets of traces. Finally, the probability measure PSN is naturally inherited from the original probability measure, with

PSN(Prefix(s0, . . . , sk)) = P(X0 = s0, . . . ,Xk = sk).

Notice that any finite-dimensional distribution related to the process Xn can be determined through the set FSN. Random variables can now be defined over the probability space (SN,FSN,PSN) as usual.
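For a DTMC, the prefix measure is simply a product of transition weights. A minimal sketch (toy chain, hypothetical numbers) computes it and checks that the prefix sets of a fixed length partition the trace space:

```python
# Prefix measure of a toy DTMC (hypothetical numbers):
# P(Prefix(s0,...,sk)) = p0(s0) * w(s0,s1) * ... * w(s_{k-1},s_k).
from itertools import product

w = [[0.5, 0.25, 0.25],
     [0.3, 0.35, 0.35],
     [0.3, 0.35, 0.35]]
p0 = [0.4, 0.3, 0.3]

def prefix_prob(states):
    p = p0[states[0]]
    for s, sp in zip(states, states[1:]):
        p *= w[s][sp]
    return p

# The prefix sets of a fixed length k partition the trace space,
# so their probabilities sum to one.
k = 3
total = sum(prefix_prob(seq) for seq in product(range(3), repeat=k + 1))
assert abs(total - 1.0) < 1e-9
```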
4.4.2 Trace semantics: continuous-time
We now construct a directly given random process for a continuous-time process Xt, t ∈ R≥0. Let Γ(R+,S) denote the set of all piecewise constant, right-continuous functions from R≥0 to S. Recall that any trace in Γ(R+,S) is characterized by an infinite sequence of jump times and jump states:

γ ≡ s0 →t1 s1 →t2 s2 . . . ∈ Γ(R+,S).

We refer to the i-th visited state of trace γ by sγi, and to the time instance at which the i-th jump occurred by tγi. Given k ∈ N, let Cylinder(s0, . . . , sk, δ0, . . . , δk−1) denote the set of traces which visit the sequence of states s0, . . . , sk, with respective waiting times no longer than δ0, . . . , δk−1:

Cylinder(s0, . . . , sk, δ0, . . . , δk−1) = {γ ∈ Γ(R+,S) ∣ sγi = si for all i = 0, . . . , k, and tγi+1 − tγi < δi for all i = 0, . . . , k − 1},
and let FΓ(R+,S) be the smallest σ-algebra containing all such sets (for all k ∈ N, s0, s1, . . . , sk ∈ S and δ0, . . . , δk−1 ∈ R≥0). The elements of FΓ(R+,S) are called cylinder sets of traces. Notice that a cylinder set of traces may prescribe any subset of S at the i-th jump, and the jump times may lie in any interval in B. The probability measure is naturally inherited from the original process Xt:

PΓ(R+,S)(Cylinder(s0, . . . , sk, δ0, . . . , δk−1)) = P(τi < δi for all i = 0, . . . , k − 1, and Zi = si for all i = 0, . . . , k).
4.5 Trace semantics interpretation of exact aggregations
We summarize the results of this chapter by relating the trace semantics of the
aggregated chain with the trace semantics of the original chain.
4.5.1 Discrete-time case
Let (SN,FSN,PSN) be the directly given random model for the process Xn. In particular, define the following three measurements (random variables):

(i) Prefix trace semantics, defined by X0∶k(γ) = (sγ0, . . . , sγk), inherits the joint distribution of (X0, . . . ,Xk) of the original process Xn. The probability space generated by the random variable X0∶k is (Sk+1,P(Sk+1),PX0∶k). For some D0, . . . ,Dk ∈ P(S), the rectangle set of traces {(s0, . . . , sk) ∣ si ∈ Di, i = 0, . . . , k} will be denoted by D0 → . . . → Dk. When it is clear from the context that all the Di are singletons, we write s0 → . . . → sk instead of {s0} → . . . → {sk}.
(ii) Transient semantics, defined by Xk(γ) = sγk, coincides with the transient
distribution of the process Xn.
(iii) Projected trace semantics, defined by Xg(γ) = (g(sγ0), g(sγ1), . . .), projects the model to the measurable space (SN,FSN). It is easy to check that the induced probability space is equivalent to the directly given random model of the g-projection of the process Xn.
Theorem 4.27. Let Xn be a DTMC assigned to the Markov graph (S,w, p0), such that (Cond2) holds for some α. Let Yn be the process assigned to the α-aggregation (S, w, p0). Then,

(i) If p0 respects α, then for all k = 0,1, . . .
(i) P(X0∶k ∈ [A0]→ . . .→ [Ak]) = P(Y0∶k = A0 → . . .→ Ak)
(ii) P(X0∶k ∈ [A0]→ . . .→ [Ak−1]→ s) = P(Y0∶k = A0 → . . .→ Ak)α(Ak, s), with s ∈ [Ak].

(ii) If p0 respects α, then for all n = 0,1, . . .
(i) P(Yn = A) = P(Xn ∈ [A])
(ii) P(Xn = s) = α(A, s)P(Yn = A).

(iii) If (S,w, p0) is irreducible and aperiodic, then for n→∞,
(i) ∣P(Yn = A) − P(Xn ∈ [A])∣→ 0
(ii) P(Xn = s)/P(Yn = A)→ α(A, s).
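Statement (i) can be checked by brute force on a small chain: summing the prefix probabilities of all state sequences inside a block sequence reproduces the path probability of the aggregated chain. All numbers below are hypothetical, with an initial distribution that respects α:

```python
# Toy chain satisfying (Cond2) and its alpha-aggregation (hypothetical numbers).
from itertools import product

w = [[0.5, 0.25, 0.25],
     [0.3, 0.35, 0.35],
     [0.3, 0.35, 0.35]]
w_bar = [[0.5, 0.5], [0.3, 0.7]]
blocks = [[0], [1, 2]]
p0 = [0.4, 0.3, 0.3]          # respects alpha(B,1) = alpha(B,2) = 1/2
p0_bar = [0.4, 0.6]

def path_prob(states, init, trans):
    p = init[states[0]]
    for s, sp in zip(states, states[1:]):
        p *= trans[s][sp]
    return p

# For every block sequence A0 -> A1 -> A2, the probability that X visits it
# equals the path probability of the aggregated chain (Theorem 4.27(i)).
for A_seq in product(range(2), repeat=3):
    lhs = sum(path_prob(seq, p0, w)
              for seq in product(*[blocks[A] for A in A_seq]))
    rhs = path_prob(A_seq, p0_bar, w_bar)
    assert abs(lhs - rhs) < 1e-9
```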
4.5.2 Continuous-time case
Let (Γ(R+,S),FΓ(R+,S),PΓ(R+,S)) be a directly given model for Xt. Similarly as in the discrete case, we focus on the three measurements:

(i) Prefix trace semantics, defined by X[0,T)(γ) = (kγ, sγ0, . . . , sγkγ, tγ1, . . . , tγkγ), where kγ is the number of jumps which happen before time T. For D0, . . . ,Dk ∈ P(S) and I1, . . . , Ik ∈ B, we write X[0,T) ∈ D0 →I1 . . . →Ik Dk as a shorthand for X[0,T) ∈ (k,D0, . . . ,Dk, I1, . . . , Ik), and, if it is clear that all the sets Di are singletons, we omit the set parentheses.
(ii) Transient semantics, defined by Xt(γ) = γ(t), coincides with the transient
distribution of Xt.
(iii) Projected trace semantics, defined by Xg(γ) = γ̄ ∈ Γ(R+,S), where γ̄(t) = g(γ(t)) for all t ∈ R≥0. It can be shown that the induced probability space is equivalent to the directly given random model of the g-projection of the process Xt.
Theorem 4.28. Let Xt be assigned to the Markov graph (S,w, p0), such that (Cond2) holds for some α. Let Yt be the process assigned to the α-aggregation of Xt. Then,

(i) If p0 respects α, then for all T ∈ R≥0,
(i) P(Y[0,T) = A0 →I1 . . . →Ik Ak) = P(X[0,T) ∈ [A0] →I1 . . . →Ik [Ak])
(ii) P(X[0,T) ∈ [A0] →I1 . . . [Ak−1] →Ik s) = P(Y[0,T) = A0 →I1 . . . →Ik Ak)α(Ak, s), with s ∈ [Ak].

(ii) If p0 respects α, then for all t ∈ R≥0,
(i) P(Yt = A) = P(Xt ∈ [A])
(ii) P(Xt = s) = α(A, s)P(Yt = A).

(iii) If (S,w, p0) is irreducible and aperiodic, then for t→∞,
(i) ∣P(Yt = A) − P(Xt ∈ [A])∣→ 0
(ii) P(Xt = s)/P(Yt = A)→ α(A, s).
4.6 Matrix representation
If the state space is finite, matrix notation enables a concise specification of the forward and backward criteria, and of the construction of the aggregated chain.
Corollary 5. Let V be an N ×M matrix defined by

Vs,A = 1 if g(s) = A, and Vs,A = 0 otherwise,

and let Uα be an M ×N matrix defined by

UαA,s = α(A, s) if g(s) = A, and UαA,s = 0 otherwise.
Let P be the transition matrix of Markov graph (S,w, p0) (either discrete- or
continuous-time), and let Pα be the transition matrix of the corresponding α-
aggregation (S, w, p0). Then,
(i) UαV = IM for all α,
(ii) (Cond1) is equivalent to VUαPV = PV for all α,
(iii) (Cond2) is equivalent to UαPVUα = UαP,
(iv) the transition matrix of the α-aggregation is given by Pα = UαPV,
(v) the results of Theorem 4.13 and Theorem 4.25 summarize to π̄(t) = π(t)V (lumpability) and π(t) = π̄(t)Uα (invertibility), where π(t) and π̄(t) denote the transient distributions of the original and of the aggregated chain.
Proof. (i) For all α, (UαV)AI,AJ = ∑s∈S UαAI,s Vs,AJ = ∑s∈[AJ] α(AI, s), which equals 1 if I = J, and 0 otherwise.
(ii) First see that (PV)si,AJ = ∑sj∈[AJ] w(si, sj) = δ+(si,AJ), and that (V Uα)si,sj = α(A, sj) if g(si) = g(sj) = A, and 0 otherwise. Moreover, (V UαP)si,sj = ∑sk∈S (V Uα)si,sk w(sk, sj) = ∑sk∈[g(si)] α(g(si), sk)w(sk, sj), and (V UαPV)si,AJ = ∑sj∈S (V UαP)si,sj Vsj,AJ = ∑sj∈[AJ] ∑sk∈[AI] α(AI, sk)w(sk, sj), where AI = g(si). Since (V UαPV)si,AJ has the same value for all si ∈ [AI], the matrices PV and V UαPV are equal if and only if (Cond1) holds.
(iii) Notice that (UαP)AI,sj = ∑sk∈S UαAI,sk w(sk, sj) = ∑sk∈[AI] α(AI, sk)w(sk, sj), which equals δ−(AI, sj)α(AJ, sj), where AJ = g(sj). On the other hand, (UαPV)AI,AJ = ∑si∈[AI] ∑sj∈[AJ] α(AI, si)w(si, sj), and consequently (UαPV Uα)AI,sj = (UαPV)AI,AJ α(AJ, sj), where AJ = g(sj). Therefore, UαPV Uα = UαP holds if and only if (UαPV)AI,AJ = δ−(AI, sj) for all sj ∈ [AJ], that is, if and only if (Cond2) holds.
The points (iv) and (v) follow trivially.
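The matrix identities of Corollary 5 are easy to verify numerically. The sketch below uses a toy chain (all numbers hypothetical) for which both (Cond1) and (Cond2) hold with the indicated α:

```python
# Toy transition matrix P, block matrix V, and conditional-probability matrix U^alpha.
P = [[0.5, 0.25, 0.25],
     [0.3, 0.35, 0.35],
     [0.3, 0.35, 0.35]]
V = [[1, 0], [0, 1], [0, 1]]             # blocks A = {0}, B = {1, 2}
U = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]   # alpha(A,0)=1, alpha(B,1)=alpha(B,2)=1/2

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def close(a, b, eps=1e-12):
    return all(abs(x - y) < eps for ra, rb in zip(a, b) for x, y in zip(ra, rb))

assert close(matmul(U, V), [[1.0, 0.0], [0.0, 1.0]])            # (i)   U^alpha V = I_M
assert close(matmul(matmul(V, matmul(U, P)), V), matmul(P, V))  # (ii)  (Cond1)
assert close(matmul(U, matmul(P, matmul(V, U))), matmul(U, P))  # (iii) (Cond2)
P_alpha = matmul(U, matmul(P, V))                               # (iv)  aggregated matrix
assert close(P_alpha, [[0.5, 0.5], [0.3, 0.7]])
```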
Chapter 5
Exact automatic reductions of stochastic
rule-based models
Throughout this chapter, we assume a rule-based program (R,G0) over a contact map C = (A,Σ,E,I) with Markov graph (G, w, p0) and process Xt. We use the notation S for the set of species and F for the set of fragments; Xt denotes the species-based semantics, with partition function ϕS ∶ G → X and Markov graph (X ,w, p0), and Yt the fragment-based semantics, with partition function ϕF ∶ G → Y. Finally, ϕ ∶ X → Y denotes the partitioning from the species state space to the fragment state space.
Definition 5.1. The F-reduction of (R,G0) is any ϕ-aggregation of (X ,w, p0). The reduction is exact if the ϕ-aggregation is exact (Definition 4.2); otherwise, the reduction is approximate.
We will address the following problem related to the exact reduction of a rule-based
program.
Problem 2. Characterize the set of fragments F such that there exists an exact F-reduction of (R,G0); then, if the fragment-based semantics of a given rule-based program is Yt, define a new rule-based program whose species-based semantics is Yt.
Definition 5.2. Let G1 = (V1,Type1, I1,E1, ψ1) and G2 = (V2,Type2, I2,E2, ψ2)be site-graphs such that V1∩V2 = ∅. Their disjoint union, denoted by G = G1⊎G2,
is such that, for i = 1,2, if v ∈ Vi, then Type(v) = Typei(v), I(v) = Ii(v), ψ(v) =ψi(v).
Definition 5.3. Site-graph G is said to be a sub-graph of G′, written G ⊆ G′, if
the identity mapping is a support of an embedding of G to G′. If σ ∈ Emb(G,G′),we denote by σ(G) the sub-graph of G′ induced by σ. In that case, we can refer
to the inverse transformation σ−1 ∈ Emb(σ(G),G).
The following definition gives an alternative way of defining fragments, by extending the annotation relation to sites of different agents which are potentially connected via a path. It will be useful for proving that the reduction computed by Algorithm 2 is exact.
Definition 5.4. Let {∼A ∣ A ∈ A} be some annotation of a contact map C. Moreover, let ∼ ⊆ {((A, s), (A′, s′)) ∣ A,A′ ∈ A, s ∈ Σ(A), s′ ∈ Σ(A′)} be the transitive closure of the relation

(A, s) ∼ (A′, s′) iff s ∼A s′ or ((A, s), (A′, s′)) ∈ E.
The contact map can be uniquely decomposed into a disjoint union of a finite number of contact maps, C = ⊎iCi (Figure 5.1a). Moreover, under the described decomposition of the contact map, the set of fragments F decomposes as F = ⊎iFi, where Fi is isomorphic to a set of species for the contact map Ci.
Definition 5.5. The restriction of a site-graph G = (V,Type, I,E,ψ) over a contact map C = (A,Σ,E,I), to a contact map Ci = (Ai,Σi,Ei,Ii), is denoted by G∣Ci and defined by (Vi,Typei, Ii,Ei, ψi), with Vi = {v ∈ V ∣ Type(v) ∈ Ai}, Typei(v) = Type(v), Ii(v) = Σi(Typei(v)), Ei = {((v, s), (v′, s′)) ∣ v, v′ ∈ Vi, s ∈ Σi(v), s′ ∈ Σi(v′), ((v, s), (v′, s′)) ∈ E}, and ψi(v, s) = ψ(v, s) for v ∈ Vi.
Notice that the identity node permutation always defines an embedding between
G∣Cj and G.
Notation 1. If σ ∈ Iso(G1,G2), we write G2 = σ(G1). A union of a1 copies of site-graph F1, a2 copies of site-graph F2, and so on, will be denoted by GF ≡ a1F1, . . . , amFm. Notice that GF is defined over the contact map C, but it is not necessarily a reaction mixture.
Recall that ϕS(G) = (x1, . . . , xn) with xi = mG(Si) for i = 1, . . . , n, and that
ϕF(G) = (y1, . . . , ym) with yi = mG(Fi) for i = 1, . . . ,m. We provide an alter-
native characterization of both aggregations in terms of a binary relation between
states. An illustration is provided in Figure 5.1.
Figure 5.1: Example 2.1, characterization of ϕS and ϕF by node permutations. a) The decomposition of a contact map, as described in Definition 5.4. b) Assume the initial state of (G, w, p0) is G0. Any of the four states G1,G2,G3,G4 is reachable from G0 after applying rules R1 and R2 (in any order). The permutation of nodes which confirms that G1,G2 ∈ [SABC, SB]ϕS is (v2 ↦ v1) (the other node images are uniquely determined); moreover, G3,G4 ∈ [SAB, SBC]ϕS, with permutation (v1 ↦ v2); finally, G1,G2,G3,G4 ∈ [FAB?, F?BC]ϕF. For example, the permutation between G1∣C1 and G3∣C1 is (u2 ↦ u2), and the permutation between G1∣C2 and G3∣C2 is (u2 ↦ u1).
Lemma 5.6. Let G = (V,Type,I,E,ψ), G′ = (V,Type,I,E′, ψ′) be two reaction
mixtures in G and let C = C1 ⊎ . . .⊎Ck be the decomposition of a contact map as
defined in Definition 5.4. Then,
(i) ϕS(G) = ϕS(G′) if and only if there exists an isomorphism ρ ∈ Iso(G,G′)with support ρ∗ ∶ V → V , that is, G′ = ρ(G). Notice that, since nodes can be
mapped only to nodes of the same type, ρ is a product of permutations over
the nodes of the same type.
(ii) ϕF(G) = ϕF(G′) if and only if there exists a family of isomorphisms ρi ∈Iso(G∣Ci ,G′∣Ci) ∣ i = 1, . . . , k, such that G′∣Ci = ρi(G∣Ci) for all i = 1, . . . , k.
It is expected that the species-based semantics is an exact aggregation of the individual-based semantics. The next result confirms that (Cond1) holds for lumping Xt with the partition ϕS.
Theorem 5.7. The process Xt is strongly lumpable with respect to ϕS .
Proof. Let G1 ∈ G with ϕS(G1) = x, and, for a rule Ri, let σ∗ ∶ Vi → V be such that δi(σ,G1) = G′1, with ϕS(G′1) = x′. Let G2 ∈ G be such that ϕS(G2) = x, and let ρ ∶ V → V be the node permutation such that G2 = ρ(G1). Then ρ∗ ∘ σ∗ ∶ Vi → V is an
Algorithm 2: Procedure for annotating the node types' signatures
Input: A rule-based program (R,G0) over the contact map (A,Σ,E,I), such that R ≡ R1, . . . ,Rn and, for each i = 1, . . . , n, Ri ≡ (Gi,G′i, ci);
Output: Annotation {∼A}A∈A (a family of equivalence relations on the set of sites of each node type).

for A ∈ A do ∼A ∶= {{s} ∣ s ∈ Σ(A)};
for G ∈ {G1,G′1, . . . ,Gn,G′n} do
    let G = (V,Type, I,E,ψ);
    for v ∈ V do
        A ∶= Type(v);
        for each s ∈ I(v) and each s′ ∈ I(v) do ∼A ∶= addrelation(∼A, s, s′);
    end
end
return (∼A)A∈A;
/* For any node type A ∈ A, ∼A is an equivalence relation that is encoded by a forest, as in the Union-Find algorithm [15]; the primitive addrelation(∼, a, b) fuses the two ∼-equivalence classes [a]∼ and [b]∼. */
embedding support between Gi and G2. Let ρ′ ∶ V ′ → V ′ be such that ρ′(v) = ρ(v) if v ∈ V, and ρ′(v) = v if v ∈ V ′ ∖ V. Then δi(ρ ∘ σ, ρ ∘ G1) = ρ′(δi(σ,G1)), by the definition of rule application. Hence δi(ρ ∘ σ,G2) ∈ [x′]ϕS. Since σ′ ≠ σ implies ρ ∘ σ′ ≠ ρ ∘ σ, the number of rule applications leading from G1 into [x′]ϕS equals the number leading from G2 into [x′]ϕS. Consequently, δ+(G1,x′) = δ+(G2,x′).
5.1 Exact fragment-based reduction
We propose Algorithm 2 for annotating the contact map of a given rule-based program. Initially, each site is correlated only with itself (this is the top of the annotation lattice introduced in Chapter 3). First, the annotation is refined so as to contain all correlations between sites which appear in the observable patterns, because all the observables should be readable from the fragment-based semantics. Then, the procedure refines the current annotation by processing the rules sequentially, so that every two sites which are tested or modified by the same rule are correlated by the annotation.
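The refinement loop can be sketched as executable code. The following is a minimal sketch, assuming a simplified encoding of the input: each rule side is reduced to a map from node types to the sites it tests or modifies, and all names are illustrative, not taken from the thesis. The relation ∼A is maintained with Union-Find, as the comment in Algorithm 2 suggests:

```python
class UnionFind:
    """Forest-encoded equivalence relation with path halving."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):          # the 'addrelation' primitive
        self.parent[self.find(a)] = self.find(b)

def annotate(signatures, rule_sides):
    """signatures: node type -> list of its sites; rule_sides: one dict per
    rule side, mapping node type -> sites tested or modified by that side."""
    ann = {A: UnionFind() for A in signatures}
    for A, sites in signatures.items():      # start from the identity relation
        for s in sites:
            ann[A].find(s)
    for side in rule_sides:
        for A, sites in side.items():
            sites = list(sites)
            for s in sites[1:]:              # fuse all sites seen together
                ann[A].union(sites[0], s)
    return ann

# Illustrative toy input (hypothetical, not the thesis' Example 2.1):
signatures = {"A": ["a", "b"], "B": ["a", "b", "c"], "C": ["c", "d"]}
rule_sides = [{"A": ["a", "b"], "B": ["a"]},  # one rule tests A.a, A.b, B.a
              {"B": ["c"], "C": ["c"]}]       # another tests B.c and C.c
ann = annotate(signatures, rule_sides)
assert ann["A"].find("a") == ann["A"].find("b")   # tested together: correlated
assert ann["B"].find("a") != ann["B"].find("c")   # never tested together
```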
From now on, we assume that {∼A ∣ A ∈ A} is the annotation derived by Algorithm 2, and F the corresponding fragmentation. The following observation will be important for showing that the F-reduction is exact.

Observation 1. Let u be a node appearing in the lhs of a rule Ri, let C = ⊎jCj be the contact map decomposition and F = ⊎jFj the corresponding partition of the fragment set, and let v be a node appearing in a fragment F ∈ F, such that Type(u) = Type(v) = A. Then, one of the following three cases applies:

(i) I(u) = ∅,
(ii) I(u) ⊆ I(v) = Σj(A), or
(iii) I(u) ≠ ∅ and I(u) ∩ I(v) = ∅.
In the further analysis, it will be important to notice that, for every embedding σ ∈ Emb(Gi,G), in case (ii) the test is performed exclusively over fragments from the group Fj. If Ri changes the internal configuration of a fragment, it affects no fragment from other groups. If Ri involves deletion of nodes, a deleted fragment from the set Fj will trigger a deletion of all other fragments which contain the node types involved in the lhs of the rule. Therefore, unless the deleted structure is a species, the change will involve more fragments of the same type simultaneously (see Figure 5.2). In a well-defined rule, the birth of a component must be the birth of a species. As Algorithm 2 also merges the sites which appear together in the rhs of a rule, the birth of a fragment will uniquely affect any fragment-based state. In case (i), no site of the node u is tested by the rule; such a node either remains as it is on the rhs, or is deleted.
We first establish lumpability between the fragment-based semantics and the
individual-based semantics by showing that (Cond2) holds.
Theorem 5.8. Xt is lumpable with respect to ϕF . Specifically, (Cond2) is met,
for α(y,G) = 1/∣[y]ϕF ∣.
Proof. Fix y,y′ ∈ Y, a reaction mixture G′1 ∈ [y′]ϕF and a rule Ri = (Gi,G′i, ci), such that Gi = (Vi,Typei, Ii,Ei, ψi) and G′i = (V ′i,Type′i, I′i,E′i, ψ′i). Define the set of embeddings which lead to G′1 after applying the rule Ri to a reaction mixture from [y]ϕF by Γ−(y,Ri,G′1) ∶= {σ ∈ Emb(Gi,G) ∣ G ∈ [y]ϕF and δi(G,σ∗) = G′1}. We show that the cardinality of the set Γ−(y,Ri,G′1) does not depend on the choice of G′1. Let
Figure 5.2: Specificity of deletion events. a) A rule R5 is added to Example 2.1, where a node of type B is deleted whenever site c is free. b) Two reaction mixtures, G1 and G2, are lumped in the fragment-based view; into both states G1 and G2, the lhs of rule R5 embeds exactly once: mG1(G5) = mG2(G5) = 1, but the states obtained after the transition are not equivalent in the fragment-based view. c) The species-based representation of the situation in b): (Cond1) is violated.
G′2 ∈ [y′]ϕF . The goal is to construct a bijection between sets Γ−(y,Ri,G′1) and
Γ−(y,Ri,G′2).
For each G1 and σ1 ∈ Γ−(y,Ri,G′1), we define a different G2 ∈ [y]ϕF and σ2, such that G′2 = δi(G2, σ∗2). Since ϕF(G′1) = ϕF(G′2), there exist node permutations {ρ′∗j ∶ V ′ → V ′ ∣ j = 1, . . . , k} such that G′2∣Cj = ρ′j(G′1∣Cj) for j = 1, . . . , k. Let u ∈ Gi, and let v = σ∗1(u) be its image in G1. Define G2 by permuting the node identifiers of G1 with {ρ∗j ∶ V → V ∣ j = 1, . . . , k}, such that ρ∗j(v) = ρ′∗j(v) if v ∈ V ∩ V ′, and ρ∗j(v) = v if v ∈ V ∖ V ′. In other words, the image of a node which was not deleted remains as in y′, and the deleted nodes have the identity image. In case V ⊂ V ′ (creation of nodes), the image is neglected. We first show that G2 ∈ [y]ϕF. (Case 1) If V = V ′, the claim trivially holds by the discussion in Observation 1. (Case 2) If V ∖ V ′ ≠ ∅ (Ri is a deletion event), then, for all j = 1, . . . , k, a deleted node v ∈ V ∖ V ′ is mapped to itself by ρj. Therefore, for each deleted node in G1 and G2, their interfaces are equivalent. Denote the deleted component by Sl. Since, by assumption, G′1 = G1 ∖ Sl and G′2 = G2 ∖ Sl are fragment-based equivalent, it follows that G1 and G2 are fragment-based equivalent. (Case 3) If V ′ ∖ V ≠ ∅ (Ri is a birth event), then a node v ∈ V ′ ∖ V induces a connected component in G′1 which is a fully defined species Sl, because, in a well-defined rule, the birth components must have full interfaces. Moreover, since Algorithm 2 groups all the sites which appear on the rhs of a rule, the annotation class of every v ∈ V ′ ∖ V lists all sites from its interface Σ(v), and, consequently, the connected component induced by ρ′∗j(v) must be its species equivalent. Therefore, the states before the creation of these components are also fragment-equivalent. Finally, the rule Ri can be applied to G2 via the embedding ρ∗j ∘ σ∗, resulting in δi(ρj ∘ G1, ρ∗j ∘ σ∗) = ρj(δi(G1, σ∗)) = G′2.
It can be checked that the activity of each rule is equal for any two individual states lumped by ϕF. This is, however, not sufficient for claiming (Cond1).
Theorem 5.9. If all deletion events are deletions of a species, then Xt is strongly lumpable with respect to ϕF; specifically, (Cond1) holds.
Proof. Taking G1,G2 such that ϕF(G1) = ϕF(G2) = y, it suffices to establish a
bijection between the successors of G1 and G2 inside some state [y′]ϕF . The proof
is analogous to the proof of Theorem 5.8, but it proceeds in the opposite direction:
under assumption that the transition sources are lumped, one needs to show that
the transition targets are lumped as well. The difference comes in the argument
related to the cases of node birth and node deletion: the case of node deletion is
analogous to the case of node creation in Theorem 5.8, and vice versa. However,
while the proof of Theorem 5.8 relies on the argument that the new node must
have fully defined interface, the guarantee that the deleted node is a fully defined
species is lost. An example in which node deletion violates (Cond1) is provided in
Figure 5.2.
Theorem 5.10. Xt is lumpable with respect to ϕ. Specifically, (Cond2) is met
for α ∶ Y ×X → [0,1], such that
α(y, x) = |[x]ϕS| / |[y]ϕF|, if [x]ϕS ∩ [y]ϕF ≠ ∅, and α(y, x) = 0 otherwise.
If all deletion events are deletion of a species, (Cond1) holds as well.
The proof follows directly from Theorem 3.4.
The computation of the conditional probabilities can be done efficiently: for G ∈ [x]ϕS, |[x]ϕS| = |Aut(G)| is given in Theorem 2.22. Moreover, |[y]ϕF| is equal to
the number of automorphisms of τ(G).
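The role of automorphism counts in these conditional probabilities can be illustrated with a brute-force count on a small typed graph. The function `count_automorphisms`, the node typing, and the example graphs below are hypothetical stand-ins for the site-graphs of the thesis, not its actual data structures:

```python
from itertools import permutations

def count_automorphisms(types, edges):
    """Count type- and edge-preserving permutations of the node set
    (brute force; adequate for small connected graphs)."""
    n = len(types)
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(n)):
        # A candidate relabeling must preserve node types ...
        if any(types[perm[i]] != types[i] for i in range(n)):
            continue
        # ... and map the edge set onto itself.
        if {frozenset((perm[u], perm[v])) for u, v in edges} == edge_set:
            count += 1
    return count

# A triangle of three nodes of the same type: all 6 permutations preserve it.
print(count_automorphisms(['A', 'A', 'A'], [(0, 1), (1, 2), (0, 2)]))  # 6
# A path A-B-A: only the identity and the end-swap preserve types and edges.
print(count_automorphisms(['A', 'B', 'A'], [(0, 1), (1, 2)]))  # 2
```

A production implementation would use a canonical-labeling algorithm rather than enumerating all permutations; the point here is only the quantity being counted.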
Related works have discussed this problem from different perspectives. In [33] and [32], the proof is facilitated by process-algebraic notation,
while the semantics is defined over a weighted labeled transition model. In [37], the
results were extended by showing how the applicability of de-aggregation depends
on the initial distribution. Recently, we presented a procedure for reconstructing
the high-dimensional species-based dynamics from the aggregate state efficiently.
The algorithm involves counting the automorphisms of a connected site-graph,
and has a quadratic time complexity in the number of molecules which constitute
the site-graphs of interest [70].
5.2 Computing the fragment-based semantics
When dealing with large-scale examples, it is important to be able to compute the
fragment-based semantics. We propose to reuse the rule-based simulator for ob-
taining the fragment-based semantics. Towards this goal, it is important to choose
the right representation of a fragment-based state, and to define the executions
between fragment-based states, so that the aggregation between the underlying
Markov graphs is indeed exact.
Recall the translation of a rule-based program proposed in Chapter 3. In the
following, consider (R̄, Ḡ0), the result of translating (R, G0) according to
Algorithm 1 and the fragmentation F.
Observation 2. Each species in S̄ = S̄1, S̄2, . . . of the model (R̄, Ḡ0) is isomorphic
to some fragment Fj ∈ F = F1, F2, . . . of the model (R, G0), and vice versa.
Assume that the ordering is such that S̄1 ≅ F1, S̄2 ≅ F2, etc.
Theorem 5.11. The individual-based semantics of (R, G0), a Markov graph
(G, w, p0), and the individual-based semantics of (R̄, Ḡ0), a Markov graph (Ḡ, w̄, p̄0), are equivalent.
The following result will serve to show that the activities of a rule in G ∈ G and in
τ(G) ∈ Ḡ are equal. Recall the translation of a rule-based program presented in
Chapter 3 (Definition 3.10).
Lemma 5.12. Let G = (V,Type, I,E,ψ) and A1, . . . ,An be the node types which
occur in the lhs of the rule Ri, that is, in Gi. Moreover, let a1, a2, . . . , an denote
the number of annotation classes in A1,A2, . . . ,An respectively, and N1, . . . ,Nn
the total abundance of nodes of type A1, . . . ,An in G. Then,
c̄i / ci = 1 / ∏_{j=1}^{n} N_j^{a_j − 1} = m_G(Gi) / m_{τ(G)}(τ(Gi)).
Proof. The first equality follows from the definition of Algorithm 2. Fix σ∗ : Vi → V, one embedding of Gi into G, such that δi(G, σ∗) = G′. Let Ci = (Ai, Σi, Ei, Ii) be
the contact map inferred from the rule Ri. For each σ ∈ Emb(Gi, G), there are
exactly ∏_{j=1}^{n} N_j^{a_j − 1} distinct embeddings σ̄ ∈ Emb(τ(Gi), τ(G)). More precisely, given
σ ∈ Emb(Gi, G), every map σ̄∗ with

σ̄∗(v̄C) = [σ∗(v)]C, if Σi ⊆ C, and σ̄∗(v̄C) = any node ū ∈ V̄ such that Type(ū) = [Type(v)]C, if C ∩ Σi = ∅,

induces an embedding σ̄ ∈ Emb(τ(Gi), τ(G)). By Observation 1, there is a
unique v̄C ∈ V̄i such that Σi ⊆ C, and the image of this node is determined by σ∗. Since, for each of the remaining aj − 1 annotation classes of a node of type Aj, there are Nj possibilities for the choice of
σ̄∗(v̄C), the proof is complete.
Proof. (Theorem 5.11) It suffices to show that

1. G ∈ G iff τ(G) ∈ Ḡ, and

2. w(G, G′) = w̄(τ(G), τ(G′)).

We use induction on the number of steps. By definition, Ḡ0 = τ(G0). Assume that
Ḡ = τ(G), and let G′ = δi(G, σ) be an arbitrary successor of the state G. Let V̄i be the
set of nodes of the translation of rule Ri by Algorithm 1. Define σ̄∗ : V̄i → V̄, so
that σ̄∗(v̄C) = [σ∗(v)]C. Then, it trivially holds that δi(Ḡ, σ̄∗) = Ḡ′, and Ḡ′ = τ(G′). Moreover,
w(G, G′) = ∑_{Ri} ∑_{σ ∈ Emb(Gi, G)} ci 1[δi(G, σ) = G′]   (by definition of rule application)
= ∑_{Ri} (m_G(Gi) ci) 1[δi(G, σ∗) = G′]
= ∑_{Ri} (m_{τ(G)}(τ(Gi)) c̄i) 1[δi(τ(G), σ̄∗) = τ(G′)]   (by Lemma 5.12)
= ∑_{Ri} ∑_{σ̄ ∈ Emb(τ(Gi), τ(G))} c̄i 1[δi(τ(G), σ̄) = τ(G′)]   (by definition of rule application)
= w̄(τ(G), τ(G′)).
Figure 5.3: Example 2.1: testing Theorem 5.11, Theorem 5.10 and Theorem 4.28 (points (ii) and (iii)) for the set of fragments shown in Figure 3.2, for the initial state x = {10SA, 20SB, 10SC} and rate values c1 = 0.001, c−1 = 2, c2 = 0.002, c−2 = 3, with c3 = c4 = 0 (solid lines) or c3 = 0.05, c4 = 0.1 (dotted lines). The states x1 = {10SB, 10SABC} and x2 = {10SAB, 10SBC} are such that ϕ(x1) = ϕ(x2) = y (the multiset {10FAB?, 10F?BC}). The transient distributions for the model with 506 states and for the reduced model with 121 states are obtained by integrating the CME. a) The plots for P(Yt = y) (green) and P(Xt ∈ [y]ϕ) (blue): for c3 = c4 = 0, the curves are identical; when the initial distribution is changed so that P(X0 = x1) = 1, the condition still holds (plot not shown). b) The plots for P(Xt = x1) and P(Yt = y)α(y, x1) = P(Yt = y) · 10!10!/20!: again, for c3 = c4 = 0, the curves are identical; when the initial distribution is set to P(X0 = x1) = 1, the condition holds asymptotically (plot not shown).
Corollary 6. Let Y′t be the species-based semantics of (R̄, Ḡ0). Then, Y′t is equivalent to Yt, the fragment-based semantics of (R, G0).
5.3 Example
The results are interpreted over Example 2.1. For the set of rules R1, R−1, R2, R−2,
the set of fragments F, derived by Algorithm 2, is the one shown in Figure 3.2
(Chapter 3). Therefore, Theorem 5.10 holds and (Cond2) is satisfied for the
process Xt (with respect to the partition ϕ), proving the F-reduction from Xt
to Yt to be exact. By Corollary 6, the species-based semantics of the new rule
set R̄ = R̄1, R̄−1, R̄2, R̄−2 is exactly the process Yt. Therefore, the process Yt
can be analyzed instead of Xt, with their mutual relation as outlined in Theorem 4.28. More precisely, points (i) and (ii) always hold. Since the set of rules
is reversible, the CTMC Yt is irreducible, and Theorem 4.28(iii) holds as well.
Moreover, point (ii) of Theorem 4.28 relates the transient semantics between
Yt and Xt. This is because there are no deletion events in the model, and by
Theorem 5.9. We confirmed all of the above observations on a test case described
in Figure 5.3.
Notice that, when rules R3, R4 are added, Algorithm 2 outputs the set
of species. Indeed, simulation shows that, once rules R3, R4 are added, F no longer
provides an exact reduction.
Chapter 6
Approximate aggregation of Markov chains
Throughout this section, we consider a Markov graph (S, w, p0) with the process
Xt (either discrete- or continuous-time), and a partitioning of the countable set S
induced by a surjective function g : S → S̄, where S̄ = {A1, . . . , AM} and M < |S|.
Recall from Chapter 4.1 that using a g-aggregation (S̄, w̄, p̄0) instead of the
original Markov graph (S, w, p0) is justified in the case of exact aggregation,
that is, when the projected process is equivalent to the process assigned to the
aggregated Markov graph. If we do not have guarantees that the aggregation is
exact, it is useful to quantify the error of using the Markov graph (S̄, w̄, p̄0) as an
approximation of the projected process.
Problem 3. Quantify the error induced by the g-aggregation (S̄, w̄, p̄0) until time T.
Let Yt be the g-projection of Xt and let Y′t be the CTMC assigned to
(S̄, w̄, p̄0). Both processes Yt and Y′t are defined over the same state space S̄.
The projected process Yt is not necessarily a Markov chain. We need a distance
measure between distributions of two multi-dimensional random variables. In case
of discrete time, we deal with a multi-dimensional discrete random variable, and
in the case of continuous time, we deal with a multi-dimensional mixed (discrete
and continuous) random variable.
We decide on the information-theoretic measure of divergence, known as relative
entropy or Kullback-Leibler divergence. The main reason why we use the KL-
distance is that it is convenient when applied to the probability space of traces
generated by Markov sources: it can be computed efficiently, as a function of only
Figure 6.1: Processes Xt and X′t operate on the state space S, while
processes Yt and Y′t operate on the reduced state space S̄. Aggregation
and lifting are operations over the Markov graph. The projected process Yt is not necessarily a Markov chain.
the corresponding generator matrices and the transient distribution of the original
process.
6.1 KL divergence
Let P and M be two probability measures on a common measurable space (Ω,F).
Definition 6.1. The divergence of measure P with respect to measure M (discrete
or continuous) is the supremum of relative entropy over all possible
discrete measurements:

D(P || M) = sup_f H(P_f || M_f).
Various other terms are used for quantity D(P∣∣M) throughout the literature: dis-
crimination, Kullback-Leibler number, directed divergence, cross entropy. By pre-
vious discussion on relative entropy, KL divergence is nonnegative, it equals zero
if the distributions match, and it can be equal to infinity (whenever P is not dominated by M). KL divergence is not a metric, since it is non-symmetric and it
does not satisfy the triangle inequality. A common technical interpretation is that
KL divergence is the coding penalty associated with selecting the candidate M to
approximate the correct distribution P .
The following theorem provides the computation of KL divergence for both discrete
and continuous probability spaces. It will be important in the analysis of approximate aggregations, since we will need to derive an error measure between trace
distributions generated by continuous-time Markov chains.
Theorem 6.2. [Theorem 5.2.3 in [46]] The Kullback-Leibler divergence of P with
respect to M is given by
D(P || M) = ∫ ln f(ω) dP(ω) = ∫ f(ω) ln f(ω) dM(ω), if P ≪ M, and D(P || M) = ∞ otherwise.
The term f = dP/dM is the Radon-Nikodym derivative of P with respect to M. The
quantity ln f , if it exists, is called the entropy density or relative entropy density
of P with respect to M.
In particular, it is worth instantiating the computation of divergence on a proba-
bility space depending on whether P and M are discrete or continuous measures:
(i) If P and M are discrete, then the Radon-Nikodym derivative (dP/dM)(ω) is equal
to p(ω)/m(ω), so we have

D(P || M) = ∑_{ω∈Ω} p(ω) ln (p(ω)/m(ω)).
Indeed, this follows immediately from Lemma 1.20: for discrete sample space,
the supremum is achieved for the identity measurement.
(ii) When P and M are measures on Euclidean space Rn, and if they are both
absolutely continuous with respect to Lebesgue measure (dominated by it),
then there exist pdf’s f and g for measures P and M respectively, such that
D(P || M) = ∫_{R^n} f(x) ln (f(x)/g(x)) dx.
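The discrete instantiation (i) is straightforward to check numerically. The following minimal sketch (the function name is ours) implements D(P || M) with the conventions just stated, returning infinity when P is not dominated by M:

```python
import math

def kl_divergence(p, m):
    """D(P || M) = sum_w p(w) ln(p(w)/m(w)) for discrete measures given
    as probability vectors over a common finite sample space."""
    d = 0.0
    for pw, mw in zip(p, m):
        if pw == 0.0:
            continue               # 0 * ln(0/m) = 0 by convention
        if mw == 0.0:
            return math.inf        # P is not dominated by M
        d += pw * math.log(pw / mw)
    return d

p = [0.5, 0.25, 0.25]
m = [1/3, 1/3, 1/3]
print(kl_divergence(p, m) > 0)   # True: distributions differ
print(kl_divergence(p, p))       # 0.0: divergence vanishes when P == M
```

Note the asymmetry: in general kl_divergence(p, m) differs from kl_divergence(m, p), in line with KL divergence not being a metric.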
Another important result, which will provide a crucial argument in the framework
of approximate aggregations, states that a measurement of a finite-alphabet
random variable cannot increase the relative entropy.
Theorem 6.3. [Thm. 5.2.2 in [46]] If P and M are two probability measures and
f a measurement on the common measurable space (Ω,F), then
D(Pf ∣∣Mf) ≤D(P∣∣M).
In the case of discrete probability spaces, the result of the theorem is immediate
from the definition of divergence. The result for an arbitrary measurable function on
continuous measure spaces can be proven by combining Lemma 1.20 with an
approximation technique. A detailed proof and further discussion can be found in
([46], Thm. 5.2.2).
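Theorem 6.3 can be observed numerically for a simple grouping measurement; the distributions and the 2-cell measurement `f` below are arbitrary illustrative choices:

```python
import math

def kl(p, m):
    """Discrete KL divergence; assumes P is dominated by M."""
    return sum(pw * math.log(pw / mw) for pw, mw in zip(p, m) if pw > 0)

def push_forward(dist, f, k):
    """Distribution of the measurement f (a map outcome -> cell) over k cells."""
    out = [0.0] * k
    for w, pw in enumerate(dist):
        out[f[w]] += pw
    return out

p = [0.4, 0.1, 0.2, 0.3]
m = [0.25, 0.25, 0.25, 0.25]
f = [0, 0, 1, 1]   # a 2-cell measurement (grouping) of the 4 outcomes

full = kl(p, m)
coarse = kl(push_forward(p, f, 2), push_forward(m, f, 2))
print(coarse <= full)  # True: a measurement can only lower the divergence
```

This is exactly the mechanism exploited later: projecting traces through g is a measurement, so the divergence between projected processes is bounded by the divergence between the full processes.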
6.2 Error measure: Discrete time
Our goal is to estimate the aggregation error for any given number of steps, as a
function of only the descriptions of Markov graphs (S,w, p0) and (S, w, p0) and
the transient distribution of process Xt of (S,w, p0). We start by defining the
aggregation error.
Definition 6.4. Let Xn be a DTMC assigned to M ≡ (S, w, p0) and Y′n a
process assigned to some g-aggregation M̄ ≡ (S̄, w̄, p̄0). Then, the aggregation
error until time n = 0, 1, . . . is defined by

∆_{M,M̄}(n) := D(P_{Y0:n} || P′_{Y0:n}),

where P_{Y0:n} and P′_{Y0:n} denote the prefix trace semantics of Yn and Y′n, respectively (prefix trace semantics is defined in Section 4.4.1).
A useful notion of divergence between processes is the KL divergence rate.
Definition 6.5. Let P and P′ be two different probability measures on the mea-
surable space of a directly given random model of a discrete-time process Xn.
Then, the KL distance rate is defined by

D(P_X || P′_X) = lim_{n→∞} (1/n) D(P_{X0:n} || P′_{X0:n}).
We start with a theorem on how to compute the KL distance between the prefix trace
distributions of two Markov sources on the same state space. Then, since
Yn is not necessarily a Markov chain, instead of computing the error between
Yn and Y′n directly, we will provide an upper bound by lifting the aggregation
(S̄, w̄, p̄0) back to the state space S. We will show that the KL distance between
the original process, Xn, and the lifted process, X′n, is indeed an upper bound
on the aggregation error.
Theorem 6.6. Let Xn and X′n be DTMCs assigned to the Markov graphs (S, w, p0)
and (S, w′, p′0), and let P, P′ denote the probability measures of their directly given
models. Moreover, let π(k) : S → [0, 1] denote the transient distribution of Xn at time k. Then,

D(P_{X0:n} || P′_{X0:n}) = H_{P||P′}(X0:n) = ∑_{s∈S} p0(s) ln (p0(s)/p′0(s)) + ∑_{k=0}^{n−1} ∑_{s∈S} π(k)(s) σ(s) = D(p0 || p′0) + ∑_{k=0}^{n−1} E_{π(k)}[σ],

where σ : S → R≥0 is defined by

σ(s) = ∑_{s′∈S} w(s, s′) ln (w(s, s′)/w′(s, s′)).

In particular, if Xn has a unique stationary distribution µ, then

D(P || P′) = E_µ[σ] = ∑_{s∈S} µ(s) σ(s).
Recall the definition of prefix trace semantics in discrete time and notice that the
prefix trace s0 → . . . → sk is assigned the probability

P_{X0:k}(s0 → . . . → sk) = P_{SN}(Prefix(s0, . . . , sk)) = p0(s0) ∏_{i=0}^{k−1} w(si, si+1).
Proof. We use induction on the length of the trace. The case n = 0 is trivial.
Assume that the claim holds for all k < n. Then,

D(P_{X0:n} || P′_{X0:n})
= ∑_{s0,...,sn−1∈S} ∑_{s′∈S} P(X0:n = (s0, . . . , sn−1, s′)) ln [ P(X0:n = (s0, . . . , sn−1, s′)) / P(X′0:n = (s0, . . . , sn−1, s′)) ]
= ∑_{s0,...,sn−1∈S} ∑_{s′∈S} P(X0:(n−1) = (s0, . . . , sn−1)) w(sn−1, s′) ln [ P(X0:(n−1) = (s0, . . . , sn−1)) w(sn−1, s′) / (P(X′0:(n−1) = (s0, . . . , sn−1)) w′(sn−1, s′)) ]
= D(P_{X0:(n−1)} || P′_{X0:(n−1)}) + ∑_{sn−1∈S} π(n−1)(sn−1) σ(sn−1),

where we used that the transient probability of being in state s at step (n − 1) is equal to the sum of the probabilities of all traces which have their (n − 1)st step in s.
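As a sanity check of Theorem 6.6, the closed form D(p0 || p′0) + Σ_k E_{π(k)}[σ] can be compared against a brute-force sum over all prefix traces. The two-state chains below are illustrative choices, not taken from the thesis:

```python
import math
from itertools import product

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def step(pi, w):
    """One step of the transient distribution: pi(k+1) = pi(k) * W."""
    n = len(pi)
    return [sum(pi[s] * w[s][t] for s in range(n)) for t in range(n)]

def trace_kl_formula(p0, w, q0, w2, n):
    """Theorem 6.6: D(P_{X0:n} || P'_{X0:n}) = D(p0||p0') + sum_k E_{pi(k)}[sigma]."""
    m = len(w)
    sigma = [sum(w[s][t] * math.log(w[s][t] / w2[s][t])
                 for t in range(m) if w[s][t] > 0) for s in range(m)]
    d, pi = kl(p0, q0), list(p0)
    for _ in range(n):                     # k = 0, ..., n-1
        d += sum(pi[s] * sigma[s] for s in range(m))
        pi = step(pi, w)
    return d

def trace_kl_bruteforce(p0, w, q0, w2, n):
    """Enumerate every trace (s0, ..., sn) and sum p ln(p/q) directly."""
    states = range(len(p0))
    d = 0.0
    for tr in product(states, repeat=n + 1):
        p, q = p0[tr[0]], q0[tr[0]]
        for a, b in zip(tr, tr[1:]):
            p *= w[a][b]
            q *= w2[a][b]
        if p > 0:
            d += p * math.log(p / q)
    return d

w  = [[0.9, 0.1], [0.3, 0.7]]
w2 = [[0.8, 0.2], [0.4, 0.6]]
p0, q0 = [0.6, 0.4], [0.5, 0.5]
print(abs(trace_kl_formula(p0, w, q0, w2, 3)
          - trace_kl_bruteforce(p0, w, q0, w2, 3)) < 1e-9)  # True
```

The brute-force version is exponential in n, while the formula needs only the transition matrices and the transient distribution, which is the efficiency gain the chapter alludes to.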
Figure 6.2: An example of π-lifting: the aggregated rate k between blocks a and b is lifted to rates 0.25k and 0.75k, with g = (a1, a2 ↦ a; b1, b2 ↦ b) and π = (0.5, 0.5, 0.25, 0.75) over (a1, a2, b1, b2).
6.2.1 Lifting: Discrete case
Lifting is an operation on the Markov graph, which is in a sense inverse to ag-
gregation: given a Markov graph on the aggregated state space, lifting outputs a
Markov graph on the original state space. Lifting needs to be done in such a way,
that the KL divergence between (S,w, p0) and the lifted graph (S,w′, p′0) provides
an upper bound on the aggregation error. It will suffice to construct the lifting so
that the aggregated process Y′n is a projection of the lifted process X′n.
Definition 6.7. Given a discrete-time Markov graph (S̄, w̄, p̄0) and a probability
distribution π : S → [0, 1], let

α(A, s) = 1_{g(s)=A} π(s) / ∑_{s′∈g−1(A)} π(s′).   (6.1)

Then, a Markov graph (S, w′, p′0) defined by

(i) p′0(s) = α(A, s) p̄0(A), where A = g(s), and

(ii) w′(s, s′) = α(A′, s′) w̄(A, A′), where A = g(s) and A′ = g(s′),

is called a π-lifting of (S̄, w̄, p̄0).
Lemma 6.8. The π-lifting of a discrete-time Markov graph is a discrete-time
Markov graph.
Proof. Notice that, for every s ∈ S, p′0(s) = ∑_{A∈S̄} α(A, s) p̄0(A) ≥ 0. Moreover,

∑_{s∈S} p′0(s) = ∑_{s∈S} ∑_{A∈S̄} p̄0(A) α(A, s) = ∑_{A∈S̄} p̄0(A) (∑_{s∈S} α(A, s)) = 1.
Observe that, by Definition 6.7, w′(s, s′) = ∑_{A∈S̄} ∑_{A′∈S̄} 1_{g(s)=A} α(A′, s′) w̄(A, A′) ≥ 0.
For a fixed s ∈ S,

∑_{s′∈S} w′(s, s′) = ∑_{s′∈S} ( ∑_{A∈S̄} ∑_{A′∈S̄} 1_{g(s)=A} α(A′, s′) w̄(A, A′) )
= ∑_{A∈S̄} 1_{g(s)=A} ( ∑_{A′∈S̄} ∑_{s′∈S} α(A′, s′) w̄(A, A′) )
= ∑_{A∈S̄} 1_{g(s)=A} ( ∑_{A′∈S̄} w̄(A, A′) ) = 1.
Lemma 6.9. If (S, w′, p′0) is a π-lifting of (S̄, w̄, p̄0), then (S̄, w̄, p̄0) is an exact
aggregation of (S, w′, p′0).

Proof. By Theorem 4.27, it suffices to show that, for α defined as in (6.1), (Cond2)
holds and p′0 respects α. The distribution p′0 respects α by construction. Let
A, A′ ∈ S̄. Then, for every state s′ ∈ [A′], δ−(A, s′) equals w̄(A, A′):

δ−(A, s′) = α(A′, s′)^{−1} ∑_{s∈[A]} α(A, s) w′(s, s′)
= α(A′, s′)^{−1} ∑_{s∈[A]} α(A, s) [α(A′, s′) w̄(A, A′)]
= α(A′, s′)^{−1} α(A′, s′) w̄(A, A′) ∑_{s∈[A]} α(A, s) = w̄(A, A′).
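Definition 6.7 and Lemmas 6.8 and 6.9 can be sketched in a few lines. The partition g and the distribution π follow Figure 6.2, while the aggregated rates w̄ and the initial distribution are illustrative values of our own choosing:

```python
def lift(w_bar, p0_bar, g, pi):
    """pi-lifting (Definition 6.7): alpha(A, s) = 1[g(s)=A] pi(s) / pi(g^-1(A)),
    p0'(s) = alpha(g(s), s) * p0_bar(g(s)),
    w'(s, s') = alpha(g(s'), s') * w_bar(g(s), g(s'))."""
    n = len(g)
    mass = {}
    for s in range(n):
        mass[g[s]] = mass.get(g[s], 0.0) + pi[s]
    alpha = [pi[s] / mass[g[s]] for s in range(n)]
    p0 = [alpha[s] * p0_bar[g[s]] for s in range(n)]
    w = [[alpha[t] * w_bar[g[s]][g[t]] for t in range(n)] for s in range(n)]
    return w, p0

# Aggregated graph on S_bar = {a, b}, lifted to S = {a1, a2, b1, b2}
# with the pi of Figure 6.2.
w_bar = [[0.75, 0.25], [0.25, 0.75]]
p0_bar = [1.0, 0.0]
g = [0, 0, 1, 1]
pi = [0.5, 0.5, 0.25, 0.75]
w, p0 = lift(w_bar, p0_bar, g, pi)

# Lemma 6.8: the lifting is again a Markov graph (rows sum to 1).
print(all(abs(sum(row) - 1.0) < 1e-12 for row in w))  # True
# Lemma 6.9: re-aggregating the lifted a -> b rate recovers w_bar(a, b).
print(sum(pi[s] / (pi[0] + pi[1]) * sum(w[s][t] for t in (2, 3))
          for s in (0, 1)))  # 0.25
```

The re-aggregation check is the discrete instance of (Cond2): the lifted rate out of any state of block a into block b is constant, so lumping is exact.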
Theorem 6.10. Let π be some probability distribution on S, and let X′n be the
process assigned to a π-lifting of (S̄, w̄, p̄0). Then, for all n ≥ 0,

∆_{M,M̄}(n) ≤ D(P_{X0:n} || P′_{X0:n}).

Proof. Since Yn is a g-projection of Xn, the measurement Y0:n is a composition
of the measurement g and X0:n on a directly given model of the process Xn (concretely,
Y0:n = g ∘ X0:n). By Lemma 6.9, since the aggregation from (S, w′, p′0) to (S̄, w̄, p̄0) is exact, Y′n is a g-projection of X′n and, thus, Y′0:n = g ∘ X′0:n. The final claim
follows from the fact that measurement reduces relative entropy (Theorem 6.3).
6.3 Error measure: Continuous time
Definition 6.11. Let Xt be a CTMC assigned to M ≡ (S, w, p0) and Y′t a
process assigned to some g-aggregation M̄ ≡ (S̄, w̄, p̄0). Then, the aggregation
error until time T ∈ [0, ∞) is defined by

∆_{M,M̄}(T) := D(P_{Y[0,T)} || P′_{Y[0,T)}),

where P_{Y[0,T)} and P′_{Y[0,T)} denote the trace distributions of Yt and Y′t, respectively.
Theorem 6.12. Let Xt and X′t be assigned to the CTMCs (S, w, p0) and
(S, w′, p′0), respectively, and let P, P′ denote the probability measures of the directly
given models. Moreover, let π(t) denote the transient distribution of Xt at time
t. Then,

D(P_{X[0,T)} || P′_{X[0,T)}) = H_{P||P′}(X[0,T)) = ∑_{s∈S} p0(s) ln (p0(s)/p′0(s)) + ∫_0^T ∑_{s∈S} π(t)(s) θ(s) dt = D(p0 || p′0) + ∫_0^T E_{π(t)}[θ] dt,

where θ : S → R≥0 is defined by

θ(s) = ∑_{s′∈S∖{s}} w(s, s′) ln (w(s, s′)/w′(s, s′)) − (a(s) − a′(s)).
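The computation in Theorem 6.12 lends itself to a simple numerical sketch: θ is evaluated directly from the two rate matrices, and the transient distribution π(t) is approximated by forward-Euler integration of the master equation. The rate values and step size below are illustrative choices, not taken from the thesis:

```python
import math

def exit_rates(w):
    """a(s) = sum of off-diagonal rates out of s."""
    return [sum(w[s][t] for t in range(len(w)) if t != s) for s in range(len(w))]

def theta(w, w2):
    """theta(s) = sum_{s' != s} w(s,s') ln(w(s,s')/w'(s,s')) - (a(s) - a'(s))."""
    a, a2 = exit_rates(w), exit_rates(w2)
    n = len(w)
    return [sum(w[s][t] * math.log(w[s][t] / w2[s][t])
                for t in range(n) if t != s and w[s][t] > 0)
            - (a[s] - a2[s]) for s in range(n)]

def ctmc_trace_kl(p0, w, q0, w2, T, dt=1e-4):
    """D(p0||p0') + integral_0^T E_{pi(t)}[theta] dt, with pi(t) obtained by
    forward-Euler integration of the master equation."""
    n = len(p0)
    th, a = theta(w, w2), exit_rates(w)
    d = sum(p * math.log(p / q) for p, q in zip(p0, q0) if p > 0)
    pi, t = list(p0), 0.0
    while t < T:
        d += dt * sum(pi[s] * th[s] for s in range(n))
        pi = [pi[s] + dt * (sum(pi[r] * w[r][s] for r in range(n) if r != s)
                            - pi[s] * a[s]) for s in range(n)]
        t += dt
    return d

w  = [[0.0, 2.0], [1.0, 0.0]]   # off-diagonal rates of the original chain
w2 = [[0.0, 1.5], [1.2, 0.0]]   # off-diagonal rates of the comparison chain
p0 = [1.0, 0.0]
print(ctmc_trace_kl(p0, w, p0, w, 1.0))       # 0.0 -- identical chains
print(ctmc_trace_kl(p0, w, p0, w2, 1.0) > 0)  # True
```

A more careful implementation would integrate the master equation with a higher-order scheme or a matrix exponential; the Euler step suffices to illustrate that only Q, Q′, p0, p′0 and the transient distribution of the original process are needed.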
The proof of a more general statement (the case of Markov processes which are
not necessarily time-homogeneous) can be found in [13]. We next outline the proof
for CTMC’s.
Recall that X[0,T) is characterized by a tuple lying in ∪_{k=0}^{∞} (S × (R+ × S)^k), and
that the events measurable by X[0,T) are of the form (k, D0, . . . , Dk, I1, . . . , Ik), denoted by

X[0,T) ∈ D0 −I1→ . . . −Ik→ Dk,

where Di ∈ P(S) and Ii ∈ B, for i = 0, . . . , k (Section 4.5.2). For example, the event
X[0,T) ∈ D0 −I1→ D1 −I2→ D2 contains all traces which start at some s0 ∈ D0, make a first jump
to a state s1 ∈ D1 at a time t1 ∈ I1, move to a state s2 ∈ D2 at a time t2 ∈ I2, and finally
exit the state s2 after time T.
Lemma 6.13. Let Xt be a process assigned to a CTMC (S, w, p0) and X[0,T)
its prefix trace semantics (Section 4.5.2). Then, with t0 = 0,

f_{X[0,T)}(s0 −t1→ . . . −tk→ sk) = p0(s0) ∏_{i=0}^{k−1} [e^{−a(si)(ti+1−ti)} w(si, si+1)] · e^{−a(sk)(T−tk)}.   (6.2)
Proof. The measurable sets related to the random variable X[0,T) belong to the
smallest σ-algebra generated by ∪_{k=0}^{∞} (S × ({[0, r) | r > 0} × S)^k). The cumulative
distribution function of X[0,T) is therefore

F_{X[0,T)}(s0 −t1→ . . . −tk→ sk) = P_{Γ(R+,S)}({γ | sγ1 = s1, . . . , sγk = sk, tγ0 < t0, . . . , tγk < tk − tk−1, tγk+1 > T − tk}),

where, as introduced in Section 4.5.2, sγi is the i-th state visited by γ, and tγi the time instance at which the i-th jump occurred. The cylinder
set of traces Cylinder(s0, s1, . . . , sk, t0, t1 − t0, . . . , tk − tk−1) contains all of the
above-described events, but it also contains the traces in which the (k + 1)st exit happens before time T. Hence, the probability of the disjoint union
∪_{s∈S} Cylinder(s0, . . . , sk, s, t0, . . . , tk − tk−1, T − tk) is subtracted from the result. Intuitively, the proportion of the events which are over-counted should be
P(ξk+1 > T − tk | Zk = sk) = 1 − (1 − e^{−a(sk)(T−tk)}). Recall that (Section 1.5.5)

P(Cylinder(s0, . . . , sk, s, δ0, . . . , δk−1, δk)) = P(Cylinder(s0, . . . , sk, δ0, . . . , δk−1)) ∫_0^{δk} e^{−a(sk)u} w(sk, s) du.

Finally, multiplying the cdf P(Cylinder(s0, . . . , sk, t0, . . . , tk − tk−1)) by the
factor (1 − ∑_{s∈S∖{sk}} (1 − e^{−a(sk)(T−tk)}) w(sk, s)/a(sk)) = e^{−a(sk)(T−tk)} yields

F_{X[0,T)}(s0 −t1→ . . . −tk→ sk) = p0(s0) ∏_{i=0}^{k−1} [(1 − e^{−a(si)(ti+1−ti)}) w(si, si+1)/a(si)] · e^{−a(sk)(T−tk)}.

The corresponding pdf is the derivative over δ1, . . . , δk.
By definition of KL divergence,

D(P_{X[0,T)} || P′_{X[0,T)}) = E_{X[0,T)}[ln f_{X[0,T)}] − E_{X[0,T)}[ln f_{X′[0,T)}].   (6.3)
For notational convenience, we further write E for EX[0,T ) . Moreover, we write f
for fX[0,T ) and f ′ for fX′[0,T ) . Finally, we will use Γ to denote the range of a variable
X[0,T ) and γ for a tuple which characterizes a trace until time T . Respectively,
the state visited by a trace γ at time t is denoted by γ(t).
From (6.2), by setting tk+1 = T, it follows that

ln f(s0 −t1→ . . . −tk→ sk) = ln p0(s0) − ∑_{j=0}^{k} a(sj)(tj+1 − tj) + ∑_{j=0}^{k−1} ln w(sj, sj+1).   (6.4)
The next two lemmas express the expectations of the two summations with
respect to the density f as integrals of suitable functions over the interval [0, T).
Lemma 6.14. Let a : S → R≥0. Then,

E_f [ ∑_{j=0}^{kγ} a(sγj)(tγj+1 − tγj) ] = ∫_0^T ∑_{s∈S} π(t)(s) a(s) dt.
Proof. Observe that, with tγ0 = 0 and tγkγ+1 = T,

∫_0^T a(γ(t)) dt = ∫_0^{tγ1} a(sγ0) dt + . . . + ∫_{tγkγ}^{T} a(sγkγ) dt = ∑_{j=0}^{kγ} a(sγj)(tγj+1 − tγj).

Therefore,

E_f [ ∑_{j=0}^{kγ} a(sγj)(tγj+1 − tγj) ] = E_f [ ∫_0^T a(γ(t)) dt ] = ∫_{γ∈Γ} f(γ) [ ∫_0^T a(γ(t)) dt ] dγ
= ∫_0^T ∑_{s∈S} ( ∫_{γ∈Γ} f(γ) 1_{γ(t)=s} dγ ) a(s) dt = ∫_0^T ∑_{s∈S} π(t)(s) a(s) dt.
Lemma 6.15. Let φ : S × S → R be defined by φ(s, s′) = ln(w(s, s′)) 1_{s≠s′}. Then,

E [ ∑_{k=1}^{kγ} φ(sγk−1, sγk) ] = ∫_0^T ( ∑_s ∑_{s′≠s} π(t)(s) w(s, s′) φ(s, s′) ) dt.
Proof. Recall that, by definition of the generator matrix of a time-homogeneous Markov
chain, for all t ≥ 0 and s ≠ s′, w(s, s′) = lim_{h→0} P(Xt+h = s′ | Xt = s)/h. Due to right-continuity, we
may choose h such that the possibility of two reactions occurring within the interval [t, t + h) is negligible. Then, for a fixed trace γ ∈ Γ, the function φ(γ(t), γ(t + h))
has non-zero value only in the interval of at most length h before some jump occurs.
Therefore,

∫_0^{T−h} φ(γ(t), γ(t + h)) dt = ∫_{tγ1−h}^{tγ1} φ(sγ0, sγ1) dt + . . . + ∫_{tγkγ−h}^{tγkγ} φ(sγkγ−1, sγkγ) dt = h ∑_{k=1}^{kγ} φ(sγk−1, sγk).
Hence,

E [ ∑_{k=1}^{kγ} φ(sγk−1, sγk) ] = ∫_{γ∈Γ} f(γ) lim_{h→0} (1/h) ( ∫_0^{T−h} φ(γ(t), γ(t + h)) dt ) dγ
= lim_{h→0} (1/h) ( ∫_{γ∈Γ} ∑_s ∑_{s′≠s} f(γ) ∫_0^{T−h} 1_{γ(t)=s, γ(t+h)=s′} φ(s, s′) dt dγ )
= lim_{h→0} (1/h) ∫_0^{T−h} ( ∑_s ∑_{s′≠s} P(Xt = s, Xt+h = s′) φ(s, s′) ) dt
= lim_{h→0} (1/h) ∫_0^{T−h} ( ∑_s ∑_{s′≠s} P(Xt = s) P(Xt+h = s′ | Xt = s) φ(s, s′) ) dt
= ∫_0^T ( ∑_s ∑_{s′≠s} π(t)(s) w(s, s′) φ(s, s′) ) dt.
Proof. (Theorem 6.12) From (6.3) and (6.4), it follows that

D(P_{X[0,T)} || P′_{X[0,T)}) = E[ln f] − E[ln f′]
= E[ ln p0(sγ0) − ln p′0(sγ0) ]
− E [ ∑_{j=0}^{kγ} a(sγj)(tγj+1 − tγj) − ∑_{j=0}^{kγ} a′(sγj)(tγj+1 − tγj) ]
+ E [ ∑_{j=1}^{kγ} ln w(sγj−1, sγj) − ∑_{j=1}^{kγ} ln w′(sγj−1, sγj) ]
= D(p0 || p′0) + ∫_0^T ∑_s π(t)(s) ( ∑_{s′≠s} w(s, s′) ln (w(s, s′)/w′(s, s′)) − (a(s) − a′(s)) ) dt,

where the last step uses Lemmas 6.14 and 6.15.
6.3.1 Lifting: Continuous case
We construct the lifting analogously to the discrete-time case.
Definition 6.16. Given a continuous-time Markov graph (S̄, w̄, p̄0) and a probability distribution π : S → [0, 1], let α(A, s) be defined as in (6.1). Then, a Markov graph
(S, w′, p′0) defined by

(i) p′0(s) = α(A, s) p̄0(A) for A = g(s), and

(ii) w′(s, s′) = α(A′, s′) w̄(A, A′), if g(s) ≠ g(s′), where A = g(s) and A′ = g(s′);
w′(s, s′) = 0, if g(s) = g(s′) and s ≠ s′;
w′(s, s) = w̄(A, A), where A = g(s),

is called a π-lifting of (S̄, w̄, p̄0).
Lemma 6.17. The π-lifting of a continuous-time Markov graph is a continuous-
time Markov graph.
Notice that the rates between two different states inside the same lumped state
are always assigned rate zero. To show that the lifting is a well-defined continuous-time Markov graph,
notice first that, for s ≠ s′, it holds that w′(s, s′) ≥ 0. Moreover, for a fixed s and
A = g(s),

∑_{s′∈S} w′(s, s′) = ∑_{s′∉[A]} w′(s, s′) + ∑_{s′∈[A]∖{s}} w′(s, s′) + w′(s, s)
= ∑_{A′∈S̄∖{A}} ∑_{s′∈[A′]} α(A′, s′) w̄(A, A′) + w̄(A, A)
= ∑_{A′∈S̄∖{A}} w̄(A, A′) ( ∑_{s′∈[A′]} α(A′, s′) ) + w̄(A, A) = 0.
Lemma 6.18. If (S, w′, p′0) is a π-lifting of (S̄, w̄, p̄0), then (S̄, w̄, p̄0) is an exact
aggregation of (S, w′, p′0).

Proof. Let A, A′ ∈ S̄ and s′ ∈ [A′]. For A ≠ A′, the proof that δ−(A, s′) = w̄(A, A′) is analogous to the discrete-time case. For A = A′, by definition of the lifting, the
only non-zero rate towards the state s′ inside the cluster A is the self-loop. Therefore,

δ−(A, s′) = α(A, s′)^{−1} ∑_{s∈[A]} α(A, s) w′(s, s′) = w′(s′, s′) = w̄(A, A).
Theorem 6.19. Let π be some probability distribution on S, and let X′t be the
process assigned to a π-lifting of (S̄, w̄, p̄0). Then, for all T ≥ 0,

∆_{M,M̄}(T) ≤ D(P_{X[0,T)} || P′_{X[0,T)}).

Proof. Since Yt is a g-projection of Xt, the measurement Y[0,T) is a composition of the measurement g and X[0,T) on a directly given model of the process Xt. By
Lemma 6.18, since the aggregation from (S, w′, p′0) to (S̄, w̄, p̄0) is exact, Y′t is
a g-projection of X′t and, thus, Y′[0,T) = g ∘ X′[0,T). The claim follows from the fact
that measurement reduces relative entropy (Theorem 6.3).
6.4 Trace semantics interpretation of approximate aggregations
We can now summarize the results of this chapter by interpreting the error measure
framework on the trace semantics.
Theorem 6.20. Let Xn be a DTMC of M = (S, w, p0) and Y′n a process
of some g-aggregation M̄ = (S̄, w̄, p̄0). Moreover, denote by Xπn the process
assigned to some π-lifting (S, w′, p′0). Then, for all n ≥ 0 and all probability
distributions π on S,

∆_{M,M̄}(n) ≤ D(P_{X0:n} || Pπ_{X0:n}).
The theorem states that any lifting provides an upper bound on the aggregation
error. If Xn has a unique stationary distribution µ, the authors of [27] show
that arg min_π D(P || Pπ) = µ, that is, the best bound with respect to the long-term
behavior is achieved by lifting with the stationary distribution. It is intuitively
clear that, for the long-term behavior, the best g-aggregation is obtained for π = µ,
that is, when the stationary distribution is taken as the reference for the conditional
distributions α.
Theorem 6.21. Let Xt be a CTMC assigned to M ≡ (S, w, p0) and Y′t a
process assigned to some g-aggregation M̄ ≡ (S̄, w̄, p̄0). Moreover, denote by Xπt
the process assigned to some π-lifting (S, w′, p′0). Then, for all T ≥ 0 and all
probability distributions π on S,

∆_{M,M̄}(T) ≤ D(P_{X[0,T)} || Pπ_{X[0,T)}).
6.5 Matrix representation
Similarly as in the case of exact aggregations, we provide the specification of lifting
in matrix representation.
Corollary 7. For a given π, let α be defined as in (6.1), and let V and Uα be
defined as in Corollary 5. Let P be the transition matrix of a Markov chain assigned
to (S, w, p0) (either discrete- or continuous-time). Moreover, let P̄ be the transition
matrix of a Markov chain assigned to (S̄, w̄, p̄0), and Pπ the transition matrix
of a Markov chain of the corresponding π-lifting. Then,

(i) Pπ = V P̄ Uα and P̄ = Uα Pπ V.
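Assuming V is the 0/1 collector matrix and Uα the α-weighted distributor matrix of Corollary 5 (their exact definitions are in Chapter 4 and not reproduced here), the round trip of Corollary 7 can be checked on a toy partition:

```python
def matmul(a, b):
    """Plain nested-loop matrix product for small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# A partition g of 4 states into 2 blocks, and a distribution pi on S.
g = [0, 0, 1, 1]
pi = [0.5, 0.5, 0.25, 0.75]
mass = [sum(pi[s] for s in range(4) if g[s] == A) for A in range(2)]

# V collects states into blocks; U_alpha redistributes with weights alpha(A, s).
V = [[1.0 if g[s] == A else 0.0 for A in range(2)] for s in range(4)]
U = [[pi[s] / mass[A] if g[s] == A else 0.0 for s in range(4)] for A in range(2)]

P_bar = [[0.75, 0.25], [0.25, 0.75]]    # aggregated transition matrix (illustrative)
P_pi = matmul(matmul(V, P_bar), U)      # lifting:  P_pi = V P_bar U_alpha
back = matmul(matmul(U, P_pi), V)       # recovery: P_bar = U_alpha P_pi V
print(all(abs(back[i][j] - P_bar[i][j]) < 1e-12
          for i in range(2) for j in range(2)))  # True
```

The recovery works because Uα V is the identity on the aggregated space: the α-weights within each block sum to one.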
Chapter 7
Approximate automatic reductions of
stochastic rule-based models
Exact reductions deal with finding those fragments which ensure that the aggre-
gation from the species-based Markov graph to the fragment-based Markov graph
is exact. We have shown that, if every pair of sites which are directly or indirectly (by
transitivity) tested or modified together within the rule set is correlated in the contact
map annotation, the corresponding fragment-based reduction is guaranteed to be
exact. In such a framework, it may happen that the fragment set output by the
algorithm leaves the system at a prohibitive size, or that it even coincides with the
species-based description (in which case the system remains at its original size).
In this Chapter, we discuss how to perform an approximate F -reduction, and we
describe an application scenario of using the error bound framework described in
Chapter 6.
Figure 7.1: Approximate reductions framework (the arrows have no formal meaning and serve illustration purposes: double arrows indicate assigning a Markov graph to a rule-based program, the single full arrow stands for reduction, single arrows indicate operations over a Markov graph, and the dotted arrow is never explicitly performed).
Figure 7.2: Approximate reduction involves assuming a distribution over the lumped states: a part of a Markov graph from Example 2.1, as presented in the motivating example, Figure 3.1. Since conditions (Cond1) and (Cond2) are violated, approximate rates are derived.
The general concept of approximate reductions is illustrated in Figure 7.1. The
rule set R is translated to a rule set R̄ by Definition 3.10. The reduction error is
the aggregation error between the species-based Markov graph assigned to the
rule set R and the species-based Markov graph assigned to the rule set R̄. The
upper bound on the aggregation error is computed by lifting the process Y′t to
X′t. The computation necessitates the availability of the generator matrix of
the original system and of the transient distribution of the process Xt. If the user is
interested only in the stationary behavior, the stationary distribution µ suffices.

In Figure 7.2, we show a part of a Markov graph from Example 2.1, as presented
in the motivating example, Figure 3.1. By adding rules R3, R−3 and R4, R−4 to
the model, conditions (Cond1) and (Cond2) are violated. For example, observe that
δ+(x3, [y1]ϕ) = c−2 ≠ c−2 + c4 = δ+(x4, [y1]ϕ). The weights w̄(y34, y1) and w̄(y34, y2) are approximated under a certain distribution over the lumped states, denoted
by α. Then, the translated rule set R̄ is such that w̄(y34, y1) = α(y34, x3) c−2 + α(y34, x4)(c−2 + c4).
In a local reduction, we take α(y34, x3) = 2/3 and α(y34, x4) = 1/3, where α is chosen according to the formula from Theorem 5.10. The reduction is called local
because the conditional distribution can be determined without looking at the
global dynamics of the rule-based model. As mentioned before, the global reduction is the one in which we know the stationary distribution µ of the original Markov
graph, in which case the approximated rate is set with α(y34, x3) = µ(x3)/(µ(x3) + µ(x4)) and α(y34, x4) = µ(x4)/(µ(x3) + µ(x4)).
7.1 Approximate reductions and error bound
Let M be the species-based Markov graph of the rule-based program (R, G0), and M̄
the species-based Markov graph of its translation (R̄, Ḡ0). Let Q be the generator
of M, µ its stationary distribution and p(t) its transient distribution. We will
denote by Q̄ the generator of M̄ (Figure 7.1).
Definition 7.1. The reduction error between (R,G0) and (R, G0) until time T is
the aggregation error between M and M until time T (defined in Definition 6.11).
Theorem 7.2. For a given fragment set F, the aggregation M̄ is such that Q̄ = UαQV, where Uα is defined as in Theorem 5.10, and α is defined as in Corollary 5.

Proof. Suppose that in M, the rule Ri is applicable to the fraction of states [x]ϕS ⊆ [y]ϕF. Let G ∈ [x]ϕS. There are in total |Aut(x)||Emb(Gi, G)| embeddings in the set ⋃{Emb(Gi, G′) | G′ ∈ [y]ϕF}, thus the total rate after applying rule Ri in M is equal to ci|Aut(x)||Emb(Gi, G)|. In M̄, the translated rule R̄i can be applied to each Ḡ ∈ [y]ϕS with the rate c̄|Emb(Ḡi, Ḡ)|, which is equal for all Ḡ ∈ [y]ϕS. It suffices to show that this rate equals

ci (|Aut(x)|/|Aut(y)|) |Emb(Gi, G)|,

which can be checked similarly as in Lemma 5.12.
The lifted chain is accordingly defined by Q′ = VQ̄Uα. The computation necessitates the availability of the generator matrix of the original system and of the transient distribution of the process Xt. If the user is interested only in the stationary behavior, the stationary distribution µ suffices: the aggregated chain can be defined directly by setting Q̄ = UµQV, and the lifting with Q′ = VQ̄Uµ. Accordingly, we call a reduction local when only the local, fragment-based description of the current state needs to be known for defining the reduction. A reduction is called global if it is performed with respect to the stationary distribution.
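The aggregation Q̄ = UαQV and the lifting Q′ = VQ̄Uα can be sketched with small matrices. The following is a minimal numerical illustration on a toy 3-state generator (not one of the thesis examples), assuming Uα carries the conditional distributions α as rows and V is the 0/1 collector matrix of the partition:

```python
import numpy as np

# Sketch of aggregation and lifting of a CTMC generator: an illustrative
# 3-state chain lumped to 2 states. U and V play the roles of U_alpha
# and V in the text; all numbers are illustrative.

# Generator of a 3-state CTMC over states {x1, x3, x4}; rows sum to zero.
Q = np.array([[-2.0, 1.0, 1.0],
              [ 1.5, -1.5, 0.0],
              [ 0.5, 0.0, -0.5]])

# Partition: {x1} and {x3, x4} (the latter lumped into y34).
# V collects: V[i, k] = 1 iff original state i belongs to lumped state k.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])

# U distributes: row k is the distribution alpha over the members of
# lumped state k (here the local choice 2/3, 1/3 from the example).
U = np.array([[1.0, 0.0, 0.0],
              [0.0, 2 / 3, 1 / 3]])

Q_bar = U @ Q @ V        # aggregated generator
Q_lift = V @ Q_bar @ U   # lifted generator, back on the original space

print(np.allclose(Q_bar.sum(axis=1), 0))   # True: rows still sum to zero
print(np.allclose(Q_lift.sum(axis=1), 0))  # True
```

The zero row sums are preserved whenever each row of Uα is a probability distribution supported on the corresponding block and V is the partition's collector matrix, so both Q̄ and Q′ remain proper generators.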
7.2 Tests
We illustrate the framework on three case studies. For each case study, we compare the error bounds of the local and the global reduction. In Figure 7.3, Example 2.1, the local
Figure 7.3: Testing the approximate reductions framework (legend: stationary = global reduction, uniform = local reduction; the plotted quantity is the derivative d/dt ∆(M, M̄)(t) of the upper bound). a) Example 2.1 (simple scaffold): The transient semantics of M and the stationary distribution are obtained by integrating the CME. The reduction is obtained for the set of fragments of Figure 3.2, and the upper bound on ∆(M, M̄) is computed for both the local and the global reduction. As expected, the global reduction gives a better error bound in the stationary regime. The initial state is set to x = {8SA, 8SB, 8SC} (24 nodes; 81 states, 165 before reduction) or x = {5SA, 5SB, 5SC} (15 nodes; 36 states, 56 before reduction), and the rate values are set to c1 = 0.001, c−1 = 2, c2 = 0.002, c−2 = 3, c3 = 0.05, c4 = 0.2. The dotted lines are the error bound in case c3 = c4 = 0, when the reduction is exact. Notice that the error is not equal to zero even in the case of exact lumping: this is because it represents an upper bound on the actual error between the lumped processes (the upper bound would be equal to zero only in the very specific situation that the lifted process coincides with the original process).
reduction provides a better error bound in the transient regime, while the global reduction, as expected, gives a better error bound in the stationary regime. We run the system at two different initial copy numbers, so as to observe that the error grows with the system size. In Figure 7.4a, Example 2.2, we illustrate the error bound for the three possible fragment sets, with the annotation maps shown in Figure 3.3. Similarly, in Figure 7.4b, for Example 2.3, the framework is used for showing which of the three fragmentations with the same dimension provides the smallest error. The plots show that, at stationarity, both F2 and F3 are better than F4, which can be explained by the fact that the correlation between sites c and x, and between c and y, respectively preserved by F2 and F3, is tested by the rules R∗2 and R∗3. It would be difficult to tell whether F2 or F3 is better by only looking at the information flow in the rule-set. The results show that the reduction with F3 gives the smallest error. Since the error ∆(M, M̄)(T) is the integral of the plotted functions until time T, it is notable that the discussed ordering of fragment sets does not hold for small values of T (for the considered initial state).
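Since the plots show the derivative of the bound, recovering ∆(M, M̄)(T) amounts to a numerical integration up to T. A minimal sketch with a hypothetical derivative curve (not the actual plotted data):

```python
import numpy as np

# Sketch: the plots show the derivative d/dt Delta(M, M_bar)(t) of the
# error bound; the bound Delta(M, M_bar)(T) itself is the integral of
# this derivative up to T. The derivative values below are hypothetical,
# not read off the figures.

t = np.linspace(0.0, 5.0, 501)
d_delta = 0.1 * np.exp(-t)  # hypothetical plotted derivative

# Cumulative trapezoidal integration gives Delta(T) for every T.
delta = np.concatenate(([0.0], np.cumsum(
    0.5 * (d_delta[1:] + d_delta[:-1]) * np.diff(t))))

# Delta is nondecreasing in T, which is why an ordering of fragment sets
# at large T need not hold for small T.
print(delta[-1])
```

Because ∆ accumulates the plotted curves over time, two fragment sets whose derivative curves cross can swap their ordering between small and large T, as observed above.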
Figure 7.4: Comparing fragment-based reductions with the same dimension (legend: stationary = global reduction, uniform = local reduction; the plotted quantity is the derivative d/dt ∆(M, M̄)(t) of the upper bound). a) Example 2.2 (polymerization). With initially 3 nodes of type A and 3 nodes of type B, the Markov graph counts 112 states. The rates are set to c1 = 3, c−1 = 2.4, c2 = 3.5, c−2 = 2.1, c3 = 0.1, c−3 = 3.6, c∗3 = 3.3. We compared the error bound for the reductions with fragment sets F2, F3 and F4. With F4, the reduction is from 112 to 64 states; with F2 and F3, the reduction is from 112 to 80 states. Preserving the correlation between the sites of nodes A (F2) shows a smaller error bound than preserving the correlation between the sites of nodes B (F3). b) Example 2.3 (conditional independence). The system, initialized at 3 nodes inactivated at all three sites, has a Markov graph with 120 states. The three fragment sets F2, F3 and F4 reduce the state space to 80 states. The rates are set to c1 = 1, c2 = 0.6, c3 = 0.5, c−1 = 1.2, c−2 = 0.3, c−3 = 0.5, c∗2 = 2, c∗y = 2.
The proposed ‘global’ reduction technique is not limited to rule-based models, and can be applied to any continuous-time Markov chain model. ‘Local’ reductions are specific to site-graph-rewrite grammars. The implementation of the approximate reductions framework within the Kappa modeling framework is work in progress. For that reason, we leave the analysis of the approximate framework on large-scale case studies to future work.
At the end of this chapter, we remark that, as biochemical models are already an approximation of reality, models obtained by approximate reductions can also be seen as alternative model candidates which operate on a context that is more ‘local’ than the given, reference model. Discriminating between such model candidates, which differ only in the radius of context tested by a rule, opens numerous challenges related to model validation, which are out of the scope of this thesis.
Chapter 8
Case studies
Two large-scale case studies of signaling pathways were chosen to analyze the per-
formance of the fragment-based reduction: a crosstalk between epidermal growth-
factor receptor and insulin receptor pathway, and high-osmolarity glycerol path-
way in yeast. Each model is captured by a set of site-graph-rewrite rules, which
are then input to a rule-based programming environment Kappa [34]. For the
definition of Kappa syntax and its operational semantics, we refer to [32]. In Fig-
ure 8.1d, we illustrate how one site-graph rewrite rule is written in Kappa. We
will use several features of Kappa based on static analysis of a rule-set: (1) an au-
tomatized generation of the contact map of a model, (2) the over-approximation
of the set of reachable species [22], and (3) the contact map annotation according
to Algorithm 2, together with the generation of a new rule-set, as defined in Defi-
nition 3.10. Rule rates will not be specified, since they do not affect the outcome
of Algorithm 2.
8.1 EGF/insulin receptor pathway
We study a model of a crosstalk between the epidermal growth-factor receptor
(EGFR) and the insulin receptor pathway. In general, EGFR exists on the cell
surface and is activated by binding of its specific ligands, EGF in this case. In
turn, it initiates a signaling cascade related to a variety of important biochemical
changes, such as cell growth, proliferation, and differentiation. A huge number of
feasible multi-protein species can be formed in a detailed model of this signaling
pathway [7]. For example, in the complete model described in [20], the number of
Figure 8.1: The set of rules for the early EGF/insulin crosstalk model. The underlying mechanistic model is taken from [14]. The original model contains 42956 reactions and 2768 species. Kappa syntax supports two types of shorthand notation: a site which simultaneously bears an internal state and serves as a binding site (for example, site b of node type EGFR), and the dash symbol, which denotes that the site is bound; for example, in rule r10, EGFR(bu, d−) denotes that site d is bound.
reachable complexes is estimated at ≈ 10^20. We focused on a model of the early
signaling cascade of events, described in [14]. More precisely, the model focuses
on the signaling from the initial receptor binding (either of EGF or Ins), until the
recruitment of the transport molecule Grb by binding to Sos. Grb, the growth
factor receptor-bound protein, is also known as the transport molecule, because of
its ability to link the EGFR to the activation of Ras and the further downstream
components. The model involves only eight proteins, combined into 2768 different
molecular species. The interactions are captured by a set of 42956 reactions.
8.1.1 Model description
The reactions were translated into a rule-based model with 38 reversible rules,
shown in Figure 8.1. Eight node types arise: A = {EGF, EGFR, IR, Sos, Grb, IRS, Ins, Shc}.
The contact map of the model is given in Figure 8.2a. Each of the eight proteins
is assigned a set of sites; for example, Σ(EGFR) = {a, b, c, d}. The shaded sites in the figure bear an internal site value. For example, the internal site b of protein EGFR can bear two internal values, I(b) = {u, p}, where bp denotes that the site is phosphorylated.

Figure 8.2: EGF/insulin crosstalk model. a) Contact map. The gray-shaded sites bear internal values. b) Contact map annotation. c) Two reaction mixtures which are equivalent with respect to the annotation. The green color denotes the phosphorylated state. d) An example of a Kappa rule, EGFR(bu, d) → EGFR(bp, d), and the corresponding site-graph rewrite rule: EGFR(bu, d) denotes a site-graph G = (V, Type, I, E, ψ) with one node, V = {v}, such that Type(v) = EGFR, interface function I(v) = {b, d} and evaluation function ψ(v, b) = u.

It is worth noticing that some sites have multiple binding partners, which denotes a competition (concurrency) for binding, because only one bond can be established at a time. For example, the site a of protein Grb has three
possible binding partners. Moreover, a self-loop at the site d of a receptor EGFR
means that it can be bound to the site d of another EGFR. Therefore, one or two
nodes of type EGFR can be found in a single species.
Two major pathways are involved: one starting with the receptor EGFR, and
another, starting at the receptor IR. The two pathways share proteins. We explain
how each pathway works, by focusing on the forward direction of rules.
In the first branch, EGFR recruits a transport molecule Grb. Three rules model the self-dimerization of EGFRs (r03, r04, r05); the rate depends on whether the ligand EGF is already recruited or not. When EGFR is in a dimer with another EGFR, it is considered to be in its active form. Therefore, depending on whether site d is bound or free, two rules model EGFR recruiting a ligand (EGF) on site a (r01, r02), and two rules model the phosphorylation of site b of EGFR (r06, r07). The phosphorylation signal can be passed from EGFR to the adapter molecule Shc (r09, r10) if Shc is previously bound to it (r08). Finally, Shc recruits a
transport molecule Grb (r11). Yet, each receptor has a shorter way to recruit a
transport molecule. The site c of EGFR can be phosphorylated (r12,r13), and then
bind to Grb directly (r14,r15).
In the second branch, an insulin receptor (IR) recruits the transport molecule Grb. Receptor IR can recruit insulin molecules (Ins) on two sites, a (r16, r17) and b (r18, r19) (the rate may depend on whether an insulin molecule is already bound). Similarly, the site c of IR can be phosphorylated (r20, r21, r22, r23). Then, IR can recruit an adapter Shc (r24). Whenever IR is also bound to two insulin molecules, Shc can be phosphorylated (r25). Adapter Shc can then recruit Grb (r11). Yet, IR also has a direct way of recruiting Grb: the site d of IR can be phosphorylated (r26, r27, r28, r29), and then recruit another adapter, IRS (r30), which can be activated when the insulin receptor is bound to two insulin molecules (r31). Then, IRS can recruit Grb (r32).

Finally, independently, Grb can bind to a protein Sos (r33, r34), and Sos is activated (r35, r36). The remaining rules describe the recruitment of Sos by Grb (r37), and the spontaneous (de)phosphorylation of Shc (r37) and IR (r38).
8.1.2 Exact fragment-based reduction
Applying Algorithm 2 to the model, a reduction from a dimension of 2768 species to 609 fragments is obtained. The annotated contact map is given in Figure 8.2b. The interface of protein Grb is split into two annotation classes, because no rule tests both the site a and the site b of Grb. Thus, the partition Σ(Grb)/∼ = C(Grb) = {{a}, {b}} defines a set of fragments for which the reduction is exact. Two fragment-based equivalent mixtures are shown in Figure 8.2c.
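The splitting of Grb's interface can be illustrated with a small union-find sketch of the annotation idea: sites that are tested together by some rule are merged into one class. This is a hypothetical helper, not the thesis implementation of Algorithm 2:

```python
# Sketch of the annotation idea behind Algorithm 2 (not the thesis
# implementation): sites of a node type that are tested together by some
# rule end up in the same annotation class, computed via union-find.

def annotation_classes(sites, rule_tests):
    """sites: site names of one node type;
    rule_tests: one set of tested sites per rule mentioning the node."""
    parent = {s: s for s in sites}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    def union(a, b):
        parent[find(a)] = find(b)

    for tested in rule_tests:
        tested = list(tested)
        for other in tested[1:]:
            union(tested[0], other)  # correlate co-tested sites

    classes = {}
    for s in sites:
        classes.setdefault(find(s), set()).add(s)
    return sorted(map(sorted, classes.values()))

# Grb in the crosstalk model: no rule tests both site a and site b,
# so the interface splits into the two classes {a} and {b}.
print(annotation_classes(["a", "b"], [{"a"}, {"b"}, {"a"}]))
# Had some rule tested both sites, the single class {a, b} would result.
```

The disjoint-set structure is the natural fit here because the annotation relation is the transitive closure of "co-tested by some rule".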
Even though there are cycles in the site-graph representation of a contact map,
none of these cycles is a path of a site-graph. Consequently, no more than two
EGFR can be within the same species. The largest species for this contact map
counts 16 nodes (containing two EGFR nodes, two EGF nodes, four Grb nodes, four
Shc nodes), while the equivalent fragment counts 12 nodes.
8.2 HOG pathway in yeast
A model of the adaptation of yeast cells (S. cerevisiae) to external osmotic changes is
discussed. Osmotic shock initiates a quick increase in the external osmotic pressure
and a decrease in the turgor pressure of the cell. The balance between internal and
external pressures of a yeast cell upon osmotic shock is re-established by the high-
osmolarity glycerol (HOG) pathway. More specifically, the transducers of osmotic signals are mitogen-activated protein kinase (MAPK) cascades, which serve to activate the HOG molecule; upon activation, gene expression and metabolism
modules regulate an increase in glycerol, which is in turn used to increase the
internal osmotic pressure [58]. The increase in internal osmotic pressure balances
out the turgor pressure back to its original value, which deactivates the pathway
and stops unnecessary glycerol production.
As mentioned, upon activation, the HOG molecule translocates to the nucleus
and the gene transcription is initiated. At this point, it was recently reported
that, depending on the intensity of the osmotic stress, the gene expression may
vary from cell to cell [69]. In other words, a bimodal expression behavior in a cell
population is exhibited.
We consider a detailed rule-based model of the HOG pathway in yeast, based on
the evidence taken from literature [59]. The authors of the model were aiming at
a platform for ‘in silico’ experiments related to the aforementioned phenomenon
of bimodal response in transcriptional output of HOG pathway in yeast.
8.2.1 Model description
Since the model is very detailed, and the purpose of our analysis is to comment on
the dimension decrease after applying the exact fragment-based model reduction,
we provide only a high-level model description.
The model comprises the Sln and Sho branches of Hog1 activation. It contains 41
node types and 443 rules. Each of the two osmosensors at the cell membrane (Sln
and Sho) can activate a MAPK kinase kinase (Ste11 or Ssk2/Ssk22), which binds to
a MAPK kinase Pbs2. The MAPK kinase Pbs2 then doubly phosphorylates the
MAPK Hog1, which then rapidly translocates to the nucleus and starts the gene
transcription. This leads to the conversion to glycerol. Other genes will work to
Figure 8.3: High-osmolarity glycerol (HOG) model in yeast: contact map obtained by Kappa. A summary of node types, their domains and possible bindings. The model comprises 41 agents and 443 rules. The site Localiz encodes the cellular compartment in which a node can be found (membrane, cytosol, or nucleus).
dephosphorylate the active Hog1 in the nucleus, which causes it to move back out
into the cytosol.
The input to the system, the salt concentration, is modeled by the node type Osm. The outputs, apart from Hog1, are nodes of type mVenus and mCherry, which denote mRNA products measured by fluorescent markers and indicate stochasticity. More precisely, the correlation between the intensities of the two mRNAs serves to quantify the contribution of inter-cellular (extrinsic) variability and intracellular (intrinsic) variability to the overall expression noise (more details on the method can be found in [85]).
Not all node types represent proteins, nor do all site types represent protein domains in this model. For example, the node types FeedbackDummy and GlycFeedback are incorporated for regulating the feedback mechanism. Moreover, the site Localiz at nodes Hog1, Ssk1, Ssk2, Ste50 denotes the compartment localization of the protein (nucleus or cytosol).
8.2.2 Reachable species
The contact map generated automatically by Kappa is shown in Figure 8.3. The
exact number of reachable species could not be reported by Kappa, since it exceeds 10^9. The protein with the largest number of sites is Hog1 (10 sites). Our calculation reports that the number of species involving Hog1 alone counts 1476 (without polymers). The creation of species containing an unbounded number of nodes is possible in this model: for example, Hog1, Pbs2, Ste50 and Ste11 can theoretically form polymers of a size limited only by the total number of each of the proteins in the reaction mixture (Figure 8.4).
8.2.3 Exact fragment-based reduction and model decomposition
The model is translated to a new rule-based model, according to the annotation by
Algorithm 2. We report the annotation classes for a part of the model, shown in
Figure 8.4. While all sites of the protein Hog1 are captured in one annotation class,
the scaffold Pbs2 has three independently interacting groups of sites, rendering the
number of fragments containing Pbs2 significantly smaller than the corresponding
species. Moreover, while the proteins Hog1, Pbs2, Ste50 and Ste11 exhibit unbounded polymerization in the species-based system and, consequently, produce a number of species exponential in the proteins’ abundances, the number of fragments composed of the same group of proteins is constant. This situation is analogous to the simple polymerization case study, Example 2.2, presented in Chapter 3, where every cycle in the contact map is broken by the annotation classes. Despite the argued reduction from the number of species to the number of fragments, Kappa reported that the number of fragments (species of the new model) still counts more than 10^9.
We performed an additional analysis of detecting smaller, independent stochastic systems, as suggested in [71]. The model decomposition with fragments is not yet automatized in Kappa, so we performed the calculation manually. The model can be decomposed into 20 smaller models, which can be independently analyzed. Recall from Chapter 3 that, if a node of type A has an annotation class C ∈ C(A), a new node type is assigned the name AC. Then, for example, the set of fragments containing new nodes of type Pbs2Ste11 and/or a node of type Ste11Pbs2 builds one module, which interacts independently from the rest of the system. Another, larger module is, for example, among the set of fragments arising from the new node types Ssk22, Ssk2, Ptc2Hog1, FeedbackDummyPtc23, Pbs2Ptc23, Pbs2Ste11. It is worth noticing that the annotation classes, and the possibility of decomposing the model into smaller, independent units, reflect that the modeler did not incorporate cross-interactions between these modules. This is either because there
is indeed no evidence about the cross-interaction, or because the modeler simply did not exhaust the literature related to those cross-interactions. To this end, the decomposition of the model into smaller units, apart from the possibility to simulate each of the units independently and to faithfully compose the obtained results, also serves to automatically reveal the assumptions which the modeler imposes on the dependence of the interaction units.

Figure 8.4: High-osmolarity glycerol (HOG) model in yeast: MAPK cascade. The part of the model related to the MAPK cascade is isolated. The red boxes denote the annotations done according to the output of Algorithm 2.
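Detecting such independent modules amounts to finding connected components in a graph whose vertices are the new (annotated) node types and whose edges record their interactions. A minimal sketch with illustrative edges (not the actual HOG interaction structure):

```python
# Sketch: independent modules as connected components of an interaction
# graph over the new node types. The edge list below is illustrative,
# not the actual HOG model.

from collections import defaultdict

def modules(edges):
    """Return the connected components of an undirected edge list."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, comps = set(), []
    for node in graph:
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:  # iterative depth-first search
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return sorted(comps)

# Two independent modules, in the spirit of the decomposition above:
edges = [("Pbs2_Ste11", "Ste11_Pbs2"),
         ("Ssk2", "Ptc2_Hog1"), ("Ptc2_Hog1", "Pbs2_Ptc23")]
print(modules(edges))
```

Each returned component can then be simulated on its own, and the results composed, exactly because no rule connects node types across components.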
Chapter 9
Conclusions and Discussion
Handling complexity is an important challenge towards understanding mechanisms
of molecular signaling. In this work, we confirmed our main hypothesis, that pro-
gram static analysis can be employed to successfully reduce rule-based models of
complex biochemical systems. More specifically, we showed how to systematically
derive a reduced model that operates over a set of much fewer, coarse-grained
variables – fragments – which self-consistently describe their own stochastic evo-
lution. We thoroughly analyzed the mathematical relations between the original and the reduced rule-set, and what these relations imply for their respective CTMCs. The presented reduction procedure is efficient (of complexity linear in the size of the rule-set) and automated (it applies to any well-defined rule-based program). A formal relation between the respective CTMCs is guaranteed within two frameworks. In the framework for exact reductions, the set of fragments is enforced and the precise relation between the respective CTMCs is guaranteed. In the framework for approximate reductions, the set of fragments can vary, and, for a given time limit of a trace, the error in terms of the Kullback-Leibler divergence between the trace distributions of the CTMCs is computed.
In the following, several research questions which directly complement the work
presented in this thesis are discussed.
The procedure for obtaining fragments which guarantee an exact reduction (Algorithm 2) correlates any two sites which are related, directly or indirectly, within a left-hand side or a right-hand side of a rule, and hence enforces a ‘strong’ notion of independence between the uncorrelated sites. In turn, precisely such strong independence makes it possible to effectively reconstruct the transient semantics
of the original system. Despite such a strong correlation notion, it was shown that the reduction can be significant, as on the EGFR/insulin crosstalk case study, or even radical, as on the simple polymerization example. However, in several other test examples, Algorithm 2 reported an annotation equal to the species-based description. Indeed, a typical signaling cascade module involves a cascade of tests over pairs of sites, which are finally all correlated due to the transitivity of the annotation relation. In such a case, the framework for approximate reductions can be used to quantitatively study coarse-grained executions, even when those executions are not consistent with the original model. In the current framework, the computation of the error bound relies on knowing the generator matrix and the transient distribution of the original process. To this end, the efficient numerical estimation of the error bounds is a first compelling question for future work.
Second, it would be interesting to investigate whether, in the case of no pattern deletion, the annotation which tests only the left-hand side of each rule is sufficient for claiming the reduction exact. For example, in Example 2.1, if no deletion of patterns is involved, then, by Theorem 5.9, adding a rule ∅ → SABC (with rate c3) does not influence the forward property, even though the annotation provided by Algorithm 2 enforces it (notice that invertibility can no longer be claimed). On the other side, adding a rule FAB? → ∅ (with rate c4) indeed breaks both the forward and the backward property (it suffices to observe the three species-based states x0 = {SB}, x1 = {SB, SABC}, x2 = {SAB, SBC}). The above observation relates to what the authors in [49] informally named ambiguous update.
Moreover, as ODE fragments are typically fewer than stochastic ones (for example, in the presented EGF/insulin case study, the ODE fragments count 39 and the stochastic fragments 609), it is worth studying whether ODE fragments can be used for exact simulation of stochastic traces, or for correct computation of the transient distribution. To this end, the result of Kurtz [61], that the ODE model is a thermodynamic limit of the stochastic model, is an important insight. However, a direct comparison of ODE and stochastic fragments is not possible, because the annotation used for ODE fragments need not be transitive. In Example 2.3, while the procedure for stochastic fragments outputs C(A) = {{c, x, y}}, the procedure for differential fragments would output C(A) = {{c, x}, {c, y}}. Such a fragment set is of dimension 6, and would be positioned between F1 (dimension 8) and F2, F3 (dimension 5) in the annotation lattice in Figure 3.3. Preliminary analysis over examples shows that ODE fragments may sometimes preserve the transient distribution in the stochastic setting: using ODE fragments in the stochastic setting does not provide an exact reduction (it suffices to observe the states SA010, SA111 and the transition by R1), while numerical experiments show that the transient distribution is nevertheless preserved.
A generalization of this observation, as an extension of the current framework towards using ODE fragments in the stochastic setting, requires a different rule-set translation procedure and necessitates further technical analysis.
Finally, it is important to mention that the work presented here deals with providing more efficient executions of a given rule-based model (taken as the ‘ground truth’), while we do not address the problem of collecting the modeling hypotheses or validating the model with respect to experimental data. Studying fragment-based reductions in a wider modeling context opens numerous challenges for formal methods, when used as a service towards better understanding the mechanisms of molecular signaling. As a good model needs to be consistent with observation, but also to predict behaviors which can be tested by observation, one such question is how to tailor the reduction to a high-level, qualitative experimental observation (for example, the formation of a species, bimodality, or a causal relation between events). For example, for studying phenotypic variety, it sometimes suffices to use a model where each site is correlated only to itself [26].
We believe that coupling the fragment-based reduction technique with a formally
expressed question of interest can provide significantly better reduction and ulti-
mately facilitate efficient, automated reasoning within the modeling cycle. This
would in turn allow the biologist to focus on the key biological principles instead
of solving equations and interpreting complicated diagrams.
Bibliography
[1] Bree B Aldridge, John M Burke, Douglas A Lauffenburger, and Peter K
Sorger. Physicochemical modelling of cell signalling pathways. Nature Cell
Biology, 8(11):1195–1203, November 2006.
[2] David F Anderson and Thomas G Kurtz. Continuous time Markov chain models for chemical reaction networks. In Design and Analysis of Biomolecular Circuits, pages 3–42. Springer, 2011.
[3] Cedric Archambeau and Manfred Opper. Approximate inference for continuous-time Markov processes. In Bayesian Time Series Models, pages 125–140. Cambridge University Press, 2011.
[4] Jean-Pierre Banatre, Pascal Fradet, and Daniel Le Metayer. Gamma and the chemical reaction model: Fifteen years after. In Multiset Processing, pages 17–44. Springer, 2001.
[5] Jean-Pierre Banatre and Daniel Le Metayer. Programming by multiset transformation. Communications of the ACM, 36(1):98–111, 1993.
[6] Michael L Blinov, James R Faeder, Byron Goldstein, and William S Hlavacek. BioNetGen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics, 20(17):3289–3291, 2004.
[7] Michael L Blinov, James R Faeder, Byron Goldstein, William S Hlavacek,
et al. A network model of early events in epidermal growth factor receptor
signaling that accounts for combinatorial complexity. Biosystems, 83(2):136–
151, 2006.
[8] Nikolay M Borisov, Nick I Markevich, Jan B Hoek, and Boris N Kholodenko.
Signaling through receptors and scaffolds: independent interactions reduce
combinatorial complexity. Biophysical journal, 89(2):951–966, 2005.
[9] N.M. Borisov, A.S. Chistopolsky, J.R. Faeder, and B.N. Kholodenko. Domain-oriented reduction of rule-based network models. IET Systems Biology, 2, 2008.
[10] Peter Buchholz. Bisimulation relations for weighted automata. Theoretical
Computer Science, Volume 393, Issue 1-3:109–123, 2008.
[11] Jerry R Burch, Edmund M Clarke, Kenneth L McMillan, David L Dill, and Lain-Jinn Hwang. Symbolic model checking: 10^20 states and beyond. Information and Computation, 98(2):142–170, 1992.
[12] Federica Ciocchetta and Jane Hillston. Bio-PEPA: A framework for the modelling and analysis of biological systems. Theoretical Computer Science, 410(33):3065–3084, 2009.
[13] Ido Cohn, Tal El-Hay, Nir Friedman, and Raz Kupferman. Mean field variational approximation for continuous-time Bayesian networks. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages 91–100, Arlington, Virginia, United States, 2009. AUAI Press.
[14] Holger Conzelmann, Dirk Fey, and Ernst D Gilles. Exact model reduction of
combinatorial reaction networks. BMC Systems Biology, 2(78):342–351, 2008.
[15] Thomas H Cormen, Clifford Stein, Ronald L Rivest, and Charles E Leiserson.
Introduction to Algorithms, Chapter 21: Data structures and Disjoint Sets.
McGraw-Hill Higher Education, 2nd edition, 2001.
[16] Patrick Cousot. Abstract interpretation based formal methods and future
challenges. In Informatics - 10 Years Back. 10 Years Ahead., pages 138–156,
London, UK, 2001. Springer-Verlag.
[17] Thomas M Cover and Joy A Thomas. Elements of Information Theory. Wiley-Interscience, 2012.
[18] Gheorghe Craciun and Martin Feinberg. Multiple equilibria in complex chemical reaction networks: II. The species-reaction graph. SIAM Journal on Applied Mathematics, 66(4):1321–1338, 2006.
[19] Vincent Danos, Jerome Feret, Walter Fontana, Russell Harmer, and Jean
Krivine. Rule-based modelling of cellular signalling. In CONCUR 2007–
Concurrency Theory, pages 17–41. Springer, 2007.
[20] Vincent Danos, Jerome Feret, Walter Fontana, Russell Harmer, and Jean
Krivine. Abstracting the differential semantics of rule-based models: exact
and automated model reduction. In Logic in Computer Science (LICS), 2010
25th Annual IEEE Symposium on, pages 362–381. IEEE, 2010.
[21] Vincent Danos, Jerome Feret, Walter Fontana, and Jean Krivine. Scalable
simulation of cellular signaling networks. In Programming Languages and
Systems, pages 139–157. Springer, 2007.
[22] Vincent Danos, Jerome Feret, Walter Fontana, and Jean Krivine. Abstract
interpretation of cellular signalling networks. Lecture Notes in Computer
Science, 4905:83–97, 2008.
[23] Vincent Danos, Jerome Feret, Walter Fontana, and Jean Krivine. Abstract
interpretation of reachable complexes in biological signalling networks. In Pro-
ceedings of the 9th International Conference on Verification, Model Checking
and Abstract Interpretation (VMCAI’08), volume 4905, pages 42–58, 2008.
[24] Vincent Danos and Cosimo Laneve. Formal molecular biology. Theoretical
Computer Science, 325(1):69–110, 2004.
[25] Xavier Darzacq, Jie Yao, Daniel R Larson, Sebastien Z Causse, Lana Bosanac, Valeria de Turris, Vera M Ruda, Timothee Lionnet, Daniel Zenklusen, Benjamin Guglielmi, et al. Imaging transcription in living cells. Annual Review of Biophysics, 38:173, 2009.
[26] Eric J Deeds, Jean Krivine, Jerome Feret, Vincent Danos, and Walter
Fontana. Combinatorial complexity and compositional drift in protein in-
teraction networks. PLoS ONE, 7(3), 2012.
[27] Kun Deng, Prashant G Mehta, and Sean P Meyn. Optimal Kullback-Leibler aggregation via spectral theory of Markov chains. IEEE Trans. Automat. Contr., 56(12):2793–2808, 2011.
[28] Lucas Dixon and Ross Duncan. Graphical reasoning in compact closed cat-
egories for quantum computation. Annals of Mathematics and Artificial In-
telligence, 56(1):23–42, 2009.
[29] Rick Durrett. Probability: Theory and Examples, 2011.
[30] Jerome Feret. Fragments-based model reduction: some case studies. In Jean Krivine and Angelo Troina, editors, Preproceedings of the First International Workshop on Interactions between Computer Science and Biology, CS2Bio ’2010, volume 268 of Electronic Notes in Theoretical Computer Science, pages 77–96, Amsterdam, Netherlands, 10 June 2010. Elsevier Science Publishers.
[31] Jerome Feret, Vincent Danos, Jean Krivine, Russ Harmer, and Walter
Fontana. Internal coarse-graining of molecular systems. Proceedings of the
National Academy of Sciences, 106(16):6453–6458, April 2009.
[32] Jerome Feret, Thomas Henzinger, Heinz Koeppl, and Tatjana Petrov. Lumpa-
bility abstractions of rule-based systems. Theoretical Computer Science,
431:137–164, 2012.
[33] Jerome Feret, Heinz Koeppl, and Tatjana Petrov. Stochastic fragments: A
framework for the exact reduction of the stochastic semantics of rule-based
models. International Journal of Software and Informatics, 4, to appear.
[34] Jerome Feret and Jean Krivine. KaSim: a simulator for Kappa, 2008–2013. http://www.kappalanguage.org.
[35] Jasmin Fisher and Thomas A Henzinger. Executable cell biology. Nature
Biotechnology, 25(11):1239–1249, November 2007.
[36] Walter Fontana and Leo W Buss. The barrier of objects: From dynamical
systems to bounded organizations. International Institute for Applied Systems
Analysis, 1996.
[37] Arnab Ganguly, Tatjana Petrov, and Heinz Koeppl. Markov chain aggregation and its applications to combinatorial reaction networks. arXiv preprint, abs/1303.4532, 2013.
[38] J Christoph M Gebhardt, David M Suter, Rahul Roy, Ziqing W Zhao, Alec R
Chapman, Srinjan Basu, Tom Maniatis, and X Sunney Xie. Single-molecule
imaging of transcription factor binding to DNA in live mammalian cells. Na-
ture Methods, 10(5):421–426, May 2013.
[39] Alison L Gibbs and Francis E Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.
[40] Daniel T Gillespie. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25):2340–2361, 1977.
[41] Daniel T Gillespie. Markov Processes: An Introduction for Physical Scientists. Gulf Professional Publishing, 1992.
[42] Daniel T Gillespie. Stochastic simulation of chemical kinetics. Annu. Rev.
Phys. Chem., 58:35–55, 2007.
[43] Daniel T Gillespie. Deterministic limit of stochastic chemical kinetics. The Journal of Physical Chemistry B, 113(6):1640–1644, February 2009.
[44] Alexander N Gorban and Ovidiu Radulescu. Dynamical robustness of bio-
logical networks with hierarchical distribution of time scales. arXiv preprint
q-bio/0701020, 2007.
[45] Peter JE Goss and Jean Peccoud. Quantitative modeling of stochastic systems in molecular biology by using stochastic Petri nets. Proceedings of the National Academy of Sciences, 95(12):6750–6755, 1998.
[46] Robert M Gray. Entropy and information theory. Springer-Verlag New York,
Inc., New York, NY, USA, 1990.
[47] John Haigh. Stochastic modelling for systems biology by D. J. Wilkinson. Journal of the Royal Statistical Society Series A, 170(1):261–261, 2007.
[48] G H Hardy and S Ramanujan. Asymptotic formulae in combinatory analysis. Proceedings of the London Mathematical Society, S2-17(1):75–115, 1918.
[49] Russ Harmer, Vincent Danos, Jerome Feret, Jean Krivine, and Walter Fontana. Intrinsic information carriers in combinatorial dynamical systems. Chaos, 20(3):037108, 2010.
[50] H Conzelmann, J Saez-Rodriguez, T Sauter, BN Kholodenko, and ED Gilles. A domain-oriented approach to the reduction of combinatorial complexity in signal transduction networks. BMC Bioinformatics, 7, 2006.
[51] Thomas A Henzinger, Barbara Jobstmann, and Verena Wolf. Formalisms for specifying Markovian population models. In Reachability Problems, pages 3–23. Springer, 2009.
[52] Thomas A Henzinger, Maria Mateescu, and Verena Wolf. Sliding window
abstraction for infinite Markov chains. In CAV, pages 337–352, 2009.
[53] William S Hlavacek, James R Faeder, Michael L Blinov, Alan S Perelson, and Byron Goldstein. The complexity of complexes in signal transduction. Biotechnology and Bioengineering, 84:783–794, 2005.
[54] WS Hlavacek. The complexity of complexes in signal transduction. Biotechnology and Bioengineering, 84:783–794, 2005.
[55] Hye-Won Kang and Thomas G Kurtz. Separation of time-scales and model
reduction for stochastic reaction networks. The Annals of Applied Probability,
23(2):529–583, 2013.
[56] John Kemeny and James L Snell. Finite Markov Chains. Van Nostrand, 1960.
[57] Edda Klipp, Ralf Herwig, Axel Kowald, Christoph Wierling, and Hans
Lehrach. Systems biology in practice: concepts, implementation and appli-
cation. Wiley-Blackwell, 2008.
[58] Edda Klipp, Bodil Nordlander, Roland Krüger, Peter Gennemark, and Stefan Hohmann. Integrative model of the response of yeast to osmotic shock. Nature Biotechnology, 23(8):975–982, 2005.
[59] Peter Krenn. Assembly and experimental validation of a rule-based model
for the high osmolarity glycerol (hog) pathway. Master thesis, University of
Graz, 2013. unpublished.
[60] Thomas G. Kurtz. Solutions of ordinary differential equations as limits of pure
jump Markov processes. Journal of Applied Probability, 7(1):49–58, 1970.
[61] Thomas G Kurtz. Limit theorems for sequences of jump Markov processes
approximating ordinary differential processes. Journal of Applied Probability,
8(2):344–356, 1971.
[62] James Ledoux. On weak lumpability of denumerable Markov chains. Statistics & Probability Letters, 25(4):329–339, 1995.
[63] Adiel Loinger, Azi Lipshtat, Nathalie Q Balaban, and Ofer Biham. Stochastic
simulations of genetic switch systems. Physical Review E, 75(2):021904, 2007.
[64] Harley H McAdams and Adam Arkin. It’s a noisy business! Genetic regulation at the nanomolar scale. Trends in Genetics, 15(2):65–69, 1999.
[65] Donald A McQuarrie. Stochastic Approach to Chemical Kinetics. Journal of
Applied Probability, 4(3):413–478, 1967.
[66] Brian Munsky and Mustafa Khammash. The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics, 124:044104, 2006.
[67] James R Norris. Markov Chains. Cambridge University Press, 1998.
[68] Johan Paulsson. Models of stochastic gene expression. Physics of Life Reviews, 2(2):157–175, 2005.
[69] Serge Pelet, Fabian Rudolf, Mariona Nadal-Ribelles, Eulalia de Nadal, Francesc Posas, and Matthias Peter. Transient activation of the HOG MAPK pathway regulates bimodal gene expression. Science, 332(6030):732, 2011.
[70] Tatjana Petrov, Jerome Feret, and Heinz Koeppl. Reconstructing species-
based dynamics from reduced stochastic rule-based models. In Winter Sim-
ulation Conference, 2012.
[71] Tatjana Petrov, Arnab Ganguly, and Heinz Koeppl. Model decomposition
and stochastic fragments. Electron. Notes Theor. Comput. Sci., 284:105–124,
June 2012.
[72] Tatjana Petrov and Heinz Koeppl. Approximate reduction of rule-based models. In Proceedings of the European Control Conference (ECC), 2013.
[73] John L Pfaltz and Azriel Rosenfeld. Web grammars. In Proceedings of the
1st international joint conference on Artificial intelligence, pages 609–619.
Morgan Kaufmann Publishers Inc., 1969.
[74] Andrew Phillips and Luca Cardelli. Efficient, correct simulation of biological
processes in the stochastic pi-calculus. In Computational Methods in Systems
Biology, pages 184–199. Springer, 2007.
[75] Christopher V Rao and Adam P Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. Journal of Chemical Physics, 118(11):4999–5010, 2003.
[76] Aviv Regev and Ehud Shapiro. Cells as computation. Nature, 419(6905):343,
September 2002.
[77] Gerardo Rubino and Bruno Sericola. A finite characterization of weak lumpable Markov processes. Part II: The continuous time case. Stochastic Processes and their Applications, 45(1):115–125, 1993.
[78] Gerardo Rubino and Bruno Sericola. A finite characterization of weak lumpable Markov processes. Part I: The discrete time case. Stochastic Processes and their Applications, 38(2):195–204, 1991.
[79] Michael S Samoilov and Adam P Arkin. Deviant effects in molecular reaction pathways. Nature Biotechnology, 24(10):1235–1240, 2006.
[80] Claude E Shannon. A mathematical theory of communication. Bell System Technical Journal, 27, 1948.
[81] Ilya Shmulevich, Edward R Dougherty, Seungchan Kim, and Wei Zhang.
Probabilistic boolean networks: a rule-based uncertainty model for gene reg-
ulatory networks. Bioinformatics, 18(2):261–274, 2002.
[82] Jianjun Paul Tian and D Kannan. Lumpability and commutativity of Markov processes. Stochastic Analysis and Applications, 24(3):685–702, 2006.
[83] Pedro Pablo Perez Velasco. Matrix graph grammars. arXiv preprint
arXiv:0801.1245, 2008.
[84] Christopher T Walsh. Posttranslational Modification of Proteins: Expanding Nature’s Inventory. Roberts and Company Publishers, 2006.
[85] Christoph Zechner, Jakob Ruess, Peter Krenn, Serge Pelet, Matthias Peter,
John Lygeros, and Heinz Koeppl. Moment-based inference predicts bimodal-
ity in transient gene expression. Proceedings of the National Academy of
Sciences, 109(21):8340–8345, 2012.