transpose concepts and tools from programming theory to systems biology


Upload: rumer

Post on 10-Jan-2016


Page 1: Transpose concepts and tools from programming theory to systems biology

François Fages MPRI Bio-info 2007

Formal Biology of the Cell

Modeling, Computing and Reasoning with Constraints

François Fages, Constraint Programming Group, INRIA Rocquencourt

mailto:[email protected]

http://contraintes.inria.fr/

Transpose concepts and tools from programming theory to systems biology:

• Formal Methods of Program Verification to Systems Biology,

• Constraint Logic Programming and Constraint-based Model Checking.

In the course:

• Learn bits of cell biology through computational models,

• Develop new formalisms, languages and algorithms coming from biological questions.


Systems Biology

• Multidisciplinary field aiming to get over the complexity walls and reason about biological processes at the system level.

• Conferences: ICSB, CMSB, …; journals: TCSB, …

• Virtual cell: emulate high-level biological processes in terms of their biochemical basis at the molecular level (in silico experiments).

• Bioinformatics: since the end of the 1990s, from genomic sequences to post-genomic data (RNA expression, protein synthesis, protein-protein interactions, …).

• Need for a strong effort on:

- the formal representation of biological processes,

- formal tools for modeling and reasoning about their global behavior.


Language Approach to Cell Systems Biology

Qualitative models: from diagrammatic notation to

• Boolean networks [Thomas 73]

• Petri nets [Reddy 93]

• Milner’s π-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00]

• Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03]

• Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02]

• Transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 04]

Biochemical abstract machine BIOCHAM-1 [Chabrier-Fages 03]

Quantitative models: from differential equation systems to

• Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00]

• Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01]

• Hybrid concurrent constraint languages [Bockmayr-Courtois 01]

• Rules with continuous dynamics BIOCHAM-2 [Chabrier-Fages-Soliman 04]


The Biochemical Abstract Machine BIOCHAM

Software environment based on two formal languages:

1. Biocham Rule Language for Modeling Biochemical Systems

   1. Syntax of molecules, compartments and reactions

   2. Semantics at 3 abstraction levels: Boolean, Concentrations, Populations

2. Biocham Temporal Logic for Formalizing Biological Properties

   1. CTL for Boolean semantics

   2. Constraint LTL for concentration semantics, PCTL for stochastic semantics

Machine learning Rules and Parameters from Temporal Properties

   1. Learning reaction rules from CTL specification

   2. Learning kinetic parameter values from Constraint-LTL specification

Internship topics: http://contraintes.inria.fr


Overview of the Lectures

1. Formal molecules and reaction rules in BIOCHAM.

2. Formal biological properties in temporal logic. Symbolic model-checking.

3. Continuous dynamics. Kinetics and transport models.

4. Computational models of the cell cycle control.

5. Abstract interpretation and typing of biochemical networks

6. Machine learning reaction rules from temporal properties.

7. Constraint-based model checking. Learning kinetic parameter values.

8. Constraint Logic Programming approach to protein structure prediction.


References

A wonderful textbook:

Molecular Cell Biology. 5th Edition, 1100 pages + CD. Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell. Freeman Publ. Nov. 2003.

Modeling dynamic phenomena in molecular and cellular biology. Segel. Cambridge Univ. Press. 1987.

Modeling and querying bio-molecular interaction networks. Chabrier, Chiaverini, Danos, Fages, Schächter. Theoretical Computer Science. 2004.

Machine learning biochemical reaction networks. Calzone, Chabrier, Fages, Soliman. Trans. Comp. Syst. Biology. 2006.

The Biochemical Abstract Machine BIOCHAM. Fages, Soliman. http://contraintes.inria.fr/BIOCHAM


Map of Course 1

1. BIOCHAM syntax

• Proteins: complexation and phosphorylation

• DNA and genes: replication and transcription

• Reaction and transport rules

2. Boolean semantics: concurrent transition system, Kripke structure

• States and transitions

• Examples: RTK membrane receptors, MAPK signaling pathways


2. Syntax: a Simple Algebra of Cell Molecules

Small molecules: covalent bonds 50-200 kcal/mol

• 70% water

• 1% ions

• 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP, …

Macromolecules: hydrogen bonds, ionic, hydrophobic, van der Waals 1-5 kcal/mol

Stability and bindings determined by the number of weak bonds: 3D shape

• 20% proteins (50-10^4 amino acids)

• RNA (10^2-10^4 nucleotides AGCU)

• DNA (10^2-10^6 nucleotides AGCT)


Structure Levels of Proteins

1) Primary structure: word of n amino acid residues (20^n possibilities) linked with C-N bonds

Example: MPRI

Methionine-Proline-Arginine-Isoleucine

2) Secondary structure: word of m helices, strands, random coils, … (3^m-10^m possibilities) stabilized by hydrogen bonds H---O

3) Tertiary 3D structure: spatial folding stabilized by hydrophobic interactions
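The "word over 20 letters" view of the primary structure can be made concrete with a short Python sketch. The lookup table below is deliberately partial (only the four one-letter codes needed for the slide's MPRI example), and the function names `spell` and `count_primary_structures` are illustrative, not part of BIOCHAM.

```python
# Partial table of standard one-letter amino acid codes (only those used here).
AMINO_ACIDS = {
    "M": "Methionine",
    "P": "Proline",
    "R": "Arginine",
    "I": "Isoleucine",
}


def spell(word):
    """Spell out a primary structure given as a word of one-letter codes."""
    return "-".join(AMINO_ACIDS[letter] for letter in word)


def count_primary_structures(n):
    """Number of distinct words of n residues over the 20 amino acids."""
    return 20 ** n


print(spell("MPRI"))
print(count_primary_structures(4))
```

Even for a four-residue word the count is already 160000, which illustrates why enumeration does not scale to real proteins.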


Formal proteins

Molecule: BIOCHAM syntax

Cyclin dependent kinase 1 (free, inactive): Cdk1

Complex Cdk1-Cyclin B (low activity): Cdk1-CycB

Phosphorylated form at site threonine 161 (high activity): Cdk1~{thr161}-CycB


Deoxyribonucleic Acid DNA

1) Primary structure: word over 4 nucleotides: Adenine, Guanine, Cytosine, Thymine

2) Secondary structure: double helix of pairs A--T and C---G stabilized by hydrogen bonds


DNA: Genome Size

Species                 Genome size   Chromosomes   Coding DNA
E. coli (bacteria)      5 Mb          1 circular    100 %
S. cerevisiae (yeast)   12 Mb         16            70 %
Mouse, Human            3 Gb          20, 23        15 %
…                       15 Gb
…                       140 Gb

3,200,000,000 pairs of nucleotides

Single nucleotide polymorphism: 1 / 2 kb


Genome Size

Species                 Genome size   Chromosomes   Coding DNA
E. coli (bacteria)      4 Mb          1             100 %
S. cerevisiae (yeast)   12 Mb         16            70 %
Mouse, Human            3 Gb          20, 23        15 %
Onion                   15 Gb         8             1 %
Lungfish                140 Gb                      0.7 %


DNA Replication

Separation of the two helices and

production of one complementary strand for each copy

(from one or several starting points of replication)


Syntax of Genes

Part of DNA, unique: #E2

Activation (binding of promotion factor): #E2-E2f13-DP12

Repression: binding of another molecule


Transcription: DNA gene → pRNA → mRNA → Protein

Genes: parts of DNA

1. Activation (Inhibition): transcription factors (inhibitors) bind to the regulatory region of the gene

#E2 + E2F13-DP12 => #E2-E2F13-DP12

2. Transcription: RNA polymerase copies the DNA from start to stop positions into a single-stranded premature messenger pRNA

_ =[#E2-E2F13-DP12]=> pRNAcycA

3. (Alternative) splicing: non-coding regions of pRNA are removed, giving mature messenger mRNA

pRNAcycA => mRNAcycA

4. Protein synthesis: mRNA moves to the cytoplasm and binds to a ribosome to assemble a protein

mRNAcycA => mRNAcycA::cyt

mRNAcycA::cyt + ribosome::cyt => cycA::cyt


BIOCHAM Syntax of Objects

E ::= compound | E-E | E~{p1,…,pn}

Compound: molecule, #gene binding site, abstract @process, …

- : binding operator for protein complexes, gene binding sites, … Associative and commutative.

~{…} : modification operator for phosphorylated sites, … Set of modified sites (associative, commutative, idempotent).

O ::= E | E::location

Location: symbolic compartment (nucleus, cytoplasm, membrane, …)

S ::= _ | O+S

+ : solution operator (associative, commutative, neutral element _)
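The algebraic laws stated above (binding is associative and commutative, the site set is idempotent) can be illustrated with a small Python sketch. This is not BIOCHAM's internal representation: the helper names `compound`, `bind` and `show` are hypothetical, and a complex is simply modeled as a sorted tuple of parts so that equal objects get equal representations.

```python
def compound(name, sites=()):
    """A single compound with its set of modified sites, as a one-part complex.

    Sites go through a set, so listing a site twice changes nothing (idempotence).
    """
    return ((name, tuple(sorted(set(sites)))),)


def bind(left, right):
    """The '-' operator: merge two complexes.

    Sorting the parts gives one canonical form, which makes the operation
    associative and commutative by construction.
    """
    return tuple(sorted(left + right))


def show(cplx, location=None):
    """Print a complex in BIOCHAM-like syntax, optionally with a location."""
    text = "-".join(
        name + ("~{" + ",".join(sites) + "}" if sites else "")
        for name, sites in cplx
    )
    return text + ("::" + location if location else "")


cdk1_p = compound("Cdk1", ["thr161"])
cycb = compound("CycB")
print(show(bind(cdk1_p, cycb), "nucleus"))
```

Because parts are sorted alphabetically, the printed order may differ from the order in which a complex was written, while still denoting the same object.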


Elementary Rule Schemas

Complexation: A + B => A-B. Decomplexation: A-B => A + B.

cdk1 + cycB => cdk1-cycB

Phosphorylation: A =[C]=> A~{p}. Dephosphorylation: A~{p} =[C]=> A.

Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB

Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB

Synthesis: _ =[C]=> A. Degradation: A =[C]=> _.

_ =[#Ge2-E2f13-Dp12]=> cycA

cycE =[@UbiPro]=> _ (not for cycE-cdk2 which is stable)

Transport: A::L1 => A::L2

Cdk1~{p}-CycB::cytoplasm => Cdk1~{p}-CycB::nucleus


From Syntax to Semantics

R ::= S => S | kinetic-expression for R

A =[C]=> B stands for A+C => B+C

A <=> B stands for A=>B and B=>A, etc.
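The two abbreviations above are purely syntactic, so they can be sketched as a string rewriting step in Python. The function name `expand` is hypothetical, the parsing is a simplification that works on whole rule strings without kinetic expressions, and the treatment of the empty solution `_` (dropping it next to the catalyst) is an assumption based on `_` being neutral for `+`.

```python
import re


def expand(rule):
    """Expand BIOCHAM-style shorthand into a list of plain 'S => S' rules."""
    # Reversible rule: A <=> B stands for A => B and B => A.
    match = re.fullmatch(r"(.*?)\s*<=>\s*(.*)", rule)
    if match:
        left, right = match.group(1).strip(), match.group(2).strip()
        return [f"{left} => {right}", f"{right} => {left}"]

    # Catalyzed rule: A =[C]=> B stands for A + C => B + C.
    match = re.fullmatch(r"(.*?)\s*=\[(.*?)\]=>\s*(.*)", rule)
    if match:
        left, catalyst, right = (group.strip() for group in match.groups())

        def plus(side):
            # '_' is the empty solution, neutral for '+'.
            return catalyst if side == "_" else f"{side} + {catalyst}"

        return [f"{plus(left)} => {plus(right)}"]

    # Already a plain rule.
    return [rule.strip()]


for rule in ["A =[C]=> B", "A <=> B", "_ =[C]=> A"]:
    print(rule, "->", expand(rule))
```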

Systems Biology Markup Language: exchange format, no semantics

BIOCHAM: three abstraction levels

1. Boolean Semantics: presence-absence of molecules

   • Concurrent Transition System (asynchronous, non-deterministic)

2. Differential Semantics: concentration

   • Ordinary Differential Equations or Hybrid system (deterministic)

3. Stochastic Semantics: number of molecules

   • Continuous time Markov chain


The Actin-Myosin two-stroke Engine with ATP fuel

Myosin + ATP => Myosin-ATP

Myosin-ATP => Myosin + ADP

http://www.sci.sdsu.edu/movies

http://www-rocq.inria.fr/sosso/icema2


Cell to Cell Signaling by Hormones and Receptors

Signals: insulin, adrenaline, steroids, EGF, …, Delta, …, nutrients, light, pressure, …

Receptors: tyrosine kinases, G-protein coupled, Notch, …

L + R <=> L-R

RAS-GDP =[L-R]=> RAS-GTP


Five MAP Kinase Pathways in Budding Yeast

(Saccharomyces cerevisiae)


MAPK Signaling Pathways

Input: RAF

• Activated by the receptor

RAF-p14-3-3 + RAS-GTP => RAF + p14-3-3 + RAS-GDP

Output: MAPK~{T183,Y185}

• moves to the nucleus

• phosphorylates a transcription factor

• which stimulates gene transcription


MAPK Signaling Pathway in BIOCHAM

RAF + RAFK <=> RAF-RAFK.
RAF-RAFK => RAFK + RAF~{p1}.
RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.
RAF~{p1}-RAFPH => RAF + RAFPH.
MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P.
MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.
MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.
MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.
MEK~{p1}-MEKPH => MEK + MEKPH.
MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.
MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P.
MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.
MAPK~{p1}-MAPKPH => MAPK + MAPKPH.
MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.
MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}.

Pattern variables $P for

• Phosphorylation sites

• Molecules

with constraints

BIOCHAM rules are expanded into BIOCHAM-0 rules without patterns
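A rough idea of how a pattern rule could be expanded is sketched below in Python. It assumes that `$P` ranges over all subsets of a declared list of phosphorylation sites and that a constraint such as `p2 not in $P` filters those subsets. The function names and the plain string substitution are illustrative only: the sketch handles the simple `~$P` form, not combined forms such as `MEK~{p1}~$P`, and is not how BIOCHAM itself is implemented.

```python
from itertools import combinations


def subsets(sites):
    """All subsets of the given sites, smallest first."""
    for size in range(len(sites) + 1):
        for combo in combinations(sites, size):
            yield combo


def instantiate(template, sites, forbidden=()):
    """Replace every '~$P' by each allowed site set.

    'forbidden' lists the sites excluded by a 'where s not in $P' constraint.
    """
    rules = []
    for chosen in subsets(sites):
        if any(site in chosen for site in forbidden):
            continue
        suffix = "~{" + ",".join(chosen) + "}" if chosen else ""
        rules.append(template.replace("~$P", suffix))
    return rules


pattern = "MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1}"
for rule in instantiate(pattern, ["p1", "p2"], forbidden=["p2"]):
    print(rule)
```

With sites p1 and p2 and the constraint that p2 is not in $P, the pattern yields two pattern-free rules, one for MEK and one for MEK~{p1}, which matches the two complexes consumed by the subsequent phosphorylation rules.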


Reaction Model of the MAPK Cascade [Levchenko et al. PNAS 2000]

(MA(1), MA(0.4)) for RAF + RAFK <=> RAF-RAFK.

(MA(0.5),MA(0.5)) for RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.

(MA(3.3),MA(0.42)) for MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P.

(MA(10),MA(0.8)) for MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.

(MA(20),MA(0.7)) for MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P.

(MA(5),MA(0.4)) for MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.

MA(0.1) for RAF-RAFK => RAFK + RAF~{p1}.

MA(0.1) for RAF~{p1}-RAFPH => RAF + RAFPH.

MA(0.1) for MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.

MA(0.1) for MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.

MA(0.1) for MEK~{p1}-MEKPH => MEK + MEKPH.

MA(0.1) for MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.

MA(0.1) for MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.

MA(0.1) for MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}.

MA(0.1) for MAPK~{p1}-MAPKPH => MAPK + MAPKPH.

MA(0.1) for MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
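To see what the mass action annotations mean under the differential semantics, the following Python sketch integrates only the first step of the cascade: RAF + RAFK <=> RAF-RAFK with MA(1) and MA(0.4), and RAF-RAFK => RAFK + RAF~{p1} with MA(0.1). The initial concentrations, the explicit Euler scheme and the step size are assumptions chosen for illustration, and the phosphatase reactions are left out.

```python
def derivatives(x, k_on=1.0, k_off=0.4, k_cat=0.1):
    """Mass action rates for a state x = (RAF, RAFK, RAF-RAFK, RAF~{p1})."""
    raf, rafk, cplx, raf_p = x
    v_bind = k_on * raf * rafk   # MA(1):   RAF + RAFK => RAF-RAFK
    v_unbind = k_off * cplx      # MA(0.4): RAF-RAFK => RAF + RAFK
    v_cat = k_cat * cplx         # MA(0.1): RAF-RAFK => RAFK + RAF~{p1}
    return (
        -v_bind + v_unbind,
        -v_bind + v_unbind + v_cat,
        v_bind - v_unbind - v_cat,
        v_cat,
    )


def simulate(x0, dt=0.01, steps=1000):
    """Explicit Euler integration; crude, but enough for a qualitative picture."""
    x = tuple(x0)
    for _ in range(steps):
        dx = derivatives(x)
        x = tuple(xi + dt * di for xi, di in zip(x, dx))
    return x


# Hypothetical initial concentrations: RAF = 1.0, RAFK = 0.5, nothing else.
final = simulate((1.0, 0.5, 0.0, 0.0))
print(final)
```

Two quantities stay constant along the trajectory: total RAF (free, bound and phosphorylated) and total RAFK (free and bound), which is a quick sanity check on the rate equations.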


Bipartite Proteins-Reactions Graph of MAPK

GraphViz: http://www.research.att.co/sw/tools/graphviz


Influence Graph

Inferred from the syntactical reaction model of the MAPK “cascade”

Negative feedback loops…

[Fages Soliman CMSB’06]
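One plausible reading of "inferred from the syntactical reaction model" is sketched below in Python: within a reaction, each reactant activates every species that is net-produced and inhibits every species that is net-consumed. This rule of thumb and the function name `influences` are assumptions made for illustration; the cited paper should be consulted for the exact definition.

```python
def influences(reactions):
    """Infer signed influences from reactions.

    Each reaction is a pair (reactants, products) of lists of species names.
    Returns a set of (source, "activates" | "inhibits", target) triples.
    """
    graph = set()
    for reactants, products in reactions:
        species = set(reactants) | set(products)
        for source in set(reactants):
            for target in species:
                # Net stoichiometric change of the target in this reaction.
                change = products.count(target) - reactants.count(target)
                if change > 0:
                    graph.add((source, "activates", target))
                elif change < 0:
                    graph.add((source, "inhibits", target))
    return graph


# Complexation step of the cascade: RAF + RAFK => RAF-RAFK
for edge in sorted(influences([(["RAF", "RAFK"], ["RAF-RAFK"])])):
    print(edge)
```

Under this reading, even a purely "forward" cascade contains inhibitions, because a kinase sequesters its substrate when it binds it; this is one way negative feedback loops can appear in the graph.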


Differential Simulation


Boolean Simulation


Automatic Generation of CTL Properties

reachable(MAPK~{p1})

reachable(!(MAPK~{p1}))

oscil(MAPK~{p1})

reachable(MAPKPH-MAPK~{p1})

reachable(!(MAPKPH-MAPK~{p1}))

oscil(MAPKPH-MAPK~{p1})

AG(!(MAPKPH-MAPK~{p1})->checkpoint(MAPKPH,MAPKPH-MAPK~{p1}))

AG(!(MAPKPH-MAPK~{p1})->checkpoint(MAPK~{p1},MAPKPH-MAPK~{p1}))

reachable(MAPK~{p1,p2})

reachable(!(MAPK~{p1,p2}))

oscil(MAPK~{p1,p2})


Boolean Semantics

Associate:

• Boolean state variables to molecules, denoting the presence/absence of molecules in the cell or compartment

• A finite concurrent transition system [Shankar 93] to rules (asynchronous), over-approximating the set of all possible behaviors

A reaction A+B=>C+D is translated into 4 transition rules for the possibly complete consumption of reactants:

A+B → A+B+C+D
A+B → ¬A+B+C+D
A+B → A+¬B+C+D
A+B → ¬A+¬B+C+D
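The translation of one reaction into several Boolean transitions can be sketched in Python as follows. A state is modeled as the set of molecules that are present; the function name `successors` is hypothetical, and the choice to keep a molecule present when it also appears among the products (as a catalyst would) is an assumption of this sketch.

```python
from itertools import product


def successors(state, reactants, products):
    """Boolean successors of a state under one reaction.

    The reaction fires only if all reactants are present. Products become
    present, and each reactant may or may not be completely consumed, which
    yields up to 2^n successors for n reactants.
    """
    if not set(reactants) <= state:
        return set()
    result = set()
    for consumed in product([False, True], repeat=len(reactants)):
        nxt = set(state) | set(products)
        for molecule, gone in zip(reactants, consumed):
            if gone and molecule not in products:
                nxt.discard(molecule)
        result.add(frozenset(nxt))
    return result


# A+B => C+D from the state where only A and B are present.
for nxt in successors(frozenset({"A", "B"}), ["A", "B"], ["C", "D"]):
    print(sorted(nxt))
```

For A+B=>C+D this produces four successor states, one per combination of "A consumed or not" and "B consumed or not".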


Kripke Structure K=(S,R)

Given:

V is a set of state variables, with domain D,

T is a set of transition rules between states.

Associate:

a Kripke structure (S,R) where

S = D^V is the set of possible states with variables ranging in domain D,

R ⊆ S×S is the total relation induced by T, that is

(A,B) is in R if there exists a transition rule from state A to B,

(A,A) is in R if there exists no transition from state A.
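The construction above can be written out directly for small examples. The Python sketch below enumerates S = D^V explicitly and adds a self-loop to every state without a successor, so that R is total. The function `kripke`, the encoding of states as tuples, and the example transition function `step` are illustrative assumptions; explicit enumeration is only feasible for a handful of variables.

```python
from itertools import product


def kripke(variables, domain, step):
    """Build (S, R) from a successor function 'step' induced by the rules T.

    States are tuples of values, one per variable, in the order of 'variables'.
    """
    states = list(product(domain, repeat=len(variables)))
    relation = set()
    for state in states:
        targets = set(step(state))
        if not targets:
            targets = {state}  # no transition from this state: add (A, A)
        relation |= {(state, target) for target in targets}
    return states, relation


def step(state):
    """Hypothetical rule A => B over variables (A, B), Boolean semantics."""
    a, b = state
    if not a:
        return set()
    # B becomes present; A may or may not be completely consumed.
    return {(True, True), (False, True)}


states, relation = kripke(("A", "B"), (False, True), step)
print(len(states), len(relation))
```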