synthesis for systems biology
Post on 23-Feb-2016
Synthesis for Systems Biology
Ras Bodík, Ali Sinan Köksal, Evan Pu, Saurabh Srivastava (UC Berkeley)
Jasmin Fisher (Microsoft Research)
Nir Piterman (University of Leicester)
Executable biology pushes our boundaries
Maximally non-deterministic systems
– cells exhibit races
– the model must preserve all observed non-determinism
Needs new synthesis algorithms
– from 2QBF to 3QBF
Incomplete specs
– sparse wet-lab experiments
– unknown behavior
Needs analysis of ambiguity
– are there alternative explanations of observed phenomena?
Other lessons and results
Design your own tools
– To enable synthesis, design a domain language. Then build a lightweight synthesizer.
Synthesized a C. elegans VPC model
– We failed to write this model manually; others took months.
Beyond synthesis
– Showed that available experiments are non-ambiguous.
– Synthesized a new, internally alternative model.
Systems biology
Understanding Diseases
“Cancer is fundamentally a disease of failure of regulation of tissue growth. In order for a normal cell to transform into a cancer cell, the genes which regulate cell growth and differentiation must be altered.” – Wikipedia
To understand cancer, investigate cell differentiation
How Are Cells Differentiated?
Two ways of differentiation:
– A single cell divides into cells of different types.
– Multiple identical cells differentiate by communicating.
To understand cell differentiation, investigate cell communication.
Studying Differentiation on Worms
Cell differentiation in worms: similar to that in humans, but much simpler.
[Figure: identical precursor cells; differentiated vulval cells]
The Research Goal
What is the cell’s “algorithm” for robustly deciding cell fates through communication?
Mutation experiments are visually observable
Biologists mutate cell genes and observe the outcome of differentiation.
Figure: sqv mutants of Caenorhabditis elegans are defective in vulval epithelial invagination [Herman et al. 1999]
The results from wet-lab experiments
Mutation experiments give partial knowledge
From gene mutation experiments, biologists infer a protein interaction.
“In this assay, depletion of lst-2, lst-3, lst-4, or dpy-23, as well as ark-1, caused ectopic vulval induction, suggesting that they function as negative regulators of the EGFR-MAPK pathway.”
[Yoo et al. 2004]
Making Sense of Experiments
Executable Systems biology
Executable Biology
Computational models are needed to tackle the combinatorial complexity of cell communication.
Verification of models can show their inconsistency with experimental data.
New interactions can be discovered. [Fisher et al. 2007]
Semantics of models
Time and protein concentrations are discrete:
– discrete is sufficient to show interesting behavior
Cells are concurrent communicating automata
– bounded asynchrony (cells progress at ~same rate)
Note: timing is modeled with state progression
Cells as a Reactive Modules (RM) program
atom Vul
  controls Vul
  reads go, Vul, IS, Muv_state, v_Vul
  awaits go, v_Vul, lst_state
init
  [] (true) & v_Vul' = ko -> Vul' := off0;
  [] (true) & v_Vul' ~= ko -> Vul' := Evaluate0;
update
  [] (~go & go') & Vul = Evaluate0 & Muv_state = ON & IS ~= high -> Vul' := off1;
  [] (~go & go') & Vul = Evaluate0 & IS = high -> Vul' := let23;
  [] (~go & go') & Vul = Evaluate0 & Muv_state = OFF & IS ~= high -> Vul' := Evaluate1;
  [] (~go & go') & Vul = off1 & IS = med -> Vul' := Before_Partial_On;
  [] (~go & go') & Vul = off1 & IS = high -> Vul' := let23;
  [] (~go & go') & Vul = off1 & IS ~= high & IS ~= med -> Vul' := off2;
  [] (~go & go') & Vul = Evaluate1 -> Vul' := let23;
  [] (~go & go') & Vul = Before_Partial_On -> Vul' := let23;
  [] (~go & go') & Vul = let23 & lst_state' = OFF -> Vul' := sem5;
  [] (~go & go') & Vul = sem5 & lst_state' = OFF -> Vul' := let60;
  [] (~go & go') & Vul = let60 & lst_state' = OFF -> Vul' := mpk1;
  [] (~go & go') & Vul = let23 & lst_state' = ON -> Vul' := Vul_counteracted;
  [] (~go & go') & Vul = sem5 & lst_state' = ON -> Vul' := Vul_counteracted;
  [] (~go & go') & Vul = let60 & lst_state' = ON -> Vul' := Vul_counteracted
RM models: laborious to develop and update
Months of tweaking to get the timing right
– hard to understand
– hard to debug
RM is too expressive (e.g., has clairvoyance)
– it’s tempting to encode constructs that have no clear biological explanation (strange abstractions)
Summary: modeling in executable biology is laborious
if only we could automate model development
Synthesis and Analysis of Biology Models
Our contribution
Automatically infer cell models (synthesis)
– obtain executable models faster
Enumerate alternative models (“distinct” synthesis)
– find alternative explanations of observed phenomena
Ask for more specifications (disambiguation)
– suggest experiments to disambiguate between models
Lessons: Build your tools!
Executable biology selects methods based on the availability of tools, e.g., model checkers.
We did the same for synthesis of models. It failed.
We argue for building our own lightweight tools, including the modeling language and its synthesizer.
We show how to DIY.
The language
Motivation for a high-level language (HLL)
HLL ⇒ smaller programs ⇒ smaller search space ⇒ faster synthesis
HLL programs are biological diagrams ⇒ easier for biologists to read
Four levels of the language
[Figure labels: schedule; concentration update function]
Top-level semantics
The program: $P(m, s) = f$
Inputs:
– mutation ($m$): changes behavior of proteins
– schedule ($s$): bounded length, controls cell interleaving
Output:
– fates of cells ($f$): resulting fates of cells
Correctness
Top-level program: $P(m, s) = f$
Specification (experiments): a set $E$ of observed (mutation, fate) pairs $(m, f)$
Correctness:
i. a demonic scheduler cannot produce an unobserved fate
ii. an angelic scheduler can produce each observed fate
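The two correctness conditions can be made concrete with a brute-force check. The following is only a sketch in Python, not the authors' Scala tool: it assumes the program is a plain function of a mutation and a schedule, that the schedules can be enumerated, and that experiments are (mutation, fate) pairs. `toy_program`, its mutations, and its fates are hypothetical.

```python
def reachable_fates(program, mutation, schedules):
    """Every fate the program can produce for `mutation` over all schedules."""
    return {program(mutation, s) for s in schedules}


def demonic_ok(program, experiments, schedules):
    """(i) No schedule leads to a fate that was not observed for that mutation."""
    return all((m, fate) in experiments
               for m in {m for m, _ in experiments}
               for fate in reachable_fates(program, m, schedules))


def angelic_ok(program, experiments, schedules):
    """(ii) Each observed (mutation, fate) pair is produced by some schedule."""
    return all(fate in reachable_fates(program, m, schedules)
               for m, fate in experiments)


def is_correct(program, experiments, schedules):
    return (demonic_ok(program, experiments, schedules)
            and angelic_ok(program, experiments, schedules))


def toy_program(mutation, schedule):
    # Hypothetical two-cell race: the cell scheduled first wins,
    # unless the "ko" mutation removes the signal altogether.
    if mutation == "ko":
        return ("none", "none")
    if schedule[0] == 0:
        return ("primary", "secondary")
    return ("secondary", "primary")


schedules = [(0, 1), (1, 0)]
experiments = {
    ("wt", ("primary", "secondary")),
    ("wt", ("secondary", "primary")),
    ("ko", ("none", "none")),
}
print(is_correct(toy_program, experiments, schedules))
```

Dropping one of the two wild-type observations makes the demonic check fail (the race can still produce the unobserved fate), while adding an observation the program cannot reach makes the angelic check fail.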
Level 2: Program is composed from cells
Cells advance according to the schedule
Cells communicate by reading each other’s state
– state: the concentrations of the cell’s proteins
Schedule (example): the first step executes cells 2, 3, and 6.
Bounded asynchrony [Fisher et al.]:
– the schedule can be partitioned into macrosteps
– in each macrostep, each cell makes one step
Our schedules contain a fixed number of macrosteps
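To illustrate bounded asynchrony, the sketch below checks whether a schedule can be partitioned into macrosteps. It is an assumption-laden reading of the slide, written in Python: a schedule is taken to be a list of steps, each step a non-empty set of cell identifiers, and a macrostep ends as soon as every cell has stepped exactly once. The function name `split_into_macrosteps` is hypothetical.

```python
def split_into_macrosteps(schedule, cells):
    """Return the list of macrosteps, or None if the schedule violates
    bounded asynchrony (assumed reading: each cell steps exactly once
    per macrostep, and the schedule ends on a macrostep boundary)."""
    cells = frozenset(cells)
    macrosteps, current, seen = [], [], set()
    for step in schedule:
        step = frozenset(step)
        # Reject empty steps, unknown cells, and cells stepping twice
        # within the same macrostep.
        if not step or not step <= cells or step & seen:
            return None
        current.append(step)
        seen |= step
        if seen == cells:          # every cell has made its one step
            macrosteps.append(current)
            current, seen = [], set()
    return macrosteps if not current else None


cells = range(1, 7)
schedule = [{2, 3, 6}, {1, 4}, {5}, {1, 2, 3, 4, 5, 6}]
print(split_into_macrosteps(schedule, cells))
```

Because a macrostep closes exactly when all cells have stepped, the left-to-right partition is unique, so a greedy scan is enough.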
Level 3: In cells are proteins
Each cell is composed from proteins.
– protein state: discretized protein concentration
– proteins read states of other proteins (potentially in other cells)
– they update their own concentration in the next step
Synchronous execution:
– when a cell is scheduled, all of its proteins take one step
– i.e., they update their concentration level
[similar to the Synchronous/Reactive (SR) model, Edwards and Lee, 2002]
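A minimal Python sketch of the synchronous step, under stated assumptions: a cell's state is a dictionary from protein name to discretized level, each protein has an update function that reads the old state plus read-only levels from other cells (`env`), and all proteins switch to their new levels at once. The proteins `A`, `B`, `R` and the `signal` input are hypothetical.

```python
def step_cell(state, update_fns, env):
    """One synchronous step of a scheduled cell.

    state:      dict protein -> discretized level (this cell)
    update_fns: dict protein -> function(state, env) -> new level
    env:        levels visible from other cells (read-only)
    """
    # Every update function reads the old `state`; the new levels are
    # collected in a fresh dict, so no protein sees a half-updated cell.
    return {protein: update_fns[protein](state, env) for protein in state}


# Hypothetical cell: A copies B, B copies A, R follows a neighbor's signal.
update_fns = {
    "A": lambda s, env: s["B"],
    "B": lambda s, env: s["A"],
    "R": lambda s, env: min(2, env["signal"]),
}
before = {"A": 2, "B": 0, "R": 0}
after = step_cell(before, update_fns, env={"signal": 1})
print(after)
```

With in-place sequential updates, `A` and `B` would end up equal; the synchronous step swaps them, which is the behavior the slide describes.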
Level 4: In proteins are update functions
Protein state: discretized concentrations
Protein update function: reads concentrations of attached proteins and updates its own
Note: these update functions are what we synthesize
– i.e., in our partial models we leave some update functions unspecified
The output fate
The fate of the program is computed by a fate function from the state of each cell.
Example
Assume a network of police cameras. When a gunshot happens, we want at least one nearby camera to take a picture. Synthesize a protocol for deciding which camera takes a picture. OK if multiple cameras do.
Two types of communication:
– sound from gunshot (“base station”) to cameras
– radio transmission between camera nodes announcing “I took a picture, you don’t have to, save your battery”
Nodes should decide who is closest on the basis of sound signal strength. No triangulation.
Incomplete specification
signal from BS   take picture?   signal from BS   take picture?   cameras managed to communicate?
H                Y               H                N               Y
N YY Y
H                Y               L                N               Y
H                Y               H                Y               N
Synthesized update functions for base receiver, delay node
Synthesis
Synthesis
Input to synthesizer:
– specification $E$
– partial program (sketch)
– “biological” invariants (see next slide)
Output:
– a completion $h$ that completes the sketch into a correct program
The synthesis problem:
– a 3QBF problem (unlike ordinary 2QBF synthesis)
Enforcing Biological Invariants
Synthesized models must satisfy biological invariants.
Biologist’s invariants specify whether one protein activates or inhibits another.
Asserted as monotonicity constraints on state transitions
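One way to read these monotonicity constraints is sketched below in Python. The reading is an assumption, not taken from the slides: raising an activating input by one level (all other inputs fixed) must never lower the output, and raising an inhibiting input must never raise it. The update functions `good` and `bad` and their input domains are hypothetical.

```python
from itertools import product


def respects_invariants(update, domains, activators, inhibitors):
    """update:     function from a tuple of input levels to an output level
    domains:    number of discrete levels of each input
    activators: indices of inputs declared activating
    inhibitors: indices of inputs declared inhibiting"""
    for levels in product(*(range(n) for n in domains)):
        for i in range(len(domains)):
            if levels[i] + 1 >= domains[i]:
                continue  # input i is already at its top level
            raised = levels[:i] + (levels[i] + 1,) + levels[i + 1:]
            low, high = update(levels), update(raised)
            if i in activators and high < low:
                return False
            if i in inhibitors and high > low:
                return False
    return True


# Input 0 has three levels (off, low, high); input 1 has two (off, on).
def good(levels):
    base, lateral = levels
    return 1 if base - lateral >= 1 else 0


def bad(levels):
    base, lateral = levels
    return 1 if base == 1 else 0  # output drops when base rises to 2


print(respects_invariants(good, [3, 2], activators={0}, inhibitors={1}))
print(respects_invariants(bad, [3, 2], activators={0}, inhibitors={1}))
```

In a synthesizer the same condition would be asserted as a constraint over the unknown update table rather than checked after the fact; the enumeration here only shows what the constraint rules out.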
The synthesizer
Architecture of synthesizer (3.5 KLOC)
DSL embedded in Scala
– just defining classes for Cells and Proteins gives nice syntax
– evaluating the Scala program yields an abstract syntax graph (ASG)
Interpreter for ASG in Scala
– given the ASG and (m, s), run the program to get the fate
Compiler from ASG to a Z3 formula
– used by the algorithms for verification, synthesis, and ambiguity analysis
Example of the embedded DSL
class BaseReceiver extends Node("BaseReceiver") {
  val base = input("off", "low", "high")
  val lateralReceiver = input("off", "on")
  val out = output("off", "on")

  // update functions implemented as a (more general) FSM
  val stateful = logic(new StatefulLogic {
    val off = state("off")        // two observable states
    val on = state("on")
    output(out)                   // link these states to output port
    init(off)                     // "off" is the start state

    nbStates(5)                   // this state machine will have five hidden states

    activating(base)              // biological invariants on inputs
    inhibiting(lateralReceiver)
  })
  register(stateful)              // necessitated by the DSL
}
How to deal with 3QBF synthesis problem
Domain sizes:
holes       large   treated symbolically
schedules   large   treated symbolically
mutations   small   enumerated on demand
Algorithms
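Synthesis Approach: CEGIS
Assume we care only about the classical demonic correctness.
[Diagram: CEGIS loop. "synthesize" starts from an initial input set of (schedule, experiment) pairs; on SAT it passes a candidate model to "verify"; when "verify" answers SAT, the (schedule, experiment) counterexample is added to the input set and synthesis repeats. UNSAT from "synthesize" means no model exists; UNSAT from "verify" means the candidate is correct.]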
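The loop can be sketched in a few lines of Python. This is a toy under loud assumptions: both "synthesize" and "verify" are done by plain enumeration, standing in for the solver queries of the actual tool, only demonic correctness is checked, and `toy_program` with its two holes (a threshold and a lateral-inhibition flag) is hypothetical.

```python
from itertools import product


def toy_program(hole, mutation, schedule):
    # Hypothetical two-cell model with two holes to be synthesized.
    threshold, lateral = hole
    signal = {"wt": 2, "ko": 0}[mutation]
    first = schedule[0]                 # the cell that moves first
    fate = ["off", "off"]
    if signal >= threshold:
        fate[first] = "on"
        if not lateral:                 # no lateral inhibition: both turn on
            fate[1 - first] = "on"
    return tuple(fate)


def cegis(candidates, inputs, program, experiments):
    examples = []                       # counterexample inputs found so far
    while True:
        # synthesize: a candidate that is demonically correct on the examples
        cand = next((h for h in candidates
                     if all((m, program(h, m, s)) in experiments
                            for m, s in examples)), None)
        if cand is None:
            return None                 # UNSAT: no completion exists
        # verify: search all inputs for an unobserved fate
        cex = next(((m, s) for m, s in inputs
                    if (m, program(cand, m, s)) not in experiments), None)
        if cex is None:
            return cand                 # UNSAT: candidate is correct
        examples.append(cex)            # refine and repeat


candidates = list(product(range(4), [False, True]))
inputs = list(product(["wt", "ko"], [(0, 1), (1, 0)]))
experiments = {
    ("wt", ("on", "off")),
    ("wt", ("off", "on")),
    ("ko", ("off", "off")),
}
print(cegis(candidates, inputs, toy_program, experiments))
```

Each counterexample rules out the candidate that produced it, so the loop terminates on a finite candidate space; the point of CEGIS is that it usually needs far fewer examples than there are inputs.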
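Synthesis algorithm
The synthesis problem:
$$\exists h.\ \Big(\forall m \in \pi_m(E).\ \neg\exists s.\ (m, P(m, s)) \notin E\Big) \wedge \Big(\forall (m, f) \in E.\ \exists s.\ P(m, s) = f\Big)$$
The synthesizer's query over the inputs collected so far:
$$\exists h.\ (m_1, P(m_1, s_1)) \in E \wedge \dots \wedge (m_l, P(m_l, s_l)) \in E \wedge (\exists s.\ P(m_1, s) = f_1) \wedge \dots \wedge (\exists s.\ P(m_k, s) = f_k)$$
[Diagram: the candidate goes to a verifier of demonic schedules and a verifier of angelic schedules; each returns a counterexample to the synthesizer.]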
Three communicating solvers
[Diagram labels: 3QBF; 2QBF; SAT. Annotation on the 2QBF solver: blasts (m, f), turns to SAT.]
Supporting tools
Supporting tools
Work would not be productive without these tools:
– execution visualizer
– causal tracer
– automaton minimizer
We still need ideas on how to construct those quickly.
Visualizing the Synthesized Model
[Figure annotations: activated connections are colored; step through execution]
Results
Results (1): Automatic model inference
Synthesized a model of VPC in C. elegans
– the model is expressed in our bio-inspired language
– we believe it’s more readable than in RM
Prior to synthesis
– we failed to manually fix a bug in an equivalent model
– collaborators took several months to make this model
Results (2): Are experiments complete?
We concluded that the set of experiments is complete
– this means there exists no alternative model that behaves differently on experiments not yet performed
– this holds under the assumptions described in the sketch provided by biologists, which encodes their knowledge about C. elegans
Working on identifying a minimal set of experiments
– if we want to validate these experiments, do we need to repeat all of them?
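The completeness question can be phrased as a search for a distinguishing experiment. The Python sketch below is a simplification under stated assumptions: it compares an explicit, finite list of candidate models by enumeration, whereas the slides describe compiling models to Z3 formulas. `model_a`, `model_b`, and the mutation names `m1` and `m2` are hypothetical.

```python
def reachable_fates(model, mutation, schedules):
    """Set of fates a model can produce for `mutation` over all schedules."""
    return frozenset(model(mutation, s) for s in schedules)


def distinguishing_experiments(models, untested_mutations, schedules):
    """Untested mutations on which at least two models predict different
    sets of fates; performing such an experiment rules some model out."""
    return [m for m in untested_mutations
            if len({reachable_fates(model, m, schedules)
                    for model in models}) > 1]


def model_a(mutation, schedule):
    # Hypothetical model: mutation "m2" suppresses induction entirely.
    if mutation == "m2":
        return "none"
    return "primary" if schedule[0] == 0 else "secondary"


def model_b(mutation, schedule):
    # Hypothetical alternative: insensitive to both mutations.
    return "primary" if schedule[0] == 0 else "secondary"


schedules = [(0, 1), (1, 0)]
print(distinguishing_experiments([model_a, model_b], ["m1", "m2"], schedules))
```

An empty result over all untested mutations corresponds to the "complete" verdict on the slide, restricted here to the candidate models that were listed.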
Results (3)
No behaviorally distinct models. But we synthesized a model that differs internally.
– cell behavior due to a different protein interaction
These models can’t be distinguished via mutation and fate observation (models have same fates, after all).
Hence one must “instrument” the cell by tagging proteins with fluorescent genes.
Here, our synthesis identifies which genes to instrument (the fewer the better).
Summary: Executable biology’s challenges
Infer models that can replay all observed behavior
– … or else they don’t faithfully model cell phenomena.
– This semantics leads to a 3QBF synthesis problem.
Analyze the space of plausible models
– Are specs ambiguous, minimal?
– Which experiments to perform to rule out a model?