Domain-specific Languages for Cellular Interactions
Bill Harrison, Department of Computer Science, University of Missouri at Columbia
This work was partially supported by NIH 1 R01 GM62920-04A1, NIH 1 P20 GM065762-01A1, the Georgia Research Alliance, and the Georgia Cancer Coalition.
• Ph.D. 2001, UIUC
  Thesis: Modular Compilers and Their Correctness Proofs
  Thesis Advisor: Sam Kamin
• Post-doc, Oregon Graduate Inst. (OGI)
  Three years on the Programatica Project, using the Haskell programming language as a basis for formal methods
• Assistant Professor, University of Missouri-Columbia, since Fall 2003
Systems Biology asks…
• Can static biological structure be related to dynamic biological behavior with mathematical clarity, precision, & rigor?
• Can biological systems be viewed as the “sum of their parts”?
• Can component-level models be integrated into precise system-level models of biological behavior?
• What techniques from Mathematics and Computer Science apply to this composition problem?
Rhodobacter Sphaeroides
• Photosynthetic bacterium; seeks out regions of greater light
• Roughly the size of the wavelength of light: cannot sense local light differences directly
• Applies a random walk
Simulations of Biological Systems
Simulations provide qualitative feedback, but are not models per se:
• How accurate/faithful is a simulation?
• What does the feedback mean?
• Can one reason about the biological phenomenon based on the simulation?
• Can you identify the biology by inspecting the text of the simulation program?
R. Sphaeroides in C++
• Contains 1000 LOC
• To understand, requires expertise in C++ …and the biological model …and critical system details, e.g., how is concurrency implemented?

bool global_state::register_state(void *apointer){
  if( number_of_states == mother_of_all_states.size())
    mother_of_all_states.resize(number_of_states + 1000);
  mother_of_all_states[number_of_states++] = apointer;
  return true;
}
R. Sphaeroides in C++
• Program structure does not reflect the biological model: can you look at the source code and recognize the underlying biology?
• Difficult to comprehend …and write correctly …and modify …and maintain …and re-use
Systems Biology as Programming Language Design
• The Problem: General-purpose programming languages do not have the “right vocabulary”
  • Biological model: Concurrent Markov chains
  • C++: classes, pointers, etc.
  • …nor are they mathematics
• Our Solution: Design small, special-purpose languages with exactly the right vocabulary
  • called a Domain-specific Language (DSL) [Sheard99,Thiemann01,Leijen01]
  • Mathematical semantics of DSLs gives a formal model of the biology
Language Model of R. Sphaeroides
Executing: cell1 || … || celln
Produces: [animation]
Outline
• Language Design and Domain-specific Languages: design, definition, and implementation
• Systems Biology as Language Design: Case Study for Rhodobacter Sphaeroides
  • Design: what are the appropriate abstractions for R. Sphaeroides?
  • Definition: how do we specify exactly what R. Sphaeroides programs mean?
  • Implementation: how do we run R. Sphaeroides programs?
• Conclusions
Cardinal Rule of Language Design
Application programmers should choose languages with abstractions most suited to their task; language designers must provide languages with those abstractions…

Domain                    Central Activities       Reasonable Language
System Programming        “bit-fiddling”           C
Artificial Intelligence   List processing          LISP
System Admin.             Text processing, etc.    PERL
DSLs are small languages w/ “domain abstractions”
Ex: “Parsec” Parser DSL
BNF for language:
  <Stmt> → <ident> := <Expr>
translates directly to Parsec code:
  assignStmt :: Parser Stmt
  assignStmt = do { id <- ident
                  ; symbol ":="
                  ; s <- expr
                  ; return (Assign id s) }
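To make the "BNF translates directly" point concrete, here is a minimal runnable sketch of the same assignment rule. It swaps Parsec for ReadP (Text.ParserCombinators.ReadP, which ships with GHC's base library) so that it runs without extra packages. The Expr and Stmt types, the helper parsers ident and expr, and parseStmt are assumptions made for illustration; they are not taken from the talk.

```haskell
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical syntax tree for the one-rule grammar <Stmt> -> <ident> := <Expr>.
data Expr = Num Int | Var String deriving (Eq, Show)
data Stmt = Assign String Expr deriving (Eq, Show)

-- An identifier: one or more letters, after optional whitespace.
ident :: ReadP String
ident = skipSpaces >> munch1 isAlpha

-- An expression: a non-negative integer literal or a variable name.
expr :: ReadP Expr
expr = skipSpaces >> ((Num . read <$> munch1 isDigit) +++ (Var <$> munch1 isAlpha))

-- Mirrors the BNF rule one symbol at a time.
assignStmt :: ReadP Stmt
assignStmt = do
  name <- ident
  skipSpaces
  _ <- string ":="
  e <- expr
  return (Assign name e)

-- Succeeds only if the whole input is a single assignment statement.
parseStmt :: String -> Maybe Stmt
parseStmt s =
  case [ r | (r, "") <- readP_to_S (assignStmt <* skipSpaces <* eof) s ] of
    (r : _) -> Just r
    []      -> Nothing
```

Loaded into GHCi, parseStmt "x := 42" yields an Assign node, while input that does not match the rule yields Nothing.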
“Why a language and not a library?”
The Slogan: “What is excluded from a DSL is as important as what is included in it”
• Libraries in a general-purpose language still require considerable expertise & self-discipline on the part of the programmer
• Lack of generality in a DSL means fewer things to “go wrong”
• A DSL may have desirable properties that a general-purpose language will not
  • e.g., implementation techniques specialized to the DSL that do not apply to general-purpose languages
  • small size makes rigorous specification tractable
DSL Design
DSL design for R. Sphaeroides: what are our domain abstractions?
• How does this organism behave?
• What modeling techniques are used by biologists to describe this behavior?
Bacterial Commands
adjustspeed, grow*, divide, tumble, die, laze
*Probability of growth varies with light concentration
Chapman-Kolmogorov Equation*
[equation figure; its annotations:]
• Pi,j: probability of transition from i to j
• probability of being in state m
Row i of the transition matrix encodes the transition function from state i of a Markov chain
*Commonly used framework for modeling biological systems [Bremaud99, Dailey02, Mao02, Shah00]
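For reference, the standard textbook form of the Chapman-Kolmogorov equation for a time-homogeneous, discrete-time Markov chain is given below. This is not necessarily the exact form shown on the slide. The symbol \pi_m(n), for the probability of being in state m at step n, is notation assumed here.

```latex
% n-step transition probabilities compose through intermediate states k
P^{(n+m)}_{i,j} \;=\; \sum_{k} P^{(n)}_{i,k}\, P^{(m)}_{k,j}

% one-step update of the state distribution
\pi_j(n+1) \;=\; \sum_{i} \pi_i(n)\, P_{i,j},
\qquad \sum_{j} P_{i,j} = 1 \ \text{for every state } i
```

The last condition says that each row of the transition matrix is itself a probability distribution over next states.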
Bacteria as Markov Chains
State i → State 0 with probability Pi,0, …, State i → State m with probability Pi,m
• Non-deterministic state machines with probabilistic transitions induced by the Chapman-Kolmogorov equation
• Pi,j expressed in terms of environmental factors, organism state, etc.
• Executing concurrently
Domain Abstractions for R. Sphaeroides
• Individual cells: Markov-chain abstraction
    choose P1 Action1
           …
           Pn Actionn
• Actions: Tumble, Divide, AdjSpeed, Laze, Grow, etc.
• Concurrency: cell1 || cell2
• Environmental Factors: light, size
Abstract syntax for CellSys
• choose is our principal domain abstraction; it behaves like the Markov chain transition function
• Cell-level environment variables: light, size
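The slide's abstract syntax did not survive in this transcript, so the following is only a guess at its shape, written as a Haskell data type with a one-step interpreter for choose. All type and function names (Action, Env, Cell, CellSys, step, example) are hypothetical. The uniform sample u is passed in explicitly so that the sketch stays pure and deterministic.

```haskell
-- Hypothetical abstract syntax; constructor names follow the slides' action list.
data Action = Tumble | Divide | AdjSpeed | Laze | Grow | Die
  deriving (Eq, Show)

-- Cell-level environment variables named on the slide.
data Env = Env { light :: Double, size :: Double }

-- choose: each branch pairs a probability (which may depend on the
-- environment) with an action.
newtype Cell = Choose [(Env -> Double, Action)]

-- Concurrent composition: cell1 || cell2 || ...
newtype CellSys = Par [Cell]

-- One Markov-chain transition: pick the branch whose cumulative
-- probability interval contains the sample u, where 0 <= u < 1.
step :: Env -> Double -> Cell -> Maybe Action
step env u (Choose branches) = go 0 branches
  where
    go _ [] = Nothing
    go acc ((p, a) : rest)
      | u < acc + p env = Just a
      | otherwise       = go (acc + p env) rest

-- Example: the probability of growth varies with light concentration.
example :: Cell
example = Choose [ (\e -> light e, Grow), (\e -> 1 - light e, Laze) ]
```

With light at 0.3, a sample below 0.3 selects Grow and any larger sample selects Laze, which is how a row of the transition matrix would be read off.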
DSL Definition
Background: Programming languages are “collections of effects”
• Java = OO + Threads + State + …
• LISP = Higher-order Functions + …
• Prolog = Backtracking + …
Corresponding to each such effect is an algebraic construction called a monad
• used for the development of modular semantic theories of programming languages [Moggi89]
• monads may be constructed using “monad transformers”
Periodic Table of Effects

Transformer   Effect           Operations
StateT        imperative       :=
EnvT          binding          @ v
ErrorT        exceptions       raise/catch
ContT         continuations    callcc
NondetT       non-determ.      choose
ResT          threads          step pause
DebugT        debugging        rollback
BackT         backtracking     cut
ProbT         probability      random
ReactT        reactivity       send,recv,…
Periodic Table of Effects
• Programming languages are collections of effects captured as monads [Moggi]
• Monads are assembled from constructors (monad transformers)
• Our view: Systems are collections of effects captured as monads
• “Systems” broadly construed: Compilers [Harrison00,98,01,02], Secure system software [Harrison05,03], and Biology [Harrison04]
• Mathematical definitions for any language created by combining monad transformers (MTs):
    CellSys = StateT + ResT + ProbT + ReactT
• Such definitions are flexible: modular, extensible, and easily refactored
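As a rough illustration of layering effects, the sketch below stacks a hand-written state transformer on top of a small finite-distribution monad. It covers only two of the four layers named above, and none of the definitions come from the talk: Dist, this StateT, CellM, coin, growWith and probSize are all assumptions, written against the Prelude alone.

```haskell
-- Finite probability distribution: outcomes paired with their weights.
newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

instance Monad Dist where
  Dist xs >>= k = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x) ]

-- State transformer: threads a state s through an underlying monad m.
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

instance Monad m => Functor (StateT s m) where
  fmap f (StateT g) = StateT $ \s -> do
    (a, s') <- g s
    return (f a, s')

instance Monad m => Applicative (StateT s m) where
  pure a = StateT $ \s -> return (a, s)
  StateT f <*> StateT g = StateT $ \s -> do
    (h, s1) <- f s
    (a, s2) <- g s1
    return (h a, s2)

instance Monad m => Monad (StateT s m) where
  StateT g >>= k = StateT $ \s -> do
    (a, s') <- g s
    runStateT (k a) s'

-- Run an action of the underlying monad without touching the state.
lift :: Monad m => m a -> StateT s m a
lift m = StateT $ \s -> do { a <- m; return (a, s) }

-- Apply a function to the state.
modify :: Monad m => (s -> s) -> StateT s m ()
modify f = StateT $ \s -> return ((), f s)

-- The layered monad: state (here, cell size) over probability.
type CellM = StateT Int Dist

-- A biased coin: True with probability p.
coin :: Double -> Dist Bool
coin p = Dist [(True, p), (False, 1 - p)]

-- With probability p grow by one unit; otherwise do nothing.
growWith :: Double -> CellM ()
growWith p = do
  grew <- lift (coin p)
  if grew then modify (+ 1) else return ()

-- Probability that the final size equals n, starting from size 0.
probSize :: Int -> CellM a -> Double
probSize n m = sum [ p | ((_, s), p) <- runDist (runStateT m 0), s == n ]
```

Swapping Dist for another underlying monad leaves growWith's structure untouched, which is the modularity the slide points to.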
DSL definition similar to traditional RTS
In a traditional run-time system (RTS):
• Threads request services like “send a message”, “output on device”, “consume resource”
• The RTS mediates, ensuring that
  • the threads do not interfere
  • global system state remains consistent
• The RTS schedules threads
Diagram: Run-time System mediating threads …
High-level view of definition
In CellSys:
• Cells are threads with physical components as well: size, velocity, …
• Cells request services like “consume nutrients”, “move me here”, “want to divide”
• The Global Environment (GE) mediates like an RTS, and also:
  • preserves physical integrity
  • updates the global world view
  • performs scheduling
Diagram: Global Environment mediating cells …
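The request-and-mediate pattern can be sketched as a round-robin scheduler over cell threads. This is an illustrative simplification, not the talk's definition: a thread is modelled as a finite list of requests rather than a resumption, and Request, World, serve and schedule are hypothetical names.

```haskell
-- Hypothetical service requests, named after the examples on the slide.
data Request = Consume Int | MoveTo (Int, Int) | WantDivide
  deriving (Eq, Show)

-- A cell "thread" is modelled as the finite list of requests it will make,
-- one per scheduling slice (a stand-in for a resumption).
type CellId = Int
type CellThread = [Request]

-- Global world view: remaining nutrients plus a log of granted requests.
data World = World { nutrients :: Int, served :: [(CellId, Request)] }

-- Mediation: grant a request only if it keeps the world consistent.
-- A Consume request exceeding the remaining nutrients is refused.
serve :: CellId -> Request -> World -> World
serve cid req w = case req of
  Consume n
    | n > nutrients w -> w
    | otherwise       -> grant (w { nutrients = nutrients w - n })
  _ -> grant w
  where
    grant w' = w' { served = served w' ++ [(cid, req)] }

-- Round-robin scheduling: handle one request from the cell at the head of
-- the queue, then move that cell to the back; finished cells are dropped.
schedule :: World -> [(CellId, CellThread)] -> World
schedule w [] = w
schedule w ((_, []) : rest) = schedule w rest
schedule w ((cid, r : rs) : rest) =
  schedule (serve cid r w) (rest ++ [(cid, rs)])
```

For example, starting from World 3 [] with two cells that each ask to consume 2 units, only the first request is granted, so the nutrient count never goes negative.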
DSL Implementation
• Because CellSys is defined in terms of monad transformers, it may be implemented directly as a Haskell program
  • i.e., the monadic language definition may be transcribed “symbol for symbol” into Haskell
• The Haskell implementation is easily instrumented to output system “snapshots”:
  • prints out snapshots in POV (Persistence of Vision) format, which are converted into MPEG
Q: What are appropriate languages for modeling?
• Integrate techniques from programming languages
  • models of concurrency
  • language semantics, i.e., precise, mathematical language definitions
  • efficient language implementation
• …into a special-purpose language called a “Domain-Specific Language”
  • abstractions taken directly from biology
  • comprehensible by biologists
• DSLs and DSL programs
  • hide technical details irrelevant/uninteresting to biologists
  • are “tunable” by computer scientists to reflect discovery/refinement
  • execute to provide a “reality check” by biologists
Bioinformatics = Computer Science + Biology
Computer Science:
• models of concurrency
• efficient implementation
• mathematical models of programs
• reasoning about programs
Biology:
• organism structure & behavior
• modeling techniques: cellular automata, systems of PDEs, numerical techniques
Hard Problem: How do you effect a technology transfer from CS to Biology?
Interdisciplinary Process
CellSys (version 1.0) → feedback/discussion → CellSys (version 2.0)
• Biologist evaluates DSL model for accuracy, expressiveness, etc.
• Language expert refactors language as needed
Summary
Modular monadic semantics + domain-specific languages + systems biology:
• Comprehensibility, Reusability, & Ease of Use
• Precise description of biological phenomena through DSL semantics
• Large body of work providing domain abstractions & models
* Harrison & Harrison, “Domain Specific Languages for Cellular Interactions” in Proceedings of the International Conference IEEE Engineering in Medicine and Biology, 2004.