Domain-specific Languages for Cellular Interactions

Bill Harrison
Department of Computer Science
University of Missouri at Columbia

This work partially supported by: NIH1 R01 GM62920-04A1, NIH1 P20 GM065762-01A1, the Georgia Research Alliance, and the Georgia Cancer Coalition.

• Ph.D. 2001, UIUC
    • Thesis: Modular Compilers and Their Correctness Proofs
    • Thesis advisor: Sam Kamin
• Post-doc, Oregon Graduate Institute (OGI)
    • Three years on the Programatica Project, using the Haskell programming language as a basis for formal methods
• Assistant Professor, University of Missouri-Columbia, since Fall 2003

Systems Biology asks…

• Can static biological structure be related to dynamic biological behavior with mathematical clarity, precision, & rigor?
• Can biological systems be viewed as the “sum of their parts”?
    • Can component-level models be integrated into precise system-level models of biological behavior?
• What techniques from Mathematics and Computer Science apply to this composition problem?

Rhodobacter Sphaeroides

• Photosynthetic bacterium
    • seeks out regions of greater light
• Roughly the size of a wavelength of light
    • cannot sense local light differences directly
    • applies random walk

Simulations of Biological Systems

• Simulations provide qualitative feedback, but are not models per se
    • how accurate/faithful is a simulation?
    • what does the feedback mean?
    • can one reason about the biological phenomenon based on the simulation?
    • can you identify the biology by inspecting the text of the simulation program?

R. Sphaeroides in C++

• contains 1000 LOC
• to understand requires
    • expertise in C++
    • …and biological model
    • …and critical system details, e.g., how is concurrency implemented?

    bool global_state::register_state(void *apointer)
    {
        if (number_of_states == mother_of_all_states.size())
            mother_of_all_states.resize(number_of_states + 1000);
        mother_of_all_states[number_of_states++] = apointer;
        return true;
    }

R. Sphaeroides in C++

• Program structure does not reflect biological model
    • can you look at the source code and recognize the underlying biology?
• difficult to comprehend
    • …and write correctly
    • …and modify
    • …and maintain
    • …and re-use


Systems Biology as Programming Language Design

• The Problem: General-purpose programming languages do not have the “right vocabulary”
    • Biological model: concurrent Markov chains
    • C++: classes, pointers, etc.
    • …nor are they mathematics
• Our Solution: Design small, special-purpose languages with exactly the right vocabulary
    • called a Domain-specific Language (DSL) [Sheard99, Thiemann01, Leijen01]
    • Mathematical semantics of DSLs gives a formal model of biology

Language Model of R. Sphaeroides

Executing: cell1 || … || celln

Produces animation: [animation]

Outline

• Language Design and Domain-specific Languages
    • design, definition, and implementation
• Systems Biology as Language Design: Case Study for Rhodobacter Sphaeroides
    • Design: what are the appropriate abstractions for R. Sphaeroides?
    • Definition: how do we specify exactly what R. Sphaeroides programs mean?
    • Implementation: how do we run R. Sphaeroides programs?
• Conclusions

Cardinal Rule of Language Design

Application programmers should choose languages with abstractions most suited to their task; language designers must provide languages with those abstractions…

Domain                    Central Activities       Reasonable Language
System Programming        “bit-fiddling”           C
Artificial Intelligence   List processing          LISP
System Admin.             Text processing, etc.    PERL

Ex: “Parsec” Parser DSL

DSLs are small languages w/ “domain abstractions”

BNF for language:

    <Stmt> ::= <ident> := <Expr>

translates directly to Parsec code:

    assignStmt :: Parser Stmt
    assignStmt = do { id <- ident      -- parse the left-hand identifier
                    ; symbol ":="      -- consume the assignment token
                    ; s <- expr        -- parse the right-hand expression
                    ; return (Assign id s) }

“Why a language and not a library?”

• The Slogan: “What is excluded from a DSL is as important as what is included in it”
    • libraries in a general-purpose language still require considerable expertise & self-discipline on the part of the programmer
    • lack of generality in a DSL means fewer things to “go wrong”
• DSL may have desirable properties that a general-purpose language will not
    • e.g., implementation techniques specialized to DSL that do not apply to general-purpose languages
    • small size makes rigorous specification tractable

DSL Design

• DSL design for R. Sphaeroides: what are our domain abstractions?
    • How does this organism behave?
    • What modeling techniques are used by biologists to describe this behavior?

Bacterial Commands

adjustspeed, grow*, divide, tumble, die, laze

*Probability of growth varies with light concentration

Chapman-Kolmogorov Equation*

[Equation not preserved in the transcript. Annotations: Pi,j is the probability of transition from i to j; probability of being in state m]

*Commonly used framework for modeling biological systems [Bremaud99, Dailey02, Mao02, Shah00]

Chapman-Kolmogorov Equation

[Matrix not preserved in the transcript]

Row i of the transition matrix encodes the transition function from state i of a Markov chain
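Since the slide's own formula did not survive, the standard textbook statement is given here for orientation. It assumes a time-homogeneous, discrete-time Markov chain; the step counts r and s, the n-step notation, and the distribution symbol \pi are assumptions of this note, not symbols recovered from the slide.

```latex
% P^{(n)}_{i,j}: probability of moving from state i to state j in n steps
P^{(r+s)}_{i,j} = \sum_{k} P^{(r)}_{i,k} \, P^{(s)}_{k,j}

% One-step update of the state distribution, where \pi^{(n)}_{m} is the
% probability of being in state m after n steps
\pi^{(n+1)}_{m} = \sum_{i} \pi^{(n)}_{i} \, P_{i,m}
```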

Bacteria as Markov Chains

[Diagram: State i with transitions to State 0, …, State m, labeled Pi,0, …, Pi,m]

• non-deterministic state machines with probabilistic transitions induced by the Chapman-Kolmogorov equation
• Pi,j in terms of environmental factors, organism state, etc.
• executing concurrently

Domain Abstractions for R. Sphaeroides

Abstract syntax for CellSys:

• Individual cells: Markov-chain abstraction

      choose P1 Action1
             …
             Pn Actionn

    • choose is our principal domain abstraction
    • behaves like the Markov chain transition function
• Actions: Tumble, Divide, AdjSpeed, Laze, Grow, etc.
• Concurrency: cell1 || cell2
• Environmental Factors: light, size
    • cell-level environment variables
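One way to read choose is as a sampler over one row of a transition matrix. The following is a minimal Haskell sketch under that reading, not the CellSys definition: the types Env and Branch, the function choose as written here, and the probabilities in exampleCell are all hypothetical, and the random sample u is passed in explicitly to keep the code pure.

```haskell
-- Hypothetical sketch; names and probabilities are illustrative only.

-- Actions a cell may take, as listed on the slide.
data Action = Tumble | Divide | AdjSpeed | Laze | Grow
  deriving (Show, Eq)

-- Cell-level environment variables named on the slide.
data Env = Env { light :: Double, size :: Double }

-- One branch of a choose: a probability that may depend on the
-- environment, paired with the action to take.
type Branch = (Env -> Double, Action)

-- Select an action from a uniform sample u in [0,1) by walking the
-- cumulative distribution, like one row of a transition matrix.
-- Returns Nothing if u falls beyond the total probability mass.
choose :: Env -> Double -> [Branch] -> Maybe Action
choose env = go
  where
    go _ [] = Nothing
    go u ((p, act) : rest)
      | u < p env = Just act
      | otherwise = go (u - p env) rest

-- Made-up example: growth becomes more likely as light increases
-- (light is assumed to be normalized to the range 0..1).
exampleCell :: [Branch]
exampleCell =
  [ (\e -> 0.5 * light e,       Grow)
  , (\e -> 0.5 * (1 - light e), Tumble)
  , (const 0.5,                 Laze)
  ]
```

In full light, a sample of 0.25 selects Grow; in darkness the same sample selects Tumble, because the Grow branch then carries zero probability.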

DSL Definition

• Background: Programming languages are “collections of effects”
    • Java = OO + Threads + State + …
    • LISP = Higher-order Functions + …
    • Prolog = Backtracking + …
• Corresponding to each such effect is an algebraic construction called a monad
    • used for the development of modular semantic theories of programming languages [Moggi89]
    • monads may be constructed using “monad transformers”
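To make “monad transformer” concrete, here is a minimal sketch of a state transformer written out by hand with only the Haskell Prelude. It has the same shape as the StateT found in the standard transformers package, but it is an illustration, not the construction used in CellSys; safeDecrement is a hypothetical example of layering state over failure.

```haskell
-- A hand-rolled state monad transformer: it adds a mutable state of
-- type s on top of an arbitrary underlying monad m.
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

instance Functor m => Functor (StateT s m) where
  fmap f (StateT g) = StateT $ \s -> fmap (\(a, s') -> (f a, s')) (g s)

instance Monad m => Applicative (StateT s m) where
  pure a = StateT $ \s -> return (a, s)
  StateT mf <*> StateT mx = StateT $ \s -> do
    (f, s')  <- mf s
    (x, s'') <- mx s'
    return (f x, s'')

instance Monad m => Monad (StateT s m) where
  StateT g >>= k = StateT $ \s -> do
    (a, s') <- g s
    runStateT (k a) s'

-- The operations the state effect contributes.
get :: Monad m => StateT s m s
get = StateT $ \s -> return (s, s)

put :: Monad m => s -> StateT s m ()
put s = StateT $ \_ -> return ((), s)

-- Lift a computation from the underlying monad into the layered one.
lift :: Monad m => m a -> StateT s m a
lift m = StateT $ \s -> m >>= \a -> return (a, s)

-- Hypothetical example: state layered over Maybe, so a computation can
-- both update a counter and fail (a stand-in for an exception effect).
safeDecrement :: StateT Int Maybe ()
safeDecrement = do
  n <- get
  if n <= 0 then lift Nothing else put (n - 1)
```

In GHCi, runStateT safeDecrement 2 evaluates to Just ((),1), while runStateT safeDecrement 0 evaluates to Nothing: the two effects combine without either being written with the other in mind.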

Periodic Table of Effects

Transformer   Effect           Operations
StateT        imperative       :=
EnvT          binding          @ v
ErrorT        exceptions       raise/catch
ContT         continuations    callcc
NondetT       non-determ.      choose
ResT          threads          step, pause
DebugT        debugging        rollback
BackT         backtracking     cut
ProbT         probability      random
ReactT        reactivity       send, recv, …


Periodic Table of Effects

• Programming languages are collections of effects captured as monads [Moggi]
    • Monads assembled from constructors (monad transformers)
• Our view: Systems are collections of effects captured as monads
    • “Systems” broadly construed: Compilers [Harrison00,98,01,02], Secure system software [Harrison05,03], and Biology [Harrison04]


• Mathematical definitions for any language created by combining monad transformers (MTs)
    • CellSys = StateT + ResT + ProbT + ReactT
• Such definitions are flexible: modular, extensible, and easily refactored

DSL definition similar to traditional RTS

• In a traditional run-time system (RTS), threads request services like
    • “send a message”
    • “output on device”
    • “consume resource”
• RTS mediates
    • ensuring that the threads do not interfere
    • global system state remains consistent
    • schedules threads

[Diagram: threads, Run-time System]

High-level view of definition

• In CellSys
    • Cells are threads with physical components as well: size, velocity, …
    • cells request services like “consume nutrients”, “move me here”, “want to divide”
• The Global Environment (GE) mediates like an RTS, and also
    • preserves physical integrity
    • updates global world view
    • performs scheduling

[Diagram: cells, Global Environment]
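The request-and-mediate picture can be sketched in Haskell as cells that yield requests to a scheduler. This is a deliberately simplified illustration, not the CellSys semantics: the Request constructors, the World record, and the round-robin policy are assumptions, and (unlike a full resumption-based design) the environment here does not send a response back to the cell.

```haskell
-- Services a cell may request (illustrative, loosely after the slide).
data Request = Consume Double | MoveTo Double Double | WantDivide
  deriving (Show, Eq)

-- A cell is a resumption: either finished, or a request paired with
-- the rest of the cell's behaviour.
data Cell = Done | Step Request Cell

-- Global state the environment keeps consistent: remaining nutrients
-- and a log of granted requests, tagged with a cell identifier.
data World = World { nutrients :: Double, granted :: [(Int, Request)] }
  deriving (Show, Eq)

-- The environment decides whether a request may proceed; here a cell
-- may only consume nutrients that are still available.
serve :: Int -> Request -> World -> World
serve cid req w = case req of
  Consume x
    | x <= nutrients w -> grant w { nutrients = nutrients w - x }
    | otherwise        -> w
  _ -> grant w
  where
    grant w' = w' { granted = granted w' ++ [(cid, req)] }

-- Round-robin scheduler: serve one step of the first cell, then move
-- that cell to the back of the queue.
schedule :: [(Int, Cell)] -> World -> World
schedule [] w = w
schedule ((_, Done) : rest) w = schedule rest w
schedule ((cid, Step req k) : rest) w =
  schedule (rest ++ [(cid, k)]) (serve cid req w)
```

Because every request passes through serve, no cell can drive the nutrient level negative, which is the kind of global consistency the slide attributes to the mediator.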

DSL Implementation

• Because CellSys is defined in terms of monad transformers, it may be implemented directly as a Haskell program
    • i.e., the monadic language definition may be transcribed “symbol for symbol” into Haskell
• Haskell implementation easily instrumented to output system “snapshots”
    • prints out snapshots in POV (Persistence of Vision) format, which are converted into MPEG
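As a rough idea of what snapshot output could look like, the sketch below renders cells as POV-Ray sphere primitives. The CellState record and both function names are hypothetical (the slides do not show the actual snapshot code), and a complete POV-Ray scene would also need a camera and a light source.

```haskell
-- Hypothetical cell record; the real CellSys state is not shown in
-- the slides.
data CellState = CellState
  { posX   :: Double
  , posY   :: Double
  , radius :: Double
  }

-- Render one cell as a POV-Ray sphere: center vector, radius, and a
-- fixed green pigment.
cellToPov :: CellState -> String
cellToPov c =
  "sphere { <" ++ show (posX c) ++ ", " ++ show (posY c) ++ ", 0>, "
    ++ show (radius c) ++ " pigment { color rgb <0, 1, 0> } }"

-- A snapshot is one sphere per cell, one per line.
snapshot :: [CellState] -> String
snapshot = unlines . map cellToPov
```

Writing the result of snapshot to a numbered file at each scheduling step would give one frame per step, which an external tool could then render and assemble into a movie.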

Q: What are appropriate languages for modeling?

• Integrate techniques from programming languages
    • models of concurrency
    • language semantics, i.e., precise, mathematical language definitions
    • efficient language implementation
• …into a special-purpose language called a “Domain-Specific Language”
    • abstractions taken directly from biology
    • comprehensible by biologists
• DSLs and DSL programs
    • hide technical details irrelevant/uninteresting to biologists
    • are “tunable” by computer scientist to reflect discovery/refinement
    • execute to provide “reality check” by biologists

Bioinformatics = Computer Science + Biology

• Computer Science
    • models of concurrency
    • efficient implementation
    • mathematical models of programs
    • reasoning about programs
• Biology
    • organism structure & behavior
    • modeling techniques: cellular automata, systems of PDE’s, numerical techniques

Hard Problem: How do you effect a technology transfer from CS to Biology?

Interdisciplinary Process

[Diagram: CellSys (version 1.0), feedback/discussion, CellSys (version 2.0)]

• Biologist evaluates DSL model for accuracy, expressiveness, etc.
• Language expert refactors language as needed

Summary

[Diagram relating three areas: modular monadic semantics, domain-specific languages, systems biology]

• Comprehensibility, Reusability, & Ease of Use
• Precise description of biological phenomena through DSL semantics
• Large body of work providing domain abstractions & models

* Harrison & Harrison, “Domain Specific Languages for Cellular Interactions,” in Proceedings of the International Conference IEEE Engineering in Medicine and Biology, 2004.
