uml as a cell and biochemistry modeling languagepromo152.free.fr › cle_cube_152 › clé_cube_152...

20
BioSystems 80 (2005) 283–302 UML as a cell and biochemistry modeling language Ken Webb a , Tony White b,a Symbium Corporation, Canada b School of Computer Science, Carleton University, 1125, Colonel By Drive, Ottawa, ON, Canada K1S 5B6 Received 21 November 2003; received in revised form 6 December 2004; accepted 26 December 2004 Abstract The systems biology community is building increasingly complex models and simulations of cells and other biological entities, and are beginning to look at alternatives to traditional representations such as those provided by ordinary differential equations (ODE). The lessons learned over the years by the software development community in designing and building increasingly complex telecommunication and other commercial real-time reactive systems, can be advantageously applied to the problems of modeling in the biology domain. Making use of the object-oriented (OO) paradigm, the unified modeling language (UML) and Real-Time Object-Oriented Modeling (ROOM) visual formalisms, and the Rational Rose RealTime (RRT) visual modeling tool, we describe a multi-step process we have used to construct top–down models of cells and cell aggregates. The simple example model described in this paper includes membranes with lipid bilayers, multiple compartments including a variable number of mitochondria, substrate molecules, enzymes with reaction rules, and metabolic pathways. We demonstrate the relevance of abstraction, reuse, objects, classes, component and inheritance hierarchies, multiplicity, visual modeling, and other current software development best practices. We show how it is possible to start with a direct diagrammatic representation of a biological structure such as a cell, using terminology familiar to biologists, and by following a process of gradually adding more and more detail, arrive at a system with structure and behavior of arbitrary complexity that can run and be observed on a computer. We discuss our CellAK (Cell Assembly Kit) approach in terms of features found in SBML, CellML, E-CELL, Gepasi, Jarnac, StochSim, Virtual Cell, and membrane computing systems. © 2005 Elsevier Ireland Ltd. All rights reserved. Keywords: Agent-based modeling; UML; Cell simulation Corresponding author. Tel.: +1 613 520 2600x2208; fax: +1 613 520 4334. E-mail addresses: [email protected] (K. Webb), [email protected] (T. White). 1. Introduction Researchers in bioinformatics and systems biology are increasingly using computer models and simula- tion to understand complex inter- and intra-cellular processes. The principles of object-oriented (OO) analysis, design, and implementation, as standardized 0303-2647/$ – see front matter © 2005 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.biosystems.2004.12.003

Upload: others

Post on 05-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • BioSystems 80 (2005) 283–302

    UML as a cell and biochemistry modeling language

    Ken Webba, Tony Whiteb,∗

    a Symbium Corporation, Canadab School of Computer Science, Carleton University, 1125, Colonel By Drive, Ottawa, ON, Canada K1S 5B6

    Received 21 November 2003; received in revised form 6 December 2004; accepted 26 December 2004

    Abstract

    The systems biology community is building increasingly complex models and simulations of cells and other biological entities,and are beginning to look at alternatives to traditional representations such as those provided by ordinary differential equations(ODE). The lessons learned over the years by the software development community in designing and building increasinglycomplex telecommunication and other commercial real-time reactive systems, can be advantageously applied to the problems ofmodeling in the biology domain. Making use of the object-oriented (OO) paradigm, the unified modeling language (UML) andReal-Time Object-Oriented Modeling (ROOM) visual formalisms, and the Rational Rose RealTime (RRT) visual modeling tool,we describe a multi-step process we have used to construct top–down models of cells and cell aggregates. The simple examplemodel described in this paper includes membranes with lipid bilayers, multiple compartments including a variable numberof mitochondria, substrate molecules, enzymes with reaction rules, and metabolic pathways. We demonstrate the relevanceof abstraction, reuse, objects, classes, component and inheritance hierarchies, multiplicity, visual modeling, and other currents biologicals nd mored puter. Wed arnac,S©

    K

    f

    a

    logyula-larO)

    oftware development best practices. We show how it is possible to start with a direct diagrammatic representation of atructure such as a cell, using terminology familiar to biologists, and by following a process of gradually adding more aetail, arrive at a system with structure and behavior of arbitrary complexity that can run and be observed on a comiscuss our CellAK (Cell Assembly Kit) approach in terms of features found in SBML, CellML, E-CELL, Gepasi, JtochSim, Virtual Cell, and membrane computing systems.2005 Elsevier Ireland Ltd. All rights reserved.

    eywords:Agent-based modeling; UML; Cell simulation

    ∗ Corresponding author. Tel.: +1 613 520 2600x2208;ax: +1 613 520 4334.

    1. Introduction

    Researchers in bioinformatics and systems bioare increasingly using computer models and simtion to understand complex inter- and intra-celluprocesses. The principles of object-oriented (O

    E-mail addresses:[email protected] (K. Webb),

    [email protected] (T. White). analysis, design, and implementation, as standardized

    0303-2647/$ – see front matter © 2005 Elsevier Ireland Ltd. All rights reserved.doi:10.1016/j.biosystems.2004.12.003

  • 284 K. Webb, T. White / BioSystems 80 (2005) 283–302

    in the unified modeling language (UML), can be di-rectly applied to top–down modeling and simulation ofcells and other biological entities. This paper describesthe process of how an abstracted cell, consisting ofmembrane-bounded compartments with chemicalreactions and internal organelles, can be modeledusing tools such as Rational Rose RealTime (RRT), aUML-based software development tool. The resultingapproach, embodied in CellAK (for Cell AssemblyKit), produces models that are similar in structure andfunctionality to those that can be specified using thesystems biology markup language (SBML) (Hucka etal., 2003a, 2003b), and CellML (Hedley et al., 2001),and implemented using E-CELL (Tomita et al., 1999),Gepasi (Mendes, 1993; Mendes, 1997), Jarnac (Sauro,2000), StochSim (Morton-Firth and Bray, 1998), Vir-tual Cell (Schaff et al., 2000; Loew and Schaff, 2001;Slepchenko et al., 2002), and other tools currentlyavailable to the biology community. We claim that thisapproach offers greater potential modeling flexibilityand power because of its use of OO, UML, ROOM,and RRT. The OO paradigm, UML methodology, andRRT tool, together represent an accumulation of bestpractices of the software development community,a community constantly expected to build more andmore complex systems, a level of complexity that isstarting to approach that of systems found in biology.

    Membrane systems or P systems (Păun, 2000)are, by contrast with the systems mentioned above, abiologically-inspired approach to mathematical com-p utiono ,1 anyp byt dedr therm tiona ans-f iseto ctsb utingu anda m-b ions,a ,2

    usp een

    structure and behavior. This paper deals mainly withthe top–down (analytical) structure of membranes,compartments, small molecules, and the relationshipsbetween these, but also shows how bottom-up (syn-thetic) behavior of active objects such as enzymes,transport proteins, and lipid bilayers, is incorpo-rated into this structure to produce an executableprogram.

    We do not use differential equations to determinethe time evolution of cellular behavior, as is the casewith most of the cell modeling systems described in thispaper. Differential equations find it difficult to modeldirected or local diffusion processes and subcellularcompartmentalization (Khan et al., 2003) and they lackthe ability to deal with non-equilibrium solutions. Fur-ther, differential equation-based models are difficult toreuse when new details are added to the model. Cel-lAK more closely resembles Cellulat (Gonzalez et al.,2003) in which a collection of autonomous agents (ouractive objects — enzymes, transport proteins, lipid bi-layers) act in parallel on elements of a set of shareddata structures called blackboards (our compartmentswith small molecule data structures). The dynamics ofa CellAK model result from messages passing betweenactive objects. Agent-based modeling of cells is becom-ing an area of increasing research interest (Gonzalez etal., 2003; Khan et al., 2003) owing in no small mea-sure to the desire to understand cellular processes at anincreasing level of detail.

    This paper describes a process that starts with thei n-s rad-u ablep Thisr anyc ns-f h asc an-i therm tech-n e oft l forr

    ws.S bydU ion2 ind

    utation. Membrane systems represent an evolf the Chemical Abstract Machine (Benatŕe et al.988). In membrane systems, which have marallels to CellAK, computation is performed

    ransformation rules within each membrane-bounegion. Membrane-bounded regions can contain oembrane-bounded regions, with dynamic creand destruction of regions being possible. The tr

    ormation rules either act on and evolve a multf local numerically-valued objects, or move objeetween regions. Researchers in membrane compse biologically-inspired approaches such as portsntiports (single- and bi-directional coupled merane channels) to communicate between regnd thereby derive distributed computation (Ciobanu003).

    All of the approaches listed in the previoaragraph make a fundamental distinction betw

    dentification of biological entities and their relatiohips with each other, progresses through the gal addition of details, and ends with an executrogram that simulates biochemical pathways.elatively simple process can be used to modelhemical-like system that involves active objects traorming and moving passive small molecules, sucells, a circulatory system, neural circuits, or orgsms. We believe this process to be superior to o

    odeling approaches owing to its use of standardiques from software engineering, the visual natur

    he modeling process and the significant potentiaeuse of the model components.

    The remainder of the paper is organized as folloection 2 introduces object-oriented conceptsiscussing a eukaryotic cell. Section3 introducesML, formalizing the example introduced in Sect. Section4 describes the principal concepts beh

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 285

    the real-time object-oriented methodology (ROOM).Having described ROOM, Section5 provides detailsof a process used in CellAK for cell modeling. Section6 provides an extended discussion of the modelcreated, contrasting it with prior art. In Section7,future work is described and the paper concludes withkey messages in Section8.

    2. Object-oriented (OO) paradigm

    The process of software development has evolvedconsiderably in the last 20 years. The current moregenerally accepted paradigm for commercial softwaredevelopment is called the object-oriented approach,which includes OO analysis and design methodologies,and OO programming languages such as Java and C++.OO replaced the earlier imperative paradigm in whicha computer program was thought of as a system of pro-cedures calling other procedures.

    An OO object is a software entity that encapsulatesor hides its own internal details or attributes. The scopeof an internal attribute is such that it is only knownwithin the object rather than at the global level as wastypically the case with the older paradigm. For exam-ple, an Organelle object should not allow any otherobject in the system, such as EukaryoticCell or otherinstances of Organelle, to directly manipulate its pri-vate internal structure. Objects are a way of breakingthe system into a manageable set of modules that canb atedi omeo , re-s

    ceso le, ac nellec timesa eusea mul-t

    turew ncef . Mi-t sseso thatc thed

    3. Unified modeling language (UML)

    Starting in the late 1980s various individuals andsoftware development communities developed theirown graphics-based methods for object-oriented anal-ysis and design. These methods became increasinglynecessary as computer systems became more and morecomplex, and it was no longer possible to simply sitdown at a computer workstation and start entering linesof code in some computer programming language. Inthe mid 1990s Grady Booch, Jim Rumbaugh, and IvarJacobson merged their slightly different approachesinto a common unified modeling language (UML). TheUML standardization process is managed by the Ob-ject Management Group (OMG) (OMG, 2003). It isstandard practice in the computer industry to presentanalysis, design, and implementation models of a sys-tem, using the UML common visual notation (Boochet al., 1998).

    Fig. 1 shows a UML class diagram that specifiesthe simple system used as an example in the previoussection on the OO paradigm. The connecting line withunfilled triangle from the Mitochondrion and Chloro-plast subclasses to Organelle is the symbol for inher-itance in UML. Any number (multiplicity of 0..*) ofMitochondrion and Chloroplast objects can be createdwithin EukaryoticCell. The connecting line with filleddiamond from Organelle to EukaryoticCell is the sym-bol for containment in UML. Functions associated withclasses are also defined; e.g. generateATP() in the Mi-t

    e developed and tested by individuals, and integr

    nto a larger system being developed by a team. Sbjects are typically contained within other objectsulting in a containment hierarchy.

    The OO concept of class allows multiple instanf the same type of object to be created. For exampell model may need many instances of the Orgalass. A class is defined once and reused as manys needed. This creates abstractions that can be rs parts of larger abstractions, and also allows for

    iplicity (multiple instances of a class).Subclasses allow developers to explicitly cap

    hat two objects have in common through inheritarom a superclass, as well as how they are differentochondrion and Chloroplast objects, two subclaf Organelle, both encapsulate a functionalityan be used by EukaryoticCell, but that differs inetails.

    d

    ochondrion class.

    Fig. 1. Simple example system.

  • 286 K. Webb, T. White / BioSystems 80 (2005) 283–302

    UML is starting to be used to a limited extent withinthe biology community. The systems biology markuplanguage (SBML) specification documents use manyUML diagrams to formalize the SBML data structures(Hucka et al., 2003a).

    There are three main advantages to using UML as a ba-sis for defining SBML data structures. First, comparedto using other notations or a programming language,the UML visual representations are generally easier tograsp by readers who are not computer scientists. Sec-ond, the visual notation is implementation-neutral: thedefined structures can be encoded in any concrete im-plementation language—not just XML, but C or Javaas well. Third, UML is a de facto industry standard thatis documented in many sources. Readers are thereforemore likely to be familiar with it than other notations.(Hucka et al., 2003a, p. 3)

    However, although biotechnology tools are beingwritten in OO programming languages such as Javaand C++, and although some tools such as Virtual Cell(NRCAM, 2003) and E-CELL (Takahashi et al., 2002,p. 68) are expressing their OO software design usingUML, the end user interfaces of these systems donotmake use of OO and UML concepts. Because cells andother biological entities naturally exhibit the principlesembodied in OO and UML, at least when viewed froma top–down perspective, it makes sense to use thesecomputer approaches when modeling cells.

    2 ain-i term d thes useg

    r ofU esed nsiver nt

    4R

    ated ML

    (Harel, 1987), and an early proponent of visual for-malisms in software analysis and design (Harel, 1988),has argued that biological cells and multi-cellular or-ganisms can be modeled as reactive systems using real-time software development tools (Harel, 2002; Kam etal., 2003). Two such commercially-available tools areI-Logix Rhapsody (I-Logix, 2003), and Rational RoseRealTime (Rational Software, 2003), the latter beingused to implement the system described in this paper.

    Reactive systems are those whose complexity stemsnot necessarily from complicated computation but fromcomplicated reactivity over time. They are most oftenhighly concurrent and time-intensive, and exhibit hy-brid behavior that is predominantly discrete in naturebut has continuous aspects as well. The structure of areactive system consists of many interacting compo-nents, in which control of the behavior of the systemis highly distributed amongst the components. Very of-ten the structure itself is dynamic, with its componentsbeing repeatedly created and destroyed during the sys-tem’s life span. (Kam, Harel et al., 2003, p. 5)

    Rational Rose RealTime (RRT) is a visual designand implementation tool for the production of telecom-munication systems, embedded software, and otherhighly-concurrent real-time systems. It combines thefeatures of UML with the real-time specific featuresand visual notation of the Real-time Object-OrientedModeling (ROOM) (Selic et al., 1994). A RRT appli-c en-v nts,i

    byd rchyo , us-i t, orc tated d tor (gen-e sys-t agesa sule.P acest Javac tated ther( . An

    As efforts go forward to develop whole-cell (Tomita,001) and other increasingly complex models cont

    ng multiple compartments, biologists will encounany of the same issues such as scalability that le

    oftware development community to develop andraphics-based formalisms such as UML.

    This paper will present examples of a numbeML diagram types and visual notations used in thiagrams. This paper does not present a compreheeview of UML, providing only sufficient informatioo clarify modeling concepts and diagrams.

    . The ROOM formalism and the Rationalose RealTime tool

    David Harel, originator of the hierarchical stiagram (statecharts) formalism used today in U

    ation’s main function is to react to events in theironment, and to internally-generated timeout even real-time.

    Software developers design software with RRTecomposing the system into an inheritance hieraf classes and a containment hierarchy of objects

    ng UML class diagrams. Each architectural objecapsule as they are called in RRT, contains a UML siagram that is visually designed and programmeeact to externally-generated incoming messagesrated within other capsules or sent from external

    ems), and to internally-generated timeouts. Messre exchanged through ports defined for each caports are instances of protocols, which are interf

    hat define sets of related messages. All C++, C, orode in the system is executed within objects’ siagrams, along transitions from one state to anowhich may be a self-transition to the same state)

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 287

    executing RRT system is therefore an organized col-lection of communicating finite state machines. TheRRT run-time scheduler guarantees correct concurrentbehavior by making sure that each transition runs allof its code to completion before any other message isprocessed.

    The RRT design tool is visual. During design, tocreate the containment structure, capsules are draggedfrom a list of available classes into other classes. Forexample, the designer may drag an instance of Nu-cleus onto the visual representation of EukaryoticCell,thus establishing a containment relationship. This nat-urally mirrors the view understood by the biologist,making the model more accessible when compared toa differential equation-based formulation. Compatibleports on different capsules are graphically connectedto allow the sending of messages. UML state diagramsare drawn to represent the behavior of each capsule.Other useful UML graphical tools include use case di-agrams, and sequence diagrams. External C++, C, orJava classes can be readily integrated into the system.

    The developer generates the executing system bymaking a selection from a menu. RRT generates allrequired code from the diagrams, and produces an ex-ecutable program. The executable can then be run andobserved using the design diagrams to dynamicallymonitor the run-time structure and behavior of the sys-tem, although additional programming code must beadded to allow capture and graphing of changing quan-tities of molecules in the system.

    ase sw ro-g lop-m riatef le ton

    itht d int aters

    gyt lingt de:s enti-t andc enta C++

    or Java, object instantiation from a class, ease of usingmultiple instances of the same class, and subclassingto capture what entities have in common and how theydiffer. Examples of capsules, protocols, ports, and thevarious diagrams and concepts mentioned in this sec-tion, will be provided in subsequent sections of thispaper.

    5. Process

    This paper will present a simple process that hasbeen used to develop cellular models. The process hasfour iterated steps. Each step has a number of sub-steps.These steps and the principles behind them have beenadapted loosely from a number of computer industrysources (Quatrani, 1998; Kruchten, 2000).

    The design methodology presented here istop–down rather than bottom-up, but as we will see therun-time dynamics can be much more bottom-up. Acell is considered as an entity that consists of variouscompartments, each containing active objects that actchemically on various types and quantities of smallmolecules. An active object is defined here as a RRTcapsule that acts in a biologically-plausible manner onsome substrate molecule or set of substrates, possiblylocated in multiple compartments. Each compartmentmay contain other compartments to any arbitrarydepth.

    Abstraction is an important principle of softwared t int gy.F es,o uter-c thatw ,1 ther.A nso ofs ion,a ys-t tiala gy)t pro-g nceo lemd k ond

    The powerful combination of the OO paradigmmbodied in the UML and ROOM visual formalismith the added flexibility of the C, C++ or Java pramming languages, bundled together in a deveent tool such as RRT, provide much that is approp

    or biological modeling. Models are more accessibon-mathematicians using this formalism.

    There are of course problems and limitations whe approach, tools and implementation describehis paper. Some of these will be mentioned in lections.

    To summarize, benefits of the CellAK methodolohat are of use in cell and other biological modehat have been identified so far in this paper incluupport for concurrency and interaction betweenies, scalability to large systems, use of inheritanceontainment to structure a system, ability to implemny type of behavior that can be implemented in C,

    evelopment. Start by identifying entities that exishe application domain, in this case cellular bioloor example, think in terms of membranes, enzymrganelles, and small molecules rather than compentric processes. These are the types of entitiesould appear in a cell biology textbook (Becker et al.996). Consider how these entities relate to each ot this point, pay minimal attention to consideratiof how these will actually be implemented in linesoftware code. Start with a high level of abstractnd gradually add detail until the final concrete s

    em is ready to be executed. UML allows an inipplication-level model (such as at the level of biolo

    o be gradually evolved into a design, and then aramming language implementation. The importaf using concepts and terminology from the probomain (e.g. biology) is emphasized in recent woromain-specific modeling (Pohjonen and Kelly, 2002).

  • 288 K. Webb, T. White / BioSystems 80 (2005) 283–302

    The four-step CellAK process described here hasbeen successfully used to develop models and execut-ing simulations of cells and of cell aggregates. A simplecell model is used to motivate the discussion for eachstep. This process could be used to develop modelsand simulations using a variety of software develop-ment languages and tools, not just those described inthis paper.

    5.1. Step 1: Identify entities, inheritance andcontainment hierarchies

    The first step can be divided into three sub-steps:Identify entities in the problem domain (biology).Identify inheritance relations between these entities.Identify containment relations between them.

    The purpose of the small example system describedhere will be to model and simulate metabolic path-ways, especially the glycolytic pathway that takes placewithin the cytoplasm, and the TCA cycle that takes

    place within the mitochondrial matrix. It should alsoinclude a nucleus to allow for the modeling of geneticpathways in which changes in the extra cellular envi-ronment can effect changes in enzyme and other proteinlevels. The model should also be extensible, to allowfor specialized types of cells.

    Fig. 2 shows a set of candidate entities organizedinto an inheritance hierarchy, drawn as a UML classdiagram using RRT. These are only candidate entitiesbecause further analysis may uncover other entities thatshould be included, or some of these may prove unnec-essary. The lines with a triangle at one end are standardUML notation for inheritance. Erythrocyte and Neu-ronCellBody are particular specializations of the moregeneric EukaryoticCell type. CellBilayer, Mitochon-drialInnerBilayer, and MitochondrialOuterBilayer arethree of potentially many different subclasses of Lipid-Bilayer. These three share certain characteristics buttypically differ in the specific lipids that constitutethem.

    d into a

    Fig. 2. Set of entities organize

    n inheritance UML class diagram.

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 289

    The figure also shows that there are four specific So-lution entities, each of which contains a mix of smallmolecules dissolved in the Solvent water. All entityclasses are subclasses of BioEntity. This will make itpossible in a later design stage for instances of eachclass to share programming code such as the abilityto display information about themselves, or the abil-ity to be scheduled at some regular interval. For nowthese are just potentialities we are setting up by makingeverything a subclass of BioEntity.

    Fig. 3 shows a different hierarchy, that of con-tainment. This UML class diagram shows that at thehighest level, a EukaryoticCell is contained within an

    ExtraCellularSolution. The EukaryoticCell in turn con-tains a CellMembrane, Cytoplasm, and a Nucleus.This reductionist top–down decomposition continuesfor several more levels. It includes the dual mem-brane structure of a Mitochondrion along with its inter-membrane space and solution and its internal matrixspace and solution. Part of the inheritance hierarchyis also shown in these figures. Each Membrane con-tains a LipidBilayer, but the specific type of bilayer(CellBilayer, MitochondrialInnerBilayer, Mitochon-drialOuterBilayer) depends on which type of mem-brane (CellMembrane, MitochondrialInnerMembrane,MitochondrialOuterMembrane) it is contained within.

    Fig. 3. The containment hierarc

    hy as a UML class diagram.

  • 290 K. Webb, T. White / BioSystems 80 (2005) 283–302

    The UML class diagram,Fig. 3has been annotatedwith text to show which entities (the four Solutionsubclasses identified in the inheritance hierarchy)will contain small molecules (SM) such as glucose,pyruvate, and the other substrates and products of themetabolic pathways that are part of this simulation.Small molecules are captured in the model as a pairof passive (non-capsule) classes (SmallMoleculescontaining multiple instances of SmallMolecule, oneinstance for each type such as glucose or pyruvate).The three entity types identified as active objects (En-zyme, PyruvateTransporter, LipidBilayer) will act onthe small molecules to create a dynamic metabolism.That dynamics will be described in Step 4.

    Fig. 4shows a set of ROOM capsule structure dia-grams that present the same information as in the UMLdiagram ofFig. 3, but laid out as a series of nested rect-angles. The software developer creates a new entity bydragging and dropping existing entities from a list in

    a browser to a space that represents the new container.For example, to create the Cytoplasm class requiresdragging instances of Cytosol, Mitochondrion, and En-zyme into the rectangle that represents Cytoplasm.

    There will typically be many enzyme types activeat the same time, so a multiplicity factor nEnzPer-Cyt (number of enzymes per cytoplasm) is declared,the specific value of which can be delayed until later.The model includes several other multiplicity factorsincluding the number of EukaryoticCell instances inExtraCellularSolution (nEukCell), the number of Mi-tochondrion instances in Cytoplasm (nMito), and thenumber of Enzyme types in Matrix (nEnzPerMatrix).Multiplicities can range from 0 to hundreds or eventhousands of instances.

    The result of Step 1 is often called a Domain Model,especially if it incorporates a large number of entitiesbelonging to one domain, in this case biology, that canlater be used to build many separate models.

    as a s

    Fig. 4. The containment hierarchy

    et of ROOM capsule structure diagrams.

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 291

    Once the architectural structure is in place, themore fine-grained structure of the small moleculescan be specified, again using UML. In CellAK, eachtype of small molecule is an instance of the Substrateclass (a C++ class rather than a RRT capsule) whichcontains a count of the number of molecules of thatmolecule type (from 0 to 1015), and also containsoperations to increase (inc(amount) ) and decrease(dec(amount) ) the count by a designated amountand to get ((get() ) and set (set(amount) ) thecurrent value of the count. A separate SmallMoleculesclass includes an array or dictionary of all the possibletypes of small molecules that may be found in aCellAK model.

    5.2. Step 2: Establish relationships betweenentities

    The second step involves several sub-steps, whichwould typically be done in parallel:

    1. Identify adjacency and other relationships betweencapsules (relationships not identified in Step 1).

    2. Identify and specify protocols (interaction types be-tween entities).

    3. Create ports (instances of protocols) on capsules.4. Connect ports using connectors.

    This step establishes the adjacency structure of thebiological and chemical entities in the system, andtheir potential for interaction. In a EukaryoticCell,C ithC nnoti enC yto-p Step1 allc m-b area her,b ond urala t toe n bea

    canb tion.F olsu ocol

    Fig. 5. Configuration and adjacency protocols.

    has two signals—ConfigSig and MRnaSig. When thesimulation starts, the Chromosome within the Nucleussends a ConfigSig message to the Cytoplasm, whichwill recursively pass this message to all of its containedcapsules. The contents of this message is a reference toa structure, specific to this cell, that defines the genome,the quantities of the various small molecules, and otherstarting conditions. When an active object such as anEnzyme receives the ConfigSig message, it determinesits type and takes on the characteristics defined in thegenome for that type. When a Solution such as Cytosolreceives the ConfigSig message, it extracts the quantityof the various molecules that it contains, for examplehow many glucose and how many pyruvate molecules.In addition to being passed as messages through ports,configuration information may also be passed in to acapsule as a parameter when it is created. This is howthe entire Mitochondrion containment hierarchy is con-figured. In this approach, Nucleus is used for a purposein the simulation that is similar to its actual role in abiological cell. The MRnaSig (messenger RNA signal)message can be used to reconfigure the system by creat-ing new Enzyme types and instances as the simulationevolves over time.

    The Adjacency protocol allows configured capsulesto exchange messages that will establish an adjacencyrelationship. Capsules representing active objects(Enzymes, PyruvateTransporter and other types of

    ellMembrane is adjacent to and interacts wytoplasm, but is not adjacent to and therefore ca

    nteract directly with Nucleus. Interactions betweellMembrane and Nucleus must occur through Clasm. In many cases the static layout defined insuggests which entities will interact, but not in

    ases. For example, within MitochondrialInnerMerane, both LipidBilayer and PyruvateTransporterdjacent and could in theory interact with each otut this will not be allowed in the simple simulatiescribed here. It is important to have a structrchitecture that will place those things adjacenach other that need to be adjacent, so they callowed to interact.

    A protocol is a specific set of messages thate exchanged between capsules to allow interacig. 5 is a RRT dialog that shows the two protocsed in the sample system. The Configuration prot

  • 292 K. Webb, T. White / BioSystems 80 (2005) 283–302

    Fig. 6. The three contained capsules within the EukaryoticCell cap-sule structure diagram. The capsules are connected through ports thatare instances of the two protocols shown inFig. 5.

    TransportProtein, LipidBilayer) that engage in chem-ical reactions (to be described in Step 4) by acting onsmall substrate molecules, will send SubstrateRequestmessages. Capsules that contain small molecules(types of Solution such as Cytosol, ExtraCellularSolu-tion, MitochondrialIntermembranesol, Matrixsol) willrespond with SubstrateLevel messages.

    Fig. 6is a RRT capsule structure diagram that showsEukaryoticCell and its three contained capsules withnamed ports and connector lines between these ports.The ports are added by dragging from the protocolsymbol in a browser window to the capsule structurediagram. The ports whose names begin withadj areinstances of the Adjacency protocol, whileconfigandconfigMare instances of the Configuration protocol.The color of the port (black or white) indicates therelative direction (in or out) of message movement.

    Fig. 7shows the final result once all protocols, portsand connectors are in place. This figure continues thestep-by-step progression that has led from identifyingbiological entities, through organizing these entitiesinto inheritance (Fig. 2) and containment (Fig. 3) hi-erarchies, creating capsule structure diagrams (Fig. 4),identifying adjacency and genetic configuration rela-tionships and specifying these as protocols (Fig. 5), to

    creating instances of these protocols as ports and con-necting the ports using connectors. The structural ar-chitecture of the system is now complete. By followingthe series of connector lines between capsules inFig. 7,you can confirm which entities are in an adjacencyrelationship with which other entities. For example, thelipidBilayer and pyruvateTransporter capsules withinMitochondrialInnerMembrane are both adjacent tothe mitochondrialIntermembranesol and matrixsolcapsules.

    5.3. Step 3: Define external and internal behaviorpatterns

    The third step adds behavior to the existing structure.

    1. Define the desired behavior of the system by spec-ifying patterns of message exchange between cap-sules.

    2. Define the detailed behavior of each capsule usingstate diagrams, the combined effect of which willproduce this desired overall pattern of message ex-change.

    Fig. 8 is a UML sequence diagram that shows theadjacency configuration processing that would be ex-pected to occur in the small system described in thispaper. Capsule instances are shown at the top of thediagram, annotated with active object (AO) or con-tainer for small molecules (SM). When it starts up,each active object sends a SubstrateRequest messageo ncyp h asa sulew ninga ache dm uestt es-s se-q tingd

    sys-t ma uc-t , andt dBi-l mallm lIn-

    ut each of its adj ports (instances of the Adjacerotocol). If a port is connected to a capsule sucSolution that contains small molecules, that capill respond with a SubstrateLevel message contaireference to its small molecule data structure. E

    nzyme (enzyme1 to enzymeN), plus cellBilayer anitochondrialOuterBilayer, sends a SubstrateReq

    o Cytosol. Cytosol responds with SubstrateLevel mages containing the reference pSM. Time in auence diagram is represented by the thin line poinownward from each capsule instance.

    The resulting reference structure for the entireem is shown inFig. 9. This is exactly the same diagrasFig. 7, with two types of configured reference str

    ures superimposed, one representing adjacencyhe other representing the influence of genes. Lipiayer and pyruvateTransporter both reference the s

    olecule data structures within both mitochondria

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 293

    Fig. 7. The complete structure of the sample model, with all capsules, ports, and connectors. This is an enhancement ofFig. 4with additionaldetails added.

  • 294 K. Webb, T. White / BioSystems 80 (2005) 283–302

    Fig. 8. Adjacency configuration as a UML sequence diagram. The diagram is split into two parts so it will fit on the page.

    termembranesol and matrixsol. Because there are mul-tiple instances of EukaryoticCell, Mitochondrion, andEnzyme, some of the references have double arrowheads to graphically represent this multiplicity. The in-stances of LipidBilayer may also reference an internalsmall molecule data structure that contains the lipidsthat they are composed of, allowing for lipid creationin the Cytoplasm, lipid transport, and disintegration tobe modeled (Webb and White, 2004).

    In the sample model, the glycolytic pathway isimplemented through the multiple enzymes withinCytoplasm, all acting concurrently on the same set ofsmall molecules within Cytosol. The TCA metabolicpathway is similarly implemented by the concurrentactions of the multiple enzymes within Matrix actingon the small molecules of the Matrixsol. Movement ofsmall molecules across membranes is implemented bythe various lipid bilayers. For example, lipidBilayerwithin MitochondrialOuterMembrane transportspyruvate from the Cytosol to the Mitochondrial-Intermembranesol, and pyruvateTransporter within

    MitochondrialInnerMembrane transports pyruvateacross this second membrane into the Matrixsol.

    Fig. 9 also shows the influence of the genes. Eachenzyme, transporter, and other protein, is configuredto reference a detailed description of its functionality.The description is contained within a table of gene dataresiding within the Chromosome object inside the Nu-cleus.

    Fig. 10shows the UML state diagram representingthe behavior of an Enzyme active object. When firstcreated, it makes the initialize transition, the line fromthe large grey circle in the upper left to the Waitingstate. As part of this transition it executes a line of C++code adj.SubstrateRequest().send(); that sends a mes-sage out its adj port. When it subsequently receives aSubstrateLevel response message through the same adjport, it stores the pSM reference that is part of that mes-sage, creates a timer so that it can be invoked at a regularinterval, and makes the transition to the Active state.

    The state diagrams for lipid bilayers and transportproteins are much the same, but include additional

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 295

    Fig. 9. Structure of entire system with references from active objects (AO) to compartments that contain small molecules (SM), and to genesthat reside at the Chromosome.

  • 296 K. Webb, T. White / BioSystems 80 (2005) 283–302

    Fig. 10. Enzyme behavior drawn as a UML state diagram.

    states because they need to connect to two smallmolecule containers, one inside and the other outside.

    5.4. Step 4: Implement detailed behavior

    The fourth step involves adding programming lan-guage code to the state diagrams. The two types ofconfiguration, adjacency and gene configuration, cometogether at runtime as each active object repeatedlytimes out at discrete intervals (the timeCourse transi-tion shown onFig. 10) to perform its simple processing.

    Enzyme reactions can take various forms. In thispaper, we consider the simplest case, in which an en-zyme irreversibly converts a single substrate moleculeinto a different product molecule. By irreversible ismeant that the enzyme cannot also convert the prod-uct into the substrate. More complex reactions in-clude combining two substrates into one resulting prod-uct, splitting a single substrate into two products, andmaking use of activators, inhibitors, and coenzymes.These more complex reaction types have been im-plemented in CellAK using the same approach as

    shown inFig. 11, but are not discussed further in thispaper.

    At present, CellAK implements only a representa-tive sample of the kinetic and other behaviors neededfor cell simulation. Users select an active object be-havior by specifying in a data file (presently automat-ically generated from a database) the types of param-eters shown on line 1 ofFig. 11. Parameters must bespecified for eachtypeof enzyme (e.g. Hexokinase),lipid bilayer (e.g. MitochondrialOuterMembrane), andtransport protein (e.g. PyruvateTransporter), and neednot be specified separately for eachinstanceof theseactive objects. In a more mature version of CellAK, amore complete range of behaviors would be available.This would generally make it unnecessary for users towrite their own C, C++ or Java code.

    In the C++ code inFig. 11, which implements ir-reversible Michaelis–Menten kinetics (Becker et al.,1996, p. 148+;Mendes, 2003) sm→ is a reference tothe SmallMolecule data structure that in this case islocated in Cytosol, whilegene → refers to a specificgene in the Chromosome. All processing by active ob-jects makes use of these two types of data, data thatthey know about because of the two types of messageexchange that occur during initial configuration.

    The gene in CellAK is encoded as a set offeatures that includes protein kinetic constants. Forexample, inFig. 11, gene →substrateV refersto V the upper limit of the rate of reaction, andgt cules itsm

    p pasii byo

    v

    Fig. 11. One of many possible Enzyme

    ene →substrateK is the Michaelis constantKmhat gives the concentration of the substrate moleat which the reaction will proceed at one-half ofaximum velocity.The Gepasi software package (Mendes, 1997)

    erforms the same processing using ODEs. Gemplements irreversible Michaelis-Menten kineticsperating on the following formula (Mendes, 2003):

    = VSKm + S

    reaction types, as implemented in C++.

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 297

    wherev is the amount of change in the quantity of thesubstrate and product, andS is the initial quantity ofsubstrate. This formula is implemented on line 4 ofFig. 11.

    The reaction rules could be as simple or as complexas needed. Because RRT can incorporate existing C,C++ or Java code, the reaction rules could make use ofexisting code from other biochemistry modeling tools.The combined action of multiple reaction rules overtime results in the two metabolic pathways that are partof the simple example model. These are the glycolyticpathway and the TCA cycle.

    In larger models using CellAK, containing 1000+cells and varying numbers of organelles, the time-Course transition shown inFig. 10has been replacedwith a simple scheduler implemented in C++. Each ac-tive object registers with the scheduler as part of itssubstrateLevel transition, and is subsequently directlyinvoked at a regular interval.

    5.5. Quantitative results

    The main focus in CellAK has been on a qualitativemodel, but this approach can also provide quantitativeresults which very closely approximate those computedusing Gepasi, a tool that does claim to produce accuratequantitative results. In addition to the practical value ofhaving CellAK generate accurate results, these resultsalso help to confirm the correctness of its design andimplementation.

    ni deli andt lites( teda rs orc t ones thef ones ucts( e-3-p ares t ofs

    I lite.O imes ffer-

    Fig. 12. Percentage difference between Gepasi and CellAK results.

    ence between the CellAK and Gepasi results is nevergreater than 0.005%.

    There is continuously more Glucose in the CellAKmodel with the passage of time than in the Gepasi ver-sion. In CellAK the cell bilayer constantly replenishesthe amount of Glucose in the cytosol by transportingit at a low rate from the extra cellular solution. Thislow rate, as currently implemented, is not sufficient tokeep the Glucose quantity constant in the cytosol. Inboth the Gepasi and CellAK results, the Glucose leveldecreases from 1,000,000 to around 900,000 (900,160Gepasi, 903,893 CellAK) after 1000 s.

    6. Discussion

    Any model or simulation of a cell must take intoaccount two separate architectures. The first is thetop–down containment structure – the membranes, thesolutions such as Cytosol, the small molecules, andthe active objects such as enzymes. The other architec-ture is the bottom-up behavior – the dynamic reactionsbetween molecules, and the rules and parameters thatdefine these reactions.

    OO, UML, ROOM and RRT make a fundamen-tal distinction between structural modeling and behav-ioral modeling of computer systems (Selic et al., 1994;Harel, 2001, p. 55). Feitelson and Treinin (2002, p.34) suggest a biological parallel when they state that“ ot tob froms heses ture( withc the

    A simplified Glycolytic Pathway model was run parallel using CellAK and Gepasi. The moncludes the 10 standard enzymes of glycolysis,he 11 standard substrate and product metaboBecker et al., p. 308). All enzymes are implemens irreversible, and there are no activators, inhibitooenzymes. Nine of the enzyme reactions converubstrate into one product. The sole exception isourth enzyme reaction (Aldolase) that convertsubstrate (fructose-1,6-biphosphate) into two proddihydroxyacetonephosphate and glyceraldehydhosphate). The units of time in both modelseconds, but more realistically should be thoughimply as discrete time steps.

    The results of this experiment are shown inFig. 12.nitially there are 1,000,000 units of each metabover the course of the simulation, during 1000 tteps, for ten out of the eleven metabolites, the di

    the particulars of many cellular structures seem ne encoded in DNA, and they are never createdcratch; rather, each cell inherits templates for ttructures from its parent cell.” Some cellular struca hierarchy of cell and organelle compartmentsollections of small molecules) must exist before

  • 298 K. Webb, T. White / BioSystems 80 (2005) 283–302

    DNA-encoded enzymes and other proteins can carryout their behavioral work. This structure represents theresult of a slow evolution over many millions of years,and it also represents the constraints that physics andchemistry enforce on the otherwise open-ended realmof possibilities. The behavior comes about through aprocess of Darwinian evolution and involves the genes.This is reflected in the need to separately specify bothstructure (capsules, compartments, components; andobjects, molecular species, small molecules) and be-havior (state machines, operations, rules, reactions) inthe startup configuration data for all the computer sys-tems mentioned in this paper.Table 1shows the ter-minology employed by each of these systems. In eachcase, behavior operates on objects (fine-grained struc-tural elements) within a structural architecture.

    Each approach mentioned in this paper has mech-anisms to deal with each of these architectures, andwith how the two come together at run-time. CellAK,with its OO, UML and RRT roots, focuses primarilyon the structural architecture, but does provide a sim-ple mechanism to simulate the time evolution of anynumber of chemical reactions occurring concurrentlywithin and between an arbitrary number of compart-ments. Tools such as Gepasi, with their roots in bio-chemistry and differential equation modeling, focusprimarily on the moment-by-moment time evolutionof the behavioral architecture, but also provide vary-ing amounts of support for aspects of the complexstructural architecture. In CellAK, reaction modelingi e ki-n ssingb are,t ne ac

    tive object to another. This makes it possible to modelvery detailed, localized phenomena in the cell, whichare extremely difficult to capture in a differential equa-tion representation. Modeling using message passingalso makes it straightforward to introduce new compo-nents to the containment hierarchy, as existing interac-tions are unaffected by the introduction. This is con-siderably harder with a differential equation represen-tation as extreme care must be taken when adding, re-moving or modifying terms in the dynamical equationsfor the system. Stated another way, CellAK separatescontainment and interaction; differential equationsdo not.

    The scope or namespace mechanism is standard insoftware development. An object (or attribute) calledpyruvate can be simultaneously used within two ormore parts of the system, and refer to a different en-tity in each case. If there are pyruvate molecules inCytosol, MitochondrialIntermembranesol, and Matrix-sol, all three can simply be called pyruvate and donot need to be distinguished by using separate namessuch as pyruvateCyt, pyruvateInt, and pyruvateMtrx.In an OO system each object is automatically scoped,has its own namespace. This is not true with manyof the biochemistry modeling tools. E-CELL, Gepasi,Jarnac and StochSim do not appear to support a scopeor namespace mechanism. All entities are globallyscoped and belong to the same global namespace. Allfour of these tools do allow for multiple compart-ments, but the molecular species within each com-p Noneo theO theirc

    TT ed in th ng concepts,i vior, a fine-graineds

    S ture (fi

    U tes, daC molec ametersS sE ncesG olitesJ ies nodV sC lesM cts

    s configured using message passing and enzymetics are coded directly into the message proceehavior. Interactions between biological elements

    herefore, directed, as messages are passed from o

    able 1he terminology used by the various computer systems describ

    ncluding a fundamental distinction between structure and behatructure

    ystem Structure (architectural) Struc

    ML, OOM/RRT Capsule structure AttribuellAK Capsule hierarchies SmallBML Compartments Specie-CELL Cell components Substaepasi Compartments Metab

    arnac Compartments Specirtual cell Regions SpecieellML Components, groups Variabembrane systems Membrane regions Obje

    -

    artment must be given separate unique names.f these tools can be said to support objects inO sense of entities that encapsulate or hide

    ontents.

    is paper differs somewhat, but they all share the same underlyind a secondary distinction between structural architecture and

    ne-grained) Behavior

    ta objects State machine behavior, operationsule objects Active object operations, gene-specified par

    Reactions, rulesReaction rulesReactions, kinetics

    es ReactionsReactions, fluxesReactions, math

    Evolution rules

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 299

    The SBML and CellML specifications are neutralon the issue of scoping and namespaces. Models canbe represented in either of these two markup languagesyntaxes, with the same molecular species name such aspyruvate appearing in more than one compartment. Thesemantics of the modeling tool that reads the SBML orCellML file may allow for separate namespaces, or itmay concatenate the species and compartment namesto derive globally unique names.

    None of the biochemistry modeling tools supportthe OO concepts of class and inheritance. There is nomechanism to define a type or class of entity suchas Mitochondrion, and then make use of it in morethan one part of the model by simply naming its type.CellML does provide a simple class-like reuse mech-anism through its ability for one file to import anotherfile. None of the modeling tools provide a concept ofsubclass, or of a multiplicity of objects of the sametype.

    CellAK, as an OO system, does make use of classes,static and dynamic instantiation of objects from classes,subclasses, and multiplicity. Once a class such as En-zyme has been designed, any number of instances ofthis class (objects) can be created within Cytoplasmand Matrix. At run-time each instance of Enzyme canbe configured to be a different type of enzyme, newenzymes can be dynamically created as conditions inthe simulation change (through molecular signaling,MRNA transcription from DNA, and ribosomal trans-lation to protein), enzymes can be dynamically de-s callyr cted,a s ofM willa ncet yto-p Mi-t toe monC de-fi sso

    in-s usert andu in as ell,f licit

    unique name, requires considerably more human ef-fort when upwardly scaling configurations in the waydescribed in the previous sentence.

    On a single-processor computer, the amount ofmemory space required to run CellAK isO(m), wherem is the number of object instances in the system. Theamount of CPU time required isO(n), wheren is thenumber of active object instances. Thus, space and timerequirements depend linearly on the numbers of ob-jects and active objects. For the above example, 10,000(1000× 10) times more computer space and time arerequired than for a configuration with only one celland one mitochondrion. Once the initial configurationphase is complete, the only activity in the system is forthe scheduler to regularly and sequentially call eachactive object to carry out its simple action of movingor transforming small molecules. For a configurationin which the number of entities remains the same, thespace and time characteristics will remain constant forthe duration of the run.

    The use of visual programming in CellAK makesthe generated models more accessible to biochemistsand biologists. Physical structures are naturally repre-sented in a containment hierarchy and interactions areexplicitly represented using ports and protocols. The al-ternative differential equation representation hides in-teractions in terms in equations, which are significantlymore obscure for the non-mathematician. With inter-actions “hidden”, it makes model reuse more difficult.Reuse is explicitly encouraged with CellAK.

    ofc roto-c le tor lug-g

    on-s aper.W ir-c ane f ast origi-n thisp

    ub-c hina seso ple-m ery-

    troyed as they age, and entities can be dynamieconfigured (connected to each other, disconnend reconnected to other entities). If 100 instanceitochondrion are defined within Cytoplasm, eachutomatically, because of its class definition, refere

    he common small molecule data structure in Clasm and its own small molecule data structure in

    ochondrialIntermembranesol. Pyruvate will moveach of these internal compartments from the comytoplasm, according to the kinetic rate constantned within the MitochondrialOuterBilayer subclaf LipidBilayer.

    A system with 1000 cells each containing 10tances of Mitochondrion is as easy for a humano set up and run (although not as easy to observenderstand) in an OO system such as CellAK asystem with 1 cell and 1 (or 0) Mitochondria. E-Cor example, in which each component has an exp

    One way to use CellAK is to develop a libraryomponents with standard interfaces using the pols described in this paper. It then becomes possibapidly develop new models and simulations by ping a selection of components together.

    A number of complex simulations have been ctructed to test the concepts presented in this pe will briefly discuss the implementation of a c

    ulatory system, of a circuit of neurons, and ofcology. All of these simulations can be thought o

    est harnesses, separate environments to test theal EukaryoticCell and the concepts discussed inaper.

    In the CirculatorySystem object, instances of a slass of EukaryoticCell called Erythrocyte move witstructure of Vein, Artery, Capillary (three subclasf BloodVessel), and various Heart objects, all imented as RRT capsules. Each of the 100 or so

  • 300 K. Webb, T. White / BioSystems 80 (2005) 283–302

    throcytes (each within its own unit of BloodPlasma)is continuously reconfigured to refer successively toa cyclical series of external spaces including Lung (acompartment containing a high level of oxygen andlow level of carbon dioxide small molecules), Diges-tiveSystem (a compartment containing high levels ofglucose), and Brain (a compartment containing low lev-els of oxygen and glucose, and high levels of carbondioxide). The reaction rules and kinetic rate constantsdefined for the CellBilayer class determine the relativeconcentrations of oxygen, carbon dioxide and glucosewithin the BloodPlasma, Lung, DigestiveSystem andBrain compartments during the time evolution of thesimulation. The entire cell structure described in thispaper, as illustrated inFig. 9, was reused without alter-ation and embedded within each of the 100 cells in theCirculatorySystem simulation object.

    This ability to embed an object created for one sim-ulation within a larger simulation is possible becauseof the mechanisms provided by OO, UML and RRT,and by the simple SubstrateRequest—SubstrateLevelprotocol (interface) that all active objects andsmall molecule compartments have in common (seeFigs. 5 and 8).

    Another test involved EukaryoticCell specializedas a NeuronCellBody, as part of a Neuron. Neuronand Synapse objects were combined to form a neu-ral circuit that could transmit neural (chemical andelectrical) messages. The resulting Brain, a separatelydeveloped DigestiveSystem containing EukaryoticCells stemw ni-m om-b ucea

    7

    ri-e ime( de-s on-p orea

    a pos-s an

    XML specification, preferably either SBML (Hucka etal., 2003b) or CellML (Hedley et al., 2001). This wouldallow model exchange with tools such as Gepasi. It isour view that the creation of an interface to theSystemBiology Workbenchwould also be beneficial.

    CellAK, as described in this paper, is a top–down ap-proach, in keeping with the process normally followedwhen using the OO paradigm, the UML and ROOM vi-sual formalisms, and the RRT toolset. CellAK configu-rations could instead be implemented in a more genericway, to allow them to be manipulated using bottom-upevolutionary mechanisms. This becomes possible if theapproach described in this paper is reinterpreted as thefollowing alternative four steps:

    Step 1 Manipulate the containment hierarchy by rep-resenting it explicitly as a tree, and thenoperating on it using evolutionary mecha-nisms (Koza et al., 2001). Although the RRTcontainment hierarchy is already tree-like, itcannot be dynamically manipulated at run-time. An inheritance hierarchy could option-ally also be implemented to provide names fornode types, and to help constrain the range ofconfiguration and runtime behaviors availablefor each node type.

    Step 2 Create additional lateral connections betweennodes in the tree. Replace the RRT mechanismof static ports, protocols and connections with

    toat-des.ey

    S tiveh as

    ngomese

    beo sed.W lly,r Thes llowi ons

    ubclasses called MucosalCell, and CirculatorySyere combined into a HumanBeing (subclass of Aal) simulation. HumanBeing was subsequently cined with Plant, Atmosphere, Lake and Sun to prodsimulation of a simple ecology.

    . Future work

    CellAK is currently implemented using a proptary commercial toolset, Rational Rose RealTRRT). The architectural approach and processcribed in this paper should be ported to a nroprietary environment, which would make it mccessible to a larger community.

    Specific configurations, such as that shown inFig. 9,re currently hardcoded using RRT. It should beible to read in and write out a configuration using

    a set of algorithms that nodes could followdynamically configure themselves by naviging the tree structure to locate adjacent noThese would be the nodes with which thwould subsequently dynamically interact.

    teps 3 and 4 Provide primitives from which acobjects could evolve simple behaviors sucthe enzyme behavior shown inFig. 11. Allowhigher-level patterns of interaction includimetabolic and other networks to emerge frlocal interactions, rather than having thprespecified.

    In addition, several practical limitations need tovercome. First, visualization needs to be addreshile the creation of the model is undertaken visua

    esults of the running model are not easy to view.ystem requires the introduction of probes that andividual attributes to be viewed; e.g. concentrati

  • K. Webb, T. White / BioSystems 80 (2005) 283–302 301

    of particular substrates. Visualization should supportthe display of the time evolution of probes in multipleformats such as gauges, line and bar charts. It shouldalso be possible for a user to interact with the modelin order to modify values and, potentially, the modelitself. On a more practical note, the export of simulateddata should be enhanced.

    Second, parameters used to characterize the modelare sometimes hard to obtain; agent-based models aretypically characterized by more variables than their dif-ferential equation counterparts. We are interested inusing techniques from evolutionary computing (EC)to have these parameters generated automatically. Itwould be our intention to have the EC algorithm runthe model multiple times with various parameter set-tings to optimize the validation of the model and toinvestigate its sensitivity to particular parameters.

    Finally, the CellAK system lacks the flexibility in-herent in the mathematical framework provided byp-systems. The specification of reactions is still largelyprocedural, not declarative. This does create a burdenon the user to have some level of understanding of pro-gramming. It would be our intention to incorporate amechanism similar top-systems in a future release ofthe CellAK framework.

    8. Conclusion

    Simulation tools for non-mathematicians using vi-s er toi el-l ullr la-t od-e hers turei s.I y inu OOp thes ouslya Ah sfult thec -upa strym ten-

    tative four-step process for the development of suchhybrid systems.

    Acknowledgements

    Thanks to the three external reviewers for their use-ful comments.

    References

    Becker, W., Reece, J., Poenie, M., 1996. The World of the Cell, thirded. Benjamin/Cummings, Menlo Park, CA.

    Benatŕe, J.-P., Coutant, A., Le Metayer, D., 1988. A parallel machinefor multiset transformation and its programming style. FutyreGeneration Computer Systems 4, North-Holland, pp. 133–144.

    Booch, G., Rumbaugh, J., Jacobson, I., 1998. The Unified ModelingLanguage User Guide. Addison-Wesley, Reading, MA.

    Ciobanu, G., 2003. Distributed algorithms over communicatingmembrane systems. BioSystems 70, 123–133.

    Feitelson, D., Treinin, M., July 2002. The blueprint for life? Com-puter, 34–40.

    Gonzalez, P., Cardenas, M., Camacho, D., Franyuti, A., Rosas, O.,Lagunez-Otero, J., 2003. Cellulat: an agent-based intracellularsignalling model. BioSystems 68, 171–185.

    Harel, D., 1987. Statecharts: a visual formalism for complex systems.Science of Computer Programming 8, 231–274.

    Harel, D., 1988. On visual formalisms. Communications of the ACM31, 514–530.

    Harel, D., January 2001. From play-in scenarios to code: an achiev-able dream. IEEE Computer, 53–60.

    Harel, D., 2002. A Grand Challenge for Computing: Full Re-active Modeling of a Multi-Cellular Animal. In: Workshop

    urgh,.

    H ort9,

    H iol-ities

    H uageem-

    I

    K E.,gansgs ofnal20.

    K entrks.

    ual modeling and programming are needed in ordncrease the accessibility of modeling in biology. CAK is a step towards the realization of Harel’s “feactive modeling” and Tomita’s “whole cell simuion”. This paper has suggested that biochemical mling and simulation tools will need to provide a ricet of mechanisms for top–down structural architecf they truly wish to construct whole-cell simulationt is our hypothesis that the approaches currentlse by the software development community, thearadigm, the UML methodological formalism, andpecifics of tools such as RRT, can be advantagepplied to the problem of whole-cell simulation.ybrid approach would combine the most succes

    op–down architectural mechanisms used withinommercial software industry, with proven bottomlgorithmic mechanisms provided by the biochemiodeling community. This paper has proposed a

    on Grand Challenges for Computing Research, EdinbScotland, November 2002.http://www.wisdom.weizmannac.il/∼dharel/papers/GrandChallenge.doc.

    edley, W., Nelson, M., Bullivant, D., Nielsen, P., 2001. A shintroduction to CellML. Phil. Trans. R. Soc. Lond. A 351073–1089.

    ucka, M., Finney, A., Sauro, H., Bolouri, H., 2003a. Systems Bogy Markup Language (SBML) Level 1: structures and facilfor basic model definitions.http://www.cds.caltech.edu/erato.

    ucka, M., et al., 2003b. The systems biology markup lang(SBML): a medium for representation and exchange of biochical network models. Bioinformatics 19, 524–531.

    -Logix, 2003. I-Logix Rhapsody and Statemate.http://www.ilogix.com.

    am, N., Harel, D., Kugler, H., Marelly, R., Pnueli, A., Hubbard,Stern, M., February 24–26, 2003. Formal Modeling of C. eleDevelopment: A Scenario-Based Approach. In: ProceedinComputational Methods in Systems Biology: First InternatioWorkshop, CMSB 2003, Rovereto, Italy, LNCS 2602, pp. 4–

    han, S., Makkena, R., Gillis, W., Schmidt, C., 2003. A multi-agsystem for the quantitative simulation of biological netwoAAMAS’03, 385–392.

    http://www.wisdom.weizmann.ac.il/~dharel/papers/grandchallenge.dochttp://www.wisdom.weizmann.ac.il/~dharel/papers/grandchallenge.dochttp://www.cds.caltech.edu/eratohttp://www.ilogix.com/http://www.ilogix.com/

  • 302 K. Webb, T. White / BioSystems 80 (2005) 283–302

    Koza, J., Mydlowec, W., Lanza, G., Yu, J., Keane, M., 2001. Reverseengineering of metabolic pathways from observed data usinggenetic programming. PSB, 434–445.

    Kruchten, P., 2000. The Rational Unified Process: An Introduction,second ed. Addison-Wesley, Reading, MA.

    Loew, L., Schaff, J., 2001. The virtual cell: a software environmentfor computational cell biology. Trends Biotechnol. 19, 401–406.

    Mendes, P., 1993. GEPASI: a software package for modelling thedynamics, steady states and control of biochemical and othersystems. Comput. Appl. Biosci. 9, 563–571.

    Mendes, P., 1997. Biochemistry by numbers: simulation of biochem-ical pathways with Gepasi 3. Trends Biochem. Sci. 22, 361–363.

    Mendes, P., 2003. Gepasi 3.30.http://www.gepasi.org.Morton-Firth, C., Bray, D., 1998. Predicting temporal fluctuations

    in an intracellular signalling pathway. J. Theor. Biol. 192, 117–128.

    NRCAM, 2003. The Virtual Cell.http://www.nrcam.uchc.edu.OMG, 2003. Unified Modeling Language (UML).http://www.omg.

    org/uml.Păun, Gh., 2000. Computing with membranes. J. Comput. Syst. Sci.

    61, 108–143.Pohjonen, R., Kelly, S., 2002. Domain-specific modeling. Dr. Dobbs

    J. August 339, 26–35.Quatrani, T., 1998. Visual Modeling with Rational Rose and UML.

    Addison-Wesley, Reading, MA.

    Rational Software, 2003. Rational Rose RealTime.http://www.rational.com/products/rosert.

    Sauro, H., 2000. JARNAC: a system for interactive metabolic anal-ysis.http://www.sys-bio.org.

    Schaff, J., Slepchenko, B., Loew, L., 2000. Physiological modelingwith virtual cell framework. Methods Enzymol. 321, 1–22.

    Selic, B., Gullekson, G., Ward, P., 1994. Real-time Object-OrientedModeling. John Wiley & Sons, New York.

    Slepchenko, B., Schaff, J., Carson, J., Loew, L., 2002. Computationalcell biology: spatiotemporal simulation of cellular events. Annu.Rev. Biophys. Biomol. Struct. 31, 423–442.

    System Biology Workbench, 2003.http://www.sbw-sbml.org.Takahashi, K., Yugi, K., Hashimoto, K., Yamada, Y., Pickett, C.,

    Tomita, M., 2002. Computational challenges in cell simulation:a software engineering approach. IEEE Intell. Syst. 17, 64–71.

    Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T., Matsuzaki,Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J., Hutchi-son, C., 1999. E-Cell: software environment for whole-cell sim-ulation. Bioinformatics 15, 72–84.

    Tomita, M., 2001. Whole-cell simulation: a grand challenge of the21st century. Trends Biotechnol. 19, 205–210.

    Webb, K., White, T., 14–17 March 2004. Combining analysis andsynthesis in a model of a biological cell. In: Symposium on Ap-plied Computing (SAC 2004), Nicosia, Cyprus.

    http://www.gepasi.org/http://www.nrcam.uchc.edu/http://www.omg.org/umlhttp://www.omg.org/umlhttp://www.rational.com/products/roserthttp://www.rational.com/products/roserthttp://www.sys-bio.org/http://www.sbw-sbml.org/

    UML as a cell and biochemistry modeling languageIntroductionObject-oriented (OO) paradigmUnified modeling language (UML)The ROOM formalism and the Rational Rose RealTime toolProcessStep 1: Identify entities, inheritance and containment hierarchiesStep 2: Establish relationships between entitiesStep 3: Define external and internal behavior patternsStep 4: Implement detailed behaviorQuantitative results

    DiscussionFuture workConclusionAcknowledgementsReferences