Creating a new language to support open innovation

Michael Hucka, Ph.D.

Department of Computing + Mathematical Sciences

California Institute of Technology

Pasadena, CA, USA

BioBriefing – BioMelbourne Network, Australia, August 2013

Email: [email protected]

Twitter: @mhucka

DESCRIPTION

Presentation given on 19 August 2013 at a BioBriefings meeting of the BioMelbourne Network (http://www.biomelbourne.org/events/view/289) in Melbourne, Australia.

TRANSCRIPT

Page 2: Creating a new language to support open innovation

Outline

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 4: Creating a new language to support open innovation

Research today: experimentation, computation, cogitation

Page 5: Creating a new language to support open innovation

“The nature of systems biology”, Bruggeman & Westerhoff, Trends Microbiol. 15 (2007).

Page 6: Creating a new language to support open innovation

Large-scale integrative models are growing

Page 7: Creating a new language to support open innovation

Many models have traditionally been published this way

Problems:

• Errors in printing

• Missing information

• Dependencies on implementation

• Outright errors

• Can be a huge effort to recreate

Is it enough to communicate the model in a paper?

Page 9: Creating a new language to support open innovation

Experiences from BioModels Database

BioModels Database:

• Public database of published computational models in biology

• Many models are curated – i.e., made to work & annotated

- If not available in electronic form, they encode it from the paper

Their experiences?

• Vast majority of models encoded directly from the publication did not work as published

- Often (not always) due to common errors – typos, omissions

• Success rate improved in recent years thanks to more people providing their models in electronic formats

More is needed to make computational results reproducible

http://biomodels.net/biomodels

Page 11: Creating a new language to support open innovation

Is it enough to make your (software X) code available?

It’s vital for good science:

• Someone with access to the same software can try to run it, understand it, verify the computational results, build on them, etc.

• Opinion: you should always do this in any case

Page 12: Creating a new language to support open innovation

Is it enough to make your (software X) code available?

It’s vital for good science:

• Someone with access to the same software can try to run it, understand it, build on it, etc.

• Opinion: you should always do this in any case

But it’s still not ideal for communication of scientific results:

• What if they don’t have access to the same software?

• What if they don’t want to use that software?

• What if they want to use a different conceptual framework?

• And how will people be able to relate the model to other work?

Page 13: Creating a new language to support open innovation

Different tools ⇒ different interfaces & languages

Page 14: Creating a new language to support open innovation

Outline

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 15: Creating a new language to support open innovation

SBML: a lingua franca for software

Page 16: Creating a new language to support open innovation

Format for representing computational models of biological processes

• Data structures + usage principles + serialization to XML

• (Mostly) Declarative, not procedural—not a scripting language

Neutral with respect to modeling framework

• E.g., ODE, stochastic systems, etc.

Important: software reads/writes SBML, not humans

<Beginning of SBML model definition>

List of function definitions

List of unit definitions

List of compartments

List of molecular species

List of parameters

List of rules

List of reactions

List of events

<End of SBML model definition>

SBML = Systems Biology Markup Language
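The model structure listed on this slide can be sketched with Python's standard XML library. This is a minimal illustration, not validated SBML: the namespace URI is the SBML Level 3 Version 1 Core namespace, while the model, compartment, species and reaction ids (`toy_model`, `cell`, `A`, `B`, `A_to_B`) are hypothetical. In practice a library such as libSBML would write and validate the file.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 Core.
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"


def build_minimal_model():
    """Return an <sbml> element with one compartment, two species, one reaction.

    All ids are hypothetical; the output is a sketch and has not been
    checked with an SBML validator.
    """
    def q(tag):
        # Qualify a tag name with the SBML namespace.
        return "{%s}%s" % (SBML_NS, tag)

    sbml = ET.Element(q("sbml"), {"level": "3", "version": "1"})
    model = ET.SubElement(sbml, q("model"), {"id": "toy_model"})

    # List of compartments
    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"),
                  {"id": "cell", "constant": "true"})

    # List of molecular species
    species = ET.SubElement(model, q("listOfSpecies"))
    for sid in ("A", "B"):
        ET.SubElement(species, q("species"), {
            "id": sid,
            "compartment": "cell",
            "hasOnlySubstanceUnits": "false",
            "boundaryCondition": "false",
            "constant": "false",
        })

    # List of reactions: a single process A -> B
    reactions = ET.SubElement(model, q("listOfReactions"))
    reaction = ET.SubElement(reactions, q("reaction"),
                             {"id": "A_to_B", "reversible": "false", "fast": "false"})
    reactants = ET.SubElement(reaction, q("listOfReactants"))
    ET.SubElement(reactants, q("speciesReference"),
                  {"species": "A", "stoichiometry": "1", "constant": "true"})
    products = ET.SubElement(reaction, q("listOfProducts"))
    ET.SubElement(products, q("speciesReference"),
                  {"species": "B", "stoichiometry": "1", "constant": "true"})
    return sbml


def to_xml(element):
    """Serialize the element tree to a string, using SBML as default namespace."""
    ET.register_namespace("", SBML_NS)
    return ET.tostring(element, encoding="unicode")
```

Calling `to_xml(build_minimal_model())` returns the XML text. The point of the sketch is only that the format is a declarative list of components rather than a script.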

Page 17: Creating a new language to support open innovation

The raw SBML

Page 18: Creating a new language to support open innovation

The process is central

• Literally called a “reaction” in SBML

• Participants are pools of entities (biochemical species)

Models can further include:

• Compartments

• Other constants & variables

• Discontinuous events

• Other, explicit math

• Unit definitions

• Annotations

Core SBML concepts are fairly simple
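As a rough illustration of why a declarative reaction is neutral with respect to modeling framework, the sketch below takes one reaction, A → B, and interprets it twice: once as an ODE and once as a discrete stochastic process. The rate constant `k`, the function names and the forward-Euler and Gillespie-style methods are assumptions chosen for this example; they are not part of SBML.

```python
import math
import random


def simulate_ode(a0, k, t_end, dt=0.001):
    """Deterministic reading of A -> B: dA/dt = -k*A, forward Euler steps."""
    a, b = float(a0), 0.0
    for _ in range(int(round(t_end / dt))):
        flux = k * a * dt   # amount converted during this step
        a -= flux
        b += flux
    return a, b


def exact_ode(a0, k, t_end):
    """Closed-form solution A(t) = a0 * exp(-k*t), for comparison."""
    return a0 * math.exp(-k * t_end)


def simulate_stochastic(a0, k, t_end, rng):
    """Stochastic reading of A -> B: molecules convert one at a time.

    Waiting times are exponential with rate k * (number of A molecules),
    as in Gillespie's direct method for a single reaction.
    """
    a, b, t = int(a0), 0, 0.0
    while a > 0:
        t += rng.expovariate(k * a)   # time until the next reaction event
        if t > t_end:
            break
        a -= 1
        b += 1
    return a, b
```

For example, `simulate_ode(100, 0.5, 2.0)` and `simulate_stochastic(100, 0.5, 2.0, random.Random(0))` start from the same description of the reaction and differ only in how software chooses to interpret it.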

Page 19: Creating a new language to support open innovation

SBML is now widely used

Dozens of journals accept models in SBML format

100’s of software tools available today

1000’s of models available in SBML format today

[Chart: growth over the years 2001 to 2012, vertical axis from 0 to 300, labeled “254+ today”]

Page 20: Creating a new language to support open innovation

Contents of BioModels Database

Contents today:

• 142,000+ pathway models (converted from KEGG)

• 460+ hand-curated quantitative models

• 460+ non-curated quantitative models

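[Pie chart of model categories: signal transduction, metabolic process, multicellular organismal process, rhythmic process, cell cycle, homeostatic process, response to stimulus, cell death, localization, others (e.g., developmental process). Slice values, not matched to categories in this transcript: 27%, 24%, 9%, 8%, 8%, 7%, 6%, 6%, 3%, 2%]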

Database data from 2013

Page 21: Creating a new language to support open innovation

Free software libraries – libSBML

Reads, writes, validates SBML

Can check & convert units

Written in portable C++

Runs on Linux, Mac, Windows

APIs for C, C++, C#, Java, Octave, Perl, Python, R, Ruby, MATLAB

Well documented API

Open-source (LGPL)

http://sbml.org/Software/libSBML

Page 22: Creating a new language to support open innovation

Free software libraries – JSBML

Pure Java implementation

API is compatible with libSBML but more Java-like

Functionality is subset of libSBML

Open source (LGPL)

http://sbml.org/Software/JSBML

Page 23: Creating a new language to support open innovation

Evolution of SBML continues

Today: SBML Level 3

• Level 3 Core provides framework for common models

• Level 3 packages add additional constructs to the Core
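One way to picture the Core-plus-packages design: a Level 3 document declares each package it uses as an extra XML namespace on top of the Core namespace. The sketch below lists the packages a document declares. It assumes the namespace pattern `http://www.sbml.org/sbml/level3/version1/<package>/version<N>`, and the function name and the example document are made up for illustration. A real tool would rely on libSBML or JSBML rather than string inspection.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical example document declaring Core plus the flux balance
# constraints ("fbc") package.
EXAMPLE = (
    '<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" '
    'xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version1" '
    'level="3" version="1" fbc:required="false">'
    '<model id="toy"/>'
    '</sbml>'
)


def declared_packages(sbml_text):
    """Return {package name: package version} for declared Level 3 packages.

    Assumes package namespaces have one more path segment than the Core
    namespace, i.e. .../level3/version1/<package>/version<N>.
    """
    packages = {}
    source = io.BytesIO(sbml_text.encode("utf-8"))
    # "start-ns" events report every namespace declaration as (prefix, uri).
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        parts = uri.split("/")
        if len(parts) == 8 and parts[2] == "www.sbml.org" and parts[4] == "level3":
            packages[parts[6]] = parts[7]
    return packages
```

With the example above, `declared_packages(EXAMPLE)` reports the `fbc` package and ignores the Core namespace.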

Page 24: Creating a new language to support open innovation

Level 3 package | What it enables | Status
Hierarchical model composition | Models containing submodels | ✔
Flux balance constraints | Constraint-based models | ✔
Qualitative models | Petri net models, Boolean models | ✔
Graph layout | Diagrams of models | ✔
Multicomponent/state species | Entities w/ structure; also rule-based models | draft
Spatial | Nonhomogeneous spatial models | draft
Graph rendering | Diagrams of models | draft
Groups | Arbitrary grouping of components | draft
Distributions | Numerical values as statistical distributions | in dev
Arrays & sets | Arrays or sets of entities | in dev
Dynamic structures | Creation & destruction of components | in dev
Annotations | Richer annotation syntax |

Page 25: Creating a new language to support open innovation

National Institute of General Medical Sciences (USA)

European Molecular Biology Laboratory (EMBL)

JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)

JST ERATO-SORST Program (Japan)

ELIXIR (UK)

Beckman Institute, Caltech (USA)

Keio University (Japan)

International Joint Research Program of NEDO (Japan)

Japanese Ministry of Agriculture

Japanese Ministry of Educ., Culture, Sports, Science and Tech.

BBSRC (UK)

National Science Foundation (USA)

DARPA IPTO Bio-SPICE Bio-Computation Program (USA)

Air Force Office of Scientific Research (USA)

STRI, University of Hertfordshire (UK)

Molecular Sciences Institute (USA)

SBML funding sources over the past 13+ years

Page 26: Creating a new language to support open innovation

Outline

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 27: Creating a new language to support open innovation

COMBINE efforts cover different facets of modeling

[Figure: facets of modeling]

• Model representation level: mathematical semantics, biological semantics, visual interpretation

• Model type: discrete stochastic entities, continuous lumped parameter, state transition, mean field approximation, ...

• Model life-cycle: model creation, model annotation, model analysis, numerical results

Concept due to Nicolas Le Novère

Page 31: Creating a new language to support open innovation

Raw models alone are insufficient

Need standard schemes for machine-readable annotations

• Identify entities

• Mathematical semantics

• Links to other data resources

• Authorship & pub. info

Modelers want to use their own conventions

Low info content

No standard identifiers

Page 32: Creating a new language to support open innovation

MIRIAM (Minimum Information Requested In the Annotation of Models)

Addresses 2 general areas of annotation needs:

• Requirements for reference correspondence

• Scheme for encoding annotations

- Annotations for attributing model creators & sources

- Annotations for referring to external data resources

MIRIAM is not specific to SBML

Page 35: Creating a new language to support open innovation

Example of a problem that can be solved with annotations

http://www.ebi.ac.uk/chebi

Low info content

Known by different names – do you want to write all of them into your model?

salicylic acid

Page 36: Creating a new language to support open innovation

MIRIAM annotations for external references

Goal: link model constituents to corresponding entities in bioinformatics resources (e.g., databases, controlled vocabularies)

• Supports:

- Precise identification of model constituents

- Discovery of models that concern the same thing

- Comparison of model constituents between different models

MIRIAM approach avoids putting data content directly in the model

• Instead, it points at external resources that contain the data
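A small sketch of how shared annotations support discovery and comparison: two models may use unrelated local names, yet software can match their constituents through the resource URIs they point to. The model contents, the function name and the specific ChEBI identifiers below are illustrative assumptions, not taken from any real model.

```python
# Each hypothetical model maps a modeler's local name to an annotation URI.
MODEL_A = {
    "SA": "http://identifiers.org/chebi/CHEBI:16914",
    "ATP": "http://identifiers.org/chebi/CHEBI:15422",
}
MODEL_B = {
    "salicylate": "http://identifiers.org/chebi/CHEBI:16914",
    "x1": "http://identifiers.org/chebi/CHEBI:15377",
}


def shared_entities(model_a, model_b):
    """Return {uri: (name in model_a, name in model_b)} for URIs in both models."""
    uri_to_name_b = {uri: name for name, uri in model_b.items()}
    return {
        uri: (name_a, uri_to_name_b[uri])
        for name_a, uri in model_a.items()
        if uri in uri_to_name_b
    }
```

Here `shared_entities(MODEL_A, MODEL_B)` pairs `SA` with `salicylate` even though the names differ, because both point at the same external entry.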

Page 37: Creating a new language to support open innovation

How do we create globally unique identifiers consistently?

Long story short: developed by the Le Novère group at the EBI

• Resource identifiers (URIs) combine 2 parts:

- namespace: identifies a dataset

- entity identifier: identifies a datum within the dataset

• There’s a registry for namespaces: MIRIAM Registry

- Allows people & software to use same namespace identifiers

• There’s a URI resolution service: MIRIAM Resources & identifiers.org

- Allows people & software to take a given identifier and figure out what it points to
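The two-part identifier scheme can be sketched in a few lines. The helper names are hypothetical, and the URL and URN layouts are the forms in use at the time of this talk (`http://identifiers.org/<namespace>/<id>` and `urn:miriam:<namespace>:<id>`). The ChEBI identifier is only an illustrative value and should be checked against ChEBI before being reused.

```python
from urllib.parse import quote

IDENTIFIERS_ORG = "http://identifiers.org/"


def identifiers_org_url(namespace, entity_id):
    """Combine a namespace and an entity identifier into a resolvable URL."""
    return "%s%s/%s" % (IDENTIFIERS_ORG, namespace, entity_id)


def miriam_urn(namespace, entity_id):
    """Same two parts in MIRIAM URN form; the identifier is percent-encoded."""
    return "urn:miriam:%s:%s" % (namespace, quote(entity_id, safe=""))


def split_identifiers_org_url(url):
    """Recover (namespace, entity identifier) from an identifiers.org URL."""
    if not url.startswith(IDENTIFIERS_ORG):
        raise ValueError("not an identifiers.org URL: %r" % url)
    namespace, _, entity_id = url[len(IDENTIFIERS_ORG):].partition("/")
    return namespace, entity_id
```

For example, `identifiers_org_url("chebi", "CHEBI:16914")` joins the dataset name and the datum identifier, and `split_identifiers_org_url` takes them apart again.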

Page 38: Creating a new language to support open innovation

Another problem: software can’t read figure legends

?

BIOMD0000000319 in BioModels Database

Decroly & Goldbeter, PNAS, 1982

Page 39: Creating a new language to support open innovation

SED-ML = Simulation Experiment Description Markup Language

Application-independent format

• Captures procedures, algorithms, parameter values

Can be used for

• Simulation experiments encoding parametrizations & perturbations

• Simulations using more than one model and/or method

• Data manipulations to produce plot(s)

http://sedml.org

[Diagram of SED-ML components: Model, Simulation, Task, Data generators, Reports]
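The five components named on this slide can be pictured as plain data structures that refer to each other by id. This is a sketch of the idea only: it does not use SED-ML syntax, and every class, field and value is hypothetical except the model identifier BIOMD0000000319, which appears earlier in this talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Model:
    id: str
    source: str            # where the model comes from, e.g. a database id


@dataclass
class Simulation:
    id: str
    algorithm: str         # free-text label here, purely illustrative
    start: float
    end: float
    points: int


@dataclass
class Task:
    id: str
    model_ref: str         # which model to run
    simulation_ref: str    # which simulation setup to apply


@dataclass
class DataGenerator:
    id: str
    task_ref: str          # which task produces the raw values
    expression: str        # post-processing applied to those values


@dataclass
class Report:
    id: str
    data_refs: List[str]   # data generators shown in this report or plot


@dataclass
class Experiment:
    models: List[Model] = field(default_factory=list)
    simulations: List[Simulation] = field(default_factory=list)
    tasks: List[Task] = field(default_factory=list)
    data_generators: List[DataGenerator] = field(default_factory=list)
    reports: List[Report] = field(default_factory=list)

    def dangling_references(self) -> List[str]:
        """List references that point at ids not defined in the experiment."""
        model_ids = {m.id for m in self.models}
        sim_ids = {s.id for s in self.simulations}
        task_ids = {t.id for t in self.tasks}
        gen_ids = {g.id for g in self.data_generators}
        problems = []
        for t in self.tasks:
            if t.model_ref not in model_ids:
                problems.append("task %s: unknown model %s" % (t.id, t.model_ref))
            if t.simulation_ref not in sim_ids:
                problems.append("task %s: unknown simulation %s" % (t.id, t.simulation_ref))
        for g in self.data_generators:
            if g.task_ref not in task_ids:
                problems.append("data generator %s: unknown task %s" % (g.id, g.task_ref))
        for r in self.reports:
            for ref in r.data_refs:
                if ref not in gen_ids:
                    problems.append("report %s: unknown data generator %s" % (r.id, ref))
        return problems


def example_experiment() -> Experiment:
    """Build a small, internally consistent experiment description."""
    return Experiment(
        models=[Model("m1", "BIOMD0000000319")],
        simulations=[Simulation("sim1", "deterministic time course", 0.0, 100.0, 1000)],
        tasks=[Task("t1", "m1", "sim1")],
        data_generators=[DataGenerator("d1", "t1", "value of one species over time")],
        reports=[Report("plot1", ["d1"])],
    )
```

Keeping the model, the simulation setup and the outputs as separate, cross-referenced pieces is what lets one description reuse several models or methods, as the bullets above note.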

Page 40: Creating a new language to support open innovation

Efforts like SED-ML improve reproducibility of publications

Waltemath et al., BMC Sys Bio 5, 2011.

Page 41: Creating a new language to support open innovation

Outline

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 43: Creating a new language to support open innovation

Need interoperable formats, but developing them is not easy

Need people with a diverse set of knowledge & skills

• Scientific needs

• Technical implementation skills

• Practical experience

Need to manage multiple phases of a standardization effort

• Creation

• Evolution

• Support

This is just for the specification of the standards, to say nothing of the necessary software and other infrastructure!

Page 44: Creating a new language to support open innovation

Realizations about the state of affairs in the late 2000s

• Many standardization efforts overlapped, but lacked coordination

• Efforts were inventing their own processes from scratch

• Many individual meetings meant more travel for many people

• Limited and fragile funding didn’t support solid, coherent base

COMBINE = Computational Modeling in Biology Network

• Coordinate standards development

• Develop common procedures & tools (but not impose them!)

• Coordinate meetings

• Provide a recognized voice

Motivations for the creation of COMBINE

Page 45: Creating a new language to support open innovation

Standardization efforts represented in COMBINE today

BioPAX

Qualifiers

GPML

COMBINE Standards

Associated Standardization Efforts

Related Standardization Efforts

Page 46: Creating a new language to support open innovation

Those are the products of successful, open collaborations!

Page 47: Creating a new language to support open innovation

Examples of community organization

Two main annual meetings, plus ad hoc workshops

• COMBINE meeting: status updates, presentations, outreach

- Next COMBINE: Paris, Sep 16–20, 2013

• HARMONY: Hackathon on Resources for Modeling in Biology

- Software development, interoperability hacking

COMBINE 2012, Toronto

COMBINE 2011, Heidelberg

Page 48: Creating a new language to support open innovation

What motivates people to do this?

Solving a problem for yourself/your closed group is easier and quicker

• So what are these people getting out of it?

Some advantages of an open, community-oriented approach:

• Finding better solutions they wouldn’t find alone

- Arguments, or rather discussions, lead to realizations & better solutions

• Contributions to science – publications, peer recognition

• Support of a standard makes their software more desirable

• Sense of community involvement

Some admitted disadvantages:

• Agreement takes time – progress can be very slow

• Solutions may include features you didn’t plan on, or need

Page 49: Creating a new language to support open innovation

COMBINE is open to all—and COMBINE needs you!

http://co.mbine.org

Current coordinators:

• Nicolas Le Novère, Mike Hucka, Falk Schreiber, Gary Bader

Page 50: Creating a new language to support open innovation

Outline

Background and introduction

The Systems Biology Markup Language (SBML)

Complementary efforts: MIRIAM and SED-ML

COMBINE: the Computational Modeling in Biology Network

Conclusion

Page 51: Creating a new language to support open innovation

Time it well

• Too early and too late are bad

Start with actual stakeholders

• Address real needs, not perceived ones

Start with small team of dedicated developers

• Can work faster, more focused; also avoids “designed-by-committee”

Engage people constantly, in many ways

• Electronic forums, email, electronic voting, surveys, hackathons

Make the results free and open-source

• Makes people comfortable knowing it will always be available

Be creative about seeking funding

Some things we (maybe?) got right with SBML

Page 52: Creating a new language to support open innovation

Not waiting for implementations before freezing specifications

• Sometimes finalized specification before implementations tested it

- Especially bad when we failed to do a good job

‣ E.g., “forward thinking” features, or “elegant” designs

Not formalizing the development process sufficiently

• Especially early in the history, did not have a very open process

Not resolving intellectual property issues from the beginning

• Industrial users ask “who has the right to give any rights to this?”

Some things we certainly got wrong

Page 53: Creating a new language to support open innovation

This work was made possible thanks to a great community

Original PI’s: John C. Doyle, Hiroaki Kitano

SBML Team: Mike Hucka, Sarah Keating, Frank Bergmann, Lucian Smith, Andrew Finney, Herbert Sauro, Hamid Bolouri, Ben Bornstein, Bruce Shapiro, Akira Funahashi, Akiya Juraku, Ben Kovitz

SBML Editors: Mike Hucka, Nicolas Le Novère, Sarah Keating, Frank Bergmann, Lucian Smith, Chris Myers, Stefan Hoops, Sven Sahle, James Schaff, Darren Wilkinson

BioModels DB: Nicolas Le Novère, Henning Hermjakob, Camille Laibe, Chen Li, Lukas Endler, Nico Rodriguez, Marco Donizelli, Viji Chelliah, Mélanie Courtot, Harish Dharuri

And a huge thanks to many others in the COMBINE community

Attendees at SBML 10th Anniversary Symposium, Edinburgh, 2010

Page 54: Creating a new language to support open innovation

SBML http://sbml.org

BioModels Database http://biomodels.net/biomodels

MIRIAM http://biomodels.net/miriam

identifiers.org http://identifiers.org

SED-ML http://biomodels.net/sed-ml

SBO http://biomodels.net/sbo

SBGN http://sbgn.org

COMBINE http://co.mbine.org

URLs

Page 55: Creating a new language to support open innovation

I’d like your feedback!

You can use this anonymous form:

http://tinyurl.com/mhuckafeedback