reproducibility of model-based results: standards, infrastructure, and recognition


Page 1: Reproducibility of model-based results: standards, infrastructure, and recognition

http://sems.uni-rostock.de

Dagmar Waltemath | September 2015, Rostock-Warnemünde | dcite

Reproducibility of model-based results: Standards, infrastructure and recognition

Page 2: Reproducibility of model-based results: standards, infrastructure, and recognition

What is a model?

Fig.: Modeling Cellular Reprogramming Using Network-based Models. Courtesy Antonio del Sol Mesa, LCSB Luxembourg

Fig.: Modeling the cell cycle using ODE systems. Goldbeter (1991), http://www.ncbi.nlm.nih.gov/pubmed/1833774

Fig.: Modeling large-scale networks. Lee et al (2013), http://www.nature.com/articles/srep02197.

In systems biology, a computational model represents biological facts in the computer. Often, the representation is simulated to help understand the system's dynamic behavior.
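To make this concrete, here is a minimal sketch of simulating such an ODE representation in Python with SciPy. The two-variable system and its parameters are purely hypothetical, not the Goldbeter (1991) model:

```python
# Minimal, hypothetical two-species ODE system (illustrative only;
# this is NOT the Goldbeter 1991 cell cycle model).
from scipy.integrate import solve_ivp

K_SYN, K_ACT, K_DEG = 0.1, 0.2, 0.05  # made-up rate constants

def rhs(t, state):
    """X is produced at a constant rate and activates Y; both decay."""
    x, y = state
    dx = K_SYN - K_DEG * x
    dy = K_ACT * x - K_DEG * y
    return [dx, dy]

sol = solve_ivp(rhs, (0.0, 100.0), y0=[0.0, 0.0])
print(sol.t[-1], sol.y[:, -1])  # final time point and state
```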

Page 3: Reproducibility of model-based results: standards, infrastructure, and recognition

Re[usea|produci]bility challenge

Slide courtesy Mike Hucka @ 2012 Computational Cell Biology Summer School

Page 4: Reproducibility of model-based results: standards, infrastructure, and recognition

Re[usea|produci]bility challenge

Slide courtesy Mike Hucka @ 2012 Computational Cell Biology Summer School

“With greater interaction between tools, and a common format for publications and databases, users would be better able to spend more time on actual research rather than on struggling with data format issues.”

Page 5: Reproducibility of model-based results: standards, infrastructure, and recognition

Re[usea|produci]bility challenge (2003)

Slide courtesy Mike Hucka @ 2012 Computational Cell Biology Summer School

“With greater interaction between tools, and a common format for publications and databases, users would be better able to spend more time on actual research rather than on struggling with data format issues.” (SBML L1)

Page 6: Reproducibility of model-based results: standards, infrastructure, and recognition

→ Standardised model representation

Ron Henkel et al. Database 2015;2015:bau130
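A standardised representation pays off immediately in tooling. As a minimal sketch, reading such a model with python-libsbml (pip install python-libsbml); the file name "model.xml" is a placeholder:

```python
# Sketch: load an SBML model and list its contents with python-libsbml.
import libsbml

doc = libsbml.readSBMLFromFile("model.xml")  # placeholder path
if doc.getNumErrors(libsbml.LIBSBML_SEV_ERROR) > 0:
    doc.printErrors()
    raise RuntimeError("model.xml failed to parse")

model = doc.getModel()
print(model.getId(), "| species:", model.getNumSpecies(),
      "| reactions:", model.getNumReactions())
for i in range(model.getNumSpecies()):
    species = model.getSpecies(i)
    print(" ", species.getId(), species.getName())
```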

Page 7: Reproducibility of model-based results: standards, infrastructure, and recognition

Re[usea|produci]bility challenge (2010)

Fig.: Nature Blogs: Of Schemes and Dreams (2014)

Nine Worrying Stats on the Effect of Poor Scientific Data Management

Vijayalakshmi Chelliah et al. Nucl. Acids Res. 2015;43:D542-D548

Finding relevant models.

Page 8: Reproducibility of model-based results: standards, infrastructure, and recognition

→ Strategies for model similarity, ranking, clustering, filtering

Fig.: Henkel et al 2010 http://www.biomedcentral.com/1471-2105/11/423/

Fig.: Schulz et al 2011 DOI: 10.1038/msb.2011.41

[Figure: annotation-based clustering matrix over a set of cell cycle models; matrix omitted]
Fig.: Alm et al (2014) doi:10.1186/s13326-015-0014-4
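A minimal sketch of one such similarity strategy: scoring model pairs by the Jaccard overlap of their annotation URIs. The two annotation sets below are hand-made stand-ins; in practice they would be extracted from each model's RDF annotations:

```python
# Sketch: annotation-based model similarity via the Jaccard index.
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two annotation-URI sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical annotation sets for two models.
model_a = {"urn:miriam:uniprot:P00439", "urn:miriam:SBO:0000014"}
model_b = {"urn:miriam:uniprot:P00439", "urn:miriam:SBO:0000011"}
print(jaccard(model_a, model_b))  # 1 shared URI out of 3 -> ~0.33
```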

Page 9: Reproducibility of model-based results: standards, infrastructure, and recognition

Re[usea|produci]bility challenge (2012)

Reproducing published models.

Page 10: Reproducibility of model-based results: standards, infrastructure, and recognition

→ Standardised simulation descriptions

Fig.: Waltemath et al (2011) doi:10.1186/1752-0509-5-198
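For illustration, a sketch of the skeleton of such a simulation description (SED-ML Level 1 Version 3), assembled here by hand with ElementTree. Dedicated libraries (e.g. libSedML) would normally be used; all file names and identifiers are placeholders:

```python
# Sketch: a minimal SED-ML skeleton (model + time course + task).
import xml.etree.ElementTree as ET

NS = "http://sed-ml.org/sed-ml/level1/version3"
sed = ET.Element("sedML", xmlns=NS, level="1", version="3")

models = ET.SubElement(sed, "listOfModels")
ET.SubElement(models, "model", id="model1", source="model.xml",
              language="urn:sedml:language:sbml")

sims = ET.SubElement(sed, "listOfSimulations")
tc = ET.SubElement(sims, "uniformTimeCourse", id="sim1",
                   initialTime="0", outputStartTime="0",
                   outputEndTime="100", numberOfPoints="1000")
ET.SubElement(tc, "algorithm", kisaoID="KISAO:0000019")  # CVODE

tasks = ET.SubElement(sed, "listOfTasks")
ET.SubElement(tasks, "task", id="task1", modelReference="model1",
              simulationReference="sim1")

ET.ElementTree(sed).write("experiment.sedml",
                          xml_declaration=True, encoding="utf-8")
```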

Page 11: Reproducibility of model-based results: standards, infrastructure, and recognition

Re[usea|produci]bility challenge (2014)

Model-related data in the systems biology workflow

Linking the relevant files.

Page 12: Reproducibility of model-based results: standards, infrastructure, and recognition

→ Retrieval and archiving of simulation studies and associated files

Model-related data in the systems biology workflow

Linking model-related data

Give me all the files I need to run this simulation study.

Which are the most frequently used GO annotations in my model set?

Which models contain reactions with 'ATP' as reactant and 'ADP' as product? (See the sketch after this list.)

Find good candidates for features describing my set of models.
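A minimal sketch of the ATP/ADP query above, using python-libsbml over a placeholder folder of models. Matching on species names is a simplification; a robust version would match on ChEBI annotations instead:

```python
# Sketch: find reactions consuming ATP and producing ADP across a
# set of SBML files. Paths and name matching are simplifications.
import glob
import libsbml

def species_name(model, species_id):
    s = model.getSpecies(species_id)
    return (s.getName() or s.getId()) if s is not None else species_id

for path in glob.glob("models/*.xml"):  # placeholder location
    model = libsbml.readSBMLFromFile(path).getModel()
    if model is None:
        continue
    for i in range(model.getNumReactions()):
        reaction = model.getReaction(i)
        reactants = {species_name(model, reaction.getReactant(j).getSpecies())
                     for j in range(reaction.getNumReactants())}
        products = {species_name(model, reaction.getProduct(j).getSpecies())
                    for j in range(reaction.getNumProducts())}
        if "ATP" in reactants and "ADP" in products:
            print(path, reaction.getId())
```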

Page 13: Reproducibility of model-based results: standards, infrastructure, and recognition

State of affairs in 2015

● Standards:

– support for all steps of the modeling cycle

– support of various modeling techniques

– Still: some modeling concepts are not yet covered (→ report of the whole-cell modeling workshop, Waltemath et al 2015, under review)

● Infrastructures:

– Software tools export/import standards

– Open model repositories and management systems

– Education

● Recognition

Page 14: Reproducibility of model-based results: standards, infrastructure, and recognition

COMBINE Standards

● COmputational Modeling in BIology NEtwork

● Goals:

– Avoid overlap of standardisation efforts

– Coordinate standard developments

– Coordinate meetings

– Coordinate development of procedures & tools

– Common infrastructure for specification development, semantic annotation, and dissemination

● All specifications now citable and accessible in one place: Schreiber et al. (2015) http://journal.imbio.de/articles/pdf/jib-258.pdf

Page 15: Reproducibility of model-based results: standards, infrastructure, and recognition

COMBINE Standards

Fig. : COMBINE standards today. Slide courtesy M. Hucka. http://www.slideshare.net/thehuck/a-summary-of-various-combine-standardization-activities

Page 16: Reproducibility of model-based results: standards, infrastructure, and recognition

COMBINE Standards

● Data formats

– Community-developed representation formats for models and related data

– Format: XML, OWL, RDF/XML

● Minimum Information/Reporting guidelines:

– Minimum amount of data and information required to reproduce and interpret an experiment

– Format: human-readable specification documents

● Basis for the specification of data models and metadata

● Bio-ontologies

Page 17: Reproducibility of model-based results: standards, infrastructure, and recognition

SBML

Fig.: SBML Level 3 Packages. Slide courtesy M. Hucka (ICSB 2014).

Page 18: Reproducibility of model-based results: standards, infrastructure, and recognition

SBML

Fig.: SBML Level 3 Packages. Slide courtesy M. Hucka (ICSB 2014).

Lucky modelers: you should not need to worry about the details of these (XML) formats; the tools should handle import and export! (Tool developers should, though.)

Page 19: Reproducibility of model-based results: standards, infrastructure, and recognition

Minimum Information Guidelines

● Reporting guidelines and checklists

● Narrative description of the information necessary to reproduce a model-based result

● MIRIAM: Minimum Information Required In the Annotation of Models

● MIASE: Minimum Information About a Simulation Experiment

● MIAPE, MIAME, … for experimental setups

Page 20: Reproducibility of model-based results: standards, infrastructure, and recognition

MIRIAM – information to provide about a model

● Models must

– be encoded in a public, machine-readable format

– be clearly linked to a single publication

– reflect the structure of the biological processes described in the reference paper (list of reactions, …)

– be instantiable in a simulation (possess initial conditions, …)

– be able to reproduce the results given in the reference paper

– contain the creator’s contact details

– unambiguously identify each model constituent through annotation

Page 21: Reproducibility of model-based results: standards, infrastructure, and recognition

MIRIAM – information to provide about a model

● Models must

– be encoded in a public, machine-readable format

– be clearly linked to a single publication

– reflect the structure of the biological processes described in the reference paper (list of reactions, …)

– be instantiable in a simulation (possess initial conditions, …)

– be able to reproduce the results given in the reference paper

– contain the creator’s contact details

– unambiguously identify each model constituent through annotation

You should worry about the details of the guidelines: they help you check whether you have provided all the necessary information.
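Parts of this checklist can even be automated. A sketch with python-libsbml that flags two machine-checkable points, missing annotations and missing initial conditions ("model.xml" is a placeholder); the remaining points need human judgement:

```python
# Sketch: check two machine-verifiable MIRIAM points on one model.
import libsbml

doc = libsbml.readSBMLFromFile("model.xml")  # placeholder path
model = doc.getModel()

for i in range(model.getNumSpecies()):
    species = model.getSpecies(i)
    if species.getNumCVTerms() == 0:  # no semantic annotation attached
        print("unannotated species:", species.getId())
    if not (species.isSetInitialAmount()
            or species.isSetInitialConcentration()):
        print("no initial condition:", species.getId())
```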

Page 22: Reproducibility of model-based results: standards, infrastructure, and recognition

Bio-ontologies for model annotation

● Major ontologies

● Linking framework: RDF/XML

● Annotation scheme: used to semantically enrich model files with detailed descriptions of the underlying biological entities, mathematical concepts or algorithms used during analysis

● De facto standard: SBML annotation scheme
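A sketch of what this annotation scheme looks like in code: attaching a MIRIAM-style UniProt reference to a species with python-libsbml. The file name and identifier are illustrative:

```python
# Sketch: add a biological-qualifier annotation to an SBML species.
import libsbml

doc = libsbml.readSBMLFromFile("model.xml")  # placeholder path
species = doc.getModel().getSpecies(0)
species.setMetaId("meta_" + species.getId())  # RDF annotations need a metaid

cv = libsbml.CVTerm(libsbml.BIOLOGICAL_QUALIFIER)
cv.setBiologicalQualifierType(libsbml.BQB_IS)
cv.addResource("http://identifiers.org/uniprot/P00439")
status = species.addCVTerm(cv)
assert status == libsbml.LIBSBML_OPERATION_SUCCESS

libsbml.writeSBMLToFile(doc, "model_annotated.xml")
```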

Page 23: Reproducibility of model-based results: standards, infrastructure, and recognition

Bio-ontologies for model annotation

[Figure: an enzymatic reaction, with substrate, product, enzyme, enzymatic rate law, and catalytic rate constant annotated via SBO terms: urn:miriam:SBO:0000011, urn:miriam:SBO:0000014, urn:miriam:SBO:0000015, urn:miriam:SBO:0000025]

Page 24: Reproducibility of model-based results: standards, infrastructure, and recognition

Bio-ontologies for model annotation

[Figure: a phenylalanine-4-hydroxylase reaction, with Tyrosine, Tetrahydrobiopterin, and Phenylalanine-4-hydroxylase annotated via UniProt identifiers: urn:miriam:uniprot:P00439, urn:miriam:uniprot:Q03393, urn:miriam:uniprot:P07101]

Page 25: Reproducibility of model-based results: standards, infrastructure, and recognition

Levels of standardisation

Fig.: COMBINE standards that are relevant to this workshop; adapted from (Chelliah et al., 2009, DILS)

Page 26: Reproducibility of model-based results: standards, infrastructure, and recognition

State of affairs in 2015

● Standards:

– support for all steps of the modeling cycle

– support of various modeling techniques

– Still: some modeling concepts are not yet covered (→ report of the whole-cell modeling workshop, Waltemath et al 2015, under review)

● Infrastructures:

– Software tools export/import standards

– Open model repositories and management systems

– Education

● Recognition

Page 27: Reproducibility of model-based results: standards, infrastructure, and recognition

Software tool support

● Standard converters (SBML ↔ SBGN; SBML ↔ CellML, ...)

● Standard support in software

● Interoperability tools

– Cytoscape for network analysis and visualization (SBML, SBGN, BioPAX)

– The Virtual Cell for modeling (SBML, BioPAX)

– VANTED for network analysis, visualization and manipulation (SBML, SBGN)

Check the COMBINE website for details.

Page 28: Reproducibility of model-based results: standards, infrastructure, and recognition

Software tool support in SBML

Fig.: Software supporting SBML. Slide courtesy M. Hucka (ICSB 2014).

Also check the SBML Software Matrix

Page 29: Reproducibility of model-based results: standards, infrastructure, and recognition

Open model repositories

● Structured, type-specific archives

● Offer download of curated, annotated, published models and associated files (visual representations, simulation descriptions, publication, …)

[Logos of open model repositories, including CCDB]

Page 30: Reproducibility of model-based results: standards, infrastructure, and recognition

Model management systems

Fig.: The SEEK. Wolstencroft et al (2015). doi:10.1186/s12918-015-0174-y

Model management tasks:

● Storage & integration of data

● Search & retrieval

● Version control

● Provenance
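A naive sketch of the version-control idea: address each model revision by a content hash and show a line-based diff between two revisions. Purpose-built tools such as BiVeS diff models structurally instead; file names below are placeholders:

```python
# Sketch: content-addressed revisions plus a line-based diff.
import difflib
import hashlib

def revision_id(xml_text: str) -> str:
    """Content-addressed identifier for one model revision."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()[:12]

v1 = open("model_v1.xml").read()  # placeholder paths
v2 = open("model_v2.xml").read()
print("revisions:", revision_id(v1), "->", revision_id(v2))
for line in difflib.unified_diff(v1.splitlines(), v2.splitlines(),
                                 lineterm=""):
    print(line)
```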

Page 31: Reproducibility of model-based results: standards, infrastructure, and recognition

Getting involved

● COMBINE user meeting → next: COMBINE 2015, Oct 11-16, Salt Lake City

● COMBINE developers meeting → next: HARMONY 2016, June 7-11, Auckland

● FAIR-DOM activities: webinars, blogs, foundries

● COMBINE activities: workshops, presentations, tutorials

● Help through specification documents, showcases, mailing lists, ...

http://co.mbine.org/ | http://fair-dom.org/

Page 32: Reproducibility of model-based results: standards, infrastructure, and recognition

State of affairs in 2015

● Standards:

– support for all steps of the modeling cycle

– support of various modeling techniques

– Still: some modeling concepts are not yet covered (→ report of the whole-cell modeling workshop, Waltemath et al 2015, under review)

● Infrastructures:

– Open model repositories

– Software tools export/import standards

– Model management systems

– Education

● Recognition

Page 33: Reproducibility of model-based results: standards, infrastructure, and recognition

Recognition

1) Higher visibility of research

2) Long-term availability

3) Link to other resources

4) Quality-checks

Fig.: Piwowar and Vision (2013) Data reuse and the open data citation advantage. PeerJ

Page 34: Reproducibility of model-based results: standards, infrastructure, and recognition

Model curation and publication in BioModels Database

Fig.: Li et al (2010)

Page 35: Reproducibility of model-based results: standards, infrastructure, and recognition

Functional curation of models through virtual experiments

Fig.: Functional curation of models in the Web Lab. Cooper et al (2015) https://peerj.com/preprints/1338/ ; Cooper et al (2014) doi:10.1016/j.pbiomolbio.2014.10.001

Try out the Cardiac Physiology Web Lab

Page 36: Reproducibility of model-based results: standards, infrastructure, and recognition

Enabling model version control

Fig.: courtesy Martin Scharm, BudHat

Page 37: Reproducibility of model-based results: standards, infrastructure, and recognition

Enabling on-the-fly reproduction of the model-based results

Fig.: Software supporting SBML and SED-ML. Waltemath et al (2011). doi:10.1186/1752-0509-5-198
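As a sketch of such on-the-fly reproduction: libRoadRunner is one SBML-capable simulator with Python bindings (pip install libroadrunner). The model path and simulation window below are placeholders that a full pipeline would take from the SED-ML description:

```python
# Sketch: load an SBML model and rerun its time course directly.
import roadrunner

rr = roadrunner.RoadRunner("model.xml")  # placeholder path
result = rr.simulate(0, 100, 1000)       # start, end, number of points
print(result[:5])                        # first rows: time plus species
```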

Page 38: Reproducibility of model-based results: standards, infrastructure, and recognition

So much for the theory… and in practice?

● Check for existing standards and specifications thereof: http://co.mbine.org

● Get involved in standard development → through the relevant mailing lists

● Problems with getting your model into the right format?

– Is it a problem with finding the appropriate format or tool? → Ask on the relevant mailing list... people are friendly and happy to help.

– Is it a tool problem? → Complain to the tool developers... who will hopefully fix it.

– Is it a problem with the lack of a standard? → Feed back to the standards community… people are friendly and happy to improve the standard.

● Follow best practices when aiming at publishing a result.

Page 39: Reproducibility of model-based results: standards, infrastructure, and recognition

Best practices for publishing reproducible modeling results

1) Encode the model in a standard format, e.g. SBML.

2) Annotate the SBML model, following MIRIAM.

3) Publish the simulation experiment descriptions in standard format, e.g. SED-ML. If unsure what to include, consult the MIASE guidelines.

4) Try to reproduce the results *yourself*.

5) Ask a colleague to reproduce the results.

6) If successful: Archive all steps that led to your results (see the sketch after this list).

7) Disseminate model code and simulation description through an open repository.

Adapted from: Waltemath et al (2013), doi:10.1007/978-94-007-6803-1_10
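A sketch of step 6: bundling the model and its simulation description into a COMBINE archive (OMEX), which is a zip file carrying a manifest.xml that declares each entry's format. A dedicated library (e.g. libCombine) would normally build this; file names are placeholders and the listed files must exist:

```python
# Sketch: hand-build a minimal COMBINE archive (OMEX = zip + manifest).
import zipfile

MANIFEST = """<?xml version="1.0" encoding="utf-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./model.xml" format="http://identifiers.org/combine.specifications/sbml"/>
  <content location="./experiment.sedml" format="http://identifiers.org/combine.specifications/sed-ml"/>
</omexManifest>
"""

with zipfile.ZipFile("study.omex", "w") as omex:
    omex.writestr("manifest.xml", MANIFEST)  # declares each entry's format
    omex.write("model.xml")                  # the SBML model
    omex.write("experiment.sedml")           # the SED-ML description
```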
