data and model management in systems biology

Data and model management in Systems Biology Dagmar Waltemath University of Rostock, Germany Kinetics on the move – Happy 10 th anniversary to SABIO-RK! Heidelberg, 31 st May, 2016 http://www.slideshare.net/dagwa/data-and-model-management-in-systems-biology

Upload: university-of-rostock

Post on 12-Jan-2017


TRANSCRIPT

Page 1: Data and model management in Systems Biology

Data and model management in Systems Biology

Dagmar Waltemath, University of Rostock, Germany

Kinetics on the move – Happy 10th anniversary to SABIO-RK! Heidelberg, 31st May, 2016

http://www.slideshare.net/dagwa/data-and-model-management-in-systems-biology

Page 2: Data and model management in Systems Biology

Junior research group: Management of simulation studies in systems biology

Tool development: SBGN-ED for the graphical representation of networks

Infrastructure: Data management for systems biology in Germany

Standards and tools for model management

www.sems.uni-rostock.de

Page 3: Data and model management in Systems Biology

© 2009 UNIVERSITÄT ROSTOCK 3

NBI-SysBio: Data management for systems biology in Germany

● Sustainable infrastructure for data management

● Access to documented and reproducible results

● Systems Biology Standards

● Tool Development

● Education

www.denbi.de (training – services – jobs)

Page 4: Data and model management in Systems Biology

Photo: NY - http://nyphotographic.com (CC BY-SA 3.0) Photo: janneke staaks on flickr

Fig. courtesy 10.1371/journal.pbio.1001779

Page 5: Data and model management in Systems Biology

Data management is …

● Data management describes procedures and actions that help to store, preserve, organize and control the data generated during a (research) project.

● Aspects of data management include:
– Data Ownership
– Metadata Compilation
– Data Lifecycle Control
– Data Quality
– Data Access and Dissemination

Photo: NY - http://nyphotographic.com (CC BY-SA 3.0)

Page 6: Data and model management in Systems Biology

● Data about data
● Improved understanding of encoded data items
● Descriptive details
● Discovery and search for existing data, online browsing of data
● Standardized and structured information
– Purpose, origin, time references, geographic location, creator, access conditions, and terms of use of your data collection
● Often encoded in ontologies

https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html#data-management

Metadata
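The descriptive fields listed on this slide can be captured as a small structured record. A minimal sketch, assuming a hypothetical `DatasetMetadata` class and placeholder values; none of the names or values come from the talk.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical record for the descriptive fields named on the slide."""
    purpose: str
    origin: str
    time_reference: str
    geographic_location: str
    creator: str
    access_conditions: str
    terms_of_use: str


def to_json(record: DatasetMetadata) -> str:
    """Serialise the record so that tools can exchange and index it."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# All values below are placeholders for illustration only.
example = DatasetMetadata(
    purpose="Illustrative kinetic measurements",
    origin="Placeholder laboratory",
    time_reference="2016-05-31",
    geographic_location="Heidelberg, Germany",
    creator="Jane Doe",
    access_conditions="open",
    terms_of_use="CC BY 4.0",
)
print(to_json(example))
```

Keeping the fields explicit and typed is what makes such a record standardized and searchable; in practice the free-text values would be replaced by ontology terms.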

Page 7: Data and model management in Systems Biology

● Well-structured, controlled vocabularies

● Capture and convey commonly agreed definitions and concepts in a domain

● Communication across people and software tools

● Enable reuse of domain knowledge

● Make implicit domain knowledge explicit and queryable

● Bio-ontologies

– Gene Ontology, ChEBI, UniProt

– Systems Biology Ontology (concepts and terminology for modeling)

Ontologies

Page 8: Data and model management in Systems Biology

Example: Definition of "cell growth" in the Gene Ontology

id: GO:0016049
name: cell growth
namespace: biological_process
def: "The process in which a cell irreversibly increases in size over time by accretion and biosynthetic production of matter similar to that already present."
synonym: "cell expansion" RELATED []
synonym: "cellular growth" EXACT []
synonym: "growth of cell" EXACT []
is_a: GO:0009987 ! cellular process
is_a: GO:0040007 ! growth
relationship: part_of GO:0008361 ! regulation of cell size
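Because the term definition is a plain list of tag-value pairs, software can query it directly. A minimal sketch, assuming a hand-written `parse_stanza` helper (hypothetical, not part of any ontology library) and an abbreviated copy of the stanza with the `def:` line left out:

```python
from collections import defaultdict

# Abbreviated copy of the GO:0016049 stanza from the slide.
STANZA = """\
id: GO:0016049
name: cell growth
namespace: biological_process
synonym: "cell expansion" RELATED []
synonym: "cellular growth" EXACT []
synonym: "growth of cell" EXACT []
is_a: GO:0009987 ! cellular process
is_a: GO:0040007 ! growth
relationship: part_of GO:0008361 ! regulation of cell size
"""


def parse_stanza(text):
    """Collect tag-value pairs; tags may repeat, so values are lists."""
    term = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(": ")
        # Drop the trailing "! human-readable name" comment, if present.
        value = value.split(" ! ")[0].strip()
        term[tag].append(value)
    return dict(term)


term = parse_stanza(STANZA)
parents = term["is_a"]
print(term["name"][0], "is_a", parents)
```

The `is_a` and `relationship` entries are the links that make implicit domain knowledge explicit and queryable across terms.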

Page 9: Data and model management in Systems Biology

● Increased confidence and trust in the data
● Better understanding of how to use the data, and of the data itself
● Better data quality
● Coherent data when standards are used
● Improved business processes (saving time, guaranteeing high quality)
● Improved access to data and improved reproducibility
● Better exploitation of data through easier data exchange and integration

Advantages of careful & planned data management

Page 10: Data and model management in Systems Biology

● Reusable

● Exchangeable

● Interoperable

● Long-term available (in open repositories)

● Curateable

● Shareable

Advantages of standardised data

Page 11: Data and model management in Systems Biology

Photo: janneke staaks on flickr

Page 12: Data and model management in Systems Biology

Research data in the modeling life cycle

● Models: equations, parameters, data tables

● Ideas: text, drawings

● Experimental results: text, data tables

● Publications: text, figures

● Analyses: configuration files, data tables

Fig. courtesy Martin Scharm (adapted)

Page 13: Data and model management in Systems Biology

Research data in the modeling life cycle

● Mathematical formulae

● Networks, diagrams

● Image data

● Publications

● Experiment descriptions

● Experimental results (both lab and simulation)

● Definitions of things (e.g., gene functions, chemical structures...)

Figures top to bottom: (1) By Noah A. Rosenberg et al. Slightly modified by User:Wobble. - Public Library of Science, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=2839383; (2) By http://rsb.info.nih.gov/ij/images/, Public Domain, https://commons.wikimedia.org/w/index.php?curid=655748; (3) BIOM005, generated using CellDesigner 4; (4,5) PMID:18669651

Page 14: Data and model management in Systems Biology

● Heterogeneous

● Highly connected

● Context-dependent

● Distributed

● Big

Research data in the modeling life cycle

Page 15: Data and model management in Systems Biology

The model

● Mathematical equations

● Biological entities

● Kinetic information

● Encoding: SBML & semantic annotations

<bqmodel:isDescribedBy>
  <rdf:Bag>
    <rdf:li rdf:resource="http://identifiers.org/pubmed/18669651"/>
  </rdf:Bag>
</bqmodel:isDescribedBy>

<parameter id="parameter_49" name="L" metaid="metaid_0000078" value="20670"/>
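The annotation and the parameter above can be read back with a standard XML parser. A minimal sketch using Python's `xml.etree.ElementTree`; the wrapping `<model>` element and the two namespace URIs are assumptions added here, because the fragment on the slide omits its namespace declarations.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the slide's fragment does not declare them.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bqmodel": "http://biomodels.net/model-qualifiers/",
}

# The slide's fragment, wrapped in a hypothetical <model> root element.
FRAGMENT = """\
<model xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
  <bqmodel:isDescribedBy>
    <rdf:Bag>
      <rdf:li rdf:resource="http://identifiers.org/pubmed/18669651"/>
    </rdf:Bag>
  </bqmodel:isDescribedBy>
  <parameter id="parameter_49" name="L" metaid="metaid_0000078" value="20670"/>
</model>
"""

root = ET.fromstring(FRAGMENT)

# Namespaced attributes are addressed as "{namespace-uri}localname".
resource_attr = "{%s}resource" % NS["rdf"]
references = [
    li.get(resource_attr)
    for li in root.findall("bqmodel:isDescribedBy/rdf:Bag/rdf:li", NS)
]

param = root.find("parameter")
name, value = param.get("name"), float(param.get("value"))
print(references, name, value)
```

The `isDescribedBy` qualifier links the model to its describing publication through a resolvable identifiers.org URI, while the kinetic value stays in the plain `parameter` element.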

Page 16: Data and model management in Systems Biology

SBML – Standard for model encoding

● Systems Biology Markup Language

● Community-driven de-facto Standard

● Free & open source: www.sbml.org

● Supported by many organizations and tools

● Encodes computational models of biological processes (compartments – species – reactions - parameters)
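The four component types named above (compartments, species, reactions, parameters) correspond to list containers inside an SBML model. A minimal sketch, assuming the SBML Level 3 Version 1 core namespace; the toy model is hypothetical, leaves out attributes a valid file would need, and is not checked against the schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: SBML Level 3 Version 1 core.
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

# Hypothetical toy model: A -> B in one compartment, rate constant k1.
DOC = f"""\
<sbml xmlns="{SBML_NS}" level="3" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell"/>
      <species id="B" compartment="cell"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.1"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def component_ids(doc):
    """Return the ids of each component type found in the model."""
    root = ET.fromstring(doc)
    ns = {"s": SBML_NS}
    containers = {
        "compartment": "listOfCompartments",
        "species": "listOfSpecies",
        "reaction": "listOfReactions",
        "parameter": "listOfParameters",
    }
    return {
        kind: [el.get("id")
               for el in root.findall(f"s:model/s:{container}/s:{kind}", ns)]
        for kind, container in containers.items()
    }


print(component_ids(DOC))
```

For real models a dedicated library such as libSBML is the usual choice; plain XML parsing is used here only to show the structure.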

Page 17: Data and model management in Systems Biology

SBGN – Standard for visual representation

● Systems Biology Graphical Notation

● Standardised glyphs for biological entities

● Three levels

– SBGN-AF (Activity Flow) | SBGN-ER (Entity Relationship) | SBGN-PD (Process Description)

● Free & open source: www.sbgn.org

● Tool support

● Interpretable Format: SBGN-ML

Fig.: http://sbgn.org

Page 18: Data and model management in Systems Biology

Fig.: SBGN map for BIOM183, CellDesigner

SBGN – Standard for visual representation

Fig.: SBGN map for BIOM005, CellDesigner

Page 19: Data and model management in Systems Biology

● Reproduce behaviour of the model

● Publish and share virtual experiments
– Simulation setup / conditions
– Pre- and post-processing
– Observations

● Encoding: [logo] & [logo] & result data in Excel, CSV files

<listOfSimulations>
  <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0" outputEndTime="100" numberOfPoints="100">
    <algorithm kisaoID="KISAO:0000019"/>
  </uniformTimeCourse>
</listOfSimulations>
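The simulation setup in the snippet above is enough to reconstruct the output time grid. A minimal sketch, assuming that `numberOfPoints` counts the intervals after `outputStartTime` (so 100 yields 101 samples); the `output_times` helper is hypothetical and ignores namespaces, which the fragment does not declare.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, reproduced as a self-contained document.
SEDML = """\
<listOfSimulations>
  <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0"
                     outputEndTime="100" numberOfPoints="100">
    <algorithm kisaoID="KISAO:0000019"/>
  </uniformTimeCourse>
</listOfSimulations>
"""


def output_times(xml_text):
    """Return the algorithm id and the list of output time points."""
    sim = ET.fromstring(xml_text).find("uniformTimeCourse")
    start = float(sim.get("outputStartTime"))
    end = float(sim.get("outputEndTime"))
    n = int(sim.get("numberOfPoints"))
    # Assumption: numberOfPoints counts intervals, giving n + 1 samples.
    step = (end - start) / n
    times = [start + i * step for i in range(n + 1)]
    algorithm = sim.find("algorithm").get("kisaoID")
    return algorithm, times


algorithm, times = output_times(SEDML)
print(algorithm, len(times), times[:3], times[-1])
```

The `kisaoID` names the simulation algorithm through a KiSAO ontology term, so the solver choice is recorded in a tool-independent way.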

The analysis

Fig. M. Stefan et al, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596252/

Page 20: Data and model management in Systems Biology

SED-ML – Standard for model analysis

● Links to models used in an analysis

● Pre- and Post-processing of models

● Type of simulation

● Definition of output

● Free and open source: www.sed-ml.org

● Tool support

→ Showcase your tool support online ←

Page 21: Data and model management in Systems Biology

SED-ML – Standard for model analysis

Fig. M. Stefan et al, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596252/

Simulation of BIOM183 in SED-ML Web Tools without simulation description

Page 22: Data and model management in Systems Biology

Coordinate annual meetings

- Next HARMONY: Auckland, June 7-11, 2016
- Next COMBINE: Newcastle, Sep 19-23, 2016

Simulation, Guidelines, Ontologies

Coordinate standards development

- Common procedures
- Interoperable software tools
- Discussion forums, mailing lists...

Represent community

- Funders
- Other communities

Provide standards resources

- Single entry point
- Resolvable URI
- Web infrastructure

Page 23: Data and model management in Systems Biology

Standard-compliant software tools for modeling

The path2models project integrated data from different databases into more than 140,000 SBML models.

Fig.: Büchel et al., BMC Syst Biol (2013), http://www.ebi.ac.uk/biomodels-main/path2models

Page 24: Data and model management in Systems Biology

The Systems Biology Workbench is a software framework to help heterogeneous application components communicate with each other.

Modeling

Editing

Simulating

Analysing

http://sbw.sourceforge.net

Standard-compliant software tools for modeling

Page 25: Data and model management in Systems Biology

The decision whether and how to share data often rests with researchers.

Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779

Page 26: Data and model management in Systems Biology

● Bundling files
● Shipping results
● Exchanging data
● Keeping provenance

● Encoding: zip-like file with a manifest (metadata)
● Generate, modify & share through WebCAT

COMBINE Archive
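The "zip-like file with a manifest" idea can be illustrated with the Python standard library alone. A minimal sketch: the `build_archive` helper and the two payload files are hypothetical, and the manifest namespace and format URIs are written as we understand the OMEX convention, so they should be checked against the COMBINE Archive specification before use.

```python
import io
import zipfile

# Assumed base URI for COMBINE format identifiers.
OMEX = "http://identifiers.org/combine.specifications/"


def build_archive(files):
    """Bundle files into a zip with a manifest.xml listing every entry.

    `files` maps an archive path to (format suffix, payload bytes).
    """
    entries = [f'  <content location="." format="{OMEX}omex"/>']
    for location, (fmt, _) in files.items():
        entries.append(
            f'  <content location="./{location}" format="{OMEX}{fmt}"/>'
        )
    manifest = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<omexManifest xmlns="{OMEX}omex-manifest">\n'
        + "\n".join(entries)
        + "\n</omexManifest>\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", manifest)
        for location, (_, payload) in files.items():
            archive.writestr(location, payload)
    return buffer.getvalue()


# Placeholder payloads stand in for a real model and simulation description.
data = build_archive({
    "model.xml": ("sbml", b"<sbml/>"),
    "simulation.sedml": ("sed-ml", b"<sedML/>"),
})
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    names = archive.namelist()
    manifest_text = archive.read("manifest.xml").decode("utf-8")
print(names)
```

The manifest is what distinguishes the archive from an ordinary zip file: it tells a receiving tool which entry is the model, which is the simulation description, and so on.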

Page 27: Data and model management in Systems Biology

COMBINE Archive

Original publication

SBGN map

SBML model versions

SED-ML files

Open in WebCAT

Open in SEEK

Page 28: Data and model management in Systems Biology

Model curation & publication

Page 29: Data and model management in Systems Biology

Model curation & publication

Page 30: Data and model management in Systems Biology

Model curation, simulation & publication

Page 31: Data and model management in Systems Biology

Introduction to SEEK & FAIRDOM by Olga Krebs.

Page 32: Data and model management in Systems Biology

Thank you for your attention.

http://www.denbi.de/ @SemsProject

http://co.mbine.org