Standards and tools for model management in biomedical research
Dagmar Waltemath, University of Rostock, Germany
Clickable slides are available online from SlideShare.
© OpenStreetMap contributors
Standards and tools for model management
Junior research group: Management of simulation studies in systems biology
Tool development: SBGN-ED for the graphical representation of networks
Infrastructure project: Data management for systems biology in Germany
Figs: BioModels (top) and DOI: 10.1073/pnas.88.16.7328 (bottom)
Most scientific discoveries build on previous findings or on the findings of others.
Fig.: Tyson 2001 (BIOM195)
Fig.: Tyson 1991 (BIOM005)
Goals of scientific publication:
– To announce a result
– To convince readers that the result is correct
Traditional science
● Mathematical, complete proofs
● Result descriptions and protocols in experimental sciences
Computer-driven science
● Data analysis with modular software tools/packages
● Workflows
● Databases rather than direct inquiry from in-house laboratories
Mesirov (2010) Science, doi:10.1126/science.1179653
Can we rely on findings that we ourselves cannot evaluate? (Probably not!)
“only in ~20–25% of the projects were the relevant published data completely in line with our in-house findings (Fig. 1c). In almost two-thirds of the projects, there were inconsistencies [..] that either considerably prolonged the duration of the target validation process or, in most cases, resulted in termination of the projects because the evidence [..] was insufficient to justify further investments into these projects.” (Prinz et al (2011))
Reproducibility issues are discussed among key players in science.
Publication: 10.7554/eLife.04333; Project progress: https://osf.io/e81xl/wiki/Studies/
Fig.: Chris Ryan/Nature, doi: 10.1038/505612a
We identified key challenges of reproducibility in systems biology and systems medicine.
Lack of data standards – Lack of data quality and quantity – Lack of data availability – Lack of transparency
53 researchers, 17 countries, various professions
A lack of suitable data standards hinders researchers from providing reproducible results.
Whole Cell meeting (2015)
– Goal: To identify the needs and shortcomings of today's modeling tasks
– Results:
● New developments initiated (databases, data curation tools, training data, modeling approaches, parameter estimation tools, frameworks, parallel simulators, extensions to standard formats)
● New grant proposals and follow-up projects, new networks, better standards, improved tools
Fig.: Waltemath et al (2016) IEEE TBME, accepted for publication
Project homepage: http://bit.ly/wholecell
A lack of data availability makes it impossible for researchers to reproduce results.
Issues
– Simulation studies comprise several files
– Data is heterogeneous, distributed, and complex
– Documentation of how the study was performed is often missing
● Model code in BioModels, including supplementary material with a how-to for reproducing the figures in the original paper
● Online tool makes data available and browsable
TriplexRNA
Recon 2
● Publication backed up with a website containing the supplemental material
● Model code in (non-curated) BioModels
● Visualisation of the model that can easily be explored
● References to original works
The COMBINE initiative works towards reproducibility and tool interoperability in computational biology.
Fig.: Simulation, Guidelines, Ontologies
Coordinate annual meetings
- Next HARMONY: Auckland, June 7-11, 2016
- Next COMBINE: Newcastle, Sep 19-23, 2016
Coordinate standards development
- Common procedures
- Interoperable software tools
- Discussion forums, mailing lists...
Represent community
- Funders
- Other communities
Provide standards resources
- Single entry point
- Resolvable URI
- Web infrastructure
● Model description (network, parameters, kinetics)
● Visual representation of networks (glyphs)
Fig.: SBGN-PD map, http://sbgn.org
● Simulation setup
● Definition of observed variables (plots, data tables)
● All files that belong to a (reproducible) simulation study
● Description of archive content
● Have a look at a fully featured COMBINE archive on GitHub
Figs: BioModels
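The archive bullets above can be illustrated with a minimal Python sketch that bundles a model file and a simulation description into one ZIP file, together with a manifest.xml that lists each file and its format and flags the "master" file a tool should open first. It assumes the OMEX manifest namespace and the identifiers.org format URIs used in the examples of the COMBINE archive specification (check the current version of the specification before relying on them); the file names and placeholder contents are hypothetical, and this uses only the standard library rather than any of the dedicated COMBINE archive tools.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace and format-URI prefix, taken from the OMEX manifest examples;
# verify against the current COMBINE archive specification.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
FORMAT_BASE = "http://identifiers.org/combine.specifications/"


def build_manifest(formats, master):
    """Return manifest.xml bytes; formats maps file name -> format short name."""
    # Writing xmlns as a plain attribute puts every element into OMEX_NS on parsing.
    root = ET.Element("omexManifest", xmlns=OMEX_NS)
    # The archive itself and the manifest are listed as content entries, too.
    ET.SubElement(root, "content", location=".", format=FORMAT_BASE + "omex")
    ET.SubElement(root, "content", location="./manifest.xml",
                  format=FORMAT_BASE + "omex-manifest")
    for name, short in formats.items():
        attrs = {"location": "./" + name, "format": FORMAT_BASE + short}
        if name == master:
            attrs["master"] = "true"  # the entry a tool should open first
        ET.SubElement(root, "content", attrs)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_archive(files, formats, master):
    """Bundle files (name -> bytes) plus a manifest into an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", build_manifest(formats, master))
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

For example, write_archive({"model.xml": sbml_bytes, "simulation.sedml": sedml_bytes}, {"model.xml": "sbml", "simulation.sedml": "sed-ml"}, master="simulation.sedml") returns the archive bytes, which can be saved with an .omex extension. Keeping the manifest inside the ZIP is what lets any receiving tool discover the contents without guessing from file names.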
Use of standard formats leads to interoperable software.
Fig.: Connected web tools (panels: search for "ubiquitin", results, export)
Query database for annotations, persons, simulation descriptions
Retrieve information about models, simulations, figures, documentation
Export simulation study as COMBINE archive
Download archive and open the study with your favourite simulation tool
Open archive in CAT to modify its contents and to share it with others
Cardiac Electrophysiology Web Lab, Oxford
M2CAT, SEMS
WebCAT, SEMS
JWS Online, Stellenbosch, SA
SED-ML Web Tools, BIOQUANT
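When a downloaded archive is opened in a simulation tool, the first step is to read its manifest. The following sketch, under the same assumption about the OMEX manifest namespace, returns every listed file with its format URI and the entry flagged as master. The helper files_of_kind is hypothetical; it is written to match both plain format names (sbml) and versioned ones (sbml.level-3.version-1), since the specification allows format URIs of either form.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed OMEX manifest namespace; verify against the COMBINE archive specification.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"


def read_manifest(archive_bytes):
    """Return (entries, master): entries maps location -> format URI."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
    entries, master = {}, None
    for content in root.findall(f"{{{OMEX_NS}}}content"):
        location = content.get("location")
        entries[location] = content.get("format")
        if content.get("master") == "true":
            master = location
    return entries, master


def files_of_kind(entries, kind):
    """Hypothetical helper: locations whose format URI names the given kind."""
    selected = []
    for location, fmt in entries.items():
        last = fmt.rstrip("/").rsplit("/", 1)[-1]  # e.g. 'sbml.level-3.version-1'
        if last.split(".")[0] == kind:             # drop level/version suffix
            selected.append(location)
    return sorted(selected)
```

A tool would typically start from the master entry (for instance a SED-ML file) and use files_of_kind(entries, "sbml") to locate the model files that the simulation description refers to.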
We develop tools that help researchers manage standardised data efficiently.
Storage / Search, retrieval & ranking
Using graph databases to integrate standardised model-based data.
doi: 10.1093/database/bau130
doi: 10.1186/s13326-015-0014-4
Search across heterogeneous data, ontologies, and structures.
https://dx.doi.org/10.6084/m9.figshare.3382993.v1
SED-ML DB in JWS Online
Our methods are tested & used in major model repositories.
BioModels, Physiome Model Repository
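The search idea on this slide (query heterogeneous, annotated model data and rank the hits) can be illustrated, very roughly, without a database. In the sketch below an in-memory adjacency map stands in for the graph database: models, ontology terms, and persons are nodes, annotations are edges, and a keyword query scores each model by how many matching nodes it is, or is connected to. The ModelGraph class, the "model:" node-id prefix, and this counting score are hypothetical simplifications, not the storage schema or ranking method of the cited papers.

```python
from collections import defaultdict


class ModelGraph:
    """Tiny in-memory stand-in for a graph database (hypothetical schema)."""

    def __init__(self):
        self.labels = {}               # node id -> human-readable label
        self.edges = defaultdict(set)  # node id -> ids of neighbouring nodes

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def link(self, a, b):
        # Undirected edge, e.g. model <-> ontology term or model <-> person
        self.edges[a].add(b)
        self.edges[b].add(a)

    def search(self, keyword, model_prefix="model:"):
        """Rank models by the number of matching nodes they are or touch."""
        keyword = keyword.lower()
        hits = [n for n, label in self.labels.items() if keyword in label.lower()]
        scores = defaultdict(int)
        for hit in hits:
            # A matching node counts for itself and for every linked model.
            for node in self.edges[hit] | {hit}:
                if node.startswith(model_prefix):
                    scores[node] += 1
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For instance, if a model labelled "Ubiquitin-mediated degradation model" is linked to the terms "ubiquitin ligase complex" and "protein ubiquitination", and a second model labelled "Cell cycle model" is linked only to "ubiquitin ligase complex", then search("ubiquitin") ranks the first model (score 3) above the second (score 1). A real graph database adds persistent storage, typed relationships, and indexed queries on top of this basic traversal.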
Transfer of results / Version control & provenance
Bundling files necessary to reproduce a modeling result.
doi: 10.1093/bioinformatics/btv484
Figure courtesy Martin Scharm, SlideShare
Tracking the development of simulation studies over time.
https://dx.doi.org/10.6084/m9.figshare.2543059.v5
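Tracking how a simulation study develops over time comes down to being able to say what changed between two versions. The sketch below assumes, as a deliberate simplification, that a model is reduced to a flat dictionary of named parameters (real model-diffing tools compare the full XML structure of SBML or CellML documents). It records numbered versions with a commit message and reports added, removed, and changed parameters. The ModelHistory class and the parameter names used below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    message: str      # provenance: why this version was created
    parameters: dict  # simplified model: parameter name -> value


@dataclass
class ModelHistory:
    versions: list = field(default_factory=list)

    def commit(self, parameters, message):
        """Store a copy of the parameters as a new version; return its number."""
        version = Version(len(self.versions) + 1, message, dict(parameters))
        self.versions.append(version)
        return version.number

    def diff(self, old, new):
        """Compare two version numbers (1-based) parameter by parameter."""
        a = self.versions[old - 1].parameters
        b = self.versions[new - 1].parameters
        return {
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
            "changed": {k: (a[k], b[k])
                        for k in sorted(a.keys() & b.keys()) if a[k] != b[k]},
        }
```

For example, committing {"k1": 0.015, "k2": 200.0} and then {"k1": 0.02, "k3": 1.0}, and calling diff(1, 2), reports k3 as added, k2 as removed, and k1 as changed from 0.015 to 0.02. Storing a copy of the parameters on each commit keeps earlier versions unchanged even if the caller later mutates its dictionary.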
How can we bridge the gap between standards for systems biology and systems medicine?
Fig. courtesy Atalag et al (2015) http://hdl.handle.net/2292/27911
Research results must be well documented, comprehensible, and reproducible to be trustworthy and reusable.
Current status: Many scientific studies in the life sciences are not reproducible.
Ways out:
● Blogs and databases
● Detailed documentation
● Open data
● Standards
● Reproducibility initiative
● Sustainable software
● Infrastructure
Desired status: Comprehensible, findable, available, correct models and simulation studies.
Waltemath and Wolkenhauer (2016) How modeling standards, software, and initiatives support reproducibility in systems biology and systems medicine. Accepted for publication, IEEE Transactions on Biomedical Engineering
Thank you for your attention.
http://www.denbi.de/
Gary Bader, Mike Hucka, Chris Myers, David Nickerson, Dagmar Waltemath, Nicolas Le Novère, Martin Golebiewski, Falk Schreiber
@SemsProject
http://co.mbine.org