Presentation: NBIC 2011 templates
A democracy of reporting standards for omics studies
Kees van Bochove
NBIC BioAssist taskforce leader
metabolomics and study capture
Origin of the question: consortia
Their aims:
Get an overview of the studies of all partners
Share study data
Standardization of bioinformatics
Nutritional Phenotype Database Project (dbNP)
http://dbnp.org
Data Support Platform (DSP)
http://www.nmcdsp.org
Central question: How do I turn study descriptions and metadata tables into a persistent, queryable database?
Database
Query
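As a minimal sketch of the central question, the snippet below loads a tab-delimited metadata table into a database and queries it. SQLite is only a stand-in for a persistent store, and the table and column names (samples, sample, subject, diet) are hypothetical; GSCF itself is a Grails web application and is not implemented this way.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited metadata table; column names are illustrative only.
METADATA_TSV = (
    "sample\tsubject\tdiet\n"
    "S1\tmouse_01\thigh fat\n"
    "S2\tmouse_02\tcontrol\n"
    "S3\tmouse_03\thigh fat\n"
)


def load_metadata(conn, tsv_text):
    """Load a tab-delimited metadata table into a 'samples' table."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(sample TEXT PRIMARY KEY, subject TEXT, diet TEXT)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (:sample, :subject, :diet)", rows
    )
    conn.commit()
    return len(rows)


def samples_with_diet(conn, diet):
    """Query the stored metadata: which samples belong to a given diet?"""
    cur = conn.execute(
        "SELECT sample FROM samples WHERE diet = ? ORDER BY sample", (diet,)
    )
    return [row[0] for row in cur.fetchall()]


# ":memory:" keeps the sketch self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
loaded = load_metadata(conn, METADATA_TSV)
high_fat = samples_with_diet(conn, "high fat")
```

Replacing ":memory:" with a file name makes the table persistent between runs.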
We studied many available open source solutions… and finally decided to create our own
• Pedro
• OpenBIS
• WikiLIMS
• ISACreator
• SysMO-DB
• Annotare
• LabKey
• MOLGENIS
• i3Cube
• And more…
Open formats:
• XML formats: MAGE-ML, FuGE, LabKey etc.
• Tab-delimited formats: MAGETAB, ISATAB, XGAP
• RDF
GSCF: Generic Study Capture Framework
• Open source web application, developed in Grails (= Groovy on Rails)
• Groovy is an extension of the Java language; it compiles to Java bytecode (can be run on any Java VM)
• Development started October/November 2009, with on average 4 full-time programmers since then
• Current version is 0.8.0
• Info: http://dbnp.org
• Test it: http://demo.dbnp.org
• Source code: http://trac.nbic.nl/gscf
GSCF homepage
Chris Taylor (MIBBI) about data standards
“Coverage of experimental design in current bioinformatics standards is meagre at best”
Study design
GSCF study design wizard
Use of ontologies: users don’t like long lists of terms
Study design overview
Machiel Jansen about Knowledge Representation
“The representation of knowledge will always depend on its use”
Study overview – which columns should be there?
Different ‘data levels’ in a study
• Study (meta level)
• Subject (source organism, e.g. humans, mice, plants, cell lines)
• Event (e.g. treatment, compound, diet)
• Sampling Event (e.g. DNA isolation, liver sampling)
• Sample (e.g. blood sample, urine sample)
• Assay (e.g. transcriptomics, metabolomics, sequencing)
• Lines up mostly with both ISATAB and MIBBI Foundry
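The data levels above can be sketched as a small set of linked record types. This is an illustration only: the class names follow the slide, but the attributes and the links between levels (for example, that a sample points to one subject and one sampling event) are assumptions and not the actual dbNP data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subject:
    name: str
    species: str  # e.g. human, mouse, plant, cell line


@dataclass
class Event:
    name: str  # e.g. a treatment, compound or diet


@dataclass
class SamplingEvent:
    name: str  # e.g. DNA isolation, liver sampling


@dataclass
class Sample:
    name: str
    subject: Subject               # assumed link: one source subject
    sampling_event: SamplingEvent  # assumed link: one sampling event


@dataclass
class Assay:
    name: str
    module: str  # e.g. transcriptomics, metabolomics, sequencing
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Study:
    title: str
    subjects: List[Subject] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    sampling_events: List[SamplingEvent] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


def subjects_in_assay(assay: Assay) -> List[str]:
    """Walk from the assay level back to the subject level."""
    return sorted({sample.subject.name for sample in assay.samples})


mouse = Subject("mouse_01", "mouse")
biopsy = SamplingEvent("liver sampling")
liver = Sample("liver_01", mouse, biopsy)
assay = Assay("liver metabolites", "metabolomics", [liver])
study = Study(
    "Example diet study",
    subjects=[mouse],
    events=[Event("high fat diet")],
    sampling_events=[biopsy],
    samples=[liver],
    assays=[assay],
)
```

Keeping each level as its own type is what allows a question asked at one level (an assay) to be answered at another (the subjects behind it).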
GSCF template editor – Subject level
GSCF template editor – Event level
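One way to think about a template is as a named set of fields attached to a data level, against which a record can be checked. The sketch below assumes this reading; the names Template, TemplateField and validate, and the example fields, are hypothetical and do not reflect the GSCF implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class TemplateField:
    name: str
    kind: type          # expected Python type of the value
    required: bool = False


@dataclass(frozen=True)
class Template:
    name: str
    level: str  # e.g. "Subject" or "Event"
    fields: Tuple[TemplateField, ...]

    def validate(self, record: Dict[str, Any]) -> List[str]:
        """Return a list of problems; an empty list means the record fits."""
        problems = []
        known = {f.name for f in self.fields}
        for f in self.fields:
            if f.name not in record:
                if f.required:
                    problems.append(f"missing required field: {f.name}")
            elif not isinstance(record[f.name], f.kind):
                problems.append(f"wrong type for field: {f.name}")
        for key in record:
            if key not in known:
                problems.append(f"unknown field: {key}")
        return problems


# Hypothetical subject-level template.
MOUSE_TEMPLATE = Template(
    name="Mouse",
    level="Subject",
    fields=(
        TemplateField("species", str, required=True),
        TemplateField("age_weeks", int),
        TemplateField("gender", str),
    ),
)

ok = MOUSE_TEMPLATE.validate({"species": "Mus musculus", "age_weeks": 12})
bad = MOUSE_TEMPLATE.validate({"age_weeks": "twelve", "strain": "C57BL/6"})
```

Here ok is an empty list, while bad reports the missing species, the wrongly typed age and the field the template does not know about.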
Barend Mons about structured data
“Everyone wants structured data, but no one wants to fill out the forms”
Importer – upload Excel file
Importer – map your Excel file onto templates
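The mapping step can be sketched as a lookup from spreadsheet column headers to template field names. To stay self-contained the example reads comma-separated text instead of a real Excel file, and the headers, field names and the map_rows helper are all hypothetical.

```python
import csv
import io

# Stand-in for an uploaded spreadsheet (a real importer would read Excel).
SHEET = (
    "Animal ID,Sex,Age (wk),Notes\n"
    "m01,F,12,ok\n"
    "m02,M,14,\n"
)

# Hypothetical mapping chosen by the user: column header -> template field.
COLUMN_MAP = {"Animal ID": "name", "Sex": "gender", "Age (wk)": "age_weeks"}


def map_rows(sheet_text, column_map):
    """Rename mapped columns to template fields; report unmapped columns."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    unmapped = [col for col in reader.fieldnames if col not in column_map]
    records = [
        {field: row[col] for col, field in column_map.items()}
        for row in reader
    ]
    return records, unmapped


records, unmapped = map_rows(SHEET, COLUMN_MAP)
```

Values stay as strings in this sketch; converting them to the types a template expects would be a separate step.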
Jildau Bouwman about study capturing
“If we really want to do personalized health research, we have to capture everything that might affect our measurements!”
dbNP data model
REST protocol
Transcriptomics module
Metabolomics module
Next Generation Sequencing module
Query composer
Query results on Study level
Query results on Sample level
Query results on Assay level
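The idea of viewing one query's results at Study, Sample or Assay level can be sketched as a filter followed by a roll-up to the requested level. The flat records, the tissue attribute and the query function below are hypothetical and only illustrate the principle, not how the GSCF query composer works internally.

```python
# Hypothetical flat records, one per sample measured in an assay.
RECORDS = [
    {"study": "Study A", "sample": "A-1", "assay": "metabolomics", "tissue": "liver"},
    {"study": "Study A", "sample": "A-2", "assay": "transcriptomics", "tissue": "blood"},
    {"study": "Study B", "sample": "B-1", "assay": "metabolomics", "tissue": "liver"},
    {"study": "Study C", "sample": "C-1", "assay": "sequencing", "tissue": "urine"},
]

LEVELS = ("study", "sample", "assay")


def query(records, level, **criteria):
    """Return the distinct values at `level` for records matching all criteria."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    hits = [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]
    return sorted({r[level] for r in hits})


# The same criterion, reported at three different levels.
studies = query(RECORDS, "study", tissue="liver")
samples = query(RECORDS, "sample", tissue="liver")
assays = query(RECORDS, "assay", tissue="liver")
```

The criterion is evaluated once per record; only the column used for the roll-up changes between the three result views.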
Next steps
• Within the NMC DSP project, we will create a ‘GSCF data fetch’ functionality in Galaxy, enabling the execution of workflows on specific data-slices from the database
• Connect to Semantic Web efforts (OpenPHACTS project) – we also have a pilot with TNO and UvA on using a triple store to enrich GSCF assay results
• Align with other projects: e.g. Hackathon result gscf4molgenis
• Employ the NBIC philosophy – these tools are also available to you!
Hackathon results – GSCF–MOLGENIS adapter
http://hackathon.nmcdsp.org | http://trac.nbic.nl/gscf4molgenis
Acknowledgements
Tjeerd Abma, Adem Bilican, Jildau Bouwman, Christine Chichester, Sudeshna Das, Marjan van Erk, Chris Evelo, Prasad Gajula, Roeland van Ham, Thomas Hankemeier, Margriet Hendriks, Guido Hooiveld, Robert Horlings, Peter Horvatovich, Rob Hooft, Machiel Jansen, Jim Kaput, Kostas Karasavvas, Bart Keijser, Matthew Lange, Scott Marshall, Barend Mons, Ben van Ommen, Linette Pellis, Janneke van der Ploeg, Marijana Radonjic, Theo Reijmers, Erik Roos, Marco Roos, Frans Paul Ruzius, Jahn Saito, Susanna Sansone, Siemen Sikkema, Rob Stierum, Eugene van Someren, Morris Swertz, Chris Taylor, Michael van Vliet, Jeroen Wesbeek, Katy Wolstencroft, Suzan Wopereis, Gooitzen Zwanenburg