Susanna Sansone at DataCite: The ISA-Commons - experiences from the field
DESCRIPTION
Susanna-Assunta Sansone's talk at the DataCite Summer meeting in Copenhagen on "The ISA-Commons - experiences from the field", 14th June 2012
TRANSCRIPT
The ISA Commons: experiences from the field
Susanna-Assunta Sansone, PhD
Principal Investigator, Team Leader, University of Oxford e-Research Centre,
Oxford, UK
http://uk.linkedin.com/in/sasansone #biosharing
DataCite Summer Meeting DIGITAL RESEARCH DATA IN PRACTICE: solutions for improving discovery, access and use
June 14, 2012 Copenhagen
Bioscience
• Reproducible research
• annotated research data and methods offer new discovery opportunities and prevent unnecessary repetition of work;
• improved data sharing underpins science of the future;
• but... shared data have little or no value if they are not interpretable and, consequently, reusable
Image from datacite.org
Reproducibility
Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
Reproducibility
Across studies and groups
Reproducibility
NO to ‘data blobs’
YES to verifiable, complete and structured information
Image from datacite.org
Structured description of datasets
• Capture all salient features of the experimental workflow
• Make annotation explicit and discoverable
• Structure the descriptions for consistency, tracking independent variables and dependent variables, using cross-references and resolvable identifiers
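As a rough illustration of what a structured description might look like in practice, the sketch below reads a small tab-delimited sample table and separates independent variables from dependent variables, keeping a resolvable identifier alongside each record. The table, the column names (`sample_id`, the `factor:` and `measurement:` prefixes, `term_ref`) and the example.org URL are all hypothetical and chosen only for this example; this is not the ISA-Tab format and not code from the talk.

```python
import csv
import io

# Hypothetical tab-delimited sample table. Column names and the
# example.org identifier are illustrative assumptions, not ISA-Tab.
TABLE = (
    "sample_id\tfactor:dose_mg_per_kg\tmeasurement:alt_u_per_l\tterm_ref\n"
    "S1\t0\t35.0\thttp://example.org/term/0001\n"
    "S2\t50\t80.5\thttp://example.org/term/0001\n"
)


def split_variables(text):
    """Parse the table and group columns by their role in the experiment."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records = []
    for row in reader:
        records.append({
            "sample": row["sample_id"],
            # Independent variables: what the experimenter controlled.
            "independent": {
                key.split(":", 1)[1]: value
                for key, value in row.items()
                if key.startswith("factor:")
            },
            # Dependent variables: what was measured.
            "dependent": {
                key.split(":", 1)[1]: value
                for key, value in row.items()
                if key.startswith("measurement:")
            },
            # Identifier pointing at a shared term definition.
            "term_ref": row["term_ref"],
        })
    return records


records = split_variables(TABLE)
for record in records:
    print(record["sample"], record["independent"], record["dependent"])
```

Because the variable roles are explicit in the column names rather than implied, a reader or a program can tell which values were set and which were observed without consulting the original authors.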
• We must strike a balance between
  • depth and breadth of information; and
  • sufficient information required to reuse the data
Not too much, not too little, just ‘right’
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
Example of experiments by InnoMed PredTox, an FP6 public-private consortium
Different communities, different norms and standards, e.g.:
• report the same core, essential information
• use the same word and refer to the same ‘thing’
• allow data to flow from one system to another
Challenges: lack of coordination, fragmentation and uneven coverage
Growing number of reporting standards
+ 130
Estimated
+ 150
Source: MIBBI, EQUATOR
+ 303
Source: BioPortal
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
A catalogue to map the landscape of standards and the systems implementing them: Over 400 bio-standards (public and in curation)
Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
Source of the figure: EBI website
Bioscience is not one domain
• Bioscience is interdisciplinary and integrative in character
• need to deal with new and existing datasets
• deal with a variety of data types
Is it possible to achieve a common, structured
representation of diverse bioscience experiments that:
• transcends individual bioscience domains, but also
• follows the appropriate community norms and
standards?
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• also by communities working to build a library of cellular signatures
We aim to achieve a common representation of experimental content that transcends individual bioscience domains
Sansone et al., Towards interoperable bioscience data. Nature Genetics 44, 121-126 (2012) doi:10.1038/ng.1054
Nanotechnology Informatics Working Group
Some of the internal projects: Some of the public groups/resources:
Stem Cell Commons
Metadata tracking framework, designed to support:
• the use of several standards: checklists and terminologies
• conversions to (a growing number of) other metadata formats used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT
• (Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLSIG)
empowering researchers to use standards
To mint DOIs
www.biosharing.org
www.isacommons.org
TOWARDS INTEROPERABLE BIOSCIENCE DATA
Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.
Feb 2012 www.isacommons.org
doi:10.1038/ng.1054
Development timeline (figure, 2007 to 2012; tracks: Community involvement and uptake, Core developments, Publications)
Milestones shown:
• Strawman ISA-Tab spec
• 1st ISA-Tab workshop
• 2nd ISA-Tab workshop
• 3rd ISA-Tab workshop
• Final ISA-Tab spec
• Database instance at EBI
• ISA software v1
• 1st public instance: Harvard Stem Cell Discovery Engine
• RDF format starts
• Conversions to Pride-XML/SRA-XML/MAGE-Tab and more
• User workshops/visits - start
• Growing number of systems starts to adopt ISA-Tab
• Other tools implement ISA-Tab
• Links to analysis tools starts
Publications shown:
• ‘Omics data sharing (Science)
• ISA-Tab and ISA software suite (Bioinformatics)
• Stem Cell Discovery Engine (NAR)
• Workshop reports
• ISA Commons (Nature Genetics)