the genomic standards consortium - minimum information checklists for standardised reporting of...
Standards to support science http://gensc.org/ The Genomic Standards Consortium Minimum information checklists for standardised reporting of metagenome data Ecogenomics: from Data to Knowledge 13-14 February, 2014 CSIRO Canberra, Australia

Upload: australian-bioinformatics-network

Post on 27-Jan-2015

DESCRIPTION

Outline of Talk: - The Genomic Standards Consortium - Reporting raw data and metadata (contextual data)

TRANSCRIPT

Page 1: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Standards to support science http://gensc.org/

The Genomic Standards Consortium

Minimum information checklists for standardised reporting of metagenome data

Ecogenomics: from Data to Knowledge

13-14 February, 2014

CSIRO, Canberra, Australia

Page 2: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Introducing myself

University of Oxford e-Research Centre staff member since December 2011

Project: development of a metagenomics portal at the EBI (BBSRC grant)

Visitor at the European Bioinformatics Institute (EBI)

Secretary of the Genomic Standards Consortium (GSC)

Page 3: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Outline of this talk

The Genomic Standards Consortium

Reporting raw data and metadata (contextual data)

Page 4: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Introduction to the Genomic Standards Consortium (GSC)

The GSC was established in 2005. It is an open membership community working towards better descriptions of our collection of genomes, metagenomes and marker gene sets

Page 5: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

The GSC Mission

the implementation of new (meta)genomic standards

methods of capturing and exchanging metadata

harmonization of metadata collection and analysis efforts across the wider genomics community

Page 6: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

The GSC fulfils its mission by

• Organizing meetings
• Forming working groups
• Creating Consensus Products

Page 7: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

2005: where is the contextual data?

“It is now clear that the full potential of sequence analysis can only be achieved if the geographic and environmental context of the sequence data is considered, herewith referred to as contextual data”

Page 8: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Community-driven solutions

GSC 11, Hinxton, 2010
GSC 12, Bremen, 2011
GSC 13, BGI, 2012
GSC 14, Oxford, 2012

Taking the ‘Common Path’ towards building consensus:

• Identify the problem
• Define a community to address it
• Define scope of the solution
• Implement solution
• Gain adoption of solution

Page 9: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

What are standards?

A standard is a convention that gives uniformity to an area of research or innovation.

Standards unite groups and enable collective change.

Standards provide the language in which innovation is written.

Page 10: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Standards principles

Not everything should be ‘standardized’

Aggregation of data, information, and knowledge requires standard ways of doing things

Standards provide foundations; Standards should drive innovation (think of electrical plugs or the internet)

Pick the right concepts to standardize – at the right time, with the right people

Requires good ‘group think’ – or ‘systems thinking’

Page 11: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

What, when, where, how?

Contextual data

Taxa
Habitat
Date and Time
Latitude/Longitude
Environmental measurements
DNA extraction method
Sequencing method

Page 12: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

GSC Standards

Page 13: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

GSC Minimum Information checklists

Minimum Information about any Sequence (MIxS)

• Minimum Information about a (Meta)Genome Sequence: MIGS/MIMS specifies a formal way to describe genomes/metagenomes in more detail than is currently captured in public repository documents.

• Minimum Information about a MARKer gene Sequence: the MIMARKS checklist is an 'electronic laboratory notebook' containing the core contextual data items required for consistent reporting of marker gene investigations. MIMARKS uses the MIGS/MIMS checklists with respect to the nucleic acid sequence source and sequencing contextual data, but extends them with further experimental contextual data such as PCR primers and conditions, or target gene name.

Page 14: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Use of MIxS

Please provide this minimum information when you publish

• a genome
• a metagenome
• a marker gene study (e.g. ribosomal genes)

INSDC (DDBJ, ENA, GenBank) accept this information and encourage its submission to their public DNA databases

Page 15: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Core MIxS

Item                                              BA  EU  PL  VI  ORG  ME  SU  SP
submitted to INSDC                                M   M   M   M   M    M   M   M
investigation type                                M   M   M   M   M    M   M   M
project name                                      M   M   M   M   M    M   M   M
geographic location (latitude and longitude)      M   M   M   M   M    M   M   M
geographic location (country and/or sea, region)  M   M   M   M   M    M   M   M
collection date                                   M   M   M   M   M    M   M   M
environment (biome)                               M   M   M   M   M    M   M   M
environment (feature)                             M   M   M   M   M    M   M   M
environment (material)                            M   M   M   M   M    M   M   M
environmental package                             M   M   M   M   M    M   M   M
sequencing method                                 M   M   M   M   M    M   M   M

“M” = mandatory, “C” = conditional mandatory, “X” = recommended, “-” = not applicable

Truly “minimal” with 11 contextual data items
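
As a rough illustration of how the core checklist could be used in practice, the sketch below checks a metadata record for the 11 core items listed on this slide. The dictionary keys simply reuse the item labels from the slide, and the function name and example record are hypothetical; this is not an official GSC or INSDC validator.

```python
# Core MIxS items as listed on the slide; each is mandatory ("M")
# for all eight checklist types.
CORE_MIXS_ITEMS = [
    "submitted to insdc",
    "investigation type",
    "project name",
    "geographic location (latitude and longitude)",
    "geographic location (country and/or sea,region)",
    "collection date",
    "environment (biome)",
    "environment (feature)",
    "environment (material)",
    "environmental package",
    "sequencing method",
]


def missing_core_items(record):
    """Return the core items that are absent or empty in a metadata record."""
    missing = []
    for item in CORE_MIXS_ITEMS:
        value = record.get(item)
        if value is None or not str(value).strip():
            missing.append(item)
    return missing


# Hypothetical, made-up record for illustration only.
example_record = {
    "project name": "example marine survey",
    "collection date": "2014-02-13",
    "sequencing method": "",  # present but empty, so it counts as missing
}

missing = missing_core_items(example_record)
print(f"{len(missing)} of {len(CORE_MIXS_ITEMS)} core items missing:")
for item in missing:
    print(" -", item)
```

A check like this only confirms that a value is present; it does not confirm that the value is well formed or drawn from a controlled vocabulary.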

Page 16: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

MIxS Standards

Yilmaz et al. Nature Biotech. 2011; 29:415-420

Page 17: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Example Checklist Construction

Page 18: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Controlled vocabularies and ontologies

Consistent reporting greatly enhances the usability of data

The MIxS standard provides a number of controlled vocabularies

The GSC encourages the use of ontologies, e.g. EnvO (environmental ontology)

Page 19: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Example: some metadata from marine sample

Page 20: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Meta-Analysis

Genes/OTUs Environment (pH)

Page 21: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

New sequencing technologies: rapid data increase

In recent years, new sequencing technologies have been developed.

• Sequencing cost per base has dropped rapidly
• Amount of sequence in INSDC database is currently doubling every 8 months
• Democratisation of sequencing: bench top sequencers make technology available to individual labs
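
To give a feel for what a doubling time of 8 months implies, the short sketch below computes the growth factor over a given period, assuming the doubling time stays constant (an assumption made here for illustration; the slide only states the current rate). The function and constant names are hypothetical.

```python
DOUBLING_TIME_MONTHS = 8  # figure quoted on the slide


def growth_factor(months, doubling_time=DOUBLING_TIME_MONTHS):
    """Factor by which the data volume grows after `months`,
    assuming a constant doubling time."""
    return 2 ** (months / doubling_time)


for years in (1, 2, 5):
    factor = growth_factor(12 * years)
    print(f"after {years} year(s): about {factor:.1f}x the starting volume")
```

Under this assumption, two years (24 months) is three doubling periods, i.e. an 8-fold increase.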

Page 22: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

The Data Bonanza

To exploit fully the promise of scientific data we need both innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all.

Requires the evolution of our scientific, technological and sociological thinking...

Page 23: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

The GSC is running a range of consensus-driven projects and is now making a call for community compliance/community involvement

More information: http://gensc.org/

The next GSC meeting will be held in Oxford, UK (30 March-2 April 2014)

Page 24: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

gensc.org

Page 25: The Genomic Standards Consortium - Minimum Information Checklists for Standardised Reporting of Metagenome Data - Peter Sterk

Acknowledgements

The GSC efforts are contributed on a volunteer basis by a wide range of participants, including GSC authors, working group members, workshop participants and adopters.

Special Thanks to the GSC Board:

Linda Amaral-Zettler, MBL
Guy Cochrane, EMBL-EBI
Jim Cole, MSU
Neil Davies (Berkeley)
Peter Dawyndt, University of Ghent
Dawn Field, CEH (Chair of GSC)
George Garrity, MSU
Jack Gilbert, Argonne National Lab
Frank Oliver Glöckner, MPI-Bremen
Lynette Hirschman, MITRE
Hans-Peter Klenk, DSMZ
Renzo Kottmann, MPI-Bremen
Rob Knight (University of Colorado)
Nikos Kyrpides, DOE, JGI
Folker Meyer, Argonne National Lab
Norman Morrison (University of Manchester)
Inigo San Gil, LTER
Susanna Sansone, University of Oxford
Lynn Schriml, University of Maryland (Treasurer of GSC)
Peter Sterk, GSC (Secretary of GSC)
Dave Ussery, DTU
Owen White, University of Maryland
John Wooley, UCSD (PI of RCN4GSC)

Institutional Liaisons to the GSC Board:

Ilene Mizrachi (NCBI/GenBank)
Tatiana Tatusova (NCBI/RefSeq)