
Post on 31-Mar-2015


Genomes OnLine Database: The GOLD genomic standards

Submitting genomic and metagenomic projects in img/GOLD

Ioanna Pagani

Genomic Standards Group

ipagani@lbl.gov

Introduction

The Genomes OnLine Database (GOLD) is a centralized resource for the continuous monitoring of world-wide genomic and metagenomic sequencing projects, uniquely integrated with their associated metadata.

GOLD has two interfaces:

1. Public GOLD currently holds about 12,200 public genomic and metagenomic projects, whose metadata anyone can view at http://www.genomesonline.org/

A project becomes public when a GOLD id is assigned to it, e.g. Gi12200.

2. img/GOLD is the interface through which img users enter data, at http://img.jgi.doe.gov/cgi-bin/gold/gold.cgi

Projects without a GOLD id cannot be viewed by the general public.

Public GOLD

http://www.genomesonline.org/

The public interface lets users view lists of genomic and metagenomic projects and run multiple queries based on the organism or project characteristics they choose.

Public interface: multiple queries

http://www.genomesonline.org/

Queries by 50 different metadata fields

Metadata fields are classified in several categories: organism info, genome project info, external links, environment metadata, host metadata, organism metadata, metagenome classification, etc.

Example query: microbial samples isolated from the USA.

A query by the chosen metadata fields returns a complete interactive list of organisms with hyperlinks to taxonomy, publication, sequencing center and GenBank accession, along with a Google Map pinpointing the collection location of each organism where available.

GOLDstamp IDs and GOLD Cards

GOLDstamp IDs are indexes for public projects: Gi, Gc, Gm, Gs

If a project does not have a GOLDstamp ID it is not visible to the public.
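As a minimal sketch of the rule above, the following Python checks whether a string looks like a GOLDstamp ID. The pattern (one of the four prefixes followed by digits) is an assumption inferred from the single example "Gi12200"; the function names are hypothetical and are not part of GOLD.

```python
import re
from typing import NamedTuple, Optional

# Assumption: an ID is one of the four listed prefixes followed by digits,
# as in the example "Gi12200". The deck only states the meaning of "Gi"
# (an incomplete project), so no meanings are attached to the others here.
_GOLDSTAMP_RE = re.compile(r"^(Gi|Gc|Gm|Gs)(\d+)$")


class GoldstampId(NamedTuple):
    prefix: str
    number: int


def parse_goldstamp_id(text: str) -> GoldstampId:
    """Split a GOLDstamp ID into prefix and number, or raise ValueError."""
    match = _GOLDSTAMP_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a GOLDstamp ID: {text!r}")
    return GoldstampId(match.group(1), int(match.group(2)))


def is_public(goldstamp_id: Optional[str]) -> bool:
    """A project without a (well-formed) GOLDstamp ID is not public."""
    if not goldstamp_id:
        return False
    try:
        parse_goldstamp_id(goldstamp_id)
    except ValueError:
        return False
    return True
```

For example, `parse_goldstamp_id("Gi12200")` yields the prefix `Gi` and the number `12200`, while `is_public(None)` is false.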

img/GOLD

http://img.jgi.doe.gov/cgi-bin/gold/gold.cgi

The img/GOLD interface is visible to img users only. It lets them enter metadata for their own projects and view publicly available projects. You cannot submit a file to img without first submitting the metadata for your project in img/GOLD.

Order in Chaos: Genomic standards imperative for genomics databases

GOLD is a database with thousands of users that enables the scientific community to learn about existing projects, seek their sequences through public repositories such as NCBI, or compare genomic features through systems like img.

Due to the sheer volume of genomic and metagenomic data, order has to be established in the cataloguing of genomic and metagenomic projects.

The Genomic Standards Consortium (GSC) has set minimum-information standards: the Minimum Information about a Genome Sequence (MIGS), about a Metagenome Sequence/Sample (MIMS), and about any (x) Sequence (MIxS).

GOLD is one of the few databases that fully comply with these rules, and it expects its users to do the same if they want to use the capabilities of the img system.

If you want to have a sample submitted and processed with img, you will have to follow the rules too!

Ordo ab Chaos: Genomic standards imperative for genomics databases

IF YOU DO NOT COMPLY WITH THE FOLLOWING RULES FOR SUBMITTING GENOMIC AND METAGENOMIC METADATA FOR YOUR PROJECTS, YOU WILL RECEIVE AN EMAIL STATING THAT YOUR SUBMISSIONS ARE PENDING DUE TO MISSING METADATA.

YOUR SAMPLES WILL STAY IN LIMBO UNTIL YOU FILL IN THE CORRECT METADATA.

You are here

Submitting genomic project metadata in img/GOLD: Do your homework before you submit!

Case study: Escherichia coli O1:K1:H7 DSM 30083

Sequencing center: JGI. Initiative: GEBA-CSP. Project relevance: Bioenergy. Sequencing techniques: Illumina, 454.

Isolation info? Host metadata? Organism metadata?

The naming convention is genus, species, official strain name, culture collection id. What is the official strain name? Go to straininfo.net to find out!


So the name of the organism, as it will be entered in img/GOLD, is Escherichia coli O1:K1:H7 U5/41, DSM 30083
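The naming convention can be sketched as a small helper. This is only an illustration: the function name and its parameters are hypothetical, and placing the serovar between the species and the strain name is an assumption read off the E. coli example rather than a documented img/GOLD rule.

```python
def organism_name(genus: str, species: str, strain: str,
                  collection_id: str, serovar: str = "") -> str:
    """Assemble 'Genus species [serovar] strain, collection id'."""
    required = {"genus": genus, "species": species,
                "strain": strain, "collection_id": collection_id}
    for label, value in required.items():
        if not value.strip():
            raise ValueError(f"{label} must not be empty")
    parts = [genus.strip(), species.strip()]
    if serovar.strip():
        # Assumption: the serovar sits before the official strain name.
        parts.append(serovar.strip())
    parts.append(strain.strip())
    return " ".join(parts) + ", " + collection_id.strip()


name = organism_name("Escherichia", "coli", "U5/41", "DSM 30083",
                     serovar="O1:K1:H7")
print(name)
```

With the case-study values this reproduces the name given above.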

Next: the NCBI taxon id!

The taxon id is 866789. The search had to be run under every possible combination of culture collection id, official strain name, serovar on/off, etc. If you cannot find the exact taxon id for your strain, use the taxon id for the genus/species combination; e.g. the taxon id for Escherichia coli is 562. THE TAXON ID IS NOT THE GPID (former GenBank project id) OR THE BioProject id.

http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi
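The fallback from a strain-level taxon id to the genus/species taxon id might look like the sketch below. It does not query NCBI: the lookup table is a hypothetical local stand-in that contains only the two ids quoted in this deck, and the function name is an assumption.

```python
from typing import Mapping, Optional

# Hypothetical local table holding only the two taxon ids quoted in the deck.
KNOWN_TAXON_IDS = {
    "Escherichia coli O1:K1:H7 U5/41, DSM 30083": 866789,
    "Escherichia coli": 562,
}


def resolve_taxon_id(organism: str,
                     table: Mapping[str, int] = KNOWN_TAXON_IDS) -> Optional[int]:
    """Return the exact taxon id if known, else the genus/species one."""
    if organism in table:
        return table[organism]
    # Fall back to the first two words, i.e. the genus and species.
    species = " ".join(organism.split()[:2])
    return table.get(species)
```

A strain that is missing from the table, such as a hypothetical "Escherichia coli K-12", resolves to 562; an organism whose species is also unknown yields `None`.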


http://www.ncbi.nlm.nih.gov/bioproject

Submitting genomic project metadata in img/GOLD


This project has a Gi GOLDstamp ID because it is incomplete, yet its metadata is already available to the public through the “outside” GOLD. The projects you enter will not be assigned GOLDstamp IDs unless you ask us to, so both the genomic data and the metadata of your projects stay hidden. As a courtesy, please send us an email when you are ready for your project’s metadata to be made available to the community.


Scroll down for more fields!

Without a sequencing center, your project submission will not be accepted!

Controlled vocabularies (CVs) exist for Project Relevance and Link Types.

Under DATA LINKS --> Link Type you will find CVs for Seq Center, Collaborator, Funding, Information, Data, Database.


What do you do when you are done?

Update!

Review your genome project submission by clicking on the ER ID in img/GOLD


Standardized metagenome classification and naming

Emphasis on standardized metagenome project and sample naming and metagenome classification.

Why? With cryptic and esoteric names it is impossible to build a database of metagenomic genes from comparable environments and hosts. Here’s a list of actual metagenome project names we had in GOLD in the past:

• TM7b
• hkbic
• Saliva_contig300
• HK Metagenome T1 (obviously different from T2)
• US Sludge
• How many organisms are out there? (Have you tried the Drake equation?)
• (and my favorite)

How to avoid the pitfalls of the metagenomic jungle

A call for standardized classification of metagenome projects. Environmental Microbiology, Volume 12, Issue 7, pages 1803–1805, July 2010.

A metagenome name has 4 major parts, equivalent to genus, species, strain, serovar/biovar:

1. Metagenome habitat: (a) environmental (marine sediment, soil, etc.), (b) host-associated (e.g. bovine rumen, human fecal, etc.), (c) engineered (activated sludge, wastewater bioreactor, etc.)

2. Metagenome community: microbial, bacterial, viral, archaeal, eukaryotic, etc.

3. Metagenome location: geographic longitude and latitude are required for environmental samples under MIMS (minimum information about a metagenomic sequence/sample)

4. Project identifier (equivalent to serovar/biovar): anything that describes the specific type of the community, such as contaminated, degrading glycophosphates, time series, etc.

Case studies for standardized metagenome naming

1. How would you name a project that examines viruses from the waters of the Black Sea that are acidified?

“Marine viral communities from the Black Sea, under conditions of ocean acidification” instead of “Metagenome from viruses in acidified waters”

Where:
Habitat = Marine (therefore the classification is environmental → aquatic → marine → oceanic, etc.)
Community = viral communities
Location = from the Black Sea
Project identifier = under conditions of ocean acidification

2. How would you name a project examining microbial communities from sludge in bioreactors at UC Davis?

“Wastewater microbial communities from EBPR bioreactor at UC Davis” instead of “US sludge”

Where:
Habitat = Wastewater (therefore the classification is engineered → wastewater → industrial wastewater, etc.)
Community = microbial communities
Location = from EBPR bioreactor at UC Davis

Classification is now embedded in controlled vocabularies in img/GOLD.
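The four-part naming scheme can be sketched as a small data structure. This is a minimal illustration, not the img/GOLD implementation: the class name, its fields, and the fixed wording "communities from" are assumptions taken from the recommended names in the case studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetagenomeName:
    habitat: str                  # e.g. "Marine", "Wastewater"
    community: str                # e.g. "viral", "microbial"
    location: str                 # e.g. "the Black Sea"
    project_identifier: str = ""  # optional fourth part

    def __post_init__(self) -> None:
        # The first three parts are treated as mandatory in this sketch.
        for label in ("habitat", "community", "location"):
            if not getattr(self, label).strip():
                raise ValueError(f"{label} must not be empty")

    def render(self) -> str:
        """Join the parts in the order habitat, community, location, identifier."""
        name = f"{self.habitat} {self.community} communities from {self.location}"
        if self.project_identifier:
            name += f", {self.project_identifier}"
        return name


black_sea = MetagenomeName("Marine", "viral", "the Black Sea",
                           "under conditions of ocean acidification")
print(black_sea.render())
```

Rendering the Black Sea example gives the recommended name from case study 1; leaving out the project identifier, as in the UC Davis example, simply drops the trailing clause.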

Case studies for standardized metagenome naming, continued

3. How would you name a project that examines gut microbial communities from a panda in the San Diego Zoo?

“Panda gut microbial communities from San Diego Zoo” instead of “Panda Metagenome”

Where:
Habitat = Panda gut (therefore the classification is host-associated → mammals → digestive system → large intestine → fecal)
Community = microbial communities
Location = from San Diego Zoo

Classification is now embedded in controlled vocabularies in img/GOLD.

Case study for metagenome project and sample submission: let’s assume you are Craig Venter and you want to submit a project with all the marine samples you gathered with your yacht from the Sargasso Sea.

The name of your project would then be?

Submitting metagenomic project metadata in img/GOLD


Scroll down for more fields!


Submitting metagenomic sample metadata in img/GOLD

After updating, select your project by clicking the bullet, then choose the “edit samples” option to create new samples, copy samples or update existing samples. You will be taken to a page that looks like this:


Scroll down for more fields!


Review your metagenome sample submission by clicking the ER ID

Questions?

IMPORTANT NOTICE:

DO NOT CREATE NEW PROJECTS IF YOU WANT TO SUBMIT NEW SAMPLES UNDER AN OLD PROJECT WHICH YOU HAVE ALREADY SUBMITTED IN THE PAST. If you do this repeatedly, your img account might be temporarily suspended and your img/m er submissions will not be processed. Submit new samples under your old project by clicking “edit samples” on the previous page and clicking “new” on this one.
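The one-project, many-samples rule can be pictured with a small data model. This is only a sketch: the `Project` and `Sample` classes and the `add_sample` helper are hypothetical, not the img/GOLD schema, and the coordinates used in the usage lines are placeholders rather than real sampling locations. The latitude/longitude check reflects the MIMS requirement mentioned earlier in the deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    name: str
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees


@dataclass
class Project:
    name: str
    samples: List[Sample] = field(default_factory=list)


def add_sample(projects: Dict[str, Project], project_name: str,
               sample: Sample) -> Project:
    """Attach a sample to the project with this name.

    An existing project is reused; a new one is created only when no
    project with that name exists yet.
    """
    if not -90.0 <= sample.latitude <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= sample.longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    project = projects.setdefault(project_name, Project(project_name))
    project.samples.append(sample)
    return project


projects: Dict[str, Project] = {}
title = "Wastewater microbial communities from EBPR bioreactor at UC Davis"
add_sample(projects, title, Sample("sample 1", 0.0, 0.0))  # placeholder coordinates
add_sample(projects, title, Sample("sample 2", 0.0, 0.0))  # placeholder coordinates
print(len(projects), len(projects[title].samples))
```

Both samples end up under the same project entry, which is the behaviour the notice asks for.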

DO NOT FORGET

• To add a sequencing center: go to the Project info tab, scroll down to Data Links, choose “Seq center” from Link Type, enter its name in the Source Name field, and enter the URL, with the http:// prefix, in the URL field.

• To add a sample to your metagenome project entry, select your project and choose “edit samples”.

• Do not add new projects instead of new samples.

• If all else fails, send us an email: ipagani@lbl.gov
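The sequencing-center reminder can be turned into a pre-submission check. The sketch below is an assumption-laden illustration: the `DataLink` class and both helper functions are hypothetical, the list of link types is copied from the Data Links slide, accepting https:// alongside http:// is my own assumption, and the example URL is illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Link types listed in the deck under DATA LINKS --> Link Type.
LINK_TYPES = ("Seq Center", "Collaborator", "Funding",
              "Information", "Data", "Database")
_ALLOWED = {t.casefold() for t in LINK_TYPES}


@dataclass(frozen=True)
class DataLink:
    link_type: str
    source_name: str
    url: str


def data_link_problems(link: DataLink) -> List[str]:
    """List what is wrong with one Data Links entry (empty list if fine)."""
    problems = []
    if link.link_type.casefold() not in _ALLOWED:
        problems.append(f"unknown link type: {link.link_type!r}")
    if not link.source_name.strip():
        problems.append("source name is empty")
    # The deck asks for the http:// prefix; https:// is allowed here
    # as an assumption.
    if not link.url.startswith(("http://", "https://")):
        problems.append("URL must start with http://")
    return problems


def has_sequencing_center(links: Iterable[DataLink]) -> bool:
    """True if at least one valid 'Seq Center' link is present."""
    return any(link.link_type.casefold() == "seq center"
               and not data_link_problems(link) for link in links)


jgi = DataLink("Seq center", "JGI", "http://www.jgi.doe.gov/")  # illustrative URL
print(has_sequencing_center([jgi]))
```

A link whose URL lacks the prefix, such as `www.jgi.doe.gov`, is reported as a problem and does not count as a sequencing center.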

Acknowledgements

GOLD Team:

Dinos Liolios

Amy Chen

TB Reddy

Bahador Nosrat

Joe Walp

Tatiana Smirnova

IMG group leads:

Natalia Ivanova

Kostas Mavromatis

Victor Markowitz

Nikos Kyrpides