Genboree Microbiome Toolset
Kevin Riehle
4/3/12 – NIH Cloud Workshop, Boulder, Colorado
Collaborators
• Aleks Milosavljevic
• Cristi Coarfa
• Andrew Jackson
• Arpit Tandon
• Sameer Paithankar
• Sriram Raghuraman
Aagaard Lab
• Kjersti Aagaard
• Jun Ma
Versalovic Lab
• James Versalovic
• Emily B. Hollister
• Delphine Saulnier
• Toni-Ann Mistretta
• Sabeen Raza
James Versalovic
Sabeen Raza
Toni-Ann Mistretta
Cristi Coarfa
Kjersti Aagaard
Aleksandar Milosavljevic
Jun Ma
Andrew Jackson
Sameer Paithankar
Emily B. Hollister
Delphine Saulnier
Sriram Raghuraman
Arpit Tandon
Matt Roth
Overview
• Genboree Introduction
  – Manuscripts
  – Overview
• Data + Tools
  – Lean vs. obese twins study
  – Grid viewer + 16S Samples
• Data + Mashups
  – Grid viewer + WGS Genes / Pathways + KEGG
• Virtual Integration
  – Multiple databases existing within multiple servers in different physical locations
• Conclusions
Genboree Microbiome Toolset
Riehle K, Coarfa C, Jackson A, Ma J, Tandon A, Paithankar S, Raghuraman S, Mistretta TA, Saulnier D, Raza S, Diaz MA, Shulman R, Aagaard K, Versalovic J, Milosavljevic A. The genboree microbiome toolset and the analysis of 16S rRNA microbial sequences. BMC Bioinformatics 2012; In Press.
Large Scale Applications
• Metagenomic-Based Approach to a Comprehensive Characterization of the Vaginal Microbiome Signature in Pregnancy
  – Kjersti Aagaard, in review
Genboree Introduction
• Groups
• Permissions
• Databases
• Projects
• Browser
• Workbench
Genboree Introduction
• Genboree.org
  – Everyone should have received an email regarding their Genboree account
• Genboree.org/microbiome
  – Tutorial
  – FAQ
  – This PowerPoint
  – Etc.
• Questions:
  – Ask later, interrupt now, etc.
16S rRNA SFF / SRA Sample Meta Data
Quality Filtered Sequences
Multi-step OTU Picking
Remove Chimeras
Taxonomic Classification
Representative Sequences
OTU Table Phylogenetic Tree
Beta Diversity
Alpha Diversity
Classification
Feature Selection
Taxonomic Abundance
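As a rough illustration of the Alpha Diversity and Beta Diversity steps named in the pipeline above, the sketch below computes a Shannon index and a Bray-Curtis dissimilarity from a toy OTU table. The sample names, the counts, and the choice of these two metrics are assumptions made for illustration; the slides do not state which metrics the toolset computes.

```python
import math

# Hypothetical OTU table: sample name -> {OTU id: read count}. All values are made up.
otu_table = {
    "sample_A": {"otu1": 10, "otu2": 10, "otu3": 0},
    "sample_B": {"otu1": 5, "otu2": 0, "otu3": 15},
}

def shannon_index(counts):
    """Alpha diversity: Shannon entropy (natural log) of one sample's OTU counts."""
    total = sum(counts.values())
    props = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log(p) for p in props)

def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples (0 = identical)."""
    otus = set(a) | set(b)
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))

alpha_a = shannon_index(otu_table["sample_A"])
beta_ab = bray_curtis(otu_table["sample_A"], otu_table["sample_B"])
```

Alpha diversity summarizes a single sample, while beta diversity compares a pair of samples, which is why the second function takes two count dictionaries.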
Data Tree Selector
Genboree Workbench
Various Data Types
Item Details
Data Type Filter
Input Data
Output Targets
Activated Tool
Non-Activated Tool
Genboree Workbench
Workbench Flow: Transfer, Associate, Initialize, Analyze
[Diagram labels: SRR, SFF Sequences; Subject Meta Data; Sample Record; Quality Filtered Sample Sequences; Samples; SampleSet; α, β]
Samples
• Import Samples
  – If sample does not exist, create
  – If sample exists
    • Add metadata if metadata does not exist
    • Update metadata if metadata exists and differs
• Sample – File Linker
• Add Sample Set
• Delete Sample Set(s)
• Add Samples to Sample Set
• Remove Samples from Sample Set(s)
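The Import Samples rules above amount to a create-or-merge operation. A minimal sketch follows, assuming samples are held as a plain dictionary of metadata dictionaries; the function name `import_samples` and this data layout are hypothetical and are not the Genboree API.

```python
def import_samples(store, incoming):
    """Merge incoming sample records into store, following the slide's rules.

    Both arguments map a sample name to a dict of metadata (hypothetical layout).
    Returns the lists of created and updated sample names.
    """
    created, updated = [], []
    for name, metadata in incoming.items():
        if name not in store:
            # Sample does not exist: create it.
            store[name] = dict(metadata)
            created.append(name)
            continue
        changed = False
        for key, value in metadata.items():
            # Add metadata that is missing, or update it when the value differs.
            if key not in store[name] or store[name][key] != value:
                store[name][key] = value
                changed = True
        if changed:
            updated.append(name)
    return created, updated

# Example with made-up records: S1 already exists, S2 is new.
store = {"S1": {"body_site": "stool"}}
created, updated = import_samples(
    store,
    {"S1": {"body_site": "stool", "bmi_class": "lean"}, "S2": {"bmi_class": "obese"}},
)
```

Re-importing identical records changes nothing and reports nothing, so the operation is safe to repeat.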
Data + Tools
• Lean vs. obese twins study
Lean vs. Obese Twins Study
• 94 samples
  – 49 Lean
  – 45 Obese
• V6 primer region
• 454 – 16S rRNA
Genboree Project Integration
• http://genboree.org/java-bin/project.jsp?projectName=Turnbaugh_lean_obese_twins_project
Phylogenetic Visualizations - iTOL
Lean vs. Obese Twins Study
Vol 457|22 January 2009| doi:10.1038/nature07540
Grid Viewer
• Provides an interactive view of Samples from one or many databases
• Databases may exist in different physical locations (virtual integration, discussed later)
• Users can save Sample Sets to analyze
• Users can select Samples to explore
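One way to picture the interactive view described above is as a cross-tabulation of samples by two metadata attributes. The sketch below counts samples per grid cell. The attribute names `primer_region` and `body_site` mirror those in the Grid Viewer URLs in this deck, but the sample records, their values, and the function `build_grid` are made up for illustration and do not describe how the viewer is implemented.

```python
from collections import Counter

# Hypothetical sample records; the values are invented for this example.
samples = [
    {"name": "s1", "primer_region": "V3-V5", "body_site": "stool"},
    {"name": "s2", "primer_region": "V3-V5", "body_site": "stool"},
    {"name": "s3", "primer_region": "V1-V3", "body_site": "saliva"},
    {"name": "s4", "primer_region": "V1-V3"},  # no body_site recorded
]

def build_grid(samples, x_attr, y_attr):
    """Count samples per (x, y) cell; samples missing either attribute are skipped."""
    grid = Counter()
    for sample in samples:
        if x_attr in sample and y_attr in sample:
            grid[(sample[x_attr], sample[y_attr])] += 1
    return grid

grid = build_grid(samples, "primer_region", "body_site")
```

Selecting a cell then corresponds to picking the samples that share that pair of attribute values, for example to save them as a Sample Set.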
Genes and Pathways (WGS only)
HMP Data Metrics
• Phase I and Phase II
  – http://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP002395
  – http://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP002860
  – > 13,000 samples
Grid Viewer
16S rRNA Sample Grid Viewer
• These Sample Sets can then be used for analysis in the Genboree Microbiome Toolset (GMT)
16S rRNA Sample Grid Viewer
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F&gbGridXAttr=primer_region&gbGridYAttr=body_site&xlabel=primer_region&ylabel=body_site&gridTitle=Samples%20from%20HMP-16S-I-II&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F&gbGridXAttr=seq_center&gbGridYAttr=primer_region_PLUS_body_site&xlabel=seq_center&ylabel=primer_region_PLUS_body_site&gridTitle=Samples%20from%20HMP-16S-I-II&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II
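The Grid Viewer links above pack the whole view configuration into the query string. The sketch below decodes the second link with Python's standard library to make the individual parameters visible. Reading `_PLUS_` as a separator between combined attributes and `,` as a separator between databases is an assumption inferred from the URLs in these slides, not from Genboree documentation.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "http://genboree.org/java-bin/sampleGridViewer.jsp"
    "?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII"
    "%2Fdb%2FHMP-16S-I-II%3F"
    "&gbGridXAttr=seq_center&gbGridYAttr=primer_region_PLUS_body_site"
    "&xlabel=seq_center&ylabel=primer_region_PLUS_body_site"
    "&gridTitle=Samples%20from%20HMP-16S-I-II"
    "&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II"
)

# parse_qs percent-decodes values and returns a list per key; each key occurs once here.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

db_uris = params["dbList"].split(",")            # assumed: comma-separated database URIs
x_attrs = params["gbGridXAttr"].split("_PLUS_")  # assumed: combined attributes joined by _PLUS_
y_attrs = params["gbGridYAttr"].split("_PLUS_")
```

Decoded this way, the link names one database (a REST URI under `/REST/v1/grp/.../db/...`), one X attribute, and a Y axis that combines two attributes.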
Data + Mashup
• Genes and Pathways
  – View samples + tracks within Grid Viewer
  – View output within Gene Browser and Pathway Browser
  – View Pathways within KEGG
Virtual Integration
• Accessing data that exists within different physical servers
Virtual Integration
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F,http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data%2Fdb%2FDisease_X%3F&gbGridXAttr=TYPE&gbGridYAttr=DNA_extraction_site_PLUS_seq_center_PLUS_body_site_PLUS_primer_region&xlabel=TYPE&ylabel=DNA_extraction_site_PLUS_seq_center_PLUS_body_site_PLUS_primer_region&gridTitle=Samples%20from%20HMP-16S-I-II,Disease_X&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II,Disease_X
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F,http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data%2Fdb%2FDisease_X%3F&gbGridXAttr=primer_region&gbGridYAttr=body_site&xlabel=primer_region&ylabel=body_site&gridTitle=Samples%20from%20HMP-16S-I-II,Disease_X&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II,Disease_X
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F,http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data%2Fdb%2FDisease_X%3F&gbGridXAttr=barcode&gbGridYAttr=primer_region_PLUS_body_site&xlabel=barcode&ylabel=primer_region_PLUS_body_site&gridTitle=Samples%20from%20HMP-16S-I-II,Disease_X&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II,Disease_X
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F,http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data%2Fdb%2FDisease_X%3F&gbGridXAttr=TYPE&gbGridYAttr=primer_region_PLUS_body_site&xlabel=TYPE&ylabel=primer_region_PLUS_body_site&gridTitle=Samples%20from%20HMP-16S-I-II,Disease_X&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II,Disease_X
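In the links above, virtual integration shows up as a `dbList` parameter that holds more than one database URI. The sketch below assembles such a link from its parts. The helper `grid_viewer_url` is hypothetical, and its encoding rules (each URI percent-encoded separately, literal commas between URIs, `_PLUS_` between combined attributes) are inferred from the URLs in these slides rather than taken from Genboree documentation.

```python
from urllib.parse import quote

VIEWER = "http://genboree.org/java-bin/sampleGridViewer.jsp"

def grid_viewer_url(db_uris, x_attrs, y_attrs, title):
    """Assemble a Sample Grid Viewer link in the same shape as the slide URLs."""
    # Each database URI is percent-encoded on its own; commas between them stay literal.
    db_list = ",".join(quote(uri, safe="") for uri in db_uris)
    x = "_PLUS_".join(x_attrs)
    y = "_PLUS_".join(y_attrs)
    grid_title = quote(title, safe=",")
    page_title = quote("Sample Grid Viewer: " + title, safe=":,")
    return (
        f"{VIEWER}?dbList={db_list}&gbGridXAttr={x}&gbGridYAttr={y}"
        f"&xlabel={x}&ylabel={y}&gridTitle={grid_title}&pageTitle={page_title}"
    )

# The two databases named in the slide URLs.
url = grid_viewer_url(
    [
        "http://genboree.org/REST/v1/grp/HMP-16S-rRNA-phaseI-phaseII/db/HMP-16S-I-II?",
        "http://genboree.org/REST/v1/grp/Public_16S_experiment_data/db/Disease_X?",
    ],
    ["primer_region"],
    ["body_site"],
    "Samples from HMP-16S-I-II,Disease_X",
)
```

With these inputs the function reproduces the second link in the list above. Each entry in `dbList` is a full REST URI, so a database hosted on another server could in principle be listed in the same way.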