Genboree Microbiome Toolset and Virtual Data Integration
Kevin P. Riehle
Matthew E. Roth
Aleksandar Milosavljevic
Baylor College of Medicine
April 2-4, 2012
Boulder, Colorado
Genboree Introduction
• Genboree.org
– Everyone should have received an email regarding their Genboree account
• If not, ask or email us ([email protected])
• Genboree.org/microbiome
– Tutorial
– NIH Cloud Workshop Material
– FAQ
• Links also provided on the shared Google doc
Aleksandar Milosavljevic, Baylor College of Medicine [email protected]
References and Acknowledgments
Invited to BMC Bioinformatics:
Additional contributors not acknowledged in the reference above:
R. Alan Harris, Tim Charnecki
Funding:
This project was supported in part by the NIH Roadmap Epigenomics (Common Fund) U01 DA025956 and NIH-NHGRI R01HG004009 grants to AM, by the NIH-NIDDK UH3 DK083990, P30 DK56338, and NIH Common Fund and NIH-NHGRI U54 HG003273 and U54 HG004973 grants to JV, and by Burroughs Wellcome Fund Preterm Birth Initiative and NIH DP21DP2OD001500-01 grants to KA.
Genboree SNP / Resequencing Toolset
BMC Genomics (Invited submission)
Publications
Genboree RNA Toolset
Publications enabled by the Genboree small RNA Toolset
• Gunaratne PH, Lin YC, Benham AL, Drnevich J, Coarfa C, Tennakoon JB, Creighton CJ, Kim JH, Milosavljevic A, Watson M, Griffiths-Jones S, Clayton DF. “Song exposure regulates known and novel microRNAs in the zebra finch auditory forebrain.” BMC Genomics. 2011 May 31;12(1):277.
• Shohet JM, Ghosh R, Coarfa C, Ludwig A, Benham AL, Chen Z, Patterson DM, Barbieri E, Mestdagh P, Sikorski DN, Milosavljevic A, Kim ES, Gunaratne PH. “A genome-wide search for promoters that respond to increased MYCN reveals both new oncogenic and tumor suppressor microRNAs associated with aggressive neuroblastoma.” Cancer Res. 2011 Jun 1;71(11):3841-51.
• Buchold GM, Coarfa C, Kim J, Milosavljevic A, Gunaratne PH, Matzuk MM. “Analysis of microRNA expression in the prepubertal testis.” PLoS One. 2010 Dec 29;5(12):e15317.
• Polikepahad S, Knight JM, Naghavi AO, Oplt T, Creighton CJ, Shaw C, Benham AL, Kim J, Soibam B, Harris RA, Coarfa C, Zariff A, Milosavljevic A, Batts LM, Kheradmand F, Gunaratne PH, Corry DB. “Proinflammatory role for let-7 microRNAS in experimental asthma.” J Biol Chem. 2010 Sep 24;285(39):30139-49.
• Ma L, Buchold GM, Greenbaum MP, Roy A, Burns KH, Zhu H, Han DY, Harris RA, Coarfa C, Gunaratne PH, Yan W, Matzuk MM. “GASZ is essential for male meiosis and suppression of retrotransposon expression in the male germline.” PLoS Genet. 2009 Sep;5(9):e1000635. Epub 2009 Sep 4. Erratum in: PLoS Genet. 2009 Dec;5(12).
Genboree Epigenomics Toolset
Human Epigenome Atlas and the Genboree Epigenomic Toolset for Comparative Epigenome Analysis
Coarfa C (1), Harris RA (1), Jackson AR (1), Pichot CS (2), Raghuraman S (1), Paithankar S (1), Lee AV (3), McGuire SE (2), Milosavljevic A (1)
(1) NIH Roadmap Epigenomics Data Analysis and Coordination Center (EDACC), Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas. (2) Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas. (3) Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania.
The NIH Roadmap Epigenomics Program Investigators’ Meeting, May 14-15 2012, Bethesda, Maryland, USA.
Hosts / Domains
Groups Databases Annotations Sample sets Files
Projects
Launch Genboree Workbench
Managing Groups, Projects, and Databases: Create, Read, Update, Delete
Virtual Data Integration
www.microbiome-center.org
www.brain-research-lab.org
www.genboree.org
Genboree Workbenches
Physical site #1, Physical site #2, Physical site #3
Virtual Data Integration via Genboree REST APIs
Diagram components:
• Genboree Workbench UIs (running in the user's browser; users in the diagram are labelled "Person 2")
• Genboree REST API server (executing tool jobs, data storage, etc.)
• Servers: web site, databases and file storage, compute cluster
• www.genboree.org hosted at Baylor
• www.brain-research-lab.org and www.microbiome-center.org hosted at Rackspace
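As a rough illustration of how one resource address can be formed for any of these hosts, here is a minimal Python sketch. The `/REST/v1/grp/<group>/db/<database>?` path pattern is taken from the Grid Viewer links that appear later in this deck; the helper name `rest_db_url` is hypothetical, and whether the other two hosts expose the same group and database names is an assumption.

```python
from urllib.parse import quote


def rest_db_url(host: str, group: str, database: str) -> str:
    """Build a database resource URL of the form seen in the Grid Viewer links.

    Hypothetical helper: only the path pattern comes from the slides.
    """
    # Percent-encode each path segment so unusual characters stay inside the segment.
    return (
        f"http://{host}/REST/v1/grp/{quote(group, safe='')}"
        f"/db/{quote(database, safe='')}?"
    )


# The group and database below are the ones used in the HMP Grid Viewer links.
print(rest_db_url("genboree.org", "HMP-16S-rRNA-phaseI-phaseII", "HMP-16S-I-II"))
```

Because the host is just one component of the address, a client that accepts full URLs like this can in principle treat data on different physical sites uniformly.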
Virtual Data Integration
A combination of dedicated hosting and elastic cloud computing accessible via the Workbench
Supplying Credentials to www.microbiome-center.org
Virtual Integration
Drag and drop
Genboree Workbench
Screenshot labels: Data Tree Selector, Various Data Types, Item Details, Data Type Filter, Input Data, Output Targets
Genboree Workbench
Screenshot labels: Activated Tool, Non-Activated Tool
16S Microbiome Toolset Work Flow
1. Setup
Create:
• Sample Meta Data
• Group
• Database
• Project
2. Transfer
3. Link Samples to Seq Files
4. Import Sequences
5. Run Tools:
• RDP
• QIIME
• Alpha Diversity
• Machine Learning
Diagram labels: SRR, SFF Sequences; Sample Meta Data; Sample Record (three shown); Quality Filtered Sequences; α, β
16S Microbiome Toolset Tutorial
1. Setup
• Create Sample Meta Data
• Create Group
• Create Database
• Create Project
2. Transfer
• Upload Files
• View Uploaded Files
• Import Samples
• View Imported Samples
3. Link Samples to Sequence Files
4. Import Sequences
• View Imported Sequences
• Analysis
5. Run Tools
• RDP – Taxonomic Abundance Pipeline
• QIIME Pipeline – OTU Table, Phylogenetic Tree, and Beta Diversity
• Alpha Diversity
• Machine Learning
16S Microbiome
Pipeline diagram labels: Quality Filtered Sequences, Multi-step OTU Picking, Remove Chimeras, Taxonomic Classification, Representative Sequences, OTU Table, Phylogenetic Tree, Beta Diversity, Alpha Diversity, Classification, Feature Selection, Taxonomic Abundance
Legend: the diagram marks end results and intermediate results with different symbols.
Microbiome Tutorial
1. Introduction to 16S Analysis
• Genboree.org/microbiome
– http://genboree.org/theCommons/projects/pub-gen-microbiome/wiki/Microbiome_Tutorial
» 5 Stool samples
» 5 Throat samples
2. Replicating Published Data
• A core gut microbiome in obese and lean twins (Turnbaugh et al., Nature, 2009)
• Study objective: To address the question of how host genotype, environmental exposures, and host adiposity influence the gut microbiome
– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2677729/pdf/nihms74182.pdf
» 49 Lean Stool samples
» 45 Obese Stool samples
– Comparing the alpha diversity between lean and obese stool samples
3. Utilizing HMP control data
– > 13,000 samples
– 18 body sites
– 3 primer regions
4. KEGG Mashups
– Subset of HMP WGS HUMAnN output
» 50 samples
» 7 body sites
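To make the alpha diversity comparison in part 2 concrete, the sketch below computes the Shannon index, one common alpha diversity measure, for two groups of samples. The slides do not say which metric the Genboree Alpha Diversity tool reports, so the choice of Shannon is an assumption, and the OTU counts are invented for illustration rather than taken from the twin study.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical OTU count vectors, one list per sample (not data from the study).
lean_samples = [[30, 25, 20, 15, 10], [40, 30, 20, 10]]
obese_samples = [[70, 20, 10], [85, 10, 5]]

lean_h = [shannon_index(s) for s in lean_samples]
obese_h = [shannon_index(s) for s in obese_samples]
print(f"mean Shannon index, lean:  {mean(lean_h):.3f}")
print(f"mean Shannon index, obese: {mean(obese_h):.3f}")
```

A real comparison would use the per-sample OTU tables produced by the pipeline and an appropriate statistical test across the 49 lean and 45 obese samples.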
Introduction to 16S Analysis
• http://www.Genboree.org/microbiome
– http://genboree.org/theCommons/projects/pub-gen-microbiome/wiki/Microbiome_Tutorial
• 5 Stool samples
• 5 Throat samples
– Group
• NIH_CLOUD_WORKSHOP
– Database
• nih_workshop_tutorial
Replicating Published Data
• Lean vs obese twin study
Lean vs. Obese Twins Study
• 94 samples
– 49 Lean
– 45 Obese
• V6 primer region
• 454 – 16S rRNA
Genboree Project Integration
• http://genboree.org/java-bin/project.jsp?projectName=Turnbaugh_lean_obese_twins_project
Phylogenetic Visualizations - iTOL
Lean vs. Obese Twins Study
Nature, Vol 457, 22 January 2009, doi:10.1038/nature07540
Grid Viewer
• Provides an interactive view of Samples from one or more databases
• Databases may exist in different physical locations (virtual integration, discussed in more detail later)
• Users can save Sample Sets to analyze
• Users can select Samples in which to explore Genes and Pathways (WGS only)
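Conceptually, a grid view of this kind is a cross-tabulation of samples by two metadata attributes. The Python sketch below shows that idea only; it is not the Grid Viewer's implementation. The attribute names `primer_region` and `body_site` appear in the Grid Viewer links in this deck, while the sample records, their values, and the helper `sample_grid` are hypothetical.

```python
from collections import Counter


def sample_grid(samples, x_attr, y_attr):
    """Count samples per (y value, x value) cell, as a two-attribute grid would."""
    return Counter(
        (s.get(y_attr, "unknown"), s.get(x_attr, "unknown")) for s in samples
    )


# Hypothetical sample records; in practice they could come from several databases.
samples = [
    {"name": "s1", "body_site": "Stool", "primer_region": "V3-V5"},
    {"name": "s2", "body_site": "Stool", "primer_region": "V3-V5"},
    {"name": "s3", "body_site": "Throat", "primer_region": "V1-V3"},
    {"name": "s4", "body_site": "Throat", "primer_region": "V3-V5"},
]

grid = sample_grid(samples, x_attr="primer_region", y_attr="body_site")
for (row, col), count in sorted(grid.items()):
    print(f"{row:8s} {col:6s} {count}")
```

Selecting one cell of such a grid corresponds to picking out a Sample Set that shares both attribute values.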
HMP Data Metrics
• Phase I and Phase II
– http://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP002395
– http://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP002860
– > 13,000 samples
16S rRNA Sample Grid Viewer
• These sample sets can then be used for analysis with the Genboree Microbiome Toolset (GMT)
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F&gbGridXAttr=primer_region&gbGridYAttr=body_site&xlabel=primer_region&ylabel=body_site&gridTitle=Samples%20from%20HMP-16S-I-II&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F&gbGridXAttr=seq_center&gbGridYAttr=primer_region_PLUS_body_site&xlabel=seq_center&ylabel=primer_region_PLUS_body_site&gridTitle=Samples%20from%20HMP-16S-I-II&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II
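The links above follow a regular pattern, so they can be assembled programmatically. The sketch below is inferred purely from those example links: the parameter names (`dbList`, `gbGridXAttr`, `gbGridYAttr`, `xlabel`, `ylabel`, `gridTitle`, `pageTitle`) and the `_PLUS_` separator are copied from them, while the helper `grid_viewer_url` is hypothetical and any server behavior beyond what the links show is an assumption.

```python
from urllib.parse import quote

GRID_VIEWER = "http://genboree.org/java-bin/sampleGridViewer.jsp"


def grid_viewer_url(db_urls, x_attr, y_attrs, title):
    """Assemble a Sample Grid Viewer link in the style of the examples above.

    db_urls: REST database URLs; each is fully percent-encoded, then comma-joined.
    y_attrs: one or more attribute names, combined with "_PLUS_" as in the examples.
    """
    db_list = ",".join(quote(u, safe="") for u in db_urls)
    y_attr = "_PLUS_".join(y_attrs)
    params = [
        ("dbList", db_list),
        ("gbGridXAttr", x_attr),
        ("gbGridYAttr", y_attr),
        ("xlabel", x_attr),
        ("ylabel", y_attr),
        # The example links leave "," and ":" unescaped in the titles.
        ("gridTitle", quote(title, safe=",")),
        ("pageTitle", quote(f"Sample Grid Viewer: {title}", safe=":,")),
    ]
    return GRID_VIEWER + "?" + "&".join(f"{k}={v}" for k, v in params)


hmp_db = "http://genboree.org/REST/v1/grp/HMP-16S-rRNA-phaseI-phaseII/db/HMP-16S-I-II?"
print(grid_viewer_url([hmp_db], "primer_region", ["body_site"],
                      "Samples from HMP-16S-I-II"))
```

Passing `["primer_region", "body_site"]` as `y_attrs` with `seq_center` as the X attribute gives a link in the same style as the second example.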
KEGG Mashup
• Genes and Pathways
– View samples + tracks within Grid Viewer
– View output within Gene Browser and Pathway Browser
– View Pathways within KEGG
Virtual Integration
• Accessing data that exists within different physical servers
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F,http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data%2Fdb%2FDisease_X%3F&gbGridXAttr=TYPE&gbGridYAttr=DNA_extraction_site_PLUS_seq_center_PLUS_body_site_PLUS_primer_region&xlabel=TYPE&ylabel=DNA_extraction_site_PLUS_seq_center_PLUS_body_site_PLUS_primer_region&gridTitle=Samples%20from%20HMP-16S-I-II,Disease_X&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II,Disease_X
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F,http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data%2Fdb%2FDisease_X%3F&gbGridXAttr=primer_region&gbGridYAttr=body_site&xlabel=primer_region&ylabel=body_site&gridTitle=Samples%20from%20HMP-16S-I-II,Disease_X&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II,Disease_X
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F,http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data%2Fdb%2FDisease_X%3F&gbGridXAttr=barcode&gbGridYAttr=primer_region_PLUS_body_site&xlabel=barcode&ylabel=primer_region_PLUS_body_site&gridTitle=Samples%20from%20HMP-16S-I-II,Disease_X&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II,Disease_X
• http://genboree.org/java-bin/sampleGridViewer.jsp?dbList=http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII%2Fdb%2FHMP-16S-I-II%3F,http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data%2Fdb%2FDisease_X%3F&gbGridXAttr=TYPE&gbGridYAttr=primer_region_PLUS_body_site&xlabel=TYPE&ylabel=primer_region_PLUS_body_site&gridTitle=Samples%20from%20HMP-16S-I-II,Disease_X&pageTitle=Sample%20Grid%20Viewer:%20Samples%20from%20HMP-16S-I-II,Disease_X
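Each of these links lists two databases in `dbList`, separated by a comma. The sketch below decodes such a link to show which host, group, and database each entry refers to. It assumes every entry follows the `/REST/v1/grp/<group>/db/<database>` layout seen in the links; the function name `databases_in_grid_url` is hypothetical. In these examples both entries live on genboree.org, and the same decoding would apply if an entry pointed at another host.

```python
from urllib.parse import unquote, urlsplit


def databases_in_grid_url(url):
    """Return (host, group, database) for each entry in a Grid Viewer dbList."""
    query = urlsplit(url).query
    for part in query.split("&"):
        key, _, value = part.partition("=")
        if key != "dbList":
            continue
        found = []
        for encoded in value.split(","):
            db = urlsplit(unquote(encoded))
            # Assumed path layout: /REST/v1/grp/<group>/db/<database>
            segments = db.path.strip("/").split("/")
            found.append((db.netloc, segments[3], segments[5]))
        return found
    return []


link = (
    "http://genboree.org/java-bin/sampleGridViewer.jsp?dbList="
    "http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FHMP-16S-rRNA-phaseI-phaseII"
    "%2Fdb%2FHMP-16S-I-II%3F,"
    "http%3A%2F%2Fgenboree.org%2FREST%2Fv1%2Fgrp%2FPublic_16S_experiment_data"
    "%2Fdb%2FDisease_X%3F"
    "&gbGridXAttr=primer_region&gbGridYAttr=body_site"
)
for host, group, database in databases_in_grid_url(link):
    print(host, group, database)
```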