Microbiomes and Computational Medicine

Bryan A. White

Post on 23-Feb-2016

Page 1: Microbiomes  and Computational Medicine

Microbiomes and Computational Medicine

Bryan A. White

Page 2: Microbiomes  and Computational Medicine

Microbes rule the biosphere

People = 6.86 x 10^9 (6,868,700,000)
Bacteria in people (just GI tract) = 1.5 x 10^22 (15,000,000,000,000,000,000,000)
Stars = 10^24 (1,000,000,000,000,000,000,000,000)
Bacteria on planet = 10^30 (1,000,000,000,000,000,000,000,000,000,000)

Page 3: Microbiomes  and Computational Medicine

The human microbiome, or the “other human genome”

image courtesy of the NIH HMP website http://nihroadmap.nih.gov/hmp/

1 x 10^14 microbial cells (microbiome)
3 x 10^6 microbial genes (metagenome)

1 x 10^13 human cells
2.5 x 10^4 human genes

Page 4: Microbiomes  and Computational Medicine

University of Illinois at Urbana-Champaign, Institute for Genomic Biology

The Human Microbiome

Significant role in health: example in the gastrointestinal tract
• They foster development of the mucosal wall.
• The development and maturation of the immune system depend on the presence of some members of the intestinal microbiota. Link to human health and disease.
• Essential for the metabolism of certain compounds as well as xenobiotics.
• Protection against epithelial cell injury.
• Regulation of host fat storage.
• Stimulation of intestinal angiogenesis.

Page 5: Microbiomes  and Computational Medicine

Consequences of a Perturbed Microbiome?

Peptic ulcers Kidney Stones

Osteoporosis

Obesity Diabetes

Bowel Disorders

Cancer

Pre-term birth

Page 6: Microbiomes  and Computational Medicine

NIH Human Microbiome Project

2007 (The Jumpstart Component)
• 200 reference genomes at 4 sequencing centers in the USA
• Light and in-depth 16S rDNA sequencing
• A total of 250 subjects to be recruited, with an estimated 30 sites per subject

2009 (RFA)
• Bring the entire reference collection up to 1000 genomes
• Genomic sequencing of viruses and small eukaryotes
• In-depth metagenomic sequencing on the same subjects

Other RFAs for the development of tools and technologies to handle the HMP data

Coordination with international efforts

Total ~$157M in NIH funding

Page 7: Microbiomes  and Computational Medicine

The proliferation of human microbiome projects. Asher Mullard. Nature 453, 578-580 (2008)

Page 8: Microbiomes  and Computational Medicine

Challenges with studying the human microbiome

• Involvement of clinicians – time, IRB, etc.
• Study groups – recruitment and maintenance
• Sample availability and quantity – Right sample? How do you get enough DNA?
• Data analysis with heavy emphasis on variable regions rather than full-length sequences
• Interpretation of data across different groups, worldwide
• Do we have enough reference genomes for scaffolding?

Page 9: Microbiomes  and Computational Medicine
Page 10: Microbiomes  and Computational Medicine
Page 11: Microbiomes  and Computational Medicine

HMP Metagenomics

Goal: Generate a healthy, well-defined reference cohort of specimens that will be used to analyze the microbiome of healthy adults using metagenomic analysis and to establish a reference data set.

Features:
• Developed and executed study protocol
• Screened 554 subjects
• 300 enrollees: 150 females, 150 males
• Sampled 279 enrollees 2X; sampled 100 enrollees 3X
• Sampled body sites in healthy 18-40 year olds
• 5 body sites: oral cavity, nares, skin, GI tract, and vagina
• 15 sites sampled for males; 18 sites sampled for females
• Collected 17,040 primary specimens
• Processed at JCVI, Wash U, Broad and Baylor

Page 12: Microbiomes  and Computational Medicine

“Healthy Cohort” Body Sites

Oral
• Saliva
• Tongue dorsum
• Hard palate
• Buccal mucosa
• Keratinized (attached) gingiva
• Palatine tonsils
• Throat
• Supragingival plaque
• Subgingival plaque

Skin
• Retroauricular crease, both ears (2)
• Antecubital fossa (inner elbow), both arms (2)

Nasal
• Anterior right and left nares (pooled)

Gut
• Stool

Vaginal
• Posterior fornix, vagina
• Midpoint, vagina
• Vaginal introitus

Slide courtesy of NHGRI

Page 13: Microbiomes  and Computational Medicine

Definition of Some Terms

Microbiome – The collective microbial community; a microbial census of “who is there”.

Metagenome – The total functional gene content, and therefore metabolic potential; a census of what genes are present in the microbiome.

Phylotype – A microbial type defined at the class, family, or genus level. May be a species or even a strain.

OTU – Operational taxonomic unit (97% sequence similarity of the 16S rDNA gene). A sequence-based descriptor.
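The 97% rule for OTUs can be illustrated with a minimal sketch. It assumes the reads are already aligned to equal length and uses a simple greedy, seed-based assignment; the function names and the toy reads are hypothetical, and production OTU pickers use more careful alignment and clustering than this.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[list[int]]:
    """Put each sequence in the first OTU whose seed it matches at >= threshold."""
    seeds: list[int] = []        # index of the seed sequence of each OTU
    otus: list[list[int]] = []   # member indices of each OTU
    for i, s in enumerate(seqs):
        for k, seed in enumerate(seeds):
            if identity(s, seqs[seed]) >= threshold:
                otus[k].append(i)
                break
        else:
            # No existing seed is similar enough: this read starts a new OTU.
            seeds.append(i)
            otus.append([i])
    return otus


# Hypothetical 100-bp reads: b differs from a at 2 sites (98% identity),
# c differs from a at 8 sites (92% identity).
a = "ACGT" * 25
b = "TT" + a[2:]
c = "T" * 10 + a[10:]
print(greedy_otus([a, b, c]))
```

With these toy reads, a and b fall into one OTU and c starts its own, because only the first pair clears the 97% threshold.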

Page 14: Microbiomes  and Computational Medicine

Terms

Page 15: Microbiomes  and Computational Medicine

Methods used to investigate microbiomes

• Culture-independent approaches – 16S rRNA and other phylogenetic marker surveys (who is there)

• Limited whole genome sequencing (reference genomes) – Single cell and single molecule sequencing on the horizon

• Subtractive hybridization studies (comparative genomics)

• Stable Isotope Probing – Active populations

• Metagenomic sequencing – functional gene content (i.e., metabolic potential)

• Meta-transcriptomics – which genes are expressed

• Metabolomics – what products are produced

Page 16: Microbiomes  and Computational Medicine
Page 17: Microbiomes  and Computational Medicine

Microbiome and Metagenomic Analysis

[Figure: workflow diagram with the labels Microbiome, DNA, RNA, 16S Survey, Metagenomics, Metatranscriptomics, Metabolomics]

Page 18: Microbiomes  and Computational Medicine

Biome-specific signatures based on the phylogenetic content (16S rDNA Analysis)

Page 19: Microbiomes  and Computational Medicine

Pyrosequence rDNA Tags for Deep Hypervariable Region Amplicon Sequencing

Page 20: Microbiomes  and Computational Medicine

Figure 4. Rarefaction curves.

Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667. doi:10.1371/journal.pcbi.1000667
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000667
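A rarefaction curve plots the number of distinct OTUs observed against the number of reads sampled. The sketch below shows one way such a curve could be computed by repeated random subsampling; the read labels, depths, and repetition count are hypothetical, and this is not the procedure used in the cited figure.

```python
import random


def rarefaction_curve(reads, depths, n_rep=100, seed=0):
    """Mean number of distinct OTUs seen when subsampling reads without replacement."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        richness = [len(set(rng.sample(reads, depth))) for _ in range(n_rep)]
        curve.append(sum(richness) / n_rep)
    return curve


# Hypothetical sample: one OTU label per read, with a long-tailed abundance profile.
reads = ["otu1"] * 60 + ["otu2"] * 30 + ["otu3"] * 9 + ["otu4"] * 1
depths = [1, 10, 50, 100]
for depth, mean_otus in zip(depths, rarefaction_curve(reads, depths)):
    print(depth, round(mean_otus, 2))
```

A curve that flattens as depth grows suggests that further sequencing would reveal few new OTUs; a curve that is still climbing suggests the community is undersampled.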

Page 21: Microbiomes  and Computational Medicine
Page 22: Microbiomes  and Computational Medicine
Page 23: Microbiomes  and Computational Medicine

Tree Generation
• Phylogenetic tree types
• Distance matrix methods
  – UPGMA
  – Neighbor joining
• Character state methods
  – Maximum likelihood

Page 24: Microbiomes  and Computational Medicine

Phylogenetic tree?
• A tree is a graphical representation of the relationships between organisms, species, or genomic sequences
• In bioinformatics, it is based on genomic sequence

Page 25: Microbiomes  and Computational Medicine

What do they represent?
• Root: origin of evolution (the common ancestor of all leaves)
• Leaves: current organisms, species, or genomic sequences
• Branches: relationships between organisms, species, or genomic sequences
• Branch length: evolutionary time (in a cladogram, it doesn't represent time)

Page 26: Microbiomes  and Computational Medicine

Rooted / Unrooted trees
• Rooted tree: directed to a unique node (the root)
  – (2 * number of leaves) - 1 nodes, (2 * number of leaves) - 2 branches
• Unrooted tree: shows the relatedness of the leaves without assuming ancestry at all
  – (2 * number of leaves) - 2 nodes, (2 * number of leaves) - 3 branches

https://www.nescent.org/wg_EvoViz/Tree
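The node and branch counts above hold for strictly bifurcating trees. A small sketch can check them by counting directly on a tree written as nested pairs; the nested-tuple representation and the leaf names are assumptions made for illustration.

```python
def count_rooted(tree):
    """Return (leaves, nodes, branches) for a rooted binary tree.

    A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return 1, 1, 0
    left, right = tree
    l1, n1, b1 = count_rooted(left)
    l2, n2, b2 = count_rooted(right)
    # One new internal node, plus one branch down to each child.
    return l1 + l2, n1 + n2 + 1, b1 + b2 + 2


def count_unrooted(tree):
    """Counts after removing the root, which merges its two branches into one."""
    leaves, nodes, branches = count_rooted(tree)
    return leaves, nodes - 1, branches - 1


tree = (("A", "B"), ("C", ("D", "E")))   # hypothetical 5-leaf tree
print(count_rooted(tree))     # leaves, 2n - 1 nodes, 2n - 2 branches
print(count_unrooted(tree))   # leaves, 2n - 2 nodes, 2n - 3 branches
```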

Page 27: Microbiomes  and Computational Medicine

More tree types used in bioinformatics (from Cohen article)
• Unrooted tree
• Rooted tree
• Cladograms: branch lengths have no meaning
• Phylograms: branch lengths represent evolutionary change
• Ultrametric: branch lengths represent time, and the lengths from the root to the leaves are the same

https://www.nescent.org/wg_EvoViz/Tree

Page 28: Microbiomes  and Computational Medicine

How to construct a phylogenetic tree?

Step 1: Make a multiple alignment of the nucleotide or amino acid sequences (using MUSCLE or another multiple-alignment tool; BLAST is commonly used beforehand to find homologous sequences)

Page 29: Microbiomes  and Computational Medicine

How to construct a phylogenetic tree?

Step 2: Check whether the multiple alignment reflects the evolutionary process.

http://genome.cshlp.org/content/17/2/127.full

Page 30: Microbiomes  and Computational Medicine

How to construct a phylogenetic tree? (cont.)

Step 3: Choose a tree-building method, then either calculate pairwise distances (distance matrix methods) or use the alignment directly (character state methods)

Step 4: Verify the result statistically.

Page 31: Microbiomes  and Computational Medicine

Distance Matrix methods
• Calculate all the distances between leaves (taxa)
• Based on the distances, construct a tree
• Good for continuous characters
• Not very accurate
• Fastest method
• Examples: UPGMA, Neighbor-joining
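The first step of any distance matrix method is to compute all pairwise distances between taxa. A minimal sketch follows, assuming an existing multiple alignment and using the uncorrected p-distance (the fraction of differing sites); the slides do not say which distance measure was intended, and the four sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of aligned positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment: dict[str, str]):
    """Return (names, square symmetric matrix of pairwise p-distances)."""
    names = list(alignment)
    matrix = [[p_distance(alignment[i], alignment[j]) for j in names] for i in names]
    return names, matrix


# Hypothetical aligned sequences, 10 sites each.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGTTCGTAA",
    "D": "TCGTTCGAAA",
}
names, matrix = distance_matrix(alignment)
for name, row in zip(names, matrix):
    print(name, row)
```

The resulting matrix is the input that UPGMA or neighbor-joining would then turn into a tree.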

Page 32: Microbiomes  and Computational Medicine

UPGMA
• Abbreviation of “Unweighted Pair Group Method with Arithmetic Mean”
• Originally developed for numerical taxonomy in 1958 by Sokal and Michener
• Simplest algorithm for tree construction, so it's fast!
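UPGMA repeatedly merges the two closest clusters and averages their distances to every other cluster, weighting by cluster size. The following is a minimal sketch under the assumption of a complete, symmetric distance matrix; the nested-tuple output format and the example matrix are hypothetical choices made for illustration.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of labels; dist: square symmetric matrix (list of lists).
    Returns a nested tuple (left, right, height), where height is the distance
    from the merged node down to its leaves.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}   # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])   # closest pair
        ti, ni = clusters.pop(i)
        tj, nj = clusters.pop(j)
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # Size-weighted mean keeps every original leaf equally weighted.
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del d[(i, j)]
        clusters[next_id] = ((ti, tj, dij / 2), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical clock-like (ultrametric) distances between four taxa.
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 2.0, 4.0, 6.0],
    [2.0, 0.0, 4.0, 6.0],
    [4.0, 4.0, 0.0, 6.0],
    [6.0, 6.0, 6.0, 0.0],
]
print(upgma(names, dist))
```

On this matrix the sketch joins A and B first, then adds C, then D, with node heights equal to half of each merged distance.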

Page 33: Microbiomes  and Computational Medicine

Downside of UPGMA
• Assumes a molecular clock (i.e., that the evolutionary rate is approximately constant)
• Clustering works only if the data are ultrametric
• Doesn’t work in the following case: [figure not included in transcript]
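Whether a distance matrix is ultrametric can be tested with the three-point condition: for every triple of taxa, the two largest of the three pairwise distances must be equal. The sketch below applies that test; the tolerance and both example matrices are hypothetical.

```python
from itertools import combinations


def is_ultrametric(dist, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(dist)), 3):
        _, middle, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


# Clock-like distances: every leaf is equally far from the root.
clock = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]

# Distances from a tree in which A and B are sisters but B evolves four times
# faster (branch lengths A = 1, B = 4, shared internal branch = 1, C = 2).
non_clock = [
    [0, 5, 4],
    [5, 0, 7],
    [4, 7, 0],
]

print(is_ultrametric(clock))
print(is_ultrametric(non_clock))
```

In the second matrix the smallest distance is between A and C, so a method that always merges the closest pair would group them first even though A and B are sisters in the generating tree.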

Page 34: Microbiomes  and Computational Medicine

Neighbor-joining method
• Developed in 1987 by Saitou and Nei
• Works in a similar fashion to UPGMA
• Still fast – works great for large datasets
• Doesn’t require the data to be ultrametric
• Great for largely varying evolutionary rates
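Neighbor-joining differs from UPGMA in how it picks the pair to merge: it corrects each distance by the row sums, so a fast-evolving lineage is not mistaken for a distant relative. The sketch below follows the standard formulation (Q criterion, branch-length formula, distance update); the internal node names and the example distances are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the NJ tree as a list of (node, node, branch_length) edges."""
    nodes = list(names)
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    edges = []
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}   # row sums
        # Q criterion: join the pair minimizing (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"node{new_id}"
        new_id += 1
        length_a = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        edges += [(a, u, length_a), (b, u, length_b)]
        # Distances from the new internal node to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical distances from a tree where A and B are sisters and B evolves fast
# (branches A = 1, B = 4, C = 2, D = 2, internal branch = 1).
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 5.0, 4.0, 4.0],
    [5.0, 0.0, 7.0, 7.0],
    [4.0, 7.0, 0.0, 4.0],
    [4.0, 7.0, 4.0, 0.0],
]
for edge in neighbor_joining(names, dist):
    print(edge)
```

Although A is closer to C and D than to B in raw distance, the sketch still joins A with B and recovers the branch lengths of the generating tree, which is the behavior the slide attributes to neighbor-joining under varying rates.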

Page 35: Microbiomes  and Computational Medicine

Downside of Neighbor-joining
• Generates only one possible tree
• Generates only an unrooted tree

Page 36: Microbiomes  and Computational Medicine

Character state methods
• Need discrete characters
• Maximum likelihood
• Maximum parsimony (will be covered by Kyle)

Page 37: Microbiomes  and Computational Medicine

Maximum likelihood
• Originally developed for statistics by Ronald Fisher between 1912 and 1922
• Therefore, it uses an explicit statistical model
• Uses all the data
• Tends to outperform parsimony or distance matrix methods

Page 38: Microbiomes  and Computational Medicine

How to construct a tree with Maximum likelihood?

Step 1: Make all possible trees depending on the number of leaves

Step 2: Calculate the likelihood of each tree given the data: L(Tree) = probability of the data given that tree.
• optimizing branch length
• generating tree topology

Step 3: Pick the tree that has the highest likelihood.
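Step 2 can be made concrete with a small sketch of the likelihood calculation for one fixed tree. It assumes the Jukes-Cantor (JC69) substitution model, fixed branch lengths, and Felsenstein's pruning recursion; none of these choices are stated in the slides, and the alignment, branch lengths, and tuple-based tree format are hypothetical. No branch-length optimization or topology search is attempted.

```python
import math

BASES = "ACGT"


def jc69_prob(x, y, t):
    """P(base y at the end of a branch | base x at its start) under JC69.

    t is the branch length in expected substitutions per site.
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def partial(tree, site):
    """Map each base to P(observed leaves below this node | that base at this node).

    tree is a leaf name (str) or a tuple (left, t_left, right, t_right).
    """
    if isinstance(tree, str):
        return {b: 1.0 if site[tree] == b else 0.0 for b in BASES}
    left, t_left, right, t_right = tree
    pl, pr = partial(left, site), partial(right, site)
    return {x: sum(jc69_prob(x, y, t_left) * pl[y] for y in BASES)
               * sum(jc69_prob(x, y, t_right) * pr[y] for y in BASES)
            for x in BASES}


def log_likelihood(tree, alignment):
    """Sum over sites of log P(site | tree), with uniform base frequencies at the root."""
    names = list(alignment)
    total = 0.0
    for i in range(len(alignment[names[0]])):
        site = {n: alignment[n][i] for n in names}
        p = partial(tree, site)
        total += math.log(sum(0.25 * p[x] for x in BASES))
    return total


# Hypothetical alignment in which A resembles B and C resembles D.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAC",
    "C": "ACGTTTGTAC",
    "D": "ACGTTTGTAA",
}
tree_ab_cd = (("A", 0.1, "B", 0.1), 0.05, ("C", 0.1, "D", 0.1), 0.05)
tree_ac_bd = (("A", 0.1, "C", 0.1), 0.05, ("B", 0.1, "D", 0.1), 0.05)
print(log_likelihood(tree_ab_cd, alignment))
print(log_likelihood(tree_ac_bd, alignment))
```

The topology that groups A with B and C with D gets the higher log-likelihood here, so Step 3 would select it over the alternative.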

Page 39: Microbiomes  and Computational Medicine

Sounds really great?

Num of leaves    Num of possible (unrooted) trees
3                1
5                15
10               2,027,025
13               13,749,310,575
20               221,643,095,476,699,771,875

Maximum likelihood is very expensive and extremely slow to compute
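The growth in the number of candidate trees can be reproduced with the standard double-factorial formulas for strictly bifurcating trees: (2n - 5)!! unrooted trees and (2n - 3)!! rooted trees for n leaves. A minimal sketch, with hypothetical function names:

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... * 1 for odd k (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_leaves: int) -> int:
    """Number of distinct unrooted bifurcating trees on n labeled leaves."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n_leaves - 5)


def n_rooted_trees(n_leaves: int) -> int:
    """Number of distinct rooted bifurcating trees on n labeled leaves."""
    if n_leaves < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * n_leaves - 3)


for n in (3, 5, 10, 13, 20):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

For 10 leaves there are already 2,027,025 unrooted trees, which is why exhaustive enumeration is impractical and real programs rely on heuristic tree searches.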

Page 40: Microbiomes  and Computational Medicine

What microbial species are shared between sites and different species?

Dethlefsen et al. Nature 2007 vol. 449 (7164) pp. 811-818

Page 41: Microbiomes  and Computational Medicine
Page 42: Microbiomes  and Computational Medicine

In adults, each part of the body supports a distinct microbial community.

With no apparent relationship with gender, age, weight, ethnicity or race.

HMP Consortium (2012) “Structure, Function and Diversity of the Human Microbiome in an Adult Reference Population.” The Human Microbiome Consortium.

Page 43: Microbiomes  and Computational Medicine
Page 44: Microbiomes  and Computational Medicine

Microbiome is acquired anew each generation.

1) Infants obtain microbes from mother or environment.

2) Microbial succession over ~1-2 yrs.

3) Microbiome becomes “adult-like” in ~1-2 yrs.

Figure sources: Dominguez-Bello et al. (2010); Palmer et al. (2007); Koenig et al. (2010)

Dominguez-Bello et al. PNAS | June 29, 2010 | vol. 107 | no. 26 | 11975

Page 45: Microbiomes  and Computational Medicine

Microbe:Microbe Metabolic Interactions Can Influence Composition

[Figure: panels labeled N=1, N=3, N=1, N=5, N=1, N=1, N=1, N=1]

Page 46: Microbiomes  and Computational Medicine

Co-abundance: Pearson correlations as a proxy for testing the interdependent structure of a microbiome

[Figure: scatter plots of abundance of OTU A (y-axis) against abundance of OTU B (x-axis), illustrating Pearson's correlation = 1, 0.9, 0.7, and 0]
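A co-abundance network of this kind can be sketched by computing the Pearson correlation for every pair of OTUs across samples and keeping the pairs above a cutoff as edges. The cutoff of 0.7, the OTU names, and the abundance values below are hypothetical, and the slides do not specify how edges were actually thresholded.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length abundance vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0   # constant vector: correlation undefined, treated here as none
    return sxy / math.sqrt(sxx * syy)


def coabundance_edges(abundance, cutoff=0.7):
    """Return (otu_a, otu_b, r) for every OTU pair with correlation r >= cutoff."""
    edges = []
    for a, b in combinations(abundance, 2):
        r = pearson(abundance[a], abundance[b])
        if r >= cutoff:
            edges.append((a, b, r))
    return edges


# Hypothetical abundances of three OTUs across five samples.
abundance = {
    "OTU_A": [1, 2, 3, 4, 5],
    "OTU_B": [2, 4, 6, 8, 10],
    "OTU_C": [5, 3, 4, 1, 2],
}
print(coabundance_edges(abundance))
```

Here only OTU_A and OTU_B are connected; OTU_C is negatively correlated with both, so it stays isolated under a positive cutoff.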

Page 47: Microbiomes  and Computational Medicine

Number of Connections Formed Not Influenced by OTU Abundance

Page 48: Microbiomes  and Computational Medicine

Number of Connections Formed Not Influenced by OTU Prevalence

Page 49: Microbiomes  and Computational Medicine
Page 50: Microbiomes  and Computational Medicine

Random/Exponential vs. Scale-free Networks

Page 51: Microbiomes  and Computational Medicine

Loss of Scale-free structure in Perturbed Howlers

Slope = -1.2

Slope = -0.3
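The slopes quoted on these slides describe how quickly the degree distribution P(k) falls off on a log-log plot; a scale-free network gives a roughly straight line. The sketch below shows one way to obtain such a slope from an edge list, using an ordinary least-squares fit on log10 values. The fitting method, function names, and toy edge list are assumptions, not the analysis behind the reported slopes.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {degree k: fraction of nodes with degree k} for an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: c / n_nodes for k, c in sorted(counts.items())}


def loglog_slope(distribution):
    """Least-squares slope of log10 P(k) against log10 k.

    Needs at least two distinct degrees; otherwise the fit is undefined.
    """
    xs = [math.log10(k) for k in distribution]
    ys = [math.log10(p) for p in distribution.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# Hypothetical co-abundance network: one hub plus a few peripheral OTUs.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
distribution = degree_distribution(edges)
print(distribution)
print(loglog_slope(distribution))
```

A steeper negative slope means most OTUs have few connections and only a handful act as hubs; a slope near zero means degrees are spread more evenly.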

Page 52: Microbiomes  and Computational Medicine

Scale-Free Degree Distribution (DD) in Healthy Human Samples

Slope = -1.2

Page 53: Microbiomes  and Computational Medicine

Degree Distribution Not Affected by Natural Plasticity

Slope = -1.2

Slope = -1.1

Slope = -1.3

Page 54: Microbiomes  and Computational Medicine

Figure 4. Rarefaction curves.

Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667. doi:10.1371/journal.pcbi.1000667
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000667

Page 55: Microbiomes  and Computational Medicine

Biome-specific signatures based on the functional gene content (Metagenome-Wide Association Studies, MWAS)

Hugenholtz and Tyson. 2008. Nature 455:481.

Page 56: Microbiomes  and Computational Medicine

Figure 2. Topics in the study of the human microbiome with outstanding computational biology challenges.

Gevers D, Pop M, Schloss PD, Huttenhower C (2012) Bioinformatics for the Human Microbiome Project. PLoS Comput Biol 8(11): e1002779. doi:10.1371/journal.pcbi.1002779
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002779

Page 57: Microbiomes  and Computational Medicine

Figure 1. Environmental Shotgun Sequencing (ESS).

Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667. doi:10.1371/journal.pcbi.1000667
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000667

Page 58: Microbiomes  and Computational Medicine

Figure 3. Fragment assembly.

Wooley JC, Godzik A, Friedberg I (2010) A Primer on Metagenomics. PLoS Comput Biol 6(2): e1000667. doi:10.1371/journal.pcbi.1000667
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000667

Page 59: Microbiomes  and Computational Medicine

NATURE | Vol 464 | 4 March 2010

Page 60: Microbiomes  and Computational Medicine

Enterotype and Vagiotype Concept

Page 61: Microbiomes  and Computational Medicine

Enterotypes

M Arumugam et al. Nature 000, 1-7 (2011) doi:10.1038/nature09944

Page 62: Microbiomes  and Computational Medicine

Vagiotypes

Ravel et al. www.pnas.org/cgi/doi/10.1073/pnas.1002611107 PNAS

Page 63: Microbiomes  and Computational Medicine

INFORMATICS
Tool development for data analysis: a distributed, scalable metagenomic analysis system using clouds

Goll et al. Bioinformatics (2010) 26 (20): 2631-2632.

JCVI Metagenomics Reports (METAREP)
• data mining of metagenomic datasets from HMP
• rich web interface for analysis and comparison of annotated metagenomics datasets
• high-performance search engine to query large data collections

Distributed, cloud-based design for METAREP
• Registry for metagenomic data at different institutes / labs; data queries run across all sites
• Metagenomic pipelines on the cloud; no need for local data centers, a benefit for smaller labs
• Option to install pipelines on traditional data centers / clusters for security

Page 64: Microbiomes  and Computational Medicine
Page 65: Microbiomes  and Computational Medicine