pizza club - may 2016 - shaman

Shaman Narayanasamy, Eco-Systems Biology Group. Supervisors: Paul Wilmes and Jorge Goncalves. PHD-2014-1/7934898. Computational approaches to predict bacteriophage-host relationships. Robert A. Edwards, Katelyn McNair, Karoline Faust, Jeroen Raes, Bas E. Dutilh. Review article, FEMS Microbiology Reviews (9 December 2015). Computational Biology Pizza Club series: 25th May 2016

Upload: rsg-luxembourg

Post on 22-Jan-2017

TRANSCRIPT

Page 1: Pizza club - May 2016 - Shaman

Shaman Narayanasamy
Eco-Systems Biology Group

Supervisors: Paul Wilmes and Jorge Goncalves

PHD-2014-1/7934898

Computational approaches to predict bacteriophage-host relationships

Robert A. Edwards, Katelyn McNair, Karoline Faust, Jeroen Raes, Bas E. Dutilh
Review article, FEMS Microbiology Reviews (9 December 2015)

Computational Biology Pizza Club series: 25th May 2016

Page 2: Pizza club - May 2016 - Shaman

Article overview

• Metagenomics for identification of viral-host associations
• Introduction of wet-lab methods
• Focused on bacteriophages (phages) and bacterial interactions
• Benchmark data: 820 bacteriophages, associated hosts and publicly available metagenomic datasets
• Assessment of predictive power of in silico phage-host signals:
  – Abundance-based methods
  – Sequence homology based methods
  – Genetic homology
  – CRISPRs
  – Oligonucleotide profiles
  – Compositional based methods

Page 3: Pizza club - May 2016 - Shaman


Introduction

Page 4: Pizza club - May 2016 - Shaman

Introduction

Infection!

Membrane receptor

Figure adapted and modified from Gelbart & Knobler (2008)

Page 7: Pizza club - May 2016 - Shaman

Introduction

Infection!

Resistance Fitness

Defense!!!
• Membrane receptor mutation
• CRISPR-Cas
• Restriction-modification

Membrane receptor

Mutation

Figure adapted and modified from Gelbart & Knobler (2008)

Page 8: Pizza club - May 2016 - Shaman


Introduction

Infection!

Resistance Fitness

Page 11: Pizza club - May 2016 - Shaman


Introduction

Competition

Infection!

Resistance Fitness

Page 12: Pizza club - May 2016 - Shaman

Experimental approaches for phage isolation

• Spot and plaque assays
• Liquid assays
• Viral tagging
• Microfluidic PCR
• PhageFISH
• Single cell sequencing
• Hi-C sequencing

Page 13: Pizza club - May 2016 - Shaman

Spot and plaque assays

Requires
• Pure culture of host
• Pure/environmental culture of phage

Disadvantages
• Low throughput
• Host isolation required

Photo adapted and modified from http://www.slideshare.net/Adrienna/global-food-safety2013

Page 14: Pizza club - May 2016 - Shaman

Liquid assays

Requires
• Pure culture of host
• Pure culture of phage

Disadvantages
• Use of OD readout *
• Low sensitivity (single endpoint values) *
• Host and phage isolate required

* Use redox dye, OmniLog platform and real-time/semiquantitative PCR instead

Figure adapted and modified from Goldberg et al. (2014)

Page 15: Pizza club - May 2016 - Shaman

Viral tagging

Requires
• Pure culture of host
• Pure culture/environmental isolate of phages
• Cell sorter (FACS?)

Disadvantages
• Host isolate required

Figure adapted and modified from http://jgi.doe.gov/dyeing-learn-marine-viruses/

Page 16: Pizza club - May 2016 - Shaman

Microfluidic PCR

Requires
• Environmental microbial community sample
• PCR primers for target marker genes

Disadvantages
• Relies on marker genes for design of PCR primers

Figure adapted and modified from Dang & Sullivan (2014)

Page 17: Pizza club - May 2016 - Shaman

PhageFISH

Requires
• Environmental microbial community sample
• PCR primers for target marker genes

Disadvantages
• Relies on marker genes for FISH probe design

Figures adapted and modified from Dang & Sullivan (2014) and Allers et al. (2013)

Page 18: Pizza club - May 2016 - Shaman

Single cell sequencing

Requires
• Single microbial cell from environmental microbial community sample

Disadvantages
• Biased towards most abundant environmental microbe

Figure adapted and modified from Lasken (2012)

Page 19: Pizza club - May 2016 - Shaman

Benchmark dataset


820 complete phage genomes

Field: “host”

153 complete bacterial genomes

NCBI RefSeq
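
As a rough illustration of how the “host” field could be pulled out of a phage record, here is a minimal sketch. It assumes the field is the `/host` qualifier of a GenBank-format flat file; the record text and the name `extract_host` are hypothetical and not taken from the article.

```python
import re

# Hypothetical, heavily abridged GenBank-style record (not a real RefSeq entry).
RECORD = '''LOCUS       EXAMPLE_0001            1000 bp    DNA     linear   PHG
FEATURES             Location/Qualifiers
     source          1..1000
                     /organism="Example phage"
                     /host="Escherichia coli"
'''

# The host annotation is assumed to be stored as a /host="..." qualifier.
HOST_RE = re.compile(r'/host="([^"]+)"')


def extract_host(record_text):
    """Return the value of the first /host qualifier, or None if absent."""
    match = HOST_RE.search(record_text)
    if match is None:
        return None
    # Qualifier values may wrap across lines; collapse internal whitespace.
    return " ".join(match.group(1).split())


print(extract_host(RECORD))
```

Records without a `/host` qualifier return `None`, so they can be filtered out before benchmarking.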

Page 20: Pizza club - May 2016 - Shaman

Quality assessment of predictions: ROC curves

• Assessment of binary classifier (Host/Not Host)
• Does not require cut-off value
• Based on the rate of accumulation of true and false positives
• True positive rate (Sensitivity), False positive rate (1 - Specificity)

TPr = TP / (TP + FN)
FPr = FP / (FP + TN)
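
A minimal sketch of how such a curve can be computed, assuming each candidate phage-host pair has a score (higher means "more likely host") and a known true/false label. The function names and the toy scores are hypothetical; this is not the evaluation code used in the article.

```python
def roc_points(scores, labels):
    """Return (FPr, TPr) points obtained by sweeping a threshold over all scores.

    TPr = TP / (TP + FN) and FPr = FP / (FP + TN).
    """
    positives = sum(1 for is_host in labels if is_host)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Pairs with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: three true phage-host pairs and three false ones.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [True, True, False, True, False, False]
print(auc(roc_points(scores, labels)))
```

No cut-off is chosen anywhere: every observed score acts as a threshold in turn, which is why the curve summarises the classifier over all possible cut-offs.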

Page 21: Pizza club - May 2016 - Shaman

Computational methods for phage-host signal prediction

• Abundance profiles
• Genetic homology
• CRISPR
• Exact matches
• Oligonucleotide profiles

Page 22: Pizza club - May 2016 - Shaman

Abundance profiles

• Stern et al. (2012)
  – Good correlation of phage-host abundance across human gut microbiome (metagenomes)
• Reyes et al. (2013)
  – 2/5 phages correspond to decrease in host abundance (mouse gut)
• Nielsen et al. (2014)
  – Occurrence of phage-like gene sets corresponding to host (bacterial) gene sets
  – Includes known phage-host pairs
• Dutilh et al. (2014)
  – 22% of metagenomic reads may be of phage origin
• Lima-Mendez et al. (2015); Tara Oceans survey

Figure adapted and modified from Nielsen et al. (2014) and Edwards et al. (2015)

• Improves with the availability of multiple samples from same/similar environments
• High spatio/temporal stratification; will improve as publicly available metagenome collection increases
• Time series datasets potentially used for time-lagged associations
• Complicated and non-linear dynamics incompatible with straightforward correlation
• 12% correct identification of host
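
A minimal sketch of association by correlation, assuming each phage and candidate host has an abundance value in the same set of samples. Pearson correlation is used here purely for illustration; the profiles, host names and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / sqrt(var_x * var_y)


def predict_host(phage_profile, host_profiles):
    """Pick the host whose abundance profile correlates best with the phage."""
    return max(host_profiles,
               key=lambda name: pearson(phage_profile, host_profiles[name]))


# Hypothetical abundances across five metagenomic samples.
phage = [1, 4, 2, 8, 3]
hosts = {
    "host_A": [10, 42, 19, 80, 33],  # rises and falls with the phage
    "host_B": [50, 10, 40, 5, 30],   # roughly opposite trend
    "host_C": [7, 7, 7, 7, 7],       # constant
}
print(predict_host(phage, hosts))
```

A plain correlation like this ignores time lags and non-linear dynamics, which is the limitation noted in the bullets above.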

Page 23: Pizza club - May 2016 - Shaman

Genetic homology

• Phage-host homology is an indication of recent common ancestry, implying interaction
• Host genes may benefit phages!
• Auxiliary metabolic genes
• Modi et al. (2013) and Dutilh et al. (2014)

Figure adapted and modified from Edwards et al. (2015)

• Amino acid based searches applicable for distantly related organisms (29.8%)
• Nucleotide based searches more accurate (38.5%)
• 30% host identified
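
A minimal sketch of turning homology search results into a host prediction, assuming the hits are available in the standard 12-column BLAST tabular layout (query, subject, ..., e-value, bit score) and that the best host is simply the subject with the highest bit score. The hit lines, identifiers and the name `best_hosts` are hypothetical.

```python
# Hypothetical tab-separated hits: qseqid, sseqid, pident, length, mismatch,
# gapopen, qstart, qend, sstart, send, evalue, bitscore.
HITS = """\
phage_1\thost_A\t98.5\t1200\t18\t0\t1\t1200\t5000\t6199\t0.0\t2100
phage_1\thost_B\t85.0\t300\t45\t0\t10\t309\t700\t999\t1e-80\t290
phage_2\thost_B\t91.2\t800\t70\t0\t1\t800\t100\t899\t0.0\t1050
"""


def best_hosts(tabular_text):
    """Map each phage (query) to the host (subject) with the highest bit score."""
    best = {}
    for line in tabular_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        phage, host, bitscore = fields[0], fields[1], float(fields[11])
        if phage not in best or bitscore > best[phage][1]:
            best[phage] = (host, bitscore)
    return {phage: host for phage, (host, _) in best.items()}


print(best_hosts(HITS))
```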

Page 24: Pizza club - May 2016 - Shaman

CRISPR-Cas

Figure labels: Phage genome 1, Phage genome 2; bacterial genome with cas gene and CRISPR array; R: repeat; S1-S5: spacers

Page 25: Pizza club - May 2016 - Shaman

CRISPRs

• Studies:
  – Human gut microbiome; Stern et al. (2012), Minot et al. (2013)
  – Acidophilic biofilms; Andersson & Banfield (2008)
  – Cow rumen; Berg Miller et al. (2012)
  – Arctic glacial ice and soil; Sanguino et al. (2015)
  – Marine environments; Anderson, Brazelton & Baross (2011), Cassman et al. (2012)
  – Activated sludge; Narayanasamy et al. (unpublished)

• Little to no homology to known sequences
• Environmentally dependent
• Spacers are rapidly replaced
• Most suitable for recent phage-host interactions
• Not all prokaryotes encode CRISPRs (bacteria; 48 ± 30%, archaea; 63 ± 30%)
• Highly specific, but not sensitive
• Degeneracy of up to 13 mismatches allowed (Fineran et al., 2014)

Figure adapted and modified from Edwards et al. (2015)
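
A minimal sketch of matching a CRISPR spacer against a phage genome, assuming ungapped comparison on both strands with a configurable number of mismatches. The sequences and function names are hypothetical, and real analyses would normally use an alignment tool instead of this brute-force scan.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equally long strings."""
    return sum(x != y for x, y in zip(a, b))


def spacer_hits(spacer, phage_genome, max_mismatches=0):
    """Return (strand, start, mismatches) for every ungapped hit of the spacer."""
    hits = []
    targets = (("+", phage_genome), ("-", reverse_complement(phage_genome)))
    for strand, target in targets:
        for start in range(len(target) - len(spacer) + 1):
            mismatches = hamming(spacer, target[start:start + len(spacer)])
            if mismatches <= max_mismatches:
                hits.append((strand, start, mismatches))
    return hits


# Hypothetical toy sequences.
phage = "TTGACCGTAGCTAGGATCCA"
spacer = "CGTAGCTAGG"           # exact copy of phage[5:15]
mutated_spacer = "CGTAGATAGG"   # one substitution relative to the phage

print(spacer_hits(spacer, phage))
print(spacer_hits(mutated_spacer, phage, max_mismatches=1))
```

Raising `max_mismatches` trades specificity for sensitivity; the slide cites a tolerance of up to 13 mismatches (Fineran et al., 2014).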

Page 26: Pizza club - May 2016 - Shaman

Exact matches

• Integration of phage to host via homologous recombination
• attP (POP’) on phage genome and attB (BOB’) on bacterial genome
• Common identical core sequence (2-15 bp) between phage and host
• Adjacent to integrase gene in phage genome, near tRNA gene in bacterial genomes

Figure adapted and modified from Edwards et al. (2015)

• Longer matches more reliable
• Up to 40% matches correct prediction
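
A minimal sketch of the exact-match signal, assuming the host is ranked by the length of the longest identical stretch shared with the phage. It uses a simple dynamic-programming longest-common-substring routine, which is quadratic in sequence length and only suitable for toy inputs; the sequences and names are hypothetical.

```python
def longest_exact_match(phage, host):
    """Return the longest substring shared by both sequences (first one found)."""
    best_len = 0
    best_end = 0  # end position (exclusive) of the best match in `phage`
    prev = [0] * (len(host) + 1)
    for i in range(1, len(phage) + 1):
        curr = [0] * (len(host) + 1)
        for j in range(1, len(host) + 1):
            if phage[i - 1] == host[j - 1]:
                # Extend the match that ended at the previous position of both.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return phage[best_end - best_len:best_end]


def predict_host(phage, host_genomes):
    """Pick the host sharing the longest exact match with the phage."""
    return max(host_genomes,
               key=lambda name: len(longest_exact_match(phage, host_genomes[name])))


# Hypothetical toy sequences.
phage = "AAACCGGTTACGTTT"
hosts = {
    "host_A": "GGGCCGGTTACGAAA",  # shares a 9 bp stretch with the phage
    "host_B": "TTTTTTTTAAAT",     # shares only 3 bp stretches
}
print(longest_exact_match(phage, hosts["host_A"]), predict_host(phage, hosts))
```

Short shared stretches such as the 3 bp match to `host_B` arise by chance, which is why longer matches are the more reliable signal.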

Page 27: Pizza club - May 2016 - Shaman

Oligonucleotide profiles

• Phages ameliorate their genomic oligonucleotide profiles towards those of the host
• Avoid recognition by restriction enzymes
• Adjustment of codon usage to match available host tRNAs
• Ogilvie et al. (2013) identified 408 metagenomic fragments with phage-like properties (4-mers)

Figure labels: contig with cas gene; contig with known phage gene; contig with CRISPR locus
Figure adapted and modified from Narayanasamy et al. (unpublished) and Edwards et al. (2015)

• Profiles cannot be too sparse (favours shorter k-mers)
• k = 3-8 predicted 8-17% correct hosts
• Codon usage predicted ~10% hosts correctly
• GC content not informative
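
A minimal sketch of comparing oligonucleotide profiles, assuming each genome is reduced to a vector of relative k-mer frequencies and the predicted host is the one at the smallest Euclidean distance from the phage. The distance measure, the toy sequences and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Relative frequency of every A/C/G/T k-mer, in a fixed lexicographic order."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # Counter returns 0 for k-mers that never occur in the sequence.
    return [counts["".join(kmer)] / total for kmer in product("ACGT", repeat=k)]


def euclidean(p, q):
    """Euclidean distance between two equally long profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def predict_host(phage, host_genomes, k=4):
    """Pick the host whose k-mer profile is closest to the phage profile."""
    phage_profile = kmer_profile(phage, k)
    return min(host_genomes,
               key=lambda name: euclidean(phage_profile,
                                          kmer_profile(host_genomes[name], k)))


# Hypothetical toy sequences; k=2 because they are far too short for 4-mers.
phage = "ATATATATGCATATAT"
hosts = {
    "host_A": "TATATATATACGTATA",  # AT-rich, like the phage
    "host_B": "GCGCGGCCGCGCCGGC",  # GC-rich
}
print(predict_host(phage, hosts, k=2))
```

With longer k the profile has 4^k entries, so short sequences leave most of them empty; that is the sparsity issue noted in the bullets above.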

Page 28: Pizza club - May 2016 - Shaman

Summary and overview

Signal category | Approach | Performance | Comments
Abundance profiles | Phage-host co-abundance profiles; association by correlation | 9.5% | Non-linear dynamics confound correlations
Genetic homology | Phage-host nucleotide and protein sequence homology | 38.5% (blastn); 29.8% (blastx) | Depends on database
CRISPRs | Spacer alignments to phage genomes | 15.1% (most similar); 21.3% (highest) | Occurrence of CRISPR system (~40% bacteria, ~70% archaea); no matches; not sensitive
Exact matches ** | Exact matches of phage-host genomes | 40.5% | Short exact matches may be random
Oligonucleotide profiles | Similarity of k-mer profiles of phage-host | 17.2% (4-mer); 10.4% (codon) |

Table adapted and modified from Edwards et al. (2015)

Page 29: Pizza club - May 2016 - Shaman

Summary and overview

• Blastn and exact matches provide strongest signal
• Most methods predict between 1-4 bacteria as most likely host (better than random)
• Significant host genome fraction required (except for abundance-based method)
• Current knowledge still limited
• Phage host range (highly specific vs broad range)
• New methods and technology

Figure adapted and modified from Edwards et al. (2015)

Page 30: Pizza club - May 2016 - Shaman

Thank you!

PHD-2014-1/7934898