Workshop on Whole Genome Sequencing and Analysis, 19-21 Mar. 2018

Introduction to single-isolates, single CGE services

Learning objective:

After this lecture and exercise, you should be able to…

…describe how the CGE methods for identifying species, Multilocus Sequence Type, plasmids, and antimicrobial resistance genes work

…account for the difference between assembly+BLAST-based prediction methods and mapping-based methods

…use the above-mentioned methods as stand-alone services and interpret the results

Tools for species identification

Name of Service | Description | Status | Publication
SpeciesFinder | Species identification using 16S rRNA | Online | Published Feb 2014, PMID: 24574292
KmerFinder | Species identification using overlapping 16mers | Online | Published Jan 2014, PMID: 24172157
TaxonomyFinder | Taxonomy identification using functional protein domains | Under development | Published in PMID: 24574292 + Oksana Lukjancenko's PhD thesis
Reads2Type | Species identification on client computer | No longer supported | Published Feb 2014, PMID: 24574292

PMID: 24574292

Training data

◇ 1,647 completed / almost completed genomes downloaded from NCBI in 2011 (1,009 different species)

Evaluation data

◇ NCBI draft genomes

• 695 isolates from species that overlap with training set (151 species)

◇ SRA draft genomes

• 10,407 sets of short reads from Illumina (168 species)

• 10,407 draft genomes from Illumina data (168 species)

16S rRNA

• 16S rRNA sequencing has dominated molecular taxonomy of prokaryotes for 40 years (Fox et al, Int. J. Syst. Bacteriol., 1977)

• Tremendous amounts of 16S rRNA sequence data are available in public databases

Concerns:

• Low resolution

• Some genomes contain several copies of the 16S rRNA gene with inter-gene variation

• The 16S rRNA gene represents only about 0.1% of the coding part of a microbial genome

CGE implementation of 16S species identification: SpeciesFinder

Reference database

• 16S rRNA genes are isolated from genomes in training data using RNAmmer (Lagesen, NAR, 2007).

Method

• Input genomes are BLASTed against 16S rRNA genes in reference database.

• Best hit is selected based on a combination of coverage, % identity, bitscore, number of mismatches and number of gaps in the alignments.

KmerFinder: Using (almost) all information in the WGS data

• Genomes in training data are chopped into 16mers:

ATGACGTATGACTGATGGCGTAGTAGTCC

• Downsampling

• Only 16mers with specific prefix (ATGAC) are kept
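The chopping and downsampling steps can be illustrated with a minimal Python sketch. It reuses the example sequence and the ATGAC prefix from the slide; the function names are made up for illustration and are not taken from the KmerFinder code.

```python
def chop_into_kmers(sequence, k=16):
    """Return every overlapping k-mer of `sequence`, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def downsample_by_prefix(kmers, prefix="ATGAC"):
    """Keep only the distinct k-mers that start with `prefix`."""
    return {kmer for kmer in kmers if kmer.startswith(prefix)}


# Example sequence from the slide (spaces removed).
sequence = "ATGACGTATGACTGATGGCGTAGTAGTCC"
all_kmers = chop_into_kmers(sequence)
kept = downsample_by_prefix(all_kmers)
print(len(all_kmers), sorted(kept))
```

Only a small share of the 16mers survives the prefix filter, which is why the slide says "almost" all information is used.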

Reference db: bacteria of known species (templates)

• Bact1 -> E. coli

• Bact2 -> S. enterica

• Bact3 -> K. pneumoniae

• Bact4 -> S. aureus

Query: bacteria of unknown species (?????)

Prediction: The query bacterium is S. aureus

Three other methods were evaluated

TaxonomyFinder: Performs its predictions based on the presence of protein profiles that are specific to particular taxonomic groups.

Reads2Type: Performs its predictions based on species-specific 50mers in the 16S rRNA or gyrB gene (for Enterobacteriaceae).

rMLST: Performs its predictions based on up to 53 ribosomal genes. Implemented in collaboration with Keith Jolley from Oxford (MLST).

Results

(16S rRNA)

Summary of taxonomy benchmark study

• KmerFinder had the highest accuracy and was the fastest method.

• SpeciesFinder (16S rRNA-based) had the lowest accuracy.

• Methods that only sample genomic loci (16S, Reads2Type, rMLST) had difficulties distinguishing species that diverged only recently, especially when the main difference is a plasmid.

“Standard” when aiming at determining the species of one isolate

“Winner takes it all” if you have a mixed sample or suspect you have a mixed sample

KmerFinder statistics

$$\text{Query coverage} = \frac{S}{q_u}$$

S: Score (total number of unique kmers in the query sequence that match kmers in the reference (template) sequence)

q_u: Total number of unique kmers in the query sequence

$$\text{Template coverage} = \frac{S}{l_u}$$

S: Score (total number of unique kmers in the query sequence that match kmers in the template sequence)

l_u: Total number of unique kmers in the reference (template) sequence

[Figure: kmers in query (q_u) and kmers in reference (template) genome (l_u), with the score S marked]
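As a rough illustration of the two coverage statistics, the sketch below computes S, query coverage (S / q_u) and template coverage (S / l_u) from two k-mer collections. The function name and the toy 4-mers are assumptions for the example; real input would be 16mers, and both collections are assumed to be non-empty.

```python
def coverage_statistics(query_kmers, template_kmers):
    """Compute score and coverage values from two k-mer collections."""
    query_unique = set(query_kmers)        # q_u is the size of this set
    template_unique = set(template_kmers)  # l_u is the size of this set
    score = len(query_unique & template_unique)  # S: unique query k-mers found in the template
    return {
        "score": score,
        "query_coverage": score / len(query_unique),
        "template_coverage": score / len(template_unique),
    }


# Toy data: the repeated "AAAA" in the query counts only once.
stats = coverage_statistics(
    ["AAAA", "AAAC", "AACG", "ACGT", "AAAA"],
    ["AAAA", "AAAC", "CCCC"],
)
print(stats)
```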

More KmerFinder statistics

Depth (Depth of Coverage). Only relevant when uploading raw reads.

Average number of times each position is covered by a kmer.

$$\text{Depth} = \frac{N \cdot L}{G}$$

N = total no. of kmers that match the template (not the same as score)

L = 16 (length of kmer)

G = Total no. of unique kmers in template
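The difference between N and the score is easiest to see with counts. The sketch below assumes the depth formula on the slide reads N · L / G; the function name and the toy 4-mers (so L = 4 here instead of 16) are illustrative only.

```python
from collections import Counter


def depth_of_coverage(read_kmers, template_kmers, kmer_length=16):
    """Return (depth, N, S) for raw-read k-mers against one template."""
    template_unique = set(template_kmers)
    counts = Counter(read_kmers)
    # N counts every matching k-mer occurrence, including repeats.
    n_matching = sum(c for kmer, c in counts.items() if kmer in template_unique)
    # S (the score) counts each matching k-mer only once.
    score = sum(1 for kmer in counts if kmer in template_unique)
    g_unique = len(template_unique)  # G
    return n_matching * kmer_length / g_unique, n_matching, score


read_kmers = ["AAAA"] * 3 + ["AAAC"] * 2 + ["GGGG"]
template_kmers = ["AAAA", "AAAC", "CCCC", "ACGT"]
depth, n_matching, score = depth_of_coverage(read_kmers, template_kmers, kmer_length=4)
print(depth, n_matching, score)
```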

KmerFinder output: standard scoring method

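Query (input): Raw reads from urine sample are split into 16mers

Only unique 16mers are kept

Template/reference database

• E. coli

• P. mirabilis

• S. aureus

In the “total” values the kmers are allowed to match more than one template

“Winner takes it all”

[Figure: KmerFinder output; the values 4493 and 3320 and the label “Depth” appear on the slide]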

Tools for further typing

Name of Service | Description | Publication
MLST | Multilocus sequence typing | Published Apr 2012, PMID: 22238442
PlasmidFinder | Identification of plasmids (replicons) in Enterobacteriaceae (and Gram-positives) | Published Apr 2014, PMID: 24777092
pMLST | pMLST of plasmids in Enterobacteriaceae | Published Apr 2014, PMID: 24777092

Multilocus Sequence Typing (MLST)

• First developed in 1998 for Neisseria meningitidis (Maiden et al. PNAS 1998. 95:3140-3145)

• The nucleotide sequences of internal regions of approx. 7 housekeeping genes are determined by PCR followed by Sanger sequencing

• Different alleles are each assigned an arbitrary number

• The unique combination of alleles is the sequence type (ST)
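The step from allele numbers to a sequence type is a table lookup. The sketch below shows the idea with an invented three-locus scheme; the locus names, allele numbers and ST values are hypothetical and do not come from any real MLST database.

```python
# Hypothetical scheme: locus names, allele numbers and STs are invented.
LOCI = ("locusA", "locusB", "locusC")
PROFILE_TO_ST = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 5): 3,
}


def sequence_type(alleles):
    """Map {locus: allele number} to an ST, or None for an unseen combination."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILE_TO_ST.get(profile)


st_known = sequence_type({"locusA": 1, "locusB": 2, "locusC": 1})
st_novel = sequence_type({"locusA": 9, "locusB": 2, "locusC": 1})
print(st_known, st_novel)
```

A combination that is missing from the table has no ST yet, which the sketch reports as None.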

Using WGS data for MLST

Download of the MLST data from pubmlst.org

The BLAST-based MLST software identifies the MLST alleles within the genome, which are afterwards translated to the corresponding ST

• Assembled genome

• 454 – single end reads

• 454 – paired end reads

• Illumina – single end reads

• Illumina – paired end reads

• Ion Torrent

• SOLiD – single end reads

• SOLiD – mate pair reads

Acinetobacter baumannii #1, Acinetobacter baumannii #2, Arcobacter, Borrelia burgdorferi, Bacillus cereus, Brachyspira hyodysenteriae, Bifidobacterium, Brachyspira intermedia, Bordetella, Burkholderia pseudomallei, Brachyspira, Burkholderia cepacia complex, Campylobacter jejuni, Clostridium botulinum, Clostridium difficile #1, Clostridium difficile #2, Campylobacter helveticus, Campylobacter insulaenigrae, Clostridium septicum, C. diphtheriae, Campylobacter fetus, Chlamydiales

Campylobacter lari, Cronobacter, C. upsaliensis, Escherichia coli #1, Escherichia coli #2, Enterococcus faecalis, Enterococcus faecium, F. psychrophilum, Haemophilus influenzae, Haemophilus parasuis, Helicobacter pylori, Klebsiella pneumoniae, Lactobacillus casei, Lactococcus lactis, Leptospira, Listeria, Listeria monocytogenes, Moraxella catarrhalis, Mannheimia haemolytica, Neisseria, P. gingivalis, P. acnes

Pseudomonas aeruginosa, Pasteurella multocida, Pasteurella multocida, Staphylococcus aureus, Streptococcus agalactiae, Salmonella enterica, Staphylococcus epidermidis, S. maltophilia, Streptococcus pneumoniae, Streptococcus oralis, S. zooepidemicus, Streptococcus pyogenes, Streptococcus suis, Streptococcus thermophilus, Streptomyces, Streptococcus uberis, Vibrio parahaemolyticus, Vibrio vulnificus, Wolbachia, Xylella fastidiosa, Y. pseudotuberculosis

Mismatches

Extended Output

Truncated gene

Extended Output

PlasmidFinder - identification of plasmid replicons

The BLAST-based PlasmidFinder software identifies replicons in the input data

Ongoing update of the PlasmidFinder database

Enterobacteriaceae

Gram positives

Select the threshold for minimum % identity and length coverage

Select the database

% Identity: The percentage of nucleotides in the matching locus of the input genome that are identical to the nucleotides of the plasmid replicon in the PlasmidFinder database.

Minimum length: The minimum percentage of the length of the plasmid replicon that has a matching region in the input genome.

[Diagrams: input genome and plasmid replicon]
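To make the two thresholds concrete, here is a small sketch that computes % identity and length coverage for one pairwise alignment and checks them against user-chosen cut-offs. It assumes the alignment is already available as two equal-length strings with '-' for gaps and uses the alignment length as the identity denominator; the function names, sequences and threshold values are hypothetical and not PlasmidFinder's own code or defaults.

```python
def identity_and_coverage(aligned_query, aligned_replicon, replicon_length):
    """Return (% identity, % length coverage) for one alignment.

    Both aligned strings must have equal length; '-' marks a gap.
    """
    if len(aligned_query) != len(aligned_replicon):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        q == r and q != "-" for q, r in zip(aligned_query, aligned_replicon)
    )
    percent_identity = 100.0 * identical / len(aligned_query)
    # Replicon positions that take part in the alignment.
    covered = sum(r != "-" for r in aligned_replicon)
    percent_coverage = 100.0 * covered / replicon_length
    return percent_identity, percent_coverage


def passes_thresholds(percent_identity, percent_coverage, min_identity, min_coverage):
    """True if the hit meets both user-selected thresholds."""
    return percent_identity >= min_identity and percent_coverage >= min_coverage


# Toy hit: 8 aligned positions, 1 mismatch, replicon is 10 nt long.
pid, cov = identity_and_coverage("ACGTACGT", "ACGTACGA", replicon_length=10)
print(pid, cov, passes_thresholds(pid, cov, min_identity=80, min_coverage=60))
```

Raising the identity threshold to 95 would reject the same toy hit, which mirrors what the threshold selection on the web form does.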

Remember - The PlasmidFinder database contains replicons, not entire plasmids.

pMLST: plasmid MLST for IncF, IncN, IncHI1, IncHI2, and IncI1 plasmids

ResFinder - identification of acquired resistance genes

The BLAST-based ResFinder software identifies the acquired resistance genes within the genome, along with accession numbers and theoretical resistance phenotype

Tetracycline

Beta-lactam

Colistin

Ongoing update of the ResFinder database

ResFinder output

◇ 200 isolates from 4 different species (Salmonella Typhimurium, Escherichia coli, Enterococcus faecalis and Enterococcus faecium)

◇ ResFinder, 98% ID, 60% length coverage

◇ Phenotypic tests, 3,051 in total

• 482 Resistant

• 2,569 Susceptible

=> 99.74% of the results were in agreement between ResFinder and the phenotypic tests

23 discrepancies -> 16, typically in relation to spectinomycin in E. coli

• Allows for species-specific identification of point mutations in chromosomal genes causing antimicrobial resistance

• Uses BLAST for identifying relevant genes in input genomes, which are then screened for mutations known to cause resistance

Campylobacter, E. coli, Salmonella, N. gonorrhoeae, M. tuberculosis

Can also report mutations not specifically known to cause resistance (unknown mutations)
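The screening step can be pictured as comparing a gene found in the input genome with its reference and sorting the differences into known and unknown mutations. The sketch below is a simplification under stated assumptions: the gene has already been located (the BLAST step is skipped), the comparison is done on gap-free protein sequences, and the gene name, sequences and mutation table are invented for illustration.

```python
# Hypothetical database: gene, reference protein and mutation are invented.
REFERENCE_PROTEINS = {"toyA": "MKSDE"}
KNOWN_MUTATIONS = {"toyA": {(3, "S", "L"): "hypothetical drug X"}}


def screen_point_mutations(gene, query_protein):
    """Split amino acid differences into known and unknown mutations."""
    reference = REFERENCE_PROTEINS[gene]
    known, unknown = [], []
    for position, (ref_aa, query_aa) in enumerate(
        zip(reference, query_protein), start=1
    ):
        if ref_aa == query_aa:
            continue
        label = f"{ref_aa}{position}{query_aa}"
        phenotype = KNOWN_MUTATIONS[gene].get((position, ref_aa, query_aa))
        if phenotype is None:
            unknown.append(label)  # difference not listed as resistance-causing
        else:
            known.append((label, phenotype))
    return known, unknown


known, unknown = screen_point_mutations("toyA", "MKLDQ")
print(known, unknown)
```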

Overview of chromosomal point mutations included for E. coli

Assembly+BLAST-based methods

Raw reads -> Assembly -> Draft genome -> BLAST against database w. genes of interest

• The old, trusty method

• Slow

• Genes might be missed if at contig ends (assembly dependent)

Mapping-based methods

Raw reads -> Mapping (BWA/Kmers) against database w. genes of interest

• Initially used in SRST2 (Inouye et al., 2014, Genome Med. 6:90)

• Fast

• More sensitive -> higher performance

• Too sensitive -> false positives caused by noise (contamination)

KmerResistance - Identification of acquired resistance genes

https://cge.cbs.dtu.dk/services/KmerResistance/

• Examines the number of co-occurring kmers between input data and genes in ResFinder database

• Uses the “winner takes it all” strategy

I) Kmers are only assigned to the gene with the highest number of kmer matches

II) Kmers matching this best hit are removed

III) Steps I and II are repeated until no kmers remain

• To avoid false positives, a threshold for min. depth and breadth of coverage is introduced

• The threshold varies according to the depth and breadth of the entire genome as predicted by KmerFinder

Clausen et al. (2016). J Antimicrob Chemother. 71(9):2484-8
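Steps I to III of the “winner takes it all” strategy can be sketched as a greedy loop. This is an illustration of the idea rather than the KmerResistance implementation: the function name, gene names and k-mer labels are invented, ties are broken alphabetically as an arbitrary choice, and the depth and breadth thresholds are left out.

```python
def winner_takes_it_all(query_kmers, gene_kmers):
    """Greedily assign query k-mers to genes.

    gene_kmers maps gene name -> set of k-mers.
    Returns a list of (gene, number of k-mers assigned), best hit first.
    """
    remaining = set(query_kmers)
    hits = []
    while remaining:
        # Step I: find the gene that matches the most remaining k-mers.
        best_gene, best_matches = None, set()
        for gene in sorted(gene_kmers):  # sorted for deterministic tie-breaking
            matches = remaining & gene_kmers[gene]
            if len(matches) > len(best_matches):
                best_gene, best_matches = gene, matches
        if best_gene is None:
            break  # leftover k-mers match no gene at all
        hits.append((best_gene, len(best_matches)))
        # Step II: remove the k-mers claimed by the best hit.
        remaining -= best_matches
        # Step III: the loop repeats on what is left.
    return hits


toy_genes = {
    "geneA": {"k1", "k2", "k3"},
    "geneB": {"k3", "k4"},
    "geneC": {"k5"},
}
hits = winner_takes_it_all({"k1", "k2", "k3", "k4", "k5", "k9"}, toy_genes)
print(hits)
```

In the toy data, the shared k-mer k3 is credited to geneA only, so geneB ends up with a single k-mer instead of two.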

Output from KmerResistance

Output from KmerResistance, continued

Handling sequence data? Watch out!

FASTA file in Word

This should be fine…

Handling sequence data? Watch out!

Oh no! This won't work…

Use “pure” text editors

Example:

• Sublime Text

Save files in “txt” format.

What your data actually looks like!
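A quick way to check that a file is still plain text before uploading it is to test its raw bytes. The sketch below is a hypothetical helper, not part of any CGE service: it accepts only ASCII content whose first non-empty line is a FASTA header and whose sequence lines contain IUPAC nucleotide codes.

```python
def looks_like_plain_fasta(raw_bytes):
    """Return True if raw_bytes decode as ASCII and have nucleotide FASTA structure."""
    try:
        text = raw_bytes.decode("ascii")
    except UnicodeDecodeError:
        return False  # word processor formats typically contain non-ASCII bytes
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    allowed = set("ACGTURYSWKMBDHVN-")  # IUPAC nucleotide codes plus gap
    for line in lines:
        if line.startswith(">"):
            continue  # header lines may contain any text
        if not set(line.upper()) <= allowed:
            return False
    return True


plain_ok = looks_like_plain_fasta(b">seq1\nACGTN\nacgt\n")
binary_bad = looks_like_plain_fasta(b"PK\x03\x04\xff\xfe")
markup_bad = looks_like_plain_fasta(b">seq1\nACGT\n{\\rtf1}")
print(plain_ok, binary_bad, markup_bad)
```

To run it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the result to the function.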

A word on browsers

Browsers likely to work with no problems: Chrome, Firefox, (Safari)

Browsers we don’t like: Explorer, Edge

And now… www.goseqit.com/exercise1

Exercise data

Also available via DropBox: https://www.dropbox.com/sh/09r0kab7hzeb9mv/AAAHWHvUuad3pG2gPq9llc7Za?dl=0