The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

The path to implementation of WGS in PulseNet National Center for Emerging and Zoonotic Infectious Diseases Division of Foodborne, Waterborne, and Environmental Diseases Peter Gerner-Smidt, MD, DSc Enteric Diseases Laboratory Branch GMI9 Rome, Italy, May 23- 25, 2016


Page 1: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

The path to implementation of WGS in

PulseNet

National Center for Emerging and Zoonotic Infectious Diseases

Division of Foodborne, Waterborne, and Environmental Diseases

Peter Gerner-Smidt, MD, DSc

Enteric Diseases Laboratory Branch

GMI9

Rome, Italy, May 23-25, 2016

Page 2: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

PulseNet International

The international subtyping network of national and regional networks for foodborne disease surveillance

“Saving Lives Since 2000”

http://www.pulsenetinternational.org/

Page 3: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Whole Genome Sequencing (WGS) is a Transforming and REPLACING Technology

Consolidating multiple laboratory workflows into one:

o Identification – serotyping – virulence profiling – antimicrobial resistance characterization – plasmid characterization – subtyping

Replacing, NOT supplementing, current methods

More precise, more informative, more cost-efficient

Page 4: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

WGS in Public Health:

The analytical tools must be

• Simple

• Public health microbiologists are NOT bioinformaticians

• Standard desktop software

• Comprehensive

• All characterization incl. analysis in one workflow

• Working in a network of laboratories, i.e. STANDARDIZED

• Free sharing and comparison of data between labs

• Central and local analysis

Page 5: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

MLST vs SNP

Criterion                                    SNP                              MLST
Epidemiological concordance                  High                             High
Stable nomenclature                          (No)                             Yes
Reference characterization (identification,  No                               Yes
  serotyping, virulence & resistance markers)
Speed                                        Slow SNP calling,                Slow allele calling,
                                             slow analysis                    fast analysis
Local computing requirements                 Medium-High                      Low
Local bioinformatics expertise               Yes                              No
Reference used to perform analysis           Sequence of closely related      Allele database
                                             annotated strain
Requires curation                            No                               (Yes)

MLST is the primary approach for public health surveillance; SNP is used if more detail is needed or MLST fails
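One way to picture the "fast analysis" entry for MLST in the comparison above: once alleles have been called, two isolates can be compared by counting the loci at which their allele numbers differ. The sketch below is illustrative only. The function name, the locus names and the allele numbers are hypothetical, and skipping loci without a call in either isolate is an assumption, not a rule stated in the slides.

```python
def allele_distance(profile_a, profile_b):
    """Count loci where both isolates have an allele call and the calls differ.

    Profiles map locus name -> allele number, with None for "no call".
    Loci missing or uncalled in either profile are skipped (an assumption).
    """
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical allele profiles for two isolates.
isolate_1 = {"locus_1": 25, "locus_2": 2, "locus_3": 18, "locus_4": 4}
isolate_2 = {"locus_1": 25, "locus_2": 2, "locus_3": None, "locus_4": 109}

print(allele_distance(isolate_1, isolate_2))
```

Because the comparison is a lookup per locus, it needs no reference genome and little local computing power, which is consistent with the "Low" computing requirement listed for MLST.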

Page 6: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Listeria 1403MLGX6-1WGS

wgMLST and hqSNP Are Equally Discriminatory and Phylogenetic Trees Are Concordant

[Figure: hqSNP tree and wgMLST (<All Characters>) tree of the same six isolates, shown next to a table of wgMLST allele numbers for selected loci between LMO_1 and LMO_18. Isolates in both trees: State 2 isolate 1, State 1 isolate, State 3 isolate, State 2 isolate 2, State 2 isolate 3, and the 2013 isolate (nearest neighbor).]

Page 7: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Trees ~ Tables

Key SourceState Serotype PFGE-XbaI-pattern PFGE-XbaI-status PFGE-BlnI-pattern PFGE-BlnI-status Outbreak SourceCounty SourceCity SourceCountry SourceType SourceSite PatientAge PatientSex IsolatDate ReceivedDate UploadDate

M18340 M Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 DeKalb Dawsonville USA Human Stool 54 UNKNOWN 6/26/2015 7/15/2015 8/4/2015
X150951 X Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Gwinnett Key West USA Human Stool 33 MALE 7/5/2015 7/15/2015 8/4/2015
D108427 D Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Fulton Miami USA Human Blood 50 FEMALE 7/7/2015 7/15/2015 8/4/2015
A15054-1 A Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Pickens USA Human Stool 28 FEMALE 7/7/2015 7/27/2015 8/7/2015
D508583 D Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Dawson Philadelphia USA Human Stool 24 FEMALE 7/21/2015 8/11/2015
M088433 M Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Forsyth USA Human Stool 44 FEMALE 7/16/2015 7/24/2015 8/13/2015
P110964-1 P Enteritidis JEGX01.0009 Confirmed Unconfirmed Forsyth USA Human Blood 72 MALE 8/3/2015 8/10/2015 8/17/2015
A09461 A Enteritidis JEGX01.0009 Confirmed Unconfirmed Cabbagetown USA Human Blood 43 FEMALE 7/30/2015 8/5/2015 8/26/2015
A109320 A Enteritidis JEGX01.0009 Confirmed Unconfirmed Bismarck USA Human Stool 28 UNKNOWN 7/25/2015 8/6/2015 8/27/2015
T509961 T Enteritidis JEGX01.0009 Confirmed Unconfirmed Forsyth Decatur USA Human Stool 57 UNKNOWN 7/31/2015 8/13/2015 9/10/2015
A110203 A Enteritidis JEGX01.0009 Confirmed Unconfirmed DeKalb Hollywood USA Human Other 14 FEMALE 8/11/2015 8/25/2015 9/22/2015
A151664 A Enteritidis JEGX01.0009 Confirmed Unconfirmed Talking Rock USA Human Stool 62 MALE 8/26/2015 9/8/2015 9/28/2015
DA159061 K Enteritidis JEGX01.0009 Confirmed Unconfirmed Pickens Pierre USA Human Stool 6 FEMALE 8/29/2015 9/9/2015 9/29/2015
M150130-1 P Enteritidis JEGX01.0009 Confirmed Unconfirmed Dawson USA Human Stool 6 MALE 9/20/2015 9/28/2015 10/1/2015
C15-0445058 N Enteritidis JEGX01.0009 Confirmed Unconfirmed Charlotte USA Human Stool 5 MALE 9/2/2015 9/25/2015 10/9/2015
A122326 L Enteritidis JEGX01.0009 Confirmed Unconfirmed Gwinnett NYC USA Human Blood 88 FEMALE 9/30/2015 10/7/2015 10/15/2015
A151248 A Enteritidis JEGX01.0009 Confirmed Unconfirmed Atlanta USA Human Stool 37 MALE 10/4/2015 10/13/2015 10/21/2015
A125223 D Enteritidis JEGX01.0009 Confirmed Unconfirmed Hall L..A. USA Human Stool FEMALE 9/26/2015 10/14/2015 10/22/2015

[Figure: phylogenetic tree, shown twice, with leaf labels such as FDA00009433, PNUSAS000907 and 2015K-0962; several leaves are marked with an asterisk.]

Page 8: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Definitive phylogenetically relevant naming of WGS profiles

“SNP Address”

Courtesy Tim Dallman, PHE

[Figure: hierarchical tree with numbered clusters at each level; example SNP addresses at the leaves: 1.1.1, 1.2.2, 1.2.3, 1.2.4, 2.3.5, 2.3.6.]

Page 9: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

“SNP address”

• Hierarchical clustering based on full pairwise distance between two genomes

• Used to assign a SNP address to a strain based on specified index, e.g. 50:25:10:5:0

• Can be used for surveillance purposes

Courtesy Tim Dallman, PHE

PulseNet International will use MLST: “Allele Code”
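The idea of a hierarchical address can be sketched in a few lines: cluster the genomes once per threshold in the index and join the cluster numbers with dots. This is a minimal illustration, not the PHE or PulseNet implementation. It assumes single-linkage clustering (the slides leave the choice of algorithm open), a precomputed symmetric distance matrix, and cluster numbers assigned in order of first appearance; the function names and the example distances are hypothetical.

```python
from itertools import combinations


def single_linkage_labels(dist, threshold):
    """Cluster number per genome; genomes within `threshold` are linked."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[max(root_i, root_j)] = min(root_i, root_j)

    # Number clusters 1, 2, 3, ... in order of first appearance.
    numbers = {}
    return [numbers.setdefault(find(i), len(numbers) + 1) for i in range(n)]


def snp_addresses(dist, thresholds=(50, 25, 10, 5, 0)):
    """One dotted address per genome, one cluster number per threshold."""
    levels = [single_linkage_labels(dist, t) for t in thresholds]
    return [".".join(str(level[i]) for level in levels) for i in range(len(dist))]


# Hypothetical pairwise distances between four genomes.
dist = [
    [0, 3, 30, 200],
    [3, 0, 28, 200],
    [30, 28, 0, 200],
    [200, 200, 200, 0],
]
for address in snp_addresses(dist):
    print(address)
```

In this toy example the first two genomes share every level except the last (distance 3 is above the 0 threshold), so their addresses differ only in the final digit. The same scheme applies to allele distances, in which case the result plays the role of an allele code.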

Page 10: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Considerations for a phylogenetically relevant strain nomenclature system

• Must be simple

– Sequence of numbers

• Stability of system

– Fit new sequences into an existing tree?

– Recalculate the clusters with every new entry?

• No matter which method is used, the stability can be controlled

• < 2% risk that a new sequence cannot be fitted unambiguously into the nomenclatural system

• Cutoffs between levels

• Clustering algorithm

– Single linkage? UPGMA?

Page 11: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

WGS Data Workflow

• Sequencer: raw sequences

• LIMS

• Temporary storage, QA/QC, data extraction: trimming, mapping, de novo assembly, SNP detection, allele detection (NO metadata)

• Allele & allele code databases: allele names, allele code (strain names) (NO metadata)

• Public health databases: extensive metadata; database managers and end users

• External storage: NCBI, ENA (limited metadata)

• Typing results named in the diagram: 7-gene MLST, cgMLST, wgMLST, (SNPs); allelic profile, ST, allele code

Page 12: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Acknowledgements

National Center for Emerging and Zoonotic Infectious Diseases

Division of Foodborne, Waterborne, and Environmental Diseases

Disclaimers:

“The findings and conclusions in this presentation are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention.”

“Use of trade names is for identification only and does not imply endorsement by the Centers for Disease Control and Prevention or by the U.S. Department of Health and Human Services.”

Public Health Agency of Canada

Institut Pasteur, S. Brisse; M. Lecuit

Center for Genomic Epidemiology, DTU

University of Oxford, M. Maiden, K. Jolley

Public Health England, T. Dallman

Page 13: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Hierarchical Nomenclature Is Inherently Unstable

• As we use approximate matching to group strains, equality is no longer transitive. Given strains A, B and C with distances as indicated, then at distance cutoff 21, A, B and C would be in the same cluster.

• However, if B has not been sampled yet, A and C would not be in the same cluster

• How bad is it?

[Figure: strains A, B and C joined pairwise; the two distances through B are 13 and 17, and the distance between A and C is 28.]

Courtesy: Hannes Pouseele, Applied Maths
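The A, B, C example can be checked with a short sketch. It assumes single-linkage grouping (a strain joins any group containing a member within the cutoff) and assumes that the 13 belongs to the A-B pair and the 17 to the B-C pair; swapping those two values does not change the outcome. The function name is hypothetical.

```python
def linked_groups(names, dist, cutoff):
    """Single-linkage groups: strains connected by a chain of distances <= cutoff."""
    groups = []
    for name in names:
        # Every existing group with at least one member within the cutoff.
        hits = [
            group for group in groups
            if any(dist[frozenset((name, other))] <= cutoff for other in group)
        ]
        merged = {name}.union(*hits)
        groups = [group for group in groups if group not in hits] + [merged]
    return sorted(sorted(group) for group in groups)


dist = {
    frozenset(("A", "B")): 13,
    frozenset(("B", "C")): 17,
    frozenset(("A", "C")): 28,
}

with_b = linked_groups(["A", "B", "C"], dist, cutoff=21)
without_b = linked_groups(["A", "C"], dist, cutoff=21)
print(with_b)     # [['A', 'B', 'C']]
print(without_b)  # [['A'], ['C']]
```

B acts as a bridge: A and C are 28 apart, above the cutoff, yet end up in one cluster once B is sampled. Any name given to A or C before B arrived would then have to change for one of them.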

Page 14: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Cutoff determination (case: PulseNet Listeria cgMLST database, N = 3,652)

Test procedure: find points with minimal name changes starting from nothing and by chronological addition of strains

Thresholds: 150:100:63:41:21:11

Courtesy: Hannes Pouseele, Applied Maths

Page 15: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Stability Assessment

• Test 1: starting from nothing, add samples chronologically

• Test 2: starting from a random subset (50%), add samples chronologically

• Using a precalculated strain nomenclature structure based on what is known today reduces nomenclature instability beyond what is expected (the expectation in this case being a 50% reduction)

• The 21 allelic changes cutoff might not be stable enough

Threshold    % change (Test 1)    % change (Test 2)
11           1.01%                0.30%
21           2.51%                1.64%
41           2.51%                0.57%
63           1.37%                0.27%
100          2.52%                0.03%
150          0.22%                0%

Courtesy: Hannes Pouseele, Applied Maths

Page 16: The path to implementation of Whole Genome Sequencing (WGS) in PulseNet

Stability Assessment

Conclusions

MLST-based hierarchical strain nomenclature is feasible

• Stability good

– Without the 21 allelic changes cutoff, less than 1.17% name changes

• Stability can be further increased by defining a broad starting set

– Using a more international collection of strains

– Using biological knowledge about the population structure of L. monocytogenes

• Computational feasibility

– Names can be assigned one sample at a time, no need for complete recalculations

• wgMLST instead of cgMLST yields extremely similar results

Courtesy: Hannes Pouseele, Applied Maths
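Assigning names one sample at a time can be sketched for a single level of the hierarchy. The sketch below is an assumption-laden illustration, not the Applied Maths algorithm: it takes the distances from the new sample to all named samples, joins an existing cluster if any member is within the cutoff, opens a new cluster otherwise, and flags the case where the sample falls within the cutoff of more than one cluster. The function name, the tie-break (join the lowest-numbered cluster) and the example distances are hypothetical.

```python
def assign_one(new_id, distances_to_named, cluster_of, cutoff):
    """Name one new sample without recalculating existing clusters.

    distances_to_named: sample id -> distance from the new sample.
    cluster_of: sample id -> cluster number; updated in place.
    Returns (cluster_number, ambiguous). `ambiguous` is True when the new
    sample is within the cutoff of more than one existing cluster.
    """
    nearby = {
        cluster_of[sample]
        for sample, distance in distances_to_named.items()
        if distance <= cutoff
    }
    if nearby:
        cluster_number = min(nearby)  # hypothetical tie-break
    else:
        cluster_number = max(cluster_of.values(), default=0) + 1  # new cluster
    cluster_of[new_id] = cluster_number
    return cluster_number, len(nearby) > 1


# Strains arrive in the order A, C, B at cutoff 21 (distances 13, 17, 28).
cluster_of = {}
first = assign_one("A", {}, cluster_of, cutoff=21)
second = assign_one("C", {"A": 28}, cluster_of, cutoff=21)
third = assign_one("B", {"A": 13, "C": 17}, cluster_of, cutoff=21)
print(first, second, third)
```

Each assignment only needs the distances from the new sample to the named ones, so the cost grows with the size of the database rather than requiring a full reclustering. The flagged case for B is the kind of event that the stability percentages in the assessment are counting, under the assumptions stated here.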