the prospects for nextgen surveillance of pathogens: a view from a public health lab
TRANSCRIPT
The prospects for Nextgen surveillance of pathogens: A view from a Public Health Lab
William Wolfgang Wadsworth Center NYSDOH
NIST Workshop 10/20/14
Public Health Genomics – projects at the Wadsworth
Project                                        Sample Source         PI
Salmonella surveillance                        Isolate               Wolfgang
TB surveillance and drug resistance            Isolate and Primary   Escuyer and Musser
C. botulinum source tracking                   Isolate               Egan
HCV subtyping                                  Primary               Parker and Chou
Adenovirus surveillance and characterization   Isolate               St. George and Lamson
Rabies virus intra-host evolution              Isolate               Davis
Mosquito microbiome and West Nile Virus        Primary               Ciota and Kramer
We collaborate with a number of different groups on Pathogen sequencing projects
• Global Microbial Identifier (GMI) initiative
• CDC: Listeria monocytogenes initiative and AMD initiative
• FDA: GenomeTrakr initiative
• Minnesota and Washington Departments of Health
• And we hope to do more
At the Wadsworth we need standards
• As we transition to new technologies we need to know whether they are:
  • Faster
  • Cheaper
  • Better
• For this we need to make accurate and meaningful comparisons.
• To do this we need standards.
Why use Nextgen for Salmonella typing?
• PFGE has low discriminatory power.
Each year
• 1 million cases of Salmonella in the US.
• 19,000 hospitalizations and 378 deaths.
• The Wadsworth receives about 1,500/yr.
Proof of principle study on a Salmonella Enteritidis outbreak
• Sept. 2010: Connecticut Dept. of Health identifies a Salmonella outbreak in a long-term care facility (LTCF).
• Outbreak was linked to cannoli from a Westchester bakery.
• Both NY and CT cases consumed cannoli.
• Isolates had the most common PFGE pattern.
Retrospective cohort
Key             County          Date       PFGE
IDR1000029153   Cattaraugus     8/10/10    JEGX01.0004
IDR1000031528   Rockland        8/26/10    JEGX01.0004
IDR1000033213   Putnam          9/10/10    JEGX01.0004
IDR1000033369   Putnam          9/10/10    JEGX01.0004
IDR1000033371   Putnam          9/11/10    JEGX01.0004
IDR1000034601   Washington      9/13/10    JEGX01.0004
IDR1000034587   Westchester     9/20/10    JEGX01.0004
IDR1000035417   Putnam          9/22/10    JEGX01.0004
IDR1000035178   Westchester     9/13/10    JEGX01.0004
IDR1000035179   Greenwich CT    9/12/10    JEGX01.0004
IDR1000035180   Westchester     9/12/10    JEGX01.0004
IDR1000035181   Westchester     9/13/10    JEGX01.0004
IDR1000035182   Westchester     9/12/10    JEGX01.0004
IDR1000035183   Greenwich CT    9/16/10    JEGX01.0004
IDR1000036119   (none listed)   9/17/10    JEGX01.0004
IDR1100035184   Westchester     9/16/10    JEGX01.0004
IDR1000036319   Putnam          9/28/10    JEGX01.0004
IDR1000036979   Putnam          10/8/10    JEGX01.0004
IDR1000038792   Nassau          10/29/10   JEGX01.0004
IDR1000034599   Orange          9/15/10    JEGX01.0004
IDR1100006235   Westchester     2/21/11    JEGX01.0004
IDR1100021079   Rockland        7/13/11    JEGX01.0004
IDR1000030147   Out-Of-State    8/22/10    JEGX01.0004
IDR1100003844   Onondaga        2/1/11     JEGX01.0004
IDR1100022186   Yates           7/22/11    JEGX01.0004
IDR1100027690   Erie            9/6/11     JEGX01.0004
IDR1100030508   Madison         10/9/11    JEGX01.0004
IDR1100031312   Suffolk         10/5/11    JEGX01.0004
IDR1100032014   Onondaga        10/22/11   JEGX01.0004
IDR1000028670   Nassau          8/8/10     JEGX01.0004
IDR1000029949   Suffolk         8/16/10    JEGX01.0004
IDR1000033603   Erie            9/14/10    JEGX01.0004
IDR1000034213   Erie            9/13/10    JEGX01.0004
IDR1000037723   Westchester     10/4/10    JEGX01.0004
IDR1000039087   Westchester     10/27/10   JEGX01.0004
[Figure: whole-genome SNP tree of the 35 isolates in the retrospective cohort, labeled by abbreviated key (e.g. 1106235). Scale bar: 7.3 SNPs. Node support values range from 68 to 100. The labels "A", "B", and "LTCF" appear on the tree. Isolates marked "+": 1036319, 1033369, 1035417, 1033213, 1034587, 1034601, 1033371, 1038792, 1036979.]
Whole Genome Cluster Analysis (WGCA) can identify an outbreak cluster not detected by PFGE
All isolates are PFGE PATTERN 4
Implementing WGCA for SE in real-time.
• Evaluate WGCA compared to PFGE.
  • Speed – Faster
  • Cost – Cheaper
  • More Actionable Clusters – Better
• Develop an in-house bioinformatics pipeline.
• Develop a communication pipeline to epidemiologists.
• Determine cluster parameters that represent an outbreak from a single source (assign a probability).
• Use data sets to evaluate evolving informatics methods.
• Become proficient (PT programs).
Over the past 12 months
• Sequenced all Salmonella Enteritidis (379 genomes).
• All data at NCBI
• Developed an in-house pipeline to analyze the data.
• SNP-based phylogenetic trees were constructed in real time.
• 63 phylogenetic clusters were reported to epidemiologists.
  • 0 to 5 SNP differences
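The slides do not say how the in-house pipeline turns SNP differences into reportable clusters. As a minimal sketch of one possibility, assuming single-linkage grouping at a SNP threshold (the function name, isolate names, and distances below are all hypothetical, for illustration only):

```python
def snp_clusters(distances, threshold=5):
    """Group isolates by single linkage: two isolates share a cluster when a
    chain of pairwise SNP distances <= threshold connects them.

    `distances` maps (isolate_a, isolate_b) tuples to SNP counts.
    """
    # Collect every isolate named in the distance table.
    isolates = sorted({name for pair in distances for name in pair})
    parent = {name: name for name in isolates}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Merge the clusters of every pair that falls within the threshold.
    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    # Report only groups with at least two isolates.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical isolates and SNP counts, not data from the talk.
example = {
    ("iso1", "iso2"): 1,
    ("iso1", "iso3"): 4,
    ("iso2", "iso3"): 5,
    ("iso1", "iso4"): 40,
    ("iso2", "iso4"): 41,
    ("iso3", "iso4"): 38,
}
clusters = snp_clusters(example, threshold=5)
print(clusters)
```

Single linkage is only one choice; it can chain loosely related isolates together, which is part of why the cluster parameters themselves need evaluation.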
Current tree
• Is this tree structure reproduced by other pipelines?
• Is it reproduced within our pipeline?
• What are the minimal sequencing metrics required to create reproducible trees?
• What changes would make this faster, cheaper, and better?
[Figure: SNP-based phylogenetic tree of the sequenced Salmonella Enteritidis isolates (tip labels of the form NY-swgs1314, swgs1212, and NY-IDR1300012602), together with Salmonella_enterica_str_P125109. Scale bar: 0.02. Asterisks next to tips indicate PFGE type; legend entries: 2, 4, 5, 21 or 23, 34, 56.]
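One way to put a number on "is this tree structure reproduced?" is to count the clades that appear in one tree but not the other. The sketch below is a rooted, Robinson-Foulds-style clade comparison; it assumes both trees are rooted and already parsed into nested tuples of tip names. The tree shapes and isolate names are hypothetical, and this is not a description of any pipeline used in the talk.

```python
def clades(tree):
    """Return (non-trivial clades, all tips) for a rooted tree written as
    nested tuples of tip-name strings. Each clade is a frozenset of tips."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        # An internal node's clade is the union of its children's tips.
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    all_tips = tips(tree)
    found.discard(all_tips)  # the root clade is shared by every tree
    return found, all_tips


def clade_difference(tree_a, tree_b):
    """Count clades present in exactly one of the two trees (0 = same topology)."""
    clades_a, tips_a = clades(tree_a)
    clades_b, tips_b = clades(tree_b)
    if tips_a != tips_b:
        raise ValueError("trees must contain the same isolates")
    return len(clades_a ^ clades_b)


# Hypothetical output of two pipelines run on the same five isolates.
pipeline_a = ((("iso1", "iso2"), "iso3"), ("iso4", "iso5"))
pipeline_b = ((("iso1", "iso3"), "iso2"), ("iso4", "iso5"))
print(clade_difference(pipeline_a, pipeline_a))
print(clade_difference(pipeline_a, pipeline_b))
```

A comparison like this ignores branch lengths and node support, so it addresses only topology; a standard would likely need to cover those as well.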
Can we develop cluster metrics that give a probability of linkage to a single source?
• Perform phylogenetic analysis of epidemiologically confirmed outbreaks.
I. Calculate SNP distance. II. Examine tree structure.
• Do for many serovars of each species.
• This could be done relatively easily by freezer diving and HiSeq runs.
• For SE, SNP distances appear to be small (0-3 SNPs). I. Based on 9 bona fide outbreaks from NY and MN.
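The "calculate SNP distance" step can be sketched as a pairwise count of differing positions in an alignment. This is a minimal illustration, assuming the isolates are already aligned to a common reference and that positions with an ambiguous base (N) or gap (-) are skipped; the sequences and names are toy values, not data from the outbreaks mentioned above.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ, skipping positions
    where either sequence has an ambiguous base or a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != y and x not in ignore and y not in ignore
    )


def pairwise_snp_distances(alignment):
    """Return {(name_a, name_b): distance} for every pair of isolates."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical isolates, 10 sites).
toy = {
    "outbreak_1": "ACGTACGTAC",
    "outbreak_2": "ACGTACGTAT",
    "outbreak_3": "ACGTACGNAT",
    "unrelated": "TCGAACCTGT",
}
distances = pairwise_snp_distances(toy)

# Largest distance seen among the isolates labeled as one outbreak.
within_outbreak = [
    d for (a, b), d in distances.items()
    if a.startswith("outbreak") and b.startswith("outbreak")
]
print(max(within_outbreak))
```

Collecting the within-outbreak maximum across many confirmed outbreaks is one way the 0-3 SNP range above could be checked against new serovars.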
Large Western NY pattern 5 cluster
• 13 isolates collected over 10 months (0 to 6 SNPs distance).
• First isolates in fall 2013.
• February 2014: two distinct clades form.
• Suggests the bug has evolved.
• Does this cluster represent 1, 2, or 3 sources?
[Figure: subtree containing the Western NY pattern 5 cluster (tip labels such as NY-swgs1267 and NY-swgs1375_NEW). Scale bar: 0.0005. A branch label reads "1 SNPs". Collection dates and counties annotated on the cluster isolates: 7/22/14 Onondaga; 7/05/14 Oneida; 6/22/14 Seneca; 6/3/14 Erie; 9/25/13 Niagara; 6/11/14 Oswego; 5/12/14 Ontario; 2/22/14 Oneida; 2/22/14 Onondaga; 2/21/14 Onondaga; 2/6/14 Onondaga; 12/12/13 Cattaraugus; 10/15/13 Monroe.]
One person two outbreaks
• GC-35 appears 6/9/14
  • PFGE pattern 4
  • Isolate from a food handler
  • Total of 5 cases through 8/7/14
• GC-40 appears 6/30/14
  • PFGE pattern 21
  • Isolate recovered from the same food handler
[Figure: PFGE patterns JEGX01.0021 and JEGX01.0004]
WGCA is better, but is it faster and cheaper?
Metric                        PFGE            WGCA
TAT: extraction to analysis   2 days          6 days
Cost                          $69             $294
Technician time               8 h             10 h
Actionable clusters           3 non-endemic   63
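The comparison above can be restated as WGCA-to-PFGE ratios. The short sketch below only does that arithmetic on the figures from the table; the dictionary keys are names chosen here for illustration, and the slide does not state the unit of cost (for example, per isolate), so none is assumed.

```python
# Figures taken from the PFGE vs. WGCA comparison table.
pfge = {"tat_days": 2, "cost_usd": 69, "tech_hours": 8, "actionable_clusters": 3}
wgca = {"tat_days": 6, "cost_usd": 294, "tech_hours": 10, "actionable_clusters": 63}


def fold_change(new, old):
    """Return new/old for every metric the two methods share."""
    return {metric: new[metric] / old[metric] for metric in old}


ratios = fold_change(wgca, pfge)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}x")
```

On these numbers WGCA takes three times as long and costs a little over four times as much, while yielding 21 times as many actionable clusters.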
We have created a two State Network
• Collaborating with Minnesota.
• Currently no informatics in house.
• We pull their sequences off BaseSpace.
• Run through our pipeline.
Tree from Merged data
• Does the pipeline used to merge the data affect tree structure?
• Do sequence metrics affect merged tree structure?
[Figure: tree from merged data; one group is labeled "Travel associated"]
National Genomic Surveillance Machine
• State labs feed the machine by uploading sequences from isolates received through surveillance.
• Federal and other support for reagents and equipment.
• NCBI analyzes the products of this machine and reports results to state and federal agencies.
Current FDA GenomeTrakr network
State Health labs
• New York
• Florida
• Arizona
• Washington
• Minnesota
• Virginia
• Maryland
FDA labs
• 9 FDA field labs
• CFSAN - MOD1
• CFSAN - Wiley
• IEH (contracting lab)
International labs
• Mexico
• Ireland
• UK (FERA)
• Colombia
Contributors
• Turkey
• Brazil
• Italy
Expected Outcomes for WGS surveillance
• Laboratory
  • Improve outbreak cluster detection.
  • Clusters will be detected more rapidly and from fewer isolates.
• Epi
  • Allow identification of clusters within endemic patterns.
  • Solve more clusters.
• Public Health
  • More efficient identification and removal of pathogen sources.
Challenges exist
• Creating a network.
• Increasing amounts of data.
• Metadata: how much should be public?
  • In real time?
  • What elements?
• Paying
• As sequencing technology and bioinformatics evolve:
  • Need to maintain backward compatibility
• Transitioning:
  • What to do first.
  • Integration with serology and PFGE typing.
Standards I would like to see
• Pipeline quality and reproducibility.
• Tree quality and reproducibility.
• Probability metrics that a cluster is from a single source.
Summary
• WGS can improve surveillance activities and outbreak traceback.
• It is practical to develop a network.
• We need standards.
Acknowledgments
• Cornell: Martin Wiedmann, Henk den Bakker
• FDA: Eric Brown, Peter Evans, Marc Allard, Errol Strain, Ruth Timme
• Connecticut DOH: Stacey Kinney, John Fontana
• Minnesota DOH: David Boxrud, Angie Jones, Victoria Lappi
• Washington State DOH: Ailyn Perez-Osorio, Zhen Li
• Wadsworth Center Genomics Core: Matt Shudt, Zhen Zhang, Charles MacGowan, Melissa Leisner, Danielle Loranger, Mike Palumbo, Pascal LaPierre
• Kara Michell
• Wadsworth PulseNet Lab: Dianna Bopp, Deb Baker, Lisa Thompson
• NCBI: Bill Klimke, Martin Shumway