The prospects for Nextgen surveillance of pathogens: A view from a Public Health Lab

William Wolfgang, Wadsworth Center, NYSDOH
NIST Workshop, 10/20/14



Public Health Genomics – projects at the Wadsworth

Project | Sample Source | PI
Salmonella surveillance | Isolate | Wolfgang
TB surveillance and drug resistance | Isolate and Primary | Escuyer and Musser
C. botulinum source tracking | Isolate | Egan
HCV subtyping | Primary | Parker and Chou
Adenovirus surveillance and characterization | Isolate | St. George and Lamson
Rabies virus intra-host evolution | Isolate | Davis
Mosquito microbiome and West Nile Virus | Primary | Ciota and Kramer

We collaborate with a number of different groups on pathogen sequencing projects

•  Global Microbial Identifier (GMI) initiative
•  CDC: Listeria monocytogenes initiative and AMD initiative
•  FDA: GenomeTrakr initiative
•  Minnesota and Washington Departments of Health
•  And we hope to do more

At the Wadsworth we need standards

•  As we transition to new technologies we need to know whether they are:
   •  Faster
   •  Cheaper
   •  Better

•  For this we need to make accurate and meaningful comparisons.

•  To do this we need standards.

Why use Nextgen for Salmonella typing?

•  PFGE has low discriminatory power.

Each year:
•  1 million cases of Salmonella in the US.
•  19,000 hospitalizations and 378 deaths.
•  The Wadsworth receives about 1,500/yr.

Greater discrimination can be achieved by Whole Genome Sequencing

Proof of principle study on a Salmonella Enteritidis outbreak

•  Sept. 2010: Connecticut Dept. of Health identifies a Salmonella outbreak in a long-term care facility (LTCF).

•  Outbreak was linked to cannoli from a Westchester bakery.

•  Both NY and CT cases consumed cannoli.

•  Isolates had the most common PFGE pattern.

Retrospective cohort

Key | County | Date | PFGE
IDR1000029153 | Cattaraugus | 8/10/10 | JEGX01.0004
IDR1000031528 | Rockland | 8/26/10 | JEGX01.0004
IDR1000033213 | Putnam | 9/10/10 | JEGX01.0004
IDR1000033369 | Putnam | 9/10/10 | JEGX01.0004
IDR1000033371 | Putnam | 9/11/10 | JEGX01.0004
IDR1000034601 | Washington | 9/13/10 | JEGX01.0004
IDR1000034587 | Westchester | 9/20/10 | JEGX01.0004
IDR1000035417 | Putnam | 9/22/10 | JEGX01.0004
IDR1000035178 | Westchester | 9/13/10 | JEGX01.0004
IDR1000035179 | Greenwich CT | 9/12/10 | JEGX01.0004
IDR1000035180 | Westchester | 9/12/10 | JEGX01.0004
IDR1000035181 | Westchester | 9/13/10 | JEGX01.0004
IDR1000035182 | Westchester | 9/12/10 | JEGX01.0004
IDR1000035183 | Greenwich CT | 9/16/10 | JEGX01.0004
IDR1000036119 |  | 9/17/10 | JEGX01.0004
IDR1100035184 | Westchester | 9/16/10 | JEGX01.0004
IDR1000036319 | Putnam | 9/28/10 | JEGX01.0004
IDR1000036979 | Putnam | 10/8/10 | JEGX01.0004
IDR1000038792 | Nassau | 10/29/10 | JEGX01.0004
IDR1000034599 | Orange | 9/15/10 | JEGX01.0004
IDR1100006235 | Westchester | 2/21/11 | JEGX01.0004
IDR1100021079 | Rockland | 7/13/11 | JEGX01.0004
IDR1000030147 | Out-Of-State | 8/22/10 | JEGX01.0004
IDR1100003844 | Onondaga | 2/1/11 | JEGX01.0004
IDR1100022186 | Yates | 7/22/11 | JEGX01.0004
IDR1100027690 | Erie | 9/6/11 | JEGX01.0004
IDR1100030508 | Madison | 10/9/11 | JEGX01.0004
IDR1100031312 | Suffolk | 10/5/11 | JEGX01.0004
IDR1100032014 | Onondaga | 10/22/11 | JEGX01.0004
IDR1000028670 | Nassau | 8/8/10 | JEGX01.0004
IDR1000029949 | Suffolk | 8/16/10 | JEGX01.0004
IDR1000033603 | Erie | 9/14/10 | JEGX01.0004
IDR1000034213 | Erie | 9/13/10 | JEGX01.0004
IDR1000037723 | Westchester | 10/4/10 | JEGX01.0004
IDR1000039087 | Westchester | 10/27/10 | JEGX01.0004

[Figure: SNP-based phylogenetic tree of the 35 retrospective cohort isolates, with tips labeled by shortened isolate ID (e.g. 10-36319). Scale bar: 7.3 SNPs. Node support values range from 68 to 100. Clades are labeled A and B, and one cluster is labeled LTCF. Tips marked "+": 10-36319, 10-33369, 10-35417, 10-33213, 10-34587, 10-34601, 10-33371, 10-38792, 10-36979.]

Whole Genome Cluster Analysis (WGCA) can identify an outbreak cluster not detected by PFGE

All isolates are PFGE pattern 4

Implementing WGCA for SE in real-time.

•  Evaluate WGCA compared to PFGE.
   •  Speed - Faster
   •  Cost - Cheaper
   •  More Actionable Clusters - Better

•  Develop an in-house bioinformatics pipeline.

•  Develop a communication pipeline to epidemiologists.

•  Determine cluster parameters that represent an outbreak from a single source (assign a probability).

•  Use data sets to evaluate evolving informatics methods.

•  Become proficient (PT programs).

Over the past 12 months

•  Sequenced all Salmonella Enteritidis (379 genomes).

•  All data at NCBI

•  Developed an in-house pipeline to analyze the data.

•  SNP-based phylogenetic trees were constructed in real time.

•  63 phylogenetic clusters were reported to epidemiologists.
   •  0 to 5 SNP differences
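
The reported clusters differ by 0 to 5 SNPs. As a minimal sketch of how clusters like these could be pulled out of a table of pairwise SNP distances, the following groups isolates by single linkage at a SNP threshold. The isolate names, the distances, and the function name are hypothetical, and the slides do not say how the in-house pipeline actually defines a cluster.

```python
from itertools import combinations


def single_linkage_clusters(names, dist, threshold=5):
    """Group isolates so that any two joined by <= threshold SNPs share a cluster.

    `dist` maps frozenset({a, b}) -> SNP distance for every pair of names.
    Only clusters with at least two members are returned.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical isolates and distances, for illustration only.
names = ["iso1", "iso2", "iso3", "iso4"]
dist = {
    frozenset(("iso1", "iso2")): 2,
    frozenset(("iso1", "iso3")): 40,
    frozenset(("iso1", "iso4")): 38,
    frozenset(("iso2", "iso3")): 41,
    frozenset(("iso2", "iso4")): 39,
    frozenset(("iso3", "iso4")): 60,
}
clusters = single_linkage_clusters(names, dist, threshold=5)
print(clusters)
```

Single linkage is only one possible choice. It can chain isolates together over time, which matters for long-running clusters where the organism keeps evolving.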

Data is analyzed using a portal developed by the Informatics core

In-house developed pipeline

Current tree

•  Is this tree structure reproduced by other pipelines?

•  Is it reproduced within our pipeline?

•  What are the minimal sequencing metrics required to create reproducible trees?

•  What changes would make this faster, cheaper, better?
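
One way to put a number on "is this tree structure reproduced" is the Robinson-Foulds distance, which counts the leaf bipartitions present in one tree but not in the other. The sketch below is a hedged illustration, not the pipeline's method. It assumes both trees are Newick strings over the same set of tips with plain (unquoted) labels, and the tip names A to E are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested lists of leaf names.

    Branch lengths and internal labels (e.g. support values) are discarded.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].split(":")[0].strip()

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume ")"
            label()  # skip internal label and branch length
            return children
        return label()

    return node()


def splits(tree):
    """Return the non-trivial leaf bipartitions of a parsed tree (unrooted view)."""
    all_leaves = set()
    clades = []

    def walk(n):
        if isinstance(n, str):
            all_leaves.add(n)
            return frozenset([n])
        leaves = frozenset().union(*(walk(c) for c in n))
        clades.append(leaves)
        return leaves

    walk(tree)
    result = set()
    for clade in clades:
        other = frozenset(all_leaves) - clade
        if len(clade) > 1 and len(other) > 1:
            # Store each split in one canonical orientation.
            result.add(min(clade, other, key=lambda s: (len(s), sorted(s))))
    return result


def robinson_foulds(newick_a, newick_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(parse_newick(newick_a)) ^ splits(parse_newick(newick_b)))


# Hypothetical trees: same tips, one pair of tips swapped between clades.
tree_run1 = "(((A:0.1,B:0.1)95:0.2,C:0.3),(D:0.1,E:0.1));"
tree_run2 = "(((A,C),B),(D,E));"
rf_same = robinson_foulds(tree_run1, tree_run1)
rf_diff = robinson_foulds(tree_run1, tree_run2)
print(rf_same, rf_diff)
```

A distance of 0 means the two runs give the same unrooted topology. Branch lengths are ignored here, so two trees can score 0 and still differ in SNP distances.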

[Figure: current SNP-based phylogenetic tree of the sequenced Salmonella Enteritidis isolates. Tips are labeled swgs#### or NY-swgs####, with a few NY-IDR isolates, and the reference is Salmonella_enterica_str_P125109. Scale bar: 0.02. Asterisks at the tips mark PFGE type (legend: 2, 4, 5, 21 or 23, 34, 56).]

Can we develop cluster metrics that give a probability of linkage to a single source?

•  Perform phylogenetic analysis of epidemiologically confirmed outbreaks.

   I.  Calculate SNP distance.
   II.  Examine tree structure.

•  Do this for many serovars of each species.

•  This could be done relatively easily by freezer diving and HiSeq runs.

•  For SE, SNP distances appear to be small (0-3 SNPs).
   I.  Based on 9 bona fide outbreaks from NY and MN.
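
The "calculate SNP distance" step can be sketched as a pairwise count of differing positions between aligned sequences. This is a minimal illustration under stated assumptions: the sequences are already aligned to the same length (for example, concatenated variable sites), ambiguous calls (N or gap) are skipped, and the isolate names and sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ, skipping ambiguous sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_distances(alignment):
    """Return {(name_a, name_b): distance} for every pair, names in sorted order."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned sequences, for illustration only.
alignment = {
    "outbreak_1": "ACGTACGTAC",
    "outbreak_2": "ACGTACGTAT",
    "outbreak_3": "ACGTNCGTAC",
    "background": "TCGAACCTAT",
}
distances = pairwise_snp_distances(alignment)

# Largest distance seen between two isolates of the (hypothetical) outbreak.
max_within_outbreak = max(
    d
    for (a, b), d in distances.items()
    if a.startswith("outbreak") and b.startswith("outbreak")
)
print(distances)
print(max_within_outbreak)
```

Repeating this over epidemiologically confirmed outbreaks would give the range of within-outbreak distances, which is the kind of evidence behind the 0-3 SNP observation for SE.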

Large Western NY pattern 5 cluster

•  13 isolates collected over 10 months (0 to 6 SNPs distance).

•  First isolates in fall 2013.

•  February 2014, two distinct clades form.

•  Suggests the bug has evolved.

•  Does this cluster represent 1, 2, or 3 sources?

[Figure: SNP-based subtree containing the Western NY pattern 5 cluster, with tips labeled swgs#### or NY-swgs####. Scale bar: 0.0005. One branch is labeled "1 SNP". Collection dates and counties annotated on the 13 cluster isolates, in the order shown on the slide:]

Date | County
7/22/14 | Onondaga
7/05/14 | Oneida
6/22/14 | Seneca
6/3/14 | Erie
9/25/13 | Niagara
6/11/14 | Oswego
5/12/14 | Ontario
2/22/14 | Oneida
2/22/14 | Onondaga
2/21/14 | Onondaga
2/6/14 | Onondaga
12/12/13 | Cattaraugus
10/15/13 | Monroe

One person two outbreaks

•  GC-35 appears 6/9/14
   •  PFGE pattern 4
   •  Isolate from a food handler
   •  Total of 5 cases through 8/7/14

•  GC-40 appears 6/30/14
   •  PFGE pattern 21
   •  Isolate recovered from the same food handler

[Figure: PFGE patterns JEGX01.0021 and JEGX01.0004]

WGCA is better, but is it faster and cheaper?

Metric | PFGE | WGCA
TAT: extraction to analysis | 2 days | 6 days
Cost | $69 | $294
Technician time | 8 h | 10 h
Actionable clusters | 3 non-endemic | 63
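
To make the comparison concrete, the figures in the table can be turned into WGCA-to-PFGE ratios. This is simple arithmetic on the slide's numbers. The dictionary keys are illustrative names, and the actionable-cluster counts may not be strictly like for like, since the PFGE figure is qualified as "non-endemic".

```python
# (PFGE, WGCA) values as given in the comparison table.
metrics = {
    "turnaround_days": (2, 6),
    "cost_usd": (69, 294),
    "technician_hours": (8, 10),
    "actionable_clusters": (3, 63),
}

# Ratio > 1 means WGCA is larger than PFGE for that metric.
ratios = {name: round(wgca / pfge, 2) for name, (pfge, wgca) in metrics.items()}

for name, ratio in ratios.items():
    print(f"{name}: WGCA is {ratio}x PFGE")
```

On these numbers WGCA takes 3 times as long and costs about 4.3 times as much, for 1.25 times the technician time, while yielding 21 times as many actionable clusters.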

We have created a two State Network

•  Collaborating with Minnesota.
   •  Minnesota currently has no in-house informatics.
   •  We pull their sequences off BaseSpace.
   •  We run them through our pipeline.

Tree from Merged data

•  Does the pipeline used to merge the data affect tree structure?

•  Do sequence metrics affect merged tree structure?

[Figure: tree from merged NY and MN data, with one cluster labeled "Travel associated"]

National Genomic Surveillance Machine

•  State labs feed the machine by uploading sequences from isolates received through surveillance.

•  Federal and other support for reagents and equipment.

•  NCBI to analyze the products of this machine and report results to state and federal agencies.

Current FDA GenomeTrakr network

State Health labs
•  New York
•  Florida
•  Arizona
•  Washington
•  Minnesota
•  Virginia
•  Maryland

FDA labs
•  9 FDA field labs
•  CFSAN - MOD1
•  CFSAN - Wiley
•  IEH (contracting lab)

International labs
•  Mexico
•  Ireland
•  UK (FERA)
•  Colombia

Contributors
•  Turkey
•  Brazil
•  Italy

NCBI Pathogen Pipeline Salmonella tree

[Figure: NCBI Pathogen Pipeline Salmonella tree, with one cluster labeled "Travel associated"]

Expected Outcomes for WGS surveillance

•  Laboratory
   •  Improve outbreak cluster detection.
   •  Clusters will be detected more rapidly and from fewer isolates.

•  Epi
   •  Allow identification of clusters within endemic patterns.
   •  Solve more clusters.

•  Public Health
   •  More efficient identification and removal of pathogen sources.

Challenges exist

•  Creating a network.

•  Increasing amounts of data.

•  Metadata: how much should be public?
   •  In real time?
   •  What elements?

•  Paying

•  As sequencing technology and bioinformatics evolve:
   •  Need to maintain backward compatibility

•  Transitioning:
   •  What to do first.
   •  Integration with serology and PFGE typing.

Standards I would like to see

•  Pipeline quality and reproducibility.

•  Tree quality and reproducibility.

•  Probability metrics that a cluster is from a single source.
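
As a rough illustration of what a probability metric for single-source clusters could be built from, the sketch below computes, for a set of isolate pairs with known epidemiology, the fraction of pairs within a SNP threshold that are truly linked. The pair data and the function name are hypothetical, and this naive proportion is not a calibrated probability. It depends heavily on which confirmed outbreaks and background isolates go into the data set, which is part of why standards are needed.

```python
def linked_fraction_by_threshold(pairs, thresholds):
    """For each threshold t, fraction of pairs with SNP distance <= t that are epi-linked.

    `pairs` is a list of (snp_distance, epidemiologically_linked) tuples.
    Returns None for a threshold with no pairs at or below it.
    """
    out = {}
    for t in thresholds:
        within = [linked for d, linked in pairs if d <= t]
        out[t] = sum(within) / len(within) if within else None
    return out


# Hypothetical (SNP distance, linked?) pairs, for illustration only.
pairs = [
    (0, True), (1, True), (2, True), (3, True),
    (3, False), (5, False), (8, False), (12, False),
]
fractions = linked_fraction_by_threshold(pairs, [0, 3, 5, 12])

for t, frac in fractions.items():
    print(f"<= {t} SNPs: {frac:.2f} of pairs linked")
```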

Summary

•  WGS can improve surveillance activities and outbreak traceback.

•  It is practical to develop a network.

•  We need standards.

Acknowledgments

•  Cornell: Martin Wiedmann, Henk den Bakker

•  FDA: Eric Brown, Peter Evans, Marc Allard, Errol Strain, Ruth Timme

•  Connecticut DOH: Stacey Kinney, John Fontana

•  Minnesota DOH: David Boxrud, Angie Jones, Victoria Lappi

•  Washington State DOH: Ailyn Perez-Osorio, Zhen Li

•  Wadsworth Center Genomics Core: Matt Shudt, Zhen Zhang, Charles MacGowan, Melissa Leisner, Danielle Loranger, Mike Palumbo, Pascal LaPierre

•  Kara Michell

•  Wadsworth PulseNet Lab: Dianna Bopp, Deb Baker, Lisa Thompson

•  NCBI: Bill Klimke, Martin Shumway