
Whole-Genome Sequencing (WGS)

for Food Safety

Errol Strain, Ph.D.
Director, Biostatistics and Bioinformatics Staff
Center for Food Safety and Applied Nutrition

U.S. Food and Drug Administration

IFSH Meeting

5/22/2017


FDA Regulatory Use Cases

1. Do these new bacterial isolates from environmental/product testing match any clinical isolates in the DB?

– Is this product/facility causing illness?

2. Do these new clinical isolates match any environmental/food isolates in DB?

– Should we test product/swab a facility?

3. Are isolates collected at different points in time from the same facility a match?

– Is there a problem w/ a resident pathogen, harborage?


GenomeTrakr Data Flow

GenomeTrakr Labs & Collaborators

Salmonella

Listeria


NGS-Based Surveillance (prior to NCBI Pathogen Detection)

Initial Clustering: PFGE, K-mer, MASH, BLAST

Goal: Find a group of 10-200 closely related isolates

SNP Pipeline: Find phylogenetically informative SNPs, FASTA alignment

Construct Phylogeny

[Flow diagram; remaining labels: NCBI, FDA, FDA, Missing]
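
The initial clustering step can be pictured as a k-mer comparison. The sketch below is only an illustration, not the FDA or Mash implementation: it compares exact k-mer sets with a Jaccard distance (Mash approximates this with MinHash sketches), and the function names, `k`, and `max_distance` are assumptions chosen for toy data.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all k-length substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A intersect B| / |A union B|; 0.0 means identical k-mer content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def candidate_group(query: str, genomes: dict, k: int = 5,
                    max_distance: float = 0.5) -> list:
    """Names of genomes whose k-mer distance to the query is within max_distance.

    Hypothetical helper: the threshold is illustrative, not a validated cutoff.
    """
    q = kmers(genomes[query], k)
    return sorted(
        name for name, seq in genomes.items()
        if name != query and jaccard_distance(q, kmers(seq, k)) <= max_distance
    )


if __name__ == "__main__":
    toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAC", "C": "TTTTTTTTTT"}
    print(candidate_group("A", toy, k=4, max_distance=0.1))
```

Real genomes would use much larger k and a sketching method; the point here is only that a cheap distance narrows the database to a small group before the SNP pipeline runs.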


NCBI Pathogen Detection


CFSAN vs NCBI SNPs


Scientific Evidence – Daubert Standard

1. Empirical testing: whether the theory or technique is falsifiable, refutable, and/or testable.

2. Whether it has been subjected to peer review and publication.

– Specific/targeted studies for pathogens have been published. Multiple software packages are available for mapping and calling SNPs.

3. The known or potential error rate.

– Well characterized at the read level, less so for cluster analysis.

4. The existence and maintenance of standards and controls concerning its operation.

– Proficiency testing efforts through Global Microbial Identifier and also the FDA GenomeTrakr network.

5. The degree to which the theory and technique are generally accepted by a relevant scientific community.

– Acceptance facilitated by open database (NCBI/SRA).


Why Build A Pipeline?

1. Regulatory Use and/or Accredited Labs

– NCBI methods not public and peer-reviewed

– Chain of custody – local computation

– Results needed immediately

2. Pathogen and/or data not at NCBI

– Mycobacterium, Legionella*

– Food Industry – private data


What Kind of Pipeline?

SNPs vs wgMLST

Unit of Measure

– SNPs: Single nucleotide substitutions (other types of mutations are excluded)

– wgMLST: Allele, a variant of a gene. Variation could arise from a number of sources, including SNPs, insertions, deletions, etc.

Requirements

– SNPs: Complete or high-quality reference genome for mapping

– wgMLST: Database of named alleles, must be actively maintained

Pros

– SNPs: Extremely high resolution, methods have been published and validated

– wgMLST: Relatively fast, not directly dependent upon reference genome

Cons

– SNPs: Requires reference genome, computationally intense, requires local bioinformatics expertise

– wgMLST: Allele database must be centralized, cannot compute novel wgMLST types locally. wgMLST schemas not easy to publicly access


FDA Pipeline Requirements

1. Public, Peer-Reviewed

– Results may be subject to legal scrutiny

– Accessible to FDA-regulated industries

2. Reproducible

3. Documentation & Validation

4. Platform independent (fastq)

5. Run Locally


Background: CFSAN SNP Pipeline

Mapping/Aligning (66+), SNP Detection (16+)

Tools shown: Samtools, SOAPsnp, GATK, SNVer, VarScan, SHORE, SMALT, MaCH, IMPUTE2, CLC Bio, QualitySNPng, DNABaser, SNPdetector, FreeBayes, SolSNP, DNAStar

Bowtie2 (mapping/aligning), VarScan (SNP detection)


CFSAN SNP Pipeline

Documentation: http://snp-pipeline.rtfd.org

Source Code: https://github.com/CFSAN-Biostatistics/snp-pipeline

Pettengill JB, Luo Y, Davis S, Chen Y, Gonzalez-Escalona N, Ottesen A, Rand H, Allard MW, Strain E. (2014) An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella. PeerJ 2:e620 http://dx.doi.org/10.7717/peerj.620

Davis S, Pettengill JB, Luo Y, Payne J, Shpuntoff A, Rand H, Strain E. (2015) CFSAN SNP Pipeline: an automated method for constructing SNP matrices from next-generation sequence data. PeerJ Computer Science 1:e20 https://dx.doi.org/10.7717/peerj-cs.20


FDA/CFSAN Validation Efforts

1. Technical Performance

Accuracy: Salmonella LT2 and Agona SL483

2. Intralaboratory variation, sequencing platform

Salmonella Montevideo (180+ runs), PacBio vs short reads

3. Interlaboratory variation

Salmonella Braenderup BAA-664 (PFGE control), ISO/CEN WG, GenomeTrakr PT set (Salmonella & Listeria), Global Microbial Identifier PT

4. Bioinformatics Pipeline

Software Validation, Benchmark bioinformatic data sets

Collaborations w/ Canada, CDC, NIH/NCBI


Proficiency Testing:

• GenomeTrakr 2014, 2015:
  • Each lab in the GT network sequenced the same set of 8 strains. CFSAN PT analysis returned.
  • Manuscript in preparation

• GMI (yearly since 2013)
  • 2016 PT has wet and dry lab components
  • 2016 PT includes K. pneumoniae, L. monocytogenes, C. jejuni, E. coli

• PulseNet/GenomeTrakr harmonized PT
  • Early 2017


CFSAN Workflow


“Min-diff” – Minimum SNP distance to an isolate of a different sample type

– Food/Environmental vs Clinical (or Microbe)
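
As an illustration of the "Min-diff" idea, the sketch below finds, for one isolate, the closest isolate of a different sample type. It is a minimal example under assumptions: the function name, the sample-type labels, and the distance lookup keyed by unordered pairs are all hypothetical and not taken from NCBI or CFSAN code.

```python
from typing import Optional


def min_diff(isolate: str, sample_type: dict,
             distance: dict) -> Optional[tuple]:
    """Return (closest isolate of a different sample type, SNP distance).

    sample_type maps isolate name -> e.g. "clinical" or "environmental".
    distance maps frozenset({a, b}) -> pairwise SNP distance (hypothetical layout).
    Returns None when no isolate of another type has a known distance.
    """
    own = sample_type[isolate]
    best = None
    for other, other_type in sample_type.items():
        if other == isolate or other_type == own:
            continue  # only compare against a different sample type
        d = distance.get(frozenset((isolate, other)))
        if d is None:
            continue  # pair was never compared
        if best is None or d < best[1]:
            best = (other, d)
    return best


if __name__ == "__main__":
    types = {"food1": "environmental", "food2": "environmental",
             "clin1": "clinical", "clin2": "clinical"}
    dist = {frozenset(("food1", "clin1")): 8,
            frozenset(("food1", "clin2")): 3,
            frozenset(("food1", "food2")): 0}
    print(min_diff("food1", types, dist))
```

Note that food2 is ignored for food1 even at 0 SNPs, because it has the same sample type.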


8 SNPs Check SNP Cluster


CFSAN Workflow

CFSAN SNP Pipeline is run on NCBI SNP cluster

– Reference – prefer complete genomes, drafts work almost as well

– High-Density SNP regions are filtered

>3 SNPs in 1000 bases, phages/recombination/etc.

– Phylogenetic inference – Maximum Likelihood

Ambiguous sites are treated as missing data
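
The high-density filter above (>3 SNPs in 1000 bases) can be sketched with a sliding window over sorted SNP positions. This is an illustrative reading of the rule, not the CFSAN SNP Pipeline code: the function name and the choice to drop every SNP in an over-dense window are assumptions.

```python
def filter_dense_snps(positions, window: int = 1000, max_snps: int = 3) -> list:
    """Drop SNP positions that fall in any window holding more than max_snps SNPs.

    positions: iterable of SNP coordinates on one contig.
    A window spans `window` consecutive bases, so positions p and q share a
    window when |p - q| < window.
    """
    ordered = sorted(set(positions))
    dense = set()
    left = 0
    for right, pos in enumerate(ordered):
        # shrink the window until it spans fewer than `window` bases
        while pos - ordered[left] >= window:
            left += 1
        if right - left + 1 > max_snps:
            dense.update(ordered[left:right + 1])
    return [p for p in ordered if p not in dense]


if __name__ == "__main__":
    snps = [100, 5000, 5100, 5200, 5300, 20000]
    # four SNPs within ~300 bases look like a phage/recombination region
    print(filter_dense_snps(snps))
```

Isolated SNPs survive, while the tight run of four is removed as a likely phage or recombination region.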


Which Reference?

[Figure: SNP Difference (y-axis, -5 to 10) vs % Divergence (x-axis: 0%, 0.1%, 0.5%, 1%, 2.5%, 5%); levels labeled Strain, Subtype, PFGE, Serotype, Subspecies, Species]


CFSAN SNP Pipeline: Listeria Draft vs PacBio Genome

[Figure panels: High-Quality Draft; Complete (PacBio)]


Interpretation

SNP Distance

How close are the isolates? No single threshold for all species/types, rough guides

1. <=20 SNPs match, virtually identical

2. 20-100 SNPs inconclusive

3. > 100 SNPs exclude

Bootstrapping

Do the isolates form a unique cluster w/ >= 95% support? Is the cluster distinct from other isolates in the tree?

Results are critically evaluated and not used blindly
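
The rough guides above can be written down as a small sketch. Everything here is illustrative: the function names are hypothetical, the boundary value of exactly 20 SNPs is assigned to "match" (the slide lists it under both ranges), and skipping ambiguous characters in the pairwise count is an assumption borrowed from the missing-data treatment used for phylogenetic inference. As the slide says, such labels are a starting point and are not used blindly.

```python
AMBIGUOUS = set("N-?")  # assumption: characters treated as missing data


def snp_distance(a: str, b: str) -> int:
    """Count substitutions between two aligned SNP strings, skipping ambiguous sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a.upper(), b.upper())
        if x not in AMBIGUOUS and y not in AMBIGUOUS and x != y
    )


def interpret(distance: int) -> str:
    """Map a SNP distance to the rough guide categories (not species-specific)."""
    if distance <= 20:
        return "match"
    if distance <= 100:
        return "inconclusive"
    return "exclude"


if __name__ == "__main__":
    d = snp_distance("ACGTN", "ACCTA")
    print(d, interpret(d))
```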


Forensic Needs

WGS (SRA) Database:

Random survey of bacteria not possible, need to continue to grow database and curate genotypes

Thresholds for SNPs vs wgMLST:

1 SNP ≠ 1 INDEL ≠ 1 Recombination

Well-Documented wgMLST databases
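
A toy comparison shows why SNP and wgMLST thresholds are not interchangeable. The sketch below counts substitutions and allele differences over the same loci; it is a simplified illustration with hypothetical names and made-up sequences, and it compares loci directly, whereas a real SNP pipeline maps reads to a reference.

```python
def compare(isolate_a: dict, isolate_b: dict) -> tuple:
    """Return (snp_count, allele_differences) across loci present in both isolates.

    Each isolate maps locus name -> sequence. A locus whose sequences differ
    counts as one allele difference regardless of how it differs. Substitutions
    are only counted when the two sequences have equal length, so an indel adds
    an allele difference but no SNPs.
    """
    snps = 0
    alleles = 0
    for locus in isolate_a.keys() & isolate_b.keys():
        sa, sb = isolate_a[locus], isolate_b[locus]
        if sa == sb:
            continue
        alleles += 1
        if len(sa) == len(sb):
            snps += sum(x != y for x, y in zip(sa, sb))
    return snps, alleles


if __name__ == "__main__":
    a = {"geneA": "ACGTACGT", "geneB": "ACGTACGT", "geneC": "AAAAAAAA"}
    b = {"geneA": "ACGTACGA",   # one substitution
         "geneB": "ACGTCGT",    # one-base deletion (indel)
         "geneC": "ATATATAA"}   # several changes at once, recombination-like
    print(compare(a, b))
```

Three events give three allele differences but four SNPs, and the indel contributes no SNP at all.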


Example: E. coli & Flour


www.ncbi.nlm.nih.gov/pathogens/


0-3 SNPs to clinical isolates

0-3 SNPs to other food/env isolates


CFSAN SNP Pipeline


Future of GenomeTrakr & CFSAN SNP Pipeline

1. Local or web-based QA/QC and identification tools

– Detect sample mix-ups and low quality before data is submitted to NCBI/SRA, fix problems more quickly

2. Continue to build WGS databases

– Better thresholds for identity, increase odds of finding a match

3. Local SNP pipeline analysis

– Accredited labs don't have to send out data
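
A local QA/QC check of the kind described in item 1 could start from per-read quality in the fastq file. The sketch below is only an assumption of what such a check might look like: it assumes 4-line fastq records with Phred+33 quality encoding, and the function names and the mean-quality cutoff of 30 are hypothetical, not GenomeTrakr acceptance criteria.

```python
import io


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record fastq stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def low_quality_fraction(handle, min_mean_q: float = 30.0) -> float:
    """Fraction of reads whose mean Phred score is below min_mean_q."""
    total = low = 0
    for _read_id, _seq, qual in read_fastq(handle):
        total += 1
        if mean_phred(qual) < min_mean_q:
            low += 1
    return low / total if total else 0.0


if __name__ == "__main__":
    toy = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
    print(low_quality_fraction(io.StringIO(toy)))
```

A run with a high low-quality fraction could be flagged locally before submission to NCBI/SRA.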


Snapshot of Data – 3/1 to 4/30

SNP/ERD Clusters

* 2 or more isolates within 50 SNPs

Organism          # SNP Clusters   % isolates in SNP clusters (3/2017)   Total
Campylobacter     242              69                                    1054
E.coli/Shigella   221              59 (56%)                              1132
Listeria          87               91 (89%)                              356
Salmonella        439              83 (86%)                              2100
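
The cluster definition above (2 or more isolates within 50 SNPs) can be sketched as single-linkage grouping. This is an assumption about how such clusters could be formed, not a description of NCBI's implementation; the function names and the toy distances are hypothetical.

```python
def snp_clusters(isolates: list, distances: dict, threshold: int = 50) -> list:
    """Group isolates linked by any pairwise SNP distance <= threshold.

    distances maps (a, b) tuples -> SNP distance. Single linkage is assumed:
    two isolates share a cluster if a chain of close pairs connects them.
    Only groups with 2 or more isolates are returned.
    """
    parent = {i: i for i in isolates}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in isolates:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


def percent_clustered(isolates: list, clusters: list) -> float:
    """Percentage of isolates that belong to some SNP cluster."""
    return 100 * sum(len(c) for c in clusters) / len(isolates)


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    dist = {("a", "b"): 10, ("b", "c"): 45, ("a", "c"): 55,
            ("c", "d"): 120, ("d", "e"): 200}
    clusters = snp_clusters(names, dist)
    print(clusters, percent_clustered(names, clusters))
```

In the toy data, a and c are 55 SNPs apart but still share a cluster because both are within 50 SNPs of b.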


Acknowledgements

• FDA
  • Center for Food Safety and Applied Nutrition
  • Center for Veterinary Medicine
  • Office of Regulatory Affairs

• National Institutes of Health
  • National Center for Biotechnology Information

• State Health and University Labs
  • Alaska, Arizona, California, Florida, Hawaii, Maryland, Minnesota, New Mexico, New York, South Dakota, Texas, Virginia, Washington

• USDA/FSIS
  • Eastern Laboratory

• CDC
  • Enteric Diseases Laboratory

• INEI-ANLIS “Carlos Malbrán Institute,” Argentina

• Centre for Food Safety, University College Dublin, Ireland

• Food and Environment Research Agency, UK

• Public Health England, UK

• WHO

• Illumina

• PacBio

• CLC Bio

• Other independent collaborators
