National Center for Emerging and Zoonotic Infectious Diseases

NCBI Pathogen Detection Pipeline Epi Work Flow

Rashida Hassan, MSPH
Foodborne Outbreak Response Team
Outbreak Response and Prevention Branch
Centers for Disease Control and Prevention

May 13, 2019


What is it?


NCBI Pathogen Detection project

Centralized system which integrates sequences for bacterial pathogens from food, the environment, and human patients

Agencies submit sequencing data to NCBI, which analyzes the sequences to identify closely related sequences

NCBI Pathogen Detection Isolates Browser: web-based portal that integrates available information with the SNP cluster information

https://www.ncbi.nlm.nih.gov/pathogens/


NCBI Pathogen Detection project

Pathogens
• Current focus on Campylobacter, Escherichia coli, Shigella, Listeria, and Salmonella
• >20 other pathogens added, with more expected to follow

Contributing Agencies
• Routine submissions from state PulseNet Laboratories, CDC, FDA, USDA-FSIS, Public Health England
• Additional countries and institutions may submit sequences as well


How we use it at CDC


Active Multistate Clusters

• Objective: determine relatedness between isolates included in a multistate cluster & see if there are any additional related isolates


• Step 1: Go to NCBI Pathogen Detection Isolates Browser https://www.ncbi.nlm.nih.gov/pathogens/

• Select “Find isolates now!” or explore data for your pathogen


Active Multistate Clusters

• Step 2: Paste WGS IDs for your cluster isolates into the search box, then click search

Copy, Paste, Click


Active Multistate Clusters

• Step 3: Select generated SNP cluster

Select

11 matched isolates were found, 7 clinical and 4 environmental
73 total isolates in NCBI’s SNP cluster
Minimal SNP difference for all isolates within this search is 0


Isolates you searched for will be selected/highlighted in red

Left click and “select” any blue isolates to add them to your selection, or check box in table above



Differences between isolates selected

Min-same = Minimum SNP difference within same isolation source type
Min-diff = Minimum SNP difference across different isolation types

Minimum distance between different source types
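
The two definitions above can be made concrete with a small sketch. It illustrates the definitions only and is not NCBI's implementation: the isolate names, source types, and SNP distances are made-up example values, and `min_same_and_diff` is a hypothetical helper.

```python
# Hypothetical example data: isolate -> isolation source type.
epi_type = {
    "iso_A": "clinical",
    "iso_B": "clinical",
    "iso_C": "environmental/other",
}

# Hypothetical pairwise SNP distances (symmetric, stored once per pair).
snp_distance = {
    frozenset(("iso_A", "iso_B")): 2,
    frozenset(("iso_A", "iso_C")): 5,
    frozenset(("iso_B", "iso_C")): 1,
}


def min_same_and_diff(isolate, epi_type, snp_distance):
    """Return (min_same, min_diff) for one isolate.

    min_same: smallest SNP distance to another isolate of the same source type.
    min_diff: smallest SNP distance to an isolate of a different source type.
    Either value is None when no such isolate exists in the data.
    """
    min_same = None
    min_diff = None
    for other, other_type in epi_type.items():
        if other == isolate:
            continue
        distance = snp_distance.get(frozenset((isolate, other)))
        if distance is None:
            continue  # no distance recorded for this pair
        if other_type == epi_type[isolate]:
            min_same = distance if min_same is None else min(min_same, distance)
        else:
            min_diff = distance if min_diff is None else min(min_diff, distance)
    return min_same, min_diff


for name in epi_type:
    print(name, min_same_and_diff(name, epi_type, snp_distance))
```

In this toy data, iso_C is the only environmental isolate, so it has no min-same value, while its min-diff is its distance to the nearest clinical isolate.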


Selects all isolates that fall within a designated SNP distance of your originally selected isolate(s)

Specify SNP distance and select “Add”
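
The selection described here can be sketched as follows. This is a minimal illustration, not the browser's actual logic: the distances are invented, `select_within_distance` is a hypothetical name, and treating the cutoff as inclusive is an assumption.

```python
# Hypothetical pairwise SNP distances between isolates in one SNP cluster.
pairwise_snps = {
    ("iso_A", "iso_B"): 2,
    ("iso_A", "iso_C"): 5,
    ("iso_B", "iso_C"): 1,
    ("iso_A", "iso_D"): 12,
}


def select_within_distance(selected, pairwise_snps, max_snps):
    """Return the original selection plus every isolate that is within
    max_snps (inclusive, by assumption) of at least one originally
    selected isolate. Newly added isolates do not pull in further ones."""
    original = set(selected)
    result = set(original)
    for (first, second), distance in pairwise_snps.items():
        if distance > max_snps:
            continue
        if first in original:
            result.add(second)
        if second in original:
            result.add(first)
    return result


print(sorted(select_within_distance({"iso_A"}, pairwise_snps, 5)))
```

Because distances are measured only from the original selection, iso_D (12 SNPs from iso_A) stays out at a cutoff of 5 even though other isolates are added.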


Change info displayed; can add AMR info, PFGE, etc.


Add columns to line list here

Remove columns from line list here

Click ok


Filter by time or isolate type


Filter by source type here

Filter by time by selecting area on timeline

Click arrow to close


Export image of tree


Download line list of selected isolates
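
Once a line list has been downloaded, the same source-type and time filters can be reapplied offline. The sketch below assumes the export is a comma-separated file and uses hypothetical column names (`isolate`, `epi_type`, `collection_date`); check the header row of the actual download before relying on it.

```python
import csv
import io
from datetime import date

# Made-up sample standing in for a downloaded line list.
# The column names are assumptions, not the browser's guaranteed headers.
SAMPLE_LINE_LIST = """isolate,epi_type,collection_date
iso_A,clinical,2019-03-02
iso_B,environmental/other,2019-01-15
iso_C,clinical,2018-11-30
"""


def filter_line_list(csv_text, epi_type, start, end):
    """Keep rows of the given source type collected between start and end (inclusive)."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        collected = date.fromisoformat(row["collection_date"])
        if row["epi_type"] == epi_type and start <= collected <= end:
            kept.append(row)
    return kept


clinical_2019 = filter_line_list(
    SAMPLE_LINE_LIST, "clinical", date(2019, 1, 1), date(2019, 12, 31)
)
print([row["isolate"] for row in clinical_2019])
```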


Create alert for new closely related sequences

Create name for your alert

Select SNP distance for alert


Active Multistate Clusters

• Step 3: OR do nothing if you get this message

Sequences have not been uploaded/analyzed yet, or they have not met NCBI’s quality checks


Queries and Searches


Terms for NCBI PDP Searches & Queries

• taxgroup_name: select the organism name
– “Salmonella enterica”
– “E.coli and Shigella”
– “Campylobacter jejuni”
– “Listeria monocytogenes”
• new: 1 - specify only newly added isolates
• mindiff: [0 TO 5] - specify the range of SNP differences between a clinical isolate and a food/environmental isolate (brackets for ranges, or just the number)
• minsame: 0 - specify SNP difference between isolates of the same type
• geo_loc_name: - specify geographic location (usually country)
• AMR_genotype: - specify AMR genes
• epi_type: - “clinical” or “environmental/other”


Terms for NCBI PDP Searches & Queries

• If you forget the search terms, hover over the name of that column in the isolates browser


Terms for NCBI PDP Searches & Queries

Example search:
• taxgroup_name:"Salmonella enterica" AND mindiff:[0 TO 3] AND geo_loc_name:"USA" AND new:1 AND epi_type:"clinical"

• Will find any new Salmonella clinical isolates from the USA that are fewer than 4 SNPs from a food/environmental isolate

Enter search terms here

Select to save your search

Select search
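
For anyone assembling such searches repeatedly, the example search string can be built from its parts. This is a hedged sketch: `format_term` and `build_query` are hypothetical helpers, and the only syntax assumed is what the example search itself shows (quoted text values, `[low TO high]` ranges, terms joined with AND).

```python
def format_term(field, value):
    """Format one search term in the style of the example search."""
    if isinstance(value, tuple):
        low, high = value  # inclusive range, written as [low TO high]
        return f"{field}:[{low} TO {high}]"
    if isinstance(value, str):
        return f'{field}:"{value}"'  # text values are quoted
    return f"{field}:{value}"  # plain numbers such as new:1


def build_query(terms):
    """Join (field, value) pairs with AND, preserving their order."""
    return " AND ".join(format_term(field, value) for field, value in terms)


query = build_query([
    ("taxgroup_name", "Salmonella enterica"),
    ("mindiff", (0, 3)),
    ("geo_loc_name", "USA"),
    ("new", 1),
    ("epi_type", "clinical"),
])
print(query)
```

The printed string can be pasted into the search box; whether every field accepts every value type is not checked here.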


Terms for NCBI PDP Searches & Queries

• “Watch” and “save” search options are only available if you create an NCBI account

Click here at top of NCBI PDP webpage


Click “sign up” to create your account


Some caveats…


NCBI Pathogen Detection Pipeline (PDP) Caveats

• Analyses from NCBI must be confirmed by PulseNet
– NCBI uses SNP analysis vs. PulseNet uses wgMLST, cgMLST, and hqSNP
• NCBI PDP most useful during WGS transition, until PulseNet allele codes are available
– After the transition, NCBI PDP can supplement information in PulseNet: matches in other countries (clinical and food), non-PulseNet pathogens, etc.
• Delays: once submitted to NCBI, could take up to 1 week for Salmonella sequences to be posted
• NCBI has their own guidelines for what would be considered “good quality”
– Some sequences by state labs could be rejected/not posted even if they pass PulseNet’s quality checks
• Some analyses for AMR, especially on older isolates, might be out of date
– A null value does not indicate a negative result
– For confirmation on any AMR, please contact NARMS
• NCBI cluster range is 50 SNPs, so many things in their trees will not really be closely related
– Double check alerts for saved queries: isolates may fall within NCBI’s 50 SNP range but not within the smaller range for your cluster/outbreak
– Saved searches may result in very many notifications!
• Vibrio cluster detection by WGS is based on NCBI PDP SNP analysis, since no schema has been developed yet


For more information, contact CDC
1-800-CDC-INFO (232-4636)
TTY: 1-888-232-6348
www.cdc.gov

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Questions?

For more information, contact:
Rashida Hassan
ykm6@cdc.gov
404-639-1727