search strategies for antimicrobial resistance associated ......pearce, ames, zemla, allen...

46
Search strategies for antimicrobial resistance associated genes Data Science Workshop Aug , Aram Avila-Herrera LLNL-PRES- This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC-NA. Lawrence Livermore National Security, LLC

Upload: others

Post on 02-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Search strategies forantimicrobial resistanceassociated genesData Science Workshop

Aug 7, 2018 Aram Avila-Herrera

LLNL-PRES-755360

This work was performed under the auspices of the U.S. Department of Energy by Lawrence LivermoreNational Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Page 2: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Antimicrobial resistance (AMR) numbers

2 million

AMR infections per year in US 1

23,000

deaths per year in US directly from AMR 1

more from indirect complications 1

35 billion

dollars per year in costs to US households from AMR 2

20 billion per year in costs to US health care system 2

1Antibiotic Resistance Threats in the United States 2013, CDC2Golkar et al., J Infect Dev Ctries 2014

LLNL-PRES-7553602

Page 3: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Antimicrobial resistance (AMR) numbers

2 millionAMR infections per year in US 1

23,000

deaths per year in US directly from AMR 1

more from indirect complications 1

35 billion

dollars per year in costs to US households from AMR 2

20 billion per year in costs to US health care system 2

1Antibiotic Resistance Threats in the United States 2013, CDC

2Golkar et al., J Infect Dev Ctries 2014

LLNL-PRES-7553602

Page 4: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Antimicrobial resistance (AMR) numbers

2 millionAMR infections per year in US 1

23,000

deaths per year in US directly from AMR 1

more from indirect complications 1

35 billion

dollars per year in costs to US households from AMR 2

20 billion per year in costs to US health care system 2

1Antibiotic Resistance Threats in the United States 2013, CDC

2Golkar et al., J Infect Dev Ctries 2014

LLNL-PRES-7553602

Page 5: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Antimicrobial resistance (AMR) numbers

2 millionAMR infections per year in US 1

23,000deaths per year in US directly from AMR 1

more from indirect complications 1

35 billion

dollars per year in costs to US households from AMR 2

20 billion per year in costs to US health care system 2

1Antibiotic Resistance Threats in the United States 2013, CDC

2Golkar et al., J Infect Dev Ctries 2014

LLNL-PRES-7553602

Page 6: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Antimicrobial resistance (AMR) numbers

2 millionAMR infections per year in US 1

23,000deaths per year in US directly from AMR 1

more from indirect complications 1

35 billion

dollars per year in costs to US households from AMR 2

20 billion per year in costs to US health care system 2

1Antibiotic Resistance Threats in the United States 2013, CDC

2Golkar et al., J Infect Dev Ctries 2014

LLNL-PRES-7553602

Page 7: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Antimicrobial resistance (AMR) numbers

2 millionAMR infections per year in US 1

23,000deaths per year in US directly from AMR 1

more from indirect complications 1

35 billion

dollars per year in costs to US households from AMR 2

20 billion per year in costs to US health care system 2

1Antibiotic Resistance Threats in the United States 2013, CDC

2Golkar et al., J Infect Dev Ctries 2014

LLNL-PRES-7553602

Page 8: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Antimicrobial resistance (AMR) numbers

2 millionAMR infections per year in US 1

23,000deaths per year in US directly from AMR 1

more from indirect complications 1

35 billiondollars per year in costs to US households from AMR 2

20 billion per year in costs to US health care system 2

1Antibiotic Resistance Threats in the United States 2013, CDC2Golkar et al., J Infect Dev Ctries 2014

LLNL-PRES-7553602

Page 9: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Antimicrobial resistance (AMR) numbers

2 millionAMR infections per year in US 1

23,000deaths per year in US directly from AMR 1

more from indirect complications 1

35 billiondollars per year in costs to US households from AMR 2

20 billion per year in costs to US health care system 2

1Antibiotic Resistance Threats in the United States 2013, CDC2Golkar et al., J Infect Dev Ctries 2014

LLNL-PRES-7553602

Page 10: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

What is antimicrobial resistance?

When microbes such as bacteria areable to counteract antibiotics

One mechanism:Genes that encode drug-inactivatingproteins are acquired or evolved

credit: Joseliya Embuscado

Detect these genes to:Prescribe the correct medicine,dosageDevelop e�ective new drugsLearn about evolution

LLNL-PRES-7553603

Page 11: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

What is antimicrobial resistance?

When microbes such as bacteria areable to counteract antibioticsOne mechanism:

Genes that encode drug-inactivatingproteins are acquired or evolved

credit: Joseliya Embuscado

Detect these genes to:Prescribe the correct medicine,dosageDevelop e�ective new drugsLearn about evolution

LLNL-PRES-7553603

Page 12: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

What is antimicrobial resistance?

When microbes such as bacteria areable to counteract antibioticsOne mechanism:

Genes that encode drug-inactivatingproteins are acquired or evolved

credit: Joseliya Embuscado

Detect these genes to:Prescribe the correct medicine,dosageDevelop e�ective new drugsLearn about evolution

LLNL-PRES-7553603

Page 13: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

What is antimicrobial resistance?

When microbes such as bacteria areable to counteract antibioticsOne mechanism:

Genes that encode drug-inactivatingproteins are acquired or evolved

credit: Joseliya Embuscado

Detect these genes to:Prescribe the correct medicine,dosageDevelop e�ective new drugsLearn about evolution

LLNL-PRES-7553603

Page 14: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

What is antimicrobial resistance?

When microbes such as bacteria areable to counteract antibioticsOne mechanism:

Genes that encode drug-inactivatingproteins are acquired or evolved

credit: Joseliya Embuscado

Detect these genes to:Prescribe the correct medicine,dosageDevelop e�ective new drugsLearn about evolution

LLNL-PRES-7553603

Page 15: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene detection: sequence and search

String comparison with a few twists:

Millions of short reads per sample—ambiguityNatural variation—mismatches not equally important

Fundamental step for R&D, applications

10b reads (1Tbp) of data per run x samples per dayReference DBs can be large, are regularly updated

LLNL-PRES-7553604

Page 16: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene detection: sequence and search

String comparison with a few twists:

Millions of short reads per sample—ambiguityNatural variation—mismatches not equally important

Fundamental step for R&D, applications

10b reads (1Tbp) of data per run x samples per dayReference DBs can be large, are regularly updated

LLNL-PRES-7553604

Page 17: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene detection: sequence and search

String comparison with a few twists:Millions of short reads per sample—ambiguity

Natural variation—mismatches not equally important

Fundamental step for R&D, applications

10b reads (1Tbp) of data per run x samples per dayReference DBs can be large, are regularly updated

LLNL-PRES-7553604

Page 18: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene detection: sequence and search

String comparison with a few twists:Millions of short reads per sample—ambiguityNatural variation—mismatches not equally important

Fundamental step for R&D, applications

10b reads (1Tbp) of data per run x samples per dayReference DBs can be large, are regularly updated

LLNL-PRES-7553604

Page 19: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene detection: sequence and search

String comparison with a few twists:Millions of short reads per sample—ambiguityNatural variation—mismatches not equally important

Fundamental step for R&D, applications

10b reads (1Tbp) of data per run x samples per dayReference DBs can be large, are regularly updated

LLNL-PRES-7553604

Page 20: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene detection: sequence and search

String comparison with a few twists:Millions of short reads per sample—ambiguityNatural variation—mismatches not equally important

Fundamental step for R&D, applications

10b reads (1Tbp) of data per run x samples per dayReference DBs can be large, are regularly updated

LLNL-PRES-7553604

Page 21: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene detection: sequence and search

String comparison with a few twists:Millions of short reads per sample—ambiguityNatural variation—mismatches not equally important

Fundamental step for R&D, applications10b reads (1Tbp) of data per run x samples per day

Reference DBs can be large, are regularly updated

LLNL-PRES-7553604

Page 22: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene detection: sequence and search

String comparison with a few twists:Millions of short reads per sample—ambiguityNatural variation—mismatches not equally important

Fundamental step for R&D, applications10b reads (1Tbp) of data per run x samples per dayReference DBs can be large, are regularly updated

LLNL-PRES-7553604

Page 23: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

ShortBRED: addresses reference database size

Kaminski et al., PloS Comp Bio 2015; (Huttenhower lab)

Identifies marker sequencesin ref. DB

Searches sample readsagainst smaller marker DB

Updates –> rebuild markersMiss hits outside of themarkers

is the whole gene present?limited mutation analysis

LLNL-PRES-7553605

Page 24: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

ShortBRED: addresses reference database size

Kaminski et al., PloS Comp Bio 2015; (Huttenhower lab)

Identifies marker sequencesin ref. DBSearches sample readsagainst smaller marker DB

Updates –> rebuild markersMiss hits outside of themarkers

is the whole gene present?limited mutation analysis

LLNL-PRES-7553605

Page 25: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

ShortBRED: addresses reference database size

Kaminski et al., PloS Comp Bio 2015; (Huttenhower lab)

Identifies marker sequencesin ref. DBSearches sample readsagainst smaller marker DB

Updates –> rebuild markersMiss hits outside of themarkers

is the whole gene present?limited mutation analysis

LLNL-PRES-7553605

Page 26: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

ShortBRED: addresses reference database size

Kaminski et al., PloS Comp Bio 2015; (Huttenhower lab)

Identifies marker sequencesin ref. DBSearches sample readsagainst smaller marker DB

Updates –> rebuild markers

Miss hits outside of themarkers

is the whole gene present?limited mutation analysis

LLNL-PRES-7553605

Page 27: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

ShortBRED: addresses reference database size

Kaminski et al., PloS Comp Bio 2015; (Huttenhower lab)

Identifies marker sequencesin ref. DBSearches sample readsagainst smaller marker DB

Updates –> rebuild markersMiss hits outside of themarkers

is the whole gene present?limited mutation analysis

LLNL-PRES-7553605

Page 28: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

ShortBRED: addresses reference database size

Kaminski et al., PloS Comp Bio 2015; (Huttenhower lab)

Identifies marker sequencesin ref. DBSearches sample readsagainst smaller marker DB

Updates –> rebuild markersMiss hits outside of themarkers

is the whole gene present?

limited mutation analysis

LLNL-PRES-7553605

Page 29: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

ShortBRED: addresses reference database size

Kaminski et al., PloS Comp Bio 2015; (Huttenhower lab)

Identifies marker sequencesin ref. DBSearches sample readsagainst smaller marker DB

Updates –> rebuild markersMiss hits outside of themarkers

is the whole gene present?limited mutation analysis

LLNL-PRES-7553605

Page 30: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Our approach: De Bruijn graph based data structure

Whole ref. seqs. represented as sliding window substrings

Search algo. based on short exact matches (see also Salmon,kallisto)

Score: breadth, weighted by extended matches

Eventually: add ML-based signatures, multiple encodings (ARmo)

Pearce, Ames, Zemla, Allen

LLNL-PRES-7553606

Page 31: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Our approach: De Bruijn graph based data structure

Whole ref. seqs. represented as sliding window substrings

Search algo. based on short exact matches (see also Salmon,kallisto)

Score: breadth, weighted by extended matches

Eventually: add ML-based signatures, multiple encodings (ARmo)

Pearce, Ames, Zemla, Allen

LLNL-PRES-7553606

Page 32: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Our approach: De Bruijn graph based data structure

Whole ref. seqs. represented as sliding window substrings

Search algo. based on short exact matches (see also Salmon,kallisto)

Score: breadth, weighted by extended matches

Eventually: add ML-based signatures, multiple encodings (ARmo)

Pearce, Ames, Zemla, Allen

LLNL-PRES-7553606

Page 33: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Our approach: De Bruijn graph based data structure

Whole ref. seqs. represented as sliding window substrings

Search algo. based on short exact matches (see also Salmon,kallisto)

Score: breadth, weighted by extended matches

Eventually: add ML-based signatures, multiple encodings (ARmo)Pearce, Ames, Zemla, Allen

LLNL-PRES-7553606

Page 34: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene scoring: Maximum exact match lengths as weights

Aggregate gene scores byantibiotic

Antibiotic Score

Beta-lactam 1.00Erythromycin 0.88. . . . . .

LLNL-PRES-7553607

Page 35: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene scoring: Maximum exact match lengths as weights

Aggregate gene scores byantibiotic

Antibiotic Score

Beta-lactam 1.00Erythromycin 0.88. . . . . .

LLNL-PRES-7553607

Page 36: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene scoring: Maximum exact match lengths as weights

Aggregate gene scores byantibiotic

Antibiotic Score

Beta-lactam 1.00Erythromycin 0.88. . . . . .

LLNL-PRES-7553607

Page 37: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Gene scoring: Maximum exact match lengths as weights

Aggregate gene scores byantibiotic

Antibiotic Score

Beta-lactam 1.00Erythromycin 0.88. . . . . .

LLNL-PRES-7553607

Page 38: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Predicting AMR presence and drug class: preliminary results

Simulated 500 samples with AMR to multiple drug classes

AMR presence-absence AMR drug class

Avila-Herrera, Pearce, Ames, Zemla, Allen, manuscript in prep

//TODO: larger standard data set, test scoring functions

LLNL-PRES-7553608

Page 39: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Predicting AMR presence and drug class: preliminary results

Simulated 500 samples with AMR to multiple drug classes

AMR presence-absence

AMR drug class

Avila-Herrera, Pearce, Ames, Zemla, Allen, manuscript in prep

//TODO: larger standard data set, test scoring functions

LLNL-PRES-7553608

Page 40: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Predicting AMR presence and drug class: preliminary results

Simulated 500 samples with AMR to multiple drug classes

AMR presence-absence AMR drug class

Avila-Herrera, Pearce, Ames, Zemla, Allen, manuscript in prep

//TODO: larger standard data set, test scoring functions

LLNL-PRES-7553608

Page 41: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Predicting AMR presence and drug class: preliminary results

Simulated 500 samples with AMR to multiple drug classes

AMR presence-absence AMR drug class

Avila-Herrera, Pearce, Ames, Zemla, Allen, manuscript in prep

//TODO: larger standard data set, test scoring functions

LLNL-PRES-7553608

Page 42: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Search tools are critical in modern molecular biology

Goal: release tool to compbio communityAccelerate basic researchEnable continuous biosurveillance of AMRcrisisWork towards timely precision medicaldiagnostics

LLNL-PRES-7553609

Page 43: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Search tools are critical in modern molecular biology

Goal: release tool to compbio communityAccelerate basic researchEnable continuous biosurveillance of AMRcrisisWork towards timely precision medicaldiagnostics

LLNL-PRES-7553609

Page 44: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Acknowledgements

Graph search projectJonathan AllenRoger PearceSasha AmesAdam Zemla

Related AMR projectsMarisa TorresNisha MulakkenElizabeth VitalisNicholas BeCrystal JaingTom Slezak

LLNL-PRES-75536010

Page 45: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Fin

LLNL-PRES-75536011

Page 46: Search strategies for antimicrobial resistance associated ......Pearce, Ames, Zemla, Allen LLNL-PRES-755360 6. Our approach: De Bruijn graph based data structure Whole ref. seqs. represented

Disclaimer

This document was prepared as an account of work sponsored by anagency of the United States government. Neither the United Statesgovernment nor Lawrence Livermore National Security, LLC, nor any oftheir employees makes any warranty, expressed or implied, or assumesany legal liability or responsibility for the accuracy, completeness, orusefulness of any information, apparatus, product, or process disclosed,or represents that its use would not infringe privately owned rights.Reference herein to any specific commercial product, process, orservice by trade name, trademark, manufacturer, or otherwise does notnecessarily constitute or imply its endorsement, recommendation, orfavoring by the United States government or Lawrence LivermoreNational Security, LLC. The views and opinions of authors expressedherein do not necessarily state or reflect those of the United Statesgovernment or Lawrence Livermore National Security, LLC, and shallnot be used for advertising or product endorsement purposes.

LLNL-PRES-75536012