the journey to a whole genome sequencing diagnostic service for … · 2019. 5. 10. · the journey...



The journey to a whole genome sequencing diagnostic service for Mycobacteria spp.

Derrick Crook
University of Oxford
Public Health England
Oxford University Hospitals FT Trust


Disclosures

• Funding award from Janssen to support Taiwan’s participation in the CRyPTIC consortium


Contents:

• Background
• Proof of principle work
• Validation
• Implementation
• Improvement in resistance prediction
• Future prospects for diagnostics


How did I come to work on TB?

Madadeni Hospital, Newcastle, KwaZulu-Natal (KZN)


The goal of the research consortium is therefore to establish whether and how rapid sequencing can be integrated with bioinformatics and web-based technologies to transform infectious disease surveillance and management.

Mycobacterium tuberculosis, Staphylococcus aureus, Clostridium difficile, Norovirus

The beginning of the journey: March 2008


Concept for ideal whole genome sequencing solution

In one step generate the complete diagnostic, typing and surveillance information

Nature Reviews Genetics 13, 601-612 (September 2012)


Proof of principle for identifying clusters


Diversity within:

• Same sample: 59 technical replicates
• A person at one timepoint: pulmonary vs extra-pulmonary (within 1 month)
• A person over time: sequential pulmonary (separated by >6 months)
• Point source outbreak: household outbreaks
• Community: MIRU-VNTR-defined community clusters


0·5 SNPs per genome per year (95% CI 0·3–0·7)
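To see what a clock rate of this size implies for SNP cut-offs, here is a minimal sketch under an assumption that is ours, not the slide's: SNPs accumulate as a Poisson process at 0·5 SNPs per genome per year independently along both lineages descending from the most recent common ancestor of two isolates. The function names and the 5-SNP threshold used in the loop are illustrative.

```python
import math

# Point estimate from the slide: 0.5 SNPs per genome per year (95% CI 0.3-0.7).
RATE_PER_YEAR = 0.5


def expected_snp_distance(years_since_mrca: float, rate: float = RATE_PER_YEAR) -> float:
    """Expected SNP distance between two isolates whose most recent common
    ancestor lived `years_since_mrca` years ago (assumption: both lineages
    accumulate SNPs independently at `rate`)."""
    return 2 * rate * years_since_mrca


def prob_distance_exceeds(threshold: int, years_since_mrca: float,
                          rate: float = RATE_PER_YEAR) -> float:
    """P(SNP distance > threshold), treating the distance as Poisson-distributed."""
    lam = expected_snp_distance(years_since_mrca, rate)
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(threshold + 1))
    return 1.0 - cdf


for years in (1, 3, 5, 10):
    print(f"{years:>2} years: mean {expected_snp_distance(years):.1f} SNPs, "
          f"P(>5 SNPs) = {prob_distance_exceeds(5, years):.3f}")
```

Under these assumptions, pairs whose common ancestor is 1 to 3 years back rarely exceed 5 SNPs, whereas pairs a decade apart usually do.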


SNP to MIRU loci comparison

EBioMedicine 34 (2018) 122–130

A Quantitative Evaluation of MIRU-VNTR Typing Against Whole-Genome Sequencing for Identifying Mycobacterium tuberculosis Transmission: A Prospective Observational Cohort Study

50% of isolates clustered by MIRU-VNTR are falsely clustered, i.e. they are >5 SNPs apart

David Wyllie
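The MIRU-VNTR comparison rests on pairwise SNP distances and a 5-SNP threshold. A minimal sketch, using toy sequences and hypothetical cluster labels rather than study data, counts SNP differences between aligned consensus genomes (skipping positions called "N", an assumption about how missing calls are handled) and reports the fraction of MIRU-VNTR-clustered pairs that are more than 5 SNPs apart.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned consensus sequences differ,
    ignoring positions where either base is unknown ('N')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def false_cluster_fraction(seqs: dict, miru_cluster: dict, threshold: int = 5) -> float:
    """Among pairs sharing a MIRU-VNTR cluster label, the fraction whose
    SNP distance exceeds `threshold`."""
    same_miru = [(i, j) for i, j in combinations(sorted(seqs), 2)
                 if miru_cluster[i] == miru_cluster[j]]
    if not same_miru:
        return 0.0
    far = sum(1 for i, j in same_miru if snp_distance(seqs[i], seqs[j]) > threshold)
    return far / len(same_miru)


# Hypothetical toy data: A-B differ by 1 SNP, A-C by 6, B-C by 7; D is in another cluster.
TOY_SEQS = {"A": "AAAAAAAAAAAA", "B": "AAAAAAAAAAAC",
            "C": "CCCCCCAAAAAA", "D": "AAAAAAAAAAAA"}
TOY_MIRU = {"A": "m1", "B": "m1", "C": "m1", "D": "m2"}
print(false_cluster_fraction(TOY_SEQS, TOY_MIRU))
```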


www.thelancet.com/infection


Resistance Prediction


Number of variants | Sensitive phenotype, n (%) | Resistant phenotype, n (%)
0 | 7,566 (84) | 33 (5)
1 | 1,111 (12) | 518 (74)
2 | 211 (2) | 130 (19)
3 | 61 (1) | 16 (2)
4 | 21 (0) | 3 (0)
5 | 6 (0) | 0 (0)
6 | 1 (0) | 1 (0)
8 | 1 (0) | 0 (0)

Distribution of number of variants across candidate-genes and their promoter-regions in phenotypically susceptible and resistant isolates

Lancet Infect Dis 2015;15: 1193–1202

Dr Tim Walker
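A quick worked check on the variant-distribution counts: if, hypothetically, any variant in a candidate gene or promoter were taken as evidence of resistance, sensitivity would be high but specificity poor, because roughly 16% of sensitive isolates carry at least one variant. This is one way to see why variants need to be characterised individually. The counts are those given on this slide; the rule and names are illustrative only.

```python
# Counts from the slide: number of candidate-gene variants ->
# (phenotypically sensitive isolates, phenotypically resistant isolates)
VARIANT_COUNTS = {0: (7566, 33), 1: (1111, 518), 2: (211, 130), 3: (61, 16),
                  4: (21, 3), 5: (6, 0), 6: (1, 1), 8: (1, 0)}


def any_variant_rule(counts):
    """Hypothetical rule: call 'resistant' whenever >= 1 variant is present.
    Returns (sensitivity, specificity) as fractions."""
    sensitive_total = sum(s for s, _ in counts.values())
    resistant_total = sum(r for _, r in counts.values())
    true_pos = resistant_total - counts[0][1]  # resistant isolates with >= 1 variant
    true_neg = counts[0][0]                    # sensitive isolates with no variant
    return true_pos / resistant_total, true_neg / sensitive_total


sensitivity, specificity = any_variant_rule(VARIANT_COUNTS)
print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
```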


Can we discover more determinants?

• Resistance is conferred by genomic variation:
  – Non-synonymous mutations, deletions and insertions in relevant genes (23 genes)
  – Arises mostly de novo in a non-recombining genome, leading to homoplasy

• Investigation of 3,651 isolates:
  – Using a heuristic method for predicting resistance
  – Divided into a 2,099-isolate derivation set and a 1,552-isolate validation set
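The derivation/validation design can be sketched as follows. This is a simplified illustration under our own assumptions, not the published algorithm: in the derivation set, any variant seen in a susceptible isolate is labelled benign, and a variant that is the only candidate-gene variant in a resistant isolate (and not already benign) is labelled resistance-determining. In the validation set an isolate is then predicted R if it carries any resistance-determining variant, S if all its variants are benign, and U (uncharacterised) otherwise. The variant names are placeholders.

```python
from typing import Dict, FrozenSet, List, Tuple

Isolate = Tuple[FrozenSet[str], str]  # (candidate-gene variants, phenotype 'R' or 'S')


def build_catalogue(derivation: List[Isolate]) -> Dict[str, str]:
    """Label variants 'benign' or 'resistant' from a derivation set (simplified)."""
    catalogue: Dict[str, str] = {}
    for variants, phenotype in derivation:
        if phenotype == "S":
            for v in variants:
                catalogue[v] = "benign"  # seen in a susceptible isolate
    for variants, phenotype in derivation:
        if phenotype == "R" and len(variants) == 1:
            (v,) = variants
            catalogue.setdefault(v, "resistant")  # sole variant in a resistant isolate
    return catalogue


def predict(variants: FrozenSet[str], catalogue: Dict[str, str]) -> str:
    """Return 'R', 'S' or 'U' (uncharacterised) for one validation isolate."""
    labels = [catalogue.get(v) for v in variants]
    if "resistant" in labels:
        return "R"
    if all(label == "benign" for label in labels):
        return "S"  # also covers isolates with no variants at all
    return "U"


TOY_DERIVATION = [(frozenset({"varA"}), "R"), (frozenset({"varB"}), "S"),
                  (frozenset({"varA", "varB"}), "R"), (frozenset(), "S")]
catalogue = build_catalogue(TOY_DERIVATION)
print(catalogue, predict(frozenset({"varC"}), catalogue))
```

Variants never seen in the derivation set fall through to U, which is why a larger derivation set shrinks the uncharacterised fraction.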


Resistance prediction in a validation set


Phenotypic resistance occurring for each genetic variant


Proof of concept for a WGS mycobacterial diagnostic service

• National diagnostic service in England

• All positive primary cultures with acid-fast growth were referred to PHE Reference Labs (London, Birmingham and Newcastle)

• Each isolate was identified, and all TB isolates were subject to DST and typing

• Identification and susceptibility results reported to the clinicians

• Typing results (MIRU-VNTR) reported to the public health teams


Louise Pankhurst


Pilot study for mycobacterial processing: October 2013 – April 2014

• Participants: Oxford, Birmingham, Brighton, Leeds, Lille, Borstel, Dublin & Vancouver.

• Clinical samples requiring mycobacterial culture were processed in MGIT tubes

• When they flagged positive, 1 ml was removed for DNA extraction and preparation

• Sequenced locally on a MiSeq


Bioinformatics processing

• 331 samples processed
• Data transferred via the cloud to Oxford
• Analysed, and reports generated recording:
  1. Species
  2. Resistance prediction
  3. Nearest genomic match (against 4,000 TB whole-genome sequences in the database)


Summary – Species ID

For MTBC: Sensitivity 95.4% Specificity 98.1%

Based on single sequencing event (no repeating of tests)

Concordant WGS only Routine only

M. tuberculosis complex 157 1

M. tuberculosis complex (BCG) 8

M. tuberculosis complex (africanum) 2

M. tuberculosis complex & M. avium complex 1 5 3

M. avium complex 71 1 1

M. avium complex & M. malmoense 1

M. abscessus complex 39 1

M. gordonae 18

M. xenopi 11

M. kansasii 6 1

M. malmoense 3

M. fortuitum 2 1

M. szulgai 2

M. celatum 1

M. lentiflavum 1

M. scrofulaceum 1

Failed to identify species 9 10 2

Total 331 (96%) 18 (5%) 10 (3%)


Summary – Resistance prediction

WGS resistance predictions for MTB isolates compared to phenotypic DST. R = Resistant, S = Sensitive, M = Mixed (resistant and sensitive), F = failed WGS prediction

Drug | Resistant DST (WGS R, S, M, F) | Sensitive DST (WGS R, S, M, F) | DST not attempted (WGS R, S, M, F) | DST failed (WGS R, S, M, F)

First-line drugs
Isoniazid | 13, 2, 1, 0 | 0, 143, 0, 7 | 0, 0, 0, 0 | 0, 1, 0, 1
Rifampicin | 5, 1, 0, 0 | 0, 148, 4, 9 | 0, 0, 0, 0 | 0, 0, 0, 1
Ethambutol | 5, 1, 0, 0 | 1, 153, 0, 7 | 0, 0, 0, 0 | 0, 0, 0, 1
Pyrazinamide | 8, 1, 0, 0 | 0, 149, 1, 8 | 0, 0, 0, 0 | 0, 0, 0, 1

Second-line drugs
Streptomycin | 5, 1, 0, 0 | 0, 14, 0, 0 | 2, 138, 0, 8 | 0, 0, 0, 0
Fluoroquinolones | 3, 1, 0, 0 | 0, 6, 0, 0 | 2, 148, 0, 8 | 0, 0, 0, 0
Aminoglycosides | 1, 0, 0, 0 | 0, 5, 0, 0 | 2, 141, 7, 12 | 0, 0, 0, 0

TOTAL | 40, 7, 1, 0 | 1, 618, 5, 31 | 6, 427, 7, 28 | 0, 1, 0, 4


Speed – Analysis & reporting

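[Figure: speed of analysis and reporting, WGS compared with routine testing; regions labelled "WGS faster" and "WGS slower"]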


Cost

Process | Throughput (2014)* | Total per sample in 2014 (GBP) | 10% fewer samples per year (GBP) | 10% more samples per year (GBP)

WGS and routine clinical workflows
MGIT culture | 15265 | 52·39 | 52·90 | 51·97
Cepheid Xpert MTB/RIF | 617 | 99·66 | 102·35 | 97·44

WGS workflow only
WGS | 2207 | 118·55 | 120·16 | 117·26

Routine clinical workflows only
Identification assays (Hain MTBC: 866; Hain CM/AS: 1341) | 2207 | 55·05 | 55·28 | 54·87
MIRU-VNTR | 866 | 107·75 | 110·89 | 105·18
First-line DST | 866 | 135·47 | 137·12 | 134·13
Limited second-line DSTǂ | 62 | 93·01 | 93·24 | 92·83
Second-line DSTǂǂ | 62 | 101·27 | 104·24 | 98·86

WGS workflow scenarios
MGIT culture + WGS | - | 170·94 | 173·06 | 169·23
MGIT culture + WGS + first-line DST | - | 306·41 | 310·18 | 303·36
MGIT culture + WGS + first-line DST + full second-line DST | - | 500·68 | 507·66 | 495·05

Routine clinical workflow scenarios
Culture + identification assays | - | 107·44 | 108·18 | 106·84
Culture + identification assays + MIRU-VNTR + first-line DST | - | 350·66 | 356·19 | 346·15
Culture + identification assays + MIRU-VNTR + first-line DST + full second-line DST | - | 544·93 | 553·69 | 537·84

Total workflow costs
WGS-based diagnostics | - | 480·91 | 486·01 | 476·75
WGS-based diagnostics + first- and full second-line DST | - | 539·53 | 545·37 | 534·73
Routine clinical workflow based diagnostics | - | 518·31 | 524·00 | 513·64
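The scenario rows in the cost table appear to be simple sums of the per-sample component costs (for example 52·39 + 118·55 = 170·94 for MGIT culture + WGS), with "full second-line DST" matching limited plus standard second-line DST to within 0·01 GBP of rounding. The sketch below assumes that additive reading; the dictionary keys and the `FULL_SECOND_LINE` grouping are our own labels, and the "Total workflow costs" rows are not modelled.

```python
# Per-sample component costs in 2014 (GBP), copied from the cost table.
COMPONENT_COST = {
    "MGIT culture": 52.39,
    "WGS": 118.55,
    "Identification assays": 55.05,
    "MIRU-VNTR": 107.75,
    "First-line DST": 135.47,
    "Limited second-line DST": 93.01,
    "Second-line DST": 101.27,
}

# Assumption: "full second-line DST" = limited + standard second-line DST.
FULL_SECOND_LINE = ("Limited second-line DST", "Second-line DST")


def scenario_cost(*components: str) -> float:
    """Per-sample cost of a workflow scenario as the sum of its components."""
    return round(sum(COMPONENT_COST[c] for c in components), 2)


# The table lists 500.68 for this scenario; the 0.01 difference is rounding.
wgs_full = scenario_cost("MGIT culture", "WGS", "First-line DST", *FULL_SECOND_LINE)
routine = scenario_cost("MGIT culture", "Identification assays", "MIRU-VNTR", "First-line DST")
print(wgs_full, routine)
```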


Head-to-head comparison of WGS with routine testing

J Clin Microbiol 56:e01480-17. https://doi.org/10.1128/JCM.01480-17.

Phuong Quan


Table 1 WGS species predictions compared to routine laboratory tests

Organism identified by routine laboratory tests | No. samples available for comparison | WGS identified same species, n (%) | WGS identified different species in same complex / no. of discordants
M. tuberculosis | 747 | 743 (99.5) | 4/4
M. africanum | 8 | 7 (87.5) | 1/1
M. bovis | 8 | 6 (75.0) | 1/2
M. bovis (BCG strain) | 6 | 6 (100.0) | 0/0
M. tuberculosis complex | 13 | 13 (100.0) | 0/0
Total M. tuberculosis complex | 782 | 775 (99.1) | 6/7
M. abscessus | 153 | 152 (99.3) | 0/1
M. chelonae | 113 | 106 (93.8) | 3/7
M. abscessus complex | 4 | 3 (75.0) | 0/1
Total M. abscessus complex | 270 | 261 (96.7) | 3/9
M. avium | 258 | 252 (97.7) | 0/6
M. intracellulare/M. chimaera | 320 | 296 (92.5) | 2/24
Total M. avium complex | 578 | 548 (94.8) | 2/30
M. fortuitum | 41 | 24 (58.5) | 15/17
M. peregrinum | 7 | 4 (57.1) | 2/3
Total M. fortuitum complex | 48 | 28 (58.3) | 17/20
M. gordonae | 130 | 127 (97.7) | -
M. kansasii | 34 | 32 (94.1) | -
M. malmoense | 41 | 38 (92.7) | -
M. marinum | 5 | 5 (100.0) | -
M. ulcerans | 1 | 0 (0.0) | -
M. xenopi | 13 | 11 (84.6) | -
Total other nontuberculous mycobacteria | 272 | 241 (88.6) | -
OVERALL TOTAL | 1902 | 1825 (96.0) | 28/77
Rarer species(b) | 40 | 11 (27.5) | -
Mixtures(c) | 23 | 21 (91.3) | -

(b) Rarer species include: M. interjectum, M. scrofulaceum, M. genevense, M. goodii, M. lentiflavum, M. mucogenicum, M. simiae, M. szulgai
(c) Mixtures were considered concordant if WGS identified at least one of the species reported by the routine laboratory


Gene (drug) | MTBDRplus resistant (WGS R, S, U, F, Total) | MTBDRplus susceptible (WGS R, S, U, F, Total) | MTBDRplus uncharacterised (WGS R, S, U, F, Total) | % failed | Sensitivity*, excluding failed (95% CI) | Specificity*, excluding failed (95% CI)
inhA (isoniazid) | 17, 1, 0, 1, 19 | 0, 652, 0, 48, 700 | 0, 1, 0, 0, 1 | 6.8 | 94.4 (72.7-99.9) | 100.0 (99.4-100.0)
katG (isoniazid) | 47, 0, 0, 6, 53 | 3, 621, 0, 44, 668 | 0, 0, 0, 0, 0 | 6.9 | 100.0 (92.5-100.0) | 99.5 (98.6-99.9)
rpoB (rifampicin) | 18, 0, 1, 2, 21 | 0, 571, 7, 115, 693 | 0, 0, 2, 0, 2 | 16.3 | 94.7 (74.0-99.9) | 98.8 (97.5-99.5)
All genes | 82, 1, 1, 9, 93 | 3, 1844, 7, 207, 2061 | 0, 1, 2, 0, 3 | 10.0 | 97.6 (91.7-99.7) | 99.5 (99.0-99.7)

* Sensitivity and specificity relate to WGS ability to identify MTBDRplus resistance or susceptibility only, so any uncharacterised WGS mutations were counted as discordant

Resistance prediction vs MTBDRplus
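The 95% confidence intervals in these tables appear to be exact (Clopper-Pearson) binomial intervals: 17 of 18, for example, gives 94.4% (72.7-99.9), as in the inhA row. A dependency-free sketch (function names are ours) computes them by inverting the binomial distribution with bisection. Following the footnote, uncharacterised WGS calls count as discordant, so the "All genes" sensitivity is 82 of 84.

```python
import math


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q, log_n_fact = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


def _bisect(f, target: float, increasing: bool) -> float:
    """Solve f(p) = target on (0, 1) by bisection, for f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1.0 - _binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if x == n else _bisect(
        lambda p: _binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


# inhA row: 17 WGS-resistant out of 18 MTBDRplus-resistant (failed calls excluded)
print([round(100 * b, 1) for b in clopper_pearson(17, 18)])
```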


Drug | Phenotypically resistant (WGS prediction R, S, U, F, Total) | Phenotypically sensitive (WGS prediction R, S, U, F, Total) | % failed | % uncharacterised | Sensitivity, excluding failed/uncharacterised (95% CI) | Specificity, excluding failed/uncharacterised (95% CI)
Isoniazid | 67, 5, 0, 9, 81 | 0, 572, 35, 64, 671 | 9.7 | 4.7 | 93.1 (84.5-97.7) | 100.0 (99.4-100.0)
Rifampicin | 28, 0, 0, 3, 31 | 2, 586, 20, 118, 726 | 16.0 | 2.6 | 100.0 (87.7-100.0) | 99.7 (98.8-100.0)
Ethambutol | 9*, 0, 0, 0, 9 | 9, 574, 92, 68, 743 | 9.1 | 12.2 | 100.0 (66.0-100.0) | 98.5 (97.1-99.3)
Pyrazinamide | 9, 2, 1, 2, 14 | 3, 606, 6, 92, 707 | 13.0 | 1.0 | 81.8 (48.2-97.7) | 99.5 (98.6-99.9)
All first-line drugs | 113, 7, 1, 14, 135 | 14, 2338, 153, 342, 2847 | 11.9 | 5.2 | 94.2 (88.4-97.6) | 99.4 (99.0-99.7)

* Includes two samples which were reported as phenotypically both resistant and susceptible

Resistance Prediction vs DST


National implementation

• Implemented for the North of England in January 2017

• Implemented for the South (including London) in 2018

• Processes 200–300 MGIT-positive samples a week (~15,000/year)

• Reports are issued recording species identification, resistance prediction and whether an isolate is part of a cluster

• Web service depicting phylogenetic relationships linked to clinical/epidemiological data


On-going improvements

• Better resistance prediction:

• Improve phenotypic DST

• Improve the resistance catalogue

• Improve WGS resistance prediction and discontinue phenotypic DST

• Rewrite the processing software and host it in the cloud

• Improved extraction from MGIT and direct from sputum


Filling the resistance gap
Comprehensive Resistance Prediction for Tuberculosis: an International Consortium (CRyPTIC) and ResSeq-TB

• 100,000 TB WGS pledged/available
• ~25,000 with extensive DST
• Analysis:
  – Heuristic approach
  – GWAS
  – Machine learning
  – Thermodynamic modelling of proteins
  – Molecular genetic characterisation

Pyrazinamide will be done by MGIT liquid culture

People-powered research: zooniverse.org, Twitter: @bashthebug

[Figure panels: improved phenotyping (read locally; image analysis) and genotypic characterisation]

Sarah Hoosdally


“Can we already predict enough Mtbc phenotypes from routinely produced WGS data, with sufficient accuracy, to justify a substantial reduction in phenotyping activity.”

In England, 80% of isolates predicted to be susceptible to the four first-line drugs will no longer be processed for phenotypic DST. This will result in a substantial reduction in the cost of running the service.


Sampling frame


For all isolates

Genotypic prediction by phenotype, n
Drug | Resistant phenotype (R, S, U, F, Total) | Susceptible phenotype (R, S, U, F, Total)
Isoniazid | 3067, 90, 93, 44, 3294 | 65, 6313, 215, 117, 6710
Rifampicin | 2743, 69, 7, 84, 2903 | 85, 6763, 232, 147, 7227
Ethambutol | 1410, 81, 94, 55, 1640 | 468, 6835, 781, 70, 8154
Pyrazinamide | 863, 82, 117, 77, 1139 | 204, 6146, 197, 108, 6655

Performance of genotypic prediction
Drug | PPV (%) | NPV (%) | Sensitivity (%) | Specificity (%) | No genotypic prediction made (%) | Resistance prevalence (%)
Isoniazid | 97.9 | 98.6 | 97.1 | 99.0 | 4.7 | 32.9
Rifampicin | 97.0 | 99.0 | 97.5 | 98.8 | 4.6 | 28.7
Ethambutol | 75.1 | 98.8 | 94.6 | 93.6 | 10.2 | 16.7
Pyrazinamide | 80.9 | 98.7 | 91.3 | 96.8 | 6.4 | 14.6
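The performance figures can be recomputed from the counts. In this sketch, U and F calls are excluded from PPV, NPV, sensitivity and specificity and counted together as "no genotypic prediction made"; that reading reproduces the slide's isoniazid and ethambutol rows. The `DrugCounts` name and its field layout are ours.

```python
from dataclasses import dataclass


@dataclass
class DrugCounts:
    """Genotypic predictions (R, S, U, F) split by phenotype, as in the table."""
    res_R: int
    res_S: int
    res_U: int
    res_F: int
    sus_R: int
    sus_S: int
    sus_U: int
    sus_F: int


def summarise(c: DrugCounts) -> dict:
    """Percentages; U and F are excluded from PPV/NPV/sensitivity/specificity."""
    resistant_total = c.res_R + c.res_S + c.res_U + c.res_F
    total = resistant_total + c.sus_R + c.sus_S + c.sus_U + c.sus_F
    no_prediction = c.res_U + c.res_F + c.sus_U + c.sus_F
    return {
        "PPV": 100 * c.res_R / (c.res_R + c.sus_R),
        "NPV": 100 * c.sus_S / (c.sus_S + c.res_S),
        "sensitivity": 100 * c.res_R / (c.res_R + c.res_S),
        "specificity": 100 * c.sus_S / (c.sus_S + c.sus_R),
        "no_prediction": 100 * no_prediction / total,
        "prevalence": 100 * resistant_total / total,
    }


isoniazid = DrugCounts(3067, 90, 93, 44, 65, 6313, 215, 117)
print({name: round(value, 1) for name, value in summarise(isoniazid).items()})
```

Because PPV depends on prevalence, the lower-prevalence drugs (ethambutol, pyrazinamide) show much lower PPV than isoniazid and rifampicin despite broadly similar specificity.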


Improved DST

Daniela Cirillo Phil Fowler


• Since April 2017, 15,743 people from around the world have done 1,239,548 classifications between them.

• Each image is shown to 15 different people

http://bashthebug.net, @bashthebug

Additional confidence in reads
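With each image shown to 15 volunteers, the individual classifications have to be combined into one reading. The slide does not say how; the sketch below assumes, hypothetically, that each volunteer reports a dilution index (0 = lowest drug concentration) for the MIC, takes the lower median as the consensus, and reports the fraction of volunteers within one dilution of it as a crude confidence measure.

```python
from statistics import median_low


def consensus_reading(readings):
    """Combine volunteer dilution indices for one image.

    Returns the consensus index (median_low keeps it an actual dilution step)
    and the fraction of readings within +/- 1 dilution of it."""
    if not readings:
        raise ValueError("no classifications for this image")
    consensus = median_low(readings)
    agree = sum(1 for r in readings if abs(r - consensus) <= 1)
    return consensus, agree / len(readings)


# 15 hypothetical volunteer readings for one drug on one plate image
votes = [3, 3, 4, 3, 2, 3, 3, 4, 3, 3, 5, 3, 3, 2, 3]
print(consensus_reading(votes))
```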


Improved resistance prediction and catalogue

• Key innovations/developments:

• Represent variant data as predicting the MIC conferred

• Discover new genes involved in conferring resistance

• Develop algorithm/software that records both nucleotide substitutions and indels accurately (Clockwork, by the EBI)


Current status of CRyPTIC consortium

• 12,000 isolates subject to microtitre DST

• 10,000 isolates sequenced and processed through the enhanced variant caller (Clockwork)

• 7,000 DST/MIC and WGS data combined and analysed


INH variants analysed by linear regression


Ethambutol variants analysed by linear regression


Machine learning applied to CRyPTIC data to predict MIC and generate a list of mutations likely to cause resistance

Genomic data → machine learning → resistance prediction → feature interpretation

7 state-of-the-art ML models: Random Forest, Support Vector Machine, k-nearest neighbor, Logistic regression, Deep learning, Gradient Boosting, Naïve Bayes

Inputs:
• TARGETED: machine learning models use mutations in candidate genes as input
• GENOME-WIDE: machine learning models use k-mers (k=31) from the entire genome as input

Predictions:
• RESISTANCE (S/R)
• MIC (exact dilution, and MIC within +/- 1 dilution)

Feature interpretation:
• Generate list of features of interest that contribute to model predictions
• Use GWAS, geometric modeling and others to confirm role in resistance
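The genome-wide models take k-mers (k=31) as input. A minimal sketch of turning genomes into k-mer features: it counts canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), skips k-mers containing characters other than A, C, G or T, and builds a binary presence/absence matrix over the union of k-mers. Canonicalisation and presence/absence encoding are our assumptions; the slide does not say how features were encoded, and a pure-Python loop is for illustration only, not whole-genome scale.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = set("ACGT")


def kmer_counts(sequence: str, k: int = 31) -> Counter:
    """Count canonical k-mers (the smaller of a k-mer and its reverse complement),
    skipping any k-mer that contains a character other than A, C, G or T."""
    seq = sequence.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
            counts[min(kmer, reverse_complement)] += 1
    return counts


def presence_matrix(genomes: dict, k: int = 31):
    """Binary presence/absence features per isolate over the union of all k-mers."""
    per_isolate = {name: set(kmer_counts(seq, k)) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*per_isolate.values()))
    matrix = {name: [int(km in kmers) for km in vocabulary]
              for name, kmers in per_isolate.items()}
    return vocabulary, matrix


# Toy example with k=3 so the output is readable.
vocab, features = presence_matrix({"a": "ACGTAC", "b": "AAAA"}, k=3)
print(vocab, features)
```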


Table 1: Preliminary accuracy results of both approaches to predict MIC in all antibiotics, using a standard of +/- 1 dilution from truth

Antibiotic | Targeted (n=4000, best model) | k-mers (n=2500, gradient boosting)
AMI | 0.98 | 0.98
BDQ | 0.98 | -
CFZ | 0.99 | -
DLM | 0.97 | -
EMB | 0.86 | 0.93
ETH | 0.91 | 0.94
INH | 0.92 | 0.91
KAN | 0.97 | 0.995
LEV | 0.92 | 0.96
LZD | 0.98 | -
MXF | 0.92 | 0.98
RFB | 0.93 | 0.92
RIF | 0.94 | 0.92
Mean | 0.94 | 0.95

Figure 1: Sample MIC confusion matrix for the targeted method (left) and the k-mer method (right) applied to AMI resistance prediction

Table 2: Ten gene mutations of interest generated by the model through feature interpretation for AMI resistance:
rrs_a1401g, rrs_a1401z, rrs_g1484t, eis_c-14t, rrl_c211t, Rv0678_G78Z, ethR_T3A, fbiB_V348I, ethA_1389_indel, mmpL7_H787H
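Accuracy "within +/- 1 dilution from truth" (often called essential agreement) can be computed on a log2 scale, assuming MICs lie on a doubling-dilution series. The function names and the example MIC values below are hypothetical.

```python
import math


def dilution_index(mic: float) -> int:
    """Position of an MIC on a doubling-dilution scale (log2, rounded)."""
    return round(math.log2(mic))


def essential_agreement(predicted, measured) -> float:
    """Fraction of predictions within +/- 1 doubling dilution of the measured MIC."""
    pairs = list(zip(predicted, measured))
    if not pairs:
        raise ValueError("no MIC pairs")
    hits = sum(1 for p, m in pairs if abs(dilution_index(p) - dilution_index(m)) <= 1)
    return hits / len(pairs)


# Hypothetical MICs (mg/L): one prediction is two dilutions off (1.0 vs 4.0).
predicted_mic = [0.25, 0.5, 1.0, 4.0, 0.125]
measured_mic = [0.5, 0.5, 4.0, 2.0, 0.125]
print(essential_agreement(predicted_mic, measured_mic))
```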


Resistance prediction - structure based approaches

Joshua Carter


Machine learning models predict pyrazinamide resistance from structural features

(A) logistic regression (LR), (B) support vector machine with radial kernel (SVM) and (C) neural network (NN) models for prediction of pyrazinamide resistance


What variation is left to discover?

[Figure: INH rarefaction curve of non-synonymous substitutions across katG, inhA, fabG1 and ahpC together; x-axis: number of isolates; y-axis: non-synonymous substitutions; points labelled 35,000 WGS and 52,000 WGS]


[Figure: PZA rarefaction curve of non-synonymous substitutions across pncA; x-axis: number of isolates; y-axis: non-synonymous substitutions; points labelled 35,000 WGS and 52,000 WGS]
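A rarefaction curve plots the number of distinct variants observed against the number of isolates sampled; a curve still rising at the largest sample size suggests that further variants remain to be found. The sketch averages over random orderings of the isolates; the input format (one set of variant identifiers per isolate), the parameters and the toy data are assumptions.

```python
import random


def rarefaction_curve(isolates, step, n_permutations=100, seed=0):
    """Average number of distinct variants seen after sampling step, 2*step, ...
    isolates, over random orderings of the isolates.

    `isolates` is a list of sets of variant identifiers (one set per isolate)."""
    rng = random.Random(seed)
    sizes = list(range(step, len(isolates) + 1, step))
    totals = [0.0] * len(sizes)
    for _ in range(n_permutations):
        order = list(isolates)
        rng.shuffle(order)
        seen = set()
        next_idx = 0
        for count, variants in enumerate(order, start=1):
            seen |= variants
            if next_idx < len(sizes) and count == sizes[next_idx]:
                totals[next_idx] += len(seen)
                next_idx += 1
    return [(size, total / n_permutations) for size, total in zip(sizes, totals)]


# Hypothetical toy data: most isolates share a common variant, a few carry rare ones.
toy = [{"v1"}] * 50 + [{"v1", f"rare{i}"} for i in range(10)]
for size, mean_distinct in rarefaction_curve(toy, step=20):
    print(size, round(mean_distinct, 2))
```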


A draft software architecture

[Figure: draft architecture split between local and centralised (Cloud/EBI) components. Labels include: standardised sample preparation; LIMS; sequence; transfer; summary results data; assembly, variant calls and feature calling using Clockwork; persistent storage; link to identifiable data and visually present results; local accredited software service; pairwise matrix analysis and phylogenetic analysis]


[Figure panels: MTB complex reads (%), MTB complex read number and genome coverage (x1), for one sample per flow cell and for multiplex sequencing; MTB complex read numbers shown: 772,861; 119,637; 12,679; 1,118; 380]

Identification of MTB reads from samples with different levels of spiked BCG
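A back-of-envelope link between MTB read numbers and genome coverage: mean depth = reads × read length / genome size, and under the Lander-Waterman (Poisson) assumption the fraction of the genome covered at least once is 1 − e^(−depth). The 150 bp read length, and treating the plotted numbers as single reads, are assumptions; the genome size is that of the H37Rv reference (4,411,532 bp).

```python
import math

H37RV_GENOME_BP = 4_411_532  # M. tuberculosis H37Rv reference genome length


def mean_depth(n_reads: int, read_length: int, genome_size: int = H37RV_GENOME_BP) -> float:
    """Expected mean sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def fraction_covered(depth: float) -> float:
    """Estimated fraction of the genome covered >= 1x, assuming reads land
    uniformly at random (Poisson coverage, Lander-Waterman)."""
    return 1.0 - math.exp(-depth)


for reads in (772_861, 119_637, 12_679, 1_118, 380):
    d = mean_depth(reads, read_length=150)  # 150 bp is an assumed read length
    print(f"{reads:>8,} reads: depth {d:6.2f}x, ~{fraction_covered(d):.1%} covered")
```

Under these assumptions, a few thousand MTB reads cover only a small fraction of the genome, which is why low-burden samples may yield incomplete genomes.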


What limits of detection are we aiming for?

Smear grade (0–4+) | AFB/ml | AFB/HPF | GeneXpert | WGS
4+ | 10,000,000 | 10 | + | complete
3+ | 1,000,000 | 1 | + | complete
2+ | 100,000 | 0.1 | + | complete
1+ | 10,000 | 0.01 | + | complete?
Scanty | 3,000 | 0.003 | + | incomplete?


The journey is not over!


Acknowledgements
• Sarah Walker
• Tim Peto
• Tim Walker
• Sarah Hoosdally
• Mark Wilcox – Leeds
• Grace Smith – Birmingham
• John Paul – Brighton
• Martin Llewellyn – Brighton
• Ana Gubertoni Cruz
• Amy Mathers – UVa, USA

Microbiology, DNA preparation
• Kate Dingle
• Nicole Stoesser
• Alison Vaughan
• Sophie George

International and CRyPTIC
• Jamie Posey, Angela Starks
• Stefan Niemann, Daniela Cirillo
• Nazir Ismail, Philip Supply
• Jennifer Gardy, Yanlin Zhu
• Nerges Mistry, Camilla Rodrigues
• Guangxue He, Guy Thwaites
• David Moore, David Clifton
• Daniel Wilson, Zamin Iqbal

Oxford High Throughput Sequencing Hub team
People participating in the studies

Informatics
• David Wyllie
• Phil Fowler
• Josh Carter
• Jim Davies
• Infections in Oxfordshire Research Database Team

Bioinformatics and Population Biology
• Martin Carlos del Ojo Elias
• Saheer Gharbia
• Tanya Golubchik
• Anna Sheppard
• Stephen Bush
• Xavier Didelot
• Jeremy Swann
• Fan Turner
• Tonya Votintseva
• Trein Do
• Teresa Street