the journey to a whole genome sequencing diagnostic service for … · 2019. 5. 10. · the journey...
The journey to a whole genome sequencing diagnostic service for Mycobacterium spp.
Derrick Crook, University of Oxford
Public Health England; Oxford University Hospitals NHS Foundation Trust
Disclosures
• Funding award from Janssen to support Taiwan's participation in the CRyPTIC consortium
Contents:
• Background
• Proof of principle work
• Validation
• Implementation
• Improvement in resistance prediction
• Future prospects for diagnostics
How did I come to work on TB?
Madadeni Hospital, Newcastle, KZN
Ngwelezana, KZN
LSHTM, London
UVa and Blue Ridge Sanatorium
New England Medical Center
John Radcliffe Hospital, Oxford
The goal of the research consortium is therefore to establish whether and how rapid sequencing can be integrated with bioinformatics and web-based technologies to transform infectious disease surveillance and management.
Mycobacterium tuberculosis, Staphylococcus aureus, Clostridium difficile, Norovirus
The beginning of the journey: March 2008
Concept for an ideal whole genome sequencing solution
Generate the complete diagnostic, typing and surveillance information in one step
Nature Reviews Genetics 13, 601-612 (September 2012)
Proof of principle for identifying clusters
Diversity within:
• Same sample: 59 technical replicates
• A person at one timepoint: pulmonary vs extra-pulmonary (within 1 month)
• A person over time: sequential pulmonary (separated by >6 months)
• Point source outbreak: household outbreaks
• Community: MIRU-VNTR-defined community clusters
0·5 SNPs per genome per year (95% CI 0·3–0·7)
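A rate of 0·5 SNPs per genome per year indicates how close epidemiologically linked isolates should be. The sketch below is a minimal illustration, assuming a strict molecular clock with SNPs accumulating as a Poisson process on both branches since the isolates' most recent common ancestor; the function names are hypothetical, and the 5-SNP cut-off is the one quoted later in this talk for MIRU-VNTR false clustering.

```python
import math

# Point estimate from the slide (95% CI 0.3-0.7 SNPs per genome per year).
RATE_PER_GENOME_PER_YEAR = 0.5


def expected_pairwise_snps(years_since_mrca: float,
                           rate: float = RATE_PER_GENOME_PER_YEAR) -> float:
    """Expected SNP distance between two isolates whose most recent common
    ancestor lived `years_since_mrca` years ago. Mutations accumulate
    independently on both branches, hence the factor of 2 (strict-clock assumption)."""
    return 2.0 * rate * years_since_mrca


def prob_within_threshold(years_since_mrca: float, threshold: int = 5,
                          rate: float = RATE_PER_GENOME_PER_YEAR) -> float:
    """P(pairwise SNP distance <= threshold), assuming a Poisson number of SNPs."""
    lam = expected_pairwise_snps(years_since_mrca, rate)
    return sum(math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(threshold + 1))


for years in (1, 3, 5, 10):
    print(years, expected_pairwise_snps(years), round(prob_within_threshold(years), 3))
```

Under these assumptions, two isolates whose common ancestor lived five years ago fall within 5 SNPs only about 62% of the time, which is one reason fixed SNP thresholds are a screening tool rather than proof of direct transmission.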
SNP to MIRU loci comparison
EBioMedicine 34 (2018) 122–130
A Quantitative Evaluation of MIRU-VNTR Typing Against Whole-Genome Sequencing for Identifying Mycobacterium tuberculosis Transmission: A Prospective Observational Cohort Study
50% of isolates clustered by MIRU-VNTR are falsely clustered, i.e. are >5 SNPs apart
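One way to make the false-clustering figure concrete is sketched below. The definition is an assumption, not necessarily the study's: an isolate counts as falsely clustered if it shares a MIRU-VNTR profile with at least one other isolate but is more than 5 SNPs from every isolate sharing that profile. Inputs and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def falsely_clustered_fraction(miru_profile: Dict[str, str],
                               snp_dist: Dict[FrozenSet[str], int],
                               threshold: int = 5) -> float:
    """Fraction of MIRU-VNTR-clustered isolates that are > threshold SNPs from
    every other isolate with the same MIRU-VNTR profile (assumed definition)."""
    groups: Dict[str, List[str]] = {}
    for isolate, profile in miru_profile.items():
        groups.setdefault(profile, []).append(isolate)
    # Only isolates sharing a profile with someone else are "MIRU-VNTR clustered".
    clustered = [iso for members in groups.values() if len(members) > 1 for iso in members]
    if not clustered:
        return 0.0
    false_count = 0
    for iso in clustered:
        others = [o for o in groups[miru_profile[iso]] if o != iso]
        if all(snp_dist[frozenset((iso, o))] > threshold for o in others):
            false_count += 1
    return false_count / len(clustered)


profiles = {"a": "P1", "b": "P1", "c": "P1", "d": "P2", "e": "P2", "f": "P3"}
distances = {frozenset(("a", "b")): 2, frozenset(("a", "c")): 40,
             frozenset(("b", "c")): 38, frozenset(("d", "e")): 12}
print(falsely_clustered_fraction(profiles, distances))  # 0.6
```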
David Wyllie
www.thelancet.com/infection
Resistance Prediction
Number of variants | Sensitive phenotype, n (%) | Resistant phenotype, n (%)
0 | 7,566 (84) | 33 (5)
1 | 1,111 (12) | 518 (74)
2 | 211 (2) | 130 (19)
3 | 61 (1) | 16 (2)
4 | 21 (0) | 3 (0)
5 | 6 (0) | 0 (0)
6 | 1 (0) | 1 (0)
8 | 1 (0) | 0 (0)
Distribution of number of variants across candidate-genes and their promoter-regions in phenotypically susceptible and resistant isolates
Lancet Infect Dis 2015;15: 1193–1202
Dr Tim Walker
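The variant-count table suggests that simply counting variants is not enough. A quick check with a hypothetical "any variant means resistant" rule, using the counts transcribed from the table: it would catch about 95% of resistant phenotypes but mislabel about 16% of susceptible ones, which is why individual variants have to be classified as resistance-determining or benign.

```python
# Counts transcribed from the variant-count table: number of candidate-gene
# variants -> (phenotypically sensitive n, phenotypically resistant n).
VARIANT_COUNTS = {0: (7566, 33), 1: (1111, 518), 2: (211, 130), 3: (61, 16),
                  4: (21, 3), 5: (6, 0), 6: (1, 1), 8: (1, 0)}


def naive_rule_performance(table, min_variants=1):
    """Hypothetical rule: call 'resistant' whenever at least `min_variants`
    candidate-gene variants are present. Returns (sensitivity, specificity)."""
    tp = sum(res for k, (_, res) in table.items() if k >= min_variants)
    fn = sum(res for k, (_, res) in table.items() if k < min_variants)
    tn = sum(sus for k, (sus, _) in table.items() if k < min_variants)
    fp = sum(sus for k, (sus, _) in table.items() if k >= min_variants)
    return tp / (tp + fn), tn / (tn + fp)


sens, spec = naive_rule_performance(VARIANT_COUNTS)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")  # sensitivity 95.3%, specificity 84.3%
```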
Can we discover more determinants?
• Resistance is conferred by genomic variation:
  • Non-synonymous mutations, deletions and insertions in relevant genes (23 genes)
  • Arises mostly de novo in a non-recombining genome, leading to homoplasy
• Investigation of 3,651 isolates:
  • Using a heuristic method for predicting resistance
  • Divided into a derivation set of 2,099 and a validation set of 1,552
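The derivation/validation split can be pictured with a simplified catalogue-building heuristic. This is an assumption for illustration, not the published algorithm: a variant seen as the only candidate-gene variant in a susceptible isolate is treated as benign, and one seen alone in resistant isolates (and never alone in a susceptible one) as resistance-determining. Predictions then fall into R, S or U (unclassified), the categories used in the later result tables. The variant labels in the example are illustrative only.

```python
from typing import List, Set, Tuple

# (variants found in candidate genes, phenotype "R" or "S") for one isolate and drug
Isolate = Tuple[List[str], str]


def build_catalogue(derivation: List[Isolate]) -> Tuple[Set[str], Set[str]]:
    """Simplified heuristic (assumption): returns (resistance variants, benign variants)."""
    alone_in_s: Set[str] = set()
    alone_in_r: Set[str] = set()
    for variants, phenotype in derivation:
        if len(variants) == 1:  # only isolates with a single variant are informative here
            (alone_in_s if phenotype == "S" else alone_in_r).add(variants[0])
    return alone_in_r - alone_in_s, alone_in_s


def predict(variants: List[str], resistance: Set[str], benign: Set[str]) -> str:
    """'R' if any catalogued resistance variant is present, 'S' if there are no
    variants or all are benign, otherwise 'U' (unclassified variant)."""
    if any(v in resistance for v in variants):
        return "R"
    if all(v in benign for v in variants):
        return "S"
    return "U"


derivation_set = [(["katG_S315T"], "R"), (["katG_R463L"], "S"),
                  ([], "S"), (["katG_S315T", "katG_R463L"], "R")]
resistance, benign = build_catalogue(derivation_set)
print(predict(["katG_R463L", "katG_S315T"], resistance, benign))  # R
print(predict(["inhA_novel"], resistance, benign))                # U
```

In a real evaluation the catalogue would be built on the derivation set only and `predict` applied to the held-out validation set.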
Resistance prediction in a validation set
Phenotypic resistance occurring for each genetic variant
Proof of concept for a WGS mycobacterial diagnostic service
• National diagnostic service in England:
  • All positive primary cultures with acid-fast growth were referred to PHE reference laboratories (London, Birmingham and Newcastle)
  • Each isolate was identified, and all TB isolates were subjected to DST and typing
  • Identification and susceptibility results were reported to clinicians
  • Typing results (MIRU-VNTR) were reported to public health teams
Louise Pankhurst
Pilot study for mycobacterial processing
October 2013 – April 2014
• Participants: Oxford, Birmingham, Brighton, Leeds, Lille, Borstal, Dublin & Vancouver.
• Clinical samples requiring mycobacterial culture were processed in MGIT tubes
• When they flagged positive, 1 ml was removed for DNA extraction and library preparation
• Sequenced locally on a MiSeq
Bioinformatics processing
• 331 samples processed
• Data transferred via the cloud to Oxford
• Analysed, with reports generated recording:
  1. Species
  2. Resistance prediction
  3. Nearest genomic match (against 4,000 TB genomes in the database)
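The "nearest genomic match" step can be pictured as a pairwise SNP comparison against every genome in the database. A minimal sketch, assuming each genome is stored as a consensus sequence aligned to a common reference (equal length, with "N" at uncalled positions); this is not the service's actual implementation, and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def snp_distance(a: str, b: str) -> int:
    """SNPs between two consensus sequences aligned to the same reference,
    ignoring positions where either sequence is uncalled ('N')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same reference")
    return sum(1 for x, y in zip(a, b) if x != "N" and y != "N" and x != y)


def nearest_match(query: str, database: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Return (isolate_id, SNP distance) for the closest database genome, or None."""
    best: Optional[Tuple[str, int]] = None
    for isolate_id, seq in database.items():
        d = snp_distance(query, seq)
        if best is None or d < best[1]:
            best = (isolate_id, d)
    return best


db = {"A": "ACGTACGT", "B": "ACGAACGT", "C": "TTTTACGT"}
print(nearest_match("ACGTACGN", db))  # ('A', 0)
```

A linear scan is adequate for a few thousand genomes; at larger scale one might cache pairwise distances rather than recompute them for every query.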
Summary – Species ID
For MTBC: Sensitivity 95.4% Specificity 98.1%
Based on a single sequencing event (no repeat testing)
Concordant WGS only Routine only
M. tuberculosis complex 157 1
M. tuberculosis complex (BCG) 8
M. tuberculosis complex (africanum) 2
M. tuberculosis complex & M. avium complex 1 5 3
M. avium complex 71 1 1
M. avium complex & M. malmoense 1
M. abscessus complex 39 1
M. gordonae 18
M. xenopi 11
M. kansasii 6 1
M. malmoense 3
M. fortuitum 2 1
M. szulgai 2
M. celatum 1
M. lentiflavum 1
M. scrofulaceum 1
Failed to identify species 9 10 2
Total 331 (96%) 18 (5%) 10 (3%)
Summary – Resistance prediction
WGS resistance predictions for MTB isolates compared to phenotypic DST. R = Resistant, S = Sensitive, M = Mixed (resistant and sensitive), F = failed WGS prediction
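WGS prediction counts (R / S / M / F) within each phenotypic DST category
Drug | Resistant DST | Sensitive DST | DST not attempted | DST failed
First-line drugs
Isoniazid | 13 / 2 / 1 / 0 | 0 / 143 / 0 / 7 | 0 / 0 / 0 / 0 | 0 / 1 / 0 / 1
Rifampicin | 5 / 1 / 0 / 0 | 0 / 148 / 4 / 9 | 0 / 0 / 0 / 0 | 0 / 0 / 0 / 1
Ethambutol | 5 / 1 / 0 / 0 | 1 / 153 / 0 / 7 | 0 / 0 / 0 / 0 | 0 / 0 / 0 / 1
Pyrazinamide | 8 / 1 / 0 / 0 | 0 / 149 / 1 / 8 | 0 / 0 / 0 / 0 | 0 / 0 / 0 / 1
Second-line drugs
Streptomycin | 5 / 1 / 0 / 0 | 0 / 14 / 0 / 0 | 2 / 138 / 0 / 8 | 0 / 0 / 0 / 0
Fluoroquinolones | 3 / 1 / 0 / 0 | 0 / 6 / 0 / 0 | 2 / 148 / 0 / 8 | 0 / 0 / 0 / 0
Aminoglycosides | 1 / 0 / 0 / 0 | 0 / 5 / 0 / 0 | 2 / 141 / 7 / 12 | 0 / 0 / 0 / 0
TOTAL | 40 / 7 / 1 / 0 | 1 / 618 / 5 / 31 | 6 / 427 / 7 / 28 | 0 / 1 / 0 / 4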
Speed – Analysis & reporting
[Figure: turnaround comparison of WGS with routine workflows, with regions labelled "WGS faster" and "WGS slower"]
Cost
Process | Throughput (2014)* | Total per sample in 2014 (GBP) | 10% fewer samples per year (GBP) | 10% more samples per year (GBP)
WGS and routine clinical workflows
MGIT culture | 15265 | 52·39 | 52·90 | 51·97
Cepheid Xpert MTB/RIF | 617 | 99·66 | 102·35 | 97·44
WGS workflow only
WGS | 2207 | 118·55 | 120·16 | 117·26
Routine clinical workflows only
Identification assays (Hain MTBC 866; Hain CM/AS 1341) | 2207 | 55·05 | 55·28 | 54·87
MIRU-VNTR | 866 | 107·75 | 110·89 | 105·18
First-line DST | 866 | 135·47 | 137·12 | 134·13
Limited second-line DSTǂ | 62 | 93·01 | 93·24 | 92·83
Second-line DSTǂǂ | 62 | 101·27 | 104·24 | 98·86
WGS workflow scenarios
MGIT culture + WGS | – | 170·94 | 173·06 | 169·23
MGIT culture + WGS + first-line DST | – | 306·41 | 310·18 | 303·36
MGIT culture + WGS + first-line DST + full second-line DST | – | 500·68 | 507·66 | 495·05
Routine clinical workflow scenarios
Culture + identification assays | – | 107·44 | 108·18 | 106·84
Culture + identification assays + MIRU-VNTR + first-line DST | – | 350·66 | 356·19 | 346·15
Culture + identification assays + MIRU-VNTR + first-line DST + full second-line DST | – | 544·93 | 553·69 | 537·84
Total workflow costs
WGS-based diagnostics | – | 480·91 | 486·01 | 476·75
WGS-based diagnostics + first- and full second-line DST | – | 539·53 | 545·37 | 534·73
Routine clinical workflow-based diagnostics | – | 518·31 | 524·00 | 513·64
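The scenario rows in the cost table appear to be sums of the per-sample component costs in the 2014 column (for example, MGIT culture 52·39 + WGS 118·55 = 170·94). A minimal sketch of that additive model; component names are taken from the table, and the function name is hypothetical.

```python
# Per-sample component costs (GBP) at 2014 throughput, from the cost table.
COMPONENT_COST = {
    "MGIT culture": 52.39,
    "WGS": 118.55,
    "Identification assays": 55.05,
    "MIRU-VNTR": 107.75,
    "First-line DST": 135.47,
    "Limited second-line DST": 93.01,
    "Second-line DST": 101.27,
}


def scenario_cost(components):
    """Sum the per-sample costs of the listed components, rounded to pence."""
    return round(sum(COMPONENT_COST[c] for c in components), 2)


print(scenario_cost(["MGIT culture", "WGS"]))                  # 170.94
print(scenario_cost(["MGIT culture", "Identification assays",
                     "MIRU-VNTR", "First-line DST"]))          # 350.66
```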
Head-to-head comparison of WGS with routine testing
56:e01480-17. https://doi.org/10.1128/JCM.01480-17.
Phuong Quan
Table 1 WGS species predictions compared to routine laboratory tests
Organism identified by routine laboratory tests | No. samples available for comparison | WGS identified same species, n (%) | WGS identified different species in same complex / no. of discordants
M. tuberculosis | 747 | 743 (99.5) | 4/4
M. africanum | 8 | 7 (87.5) | 1/1
M. bovis | 8 | 6 (75.0) | 1/2
M. bovis (BCG strain) | 6 | 6 (100.0) | 0/0
M. tuberculosis complex | 13 | 13 (100.0) | 0/0
Total M. tuberculosis complex | 782 | 775 (99.1) | 6/7
M. abscessus | 153 | 152 (99.3) | 0/1
M. chelonae | 113 | 106 (93.8) | 3/7
M. abscessus complex | 4 | 3 (75.0) | 0/1
Total M. abscessus complex | 270 | 261 (96.7) | 3/9
M. avium | 258 | 252 (97.7) | 0/6
M. intracellulare/M. chimaera | 320 | 296 (92.5) | 2/24
Total M. avium complex | 578 | 548 (94.8) | 2/30
M. fortuitum | 41 | 24 (58.5) | 15/17
M. peregrinum | 7 | 4 (57.1) | 2/3
Total M. fortuitum complex | 48 | 28 (58.3) | 17/20
M. gordonae | 130 | 127 (97.7) | -
M. kansasii | 34 | 32 (94.1) | -
M. malmoense | 41 | 38 (92.7) | -
M. marinum | 5 | 5 (100.0) | -
M. ulcerans | 1 | 0 (0.0) | -
M. xenopi | 13 | 11 (84.6) | -
Total other nontuberculous mycobacteria | 272 | 241 (88.6) | -
OVERALL TOTAL | 1902 | 1825 (96.0) | 28/77
Rarer species (b) | 40 | 11 (27.5) | -
Mixtures (c) | 23 | 21 (91.3) | -
(b) Rarer species include: M. interjectum, M. scrofulaceum, M. genevense, M. goodii, M. lentiflavum, M. mucogenicum, M. simiae, M. szulgai.
(c) Mixtures were considered concordant if WGS identified at least one of the species reported by the routine laboratory.
Counts are WGS results (R / S / U / F, with total) within each MTBDRplus category; sensitivity and specificity exclude failed samples (95% CI).
Target (drug) | MTBDRplus resistant | MTBDRplus susceptible | MTBDRplus uncharacterised | % failed | Sensitivity* | Specificity*
inhA (isoniazid) | 17 / 1 / 0 / 1 (19) | 0 / 652 / 0 / 48 (700) | 0 / 1 / 0 / 0 (1) | 6.8 | 94.4 (72.7–99.9) | 100.0 (99.4–100.0)
katG (isoniazid) | 47 / 0 / 0 / 6 (53) | 3 / 621 / 0 / 44 (668) | 0 / 0 / 0 / 0 (0) | 6.9 | 100.0 (92.5–100.0) | 99.5 (98.6–99.9)
rpoB (rifampicin) | 18 / 0 / 1 / 2 (21) | 0 / 571 / 7 / 115 (693) | 0 / 0 / 2 / 0 (2) | 16.3 | 94.7 (74.0–99.9) | 98.8 (97.5–99.5)
All genes | 82 / 1 / 1 / 9 (93) | 3 / 1844 / 7 / 207 (2061) | 0 / 1 / 2 / 0 (3) | 10.0 | 97.6 (91.7–99.7) | 99.5 (99.0–99.7)
* Sensitivity and specificity relate to WGS's ability to identify MTBDRplus resistance or susceptibility only, so any uncharacterised WGS mutations were counted as discordant.
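The confidence intervals in this table are consistent with exact (Clopper-Pearson) binomial intervals: inhA sensitivity of 17/18 gives about 72.7–99.9%, and specificity of 652/652 gives a lower bound of about 99.4%. The interval method is inferred from the numbers rather than stated on the slide. A dependency-free sketch that sums the binomial CDF in log space and solves each bound by bisection; function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space
    so that large n does not overflow."""
    log_p, log_q, log_n_fact = log(p), log(1.0 - p), lgamma(n + 1)
    return sum(exp(log_n_fact - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
               for i in range(k + 1))


def _bisect_increasing(f, tol: float = 1e-10) -> float:
    """Root of a function that increases in p on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes out of n."""
    # Lower bound solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2; the CDF falls as p rises.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


lo, hi = clopper_pearson(17, 18)  # inhA sensitivity, 17 of 18
print(f"{17 / 18:.1%} ({lo:.1%}-{hi:.1%})")  # 94.4% (72.7%-99.9%)
```

The log-space sum keeps the calculation stable for the larger denominators in the next table (e.g. 2,338 of 2,352); `scipy.stats.beta.ppf` gives the same bounds if SciPy is available.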
Resistance prediction vs MTBDRplus
Counts are WGS predictions (R / S / U / F, with total); sensitivity and specificity exclude failed/uncharacterised predictions (95% CI).
Drug | Phenotypically resistant | Phenotypically sensitive | % failed | % uncharacterised | Sensitivity | Specificity
Isoniazid | 67 / 5 / 0 / 9 (81) | 0 / 572 / 35 / 64 (671) | 9.7 | 4.7 | 93.1 (84.5–97.7) | 100.0 (99.4–100.0)
Rifampicin | 28 / 0 / 0 / 3 (31) | 2 / 586 / 20 / 118 (726) | 16.0 | 2.6 | 100.0 (87.7–100.0) | 99.7 (98.8–100.0)
Ethambutol | 9* / 0 / 0 / 0 (9) | 9 / 574 / 92 / 68 (743) | 9.1 | 12.2 | 100.0 (66.0–100.0) | 98.5 (97.1–99.3)
Pyrazinamide | 9 / 2 / 1 / 2 (14) | 3 / 606 / 6 / 92 (707) | 13.0 | 1.0 | 81.8 (48.2–97.7) | 99.5 (98.6–99.9)
All first-line drugs | 113 / 7 / 1 / 14 (135) | 14 / 2338 / 153 / 342 (2847) | 11.9 | 5.2 | 94.2 (88.4–97.6) | 99.4 (99.0–99.7)
* Includes two samples that were reported as phenotypically both resistant and susceptible.
Resistance Prediction vs DST
National implementation
• Implemented for the North of England in January 2017
• Implemented for the South (including London) in 2018
• Processes 200–300 MGIT-positive samples a week (~15,000/year)
• Reports record species identification, resistance prediction and whether an isolate is part of a cluster
• Web service depicting phylogenetic relationships linked to clinical/epidemiological data
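Whether an isolate "is part of a cluster" can be illustrated with single-linkage clustering on pairwise SNP distances. A minimal sketch using a union-find structure; the 5-SNP threshold is borrowed from the MIRU-VNTR comparison earlier in the talk and is an assumption, since the service's operational cut-off is not stated on this slide. Function and variable names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def snp_clusters(isolates: Iterable[str], snp_dist: Dict[Tuple[str, str], int],
                 threshold: int = 5) -> List[List[str]]:
    """Single-linkage clusters: two isolates share a cluster if a chain of
    pairwise distances <= threshold connects them. Singletons are not reported."""
    parent = {i: i for i in isolates}

    def find(i: str) -> str:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for (a, b), d in snp_dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for i in parent:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


dist = {("a", "b"): 3, ("b", "c"): 4, ("a", "c"): 7, ("d", "e"): 20}
print(snp_clusters("abcde", dist))  # [['a', 'b', 'c']]
```

Note that "a" and "c" end up together although they are 7 SNPs apart, because single linkage chains through "b"; that is a deliberate property of this method, not an error.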
Ongoing improvements
• Better resistance prediction:
  • Improve phenotypic DST
  • Improve the resistance catalogue
  • Improve WGS resistance prediction and discontinue phenotypic DST
• Rewrite the processing software and host it in the cloud
• Improve DNA extraction from MGIT cultures and directly from sputum
Filling the resistance gap
Comprehensive Resistance Prediction for Tuberculosis: an International Consortium (CRyPTIC) and ResSeq-TB
• 100,000 TB WGS pledged/available
• ~25,000 with extensive DST
• Analysis:
  – Heuristic approach
  – GWAS
  – Machine learning
  – Thermodynamic modelling of proteins
  – Molecular genetic characterisation
Pyrazinamide DST will be done by MGIT liquid culture
People-powered research: zooniverse.org, Twitter: @bashthebug
Improved phenotyping; genotypic characterisation
Read locally
Image analysis
Sarah Hoosdally
"Can we already predict enough MTBC phenotypes from routinely produced WGS data, with sufficient accuracy, to justify a substantial reduction in phenotyping activity?"
In England, the 80% of isolates predicted to be susceptible to all four first-line drugs will no longer be processed for phenotypic DST. This will substantially reduce the cost of running the service.
Sampling frame: all isolates
Counts are genotypic predictions (R / S / U / F, with total).
Drug | Resistant phenotype, n | Susceptible phenotype, n
Isoniazid | 3067 / 90 / 93 / 44 (3294) | 65 / 6313 / 215 / 117 (6710)
Rifampicin | 2743 / 69 / 7 / 84 (2903) | 85 / 6763 / 232 / 147 (7227)
Ethambutol | 1410 / 81 / 94 / 55 (1640) | 468 / 6835 / 781 / 70 (8154)
Pyrazinamide | 863 / 82 / 117 / 77 (1139) | 204 / 6146 / 197 / 108 (6655)

Drug | PPV (%) | NPV (%) | Sensitivity (%) | Specificity (%) | No genotypic prediction made (%) | Resistance prevalence (%)
Isoniazid | 97.9 | 98.6 | 97.1 | 99.0 | 4.7 | 32.9
Rifampicin | 97.0 | 99.0 | 97.5 | 98.8 | 4.6 | 28.7
Ethambutol | 75.1 | 98.8 | 94.6 | 93.6 | 10.2 | 16.7
Pyrazinamide | 80.9 | 98.7 | 91.3 | 96.8 | 6.4 | 14.6
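The summary metrics follow from the count table when uncharacterised (U) and failed (F) predictions are set aside and reported separately; recomputing them for isoniazid reproduces every value in its row. A minimal sketch; the function name is hypothetical.

```python
from typing import Dict


def prediction_metrics(resistant: Dict[str, int], susceptible: Dict[str, int]) -> Dict[str, float]:
    """Counts of genotypic predictions ("R", "S", "U", "F") among phenotypically
    resistant and susceptible isolates. U and F are excluded from PPV, NPV,
    sensitivity and specificity and reported as 'no prediction'."""
    tp, fn = resistant["R"], resistant["S"]
    fp, tn = susceptible["R"], susceptible["S"]
    total = sum(resistant.values()) + sum(susceptible.values())
    no_call = resistant["U"] + resistant["F"] + susceptible["U"] + susceptible["F"]
    return {
        "PPV": 100 * tp / (tp + fp),
        "NPV": 100 * tn / (tn + fn),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "no_prediction": 100 * no_call / total,
        "prevalence": 100 * sum(resistant.values()) / total,
    }


isoniazid = prediction_metrics({"R": 3067, "S": 90, "U": 93, "F": 44},
                               {"R": 65, "S": 6313, "U": 215, "F": 117})
print({name: round(value, 1) for name, value in isoniazid.items()})
```

Applied to the ethambutol counts, the same function gives a PPV of about 75% despite 94.6% sensitivity, because 468 susceptible isolates are predicted resistant against 1,410 correct resistant calls.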
Improved DST
Daniela Cirillo, Phil Fowler
• Since April 2017, 15,743 people from around the world have done 1,239,548 classifications between them.
• Each image is shown to 15 different people
http://bashthebug.net, @bashthebug
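Because each plate image is read by 15 volunteers, their readings must be combined into one result per drug. A minimal sketch, assuming each volunteer reports the dilution index of the MIC well and that the consensus is the median reading; the exact rule BashTheBug uses is not stated on this slide, and the function name is hypothetical.

```python
from statistics import median
from typing import List, Tuple


def consensus_reading(readings: List[int]) -> Tuple[float, float]:
    """Combine volunteer readings (dilution index of the MIC well, an assumed
    encoding) into a median consensus, plus the fraction of volunteers within
    one dilution of that consensus."""
    consensus = median(readings)
    agreement = sum(1 for r in readings if abs(r - consensus) <= 1) / len(readings)
    return consensus, agreement


readings = [3] * 9 + [4] * 3 + [2] * 2 + [7]  # 15 volunteers, one outlier
print(consensus_reading(readings))
```

The median is robust to a single careless classification (the reading of 7 here), which is one reason to show each image to many people.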
Additional confidence in reads
Improved resistance prediction and catalogue
• Key innovations/developments:
  • Represent variant data as predicting the MIC conferred
  • Discover new genes involved in conferring resistance
  • Develop an algorithm/software that calls both nucleotide substitutions and indels accurately (Clockwork, by the EBI)
Current status of CRyPTIC consortium
• 12,000 isolates subject to microtitre DST
• 10,000 isolates sequenced and processed through the enhanced variant caller (Clockwork)
• 7,000 DST/MIC and WGS data combined and analysed
INH variants analysed by linear regression
Ethambutol variants analysed by linear regression
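Linear regression of MIC on variants can be sketched as ordinary least squares on presence/absence indicators. This assumes MIC is analysed on a log2 (doubling-dilution) scale with additive variant effects, so each coefficient reads as the number of dilutions a variant shifts the MIC; the slides do not give the model specification, and the data below are synthetic.

```python
import numpy as np


def fit_variant_effects(X: np.ndarray, log2_mic: np.ndarray):
    """Ordinary least squares of log2(MIC) on 0/1 variant indicators.
    X has one row per isolate and one column per variant; returns
    (intercept, per-variant effects in doubling dilutions)."""
    design = np.column_stack([np.ones(X.shape[0]), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, log2_mic, rcond=None)
    return coef[0], coef[1:]


# Synthetic example: variant 0 raises the MIC by 5 dilutions, variant 1 by 1.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0, 0], [1, 0]], dtype=float)
y = -3 + 5 * X[:, 0] + 1 * X[:, 1]
intercept, effects = fit_variant_effects(X, y)
print(round(intercept, 2), np.round(effects, 2))
```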
Machine learning applied to CRyPTIC data to predict MIC and generate a list of mutations likely to cause resistance
Genomic data → machine learning → resistance prediction
7 state-of-the-art ML models: random forest, support vector machine, k-nearest neighbour, logistic regression, deep learning, gradient boosting, naïve Bayes
Feature interpretation
• Targeted: machine learning models use mutations in candidate genes as input
• Genome-wide: machine learning models use k-mers (k = 31) from the entire genome as input
• Outputs: resistance (S/R); MIC (exact dilution, and MIC within ±1 dilution)
• Generate a list of features of interest that contribute to model predictions
• Use GWAS, geometric modelling and other methods to confirm their role in resistance
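The genome-wide approach turns each isolate into a vector of k-mer features. A toy sketch of presence/absence encoding; k = 3 in the example purely for readability (the slide uses k = 31), reverse-complement canonicalisation, which real pipelines often apply, is omitted, and the function names are hypothetical. For a 4.4 Mb genome this naive approach yields millions of k-mers per isolate, so a real pipeline would need dedicated k-mer counting and sparse storage.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 31) -> Set[str]:
    """All length-k substrings of one strand (no reverse-complement canonicalisation)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_matrix(genomes: Dict[str, str], k: int = 31) -> Tuple[List[str], Dict[str, List[int]]]:
    """Presence/absence features: a sorted k-mer vocabulary and one 0/1 row per isolate."""
    per_isolate = {name: kmers(seq, k) for name, seq in genomes.items()}
    vocab = sorted(set().union(*per_isolate.values()))
    rows = {name: [int(km in ks) for km in vocab] for name, ks in per_isolate.items()}
    return vocab, rows


vocab, rows = kmer_matrix({"x": "ACGTA", "y": "ACGTT"}, k=3)
print(vocab, rows)
```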
Table 1: Preliminary accuracy of both approaches in predicting MIC for all antibiotics, counting a prediction as correct if it is within ±1 dilution of the measured MIC
Antibiotic | Targeted (n = 4,000), best model | k-mers (n = 2,500), GB (gradient boosting)
AMI | 0.98 | 0.98
BDQ | 0.98 | -
CFZ | 0.99 | -
DLM | 0.97 | -
EMB | 0.86 | 0.93
ETH | 0.91 | 0.94
INH | 0.92 | 0.91
KAN | 0.97 | 0.995
LEV | 0.92 | 0.96
LZD | 0.98 | -
MXF | 0.92 | 0.98
RFB | 0.93 | 0.92
RIF | 0.94 | 0.92
Mean | 0.94 | 0.95
Figure 1: Sample MIC confusion matrix for the targeted method (left) and the k-mer method (right) applied to amikacin (AMI) resistance prediction
Table 2: Ten gene mutations of interest generated by the model through feature interpretation for AMI resistance
rrs_a1401g, rrs_a1401z, rrs_g1484t, eis_c-14t, rrl_c211t, Rv0678_G78Z, ethR_T3A, fbiB_V348I, ethA_1389_indel, mmpL7_H787H
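The "±1 dilution" accuracy in Table 1 counts a predicted MIC as correct if it lies within one doubling dilution of the measured value. A minimal sketch that locates each MIC by its position on the plate's dilution series, so that rounded concentration labels do not break the comparison; the series and function name are hypothetical.

```python
from typing import Sequence


def within_one_dilution(predicted: Sequence[float], measured: Sequence[float],
                        series: Sequence[float]) -> float:
    """Fraction of predicted MICs within one well (doubling dilution) of the
    measured MIC. `series` lists the plate's concentrations in ascending order."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    index = {conc: i for i, conc in enumerate(series)}
    hits = sum(1 for p, m in zip(predicted, measured) if abs(index[p] - index[m]) <= 1)
    return hits / len(predicted)


SERIES = [0.25, 0.5, 1, 2, 4]  # hypothetical plate dilution series (mg/L)
print(within_one_dilution([0.5, 1, 4, 0.25], [1, 1, 1, 0.25], SERIES))  # 0.75
```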
Resistance prediction: structure-based approaches
Joshua Carter
Machine learning models predict pyrazinamide resistance from structural features
(A) logistic regression (LR), (B) support vector machine with radial kernel (SVM) and (C) neural network (NN) models for prediction of pyrazinamide resistance
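The slide compares three models for pyrazinamide resistance but does not list the structural features. Below is a minimal sketch of the logistic-regression component only, trained by batch gradient descent; the two standardised features (predicted destabilisation of PncA and distance of the mutated residue from the active site) are hypothetical, the labels are synthetic, and this is not the study's model.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1,
                 epochs: int = 2000, l2: float = 1e-3):
    """Logistic regression by batch gradient descent on the mean log-loss plus a
    small L2 penalty. y holds 1 for resistant, 0 for susceptible."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(resistant)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def predict_resistant(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return (X @ w + b >= 0).astype(int)  # equivalent to P(resistant) >= 0.5


# Synthetic data: column 0 = destabilisation, column 1 = distance from active site.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)  # destabilising or near the active site -> resistant
w, b = fit_logistic(X, y)
print("training accuracy:", np.mean(predict_resistant(X, w, b) == y))
```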
What variation is left to discover?
[Figure: rarefaction curves of non-synonymous substitutions against number of isolates, for 35,000 and 52,000 WGS. INH: katG, inhA, fabG1 and ahpC together. PZA: pncA.]
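A rarefaction curve plots how many distinct substitutions have been seen as more isolates are sampled; a curve still rising at the largest sample size suggests further rare variants remain to be found. A minimal sketch that averages over random orderings of the isolates; the function name and variant labels are hypothetical.

```python
import random
from typing import List, Sequence, Set


def rarefaction_curve(isolate_variants: Sequence[Set[str]], steps: Sequence[int],
                      n_permutations: int = 100, seed: int = 0) -> List[float]:
    """Mean number of distinct substitutions seen in the first m isolates of a random
    ordering, for each m in `steps` (ascending, each <= number of isolates)."""
    rng = random.Random(seed)
    order = list(range(len(isolate_variants)))
    totals = [0.0] * len(steps)
    for _ in range(n_permutations):
        rng.shuffle(order)
        seen: Set[str] = set()
        i = 0
        for j, m in enumerate(steps):
            while i < m:  # extend the sample up to m isolates
                seen |= isolate_variants[order[i]]
                i += 1
            totals[j] += len(seen)
    return [t / n_permutations for t in totals]


toy = [{"katG_S315T"}, {"inhA_c-15t"}, set(), {"katG_S315T", "ahpC_novel"}]
print(rarefaction_curve(toy, [1, 2, 3, 4]))
```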
A draft software architecture
• Local: standardised sample preparation, LIMS, sequencing
• Data transfer to a centralised cloud (EBI)
• Centralised cloud/EBI: assembly, variant calls and feature calling using Clockwork; persistent storage; pairwise matrix analysis and phylogenetic analysis
• Summary results returned to a local accredited software service, which links them to identifiable data and visually presents results
[Figure: identification of MTB reads from samples with different levels of spiked BCG, sequenced one sample per flow cell vs multiplexed. Columns: MTB complex reads (%), MTB complex read number, genome coverage (x1). Recoverable read numbers: 772,861; 119,637; 12,679; 1,118; 380.]
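Read counts translate into genome coverage only once a read length is fixed. Under the Lander-Waterman model (reads placed uniformly at random, an idealisation), mean depth is c = N·L/G and the expected fraction of the genome covered at least once is 1 − e^(−c). A minimal sketch applied to the read numbers above; the 1,000 bp mean read length is hypothetical, since the slide does not give it, and the function name is illustrative.

```python
import math

GENOME_SIZE_BP = 4_411_532  # M. tuberculosis H37Rv reference genome length


def expected_coverage(n_reads: int, mean_read_length: float,
                      genome_size: int = GENOME_SIZE_BP):
    """Lander-Waterman expectations: mean depth c = N * L / G and expected
    fraction of the genome covered at least once, 1 - exp(-c)."""
    depth = n_reads * mean_read_length / genome_size
    return depth, 1.0 - math.exp(-depth)


READ_LENGTH = 1_000  # hypothetical mean read length; not given on the slide
for n in (772_861, 119_637, 12_679, 1_118, 380):
    depth, breadth = expected_coverage(n, READ_LENGTH)
    print(f"{n:>9,} MTB reads: depth {depth:6.2f}x, >=1x breadth {breadth:.1%}")
```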
What limits of detection are we aiming for?
Smear grade (0–4+) | AFB/ml | AFB/HPF | GeneXpert | WGS
4+ | 10,000,000 | 10 | + | complete
3+ | 1,000,000 | 1 | + | complete
2+ | 100,000 | 0.1 | + | complete
1+ | 10,000 | 0.01 | + | complete?
Scanty | 3,000 | 0.003 | + | incomplete?
The journey is not over!
Acknowledgements
• Sarah Walker
• Tim Peto
• Tim Walker
• Sarah Hoosdally
• Mark Wilcox – Leeds
• Grace Smith – Birmingham
• John Paul – Brighton
• Martin Llewellyn – Brighton
• Ana Gibertoni Cruz
• Amy Mathers – UVa, USA
Microbiology, DNA preparation
• Kate Dingle
• Nicole Stoesser
• Alison Vaughan
• Sophie George
International and CRyPTIC
• Jamie Posey, Angela Starks
• Stefan Niemann, Daniela Cirillo
• Nazir Ismail, Philip Supply
• Jennifer Gardy, Yanlin Zhu
• Nerges Mistry, Camilla Rodrigues
• Guangxue He, Guy Thwaites
• David Moore, David Clifton
• Daniel Wilson, Zamin Iqbal
Oxford High Throughput Sequencing Hub team
People participating in the studies
Informatics
• David Wyllie
• Phil Fowler
• Josh Carter
• Jim Davies
• Infections in Oxfordshire Research Database Team
Bioinformatics and Population Biology
• Martin Carlos del Ojo Elias
• Saheer Gharbia
• Tanya Golubchik
• Anna Sheppard
• Stephen Bush
• Xavier Didelot
• Jeremy Swann
• Fan Turner
• Tonya Votintseva
• Trein Do
• Teresa Street