a statistical learning approach to infer …...introduction here, we present a statistical learning...

29
A statistical learning approach to infer transmissions of infectious diseases from deep sequencing data Maryam Alamil 1 , Joseph Hughes 2 , Karine Berthier 3 , C´ ecile Desbiez 3 , Ga¨ el Th´ ebaud 4 and Samuel Soubeyrand 1 . 1 BioSP, INRA, 84914, Avignon, France 2 MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom 3 Pathologie V´ eg´ etale, INRA, 84140 Montfavet, France 4 BGPI, INRA, SupAgro, Cirad, Univ. Montpellier, Montpellier, France 12 March 2019 Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 1 / 21

Upload: others

Post on 04-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

A statistical learning approach to infer transmissions ofinfectious diseases from deep sequencing data

Maryam Alamil1, Joseph Hughes2, Karine Berthier3, Cecile Desbiez3,Gael Thebaud4 and Samuel Soubeyrand1.

1BioSP, INRA, 84914, Avignon, France2MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom

3Pathologie Vegetale, INRA, 84140 Montfavet, France4BGPI, INRA, SupAgro, Cirad, Univ. Montpellier, Montpellier, France

12 March 2019

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 1 / 21

Page 2: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Overview

1 Introduction

2 Methodology

3 Results

4 Conclusion

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 2 / 21

Page 3: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Overview

1 Introduction

2 Methodology

3 Results

4 Conclusion

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 3 / 21

Page 4: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Introduction

Inferring transmission links, for fast evolving pathogens, using viralgenetic data is crucial to make epidemiological predictions and todesign control strategies.

Pathogen sequence data have been exploited to infer who infectedwhom, by using empirical and model-based approaches.

Data collected with deep sequencing techniques provide a subsampleof the pathogen variants that were present in the host at samplingtime. They are expected to give better insight into epidemiologicallinks.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 4 / 21

Page 5: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Introduction

Here, we present a Statistical Learning Approach For EstimatingEpidemiological Links from deep sequencing data (SLAFEEL), whichis summarized as follows:

After that, we apply this approach to three real cases of animal,human and plant epidemics.

Then, we show the impact of introducing penalization and, therefore,using training data on the inference.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 5 / 21

Page 6: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Overview

1 Introduction

2 Methodology

3 Results

4 Conclusion

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 6 / 21

Page 7: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Pseudo-evolutionary model for the evolution andtransmission of populations of sequences

+ It describes transitions between sets of sequences sampled at differenttimes from an infected host and its putative sources.

+ It takes into consideration the loss and gain of virus variants duringwithin-host evolution and their loss during between-hosttransmissions.

+ We built a sort of regression function parameterised by anevolutionary parameter and a penalisation parameter, where:

the response variable is the set of sequences S = {S1, ...,SJ} observedfrom a recipient host unit,

the explanatory variable is the set of sequences S(0) = {S(0)1 , ...,S(0)

I }observed from a putative source.

the coefficients are weights measuring how each sequence in S(0)

contributes to explaining each sequence in S.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 7 / 21

Page 8: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Estimation and calibration of parameters, and inference oftransmissions

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 8 / 21

Page 9: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Estimation and calibration of parameters, and inference oftransmissions

Sequencing

Learning stage

Inference of transmission links for the whole data

set by assessing the link intensity between each directed pair of hosts

is calibrated by buildingand optimising a criterion

that compares contact information and inferred

sources of infection for training hosts

θ

Maximise

μ

f μ ,θ(Sm∣Sm '); m '≠m

Identify the most likely source of

s (m;θ)=argmaxm'≠m

f μm '

(θ) ,θ(Sm∣Sm ' )

mgiven θ

μ m' (θ)

S1

(m1) , ... , S Im1

(m1)

S1

(m2) , ... , S Im2

(m2 )

S1

(m3) , ... , S Im3

(m3)

S1

(m4 ) , ... , S Im4

(m4 )

S1

(m5) , ... , S Im5

(m5 )

given θ

m1

m2

m3

m4

m5

m5

m4

m3

with respect to

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 9 / 21

Page 10: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Semi-parametric pseudo-evolutionary model (1/2)+ Its general form is given by a penalized pseudo-likelihood:

fµ,θ(

S1, ...,SJ |S(0)1 , ...,S(0)

I

)= Pθ(W )×

J∏j=1

∑I

i=1 wijKµ{d(Sj ,S(0)i ); ∆ij}∑I

i=1 wij︸ ︷︷ ︸pj

where:

d(., .) is a distance function giving the number of different nucleotidesbetween two sequences,

∆ij is the duration separating the two sequences Sj and S(0)i ,

wij = 1/nj for indices i corresponding to sequences S(0)i minimally distant

from the sequence Sj (the number of such sequences denote nj) and wij = 0otherwise,

Kµ(.,∆) is a kernel smoother parameterised by an evolutionary parameter µ.It is the probability distribution function of the binomial law with size equalsto the sequence length and success probability 3(1− exp(−4µ∆))/4,

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 10 / 21

Page 11: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Semi-parametric pseudo-evolutionary model (2/2)Pθ(W ) is a parametric penalization for the weight matrix W , parameterisedby a penalisation parameter θ. It measures the likelihood of thecontributions of explanatory sequences S(0)

1 , ...,S(0)I (measured by

∑Jj=1 wij ,

i = 1, ..., I) to the response set of sequences S1, ...,SJ .

Two hypotheses are considered for the penalization:

1 H1 : E[∑J

j=1wij

]= J/I. Two associated penalization shapes:

ã Pθ(W ) =∏I

i=1 Φ(∑J

j=1 wij ; JI , θ

JI

(1 − 1

I

)),

ã Pθ(W ) = θχ2

(∑Ii=1

(∑Jj=1

wij −J/I)2

J/I ; I − 1

),

2 H2 : E[

1J∑J

j=1 min(d(Sj ,S(0)f (j)))

]= dobs . A linked penalization shape:

ã Pθ(W ) = θ∏J

j=1 Φ(∑I

i=1 wijd(Sj ,S(0)i ); dobs, σ

2obs

).

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 11 / 21

Page 12: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Overview

1 Introduction

2 Methodology

3 Results

4 Conclusion

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 12 / 21

Page 13: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links for hosts with differentimmunological histories

116 104

113

Naive chain (5 groups)

410 405

Vaccinated chain (7 groups)

113 115

417 409

106 112

115

400 413

108

105

…..

410

405

…..

414

412

First group Last group

First group Last group

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 13 / 21

Page 14: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links for hosts with differentimmunological histories

Naive chain of InfluenzaHosts Contact

113 115115 113104 113, 115, 116116 104, 113, 115109 104, 111, 116111 104, 109, 116105 108, 109, 111108 105, 109, 111106 105, 108, 112112 105, 106, 108

Vaccinated chain of InfluenzaHosts Contact

405 410410 405409 405, 410, 417417 405, 409, 410401 409, 417415 401, 416416 401, 415403 406, 415, 416406 403, 415, 416412 403, 406, 414414 403, 406, 412400 412, 413, 414413 400, 412, 414

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 13 / 21

Page 15: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links for hosts with differentimmunological histories

Naive chain of InfluenzaHosts Contact

113 115115 113104 113, 115, 116116 104, 113, 115109 104, 111, 116111 104, 109, 116105 108, 109, 111108 105, 109, 111106 105, 108, 112112 105, 106, 108

Vaccinated chain of InfluenzaHosts Contact

405 410410 405409 405, 410, 417417 405, 409, 410401 409, 417415 401, 416416 401, 415403 406, 415, 416406 403, 415, 416412 403, 406, 414414 403, 406, 412400 412, 413, 414413 400, 412, 414

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 13 / 21

Page 16: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links for hosts with differentimmunological historiesA

Tree inferred by optimizing the penalization

with contact information for the last group

Time (day)

2 4 6 8 10 12 14

113113113

115115115

104104104

116116

109109

111111

105

108108

106

112112

Host

BTree inferred by optimizing the penalization

with contact information for the last group

Time (day)

2 4 6 8 10 12 14 16 18 20 22 24

405405405405410410410

409409417

401401

415416

403406

412412412414414414414414

400400413413413

Host

Figure: Transmissions inferred in the naive chain (A) and vaccinated chain (B) of Swineinfluenza virus using pair of training hosts in the last group for calibrating thepenalization. Training hosts are written in bold. The thickness of each arrow isproportional to the intensity of the corresponding inferred link.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 13 / 21

Page 17: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links for hosts with differentimmunological historiesA

Tree inferred by optimizing the penalization

with contact information for the last group

Time (day)

2 4 6 8 10 12 14

113113113

115115115

104104104

116116

109109

111111

105

108108

106

112112

Host

BTree inferred by optimizing the penalization

with contact information for the last group

Time (day)

2 4 6 8 10 12 14 16 18 20 22 24

405405405405410410410

409409417

401401

415416

403406

412412412414414414414414

400400413413413

Host

Consistent estimations with the two pairs of training hosts.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 13 / 21

Page 18: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Impact of training hosts

A B C

Figure: Transmissions inferred in the vaccinated chain of SIV using different sets oftraining hosts for calibrating the penalization: (A) a pair of hosts in the last group ofthe chain (B) a pair of hosts in the middle of the chain (C) three hosts in the last groupand the middle of the chain. Training hosts are highlighted. The thickness of each arrowis proportional to the intensity of the corresponding inferred link.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 14 / 21

Page 19: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Impact of training hostsA B C

Using more contact information allows a finer calibration ofthe penalisation and, consequently, a more accurate resolu-tion of transmissions.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 14 / 21

Page 20: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Comparaison between SLAFEEL and BadTrIP

Figure: Discrepancy between inferred transmission graphs obtained with SLAFFEL (withand without penalization) and BadTrIP and reference graphs, for naive and vaccinatedchains of SIV. This discrepancy is measured by the proportion of correct sourceidentifications.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 15 / 21

Page 21: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links in a low diversitypathogen population

Recipient DonorG3817 G3729G3820 G3729G3821 G3729G3823 G3729G3851 G3752

Senga et al. (2017)

Jawie

Kaku

a

Kis

si K

am

a

Kissi Te

ngKissi Tongi

Kpeje Bongre

Luawa

Male

ma

Ma

mb

olo

Mandu

Njaluahun

Nongowa

Figure: Most likely epidemiological links between Ebola patients cumulating to 20%probability for each recipient (i.e., for each recipient, potential donors were ranked withrespect to link intensity, and the subset of donors with higher ranks for which the sum oflink intensities reached 0.2 were retained to be displayed in the graph).

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 16 / 21

Page 22: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links in a low diversitypathogen population

Recipient DonorG3817 G3729G3820 G3729G3821 G3729G3823 G3729G3851 G3752

Senga et al. (2017)

Jawie

Kaku

a

Kis

si K

am

a

Kissi Te

ngKissi Tongi

Kpeje Bongre

Luawa

Male

ma

Mam

bolo

Mandu

Njaluahun

Nongowa

The Jawie chiefdom seems to be an interface between KissiTeng and Kissi Tongi chiefdoms on the one hand and mostof the other chiefdoms on theother hand.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 16 / 21

Page 23: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links at a metapopulation scale

+ Geographic proximity is used as contact information

10 km

Figure: Epidemiological links inferred between 27 salsify patches based on sets ofpotyvirus variants sequenced from 189 infected plants sampled in a 40× 10 kmregion of south-eastern France.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 17 / 21

Page 24: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Inference of epidemiological links at a metapopulation scale

10 km

No secondary arrows are displayed,Non-negligible proportion of long links,Environmental factors and intra-host demo-geneticfactors may play a role in the transmission of the virus.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 17 / 21

Page 25: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Overview

1 Introduction

2 Methodology

3 Results

4 Conclusion

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 18 / 21

Page 26: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Conclusion and perspectives

F SLAFEEL is adaptable to very different contexts and data fromanimal, human and plant epidemics.

F SLAFEEL is valuable in non-standard situations where classicalmechanistic assumptions may be erroneous and when sequencing andvariant calling may be noisy.

F Introducing a penalization and using more contact information lead toaccurate inferences of transmission links.

F Calibrate and assess its efficiency by applying it to simulated datagenerated with diverse sampling efforts, sequencing techniques andstochastic models of viral evolution and transmission.

F Investigate the statistical relationship between inferred transmissionlinks and environmental factors.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 19 / 21

Page 27: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

References

M Alamil, J Hughes, K Berthier, C Desbiez, G Thebaud, and S Soubeyrand.Inferring epidemiological links from deep sequencing data: a statistical learningapproach for human, animal and plant diseases.Submitted.

C Desbiez, A Schoeny, B Maisonneuve, K Berthier, et al.Molecular and biological characterization of two potyviruses infecting lettuce insoutheastern france.Plant Pathology, 66(6):970–979, 2017.

SK Gire, A Goba, KG Andersen, RS Sealfon, et al.Genomic surveillance elucidates Ebola virus origin and transmission during the 2014outbreak.Science, 345:1369–1372, 2014.

PR Murcia, J Hughes, P Battista, L Lloyd, GJ Baillie, et al.Evolution of an Eurasian Avian-like influenza virus in naıve and vaccinated pigs.PLoS Pathogens, 8(5):e1002730, 2012.

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 20 / 21

Page 28: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Thank you for your attention!

We welcome your questions,comments & suggestions!

Maryam ALAMIL (INRA-BioSP) ModStatSAP-2019 12 March 2019 21 / 21

Page 29: A statistical learning approach to infer …...Introduction Here, we present a Statistical Learning Approach For Estimating Epidemiological Links from deep sequencing data (SLAFEEL),

Impact of introducing a penalizationA

Tree inferred without penalization

Time (day)

2 4 6 8 10 12 14

113113113

115115115

104104104

116116

109109

111111

105

108108

106

112112

Host

BTree inferred without penalization

Time (day)

2 4 6 8 10 12 14 16 18 20 22 24

405405405405410410410

409409417

401401

415416

403406

412412412414414414414414

400400413413413

Host

Figure: Transmissions inferred in the naive chain (A) and vaccinated chain (B) ofSIV without including the penalisation and, therefore, without including traininghosts.