Statistical and Mathematical Perspectives on Present-day Infectious Disease Epidemiology

Eva Santermans

Promoter: Prof. Dr Niel Hens

Co-Promoter: Prof. Dr Marc Aerts

Co-Promoter: Prof. Dr Geert Molenberghs

Acknowledgements

Writer’s block...

Thanks to everyone!

Eva

Just kidding. :-) Although it just occurred to me that this is the only part of my

thesis that will be read by more than five persons, I will not let the stress get to

me. *drinks wine* So, here is my attempt to entertain everyone that is reading this

during my presentation for a solid three minutes (hang in there, reception is coming).

As per usual, I would like to start by thanking the person who had the diffi-

cult task of being my supervisor. Thank you, Niel, for your guidance these past four

years. Whilst letting me work independently, I could always count on you for input,

comments, discussions, ideas, advice (and terribly bad jokes). I feel this might be a

good opportunity to apologize for my bad sense of humor; however, there are still a

number of people I need to mention. I would like to thank my co-supervisors, Marc

and Geert, for their suggestions that helped me improve my work. Furthermore, I

am grateful to the other members of my jury for providing feedback on an earlier

version of this thesis and to all my co-authors whom I had the pleasure to collaborate

with. I would also like to say thank you to Phil and Theo for giving me the

opportunity to spend some time at the University of Nottingham. These research

visits introduced me to the field of Bayesian statistics and broadened my knowledge.

The past years would not have been the same without all my awesome col-

leagues. Cheers to the people of the ‘Koffiegroep’ who made my coffee and lunch

breaks so much more enjoyable. High five to the JOSS board. I had a lot of fun

during our regular meetings and activities, especially the bowling, which seemed to

trigger a highly amusing competitiveness among the board members (yes boyz, that’s

a reference to you ;-) ). Thank you to the colleagues in Antwerp for the interesting

discussions and presentations during our monthly meetings. Lander deserves a

special mention for working magic on my sometimes-not-that-efficient R code.

And of course, special thanks to my roomie, Robin! We were ‘matched’ at the start

of our PhD and, well, I think our neighboring offices can indicate how that turned

out (so sorry, we were a tad bit chatty sometimes). You are by far the most ‘stressy’

and ‘catty’ person I know (don’t worry, I won’t include any details), but I could

always count on you. Homies forever!

Anja and Stephanie, thanks for the regular dinners! There is nothing that cannot

be solved with food, wine or gossip, right? ;-) Of course, I also owe a big

thank you to my parents. Mum and dad, you have always supported me

during my (slightly-longer-than-average) years of study. Dad, thank you

for always pushing me so that I could get the best out of myself. Mum, thank you

for regularly reining dad in a little. :-) A big kiss as well for Stella

and Richard, for always being there for Cliff and me. Finally, a fist bump

for my boyfriend! Sjattie, thank you for tackling my ‘rare’ (cough) moments of

grumpiness Cliff-style. :-) You always manage to make me

laugh (even though your jokes are a bit silly and your gorilla impression is very

embarrassing). I love you!

Eva Santermans

Diepenbeek, 17 November 2016

List of Publications

Publications covered in this dissertation:

[1] Santermans, E., Goeyvaerts, N., Melegaro, A., Edmunds, W.J., Faes, C.,

Aerts, M., Beutels, P. and Hens, N. (2015). The social contact hypothesis under

the assumption of endemic equilibrium: Elucidating the transmission potential

of VZV in Europe. Epidemics, Volume 11, p. 14−23.

[2] Santermans, E., Ganyani, T., Faes, C., Hens, N., Plachouras, D., Quinten,

C., Robesyn, E., Sudre, B., Van Bortel, W. (2016). Spatiotemporal evolution of

Ebola virus disease at sub-national level during the 2014 West Africa epidemic.

PLoS ONE, 11(1): e0147172. doi: 10.1371/journal.pone.0147172.

[3] Santermans, E., Van Kerckhove, K., Azmon, A., Edmunds, W.J., Beutels,

P., Faes, C., Hens, N. Structural differences in mixing behaviour informing the

role of asymptomatic infection and testing symptom heritability. In revision for

Mathematical Biosciences.

[4] Goeyvaerts, N., Santermans, E., Potter, G., Van Kerckhove, K., Willem,

L., Aerts, M., Beutels, P., Hens, N. Empirical household contact networks:

challenging the random mixing assumption. In preparation.

[5] Santermans, E., O’Neill, P.D., Kypraios, T., Beutels, P., Hens, N. Bayesian

inference for the two-level mixing model incorporating empirical household con-

tact networks. In preparation.

Publications not covered in this dissertation:

[7] Hens, N., Abrams, S., Santermans, E., Theeten, H., Goeyvaerts, N., Lernout,

T., Leuridan, E., Van Kerckhove, K., Goossens, H., Van Damme, P. and Beutels,

P. (2015). Assessing the risk of measles resurgence in a highly vaccinated

population: An application to Belgium anno 2013. Eurosurveillance, 20(1), doi:

10.2807/1560-7917.ES2015.20.1.20998.

Publications not covered in this dissertation on the statistical analysis of cell trans-

plantation experiments and tumor immunology:

[1] Praet, J., Orije, J., Kara, F., Guglielmetti, C., Santermans, E., Daans, J.,

Hens, N., Verhoye, M., Berneman, Z., Ponsaerts, P. and Van der Linden, A.

(2015). Cuprizone-induced demyelination and demyelination-associated inflammation

result in different proton magnetic resonance metabolite spectra: 1H-MRS

discriminates demyelination from its associated inflammation. NMR in

Biomedicine, doi: 10.1002/nbm.3277.

[2] Praet, J., Santermans, E., Reekmans, K., de Vocht, N., Le Blon, D., Hoor-

naert, C., Daans, J., Goossens, H., Berneman, Z., Hens, N., Van der Linden, A.

and Ponsaerts, P. (2014). Histological Characterization and Quantification of

Cellular Events Following Neural and Fibroblast(-Like) Stem Cell Grafting in

Healthy and Demyelinated CNS tissue. Methods in Molecular Biology, 1213, p.

265−283.

[3] Praet, J., Santermans, E., Daans, J., Le Blon, D., Hoornaert, C., Goossens,

H., Van der Linden, A., Hens, N., Berneman, Z. and Ponsaerts, P. (2014). Early

inflammatory responses following cell grafting in the CNS trigger activation of

the sub-ventricular zone: a proposed model of sequential cellular events. Cell

Transplantation, doi: 10.3727/096368914X682800.

[4] Le Blon, D., Hoornaert, C., Daans, J., Santermans, E., Hens, N., Goossens,

H., Berneman, Z. and Ponsaerts, P. (2014). Distinct spatial distribution of

microglia and macrophages following mesenchymal stem cell implantation in

mouse brain. Immunology and Cell Biology, 92(8), p. 650−658.

[5] Costa, R., Bergwerf, I., Santermans, E., De Vocht, N., Praet, J., Daans, J., Le

Blon, D., Hoornaert, C., Reekmans, K., Hens, N., Goossens, H., Berneman, Z.,

Parolini, O., Alviano, F. and Ponsaerts, P. (2015). Distinct in vitro properties of

embryonic and extra-embryonic fibroblast-like cells are reflected in their in vivo

behaviour following grafting in the adult mouse brain. Cell Transplantation,

Volume 24, p. 223−233.

[6] Guglielmetti, C., Le Blon, D., Santermans, E., Salas-Perdomo, A., Daans, J.,

De Vocht, N., Shah, D., Hoornaert, C., Praet, J., Peerlings, J., Kara, F., Bigot,

C., Mai, Z., Goossens, H., Hens, N., Hendrix, S., Verhoye, M., Planas, A.M.,

Berneman, Z., van der Linden, A., Ponsaerts, P. (2016). Interleukin-13 immune

gene therapy prevents CNS inflammation and demyelination via alternative

activation of microglia and macrophages. Glia, doi: 10.1002/glia.23053.

[7] Le Blon, D., Guglielmetti, C., Hoornaert, C., Dooley, D., Daans, J., Lemmens,

E., De Vocht, N., Reekmans, K., Santermans, E., Hens, N., Goossens, H.,

Verhoye, M., Van der Linden, A., Berneman, Z., Hendrix, S., Ponsaerts, P. In-

tracerebral transplantation of interleukin 13-producing mesenchymal stem cells

limits microgliosis and demyelination in the cuprizone mouse model. Submitted

to Journal of Neuroinflammation.

[8] Marcq, E., Vasiliki, S., De Waele, J., van Audenaerde, J., Zwaenepoel, K.,

Santermans, E., Hens, N., Pauwels, P., van Meerbeeck, J.P., Smits, E.L.J.

Prognostic and predictive aspects of the tumor immune microenvironment and

immune checkpoints in malignant pleural mesothelioma. Submitted to

OncoImmunology.

Contents

List of Publications and Reports v

Table of Contents ix

List of Abbreviations xiii

List of Figures xv

List of Tables xxi

1 Introduction 1

1.1 Infectious Disease Epidemiology . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Overview of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3.1 Infectious Disease Models . . . . . . . . . . . . . . . . . . . . . 6

1.3.2 Epidemiological Parameters . . . . . . . . . . . . . . . . . . . . 13

1.3.3 Network Modeling . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.3.4 Statistical Inference . . . . . . . . . . . . . . . . . . . . . . . . 19

2 Data Sources 27

2.1 Disease Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.1.1 Varicella-zoster Virus . . . . . . . . . . . . . . . . . . . . . . . 27

2.1.2 A(H1N1)v2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.1.3 Pertussis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.1.4 Ebola Virus Disease . . . . . . . . . . . . . . . . . . . . . . . . 33

2.2 Social Contact Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.2.1 POLYMOD Contact Data . . . . . . . . . . . . . . . . . . . . . 36

2.2.2 Contact Behavior during Illness . . . . . . . . . . . . . . . . . . 37

2.2.3 Estimation of Contact Rates . . . . . . . . . . . . . . . . . . . 37

2.2.4 Contact Patterns within Households . . . . . . . . . . . . . . . 40

3 The Social Contact Hypothesis Under Endemic Equilibrium 45

3.1 Estimating the Basic and Effective Reproduction Number . . . . . . . 46

3.1.1 Mass Action Principle and Mixing Assumptions . . . . . . . . . 46

3.1.2 Estimation Procedure . . . . . . . . . . . . . . . . . . . . . . . 48

3.1.3 Model Eligibility and Indeterminacy . . . . . . . . . . . . . . . 49

3.1.4 Application to the Data . . . . . . . . . . . . . . . . . . . . . . 50

3.2 Elucidating Potential Risk Factors . . . . . . . . . . . . . . . . . . . . 54

3.2.1 Maximal Information Coefficient . . . . . . . . . . . . . . . . . 54

3.2.2 Random Forest Approach . . . . . . . . . . . . . . . . . . . . . 58

3.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.3 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.3.1 Contact Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.3.2 Risk Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.3.3 Perturbations Demographic and Endemic Equilibrium . . . . . 62

3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4 Differences in Mixing Behaviour and Symptom Heritability 69

4.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.1.1 Transmission Models . . . . . . . . . . . . . . . . . . . . . . . . 71

4.1.2 Age Structure and Social Contacts . . . . . . . . . . . . . . . . 73


4.2 Application to the Data . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.2.1 Exploratory Analyses . . . . . . . . . . . . . . . . . . . . . . . 78

4.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

4.3 Impact of Home Isolation . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5 Empirical Household Contact Networks 89

5.1 Household Contact Survey . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.2 ERGMs for Within-household Physical Contact Networks . . . . . . . 93

5.3 Epidemic Spread in a Community of Households . . . . . . . . . . . . 98

5.3.1 Setting 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.3.2 Setting 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.3.3 Other Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

6 Two-Level Mixing Model Incorporating Household Networks 107

6.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

6.1.1 Model Description . . . . . . . . . . . . . . . . . . . . . . . . . 108

6.1.2 Likelihood and Posterior Density . . . . . . . . . . . . . . . . . 111

6.1.3 MCMC Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 112

6.2 Preliminary Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

7 Spatiotemporal Evolution of EVD at Sub-national Level 117

7.1 Growth Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118


7.1.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

7.2 Compartmental Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 124



7.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

7.3 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

7.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

8 Discussion and Further Research 139

Bibliography 143

Acknowledgements 159

A Appendix - Chapter 5 161

A.1 Household Contact Survey . . . . . . . . . . . . . . . . . . . . . . . . . 161

A.2 Modeling Within-household Physical Contact Networks . . . . . . . . 162

A.3 Epidemic Simulation Model . . . . . . . . . . . . . . . . . . . . . . . . 166

Samenvatting 167

List of Abbreviations

AIC Akaike information criterion

AM Adaptive Metropolis

AMM Adaptive-mixture Metropolis

CAS-model Continuous age-structured model

CI Confidence interval

DIC Deviance information criterion

ELISA Enzyme-linked immunosorbent assay

ERGM Exponential random graph model

EVD Ebola virus disease

EXP Exponential

FOI Force of infection

GP General practitioner

HIV Human immunodeficiency virus

IgM Immunoglobulin M

IgG Immunoglobulin G

ILI Influenza-like illness

INLA Integrated nested Laplace approximation

LIN Linear

MAP Mass-action principle

MH Metropolis-Hastings

MIC Maximal information coefficient

ML Maximum likelihood (estimation)

MSE Mean squared error

NIP National immunization program

ODE Ordinary differential equation

OOB Out-of-bag

PCR Polymerase chain reaction

PDE Partial differential equation

RAS-model Realistic age-structured model

RWM Random-walk Metropolis

VE Vaccine effectiveness

VZV Varicella-zoster virus

WAIFW Who Acquires Infection From Whom

List of Figures

1.1 Left panel: 3-D computer enhanced electron microscope photo of the

Varicella-zoster virus, content provider: ShutterStock, photo credit:

Michael Taylor. Right panel: 3-D graphical representation of the struc-

ture of a generic influenza virus, content provider: CDC . . . . . . . . 2

1.2 Infectious disease stages, adapted from the book “Modelling infectious

diseases” by Keeling and Rohani, 2007 . . . . . . . . . . . . . . . . . . 3

1.3 Flow diagram for the deterministic SIR model. . . . . . . . . . . . . . 7

1.4 Flow diagram for the age-structured SIR model with two age groups. . 8

1.5 Illustration of the (basic) reproduction number R0 (left) and R (right):

one infected individual (black circle) is introduced into a fully suscep-

tible population and infects on average R0 = 3 other individuals (grey

circles, left panel), or he/she is introduced in a partly immunized pop-

ulation (dotted circles) infecting only R = 1 individual (grey circles,

right panel) (Goeyvaerts, 2011). . . . . . . . . . . . . . . . . . . . . . . 15

2.1 Observed age-specific VZV seroprevalence for Belgium, England and

Wales, Finland, Germany, Ireland, Israel and Italy. The size of the

dots is proportional to the sample size per age category. . . . . . . . . 29

2.2 Observed age-specific VZV seroprevalence for Luxembourg, the Nether-

lands, Poland, Slovakia and Spain. The size of the dots is proportional

to the sample size per age category. . . . . . . . . . . . . . . . . . . . . 30

2.3 Weekly number of ILI cases in five age categories during the early part

of the A/H1N1pdm influenza epidemic in 2009 in England and Wales. 31

2.4 Compositions of the households included in the pertussis study (left)

and symptom onset times in days relative to the symptom onset time

of the primary case of the household (right). . . . . . . . . . . . . . . . 33

2.5 Transmission electron micrograph of an Ebola virus virion. . . . . . . 34

2.6 Contour plot of the estimated Belgian contact rates derived from the

bivariate smoothing approach applied to the POLYMOD survey data. 39

2.7 Age-specific contact rates for asymptomatic individuals (left) and

symptomatic individuals (right) based on the age classes of the in-

cidence data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.8 Observed within-household physical contact networks by household

size. Nodes represent household members and edges represent phys-

ical contacts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

2.9 Barplots of contact intensity distributions (duration, frequency and

touching) and contact location distributions for all contacts recorded

with non-household (left bar) and household members (right bar). . . 43

3.1 Estimated basic and effective reproduction numbers with 95% boot-

strap percentile confidence intervals for constant (black), log-linear

(gray) and extended log-linear (light gray) proportionality factor. For

each country, sizes of the dots are proportional to Akaike weights, hence

larger dots correspond to smaller AIC values. The dotted horizontal

line indicates the single eligible value for R under endemic equilibrium,

which is one. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.2 Profile likelihood estimates of R0 (left axis) and R (right axis) as a

function of γ2, the parameter related to infectiousness, for Finland and

Luxembourg. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.3 Profile likelihood estimates of R (dots) with interpolated 95% boot-

strap percentile confidence intervals (dashed lines) as a function of γ2,

the parameter related to infectiousness, for Finland and Luxembourg.

The vertical dotted line indicates the value of γ2 for which the upper

confidence limit of R equals 1 (horizontal dotted line). . . . . . . . . . 53

3.4 Observed age-specific VZV seroprevalence (dots) and the profile esti-

mated from the final model selected for each country (solid line). The

corresponding force of infection estimates are displayed by the lower

solid line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

3.5 Observed age-specific VZV seroprevalence (dots) and the profile esti-

mated from the final model selected for each country (solid line). The

corresponding force of infection estimates are displayed by the lower

solid line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.1 Schematic diagram of the non-preferential transmission model. Super-

scripts indicate presence (s) or absence (a) of symptoms. . . . . . . . . 71

4.2 Schematic diagram of the preferential transmission model. Superscripts

indicate clinical status of the infected individual: symptomatic (s) or

asymptomatic (a). Subscripts indicate whether the infector was symp-

tomatic (s) or asymptomatic (a). . . . . . . . . . . . . . . . . . . . . . 72

4.3 Prior and posterior distributions for the proportion of cases that de-

velop symptoms (φ), the proportionality factor for asymptomatic in-

dividuals (qa), the relative infectiousness of symptomatic cases versus

asymptomatic cases (qr) and the reporting rates (ρi, i = 1, 2, 3, 5). . . 80

4.4 Scatter plot of the proportion of cases that develop symptoms (φ),

the proportionality factor for asymptomatic individuals (qa) and the

infectiousness ratio (qr). . . . . . . . . . . . . . . . . . . . . . . . . . . 81

4.5 Number of symptomatic (full line) and asymptomatic (dotted line)

cases over time for the five age categories assuming a 20% reporting

rate in the 45−65 age class for the non-preferential model. . . . . . . 81

4.6 Prior and posterior distributions for the proportion of individuals in-

fected by a symptomatic case that develop symptoms (φs), the pro-

portion of individuals infected by an asymptomatic case that remain

asymptomatic (φa), the proportionality factor for asymptomatic indi-

viduals (qa), the relative infectiousness of symptomatic cases versus

asymptomatic cases (qr) and the reporting rates (ρi, i = 1, 2, 3, 5). . . 82

4.7 Scatter plot of the proportion of individuals infected by a symptomatic

case that develop symptoms (φs), the proportion of individuals in-

fected by an asymptomatic case that remain asymptomatic (φa), the

proportionality factor for asymptomatic individuals (qa) and the infec-

tiousness ratio (qr). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4.8 Number of symptomatic (full line) and asymptomatic (dotted line)

cases over time for the five age categories assuming a 20% reporting

rate in the 45−65 age class for the preferential model. . . . . . . . . . 83

4.9 Histogram of MCMC samples for φs − (1 − φa), with φs the proportion

of individuals infected by a symptomatic case that develop symptoms

and φa the proportion of individuals infected by an asymptomatic case

that remain asymptomatic in the preferential model. . . . . . . . . . . 84

4.10 Observed (grey bars) and estimated (connected dots) reported weekly

incidence for the five age categories. Full line and filled dots are the estimated

incidence for the non-preferential model, dotted line and open

dots are the estimates for the preferential model. . . . . . . . . . . . . 85

4.11 Proportion of cases plotted against the proportion of symptomatic in-

dividuals staying home immediately after symptom onset. Left panel:

reduction in total number of cases for the non-preferential model with

95% confidence intervals. Right panel: reduction in the number of

total, symptomatic and asymptomatic cases for the preferential model. 86

5.1 Proportion of complete networks (left) and mean network density

(right): observed values (blue stars with size proportional to the sam-

ple size) and values simulated from the ERGM for within-household

physical contact networks on a weekday. . . . . . . . . . . . . . . . . 96

5.2 Proportion of complete networks (left) and mean network density

(right): observed values (blue stars with size proportional to the sam-

ple size) and values simulated from the ERGM for within-household

physical contact networks on a weekend day. . . . . . . . . . . . . . . 97

5.3 Proportion of observed versus potential triangles: observed values (blue

stars with size proportional to the sample size) and values simulated

from the ERGM for within-household physical contact networks on a

weekday (left) and on a weekend day (right). . . . . . . . . . . . . . . 97

5.4 Mean infection incidence over time at the individual (left) and house-

hold level (right) for 1000 simulations of a stochastic SIR epidemic

process on a 2-level households model assuming random (black) and

empirical-based (red) mixing within households. . . . . . . . . . . . . . 100

5.5 Household attack rates by household size for 1000 simulations of a

stochastic SIR epidemic process on a 2-level households model assuming

random and empirical-based mixing within households. . . . . . . . . . 100

5.6 Mean infection incidence over time at the individual (left) and house-

hold level (right) for 1000 simulations of a stochastic SIR epidemic

process on a 2-level households model assuming random (black) and

empirical-based (red) mixing within households including a density

scaling factor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5.7 Household attack rates by household size for 1000 simulations of a

stochastic SIR epidemic process on a 2-level households model assum-

ing random and empirical-based mixing within households including a

density scaling factor. . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6.1 Trace plot of the MCMC samples for the within-household transmis-

sion probability (βh), the community risk of infection (βc), the mean

duration of the incubation period (µ), the standard deviation of the

incubation period (σ) and the number of edges and triangles in the

household contact network G. . . . . . . . . . . . . . . . . . . . . . . . 113

6.2 Prior and posterior distributions for the within-household transmission

probability (βh), the community risk of infection (βc), the mean du-

ration of the incubation period (µ) and the standard deviation of the

incubation period (σ). Dotted lines are prior distributions. . . . . . . . 114

7.1 Estimated weekly growth rates per district and implemented interven-

tion measures for Guinea, Sierra Leone and Liberia, 2014-2015. Red

colours indicate an increase in number of weekly cases, whereas blue

colours indicate a decline. Periods for which no reported cases are

available are shown in white. A light dot indicates that a triage, hold-

ing centre or CCC is in place and a dark dot indicates that an ETU or

ETU and CCC are in place. . . . . . . . . . . . . . . . . . . . . . . . . 120

7.2 Estimated growth rate per district and implemented intervention measures

during weeks 21 and 40 of 2014 and weeks 8 and 26 of 2015. ‘1’:

triage, holding centre or CCC is in place; ‘2’: ETU or ETU plus CCC

is in place. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7.3 Cumulative cases per district and implemented intervention measures.

A light dot indicates that a triage, holding centre or CCC is in place

and a dark dot indicates that an ETU or ETU and CCC are in place. 122

7.4 Cumulative deaths per district and implemented intervention measures.

A light dot indicates that a triage, holding centre or CCC is in place

and a dark dot indicates that an ETU or ETU and CCC are in place. 123

7.5 Flow diagram for the SEIR model with distinction between cases that

survive and fatal cases. . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

7.6 Schematic representation of reporting of case notifications. . . . . . . . 125

7.7 Observed (black) and estimated (blue) number of new cases (top left),

new deaths (top right), cumulative cases (bottom left) and cumula-

tive deaths (bottom right) per district. Dashed lines are 95% credible

intervals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

7.8 Three-week prediction of new cases (left) and deaths (right) for Western

Area Urban at 24 October, 14 November, 5 December and 26 December

2014 (top to bottom). Light blue regions are the predicted time periods

and estimation is based on all data before that time point. . . . . . . . 129

7.9 Estimated reproduction number per district with 95% posterior inter-

vals. The threshold value of one is indicated by a red horizontal line. 130

A.1 Barplot of within-household contact duration distributions by type of

relationship, including both physical and non-physical contacts. . . . . 161

A.2 Interpretation of mixing and age effect statistics of the ERGM: ratio

of the odds of physical contact occurring between two relatives versus

a pair of siblings, as a function of the sum of the siblings’ ages. Left

panel: weekday, right panel: weekend day. . . . . . . . . . . . . . . . . 162

A.3 Final fractions for 1000 simulations of a stochastic SIR epidemic process

on a 2-level households model assuming random and empirical-based

mixing within households. Small outbreaks are excluded from display. 166

A.4 Final fractions for 1000 simulations of a stochastic SIR epidemic process

on a 2-level households model assuming random and empirical-based

mixing within households. Small outbreaks are excluded from display. 166

List of Tables

2.1 Overview of the VZV serological data and demographic parameters. . 28

3.1 Estimates of the basic and effective reproduction numbers and trans-

mission parameters (γ0, γ1, γ2) with 95% bootstrap percentile confi-

dence intervals and corresponding AIC values for constant (CP), log-

linear (LP) and extended log-linear (EP) proportionality assumptions.

Estimates for EP are obtained using a profile likelihood-based assess-

ment of model eligibility. Final models are indicated in bold. . . . . . 52

3.2 Selected set of potential risk factors for varicella. Data sources and

missingness are indicated. Reference years were chosen to be as close

to the year of serological data collection as possible, conditional on

availability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.3 Ten factors with the largest MIC value of association with R0, esti-

mated from the final model selected for each country, and correspond-

ing Spearman correlation coefficients ρS . . . . . . . . . . . . . . . . . . 59

3.4 Ten best scoring factors obtained by a random forest analysis of R0,

estimated from the final selected model for each country, and corre-

sponding Spearman correlation coefficients ρS . . . . . . . . . . . . . . 60

3.5 Pairs of potential risk factors with the largest absolute Spearman cor-

relation coefficient. High scoring factors according to MIC and random

forest are indicated in bold. . . . . . . . . . . . . . . . . . . . . . . . . 61

3.6 Comparison of the average household size at time of serological data

collection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.7 Estimated basic and effective reproduction numbers with 95% bootstrap percentile confidence intervals and corresponding AIC values for the log-linear model based on contact data minimizing AIC.

3.8 Ten factors with the largest MIC value of association with R0, estimated from the log-linear model using the minimal AIC contact data, and corresponding Spearman correlation coefficient, ρS.

3.9 Ten best scoring factors obtained by a random forest analysis of R0, estimated from the log-linear model using the minimal AIC contact data, and corresponding Spearman correlation coefficient, ρS.

3.10 Estimates of the basic and effective reproduction number when implementing a vaccination strategy or changing the birth rate.

3.11 Ranges of estimates of the basic reproduction numbers obtained by Santermans et al., Nardone et al. and Melegaro et al. Nardone et al. used a WAIFW matrix approach for three age groups, whereas Melegaro et al. used the social contact hypothesis for different stratifications of POLYMOD contact data.

4.1 An overview of parameters of pandemic influenza A/H1N1 2009 in humans obtained from a literature review (Dorjee et al., 2013). These values were either estimated from empirical data of experimental or observational studies (Est.); or referenced for modeling (Ref.).

4.2 Prior distributions for the parameters in the preferential and non-preferential model.

4.3 Posterior median, 95% posterior intervals and DIC value for the non-preferential model for different values of the reporting rate ρ.

4.4 Posterior median, 95% posterior intervals and DIC value for the preferential model for different values of the reporting rate ρ.

4.5 Posterior median, 95% posterior credible intervals and DIC value for the non-preferential and preferential model.

5.1 Proportion of complete networks and mean network density, stratified by household size, for the observed within-household physical contact networks, comparing week and weekend days (top) and regular and holiday periods (bottom).


5.2 Network statistics considered in the ERGMs, where an edge is defined as a physical contact between two individuals. Reference categories are child-child mixing, boy-girl mixing, and mixing within households of size 4.

5.3 ERGM for within-household physical contact networks on week- and weekend days: parameter estimates and Wald test p-values, log-likelihood and AIC.

6.1 Estimation of vaccine effectiveness for 1 to 14-year-olds per birth cohort according to the NIP report (National Institute for Public Health and the Environment, 2013).

6.2 Posterior median and 95% posterior credible intervals for the model parameters.

7.1 Prior distributions.

7.2 Parameter estimates with 95% posterior credible intervals.

7.3 Parameter estimates sensitivity analysis. Fixed values are indicated in bold, asterisks indicate model differences compared to the final model 1.

7.4 Parameter estimates sensitivity analysis. Fixed values are indicated in bold, asterisks indicate model differences compared to the final model 1.

A.1 Observed physical contact networks: average degree and various measures of within-household clustering, stratified by household size.

A.2 Observed proportion of complete networks and mean network density, stratified by household size, with median and 95% percentile range obtained from 1000 networks simulated from the ERGM for within-household physical contact networks on a weekday.

A.3 Observed proportion of observed versus potential triangles, stratified by household size, with median and 95% percentile range obtained from 1000 networks simulated from the ERGM for within-household physical contact networks on a weekday.

A.4 Observed proportion of complete networks and mean network density, stratified by household size, with median and 95% percentile range obtained from 1000 networks simulated from the ERGM for within-household physical contact networks on a weekend day.


A.5 Observed proportion of observed versus potential triangles, stratified by household size, with median and 95% percentile range obtained from 1000 networks simulated from the ERGM for within-household physical contact networks on a weekend day.

A.6 Literature-based estimates of household and community transmission parameters obtained from household final size or symptom onset data: q_HH = P(escape infection from infected HH member per day) assuming an infectious period of 4 days, the household secondary attack rate (SAR), i.e. the probability of being infected by another household member during the course of the latter’s infectious period, and q_com = P(escape infection from community during epidemic period) = 1 − CPI, where CPI is the community probability of infection. † Household size defined as the number of susceptibles in a household prior to the epidemic. * Same age definitions for children and adults as in Longini et al. (1988), distinguishing between susceptibles and infected.

Chapter 1. Introduction

1.1 Infectious Disease Epidemiology

Infectious diseases have a huge impact on global health, being responsible for millions

of deaths each year, especially in the developing world. The global HIV and tuberculosis epidemics, the appearance of new pathogens and the resurgence of old ones, often in new and drug-resistant forms, all bring the need for new and improved methods in infectious disease epidemiology. Infectious disease epidemiology shares

the same conceptual framework as ‘non-infectious disease’ epidemiology: it concerns

the study of the causes and distribution of infectious diseases in populations and aims

to control them. There are, however, some concepts and terminology specifically

related to infectious diseases.

Infectious diseases are caused by pathogenic microorganisms, such as bacteria,

viruses, parasites or fungi; the diseases can be spread, directly or indirectly, from

one person to another (World Health Organization, 2016b). Direct transmission may

be through direct contact, e.g. touching, biting, kissing or sexual intercourse, by

droplet contact or through airborne transmission. Droplet contact occurs when an

individual sneezes or coughs and the droplets spray onto the eyes, nose or mouth

of another individual. This is usually limited to short distances. The transmission

route is called airborne when viruses travel on small respiratory droplets that

may become aerosolized after sneezing, coughing or talking. These particles can

remain in the air for long periods of time and travel over considerable distances.

Examples of diseases that are transmitted via airborne or droplet contact are measles, pertussis, (pandemic) influenza (Figure 1.1), etc. Indirect transmission occurs when

the infectious organism is transferred from a source through objects (vehicle-borne)

or insects (vector-borne). Malaria and dengue fever are examples of vector-borne

diseases. The type of transmission route depends mainly on the characteristics of

the causative agent and those of the host. Some microorganisms are restricted to

a limited number of transmission routes, whereas others can follow many different

pathways to infect their hosts. In this thesis, we will focus on infectious diseases for

which the main transmission route is from human to human via non-sexual social

contacts, such as airborne transmission, droplet contact or physical contact.

Figure 1.1: Left panel: 3-D computer enhanced electron microscope photo of the

Varicella-zoster virus, content provider: ShutterStock, photo credit: Michael Taylor. Right

panel: 3-D graphical representation of the structure of a generic influenza virus, content

provider: CDC

When an individual is infected with an infectious disease, an immune reaction is initiated with the production of antibodies specific to the pathogen (humoral immunity) and the activation of cells that aim to destroy the pathogen (cellular immunity). There is a period of variable length between the moment the host is infected and the moment the host becomes infectious, hence able to transmit the pathogen. Furthermore, the host can develop symptoms after infection, although the symptomatic period need not coincide with the period of infectiousness. Individuals that do not develop symptoms at all are referred to as asymptomatic cases. Eventually, the host may no longer be infectious and recover. Thereafter, the host can be immune to the disease (Figure 1.2).

Figure 1.2: Infectious disease stages, adapted from the book “Modelling infectious diseases” by Keeling and Rohani, 2007

Cell-mediated immunity is driven by T-cells that are able to detect viral antigens on a cell’s surface and destroy the cell if necessary. Humoral (or antibody-mediated) immunity, on the other hand, is related to the production of virus-specific antibodies that induce long-term protection. There are two main types of antibodies: Immunoglobulin M (IgM) and Immunoglobulin G (IgG). Another antibody isotype is, for example, IgA, which is found in mucosal areas (see e.g. Woof and Burton (2004)). IgM antibodies are produced quickly after the onset of infection and last for only a short period of time. The production of IgG antibodies takes longer, but they can persist for years after the infection and provide immunity for years, sometimes lifelong.

This type of antibodies can be transferred from a pregnant woman to her fetus,

granting immunity to a pathogen until the infant’s immune system has matured.

Vaccination is based on the introduction of an antigen from a pathogen to stimulate

the immune system and to develop immunity against that specific pathogen. Hence,

in the absence of immunization, the presence of IgG antibodies in blood serum

indicates past infection. In Chapter 3, cross-sectional sets of serum samples are used.

Testing these samples for IgG titer values gives rise to serological data and provides

information on the immunity status of the individuals. For a description of these data,

we refer to Section 2.1.1.

In response to emerging threats and to improve control of endemic diseases,


the field of infectious disease modeling has grown substantially in the last decades.

A distinction between statistical and mathematical models is to be made. Statistical

models study relations between different variables based on data and make inferences

based on these relations. Mathematical models, on the other hand, describe a

system through mathematical equations and study how the system changes from

one state to the next and how variables depend on the value or state of other variables.

A key parameter in infectious disease modeling is the probability of contact

between an infectious source and a susceptible individual. For infections transmitted

from person to person various assumptions are required to simplify the range of

human relations into tractable mathematical models. Classical early work in epidemic modeling usually assumed homogeneous mixing in a community of individuals,

each having the same susceptibility to disease and the same ability to transmit

disease. Such assumptions rarely reflect reality, although they are often sufficient

for modeling purposes, and during the last decades, considerable efforts have been

made towards modeling heterogeneity in the acquisition of infection and its effect on

disease propagation. Anderson and May (1991) were the first to introduce a method

of imposing certain patterns on age-dependent mixing rates. The effect of imposing

these mixing patterns was studied by Greenhalgh and Dietz (1994) and Farrington

et al. (2001) among others. Several extensions of the traditional approach by

Anderson and May (1991) have been proposed including time-varying transmission

rates (Whitaker and Farrington, 2004) and continuous parametric contact surfaces

(Farrington and Whitaker, 2005). Wallinga et al. (2006) introduced the use of social

contact surveys to inform transmission rates. They assume that transmission rates

for infections transmitted through non-sexual social contacts are proportional to

contact rates estimable from contact surveys. Social contact data form an important

part of this dissertation, and are described in Section 2.2. Further, models have been

developed that attempt to represent the underlying structure of contact patterns by

partitioning the population into contact structures. Examples of epidemic models

that incorporate structured populations include independent-household models (see

Longini and Koopman (1982); Becker and Dietz (1995); Becker and Hall (1996)),

models with two levels of mixing (see Ball et al. (1997); Ball and Lyne (2001); Demiris

and O’Neill (2005a)), random network models (e.g. Anderson (1999); Britton and

O’Neill (2002)), and social cluster models (e.g. Schinazi (2002)).


1.2 Overview of the Thesis

The social contact hypothesis, introduced by Wallinga et al. (2006), can be extended

by incorporating age-dependent susceptibility to infection, which improved the model fit for data on varicella-zoster virus (VZV) in Belgium (Goeyvaerts

et al., 2010). In Chapter 3, we look at data from 11 other European countries, besides

Belgium, and evaluate how this age-dependent susceptibility affects the fit to the

data. Furthermore, we introduce a method to account for age-specific heterogeneity

related to infectiousness by relying on the effective reproduction number as model

eligibility criterion.

In Chapter 4, we use social contact data that provide insight into the impact of

illness on contact patterns. We show that this type of data can inform inference

on parameters related to asymptomatic infection using data on symptomatic cases

only. This will be illustrated using data on influenza-like illness. Additionally, we

investigate whether the probability of developing symptoms depends on the clinical

state of the person that transmitted the infection.

Chapters 5 and 6 look into contact heterogeneity within households. Data

from the first social contact survey designed to study contact networks within

households is described in Chapter 5. In this chapter, we also develop a network

model to infer the factors that drive contacts between household members. This

network model is then used in Chapter 6 to inform within-household networks in a

2-level mixing model. Inference for this model is illustrated using data on pertussis

in the Netherlands.

In Chapter 7 we develop a two-stage model for the Ebola outbreak of 2014.

This model takes into account the spatial and temporal heterogeneity of the outbreak

and is based on publicly-available district-level data on the number of cases and

deaths.

Finally, in Chapter 8 we summarize our main conclusions and discuss topics

open for further research.


1.3 Basic Concepts

In this section, we will introduce some basic terminology and fundamental concepts

used in the field of mathematical epidemiology. Infectious disease models are introduced in Section 1.3.1. In Section 1.3.2, we describe some of the most important epidemiological parameters. Section 1.3.3 provides an introduction to the basic concepts of networks and exponential random graph models. Finally, an overview of inference methods relevant for the analyses in this thesis is given in Section 1.3.4.

1.3.1 Infectious Disease Models

Deterministic models that describe disease dynamics by partitioning the population into different disease states date back to the work by Bernoulli (1760). He developed the

first model to demonstrate the benefits of immunizing individuals against smallpox

in France. Others followed, but the most important contributions to these models

were made after the 1900s when the interest in infectious disease models increased

substantially (Kermack and McKendrick, 1927; Bailey, 1975; Dietz, 1975; Anderson

and May, 1991). Deterministic transmission models are very insightful for studying disease dynamics in large populations; however, they are less suited for small or isolated populations. For this purpose, stochastic models were defined. These models make up

the second important branch in infectious disease modeling and are usually defined at

the individual level. For an elaborate discussion on stochastic modeling, we refer to

Daley and Gani (1999); Andersson and Britton (2000) and Diekmann et al. (2013). In

this thesis, we make use of deterministic models in Chapters 3, 4 and 7. To describe

disease transmission in relatively small populations of households in Chapters 5 and

6, a stochastic chain-binomial model is used.

1.3.1.1 Deterministic SIR Model

One of the simplest compartmental models is the so-called Susceptible-Infected-Recovered (SIR) model. This model describes the spread of infections conferring

lifelong immunity. It is depicted as a flow diagram in Figure 1.3.

Figure 1.3: Flow diagram for the deterministic SIR model.

The SIR model assumes that individuals are born susceptible (S) to infection. Then, as time progresses, individuals of age a become infected and move to the infectious class I at an age- and time-dependent rate λ(a, t), the so-called ‘force of infection’. After this stage, individuals are removed and progress to the R compartment at rate γ(a, t), in which they stay until they die. These individuals can no longer transmit the infection to other individuals and are, depending on the disease under consideration, recovered, immunized, isolated or dead. Furthermore, individuals in each state are subject to natural mortality µ. Infectious individuals may experience disease-related mortality α, and are thus subject to mortality at rate η(a, t) = µ(a, t) + α(a, t). The numbers of individuals in each compartment are denoted by S(a, t), I(a, t) and R(a, t).

The model can be expressed by the following set of partial differential equations

(PDEs) (Kermack and McKendrick, 1927):

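$$
\begin{aligned}
\frac{\partial S(a,t)}{\partial a} + \frac{\partial S(a,t)}{\partial t} &= -[\lambda(a,t) + \mu(a,t)]\,S(a,t),\\
\frac{\partial I(a,t)}{\partial a} + \frac{\partial I(a,t)}{\partial t} &= \lambda(a,t)\,S(a,t) - [\gamma(a,t) + \eta(a,t)]\,I(a,t),\\
\frac{\partial R(a,t)}{\partial a} + \frac{\partial R(a,t)}{\partial t} &= \gamma(a,t)\,I(a,t) - \mu(a,t)\,R(a,t),
\end{aligned}
\tag{1.1}
$$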

with boundary conditions S(0, t) = B(t), the number of births in the population at

time t, and I(0, t) = R(0, t) = 0 because of the assumption that all individuals are

born susceptible to infection. The total number of individuals of age a at time t is

defined as N(a, t) = S(a, t) + I(a, t) +R(a, t).

1.3.1.2 Age-structured Model

Solving the set of PDEs in (1.1) is not straightforward, however several simplifying

assumptions can be made to facilitate mathematical derivations (see e.g. Hens

et al. (2012)). One way of doing so, is by considering an age-structured model.

In such a model, the age dimension is divided into a finite number of age groups

that interact with each other. An illustration for two age groups is shown in Figure 1.4.


Figure 1.4: Flow diagram for the age-structured SIR model with two age groups.

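When assuming K age groups, the system of ODEs for the first age group is given by:

$$
\begin{aligned}
\frac{dS_1(t)}{dt} &= B(t) - [\lambda_1(t) + \mu_1(t) + \delta_1]\,S_1(t),\\
\frac{dI_1(t)}{dt} &= \lambda_1(t)\,S_1(t) - [\gamma_1(t) + \eta_1(t) + \delta_1]\,I_1(t),\\
\frac{dR_1(t)}{dt} &= \gamma_1(t)\,I_1(t) - [\mu_1(t) + \delta_1]\,R_1(t),
\end{aligned}
\tag{1.2}
$$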

where δ1 is the rate at which individuals move to the second age group. For the other

age groups, the system is similar, but without births into the susceptible class and

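with flows $\delta_{i-1}$ from the previous age groups:

$$
\begin{aligned}
\frac{dS_i(t)}{dt} &= \delta_{i-1}S_{i-1}(t) - [\lambda_i(t) + \mu_i(t) + \delta_i]\,S_i(t),\\
\frac{dI_i(t)}{dt} &= \delta_{i-1}I_{i-1}(t) + \lambda_i(t)\,S_i(t) - [\gamma_i(t) + \eta_i(t) + \delta_i]\,I_i(t),\\
\frac{dR_i(t)}{dt} &= \delta_{i-1}R_{i-1}(t) + \gamma_i(t)\,I_i(t) - [\mu_i(t) + \delta_i]\,R_i(t),
\end{aligned}
\tag{1.3}
$$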

for i = 2, ...,K. Since we consider continuous transitions from one age group to the

next (via δi), this model is called the continuous age-structured model (CAS-model).

The disadvantage of the CAS-model is that people can transition instantaneously

to the next age group. To overcome this disadvantage, we consider the realistic

age-structured model (RAS-model). In this model, individuals move to the next age

group after exactly 1 year (when assuming age groups of 1 year). The RAS-model

consists of a two-step iteration:

Step 1: Solve the following set of ODEs:


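$$
\begin{aligned}
\frac{dS_i(t)}{dt} &= -[\lambda_i(t) + \mu_i(t)]\,S_i(t),\\
\frac{dI_i(t)}{dt} &= \lambda_i(t)\,S_i(t) - [\gamma_i(t) + \eta_i(t)]\,I_i(t),\\
\frac{dR_i(t)}{dt} &= \gamma_i(t)\,I_i(t) - \mu_i(t)\,R_i(t),
\end{aligned}
\tag{1.4}
$$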

with initial conditions $\{S_i(t_0), I_i(t_0), R_i(t_0)\}$ to obtain $\{S_i(t+1), I_i(t+1), R_i(t+1)\}$, $i = 1, \dots, K$.

Step 2: Shift individuals to the next age class: $\{S_i(t+1), I_i(t+1), R_i(t+1)\} \rightarrow \{S_{i+1}(t+1), I_{i+1}(t+1), R_{i+1}(t+1)\}$, $i = 1, \dots, K-1$, and all newborns $B(t)$ are assumed susceptible: $\{S_0(t+1), I_0(t+1), R_0(t+1)\} = \{B(t), 0, 0\}$.

This process is iterated during the time period of interest. We used the RAS-

model in Chapter 3 to simulate the effect of demographic change and vaccination.
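The two-step iteration of the RAS-model can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in Chapter 3: the function name `ras_year`, the forward-Euler integrator, a force of infection that is held fixed within each age class over the year, the absence of disease-related mortality (η = µ), and the removal of the oldest class at the yearly shift are all choices made here for brevity.

```python
def ras_year(S, I, R, lam, gamma, mu, births, n_substeps=365):
    """Advance a realistic age-structured (RAS) SIR model by one year.

    S, I, R, lam, gamma and mu are lists with one entry per one-year age class.
    All names and the Euler scheme are illustrative assumptions.
    """
    S, I, R = list(S), list(I), list(R)
    dt = 1.0 / n_substeps
    # Step 1: integrate the within-class ODEs (1.4) over one year.
    for _ in range(n_substeps):
        for i in range(len(S)):
            new_infections = lam[i] * S[i]
            recoveries = gamma[i] * I[i]
            dS = -new_infections - mu[i] * S[i]
            dI = new_infections - recoveries - mu[i] * I[i]  # assumes eta = mu
            dR = recoveries - mu[i] * R[i]
            S[i] += dt * dS
            I[i] += dt * dI
            R[i] += dt * dR
    # Step 2: shift every class one year up; newborns enter as susceptibles.
    # The oldest class is dropped (assumed to leave the population).
    return [births] + S[:-1], [0.0] + I[:-1], [0.0] + R[:-1]


# Example with three age classes and made-up constant rates (per year).
S, I, R = ras_year([100.0, 90.0, 80.0], [0.0, 1.0, 0.0], [0.0, 0.0, 10.0],
                   lam=[0.1, 0.2, 0.2], gamma=[26.0] * 3, mu=[0.01] * 3,
                   births=100.0)
```

Calling `ras_year` repeatedly, with the output of one call as the input of the next, iterates the process over the time period of interest. In a full model the force of infection would be updated from the current numbers of infectious individuals rather than supplied as a fixed list.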

1.3.1.3 Demographic and Endemic Equilibrium

In Section 1.3.1.1, the general SIR model is described and in the previous section, the

model was simplified by considering discrete age classes. Another example of such

simplification is the assumption of endemic equilibrium or steady state of the model

(see e.g. Anderson and May (1991) and Chapter 3 in this thesis) meaning that the

disease incidence fluctuates around a stationary average over time. The population

can also be assumed to have reached demographic equilibrium which implies that the

age distribution is stationary. For some diseases the disease-induced mortality can be

neglected (α(a, t) = 0). Finally, the number of births and deaths can be assumed to be

constant over time and balanced, resulting in a constant population of size N . Under

the endemic and demographic equilibrium assumptions, the time-dependency in the

set of PDEs (1.1) cancels out and we obtain a set of ordinary differential equations

(ODEs):

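$$
\begin{aligned}
\frac{dS(a)}{da} &= -[\lambda(a) + \mu(a)]\,S(a),\\
\frac{dI(a)}{da} &= \lambda(a)\,S(a) - [\gamma(a) + \mu(a)]\,I(a),\\
\frac{dR(a)}{da} &= \gamma(a)\,I(a) - \mu(a)\,R(a).
\end{aligned}
\tag{1.5}
$$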

Summing the equations in (1.5) yields the following equation for the stationary population age distribution N(a):

$$
\frac{dN(a)}{da} = -\mu(a)\,N(a),
$$

from which follows

$$
N(a) = N(0)\exp\left(-\int_0^a \mu(u)\,du\right) = N(0)\exp(-\phi(a)). \tag{1.6}
$$

Based on the boundary condition on the number of newborns and (1.6), births and

deaths are indeed balanced, since

$$
N(0) = B = \int_0^\infty \mu(a)\,N(a)\,da,
$$

is equivalent to

$$
\int_0^\infty \mu(a)\exp(-\phi(a))\,da = 1,
$$

which is satisfied. Note that exp(−φ(a)) is a monotone decreasing function reflecting

the probability to survive up to age a: m(a) = exp(−φ(a)) = P (T > a), where T is

the time of death. From this follows that the life expectancy is given by

$$
L = \int_0^\infty -a\,m'(a)\,da = -a\,m(a)\Big|_0^\infty + \int_0^\infty m(a)\,da = \int_0^\infty \exp(-\phi(a))\,da.
$$

Although not necessary when empirical data on natural mortality are available, it is sometimes convenient to make simplifying assumptions regarding µ(a). Two types of mortality functions that are often used in the literature are called ‘type I mortality’ and ‘type II mortality’. Under type I mortality, individuals survive up to the life expectancy L, after which they immediately die. For type II mortality, the survival function is of the form m(a) = exp(−µa), where µ is a constant mortality rate. In this case, the life expectancy is given by L = 1/µ. Type I and type II mortality are typically used for developed and developing countries, respectively, although many developing countries are transitioning from type II to type I now.
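As a numerical check of the relation between the survival function m(a) and the life expectancy L, the integral of m(a) can be approximated for both mortality types. The sketch below is illustrative only: the function names, the trapezoidal rule, the truncation of the integral at `max_age`, and the value of 80 years are assumptions made for this example.

```python
import math


def life_expectancy(survival, max_age=2000.0, n_steps=200000):
    """Approximate L = integral of m(a) da over [0, max_age] (trapezoidal rule)."""
    h = max_age / n_steps
    total = 0.5 * (survival(0.0) + survival(max_age))
    for k in range(1, n_steps):
        total += survival(k * h)
    return h * total


L_TARGET = 80.0  # hypothetical life expectancy in years


def type_one_survival(a):
    # Type I: everyone survives up to age L, then dies immediately.
    return 1.0 if a < L_TARGET else 0.0


def type_two_survival(a):
    # Type II: constant mortality rate mu = 1 / L, so m(a) = exp(-mu * a).
    return math.exp(-a / L_TARGET)


print(life_expectancy(type_one_survival))
print(life_expectancy(type_two_survival))
```

Both calls should return values close to 80, in line with L being the age at death under type I mortality and L = 1/µ under type II mortality.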

In the above, the set of differential equations was described in terms of the

total number of individuals in each compartment. Instead, one can define age-specific

proportions or fractions of susceptible, infectious and removed individuals e.g.

s(a) = S(a)/N(a). It is convenient to do so, since this eliminates the natural

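mortality rates µ(a) from the set of ODEs in (1.5):

$$
\begin{aligned}
\frac{ds(a)}{da} &= -\lambda(a)\,s(a),\\
\frac{di(a)}{da} &= \lambda(a)\,s(a) - \gamma(a)\,i(a),\\
\frac{dr(a)}{da} &= \gamma(a)\,i(a),
\end{aligned}
\tag{1.7}
$$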


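since, e.g.

$$
\begin{aligned}
\frac{ds(a)}{da} &= \frac{1}{N(a)}\frac{dS(a)}{da} + S(a)\frac{dN^{-1}(a)}{da}\\
&= -[\lambda(a) + \mu(a)]\frac{S(a)}{N(a)} + \frac{S(a)\,\mu(a)\,N(a)}{N(a)^2}\\
&= -\lambda(a)\,s(a).
\end{aligned}
$$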

Solving the above set of ODEs, the following expression for the fraction of susceptible

individuals of age a is obtained:

$$
s(a) = \exp\left(-\int_0^a \lambda(u)\,du\right).
$$
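The expression for s(a) can be evaluated numerically for any given force of infection. The following sketch assumes a user-supplied function `foi(u)` and approximates the integral with the trapezoidal rule; the function names and the constant force of infection of 0.1 per year are hypothetical values chosen for illustration.

```python
import math


def susceptible_fraction(age, foi, n_steps=1000):
    """Return s(age) = exp(-integral_0^age foi(u) du), using the trapezoidal rule."""
    h = age / n_steps
    integral = 0.5 * (foi(0.0) + foi(age))
    for k in range(1, n_steps):
        integral += foi(k * h)
    return math.exp(-h * integral)


def constant_foi(u):
    return 0.1  # hypothetical force of infection per year


for age in (1, 5, 10, 20):
    print(age, round(susceptible_fraction(age, constant_foi), 3))
```

For a constant force of infection λ the result reduces to s(a) = exp(−λa), so the susceptible fraction decays exponentially with age.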

The SIR model is a fundamental example of a deterministic model used to describe disease dynamics. It is the most frequently used compartmental model in the literature; however, many extensions exist with different numbers of compartments having various interpretations. Examples are the MSIR model accounting for maternal protection after birth, the SEIR model in which individuals experience a latent period before becoming infectious, the SIS model with loss of natural immunity, and so on (see e.g. Hens et al. (2012)). In Chapter 3, the MSIR model is considered for VZV under the assumption of endemic and demographic equilibria; in Chapter 4, we study an SEIR model taking into account asymptomatic infection for influenza; the pertussis data in Chapter 6 are analyzed according to an age-homogeneous, discrete-time SEIR model; and in Chapter 7, the SEIR model is adapted for Ebola virus disease (EVD) assuming homogeneity with respect to age.

1.3.1.4 Stochastic SIR Model

The SIR model discussed in Section 1.3.1.1 is deterministic, i.e. every time the

equations are solved, the same result is obtained. Stochastic models, on the other

hand, describe the uncertainty seen in real-life outbreaks. For example, it may be

important to account for the variability of individual realizations when predicting

the course of an individual outbreak. Furthermore, when the number of cases is

small, the uncertainty on time to extinction is large and cannot be captured by a

deterministic model. Stochastic effects also play an important role when studying

recurrence and extinction of infections. In this section, we will describe a simple

discrete-time chain binomial SIR model (Bailey, 1957).

Chain binomial models are developed from the simple binomial model. The

basic idea behind the binomial model is that exposure to infection occurs in discrete


time units. Define p as the transmission probability conditional upon contact between

a susceptible and an infectious individual. The probability that the susceptible person

will not be infected during this contact is the escape probability q = 1 − p. If the

susceptible individual contacts n infectious individuals, the probability of escaping

from infection is $q^n = (1-p)^n$ (assuming that all contacts are equally infectious). The probability of infection is then $1 - q^n = 1 - (1-p)^n$. The chain binomial model

is now defined as the chained, or sequential application of the binomial model.

An example of a chain binomial model is the simple Reed-Frost model. This

model was developed by Lowell Reed and Wade Hampton Frost in the 1920s and

described by Abbey (1952). In this model, the population size is assumed constant

and individuals are in one of the three SIR states. When working in discrete time,

the number of susceptible individuals is denoted by $S_t$, and similarly the number of infectious individuals by $I_t$ and the number of removed individuals by $R_t$. When assuming

that individuals are infectious for exactly one generation, the full model is given by:

$$
\begin{aligned}
I_{t+1} &\sim \text{binom}\bigl(S_t,\; 1-(1-p)^{I_t}\bigr),\\
S_{t+1} &= S_t - I_{t+1},\\
R_{t+1} &= R_t + I_t.
\end{aligned}
\tag{1.8}
$$

This is the simplest version of the model and it can be modified to make it

more realistic and adaptable for different diseases. One could for example alter the

assumptions on the recovery process, or add exposure to infection from outside the

population (Longini and Koopman, 1982).
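A single realization of the simple Reed-Frost model in (1.8) can be simulated with a few lines of Python. The sketch below is illustrative: the function name `reed_frost`, the use of `random.Random` for reproducibility, and the population size and transmission probability in the example are assumptions made here. The binomial draw is implemented as one Bernoulli trial per susceptible individual.

```python
import random


def reed_frost(s0, i0, p, rng, max_generations=1000):
    """Simulate one outbreak of the Reed-Frost chain binomial model (1.8).

    Returns a list of (S_t, I_t, R_t) tuples, one per generation.
    """
    S, I, R = s0, i0, 0
    history = [(S, I, R)]
    while I > 0 and len(history) <= max_generations:
        # Probability that a susceptible fails to escape all I infectious contacts.
        prob_infection = 1.0 - (1.0 - p) ** I
        # Binomial(S, prob_infection) draw as a sum of Bernoulli trials.
        new_infections = sum(1 for _ in range(S) if rng.random() < prob_infection)
        # Infectious individuals are removed after exactly one generation.
        S, I, R = S - new_infections, new_infections, R + I
        history.append((S, I, R))
    return history


# Example: 50 susceptibles, 1 initial case, hypothetical p = 0.05.
for state in reed_frost(50, 1, 0.05, random.Random(2016)):
    print(state)
```

Running the function repeatedly with different seeds illustrates the variability between individual outbreaks, including early extinction, which a deterministic model cannot capture.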

Any stochastic epidemic model has a deterministic counterpart, obtained by

setting the deterministic population increments to the expected values of the

conditional increments in the stochastic model. Hence, the connection between the

stochastic SIR model described above and a deterministic SIR model can be seen as

follows. From (1.8) and a first-order Taylor approximation for small p, it follows that

$$
E(I_{t+1} \mid S_t, I_t) = \bigl(1-(1-p)^{I_t}\bigr)S_t \approx p\,I_t\,S_t. \tag{1.9}
$$
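The approximation step can be spelled out with the binomial expansion of the escape probability. The worked lines below are a sketch of the standard argument and assume that the product of p and the number of infectious individuals is small, so that terms of order p squared can be neglected.

```latex
\begin{align*}
(1-p)^{I_t} &= 1 - I_t\,p + \binom{I_t}{2}p^2 - \dots \\
1-(1-p)^{I_t} &= I_t\,p + O(p^2) \\
E(I_{t+1}\mid S_t, I_t) &= \bigl(1-(1-p)^{I_t}\bigr)S_t \approx p\,I_t\,S_t .
\end{align*}
```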

When switching from generation time to calendar time and assuming that the rate of

recovery is γ, we obtain:

$$
E(I_{t+1} - I_t \mid S_t, I_t) \approx p\,I_t\,S_t - \gamma I_t,
$$

corresponding with an age-homogeneous deterministic SIR model with λ(t) = pI(t).

1.3.2 Epidemiological Parameters

In this section, some of the key measures of infectious disease transmission are discussed. First, the force of infection and the mass action principle are introduced. Second, the basic and effective reproduction numbers are defined.

1.3.2.1 Mass Action Principle and Mixing Assumptions

The expression in (1.9) states that the number of new cases in generation t + 1 is

proportional to all possible contacts between infectious and susceptible individuals in

generation t. This is the mass-action principle in its simplest form. The underlying assumption is that infected and susceptible individuals mix homogeneously; it is therefore also referred to as the ‘homogeneous mass-action principle’. The proportionality factor p is named the ‘transmission parameter’ or ‘effective contact rate’ and is often denoted by the Greek letter β. A contact is called effective when it is made between a susceptible and an infectious individual and results in infection. The force of infection, λ(t) = βI(t), is the rate at which a susceptible person acquires infection at time t and is one of the key parameters in infectious disease modeling.

The assumption of homogeneous mixing is usually not very realistic. The mass

action principle can therefore be extended to the level of age-specific transmission

rates (see e.g. Anderson and May (1991)):

$$
\lambda(a) = \int_0^\infty \beta(a,a')\,I(a')\,da', \tag{1.10}
$$

where β(a, a′) denotes the average per capita rate at which an infectious individual

of age a′ makes effective contact with a susceptible person of age a, per unit time.

This principle thus implicitly assumes that susceptible and infectious individuals mix

completely and move randomly within the population. Hence, the average rate at

which a susceptible individual of age a acquires infection per unit time roughly equals

the sum of the average rates at which he/she makes effective contacts with all infectious individuals in the population, per unit time. Following Farrington et al. (2001), (1.10) can be rewritten as:

$$
\lambda(a) = D\int_0^\infty \beta^*(a,a')\,\lambda(a')\,S(a')\,da',
$$


with

$$
\beta^*(a,a') = D^{-1}\int_0^\infty \beta(a, a'+t)\exp\left(-\int_0^t \gamma(u)\,du\right)\exp\left(-\int_{a'}^{a'+t}\mu(u)\,du\right)dt.
$$

If the mean infectious period D is short compared to the timescale on which transmission and mortality rates vary, β∗(a, a′) ≈ β(a, a′) and the force of infection can be approximated by:

$$
\lambda(a) \approx \frac{ND}{L}\int_0^\infty \beta(a,a')\,\lambda(a')\,s(a')\,m(a')\,da', \tag{1.11}
$$

where s(a′), N, L and m(a′) are defined as before. When one wants to estimate the transmission rates β(a, a′) from the force of infection λ(a), additional assumptions are necessary since λ(a) is a one-dimensional function of age whereas β(a, a′) is a two-dimensional function. The traditional approach of Anderson and

May (1991) stratifies the population into a number of age classes J leading to a

system of J equations with J × J unknowns. They then impose different mixing

patterns upon this βij matrix, which is called the ‘Who Acquires Infection From

Whom’ (WAIFW) matrix, hereby constraining the number of distinct elements

βij for identifiability reasons. The unknown elements in the WAIFW matrix can

then be estimated from serological data. However, the choice of the structure

imposed on the WAIFW matrix as well as the choice of the age classes are ad hoc and

impact the estimation of R0 (Greenhalgh and Dietz, 1994; Van Effelterre et al., 2009).

In this dissertation, we will consider a more recent approach as proposed by

Wallinga et al. (2006) and extended by Ogunjimi et al. (2009) and Goeyvaerts et al.

(2010), by informing β(a, a′) with data on social contacts. This approach relies on

the so-called ‘social contact hypothesis’ stating that the transmission rate β(a, a′) is

proportional to the age-specific contact rate c(a, a′), i.e. the per capita rate at which

an individual of age a′ makes contact with a person of age a, per unit of time:

β(a, a′) = q · c(a, a′), (1.12)

where q is a constant proportionality factor. The assumption of constant proportionality is commonly used in the literature; however, in Chapter 3 we will contrast this assumption with an age-dependent proportionality factor q(a, a′) that may capture, among other effects, age-specific susceptibility and infectivity. In Section 2.2 we will

introduce social contact data and methods to estimate the contact rates c(a, a′).
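To make the social contact hypothesis (1.12) concrete, the integral in (1.10) can be discretized over a finite number of age classes, so that the force of infection in class i becomes the sum over j of q times c_ij times I_j. The sketch below assumes this discretization; the function name, the two age classes, the contact matrix and the value of q are all made-up illustrations, not estimates from this thesis.

```python
def force_of_infection(q, contact, infectious):
    """Discretized (1.10) with beta_ij = q * c_ij, as in (1.12).

    contact[i][j]: rate at which an individual in class j contacts one in class i.
    infectious[j]: number of infectious individuals in class j.
    """
    return [
        sum(q * c_ij * i_j for c_ij, i_j in zip(row, infectious))
        for row in contact
    ]


# Hypothetical example: children (class 0) and adults (class 1).
contact = [[10.0, 2.0],
           [2.0, 5.0]]
infectious = [1.0, 3.0]
print(force_of_infection(0.05, contact, infectious))
```

Replacing the constant `q` by a matrix of class-specific factors would give a discretized version of the age-dependent proportionality factor q(a, a′) mentioned above.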


1.3.2.2 Reproduction Numbers

One of the key measures of infectious disease transmission is the basic reproduction

number R0, sometimes also called the basic reproductive ratio. It represents the

expected number of secondary cases produced by a single typical infectious individual

during his/her entire infectious period when introduced into a completely susceptible

population. In recent years, R0 has been used extensively as a key parameter to

quantify disease transmission. For a historical overview of the development of R0, we

refer to Heesterbeek (2002).

Figure 1.5: Illustration of the (basic) reproduction number R0 (left) and R (right): one

infected individual (black circle) is introduced into a fully susceptible population and

infects on average R0 = 3 other individuals (grey circles, left panel), or he/she is

introduced in a partly immunized population (dotted circles) infecting only R = 1

individual (grey circles, right panel) (Goeyvaerts, 2011).

Figure 1.5 presents an illustration of the basic reproduction number for a simplistic

situation where R0 = 3. R0 is also referred to as a threshold parameter, since if it is

larger than 1, the infection may become endemic and the larger R0, the more effort

is required to eliminate the infection from the population. If it is smaller than 1, the

infection will eventually go extinct. Hence, the basic reproduction number reflects

the potential of an infection to lead to an epidemic. R0 depends on three factors: the

duration of the infectious period, the probability that a contact between an infected

and a susceptible individual leads to an infection and the contact rate (Dietz, 1993).

Although R0 is a useful theoretical measure, it is rarely observed in practice. The

effective reproduction number R is a measure for the actual expected number of

secondary cases, taking into account pre-existing immunity, control measures and


depletion of susceptible individuals (see right panel in Figure 1.5). In the endemic

equilibrium setting described in Section 1.3.1.3, each infectious individual infects one

other individual on average, hence R must be equal to 1 (Diekmann et al., 1990). Since both the basic and the effective reproduction number are important epidemic summary measures, it is important to obtain reliable estimates for R0 and R.

Assume that an infectious individual of age a′ is introduced into a population with a proportion s(a) of susceptible individuals of age a. The average number of persons of age a infected by this individual during its infectious period of length D is then given by

G(a, a′) = NDn(a)s(a)β∗(a, a′),

where n(a) = N(a)/N. The next generation of infected individuals of age a that results from introducing a ‘typical’ infectious individual, whose age is distributed according to i(a′), is then calculated as:

$$G[i](a) = N D \, n(a) \, s(a) \int_0^{\infty} \beta^*(a, a') \, i(a') \, da'.$$

Hence, the next generation operator G expresses the age distribution of the next

generation of cases. The total number of cases infected by this ‘typical’ infectious

individual is then given by:

$$\int_0^{\infty} G[i](a) \, da.$$

Assume now that the infectious period is short (β∗(a, a′) ≈ β(a, a′)) and that the

population is in demographic equilibrium and thus the population size is fixed (n(a) =

m(a)/L). The reproduction number equals the spectral radius of the next generation

operator G and, in a fully susceptible population (s(a) = 1), the reproduction number

reduces to the basic reproduction number R0 (Diekmann et al., 1990). Therefore, the

(basic) reproduction number can be calculated as the leading eigenvalue of the ‘next

generation matrix’:

$$\frac{N D}{L} \, m(a) \, s(a) \, \beta(a, a').$$

The leading right eigenvector of the next generation matrix is then proportional to

the distribution of infected individuals during the initial exponential growth phase of

an epidemic. More details can be found in Diekmann et al. (1990) and Farrington

et al. (2001).
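
As a rough illustration of the eigenvalue calculation described above, the sketch below discretizes age into a small number of classes and applies the social contact hypothesis β = q · c. The function names, the two-class contact rates, the class sizes and the values of q and D are hypothetical and chosen for illustration only; replacing N m(a)/L by class sizes N_i is an assumption of this discretized version, not the implementation used in later chapters.

```python
import numpy as np

def next_generation_matrix(contact_rates, q, class_sizes, susceptible, duration):
    """Discretized next generation matrix K[i, j] = D * N_i * s_i * q * c[i, j].

    Row i is the age class of the newly infected person,
    column j the age class of the infectious individual.
    """
    c = np.asarray(contact_rates, dtype=float)
    n = np.asarray(class_sizes, dtype=float)
    s = np.asarray(susceptible, dtype=float)
    beta = q * c  # social contact hypothesis, cf. (1.12)
    return duration * (n * s)[:, None] * beta

def leading_eigenpair(K):
    """Leading eigenvalue and the normalized leading right eigenvector of K."""
    values, vectors = np.linalg.eig(K)
    k = np.argmax(np.abs(values))
    vec = np.abs(vectors[:, k].real)
    return values[k].real, vec / vec.sum()

# Hypothetical per capita contact rates and class sizes for two age classes.
c = [[4e-3, 1e-3], [1e-3, 2e-3]]
sizes = [1000, 3000]

# Fully susceptible population: the leading eigenvalue is R0.
K0 = next_generation_matrix(c, q=0.05, class_sizes=sizes,
                            susceptible=[1.0, 1.0], duration=7.0)
R0, age_dist = leading_eigenpair(K0)

# Half of each class immune: the leading eigenvalue is the effective R.
K = next_generation_matrix(c, q=0.05, class_sizes=sizes,
                           susceptible=[0.5, 0.5], duration=7.0)
R, _ = leading_eigenpair(K)
```

With these illustrative inputs the matrix for the fully susceptible population is [[1.4, 0.35], [1.05, 2.1]], whose leading eigenvalue is 2.45; halving the susceptible proportion in every class halves the reproduction number.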


1.3.3 Network Modeling

The main drawback of the mass action principle (1.10) described in Section 1.3.2.1

is that it assumes complete and random mixing within the population and therefore

does not account for the fact that contacts are often clustered in e.g. households,

schools or workplaces. Network-based approaches to infectious disease dynamics have
an individual-based interpretation and make it possible to model these aspects of social mixing

behavior (Keeling and Eames, 2005; Danon et al., 2011). The term ‘network’ is very

general and simply refers to a collection of elements and their inter-relations. Network

theory is therefore used in a variety of fields such as biology, bioinformatics, physics,

computer science, and so on. In Section 1.3.3.1 we will discuss some basic notions

and in Section 1.3.3.2 a model for networks is introduced.

1.3.3.1 Basic Definitions

Graph theory is the mathematical language in which networks are defined. A graph G = (V, E) is defined as a structure consisting of a set of vertices (or nodes) V and a set of edges (or links) E. The elements of E are unordered pairs of nodes {u, v}, u, v ∈ V, that are connected in G. The ‘order’ Nv of the graph G is defined as the number of nodes and the ‘size’ of G is the number of edges. When each edge in E has an ordering, i.e. (u, v) is distinct from (v, u), G is called a directed graph and the edges are called directed edges. Two nodes in V are said to be adjacent if they are joined by an edge in E, and the degree of a node v is defined as the number of edges incident on v. Several types of graphs are commonly encountered in practice. One example is a complete graph, in which every node is connected to every other node.

Graphs and certain aspects of their structure can be characterized using matrices and matrix algebra. The connectivity of a graph G may be captured in an

Nv ×Nv binary, symmetric matrix Y :

$$Y_{ij} = \begin{cases} 1 & \text{if } \{i, j\} \in E, \\ 0 & \text{otherwise}, \end{cases} \qquad (1.13)$$

where the nodes are denoted by 1, ..., Nv and an edge is denoted as an unordered pair of vertices {i, j} with i, j ∈ V. This matrix Y is called the ‘adjacency matrix’ and stores connectivity information of the graph G. On a final note regarding basic definitions, it is sometimes useful to consider a graph G itself as a random object by thinking of G as having been drawn from a collection of possible graphs, say 𝒢. Then P(G) refers to the probability of drawing G from 𝒢. For more details and examples, we refer to

Kolaczyk (2009).
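
To make these definitions concrete, the following minimal sketch builds the adjacency matrix of a small undirected graph and derives the degree of each node and the size of the graph from it. The four-node edge list and the helper name are hypothetical; nodes are labeled 0, ..., Nv − 1 rather than 1, ..., Nv purely to follow Python indexing.

```python
import numpy as np

def adjacency_matrix(n_nodes, edges):
    """Binary, symmetric adjacency matrix of an undirected graph."""
    Y = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        Y[i, j] = 1
        Y[j, i] = 1  # unordered pair {i, j}: store both directions
    return Y

# Hypothetical graph of order 4 with four edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
Y = adjacency_matrix(4, edges)

degrees = Y.sum(axis=1)           # number of edges incident on each node
size = int(Y.sum() // 2)          # every edge is counted twice in Y
is_complete = size == 4 * 3 // 2  # a complete graph on 4 nodes has 6 edges
```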

1.3.3.2 Exponential Random Graph Models

A model for a network graph is a collection

{Pθ(G), G ∈ 𝒢 : θ ∈ Θ},

where 𝒢 is a collection of possible graphs, Pθ is a probability distribution on 𝒢, and θ is a vector of parameters with possible values in Θ. There is a vast number of modeling approaches in the literature, ranging from simple (e.g. Pθ uniform on 𝒢) to complex, and they are used for a variety of purposes. In this dissertation, we focus on exponential random graph models (ERGMs; Robins et al., 2007), which extend the idea of statistical regression to random graphs.

In an ERGM, the probability of observing a specific network configuration is

defined in terms of network statistics. Let G = (V,E) be a random graph with

adjacency matrix Y and let y be a particular realization of Y . The probability of

observing y is then given by

$$P_\theta(Y = y) = \frac{\exp\{\theta^T g(y, X)\}}{\kappa(\theta)},$$

where g(y,X) is a vector of network statistics that may depend on additional

covariate information X, θ the corresponding vector of coefficients, and κ(θ) a

normalizing factor.

An alternative model specification clarifies the interpretation of θ (Hunter et al.,

2008). For a specific pair of nodes (i, j), define the vector of change statistics as

follows:

$$\delta_g(y, X)_{ij} = g(y_{ij}^{+}, X) - g(y_{ij}^{-}, X),$$

where y^+_ij and y^-_ij are the networks realized by fixing yij = 1 and yij = 0, respectively,

while leaving all the rest of y fixed. This allows for a logistic interpretation of the

coefficients in θ:

$$\operatorname{logit}\{P_\theta(Y_{ij} = 1 \mid Y_{ij}^{c} = y_{ij}^{c})\} = \theta^T \delta_g(y, X)_{ij}, \qquad (1.14)$$

where Y^c_ij represents the rest of the network other than Yij. Thus, each component of θ reflects the increase in the conditional log-odds of a tie between i and j, per unit increase in the corresponding component of g(y,X), resulting from switching the dyad Yij from 0 to 1 while leaving the rest of the network fixed at Y^c_ij. In Chapters 5 and 6 we will

use ERGMs to model contact networks within households.
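
The logistic interpretation in (1.14) can be illustrated with a small sketch. It assumes, purely for illustration, a vector of network statistics g(y) consisting of the number of edges and the number of triangles, no covariates X, and a hypothetical coefficient vector θ; these are not the statistics used in Chapters 5 and 6.

```python
import math
import numpy as np

def network_statistics(y):
    """g(y) = (number of edges, number of triangles) of a symmetric 0/1 matrix."""
    y = np.asarray(y)
    edges = int(y.sum() // 2)
    triangles = int(np.trace(y @ y @ y) // 6)  # each triangle is counted 6 times
    return np.array([edges, triangles], dtype=float)

def change_statistics(y, i, j):
    """g(y+_ij) - g(y-_ij): effect of toggling dyad (i, j) from 0 to 1."""
    y_plus = np.array(y, copy=True)
    y_minus = np.array(y, copy=True)
    y_plus[i, j] = y_plus[j, i] = 1
    y_minus[i, j] = y_minus[j, i] = 0
    return network_statistics(y_plus) - network_statistics(y_minus)

def conditional_tie_probability(theta, y, i, j):
    """P(Y_ij = 1 | rest of the network) via the inverse logit of theta^T delta."""
    eta = float(np.dot(theta, change_statistics(y, i, j)))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical four-member household network with edges 0-1, 0-2 and 2-3.
y = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
theta = [-1.0, 0.5]  # illustrative coefficients for (edges, triangles)

delta_12 = change_statistics(y, 1, 2)  # adding 1-2 closes the triangle 0-1-2
delta_13 = change_statistics(y, 1, 3)  # adding 1-3 closes no triangle
p_12 = conditional_tie_probability(theta, y, 1, 2)
```

In this example the conditional log-odds of a tie between nodes 1 and 2 equal −1.0 + 0.5 = −0.5, since adding that tie increases both the edge count and the triangle count by one.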

1.3.4 Statistical Inference

In this dissertation, we will rely on both the frequentist and the Bayesian framework. In this section, we briefly introduce some of the methods used to perform parameter estimation and assess variability.

1.3.4.1 Maximum Likelihood Estimation

Within the frequentist framework, the standard method to estimate unknown parameters for a given model is the method of maximum likelihood (ML). The basic principle

behind this approach is the construction of a likelihood function expressing the ‘agree-

ment’ between the selected model and the observed data. Consider a set of observed

values y = (y1, ..., yn) of a random sample Y1, ..., Yn and let fi(yi|θ) be the density

function of Yi. The vector θ = (θ1, ..., θk) represents the unknown model parameters

that we want to estimate from the observed data. Since the random variables Yi are

independent, the likelihood function is given by:

$$L(\theta \mid y) = \prod_{i=1}^{n} f_i(y_i \mid \theta).$$

Hence, given the selected model fi(yi|θ) parametrized by θ, the likelihood L(θ|y) is the probability (density) of the observed data y, viewed as a function of θ. Maximizing this likelihood function over the entire parameter space Θ then results in an estimate for θ, denoted by θ̂. From an analytical and computational perspective, it is often more convenient to maximize the log-transformed likelihood function ll(θ|y) = log(L(θ|y)). Indeed, this results in a function composed of additive contributions log(fi(yi|θ)), simplifying the calculation of derivatives with regard to θj, j = 1, ..., k. As the natural logarithm is a monotone increasing function, the optimization problem is equivalent. To derive the ML-estimate θ̂, the so-called set of score equations needs to be solved:

$$S_j(\theta \mid y) = \frac{\partial}{\partial \theta_j} ll(\theta \mid y) = 0, \quad j = 1, \ldots, k.$$

The information matrix I(θ|y) is defined in terms of the second order partial deriva-

tives:

$$I(\theta \mid y) = -\left[ \frac{\partial^2}{\partial \theta_l \, \partial \theta_m} ll(\theta \mid y) \right]_{l,m}.$$


This matrix should be positive definite at θ̂ for θ̂ to be a maximum. Multiple numerical optimization techniques are available to solve the set of score equations when a closed-form solution is not available. These include iterative procedures such as Newton-Raphson, Fisher scoring and the EM-algorithm. The ML-estimator is weakly consistent and asymptotically normal under certain regularity conditions.
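
As an illustration of how the score equation can be solved iteratively, the sketch below applies Newton-Raphson to a one-parameter exponential model f(y|λ) = λ exp(−λy), for which the closed-form solution λ̂ = n/Σyi is available as a check. The model, the data and the function name are hypothetical choices for this example; the iteration only converges here for starting values between 0 and twice the ML-estimate.

```python
def exponential_mle_newton(y, start, tol=1e-12, max_iter=100):
    """Newton-Raphson for the rate of an exponential model.

    Log-likelihood: ll(lam) = n * log(lam) - lam * sum(y).
    """
    n = len(y)
    total = sum(y)
    lam = start
    for _ in range(max_iter):
        score = n / lam - total     # first derivative of ll
        information = n / lam ** 2  # minus the second derivative of ll
        step = score / information
        lam += step                 # Newton-Raphson update
        if abs(step) < tol:
            break
    return lam

y = [1.0, 2.0, 3.0, 4.0]            # hypothetical observations
lam_hat = exponential_mle_newton(y, start=0.1)
closed_form = len(y) / sum(y)       # analytical ML-estimate for comparison
```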

Wald-based Confidence Intervals

The above procedure produces a point estimate θ̂ for the unknown parameters. To acknowledge the uncertainty associated with this estimation, we want to estimate the standard error or confidence interval (CI) of θ̂. One way to do so is by calculating the Wald-based confidence intervals that rely on the asymptotic normality of the ML-estimate:

$$\left[ \hat\theta_j - z_{1-\alpha/2} \times se(\hat\theta_j), \; \hat\theta_j + z_{1-\alpha/2} \times se(\hat\theta_j) \right],$$

where α and z_{1−α/2} are the significance level and the (1 − α/2) × 100th percentile of the standard normal distribution, respectively. Further, se(θ̂j) is an estimate for the standard error of θ̂j given by the square root of the jth element on the diagonal of the inverse of the observed information matrix, I(θ̂)^{-1}. Indeed, I(θ̂)^{-1} is an estimator for the asymptotic variance-covariance matrix of θ̂.
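
A minimal sketch of a Wald-based CI is given below for a one-parameter Poisson model, where the ML-estimate is the sample mean and the observed information at θ̂ equals Σyi/θ̂². The model, the counts and the function name are hypothetical; the example assumes at least one non-zero count.

```python
import math
from statistics import NormalDist

def poisson_wald_ci(counts, alpha=0.05):
    """ML-estimate, standard error and Wald CI for a Poisson mean."""
    n = len(counts)
    theta_hat = sum(counts) / n                   # ML-estimate
    observed_info = sum(counts) / theta_hat ** 2  # minus d2 ll / d theta2 at theta_hat
    se = math.sqrt(1.0 / observed_info)           # sqrt of the inverse information
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # standard normal percentile
    return theta_hat, se, (theta_hat - z * se, theta_hat + z * se)

counts = [3, 5, 4, 6, 2, 4]                       # hypothetical count data
theta_hat, se, ci = poisson_wald_ci(counts)
```

Because the interval is symmetric around θ̂ by construction, it can extend below zero for small samples even though the Poisson mean is positive; this is one reason to consider the bootstrap intervals discussed next.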

Bootstrap Confidence Intervals

The bootstrap approach is, in contrast with the Wald-based CIs, a distribution-free

method to estimate standard errors and calculate approximate CIs for θ. It was

first introduced by Efron (1979) and is now widely used to assess the uncertainty

associated with parameter estimates. There are different versions of the bootstrap

approach, namely the nonparametric, the semiparametric and parametric bootstrap.

The semi- and parametric bootstrap approaches require parametric assumptions

about the ‘true’ underlying population, and are therefore often less useful compared

to the nonparametric approach. In this dissertation, we will only rely on the

nonparametric bootstrap.

The idea behind the latter approach is that one samples from the empirical

distribution function, a nonparametric and consistent estimator for the unknown


distribution F of the quantity of interest, which is equivalent to sampling with

replacement from the sample itself. Hence, let y(1), ...,y(B) denote B independent

bootstrap samples of size n obtained by drawing samples with replacement from

the observed data y1, ..., yn. Let θ̂(b), b = 1, ..., B, be the bootstrap replicates of θ̂ obtained by maximizing the loglikelihood ll(θ|y(b)) for bootstrap sample y(b). The bootstrap estimate for the standard error of the ML-estimate θ̂j is then given by:

$$se_B(\hat\theta_j) = \sqrt{ \frac{ \sum_{b=1}^{B} \left( \hat\theta_j^{(b)} - \bar\theta_j \right)^2 }{ B - 1 } },$$

where

$$\bar\theta_j = \frac{1}{B} \sum_{b=1}^{B} \hat\theta_j^{(b)}.$$

Note that more generally, bootstrap estimates can be obtained for any statistic of interest from y. Several bootstrap methods have been proposed to construct approximate confidence intervals. In this thesis, we will use the percentile-based bootstrap CIs, based on the empirical distribution function of the bootstrap replicates of θ̂j. Let p_α^(j,B) denote the α × 100th percentile of the bootstrap values θ̂j(b), b = 1, ..., B; then the approximate (1 − α) × 100% CI for θj is [p_{α/2}^(j,B), p_{1−α/2}^(j,B)]. For a more detailed discussion on bootstrap methods, we refer to Efron and Tibshirani (1993).
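
The nonparametric bootstrap procedure described above can be sketched as follows. The data, the number of bootstrap samples, the seed and the estimator (the ML-estimate 1/ȳ of an exponential rate) are hypothetical choices made only to keep the example short.

```python
import numpy as np

def bootstrap_percentile_ci(y, estimator, n_boot=2000, alpha=0.05, seed=1):
    """Bootstrap standard error and percentile CI for a scalar estimator."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample n observations with replacement from the observed data.
        sample = rng.choice(y, size=n, replace=True)
        replicates[b] = estimator(sample)
    se = replicates.std(ddof=1)  # divisor B - 1, as in the formula above
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return se, (lower, upper)

y = [0.8, 2.1, 0.3, 1.7, 4.2, 0.9, 1.1, 2.8, 0.5, 1.6]  # hypothetical data
rate_hat = 1.0 / np.mean(y)                              # ML-estimate on the original sample
se_boot, ci_boot = bootstrap_percentile_ci(y, lambda s: 1.0 / s.mean())
```

Any statistic computed from a sample can be passed as `estimator`, which reflects the remark that bootstrap estimates are not restricted to ML-estimates.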

Model Selection

To compare various models in this ML setting, we will focus on two informa-

tion criteria: Akaike’s information criterion (Akaike, 1973):

AIC = −2ll(θ̂|y) + 2k,

where k represents the number of parameters, and the Bayesian information criterion (Schwarz, 1978):

BIC = −2ll(θ̂|y) + log(n)k.

Both criteria consist of two terms: the first is a measure of data fit and the second is a penalty term. The BIC originates from a Bayesian perspective and penalizes the number of parameters more strongly whenever n ≥ 8 (factor log(n) instead of 2). Given a set of candidate models, the ‘best’ model is the one with the smallest AIC or BIC value. Since

model selection is conditional on the set of models under consideration, there may

exist other models that are closer to the true underlying model. Hence, the choice of


candidate models is crucial to ensure that the preferred model describes the data well.

Furthermore, the AIC values can be used to calculate the Akaike weights.

These weights can be interpreted as the probability of a certain model being the

‘best’ model, given the data and the set of candidate models under consideration.

Suppose we consider a set of m candidate models, and list them according to their

AIC value. Let AICmin correspond to the model with the smallest AIC value and

define the AIC differences ∆i = AICi − AICmin (i = 1, ...,m). The Akaike weights

are then calculated in the following way:

$$w_i = \frac{ \exp\left( -\frac{1}{2} \Delta_i \right) }{ \sum_{l=1}^{m} \exp\left( -\frac{1}{2} \Delta_l \right) }.$$

For further details we refer to Burnham and Anderson (2002).
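
The information criteria and the Akaike weights are straightforward to compute once the maximized loglikelihoods are available. In the sketch below the loglikelihood values and the numbers of parameters of the three candidate models are hypothetical.

```python
import math

def aic(loglik, k):
    """Akaike's information criterion for a maximized loglikelihood."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion for sample size n."""
    return -2.0 * loglik + math.log(n) * k

def akaike_weights(aic_values):
    """Akaike weights computed from the AIC differences to the best model."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

# Hypothetical maximized loglikelihoods and parameter counts of three models.
logliks = [-48.0, -47.0, -46.5]
n_params = [2, 4, 8]

aic_values = [aic(ll, k) for ll, k in zip(logliks, n_params)]
weights = akaike_weights(aic_values)
```

Here the AIC values are 100, 102 and 109, so the first model is preferred even though it has the lowest loglikelihood; the ratio of the first two weights equals exp(1), corresponding to an AIC difference of 2.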

1.3.4.2 Markov Chain Monte Carlo

In the previous section, we described how ML estimation can be used to obtain point

estimates and discussed methods to assess uncertainty by estimating the standard

error or CI associated with these point estimates. This is a frequentist approach in

which the unknown quantity θ is assumed to be fixed (non-random). A different

framework for inference is the Bayesian approach. In this framework θ is treated as a

random variable and we are interested in the distribution of θ. More specifically, we

first assume that we have current knowledge about θ. This is expressed by placing a

probability distribution on the parameters, called the prior distribution, π(θ). After

observing data y = (y1, ..., yn), the distribution π(θ) is updated to obtain the posterior

distribution f(θ|y). This update is done by using Bayes’ Theorem:

$$f(\theta \mid y) = \frac{ f(y \mid \theta) \, \pi(\theta) }{ \int_{\Theta} f(y \mid \theta) \, \pi(\theta) \, d\theta },$$

with Θ the space of possible parameter values, as before. In theory, the posterior distribution is always available, but evaluation of the complex integral ∫Θ f(y|θ)π(θ) dθ

is often analytically intractable. With the use of Markov Chain Monte Carlo (MCMC)

methods, the evaluation of the integral is avoided by making use of the unnormalized

posterior density:

f(θ|y) ∝ f(y|θ)π(θ),


equivalently,

posterior ∝ likelihood× prior.

MCMC builds on classical Monte Carlo integration, which aims at approximating expectations of the form

$$E[h(X)] = \int h(x) \, g(x) \, dx.$$

If X1, ..., Xn ∼ g(x), iid and E[|h(X1)|] < ∞, then the above expectation can be approximated by

$$\frac{1}{n} \sum_{i=1}^{n} h(X_i),$$

for some large, yet finite n. However, in many situations, classical Monte Carlo is

not possible because we cannot sample from the distribution g(x). For these, often

high-dimensional cases, MCMC has been developed. The general MCMC strategy

is to construct an ergodic Markov chain Xn with stationary distribution g(x) (for a

discussion on Markov chain theory, see for example Bremaud (1999)). There are a

large number of MCMC algorithms, examples are Random-Walk Metropolis (RWM),

Metropolis-Hastings (MH), Gibbs sampling, slice sampling, etc. RWM was developed

first (Metropolis et al., 1953) and MH is a generalization of RWM (Hastings, 1970).

The MH algorithm only requires the evaluation of a function that is proportional to

the target density g(x). Let p(x) be a function that is proportional to g(x), then we

can construct a Markov chain according to the following algorithm:

Initialization: Choose a starting value x0 and choose an arbitrary distribution

function q(x|y) that suggests a candidate for the next sample value x, given the

previous sample y. This function is referred to as the ‘proposal density’.

Iteration: For n = 0 to N do

1. Generate a candidate value x′ from q(x′|xn).

2. Compute the Metropolis-Hastings acceptance probability

$$\alpha = \min\left\{ \frac{ p(x') \, q(x_n \mid x') }{ p(x_n) \, q(x' \mid x_n) }, \; 1 \right\}.$$

3. Generate a value u from U [0, 1].


4. Accept the candidate x′ by setting xn+1 = x′ if u ≤ α, otherwise set xn+1 = xn.

The MH algorithm is a very general MCMC algorithm that extends RWM by allowing an asymmetric proposal distribution q. The main disadvantage of these

two methods is that the proposal variance needs to be tuned manually. Therefore,

adaptive variants of RWM, tuning the algorithm as it updates, have been proposed.

These algorithms automatically optimize the proposal variance based on the history

of the chains. However, this violates the Markov property, which states that the

proposal may only be influenced by the current state. To obtain valid Markov

chains, a two-phase approach can be used, in which adaptive MCMC is followed by a

non-adaptive algorithm, such as RWM. One of these adaptive algorithms, used in this

dissertation, is the Adaptive-Mixture Metropolis (AMM) algorithm. This algorithm

is an extension by Roberts and Rosenthal (2009) of the Adaptive Metropolis (AM)

algorithm of Haario et al. (2001). Further details will be omitted here.

Referring back to the inference setting, one can construct a Markov Chain with the

posterior distribution f(θ|y) as stationary distribution by taking p(θ) = f(y|θ)π(θ).
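
The steps of the MH algorithm can be sketched as below. This is a minimal illustration rather than the AMM implementation used in this dissertation: the target (a standard normal density known up to a constant), the Gaussian random-walk proposal, the step size, the seed, and the burn-in and thinning settings are all hypothetical. The acceptance ratio is computed on the log scale, with `log_q(x_to, x_from)` denoting the log proposal density of `x_to` given `x_from`.

```python
import math
import random

def metropolis_hastings(log_p, propose, log_q, x0, n_iter, seed=0):
    """Metropolis-Hastings sampler for a target known up to a constant.

    log_p(x): log of a function proportional to the target density.
    propose(x, rng): draws a candidate given the current state x.
    log_q(x_to, x_from): log proposal density of x_to given x_from.
    """
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(n_iter):
        x_new = propose(x, rng)
        # log of p(x') q(x_n | x') / (p(x_n) q(x' | x_n))
        log_ratio = (log_p(x_new) + log_q(x, x_new)) - (log_p(x) + log_q(x_new, x))
        alpha = math.exp(min(0.0, log_ratio))
        if rng.random() <= alpha:
            x = x_new  # accept the candidate
        chain.append(x)  # on rejection the current value is repeated
    return chain

step = 1.0  # proposal standard deviation (needs manual tuning in practice)
log_target = lambda x: -0.5 * x * x
rw_propose = lambda x, rng: x + rng.gauss(0.0, step)
rw_log_q = lambda x_to, x_from: -0.5 * ((x_to - x_from) / step) ** 2

chain = metropolis_hastings(log_target, rw_propose, rw_log_q, x0=3.0, n_iter=20000)
kept = chain[2000::5]  # discard a burn-in period and keep every 5th sample
post_mean = sum(kept) / len(kept)
post_var = sum((v - post_mean) ** 2 for v in kept) / (len(kept) - 1)
```

For the symmetric random-walk proposal used here the two `log_q` terms cancel, which recovers RWM as a special case; an asymmetric proposal only requires a different `propose` and `log_q` pair.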

When the Markov chain is constructed, one needs to determine how many

steps are needed to converge to the stationary distribution within an acceptable

error. Although there is no definitive way to tell whether the chain is long enough,

several diagnostic tools exist. For an in-depth review of these methods, we refer

to Cowles and Carlin (1996) and Brooks and Roberts (1998). Furthermore, since

an arbitrary initialization is chosen, a ‘burn-in’ period is often discarded and since

samples are not independent, the chain is often ‘thinned’, only keeping every kth

sample. The output of this simulated chain θ(1), ..., θ(m) can then be used to estimate characteristics of f(θ|y), such as the expected value of θ:

$$\bar\theta = \frac{1}{m} \sum_{j=1}^{m} \theta^{(j)}.$$

Model Selection

AIC and BIC were introduced in the previous section as model selection crite-

ria. In a Bayesian setting, the deviance information criterion (DIC), a hierarchical

modeling generalization of AIC, is used in model selection problems. More specifically, define the deviance as D(θ) = −2 log(f(y|θ)) and denote the expected deviance by D̄ = Eθ[D(θ)]. Further, the effective number of parameters is pD = D̄ − D(θ̄), where θ̄ is the expectation of θ. The DIC is then given by

DIC = D(θ̄) + 2pD.

From this definition, it is clear that the DIC can be easily calculated from samples obtained by an MCMC approach. As with AIC and BIC, models with smaller

DIC are preferred over models with larger DIC.
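
The calculation of the DIC from posterior samples can be sketched as follows. The example assumes, for illustration only, iid normal data with known standard deviation σ = 1 and a flat prior on the mean θ, so that the posterior is N(ȳ, 1/n) and can be sampled directly; these draws stand in for the output of an MCMC run. The data, the seed and the function names are hypothetical.

```python
import numpy as np

def deviance(theta, y, sigma=1.0):
    """D(theta) = -2 log f(y | theta) for iid normal data with known sigma."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    loglik = (-0.5 * n * np.log(2 * np.pi * sigma ** 2)
              - 0.5 * np.sum((y - theta) ** 2) / sigma ** 2)
    return -2.0 * loglik

def dic_from_samples(samples, y, sigma=1.0):
    """DIC and effective number of parameters from posterior draws of theta."""
    samples = np.asarray(samples, dtype=float)
    d_bar = np.mean([deviance(t, y, sigma) for t in samples])  # expected deviance
    d_at_mean = deviance(samples.mean(), y, sigma)             # deviance at the posterior mean
    p_d = d_bar - d_at_mean
    return d_at_mean + 2.0 * p_d, p_d

rng = np.random.default_rng(42)
y = np.array([1.2, 0.7, 1.9, 1.4, 0.8])  # hypothetical observations
posterior_draws = rng.normal(loc=y.mean(), scale=1 / np.sqrt(len(y)), size=5000)
dic, p_d = dic_from_samples(posterior_draws, y)
```

In this one-parameter model pD equals n times the variance of the posterior draws, so it is close to 1, in line with its interpretation as an effective number of parameters.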

Chapter 2 Data Sources

In this chapter the data sources that are used throughout the thesis will be introduced.

We will use two main types of data in our applications. The first are disease data sets

from a variety of sources. In Section 2.1.1 cross-sectional serological survey data on

varicella-zoster virus (VZV) in twelve countries is discussed. The influenza-like-illness (ILI)

incidence data obtained during the A(H1N1)v2009 influenza epidemic and data from

a prospective study on pertussis within households are introduced in Sections 2.1.2

and 2.1.3. Lastly, the Ebola virus disease (EVD) incidence and mortality data are

introduced in Section 2.1.4. In all chapters, except Chapter 7, the disease data is

augmented with contact rates obtained from social contact surveys. The different

types of such social contact data used in our applications are discussed in Section 2.2.

2.1 Disease Data

2.1.1 Varicella-zoster Virus

VZV is one of the eight known herpes viruses that affect humans. Primary infection

with VZV results in varicella (chickenpox) and mainly occurs in childhood. In

general, the disease is benign, however, symptoms may be more severe in adults and

complications may occur when varicella is acquired during pregnancy. VZV is highly

contagious and transmitted through direct close contact with lesions or indirectly

through air droplets containing virus particles. The incubation period following

VZV infection ranges from 13 to 18 days and each infected person transmits the

virus for about 7 days. The antibody response following primary infection with


VZV is believed to induce lifelong protection against chickenpox. However, the virus

remains dormant within the body and may reactivate and give rise to herpes zoster

(or shingles), a skin disease, after years to decades (Miller et al., 1993). In this

dissertation, we will focus on primary infection and ignore reactivation leading to

zoster.

In Chapter 3, we reanalyze the ESEN2 (European Sero-Epidemiology Network) data

on VZV published by Nardone et al. (2007) together with newly available serology

from Poland and Italy, totaling 13 serosurveys from 12 different countries including

two samples from Italy (see Table 2.1 and Figures 2.1-2.2). At the time of sera

collection, which varied between 1995 and 2004, none of the participating countries

had introduced a universal VZV vaccination program. Sample sizes range from 1268

for Poland to 4398 for Germany, with substantial variability between the surveyed

age ranges.

Table 2.1: Overview of the VZV serological data and demographic parameters.

Country                  Data collection  Age range (years)  Sample size  Life expectancy (years)  Population size
Belgium (BE)             2001-2003        0-71.5             3251         77.6                     10,309,722
Germany (DE)             1995/1998        0-79               4398         77.1                     82,050,377
Spain (ES)               1996             2-39               3590         77.5                     39,427,919
England and Wales (EW)   1996             1-20.9             2032         76.0                     51,125,400
Finland (FI)             1997-1998        1-79.8             2471         76.7                     5,146,965
Ireland (IE)             2003             1-60               2430         77.6                     3,963,814
Israel (IL)              2000-2001        0-79               1543         76.2                     6,223,842
Italy (IT’97)            1996-1997        0.1-50             3110         78.2                     56,872,349
Italy (IT’04)            2003-2004        1-79               2446         80.3                     57,880,478
Luxembourg (LU)          2000-2001        4-82               2640         77.2                     438,723
The Netherlands (NL)     1995-1996        0-79               1967         77.0                     15,493,889
Poland (PL)              1995-2004        1-19               1268         73.2                     38,637,184
Slovakia (SK)            2002             0-70               3515         73.2                     5,378,702

These serological data consist of cross-sectional sets of either residual blood samples

[Figure 2.1: eight panels (Belgium, England and Wales, Finland, Germany, Ireland, Israel, Italy (1997), Italy (2004)); x-axis: age (0 to 80), y-axis: seroprevalence (0.0 to 1.0)]

Figure 2.1: Observed age-specific VZV seroprevalence for Belgium, England and Wales,

Finland, Germany, Ireland, Israel and Italy. The size of the dots is proportional to the

sample size per age category.

collected during routine laboratory tests or population-based random sampling.

Blood samples were tested using an enzyme-linked immunosorbent assay (ELISA),

which is a technique to measure infection-specific IgG antibodies. To allow for

international comparisons, the antibody titers were standardized controlling inter-

laboratory and inter-assay variations (de Ory et al., 2006). The observed IgG level

indicates past infection or vaccination and is classified as seropositive or seronegative

by comparing to the cut-off level (or range) specified by the manufacturer of the

test. Hence in the absence of an immunization program, serological data provide

information on the prevalence of past infection in a population under the assumption


[Figure 2.2: five panels (Luxembourg, Netherlands, Poland, Slovakia, Spain); x-axis: age (0 to 80), y-axis: seroprevalence (0.0 to 1.0)]

Figure 2.2: Observed age-specific VZV seroprevalence for Luxembourg, the Netherlands,

Poland, Slovakia and Spain. The size of the dots is proportional to the sample size per age

category.

of a serological correlate of protection against the infection. The proportion of

seropositives in the sample is called the seroprevalence.

This type of data is type I interval censored data, or current status data, as

an individual’s infection time lies either before (seropositive) or after (seronegative)

the time of sampling. Since the test is based on a pre-specified cut-off, it suffers

from diagnostic uncertainty and both false negatives and false positives can occur.

This may lead to bias when estimating the prevalence and force of infection. To

avoid bias introduced by misclassification, approaches using the continuous antibody

titers have been proposed (Bollaerts et al., 2012). In this thesis, we will focus on the

dichotomized data where equivocal results were included as seropositive. Although

residual samples are considered to be prone to selection bias, it was shown that the

VZV seroprevalence estimated from residual sera collection or population sampling

is similar (Kelly et al., 2002).

Unlike incidence data, i.e. disease notification counts through passive or active


surveillance, and laboratory reports, i.e. laboratory confirmed cases, serological data

do not suffer from bias introduced by changes in clinical awareness or under-reporting.

2.1.2 A(H1N1)v2009

The 2009 H1N1 flu was first detected in the United States in April 2009. Up to then,

this combination of influenza virus genes had never been seen in animals or humans.

It was most closely related to North American swine-lineage H1N1 and Eurasian

lineage swine-origin H1N1 influenza viruses, leading to the term “swine flu”. Unlike

what this term suggests, the virus was typically transmitted from person to person

via respiratory droplets. On June 11, 2009, WHO declared the outbreak a global

pandemic and on August 10, 2010, the end of the H1N1 pandemic was announced

(Centers for Disease Control and Prevention, 2010).

Figure 2.3: Weekly number of ILI cases in five age categories during the early part of the

A/H1N1pdm influenza epidemic in 2009 in England and Wales.

The methodology in Chapter 4 is applied to weekly incidence data on influenza-


like-illness (ILI) obtained from general practitioners’ weekly consultation data from

England and Wales during the early part of the A/H1N1pdm influenza epidemic

in 2009 (weeks 23-29). These data were obtained from weekly reports published

by Public Health England (Public Health England, 2010). We only consider the

exponential growth phase of the epidemic, since the model in Chapter 4 does not

take any intervention strategies into account. Pre-existing immunity to the pandemic

strain was obtained from a serological study in England the year before the pandemic

(Miller et al., 2010). Figure 2.3 shows the incidence in five age classes: 0-4, 5-14, 15-44, 45-65 and 65+ years.

2.1.3 Pertussis

Pertussis, commonly known as whooping cough, is a highly contagious respiratory illness caused by the bacterium Bordetella pertussis. The disease

is spread from person to person by coughing or sneezing. It is characterized by

uncontrollable, heavy coughing which often makes it hard to breathe. The name

‘whooping cough’ is derived from the high-pitched ‘whooping’ sound that may follow

after fits of many coughs. Symptoms usually develop within 5 to 10 days after infection, although the incubation period can be as long as 3 weeks. Early symptoms appear to

be nothing more than a common cold, including a runny nose, light fever and a mild

cough. After 1 to 2 weeks the disease progresses and the typical cough symptoms

of pertussis may appear. These coughing fits can go on for up to 10 weeks or more.

The infection is generally milder for teenagers and adults but can be very serious,

even deadly, for babies less than a year old.

We use data from a prospective study on pertussis within households in the

Netherlands in 2006. A detailed description of this study can be found in de Greeff

et al. (2010). In short, households with an infant aged less than 6 months that had

been hospitalized with laboratory-confirmed pertussis were enrolled in the study. The

hospitalized infant is referred to as the index case. These households were visited by

a study nurse within the first week after the diagnosis of the infant and all household

members were tested for pertussis by PCR, culture and serology. They also received

a questionnaire with questions on clinical symptoms in the past 2 months. This

questionnaire also indicates age, relation to the infected infant, date of symptom

onset (if any) and vaccination status. Follow-up data was collected by phone four

to six weeks after the initial home visit. A household contact was regarded as a

confirmed pertussis case if PCR, culture or serology was positive. The date


of symptom onset was defined as the first day of coughing or cough-preceding cold

symptoms. The 62 households with confirmed cases that did not have a defined date

of symptom onset were excluded. Furthermore, we removed 13 atypical households

(grandparents, uncle/aunt) and 5 households with individuals whose age was

missing. Therefore the data for analysis consisted of 121 households (and index

cases) and 401 household members of which 191 were confirmed cases. The data is

graphically presented in Figure 2.4.

Figure 2.4: Compositions of the households included in the pertussis study (left) and

symptom onset times in days relative to the symptom onset time of the primary case of the

household (right).

2.1.4 Ebola Virus Disease

Ebola virus disease (EVD) is caused by viruses of the genus Ebolavirus, which belongs

to the family Filoviridae together with Cuevavirus and Marburgvirus. There are five

species of Ebolavirus: Zaire, Bundibugyo, Sudan, Reston and Tai Forest. The virus that

caused the most recent West African outbreak in 2014 belongs to the Zaire species. EVD is transmitted to people from wild

animals and spreads through human-to-human transmission. Transmission requires

direct contact with blood, secretions, organs or other bodily fluids of infected persons

or animals. The disease starts as a flu-like syndrome after an incubation period of 2 to

21 days with the onset of fever, fatigue, muscle pain, headache and sore throat. It


then rapidly evolves to severe symptoms such as vomiting, diarrhoea, rash, impaired

kidney and liver function and both internal and external bleeding. Distinguishing

EVD from other infectious diseases such as malaria, typhoid fever and meningitis can

be difficult. Laboratory testing is necessary to confirm that symptoms are caused by

Ebola virus infection. The disease is often fatal in humans with an average fatality

rate around 50% (World Health Organization, 2016a; European Centre for Disease

Prevention and Control, 2016).

Figure 2.5: Transmission electron micrograph of an Ebola virus virion.

EVD first appeared in two simultaneous outbreaks in 1976, one in what is now

Nzara, South Sudan, and the other in Yambuku, Democratic Republic of Congo.

The latter occurred in a village near the Ebola River, from which the disease takes

its name (World Health Organization, 2016a). The current Ebola epidemic in West

Africa was detected in March, 2014. On 8 August, 2014, WHO declared the event

a Public Health Emergency of International Concern (Hawkes, 2014) and the UN

General Assembly declared the epidemic a threat to global health and security

(United Nations Security Council, 2014). On 9 May, 2015, Liberia was declared free

of Ebola virus transmission but on 30 June, 2015, a new case was detected from an

unknown chain of transmission (World Health Organization, 2015a,c). In Guinea

and Sierra Leone, the epidemic persisted in a number of districts mainly between

Conakry and Freetown (UN Mission for Ebola Emergency Response, 2015). As of

24 June, 2015, it has caused 27,443 probable, confirmed, and suspected cases of

EVD in Guinea, Liberia and Sierra Leone, including 11,207 deaths (World Health

Organization, 2015b).


Data on cases and deaths

We used publicly available district-level data on cumulative cases and deaths,

reported from 30 December 2013 until 8 July 2015 through situational reports by

the Ministries of Health of Guinea (Nations U. West Africa, 2015), Liberia (Liberia

MoHaSWRo, 2015) and Sierra Leone (Sanitation Moha, 2015; NERC, 2015). The

data were collected and reported to the national authorities by the Ebola treatment

units and diagnostic testing facilities in the three countries, following national guide-

lines and/or WHO case definitions (World Health Organization, 2014). Data were

reported every two to three days, and more recently on a daily basis. The data sources

provided no detailed information about the used case definition. Data for Liberia

and Guinea were the reported total cumulative number of (suspected, probable and

confirmed) cases and deaths, while for Sierra Leone, we calculated the sum of the

reported suspected, probable and confirmed cases. This allowed us to calculate for

each district the number of new cases and new deaths between two reporting intervals.

A presentation of how the cases were reported can be found in Figure 7.6.

The reporting scheme for deaths was similar, but the dates at which reporting

occurred are not necessarily the same.
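The step from cumulative totals to new cases and deaths per reporting interval amounts to differencing consecutive reports. A minimal Python sketch follows; it is an illustration rather than the code used for the analysis, and the function name `incident_counts` and the example numbers are hypothetical.

```python
import numpy as np

def incident_counts(cumulative):
    """New cases (or deaths) between consecutive reports of one district.

    `cumulative` holds the reported cumulative totals in chronological order.
    The first report has no predecessor, so one fewer value is returned.
    """
    cumulative = np.asarray(cumulative, dtype=float)
    return np.diff(cumulative)

# Hypothetical cumulative case counts from five consecutive reports.
reports = [3, 7, 7, 12, 20]
new_cases = incident_counts(reports)
```

Because reports were not issued at a fixed frequency, each difference refers to the time between two reports rather than to a fixed number of days.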

Data on control measures

Publicly available situation reports of response measures were used to assess

the intensity of interventions (Organization GoGWH, 2015; Exchange HD, 2015).

The publicly available data regarding interventions provided little detail and was

not regular over time or over the entire outbreak region. Due to the complexity of

response measures and limited availability of data, we used the presence of triage

centers, holding or community care centers and Ebola Treatment Units (ETUs) as a

surrogate marker of response activities.

The implemented intervention measures and cumulative numbers of cases and

deaths are displayed in Figures 7.3 and 7.4, respectively.


2.2 Social Contact Data

Mathematical modeling of infectious disease spread requires assumptions on the un-

derlying transmission processes (i.e. β(a, a′) introduced in Section 1.3.2). Since the

spread of airborne or close-contact infections in a population is driven by social con-

tacts between individuals, these assumptions are related to human social interactions.

The frequency and intensity of these interactions typically vary with age, but also de-

pend on disease status (Section 2.2.2) and setting (Section 2.2.3). In the traditional

approach of Anderson and May (1991) mixing patterns are imposed to estimate the

WAIFW matrix from age-specific incidence or serological data. However, it has been

shown that R0 is highly sensitive to the choice of the imposed mixing pattern (Green-

halgh and Dietz, 1994). An alternative to the approach by Anderson and May (1991) is

informing the mixing patterns with data from population-based social contact surveys

and assuming that transmission rates are proportional to contact rates. Recently, sev-

eral studies were conducted to measure social mixing behavior, and Read et al. (2012)

present a review of the different methodologies employed. In the next sections, we

describe three different social contact surveys, i.e. a large multi-country

population-based survey, a contact survey conducted during the A/H1N1pdm influenza epidemic

and a survey on household contacts. We also briefly describe methods for the estima-

tion of contact rates using data as obtained in the first two contact surveys.

2.2.1 POLYMOD Contact Data

In Chapter 3, we use contact data from cross-sectional diary-based surveys that were

conducted between May 2005 and September 2006 as part of the POLYMOD project

(a European Commission project funded within the Sixth Framework Programme).

This project constituted the first large-scale prospective study to investigate social

contact behavior in eight European countries: Belgium, Germany, Finland, Great

Britain, Italy, Luxembourg, the Netherlands and Poland (Mossong et al., 2008a).

Participants were recruited through random-digit dialing, face-to-face interviews or

population registers, and completed a diary-based questionnaire recording social con-

tacts during one randomly assigned day. Parents filled in the diary for young children.

Recruiting participants was done such that the samples were broadly representative

for the study populations in terms of age, sex and geographical spread. Participants

were asked to fill in some general information and record the age and gender of each

contacted person, plus location, duration and frequency of the contact. In case the

exact age was unknown, the participant had to provide an estimated age range. If so,

the mean of this interval was used as a surrogate for the age of the contacted person.

Further, a distinction between two types of contact was made: non-close contacts,

defined as two-way conversations of at least three words in each other's proximity, and

close contacts that involve any sort of physical skin-to-skin touching. For an extensive

description of the survey, we refer to Mossong et al. (2008a).

2.2.2 Contact Behavior during Illness

Recently, the impact of illness on social contact patterns has been investigated. This

was done using data from a social contact survey that was carried out during the

A/H1N1pdm influenza epidemic in England. This survey is described in detail by

Eames et al. (2010). Briefly, participants were recruited into the study through packs

with antiviral medication distributed at thirty-one antiviral distribution centers

throughout England during the epidemic. The packs contained a social contact

diary to be filled in on one day during the time they were symptomatic with ILI.

Two weeks later (by which time participants were expected to have recovered),

participants were sent a similar, follow-up questionnaire. Thus, the study aimed to

obtain two contact diaries from each participant: one completed when the participant

was showing symptoms and one completed after he or she had recovered. In these

contact diaries participants were asked to record details about each person they met

during the course of a day: gender and (estimated) age of the contact, social setting

and duration of the encounter, frequency with which that person was met, and

whether the encounter involved any skin-to-skin contact (e.g., hand-shake, kiss, or

contact sport). A total of 140 participants returned two completed contact diaries.

In Chapter 4 we will use the difference between the contact patterns of ‘healthy’ and

symptomatic individuals to make inferences about parameters related to asymptomatic infection

from the incidence data described in Section 2.1.2.

2.2.3 Estimation of Contact Rates

Contact rates can be estimated from the contact data described in the previous

sections as follows. First, the age dimension is discretized into J age classes $[a_{[j]}, a_{[j+1]})$.

Now, consider a respondent in age class i and let Yij denote the number of contacts

with individuals in age class j during one day. From the contact surveys described

above, we observe values yij,p, p = 1, ..., Pi where Pi is the number of participants in

age class i. Let the expected number of contacts in age class j by an individual in

age class i be denoted by mij = E(Yij). The elements mij make up a J × J social


contact matrix. The yearly contact rates cij , i.e. the annual rate at which individuals

of age class j contact individuals in age class i are then given by:

$$c_{ij} = 365 \times \frac{m_{ji}}{N_i},$$

where Ni denotes the population size in age class i obtained from demographic data.

To account for the reciprocity of social contacts (Wallinga et al., 2006), the total

number of contacts from age class i to age class j must equal the total number of

contacts from age class j to age class i on a population level:

mijNi = mjiNj
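As an illustration of these two relations, the following Python sketch imposes reciprocity on a raw contact matrix and converts it to yearly contact rates. Averaging the two reported totals is an assumption made here for simplicity and is not the smooth-then-constrain approach used later; the function name and the example numbers are hypothetical.

```python
import numpy as np

def reciprocal_contact_rates(m, N):
    """Impose reciprocity on a contact matrix and convert to yearly rates.

    m[i, j] : mean daily number of contacts of a class-i respondent with class j
    N[i]    : population size of age class i
    Returns (m_rec, c) with m_rec[i, j] * N[i] == m_rec[j, i] * N[j]
    and c[i, j] = 365 * m_rec[j, i] / N[i].
    """
    m = np.asarray(m, dtype=float)
    N = np.asarray(N, dtype=float)
    totals = m * N[:, None]               # total contacts from class i to class j
    totals = 0.5 * (totals + totals.T)    # assumption: average the two reported totals
    m_rec = totals / N[:, None]
    c = 365.0 * m_rec.T / N[:, None]
    return m_rec, c

# Hypothetical two-class example.
m = np.array([[8.0, 3.0], [2.0, 6.0]])
N = np.array([1000.0, 2000.0])
m_rec, c = reciprocal_contact_rates(m, N)
```

Under reciprocity the resulting matrix of contact rates is symmetric, since c[i, j] reduces to 365 times the total number of contacts between the two classes divided by N[i] N[j].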

To estimate the contact rates cij from the POLYMOD data described in Section 2.2.1

we will use a bivariate smoothing approach described by Goeyvaerts et al. (2010). In

this approach the average number of contacts mij is modelled as a two-dimensional

continuous function over the age of both respondent and contact resulting in a con-

tinuous contact surface. The basis is a tensor product spline derived from two smooth

functions of the respondent’s and contact’s ages:

$$Y_{ij} \sim \text{NegBin}(m_{ij}, \phi), \quad \text{with} \quad g(m_{ij}) = \sum_{k=1}^{K} \sum_{l=1}^{K} \beta_{kl}\, b_k(a_{[i]})\, d_l(a_{[j]}),$$

where g is a known link function, βkl are unknown parameters, and bk and dl are

known basis functions for the marginal smoothers. The basis dimension K should be

chosen large enough to fit the data well, but small enough to maintain

computational efficiency (Wood, 2006). Goeyvaerts et al. (2010) use thin plate regression

splines and a logarithmic link function. Post-stratification weights are taken into

account and a smooth-then-constrain approach is used to account for the reciprocity

of contacts. The estimated contact surface for Belgium is displayed in Figure 2.6.

From this contact surface we notice a strong main-diagonal, indicating assortative

mixing i.e. high contact rates between persons of the same age, an off-diagonal

parent-child component and a potential grandparent-grandchild component. These

age-specific mixing patterns and contact characteristics were very similar across the

European countries, although the average number of contacts differed substantially.
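To make the tensor product structure concrete, the sketch below evaluates a contact surface from two marginal bases and a coefficient matrix under a log link. It only illustrates the double sum: the Gaussian bumps are a hypothetical stand-in for the thin plate regression splines, the coefficients are arbitrary numbers rather than estimates, and the negative binomial fitting step is not shown.

```python
import numpy as np

def gaussian_basis(ages, centers, width):
    """Hypothetical marginal basis: one Gaussian bump per center."""
    ages = np.asarray(ages, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return np.exp(-0.5 * ((ages[:, None] - centers[None, :]) / width) ** 2)

def contact_surface(beta, B, D):
    """Mean number of contacts m[i, j] under a log link.

    B[i, k] = b_k(a_[i]) and D[j, l] = d_l(a_[j]); the double sum over k and l
    equals the matrix product B @ beta @ D.T.
    """
    return np.exp(B @ beta @ D.T)

ages = np.arange(0, 80, 5) + 2.5            # midpoints of hypothetical 5-year classes
centers = np.linspace(0, 80, 6)             # K = 6 basis functions per margin
B = gaussian_basis(ages, centers, width=15.0)
D = gaussian_basis(ages, centers, width=15.0)
rng = np.random.default_rng(1)
beta = rng.normal(scale=0.3, size=(6, 6))   # arbitrary coefficients, not estimates
m = contact_surface(beta, B, D)
```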


Figure 2.6: Contour plot of the estimated Belgian contact rates derived from the

bivariate smoothing approach applied to the POLYMOD survey data.

For the contact data in Section 2.2.2 no smoothing is applied and the averages

mij are used directly to calculate the social contact matrices Ca and Cs for both

recovered (assumed to be the same as asymptomatic) and symptomatic individuals,

respectively (Van Kerckhove et al., 2013). These matrices are presented in Figure 2.7.

Figure 2.7: Age-specific contact rates for asymptomatic individuals (left) and

symptomatic individuals (right) based on the age classes of the incidence data.


2.2.4 Contact Patterns within Households

Since households are such important units in the transmission of infectious diseases,

we study household contact networks in Chapter 5. The data used result

from a social contact survey conducted in 2010-2011 focusing on households with

young children in the Flemish geographic region including Brussels. Another contact

survey with similar design aimed at gathering individual-level information was con-

ducted in parallel and is described elsewhere (Willem et al., 2012). Participants were

recruited by random digit dialing and stratified sampling ensured representativeness

in terms of geographical spread, day and week-weekend distribution, and age and

gender of the youngest child. All participants were asked to anonymously complete

a paper diary recording their contacts during one randomly assigned day without

changing their usual behavior. Two types of contact diaries were used, adapted to

the age of the participants: one for children (0-12 years) that was designed to be

filled in by a proxy, and one for adolescents and adults (> 12 years). The diaries were

sent and collected by mail. Participants were reminded by phone to fill in the diary

one day in advance and followed up the day after. Data were single entered in a

computer database and independently checked.

The survey focused on households with at least one child of age 12 years or

less. Upon sampling, all persons living more than 50% of the time in the household

were defined as household members and recruited to take part in the survey.

Participants had to record all persons they made contact with, with a contact

being defined as a two-way conversation at less than 3 meters distance or a physical

contact involving skin-to-skin touching (either with or without conversation). The

information recorded included the exact or estimated age (interval) and gender of

each contacted person, physical touching (yes/no), location, frequency and total

duration of the contact, and whether or not the contacted person was a household

member. If two people contacted each other multiple times per day, participants

were instructed to consider that to be a single contact with the duration being total

duration spent in contact with that person during the 24-hour diary period.

From the 342 households that participated in the survey, 24 households were

excluded because of missing contact diaries or non-compliance with the study design.

We analyzed data from 318 households including 1266 participants who recorded

19,685 contacts in total, with household sizes ranging from 2 to 7. Within-household

contacts were identified and matched with other household members using the fol-


lowing criteria: matching household identification number, gender and age (allowing

the recorded age to deviate from the true age by 1 year). As such, all contacts

reported as household contacts could be linked to a unique household member.

Amongst the remaining contacts, i.e. with missing or negative household member

indicator, that occurred at home, an additional small subset of household contacts

is identified using the same criteria as before but requiring an exact age match.

This entailed 3821 identified within-household contacts with 98% reciprocity, i.e.

symmetry in contact reporting, indicating a good quality of reporting as expected

in this household setting (Smieszek et al., 2014). We assumed all social contacts

to be reciprocal, depicting each household as an undirected network where nodes

represent household members and edges represent contacts within the household. An

edge therefore indicates that the corresponding household members made contact at

least once during that day. Contact characteristics of reciprocal contacts are merged

such that the most intense contact value is retained and the location category is set

to ‘multiple’ if two or more different locations are reported. This resulted in a total

of 1946 distinct within-household contacts of which 1861 (96%) involved physical

contact (Figure 2.9). There are 9 participants who did not record any contact with

other household members and are referred to as isolates. Figure 2.8 depicts the

observed within-household physical contact networks by household size.
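The construction of an undirected household network from diary reports can be sketched as follows. This is a simplified illustration under stated assumptions: reciprocity is computed as the share of directed reports whose mirror image was also reported, which is one possible definition, and isolates are taken to be members without any edge. The function name and the example household are hypothetical.

```python
def household_network(members, reports):
    """Build an undirected within-household contact network.

    members : iterable of member identifiers
    reports : iterable of (reporter, contacted) pairs taken from the diaries
    Returns (edges, reciprocity, isolates).
    """
    directed = {(a, b) for a, b in reports if a != b}
    # An edge exists as soon as at least one of the two members reported the contact.
    edges = {frozenset(pair) for pair in directed}
    mirrored = sum(1 for a, b in directed if (b, a) in directed)
    reciprocity = mirrored / len(directed) if directed else float("nan")
    linked = set().union(*edges) if edges else set()
    isolates = sorted(set(members) - linked)
    return edges, reciprocity, isolates

# Hypothetical household of four; child "c2" did not report the father.
members = ["mother", "father", "c1", "c2"]
reports = [("mother", "father"), ("father", "mother"),
           ("mother", "c1"), ("c1", "mother"),
           ("father", "c2")]
edges, reciprocity, isolates = household_network(members, reports)
```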

Figure 2.9 shows that contacts between household members were of long duration,

which is consistent with findings from previous social contact surveys (Mossong et al.,

2008b) and from individual-based simulation models creating so-called synthetic

populations (Del Valle et al., 2007). Further, interactions between household

members occurred (almost) daily and 66% of household members only met each

other at home, while 33% met at multiple locations of which 98% included home.

In the following, we focus on physical contacts since it has been shown that these

better explain the observed age-specific seroprevalence of airborne infections, such as

varicella and parvovirus B19, compared to non-physical contacts (Ogunjimi et al.,

2009; Goeyvaerts et al., 2010; Melegaro et al., 2011).

Age, gender and household size were used to assign the role of child, mother and father

to each household member. Two households of size 4 and 3, respectively, one with a

grandparent and one with a homosexual couple, had a non-traditional configuration

and were excluded from further analysis. The final data set thus consists of 316

households yielding 1259 participants.


Figure 2.8: Observed within-household physical contact networks by household size.

Nodes represent household members and edges represent physical contacts.


Figure 2.9: Barplots of contact intensity distributions (duration, frequency and touching)

and contact location distributions for all contacts recorded with non-household (left bar)

and household members (right bar).

Chapter 3

The Social Contact Hypothesis Under the Assumption of Endemic Equilibrium: Elucidating the Transmission Potential of VZV in Europe.

In Chapter 1 we introduced two key measures of infectious disease transmission,

namely the basic and effective reproduction numbers, R0 and R. There are several

methods to estimate these reproduction numbers (Vynnycky and White, 2010).

In this chapter we focus on deriving R0 from transmission rates that can be

estimated from serological data under the assumption of endemic equilibrium. As

described in Section 1.3.1.3, a disease in endemic equilibrium may undergo cyclical

epidemics, but fluctuates around a stationary average over time. Also remember

that in this equilibrium setting R is expected to be equal to 1 (Diekmann et al., 1990).

We consider pre-vaccination serological data for the varicella-zoster virus from

12 different European countries described in Section 2.1.1. Serological surveys do

not provide complete information about mixing patterns, since they reflect the rate

at which susceptible individuals become infected, but not who is infecting whom.

Hence, to be able to estimate the transmission rates β(a, a′) from this VZV data, we

need to make assumptions about the underlying age-specific mixing patterns. Since

the estimation of R0 and R is sensitive to these assumptions (Greenhalgh and Dietz,

1994), we will inform the mixing pattern with data from the population-based social

contact survey described in Section 2.2.1.

Furthermore, we use the inferred effective reproduction number as a model el-

igibility criterion combined with AIC as a model selection criterion. To our

knowledge, Wallinga and Levy-Bruhl (2001) were the first to use the effective

reproduction number to assess the plausibility of different mixing patterns. However,

this is the first time that R is explicitly used as a determinant in the model selection

procedure. We evaluate how constant and age-specific proportionality factors affect

the fit to the serology and the estimated R0 values. Moreover, we assess the effect

of age-specific heterogeneity related to infectiousness on model eligibility and fit.

Further, from a selected set of demographic, socio-economic and spatio-temporal

factors, we explore which factors best explain the between-country heterogeneity in

R0 using two non-parametric methods: the maximal information coefficient (MIC)

and random forest.

This chapter covers the study in Santermans et al. (2015). It is organized as

follows. In Section 3.1, we will describe the model structure and procedure to

estimate the basic and effective reproduction number. The data application results

are presented in Section 3.1.4. In Section 3.2 we elaborate on the methods used to

determine potential risk factors and the results of this risk factor analysis. Sensitivity

to certain assumptions is assessed in Section 3.3 and we finish the chapter with some

concluding remarks in Section 3.4.

3.1 Estimating the Basic and Effective Reproduc-

tion Number

3.1.1 Mass Action Principle and Mixing Assumptions

To describe VZV transmission dynamics, a compartmental MSIR (Maternal

protection-Susceptible-Infected-Recovered) model for a closed population of size N

with fixed duration of maternal protection A is considered, following Goeyvaerts

et al. (2010) and Ogunjimi et al. (2009). Doing so, we explicitly take into account

the fact that newborns are protected by maternal antibodies and do not take part in

the transmission process. We assume that mortality due to infection can be ignored,

which is plausible for VZV in developed countries, and that infected individuals main-

tain lifelong immunity to varicella after recovery. Further, demographic and endemic

equilibria are assumed (Section 1.3.1.3). Under these assumptions the age-specific

prevalence π(a) is given by:

$$\pi(a) = 1 - \exp\left(-\int_A^a \lambda(u)\, du\right),$$

where λ(a) is the age-specific force of infection. There is a wide range of methods

available to estimate λ(a) from seroprevalence data, see Hens et al. (2010) for an

historical overview.
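For a force of infection that is constant within 1-year age intervals, the prevalence can be evaluated in closed form. The Python sketch below illustrates this; the function name and the example values are hypothetical, and ages beyond the last interval are not handled.

```python
import numpy as np

def seroprevalence(lam, ages, A=0.5):
    """pi(a) = 1 - exp(-integral_A^a lambda(u) du) for a piecewise-constant lambda.

    lam[k] is the force of infection on the age interval [k, k + 1).
    Ages below A return zero: no infection occurs under maternal protection.
    """
    lam = np.asarray(lam, dtype=float)
    ages = np.asarray(ages, dtype=float)
    lower = np.maximum(np.arange(len(lam)), A)    # start of time at risk in class k
    upper = np.arange(1, len(lam) + 1)            # end of class k
    # Time each individual has spent at risk within each age class.
    at_risk = np.clip(np.minimum(ages[:, None], upper[None, :]) - lower[None, :],
                      0.0, None)
    return 1.0 - np.exp(-(at_risk @ lam))

lam = np.full(70, 0.2)                 # hypothetical constant force of infection
pi = seroprevalence(lam, ages=[0.25, 1.0, 5.0, 20.0])
```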

Since we aim to estimate the basic and effective reproduction number, we need to

estimate the age-specific transmission rates β(a, a′) (Section 1.3.2.1). To do so, we

use a slightly adapted version of the mass action principle (1.11), incorporating

maternal protection:

$$\lambda(a) \approx \frac{ND}{L} \int_A^{\infty} \beta(a, a')\, \lambda(a')\, s(a')\, m(a')\, da', \tag{3.1}$$

where N , L, s(a) and m(a) are defined as in Chapter 1. Given the transmission

rates β(a, a′), R0 and R can be obtained following the next generation approach, as

described in Section 1.3.2.2.

Mixing assumptions

The traditional WAIFW approach of Anderson and May (1991) was used in

the exploratory analysis of the data (Nardone et al., 2007) to estimate the transmis-

sion rates. In this chapter, we will inform β(a, a′) with data on social contacts as

described in Section 1.3.2.1:

β(a, a′) = q(a, a′) · c(a, a′).

We will contrast the constant proportionality assumption, or social contact hypothesis

(1.12), against a log-linear function of the age of the susceptible individual, which

entailed an improvement of model fit for VZV in Belgium (Goeyvaerts et al., 2010),

that is, respectively:

log{q(a, a′)} = γ0 and log{q(a, a′)} = γ0 + γ1a. (3.2)
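In a discrete age framework, both proportionality assumptions in (3.2) scale the rows of the contact matrix. The sketch below illustrates this with hypothetical contact rates and parameter values; rows index the age of the susceptible individual and columns the age of the infectious individual.

```python
import numpy as np

def transmission_rates(c, ages, gamma0, gamma1=0.0):
    """beta[i, j] = q(a_i) * c[i, j] with log q(a) = gamma0 + gamma1 * a.

    Rows index the age a of the susceptible individual, columns the age a'
    of the infectious individual. gamma1 = 0 gives the constant
    proportionality factor (social contact hypothesis).
    """
    c = np.asarray(c, dtype=float)
    ages = np.asarray(ages, dtype=float)
    q = np.exp(gamma0 + gamma1 * ages)
    return q[:, None] * c

ages = np.array([2.0, 10.0, 30.0])                 # hypothetical class midpoints
c = np.array([[4.0, 1.0, 2.0],
              [1.0, 9.0, 1.5],
              [2.0, 1.5, 5.0]]) * 1e-6             # hypothetical contact rates
beta_const = transmission_rates(c, ages, gamma0=-2.0)
beta_loglin = transmission_rates(c, ages, gamma0=-2.0, gamma1=-0.03)
```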


The contact rates c(a, a′) are estimated from the POLYMOD contact survey (Sec-

tion 2.2.1) using the bivariate smoothing approach described in Section 2.2.3, consid-

ering those contacts with skin-to-skin touching lasting at least 15 minutes since these

contacts have been shown to be most predictive for VZV (Goeyvaerts et al., 2010;

Melegaro et al., 2011). For the countries that participated in the POLYMOD project,

the corresponding contact rates were used, whereas for the other countries contact

data of a neighboring country or a country with similar school enrollment ages was

used (cf. Table 3.1). We present a sensitivity analysis in Section 3.3.1 to compare

these ad-hoc choices with a more objective selection of contact data by means of AIC.

We observe that the effect on R0 remains within reasonable bounds, which indicates

that the choice of contact data has limited influence on our estimates.

3.1.2 Estimation Procedure

In this chapter we will estimate the force of infection using maximum likelihood

estimation with the Bernoulli log-likelihood given by:

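$$\ell(\lambda; y, a) = \sum_{i=1}^{n} \left[ y_i \log\left(1 - \exp\left(-\int_A^{a_i} \lambda(u)\, du\right)\right) + (1 - y_i) \left(-\int_A^{a_i} \lambda(u)\, du\right) \right]. \tag{3.3}$$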

Here, n denotes the size of the serological data set and yi denotes a binary variable

indicating whether subject i had experienced infection before age ai. The transmis-

sion rates cannot be estimated analytically since the integral equation (3.1) has no

closed form solution. However, it is possible to solve this numerically by turning to

a discrete age framework, assuming a constant force of infection in each 1-year age

interval. Now, estimation proceeds as follows: starting values for the parameters

are provided after which the discretized mass action principle is iterated until

convergence ($\sum_i (\lambda_{i,\mathrm{iter}} - \lambda_{i,\mathrm{iter}-1})^2 < 1 \cdot 10^{-10}$) and finally, the resulting estimate

of the force of infection is contrasted to the serology using the log-likelihood (3.3).

To calculate 95% confidence intervals, non-parametric bootstraps are performed on

both the contact data and the serological data to account for all sources of variability

(Goeyvaerts et al., 2010). The number of bootstrap samples per country is fixed at

2000 with convergence rates varying between 62% and 100%.
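The iteration of the discretized mass action principle can be sketched as a fixed-point scheme. The Python code below is a minimal illustration under assumptions that are ours: the survival probability is taken constant within each 1-year class, the susceptible fraction is exp(-cumulative force of infection) counted from age A, the constant ND/L is passed as a single argument `scale`, and the toy inputs are hypothetical. It is not the discretization used to produce the results.

```python
import numpy as np

def iterate_mass_action(beta, m, scale, lam0, A=0.5, tol=1e-10, max_iter=1000):
    """Fixed-point iteration of a discretized version of (3.1).

    beta  : (J, J) transmission rates between 1-year age classes
    m     : (J,) survival probability per age class
    scale : the constant N * D / L
    lam0  : (J,) positive starting values for the force of infection
    """
    lam = np.asarray(lam0, dtype=float)
    J = lam.size
    # Time at risk per class: only ages above A count.
    lower = np.maximum(np.arange(J), A)
    exposure = np.clip(np.arange(1, J + 1) - lower, 0.0, None)
    for _ in range(max_iter):
        cum = np.concatenate(([0.0], np.cumsum(lam * exposure)))
        s = np.exp(-cum)                    # susceptible fraction at class boundaries
        new_infections = s[:-1] - s[1:]     # integral of lambda * s over each class
        lam_new = scale * (beta @ (m * new_infections))
        if np.sum((lam_new - lam) ** 2) < tol:
            return lam_new
        lam = lam_new
    raise RuntimeError("mass action iteration did not converge")

# Toy example with homogeneous mixing over ten age classes.
J = 10
beta = np.ones((J, J))
m = np.ones(J)
lam = iterate_mass_action(beta, m, scale=0.5, lam0=np.full(J, 0.1))
```

In this sketch a force of infection of zero always satisfies the discretized equation, so the starting values need to be positive for the iteration to reach the endemic solution.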

Since some countries lack serological data on VZV in the older age groups,

the original serology is augmented with simulated data to avoid excess variability of

the bootstrap estimates (Goeyvaerts et al., 2010). These simulations are drawn from

a Bernoulli distribution with mean equal to the seroprevalence from the last 5 age


categories with at least 20 observations available. The size of the simulated samples

is determined by the demography of the population. This method is plausible from

an epidemiological point of view since the VZV seroprofile is not expected to decline

after 20 years of age. Based on the augmented data, post-stratification weights are

calculated using census data and included in the likelihood. The life expectancy

L and the age-specific mortality rates µi for every country are estimated based on

demographic data from the year of serological data collection (Eurostat, the Office

for National Statistics for England and Wales, Israeli Bureau of Statistics for Israel)

using a Poisson model with log link and offset term (Hens et al., 2012). To ensure

flexibility, a radial basis spline is used.

The duration of maternal immunity is fixed at A = 0.5 years, while the mean

duration of infectiousness for VZV is taken as D = 7/365 years. Lastly, to reduce

boundary irregularities induced by sparseness in the contact data for the elderly, the

contact surface, and hence the serological data, are restricted to the 0-69 year age

range. A sensitivity analysis showed little impact on the point estimates (results not

shown).
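For a force of infection that is constant within 1-year age intervals, the log-likelihood (3.3) can be evaluated as in the following Python sketch. The way the post-stratification weights enter, as multiplicative weights on each individual's contribution, is an assumption made for illustration; the function name and the example values are hypothetical.

```python
import numpy as np

def bernoulli_loglik(lam, y, ages, weights=None, A=0.5):
    """Log-likelihood (3.3) for a piecewise-constant force of infection.

    lam[k]  : force of infection on the age interval [k, k + 1)
    y[i]    : 1 if individual i is seropositive, 0 otherwise
    ages[i] : age of individual i (assumed to exceed A)
    weights : optional weights, assumed to multiply each contribution
    """
    lam = np.asarray(lam, dtype=float)
    y = np.asarray(y)
    ages = np.asarray(ages, dtype=float)
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    lower = np.maximum(np.arange(len(lam)), A)
    upper = np.arange(1, len(lam) + 1)
    at_risk = np.clip(np.minimum(ages[:, None], upper[None, :]) - lower[None, :],
                      0.0, None)
    H = at_risk @ lam                        # integral of lambda from A to a_i
    pos = y == 1
    contrib = -H.copy()                      # seronegatives contribute -H
    contrib[pos] = np.log(-np.expm1(-H[pos]))  # seropositives: log(1 - exp(-H))
    return float(np.sum(w * contrib))

lam = np.full(70, 0.2)                       # hypothetical force of infection
ll = bernoulli_loglik(lam, y=[1, 0], ages=[5.0, 2.0])
```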

3.1.3 Model Eligibility and Indeterminacy

The estimated effective reproduction number R and corresponding confidence interval

allow us to check whether the above mixing patterns (3.2) conform with the assump-

tion of endemic equilibrium. In this setting, each infectious individual infects one

other individual on average, hence R is expected to be equal to 1 (Farrington, 2003).

We use this property to exclude those models for which R is estimated to be signifi-

cantly different from 1. Furthermore, the effective reproduction number allows us to

make indirect inference about the age-specific heterogeneity related to infectiousness,

assuming

log{q(a, a′)} = γ0 + γ1a+ γ2a′, (3.4)

where a′ is the age of the infective individual. We refer to this model as the ex-

tended log-linear model, in which γ2 is referred to as an infectiousness component.

Direct inference can be troublesome, as shown by Goeyvaerts et al. (2010), since

serological surveys do not provide information related to infectiousness. This indeter-

minacy can be illustrated as follows: assume for simplicity β(a, a′) = q(a, a′)c(a, a′) =

q0q1(a)q2(a′)c(a, a′). Rewriting (3.1), this implies

$$q_0 q_1(a) = \frac{L \lambda(a)}{ND \int_A^{\infty} q_2(a')\, c(a, a')\, \lambda(a')\, s(a')\, m(a')\, da'},$$


where λ(a), s(a) and c(a, a′) can be estimated from serological data and social contact

data, respectively. This implies that when q0q1(a) is flexibly modeled, the effect of

q2(a′) on the serological model is completely absorbed and the fit of this model does

not change for varying infectivity curves. However, it does affect the estimated value

of R0 and R. We deal with this indeterminacy by letting γ2 vary over a fixed interval

and assessing the effect on R. This way, the value of γ2 can be determined such that

R is not significantly different from 1. This is illustrated in Section 3.1.4.
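The indeterminacy can be checked numerically. In the sketch below, the integral in (3.1) is replaced by a sum over 1-year age classes, the infectiousness curve is taken as exp(γ2 a′), and all inputs are hypothetical random values. For each γ2, the implied q0q1(a) reproduces the same force of infection, so the fit to the serology cannot distinguish between infectiousness curves.

```python
import numpy as np

def implied_susceptibility(lam, s, m, c, ages, gamma2, scale):
    """q0*q1(a) implied by (3.1) when q2(a') = exp(gamma2 * a')."""
    q2 = np.exp(gamma2 * ages)
    return lam / (scale * (c @ (q2 * lam * s * m)))

def mass_action_rhs(q01, lam, s, m, c, ages, gamma2, scale):
    """Right-hand side of (3.1) with beta = q0*q1(a) * q2(a') * c(a, a')."""
    q2 = np.exp(gamma2 * ages)
    beta = q01[:, None] * q2[None, :] * c
    return scale * (beta @ (lam * s * m))

rng = np.random.default_rng(0)
J = 5
ages = np.arange(J) + 0.5
lam = rng.uniform(0.05, 0.3, J)     # hypothetical force of infection
s = np.exp(-np.cumsum(lam))         # hypothetical susceptible fractions
m = np.ones(J)                      # hypothetical survival probabilities
c = rng.uniform(0.5, 2.0, (J, J))   # hypothetical contact rates
scale = 0.1                         # stands in for N * D / L

fits = []
for gamma2 in (-0.05, 0.0, 0.05):
    q01 = implied_susceptibility(lam, s, m, c, ages, gamma2, scale)
    fits.append(mass_action_rhs(q01, lam, s, m, c, ages, gamma2, scale))
```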

3.1.4 Application to the Data

We apply the social contact data approach with a constant and age-specific log-linear

proportionality factor, as in (3.2), to the 13 serological data sets available for

VZV. The estimated basic and effective reproduction numbers for both models

are presented in Figure 3.1 and Table 3.1 together with 95% bootstrap percentile

confidence intervals. The sizes of the dots in the figure are proportional to the Akaike

weights (see Section 1.3.4.1), hence larger dots correspond to smaller AIC values.

These estimates are supplemented with estimates of the mean age at infection in

Table 3.1.

Models are classified as eligible based on the 95% confidence interval for the effective

reproduction number, and eligible models are compared by means of AIC. When the

model with lowest AIC value is eligible, this model is selected. This results in the

age-specific log-linear proportionality factor being preferred for Belgium, Germany,

England and Wales, Ireland, Israel, Italy, The Netherlands and Poland. For Spain

and Slovakia, the constant proportionality factor is sufficient to provide a good fit.

For Finland, the log-linear model is preferred in terms of AIC, but this model is not

eligible, whereas for Luxembourg, neither model is eligible. In both cases, the

constant and basic log-linear models are not capable of providing a good fit to the data.
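
The selection rule can be sketched as follows. The AIC values and confidence limits for R are taken from Table 3.1 for four countries. The Akaike weights use the standard definition exp(−Δ/2) normalized over the candidate models, which is assumed to match Section 1.3.4.1. The function names are illustrative.

```python
import math

def is_eligible(ci_low, ci_high):
    """A model is eligible when the 95% CI for R contains 1 (endemic equilibrium)."""
    return ci_low <= 1.0 <= ci_high

def akaike_weights(aics):
    """Standard Akaike weights for a dict {model: AIC}."""
    best = min(aics.values())
    rel = {k: math.exp(-0.5 * (v - best)) for k, v in aics.items()}
    total = sum(rel.values())
    return {k: r / total for k, r in rel.items()}

def select_model(fits):
    """fits maps model name -> (AIC, lower CI for R, upper CI for R).

    Returns the lowest-AIC model if it is eligible, otherwise None, in which
    case the extended log-linear model is considered.
    """
    best = min(fits, key=lambda k: fits[k][0])
    _, low, high = fits[best]
    return best if is_eligible(low, high) else None

# (AIC, lower, upper) for the constant (CP) and log-linear (LP) models, Table 3.1.
table_3_1 = {
    "Belgium": {"CP": (891, 0.88, 1.34), "LP": (880, 0.88, 1.33)},
    "Spain": {"CP": (2051, 0.85, 1.42), "LP": (2053, 0.84, 1.37)},
    "Finland": {"CP": (682, 0.87, 1.02), "LP": (680, 0.87, 0.99)},
    "Luxembourg": {"CP": (561, 0.83, 0.93), "LP": (550, 0.81, 0.95)},
}

selected = {country: select_model(fits) for country, fits in table_3_1.items()}
weights_be = akaike_weights({k: v[0] for k, v in table_3_1["Belgium"].items()})
```

Under this rule Belgium selects LP and Spain selects CP, while Finland and Luxembourg return no model, which is why the extended log-linear model is considered for those two countries.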

Therefore, we consider the extended log-linear model in (3.4) for Finland and

Luxembourg. Figure 3.2 presents the profile likelihood estimates of R0 and R as a

function of γ2. We observe that by including an infectiousness component in the

proportionality factor, the effective reproduction number R can be estimated closer

to 1. Note that the estimate of R0 decreases quite substantially with decreasing

γ2, in contrast to an increase in R. This reverse relation seems counter-intuitive,

but is caused by an interplay between q(a, a′) and s(a). Now, by performing a

non-parametric bootstrap for every value of γ2 on a specific grid, it is possible to


[Figure 3.1 panels: R (upper) and R0 (lower); horizontal axis: BE, DE, ES, EW, FI, IE, IL, IT('97), IT('04), LU, NL, PL, SK.]

Figure 3.1: Estimated basic and effective reproduction numbers with 95% bootstrap percentile confidence intervals for constant (black), log-linear (gray) and extended log-linear (light gray) proportionality factor. For each country, sizes of the dots are proportional to Akaike weights, hence larger dots correspond to smaller AIC values. The dotted horizontal line indicates the single eligible value for R under endemic equilibrium, which is one.

Chapter 3. The Social Contact Hypothesis Under Endemic Equilibrium

Table 3.1: Estimates of the basic and effective reproduction numbers and transmission parameters (γ0, γ1, γ2) with 95% bootstrap percentile confidence intervals and corresponding AIC values for constant (CP), log-linear (LP) and extended log-linear (EP) proportionality assumptions. Estimates for EP are obtained using a profile likelihood-based assessment of model eligibility. Final models are indicated in bold.

| Country | Model | R0 | R | Mean age at infection (years) | γ0 | γ1 | γ2 | AIC | Contact data |
|---|---|---|---|---|---|---|---|---|---|
| BE | CP | 8.36 [6.46, 10.05] | 1.00 [0.88, 1.34] | 4.26 [3.74, 5.13] | -1.71 [-2.12, -1.66] | | | 891 | BE |
| BE | LP | 6.40 [5.08, 8.47] | 1.03 [0.88, 1.33] | 3.98 [3.60, 4.54] | -1.49 [-1.92, -1.03] | -0.021 [-0.056, 0.003] | | 880 | BE |
| DE | CP | 6.07 [4.70, 6.66] | 0.97 [0.90, 1.23] | 4.93 [4.72, 5.83] | -1.71 [-2.12, -1.71] | | | 1168 | DE |
| DE | LP | 5.52 [4.65, 7.20] | 0.97 [0.89, 1.21] | 4.79 [4.84, 5.69] | -1.62 [-2.07, -1.50] | -0.007 [-0.014, 0.015] | | 1168 | DE |
| ES | CP | 4.48 [3.86, 5.19] | 1.04 [0.85, 1.42] | 6.24 [5.25, 7.65] | -2.59 [-2.96, -2.63] | | | 2051 | IT |
| ES | LP | 4.51 [3.94, 7.52] | 1.04 [0.84, 1.37] | 6.26 [6.04, 7.41] | -2.60 [-3.12, -2.43] | 0.001 [-0.019, 0.036] | | 2053 | IT |
| EW | CP | 2.83 [2.44, 3.06] | 0.95 [0.94, 0.99] | 11.96 [11.28, 14.70] | -2.45 [-2.76, -2.50] | | | 3010 | GB |
| EW | LP | 2.75 [2.47, 2.95] | 0.98 [0.91, 1.17] | 11.05 [10.23, 14.46] | -1.53 [-1.87, -1.12] | -0.084 [-0.138, -0.050] | | 2831 | GB |
| FI | CP | 4.89 [4.31, 5.80] | 0.94 [0.87, 1.02] | 5.19 [4.76, 5.91] | -1.90 [-2.12, -1.83] | | | 682 | FI |
| FI | LP | 5.32 [4.47, 8.10] | 0.93 [0.87, 0.99] | 5.40 [5.02, 5.97] | -2.03 [-2.28, -1.78] | 0.014 [-0.006, 0.035] | | 680 | FI |
| FI | EP | 4.81 [4.13, 6.46] | 0.93 [0.88, 1.00] | 5.40 [5.02, 5.97] | -2.10 [-2.34, -1.84] | 0.010 [-0.005, 0.035] | -0.008 | 680 | FI |
| IE | CP | 4.97 [4.08, 5.32] | 0.92 [0.88, 1.04] | 5.32 [5.31, 6.64] | -1.83 [-2.21, -1.83] | | | 1672 | GB |
| IE | LP | 3.85 [3.41, 4.29] | 0.97 [0.88, 1.13] | 6.22 [5.67, 7.57] | -1.25 [-1.76, -1.01] | -0.074 [-0.098, -0.023] | | 1576 | GB |
| IL | CP | 11.93 [8.83, 14.34] | 0.96 [0.88, 1.27] | 5.00 [4.53, 6.27] | -1.39 [-1.77, -1.27] | | | 789 | BE |
| IL | LP | 4.76 [4.23, 7.49] | 1.05 [0.89, 1.33] | 4.79 [4.37, 5.99] | -0.76 [-1.42, -0.35] | -0.069 [-0.112, -0.016] | | 729 | BE |
| IT('97) | CP | 3.85 [3.39, 4.32] | 0.98 [0.88, 1.35] | 8.50 [8.21, 9.92] | -2.86 [-3.24, -2.94] | | | 2033 | IT |
| IT('97) | LP | 4.37 [3.61, 6.45] | 0.95 [0.89, 1.19] | 8.77 [8.62, 10.06] | -3.16 [-3.56, -3.01] | 0.022 [0.005, 0.042] | | 2000 | IT |
| IT('04) | CP | 3.99 [3.45, 4.65] | 0.98 [0.88, 1.42] | 8.22 [7.63, 9.57] | -2.81 [-3.22, -2.92] | | | 1194 | IT |
| IT('04) | LP | 4.15 [3.63, 5.30] | 0.96 [0.88, 1.30] | 8.45 [8.16, 9.76] | -2.98 [-3.44, -2.90] | 0.011 [0.002, 0.034] | | 1190 | IT |
| LU | CP | 7.28 [6.04, 8.89] | 0.86 [0.83, 0.93] | 4.33 [3.91, 5.30] | -1.97 [-2.30, -1.90] | | | 561 | LU |
| LU | LP | 6.67 [5.75, 8.63] | 0.87 [0.81, 0.95] | 3.92 [3.63, 4.73] | -1.60 [-2.02, -1.32] | -0.028 [-0.049, 0.002] | | 550 | LU |
| LU | EP | 4.99 [4.23, 6.07] | 0.89 [0.82, 1.00] | 3.88 [3.58, 4.70] | -1.47 [-1.86, -1.18] | -0.030 [-0.047, 0.005] | -0.052 | 550 | LU |
| NL | CP | 8.47 [5.74, 14.18] | 0.89 [0.72, 1.40] | 3.40 [2.95, 4.73] | -1.77 [-2.39, -1.61] | | | 400 | NL |
| NL | LP | 7.60 [5.71, 12.87] | 1.00 [0.69, 1.55] | 2.77 [2.68, 3.89] | -1.13 [-2.03, -0.32] | -0.064 [-0.167, -0.003] | | 359 | NL |
| PL | CP | 3.75 [3.16, 4.57] | 0.93 [0.90, 0.99] | 10.34 [8.35, 12.30] | -2.56 [-2.86, -2.47] | | | 1724 | PL |
| PL | LP | 3.37 [2.93, 4.19] | 0.94 [0.86, 1.07] | 9.53 [7.76, 11.49] | -1.63 [-2.17, -1.29] | -0.075 [-0.113, -0.025] | | 1599 | PL |
| SL | CP | 5.62 [4.68, 6.27] | 0.90 [0.85, 1.04] | 6.11 [5.93, 7.08] | -2.12 [-2.47, -2.12] | | | 1239 | PL |
| SL | LP | 5.49 [4.77, 7.83] | 0.90 [0.85, 1.00] | 6.08 [6.10, 7.00] | -2.13 [-2.57, -1.93] | -0.003 [-0.015, 0.022] | | 1241 | PL |


determine the maximal value of γ2 such that 1 is within the 95% bootstrap confidence

interval of R. This is illustrated in Figure 3.3.
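
As a minimal sketch of this step, the maximal γ2 can be located by linear interpolation of the upper bootstrap confidence limit of R over the γ2 grid. The grid resembles the one shown for Finland, but the upper limits below are hypothetical values chosen for illustration, not results read from Figure 3.3. The sketch also assumes that the lower confidence limit stays below 1 over the grid, so that only the upper limit determines eligibility.

```python
def max_gamma2_with_eligible_r(gammas, upper_limits, target=1.0):
    """Largest gamma2 at which the interpolated upper CI limit of R reaches target.

    Assumes the upper limit decreases as gamma2 increases, as in Figure 3.3.
    Returns None if the upper limit stays below target on the whole grid.
    """
    pairs = sorted(zip(gammas, upper_limits))          # ascending in gamma2
    segments = list(zip(pairs, pairs[1:]))
    for (g_lo, u_lo), (g_hi, u_hi) in reversed(segments):
        if u_hi >= target:                             # right end already eligible
            return g_hi
        if u_lo >= target:                             # crossing inside this segment
            frac = (u_lo - target) / (u_lo - u_hi)
            return g_lo + frac * (g_hi - g_lo)
    return None

# Hypothetical upper 95% limits of R on a gamma2 grid (illustrative values only).
grid = [-0.1, -0.075, -0.05, -0.025, -0.01, 0.0]
upper = [1.10, 1.06, 1.03, 1.01, 0.995, 0.99]
gamma2_max = max_gamma2_with_eligible_r(grid, upper)
```

With these illustrative numbers the crossing lies between γ2 = −0.025 and γ2 = −0.01, and the interpolated value is approximately −0.015.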

[Figure 3.2 panels: Finland, Luxembourg; horizontal axis: γ2 (−0.30 to 0.00); left axis: R0; right axis: R.]

Figure 3.2: Profile likelihood estimates of R0 (left axis) and R (right axis) as a function of γ2, the parameter related to infectiousness, for Finland and Luxembourg.

[Figure 3.3 panels: Finland, Luxembourg; horizontal axis: γ2; vertical axis: R with 95% confidence interval.]

Figure 3.3: Profile likelihood estimates of R (dots) with interpolated 95% bootstrap percentile confidence intervals (dashed lines) as a function of γ2, the parameter related to infectiousness, for Finland and Luxembourg. The vertical dotted line indicates the value of γ2 for which the upper confidence limit of R equals 1 (horizontal dotted line).


The parameter estimates and confidence intervals for the extended log-linear model

based on these maximal values of γ2 are also displayed in Figure 3.1. We observe

the following: for Finland, the extended model has an improved fit compared to

the constant model and conforms to the endemic equilibrium assumption. For

Luxembourg, only the extended model is eligible, and in addition, it has the lowest

AIC value. Note that the estimate of R0 for Luxembourg decreases considerably.

The estimated seroprevalence curves based on the selected model for each country are

presented in Figures 3.4 and 3.5. The fitted seroprofiles show a similar pattern across

countries, with most infections occurring during early childhood and the estimated

prevalence approaching one as age increases. However, the prevalence does not reach

one in all countries and, for example, Italy has a more particular profile. Looking

at the FOI curves, the largest estimate is observed in the Netherlands (0.57 year−1)

at the age of 5, followed by Luxembourg (0.49 year−1). The largest estimate of R0

is obtained for The Netherlands (7.60) and the lowest for England and Wales (2.75).

For 11 of the 13 data sets, R0 is estimated below 6.

3.2 Elucidating Potential Risk Factors

There is considerable variation in estimated basic reproduction numbers, and hence

in transmissibility, among the countries under consideration. To explore these

differences, a selection of 39 relevant country-specific variables was made, comprising

data on demography, childcare, population density and weather (Table 3.2). To

investigate associations between R0 and these variables, two different non-parametric

approaches are considered.

3.2.1 Maximal Information Coefficient

The maximal information coefficient (MIC) (Reshef et al., 2011) is a measure of two-

variable dependence, designed specifically for rapid exploration of high-dimensional

data sets. The MIC is part of a larger family of maximal information-based non-

parametric exploration statistics, which can be used not only to identify important

relationships in data sets but also to characterize them. The MIC is defined in the

following way: let G denote an x-by-y grid on the scatterplot of the two variables

under consideration for a pair of integers (x, y). Let IG denote the mutual infor-

mation of the probability distribution induced on the boxes of G, where the proba-


[Figure 3.4 panels: Belgium, Germany, Spain, England and Wales, Finland, Ireland; horizontal axis: age (0 to 80 years); left axis: prevalence (0 to 1); right axis: force of infection (1/years).]

Figure 3.4: Observed age-specific VZV seroprevalence (dots) and the profile estimated from the final model selected for each country (solid line). The corresponding force of infection estimates are displayed by the lower solid line.


[Figure 3.5 panels: Israel, Italy (IT('97) and IT('04)), Luxembourg, The Netherlands, Poland, Slovakia; horizontal axis: age (0 to 80 years); left axis: prevalence (0 to 1); right axis: force of infection (1/years).]

Figure 3.5: Observed age-specific VZV seroprevalence (dots) and the profile estimated from the final model selected for each country (solid line). The corresponding force of infection estimates are displayed by the lower solid line.


Table 3.2: Selected set of potential risk factors for varicella. Data sources and missingness are indicated. Reference years were chosen to be as close to the year of serological data collection as possible, conditional on availability.

| ID | Variable description | Source | Missingness (out of 13) |
|---|---|---|---|
| 1 | Proportion of population age 0-4 years (%) | Eurostat | 1 |
| 2 | Probability of dying before age 5 (per 1000 live births) | WHO | 0 |
| 3 | Enrollment rates of children 0-2 years in formal care or early education services (%) | OECD | 0 |
| 4 | Enrollment rates of children 3-5 years in formal care or early education services (%) | OECD | 0 |
| 5 | Population living in an overcrowded household (%) | Eurostat | 1 |
| 6 | Average absolute humidity (mean dew point temperature in °C) | Mathematica | 0 |
| 7 | Standard deviation of absolute humidity (mean dew point temperature in °C) | Mathematica | 0 |
| 8 | Children 0-3 years that receive no form of formal care (%) | Eurostat | 1 |
| 9 | Children 0-3 years that are cared for by only their parents (%) | Eurostat | 1 |
| 10 | Women aged 25-49 years with at least one child aged 0-5 years who are employed (%) | Eurostat | 1 |
| 11 | Women per men (number of women per 100 men) | Eurostat | 1 |
| 12 | Educational level attainment upper secondary (% of population 25-64) | Eurostat | 1 |
| 13 | Unmet medical needs (% of population) | Eurostat | 1 |
| 14 | Inequality of income distribution (ratio of 20% highest income and 20% lowest income) | Eurostat | 1 |
| 15 | Unemployment (% of economically active population) | Eurostat | 0 |
| 16 | General practitioners (per 100,000 inhabitants) | WHO | 1 |
| 17 | Educational level school expectancy: expected years of education over a lifetime (years) | Eurostat | 1 |
| 18 | Population aged 65 years and above (%) | Eurostat | 0 |
| 19 | Poverty rate (% of population that are at risk of poverty after social transfers) | Eurostat | 1 |
| 20 | Average population density (per km2) | Eurostat | 0 |
| 21 | Average household size | Eurostat | 0 |
| 22 | GDP/capita at purchasing power standard | Eurostat | 1 |
| 23 | Birth rate (number of births per 1,000 inhabitants) | Eurostat | 0 |
| 24 | Living area (average m2 per person) | Eurostat | 4 |
| 25 | Population aged 0-14 years (%) | WHO | 0 |
| 26 | Urban population (%) | WHO | 0 |
| 27 | UNDP: index measuring average achievement in 3 dimensions of human development | WHO | 0 |
| 28 | Average number of people per room in occupied housing unit | WHO | 1 |
| 29 | Doctors' consultations (number per capita per year) | OECD | 2 |
| 30 | Children immunized for DTP (%) | OECD | 1 |
| 31 | 65+ population vaccinated against influenza (%) | OECD | 2 |
| 32 | Public expenditure on prevention and public health (% of total health expenditure) | OECD | 2 |
| 33 | Infants vaccinated against invasive disease due to Haemophilus influenzae type b (%) | WHO | 0 |
| 34 | Infants vaccinated against mumps (%) | WHO | 1 |
| 35 | Infants vaccinated against pertussis (%) | WHO | 0 |
| 36 | Infants vaccinated against rubella (%) | WHO | 0 |
| 37 | Breast feeding at 3 months (% of infants) | WHO | 6 |
| 38 | Newborn babies with birth weight > 2.5 kg (%) | WHO | 3 |
| 39 | Total health expenditure (% of GDP) | World Bank | 0 |


bility of a box is proportional to the number of data points falling inside the box.

Now, define the characteristic matrix M = (m_{x,y}) as follows: the (x, y)-th entry m_{x,y} equals

max(I_G)/ log(min(x, y)), where the maximum is taken over all x-by-y grids G. MIC

is the maximum of m_{x,y} over ordered pairs (x, y) such that xy < B, where B is a

function of sample size. The default is B = n^0.6, with n denoting sample size.
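
The definition can be illustrated with a simplified sketch. The MIC maximizes I_G over all placements of an x-by-y grid, whereas the code below only evaluates equal-frequency grids, so it yields a lower bound on the MIC rather than the statistic of Reshef et al. (2011). Function names are illustrative, and tied values are assigned to bins by sort order.

```python
import math
from collections import Counter

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding (nearly) equal numbers of points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // len(values), k - 1)
    return bins

def grid_mutual_information(xbins, ybins):
    """Mutual information I_G of the distribution induced on the grid boxes."""
    n = len(xbins)
    joint, px, py = Counter(zip(xbins, ybins)), Counter(xbins), Counter(ybins)
    mi = 0.0
    for (bx, by), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[bx] / n) * (py[by] / n)))
    return mi

def approximate_mic(xs, ys, exponent=0.6):
    """Maximum of I_G / log(min(x, y)) over equal-frequency grids with x*y < n**exponent."""
    n = len(xs)
    budget = n ** exponent
    best = 0.0
    for x in range(2, n + 1):
        for y in range(2, n + 1):
            if x * y >= budget:
                break
            mi = grid_mutual_information(equal_frequency_bins(xs, x),
                                         equal_frequency_bins(ys, y))
            best = max(best, mi / math.log(min(x, y)))
    return best

# A perfectly monotone relationship on 50 points attains the maximum value of 1.
xs = list(range(50))
mic_monotone = approximate_mic(xs, [2 * v + 1 for v in xs])
mic_scrambled = approximate_mic(xs, [(7 * v) % 50 for v in xs])
```

Note that with n = 13 observations the default bound B = 13^0.6 is roughly 4.7, so only 2-by-2 grids satisfy xy < B.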

3.2.2 Random Forest Approach

Secondly, a random forest approach for regression is used (Breiman, 2001), which is

a class of ensemble methods - methods that generate many classifiers and aggregate

their results - specifically designed for classification and regression trees. Each tree is

constructed using a different bootstrap sample of the data and each node is split using

the best among a subset of predictors randomly chosen at each node. The random

forest algorithm for regression is as follows:

1. Draw ntree bootstrap samples from the original data.

2. For each of the bootstrap samples, grow an unpruned regression tree by sampling

mtry of the predictors and choosing the best split from these variables.

3. Predict new data by averaging the predictions of the ntree trees.

An estimate of the error rate can be obtained as follows: at each bootstrap iteration,

predict the data not in the bootstrap sample (“out-of-bag” or OOB data) using the

tree grown with the bootstrap sample. Average the OOB predictions and calculate

the corresponding error rate. This is called the OOB estimate of error rate.

The “mean of squared residuals” is computed as

$$\mathrm{MSE}_{\mathrm{OOB}} = n^{-1} \sum_{i=1}^{n} \left(y_i - \hat{y}_i^{\mathrm{OOB}}\right)^2,$$

where $\hat{y}_i^{\mathrm{OOB}}$ is the average of the OOB predictions for the ith observation.
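
The bootstrap, random predictor subset and OOB steps above can be sketched in a few lines. This is not the randomForest implementation: each "tree" is a single-split regression stump instead of an unpruned tree, and the predictor subset is drawn once per stump rather than at every node. The data are hypothetical and the function names are illustrative.

```python
import random

def fit_stump(rows, targets, features):
    """Best single split (smallest residual sum of squares) over the given features."""
    best = None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= cut]
            right = [t for r, t in zip(rows, targets) if r[f] > cut]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            rss = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or rss < best[0]:
                best = (rss, f, cut, ml, mr)
    if best is None:                        # sampled features constant in this sample
        mean = sum(targets) / len(targets)
        return lambda r: mean
    _, f, cut, ml, mr = best
    return lambda r: ml if r[f] <= cut else mr

def oob_mse(rows, targets, ntree=200, mtry=1, seed=0):
    """Out-of-bag mean of squared residuals for a bagged ensemble of stumps."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    oob_sum, oob_count = [0.0] * n, [0] * n
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        feats = rng.sample(range(p), mtry)               # random predictor subset
        tree = fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:                          # predict OOB observations only
                oob_sum[i] += tree(rows[i])
                oob_count[i] += 1
    pairs = [(t, s / c) for t, s, c in zip(targets, oob_sum, oob_count) if c > 0]
    return sum((y - yhat) ** 2 for y, yhat in pairs) / len(pairs)

# Hypothetical data: the response depends on the first predictor only.
rows = [(float(i), float((i * 7) % 5)) for i in range(20)]
targets = [0.0 if i < 10 else 1.0 for i in range(20)]
mse = oob_mse(rows, targets, ntree=200, mtry=2, seed=1)
```

Because every prediction entering the average comes from a stump that did not see the observation, the resulting MSE is an internal estimate of prediction error that needs no separate test set.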

Two variable importance measures are then defined as follows. The “mean decrease

in accuracy” is computed from permuting the OOB data. For each tree,

MSEOOB is recorded, and the same is done after permuting each predictor variable.

The differences between the two are then averaged over all trees and normalized by

the standard deviation of the differences. The second measure, the “mean decrease in

node impurity” (reported as the increase in node purity), is the total decrease in

node impurities from splitting on the variable, averaged over all trees. For regression,

the node impurity is measured by the residual sum of squares.

Compared to many other classifiers, the random forest performs very well and is

robust against overfitting (Breiman, 2001). In addition, it has only two tuning parameters -

the number of variables in the random subset at each node and the number of trees

in the forest - and is usually not very sensitive to their values. We use the random

forest algorithm from the randomForest package in R with the default number of

trees (500). The number of split variables is selected such that the highest percentage

of explained variance is obtained. The package produces two measures of importance

of the predictor variables: the “mean decrease in accuracy” (reported as % increase in MSE)

and the “mean decrease in node impurity” (reported as increase in node purity).

3.2.3 Results

Table 3.5 contains the pairs of potential risk factors with the strongest correlation

given by the Spearman correlation coefficient. These correlations can be used to

interpret the relation between R0 and certain factors.

The ten factors with the largest MIC of association with R0, are presented in

Table 3.3 together with the corresponding Spearman correlation coefficients. The negative

correlation for income inequality indicates, for example, that higher inequality of income is associated with lower R0.
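
For reference, the Spearman correlation coefficient is the Pearson correlation of the ranks. A minimal sketch with average ranks for ties follows. The function names are illustrative and the example vectors are hypothetical, not the country-level data.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: a monotone decreasing, nonlinear relationship.
rho = spearman([1.0, 2.0, 3.0, 4.0], [16.0, 9.0, 4.0, 1.0])
```

Since only ranks enter the calculation, any monotone decreasing relationship gives a coefficient of −1, whether or not it is linear.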

Table 3.3: Ten factors with the largest MIC value of association with R0, estimated from

the final model selected for each country, and corresponding Spearman correlation

coefficients ρS .

MIC ρS

1. inequality of income distribution 1.0 -0.64

2. poverty rate 1.0 -0.73

3. % infants vaccinated against mumps 0.65 0.64

4. average square meter living area pp 0.59 0.42

5. % breast feeding at 3 months 0.47 -0.21

6. % employed women 25 - 49 (min. 1 child 0 - 5) 0.46 0.38

7. % infants vaccinated against pertussis 0.38 0.46

8. % infants vaccinated against rubella 0.36 0.51

9. % population aged 0-14 0.32 -0.22

10. total health expenditure 0.32 0.51


Results of the random forest analysis of R0 are summarized in Table 3.4 where the

ten highest scoring factors for both importance measures are given. Comparing the

results of both analyses, we observe that factors related to the distribution of wealth

(inequality of income and poverty rate), vaccination coverage in infants (e.g. mumps

vaccination coverage) and child care attendance (e.g. the percentage of infants that

receive no formal care) seem to be associated with the transmissibility of VZV.

Table 3.4: Ten best scoring factors obtained by a random forest analysis of R0, estimated

from the final selected model for each country, and corresponding Spearman correlation

coefficients ρS .

% increase in MSE ρS Increase in node purity ρS

1. inequality of income distribution -0.64 inequality of income distribution -0.64

2. poverty rate -0.73 poverty rate -0.73

3. total health expenditure 0.51 average population density 0.33

4. % 0-2 that receive no formal care -0.29 % 0-2 that receive no formal care -0.29

5. % infants vaccinated against mumps 0.64 unmet medical needs -0.31

6. % population aged 0-14 -0.22 total health expenditure 0.51

7. % employed women (min. 1 child 0 - 5) 0.38 enrollment rates children 0-2 0.15

8. average square meter living area pp 0.42 average square meter living area pp 0.42

9. average population density 0.33 % 65+ vaccinated against influenza -0.19

10. enrollment rates children 0-2 0.15 % infants vaccinated against mumps 0.64

3.3 Sensitivity Analysis

3.3.1 Contact data

The choice of contact data for countries that do not have contact data was based

on geographical grounds or school enrollment ages. As a result, data from Italy,

Belgium, England and Wales and Poland are used for Spain, Israel, Ireland and

Slovakia, respectively. However, besides school-based contacts, the contact rates

will also depend on e.g. the number of people living in a household. Therefore, we

compared the average household size for these countries at the time of serological

data collection in Table 3.6. Household sizes differ considerably between Belgium and

Israel, and even between Ireland and England and Wales. However, when

selecting other contact data, we cannot expect to have complete agreement on every

relevant factor. For this reason, we performed a sensitivity analysis exploring the

impact of the contact matrix by repeating the estimation with the basic log-linear


Table 3.5: Pairs of potential risk factors with the largest absolute Spearman correlation

coefficient. High scoring factors according to MIC and random forest are indicated in bold.

ID ID ρS

1 Proportion 0-4 25 Proportion 0-14 0.96

36 % infants vaccinated against rubella 34 % infants vaccinated against mumps 0.95

19 Poverty rate 14 Inequality of income distribution 0.95

22 GDP 15 Unemployment -0.91

3 Enrollment rates children 0-2 9 % 0-3 cared for only by their parents -0.90

24 Average m2 living area pp 22 GDP 0.87

1 Proportion 0-4 23 Birth rate 0.86

25 Proportion 0-14 18 Proportion 65+ -0.84

37 Breast feeding at 3 months 20 Average population density -0.82

3 Enrollment rates children 0-2 9 % 0-2 not receiving formal care -0.82

25 Proportion 0-14 23 Birth rate 0.81

13 Unmet medical needs 10 % employed women (min. 1 child 0-5) -0.80

4 Enrollment rates children 3-5 18 Proportion 65+ 0.80

36 % infants vaccinated against rubella 33 % infants vaccinated against hib 0.80

3 Enrollment rates children 0-2 13 Unmet medical needs -0.79

5 Overcrowding 15 Unemployment 0.79

24 Average m2 living area pp 11 Women per men -0.78

15 Unemployment 10 % employed women (min. 1 child 0-5) -0.78

37 Breast feeding at 3 months 31 % 65+ vaccinated against influenza -0.77

24 Average m2 living area pp 15 Unemployment -0.77

27 Human Development Index 13 Unmet medical needs -0.76

23 Birth rate 13 Unmet medical needs -0.76

27 Human Development Index 22 GDP 0.76

9 % 0-2 not receiving formal care 7 SD humidity 0.75

3 Enrollment rates children 0-2 10 % employed women (min. 1 child 0-5) 0.75

26 Urban population 20 Average population density 0.75

31 % 65+ vaccinated against influenza 7 SD humidity -0.75

11 Women per men 10 % employed women (min. 1 child 0-5) -0.74

5 Overcrowding 10 % employed women (min. 1 child 0-5) -0.74

13 Unmet medical needs 8 % 0-3 not receiving formal care 0.73

1 Proportion 0-4 18 Proportion 65+ -0.73

20 Average population density 6 Mean humidity 0.73

37 Breast feeding at 3 months 30 % children immunized for DTP -0.72

3 Enrollment rates children 0-2 15 Unemployment -0.72

4 Enrollment rates children 3-5 32 Public expenditure public health -0.72

24 Average m2 living area pp 10 % employed women (min. 1 child 0-5) 0.72

35 % infants vaccinated against pertussis 19 Poverty rate -0.72

26 Urban population 9 % 0-3 cared for only by their parents -0.71

27 Human Development Index 15 Unemployment -0.71

21 Average household size 17 Educational level school expectancy -0.71

23 Birth rate 10 % employed women (min. 1 child 0-5) 0.70

3 Enrollment rates children 0-2 7 SD humidity -0.70


model using contact data from the other seven countries available in the POLYMOD

study. Doing so, it was possible to identify for every country the contact data that

gave the best fit to the serological data. In Table 3.7 the estimated basic reproduction

numbers are compared with the estimates obtained by using the “minimal AIC

contact data”. We observe that the effect on R0 is small, except for Luxembourg,

Belgium, Finland and Israel where the difference is larger, but still within reasonable

bounds.

Table 3.6: Comparison of the average household size at time of serological data collection.

Country HH size Country HH size

Spain 2.8 Italy 2.5

Israel 3.4 Belgium 2.4

Ireland 3.0 England and Wales 2.3

Slovakia 3.1 Poland 3.0

3.3.2 Risk Factors

Table 3.8 and Table 3.9 show the results obtained for the MIC and random forest

approach, respectively, when repeating the risk factor analysis based on the “minimal

AIC contact data”. We can conclude that the risk factor analysis is quite robust to

changes in the contact matrix, as the most important influential factors do not change.

3.3.3 Perturbations Demographic and Endemic Equilibrium

To explore the effect of perturbations in the demographic and endemic equilibrium,

we model the transmission dynamics of VZV in Belgium based on an age-time SIR

model using a RAS-model (described in Section 1.3.1.2). For the force of infection,

a dynamic model using the social contact approach is considered. Further, we use

time-homogeneous but age-dependent mortality rates µi estimated from mortality

data (Eurostat) and a constant number of newborns. All other parameters (e.g. N ,

L, D) used in this simulation are equal to the ones in the primary analysis. The

proportionality factor and force of infection are based on the estimates obtained for

Belgium under the log-linear proportionality assumption.


Table 3.7: Estimated basic and effective reproduction numbers with 95% bootstrap

percentile confidence intervals and corresponding AIC values for the log-linear model based

on contact data minimizing AIC.

Contact

Country R0 R AIC data

Belgium 7.79 [6.21, 14.67] 1.06 [0.82, 1.44] 877 IT

Germany 5.32 [4.69, 6.24] 0.87 [0.84, 0.93] 1156 LU

Spain 4.13 [3.63, 4.89] 0.89 [0.86, 0.93] 2031 LU

England and Wales 3.16 [2.71, 3.72] 1.06 [0.88, 1.45] 2824 IT

Finland 6.34 [5.01, 23.95] 0.88 [0.79, 1.27] 675 NL

Ireland 3.94 [3.54, 4.97] 1.04 [0.90, 1.33] 1576 BE

Israel 5.94 [5.07, 11.56] 1.08 [0.83, 1.42] 721 IT

Italy (1997) 4.77 [3.55, 6.34] 0.97 [0.93, 1.08] 1987 DE

Italy (2004) 4.15 [3.63, 5.30] 0.96 [0.88, 1.30] 1190 IT

Luxembourg 7.92 [6.55, 11.65] 1.10 [0.82, 1.88] 550 IT

The Netherlands 7.91 [6.29, 10.23] 0.91 [0.79, 1.02] 357 LU

Poland 3.37 [2.93, 4.19] 0.94 [0.86, 1.07] 1599 PL

Slovakia 5.49 [4.77, 7.83] 0.90 [0.85, 1.00] 1241 PL

Table 3.8: Ten factors with the largest MIC value of association with R0, estimated from

the log-linear model using the minimal AIC contact data, and corresponding Spearman

correlation coefficient, ρS .

MIC ρS

1. inequality of income distribution 1.0 -0.70

2. poverty rate 1.0 -0.76

3. average square meter living area pp 0.59 0.67

4. birth rate 0.50 0.22

5. % employed women 25 - 49 (min. 1 child 0 - 5) 0.46 0.49

6. unmet medical needs 0.46 -0.41

7. % infants vaccinated against mumps 0.46 0.56

8. % 65+ vaccinated against influenza 0.44 -0.32

9. women per men 0.41 -0.30

10. human development index 0.36 0.13


Table 3.9: Ten best scoring factors obtained by a random forest analysis of R0, estimated

from the log-linear model using the minimal AIC contact data, and corresponding

Spearman correlation coefficient, ρS .

% increase in MSE ρS Increase in node purity ρS

1. inequality of income distribution -0.70 % employed women (min. 1 child 0 - 5) 0.49

2. poverty rate -0.76 poverty rate -0.76

3. % employed women (min. 1 child 0 - 5) 0.49 inequality of income distribution -0.70

4. unmet medical needs -0.42 unmet medical needs -0.42

5. average square meter living area pp 0.67 urban population 0.37

6. % 0-2 that receive no formal care -0.24 enrollment rates children 0-2 0.23

7. average population density 0.29 % 0-2 that receive no formal care -0.24

8. women per men -0.30 average square meter living area pp 0.67

9. urban population 0.37 GDP/capita 0.42

10. standard deviation of absolute humidity 0.19 % infants vaccinated against hib 0.55

We first run the RAS-model until endemic equilibrium is reached. Afterwards, a

demographic change or vaccination strategy is applied. Seroprevalence data are then

sampled from the resulting prevalence and we repeat our procedure to estimate

R0 and R. We include a simple vaccination strategy by putting S(0) = (1 − p)B

and R(0) = pB for p = 0.4 and p = 0.6. This corresponds to p × 100% of newborns

being immunized instantaneously at birth. Simulated demographic changes follow from

increasing or decreasing the number of births per year while keeping mortality rates

fixed. The obtained results are summarized in Table 3.10.

Table 3.10: Estimates of the basic and effective reproduction number when implementing

a vaccination strategy or changing the birth rate.

Model R0 R

Vaccination 40% 4.97 [4.26, 6.16] 1.06 [0.92, 1.32]

Vaccination 60% 3.77 [3.38, 4.70] 1.15 [1.00, 1.32]

Birth +0.25% 8.39 [6.47, 15.38] 1.11 [0.94, 2.20]

Birth −0.25% 4.75 [4.03, 5.69] 1.02 [0.89, 1.34]

Birth −0.5% 3.13 [2.75, 3.49] 0.99 [0.92, 1.31]

We observe that R increases when a proportion of the newborns is vaccinated and when the number of births increases, and that it decreases when the annual number of births decreases.


3.4 Discussion

In this chapter, we investigated the transmissibility of VZV in 12 European countries

using serological survey data and social contact data. We contrasted the social

contact hypothesis, which is currently the most used approach in the literature,

against an approach reflecting differences in characteristics related to susceptibility

and infectivity. Furthermore, we introduced the effective reproduction number as a

model eligibility criterion and we identified which country-specific socio-demographic

factors are important in explaining differences in transmission potential between

European countries using two non-parametric approaches: the maximal information

coefficient and random forest.

The social contact hypothesis provided a good fit to the VZV seroprevalence

for only 2 out of 12 countries. The other countries benefited from an extended

approach by assuming an age-dependent proportionality factor, which supports and

extends earlier findings of Goeyvaerts et al. (2010) for VZV in Belgium. This may

reflect the additional importance of age-specific characteristics related to susceptibil-

ity and infectiousness, such as the mean infectious period. Furthermore, the social

contact data are used as proxies for events by which an infection is transmitted.

Hence, the proportionality factor can also be considered as an age-specific adjustment

factor relating the true contact rates underlying infection to the social contact

proxies. Alternatively, social data are difficult to collect from young children, with

parents filling out the diary on their behalf. It may well be that they consistently

underestimate the true number of contacts that young children make.

Our analysis directly improves upon the original analysis of the ESEN2 data

on VZV by Nardone et al. (2007) who used the traditional Anderson and May

approach by imposing a 3-parameter structure on the WAIFW matrix (Anderson

and May, 1991). Our method of using R as a model eligibility criterion extends

the approach of Goeyvaerts et al. (2010) by addressing the indeterminacy of the

infectivity parameter. Our results complement those of Melegaro et al. (2011)

who analyzed part of the VZV serology using the social contact hypothesis only.

Comparing the estimated R0 values, we notice that our results in general somewhat

differ from the estimates obtained by Nardone et al. (2007) and Melegaro et al.

(2011). This is not unexpected, since there are differences in methodology and it is

known that transmission assumptions have a large impact on the estimation of R0.

See Table 3.11 for a comparative overview of the results.


Table 3.11: Ranges of estimates of the basic reproduction numbers obtained by

Santermans et al. , Nardone et al. and Melegaro et al. Nardone et al. used a WAIFW

matrix approach for three age groups, whereas Melegaro et al. used the social contact

hypothesis for different stratifications of POLYMOD contact data.

Country    Santermans et al.   Nardone et al.   Melegaro et al.
BE         6.40 - 8.36         6.47             5.47 - 8.75
DE         5.52 - 6.07         5.46
ES         4.48 - 4.51         3.91
EW         2.75 - 2.83         3.83             3.66 - 5.11
FI         4.81 - 5.32         4.85             4.71 - 8.44
IE         3.85 - 4.97         5.22
IL         4.76 - 11.93        7.71
IT ('97)   3.85 - 4.37         3.31             3.98 - 4.64
IT ('04)   3.99 - 4.15
LU         4.99 - 7.28         8.28
NL         7.60 - 8.47         16.91
PL         3.37 - 3.75                          5.27 - 7.5
SL         5.49 - 5.62         5.72

The results in Figure 3.1 indicate that there are substantial epidemiological differ-

ences between European countries. This is important to consider when parametrizing

mathematical models. Childhood vaccination coverage (for different vaccines),

child care attendance, population density and average living area per person were

positively associated with R0, whereas income inequality, poverty, breast feeding,

and the proportion of children under 14 years of age showed negative associations.

While it seems intuitively logical that greater child care attendance and population

density lead to more rapid spread of varicella, other associations are more difficult

to interpret. Less poverty and income inequality, and higher vaccination coverages

may be associated with more affluent societies in which women are more likely to

be employed and children have more universal access to childcare and kindergarten

from an early age on, facilitating the spread of VZV.

In our analyses, we relied on a few assumptions. First of all, we assumed

that the serological status of an individual is a direct measure of his/her current

immunity against VZV (Plotkin, 2010). Further, we considered physical contacts

lasting longer than 15 minutes to be a good proxy for potential varicella transmission

events as shown by Goeyvaerts et al. (2010) for Belgium. Finally, our use of R as

a model eligibility criterion relied on the assumption of endemic equilibrium. This


assumption is supported by the similarity in the results obtained for the two samples of Italy. In addition, most surveys span two seasons, which partly captures any seasonal fluctuation. However, many factors can change the age distribution of VZV cases over time, e.g. changes in demography, medical practice and socio-cultural factors. We performed a sensitivity analysis to get a sense of how R changes when the demographic or endemic equilibrium is perturbed. A more rigorous assessment requires an additional in-depth analysis, which is a topic for future research.

Since direct inference for the infectivity parameter is hindered by the lack of

information regarding infectiousness in the serological data, we estimated this

parameter via indirect inference using the effective reproduction number. This

indeterminacy illustrates that the use of social contact data does not completely

resolve the identifiability issues encountered when estimating mixing patterns from

serological data. Hence, further research is necessary to obtain additional knowledge

about the age-specific susceptibility and infectivity profiles in order to inform the

proportionality factor in this social contact approach.

Chapter 4

Structural Differences in Mixing Behaviour Informing the Role of Asymptomatic Infection and Testing Symptom Heritability

In the absence of effective vaccines or treatment, controlling the spread of an infectious disease during the early stages of an outbreak relies on (i) isolation of symptomatic cases and (ii) tracing and quarantining the contacts of these cases.

Hence, the timing of onset of symptoms relative to the start of infectiousness is a

crucial factor in the success of these public health interventions. It has been shown

that the proportion of asymptomatic infections (i.e. transmission that occurs before

symptom onset or without showing symptoms at all) is a key parameter to predict

whether or not isolation and contact tracing will lead to containment (Fraser et al.,

2004). It is therefore important to use an epidemic model that explicitly takes into

account asymptomatic transmission. However, in many cases the available data is

based on observations of symptomatic individuals only. To overcome this limitation,

models often rely on untestable assumptions, e.g. assuming a fixed proportion of

asymptomatic individuals (Inaba and Nishiura, 2008) or ignoring pre-symptomatic

transmission (Ejima et al., 2013).


Data on social contacts of individuals in a population have already proven to

be a valuable additional source of information when estimating the ‘Who Acquires

Infection From Whom’ (WAIFW) matrix and the basic reproduction number R0 (see

Section 1.3.2.1). More recently, social contact data have also been used to gain insight into the impact of illness on social contact patterns (Eames et al., 2010). It was found that individuals symptomatic with influenza-like illness (ILI) have fewer social contacts than asymptomatic individuals. Furthermore, the age distribution of contacts

differs between symptomatic and asymptomatic cases. These differences in mixing

behavior affect the expected distribution of infection during the early stages of an

outbreak, which allowed Van Kerckhove et al. (2013) to estimate the proportion of ILI

infections caused by asymptomatic cases (34%; CI: 0% - 77%) from ILI incidence data.

Influenza viruses are highly infectious and cases can show a variety of symp-

toms such as fever, runny nose and sore throat. A substantial number of cases also

show little to no apparent symptoms. Several challenge studies have looked at the

dynamics of viral shedding and symptoms following influenza virus infections; for

a review see Carrat et al. (2008). Symptomatic cases are considered to be more

infectious than asymptomatic cases, since it was found that clinical cases have a

higher quantity of virus in nasal wash fluids compared to individuals who did not

develop symptoms. In addition, a positive correlation was found between severity

of illness and the mean quantity of virus. The link between administered dose and

development or degree of symptoms is less clear. Carrat et al. (2008) reported

a negative correlation between inoculated dose and fever, whereas Huang et al.

(2011) did not find a dependency between inoculated dose and disease outcome.

Their findings point to host factors leading to asymptomatic infections. Hence, it is

clear that more research is needed to find the precise link between the amount and

duration of viral shedding, the development and the degree of symptoms and the

transmission of the virus.

In this chapter we will extend the work of Van Kerckhove et al. (2013) by in-

corporating social contact data from asymptomatic and symptomatic individuals

(Section 2.2.2) to inform mixing patterns in a compartmental model described by a

system of ordinary differential equations (Santermans et al., in revision). We will

illustrate inference on parameters related to asymptomatic infection using incidence

data on influenza-like illness (Section 2.1.2). Furthermore, we will also investigate the

possibility that the chance of developing ILI symptoms depends on whether infection

came from a symptomatic or an asymptomatic case. This chapter is organized as


follows. In Section 4.1, we introduce the model structure and estimation procedure.

In Section 4.2, the ILI data are analyzed, the impact of control strategies is discussed

in Section 4.3, and, lastly, Section 4.4 summarizes our main results, conclusions, and

avenues for further research.

4.1 Methodology

Two transmission models are proposed to describe the disease dynamics for influenza

and infections with similar disease progress. In the first model individuals either

develop symptoms or not after a pre-symptomatic stage. We will refer to this model

as the non-preferential transmission model, since the development of symptoms is

independent of the status of the infector. We extend this model by keeping track of

whether a susceptible individual is infected by an asymptomatic or a symptomatic

case. This second model will be referred to as the preferential transmission model.

In Sections 4.1.1 and 4.1.2 the models and underlying structure are introduced. The

estimation procedure is described in Section 4.1.3.

4.1.1 Transmission Models

In Figure 4.1 the non-preferential model is presented in a flow diagram. Note that

superscripts indicate clinical status of the infected individual: symptomatic ‘s’ or

asymptomatic ‘a’.

Figure 4.1: Schematic diagram of the non-preferential transmission model. Superscripts

indicate presence (s) or absence (a) of symptoms.

Hence, we assume that susceptible individuals are infected at rate λ(t). Following infection, individuals enter the exposed compartment (E), in which they are infected but not yet infectious. After a mean latent period 1/γ, individuals become asymptomatic infectious and enter the compartment $I^a_1$. We define φ, with 0 ≤ φ ≤ 1, to be the proportion of cases that will develop symptoms, and 1 − φ the proportion of cases that will remain asymptomatic. Infectious individuals move from the asymptomatic compartment $I^a_1$ to the symptomatic compartment $I^s$ or the asymptomatic compartment $I^a_2$ at rates φθ and (1 − φ)θ, respectively. Finally, asymptomatic and symptomatic individuals recover and move to the recovered compartment (R) at rates $\sigma_a$ and $\sigma_s$, respectively. The corresponding system of ordinary differential equations (ODEs) is given by

\begin{align*}
\frac{dS(t)}{dt} &= -\lambda S(t), \\
\frac{dE(t)}{dt} &= \lambda S(t) - \gamma E(t), \\
\frac{dI^a_1(t)}{dt} &= \gamma E(t) - \theta I^a_1(t), \\
\frac{dI^a_2(t)}{dt} &= (1-\phi)\theta I^a_1(t) - \sigma_a I^a_2(t), \\
\frac{dI^s(t)}{dt} &= \phi\theta I^a_1(t) - \sigma_s I^s(t), \\
\frac{dR(t)}{dt} &= \sigma_a I^a_2(t) + \sigma_s I^s(t).
\end{align*}
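As an illustration, the non-preferential system can be integrated numerically. The following is a minimal sketch in Python, assuming a single homogeneously mixing population (no age structure) and a force of infection of the form q_a(I^a_1 + I^a_2) + q_r q_a I^s. All parameter values, the population size and the function name `non_preferential_rhs` are hypothetical and were chosen only to make the example run; they are not the estimates reported in this chapter.

```python
import numpy as np
from scipy.integrate import solve_ivp

def non_preferential_rhs(t, y, q_a, q_r, gamma, theta, sigma_a, sigma_s, phi):
    """Right-hand side of the non-preferential model without age structure."""
    S, E, Ia1, Ia2, Is, R = y
    # Illustrative homogeneous-mixing force of infection (stand-in for lambda(t)).
    lam = q_a * (Ia1 + Ia2) + q_r * q_a * Is
    dS = -lam * S
    dE = lam * S - gamma * E
    dIa1 = gamma * E - theta * Ia1
    dIa2 = (1 - phi) * theta * Ia1 - sigma_a * Ia2
    dIs = phi * theta * Ia1 - sigma_s * Is
    dR = sigma_a * Ia2 + sigma_s * Is
    return [dS, dE, dIa1, dIa2, dIs, dR]

# Hypothetical values (rates per day), for illustration only.
params = dict(q_a=4e-6, q_r=2.0, gamma=1 / 1.5, theta=2.0,
              sigma_a=1.0, sigma_s=1 / 3.0, phi=0.3)
# Initial state: S, E, Ia1, Ia2, Is, R (10 symptomatic cases seed the outbreak).
y0 = [99_990.0, 0.0, 0.0, 0.0, 10.0, 0.0]

sol = solve_ivp(non_preferential_rhs, (0.0, 200.0), y0,
                args=tuple(params.values()), rtol=1e-8, atol=1e-8)

# The compartments only exchange individuals, so the total is conserved.
total_at_end = sol.y[:, -1].sum()
```

Because the right-hand sides sum to zero, the total population size is conserved along the solution, which is a convenient sanity check on any implementation.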

In contrast, the preferential model differentiates between infection caused by an

asymptomatic case (at rate λa(t)) and by a symptomatic case (at rate λs(t)),

respectively. Figure 4.2 shows a schematic diagram of this model.

Figure 4.2: Schematic diagram of the preferential transmission model. Superscripts

indicate clinical status of the infected individual: symptomatic (s) or asymptomatic (a).

Subscripts indicate whether the infector was symptomatic (s) or asymptomatic (a).


If the infector is asymptomatic, the infected individual will move from S to $E_a$; if the infector is symptomatic, the infected individual will move to $E_s$. Next, cases become asymptomatic infectious at rate γ and move to $I^a_a$ or $I^a_s$. Infected individuals then either develop symptoms or remain asymptomatic. We define $\phi_a$ as the probability that an individual infected by an asymptomatic case remains asymptomatic and $\phi_s$ as the probability that an individual infected by a symptomatic case develops symptoms. Note that we assume the length of the incubation period to be independent of the infector type, i.e. $\gamma_a = \gamma_s = \gamma$ and $\theta_a = \theta_s = \theta$. Under this assumption, the preferential model simplifies to the non-preferential model if $\phi_s = 1 - \phi_a$. The system of ordinary differential equations (ODEs) for the preferential model is given by

\begin{align*}
\frac{dS(t)}{dt} &= -(\lambda_a + \lambda_s) S(t), \\
\frac{dE_a(t)}{dt} &= \lambda_a S(t) - \gamma_a E_a(t), \\
\frac{dE_s(t)}{dt} &= \lambda_s S(t) - \gamma_s E_s(t), \\
\frac{dI^a_a(t)}{dt} &= \gamma_a E_a(t) - \theta_a I^a_a(t), \\
\frac{dI^a_s(t)}{dt} &= \gamma_s E_s(t) - \theta_s I^a_s(t), \\
\frac{dI^a(t)}{dt} &= \phi_a \theta_a I^a_a(t) + (1-\phi_s)\theta_s I^a_s(t) - \sigma_a I^a(t), \\
\frac{dI^s(t)}{dt} &= (1-\phi_a)\theta_a I^a_a(t) + \phi_s \theta_s I^a_s(t) - \sigma_s I^s(t), \\
\frac{dR(t)}{dt} &= \sigma_a I^a(t) + \sigma_s I^s(t).
\end{align*}
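The statement that the preferential model reduces to the non-preferential model when φs = 1 − φa can be checked numerically. The sketch below is an illustration only: it ignores age structure, uses common rates γ and θ for both infector types, and relies on hypothetical parameter values and function names. It integrates both systems and compares the number of symptomatic cases over time.

```python
import numpy as np
from scipy.integrate import solve_ivp

def non_preferential_rhs(t, y, q_a, q_s, gamma, theta, sigma_a, sigma_s, phi):
    S, E, Ia1, Ia2, Is, R = y
    lam = q_a * (Ia1 + Ia2) + q_s * Is
    return [-lam * S,
            lam * S - gamma * E,
            gamma * E - theta * Ia1,
            (1 - phi) * theta * Ia1 - sigma_a * Ia2,
            phi * theta * Ia1 - sigma_s * Is,
            sigma_a * Ia2 + sigma_s * Is]

def preferential_rhs(t, y, q_a, q_s, gamma, theta, sigma_a, sigma_s, phi_a, phi_s):
    S, Ea, Es, Iaa, Ias, Ia, Is, R = y
    lam_a = q_a * (Iaa + Ias + Ia)   # infection caused by asymptomatic cases
    lam_s = q_s * Is                 # infection caused by symptomatic cases
    return [-(lam_a + lam_s) * S,
            lam_a * S - gamma * Ea,
            lam_s * S - gamma * Es,
            gamma * Ea - theta * Iaa,
            gamma * Es - theta * Ias,
            phi_a * theta * Iaa + (1 - phi_s) * theta * Ias - sigma_a * Ia,
            (1 - phi_a) * theta * Iaa + phi_s * theta * Ias - sigma_s * Is,
            sigma_a * Ia + sigma_s * Is]

# Hypothetical values: q_a, q_s, gamma, theta, sigma_a, sigma_s (rates per day).
common = (4e-6, 8e-6, 1 / 1.5, 2.0, 1.0, 1 / 3.0)
t_eval = np.linspace(0.0, 100.0, 101)
opts = dict(t_eval=t_eval, rtol=1e-10, atol=1e-10)

# phi_s = 0.3 and phi_a = 0.7, so that phi_s = 1 - phi_a = phi.
pref = solve_ivp(preferential_rhs, (0.0, 100.0),
                 [99_990.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0],
                 args=common + (0.7, 0.3), **opts)
nonpref = solve_ivp(non_preferential_rhs, (0.0, 100.0),
                    [99_990.0, 0.0, 0.0, 0.0, 10.0, 0.0],
                    args=common + (0.3,), **opts)

# Largest difference between the symptomatic compartments of both models.
max_gap = np.max(np.abs(pref.y[6] - nonpref.y[4]))
```

With φs = 1 − φa the two trajectories of symptomatic cases should agree up to the tolerance of the integrator, since summing the infector-specific compartments of the preferential model recovers the non-preferential equations.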

4.1.2 Age Structure and Social Contacts

Consider a population that is divided in K age categories. The age-specific force of

infection λ(k, t), i.e. the rate at which a susceptible person in age group k acquires

infection at time t, is given by (discretized version of (1.10) in Section 1.3.2.1):

\[
\lambda(k, t) = \sum_{k'=1}^{K} \beta(k, k')\, I(k', t),
\]

where I(k′, t) denotes the total number of infectious individuals in age group k′ at time

t. Further, we follow the social contact hypothesis (1.12) described in Section 1.3.2.1

in which we distinguish between the asymptomatic social contact matrix Ca and

the symptomatic social contact matrix Cs. Hence, ca(k, k′) is the per capita rate

at which an asymptomatic individual in age group k′ makes contact with a person


in age group k. We allow different proportionality factors for asymptomatic and

symptomatic individuals and denote them by qa and qs, respectively. Hence,

\begin{align*}
\beta^a(k, k') &= q_a \cdot c^a(k, k'), \\
\beta^s(k, k') &= q_s \cdot c^s(k, k'),
\end{align*}

with $\beta^a$ and $\beta^s$ the transmission rates of asymptomatic and symptomatic cases, respectively. Lastly, define $q_r = q_s / q_a$ as the relative infectiousness of symptomatic cases versus asymptomatic cases. Then, the force of infection for the non-preferential transmission model is defined as

\[
\lambda_{K \times 1}(t) = \beta^a_{K \times K} \times \left(I^a_{1,K \times 1}(t) + I^a_{2,K \times 1}(t)\right) + \beta^s_{K \times K} \times I^s_{K \times 1}(t),
\]

where × denotes matrix multiplication. For the preferential transmission model, the rates at which a susceptible individual acquires infection from an asymptomatic or a symptomatic individual at time t are given by

\begin{align*}
\lambda_{a,K \times 1}(t) &= \beta^a_{K \times K} \times \left(I^a_{a,K \times 1}(t) + I^a_{s,K \times 1}(t) + I^a_{K \times 1}(t)\right), \\
\lambda_{s,K \times 1}(t) &= \beta^s_{K \times K} \times I^s_{K \times 1}(t).
\end{align*}
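The matrix form of the force of infection for the non-preferential model can be illustrated with a small numerical example. The sketch below assumes K = 3 age groups; the contact matrices, proportionality factors and numbers of infectious individuals are hypothetical values chosen for readability, not data from the social contact survey.

```python
import numpy as np

# Hypothetical contact matrices: entry [k, k'] is the per capita rate at which
# an individual in age group k' makes contact with a person in age group k.
C_a = np.array([[2.0, 1.0, 0.5],
                [1.0, 3.0, 1.0],
                [0.5, 1.0, 2.0]])   # asymptomatic individuals
C_s = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.4],
                [0.2, 0.4, 1.0]])   # symptomatic individuals

q_a = 0.1            # proportionality factor for asymptomatic cases (hypothetical)
q_r = 2.0            # relative infectiousness q_s / q_a (hypothetical)
q_s = q_r * q_a

beta_a = q_a * C_a   # transmission rates of asymptomatic cases
beta_s = q_s * C_s   # transmission rates of symptomatic cases

# Hypothetical numbers of infectious individuals per age group.
I_a1 = np.array([1.0, 2.0, 3.0])
I_a2 = np.array([0.0, 1.0, 1.0])
I_s = np.array([2.0, 0.0, 1.0])

# Age-specific force of infection (vector of length K).
lam = beta_a @ (I_a1 + I_a2) + beta_s @ I_s
```

For these inputs the force of infection is (1.14, 1.68, 1.43) for the three age groups.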

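The total force of infection is then $\lambda_{K \times 1}(t) = \lambda_{a,K \times 1}(t) + \lambda_{s,K \times 1}(t)$. The reproduction numbers for these models can be derived using the next-generation approach (Section 1.3.2.2 and Diekmann et al. (1990)). For the non-preferential model the next generation matrix corresponding with the infected states $(E, I^a_1, I^a_2, I^s)$ is given by

\[
G_{NP} =
\begin{pmatrix}
\frac{\beta^a \Delta S^{\top}}{\theta} + \frac{(1-\phi)\beta^a \Delta S^{\top}}{\sigma_a} + \frac{\phi \beta^s \Delta S^{\top}}{\sigma_s} &
\frac{\beta^a \Delta S^{\top}}{\theta} + \frac{(1-\phi)\beta^a \Delta S^{\top}}{\sigma_a} + \frac{\phi \beta^s \Delta S^{\top}}{\sigma_s} &
\frac{\beta^a \Delta S^{\top}}{\sigma_a} &
\frac{\beta^s \Delta S^{\top}}{\sigma_s} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.
\]

Therefore the reproduction number for the non-preferential model is given by $R_{NP} = \max(\text{eigenvalue}(G_{NP}))$:

\[
R_{NP} = \max\left(\text{eigenvalue}\left(\frac{\beta^a \Delta S^{\top}}{\theta} + \frac{(1-\phi)\beta^a \Delta S^{\top}}{\sigma_a} + \frac{\phi \beta^s \Delta S^{\top}}{\sigma_s}\right)\right),
\]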

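where $A_{c \times d} \Delta B_{c \times 1}$ operates by multiplying the ith row of A with the ith element of B. The next generation matrix for the preferential model corresponding with the infected states $(E_a, E_s, I^a_a, I^a_s, I^a, I^s)$ is given by

\[
G_{P} =
\begin{pmatrix}
\frac{\beta^a \Delta S^{\top}}{\theta} + \frac{\phi_a \beta^a \Delta S^{\top}}{\sigma_a} &
\frac{\beta^a \Delta S^{\top}}{\theta} + \frac{(1-\phi_s) \beta^a \Delta S^{\top}}{\sigma_a} &
\frac{\beta^a \Delta S^{\top}}{\theta} + \frac{\phi_a \beta^a \Delta S^{\top}}{\sigma_a} &
\frac{\beta^a \Delta S^{\top}}{\theta} + \frac{(1-\phi_s) \beta^a \Delta S^{\top}}{\sigma_a} &
\frac{\beta^a \Delta S^{\top}}{\sigma_a} &
0 \\
\frac{(1-\phi_a) \beta^s \Delta S^{\top}}{\sigma_s} &
\frac{\phi_s \beta^s \Delta S^{\top}}{\sigma_s} &
\frac{(1-\phi_a) \beta^s \Delta S^{\top}}{\sigma_s} &
\frac{\phi_s \beta^s \Delta S^{\top}}{\sigma_s} &
0 &
\frac{\beta^s \Delta S^{\top}}{\sigma_s} \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]

Therefore the reproduction number is given by $R_P = \max(\text{eigenvalue}(G_P))$.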
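The reduction of the non-preferential reproduction number to the dominant eigenvalue of a single K × K block can be verified numerically. The sketch below is an illustration under stated assumptions: the contact-based transmission matrices, the susceptible vector S and all rates are hypothetical, and the helper names `delta`, `ngm_non_preferential` and `spectral_radius` are introduced here for illustration only. It builds the full block matrix and compares its largest eigenvalue (in absolute value) with that of the first block.

```python
import numpy as np

def delta(A, b):
    """Multiply the i-th row of A by the i-th element of b."""
    return np.asarray(A, dtype=float) * np.asarray(b, dtype=float)[:, None]

def ngm_non_preferential(beta_a, beta_s, S, theta, sigma_a, sigma_s, phi):
    """Return the full block next generation matrix and its first block."""
    Ba, Bs = delta(beta_a, S), delta(beta_s, S)
    first = Ba / theta + (1 - phi) * Ba / sigma_a + phi * Bs / sigma_s
    Z = np.zeros_like(first)
    G = np.block([[first, first, Ba / sigma_a, Bs / sigma_s],
                  [Z, Z, Z, Z],
                  [Z, Z, Z, Z],
                  [Z, Z, Z, Z]])
    return G, first

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

# Hypothetical inputs for K = 3 age groups.
beta_a = 1e-4 * np.array([[2.0, 1.0, 0.5], [1.0, 3.0, 1.0], [0.5, 1.0, 2.0]])
beta_s = 2e-4 * np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]])
S = np.array([900.0, 800.0, 950.0])

G, first = ngm_non_preferential(beta_a, beta_s, S, theta=2.0,
                                sigma_a=1.0, sigma_s=1 / 3.0, phi=0.3)
R_full = spectral_radius(G)       # from the 4K x 4K block matrix
R_block = spectral_radius(first)  # from the K x K block only
```

Because all block rows except the first are zero, the non-zero eigenvalues of the full matrix coincide with those of its first diagonal block, so both computations give the same reproduction number.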


We divide the population in five age categories based on the age classes of the

incidence data at hand: 0 − 4, 5 − 14, 15 − 44, 45 − 65 and 65+. The data consist

of reported number of new symptomatic cases per age group per week. We take into

account that not all ILI cases are reported via general practitioners and that these

under-reporting rates can differ by age.

We use a likelihood-based approach by assuming

\[
y_{k,j} \sim \text{Po}\left(\rho_k \cdot \left(I^s_{new,k}(j) - I^s_{new,k}(j-1)\right)\right),
\]

where $y_{k,j}$ is the observed number of new cases in age group k in week j. $I^s_{new,k}(t)$ is the expected cumulative number of new symptomatic cases in age group k at time t, obtained by solving $dI^s_{new}(t)/dt = \phi \cdot \theta \cdot I^a_1(t)$ for the non-preferential model and $dI^s_{new}(t)/dt = (1 - \phi_a) \cdot \theta \cdot I^a_a(t) + \phi_s \cdot \theta \cdot I^a_s(t)$ for the preferential model. The age-specific reporting rate of ILI cases is denoted by $\rho_k$ (k = 1, ..., 5).
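The Poisson likelihood above can be evaluated directly from the cumulative number of new symptomatic cases at the end of each week. The following is a minimal sketch, assuming the cumulative incidence has already been obtained from the ODE solution; the function name `poisson_loglik` and the toy numbers are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(y, cum_new_sympt, rho):
    """Poisson log-likelihood of weekly reported cases.

    y             : (K, J) observed new cases per age group and week
    cum_new_sympt : (K, J + 1) expected cumulative new symptomatic cases
                    at the end of weeks 0, 1, ..., J
    rho           : (K,) age-specific reporting rates
    """
    y = np.asarray(y, dtype=float)
    weekly = np.diff(np.asarray(cum_new_sympt, dtype=float), axis=1)
    mu = np.asarray(rho, dtype=float)[:, None] * weekly   # expected reported cases
    return float(np.sum(y * np.log(mu) - mu - gammaln(y + 1.0)))

# Toy example with one age group and two weeks (hypothetical numbers).
y_obs = [[2, 3]]
cum = [[0.0, 4.0, 10.0]]    # weekly increments 4 and 6
rho = [0.5]                 # expected reported cases 2 and 3
loglik = poisson_loglik(y_obs, cum, rho)
```

Taking differences of the cumulative incidence gives the expected number of new symptomatic cases per week, which is then scaled by the reporting rate before entering the Poisson mean.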


The system of differential equations is initiated by taking into account the

pre-existing immunity to the pandemic strain S(0). Furthermore, since we observed

a large impact of the initial number of symptomatic cases Is(0), these five param-

eters (one for each age category) are included in the estimation procedure. The

number of asymptomatic cases at time 0 is assumed to be 0. The initial number

of recovered individuals is then R(0) = N−S(0)−Is(0), with N the population size.

Our aim is to estimate φ, φa, φs, qa, qr, ρk, and Isk(0)(k = 1, ..., 5). Other pa-

rameters are assumed known and were obtained from a literature review on influenza

transmission models by Dorjee et al. (2013). In this review, values were extracted

from studies that estimate (or use) mean, minimum and/or maximum of influenza

parameters. These were summarized into three categories: (1) estimated values,

where an article attempted to estimate parameters from empirical data; (2) referenced

values, where values were adopted from other papers; (3) assumed values, where

values were based on expert opinion or unpublished data sources. Parameters were

summarized as median and range of means, minimum and maximum values from the

reviewed articles. An overview of these parameters is given in Table 4.1. Note that

the subclinical and clinical infectious period refer to the period from infectiousness to

recovery for asymptomatic (1/θ + 1/σa) and symptomatic individuals (1/θ + 1/σs),

respectively. We will use the estimated values when available (median of means),

otherwise the referenced values are assumed to be known.

Table 4.1: An overview of parameters of pandemic influenza A/H1N1 2009 in humans obtained from a literature review (Dorjee et al., 2013). These values were either estimated from empirical data of experimental or observational studies (Est.); or referenced for modeling (Ref.).

Parameter                       Type   Median of means    Median of min.     Median of max.
                                       (range)            values (range)     values (range)
Incubation period               Est.   2.0                1.0 (1.0 - 2.0)    -
(1/γ + 1/θ)                     Ref.   2.0 (1.5 - 3.0)    1                  5
Latent period                   Est.   -                  -                  -
(1/γ)                           Ref.   1.5 (1 - 3.5)      0.9 (0.7 - 1.0)    4.0 (2.0 - 5.0)
Subclinical infectious period   Est.   -                  -                  -
(1/θ + 1/σa)                    Ref.   1.0 (0.5 - 2.5)    -                  2.0
Clinical infectious period      Est.   5.6                1.0                10.0 (8.0 - 12.0)
(1/θ + 1/σs)                    Ref.   3.8 (2.5 - 7.0)    3.8 (1.9 - 4.0)    5.5 (2.9 - 10)

Parameters are estimated via a Markov Chain Monte Carlo (MCMC) approach. This procedure was performed using the LaplacesDemon package (Hall, 2011) in R 3.1.1 and R 3.2.2. A two-phase approach was used, where the first phase consists of the Adaptive-Mixture Metropolis (AMM) algorithm to achieve stationary samples that seem to have converged to the target distribution. In the second phase, Random-Walk Metropolis (RWM), a non-adaptive algorithm, is used to obtain the final samples. In this phase 10,000,000 iterations were conducted, retaining every 1,000th iteration. The burn-in period is based on the convergence diagnostic by Boone, Merrick and Krachey (BMK) (Boone et al., 2012). The univariate prior distributions for all parameters are given in Table 4.2. All of these are uninformative, except for the initial number of symptomatic cases. The number of symptomatic cases at time 0 in age class k is assumed to follow a truncated normal distribution, with mean µk and standard deviation δk based on the observed ILI incidence for age class k in week 22. To ensure that the estimates lie within their proper parameter space, logit transformations are applied for φ, φa, φs and ρk (k = 1, ..., 5) and log transformations for qa and qr. Furthermore, since symptomatic cases are considered to be more infectious than asymptomatic cases, the infectiousness ratio qr is restricted to be larger than 1.
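To make the sampling scheme concrete, the sketch below shows a generic random-walk Metropolis sampler with thinning, applied to a single proportion on the logit scale. It is not the LaplacesDemon implementation used for the analysis: the toy target (a proportion with a U(0, 1) prior and hypothetical binomial data), the step size and the function names are assumptions made for illustration. Note that retaining every 1,000th of 10,000,000 iterations yields 10,000 samples; the example uses far fewer iterations to keep it fast.

```python
import numpy as np

def log_post_z(z, successes=30, trials=100):
    """Log-posterior of z = logit(phi) for hypothetical binomial data."""
    phi = 1.0 / (1.0 + np.exp(-z))               # inverse logit keeps phi in (0, 1)
    log_lik = successes * np.log(phi) + (trials - successes) * np.log1p(-phi)
    log_jac = np.log(phi) + np.log1p(-phi)       # |d phi / d z| = phi * (1 - phi)
    return log_lik + log_jac                     # U(0, 1) prior only adds a constant

def random_walk_metropolis(log_target, z0, n_iter, step, thin, rng):
    """Non-adaptive random-walk Metropolis, retaining every `thin`-th draw."""
    z, lp = z0, log_target(z0)
    kept = []
    for i in range(1, n_iter + 1):
        z_new = z + step * rng.standard_normal()
        lp_new = log_target(z_new)
        if np.log(rng.uniform()) < lp_new - lp:  # Metropolis acceptance step
            z, lp = z_new, lp_new
        if i % thin == 0:
            kept.append(z)
    return np.array(kept)

rng = np.random.default_rng(1)
z_samples = random_walk_metropolis(log_post_z, z0=0.0, n_iter=50_000,
                                   step=0.5, thin=10, rng=rng)
phi_samples = 1.0 / (1.0 + np.exp(-z_samples))   # back-transform to (0, 1)
```

Sampling on the transformed scale means every proposal maps back to a valid proportion; the Jacobian term accounts for the change of variables. For this toy target the posterior is Beta(31, 71), so the sample mean of `phi_samples` should be close to 31/102.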

Table 4.2: Prior distributions for the parameters in the preferential and non-preferential

model.

Parameter Prior distribution

Isk(0) N(µk, δk)(k = 1, ..., 5); truncated(0.1, 1000)

φ U(0, 1)

φa U(0.1, 1)

φs U(0, 0.95)

qa U(0, 10)

qr U(1, 10)

ρk U(0, 1)


4.2 Application to the Data

Using the social contact matrices and the ILI incidence data, we will look into the

estimation of the proportionality factor, qa, the infectiousness ratio qr, the reporting

rates, ρk(k = 1, ..., 5), the proportion of symptomatic infections, φ, for the non-

preferential model and the proportions φa and φs for the preferential model.

4.2.1 Exploratory Analyses

To assess the estimability of the reporting rates ρk(k = 1, ..., 5), we perform an ex-

ploratory analysis in which these reporting rates are replaced by one age-independent

reporting rate ρ. Parameter estimation in the non-preferential and preferential model

is then performed for fixed values of ρ at 0.3, 0.5, 1.0. The results are shown in

Tables 4.3 and 4.4.

Table 4.3: Posterior median, 95% posterior intervals and DIC value for the

non-preferential model for different values of the reporting rate ρ.

Parameter ρ = 0.3 ρ = 0.5 ρ = 1.0

φ 0.34(0.10, 0.90) 0.34(0.083, 0.86) 0.35(0.083, 0.90)

qa 0.060(0.035, 0.079) 0.062(0.039, 0.081) 0.062(0.039, 0.080)

qr 2.79(1.05, 8.90) 2.68(1.04, 9.52) 2.67(1.05, 9.18)

ρ1 0.3 0.5 1.0

ρ2 0.3 0.5 1.0

ρ3 0.3 0.5 1.0

ρ4 0.3 0.5 1.0

ρ5 0.3 0.5 1.0

R 1.47(1.39, 1.56) 1.46(1.39, 1.55) 1.46(1.39, 1.55)

DIC 306.61 307.05 307.56

We conclude for both models that the value of a single age-independent reporting rate ρ does not affect model fit or the other parameter estimates. Hence, it is only possible to estimate the relative differences in reporting rates between age categories. We therefore fix the reporting rate of one randomly chosen age category as a reference: ρ4 = 0.2. This value of 20% is based on a literature search for reporting rates of ILI and influenza. Since no information on reporting rates was found specifically for H1N1 in England, this search was conducted worldwide and included seasonal influenza, e.g. Lunelli et al. (2013). However, since there is so little information on under-reporting, we will only interpret the estimated relative differences.

Table 4.4: Posterior median, 95% posterior intervals and DIC value for the preferential

model for different values of the reporting rate ρ.

Parameter ρ = 0.3 ρ = 0.5 ρ = 1.0

φa 0.98(0.91, 1.00) 0.99(0.92, 1.00) 0.99(0.92, 1.00)

φs 0.28(0.07, 0.64) 0.29(0.07, 0.65) 0.25(0.07, 0.63)

qa 0.11(0.093, 0.12) 0.11(0.092, 0.12) 0.11(0.092, 0.12)

qr 2.32(1.03, 8.97) 2.24(1.04, 9.05) 2.53(1.03, 9.35)

ρ1 0.3 0.5 1.0

ρ2 0.3 0.5 1.0

ρ3 0.3 0.5 1.0

ρ4 0.3 0.5 1.0

ρ5 0.3 0.5 1.0

R 1.45(1.27, 1.62) 1.45(1.26, 1.62) 1.44 (1.26, 1.61)

DIC 290.97 291.78 291.43

4.2.2 Results

Posterior medians and 95% posterior credible intervals for the estimated parameters

and R in the non-preferential model are shown in Table 4.5. Posterior and prior

distributions are plotted in Figure 4.3. Scatter plots are shown in Figure 4.4 and the

estimated number of symptomatic and asymptomatic cases over time are plotted in

Figure 4.5.

The posterior credible intervals for φ and qr are quite wide, indicating that it is difficult to estimate these parameters from the data. This is confirmed by the posterior density plots. A scatter plot of φ versus qr (Figure 4.4) shows a strong link between both parameters, indicating that we can estimate either the proportion of symptomatic cases or the relative infectiousness from the data at hand, but not both simultaneously.


Table 4.5: Posterior median, 95% posterior credible intervals and DIC value for the

non-preferential and preferential model.

Parameter Non-preferential Preferential

φ 0.15(0.04, 0.39)

φa 0.98(0.89, 1.00)

φs 0.22(0.062, 0.58)

qa 0.082(0.069, 0.093) 0.10(0.092, 0.12)

qr 2.76(1.04, 9.21) 2.62(1.04, 9.20)

ρ1 0.21(0.18, 0.25) 0.20(0.17, 0.24)

ρ2 0.20(0.17, 0.22) 0.20(0.18, 0.23)

ρ3 0.23(0.20, 0.25) 0.21(0.19, 0.23)

ρ4 0.2 0.2

ρ5 0.15(0.12, 0.19) 0.15(0.12, 0.18)

R 1.36(1.33, 1.40) 1.41(1.23, 1.63)

DIC 298.75 288.49

Figure 4.3: Prior and posterior distributions for the proportion of cases that develop

symptoms (φ), the proportionality factor for asymptomatic individuals (qa), the relative

infectiousness of symptomatic cases versus asymptomatic cases (qr) and the reporting rates

(ρi, i = 1, 2, 3, 5).

When only 15% of cases develop symptoms, symptomatic cases are estimated to be about 2.76 times more infectious than asymptomatic cases. The reproduction number is estimated to be 1.36. Lastly, compared to the reporting rate in the 45 − 65 years age group, the reporting rate is estimated to be about 1.15 times as high in the 15 − 44 years age group and about 0.75 times as high in the 65+ years age group. The estimated incidence is shown in Figure 4.10.
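These relative reporting rates follow from the posterior medians of the non-preferential model in Table 4.5 and the fixed reference value ρ4 = 0.2:

```latex
\[
\frac{\rho_3}{\rho_4} = \frac{0.23}{0.20} = 1.15,
\qquad
\frac{\rho_5}{\rho_4} = \frac{0.15}{0.20} = 0.75 .
\]
```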

Figure 4.4: Scatter plot of the proportion of cases that develop symptoms (φ), the

proportionality factor for asymptomatic individuals (qa) and the infectiousness ratio (qr).

Figure 4.5: Number of symptomatic (full line) and asymptomatic (dotted line) cases over

time for the five age categories assuming a 20% reporting rate in the 45− 65 age class for

the non-preferential model.

Results for the preferential model are presented in Table 4.5 and Figures 4.6-4.10.


As for the non-preferential model, we conclude that φs and qr are strongly connected and cannot be estimated simultaneously from the data. When 22% of cases infected by a symptomatic case develop symptoms, symptomatic cases are estimated to be about 2.62 times more infectious than asymptomatic cases. Furthermore, this model confirms that the reporting rate in the 65+ age class is lower than in the other age categories. The reproduction number is estimated at 1.41. Lastly, this model has a smaller DIC value (288.49 versus 298.75) and, hence, a better fit than the non-preferential model.

To check whether the preferential model simplifies to the non-preferential model (φs = 1 − φa), the difference between φs and 1 − φa is calculated for each posterior sample. The histogram of this difference is shown in Figure 4.9. The 95% credible interval for the difference is [0.05, 0.55]. Since this interval excludes zero, the preferential model does not simplify to the non-preferential model.
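The check described above amounts to computing the difference for each posterior draw and taking empirical quantiles. A minimal sketch follows; the function name `difference_interval` is hypothetical, and the draws are synthetic placeholders generated from arbitrary Beta distributions, not the posterior samples of the fitted model.

```python
import numpy as np

def difference_interval(phi_s, phi_a, level=0.95):
    """Per-sample difference phi_s - (1 - phi_a) and its equal-tailed interval."""
    d = np.asarray(phi_s, dtype=float) - (1.0 - np.asarray(phi_a, dtype=float))
    alpha = 1.0 - level
    lo, hi = np.quantile(d, [alpha / 2.0, 1.0 - alpha / 2.0])
    return d, (float(lo), float(hi))

# Synthetic stand-ins for posterior draws (illustration only).
rng = np.random.default_rng(0)
phi_s_draws = rng.beta(3, 9, size=10_000)
phi_a_draws = rng.beta(60, 2, size=10_000)

d, (lo, hi) = difference_interval(phi_s_draws, phi_a_draws)
# The models coincide when the difference is zero, so the question is
# whether zero lies inside the interval.
zero_inside = lo <= 0.0 <= hi
```

Working with the paired draws, rather than with the two marginal intervals, preserves any posterior correlation between φs and φa in the resulting interval.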

Figure 4.6: Prior and posterior distributions for the proportion of individuals infected by

a symptomatic case that develop symptoms (φs), the proportion of individuals infected by

an asymptomatic case that remain asymptomatic (φa), the proportionality factor for

asymptomatic individuals (qa), the relative infectiousness of symptomatic cases versus

asymptomatic cases (qr) and the reporting rates (ρi, i = 1, 2, 3, 5).


Figure 4.7: Scatter plot of the proportion of individuals infected by a symptomatic case

that develop symptoms (φs), the proportion of individuals infected by an asymptomatic

case that remain asymptomatic (φa), the proportionality factor for asymptomatic

individuals (qa) and the infectiousness ratio (qr).

Figure 4.8: Number of symptomatic (full line) and asymptomatic (dotted line) cases over

time for the five age categories assuming a 20% reporting rate in the 45− 65 age class for

the preferential model.


Figure 4.9: Histogram of MCMC samples for φs − (1− φa), with φs the proportion of

individuals infected by a symptomatic case that develop symptoms and φa the proportion

of individuals infected by an asymptomatic case that remain asymptomatic in the

preferential model.

4.3 Impact of Home Isolation

One of the possible interventions targeting symptomatic individuals is recommending

time off from work. Van Kerckhove et al. (2013) showed that contacts made

at home are not a proxy for contacts made when symptomatic. Therefore, we

assess the impact of individuals staying at home after symptom onset by assuming

that a proportion p of symptomatic individuals stays home immediately after

symptom onset. The contact matrix for symptomatic individuals Cs is replaced by

pCsh + (1 − p)Cs in which Csh is the contact matrix obtained from contacts made at

home by symptomatic individuals in the social contact survey. Hence, we assume

that these contact rates do not increase when individuals stay at home. The obtained

posterior parameter samples from the (non-)preferential model are used to solve the system of ODEs associated with this isolation model for fixed values of p. This way we can assess the impact of p on the difference in the number of (a)symptomatic cases.

Figure 4.10: Observed (grey bars) and estimated (connected dots) reported weekly incidence for the five age categories. Full line and filled dots are the estimated incidence for the non-preferential model, dotted line and open dots are the estimates for the preferential model.
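The contact matrix used in the home-isolation scenario, pCsh + (1 − p)Cs, is a weighted average of two matrices. A minimal sketch, with a hypothetical function name and hypothetical 2 × 2 matrices standing in for the survey-based contact matrices:

```python
import numpy as np

def isolation_contact_matrix(C_s, C_s_home, p):
    """Contact matrix when a proportion p of symptomatic individuals stays home."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    C_s = np.asarray(C_s, dtype=float)
    C_s_home = np.asarray(C_s_home, dtype=float)
    return p * C_s_home + (1.0 - p) * C_s

# Hypothetical matrices: all contacts while symptomatic, and home contacts only.
C_s = np.array([[4.0, 2.0],
                [2.0, 6.0]])
C_s_home = np.array([[2.0, 1.0],
                     [1.0, 2.0]])

C_half = isolation_contact_matrix(C_s, C_s_home, p=0.5)
```

Setting p = 0 recovers the original symptomatic contact matrix and p = 1 restricts all symptomatic individuals to their home contacts; intermediate values interpolate linearly between the two.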

Figure 4.11 shows the reduction of cases when a proportion p of symptomatic

individuals stays home after symptom onset. As p increases, the reduction in cases

also increases. For the non-preferential model, there is no visible difference between

symptomatic and asymptomatic cases (not shown). Using the preferential model,

we do see a larger reduction in symptomatic cases compared to asymptomatic cases.

Note that the reduction of cases is larger according to the non-preferential model in

comparison with the preferential model.

Figure 4.11: Proportion of cases plotted against the proportion of symptomatic

individuals staying home immediately after symptom onset. Left panel: reduction in total

number of cases for the non-preferential model with 95% confidence intervals. Right panel:

reduction in the number of total, symptomatic and asymptomatic cases for the preferential

model.

4.4 Discussion

In this chapter, we inferred parameters for an epidemic model accounting for asymp-

tomatic transmission and age-dependent under-reporting based on weekly incidence

data and social contact data from symptomatic and asymptomatic individuals.

The differences in mixing behavior between these individuals affect the expected


age-distribution of infection during the early stages of an outbreak (Van Kerckhove

et al., 2013). This makes it possible to estimate parameters related to asymptomatic

infection using data on symptomatic cases only. Furthermore, we compared a simple

SEIR model with asymptomatic infection to a model in which the development of

symptoms depends on the status of the infector.

Using a Bayesian approach on ILI data from England and Wales during the

early stages of the 2009 epidemic (Public Health England, 2010), we showed that

it is possible to either estimate the proportion of symptomatic infections or the

relative infectiousness of symptomatic cases compared to asymptomatic cases in

the non-preferential model. Hence, when one has prior information on one of these

parameters, it is possible to estimate the other one from incidence data. Furthermore,

we found that the data support the preferential transmission hypothesis, i.e. the

development of ILI symptoms depends on whether one was infected by a symptomatic

or asymptomatic case. Both models show a significantly larger under-reporting rate

for people older than 65 years in comparison with 45-65 year olds. This means that

the discrepancy between consultation rates and symptomatic illness rates is larger

for the elderly in comparison with the non-elderly adults, although consultation rates

in this last age category were found to be lower. Also note that the reporting rates

we estimate can possibly account for factors other than the propensity to visit a GP,

e.g. the ability to better fit the data because of working with a hidden layer (Azmon

et al., 2014). Lastly, we assessed the effect of symptomatic individuals staying at

home. Following the preferential transmission hypothesis, we found a reduction in

total number of cases of 39% (0.30, 0.45) when 50% of symptomatic individuals stay home

immediately after symptom onset. If all symptomatic individuals stay home,

a reduction of 63% (0.53, 0.70) is obtained. To assess more subtle scenarios of home

isolation, we will use individual-based models in future research.

Recently, Lin et al. (2016) explored the trade-off between contact rates and

infectiousness (i.e. decreasing contact rates and increasing infectiousness with

increasing symptom severity) using a model similar to our non-preferential model.

They found that R0 varies non-monotonically with symptom severity, implying that

certain interventions, such as antivirals for influenza, can increase R0. Their research

highlights the importance of using empirical data describing the relation between

contact rates and symptom severity in epidemiological models.

The preferential model resembles the infector-dependent severity (IDS) model


described by Ball and Britton (2007). However, they assume a homogeneously mix-

ing population and do not estimate model parameters. They derived a threshold limit

theorem for their model and looked at the effect of vaccination. They showed that

in certain scenarios the proportion of mildly (asymptomatic in our setting) infected

individuals can increase with increasing vaccination coverage. This emphasizes the

practical importance of our model for a wide range of pathogens with different levels

of symptom severity.

One of the limitations of our approach is that the absolute reporting rates are not

estimable from the data. Hence, one can only infer the relative differences in

reporting between age categories. To estimate the true number of cases, information

on the reporting rate in at least one age class is needed. Further, we assume constant

reporting rates over time, since we do not have knowledge about temporal changes in

reporting. Also, the obtained estimates rely on the values of the fixed parameters as

found in the literature. Changing these parameters will affect the estimated target

parameters. Lastly, we use social contact data and incidence data from A/H1N1pdm

in 2009, thus it is uncertain how our conclusions would apply to other influenza strains.

Future research is needed to clarify the exact role of acquired viral dose in the

development of influenza symptoms. Up until now, challenge studies have not given

clear results able to confirm or reject our preferential transmission hypothesis (Carrat

et al., 2008; Huang et al., 2011). Lastly, to extend this model for other diseases more

empirical data on how contact rates change with symptom severity are needed.

Chapter 5

Empirical Household Contact Networks: Challenging the Household Random Mixing Assumption

Households are crucial units in the epidemiology of airborne infectious diseases

such as influenza, smallpox and SARS. Relations between household members are

typically characterized by frequent and intimate contacts, allowing for rapid disease

spread within the household upon introduction of an infectious case. As stated by

Ferguson et al. (2006): “being a member of a household containing an influenza

case is in fact the largest single risk factor for being infected oneself” (Longini

et al., 1982; Cauchemez et al., 2004). Furthermore, households with children have

a bridging function allowing an infection to spread from schools to workplaces and

vice versa. Inference from household final-size data revealed that children play a

key role in bringing influenza infection into the household and in transmitting the

infection to other household members (Cauchemez et al., 2004). Households are the

most common transmission unit used in observational studies and in epidemic models.

Many epidemic models rely on the assumption of homogeneous mixing within

households. In early work by Longini and Koopman (1982); Becker (1989); Addy


et al. (1991), Reed-Frost type of models were used to estimate household and

community transmission parameters from household final size data, assuming a

constant probability of infection from the community. Ball et al. (1997) generalized

this to the so-called ‘households model’ with two levels of mixing, assuming random

mixing within households (local) and in the entire population (global), the latter

typically at a much lower rate. The analytical tractability of the households model

allowed the theoretical study of epidemic phenomena. This has led to the definition of

threshold parameters such as the reproduction number R∗, representing the average

number of households infected by a typical infected household in a totally susceptible

population (Ball et al., 1997; Ball and Neal, 2002). Meyers et al. (2005) used a

contact network model in an urban setting incorporating households as complete

networks (cliques) to explain the early epidemiology of SARS. Individual-based

simulation models of infectious disease spread incorporate detailed individual-level

information so as to mimic demographic and social characteristics of a specific

population (e.g. Chao et al. (2010); Mniszewski et al. (2008); Grefenstette et al. (2013)).

These models sometimes incorporate more detailed structure in specific settings

such as schools and workplaces, but typically assume random mixing in households.

Studies that particularly highlight within-household transmission and control policies

targeting households can be found in Halloran et al. (2002) and Ferguson et al. (2006).

It has been argued that greater realism could be gained by considering differ-

ent household compositions and contact heterogeneity within households (Danon

et al., 2011). When contact heterogeneity between household members is ignored,

the contact network density equals the contact rate between two individuals in

a household and is a determinant for the within-household transmission rate

of airborne infectious diseases (Wallinga et al., 2006; Mossong et al., 2008b).

Until now there was no direct empirical evidence to support the assumption

of homogeneous mixing within households. Egocentric contact surveys entailed

partially observed within-household contact networks and only allowed for indirect

inference of the unobserved network links (Potter et al., 2011; Potter and Hens, 2013).

In this chapter, we describe the first social contact survey specifically designed

to study contact networks within households. This study enables us to empirically

assess the assumption of homogeneous mixing, e.g. by studying the effect of age and

gender on social distance within households. Furthermore, it provides an answer

to one of the key questions to inference on household models: how the density of

the contact network scales with the household size (Danon et al., 2011). Lastly,


this survey makes it possible to assess reporting quality for diary-reported contact

surveys by looking at symmetry in contact reporting. We use Exponential Random

Graph Models (ERGMs; introduced in Section 1.3.3) to gain insight into the factors

driving close contact between household members and to develop a plausible model

for within-household physical contact networks. We then compare these empirically

grounded ERGMs to the assumption of random mixing using stochastic simulations

of an epidemic in the mise en scene of the two-level mixing model. This work is

presented in Goeyvaerts et al. (to be submitted).

In Section 5.1, we provide some more details on the household contact survey

that was introduced in Section 2.2.4. Details on the ERGM, estimates obtained

for the household data and goodness-of-fit results are presented in Section 5.2. In

Section 5.3, we perform epidemic simulations to compare the assumption of random

mixing with the household model inferred in Section 5.2. Lastly, a discussion is given

in Section 5.4.

5.1 Household Contact Survey

We use data from the survey described in Section 2.2.4. Table 5.1 summarizes the

proportion of complete (i.e. fully connected) networks and the mean network density

for the within-household physical contact networks by household size, distinguishing

week from weekend days and regular from holiday periods. The network density

is defined as the ratio of the number of observed edges to the number of potential

edges.
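The density definition above can be sketched for a single household as follows. The household and its edge list are hypothetical, and the function names are chosen for illustration only.

```python
def network_density(n_members, edges):
    """Observed edges divided by the n(n-1)/2 potential edges."""
    potential = n_members * (n_members - 1) // 2
    if potential == 0:
        raise ValueError("density is undefined for fewer than two members")
    observed = {frozenset(edge) for edge in edges}  # undirected, no duplicates
    return len(observed) / potential


def is_complete(n_members, edges):
    """A network is complete when every potential edge is observed."""
    return network_density(n_members, edges) == 1.0


# Hypothetical household of size 4 in which members 2 and 3 had no contact.
household_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
print(network_density(4, household_edges))  # 5 of the 6 potential edges
print(is_complete(4, household_edges))
```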

Overall, the type of day does not seem to have a large impact on the connectedness

within households. However, the data suggest some decreasing connectedness

with increasing household size, mainly on weekdays and during regular periods. For

2-parent households of size 4, the observed proportion of complete networks is 0.77

on weekdays and 0.85 on weekend days. Estimates inferred by Potter et al. (2011)

and Potter and Hens (2013) using Belgian egocentric contact data for the same type

of household, were smaller and ranged from 0.34 to 0.65. For the purpose of studying

household contacts, we consider our survey design an improvement upon the design

of the source data for those two studies (POLYMOD study, Mossong et al. (2008b)).

Our survey is quite similar to the POLYMOD design, but all household members

are recruited to take part in the survey and participants have to identify whether


each contacted person is a member of his/her household. In the two aforementioned

network analyses of POLYMOD data, a reported contact was assumed to be with

a household member if it occurred at home, was reported as daily or almost daily

and if the age matched one of the reported ages of household members. This way,

only a partial network was observed for each household since information on contacts

between the respondent and household members was available, but not on contacts

between other members. The density of contacts within households was, therefore,

likely underestimated.

Various measures of within-household clustering are defined in Section A.2 and

Table A.1 shows the high degree of physical contact clustering observed within

households.

                     Week                                Weekend
HH size   Nr. HHs   Proportion   Mean       Nr. HHs   Proportion   Mean
                    complete     density              complete     density
2             9     1.00         1.00           3     1.00         1.00
3            53     0.91         0.96          19     0.74         0.88
4           111     0.77         0.93          48     0.85         0.96
5            39     0.64         0.90          18     0.78         0.95
≥ 6          13     0.46         0.85           3     1.00         1.00
Total       225     0.77         0.93          91     0.82         0.94

                     Regular period                      Holiday period
HH size   Nr. HHs   Proportion   Mean       Nr. HHs   Proportion   Mean
                    complete     density              complete     density
2             9     1.00         1.00           3     1.00         1.00
3            42     0.86         0.94          30     0.87         0.93
4           105     0.82         0.94          54     0.76         0.93
5            38     0.66         0.91          19     0.74         0.92
≥ 6          12     0.50         0.84           4     0.75         0.98
Total       206     0.79         0.93         110     0.79         0.93

Table 5.1: Proportion of complete networks and mean network density, stratified by

household size, for the observed within-household physical contact networks, comparing

week and weekend days (top) and regular and holiday periods (bottom).


5.2 ERGMs for Within-household Physical Contact

Networks

We use ERGMs (see Section 1.3.3 for details) to model the within-household physical

contact networks. We infer on the processes driving physical contacts between

household members by incorporating network statistics based on nodal covariate in-

formation (Table 5.2). Although our analysis is focused on within-household contact

networks, we fit a single ERGM including all households. We include in our model

both the total number of edges and a household effect which captures the tendency

to contact others in one’s own household. Because there are no between-household

contact reports present in our survey, the probability of between-household contact

should be zero. Constrained optimization with fixed coefficients for these statistics as

proposed by Potter and Handcock (2010) does not entail a plausible approximation

of the likelihood for our data. Therefore, we use unconstrained optimization and

check whether the probability of physical contact between non-household members is

approximately zero.

We explore the effect of relationships i.e. mixing among siblings, between chil-

dren and their parents and between partners, gender-preferential mixing and age

effects in children, and the effect of household size, distinguishing small (≤ 3

members), medium (4 members) and large (≥ 5 members) households.

The above network statistics are all dyad independent: the vector of change statistics

δg(y,X)ij does not depend on the value y. We explore the presence of higher-order

dependency effects between members of the same household, such as clustering (see

Table A.1), by including in the model the number of isolate individuals, 2-stars,

triangles and triangles in households of size ≥ 6. A 2-star is a person connected to

two other household members and a triangle is a set of three household members

such that all three are connected to each other.
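As an illustration of these two statistics, the sketch below counts 2-stars and triangles in one small network. It assumes the usual convention that a member with d contacts is the centre of d(d-1)/2 distinct 2-stars; the household and the function name are hypothetical.

```python
from itertools import combinations
from math import comb


def count_two_stars_and_triangles(n_members, edges):
    """Return (number of 2-stars, number of triangles) of an undirected network."""
    neighbours = {i: set() for i in range(n_members)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    # A member with d contacts is the centre of C(d, 2) 2-stars.
    two_stars = sum(comb(len(nb), 2) for nb in neighbours.values())
    # A triangle is a triple of members that are all mutually connected.
    triangles = sum(
        1
        for i, j, k in combinations(range(n_members), 3)
        if j in neighbours[i] and k in neighbours[i] and k in neighbours[j]
    )
    return two_stars, triangles


# Hypothetical household of size 4 in which members 2 and 3 had no contact.
household_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
print(count_two_stars_and_triangles(4, household_edges))
```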

The triangle term estimates a transitivity effect, i.e. the increase in log odds

of contact between two people due to the fact that they have a third contact in

common. Inclusion of triangle terms in ERGM models has been found to lead to

“model degeneracy” in some cases (Handcock, 2003). Model degeneracy occurs when

the maximum likelihood estimate places most probability on a small set of possible

networks (e.g., all mass on the complete network). It results from the fact that


Network statistic                      Legend
Edges                                  Total number of edges
Within-household edges                 Total number of edges within households
Child-father mixing                    Total number of edges between children and fathers
Child-mother mixing                    Total number of edges between children and mothers
Father-mother mixing                   Total number of edges between partners
Boy-boy mixing                         Total number of edges between male children
Girl-girl mixing                       Total number of edges between female children
Age effect children                    The sum of age(i) and age(j) for all edges (i, j) between siblings
Small (≤ 3) households                 Total number of edges within households of size ≤ 3
Large (≥ 5) households                 Total number of edges within households of size ≥ 5
Isolates                               Total number of isolates
2-stars                                Total number of 2-stars
Triangles                              Total number of triangles
Triangles in households of size ≥ 6    Total number of triangles in households of size ≥ 6

Table 5.2: Network statistics considered in the ERGMs, where an edge is defined as a

physical contact between two individuals. Reference categories are child-child mixing,

boy-girl mixing, and mixing within households of size 4.

the triangle term does not impose decreasing marginal returns on the number of

mutual contacts made by the two individuals in question. For example, the increase

in log odds of contact between a pair whose number of mutual contacts increases

from zero to one is forced to be the same as the increase in log odds of contact

between a pair whose number of mutual contacts increases from ten to eleven.

Alternate ERGM terms have been proposed which instead, more realistically, model

decreasing marginal returns of the number of mutual contacts on the log odds of

contact (Hunter, 2007). However, model degeneracy was not found to be a problem

in our case, possibly because the unique structure of our data set, which includes a

large number of households but includes no between-household contacts, prevents an

“avalanche effect” of triangles towards the complete network.
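The constant marginal return of the triangle term can be made explicit in the conditional form of the ERGM. The symbols $\theta_{\triangle}$ (triangle coefficient) and $m_{ij}$ (number of contacts shared by $i$ and $j$) are introduced here for illustration only:

\[
\operatorname{logit} P\left(Y_{ij} = 1 \mid \text{rest of the network}\right) = \theta^{\top} \delta_g(y, X)_{ij},
\qquad
\delta_{\triangle}(y)_{ij} = m_{ij} = \sum_{k \neq i, j} y_{ik}\, y_{jk}.
\]

The triangle term therefore contributes $\theta_{\triangle}\, m_{ij}$ to the log odds, so each additional shared contact adds the same amount $\theta_{\triangle}$, whether $m_{ij}$ increases from zero to one or from ten to eleven.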

Approximate maximum likelihood estimates are obtained using a stochastic

Markov Chain Monte Carlo (MCMC) algorithm (Geyer and Thompson, 1992).

In short, a distribution of random networks is simulated from a starting set of

parameter values using MCMC and the parameter values are refined by comparing

this distribution of networks against the observed network in a Newton-Raphson

type algorithm, repeating this process until the parameter estimates stabilize

(Robins et al., 2007). MCMC estimation is performed with the ergm package in R

(Hunter et al., 2008; Handcock et al., 2013a) that is part of the statnet suite of


packages for statistical network analysis (Handcock et al., 2008, 2013b; Goodreau

et al., 2008). We use a burn-in of length $10^6$, intervals between sampled networks

of length $10^3$ and a total sample size equal to $5 \cdot 10^5$. The initial value of θ is

obtained by maximum pseudolikelihood estimation, considering (1.14) as a logistic

regression model assuming all Yij are mutually independent (Strauss and Ikeda, 1990).

                                            Week                 Weekend
Network statistic                      Estimate   p-value   Estimate   p-value
Edges                                   -28.16    <0.01      -20.63    <0.01
Within-household edges                   28.97    <0.01       22.78    <0.01
Child-father mixing                      -0.60     0.23       -1.15     0.45
Child-mother mixing                       0.16     0.76        0.14     0.93
Father-mother mixing                      0.27     0.66       -0.76     0.63
Age effect children                      -0.07    <0.01       -0.18    <0.01
Small households (≤3)                     0.74    <0.01
Large households (≥5)                    -0.40    <0.01
2-stars                                  -0.26     0.25       -0.87     0.01
Triangles                                 2.06    <0.01        3.58    <0.01
Triangles in households of size ≥6       -0.28     0.02
Loglikelihood                          -306.80               -65.98
AIC                                     635.59               147.95

Table 5.3: ERGM for within-household physical contact networks on week- and weekend

days: parameter estimates and Wald test p-values, log-likelihood and AIC.

The within-household physical contact networks were modeled separately for week-

days and weekends and the final ERGMs are presented in Table 5.3. The estimates

shown in this table are log odds ratios and, hence, need to be exponentiated to

obtain odds ratios, e.g. the odds of a physical contact between a father and child

is exp(−0.60) = 0.55 times the odds of a physical contact between two children

assuming other network characteristics remain fixed. Note that the edge effect is

estimated negative to counterbalance the large within-household edge effect, which

is needed because our data does not include between-household contacts. In both

models, the effects of gender-preferential mixing and the number of isolates were

found to be non-significant (likelihood ratio test p = 0.5766 for weekdays). For

weekend days, no significant effect of household size was found and the model was

further reduced to an 8-parameter model (likelihood ratio test p = 0.5134). On

weekdays, the odds of a physical contact occurring in a household of size ≤ 3 and


≥ 5 are estimated to be 2.10 and 0.67 times the odds of a physical contact occurring

in a household of size 4, respectively. Thus, the physical contact network density

decreases with increasing household size. Furthermore, on both types of days, the odds of

a physical contact between father and child is smaller than for any other pair except

for older siblings, as the probability for siblings to make physical contact decreases

with increasing age (Figure A.2). For households of size ≤ 5, the odds of a physical

contact that will complete a triangle is estimated to be 7.85 and 35.87 times the odds

of a physical contact that will not complete a triangle on week and weekend days,

respectively. This demonstrates the overall high degree of contact clustering within

households. On weekdays, the degree of clustering is slightly lower in households of

size ≥ 6 (conditional odds of 5.93).
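The odds ratios quoted in this paragraph follow from exponentiating the estimates in Table 5.3, as the short check below shows. The dictionary keys and the helper `odds_ratio` are illustrative names, not part of the original analysis.

```python
from math import exp

# Log odds ratios copied from Table 5.3.
WEEK = {
    "child_father": -0.60,
    "small_hh": 0.74,
    "large_hh": -0.40,
    "triangles": 2.06,
    "triangles_hh6plus": -0.28,
}
WEEKEND = {"triangles": 3.58}


def odds_ratio(*log_odds_ratios):
    """Exponentiate the sum of one or more log odds ratios."""
    return exp(sum(log_odds_ratios))


print(round(odds_ratio(WEEK["child_father"]), 2))  # 0.55
print(round(odds_ratio(WEEK["small_hh"]), 2))      # 2.1
print(round(odds_ratio(WEEK["large_hh"]), 2))      # 0.67
print(round(odds_ratio(WEEK["triangles"]), 2))     # 7.85
print(round(odds_ratio(WEEKEND["triangles"]), 2))  # 35.87
# Households of size >= 6 on weekdays: both triangle terms apply.
print(round(odds_ratio(WEEK["triangles"], WEEK["triangles_hh6plus"]), 2))  # 5.93
```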

Goodness-of-fit of the models is assessed by simulating new sets of physical

contact networks from the fitted ERGM and by comparing specific contact network

characteristics that are not included in the model, to the observed ones. We compare

the proportion of complete networks, the mean network density and the proportion

of observed versus potential triangles (Section A.2), by household size. We simulate

1000 networks using a burn-in of length $10^7$ and intervals of length $10^6$ between

sampled networks. The first simulated Markov chain begins at the initial network

and the end of one simulation is used as the start of the next simulation.

Figure 5.1: Proportion of complete networks (left) and mean network density (right):

observed values (blue stars with size proportional to the sample size) and values simulated

from the ERGM for within-household physical contact networks on a weekday.


Figure 5.2: Proportion of complete networks (left) and mean network density (right):

observed values (blue stars with size proportional to the sample size) and values simulated

from the ERGM for within-household physical contact networks on a weekend day.

Figure 5.3: Proportion of observed versus potential triangles: observed values (blue stars

with size proportional to the sample size) and values simulated from the ERGM for

within-household physical contact networks on a weekday (left) and on a weekend day

(right).

Overall, the final ERGMs seem to fit the data well as shown in Tables A.2-A.5 and

Figures 5.1-5.3.


5.3 Epidemic Spread in a Community of Households

We simulate the spread of a newly emerging infection in a closed fully susceptible

population of households using a discrete-time chain binomial SIR model (see

Section 1.3.1.4). The 225 households from the contact survey that were analyzed

using the weekday ERGM, are used to construct the community of households. We

assume two levels of mixing similar to the households model of Ball et al. (1997):

high-intensity mixing within households and low-intensity ‘background’ random

mixing in the community i.e. between households. Two different configurations

for within-household mixing are compared: random mixing and empirical-based

mixing, where the latter refers to physical contact networks simulated from the

fitted ERGMs. For each epidemic simulation, two sets of within-household contact

networks are drawn from the ERGMs, one for a weekday and one for a weekend day,

and those are kept fixed during the whole simulation.

At time step t (in days), assuming infection is spread by means of physical

contacts, each susceptible i acquires infection with probability:

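\[
\begin{aligned}
p_{i,1}(t) &= 1 - (1-\beta_h)^{\sum_{j \neq i,\, j \in h_i} y_{ij} I_j(t)} \cdot (1-\beta_{c,11})^{\sum_{j \notin h_i} I_{j,1}(t)} \cdot (1-\beta_{c,12})^{\sum_{j \notin h_i} I_{j,2}(t)}, \\
p_{i,2}(t) &= 1 - (1-\beta_h)^{\sum_{j \neq i,\, j \in h_i} y_{ij} I_j(t)} \cdot (1-\beta_{c,21})^{\sum_{j \notin h_i} I_{j,1}(t)} \cdot (1-\beta_{c,22})^{\sum_{j \notin h_i} I_{j,2}(t)},
\end{aligned}
\]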

where index 1 corresponds to children ≤ 18 years and index 2 to adults > 18 years. βh

denotes the within-household transmission probability per physical contact, per time

step. The 2 × 2 community transmission probability matrix βc (with βc,12 = βc,21)

is taken directly proportional to the per capita physical contact rates estimated from

the Belgian POLYMOD contact survey, with a proportionality constant qc (Mossong

et al., 2008b; Goeyvaerts et al., 2010):

\[
\beta_c = q_c \begin{bmatrix} 17.35 & 6.26 \\ 6.26 & 7.88 \end{bmatrix} \cdot 10^{-7}.
\]

Further, yij denotes the observed adjacency matrix and under the random mixing

scenario, yij equals 1 for all household members i and j. Finally, hi denotes the

household of node i and Ij(t) indicates whether node j is infected (1) or not (0) at

time t with subscripts referring to children and adults.
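A minimal sketch of this per-step infection probability is given below. It uses the parameter values βh = 0.05 and qc = 275 quoted later in this section; the function name, argument layout and 0/1 coding of the age classes are assumptions made for illustration.

```python
BETA_H = 0.05   # within-household transmission probability per contact
Q_C = 275.0     # proportionality constant for community transmission
CONTACT = [[17.35, 6.26], [6.26, 7.88]]  # child/adult physical contact rates
BETA_C = [[Q_C * rate * 1e-7 for rate in row] for row in CONTACT]


def infection_probability(age_class, hh_infected_contacts, comm_infected,
                          beta_h=BETA_H, beta_c=BETA_C):
    """Probability that a susceptible is infected during one time step.

    age_class: 0 for a child, 1 for an adult (the susceptible's class).
    hh_infected_contacts: number of infected household members j with y_ij = 1.
    comm_infected: (infected children, infected adults) outside the household.
    """
    # Probability of escaping every infected physical contact at home ...
    escape = (1.0 - beta_h) ** hh_infected_contacts
    # ... and every infected individual in the community, by age class.
    for k, n_infected in enumerate(comm_infected):
        escape *= (1.0 - beta_c[age_class][k]) ** n_infected
    return 1.0 - escape


# A child with one infected household contact and 5 infected children
# plus 3 infected adults elsewhere in the community.
print(infection_probability(0, 1, (5, 3)))
```

Under random mixing, `hh_infected_contacts` is simply the number of infected household members, since yij equals 1 for every pair.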


Since we aim to study the effect of contact heterogeneity, we assume that in-

herent susceptibility and infectiousness are invariant with age. Further, we assume

that there is no latent period i.e. individuals are infectious immediately when

acquiring infection. At each time step, infected individuals recover with a constant

probability of 0.22 such that the mean infectious period is approximately 3.5 days.

Values for the transmission parameters βh and qc are chosen in line with literature es-

timates based on household final size and symptom onset data (Table A.6): βh = 0.05

and qc = 275, resulting in an average community transmission probability of βc = 0.00026. These

parameter values result in estimates of the community probability of infection for

children and adults (CPIchild and CPIadult) between 0.18 and 0.20, and 0.11 and

0.12, respectively. Furthermore, the probability to escape infection from an infected

household member per day is qHH = 1− βh = 0.95. The first day of the epidemic is

randomly determined to be a week- or weekend day and is started by infecting three

random individuals. The epidemic is then tracked until all infected individuals are

recovered and no new infections have occurred.
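The derived quantities in this paragraph can be checked with a few lines of arithmetic. Two readings are assumptions rather than statements from the text: the average community transmission is taken as the unweighted mean of the four matrix entries, and the mean infectious period is interpreted as the expected number of full days before the daily recovery draw succeeds.

```python
Q_C = 275.0
BETA_H = 0.05
RECOVERY_PROB = 0.22
CONTACT = [[17.35, 6.26], [6.26, 7.88]]

beta_c = [[Q_C * rate * 1e-7 for rate in row] for row in CONTACT]

# Assumed definition: unweighted mean of the four entries of beta_c.
mean_beta_c = sum(sum(row) for row in beta_c) / 4

# Daily probability of escaping infection from one infected household contact.
q_hh = 1.0 - BETA_H

# Geometric recovery: expected full days before recovery (assumed reading)
# versus expected days when the recovery day itself is counted.
mean_days_before_recovery = (1.0 - RECOVERY_PROB) / RECOVERY_PROB
mean_days_including_recovery = 1.0 / RECOVERY_PROB

print(round(mean_beta_c, 5))
print(q_hh)
print(round(mean_days_before_recovery, 2), round(mean_days_including_recovery, 2))
```

Under the first reading the mean is about 3.5 days, which matches the value stated above; counting the recovery day as well gives about 4.5 days.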

5.3.1 Setting 1

Results from 1000 stochastic epidemic simulations are shown in Figures 5.4 and 5.5.

In these figures, small outbreaks, defined as outbreaks with a final size of < 100

individuals that took less than 60 days, are excluded from display. The proportion

of small outbreaks is significantly smaller in the random mixing setting compared

to empirical-based mixing, 0.43 and 0.50, respectively (Fisher’s exact test, p-value:

0.0027).
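For readers who want to reproduce this type of comparison, a two-sided Fisher's exact test for a 2 × 2 table can be sketched with the standard library alone. The counts below (430 and 500 small outbreaks out of 1000 simulations each) are assumed from the rounded proportions, so the resulting p-value need not coincide with the one reported above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Assumed counts: small vs. large outbreaks under random and empirical mixing.
p_value = fisher_exact_two_sided(430, 570, 500, 500)
print(p_value)
```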

These results suggest that relaxing the assumption of random mixing within

households by extending to more realistic contact network patterns drawn from

the fitted ERGMs, has a small impact on the epidemic simulations. The mean

proportion of individuals ultimately infected and the mean proportion of households

infected are slightly smaller under empirical-based mixing than under random mixing:

0.36 [0.12, 0.53] vs. 0.39 [0.12, 0.56], and 0.67 [0.29, 0.86] vs. 0.70 [0.28, 0.88],

respectively (Figure A.3). Furthermore, the household attack rate, defined as the

mean proportion of individuals infected per household (Longini and Koopman, 1982)

increases with household size under both settings (Figure 5.5).


Figure 5.4: Mean infection incidence over time at the individual (left) and household level

(right) for 1000 simulations of a stochastic SIR epidemic process on a 2-level households

model assuming random (black) and empirical-based (red) mixing within households.

Figure 5.5: Household attack rates by household size for 1000 simulations of a stochastic

SIR epidemic process on a 2-level households model assuming random and empirical-based

mixing within households.


5.3.2 Setting 2

In this setting, we include a scaling factor to account for the difference in density

between empirical-based and random mixing:

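\[
\begin{aligned}
p_{i,1}(t) &= 1 - (1-\beta_h \cdot \delta_h)^{\sum_{j \neq i,\, j \in h_i} y_{ij} I_j(t)} \cdot (1-\beta_{c,11})^{\sum_{j \notin h_i} I_{j,1}(t)} \cdot (1-\beta_{c,12})^{\sum_{j \notin h_i} I_{j,2}(t)}, \\
p_{i,2}(t) &= 1 - (1-\beta_h \cdot \delta_h)^{\sum_{j \neq i,\, j \in h_i} y_{ij} I_j(t)} \cdot (1-\beta_{c,21})^{\sum_{j \notin h_i} I_{j,1}(t)} \cdot (1-\beta_{c,22})^{\sum_{j \notin h_i} I_{j,2}(t)}.
\end{aligned}
\]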

Hence, δh is chosen 1 for empirical-based mixing, while for random mixing it equals

the network density of the simulated contact network in the empirical-based mixing

scenario. In the previous simulation, the different results between the network model

and the random mixing scenario could be due simply to different densities rather than

to any particular characteristic of the network structure. In this setting we calibrate

in order to make a fairer comparison between the two scenarios. Figures 5.6 and

5.7 present the results from 1000 simulations excluding small outbreaks.

From these figures we see that there are barely any differences in mean final

fractions (0.37 [0.13, 0.52] vs. 0.36 [0.12, 0.53], and 0.68 [0.31, 0.86] vs. 0.67 [0.29,

0.86]; Figure A.4) or household attack rates between random and empirical-based

mixing, respectively. The proportion of small outbreaks is also similar under both

mixing assumptions, 0.48 and 0.50, respectively (Fisher’s exact test, p-value: 0.3954).

5.3.3 Other Settings

To further investigate the comparison between random and empirical-based mixing,

we also performed the epidemic simulations under the following conditions:

1. High household transmission: βh = 0.4

2. Age-dependent household transmission: βh,1 = 0.2 and βh,2 = 0.05

3. Age-independent community transmission: all entries of βc set to the average value 0.00026

4. Simulating over all 316 households available in the contact data

However, the obtained differences were fairly similar to the results described above

and are therefore omitted.


Figure 5.6: Mean infection incidence over time at the individual (left) and household level

(right) for 1000 simulations of a stochastic SIR epidemic process on a 2-level households

model assuming random (black) and empirical-based (red) mixing within households

including a density scaling factor.

Figure 5.7: Household attack rates by household size for 1000 simulations of a stochastic

SIR epidemic process on a 2-level households model assuming random and empirical-based

mixing within households including a density scaling factor.


5.4 Discussion

In this chapter, we introduced the first social contact study focusing specifically on

contact networks within households. Inference of within-household contact networks

in previous studies was based on egocentric contact surveys in which each household

network was only partly observed (Potter et al., 2011; Potter and Hens, 2013) or on

limited data in a very specific setting (rural Peru; Grijalva et al. (2015)). Our survey

design was an improvement on the former surveys, since information on contacts

between all household members was available. Consequently, the obtained contact

networks are likely more accurate than the estimates obtained by the previously mentioned

studies. We analyzed the household network data using ERGMs to assess the effect

of factors such as role in the household, gender, age in children and household

size on close contacts within households. We found that contacts between father

and children are less likely than between father and mother, mother and children

and siblings (except older siblings). This is in line with conclusions obtained by

de Greeff et al. (2012). They analyzed data for pertussis in households with young

infants and found that fathers were less susceptible to pertussis infection than other

household members, whereas mothers were more infectious to their infants. Targeted

vaccination of mothers as well as siblings was found to be the most effective, the latter

because siblings more often introduce an infection in the household. We found that

the mean number of contacts increases with increasing household size (see Table A.1),

supporting density-dependent contact rates as found in previous contact surveys

(Mossong et al., 2008b). However, studies on household epidemic data of close-contact

infections (Melegaro et al., 2004; Cauchemez et al., 2004; de Greeff et al., 2012)

support frequency-dependent transmission, since they report a decreasing trend in

instantaneous risk of transmission between a susceptible and an infectious individual

with household size. This discrepancy remains to be investigated. To assess the

common assumption of random mixing within households, we simulated epidemics in

a two-level SIR setting based on either the empirically grounded networks or random

mixing. We did not find any important differences, indicating that the assumption of

random mixing between household members may be an adequate approximation of

social contact behavior in this setting for infections transmitted via close contacts.

Our study has a number of limitations and assumptions. We assume that a

contact occurred if it was reported by at least one household member. Thus, contacts

forgotten by both members could result in an underestimation of the network density.

Potter et al. (2015) developed a model to deal with the issue of reporting error


on network edges. However, given that the high reciprocity rates (98%) indicate

a very good reporting quality of the survey, we believe this will not have a large

impact on our conclusions. Further, our results depend on the contact definition

used to determine the within-household network links and cannot be generalized to

the spread of any infectious disease. Based on the exploration of various contact

definitions when using POLYMOD contact data to estimate age-specific varicella

transmission rates (Goeyvaerts et al., 2010), we opted to use physical contacts in this

study as a surrogate of potential transmission events for close-contact infections such

as influenza and smallpox. Keeling and Eames (2005) correctly note that even for two

airborne infections, different networks may be appropriate because differing levels of

interaction will be required to constitute an effective contact. Lastly, the comparison

between random mixing and empirical-based mixing was assessed by simulating

according to a two-level mixing model in a completely susceptible population of

households. It is possible that other settings (e.g. with a different structure or a par-

tially immune population) entail bigger differences between these mixing assumptions.

The methods in this chapter can be extended in a number of ways which will

be topics of future research. Figure A.1 indicates a relationship-specific heterogeneity

in duration of contact, which might be relevant for some diseases. The ERGM

framework can be adapted to model ‘valued’ within-household contact networks

(Krivitsky, 2012), with the value of a contact determined by its total duration, and

by weighting the transmission rates in the epidemic simulation model accordingly.

It is also of potential interest to capture temporal dynamics of within-household

contacts and to simulate the impact of contact formation and dissolution on the

spread of infection (Hanneke et al., 2010; Krivitsky and Handcock, 2014). Table 5.1

suggests that the physical contact networks are less complete on weekdays compared

to weekends, whereas the difference between the regular and holiday periods seems

to be minor. Further, on weekdays, interactions between household members are

expected to occur mostly during the morning and the evening, before and after

school or work time. In the survey, participants had to indicate in which location

most time was spent within certain time blocks. Combining these time-use-like data

with the contact diary makes it possible to infer the potential timing of (physical) contacts with

household members, and to estimate dynamic within-household contact networks.

This would also be valuable to inform large-scale individual-based simulation models

of infectious disease spread. Finally, combining the model for within-household

contact networks developed in this chapter with epidemic data from a similar community

of households would make it possible to improve estimates of age-specific heterogeneity

in susceptibility and infectiousness for infections such as influenza (Addy et al., 1991).

This study provides unique insights into within-household contacts, considered

to be important drivers of many close-contact infections. It is the first empirical

evidence resulting from a large household contact survey supporting the use of the

random mixing assumption in epidemic models incorporating household structure.

Chapter 6 Bayesian Inference for the Two-Level

Mixing Model Incorporating

Empirical Household Contact

Networks

In Chapter 5, we emphasized the key role that households have in the spread of

airborne infectious diseases and, consequently, the importance of describing the

mixing patterns within households as realistically as possible. To do so, we developed

a network model based on the first household contact survey to account for contact

heterogeneity within households.

In the last 15 years, network theory has gained considerable attention as a tool to model

interactions between hosts that enable the spread of disease through a population.

Estimating the parameters of these network models from data, however, remains

a huge challenge because of the inherent complexity of these sophisticated models.

Britton and O’Neill (2002) were the first to perform Bayesian statistical inference for

stochastic epidemic models by including an underlying unobserved social structure

modeled as a Bernoulli random graph, i.e. assuming random mixing. Demiris and

O’Neill (2005b) extended this method by developing inference for infection rates and

imputed the contact graph, assuming random mixing within and between groups


i.e. allowing for two levels of mixing. This type of model assumes that a population

of individuals is partitioned into groups (e.g. households, farms, etc.) in which

infectious contacts can occur both locally within a group, and globally between

groups. Groendyke et al. (2011, 2012) extended the work of Britton and O’Neill

(2002) by suggesting generalizations of the network model that was used and by

implementing software to perform inference for these models. They considered a

SEIR epidemic model on a random network modeled as an Erdős-Rényi random

graph, which is one particular type of a more general class of exponential-family

random graph models. In all these models the structure of the contact networks is

inferred from epidemic data only. As a consequence, strict assumptions are necessary

to make estimation possible.

In this chapter, we will integrate the household network model inferred in the

previous chapter into an epidemic two-level mixing model to estimate parameters

from household disease data. Data augmentation is used to identify the social

structure consistent with the observed disease data and the network model. Hence,

unlike the models described above, the underlying contact graph is informed by both

empirical contact data as well as disease data. Inference will be done in the Bayesian

framework, using MCMC, in which it is natural to use data augmentation. This

approach is illustrated using the data on pertussis collected in households in the

Netherlands, which is described in Section 2.1.3.

The structure of this chapter is as follows. In Section 6.1, we describe the

model, present the corresponding likelihood and provide details on the MCMC

sampling. Preliminary results are shown in Section 6.2 and a discussion is provided

in Section 6.3.

6.1 Methodology

6.1.1 Model Description

In the following, the transmission model and underlying contact network are

described.

Two-level mixing model

Consider a population of independent households of varying sizes. We shall


represent the within-household social structure using a random graph G. Specifically,

each individual in the population is represented by a vertex in G. Two vertices

are adjacent in a specific realization G of G if the corresponding individuals made

physical contact. We can now define a two-level mixing model on G. We assume

a discrete-time SEIR model. Hence, at time point t ≥ 0 every individual in the

population is either susceptible (S), exposed (E), infectious (I) or removed (R). A

susceptible individual may become infected after which he is exposed but not yet

infectious. After this period he becomes infectious and can transmit the disease to

others. Ultimately all infectious individuals recover. Note that we do not consider

possible reinfection due to the limited time scale of our data.

Now, consider household h of size nh. Denote the household members for

which pertussis is laboratory confirmed with Ih and the other members with Sh.

The data consist of a set of symptom onset times oh = {ohj , j ∈ Ih}. For pertussis

we know that the symptom onset time is also the start of infectiousness (Centers for

Disease Control and Prevention, 2016). Hence, individual j whose symptoms started

at time ohj , is infectious from time ohj until time rhj , with rhj − ohj = c the length

of the infectious period. We assume that c is known and thus fixed. Furthermore,

this individual is assumed to be infected at time ehj and the latent period ohj − ehj is

assumed to be gamma distributed with mean µ and variance σ². Define the primary

case of the household as ph, such that ehph = min{ehj , j ∈ Ih} and Ih − {1} as the

group of infected household members without the primary case. Denote the end of

follow-up for this household as Th. Lastly, let I correspond to the confirmed cases in

all households and define S, o, r, e, p, I − {1} and T in the same way.

We assume two levels of transmission: within-household transmission via the

physical contact structure G and a constant background risk of infection from the

community. Let βh be the within-household transmission probability (per contact

per day) and βc the community transmission probability (per day).

Network model

The random graph G is modeled according to the final ERGM obtained in

Section 5.2. For simplicity and computational efficiency, we will focus on the weekday

model (Table 5.3). Recall that the physical contact network density decreased with

increasing household size and that the odds of a physical contact between father and

child was smaller than for any other pair except for older siblings.


Vaccination

In the Netherlands, infants have been offered a primary vaccination series of 4 doses

of whole-cell DTP-IPV since 1957; the whole-cell vaccine was replaced by an acellular vaccine in

2005. In 2002, an acellular pertussis preschool booster was introduced for children

at the age of four years. Vaccination coverage has been high over the past decades.

According to the report on the national immunisation programme (NIP) in the

Netherlands (National Institute for Public Health and the Environment, 2013), the

vaccine effectiveness (VE) of the primary series increased after the replacement of

the whole cell vaccine with the acellular one. Further, the VE for the booster dose

decreases after approximately 4 years, i.e. when children reach the age of eight years,

to about 18% at the age of 14 years (see Table 6.1). Note that vaccine effectiveness

is defined as the percentage reduction in the incidence of disease among vaccinated

persons compared with unvaccinated persons.

Age / Birth cohort   ’98  ’99  ’00  ’01  ’02  ’03  ’04  ’05  ’06  ’07  ’08
1yr                   38   63   78   73   63   29   54   72   87   92   90
2yr                   33   22   52   46   41    -    -   67   58   92   91
3yr                    9    -    -    -   54   10   37   59   43   84   82
4-5yr                  -   77   71   82   86   80   84   83   93   89    -
6yr                   74   70   80   79   71   61   89   87   90    -    -
7yr                   68   71   68   71   51   61   67   86    -    -    -
8yr                   77   75   56   47   35   72   80    -    -    -    -
9yr                   73   63   36   49   34   69    -    -    -    -    -
10yr                  60    -   13   24   59    -    -    -    -    -    -
11yr                   -   11    -    5    -    -    -    -    -    -    -
12yr                  45    3   14    -    -    -    -    -    -    -    -
13yr                   -    -    -    -    -    -    -    -    -    -    -
14yr                  18    -    -    -    -    -    -    -    -    -    -

Table 6.1: Estimation of vaccine effectiveness for 1 to 14-year-olds per birth cohort

according to the NIP report (National Institute for Public Health and the Environment,

2013).

We therefore make the following assumptions for our data set: (1) a vaccinated

individual s who is 14 years or younger is susceptible with probability fs = 1 − VE,

where the estimates of the VE per age group and birth cohort from the NIP report

are used, (2) a vaccinated individual older than 14 years is completely susceptible,

fs = 1, and (3) individuals younger than 15 years with unknown vaccination status

are considered to be vaccinated (four individuals).
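
The three assumptions above can be sketched as a small lookup rule. This is an illustrative sketch only: the function name `susceptibility`, its arguments and the choice to raise an error for missing VE entries are assumptions made here, not part of the original analysis. Unvaccinated individuals are taken to be fully susceptible, which the text implies but does not state.

```python
def susceptibility(age, vaccinated, ve_percent=None):
    """Return f_s, the probability that an individual is susceptible.

    age        -- age in completed years
    vaccinated -- True, False, or None when the status is unknown
    ve_percent -- VE (%) for the individual's age and birth cohort (Table 6.1),
                  or None when the table has no entry
    """
    # Assumption (3): unknown status below 15 years is treated as vaccinated.
    if vaccinated is None:
        vaccinated = age < 15
    # Assumption (1): vaccinated and 14 years or younger -> f_s = 1 - VE.
    if vaccinated and age <= 14:
        if ve_percent is None:
            # The text does not say how missing table entries were handled.
            raise ValueError("no VE estimate available for this age and cohort")
        return 1.0 - ve_percent / 100.0
    # Assumption (2) and unvaccinated individuals: fully susceptible.
    return 1.0


# Example with a value from Table 6.1: 14-year-olds of the 1998 cohort, VE = 18%.
print(susceptibility(14, True, 18))
```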

6.1.2 Likelihood and Posterior Density

The data under consideration consist of dates of symptom onset only; therefore

a large part of the infectious process is not observed. This makes inference for the

parameters of the transmission model challenging, because the likelihood of the data

given the parameters is intractable for two reasons: it involves summation over (1) all

possible infection times e (O’Neill and Becker, 2001) and (2) all possible networks G

(Britton and O’Neill, 2002). We therefore include the infection times and G as extra

model parameters. The augmented log-likelihood function has three components.

$$LL = \log\{\pi(o, e \mid \beta_h, \beta_c, \mu, \sigma, G)\} = \sum_h \log\{\pi(o^h, e^h \mid \beta_h, \beta_c, \mu, \sigma, G)\} = LL_1 + LL_2 + LL_3$$

We present these components for one household h and omit the superscript h for

clarity. First, the contribution from the infections is given by

$$LL_1 = \sum_{j \in I - \{1\}} \left( \sum_{t=0}^{e_j - 2} \log\{1 - p_j(t)\} + \log\{p_j(e_j - 1)\} \right),$$

where pj(t) is the probability that individual j acquires infection from time t to time

t+ 1 (in days):

$$p_j(t) = f_j \left[ 1 - (1 - \beta_h)^{\sum_{(j,k) \in G} \mathbf{1}_{o_k \le t < r_k}} (1 - \beta_c) \right].$$
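
As a minimal sketch of this formula, the function below counts the household contacts of individual j that are infectious on day t and combines the escape probabilities from those contacts and from the community. The function name, the argument names and the data layout (an edge list plus dictionaries of onset and removal days) are assumptions made for illustration, and the parameter values in the example are arbitrary rather than estimates.

```python
def infection_probability(j, t, edges, onset, removal, beta_h, beta_c, f_j=1.0):
    """Probability that individual j acquires infection between day t and day t + 1.

    edges   -- iterable of pairs (a, b): the household contact graph G
    onset   -- dict: case -> day of symptom onset o_k (start of infectiousness)
    removal -- dict: case -> day r_k on which infectiousness ends
    """
    # Number of contacts k of j that are infectious on day t (o_k <= t < r_k).
    n_infectious = 0
    for a, b in edges:
        if j not in (a, b) or a == b:
            continue
        k = b if a == j else a
        if k in onset and onset[k] <= t < removal[k]:
            n_infectious += 1
    # j escapes each infectious contact and the community independently.
    escape = (1.0 - beta_h) ** n_infectious * (1.0 - beta_c)
    return f_j * (1.0 - escape)


# Hypothetical household of three with all pairs in contact; member 1 is a case.
edges = [(1, 2), (1, 3), (2, 3)]
onset, removal = {1: 0}, {1: 21}
print(infection_probability(2, 5, edges, onset, removal, beta_h=0.1, beta_c=0.01))
```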

Similarly, the contribution from the individuals who do not get infected is given by

$$LL_2 = \sum_{j \in S} \sum_{t=0}^{T-1} \log\{1 - p_j(t)\}.$$
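
The two contributions above can be sketched for a single individual as follows. The function names are hypothetical, the daily infection probability is passed in as a callable `p_j`, and days are assumed to be counted from 0 with an infection time of at least 1; this is a sketch of the summations, not the implementation used for the analysis.

```python
import math


def loglik_infected(e_j, p_j):
    """Contribution of a secondary case infected at time e_j (e_j >= 1).

    The individual escapes infection on days 0, ..., e_j - 2 and is
    infected between day e_j - 1 and day e_j.
    """
    escaped = sum(math.log(1.0 - p_j(t)) for t in range(e_j - 1))
    return escaped + math.log(p_j(e_j - 1))


def loglik_escaped(T, p_j):
    """Contribution of an individual who escapes infection on days 0, ..., T - 1."""
    return sum(math.log(1.0 - p_j(t)) for t in range(T))


# Illustration with a constant, arbitrary daily infection probability of 0.1.
constant_risk = lambda t: 0.1
print(loglik_infected(3, constant_risk), loglik_escaped(4, constant_risk))
```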

Finally, the contribution accounting for the incubation process is

$$LL_3 = \sum_{j \in I} \log\{d_{\mu,\sigma}(o_j - e_j)\},$$

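where dµ,σ is a Gamma density with mean µ and variance σ². By Bayes’ theorem, the

posterior log-density equals, up to an additive constant,

$$\log\{\pi(\beta_h, \beta_c, \mu, \sigma, e, G \mid o)\} = LL + \log\{\pi(G)\} + \log\{\pi(\beta_h, \beta_c, \mu, \sigma)\} + \text{const} \tag{6.1}$$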


where π(βh, βc, µ, σ) = π(βh)π(βc)π(µ)π(σ) is the prior density for βh, βc, µ, σ and

π(G) = P (G = G) is specified by the network model.
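
Because the Gamma density is specified through its mean and variance, it helps to make the conversion to the usual shape and scale parameters explicit: shape = µ²/σ² and scale = σ²/µ. The sketch below evaluates the log-density and the third likelihood component under that conversion. The function names and the dictionary layout are assumptions made for illustration only.

```python
import math


def gamma_logpdf(x, mean, sd):
    """Log-density of a Gamma distribution with the given mean and standard deviation."""
    if x <= 0:
        return -math.inf
    shape = (mean / sd) ** 2   # mu^2 / sigma^2
    scale = sd ** 2 / mean     # sigma^2 / mu
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def loglik_latent(onset, infection, mean, sd):
    """Sum over cases of the log-density of the latent period o_j - e_j."""
    return sum(gamma_logpdf(onset[j] - infection[j], mean, sd) for j in onset)


# Hypothetical onset and infection days for two cases, with arbitrary mu and sigma.
print(loglik_latent({1: 10, 2: 25}, {1: 2, 2: 14}, mean=9.0, sd=5.0))
```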

Independent prior distributions are chosen for βh, βc, µ and σ. An uninformative

uniform U [0, 1] distribution is used for βh and βc. For µ and σ a Gamma distribution

with mean 9 and variance 32 is used, reflecting the belief that the incubation

period is typically 7-10 days long, with a reported range of 4-21 days (CDC and

Ncird, 2015; World Health Organization, 2010).

6.1.3 MCMC Sampling

We use an MCMC algorithm to generate approximate samples from the posterior

density in (6.1). The chain is initialized as follows: βh and βc are drawn from

the uniform distribution U [0, 1] and µ and σ from U [4, 21]. Further, for every

individual j in household h, the time of infection ehj is drawn from U [ohj − 21, ohj − 4]

and the end of the infectious period is fixed at 21 days after the onset of symptoms.

Parameters are updated using single-component Metropolis-Hastings sampling.

In each iteration, we perform the following steps: (1) updating βh and βc, (2)

updating µ and σ, (3) updating G by resampling the household network in 10%

of the households, and lastly (4) updating one infection time per household. To

ensure plausible values, a logit-transformation is applied to βh and βc and a

log-transformation to µ and σ.

For steps (1) and (2), random walk sampling is applied by generating a new

value from the normal distribution centered at the current value with variance δ.

In (1), δ is set to 0.5 for βh and 1.0 for βc, and in (2) to 0.1 for µ and 0.2 for σ.

In (3) and (4) independence sampling is used. For step (3) 10% of the households

are randomly selected and in each of these households an edge is either removed from or

added to G, both with equal probability. If no edges can be removed or added,

the proposal is automatically rejected. Lastly, in (4) individual j ∈ Ih is randomly

selected in household h and the infection time ehj is updated using a random walk

with step size 1 conditional on ohj − 21 < ehj < ohj .
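
A single random-walk update of a transmission probability on the logit scale, as in step (1), might look like the sketch below. Several details are assumptions made here: the text does not state how the transformation enters the acceptance ratio, so the sketch includes the Jacobian term β(1 − β) that is needed when the prior is specified on the original scale; the function names and the generic `log_post` callable are hypothetical.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    # Numerically stable inverse logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rw_update_probability(beta, log_post, delta, rng):
    """One Metropolis-Hastings update of beta in (0, 1).

    beta     -- current value
    log_post -- function returning the log posterior density at a value of beta
    delta    -- variance of the normal random-walk proposal on the logit scale
    rng      -- random.Random instance
    """
    beta_new = expit(rng.gauss(logit(beta), math.sqrt(delta)))
    if not 0.0 < beta_new < 1.0:
        return beta  # numerical guard: reject proposals that round to 0 or 1
    # The proposal is symmetric on the logit scale; the Jacobian of the
    # transformation contributes log{beta (1 - beta)} to the target.
    log_ratio = (log_post(beta_new) - log_post(beta)
                 + math.log(beta_new * (1.0 - beta_new))
                 - math.log(beta * (1.0 - beta)))
    if math.log(1.0 - rng.random()) < log_ratio:
        return beta_new
    return beta


# Toy target for illustration only: a Beta(3, 7) log-density up to a constant.
toy_log_post = lambda b: 2.0 * math.log(b) + 6.0 * math.log(1.0 - b)
rng = random.Random(2016)
beta = 0.5
for _ in range(1000):
    beta = rw_update_probability(beta, toy_log_post, 0.5, rng)
print(beta)
```

The same pattern applies to µ and σ with a log-transformation, in which case the Jacobian term is the parameter value itself.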

We perform 200,000 iterations, of which the first 40,000 are discarded

as burn-in. The length of this burn-in period is based on the convergence

diagnostic by Boone, Merrick and Krachey (BMK) (Boone et al., 2012). From the

MCMC samples the posterior medians and 95% credible intervals for all parameters

are estimated.

6.2 Preliminary Results

Trace plots, posterior and prior distributions for the parameters and two selected

network statistics are shown in Figures 6.1 and 6.2. The BMK diagnostic indicates

convergence for all chains (Hellinger distances smaller than 0.5); however, borderline

results are obtained for the mean duration of the incubation period µ. Inspecting

this chain reveals high autocorrelation, suggesting that more iterations are needed.

Posterior correlations between µ and the other parameters are low; however, βh and

βc are negatively correlated (cor(βh, βc) = −0.46). Trace plots of the number of edges

and triangles indicate that the network mixes well. Posterior medians and credible

intervals are presented in Table 6.2. The mean duration of the incubation period is

estimated at 9.4 (8.4, 11.1) days.

Figure 6.1: Trace plot of the MCMC samples for the within-household transmission

probability (βh), the community risk of infection (βc), the mean duration of the incubation

period (µ), the standard deviation of the incubation period (σ) and the number of edges

and triangles in the household contact network G.


Figure 6.2: Prior and posterior distributions for the within-household transmission

probability (βh), the community risk of infection (βc), the mean duration of the incubation

period (µ) and the standard deviation of the incubation period (σ). Dotted lines are prior

distributions.

6.3 Discussion

The model described in this chapter allows inference for disease transmission in a

population of households with underlying social structure within households. In

contrast with existing models, this underlying contact graph is not only informed by

the disease data but also by empirical contact data.


Table 6.2: Posterior median and 95% posterior credible intervals for the model

parameters.

Parameter                                            Median   95% CI
Within-household transmission βh (days⁻¹)            0.0094   (0.0072, 0.012)
Community transmission βc (days⁻¹)                   0.0019   (0.0011, 0.0029)
Mean duration of incubation period µ (days)          9.4      (8.4, 11.1)
Standard deviation of incubation period σ (days)     5.0      (4.0, 5.7)

Data on symptom onset times of pertussis in households with a laboratory

confirmed index case were used to illustrate our approach. The likelihood of the data

was numerically intractable; therefore, data augmentation was used via an MCMC

approach. By assuming a fixed length of the infectious period, it was possible to

estimate transmission parameters on two levels (within households and from the

community) and the duration of the latent period, which was assumed to be equal to

the incubation period. Our estimates were obtained using vague prior distributions,

except for the prior on the duration of the latent period, which was somewhat

more informative.

Our approach has some limitations and assumptions that need to be discussed.

First, we only observe infected households with a hospitalized index case. We dealt

with this by assuming all households to be independent and conditioning on the

state of the household at the beginning of follow-up. Another possibility would have

been to assume that the observed data are a sample from the population (O’Neill,

2009); however, this would require assumptions about the uninfected households,

i.e. how many there are and what size they have. Cauchemez

et al. (2004) showed in a similar household setting that changing the proportion of

uninfected households did not affect the bias in the community risk. Second, we

only considered those households in which all infected persons (as determined by

PCR/serology) had a clearly defined day of symptom onset. This way, asymptomatic

cases were ignored although they could have an impact on household transmission

rates. Although the proportion of cases with missing symptom onset date in our

data was fairly small (13%), further research is necessary to determine the effect of

asymptomatic pertussis cases. Third, we assumed that the contact networks derived

from the Flemish household contact study are appropriate for the estimation of


pertussis transmission in a population of households in the Netherlands. We believe

that the difference between the two study populations will not have a large effect on

the estimation, since both studies focused on households with young children. Also, as

in Chapter 5, we assumed physical contacts to be a proxy for potential transmission

events for a close-contact infection such as pertussis. However, a different contact

definition might be more appropriate and the model for the contact network would

have to be re-evaluated accordingly.

There are still some aspects in this work that need to be studied. To assess

the adequacy and the fit of the model to the data, we will simulate epidemics in a

community of households similar in structure to the pertussis data with parameters

drawn from the posterior distributions (see also Section 5.3). Further, we will repeat

the estimation by assuming random mixing within households and compare the

results with our approach. We assumed a fixed duration of the infectious period of 21

days according to the literature (CDC and Ncird, 2015; World Health Organization,

2010). A sensitivity analysis is necessary to determine the impact of this value on the

obtained estimates. Lastly, updating the contact network in our approach requires

evaluating the likelihood of the network implied by the ERGM. This is very time

consuming and we will need to investigate the efficiency of the algorithm in more

depth and assess whether it is worthwhile to include network updates instead of

using a fixed empirically-based contact network.

Chapter 7 Spatiotemporal Evolution of Ebola

Virus Disease at Sub-national Level

during the 2014 West Africa

Epidemic: Model Scrutiny and Data

Meagreness

The Ebola outbreak of 2014 is the most widespread outbreak of EVD in history,

causing a huge number of cases and deaths. Due to the nature of this outbreak

as a global public health threat, a large number of models have been published

that aimed to estimate epidemiological parameters, and to forecast the evolution

of the epidemic. These models were mainly deterministic, SEIR transmission

models (Althaus et al., 2015; Fisman et al., 2014; Gomes et al., 2014; Nishiura

and Chowell, 2014). Most models, and especially the ones early in the outbreak,

were fitted on reported cumulative national data. Doing so, they did not account

for the transmission heterogeneity of this outbreak and the serial correlation

induced by the accumulation of data. However, in the course of the outbreak,

others highlighted the importance of the spatial and temporal heterogeneity of the

outbreak, questioning assumptions made by early models (Chowell et al., 2014b).

A study by King et al. (2015) illustrated through simulations that deterministic


models, fitted on cumulative incidence data, lead to substantial underestimation of

the uncertainty in estimates and forecasts. In addition, the models were often

fitted without taking the serial correlation into account. The clustered pattern

of transmission could be attributed to variability in transmission settings (e.g.

healthcare facilities, households, burials) (Merler et al., 2015), behavior (e.g. expres-

sions of mistrust) and control measures (e.g. contact tracing and monitoring and

establishment of a treatment center). However, there is still a lack of insight into the

relative contribution of each factor to the transmission pattern (Chowell et al., 2014a).

A good understanding of the outbreak transmission can support an efficient

allocation of resources at national and at district level. With the study discussed in

this chapter, we aimed to develop a model that would overcome previously identified

model limitations, including the failure to use district-level data and the assumption of

homogeneous transmission across districts. Our two-stage model is based on publicly

available data (described in Section 2.1.4) and aims to improve the information

available for operational decisions to control the epidemic. The first stage is the use of a

growth model that addresses the spatiotemporal correlation. The second stage is

the use of a compartmental model - whenever deemed appropriate - that provides a

district-specific estimation of the effective reproduction number and its uncertainty.

In addition, we performed a sensitivity analysis to study the effect of the model

assumptions on the parameter estimates.

This chapter covers the study in Santermans et al. (2016a). Section 7.1 dis-

cusses the growth model and corresponding results, while Section 7.2 provides the

model description, estimation procedure and results for the compartmental model.

To assess the sensitivity of the results in Section 7.2, a sensitivity analysis is presented

in Section 7.3. We conclude with a discussion in Section 7.4.

7.1 Growth model


To compare growth patterns over time among districts, we use a flexible spatiotem-

poral growth rate model across all districts. The weekly number of new infections in

each district is modeled via a count-distribution allowing for possible overdispersion.

The expected number of cases is modeled using a spatiotemporal function that makes

a distinction between the temporal process and the spatial process. While the number

of cases in a district is allowed to depend on the number of cases in this district

the week before, the number of cases also depends on the number of cases in the

neighboring districts. The growth rate is obtained numerically as the derivative of

the expected number of cases. This model can be written as:

$$I_i(t) \sim \text{NegBin}(\mu_i(t)), \qquad \log(\mu_i(t)) = \beta_0 + \beta_1 C_1 + \beta_2 C_2 + f_i(t),$$

in which Ii(t) is the number of newly infected cases in week t and district i, C1 is an

indicator variable for Sierra Leone, C2 for Guinea and fi(t) is defined as a separable

spatiotemporal function:

$$f_i(t) = x_{i,t} = \phi x_{i,t-1} + \varepsilon_{i,t},$$

with φ a scalar and ε a Gaussian spatial random walk process, defined as

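$$\varepsilon_{i,t} \mid \varepsilon_{j,t}, \tau \sim N\left( \frac{1}{n_i} \sum_{i \sim j} \varepsilon_{j,t},\; \frac{1}{n_i \tau} \right),$$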

where ni is the number of neighbors of district i, τ is a precision parameter and i ∼ j indicates that districts i and j are neighboring districts.

While Markov chain Monte Carlo (MCMC) methods are often used to estimate

the parameters of interest in this model, they are computationally intensive.

Therefore, we use Integrated Nested Laplace Approximation (INLA; Rue et al.,

2009) as an alternative estimation method. The INLA approach is a fast Bayesian

inference tool that uses accurate approximations to the densities of the hyperparameters

and latent variables in the model.

This model allows the estimation of the district-specific expected number of

new cases per week exp(β0 + β1C1 + β2C2 + fi(t)), the district-specific time trend fi(t),

the district-specific growth rate dfi(t)/dt (estimated as (xi,t+1 − xi,t−1)/2) and the spatial

distribution of the growth rate within the three countries. In addition, to investigate

the effect of implemented intervention measures on the estimated growth rates, for

each district a Pearson’s Chi-square test was used. Doing so, we tested, for different

time lags, the association between positive and negative growth rates and the absence

or presence of the aforementioned intervention measures. A heat map of the growth rates

was made to visualize the weekly change in each district-specific

rate of infection, with an overlay of intervention measures.
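
The numerical growth rate used above is a central difference of the fitted time trend. A minimal sketch, assuming the weekly values x_{i,t} for one district are available as a plain list (the function name and the use of `None` for the first and last week, where no central difference exists, are choices made here for illustration):

```python
def growth_rates(x):
    """Central differences (x[t+1] - x[t-1]) / 2 for the interior weeks of one district.

    The first and last week have no central difference and are returned as None.
    """
    n = len(x)
    return [None if t in (0, n - 1) else (x[t + 1] - x[t - 1]) / 2.0
            for t in range(n)]


# Hypothetical fitted trend on the log scale for four consecutive weeks.
print(growth_rates([0.0, 1.0, 4.0, 9.0]))
```

A positive value corresponds to an increase in the expected weekly number of cases and a negative value to a decline.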


7.1.2 Results

Results of the growth rate model are shown in the heat map in Figure 7.1.

Figure 7.1: Estimated weekly growth rates per district and implemented intervention

measures for Guinea, Sierra Leone and Liberia, 2014-2015. Red colours indicate an increase

in number of weekly cases, whereas blue colours indicate a decline. Periods for which no

reported cases are available are shown in white. A light dot indicates that a triage, holding

centre or CCC is in place and a dark dot indicates that an ETU or ETU and CCC are in

place.

Comparing the growth rates in the different districts (rows), it is clear that the

outbreak did not evolve uniformly over districts. Pearson’s Chi-square test for

decrease in growth rate after implementation of control measures, for different time


lags, did not reveal any insights (results not shown). Figure 7.2 shows the estimated

growth rates and implemented intervention measures for four selected time points on

a geographical map of West Africa. This figure emphasizes the spatial heterogeneity

of the outbreak, even within countries. To complement the heat map, the cumulative

number of cases and deaths are shown in Figures 7.3 and 7.4, respectively. These

figures make it possible to identify the most affected regions.

Figure 7.2: Estimated growth rate per district and implemented intervention measures

during week 21 and 40 of 2014 and week 8 and 26 of 2015. ‘1’ triage, holding centre or

CCC is in place; ‘2’ ETU or ETU plus CCC is in place.


Figure 7.3: Cumulative cases per district and implemented intervention measures. A light

dot indicates that a triage, holding centre or CCC is in place and a dark dot indicates that

an ETU or ETU and CCC are in place.


Figure 7.4: Cumulative deaths per district and implemented intervention measures. A

light dot indicates that a triage, holding centre or CCC is in place and a dark dot indicates

that an ETU or ETU and CCC are in place.


7.2 Compartmental model

To address within-district disease evolution over time, district-specific compartmental

models were fitted to the number of newly reported cases and deaths.


We use a version of a Susceptible-Exposed-Infected-Recovered (SEIR) model that

incorporates disease-related mortality by making the distinction between survivors

and non-survivors. It also takes into account an underreporting factor for cases and

deaths. This model is depicted as a flow diagram in Figure 7.5.

Figure 7.5: Flow diagram for the SEIR model with distinction between cases that survive

and fatal cases.

Hence, we assume that individuals are born susceptible (S) to infection. Then, as time

progresses they may become infected and move to the exposed compartment (E) at

a time-dependent transmission rate β(t). After a mean latent period 1/γ, they become

infectious: a proportion 1 − φ, who will eventually recover, moves to the infectious

IR compartment, while the proportion of fatal cases, φ, moves to the ID compartment

at the same rate. Individuals in the IR compartment recover after a mean infectious

period 1/σ. Lastly, α denotes the disease-related mortality rate. This model can be

expressed by the following set of ordinary differential equations:


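\[
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta(t) S(t) \frac{I_R(t) + I_D(t)}{N(t)}, \\
\frac{dE(t)}{dt} &= \beta(t) S(t) \frac{I_R(t) + I_D(t)}{N(t)} - \gamma E(t), \\
\frac{dI_R(t)}{dt} &= (1 - \phi)\gamma E(t) - \sigma I_R(t), \\
\frac{dI_D(t)}{dt} &= \phi\gamma E(t) - \alpha I_D(t), \\
\frac{dR(t)}{dt} &= \sigma I_R(t), \\
\frac{dD(t)}{dt} &= \alpha I_D(t).
\end{aligned}
\]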

In this notation N(t) = S(t) + E(t) + IR(t) + ID(t) + R(t) denotes population size.

The initial conditions at time t = 0 are given by R(0) = 0, IR(0) = ID(0) = 0, E(0)

is an unknown parameter which is estimated from the data and S(0) = N(0)− E(0)

where N(0) is the population size at the start of the epidemic. The expression for the

effective reproduction number R = Re for this model is given by:

\[
R_e(t) = \beta(t) \left( \frac{\phi}{\sigma} + \frac{1 - \phi}{\alpha} \right).
\]
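The six-compartment system above can be integrated numerically to see how a given transmission rate translates into cases and deaths. The following is a minimal sketch only, not the code used for the analysis: it uses a hand-written fourth-order Runge-Kutta step, the fixed durations quoted in this chapter (9.4, 16.4 and 7.5 days), and a population size, initial number of exposed individuals, case fatality ratio and constant β that are purely hypothetical.

```python
def seir_rhs(state, beta, gamma, sigma, alpha, phi):
    """Right-hand side of the SEIR model with survivors (I_R) and fatal cases (I_D)."""
    s, e, i_r, i_d, r, d = state
    n = s + e + i_r + i_d + r  # living population N(t), as defined in the text
    infection = beta * s * (i_r + i_d) / n
    return (
        -infection,
        infection - gamma * e,
        (1.0 - phi) * gamma * e - sigma * i_r,
        phi * gamma * e - alpha * i_d,
        sigma * i_r,
        alpha * i_d,
    )


def rk4_step(state, dt, *rates):
    """One classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = seir_rhs(state, *rates)
    k2 = seir_rhs(shifted(state, k1, dt / 2), *rates)
    k3 = seir_rhs(shifted(state, k2, dt / 2), *rates)
    k4 = seir_rhs(shifted(state, k3, dt), *rates)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(n0, e0, beta_of_day, days, steps_per_day=10,
             gamma=1 / 9.4, sigma=1 / 16.4, alpha=1 / 7.5, phi=0.5):
    """Return the daily states (S, E, I_R, I_D, R, D), starting from S(0) = N(0) - E(0)."""
    state = (n0 - e0, e0, 0.0, 0.0, 0.0, 0.0)
    daily = [state]
    dt = 1.0 / steps_per_day
    for day in range(days):
        beta = beta_of_day(day)  # beta is held constant within a day
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, gamma, sigma, alpha, phi)
        daily.append(state)
    return daily


# Hypothetical inputs: 100,000 inhabitants, one exposed individual, constant beta.
trajectory = simulate(n0=100_000, e0=1.0, beta_of_day=lambda day: 0.2, days=120)
```

Because the right-hand sides sum to zero, the total over all six compartments stays equal to N(0), which gives a quick sanity check on any implementation.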


Fitting the ODE is done taking into account the specific reporting of cases and

deaths. Reporting occurs at varying time intervals. Figure 7.6 schematically shows

the reporting scheme of cases. The reporting scheme for deaths is similar but the

dates at which reporting occurs are not necessarily the same.

Figure 7.6: Schematic representation of reporting of case notifications.


The data consist of the cumulative numbers of (suspected, probable and confirmed)
cases and deaths. Hence, these data are expected to increase monotonically over time.
However, due to reclassification of suspected cases over time, the cumulative number
of cases decreases at certain time points, resulting in negative numbers of new cases.
We therefore applied the pool adjacent violators algorithm (PAVA) to
monotonize the cumulative data.

Denote the cumulative number of cases by $c_i$, $i = 1, \ldots, n$. Suppose that $i^*$ is
the first index for which $c_{i^*} < c_{i^*-1}$, i.e. the first index for which the monotone
behavior is violated. The PAVA now states that these values need to be “pooled”.
Hence, $c_{i^*}$ and $c_{i^*-1}$ are both replaced by $(c_{i^*} + c_{i^*-1})/2$. The algorithm then proceeds by
recursively checking monotone behavior and by pooling if necessary and stops when
monotonicity is achieved.
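The pooling step can be sketched as follows. This is an illustrative implementation of the standard, unweighted PAVA, in which a pooled block that still violates monotonicity is merged with its left neighbour and replaced by the mean of all values in the merged block; the function name `pava` and the example series are hypothetical.

```python
def pava(values):
    """Return the non-decreasing sequence closest (in least squares) to `values`."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Pool backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# A cumulative series that drops after a reclassification of suspected cases.
cumulative_cases = [10, 14, 12, 20]
monotone_cases = pava(cumulative_cases)  # [10.0, 13.0, 13.0, 20.0]
```

New cases per reporting interval can then be taken as successive differences of the monotonized series, which are non-negative by construction.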

We assume

\[
\begin{aligned}
y_j &\sim \mathrm{NegBin}\big(\rho \times (I_{new}(t_j - 1) - I_{new}(t_j - h_j - 1)),\ \phi_1\big), \\
d_j &\sim \mathrm{NegBin}\big(\rho \times (m(t_j - 1) - m(t_j - h_j - 1)),\ \phi_2\big),
\end{aligned}
\]

where $y_j$ and $h_j$ are defined as in Figure 7.6 for cases and $d_j$ is the equivalent of $y_j$ for
deaths. $I_{new}(t)$ ($m(t)$) is the expected cumulative number of cases (deaths) at time $t$,
obtained by solving $dI_{new}(t)/dt = \gamma E(t)$ ($dm(t)/dt = \alpha I_D(t)$), $\rho$ is the expected fraction of
reported cases (deaths) and $\phi_i$, $i = 1, 2$, are overdispersion parameters. The objective
function is then given by the sum of the negative binomial log-likelihoods specified
above. Further, we model $R_e(t)$ as a piecewise constant function $R_e(i)$ as follows:

\[
R_e(0) = R_0, \qquad R_e(i) = R_0 + r_1 + \ldots + r_i; \quad i = 1, \ldots, n,
\]

such that $r_i$ is the change in reproduction number compared to the previous time
interval. This implies that

\[
\beta_i = R_e(i) \Big/ \left( \frac{\phi}{\sigma} + \frac{1 - \phi}{\alpha} \right); \quad i = 0, \ldots, n,
\]

is also piecewise constant. The length of the intervals is chosen to be 21 days.
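The observation model and the piecewise constant reproduction number can be sketched as below. This is a hedged illustration rather than the fitted code: the negative binomial is assumed to be parametrized by its mean and a size (overdispersion) parameter, so that the variance is mean + mean²/size; the text does not state which parametrization was used. The helper names and the toy cumulative curve are hypothetical.

```python
import math


def nb_logpmf(y, mean, size):
    """Log-probability of count y under NegBin(mean, size); variance = mean + mean^2/size."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mean))
            + y * math.log(mean / (size + mean)))


def interval_loglik(counts, report_days, gaps, cumulative, rho, size):
    """Sum of log-likelihood terms for counts reported on day t_j covering h_j days.

    `cumulative(t)` is the model-expected cumulative number of cases (or deaths).
    """
    total = 0.0
    for y, t, h in zip(counts, report_days, gaps):
        expected = rho * (cumulative(t - 1) - cumulative(t - h - 1))
        total += nb_logpmf(y, max(expected, 1e-12), size)  # floor avoids log(0)
    return total


def piecewise_re(r0, changes):
    """Levels R_e(0), R_e(1), ..., built as R_0 plus the running sum of the changes r_i."""
    levels = [r0]
    for change in changes:
        levels.append(levels[-1] + change)
    return levels


def re_at(day, levels, interval_length=21):
    """Reproduction number on a given day, with 21-day intervals as in the text."""
    return levels[min(day // interval_length, len(levels) - 1)]


levels = piecewise_re(2.0, [0.5, -1.0])  # [2.0, 2.5, 1.5]
toy_loglik = interval_loglik([5], [8], [7], lambda t: 2.0 * t, rho=0.5, size=2.0)
```

The objective function of the chapter would add one such sum for cases (with φ1) and one for deaths (with φ2).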

Prior estimates for the latent period (9.4 days) and the infectious periods for
survivors (16.4 days) and deceased (7.5 days) are used following Lewnard et al. (2014).
The remaining parameters (φ1, φ2, E(0), φ, ρ, R0, ri; i = 1, ..., n) are estimated via
Markov chain Monte Carlo (MCMC) using the Adaptive-Mixture Metropolis algorithm. We
conducted 2,500,000 iterations, retaining every 500th iteration. Burn-in is based on
the BMK convergence diagnostic. The MCMC procedure, which we made publicly
available, was performed in R 3.1.1 using the LaplacesDemon package (Roberts
and Rosenthal, 2009; Rosenthal, 2007). The univariate prior distributions are given
in Table 7.1. Of these, the prior distributions for φ1, φ2, E(0), R0 and ri are
uninformative. The underreporting rate ρ is assumed to follow a truncated normal
distribution with mean 0.33, based on Centers for Disease Control and Prevention
(2014), and the case fatality ratio φ follows a beta distribution with mean 0.5.

Table 7.1: Prior distributions.

Parameter  Definition                               Prior distribution
φ1         Overdispersion parameter cases           HC(α = 25)
φ2         Overdispersion parameter deaths          HC(α = 25)
E(0)       Number of exposed individuals at time 0  U(0, 1)
φ          Case fatality ratio                      Beta(α = 10, β = 10)
ρ          Underreporting rate                      N(µ = 1/3, δ = 0.1); truncated(0, 1)
R0         Reproduction number 1st time period      U(0, 10)
ri         Changes in reproduction number           U(−2, 2)
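For readers who want to reproduce the prior specification outside R, the joint log-prior can be sketched as a sum of the univariate log-densities. This is an illustration under stated assumptions: HC(α) is read as a half-Cauchy with scale α, δ in the truncated normal is read as a standard deviation, the mean of ρ is taken to be 0.33 as stated in the text, and the dictionary keys are hypothetical names.

```python
import math

NEG_INF = float("-inf")


def log_half_cauchy(x, scale):
    if x <= 0:
        return NEG_INF
    return math.log(2.0) - math.log(math.pi * scale) - math.log1p((x / scale) ** 2)


def log_uniform(x, low, high):
    return -math.log(high - low) if low <= x <= high else NEG_INF


def log_beta(x, a, b):
    if not 0.0 < x < 1.0:
        return NEG_INF
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x) + log_norm


def log_truncated_normal(x, mean, sd, low, high):
    if not low < x < high:
        return NEG_INF

    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))

    log_mass = math.log(cdf(high) - cdf(low))  # probability mass inside (low, high)
    return (-0.5 * ((x - mean) / sd) ** 2
            - math.log(sd * math.sqrt(2.0 * math.pi)) - log_mass)


def log_prior(theta):
    """Joint log-prior; `theta` uses hypothetical key names for the model parameters."""
    total = log_half_cauchy(theta["phi1"], 25.0)
    total += log_half_cauchy(theta["phi2"], 25.0)
    total += log_uniform(theta["e0"], 0.0, 1.0)
    total += log_beta(theta["cfr"], 10.0, 10.0)
    total += log_truncated_normal(theta["rho"], 0.33, 0.1, 0.0, 1.0)
    total += log_uniform(theta["r0"], 0.0, 10.0)
    total += sum(log_uniform(r, -2.0, 2.0) for r in theta["r"])
    return total


example = {"phi1": 1.0, "phi2": 1.0, "e0": 0.5, "cfr": 0.5,
           "rho": 0.33, "r0": 2.0, "r": [0.1, -0.3]}
example_log_prior = log_prior(example)
```

A parameter value outside the support of any prior yields a log-prior of minus infinity, so such proposals are always rejected by a Metropolis-type sampler.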

7.2.3 Results

We show the obtained results for a selection of rural and urban districts: Forecariah

(Guinea), Conakry (Guinea), Western Area Urban (Sierra Leone), and Grand Cape

Mount (Liberia). This selection was based on events of interest during the course

of the outbreak e.g. sudden increase in cases. We were, however, also restricted

by inconsistencies in the data as pointed out as a limitation of the model in the

discussion. Parameter estimates are given in Table 7.2.

Table 7.2: Parameter estimates with 95% posterior credible intervals.

District            φ1                 φ2                 Ê(0)               φ                  ρ
Forecariah          0.76 [0.54, 1.08]  3.18 [1.70, 7.19]  0.44 [0.07, 0.96]  0.66 [0.54, 0.77]  0.33 [0.13, 0.53]
Conakry             0.62 [0.44, 0.89]  1.61 [1.03, 2.60]  61.4 [20.6, 97.7]  0.53 [0.41, 0.67]  0.34 [0.17, 0.54]
Western Area Urban  2.34 [1.71, 3.20]  4.17 [2.31, 8.21]  0.55 [0.10, 0.98]  0.19 [0.16, 0.22]  0.35 [0.16, 0.55]
Grand Cape Mount    0.72 [0.51, 1.00]  0.62 [0.43, 0.90]  0.54 [0.08, 0.98]  0.62 [0.48, 0.77]  0.33 [0.10, 0.54]


The observed and estimated numbers of new and cumulative cases and deaths are
shown in Figure 7.7. From this figure, we observe that the model fits both the number
of cases and the number of deaths relatively well. The fit is also reasonable at the
level of the cumulative numbers. The estimated effective reproduction
numbers over time are shown in Figure 7.9. Re(t) ranges from below one up to
3.5. Furthermore, estimates are below one for all four districts in the last time period.
However, the 95% credible intervals indicate substantial variability.

Figure 7.7: Observed (black) and estimated (blue) number of new cases (top left), new

deaths (top right), cumulative cases (bottom left) and cumulative deaths (bottom right)

per district. Dashed lines are 95% credible intervals.

We retrospectively assessed the quality of three-week predictions made at four
different time points for the selected districts, and compared these predictions
with the observed numbers of cases and deaths. Results of these short-term
predictions are presented in Figure 7.8 for Western Area Urban.

Figure 7.8: Three-week prediction of new cases (left) and deaths (right) for Western Area

Urban at 24 October, 14 November, 5 December and 26 December 2014 (top to bottom).

Light blue regions are the predicted time periods and estimation is based on all data before

that time point.


Note that the credible intervals do not contain all data points. Hence, even within a

3-week forecast period, the models are not always able to capture all the trends.

Figure 7.9: Estimated reproduction number per district with 95% posterior intervals.

The threshold value of one is indicated by a red horizontal line.

7.3 Sensitivity Analysis

To assess the sensitivity of our results in Section 7.2 to the model assumptions, a few

additional models were fitted to the data of Nzerekore, Guinea. We investigated the

estimability of the fixed parameters, the effect of the number of exposed individuals

at time 0, transmission through contacts with bodies of dead people, and protective

immunity by asymptomatic infections. We compared the models using the Deviance
Information Criterion (DIC). We use the following notation:

Model 1: the model described in Section 7.2

Models 2a-2f : fixing E(0) and varying its value from 0.01 to 10

Model 3: estimating E(0) with uninformative prior U(0, 1000)

Model 4: estimating E(0) with uninformative prior U(0, 1000) and fixing R0 to 2.00

Model 5: estimating E(0) with uninformative prior U(0, 1000) and fixing ρ to 0.33

Model 6: fixing E(0) and estimating the latent period


Model 7: fixing E(0) and estimating the infectious period of non-fatal cases

Model 8: fixing E(0) and estimating the infectious period of fatal cases

Model 9: fixing E(0) and estimating the underreporting of deaths, ρdeaths, separately

Models 2 to 5 look at the effect of the parameter E(0), the number of exposed indi-

viduals on 23 May, 2014. In Models 6 to 9 we look at the estimation of several fixed

parameters. In Models 10 to 13c we take into account that EVD can be transmit-

ted through contact with the bodies of dead people. This model is expressed in the

following set of differential equations.

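\[
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta(t) S(t) \frac{I_R(t) + I_D(t) + m D_I(t)}{N(t)}, \\
\frac{dE(t)}{dt} &= \beta(t) S(t) \frac{I_R(t) + I_D(t) + m D_I(t)}{N(t)} - \gamma E(t), \\
\frac{dI_R(t)}{dt} &= (1 - \phi)\gamma E(t) - \sigma I_R(t), \\
\frac{dI_D(t)}{dt} &= \phi\gamma E(t) - \alpha I_D(t), \\
\frac{dR(t)}{dt} &= \sigma I_R(t), \\
\frac{dD_I(t)}{dt} &= \alpha I_D(t) - \kappa D_I(t), \\
\frac{dD_R(t)}{dt} &= \kappa D_I(t).
\end{aligned}
\]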

Hence, when an individual dies from EVD, the body of that individual can transmit

the disease (state DI) for a period of time (1/κ) with transmission rate mβ(t). It

then moves to state DR where transmission is no longer possible e.g. after burial.

Model 10: uninformative prior U(0, 20) for m and Gamma distribution with

mean 2 days and standard deviation 1.5 days for 1/κ

Models 11a-11b: fixing m = 1, 2

Model 12: fixing 1/κ = 2 days

Models 13a-13c: fixing 1/κ = 2 days and m = 0.1, 0.5, 1

Finally, since there is evidence of asymptomatic Ebola infections (Bellan et al., 2014),

we assess the effect of protective immunity by asymptomatic infections in Models
14a-14e. These correspond to the following set of ODEs:


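\[
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta(t) S(t) \frac{I_R(t) + I_D(t)}{N(t)}, \\
\frac{dE(t)}{dt} &= \beta(t) S(t) \frac{I_R(t) + I_D(t)}{N(t)} - \gamma E(t), \\
\frac{dI_R(t)}{dt} &= (1 - p)(1 - \phi)\gamma E(t) - \sigma I_R(t), \\
\frac{dI_D(t)}{dt} &= (1 - p)\phi\gamma E(t) - \alpha I_D(t), \\
\frac{dR(t)}{dt} &= \sigma I_R(t), \\
\frac{dD(t)}{dt} &= \alpha I_D(t) + p\gamma E(t),
\end{aligned}
\]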

where p is the proportion of asymptomatic cases.

Models 14a-14e: fixing p = 0.1, 0.2, 0.3, 0.4, 0.5

The results are given in Tables 7.3 and 7.4. Looking at the DIC values of
models 2a-2f, we see very little difference, indicating that E(0) is
not estimable from the data. However, for large values of E(0) (see model 2f),
optimization leads to a local maximum with a very small reporting rate and high
values of Re, which are deemed implausible. Moreover, mixing in this model is
very poor and convergence is not attained. The same is observed when estimating
E(0) with an uninformative prior (model 3), even when R0 is kept constant (model
4). In model 5 the underreporting rate is fixed; this leads to convergence and
good results, but there is no improvement in DIC compared to model 1. For
this reason, we chose to estimate E(0) between 0 and 1 in our final model. Note
that the value of E(0) in the converged models only affects the estimates of Re
in the first time periods, making the most recent estimates robust to changes in E(0).

In models 6, 7 and 8 the latent period and infectious periods are estimated,
but again this leads to poor convergence and DIC does not improve. In model 9 we
explored whether a different underreporting rate for deaths could be estimated, but
poor mixing and high autocorrelation for that parameter indicated that this is not
possible. Again, the most recent estimates of Re are quite robust in converged models.

In models 10 to 13c we look at transmission from dead bodies. When estimating
both parameters (m and κ), m is estimated to be 0 and the model does
not converge. The same result is obtained when fixing 1/κ to 2 days (model 12).
Hence, m is not estimable from the data. We therefore fix m to different values, both
estimating and fixing κ (models 11 and 13), but this does not lead to an improvement
in DIC or to large changes in recent estimates of Re.

Finally, in models 14a-14e, we do see an improvement in DIC with a growing
proportion of asymptomatic cases, suggesting that taking into account the possibility
of asymptomatic cases is coherent with observations.
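Since the model comparison above rests entirely on DIC, it may help to recall how it is computed from MCMC output. The sketch below shows two common definitions of the effective number of parameters; which one underlies the reported values is not stated in the text, so both are given as assumptions, and the deviance values used in the example are made up.

```python
import statistics


def dic_spiegelhalter(deviance_samples, deviance_at_posterior_mean):
    """DIC = Dbar + pD, with pD = Dbar - D(posterior mean of the parameters)."""
    mean_deviance = statistics.fmean(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d, p_d


def dic_variance_based(deviance_samples):
    """Alternative: DIC = Dbar + pV, with pV = var(D) / 2 (sample variance)."""
    mean_deviance = statistics.fmean(deviance_samples)
    p_v = statistics.variance(deviance_samples) / 2.0
    return mean_deviance + p_v, p_v


# Made-up deviance draws, for illustration only.
draws = [10.0, 12.0, 14.0]
dic_a, p_d = dic_spiegelhalter(draws, deviance_at_posterior_mean=11.0)
dic_b, p_v = dic_variance_based(draws)
```

In both variants a lower DIC indicates a better trade-off between fit and complexity, and small differences between models carry little evidence.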

7.4 Discussion

The results of our study strengthen the evidence of a strong temporal and spatial

variability of the EVD transmission at a sub-national level in the affected regions of

Guinea, Sierra Leone and Liberia. The variable transmission dynamics are a major

challenge for the implementation of intervention measures and the mobilization of

resources among districts. This complexity highlights the importance of constant

monitoring and the usefulness of quantitative tools, thereby taking full account of

the uncertainty, to inform the response.

Our growth model quantifies spatiotemporal transmission patterns at a sub-national
level, which cannot be derived from visual inspection of incidence curves and maps
alone. The visualization of the growth rates with a two-dimensional (time and space)
heat map is useful for decision makers to make evidence-based decisions on
resource allocation. On the other hand, our compartmental model allows the
calculation of a quantitative measure of transmission, Re(t), that can be used to compare and
communicate about differences in outbreak dynamics between districts and over time.

The combined model illustrates how district-level data can be used to gain
quantitative insight into the complex outbreak dynamics. Both models show how
the trend varies widely among the districts and changes quickly in time and space
(Figures 7.1 and 7.9). While our estimates of Re(t) are within the range of published
estimates, most of the published estimates were derived from country-level data and
do not provide the time-dependent, district-level granularity we provide. The wide
range of Re(t), between near 0 and 3.5, illustrates the need to complement national
models with district-level data-driven models to support public health action.


Table 7.3: Parameter estimates sensitivity analysis. Fixed values are indicated in bold, asterisks indicate model differences compared to the final model 1.

Model     1        2a       2b       2c       2d       2e       2f       3        4        5        6        7        8        9
E(0)      0.21     0.01*    0.1*     0.2*     0.3*     0.5*     10*      46.74*   310.32*  0.45*    0.28     0.28     0.28     0.1
1/γ       9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      1.92*    9.4      9.4      9.4
1/σ       16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     10.17*   16.4     16.4
1/α       7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      1.03*    7.5
φ         0.58     0.60     0.58     0.58     0.58     0.58     0.56     0.56     0.57     0.59     0.53     0.54     0.57     0.37
ρ         0.32     0.36     0.33     0.32     0.32     0.31     0.0009   0.0009   0.0010   0.33*    0.0009   0.0009   0.32     0.43
ρdeaths   -        -        -        -        -        -        -        -        -        -        -        -        -        1.00*
Re(0)     2.64     3.73     2.88     2.61     2.47     2.29     3.39     2.49     2.00*    2.29     2.92     4.17     2.60     2.85
Re(1)     2.23     2.29     2.12     2.20     2.25     2.32     2.41     2.54     2.70     2.33     1.58     2.60     2.57     2.33
Re(2)     1.94     1.79     1.96     1.96     1.98     1.98     2.41     2.40     2.32     1.96     2.08     2.04     2.12     2.00
Re(3)     1.02     0.98     1.03     1.03     1.03     1.02     1.87     1.89     168      1.04     1.59     1.95     1.10     1.04
Re(4)     0.60     0.63     0.61     0.60     0.60     0.59     2.08     2.10     1.75     0.60     2.28     2.40     0.67     0.62
Re(5)     0.37     0.35     0.36     0.37     0.37     0.38     2.20     2.28     1.75     0.37     2.42     2.74     0.46     0.36
Re(6)     0.26     0.27     0.27     0.25     0.26     0.26     2.17     2.22     1.66     0.26     3.09     2.86     0.24     0.26
Re(7)     0.24     0.23     0.24     0.24     0.24     0.24     1.91     2.03     1.23     0.24     2.75     2.86     0.16     0.24
Re(8)     0.38     0.40     0.41     0.43     0.40     0.40     1.94     2.01     1.48     0.43     2.60     2.78     0.44     0.41
DIC       457.57   459.81   456.77   457.13   456.43   456.66   456.37   454.24   454.65   458.66   581.37   463.72   457.25   458.19


Table 7.4: Parameter estimates sensitivity analysis. Fixed values are indicated in bold, asterisks indicate model differences compared to the final model 1.

Model     10       11a      11b      12       13a      13b      13c      14a      14b      14c      14d      14e
E(0)      0.20     0.20     0.20     0.49     0.21     0.19     0.71     0.23     0.26     0.29     0.32     0.36
1/γ       9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4
1/σ       16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4
1/α       7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5
p         -        -        -        -        -        -        -        0.1*     0.2*     0.3*     0.4*     0.45*
1/κ       0.67*    0.56*    0.56*    2.00*    2.00*    2.00*    2.00*    -        -        -        -        -
m         0.00*    1.00*    2.00*    0.00*    0.10*    0.50*    1.00*    -        -        -        -        -
φ         0.58     0.58     0.57     0.57     0.60     0.57     0.55     0.58     0.58     0.59     0.59     0.59
ρ         0.33     0.32     0.32     0.001    0.32     0.32     0.0009   0.32     0.32     0.32     0.33     0.33
Re(0)     2.65     2.60     2.57     3.93     2.61     2.56     4.22     2.95     3.32     3.76     4.41     4.81
Re(1)     2.22     2.15     2.13     2.73     2.18     2.10     2.66     2.49     2.84     3.24     3.78     4.08
Re(2)     1.93     1.91     1.87     2.10     1.92     1.85     2.08     2.13     2.35     2.63     2.98     3.23
Re(3)     1.03     0.99     0.98     1.51     1.01     0.95     1.89     1.17     1.35     1.56     1.85     2.06
Re(4)     0.60     0.60     0.59     1.29     0.59     0.58     2.27     0.66     0.74     0.84     1.01     1.12
Re(5)     0.37     0.36     0.36     0.96     0.37     0.36     2.62     0.43     0.48     0.54     0.62     0.68
Re(6)     0.27     0.25     0.25     0.80     0.25     0.24     2.77     0.28     0.31     0.37     0.43     0.48
Re(7)     0.23     0.24     0.23     0.69     0.23     0.23     2.71     0.28     0.32     0.35     0.42     0.46
Re(8)     0.44     0.40     0.39     1.20     0.41     0.44     2.68     0.43     0.43     0.41     0.48     0.45
DIC       460.04   463.74   460.34   462.43   457.02   463.52   460.75   456.32   456.40   455.31   454.79   453.52


We further show that it is difficult to generate accurate predictions. Forecast

results should be interpreted with caution, as control measures and behavioral

changes cannot be sufficiently quantified with the publicly available data. Also, there

are still gaps in our basic knowledge about the disease spread that could potentially

explain outliers, departing from modeling approaches. We think here, for example,

of the three last reported cases in Liberia; one from suspected sexual transmis-

sion months after the source case recovered from disease (Christie et al., 2015),

and most recently two connected cases without any recognized link to outbreak chains.

One of the limitations of our model is the assumption of constant underre-

porting. Previous studies have also assumed a proportion of underreporting (Merler

et al., 2015). Knowledge about the level and changes in underreporting over time

would improve the estimates of transmission dynamics. Unfortunately we do not have

data to assess the magnitude or the variability of underreporting. Also, inconsistent

reporting with undocumented backlogging and the absence of dates of disease onset

may affect the accuracy of the estimates and need to be taken into consideration

when interpreting the results (Azmon et al., 2014). Furthermore, the district-specific

SEIR model is a mathematical model assuming a deterministic disease process. As a

consequence, the second phase of our approach was deemed inappropriate for some

districts, because the data did not seem to follow any consistent pattern, presumably

due to the aforementioned inconsistencies in detection and reporting and the sporadic

introduction of cases.

EVD can be transmitted through contact with dead bodies; therefore, a model

accounting for this transmission was included in the sensitivity analysis. However,

this model did not improve the fit to the data. Most likely, the relative contributions
of dead bodies and living cases to transmission cannot be distinguished with this
model; doing so requires more information and a fully stochastic modeling approach on
disaggregated data, which are not publicly available.

Since there is evidence suggesting the presence of asymptomatic Ebola infec-

tions (Bellan et al., 2014), we looked at the effect of accounting for protective

immunity by asymptomatic infection. We observed that the model fit improved

with increasing proportion of asymptomatic cases, suggesting that our data do not

reject the hypothetical occurrence of asymptomatic cases. Asymptomatic cases could

partially explain why the epidemic did not reach the expected incidence as predicted

by models ignoring them. This again highlights the need for serological studies in


order to clarify the role of asymptomatic infection.

While our sensitivity analysis assesses the influence of unknown parameters, it

cannot substitute for non-public data. The growth rate and compartmental models

can be run in real time using our published code and dataset, and can be improved

by organizations that have additional data available or to explore adaptations to the

models and parameters. In the end, different modeling approaches bring different

insights and will improve our ability to effectively support public health action. We
recommend that minimal datasets and standards for data processing, including
de-identification, and data sharing be developed for future multi-country outbreaks,
especially Public Health Emergencies of International Concern under the International
Health Regulations. The importance of this was also one of the conclusions of
a recent research paper on this topic (Sane and Edelstein, 2015).

Our two-stage modeling approach, built with the most detailed publicly available
data, provides time-dependent district-specific quantitative measures of growth
and transmission. We hope that such a tool, in addition to other approaches, can
complement public health action against such devastating events as the West African
Ebola epidemic.

Chapter 8 Discussion and Further Research

In this thesis, we have presented several strategies incorporating diverse sources of

social contact data to gain greater realism in modeling infectious disease transmission.

In Chapter 3, we studied and extended the traditional social contact hypothesis for

VZV serology in multiple European countries by accounting for differences related

to susceptibility and infectiousness. Goeyvaerts et al. (2010) showed that inference

for infectiousness proportionality factors is not possible based on serology only.

However, we proposed to use the effective reproduction number as a model eligibility

criterion for infections in endemic equilibrium to deal with this indeterminacy. We

concluded that the social contact hypothesis could be improved upon in 10 out

of 12 countries by including age-dependent factors. This could be attributed to

differences in susceptibility and infectiousness between individuals of different age

groups, but also to differences in the estimated social contact rates and the true

contact rates underlying VZV transmission. Estimates of the basic reproduction

number resulting from the best fitting model differed quite substantially between

countries indicating heterogeneity in VZV epidemiology across Europe. From a set of

demographic, socio-economic and spatio-temporal factors, some were found to have a

positive association with R0 (childhood vaccination coverage, child care attendance,

population density and average living area per person), whilst others showed a

negative association (income inequality, poverty, breast feeding, and the proportion

of children under 14). Interpretation of these associations is not straightforward
in all cases; however, some factors, e.g. poverty, income inequality and vaccination
coverage, may be associated with countries in which children go into childcare from
an early age, facilitating the spread of VZV. The analyses in this study relied on
(1) endemicity of VZV, which seems tenable for the countries under consideration

and which is supported by the similar results obtained for the two samples of

Italy, and (2) the appropriateness of the POLYMOD physical contacts. The effect

of a perturbation in the endemic equilibrium was studied in a small sensitivity
analysis; however, an in-depth analysis would be necessary to fully assess the impact
on the estimation of R. Furthermore, we have shown that knowledge about the

heterogeneity in susceptibility and infectiousness would prove to be very useful to

inform the link between transmission and contact rates when inferring infectious

disease parameters from serological data.

It is known that disease symptoms can affect the contact pattern of an indi-

vidual, for example when staying home from work or school during illness. In

Chapter 4, we have used social contact data from both symptomatic and healthy

individuals to inform mixing patterns in a mathematical disease model accounting

for asymptomatic infection. Applying this model to ILI incidence data, we have

found that the proportion of symptomatic infections and the relative infectiousness

of symptomatic versus asymptomatic cases are very strongly correlated. Hence,

the difference in contact behavior between individuals experiencing symptoms

and healthy/asymptomatic individuals allows estimating one of these parameters

conditional on the other e.g. when assuming asymptomatic individuals are equally

infectious as symptomatic cases. We have extended this model and found that

the data support the hypothesis that the development of ILI symptoms depends

on whether one was infected by a symptomatic or asymptomatic case under the

assumption that symptomatic cases are more infectious than asymptomatic cases. In

this modeling approach we have relied on literature-based fixed parameters and we

have included reporting rates, however, these were not estimable from the data and

information on under-reporting of cases in at least one age class would be necessary

to estimate the true number of cases. The results of this analysis have pointed us to

a preferential transmission hypothesis for influenza, however there are no reports of

a clear link between acquired viral dose and development of influenza symptoms in

the literature yet. Further research is necessary to investigate this relation. We have
highlighted the importance of using empirical data to describe the relation between
contact rates and symptom severity; contact data for other diseases are necessary to
extend this model.

In contrast to the common assumption of homogeneous mixing in infectious

disease modeling, human populations exhibit inherent structure because individuals


spend their time in various groups such as households, work places, schools, etc. In

Chapters 5 and 6, we focused on contact heterogeneity within these groups, more

specifically households. We started by introducing the first social contact survey

designed to study contact networks within households in Chapter 5. These networks

were then analyzed using exponential random graph models. We found that the

results support density dependent transmission, with the mean number of contacts

increasing with increasing household size during weekdays, and that contacts between

father and children are less likely than between father and mother, mother and

children and siblings (except older siblings). To assess the impact of these empirically

grounded contact networks on the spread of an infection, we simulated epidemics in

a community of households in a two-level SIR setting with the underlying household

contact network either based on the ERGM or assuming random mixing. These

simulations indicated the random mixing assumption within households to be a

plausible one in this specific setting, since we did not find any noteworthy implications

of switching to an empirically-based contact network. In these analyses, we focused
on physical contacts; note, however, that other network links may be more appropriate

when investigating the spread of a specific infection. For example, duration of

contact might be of importance for some diseases. This could be incorporated in the

model by switching to a ‘valued’ network analysis and using these values as weights

in the epidemic model. Further, temporal dynamics could be taken into account by

combining the contact data with the time-use data that was also collected in the study.

In Chapter 6, the model for within-household contact networks developed in

Chapter 5 is combined with epidemic data from a similar community of households

to estimate parameters in a two-level mixing model using a Bayesian framework.

The contact graph underlying this model is therefore informed by both empirical

contact data as well as disease data. From data on symptom onset times of pertussis

in households with a laboratory confirmed index case, we estimated within-household

and community transmission parameters and the duration of the latent period. We

plan to perform a simulation study in which epidemics are simulated in a similar

setting, i.e. assuming independent households with an index case, to assess model

fit. Furthermore, we would like to analyze the data using a model that relies on

the random mixing assumption within households to compare the results with our

estimates and fit to the data. We will also investigate the performance of the model

when keeping the contact network fixed.

Chapter 7 presents a two-stage model for the Ebola outbreak in 2014. This


model was based on publicly available district-level data and showed a strong

temporal and spatial variability of EVD transmission in Guinea, Sierra Leone and

Liberia. This spatial heterogeneity was not taken into account in the majority of

the models published during the outbreak that were fitted on cumulative national

data. We quantified the spatio-temporal transmission patterns at the sub-national

level using a growth model and estimated the effective reproduction number in

selected districts via compartmental models. The latter also allowed us to generate
predictions for the number of cases and deaths. However, comparing these predictions
with observed counts showed that even short-term forecasts are far from reliable.

Extending the model to include protective immunity by asymptomatic infection

improved the fit to the data, however, further research is necessary to investigate

the existence of asymptomatic EVD cases. The modeling approach in this chapter

relied on the assumption of constant under-reporting of cases and deaths over

time, although changes in reporting rates during the outbreak are almost certain

due to e.g. increased awareness. Unfortunately, no data on these changes were

available. Additionally, irregularities in the data indicated inconsistent reporting and

undocumented backlogging.

Lastly, note that the compartmental models in this thesis rely on the common

assumption of exponentially distributed latent and infectious periods. This assumption
is mathematically convenient, but not very realistic for most infections
(e.g. Sartwell (1995)). More realistic distributions can be obtained by considering

an Erlang distribution with parameters γ and n. It has been shown that these

distributions can have substantial effects, for example in case of perturbations in

the endemic equilibrium (e.g. Lloyd (2001a)), when contact rates vary seasonally

(e.g. Lloyd (2001b)) and when estimating R0 for an emerging disease (Wearing

et al., 2005). I performed a small sensitivity analysis in the context of Chapter 4

(results not shown) in which Erlang distributions (n = 2 and n = 5) were considered.

There was no change in model fit, however, parameter estimates did slightly change.

Wearing et al. (2005) showed that for the same value of R0 and the same average

infectious period, larger values of n lead to a steeper increase in incidence, which is

indeed what we observed. It is possible to account for the uncertainty surrounding

these distributions. However, this implies determining two extra parameters in an
SEIR-type model, which might not be estimable from the data at hand and/or
might be problematic in a computationally intensive setting.
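To make the role of the shape parameter n concrete, the sketch below evaluates an Erlang density and checks its mean numerically. It assumes the parametrization commonly used in this literature, in which the period is split into n consecutive stages that are each left at rate nγ, so that the mean stays 1/γ while the variance shrinks to 1/(nγ²); the function names and the use of the 9.4-day latent period as an example are illustrative choices only.

```python
import math


def erlang_pdf(t, rate, n):
    """Density of the sum of n exponential stages, each with rate n * rate."""
    stage_rate = n * rate
    return stage_rate ** n * t ** (n - 1) * math.exp(-stage_rate * t) / math.factorial(n - 1)


def erlang_moments(rate, n):
    """Mean and variance implied by the stage construction."""
    return 1.0 / rate, 1.0 / (n * rate ** 2)


def numeric_mean(rate, n, upper=400.0, step=0.01):
    """Midpoint-rule approximation of the mean, as a check on the density."""
    steps = int(upper / step)
    return step * sum(
        (i + 0.5) * step * erlang_pdf((i + 0.5) * step, rate, n) for i in range(steps)
    )


gamma = 1.0 / 9.4  # example: mean latent period of 9.4 days
summary = {n: erlang_moments(gamma, n) for n in (1, 2, 5)}
```

With n = 1 the exponential distribution is recovered; larger n concentrates the period around its mean, which is the mechanism behind the steeper increase in incidence mentioned above.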

Bibliography

Abbey, H. (1952). An examination of the Reed-Frost theory of epidemics. Human

Biology, 24(3):201–233.

Addy, C. L., Longini, Jr., I. M., and Haber, M. (1991). A generalized stochastic model

for the analysis of infectious disease final size data. Biometrics, 47:961–974.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood

principle. In Petrov, B.N. and Csaki, F., editors. 2nd International Symposium on

Information Theory, pages 267–281.

Althaus, C. L., Low, N., Musa, E. O., Shuaib, F., and Gsteiger, S. (2015). Ebola virus

disease outbreak in Nigeria: Transmission dynamics and rapid control. Epidemics,

11:80–84.

Anderson, H. (1999). Epidemic models and social networks. Journal of Mathematical

Sciences, (24):128–147.

Anderson, R. M. and May, R. M. (1991). Infectious Diseases of Humans: Dynamics

and Control. Oxford University Press, Oxford.

Andersson, H. and Britton, T. (2000). Stochastic Epidemic Models and Their Statis-

tical Analysis, volume 151 of Lecture Notes in Statistics. Springer New York, New

York, NY.

Azmon, A., Faes, C., and Hens, N. (2014). On the estimation of the reproduction

number based on misreported epidemic data. Statistics in Medicine, 33(7):1176–

1192.

Bailey, N. (1975). The Mathematical Theory of Infectious Diseases and its

Application. Griffin, London.

Bailey, N. T. J. (1957). The mathematical theory of epidemics. Griffin, London, UK.

Ball, F. and Britton, T. (2007). An epidemic model with infector-dependent severity.

Advances in Applied Probability, 39(4):949–972.

Ball, F. and Lyne, O. D. (2001). Stochastic multi-type SIR epidemics among a popu-

lation partitioned into households. Advances in Applied Probability, 33(01):99–123.

Ball, F., Mollison, D., and Scalia-Tomba, G. (1997). Epidemics with two levels of

mixing. The Annals of Applied Probability, 7:46–89.

Ball, F. and Neal, P. (2002). A general model for stochastic SIR epidemics with two

levels of mixing. Mathematical Biosciences, 180:73–102.

Becker, N. G. (1989). Analysis of Infectious Disease Data. Chapman and Hall/CRC.

Becker, N. G. and Dietz, K. (1995). The effect of household distribution on transmis-

sion and control of highly infectious diseases. Mathematical biosciences, 127(2):207–

19.

Becker, N. G. and Hall, R. (1996). Immunization levels for preventing epidemics in

a community of households made up of individuals of various types. Mathematical

biosciences, 132(2):205–16.

Bellan, S. E., Pulliam, J. R. C., Dushoff, J., and Meyers, L. A. (2014). Ebola control:

effect of asymptomatic infection and acquired immunity. Lancet (London, England),

384(9953):1499–500.

Bernoulli, D. (1760). Essai d’une nouvelle analyse de la mortalité causée par la petite

vérole et des avantages de l’inoculation pour la prévenir. Mémoires de l’Académie

Royale des Sciences, Paris.

Bollaerts, K., Aerts, M., Shkedy, Z., Faes, C., Van der Stede, Y., Beutels, P., and Hens,

N. (2012). Estimating the population prevalence and force of infection directly from

antibody titres. Statistical Modelling, 12(5):441–462.

Boone, E. L., Merrick, J. R., and Krachey, M. J. (2012). A Hellinger distance ap-

proach to MCMC diagnostics. Journal of Statistical Computation and Simulation,

84(4):833–849.

Breiman, L. (2001). Random forests. Machine Learning, 45:5–32.

Bremaud, P. (1999). Markov Chains. Springer New York, New York, NY.

Britton, T. and O’Neill, P. (2002). Bayesian inference for stochastic epidemics in

populations with random social structure. Scandinavian Journal of Statistics,

29:375–390.

Brooks, S. P. and Roberts, G. O. (1998). Convergence assessment techniques for

Markov chain Monte Carlo. Statistics and Computing, 8(4):319–335.

Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel In-

ference: A Practical Information-Theoretic Approach. Springer-Verlag New York

Inc.

Carrat, F., Vergu, E., Ferguson, N. M., Lemaitre, M., Cauchemez, S., Leach, S., and

Valleron, A. J. (2008). Time lines of infection and disease in human influenza: A re-

view of volunteer challenge studies. American Journal of Epidemiology, 167(7):775–

785.

Cauchemez, S., Carrat, F., Viboud, C., Valleron, A. J., and Boelle, P. Y. (2004).

A Bayesian MCMC approach to study transmission of influenza: application to

household longitudinal data. Statistics in medicine, 23(22):3469–87.

CDC and NCIRD (2015). Immunology and vaccine-preventable diseases - pink book

- pertussis. https://www.cdc.gov/vaccines/pubs/pinkbook/downloads/pert.

pdf. Accessed: March 22, 2016.

Centers for Disease Control and Prevention (2010). The 2009 H1N1 pandemic: Summary

highlights. http://www.cdc.gov/h1n1flu/cdcresponse.htm. Accessed: July

15, 2016.

Centers for Disease Control and Prevention (2014). West Africa: Ebola

outbreak. http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/

qa-mmwr-estimating-future-cases.html.

Centers for Disease Control and Prevention (2016). Pertussis (whooping cough).

http://www.cdc.gov/pertussis/clinical/features.html. Accessed: October

7, 2016.

Chao, D. L., Halloran, M. E., Obenchain, V. J., and Longini, I. M. (2010). FluTE,

a Publicly Available Stochastic Influenza Epidemic Simulation Model. PLoS Com-

putational Biology, 6(1):e1000656.

Chowell, G., Simonsen, L., Viboud, C., and Kuang, Y. (2014a). Is West Africa

Approaching a Catastrophic Phase or is the 2014 Ebola Epidemic Slowing Down?

Different Models Yield Different Answers for Liberia. PLoS Currents, 6.

Chowell, G., Viboud, C., Hyman, J. M., and Simonsen, L. (2014b). The Western

Africa Ebola Virus Disease Epidemic Exhibits Both Global Exponential and Local

Polynomial Growth Rates. PLoS Currents.

Christie, A., Davies-Wayne, G. J., Cordier-Lassalle, T., Cordier-Lasalle, T., Blackley,

D. J., Laney, A. S., Williams, D. E., Shinde, S. A., Badio, M., Lo, T., Mate,

S. E., Ladner, J. T., Wiley, M. R., Kugelman, J. R., Palacios, G., Holbrook, M. R.,

Janosko, K. B., de Wit, E., van Doremalen, N., Munster, V. J., Pettitt, J., Schoepp,

R. J., Verhenne, L., Evlampidou, I., Kollie, K. K., Sieh, S. B., Gasasira, A., Bolay,

F., Kateh, F. N., Nyenswah, T. G., De Cock, K. M., and Centers for Disease Control

and Prevention (CDC) (2015). Possible sexual transmission of Ebola virus - Liberia,

2015. MMWR. Morbidity and mortality weekly report, 64(17):479–81.

Cowles, M. K. and Carlin, B. P. (1996). Markov Chain Monte Carlo Convergence Di-

agnostics: A Comparative Review. Journal of the American Statistical Association,

91(434):883–904.

Daley, D. J. and Gani, J. (1999). Epidemic Modelling. Cambridge University Press,

Cambridge.

Danon, L., Ford, A. P., House, T., Jewell, C. P., Keeling, M. J., Roberts, G. O., Ross,

J. V., and Vernon, M. C. (2011). Networks and the epidemiology of infectious

disease. Interdisciplinary Perspectives on Infectious Diseases, Article ID 284909:28

pages.

de Greeff, S. C., de Melker, H. E., Westerhof, A., Schellekens, J. F., Mooi, F. R.,

and van Boven, M. (2012). Estimation of household transmission rates of pertussis

and the effect of cocooning vaccination strategies on infant pertussis. Epidemiology,

23(6):852–860.

de Greeff, S. C., Mooi, F. R., Westerhof, A., Verbakel, J. M. M., Peeters, M. F.,

Heuvelman, C. J., Notermans, D. W., Elvers, L. H., Schellekens, J. F. P., and

de Melker, H. E. (2010). Pertussis Disease Burden in the Household: How to

Protect Young Infants. Clinical Infectious Diseases, 50(10):1339–1345.

de Ory, F., Echevarría, J. M., Kafatos, G., Anastassopoulou, C., Andrews, N., Backhouse,

J., Berbers, G., Bruckova, B., Cohen, D. I., de Melker, H., Davidkin, I.,

Gabutti, G., Hesketh, L. M., Johansen, K., Jokinen, S., Jones, L., Linde, A., Miller,

E., Mossong, J., Nardone, A., Rota, M. C., Sauerbrei, A., Schneider, F., Smetana,

Z., Tischer, A., Tsakris, A., and Vranckx, R. (2006). European seroepidemiology

network 2: Standardisation of assays for seroepidemiology of varicella zoster virus.

Journal of clinical virology : the official publication of the Pan American Society

for Clinical Virology, 36(2):111–8.

Del Valle, S. Y., Hyman, J., Hethcote, H., and Eubank, S. (2007). Mixing patterns

between age groups in social networks. Social Networks, 29:539–554.

Demiris, N. and O’Neill, P. (2005a). Bayesian inference for epidemics with two levels

of mixing. Scandinavian Journal of Statistics, 32:265–280.

Demiris, N. and O’Neill, P. D. (2005b). Bayesian inference for stochastic multitype

epidemics in structured populations via random graphs. Journal of the Royal Sta-

tistical Society: Series B (Statistical Methodology), 67(5):731–745.

Diekmann, O., Heesterbeek, H., and Britton, T. (2013). Mathematical tools for un-

derstanding infectious diseases dynamics. Princeton University Press.

Diekmann, O., Heesterbeek, J. A. P., and Metz, J. A. J. (1990). On the definition and

the computation of the basic reproduction ratio R0 in models for infectious diseases

in heterogeneous populations. Journal of Mathematical Biology, 28:365–382.

Dietz, K. (1975). Transmission and control of arbovirus diseases. Epidemiology, SIMS

Utah Conference Proceedings, Eds. D. Ludwig and K.L. Cooke:104–121.

Dietz, K. (1993). The estimation of the basic reproduction number for infectious

diseases. Statistical methods in medical research, 2(1):23–41.

Dorjee, S., Poljak, Z., Revie, C. W., Bridgland, J., Mcnab, B., Leger, E., and Sanchez,

J. (2013). A review of simulation modelling approaches used for the spread of

zoonotic influenza viruses in animal and human populations. Zoonoses and Public

Health, 60(6):383–411.

Eames, K. T. D., Tilston, N. L., White, P. J., Adams, E., and Edmunds, W. J. (2010).

The impact of illness and the impact of school closure on social contact patterns.

Health Technology Assessment, 14(34):267–312.

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Monographs

on Statistics & Applied Probability. Chapman & Hall/CRC, London.

Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals

of Statistics, 7(1):1–26.

Ejima, K., Aihara, K., and Nishiura, H. (2013). The impact of model building on

the transmission dynamics under vaccination: observable (symptom-based) versus

unobservable (contagiousness-dependent) approaches. PloS one, 8(4):e62062.

European Centre for Disease Prevention and Control (2016). Ebola and

Marburg fevers. http://ecdc.europa.eu/en/healthtopics/ebola_marburg_

fevers/Pages/index.aspx. Accessed: July 18, 2016.

Exchange HD (2015). West Africa: Ebola outbreak. https://data.hdx.rwlabs.org/

ebola. Accessed: July 1, 2015.

Farrington, C. P. (2003). Modelling Epidemics. The Open University.

Farrington, C. P., Kanaan, M. N., and Gay, N. J. (2001). Estimation of the ba-

sic reproduction number for infectious diseases from age-stratified serological sur-

vey data. Journal of the Royal Statistical Society: Series C (Applied Statistics),

50(3):251–292.

Farrington, C. P. and Whitaker, H. J. (2005). Contact Surface Models for Infectious

Diseases. Journal of the American Statistical Association, 100(470):370–379.

Ferguson, N. M., Cummings, D. A. T., Fraser, C., Cajka, J. C., Cooley, P. C., and

Burke, D. S. (2006). Strategies for mitigating an influenza pandemic. Nature,

442(7101):448–452.

Fisman, D., Khoo, E., and Tuite, A. (2014). Early Epidemic Dynamics of the West

African 2014 Ebola Outbreak: Estimates Derived with a Simple Two-Parameter

Model. PLoS Currents.

Fraser, C., Riley, S., Anderson, R. M., and Ferguson, N. M. (2004). Factors that make

an infectious disease outbreak controllable. Proceedings of the National Academy

of Sciences of the United States of America, 101(16):6146–6151.

Geyer, C. J. and Thompson, E. A. (1992). Constrained Monte Carlo maximum like-

lihood calculations. Journal of the Royal Statistical Society B, 54:657–699.

Goeyvaerts, N. (2011). Statistical and mathematical models to estimate the transmis-

sion of airborne infections from current status data. PhD thesis, Hasselt University.

Goeyvaerts, N., Hens, N., Ogunjimi, B., Aerts, M., Shkedy, Z., Damme, P. V., and

Beutels, P. (2010). Estimating infectious disease parameters from data on social

contacts and serological status. Journal of the Royal Statistical Society: Series C

(Applied Statistics), 59(2):255–277.

Goeyvaerts, N., Santermans, E., Potter, G., Van Kerckhove, K., Willem, L., Beutels,

P., and Hens, N. (2016). Empirical household contact networks: revisiting the

two-level mixing model. To be submitted.

Gomes, M. F. C., Pastore y Piontti, A., Rossi, L., Chao, D., Longini, I., Halloran,

M. E., and Vespignani, A. (2014). Assessing the International Spreading Risk

Associated with the 2014 West African Ebola Outbreak. PLoS Currents.

Goodreau, S. M., Handcock, M. S., Hunter, D. R., Butts, C. T., and Morris, M.

(2008). A statnet tutorial. Journal of Statistical Software, 24:1–26.

Greenhalgh, D. and Dietz, K. (1994). Some bounds on estimates for reproductive

ratios derived from the age-specific force of infection. Mathematical Biosciences,

124(1):9–57.

Grefenstette, J. J., Brown, S. T., Rosenfeld, R., DePasse, J., Stone, N. T. B., Coo-

ley, P. C., Wheaton, W. D., Fyshe, A., Galloway, D. D., Sriram, A., Guclu, H.,

Abraham, T., and Burke, D. S. (2013). FRED (a Framework for Reconstructing

Epidemic Dynamics): an open-source software system for modeling infectious dis-

eases and control strategies using census-based populations. BMC public health,

13:940.

Grijalva, C. G., Goeyvaerts, N., Verastegui, H., Edwards, K. M., Gil, A. I., Lanata,

C. F., Hens, N., and RESPIRA PERU project (2015). A household-based study of

contact networks relevant for the spread of infectious diseases in the highlands of

Peru. PloS one, 10(3):e0118457.

Groendyke, C., Welch, D., and Hunter, D. R. (2011). Bayesian Inference for Contact

Networks Given Epidemic Data. Scandinavian Journal of Statistics, 38(3):600–616.

Groendyke, C., Welch, D., and Hunter, D. R. (2012). A network-based analysis of

the 1861 Hagelloch measles data. Biometrics, 68(3):755–65.

Haario, H., Saksman, E., and Tamminen, J. (2001). An Adaptive Metropolis Algo-

rithm. Bernoulli, 7(2):223.

Hall, B. (2011). LaplacesDemon: An R Package for Bayesian Inference. R package

version.

Halloran, M. E., Longini, Jr., I. M., Nizam, A., and Yang, Y. (2002). Containing

bioterrorist smallpox. Science, 298:1428–1432.

Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks.

Technical Report Working Paper no. 39, University of Washington, Seattle.

Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., Krivitsky, P. N., and

Morris, M. (2013a). ergm: Fit, Simulate and Diagnose Exponential-Family Models

for Networks. The Statnet Project (http://www.statnet.org). R package version

3.1-0.

Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., Krivitsky, P. N., and

Morris, M. (2013b). statnet: Software tools for the Statistical Analysis of Network

Data. The Statnet Project (http://www.statnet.org). R package version 3.1-0.

Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., and Morris, M.

(2008). statnet: Software tools for the representation, visualization, analysis and

simulation of network data. Journal of Statistical Software, 24:1–11.

Hanneke, S., Fu, W., and Xing, E. P. (2010). Discrete temporal models of social

networks. Electron. J. Statist., 4:585–605.

Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and

Their Applications. Biometrika, 57(1):97–109.

Hawkes, N. (2014). Ebola outbreak is a public health emergency of international

concern, WHO warns. BMJ (Clinical research ed.), 349:g5089.

Heesterbeek, J. A. P. (2002). A brief history of R0 and a recipe for its calculation.

Acta biotheoretica, 50(3):189–204.

Hens, N., Aerts, M., Faes, C., Shkedy, Z., Lejeune, O., Van Damme, P., and Beutels,

P. (2010). Seventy-five years of estimating the force of infection from current status

data. Epidemiology and infection, 138(6):802–12.

Hens, N., Shkedy, Z., Aerts, M., Faes, C., Van Damme, P., and Beutels, P. (2012).

Modeling Infectious Disease Parameters Based on Serological and Social Contact

Data: a Modern Statistical Perspective. Springer-Verlag New York Inc.

Huang, Y., Zaas, A. K., Rao, A., Dobigeon, N., Woolf, P. J., Veldman, T., Oien,

N. C., McClain, M. T., Varkey, J. B., Nicholson, B., Carin, L., Kingsmore, S.,

Woods, C. W., Ginsburg, G. S., and Hero, A. O. (2011). Temporal dynamics of

host molecular responses differentiate symptomatic and asymptomatic influenza A

infection. PLoS genetics, 7(8):e1002234.

Hunter, D. R. (2007). Curved exponential family models for social networks. Social

Networks, 29:216–230.

Hunter, D. R., Handcock, M. S., Butts, C. T., Goodreau, S. M., and Morris, M.

(2008). ergm: A package to fit, simulate and diagnose exponential-family models

for networks. Journal of Statistical Software, 24:1–29.

Inaba, H. and Nishiura, H. (2008). The state-reproduction number for a multistate

class age structured epidemic system and its application to the asymptomatic trans-

mission model. Mathematical biosciences, 216(1):77–89.

Keeling, M. J. and Eames, K. T. D. (2005). Networks and epidemic models. Journal

of the Royal Society Interface, 2:295–307.

Kelly, H., Riddell, M. A., Gidding, H. F., Nolan, T., and Gilbert, G. L. (2002).

A random cluster survey and a convenience sample give comparable estimates of

immunity to vaccine preventable diseases in children of school age in Victoria,

Australia. Vaccine, 20(25):3130–3136.

Kermack, W. and McKendrick, A. (1927). A contribution to the mathematical theory

of epidemics. Proceedings of the Royal Society London A, 115:700–721.

King, A. A., Domenech de Celles, M., Magpantay, F. M. G., and Rohani, P. (2015).

Avoidable errors in the modelling of outbreaks of emerging pathogens, with spe-

cial reference to Ebola. Proceedings of the Royal Society B: Biological Sciences,

282(1806):20150347–20150347.

Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models.

Springer, New York.

Krivitsky, P. N. (2012). Exponential-family random graph models for valued networks.

Electron. J. Statist., 6:1100–1128.

Krivitsky, P. N. and Handcock, M. S. (2014). A separable model for dynamic net-

works. Journal of the Royal Statistical Society: Series B (Statistical Methodology),

76(1):29–46.

Lewnard, J. A., Ndeffo Mbah, M. L., Alfaro-Murillo, J. A., Altice, F. L., Bawo, L.,

Nyenswah, T. G., and Galvani, A. P. (2014). Dynamics and control of Ebola virus

transmission in Montserrado, Liberia: a mathematical modelling analysis. The

Lancet Infectious Diseases, 3099(14).

Liberia MoHaSWRo (2015). Facts about ebola virus disease. http://www.mohsw.

gov.lr/content_display.php?submenu_id=72&sub=submenu. Accessed: July 1,

2015.

Lin, C.-J., Deger, K. A., and Tien, J. H. (2016). Modeling the trade-off between trans-

missibility and contact in infectious disease dynamics. Mathematical biosciences,

277:15–24.

Lloyd, A. L. (2001a). Destabilization of epidemic models with the inclusion of realistic

distributions of infectious periods. Proceedings of the Royal Society B: Biological

Sciences, 268(1470):985–993.

Lloyd, A. L. (2001b). Realistic Distributions of Infectious Periods in Epidemic Models:

Changing Patterns of Persistence and Dynamics. Theoretical Population Biology,

60:59–71.

Longini, Jr., I. M. and Koopman, J. S. (1982). Household and community transmission

parameters from final distributions of infections in households. Biometrics, 38:115–

126.

Longini, Jr., I. M., Koopman, J. S., Haber, M., and Cotsonis, G. A. (1988). Statistical

inference for infectious diseases. Risk-specific household and community transmission

parameters. American Journal of Epidemiology, 128:845–859.

Longini, Jr., I. M., Koopman, J. S., Monto, A. S., and Fox, J. P. (1982). Estimating

household and community transmission parameters for influenza. American Journal

of Epidemiology, 115:736–751.

Lunelli, A., Rizzo, C., Puzelli, S., Bella, A., Montomoli, E., Rota, M. C., Donatelli,

I., and Pugliese, A. (2013). Understanding the dynamics of seasonal influenza in

Italy: incidence, transmissibility and population susceptibility in a 9-year period.

Influenza and Other Respiratory Viruses, 7(3):286–295.

Melegaro, A., Gay, N. J., and Medley, G. F. (2004). Estimating the transmission

parameters of pneumococcal carriage in households. Epidemiology and Infection,

132:433–441.

Melegaro, A., Jit, M., Gay, N., Zagheni, E., and Edmunds, W. J. (2011). What types

of contacts are important for the spread of infections?: using contact survey data

to explore European mixing patterns. Epidemics, 3(3-4):143–51.

Merler, S., Ajelli, M., Fumanelli, L., Gomes, M. F., Piontti, A. P., Rossi, L., Chao,

D. L., Longini Jr., I. M., Halloran, M. E., and Vespignani, A. (2015). Spatiotemporal

spread of the 2014 outbreak of Ebola virus disease in Liberia and the effectiveness

of non-pharmaceutical interventions: a computational modelling analysis. Lancet

Infect Dis, 15(2):204–211.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E.

(1953). Equation of State Calculations by Fast Computing Machines. The Journal

of Chemical Physics, 21(6).

Meyers, L. A., Pourbohloul, B., Newman, M., Skowronski, D. M., and Brunham,

R. C. (2005). Network theory and SARS: predicting outbreak diversity. Journal of

Theoretical Biology, 232:71–81.

Miller, E., Hoschler, K., Hardelid, P., Stanford, E., Andrews, N., and Zambon, M.

(2010). Incidence of 2009 pandemic influenza A H1N1 infection in England: a

cross-sectional serological study. Lancet, 375(9720):1100–1108.

Miller, E., Marshall, R., and Vurdien, J. (1993). Epidemiology, outcome and control

of varicella-zoster infection. Reviews in Medical Microbiology, 4:222–230.

Mniszewski, S. M., Del Valle, S. Y., Stroud, P. D., Riese, J. M., and Sydoriak, S. J.

(2008). EpiSimS simulation of a multi-component strategy for pandemic influenza.

Society for Computer Simulation International.

Morris, M., Handcock, M. S., and Hunter, D. R. (2008). Specification of exponential-

family random graph models: Terms and computational aspects. Journal of Sta-

tistical Software, 24:1–24.

Mossong, J., Hens, N., Jit, M., Beutels, P., Auranen, K., et al. (2008a). Social contacts

and mixing patterns relevant to the spread of infectious diseases. PLoS Medicine,

5(3):381–391.

Mossong, J., Hens, N., Jit, M., Beutels, P., Auranen, K., Mikolajczyk, R., Massari,

M., Salmaso, S., Tomba, G. S., Wallinga, J., Heijne, J., Sadkowska-Todys, M.,

Rosinska, M., and Edmunds, W. J. (2008b). Social contacts and mixing patterns

relevant to the spread of infectious diseases. PLoS medicine, 5(3):e74.

Nardone, A., de Ory, F., Carton, M., Cohen, D., van Damme, P., Davidkin, I., Rota,

M. C., de Melker, H., Mossong, J., Slacikova, M., Tischer, A., Andrews, N., Berbers,

G., Gabutti, G., Gay, N., Jones, L., Jokinen, S., Kafatos, G., de Aragon, M. V. M.,

Schneider, F., Smetana, Z., Vargova, B., Vranckx, R., and Miller, E. (2007). The

comparative sero-epidemiology of varicella zoster virus in 11 countries in the Euro-

pean region. Vaccine, 25(45):7866–72.

National Institute for Public Health and the Environment (2013). The national im-

munisation programme in the netherlands - developments in 2013.

Nations U. West Africa (2015). Ebola outbreak 2014-2015. http://www.

humanitarianresponse.info/disaster/ep-2014-000041-gin/documents%

20and%20since%20April%202015%20http://guinea-ebov.github.io/sitreps.

html. Accessed: July 1, 2015.

NERC (2015). EVD daily MoHS update. http://nerc.sl/. Accessed: July 1, 2015.

Nishiura, H. and Chowell, G. (2014). Early transmission dynamics of Ebola virus

disease (EVD), West Africa, March to August 2014. Eurosurveillance.

Ogunjimi, B., Hens, N., Goeyvaerts, N., Aerts, M., Van Damme, P., and Beutels,

P. (2009). Using empirical social contact data to model person to person infec-

tious disease transmission: an illustration for varicella. Mathematical Biosciences,

218:80–87.

O’Neill, P. D. (2009). Bayesian inference for stochastic multitype epidemics in struc-

tured populations using sample data. Biostatistics (Oxford, England), 10(4):779–91.

O’Neill, P. D. and Becker, N. G. (2001). Inference for an epidemic when susceptibility

varies. Biostatistics (Oxford, England), 2(1):99–108.

Organization GoGWH (2015). Rapport de la situation epidemiologique | epi

situation report. http://www.humanitarianresponse.info/en/operations/

west-and-central-africa/documents/disasters/33204. Accessed: July 1, 2015.

Plotkin, S. (2010). Complex correlates of protection after vaccination. Clinical Infec-

tious Diseases, 56:1458–1465.

Potter, G. E. and Handcock, M. S. (2010). A description of within-family resource

exchange networks in a Malawian village. Demographic Research, 23:117–152.

Potter, G. E., Handcock, M. S., Longini, Jr., I. M., and Halloran, M. E. (2011).

Estimating within-household contact networks from egocentric data. Annals of

Applied Statistics, 5:1816–1838.

Potter, G. E. and Hens, N. (2013). A penalized likelihood approach to estimate within-

household contact networks from egocentric data. Journal of the Royal Statistical

Society: Series C (Applied Statistics), 62:629–648.

Potter, G. E., Smieszek, T., and Sailer, K. (2015). Modeling workplace contact net-

works: The effects of organizational structure, architecture, and reporting errors on

epidemic predictions. Network science (Cambridge University Press), 3(3):298–325.

Public Health England (2010). Weekly epidemiological updates archive.

http://www.hpa.org.uk/Topics/InfectiousDiseases/InfectionsAZ/

PandemicInfluenza/H1N1PandemicArchive/SIEpidemiologicalData/

SIEpidemiologicalReportsArchive/influswarchiveweeklyepireports/.

Accessed: December 20, 2010.

Rampey, A. H., Longini, Jr., I. M., Haber, M., and Monto, M. S. (1992). A discrete-

time model for the statistical analysis of infectious disease incidence data. Biomet-

rics, 48:117–128.

Read, J. M., Edmunds, W. J., Riley, S., Lessler, J., and Cummings, D. A. T. (2012).

Close encounters of the infectious kind: methods to measure social mixing be-

haviour. Epidemiology and infection, 140(12):2117–30.

Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh,

P. J., Lander, E. S., Mitzenmacher, M., and Sabeti, P. C. (2011). Detecting

novel associations in large data sets. Science (New York, N.Y.), 334(6062):1518–24.

Roberts, G. O. and Rosenthal, J. S. (2009). Examples of Adaptive MCMC. Journal

of Computational and Graphical Statistics, 18(2):349–367.

Robins, G., Pattison, P., Kalish, Y., and Lusher, D. (2007). An introduction to expo-

nential random graph (p*) models for social networks. Social Networks, 29(2):173–

191.

Rosenthal, J. S. (2007). AMCMC: An R interface for adaptive MCMC.

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for

latent Gaussian models by using integrated nested Laplace approximations. Journal

of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319–392.

Sane, J. and Edelstein, M. (2015). Overcoming barriers to data sharing in pub-

lic health: a global perspective. https://www.chathamhouse.org/publication/

overcoming-barriers-data-sharing-public-health-global-perspective.

Sanitation Moha (2015). Ebola situation report. http://health.gov.sl/?page_id=

583. Accessed: July 1, 2015.

Santermans, E., Goeyvaerts, N., Melegaro, A., Edmunds, W., Faes, C., Aerts, M.,

Beutels, P., and Hens, N. (2015). The social contact hypothesis under the as-

sumption of endemic equilibrium: Elucidating the transmission potential of VZV

in Europe. Epidemics, 11:14–23.

Santermans, E., Robesyn, E., Ganyani, T., Sudre, B., Faes, C., Quinten, C., Van Bor-

tel, W., Haber, T., Kovac, T., Van Reeth, F., Testa, M., Hens, N., and Plachouras,

D. (2016a). Spatiotemporal Evolution of Ebola Virus Disease at Sub-National Level

during the 2014 West Africa Epidemic: Model Scrutiny and Data Meagreness. PloS

one, 11(1):e0147172.

Santermans, E., Van Kerckhove, K., Azmon, A., Edmunds, J., Beutels, P., Faes, C.,

and Hens, N. (2016b). Structural differences in mixing behaviour informing the

role of asymptomatic infection and testing symptom heritability. Mathematical

Biosciences. In revision.

Sartwell, P. E. (1995). The distribution of incubation periods of infectious disease.

1949. American journal of epidemiology, 141(5):386–94; discussion 385.

Schinazi, R. B. (2002). On the role of social clusters in the transmission of infectious

diseases. Theoretical population biology, 61(2):163–9.

Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics,

6(2):461–464.

Smieszek, T., Barclay, V. C., Seeni, I., Rainey, J. J., Gao, H., Uzicanin, A., and

Salathe, M. (2014). How should social mixing be measured: comparing web-based

survey and sensor-based methods. BMC Infectious Diseases, 14.

Strauss, D. and Ikeda, M. (1990). Pseudolikelihood estimation for social networks.

Journal of the American Statistical Association, 85:204–212.

UN Mission for Ebola Emergency Response (2015). (unmeer). http://apps.who.

int/ebola/current-situation/ebola-situation-report-18-march-2015. Ac-

cessed: July 1, 2015.

United Nations Security Council (2014). Resolution 2177, adopted by the Security

Council at its 7268th meeting on 18 September 2014.

Van Effelterre, T., Shkedy, Z., Aerts, M., Molenberghs, G., Van Damme, P., and

Beutels, P. (2009). Contact patterns and their implied basic reproductive numbers:

an illustration for varicella-zoster virus. Epidemiology and Infection, 137:48–57.

Van Kerckhove, K., Hens, N., Edmunds, W. J., and Eames, K. T. D. (2013). The

impact of illness on social networks: implications for transmission and control of

influenza. American journal of epidemiology, 178(11):1655–1662.

Vynnycky, E. and White, R. G. (2010). An introduction to infectious disease mod-

elling. Oxford University Press.

Wallinga, J. and Levy-Bruhl, D. (2001). Estimation of measles reproduction ratios

and prospects for elimination of measles by vaccination in some Western European

countries. Epidemiology and . . . , pages 281–295.

Wallinga, J., Teunis, P., and Kretzschmar, M. (2006). Using data on social contacts

to estimate age-specific transmission parameters for respiratory-spread infectious

agents. American Journal of Epidemiology, 164(10):936–944.

Wearing, H. J., Rohani, P., and Keeling, M. J. (2005). Appropriate Models for the

Management of Infectious Diseases. PLoS Medicine, 2(7):e174.

Whitaker, H. J. and Farrington, C. P. (2004). Infections with varying contact rates:

application to varicella. Biometrics, 60(3):615–23.

Willem, L., Van Kerckhove, K., Chao, D. L., Hens, N., and Beutels, P. (2012). A nice

day for an infection? Weather conditions and social contact patterns relevant to

influenza transmission. PLoS ONE, 7(11):e48695.

Wood, S. N. (2006). Generalized additive models : an introduction with R. Chapman

& Hall/CRC.

Woof, J. M. and Burton, D. R. (2004). Human antibody–Fc receptor interactions

illuminated by crystal structures. Nature Reviews Immunology, 4(2):89–99.

World Health Organization (2010). Weekly epidemiological record - no. 40, 85. http:

//www.who.int/wer. Accessed: March 22, 2016.

World Health Organization (2014). Case definition recommendations for Ebola or Marburg

virus diseases. http://www.who.int/csr/resources/publications/ebola/

ebola-case-definition-contact-en.pdf. Accessed: January 16, 2015.

World Health Organization (2015a). The ebola outbreak in liberia is over. http://

www.who.int/mediacentre/news/statements/2015/liberia-ends-ebola/en/.

World Health Organization (2015b). Ebola situation report - 18

march 2015. http://apps.who.int/ebola/current-situation/

ebola-situation-report-18-march-2015. Accessed: March 25, 2015.

World Health Organization (2015c). Ebola situation report - 24

june 2015. http://apps.who.int/ebola/current-situation/

ebola-situation-report-24-june-2015. Accessed: July 2, 2015.

World Health Organization (2016a). Ebola virus disease: Fact sheet. http://www.

who.int/mediacentre/factsheets/fs103/en/. Accessed: July 18, 2016.

World Health Organization (2016b). Infectious diseases. http://www.who.int/

topics/infectious_diseases/en/. Accessed: August 29, 2016.

Acknowledgements

I gratefully acknowledge support from a Methusalem research grant from the

Flemish government awarded to Herman Goossens (Antwerpen University) and Geert

Molenberghs (Hasselt University). The computational resources and services used

in this thesis were provided by the VSC (Flemish Supercomputer Center), funded

by the Hercules Foundation and the Flemish Government - department EWI. I

gratefully acknowledge G.-J. Bex for his assistance.


Appendix A

Appendix - Chapter 5

In this Appendix we present some extra information relevant to the ERGM, additional

results for the goodness-of-fit simulations (both described in Section 5.2) and the

epidemic simulation in Section 5.3.

A.1 Household Contact Survey

Figure A.1: Barplot of within-household contact duration distributions by type of

relationship, including both physical and non-physical contacts.


A.2 Modeling Within-household Physical Contact Networks

Figure A.2: Interpretation of mixing and age effect statistics of the ERGM: ratio of the

odds of physical contact occurring between two relatives versus a pair of siblings, as a

function of the sum of the siblings’ ages. Left panel: weekday, right panel: weekend day.


Within-household Clustering

We consider various measures of within-household clustering: the clustering coefficient (Kolaczyk, 2009), the mean correlation coefficient (Morris et al., 2008) and the proportion of observed versus potential triangles, defined as:

\[
\begin{aligned}
\text{Clustering coefficient} &= \frac{3 \cdot \#\text{triangles}}{\#\text{connected triples}}; \\
\text{Mean correlation coefficient} &= \frac{\#\text{triangles}}{\#\text{triangles} + \#\text{2-stars} \notin \text{triangle}}; \\
\text{Proportion observed vs. potential triangles} &= \frac{\#\text{triangles}}{\sum_h \binom{\text{size}(h)}{3}}.
\end{aligned}
\]
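As an illustration only, the three measures can be computed from a list of household networks as sketched below. The sketch assumes that triangle and triple counts are pooled over all households in a stratum and that "2-stars not in a triangle" equals the number of connected triples minus three times the number of triangles; the function name `clustering_measures` and the input layout are hypothetical choices, not taken from the thesis.

```python
from itertools import combinations
from math import comb


def clustering_measures(households):
    """Pooled clustering measures for a list of household networks.

    households: list of (size, edges) pairs, where edges is a set of
    frozenset({i, j}) pairs of members (numbered 0..size-1) in contact.
    """
    triangles = triples = potential = 0
    for size, edges in households:
        # Degree of every member, used to count connected triples (2-stars).
        degree = [0] * size
        for edge in edges:
            for member in edge:
                degree[member] += 1
        triples += sum(comb(d, 2) for d in degree)
        # A triangle is a trio of members whose three pairs are all in contact.
        triangles += sum(
            1
            for trio in combinations(range(size), 3)
            if all(frozenset(pair) in edges for pair in combinations(trio, 2))
        )
        # Number of triangles that could exist in a household of this size.
        potential += comb(size, 3)
    # Assumption: every triangle accounts for exactly three connected triples.
    open_two_stars = triples - 3 * triangles
    return {
        "clustering": 3 * triangles / triples if triples else None,
        "mean_correlation": triangles / (triangles + open_two_stars) if triples else None,
        "observed_vs_potential": triangles / potential if potential else None,
    }


# Toy input: one complete household of size 3 and one with a missing contact.
full = (3, {frozenset(p) for p in [(0, 1), (0, 2), (1, 2)]})
star = (3, {frozenset(p) for p in [(0, 1), (0, 2)]})
result = clustering_measures([full, star])
```

For households of size 2 no triple exists, so all three measures are undefined, which corresponds to the NA entries in the first row of Table A.1.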

HH size   Nr. HHs   Average   Clustering    Mean correlation   Proportion observed
                    degree    coefficient   coefficient        vs. potential triangles
2         12        1.00      NA            NA                 NA
3         72        1.88      0.96          0.90               0.86
4         159       2.81      0.97          0.91               0.87
5         57        3.66      0.96          0.90               0.83
≥ 6       16        4.51      0.96          0.88               0.77
Total     316       2.94      0.96          0.90               0.83

Table A.1: Observed physical contact networks: average degree and various measures of

within-household clustering, stratified by household size.

ERGM Weekday

                    Proportion complete                     Mean density
HH size   Nr. HHs   Observed  Median  Q 2.5%  Q 97.5%       Observed  Median  Q 2.5%  Q 97.5%
2         9         1.00      0.89    0.67    1.00          1.00      0.89    0.67    1.00
3         53        0.91      0.92    0.83    0.98          0.96      0.97    0.93    0.99
4         111       0.77      0.75    0.66    0.82          0.93      0.93    0.90    0.95
5         39        0.64      0.67    0.51    0.79          0.90      0.91    0.84    0.96
≥ 6       13        0.46      0.54    0.31    0.77          0.85      0.84    0.73    0.93
Total     225       0.77      0.76    0.71    0.81          0.93      0.93    0.91    0.95

Table A.2: Observed proportion of complete networks and mean network density,

stratified by household size, with median and 95% percentile range obtained from 1000

networks simulated from the ERGM for within-household physical contact networks on a

weekday.
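The simulation-based summaries reported in these goodness-of-fit tables (median and 95% percentile range of a statistic over simulated networks, set against the observed value) can be sketched as follows. This is a minimal illustration: the quantile definition used in the thesis is not stated, so linear interpolation is an assumption here, and the names `quantile` and `gof_summary` as well as the placeholder list `simulated` are hypothetical.

```python
from statistics import median


def quantile(values, p):
    """Quantile of a list of numbers by linear interpolation, 0 <= p <= 1."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def gof_summary(observed, simulated):
    """Median and 95% percentile range of a simulated statistic.

    'covered' flags whether the observed value lies inside the range.
    """
    low = quantile(simulated, 0.025)
    high = quantile(simulated, 0.975)
    return {
        "observed": observed,
        "median": median(simulated),
        "q2.5": low,
        "q97.5": high,
        "covered": low <= observed <= high,
    }


# Placeholder values standing in for a statistic computed on simulated networks.
simulated = [i / 1000 for i in range(1001)]
summary = gof_summary(0.77, simulated)
```

An observed value that falls inside the percentile range, as in every row of the tables here, gives no indication of lack of fit for that statistic.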


                    Proportion observed vs. potential triangles
HH size   Nr. HHs   Observed  Median  Q 2.5%  Q 97.5%
3         53        0.91      0.92    0.83    0.98
4         111       0.85      0.84    0.78    0.89
5         39        0.81      0.82    0.71    0.90
≥ 6       13        0.71      0.71    0.56    0.87
Total     216       0.80      0.80    0.75    0.85

Table A.3: Observed proportion of observed versus potential triangles, stratified by

household size, with median and 95% percentile range obtained from 1000 networks

simulated from the ERGM for within-household physical contact networks on a weekday.

ERGM Weekend

                    Proportion complete                     Mean density
HH size   Nr. HHs   Observed  Median  Q 2.5%  Q 97.5%       Observed  Median  Q 2.5%  Q 97.5%
≤ 3       22        0.77      0.82    0.68    0.95          0.89      0.92    0.82    0.98
4         48        0.85      0.83    0.73    0.92          0.96      0.95    0.92    0.98
≥ 5       21        0.81      0.86    0.71    0.95          0.96      0.97    0.92    0.99
Total     91        0.82      0.84    0.76    0.90          0.94      0.95    0.92    0.97

Table A.4: Observed proportion of complete networks and mean network density,

stratified by household size, with median and 95% percentile range obtained from 1000

networks simulated from the ERGM for within-household physical contact networks on a

weekend day.

                    Proportion observed vs. potential triangles
HH size   Nr. HHs   Observed  Median  Q 2.5%  Q 97.5%
3         19        0.74      0.84    0.63    0.95
4         48        0.91      0.89    0.83    0.94
≥ 5       21        0.93      0.94    0.88    0.98
Total     88        0.91      0.91    0.87    0.95

Table A.5: Observed proportion of observed versus potential triangles, stratified by

household size, with median and 95% percentile range obtained from 1000 networks

simulated from the ERGM for within-household physical contact networks on a weekend

day.

Source                       Data                                           Number of    Household      Stratification   Household        Community
                                                                            households   size (range)                    q_HH    SAR      q_com   CPI
Longini and Koopman (1982)   Asian influenza, Japan                         n = 42       Size 3                          0.96    0.17     0.86    0.14
                             Influenza                                      n = 42       Size 4−5                        0.90    0.36     0.66    0.34
Longini et al. (1982)        1977-78 influenza A(H3N2), Tecumseh            n = 195      Size† 1−5                       0.96    0.15     0.87    0.13
                             1975-76 influenza B, Seattle                   n = 87       Size† 1−5                       0.97    0.13     0.83    0.17
                             1977-78 influenza A(H3N2), Seattle             n = 159      Size NA                         0.94    0.21     0.74    0.26
                             1978-79 influenza A(H1N1), Seattle             n = 93       Size† 1−3                       0.91    0.31     0.54    0.46
Longini et al. (1988)        1977-78, 1980-81 influenza A(H3N2), Tecumseh   n = 567      Size† 1−5      Child <18y       0.94    0.22     0.82    0.18
                                                                                                        Adult ≥18y       0.97    0.11     0.89    0.11
Addy et al. (1991)*          1977-78, 1980-81 influenza A(H3N2), Tecumseh   n = 567      Size† 1−5      Child-child      0.92    0.28     0.82    0.18
                                                                                                        Child-adult      0.96    0.13
                                                                                                        Adult-child      0.97    0.10     0.89    0.11
                                                                                                        Adult-adult      0.96    0.15
Rampey et al. (1992)*        1983 rhinovirus, Tecumseh                      n = 91       Size 3−9       Child-child      0.95    0.17     0.63    0.37
                                                                                                        Child-adult      0.97    0.13
                                                                                                        Adult-child      0.97    0.11     0.76    0.24
                                                                                                        Adult-adult      0.97    0.11
Cauchemez et al. (2004)      1999-2000 influenza A(H3N2), France            n = 334      Size 2−8       Size 2           0.87    0.43     0.92    0.08
                                                                                                        Size 3           0.91    0.31
                                                                                                        Size 4           0.93    0.25
                                                                                                        Size 5           0.94    0.21

Table A.6: Literature-based estimates of household and community transmission parameters obtained from household final size or symptom onset data: q_HH = P(escape infection from infected HH member per day) assuming an infectious period of 4 days, the household secondary attack rate (SAR), i.e. the probability of being infected by another household member during the course of the latter’s infectious period, and q_com = P(escape infection from community during epidemic period) = 1 − CPI, where CPI is the community probability of infection. † Household size defined as the number of susceptibles in a household prior to the epidemic. * Same age definitions for children and adults as in Longini et al. (1988), distinguishing between susceptibles and infected.


A.3 Epidemic Simulation Model

Figure A.3: Final fractions for 1000 simulations of a stochastic SIR epidemic process on a

2-level households model assuming random and empirical-based mixing within households.

Small outbreaks are excluded from display.

Figure A.4: Final fractions for 1000 simulations of a stochastic SIR epidemic process on a

2-level households model assuming random and empirical-based mixing within households.

Small outbreaks are excluded from display.

Summary

Infectious diseases are responsible for millions of deaths every year, mainly in developing countries. From the global HIV and tuberculosis epidemics to the emergence of new pathogens and the resurgence of old ones, often in new and resistant forms, infectious diseases have a very large impact on global health. There is a continuous need for new and improved techniques to study the cause and spread of these diseases in order, ultimately, to bring them under control.

Infectious diseases are caused by pathogenic micro-organisms or germs, such as bacteria, viruses, parasites or fungi, and can be spread directly or indirectly from person to person. Direct transmission can occur through direct contact, such as touching, kissing, biting or sexual intercourse, through respiratory droplet contact, or through the air. Respiratory droplets are small droplets of fluid that are spread when a person sneezes or coughs. This type of transmission is usually limited to short distances. Transmission is airborne when the pathogen spreads via very small droplets that can evaporate after sneezing, coughing or talking. These particles can remain suspended in the air for quite a long time and travel over relatively long distances. Measles, pertussis and (pandemic) influenza are examples of diseases that spread via droplet contact or through the air. Indirect transmission occurs when the infectious organism is transferred via objects or via insects. Malaria and dengue are examples of diseases spread by insects. The transmission route of a disease is mainly determined by characteristics of the infectious organism and of the host. Some micro-organisms are limited to a small number of transmission routes, whereas others can cause infection in various ways. In this thesis the focus is on diseases that spread from person to person via non-sexual contact, such as droplet contact, physical contact or through the air.

When a person is infected with a viral infectious disease, the immune system responds on the one hand by producing specific antibodies against the infecting pathogen, and on the other hand by activating cells that destroy the pathogen. After infection it can take some time before the infected person becomes infectious. The length of this period depends on the disease in question. Infected individuals may develop symptoms, but the presence of symptoms does not always coincide with the infectious period. Infected persons who do not develop symptoms are called asymptomatic. After some time the person is no longer infectious and recovers. Some diseases induce immunity after recovery, so that persons who have had the disease cannot be infected again.

In recent decades the field of infectious disease epidemiology has grown substantially, in response to emerging threats such as the Ebola virus, but also in order to better control endemic diseases. A distinction can be made between statistical models and mathematical models. Statistical models study relationships between variables based on data and then draw conclusions from these relationships. Mathematical models, by contrast, describe a system by means of mathematical equations and study how that system changes and how variables depend on the values of other variables.

An important parameter in infectious disease modelling is the probability of contact between an infectious source and a susceptible person. For infections that are spread from person to person, assumptions are needed that simplify the great diversity of human relationships so that they can be used in mathematical models. Classical epidemic models usually assumed that individuals in a population make contact completely at random (‘homogeneous mixing’) and that everyone is equally susceptible and infectious. Although such assumptions are usually sufficient for modelling purposes, they are not very realistic. Consequently, much research in recent decades has addressed heterogeneity in the acquisition of infection and its effect on disease spread. Anderson and May (1991) introduced a method in which certain structures are assumed for age-specific transmission rates. These structures are specified via low-dimensional matrices, called ‘Who Acquires Infection From Whom’ (WAIFW) matrices. A drawback of this method is the subjectivity of the chosen structure and of the choice of age categories. The effect of these choices has been studied extensively and several extensions of the classical WAIFW method have been proposed. Wallinga et al. (2006) were the first to use social contact data to inform transmission rates. They assume that transmission rates for non-sexual, person-to-person infectious diseases are directly proportional to frequencies of conversational or physical contacts estimated from surveys. This is called the ‘social contact hypothesis’. Models have also been developed that try to incorporate the underlying structure of contact patterns by subdividing the population into contact structures. Examples are household models (Longini and Koopman, 1982; Becker and Dietz, 1995; Becker and Hall, 1996), models with two levels of mixing (Ball et al., 1997; Ball and Lyne, 2001; Demiris and O’Neill, 2005a), network models (Anderson, 1999; Britton and O’Neill, 2002) and social cluster models (Schinazi, 2002).

In this thesis we presented several strategies in which various forms of social contact data are used to make infectious disease models more realistic. In Chapter 3, we studied the classical social contact hypothesis and extended it for serological data on varicella-zoster virus (VZV) in several European countries by accounting for heterogeneity in susceptibility and infectiousness. Goeyvaerts et al. (2010) showed that parameters related to infectiousness cannot be estimated from serological data alone. To deal with this indeterminacy, we proposed using the effective reproduction number R as a model criterion for endemic diseases. Reproduction numbers are very important for characterising the spread of infectious diseases and for assessing the effort needed to bring an epidemic under control. The basic reproduction number R0 is defined as the average number of secondary infections caused by a typical infected individual in a fully susceptible population. The effective reproduction number R gives the number of secondary cases in a population with a given level of immunity. An ‘endemic’ infectious disease is one that occurs in a population at a constant frequency over a longer period of time. The incidence of such a disease may show cyclical trends, but always fluctuates around a stationary mean. We concluded that the social contact hypothesis could be improved for 10 of the 12 countries by adding age-specific factors. This may result from differences in susceptibility and infectiousness between persons of different age groups, but it could also result from differences between the estimated social contacts and the actual contacts needed to spread VZV. The estimated values of the basic reproduction number differed considerably between countries, which indicates that VZV epidemiology in Europe is quite heterogeneous. Among a set of demographic, socio-economic and spatio-temporal factors, some showed a positive correlation with R0 (childhood vaccination coverage, child care attendance, population density and average living area per person), whereas others showed a negative association (income inequality, poverty, breastfeeding and the proportion of children under 14 years of age). The interpretation of these relationships is not equally clear in all cases, but some factors, such as poverty, income inequality and vaccination coverage, could be associated with countries in which children attend day care from a young age, which promotes the spread of VZV. The analyses in this study relied on (1) endemicity of VZV, which is a reasonable assumption for the countries considered, and (2) the suitability of the physical contacts from the POLYMOD study. The effect of a violation of endemicity was examined in a small sensitivity study, but an in-depth analysis is needed to assess the effect of such a violation on R. Furthermore, we showed that information on differences in susceptibility and infectiousness would be very useful when estimating parameters from serological data.
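To make the link between contact rates and reproduction numbers concrete, the sketch below builds a next-generation matrix that is proportional to a contact matrix and takes its dominant eigenvalue as R0. It is a generic textbook construction under stated assumptions, not the estimation procedure of Chapter 3: the two age groups, the contact matrix, the proportionality factor `q`, the infectious period and the susceptible fractions are all hypothetical values, and the function names are illustrative.

```python
def next_generation_matrix(contacts, q, infectious_period, susceptible=None):
    """Next-generation matrix under a proportionality assumption.

    contacts[i][j]: contact rate of a susceptible in group i with group j.
    q: proportionality factor (infection probability per contact).
    susceptible[i]: fraction still susceptible in group i (default: all).
    """
    n = len(contacts)
    if susceptible is None:
        susceptible = [1.0] * n
    return [
        [susceptible[i] * q * infectious_period * contacts[i][j] for j in range(n)]
        for i in range(n)
    ]


def dominant_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a non-negative square matrix by power iteration."""
    n = len(matrix)
    vector = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        value = max(product)
        if value == 0:
            return 0.0
        # Rescale so that the largest component stays equal to one.
        vector = [x / value for x in product]
    return value


# Hypothetical two-group example (e.g. children and adults).
contacts = [[2.0, 1.0], [1.0, 2.0]]
r0 = dominant_eigenvalue(next_generation_matrix(contacts, q=0.1, infectious_period=5))
r_eff = dominant_eigenvalue(
    next_generation_matrix(contacts, q=0.1, infectious_period=5, susceptible=[0.5, 0.8])
)
```

With these toy values R0 is 1.5, while the assumed susceptible fractions bring the effective reproduction number down to about 1, the value around which R is expected to fluctuate for a disease in endemic equilibrium.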

It is known that disease symptoms can affect a person's contact patterns, for example when a sick person stays home from work or school. In Chapter 4, we used social contact data from both healthy persons and persons showing symptoms to inform transmission rates in a mathematical model that accounts for asymptomatic infections. We applied this model to data on influenza-like illness (ILI) incidence and found that the proportion of symptomatic infections and the relative infectiousness of symptomatic versus asymptomatic persons are very strongly correlated. The difference in contact behaviour between ill and healthy persons thus allows us to estimate one of the two parameters conditional on the other, for example when we assume that asymptomatic persons are as infectious as symptomatic persons. In this analysis we also accounted for reporting rates, since some ILI cases may go unreported. Unfortunately, these reporting rates cannot be estimated from the data, so the level of underreporting would have to be known for at least one age category. We also found that our analysis supported the preferential transmission hypothesis, meaning that the probability of developing symptoms depends on the presence or absence of symptoms in the person who transmitted the disease. For the time being, however, the literature provides no evidence of a clear relationship between the viral dose received and the development of influenza symptoms. Further research is required to examine this hypothesis. In this chapter we described the importance of empirical data for describing the relationship between contact rates and severity of symptoms. Contact data for other infectious diseases are needed to extend this model.

In contrast to the commonly used assumption of ‘homogeneous mixing’ in infectious disease models, human populations are divided into inherent structures: people spend time in different groups such as the household, school, workplace, etc. In Chapters 5 and 6 we study contact heterogeneity within these groups and, more specifically, within households. In Chapter 5 we begin by introducing the first social contact survey designed to study contact networks within households. These networks were analysed using exponential random graph models (ERGMs). We found that the mean number of contacts increases with household size on weekdays and that contact between a father and his children is less likely than between father and mother, between mother and children, and among children (except older children). To assess the effect of these empirical contact networks on the spread of an infectious disease, we conducted a simulation study. In it we simulated epidemics in a community of households based on an SIR model with two levels of mixing (‘two-level mixing model’), in which the within-household contact network relied either on the empirical contact networks or on the ‘homogeneous mixing’ assumption. The results showed that the latter is a plausible assumption in this setting, since we found no appreciable differences when using an empirical network. In these analyses we focused on physical contacts, but other types of contact may be more appropriate when studying the spread of a specific infection. The duration of a contact, for example, could matter for some diseases. This could be accounted for in the model by using a network analysis in which weights reflecting contact duration are added. It would also be interesting to look at changes in contacts over time.
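The idea of an SIR epidemic with two levels of mixing can be illustrated with a deliberately simplified, discrete-generation (Reed-Frost type) simulation. This is a sketch under strong assumptions and not the simulation model of Chapter 5: it uses random mixing within households only, every infective is infectious for exactly one generation, and the parameters `p_hh` (per-generation infection probability between two housemates) and `p_com` (per-generation infection probability between any two individuals in the community) are hypothetical.

```python
import random


def simulate_two_level_sir(household_sizes, p_hh, p_com, n_seed=1, rng=None):
    """Final fraction infected in a Reed-Frost type two-level SIR epidemic."""
    rng = rng or random.Random()
    # Household index of every individual in the community.
    hh_of = [h for h, size in enumerate(household_sizes) for _ in range(size)]
    n = len(hh_of)
    state = ["S"] * n
    for i in rng.sample(range(n), n_seed):
        state[i] = "I"
    while "I" in state:
        # Count current infectives per household and in the whole community.
        inf_per_hh = [0] * len(household_sizes)
        for i in range(n):
            if state[i] == "I":
                inf_per_hh[hh_of[i]] += 1
        total_inf = sum(inf_per_hh)
        new_state = list(state)
        for i in range(n):
            if state[i] == "S":
                # Escape all household infectives and all community infectives.
                escape = (1 - p_hh) ** inf_per_hh[hh_of[i]] * (1 - p_com) ** total_inf
                if rng.random() >= escape:
                    new_state[i] = "I"
            elif state[i] == "I":
                new_state[i] = "R"
        state = new_state
    return state.count("R") / n


# Example: 100 households of size 4, with hypothetical transmission parameters.
final_fraction = simulate_two_level_sir(
    [4] * 100, p_hh=0.3, p_com=0.003, rng=random.Random(2016)
)
```

Replacing the household term by a product over the actual contacts of each susceptible would turn the random-mixing assumption into network-based mixing, which is the kind of comparison described in the text.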

In Chapter 6 we combined the model developed in Chapter 5 with epidemic data from a similar community of households to estimate the parameters of a ‘two-level mixing model’ using a Bayesian approach. The underlying contact network of this model was thus informed both by empirical contact data and by disease data. Using data on pertussis in households with a laboratory-confirmed index case, we estimated within-household and community transmission rates as well as the length of the latent period. We further plan to conduct a simulation study in which epidemics in a similar setting will be simulated so that the performance of the model can be assessed. We also want to compare this model with one in which ‘homogeneous mixing’ is assumed within households.

Finally, in Chapter 7 we developed a two-phase model for the 2014 Ebola outbreak. This model was based on publicly available district-level data and showed strong temporal and spatial variability in the transmission of Ebola virus disease (EVD) in Guinea, Sierra Leone and Liberia. The majority of the models published during the outbreak did not take this spatial heterogeneity into account, since they relied on cumulative data at the national level. We quantified spatio-temporal transmission patterns at the sub-national level using a growth model and estimated the effective reproduction number in a selection of districts via compartmental models. The latter method also enabled us to make predictions of the number of EVD cases and deaths. However, comparing these predictions with the observed numbers showed that even short-term forecasts are not reliable. When we extended this model to account for asymptomatic infection, we obtained a better fit, but further research into the existence of asymptomatic EVD cases is needed. The approach in this chapter relied on the assumption of constant underreporting of EVD cases and deaths, although it is almost certain that changes in reporting occurred during the outbreak. Unfortunately, no data on these changes were available. Moreover, irregularities in the data indicated inconsistent reporting and undocumented changes.
