Statistical and Mathematical Perspectives on Present-day Infectious Disease Epidemiology
Eva Santermans
Promoter: Prof. Dr Niel Hens
Co-Promoter: Prof. Dr Marc Aerts
Co-Promoter: Prof. Dr Geert Molenberghs
Word of Thanks
Writer’s block...
Thanks to everyone!
Eva
Just kidding. :-) Although it just occurred to me that this is the only part of my
thesis that will be read by more than five persons, I will not let the stress get to
me. *drinks wine* So, here is my attempt to entertain everyone that is reading this
during my presentation for a solid three minutes (hang in there, reception is coming).
As per usual, I would like to start by thanking the person who had the difficult
task of being my supervisor. Thank you, Niel, for your guidance these past four
years. While you let me work independently, I could always count on you for input,
comments, discussions, ideas, advice (and terribly bad jokes). I feel this might be a
good opportunity to apologize for my bad sense of humor; however, there are still a
number of people I need to mention. I would like to thank my co-supervisors, Marc
and Geert, for their suggestions that helped me improve my work. Furthermore, I
am grateful to the other members of my jury for providing feedback on an earlier
version of this thesis and to all my co-authors, with whom I had the pleasure of
collaborating. I would also like to say thank you to Phil and Theo for giving me the
opportunity to spend some time at the University of Nottingham. These research
visits introduced me to the field of Bayesian statistics and broadened my knowledge.
The past years would not have been the same without all my awesome colleagues.
Cheers to the people of the ‘Koffiegroep’ who made my coffee and lunch
breaks so much more enjoyable. High five to the JOSS board. I had a lot of fun
during our regular meetings and activities, especially the bowling, which seemed to
trigger a highly amusing competitiveness among the board members (yes boyz, that’s
a reference to you ;-) ). Thank you to the colleagues in Antwerp for the interesting
discussions and presentations during our monthly meetings. Lander deserves a
special mention for working magic on my sometimes-not-that-efficient R code.
And of course, special thanks to my roomie, Robin! We were ‘matched’ at the start
of our PhD and, well, I think our neighboring offices can indicate how that turned
out (so sorry, we were a tad bit chatty sometimes). You are by far the most ‘stressy’
and ‘catty’ person I know (don’t worry, I won’t include any details), but I could
always count on you. Homies forever!
Anja and Stephanie, thanks for the regular dinners! There is nothing that cannot be
solved with food, wine or gossip, right? ;-) Of course, I also owe a big thank you
to my parents. Mum and dad, you have always supported me during my
(slightly-longer-than-average) years of study. Dad, thank you for always pushing me
so that I could get the best out of myself. Mum, thank you for regularly reining dad
in a little. :-) Also a big kiss for Stella and Richard for always being there for
Cliff and me. Finally, a fist bump for my boyfriend! Sjattie, thank you for handling
my ‘rare’ (cough) moments of grumpiness Cliff-style. :-) You always manage to make
me laugh (even though your jokes are a bit silly and your gorilla imitation is very
embarrassing). I love you!
Eva Santermans
Diepenbeek, 17 November 2016
List of Publications
Publications covered in this dissertation:
[1] Santermans, E., Goeyvaerts, N., Melegaro, A., Edmunds, W.J., Faes, C.,
Aerts, M., Beutels, P. and Hens, N. (2015). The social contact hypothesis under
the assumption of endemic equilibrium: Elucidating the transmission potential
of VZV in Europe. Epidemics, Volume 11, p. 14−23.
[2] Santermans, E., Ganyani, T., Faes, C., Hens, N., Plachouras, D., Quinten,
C., Robesyn, E., Sudre, B., Van Bortel, W. (2016). Spatiotemporal evolution of
Ebola virus disease at sub-national level during the 2014 West Africa epidemic.
PLoS ONE, 11(1): e0147172. doi: 10.1371/journal.pone.0147172.
[3] Santermans, E., Van Kerckhove, K., Azmon, A., Edmunds, W.J., Beutels,
P., Faes, C., Hens, N. Structural differences in mixing behaviour informing the
role of asymptomatic infection and testing symptom heritability. In revision for
Mathematical Biosciences.
[4] Goeyvaerts, N., Santermans, E., Potter, G., Van Kerckhove, K., Willem,
L., Aerts, M., Beutels, P., Hens, N. Empirical household contact networks:
challenging the random mixing assumption. In preparation.
[5] Santermans, E., O’Neill, P.D., Kypraios, T., Beutels, P., Hens, N. Bayesian
inference for the two-level mixing model incorporating empirical household
contact networks. In preparation.
Publications not covered in this dissertation:
[7] Hens, N., Abrams, S., Santermans, E., Theeten, H., Goeyvaerts, N., Lernout,
T., Leuridan, E., Van Kerckhove, K., Goossens, H., Van Damme, P. and
Beutels, P. (2015). Assessing the risk of measles resurgence in a highly vaccinated
population: An application to Belgium anno 2013. Eurosurveillance, 20(1), doi:
10.2807/1560-7917.ES2015.20.1.20998.
Publications not covered in this dissertation on the statistical analysis of cell
transplantation experiments and tumor immunology:
[1] Praet, J., Orije, J., Kara, F., Guglielmetti, C., Santermans, E., Daans, J.,
Hens, N., Verhoye, M., Berneman, Z., Ponsaerts, P. and Van der Linden, A.
(2015). Cuprizone-induced demyelination and demyelination-associated
inflammation result in different proton magnetic resonance metabolite spectra:
1H-MRS discriminates demyelination from its associated inflammation. NMR in
Biomedicine, doi: 10.1002/nbm.3277.
[2] Praet, J., Santermans, E., Reekmans, K., de Vocht, N., Le Blon, D.,
Hoornaert, C., Daans, J., Goossens, H., Berneman, Z., Hens, N., Van der Linden, A.
and Ponsaerts, P. (2014). Histological Characterization and Quantification of
Cellular Events Following Neural and Fibroblast(-Like) Stem Cell Grafting in
Healthy and Demyelinated CNS tissue. Methods in Molecular Biology, 1213, p.
265−283.
[3] Praet, J., Santermans, E., Daans, J., Le Blon, D., Hoornaert, C., Goossens,
H., Van der Linden, A., Hens, N., Berneman, Z. and Ponsaerts, P. (2014). Early
inflammatory responses following cell grafting in the CNS trigger activation of
the sub-ventricular zone: a proposed model of sequential cellular events. Cell
Transplantation, doi: 10.3727/096368914X682800.
[4] Le Blon, D., Hoornaert, C., Daans, J., Santermans, E., Hens, N., Goossens,
H., Berneman, Z. and Ponsaerts, P. (2014). Distinct spatial distribution of
microglia and macrophages following mesenchymal stem cell implantation in
mouse brain. Immunology and Cell Biology, 92(8), p. 650−658.
[5] Costa, R., Bergwerf, I., Santermans, E., De Vocht, N., Praet, J., Daans, J., Le
Blon, D., Hoornaert, C., Reekmans, K., Hens, N., Goossens, H., Berneman, Z.,
Parolini, O., Alviano, F. and Ponsaerts, P. (2015). Distinct in vitro properties of
embryonic and extra-embryonic fibroblast-like cells are reflected in their in vivo
behaviour following grafting in the adult mouse brain. Cell Transplantation,
Volume 24, p. 223−233.
[6] Guglielmetti, C., Le Blon, D., Santermans, E., Salas-Perdomo, A., Daans, J.,
De Vocht, N., Shah, D., Hoornaert, C., Praet, J., Peerlings, J., Kara, F., Bigot,
C., Mai, Z., Goossens, H., Hens, N., Hendrix, S., Verhoye, M., Planas, A.M.,
Berneman, Z., van der Linden, A., Ponsaerts, P. (2016). Interleukin-13 immune
gene therapy prevents CNS inflammation and demyelination via alternative
activation of microglia and macrophages. Glia, doi: 10.1002/glia.23053.
[7] Le Blon, D., Guglielmetti, C., Hoornaert, C., Dooley, D., Daans, J., Lemmens,
E., De Vocht, N., Reekmans, K., Santermans, E., Hens, N., Goossens, H.,
Verhoye, M., Van der Linden, A., Berneman, Z., Hendrix, S., Ponsaerts, P.
Intracerebral transplantation of interleukin 13-producing mesenchymal stem cells
limits microgliosis and demyelination in the cuprizone mouse model. Submitted
to Journal of Neuroinflammation.
[8] Marcq, E., Vasiliki, S., De Waele, J., van Audenaerde, J., Zwaenepoel, K.,
Santermans, E., Hens, N., Pauwels, P., van Meerbeeck, J.P., Smits, E.L.J.
Prognostic and predictive aspects of the tumor immune microenvironment and
immune checkpoints in malignant pleural mesothelioma. Submitted to
OncoImmunology.
Contents
List of Publications and Reports
Table of Contents
List of Abbreviations
List of Figures
List of Tables
1 Introduction
1.1 Infectious Disease Epidemiology
1.2 Overview of the Thesis
1.3 Basic Concepts
1.3.1 Infectious Disease Models
1.3.2 Epidemiological Parameters
1.3.3 Network Modeling
1.3.4 Statistical Inference
2 Data Sources
2.1 Disease Data
2.1.1 Varicella-zoster Virus
2.1.2 A(H1N1)v2009
2.1.3 Pertussis
2.1.4 Ebola Virus Disease
2.2 Social Contact Data
2.2.1 POLYMOD Contact Data
2.2.2 Contact Behavior during Illness
2.2.3 Estimation of Contact Rates
2.2.4 Contact Patterns within Households
3 The Social Contact Hypothesis Under Endemic Equilibrium
3.1 Estimating the Basic and Effective Reproduction Number
3.1.1 Mass Action Principle and Mixing Assumptions
3.1.2 Estimation Procedure
3.1.3 Model Eligibility and Indeterminacy
3.1.4 Application to the Data
3.2 Elucidating Potential Risk Factors
3.2.1 Maximal Information Coefficient
3.2.2 Random Forest Approach
3.2.3 Results
3.3 Sensitivity Analysis
3.3.1 Contact data
3.3.2 Risk Factors
3.3.3 Perturbations Demographic and Endemic Equilibrium
3.4 Discussion
4 Differences in Mixing Behaviour and Symptom Heritability
4.1 Methodology
4.1.1 Transmission Models
4.1.2 Age Structure and Social Contacts
4.1.3 Estimation Procedure
4.2 Application to the Data
4.2.1 Exploratory Analyses
4.2.2 Results
4.3 Impact of Home Isolation
4.4 Discussion
5 Empirical Household Contact Networks
5.1 Household Contact Survey
5.2 ERGMs for Within-household Physical Contact Networks
5.3 Epidemic Spread in a Community of Households
5.3.1 Setting 1
5.3.2 Setting 2
5.3.3 Other Settings
5.4 Discussion
6 Two-Level Mixing Model Incorporating Household Networks
6.1 Methodology
6.1.1 Model Description
6.1.2 Likelihood and Posterior Density
6.1.3 MCMC Sampling
6.2 Preliminary Results
6.3 Discussion
7 Spatiotemporal Evolution of EVD at Sub-national Level
7.1 Growth model
7.1.1 Model Description
7.1.2 Results
7.2 Compartmental model
7.2.1 Model Description
7.2.2 Estimation Procedure
7.2.3 Results
7.3 Sensitivity Analysis
7.4 Discussion
8 Discussion and Further Research
Bibliography
Acknowledgements
A Appendix - Chapter 5
A.1 Household Contact Survey
A.2 Modeling Within-household Physical Contact Networks
A.3 Epidemic Simulation Model
Samenvatting (Summary in Dutch)
List of Abbreviations
AIC Akaike information criterion
AM Adaptive Metropolis
AMM Adaptive-mixture Metropolis
CAS-model Continuous age-structured model
CI Confidence interval
DIC Deviance information criterion
ELISA Enzyme-linked immunosorbent assay
ERGM Exponential random graph model
EVD Ebola virus disease
EXP Exponential
FOI Force of infection
GP General practitioner
HIV Human immunodeficiency virus
IgM Immunoglobulin M
IgG Immunoglobulin G
ILI Influenza-like illness
INLA Integrated nested Laplace approximation
LIN Linear
MAP Mass-action principle
MH Metropolis-Hastings
MIC Maximal information coefficient
ML Maximum likelihood (estimation)
MSE Mean squared error
NIP National immunization program
ODE Ordinary differential equation
OOB Out-of-bag
PCR Polymerase chain reaction
PDE Partial differential equation
RAS-model Realistic age-structured model
RWM Random-walk Metropolis
VE Vaccine effectiveness
VZV Varicella-zoster virus
WAIFW Who Acquires Infection From Whom
List of Figures
1.1 Left panel: 3-D computer enhanced electron microscope photo of the
Varicella-zoster virus, content provider: ShutterStock, photo credit:
Michael Taylor. Right panel: 3-D graphical representation of the
structure of a generic influenza virus, content provider: CDC.
1.2 Infectious disease stages, adapted from the book “Modelling infectious
diseases” by Keeling and Rohani, 2007.
1.3 Flow diagram for the deterministic SIR model.
1.4 Flow diagram for the age-structured SIR model with two age groups.
1.5 Illustration of the (basic) reproduction number R0 (left) and R (right):
one infected individual (black circle) is introduced into a fully
susceptible population and infects on average R0 = 3 other individuals (grey
circles, left panel), or he/she is introduced in a partly immunized
population (dotted circles) infecting only R = 1 individual (grey circles,
right panel) (Goeyvaerts, 2011).
2.1 Observed age-specific VZV seroprevalence for Belgium, England and
Wales, Finland, Germany, Ireland, Israel and Italy. The size of the
dots is proportional to the sample size per age category.
2.2 Observed age-specific VZV seroprevalence for Luxembourg, the
Netherlands, Poland, Slovakia and Spain. The size of the dots is proportional
to the sample size per age category.
2.3 Weekly number of ILI cases in five age categories during the early part
of the A/H1N1pdm influenza epidemic in 2009 in England and Wales.
2.4 Compositions of the households included in the pertussis study (left)
and symptom onset times in days relative to the symptom onset time
of the primary case of the household (right).
2.5 Transmission electron micrograph of an Ebola virus virion.
2.6 Contour plot of the estimated Belgian contact rates derived from the
bivariate smoothing approach applied to the POLYMOD survey data.
2.7 Age-specific contact rates for asymptomatic individuals (left) and
symptomatic individuals (right) based on the age classes of the
incidence data.
2.8 Observed within-household physical contact networks by household
size. Nodes represent household members and edges represent
physical contacts.
2.9 Barplots of contact intensity distributions (duration, frequency and
touching) and contact location distributions for all contacts recorded
with non-household (left bar) and household members (right bar).
3.1 Estimated basic and effective reproduction numbers with 95%
bootstrap percentile confidence intervals for constant (black), log-linear
(gray) and extended log-linear (light gray) proportionality factor. For
each country, sizes of the dots are proportional to Akaike weights, hence
larger dots correspond to smaller AIC values. The dotted horizontal
line indicates the single eligible value for R under endemic equilibrium,
which is one.
3.2 Profile likelihood estimates of R0 (left axis) and R (right axis) as a
function of γ2, the parameter related to infectiousness, for Finland and
Luxembourg.
3.3 Profile likelihood estimates of R (dots) with interpolated 95%
bootstrap percentile confidence intervals (dashed lines) as a function of γ2,
the parameter related to infectiousness, for Finland and Luxembourg.
The vertical dotted line indicates the value of γ2 for which the upper
confidence limit of R equals 1 (horizontal dotted line).
3.4 Observed age-specific VZV seroprevalence (dots) and the profile
estimated from the final model selected for each country (solid line). The
corresponding force of infection estimates are displayed by the lower
solid line.
3.5 Observed age-specific VZV seroprevalence (dots) and the profile
estimated from the final model selected for each country (solid line). The
corresponding force of infection estimates are displayed by the lower
solid line.
4.1 Schematic diagram of the non-preferential transmission model.
Superscripts indicate presence (s) or absence (a) of symptoms.
4.2 Schematic diagram of the preferential transmission model. Superscripts
indicate clinical status of the infected individual: symptomatic (s) or
asymptomatic (a). Subscripts indicate whether the infector was
symptomatic (s) or asymptomatic (a).
4.3 Prior and posterior distributions for the proportion of cases that
develop symptoms (φ), the proportionality factor for asymptomatic
individuals (qa), the relative infectiousness of symptomatic cases versus
asymptomatic cases (qr) and the reporting rates (ρi, i = 1, 2, 3, 5).
4.4 Scatter plot of the proportion of cases that develop symptoms (φ),
the proportionality factor for asymptomatic individuals (qa) and the
infectiousness ratio (qr).
4.5 Number of symptomatic (full line) and asymptomatic (dotted line)
cases over time for the five age categories assuming a 20% reporting
rate in the 45−65 age class for the non-preferential model.
4.6 Prior and posterior distributions for the proportion of individuals
infected by a symptomatic case that develop symptoms (φs), the
proportion of individuals infected by an asymptomatic case that remain
asymptomatic (φa), the proportionality factor for asymptomatic
individuals (qa), the relative infectiousness of symptomatic cases versus
asymptomatic cases (qr) and the reporting rates (ρi, i = 1, 2, 3, 5).
4.7 Scatter plot of the proportion of individuals infected by a symptomatic
case that develop symptoms (φs), the proportion of individuals
infected by an asymptomatic case that remain asymptomatic (φa), the
proportionality factor for asymptomatic individuals (qa) and the
infectiousness ratio (qr).
4.8 Number of symptomatic (full line) and asymptomatic (dotted line)
cases over time for the five age categories assuming a 20% reporting
rate in the 45−65 age class for the preferential model.
4.9 Histogram of MCMC samples for φs − (1 − φa), with φs the proportion
of individuals infected by a symptomatic case that develop symptoms
and φa the proportion of individuals infected by an asymptomatic case
that remain asymptomatic in the preferential model.
4.10 Observed (grey bars) and estimated (connected dots) reported weekly
incidence for the five age categories. The full line and filled dots are the
estimated incidence for the non-preferential model; the dotted line and open
dots are the estimates for the preferential model.
4.11 Proportion of cases plotted against the proportion of symptomatic
individuals staying home immediately after symptom onset. Left panel:
reduction in total number of cases for the non-preferential model with
95% confidence intervals. Right panel: reduction in the number of
total, symptomatic and asymptomatic cases for the preferential model.
5.1 Proportion of complete networks (left) and mean network density
(right): observed values (blue stars with size proportional to the
sample size) and values simulated from the ERGM for within-household
physical contact networks on a weekday.
5.2 Proportion of complete networks (left) and mean network density
(right): observed values (blue stars with size proportional to the
sample size) and values simulated from the ERGM for within-household
physical contact networks on a weekend day.
5.3 Proportion of observed versus potential triangles: observed values (blue
stars with size proportional to the sample size) and values simulated
from the ERGM for within-household physical contact networks on a
weekday (left) and on a weekend day (right).
5.4 Mean infection incidence over time at the individual (left) and
household level (right) for 1000 simulations of a stochastic SIR epidemic
process on a 2-level households model assuming random (black) and
empirical-based (red) mixing within households.
5.5 Household attack rates by household size for 1000 simulations of a
stochastic SIR epidemic process on a 2-level households model assuming
random and empirical-based mixing within households.
5.6 Mean infection incidence over time at the individual (left) and
household level (right) for 1000 simulations of a stochastic SIR epidemic
process on a 2-level households model assuming random (black) and
empirical-based (red) mixing within households including a density
scaling factor.
5.7 Household attack rates by household size for 1000 simulations of a
stochastic SIR epidemic process on a 2-level households model
assuming random and empirical-based mixing within households including a
density scaling factor.
6.1 Trace plot of the MCMC samples for the within-household
transmission probability (βh), the community risk of infection (βc), the mean
duration of the incubation period (µ), the standard deviation of the
incubation period (σ) and the number of edges and triangles in the
household contact network G.
6.2 Prior and posterior distributions for the within-household transmission
probability (βh), the community risk of infection (βc), the mean
duration of the incubation period (µ) and the standard deviation of the
incubation period (σ). Dotted lines are prior distributions.
7.1 Estimated weekly growth rates per district and implemented
intervention measures for Guinea, Sierra Leone and Liberia, 2014-2015. Red
colours indicate an increase in number of weekly cases, whereas blue
colours indicate a decline. Periods for which no reported cases are
available are shown in white. A light dot indicates that a triage,
holding centre or CCC is in place and a dark dot indicates that an ETU or
ETU and CCC are in place.
7.2 Estimated growth rate per district and implemented intervention
measures during week 21 and 40 of 2014 and week 8 and 26 of 2015. ‘1’
triage, holding centre or CCC is in place; ‘2’ ETU or ETU plus CCC
is in place.
7.3 Cumulative cases per district and implemented intervention measures.
A light dot indicates that a triage, holding centre or CCC is in place
and a dark dot indicates that an ETU or ETU and CCC are in place.
7.4 Cumulative deaths per district and implemented intervention measures.
A light dot indicates that a triage, holding centre or CCC is in place
and a dark dot indicates that an ETU or ETU and CCC are in place.
7.5 Flow diagram for the SEIR model with distinction between cases that
survive and fatal cases.
7.6 Schematic representation of reporting of case notifications.
7.7 Observed (black) and estimated (blue) number of new cases (top left),
new deaths (top right), cumulative cases (bottom left) and
cumulative deaths (bottom right) per district. Dashed lines are 95% credible
intervals.
7.8 Three-week prediction of new cases (left) and deaths (right) for Western
Area Urban at 24 October, 14 November, 5 December and 26 December
2014 (top to bottom). Light blue regions are the predicted time periods
and estimation is based on all data before that time point.
7.9 Estimated reproduction number per district with 95% posterior
intervals. The threshold value of one is indicated by a red horizontal line.
A.1 Barplot of within-household contact duration distributions by type of
relationship, including both physical and non-physical contacts.
A.2 Interpretation of mixing and age effect statistics of the ERGM: ratio
of the odds of physical contact occurring between two relatives versus
a pair of siblings, as a function of the sum of the siblings’ ages. Left
panel: weekday, right panel: weekend day.
A.3 Final fractions for 1000 simulations of a stochastic SIR epidemic process
on a 2-level households model assuming random and empirical-based
mixing within households. Small outbreaks are excluded from display.
A.4 Final fractions for 1000 simulations of a stochastic SIR epidemic process
on a 2-level households model assuming random and empirical-based
mixing within households. Small outbreaks are excluded from display.
List of Tables
2.1 Overview of the VZV serological data and demographic parameters.
3.1 Estimates of the basic and effective reproduction numbers and
transmission parameters (γ0, γ1, γ2) with 95% bootstrap percentile
confidence intervals and corresponding AIC values for constant (CP),
log-linear (LP) and extended log-linear (EP) proportionality assumptions.
Estimates for EP are obtained using a profile likelihood-based
assessment of model eligibility. Final models are indicated in bold.
3.2 Selected set of potential risk factors for varicella. Data sources and
missingness are indicated. Reference years were chosen to be as close
to the year of serological data collection as possible, conditional on
availability.
3.3 Ten factors with the largest MIC value of association with R0,
estimated from the final model selected for each country, and
corresponding Spearman correlation coefficients ρS.
3.4 Ten best scoring factors obtained by a random forest analysis of R0,
estimated from the final selected model for each country, and
corresponding Spearman correlation coefficients ρS.
3.5 Pairs of potential risk factors with the largest absolute Spearman
correlation coefficient. High scoring factors according to MIC and random
forest are indicated in bold.
3.6 Comparison of the average household size at time of serological data
collection.
3.7 Estimated basic and effective reproduction numbers with 95%
bootstrap percentile confidence intervals and corresponding AIC values for
the log-linear model based on contact data minimizing AIC.
3.8 Ten factors with the largest MIC value of association with R0,
estimated from the log-linear model using the minimal AIC contact data,
and corresponding Spearman correlation coefficient, ρS.
3.9 Ten best scoring factors obtained by a random forest analysis of R0,
estimated from the log-linear model using the minimal AIC contact
data, and corresponding Spearman correlation coefficient, ρS.
3.10 Estimates of the basic and effective reproduction number when
implementing a vaccination strategy or changing the birth rate.
3.11 Ranges of estimates of the basic reproduction numbers obtained by
Santermans et al., Nardone et al. and Melegaro et al. Nardone et
al. used a WAIFW matrix approach for three age groups, whereas
Melegaro et al. used the social contact hypothesis for different
stratifications of POLYMOD contact data.
4.1 An overview of parameters of pandemic influenza A/H1N1 2009 in
humans obtained from a literature review (Dorjee et al., 2013). These
values were either estimated from empirical data of experimental or
observational studies (Est.); or referenced for modeling (Ref.).
4.2 Prior distributions for the parameters in the preferential and
non-preferential model.
4.3 Posterior median, 95% posterior intervals and DIC value for the
non-preferential model for different values of the reporting rate ρ.
4.4 Posterior median, 95% posterior intervals and DIC value for the
preferential model for different values of the reporting rate ρ.
4.5 Posterior median, 95% posterior credible intervals and DIC value for
the non-preferential and preferential model.
5.1 Proportion of complete networks and mean network density, stratified
by household size, for the observed within-household physical contact
networks, comparing week and weekend days (top) and regular and
holiday periods (bottom).
5.2 Network statistics considered in the ERGMs, where an edge is defined
as a physical contact between two individuals. Reference categories
are child-child mixing, boy-girl mixing, and mixing within households
of size 4.
5.3 ERGM for within-household physical contact networks on week- and
weekend days: parameter estimates and Wald test p-values, log-likelihood
and AIC.
6.1 Estimation of vaccine effectiveness for 1 to 14-year-olds per birth cohort
according to the NIP report (National Institute for Public Health and
the Environment, 2013).
6.2 Posterior median and 95% posterior credible intervals for the model
parameters.
7.1 Prior distributions.
7.2 Parameter estimates with 95% posterior credible intervals.
7.3 Parameter estimates sensitivity analysis. Fixed values are indicated in
bold, asterisks indicate model differences compared to the final model 1.
7.4 Parameter estimates sensitivity analysis. Fixed values are indicated in
bold, asterisks indicate model differences compared to the final model 1.
A.1 Observed physical contact networks: average degree and various measures
of within-household clustering, stratified by household size.
A.2 Observed proportion of complete networks and mean network density,
stratified by household size, with median and 95% percentile range
obtained from 1000 networks simulated from the ERGM for within-household
physical contact networks on a weekday.
A.3 Observed proportion of observed versus potential triangles, stratified
by household size, with median and 95% percentile range obtained from
1000 networks simulated from the ERGM for within-household physical
contact networks on a weekday.
A.4 Observed proportion of complete networks and mean network density,
stratified by household size, with median and 95% percentile range
obtained from 1000 networks simulated from the ERGM for within-household
physical contact networks on a weekend day.
A.5 Observed proportion of observed versus potential triangles, stratified
by household size, with median and 95% percentile range obtained from
1000 networks simulated from the ERGM for within-household physical
contact networks on a weekend day.
A.6 Literature-based estimates of household and community transmission
parameters obtained from household final size or symptom onset data:
qHH = P (escape infection from infected HH member per day) assuming
an infectious period of 4 days, the household secondary attack
rate (SAR) i.e. the probability of being infected by another
household member during the course of the latter’s infectious period,
and qcom = P (escape infection from community during epidemic
period) = 1 − CPI, where CPI is the community probability of infection.
† Household size defined as the number of susceptibles in a household
prior to the epidemic. * Same age definitions for children and adults as
in Longini et al. (1988), distinguishing between susceptibles and infected.
Chapter 1 Introduction
1.1 Infectious Disease Epidemiology
Infectious diseases have a huge impact on global health, being responsible for millions
of deaths each year, especially in the developing world. The global HIV and
tuberculosis epidemics, the appearance of new pathogens and the resurgence of old
ones, often in new and drug-resistant forms, all bring the need for new and improved
methods in infectious disease epidemiology. Infectious disease epidemiology shares
the same conceptual framework as ‘non-infectious disease’ epidemiology: it concerns
the study of the causes and distribution of infectious diseases in populations and aims
to control them. There are, however, some concepts and terminology specifically
related to infectious diseases.
Infectious diseases are caused by pathogenic microorganisms, such as bacteria,
viruses, parasites or fungi; the diseases can be spread, directly or indirectly, from
one person to another (World Health Organization, 2016b). Direct transmission may
be through direct contact, e.g. touching, biting, kissing or sexual intercourse, by
droplet contact or through airborne transmission. Droplet contact occurs when an
individual sneezes or coughs and the droplets spray onto the eyes, nose or mouth
of another individual. This is usually limited to short distances. The transmission
route is called airborne when viruses travel on small respiratory droplets that
may become aerosolized after sneezing, coughing or talking. These particles can
remain in the air for long periods of time and travel over considerable distances.
Examples of diseases that are transmitted via airborne or droplet contact are measles,
pertussis, (pandemic) influenza (Figure 1.1), etc. Indirect transmission occurs when
the infectious organism is transferred from a source through objects (vehicle-borne)
or insects (vector-borne). Malaria and dengue fever are examples of vector-borne
diseases. The type of transmission route depends mainly on the characteristics of
the causative agent and those of the host. Some microorganisms are restricted to
a limited number of transmission routes, whereas others can follow many different
pathways to infect their hosts. In this thesis, we will focus on infectious diseases for
which the main transmission route is from human to human via non-sexual social
contacts, such as airborne transmission, droplet contact or physical contact.
Figure 1.1: Left panel: 3-D computer enhanced electron microscope photo of the
Varicella-zoster virus, content provider: ShutterStock, photo credit: Michael Taylor. Right
panel: 3-D graphical representation of the structure of a generic influenza virus, content
provider: CDC
When an individual is infected with an infectious disease, an immune reaction is initi-
ated with the production of antibodies specific to the pathogen (humoral immunity)
and activation of cells aiming to destroy the pathogen (cellular immunity). There is
a period of variable length between the moment the host is infected and the moment
when the host is infectious, hence able to transmit the pathogen. Furthermore, the
host can develop symptoms after infection, although this does not need to coincide
with the period of infectiousness. Individuals that do not develop symptoms at
all are referred to as asymptomatic cases. Eventually, the host may no longer be
infectious and recover. Thereafter, the host can be immune to the disease (Figure 1.2).
Figure 1.2: Infectious disease stages, adapted from the book “Modelling infectious
diseases” by Keeling and Rohani, 2007
Cell-mediated immunity is driven by T-cells that are able to detect viral antigens on
a cell’s surface and destroy the cell if necessary. Humoral (or antibody-mediated)
immunity on the other hand is related to the production of virus-specific antibodies,
that induce long term protection. There are two main types of antibodies: Im-
munoglobulin M (IgM) and Immunoglobulin G (IgG). Another antibody isotype is,
for example, IgA, which is found in mucosal areas (see e.g. Woof and Burton (2004)).
IgM antibodies are produced quickly after the onset of infection and last for only
a short period of time. The production of IgG antibodies takes longer, but they can
persist for years after the infection and provide immunity for years, even lifelong.
This type of antibodies can be transferred from a pregnant woman to her fetus,
granting immunity to a pathogen until the infant’s immune system has matured.
Vaccination is based on the introduction of an antigen from a pathogen to stimulate
the immune system and to develop immunity against that specific pathogen. Hence,
in the absence of immunization, the presence of IgG antibodies in blood serum
indicates past infection. In Chapter 3, cross-sectional sets of serum samples are used.
Testing these samples for IgG titer values gives rise to serological data and provides
information on the immunity status of the individuals. For a description of this data,
we refer to Section 2.1.1.
In response to emerging threats and to improve control of endemic diseases,
the field of infectious disease modeling has grown substantially in the last decades.
A distinction between statistical and mathematical models is to be made. Statistical
models study relations between different variables based on data and make inferences
based on these relations. Mathematical models, on the other hand, describe a
system through mathematical equations and study how the system changes from
one state to the next and how variables depend on the value or state of other variables.
A key parameter in infectious disease modeling is the probability of contact
between an infectious source and a susceptible individual. For infections transmitted
from person to person various assumptions are required to simplify the range of
human relations into tractable mathematical models. Classical early work in epi-
demic modeling usually assumed homogeneous mixing in a community of individuals,
each having the same susceptibility to disease and the same ability to transmit
disease. Such assumptions rarely reflect reality, although they are often sufficient
for modeling purposes, and during the last decades, considerable efforts have been
made towards modeling heterogeneity in the acquisition of infection and its effect on
disease propagation. Anderson and May (1991) were the first to introduce a method
of imposing certain patterns on age-dependent mixing rates. The effect of imposing
these mixing patterns was studied by Greenhalgh and Dietz (1994) and Farrington
et al. (2001) among others. Several extensions of the traditional approach by
Anderson and May (1991) have been proposed including time-varying transmission
rates (Whitaker and Farrington, 2004) and continuous parametric contact surfaces
(Farrington and Whitaker, 2005). Wallinga et al. (2006) introduced the use of social
contact surveys to inform transmission rates. They assume that transmission rates
for infections transmitted through non-sexual social contacts are proportional to
contact rates estimable from contact surveys. Social contact data form an important
part of this dissertation, and are described in Section 2.2. Further, models have been
developed that attempt to represent the underlying structure of contact patterns by
partitioning the population into contact structures. Examples of epidemic models
that incorporate structured populations include independent-household models (see
Longini and Koopman (1982); Becker and Dietz (1995); Becker and Hall (1996)),
models with two levels of mixing (see Ball et al. (1997); Ball and Lyne (2001); Demiris
and O’Neill (2005a)), random network models (e.g. Anderson (1999); Britton and
O’Neill (2002)), and social cluster models (e.g. Schinazi (2002)).
1.2 Overview of the Thesis
The social contact hypothesis, introduced by Wallinga et al. (2006), can be extended
by incorporating age-dependent susceptibility to infection, which entailed an improve-
ment of model fit for data on varicella-zoster virus (VZV) in Belgium (Goeyvaerts
et al., 2010). In Chapter 3, we look at data from 11 other European countries, besides
Belgium, and evaluate how this age-dependent susceptibility affects the fit to the
data. Furthermore, we introduce a method to account for age-specific heterogeneity
related to infectiousness by relying on the effective reproduction number as model
eligibility criterion.
In Chapter 4, we use social contact data that provide insight into the impact of
illness on contact patterns. We show that this type of data can inform inference
on parameters related to asymptomatic infection using data on symptomatic cases
only. This will be illustrated using data on influenza-like illness. Additionally, we
investigate whether the probability of developing symptoms depends on the clinical
state of the person that transmitted the infection.
Chapters 5 and 6 look into contact heterogeneity within households. Data
from the first social contact survey designed to study contact networks within
households is described in Chapter 5. In this chapter, we also develop a network
model to infer on the factors that drive contacts between household members. This
network model is then used in Chapter 6 to inform within-household networks in a
2-level mixing model. Inference for this model is illustrated using data on pertussis
in the Netherlands.
In Chapter 7 we develop a two-stage model for the Ebola outbreak of 2014.
This model takes into account the spatial and temporal heterogeneity of the outbreak
and is based on publicly-available district-level data on the number of cases and
deaths.
Finally, in Chapter 8 we summarize our main conclusions and discuss topics
open for further research.
1.3 Basic Concepts
In this section, we will introduce some basic terminology and fundamental concepts
used in the field of mathematical epidemiology. Infectious disease models are intro-
duced in Section 1.3.1. In Section 1.3.2, we describe some of the most important epi-
demiological parameters. Section 1.3.3 provides an introduction to the basic concepts
of networks and exponential random graph models. Finally, an overview of inference
methods relevant for the analyses in this thesis is given in Section 1.3.4.
1.3.1 Infectious Disease Models
Deterministic models describing disease dynamics by partitioning the population into
different disease states date back to the work by Bernoulli (1760). He developed the
first model to demonstrate the benefits of immunizing individuals against smallpox
in France. Others followed, but the most important contributions to these models
were made after the 1900s when the interest in infectious disease models increased
substantially (Kermack and McKendrick, 1927; Bailey, 1975; Dietz, 1975; Anderson
and May, 1991). Deterministic transmission models are very insightful to study disease
dynamics in large populations; however, they are less suited for small or isolated
populations. For this purpose, stochastic models were defined. These models make up
the second important branch in infectious disease modeling and are usually defined at
the individual level. For an elaborate discussion on stochastic modeling, we refer to
Daley and Gani (1999); Andersson and Britton (2000) and Diekmann et al. (2013). In
this thesis, we make use of deterministic models in Chapters 3, 4 and 7. To describe
disease transmission in relatively small populations of households in Chapters 5 and
6, a stochastic chain-binomial model is used.
1.3.1.1 Deterministic SIR Model
One of the simplest compartmental models is the so-called Susceptible-Infected-
Recovered (SIR) model. This model describes disease spread of infections conferring
lifelong immunity. It is depicted as a flow diagram in Figure 1.3.
Figure 1.3: Flow diagram for the deterministic SIR model.
The SIR model assumes that individuals are born susceptible (S) to infection. Then,
as time progresses individuals of age a become infected and move to the infectious
class I at an age- and time-dependent rate λ(a, t), the so-called ‘force of infection’.
After this stage, individuals are removed and progress to the R compartment at rate
γ(a, t) in which they stay until they die. These individuals can no longer transmit the
infection to other individuals and are, depending on the disease under consideration,
recovered, immunized, isolated or dead. Furthermore, individuals in each state are
subject to natural mortality µ. Infectious individuals may experience disease-related
mortality α, and are thus subject to mortality at rate η(a, t) = µ(a, t) + α(a, t). The
numbers of individuals in each compartment are denoted by S(a, t), I(a, t) and R(a, t).
The model can be expressed by the following set of partial differential equations
(PDEs) (Kermack and McKendrick, 1927):
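\[
\begin{aligned}
\frac{\partial S(a,t)}{\partial a} + \frac{\partial S(a,t)}{\partial t} &= -[\lambda(a,t) + \mu(a,t)]\,S(a,t),\\
\frac{\partial I(a,t)}{\partial a} + \frac{\partial I(a,t)}{\partial t} &= \lambda(a,t)S(a,t) - [\gamma(a,t) + \eta(a,t)]\,I(a,t),\\
\frac{\partial R(a,t)}{\partial a} + \frac{\partial R(a,t)}{\partial t} &= \gamma(a,t)I(a,t) - \mu(a,t)R(a,t),
\end{aligned}
\tag{1.1}
\]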
with boundary conditions S(0, t) = B(t), the number of births in the population at
time t, and I(0, t) = R(0, t) = 0 because of the assumption that all individuals are
born susceptible to infection. The total number of individuals of age a at time t is
defined as N(a, t) = S(a, t) + I(a, t) +R(a, t).
1.3.1.2 Age-structured Model
Solving the set of PDEs in (1.1) is not straightforward, however several simplifying
assumptions can be made to facilitate mathematical derivations (see e.g. Hens
et al. (2012)). One way of doing so, is by considering an age-structured model.
In such a model, the age dimension is divided into a finite number of age groups
that interact with each other. An illustration for two age groups is shown in Figure 1.4.
Figure 1.4: Flow diagram for the age-structured SIR model with two age groups.
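When assuming K age groups, the system of ODEs for the first age group is given
by:
\[
\begin{aligned}
\frac{dS_1(t)}{dt} &= B(t) - [\lambda_1(t) + \mu_1(t) + \delta_1]\,S_1(t),\\
\frac{dI_1(t)}{dt} &= \lambda_1(t)S_1(t) - [\gamma_1(t) + \eta_1(t) + \delta_1]\,I_1(t),\\
\frac{dR_1(t)}{dt} &= \gamma_1(t)I_1(t) - [\mu_1(t) + \delta_1]\,R_1(t),
\end{aligned}
\tag{1.2}
\]
where δ1 is the rate at which individuals move to the second age group. For the other
age groups, the system is similar, but without births into the susceptible class and
with flows δi−1 from the previous age groups:
\[
\begin{aligned}
\frac{dS_i(t)}{dt} &= \delta_{i-1}S_{i-1}(t) - [\lambda_i(t) + \mu_i(t) + \delta_i]\,S_i(t),\\
\frac{dI_i(t)}{dt} &= \delta_{i-1}I_{i-1}(t) + \lambda_i(t)S_i(t) - [\gamma_i(t) + \eta_i(t) + \delta_i]\,I_i(t),\\
\frac{dR_i(t)}{dt} &= \delta_{i-1}R_{i-1}(t) + \gamma_i(t)I_i(t) - [\mu_i(t) + \delta_i]\,R_i(t),
\end{aligned}
\tag{1.3}
\]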
for i = 2, ...,K. Since we consider continuous transitions from one age group to the
next (via δi), this model is called the continuous age-structured model (CAS-model).
The disadvantage of the CAS-model is that people can transition instantaneously
to the next age group. To overcome this disadvantage, we consider the realistic
age-structured model (RAS-model). In this model, individuals move to the next age
group after exactly 1 year (when assuming age groups of 1 year). The RAS-model
consists of a two-step iteration:
Step 1: Solve the following set of ODEs:
\[
\begin{aligned}
\frac{dS_i(t)}{dt} &= -[\lambda_i(t) + \mu_i(t)]\,S_i(t),\\
\frac{dI_i(t)}{dt} &= \lambda_i(t)S_i(t) - [\gamma_i(t) + \eta_i(t)]\,I_i(t),\\
\frac{dR_i(t)}{dt} &= \gamma_i(t)I_i(t) - \mu_i(t)R_i(t),
\end{aligned}
\tag{1.4}
\]
with initial conditions {S_i(t_0), I_i(t_0), R_i(t_0)} to obtain {S_i(t + 1), I_i(t + 1), R_i(t + 1)},
i = 1, ..., K.
Step 2: Shift individuals to the next age class:
{S_i(t + 1), I_i(t + 1), R_i(t + 1)} → {S_{i+1}(t + 1), I_{i+1}(t + 1), R_{i+1}(t + 1)}, i = 1, ..., K − 1,
and all newborns B(t) are
assumed susceptible: {S_0(t + 1), I_0(t + 1), R_0(t + 1)} = {B(t), 0, 0}.
This process is iterated during the time period of interest. We used the RAS-
model in Chapter 3 to simulate the effect of demographic change and vaccination.
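The two-step iteration can be sketched in Python as follows. This is an illustrative sketch only and not the implementation used in Chapter 3: the function name `ras_step`, the forward Euler scheme, the absence of disease-related mortality (η_i = µ_i) and the rates held constant within a year are all assumptions made here for brevity. In particular, the force of infection is passed in as a fixed input rather than computed from the number of infectious individuals.

```python
def ras_step(S, I, R, lam, gamma, mu, births, n_sub=365):
    """Advance an age-structured SIR state by one year (RAS-type iteration).

    S, I, R: lists with one entry per 1-year age class.
    lam, gamma, mu: per-class yearly rates, held constant within the year
    (an illustrative simplification). births: number of newborns B(t).
    """
    h = 1.0 / n_sub  # Euler step size in years
    S, I, R = list(S), list(I), list(R)

    # Step 1: integrate the within-year ODEs (1.4) with forward Euler,
    # assuming eta_i = mu_i (no disease-related mortality).
    for _ in range(n_sub):
        for i in range(len(S)):
            dS = -(lam[i] + mu[i]) * S[i]
            dI = lam[i] * S[i] - (gamma[i] + mu[i]) * I[i]
            dR = gamma[i] * I[i] - mu[i] * R[i]
            S[i] += h * dS
            I[i] += h * dI
            R[i] += h * dR

    # Step 2: shift every class one year up. The oldest class leaves the
    # population and newborns enter the youngest class as susceptibles.
    S = [births] + S[:-1]
    I = [0.0] + I[:-1]
    R = [0.0] + R[:-1]
    return S, I, R
```

Calling `ras_step` repeatedly, once per year, iterates the process over the time period of interest; a more accurate ODE solver could replace the Euler loop without changing the structure of the two steps.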
1.3.1.3 Demographic and Endemic Equilibrium
In Section 1.3.1.1, the general SIR model is described and in the previous section, the
model was simplified by considering discrete age classes. Another example of such
simplification is the assumption of endemic equilibrium or steady state of the model
(see e.g. Anderson and May (1991) and Chapter 3 in this thesis) meaning that the
disease incidence fluctuates around a stationary average over time. The population
can also be assumed to have reached demographic equilibrium which implies that the
age distribution is stationary. For some diseases the disease-induced mortality can be
neglected (α(a, t) = 0). Finally, the number of births and deaths can be assumed to be
constant over time and balanced, resulting in a constant population of size N . Under
the endemic and demographic equilibrium assumptions, the time-dependency in the
set of PDEs (1.1) cancels out and we obtain a set of ordinary differential equations
(ODEs):
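\[
\begin{aligned}
\frac{dS(a)}{da} &= -[\lambda(a) + \mu(a)]\,S(a),\\
\frac{dI(a)}{da} &= \lambda(a)S(a) - [\gamma(a) + \mu(a)]\,I(a),\\
\frac{dR(a)}{da} &= \gamma(a)I(a) - \mu(a)R(a).
\end{aligned}
\tag{1.5}
\]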
The equations in (1.5) yield the following expression for the stationary population age
distribution N(a):
\[
\frac{dN(a)}{da} = -\mu(a)N(a),
\]
from which follows
\[
N(a) = N(0)\exp\left(-\int_0^a \mu(u)\,du\right) = N(0)\exp(-\phi(a)).
\tag{1.6}
\]
Based on the boundary condition on the number of newborns and (1.6), births and
deaths are indeed balanced, since
\[
N(0) = B = \int_0^\infty \mu(a)N(a)\,da,
\]
is equivalent to
\[
\int_0^\infty \mu(a)\exp(-\phi(a))\,da = 1,
\]
which is satisfied. Note that exp(−φ(a)) is a monotone decreasing function reflecting
the probability to survive up to age a: m(a) = exp(−φ(a)) = P (T > a), where T is
the time of death. From this follows that the life expectancy is given by
\[
L = \int_0^\infty -a\,m'(a)\,da = -a\,m(a)\Big|_0^\infty + \int_0^\infty m(a)\,da = \int_0^\infty \exp(-\phi(a))\,da.
\]
Although not necessary when empirical data on natural mortality is available, it is
sometimes convenient to make simplifying assumptions regarding µ(a). Two types
of mortality functions that are often used in literature are called ‘type I mortality’
and ‘type II mortality’. Under type I mortality, individuals survive up to the life
expectancy L after which they immediately die. For type II mortality, the survival
function is of the form m(a) = exp(−µa), where µ is a constant mortality rate. In
this case, the life expectancy is given by L = 1/µ. Type I and type II mortality are
typically used for developed and developing countries, respectively, although many
developing countries are transitioning from type II to type I now.
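As a numerical check of the relation L = 1/µ under type II mortality, the integral of the survival function can be approximated directly. The sketch below is illustrative: the mortality rate of 1/75 per year, the truncation point of the integral and the function names are hypothetical choices, not values taken from this thesis.

```python
import math

def survival_type2(a, mu):
    """Type II survival function m(a) = exp(-mu * a)."""
    return math.exp(-mu * a)

def life_expectancy(survival, upper=2000.0, n=200_000):
    """Approximate L = integral_0^inf m(a) da with the trapezoidal rule,
    truncating the integral at `upper` (an illustrative cut-off)."""
    h = upper / n
    total = 0.5 * (survival(0.0) + survival(upper))
    for k in range(1, n):
        total += survival(k * h)
    return total * h

mu = 1.0 / 75.0  # hypothetical constant mortality rate (per year)
L = life_expectancy(lambda a: survival_type2(a, mu))
print(L)  # close to 1 / mu = 75
```

The same `life_expectancy` function accepts any survival function, so an empirical m(a) could be passed in instead of the type II form.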
In the above, the set of differential equations was described in terms of the
total number of individuals in each compartment. Instead, one can define age-specific
proportions or fractions of susceptible, infectious and removed individuals e.g.
s(a) = S(a)/N(a). It is convenient to do so, since this eliminates the natural
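mortality rates µ(a) from the set of ODEs in (1.5):
\[
\begin{aligned}
\frac{ds(a)}{da} &= -\lambda(a)s(a),\\
\frac{di(a)}{da} &= \lambda(a)s(a) - \gamma(a)i(a),\\
\frac{dr(a)}{da} &= \gamma(a)i(a),
\end{aligned}
\tag{1.7}
\]
since, e.g.
\[
\begin{aligned}
\frac{ds(a)}{da} &= \frac{1}{N(a)}\frac{dS(a)}{da} + S(a)\frac{dN^{-1}(a)}{da}\\
&= -[\lambda(a) + \mu(a)]\frac{S(a)}{N(a)} + S(a)\frac{\mu(a)N(a)}{N(a)^2}\\
&= -\lambda(a)s(a).
\end{aligned}
\]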
Solving the above set of ODEs, the following expression for the fraction of susceptible
individuals of age a is obtained:
\[
s(a) = \exp\left(-\int_0^a \lambda(u)\,du\right).
\]
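To make the expression for s(a) concrete, the sketch below evaluates it for a piecewise-constant force of infection, for which the integral reduces to a finite sum. The function name, the age cut-points and the rates are hypothetical and serve only as an illustration.

```python
import math

def susceptible_fraction(age, breaks, rates):
    """s(a) = exp(-int_0^a lambda(u) du) for a piecewise-constant lambda.

    breaks: increasing age cut-points starting at 0, e.g. [0, 5, 12];
    rates[k] applies on [breaks[k], breaks[k + 1]), and the last rate
    applies from the last cut-point onwards.
    """
    cumulative = 0.0
    for k, rate in enumerate(rates):
        lower = breaks[k]
        upper = breaks[k + 1] if k + 1 < len(breaks) else math.inf
        if age <= lower:
            break
        # Add the part of [lower, upper) that lies below `age`.
        cumulative += rate * (min(age, upper) - lower)
    return math.exp(-cumulative)

# Hypothetical example: force of infection 0.1 per year below age 5
# and 0.3 per year from age 5 onwards, evaluated at age 7.
s7 = susceptible_fraction(7.0, [0.0, 5.0], [0.1, 0.3])
```

With a single constant rate λ the function reduces to the familiar exponential form s(a) = exp(−λa).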
The SIR model is a fundamental example of a deterministic model used to describe
disease dynamics. It is the most frequently used compartmental model in the
literature; however, many extensions exist with different numbers of compartments having
various interpretations. Examples include the MSIR model accounting for maternal
protection after birth, the SEIR model in which individuals experience a latent period
before becoming infectious, and the SIS model with loss of natural immunity
(see e.g. Hens et al. (2012)). In Chapter 3, the MSIR model is considered for VZV
under the assumption of endemic and demographic equilibria, in Chapter 4 we study
an SEIR model taking into account asymptomatic infection for influenza, the pertussis
data in Chapter 6 are analyzed according to an age-homogeneous, discrete-time SEIR
model, and in Chapter 7 the SEIR model is adapted for Ebola virus disease (EVD)
assuming homogeneity with respect to age.
1.3.1.4 Stochastic SIR model
The SIR model discussed in Section 1.3.1.1 is deterministic, i.e. every time the
equations are solved, the same result is obtained. Stochastic models, on the other
hand, describe the uncertainty seen in real-life outbreaks. For example, it may be
important to account for the variability of individual realizations when predicting
the course of an individual outbreak. Furthermore, when the number of cases is
small, the uncertainty on time to extinction is large and cannot be captured by a
deterministic model. Stochastic effects also play an important role when studying
recurrence and extinction of infections. In this section, we will describe a simple
discrete-time chain binomial SIR model (Bailey, 1957).
Chain binomial models are developed from the simple binomial model. The
basic idea behind the binomial model is that exposure to infection occurs in discrete
time units. Define p as the transmission probability conditional upon contact between
a susceptible and an infectious individual. The probability that the susceptible person
will not be infected during this contact is the escape probability q = 1 − p. If the
susceptible individual contacts n infectious individuals, the probability of escaping
from infection is q^n = (1 − p)^n (assuming that all contacts are equally infectious).
The probability of infection is then 1 − q^n = 1 − (1 − p)^n. The chain binomial model
is now defined as the chained, or sequential application of the binomial model.
An example of a chain binomial model is the simple Reed-Frost model. This
model was developed by Lowell Reed and Wade Hampton Frost in the 1920’s and
described by Abbey (1952). In this model, the population size is assumed constant
and individuals are in one of the three SIR states. When working in discrete time,
the number of susceptible individuals is denoted by S_t, similar for the number of
infectious individuals I_t and the number of removed individuals R_t. When assuming
that individuals are infectious for exactly one generation, the full model is given by:
\[
\begin{aligned}
I_{t+1} &\sim \mathrm{Binom}\left(S_t,\; 1 - (1 - p)^{I_t}\right),\\
S_{t+1} &= S_t - I_{t+1},\\
R_{t+1} &= R_t + I_t.
\end{aligned}
\tag{1.8}
\]
This is the simplest version of the model and it can be modified to make it
more realistic and adaptable for different diseases. One could for example alter the
assumptions on the recovery process, or add exposure to infection from outside the
population (Longini and Koopman, 1982).
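A single realization of the simple Reed-Frost model in (1.8) can be simulated with a few lines of Python. The sketch below is a minimal illustration: the function name `reed_frost`, the initial conditions and the value of p in the usage line are hypothetical, and the binomial draw is implemented as a sum of independent Bernoulli trials to keep the example free of external libraries.

```python
import random

def reed_frost(s0, i0, p, rng=None):
    """Simulate one Reed-Frost epidemic until no infectious individuals remain.

    s0, i0: initial numbers of susceptible and infectious individuals.
    p: transmission probability per susceptible-infectious contact.
    Returns the list of (S_t, I_t, R_t) per generation.
    """
    rng = rng or random.Random()
    S, I, R = s0, i0, 0
    history = [(S, I, R)]
    while I > 0:
        # Each susceptible escapes all I infectious contacts w.p. (1 - p)^I.
        prob_infection = 1.0 - (1.0 - p) ** I
        # Binomial(S, prob_infection) draw as a sum of Bernoulli trials.
        new_cases = sum(1 for _ in range(S) if rng.random() < prob_infection)
        # Update according to (1.8); the old I moves to R after one generation.
        S, I, R = S - new_cases, new_cases, R + I
        history.append((S, I, R))
    return history

# Hypothetical usage: 20 susceptibles, 1 initial case, p = 0.1.
chain = reed_frost(20, 1, 0.1, random.Random(2016))
```

Repeating the simulation with different seeds gives an impression of the variability between individual outbreaks that a deterministic model cannot capture.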
Any stochastic epidemic model has a deterministic counterpart, obtained by
setting the deterministic population increments to the expected values of the
conditional increments in the stochastic model. Hence, the connection between the
stochastic SIR model described above and a deterministic SIR model can be seen as
follows. From (1.8) and a first-order Taylor approximation for small p, follows that
\[
E(I_{t+1} \mid S_t, I_t) = \left(1 - (1 - p)^{I_t}\right) S_t \approx p\,I_t S_t.
\tag{1.9}
\]
When switching from generation time to calendar time and assuming that the rate of
recovery is γ, we obtain for the expected increment in the number of infectious individuals:
\[
E(I_{t+1} - I_t \mid S_t, I_t) \approx p\,I_t S_t - \gamma I_t,
\]
corresponding with an age-homogeneous deterministic SIR model with λ(t) = pI(t).
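The quality of the first-order approximation 1 − (1 − p)^I ≈ pI used in (1.9) depends on p being small relative to 1/I. The following sketch compares the exact per-susceptible infection probability with its mass-action approximation; the function names and the numerical values are hypothetical and chosen for illustration only.

```python
def infection_probability(p, infectious):
    """Exact chain-binomial probability 1 - (1 - p)^I of being infected."""
    return 1.0 - (1.0 - p) ** infectious

def mass_action_approximation(p, infectious):
    """First-order Taylor approximation p * I, valid for small p."""
    return p * infectious

# Small p: the two expressions nearly coincide.
exact_small = infection_probability(0.001, 5)
approx_small = mass_action_approximation(0.001, 5)

# Large p: the approximation overshoots and can even reach or exceed 1.
exact_large = infection_probability(0.2, 5)
approx_large = mass_action_approximation(0.2, 5)
```

The difference between the two expressions is of order p², which is why the deterministic counterpart is mainly appropriate when the per-contact transmission probability is low.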
1.3.2 Epidemiological Parameters
In this section, some of the key measures of infectious disease transmission are
discussed. First, the force of infection and mass action principle are introduced. Second,
the basic and effective reproduction numbers are defined.
1.3.2.1 Mass Action Principle and Mixing Assumptions
The expression in (1.9) states that the number of new cases in generation t + 1 is
proportional to all possible contacts between infectious and susceptible individuals in
generation t. This is the mass-action principle in its simplest form. The underlying
assumption is that infected and susceptible individuals mix homogeneously;
it is therefore also referred to as the ‘homogeneous mass-action principle’. The
proportionality factor p is named the ‘transmission parameter’ or ‘effective contact rate’
and is often denoted with the Greek letter beta, β. A contact is called effective
when it is made between a susceptible and an infectious individual and it results in
infection. λ(t) = βI(t) is defined as the force of infection, and is one of the key
parameters, describing the rate at which a susceptible person acquires infection at time t.
The assumption of homogeneous mixing is usually not very realistic. The mass
action principle can therefore be extended to the level of age-specific transmission
rates (see e.g. Anderson and May (1991)):
\[
\lambda(a) = \int_0^\infty \beta(a, a')\,I(a')\,da',
\tag{1.10}
\]
where β(a, a′) denotes the average per capita rate at which an infectious individual
of age a′ makes effective contact with a susceptible person of age a, per unit time.
This principle thus implicitly assumes that susceptible and infectious individuals mix
completely and move randomly within the population. Hence, the average rate at
which a susceptible individual of age a acquires infection per unit time roughly equals
the sum of the average rates at which he/she makes effective contacts with all infec-
tious individuals in the population, per unit time. Following Farrington et al. (2001),
(1.10) can be rewritten as:
\[
\lambda(a) = D \int_0^\infty \beta^*(a, a')\,\lambda(a')\,S(a')\,da',
\]
with
\[
\beta^*(a, a') = D^{-1} \int_0^\infty \beta(a, a' + t)
\exp\left(-\int_0^t \gamma(u)\,du\right)
\exp\left(-\int_{a'}^{a'+t} \mu(u)\,du\right) dt.
\]
If the mean infectious period D is short compared to the timescale on which trans-
mission and mortality rates vary, β∗(a, a′) ≈ β(a, a′) and the force of infection can be
approximated by:
\[
\lambda(a) \approx \frac{ND}{L} \int_0^\infty \beta(a, a')\,\lambda(a')\,s(a')\,m(a')\,da',
\tag{1.11}
\]
where s(a′), N, L and m(a′) are defined as before. When one wants to estimate
the transmission rates β(a, a′) from the force of infection λ(a), additional assump-
tions are necessary since λ(a) is a one-dimensional function of age and β(a, a′)
makes up a two-dimensional function. The traditional approach of Anderson and
May (1991) stratifies the population into a number of age classes J leading to a
system of J equations with J × J unknowns. They then impose different mixing
patterns upon this βij matrix, which is called the ‘Who Acquires Infection From
Whom’ (WAIFW) matrix, hereby constraining the number of distinct elements
βij for identifiability reasons. The unknown elements in the WAIFW matrix can
then be estimated from serological data. However, the choice of the structure
imposed on the WAIFW matrix as well as the choice of the age classes are ad hoc and
impact the estimation of R0 (Greenhalgh and Dietz, 1994; Van Effelterre et al., 2009).
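The identifiability issue can be illustrated with a small sketch. Below, a J × J WAIFW matrix is generated from only J parameters by letting β_ij depend on the older of the two age classes. This particular structure is a hypothetical example chosen for illustration and is not claimed to be one of the structures used in the cited works; the function name and parameter values are likewise assumptions.

```python
def waifw_from_parameters(b):
    """Build a J x J WAIFW matrix from J parameters b[0], ..., b[J-1].

    Illustrative constraint (an assumption): beta[i][j] = b[max(i, j)],
    i.e. the rate depends only on the older of the two age classes.
    """
    J = len(b)
    return [[b[max(i, j)] for j in range(J)] for i in range(J)]

# Hypothetical parameter values for three age classes.
beta = waifw_from_parameters([3.0, 2.0, 1.0])
# beta == [[3.0, 2.0, 1.0],
#          [2.0, 2.0, 1.0],
#          [1.0, 1.0, 1.0]]
```

With such a constraint the J equations for the force of infection involve J unknowns instead of J × J, at the price of a structural choice that cannot be verified from serological data alone.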
In this dissertation, we will consider a more recent approach as proposed by
Wallinga et al. (2006) and extended by Ogunjimi et al. (2009) and Goeyvaerts et al.
(2010), by informing β(a, a′) with data on social contacts. This approach relies on
the so-called ‘social contact hypothesis’ stating that the transmission rate β(a, a′) is
proportional to the age-specific contact rate c(a, a′), i.e. the per capita rate at which
an individual of age a′ makes contact with a person of age a, per unit of time:
β(a, a′) = q · c(a, a′), (1.12)
where q is a constant proportionality factor. The assumption of constant propor-
tionality is commonly used in literature, however in Chapter 3 we will contrast this
assumption with an age-dependent proportionality factor q(a, a′) that may capture,
among other effects, age-specific susceptibility and infectivity. In Section 2.2 we will
introduce social contact data and methods to estimate the contact rates c(a, a′).
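Under the social contact hypothesis (1.12), a discretized version of the mass action principle (1.10) is straightforward to compute once the population is divided into age groups. The sketch below assumes a hypothetical 2 × 2 contact matrix, a hypothetical proportionality factor q and hypothetical numbers of infectious individuals; none of these values are estimates from this thesis.

```python
def transmission_rates(q, contact_rates):
    """beta[i][j] = q * c[i][j], the social contact hypothesis (1.12)."""
    return [[q * c for c in row] for row in contact_rates]

def force_of_infection(beta, infectious):
    """Discrete analogue of (1.10): lambda_i = sum_j beta[i][j] * I_j."""
    return [sum(b * n for b, n in zip(row, infectious)) for row in beta]

# Hypothetical contact rates for two age groups (children, adults).
contacts = [[10.0, 3.0],
            [3.0, 6.0]]
beta = transmission_rates(0.01, contacts)   # q = 0.01 is illustrative
lam = force_of_infection(beta, [5, 2])      # 5 infectious children, 2 adults
```

Replacing the constant q by an age-dependent factor q(a, a′) would amount to multiplying each contact rate by its own entry of a matrix of proportionality factors.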
1.3.2.2 Reproduction Numbers
One of the key measures of infectious disease transmission is the basic reproduction
number R0, sometimes also called the basic reproductive ratio. It represents the
expected number of secondary cases produced by a single typical infectious individual
during his/her entire infectious period when introduced into a completely susceptible
population. In recent years, R0 has been used extensively as a key parameter to
quantify disease transmission. For a historical overview of the development of R0, we
refer to Heesterbeek (2002).
Figure 1.5: Illustration of the (basic) reproduction number R0 (left) and R (right): one
infected individual (black circle) is introduced into a fully susceptible population and
infects on average R0 = 3 other individuals (grey circles, left panel), or he/she is
introduced in a partly immunized population (dotted circles) infecting only R = 1
individual (grey circles, right panel) (Goeyvaerts, 2011).
Figure 1.5 presents an illustration of the basic reproduction number for a simplistic
situation where R0 = 3. R0 is also referred to as a threshold parameter, since if it is
larger than 1, the infection may become endemic and the larger R0, the more effort
is required to eliminate the infection from the population. If it is smaller than 1, the
infection will eventually go extinct. Hence, the basic reproduction number reflects
the potential of an infection to lead to an epidemic. R0 depends on three factors: the
duration of the infectious period, the probability that a contact between an infected
and a susceptible individual leads to an infection and the contact rate (Dietz, 1993).
Although R0 is a useful theoretical measure, it is rarely observed in practice. The
effective reproduction number R is a measure for the actual expected number of
secondary cases, taking into account pre-existing immunity, control measures and
depletion of susceptible individuals (see right panel in Figure 1.5). In the endemic
equilibrium setting described in Section 1.3.1.3, each infectious individual infects one
other individual on average, hence R must be equal to 1 (Diekmann et al., 1990).
Both the basic and the effective reproduction number are important epidemic
summary measures, so it is important to obtain reliable estimates for R0 and R.
Assume that an infectious individual of age a′ is introduced into a population
with a proportion s(a) of susceptible individuals of age a. The average number of
persons of age a infected by this individual during his/her infectious period of
length D is then given by
\[ G(a, a') = N D \, n(a) \, s(a) \, \beta^{*}(a, a'), \]
where n(a) = N(a)/N. When a ‘typical’ infectious individual, whose age follows
the distribution i(a′), is introduced, the next generation of infected individuals of
age a is calculated as:
\[ G[i](a) = N D \, n(a) \, s(a) \int_0^\infty \beta^{*}(a, a') \, i(a') \, da'. \]
Hence, the next generation operator G expresses the age distribution of the next
generation of cases. The total number of cases infected by this ‘typical’ infectious
individual is then given by:
\[ \int_0^\infty G[i](a) \, da. \]
Assume now that the infectious period is short (β∗(a, a′) ≈ β(a, a′)) and that the
population is in demographic equilibrium and thus the population size is fixed (n(a) =
m(a)/L). The reproduction number equals the spectral radius of the next generation
operator G and, in a fully susceptible population (s(a) = 1), the reproduction number
reduces to the basic reproduction number R0 (Diekmann et al., 1990). Therefore, the
(basic) reproduction number can be calculated as the leading eigenvalue of the ‘next
generation matrix’:
\[ \frac{ND}{L} \, m(a) \, s(a) \, \beta(a, a'). \]
The leading right eigenvector of the next generation matrix is then proportional to
the distribution of infected individuals during the initial exponential growth phase of
an epidemic. More details can be found in Diekmann et al. (1990) and Farrington
et al. (2001).
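As a minimal numerical sketch of the eigenvalue calculation above, assume that age is discretized into a small number of classes, so that the next generation matrix has entries D × N_i × s_i × β_ij. The contact rates, the proportionality factor q, the class sizes and the infectious period below are hypothetical values chosen purely for illustration, and the function names are not taken from the thesis.

```python
import numpy as np

def next_generation_matrix(beta, pop_sizes, duration, susceptible=None):
    """Entry [i, j]: expected number of class-i cases caused by one
    infectious individual of class j, i.e. D * N_i * s_i * beta[i, j]."""
    beta = np.asarray(beta, dtype=float)
    pop_sizes = np.asarray(pop_sizes, dtype=float)
    if susceptible is None:
        # Fully susceptible population: s_i = 1 for every class.
        susceptible = np.ones_like(pop_sizes)
    susceptible = np.asarray(susceptible, dtype=float)
    return duration * (pop_sizes * susceptible)[:, None] * beta

def reproduction_number(ngm):
    """Leading eigenvalue (spectral radius) of the next generation matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(ngm))))

# Hypothetical two-class example under the social contact hypothesis.
q = 0.05                                   # assumed proportionality factor
contacts = np.array([[2.0e-3, 0.5e-3],
                     [0.5e-3, 1.0e-3]])    # assumed per capita contact rates per day
beta = q * contacts                        # beta(a, a') = q * c(a, a')
pop = np.array([2000.0, 8000.0])           # assumed class sizes N_i
D = 7.0                                    # assumed infectious period in days

R0 = reproduction_number(next_generation_matrix(beta, pop, D))
R_eff = reproduction_number(
    next_generation_matrix(beta, pop, D, susceptible=[0.4, 0.6]))
print(f"R0 = {R0:.2f}, R = {R_eff:.2f}")
```

Because all entries of the matrix are non-negative, lowering the susceptible proportions can only lower the leading eigenvalue, which is why the effective reproduction number in this sketch is smaller than R0.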
1.3.3 Network Modeling
The main drawback of the mass action principle (1.10) described in Section 1.3.2.1
is that it assumes complete and random mixing within the population and therefore
does not account for the fact that contacts are often clustered in e.g. households,
schools or workplaces. Network-based approaches to infectious disease dynamics have
an individual-based interpretation and allow these aspects of social mixing
behavior to be modeled (Keeling and Eames, 2005; Danon et al., 2011). The term ‘network’ is very
general and simply refers to a collection of elements and their inter-relations. Network
theory is therefore used in a variety of fields such as biology, bioinformatics, physics,
computer science, and so on. In Section 1.3.3.1 we will discuss some basic notions
and in Section 1.3.3.2 a model for networks is introduced.
1.3.3.1 Basic Definitions
Graph theory is the mathematical language in which networks are defined. A graph
G = (V,E) is defined as a structure consisting of a set of vertices (or
nodes) V and a set of edges (or links) E. The elements of E are unordered
pairs of nodes {u, v}, u, v ∈ V, that are connected in G. The ‘order’ Nv of the
graph G is defined as the number of nodes and the ‘size’ of G is the number of edges.
When each edge in E has an ordering, i.e. (u, v) is distinct from (v, u), G is called
a directed graph and the edges are called directed edges. Two nodes in V are said
to be adjacent if they are joined by an edge in E, and the degree of a node v is
defined as the number of edges incident on v. There are several types of graphs that
are commonly encountered in practice. One example is a complete graph, in which
every node is connected to every other node.
Graphs and certain aspects of their structure can be characterized using matrices
and matrix algebra. The connectivity of a graph G may be captured in an
Nv × Nv binary, symmetric matrix Y:
\[ Y_{ij} = \begin{cases} 1 & \text{if } \{i, j\} \in E, \\ 0 & \text{otherwise,} \end{cases} \tag{1.13} \]
where the nodes are denoted by 1, ..., Nv and an edge is denoted as an unordered
pair of vertices {i, j}, i, j ∈ V. This matrix Y is called the ‘adjacency matrix’ and stores
connectivity information of the graph G. On a final note regarding basic definitions,
it is sometimes useful to consider a graph G itself as a random object by thinking of
G as having been drawn from a collection of possible graphs, say 𝒢. Then P(G) refers
to the probability of drawing G from 𝒢. For more details and examples, we refer to
Kolaczyk (2009).
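The definitions above can be illustrated with a small sketch that builds the adjacency matrix (1.13) of an undirected graph and derives the degrees and the size from it. The four-node edge list is a hypothetical example, and nodes are labeled 0, ..., Nv − 1 rather than 1, ..., Nv to follow Python indexing.

```python
import numpy as np

def adjacency_matrix(n_nodes, edges):
    """Binary, symmetric adjacency matrix of an undirected graph."""
    Y = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        Y[i, j] = 1
        Y[j, i] = 1   # unordered pair {i, j}: fill both triangles
    return Y

# Hypothetical graph with 4 nodes and 4 edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
Y = adjacency_matrix(4, edges)

order = Y.shape[0]            # number of nodes
size = int(Y.sum()) // 2      # each edge is counted twice in a symmetric matrix
degrees = Y.sum(axis=1)       # row sums give the node degrees

# A complete graph on Nv nodes has Nv * (Nv - 1) / 2 edges.
is_complete = size == order * (order - 1) // 2
print(Y, degrees, size, is_complete, sep="\n")
```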
1.3.3.2 Exponential Random Graph Models
A model for a network graph is a collection
\[ \{ P_\theta(G), \; G \in \mathcal{G} : \theta \in \Theta \}, \]
where 𝒢 is a collection of possible graphs, Pθ is a probability distribution on 𝒢, and
θ is a vector of parameters with possible values in Θ. There is a vast number of
modeling approaches in the literature, ranging from simple (e.g. Pθ uniform on 𝒢)
to complex, and they are used for a variety of purposes. In this dissertation, we focus
on exponential random graph models (ERGMs; Robins et al., 2007), which extend
the idea of statistical regression to random graphs.
In an ERGM, the probability of observing a specific network configuration is
defined in terms of network statistics. Let G = (V,E) be a random graph with
adjacency matrix Y and let y be a particular realization of Y . The probability of
observing y is then given by
\[ P_\theta(Y = y) = \frac{\exp\{\theta^T g(y, X)\}}{\kappa(\theta)}, \]
where g(y,X) is a vector of network statistics that may depend on additional
covariate information X, θ the corresponding vector of coefficients, and κ(θ) a
normalizing factor.
An alternative model specification clarifies the interpretation of θ (Hunter et al.,
2008). For a specific pair of nodes (i, j), define the vector of change statistics as
follows:
\[ \delta_g(y, X)_{ij} = g(y^{+}_{ij}, X) - g(y^{-}_{ij}, X), \]
where $y^{+}_{ij}$ and $y^{-}_{ij}$ are the networks realized by fixing $y_{ij} = 1$ and $y_{ij} = 0$, respectively,
while leaving all the rest of y fixed. This allows for a logistic interpretation of the
coefficients in θ:
\[ \text{logit}\{ P_\theta(Y_{ij} = 1 \mid Y^{c}_{ij} = y^{c}_{ij}) \} = \theta^T \delta_g(y, X)_{ij}, \tag{1.14} \]
where $Y^{c}_{ij}$ represents the rest of the network other than $Y_{ij}$. Thus, θ reflects the
increase in the conditional log-odds of the network, per unit increase in the
corresponding component of g(y,X), resulting from switching a particular dyad $Y_{ij}$ from
0 to 1 while leaving the rest of the network fixed at $Y^{c}_{ij}$. In Chapters 5 and 6 we will
use ERGMs to model contact networks within households.
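To make the change statistics and the logistic interpretation in (1.14) concrete, the following sketch uses two simple network statistics, the number of edges and the number of triangles, and no covariates X. The choice of statistics, the coefficient values and the example network are assumptions made for illustration only; they are not the models fitted in Chapters 5 and 6.

```python
import numpy as np

def network_statistics(y):
    """g(y) = (number of edges, number of triangles) for a symmetric
    0/1 adjacency matrix with a zero diagonal."""
    n_edges = y.sum() / 2
    n_triangles = np.trace(y @ y @ y) / 6
    return np.array([n_edges, n_triangles], dtype=float)

def change_statistics(y, i, j):
    """g(y with y_ij = 1) - g(y with y_ij = 0), rest of the network fixed."""
    y_plus, y_minus = y.copy(), y.copy()
    y_plus[i, j] = y_plus[j, i] = 1
    y_minus[i, j] = y_minus[j, i] = 0
    return network_statistics(y_plus) - network_statistics(y_minus)

def conditional_tie_probability(theta, y, i, j):
    """Inverse logit of theta^T delta, as in the conditional form (1.14)."""
    eta = float(np.dot(theta, change_statistics(y, i, j)))
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical network on 4 nodes with edges 0-1, 0-2, 1-2 and 2-3.
y = np.zeros((4, 4), dtype=int)
for a, b in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    y[a, b] = y[b, a] = 1

theta = np.array([-1.0, 0.5])          # assumed coefficients (edges, triangles)
delta = change_statistics(y, 1, 3)     # adding 1-3 adds one edge and closes 1-2-3
prob = conditional_tie_probability(theta, y, 1, 3)
print(delta, prob)
```

For the triangle statistic, the change statistic of a dyad equals the number of neighbors the two nodes share, which is why the coefficient can be read as the change in the conditional log-odds of a tie per shared neighbor.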
1.3.4 Statistical Inference
In this dissertation, we will rely on both the frequentist and the Bayesian
framework. In this section, we briefly introduce some of the methods used to perform
parameter estimation and assess variability.
1.3.4.1 Maximum Likelihood Estimation
Within the frequentist framework, the standard method to estimate unknown
parameters for a given model is the method of maximum likelihood (ML). The basic
principle behind this approach is the construction of a likelihood function expressing
the ‘agreement’ between the selected model and the observed data. Consider a set of
observed values y = (y1, ..., yn) of a random sample Y1, ..., Yn and let fi(yi|θ) be the
density function of Yi. The vector θ = (θ1, ..., θk) represents the unknown model
parameters that we want to estimate from the observed data. Since the random
variables Yi are independent, the likelihood function is given by:
\[ L(\theta \mid y) = \prod_{i=1}^{n} f_i(y_i \mid \theta). \]
Hence, given the selected model fi(yi|θ) parametrized by θ, the likelihood L(θ|y) is
the probability of observing the data y as a function of θ. Maximizing this likelihood
function over the entire parameter space Θ then results in an estimate for θ, denoted
by θ̂. From an analytical and computational perspective, it is often more convenient
to maximize the log-transformed likelihood function ll(θ|y) = log(L(θ|y)). Indeed,
this results in a function composed of additive contributions log(fi(yi|θ)), simplifying
the calculation of derivatives with regard to θj, j = 1, ..., k. As the natural logarithm
is a monotone increasing function, the optimization problem is equivalent. To derive
the ML-estimate θ̂, the so-called set of score equations needs to be solved:
\[ S_j(\theta \mid y) = \frac{\partial}{\partial \theta_j} \mathrm{ll}(\theta \mid y) = 0, \quad j = 1, \ldots, k. \]
The information matrix I(θ|y) is defined in terms of the second order partial
derivatives:
\[ I(\theta \mid y) = -\left[ \frac{\partial^2}{\partial \theta_l \, \partial \theta_m} \mathrm{ll}(\theta \mid y) \right]_{l,m}. \]
This matrix should be positive definite at θ̂ for θ̂ to be a maximum. Multiple
numerical optimization techniques are available to solve the set of score equations
when a closed form solution is not available. These include iterative procedures such
as Newton-Raphson, Fisher scoring and the EM-algorithm. The ML-estimator is
weakly consistent and asymptotically normal under certain regularity
conditions.
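As an illustration of solving a score equation numerically, the sketch below applies Newton-Raphson to a one-parameter exponential model with rate λ, for which the score is n/λ − Σ yi and the observed information is n/λ². This model, the data and the starting value are assumptions chosen because the closed form solution n/Σ yi is available as a check; none of it is taken from the thesis.

```python
def score(lam, y):
    """Derivative of the exponential log-likelihood n*log(lam) - lam*sum(y)."""
    return len(y) / lam - sum(y)

def information(lam, y):
    """Minus the second derivative of the log-likelihood."""
    return len(y) / lam ** 2

def newton_raphson(y, lam0, tol=1e-10, max_iter=100):
    """Iterate lam <- lam + score / information until the step is tiny."""
    lam = lam0
    for _ in range(max_iter):
        step = score(lam, y) / information(lam, y)
        lam += step
        if abs(step) < tol:
            break
    return lam

# Hypothetical observations (e.g. waiting times).
y = [0.8, 1.9, 0.4, 2.7, 1.2, 0.6, 3.1, 1.3]

# Starting below the solution (1/max(y) <= 1/mean(y)) keeps the iterates positive.
lam_hat = newton_raphson(y, lam0=1.0 / max(y))
closed_form = len(y) / sum(y)
print(lam_hat, closed_form, information(lam_hat, y))
```

The observed information evaluated at the estimate is positive here, which confirms that the stationary point is a maximum.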
Wald-based Confidence Intervals
The above procedure produces a point estimate θ̂ for the unknown parameters.
To acknowledge the uncertainty associated with this estimate, we want to
estimate the standard error or a confidence interval (CI) for θ̂. One way to do so
is to calculate Wald-based confidence intervals, which rely on the asymptotic
normality of the ML-estimator:
\[ \left[ \hat{\theta}_j - z_{1-\alpha/2} \times \text{se}(\hat{\theta}_j), \;\; \hat{\theta}_j + z_{1-\alpha/2} \times \text{se}(\hat{\theta}_j) \right], \]
where α and $z_{1-\alpha/2}$ are the significance level and the (1 − α/2) × 100th percentile of
the standard normal distribution, respectively. Further, se(θ̂j) is an estimate for the
standard error of θ̂j given by the square root of the jth element on the diagonal of the
inverse of the observed information matrix, I(θ̂)⁻¹. Indeed, I(θ̂)⁻¹ is an estimator
for the asymptotic variance-covariance matrix of θ̂.
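A minimal sketch of a Wald-based interval, assuming a binomial model for a seroprevalence p with x seropositives out of n samples. The counts are hypothetical; the observed information x/p² + (n − x)/(1 − p)² follows from differentiating the binomial log-likelihood twice.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(estimate, std_error, alpha=0.05):
    """Estimate +/- z_{1 - alpha/2} * standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical data: 150 seropositive results among 200 tested samples.
n, x = 200, 150
p_hat = x / n                                    # ML-estimate of the proportion

# Observed information of the binomial log-likelihood, evaluated at p_hat.
obs_info = x / p_hat ** 2 + (n - x) / (1 - p_hat) ** 2
se = sqrt(1 / obs_info)                          # equals sqrt(p_hat * (1 - p_hat) / n)

lower, upper = wald_interval(p_hat, se)
print(f"p_hat = {p_hat:.3f}, 95% Wald CI = [{lower:.3f}, {upper:.3f}]")
```

Because the interval is symmetric around the estimate, it can extend beyond [0, 1] for proportions close to the boundary, which is one reason to consider the bootstrap alternatives discussed next.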
Bootstrap Confidence Intervals
The bootstrap approach is, in contrast with the Wald-based CIs, a distribution-free
method to estimate standard errors and calculate approximate CIs for θ. It was
first introduced by Efron (1979) and is now widely used to assess the uncertainty
associated with parameter estimates. There are different versions of the bootstrap
approach, namely the nonparametric, the semiparametric and parametric bootstrap.
The semi- and parametric bootstrap approaches require parametric assumptions
about the ‘true’ underlying population, and are therefore often less useful compared
to the nonparametric approach. In this dissertation, we will only rely on the
nonparametric bootstrap.
The idea behind the latter approach is that one samples from the empirical
distribution function, a nonparametric and consistent estimator for the unknown
distribution F of the quantity of interest, which is equivalent to sampling with
replacement from the sample itself. Hence, let y(1), ..., y(B) denote B independent
bootstrap samples of size n obtained by drawing with replacement from
the observed data y1, ..., yn. Let $\hat{\theta}^{(b)}$, b = 1, ..., B, be the bootstrap replicates of θ̂
obtained by maximizing the loglikelihood ll(θ|y(b)) for bootstrap sample y(b). The
bootstrap estimate for the standard error of the ML-estimate θ̂j is then given by:
\[ \text{se}_B(\hat{\theta}_j) = \sqrt{ \frac{ \sum_{b=1}^{B} \left( \hat{\theta}^{(b)}_j - \bar{\theta}_j \right)^2 }{ B - 1 } }, \]
where
\[ \bar{\theta}_j = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{(b)}_j. \]
Note that, more generally, bootstrap estimates can be obtained for any statistic
of interest computed from y. Several bootstrap methods have been proposed to
construct approximate confidence intervals. In this thesis, we will use
percentile-based bootstrap CIs, based on the empirical distribution function of the
bootstrap replicates of θ̂j. Let $p^{(j,B)}_{\alpha}$ denote the α × 100th percentile of the
bootstrap values $\hat{\theta}^{(b)}_j$, b = 1, ..., B, then the approximate (1 − α) × 100% CI for θj
is $[p^{(j,B)}_{\alpha/2}, \; p^{(j,B)}_{1-\alpha/2}]$. For a more detailed discussion on
bootstrap methods, we refer to Efron and Tibshirani (1993).
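The nonparametric bootstrap standard error and the percentile-based CI can be sketched as follows. The data, the estimator (the ML-estimate of an exponential rate, 1/mean), the number of bootstrap samples B = 2000 and the random seed are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_replicates(y, estimator, n_boot=2000, seed=1):
    """Re-estimate on n_boot samples drawn with replacement from y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = len(y)
    return np.array([estimator(y[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])

def bootstrap_se(replicates):
    """Standard deviation of the replicates with divisor B - 1."""
    return float(np.std(replicates, ddof=1))

def percentile_ci(replicates, alpha=0.05):
    """Percentile interval from the alpha/2 and 1 - alpha/2 quantiles."""
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

def rate_mle(sample):
    """ML-estimate of an exponential rate (hypothetical model choice)."""
    return 1.0 / float(np.mean(sample))

# Hypothetical observations.
y = np.array([0.8, 1.9, 0.4, 2.7, 1.2, 0.6, 3.1, 1.3])

rate_hat = rate_mle(y)
reps = bootstrap_replicates(y, rate_mle)
se_boot = bootstrap_se(reps)
ci_low, ci_high = percentile_ci(reps)
print(rate_hat, se_boot, (ci_low, ci_high))
```

Unlike the Wald interval, the percentile interval need not be symmetric around the point estimate, which suits skewed estimators such as this one.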
Model Selection
To compare various models in this ML setting, we will focus on two information
criteria: Akaike’s information criterion (Akaike, 1973):
\[ \text{AIC} = -2 \, \mathrm{ll}(\hat{\theta} \mid y) + 2k, \]
where k represents the number of parameters, and the Bayesian information criterion
(Schwarz, 1978):
\[ \text{BIC} = -2 \, \mathrm{ll}(\hat{\theta} \mid y) + \log(n) \, k, \]
where n is the number of observations. Both criteria consist of two terms: the first
is a measure of data fit and the second is a penalty term. The BIC originates from a
Bayesian perspective and, as soon as n ≥ 8, penalizes the number of parameters more
strongly (factor log(n) instead of 2). Given a set of candidate models, the ‘best’
model is the one with the smallest AIC or BIC value. Since
model selection is conditional on the set of models under consideration, there may
exist other models that are closer to the true underlying model. Hence, the choice of
candidate models is crucial to ensure that the preferred model describes the data well.
Furthermore, the AIC values can be used to calculate the Akaike weights.
These weights can be interpreted as the probability of a certain model being the
‘best’ model, given the data and the set of candidate models under consideration.
Suppose we consider a set of m candidate models, and list them according to their
AIC value. Let AICmin correspond to the model with the smallest AIC value and
define the AIC differences ∆i = AICi − AICmin (i = 1, ...,m). The Akaike weights
are then calculated in the following way:
\[ w_i = \frac{ \exp\left( -\frac{1}{2} \Delta_i \right) }{ \sum_{l=1}^{m} \exp\left( -\frac{1}{2} \Delta_l \right) }. \]
For further details we refer to Burnham and Anderson (2002).
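The criteria and the Akaike weights can be computed directly from the maximized loglikelihood values. In the sketch below, the three candidate models, their loglikelihoods, their parameter counts and the sample size are hypothetical numbers used only to show the mechanics.

```python
import math

def aic(loglik, k):
    """Akaike's information criterion: -2 ll + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2 ll + log(n) k."""
    return -2.0 * loglik + math.log(n) * k

def akaike_weights(aic_values):
    """Normalized exp(-Delta_i / 2), with Delta_i = AIC_i - AIC_min."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical candidates: name -> (maximized loglikelihood, number of parameters).
candidates = {"A": (-120.3, 2), "B": (-118.9, 3), "C": (-118.7, 5)}
n_obs = 100   # assumed number of observations

names = list(candidates)
aic_values = [aic(ll, k) for ll, k in candidates.values()]
bic_values = [bic(ll, k, n_obs) for ll, k in candidates.values()]
weights = akaike_weights(aic_values)

best_aic = names[aic_values.index(min(aic_values))]
best_bic = names[bic_values.index(min(bic_values))]
print(dict(zip(names, weights)), best_aic, best_bic)
```

With these assumed numbers the two criteria disagree: the stronger BIC penalty favors the smallest model, whereas AIC prefers the three-parameter model.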
1.3.4.2 Markov Chain Monte Carlo
In the previous section, we described how ML estimation can be used to obtain point
estimates and discussed methods to assess uncertainty by estimating the standard
error or CI associated with these point estimates. This is a frequentist approach in
which the unknown quantity θ is assumed to be fixed (non-random). A different
framework for inference is the Bayesian approach. In this framework θ is treated as a
random variable and we are interested in the distribution of θ. More specifically, we
first assume that we have current knowledge about θ. This is expressed by placing a
probability distribution on the parameters, called the prior distribution, π(θ). After
observing data y = (y1, ..., yn), the distribution π(θ) is updated to obtain the posterior
distribution f(θ|y). This update is done by using Bayes’ Theorem:
\[ f(\theta \mid y) = \frac{ f(y \mid \theta) \, \pi(\theta) }{ \int_\Theta f(y \mid \theta) \, \pi(\theta) \, d\theta }, \]
with Θ the space of possible parameter values, as before. In theory, the posterior
distribution is always available, but evaluation of the complex integral
$\int_\Theta f(y \mid \theta) \, \pi(\theta) \, d\theta$ is often analytically intractable. With the use of
Markov Chain Monte Carlo (MCMC) methods, the evaluation of the integral is avoided
by making use of the unnormalized posterior density:
\[ f(\theta \mid y) \propto f(y \mid \theta) \, \pi(\theta), \]
equivalently,
posterior ∝ likelihood× prior.
MCMC is based on classical Monte Carlo, i.e. Monte Carlo integration aiming at
approximating expectations of the form
\[ E[h(X)] = \int h(x) \, g(x) \, dx. \]
If X1, ..., Xn ∼ g(x), iid and E[h(X1)] < ∞, then the above expectation can be
approximated by
\[ \frac{1}{n} \sum_{i=1}^{n} h(X_i), \]
for some large, yet finite n. However, in many situations, classical Monte Carlo is
not possible because we cannot sample from the distribution g(x). For these, often
high-dimensional cases, MCMC has been developed. The general MCMC strategy
is to construct an ergodic Markov chain Xn with stationary distribution g(x) (for a
discussion on Markov chain theory, see for example Bremaud (1999)). There are a
large number of MCMC algorithms, examples are Random-Walk Metropolis (RWM),
Metropolis-Hastings (MH), Gibbs sampling, slice sampling, etc. RWM was developed
first (Metropolis et al., 1953) and MH is a generalization of RWM (Hastings, 1970).
The MH algorithm only requires the evaluation of a function that is proportional to
the target density g(x). Let p(x) be a function that is proportional to g(x), then we
can construct a Markov chain according to the following algorithm:
Initialization: Choose a starting value x0 and choose an arbitrary density
q(x|y) that suggests a candidate for the next sample value x, given the
previous sample y. This function is referred to as the ‘proposal density’.
Iteration: For n = 0 to N do
1. Generate a candidate value x′ from q(x′|xn).
2. Compute the Metropolis-Hastings acceptance probability
\[ \alpha = \min\left\{ \frac{ p(x') \, q(x_n \mid x') }{ p(x_n) \, q(x' \mid x_n) }, \; 1 \right\}. \]
3. Generate a value u from U[0, 1].
4. Accept the candidate x′ by setting xn+1 = x′ if u ≤ α, otherwise set xn+1 = xn.
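The steps above can be sketched in a few lines of code. The example assumes a hypothetical target on the positive real line, p(x) ∝ x² exp(−2x) (an unnormalized Gamma density with mean 1.5), and an asymmetric multiplicative log-normal proposal, so that the proposal ratio q(xn|x′)/q(x′|xn) does not cancel. The target, the proposal scale, the chain length and the seed are illustrative choices, not settings used in the thesis.

```python
import math
import random

def metropolis_hastings(log_p, propose, log_q, x0, n_iter, seed=1):
    """Generic Metropolis-Hastings sampler working on the log scale.

    log_q(a, b) is the log proposal density of moving to a from b."""
    rng = random.Random(seed)
    x = x0
    chain = [x0]
    for _ in range(n_iter):
        x_new = propose(x, rng)
        # log of p(x') q(x_n | x') / (p(x_n) q(x' | x_n))
        log_ratio = (log_p(x_new) + log_q(x, x_new)) - (log_p(x) + log_q(x_new, x))
        alpha = 1.0 if log_ratio >= 0 else math.exp(log_ratio)
        if rng.random() <= alpha:
            x = x_new          # accept the candidate
        chain.append(x)        # otherwise the current value is repeated
    return chain

SIGMA = 0.8  # assumed proposal scale on the log scale

def log_target(x):
    """Unnormalized log density of the assumed Gamma(shape 3, rate 2) target."""
    return 2.0 * math.log(x) - 2.0 * x

def propose(x, rng):
    """Multiplicative random walk: the candidate is log-normal around x."""
    return x * math.exp(SIGMA * rng.gauss(0.0, 1.0))

def log_proposal(a, b):
    """Log-normal log density of a given b, up to an additive constant."""
    return -math.log(a) - (math.log(a) - math.log(b)) ** 2 / (2.0 * SIGMA ** 2)

chain = metropolis_hastings(log_target, propose, log_proposal, x0=1.0, n_iter=20000)
kept = chain[1000:]                       # discard an assumed burn-in period
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

Only the unnormalized target enters the acceptance probability, which is exactly what makes the algorithm usable when the normalizing integral is intractable.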
The MH algorithm is the most generalizable MCMC algorithm, extending RWM
to include an asymmetric proposal distribution q. The main disadvantage of these
two methods is that the proposal variance needs to be tuned manually. Therefore,
adaptive variants of RWM, tuning the algorithm as it updates, have been proposed.
These algorithms automatically optimize the proposal variance based on the history
of the chains. However, this violates the Markov property, which states that the
proposal may only be influenced by the current state. To obtain valid Markov
chains, a two-phase approach can be used, in which adaptive MCMC is followed by a
non-adaptive algorithm, such as RWM. One of these adaptive algorithms, used in this
dissertation, is the Adaptive-Mixture Metropolis (AMM) algorithm. This algorithm
is an extension by Roberts and Rosenthal (2009) of the Adaptive Metropolis (AM)
algorithm of Haario et al. (2001). Further details will be omitted here.
Referring back to the inference setting, one can construct a Markov Chain with the
posterior distribution f(θ|y) as stationary distribution by taking p(θ) = f(y|θ)π(θ).
When the Markov chain is constructed, one needs to determine how many
steps are needed to converge to the stationary distribution within an acceptable
error. Although there is no definitive way to tell whether the chain is long enough,
several diagnostic tools exist. For an in-depth review of these methods, we refer
to Cowles and Carlin (1996) and Brooks and Roberts (1998). Furthermore, since
an arbitrary initialization is chosen, a ‘burn-in’ period is often discarded and, since
samples are not independent, the chain is often ‘thinned’, only keeping every kth
sample. The output of this simulated chain θ(1), ..., θ(m) can then be used to estimate
characteristics of f(θ|y), such as the expected value of θ:
\[ \bar{\theta} = \frac{1}{m} \sum_{j=1}^{m} \theta^{(j)}. \]
Model Selection
AIC and BIC were introduced in the previous section as model selection criteria.
In a Bayesian setting, the deviance information criterion (DIC), a hierarchical
modeling generalization of AIC, is used in model selection problems. More
specifically, define the deviance as D(θ) = −2 log(f(y|θ)) and denote the expected
deviance by D̄ = Eθ[D(θ)]. Further, the effective number of parameters is
pD = D̄ − D(θ̄), where θ̄ is the expectation of θ. The DIC is then given by
\[ \text{DIC} = D(\bar{\theta}) + 2 p_D. \]
From this definition, it is clear that the DIC can be easily calculated from samples
obtained by an MCMC approach. As with AIC and BIC, models with smaller
DIC are preferred over models with larger DIC.
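A minimal sketch of the DIC calculation from posterior draws, assuming a normal model with known standard deviation and a flat prior on the mean. In that special case the posterior is available in closed form, so the draws are generated directly instead of by MCMC; the data, the sample sizes and the seed are hypothetical.

```python
import numpy as np

def deviance(theta, y, sigma):
    """D(theta) = -2 log f(y | theta) for iid Normal(theta, sigma^2) data."""
    n = len(y)
    return n * np.log(2 * np.pi * sigma ** 2) + np.sum((y - theta) ** 2) / sigma ** 2

def dic(theta_samples, y, sigma):
    """Return (DIC, p_D) computed from posterior draws of theta."""
    d_bar = np.mean([deviance(t, y, sigma) for t in theta_samples])   # expected deviance
    d_at_mean = deviance(np.mean(theta_samples), y, sigma)            # deviance at posterior mean
    p_d = d_bar - d_at_mean                                           # effective number of parameters
    return float(d_at_mean + 2 * p_d), float(p_d)

rng = np.random.default_rng(1)
sigma = 1.0                                   # assumed known standard deviation
y = rng.normal(0.5, sigma, size=50)           # hypothetical data

# With a flat prior the posterior of the mean is Normal(ybar, sigma^2 / n);
# in practice theta_samples would be the (thinned) MCMC output.
theta_samples = rng.normal(y.mean(), sigma / np.sqrt(len(y)), size=20000)

dic_value, p_d = dic(theta_samples, y, sigma)
print(dic_value, p_d)
```

In this one-parameter example the estimated effective number of parameters comes out close to 1, as expected.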
Chapter 2
Data Sources
In this chapter the data sources that are used throughout the thesis will be introduced.
We will use two main types of data in our applications. The first are disease data sets
from a variety of sources. In Section 2.1.1 cross-sectional serological survey data on
varicella-zoster virus in twelve countries is discussed. The influenza-like-illness (ILI)
incidence data obtained during the A(H1N1)v2009 influenza epidemic and data from
a prospective study on pertussis within households are introduced in Sections 2.1.2
and 2.1.3. Lastly, the Ebola virus disease (EVD) incidence and mortality data are
introduced in Section 2.1.4. In all chapters, except Chapter 7, the disease data is
augmented with contact rates obtained from social contact surveys. The different
types of such social contact data used in our applications are discussed in Section 2.2.
2.1 Disease Data
2.1.1 Varicella-zoster Virus
Varicella-zoster virus (VZV) is one of the eight known herpes viruses that affect
humans. Primary infection with VZV results in varicella (chickenpox) and mainly
occurs in childhood. In general, the disease is benign. However, symptoms may be
more severe in adults and complications may occur when varicella is acquired during
pregnancy. VZV is highly contagious and transmitted through direct close contact
with lesions or indirectly through air droplets containing virus particles. The
incubation period following VZV infection ranges from 13 to 18 days and each
infected person transmits the virus for about 7 days. The antibody response
following primary infection with
VZV is believed to induce lifelong protection against chickenpox. However, the virus
remains dormant within the body and may reactivate and give rise to herpes zoster
(or shingles), a skin disease, after years to decades (Miller et al., 1993). In this
dissertation, we will focus on primary infection and ignore reactivation leading to
zoster.
In Chapter 3, we reanalyze the ESEN2 (European Sero-Epidemiology Network) data
on VZV published by Nardone et al. (2007) together with newly available serology
from Poland and Italy, totaling 13 serosurveys from 12 different countries including
two samples from Italy (see Table 2.1 and Figures 2.1-2.2). At the time of sera
collection, which varied between 1995 and 2004, none of the participating countries
had introduced a universal VZV vaccination program. Sample sizes range from 1268
for Poland to 4398 for Germany, with substantial variability between the surveyed
age ranges.
Table 2.1: Overview of the VZV serological data and demographic parameters.

Country                  Data collection  Age range (years)  Sample size  Life expectancy (years)  Population size
Belgium (BE)             2001-2003        0-71.5             3251         77.6                     10,309,722
Germany (DE)             1995/1998        0-79               4398         77.1                     82,050,377
Spain (ES)               1996             2-39               3590         77.5                     39,427,919
England and Wales (EW)   1996             1-20.9             2032         76.0                     51,125,400
Finland (FI)             1997-1998        1-79.8             2471         76.7                     5,146,965
Ireland (IE)             2003             1-60               2430         77.6                     3,963,814
Israel (IL)              2000-2001        0-79               1543         76.2                     6,223,842
Italy (IT’97)            1996-1997        0.1-50             3110         78.2                     56,872,349
Italy (IT’04)            2003-2004        1-79               2446         80.3                     57,880,478
Luxembourg (LU)          2000-2001        4-82               2640         77.2                     438,723
The Netherlands (NL)     1995-1996        0-79               1967         77.0                     15,493,889
Poland (PL)              1995-2004        1-19               1268         73.2                     38,637,184
Slovakia (SK)            2002             0-70               3515         73.2                     5,378,702
[Figure 2.1, eight panels: Belgium, England and Wales, Finland, Germany, Ireland, Israel, Italy (1997), Italy (2004). Horizontal axis: age (0 to 80); vertical axis: seroprevalence (0.0 to 1.0).]
Figure 2.1: Observed age-specific VZV seroprevalence for Belgium, England and Wales,
Finland, Germany, Ireland, Israel and Italy. The size of the dots is proportional to the
sample size per age category.
These serological data consist of cross-sectional sets of either residual blood samples
collected during routine laboratory tests or population-based random sampling.
Blood samples were tested using an enzyme-linked immunosorbent assay (ELISA),
which is a technique to measure infection-specific IgG antibodies. To allow for
international comparisons, the antibody titers were standardized controlling inter-
laboratory and inter-assay variations (de Ory et al., 2006). The observed IgG level
indicates past infection or vaccination and is classified as seropositive or seronegative
by comparing to the cut-off level (or range) specified by the manufacturer of the
test. Hence, in the absence of an immunization program, serological data provide
information on the prevalence of past infection in a population under the assumption
of a serological correlate of protection against the infection. The proportion of
seropositives in the sample is called the seroprevalence.
[Figure 2.2, five panels: Luxembourg, Netherlands, Poland, Slovakia, Spain. Horizontal axis: age (0 to 80); vertical axis: seroprevalence (0.0 to 1.0).]
Figure 2.2: Observed age-specific VZV seroprevalence for Luxembourg, the Netherlands,
Poland, Slovakia and Spain. The size of the dots is proportional to the sample size per age
category.
This type of data is type I interval censored data, or current status data, as
an individual’s infection time lies either before (seropositive) or after (seronegative)
the time of sampling. Since the test is based on a pre-specified cut-off, it suffers
from diagnostic uncertainty and both false negatives and false positives can occur.
This may lead to bias when estimating the prevalence and force of infection. To
avoid bias introduced by misclassification, approaches using the continuous antibody
titers have been proposed (Bollaerts et al., 2012). In this thesis, we will focus on the
dichotomized data where equivocal results were included as seropositive. Although
residual samples are considered to be prone to selection bias, it was shown that the
VZV seroprevalence estimates obtained from residual sera collection and from
population sampling are similar (Kelly et al., 2002).
Unlike incidence data, i.e. disease notification counts through passive or active
surveillance, and laboratory reports, i.e. laboratory confirmed cases, serological data
do not suffer from bias introduced by changes in clinical awareness or under-reporting.
2.1.2 A(H1N1)v2009
The 2009 H1N1 flu was first detected in the United States in April 2009. Up to then,
this combination of influenza virus genes had never been seen in animals or humans.
It was most closely related to North American swine-lineage H1N1 and Eurasian
lineage swine-origin H1N1 influenza viruses, leading to the term “swine flu”. Unlike
what this term suggests, the virus was typically transmitted from person to person
via respiratory droplets. On June 11, 2009, WHO declared the outbreak a global
pandemic and on August 10, 2010, the end of the H1N1 pandemic was announced
(Centers for Disease Control and Prevention, 2010).
Figure 2.3: Weekly number of ILI cases in five age categories during the early part of the
A/H1N1pdm influenza epidemic in 2009 in England and Wales.
The methodology in Chapter 4 is applied to weekly incidence data on
influenza-like-illness (ILI) obtained from general practitioners’ weekly consultation data from
England and Wales during the early part of the A/H1N1pdm influenza epidemic
in 2009 (weeks 23-29). These data were obtained from weekly reports published
by Public Health England (Public Health England, 2010). We only consider the
exponential growth phase of the epidemic, since the model in Chapter 4 does not
take any intervention strategies into account. Pre-existing immunity to the pandemic
strain was obtained from a serological study in England the year before the pandemic
(Miller et al., 2010). Figure 2.3 shows the incidence in five age classes: 0-4, 5-14,
15-44, 45-65 and 65+ years.
2.1.3 Pertussis
Pertussis, commonly known as whooping cough, is a highly contagious respiratory
illness which is caused by a type of bacteria called Bordetella pertussis. The disease
is spread from person to person by coughing or sneezing. It is characterized by
uncontrollable, heavy coughing which often makes it hard to breathe. The name
‘whooping cough’ is derived from the high-pitched ‘whooping’ sound that may follow
after fits of many coughs. Symptoms develop within 5 to 10 days after being infected;
however, the incubation period can be as long as 3 weeks. Early symptoms appear to
be nothing more than a common cold, including a runny nose, light fever and a mild
cough. After 1 to 2 weeks the disease progresses and the typical cough symptoms
of pertussis may appear. These coughing fits can go on for up to 10 weeks or more.
The infection is generally milder for teenagers and adults but can be very serious,
even deadly, for babies less than a year old.
We use data from a prospective study on pertussis within households in the
Netherlands in 2006. A detailed description of this study can be found in de Greeff
et al. (2010). In short, households with an infant aged less than 6 months that had
been hospitalized with laboratory-confirmed pertussis were enrolled in the study. The
hospitalized infant is referred to as the index case. These households were visited by
a study nurse within the first week after the diagnosis of the infant and all household
members were tested for pertussis by PCR, culture and serology. They also received
a questionnaire with questions on clinical symptoms in the past 2 months. This
questionnaire also indicates age, relation to the infected infant, date of symptom
onset (if any) and vaccination status. Follow-up data was collected by phone four
to six weeks after the initial home visit. A household contact was regarded as a
confirmed pertussis case if either PCR, culture or serology were positive. The date
of symptom onset was defined as the first day of coughing or cough-preceding cold
symptoms. The 62 households with confirmed cases that did not have a defined date
of symptom onset were excluded. Furthermore, we removed 13 atypical households
(grandparents, uncle/aunt) and 5 households containing individuals whose age was
missing. The data for analysis therefore consisted of 121 households (and index
cases) and 401 household members, of which 191 were confirmed cases. The data are
graphically presented in Figure 2.4.
Figure 2.4: Compositions of the households included in the pertussis study (left) and
symptom onset times in days relative to the symptom onset time of the primary case of the
household (right).
2.1.4 Ebola Virus Disease
Ebola virus disease (EVD) is caused by viruses of the genus Ebolavirus, which belongs
to the family Filoviridae together with Cuevavirus and Marburgvirus. There are five
species of Ebolavirus: Zaire, Bundibugyo, Sudan, Reston and Tai Forest. The virus that
caused the most recent West African outbreak in 2014 belongs to the Zaire species. EVD
is transmitted to people from wild animals and spreads through human-to-human
transmission. Transmission requires direct contact with blood, secretions, organs or
other bodily fluids of infected persons or animals. The disease starts as a flu-like
syndrome after an incubation period of 2 to 21 days with the onset of fever, fatigue,
muscle pain, headache and sore throat. It then rapidly evolves to severe symptoms such
as vomiting, diarrhoea, rash, impaired kidney and liver function and both internal and
external bleeding. Distinguishing EVD from other infectious diseases such as malaria,
typhoid fever and meningitis can be difficult. Laboratory testing is necessary to
confirm that symptoms are caused by Ebola virus infection. The disease is often fatal in
humans with an average fatality rate around 50% (World Health Organization, 2016a;
European Centre for Disease Prevention and Control, 2016).
Figure 2.5: Transmission electron micrograph of an Ebola virus virion.
EVD first appeared in two simultaneous outbreaks in 1976, one in what is now
Nzara, South Sudan, and the other in Yambuku, Democratic Republic of Congo.
The latter occurred in a village near the Ebola River, from which the disease takes
its name (World Health Organization, 2016a). The current Ebola epidemic in West
Africa was detected in March, 2014. On 8 August, 2014, WHO declared the event
a Public Health Emergency of International Concern (Hawkes, 2014) and the UN
General Assembly declared the epidemic a threat to global health and security
(United Nations Security Council, 2014). On 9 May, 2015, Liberia was declared free
of Ebola virus transmission but on 30 June, 2015, a new case was detected from an
unknown chain of transmission (World Health Organization, 2015a,c). In Guinea
and Sierra Leone, the epidemic persisted in a number of districts mainly between
Conakry and Freetown (UN Mission for Ebola Emergency Response, 2015). As of
24 June, 2015, it has caused 27,443 probable, confirmed, and suspected cases of
EVD in Guinea, Liberia and Sierra Leone, including 11,207 deaths (World Health
Organization, 2015b).
Data on cases and deaths
We used publicly available district-level data on cumulative cases and deaths,
reported from 30 December 2013 until 8 July 2015 through situational reports by
the Ministries of Health of Guinea (Nations U. West Africa, 2015), Liberia (Liberia
MoHaSWRo, 2015) and Sierra Leone (Sanitation Moha, 2015; NERC, 2015). The
data were collected and reported to the national authorities by the Ebola treatment
units and diagnostic testing facilities in the three countries, following national guide-
lines and/or WHO case definitions (World Health Organization, 2014). Data were
reported every two to three days, and more recently on a daily basis. The data sources
provided no detailed information about the case definitions used. Data for Liberia
and Guinea were the reported total cumulative numbers of (suspected, probable and
confirmed) cases and deaths, while for Sierra Leone, we calculated the sum of the
reported suspected, probable and confirmed cases. This allowed us to calculate for
each district the number of new cases and new deaths between two consecutive reports.
A presentation of how the cases were reported can be found in Figure 7.6.
The reporting scheme for deaths was similar, but the dates at which reporting
occurred are not necessarily the same.
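As an illustration, the step from cumulative counts to new counts per reporting interval can be sketched in Python as follows. This is a minimal sketch and not the code used for the analysis: the function name `incident_counts` and the example counts are hypothetical, and we assume one series of (report date, cumulative count) pairs per district.

```python
from datetime import date


def incident_counts(reports):
    """Turn cumulative counts into new counts per reporting interval.

    `reports` is a list of (report_date, cumulative_count) pairs for one
    district. Returns (start_date, end_date, new_count) triples, one for
    each pair of consecutive reports.
    """
    ordered = sorted(reports)
    increments = []
    for (d0, c0), (d1, c1) in zip(ordered, ordered[1:]):
        increments.append((d0, d1, c1 - c0))
    return increments


# Hypothetical district series (made-up numbers, not taken from the reports).
example = [
    (date(2014, 9, 1), 10),
    (date(2014, 9, 3), 14),
    (date(2014, 9, 6), 14),
    (date(2014, 9, 8), 21),
]
new_cases = incident_counts(example)
```

Since the reporting intervals have unequal length, the interval boundaries are kept alongside each increment. A downward correction of a cumulative count would show up as a negative increment in this sketch and would need separate handling.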
Data on control measures
Publicly available situation reports of response measures were used to assess
the intensity of interventions (Organization GoGWH, 2015; Exchange HD, 2015).
The publicly available data regarding interventions provided little detail and were
not reported regularly over time or across the entire outbreak region. Due to the complexity of
response measures and limited availability of data, we used the presence of triage
centers, holding or community care centers and Ebola Treatment Units (ETUs) as a
surrogate marker of response activities.
The implemented intervention measures and cumulative numbers of cases and
deaths are displayed in Figures 7.3 and 7.4, respectively.
2.2 Social Contact Data
Mathematical modeling of infectious disease spread requires assumptions on the un-
derlying transmission processes (i.e. β(a, a′) introduced in Section 1.3.2). Since the
spread of airborne or close-contact infections in a population is driven by social con-
tacts between individuals, these assumptions are related to human social interactions.
The frequency and intensity of these interactions typically vary with age, but also de-
pend on disease status (Section 2.2.2) and setting (Section 2.2.3). In the traditional
approach of Anderson and May (1991) mixing patterns are imposed to estimate the
WAIFW matrix from age-specific incidence or serological data. However, it has been
shown that R0 is highly sensitive to the choice of the imposed mixing pattern (Green-
halgh and Dietz, 1994). An alternative to the approach by Anderson and May (1991) is
informing the mixing patterns with data from population-based social contact surveys
and assuming that transmission rates are proportional to contact rates. Recently, sev-
eral studies were conducted to measure social mixing behavior, and Read et al. (2012)
present a review of the different methodologies employed. In the next sections, we
describe three different social contact surveys: a large multi-country population-
based survey, a contact survey conducted during the A/H1N1pdm influenza epidemic
and a survey on household contacts. We also briefly describe methods for the estimation
of contact rates using data as obtained in the first two contact surveys.
2.2.1 POLYMOD Contact Data
In Chapter 3, we use contact data from cross-sectional diary-based surveys that were
conducted between May 2005 and September 2006 as part of the POLYMOD project
(a European Commission project funded within the Sixth Framework Programme).
This project constituted the first large-scale prospective study to investigate social
contact behavior in eight European countries: Belgium, Germany, Finland, Great
Britain, Italy, Luxembourg, the Netherlands and Poland (Mossong et al., 2008a).
Participants were recruited through random-digit dialing, face-to-face interviews or
population registers, and completed a diary-based questionnaire recording social con-
tacts during one randomly assigned day. Parents filled in the diary for young children.
Recruiting participants was done such that the samples were broadly representative
for the study populations in terms of age, sex and geographical spread. Participants
were asked to fill in some general information and record the age and gender of each
contacted person, plus location, duration and frequency of the contact. In case the
exact age was unknown, the participant had to provide an estimated age range. If so,
the mean of this interval was used as a surrogate for the age of the contacted person.
Further, a distinction between two types of contact was made: non-close contacts,
defined as two-way conversations of at least three words in each other's proximity, and
close contacts that involve any sort of physical skin-to-skin touching. For an extensive
description of the survey, we refer to Mossong et al. (2008a).
2.2.2 Contact Behavior during Illness
Recently, the impact of illness on social contact patterns has been investigated. This
was done using data from a social contact survey that was carried out during the
A/H1N1pdm influenza epidemic in England. This survey is described in detail by
Eames et al. (2010). Briefly, participants were recruited into the study through packs
with antiviral medication distributed at thirty-one antiviral distribution centers
throughout England during the epidemic. The packs contained a social contact
diary to be filled in on one day during the time they were symptomatic with ILI.
Two weeks later (by which time participants were expected to have recovered),
participants were sent a similar, follow-up questionnaire. Thus, the study aimed to
obtain two contact diaries from each participant: one completed when the participant
was showing symptoms and one completed after he or she had recovered. In these
contact diaries participants were asked to record details about each person they met
during the course of a day: gender and (estimated) age of the contact, social setting
and duration of the encounter, frequency with which that person was met, and
whether the encounter involved any skin-to-skin contact (e.g., hand-shake, kiss, or
contact sport). A total of 140 participants returned two completed contact diaries.
In Chapter 4 we will use the difference between the contact patterns of ‘healthy’ and
symptomatic individuals to infer on parameters related to asymptomatic infection
from the incidence data described in Section 2.1.2.
2.2.3 Estimation of Contact Rates
Contact rates can be estimated from the contact data described in the previous sections
as follows. First, the age dimension is discretized in $J$ age classes $[a_{[j]}, a_{[j+1]})$.
Now, consider a respondent in age class $i$ and let $Y_{ij}$ denote the number of contacts
with individuals in age class $j$ during one day. From the contact surveys described
above, we observe values $y_{ij,p}$, $p = 1, \ldots, P_i$, where $P_i$ is the number of participants in
age class $i$. Let the expected number of contacts in age class $j$ by an individual in
age class $i$ be denoted by $m_{ij} = E(Y_{ij})$. The elements $m_{ij}$ make up a $J \times J$ social
contact matrix. The yearly contact rates $c_{ij}$, i.e. the annual rate at which individuals
of age class $j$ contact individuals in age class $i$, are then given by:
$$c_{ij} = 365 \times \frac{m_{ji}}{N_i},$$
where $N_i$ denotes the population size in age class $i$ obtained from demographic data.
To account for the reciprocity of social contacts (Wallinga et al., 2006), the total
number of contacts from age class $i$ to age class $j$ must equal the total number of
contacts from age class $j$ to age class $i$ on a population level:
$$m_{ij} N_i = m_{ji} N_j.$$
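The two relations above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names and the 2 x 2 example are hypothetical, and reciprocity is imposed here by simply averaging the two population-level totals, which is one common choice and not the smooth-then-constrain approach used for the POLYMOD data below.

```python
import numpy as np


def reciprocal_contact_matrix(m, n_pop):
    """Adjust mean daily contacts so that m_ij * N_i = m_ji * N_j holds."""
    m = np.asarray(m, dtype=float)
    n_pop = np.asarray(n_pop, dtype=float)
    totals = m * n_pop[:, None]            # totals[i, j] = m_ij * N_i
    symmetric = 0.5 * (totals + totals.T)  # average both reporting directions
    return symmetric / n_pop[:, None]


def yearly_contact_rates(m, n_pop):
    """Yearly contact rates c_ij = 365 * m_ji / N_i."""
    m = np.asarray(m, dtype=float)
    n_pop = np.asarray(n_pop, dtype=float)
    return 365.0 * m.T / n_pop[:, None]


# Hypothetical example with two age classes (made-up numbers).
m_obs = [[4.0, 2.0], [1.0, 3.0]]
n_pop = [1000.0, 3000.0]
m_rec = reciprocal_contact_matrix(m_obs, n_pop)
c = yearly_contact_rates(m_rec, n_pop)
```

Once reciprocity holds, the matrix of contact rates is symmetric, since $m_{ji}/N_i = m_{ij}/N_j$.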
To estimate the contact rates cij from the POLYMOD data described in Section 2.2.1
we will use a bivariate smoothing approach described by Goeyvaerts et al. (2010). In
this approach the average number of contacts mij is modelled as a two-dimensional
continuous function over the age of both respondent and contact resulting in a con-
tinuous contact surface. The basis is a tensor product spline derived from two smooth
functions of the respondent’s and contact’s ages:
$$Y_{ij} \sim \mathrm{NegBin}(m_{ij}, \phi), \quad \text{with} \quad g(m_{ij}) = \sum_{k=1}^{K} \sum_{l=1}^{K} \beta_{kl}\, b_k(a_{[i]})\, d_l(a_{[j]}),$$
where $g$ is a known link function, $\beta_{kl}$ are unknown parameters, and $b_k$ and $d_l$ are
known basis functions for the marginal smoothers. The basis dimension $K$ should be
chosen large enough to fit the data well, but small enough to maintain computational
efficiency (Wood, 2006). Goeyvaerts et al. (2010) use thin plate regression
splines and a logarithmic link function. Post-stratification weights are taken into
account and a smooth-then-constrain approach is used to account for the reciprocity
of contacts. The estimated contact surface for Belgium is displayed in Figure 2.6.
From this contact surface we notice a strong main-diagonal, indicating assortative
mixing i.e. high contact rates between persons of the same age, an off-diagonal
parent-child component and a potential grandparent-grandchild component. These
age-specific mixing patterns and contact characteristics were very similar across the
European countries, although the average number of contacts differed substantially.
Figure 2.6: Contour plot of the estimated Belgian contact rates derived from the
bivariate smoothing approach applied to the POLYMOD survey data.
For the contact data in Section 2.2.2 no smoothing is applied and the averages
mij are used directly to calculate the social contact matrices Ca and Cs for both
recovered (assumed to be the same as asymptomatic) and symptomatic individuals,
respectively (Van Kerckhove et al., 2013). These matrices are presented in Figure 2.7.
Figure 2.7: Age-specific contact rates for asymptomatic individuals (left) and
symptomatic individuals (right) based on the age classes of the incidence data.
2.2.4 Contact Patterns within Households
Since households are such important units in the transmission of infectious diseases,
we study household contact networks in Chapter 5. The data used result
from a social contact survey conducted in 2010-2011 focusing on households with
young children in the Flemish geographic region including Brussels. Another contact
survey with similar design aimed at gathering individual-level information was con-
ducted in parallel and is described elsewhere (Willem et al., 2012). Participants were
recruited by random digit dialing and stratified sampling ensured representativeness
in terms of geographical spread, day and week-weekend distribution, and age and
gender of the youngest child. All participants were asked to anonymously complete
a paper diary recording their contacts during one randomly assigned day without
changing their usual behavior. Two types of contact diaries were used, adapted to
the age of the participants: one for children (0-12 years) that was designed to be
filled in by a proxy, and one for adolescents and adults (> 12 years). The diaries were
sent and collected by mail. Participants were reminded by phone to fill in the diary
one day in advance and followed up the day after. Data were single entered in a
computer database and independently checked.
The survey focused on households with at least one child of age 12 years or
less. Upon sampling, all persons living more than 50% of the time in the household
were defined as household members and recruited to take part in the survey.
Participants had to record all persons they made contact with, with a contact
being defined as a two-way conversation at less than 3 meters distance or a physical
contact involving skin-to-skin touching (either with or without conversation). The
information recorded included the exact or estimated age (interval) and gender of
each contacted person, physical touching (yes/no), location, frequency and total
duration of the contact, and whether or not the contacted person was a household
member. If two people contacted each other multiple times per day, participants
were instructed to consider that to be a single contact with the duration being total
duration spent in contact with that person during the 24-hour diary period.
From the 342 households that participated in the survey, 24 households were
excluded because of missing contact diaries or non-compliance with the study design.
We analyzed data from 318 households including 1266 participants who recorded
19,685 contacts in total, with household sizes ranging from 2 to 7. Within-household
contacts were identified and matched with other household members using the following
criteria: matching household identification number, gender and age (allowing
the recorded age to deviate from the true age by 1 year). As such, all contacts
reported as household contacts could be linked to a unique household member.
Amongst the remaining contacts that occurred at home, i.e. those with a missing or
negative household member indicator, an additional small subset of household contacts
was identified using the same criteria as before but requiring an exact age match.
This yielded 3821 identified within-household contacts with 98% reciprocity, i.e.
symmetry in contact reporting, indicating a good quality of reporting as expected
in this household setting (Smieszek et al., 2014). We assumed all social contacts
to be reciprocal, depicting each household as an undirected network where nodes
represent household members and edges represent contacts within the household. An
edge therefore indicates that the corresponding household members made contact at
least once during that day. Contact characteristics of reciprocal contacts are merged
such that the most intense contact value is retained and the location category is set
to ‘multiple’ if two or more different locations are reported. This resulted in a total
of 1946 distinct within-household contacts of which 1861 (96%) involved physical
contact (Figure 2.9). There are 9 participants who did not record any contact with
other household members and are referred to as isolates. Figure 2.8 depicts the
observed within-household physical contact networks by household size.
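The merging of reported contacts into an undirected household network can be sketched as follows. This is a hypothetical illustration rather than the procedure actually implemented: the function names, the numeric intensity coding and the example reports are our own, and reciprocity is taken here to be the fraction of directed reports for which the reverse report also exists, which is an assumption about how the measure is defined.

```python
def merge_household_reports(reports):
    """Collapse directed contact reports into undirected edges.

    `reports` holds (reporter, contact, intensity) triples within one
    household. For every pair the most intense reported value is retained.
    """
    edges = {}
    for reporter, contact, intensity in reports:
        pair = frozenset((reporter, contact))
        edges[pair] = max(intensity, edges.get(pair, intensity))
    return edges


def reporting_reciprocity(reports):
    """Fraction of directed reports that are matched by the reverse report."""
    directed = {(reporter, contact) for reporter, contact, _ in reports}
    matched = sum((contact, reporter) in directed for reporter, contact in directed)
    return matched / len(directed)


# Hypothetical household with three members (made-up intensity codes).
reports = [
    ("mother", "child", 3),
    ("child", "mother", 2),
    ("father", "child", 1),
]
edges = merge_household_reports(reports)
reciprocity = reporting_reciprocity(reports)
```

In this toy household the mother-child contact is reported by both members and keeps the higher intensity, whereas the father-child contact is reported in one direction only but still yields an edge, in line with the assumption that all contacts are reciprocal.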
Figure 2.9 shows that contacts between household members were of long duration,
which is consistent with findings from previous social contact surveys (Mossong et al.,
2008b) and from individual-based simulation models creating so-called synthetic
populations (Del Valle et al., 2007). Further, interactions between household
members occurred (almost) daily and 66% of household members only met each
other at home, while 33% met at multiple locations of which 98% included home.
In the following, we focus on physical contacts since it has been shown that these
better explain the observed age-specific seroprevalence of airborne infections, such as
varicella and parvovirus B19, compared to non-physical contacts (Ogunjimi et al.,
2009; Goeyvaerts et al., 2010; Melegaro et al., 2011).
Age, gender and household size were used to assign the role of child, mother and father
to each household member. Two households of size 4 and 3, respectively, one with a
grandparent and one with a homosexual couple, had a non-traditional configuration
and were excluded from further analysis. The final data set thus consists of 316
households yielding 1259 participants.
Figure 2.8: Observed within-household physical contact networks by household size.
Nodes represent household members and edges represent physical contacts.
Figure 2.9: Barplots of contact intensity distributions (duration, frequency and touching)
and contact location distributions for all contacts recorded with non-household (left bar)
and household members (right bar).
Chapter 3
The Social Contact Hypothesis Under the Assumption of Endemic Equilibrium: Elucidating the Transmission Potential of VZV in Europe
In Chapter 1 we introduced two key measures of infectious disease transmission,
namely the basic and effective reproduction numbers, R0 and R. There are several
methods to estimate these reproduction numbers (Vynnycky and White, 2010).
In this chapter we focus on deriving R0 from transmission rates that can be
estimated from serological data under the assumption of endemic equilibrium. As
described in Section 1.3.1.3, a disease in endemic equilibrium may undergo cyclical
epidemics, but fluctuates around a stationary average over time. Also remember
that in this equilibrium setting R is expected to be equal to 1 (Diekmann et al., 1990).
We consider pre-vaccination serological data for the varicella-zoster virus from
12 different European countries described in Section 2.1.1. Serological surveys do
not provide complete information about mixing patterns, since they reflect the rate
at which susceptible individuals become infected, but not who is infecting whom.
Hence, to be able to estimate the transmission rates β(a, a′) from this VZV data, we
need to make assumptions about the underlying age-specific mixing patterns. Since
the estimation of R0 and R is sensitive to these assumptions (Greenhalgh and Dietz,
1994), we will inform the mixing pattern with data from the population-based social
contact survey described in Section 2.2.1.
Furthermore, we use the inferred effective reproduction number as a model el-
igibility criterion combined with AIC as a model selection criterion. To our
knowledge, Wallinga and Levy-Bruhl (2001) were the first to use the effective
reproduction number to assess the plausibility of different mixing patterns. However,
this is the first time that R is explicitly used as a determinant in the model selection
procedure. We evaluate how constant and age-specific proportionality factors affect
the fit to the serology and the estimated R0 values. Moreover, we assess the effect
of age-specific heterogeneity related to infectiousness on model eligibility and fit.
Further, from a selected set of demographic, socio-economic and spatio-temporal
factors, we explore which factors best explain the between-country heterogeneity in
R0 using two non-parametric methods: the maximal information coefficient (MIC)
and random forest.
This chapter covers the study in Santermans et al. (2015). It is organized as
follows. In Section 3.1, we will describe the model structure and procedure to
estimate the basic and effective reproduction number. The data application results
are presented in Section 3.1.4. In Section 3.2 we elaborate on the methods used to
determine potential risk factors and the results of this risk factor analysis. Sensitivity
to certain assumptions is assessed in Section 3.3 and we finish the chapter with some
concluding remarks in Section 3.4.
3.1 Estimating the Basic and Effective Reproduc-
tion Number
3.1.1 Mass Action Principle and Mixing Assumptions
To describe VZV transmission dynamics, a compartmental MSIR (Maternal
protection-Susceptible-Infected-Recovered) model for a closed population of size N
with fixed duration of maternal protection A is considered, following Goeyvaerts
et al. (2010) and Ogunjimi et al. (2009). Doing so, we explicitly take into account
the fact that newborns are protected by maternal antibodies and do not take part in
the transmission process. We assume that mortality due to infection can be ignored,
which is plausible for VZV in developed countries, and that infected individuals main-
tain lifelong immunity to varicella after recovery. Further, demographic and endemic
equilibria are assumed (Section 1.3.1.3). Under these assumptions the age-specific
prevalence π(a) is given by:
$$\pi(a) = 1 - e^{-\int_A^a \lambda(u)\,du},$$
where λ(a) is the age-specific force of infection. There is a wide range of methods
available to estimate λ(a) from seroprevalence data, see Hens et al. (2010) for an
historical overview.
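To make the prevalence expression concrete, the following Python sketch evaluates it for a force of infection that is constant within 1-year age intervals. The function name `seroprevalence`, the default value of the maternal protection age and the example values are hypothetical choices for illustration.

```python
import numpy as np


def seroprevalence(ages, foi, maternal_age=0.5):
    """pi(a) = 1 - exp(-int_A^a lambda(u) du) for piecewise-constant lambda.

    `foi[k]` is the force of infection on the age interval [k, k + 1).
    Individuals younger than `maternal_age` (A) have prevalence zero.
    """
    ages = np.atleast_1d(np.asarray(ages, dtype=float))
    foi = np.asarray(foi, dtype=float)
    lower = np.maximum(np.arange(len(foi)), maternal_age)
    upper = np.arange(1, len(foi) + 1)
    # Time each individual has spent at risk within each age interval.
    at_risk = np.clip(np.minimum(ages[:, None], upper) - lower, 0.0, None)
    cumulative_hazard = at_risk @ foi
    return 1.0 - np.exp(-cumulative_hazard)


# Hypothetical constant force of infection of 0.2 per year up to age 3.
prevalence = seroprevalence([0.25, 2.0], [0.2, 0.2, 0.2])
```

For the individual aged 2 years the time at risk is 1.5 years, so the prevalence equals $1 - e^{-0.3}$, while the individual aged 0.25 years is still maternally protected.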
Since we aim to estimate the basic and effective reproduction number, we need to
estimate the age-specific transmission rates β(a, a′) (Section 1.3.2.1). To do so, we
use a slightly adapted version of the mass action principle (1.11), incorporating
maternal protection:
$$\lambda(a) \approx \frac{ND}{L} \int_A^\infty \beta(a, a')\, \lambda(a')\, s(a')\, m(a')\,da', \qquad (3.1)$$
where N , L, s(a) and m(a) are defined as in Chapter 1. Given the transmission
rates β(a, a′), R0 and R can be obtained following the next generation approach, as
described in Section 1.3.2.2.
Mixing assumptions
The traditional WAIFW approach of Anderson and May (1991) was used in
the exploratory analysis of the data (Nardone et al., 2007) to estimate the transmis-
sion rates. In this chapter, we will inform β(a, a′) with data on social contacts as
described in Section 1.3.2.1:
β(a, a′) = q(a, a′) · c(a, a′).
We will contrast the constant proportionality assumption, or social contact hypothesis
(1.12), against a log-linear function of the age of the susceptible individual, which
entailed an improvement of model fit for VZV in Belgium (Goeyvaerts et al., 2010),
that is, respectively:
log{q(a, a′)} = γ0 and log{q(a, a′)} = γ0 + γ1a. (3.2)
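The construction of the transmission rates from the contact rates under (3.2) can be sketched in Python as follows. This is an illustrative sketch only: the function name, the use of interval midpoints for the age $a$ and the parameter values are assumptions, and the optional `gamma2` argument anticipates the infectiousness component of (3.4).

```python
import numpy as np


def transmission_rates(contact_rates, gamma0, gamma1=0.0, gamma2=0.0):
    """beta(a, a') = q(a, a') * c(a, a') with a log-linear q.

    Rows index the age of the susceptible individual (a) and columns the age
    of the infective individual (a'). Ages are taken as midpoints of 1-year
    intervals, which is an assumption of this sketch.
    """
    c = np.asarray(contact_rates, dtype=float)
    ages = np.arange(c.shape[0]) + 0.5
    log_q = gamma0 + gamma1 * ages[:, None] + gamma2 * ages[None, :]
    return np.exp(log_q) * c


# Hypothetical contact rates and parameter values (made-up numbers).
c_example = np.ones((3, 3))
beta_constant = transmission_rates(c_example, gamma0=-1.0)
beta_loglinear = transmission_rates(c_example, gamma0=-1.0, gamma1=-0.1)
```

With `gamma1 = 0` the sketch reduces to the constant proportionality assumption, so that the transmission rates are a fixed multiple of the contact rates.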
The contact rates c(a, a′) are estimated from the POLYMOD contact survey (Sec-
tion 2.2.1) using the bivariate smoothing approach described in Section 2.2.3, consid-
ering those contacts with skin-to-skin touching lasting at least 15 minutes since these
contacts have been shown to be most predictive for VZV (Goeyvaerts et al., 2010;
Melegaro et al., 2011). For the countries that participated in the POLYMOD project,
the corresponding contact rates were used, whereas for the other countries contact
data of a neighboring country or a country with similar school enrollment ages was
used (cf. Table 3.1). We present a sensitivity analysis in Section 3.3.1 to compare
these ad-hoc choices with a more objective selection of contact data by means of AIC.
We observe that the effect on R0 remains within reasonable bounds, which indicates
that the choice of contact data has limited influence on our estimates.
3.1.2 Estimation Procedure
In this chapter we will estimate the force of infection using maximum likelihood estimation
with the Bernoulli log-likelihood given by:
$$\ell(\lambda; y, a) = \sum_{i=1}^{n} \left[ y_i \log\left(1 - e^{-\int_A^{a_i} \lambda(u)\,du}\right) + (1 - y_i)\left(-\int_A^{a_i} \lambda(u)\,du\right) \right]. \qquad (3.3)$$
Here, n denotes the size of the serological data set and yi denotes a binary variable
indicating whether subject i had experienced infection before age ai. The transmis-
sion rates cannot be estimated analytically since the integral equation (3.1) has no
closed form solution. However, it is possible to solve this numerically by turning to
a discrete age framework, assuming a constant force of infection in each 1-year age
interval. Now, estimation proceeds as follows: starting values for the parameters
are provided after which the discretized mass action principle is iterated until
convergence ($\sum_i (\lambda_{i,\mathrm{iter}} - \lambda_{i,\mathrm{iter}-1})^2 < 1 \cdot 10^{-10}$) and finally, the resulting estimate
of the force of infection is contrasted to the serology using the log-likelihood (3.3).
To calculate 95% confidence intervals, non-parametric bootstraps are performed on
both the contact data and the serological data to account for all sources of variability
(Goeyvaerts et al., 2010). The number of bootstrap samples per country is fixed at
2000 with convergence rates varying between 62% and 100%.
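The iteration and the likelihood evaluation described above can be sketched in Python as follows. This is a simplified sketch and not the implementation used in this chapter. We assume that $s(a) = e^{-\int_A^a \lambda(u)\,du}$ is the susceptible fraction and $m(a)$ the survival function (both are defined in Chapter 1, which is not reproduced here), that $\beta$ and $m$ are constant within each 1-year interval, and that `scale` stands for $ND/L$. All function names and numerical values are hypothetical.

```python
import numpy as np


def iterate_mass_action(beta, survival, scale, widths, lam0,
                        tol=1e-10, max_iter=10_000):
    """Fixed-point iteration of a discretized version of (3.1)."""
    lam = np.asarray(lam0, dtype=float)
    for _ in range(max_iter):
        hazard = lam * widths
        cum_before = np.concatenate(([0.0], np.cumsum(hazard)[:-1]))
        # Integral of lambda * s over each interval (exact for constant lambda).
        newly_infected = np.exp(-cum_before) * (1.0 - np.exp(-hazard))
        lam_new = scale * beta @ (survival * newly_infected)
        if np.sum((lam_new - lam) ** 2) < tol:
            return lam_new, True
        lam = lam_new
    return lam, False


def bernoulli_loglik(y, ages, lam, maternal_age=0.5):
    """Log-likelihood (3.3) for piecewise-constant lambda on 1-year intervals."""
    y = np.asarray(y).astype(bool)
    ages = np.asarray(ages, dtype=float)
    lam = np.asarray(lam, dtype=float)
    lower = np.maximum(np.arange(len(lam)), maternal_age)
    upper = np.arange(1, len(lam) + 1)
    at_risk = np.clip(np.minimum(ages[:, None], upper) - lower, 0.0, None)
    cum = at_risk @ lam
    # log(1 - exp(-cum)) for seropositives, -cum for seronegatives.
    return np.sum(np.log(-np.expm1(-cum[y]))) - np.sum(cum[~y])


# Hypothetical toy setting: 10 age classes, homogeneous mixing, no mortality.
K, A = 10, 0.5
widths = np.arange(1, K + 1) - np.maximum(np.arange(K), A)
lam_hat, converged = iterate_mass_action(
    beta=np.ones((K, K)), survival=np.ones(K), scale=0.5,
    widths=widths, lam0=np.full(K, 0.1))
loglik = bernoulli_loglik([1, 0], [5.0, 1.0], np.full(K, 0.2))
```

In this homogeneous toy setting the sum over age classes telescopes, so the converged force of infection satisfies $\lambda = 0.5\,(1 - e^{-9.5\lambda})$, which offers a simple check of the iteration. In an actual fit, the iteration would be nested inside a numerical optimizer over the parameters of $q(a, a')$.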
Since some countries lack serological data on VZV in the older age groups,
the original serology is augmented with simulated data to avoid excess variability of
the bootstrap estimates (Goeyvaerts et al., 2010). These simulations are drawn from
a Bernoulli distribution with mean equal to the seroprevalence from the last 5 age
categories with at least 20 observations available. The size of the simulated samples
is determined by the demography of the population. This method is plausible from
an epidemiological point of view since the VZV seroprofile is not expected to decline
after 20 years of age. Based on the augmented data, post-stratification weights are
calculated using census data and included in the likelihood. The life expectancy
L and the age-specific mortality rates µi for every country are estimated based on
demographic data from the year of serological data collection (Eurostat, the Office
for National Statistics for England and Wales, Israeli Bureau of Statistics for Israel)
using a Poisson model with log link and offset term (Hens et al., 2012). To ensure
flexibility, a radial basis spline is used.
The duration of maternal immunity is fixed at A = 0.5 years, while the mean
duration of infectiousness for VZV is taken as D = 7/365 years. Lastly, to reduce
boundary irregularities induced by sparseness in the contact data for the elderly, the
contact surface, and hence the serological data, are restricted to the 0-69 year age
range. A sensitivity analysis showed little impact on the point estimates (results not
shown).
3.1.3 Model Eligibility and Indeterminacy
The estimated effective reproduction number R and corresponding confidence interval
allow us to check whether the above mixing patterns (3.2) conform with the assump-
tion of endemic equilibrium. In this setting, each infectious individual infects one
other individual on average, hence R is expected to be equal to 1 (Farrington, 2003).
We use this property to exclude those models for which R is estimated to be signifi-
cantly different from 1. Furthermore, the effective reproduction number allows us to
make indirect inference about the age-specific heterogeneity related to infectiousness,
assuming
log{q(a, a′)} = γ0 + γ1a+ γ2a′, (3.4)
where a′ is the age of the infective individual. We refer to this model as the ex-
tended log-linear model, in which γ2 is referred to as an infectiousness component.
Direct inference can be troublesome, as shown by Goeyvaerts et al. (2010), since
serological surveys do not provide information related to infectiousness. This indeterminacy
can be illustrated as follows: assume for simplicity $\beta(a, a') = q(a, a')\,c(a, a') = q_0\, q_1(a)\, q_2(a')\, c(a, a')$. Rewriting (3.1), this implies
$$q_0\, q_1(a) = \frac{L\,\lambda(a)}{N D \int_A^\infty q_2(a')\, c(a, a')\, \lambda(a')\, s(a')\, m(a')\,da'},$$
where $\lambda(a)$, $s(a)$ and $c(a, a')$ can be estimated from serological data and social contact
data, respectively. This implies that when q0q1(a) is flexibly modeled, the effect of
q2(a′) on the serological model is completely absorbed and the fit of this model does
not change for varying infectivity curves. However, it does affect the estimated value
of R0 and R. We deal with this indeterminacy by letting γ2 vary over a fixed interval
and assessing the effect on R. This way, the value of γ2 can be determined such that
R is not significantly different from 1. This is illustrated in Section 3.1.4.
3.1.4 Application to the Data
We apply the social contact data approach with a constant and age-specific log-linear
proportionality factor, as in (3.2), to the 13 serological data sets available for
VZV. The estimated basic and effective reproduction numbers for both models
are presented in Figure 3.1 and Table 3.1 together with 95% bootstrap percentile
confidence intervals. The sizes of the dots in the figure are proportional to the Akaike
weights (see Section 1.3.4.1), hence larger dots correspond to smaller AIC values.
These estimates are supplemented with estimates of the mean age at infection in
Table 3.1.
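For reference, a bootstrap percentile confidence interval of the kind reported here can be computed as in the following Python sketch. The function name and the replicate values are hypothetical, and we assume that bootstrap samples for which the estimation did not converge are discarded before the percentiles are taken.

```python
import numpy as np


def percentile_interval(estimates, converged, level=0.95):
    """Percentile interval from converged bootstrap replicates.

    Returns the lower bound, the upper bound and the convergence rate.
    """
    estimates = np.asarray(estimates, dtype=float)
    keep = np.asarray(converged, dtype=bool)
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(estimates[keep], [tail, 100.0 - tail])
    return lower, upper, keep.mean()


# Hypothetical replicates: the values 1, ..., 101, the last one not converged.
replicates = np.arange(1.0, 102.0)
flags = np.ones(101, dtype=bool)
flags[-1] = False
ci_low, ci_high, rate = percentile_interval(replicates, flags)
```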
Models are classified as eligible based on the 95% confidence interval for the effective
reproduction number, and eligible models are compared by means of AIC. When the
model with lowest AIC value is eligible, this model is selected. This results in the
age-specific log-linear proportionality factor being preferred for Belgium, Denmark,
England and Wales, Ireland, Israel, Italy, The Netherlands and Poland. For Spain
and Slovakia, the constant proportionality factor is sufficient to provide a good fit.
For Finland, the log-linear model is preferred in terms of AIC, but this model is not
eligible, whereas for Luxembourg, neither model is eligible. In both cases, the
constant and basic log-linear models are not capable of providing a good fit to the data.
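The selection rule can be summarized in a short Python sketch. It reflects our reading of the procedure and is not the original code: a model is taken to be eligible when the 95% confidence interval for R contains 1, and the model with the lowest AIC is selected only if it is eligible. The function names are hypothetical; the Belgian values are those reported in Table 3.1.

```python
import math


def akaike_weights(aic):
    """Akaike weights from a mapping of model name to AIC value."""
    best = min(aic.values())
    relative = {name: math.exp(-0.5 * (value - best)) for name, value in aic.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


def select_model(fits):
    """Return the lowest-AIC model if its interval for R contains 1, else None."""
    best = min(fits, key=lambda name: fits[name]["aic"])
    low, high = fits[best]["r_ci"]
    return best if low <= 1.0 <= high else None


# AIC values and 95% confidence intervals for R for Belgium (Table 3.1).
belgium = {
    "CP": {"aic": 891, "r_ci": (0.88, 1.34)},
    "LP": {"aic": 880, "r_ci": (0.88, 1.33)},
}
selected = select_model(belgium)
weights = akaike_weights({name: fit["aic"] for name, fit in belgium.items()})
```

A return value of `None` signals that the preferred model is not eligible, in which case the extended log-linear model (3.4) is considered.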
Therefore, we consider the extended log-linear model in (3.4) for Finland and
Luxembourg. Figure 3.2 presents the profile likelihood estimates of R0 and R as a
function of γ2. We observe that by including an infectiousness component in the
proportionality factor, the effective reproduction number R can be estimated closer
to 1. Note that the estimate of R0 decreases quite substantially with decreasing
γ2, in contrast to an increase in R. This reverse relation seems counter-intuitive,
but is caused by an interplay between q(a, a′) and s(a). Now, by performing a
non-parametric bootstrap for every value of γ2 on a specific grid, it is possible to
determine the maximal value of γ2 such that 1 is within the 95% bootstrap confidence
interval of R. This is illustrated in Figure 3.3.

[Figure 3.1: plot omitted. Axes: R (ticks 0.7, 1, 1.5) and R0 (ticks 2, 5, 10, 15); one entry per data set: BE, DE, ES, EW, FI, IE, IL, IT('97), IT('04), LU, NL, PL, SK.]
Figure 3.1: Estimated basic and effective reproduction numbers with 95% bootstrap
percentile confidence intervals for constant (black), log-linear (gray) and extended
log-linear (light gray) proportionality factor. For each country, sizes of the dots are
proportional to Akaike weights, hence larger dots correspond to smaller AIC values.
The dotted horizontal line indicates the single eligible value for R under endemic
equilibrium, which is one.

Table 3.1: Estimates of the basic and effective reproduction numbers and transmission
parameters (γ0, γ1, γ2) with 95% bootstrap percentile confidence intervals and
corresponding AIC values for constant (CP), log-linear (LP) and extended log-linear (EP)
proportionality assumptions. Estimates for EP are obtained using a profile
likelihood-based assessment of model eligibility. Final models are indicated in bold.
Country   Model  R0                    R                  Mean age at infection (years)  γ0                    γ1                       γ2      AIC   Contact data
BE        CP     8.36 [6.46, 10.05]    1.00 [0.88, 1.34]  4.26 [3.74, 5.13]              -1.71 [-2.12, -1.66]                                   891   BE
          LP     6.40 [5.08, 8.47]     1.03 [0.88, 1.33]  3.98 [3.60, 4.54]              -1.49 [-1.92, -1.03]  -0.021 [-0.056, 0.003]           880
DE        CP     6.07 [4.70, 6.66]     0.97 [0.90, 1.23]  4.93 [4.72, 5.83]              -1.71 [-2.12, -1.71]                                   1168  DE
          LP     5.52 [4.65, 7.20]     0.97 [0.89, 1.21]  4.79 [4.84, 5.69]              -1.62 [-2.07, -1.50]  -0.007 [-0.014, 0.015]           1168
ES        CP     4.48 [3.86, 5.19]     1.04 [0.85, 1.42]  6.24 [5.25, 7.65]              -2.59 [-2.96, -2.63]                                   2051  IT
          LP     4.51 [3.94, 7.52]     1.04 [0.84, 1.37]  6.26 [6.04, 7.41]              -2.60 [-3.12, -2.43]  0.001 [-0.019, 0.036]            2053
EW        CP     2.83 [2.44, 3.06]     0.95 [0.94, 0.99]  11.96 [11.28, 14.70]           -2.45 [-2.76, -2.50]                                   3010  GB
          LP     2.75 [2.47, 2.95]     0.98 [0.91, 1.17]  11.05 [10.23, 14.46]           -1.53 [-1.87, -1.12]  -0.084 [-0.138, -0.050]          2831
FI        CP     4.89 [4.31, 5.80]     0.94 [0.87, 1.02]  5.19 [4.76, 5.91]              -1.90 [-2.12, -1.83]                                   682   FI
          LP     5.32 [4.47, 8.10]     0.93 [0.87, 0.99]  5.40 [5.02, 5.97]              -2.03 [-2.28, -1.78]  0.014 [-0.006, 0.035]            680
          EP     4.81 [4.13, 6.46]     0.93 [0.88, 1.00]  5.40 [5.02, 5.97]              -2.10 [-2.34, -1.84]  0.010 [-0.005, 0.035]    -0.008  680
IE        CP     4.97 [4.08, 5.32]     0.92 [0.88, 1.04]  5.32 [5.31, 6.64]              -1.83 [-2.21, -1.83]                                   1672  GB
          LP     3.85 [3.41, 4.29]     0.97 [0.88, 1.13]  6.22 [5.67, 7.57]              -1.25 [-1.76, -1.01]  -0.074 [-0.098, -0.023]          1576
IL        CP     11.93 [8.83, 14.34]   0.96 [0.88, 1.27]  5.00 [4.53, 6.27]              -1.39 [-1.77, -1.27]                                   789   BE
          LP     4.76 [4.23, 7.49]     1.05 [0.89, 1.33]  4.79 [4.37, 5.99]              -0.76 [-1.42, -0.35]  -0.069 [-0.112, -0.016]          729
IT ('97)  CP     3.85 [3.39, 4.32]     0.98 [0.88, 1.35]  8.50 [8.21, 9.92]              -2.86 [-3.24, -2.94]                                   2033  IT
          LP     4.37 [3.61, 6.45]     0.95 [0.89, 1.19]  8.77 [8.62, 10.06]             -3.16 [-3.56, -3.01]  0.022 [0.005, 0.042]             2000
IT ('04)  CP     3.99 [3.45, 4.65]     0.98 [0.88, 1.42]  8.22 [7.63, 9.57]              -2.81 [-3.22, -2.92]                                   1194  IT
          LP     4.15 [3.63, 5.30]     0.96 [0.88, 1.30]  8.45 [8.16, 9.76]              -2.98 [-3.44, -2.90]  0.011 [0.002, 0.034]             1190
LU        CP     7.28 [6.04, 8.89]     0.86 [0.83, 0.93]  4.33 [3.91, 5.30]              -1.97 [-2.30, -1.90]                                   561   LU
          LP     6.67 [5.75, 8.63]     0.87 [0.81, 0.95]  3.92 [3.63, 4.73]              -1.60 [-2.02, -1.32]  -0.028 [-0.049, 0.002]           550
          EP     4.99 [4.23, 6.07]     0.89 [0.82, 1.00]  3.88 [3.58, 4.70]              -1.47 [-1.86, -1.18]  -0.030 [-0.047, 0.005]   -0.052  550
NL        CP     8.47 [5.74, 14.18]    0.89 [0.72, 1.40]  3.40 [2.95, 4.73]              -1.77 [-2.39, -1.61]                                   400   NL
          LP     7.60 [5.71, 12.87]    1.00 [0.69, 1.55]  2.77 [2.68, 3.89]              -1.13 [-2.03, -0.32]  -0.064 [-0.167, -0.003]          359
PL        CP     3.75 [3.16, 4.57]     0.93 [0.90, 0.99]  10.34 [8.35, 12.30]            -2.56 [-2.86, -2.47]                                   1724  PL
          LP     3.37 [2.93, 4.19]     0.94 [0.86, 1.07]  9.53 [7.76, 11.49]             -1.63 [-2.17, -1.29]  -0.075 [-0.113, -0.025]          1599
SL        CP     5.62 [4.68, 6.27]     0.90 [0.85, 1.04]  6.11 [5.93, 7.08]              -2.12 [-2.47, -2.12]                                   1239  PL
          LP     5.49 [4.77, 7.83]     0.90 [0.85, 1.00]  6.08 [6.10, 7.00]              -2.13 [-2.57, -1.93]  -0.003 [-0.015, 0.022]           1241
[Figure 3.2: plots omitted. Panels: Finland, Luxembourg. Horizontal axis: γ2 (−0.30 to 0.00); left vertical axis: R0; right vertical axis: R.]
Figure 3.2: Profile likelihood estimates of R0 (left axis) and R (right axis) as a function
of γ2, the parameter related to infectiousness, for Finland and Luxembourg.
[Figure 3.3: plots omitted. Panels: Finland, Luxembourg. Horizontal axis: γ2; vertical axis: R with 95% confidence interval.]
Figure 3.3: Profile likelihood estimates of R (dots) with interpolated 95% bootstrap
percentile confidence intervals (dashed lines) as a function of γ2, the parameter related to
infectiousness, for Finland and Luxembourg. The vertical dotted line indicates the value of
γ2 for which the upper confidence limit of R equals 1 (horizontal dotted line).
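The step of locating the value of γ2 at which the upper confidence limit of R equals 1 can be sketched as a linear interpolation on the grid. This is only an illustration: the function name `max_gamma2_with_upper_limit_at_one` is hypothetical, linear interpolation between grid points is an assumption, and the grid values below are made up for the example rather than taken from the bootstrap results.

```python
def max_gamma2_with_upper_limit_at_one(grid, upper_limits, target=1.0):
    """Largest gamma2 at which the upper 95% limit of R still reaches `target`.

    grid         -- gamma2 values in increasing order
    upper_limits -- upper confidence limits of R at those grid values
    Returns None if the upper limit stays below `target` on the whole grid.
    """
    if upper_limits[-1] >= target:
        return grid[-1]
    # Scan from the largest gamma2 downwards for the first crossing of `target`.
    for i in range(len(grid) - 1, 0, -1):
        u_left, u_right = upper_limits[i - 1], upper_limits[i]
        if u_right < target <= u_left:
            fraction = (u_left - target) / (u_left - u_right)
            return grid[i - 1] + fraction * (grid[i] - grid[i - 1])
    return None


# Hypothetical grid and upper confidence limits (illustration only).
gamma2_grid = [-0.10, -0.075, -0.05, -0.025, -0.01, 0.0]
upper_ci = [1.10, 1.06, 1.03, 1.01, 0.998, 0.99]
gamma2_max = max_gamma2_with_upper_limit_at_one(gamma2_grid, upper_ci)
print(gamma2_max)
```

In a full analysis one would also verify that the lower confidence limit does not exceed 1 at the selected value.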
The parameter estimates and confidence intervals for the extended log-linear model
based on these maximal values of γ2 are also displayed in Figure 3.1. We observe
the following: for Finland, the extended model has an improved fit compared to
the constant model and conforms to the endemic equilibrium assumption. For
Luxembourg, only the extended model is eligible, and in addition, it has the lowest
AIC value. Note that the estimate of R0 for Luxembourg decreases considerably.
The estimated seroprevalence curves based on the selected model for each country are
presented in Figures 3.4 and 3.5. The fitted seroprofiles show a similar pattern across
countries, with most infections occurring during early childhood and the estimated
prevalence approaching one as age increases. However, the prevalence does not reach
one in all countries and, for example, Italy has a more distinctive profile. Looking
at the FOI curves, the largest estimate is observed in The Netherlands (0.57 year⁻¹)
at the age of 5, followed by Luxembourg (0.49 year⁻¹). The largest estimate of R0
is obtained for The Netherlands (7.60) and the lowest for England and Wales (2.75).
For 11 of the 13 data sets, R0 is estimated below 6.
3.2 Elucidating Potential Risk Factors
There is considerable variation in estimated basic reproduction numbers, and hence
in transmissibility, among the countries under consideration. To address these
differences, a selection of 39 relevant country-specific variables was made, comprising
data on demography, childcare, population density and weather (Table 3.2). To
investigate associations between R0 and these variables, two different non-parametric
approaches are considered.
3.2.1 Maximal Information Coefficient
The maximal information coefficient (MIC) (Reshef et al., 2011) is a measure of two-
variable dependence, designed specifically for rapid exploration of high-dimensional
data sets. The MIC is part of a larger family of maximal information-based non-
parametric exploration statistics, which can be used not only to identify important
relationships in data sets but also to characterize them. The MIC is defined in the
following way: let G denote an x-by-y grid on the scatterplot of the two variables
under consideration for a pair of integers (x, y). Let I_G denote the mutual information
of the probability distribution induced on the boxes of G, where the probability
of a box is proportional to the number of data points falling inside the box.
Now, define the characteristic matrix M = (m_{x,y}) as follows: the (x, y)-th entry m_{x,y} equals
max(I_G)/ log(min(x, y)), where the maximum is taken over all x-by-y grids G. MIC
is the maximum of m_{x,y} over ordered pairs (x, y) such that xy < B, where B is a
function of sample size. Default is B = n^{0.6}, with n denoting sample size.

[Figure 3.4: plots omitted. Panels: Belgium, Germany, Spain, England and Wales, Finland, Ireland. Horizontal axis: age (0 to 80 years); left vertical axis: prevalence (0 to 1); right vertical axis: force of infection (1/years).]
Figure 3.4: Observed age-specific VZV seroprevalence (dots) and the profile estimated
from the final model selected for each country (solid line). The corresponding force of
infection estimates are displayed by the lower solid line.

[Figure 3.5: plots omitted. Panels: Israel, Italy (IT('97) and IT('04)), Luxembourg, The Netherlands, Poland, Slovakia. Horizontal axis: age (0 to 80 years); left vertical axis: prevalence (0 to 1); right vertical axis: force of infection (1/years).]
Figure 3.5: Observed age-specific VZV seroprevalence (dots) and the profile estimated
from the final model selected for each country (solid line). The corresponding force of
infection estimates are displayed by the lower solid line.

Table 3.2: Selected set of potential risk factors for varicella. Data sources and
missingness are indicated. Reference years were chosen to be as close to the year of
serological data collection as possible, conditional on availability.
ID  Variable description                                                                      Source      Missingness (out of 13)
1   Proportion of population age 0-4 years (%)                                                Eurostat    1
2   Probability of dying before age 5 (per 1000 live births)                                  WHO         0
3   Enrollment rates of children 0-2 years in formal care or early education services (%)     OECD        0
4   Enrollment rates of children 3-5 years in formal care or early education services (%)     OECD        0
5   Population living in an overcrowded household (%)                                         Eurostat    1
6   Average absolute humidity (mean dew point temperature in °C)                              Mathematica 0
7   Standard deviation of absolute humidity (mean dew point temperature in °C)                Mathematica 0
8   Children 0-3 years that receive no form of formal care (%)                                Eurostat    1
9   Children 0-3 years that are cared for by only their parents (%)                           Eurostat    1
10  Women aged 25-49 years with at least one child aged 0-5 years who are employed (%)        Eurostat    1
11  Women per men (number of women per 100 men)                                               Eurostat    1
12  Educational level attainment upper secondary (% of population 25-64)                      Eurostat    1
13  Unmet medical needs (% of population)                                                     Eurostat    1
14  Inequality of income distribution (ratio of 20% highest income and 20% lowest income)     Eurostat    1
15  Unemployment (% of economically active population)                                        Eurostat    0
16  General practitioners (per 100,000 inhabitants)                                           WHO         1
17  Educational level school expectancy: expected years of education over a lifetime (years)  Eurostat    1
18  Population aged 65 years and above (%)                                                    Eurostat    0
19  Poverty rate (% of population that are at risk of poverty after social transfers)         Eurostat    1
20  Average population density (per km²)                                                      Eurostat    0
21  Average household size                                                                    Eurostat    0
22  GDP/capita at purchasing power standard                                                   Eurostat    1
23  Birth rate (number of births per 1,000 inhabitants)                                       Eurostat    0
24  Living area (average m² per person)                                                       Eurostat    4
25  Population aged 0-14 years (%)                                                            WHO         0
26  Urban population (%)                                                                      WHO         0
27  UNDP: index measuring average achievement in 3 dimensions of human development            WHO         0
28  Average number of people per room in occupied housing unit                                WHO         1
29  Doctors' consultations (number per capita per year)                                       OECD        2
30  Children immunized for DTP (%)                                                            OECD        1
31  65+ population vaccinated against influenza (%)                                           OECD        2
32  Public expenditure on prevention and public health (% of total health expenditure)        OECD        2
33  Infants vaccinated against invasive disease due to Haemophilus influenzae type b (%)      WHO         0
34  Infants vaccinated against mumps (%)                                                      WHO         1
35  Infants vaccinated against pertussis (%)                                                  WHO         0
36  Infants vaccinated against rubella (%)                                                    WHO         0
37  Breast feeding at 3 months (% of infants)                                                 WHO         6
38  Newborn babies with birth weight >2.5 kg (%)                                              WHO         3
39  Total health expenditure (% of GDP)                                                       World Bank  0
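To make the definition of the MIC in Section 3.2.1 concrete, the sketch below evaluates it by brute force: every admissible grid is enumerated and the normalized mutual information is maximized. This is not the heuristic search used in practice (Reshef et al., 2011); it is an exhaustive illustration that is feasible only for very small n. The function names are hypothetical, and we assume that both grid dimensions are at least 2. Under that assumption, with n = 12 or n = 13 the default bound B = n^0.6 (about 4.4 and 4.7, respectively) admits only 2-by-2 grids.

```python
import math
from itertools import combinations


def mutual_information(counts, n):
    """Mutual information (natural logarithm) of a table of joint counts."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c:
                mi += (c / n) * math.log(c * n / (row_totals[i] * col_totals[j]))
    return mi


def candidate_cuts(values):
    """Midpoints between consecutive distinct values: all useful cut positions."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def bin_index(value, cuts):
    """Index of the grid cell (along one axis) that contains `value`."""
    return sum(value > c for c in cuts)


def mic_brute_force(xs, ys, exponent=0.6):
    """Exhaustive MIC for tiny samples: maximize I_G / log(min(x, y)) over all
    x-by-y grids with x, y >= 2 (assumption) and x * y < n ** exponent."""
    n = len(xs)
    bound = n ** exponent
    x_cuts, y_cuts = candidate_cuts(xs), candidate_cuts(ys)
    best = 0.0
    for nx in range(2, len(x_cuts) + 2):
        for ny in range(2, len(y_cuts) + 2):
            if nx * ny >= bound:
                continue
            for cx in combinations(x_cuts, nx - 1):
                for cy in combinations(y_cuts, ny - 1):
                    counts = [[0] * ny for _ in range(nx)]
                    for x, y in zip(xs, ys):
                        counts[bin_index(x, cx)][bin_index(y, cy)] += 1
                    score = mutual_information(counts, n) / math.log(min(nx, ny))
                    best = max(best, score)
    return best


# Hypothetical data: 12 observations with a perfectly monotone relation.
xs = list(range(12))
ys = [2.0 * v for v in xs]
mic_value = mic_brute_force(xs, ys)
print(mic_value)  # close to 1.0: a 2-by-2 grid splits the points 6/6 on the diagonal
```

With so few observations, the MIC essentially measures how well a single 2-by-2 partition separates the data, which should be kept in mind when interpreting values equal to 1.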
3.2.2 Random Forest Approach
Secondly, a random forest approach for regression is used (Breiman, 2001), which is
a class of ensemble methods - methods that generate many classifiers and aggregate
their results - specifically designed for classification and regression trees. Each tree is
constructed using a different bootstrap sample of the data and each node is split using
the best among a subset of predictors randomly chosen at each node. The random
forest algorithm for regression is as follows:
1. Draw ntree bootstrap samples from the original data.
2. For each of the bootstrap samples, grow an unpruned regression tree by sampling
mtry of the predictors and choosing the best split from these variables.
3. Predict new data by averaging the predictions of the ntree trees.
An estimate of the error rate can be obtained as follows: at each bootstrap iteration,
predict the data not in the bootstrap sample (“out-of-bag” or OOB data) using the
tree grown with the bootstrap sample. Average the OOB predictions and calculate
the corresponding error rate. This is called the OOB estimate of error rate.
The “mean of squared residuals” is computed as
\[ \mathrm{MSE}_{\mathrm{OOB}} = n^{-1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i^{\mathrm{OOB}} \right)^2, \]
where \hat{y}_i^{\mathrm{OOB}} is the average of the OOB predictions for the ith observation.
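The three algorithm steps and the OOB mean of squared residuals can be sketched in plain Python as follows. This is a didactic illustration, not the implementation of the randomForest package: the function names are hypothetical, trees are grown until nodes are pure, splits minimize the residual sum of squares, and the data are made up. Observations that are never out-of-bag are left out of the average.

```python
import random


def best_split(rows, features):
    """Best (rss, feature, threshold) among `features`, or None if no split exists."""
    def rss(ys):
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    best = None
    for f in features:
        values = sorted({x[f] for x, _ in rows})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            score = (rss([y for x, y in rows if x[f] <= t])
                     + rss([y for x, y in rows if x[f] > t]))
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(rows, mtry, rng):
    """Unpruned regression tree; mtry predictors are sampled at every node."""
    ys = [y for _, y in rows]
    split = None
    if len(set(ys)) > 1:
        split = best_split(rows, rng.sample(range(len(rows[0][0])), mtry))
    if split is None:
        return sum(ys) / len(ys)  # leaf: mean response in the node
    _, f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t, grow_tree(left, mtry, rng), grow_tree(right, mtry, rng))


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def random_forest_oob(X, y, ntree=100, mtry=1, seed=1):
    """Grow ntree trees on bootstrap samples and return (forest, MSE_OOB)."""
    rng = random.Random(seed)
    n = len(X)
    oob_sum, oob_count = [0.0] * n, [0] * n
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        tree = grow_tree([(X[i], y[i]) for i in idx], mtry, rng)
        forest.append(tree)
        for i in set(range(n)) - set(idx):                # out-of-bag observations
            oob_sum[i] += predict_tree(tree, X[i])
            oob_count[i] += 1
    used = [i for i in range(n) if oob_count[i] > 0]
    mse_oob = sum((y[i] - oob_sum[i] / oob_count[i]) ** 2 for i in used) / len(used)
    return forest, mse_oob


# Hypothetical data: the response depends on the first predictor only.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
forest, mse_oob = random_forest_oob(X, y, ntree=100, mtry=2, seed=1)
print(mse_oob)
```

Predicting new data then amounts to averaging `predict_tree` over all trees in `forest`.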
Two variable importance measures are then defined as follows. The “mean decrease
in accuracy” is computed from permuting the OOB data. For each tree,
MSE_OOB is recorded and the same is done after permuting each predictor variable.
The differences between the two are then averaged over all trees and normalized by
the standard deviation of the differences. The second measure, “mean decrease in
node purity”, is the total decrease in node impurities from splitting on the variable,
averaged over all trees. For regression, the node impurity is measured by the residual
sum of squares.
Compared to many other classifiers, the random forest performs very well and is
robust against overfitting (Breiman, 2001). In addition, it has only two parameters -
the number of variables in the random subset at each node and the number of trees
in the forest - and is usually not very sensitive to their values. We use the random
forest algorithm from the randomForest package in R with the default number of
trees (500). The number of split variables is selected such that the highest percentage
explained variance is obtained. The package produces two measures of importance
of the predictor variables: “mean decrease in accuracy” and “mean decrease in node
purity”.
3.2.3 Results
Table 3.5 contains the pairs of potential risk factors with the strongest correlation
given by the Spearman correlation coefficient. These correlations can be used to
interpret the relation between R0 and certain factors.
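For reference, the Spearman correlation coefficient is the Pearson correlation of the ranks of the two variables. The minimal sketch below uses average ranks for ties; the function names and the data are hypothetical and serve only as an illustration.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values of a risk factor and of R0 for five countries.
factor = [1, 2, 3, 4, 5]
r0 = [10, 9, 7, 8, 1]
rho = spearman(factor, r0)
print(rho)  # approximately -0.9 for these made-up values
```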
The ten factors with the largest MIC of association with R0 are presented in
Table 3.3 together with the corresponding Spearman correlation coefficients. The sign
of ρS indicates, for example, that the higher the inequality of income, the lower R0.
Table 3.3: Ten factors with the largest MIC value of association with R0, estimated from
the final model selected for each country, and corresponding Spearman correlation
coefficients ρS .
MIC ρS
1. inequality of income distribution 1.0 -0.64
2. poverty rate 1.0 -0.73
3. % infants vaccinated against mumps 0.65 0.64
4. average square meter living area pp 0.59 0.42
5. % breast feeding at 3 months 0.47 -0.21
6. % employed women 25 - 49 (min. 1 child 0 - 5) 0.46 0.38
7. % infants vaccinated against pertussis 0.38 0.46
8. % infants vaccinated against rubella 0.36 0.51
9. % population aged 0-14 0.32 -0.22
10. total health expenditure 0.32 0.51
Results of the random forest analysis of R0 are summarized in Table 3.4 where the
ten highest scoring factors for both importance measures are given. Comparing the
results of both analyses, we observe that factors related to the distribution of wealth
(inequality of income and poverty rate), vaccination coverage in infants (e.g. mumps
vaccination coverage) and child care attendance (e.g. the percentage of infants that
receive no formal care) seem to be associated with the transmissibility of VZV.
Table 3.4: Ten best scoring factors obtained by a random forest analysis of R0, estimated
from the final selected model for each country, and corresponding Spearman correlation
coefficients ρS .
% increase in MSE ρS Increase in node purity ρS
1. inequality of income distribution -0.64 inequality of income distribution -0.64
2. poverty rate -0.73 poverty rate -0.73
3. total health expenditure 0.51 average population density 0.33
4. % 0-2 that receive no formal care -0.29 % 0-2 that receive no formal care -0.29
5. % infants vaccinated against mumps 0.64 unmet medical needs -0.31
6. % population aged 0-14 -0.22 total health expenditure 0.51
7. % employed women (min. 1 child 0 - 5) 0.38 enrollment rates children 0-2 0.15
8. average square meter living area pp 0.42 average square meter living area pp 0.42
9. average population density 0.33 % 65+ vaccinated against influenza -0.19
10. enrollment rates children 0-2 0.15 % infants vaccinated against mumps 0.64
3.3 Sensitivity Analysis
3.3.1 Contact Data
The choice of contact data for countries that do not have contact data was based
on geographical grounds or school enrollment ages. As a result, data from Italy,
Belgium, England and Wales and Poland are used for Spain, Israel, Ireland and
Slovakia, respectively. However, besides school-based contacts, the contact rates
will also depend on e.g. the number of people living in a household. Therefore, we
compared the average household size for these countries at the time of serological
data collection in Table 3.6. We can see that there is a considerable difference when
comparing Belgium to Israel and even Ireland to England and Wales. However, when
selecting other contact data we cannot expect to have complete agreement on every
relevant factor. For this reason, we performed a sensitivity analysis exploring the
impact of the contact matrix by repeating the estimation with the basic log-linear
Table 3.5: Pairs of potential risk factors with the largest absolute Spearman correlation
coefficient. High scoring factors according to MIC and random forest are indicated in bold.
ID ID ρS
1 Proportion 0-4 25 Proportion 0-14 0.96
36 % infants vaccinated against rubella 34 % infants vaccinated against mumps 0.95
19 Poverty rate 14 Inequality of income distribution 0.95
22 GDP 15 Unemployment -0.91
3 Enrollment rates children 0-2 9 % 0-3 cared for only by their parents -0.90
24 Average m2 living area pp 22 GDP 0.87
1 Proportion 0-4 23 Birth rate 0.86
25 Proportion 0-14 18 Proportion 65+ -0.84
37 Breast feeding at 3 months 20 Average population density -0.82
3 Enrollment rates children 0-2 9 % 0-2 not receiving formal care -0.82
25 Proportion 0-14 23 Birth rate 0.81
13 Unmet medical needs 10 % employed women (min. 1 child 0-5) -0.80
4 Enrollment rates children 3-5 18 Proportion 65+ 0.80
36 % infants vaccinated against rubella 33 % infants vaccinated against hib 0.80
3 Enrollment rates children 0-2 13 Unmet medical needs -0.79
5 Overcrowding 15 Unemployment 0.79
24 Average m2 living area pp 11 Women per men -0.78
15 Unemployment 10 % employed women (min. 1 child 0-5) -0.78
37 Breast feeding at 3 months 31 % 65+ vaccinated against influenza -0.77
24 Average m2 living area pp 15 Unemployment -0.77
27 Human Development Index 13 Unmet medical needs -0.76
23 Birth rate 13 Unmet medical needs -0.76
27 Human Development Index 22 GDP 0.76
9 % 0-2 not receiving formal care 7 SD humidity 0.75
3 Enrollment rates children 0-2 10 % employed women (min. 1 child 0-5) 0.75
26 Urban population 20 Average population density 0.75
31 % 65+ vaccinated against influenza 7 SD humidity -0.75
11 Women per men 10 % employed women (min. 1 child 0-5) -0.74
5 Overcrowding 10 % employed women (min. 1 child 0-5) -0.74
13 Unmet medical needs 8 % 0-3 not receiving formal care 0.73
1 Proportion 0-4 18 Proportion 65+ -0.73
20 Average population density 6 Mean humidity 0.73
37 Breast feeding at 3 months 30 % children immunized for DTP -0.72
3 Enrollment rates children 0-2 15 Unemployment -0.72
4 Enrollment rates children 3-5 32 Public expenditure public health -0.72
24 Average m2 living area pp 10 % employed women (min. 1 child 0-5) 0.72
35 % infants vaccinated against pertussis 19 Poverty rate -0.72
26 Urban population 9 % 0-3 cared for only by their parents -0.71
27 Human Development Index 15 Unemployment -0.71
21 Average household size 17 Educational level school expectancy -0.71
23 Birth rate 10 % employed women (min. 1 child 0-5) 0.70
3 Enrollment rates children 0-2 7 SD humidity -0.70
model using contact data from the other seven countries available in the POLYMOD
study. Doing so, it was possible to identify for every country the contact data that
gave the best fit to the serological data. In Table 3.7 the estimated basic reproduction
numbers are compared with the estimates obtained by using the “minimal AIC
contact data”. We observe that the effect on R0 is small, except for Luxembourg,
Belgium, Finland and Israel where the difference is larger, but still within reasonable
bounds.
Table 3.6: Comparison of the average household size at time of serological data collection.
Country HH size Country HH size
Spain 2.8 Italy 2.5
Israel 3.4 Belgium 2.4
Ireland 3.0 England and Wales 2.3
Slovakia 3.1 Poland 3.0
3.3.2 Risk Factors
Table 3.8 and Table 3.9 show the results obtained for the MIC and random forest
approach, respectively, when repeating the risk factor analysis based on the “minimal
AIC contact data”. We can conclude that the risk factor analysis is quite robust to
changes in the contact matrix, as the most important influential factors do not change.
3.3.3 Perturbations of the Demographic and Endemic Equilibrium
To explore the effect of perturbations in the demographic and endemic equilibrium,
we model the transmission dynamics of VZV in Belgium based on an age-time SIR
model using a RAS-model (described in Section 1.3.1.2). For the force of infection,
a dynamic model using the social contact approach is considered. Further, we use
time-homogeneous but age-dependent mortality rates µi estimated from mortality
data (Eurostat) and a constant number of newborns. All other parameters (e.g. N,
L, D) used in this simulation are equal to the ones in the primary analysis. The
proportionality factor and force of infection are based on the estimates obtained for
Belgium under the log-linear proportionality assumption.
Table 3.7: Estimated basic and effective reproduction numbers with 95% bootstrap
percentile confidence intervals and corresponding AIC values for the log-linear model based
on contact data minimizing AIC.
Country R0 R AIC Contact data
Belgium 7.79 [6.21, 14.67] 1.06 [0.82, 1.44] 877 IT
Germany 5.32 [4.69, 6.24] 0.87 [0.84, 0.93] 1156 LU
Spain 4.13 [3.63, 4.89] 0.89 [0.86, 0.93] 2031 LU
England and Wales 3.16 [2.71, 3.72] 1.06 [0.88, 1.45] 2824 IT
Finland 6.34 [5.01, 23.95] 0.88 [0.79, 1.27] 675 NL
Ireland 3.94 [3.54, 4.97] 1.04 [0.90, 1.33] 1576 BE
Israel 5.94 [5.07, 11.56] 1.08 [0.83, 1.42] 721 IT
Italy (1997) 4.77 [3.55, 6.34] 0.97 [0.93, 1.08] 1987 DE
Italy (2004) 4.15 [3.63, 5.30] 0.96 [0.88, 1.30] 1190 IT
Luxembourg 7.92 [6.55, 11.65] 1.10 [0.82, 1.88] 550 IT
The Netherlands 7.91 [6.29, 10.23] 0.91 [0.79, 1.02] 357 LU
Poland 3.37 [2.93, 4.19] 0.94 [0.86, 1.07] 1599 PL
Slovakia 5.49 [4.77, 7.83] 0.90 [0.85, 1.00] 1241 PL
Table 3.8: Ten factors with the largest MIC value of association with R0, estimated from
the log-linear model using the minimal AIC contact data, and corresponding Spearman
correlation coefficient, ρS .
MIC ρS
1. inequality of income distribution 1.0 -0.70
2. poverty rate 1.0 -0.76
3. average square meter living area pp 0.59 0.67
4. birth rate 0.50 0.22
5. % employed women 25 - 49 (min. 1 child 0 - 5) 0.46 0.49
6. unmet medical needs 0.46 -0.41
7. % infants vaccinated against mumps 0.46 0.56
8. % 65+ vaccinated against influenza 0.44 -0.32
9. women per men 0.41 -0.30
10. human development index 0.36 0.13
Table 3.9: Ten best scoring factors obtained by a random forest analysis of R0, estimated
from the log-linear model using the minimal AIC contact data, and corresponding
Spearman correlation coefficient, ρS .
% increase in MSE ρS Increase in node purity ρS
1. inequality of income distribution -0.70 % employed women (min. 1 child 0 - 5) 0.49
2. poverty rate -0.76 poverty rate -0.76
3. % employed women (min. 1 child 0 - 5) 0.49 inequality of income distribution -0.70
4. unmet medical needs -0.42 unmet medical needs -0.42
5. average square meter living area pp 0.67 urban population 0.37
6. % 0-2 that receive no formal care -0.24 enrollment rates children 0-2 0.23
7. average population density 0.29 % 0-2 that receive no formal care -0.24
8. women per men -0.30 average square meter living area pp 0.67
9. urban population 0.37 GDP/capita 0.42
10. standard deviation of absolute humidity 0.19 % infants vaccinated against hib 0.55
We first run the RAS-model until endemic equilibrium is reached. Afterwards, a
demographic change or vaccination strategy is applied. Seroprevalence data are then
sampled from the resulting prevalence and we repeat our procedure to estimate
R0 and R. We include a simple vaccination strategy by putting S(0) = (1 − p)B
and R(0) = pB for p = 0.4 and p = 0.6. This corresponds to p × 100% of newborns being
instantaneously immunized at birth. Simulated demographic changes follow from
increasing or decreasing the number of births per year while keeping mortality rates
fixed. The obtained results are summarized in Table 3.10.
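The two ingredients of this simulation step, the split of newborns under vaccination and the sampling of serological data, can be sketched as follows. This is an illustration only: the function names are hypothetical, the prevalence values are made up, and binomial sampling with a fixed sample size per age class is our assumption about how the seroprevalence data are drawn.

```python
import random


def newborn_state(births, p):
    """Split B newborns into S(0) = (1 - p) * B susceptible and R(0) = p * B immune."""
    return (1 - p) * births, p * births


def sample_serology(prevalence_by_age, n_per_age, seed=0):
    """Number of seropositives per age class, drawn as independent binomial counts
    (assumed sampling scheme) with n_per_age individuals tested in each class."""
    rng = random.Random(seed)
    return [sum(rng.random() < prevalence for _ in range(n_per_age))
            for prevalence in prevalence_by_age]


# Hypothetical numbers: 1000 births per year, 40% immunized at birth.
susceptible0, immune0 = newborn_state(1000, 0.4)
print(susceptible0, immune0)

# Hypothetical equilibrium prevalence for three age classes, 50 samples each.
counts = sample_serology([0.0, 0.5, 1.0], n_per_age=50)
print(counts)
```

The sampled counts would then be passed to the same estimation procedure as the observed serology.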
Table 3.10: Estimates of the basic and effective reproduction number when implementing
a vaccination strategy or changing the birth rate.
Model R0 R
Vaccination 40% 4.97 [4.26, 6.16] 1.06 [0.92, 1.32]
Vaccination 60% 3.77 [3.38, 4.70] 1.15 [1.00, 1.32]
Birth +0.25% 8.39 [6.47, 15.38] 1.11 [0.94, 2.20]
Birth −0.25% 4.75 [4.03, 5.69] 1.02 [0.89, 1.34]
Birth −0.5% 3.13 [2.75, 3.49] 0.99 [0.92, 1.31]
We observe that the estimate of R increases when a proportion of the newborns is
vaccinated and when the annual number of births increases. It decreases when the
annual number of births decreases.
3.4 Discussion
In this chapter, we investigated the transmissibility of VZV in 12 European countries
using serological survey data and social contact data. We contrasted the social
contact hypothesis, which is currently the most widely used approach in the literature,
against an approach reflecting differences in characteristics related to susceptibility
and infectivity. Furthermore, we introduced the effective reproduction number as a
model eligibility criterion and we identified which country-specific socio-demographic
factors are important in explaining differences in transmission potential between
European countries using two non-parametric approaches: the maximal information
coefficient and random forest.
The social contact hypothesis provided a good fit to the VZV seroprevalence
for only 2 out of 12 countries. The other countries benefited from an extended
approach by assuming an age-dependent proportionality factor, which supports and
extends earlier findings of Goeyvaerts et al. (2010) for VZV in Belgium. This may
reflect the additional importance of age-specific characteristics related to susceptibil-
ity and infectiousness, such as the mean infectious period. Furthermore, the social
contact data are used as proxies for events by which an infection is transmitted.
Hence, the proportionality factor can also be considered as an age-specific adjustment
factor relating the true contact rates underlying infection to the social contact
proxies. Alternatively, social contact data are difficult to collect from young children, with
parents filling out the diary on their behalf. It may well be that they consistently
underestimate the true number of contacts that young children make.
Our analysis directly improves upon the original analysis of the ESEN2 data
on VZV by Nardone et al. (2007) who used the traditional Anderson and May
approach by imposing a 3-parameter structure on the WAIFW matrix (Anderson
and May, 1991). Our method of using R as a model eligibility criterion extends
the approach of Goeyvaerts et al. (2010) by addressing the indeterminacy of the
infectivity parameter. Our results complement those of Melegaro et al. (2011)
who analyzed part of the VZV serology using the social contact hypothesis only.
Comparing the estimated R0 values, we notice that our results in general somewhat
differ from the estimates obtained by Nardone et al. (2007) and Melegaro et al.
(2011). This is not unexpected, since there are differences in methodology and it is
known that transmission assumptions have a large impact on the estimation of R0.
See Table 3.11 for a comparative overview of the results.
Table 3.11: Ranges of estimates of the basic reproduction numbers obtained by
Santermans et al., Nardone et al. and Melegaro et al. Nardone et al. used a WAIFW
matrix approach for three age groups, whereas Melegaro et al. used the social contact
hypothesis for different stratifications of POLYMOD contact data.
Country     Santermans et al.   Nardone et al.   Melegaro et al.
BE          6.40 - 8.36         6.47             5.47 - 8.75
DE          5.52 - 6.07         5.46
ES          4.48 - 4.51         3.91
EW          2.75 - 2.83         3.83             3.66 - 5.11
FI          4.81 - 5.32         4.85             4.71 - 8.44
IE          3.85 - 4.97         5.22
IL          4.76 - 11.93        7.71
IT (’97)    3.85 - 4.37         3.31             3.98 - 4.64
IT (’04)    3.99 - 4.15
LU          4.99 - 7.28         8.28
NL          7.60 - 8.47         16.91
PL          3.37 - 3.75                          5.27 - 7.5
SL          5.49 - 5.62         5.72
The results in Figure 3.1 indicate that there are substantial epidemiological differ-
ences between European countries. This is important to consider when parametrizing
mathematical models. Childhood vaccination coverage (for different vaccines),
child care attendance, population density and average living area per person were
positively associated with R0, whereas income inequality, poverty, breast feeding,
and the proportion of children under 14 years of age showed negative associations.
While it seems intuitively logical that greater child care attendance and population
density lead to more rapid spread of varicella, other associations are more difficult
to interpret. Less poverty and income inequality, and higher vaccination coverages
may be associated with more affluent societies in which women are more likely to
be employed and children have more universal access to childcare and kindergarten
from an early age on, facilitating the spread of VZV.
In our analyses, we relied on a few assumptions. First of all, we assumed
that the serological status of an individual is a direct measure of his/her current
immunity against VZV (Plotkin, 2010). Further, we considered physical contacts
lasting longer than 15 minutes to be a good proxy for potential varicella transmission
events as shown by Goeyvaerts et al. (2010) for Belgium. Finally, our use of R as
a model eligibility criterion relied on the assumption of endemic equilibrium. This
assumption is supported by the similarity in the results obtained for the two samples
of Italy. In addition most surveys span two seasons, which partly captures any
seasonal fluctuation. However, there are many factors that can cause changes in
the age distribution of VZV cases over time, e.g. changes in demography, medical
practice, socio-cultural factors etc. We performed a sensitivity analysis to give us a
sense of the way R changes when demographic or endemic equilibrium is perturbed.
Looking at this more rigorously requires an additional in-depth analysis which is the
topic of future research.
Since direct inference for the infectivity parameter is hindered by the lack of
information regarding infectiousness in the serological data, we estimated this
parameter via indirect inference using the effective reproduction number. This
indeterminacy illustrates that the use of social contact data does not completely
resolve the identifiability issues encountered when estimating mixing patterns from
serological data. Hence, further research is necessary to obtain additional knowledge
about the age-specific susceptibility and infectivity profiles in order to inform the
proportionality factor in this social contact approach.
Chapter 4
Structural Differences in Mixing Behaviour Informing the Role of Asymptomatic Infection and Testing Symptom Heritability
In the absence of effective vaccines or treatment, controlling the spread of an
infectious disease during the early stages of an outbreak relies on (i) isolation of
symptomatic cases and (ii) tracing and quarantining the contacts of these cases.
Hence, the timing of onset of symptoms relative to the start of infectiousness is a
crucial factor in the success of these public health interventions. It has been shown
that the proportion of asymptomatic infections (i.e. transmission that occurs before
symptom onset or without showing symptoms at all) is a key parameter to predict
whether or not isolation and contact tracing will lead to containment (Fraser et al.,
2004). It is therefore important to use an epidemic model that explicitly takes into
account asymptomatic transmission. However, in many cases the available data are
based on observations of symptomatic individuals only. To overcome this limitation,
models often rely on untestable assumptions, e.g. assuming a fixed proportion of
asymptomatic individuals (Inaba and Nishiura, 2008) or ignoring pre-symptomatic
transmission (Ejima et al., 2013).
70 Chapter 4. Differences in Mixing Behaviour and Symptom Heritability
Data on social contacts of individuals in a population have already proven to
be a valuable additional source of information when estimating the ‘Who Acquires
Infection From Whom’ (WAIFW) matrix and the basic reproduction number R0 (see
Section 1.3.2.1). More recently, social contact data have also been used to gain insight
into the impact of illness on social contact patterns (Eames et al., 2010). It was found
that individuals symptomatic with influenza-like illness (ILI) have fewer social contacts
than asymptomatic individuals. Furthermore, the age distribution of contacts
differs between symptomatic and asymptomatic cases. These differences in mixing
behavior affect the expected distribution of infection during the early stages of an
outbreak, which allowed Van Kerckhove et al. (2013) to estimate the proportion of ILI
infections caused by asymptomatic cases (34%; CI: 0% - 77%) from ILI incidence data.
Influenza viruses are highly infectious and cases can show a variety of symp-
toms such as fever, runny nose and sore throat. A substantial number of cases also
show little to no apparent symptoms. Several challenge studies have looked at the
dynamics of viral shedding and symptoms following influenza virus infections; for
a review see Carrat et al. (2008). Symptomatic cases are considered to be more
infectious than asymptomatic cases, since it was found that clinical cases have a
higher quantity of virus in nasal wash fluids compared to individuals who did not
develop symptoms. In addition, a positive correlation was found between severity
of illness and the mean quantity of virus. The link between administered dose and
development or degree of symptoms is less clear. Carrat et al. (2008) reported
a negative correlation between inoculated dose and fever, whereas Huang et al.
(2011) did not find a dependency between inoculated dose and disease outcome.
Their findings point to host factors leading to asymptomatic infections. Hence, it is
clear that more research is needed to find the precise link between the amount and
duration of viral shedding, the development and the degree of symptoms and the
transmission of the virus.
In this chapter we will extend the work of Van Kerckhove et al. (2013) by in-
corporating social contact data from asymptomatic and symptomatic individuals
(Section 2.2.2) to inform mixing patterns in a compartmental model described by a
system of ordinary differential equations (Santermans et al., in revision). We will
illustrate inference on parameters related to asymptomatic infection using incidence
data on influenza-like illness (Section 2.1.2). Furthermore, we will also investigate the
possibility that the chance of developing ILI symptoms depends on whether infection
came from a symptomatic or an asymptomatic case. This chapter is organized as
follows. In Section 4.1, we introduce the model structure and estimation procedure.
In Section 4.2, the ILI data are analyzed, the impact of control strategies is discussed
in Section 4.3, and, lastly, Section 4.4 summarizes our main results, conclusions, and
avenues for further research.
4.1 Methodology
Two transmission models are proposed to describe the disease dynamics for influenza
and infections with similar disease progress. In the first model individuals either
develop symptoms or not after a pre-symptomatic stage. We will refer to this model
as the non-preferential transmission model, since the development of symptoms is
independent of the status of the infector. We extend this model by keeping track of
whether a susceptible individual is infected by an asymptomatic or a symptomatic
case. This second model will be referred to as the preferential transmission model.
In Sections 4.1.1 and 4.1.2 the models and underlying structure are introduced. The
estimation procedure is described in Section 4.1.3.
4.1.1 Transmission Models
In Figure 4.1 the non-preferential model is presented in a flow diagram. Note that
superscripts indicate clinical status of the infected individual: symptomatic ‘s’ or
asymptomatic ‘a’.
Figure 4.1: Schematic diagram of the non-preferential transmission model. Superscripts
indicate presence (s) or absence (a) of symptoms.
Hence, we assume that susceptible individuals are infected at rate λ(t). Following
infection, individuals enter the exposed compartment ($E$) in which they are infected
but not yet infectious. After a mean latent period $1/\gamma$ individuals become asymptomatic
infectious, entering the compartment $I^a_1$. We define $\phi$, $0 \le \phi \le 1$, to be the
proportion of cases that will develop symptoms, and $1 - \phi$ the proportion of cases
that will remain asymptomatic. Infectious individuals move from the asymptomatic
compartment $I^a_1$ to the symptomatic $I^s$ or asymptomatic $I^a_2$ compartments at rates
$\phi \times \theta$ and $(1 - \phi) \times \theta$, respectively. Finally, asymptomatic and symptomatic
individuals recover and move to the recovered compartment ($R$) at rates $\sigma^a$ and $\sigma^s$,
respectively. The corresponding system of ordinary differential equations (ODEs) is given by
\begin{align*}
\frac{dS(t)}{dt} &= -\lambda S(t), \\
\frac{dE(t)}{dt} &= \lambda S(t) - \gamma E(t), \\
\frac{dI^a_1(t)}{dt} &= \gamma E(t) - \theta I^a_1(t), \\
\frac{dI^a_2(t)}{dt} &= (1 - \phi)\theta I^a_1(t) - \sigma^a I^a_2(t), \\
\frac{dI^s(t)}{dt} &= \phi\theta I^a_1(t) - \sigma^s I^s(t), \\
\frac{dR(t)}{dt} &= \sigma^a I^a_2(t) + \sigma^s I^s(t).
\end{align*}
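To make the flow between compartments concrete, the non-preferential system can be integrated numerically. The following is a minimal sketch in Python for a single age group, not the implementation used in this thesis. The form of the force of infection, the fixed-step Runge-Kutta solver, the function names and all parameter values are assumptions chosen for illustration only.

```python
from typing import Callable, List

State = List[float]  # [S, E, Ia1, Ia2, Is, R], here as population proportions


def np_rhs(y: State, beta_a: float, beta_s: float, gamma: float,
           theta: float, sigma_a: float, sigma_s: float, phi: float) -> State:
    """Right-hand side of the non-preferential system for one age group."""
    S, E, Ia1, Ia2, Is, R = y
    # Assumed force of infection: asymptomatic and symptomatic cases
    # transmit at (hypothetical) rates beta_a and beta_s.
    lam = beta_a * (Ia1 + Ia2) + beta_s * Is
    return [
        -lam * S,
        lam * S - gamma * E,
        gamma * E - theta * Ia1,
        (1.0 - phi) * theta * Ia1 - sigma_a * Ia2,
        phi * theta * Ia1 - sigma_s * Is,
        sigma_a * Ia2 + sigma_s * Is,
    ]


def rk4(f: Callable[[State], State], y0: State, dt: float,
        n_steps: int) -> List[State]:
    """Classical fourth-order Runge-Kutta integration with a fixed step."""
    y = list(y0)
    path = [list(y0)]
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = f([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = f([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt / 6.0 * (b + 2.0 * c + 2.0 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        path.append(y)
    return path


# Illustrative values only (time in days); none are estimates from the thesis.
params = dict(beta_a=0.5, beta_s=1.0, gamma=1 / 1.5, theta=2.0,
              sigma_a=2.0, sigma_s=0.2, phi=0.3)
y0 = [0.99, 0.0, 0.0, 0.0, 0.01, 0.0]
path = rk4(lambda y: np_rhs(y, **params), y0, dt=0.05, n_steps=2000)
final = path[-1]
```

Because the right-hand sides sum to zero, the total population is conserved along the trajectory, which is a convenient sanity check for any implementation of the system.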
In contrast, the preferential model differentiates between infection caused by an
asymptomatic case (at rate λa(t)) and by a symptomatic case (at rate λs(t)),
respectively. Figure 4.2 shows a schematic diagram of this model.
Figure 4.2: Schematic diagram of the preferential transmission model. Superscripts
indicate clinical status of the infected individual: symptomatic (s) or asymptomatic (a).
Subscripts indicate whether the infector was symptomatic (s) or asymptomatic (a).
If the infector is asymptomatic, the infected individual will move from $S$ to $E_a$; if the
infector is symptomatic, the infected individual will move to $E_s$. Next, cases become
asymptomatic infectious at rate $\gamma$ and move to $I^a_a$ or $I^a_s$. Infected individuals then
either develop symptoms or remain asymptomatic. We define $\phi_a$ as the probability
that an individual infected by an asymptomatic case remains asymptomatic and $\phi_s$ as
the probability that an individual infected by a symptomatic case develops symptoms.
Note that we assume the length of the incubation period to be independent of the
infector type, so that we set $\gamma_a = \gamma_s = \gamma$ and $\theta_a = \theta_s = \theta$. Under this assumption,
the preferential model simplifies to the non-preferential model if $\phi_s = 1 - \phi_a$. The
system of ordinary differential equations (ODEs) for the preferential model is given by
\begin{align*}
\frac{dS(t)}{dt} &= -(\lambda_a + \lambda_s) S(t), \\
\frac{dE_a(t)}{dt} &= \lambda_a S(t) - \gamma_a E_a(t), \\
\frac{dE_s(t)}{dt} &= \lambda_s S(t) - \gamma_s E_s(t), \\
\frac{dI^a_a(t)}{dt} &= \gamma_a E_a(t) - \theta_a I^a_a(t), \\
\frac{dI^a_s(t)}{dt} &= \gamma_s E_s(t) - \theta_s I^a_s(t), \\
\frac{dI^a(t)}{dt} &= \phi_a \theta_a I^a_a(t) + (1 - \phi_s)\theta_s I^a_s(t) - \sigma^a I^a(t), \\
\frac{dI^s(t)}{dt} &= (1 - \phi_a)\theta_a I^a_a(t) + \phi_s \theta_s I^a_s(t) - \sigma^s I^s(t), \\
\frac{dR(t)}{dt} &= \sigma^a I^a(t) + \sigma^s I^s(t).
\end{align*}
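The reduction of the preferential model to the non-preferential model can be checked numerically. The sketch below, in Python, codes both right-hand sides for a single age group and verifies that, with a common latent rate and a common rate of leaving the pre-symptomatic stage, the aggregated preferential derivatives coincide with the non-preferential ones when the probability of developing symptoms does not depend on the infector. The forces of infection are passed in as given numbers, and all names and values are hypothetical.

```python
from typing import List


def nonpref_rhs(y: List[float], lam: float, gamma: float, theta: float,
                sigma_a: float, sigma_s: float, phi: float) -> List[float]:
    """Non-preferential model, state y = [S, E, Ia1, Ia2, Is, R]."""
    S, E, Ia1, Ia2, Is, R = y
    return [
        -lam * S,
        lam * S - gamma * E,
        gamma * E - theta * Ia1,
        (1.0 - phi) * theta * Ia1 - sigma_a * Ia2,
        phi * theta * Ia1 - sigma_s * Is,
        sigma_a * Ia2 + sigma_s * Is,
    ]


def pref_rhs(y: List[float], lam_a: float, lam_s: float, gamma: float,
             theta: float, sigma_a: float, sigma_s: float,
             phi_a: float, phi_s: float) -> List[float]:
    """Preferential model, state y = [S, Ea, Es, Iaa, Ias, Ia, Is, R].

    Uses gamma_a = gamma_s = gamma and theta_a = theta_s = theta.
    """
    S, Ea, Es, Iaa, Ias, Ia, Is, R = y
    return [
        -(lam_a + lam_s) * S,
        lam_a * S - gamma * Ea,
        lam_s * S - gamma * Es,
        gamma * Ea - theta * Iaa,
        gamma * Es - theta * Ias,
        phi_a * theta * Iaa + (1.0 - phi_s) * theta * Ias - sigma_a * Ia,
        (1.0 - phi_a) * theta * Iaa + phi_s * theta * Ias - sigma_s * Is,
        sigma_a * Ia + sigma_s * Is,
    ]


# Hypothetical state and rates, for illustration only.
phi = 0.3
rates = dict(gamma=1 / 1.5, theta=2.0, sigma_a=2.0, sigma_s=0.2)
lam_a, lam_s = 0.04, 0.06
y_pref = [0.80, 0.03, 0.02, 0.01, 0.02, 0.04, 0.03, 0.05]

d_pref = pref_rhs(y_pref, lam_a, lam_s, phi_a=1.0 - phi, phi_s=phi, **rates)

# Aggregate the infector-specific compartments: E = Ea + Es, Ia1 = Iaa + Ias.
S, Ea, Es, Iaa, Ias, Ia, Is, R = y_pref
y_np = [S, Ea + Es, Iaa + Ias, Ia, Is, R]
d_np = nonpref_rhs(y_np, lam_a + lam_s, phi=phi, **rates)

d_pref_aggregated = [d_pref[0], d_pref[1] + d_pref[2], d_pref[3] + d_pref[4],
                     d_pref[5], d_pref[6], d_pref[7]]
max_gap = max(abs(a - b) for a, b in zip(d_pref_aggregated, d_np))
```

With these inputs `max_gap` is zero up to rounding error, whereas choosing the second probability different from one minus the first yields different flows into the asymptomatic and symptomatic compartments.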
4.1.2 Age Structure and Social Contacts
Consider a population that is divided into $K$ age categories. The age-specific force of
infection $\lambda(k, t)$, i.e. the rate at which a susceptible person in age group $k$ acquires
infection at time $t$, is given by (discretized version of (1.10) in Section 1.3.2.1):
\[
\lambda(k, t) = \sum_{k'=0}^{K} \beta(k, k')\, I(k', t),
\]
where $I(k', t)$ denotes the total number of infectious individuals in age group $k'$ at time
$t$. Further, we follow the social contact hypothesis (1.12) described in Section 1.3.2.1
in which we distinguish between the asymptomatic social contact matrix $C^a$ and
the symptomatic social contact matrix $C^s$. Hence, $c^a(k, k')$ is the per capita rate
at which an asymptomatic individual in age group $k'$ makes contact with a person
in age group $k$. We allow different proportionality factors for asymptomatic and
symptomatic individuals and denote them by $q^a$ and $q^s$, respectively. Hence,
\begin{align*}
\beta^a(k, k') &= q^a \cdot c^a(k, k'), \\
\beta^s(k, k') &= q^s \cdot c^s(k, k'),
\end{align*}
with $\beta^a$ and $\beta^s$ the transmission rates of asymptomatic and symptomatic cases,
respectively. Lastly, define $q^r = q^s / q^a$ as the relative infectiousness of symptomatic cases
versus asymptomatic cases. Then, the force of infection for the non-preferential
transmission model is defined as
\[
\lambda_{K \times 1}(t) = \beta^a_{K \times K} \times \left( I^a_{1, K \times 1}(t) + I^a_{2, K \times 1}(t) \right) + \beta^s_{K \times K} \times I^s_{K \times 1}(t),
\]
where $\times$ denotes matrix multiplication. For the preferential transmission model the
rates at which a susceptible individual acquires infection from an asymptomatic or
symptomatic individual at time $t$ are given by
\begin{align*}
\lambda_{a, K \times 1}(t) &= \beta^a_{K \times K} \times \left( I^a_{a, K \times 1}(t) + I^a_{s, K \times 1}(t) + I^a_{K \times 1}(t) \right), \\
\lambda_{s, K \times 1}(t) &= \beta^s_{K \times K} \times I^s_{K \times 1}(t).
\end{align*}
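As an illustration of the force-of-infection expressions above, the following Python sketch builds the transmission matrices from the contact matrices and the proportionality factors and evaluates the force of infection for both models. The two-group contact matrices and the numbers of infectious individuals are invented for the example, and the helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def scale(q: float, C: Matrix) -> Matrix:
    """Transmission matrix beta = q * C under the social contact hypothesis."""
    return [[q * c for c in row] for row in C]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of vectors of equal length."""
    return [sum(values) for values in zip(*vectors)]


def foi_nonpreferential(q_a: float, q_r: float, C_a: Matrix, C_s: Matrix,
                        Ia1: Vector, Ia2: Vector, Is: Vector) -> Vector:
    beta_a = scale(q_a, C_a)
    beta_s = scale(q_r * q_a, C_s)  # q_s = q_r * q_a
    return add(matvec(beta_a, add(Ia1, Ia2)), matvec(beta_s, Is))


def foi_preferential(q_a: float, q_r: float, C_a: Matrix, C_s: Matrix,
                     Iaa: Vector, Ias: Vector, Ia: Vector,
                     Is: Vector) -> Tuple[Vector, Vector]:
    beta_a = scale(q_a, C_a)
    beta_s = scale(q_r * q_a, C_s)
    lam_a = matvec(beta_a, add(Iaa, Ias, Ia))  # infection by asymptomatic cases
    lam_s = matvec(beta_s, Is)                 # infection by symptomatic cases
    return lam_a, lam_s


# Hypothetical two-age-group example.
C_a = [[2.0, 1.0], [1.0, 3.0]]
C_s = [[1.0, 0.5], [0.5, 1.0]]
lam_np = foi_nonpreferential(0.1, 2.0, C_a, C_s,
                             Ia1=[1.0, 0.0], Ia2=[0.0, 1.0], Is=[2.0, 0.0])
lam_a, lam_s = foi_preferential(0.1, 2.0, C_a, C_s,
                                Iaa=[1.0, 0.0], Ias=[0.0, 0.0],
                                Ia=[0.0, 1.0], Is=[2.0, 0.0])
lam_total = add(lam_a, lam_s)
```

In this example the same infectious individuals are distributed over the compartments of both models, so the total force of infection agrees between them.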
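The total force of infection is then $\lambda_{K \times 1}(t) = \lambda_{a, K \times 1}(t) + \lambda_{s, K \times 1}(t)$. The
reproduction numbers for these models can be derived using the next-generation approach
(Section 1.3.2.2 and Diekmann et al. (1990)). For the non-preferential model the next
generation matrix corresponding with the infected states $(E, I^a_1, I^a_2, I^s)$ is given by
\[
G_{NP} =
\begin{pmatrix}
\frac{\beta^a \Delta S^\top}{\theta} + \frac{(1-\phi)\beta^a \Delta S^\top}{\sigma^a} + \frac{\phi\beta^s \Delta S^\top}{\sigma^s} &
\frac{\beta^a \Delta S^\top}{\theta} + \frac{(1-\phi)\beta^a \Delta S^\top}{\sigma^a} + \frac{\phi\beta^s \Delta S^\top}{\sigma^s} &
\frac{\beta^a \Delta S^\top}{\sigma^a} &
\frac{\beta^s \Delta S^\top}{\sigma^s} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.
\]
Therefore the reproduction number for the non-preferential model is given by
$R_{NP} = \max(\mathrm{eigenvalue}(G_{NP}))$:
\[
R_{NP} = \max\left(\mathrm{eigenvalue}\left(
\frac{\beta^a \Delta S^\top}{\theta} + \frac{(1-\phi)\beta^a \Delta S^\top}{\sigma^a} + \frac{\phi\beta^s \Delta S^\top}{\sigma^s}
\right)\right),
\]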
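where $A_{c \times d} \Delta B_{c \times 1}$ operates by multiplying the $i$th row of $A$ with the $i$th element
of $B$. The next generation matrix for the preferential model corresponding with the
infected states $(E_a, E_s, I^a_a, I^a_s, I^a, I^s)$ is given by
\[
G_{P} =
\begin{pmatrix}
\frac{\beta^a \Delta S^\top}{\theta} + \frac{\phi_a\beta^a \Delta S^\top}{\sigma^a} &
\frac{\beta^a \Delta S^\top}{\theta} + \frac{(1-\phi_s)\beta^a \Delta S^\top}{\sigma^a} &
\frac{\beta^a \Delta S^\top}{\theta} + \frac{\phi_a\beta^a \Delta S^\top}{\sigma^a} &
\frac{\beta^a \Delta S^\top}{\theta} + \frac{(1-\phi_s)\beta^a \Delta S^\top}{\sigma^a} &
\frac{\beta^a \Delta S^\top}{\sigma^a} &
0 \\
\frac{(1-\phi_a)\beta^s \Delta S^\top}{\sigma^s} &
\frac{\phi_s\beta^s \Delta S^\top}{\sigma^s} &
\frac{(1-\phi_a)\beta^s \Delta S^\top}{\sigma^s} &
\frac{\phi_s\beta^s \Delta S^\top}{\sigma^s} &
0 &
\frac{\beta^s \Delta S^\top}{\sigma^s} \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]
Therefore the reproduction number is given by $R_P = \max(\mathrm{eigenvalue}(G_P))$.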
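The two reproduction numbers can be evaluated with a few lines of code. The sketch below is in Python and uses power iteration as a simple stand-in for a general eigenvalue routine; it assumes non-negative matrices with a unique dominant eigenvalue. Since only the block rows belonging to the exposed states are non-zero, the non-zero eigenvalues of the next generation matrices are those of the corresponding upper-left blocks, so the sketch works with a K×K matrix for the non-preferential model and a 2K×2K matrix for the preferential model. All function names and input values are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def row_scale(A: Matrix, s: Vector) -> Matrix:
    """The Delta operator: multiply the i-th row of A by the i-th element of s."""
    return [[a * s_i for a in row] for row, s_i in zip(A, s)]


def lin_comb(terms: Sequence[Tuple[float, Matrix]]) -> Matrix:
    """Sum of weight * matrix over (weight, matrix) pairs of equal shape."""
    n, m = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(w * M[i][j] for w, M in terms) for j in range(m)]
            for i in range(n)]


def dominant_eigenvalue(M: Matrix, n_iter: int = 10_000,
                        tol: float = 1e-13) -> float:
    """Power iteration with max-norm scaling (assumes a non-negative matrix)."""
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(a * b for a, b in zip(row, v)) for row in M]
        new = max(abs(x) for x in w)
        if new == 0.0:
            return 0.0
        v = [x / new for x in w]
        if abs(new - lam) < tol:
            return new
        lam = new
    return lam


def r_nonpreferential(beta_a: Matrix, beta_s: Matrix, S: Vector, theta: float,
                      sigma_a: float, sigma_s: float, phi: float) -> float:
    Ba, Bs = row_scale(beta_a, S), row_scale(beta_s, S)
    M = lin_comb([(1 / theta + (1 - phi) / sigma_a, Ba), (phi / sigma_s, Bs)])
    return dominant_eigenvalue(M)


def r_preferential(beta_a: Matrix, beta_s: Matrix, S: Vector, theta: float,
                   sigma_a: float, sigma_s: float,
                   phi_a: float, phi_s: float) -> float:
    Ba, Bs = row_scale(beta_a, S), row_scale(beta_s, S)
    K = len(S)
    # Blocks for the exposed states (E_a, E_s); the other rows of G_P are zero.
    aa = lin_comb([(1 / theta + phi_a / sigma_a, Ba)])
    as_ = lin_comb([(1 / theta + (1 - phi_s) / sigma_a, Ba)])
    sa = lin_comb([((1 - phi_a) / sigma_s, Bs)])
    ss = lin_comb([(phi_s / sigma_s, Bs)])
    M = [aa[i] + as_[i] for i in range(K)] + [sa[i] + ss[i] for i in range(K)]
    return dominant_eigenvalue(M)


# Hypothetical two-age-group inputs.
beta_a = [[0.2, 0.1], [0.1, 0.3]]
beta_s = [[0.4, 0.2], [0.2, 0.4]]
S = [0.9, 0.8]
R_np = r_nonpreferential(beta_a, beta_s, S, 2.0, 2.0, 0.2, phi=0.3)
R_p = r_preferential(beta_a, beta_s, S, 2.0, 2.0, 0.2, phi_a=0.7, phi_s=0.3)
```

When the second probability equals one minus the first, as in this example, the two reproduction numbers coincide, in line with the reduction of the preferential model to the non-preferential model.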
4.1.3 Estimation Procedure
We divide the population into five age categories based on the age classes of the
incidence data at hand: 0-4, 5-14, 15-44, 45-65 and 65+. The data consist
of the reported number of new symptomatic cases per age group per week. We take into
account that not all ILI cases are reported via general practitioners and that these
under-reporting rates can differ by age.
We use a likelihood-based approach by assuming
\[
y_{k,j} \sim \mathrm{Po}\left( \rho_k \cdot \left( I^s_{\mathrm{new},k}(j) - I^s_{\mathrm{new},k}(j-1) \right) \right),
\]
where $y_{k,j}$ is the observed number of new cases in age group $k$ in week $j$. $I^s_{\mathrm{new},k}(t)$
is the expected cumulative number of new symptomatic cases in age group $k$ at
time $t$ obtained by solving $dI^s_{\mathrm{new}}(t)/dt = \phi \cdot \theta \cdot I^a_1(t)$ for the non-preferential model
and $dI^s_{\mathrm{new}}(t)/dt = (1 - \phi_a) \cdot \theta \cdot I^a_a(t) + \phi_s \cdot \theta \cdot I^a_s(t)$ in the preferential model. The
age-specific reporting rate of ILI cases is denoted by $\rho_k$ ($k = 1, \ldots, 5$).
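The Poisson likelihood above translates directly into a log-likelihood function. The following Python sketch assumes that the cumulative expected numbers of new symptomatic cases have already been obtained from the ODE solution and are stored per age group with one entry per week boundary; the data layout and the function name are assumptions made for the example.

```python
import math
from typing import List


def poisson_loglik(y: List[List[int]], cum_new_sympt: List[List[float]],
                   rho: List[float]) -> float:
    """Log-likelihood of weekly counts under the Poisson observation model.

    y[k][j-1]           observed new cases in age group k during week j
    cum_new_sympt[k][j] expected cumulative new symptomatic cases at the end
                        of week j (index 0 is the start of the first week)
    rho[k]              reporting rate of age group k
    """
    loglik = 0.0
    for k, counts in enumerate(y):
        for j in range(1, len(cum_new_sympt[k])):
            mu = rho[k] * (cum_new_sympt[k][j] - cum_new_sympt[k][j - 1])
            obs = counts[j - 1]
            if mu <= 0.0:
                if obs > 0:
                    return -math.inf  # observed cases where none are expected
                continue              # zero expected and zero observed
            loglik += obs * math.log(mu) - mu - math.lgamma(obs + 1)
    return loglik


# Hypothetical example: one age group, two weeks of data.
y_obs = [[3, 5]]
cum = [[0.0, 10.0, 40.0]]
ll = poisson_loglik(y_obs, cum, rho=[0.2])
```

Here the weekly expected reported counts are 0.2 × 10 = 2 and 0.2 × 30 = 6, so the log-likelihood is the sum of two Poisson log-probabilities with these means.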
The system of differential equations is initiated by taking into account the
pre-existing immunity to the pandemic strain S(0). Furthermore, since we observed
a large impact of the initial number of symptomatic cases Is(0), these five param-
eters (one for each age category) are included in the estimation procedure. The
number of asymptomatic cases at time 0 is assumed to be 0. The initial number
of recovered individuals is then R(0) = N−S(0)−Is(0), with N the population size.
Our aim is to estimate φ, φa, φs, qa, qr, ρk, and Isk(0)(k = 1, ..., 5). Other pa-
rameters are assumed known and were obtained from a literature review on influenza
transmission models by Dorjee et al. (2013). In this review, values were extracted
from studies that estimate (or use) mean, minimum and/or maximum of influenza
parameters. These were summarized into three categories: (1) estimated values,
where an article attempted to estimate parameters from empirical data; (2) referenced
values, where values were adopted from other papers; (3) assumed values, where
values were based on expert opinion or unpublished data sources. Parameters were
summarized as median and range of means, minimum and maximum values from the
reviewed articles. An overview of these parameters is given in Table 4.1. Note that
the subclinical and clinical infectious period refer to the period from infectiousness to
recovery for asymptomatic (1/θ + 1/σa) and symptomatic individuals (1/θ + 1/σs),
respectively. We will use the estimated values when available (median of means),
otherwise the referenced values are assumed to be known.
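The rates of the model follow from the period lengths in Table 4.1 by simple arithmetic. The Python sketch below shows one possible reading of the selection rule stated above (estimated median of means when available, otherwise the referenced value); the variable names are hypothetical and the resulting numbers are an illustration rather than values reported in this thesis.

```python
# Period lengths taken from Table 4.1, in the time unit used there.
incubation_period = 2.0   # Est.: 1/gamma + 1/theta
latent_period = 1.5       # Ref. (no estimated value): 1/gamma
subclinical_period = 1.0  # Ref. (no estimated value): 1/theta + 1/sigma_a
clinical_period = 5.6     # Est.: 1/theta + 1/sigma_s

# Mean time spent in the pre-symptomatic infectious stage, 1/theta.
presymptomatic_period = incubation_period - latent_period

gamma = 1.0 / latent_period
theta = 1.0 / presymptomatic_period
sigma_a = 1.0 / (subclinical_period - presymptomatic_period)
sigma_s = 1.0 / (clinical_period - presymptomatic_period)
```

Under this reading the pre-symptomatic infectious stage lasts half a time unit on average, after which asymptomatic cases remain infectious for a further 0.5 and symptomatic cases for a further 5.1 time units.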
Parameters are estimated via a Markov Chain Monte Carlo (MCMC) approach. This
procedure was performed using the LaplacesDemon package (Hall, 2011) in R 3.1.1
and R 3.2.2. A two-phase approach was used, where the first phase consists of the
Adaptive-Mixture Metropolis (AMM) algorithm to achieve stationary samples that
seem to have converged to the target distribution. In the second phase Random-Walk
Metropolis (RWM), a non-adaptive algorithm, is used to obtain final samples. In this
phase 10,000,000 iterations were conducted retaining every 1,000th iteration. The burn-in
period is based on the convergence diagnostic by Boone, Merrick and Krachey (BMK)
(Boone et al., 2012). The univariate prior distributions for all parameters are given in
Table 4.2. All of these are uninformative, except for the initial number of symptomatic
cases. The number of symptomatic cases at time 0 in age class k is assumed to follow
a truncated normal distribution, with mean µk and standard deviation δk based on
the observed ILI incidence for age class k in week 22. To ensure that the estimates
lie within their proper parameter space, logit transformations are applied for φ, φa,
φs and ρk (k = 1, ..., 5) and log transformations for qa and qr. Furthermore, since
symptomatic cases are considered to be more infectious than asymptomatic cases, the
infectiousness ratio qr is restricted to be larger than 1.
Table 4.1: An overview of parameters of pandemic influenza A/H1N1 2009 in humans
obtained from a literature review (Dorjee et al., 2013). These values were either estimated
from empirical data of experimental or observational studies (Est.); or referenced for
modeling (Ref.).
Parameter                               Median of          Median of          Median of
                                        means (range)      min. values        max. values
                                                           (range)            (range)
Incubation period               Est.    2.0                1.0 (1.0 - 2.0)    -
1/γ + 1/θ                       Ref.    2.0 (1.5 - 3.0)    1                  5
Latent period                   Est.    -                  -                  -
1/γ                             Ref.    1.5 (1 - 3.5)      0.9 (0.7 - 1.0)    4.0 (2.0 - 5.0)
Subclinical infectious period   Est.    -                  -                  -
1/θ + 1/σa                      Ref.    1.0 (0.5 - 2.5)    -                  2.0
Clinical infectious period      Est.    5.6                1.0                10.0 (8.0 - 12.0)
1/θ + 1/σs                      Ref.    3.8 (2.5 - 7.0)    3.8 (1.9 - 4.0)    5.5 (2.9 - 10)
Table 4.2: Prior distributions for the parameters in the preferential and non-preferential
model.
Parameter Prior distribution
Isk(0) N(µk, δk)(k = 1, ..., 5); truncated(0.1, 1000)
φ U(0, 1)
φa U(0.1, 1)
φs U(0, 0.95)
qa U(0, 10)
qr U(1, 10)
ρk U(0, 1)
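The role of the transformations and of thinning can be illustrated with a small random-walk Metropolis sampler. The sketch below is written in Python and is not the LaplacesDemon implementation used for the analysis: it samples a single proportion on the logit scale, with a uniform prior and a made-up binomial likelihood (30 events out of 100) standing in for the ODE-based likelihood. The sampler, its tuning values and the toy data are all assumptions for illustration.

```python
import math
import random
from typing import Callable, List


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def log_post_z(z: float) -> float:
    """Toy log-posterior of phi, expressed on the logit scale z = logit(phi)."""
    phi = inv_logit(z)
    if not 0.0 < phi < 1.0:
        return -math.inf
    log_lik = 30 * math.log(phi) + 70 * math.log(1.0 - phi)  # hypothetical data
    log_prior = 0.0                                          # U(0, 1) prior on phi
    log_jacobian = math.log(phi) + math.log(1.0 - phi)       # |d phi / d z|
    return log_lik + log_prior + log_jacobian


def random_walk_metropolis(log_post: Callable[[float], float], z0: float,
                           step: float, n_iter: int, thin: int,
                           burn_in: int, seed: int = 2024) -> List[float]:
    """Non-adaptive random-walk Metropolis with burn-in and thinning."""
    rng = random.Random(seed)
    z, lp = z0, log_post(z0)
    kept = []
    for i in range(1, n_iter + 1):
        z_new = z + rng.gauss(0.0, step)
        lp_new = log_post(z_new)
        # 1 - random() lies in (0, 1], which avoids log(0).
        if math.log(1.0 - rng.random()) < lp_new - lp:
            z, lp = z_new, lp_new
        if i > burn_in and (i - burn_in) % thin == 0:
            kept.append(z)
    return kept


draws_z = random_walk_metropolis(log_post_z, logit(0.5), step=0.4,
                                 n_iter=21_000, thin=10, burn_in=1_000)
draws_phi = [inv_logit(z) for z in draws_z]  # back-transform to (0, 1)
posterior_mean = sum(draws_phi) / len(draws_phi)

# Thinning as described in the text: 10,000,000 iterations, every 1,000th kept.
n_retained = 10_000_000 // 1_000
```

Sampling on the transformed scale keeps every proposal inside the parameter space, at the cost of the Jacobian term in the log-posterior. For positive parameters such as the proportionality factor, the same construction applies with a log transformation.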
4.2 Application to the Data
Using the social contact matrices and the ILI incidence data, we will look into the
estimation of the proportionality factor, qa, the infectiousness ratio qr, the reporting
rates, ρk(k = 1, ..., 5), the proportion of symptomatic infections, φ, for the non-
preferential model and the proportions φa and φs for the preferential model.
4.2.1 Exploratory Analyses
To assess the estimability of the reporting rates ρk(k = 1, ..., 5), we perform an ex-
ploratory analysis in which these reporting rates are replaced by one age-independent
reporting rate ρ. Parameter estimation in the non-preferential and preferential model
is then performed for fixed values of ρ at 0.3, 0.5, 1.0. The results are shown in
Tables 4.3 and 4.4.
Table 4.3: Posterior median, 95% posterior intervals and DIC value for the
non-preferential model for different values of the reporting rate ρ.
Parameter   ρ = 0.3                 ρ = 0.5                 ρ = 1.0
φ           0.34 (0.10, 0.90)       0.34 (0.083, 0.86)      0.35 (0.083, 0.90)
qa          0.060 (0.035, 0.079)    0.062 (0.039, 0.081)    0.062 (0.039, 0.080)
qr          2.79 (1.05, 8.90)       2.68 (1.04, 9.52)       2.67 (1.05, 9.18)
ρ1          0.3                     0.5                     1.0
ρ2          0.3                     0.5                     1.0
ρ3          0.3                     0.5                     1.0
ρ4          0.3                     0.5                     1.0
ρ5          0.3                     0.5                     1.0
R           1.47 (1.39, 1.56)       1.46 (1.39, 1.55)       1.46 (1.39, 1.55)
DIC         306.61                  307.05                  307.56
We conclude for both models that the value of the age-independent reporting
rate ρ does not affect model fit or other parameter estimates. Hence, it is only
possible to estimate the relative differences in reporting rates between age categories.
We therefore fix the reporting rate of one randomly chosen age category as the
reference category: ρ4 = 0.2. This value of 20% is based on a literature search for reporting
rates on ILI and influenza. Since no information on reporting rates was found
specifically for H1N1 in England, this search was conducted worldwide and included
seasonal influenza (e.g. Lunelli et al., 2013). However, since there is so little information
on under-reporting, we will only interpret the estimated relative differences.
Table 4.4: Posterior median, 95% posterior intervals and DIC value for the preferential
model for different values of the reporting rate ρ.
Parameter   ρ = 0.3                 ρ = 0.5                 ρ = 1.0
φa          0.98 (0.91, 1.00)       0.99 (0.92, 1.00)       0.99 (0.92, 1.00)
φs          0.28 (0.07, 0.64)       0.29 (0.07, 0.65)       0.25 (0.07, 0.63)
qa          0.11 (0.093, 0.12)      0.11 (0.092, 0.12)      0.11 (0.092, 0.12)
qr          2.32 (1.03, 8.97)       2.24 (1.04, 9.05)       2.53 (1.03, 9.35)
ρ1          0.3                     0.5                     1.0
ρ2          0.3                     0.5                     1.0
ρ3          0.3                     0.5                     1.0
ρ4          0.3                     0.5                     1.0
ρ5          0.3                     0.5                     1.0
R           1.45 (1.27, 1.62)       1.45 (1.26, 1.62)       1.44 (1.26, 1.61)
DIC         290.97                  291.78                  291.43
4.2.2 Results
Posterior medians and 95% posterior credible intervals for the estimated parameters
and R in the non-preferential model are shown in Table 4.5. Posterior and prior
distributions are plotted in Figure 4.3. Scatter plots are shown in Figure 4.4 and the
estimated number of symptomatic and asymptomatic cases over time are plotted in
Figure 4.5.
The posterior credible intervals for φ and qr are quite wide, indicating that it
is difficult to estimate these parameters from the data. This is confirmed by the
posterior density plots. A scatter plot of φ versus qr (Figure 4.4) shows a strong link
between both parameters, indicating that we can estimate either the proportion of
symptomatic cases or the relative infectiousness from the data at hand, but not both
simultaneously.
Table 4.5: Posterior median, 95% posterior credible intervals and DIC value for the
non-preferential and preferential model.
Parameter   Non-preferential        Preferential
φ           0.15 (0.04, 0.39)
φa                                  0.98 (0.89, 1.00)
φs                                  0.22 (0.062, 0.58)
qa          0.082 (0.069, 0.093)    0.10 (0.092, 0.12)
qr          2.76 (1.04, 9.21)       2.62 (1.04, 9.20)
ρ1          0.21 (0.18, 0.25)       0.20 (0.17, 0.24)
ρ2          0.20 (0.17, 0.22)       0.20 (0.18, 0.23)
ρ3          0.23 (0.20, 0.25)       0.21 (0.19, 0.23)
ρ4          0.2                     0.2
ρ5          0.15 (0.12, 0.19)       0.15 (0.12, 0.18)
R           1.36 (1.33, 1.40)       1.41 (1.23, 1.63)
DIC         298.75                  288.49
Figure 4.3: Prior and posterior distributions for the proportion of cases that develop
symptoms (φ), the proportionality factor for asymptomatic individuals (qa), the relative
infectiousness of symptomatic cases versus asymptomatic cases (qr) and the reporting rates
(ρi, i = 1, 2, 3, 5).
When only 15% of cases develop symptoms, symptomatic cases are estimated to be
about 2.76 times more infectious than asymptomatic cases. The reproduction number
is estimated to be 1.36. Lastly, relative to the reporting rate in the 45-65 years age
group, the reporting rate is estimated to be about 1.15 times as high in the 15-44 years
age group and about 0.75 times as high in the 65+ years age group. The estimated
incidence is shown in Figure 4.10.
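The relative reporting rates quoted above follow from dividing the posterior medians in Table 4.5 by the fixed reference value. A short Python sketch of this arithmetic for the non-preferential model is given below; the dictionary layout is an assumption made for the example, and ratios of posterior medians are used here only as a quick approximation.

```python
# Posterior medians of the reporting rates (non-preferential model, Table 4.5).
rho = {"0-4": 0.21, "5-14": 0.20, "15-44": 0.23, "45-65": 0.20, "65+": 0.15}
reference = "45-65"  # age class with the reporting rate fixed at 0.2

relative_rate = {age: round(value / rho[reference], 2)
                 for age, value in rho.items()}
```

Only these ratios are interpretable, since the absolute level of reporting is fixed by the choice of the reference value.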
Figure 4.4: Scatter plot of the proportion of cases that develop symptoms (φ), the
proportionality factor for asymptomatic individuals (qa) and the infectiousness ratio (qr).
Figure 4.5: Number of symptomatic (full line) and asymptomatic (dotted line) cases over
time for the five age categories assuming a 20% reporting rate in the 45− 65 age class for
the non-preferential model.
Results for the preferential model are presented in Table 4.5 and Figures 4.6-4.10.
As for the non-preferential model, we conclude that φs and qr are strongly
connected and cannot be estimated simultaneously from the data. When 22% of cases
infected by a symptomatic case develop symptoms, symptomatic cases are estimated
to be about 2.62 times more infectious than asymptomatic cases. Furthermore,
this model confirms that the reporting rate in the 65+ age class is lower than in
the other age categories. The reproduction number is estimated at 1.41. Lastly, this
model has a smaller DIC value and, hence, a better fit than the non-preferential model.
To check whether the preferential model simplifies to the non-preferential model
(φs = 1 − φa), the difference between φs and 1 − φa is calculated for each posterior
sample. The histogram of this difference is shown in Figure 4.9. The 95% credible
interval for the difference is [0.05, 0.55]. Since this interval excludes zero, the
preferential model does not simplify to the non-preferential model.
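The check described above amounts to computing a percentile interval of a derived quantity over the posterior samples. A minimal Python sketch is given below; the linear-interpolation quantile and the function names are assumptions, and the input samples are synthetic rather than the MCMC output of this chapter.

```python
import math
from typing import List, Tuple


def quantile(sorted_values: List[float], p: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    h = (len(sorted_values) - 1) * p
    lower = math.floor(h)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (h - lower) * (sorted_values[upper]
                                                 - sorted_values[lower])


def difference_interval(phi_s: List[float], phi_a: List[float],
                        level: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed interval for phi_s - (1 - phi_a), computed per sample."""
    diffs = sorted(s - (1.0 - a) for s, a in zip(phi_s, phi_a))
    alpha = (1.0 - level) / 2.0
    return quantile(diffs, alpha), quantile(diffs, 1.0 - alpha)


def excludes_zero(interval: Tuple[float, float]) -> bool:
    return interval[0] > 0.0 or interval[1] < 0.0


# Synthetic samples for illustration only.
phi_s_samples = [i / 100 for i in range(101)]
phi_a_samples = [1.0] * 101
interval = difference_interval(phi_s_samples, phi_a_samples)
```

Applied to the interval reported in the text, `excludes_zero((0.05, 0.55))` evaluates to true, which corresponds to the conclusion that the preferential model does not reduce to the non-preferential model.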
Figure 4.6: Prior and posterior distributions for the proportion of individuals infected by
a symptomatic case that develop symptoms (φs), the proportion of individuals infected by
an asymptomatic case that remain asymptomatic (φa), the proportionality factor for
asymptomatic individuals (qa), the relative infectiousness of symptomatic cases versus
asymptomatic cases (qr) and the reporting rates (ρi, i = 1, 2, 3, 5).
Figure 4.7: Scatter plot of the proportion of individuals infected by a symptomatic case
that develop symptoms (φs), the proportion of individuals infected by an asymptomatic
case that remain asymptomatic (φa), the proportionality factor for asymptomatic
individuals (qa) and the infectiousness ratio (qr).
Figure 4.8: Number of symptomatic (full line) and asymptomatic (dotted line) cases over
time for the five age categories assuming a 20% reporting rate in the 45− 65 age class for
the preferential model.
Figure 4.9: Histogram of MCMC samples for φs − (1− φa), with φs the proportion of
individuals infected by a symptomatic case that develop symptoms and φa the proportion
of individuals infected by an asymptomatic case that remain asymptomatic in the
preferential model.
4.3 Impact of Home Isolation
One of the possible interventions targeting symptomatic individuals is recommending
time off from work. Van Kerckhove et al. (2013) showed that contacts made
at home are not a proxy for contacts made when symptomatic. Therefore, we
assess the impact of individuals staying at home after symptom onset by assuming
that a proportion p of symptomatic individuals stays home immediately after
symptom onset. The contact matrix for symptomatic individuals Cs is replaced by
pCsh + (1 − p)Cs in which Csh is the contact matrix obtained from contacts made at
home by symptomatic individuals in the social contact survey. Hence, we assume
that these contact rates do not increase when individuals stay at home. The obtained
posterior parameter samples from the (non-)preferential model are used to solve the
system of ODEs associated with this isolation model for fixed values of p. This way
we can assess the impact of p on the difference in the number of (a)symptomatic cases.
Figure 4.10: Observed (grey bars) and estimated (connected dots) reported weekly
incidence for the five age categories. Full line and filled dots are the estimated
incidence for the non-preferential model, dotted line and open dots are the estimates
for the preferential model.
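The contact matrix used in the home isolation scenario is a convex combination of the two symptomatic contact matrices. The Python sketch below illustrates this step; the two-group matrices and the function name are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def isolation_contact_matrix(C_s: Matrix, C_sh: Matrix, p: float) -> Matrix:
    """Return p * C_sh + (1 - p) * C_s for a proportion p staying home."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return [[p * home + (1.0 - p) * usual
             for usual, home in zip(row_s, row_sh)]
            for row_s, row_sh in zip(C_s, C_sh)]


# Hypothetical contact matrices for symptomatic individuals.
C_s = [[4.0, 2.0], [2.0, 6.0]]    # all contacts when symptomatic
C_sh = [[2.0, 1.0], [1.0, 2.0]]   # contacts made at home when symptomatic
C_half = isolation_contact_matrix(C_s, C_sh, p=0.5)
```

Setting p to 0 recovers the original symptomatic contact matrix and setting p to 1 restricts symptomatic individuals to their home contacts.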
Figure 4.11 shows the reduction of cases when a proportion p of symptomatic
individuals stays home after symptom onset. As p increases, the reduction in cases
also increases. For the non-preferential model, there is no visible difference between
symptomatic and asymptomatic cases (not shown). Using the preferential model,
we do see a larger reduction in symptomatic cases compared to asymptomatic cases.
Note that the reduction of cases is larger according to the non-preferential model in
comparison with the preferential model.
Figure 4.11: Proportion of cases plotted against the proportion of symptomatic
individuals staying home immediately after symptom onset. Left panel: reduction in total
number of cases for the non-preferential model with 95% confidence intervals. Right panel:
reduction in the number of total, symptomatic and asymptomatic cases for the preferential
model.
4.4 Discussion
In this chapter, we inferred parameters for an epidemic model accounting for asymp-
tomatic transmission and age-dependent under-reporting based on weekly incidence
data and social contact data from symptomatic and asymptomatic individuals.
The differences in mixing behavior between these individuals affect the expected
age-distribution of infection during the early stages of an outbreak (Van Kerckhove
et al., 2013). This makes it possible to estimate parameters related to asymptomatic
infection using data on symptomatic cases only. Furthermore, we compared a simple
SEIR model with asymptomatic infection to a model in which the development of
symptoms depends on the status of the infector.
Using a Bayesian approach on ILI data from England and Wales during the
early stages of the 2009 epidemic (Public Health England, 2010), we showed that
it is possible to either estimate the proportion of symptomatic infections or the
relative infectiousness of symptomatic cases compared to asymptomatic cases in
the non-preferential model. Hence, when one has prior information on one of these
parameters, it is possible to estimate the other one from incidence data. Furthermore,
we found that the data support the preferential transmission hypothesis, i.e. the
development of ILI symptoms depends on whether one was infected by a symptomatic
or asymptomatic case. Both models show a significantly larger under-reporting rate
for people older than 65 years in comparison with 45− 65 year olds. This means that
the discrepancy between consultation rates and symptomatic illness rates is larger
for the elderly in comparison with the non-elderly adults, although consultation rates
in this last age category were found to be lower. Also note that the reporting rates
we estimate can possibly account for factors other than the propensity to visit a GP,
e.g. the ability to better fit the data because of working with a hidden layer (Azmon
et al., 2014). Lastly, we assessed the effect of symptomatic individuals staying at
home. Following the preferential transmission hypothesis, we found a reduction in
the total number of cases of 39% (30%, 45%) when 50% of symptomatic individuals stay
home immediately after symptom onset. If all symptomatic individuals stay home,
a reduction of 63% (53%, 70%) is observed. To assess more subtle scenarios of home
isolation, we will use individual-based models in future research.
Recently, Lin et al. (2016) explored the trade-off between contact rates and
infectiousness (i.e. decreasing contact rates and increasing infectiousness with
increasing symptom severity) using a model similar to our non-preferential model.
They found that R0 varies non-monotonically with symptom severity, implying that
certain interventions such as antivirals for influenza, can increase R0. Their research
highlights the importance of using empirical data describing the relation between
contact rates and symptom severity in epidemiological models.
The preferential model resembles the infector-dependent severity (IDS) model
described by Ball and Britton (2007). However, they assume a homogeneously mix-
ing population and do not estimate model parameters. They derived a threshold limit
theorem for their model and looked at the effect of vaccination. They showed that
in certain scenarios the proportion of mildly (asymptomatic in our setting) infected
individuals can increase with increasing vaccination coverage. This emphasizes the
practical importance of our model for a wide range of pathogens with different levels
of symptom severity.
One of the limitations of our approach is that the reporting rates are not es-
timable from the data. Hence, one can only infer on the relative differences in
reporting between age categories. To estimate the true number of cases, information
on the reporting rate in at least one age class is needed. Further, we assume constant
reporting rates over time, since we do not have knowledge about temporal changes in
reporting. Also, the obtained estimates rely on the values of the fixed parameters as
found in the literature. Changing these parameters will affect the estimated target
parameters. Lastly, we use social contact data and incidence data from A/H1N1pdm
in 2009, thus it is uncertain how our conclusions would apply to other influenza strains.
Future research is needed to clarify the exact role of acquired viral dose in the
development of influenza symptoms. Up until now, challenge studies have not given
clear results able to confirm or reject our preferential transmission hypothesis (Carrat
et al., 2008; Huang et al., 2011). Lastly, to extend this model for other diseases more
empirical data on how contact rates change with symptom severity are needed.
Chapter 5
Empirical Household Contact Networks: Challenging the Household Random Mixing Assumption
Households are crucial units in the epidemiology of airborne infectious diseases
such as influenza, smallpox and SARS. Relations between household members are
typically characterized by frequent and intimate contacts, allowing for rapid disease
spread within the household upon introduction of an infectious case. As stated by
Ferguson et al. (2006): “being a member of a household containing an influenza
case is in fact the largest single risk factor for being infected oneself” (Longini
et al., 1982; Cauchemez et al., 2004). Furthermore, households with children have
a bridging function allowing an infection to spread from schools to workplaces and
vice versa. Inference from household final-size data revealed that children play a
key role in bringing influenza infection into the household and in transmitting the
infection to other household members (Cauchemez et al., 2004). Households are the
most common transmission unit used in observational studies and in epidemic models.
Many epidemic models rely on the assumption of homogeneous mixing within
households. In early work by Longini and Koopman (1982); Becker (1989); Addy
et al. (1991), Reed-Frost type of models were used to estimate household and
community transmission parameters from household final size data, assuming a
constant probability of infection from the community. Ball et al. (1997) generalized
this to the so-called ‘households model’ with two levels of mixing, assuming random
mixing within households (local) and in the entire population (global), the latter
typically at a much lower rate. The analytical tractability of the households model
allowed the theoretical study of epidemic phenomena. This has led to the definition of
threshold parameters such as the reproduction number R∗, representing the average
number of households infected by a typical infected household in a totally susceptible
population (Ball et al., 1997; Ball and Neal, 2002). Meyers et al. (2005) used a
contact network model in an urban setting incorporating households as complete
networks (cliques) to explain the early epidemiology of SARS. Individual-based
simulation models of infectious disease spread incorporate detailed individual-level
information so as to mimic the demographic and social characteristics of a specific
population (e.g. Chao et al. (2010); Mniszewski et al. (2008); Grefenstette et al. (2013)).
These models sometimes incorporate more detailed structure in specific settings
such as schools and workplaces, but typically assume random mixing in households.
Studies that particularly highlight within-household transmission and control policies
targeting households can be found in Halloran et al. (2002) and Ferguson et al. (2006).
It has been argued that greater realism could be gained by considering differ-
ent household compositions and contact heterogeneity within households (Danon
et al., 2011). When contact heterogeneity between household members is ignored,
the contact network density equals the contact rate between two individuals in
a household and is a determinant for the within-household transmission rate
of airborne infectious diseases (Wallinga et al., 2006; Mossong et al., 2008b).
Until now there was no direct empirical evidence to support the assumption
of homogeneous mixing within households. Egocentric contact surveys entailed
partially observed within-household contact networks and only allowed for indirect
inference of the unobserved network links (Potter et al., 2011; Potter and Hens, 2013).
In this chapter, we describe the first social contact survey specifically designed
to study contact networks within households. This study enables us to empirically
assess the assumption of homogeneous mixing, e.g. by studying the effect of age and
gender on social distance within households. Furthermore, it provides an answer
to one of the key questions for inference on household models: how the density of
the contact network scales with the household size (Danon et al., 2011). Lastly,
this survey makes it possible to assess reporting quality for diary-reported contact
surveys by looking at symmetry in contact reporting. We use Exponential Random
Graph Models (ERGMs; introduced in Section 1.3.3) to gain insight into the factors
driving close contact between household members and to develop a plausible model
for within-household physical contact networks. We then compare these empirically
grounded ERGMs to the assumption of random mixing using stochastic simulations
of an epidemic in the mise en scene of the two-level mixing model. This work is
presented in Goeyvaerts et al. (to be submitted).
In Section 5.1, we provide some more details on the household contact survey
that was introduced in Section 2.2.4. Details on the ERGM, estimates obtained
for the household data and goodness-of-fit results are presented in Section 5.2. In
Section 5.3, we perform epidemic simulations to compare the assumption of random
mixing with the household model inferred in Section 5.2. Lastly, a discussion is given
in Section 5.4.
5.1 Household Contact Survey
We use data from the survey described in Section 2.2.4. Table 5.1 summarizes the
proportion of complete (i.e. fully connected) networks and the mean network density
for the within-household physical contact networks by household size, distinguishing
week from weekend days and regular from holiday periods. The network density
is defined as the ratio of the number of observed edges to the number of potential
edges.
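As a minimal illustration of these two summary measures, the sketch below computes the network density and checks completeness for a single household. The function names and the example household are hypothetical and serve only to make the definitions concrete; they are not part of the survey analysis.

```python
from itertools import combinations


def network_density(members, edges):
    """Ratio of the number of observed edges to the number of potential edges."""
    n = len(members)
    potential = n * (n - 1) // 2
    if potential == 0:
        return float("nan")  # density is undefined for a single-person household
    observed = {frozenset(e) for e in edges}  # undirected, duplicates ignored
    return len(observed) / potential


def is_complete(members, edges):
    """True if every pair of household members is connected."""
    observed = {frozenset(e) for e in edges}
    return all(frozenset(pair) in observed for pair in combinations(members, 2))


# Hypothetical household of size 4 in which father and child2 had no physical contact.
household = ["father", "mother", "child1", "child2"]
contacts = [("father", "mother"), ("father", "child1"), ("mother", "child1"),
            ("mother", "child2"), ("child1", "child2")]

density = network_density(household, contacts)  # 5 observed of 6 potential edges
complete = is_complete(household, contacts)
```

Averaging `density` and `complete` over households of the same size would give entries of the kind reported in Table 5.1.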
Overall, the type of day does not seem to have a large impact on the connectedness
within households. However, the data suggest some decrease in connectedness
with increasing household size, mainly on weekdays and during regular periods. For
2-parent households of size 4, the observed proportion of complete networks is 0.77
on weekdays and 0.85 on weekend days. Estimates inferred by Potter et al. (2011)
and Potter and Hens (2013) using Belgian egocentric contact data for the same type
of household, were smaller and ranged from 0.34 to 0.65. For the purpose of studying
household contacts, we consider our survey design an improvement upon the design
of the source data for those two studies (POLYMOD study, Mossong et al. (2008b)).
Our survey is quite similar to the POLYMOD design, but all household members
are recruited to take part in the survey and participants have to identify whether
each contacted person is a member of his/her household. In the two aforementioned
network analyses of POLYMOD data, a reported contact was assumed to be with
a household member if it occurred at home, was reported as daily or almost daily
and if the age matched one of the reported ages of household members. This way,
only a partial network was observed for each household since information on contacts
between the respondent and household members was available, but not on contacts
between other members. The density of contacts within households was, therefore,
likely underestimated.
Various measures of within-household clustering are defined in Section A.2 and
Table A.1 shows the high degree of physical contact clustering observed within
households.
                      Week                                 Weekend
HH size    Nr. HHs    Proportion   Mean        Nr. HHs    Proportion   Mean
                      complete     density                complete     density
2            9        1.00         1.00          3        1.00         1.00
3           53        0.91         0.96         19        0.74         0.88
4          111        0.77         0.93         48        0.85         0.96
5           39        0.64         0.90         18        0.78         0.95
≥ 6         13        0.46         0.85          3        1.00         1.00
Total      225        0.77         0.93         91        0.82         0.94

                      Regular period                       Holiday period
HH size    Nr. HHs    Proportion   Mean        Nr. HHs    Proportion   Mean
                      complete     density                complete     density
2            9        1.00         1.00          3        1.00         1.00
3           42        0.86         0.94         30        0.87         0.93
4          105        0.82         0.94         54        0.76         0.93
5           38        0.66         0.91         19        0.74         0.92
≥ 6         12        0.50         0.84          4        0.75         0.98
Total      206        0.79         0.93        110        0.79         0.93
Table 5.1: Proportion of complete networks and mean network density, stratified by
household size, for the observed within-household physical contact networks, comparing
week and weekend days (top) and regular and holiday periods (bottom).
5.2 ERGMs for Within-household Physical Contact
Networks
We use ERGMs (see Section 1.3.3 for details) to model the within-household physical
contact networks. We infer on the processes driving physical contacts between
household members by incorporating network statistics based on nodal covariate in-
formation (Table 5.2). Although our analysis is focused on within-household contact
networks, we fit a single ERGM including all households. We include in our model
both the total number of edges and a household effect which captures the tendency
to contact others in one’s own household. Because there are no between-household
contact reports present in our survey, the probability of between-household contact
should be zero. Constrained optimization with fixed coefficients for these statistics as
proposed by Potter and Handcock (2010) does not entail a plausible approximation
of the likelihood for our data. Therefore, we use unconstrained optimization and
check whether the probability of physical contact between non-household members is
approximately zero.
We explore the effect of relationships i.e. mixing among siblings, between chil-
dren and their parents and between partners, gender-preferential mixing and age
effects in children, and the effect of household size, distinguishing small (≤ 3
members), medium (4 members) and large (≥ 5 members) households.
The network statistics above are all dyad independent: the vector of change statistics
δg(y,X)ij does not depend on the value of y. We explore the presence of higher-order
dependency effects between members of the same household, such as clustering (see
Table A.1), by including in the model the number of isolate individuals, 2-stars,
triangles and triangles in households of size ≥ 6. A 2-star is a person connected to
two other household members and a triangle is a set of three household members
such that all three are connected to each other.
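To make these higher-order statistics concrete, the following sketch counts 2-stars and triangles in a small household network. It assumes the usual convention that the number of 2-stars is the sum over individuals of the number of pairs of their contacts; the function names and the example household are hypothetical.

```python
from itertools import combinations
from math import comb


def count_two_stars(nodes, edges):
    """Sum over nodes of the number of pairs of neighbours (degree choose 2)."""
    edge_set = {frozenset(e) for e in edges}
    degree = {v: 0 for v in nodes}
    for edge in edge_set:
        for v in edge:
            degree[v] += 1
    return sum(comb(d, 2) for d in degree.values())


def count_triangles(nodes, edges):
    """Number of triples of nodes that are all mutually connected."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for trio in combinations(nodes, 3)
        if all(frozenset(pair) in edge_set for pair in combinations(trio, 2))
    )


members = ["father", "mother", "child1", "child2"]
complete = list(combinations(members, 2))  # fully connected household
one_missing = [e for e in complete if set(e) != {"father", "child2"}]

stars_complete = count_two_stars(members, complete)      # 4 nodes of degree 3
triangles_complete = count_triangles(members, complete)  # every triple is a triangle
stars_missing = count_two_stars(members, one_missing)
triangles_missing = count_triangles(members, one_missing)
```

Removing a single edge from a complete household of size 4 removes two of the four triangles, which illustrates why the triangle count is sensitive to small departures from completeness.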
The triangle term estimates a transitivity effect, i.e. the increase in log odds
of contact between two people due to the fact that they have a third contact in
common. Inclusion of triangle terms in ERGM models has been found to lead to
“model degeneracy” in some cases (Handcock, 2003). Model degeneracy occurs when
the maximum likelihood estimate places most probability on a small set of possible
networks (e.g., all mass on the complete network). It results from the fact that
Network statistic                      Legend
Edges                                  Total number of edges
Within-household edges                 Total number of edges within households
Child-father mixing                    Total number of edges between children and fathers
Child-mother mixing                    Total number of edges between children and mothers
Father-mother mixing                   Total number of edges between partners
Boy-boy mixing                         Total number of edges between male children
Girl-girl mixing                       Total number of edges between female children
Age effect children                    The sum of age(i) and age(j) for all edges (i, j) between siblings
Small (≤ 3) households                 Total number of edges within households of size ≤ 3
Large (≥ 5) households                 Total number of edges within households of size ≥ 5
Isolates                               Total number of isolates
2-stars                                Total number of 2-stars
Triangles                              Total number of triangles
Triangles in households of size ≥ 6    Total number of triangles in households of size ≥ 6
Table 5.2: Network statistics considered in the ERGMs, where an edge is defined as a
physical contact between two individuals. Reference categories are child-child mixing,
boy-girl mixing, and mixing within households of size 4.
the triangle term does not impose decreasing marginal returns on the number of
mutual contacts made by the two individuals in question. For example, the increase
in log odds of contact between a pair whose number of mutual contacts increases
from zero to one is forced to be the same as the increase in log odds of contact
between a pair whose number of mutual contacts increases from ten to eleven.
Alternate ERGM terms have been proposed which instead, more realistically, model
decreasing marginal returns of the number of mutual contacts on the log odds of
contact (Hunter, 2007). However, model degeneracy was not found to be a problem
in our case, possibly because the unique structure of our data set, which includes a
large number of households but includes no between-household contacts, prevents an
“avalanche effect” of triangles towards the complete network.
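The constant marginal return of the triangle term can be made explicit with a short worked equation. The notation below is a sketch that follows the change-statistic formulation used above; the symbols $m_{ij}$ (the number of mutual contacts of $i$ and $j$) and $\theta_T$ (the triangle coefficient) are introduced here for illustration only.

```latex
% Conditional log odds of an edge given the rest of the network y^c_{ij}:
\[
\operatorname{logit} P\left(Y_{ij} = 1 \mid Y^{c}_{ij} = y^{c}_{ij}\right)
  = \theta^{\top} \delta_g(y, X)_{ij}.
\]
% Adding the edge (i, j) creates one new triangle for every mutual contact,
% so the triangle component of the change statistic is
\[
\delta_{T}(y)_{ij} = \sum_{k \neq i, j} y_{ik}\, y_{jk} = m_{ij},
\]
% and its contribution to the log odds is linear in m_{ij}:
\[
\theta_T \,(m_{ij} + 1) - \theta_T \, m_{ij} = \theta_T
\quad \text{for every } m_{ij} \geq 0 .
\]
```

Each additional mutual contact therefore adds the same amount $\theta_T$ to the log odds, whether it is the first or the eleventh.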
Approximate maximum likelihood estimates are obtained using a stochastic
Markov Chain Monte Carlo (MCMC) algorithm (Geyer and Thompson, 1992).
In short, a distribution of random networks is simulated from a starting set of
parameter values using MCMC and the parameter values are refined by comparing
this distribution of networks against the observed network in a Newton-Raphson
type algorithm, repeating this process until the parameter estimates stabilize
(Robins et al., 2007). MCMC estimation is performed with the ergm package in R
(Hunter et al., 2008; Handcock et al., 2013a) that is part of the statnet suite of
packages for statistical network analysis (Handcock et al., 2008, 2013b; Goodreau
et al., 2008). We use a burn-in of length 10^6, intervals between sampled networks
of length 10^3 and a total sample size equal to 5 · 10^5. The initial value of θ is
obtained by maximum pseudolikelihood estimation, considering (1.14) as a logistic
regression model assuming all Yij are mutually independent (Strauss and Ikeda, 1990).
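The simulation step of this procedure can be illustrated with a toy Metropolis sampler. The sketch below is written in Python rather than R and is not the algorithm implemented in the ergm package: it assumes a model with only an edge and a triangle term, proposes toggling one randomly chosen dyad per step, and uses hypothetical function and parameter names. It shows how the burn-in and the interval between sampled networks enter the sampling scheme.

```python
import math
import random
from itertools import combinations


def sample_densities(n_nodes, theta_edges, theta_triangles, n_samples,
                     burn_in=1000, interval=10, seed=1):
    """Metropolis sampler for a toy ERGM with an edge and a triangle term.

    Returns the network density of each sampled network.
    """
    rng = random.Random(seed)
    neighbours = {v: set() for v in range(n_nodes)}  # start from the empty network
    dyads = list(combinations(range(n_nodes), 2))
    n_edges = 0
    densities = []
    for step in range(1, burn_in + n_samples * interval + 1):
        i, j = rng.choice(dyads)
        present = j in neighbours[i]
        # Change statistics for switching edge (i, j) on: one extra edge and
        # one extra triangle per contact shared by i and j.
        shared = len(neighbours[i] & neighbours[j])
        log_odds = theta_edges + theta_triangles * shared
        log_ratio = -log_odds if present else log_odds
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            if present:
                neighbours[i].discard(j)
                neighbours[j].discard(i)
                n_edges -= 1
            else:
                neighbours[i].add(j)
                neighbours[j].add(i)
                n_edges += 1
        # Keep one network every `interval` steps once the burn-in has passed.
        if step > burn_in and (step - burn_in) % interval == 0:
            densities.append(n_edges / len(dyads))
    return densities


sparse = sample_densities(4, theta_edges=-5.0, theta_triangles=0.0, n_samples=500)
neutral = sample_densities(4, theta_edges=0.0, theta_triangles=0.0, n_samples=2000)
mean_sparse = sum(sparse) / len(sparse)
mean_neutral = sum(neutral) / len(neutral)
```

In an estimation routine, summary statistics of such simulated networks would be compared with those of the observed network to update θ; that refinement step is omitted here.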
                                           Week                  Weekend
Network statistic                      Estimate   p-value    Estimate   p-value
Edges                                   -28.16     <0.01      -20.63     <0.01
Within-household edges                   28.97     <0.01       22.78     <0.01
Child-father mixing                      -0.60      0.23       -1.15      0.45
Child-mother mixing                       0.16      0.76        0.14      0.93
Father-mother mixing                      0.27      0.66       -0.76      0.63
Age effect children                      -0.07     <0.01       -0.18     <0.01
Small households (≤ 3)                    0.74     <0.01
Large households (≥ 5)                   -0.40     <0.01
2-stars                                  -0.26      0.25       -0.87      0.01
Triangles                                 2.06     <0.01        3.58     <0.01
Triangles in households of size ≥ 6      -0.28      0.02
Log-likelihood                         -306.80                -65.98
AIC                                     635.59                147.95
Table 5.3: ERGM for within-household physical contact networks on week- and weekend
days: parameter estimates and Wald test p-values, log-likelihood and AIC.
The within-household physical contact networks were modeled separately for week-
days and weekends and the final ERGMs are presented in Table 5.3. The estimates
shown in this table are log odds ratios and, hence, need to be exponentiated to
obtain odds ratios, e.g. the odds of a physical contact between a father and child
is exp(−0.60) = 0.55 times the odds of a physical contact between two children
assuming other network characteristics remain fixed. Note that the edge effect is
estimated negative to counterbalance the large within-household edge effect, which
is needed because our data does not include between-household contacts. In both
models, the effects of gender-preferential mixing and the number of isolates were
found to be non-significant (likelihood ratio test p = 0.5766 for weekdays). For
weekend days, no significant effect of household size was found and the model was
further reduced to an 8-parameter model (likelihood ratio test p = 0.5134). On
weekdays, the odds of a physical contact occurring in a household of size ≤ 3 and
≥ 5 are estimated to be 2.10 and 0.67 times the odds of a physical contact occurring
in a household of size 4, respectively. Thus, the physical contact network density
decreases with increasing household size. Further, on both types of day, the odds of
a physical contact between father and child is smaller than for any other pair except
for older siblings, as the probability for siblings to make physical contact decreases
with increasing age (Figure A.2). For households of size ≤ 5, the odds of a physical
contact that will complete a triangle is estimated to be 7.85 and 35.87 times the odds
of a physical contact that will not complete a triangle on week and weekend days,
respectively. This demonstrates the overall high degree of contact clustering within
households. On weekdays, the degree of clustering is slightly lower in households of
size ≥ 6 (conditional odds of 5.93).
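The arithmetic behind these odds ratios, and behind the AIC values in Table 5.3, can be checked with a few lines of Python. The coefficients are copied from Table 5.3; the dictionary keys and helper functions are hypothetical names, and the parameter counts (11 for weekdays, 8 for weekend days) are obtained by counting the estimates reported in each column of the table.

```python
import math

# Log odds ratios copied from Table 5.3.
weekday = {
    "child_father": -0.60,
    "small_households": 0.74,
    "large_households": -0.40,
    "triangles": 2.06,
    "triangles_size_6_plus": -0.28,
}
weekend = {"triangles": 3.58}


def odds_ratio(*log_odds_ratios):
    """Exponentiate the sum of the log odds ratios that apply to a dyad."""
    return math.exp(sum(log_odds_ratios))


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


or_child_father = odds_ratio(weekday["child_father"])
or_small = odds_ratio(weekday["small_households"])
or_large = odds_ratio(weekday["large_households"])
or_triangle_week = odds_ratio(weekday["triangles"])
or_triangle_weekend = odds_ratio(weekend["triangles"])
# In households of size >= 6 both triangle terms apply on weekdays.
or_triangle_week_large = odds_ratio(weekday["triangles"],
                                    weekday["triangles_size_6_plus"])

aic_week = aic(-306.80, 11)
aic_weekend = aic(-65.98, 8)
```

The AIC values computed this way differ from the tabulated ones by 0.01, which is consistent with the log-likelihoods having been rounded to two decimals.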
Goodness-of-fit of the models is assessed by simulating new sets of physical
contact networks from the fitted ERGM and by comparing specific contact network
characteristics that are not included in the model, to the observed ones. We compare
the proportion of complete networks, the mean network density and the proportion
of observed versus potential triangles (Section A.2), by household size. We simulate
1000 networks using a burn-in of length 10^7 and intervals of length 10^6 between
sampled networks. The first simulated Markov chain begins at the initial network
and the end of one simulation is used as the start of the next simulation.
Figure 5.1: Proportion of complete networks (left) and mean network density (right):
observed values (blue stars with size proportional to the sample size) and values simulated
from the ERGM for within-household physical contact networks on a weekday.
Figure 5.2: Proportion of complete networks (left) and mean network density (right):
observed values (blue stars with size proportional to the sample size) and values simulated
from the ERGM for within-household physical contact networks on a weekend day.
Figure 5.3: Proportion of observed versus potential triangles: observed values (blue stars
with size proportional to the sample size) and values simulated from the ERGM for
within-household physical contact networks on a weekday (left) and on a weekend day
(right).
Overall, the final ERGMs seem to fit the data well as shown in Tables A.2-A.5 and
Figures 5.1-5.3.
5.3 Epidemic Spread in a Community of Households
We simulate the spread of a newly emerging infection in a closed fully susceptible
population of households using a discrete-time chain binomial SIR model (see
Section 1.3.1.4). The 225 households from the contact survey that were analyzed
using the weekday ERGM, are used to construct the community of households. We
assume two levels of mixing similar to the households model of Ball et al. (1997):
high-intensity mixing within households and low-intensity ‘background’ random
mixing in the community i.e. between households. Two different configurations
for within-household mixing are compared: random mixing and empirical-based
mixing, where the latter refers to physical contact networks simulated from the
fitted ERGMs. For each epidemic simulation, two sets of within-household contact
networks are drawn from the ERGMs, one for a weekday and one for a weekend day,
and those are kept fixed during the whole simulation.
At time step t (in days), assuming infection is spread by means of physical
contacts, each susceptible i acquires infection with probability:
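\[
p_{i,1}(t) = 1 - (1-\beta_h)^{\sum_{j \in h_i,\, j \neq i} y_{ij} I_j(t)}
  \cdot (1-\beta_{c,11})^{\sum_{j \notin h_i} I_{j,1}(t)}
  \cdot (1-\beta_{c,12})^{\sum_{j \notin h_i} I_{j,2}(t)},
\]
\[
p_{i,2}(t) = 1 - (1-\beta_h)^{\sum_{j \in h_i,\, j \neq i} y_{ij} I_j(t)}
  \cdot (1-\beta_{c,21})^{\sum_{j \notin h_i} I_{j,1}(t)}
  \cdot (1-\beta_{c,22})^{\sum_{j \notin h_i} I_{j,2}(t)},
\]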
where index 1 corresponds to children ≤ 18 years and index 2 to adults > 18 years. βh
denotes the within-household transmission probability per physical contact, per time
step. The 2 × 2 community transmission probability matrix βc (with βc,12 = βc,21)
is taken directly proportional to the per capita physical contact rates estimated from
the Belgian POLYMOD contact survey, with a proportionality constant qc (Mossong
et al., 2008b; Goeyvaerts et al., 2010):
\[
\beta_c = q_c
\begin{bmatrix}
17.35 & 6.26 \\
6.26 & 7.88
\end{bmatrix}
\cdot 10^{-7}.
\]
Further, yij denotes the observed adjacency matrix and under the random mixing
scenario, yij equals 1 for all household members i and j. Finally, hi denotes the
household of node i and Ij(t) indicates whether node j is infected (1) or not (0) at
time t with subscripts referring to children and adults.
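A minimal sketch of this infection probability in Python is given below. It assumes that the numbers of infectious household contacts and of infectious children and adults outside the household have already been counted; the function and argument names are hypothetical. The values βh = 0.05 and qc = 275 are the ones used for the simulations in this section, and the average of the four matrix entries is computed as a simple unweighted mean.

```python
# Per capita physical contact rates (children first, adults second), as in the
# matrix above; the factor 1e-7 is applied in community_matrix().
CONTACT_RATES = [[17.35, 6.26],
                 [6.26, 7.88]]


def community_matrix(q_c):
    """Community transmission probabilities: q_c times contact rates times 1e-7."""
    return [[q_c * rate * 1e-7 for rate in row] for row in CONTACT_RATES]


def infection_probability(age_class, n_inf_household_contacts,
                          n_inf_children_outside, n_inf_adults_outside,
                          beta_h, beta_c):
    """Probability that a susceptible acquires infection in one time step.

    age_class: 0 for a child (index 1 in the text), 1 for an adult (index 2).
    n_inf_household_contacts: infectious household members j with y_ij = 1.
    """
    escape = ((1.0 - beta_h) ** n_inf_household_contacts
              * (1.0 - beta_c[age_class][0]) ** n_inf_children_outside
              * (1.0 - beta_c[age_class][1]) ** n_inf_adults_outside)
    return 1.0 - escape


beta_c = community_matrix(q_c=275)
mean_beta_c = sum(sum(row) for row in beta_c) / 4.0

p_none = infection_probability(0, 0, 0, 0, beta_h=0.05, beta_c=beta_c)
p_household_only = infection_probability(1, 1, 0, 0, beta_h=0.05, beta_c=beta_c)
p_child_mixed = infection_probability(0, 1, 10, 5, beta_h=0.05, beta_c=beta_c)
```

With a single infectious household contact and nobody infectious in the community, the probability reduces to βh, as expected from the formula.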
Since we aim to study the effect of contact heterogeneity, we assume that in-
herent susceptibility and infectiousness are invariant with age. Further, we assume
that there is no latent period i.e. individuals are infectious immediately when
acquiring infection. At each time step, infected individuals recover with a constant
probability of 0.22 such that the mean infectious period is approximately 3.5 days.
Values for the transmission parameters βh and qc are chosen in line with literature es-
timates based on household final size and symptom onset data (Table A.6): βh = 0.05
and qc = 275, resulting in an average community transmission probability of 0.00026. These
parameter values result in estimates of the community probability of infection for
children and adults (CPIchild and CPIadult) between 0.18 and 0.20, and 0.11 and
0.12, respectively. Furthermore, the probability to escape infection from an infected
household member per day is qHH = 1− βh = 0.95. The first day of the epidemic is
randomly determined to be a week- or weekend day and is started by infecting three
random individuals. The epidemic is then tracked until all infected individuals are
recovered and no new infections have occurred.
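The simulation procedure described above can be sketched as a short chain binomial SIR routine. This is a simplified illustration and not the code used for the results in this chapter: it uses one fixed within-household network instead of separate weekday and weekend networks, a hypothetical toy community of 50 households of size 4 (two adults, two children), and hypothetical function names.

```python
import random


def simulate_epidemic(households, age_class, contacts, beta_h, beta_c,
                      recovery_prob=0.22, n_seeds=3, seed=0):
    """Discrete-time chain binomial SIR with two levels of mixing.

    households: list of lists of individual ids.
    age_class:  dict id -> 0 (child) or 1 (adult).
    contacts:   dict id -> set of household members with a physical contact.
    Returns (final size, duration in days).
    """
    rng = random.Random(seed)
    individuals = [i for hh in households for i in hh]
    household_of = {i: k for k, hh in enumerate(households) for i in hh}
    state = {i: "S" for i in individuals}
    for i in rng.sample(individuals, n_seeds):
        state[i] = "I"
    days = 0
    while any(s == "I" for s in state.values()):
        infectious = [i for i in individuals if state[i] == "I"]
        new_state = dict(state)
        for i in individuals:
            if state[i] != "S":
                continue
            escape = 1.0  # probability of escaping infection from everyone
            for j in infectious:
                if household_of[j] == household_of[i]:
                    if j in contacts[i]:
                        escape *= 1.0 - beta_h
                else:
                    escape *= 1.0 - beta_c[age_class[i]][age_class[j]]
            if rng.random() < 1.0 - escape:
                new_state[i] = "I"
        for j in infectious:
            if rng.random() < recovery_prob:
                new_state[j] = "R"
        state = new_state
        days += 1
    final_size = sum(s == "R" for s in state.values())
    return final_size, days


# Toy community: 50 households of size 4; the first two members are adults.
households = [[4 * k + m for m in range(4)] for k in range(50)]
age_class = {i: (1 if i % 4 < 2 else 0) for hh in households for i in hh}
random_mixing = {i: set(hh) - {i} for hh in households for i in hh}
beta_c = [[275 * r * 1e-7 for r in row] for row in [[17.35, 6.26], [6.26, 7.88]]]

final_size, days = simulate_epidemic(households, age_class, random_mixing,
                                     beta_h=0.05, beta_c=beta_c)
```

Replacing `random_mixing` by contact sets drawn from a fitted network model would give the empirical-based counterpart; repeating the call over many seeds yields distributions of final size and duration.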
5.3.1 Setting 1
Results from 1000 stochastic epidemic simulations are shown in Figures 5.4 and 5.5.
In these figures, small outbreaks, defined as outbreaks with a final size of fewer than 100
individuals that lasted less than 60 days, are excluded from display. The proportion
of small outbreaks is significantly smaller in the random mixing setting compared
to empirical-based mixing, 0.43 and 0.50, respectively (Fisher’s exact test, p-value:
0.0027).
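For readers who want to reproduce this type of comparison, the sketch below implements a two-sided Fisher's exact test with the standard library only. The counts are an assumption: they are reconstructed from the rounded proportions (430 and 500 small outbreaks out of 1000 simulations each), so the resulting p-value need not coincide exactly with the one reported above. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are at most as probable as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1.0 + 1e-9  # guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * tolerance)


# Assumed counts: small outbreaks versus other outbreaks, per mixing assumption.
small_random, small_empirical, n_sim = 430, 500, 1000
p_value = fisher_exact_two_sided(small_random, n_sim - small_random,
                                 small_empirical, n_sim - small_empirical)
```

Exact integer arithmetic with `math.comb` avoids overflow for tables of this size; only the final division is carried out in floating point.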
These results suggest that relaxing the assumption of random mixing within
households by extending to more realistic contact network patterns drawn from
the fitted ERGMs, has a small impact on the epidemic simulations. The mean
proportion of individuals ultimately infected and the mean proportion of households
infected are slightly larger under random mixing than under empirical-based mixing:
0.39 [0.12, 0.56] vs. 0.36 [0.12, 0.53], and 0.70 [0.28, 0.88] vs. 0.67 [0.29, 0.86],
respectively (Figure A.3). Furthermore, the household attack rate, defined as the
mean proportion of individuals infected per household (Longini and Koopman, 1982)
increases with household size under both settings (Figure 5.5).
Figure 5.4: Mean infection incidence over time at the individual (left) and household level
(right) for 1000 simulations of a stochastic SIR epidemic process on a 2-level households
model assuming random (black) and empirical-based (red) mixing within households.
Figure 5.5: Household attack rates by household size for 1000 simulations of a stochastic
SIR epidemic process on a 2-level households model assuming random and empirical-based
mixing within households.
5.3.2 Setting 2
In this setting, we include a scaling factor to account for the difference in density
between empirical-based and random mixing:
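\[
p_{i,1}(t) = 1 - (1-\beta_h \cdot \delta_h)^{\sum_{j \in h_i,\, j \neq i} y_{ij} I_j(t)}
  \cdot (1-\beta_{c,11})^{\sum_{j \notin h_i} I_{j,1}(t)}
  \cdot (1-\beta_{c,12})^{\sum_{j \notin h_i} I_{j,2}(t)},
\]
\[
p_{i,2}(t) = 1 - (1-\beta_h \cdot \delta_h)^{\sum_{j \in h_i,\, j \neq i} y_{ij} I_j(t)}
  \cdot (1-\beta_{c,21})^{\sum_{j \notin h_i} I_{j,1}(t)}
  \cdot (1-\beta_{c,22})^{\sum_{j \notin h_i} I_{j,2}(t)}.
\]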
Here, δh is set to 1 for empirical-based mixing, while for random mixing it equals
the network density of the simulated contact network in the empirical-based mixing
scenario. In the previous setting, the differences between the network model
and the random mixing scenario could simply be due to different densities rather than
to any particular characteristic of the network structure. In this setting, we calibrate
the density in order to make a fairer comparison between the two scenarios. Figures 5.6 and
5.7 present the results from 1000 simulations, excluding small outbreaks.
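The rationale for this scaling can be sketched with a first-order approximation. The argument below is heuristic and assumes that edges are roughly equally likely within a household; $m$ denotes the number of infectious members in the household of $i$ and $d$ the network density, both introduced here for illustration.

```latex
% Empirical-based mixing (\delta_h = 1): on average a fraction d of the m
% infectious household members is in contact with i, so for small \beta_h
\[
1 - (1-\beta_h)^{\sum_{j} y_{ij} I_j(t)}
  \approx \beta_h \sum_{j} y_{ij} I_j(t),
\qquad
\mathrm{E}\Big[\sum_{j} y_{ij} I_j(t)\Big] \approx d\, m .
\]
% Random mixing (y_{ij} = 1 for all household pairs) with \delta_h = d:
\[
1 - (1-\beta_h \delta_h)^{m} \approx \beta_h\, \delta_h\, m = \beta_h\, d\, m .
\]
```

To first order in βh, both scenarios then have the same expected within-household force of infection, so that remaining differences can be attributed to the network structure rather than to the overall number of contacts.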
From these figures we see that there are barely any differences in mean final
fractions (0.37 [0.13, 0.52] vs. 0.36 [0.12, 0.53], and 0.68 [0.31, 0.86] vs. 0.67 [0.29,
0.86]; Figure A.4) or household attack rates between random and empirical-based
mixing, respectively. The proportion of small outbreaks is also similar under both
mixing assumptions, 0.48 and 0.50, respectively (Fisher’s exact test, p-value: 0.3954).
5.3.3 Other Settings
To further investigate the comparison between random and empirical-based mixing,
we also performed the epidemic simulations under the following conditions:
1. High household transmission: βh = 0.4
2. Age-dependent household transmission: βh,1 = 0.2 and βh,2 = 0.05
3. Age-independent community transmission: all entries of βc equal to the average value 0.00026
4. Simulating over all 316 households available in the contact data
However, the results obtained were very similar to those described above
and are therefore omitted.
Figure 5.6: Mean infection incidence over time at the individual (left) and household level
(right) for 1000 simulations of a stochastic SIR epidemic process on a 2-level households
model assuming random (black) and empirical-based (red) mixing within households
including a density scaling factor.
Figure 5.7: Household attack rates by household size for 1000 simulations of a stochastic
SIR epidemic process on a 2-level households model assuming random and empirical-based
mixing within households including a density scaling factor.
5.4 Discussion
In this chapter, we introduced the first social contact study focusing specifically on
contact networks within households. Inference of within-household contact networks
in previous studies was based on egocentric contact surveys in which each household
network was only partly observed (Potter et al., 2011; Potter and Hens, 2013) or on
limited data in a very specific setting (rural Peru; Grijalva et al. (2015)). Our survey
design was an improvement on the former surveys, since information on contacts
between all household members was available. Consequently, the observed contact
networks are likely more accurate than the estimates obtained in the previously mentioned
studies. We analyzed the household network data using ERGMs to assess the effect
of factors such as role in the household, gender, age in children and household
size on close contacts within households. We found that contacts between fathers
and children are less likely than between father and mother, mother and children,
and siblings (except older siblings). This is in line with conclusions obtained by
de Greeff et al. (2012). They analyzed data for pertussis in households with young
infants and found that fathers were less susceptible to pertussis infection than other
household members, whereas mothers were more infectious to their infants. Targeted
vaccination of mothers as well as siblings was found to be the most effective, the latter
because siblings more often introduce an infection in the household. We found that
the mean number of contacts increases with increasing household size (see Table A.1),
supporting density-dependent contact rates as found in previous contact surveys
(Mossong et al., 2008b). However, studies on household epidemic data of close-contact
infections (Melegaro et al., 2004; Cauchemez et al., 2004; de Greeff et al., 2012)
support frequency-dependent transmission, since they report a decreasing trend in
instantaneous risk of transmission between a susceptible and an infectious individual
with household size. This discrepancy remains to be investigated. To assess the
common assumption of random mixing within households, we simulated epidemics in
a two-level SIR setting based on either the empirically grounded networks or random
mixing. We did not find any important differences, indicating that the assumption of
random mixing between household members may be an adequate approximation of
social contact behavior in this setting for infections transmitted via close contacts.
Our study has a number of limitations and assumptions. We assume that a
contact occurred if it was reported by at least one household member. Thus, contacts
forgotten by both members could result in an underestimation of the network density.
Potter et al. (2015) developed a model to deal with the issue of reporting error
on network edges. However, given that the high reciprocity rates (98%) indicate
a very good reporting quality of the survey, we believe this will not have a large
impact on our conclusions. Further, our results depend on the contact definition
used to determine the within-household network links and cannot be generalized to
the spread of any infectious disease. Based on the exploration of various contact
definitions when using POLYMOD contact data to estimate age-specific varicella
transmission rates (Goeyvaerts et al., 2010), we opted to use physical contacts in this
study as a surrogate of potential transmission events for close-contact infections such
as influenza and smallpox. Keeling and Eames (2005) correctly note that even for two
airborne infections, different networks may be appropriate because differing levels of
interaction will be required to constitute an effective contact. Lastly, the comparison
between random mixing and empirical-based mixing was assessed by simulating
according to a two-level mixing model in a completely susceptible population of
households. It is possible that other settings (e.g. with a different structure or a par-
tially immune population) entail bigger differences between these mixing assumptions.
The methods in this chapter can be extended in a number of ways which will
be topics of future research. Figure A.1 indicates a relationship-specific heterogeneity
in duration of contact, which might be relevant for some diseases. The ERGM
framework can be adapted to model ‘valued’ within-household contact networks
(Krivitsky, 2012), with the value of a contact determined by its total duration, and
by weighting the transmission rates in the epidemic simulation model accordingly.
It is also of potential interest to capture temporal dynamics of within-household
contacts and to simulate the impact of contact formation and dissolution on the
spread of infection (Hanneke et al., 2010; Krivitsky and Handcock, 2014). Table 5.1
suggests that the physical contact networks are less complete on weekdays compared
to weekends, whereas the difference between the regular and holiday periods seems
to be minor. Further, on weekdays, interactions between household members are
expected to occur mostly during the morning and the evening, before and after
school or work time. In the survey, participants had to indicate in which location
most time was spent within certain time blocks. Combining these time-use-like data
with the contact diary makes it possible to infer the potential timing of (physical) contacts with
household members, and to estimate dynamic within-household contact networks.
This would also be valuable to inform large-scale individual-based simulation models
of infectious disease spread. Finally, combining the model for within-household
contact networks developed in this chapter with epidemic data from a similar
community of households would make it possible to improve estimates of age-specific heterogeneity
in susceptibility and infectiousness for infections such as influenza (Addy et al., 1991).
This study provides unique insights into within-household contacts, considered
to be important drivers of many close-contact infections. It is the first empirical
evidence resulting from a large household contact survey supporting the use of the
random mixing assumption in epidemic models incorporating household structure.
Chapter 6
Bayesian Inference for the Two-Level Mixing Model Incorporating Empirical Household Contact Networks
In Chapter 5, we emphasized the key role that households have in the spread of
airborne infectious diseases and, consequently, the importance of describing the
mixing patterns within households as realistically as possible. To do so, we developed
a network model based on the first household contact survey to account for contact
heterogeneity within households.
In the last 15 years, network theory has gained considerable attention to model
interactions between hosts that enable the spread of disease through a population.
Estimating the parameters of these network models from data, however, remains
a huge challenge because of the inherent complexity of these sophisticated models.
Britton and O’Neill (2002) were the first to perform Bayesian statistical inference for
stochastic epidemic models by including an underlying unobserved social structure
modeled as a Bernoulli random graph i.e. assuming random mixing. Demiris and
O’Neill (2005b) extended this method by developing inference for infection rates and
imputed the contact graph, assuming random mixing within and between groups
i.e. allowing for two levels of mixing. This type of model assumes that a population
of individuals is partitioned into groups (e.g. households, farms, etc.) in which
infectious contacts can occur both locally within a group, and globally between
groups. Groendyke et al. (2011, 2012) extended the work of Britton and O’Neill
(2002) by suggesting generalizations of the network model that was used and by
implementing software to perform inference for these models. They considered a
SEIR epidemic model on a random network modeled as an Erdős-Rényi random
graph, which is one particular type of a more general class of exponential-family
random graph models. In all these models the structure of the contact networks is
inferred from epidemic data only. As a consequence, strict assumptions are necessary
to make estimation possible.
In this chapter, we will integrate the household network model inferred in the
previous chapter into an epidemic two-level mixing model to estimate parameters
from household disease data. Data augmentation is used to identify the social
structure consistent with the observed disease data and the network model. Hence,
unlike the models described above, the underlying contact graph is informed by both
empirical contact data as well as disease data. Inference will be done in the Bayesian
framework, using MCMC, in which it is natural to use data augmentation. This
approach is illustrated using the data on pertussis collected in households in the
Netherlands, which is described in Section 2.1.3.
The structure of this chapter is as follows. In Section 6.1, we describe the
model, present the corresponding likelihood and provide details on the MCMC
sampling. Preliminary results are shown in Section 6.2 and a discussion is provided
in Section 6.3.
6.1 Methodology
6.1.1 Model Description
In the following, the transmission model and underlying contact network are
described.
Two-level mixing model
Consider a population of independent households of varying sizes. We shall
represent the within-household social structure using a random graph G. Specifically,
each individual in the population is represented by a vertex in G. Two vertices
are adjacent in a specific realization G of G if the corresponding individuals made
physical contact. We can now define a two-level mixing model on G. We assume
a discrete-time SEIR model. Hence, at time point t ≥ 0 every individual in the
population is either susceptible (S), exposed (E), infectious (I) or removed (R). A
susceptible individual may become infected after which he is exposed but not yet
infectious. After this period he becomes infectious and can transmit the disease to
others. Ultimately all infectious individuals recover. Note that we do not consider
possible reinfection due to the limited time scale of our data.
Now, consider household h of size nh. Denote the household members for
which pertussis is laboratory confirmed with Ih and the other members with Sh.
The data consist of a set of symptom onset times oh = {ohj , j ∈ Ih}. For pertussis
we know that the symptom onset time is also the start of infectiousness (Centers for
Disease Control and Prevention, 2016). Hence, individual j whose symptoms started
at time ohj , is infectious from time ohj until time rhj , with rhj − ohj = c the length
of the infectious period. We assume that c is known and thus fixed. Furthermore,
this individual is assumed to be infected at time ehj and the latent period ohj − ehj is
assumed to be gamma distributed with mean µ and variance σ². Define the primary
case of the household as ph, such that ehph = min{ehj , j ∈ Ih}, and Ih − {1} as the
group of infected household members without the primary case. Denote the end of
follow-up for this household as Th. Lastly, let I correspond to the confirmed cases in
all households and define S, o, r, e, p, I − {1} and T in the same way.
We assume two levels of transmission: within-household transmission via the
physical contact structure G and a constant background risk of infection from the
community. Let βh be the within-household transmission probability (per contact
per day) and βc the community transmission probability (per day).
Network model
The random graph G is modeled according to the final ERGM obtained in
Section 5.2. For simplicity and computational efficiency, we will focus on the weekday
model (Table 5.3). Recall that the physical contact network density decreased with
increasing household size and that the odds of a physical contact between father and
child was smaller than for any other pair except for older siblings.
Vaccination
In the Netherlands, infants have been offered a primary vaccination series of 4 doses
of whole-cell DTP-IPV since 1957; the whole-cell vaccine was replaced by an acellular vaccine in
2005. In 2002, an acellular pertussis preschool booster was introduced for children
at the age of four years. Vaccination coverage has been high over the past decades.
According to the report on the national immunisation programme (NIP) in the
Netherlands (National Institute for Public Health and the Environment, 2013), the
vaccine effectiveness (VE) of the primary series increased after the replacement of
the whole cell vaccine with the acellular one. Further, the VE for the booster dose
decreases after approximately 4 years, i.e. when children reach the age of eight years,
to about 18% at the age of 14 years (see Table 6.1). Note that vaccine effectiveness
is defined as the percentage reduction in the incidence of disease among vaccinated
persons compared with unvaccinated persons.
Age / Birth-cohort ’98 ’99 ’00 ’01 ’02 ’03 ’04 ’05 ’06 ’07 ’08
1yr 38 63 78 73 63 29 54 72 87 92 90
2yr 33 22 52 46 41 - - 67 58 92 91
3yr 9 - - - 54 10 37 59 43 84 82
4-5yr - 77 71 82 86 80 84 83 93 89 -
6yr 74 70 80 79 71 61 89 87 90 - -
7yr 68 71 68 71 51 61 67 86 - - -
8yr 77 75 56 47 35 72 80 - - - -
9yr 73 63 36 49 34 69 - - - - -
10yr 60 - 13 24 59 - - - - - -
11yr - 11 - 5 - - - - - - -
12yr 45 3 14 - - - - - - - -
13yr - - - - - - - - - - -
14yr 18 - - - - - - - - - -
Table 6.1: Estimation of vaccine effectiveness for 1 to 14-year-olds per birth cohort
according to the NIP report (National Institute for Public Health and the Environment,
2013).
We therefore make the following assumptions for our data set: (1) a vaccinated
individual s who is 14 years or younger is susceptible with probability fs = 1 − VE,
where the estimates of the VE per age group and birth cohort from the NIP report
are used, (2) a vaccinated individual older than 14 years is completely susceptible,
fs = 1, and (3) individuals younger than 15 years with unknown vaccination status
are considered to be vaccinated (four individuals).
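The three assumptions can be written as a small lookup rule. The sketch below is only illustrative: the function name and arguments are hypothetical, the VE value is expected to be read from Table 6.1 by the caller, the age cut-off in (2) is read as "older than 14 years", and unvaccinated individuals are assumed to be fully susceptible, which the text does not state explicitly.

```python
def susceptibility(age, vaccinated, ve_percent=None):
    """Return f_s, the probability that an individual is susceptible.

    age        -- age in years
    vaccinated -- True, False, or None when the status is unknown
    ve_percent -- vaccine effectiveness (in %) taken from Table 6.1 for the
                  individual's age and birth cohort (needed only if age <= 14)
    """
    # Assumption (3): unknown status below 15 years is treated as vaccinated.
    if vaccinated is None and age < 15:
        vaccinated = True
    # Not stated in the text: unvaccinated (or unknown, 15+) are fully susceptible.
    if not vaccinated:
        return 1.0
    # Assumption (2): vaccinated individuals older than 14 are fully susceptible.
    if age > 14:
        return 1.0
    # Assumption (1): f_s = 1 - VE for vaccinated individuals of 14 or younger.
    if ve_percent is None:
        raise ValueError("a VE estimate is required for vaccinated children")
    return 1.0 - ve_percent / 100.0
```

For example, an 8-year-old from the '98 birth cohort (VE of 77% in Table 6.1) would get fs = 0.23 under this rule.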
6.1.2 Likelihood and Posterior Density
The data under consideration consist of dates of symptom onset only; therefore,
a large part of the infectious process is not observed. This makes inference for the
parameters of the transmission model challenging, because the likelihood of the data
given the parameters is intractable for two reasons: (1) it involves summation over all
possible infection times e (O’Neill and Becker, 2001) and (2) all possible networks G
(Britton and O’Neill, 2002). We therefore include the infection times and G as extra
model parameters. The augmented log-likelihood function has three components.
\[
LL = \log\{\pi(o, e \mid \beta_h, \beta_c, \mu, \sigma, G)\} = \sum_h \log\{\pi(o^h, e^h \mid \beta_h, \beta_c, \mu, \sigma, G)\} = LL_1 + LL_2 + LL_3
\]
We present these components for one household h and omit the superscript h for
clarity. First, the contribution from the infections is given by
\[
LL_1 = \sum_{j \in I - \{1\}} \left( \sum_{t=0}^{e_j - 2} \log\{1 - p_j(t)\} + \log\{p_j(e_j - 1)\} \right),
\]
where pj(t) is the probability that individual j acquires infection from time t to time
t+ 1 (in days):
\[
p_j(t) = f_j \left[ 1 - (1 - \beta_h)^{\sum_{(j,k) \in G} \mathbf{1}_{\{o_k \le t < r_k\}}} (1 - \beta_c) \right].
\]
Similarly, the contribution from the individuals who do not get infected is given by
\[
LL_2 = \sum_{j \in S} \sum_{t=0}^{T-1} \log\{1 - p_j(t)\}.
\]
Finally, the contribution accounting for the incubation process is
\[
LL_3 = \sum_{j \in I} \log\{d_{\mu,\sigma}(o_j - e_j)\},
\]
where dµ,σ is a Gamma density with mean µ and variance σ². By Bayes' theorem, the
posterior log-density is
log{π(βh, βc, µ, σ, e, G|o)} ∝ LL+ log{π(G)}+ log{π(βh, βc, µ, σ)} (6.1)
where π(βh, βc, µ, σ) = π(βh)π(βc)π(µ)π(σ) is the prior density for βh, βc, µ, σ and
π(G) = P (G = G) is specified by the network model.
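As an illustration of how the three likelihood components fit together for a single household, the following minimal sketch evaluates LL1 + LL2 + LL3 for a given contact graph and given infection times. All function and argument names are hypothetical. Times are assumed to be integer days, with day 0 chosen so that every non-primary infection time is at least 1, and the Gamma density is parameterised through shape = (µ/σ)² and rate = µ/σ², which gives mean µ and variance σ².

```python
import math

def gamma_logpdf(x, mean, sd):
    """Log-density of a Gamma distribution with the given mean and standard deviation."""
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def infection_prob(j, t, beta_h, beta_c, f, onset, removal, neighbours):
    """p_j(t): probability that j acquires infection between day t and day t + 1."""
    # Number of contacts of j in G that are infectious at time t.
    n_inf = sum(1 for k in neighbours[j]
                if k in onset and onset[k] <= t < removal[k])
    return f[j] * (1.0 - (1.0 - beta_h) ** n_inf * (1.0 - beta_c))

def household_loglik(beta_h, beta_c, mu, sigma, f, onset, infected_at,
                     removal, neighbours, primary, T):
    """Augmented log-likelihood LL1 + LL2 + LL3 for one household."""
    args = (beta_h, beta_c, f, onset, removal, neighbours)
    ll = 0.0
    for j in neighbours:                      # every household member
        if j in infected_at and j != primary:
            e_j = infected_at[j]
            for t in range(e_j - 1):          # LL1: escape on days 0, ..., e_j - 2
                ll += math.log(1.0 - infection_prob(j, t, *args))
            ll += math.log(infection_prob(j, e_j - 1, *args))
        elif j not in infected_at:
            for t in range(T):                # LL2: escape on days 0, ..., T - 1
                ll += math.log(1.0 - infection_prob(j, t, *args))
    for j in infected_at:                     # LL3: latent period of every case
        ll += gamma_logpdf(onset[j] - infected_at[j], mu, sigma)
    return ll

# Hypothetical household of three with a complete contact graph.
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
onset = {0: 8, 1: 20}                         # symptom onset of the two cases
infected_at = {0: 0, 1: 12}                   # augmented infection times
removal = {j: o + 21 for j, o in onset.items()}
f = {0: 1.0, 1: 1.0, 2: 1.0}
ll = household_loglik(0.01, 0.002, 9.0, 5.0, f, onset, infected_at,
                      removal, neighbours, primary=0, T=40)
```

The full augmented log-likelihood would then be the sum of this quantity over all households, to which the log-prior terms of (6.1) are added.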
Independent prior distributions are chosen for βh, βc, µ and σ. An uninformative
uniform U [0, 1] distribution is used for βh and βc. For µ and σ a Gamma distribution
with mean 9 and variance 32 is used, reflecting the belief that the mean incubation
period is around 7-10 days long and reported within the range 4-21 days (CDC and
Ncird, 2015; World Health Organization, 2010).
6.1.3 MCMC Sampling
We use an MCMC algorithm to generate approximate samples from the posterior
density in (6.1). The chain is initialized as follows: βh and βc are drawn from
the uniform distribution U [0, 1] and µ and σ from U [4, 21]. Further, for every
individual j in household h, the time of infection ehj is drawn from U [ohj − 21, ohj − 4]
and the end of the infectious period is fixed at 21 days after the onset of symptoms.
Parameters are updated using single-component Metropolis-Hastings sampling.
In each iteration, we perform the following steps: (1) updating βh and βc, (2)
updating µ and σ, (3) updating G by resampling the household network in 10%
of the households, and lastly (4) updating one infection time per household. To
ensure plausible values, a logit-transformation is applied to βh and βc and a
log-transformation to µ and σ.
For steps (1) and (2), random walk sampling is applied by generating a new
value from the normal distribution centered at the current value with variance δ.
In (1) the value of δ is set at 0.5 and 1.0, and in (2) at 0.1 and 0.2, respectively.
In (3) and (4) independence sampling is used. For step (3) 10% of the households
are randomly selected and in each of these households an edge is either removed from or
added to G, both with equal probability. If no edges can be removed or added,
the proposal is automatically rejected. Lastly, in (4) individual j ∈ Ih is randomly
selected in household h and the infection time ehj is updated using a random walk
with step size 1 conditional on ohj − 21 < ehj < ohj .
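A random-walk update on a transformed scale, as used in step (1), can be sketched as follows. This is a generic single-component Metropolis-Hastings step and not the thesis code: the function names are hypothetical, and the Jacobian term p(1 − p) of the logit transformation is included on the assumption that the target density is expressed on the original probability scale.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_on_logit_scale(current, log_post, step_var, rng=random):
    """One random-walk Metropolis-Hastings update for a parameter in (0, 1).

    current  -- current value of the parameter
    log_post -- function returning the log posterior density (up to a constant)
    step_var -- variance of the normal random-walk step on the logit scale
    """
    z_new = rng.gauss(logit(current), math.sqrt(step_var))
    proposal = expit(z_new)
    if not 0.0 < proposal < 1.0:              # numerical under/overflow: reject
        return current
    # The proposal is symmetric on the logit scale, so only the target ratio
    # and the Jacobian dp/dz = p (1 - p) enter the acceptance probability.
    log_ratio = (log_post(proposal) + math.log(proposal) + math.log1p(-proposal)
                 - log_post(current) - math.log(current) - math.log1p(-current))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return current
```

The same pattern applies to µ and σ with a log-transformation, where the Jacobian term becomes the parameter value itself.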
We perform 200,000 iterations, of which the first 40,000 are discarded
as burn-in. The length of this burn-in period is based on the convergence
diagnostic by Boone, Merrick and Krachey (BMK) (Boone et al., 2012). From the
MCMC samples the posterior medians and 95% credible intervals for all parameters
are estimated.
6.2 Preliminary Results
Trace plots, posterior and prior distributions for the parameters and two selected
network statistics are shown in Figures 6.1 and 6.2. The BMK diagnostic indicates
convergence for all chains (Hellinger distances smaller than 0.5); however, borderline
results are obtained for the mean duration of the incubation period µ. Inspecting
this chain reveals high auto-correlation, suggesting that more iterations are needed.
Posterior correlations between µ and the other parameters are low; however, βh and
βc are inversely correlated (cor(βh, βc) = −0.46). Trace plots of the number of edges
and triangles indicate that the network mixes well. Posterior medians and credible
intervals are presented in Table 6.2. The mean duration of the incubation period is
estimated at 9.4 (8.4, 11.1) days.
Figure 6.1: Trace plot of the MCMC samples for the within-household transmission
probability (βh), the community risk of infection (βc), the mean duration of the incubation
period (µ), the standard deviation of the incubation period (σ) and the number of edges
and triangles in the household contact network G.
Figure 6.2: Prior and posterior distributions for the within-household transmission
probability (βh), the community risk of infection (βc), the mean duration of the incubation
period (µ) and the standard deviation of the incubation period (σ). Dotted lines are prior
distributions.
6.3 Discussion
The model described in this chapter allows inference for disease transmission in a
population of households with underlying social structure within households. In
contrast with existing models, this underlying contact graph is not only informed by
the disease data but also by empirical contact data.
Table 6.2: Posterior median and 95% posterior credible intervals for the model
parameters.
Parameter                                            Median    95% CI
Within-household transmission βh (days⁻¹)            0.0094    (0.0072, 0.012)
Community transmission βc (days⁻¹)                   0.0019    (0.0011, 0.0029)
Mean duration of incubation period µ (days)          9.4       (8.4, 11.1)
Standard deviation of incubation period σ (days)     5.0       (4.0, 5.7)
Data on symptom onset times of pertussis in households with a laboratory
confirmed index case were used to illustrate our approach. The likelihood of the data
was numerically intractable, therefore data augmentation was used via an MCMC
approach. By assuming a fixed length of the infectious period, it was possible to
estimate transmission parameters on two levels (within households and from the
community) and the duration of the latent period, which was assumed to be equal to
the incubation period. Our estimates were obtained using vague prior distributions,
except for the prior distribution for the duration of the latent period, which was somewhat
more informative.
Our approach has some limitations and assumptions that need to be discussed.
First, we only observe infected households with a hospitalized index case. We dealt
with this by assuming all households to be independent and conditioning on the
state of the household at the beginning of follow-up. Another possibility would have
been to assume that the observed data is a sample from the population (O’Neill,
2009), however this would require making assumptions for the uninfected households
i.e. which size do these households have and how many are there? Cauchemez
et al. (2004) showed in a similar household setting that changing the proportion of
uninfected households did not affect the bias in the community risk. Second, we
only considered those households in which all infected persons (as determined by
PCR/serology) had a clearly defined day of symptom onset. This way, asymptomatic
cases were ignored although they could have an impact on household transmission
rates. Although the proportion of cases with missing symptom onset date in our
data was fairly small (13%), further research is necessary to determine the effect of
asymptomatic pertussis cases. Third, we assumed that the contact networks derived
from the Flemish household contact study are appropriate for the estimation of
pertussis transmission in a population of households in the Netherlands. We believe
that the difference between the two study populations will not have a large effect on
the estimation, since both studies focused on households with young children. Also, as
in Chapter 5, we assumed physical contacts to be a proxy for potential transmission
events for a close-contact infection such as pertussis. However, a different contact
definition might be more appropriate and the model for the contact network would
have to be re-evaluated accordingly.
There are still some aspects in this work that need to be studied. To assess
the adequacy and the fit of the model to the data, we will simulate epidemics in a
community of households similar in structure to the pertussis data with parameters
drawn from the posterior distributions (see also Section 5.3). Further, we will repeat
the estimation by assuming random mixing within households and compare the
results with our approach. We assumed a fixed duration of the infectious period of 21
days according to the literature (CDC and Ncird, 2015; World Health Organization,
2010). A sensitivity analysis is necessary to determine the impact of this value on the
obtained estimates. Lastly, updating the contact network in our approach requires
evaluating the likelihood of the network implied by the ERGM. This is very time
consuming and we will need to investigate the efficiency of the algorithm in more
depth and assess whether it is worthwhile to include network updates instead of
using a fixed empirically-based contact network.
Chapter 7
Spatiotemporal Evolution of Ebola Virus Disease at Sub-national Level during the 2014 West Africa Epidemic: Model Scrutiny and Data Meagreness
The Ebola outbreak of 2014 is the most widespread outbreak of EVD in history,
causing a huge number of cases and deaths. Due to the nature of this outbreak
as a global public health threat, a large number of models have been published
that aimed to estimate epidemiological parameters, and to forecast the evolution
of the epidemic. These models were mainly deterministic, SEIR transmission
models (Althaus et al., 2015; Fisman et al., 2014; Gomes et al., 2014; Nishiura
and Chowell, 2014). Most models, and especially the ones early in the outbreak,
were fitted on reported cumulative national data. Doing so, they did not account
for the transmission heterogeneity of this outbreak and the serial correlation
induced by the accumulation of data. However, in the course of the outbreak,
others highlighted the importance of the spatial and temporal heterogeneity of the
outbreak, questioning assumptions made by early models (Chowell et al., 2014b).
A study by King et al. (2015) illustrated through simulations that deterministic
models, fitted on cumulative incidence data, lead to substantial underestimation of
the uncertainty in estimates and forecasts. In addition, the models were
often fitted without taking the serial correlation into account. The clustered pattern
of transmission could be attributed to variability in transmission settings (e.g.
healthcare facilities, households, burials) (Merler et al., 2015), behavior (e.g. expres-
sions of mistrust) and control measures (e.g. contact tracing and monitoring and
establishment of a treatment center). However, there is still a lack of insight in the
relative contribution of each factor to the transmission pattern (Chowell et al., 2014a).
A good understanding of the outbreak transmission can support an efficient
allocation of resources at national and at district level. With the study discussed in
this chapter, we aimed to develop a model that would overcome previously identified
model limitations, including the unused district-level data and the assumption of
homogeneous transmission across districts. Our two-stage model is based on publicly
available data (described in Section 2.1.4) that aimed to improve the information
for operational decisions to control the epidemic. The first stage is the use of a
growth model that addresses the spatiotemporal correlation. The second stage is
the use of a compartmental model - whenever deemed appropriate - that provides a
district-specific estimation of the effective reproduction number and its uncertainty.
In addition, we performed a sensitivity analysis to study the effect of the model
assumptions on the parameter estimates.
This chapter covers the study in Santermans et al. (2016a). Section 7.1 dis-
cusses the growth model and corresponding results, while Section 7.2 provides the
model description, estimation procedure and results for the compartmental model.
To assess the sensitivity of the results in Section 7.2, a sensitivity analysis is presented
in Section 7.3. We conclude with a discussion in Section 7.4.
7.1 Growth model
7.1.1 Model Description
To compare growth patterns over time among districts, we use a flexible spatiotem-
poral growth rate model across all districts. The weekly number of new infections in
each district is modeled via a count-distribution allowing for possible overdispersion.
The expected number of cases is modeled using a spatiotemporal function that makes
a distinction between the temporal process and the spatial process. While the number
of cases in a district is allowed to depend on the number of cases in this district
the week before, the number of cases also depends on the number of cases in the
neighboring districts. The growth rate is obtained numerically as the derivative of
the expected number of cases. This model can be written as:
\[
\begin{aligned}
I_i(t) &\sim \mathrm{NegBin}(\mu_i(t)), \\
\log(\mu_i(t)) &= \beta_0 + \beta_1 C_1 + \beta_2 C_2 + f_i(t),
\end{aligned}
\]
in which Ii(t) is the number of newly infected cases in week t and district i, C1 is an
indicator variable for Sierra Leone, C2 for Guinea and fi(t) is defined as a separable
spatiotemporal function:
\[
f_i(t) = x_{i,t} = \phi x_{i,t-1} + \varepsilon_{i,t},
\]
with φ a scalar and ε a Gaussian spatial random walk process, defined as
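\[
\varepsilon_{i,t} \mid \varepsilon_{j,t}, \tau \sim N\left( \frac{1}{n_i} \sum_{i \sim j} \varepsilon_{j,t}, \; \frac{1}{n_i \tau} \right),
\]
where ni is the number of neighbors of district i, τ is a precision parameter and i ∼ j
indicates that districts i and j are neighboring districts.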
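To make the spatial conditional distribution concrete, the following short sketch computes its mean and variance for one district at a fixed week. The function name and the dictionary-based data layout are hypothetical choices for illustration only.

```python
def spatial_conditional(i, eps, neighbours, tau):
    """Mean and variance of eps_i given the values in the neighbouring districts.

    eps        -- dict mapping district to its current value of epsilon
    neighbours -- dict mapping district to the list of adjacent districts
    tau        -- precision parameter
    """
    nb = neighbours[i]
    mean = sum(eps[j] for j in nb) / len(nb)   # average over the n_i neighbours
    variance = 1.0 / (len(nb) * tau)           # shrinks as n_i or tau grows
    return mean, variance
```

A district with many neighbours is therefore pulled more strongly towards the local average than a district with few neighbours.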
While Markov Chain Monte Carlo (MCMC) methods are often used to estimate
the parameters of interest in this model, they are computationally intensive.
Therefore, we use Integrated Nested Laplace Approximation (INLA; Rue et al.
(2009)) as an alternative estimation method. The INLA approach is a fast Bayesian
inference tool that uses accurate approximations to the densities of the hyperparam-
eters and latent variables in the model.
This model allows the estimation of the district-specific expected number of
new cases per week exp(β0 + β1C1 + β2C2 + fi(t)), the district-specific time trend fi(t),
the district-specific growth rate dfi(t)/dt (estimated as (xi,t+1 − xi,t−1)/2) and the spatial
distribution of the growth rate within the three countries. In addition, to investigate
the effect of implemented intervention measures on the estimated growth rates, for
each district a Pearson’s Chi-square test was used. Doing so, we tested, for different
time lags, the association between positive and negative growth rates and the absence
or presence of aforementioned intervention measures. A growth rate distribution heat
map was made as a method to visualize the weekly change for each district-specific
rate of infection with an overlay of intervention measures.
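The numerical growth rate is a central difference of the fitted weekly trend. A minimal sketch, with a hypothetical function name and a plain list holding the fitted values for one district:

```python
def growth_rates(x):
    """Central-difference growth rate (x[t+1] - x[t-1]) / 2 for the interior weeks.

    x -- list of fitted trend values x_{i,t} for one district, ordered by week
    """
    return [(x[t + 1] - x[t - 1]) / 2.0 for t in range(1, len(x) - 1)]
```

The first and last week have no two-sided neighbours, so the returned list is two entries shorter than the input; a positive entry corresponds to an increasing number of weekly cases.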
7.1.2 Results
Results of the growth rate model are shown in the heat map in Figure 7.1.
Figure 7.1: Estimated weekly growth rates per district and implemented intervention
measures for Guinea, Sierra Leone and Liberia, 2014-2015. Red colours indicate an increase
in number of weekly cases, whereas blue colours indicate a decline. Periods for which no
reported cases are available are shown in white. A light dot indicates that a triage, holding
centre or CCC is in place and a dark dot indicates that an ETU or ETU and CCC are in
place.
Comparing the growth rates in the different districts (rows), it is clear that the
outbreak did not evolve uniformly over districts. Pearson’s Chi-square test for
decrease in growth rate after implementation of control measures, for different time
lags, did not reveal any insights (results not shown). Figure 7.2 shows the estimated
growth rates and implemented intervention measures for four selected time points on
a geographical map of West Africa. This figure emphasizes the spatial heterogeneity
of the outbreak, even within countries. To complement the heat map, the cumulative
number of cases and deaths are shown in Figures 7.3 and 7.4, respectively. These
figures make it possible to identify the most affected regions.
Figure 7.2: Estimated growth rate per district and implemented intervention measures
during week 21 and 40 of 2014 and week 8 and 26 of 2015. ‘1’ triage, holding centre or
CCC is in place; ‘2’ ETU or ETU plus CCC is in place.
Figure 7.3: Cumulative cases per district and implemented intervention measures. A light
dot indicates that a triage, holding centre or CCC is in place and a dark dot indicates that
an ETU or ETU and CCC are in place.
Figure 7.4: Cumulative deaths per district and implemented intervention measures. A
light dot indicates that a triage, holding centre or CCC is in place and a dark dot indicates
that an ETU or ETU and CCC are in place.
7.2 Compartmental model
To address within-district disease evolution over time, district-specific compartmental
models were fitted to the number of newly reported cases and deaths.
7.2.1 Model Description
We use a version of a Susceptible-Exposed-Infected-Recovered (SEIR) model that
incorporates disease-related mortality by making the distinction between survivors
and non-survivors. It also takes into account an underreporting factor for cases and
deaths. This model is depicted as a flow diagram in Figure 7.5.
Figure 7.5: Flow diagram for the SEIR model with distinction between cases that survive
and fatal cases.
Hence, we assume that individuals are born susceptible (S) to infection. Then, as time
progresses they may become infected and move to the exposed compartment (E) at
a time-dependent transmission rate β(t). After the exposed stage, they become in-
fectious and a proportion 1− φ, that will eventually recover, moves to the infectious
IR compartment after a mean latent period 1/γ. The proportion of fatal cases, φ,
moves to the ID compartment at the same rate. Individuals in the IR compartment
recover after a mean infectious period 1/σ. Lastly, α denotes the disease-related mor-
tality rate. This model can be expressed by the following set of ordinary differential
equations:
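\[
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta(t) S(t) \frac{I_R(t) + I_D(t)}{N(t)}, \\
\frac{dE(t)}{dt} &= \beta(t) S(t) \frac{I_R(t) + I_D(t)}{N(t)} - \gamma E(t), \\
\frac{dI_R(t)}{dt} &= (1 - \phi) \gamma E(t) - \sigma I_R(t), \\
\frac{dI_D(t)}{dt} &= \phi \gamma E(t) - \alpha I_D(t), \\
\frac{dR(t)}{dt} &= \sigma I_R(t), \\
\frac{dD(t)}{dt} &= \alpha I_D(t).
\end{aligned}
\]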
In this notation N(t) = S(t) + E(t) + IR(t) + ID(t) + R(t) denotes population size.
The initial conditions at time t = 0 are given by R(0) = 0, IR(0) = ID(0) = 0, E(0)
is an unknown parameter which is estimated from the data and S(0) = N(0)− E(0)
where N(0) is the population size at the start of the epidemic. The expression for the
effective reproduction number R = Re for this model is given by:
\[
R_e(t) = \beta(t) \left( \frac{\phi}{\sigma} + \frac{1 - \phi}{\alpha} \right).
\]
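For illustration, the system of differential equations can be integrated numerically as sketched below. This is not the estimation code of the study: the function names are hypothetical, a fixed-step fourth-order Runge-Kutta scheme is swapped in for whichever solver was actually used, β(t) is held fixed within each step, and the population size, E(0) and β in the usage example are made-up values. The rates are taken as the reciprocals of the prior period lengths quoted in Section 7.2.2 (9.4, 16.4 and 7.5 days).

```python
def seir_rhs(state, beta, gamma, sigma, alpha, phi):
    """Right-hand side of the six equations for (S, E, I_R, I_D, R, D)."""
    S, E, IR, ID, R, D = state
    N = S + E + IR + ID + R                   # population size as defined in the text
    force = beta * S * (IR + ID) / N
    return (
        -force,
        force - gamma * E,
        (1.0 - phi) * gamma * E - sigma * IR,
        phi * gamma * E - alpha * ID,
        sigma * IR,
        alpha * ID,
    )

def rk4_step(state, dt, *params):
    """Advance the state by one Runge-Kutta step of length dt."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = seir_rhs(state, *params)
    k2 = seir_rhs(shifted(k1, dt / 2.0), *params)
    k3 = seir_rhs(shifted(k2, dt / 2.0), *params)
    k4 = seir_rhs(shifted(k3, dt), *params)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(beta_of_t, gamma, sigma, alpha, phi, n0, e0, days, dt=0.1):
    """Integrate from the initial conditions S(0) = N(0) - E(0), all else zero."""
    state = (n0 - e0, e0, 0.0, 0.0, 0.0, 0.0)
    for step in range(int(round(days / dt))):
        beta = beta_of_t(step * dt)           # beta held fixed within one step
        state = rk4_step(state, dt, beta, gamma, sigma, alpha, phi)
    return state

# Hypothetical run: constant beta, 100 days, population of 100,000.
final = simulate(lambda t: 0.3, 1 / 9.4, 1 / 16.4, 1 / 7.5, 0.5,
                 n0=100000.0, e0=5.0, days=100)
```

Because the six derivatives sum to zero, the total S + E + IR + ID + R + D stays at N(0), which is a convenient check on any implementation.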
7.2.2 Estimation Procedure
Fitting the ODE is done taking into account the specific reporting of cases and
deaths. Reporting occurs at varying time intervals. Figure 7.6 schematically shows
the reporting scheme of cases. The reporting scheme for deaths is similar but the
dates at which reporting occurs are not necessarily the same.
Figure 7.6: Schematic representation of reporting of case notifications.
The data consist of the cumulative numbers of (suspected, probable and confirmed)
cases and deaths. Hence, these data are expected to increase monotonically over time.
However, due to reclassification of suspected cases over time, the cumulative number
of cases decreases at certain time points, resulting in negative numbers of new cases.
We therefore applied the pool adjacent violators algorithm (PAVA) to
monotonize the cumulative data.
Denote the cumulative number of cases by ci, i = 1, ..., n. Suppose that i∗ is
the first index for which ci∗ < ci∗−1, i.e. the first index for which the monotone
behavior is violated. The PAVA now states that these values need to be “pooled”.
Hence, ci∗ and ci∗−1 are both replaced by (ci∗ + ci∗−1)/2. The algorithm then proceeds by
recursively checking monotone behavior and by pooling if necessary and stops when
monotonicity is achieved.
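The pooling procedure can be sketched as follows. The function name is hypothetical, and the sketch uses the standard block formulation of the algorithm, in which each pooled block takes the mean of all its members; for an isolated violation this coincides with the pairwise average described above.

```python
def pava(values):
    """Pool adjacent violators: return a non-decreasing version of `values`."""
    blocks = []                               # each block is [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge the last two blocks while the earlier mean exceeds the later one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    result = []
    for total, count in blocks:
        result.extend([total / count] * count)  # every member gets the block mean
    return result
```

Applied to a cumulative series such as 10, 12, 11, 9, 15, the three middle values are pooled to their common mean, after which weekly increments can no longer be negative.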
We assume
yj ∼ NegBin(ρ× (Inew(tj − 1)− Inew(tj − hj − 1)), φ1),
dj ∼ NegBin(ρ× (m(tj − 1)−m(tj − hj − 1)), φ2),
where yj and hj are defined as in Figure 7.6 for cases and dj is the equivalent of yj for
deaths. Inew(t) (m(t)) is the expected cumulative number of cases (deaths) at time t
obtained by solving dInew(t)/dt = γE(t) (dm(t)/dt = αID(t)), ρ is the expected fraction of
reported cases (deaths) and φi; i = 1, 2 are overdispersion parameters. The objective
function is then given by the sum of the negative binomial loglikelihoods specified
above. Further, we model Re(t) as a piecewise constant function Re(i) as follows:
Re(0) = R0, Re(i) = R0 + r1 + ...+ ri; i = 1, ..., n
Such that ri is the change in reproduction number compared to the previous time
interval. This implies that βi = Re(i)/(φσ + (1−φ)
α
); i = 0, ..., n is also piecewise
constant. The length of the intervals is chosen to be 21 days.
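The building blocks of this objective function can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published code: the negative binomial is assumed to be parameterized by its mean and a size parameter (variance = mean + mean²/size), which the text does not specify, and all function names and example numbers are hypothetical. The scaling between Re(i) and βi is taken exactly as written in the text.

```python
import math


def negbin_logpmf(y, mean, size):
    """Log-pmf of a negative binomial; assumed variance = mean + mean**2 / size."""
    p = size / (size + mean)
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(p) + y * math.log1p(-p))


def interval_loglik(y, cum_end, cum_start, rho, size):
    """Contribution of one report: y new cases over an interval of the cumulative curve."""
    expected = rho * (cum_end - cum_start)  # reported fraction of the modeled increase
    return negbin_logpmf(y, expected, size)


def reproduction_numbers(r0, changes):
    """Re(0) = R0 and Re(i) = R0 + r_1 + ... + r_i."""
    values = [r0]
    for r in changes:
        values.append(values[-1] + r)
    return values


def transmission_rates(re_values, phi, sigma, alpha):
    """beta_i = Re(i) / (phi/sigma + (1 - phi)/alpha), following the text."""
    scale = phi / sigma + (1.0 - phi) / alpha
    return [re / scale for re in re_values]


def beta_at(t, betas, interval_days=21):
    """Piecewise constant beta(t); the last value is reused beyond the final interval."""
    index = min(int(t // interval_days), len(betas) - 1)
    return betas[index]


# Hypothetical values: R0 = 2 followed by three changes r_1, r_2, r_3.
re_values = reproduction_numbers(2.0, [-0.5, -0.5, 0.25])
betas = transmission_rates(re_values, phi=0.5, sigma=1 / 16.4, alpha=1 / 7.5)
example_loglik = interval_loglik(12, cum_end=140.0, cum_start=100.0, rho=0.33, size=0.76)
```

Summing `interval_loglik` over all case and death reports would give the objective function described above.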
Prior estimates for the latent period (9.4 days) and the infectious period for
survivors (16.4 days) and deceased (7.5 days) are used following Lewnard et al. (2014).
The remaining parameters (φ1, φ2, E(0), φ, ρ, R0, ri; i = 1, ..., n) are estimated via
Markov chain Monte Carlo using the adaptive-mixture Metropolis algorithm. We
conducted 2,500,000 iterations, retaining every 500th iteration. Burn-in is based on
the BMK convergence diagnostic. The MCMC procedure, which we made publicly
available, was performed in R 3.1.1 using the LaplacesDemon package (Roberts
and Rosenthal, 2009; Rosenthal, 2007). The univariate prior distributions are given
in Table 7.1. Of these, the prior distributions for φ1, φ2, E(0), R0 and ri are
uninformative. The underreporting rate ρ is assumed to follow a truncated normal
distribution with mean 0.33 based on Centers for Disease Control and Prevention
(2014), and the case fatality ratio φ follows a beta distribution with mean 0.5.
Table 7.1: Prior distributions.
Parameter  Definition                                Prior distribution
φ1         Overdispersion parameter cases            HC(α = 25)
φ2         Overdispersion parameter deaths           HC(α = 25)
E(0)       Number of exposed individuals at time 0   U(0, 1)
φ          Case fatality ratio                       Beta(α = 10, β = 10)
ρ          Underreporting rate                       N(µ = 1/3, δ = 0.1); truncated(0, 1)
R0         Reproduction number 1st time period       U(0, 10)
ri         Changes in reproduction number            U(−2, 2)
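As an illustration, the joint log-prior implied by Table 7.1 could be evaluated as in the sketch below. Two readings are assumptions on our part: HC(α) is taken to be a half-Cauchy distribution with scale α, and δ is taken to be the standard deviation of the normal distribution before truncation. The dictionary keys and function names are hypothetical.

```python
import math


def log_half_cauchy(x, scale):
    if x <= 0:
        return -math.inf
    return math.log(2.0) - math.log(math.pi * scale) - math.log1p((x / scale) ** 2)


def log_uniform(x, low, high):
    return -math.log(high - low) if low <= x <= high else -math.inf


def log_beta(x, a, b):
    if not 0.0 < x < 1.0:
        return -math.inf
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_truncated_normal(x, mean, sd, low, high):
    if not low <= x <= high:
        return -math.inf
    # Probability mass of the untruncated normal inside [low, high].
    mass = normal_cdf((high - mean) / sd) - normal_cdf((low - mean) / sd)
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi)) - math.log(mass)


def log_prior(theta):
    """Sum of the independent univariate log-priors of Table 7.1."""
    terms = [
        log_half_cauchy(theta["phi1"], 25.0),
        log_half_cauchy(theta["phi2"], 25.0),
        log_uniform(theta["E0"], 0.0, 1.0),
        log_beta(theta["cfr"], 10.0, 10.0),
        log_truncated_normal(theta["rho"], 1.0 / 3.0, 0.1, 0.0, 1.0),
        log_uniform(theta["R0"], 0.0, 10.0),
    ]
    terms.extend(log_uniform(r, -2.0, 2.0) for r in theta["r"])
    return sum(terms)


# Hypothetical parameter vector inside the support of every prior.
example = {"phi1": 0.8, "phi2": 3.0, "E0": 0.4, "cfr": 0.6,
           "rho": 0.33, "R0": 2.5, "r": [-0.3, 0.1]}
example_log_prior = log_prior(example)
```

A parameter value outside the support of its prior yields a log-prior of minus infinity, so such proposals would always be rejected by a Metropolis-type sampler.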
7.2.3 Results
We show the obtained results for a selection of rural and urban districts: Forecariah
(Guinea), Conakry (Guinea), Western Area Urban (Sierra Leone), and Grand Cape
Mount (Liberia). This selection was based on events of interest during the course
of the outbreak, e.g. a sudden increase in cases. We were, however, also restricted
by inconsistencies in the data, which are discussed as a limitation of the model in
Section 7.4. Parameter estimates are given in Table 7.2.
Table 7.2: Parameter estimates with 95% posterior credible intervals.
District             φ1                 φ2                 E(0)               φ                  ρ
Forecariah           0.76 [0.54, 1.08]  3.18 [1.70, 7.19]  0.44 [0.07, 0.96]  0.66 [0.54, 0.77]  0.33 [0.13, 0.53]
Conakry              0.62 [0.44, 0.89]  1.61 [1.03, 2.60]  61.4 [20.6, 97.7]  0.53 [0.41, 0.67]  0.34 [0.17, 0.54]
Western Area Urban   2.34 [1.71, 3.20]  4.17 [2.31, 8.21]  0.55 [0.10, 0.98]  0.19 [0.16, 0.22]  0.35 [0.16, 0.55]
Grand Cape Mount     0.72 [0.51, 1.00]  0.62 [0.43, 0.90]  0.54 [0.08, 0.98]  0.62 [0.48, 0.77]  0.33 [0.10, 0.54]
The observed and estimated numbers of new and cumulative cases and deaths are
shown in Figure 7.7. From this figure, we observe that the model fits both the number
of cases and the number of deaths relatively well. The fit to the cumulative
numbers is also reasonable. The estimated effective reproduction
numbers over time are shown in Figure 7.9. Re(t) ranges from below one up to
3.5. Furthermore, estimates are below one for all four districts in the last time period.
However, the 95% credible intervals indicate substantial variability.
Figure 7.7: Observed (black) and estimated (blue) number of new cases (top left), new
deaths (top right), cumulative cases (bottom left) and cumulative deaths (bottom right)
per district. Dashed lines are 95% credible intervals.
We retrospectively assessed the quality of three-week predictions made at four
different time points for the selected districts by comparing these predictions
with the observed numbers of cases and deaths. Results of these short-term
predictions are presented in Figure 7.8 for Western Area Urban.
Figure 7.8: Three-week prediction of new cases (left) and deaths (right) for Western Area
Urban at 24 October, 14 November, 5 December and 26 December 2014 (top to bottom).
Light blue regions are the predicted time periods and estimation is based on all data before
that time point.
Note that the credible intervals do not contain all data points. Hence, even within a
3-week forecast period, the models are not always able to capture all the trends.
Figure 7.9: Estimated reproduction number per district with 95% posterior intervals.
The threshold value of one is indicated by a red horizontal line.
7.3 Sensitivity Analysis
To assess the sensitivity of our results in Section 7.2 to the model assumptions, a few
additional models were fitted to the data of Nzerekore, Guinea. We investigated the
estimability of the fixed parameters, the effect of the number of exposed individuals
at time 0, transmission through contacts with bodies of dead people, and protective
immunity by asymptomatic infections. We compared the models using the Deviance
Information Criterion (DIC). We use the following notation:
Model 1: the model described in Section 7.2
Models 2a-2f : fixing E(0) and varying its value from 0.01 to 10
Model 3: estimating E(0) with uninformative prior U(0, 1000)
Model 4: estimating E(0) with uninformative prior U(0, 1000) and fixing R0 to 2.00
Model 5: estimating E(0) with uninformative prior U(0, 1000) and fixing ρ to 0.33
Model 6: fixing E(0) and estimating the latent period
Model 7: fixing E(0) and estimating the infectious period of non-fatal cases
Model 8: fixing E(0) and estimating the infectious period of fatal cases
Model 9: fixing E(0) and estimating the underreporting of deaths, ρdeaths, separately
Models 2 to 5 look at the effect of the parameter E(0), the number of exposed
individuals on 23 May, 2014. In Models 6 to 9 we look at the estimation of several fixed
parameters. In Models 10 to 13c we take into account that EVD can be transmitted
through contact with the bodies of dead people. This model is expressed in the
following set of differential equations.
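\[
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta(t) S(t) \frac{I_R(t) + I_D(t) + m D_I(t)}{N(t)} \\
\frac{dE(t)}{dt} &= \beta(t) S(t) \frac{I_R(t) + I_D(t) + m D_I(t)}{N(t)} - \gamma E(t) \\
\frac{dI_R(t)}{dt} &= (1-\phi)\gamma E(t) - \sigma I_R(t) \\
\frac{dI_D(t)}{dt} &= \phi\gamma E(t) - \alpha I_D(t) \\
\frac{dR(t)}{dt} &= \sigma I_R(t) \\
\frac{dD_I(t)}{dt} &= \alpha I_D(t) - \kappa D_I(t) \\
\frac{dD_R(t)}{dt} &= \kappa D_I(t)
\end{aligned}
\]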
Hence, when an individual dies from EVD, the body of that individual can transmit
the disease (state DI) for a period of time (1/κ) with transmission rate mβ(t). It
then moves to state DR where transmission is no longer possible e.g. after burial.
Model 10: uninformative prior U(0, 20) for m and Gamma distribution with
mean 2 days and standard deviation 1.5 days for 1/κ
Models 11a-11b: fixing m = 1, 2
Model 12: fixing 1/κ = 2 days
Models 13a-13c: fixing 1/κ = 2 days and m = 0.1, 0.5, 1
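To make the structure of this extended model concrete, the following sketch integrates the system with a classical fourth-order Runge-Kutta scheme. It is an illustration only: N(t) is assumed to be the number of living individuals, β is held constant for simplicity, and the values of β, φ, m and the initial state are hypothetical. The rates 1/γ = 9.4, 1/σ = 16.4 and 1/α = 7.5 days and 1/κ = 2 days are taken from the text.

```python
def derivatives(state, beta, gamma, sigma, alpha, phi, kappa, m):
    """Right-hand side of the model with transmission from dead bodies."""
    S, E, IR, ID, R, DI, DR = state
    N = S + E + IR + ID + R  # assumption: N(t) counts living individuals only
    force = beta * (IR + ID + m * DI) / N
    return [
        -force * S,
        force * S - gamma * E,
        (1 - phi) * gamma * E - sigma * IR,
        phi * gamma * E - alpha * ID,
        sigma * IR,
        alpha * ID - kappa * DI,
        kappa * DI,
    ]


def rk4_step(state, dt, **params):
    """One classical Runge-Kutta step of length dt."""
    def shifted(k, factor):
        return [s + factor * dt * ki for s, ki in zip(state, k)]

    k1 = derivatives(state, **params)
    k2 = derivatives(shifted(k1, 0.5), **params)
    k3 = derivatives(shifted(k2, 0.5), **params)
    k4 = derivatives(shifted(k3, 1.0), **params)
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


# Hypothetical setting: one exposed individual in a district of 100,000.
params = dict(beta=0.2, gamma=1 / 9.4, sigma=1 / 16.4, alpha=1 / 7.5,
              phi=0.5, kappa=1 / 2.0, m=1.0)
state = [100000.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
initial_total = sum(state)
dt = 0.1
for _ in range(1000):  # 100 days
    state = rk4_step(state, dt, **params)
```

Because every outflow of one compartment is an inflow of another, the sum over all seven compartments stays constant, which gives a simple check on the implementation.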
Finally, since there is evidence of asymptomatic Ebola infections (Bellan et al., 2014),
we assess the effect of protective immunity by asymptomatic infections in Models
14a-14e. These correspond to the following set of ODEs:
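\[
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta(t) S(t) \frac{I_R(t) + I_D(t)}{N(t)} \\
\frac{dE(t)}{dt} &= \beta(t) S(t) \frac{I_R(t) + I_D(t)}{N(t)} - \gamma E(t) \\
\frac{dI_R(t)}{dt} &= (1-p)(1-\phi)\gamma E(t) - \sigma I_R(t) \\
\frac{dI_D(t)}{dt} &= (1-p)\phi\gamma E(t) - \alpha I_D(t) \\
\frac{dR(t)}{dt} &= \sigma I_R(t) \\
\frac{dD(t)}{dt} &= \alpha I_D(t) + p\gamma E(t)
\end{aligned}
\]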
where p is the proportion of asymptomatic cases.
Models 14a-14e: fixing p = 0.1, 0.2, 0.3, 0.4, 0.5
The results are given in Tables 7.3 and 7.4. Looking at the DIC values of
models 2a-2f, we see that the differences are very small, indicating that E(0) is
not estimable from the data. However, for large values of E(0) (see model 2f),
optimization leads to a local maximum with a very small reporting rate and high
values of Re, which are deemed implausible. Moreover, mixing in this model is
very poor and convergence is not attained. The same is observed when estimating
E(0) with an uninformative prior (model 3), even when R0 is kept constant (model
4). In model 5 the underreporting rate is fixed; this leads to convergence and
good results, but there is no improvement in DIC compared to model 1. For
this reason, we chose to estimate E(0) between 0 and 1 in our final model. Note
that the value of E(0) in the converged models only affects the estimates of Re
in the first time periods, making the most recent estimates robust to changes in E(0).
In models 6, 7 and 8 the latent period and infectious periods are estimated,
but again this leads to poor convergence and DIC does not improve. In model 9 we
explored whether a different underreporting rate for deaths could be estimated, but
poor mixing and high autocorrelation for that parameter indicated that this is not
possible. Again, the most recent estimates of Re are quite robust in converged models.
In models 10 to 13c we look at transmission from dead bodies. When estimating
both parameters (m and κ), m is estimated to be 0 and the model does
not converge. The same result is obtained when fixing 1/κ to 2 days (model 12).
Hence, m is not estimable from the data. We therefore fix m to different values, both
estimating and fixing κ (models 11 and 13), but this does not lead to improvement
in DIC or large changes in recent estimates of Re.
Finally, in models 14a-14e, we do see an improvement in DIC with a growing
proportion of asymptomatic cases, suggesting that taking into account the possibility
of asymptomatic cases is consistent with the observations.
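For readers unfamiliar with the criterion, the sketch below computes DIC from posterior draws of the deviance (minus twice the log-likelihood). The text does not state which estimate of the effective number of parameters was used; the sketch assumes the variant that takes half the posterior variance of the deviance, and the deviance draws are hypothetical.

```python
from statistics import mean, variance


def dic_from_deviance(deviance_draws):
    """Return (DIC, mean deviance, pD) with pD assumed to be var(deviance) / 2."""
    d_bar = mean(deviance_draws)
    p_d = variance(deviance_draws) / 2.0  # sample variance of the retained draws
    return d_bar + p_d, d_bar, p_d


# Hypothetical retained deviance draws of one fitted model.
draws = [450.0, 452.0, 454.0, 456.0]
dic, d_bar, p_d = dic_from_deviance(draws)
```

A lower DIC indicates a better trade-off between fit (mean deviance) and complexity (pD), which is how the models in Tables 7.3 and 7.4 are compared.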
7.4 Discussion
The results of our study strengthen the evidence of a strong temporal and spatial
variability of the EVD transmission at a sub-national level in the affected regions of
Guinea, Sierra Leone and Liberia. The variable transmission dynamics are a major
challenge for the implementation of intervention measures and the mobilization of
resources among districts. This complexity highlights the importance of constant
monitoring and the usefulness of quantitative tools, thereby taking full account of
the uncertainty, to inform the response.
Our growth model quantifies spatiotemporal transmission patterns at a sub-national
level, which cannot be derived from visual inspection of incidence curves and maps
alone. The visualization of the growth rates with a two-dimensional (time and space)
heatmap is useful for decision makers to make evidence-based decisions on
resource allocation. On the other hand, our compartmental model allows the calculation
of a quantitative measure of transmission, Re(t), that can be used to compare and
communicate about differences in outbreak dynamics between districts and over time.
The combined model illustrates how district-level data can be used to gain a
quantitative insight in the complex outbreak dynamics. Both models show how
the trend varies widely among the districts and changes quickly in time and space
(Figures 7.1 and 7.9). While our estimates of Re(t) are within the range of published
estimates, most of the published estimates were derived from country-level data and
do not provide the granularity we provide at time-dependent district level. The wide
range of Re(t) between near 0 and 3.5 illustrates the need to complement national
with district data driven models, to support public health action.
Table 7.3: Parameter estimates sensitivity analysis. Fixed values are indicated in bold, asterisks indicate model differences compared to the final model 1.
Model     1        2a       2b       2c       2d       2e       2f       3        4        5        6        7        8        9
E(0)      0.21     0.01*    0.1*     0.2*     0.3*     0.5*     10*      46.74*   310.32*  0.45*    0.28     0.28     0.28     0.1
1/γ       9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      1.92*    9.4      9.4      9.4
1/σ       16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     10.17*   16.4     16.4
1/α       7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      1.03*    7.5
φ         0.58     0.60     0.58     0.58     0.58     0.58     0.56     0.56     0.57     0.59     0.53     0.54     0.57     0.37
ρ         0.32     0.36     0.33     0.32     0.32     0.31     0.0009   0.0009   0.0010   0.33*    0.0009   0.0009   0.32     0.43
ρdeaths   -        -        -        -        -        -        -        -        -        -        -        -        -        1.00*
Re(0)     2.64     3.73     2.88     2.61     2.47     2.29     3.39     2.49     2.00*    2.29     2.92     4.17     2.60     2.85
Re(1)     2.23     2.29     2.12     2.20     2.25     2.32     2.41     2.54     2.70     2.33     1.58     2.60     2.57     2.33
Re(2)     1.94     1.79     1.96     1.96     1.98     1.98     2.41     2.40     2.32     1.96     2.08     2.04     2.12     2.00
Re(3)     1.02     0.98     1.03     1.03     1.03     1.02     1.87     1.89     1.68     1.04     1.59     1.95     1.10     1.04
Re(4)     0.60     0.63     0.61     0.60     0.60     0.59     2.08     2.10     1.75     0.60     2.28     2.40     0.67     0.62
Re(5)     0.37     0.35     0.36     0.37     0.37     0.38     2.20     2.28     1.75     0.37     2.42     2.74     0.46     0.36
Re(6)     0.26     0.27     0.27     0.25     0.26     0.26     2.17     2.22     1.66     0.26     3.09     2.86     0.24     0.26
Re(7)     0.24     0.23     0.24     0.24     0.24     0.24     1.91     2.03     1.23     0.24     2.75     2.86     0.16     0.24
Re(8)     0.38     0.40     0.41     0.43     0.40     0.40     1.94     2.01     1.48     0.43     2.60     2.78     0.44     0.41
DIC       457.57   459.81   456.77   457.13   456.43   456.66   456.37   454.24   454.65   458.66   581.37   463.72   457.25   458.19
Table 7.4: Parameter estimates sensitivity analysis. Fixed values are indicated in bold, asterisks indicate model differences compared to the final model 1.
Model     10       11a      11b      12       13a      13b      13c      14a      14b      14c      14d      14e
E(0)      0.20     0.20     0.20     0.49     0.21     0.19     0.71     0.23     0.26     0.29     0.32     0.36
1/γ       9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4      9.4
1/σ       16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4     16.4
1/α       7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5      7.5
p         -        -        -        -        -        -        -        0.1*     0.2*     0.3*     0.4*     0.45*
1/κ       0.67*    0.56*    0.56*    2.00*    2.00*    2.00*    2.00*    -        -        -        -        -
m         0.00*    1.00*    2.00*    0.00*    0.10*    0.50*    1.00*    -        -        -        -        -
φ         0.58     0.58     0.57     0.57     0.60     0.57     0.55     0.58     0.58     0.59     0.59     0.59
ρ         0.33     0.32     0.32     0.001    0.32     0.32     0.0009   0.32     0.32     0.32     0.33     0.33
Re(0)     2.65     2.60     2.57     3.93     2.61     2.56     4.22     2.95     3.32     3.76     4.41     4.81
Re(1)     2.22     2.15     2.13     2.73     2.18     2.10     2.66     2.49     2.84     3.24     3.78     4.08
Re(2)     1.93     1.91     1.87     2.10     1.92     1.85     2.08     2.13     2.35     2.63     2.98     3.23
Re(3)     1.03     0.99     0.98     1.51     1.01     0.95     1.89     1.17     1.35     1.56     1.85     2.06
Re(4)     0.60     0.60     0.59     1.29     0.59     0.58     2.27     0.66     0.74     0.84     1.01     1.12
Re(5)     0.37     0.36     0.36     0.96     0.37     0.36     2.62     0.43     0.48     0.54     0.62     0.68
Re(6)     0.27     0.25     0.25     0.80     0.25     0.24     2.77     0.28     0.31     0.37     0.43     0.48
Re(7)     0.23     0.24     0.23     0.69     0.23     0.23     2.71     0.28     0.32     0.35     0.42     0.46
Re(8)     0.44     0.40     0.39     1.20     0.41     0.44     2.68     0.43     0.43     0.41     0.48     0.45
DIC       460.04   463.74   460.34   462.43   457.02   463.52   460.75   456.32   456.40   455.31   454.79   453.52
We further show that it is difficult to generate accurate predictions. Forecast
results should be interpreted with caution, as control measures and behavioral
changes cannot be sufficiently quantified with the publicly available data. Also, there
are still gaps in our basic knowledge about the disease spread that could potentially
explain outliers, departing from modeling approaches. We think here, for example,
of the three last reported cases in Liberia; one from suspected sexual transmis-
sion months after the source case recovered from disease (Christie et al., 2015),
and most recently two connected cases without any recognized link to outbreak chains.
One of the limitations of our model is the assumption of constant underre-
porting. Previous studies have also assumed a proportion of underreporting (Merler
et al., 2015). Knowledge about the level and changes in underreporting over time
would improve the estimates of transmission dynamics. Unfortunately we do not have
data to assess the magnitude or the variability of underreporting. Also, inconsistent
reporting with undocumented backlogging and the absence of dates of disease onset
may affect the accuracy of the estimates and need to be taken into consideration
when interpreting the results (Azmon et al., 2014). Furthermore, the district-specific
SEIR model is a mathematical model assuming a deterministic disease process. As a
consequence, the second phase of our approach was deemed inappropriate for some
districts, because the data did not seem to follow any consistent pattern, presumably
due to the aforementioned inconsistencies in detection and reporting and the sporadic
introduction of cases.
EVD can be transmitted through contact with dead bodies; therefore, a model
accounting for this transmission was included in the sensitivity analysis. However,
this model did not improve the fit to the data. Most likely, the extent to which
dead bodies versus cases contribute to transmission is indistinguishable with this
model and requires more information and a fully stochastic modeling approach on
disaggregated data, which is not publicly available.
Since there is evidence suggesting the presence of asymptomatic Ebola infec-
tions (Bellan et al., 2014), we looked at the effect of accounting for protective
immunity by asymptomatic infection. We observed that the model fit improved
with increasing proportion of asymptomatic cases, suggesting that our data do not
reject the hypothetical occurrence of asymptomatic cases. Asymptomatic cases could
partially explain why the epidemic did not reach the expected incidence as predicted
by models ignoring them. This again highlights the need for serological studies in
order to clarify the role of asymptomatic infection.
While our sensitivity analysis assesses the influence of unknown parameters, it
cannot substitute for non-public data. The growth rate and compartmental models
can be run in real time using our published code and dataset, and can be improved
by organizations that have additional data available or to explore adaptations to the
models and parameters. In the end, different modeling approaches bring different
insights and will improve our ability to effectively support public health action. We
recommend that minimal datasets and standards for data processing, including
de-identification, and data sharing be developed for future multi-country outbreaks,
especially Public Health Emergencies of International Concern under the International
Health Regulations. The importance of this was also stated as a conclusion in
a recent research paper on this topic (Sane and Edelstein, 2015).
Our two-stage modeling approach, built with the most detailed publicly available
data, provides time-dependent district-specific quantitative measures of growth
and transmission. We hope that such a tool, in addition to other approaches, can
complement public health action against such devastating events as the West-African
Ebola epidemic.
Chapter 8 Discussion and Further Research
In this thesis, we have presented several strategies incorporating diverse sources of
social contact data to gain greater realism in modeling infectious disease transmission.
In Chapter 3, we studied and extended the traditional social contact hypothesis for
VZV serology in multiple European countries by accounting for differences related
to susceptibility and infectiousness. Goeyvaerts et al. (2010) showed that inference
for infectiousness proportionality factors is not possible based on serology only.
However, we proposed to use the effective reproduction number as a model eligibility
criterion for infections in endemic equilibrium to deal with this indeterminacy. We
concluded that the social contact hypothesis could be improved upon in 10 out
of 12 countries by including age-dependent factors. This could be attributed to
differences in susceptibility and infectiousness between individuals of different age
groups, but also to differences in the estimated social contact rates and the true
contact rates underlying VZV transmission. Estimates of the basic reproduction
number resulting from the best fitting model differed quite substantially between
countries indicating heterogeneity in VZV epidemiology across Europe. From a set of
demographic, socio-economic and spatio-temporal factors, some were found to have a
positive association with R0 (childhood vaccination coverage, child care attendance,
population density and average living area per person), whilst others showed a
negative association (income inequality, poverty, breast feeding, and the proportion
of children under 14). Interpretation of these associations is not straightforward
in all cases, however some factors e.g. poverty, income inequality and vaccination
coverages may be associated with countries in which children go into childcare from
an early age, facilitating the spread of VZV. The analyzes in this study relied on
139
140 Chapter 8. Discussion and Further Research
(1) endemicity of VZV which seems tenable for the countries under consideration
and which is supported by the similar results obtained for the two samples of
Italy, and (2) the appropriateness of the POLYMOD physical contacts. The effect
of a perturbation in the endemic equilibrium was studied in a small sensitivity
analysis, however, an in-depth analysis would be necessary to fully asses the impact
on the estimation of R. Furthermore, we have shown that knowledge about the
heterogeneity in susceptibility and infectiousness would prove to be very useful to
inform the link between transmission and contact rates when inferring infectious
disease parameters from serological data.
It is known that disease symptoms can affect the contact pattern of an indi-
vidual, for example when staying home from work or school during illness. In
Chapter 4, we have used social contact data from both symptomatic and healthy
individuals to inform mixing patterns in a mathematical disease model accounting
for asymptomatic infection. Applying this model to ILI incidence data, we have
found that the proportion of symptomatic infections and the relative infectiousness
of symptomatic versus asymptomatic cases are very strongly correlated. Hence,
the difference in contact behavior between individuals experiencing symptoms
and healthy/asymptomatic individuals allows estimating one of these parameters
conditional on the other e.g. when assuming asymptomatic individuals are equally
infectious as symptomatic cases. We have extended this model and found that
the data support the hypothesis that the development of ILI symptoms depends
on whether one was infected by a symptomatic or asymptomatic case under the
assumption that symptomatic cases are more infectious than asymptomatic cases. In
this modeling approach we have relied on literature-based fixed parameters and we
have included reporting rates, however, these were not estimable from the data and
information on under-reporting of cases in at least one age class would be necessary
to estimate the true number of cases. The results of this analysis have pointed us to
a preferential transmission hypothesis for influenza, however there are no reports of
a clear link between acquired viral dose and development of influenza symptoms in
the literature yet. Further research is necessary to investigate this relation. We have
highlighted the importance of using empirical data to describe the relation between
contact rates and symptom severity; contact data for other diseases are necessary to
extend this model.
In contrast to the common assumption of homogeneous mixing in infectious
disease modeling, human populations exhibit inherent structure because individuals
spend their time in various groups such as households, work places, schools, etc. In
Chapters 5 and 6, we focused on contact heterogeneity within these groups, more
specifically households. We started by introducing the first social contact survey
designed to study contact networks within households in Chapter 5. These networks
were then analyzed using exponential random graph models. We found that the
results support density dependent transmission, with the mean number of contacts
increasing with increasing household size during weekdays, and that contacts between
father and children are less likely than between father and mother, mother and
children and siblings (except older siblings). To assess the impact of these empirically
grounded contact networks on the spread of an infection, we simulated epidemics in
a community of households in a two-level SIR setting with the underlying household
contact network either based on the ERGM or assuming random mixing. These
simulations indicated the random mixing assumption within households to be a
plausible one in this specific setting, since we did not find any noteworthy implications
of switching to an empirically-based contact network. In these analyses, we focused
on physical contacts; note, however, that other network links may be more appropriate
when investigating the spread of a specific infection. For example, duration of
contact might be of importance for some diseases. This could be incorporated in the
model by switching to a ‘valued’ network analysis and using these values as weights
in the epidemic model. Further, temporal dynamics could be taken into account by
combining the contact data with the time-use data that was also collected in the study.
In Chapter 6, the model for within-household contact networks developed in
Chapter 5 is combined with epidemic data from a similar community of households
to estimate parameters in a two-level mixing model using a Bayesian framework.
The contact graph underlying this model is therefore informed by both empirical
contact data as well as disease data. From data on symptom onset times of pertussis
in households with a laboratory confirmed index case, we estimated within-household
and community transmission parameters and the duration of the latent period. We
plan to perform a simulation study in which epidemics are simulated in a similar
setting, i.e. assuming independent households with an index case, to assess model
fit. Furthermore, we would like to analyze the data using a model that relies on
the random mixing assumption within households to compare the results with our
estimates and fit to the data. We will also investigate the performance of the model
when keeping the contact network fixed.
Chapter 7 presents a two-stage model for the Ebola outbreak in 2014. This
model was based on publicly available district-level data and showed a strong
temporal and spatial variability of EVD transmission in Guinea, Sierra Leone and
Liberia. This spatial heterogeneity was not taken into account in the majority of
the models published during the outbreak that were fitted on cumulative national
data. We quantified the spatio-temporal transmission patterns at the sub-national
level using a growth model and estimated the effective reproduction number in
selected districts via compartmental models. The latter also allowed us to generate
predictions for the number of cases and deaths. However, comparing these predictions
with observed counts showed that even short-term forecasts are far from reliable.
Extending the model to include protective immunity by asymptomatic infection
improved the fit to the data, however, further research is necessary to investigate
the existence of asymptomatic EVD cases. The modeling approach in this chapter
relied on the assumption of constant under-reporting of cases and deaths over
time, although changes in reporting rates during the outbreak are almost certain
due to e.g. increased awareness. Unfortunately, no data on these changes were
available. Additionally, irregularities in the data indicated inconsistent reporting and
undocumented backlogging.
Lastly, note that the compartmental models in this thesis rely on the common
assumption of exponentially distributed latent and infectious periods. This assumption
is mathematically convenient, but not very realistic for most infections
(e.g. Sartwell, 1995). More realistic distributions can be obtained by considering
an Erlang distribution with parameters γ and n. It has been shown that these
distributions can have substantial effects, for example in case of perturbations in
the endemic equilibrium (e.g. Lloyd (2001a)), when contact rates vary seasonally
(e.g. Lloyd (2001b)) and when estimating R0 for an emerging disease (Wearing
et al., 2005). I performed a small sensitivity analysis in the context of Chapter 4
(results not shown) in which Erlang distributions (n = 2 and n = 5) were considered.
There was no change in model fit, however, parameter estimates did slightly change.
Wearing et al. (2005) showed that for the same value of R0 and the same average
infectious period, larger values of n lead to a steeper increase in incidence, which is
indeed what we observed. It is possible to account for the uncertainty surrounding
these distributions. However, this implies determining two extra parameters in an
SEIR-type model, which might not be estimable from the data at hand and/or
might be problematic in a computationally intensive setting.
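The effect of the number of stages n on the shape of the period distribution can be illustrated with a short sketch. It assumes the parameterization in which a period is the sum of n independent exponential stages, each with rate nγ, so that the mean stays 1/γ while the standard deviation shrinks to 1/(γ√n); the function names, the seed and the example values are hypothetical.

```python
import math
import random


def erlang_mean_sd(rate, n):
    """Mean and standard deviation of an Erlang period with n stages of rate n * rate."""
    return 1.0 / rate, 1.0 / (rate * math.sqrt(n))


def erlang_sample(rate, n, rng):
    """Draw one period as the sum of n exponential stages, each with rate n * rate."""
    return sum(rng.expovariate(n * rate) for _ in range(n))


gamma = 1 / 9.4  # latent period of 9.4 days, as used in Chapter 7
for stages in (1, 2, 5):
    period_mean, period_sd = erlang_mean_sd(gamma, stages)
    print(f"n = {stages}: mean = {period_mean:.2f} days, sd = {period_sd:.2f} days")

# Monte Carlo check of the mean for n = 5 with a fixed seed.
rng = random.Random(1)
draws = [erlang_sample(gamma, 5, rng) for _ in range(20000)]
simulated_mean = sum(draws) / len(draws)
```

With n = 1 the exponential distribution is recovered; larger n concentrates the period around its mean, which is the mechanism behind the steeper increase in incidence mentioned above.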
Bibliography
Abbey, H. (1952). An examination of the Reed-Frost theory of epidemics. Human
Biology, 24(3):201–233.
Addy, C. L., Longini, Jr., I. M., and Haber, M. (1991). A generalized stochastic model
for the analysis of infectious disease final size data. Biometrics, 47:961–974.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood
principle. In Petrov, B.N. and Csaki, F., editors. 2nd International Symposium on
Information Theory, pages 267–281.
Althaus, C. L., Low, N., Musa, E. O., Shuaib, F., and Gsteiger, S. (2015). Ebola virus
disease outbreak in Nigeria: Transmission dynamics and rapid control. Epidemics,
11:80–84.
Anderson, H. (1999). Epidemic models and social networks. Journal of Mathematical
Sciences, (24):128–147.
Anderson, R. M. and May, R. M. (1991). Infectious Diseases of Humans: Dynamics
and Control. Oxford University Press, Oxford.
Andersson, H. and Britton, T. (2000). Stochastic Epidemic Models and Their Statis-
tical Analysis, volume 151 of Lecture Notes in Statistics. Springer New York, New
York, NY.
Azmon, A., Faes, C., and Hens, N. (2014). On the estimation of the reproduction
number based on misreported epidemic data. Statistics in Medicine, 33(7):1176–
1192.
Bailey, N. (1975). The Mathematical Theory of Infectious Diseases and its Applications.
Griffin, London.
Bailey, N. T. J. (1957). The mathematical theory of epidemics. Griffin, London, UK.
Ball, F. and Britton, T. (2007). An epidemic model with infector-dependent severity.
Advances in Applied Probability, 39(4):949–972.
Ball, F. and Lyne, O. D. (2001). Stochastic multi-type SIR epidemics among a popu-
lation partitioned into households. Advances in Applied Probability, 33(01):99–123.
Ball, F., Mollison, D., and Scalia-Tomba, G. (1997). Epidemics with two levels of
mixing. The Annals of Applied Probability, 7:46–89.
Ball, F. and Neal, P. (2002). A general model for stochastic SIR epidemics with two
levels of mixing. Mathematical Biosciences, 180:73–102.
Becker, N. G. (1989). Analysis of Infectious Disease Data. Chapman and Hall/CRC.
Becker, N. G. and Dietz, K. (1995). The effect of household distribution on transmis-
sion and control of highly infectious diseases. Mathematical biosciences, 127(2):207–
19.
Becker, N. G. and Hall, R. (1996). Immunization levels for preventing epidemics in
a community of households made up of individuals of various types. Mathematical
biosciences, 132(2):205–16.
Bellan, S. E., Pulliam, J. R. C., Dushoff, J., and Meyers, L. A. (2014). Ebola control:
effect of asymptomatic infection and acquired immunity. Lancet (London, England),
384(9953):1499–500.
Bernoulli, D. (1760). Essai d’une nouvelle analyse de la mortalité causée par la petite
vérole et des avantages de l’inoculation pour la prévenir. Mémoires de l’Académie
Royale des Sciences, Paris.
Bollaerts, K., Aerts, M., Shkedy, Z., Faes, C., Van der Stede, Y., Beutels, P., and Hens,
N. (2012). Estimating the population prevalence and force of infection directly from
antibody titres. Statistical Modelling, 12(5):441–462.
Boone, E. L., Merrick, J. R., and Krachey, M. J. (2012). A Hellinger distance ap-
proach to MCMC diagnostics. Journal of Statistical Computation and Simulation,
84(4):833–849.
Breiman, L. (2001). Random forests. Machine Learning, 45:5–32.
Bremaud, P. (1999). Markov Chains. Springer New York, New York, NY.
Britton, T. and O’Neill, P. (2002). Bayesian inference for stochastic epidemics in
populations with random social structure. Scandinavian Journal of Statistics,
29:375–390.
Brooks, S. P. and Roberts, G. O. (1998). Convergence assessment techniques for
Markov chain Monte Carlo. Statistics and Computing, 8(4):319–335.
Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel In-
ference: A Practical Information-Theoretic Approach. Springer-Verlag New York
Inc.
Carrat, F., Vergu, E., Ferguson, N. M., Lemaitre, M., Cauchemez, S., Leach, S., and
Valleron, A. J. (2008). Time lines of infection and disease in human influenza: A re-
view of volunteer challenge studies. American Journal of Epidemiology, 167(7):775–
785.
Cauchemez, S., Carrat, F., Viboud, C., Valleron, A. J., and Boelle, P. Y. (2004).
A Bayesian MCMC approach to study transmission of influenza: application to
household longitudinal data. Statistics in medicine, 23(22):3469–87.
CDC and NCIRD (2015). Immunology and vaccine-preventable diseases - pink book
- pertussis. https://www.cdc.gov/vaccines/pubs/pinkbook/downloads/pert.
pdf. Accessed: March 22, 2016.
Centers for Disease Control and Prevention (2010). The 2009 H1N1 pandemic: Sum-
mary highlights. http://www.cdc.gov/h1n1flu/cdcresponse.htm. Accessed: July
15, 2016.
Centers for Disease Control and Prevention (2014). West Africa: Ebola
outbreak. http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/
qa-mmwr-estimating-future-cases.html.
Centers for Disease Control and Prevention (2016). Pertussis (whooping cough).
http://www.cdc.gov/pertussis/clinical/features.html. Accessed: October
7, 2016.
Chao, D. L., Halloran, M. E., Obenchain, V. J., and Longini, I. M. (2010). FluTE,
a Publicly Available Stochastic Influenza Epidemic Simulation Model. PLoS Com-
putational Biology, 6(1):e1000656.
Chowell, G., Simonsen, L., Viboud, C., and Kuang, Y. (2014a). Is West Africa
Approaching a Catastrophic Phase or is the 2014 Ebola Epidemic Slowing Down?
Different Models Yield Different Answers for Liberia. PLoS Currents, 6.
Chowell, G., Viboud, C., Hyman, J. M., and Simonsen, L. (2014b). The Western
Africa Ebola Virus Disease Epidemic Exhibits Both Global Exponential and Local
Polynomial Growth Rates. PLoS Currents.
Christie, A., Davies-Wayne, G. J., Cordier-Lassalle, T., Blackley,
D. J., Laney, A. S., Williams, D. E., Shinde, S. A., Badio, M., Lo, T., Mate,
S. E., Ladner, J. T., Wiley, M. R., Kugelman, J. R., Palacios, G., Holbrook, M. R.,
Janosko, K. B., de Wit, E., van Doremalen, N., Munster, V. J., Pettitt, J., Schoepp,
R. J., Verhenne, L., Evlampidou, I., Kollie, K. K., Sieh, S. B., Gasasira, A., Bolay,
F., Kateh, F. N., Nyenswah, T. G., De Cock, K. M., and Centers for Disease Control
and Prevention (CDC) (2015). Possible sexual transmission of Ebola virus - Liberia,
2015. MMWR. Morbidity and mortality weekly report, 64(17):479–81.
Cowles, M. K. and Carlin, B. P. (1996). Markov Chain Monte Carlo Convergence Di-
agnostics: A Comparative Review. Journal of the American Statistical Association,
91(434):883–904.
Daley, D. J. and Gani, J. (1999). Epidemic Modelling. Cambridge University Press,
Cambridge.
Danon, L., Ford, A. P., House, T., Jewell, C. P., Keeling, M. J., Roberts, G. O., Ross,
J. V., and Vernon, M. C. (2011). Networks and the epidemiology of infectious
disease. Interdisciplinary Perspectives on Infectious Diseases, Article ID 284909:28
pages.
de Greeff, S. C., de Melker, H. E., Westerhof, A., Schellekens, J. F., Mooi, F. R.,
and van Boven, M. (2012). Estimation of household transmission rates of pertussis
and the effect of cocooning vaccination strategies on infant pertussis. Epidemiology,
23(6):852–860.
de Greeff, S. C., Mooi, F. R., Westerhof, A., Verbakel, J. M. M., Peeters, M. F.,
Heuvelman, C. J., Notermans, D. W., Elvers, L. H., Schellekens, J. F. P., and
de Melker, H. E. (2010). Pertussis Disease Burden in the Household: How to
Protect Young Infants. Clinical Infectious Diseases, 50(10):1339–1345.
de Ory, F., Echevarría, J. M., Kafatos, G., Anastassopoulou, C., Andrews, N., Back-
house, J., Berbers, G., Bruckova, B., Cohen, D. I., de Melker, H., Davidkin, I.,
Gabutti, G., Hesketh, L. M., Johansen, K., Jokinen, S., Jones, L., Linde, A., Miller,
E., Mossong, J., Nardone, A., Rota, M. C., Sauerbrei, A., Schneider, F., Smetana,
Z., Tischer, A., Tsakris, A., and Vranckx, R. (2006). European seroepidemiology
network 2: Standardisation of assays for seroepidemiology of varicella zoster virus.
Journal of clinical virology : the official publication of the Pan American Society
for Clinical Virology, 36(2):111–8.
Del Valle, S. Y., Hyman, J., Hethcote, H., and Eubank, S. (2007). Mixing patterns
between age groups in social networks. Social Networks, 29:539–554.
Demiris, N. and O’Neill, P. (2005a). Bayesian inference for epidemics with two levels
of mixing. Scandinavian Journal of Statistics, 32:265–280.
Demiris, N. and O’Neill, P. D. (2005b). Bayesian inference for stochastic multitype
epidemics in structured populations via random graphs. Journal of the Royal Sta-
tistical Society: Series B (Statistical Methodology), 67(5):731–745.
Diekmann, O., Heesterbeek, H., and Britton, T. (2013). Mathematical tools for un-
derstanding infectious diseases dynamics. Princeton University Press.
Diekmann, O., Heesterbeek, J. A. P., and Metz, J. A. J. (1990). On the definition and
the computation of the basic reproduction ratio R0 in models for infectious diseases
in heterogeneous populations. Journal of Mathematical Biology, 28:365–382.
Dietz, K. (1975). Transmission and control of arbovirus diseases. Epidemiology, SIMS
Utah Conference Proceedings, Eds. D. Ludwig and K.L. Cooke:104–121.
Dietz, K. (1993). The estimation of the basic reproduction number for infectious
diseases. Statistical methods in medical research, 2(1):23–41.
Dorjee, S., Poljak, Z., Revie, C. W., Bridgland, J., Mcnab, B., Leger, E., and Sanchez,
J. (2013). A review of simulation modelling approaches used for the spread of
zoonotic influenza viruses in animal and human populations. Zoonoses and Public
Health, 60(6):383–411.
Eames, K. T. D., Tilston, N. L., White, P. J., Adams, E., and Edmunds, W. J. (2010).
The impact of illness and the impact of school closure on social contact patterns.
Health Technology Assessment, 14(34):267–312.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Monographs
on Statistics & Applied Probability. Chapman & Hall/CRC, London.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals
of Statistics, 7(1):1–26.
Ejima, K., Aihara, K., and Nishiura, H. (2013). The impact of model building on
the transmission dynamics under vaccination: observable (symptom-based) versus
unobservable (contagiousness-dependent) approaches. PloS one, 8(4):e62062.
European Centre for Disease Prevention and Control (2016). Ebola and
Marburg fevers. http://ecdc.europa.eu/en/healthtopics/ebola_marburg_
fevers/Pages/index.aspx. Accessed: July 18, 2016.
Exchange HD (2015). West Africa: Ebola outbreak. https://data.hdx.rwlabs.org/
ebola. Accessed: July 1, 2015.
Farrington, C. P. (2003). Modelling Epidemics. The Open University.
Farrington, C. P., Kanaan, M. N., and Gay, N. J. (2001). Estimation of the ba-
sic reproduction number for infectious diseases from age-stratified serological sur-
vey data. Journal of the Royal Statistical Society: Series C (Applied Statistics),
50(3):251–292.
Farrington, C. P. and Whitaker, H. J. (2005). Contact Surface Models for Infectious
Diseases. Journal of the American Statistical Association, 100(470):370–379.
Ferguson, N. M., Cummings, D. A. T., Fraser, C., Cajka, J. C., Cooley, P. C., and
Burke, D. S. (2006). Strategies for mitigating an influenza pandemic. Nature,
442(7101):448–452.
Fisman, D., Khoo, E., and Tuite, A. (2014). Early Epidemic Dynamics of the West
African 2014 Ebola Outbreak: Estimates Derived with a Simple Two-Parameter
Model. PLoS Currents.
Fraser, C., Riley, S., Anderson, R. M., and Ferguson, N. M. (2004). Factors that make
an infectious disease outbreak controllable. Proceedings of the National Academy
of Sciences of the United States of America, 101(16):6146–6151.
Geyer, C. J. and Thompson, E. A. (1992). Constrained Monte Carlo maximum like-
lihood calculations. Journal of the Royal Statistical Society B, 54:657–699.
Goeyvaerts, N. (2011). Statistical and mathematical models to estimate the transmis-
sion of airborne infections from current status data. PhD thesis, Hasselt University.
Goeyvaerts, N., Hens, N., Ogunjimi, B., Aerts, M., Shkedy, Z., Damme, P. V., and
Beutels, P. (2010). Estimating infectious disease parameters from data on social
contacts and serological status. Journal of the Royal Statistical Society: Series C
(Applied Statistics), 59(2):255–277.
Goeyvaerts, N., Santermans, E., Potter, G., Van Kerckhove, K., Willem, L., Beutels,
P., and Hens, N. (2016). Empirical household contact networks: revisiting the
two-level mixing model. To be submitted.
Gomes, M. F. C., Pastore y Piontti, A., Rossi, L., Chao, D., Longini, I., Halloran,
M. E., and Vespignani, A. (2014). Assessing the International Spreading Risk
Associated with the 2014 West African Ebola Outbreak. PLoS Currents.
Goodreau, S. M., Handcock, M. S., Hunter, D. R., Butts, C. T., and Morris, M.
(2008). A statnet tutorial. Journal of Statistical Software, 24:1–26.
Greenhalgh, D. and Dietz, K. (1994). Some bounds on estimates for reproductive
ratios derived from the age-specific force of infection. Mathematical Biosciences,
124(1):9–57.
Grefenstette, J. J., Brown, S. T., Rosenfeld, R., DePasse, J., Stone, N. T. B., Coo-
ley, P. C., Wheaton, W. D., Fyshe, A., Galloway, D. D., Sriram, A., Guclu, H.,
Abraham, T., and Burke, D. S. (2013). FRED (a Framework for Reconstructing
Epidemic Dynamics): an open-source software system for modeling infectious dis-
eases and control strategies using census-based populations. BMC public health,
13:940.
Grijalva, C. G., Goeyvaerts, N., Verastegui, H., Edwards, K. M., Gil, A. I., Lanata,
C. F., Hens, N., and RESPIRA PERU project (2015). A household-based study of
contact networks relevant for the spread of infectious diseases in the highlands of
Peru. PloS one, 10(3):e0118457.
Groendyke, C., Welch, D., and Hunter, D. R. (2011). Bayesian Inference for Contact
Networks Given Epidemic Data. Scandinavian Journal of Statistics, 38(3):600–616.
Groendyke, C., Welch, D., and Hunter, D. R. (2012). A network-based analysis of
the 1861 Hagelloch measles data. Biometrics, 68(3):755–65.
Haario, H., Saksman, E., and Tamminen, J. (2001). An Adaptive Metropolis Algo-
rithm. Bernoulli, 7(2):223.
Hall, B. (2011). LaplacesDemon: An R Package for Bayesian Inference. R package
version.
Halloran, M. E., Longini, Jr., I. M., Nizam, A., and Yang, Y. (2002). Containing
bioterrorist smallpox. Science, 298:1428–1432.
Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks.
Technical Report Working Paper no. 39, University of Washington, Seattle.
Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., Krivitsky, P. N., and
Morris, M. (2013a). ergm: Fit, Simulate and Diagnose Exponential-Family Models
for Networks. The Statnet Project (http://www.statnet.org). R package version
3.1-0.
Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., Krivitsky, P. N., and
Morris, M. (2013b). statnet: Software tools for the Statistical Analysis of Network
Data. The Statnet Project (http://www.statnet.org). R package version 3.1-0.
Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., and Morris, M.
(2008). statnet: Software tools for the representation, visualization, analysis and
simulation of network data. Journal of Statistical Software, 24:1–11.
Hanneke, S., Fu, W., and Xing, E. P. (2010). Discrete temporal models of social
networks. Electron. J. Statist., 4:585–605.
Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and
Their Applications. Biometrika, 57(1):97–109.
Hawkes, N. (2014). Ebola outbreak is a public health emergency of international
concern, WHO warns. BMJ (Clinical research ed.), 349:g5089.
Heesterbeek, J. A. P. (2002). A brief history of R0 and a recipe for its calculation.
Acta biotheoretica, 50(3):189–204.
Hens, N., Aerts, M., Faes, C., Shkedy, Z., Lejeune, O., Van Damme, P., and Beutels,
P. (2010). Seventy-five years of estimating the force of infection from current status
data. Epidemiology and infection, 138(6):802–12.
Hens, N., Shkedy, Z., Aerts, M., Faes, C., Van Damme, P., and Beutels, P. (2012).
Modeling Infectious Disease Parameters Based on Serological and Social Contact
Data: a Modern Statistical Perspective. Springer-Verlag New York Inc.
Huang, Y., Zaas, A. K., Rao, A., Dobigeon, N., Woolf, P. J., Veldman, T., Oien,
N. C., McClain, M. T., Varkey, J. B., Nicholson, B., Carin, L., Kingsmore, S.,
Woods, C. W., Ginsburg, G. S., and Hero, A. O. (2011). Temporal dynamics of
host molecular responses differentiate symptomatic and asymptomatic influenza A
infection. PLoS genetics, 7(8):e1002234.
Hunter, D. R. (2007). Curved exponential family models for social networks. Social
Networks, 29:216–230.
Hunter, D. R., Handcock, M. S., Butts, C. T., Goodreau, S. M., and Morris, M.
(2008). ergm: A package to fit, simulate and diagnose exponential-family models
for networks. Journal of Statistical Software, 24:1–29.
Inaba, H. and Nishiura, H. (2008). The state-reproduction number for a multistate
class age structured epidemic system and its application to the asymptomatic trans-
mission model. Mathematical biosciences, 216(1):77–89.
Keeling, M. J. and Eames, K. T. D. (2005). Networks and epidemic models. Journal
of the Royal Society Interface, 2:295–307.
Kelly, H., Riddell, M. A., Gidding, H. F., Nolan, T., and Gilbert, G. L. (2002).
A random cluster survey and a convenience sample give comparable estimates of
immunity to vaccine preventable diseases in children of school age in Victoria,
Australia. Vaccine, 20(25):3130–3136.
Kermack, W. and McKendrick, A. (1927). A contribution to the mathematical theory
of epidemics. Proceedings of the Royal Society London A, 115:700–721.
King, A. A., Domenech de Celles, M., Magpantay, F. M. G., and Rohani, P. (2015).
Avoidable errors in the modelling of outbreaks of emerging pathogens, with spe-
cial reference to Ebola. Proceedings of the Royal Society B: Biological Sciences,
282(1806):20150347–20150347.
Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models.
Springer, New York.
Krivitsky, P. N. (2012). Exponential-family random graph models for valued networks.
Electron. J. Statist., 6:1100–1128.
Krivitsky, P. N. and Handcock, M. S. (2014). A separable model for dynamic net-
works. Journal of the Royal Statistical Society: Series B (Statistical Methodology),
76(1):29–46.
Lewnard, J. A., Ndeffo Mbah, M. L., Alfaro-Murillo, J. A., Altice, F. L., Bawo, L.,
Nyenswah, T. G., and Galvani, A. P. (2014). Dynamics and control of Ebola virus
transmission in Montserrado, Liberia: a mathematical modelling analysis. The
Lancet Infectious Diseases, 3099(14).
Liberia MoHaSWRo (2015). Facts about ebola virus disease. http://www.mohsw.
gov.lr/content_display.php?submenu_id=72&sub=submenu. Accessed: July 1,
2015.
Lin, C.-J., Deger, K. A., and Tien, J. H. (2016). Modeling the trade-off between trans-
missibility and contact in infectious disease dynamics. Mathematical biosciences,
277:15–24.
Lloyd, A. L. (2001a). Destabilization of epidemic models with the inclusion of realistic
distributions of infectious periods. Proceedings of the Royal Society B: Biological
Sciences, 268(1470):985–993.
Lloyd, A. L. (2001b). Realistic Distributions of Infectious Periods in Epidemic Models:
Changing Patterns of Persistence and Dynamics. Theoretical Population Biology,
60(59971).
Longini, Jr., I. M. and Koopman, J. S. (1982). Household and community transmission
parameters from final distributions of infections in households. Biometrics, 38:115–
126.
Longini, Jr., I. M., Koopman, J. S., Haber, M., and Cotsonis, G. A. (1988). Statistical
inference for infectious diseases. risk-specific household and community transmission
parameters. American Journal of Epidemiology, 128:845–859.
Longini, Jr., I. M., Koopman, J. S., Monto, A. S., and Fox, J. P. (1982). Estimating
household and community transmission parameters for influenza. American Journal
of Epidemiology, 115:736–751.
Lunelli, A., Rizzo, C., Puzelli, S., Bella, A., Montomoli, E., Rota, M. C., Donatelli,
I., and Pugliese, A. (2013). Understanding the dynamics of seasonal influenza in
Italy: incidence, transmissibility and population susceptibility in a 9-year period.
Influenza and Other Respiratory Viruses, 7(3):286–295.
Melegaro, A., Gay, N. J., and Medley, G. F. (2004). Estimating the transmission
parameters of pneumococcal carriage in households. Epidemiology and Infection,
132:433–441.
Melegaro, A., Jit, M., Gay, N., Zagheni, E., and Edmunds, W. J. (2011). What types
of contacts are important for the spread of infections?: using contact survey data
to explore European mixing patterns. Epidemics, 3(3-4):143–51.
Merler, S., Ajelli, M., Fumanelli, L., Gomes, M. F., Piontti, A. P., Rossi, L., Chao,
D. L., Longini Jr., I. M., Halloran, M. E., and Vespignani, A. (2015). Spatiotemporal
spread of the 2014 outbreak of Ebola virus disease in Liberia and the effectiveness
of non-pharmaceutical interventions: a computational modelling analysis. Lancet
Infect Dis, 15(2):204–211.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E.
(1953). Equation of State Calculations by Fast Computing Machines. J. Chem.
Phys., 21(6).
Meyers, L. A., Pourbohloul, B., Newman, M., Skowronski, D. M., and Brunham,
R. C. (2005). Network theory and SARS: predicting outbreak diversity. Journal of
Theoretical Biology, 232:71–81.
Miller, E., Hoschler, K., Hardelid, P., Stanford, E., Andrews, N., and Zambon, M.
(2010). Incidence of 2009 pandemic influenza A H1N1 infection in England: a
cross-sectional serological study. Lancet, 375(9720):1100–1108.
Miller, E., Marshall, R., and Vurdien, J. (1993). Epidemiology, outcome and control
of varicella-zoster infection. Reviews in Medical Microbiology, 4:222–230.
Mniszewski, S. M., Del Valle, S. Y., Stroud, P. D., Riese, J. M., and Sydoriak, S. J.
(2008). EpiSimS simulation of a multi-component strategy for pandemic influenza.
Society for Computer Simulation International.
Morris, M., Handcock, M. S., and Hunter, D. R. (2008). Specification of exponential-
family random graph models: Terms and computational aspects. Journal of Sta-
tistical Software, 24:1–24.
Mossong, J., Hens, N., Jit, M., Beutels, P., Auranen, K., et al. (2008a). Social contacts
and mixing patterns relevant to the spread of infectious diseases. PLoS Medicine,
5(3):381–391.
Mossong, J., Hens, N., Jit, M., Beutels, P., Auranen, K., Mikolajczyk, R., Massari,
M., Salmaso, S., Tomba, G. S., Wallinga, J., Heijne, J., Sadkowska-Todys, M.,
Rosinska, M., and Edmunds, W. J. (2008b). Social contacts and mixing patterns
relevant to the spread of infectious diseases. PLoS medicine, 5(3):e74.
Nardone, A., de Ory, F., Carton, M., Cohen, D., van Damme, P., Davidkin, I., Rota,
M. C., de Melker, H., Mossong, J., Slacikova, M., Tischer, A., Andrews, N., Berbers,
G., Gabutti, G., Gay, N., Jones, L., Jokinen, S., Kafatos, G., de Aragon, M. V. M.,
Schneider, F., Smetana, Z., Vargova, B., Vranckx, R., and Miller, E. (2007). The
comparative sero-epidemiology of varicella zoster virus in 11 countries in the Euro-
pean region. Vaccine, 25(45):7866–72.
National Institute for Public Health and the Environment (2013). The national im-
munisation programme in the Netherlands - developments in 2013.
Nations U. West Africa (2015). Ebola outbreak 2014-2015. http://www.
humanitarianresponse.info/disaster/ep-2014-000041-gin/documents%
20and%20since%20April%202015%20http://guinea-ebov.github.io/sitreps.
html. Accessed: July 1, 2015.
NERC (2015). EVD daily MoHS update. http://nerc.sl/. Accessed: July 1, 2015.
Nishiura, H. and Chowell, G. (2014). Early transmission dynamics of Ebola virus
disease (EVD), West Africa, March to August 2014. Eurosurveillance.
Ogunjimi, B., Hens, N., Goeyvaerts, N., Aerts, M., Van Damme, P., and Beutels,
P. (2009). Using empirical social contact data to model person to person infec-
tious disease transmission: an illustration for varicella. Mathematical Biosciences,
218:80–87.
O’Neill, P. D. (2009). Bayesian inference for stochastic multitype epidemics in struc-
tured populations using sample data. Biostatistics (Oxford, England), 10(4):779–91.
O’Neill, P. D. and Becker, N. G. (2001). Inference for an epidemic when susceptibility
varies. Biostatistics (Oxford, England), 2(1):99–108.
Organization GoGWH (2015). Rapport de la situation epidemiologique | epi
situation report. http://www.humanitarianresponse.info/en/operations/
west-and-central-africa/documents/disasters/33204. Accessed: July 1, 2015.
Plotkin, S. (2010). Complex correlates of protection after vaccination. Clinical Infec-
tious Diseases, 56:1458–1465.
Potter, G. E. and Handcock, M. S. (2010). A description of within-family resource
exchange networks in a Malawian village. Demographic Research, 23:117–152.
Potter, G. E., Handcock, M. S., Longini, Jr., I. M., and Halloran, M. E. (2011).
Estimating within-household contact networks from egocentric data. Annals of
Applied Statistics, 5:1816–1838.
Potter, G. E. and Hens, N. (2013). A penalized likelihood approach to estimate within-
household contact networks from egocentric data. Journal of the Royal Statistical
Society: Series C (Applied Statistics), 62:629–648.
Potter, G. E., Smieszek, T., and Sailer, K. (2015). Modeling workplace contact net-
works: The effects of organizational structure, architecture, and reporting errors on
epidemic predictions. Network science (Cambridge University Press), 3(3):298–325.
Public Health England (2010). Weekly epidemiological updates archive.
http://www.hpa.org.uk/Topics/InfectiousDiseases/InfectionsAZ/
PandemicInfluenza/H1N1PandemicArchive/SIEpidemiologicalData/
SIEpidemiologicalReportsArchive/influswarchiveweeklyepireports/.
Accessed: December 20, 2010.
Rampey, A. H., Longini, Jr., I. M., Haber, M., and Monto, M. S. (1992). A discrete-
time model for the statistical analysis of infectious disease incidence data. Biomet-
rics, 48:117–128.
Read, J. M., Edmunds, W. J., Riley, S., Lessler, J., and Cummings, D. A. T. (2012).
Close encounters of the infectious kind: methods to measure social mixing be-
haviour. Epidemiology and infection, 140(12):2117–30.
Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turn-
baugh, P. J., Lander, E. S., Mitzenmacher, M., and Sabeti, P. C. (2011). Detecting
novel associations in large data sets. Science (New York, N.Y.), 334(6062):1518–24.
Roberts, G. O. and Rosenthal, J. S. (2009). Examples of Adaptive MCMC. Journal
of Computational and Graphical Statistics, 18(2):349–367.
Robins, G., Pattison, P., Kalish, Y., and Lusher, D. (2007). An introduction to expo-
nential random graph (p*) models for social networks. Social Networks, 29(2):173–
191.
Rosenthal, J. S. (2007). AMCMC: An R interface for adaptive MCMC.
Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for
latent Gaussian models by using integrated nested Laplace approximations. Journal
of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319–392.
Sane, J. and Edelstein, M. (2015). Overcoming barriers to data sharing in pub-
lic health: a global perspective. https://www.chathamhouse.org/publication/
overcoming-barriers-data-sharing-public-health-global-perspective.
Sanitation Moha (2015). Ebola situation report. http://health.gov.sl/?page_id=
583. Accessed: July 1, 2015.
Santermans, E., Goeyvaerts, N., Melegaro, A., Edmunds, W., Faes, C., Aerts, M.,
Beutels, P., and Hens, N. (2015). The social contact hypothesis under the as-
sumption of endemic equilibrium: Elucidating the transmission potential of VZV
in Europe. Epidemics, 11:14–23.
Santermans, E., Robesyn, E., Ganyani, T., Sudre, B., Faes, C., Quinten, C., Van Bor-
tel, W., Haber, T., Kovac, T., Van Reeth, F., Testa, M., Hens, N., and Plachouras,
D. (2016a). Spatiotemporal Evolution of Ebola Virus Disease at Sub-National Level
during the 2014 West Africa Epidemic: Model Scrutiny and Data Meagreness. PloS
one, 11(1):e0147172.
Santermans, E., Van Kerckhove, K., Azmon, A., Edmunds, J., Beutels, P., Faes, C.,
and Hens, N. (2016b). Structural differences in mixing behaviour informing the
role of asymptomatic infection and testing symptom heritability. Mathematical
Biosciences. In revision.
Sartwell, P. E. (1995). The distribution of incubation periods of infectious disease.
1949. American journal of epidemiology, 141(5):386–94; discussion 385.
Schinazi, R. B. (2002). On the role of social clusters in the transmission of infectious
diseases. Theoretical population biology, 61(2):163–9.
Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics,
6(2):461–464.
Smieszek, T., Barclay, V. C., Seeni, I., Rainey, J. J., Gao, H., Uzicanin, A., and
Salathe, M. (2014). How should social mixing be measured: comparing web-based
survey and sensor-based methods. Bmc Infectious Diseases, 14.
Strauss, D. and Ikeda, M. (1990). Pseudolikelihood estimation for social networks.
Journal of the American Statistical Association, 85:204–212.
UN Mission for Ebola Emergency Response (2015). (unmeer). http://apps.who.
int/ebola/current-situation/ebola-situation-report-18-march-2015. Ac-
cessed: July 1, 2015.
United Nations Security Council (2014). Resolution 2177, adopted by the Security
Council at its 7268th meeting on 18 September 2014.
Van Effelterre, T., Shkedy, Z., Aerts, M., Molenberghs, G., Van Damme, P., and
Beutels, P. (2009). Contact patterns and their implied basic reproductive numbers:
an illustration for varicella-zoster virus. Epidemiology and Infection, 137:48–57.
Van Kerckhove, K., Hens, N., Edmunds, W. J., and Eames, K. T. D. (2013). The
impact of illness on social networks: implications for transmission and control of
influenza. American journal of epidemiology, 178(11):1655–1662.
Vynnycky, E. and White, R. G. (2010). An introduction to infectious disease mod-
elling. Oxford University Press.
Wallinga, J. and Levy-Bruhl, D. (2001). Estimation of measles reproduction ratios
and prospects for elimination of measles by vaccination in some Western European
countries. Epidemiology and . . . , pages 281–295.
Wallinga, J., Teunis, P., and Kretzschmar, M. (2006). Using data on social contacts
to estimate age-specific transmission parameters for respiratory-spread infectious
agents. American Journal of Epidemiology, 164(10):936–944.
Wearing, H. J., Rohani, P., and Keeling, M. J. (2005). Appropriate Models for the
Management of Infectious Diseases. PLoS Medicine, 2(7):e174.
Whitaker, H. J. and Farrington, C. P. (2004). Infections with varying contact rates:
application to varicella. Biometrics, 60(3):615–23.
Willem, L., Van Kerckhove, K., Chao, D. L., Hens, N., and Beutels, P. (2012). A nice
day for an infection? Weather conditions and social contact patterns relevant to
influenza transmission. PLoS ONE, 7(11):e48695.
Wood, S. N. (2006). Generalized additive models : an introduction with R. Chapman
& Hall/CRC.
Woof, J. M. and Burton, D. R. (2004). Human antibody–Fc receptor interactions
illuminated by crystal structures. Nature Reviews Immunology, 4(2):89–99.
World Health Organization (2010). Weekly epidemiological record - no. 40, 85. http:
//www.who.int/wer. Accessed: March 22, 2016.
World Health Organization (2014). Case definition recommendations for Ebola or Mar-
burg virus diseases. http://www.who.int/csr/resources/publications/ebola/
ebola-case-definition-contact-en.pdf. Accessed: January 16, 2015.
World Health Organization (2015a). The Ebola outbreak in Liberia is over. http://
www.who.int/mediacentre/news/statements/2015/liberia-ends-ebola/en/.
World Health Organization (2015b). Ebola situation report - 18
March 2015. http://apps.who.int/ebola/current-situation/
ebola-situation-report-18-march-2015. Accessed: March 25, 2015.
World Health Organization (2015c). Ebola situation report - 24
June 2015. http://apps.who.int/ebola/current-situation/
ebola-situation-report-24-june-2015. Accessed: July 2, 2015.
World Health Organization (2016a). Ebola virus disease: Fact sheet. http://www.
who.int/mediacentre/factsheets/fs103/en/. Accessed: July 18, 2016.
World Health Organization (2016b). Infectious diseases. http://www.who.int/
topics/infectious_diseases/en/. Accessed: August 29, 2016.
Acknowledgements
I gratefully acknowledge support from a Methusalem research grant from the
Flemish government awarded to Herman Goossens (Antwerpen University) and Geert
Molenberghs (Hasselt University). The computational resources and services used
in this thesis were provided by the VSC (Flemish Supercomputer Center), funded
by the Hercules Foundation and the Flemish Government - department EWI. I
gratefully acknowledge G.-J. Bex for his assistance.
Appendix A
Appendix - Chapter 5
In this Appendix we present some extra information relevant to the ERGM, additional
results for the goodness-of-fit simulations (both described in Section 5.2) and the
epidemic simulation in Section 5.3.
A.1 Household Contact Survey
Figure A.1: Barplot of within-household contact duration distributions by type of
relationship, including both physical and non-physical contacts.
A.2 Modeling Within-household Physical Contact
Networks
Figure A.2: Interpretation of mixing and age effect statistics of the ERGM: ratio of the
odds of physical contact occurring between two relatives versus a pair of siblings, as a
function of the sum of the siblings’ ages. Left panel: weekday, right panel: weekend day.
Within-household Clustering
We consider various measures of within-household clustering: the clustering coeffi-
cient (Kolaczyk, 2009), the mean correlation coefficient (Morris et al., 2008) and the
proportion of observed versus potential triangles, defined as:
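\[
\text{Clustering coefficient} = \frac{3 \cdot \#\text{triangles}}{\#\text{connected triples}};
\]
\[
\text{Mean correlation coefficient} = \frac{\#\text{triangles}}{\#\text{triangles} + \#\text{2-stars} \notin \text{triangle}};
\]
\[
\text{Proportion observed vs. potential triangles} = \frac{\#\text{triangles}}{\sum_{h} \binom{\text{size}(h)}{3}}.
\]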
HH size  Nr. HHs  Average  Clustering   Mean correlation  Proportion observed
                  degree   coefficient  coefficient       vs. potential triangles
2        12       1.00     NA           NA                NA
3        72       1.88     0.96         0.90              0.86
4        159      2.81     0.97         0.91              0.87
5        57       3.66     0.96         0.90              0.83
≥ 6      16       4.51     0.96         0.88              0.77
Total    316      2.94     0.96         0.90              0.83
Table A.1: Observed physical contact networks: average degree and various measures of
within-household clustering, stratified by household size.
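The three clustering measures defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for Table A.1: the input format (a list of `(size, edges)` pairs), the function name `clustering_measures` and the toy households are assumptions made for this example. Counts are pooled over all households before taking ratios, and a zero denominator is returned as `None`, mirroring the NA entries for households of size 2.

```python
from itertools import combinations
from math import comb


def clustering_measures(households):
    """Pool triangle and 2-star counts over households and return the three measures.

    `households` is a list of (size, edges) pairs; members are numbered
    0..size-1 and `edges` is an iterable of (i, j) contact pairs.
    This input format is an assumption made for illustration.
    """
    triangles = 0
    connected_triples = 0
    potential_triangles = 0
    for size, edges in households:
        neighbours = {i: set() for i in range(size)}
        for i, j in edges:
            neighbours[i].add(j)
            neighbours[j].add(i)
        # Every pair of neighbours of a node forms one connected triple (2-star).
        connected_triples += sum(comb(len(nb), 2) for nb in neighbours.values())
        # A triangle is a set of three members who are all in contact.
        triangles += sum(
            1
            for a, b, c in combinations(range(size), 3)
            if b in neighbours[a] and c in neighbours[a] and c in neighbours[b]
        )
        potential_triangles += comb(size, 3)

    # Each triangle contains three 2-stars; the remainder are not part of a triangle.
    open_two_stars = connected_triples - 3 * triangles

    def ratio(num, den):
        return num / den if den > 0 else None  # None plays the role of NA

    return {
        "clustering_coefficient": ratio(3 * triangles, connected_triples),
        "mean_correlation_coefficient": ratio(triangles, triangles + open_two_stars),
        "proportion_triangles": ratio(triangles, potential_triangles),
    }


# Toy data (hypothetical, not from the household contact survey).
toy_households = [
    (3, [(0, 1), (0, 2), (1, 2)]),  # complete household of size 3
    (3, [(0, 1), (0, 2)]),          # 2-star that is not part of a triangle
    (4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]),  # complete, size 4
]
print(clustering_measures(toy_households))
```

Pooling the counts before dividing is one possible reading of the definitions; averaging per-household ratios would give different values and is not what this sketch does.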
ERGM Weekday
                  Proportion complete                 Mean density
HH size  Nr. HHs  Observed  Median  Q 2.5%  Q 97.5%   Observed  Median  Q 2.5%  Q 97.5%
2        9        1.00      0.89    0.67    1.00      1.00      0.89    0.67    1.00
3        53       0.91      0.92    0.83    0.98      0.96      0.97    0.93    0.99
4        111      0.77      0.75    0.66    0.82      0.93      0.93    0.90    0.95
5        39       0.64      0.67    0.51    0.79      0.90      0.91    0.84    0.96
≥ 6      13       0.46      0.54    0.31    0.77      0.85      0.84    0.73    0.93
Total    225      0.77      0.76    0.71    0.81      0.93      0.93    0.91    0.95
Table A.2: Observed proportion of complete networks and mean network density,
stratified by household size, with median and 95% percentile range obtained from 1000
networks simulated from the ERGM for within-household physical contact networks on a
weekday.
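The structure of such a goodness-of-fit summary can be sketched as follows: compute the statistic on each simulated set of networks, then report the median and the 2.5% and 97.5% quantiles across simulations. This is only an illustrative sketch. The simulator below draws independent edges and is a hypothetical stand-in, not the ERGM; the function names, the edge probability 0.93 and the use of a per-household average for the mean density are assumptions. The household counts for sizes 2 to 4 are borrowed from Table A.2 purely for illustration.

```python
import random
from statistics import median, quantiles


def gof_statistics(sizes, edge_counts):
    """Proportion of complete networks and mean density over households."""
    possible = [s * (s - 1) // 2 for s in sizes]  # maximum number of edges
    complete = sum(e == m for e, m in zip(edge_counts, possible))
    densities = [e / m for e, m in zip(edge_counts, possible)]
    return complete / len(sizes), sum(densities) / len(densities)


def percentile_summary(values):
    """Median and 2.5% / 97.5% quantiles of a list of simulated statistics."""
    cuts = quantiles(values, n=40, method="inclusive")  # cut points every 2.5%
    return {"median": median(values), "q2.5": cuts[0], "q97.5": cuts[-1]}


def simulate_edge_counts(sizes, p, rng):
    """Hypothetical stand-in simulator: independent edges with probability p.

    This is NOT the ERGM; it only makes the sketch runnable.
    """
    return [sum(rng.random() < p for _ in range(s * (s - 1) // 2)) for s in sizes]


rng = random.Random(1)
sizes = [2] * 9 + [3] * 53 + [4] * 111  # household sizes, illustrative only
sims = [
    gof_statistics(sizes, simulate_edge_counts(sizes, 0.93, rng))
    for _ in range(1000)
]
print(percentile_summary([prop for prop, _ in sims]))  # proportion complete
print(percentile_summary([dens for _, dens in sims]))  # mean density
```

In an actual analysis the stand-in simulator would be replaced by draws from the fitted model, and the observed statistic would be compared against the resulting percentile range.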
                  Proportion observed vs. potential triangles
HH size  Nr. HHs  Observed  Median  Q 2.5%  Q 97.5%
3        53       0.91      0.92    0.83    0.98
4        111      0.85      0.84    0.78    0.89
5        39       0.81      0.82    0.71    0.90
≥ 6      13       0.71      0.71    0.56    0.87
Total    216      0.80      0.80    0.75    0.85
Table A.3: Observed proportion of observed versus potential triangles, stratified by
household size, with median and 95% percentile range obtained from 1000 networks
simulated from the ERGM for within-household physical contact networks on a weekday.
ERGM Weekend
                     Proportion complete                      Mean density
HH size   Nr. HHs    Observed   Median   Q 2.5%   Q 97.5%     Observed   Median   Q 2.5%   Q 97.5%
≤ 3       22         0.77       0.82     0.68     0.95        0.89       0.92     0.82     0.98
4         48         0.85       0.83     0.73     0.92        0.96       0.95     0.92     0.98
≥ 5       21         0.81       0.86     0.71     0.95        0.96       0.97     0.92     0.99
Total     91         0.82       0.84     0.76     0.90        0.94       0.95     0.92     0.97
Table A.4: Observed proportion of complete networks and mean network density,
stratified by household size, with median and 95% percentile range obtained from 1000
networks simulated from the ERGM for within-household physical contact networks on a
weekend day.
                     Proportion observed vs. potential triangles
HH size   Nr. HHs    Observed   Median   Q 2.5%   Q 97.5%
3         19         0.74       0.84     0.63     0.95
4         48         0.91       0.89     0.83     0.94
≥ 5       21         0.93       0.94     0.88     0.98
Total     88         0.91       0.91     0.87     0.95
Table A.5: Observed proportion of observed versus potential triangles, stratified by
household size, with median and 95% percentile range obtained from 1000 networks
simulated from the ERGM for within-household physical contact networks on a weekend
day.
Source                      Data                                          Number of   Household     Stratification  Household     Community
                                                                          households  size (range)                  q_HH   SAR    q_com  CPI
Longini and Koopman (1982)  Asian influenza, Japan                        n = 42      Size 3                        0.96   0.17   0.86   0.14
                            Influenza                                     n = 42      Size 4-5                      0.90   0.36   0.66   0.34
Longini et al. (1982)       1977-78 influenza A(H3N2), Tecumseh           n = 195     Size† 1-5                     0.96   0.15   0.87   0.13
                            1975-76 influenza B, Seattle                  n = 87      Size† 1-5                     0.97   0.13   0.83   0.17
                            1977-78 influenza A(H3N2), Seattle            n = 159     Size NA                       0.94   0.21   0.74   0.26
                            1978-79 influenza A(H1N1), Seattle            n = 93      Size† 1-3                     0.91   0.31   0.54   0.46
Longini et al. (1988)       1977-78, 1980-81 influenza A(H3N2), Tecumseh  n = 567     Size† 1-5     Child <18y      0.94   0.22   0.82   0.18
                                                                                                    Adult ≥18y      0.97   0.11   0.89   0.11
Addy et al. (1991)*         1977-78, 1980-81 influenza A(H3N2), Tecumseh  n = 567     Size† 1-5     Child-child     0.92   0.28   0.82   0.18
                                                                                                    Child-adult     0.96   0.13
                                                                                                    Adult-child     0.97   0.10   0.89   0.11
                                                                                                    Adult-adult     0.96   0.15
Rampey et al. (1992)*       1983 rhinovirus, Tecumseh                     n = 91      Size 3-9      Child-child     0.95   0.17   0.63   0.37
                                                                                                    Child-adult     0.97   0.13
                                                                                                    Adult-child     0.97   0.11   0.76   0.24
                                                                                                    Adult-adult     0.97   0.11
Cauchemez et al. (2004)     1999-2000 influenza A(H3N2), France           n = 334     Size 2-8      Size 2          0.87   0.43   0.92   0.08
                                                                                                    Size 3          0.91   0.31
                                                                                                    Size 4          0.93   0.25
                                                                                                    Size 5          0.94   0.21
Table A.6: Literature-based estimates of household and community transmission parameters
obtained from household final size or symptom onset data: q_HH = P(escape infection from
infected HH member per day) assuming an infectious period of 4 days, the household secondary
attack rate (SAR), i.e. the probability of being infected by another household member during
the course of the latter's infectious period, and q_com = P(escape infection from community
during epidemic period) = 1 − CPI, where CPI is the community probability of infection.
† Household size defined as the number of susceptibles in a household prior to the epidemic.
* Same age definitions for children and adults as in Longini et al. (1988), distinguishing
between susceptibles and infected.
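The quantities in Table A.6 are linked by simple arithmetic, sketched below. The caption states q_com = 1 − CPI explicitly. The relation SAR = 1 − q_HH^d, with d = 4 infectious days and independent daily exposures, is our reading of the caption rather than a formula given in the source. The function names are hypothetical.

```python
def sar_from_daily_escape(q_hh, infectious_days=4):
    """Household SAR implied by a daily escape probability q_hh.

    Assumes independent exposure on each of the infectious days.
    """
    return 1.0 - q_hh ** infectious_days

def daily_escape_from_sar(sar, infectious_days=4):
    """Inverse relation: daily escape probability implied by a household SAR."""
    return (1.0 - sar) ** (1.0 / infectious_days)

def community_escape(cpi):
    """q_com = 1 - CPI, as defined in the caption of Table A.6."""
    return 1.0 - cpi

sar = sar_from_daily_escape(0.96)      # about 0.15
q_hh = daily_escape_from_sar(0.36)     # about 0.89
q_com = community_escape(0.14)         # 0.86
```

Because the table reports values rounded to two decimals, recomputed figures agree with the tabulated ones only up to rounding.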
A.3 Epidemic Simulation Model
Figure A.3: Final fractions for 1000 simulations of a stochastic SIR epidemic process on a
2-level households model assuming random and empirical-based mixing within households.
Small outbreaks are excluded from display.
Figure A.4: Final fractions for 1000 simulations of a stochastic SIR epidemic process on a
2-level households model assuming random and empirical-based mixing within households.
Small outbreaks are excluded from display.
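To make the simulated quantity concrete, the following is a minimal sketch of a discrete-time stochastic SIR process on a two-level households model with random mixing inside households. It is not the simulation model used in the thesis. The daily time step, the community rate `beta_g`, the exponential form of the community escape probability, the threshold `min_fraction` for discarding small outbreaks and all function names are assumptions made for illustration; `q_hh` plays the role of the daily household escape probability of Table A.6.

```python
import math
import random

def simulate_final_fraction(sizes, q_hh, beta_g, infectious_days=4, seed=None):
    """Final fraction infected in one run, starting from a single index case.

    sizes: household sizes; q_hh: daily probability of escaping infection from
    one infectious household member; beta_g: hypothetical community rate per day.
    """
    rng = random.Random(seed)
    # Per-person state: 0 = susceptible, k > 0 = infectious for k more days,
    # -1 = recovered.
    households = [[0] * n for n in sizes]
    total = sum(sizes)
    households[rng.randrange(len(sizes))][0] = infectious_days
    while True:
        n_inf = sum(1 for hh in households for s in hh if s > 0)
        if n_inf == 0:
            break
        escape_global = math.exp(-beta_g * n_inf / total)
        updated = []
        for hh in households:
            k = sum(1 for s in hh if s > 0)  # infectious household members
            escape = escape_global * q_hh ** k
            nxt = []
            for s in hh:
                if s == 0:
                    nxt.append(infectious_days if rng.random() >= escape else 0)
                elif s > 1:
                    nxt.append(s - 1)
                else:
                    nxt.append(-1)  # last infectious day over, or already recovered
            updated.append(nxt)
        households = updated
    return sum(1 for hh in households for s in hh if s == -1) / total

def final_fractions(sizes, q_hh, beta_g, n_sims=1000, min_fraction=0.1, seed=0):
    """Final fractions over repeated runs, with small outbreaks excluded."""
    runs = [
        simulate_final_fraction(sizes, q_hh, beta_g, seed=seed + i)
        for i in range(n_sims)
    ]
    return [f for f in runs if f >= min_fraction]

fractions = final_fractions([3, 4, 4, 5] * 5, q_hh=0.9, beta_g=0.5, n_sims=20)
```

An empirical-network variant would replace `k` by the number of infectious members to whom a given susceptible is actually connected in the household contact network.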
Summary
Infectious diseases are responsible for millions of deaths every year, mainly in
developing countries. From the global HIV and tuberculosis epidemics to the emergence
of new pathogens and the resurgence of old ones, often in new and resistant forms,
infectious diseases have a very large impact on global health. There is a continuous
need for new and improved techniques to study the cause and spread of these diseases
in order to ultimately bring them under control.
Infectious diseases are caused by pathogenic micro-organisms or germs, such as
bacteria, viruses, parasites or fungi, and they can be spread directly or indirectly
from person to person. Direct transmission can occur through direct contact, such as
touching, kissing, biting or sexual intercourse, through respiratory droplet contact
or through the air. Respiratory droplets are small droplets of fluid that are spread
when a person sneezes or coughs. This type of transmission is usually limited to short
distances. Airborne transmission occurs when the virus spreads via very small droplets
that can evaporate after sneezing, coughing or talking. These particles can remain
suspended in the air for a fairly long time and travel over relatively long distances.
Measles, pertussis and (pandemic) influenza are examples of diseases that spread via
droplet contact or through the air. Indirect transmission occurs when the infectious
organism is transferred via objects or via insects. Malaria and dengue are examples of
diseases spread by insects. The transmission route of a disease is mainly determined
by characteristics of the infectious organism and of the host. Some micro-organisms
are restricted to a limited number of transmission routes, whereas others can lead to
infection in various ways. In this thesis, the focus is on diseases that spread from
person to person through non-sexual contact, such as droplet contact, physical contact
or airborne transmission.
When a person is infected with a viral infectious disease, the immune system
responds, on the one hand, by producing specific antibodies against the infecting
pathogen and, on the other hand, by activating cells that are meant to destroy the
pathogen. After infection, it can take some time before the infected person becomes
infectious. The length of this period depends on the disease in question. Infected
individuals can develop symptoms, but the presence of symptoms does not always
coincide with the infectious period. Infected persons who do not develop symptoms are
called asymptomatic. After some time, the person is no longer infectious and recovers.
Some diseases induce immunity after recovery, so that persons who have had the disease
cannot be infected again.
Over the past decades, the field of infectious disease epidemiology has grown
substantially, both in response to emerging threats, such as the Ebola virus, and in
order to better control endemic diseases. Models in this field can be divided into
statistical models and mathematical models. Statistical models study relationships
between variables based on data and then draw conclusions from these relationships.
Mathematical models, in contrast, describe a system by means of mathematical equations
and study how that system changes and how variables depend on the values of other
variables.
An important parameter in infectious disease modelling is the probability of contact
between an infectious source and a susceptible person. For infections that spread from
person to person, assumptions are needed that simplify the great diversity of human
relationships so that they can be used in mathematical models. Classical epidemic
models usually assumed that individuals in a population make contact completely at
random ('homogeneous mixing') and that everyone is equally susceptible and infectious.
Although such assumptions are usually sufficient in the context of modelling, they are
not very realistic. For this reason, much research in recent decades has been devoted
to modelling heterogeneity in the acquisition of infection and its effect on spread.
Anderson and May (1991) introduced a method in which certain structures are assumed
for age-specific transmission rates. These structures are specified via low-dimensional
matrices, called 'Who Acquires Infection From Whom' (WAIFW) matrices. A drawback of
this method is the subjectivity of the chosen structure and of the choice of age
categories. The effect of these choices has been studied extensively, and several
extensions of the classical WAIFW method have been proposed. Wallinga et al. (2006)
were the first to use social contact data to inform transmission rates. They assumed
that transmission rates for non-sexual, person-to-person infectious diseases are
directly proportional to the frequencies of conversational or physical contacts
estimated from surveys. This is called the 'social contact hypothesis'. Models have
also been developed that attempt to incorporate the underlying structure of contact
patterns by subdividing the population into contact structures. Examples are household
models (Longini and Koopman, 1982; Becker and Dietz, 1995; Becker and Hall, 1996),
models with two levels of mixing (Ball et al., 1997; Ball and Lyne, 2001; Demiris and
O'Neill, 2005a), network models (Anderson, 1999; Britton and O'Neill, 2002) and social
cluster models (Schinazi, 2002).
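In symbols, the social contact hypothesis can be sketched as follows. The notation is assumed here for illustration and is not taken from this section: \beta(a, a') denotes the transmission rate between individuals of ages a and a', c(a, a') the corresponding contact rate estimated from surveys, and q a proportionality factor.

```latex
% Social contact hypothesis: transmission rates proportional to contact rates
\[
\beta(a, a') = q \cdot c(a, a')
\]
% One possible relaxation: an age-dependent proportionality factor
\[
\beta(a, a') = q(a, a') \cdot c(a, a')
\]
```

Under the first form a single constant q links contacts to transmission. The second form is one way of letting susceptibility and infectiousness vary with age.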
In this thesis, we presented several strategies in which various forms of social
contact data are used to make infectious disease models more realistic. In Chapter 3,
we studied the classical social contact hypothesis and extended it for serological
data on varicella-zoster virus (VZV) in multiple European countries by accounting for
heterogeneity in susceptibility and infectiousness. Goeyvaerts et al. (2010) showed
that parameters related to infectiousness cannot be estimated from serological data
alone. To deal with this indeterminacy, we proposed using the effective reproduction
number R as a model criterion for endemic diseases. Reproduction numbers are very
important for characterizing the spread of infectious diseases and for assessing the
effort needed to bring an epidemic under control. The basic reproduction number R0 is
defined as the average number of secondary infections caused by a typical infected
individual in a completely susceptible population. The effective reproduction number
R gives the number of secondary cases in a population with a certain level of
immunity. An 'endemic' infectious disease is a disease that occurs in a population at
a constant frequency over a longer period of time. The incidence of such a disease may
show cyclical trends, but always fluctuates around a stationary mean. We concluded
that the social contact hypothesis could be improved for 10 of the 12 countries by
adding age-specific factors. This may be a consequence of differences in
susceptibility and infectiousness between persons of different age groups, but it
could also result from differences between the estimated social contacts and the
actual contacts required to spread VZV. The estimated values of the basic reproduction
number differed considerably between countries, which indicates that VZV epidemiology
in Europe is rather heterogeneous. From a set of demographic, socio-economic and
spatio-temporal factors, some were positively correlated with R0 (childhood
vaccination coverage, child care attendance, population density and average living
area per person), while others showed a negative association (income inequality,
poverty, breastfeeding and the proportion of children under 14 years of age). The
interpretation of these relationships is not equally clear in all cases, but some
factors, such as poverty, income inequality and vaccination coverage, could be
associated with countries in which children attend child care from a young age, which
promotes the spread of VZV. The analyses in this study relied on (1) endemicity of
VZV, which is a reasonable assumption for the countries we considered, and (2) the
suitability of the physical contacts from the POLYMOD study. The effect of a violation
of endemicity was studied in a small sensitivity analysis, but an in-depth analysis is
needed to assess the effect of such a violation on R. Furthermore, we showed that
information on differences in susceptibility and infectiousness would be very useful
when estimating parameters from serological data.
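The link between the two reproduction numbers can be illustrated with the textbook relation below. It holds under homogeneous mixing and is given only as a sketch. The symbol s, the susceptible fraction of the population, is introduced here for illustration, and the age-structured setting of Chapter 3 requires a more involved formulation.

```latex
% Effective versus basic reproduction number under homogeneous mixing
\[
R = R_0 \cdot s, \qquad 0 \le s \le 1
\]
% Endemic equilibrium: each case generates on average one new case
\[
R = 1 \quad \Longrightarrow \quad s = \frac{1}{R_0}
\]
```

This is why R = 1 can serve as a criterion for an endemic infection: an incidence that fluctuates around a stationary mean means that, on average, each infection is replaced by exactly one new infection.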
It is known that disease symptoms can affect a person's contact patterns, for example
when a sick person stays home from work or school. In Chapter 4, we used social
contact data from both healthy persons and persons showing symptoms to inform
transmission rates in a mathematical model that accounts for asymptomatic infections.
We applied this model to data on influenza-like illness (ILI) incidence and found that
the proportion of symptomatic infections and the relative infectiousness of
symptomatic versus asymptomatic persons are very strongly correlated. The difference
in contact behaviour between sick and healthy persons therefore allows us to estimate
one of the two parameters conditional on the other, for example when we assume that
asymptomatic persons are as infectious as symptomatic persons. In this analysis, we
also accounted for reporting rates, since it is possible that a portion of the ILI
cases goes unreported. Unfortunately, these reporting rates cannot be estimated from
the data, so the level of underreporting would have to be known in at least one age
category. We also found that our analysis supported the preferential transmission
hypothesis, which states that the probability of developing symptoms depends on the
presence or absence of symptoms in the person who transmitted the disease. However,
there is as yet no evidence in the literature of a clear relationship between the
acquired viral dose and the development of influenza symptoms. Further research is
required to verify this hypothesis. In this chapter, we described the importance of
empirical data for describing the relationship between contact rates and severity of
symptoms. Contact data for other infectious diseases are needed to extend this model.
In contrast to the commonly used 'homogeneous mixing' assumption in infectious disease
models, human populations are divided into inherent structures: people spend time in
different groups such as the household, school, the workplace, etc. In Chapters 5 and
6, we study contact heterogeneity within these groups and, more specifically, within
households. In Chapter 5, we start by introducing the first social contact survey
designed to study contact networks within households. These networks were analysed
using exponential random graph models (ERGMs). We found that the mean number of
contacts increases with household size on weekdays and that contact between a father
and his children is less likely than between father and mother, between mother and
children, and between children (with the exception of older children). To assess the
effect of these empirical contact networks on the spread of an infectious disease, we
performed a simulation study. We simulated epidemics in a community of households
based on an SIR model with two levels of mixing ('two-level mixing model') in which
the within-household contact network was based either on the empirical contact
networks or on the 'homogeneous mixing' assumption. The results showed that the latter
is a plausible assumption in this setting, since we found no appreciable differences
when using an empirical network. In these analyses, we focused on physical contacts,
but other types of contact may be more appropriate when studying the spread of a
specific infection. The duration of a contact, for example, could be relevant for some
diseases. This could be accounted for in the model by using a network analysis in
which edges are weighted by contact duration. It would also be interesting to look at
changes in contacts over time.
In Chapter 6, we combined the model developed in Chapter 5 with epidemic data from a
similar community of households to estimate the parameters of a 'two-level mixing
model' using a Bayesian approach. The underlying contact network of this model was
thus informed both by empirical contact data and by disease data. Using data on
pertussis in households with a laboratory-confirmed index case, we estimated the
within-household and community transmission rates as well as the length of the latent
period. We further plan to conduct a simulation study in which epidemics in a similar
setting will be simulated so that the performance of the model can be assessed. We
also want to compare this model with a model in which 'homogeneous mixing' is assumed
within households.
Finally, in Chapter 7 we developed a two-stage model for the 2014 Ebola outbreak.
This model was based on publicly available district-level data and showed strong
temporal and spatial variability in the transmission of Ebola virus disease (EVD) in
Guinea, Sierra Leone and Liberia. The majority of models published during the outbreak
did not account for this spatial heterogeneity, as they relied on cumulative data at
the national level. We quantified spatio-temporal transmission patterns at the
sub-national level using a growth model and estimated the effective reproduction
number in a selection of districts using compartmental models. The latter method also
allowed us to make predictions of the number of EVD cases and deaths. However,
comparing these predictions with the observed numbers showed that even short-term
predictions are not reliable. When we extended this model to account for asymptomatic
infection, we obtained a better fit, but further research into the existence of
asymptomatic EVD cases is needed. The approach in this chapter relied on the
assumption of constant underreporting of EVD cases and deaths, although it is almost
certain that changes in reporting took place during the outbreak. Unfortunately, no
data on these changes were available. Furthermore, irregularities in the data
indicated inconsistent reporting and undocumented changes.