Statistical inference for epidemics on networks

Statistical inference for epidemics on networks PD O’Neill, T Kypraios (Mathematical Sciences, University of Nottingham) Sep 2011 ICMS, Edinburgh


TRANSCRIPT

Page 1: Statistical inference for epidemics on networks

Statistical inference for epidemics on networks

PD O’Neill, T Kypraios (Mathematical Sciences, University of Nottingham)

Sep 2011, ICMS, Edinburgh

Page 2: Statistical inference for epidemics on networks

Outline
1. Orientation
2. Inference for epidemics
3. Network models
4. Inference for network models
5. Open problems


Page 4: Statistical inference for epidemics on networks

1. Orientation

The basic problem

Given data on a network and an infectious disease, can model parameters be inferred?

Page 5: Statistical inference for epidemics on networks

The basic problem

Data
• Can be partial or complete for network
• Usually partial for disease
• Can be multi-scale
• May be longitudinal or not

Page 6: Statistical inference for epidemics on networks

The basic problem

Model
• Can be for the network
• Can be for the disease
• Can be both


Page 8: Statistical inference for epidemics on networks

2. Inference for epidemics: Inference for network and disease given partial temporal data

Consider an Erdős-Rényi random graph on N vertices. Let p = Prob(two given vertices are connected by an edge).

Run an SIR model on the graph: infection rate = β, removal rate = γ.
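
A minimal simulation sketch of the model on this slide, in Python. It assumes Markovian dynamics (each susceptible-infectious edge transmits at rate β and each infective is removed at rate γ) and a single initial infective, vertex 0. The function name simulate_sir_on_er_graph and these choices are illustrative assumptions, not taken from the talk.

```python
import random


def simulate_sir_on_er_graph(n, p, beta, gamma, seed=0):
    """Markov SIR epidemic on an Erdos-Renyi graph; returns graph and event times."""
    rng = random.Random(seed)

    # Each pair of vertices is joined independently with probability p.
    neighbours = {v: set() for v in range(n)}
    for j in range(n):
        for k in range(j + 1, n):
            if rng.random() < p:
                neighbours[j].add(k)
                neighbours[k].add(j)

    susceptible = set(range(1, n))
    infectious = {0}              # assumption: vertex 0 is the initial infective
    infection_time = {0: 0.0}
    removal_time = {}
    t = 0.0
    while infectious:
        # Susceptible-infectious edges, each transmitting at rate beta.
        si_edges = [(i, s) for i in infectious
                    for s in neighbours[i] if s in susceptible]
        infection_rate = beta * len(si_edges)
        total_rate = infection_rate + gamma * len(infectious)
        t += rng.expovariate(total_rate)
        if rng.random() * total_rate < infection_rate:
            _, s = rng.choice(si_edges)
            susceptible.remove(s)
            infectious.add(s)
            infection_time[s] = t
        else:
            i = rng.choice(sorted(infectious))
            infectious.remove(i)
            removal_time[i] = t
    return neighbours, infection_time, removal_time


graph, inf_times, rem_times = simulate_sir_on_er_graph(30, 0.2, 1.0, 0.5, seed=1)
print(len(inf_times), "of 30 individuals were ever infected")
```

In the inference problem discussed next, only rem_times would be observed; graph and inf_times are the unobserved quantities.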

Page 9: Statistical inference for epidemics on networks

Given complete observation of the removal process, we wish to infer p, β and γ,

i.e. find the posterior density π(p, β, γ | data).

Page 10: Statistical inference for epidemics on networks

Bayes’ Theorem gives

π(p, β, γ | data) ∝ π(data | p, β, γ) π(p, β, γ)

However, the likelihood π(data | p, β, γ) is intractable in practice.

Page 11: Statistical inference for epidemics on networks


One solution is to augment the parameter space to include the unobserved infection events.

This leads to a tractable likelihood, and the resulting posterior density can be explored using MCMC methods.

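
To see why augmentation helps, here is a hedged Python sketch of the log-likelihood once the infection times (and, in this sketch, the graph) are treated as known. It assumes a Markov SIR model with exponential infectious periods, a single initial infective, and an epidemic that has finished, so every infected individual has a removal time. The function name augmented_log_likelihood and its arguments are hypothetical.

```python
import math


def augmented_log_likelihood(edges, infection_time, removal_time,
                             beta, gamma, initial):
    """Log-likelihood of a Markov SIR epidemic on a known graph, given all event times."""
    neighbours = {}
    for j, k in edges:
        neighbours.setdefault(j, set()).add(k)
        neighbours.setdefault(k, set()).add(j)

    loglik = 0.0
    for j, t_inf in infection_time.items():
        # Removal: exponential infectious period with rate gamma.
        loglik += math.log(gamma) - gamma * (removal_time[j] - t_inf)
        # Exposure: time each neighbour k stays susceptible while j is infectious.
        for k in neighbours.get(j, ()):
            end = min(removal_time[j], infection_time.get(k, math.inf))
            loglik -= beta * max(0.0, end - t_inf)
        if j != initial:
            # Infection: number of neighbours that are infectious just before t_inf.
            n_inf = sum(1 for k in neighbours.get(j, ())
                        if k in infection_time
                        and infection_time[k] < t_inf < removal_time[k])
            if n_inf == 0:
                return -math.inf  # this infection cannot be explained by the graph
            loglik += math.log(beta * n_inf)
    return loglik


# Two connected individuals: 0 infects 1 at time 1.
ll = augmented_log_likelihood([(0, 1)], {0: 0.0, 1: 1.0}, {0: 2.0, 1: 3.0},
                              beta=0.5, gamma=1.0, initial=0)
print(ll)
```

An MCMC scheme would alternate updates of the parameters with updates of the unobserved infection times (and graph), evaluating a function of this kind at each step.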

Page 12: Statistical inference for epidemics on networks

• Britton & O’Neill (2002) – basic idea
• Neal & Roberts (2005) – improved computational aspects
• Ray & Marzouk (2008) – extended to two populations
• Groendyke, Welch & Hunter (2011a) – SEIR model

Page 13: Statistical inference for epidemics on networks

• Groendyke, Welch & Hunter (2011b) – more general network model where

p(j,k) = function of covariates of j, k and (j,k)

but edges are still independent

Page 14: Statistical inference for epidemics on networks


General comment – this estimation problem often leads to parameter identifiability issues.

e.g. A highly connected network and low-infectivity disease, or a sparse network and high-infectivity disease?


Page 15: Statistical inference for epidemics on networks

2. Inference for epidemics: Inference for disease given final outcome data and network data

• Data tell us which individuals become infected and who is connected to whom.
• Again the likelihood is intractable.
• Augment data with network of infectious contacts (Demiris & O’Neill 2005; O’Neill 2009; van Boven et al. 2010).


Page 17: Statistical inference for epidemics on networks

3. Network models

Most real-life networks require more general models which can incorporate a wide range of features,

e.g. transitivity, homophily, self-organization, …

Page 18: Statistical inference for epidemics on networks

3. Network models: Latent position cluster models (Handcock, Raftery & Tantrum, 2007)

Basic idea:
• Directed edges have covariates X(i,j).
• Each vertex has a position in multivariate social space Z(i).
• Edge prob(i,j) = f( X(i,j), | Z(i) – Z(j) | ).
• Z(i)’s are i.i.d. (e.g. Gaussian mixture).

Page 19: Statistical inference for epidemics on networks


Key point is that edge probabilities are conditionally (upon the Z(i)’s) independent.

Given data on observed edges, inference can be carried out using MCMC or even ML.


Page 20: Statistical inference for epidemics on networks

3. Network models: Exponential Random Graph Models (Frank & Strauss, 1986)

Very widely used class of models in the social network literature.

Can incorporate many features of interest.

Page 21: Statistical inference for epidemics on networks

Let Y be a random N × N adjacency matrix: Y(i,j) = 1 if edge from i to j is present, 0 if not.

For Y = y and i = 1,…,m, s(i,y) denotes a summary statistic of y (e.g. number of edges, triangles, 3-stars, …).

Page 22: Statistical inference for epidemics on networks

Then the ERGM is defined by

π(y | θ) = exp( Σ_i θ(i) s(i,y) ) / z(θ),

where θ = (θ(1), …, θ(m)) is a real m-vector and z(θ) = Σ_y exp( Σ_i θ(i) s(i,y) ).

Page 23: Statistical inference for epidemics on networks


Example: N=3, s(1,y) = # edges, s(2,y) = # triangles

8 possible graphs (4 up to isomorphism)


Page 24: Statistical inference for epidemics on networks

For a graph y with 0, 1, 2 or 3 edges respectively:

π(y | θ) ∝ 1, exp(θ(1)), exp(2θ(1)), exp(3θ(1) + θ(2))

z(θ) = 1 + 3 exp(θ(1)) + 3 exp(2θ(1)) + exp(3θ(1) + θ(2))

Page 25: Statistical inference for epidemics on networks

θ(i) > 0 promotes s(i,y)
θ(i) < 0 inhibits s(i,y)

e.g. in the example
θ(1) > 0 promotes edges
θ(1) < 0 inhibits edges

Page 26: Statistical inference for epidemics on networks

Often see near-degeneracy in ERGMs, in the sense that a small number of graphs y are far more likely than all the others.

Page 27: Statistical inference for epidemics on networks

In the N = 3 example with θ = (2, 1), the four graph types (0, 1, 2 and 3 edges) have total probabilities

π(y | θ) = 0.001, 0.017, 0.128, 0.854
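
The N = 3 example is small enough to enumerate. The following Python sketch computes z(θ) and the total probability of each graph type (grouped by number of edges) directly from the ERGM definition with s(1,y) = # edges and s(2,y) = # triangles. The function name ergm_probabilities is illustrative only.

```python
import itertools
import math


def ergm_probabilities(theta1, theta2):
    """Exact ERGM probabilities for undirected graphs on 3 vertices, grouped by edge count."""
    class_weight = [0.0, 0.0, 0.0, 0.0]
    # Each graph is a present/absent choice for each of the 3 possible edges.
    for graph in itertools.product((0, 1), repeat=3):
        n_edges = sum(graph)
        n_triangles = 1 if n_edges == 3 else 0
        class_weight[n_edges] += math.exp(theta1 * n_edges + theta2 * n_triangles)
    z = sum(class_weight)  # normalising constant z(theta)
    return [w / z for w in class_weight], z


probs, z = ergm_probabilities(2.0, 1.0)
print([round(p, 3) for p in probs])  # [0.001, 0.017, 0.128, 0.854]
```

With θ = (2, 1) the triangle alone carries about 85% of the probability mass, which is the near-degeneracy described above.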

Page 28: Statistical inference for epidemics on networks

A key computational problem with ERGMs is that

z(θ) = Σ_y exp( Σ_i θ(i) s(i,y) )

is intractable unless N is very small.
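
A quick way to see the scale of the problem: for undirected graphs the sum defining z(θ) has one term per graph, i.e. 2^(N(N-1)/2) terms. A short Python check (the function name number_of_graphs is illustrative):

```python
from math import comb


def number_of_graphs(n):
    """Number of undirected simple graphs on n labelled vertices (terms in z(theta))."""
    return 2 ** comb(n, 2)


for n in (3, 5, 10, 20):
    print(n, number_of_graphs(n))
```

For directed graphs, as in the adjacency-matrix definition, the count is larger still: 2^(N(N-1)).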


Page 30: Statistical inference for epidemics on networks

4. Inference for network models: Exponential Random Graph Models

Options include:
• Maximum pseudolikelihood – not that good in general
• Monte Carlo ML estimation – various practical problems

Page 31: Statistical inference for epidemics on networks

Standard MCMC cannot be used since the posterior density is “doubly intractable”:

π(θ | y) ∝ π(y | θ) π(θ) = f(y | θ) π(θ) / z(θ),

i.e. the likelihood itself is only known up to proportionality (we know f(y | θ), not z(θ)).

Page 32: Statistical inference for epidemics on networks

One option (Møller et al., 2006) is to augment the parameter space to include a new variable on the data space – call this x – and then work with the augmented posterior density

π(x, θ | y).

Page 33: Statistical inference for epidemics on networks

π(x, θ | y) = π(x | θ, y) π(θ | y)

= π(x | θ, y) f(y | θ) π(θ) / ( z(θ) π(y) )

Page 34: Statistical inference for epidemics on networks

A Metropolis-Hastings algorithm requires a proposal to update (x, θ).

If we can draw a random graph from the distribution of y given any parameter value, then we may draw x* from the ERGM with the proposed parameter θ*, i.e. choose

q(x*, θ* | x, θ) = q(x* | θ*) q(θ* | θ) = f(x* | θ*) q(θ* | θ) / z(θ*)

Page 35: Statistical inference for epidemics on networks

The resulting M-H acceptance probability ratio is then of the form

[ π(x* | θ*, y) f(y | θ*) f(x | θ) q(θ | θ*) π(θ*) ] / [ π(x | θ, y) f(y | θ) f(x* | θ*) q(θ* | θ) π(θ) ]

The factors z(θ) and z(θ*) from the posterior cancel against those from the proposal, so z(θ) is not required.
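
A hedged Python sketch of an auxiliary-variable sampler of this kind on the N = 3 toy example (edges and triangles). The assumptions, none of which come from the talk: x* is drawn from the ERGM at the proposed value θ*; the auxiliary density π(x | θ, y) is taken to be f(x | θ̂)/z(θ̂) for a fixed θ̂; the prior on θ is independent Gaussian; and the θ proposal is a symmetric random walk, so the q terms cancel. All function names are hypothetical.

```python
import itertools
import math
import random

GRAPHS = list(itertools.product((0, 1), repeat=3))  # all 8 graphs on 3 vertices


def log_f(graph, theta):
    """Unnormalised ERGM log-density with s(1,y) = # edges, s(2,y) = # triangles."""
    n_edges = sum(graph)
    return theta[0] * n_edges + theta[1] * (1 if n_edges == 3 else 0)


def draw_graph(theta, rng):
    # Exact draw by enumerating all 8 graphs; a stand-in for the MCMC
    # sampler that would be needed when N is larger.
    weights = [math.exp(log_f(g, theta)) for g in GRAPHS]
    return rng.choices(GRAPHS, weights=weights, k=1)[0]


def log_prior(theta, sd=5.0):
    return -sum(t * t for t in theta) / (2.0 * sd * sd)


def moller_chain(y, n_iter, theta_hat=(0.0, 0.0), step=0.5, seed=1):
    """Metropolis-Hastings on (x, theta); the acceptance ratio never uses z."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    x = draw_graph(theta, rng)
    samples = []
    for _ in range(n_iter):
        theta_new = tuple(t + rng.gauss(0.0, step) for t in theta)
        x_new = draw_graph(theta_new, rng)
        log_h = (log_f(x_new, theta_hat) - log_f(x, theta_hat)   # auxiliary density
                 + log_f(y, theta_new) - log_f(y, theta)         # unnormalised likelihood
                 + log_f(x, theta) - log_f(x_new, theta_new)     # proposal for x
                 + log_prior(theta_new) - log_prior(theta))
        if rng.random() < math.exp(min(0.0, log_h)):
            theta, x = theta_new, x_new
        samples.append(theta)
    return samples


chain = moller_chain(y=(1, 1, 1), n_iter=2000)
print("posterior mean of theta(1):", sum(t[0] for t in chain) / len(chain))
```

Only draw_graph touches the normalisation here, and only because N = 3 allows enumeration; the quantity log_h is built entirely from the unnormalised f.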

Page 36: Statistical inference for epidemics on networks

The crucial assumption is the ability to sample from the original ERGM given θ; in practice this is usually achieved using MCMC.

Variations of the Møller method have been developed – essentially choices of π(x | θ, y).


Page 38: Statistical inference for epidemics on networks

5. Open Problems
1. Simulating random graphs from ERGMs?

• MCMC is considered the gold-standard method for drawing from π(y | θ) for given θ, which is essential in order to draw inference for θ.

• Is it possible to use an exact algorithm instead? For instance, rejection sampling? What would be a good proposal distribution? Efficiency?

Page 39: Statistical inference for epidemics on networks

5. Open Problems
2. Approximate inference for ERGMs?

• Bayesian inference for ERGMs often relies on advanced MCMC algorithms (Caimo and Friel, 2010).

• Alternatively, one can resort to approximate methods which are easier to implement.

Page 40: Statistical inference for epidemics on networks

Data y; parameter θ; target distribution π(θ | y).

Consider the following algorithm:

1. Draw θ* from the prior π(θ).
2. Simulate data y* from π(y* | θ*).
3. If y* = y then accept θ*.
4. Go to 1.
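
The four steps above can be sketched in Python for the simplest ERGM, with the edge count as its only statistic. In that special case the edges are independent, each present with probability exp(θ)/(1 + exp(θ)), so simulation is exact. The Gaussian prior, the tiny 4-vertex data set and the function names are assumptions made for illustration.

```python
import math
import random


def simulate_edges(theta, n_pairs, rng):
    """Simulate an edge-only ERGM: independent edges with probability logistic(theta)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return tuple(1 if rng.random() < p else 0 for _ in range(n_pairs))


def exact_rejection_sampler(y, n_accept, prior_sd=2.0, seed=0):
    """Likelihood-free rejection sampling: keep theta* only when y* equals y exactly."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta_star = rng.gauss(0.0, prior_sd)             # 1. draw from the prior
        y_star = simulate_edges(theta_star, len(y), rng)  # 2. simulate data
        if y_star == y:                                   # 3. accept on exact match
            accepted.append(theta_star)
    return accepted                                       # 4. loop until enough draws


# Observed graph on 4 vertices, stored as indicators for the 6 vertex pairs.
y_obs = (1, 1, 0, 1, 0, 1)
draws = exact_rejection_sampler(y_obs, n_accept=50)
print(sum(draws) / len(draws))
```

The accepted values are exact draws from π(θ | y), but the number of proposals needed grows very quickly with the number of vertex pairs.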

Page 41: Statistical inference for epidemics on networks

• No evaluation of the likelihood is required (suitable when the likelihood is intractable or expensive to compute).

• Relies on being able to simulate data from the model (which is usually easy to do ...)

• Step 3 may not be feasible in practice...

Page 42: Statistical inference for epidemics on networks

A variation of the previous algorithm:

1. Draw θ* from the prior π(θ).
2. Simulate data y* from π(y* | θ*).
3. If ρ(y, y*) ≤ ε then accept θ*.
4. Go to 1.

where ρ(y, y*) is a measure of distance between y and y*.

Page 43: Statistical inference for epidemics on networks


Summary statistics

Instead of calculating the distance between the “raw data” y and y*, we can calculate the distance between some summary statistics of the data S(y) and S(y*), i.e.

ρ(S(y), S(y*))
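
A sketch of the tolerance-based variant with a summary statistic, again in Python and again for the edge-only ERGM so that simulation is exact. Here S(y) is the number of edges and ρ is the absolute difference; the prior, the tolerance and the function name abc_rejection are illustrative assumptions.

```python
import math
import random


def abc_rejection(s_obs, n_pairs, epsilon, n_accept, prior_sd=2.0, seed=0):
    """ABC rejection sampler accepting theta* when |S(y*) - S(y)| <= epsilon."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta_star = rng.gauss(0.0, prior_sd)            # draw from the prior
        p = 1.0 / (1.0 + math.exp(-theta_star))
        # S(y*): edge count of a simulated graph with independent edges.
        s_star = sum(1 for _ in range(n_pairs) if rng.random() < p)
        if abs(s_star - s_obs) <= epsilon:               # rho(S(y), S(y*)) <= epsilon
            accepted.append(theta_star)
    return accepted


# 20 vertices give 190 vertex pairs; suppose 40 edges were observed.
draws = abc_rejection(s_obs=40, n_pairs=190, epsilon=3, n_accept=100)
print(sum(draws) / len(draws))
```

Setting epsilon = 0 recovers exact matching on the summary statistic; larger values accept more often at the cost of a cruder approximation to π(θ | y).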

Page 44: Statistical inference for epidemics on networks

• Recall that the likelihood function is written as

π(y | θ) = exp( Σ_i θ(i) s(i,y) ) / z(θ).

• Therefore, a natural choice for summary statistics could be

s(1,y), s(2,y), ..., which are sufficient statistics too.

Page 45: Statistical inference for epidemics on networks

Approximate Bayesian Computation (ABC)

Challenges

• How to choose the distance metric ρ(∙)?
• How to choose ε?
• Sequential Monte Carlo (SMC) methods.

Page 46: Statistical inference for epidemics on networks

5. Open Problems
3. Model Choice for ERGMs?

• Suppose we have some network data and a number of different ERGMs that we could fit to these data.
• How do we decide which ERGM the data support most?
• How can we tell if a particular ERGM offers a good fit to the data?
• Model choice/selection

Page 47: Statistical inference for epidemics on networks

• Bayesian model choice, in general, can be problematic (Bayes factors, marginal likelihoods).

• Key concept is the marginal likelihood, π(y):

π(θ | y) = π(y | θ) π(θ) / π(y),

where π(y) = ∫ π(y | θ) π(θ) dθ
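
As a small illustration of the integral, the Python sketch below estimates π(y) by averaging the likelihood over draws from the prior. It uses the edge-only ERGM, for which z(θ) = (1 + exp(θ))^M with M the number of vertex pairs, so the likelihood can be evaluated; this closed form is not available for general ERGMs. The Gaussian prior and the function name are assumptions.

```python
import math
import random


def log_marginal_likelihood_mc(n_edges, n_pairs, n_draws=20000,
                               prior_sd=2.0, seed=0):
    """Simple Monte Carlo estimate of log pi(y) for the edge-only ERGM."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, prior_sd)  # draw theta from the prior
        # log pi(y | theta) = theta * s(y) - log z(theta), z(theta) = (1 + e^theta)^M
        log_lik = theta * n_edges - n_pairs * math.log1p(math.exp(theta))
        total += math.exp(log_lik)
    return math.log(total / n_draws)


print(log_marginal_likelihood_mc(n_edges=4, n_pairs=6))
```

A Bayes factor between two candidate models is the ratio of two such marginal likelihoods.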

Page 48: Statistical inference for epidemics on networks

• Exact (Bayesian) inference for ERGMs is itself hard due to the fact that the posterior density is “doubly intractable”:

π(θ | y) ∝ π(y | θ) π(θ) = f(y | θ) π(θ) / z(θ)

• Hence, (Bayesian) model choice would be even harder due to z(θ) being unknown.

Page 49: Statistical inference for epidemics on networks

5. Open Problems
4. Need for alternative, computationally tractable network models?

• Using ERGMs in large networks can be very computationally intensive.

• Need for developing models which preserve (some of) the nice features of ERGMs but are easier to handle computationally and more suitable for epidemic modelling?