Post on 20-May-2020

Dual-purpose Bayesian design for parameter estimation and model discrimination of models with intractable likelihoods

M. B. Dehideniya¹, C. C. Drovandi¹,², J. M. McGree¹,²

¹School of Mathematical Sciences, Queensland University of Technology, Australia
²ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)

Bayes on the Beach, 2017

Experimental design in epidemiology

Spread of a disease within a herd of cows (e.g. foot-and-mouth disease).

Competing models: SIR (Orsel et al., 2007) and SEIR (Backer et al., 2012).

It is not practical to observe the process continuously, so the design is a set of distinct observation times {t_1, t_2, ..., t_n}.


Background: Bayesian experimental designs

We consider Bayesian design

- because of the availability of important utilities (total entropy);
- to appropriately handle uncertainty about models and parameters.

Assume Bayesian inference will be undertaken upon observing data. The optimal design is d* = argmax_d u(d), where

u(d) = \sum_{m=1}^{K} p(m) \int_y u(d, y, m) p(y | d, m) dy.

Here u(d, y, m) is some measure of the information gained from design d given model m and observed data y.


Background: Bayesian experimental designs

In most cases, u(d) cannot be evaluated analytically, but it can be approximated, e.g. by Monte Carlo integration:

u(d) \approx \sum_{m=1}^{K} p(m) \frac{1}{B} \sum_{b=1}^{B} u(d, y_{mb}, m),

where y_{mb} ~ p(y | \theta_{mb}, m, d) and \theta_{mb} ~ p(\theta | m).

Some posterior inference is required to evaluate u(d, y_{mb}, m), so K × B posterior distributions must be approximated or sampled from in order to approximate u(d). This makes it a computationally challenging task to

- approximate the expected utility;
- maximise the utility.

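The Monte Carlo approximation above can be sketched as follows. The problem-specific components (a prior sampler, a data simulator and a per-draw utility) are all hypothetical placeholders here, not part of the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_utility(d, models, B=200):
    """Monte Carlo approximation of u(d).

    `models` is a list of dicts, one per candidate model, with prior model
    probability `p_m`, a prior sampler `sample_theta`, a data simulator
    `simulate`, and a per-draw utility `utility` (all placeholder names)."""
    total = 0.0
    for m in models:
        acc = 0.0
        for _ in range(B):
            theta = m["sample_theta"](rng)      # theta_mb ~ p(theta | m)
            y = m["simulate"](theta, d, rng)    # y_mb ~ p(y | theta_mb, m, d)
            acc += m["utility"](d, y, m)        # u(d, y_mb, m)
        total += m["p_m"] * acc / B             # weight by p(m), average over B
    return total
```

Each evaluation costs K × B simulator calls, and each `utility` call may itself require a posterior approximation, which is the computational bottleneck noted above.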

Total entropy

The total entropy utility function can be defined as follows:

u_T(d, y, m) = \int_\theta p(\theta | m, y, d) \log p(y | \theta, m, d) d\theta - \log p(y | d).

In general, total entropy is a computationally challenging utility and has seen limited use (Borth, 1975; McGree, 2017). For models with intractable likelihoods, p(y | \theta, m, d) is difficult to evaluate, and hence p(\theta | m, y, d) and \log p(y | d) are even more difficult to approximate. This motivates the use of a synthetic likelihood approach.


Other utilities

Estimation (KLD) (McGree, 2017):

u_P(d, y, m) = \int_\theta p(\theta | m, y, d) \log p(y | \theta, m, d) d\theta - \log p(y | m, d).

Model discrimination (Drovandi, McGree and Pettitt, 2014):

u_M(d, y, m) = \log p(m | y, d).

For models with intractable likelihoods, p(y | \theta, m, d) is difficult to evaluate, and hence p(\theta | m, y, d), \log p(y | m, d) and \log p(m | y, d) are more difficult to approximate. This motivates the use of a synthetic likelihood approach more generally than just with total entropy.


Synthetic likelihood approach

Wood (2010) approach: approximate the intractable likelihood p(y | \theta, m, d) by a multivariate normal distribution fitted to data simulated from the model.

Synthetic likelihood approach

Counts will be observed from our experiments, so an extension for discrete data is needed. In our case, no summary statistics are considered; the mean and covariance of the simulated data are used directly. The idea is to apply the normal distribution with a continuity correction, so the likelihood for discrete data is

p_{SL}(Y = y | \theta, m, d) = P(y_1 - c < Y_1 < y_1 + c, \ldots, y_p - c < Y_p < y_p + c),

where (Y_1, \ldots, Y_p) ~ N(\mu(\theta, m, d), \Sigma(\theta, m, d)) and c = 0.5.

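A rough sketch of the continuity-corrected synthetic log-likelihood. `simulate` is a placeholder for the model simulator; the normal is fitted to the mean and covariance of simulated counts, and for simplicity the rectangle probability is itself estimated by Monte Carlo draws from the fitted normal (a dedicated multivariate-normal rectangle routine could be used instead):

```python
import numpy as np

rng = np.random.default_rng(1)

def synthetic_loglik(y, simulate, theta, n_sim=500, n_mc=20000, c=0.5):
    """Continuity-corrected synthetic log-likelihood for a count vector y.

    `simulate(theta, rng)` (placeholder) returns one replicate of the
    p-dimensional count vector. The mean and covariance of n_sim replicates
    define the fitted normal N(mu, Sigma); the rectangle probability
    P(y - c < Y < y + c) is then estimated from n_mc normal draws."""
    sims = np.array([simulate(theta, rng) for _ in range(n_sim)], dtype=float)
    mu = sims.mean(axis=0)
    cov = np.atleast_2d(np.cov(sims, rowvar=False))
    cov += 1e-8 * np.eye(len(np.atleast_1d(mu)))      # numerical jitter
    draws = rng.multivariate_normal(np.atleast_1d(mu), cov, size=n_mc)
    inside = np.all(np.abs(draws - np.asarray(y, dtype=float)) < c, axis=1)
    return np.log(inside.mean() + 1e-300)             # guard against log(0)
```

Counts near the prior-predictive mean receive a higher synthetic log-likelihood than counts in the tails, as expected.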

Approximating utility functions

The marginal likelihood can be approximated as follows:

p(y | m, d) \approx \frac{1}{B} \sum_{b=1}^{B} p_{SL}(y | \theta_b, m, d),

where \theta_b ~ p(\theta | m). Similarly for p(y | d):

p(y | d) = \sum_{m=1}^{K} p(y | m, d) p(m).

Then the posterior model probabilities follow from Bayes' theorem:

p(m | y, d) = p(y | m, d) p(m) / p(y | d).

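These three steps reduce to a few stable log-space operations, assuming the per-model synthetic log-likelihoods at prior draws have already been computed:

```python
import numpy as np

def posterior_model_probs(logliks, prior_m):
    """Posterior model probabilities from synthetic log-likelihoods.

    `logliks[m]` holds B values of log pSL(y | theta_b, m, d) at prior draws
    theta_b ~ p(theta | m); averaging their exponentials (via a log-sum-exp
    shift for stability) gives p(y|m,d), and Bayes' theorem gives p(m|y,d)."""
    log_ml = []
    for ll in logliks:
        ll = np.asarray(ll, dtype=float)
        a = ll.max()
        log_ml.append(a + np.log(np.exp(ll - a).mean()))  # log p(y|m,d)
    log_post = np.array(log_ml) + np.log(prior_m)         # + log p(m)
    log_post -= log_post.max()                            # normalise safely
    w = np.exp(log_post)
    return w / w.sum()                                    # p(m|y,d)
```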

Approximating utility functions

Employ importance sampling to approximate the posterior distributions, using the prior as the importance distribution:

- sample \theta_b ~ p(\theta), b = 1, ..., B (equal weights);
- update the weights via the synthetic likelihood to yield W_b, the normalised importance weights.

Then p(\theta | y, m, d) can be approximated by the particle set {\theta_b, W_b}_{b=1}^{B}.

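The reweighting step might look as follows, with `loglik_fn` a placeholder for the synthetic log-likelihood of the observed data at a given parameter value:

```python
import numpy as np

def posterior_particles(thetas, loglik_fn):
    """Prior-as-importance-distribution reweighting.

    `thetas` are B draws from the prior (equal initial weights);
    `loglik_fn(theta)` returns the synthetic log-likelihood of the observed
    data. The returned normalised weights W_b, paired with the draws, form
    a particle approximation of the posterior."""
    logw = np.array([loglik_fn(t) for t in thetas], dtype=float)
    logw -= logw.max()                 # stabilise before exponentiating
    w = np.exp(logw)
    return w / w.sum()
```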

Approximating utility functions

Estimation:

u_P(d, y, m) = \sum_{b=1}^{B} W_{mb} \log p(y | \theta_b, m, d) - \log p(y | m, d).

Discrimination:

u_M(d, y, m) = \log p(m | y, d).

Total entropy:

u_T(d, y, m) = \sum_{b=1}^{B} W_{mb} \log p(y | \theta_b, m, d) - \log p(y | d).
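Given the quantities from the previous slides, the three utilities for one simulated (y, m) pair reduce to a few array operations; all inputs here are assumed precomputed:

```python
import numpy as np

def utilities(W, logliks, log_ml, log_evid, log_post_m):
    """Weighted-particle estimates of the three utilities for one (y, m).

    W: normalised importance weights W_mb; logliks: log p(y|theta_b,m,d);
    log_ml: log p(y|m,d); log_evid: log p(y|d); log_post_m: log p(m|y,d)."""
    w_term = float(np.dot(W, logliks))   # sum_b W_mb log p(y|theta_b,m,d)
    u_P = w_term - log_ml                # estimation (KLD)
    u_M = log_post_m                     # discrimination
    u_T = w_term - log_evid              # total entropy
    return u_P, u_M, u_T
```

Note that u_T = u_P + u_M − log p(m): up to the constant log p(m), total entropy decomposes into the estimation and discrimination utilities.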

SIR model

Given that at time t there are s susceptible and i infectious individuals in a closed population of size N, the probabilities of the possible events in the next time period \Delta t are:

a susceptible becomes an infectious individual,

P[s - 1, i + 1 | s, i] = \beta \frac{s i}{N} \Delta t + o(\Delta t);

an infectious individual recovers,

P[s, i - 1 | s, i] = \alpha i \Delta t + o(\Delta t).
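A minimal Gillespie-style simulator for this continuous-time Markov chain, recording the number of infectious individuals at the design's observation times (the function and argument names are illustrative, not from the paper):

```python
import numpy as np

def simulate_sir(beta, alpha, N, i0, obs_times, rng):
    """Exact stochastic simulation of the SIR model above.

    Event rates: infection beta*s*i/N, recovery alpha*i. Returns the
    infectious count at each time in `obs_times` (assumed sorted)."""
    s, i, t = N - i0, i0, 0.0
    out, k = [], 0
    while k < len(obs_times):
        rate_inf = beta * s * i / N
        rate_rec = alpha * i
        total = rate_inf + rate_rec
        t_next = t + rng.exponential(1.0 / total) if total > 0 else np.inf
        # record any observation times passed before the next event
        while k < len(obs_times) and obs_times[k] < t_next:
            out.append(i)
            k += 1
        if total == 0:                       # epidemic has died out
            break
        t = t_next
        if rng.random() < rate_inf / total:  # infection event
            s, i = s - 1, i + 1
        else:                                # recovery event
            i -= 1
    return np.array(out)
```

Repeated calls with prior draws of (beta, alpha) provide the simulated count vectors that feed the synthetic likelihood.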

SEIR model

The probabilities of the possible events in the next time period \Delta t are:

a susceptible becomes an exposed individual,

P[s - 1, e + 1, i | s, e, i] = \beta \frac{s i}{N} \Delta t + o(\Delta t);

an exposed individual becomes infectious,

P[s, e - 1, i + 1 | s, e, i] = \alpha_I e \Delta t + o(\Delta t);

an infectious individual recovers,

P[s, e, i - 1 | s, e, i] = \alpha_R i \Delta t + o(\Delta t).

Application

Prior predictive distribution under the SIR and SEIR models (prior for the SEIR model taken from Backer et al., 2012):

SEIR: \beta ~ LN(0.44, 0.16²), \alpha_I ~ G(25.55, 0.02), \alpha_R ~ G(7.25, 0.04).

SIR: \beta ~ LN(−0.09, 0.19²), \alpha ~ G(10.30, 0.02).

Optimal designs

Refined coordinate exchange algorithm (Dehideniya et al., 2017).

Utility   Optimal design d*             U(d*)
KLD       (11.6)                         0.91
          (9.4, 19.1)                    1.27
          (7.4, 14.2, 27.1)              1.47
          (7.3, 10.9, 16.4, 27.1)        1.60
MI        (3.1)                         -0.43
          (4.1, 16)                     -0.34
          (0.7, 4.1, 18.4)              -0.30
          (0.7, 4.1, 10.1, 25.3)        -0.28
TE        (7.0)                          0.97
          (6.7, 17.5)                    1.56
          (6.5, 13.5, 27.1)              1.81
          (5.5, 10.8, 16.3, 27.1)        1.97

Performance of optimal designs in model discrimination

[Figure: boxplots of the posterior model probability of the data-generating model, p(m = SIR | y) in panel (a) (SIR model) and p(m = SEIR | y) in panel (b) (SEIR model), for the MI, TE, KLD and RD designs with 1 to 4 observation times.]

Performance of optimal designs in parameter estimation

[Figure: boxplots of log(1/det(cov(\theta | y))) for the MI, TE, KLD and RD designs with 1 to 4 observation times; panel (a) SIR model, panel (b) SEIR model.]

Discussion

An approach to designing experiments for models with intractable likelihoods.

Flexible, in that a variety of utility functions can be efficiently estimated.


Future research

Is the normal approximation reasonable in general? (Other distributions were considered.)

How small can the sample size (number of individuals) be?

How can this method be extended to high-dimensional Bayesian design problems for models with intractable likelihoods?

- Suitable posterior approximations.
- Possible computational resources (GPU).


Key references

Backer, J., Hagenaars, T., Nodelijk, G., and van Roermund, H. (2012). Vaccination against foot-and-mouth disease I: epidemiological consequences. Preventive Veterinary Medicine 107, 27-40.

Borth, D. M. (1975). A total entropy criterion for the dual problem of model discrimination and parameter estimation. Journal of the Royal Statistical Society, Series B (Methodological) 37(1), 77-87.

Dehideniya, M. B., Drovandi, C. C., and McGree, J. M. (2017). Optimal Bayesian design for discriminating between models with intractable likelihoods in epidemiology. Technical report, Queensland University of Technology.

Drovandi, C. C., McGree, J. M., and Pettitt, A. N. (2014). A sequential Monte Carlo algorithm to incorporate model uncertainty in Bayesian sequential design. Journal of Computational and Graphical Statistics 23(1), 3-24.

McGree, J. M. (2017). Developments of the total entropy utility function for the dual purpose of model discrimination and parameter estimation in Bayesian design. Computational Statistics & Data Analysis 113, 207-225.

Orsel, K., de Jong, M., Bouma, A., Stegeman, J., and Dekker, A. (2007). The effect of vaccination on foot and mouth disease virus transmission among dairy cows. Vaccine 25, 327-335.

Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466, 1102-1104.
