francisco josé vázquez polo [personales.ulpgc.es/fjvpolo.dmc]

48
1 Francisco José Vázquez Polo [www.personales.ulpgc.es/fjvpolo.dmc] Miguel Ángel Negrín Hernández [www.personales.ulpgc.es/mnegrin.dmc] {fjvpolo or mnegrin}@dmc.ulpgc.es Course on Bayesian Methods Basics (continued): Models for proportions and means

Upload: zahina

Post on 16-Jan-2016

35 views

Category:

Documents


0 download

DESCRIPTION

Course on Bayesian Methods. Basics (continued): Models for proportions and means. Francisco José Vázquez Polo [www.personales.ulpgc.es/fjvpolo.dmc] Miguel Ángel Negrín Hernández [www.personales.ulpgc.es/mnegrin.dmc] {fjvpolo or mnegrin}@dmc.ulpgc.es. 1. Markov Chain Monte Carlo Methods - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

1

Francisco José Vázquez Polo [www.personales.ulpgc.es/fjvpolo.dmc]

Miguel Ángel Negrín Hernández [www.personales.ulpgc.es/mnegrin.dmc]

{fjvpolo or mnegrin}@dmc.ulpgc.es

Course on Bayesian Methods

Basics (continued):

Models for proportions and means

Page 2: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Markov Chain Monte Carlo Methods

Goals

-To make inference about model parameters

- To make predictions

¿Simulation?

Posterior distribution is not analytically tractable

Page 3: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Monte Carlo integration and MCMC

Monte Carlo integration

-Draw independent samples from required distribution

-Use sample averages to approximate expectations

Markov Chain Monte Carlo (MCMC)

- Draws samples by running a Markov chain that is constructed so that its limiting distribution is the joint distribution of interest

Page 4: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Markov chains

A Markov chain is a sequence of random variables X0, X1, X2,…

At each time t ≥ 0 the next state Xt+1 is sampled from a distribution

P(Xt+1 | Xt)

that depends only on the state at time t

- Called “transition kernel”

Under certain regularity conditions, the iterates from a Markov chain will gradually converge to draws from a unique stationary or invariant distribution

-i.e. chain will forget its initial state

-As t increases, samples points Xt will look increasingly like (correlated) samples from the stationary distribution

Page 5: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Suppose

-MC is run for N (large number) iterations

- We throw away output from first m iterations (burn-in)

-Regularity conditions are met

Then by ergodic theorem

-We can use averages of remaining samples to estimate means

N

mttxmN

xfE1

1)(

Markov chains

Page 6: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Gibbs sampling: one way to construct the transition kernel

Seminal references

Geman and Geman (IEEE Trans. Pattn. Anal. Mach. Intel., 1984)

Gelfand and Smith (JASA, 1990)

Hasting (Biometrika, 1970)

Metropolis, Rosenbluth, et al. (J. Chem. Phys, 1953)

Page 7: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Subject to regularity conditions, joint distribution is uniquely determined by “full conditional distributions”

- Full conditional distribution for a model quantity is distribution of that quantity conditional on assumed known values of all the other quantities in the model

Break complicated, high-dimensional problem into a large number of simpler, low-dimensional problems

Gibbs sampling: one way to construct the transition kernel

Page 8: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example: inference about normal mean and variance, both unknown.

Model

Priors

We want posterior means, posterior means, posterior credible sets for µ, σ2

NihNyi ,...,1,~,| 122

2,2~

,~

001

00

snIGh

baN

Gibbs sampling: one way to construct the transition kernel

Page 9: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

(, σ2|x) h((n+n

0)/2-1) exp{(-1/2)[b0(-a0)

2 +s0h+hi(xi-)²]}

Gibbs sampling: one way to construct the transition kernel

Posterior distribution (Bayes’ theorem):

This is not a known distribution

Page 10: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

We can obtain the conditional distributions:

Gibbs sampling: one way to construct the transition kernel

(|h, x) = = (, h|x)

(h|x)

(, h|x)

(, h|x)d

(, h|x)

(|x)

(, h|x)

(, h|x)dh(h|, x) = =

Page 11: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

where,

(|h, x) exp{ }(b0+nh)( - )2 -12

a0b0 +hn

b0+nh

a0b0 +hnb0+nh

1b0+nh

~ N( , )

2n0+n (s0+i(xi-)²)

2 ~ G( , )

h exp{- ·h}n0+n

2(s0+i(xi-)²)

2 (h|, x)

-1

Gibbs sampling: one way to construct the transition kernel

Page 12: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Gibbs sampler algorithm for Normal

1. Choose initial values µ(0) σ2(0)

2. At each iteration t, generate new value for each parameter, conditional on most recent value of all other parameters

Gibbs sampling: one way to construct the transition kernel

Page 13: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Gibbs sampling:

•Step 0. Initial values: (0) = (0, h0)

•Step 1. How to generate the next simulation (1) = (1, h1):

We can simulate 1 from (|h=h0, x),(Normal distribution)

We can simulate h1 from (h|= 1, x),(Gamma distribution)

We can update (0, h0) to (1, h1),

···

•Step k. Update (k) = (k, hk), from (k-1) .

Gibbs sampling: one way to construct the transition kernel

Page 14: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Other MCMC techniques

- Metropolis Hasting

Metropolis et al. (1953) and Hasting (1970), Tierney (1994),Chib and Greenberg (1995), Robert and Casella (1999)

- Data augmentation

Tanner and Wrong (1987)

Page 15: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

What is WinBUGS?

“Bayesian inference Using Gibbs Sampling”

General purpose program for fitting Bayesian models

Developed by David Spiegelhater, Andrew Thomas, Nicky Best, and Wally Gilks at the Medical Research Council Biostatistics Unit, Institute of Public Health, in Cambridge, UK

Excellent documentation, including two volumes of examples

Web page:

www.mrc-bsu.cam.ac.uk/bugs/

Software: WinBUGS

Page 16: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

What does WinBUGS do?

Enable user to specify model in simple language

Construct the transition kernel of a Markov chain with the joint posterior as its stationary distribution, and simulate a sample path of the resulting chain

-Generate random variables from standard densities using standard algorithms.

-WinBUGS uses Metropolis algorithm to generate from nonstandard full conditionals

Page 17: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Topics. WinBUGS

Deciding how many chains to run

Choosing initial values

- Do not confuse initial values with priors!

-Priors are part of the model. Initial values are part of the computing strategy used to fit the model

-Priors must not be based on the current data.

-The best choices of initial values are values that are in a high-posterior-density region of the parameter space. If the prior is not very strong, then maximum likelihood estimates (from the current data) are excellent choices of initial values if they can be calculated.

Page 18: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Initial values

Initial values are not like priors!

Choosing initial values

- Run more than one chain with initial values selected to give you information about sampler performance

- Initial values may be generated from priors

- Initial values may be based on frequentist estimates.

Page 19: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

-WinBUGS usually can automatically generate initial values for other parameters

-But it’s often advantageous to specify even those WinBUGS can generate.

Initial values must be specified for variance components

Initial values

Page 20: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

In the simple models we have encountered so far, the MCMC sampler will converge quickly even with a poor choice of initial values.

-In more complicated models, choosing initial values in low posterior density regions may make the sampler take a huge number of iterations to finally star drawing from a good approximation to the true posterior.

“Assessing, whether sampler has converged”.

-How many initial iterations need to be discarded in order that remaining samples are drawn from a distribution close enough to the true stationary distribution to be usable for estimation and inference?

Topics. WinBUGS

Page 21: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Convergence assessment

How many initial iterations need to be discarded in order that remaining samples are drawn form a distribution close enough to the true stationary distribution to be usable for estimation and inference?

-“burn-in”

(0), (1), . . ., (m), (m+1), . . ., (N).

”burn in” sample

Page 22: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

-Once we are drawing from the right distribution, how many samples are needed in order to provide the desired precision in estimation and inference?

Choosing model parameterizations and MCMC algorithms that will lead to convergence, in a reasonable amount of time.

Topics. WinBUGS

N-m²Monte Carlo error:

Page 23: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

On “stats” output

Similar to standard error of the mean but adjusted for autocorrelated sample

It will get smaller as more iterations are run

Use it to guide your decisions as to how many iterations you need to run after burn-in is done

Monte Carlo errors

Page 24: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Monte Carlo errors

Decide whether to

- Discard some initial burn-in iterations and use remaining sampler output for inference

- or take some corrective action

Run more iterations

Change parameterization of model

Page 25: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Using MCMC for Bayesian inference

1. Specify a Bayesian model

2. Construct a Markov chain whose target distribution is the joint posterior distribution of interest

3. Run one or more chain(s) as long as you can stand it

4. Assess convergence

- Burn-in

- Monte Carlo error

5. Inference

Page 26: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

History plots: Early graphical methods of convergence assessment

Trajectories of sampler output for each model unknown

Can quickly reveal failure to reach stationarity

Can give qualitative information about sampler behaviour

Cannot confirm that convergence have occurred

Using MCMC for Bayesian inference

Page 27: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Monitor

-All types of model parameters, not only parameters of substantive interest

-Sample paths graphically

-Autocorrelations

-Cross-correlations between parameters

Apply more than one diagnostic, including one or more that uses information about the specific model.

Page 28: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Conclusions

MCMC methods have enabled the fitting of complex, realistic models.

Use of MCMC methods requires careful attention to

-Model parameterization

-MCMC sampler algorithms

-Choice of initial values

-Convergence assessment

-Output Analysis

Ongoing research in theoretical verification of convergence, MCMC acceleration, and exact sampling holds great promise.

Page 29: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

WinBUGS

Page 30: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

WinBUGS

Where to get the WinBUGS software

From the web site of the MRC Biostatistics Unit in Cambridge. The BUGS home page ishttp://www.mrc-bsu.cam.ac.uk/bugs/Welcome.htmlOnce you have downloaded the files you need to email the BUGS project for a key that will let you use the full version.

Manuals

-The WinBUGS manual available online.

-WinBUGS examples, volumes 1 and 2 with explanations, available online.

Page 31: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Specifying the model

The first stage in model fitting is to specify your model in the BUGS language. This can either be done graphically (and then code written form your graph) or in the BUGS language.

Graphically – DOODLE

Page 32: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Specifying the model

Starting WinBUGS

-Click on the WinBUGS icon or program file. You will get a message about the license conditions that you can read, and then close. Now explores the menus.

-HELP – you see manuals and examples.

-FILE – allows you to open existing files, or to start a new file to program your own example in the BUGS language

-DOODLE – NEW is what you need to use to start a file for a new model specified in graphical format

Page 33: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Specifying the model

Example – a simple proportion

Inference is required for the proportion (pi) of people who are willing to pay a certain amount to visit a park.

X = (0,0,0,0,1,0,0,0,1,1,1,0,1,1,1) ( 1: yes, 0: no)

Sample size = 15

• x1, x2, . . ., xn iid ~ Bin(1, ) or Bernouilli()

Page 34: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Press ‘ok’

Starting WinBUGS

DODDLE - NEW

Page 35: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Basic instructions

Click left mouse button to create a “doodle” with

Click CTRL + Supr to delete a “doodle”

To create a “plate” pressCLICK + CTRL

Click CTRL + Supr to delete a “plate”

Page 36: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Nodes: ConstantsStochastic nodesDeterministic nodes

The relationship between node: Arrows

stochastic dependencelogical function

To create a narrow we have to illuminate the “destiny” node and the press CTRL + CLICK on the “origin” node.

If we repeat the process we have to repeat the process.

Example – a simple proportion

Page 37: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

We have to create a “node” for each variable or constant includedIn the model:

•An oval for stochastic nodes (we choosethe density and the parameters)

•A rectangle for the constants

Nodes:

Plate for the node xi

Page 38: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Now we add the arrows that represents the relationsbetween nodes:

Example – a simple proportion

Copy and paste the doodle in a new file

Page 39: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Data

Now you will have to enter the data.The following list format will do (notice that WinBUGS is case sensitive with the capital letters).

Initial values (opcional, WinBUGS can generate them)

list(n = 15, alpha = 1, beta = 1, x=c(0, 0, 0, 0, 1, ...))

list(phi =0.1)

Page 40: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Specifying the model

The first stage in model fitting is to specify your model in the BUGS language (doodle or code).

You then go to the model menu and you will get a specification tool with buttons.

Click on the check model button. Any error messages will appear in the grey bar at the bottom of the screen. If all is well, you will get the message “model is syntactically correct”. You will aso see that the compile and load data buttons will have their text darkened. You are ready for the next step.

Page 41: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Load Data

Data can be entered either in list format or in rectangular format (columns). Once you have typed in your data, highlight either the word list. You can use the load data button on the specification tool to read the data.

Compile

You are now ready to compile your model using compile button on the specification tool. Again, check error messages. If all is well you will get a “model compiled” message.

Page 42: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Initial values

All nodes that are not given as data, or derived from other nodes, need to be given initial values. This is done with the specification tool menu either by setting them specifically form values you type in (set inits button) or by generating a random value form the prior (gen inits button).

WinBUGS 1.4 allows you to run more than one chain at the same item; see specification tool above. If you want more chains you will need to set different initial values for each one.

Page 43: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Generating the samples - updating

You are now ready to generate samples and to examine the simulated output. To start the sampler, go to model and then update and you will get an updating tool.

You can select how many updates you get for each press of the update button and how often the screen is refreshed to show how sampling is proceeding.

Updating does the sampling, but does not store any values. In MCMC methods you usually want to run the sampler for some time (perhaps 10000 iterations) to be sure it is stable before you start storing values.

Page 44: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Storing values and summarising results

After an initial run values go to the inference menu and samples and you will get a sample monitoring tool.

You start by entering the parameters you want to monitor in the node box, and for each one press ser. If you also press trace you will see a plot of the samples as they are generated. Now go back to the updating tool and generate some samples.

Page 45: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Now go back to the sampling tool to look at the various ways of displaying your results or summary statistics. The most useful buttons are:

history – shows you a plot of all the samples you have generated

density – gives a kernel density estimate of the posterior

stats – gives summary statistics including mean, s.d., median and percentiles that can be set with the panel on the right. These can be used for credible intervals. You will also get a MonterCarlo error for the mean that will indicate how well the mean of the posterior has been estimated form your number of samples.

AutoC – a plot of the autocorrelation in the chains

Page 46: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

“click” in “stats”:

• mean = 0.4693 (real mean= 0.4705) • median = 0.4683

• Bayesian interval 95% = (0.246, 0.6988) • MC error = 0.001148

Example – a simple proportion

Results:

“click” in “density”:

“click” in “history”:

(all the chain)

Page 47: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Exercise:

Repeat the analysis 10000 new simulations

Page 48: Francisco José Vázquez Polo  [personales.ulpgc.es/fjvpolo.dmc]

Example – a simple proportion

Code: Doodle – Write Code