
Statistical Methods for Analysis with Missing Data

Lecture 8: Introduction to Bayesian Inference

Mauricio Sadinle

Department of Biostatistics

Previous Lecture

EM algorithm for maximum likelihood estimation with missing data:

- Derivation and implementation of the EM algorithm for categorical data
- For two categorical variables, EM under MAR is intuitive and simple (see the code sketch after this slide):

$$\pi_{kl}^{(t+1)} = \frac{1}{n}\left( n_{11kl} + \frac{\pi_{kl}^{(t)}}{\pi_{k+}^{(t)}}\, n_{10k+} + \frac{\pi_{kl}^{(t)}}{\pi_{+l}^{(t)}}\, n_{01+l} + \pi_{kl}^{(t)}\, n_{00++} \right),$$

where

$$n_{11kl} = \sum_i W_{ikl}\, I(r_i = 11), \qquad n_{10k+} = \sum_i W_{ik+}\, I(r_i = 10),$$

$$n_{01+l} = \sum_i W_{i+l}\, I(r_i = 01), \qquad n_{00++} = \sum_i I(r_i = 00)$$

- Bootstrap confidence intervals
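As a concrete illustration, here is a minimal Python sketch of the update above for a 2×2 table; the counts, starting value, and iteration count are hypothetical.

```python
import numpy as np

# Hypothetical counts: n11[k, l] = both variables observed in cell (k, l),
# n10[k] = only Z1 observed (value k), n01[l] = only Z2 observed (value l),
# n00 = neither variable observed.
n11 = np.array([[20.0, 5.0], [10.0, 15.0]])
n10 = np.array([8.0, 12.0])
n01 = np.array([6.0, 9.0])
n00 = 5.0
n = n11.sum() + n10.sum() + n01.sum() + n00

pi = np.full((2, 2), 0.25)                       # starting value pi^(0)
for _ in range(200):                             # iterate the EM update
    row = pi / pi.sum(axis=1, keepdims=True)     # pi_kl / pi_k+
    col = pi / pi.sum(axis=0, keepdims=True)     # pi_kl / pi_+l
    pi = (n11 + row * n10[:, None] + col * n01[None, :] + pi * n00) / n
print(pi.round(4))
```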

Today's Lecture

- Introduction to Bayesian inference. Why?
- The technique called multiple imputation is derived from a Bayesian point of view
- A variation of multiple imputation called multiple imputation by chained equations imitates Gibbs sampling – a procedure commonly used for Bayesian inference
- Bayesian inference has its own ways of handling missing data

Outline

Bayes' Theorem

Bayesian Inference

Bayes' Theorem

For events A and B:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

- P(A | B): conditional probability of event A given that B is true
- P(B | A): conditional probability of event B given that A is true
- P(A) and P(B): unconditional/marginal probabilities of events A and B

Example: Using Bayes' Theorem

- A: person has a given condition
- B: person tests positive for the condition using a cheap test
- P(B | A), P(B | A^c): known from experimental testing
- P(A): known from a study based on a gold standard
- P(A | B) = P(B | A)P(A)/P(B): probability of having the condition given a positive result on the cheap test

Example: Using Bayes' Theorem

- A: person has a given condition
- B: person tests positive for the condition using a cheap test
- P(B | A) = 0.99: sensitivity of the test
- P(B^c | A^c) = 0.99: specificity of the test
- P(A) = 0.01: rare condition
- Given that a generic person tests positive, what is the probability that they have the condition?

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B \mid A)\, P(A) + P(B \mid A^c)\, P(A^c)} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = 0.5$$
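The 50% answer is easy to verify numerically; here is a quick Python check using the values on the slide.

```python
# Sensitivity 0.99, specificity 0.99, prevalence 0.01 (values from the slide)
sens, spec, prev = 0.99, 0.99, 0.01
p_pos = sens * prev + (1 - spec) * (1 - prev)  # P(B), by the law of total probability
print(sens * prev / p_pos)                     # P(A | B) = 0.5
```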

Bayes' Theorem

In the above example, we are implicitly working with two binary variables

- Y = I(having the medical condition)
- X = I(testing positive)

with a joint density p(X = x, Y = y)

- Bayes' theorem simply relates the conditional probabilities p(X = x | Y = y) and p(Y = y | X = x)

Bayes' Theorem

In general:

- Random vectors X and Y
- Joint density p(x, y)
- Relationship between conditionals is given by Bayes' theorem:

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$$

- This is a mathematical fact about conditional probabilities/densities
- Applying Bayes' theorem doesn't make you Bayesian!
- The Bayesian approach arises from applying this theorem to obtain statistical inferences on model parameters!


Outline

Bayes' Theorem

Bayesian Inference

Philosophical Motivation

- Bayesian paradigm: model parameters are treated as random variables to represent uncertainty about their true values
- Compare with the frequentist paradigm: model parameters are unknown constants
- Bayesian analysis requires:
  - A likelihood function, coming from a model for the distribution of the data
  - A prior distribution on model parameters, coming from previous knowledge about the phenomenon under study
- The prior distribution is updated using the likelihood via Bayes' theorem, resulting in a posterior distribution
- Bayesian inferences are drawn from the posterior distribution (updated knowledge)


Practical Motivation

Bayesian machinery might make this approach appealing:

- Quantification of uncertainty in large discrete parameter spaces:
  - Partitions in clustering problems
  - Graphs in graphical models
  - Binary vectors of variable inclusion in regression model selection
  - Bipartite matchings in record linkage
- The frequentist approach calls for constructing confidence sets, whereas the Bayesian approach works with a posterior distribution from which we can sample to approximate summaries of interest
- Under some conditions, posteriors behave like the sampling distribution of MLEs, but the Bayesian machinery might be easier to implement than deriving an estimate of the asymptotic covariance matrix of the MLE
- Implementing a Monte Carlo EM algorithm to obtain MLEs is similar to implementing Data Augmentation (coming soon), but the latter readily provides you with measures of uncertainty
- Convenient in hierarchical/multilevel models – priors just add another level to the hierarchy


The Likelihood Function

Same as before:

- $Z = (Z_1, \ldots, Z_K)$: generic vector of study variables
- We work under a parametric model for the distribution of Z:

$$\{p(z \mid \theta)\}_\theta, \qquad \theta = (\theta_1, \theta_2, \ldots, \theta_d)$$

- Data from i.i.d. random vectors $\{Z_i\}_{i=1}^n \equiv Z$
- Under our parametric model, the joint distribution of $\{Z_i\}_{i=1}^n$ has a density function

$$p(z \mid \theta) = \prod_{i=1}^n p(z_i \mid \theta)$$

- This, seen as a function of θ, is the likelihood function:

$$L(\theta \mid z) = \prod_{i=1}^n p(z_i \mid \theta)$$


The Prior Distribution

- Prior to observing the realizations of $Z = \{Z_i\}_{i=1}^n$, do we have any information on the parameters θ?
- Represent this prior information in terms of a distribution

$$p(\theta)$$

The Posterior Distribution

Now, "simply" use Bayes' theorem:

$$p(\theta \mid z) = \frac{L(\theta \mid z)\, p(\theta)}{p(z)} = \frac{L(\theta \mid z)\, p(\theta)}{\int L(\theta \mid z)\, p(\theta)\, d\theta} \propto L(\theta \mid z)\, p(\theta)$$

"That's it!"

The Posterior Distribution

For simple problems, we typically have two ways of computing the posterior p(θ | z):

- Compute the integral $\int L(\theta \mid z)\, p(\theta)\, d\theta$, and then compute

$$p(\theta \mid z) = \frac{L(\theta \mid z)\, p(\theta)}{\int L(\theta \mid z)\, p(\theta)\, d\theta}$$

- Stare at / manipulate the expression L(θ | z) p(θ), seen as a function of θ alone:
  - If L(θ | z) p(θ) = a(θ, z) b(z), then p(θ | z) ∝ a(θ, z)
  - If a(θ, z) looks like a known distribution except for a constant, then we have identified the posterior


Example: Binomial Data, Beta Prior

Let Z | θ ∼ Binom(n, θ) and θ ∼ Beta(a, b)

$$L(\theta \mid z) = \binom{n}{z} \theta^z (1-\theta)^{n-z}, \qquad p(\theta) = \frac{1}{B(a,b)}\, \theta^{a-1} (1-\theta)^{b-1}$$

- The proportionality constant is

$$\int L(\theta \mid z)\, p(\theta)\, d\theta = \frac{\binom{n}{z}}{B(a,b)} \int \theta^{z+a-1} (1-\theta)^{n-z+b-1}\, d\theta = \binom{n}{z} \frac{B(z+a,\, n-z+b)}{B(a,b)}$$

- And the posterior is

$$p(\theta \mid z) = \frac{L(\theta \mid z)\, p(\theta)}{\int L(\theta \mid z)\, p(\theta)\, d\theta} = \frac{1}{B(z+a,\, n-z+b)}\, \theta^{z+a-1} (1-\theta)^{n-z+b-1}$$

- Therefore, θ | z ∼ Beta(z + a, n − z + b)
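As a sanity check, a short Python sketch can compare the brute-force normalization with the Beta(z + a, n − z + b) density; the values n = 10, z = 8, a = b = 2 are arbitrary choices for illustration.

```python
from scipy.integrate import quad
from scipy.stats import beta, binom

n, z, a, b = 10, 8, 2, 2                                   # arbitrary illustrative values
unnorm = lambda t: binom.pmf(z, n, t) * beta.pdf(t, a, b)  # L(theta | z) * p(theta)
const, _ = quad(unnorm, 0, 1)                              # normalizing constant
theta = 0.7
print(unnorm(theta) / const)               # posterior density by direct normalization
print(beta.pdf(theta, z + a, n - z + b))   # conjugacy result: the two should agree
```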


Example: Binomial Data, Beta Prior

Let Z | θ ∼ Binom(n, θ) and θ ∼ Beta(a, b)

$$L(\theta \mid z) = \binom{n}{z} \theta^z (1-\theta)^{n-z}, \qquad p(\theta) = \frac{1}{B(a,b)}\, \theta^{a-1} (1-\theta)^{b-1}$$

- We could also have noticed

$$p(\theta \mid z) \propto L(\theta \mid z)\, p(\theta) \propto \theta^{z+a-1} (1-\theta)^{n-z+b-1}$$

- This is the non-constant part (the kernel) of the density function of a beta random variable with parameters z + a and n − z + b; therefore θ | z ∼ Beta(z + a, n − z + b)


Example: Binomial Data, Beta Prior

To illustrate the idea, say:

- Someone is flipping a coin n = 10 times in an independent and identical fashion
- Number of heads Z ∼ Binomial(n, θ)
- What is the value of θ?
- We use a Beta(a, b) to express our prior belief on θ
- We observe Z = 8

Example: Binomial Data, Beta Prior

Possible scenarios of prior information on θ; posteriors with Z = 8:

- No idea of what θ could be: Beta(1,1)
- The person flipping the coin looks like a trickster: Beta(9,1)
- Coin flipping usually has a 50/50 chance of heads/tails: Beta(100,100)

[Figure: for each of the three priors, the prior and posterior densities of θ plotted over (0, 1)]
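A sketch of how plots like these could be generated in Python (the figure above is only described, not reproduced exactly):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import beta

n, z = 10, 8
grid = np.linspace(0.001, 0.999, 500)
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (a, b) in zip(axes, [(1, 1), (9, 1), (100, 100)]):
    ax.plot(grid, beta.pdf(grid, a, b), label="Prior")                  # Beta(a, b)
    ax.plot(grid, beta.pdf(grid, z + a, n - z + b), label="Posterior")  # Beta(z+a, n-z+b)
    ax.set_title(f"Beta({a},{b}) prior")
    ax.set_xlabel(r"$\theta$")
axes[0].legend()
plt.tight_layout()
plt.show()
```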


Comments So Far

- The Bayesian approach allows you to incorporate side information based on context:
  - Do you know what "flipping a coin" means?
  - Is the person flipping the coin someone you trust?
- It seems expressing prior information works nicely with some parametric families
- Quantification of prior information can be tricky, especially for complicated models


Example: Multinomial Data, Dirichlet Prior

Continuing our example from the previous class:

- Let $Z_i = (Z_{i1}, Z_{i2})$, with $Z_{i1}, Z_{i2} \in \{1, 2\}$ and the $Z_i$'s i.i.d.,

$$p(Z_{i1} = k,\, Z_{i2} = l \mid \theta) = \pi_{kl}$$

- $\theta = (\ldots, \pi_{kl}, \ldots)$, $W_{ikl} = I(Z_{i1} = k, Z_{i2} = l)$
- The likelihood of the study variables is

$$L(\theta \mid z) = \prod_i \prod_{k,l} \pi_{kl}^{W_{ikl}} = \prod_{k,l} \pi_{kl}^{n_{kl}}, \qquad \text{where } n_{kl} = \sum_i W_{ikl}, \quad k, l \in \{1, 2\}$$


Example: Multinomial Data, Dirichlet Prior

- Inference on multinomial parameters is convenient using the Dirichlet prior
- $\theta = (\ldots, \pi_{kl}, \ldots) \sim \text{Dirichlet}(\alpha)$, $\alpha = (\ldots, \alpha_{kl}, \ldots)$,

$$p(\theta) = \frac{\Gamma\!\left(\sum \alpha_{kl}\right)}{\prod_{k,l} \Gamma(\alpha_{kl})} \prod_{k,l} \pi_{kl}^{\alpha_{kl}-1}$$

- The posterior is given by

$$p(\theta \mid z) \propto L(\theta \mid z)\, p(\theta) \propto \prod_{k,l} \pi_{kl}^{n_{kl}+\alpha_{kl}-1}$$

- Therefore, $\theta \mid z \sim \text{Dirichlet}(\alpha')$, $\alpha' = (\ldots, \alpha_{kl} + n_{kl}, \ldots)$
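Since the posterior is again Dirichlet, posterior summaries are easy to approximate by direct sampling; a minimal sketch with hypothetical cell counts:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.ones(4)                                # Dirichlet(1,1,1,1) prior on (pi_11, pi_12, pi_21, pi_22)
n_kl = np.array([20, 5, 10, 15])                  # hypothetical cell counts n_kl
draws = rng.dirichlet(alpha + n_kl, size=10_000)  # draws from Dirichlet(alpha')
print(draws.mean(axis=0))                         # posterior means of the pi_kl
```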


Comments on Priors

- There is no guarantee that the combination of arbitrary likelihoods and priors will lead to posteriors that are easy to work with
- Conjugate priors are distributions that lead to posteriors in the same family, as in the previous examples – typically easier to work with, but not always available
- Non-conjugate priors can be used, but we require more advanced techniques for handling them
- Lists of conjugate priors are available in multiple books and online resources


Example: Multivariate Normal

- Distribution of the data:

$$Z = \{Z_i\}_{i=1}^n \mid \mu, \Lambda \overset{\text{i.i.d.}}{\sim} \text{Normal}(\mu, \Lambda^{-1}),$$

where $Z_i \in \mathbb{R}^K$, μ is the vector of means, $\Lambda^{-1}$ is the covariance matrix, and Λ is the inverse covariance matrix (the precision matrix)

- The conjugate prior is constructed in two steps:

$$\mu \mid \Lambda \sim \text{Normal}(\mu_0, (\kappa_0 \Lambda)^{-1}), \qquad \Lambda \sim \text{Wishart}(\upsilon_0, W_0)$$

The joint distribution of (μ, Λ) is called Normal-Wishart. The parameterization is such that $E(\Lambda) = \upsilon_0 W_0$.


Example: Multivariate Normal

The posterior is also Normal-Wishart:

$$\mu \mid \Lambda, z \sim \text{Normal}(\mu', (\kappa' \Lambda)^{-1}), \qquad \Lambda \mid z \sim \text{Wishart}(\upsilon', W'),$$

where

$$\mu' = (\kappa_0 \mu_0 + n \bar z)/\kappa', \qquad \kappa' = \kappa_0 + n, \qquad \upsilon' = \upsilon_0 + n,$$

$$W' = \left\{ W_0^{-1} + n\left[ \hat\Sigma + \frac{\kappa_0}{\kappa'} (\bar z - \mu_0)(\bar z - \mu_0)^T \right] \right\}^{-1},$$

$$\bar z = \sum_{i=1}^n z_i / n, \qquad \hat\Sigma = \sum_{i=1}^n (z_i - \bar z)(z_i - \bar z)^T / n$$
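A sketch of these updates in Python, with arbitrary prior settings and simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=(50, 3))                  # n = 50 simulated observations in R^3
n, K = z.shape
mu0, kappa0, nu0 = np.zeros(K), 1.0, K + 2.0  # arbitrary prior settings
W0 = np.eye(K)

zbar = z.mean(axis=0)
Sigma_hat = (z - zbar).T @ (z - zbar) / n     # sample covariance (divisor n)
kappa_p = kappa0 + n
nu_p = nu0 + n
mu_p = (kappa0 * mu0 + n * zbar) / kappa_p
W_p = np.linalg.inv(np.linalg.inv(W0)
                    + n * (Sigma_hat + (kappa0 / kappa_p)
                           * np.outer(zbar - mu0, zbar - mu0)))
print(mu_p)                                   # posterior mean of mu
print(nu_p * W_p)                             # E(Lambda | z) under the stated parameterization
```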

Bayesian Point Estimation

- L(θ, θ′): loss of estimating a parameter to be θ′ when the true value is θ
- The Bayes estimator minimizes the expected posterior loss:

$$\hat\theta_{\text{Bayes}} = \operatorname{arg\,min}_{\theta'} \int L(\theta, \theta')\, p(\theta \mid z)\, d\theta$$

- For univariate θ it is common to choose:
  - $L(\theta, \theta') = (\theta - \theta')^2 \implies \hat\theta_{\text{Bayes}}$ is the posterior mean
  - $L(\theta, \theta') = |\theta - \theta'| \implies \hat\theta_{\text{Bayes}}$ is the posterior median
  - $L(\theta, \theta') = I(\theta \neq \theta') \implies \hat\theta_{\text{Bayes}}$ is the posterior mode
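For the coin example (Beta(1,1) prior, n = 10, Z = 8, so the posterior is Beta(9, 3)), the three estimators can be computed or approximated directly; a sketch:

```python
import numpy as np
from scipy.stats import beta

a, b = 9, 3                                           # posterior Beta(9, 3)
draws = beta.rvs(a, b, size=100_000, random_state=0)  # Monte Carlo draws from the posterior
print(draws.mean())                  # squared-error loss -> posterior mean (= 0.75)
print(np.median(draws))              # absolute loss      -> posterior median
print((a - 1) / (a + b - 2))         # 0-1 loss           -> posterior mode (= 0.8, closed form)
```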


Bayesian Credible Sets/Intervals

- C is a (1 − α)100% credible set if

$$\int_C p(\theta \mid z)\, d\theta \ge 1 - \alpha$$

- For univariate θ, define the (1 − α)100% credible interval C as the interval within the α/2 and 1 − α/2 quantiles of the posterior p(θ | z)
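For example, the 95% equal-tailed credible interval for the Beta(9, 3) posterior above takes one line:

```python
from scipy.stats import beta

# alpha = 0.05: interval between the 0.025 and 0.975 posterior quantiles
print(beta.ppf([0.025, 0.975], 9, 3))
```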


Asymptotic Behavior of Posteriors: Bernstein - von Mises

Bernstein - von Mises theorem¹

- Under some conditions, the posterior distribution asymptotically behaves like the sampling distribution of the MLE
- Heuristically, we say

$$p(\theta \mid z) \approx N\!\left(\hat\theta_{\text{MLE}},\; I(\hat\theta_{\text{MLE}})^{-1}/n\right)$$

- Therefore, for well-behaved models and with a good amount of data, Bayesian and frequentist inferences will be similar

¹For details, see the lecture notes of Richard Nickl: http://www.statslab.cam.ac.uk/~nickl/Site/__files/stat2013.pdf
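A small numerical illustration of this heuristic (not part of the slides): for binomial data with a flat prior, the Beta posterior and the normal approximation centered at the MLE nearly coincide once n is large.

```python
import numpy as np
from scipy.stats import beta, norm

n, z = 1000, 800                          # hypothetical large sample
mle = z / n
info = 1 / (mle * (1 - mle))              # per-observation Fisher information at the MLE
theta = 0.82
print(beta.pdf(theta, z + 1, n - z + 1))  # exact posterior density (flat prior)
print(norm.pdf(theta, mle, np.sqrt(1 / (n * info))))  # Bernstein-von Mises approximation
```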


Missing Data and Bayes

- With missing data, things get complicated:

$$L_{\text{obs}}(\theta, \psi \mid z_{(r)}, r) = \prod_{i=1}^n \int_{\mathcal{Z}_{(\bar r_i)}} p(r_i \mid z_i, \psi)\, p(z_i \mid \theta)\, dz_{i(\bar r_i)} \overset{\text{MAR}}{=} \underbrace{\left[\prod_{i=1}^n p(r_i \mid z_{i(r_i)}, \psi)\right]}_{\text{can be ignored}} \underbrace{\left[\prod_{i=1}^n \int_{\mathcal{Z}_{(\bar r_i)}} p(z_i \mid \theta)\, dz_{i(\bar r_i)}\right]}_{\text{likelihood for } \theta \text{ under MAR}}$$

- Under a Bayesian approach, we need to obtain

$$p(\theta, \psi \mid z_{(r)}, r) \propto L_{\text{obs}}(\theta, \psi \mid z_{(r)}, r)\, p(\theta, \psi)$$

- Under ignorability (MAR + separability + $\theta \perp\!\!\!\perp \psi$ a priori), we need

$$p(\theta \mid z_{(r)}) \propto L_{\text{obs}}(\theta \mid z_{(r)})\, p(\theta)$$

- The integrals in $L_{\text{obs}}(\theta \mid z_{(r)})$ complicate things – what to do?


Summary

Main take-aways from today's lecture:

- Using Bayes' theorem doesn't make you a Bayesian
- Bayesian inference offers an alternative framework for deriving inferences from data
- Philosophical motivation: inclusion of prior belief or knowledge; uncertainty quantification in terms of distributions for parameters
- Practical motivation: convenient in some problems, might lead to good frequentist performance
- Complex problems are computationally challenging – the posterior needs to be approximated (e.g., Markov chain Monte Carlo)

Next lecture:

- Gibbs sampling
- Data augmentation
- Introduction to multiple imputation
