Source: gelman/presentations/weakpriorstalk.pdf
Weakly informative priors
Andrew Gelman and Aleks Jakulin
Department of Statistics and Department of Political Science, Columbia University
3 Mar 2007

Outline: Weakly informative priors; Static sensitivity analysis; Conservatism of Bayesian inference; A hierarchical framework; Conclusion; References



Page 2: Themes

- Informative, noninformative, and weakly informative priors
- The sociology of shrinkage, or conservatism of Bayesian inference
- Collaborators:
  - Yu-Sung Su (Dept of Poli Sci, City Univ of New York)
  - Masanao Yajima (Dept of Statistics, Columbia Univ)
  - Maria Grazia Pittau (Dept of Economics, Univ of Rome)
  - Gary King (Dept of Government, Harvard Univ)
  - Samantha Cook (Statistics group, Google)
  - Francis Tuerlinckx (Dept of Psychology, Univ of Leuven)


Page 13: What does this have to do with MCMC?

- I'm speaking at Jun Liu's MCMC conference
- We don't have to be trapped by decades-old models
- The folk theorem about computation and modeling
- The example of BUGS


Page 18: Information in prior distributions

- Informative prior distribution: a full generative model for the data
- Noninformative prior distribution: let the data speak; goal: valid inference for any θ
- Weakly informative prior distribution: purposely include less information than we actually have; goal: regularization, stabilization
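To make the contrast concrete, consider logistic regression (one of the examples in this talk): with separated data the likelihood alone lets the slope grow without bound, while even a modest prior keeps it finite. A minimal Python sketch; the toy data and the normal-prior penalty are my own illustration, not from the talk:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data with complete separation: every x < 0 has y = 0, every x > 0 has y = 1.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def neg_log_post(beta, prior_sd):
    """Negative log posterior for logistic regression with independent
    normal(0, prior_sd) priors on the intercept and slope."""
    eta = beta[0] + beta[1] * x
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))  # Bernoulli log likelihood
    log_prior = -0.5 * np.sum(beta**2) / prior_sd**2
    return -(log_lik + log_prior)

# Essentially flat prior: the optimizer chases an ever-growing slope.
flat = minimize(neg_log_post, np.zeros(2), args=(1e6,)).x
# Weakly informative prior: the slope estimate stays moderate.
weak = minimize(neg_log_post, np.zeros(2), args=(2.5,)).x
print("flat-prior slope:", flat[1], " weak-prior slope:", weak[1])
```

Under the near-flat prior the fitted slope is limited only by the optimizer's stopping rule; the weakly informative prior regularizes it to a few units, which is the stabilization goal named above.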


Page 27: Weakly informative priors: some examples

- Variance parameters
- Covariance matrices
- Logistic regression coefficients
- Population variation in a physiological model
- Mixture models
- Intentional underpooling in hierarchical models


Page 34: Weakly informative priors for variance parameters

- Basic hierarchical model
- The traditional inverse-gamma(0.001, 0.001) prior can be highly informative (in a bad way)!
- A noninformative uniform prior works better
- But if the number of groups is small (J = 2, 3, even 5), a weakly informative prior helps by shutting down huge values of τ
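The difference between these priors is easy to see by simulating τ from each; a SciPy sketch of my own (the half-Cauchy(25) alternative appears later in the talk):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000

# "Noninformative" tradition: tau^2 ~ inverse-gamma(0.001, 0.001),
# so tau is the square root of inverse-gamma draws.  Most draws are
# astronomically large (many overflow double precision).
tau_ig = np.sqrt(stats.invgamma.rvs(a=0.001, scale=0.001, size=n, random_state=rng))

# Weakly informative alternative: tau ~ half-Cauchy with scale 25,
# flat near zero but discouraging absurd values.
tau_hc = stats.halfcauchy.rvs(scale=25, size=n, random_state=rng)

print("P(tau > 1e6) under inv-gamma prior:", np.mean(tau_ig > 1e6))
print("median tau under half-Cauchy(25):", np.median(tau_hc))
```

The point is not the exact numbers but the shapes: the inverse-gamma(ε, ε) prior is far from innocuous when the data carry little information about τ, whereas the half-Cauchy stays weakly informative.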


Page 39: Weakly informative priors - Columbia Universitygelman/presentations/weakpriorstalk.pdf · Weakly informative priors Static sensitivity analysis Conservatism of Bayesian inference

Weakly informative priorsStatic sensitivity analysis

Conservatism of Bayesian inferenceA hierarchical framework

ConclusionReferences

Variance parametersCovariance matricesLogistic regression coefficientsPopulation variation in a physiological modelMixture modelsIntentional underpooling in hierarchical models

Priors for variance parameter: J = 8 groups

[Figure: three histograms of the posterior for σα in the 8-schools example, given (1) a uniform prior on σα, (2) an inverse-gamma(1, 1) prior on σα², and (3) an inverse-gamma(.001, .001) prior on σα²; σα axis runs from 0 to 30.]


Priors for variance parameter: J = 3 groups

[Figure: two histograms of the posterior for σα in the 3-schools example, given a uniform prior on σα and given a half-Cauchy(25) prior on σα; σα axis runs from 0 to 200.]
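These posterior comparisons can be reproduced with a small grid approximation. The following sketch is not from the talk: it uses the standard 8-schools data (Rubin 1981), integrates µ out analytically under a flat prior, and compares a uniform prior on the group-level sd τ (the σα of the figures) with the half-Cauchy(25) prior; the grid range and size are arbitrary choices.

```python
import numpy as np

# Standard 8-schools data: effect estimates and standard errors
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
s = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

tau = np.linspace(0.01, 200, 2000)  # grid for the group-level sd

# log p(y | tau), with mu integrated out under a flat prior:
# p(y | tau) ∝ sqrt(V_mu) * prod_j N(y_j | mu_hat, s_j^2 + tau^2)
var = s[:, None] ** 2 + tau[None, :] ** 2          # shape (8, grid)
V_mu = 1.0 / np.sum(1.0 / var, axis=0)
mu_hat = V_mu * np.sum(y[:, None] / var, axis=0)
log_lik = 0.5 * np.log(V_mu) - 0.5 * np.sum(
    np.log(var) + (y[:, None] - mu_hat) ** 2 / var, axis=0)

def posterior(log_prior):
    # normalize on the grid
    lp = log_prior + log_lik
    p = np.exp(lp - lp.max())
    return p / p.sum()

post_unif = posterior(np.zeros_like(tau))              # uniform prior on tau
post_hc = posterior(-np.log(1 + (tau / 25.0) ** 2))    # half-Cauchy(25) prior

# The half-Cauchy trims the heavy right tail without touching
# the region supported by the data.
print((tau * post_unif).sum(), (tau * post_hc).sum())  # posterior means of tau
```

Because the half-Cauchy density is decreasing in τ while the uniform is flat, the half-Cauchy posterior is stochastically smaller: it puts less mass on huge values of τ, which is exactly the behavior wanted when J is small.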


Weakly informative priors for covariance matrices

I Inverse-Wishart has problems

I Correlations can be between 0 and 1

I Set up models so prior expectation of correlations is 0

I Goal: to be weakly informative about correlations and variances

I Scaled inverse-Wishart model uses redundant parameterization
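The redundant parameterization can be sketched numerically: draw an unscaled matrix Q from an inverse-Wishart, then multiply on both sides by free scale parameters ξ, so the correlations come from Q while the variances are decoupled from it. This is an illustrative construction, not code from the talk; the inv-Wishart(d+1, I) choice for Q and the lognormal choice for ξ are assumptions of the sketch.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
d = 2  # dimension of the covariance matrix

# Unscaled matrix: supplies the correlation structure
Q = invwishart.rvs(df=d + 1, scale=np.eye(d), random_state=rng)

# Redundant scale parameters: free the variances from the
# inverse-Wishart's coupling of variances and correlations
xi = np.exp(rng.normal(0.0, 1.0, size=d))  # illustrative lognormal choice

Sigma = np.diag(xi) @ Q @ np.diag(xi)  # Sigma = diag(xi) Q diag(xi)

sds = np.sqrt(np.diag(Sigma))
corr = Sigma[0, 1] / (sds[0] * sds[1])
print(sds, corr)  # correlation inherited from Q, scales from xi
```

Note the scaling leaves the implied correlation of Q unchanged, so the prior on correlations and the prior on variances can be tuned separately.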


Separation in logistic regression

glm (vote ~ female + black + income, family=binomial(link="logit"))

1960                                  1968
            coef.est coef.se                      coef.est coef.se
(Intercept)   -0.14    0.23          (Intercept)    0.47    0.24
female         0.24    0.14          female        -0.01    0.15
black         -1.03    0.36          black         -3.64    0.59
income         0.03    0.06          income        -0.03    0.07

1964                                  1972
            coef.est coef.se                      coef.est coef.se
(Intercept)   -1.15    0.22          (Intercept)    0.67    0.18
female        -0.09    0.14          female        -0.25    0.12
black        -16.83  420.40          black         -2.63    0.27
income         0.19    0.06          income         0.09    0.05


Weakly informative priors for logistic regression coefficients

I Separation in logistic regression

I Some prior info: logistic regression coefs are almost always between −5 and 5:
  I 5 on the logit scale takes you from 0.01 to 0.50, or from 0.50 to 0.99
  I Smoking and lung cancer

I Independent Cauchy prior dists with center 0 and scale 2.5

I Rescale each predictor to have mean 0 and sd 1/2

I Fast implementation using EM; easy adaptation of glm
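The Cauchy(0, 2.5) prior turns the fit into a penalized MLE that stays finite even under complete separation (where the unpenalized MLE diverges, as in the 1964 `black` coefficient above). A minimal sketch using a generic optimizer rather than the EM implementation mentioned above; the perfectly separated toy data are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Perfectly separated toy data: x alone predicts y, so the MLE diverges
x = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])  # intercept + predictor

def neg_log_posterior(beta, scale=2.5):
    eta = X @ beta
    # Bernoulli log-likelihood, written stably via logaddexp
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    # Independent Cauchy(0, 2.5) log-priors (up to a constant)
    logprior = -np.sum(np.log(1.0 + (beta / scale) ** 2))
    return -(loglik + logprior)

map_fit = minimize(neg_log_posterior, x0=np.zeros(2), method="BFGS")
print(map_fit.x)  # finite slope, instead of the MLE running off to infinity
```

The heavy Cauchy tails are what make this weakly informative: a coefficient that the data genuinely support being large is barely shrunk, but under separation the prior supplies the curvature the likelihood lacks.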


Regularization in action!


Weakly informative priors for population variation in a physiological model

I Pharmacokinetic parameters such as the “Michaelis-Menten coefficient”

I Wide uncertainty: prior guess for θ is 15 with a factor of 100 of uncertainty, log θ ∼ N(log(15), log(10)²)

I Population model: data on several people j, log θj ∼ N(log(15), log(10)²) ????

I Hierarchical prior distribution:
  I log θj ∼ N(µ, σ²), σ ≈ log(2)
  I µ ∼ N(log(15), log(10)²)

I Weakly informative
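A quick simulation shows the difference between the two models above: the flat population model lets each person's θj independently span the full factor of 100, while the hierarchical prior carries that uncertainty in the shared µ and keeps people within roughly a factor of 2 of each other. The quantile-ratio summary is my own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 1000

# Flat population model: every person gets the full
# factor-of-100 prior uncertainty, independently
theta_naive = np.exp(rng.normal(np.log(15), np.log(10), n_people))

# Hierarchical prior: mu carries the factor-of-100 uncertainty;
# people vary around it only by roughly a factor of 2
mu = rng.normal(np.log(15), np.log(10))
theta_hier = np.exp(rng.normal(mu, np.log(2), n_people))

def spread(t):
    # person-to-person spread: ratio of 97.5% to 2.5% quantiles
    lo, hi = np.quantile(t, [0.025, 0.975])
    return hi / lo

print(spread(theta_naive))  # orders of magnitude between people
print(spread(theta_hier))   # roughly 2**(2*1.96) ≈ 15
```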


Weakly informative priors for mixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single point taking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information, or, to put it another way, regularization

I Simple constraints, for example, a prior dist on the variance ratio

I Weakly informative

Andrew Gelman and Aleks Jakulin Weakly informative priors

Page 66: Weakly informative priors - Columbia Universitygelman/presentations/weakpriorstalk.pdf · Weakly informative priors Static sensitivity analysis Conservatism of Bayesian inference

Weakly informative priorsStatic sensitivity analysis

Conservatism of Bayesian inferenceA hierarchical framework

ConclusionReferences

Variance parametersCovariance matricesLogistic regression coefficientsPopulation variation in a physiological modelMixture modelsIntentional underpooling in hierarchical models

Weakly informative priors formixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single pointtaking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information—or, to put itanother way, regularization

I Simple constraints, for example, a prior dist on the varianceratio

I Weakly informative

Andrew Gelman and Aleks Jakulin Weakly informative priors

Page 67: Weakly informative priors - Columbia Universitygelman/presentations/weakpriorstalk.pdf · Weakly informative priors Static sensitivity analysis Conservatism of Bayesian inference

Weakly informative priorsStatic sensitivity analysis

Conservatism of Bayesian inferenceA hierarchical framework

ConclusionReferences

Variance parametersCovariance matricesLogistic regression coefficientsPopulation variation in a physiological modelMixture modelsIntentional underpooling in hierarchical models

Weakly informative priors formixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single pointtaking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information—or, to put itanother way, regularization

I Simple constraints, for example, a prior dist on the varianceratio

I Weakly informative

Andrew Gelman and Aleks Jakulin Weakly informative priors

Page 68: Weakly informative priors - Columbia Universitygelman/presentations/weakpriorstalk.pdf · Weakly informative priors Static sensitivity analysis Conservatism of Bayesian inference

Weakly informative priorsStatic sensitivity analysis

Conservatism of Bayesian inferenceA hierarchical framework

ConclusionReferences

Variance parametersCovariance matricesLogistic regression coefficientsPopulation variation in a physiological modelMixture modelsIntentional underpooling in hierarchical models

Weakly informative priors formixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single pointtaking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information—or, to put itanother way, regularization

I Simple constraints, for example, a prior dist on the varianceratio

I Weakly informative

Andrew Gelman and Aleks Jakulin Weakly informative priors

Page 69: Weakly informative priors - Columbia Universitygelman/presentations/weakpriorstalk.pdf · Weakly informative priors Static sensitivity analysis Conservatism of Bayesian inference

Weakly informative priorsStatic sensitivity analysis

Conservatism of Bayesian inferenceA hierarchical framework

ConclusionReferences

Variance parametersCovariance matricesLogistic regression coefficientsPopulation variation in a physiological modelMixture modelsIntentional underpooling in hierarchical models

Weakly informative priors formixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single pointtaking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information—or, to put itanother way, regularization

I Simple constraints, for example, a prior dist on the varianceratio

I Weakly informative

Andrew Gelman and Aleks Jakulin Weakly informative priors

Page 70: Weakly informative priors - Columbia Universitygelman/presentations/weakpriorstalk.pdf · Weakly informative priors Static sensitivity analysis Conservatism of Bayesian inference

Weakly informative priorsStatic sensitivity analysis

Conservatism of Bayesian inferenceA hierarchical framework

ConclusionReferences

Variance parametersCovariance matricesLogistic regression coefficientsPopulation variation in a physiological modelMixture modelsIntentional underpooling in hierarchical models

Weakly informative priors formixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single pointtaking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information—or, to put itanother way, regularization

I Simple constraints, for example, a prior dist on the varianceratio

I Weakly informative

Andrew Gelman and Aleks Jakulin Weakly informative priors

Page 71: Weakly informative priors - Columbia Universitygelman/presentations/weakpriorstalk.pdf · Weakly informative priors Static sensitivity analysis Conservatism of Bayesian inference

Weakly informative priorsStatic sensitivity analysis

Conservatism of Bayesian inferenceA hierarchical framework

ConclusionReferences

Variance parametersCovariance matricesLogistic regression coefficientsPopulation variation in a physiological modelMixture modelsIntentional underpooling in hierarchical models

Weakly informative priors formixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single pointtaking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information—or, to put itanother way, regularization

I Simple constraints, for example, a prior dist on the varianceratio

I Weakly informative

Andrew Gelman and Aleks Jakulin Weakly informative priors

Page 72: Weakly informative priors - Columbia Universitygelman/presentations/weakpriorstalk.pdf · Weakly informative priors Static sensitivity analysis Conservatism of Bayesian inference

Weakly informative priorsStatic sensitivity analysis

Conservatism of Bayesian inferenceA hierarchical framework

ConclusionReferences

Variance parametersCovariance matricesLogistic regression coefficientsPopulation variation in a physiological modelMixture modelsIntentional underpooling in hierarchical models

Weakly informative priors formixture models

I Well-known problem of fitting the mixture model likelihood

I The maximum likelihood fits are weird, with a single pointtaking half the mixture

I Bayes with flat prior is just as bad

I These solutions don’t “look” like mixtures

I There must be additional prior information—or, to put itanother way, regularization

I Simple constraints, for example, a prior dist on the varianceratio

I Weakly informative

Andrew Gelman and Aleks Jakulin Weakly informative priors

Intentional underpooling in hierarchical models

I Basic hierarchical model:
  I Data yj on parameters θj
  I Group-level model θj ∼ N(µ, τ²)
  I No-pooling estimate θ̂j = yj
  I Bayesian partial-pooling estimate E(θj | y)
I Weak Bayes estimate: same as Bayes, but replacing τ with 2τ
I An example of the "inconsistent Gibbs" algorithm
I Why would we do this??
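For the normal-normal case with known µ, τ, and data standard error σj, the partial-pooling estimate has a closed form, and "underpooling" just substitutes 2τ for τ. A small sketch in plain Python (the numbers µ = 0, τ = 1, σj = 1, yj = 2 are illustrative, not from the talk):

```python
def partial_pool(y_j, sigma_j, mu, tau):
    # Posterior mean E(theta_j | y) for theta_j ~ N(mu, tau^2), y_j ~ N(theta_j, sigma_j^2):
    # a precision-weighted average of the data y_j and the group mean mu.
    w_data = 1.0 / sigma_j ** 2
    w_prior = 1.0 / tau ** 2
    return (w_data * y_j + w_prior * mu) / (w_data + w_prior)

y_j, sigma_j, mu, tau = 2.0, 1.0, 0.0, 1.0

no_pool = y_j                                   # theta_hat_j = y_j
bayes = partial_pool(y_j, sigma_j, mu, tau)     # standard partial pooling
weak = partial_pool(y_j, sigma_j, mu, 2 * tau)  # "weak Bayes": tau replaced by 2*tau
print(no_pool, bayes, weak)
```

The weak-Bayes estimate sits between the fully Bayesian estimate and the no-pooling estimate: doubling τ shrinks less toward µ, a deliberate hedge against a group-level model that may be too strong.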


Static sensitivity analysis: what happens if we add prior information?

Conservatism of Bayesian inference

I Consider the logistic regression example
I Problems with maximum likelihood when data show separation:
  I Coefficient estimate of −∞
  I Estimated predictive probability of 0 for new cases
I Is this conservative?
I Not if evaluated by log score or predictive log-likelihood
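Separation is easy to reproduce: with perfectly separated data the log-likelihood is increasing in |β|, so gradient ascent never converges, while even a weak Gaussian penalty (standing in here for a weakly informative prior; the scale 2.5 and the toy data are my choices, not the talk's) gives a finite estimate. A sketch in plain Python:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]  # perfectly separated: y = 1 exactly when x > 0

def ascend(steps, prior_scale=None, lr=0.5):
    # Gradient ascent on the logistic log-likelihood (no intercept),
    # optionally penalized by a N(0, prior_scale^2) log-prior on the slope.
    b = 0.0
    for _ in range(steps):
        grad = sum(xi * (yi - sigmoid(b * xi)) for xi, yi in zip(x, y))
        if prior_scale is not None:
            grad -= b / prior_scale ** 2
        b += lr * grad
    return b

b_short, b_long = ascend(1000), ascend(10000)
b_penalized = ascend(10000, prior_scale=2.5)
print(b_short, b_long, b_penalized)
```

b_long keeps growing past b_short, since the maximum-likelihood coefficient is +∞, while b_penalized settles near a finite mode; the penalized fit also assigns nonzero predictive probability to both outcomes, which is what the log-score comparison on the slide is about.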


Another example

Dose    #deaths/#animals
−0.86   0/5
−0.30   1/5
−0.05   3/5
 0.73   5/5

I Slope of a logistic regression of Pr(death) on dose:
  I Maximum likelihood est is 7.8 ± 4.9
  I With weakly informative prior: Bayes est is 4.4 ± 1.9
I Which is truly conservative?
I The sociology of shrinkage
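The maximum-likelihood fit in the table can be reproduced with a few Newton-Raphson steps on the binomial logistic likelihood; the standard error comes from the inverse observed information, as in ordinary glm output. A plain-Python sketch (my implementation; the slide's exact rounding may differ):

```python
import math

# Bioassay data from the slide: dose, deaths out of 5 animals
dose = [-0.86, -0.30, -0.05, 0.73]
deaths = [0, 1, 3, 5]
n = [5, 5, 5, 5]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Newton-Raphson for logit Pr(death) = a + b * dose
a, b = 0.0, 0.0
for _ in range(25):
    # Gradient and (negative) Hessian of the binomial log-likelihood.
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y, m in zip(dose, deaths, n):
        p = sigmoid(a + b * x)
        r = y - m * p        # residual
        w = m * p * (1 - p)  # IRLS weight
        g0 += r
        g1 += r * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    a += (h11 * g0 - h01 * g1) / det
    b += (-h01 * g0 + h00 * g1) / det

se_b = math.sqrt(h00 / det)  # slope standard error from the inverse information
print(round(b, 1), round(se_b, 1))  # slope and se near the slide's 7.8 and 4.9
```

With only 20 animals, the likelihood alone pins down the slope very loosely, which is why a weakly informative prior can halve both the estimate and its uncertainty.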


Prior as population distribution
Evaluation using a corpus of datasets

A hierarchical framework

I Consider many possible datasets
I The "true prior" is the distribution of β's across these datasets
I Fit one dataset at a time
I A "weakly informative prior" has less information (wider variance) than the true prior
I Open question: How to formalize the tradeoffs from using different priors?
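The "true prior" idea can be checked in a toy simulation: draw a β for each of many datasets from a known population distribution, observe one noisy measurement per dataset, and compare estimators that shrink using priors of different widths. Everything below (a normal rather than Cauchy population, one observation per dataset, the specific scales) is my simplification for illustration:

```python
import random

random.seed(1)

TAU = 1.0    # sd of the "true prior": beta varies across datasets as N(0, TAU^2)
SIGMA = 1.0  # measurement sd within each dataset

def mse_of_prior(prior_sd, n_datasets=200_000):
    # Fit each dataset with a N(0, prior_sd^2) prior; the posterior mean
    # shrinks the observation y toward 0 by a fixed factor.
    shrink = prior_sd ** 2 / (prior_sd ** 2 + SIGMA ** 2)
    total = 0.0
    for _ in range(n_datasets):
        beta = random.gauss(0.0, TAU)
        y = random.gauss(beta, SIGMA)
        total += (shrink * y - beta) ** 2
    return total / n_datasets

mse_true = mse_of_prior(1.0)   # prior matches the population of betas
mse_weak = mse_of_prior(2.0)   # weakly informative: wider than the true prior
mse_flat = mse_of_prior(50.0)  # nearly flat: essentially no pooling
print(mse_true, mse_weak, mse_flat)
```

Analytically the expected losses are 0.50, 0.68, and about 1.00: the weakly informative prior pays a modest price relative to the true prior but clearly beats the near-flat prior, which is one concrete face of the tradeoff the open question asks to formalize.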


Evaluation using a corpus of datasets

I Compare classical glm to Bayesian estimates using various prior distributions
I Evaluate using cross-validation and average predictive error
I The optimal prior distribution for β's is (approx) Cauchy(0, 1)
I Our Cauchy(0, 2.5) prior distribution is weakly informative!
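A Cauchy prior is easy to use at the MAP level: add its log-density to the log-likelihood and optimize. The sketch below applies a Cauchy(0, 2.5) penalty to the slope of the bioassay example by gradient ascent (my implementation for illustration; the authors' method standardizes predictors and reports posterior point estimates, so the exact numbers differ from the slide's 4.4):

```python
import math

dose = [-0.86, -0.30, -0.05, 0.73]
deaths = [0, 1, 3, 5]
n = [5, 5, 5, 5]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(slope_prior_scale=None, steps=20000, lr=0.02):
    # MAP fit of logit Pr(death) = a + b*dose, with an optional Cauchy(0, scale)
    # prior on the slope b (flat prior on the intercept a).
    a = b = 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for x, y, m in zip(dose, deaths, n):
            p = sigmoid(a + b * x)
            ga += y - m * p
            gb += (y - m * p) * x
        if slope_prior_scale is not None:
            s = slope_prior_scale
            gb -= 2 * b / (s * s + b * b)  # d/db of the Cauchy(0, s) log-density
        a += lr * ga
        b += lr * gb
    return a, b

_, b_mle = fit()
_, b_cauchy = fit(slope_prior_scale=2.5)
print(round(b_mle, 2), round(b_cauchy, 2))
```

The penalized slope is pulled well below the maximum-likelihood value of about 7.8, in the direction of the slide's 4.4: the heavy Cauchy tails regularize without ruling out large coefficients when the data demand them.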


Expected predictive loss, avg over a corpus of datasets

[Figure: −log test likelihood (roughly 0.29–0.33) plotted against the scale of the prior (0 to 5), with curves for BBR(l), BBR(g), and t priors with df = 0.5, 1.0, 2.0, 4.0, and 8.0; the classical GLM baseline (1.79) lies far off the plotted range.]

Conclusion

I "Noninformative priors" are really weakly informative
I "Weakly informative" is a more general and useful concept
  I Regularization
  I Better inferences
  I Stability of computation


Our work
Work of others

References: our work

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1, 515–533.
Gelman, A., Bois, F., and Jiang, J. (1996). Physiological pharmacokinetic analysis using population modeling and informative prior distributions. Journal of the American Statistical Association 91, 1400–1412.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis, second edition. London: CRC Press.
Gelman, A., and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Gelman, A., and Jakulin, A. (2007). Bayes: radical, liberal, or conservative? Statistica Sinica 17, 422–426.
Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y. S. (2007). A default prior distribution for logistic and other regression models. Technical report, Department of Statistics, Columbia University.
Gelman, A., and King, G. (1990). Estimating the electoral consequences of legislative redistricting. Journal of the American Statistical Association 85, 274–282.
Gelman, A., and Tuerlinckx, F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational Statistics 15, 373–390.

References: work of others

Cook, S. R., and Rubin, D. B. (2006). Constructing vague but proper prior distributions in complex Bayesian models. Technical report, Department of Statistics, Harvard University.
Greenland, S. (2007). Bayesian perspectives for epidemiological research. II. Regression analysis. International Journal of Epidemiology 36, 1–8.
MacLehose, R. F., Dunson, D. B., Herring, A. H., and Hoppin, J. A. (2007). Bayesian methods for highly correlated exposure data. Epidemiology 18, 199–207.
O'Malley, A. J., and Zaslavsky, A. M. (2005). Cluster-level covariance analysis for survey data with structured nonresponse. Technical report, Department of Health Care Policy, Harvard Medical School.
Thomas, D. C., Witte, J. S., and Greenland, S. (2007). Dissecting effects of complex mixtures: who's afraid of informative priors? Epidemiology 18, 186–190.