
On Some Geometrical Aspects of Bayesian Inference

Miguel de Carvalho
Joint with B. J. Barney and G. L. Page; Brigham Young University, US

School of Mathematics

Page 2: School of Mathematics · OnSomeGeometricalAspectsofBayesianInference Miguel de Carvalho† †Joint with B. J. Barney and G. L. Page; Brigham Young University, US School of Mathematics

ISBA 2018: Edinburgh, June 24–29World Meeting of the International Society for Bayesian Analysis

M. de Carvalho On the Geometry of Bayesian Inference 2 / 37

Introduction: Motivation

Bayesian methodologies have become mainstream. Because of this, there is a need to develop methods, accessible to non-experts, that assess the influence of model choices on inference. These need to be:

1. Easy to interpret.
2. Easy to calculate.

Ideally, they should provide a unified treatment of all pieces of Bayes theorem.

Introduction: Motivation

Much work has been devoted to developing methods that assess the sensitivity of the posterior to changes in the prior and likelihood. The so-called prior–data conflict has been another subject attracting attention (Evans and Moshonov, 2006; Walter and Augustin, 2009; Al Labadi and Evans, 2016). Others have investigated two competing priors as a way to specify so-called weakly informative priors (Evans and Jang, 2011; Gelman et al., 2008).

Introduction: Goals

The novel contribution we intend to make is a metric able to carry out comparisons between the:

- prior and likelihood, to assess prior–data agreement;
- prior and posterior, to assess the influence that the prior has on inference;
- prior and prior, to compare the information available in competing priors.

To be useful this metric should be:

1. Easy to interpret.
2. Easy to calculate.

Ideally, it should provide a unified treatment of all pieces of Bayes theorem.

Introduction: Line of Attack

To this end, we view each component of Bayes theorem as if it belonged to a geometry, and we seek intuitively appealing interpretations of the norms of, and the angles between, the vectors of this geometry. Calculating these quantities is straightforward and can be done online. The interpretations are similar to those that accompany the correlation coefficient for continuous random variables.

Introduction: On-the-Job Drug Usage Toy Example

Example (Christensen et al., 2011, pp. 26–27)
Suppose interest lies in estimating the proportion θ ∈ [0, 1] of US transportation industry workers who use drugs on the job. Suppose y = (0,1,0,0,0,0,1,0,0,0) and that

    Yi | θ ~ Bern(θ) (iid),    θ ~ Beta(a, b),    θ | y ~ Beta(a*, b*),

with a* = ∑ Yi + a and b* = n − ∑ Yi + b. The authors conduct the analysis picking (a, b) = (3.44, 22.99).

Introduction: Natural Questions

Some key questions:

- How compatible is the likelihood with this prior choice?
- How similar are the posterior and prior distributions?
- How does the choice of Beta(a, b) compare with other possible prior distributions?

We provide a unified treatment to answer the questions above.

Storyboard: Plan of this Talk

1. Introduction (Done)
2. Bayes Geometry (Next)
3. Posterior and Prior Mean-Based Estimators of Compatibility
4. Discussion

Bayes Geometry: Primitive Structures of Interest

Suppose the inference of interest is over a parameter θ in Θ ⊆ R^p. We work in L2(Θ) and use the geometry of the Hilbert space

    H = (L2(Θ), ⟨·, ·⟩),

with inner product

    ⟨g, h⟩ = ∫_Θ g(θ) h(θ) dθ,    g, h ∈ L2(Θ),

and norm ‖·‖ = ⟨·, ·⟩^{1/2}. The fact that H is a Hilbert space is often known as the Riesz–Fischer theorem (Cheney, 2001, p. 411).

Bayes Geometry: A Geometric View of Bayes Theorem

Bayes theorem:

    p(θ | y) = π(θ) f(y | θ) / ∫_Θ π(θ) f(y | θ) dθ = π(θ) ℓ(θ) / ⟨π, ℓ⟩.

[Figure: the prior π, likelihood ℓ, and posterior p depicted as vectors.]

The likelihood vector is used to enlarge/reduce the magnitude and suitably tilt the direction of the prior vector.

Bayes Geometry: A Geometric View of Bayes Theorem

Define the angle measure between the prior and the likelihood as

    π ∠ ℓ = arccos{ ⟨π, ℓ⟩ / (‖π‖ ‖ℓ‖) }.

Since π and ℓ are nonnegative, π ∠ ℓ ∈ [0°, 90°]. Bayes theorem is incompatible with a prior being orthogonal to the likelihood, as

    π ∠ ℓ = 90° ⇒ ⟨π, ℓ⟩ = 0,

thus leading to a division by zero. Our first target object of interest is the standardized inner product

    κπ,ℓ = ⟨π, ℓ⟩ / (‖π‖ ‖ℓ‖),

which quantifies how much an expert's opinion agrees with the data, thus providing a natural measure of prior–data agreement. A numerical sketch follows.
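
The following is a minimal numerical sketch (not from the talk; the helper name is mine) of computing κπ,ℓ by quadrature for the drug-usage example, with 2 successes in n = 10 trials and the Beta(3.44, 22.99) prior:

```python
import numpy as np
from scipy import integrate, stats

def kappa(f, g, lo=0.0, hi=1.0):
    """Standardized inner product <f, g> / (||f|| ||g||) on (lo, hi)."""
    inner = integrate.quad(lambda t: f(t) * g(t), lo, hi)[0]
    nf = np.sqrt(integrate.quad(lambda t: f(t) ** 2, lo, hi)[0])
    ng = np.sqrt(integrate.quad(lambda t: g(t) ** 2, lo, hi)[0])
    return inner / (nf * ng)

s, n = 2, 10                                  # observed successes, sample size
lik = lambda t: t ** s * (1 - t) ** (n - s)   # Bernoulli likelihood l(theta)
prior = stats.beta(3.44, 22.99).pdf           # prior of Christensen et al.

print(kappa(prior, lik))                      # prior-data agreement in [0, 1]
```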

Bayes Geometry: A Geometric View of Bayes Theorem

Definition (Millman and Parker, 1991, p. 17)
An abstract geometry A consists of a pair {P, L}, where the elements of the set P are designated as points and the elements of the collection L are designated as lines, such that:

1. For every two points A, B ∈ P, there is a line l ∈ L containing them.
2. Every line has at least two points.

Our abstract geometry of interest is A = {P, L}, where P = L2(Θ) and

    L = { {g + kh : k ∈ R} : g, h ∈ L2(Θ) }.

In our setting, points are, for example, prior densities, posterior densities, or likelihoods, as long as they are in L2(Θ).

Bayes Geometry: A Geometric View of Bayes Theorem

Lines are elements of L, so that, for example, if g and h are densities, line segments in our geometry consist of all possible mixture distributions which can be obtained from g and h, i.e.,

    {λg + (1 − λ)h : λ ∈ [0, 1]}.

Vectors in A = {P, L} are defined through the difference of elements in P = L2(Θ). If g, h ∈ L2(Θ) are vectors, then we say that g and h are collinear if there exists k ∈ R such that g(θ) = k h(θ), for all θ ∈ Θ. Put differently, we say g and h are collinear if g(θ) ∝ h(θ), for all θ ∈ Θ.

Bayes Geometry: A Geometric View of Bayes Theorem

Two different densities π1 and π2 cannot be collinear: if π1 = kπ2, then k = 1, since otherwise ∫_Θ π2(θ) dθ ≠ 1.

A density can, however, be collinear with a likelihood: if the prior is uniform, then p(θ | y) ∝ ℓ(θ).

[Figure: non-collinear densities π1 and π2; collinear posterior p and likelihood ℓ.]

Bayes Geometry: A Geometric View of Bayes Theorem

Our geometry is compatible with two likelihoods being collinear. This can be used to rethink the strong likelihood principle, which states that if

    ℓ(θ) = f(y | θ) ∝ f(y* | θ) = ℓ*(θ),

then the same inference should be drawn from both samples.

[Figure: collinear likelihoods ℓ and ℓ*.]

According to our geometry, the strong likelihood principle reads: "Likelihoods with the same direction should yield the same inference."

Bayes Geometry: A Geometric View of Bayes Theorem

Definition (Compatibility)
The compatibility between points in the geometry under consideration is the mapping κ : L2(Θ) × L2(Θ) → [0, 1] defined as

    κg,h = ⟨g, h⟩ / (‖g‖ ‖h‖),    g, h ∈ L2(Θ).

Pearson correlation coefficient vs compatibility:

    ⟨X, Y⟩ = ∫_Ω X Y dP,    X, Y ∈ L2(Ω, B_Ω, P),

instead of

    ⟨g, h⟩ = ∫_Θ g(θ) h(θ) dθ,    g, h ∈ L2(Θ).

Note that:

- κπ,ℓ: prior–data agreement.
- κπ,p: sensitivity of the posterior to the prior specification.
- κπ1,π2: compatibility of different priors (coherency of the opinions of experts).

Bayes Geometry: Norms and their Interpretation

κπ,ℓ is composed of function norms: how do we interpret norms? In some cases the norm of a density is linked to the variance.

Example
Let U ~ Unif(a, b) and let π(x) = (b − a)^{-1} I_(a,b)(x). Then

    ‖π‖ = 1 / (12 σ²_U)^{1/4},

where the variance of U is σ²_U = (b − a)² / 12.

Example
Let X ~ N(µ, σ²_X) with known variance σ²_X. It can be shown that

    ‖φ‖ = {∫_R φ²(x; µ, σ²_X) dx}^{1/2} = 1 / (4π σ²_X)^{1/4}.
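
A quick numeric check of the Gaussian norm formula (the values of µ and σ are illustrative):

```python
import numpy as np
from scipy import integrate, stats

mu, sigma = 0.0, 1.3
num = np.sqrt(integrate.quad(lambda x: stats.norm(mu, sigma).pdf(x) ** 2,
                             -np.inf, np.inf)[0])
print(num, (4 * np.pi * sigma ** 2) ** -0.25)   # the two values should agree
```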

Bayes Geometry: Norms and their Interpretation

Proposition
Let Θ ⊂ R^p with |Θ| < ∞, where |·| denotes Lebesgue measure. Consider a probability density π : Θ → [0, ∞) with π ∈ L2(Θ), and let π0 ~ Unif(Θ) denote a uniform density on Θ. Then

    ‖π‖² = ‖π − π0‖² + ‖π0‖².

This interpretation cannot be applied to Θ's that do not have finite Lebesgue measure, as there is then no corresponding proper uniform distribution. Yet the notion that the norm of a density is a measure of its peakedness may be applied whether or not Θ has finite Lebesgue measure.

Bayes GeometryNorms and their Interpretation

To see this,

evaluate π(θ) on a grid θ1 < · · ·< θD and consider the vector

p = (π1, . . . ,πD),

with πd = π(θd ) for d = 1, . . . ,D.

The larger the norm of the vector p, the higher the indication that certaincomponents would be far from the origin—that is, π(θ) would be peaking forcertain θ in the grid.

Now, think of a density as a vector with infinitely many components (itsvalue at each point of the support) and replace summation by integration toget the L2 norm.

M. de Carvalho On the Geometry of Bayesian Inference 20 / 37

Page 27: School of Mathematics · OnSomeGeometricalAspectsofBayesianInference Miguel de Carvalho† †Joint with B. J. Barney and G. L. Page; Brigham Young University, US School of Mathematics

Bayes GeometryNorms and their Interpretation

To see this, evaluate π(θ) on a grid θ1 < · · ·< θD

and consider the vector

p = (π1, . . . ,πD),

with πd = π(θd ) for d = 1, . . . ,D.

The larger the norm of the vector p, the higher the indication that certaincomponents would be far from the origin—that is, π(θ) would be peaking forcertain θ in the grid.

Now, think of a density as a vector with infinitely many components (itsvalue at each point of the support) and replace summation by integration toget the L2 norm.

M. de Carvalho On the Geometry of Bayesian Inference 20 / 37

Page 28: School of Mathematics · OnSomeGeometricalAspectsofBayesianInference Miguel de Carvalho† †Joint with B. J. Barney and G. L. Page; Brigham Young University, US School of Mathematics

Bayes GeometryNorms and their Interpretation

To see this, evaluate π(θ) on a grid θ1 < · · ·< θD and consider the vector

p = (π1, . . . ,πD),

with πd = π(θd ) for d = 1, . . . ,D.

The larger the norm of the vector p, the higher the indication that certaincomponents would be far from the origin—that is, π(θ) would be peaking forcertain θ in the grid.

Now, think of a density as a vector with infinitely many components (itsvalue at each point of the support) and replace summation by integration toget the L2 norm.

M. de Carvalho On the Geometry of Bayesian Inference 20 / 37

Page 29: School of Mathematics · OnSomeGeometricalAspectsofBayesianInference Miguel de Carvalho† †Joint with B. J. Barney and G. L. Page; Brigham Young University, US School of Mathematics

Bayes GeometryNorms and their Interpretation

To see this, evaluate π(θ) on a grid θ1 < · · ·< θD and consider the vector

p = (π1, . . . ,πD),

with πd = π(θd ) for d = 1, . . . ,D.

The larger the norm of the vector p, the higher the indication that certaincomponents would be far from the origin

—that is, π(θ) would be peaking forcertain θ in the grid.

Now, think of a density as a vector with infinitely many components (itsvalue at each point of the support) and replace summation by integration toget the L2 norm.

M. de Carvalho On the Geometry of Bayesian Inference 20 / 37

Page 30: School of Mathematics · OnSomeGeometricalAspectsofBayesianInference Miguel de Carvalho† †Joint with B. J. Barney and G. L. Page; Brigham Young University, US School of Mathematics

Bayes GeometryNorms and their Interpretation

To see this, evaluate π(θ) on a grid θ1 < · · ·< θD and consider the vector

p = (π1, . . . ,πD),

with πd = π(θd ) for d = 1, . . . ,D.

The larger the norm of the vector p, the higher the indication that certaincomponents would be far from the origin—that is, π(θ) would be peaking forcertain θ in the grid.

Now, think of a density as a vector with infinitely many components (itsvalue at each point of the support) and replace summation by integration toget the L2 norm.

M. de Carvalho On the Geometry of Bayesian Inference 20 / 37

Page 31: School of Mathematics · OnSomeGeometricalAspectsofBayesianInference Miguel de Carvalho† †Joint with B. J. Barney and G. L. Page; Brigham Young University, US School of Mathematics

Bayes GeometryNorms and their Interpretation

To see this, evaluate π(θ) on a grid θ1 < · · ·< θD and consider the vector

p = (π1, . . . ,πD),

with πd = π(θd ) for d = 1, . . . ,D.

The larger the norm of the vector p, the higher the indication that certaincomponents would be far from the origin—that is, π(θ) would be peaking forcertain θ in the grid.

Now, think of a density as a vector with infinitely many components (itsvalue at each point of the support) and replace summation by integration toget the L2 norm.

M. de Carvalho On the Geometry of Bayesian Inference 20 / 37
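
A small sketch of this grid intuition (the grid and densities are my own choices, not from the talk): a peaked density has a larger discretized L2 norm than a flat one.

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 2000)   # grid theta_1 < ... < theta_D
d_theta = theta[1] - theta[0]

flat = stats.beta(1, 1).pdf(theta)        # Unif(0, 1)
peaked = stats.beta(30, 70).pdf(theta)    # concentrated near 0.3

# Riemann approximation of ||pi|| = (integral of pi^2)^(1/2)
for p in (flat, peaked):
    print(np.sqrt(np.sum(p ** 2) * d_theta))   # the peaked density is larger
```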

Bayes Geometry

Example (On-the-job drug usage toy example, cont. 1)
From the example in the Introduction we have θ | y ~ Beta(a*, b*), with a* = a + ∑ Yi = a + 2 and b* = b + n − ∑ Yi = b + 8. The norms of the prior and of the posterior are respectively given by

    ‖π(a,b)‖ = {B(2a − 1, 2b − 1)}^{1/2} / B(a, b),    a, b > 1/2,

and

    ‖p(a,b)‖ = ‖π(a*,b*)‖.
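
A hedged check of this closed-form Beta norm against numerical integration (function names are mine):

```python
import numpy as np
from scipy import integrate, special, stats

def beta_norm_closed(a, b):
    # ||pi_(a,b)|| = B(2a-1, 2b-1)^(1/2) / B(a,b), computed on the log scale
    return np.exp(0.5 * special.betaln(2*a - 1, 2*b - 1) - special.betaln(a, b))

def beta_norm_quad(a, b):
    pdf = stats.beta(a, b).pdf
    return np.sqrt(integrate.quad(lambda t: pdf(t) ** 2, 0, 1)[0])

a, b = 3.44, 22.99
print(beta_norm_closed(a, b), beta_norm_quad(a, b))   # should agree

# the posterior norm reuses the same formula with a* = a + 2, b* = b + 8
print(beta_norm_closed(a + 2, b + 8))
```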

Bayes Geometry: Prior and Posterior Norms, On-the-Job Drug Usage Toy Example

[Figure: surfaces of the prior norm ‖π‖ and the posterior norm ‖p‖ over the hyperparameters a and b (both on 5–30 grids). The black dot corresponds to (a, b) = (3.44, 22.99), the values employed by Christensen et al. (2011, pp. 26–27).]

Bayes Geometry: Angles Between Other Vectors

Considering κ, it follows that

    κπ,ℓ(a, b) = B(a*, b*) {B(2a − 1, 2b − 1) B(2∑ Yi + 1, 2(n − ∑ Yi) + 1)}^{−1/2}.

As mentioned, we are not restricted to using κ only to compare π and ℓ.

Example (On-the-job drug usage toy example, cont. 2)
Extending the previous example, we calculate

    κπ,p = B(∑ Yi + 2a − 1, n − ∑ Yi + 2b − 1) {B(2a − 1, 2b − 1) B(2∑ Yi + 2a − 1, 2n − 2∑ Yi + 2b − 1)}^{−1/2},

and, for π1 ~ Beta(a1, b1) and π2 ~ Beta(a2, b2),

    κπ1,π2 = B(a1 + a2 − 1, b1 + b2 − 1) / {B(2a1 − 1, 2b1 − 1) B(2a2 − 1, 2b2 − 1)}^{1/2}.
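
These closed forms are easy to evaluate stably on the log scale with betaln; a sketch (function names are mine) for the Beta–Bernoulli example with s = ∑ Yi successes:

```python
import numpy as np
from scipy.special import betaln

def kappa_prior_lik(a, b, s, n):
    # kappa_{pi,l} = B(a*, b*) [B(2a-1, 2b-1) B(2s+1, 2(n-s)+1)]^(-1/2)
    return np.exp(betaln(a + s, b + n - s)
                  - 0.5 * (betaln(2*a - 1, 2*b - 1)
                           + betaln(2*s + 1, 2*(n - s) + 1)))

def kappa_prior_post(a, b, s, n):
    # kappa_{pi,p} = B(s+2a-1, n-s+2b-1) [B(2a-1, 2b-1) B(2s+2a-1, 2(n-s)+2b-1)]^(-1/2)
    return np.exp(betaln(s + 2*a - 1, n - s + 2*b - 1)
                  - 0.5 * (betaln(2*a - 1, 2*b - 1)
                           + betaln(2*s + 2*a - 1, 2*(n - s) + 2*b - 1)))

def kappa_priors(a1, b1, a2, b2):
    # kappa_{pi1,pi2} = B(a1+a2-1, b1+b2-1) [B(2a1-1, 2b1-1) B(2a2-1, 2b2-1)]^(-1/2)
    return np.exp(betaln(a1 + a2 - 1, b1 + b2 - 1)
                  - 0.5 * (betaln(2*a1 - 1, 2*b1 - 1)
                           + betaln(2*a2 - 1, 2*b2 - 1)))

print(kappa_prior_lik(3.44, 22.99, s=2, n=10))
print(kappa_prior_post(3.44, 22.99, s=2, n=10))
print(kappa_priors(3.44, 22.99, 1.0, 1.0))   # e.g. against a flat Beta(1,1)
```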

Bayes Geometry: Compatibility, On-the-Job Drug Usage Toy Example

[Figure: surfaces of (i) κπ,ℓ, (ii) κπ,p, and (iii) κπ1,π2 over the hyperparameters a and b. In (i) and (ii) the black dot corresponds to (a, b) = (3.44, 22.99), the values employed by Christensen et al. (2011, pp. 26–27).]

Bayes Geometry: Max-Compatible Priors and Maximum Likelihood Estimators

Definition (Max-compatible prior)
Let y ~ f(· | θ), and let P = {π(θ | α) : α ∈ A} be a family of priors for θ. If there exists α*_y ∈ A such that κπ,ℓ(α*_y) = 1, the prior π(θ | α*_y) ∈ P is said to be max-compatible, and α*_y is said to be a max-compatible hyperparameter.

The max-compatible hyperparameter α*_y is by definition a random vector, and thus a max-compatible prior density is a random function. Geometrically, a prior is max-compatible iff it is collinear with the likelihood, in the sense that

    κπ,ℓ(α*_y) = 1    iff    π(θ | α*_y) ∝ ℓ(θ).

[Figure: collinear π(θ | α*_y) and ℓ.]

Bayes Geometry: Max-Compatible Priors and Maximum Likelihood Estimators

Example (Beta–Binomial)
Let ∑_{i=1}^n Yi ~ Bin(n, θ), and suppose θ ~ Beta(a, b). It can be shown that the max-compatible prior is π(θ | a*, b*) = β(θ | a*, b*), where a* = 1 + ∑_{i=1}^n Yi and b* = 1 + n − ∑_{i=1}^n Yi, so that

    θ̂n = argmax_{θ ∈ (0,1)} f(y | θ) = Ȳ = (a* − 1) / (a* + b* − 2) =: m(a*, b*).

Theorem
Let y ~ f(· | θ), and let P = {π(θ | α) : α ∈ A} be a family of priors for θ. Suppose there exists a max-compatible prior π(θ | α*_y) ∈ P, which we assume to be unimodal. Then

    θ̂n = argmax_{θ ∈ Θ} f(y | θ) = mπ(α*_y) := argmax_{θ ∈ Θ} π(θ | α*_y).
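
A numerical illustration of the Beta–Binomial case (a self-contained sketch, not the authors' code): with a* = 1 + ∑ Yi and b* = 1 + n − ∑ Yi, the prior is collinear with the likelihood, so κπ,ℓ = 1 and the prior mode equals the ML estimate.

```python
import numpy as np
from scipy import integrate, stats

s, n = 2, 10                        # sum of Y_i and sample size
a_star, b_star = 1 + s, 1 + n - s   # max-compatible hyperparameters

lik = lambda t: t ** s * (1 - t) ** (n - s)
prior = stats.beta(a_star, b_star).pdf

inner = integrate.quad(lambda t: prior(t) * lik(t), 0, 1)[0]
norms = np.sqrt(integrate.quad(lambda t: prior(t) ** 2, 0, 1)[0]
                * integrate.quad(lambda t: lik(t) ** 2, 0, 1)[0])
print(inner / norms)                          # 1, up to quadrature error
print((a_star - 1) / (a_star + b_star - 2))   # prior mode = y-bar = 0.2
```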

Bayes Geometry: Max-Compatible Priors and Maximum Likelihood Estimators

Example (Exp–Gamma)
In this case the max-compatible prior is fΓ(θ | a*, b*), where (a*, b*) = (1 + n, ∑_{i=1}^n Yi). The connection with the ML estimator is the following:

    θ̂ = argmax_{θ ∈ Θ} f(y | θ) = n / ∑_{i=1}^n Yi = (a* − 1) / b* =: m2(a*, b*).

Example (Poisson–Gamma)
In this case the max-compatible prior is fΓ(θ | a*, b*), where (a*, b*) = (1 + ∑_{i=1}^n Yi, n). The max-compatible hyperparameter here is different from the one in the previous example, but still

    θ̂ = argmax_{θ ∈ Θ} f(y | θ) = Ȳ = (a* − 1) / b* =: m2(a*, b*).

Posterior and Prior Mean-Based Estimators of Compatibility: Introduction

In many situations closed-form expressions for κ and ‖·‖ are not available. This leads to considering algorithmic techniques to obtain estimates. As most Bayesian methods resort to MCMC, it would be appealing to express κ and ‖·‖ as functions of posterior expectations and employ MCMC iterates to estimate them. For example, κπ,p can be expressed as

    κπ,p = Ep{π(θ)} [ Ep{π(θ)/ℓ(θ)} Ep{ℓ(θ)π(θ)} ]^{−1/2},

where Ep(·) = ∫_Θ (·) p(θ | y) dθ.

Posterior and Prior Mean-Based Estimators of Compatibility: Tentative Estimator

A natural Monte Carlo estimator would then be

    κ̂π,p = (1/B) ∑_{b=1}^B π(θb) [ (1/B) ∑_{b=1}^B π(θb)/ℓ(θb) × (1/B) ∑_{b=1}^B ℓ(θb)π(θb) ]^{−1/2},

where θb denotes the bth MCMC iterate from p(θ | y). Consistency of this estimator follows from the ergodic theorem and the continuous mapping theorem, but there is an important issue regarding its stability; see the sketch below.
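
A sketch of this tentative estimator for the Beta–Bernoulli example, using exact posterior draws in place of MCMC iterates (the setup is mine); the π/ℓ average is the harmonic-mean-type term that causes instability:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b, s, n, B = 3.44, 22.99, 2, 10, 10_000

prior = stats.beta(a, b).pdf
lik = lambda t: t ** s * (1 - t) ** (n - s)

theta = stats.beta(a + s, b + n - s).rvs(B, random_state=rng)  # posterior draws
kappa_hat = (np.mean(prior(theta))
             / np.sqrt(np.mean(prior(theta) / lik(theta))      # harmonic-mean-type term
                       * np.mean(lik(theta) * prior(theta))))
print(kappa_hat)
```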

Posterior and Prior Mean-Based Estimators of Compatibility: Problems with the Previous Attempt

Unfortunately, the previous estimator includes an expectation with ℓ(θ) in the denominator, and it therefore inherits the undesirable properties of the so-called harmonic mean estimator (Newton and Raftery, 1994). It has been shown that even for simple models this estimator may have infinite variance (Raftery et al., 2007), and it has been harshly criticized for, among other things, converging extremely slowly. As argued by Wolpert and Schmidler (2012, p. 655):

"the reduction of Monte Carlo sampling error by a factor of two requires increasing the Monte Carlo sample size by a factor of 2^{1/ε}, or in excess of 2.5 × 10^{30} when ε = 0.01, rendering [the harmonic mean estimator] entirely untenable."

Posterior and Prior Mean-Based Estimators of Compatibility: Solution

An alternative strategy is to avoid writing κπ,p as a function of harmonic-mean-type estimators and instead express it as a function of posterior and prior expectations. For example, consider

    κπ,p = Ep{π(θ)} [ Eπ{π(θ)} / Eπ{ℓ(θ)} × Ep{ℓ(θ)π(θ)} ]^{−1/2},

where Eπ(·) = ∫_Θ (·) π(θ) dθ. Now the Monte Carlo estimator is

    κ̃π,p = (1/B) ∑_{b=1}^B π(θb) [ (B^{−1} ∑_{b=1}^B π(θ̃b)) / (B^{−1} ∑_{b=1}^B ℓ(θ̃b)) × (1/B) ∑_{b=1}^B ℓ(θb)π(θb) ]^{−1/2},

where θ̃b denotes the bth draw of θ from π(θ), which can also be sampled within the MCMC algorithm; a sketch follows.
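
A sketch of the stabilized estimator (the setup is mine): posterior expectations use posterior draws and prior expectations use prior draws.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b, s, n, B = 3.44, 22.99, 2, 10, 10_000

prior = stats.beta(a, b).pdf
lik = lambda t: t ** s * (1 - t) ** (n - s)

theta = stats.beta(a + s, b + n - s).rvs(B, random_state=rng)  # posterior draws
theta_t = stats.beta(a, b).rvs(B, random_state=rng)            # prior draws

kappa_tilde = (np.mean(prior(theta))
               / np.sqrt(np.mean(prior(theta_t)) / np.mean(lik(theta_t))
                         * np.mean(lik(theta) * prior(theta))))
print(kappa_tilde)   # no harmonic-mean-type term appears
```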

Posterior and Prior Mean-Based Estimators of Compatibility: Illustration

[Figure: running point estimates of the prior–posterior compatibility κπ,p over 10,000 iterates, for the on-the-job drug usage toy example with priors Beta(1,1), Beta(2,1), and Beta(10,1). Green lines correspond to the true κπ,p values; blue and red trace the two Monte Carlo estimators κ̃π,p and κ̂π,p.]

Posterior and Prior Mean-Based Estimators of Compatibility: Mean-Based Representations of the Objects of Interest

Proposition
The following equalities hold:

    ‖p‖² = Ep{ℓ(θ)π(θ)} / Eπ{ℓ(θ)},    ‖π‖² = Eπ{π(θ)},    ‖ℓ‖² = Eπ{ℓ(θ)} Ep{ℓ(θ)/π(θ)},

    κπ1,π2 = Eπ1{π2(θ)} [ Eπ1{π1(θ)} Eπ2{π2(θ)} ]^{−1/2},

    κπ,ℓ = Eπ{ℓ(θ)} [ Eπ{π(θ)} Eπ{ℓ(θ)} Ep{ℓ(θ)/π(θ)} ]^{−1/2},

    κπ,p = Ep{π(θ)} [ Eπ{π(θ)} Ep{ℓ(θ)π(θ)} / Eπ{ℓ(θ)} ]^{−1/2},

    κℓ,p = Ep{ℓ(θ)} [ Ep{ℓ(θ)/π(θ)} Ep{ℓ(θ)π(θ)} ]^{−1/2},

    κℓ1,ℓ2 = Eπ{ℓ2(θ)} Ep2{ℓ1(θ)/π(θ)} [ Eπ{ℓ1(θ)} Ep1{ℓ1(θ)/π(θ)} Eπ{ℓ2(θ)} Ep2{ℓ2(θ)/π(θ)} ]^{−1/2}.
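
A quick Monte Carlo check (the setup is mine) of two of these identities, ‖π‖² = Eπ{π(θ)} and ‖p‖² = Ep{ℓ(θ)π(θ)} / Eπ{ℓ(θ)}, against direct quadrature:

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(2)
a, b, s, n, B = 3.44, 22.99, 2, 10, 200_000

prior = stats.beta(a, b).pdf
post = stats.beta(a + s, b + n - s).pdf
lik = lambda t: t ** s * (1 - t) ** (n - s)

t_pi = stats.beta(a, b).rvs(B, random_state=rng)             # prior draws
t_p = stats.beta(a + s, b + n - s).rvs(B, random_state=rng)  # posterior draws

print(np.mean(prior(t_pi)),                                  # ||pi||^2, mean-based
      integrate.quad(lambda t: prior(t) ** 2, 0, 1)[0])      # ||pi||^2, quadrature
print(np.mean(lik(t_p) * prior(t_p)) / np.mean(lik(t_pi)),   # ||p||^2, mean-based
      integrate.quad(lambda t: post(t) ** 2, 0, 1)[0])       # ||p||^2, quadrature
```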

Draft: conditionally accepted, Bayesian Analysis.

[email protected]

Discussion: Final Remarks

We discussed a natural geometric framework for Bayesian inference, which motivated a simple, intuitively appealing measure of the agreement between priors, likelihoods, and posteriors: compatibility (κ). In this geometric framework we also discussed a related measure of the "informativeness" of a distribution, ‖·‖. We developed MCMC-based estimators of these metrics that are easily computable and, by avoiding the estimation of harmonic means, reasonably stable. Our concept of compatibility can be used to evaluate how much the prior agrees with the likelihood, to measure the sensitivity of the posterior to the prior, and to quantify the level of agreement of elicited priors.

Discussion: Final Remarks

To streamline the talk, I have focused on priors that are in L2(Θ). Yet there are examples of priors that are not in L2(Θ); a simple example is the Jeffreys prior for the Beta–Binomial setting, Beta(1/2, 1/2), whose norm is infinite. Our geometric construction can still accommodate densities not in L2(Θ) because, as documented in the paper, the following approach is nested in our setup:

    κ√g,√h = ∫_Θ g(θ)^{1/2} h(θ)^{1/2} dθ.

This approach results in a κ that is still a metric measuring agreement between two elements of a geometry, but it loses the direct connection with Bayes theorem.

The End. Thanks!

References

Agarawal, A., and Daumé, III, H. (2010), "A Geometric View of Conjugate Priors," Machine Learning, 81, 99–113.

Aitchison, J. (1971), "A Geometrical Version of Bayes' Theorem," The American Statistician, 25, 45–46.

Al Labadi, L., and Evans, M. (2016), "Optimal Robustness Results for Relative Belief Inferences and the Relationship to Prior–Data Conflict," Bayesian Analysis, in press.

Bartle, R., and Sherbert, D. (2010), Introduction to Real Analysis (4th ed.), New York: Wiley.

Berger, J. (1991), "Robust Bayesian Analysis: Sensitivity to the Prior," Journal of Statistical Planning and Inference, 25, 303–328.

Berger, J., and Berliner, L. M. (1986), "Robust Bayes and Empirical Bayes Analysis with ε-Contaminated Priors," The Annals of Statistics, 14, 461–486.

Berger, J. O., and Wolpert, R. L. (1988), The Likelihood Principle, IMS Lecture Notes, ed. Gupta, S. S., Institute of Mathematical Statistics, vol. 6.

Cheney, W. (2001), Analysis for Applied Mathematics, New York: Springer.

Christensen, R., Johnson, W. O., Branscum, A. J., and Hanson, T. E. (2011), Bayesian Ideas and Data Analysis, Boca Raton: CRC Press.

Evans, M., and Jang, G. H. (2011), "Weak Informativity and the Information in One Prior Relative to Another," Statistical Science, 26, 423–439.

Evans, M., and Moshonov, H. (2006), "Checking for Prior–Data Conflict," Bayesian Analysis, 1, 893–914.

Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y. S. (2008), "A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models," Annals of Applied Statistics, 2, 1360–1383.

Giné, E., and Nickl, R. (2008), "A Simple Adaptive Estimator of the Integrated Square of a Density," Bernoulli, 14, 47–61.

Grogan, W., and Wirth, W. (1981), "A New American Genus of Predaceous Midges Related to Palpomyia and Bezzia (Diptera: Ceratopogonidae)," Proceedings of the Biological Society of Washington, 94, 1279–1305.

Hastie, T., Tibshirani, R., and Friedman, J. (2008), Elements of Statistical Learning, New York: Springer.

Hoff, P. (2009), A First Course in Bayesian Statistical Methods, New York: Springer.

Hunter, J., and Nachtergaele, B. (2005), Applied Analysis, London: World Scientific Publishing.

Knight, K. (2000), Mathematical Statistics, Boca Raton: Chapman & Hall/CRC Press.

Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010), "Penalized Regression, Standard Errors, and Bayesian Lassos," Bayesian Analysis, 5, 369–412.

Lavine, M. (1991), "Sensitivity in Bayesian Statistics: The Prior and the Likelihood," Journal of the American Statistical Association, 86, 396–399.

Lenk, P. (2009), "Simulation Pseudo-Bias Correction to the Harmonic Mean Estimator of Integrated Likelihoods," Journal of Computational and Graphical Statistics, 18, 941–960.

Lopes, H. F., and Tobias, J. L. (2011), "Confronting Prior Convictions: On Issues of Prior Sensitivity and Likelihood Robustness in Bayesian Analysis," Annual Review of Economics, 3, 107–131.

Millman, R. S., and Parker, G. D. (1991), Geometry: A Metric Approach with Models, New York: Springer.

Newton, M. A., and Raftery, A. E. (1994), "Approximate Bayesian Inference with the Weighted Likelihood Bootstrap (with Discussion)," Journal of the Royal Statistical Society, Series B, 56, 3–26.

Pajor, A., and Osiewalski, J. (2013), "A Note on Lenk's Correction of the Harmonic Mean Estimator," Central European Journal of Economic Modelling and Econometrics, 5, 271–275.

Park, T., and Casella, G. (2008), "The Bayesian Lasso," Journal of the American Statistical Association, 103, 681–686.

Raftery, A. E., Newton, M. A., Satagopan, J. M., and Krivitsky, P. N. (2007), "Estimating the Integrated Likelihood via Posterior Simulation Using the Harmonic Mean Identity," in Bayesian Statistics, eds. Bernardo, J. M., Bayarri, M. J., Berger, J. O., Dawid, A. P., Heckerman, D., Smith, A. F. M., and West, M., Oxford University Press, vol. 8.

Ramsay, J. O., and Silverman, B. W. (1997), Functional Data Analysis, New York: Springer-Verlag.

Scheel, I., Green, P. J., and Rougier, J. C. (2011), "A Graphical Diagnostic for Identifying Influential Model Choices in Bayesian Hierarchical Models," Scandinavian Journal of Statistics, 38, 529–550.

Shortle, J. F., and Mendel, M. B. (1996), "The Geometry of Bayesian Inference," in Bayesian Statistics, eds. Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M., Oxford University Press, vol. 5, pp. 739–746.

Walter, G., and Augustin, T. (2009), "Imprecision and Prior–Data Conflict in Generalized Bayesian Inference," Journal of Statistical Theory and Practice, 3, 255–271.

Wolpert, R., and Schmidler, S. (2012), "α-Stable Limit Laws for Harmonic Mean Estimators of Marginal Likelihoods," Statistica Sinica, 22, 655–679.

Zhu, H., Ibrahim, J. G., and Tang, N. (2011), "Bayesian Influence Analysis: A Geometric Approach," Biometrika, 98, 307–323.