Notes 2: Bayesian Statistics

A Summary of the Bayesian Method and Bayesian Point of View

gary simon, 2003

The formalism of the Bayesian methodology is not too hard to explain, but the philosophical points of view are very contentious.

The discussion starts with Bayes' theorem, a familiar probability result. The essential manipulation is this:

$$P(A \mid B) \;=\; \frac{P(A \cap B)}{P(B)} \;=\; \frac{P(B \mid A)\, P(A)}{P(B)}$$

This is a non-controversial formula, but you should be aware of its potential for time reversal. Suppose that event A comes earlier in time. Then P(B | A) asks about the probability of a current event, given a past event. Time is reversed in P(A | B), which asks about a past event, given a current event.
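As a concrete sketch of this time reversal, consider a diagnostic test: A is the past event (the patient has the condition) and B is the current event (the test reads positive). All numbers below are invented for illustration, not taken from the text.

```python
# Hypothetical illustration of Bayes' theorem and its "time reversal".
# A (past event) = patient has the condition; B (current event) = test is positive.
p_A = 0.01             # P(A): assumed prior prevalence
p_B_given_A = 0.95     # P(B | A): current event given past event (forward in time)
p_B_given_notA = 0.05  # P(B | not A): false-positive rate

# P(B) by the law of total probability
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem reverses the conditioning: P(A | B) = P(B | A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 4))  # prints 0.161
```

Even with a sensitive test, the reversed probability P(A | B) stays small here because the prior P(A) is small.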

You often see Bayes' formula explained in terms of a partition. Let $A_1, A_2, \ldots, A_n$ be a partition. Partition means that

$$P(A_i \cap A_j) = 0 \quad \text{for } i \neq j$$

and also that

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1$$

Bayes' formula might now be written

$$P(A_j \mid B) \;=\; \frac{P(B \mid A_j)\, P(A_j)}{\displaystyle\sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)}$$
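A minimal sketch of the partition form, using a hypothetical three-cell partition (the priors and likelihoods below are invented for illustration):

```python
# Bayes' formula over a partition A_1, ..., A_n (hypothetical numbers).
priors = [0.5, 0.3, 0.2]       # P(A_i); these must sum to 1
likelihoods = [0.9, 0.5, 0.1]  # P(B | A_i)

# Denominator: P(B) = sum_i P(B | A_i) P(A_i)
p_B = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior P(A_j | B) for every cell of the partition
posteriors = [l * p / p_B for l, p in zip(likelihoods, priors)]

# The posteriors again form a probability distribution over the partition.
assert abs(sum(posteriors) - 1.0) < 1e-12
```

Note that the same denominator P(B) serves every cell, which is exactly why the posteriors sum to one.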

Now imagine that we have a standard parametric problem dealing with a likelihood $f(x \mid \theta)$. Here $x$ is a stand-in for the data that we'll see, while $\theta$ represents the parameter, which will forever remain unknown. The conventional approach to this problem (called the likelihood approach) will use the likelihood as the primary object of interest. Estimates of $\theta$ will be based on the likelihood, and the method of maximum likelihood has paramount importance. The parameter $\theta$ is considered nonrandom.

The Bayesian, however, believes that previous knowledge about $\theta$ (or even prejudices about $\theta$) can be incorporated into a prior distribution. Imagine that $\pi(\theta)$ denotes this prior distribution. Now think of $\theta$ as random with probability law $\pi(\theta)$.

One can now engage in a petty argument which distinguishes (i) situations in which there is a genuine random process (amenable to modeling) which creates $\theta$,


from (ii) situations in which $\theta$ is created exactly once (so that modeling is impossible and irrelevant). The Bayesian has no need to make this distinction. It is the state of his mental opinion about $\theta$ which is subjected to modeling.

It follows that the joint density of the random pair $(\theta, X)$ is given by $\pi(\theta)\, f(x \mid \theta)$. Integration over $\theta$ will give the marginal law of $X$. Let's denote this as $m(x)$.

$$m(x) = \int f(x \mid \theta)\, \pi(\theta)\, d\theta$$

We can now create the conditional law of $\theta$, given the data $x$. Specifically, this is

$$f(\theta \mid x) \;=\; \frac{f(x \mid \theta)\, \pi(\theta)}{\displaystyle\int f(x \mid t)\, \pi(t)\, dt} \;=\; \frac{f(x \mid \theta)\, \pi(\theta)}{m(x)}$$

In the denominator, $t$ rather than $\theta$ has been used as the dummy of integration, just to avoid confusion. More simply, we could note that

$$f(\theta \mid x) \;=\; \frac{f(x \mid \theta)\, \pi(\theta)}{\text{factor without } \theta}$$

This means that the denominator is just a fudge factor involving $x$ and some numbers (but not $\theta$), so that we can supply the denominator in whatever way is needed to make $f(\theta \mid x)$ a legitimate density. (A later example will make this point clear.)
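A minimal numerical sketch of this "supply the denominator" idea, on a grid. The setup here (a uniform prior on (0, 1) and a binomial likelihood with n = 10, x = 7) is my own choice for illustration, not the text's:

```python
import math

# Grid sketch: unnormalized posterior f(x | theta) * pi(theta), then
# recover the denominator m(x) by forcing the result to integrate to 1.
n, x = 10, 7
step = 1 / 1000
thetas = [(i + 0.5) * step for i in range(1000)]  # midpoint grid on (0, 1)
prior = [1.0 for _ in thetas]                     # uniform prior density

# Unnormalized posterior: binomial likelihood times prior
unnorm = [math.comb(n, x) * t**x * (1 - t)**(n - x) * p
          for t, p in zip(thetas, prior)]

# m(x) never involves theta, so a Riemann sum of the numerator recovers it.
m_x = sum(unnorm) * step
posterior = [u / m_x for u in unnorm]
assert abs(sum(posterior) * step - 1.0) < 1e-9  # integrates to ~1
```

For this particular setup the exact marginal is m(x) = 1/11, and the grid value lands very close to it; the point is only that the denominator can be supplied after the fact.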

The Bayesian calls $f(\theta \mid x)$ the posterior density. If another experiment is to be done, then the posterior becomes the prior for this next experiment.

The Bayesian regards $f(\theta \mid x)$ as the best summary of the experiment.

If you wanted an estimate of $\theta$, the Bayesian would supply you with a measure of location from $f(\theta \mid x)$, possibly the mean or median. If you have gone through the formality of creating a loss function, the Bayesian would minimize the expected posterior loss.
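For instance, under squared-error loss the minimizer of expected posterior loss is the posterior mean. A small numerical check, using an assumed Beta(8, 4) posterior (my example, not the text's), whose mean is $\alpha/(\alpha+\beta)$:

```python
import math

# Assumed example posterior: Beta(a, b) with a = 8, b = 4.
a, b = 8, 4
posterior_mean = a / (a + b)  # Bayes estimate under squared-error loss

def beta_pdf(t):
    # Beta(a, b) density on (0, 1)
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * t**(a - 1) * (1 - t)**(b - 1)

# Check on a midpoint grid that the mean of the density matches a/(a+b).
step = 1e-4
grid_mean = sum(t * beta_pdf(t) * step
                for t in ((i + 0.5) * step for i in range(10000)))
assert abs(grid_mean - posterior_mean) < 1e-3
```

The median would be reported instead under absolute-error loss; the choice of summary follows the loss function.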

If you wanted a 95% confidence interval for $\theta$, the Bayesian would give you an interval $(a, b)$ with the property that

$$\int_a^b f(\theta \mid x)\, d\theta = 0.95$$

presumably choosing this interval so that $b - a$ is as short as possible.
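The shortest such interval is the highest-posterior-density interval; as a simpler stand-in, the sketch below computes the equal-tailed 95% interval (2.5% of posterior mass cut from each side) from a grid, again using an assumed Beta(8, 4) posterior as the example:

```python
import math

def beta_pdf(t, a, b):
    # Beta(a, b) density on (0, 1)
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * t**(a - 1) * (1 - t)**(b - 1)

# Assumed example posterior: Beta(8, 4), tabulated on a midpoint grid.
step = 1e-4
thetas = [(i + 0.5) * step for i in range(10000)]
dens = [beta_pdf(t, 8, 4) for t in thetas]

# Equal-tailed 95% interval: walk the cumulative mass past 2.5% and 97.5%.
cum, lo, hi = 0.0, None, None
for t, d in zip(thetas, dens):
    cum += d * step
    if lo is None and cum >= 0.025:
        lo = t
    if hi is None and cum >= 0.975:
        hi = t
print(round(lo, 3), round(hi, 3))
```

A shortest (highest-density) interval would be found instead by thresholding the density from above; for a skewed posterior the two intervals differ slightly.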


Consider now this simple example. Suppose that $\theta$ is a binomial parameter. Suppose that you want to estimate $\theta$ based on a random variable $Y$, which is binomial $(n, \theta)$.

The maximum likelihood person uses $\hat{\theta} = \dfrac{Y}{n}$ with no further mental anguish.

The Bayesian will invoke a prior distribution $\pi(\theta)$ for $\theta$. For this problem, this will likely be an instance of the beta distribution. Specifically, he might choose

$$\pi(\theta) \;=\; \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}\, \mathrm{I}(0 < \theta < 1)$$
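To make this prior concrete, here is a small check that the density above integrates to 1 on a grid; the choice $\alpha = 2$, $\beta = 3$ is mine, for illustration only:

```python
import math

def beta_prior(theta, a, b):
    # pi(theta) = Gamma(a+b)/(Gamma(a)Gamma(b)) * theta^(a-1)(1-theta)^(b-1)
    # on (0, 1), and zero elsewhere (the indicator I(0 < theta < 1)).
    if not (0 < theta < 1):
        return 0.0
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * theta**(a - 1) * (1 - theta)**(b - 1)

# Sanity check: the density integrates to (approximately) 1 on a midpoint grid.
step = 1e-4
total = sum(beta_prior((i + 0.5) * step, 2.0, 3.0) for i in range(10000)) * step
assert abs(total - 1.0) < 1e-3
```

A standard fact worth keeping in mind here (not yet stated in the text above): the beta prior is conjugate for the binomial likelihood, so observing $y$ successes in $n$ trials yields a Beta$(\alpha + y,\ \beta + n - y)$ posterior.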