
Page 1: Statistical Methods  Bayesian methods 2

Statistical Methods Bayesian methods 2

Daniel Thorburn

Stockholm University

2012-03-29

Page 2: Statistical Methods  Bayesian methods 2


Outline

6. Probability assessment; exercise
7. Conjugate distributions
8. Vague and other priors
9. Inference - Point estimates, decisions and intervals

Page 3: Statistical Methods  Bayesian methods 2

6. Probability assessment

Page 4: Statistical Methods  Bayesian methods 2

Exercise in probability assessment

• Probability assessment is difficult and training is needed.

• The results from last week's handout were not impressive.

• One observation is that many of you seem to use too small or too large probabilities. Never use 0 or 1 unless you are absolutely certain.

Page 5: Statistical Methods  Bayesian methods 2

Results of your assessments

[Figure: probability of the outcomes under your assessments, assuming independence (the upper dot is total ignorance); y-axis 0–1.6, x-axis 0–0.08.]

Here the likelihood is used. 6 out of 13 performed worse than saying 50% on everything. Jiayun Jin was best.

This approach is sensitive to assigning probability 0 to events that occurred (or probability 1 to events that did not).

As a curiosity for those in favour of likelihood inference: Jin is the "ML estimate" of the best probability assessor. An interval based on the deviance (−2 log-likelihood − 3.84) says that all with probabilities below 0.015 can be rejected at the 95% level. According to this test, none of you is significantly better than chance.

Page 6: Statistical Methods  Bayesian methods 2

Results of your assessments

Another picture of the fit of your assessments, where the measure is mean square error. It is more forgiving to those who assess extreme probabilities.

Still Jiayun Jin is best, and only 5 out of 13 assessed probabilities worse than total ignorance.

[Figure: total squared error of your assessments (the upper dot is total ignorance); y-axis 0–1.6, x-axis 0–4.]
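As an aside, here is a minimal sketch of how these two scores could be computed. The assessor names, outcomes and probabilities below are invented for illustration, and independence between the events is assumed, as on the slides.

```python
# Two scoring rules for probability assessments: the likelihood (log) score
# and the total squared error (Brier score). All data here are hypothetical.
import numpy as np

outcomes = np.array([1, 0, 1, 1, 0])          # what actually happened (invented)
assessments = {
    "assessor_A": np.array([0.9, 0.2, 0.7, 0.6, 0.4]),
    "ignorance":  np.array([0.5, 0.5, 0.5, 0.5, 0.5]),   # 50% on everything
}

for name, p in assessments.items():
    # Likelihood of the observed outcomes, assuming independent events.
    # A single probability 0 on an event that occurred ruins this score.
    likelihood = np.prod(np.where(outcomes == 1, p, 1 - p))
    # The Brier score is more forgiving to occasional extreme probabilities.
    brier = np.sum((p - outcomes) ** 2)
    print(f"{name}: likelihood={likelihood:.4f}, total squared error={brier:.3f}")
```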

Page 7: Statistical Methods  Bayesian methods 2

7. Conjugate Distributions

Page 8: Statistical Methods  Bayesian methods 2

Recall Bayes theorem

• What you know afterwards, $f_{\theta|X=x}(\theta)$, is proportional to
• what you knew before, $f_\theta(\theta)$,
• times the information in the data, $f_X(x|\theta) = L(x,\theta)$;
• or: $\ln f_{\theta|X=x}(\theta) = \ln f_\theta(\theta) + \ln L(x,\theta) + \text{const.}$

• Compare the likelihood principle, which says that all the information in an experiment is contained in the likelihood.

Page 9: Statistical Methods  Bayesian methods 2

Conjugate distributions

• In some cases there exist simple priors for which the calculations are very easy to perform.

• The posteriors corresponding to those priors belong to the same family of distributions, except for a change in the parameters reflecting the new information.

• This was more important previously; nowadays, with good computer packages, one can also use more complicated prior distributions.

Page 10: Statistical Methods  Bayesian methods 2

Binomial

• Binomial distribution

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

• n and k are sufficient. View this as a distribution in p, where $\alpha = k+1$ and $\beta = n-k+1$ are the parameters:

$$f_P(p) = C\, p^{\alpha-1} (1-p)^{\beta-1}, \quad \text{where } C = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$$

• This will work as a prior distribution, and the number of parameters will be two.

Page 11: Statistical Methods  Bayesian methods 2

Conjugate priors

• The Beta distribution is called the conjugate distribution to the binomial.

• The same prior will be conjugate to all Bernoulli-based distributions (sampling schemes), e.g. geometric or negative binomial.

• Conjugate distributions are simple to work with, since it will be possible to describe the posterior with only a few parameters.

• In (almost) all cases where there are finite-dimensional sufficient statistics, there will also exist conjugate distributions (e.g. for the exponential family).

• Remark: the posterior will be the same regardless of the sampling plan. It does not matter whether you decided to toss a coin n times and got k successes, or decided to toss the coin until you had observed k successes.

– (A Neyman–Pearson person would make the inference differently. An unbiased estimate is k/n in the first case and (k−1)/(n−1) in the second.)

– In Bayesian analysis your result depends only on what you observed and not on what you could have observed but did not.
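A minimal sketch of the Beta-binomial update described on this slide; the prior parameters and data are illustrative, and scipy's beta distribution is used for the posterior summaries.

```python
# Beta-binomial conjugate update. The prior Beta(2, 2) is an assumption
# made for illustration only.
from scipy import stats

alpha0, beta0 = 2.0, 2.0        # Beta prior
n, k = 20, 7                    # n tosses, k successes

# Posterior: Beta(alpha0 + k, beta0 + n - k). It is the same whether the
# plan was "toss n times" (binomial) or "toss until k successes" (negative
# binomial): the two likelihoods differ only by a constant factor in p.
posterior = stats.beta(alpha0 + k, beta0 + n - k)
print("posterior mean:", posterior.mean())
print("95% probability interval:", posterior.interval(0.95))
```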

Page 12: Statistical Methods  Bayesian methods 2

Poisson

• Poisson distribution

$$P(X = x) = \frac{\lambda^x}{x!} \exp(-\lambda); \qquad \text{likelihood after } n \text{ obs.:} \quad \frac{\lambda^{\sum x_i}}{\prod x_i!} \exp(-n\lambda)$$

• Sufficient statistics are n and $\sum x_i$. Normalising and putting $p = \sum x_i + 1$ and $b = n$ gives the Gamma distribution:

$$f(x) = \frac{b^p x^{p-1} \exp(-bx)}{\Gamma(p)}$$

• The conjugate distribution of the Poisson is thus the Gamma distribution.

Page 13: Statistical Methods  Bayesian methods 2

Example

• You are studying a homogeneous portfolio of car insurances. You want to study the risk per 1 000 km of having an accident.
• Your model is that the number of claims per car $y_i$ is Po(λ·$x_i$), where $x_i$ is the distance driven by the car (in Mm, i.e. thousands of km).
• Your prior on λ is based on the experience from previous years of this and other car brands. It is G(50, 5 000): the prior mean is p/b = 50/5 000 = 0.01 and the relative standard deviation is $p^{-1/2} = 50^{-1/2} \approx 0.14$.
• Your posterior after observing the number of accidents $\sum y_i$ for all cars is G(50 + $\sum y_i$, 5 000 + $\sum x_i$).
• This year the company has had 260 cars of this brand insured, with 54 claims and a total driven distance of 4 800 Mm. The posterior is then G(104, 9 800).
• The posterior mean for λ is 104/9 800 ≈ 0.0106 and the relative s.d. is $104^{-1/2} \approx 0.098$.
• A 95% credibility interval is thus 0.0106 ± 1.96·0.0106·0.098 ≈ 0.0106 ± 0.0020. (A Gamma with p = 104 is approximately normal.)
• Credibility interval is the term used for this type of interval in actuarial science.
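A sketch reproducing these numbers with scipy. The slide's G(p, b) uses a rate parametrisation, which corresponds to scipy's shape a = p and scale = 1/b.

```python
# Poisson-Gamma conjugate update for the insurance example.
from scipy import stats

p0, b0 = 50, 5000                     # prior G(50, 5000)
claims, distance = 54, 4800          # this year: sum(y_i) claims, sum(x_i) Mm

p1, b1 = p0 + claims, b0 + distance  # posterior G(104, 9800)
post = stats.gamma(a=p1, scale=1.0 / b1)

print("posterior mean:", post.mean())               # 104/9800 = 0.0106
print("relative s.d.:", post.std() / post.mean())   # 1/sqrt(104) = 0.098
m, s = post.mean(), post.std()
print("normal approx. 95%:", (m - 1.96 * s, m + 1.96 * s))
print("exact 95% interval:", post.interval(0.95))   # nearly identical here
```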

Page 14: Statistical Methods  Bayesian methods 2

Gamma (2)

• Density

$$f(x) = \frac{b^p x^{p-1} \exp(-bx)}{\Gamma(p)} \qquad \left(\text{with } b = \tfrac12 \text{ and } p = \text{d.f.}/2 \text{ this is the } \chi^2 \text{ density}\right)$$

• This is, apart from normalising constants, symmetric in x and b. I.e. the conjugate distribution for b is Gamma with parameters p + 1 and x.

• I.e. $\sigma^{-2}$ is gamma distributed. The conjugate distribution for $\sigma^2$ is said to follow an inverse Gamma distribution.

• C.f. $ns^2/\sigma^2 \sim \chi^2(n-1)$ given $\sigma^2$ and, by the symmetry above, also given $s^2$.

Page 15: Statistical Methods  Bayesian methods 2

Normal with unknown variance

• Combine the above: the conjugate prior/posterior of the variance is inverse gamma, and the prior for the mean given the variance is normal.

• This also means that the mean is unconditionally t-distributed.

Page 16: Statistical Methods  Bayesian methods 2

Some conjugate distributions

Binomial(n, p): p is Beta(α + k, β + n − k)

Neg Binomial, Geometric: p is Beta(α + k, β + X − k)

Poisson(λ): λ is Gamma(p + Σxᵢ, b + n)

Exponential(λ): λ is Gamma(p + n, b + Σxᵢ)

Normal(m, σ²), σ² known: m is Normal((τm₀ + n·x̄)/(τ + n), σ²/(τ + n))

Normal(0, σ²): σ² is Inverse Gamma(p + n/2, b + Σxᵢ²/2)

Normal(m, σ²): σ² is Inverse Gamma; m given σ² is normal; unconditionally m is t

Uniform(0, b): b is Pareto(max(a, x₁, …, xₙ), p + n)
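As a check, a small sketch of the Normal-with-known-variance row, under the parametrisation assumed in the table (prior m ~ N(m₀, σ²/τ), so τ acts as a "prior sample size"); the numbers are illustrative.

```python
# Normal conjugate update with known data variance sigma^2.
import numpy as np

sigma = 2.0                       # known data s.d.
m0, tau = 10.0, 4.0               # prior mean; tau = "prior sample size"
x = np.array([12.1, 11.4, 13.0, 12.6])
n, xbar = len(x), x.mean()

# Posterior: N((tau*m0 + n*xbar)/(tau + n), sigma^2/(tau + n)),
# a precision-weighted average of prior mean and sample mean.
post_mean = (tau * m0 + n * xbar) / (tau + n)
post_sd = sigma / np.sqrt(tau + n)
print("posterior for m: N(%.3f, %.3f^2)" % (post_mean, post_sd))
```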

Page 17: Statistical Methods  Bayesian methods 2

8. Vague and other priors

Page 18: Statistical Methods  Bayesian methods 2

Vague – uninformative – priors

• If you do not want your prior to affect the result, you may use uninformative priors.

• As you saw above, the number of observations corresponds to one of the parameters of the prior. If we decrease that parameter as far as possible, we get what are called uninformative prior distributions. These are not always true distributions but can be handled as limits. (When they are not, they are sometimes called "improper".)

• N(m, ∞) = Uniform(−∞, ∞): f(x) ∝ 1
• Gamma(0, 0): f(x) ∝ 1/x, i.e. ln(x) is U(−∞, ∞)
• Beta(0, 0): f(x) ∝ 1/(x(1−x)), i.e. logit(x) is U(−∞, ∞)

Page 19: Statistical Methods  Bayesian methods 2

Statistical reporting

• Priors are your own and personal.
• Readers of scientific articles may have other opinions.
• When you report the result of an experiment, report the data so that all readers can plug it in together with their own opinions, i.e. report the likelihood function.
• This is sometimes technical, and it may be easier to report a posterior given an uninformative prior, which is often easier to understand.
• The ML estimate is the mode of this posterior; if you believe that modes are a good way to describe a distribution, use it. Otherwise the mean or median is often better.
• The observed Fisher information is one way of describing the spread (minus the second derivative of the log-likelihood at the mode), but you may also use the standard deviation of the posterior.
• If the posterior (likelihood) is approximately normal, everything is equivalent.
• Other "reference priors" than an uninformative one are sometimes used.

Page 20: Statistical Methods  Bayesian methods 2

Other priors

• Using a computer it is nowadays often quite easy to handle more complicated priors

• It is often sensible to safeguard against misspecifying the prior, e.g. by using a mixture of two priors.

• Previous experience says one thing, e.g. Beta(α, β), but if this case is unique, prior experience is useless and a vague prior may be a good choice.

• Use a mixed distribution, e.g. Beta(α, β) with probability 0.99 and Beta(0, 0) with probability 0.01 (see Excel). A sketch of this update follows below.
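A sketch of how such a mixture update could look for binomial data. Since Beta(0, 0) is improper, this sketch substitutes the proper but still vague Jeffreys prior Beta(0.5, 0.5); the informative component Beta(30, 70), the weights, and the data are all invented for illustration.

```python
# Mixture-prior update for binomial data: the posterior mixture weights are
# proportional to (prior weight) x (marginal likelihood of the data).
import numpy as np
from scipy.special import betaln

n, k = 20, 17                              # data: k successes in n trials
components = [(30.0, 70.0, 0.99),          # informative Beta(30, 70), weight 0.99
              (0.5, 0.5, 0.01)]            # vague Beta(0.5, 0.5), weight 0.01

# Marginal likelihood under a Beta(a, b) prior is B(a + k, b + n - k) / B(a, b),
# up to the binomial coefficient, which is common to both components.
log_post = np.array([np.log(w) + betaln(a + k, b + n - k) - betaln(a, b)
                     for a, b, w in components])
weights = np.exp(log_post - np.logaddexp.reduce(log_post))
print("posterior component weights:", weights)
# Data far from the informative prior shift nearly all weight to the vague
# component; the posterior is then a Beta mixture with these weights.
```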

Page 21: Statistical Methods  Bayesian methods 2

• You may see from the Excel sheet that the posterior for large values of n is close to normal.

• In fact, the posterior distribution tends to normal under very weak assumptions (if the density for θ is twice continuously differentiable on a set of probability 1), with variance equal to the inverse of the observed Fisher information.

• The posterior converges to the same distribution regardless of the prior information. When the data dominate, everyone agrees on the posterior.
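A small sketch illustrating this convergence, comparing a Beta posterior with its normal approximation (centred at the posterior mode, variance the inverse observed information); the uniform prior and the fixed observed proportion are assumptions made for illustration.

```python
# Posterior asymptotic normality: the distance between a Beta posterior and
# its normal approximation shrinks as n grows.
import numpy as np
from scipy import stats

for n in (10, 100, 1000):
    k = int(0.3 * n)                       # keep the observed proportion fixed
    post = stats.beta(1 + k, 1 + n - k)    # posterior under a uniform prior
    mode = k / n                           # posterior mode = ML estimate
    sd = np.sqrt(mode * (1 - mode) / n)    # 1/sqrt(observed information)
    grid = np.linspace(0.001, 0.999, 999)
    # Maximum CDF difference between posterior and normal approximation
    diff = np.abs(post.cdf(grid) - stats.norm(mode, sd).cdf(grid)).max()
    print(f"n={n}: max CDF difference {diff:.4f}")
```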

Page 22: Statistical Methods  Bayesian methods 2

9. Inference - Point estimates, decisions and intervals

Page 23: Statistical Methods  Bayesian methods 2

Estimates

• Suppose we have a posterior distribution f(θ).
• If we may only give one value, what should we do?
• Use what you know from descriptive statistics: how to describe a distribution of values with only one value.
– Mean: usually the best choice. The posterior mean has the smallest mean square error.
– Median: smallest mean absolute error.
– Mode: the most typical value (the ML estimate corresponds to the mode under an uninformative prior); smallest 0–1 loss (with an error of at most ε, where ε is small). Least common in descriptive statistics.

Page 24: Statistical Methods  Bayesian methods 2

Decision theory approach

Minimise the expected loss:

$$\min_d \int L(d, \theta) f(\theta)\, d\theta$$

Quadratic loss: minimise $\int (d - \theta)^2 f(\theta)\, d\theta$. The derivative is $2 \int (d - \theta) f(\theta)\, d\theta$; setting it to zero gives

$$d = \int \theta f(\theta)\, d\theta = E(\theta), \text{ the posterior mean.}$$

Absolute loss: minimise $\int |d - \theta| f(\theta)\, d\theta = \int_{\theta < d} (d - \theta) f(\theta)\, d\theta + \int_{\theta > d} (\theta - d) f(\theta)\, d\theta$. The derivative is $\int_{\theta < d} f(\theta)\, d\theta - \int_{\theta > d} f(\theta)\, d\theta$; setting it to zero gives

$$\int_{-\infty}^{d} f(\theta)\, d\theta = \tfrac12, \text{ i.e. } d \text{ is the posterior median.}$$
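A numerical sanity check of this derivation, using an arbitrary skewed Gamma "posterior" and Monte Carlo minimisation; everything here is illustrative.

```python
# For a skewed posterior, the mean minimises expected squared loss and the
# median minimises expected absolute loss.
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

post = stats.gamma(a=3.0, scale=1.0)             # a skewed "posterior"
theta = post.rvs(size=200_000, random_state=1)   # Monte Carlo sample

sq = minimize_scalar(lambda d: np.mean((d - theta) ** 2),
                     bounds=(0, 20), method="bounded")
ab = minimize_scalar(lambda d: np.mean(np.abs(d - theta)),
                     bounds=(0, 20), method="bounded")
print("argmin squared loss:", sq.x, " posterior mean:", post.mean())
print("argmin absolute loss:", ab.x, " posterior median:", post.median())
```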

Page 25: Statistical Methods  Bayesian methods 2

Decision problem

• The demand Q for a product is unknown but modelled by the distribution N(100, 50²).
• If a shop orders d units, the net profit will be 10·min(Q, d) − 5·(d − min(Q, d)).
• How much should be ordered?
• The expected profit is

$$\int_{-\infty}^{d} \big(10q - 5(d - q)\big) f(q)\, dq + \int_{d}^{\infty} 10\, d\, f(q)\, dq$$

• The derivative with respect to d is (the boundary terms 10·d·f(d) cancel)

$$10\big(1 - F(d)\big) - 5 F(d) = 10 - 15 F(d)$$

• Thus he should order the amount corresponding to the 66 2/3 percentile, i.e. 121.5 units in this case.
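A two-line sketch of the final step, the critical fractile F(d) = 10/15 = 2/3:

```python
# Optimal order quantity: the 66 2/3 percentile of the demand distribution.
from scipy import stats

demand = stats.norm(100, 50)               # Q ~ N(100, 50^2)
d_opt = demand.ppf(10 / 15)                # critical fractile 2/3
print("optimal order:", round(d_opt, 1))   # about 121.5 units
```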

Page 26: Statistical Methods  Bayesian methods 2

• The expected profit is easily found to be 727.4 (by doing the integral above).
• The expected profit if he knew the demand would be 10·100 = 1000.
• The expected value of perfect information (EVPI) is 1000 − 727.4 = 272.6.

• If he could pay 100 for a market research study which gives an estimate of Q with a standard deviation of 25, should he do so?
• Combining it with what he already knows, his posterior would have the variance 1/(1/2500 + 1/625) = 500 = 22.36².
• Since, for a normal distribution, the expected loss relative to perfect information is proportional to the standard deviation, his expected profit will thus be 1000 − 100 − (22.36/50)·272.6 = 778.1.
• He should thus order the market research, and his expected profit will increase by 50.7.
• (Check whether you can fill in the details of this example. Could you have solved it using classical methods?)
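A sketch reproducing these numbers. It uses E[min(Q, d)] = d − E[(d − Q)⁺] for a normal Q, and the EVPI rescaling relies on the expected loss being proportional to the posterior standard deviation; rounding differs slightly from the slide.

```python
# Value-of-information calculation for the order-quantity problem.
import numpy as np
from scipy import stats

mu, sigma = 100.0, 50.0

def expected_profit(d, sd):
    z = (d - mu) / sd
    overstock = (d - mu) * stats.norm.cdf(z) + sd * stats.norm.pdf(z)  # E[(d-Q)^+]
    return 15 * (d - overstock) - 5 * d   # = 10*min(Q,d) - 5*(d - min(Q,d))

d_opt = mu + sigma * stats.norm.ppf(2 / 3)
best = expected_profit(d_opt, sigma)
print("expected profit:", round(best, 1))      # about 727, cf. 727.4 above
evpi = 10 * mu - best
print("EVPI:", round(evpi, 1))                 # about 273, cf. 272.6 above

post_sd = np.sqrt(1 / (1 / sigma**2 + 1 / 25**2))   # = 22.36 after the survey
print("profit with survey:",
      round(10 * mu - 100 - (post_sd / sigma) * evpi, 1))  # about 778
```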

Page 27: Statistical Methods  Bayesian methods 2

• Let us now instead suppose that the size of the market research is not fixed. A study of size n gives a standard deviation of 200/√n. The cost of such a market study is 36 + n. (The previous study corresponds to n = 64.)

• Doing the same calculations for different sizes and maximising profit gives the picture on the next page. He should settle for a study of size n = 51, giving a total expected profit of almost 780.

• This illustrates the importance of statistics having an interface with decision theory. What would you have done if the manager had come to you with this question and you had been confined to classical statistics?
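A sketch of the search over study sizes, reusing the EVPI rescaling from the previous page; the grid search and implementation details are our own.

```python
# Optimal study size: a study of size n has s.d. 200/sqrt(n) and costs 36 + n.
import numpy as np

evpi, sigma = 272.6, 50.0

def total_profit(n):
    post_sd = np.sqrt(1 / (1 / sigma**2 + n / 200.0**2))  # posterior s.d.
    return 1000 - (36 + n) - (post_sd / sigma) * evpi

n_grid = np.arange(1, 200)
profits = np.array([total_profit(n) for n in n_grid])
print("best n:", n_grid[profits.argmax()])              # 51
print("max expected profit:", round(profits.max(), 1))  # almost 780
```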

Page 28: Statistical Methods  Bayesian methods 2

[Figure: total expected profit as a function of study size n; y-axis 760–782, x-axis 0–90. The maximum, almost 780, is at n = 51.]

Page 29: Statistical Methods  Bayesian methods 2

Confidence intervals

• An interval constructed in this way will in the long run cover the true value in 1 − α of all cases, if the procedure is repeated many, many times.

• Like a person throwing rings around a peg: if he is skilful, he will get the ring around the peg in 95% of all cases.

• Probability intervals: the true value lies with probability 1 − α in the interval, in this particular case (given what is known).

• Synonyms (roughly): credibility intervals, prediction intervals.

Page 30: Statistical Methods  Bayesian methods 2

Probability intervals

• An interval (a, b) such that

$$1 - \alpha = \int_a^b f(\theta | X)\, d\theta$$

• HPD interval: the shortest possible such interval, characterised by

$$f(a) = f(b), \qquad f(x) \ge f(a) \text{ for } a \le x \le b, \quad f(x) \le f(a) \text{ for } x \le a \text{ or } x \ge b$$
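A sketch of how an HPD interval could be computed numerically for a unimodal posterior, by minimising the interval width over the lower tail probability; the Gamma posterior is illustrative.

```python
# HPD interval: among all intervals with content 1 - alpha, the shortest one;
# for a unimodal density its endpoints have equal density.
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

post = stats.gamma(a=3.0, scale=1.0)   # an illustrative skewed posterior
alpha = 0.05

def width(lower_tail):
    # Interval with F(a) = lower_tail and F(b) = lower_tail + 1 - alpha.
    a = post.ppf(lower_tail)
    b = post.ppf(lower_tail + 1 - alpha)
    return b - a

res = minimize_scalar(width, bounds=(1e-9, alpha - 1e-9), method="bounded")
a, b = post.ppf(res.x), post.ppf(res.x + 1 - alpha)
print("HPD interval: (%.3f, %.3f)" % (a, b))
print("densities at endpoints:", post.pdf(a), post.pdf(b))  # roughly equal
```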