Statistical Methods for Data Analysis the Bayesian approach Luca Lista INFN Napoli

Post on 15-Jan-2016


Page 1: Statistical Methods for Data Analysis the Bayesian approach Luca Lista INFN Napoli

Statistical Methods for Data Analysis

the Bayesian approach

Luca Lista

INFN Napoli


Contents

• Bayes theorem
• Bayesian probability
• Bayesian inference


Conditional probability

• Probability that the event A occurs given that B also occurs: P(A | B) = P(A ∩ B) / P(B)

[Figure: Venn diagram of the events A and B]


Bayes theorem

• P(A | B) = P(B | A) P(A) / P(B)
• P(A) = prior probability
• P(A | B) = posterior probability

Thomas Bayes (1702-1761)


Pictorial view of Bayes theorem (I)

[Figure: A and B drawn as overlapping regions; P(A), P(B), P(A|B), and P(B|A) are each shown as a ratio of areas]

From a drawing by B.Cousins


Pictorial view of Bayes theorem (II)

P(B|A) P(A) = P(A ∩ B)

P(A|B) P(B) = P(A ∩ B)

Hence P(B|A) P(A) = P(A|B) P(B).

[Figure: both products shown as the same ratio of areas]


A concrete example

• A person receives a positive diagnosis for a serious illness (say H1N1, or worse…)

• The probability that the test is positive for an ill person is ~100%

• The probability that the test is positive for a healthy person is 0.2%

• What is the probability that the person is really ill? Is 99.8% a reasonable answer?

G. Cowan, Statistical Data Analysis, 1998; G. D'Agostini, CERN Academic Training, 2005


Conditional probability

• Probability to be really ill = conditional probability given the positive diagnosis
– P(+ | ill) ≈ 100%, P(− | ill) << 1
– P(+ | healthy) = 0.2%, P(− | healthy) = 99.8%

• Using Bayes theorem:
– P(ill | +) = P(+ | ill) P(ill) / P(+) ≈ P(ill) / P(+)

• We need to know:
– P(ill) = probability that a random person is ill (<< P(healthy))

• And we have:
– Using: P(ill) + P(healthy) = 1 and P(ill and healthy) = 0
– P(+) = P(+ | ill) P(ill) + P(+ | healthy) P(healthy) ≈ P(ill) + P(+ | healthy)


Pictorial view

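[Figure: the population split into P(ill) and P(healthy), summing to 1; within each group the outcomes P(+|ill) ≈ 1, P(+|healthy), and P(−|healthy); the positive results split into P(ill|+) and P(healthy|+)]

P(ill|+) + P(healthy|+) = 1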


Adding some numbers

• Probability of being really ill:
– P(ill | +) ≈ P(ill) / P(+) ≈ P(ill) / (P(ill) + P(+ | healthy))

• If:
– P(ill) = 0.17%, P(+ | healthy) = 0.2%

• We have:
– P(ill | +) ≈ 17 / (17 + 20) = 46%
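The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name `posterior_ill` is hypothetical, and P(+ | ill) is set to exactly 1, which the slides only state as approximately 100%.

```python
def posterior_ill(p_ill, p_pos_given_ill, p_pos_given_healthy):
    """Return P(ill | +) from Bayes theorem with two exclusive hypotheses."""
    p_healthy = 1.0 - p_ill
    # Total probability of a positive test: P(+) = P(+|ill)P(ill) + P(+|healthy)P(healthy)
    p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * p_healthy
    return p_pos_given_ill * p_ill / p_pos


# Numbers from the slide; P(+ | ill) = 1 is an assumption standing in for "~100%".
exact = posterior_ill(p_ill=0.0017, p_pos_given_ill=1.0, p_pos_given_healthy=0.002)

# The slide's approximation, which drops the factor P(healthy) ≈ 1.
approx = 0.0017 / (0.0017 + 0.002)

print(f"exact: {exact:.3f}, approximate: {approx:.3f}")
```

Both numbers round to about 46%, far from the naive 99.8%, because the prior P(ill) is comparable to the false-positive rate.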


A physics example

• A muon selection has:
– Efficiency for the signal: ε = P(sel | μ)
– Efficiency for the background: δ = P(sel | π)

• Given a collection of particles, what fraction of the selected particles are muons?

• Can't answer, unless you know the fraction of muons: P(μ) (and P(π) = 1 − P(μ))!

• So: P(μ | sel) = ε P(μ) / P(sel)

• Or: P(μ | sel) = ε P(μ) / (ε P(μ) + δ P(π))


Prob. ratios and prob. inversion

• Another convenient way to re-state the Bayes posterior is through ratios:

• No need to consider all possible hypotheses (not known in all cases)

• Clear how the ratio of priors plays a role
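The formula on this slide did not survive extraction. A sketch of the ratio form, in assumed notation (H_1 and H_2 for two hypotheses, x for the observation), follows from writing Bayes theorem for each hypothesis and dividing, so that the normalization P(x) cancels:

```latex
\frac{P(H_1 \mid x)}{P(H_2 \mid x)}
  = \frac{P(x \mid H_1)}{P(x \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)}
```

The first factor on the right is the likelihood ratio and the second is the ratio of priors; no sum over all possible hypotheses is needed.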


Bayesian probability as learning

• Before the observation B, our degree of belief in A is P(A) (prior probability)

• After observing B, our degree of belief changes into P(A | B) (posterior probability)

• Probability can also be expressed as a property of non-random variables
– E.g.: unknown parameters, unknown events

• Easy approach to extend knowledge with subsequent observations
– E.g. combining experiments = multiplying probabilities

• Easy to cope with numerical problems

• Consider P(B) as a normalization factor: P(B) = Σi P(B | Ai) P(Ai), if the Ai are exhaustive (∪i Ai = Ω) and mutually exclusive (Ai ∩ Aj = ∅ for all i ≠ j)


Bayes and likelihood function

• Likelihood function definition: a PDF of the variables x1, …, xn: $L(x_1, \dots, x_n; \theta_1, \dots, \theta_m)$

• Bayesian posterior probability for θ1, …, θm:

$$P(\theta_1, \dots, \theta_m \mid x_1, \dots, x_n) = \frac{L(x_1, \dots, x_n; \theta_1, \dots, \theta_m)\, P(\theta_1, \dots, \theta_m)}{\int L(x_1, \dots, x_n; \theta_1, \dots, \theta_m)\, P(\theta_1, \dots, \theta_m)\, \mathrm{d}^m\theta}$$

• Where:
– P(θ1, …, θm) is the prior probability.
  • Often assumed to be flat in HEP papers, but there is no motivation for this choice (and a flat distribution depends on the parameterization!)
– ∫ L(…) P(…) d^mθ is a normalization factor

• Interpretation:
– The observation modifies the prior knowledge of the unknown parameters as if L were a probability distribution function for θ1, …, θm
– F. James et al.: “The difference between P(θ) and P(θ | x) shows how one’s knowledge (degree of belief) about θ has been modified by the observation x. The distribution P(θ | x) summarizes all one’s knowledge of θ and can be used accordingly.”


Repeated use of Bayes theorem

• Bayes theorem can be applied sequentially for repeated observations (posterior = learning from experiments)

Prior → observation 1 → conditioned posterior 1 → observation 2 → conditioned posterior 2 → observation 3 → conditioned posterior 3

P0 = Prior

P1 ∝ P0 × L1

P2 ∝ P1 × L2 ∝ P0 × L1 × L2

P3 ∝ P0 × L1 × L2 × L3

Note that applying Bayes theorem directly from the prior to (obs1 + obs2) leads to the same result (for independent observations, L1+2 = L1 × L2):

P1+2 ∝ P0 × L1+2 = P0 × L1 × L2 ∝ P2
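The equivalence between sequential and direct updating can be checked numerically. The sketch below is an illustration under assumptions that are not in the slides: an unknown efficiency on a discrete grid, a uniform prior, and two made-up binomial observations. All function and variable names are hypothetical.

```python
def normalize(weights):
    """Rescale a list of non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def update(prior, likelihood):
    """One Bayes step on a discrete grid: posterior ∝ prior × likelihood."""
    return normalize([p * l for p, l in zip(prior, likelihood)])


# Hypothetical setup: unknown efficiency eps on a grid, uniform prior P0.
grid = [i / 100 for i in range(1, 100)]
p0 = normalize([1.0] * len(grid))


def binomial_likelihood(k, n):
    # The binomial coefficient is constant in eps and cancels on normalization.
    return [eps**k * (1 - eps) ** (n - k) for eps in grid]


l1 = binomial_likelihood(k=7, n=10)   # observation 1 (made-up numbers)
l2 = binomial_likelihood(k=12, n=20)  # observation 2 (made-up numbers)

# Sequential: prior -> posterior 1 -> posterior 2.
p2_sequential = update(update(p0, l1), l2)

# Direct: prior -> posterior using the joint likelihood L1 × L2.
l12 = [a * b for a, b in zip(l1, l2)]
p2_direct = update(p0, l12)

max_diff = max(abs(a - b) for a, b in zip(p2_sequential, p2_direct))
print(f"largest difference between the two posteriors: {max_diff:.2e}")
```

The two posteriors agree up to floating-point rounding, which is the statement P1+2 ∝ P2 on the slide.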


Bayesian in decision theory

• You need to decide whether to take some action after you have computed your degree of belief
– E.g.: make a public announcement of a discovery or not

• What is the best decision?

• The answer also depends on the (subjective) cost of the two possible errors:
– Announce a wrong answer
– Don't announce a discovery (and be anticipated by a competitor!)

• The Bayesian approach fits well with decision theory, which requires two subjective inputs:
– Prior degree of belief
– Cost of outcomes


Falsifiability within statistics

• With Aristotle's or “Boolean” logic, if a cause A forbids the observation of the effect B, observing the effect B implies that A is false

• Naively migrating to random possible events (Bi) with different (uncertain!) hypotheses (Aj) would lead to:
– Observing an event Bi that has very low probability, given a cause Aj, implies that Aj is very unlikely

False! A small P(Bi | Aj) does not by itself make P(Aj | Bi) small: the posterior also depends on the prior P(Aj) and on how probable Bi is under the alternative hypotheses.


Detection of paranormal phenomena

• A person claims he has Extrasensory Perception (ESP)

• He can “predict” the outcome of card draws with a much higher success rate than random guessing

• What is the (Bayesian) probability he really has ESP?


Simpleton, ready to believe!

• If (prior) P(ESP) ≈ P(!ESP) ≈ 0.5
– P(ESP|predict) ≈ 1 (posterior)
– A single experiment demonstrates ESP!

[Figure: P(ESP) and P(!ESP) as comparable areas, with P(predict|ESP) ≈ 1 and P(predict|!ESP) << 1]


With a skeptical prior prejudice

• If (prior) P(ESP) << P(!ESP)
– P(ESP|predict) < 0.5 (at least uncertain!)
– More experiments? More hypotheses?

[Figure: P(ESP) as a much smaller area than P(!ESP), with P(predict|ESP) ≈ 1 and P(predict|!ESP) << 1]


Maybe he is cheating?

• How likely is cheating? Assume: P(ESP) << P(cheat)
– P(ESP|predict) ≈ 0 (cheating more likely!)
– The ESP guy should now propose alternative hypotheses!

[Figure: areas for P(ESP), P(cheat), and P(no ESP, not cheat), with P(predict|ESP) ≈ P(predict|cheat) ≈ 1 and P(predict|!ESP) << 1]
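The three ESP scenarios can be compared with a small calculation. This is only a sketch: every prior and likelihood value below is an illustrative assumption chosen to match the qualitative statements on the slides (≈ 1, << 1), not a number from the source, and the function name `posteriors` is hypothetical.

```python
def posteriors(priors, likelihoods):
    """Posterior P(H | data) for mutually exclusive, exhaustive hypotheses H."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(data), the normalization factor
    return {h: j / evidence for h, j in joint.items()}


# Assumed likelihoods of a successful prediction under each hypothesis.
likelihoods = {"ESP": 1.0, "cheat": 1.0, "neither": 1e-6}

# Simpleton: ESP and no-ESP equally likely a priori, cheating ignored.
credulous = posteriors({"ESP": 0.5, "cheat": 0.0, "neither": 0.5}, likelihoods)

# Skeptic: ESP very unlikely a priori, cheating still ignored.
skeptical = posteriors(
    {"ESP": 1e-9, "cheat": 0.0, "neither": 1 - 1e-9}, likelihoods
)

# Skeptic who also allows for cheating, with P(ESP) << P(cheat).
with_cheating = posteriors(
    {"ESP": 1e-9, "cheat": 1e-3, "neither": 1 - 1e-3 - 1e-9}, likelihoods
)

for name, post in [("credulous", credulous), ("skeptical", skeptical),
                   ("with cheating", with_cheating)]:
    print(name, {h: f"{p:.3g}" for h, p in post.items()})
```

With these assumed inputs the same observation yields P(ESP | predict) close to 1, well below 0.5, or close to 0, depending only on the prior and on which alternative hypotheses are considered.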


Ascertain physics observations

• Are these evidence for the pentaquark Θ+(1520) → K0 p?
• Influenced by previous evidence papers?
• Are there other possible interpretations?

10σ significance

arXiv:hep-ex/0509033v3


Pentaquarks

• From PDG 2006, “PENTAQUARK UPDATE” (G. Trilling, LBNL)

• “In 2003, the field of baryon spectroscopy was almost revolutionized by experimental evidence for the existence of baryon states constructed from five quarks ……To summarize, with the exception described in the previous paragraph, there has not been a high-statistics confirmation of any of the original experiments that claimed to see the Θ+; there have been two high-statistics repeats from Jefferson Lab that have clearly shown the original positive claims in those two cases to be wrong; there have been a number of other high-statistics experiments, none of which have found any evidence for the Θ+; and all attempts to confirm the two other claimed pentaquark states have led to negative results.

The conclusion that pentaquarks in general, and the Θ+, in particular, do not exist, appears compelling.”


Dark matter search

• Are these observations of dark matter?

Eur.Phys.J.C56:333-355,2008

Nature 456, 362-365


B. & F. in the scientific process

• Bayesian and frequentist approaches have complementary roles in this process

Experiment → Observation of a new phenomenon → How likely is the interpretation?

• Bayesian probabilistic interpretation of the new phenomenon: what is the probability that the interpretation is correct?

• A strong skeptical prejudice motivates confirmation: repeat the experiment and find other evidence (→ run into the frequentist domain!)


How to compute Posterior PDF

• Perform analytical integration
– Feasible in very few cases

• Use numerical integration
– May be CPU intensive

• Markov Chain Monte Carlo
– Samples the parameter space efficiently using a random walk heading to the regions of higher probability
– Metropolis algorithm to sample according to a PDF f(x):
  1. Start from a random point, xi, in the parameter space
  2. Generate a proposal point xp in the vicinity of xi
  3. If f(xp) > f(xi), accept xp as the next point: xi+1 = xp; else, accept it only with probability p = f(xp) / f(xi), and otherwise keep xi+1 = xi
  4. Repeat from point 2
– Convergence criteria and step size must be defined

RooStats::BayesianCalculator

RooStats::MCMCCalculator
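The Metropolis steps listed above can be sketched in plain Python. This is a minimal one-dimensional illustration, not the RooStats implementation: the symmetric uniform proposal, the Gaussian target, the step size, the chain length, and the burn-in length are all assumptions made for the example.

```python
import math
import random


def metropolis(f, x0, step, n_samples, rng):
    """Sample from an unnormalized 1D density f with a symmetric uniform proposal."""
    x, fx = x0, f(x0)
    samples = []
    for _ in range(n_samples):
        xp = x + rng.uniform(-step, step)  # proposal in the vicinity of x
        fxp = f(xp)
        # Accept always if f does not decrease, else with probability f(xp) / f(x).
        if fxp >= fx or rng.random() < fxp / fx:
            x, fx = xp, fxp
        samples.append(x)  # on rejection the current point is repeated
    return samples


def target(x):
    # Illustrative target: unnormalized Gaussian with mean 1 and sigma 0.5.
    return math.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)


rng = random.Random(12345)
chain = metropolis(target, x0=0.0, step=1.0, n_samples=50_000, rng=rng)
burned = chain[5_000:]  # discard burn-in; the length is an ad hoc choice
mean = sum(burned) / len(burned)
std = math.sqrt(sum((s - mean) ** 2 for s in burned) / len(burned))
print(f"mean ≈ {mean:.2f}, sigma ≈ {std:.2f}")
```

Only ratios of f enter the acceptance step, so the target does not need to be normalized; this is what makes the method convenient for posteriors whose normalization integral is hard to compute.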


Problems of Bayesian approach

• The Bayesian probability is subjective, in the sense that it depends on a prior probability, or degree of belief, about the unknown parameters
– However, as the number of observations increases, the posterior probability departs significantly from the prior probability, and the final posterior depends less and less on the initial prior
– … but under those conditions, using frequentist or Bayesian approaches does not make much difference anyway

• How to represent the total lack of knowledge?
– A uniform distribution is not invariant under coordinate transformations
– A PDF uniform in the logarithm of the variable is scale-invariant

• A study of the sensitivity of the result to the chosen prior PDF is usually recommended


Choosing the prior PDF

• If the prior PDF is uniform in one choice of variable (“metric”), it won't be uniform after a coordinate transformation

• Given a prior PDF in a random variable, there is always a transformation that makes the PDF uniform

• The problem is: choose the metric in which the PDF is uniform

• Harold Jeffreys' prior: choose the prior form that is invariant under parameter transformations
  • metric related to the Fisher information (metric-invariant!)

• Some common cases:
– Poissonian mean: $p(s) \propto 1/\sqrt{s}$
– Poissonian mean with background b: $p(s) \propto 1/\sqrt{s + b}$
– Gaussian mean: $p(\mu) \propto 1$
– Gaussian r.m.s.: $p(\sigma) \propto 1/\sigma$
– Binomial parameter: $p(\varepsilon) \propto 1/\sqrt{\varepsilon(1 - \varepsilon)}$

• Problematic with more than one dimension!

• Demonstration of the invariance on Wikipedia (see: Jeffreys prior)
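As a worked illustration of how a Jeffreys prior follows from the Fisher information, here is a sketch for the Poissonian mean. The notation is assumed for this example: n is the observed count, s the Poisson mean, I(s) the Fisher information, and p(s) the prior.

```latex
\begin{aligned}
P(n \mid s) &= \frac{s^{n} e^{-s}}{n!}, \qquad
\frac{\partial \ln P(n \mid s)}{\partial s} = \frac{n}{s} - 1, \\
I(s) &= \mathrm{E}\!\left[\left(\frac{n}{s} - 1\right)^{2}\right]
      = \frac{\mathrm{Var}[n]}{s^{2}} = \frac{1}{s}, \\
p(s) &\propto \sqrt{I(s)} = \frac{1}{\sqrt{s}} .
\end{aligned}
```

The second line uses Var[n] = s for a Poisson distribution. The resulting prior is not normalizable on s ≥ 0, which is a common feature of Jeffreys priors.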


Gent, 28 Oct. 2014 Luca Lista 30

Frequentist vs Bayesian intervals

• Interpretation of parameter errors:
– $\theta = \theta^{\mathrm{est}} \pm \delta$ means $\theta \in [\theta^{\mathrm{est}} - \delta,\ \theta^{\mathrm{est}} + \delta]$
– $\theta = {\theta^{\mathrm{est}}}^{+\delta_2}_{-\delta_1}$ means $\theta \in [\theta^{\mathrm{est}} - \delta_1,\ \theta^{\mathrm{est}} + \delta_2]$

• Frequentist approach:
– Knowing a parameter within some error means that in a large fraction (68% or 95%, usually) of the experiments the quoted confidence interval $[\theta^{\mathrm{est}} - \delta_1,\ \theta^{\mathrm{est}} + \delta_2]$ contains the (fixed but unknown) true value

• Bayesian approach:
– The posterior PDF for $\theta$ is maximum at $\theta^{\mathrm{est}}$ and its integral is 68% within the range $[\theta^{\mathrm{est}} - \delta_1,\ \theta^{\mathrm{est}} + \delta_2]$

• The choice of the interval, i.e. of $\delta_1$ and $\delta_2$, can be made in different ways, e.g.: same area in the two tails, shortest interval, symmetric error, …

• Note that both approaches provide the same results for a Gaussian model using a uniform prior, leading to possible confusion in the interpretation


Frequentist vs Bayesian popularity

• Until the 1990s the frequentist approach was largely favored:
– “at the present time (1997) [frequentists] appear to constitute the vast majority of workers in high energy physics”
  • V.L. Highland, B. Cousins, NIM A398 (1997) 429-430

• More recently, Bayesian estimates have become more popular and provide simpler mathematical methods to perform complex estimates
– The properties of Bayesian estimators can be studied with a frequentist approach using toy Monte Carlos (feasible with today's computers)
– Also preferred by several theorists (UTFit team, cosmologists)


Bayesian inference

• Just use the product of the likelihood function times the prior probability as the posterior PDF for the unknown parameter(s) θ:

$$P(\theta \mid x) = \frac{L(x; \theta)\, P(\theta)}{\int L(x; \theta')\, P(\theta')\, \mathrm{d}\theta'}$$

• You can then evaluate the average and variance of θ, as well as the mode (most likely value)
– In many cases, the most likely value and the average don't coincide!

• Notice that the Maximum Likelihood estimate is the mode of the Bayesian posterior with a flat prior

• Upper limits are easily computed using the Bayesian approach


Bayesian inference of a Poissonian

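• Posterior probability for the Poissonian mean s after observing n events, assuming the prior to be f0(s):

$$P(s \mid n) = \frac{\dfrac{s^{n} e^{-s}}{n!}\, f_0(s)}{\int_0^\infty \dfrac{s'^{\,n} e^{-s'}}{n!}\, f_0(s')\, \mathrm{d}s'}$$

• If f0(s) is uniform:

$$P(s \mid n) = \frac{s^{n} e^{-s}}{n!}$$

• We have: $\langle s \rangle = n + 1$, $\mathrm{Var}[s] = n + 1$

• Most probable value: $s = n$

… but this is somewhat arbitrary, since it is metric-dependent!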
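As an illustration of how an upper limit follows from this posterior, the sketch below assumes a uniform prior on s ≥ 0 and no background, so that the posterior for n observed events is s^n e^(-s) / n!. The function names, the bisection solver, and its bracket are choices made for this example.

```python
import math


def posterior_cdf(s_up, n):
    """P(s < s_up | n) for the posterior s^n e^(-s) / n! (uniform prior, s >= 0)."""
    partial_sum = sum(s_up**k / math.factorial(k) for k in range(n + 1))
    return 1.0 - math.exp(-s_up) * partial_sum


def upper_limit(n, cl=0.90, tol=1e-10):
    """Solve posterior_cdf(s_up, n) = cl for s_up by bisection."""
    lo, hi = 0.0, 10.0 + 10.0 * n  # generous bracket; an ad hoc choice
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid, n) < cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (0, 1, 3):
    print(n, round(upper_limit(n), 3))
```

For n = 0 the posterior is e^(-s), so the 90% limit has the closed form s_up = −ln(0.1) = ln 10, which gives a direct check of the numerical solution.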


Error propagation with Bayesian inference

• The result of the inference is just a PDF (of the measured parameters)

• The error propagation is done applying the usual transformations:
– z = Z(x, y)
– x′ = X(x, y), y′ = Y(x, y)
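One common way to apply such a transformation in practice is to draw samples from the posterior and transform each sample. The sketch below assumes, purely for illustration, a posterior that factorizes into two independent Gaussians and the transformation z = x · y; neither is taken from the slides.

```python
import random
import statistics

rng = random.Random(2024)
n_samples = 100_000

# Hypothetical posterior: independent Gaussians for x and y (an assumption).
xs = [rng.gauss(1.0, 0.1) for _ in range(n_samples)]
ys = [rng.gauss(2.0, 0.2) for _ in range(n_samples)]


def Z(x, y):
    """Example transformation z = Z(x, y)."""
    return x * y


# Transforming each posterior sample gives samples from the PDF of z.
zs = [Z(x, y) for x, y in zip(xs, ys)]
z_mean = statistics.fmean(zs)
z_std = statistics.pstdev(zs)
print(f"z = {z_mean:.3f} ± {z_std:.3f}")
```

The transformed samples carry the full shape of the PDF of z, so asymmetric intervals or correlations with other derived quantities can be read off the same sample without a linear approximation.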


A Bayesian application: UTFit

• UTFit: Bayesian determination of the CKM unitarity triangle
– Many experimental and theoretical inputs combined as a product of PDFs
– Resulting likelihood interpreted as a Bayesian PDF in the UT plane

• Inputs:
– Experimental results that directly or indirectly measure or put constraints on Standard Model CKM parameters


The Unitarity Triangle

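• Quark mixing is described by the CKM matrix (rows: u, c, t; columns: d, s, b):

$$V = \begin{pmatrix} V_{ud} & V_{us} & V_{ub} \\ V_{cd} & V_{cs} & V_{cb} \\ V_{td} & V_{ts} & V_{tb} \end{pmatrix}$$

• Unitarity relations on matrix elements lead to a triangle in the complex plane:

$$V_{ud} V_{ub}^{*} + V_{cd} V_{cb}^{*} + V_{td} V_{tb}^{*} = 0$$

[Figure: the unitarity triangle with vertices A = (ρ, η), B = (1, 0), C = (0, 0); the side CB has length 1, and the other two sides are labelled $V_{ud} V_{ub}^{*} / (V_{cd} V_{cb}^{*})$ and $V_{td} V_{tb}^{*} / (V_{cd} V_{cb}^{*})$]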


Inputs


Combine the constraints

• Given {xi} parameters and {ci} constraints that depend on xi, ρ, η:

• Define the combined PDF
– ƒ(ρ, η, x1, x2, ..., xN | c1, c2, ..., cM) ∝ ∏j=1,M ƒj(cj | ρ, η, x1, x2, ..., xN) ∏i=1,N ƒi(xi) ⋅ ƒo(ρ, η), where ƒo(ρ, η) is the prior PDF
– PDFs taken from experiments, wherever possible

• Determine the PDF of (ρ, η) by integrating over the remaining parameters
– ƒ(ρ, η) ∝ ∫ ∏j=1,M ƒj(cj | ρ, η, x1, x2, ..., xN) ∏i=1,N ƒi(xi) ⋅ ƒo(ρ, η) d^N x d^M c


Unitarity Triangle fit

68%, 95% contours


PDFs for ρ and η


Projections on other observables


References

• G. D'Agostini, “Bayesian inference in processing experimental data: principles and basic applications”, Rep. Progr. Phys. 66 (2003) 1383 [physics/0304102]
• H. Jeffreys, “An Invariant Form for the Prior Probability in Estimation Problems”, Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences 186 (1007): 453–461, 1946
• H. Jeffreys, “Theory of Probability”, Oxford University Press, 1939
• Wikipedia: “Jeffreys prior”, with demonstration of metric invariance
• G. D'Agostini, “Bayesian Reasoning in Data Analysis: a Critical Guide”, World Scientific (2003)
• W.T. Eadie, D. Drijard, F.E. James, M. Roos, B. Sadoulet, “Statistical Methods in Experimental Physics”, North Holland, 1971
• G. D'Agostini, “Telling the truth with statistics”, CERN Academic Training Lecture, 2005
– http://cdsweb.cern.ch/record/794319?ln=it
• Pentaquarks update 2006 in PDG
– pdg.lbl.gov/2006/listings/b152.ps
– SVD Collaboration, “Further study of narrow baryon resonance decaying into K0s p in pA-interactions at 70 GeV/c with SVD-2 setup”, arXiv:hep-ex/0509033v3
• Dark matter:
– R. Bernabei et al., Eur.Phys.J. C56:333-355, 2008, arXiv:0804.2741v1
– J. Chang et al., Nature 456, 362-365
• UTFit:
– http://www.utfit.org/
– M. Ciuchini et al., JHEP 0107 (2001) 013, hep-ph/0012308