Luca Stanco – INFN Padova, 2 December 2010

Combining p-values

i.e. what happens to the SIGNIFICANCE when the next event comes?

There are two ways:

1) difficult, correct: Bayesian way

2) easy, approximate: Frequentist way


I assume that everybody knows what p-values and the H0/H1 hypotheses are

(otherwise please refer to e.g. http://pdg.lbl.gov/2010/reviews/rpp2010-rev-statistics.pdf )

For a short cut:

p-value = probability of the less probable region of the H0 hypothesis

1-p = Significance of the H1 hypothesis (power 1-β, error of type II)

(only in the case of 1 random variable!!!)


1st way


Exercise: suppose the 2nd event has a p-value similar to that of the 1st one

2.98 sigmas

Of course, with the FISHER rule we neglect any correlation!

Moreover, it is somewhat wrong when the 2 p-values are quite different: p1 = 0.1, p2 = 0.0001 → pTOT = 0.00012 > p2
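
As a rough illustration of the numbers on this slide, here is a minimal Python sketch. It assumes the FISHER rule means the standard Fisher combination for two independent p-values, pTOT = p1·p2·(1 − ln(p1·p2)), which is consistent with the pTOT = 0.00012 example above. The two-sided Gaussian conversion from p-value to sigmas and the input p = 0.0177 for the two similar events are also assumptions, and the function names are purely illustrative.

```python
import math
from statistics import NormalDist


def fisher_combine(p1: float, p2: float) -> float:
    """Combine two independent p-values with the (assumed) Fisher rule."""
    prod = p1 * p2
    return prod * (1.0 - math.log(prod))


def p_to_sigma(p: float) -> float:
    """Convert a p-value to Gaussian sigmas (assumed two-sided convention)."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


# Two quite different p-values: the combination is larger than p2 alone.
p_tot = fisher_combine(0.1, 0.0001)
print(f"pTOT = {p_tot:.2e}, p2 = {0.0001:.2e}")

# Two events with similar p-values (hypothetical input p = 0.0177 each).
p_pair = fisher_combine(0.0177, 0.0177)
print(f"pTOT = {p_pair:.2e} -> {p_to_sigma(p_pair):.2f} sigmas")
```

Under these assumptions the second case comes out close to the 2.98 sigmas quoted above; a one-sided conversion would give a smaller number of sigmas.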


It turns out that the FISHER rule is too conservative in the case of two independent Poissonians, the lowest limiting p-value being:

$$P(x_1, x_2; n) = P(x = x_1 + x_2; n) = \frac{(x_1 + x_2)^n}{n!} \, e^{-(x_1 + x_2)}$$

In the simplest case of no correlation, with 2 candidates as before, the result is:

3.39 sigmas
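
The Poisson expression for the summed means can be evaluated with a few lines of Python. This is only a sketch: the means x1 = x2 = 0.018 are hypothetical placeholders, not the inputs used for the slide, and the two-sided conversion to sigmas is an assumption, so the printed value is not expected to reproduce 3.39 exactly.

```python
import math
from statistics import NormalDist


def poisson_prob(x: float, n: int) -> float:
    """Poisson probability of observing n events for an expected mean x."""
    return x**n / math.factorial(n) * math.exp(-x)


def combined_prob(x1: float, x2: float, n: int) -> float:
    """Two independent Poissonians: the total count is Poisson with mean x1 + x2."""
    return poisson_prob(x1 + x2, n)


def p_to_sigma(p: float) -> float:
    """Convert a probability to Gaussian sigmas (assumed two-sided convention)."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


# Hypothetical expected means for the two searches (illustrative only).
x1 = x2 = 0.018
p = combined_prob(x1, x2, n=2)
print(f"P(n=2; x1+x2) = {p:.3e} -> {p_to_sigma(p):.2f} sigmas")
```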

BUT the final result should be even greater since that probability is:

$$P_{TOT} = P(x; 2 + 0) + P(x; 1 + 1) + P(x; 0 + 2)$$

This is a simple demonstration that the FISHER rule is CONSERVATIVE and not so good for Discrete Cases


WHY is the Bayesian way “difficult”?

If we simulate 1 million pseudo-experiments for 1 candidate, for 2 candidates we should a priori simulate (1 million)^2 = 10^12 !!

Some tricks may be applied:
- Integrating the likelihood over a “normal domain” (simply connected)
- Computing 1-p
- Decoupling variables as much as possible

(this is formally correct)

Then, a Multivariate Likelihood computation is affordable.
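
One possible reading of the decoupling trick can be sketched as follows: if the two variables are truly independent, each marginal distribution can be estimated from its own N pseudo-experiments, and the joint table is then the product of the marginals, so the cost is 2N rather than N^2. Everything in this sketch is an assumption made for illustration: the Poisson toy model, the mean of 0.018, the number of pseudo-experiments, and the grouping of counts into 0, 1 and 2-or-more.

```python
import math
import random
from collections import Counter


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Draw one Poisson count (Knuth's multiplication method, fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def marginal_frequencies(mean, n_exp, rng, max_count=2):
    """Frequencies of 0, 1, ..., >= max_count events over n_exp pseudo-experiments."""
    counts = Counter(min(sample_poisson(mean, rng), max_count) for _ in range(n_exp))
    return [counts[k] / n_exp for k in range(max_count + 1)]


def joint_table(marg1, marg2):
    """Joint probabilities of two independent variables: product of the marginals."""
    return [[a * b for b in marg2] for a in marg1]


rng = random.Random(2010)      # fixed seed so the toy run is reproducible
n_exp = 100_000                # hypothetical number of pseudo-experiments per variable
m1 = marginal_frequencies(0.018, n_exp, rng)
m2 = marginal_frequencies(0.018, n_exp, rng)
table = joint_table(m1, m2)
for row in table:
    print("  ".join(f"{100 * v:7.3f}%" for v in row))
```

The product step is only valid when the variables are fully decoupled; any correlation would require simulating the variables jointly.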


In the example of the simplest OPERA case the correct result is:

3.60 sigmas

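[Figure: 3×3 grid of joint probabilities for the two candidates. Marginal probabilities on each axis: 98.22%, 1.77%, 0.01%. Cell values: 96.452%; 1.739% and 1.742%; 0.018% and 0.018%; 0.031%; 0% in the remaining three cells. Annotation: "Error due to limited exps."]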


Backup


Feldman-Cousins has “no meaning” in the case of few events (<5) and more than 1 random variable

Junk's method may be used (Modified Frequentist Technique; arXiv:hep-ex/9902006v1, 5 Feb 1999). Valid only for fully independent searches.

For example, it is used by D0 for the Higgs search, but:
- CDF uses Bayes
- the two methods agree within 10% on the single channel and 1% overall
- Tevatron decided to release the official result based on the CDF/Bayes analysis.
