g. cowan rhul physics statistical methods for particle physics / 2007 cern-fnal hcp school page 1...

Post on 21-Dec-2015

214 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1

Statistical Methods for Particle Physics (2)

CERN-FNAL

Hadron Collider Physics

Summer School

CERN, 6-15 June, 2007

Glen CowanPhysics DepartmentRoyal Holloway, University of Londong.cowan@rhul.ac.ukwww.pp.rhul.ac.uk/~cowan

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 2

Outline

1 Brief overviewProbability: frequentist vs. subjective (Bayesian)Statistics: parameter estimation, hypothesis tests

2 Statistical tests for Particle Physicsmultivariate methods for event selection

goodness-of-fit tests for discovery

3 Systematic errors Treatment of nuisance parametersBayesian methods for systematics, MCMC

Friday

Wednesday

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 3

Testing goodness-of-fit

Suppose hypothesis H predicts pdf

observations

for a set of

We observe a single point in this space:

What can we say about the validity of H in light of the data?

Decide what part of the data space represents less compatibility with H than does the point less

compatiblewith H

more compatiblewith H

(Not unique!)

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 4

p-values

where (H) is the prior probability for H.

Express ‘goodness-of-fit’ by giving the p-value for H:

p = probability, under assumption of H, to observe data with equal or lesser compatibility with H relative to the data we got.

This is not the probability that H is true!

In frequentist statistics we don’t talk about P(H) (unless H represents a repeatable observation). In Bayesian statistics we do; use Bayes’ theorem to obtain

For now stick with the frequentist approach; result is p-value, regrettably easy to misinterpret as P(H).

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 5

p-value example: testing whether a coin is ‘fair’

i.e. p = 0.0026 is the probability of obtaining such a bizarreresult (or more so) ‘by chance’, under the assumption of H.

Probability to observe n heads in N coin tosses is binomial:

Hypothesis H: the coin is fair (p = 0.5).

Suppose we toss the coin N = 20 times and get n = 17 heads.

Region of data space with equal or lesser compatibility with H relative to n = 17 is: n = 17, 18, 19, 20, 0, 1, 2, 3. Addingup the probabilities for these values gives:

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 6

p-value of an observed signalSuppose we observe n events; these can consist of:

nb events from known processes (background)ns events from a new process (signal)

If ns, nb are Poisson r.v.s with means s, b, then n = ns + nb

is also Poisson, mean = s + b:

Suppose b = 0.5, and we observe nobs = 5. Should we claimevidence for a new discovery?

Give p-value for hypothesis s = 0:

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 7

Significance from p-value

Often define significance Z as the number of standard deviationsthat a Gaussian variable would fluctuate in one directionto give the same p-value.

TMath::Prob

TMath::NormQuantile

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 8

The significance of a peak

Suppose we measure a value x for each event and find:

Each bin (observed) is aPoisson r.v., means aregiven by dashed lines.

In the two bins with the peak, 11 entries found with b = 3.2.The p-value for the s = 0 hypothesis is:

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 9

The significance of a peak (2)

But... did we know where to look for the peak?

→ give P(n ≥ 11) in any 2 adjacent bins

Is the observed width consistent with the expected x resolution?

→ take x window several times the expected resolution

How many bins distributions have we looked at?

→ look at a thousand of them, you’ll find a 10-3 effect

Did we adjust the cuts to ‘enhance’ the peak?

→ freeze cuts, repeat analysis with new data

Should we publish????

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 10

Using shape of a distribution in a search

Suppose we want to search for a specific model (i.e. beyond the Standard Model); contains parameter .

Expected number of entries in ith bin:

signal background

Suppose the ‘no signal’ hypothesis is = 0, i.e., s(0) = 0.

Probability is product of Poisson probabilities:

Select candidate events; for each event measure some quantityx and make histogram:

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 11

Testing the hypothesized Construct e.g. the likelihood ratio:

Find the sampling distributioni.e. we need to know how t() would be distributed if theentire experiment would be repeated under assumption of thebackground only hypothesis (parameter value 0).

p-value of 0 using test variable designed to be sensitive to :

This gives the probability, under the assumption of backgroundonly, to see data as ‘signal like’ or more so, relative to what we saw.

(e.g. use MC)

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 12

Making a discovery / setting limits

If we find a small p-value → discovery

Repeat this exercise for all

Test hypothesized using

Confidence interval at confidence level = set of values not rejected by a test of significance level

If reject .

here use e.g. = 0.05

Is the new signal compatible with what you were looking for?

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 13

When to publish

HEP folklore: claim discovery when p-value of background only hypothesis is 2.85 10-7, corresponding to significance Z = 5.

This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g.,

phenomenon reasonable p-value for discoveryD0D0 mixing ~0.05Higgs ~ 10-7 (?)Life on Mars ~10

Astrology

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 14

Statistical vs. systematic errors Statistical errors:

How much would the result fluctuate upon repetition of the measurement?

Implies some set of assumptions to define probability of outcome of the measurement.

Systematic errors:

What is the uncertainty in my result due to uncertainty in my assumptions, e.g.,

model (theoretical) uncertainty;modelling of measurement apparatus.

Usually taken to mean the sources of error do not vary upon repetition of the measurement. Often result from uncertain value of, e.g., calibration constants, efficiencies, etc.

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 15

Systematic errors and nuisance parameters

Response of measurement apparatus is never modelled perfectly:

x (true value)

y (m

easu

red

valu

e) model:

truth:

Model can be made to approximate better the truth by includingmore free parameters.

systematic uncertainty ↔ nuisance parameters

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 16

Example: fitting a straight line

Data:

Model: measured yi independent, Gaussian:

assume xi and i known.

Goal: estimate 0

(don’t care about 1).

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 17

Case #1: 1 known a priori

For Gaussian yi, ML same as LS

Minimize 2 → estimator

Come up one unit from

to find

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 18

Correlation between

causes errors

to increase.

Standard deviations from

tangent lines to contour

Case #2: both 0 and 1 unknown

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 19

The information on 1

improves accuracy of

Case #3: we have a measurement t1 of 1

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 20

The ‘tangent plane’ method is a special case of using the

profile likelihood:

The profile likelihood

is found by maximizing L (0, 1) for each 0.

Equivalently use

The interval obtained from is the same as

what is obtained from the tangents to

Well known in HEP as the ‘MINOS’ method in MINUIT.

Profile likelihood is one of several ‘pseudo-likelihoods’ usedin problems with nuisance parameters.

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 21

The Bayesian approach

In Bayesian statistics we can associate a probability witha hypothesis, e.g., a parameter value .

Interpret probability of as ‘degree of belief’ (subjective).

Need to start with ‘prior pdf’ (), this reflects degree of belief about before doing the experiment.

Our experiment has data y, → likelihood function L(y|).

Bayes’ theorem tells how our beliefs should be updated inlight of the data x:

Posterior pdf p(| y) contains all our knowledge about .

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 22

Case #4: Bayesian method

We need to associate prior probabilities with 0 and 1, e.g.,

Putting this into Bayes’ theorem gives:

posterior Q likelihood prior

← based on previous measurement

reflects ‘prior ignorance’, in anycase much broader than

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 23

Bayesian method (continued)

Ability to marginalize over nuisance parameters is an importantfeature of Bayesian statistics.

We then integrate (marginalize) p(0, 1 | x) to find p(0 | x):

In this example we can do the integral (rare). We find

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 24

Digression: marginalization with MCMCBayesian computations involve integrals like

often high dimensionality and impossible in closed form,also impossible with ‘normal’ acceptance-rejection Monte Carlo.

Markov Chain Monte Carlo (MCMC) has revolutionizedBayesian computation.

Google for ‘MCMC’, ‘Metropolis’, ‘Bayesian computation’, ...

MCMC generates correlated sequence of random numbers:cannot use for many applications, e.g., detector MC;effective stat. error greater than √n .

Basic idea: sample multidimensional look, e.g., only at distribution of parameters of interest.

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 25

Although numerical values of answer here same as in frequentistcase, interpretation is different (sometimes unimportant?)

Example: posterior pdf from MCMCSample the posterior pdf from previous example with MCMC:

Summarize pdf of parameter ofinterest with, e.g., mean, median,standard deviation, etc.

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 26

Case #5: Bayesian method with vague prior

Suppose we don’t have a previous measurement of 1 butrather some vague information, e.g., a theorist tells us:

1 ≥ 0 (essentially certain);

1 should have order of magnitude less than 0.1 ‘or so’.

Under pressure, the theorist sketches the following prior:

From this we will obtain posterior probabilities for 0 (next slide).

We do not need to get the theorist to ‘commit’ to this prior;final result has ‘if-then’ character.

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 27

Sensitivity to prior

Vary () to explore how extreme your prior beliefs would have to be to justify various conclusions (sensitivity analysis).

Try exponential with differentmean values...

Try different functional forms...

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 28

Wrapping up...

p-value for discovery = probability, under assumption ofbackground only, to see data as signal-like (or more so) relativeto the data you obtained.

≠ P(Standard Model true)!

Systematic errors ↔ nuisance parameters

If constrained by measurement → profile likelihoodOther prior info → Bayesian methods

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 29

Extra slides

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 30

MCMC basics: Metropolis-Hastings algorithmGoal: given an n-dimensional pdf

generate a sequence of points

1) Start at some point

2) Generate

Proposal densitye.g. Gaussian centredabout

3) Form Hastings test ratio

4) Generate

5) If

else

move to proposed point

old point repeated

6) Iterate

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 31

Metropolis-Hastings (continued)This rule produces a correlated sequence of points (note how each new point depends on the previous one).

For our purposes this correlation is not fatal, but statisticalerrors larger than naive

The proposal density can be (almost) anything, but chooseso as to minimize autocorrelation. Often take proposaldensity symmetric:

Test ratio is (Metropolis-Hastings):

I.e. if the proposed step is to a point of higher , take it;

if not, only take the step with probability

If proposed step rejected, hop in place.

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 32

Metropolis-Hastings caveatsActually one can only prove that the sequence of points followsthe desired pdf in the limit where it runs forever.

There may be a “burn-in” period where the sequence doesnot initially follow

Unfortunately there are few useful theorems to tell us when thesequence has converged.

Look at trace plots, autocorrelation.

Check result with different proposal density.

If you think it’s converged, try it again with 10 times more points.

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 33

LEP-style analysis: CLb

Same basic idea: L(m) → l(m) → q(m) → test of m, etc.

For a chosen m, find p-value of background-only hypothesis:

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 34

LEP-style analysis: CLs+b

‘Normal’ way to get interval would be to reject hypothesizedm if

By construction this interval will cover the true value of mwith probability 1 .

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 35

LEP-style analysis: CLs

The problem with the CLs+b method is that for high m, the distribution of q approaches that of the background-only hypothesis:

So a low fluctuation in the number of background events cangive CLs+b <

This rejects a high m value even though we are not sensitiveto Higgs production with that mass; the reason was a lowfluctuation in the background.

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 36

CLs

A solution is to define:

and reject the hypothesized m if:

So the CLs intervals ‘over-cover’; they are conservative.

top related