g. cowan rhul physics statistical methods for particle physics / 2007 cern-fnal hcp school page 1...

36
G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL Hadron Collider Physics Summer School CERN, 6-15 June, 2007 Glen Cowan Physics Department Royal Holloway, University of London [email protected] www.pp.rhul.ac.uk/~cowan

Post on 21-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1

Statistical Methods for Particle Physics (2)

CERN-FNAL

Hadron Collider Physics

Summer School

CERN, 6-15 June, 2007

Glen CowanPhysics DepartmentRoyal Holloway, University of [email protected]/~cowan

Page 2: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 2

Outline

1 Brief overviewProbability: frequentist vs. subjective (Bayesian)Statistics: parameter estimation, hypothesis tests

2 Statistical tests for Particle Physicsmultivariate methods for event selection

goodness-of-fit tests for discovery

3 Systematic errors Treatment of nuisance parametersBayesian methods for systematics, MCMC

Friday

Wednesday

Page 3: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 3

Testing goodness-of-fit

Suppose hypothesis H predicts pdf

observations

for a set of

We observe a single point in this space:

What can we say about the validity of H in light of the data?

Decide what part of the data space represents less compatibility with H than does the point less

compatiblewith H

more compatiblewith H

(Not unique!)

Page 4: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 4

p-values

where (H) is the prior probability for H.

Express ‘goodness-of-fit’ by giving the p-value for H:

p = probability, under assumption of H, to observe data with equal or lesser compatibility with H relative to the data we got.

This is not the probability that H is true!

In frequentist statistics we don’t talk about P(H) (unless H represents a repeatable observation). In Bayesian statistics we do; use Bayes’ theorem to obtain

For now stick with the frequentist approach; result is p-value, regrettably easy to misinterpret as P(H).

Page 5: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 5

p-value example: testing whether a coin is ‘fair’

i.e. p = 0.0026 is the probability of obtaining such a bizarreresult (or more so) ‘by chance’, under the assumption of H.

Probability to observe n heads in N coin tosses is binomial:

Hypothesis H: the coin is fair (p = 0.5).

Suppose we toss the coin N = 20 times and get n = 17 heads.

Region of data space with equal or lesser compatibility with H relative to n = 17 is: n = 17, 18, 19, 20, 0, 1, 2, 3. Addingup the probabilities for these values gives:

Page 6: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 6

p-value of an observed signalSuppose we observe n events; these can consist of:

nb events from known processes (background)ns events from a new process (signal)

If ns, nb are Poisson r.v.s with means s, b, then n = ns + nb

is also Poisson, mean = s + b:

Suppose b = 0.5, and we observe nobs = 5. Should we claimevidence for a new discovery?

Give p-value for hypothesis s = 0:

Page 7: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 7

Significance from p-value

Often define significance Z as the number of standard deviationsthat a Gaussian variable would fluctuate in one directionto give the same p-value.

TMath::Prob

TMath::NormQuantile

Page 8: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 8

The significance of a peak

Suppose we measure a value x for each event and find:

Each bin (observed) is aPoisson r.v., means aregiven by dashed lines.

In the two bins with the peak, 11 entries found with b = 3.2.The p-value for the s = 0 hypothesis is:

Page 9: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 9

The significance of a peak (2)

But... did we know where to look for the peak?

→ give P(n ≥ 11) in any 2 adjacent bins

Is the observed width consistent with the expected x resolution?

→ take x window several times the expected resolution

How many bins distributions have we looked at?

→ look at a thousand of them, you’ll find a 10-3 effect

Did we adjust the cuts to ‘enhance’ the peak?

→ freeze cuts, repeat analysis with new data

Should we publish????

Page 10: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 10

Using shape of a distribution in a search

Suppose we want to search for a specific model (i.e. beyond the Standard Model); contains parameter .

Expected number of entries in ith bin:

signal background

Suppose the ‘no signal’ hypothesis is = 0, i.e., s(0) = 0.

Probability is product of Poisson probabilities:

Select candidate events; for each event measure some quantityx and make histogram:

Page 11: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 11

Testing the hypothesized Construct e.g. the likelihood ratio:

Find the sampling distributioni.e. we need to know how t() would be distributed if theentire experiment would be repeated under assumption of thebackground only hypothesis (parameter value 0).

p-value of 0 using test variable designed to be sensitive to :

This gives the probability, under the assumption of backgroundonly, to see data as ‘signal like’ or more so, relative to what we saw.

(e.g. use MC)

Page 12: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 12

Making a discovery / setting limits

If we find a small p-value → discovery

Repeat this exercise for all

Test hypothesized using

Confidence interval at confidence level = set of values not rejected by a test of significance level

If reject .

here use e.g. = 0.05

Is the new signal compatible with what you were looking for?

Page 13: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 13

When to publish

HEP folklore: claim discovery when p-value of background only hypothesis is 2.85 10-7, corresponding to significance Z = 5.

This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g.,

phenomenon reasonable p-value for discoveryD0D0 mixing ~0.05Higgs ~ 10-7 (?)Life on Mars ~10

Astrology

Page 14: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 14

Statistical vs. systematic errors Statistical errors:

How much would the result fluctuate upon repetition of the measurement?

Implies some set of assumptions to define probability of outcome of the measurement.

Systematic errors:

What is the uncertainty in my result due to uncertainty in my assumptions, e.g.,

model (theoretical) uncertainty;modelling of measurement apparatus.

Usually taken to mean the sources of error do not vary upon repetition of the measurement. Often result from uncertain value of, e.g., calibration constants, efficiencies, etc.

Page 15: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 15

Systematic errors and nuisance parameters

Response of measurement apparatus is never modelled perfectly:

x (true value)

y (m

easu

red

valu

e) model:

truth:

Model can be made to approximate better the truth by includingmore free parameters.

systematic uncertainty ↔ nuisance parameters

Page 16: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 16

Example: fitting a straight line

Data:

Model: measured yi independent, Gaussian:

assume xi and i known.

Goal: estimate 0

(don’t care about 1).

Page 17: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 17

Case #1: 1 known a priori

For Gaussian yi, ML same as LS

Minimize 2 → estimator

Come up one unit from

to find

Page 18: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 18

Correlation between

causes errors

to increase.

Standard deviations from

tangent lines to contour

Case #2: both 0 and 1 unknown

Page 19: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 19

The information on 1

improves accuracy of

Case #3: we have a measurement t1 of 1

Page 20: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 20

The ‘tangent plane’ method is a special case of using the

profile likelihood:

The profile likelihood

is found by maximizing L (0, 1) for each 0.

Equivalently use

The interval obtained from is the same as

what is obtained from the tangents to

Well known in HEP as the ‘MINOS’ method in MINUIT.

Profile likelihood is one of several ‘pseudo-likelihoods’ usedin problems with nuisance parameters.

Page 21: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 21

The Bayesian approach

In Bayesian statistics we can associate a probability witha hypothesis, e.g., a parameter value .

Interpret probability of as ‘degree of belief’ (subjective).

Need to start with ‘prior pdf’ (), this reflects degree of belief about before doing the experiment.

Our experiment has data y, → likelihood function L(y|).

Bayes’ theorem tells how our beliefs should be updated inlight of the data x:

Posterior pdf p(| y) contains all our knowledge about .

Page 22: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 22

Case #4: Bayesian method

We need to associate prior probabilities with 0 and 1, e.g.,

Putting this into Bayes’ theorem gives:

posterior Q likelihood prior

← based on previous measurement

reflects ‘prior ignorance’, in anycase much broader than

Page 23: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 23

Bayesian method (continued)

Ability to marginalize over nuisance parameters is an importantfeature of Bayesian statistics.

We then integrate (marginalize) p(0, 1 | x) to find p(0 | x):

In this example we can do the integral (rare). We find

Page 24: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 24

Digression: marginalization with MCMCBayesian computations involve integrals like

often high dimensionality and impossible in closed form,also impossible with ‘normal’ acceptance-rejection Monte Carlo.

Markov Chain Monte Carlo (MCMC) has revolutionizedBayesian computation.

Google for ‘MCMC’, ‘Metropolis’, ‘Bayesian computation’, ...

MCMC generates correlated sequence of random numbers:cannot use for many applications, e.g., detector MC;effective stat. error greater than √n .

Basic idea: sample multidimensional look, e.g., only at distribution of parameters of interest.

Page 25: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 25

Although numerical values of answer here same as in frequentistcase, interpretation is different (sometimes unimportant?)

Example: posterior pdf from MCMCSample the posterior pdf from previous example with MCMC:

Summarize pdf of parameter ofinterest with, e.g., mean, median,standard deviation, etc.

Page 26: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 26

Case #5: Bayesian method with vague prior

Suppose we don’t have a previous measurement of 1 butrather some vague information, e.g., a theorist tells us:

1 ≥ 0 (essentially certain);

1 should have order of magnitude less than 0.1 ‘or so’.

Under pressure, the theorist sketches the following prior:

From this we will obtain posterior probabilities for 0 (next slide).

We do not need to get the theorist to ‘commit’ to this prior;final result has ‘if-then’ character.

Page 27: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 27

Sensitivity to prior

Vary () to explore how extreme your prior beliefs would have to be to justify various conclusions (sensitivity analysis).

Try exponential with differentmean values...

Try different functional forms...

Page 28: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 28

Wrapping up...

p-value for discovery = probability, under assumption ofbackground only, to see data as signal-like (or more so) relativeto the data you obtained.

≠ P(Standard Model true)!

Systematic errors ↔ nuisance parameters

If constrained by measurement → profile likelihoodOther prior info → Bayesian methods

Page 29: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 29

Extra slides

Page 30: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 30

MCMC basics: Metropolis-Hastings algorithmGoal: given an n-dimensional pdf

generate a sequence of points

1) Start at some point

2) Generate

Proposal densitye.g. Gaussian centredabout

3) Form Hastings test ratio

4) Generate

5) If

else

move to proposed point

old point repeated

6) Iterate

Page 31: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 31

Metropolis-Hastings (continued)This rule produces a correlated sequence of points (note how each new point depends on the previous one).

For our purposes this correlation is not fatal, but statisticalerrors larger than naive

The proposal density can be (almost) anything, but chooseso as to minimize autocorrelation. Often take proposaldensity symmetric:

Test ratio is (Metropolis-Hastings):

I.e. if the proposed step is to a point of higher , take it;

if not, only take the step with probability

If proposed step rejected, hop in place.

Page 32: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 32

Metropolis-Hastings caveatsActually one can only prove that the sequence of points followsthe desired pdf in the limit where it runs forever.

There may be a “burn-in” period where the sequence doesnot initially follow

Unfortunately there are few useful theorems to tell us when thesequence has converged.

Look at trace plots, autocorrelation.

Check result with different proposal density.

If you think it’s converged, try it again with 10 times more points.

Page 33: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 33

LEP-style analysis: CLb

Same basic idea: L(m) → l(m) → q(m) → test of m, etc.

For a chosen m, find p-value of background-only hypothesis:

Page 34: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 34

LEP-style analysis: CLs+b

‘Normal’ way to get interval would be to reject hypothesizedm if

By construction this interval will cover the true value of mwith probability 1 .

Page 35: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 35

LEP-style analysis: CLs

The problem with the CLs+b method is that for high m, the distribution of q approaches that of the background-only hypothesis:

So a low fluctuation in the number of background events cangive CLs+b <

This rejects a high m value even though we are not sensitiveto Higgs production with that mass; the reason was a lowfluctuation in the background.

Page 36: G. Cowan RHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1 Statistical Methods for Particle Physics (2) CERN-FNAL

G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 36

CLs

A solution is to define:

and reject the hypothesized m if:

So the CLs intervals ‘over-cover’; they are conservative.