g. cowan rhul physics statistical methods for particle physics / 2007 cern-fnal hcp school page 1...
Post on 21-Dec-2015
213 views
TRANSCRIPT
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 1
Statistical Methods for Particle Physics (2)
CERN-FNAL
Hadron Collider Physics
Summer School
CERN, 6-15 June, 2007
Glen CowanPhysics DepartmentRoyal Holloway, University of [email protected]/~cowan
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 2
Outline
1 Brief overviewProbability: frequentist vs. subjective (Bayesian)Statistics: parameter estimation, hypothesis tests
2 Statistical tests for Particle Physicsmultivariate methods for event selection
goodness-of-fit tests for discovery
3 Systematic errors Treatment of nuisance parametersBayesian methods for systematics, MCMC
Friday
Wednesday
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 3
Testing goodness-of-fit
Suppose hypothesis H predicts pdf
observations
for a set of
We observe a single point in this space:
What can we say about the validity of H in light of the data?
Decide what part of the data space represents less compatibility with H than does the point less
compatiblewith H
more compatiblewith H
(Not unique!)
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 4
p-values
where (H) is the prior probability for H.
Express ‘goodness-of-fit’ by giving the p-value for H:
p = probability, under assumption of H, to observe data with equal or lesser compatibility with H relative to the data we got.
This is not the probability that H is true!
In frequentist statistics we don’t talk about P(H) (unless H represents a repeatable observation). In Bayesian statistics we do; use Bayes’ theorem to obtain
For now stick with the frequentist approach; result is p-value, regrettably easy to misinterpret as P(H).
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 5
p-value example: testing whether a coin is ‘fair’
i.e. p = 0.0026 is the probability of obtaining such a bizarreresult (or more so) ‘by chance’, under the assumption of H.
Probability to observe n heads in N coin tosses is binomial:
Hypothesis H: the coin is fair (p = 0.5).
Suppose we toss the coin N = 20 times and get n = 17 heads.
Region of data space with equal or lesser compatibility with H relative to n = 17 is: n = 17, 18, 19, 20, 0, 1, 2, 3. Addingup the probabilities for these values gives:
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 6
p-value of an observed signalSuppose we observe n events; these can consist of:
nb events from known processes (background)ns events from a new process (signal)
If ns, nb are Poisson r.v.s with means s, b, then n = ns + nb
is also Poisson, mean = s + b:
Suppose b = 0.5, and we observe nobs = 5. Should we claimevidence for a new discovery?
Give p-value for hypothesis s = 0:
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 7
Significance from p-value
Often define significance Z as the number of standard deviationsthat a Gaussian variable would fluctuate in one directionto give the same p-value.
TMath::Prob
TMath::NormQuantile
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 8
The significance of a peak
Suppose we measure a value x for each event and find:
Each bin (observed) is aPoisson r.v., means aregiven by dashed lines.
In the two bins with the peak, 11 entries found with b = 3.2.The p-value for the s = 0 hypothesis is:
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 9
The significance of a peak (2)
But... did we know where to look for the peak?
→ give P(n ≥ 11) in any 2 adjacent bins
Is the observed width consistent with the expected x resolution?
→ take x window several times the expected resolution
How many bins distributions have we looked at?
→ look at a thousand of them, you’ll find a 10-3 effect
Did we adjust the cuts to ‘enhance’ the peak?
→ freeze cuts, repeat analysis with new data
Should we publish????
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 10
Using shape of a distribution in a search
Suppose we want to search for a specific model (i.e. beyond the Standard Model); contains parameter .
Expected number of entries in ith bin:
signal background
Suppose the ‘no signal’ hypothesis is = 0, i.e., s(0) = 0.
Probability is product of Poisson probabilities:
Select candidate events; for each event measure some quantityx and make histogram:
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 11
Testing the hypothesized Construct e.g. the likelihood ratio:
Find the sampling distributioni.e. we need to know how t() would be distributed if theentire experiment would be repeated under assumption of thebackground only hypothesis (parameter value 0).
p-value of 0 using test variable designed to be sensitive to :
This gives the probability, under the assumption of backgroundonly, to see data as ‘signal like’ or more so, relative to what we saw.
(e.g. use MC)
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 12
Making a discovery / setting limits
If we find a small p-value → discovery
Repeat this exercise for all
Test hypothesized using
Confidence interval at confidence level = set of values not rejected by a test of significance level
If reject .
here use e.g. = 0.05
Is the new signal compatible with what you were looking for?
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 13
When to publish
HEP folklore: claim discovery when p-value of background only hypothesis is 2.85 10-7, corresponding to significance Z = 5.
This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g.,
phenomenon reasonable p-value for discoveryD0D0 mixing ~0.05Higgs ~ 10-7 (?)Life on Mars ~10
Astrology
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 14
Statistical vs. systematic errors Statistical errors:
How much would the result fluctuate upon repetition of the measurement?
Implies some set of assumptions to define probability of outcome of the measurement.
Systematic errors:
What is the uncertainty in my result due to uncertainty in my assumptions, e.g.,
model (theoretical) uncertainty;modelling of measurement apparatus.
Usually taken to mean the sources of error do not vary upon repetition of the measurement. Often result from uncertain value of, e.g., calibration constants, efficiencies, etc.
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 15
Systematic errors and nuisance parameters
Response of measurement apparatus is never modelled perfectly:
x (true value)
y (m
easu
red
valu
e) model:
truth:
Model can be made to approximate better the truth by includingmore free parameters.
systematic uncertainty ↔ nuisance parameters
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 16
Example: fitting a straight line
Data:
Model: measured yi independent, Gaussian:
assume xi and i known.
Goal: estimate 0
(don’t care about 1).
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 17
Case #1: 1 known a priori
For Gaussian yi, ML same as LS
Minimize 2 → estimator
Come up one unit from
to find
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 18
Correlation between
causes errors
to increase.
Standard deviations from
tangent lines to contour
Case #2: both 0 and 1 unknown
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 19
The information on 1
improves accuracy of
Case #3: we have a measurement t1 of 1
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 20
The ‘tangent plane’ method is a special case of using the
profile likelihood:
The profile likelihood
is found by maximizing L (0, 1) for each 0.
Equivalently use
The interval obtained from is the same as
what is obtained from the tangents to
Well known in HEP as the ‘MINOS’ method in MINUIT.
Profile likelihood is one of several ‘pseudo-likelihoods’ usedin problems with nuisance parameters.
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 21
The Bayesian approach
In Bayesian statistics we can associate a probability witha hypothesis, e.g., a parameter value .
Interpret probability of as ‘degree of belief’ (subjective).
Need to start with ‘prior pdf’ (), this reflects degree of belief about before doing the experiment.
Our experiment has data y, → likelihood function L(y|).
Bayes’ theorem tells how our beliefs should be updated inlight of the data x:
Posterior pdf p(| y) contains all our knowledge about .
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 22
Case #4: Bayesian method
We need to associate prior probabilities with 0 and 1, e.g.,
Putting this into Bayes’ theorem gives:
posterior Q likelihood prior
← based on previous measurement
reflects ‘prior ignorance’, in anycase much broader than
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 23
Bayesian method (continued)
Ability to marginalize over nuisance parameters is an importantfeature of Bayesian statistics.
We then integrate (marginalize) p(0, 1 | x) to find p(0 | x):
In this example we can do the integral (rare). We find
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 24
Digression: marginalization with MCMCBayesian computations involve integrals like
often high dimensionality and impossible in closed form,also impossible with ‘normal’ acceptance-rejection Monte Carlo.
Markov Chain Monte Carlo (MCMC) has revolutionizedBayesian computation.
Google for ‘MCMC’, ‘Metropolis’, ‘Bayesian computation’, ...
MCMC generates correlated sequence of random numbers:cannot use for many applications, e.g., detector MC;effective stat. error greater than √n .
Basic idea: sample multidimensional look, e.g., only at distribution of parameters of interest.
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 25
Although numerical values of answer here same as in frequentistcase, interpretation is different (sometimes unimportant?)
Example: posterior pdf from MCMCSample the posterior pdf from previous example with MCMC:
Summarize pdf of parameter ofinterest with, e.g., mean, median,standard deviation, etc.
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 26
Case #5: Bayesian method with vague prior
Suppose we don’t have a previous measurement of 1 butrather some vague information, e.g., a theorist tells us:
1 ≥ 0 (essentially certain);
1 should have order of magnitude less than 0.1 ‘or so’.
Under pressure, the theorist sketches the following prior:
From this we will obtain posterior probabilities for 0 (next slide).
We do not need to get the theorist to ‘commit’ to this prior;final result has ‘if-then’ character.
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 27
Sensitivity to prior
Vary () to explore how extreme your prior beliefs would have to be to justify various conclusions (sensitivity analysis).
Try exponential with differentmean values...
Try different functional forms...
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 28
Wrapping up...
p-value for discovery = probability, under assumption ofbackground only, to see data as signal-like (or more so) relativeto the data you obtained.
≠ P(Standard Model true)!
Systematic errors ↔ nuisance parameters
If constrained by measurement → profile likelihoodOther prior info → Bayesian methods
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 29
Extra slides
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 30
MCMC basics: Metropolis-Hastings algorithmGoal: given an n-dimensional pdf
generate a sequence of points
1) Start at some point
2) Generate
Proposal densitye.g. Gaussian centredabout
3) Form Hastings test ratio
4) Generate
5) If
else
move to proposed point
old point repeated
6) Iterate
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 31
Metropolis-Hastings (continued)This rule produces a correlated sequence of points (note how each new point depends on the previous one).
For our purposes this correlation is not fatal, but statisticalerrors larger than naive
The proposal density can be (almost) anything, but chooseso as to minimize autocorrelation. Often take proposaldensity symmetric:
Test ratio is (Metropolis-Hastings):
I.e. if the proposed step is to a point of higher , take it;
if not, only take the step with probability
If proposed step rejected, hop in place.
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 32
Metropolis-Hastings caveatsActually one can only prove that the sequence of points followsthe desired pdf in the limit where it runs forever.
There may be a “burn-in” period where the sequence doesnot initially follow
Unfortunately there are few useful theorems to tell us when thesequence has converged.
Look at trace plots, autocorrelation.
Check result with different proposal density.
If you think it’s converged, try it again with 10 times more points.
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 33
LEP-style analysis: CLb
Same basic idea: L(m) → l(m) → q(m) → test of m, etc.
For a chosen m, find p-value of background-only hypothesis:
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 34
LEP-style analysis: CLs+b
‘Normal’ way to get interval would be to reject hypothesizedm if
By construction this interval will cover the true value of mwith probability 1 .
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 35
LEP-style analysis: CLs
The problem with the CLs+b method is that for high m, the distribution of q approaches that of the background-only hypothesis:
So a low fluctuation in the number of background events cangive CLs+b <
This rejects a high m value even though we are not sensitiveto Higgs production with that mass; the reason was a lowfluctuation in the background.
G. CowanRHUL Physics Statistical Methods for Particle Physics / 2007 CERN-FNAL HCP School page 36
CLs
A solution is to define:
and reject the hypothesized m if:
So the CLs intervals ‘over-cover’; they are conservative.