
Page 1:

Treatment of Nuisance Parameters in High Energy Physics, and

Possible Justifications and Improvements in the Statistics Literature

Bob Cousins, Dept. of Physics and Astronomy, UCLA

Sept. 13, 2005

I. Introduction
II. Poisson problem with no nuisance parameter; add one.
III. The Role of Conditioning (or Absence thereof) in HEP
IV. Nuisance Parameters in...

A. Bayesian Intervals
B. Likelihood Intervals
C. Frequentist Neyman Construction of Confidence Intervals

V. Conclusions

Page 2:

Nuisance parameters (n.p.) appear in virtually every physics measurement of interest because the measuring apparatus must be calibrated, and for all but the simplest apparatus, the calibration technique involves unknowns which are not directly of physical interest.

Often called “systematic uncertainties” in HEP, but see Sinervo (2003) for a more precise discussion of the connection.

I am rather familiar with the HEP literature, and took this opportunity to dig a little more into the stats literature. The result was overwhelming and humbling, as I realized how vast the literature is and the names associated with it (Cox, Reid, and their friends and associates)! Given how little time I have these days for my statistics hobby, my revised goal is to acquaint you with the existence of some HEP and stats literature, and suggest possible paths to pursue.

Page 3:

Preliminaries
• In HEP, we have parametric models that we take very seriously (laws of nature: parameters are “fundamental constants of nature”, like the mass and charge of the electron).

• Modern experiments have lots of “calibration constants” – detector-associated nuisance parameters – but a typical fit to data has a small number of physical parameters of interest “floating”.

• Everyone knows how to simulate lots of pseudo-experiments (if inefficiently); frequentist probability is built into the culture.

• We think we have the world’s best random samples (randomness supplied by quantum mechanics).
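The pseudo-experiment habit mentioned above can be sketched in a few lines. This is a hypothetical toy of my own, not any experiment's actual code: generate Poisson pseudo-experiments and check the frequentist coverage of the naive n ± √n interval.

```python
import math
import random

def coverage(mu_true, n_toys=20000, seed=1):
    """Fraction of Poisson pseudo-experiments in which the naive
    interval [n - sqrt(n), n + sqrt(n)] covers the true mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_toys):
        # Draw one Poisson pseudo-experiment by CDF inversion.
        u = rng.random()
        k, p = 0, math.exp(-mu_true)
        c = p
        while c < u:
            k += 1
            p *= mu_true / k
            c += p
        if k - math.sqrt(k) <= mu_true <= k + math.sqrt(k):
            covered += 1
    return covered / n_toys

# Nominal 68% interval: roughly right at large mu, noticeably off at small mu.
cov_hi, cov_lo = coverage(100.0), coverage(2.0)
print(cov_hi, cov_lo)
```

Exactly this kind of brute-force coverage check is what "frequentist probability is built into the culture" means in practice.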

Page 4:

Preliminaries (cont.)
• So we often have a likelihood function we believe (even if detector-related parts of it require a lot of calibration work), and

• It has a few (often one) parameter(s) of interest, and one or more nuisance parameters, and

• Most people want good frequentist behavior and can check for it.

• “Hand calculations” are of interest to us old-timers for sanity checks on the computer code, but most younger people are happy to do everything numerically without power series expansions, etc., and maybe check it with a different computer program.

Page 5:

Preliminaries. Simple problem: Observe n=3 from Poisson process; report interval for unknown mean μ.

Bayesian credible intervals: P_posterior(μ) ∝ P_prior(μ) L(μ)
likelihood intervals: from L(μ), with no metric
frequentist confidence intervals: Neyman construction
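For concreteness, the three answers for n = 3 can be computed numerically. A sketch of my own (not from the talk), using a flat prior for the Bayesian case, Δln L = 1/2 for the likelihood interval, and the central Neyman construction, all at 68.27%:

```python
import math

def pois_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu)."""
    term, tot = math.exp(-mu), 0.0
    for i in range(k + 1):
        tot += term
        term *= mu / (i + 1)
    return tot

def solve(f, lo, hi):
    """Bisection for f(x) = 0, assuming f changes sign on [lo, hi]."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0: hi = mid
        else: lo = mid
    return 0.5 * (lo + hi)

n = 3
cl = 0.6827
a = (1 - cl) / 2

# 1. Likelihood interval: ln L(mu) = n ln mu - mu (constants dropped);
#    edges where ln L drops by 1/2 from its maximum at mu = n.
lnL = lambda mu: n * math.log(mu) - mu
lik = (solve(lambda m: lnL(m) - (lnL(n) - 0.5), 1e-6, n),
       solve(lambda m: lnL(m) - (lnL(n) - 0.5), n, 30.0))

# 2. Bayesian central interval, flat prior: posterior is Gamma(n+1, 1),
#    whose CDF at x equals P(Poisson(x) >= n+1) = 1 - pois_cdf(n, x).
post_cdf = lambda x: 1 - pois_cdf(n, x)
bayes = (solve(lambda x: post_cdf(x) - a, 1e-6, 30.0),
         solve(lambda x: post_cdf(x) - (1 - a), 1e-6, 30.0))

# 3. Frequentist central interval (Neyman / Garwood):
#    mu_lo: P(N >= n | mu_lo) = a;  mu_up: P(N <= n | mu_up) = a.
freq = (solve(lambda m: (1 - pois_cdf(n - 1, m)) - a, 1e-6, 30.0),
        solve(lambda m: pois_cdf(n, m) - a, 1e-6, 30.0))

print(lik, bayes, freq)
```

The three intervals genuinely differ; note that the flat-prior and central-frequentist *upper* edges coincide exactly, the tie-in for upper limits mentioned later in the talk.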

Page 6:

• So we already have a menu of answers even before we add the first complication!

• Prototype complication: The physics of interest is a rate: mean counts per unit time. The experimenters observed the 3 counts during a time t.

• If t is known with negligible uncertainty, then the interval for the rate is the interval for μ divided by t. But if not, how does one incorporate the uncertainty in t into the interval for the rate?

• It’s quite remarkable how the possibilities proliferate, even for a Bayesian (if minding his/her priors)!

Before addressing this, some further preliminaries:

Page 7:

Preliminaries (cont.): Conditioning
• With one (non-standard) exception, rarely consciously used in HEP.

But likelihood-based analysis tends to build it in. Fitting to the events in one’s hand tends to build it in.

• Frequentist properties are typically checked with something like global ensemble: all possible outcomes if the experimental procedure is repeated, start to finish. Issues of “what would I have done with data I did not get” are dealt with reasonably (in my opinion), especially with help of blind analyses. In fact, implicit conditioning slips in...

• Cox’s two-measuring instrument example is known to some. In our HEP analogs, people intuitively condition on the chosen measuring instrument (e.g., observed decay mode).

• But theory of conditioning at the level of Reid’s 1995 review article is little known in HEP, I think.

Page 8:

Conditioning: Ratio of Poisson Means
Poisson X with unknown mean μ, Poisson Y with unknown mean ν.

Physics interest is λ = μ/ν. Take ν as the n.p.

Define Z = X + Y, Poisson mean τ = μ + ν; binomial ρ = λ / (1 + λ).

For inference on λ, condition on the total number of events z; the problem reduces to binomial inference on ρ from the observed x|z.
Reid (and James/Roos): “...it is intuitively obvious that there is no information on the ratio of rates from the total count...”

Product of x and y Poissons is z Poisson times binomial for x|z
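This factorization is easy to verify numerically. A minimal check (the values of μ and ν are illustrative choices of my own):

```python
import math

def pois(k, m):
    """Poisson pmf P(k; m)."""
    return math.exp(-m) * m**k / math.factorial(k)

def binom(x, n, p):
    """Binomial pmf P(x; n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

mu, nu = 2.5, 4.0
lam = mu / nu          # parameter of interest
tau = mu + nu          # mean of the total Z = X + Y
rho = lam / (1 + lam)  # = mu / (mu + nu), the binomial parameter

# P(x; mu) P(y; nu) = P(z; tau) * Binom(x | z, rho) for every (x, y)
for x in range(6):
    for y in range(6):
        z = x + y
        assert abs(pois(x, mu) * pois(y, nu) - pois(z, tau) * binom(x, z, rho)) < 1e-12
print("factorization holds")
```

Since ρ depends only on λ while τ carries no information about λ, the conditional binomial factor isolates the parameter of interest, which is exactly why one conditions on z.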

Page 9:

Ratio of Poisson Means (cont.)
Poisson X with unknown mean μ, Poisson Y with unknown mean ν.

Physics interest is λ = μ/ν. Take ν as the n.p.

Define Z = X + Y, Poisson mean τ = μ + ν; binomial ρ = λ / (1 + λ).

For inference on λ, condition on the total number of events z; the problem reduces to binomial inference on ρ from the observed x|z.
For inference on the sum τ, marginalize over x and use only the observed z.

Thus we have in one problem a different way to eliminate each of the “nuisance parameters”. When to use each way when the functional form is not so obvious?
Lots of research in this area by Cox, Reid, Fraser, et al.
Start with Fraser/Reid (2003) and Fraser (2004) and work your way backward at least to Barndorff-Nielsen (1983, 1986).

Page 10:

Ratio of Poisson Means (epilog)

The standard intervals over-cover badly due to discreteness. There are no values of the means μ,ν which give coverage of λ even close to nominal. I played around for ten years and eventually constructed confidence intervals for λ which are subsets (proper subsets with two exceptions) of the “standard” intervals.

So shorter in any metric, yet still (over)cover!

Artifact of discreteness.

Turns out to have an analog in a 60-year-old argument over 2×2 contingency tables.

Are my intervals really “improved”? They violate “intuitively obvious” conditioning on purpose, in order to use a frequentist P on total z to average out discreteness.

More on the construction below.
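The over-coverage from discreteness is already visible in the conditional binomial sub-problem. A sketch of my own (standard Clopper-Pearson intervals, not the improved construction of the talk): scan the true coverage of nominal 90% intervals for n = 10 trials.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def cp_interval(x, n, alpha=0.10):
    """Clopper-Pearson interval for p, by bisection (no scipy)."""
    def root(f):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0: hi = mid
            else: lo = mid
        return 0.5 * (lo + hi)
    p_lo = 0.0 if x == 0 else root(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    p_up = 1.0 if x == n else root(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return p_lo, p_up

n = 10
intervals = [cp_interval(x, n) for x in range(n + 1)]

# True coverage as a function of p: sum the pmf over the x whose
# interval contains p.
coverages = []
for i in range(1, 100):
    p = i / 100
    cov = sum(math.comb(n, x) * p**x * (1 - p)**(n - x)
              for x in range(n + 1)
              if intervals[x][0] <= p <= intervals[x][1])
    coverages.append(cov)
print(min(coverages), max(coverages))  # never below 0.90, often well above
```

The coverage never dips below nominal but exceeds it substantially for most p: the discreteness "slack" that the improved construction exploits.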

Page 11:

Conditioning in HEP on an Inequality
Poisson signal X with unknown mean μ; indistinguishable Poisson background Y with known mean b. (The next talk will replace b with an unknown, and it becomes a nuisance parameter.)

The observable is Z=X+Y. Interval for μ is desired. Most of the discussion is in the context of upper limits on μ.

Central confidence intervals can be the empty set (e.g., z = 2, b = 6).

Bayesian with uniform prior for μ behaves pleasingly to many physicists, and ties on to frequentist answer for upper limit (above).

In 1989, Zech found a frequentist interpretation of the Bayesian formula by putting in the condition y ≤ z. It is popular, and has been generalized and even rediscovered by a statistician. But its usefulness, even among advocates, seems to be restricted to upper limits.

Confidence intervals from inverting the likelihood ratio test à la Kendall & Stuart (Feldman-Cousins 1998) are widely used, but violate the L.P. blatantly when z = 0, since the interval depends on b.

Page 12:

Nuisance Parameters in Bayesian Intervals

In the Bayesian world, all the difficulties with n.p. are pushed (where else?) into the prior pdfs for the n.p. Then integrate them out to get the marginal posterior.

In HEP, uniform priors are the default. The D0 expt popularized uniform in the Poisson mean (cross section). Little attention is paid to the lack of invariance, except by critics such as…

Should we be thoroughly investigating the Reference Priors of Bernardo (and Berger)? See articles by Liseo and Berger, together and separately. Stimulating article by HEP’s Demortier (2004). And the talk yesterday. Are the HEP Bayesians following up?

Are there sensitivity studies of priors for n.p.? What is the real goal? Good frequentist properties?

Page 13:

Nuisance Parameters in Likelihood Intervals

I learned that this has been an active area of statistics research, with developments which are not well known in HEP. The HEP standard for decades: the method of MINOS in MINUIT, i.e., the profile likelihood. A period of confusion on both sides followed the presentation by Rolke (and Lopez) at the FNAL CLW (2000). Nice study of coverage by Rolke, Lopez, and Conrad (2005).

Meanwhile statisticians have been busy “modifying” and “adjusting” the profile likelihood. The idea is to speed asymptotic convergence from 1/√n to (1/√n)³.

Many points of view: conditioning, marginalizing, integrating. Key concept: orthogonal variables. Intro by Reid at PhyStat03.

Page 14:

We know the profile likelihood has some “issues”. Even something as simple as the variance of Gaussian data is biased if the mean is unknown: it doesn’t account for the removal of a degree of freedom by fitting the mean.

Nuisance Parameters in Likelihood Intervals (cont.)

I find the ideas in, for example, Fraser 2004 very stimulating. The basic idea: with a single parameter of interest, one can make all the nuisance parameters locally “orthogonal” to it: in the information matrix, the off-diagonal elements involving the parameter of interest vanish. Then marginalizing magically gives the desired (approximate) conditioning. (At this point Sir David and Nancy can explain…)
But: is (1/√n)³ instead of 1/√n worth the trouble? (Especially when n = 0?) For what values of n do we really win? If we give up on direct construction and go for approximation, do Bayesian-inspired black boxes perform just as well? (Berger!)

Page 15:

Nuisance Parameters in Confidence Intervals
(Real, old-fashioned Neyman construction, or in more modern terms, inverting hypothesis tests on parameter values.)

I. Full-blown construction for each value of the nuisance parameter; take the union of all confidence sets over the n.p.
Without care, this can badly over-cover, since each true value of the nuisance parameter benefits from parts of the acceptance region brought in by other values of that n.p.
I did the construction for the ratio of Poisson means, and after several tries, found confidence sets that were subsets of the “most powerful” ones, exploiting a loophole in the theorem due to discreteness. (RC (1998))
Shortly afterward, Feldman and I finished our paper and discussed treating nuisance parameters with the same construction. We decided to try an approximate method first. (See GF talk at FNAL 2000); divert to that for the moment…

Page 16:

Nuisance Parameters in Confidence Intervals

II. Do the Neyman construction, but instead of constructing for all values of the nuisance parameters, use the conditional M.L. estimate. This is also straight out of Kendall and Stuart; the classic inversion of the likelihood ratio test, with an approximation to make it generally tractable and scalable.

From GF talk at FNAL CLW 2000
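A sketch of method II on the common "on/off" problem, with hypothetical numbers of my own: signal region z ~ Poisson(μ + b), sideband y ~ Poisson(τb) with known scale τ; the nuisance b is profiled out analytically (the stationarity condition is a quadratic in b) and the likelihood-ratio test inverted with the asymptotic χ²(1 dof) cutoff.

```python
import math

def profile_lnL(mu, z, y, tau):
    """ln L(mu, b_hat(mu)) with b profiled out in closed form:
    d(lnL)/db = 0 gives z/(mu+b) + y/b = 1 + tau, a quadratic in b."""
    a = 1.0 + tau
    c1 = a * mu - z - y
    b = (-c1 + math.sqrt(c1 * c1 + 4.0 * a * y * mu)) / (2.0 * a)
    lnL = -(mu + b) - tau * b
    if z: lnL += z * math.log(mu + b)
    if y: lnL += y * math.log(tau * b)
    return lnL

z_obs, y_obs, tau = 10, 6, 2.0          # hypothetical counts; b_hat = y/tau = 3
mu_hat = max(0.0, z_obs - y_obs / tau)  # global MLE of the signal mean
ln_max = profile_lnL(mu_hat, z_obs, y_obs, tau)

# Invert the profile-likelihood-ratio test: keep mu with
# -2 dln L <= 2.706 (the 90% chi^2, 1 dof, cutoff).
grid = [0.01 * i for i in range(1, 2500)]
inside = [m for m in grid
          if 2.0 * (ln_max - profile_lnL(m, z_obs, y_obs, tau)) <= 2.706]
print(min(inside), max(inside))
```

MINOS does the same profiling numerically; this tiny problem just happens to admit a closed-form b̂(μ), which makes the approximation transparent.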

Page 17:

Nuisance Parameters in Confidence Intervals
I. Full-blown construction (cont.)

Kyle Cranmer did the full-blown treatment for the LR ordering, and presented it at SLAC PhyStat2003! Of course he can also do the approximate method. See the next talk!

Gary and I did not realize this – Louis can no longer make us feel guilty that we never provided a written explanation of Gary’s impenetrable slide at the FNAL CLW. Next time, ask Kyle!

Page 18:

Sorry, you had to be there!

Page 19:

Nuisance Parameters in Confidence Intervals
III. Do the Neyman construction, but average over the nuisance parameter in a Bayesian-like way, either at the end or in the likelihood function.

1992: Prototype by Highland and RC (the rate μ/t), with a handy approximate formula – others had also been doing it intuitively. Fred James explained to us what we were doing.
Widely used and generalized. Barlow (2002) put it into his “calculator”. For upper limits, the “answer” is similar to a Bayesian answer, within some caveats.
Tends to over-cover, sometimes by a lot. Various claims of under-coverage at 90% C.L. were debunked. Conrad and collaborators have published studies of coverage, including the extension to FC. See also Lista.
Cranmer took it out to 5σ (3×10⁻⁷, less than “humaine” we learned yesterday) in a background nuisance parameter, and says it undercovers badly – see the next talk.
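A sketch of method III in its simplest form (a generic toy in the spirit of the Cousins-Highland prototype, not the original code; the counts and background values are hypothetical): smear the background mean b with a truncated Gaussian when computing the tail probability used to set a 90% upper limit.

```python
import math

def pois_cdf(k, m):
    """P(N <= k) for N ~ Poisson(m)."""
    term, tot = math.exp(-m), 0.0
    for i in range(k + 1):
        tot += term
        term *= m / (i + 1)
    return tot

def smeared_tail(mu, z, b0, sigma_b, n_grid=400):
    """P(N <= z | mu + b), averaged over a Gaussian prior for b
    centered at b0, truncated at b >= 0 (a common HEP convention)."""
    num = den = 0.0
    for i in range(n_grid):
        b = b0 - 4.0 * sigma_b + 8.0 * sigma_b * (i + 0.5) / n_grid
        if b < 0.0:
            continue
        w = math.exp(-0.5 * ((b - b0) / sigma_b) ** 2)
        num += w * pois_cdf(z, mu + b)
        den += w
    return num / den

def upper_limit(z, b0, sigma_b, cl=0.90):
    """Bisect for the mu whose smeared tail probability equals 1 - cl."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if smeared_tail(mid, z, b0, sigma_b) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# z = 5 observed, background 3.0: negligible vs. sizable smearing.
u_exact = upper_limit(5, b0=3.0, sigma_b=0.001)  # unsmeared limit, ~6.27
u_smeared = upper_limit(5, b0=3.0, sigma_b=1.5)
print(u_exact, u_smeared)
```

The averaging happens inside an otherwise frequentist tail-probability calculation, which is exactly why the coverage of the result has to be checked rather than assumed.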

Page 20:

Pause to reflect: Only in the last week did I start to think seriously about:

Do we really always want to do any of this?
• A.W.F. Edwards (in Kalbfleisch 1970): “Let me say at once that I can see no reason why it should always be possible to eliminate nuisance parameters. Indeed, one of the many objections to Bayesian inference is that it always permits this elimination.”

• As we saw yesterday in Holmes’s talk, judiciously chosen 2D plots can tell us something that is lost by marginalizing, conditioning, or integrating.

Page 21:

Conclusions (I)

The statisticians argue about a lot of the same points we do, but with several decades’ head start.

I hope my HEP colleagues will find the bibliography stimulating. In my writeup I say a word or two about each entry.

(More conclusions follow.)

Page 22:
Page 23:
Page 24:
Page 25:
Page 26:

Another one added in proof: J. Linneman, Measures of Significance in HEP and Astrophysics, SLAC PhyStat 2003.

Page 27:

Conclusions (II): With today’s tools
1. If you are doing a completely Bayesian analysis, beware the curse of dimensionality. Ask yourself how you will answer a student who says “What does P mean?”

2. Running your problem through MINUIT MINOS (profile likelihood) should be mandatory just to get a sense of the likelihood contours and an approximate answer.

3. Step 2 should help you (or not) establish that it really makes sense to eliminate nuisance parameters in your application.

4. All users of LR ordering (à la FC) can now have examples of approximate methods by GF and KC (and it is still in K&S); for those with the fortitude, Kyle will help them with a full-blown construction.

5. No matter how you did it, explore frequentist coverage.

Page 28:

Conclusions (III): Some next steps?
1. It’s time HEP tried out reference priors and found their warts as well. Although statisticians think they have good coverage properties, I am a little skeptical of anything which is the Jeffreys prior in 1D. For our Poisson μ, the Jeffreys prior wasn’t even used by Jeffreys (who seemed to prefer his argument for 1/μ). And in HEP, it’s not good at very low n.

2. Conditioning when appropriate should be part of our standard, conscious thinking. See RC CERN CLW (2000): What is the ensemble? Also Prosper. We ought to at least be aware of arguments against “global” ensemble. When to marginalize?

3. If someone has the fortitude, try to implement modified profile likelihood ideas in a few difficult cases, and see if they help us. Compare to integrating out the n.p. in an otherwise frequentist likelihood analysis.

Page 29:

Thanks to all, especially:

(the late) Virgil Highland; Fred James; Gary Feldman; Louis Lyons; and Günter Zech…

(who do not always agree with me on everything – that’s part of the fun)

and to the statisticians who so generously and gently help us “amateurs” (in the French sense of the word).