
Advanced Statistical Techniques in Particle Physics

Conference Summary (Thanks to Bob Cousins!)

Jim Linnemann, MSU HEP Seminar

23 April, 2002

Conference Overview: Durham, UK
– 5 days, nearly no rain!

• Mixture of “theoretical” and practical
– Overview/Tutorial talks
– Systematic Comparisons of Methods
– New Developments
– Problems

• Visiting (tolerant!) Statisticians:
– Michael Goldstein
– Wolfgang Rolke

• Radical idea: if a phenomenologist can be in the collaboration, why not a professional statistician (à la medical research)?

http://www.ippp.dur.ac.uk/statistics/

Tutorials, Overviews, Explanations

• Fred James:
– Overview
– Goodness of Fit vs. Intervals

• Roger Barlow:
– Systematics: mistakes, effects, errors

Multidimensional:

• Sherry Towers:
– PDEs
– Reducing variables in classification

• Harrison Prosper:
– Multidimensional methods

• Tony Vaiculis:
– Support Vector Machines

• Niels Kjaer:
– Monte Carlo: interpolating (+ much else)

• Pekka Sinervo:
– Significance

• Berkan Aslan (G. Zech):
– Goodness-of-Fit measures

• Glen Cowan, Volker Blobel:
– Unfolding

• Paul Harrison:
– Blind Analysis

Theory, Practice, and Methods

• Chris Parkes
– Combining LEP W results

• Gary Hill, Tyce De Young
– Bayes in AMANDA tracking

• Rudy Bock, Wolfgang Wittek
– Multidimensional methods for gamma/hadron separation

• Volker Blobel
– Global Alignment Fits

• Alex Read
– CLs

• Dean Karlen
– Credibility of Confidence Intervals

• Raja
– Uncertainty of Limits

Problems to Chew On

• Nigel Smith and Dan Tovey
– Dark Matter Searches

• Bruce Yabsley
– Statistics in Practice at Belle

Fred James

Important not to confuse these problems, e.g., interval estimation and goodness-of-fit testing.

• Parameter-fitting criterion
– Hypothesis-testing vs. parameter-fitting criteria (cited from J.C. Collins, J. Pumplin, hep-ph/0105207, p. 3)


Roger Barlow

Calculated the σ to use for comparison checks…

Roger Barlow

Multidimensional Methods

• Aspire to full extraction of information
• Equivalent to trying to fit P(signal)/P(background) (Neyman-Pearson)

• Issues in
– choice of dimensionality (no one tells you how many!)
– methods of approximation
– control of bias/variance tradeoff
– complexity of fit
– number of free parameters
– amount of training data needed
– “ease” of interpretation

• We are following the field; hope for theory to help
• See: The Elements of Statistical Learning
– Hastie, Tibshirani, Friedman

Sherry Towers


Wow! Several questions come to my mind…

[In the general case, variable deletion is safer than variable addition. –M.G.]

Harrison Prosper

• Thumbnail sketch of some methods of interest:
– Fisher Linear Discriminant
– Principal Components Analysis
– Independent Component Analysis
– Self-Organizing Map
– Grid Search
– Probability Density Estimation
– Neural Networks
– Support Vector Machines

• Said these are all attempts to solve the single classification problem whose solution is the Bayes discriminator D(x) = P(S|x)/P(B|x) = (L(S)/L(B)) (P(S)/P(B)) … = Neyman-Pearson when P(S) = P(B)

• Multivariate analysis is hard: it is important to use all the information used by D(x) (which might be lost, e.g., by marginalization). It appears that there is no single optimal approximation.
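A minimal numerical sketch of that relation, with toy one-dimensional Gaussian signal and background models invented purely for illustration: the Bayes discriminator is the likelihood ratio times the prior class ratio, and reduces to the Neyman-Pearson ratio when the priors are equal.

```python
import math

def gauss(x, mu, sigma):
    """Normal pdf, standing in for a toy one-dimensional likelihood."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_discriminator(x, p_s=0.5, p_b=0.5):
    """D(x) = (L(S)/L(B)) * (P(S)/P(B)); with equal class priors this is
    exactly the Neyman-Pearson likelihood ratio."""
    l_s = gauss(x, 1.0, 1.0)    # assumed toy signal model
    l_b = gauss(x, -1.0, 1.0)   # assumed toy background model
    return (l_s / l_b) * (p_s / p_b)
```

Cutting on D(x), or any monotone function of it such as D/(1+D) = P(S|x), gives the most powerful test at each efficiency; the practical difficulty the talk stresses is estimating L(S)/L(B) in many dimensions.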

SVM

Vaiculis


Unfolding (unsmearing)

• Inherently unstable: the measured distribution is smoother than the true one, so un-smoothing enhances noise!

• Nice discussion of regularization, biases, uncertainties

• See talk – and his statistics book

One program:

• Must balance between oscillations and an over-smoothed result

• = Bias-variance tradeoff
– Same issues in multidimensional methods

Glen Cowan

Unfolding: Insight

• View as a matrix problem
• “ill-posed” = singular
• Analyze in terms of eigenvalues/vectors and the condition number

V. Blobel

Truncate eigenfunctions when they fall below the statistical error bound
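That eigenmode picture can be sketched directly (toy tridiagonal response matrix and numbers invented for illustration; an ad-hoc threshold on the singular values stands in for the error-bound criterion):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy response matrix: each true bin leaks 25% into each neighbour
n = 8
R = np.zeros((n, n))
for i in range(n):
    R[i, i] = 0.5
    if i > 0:
        R[i, i - 1] = 0.25
    if i < n - 1:
        R[i, i + 1] = 0.25

truth = np.array([5., 8., 12., 15., 15., 12., 8., 5.]) * 100
measured = R @ truth + rng.normal(0.0, 10.0, size=n)   # smearing + noise

# Naive inversion: noise along the small-singular-value (high-frequency)
# modes is amplified by 1/s_k
naive = np.linalg.solve(R, measured)

# Regularization: expand in singular modes and drop the poorly
# measured ones instead of amplifying them
U, s, Vt = np.linalg.svd(R)
coeff = U.T @ measured
keep = s > 0.1 * s[0]            # ad-hoc truncation threshold (assumed)
regularized = Vt.T @ np.where(keep, coeff / s, 0.0)
```

Here cond(R) is about 30, so the naive solution scatters far more than the truncated one; the price is that the discarded high-frequency modes are simply reported as unmeasured (the "report fewer bins" option above).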

Unfolding Results

High frequencies not measured:

Report fewer bins? (or supply from prior??)

Statistical errors in neighboring bins are no longer uncorrelated!

Blobel

oversmoothed

Higher modes converge slowly: iterate only a few times (d’Agostini)
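A bare-bones sketch of that iterative scheme (pure Python, toy matrices and numbers invented here; real implementations add error propagation and smoothing): each pass applies Bayes' theorem with the previous estimate as prior, and stopping after a few iterations is what keeps the slowly converging high-frequency modes from being amplified.

```python
def dagostini_unfold(R, measured, n_iter=4):
    """Iterative (d'Agostini-style) unfolding.
    R[j][i] = P(measured bin j | true bin i); stopping early regularizes."""
    n_meas, n_true = len(R), len(R[0])
    # efficiency: probability that true bin i is observed at all
    eff = [sum(R[j][i] for j in range(n_meas)) for i in range(n_true)]
    u = [sum(measured) / n_true] * n_true     # flat starting prior
    for _ in range(n_iter):
        # fold the current estimate forward ...
        folded = [sum(R[j][k] * u[k] for k in range(n_true)) for j in range(n_meas)]
        # ... then redistribute the measured counts back via Bayes' theorem
        u = [sum(R[j][i] * measured[j] / folded[j] for j in range(n_meas)) * u[i] / eff[i]
             for i in range(n_true)]
    return u
```

With an identity response the data are returned unchanged, and for unit efficiencies each iteration conserves the total number of events.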

N.J. Kjaer (I)

DELPHI MC

Re-interpretation of data to interpolate in the physics parameters

Analogy with Stat Mech MC techniques?

Aslan/Zech: Goodness of Fit

“Energy Test” (electrostatics motivated)

Aslan/Zech

Paul Harrison

Blind Analysis

Cousins: it takes longer, especially first time

“liberating”

By the way, no one can read light green print…

Blind Analysis

• A called shot
– Step towards making 3σ mean 3σ

• Many ways to blind
– 10% of data; background-only; obscure the fit result

• Creates a mindset
– Avoiding biases and subjectivity

R.K. Bock


R.K. Bock, Durham, March 2002

Method details and comments: composite probabilities (2-D)

• Intuitive determination of event probabilities by multiplying the probabilities in all 2-D projections that can be made from the image parameters, using constant bin content for some data

• Shown on some IACT data to at least match the best existing results (but strict comparisons suffered from moving data sets)
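The multiply-all-2-D-projections idea can be sketched as follows. This is a toy reconstruction, not Bock's program: true same-content 2-D binning is approximated here by per-axis quantile edges computed from the reference sample.

```python
import bisect
import itertools

def quantile_edges(values, nbins):
    """Bin edges putting (roughly) equal numbers of reference events in
    each 1-D bin -- a crude stand-in for same-content binning."""
    v = sorted(values)
    return [v[len(v) * k // nbins] for k in range(1, nbins)]

def composite_probability(event, reference, nbins=4):
    """Score an event by multiplying, over every 2-D projection of the
    variables, the fraction of reference (e.g. gamma) events falling in
    the same 2-D cell as the event."""
    nvar = len(event)
    edges = [quantile_edges([r[d] for r in reference], nbins) for d in range(nvar)]

    def bin_of(x, d):
        return bisect.bisect_right(edges[d], x)

    prob = 1.0
    for a, b in itertools.combinations(range(nvar), 2):
        cell = (bin_of(event[a], a), bin_of(event[b], b))
        frac = sum(1 for r in reference
                   if (bin_of(r[a], a), bin_of(r[b], b)) == cell) / len(reference)
        prob *= frac
    return prob
```

Events that resemble the reference (gamma) sample land in well-populated cells of every projection and get a large product; background events fall into sparse cells in at least some projections and are suppressed.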


The CP program uses same-content binning in 2 dimensions: bins are set up for gammas (red), probabilities are evaluated for protons (blue), and all possible 2-D projections are used.


R.K. Bock

Interesting idea:

Expansion in dimensionality of correlation

R.K. Bock


(Hill/DeYoung)


Very Interesting Technique!

• Let’s relate it to something we do: say particle ID in a detector:
– In the hot part of the detector near the beam there is lots of background, so we tighten particle-ID cuts
– In the lower-occupancy part of the detector away from the beam, we can loosen certain particle-ID cuts without letting in a lot of background

• Use our knowledge of position-dependent occupancy rates in Bayes’s Theorem to calculate the probability that a given particle in a given location is the species of interest.

• If all input P’s are frequentist P’s, the output P(particle type | data) is a frequentist P.

• We can use this posterior frequentist P like any other observable for cuts, weights, etc. If we independently calibrate the signal efficiency/ background rejection of this use, there is nothing circular about using our knowledge of the input occupancies.

• If the input occupancy knowledge is imperfect it will not introduce a bias, but rather make the technique less powerful.
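A minimal sketch of the posterior just described, with hypothetical numbers standing in for the measured response probabilities and occupancies:

```python
def p_species(p_response, occupancy, species):
    """Posterior P(species | detector response, location) via Bayes' theorem.
    p_response: dict species -> P(observed response | species)
    occupancy:  dict species -> relative rate of the species at this location
    If every input is a frequentist rate, the output is a frequentist P."""
    norm = sum(p_response[s] * occupancy[s] for s in occupancy)
    return p_response[species] * occupancy[species] / norm

# Hypothetical illustration: the same detector response is less
# convincing in the busy region near the beam than in a quiet region.
response = {"kaon": 0.8, "pion": 0.1}     # assumed P(response | species)
near_beam = {"kaon": 0.05, "pion": 0.95}  # assumed occupancies near the beam
far_from_beam = {"kaon": 0.30, "pion": 0.70}
```

With these numbers P(kaon | response) comes out ≈ 0.30 near the beam but ≈ 0.77 away from it: exactly the tighten-near-the-beam behaviour of the cuts, now expressed as a single posterior that can be cut on or used as a weight.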

Comments:

Bayes’s Theorem applies to any P satisfying the axioms of probability

• Frequentist P: limiting frequency
– The theorem is not much use if the unknown is a constant of nature: P(unknown) = delta function at the unknown value

• Bayesian P: degree of belief
– For a constant of nature, P(unknown) can be a combination of a delta function and a continuous function, reflecting degree of belief

• Is the AMANDA technique “Bayesian”?
– Not if “Bayesian” implies “not frequentist”, as I think is common, even though frequency P is emulated in a certain application/limit of degree of belief.

• In any case, instructive example!

Chris Parkes

Practicalities of Combining Analyses: W Physics Results at LEP

Now the stuff you don’t normally see…

An important reminder: pragmatic considerations (sometimes even irrational ones) can be as important as principles in getting a result out.

RC: An informative talk about both methodology and sociology!

Parkes

• LEP experiments contained a sizable fraction of the world HEP community, and reached a very mature state of analysis.
– We have much to learn from them, both theoretical and practical.

This talk is not for the squeamish or over-idealistic, but it is a vivid description of the real world in action!

Cousins:

Studies of Intervals

• Byron Roe and Michael Woodroofe: MiniBooNE

• Jan Conrad: Coverage with Systematics

• Rolke and Lopez: Bias correction via double-bootstrap

• Giunti and Laveder: the “power” of confidence intervals

• Punzi: Strong Confidence Intervals

• Giovanni Signorelli et al.: Strong C.I. and systematics

Dean Karlen’s Proposal to Evaluate Credibility of Confidence Intervals

• Yesterday evening, generally interested-to-favorable reaction

• Cousins: I’m the outlier: I think it will only encourage unthinking “easy” use of Bayes, with more flat (i.e., not degree-of-belief) priors.

• We evaluate Bayesian intervals with serious frequentist methods.

• Why not evaluate confidence intervals with serious Bayesian methods? One metric-dependent prior constituteth not a sensitivity analysis.

• Who was it who said “How do you know that the outlier isn’t right?”
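The frequentist half of that program, checking any interval recipe's coverage by brute force, takes only a few lines of toy code (the ±√n recipe below is a hypothetical stand-in, not a recipe from the talk):

```python
import math
import random

def coverage(interval_fn, mu_true, n_trials=5000, seed=3):
    """Frequentist evaluation of an interval recipe: repeat the Poisson
    experiment many times at a fixed true mean and count how often the
    reported interval covers it."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Draw n ~ Poisson(mu_true) by inverting the CDF
        u = rng.random()
        n, p = 0, math.exp(-mu_true)
        c = p
        while u > c:
            n += 1
            p *= mu_true / n
            c += p
            if p == 0.0:      # guard against float underflow in the tail
                break
        lo, hi = interval_fn(n)
        if lo <= mu_true <= hi:
            hits += 1
    return hits / n_trials

# Hypothetical recipe to evaluate: the naive n +/- sqrt(n) interval
naive = lambda n: (n - math.sqrt(n), n + math.sqrt(n))
```

For a true mean of 100 the naive recipe covers close to the Gaussian 68%; Karlen's proposal is the mirror image, scoring confidence intervals by Bayesian credibility instead of scoring credible intervals by coverage.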

Alex Read’s Beautiful Talk on CLs

• CLs = PV(s+b)/PV(b) (Cowan, PDG Stats)

– PV = P-value = prob(obs), posterior, like P(χ²)

• Behavior compared to LR Ordering (F-C) is understood and lucidly explained. Application to neutrino oscillations!

• Please see his talk

• Cousins comment: The non-standard conditioning (inequality, not ancillary statistic) of Zech, Roe & Woodroofe, and Read leads to problems with the lower end of confidence intervals (see Cousins’ PRD Comment). Alex recognized this.

• Therefore, Alex now advocates CLs only for limits; in case of a signal, he would now use LR Ordering.
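For a simple counting experiment the CLs definition above reduces to a ratio of Poisson tail probabilities; a minimal sketch with invented numbers:

```python
import math

def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson mean mu."""
    return sum(math.exp(-mu) * mu**n / math.factorial(n) for n in range(n_obs + 1))

def cls(n_obs, s, b):
    """CLs = p-value(s+b) / p-value(b): dividing by the background
    p-value keeps a downward background fluctuation from excluding a
    signal to which the experiment has no sensitivity."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def cls_upper_limit(n_obs, b, step=0.01):
    """Smallest s with CLs <= 0.05, i.e. the 95% CLs upper limit."""
    s = 0.0
    while cls(n_obs, s, b) > 0.05:
        s += step
    return s
```

Note that `cls_upper_limit(0, b)` returns ≈ 3.0 events for any background: with zero observed events CLs = e^(−s) independent of b, so the limit never drops below the sensitivity floor, which is the feature CLs was designed for.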

To Use Bayes or not?

• Professional Statisticians are much more Bayes-oriented in the last 20 years
– Computationally possible
– Philosophically coherent
• (solipsistic?? Subjective Bayes…)

• In HEP: want to publish result, not prior
– We want to talk about P(theory|data)

• But this requires a prior: P(theory)
– Likelihoods we can agree on!
– Conclusions should be insensitive to a range of priors
• Probably true, with enough data

• Search limits DO depend on priors!
– Hard to convince anyone of a single objective prior!!!
– Unpleasant properties of naïve frequentist limits, too

• Feldman-Cousins is current consensus

• Systematic errors are hard to treat in the frequentist framework
– PDG currently recommends Bayes “smearing of the likelihood”
• close in spirit to Cousins-Highland mixed frequentist-Bayesian
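The claim that search limits depend on the prior is easy to check numerically. A minimal sketch (crude grid integration, parameters invented for illustration) of the 90% Bayesian upper limit on a Poisson mean with zero observed events, under a flat prior versus a 1/√μ (Jeffreys) prior:

```python
import math

def bayes_upper_limit(n_obs, cl=0.90, prior="flat", mu_max=30.0, grid=20000):
    """Upper limit mu_up with posterior P(mu <= mu_up | n_obs) = cl,
    by crude midpoint integration of likelihood * prior."""
    dmu = mu_max / grid
    weights = []
    for i in range(grid):
        mu = (i + 0.5) * dmu
        like = math.exp(-mu) * mu ** n_obs       # Poisson likelihood
        pi = 1.0 if prior == "flat" else mu ** -0.5   # "jeffreys": 1/sqrt(mu)
        weights.append(like * pi)
    total = sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= cl * total:
            return (i + 1) * dmu
    return mu_max
```

The same empty data set gives a 90% upper limit of ≈ 2.30 under the flat prior and ≈ 1.35 under 1/√μ, which is why a single "objective" prior convinces no one, and why sensitivity to a range of priors has to be reported.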

Michael Goldstein

• A real pleasure to have you here!
• Since subjective Bayes is rarely used in HEP, but is “known” to be the “coherent” version, it has been very enlightening:

• “Sensitivity Analysis is at the heart of scientific Bayesianism”
– How skeptical would the community as a whole have to be in order not to be convinced?
– What prior gives P(hypothesis) > 0.5?
– What prior gives P(hypothesis) > 0.99? etc.

• There’s a split among Bayesians; M.G. is in the group that sees no virtue in objective (“arbitrary”) priors (except as one of many examples of possible prior beliefs in a sensitivity analysis).

Michael Goldstein (cont.)

• Procedures should obey the likelihood principle. Frequentist methods don’t obey it: fundamental flaw.

• Bayesian methods are hard to do right, but they are the only way to attack certain hard problems.

• Bayes Linear Methodology: addresses expectations rather than whole pdf’s.

• HEP problems: appear to map onto a very similar set of abstract problems.

Cousins would add:

• (Coherent) Subjective priors behave like real probabilities under transformations, unlike, e.g., flat priors.

• M.G. represents only one school of Bayesian stats, but I don’t think you will find a school advocating uniform prior for a Poisson mean.

• M.G. portrays Bayesian methods as hard, but worth the effort. This should be stressed in HEP, where the hard part (subjective prior) is dodged, and the math is (indeed) easily cranked out (without backwards thinking) to give an “answer” that I think is without much content unless evaluated by frequentist standards.

• I think M.G.’s point about sensitivity analysis has to be taken to heart in HEP, whether one uses objective or subjective priors.

Cousins’ Last Words (for now!)

• The area under the likelihood function is meaningless.

• Mode of a probability density is metric-dependent, as are shortest intervals.

• A confidence interval is a statement about P(data | parameters), not P(parameters | data)

• Don’t confuse confidence intervals (statements about parameter) with goodness of fit (statement about model itself).

• P(non-SM physics | data) requires a prior; you won’t get it from frequentist statistics.

• The argument for coherence of Bayesian P is based on P = subjective degree of belief.
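The metric-dependence of the mode is worth a one-screen demonstration (toy example, assumed for illustration): the exponential density e^(−x) has its mode at x = 0, but after the change of variable y = ln x the transformed density, Jacobian included, peaks at y = 0, i.e. at x = 1.

```python
import math

def p_x(x):
    """Exponential density in the x metric: mode at x = 0."""
    return math.exp(-x)

def p_y(y):
    """Same distribution in y = ln(x): p_y(y) = p_x(e^y) * |dx/dy| = exp(y - e^y).
    Its mode is at y = 0, i.e. x = 1 -- a different 'most probable value'."""
    return math.exp(y - math.exp(y))

# Locate the modes numerically on coarse grids
xs = [i / 1000.0 for i in range(1, 5001)]
ys = [-3.0 + i / 1000.0 for i in range(6001)]
mode_x = max(xs, key=p_x)   # at the x -> 0 edge of the grid
mode_y = max(ys, key=p_y)   # at y = 0, i.e. x = 1
```

The same caveat applies to shortest ("highest posterior density") intervals: reparametrize, and a different interval becomes shortest.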