
How good are your fits? Unbinned multivariate goodness-of-fit tests in high energy physics.

Mike Williams

High Energy Physics Group, Department of Physics, Imperial College London

June 28th, 2010

Williams (ICL) June ’10 1 / 27


Outline

1 Introduction

2 Toy Model Analysis

3 GOF Methods & Performance

4 Summary


Introduction


The Curse of Dimensionality

A simple fact: more dimensions → sparser data. This is known as the curse of dimensionality* and it can make a binned multivariate analysis undesirable, because most bins end up with very low occupancies.
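To put numbers on the curse, here is a minimal Python sketch, assuming n events spread uniformly over a grid with the same number of equal-width bins on every axis (n = 1000 and 10 bins per axis are arbitrary illustrative choices). It computes the mean bin occupancy n/10^D and the expected fraction of empty bins, (1 − 10^−D)^n.

```python
# Curse of dimensionality: n events spread uniformly over a D-dimensional
# grid with `bins_per_axis` equal-width bins per axis (illustrative assumption).

def mean_occupancy(n: int, bins_per_axis: int, dim: int) -> float:
    """Average number of events per bin: n / bins_per_axis**dim."""
    return n / bins_per_axis ** dim


def expected_empty_fraction(n: int, bins_per_axis: int, dim: int) -> float:
    """Expected fraction of bins with no events for uniform data: (1 - 1/B)**n."""
    n_bins = bins_per_axis ** dim
    return (1.0 - 1.0 / n_bins) ** n


if __name__ == "__main__":
    for dim in range(1, 6):
        print(f"D = {dim}: mean occupancy = {mean_occupancy(1000, 10, dim):9.3f}, "
              f"expected empty fraction = {expected_empty_fraction(1000, 10, dim):.3f}")
```

For n = 1000 the mean occupancy falls from 100 events per bin in 1-D to 1 event per bin in 3-D, where about 37% of the bins are expected to be empty.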

Because of this curse (or, perhaps, b/c throwing away information isn’t smart), many physicists employ unbinned techniques (e.g. log-likelihood fitting) to analyze multivariate data.

While maximizing $\mathcal{L}$ does provide estimators for any unknown parameters in a PDF, it does not provide any information regarding how well the PDF describes the data; i.e., it is not a measure of goodness of fit!

* R.E. Bellman, Adaptive Control Processes, Princeton University Press, Princeton, NJ (1961).


Physics Mythology

Have you heard this one before (I’ll bet that you have!)?

Determining goodness of fit (GOF) from unbinned fits is an unsolved problem in statistics.

This is a commonly held belief in the high energy physics community that has led to most analyses in our field using the binned χ² test (even in situations where its power is expected to be minimal).

It has also led some physicists to “invent” their own “methods” for determining GOF from unbinned maximum likelihood fits.


A Physicist’s Solution

A method commonly used by <CENSORED> involves:

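performing an unbinned maximum likelihood fit to determine the unknown parameters in a PDF;

generating an ensemble of Monte Carlo data sets from this PDF;

comparing $\mathcal{L}_{\max}$ to the values $\{\mathcal{L}_i\}$ obtained from the ensemble to determine the GOF.

Sounds reasonable but, quite frankly, it’s nonsense! E.g., try applying it to test for uniformity (in any dimension): for a flat PDF every data set of a given size has exactly the same likelihood, so $\mathcal{L}_{\max}$ carries no information about the quality of the fit.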


A Physicist’s Solution

Consider the PDF $f(x) = \frac{1}{X} e^{-x/X}$ [J. Heinrich, arXiv:physics/0310167]. The likelihood has a maximum at

$-\log\mathcal{L}_{\max} = n(1 + \log\bar{X}), \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i.$
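The location of this maximum follows from setting the derivative of $-\log\mathcal{L}$ with respect to $X$ to zero:

```latex
-\log\mathcal{L}(X) = \sum_{i=1}^{n}\left(\log X + \frac{x_i}{X}\right)
                    = n\log X + \frac{n\bar{X}}{X},
\qquad
\frac{d(-\log\mathcal{L})}{dX} = \frac{n}{X} - \frac{n\bar{X}}{X^{2}} = 0
\;\Longrightarrow\; \hat{X} = \bar{X},
\qquad
-\log\mathcal{L}_{\max} = n\log\bar{X} + n = n\left(1 + \log\bar{X}\right).
```

The data enter $-\log\mathcal{L}_{\max}$ only through the sample mean $\bar{X}$.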

Notice that for this PDF:

Any data set w/ mean $\bar{X}$ will have the same GOF value (regardless of the parent PDF)!

The p-value will always be ∼ 50%!

[Figure: Ex.) flat data on 0 < x < 2 tested with f(x) as the test PDF; p-value = 0.52]

This “method” is also not invariant under change of variables, is biased, etc. Don’t use it!
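A minimal Python sketch of the recipe above applied to Heinrich's exponential example, written here only to demonstrate the failure (the function names are hypothetical): fit by taking the sample mean, generate toys from the fitted exponential, refit each toy, and quote the fraction of toys with a larger $-\log\mathcal{L}_{\max}$. Flat data, which are clearly not exponential, still get a p-value near 50%.

```python
import math
import random


def neg_log_lmax(xs):
    """-log L_max for f(x) = (1/X) exp(-x/X). The MLE is X_hat = mean(xs), so
    -log L_max = n (1 + log mean(xs)) depends on the data only through the mean."""
    xbar = sum(xs) / len(xs)
    return len(xs) * (1.0 + math.log(xbar))


def ensemble_pvalue(data, n_toys=1000, rng=None):
    """The criticized recipe: fit, generate toys from the fitted PDF, refit each
    toy, and quote the fraction of toys with a worse (larger) -log L_max."""
    rng = rng or random.Random()
    xbar = sum(data) / len(data)            # 1) unbinned ML fit: X_hat = mean
    observed = neg_log_lmax(data)
    worse = 0
    for _ in range(n_toys):                 # 2) toy ensemble from the fitted PDF
        toy = [rng.expovariate(1.0 / xbar) for _ in data]
        if neg_log_lmax(toy) >= observed:   # 3) compare L_max to the toy {L_i}
            worse += 1
    return worse / n_toys


if __name__ == "__main__":
    rng = random.Random(1)
    flat = [rng.uniform(0.0, 2.0) for _ in range(1000)]  # clearly not exponential
    print("p-value for flat data:", ensemble_pvalue(flat, rng=rng))
    # Two very different data sets with the same mean get identical values:
    print(neg_log_lmax([1.0, 1.0, 1.0, 1.0]), neg_log_lmax([0.0, 0.0, 2.0, 2.0]))
```

Because $-\log\mathcal{L}_{\max}$ depends only on $\bar{X}$, and the toys are generated with exactly that mean, the comparison reduces to asking whether a toy mean exceeds its own expectation, which happens about half the time whatever the data look like.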


General GOF Observations

Some observations/advice for physicists:

avoid using any statistical method that is not published in a statistical journal;

physicists probably can’t invent new statistical methods (we can re-invent them though);

there is no uniformly most powerful GOF test (even in 1-D, which test is the best depends on the analysis);

other scientific fields are way ahead of HEP when it comes to multivariate GOF determination.


The Point of this Talk

In the rest of this talk I’ll:

show you a few of the many unbinned multivariate GOF methods that are published in the statistical journals;

apply them to an example real-world HEP analysis;

(hopefully) inspire you to use these types of methods in your analyses!


Toy Model Analysis

I want to perform a Dalitz-plot analysis of $X \to abc$, where $J^P_{X,a,b,c} = 0^-$, $m_X = 1$ and $m_{a,b,c} = 0.1$. The model includes a non-resonant term and 6 resonances ($J^P = 0^+, 1^-, 2^+$).

I will consider 3 population sizes; these are shown below. For each sample size, an ensemble of 100 data sets will be studied.

[Figure: Dalitz plots ($m^2_{ac}$ vs $m^2_{ab}$) for the three sample sizes: low (n = 100), medium (n = 1000) and high (n = 10000)]
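The talk does not say how the toy events were generated. One assumed way to produce events for this kinematics is sketched below: a point $(m^2_{ab}, m^2_{ac})$ is accepted only if it lies inside the standard Dalitz-plot boundary (computed from the energies of a and c in the ab rest frame), and hit-or-miss sampling against a user-supplied PDF generates the events. With a flat PDF this gives pure phase space, which is uniform in the Dalitz variables. The names `in_dalitz` and `sample_dalitz` are hypothetical, and no resonance model is included.

```python
import math
import random

# Toy kinematics from the talk: X -> abc with m_X = 1 and m_a = m_b = m_c = 0.1.
M_X, M_A, M_B, M_C = 1.0, 0.1, 0.1, 0.1


def _momentum(energy, mass):
    """|p| from E and m, clipped at zero against rounding."""
    return math.sqrt(max(energy * energy - mass * mass, 0.0))


def in_dalitz(m2ab, m2ac):
    """True if (m_ab^2, m_ac^2) lies inside the Dalitz-plot boundary."""
    if not (M_A + M_B) ** 2 <= m2ab <= (M_X - M_C) ** 2:
        return False
    mab = math.sqrt(m2ab)
    e_a = (m2ab + M_A ** 2 - M_B ** 2) / (2.0 * mab)   # E_a in the ab rest frame
    e_c = (M_X ** 2 - m2ab - M_C ** 2) / (2.0 * mab)   # E_c in the ab rest frame
    p_a, p_c = _momentum(e_a, M_A), _momentum(e_c, M_C)
    lo = (e_a + e_c) ** 2 - (p_a + p_c) ** 2
    hi = (e_a + e_c) ** 2 - (p_a - p_c) ** 2
    return lo <= m2ac <= hi


def sample_dalitz(n, pdf=None, pdf_max=1.0, rng=None):
    """Hit-or-miss sampling of n points (m_ab^2, m_ac^2) from `pdf` inside the
    Dalitz plot; pdf=None means flat, i.e. pure phase space. `pdf_max` must be
    an upper bound of `pdf` over the plot."""
    rng = rng or random.Random()
    x_lo, x_hi = (M_A + M_B) ** 2, (M_X - M_C) ** 2
    y_lo, y_hi = (M_A + M_C) ** 2, (M_X - M_B) ** 2
    points = []
    while len(points) < n:
        x, y = rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)
        weight = 1.0 if pdf is None else pdf(x, y)
        if in_dalitz(x, y) and rng.uniform(0.0, pdf_max) < weight:
            points.append((x, y))
    return points


if __name__ == "__main__":
    events = sample_dalitz(1000, rng=random.Random(0))
    print(len(events), events[0])
```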


Toy Model Analysis

For each data set, the GOF of the following PDFs will be analyzed:

Model: The PDF used to generate the data. The p-value distributions obtained for each GOF method must be flat for this PDF!

Fit I: The full PDF fit to each data set w/ all resonance parameters free (15 free parameters). Each data set has its own Fit I PDF. The p-value distributions should be flat (modulo some small test bias).

Fit II: Fit I but w/ $R^{bc}_1$ removed (10% fit fraction). The power of each GOF method will be judged by how well it rejects Fit II.

Fit III: Fit I but w/ the non-resonant term removed (1% f.f.). The power of each GOF method will also be judged by how well it rejects Fit III. N.B. this PDF is very similar to one where the background PDF is slightly wrong.


GOF Methods & Performance


GOF Method Categories

Not only is this not “an unsolved problem in statistics”, but there are actually way too many methods to test them all. I’ve divided up the methods I’ve found into the following 5 categories:

mixed-sample methods;

point-to-point dissimilarity methods;

distance to nearest-neighbor methods;

local-density methods;

kernel-based methods.

I’ve chosen one method from each category (typically the “landmark” one) to apply to my toy Dalitz-plot analysis.


Mixed Sample Methods

These methods use the fact that if two data sets are combined to form a pooled sample, then the mixing of the two samples is only optimal if they share the same parent PDF.

[Figure: two pooled samples in the (x, y) unit square; left: $f_1(x,y) = f_2(x,y)$, right: $f_1(x,y) \neq f_2(x,y)$]


Mixed Sample Methods

Consider two samples with $n_1$ and $n_2$ events, respectively. The test statistic*

$T = \frac{1}{n_k(n_1 + n_2)} \sum_{i=1}^{n_1+n_2} \sum_{k=1}^{n_k} \delta_{\mathrm{sample}(i),\,\mathrm{sample}(k)},$

where $k$ labels the $k$-th nearest neighbor of event $i$ in the pooled sample, is simply the mean fraction of like-sample nearest-neighbor events. It should be larger if $f_1 \neq f_2$ due to the lack of complete mixing of the samples.

If $f_1 = f_2$ then T is (asymptotically) normally distributed (we can get a pull!) with:

$\mu_T = \frac{n_1(n_1 - 1) + n_2(n_2 - 1)}{n(n - 1)}$, where $n = n_1 + n_2$;

$\sigma_T$ can be approximated under certain (not very restrictive) conditions.

*M.F. Schilling, J. Amer. Statistical Assoc. 81, No. 395 (1986) 799-806.

*N. Henze, Ann. Stat. 16, No. 2 (1988) 772-783.
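A brute-force Python sketch of the statistic T and its null mean $\mu_T$ follows (points are coordinate tuples; all function names are hypothetical). Because the analytic approximation for $\sigma_T$ is not given here, `schilling_pull` swaps in an assumed stand-in: it estimates $\sigma_T$ by randomly re-labelling the pooled events, which reproduces the no-difference hypothesis $f_1 = f_2$.

```python
import math
import random
import statistics


def schilling_T(sample1, sample2, nk=10):
    """Mean fraction of like-sample events among the nk nearest neighbours of each
    event in the pooled sample (brute force; points are coordinate tuples)."""
    pooled = [(p, 0) for p in sample1] + [(p, 1) for p in sample2]
    like = 0
    for i, (xi, si) in enumerate(pooled):
        # (distance, sample label) for every other event, nearest first
        neighbours = sorted((math.dist(xi, xj), sj)
                            for j, (xj, sj) in enumerate(pooled) if j != i)
        like += sum(1 for _, sj in neighbours[:nk] if sj == si)
    return like / (nk * len(pooled))


def schilling_mu(n1, n2):
    """mu_T = (n1 (n1 - 1) + n2 (n2 - 1)) / (n (n - 1)) with n = n1 + n2."""
    n = n1 + n2
    return (n1 * (n1 - 1) + n2 * (n2 - 1)) / (n * (n - 1))


def schilling_pull(sample1, sample2, nk=10, n_perm=50, rng=None):
    """Pull (T - mu_T) / sigma_T. sigma_T is estimated here by randomly
    re-labelling the pooled events, NOT by the analytic approximation."""
    rng = rng or random.Random()
    t = schilling_T(sample1, sample2, nk)
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    perm_ts = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_ts.append(schilling_T(pooled[:n1], pooled[n1:], nk))
    return (t - schilling_mu(n1, len(sample2))) / statistics.stdev(perm_ts)
```

In a fit-quality test, `sample1` would be the $n_d$ data events and `sample2` an MC sample drawn from the fit PDF. The brute-force neighbour search is $O(n^2 \log n)$, so a tree-based search would be needed for large samples.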


Mixed Sample Methods

For this simple example (where $n_1 = n_2$), $T \approx 1/2$ for the case where $f_1 = f_2$ and $T \approx 1$ for the case $f_1(x,y) \neq f_2(x,y)$.

[Figure: two pooled samples in the (x, y) unit square; left: $f_1(x,y) = f_2(x,y)$, right: $f_1(x,y) \neq f_2(x,y)$]


Mixed Sample Methods

For our toy-model analysis (comparing a fit PDF to data), the two samples are the data and a MC data set sampled from the fit PDF.

This method can also be used to compare two data sets (no PDF is required). E.g., one could test data stability by comparing two data samples taken at different times, or test the quality of MC by comparing it directly to the data.

What nuisance parameters do we have?

The number of Monte Carlo data events to generate, $n_{mc}$.

The number of nearest-neighbor events to collect, $n_k$.

I have studied these (arXiv:1006.3019) and found that $n_{mc} = 10 n_d$ (where $n_d$ is the number of data events) and $n_k = 10$ are good choices.


Mixed Sample Methods

Results obtained by mixing data and (various) Monte Carlo samples:

[Figure: pull distributions of T for n = 10000, 1000 and 100; left: Model, right: Fit I]

The pulls look like what we’d expect (mean ∼ 0, width ∼ 1)! A small test bias (∼ 0.3 $\sigma_T$) arises when using a PDF obtained from the data.


Mixed Sample Methods

Results obtained by mixing data and (various) Monte Carlo samples:

[Figure: pull distributions of T for n = 10000, 1000 and 100; left: Fit II, right: Fit III]

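The rejection power of the method is good for Fit II and poor for Fit III. This is not surprising given the deficiencies of these PDFs (a missing resonance is a large, localized discrepancy, while a missing 1% non-resonant term is small and spread over the whole Dalitz plot) and how the method works.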


Mixed Sample Methods

Rejection power at 95% confidence level:

n       Model   Fit I   Fit II   Fit III
10000   3%      3%      100%     35%
1000    2%      4%      73%      5%
100     6%      3%      5%       3%

Lessons learned about this method:

the test bias is small (∼ −0.3 $\sigma_T$);

the rejection power is good for large localized discrepancies but poor for small omnipresent ones;

the regions of validity of the approximation for $\sigma_T$ have been mapped out;

the method is very easy to use (and could have many uses in HEP)!
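Since only a large T (poor mixing) signals a bad fit, a pull can be turned into a one-sided p-value, and the rejection rates in the table above are the fraction of data sets in an ensemble with p < 0.05. A minimal Python sketch of that bookkeeping follows (function names and the example pulls are hypothetical).

```python
import math


def one_sided_pvalue(pull):
    """P(Z >= pull) for a standard normal Z; only large T (poor mixing) signals a
    bad fit, so the test is one-sided."""
    return 0.5 * math.erfc(pull / math.sqrt(2.0))


def rejection_power(pvalues, alpha=0.05):
    """Fraction of data sets in an ensemble rejected at the (1 - alpha) CL."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


if __name__ == "__main__":
    pulls = [0.2, -1.1, 2.5, 0.7, 3.1]          # hypothetical pulls, one per data set
    pvals = [one_sided_pvalue(z) for z in pulls]
    print(rejection_power(pvals))               # 2 of 5 data sets rejected -> 0.4
```

For the Model and Fit I rows the rejected fraction should sit near the nominal 5%; for Fit II and Fit III it measures the power.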


Point-to-Point Dissimilarity Methods

If the parent PDF, $f(\vec{x})$, of the data were known, then the GOF of a test PDF, $f_0(\vec{x})$, could be obtained using the statistic

$T = \frac{1}{2}\int \left(f(\vec{x}) - f_0(\vec{x})\right)^2 d\vec{x}.$

Since f is not known, T cannot be calculated. Of course, if f were known, there would also be no reason to perform a fit.

A more general expression involves correlating the difference b/t f and $f_0$ at different points in the multivariate space using a weighting function:

$T = \frac{1}{2}\iint \left(f(\vec{x}) - f_0(\vec{x})\right)\left(f(\vec{x}\,') - f_0(\vec{x}\,')\right)\psi(|\vec{x} - \vec{x}\,'|)\,d\vec{x}\,d\vec{x}\,'.$

N.b. the 1st expression is the case $\psi = \delta(\vec{x} - \vec{x}\,')$.

This quantity can be calculated w/o knowing f (using the data)!
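To see why, expand the product and use the symmetry of ψ (the two cross terms are equal after swapping $\vec{x} \leftrightarrow \vec{x}\,'$). Each of the three resulting terms is an expectation value over pairs of points, which can be replaced by an average over data events $\vec{x}_i \sim f$ and MC events $\vec{y}_j \sim f_0$ (a short derivation sketch):

```latex
T = \tfrac{1}{2}\iint f(\vec{x})f(\vec{x}\,')\,\psi\,d\vec{x}\,d\vec{x}\,'
  + \tfrac{1}{2}\iint f_0(\vec{x})f_0(\vec{x}\,')\,\psi\,d\vec{x}\,d\vec{x}\,'
  - \iint f(\vec{x})f_0(\vec{x}\,')\,\psi\,d\vec{x}\,d\vec{x}\,',
\qquad \psi \equiv \psi(|\vec{x}-\vec{x}\,'|),

\tfrac{1}{2}\,\mathrm{E}_{f,f}[\psi] \approx \frac{1}{n_d(n_d-1)}\sum_{j>i}\psi(|\vec{x}_i-\vec{x}_j|),
\qquad
\tfrac{1}{2}\,\mathrm{E}_{f_0,f_0}[\psi] \approx \frac{1}{n_{mc}(n_{mc}-1)}\sum_{j>i}\psi(|\vec{y}_i-\vec{y}_j|),
\qquad
\mathrm{E}_{f,f_0}[\psi] \approx \frac{1}{n_d\,n_{mc}}\sum_{i,j}\psi(|\vec{x}_i-\vec{y}_j|).
```

The factor $\tfrac{1}{2}$ cancels the 2 in the average over distinct pairs, $\frac{2}{n(n-1)}\sum_{j>i}$.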


Point-to-Point Dissimilarity Methods

Using the data and a MC data set sampled from f0, T can be written as

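$T = \frac{1}{n_d(n_d-1)}\sum_{j>i}\psi(|\vec{x}_i-\vec{x}_j|) + \frac{1}{n_{mc}(n_{mc}-1)}\sum_{j>i}\psi(|\vec{y}_i-\vec{y}_j|) - \frac{1}{n_d n_{mc}}\sum_{i,j}\psi(|\vec{x}_i-\vec{y}_j|),$

where $\vec{x}_i$ are the $n_d$ data events and $\vec{y}_j$ are the $n_{mc}$ MC events.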

Weighting functions found in the statistical literature include:

ψ(z) = z² [C.M. Cuadras and J. Fortiana, various works (1997-2003)]

ψ(z) = z [L. Baringhaus and C. Franz, J. Multivariate Anal. 88 (2004) 190-206]

ψ(z) = 1/z, −log z or $e^{-z^2/2\sigma^2}$ [B. Aslan and G. Zech, Stat. Comp. Simul. 75, Issue 2 (2004) 109-119]

A&Z note that for ψ(z) = 1/z, T is the electrostatic energy of two charge distributions w/ opposite sign (which is minimized if f = f0); hence, they called it the “energy test”.
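A minimal brute-force Python sketch of the data/MC estimate of T given above follows. It is $O(n^2)$ in the number of events, and `gaussian_psi` and `energy_T` are hypothetical helper names. It uses the Gaussian weighting function, and any other ψ(z) from the list can be passed in as a one-argument function.

```python
import math
import random


def gaussian_psi(sigma):
    """Weighting function psi(z) = exp(-z^2 / (2 sigma^2))."""
    return lambda z: math.exp(-z * z / (2.0 * sigma * sigma))


def energy_T(data, mc, psi):
    """T for data {x_i} and MC {y_j} sampled from the test PDF f0 (tuples)."""
    nd, nmc = len(data), len(mc)
    dd = sum(psi(math.dist(data[i], data[j]))
             for i in range(nd) for j in range(i + 1, nd))
    mm = sum(psi(math.dist(mc[i], mc[j]))
             for i in range(nmc) for j in range(i + 1, nmc))
    dm = sum(psi(math.dist(x, y)) for x in data for y in mc)
    return dd / (nd * (nd - 1)) + mm / (nmc * (nmc - 1)) - dm / (nd * nmc)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0.0, 1.0),) for _ in range(100)]
    mc_good = [(rng.gauss(0.0, 1.0),) for _ in range(100)]   # f0 matches the data
    mc_bad = [(rng.gauss(3.0, 1.0),) for _ in range(100)]    # f0 shifted by 3
    psi = gaussian_psi(0.5)
    print(energy_T(data, mc_good, psi), energy_T(data, mc_bad, psi))
```

For this Gaussian ψ the expectation of T vanishes when f0 matches the data and is positive otherwise, so a large T signals a bad fit.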


Point-to-Point Dissimilarity Methods

Since Dalitz-plot PDFs are rapidly varying, I chose to use $\psi(z) = e^{-z^2/2\sigma^2}$. For other analyses, a different choice might work better.

Some notes on this method:

A&Z provide a few variations of the method in their paper. I found that using $\sigma \to \sigma/f_0(\vec{x})$ works the best here.

The width of the Gaussian wt. function, σ, is the only nuisance parameter (but can be estimated from the physics of interest).

Quantiles must be obtained by bootstrapping.
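The slide only states that quantiles must be obtained by bootstrapping. The sketch below swaps in a permutation (re-labelling) resampling scheme, a common way of doing this for two-sample statistics, so treat it as an assumption rather than A&Z's exact procedure. It pools data and MC, splits the pool at random into groups of the original sizes, recomputes the statistic, and counts how often it reaches the observed value. `stat` can be any two-sample statistic, such as the T defined above. `mean_shift` is only a hypothetical stand-in that keeps the example self-contained.

```python
import random


def permutation_pvalue(stat, data, mc, n_perm=200, rng=None):
    """p-value of stat(data, mc) from random re-splits of the pooled sample into
    groups of the original sizes; large values of `stat` signal a bad fit."""
    rng = rng or random.Random()
    observed = stat(data, mc)
    pooled = list(data) + list(mc)
    n_d = len(data)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n_d], pooled[n_d:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)   # add-one: never exactly zero


def mean_shift(a, b):
    """Toy two-sample statistic (hypothetical stand-in for T): |mean(a) - mean(b)|."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


if __name__ == "__main__":
    rng = random.Random(5)
    data = [rng.gauss(0.0, 1.0) for _ in range(50)]
    mc = [rng.gauss(2.0, 1.0) for _ in range(50)]
    print(permutation_pvalue(mean_shift, data, mc, rng=rng))
```

Each permutation re-evaluates the full statistic, so for an $O(n^2)$ statistic the cost grows as n_perm × n². This is one reason the method is CPU intensive.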


Point-to-Point Dissimilarity Methods

Results (p-value distributions) for various PDFs:

[Figure: p-value distributions for n = 10000, 1000 and 100; left: Model, right: Fit I]

The results look as expected. There is a small test bias for Fit I b/c the PDFs were obtained from the data.


Point-to-Point Dissimilarity Methods

Results (p-value distributions) for various PDFs:

[Figure: p-value distributions for n = 10000, 1000 and 100; left: Fit II, right: Fit III]

Excellent rejection power! Notice that the p-value for Fit II is 0 ($\mathrm{CL}_{II} = 0$) for n ≥ 1000 and that this method has good rejection power for Fit III for large n.


Point-to-Point Dissimilarity Methods

Rejection power at 95% confidence level (for σ ≈ 2Γ):

n       Model   Fit I   Fit II   Fit III
10000   4%      4%      100%     81%
1000    3%      2%      100%     15%
100     3%      2%      10%      3%

Rejection power for Fit III is (not surprisingly) better for larger σ. It may be even better (for Fit III) using a longer-range wt. function.

Lessons learned about this method:

test bias is small (a few percent at 95% CL);

σ can be obtained from the physics of interest;

rejection power is excellent (even has some power for n = 100);

the method is somewhat CPU intensive, but it’s really powerful!


Visual Elements

Some methods reduce D-dimensional distributions to 1-D ones that are easy to plot and serve as good diagnostic tools.

Distance to nearest-neighbor* (left: “good”; right: “bad”)

[Figure: distributions of the nearest-neighbor variable U (0 to 1); left: Fit I, right: Fit II]

The GOF tests involve testing these 1-D distributions vs expected values.

*P.J. Bickel and L. Breiman, Ann. Probab. 11, No. 1 (1983) 185-214.
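A sketch of how such a U variable can be built follows. It assumes a nearest-neighbour transform in the spirit of the cited work and is not necessarily the exact construction used in the talk. Let $R_i$ be the distance from event i to its nearest neighbour among the other n − 1 events. Then $W_i = (n-1)\,P_{f_0}(\text{ball of radius } R_i \text{ around } \vec{x}_i)$ is approximately exponential with unit mean when the data follow $f_0$, so $U_i = e^{-W_i}$ is approximately uniform on (0, 1). Here the ball probability is approximated by $f_0(\vec{x}_i)V_D R_i^D$, a small-ball assumption that ignores edges. The function names are hypothetical.

```python
import math
import random


def unit_ball_volume(dim):
    """Volume V_D of the unit ball in D dimensions."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)


def nn_uniform_scores(data, f0):
    """U_i = exp(-(n - 1) f0(x_i) V_D R_i^D), R_i = nearest-neighbour distance.
    Roughly Uniform(0, 1) if data ~ f0; U piles up near 1 where f0 is too small
    and near 0 where f0 is too large."""
    n, dim = len(data), len(data[0])
    vol = unit_ball_volume(dim)
    scores = []
    for i, xi in enumerate(data):
        r = min(math.dist(xi, xj) for j, xj in enumerate(data) if j != i)
        scores.append(math.exp(-(n - 1) * f0(xi) * vol * r ** dim))
    return scores


if __name__ == "__main__":
    def flat_pdf(x):
        return 1.0                            # test PDF: uniform on the unit square

    rng = random.Random(7)
    good = [(rng.random(), rng.random()) for _ in range(400)]
    bad = [(0.5 * rng.random(), 0.5 * rng.random()) for _ in range(400)]
    for name, sample in (("good", good), ("bad", bad)):
        u = nn_uniform_scores(sample, flat_pdf)
        print(name, sum(u) / len(u))          # ~0.5 for "good", larger for "bad"
```

A histogram of the $U_i$ gives the kind of 1-D diagnostic shown in the figure, and it can be compared with a flat distribution using any 1-D test.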


Visual Elements

Some methods reduce D-dimensional distributions to 1-D ones that are easy to plot and serve as good diagnostic tools.

Local density* (left: “good”; right: “bad”)

[Figure: local-density function L vs r (r from 0 to 0.1); left: Fit I, right: Fit II]

The GOF tests involve testing these 1-D distributions vs expected values.

*B.D. Ripley, J. Roy. Stat. Soc. B Met. 39, No. 2 (1977) 172-212 - cited 1000 times!
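A rough sketch of the local-density idea follows. It assumes the plotted quantity is Ripley's L function; the slide shows only L against r. Ripley's K(r) counts pairs of points closer than r, and in 2-D $L(r) = \sqrt{K(r)/\pi}$, which lies on the line L = r for points scattered uniformly. The version below assumes a flat reference PDF over a window of known area and applies no edge correction. Testing an inhomogeneous fit PDF such as the Dalitz model would need a density-weighted variant, which is not attempted here. `ripley_L` is a hypothetical name.

```python
import math
import random


def ripley_L(points, r, area=1.0):
    """Ripley's L(r) = sqrt(K(r) / pi) for 2-D points in a window of the given area,
    K(r) = area * #{ordered pairs i != j with |x_i - x_j| <= r} / (n (n - 1)).
    Homogeneous (flat) reference and no edge correction: both are assumptions."""
    n = len(points)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(points[i], points[j]) <= r)
    k = area * 2 * close / (n * (n - 1))   # factor 2: each unordered pair counted once
    return math.sqrt(k / math.pi)


if __name__ == "__main__":
    rng = random.Random(11)
    flat = [(rng.random(), rng.random()) for _ in range(400)]              # matches
    clumped = [(0.5 * rng.random(), 0.5 * rng.random()) for _ in range(400)]
    for r in (0.02, 0.05, 0.1):
        print(r, ripley_L(flat, r), ripley_L(clumped, r))  # flat: L(r) close to r
```

Without edge correction, L(r) for uniform points falls slightly below r as r grows, because discs near the window boundary are partly empty.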


Summary

Unbinned multivariate GOF is not an “unsolved problem in statistics”!

Rather than trying to invent new methods, the HEP community would be better served to study the power and applicability of the GOF tests available in the statistical literature.

Since there is no uniformly most powerful GOF test (even in 1-D), it would be worthwhile to perform similar studies for other types of HEP analyses (i.e., repeat my work for different PDFs).

Other fields (e.g. ecology, econometrics, etc.) are way ahead of HEP in this area. We shouldn’t be afraid to use modern (and not-so-modern) statistics technology!

For more info see MW, arXiv:1006.3019 (submitted to JINST).
