introduction to confidence center for cosmology and intervals and limit setting via...

James Beacham (NYU)

James BeachamNew York University

Hall A Analysis Workshop, JLab, 8 Dec. 2010 APEX

Center for Cosmology and Particle Physics

)2Invariant Mass (MeV/c170 180 190 200 210 220 230 240 250

2C

ount

s/1

MeV

/c0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

22000

Pairs-e+Toy Model of APEX Test Run: 1.44M Coincident e


2C

ount

s/1

MeV

/c0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

22000

Pairs-e+Toy Model of APEX Test Run: 1.44M Coincident e

Introduction to Confidence Intervals and Limit Setting

via APEX

1

e+

e!

e!

Z

A"

!

Preliminary background-only model of APEX Test Run:1.44M Coincident e+e- Pairs

?

James Beacham (NYU)



Outline

1) Goals of statistical analysis in collider physics‣ Statistical inference

2) Hypothesis testing3) APEX as an example‣ Probability model and likelihood function‣ Profile likelihood ratio‣ Confidence interval from hypothesis test‣ Look elsewhere effect‣ Limit setting, with preliminary expected projected results

4) Other methods5) Summary

2

James Beacham (NYU)



Goals of Statistical Analysis in Collider PhysicsWe performed an experiment and we made some measurements‣ Ostensibly, we did this for a reason, such as to

● Measure a property (cross section, mass, etc.) of a known particle● Discover a new particle (pentaquarks, , etc.)● Set limits on something, e.g., the cross section of an undiscovered particle; that

is, we didn’t see it in a certain mass (or some other observable) range, so its cross section (based on some model) must be less than X

‣ We want to make a quantitative statement about the relationship between the experiment we conducted and the thing we set out to measure (or detect) statistical inference● I’ll focus on discovery and setting limits● In those two cases, we want to answer

• Did I discover something or not, and how can I demonstrate this?• What are the upper and lower limits on ___ of the particle I was looking for?

● Both of these involve hypothesis testing and confidence intervals

3

A�

James Beacham (NYU)



Hypothesis TestingWe start with a probability model likelihood function‣ Likelihood of what? Likelihood that the data set corresponds to the

model parameters, .‣ We’re actually testing hypotheses, (null, SM, background-only)

and (alternate, new physics, etc.)Test statistic: A real-valued function that summarizes the data in a way relevant to the hypotheses being tested‣ Neyman-Pearson lemma: The test statistic that minimizes the

probability of accepting when is true, for a fixed CL, is which can be generalized to

‣ Separate parameters into● Parameters of interest (physics parameters: mass, # signal events, etc.)● Nuisance parameters (everything else: background shape, etc.)● Nuisance parameters = systematics; we must deal with them somehow

4

H0

H1

L(x | H0)L(x | H1)

H0

H1

θi

L(x | θ0)L(x | θbest)

James Beacham (NYU)



Hypothesis Testing

Want a test statistic that incorporates the nuisance parameters‣ Profile likelihood ratio

5

λ =L(x | θr0,

ˆθν)L(x | θr, θν)

James Beacham (NYU)



Hypothesis Testing


6

λ =L(x | θr0,


Parameters of interest (POI), set to values for H0

James Beacham (NYU)



Hypothesis Testing


7

λ =L(x | θr0,



Conditional MLE of nuisance parameters (i.e., for )θr = θr0

James Beacham (NYU)



Hypothesis Testing


8

λ =L(x | θr0,



MLE of POIMLE of nuisance parameters


James Beacham (NYU)



Hypothesis Testing


● A good choice: It’s the maximum likelihood under the H0 as a fraction of its largest possible value; large values of indicate that H0 is reasonable

‣ And, via Wilks’s Theorem, under certain conditions, is asymptotically distributed as a with # d.o.f. = # POI● We can use it to calculate p-values and confidence intervals

Let’s look at a concrete example: APEX

9

λ =L(x | θr0,



MLE of POIMLE of nuisance parameters


−2 log λχ2

λ

James Beacham (NYU)



APEX Example

‣ We want to● I ) Identify any statistically significant excesses in spectrum

• Search for small peaks of widths mass resolution● II ) Set limits on cross section, which scales as

• Determine range in cross section (or ) of hypothetical narrow resonance consistent with observed spectrum

‣ Spectrum of background-only = H0 (null hypothesis)‣ Spectrum with significant small peak = H1 (alternate hypothesis)‣ We first determine a probability model

10

Analysis of APEX raw data invariant mass spectrum

≤

)2Invariant Mass (MeV/c100 120 140 160 180 200 220 240 260 280 300

2C

ount

s/1

MeV

/c

0

50

100

150

200

250

300

350

400

450

APEX Test Run, 2% of data, prelim. optics

�2 = α�/α

�2A�

APEX Test Run, 2% of data

James Beacham (NYU)



APEX Preliminary Probability Model and Likelihood Function

‣ For a set of data, we get a spectrum of ; our probability model becomes a likelihood function as a function of our model parameters, and for a fixed and , we form the profile likelihood ratio:

● A single (double) hat means an unconditional (conditional) MLE11

λ(S) =L(S, ˆB, ˆai)L(S, B, ai)

‣N is a normal / Gaussian probability distribution ‣Background shape given by polynomial with coefficients ai


2Co

unts

/1 M

eV/c

0

50

100

150

200

250

300

350

400

450

error band - O(3) polynomial!Fit w/ 1


2Co

unts

/1 M

eV/c

0

50

100

150

200

250

300

350

400

450

error band - O(3) polynomial!Fit w/ 1

)2Invariant Mass (MeV/c100 120 140 160 180 200 220 240 260 280 300

2C

ount

s/1

MeV

/c0

50

100

150

200

250

300

350

400

450

APEX Test Run - W target - 2% of data after first-pass optics analysis

me+e−

mA� σ

P (me+e− | mA� , σ, S, B, ai) =S · N(me+e− | mA� , σ) + B · Polynomial(me+e− , ai)

S + B

James Beacham (NYU)




2C

ount

s/1

MeV

/c

0

50

100

150

200

250

300

350

400


2C

ount

s/1

MeV

/c

0

50

100

150

200

250

300

350

400


2C

ount

s/1

MeV

/c

0

50

100

150

200

250

300

350

400


2C

ount

s/1

MeV

/c

0

50

100

150

200

250

300

350

400

Profile Likelihood Ratio

12

λ(S) =L(S, ˆB, ˆai)L(S, B, ai)

For APEX we’re scanning along the mass spectrum, so for each fixed mass and width hypothesis, we perform two fits of our model to the data, one for the conditional MLEs (where we fix S = 0) and one for the unconditional (letting all parameters float)

Background-only model before fitting

Fit of model to data with S = 0

Background-only model before fitting

Fit, all parameters floating

Toy model based on APEX 2% test run spectrum with HUGE artificial peak inserted, for illustration purposes

L(S = 0, ˆB, ˆai) L(S, B, ai)

mA� = 230 MeV/c2, fixedσ = 2 MeV/c2, fixed

James Beacham (NYU)



95% confidence

interval

Profile Likelihood Ratio

13

Once we’ve done this, we have our test statistic ( ), and for a desired confidence level (e.g., 95%) we simply read off the upper and lower limits on S that correspond to the appropriate tail of a distribution

−2 log λ

mA�

S

Fixed mass hypothesis

(θ = S)

We performed a hypothesis test and ended up with a confidence interval! (1-to-1 relationship)

K. C

ranm

er

χ2n

James Beacham (NYU)



)2 (MeV/cA'm180 190 200 210 220 230 240 250 260

S

0

2000

4000

6000

8000

10000

12000

, corrected for trials factor2=2 MeV/c!, S=2000, 2=222.8 MeV/cA'

Toy model, 90% CL, m , corrected for trials factor2=2 MeV/c!, S=2000, 2=222.8 MeV/cA'

Toy model, 90% CL, m


2C

ount

s/0.

8 M

eV/c

0

5000

10000

15000

20000

25000

30000

pairs-e+Toy model: Coincident e


2C

ount

s/0.

8 M

eV/c

0

5000

10000

15000

20000

25000

30000


)2Invariant Mass (MeV/c216 218 220 222 224 226 228

2C

ount

s/0.

8 M

eV/c

21000

21500

22000

22500

23000

)2Invariant Mass (MeV/c216 218 220 222 224 226 228

2C

ount

s/0.

8 M

eV/c

21000

21500

22000

22500

23000


Putting it All Together: Full General Procedure

14

Scan along mass; look for excess consistent with mass resolution

S = # of signal events above background

S

X

Significant excess at true point; S = 0 is not in confidence interval, i.e., excess is no longer consistent with background fluctuation

START

X

Calculate 95% confidence interval in S

True point: , S = 2000

Hypothesized mass

END

mA� = 222.8 MeV/c2

Toy model: confidence intervals and significant peak

James Beacham (NYU)



)2 (MeV/cA'm170 180 190 200 210 220 230 240 250

S

0

100

200

300

400

500

600

700

800

900

1000

2 = 2 MeV/c!Example scan of limited APEX test run data sample, 95% CL, 2 = 2 MeV/c!Example scan of limited APEX test run data sample, 95% CL,

Practice example peak search on 2% of APEX test run data

Width hypothesis (purposely unrealistic) 2 MeV/c2

95% CL

15

Searching for a Resonance: Example Based on APEX Test Run Data

Resonance search practice example

No excess; consistent with background fluctuation, as

expected from practice spectrum

‣ 2% of APEX test run data‣ No optics analysis

‣ Assumption of a mass resolution not realistic for this spectrum

‣ Couldn’t possibly see a significant peak;just a practice example

James Beacham (NYU)


Hall A Analysis Workshop, JLab, 8 Dec. 2010 APEX 16

Confidence Intervals

But what does “95% CL” mean?‣ Frequentist coverage

● In a large ensemble of identical experiments, the interval we calculated will cover the true value of the parameter 95% of the time

)2 (MeV/cA'm180 190 200 210 220 230 240 250 260

S

0

2000

4000

6000

8000

10000

12000



Toy model, 90% CL, mToy model: confidence intervals and significant peak

● The interval is a function of the data, which is random; thus, the interval will fluctuate from experiment to experiment

● NOTE: This is not the same thing as a Bayesian “credible interval”

James Beacham (NYU)



Look Elsewhere EffectMass-scan (or “raster scan”) approach has a consequence‣ Each hypothesis equally likely

must penalize CL by trials factor‣ That is, because we’re testing the null hypothesis multiple times over the

full mass range, we’re increasing the likelihood that we’ll find a resonance at some mass value; we must degrade the significance of any peak

17

mA�

⇒

p-value → p-value×mass range/resonance width

)2 (MeV/cA'm170 180 190 200 210 220 230 240 250

p-va

l

-410

-310

-210

-110

2m = 2 MeV/c!

uncorrected p-value threshold"uncorrected 2

threshold"corrected 2

Equivalently, shift thresholdEquivalently, shift CL

Note: This degradation of significance is only relevant for the lower limit on our parameter of interest (S); a significant peak means that S = 0 is not contained in our confidence interval

Setting limits doesn’t pay for look-elsewhere...

σ

James Beacham (NYU)



)2 (MeV/cA'm180 190 200 210 220 230 240 250 260

S

0

2000

4000

6000

8000

10000

12000



Toy model, 90% CL, m


2C

ount

s/0.

8 M

eV/c

0

5000

10000

15000

20000

25000

30000



2C

ount

s/0.

8 M

eV/c

0

5000

10000

15000

20000

25000

30000


Toy model: confidence intervals

)2 (MeV/cA'm180 190 200 210 220 230 240 250 260

2 !

-810

-710

-610

-510

-410, F=42=2 MeV/c", S=0, 2=222.8 MeV/c

A'Toy model, 90% CL, m , F=42=2 MeV/c", S=0, 2=222.8 MeV/c

A'Toy model, 90% CL, m

)2Invariant Mass (MeV/c198 200 202 204 206 208

2C

ount

s/0.

8 M

eV/c

24000

24500

25000

25500

26000

)2Invariant Mass (MeV/c198 200 202 204 206 208

2C

ount

s/0.

8 M

eV/c

24000

24500

25000

25500

26000


Setting Limits: General Procedure

18

Hypothesize a Gaussian resonance of fixed mass and width, e.g. mA� = 204.3 MeV/c2

σG = 2 MeV/c2

S = # of signal events above background

S

Calculate 95% confidence interval in S

X

X

START:

Do this for each mass

point; creates an exclusion

band

Raw upper bound on S upper bound on→ �2

X

S = 0

Find range of S consistent with data

END

One more step...

Toy model: Upper bound on �2

James Beacham (NYU)



APEX Limit-Setting Procedure -- PreliminaryRaw upper limit on S yields an upper limit on

19

�2(mA�) =Sδm

Bδm

F (mA�) δm

mA�

�2Neffα

3π

�

Background factor: F = 5 Neff = 1

�2

δm = mass window = 2.56× σ σ = width of Gaussian = 0.7 MeV/c2

mA�

σ

Bδm

Sδm = 80% of area of Gaussian= 0.8× Stot

2.56× σ = δm

James Beacham (NYU)


Hall A Analysis Workshop, JLab, 8 Dec. 2010 APEX)2 (MeV/cA'm

170 180 190 200 210 220 230 240 250

2 !

-810

-710

-610

-510

-410

2m = 0.7 MeV/c"

95% CL

F = 5

)2 (MeV/cA'm170 180 190 200 210 220 230 240 250

S

0

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

Expected Upper Limit

Expected Lower Limit

2m = 0.7 MeV/c!

95% CL

20

Preliminary Model of APEX Test Run: Expected Results

Expected limits on S and exclusion sensitivity in based on

background-only preliminary model from limited sample of test run data, scaled up to full statistics of the test

run, i.e., 1.44 M events

�2 = α�/α

Preliminary background-only model of APEX Test Run:

Expected Limits on Number of Signal Events, S

Preliminary background-only model of APEX Test Run:

Expected Upper Limit on �2

Preliminary estimate�2(mA�) =

Sδm

Bδm

F (mA�) δm

mA�

�2Neffα

3π

�

σ

σ

James Beacham (NYU)



S160 180 200 220 240 260 280 300

true

Mas

s

0

20

40

60

80

100

120

140

160

180

200

Contour of S vs trueMassContour of S vs trueMass

Other MethodsWe can always get coverage by construction‣ Neyman Construction ensures this

● Feldman-Cousins method● More computationally intensive; must build up full distribution of test statistic

‣ Methods based on profile likelihood ratio are quick and work very well● Implemented in RooStats

21

mass0 50 100 150 200 250 300 350 400 450 500

Even

ts /

( 5 )

0

2

4

6

8

10

12

14

16

mass0 50 100 150 200 250 300 350 400 450 500

Even

ts /

( 5 )

0

2

4

6

8

10

12

14

16

A RooPlot of "mass"

mass

S

= true point

= 2D Feldman-Cousins scan; ~ 1 hour= 2D contour based on profile likelihood ratio; < 1 s

James Beacham (NYU)


Hall A Analysis Workshop, JLab, 8 Dec. 2010 APEX 22

Summary

APEX at NYU: J.B., Kyle Cranmer

Quantitative statistical statements about discovery and limit setting can be made via hypothesis testing and confidence intervals‣ A good choice of test statistic for hypothesis testing that incorporates

nuisance parameters is the profile likelihood ratio● Used for the preliminary model for APEX

Discovery searches using the mass-scan approach must account for the look elsewhere effect‣ Penalize by trials factor

Limit setting results from determining a confidence interval‣ For APEX, upper limit on S upper limit on

Other approaches possible; ultimately use several methods and compareSome resources:

‣ F. James, Statistical Methods in Experimental Physics, 2nd ed., World Scientific, 2006‣ L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986‣ http://videolectures.net/cernacademictraining09_cranmer_stpp/

�2

http://videolectures.net/cernacademictraining09_cranmer_stpp/

http://videolectures.net/cernacademictraining09_cranmer_stpp/

James Beacham (NYU)



Statistical Inference -- A Little Background

Three approaches‣ Frequentist: P(data | theory)

● The data set is considered random● Coverage, p-values, confidence intervals

‣ Bayesian: P(theory | data) P(data | theory) P(theory)● Seems like the actual question we want to answer, but it requires a prior pdf on

the theory, which has no meaning from a frequentist standpoint

‣ Likelihood-based● Likelihood Principle: The likelihood function contains the full information

relative to the parameter• Frequentist calculations (p-values, confidence intervals, etc.) violate this

● But we can still use likelihood-based methods to get approximate frequentist results

23

∝

James Beacham (NYU)



RooStats

ROOT RooFit RooStats‣ Statistical tools built on top of RooFit

and distributed in ROOT‣ Will be used for the final analysis of

the ATLAS-CMS Combined Higgs Working Group

24

introduction to confidence center for cosmology and intervals and limit setting via...

Documents