James Beacham (NYU)
James Beacham, New York University
Hall A Analysis Workshop, JLab, 8 Dec. 2010 APEX
Center for Cosmology and Particle Physics
[Figure: Toy Model of APEX Test Run: 1.44M Coincident e+e− Pairs. Invariant Mass (MeV/c²), 170–250, vs. Counts / 1 MeV/c²]
Introduction to Confidence Intervals and Limit Setting
via APEX
[Diagram labels: e+, e−, e−, Z, A′]
Preliminary background-only model of APEX Test Run: 1.44M Coincident e+e− Pairs
Outline
1) Goals of statistical analysis in collider physics
  ‣ Statistical inference
2) Hypothesis testing
3) APEX as an example
  ‣ Probability model and likelihood function
  ‣ Profile likelihood ratio
  ‣ Confidence interval from hypothesis test
  ‣ Look elsewhere effect
  ‣ Limit setting, with preliminary expected projected results
4) Other methods
5) Summary
Goals of Statistical Analysis in Collider Physics
We performed an experiment and we made some measurements
‣ Ostensibly, we did this for a reason, such as to
  ● Measure a property (cross section, mass, etc.) of a known particle
  ● Discover a new particle (pentaquarks, A′, etc.)
  ● Set limits on something, e.g., the cross section of an undiscovered particle; that is, we didn’t see it in a certain mass (or some other observable) range, so its cross section (based on some model) must be less than X
‣ We want to make a quantitative statement about the relationship between the experiment we conducted and the thing we set out to measure (or detect): statistical inference
  ● I’ll focus on discovery and setting limits
  ● In those two cases, we want to answer
    • Did I discover something or not, and how can I demonstrate this?
    • What are the upper and lower limits on ___ of the particle I was looking for?
  ● Both of these involve hypothesis testing and confidence intervals
Hypothesis Testing
We start with a probability model → likelihood function
‣ Likelihood of what? Likelihood that the data set corresponds to the model parameters, $\theta_i$.
‣ We’re actually testing hypotheses, $H_0$ (null, SM, background-only) and $H_1$ (alternate, new physics, etc.)
Test statistic: A real-valued function that summarizes the data in a way relevant to the hypotheses being tested
‣ Neyman-Pearson lemma: The test statistic that minimizes the probability of accepting $H_0$ when $H_1$ is true, for a fixed CL, is $L(x \mid H_0) / L(x \mid H_1)$, which can be generalized to $L(x \mid \theta_0) / L(x \mid \theta_{\mathrm{best}})$
‣ Separate parameters into
  ● Parameters of interest (physics parameters: mass, # signal events, etc.)
  ● Nuisance parameters (everything else: background shape, etc.)
  ● Nuisance parameters = systematics; we must deal with them somehow
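As a minimal sketch of the Neyman-Pearson ratio, consider a single-bin counting experiment. This setup is an assumption for illustration and not the APEX model: under H0 the expected count is a background `b`, under H1 it is `s + b`, and both numbers below are hypothetical.

```python
import math

def poisson_log_pmf(n, mu):
    """Log of the Poisson probability of observing n counts for mean mu."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def log_likelihood_ratio(n, b, s):
    """log[ L(n | H0) / L(n | H1) ] with H0: mean = b and H1: mean = s + b."""
    return poisson_log_pmf(n, b) - poisson_log_pmf(n, s + b)

# Hypothetical expectations, chosen only to show the behaviour of the ratio
b, s = 100.0, 30.0
for n in (95, 115, 140):
    print(n, round(log_likelihood_ratio(n, b, s), 3))
```

The ratio falls as the observed count rises, so small values of the ratio disfavor H0.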
Hypothesis Testing
Want a test statistic that incorporates the nuisance parameters
‣ Profile likelihood ratio
$\lambda = \dfrac{L(x \mid \theta_{r0}, \hat{\hat{\theta}}_\nu)}{L(x \mid \hat{\theta}_r, \hat{\theta}_\nu)}$
  ● $\theta_{r0}$: Parameters of interest (POI), set to values for H0
  ● $\hat{\hat{\theta}}_\nu$: Conditional MLE of nuisance parameters (i.e., for $\theta_r = \theta_{r0}$)
  ● $\hat{\theta}_r$: MLE of POI; $\hat{\theta}_\nu$: MLE of nuisance parameters
  ● A good choice: It’s the maximum likelihood under H0 as a fraction of its largest possible value; large values of $\lambda$ indicate that H0 is reasonable
‣ And, via Wilks’s Theorem, under certain conditions, $-2 \log \lambda$ is asymptotically distributed as a $\chi^2$ with # d.o.f. = # POI
  ● We can use it to calculate p-values and confidence intervals
Let’s look at a concrete example: APEX
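A short sketch of how Wilks's theorem is used in practice, assuming a single parameter of interest (one degree of freedom) and that the asymptotic regime holds. The function names are illustrative and not taken from any analysis code.

```python
import math
from statistics import NormalDist

def p_value_from_q(q):
    """Tail probability P(chi2 with 1 d.o.f. > q), where q = -2 log(lambda)."""
    # A chi2 variable with 1 d.o.f. is the square of a standard normal variable
    return math.erfc(math.sqrt(q / 2.0))

def chi2_threshold(cl):
    """Value of -2 log(lambda) whose chi2 (1 d.o.f.) tail area equals 1 - cl."""
    z = NormalDist().inv_cdf(0.5 * (1.0 + cl))
    return z * z

print(chi2_threshold(0.95))   # threshold for a 95% confidence interval
print(p_value_from_q(9.0))    # p-value when -2 log(lambda) = 9
```

The threshold is the level at which the curve of −2 log λ is cut to obtain a confidence interval at the chosen confidence level.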
APEX Example
Analysis of APEX raw data invariant mass spectrum
‣ We want to
  ● I) Identify any statistically significant excesses in spectrum
    • Search for small peaks of widths ≤ mass resolution
  ● II) Set limits on A′ cross section, which scales as ε² = α′/α
    • Determine range in cross section (or ε²) of hypothetical narrow resonance consistent with observed spectrum
‣ Spectrum of background-only = H0 (null hypothesis)
‣ Spectrum with significant small peak = H1 (alternate hypothesis)
‣ We first determine a probability model
[Figure: APEX Test Run, 2% of data, prelim. optics. Invariant Mass (MeV/c²), 100–300, vs. Counts / 1 MeV/c²]
APEX Preliminary Probability Model and Likelihood Function
$P(m_{e^+e^-} \mid m_{A'}, \sigma, S, B, a_i) = \dfrac{S \cdot N(m_{e^+e^-} \mid m_{A'}, \sigma) + B \cdot \mathrm{Polynomial}(m_{e^+e^-}, a_i)}{S + B}$
‣ N is a normal / Gaussian probability distribution
‣ Background shape given by polynomial with coefficients $a_i$
‣ For a set of data, we get a spectrum of $m_{e^+e^-}$; our probability model becomes a likelihood function as a function of our model parameters, and for a fixed $m_{A'}$ and $\sigma$, we form the profile likelihood ratio:
$\lambda(S) = \dfrac{L(S, \hat{\hat{B}}, \hat{\hat{a}}_i)}{L(\hat{S}, \hat{B}, \hat{a}_i)}$
  ● A single (double) hat means an unconditional (conditional) MLE
[Figure: APEX Test Run - W target - 2% of data after first-pass optics analysis. Invariant Mass (MeV/c²), 100–300, vs. Counts / 1 MeV/c²]
[Figure: Fit w/ 1σ error band - O(3) polynomial. Invariant Mass (MeV/c²), 170–250, vs. Counts / 1 MeV/c²]
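The probability model above can be written down directly. The following is a minimal sketch, assuming the polynomial is normalized to unit area on a fixed mass window and stays positive there; the window edges, the coefficients, and the function names are hypothetical choices for illustration.

```python
import math

def gaussian(m, mean, sigma):
    """Normal probability density N(m | mean, sigma)."""
    return math.exp(-0.5 * ((m - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def poly_pdf(m, coeffs, lo, hi):
    """Polynomial sum_k a_k m^k, normalized to unit area on [lo, hi]."""
    value = sum(a * m ** k for k, a in enumerate(coeffs))
    area = sum(a * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, a in enumerate(coeffs))
    return value / area

def model_pdf(m, m_A, sigma, S, B, coeffs, lo=170.0, hi=250.0):
    """(S * Gaussian signal + B * polynomial background) / (S + B)."""
    return (S * gaussian(m, m_A, sigma) + B * poly_pdf(m, coeffs, lo, hi)) / (S + B)

# Hypothetical parameter values: a falling linear background and a narrow peak
coeffs = [400.0, -1.0]
print(model_pdf(210.0, 210.0, 2.0, 2000.0, 80000.0, coeffs))
```

The Gaussian here is not truncated to the mass window, which is a harmless simplification as long as the hypothesized mass sits many widths away from the window edges.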
Profile Likelihood Ratio
$\lambda(S) = \dfrac{L(S, \hat{\hat{B}}, \hat{\hat{a}}_i)}{L(\hat{S}, \hat{B}, \hat{a}_i)}$
For APEX we’re scanning along the mass spectrum, so for each fixed mass and width hypothesis, we perform two fits of our model to the data, one for the conditional MLEs (where we fix S = 0) and one for the unconditional (letting all parameters float)
‣ $m_{A'}$ = 230 MeV/c², fixed
‣ σ = 2 MeV/c², fixed
[Figure, conditional fit: Background-only model before fitting; Fit of model to data with S = 0, giving $L(S = 0, \hat{\hat{B}}, \hat{\hat{a}}_i)$]
[Figure, unconditional fit: Background-only model before fitting; Fit, all parameters floating, giving $L(\hat{S}, \hat{B}, \hat{a}_i)$]
[Axes for all panels: Invariant Mass (MeV/c²), 170–250, vs. Counts / 1 MeV/c²]
Toy model based on APEX 2% test run spectrum with HUGE artificial peak inserted, for illustration purposes
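The two-fit procedure can be sketched on a toy spectrum. The block below makes several simplifying assumptions that are not in the slides: a binned Poisson likelihood, a flat background whose normalization B is the only nuisance parameter (instead of a polynomial with coefficients a_i), expected counts used as data, and a simple ternary search in place of a real minimizer.

```python
import math

LO, HI, NBINS = 170.0, 250.0, 80      # mass window with 1 MeV/c^2 bins
M_A, SIGMA = 230.0, 2.0               # fixed mass and width hypothesis

def gauss_cdf(x):
    return 0.5 * (1.0 + math.erf((x - M_A) / (SIGMA * math.sqrt(2.0))))

EDGES = [LO + i * (HI - LO) / NBINS for i in range(NBINS + 1)]
G = [gauss_cdf(EDGES[i + 1]) - gauss_cdf(EDGES[i]) for i in range(NBINS)]  # signal shape
F = [1.0 / NBINS] * NBINS                                                  # flat background

def log_like(S, B, data):
    """Binned Poisson log-likelihood, dropping terms independent of S and B."""
    total = 0.0
    for n, g, f in zip(data, G, F):
        mu = S * g + B * f
        total += n * math.log(mu) - mu
    return total

def maximize(func, lo, hi, iters=60):
    """Ternary search for the maximum of a concave 1-D function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if func(a) < func(b):
            lo = a
        else:
            hi = b
    x = 0.5 * (lo + hi)
    return x, func(x)

def profiled_log_like(S, data):
    """Conditional fit: maximize over the nuisance parameter B for fixed S."""
    return maximize(lambda B: log_like(S, B, data), 1.0, 2.0 * sum(data))[1]

def q_of_S(S, data):
    """-2 log lambda(S): conditional fit at S versus the unconditional best fit."""
    s_hat, best = maximize(lambda s: profiled_log_like(s, data), 0.0, sum(data))
    return 2.0 * (best - profiled_log_like(S, data)), s_hat

# Expected counts for a toy with an artificial peak of 2000 events; not APEX data
data = [2000.0 * g + 80000.0 * f for g, f in zip(G, F)]
q0, s_hat = q_of_S(0.0, data)
print(f"S_hat = {s_hat:.1f}, -2 log lambda(S = 0) = {q0:.1f}")
```

A large value of −2 log λ at S = 0 means the background-only fit describes the toy spectrum much worse than the fit with all parameters floating.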
Profile Likelihood Ratio
Once we’ve done this, we have our test statistic ($-2 \log \lambda$), and for a desired confidence level (e.g., 95%) we simply read off the upper and lower limits on S that correspond to the appropriate tail of a $\chi^2_n$ distribution
[Figure (K. Cranmer): $-2 \log \lambda$ vs. S for a fixed mass hypothesis $m_{A'}$ (θ = S), with the 95% confidence interval marked]
We performed a hypothesis test and ended up with a confidence interval! (1-to-1 relationship)
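Reading the interval off the −2 log λ curve can be sketched with a case where the curve is known in closed form. The assumption here is a single-bin counting experiment with known background `b` and observed count `n > b`, which is much simpler than the APEX fit; the scan range and step size are arbitrary choices.

```python
import math
from statistics import NormalDist

def q(s, n, b):
    """-2 log lambda(s) for n observed counts and known background b (best fit s = n - b)."""
    mu = s + b
    return 2.0 * (mu - n + n * math.log(n / mu))

def interval(n, b, cl=0.95, step=0.1):
    """Scan s and keep the values where q(s) is below the chi2 (1 d.o.f.) threshold."""
    threshold = NormalDist().inv_cdf(0.5 * (1.0 + cl)) ** 2
    s_max = 2.0 * n                                   # scan range, wide enough for this toy
    grid = (i * step for i in range(int(s_max / step)))
    accepted = [s for s in grid if q(s, n, b) <= threshold]
    return accepted[0], accepted[-1]

# Hypothetical counts: 1200 observed on an expected background of 1000
lower, upper = interval(1200, 1000.0)
print(lower, upper)
```

If the scan finds that S = 0 lies inside the accepted range, the data are consistent with background only at that confidence level.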
Putting it All Together: Full General Procedure
START: Scan along mass; look for excess consistent with mass resolution
‣ Hypothesized mass
‣ Calculate 95% confidence interval in S (S = # of signal events above background)
‣ True point: $m_{A'}$ = 222.8 MeV/c², S = 2000
END: Significant excess at true point; S = 0 is not in confidence interval, i.e., excess is no longer consistent with background fluctuation
[Figure: Toy model: Coincident e+e− pairs. Invariant Mass (MeV/c²), 180–260, vs. Counts / 0.8 MeV/c², with a zoom on 216–228 MeV/c²]
[Figure: Toy model: confidence intervals and significant peak. Toy model, 90% CL, $m_{A'}$ = 222.8 MeV/c², S = 2000, σ = 2 MeV/c², corrected for trials factor. S vs. $m_{A'}$ (MeV/c²), 180–260]
Searching for a Resonance: Example Based on APEX Test Run Data
Practice example peak search on 2% of APEX test run data
‣ 2% of APEX test run data
‣ No optics analysis
‣ Assumption of a mass resolution not realistic for this spectrum
‣ Width hypothesis (purposely unrealistic): 2 MeV/c²
‣ Couldn’t possibly see a significant peak; just a practice example
No excess; consistent with background fluctuation, as expected from practice spectrum
[Figure: Resonance search practice example. Example scan of limited APEX test run data sample, 95% CL, σ = 2 MeV/c². S vs. $m_{A'}$ (MeV/c²), 170–250]
Confidence Intervals
But what does “95% CL” mean?
‣ Frequentist coverage
  ● In a large ensemble of identical experiments, the interval we calculated will cover the true value of the parameter 95% of the time
  ● The interval is a function of the data, which is random; thus, the interval will fluctuate from experiment to experiment
  ● NOTE: This is not the same thing as a Bayesian “credible interval”
[Figure: Toy model: confidence intervals and significant peak. Toy model, 90% CL, $m_{A'}$ = 222.8 MeV/c², S = 2000, σ = 2 MeV/c², corrected for trials factor. S vs. $m_{A'}$ (MeV/c²), 180–260]
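Frequentist coverage can be checked with a small ensemble of pseudo-experiments. This sketch assumes the simplest possible case, a Gaussian measurement with known resolution; the true value, the resolution, and the ensemble size are hypothetical.

```python
import random
from statistics import NormalDist

def coverage(true_value, sigma, cl=0.95, n_experiments=20000, seed=1):
    """Fraction of pseudo-experiments whose interval contains the true value."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 * (1.0 + cl))
    covered = 0
    for _ in range(n_experiments):
        x = rng.gauss(true_value, sigma)              # one pseudo-experiment
        if x - z * sigma <= true_value <= x + z * sigma:
            covered += 1                              # this interval covers the truth
    return covered / n_experiments

print(coverage(2000.0, 150.0))
```

The interval endpoints change from one pseudo-experiment to the next while the true value stays fixed, which is the sense in which the interval, and not the parameter, is random.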
Look Elsewhere Effect
Mass-scan (or “raster scan”) approach has a consequence
‣ Each $m_{A'}$ hypothesis equally likely ⇒ must penalize CL by trials factor
‣ That is, because we’re testing the null hypothesis multiple times over the full mass range, we’re increasing the likelihood that we’ll find a resonance at some mass value; we must degrade the significance of any peak
  p-value → p-value × (mass range / resonance width)
‣ Equivalently, shift threshold; equivalently, shift CL
Note: This degradation of significance is only relevant for the lower limit on our parameter of interest (S); a significant peak means that S = 0 is not contained in our confidence interval
Setting limits doesn’t pay for look-elsewhere...
[Figure: p-value vs. $m_{A'}$ (MeV/c²), 170–250, for $\sigma_m$ = 2 MeV/c², showing the uncorrected p-value, the uncorrected 2σ threshold, and the corrected 2σ threshold]
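The trials-factor penalty amounts to a one-line correction. A minimal sketch follows, using the scaling stated on the slide (mass range divided by resonance width); the 80 MeV/c² range and the local p-value are illustrative numbers, and the function names are hypothetical.

```python
def trials_factor(mass_range, resonance_width):
    """Approximate number of independent mass hypotheses in the scan."""
    return mass_range / resonance_width

def global_p_value(local_p, mass_range, resonance_width):
    """Local p-value penalized by the trials factor, capped at 1."""
    return min(1.0, local_p * trials_factor(mass_range, resonance_width))

def corrected_threshold(p_threshold, mass_range, resonance_width):
    """Equivalent view: tighten the local p-value threshold instead."""
    return p_threshold / trials_factor(mass_range, resonance_width)

# Scan from 170 to 250 MeV/c^2 with a 2 MeV/c^2 width hypothesis
print(trials_factor(80.0, 2.0))
print(global_p_value(1e-3, 80.0, 2.0))
print(corrected_threshold(0.05, 80.0, 2.0))
```

Either form gives the same decision: a local excess counts as significant only if its p-value stays below the threshold after the penalty.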
Setting Limits: General Procedure
START: Hypothesize a Gaussian resonance of fixed mass and width, e.g. $m_{A'}$ = 204.3 MeV/c², $\sigma_G$ = 2 MeV/c²
‣ Find range of S consistent with data (S = # of signal events above background)
‣ Calculate 95% confidence interval in S
‣ Raw upper bound on S → upper bound on ε²
‣ Do this for each mass point; creates an exclusion band
END
One more step...
[Figure: Toy model: Coincident e+e− pairs. Invariant Mass (MeV/c²), 180–260, vs. Counts / 0.8 MeV/c², with a zoom on 198–208 MeV/c²]
[Figure: Toy model: confidence intervals. Toy model, 90% CL, $m_{A'}$ = 222.8 MeV/c², S = 0, σ = 2 MeV/c², corrected for trials factor. S vs. $m_{A'}$ (MeV/c²), 180–260, with S = 0 marked]
[Figure: Toy model: Upper bound on ε². Toy model, 90% CL, $m_{A'}$ = 222.8 MeV/c², S = 0, σ = 2 MeV/c², F = 4. ε² vs. $m_{A'}$ (MeV/c²), 180–260]
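APEX Limit-Setting Procedure -- Preliminary
Raw upper limit on S yields an upper limit on ε²
$\epsilon^2(m_{A'}) = \dfrac{S_{\delta m}}{B_{\delta m}} \, F(m_{A'}) \, \dfrac{\delta m}{m_{A'}} \left( \dfrac{2 N_{\mathrm{eff}} \alpha}{3\pi} \right)$
‣ δm = mass window = 2.56 × σ
‣ σ = width of Gaussian = 0.7 MeV/c²
‣ $S_{\delta m}$ = 80% of area of Gaussian = 0.8 × $S_{\mathrm{tot}}$
‣ $B_{\delta m}$ = background in the mass window δm
‣ Background factor: F = 5
‣ $N_{\mathrm{eff}}$ = 1
[Diagram: Gaussian of width σ centered at $m_{A'}$ on top of the background, with the window δm = 2.56 × σ marked]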
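The conversion from a limit on S to a limit on ε² can be sketched as a small function. The term ordering follows a reading of the garbled slide formula as a product of S_δm / B_δm, F, δm / m_A′ and 2 N_eff α / (3π), so it should be treated as an assumption. The event counts and the mass value below are hypothetical, while F, N_eff, σ and the 2.56 and 0.8 factors are the values quoted on the slide.

```python
import math

ALPHA = 1.0 / 137.036   # fine-structure constant

def epsilon_squared(S_dm, B_dm, m_A, delta_m, F=5.0, N_eff=1.0):
    """epsilon^2 = (S_dm / B_dm) * F * (delta_m / m_A) * (2 N_eff alpha / (3 pi))."""
    return (S_dm / B_dm) * F * (delta_m / m_A) * (2.0 * N_eff * ALPHA / (3.0 * math.pi))

sigma = 0.7                 # Gaussian width in MeV/c^2, from the slide
delta_m = 2.56 * sigma      # mass window in MeV/c^2
S_tot_upper = 1000.0        # hypothetical upper limit on the total signal yield
S_dm = 0.8 * S_tot_upper    # 80% of the Gaussian area falls inside the window
B_dm = 40000.0              # hypothetical background count inside the window

print(epsilon_squared(S_dm, B_dm, m_A=200.0, delta_m=delta_m))
```

Because the expression is linear in S_δm, an upper limit on S maps directly onto an upper limit on ε² at each mass point.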
Preliminary Model of APEX Test Run: Expected Results
Expected limits on S and exclusion sensitivity in ε² = α′/α based on background-only preliminary model from limited sample of test run data, scaled up to full statistics of the test run, i.e., 1.44 M events
[Figure: Preliminary background-only model of APEX Test Run: Expected Limits on Number of Signal Events, S. Expected Upper Limit and Expected Lower Limit on S vs. $m_{A'}$ (MeV/c²), 170–250; $\sigma_m$ = 0.7 MeV/c², 95% CL]
[Figure: Preliminary background-only model of APEX Test Run: Expected Upper Limit on ε². ε² vs. $m_{A'}$ (MeV/c²), 170–250; $\sigma_m$ = 0.7 MeV/c², 95% CL, F = 5]
Preliminary estimate: $\epsilon^2(m_{A'}) = \dfrac{S_{\delta m}}{B_{\delta m}} \, F(m_{A'}) \, \dfrac{\delta m}{m_{A'}} \left( \dfrac{2 N_{\mathrm{eff}} \alpha}{3\pi} \right)$
Other Methods
We can always get coverage by construction
‣ Neyman Construction ensures this
  ● Feldman-Cousins method
  ● More computationally intensive; must build up full distribution of test statistic
‣ Methods based on profile likelihood ratio are quick and work very well
  ● Implemented in RooStats
[Figure: A RooPlot of "mass". Events / ( 5 ) vs. mass, 0–500]
[Figure: Contour of S vs trueMass, with axes mass and S. Legend: true point; 2D Feldman-Cousins scan, ~ 1 hour; 2D contour based on profile likelihood ratio, < 1 s]
Summary
Quantitative statistical statements about discovery and limit setting can be made via hypothesis testing and confidence intervals
‣ A good choice of test statistic for hypothesis testing that incorporates nuisance parameters is the profile likelihood ratio
  ● Used for the preliminary model for APEX
Discovery searches using the mass-scan approach must account for the look elsewhere effect
‣ Penalize by trials factor
Limit setting results from determining a confidence interval
‣ For APEX, upper limit on S → upper limit on ε²
Other approaches possible; ultimately use several methods and compare
Some resources:
‣ F. James, Statistical Methods in Experimental Physics, 2nd ed., World Scientific, 2006
‣ L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986
‣ http://videolectures.net/cernacademictraining09_cranmer_stpp/
APEX at NYU: J.B., Kyle Cranmer
Statistical Inference -- A Little Background
Three approaches
‣ Frequentist: P(data | theory)
  ● The data set is considered random
  ● Coverage, p-values, confidence intervals
‣ Bayesian: P(theory | data) ∝ P(data | theory) P(theory)
  ● Seems like the actual question we want to answer, but it requires a prior pdf on the theory, which has no meaning from a frequentist standpoint
‣ Likelihood-based
  ● Likelihood Principle: The likelihood function contains the full information relative to the parameter
    • Frequentist calculations (p-values, confidence intervals, etc.) violate this
  ● But we can still use likelihood-based methods to get approximate frequentist results
RooStats
ROOT RooFit RooStats
‣ Statistical tools built on top of RooFit and distributed in ROOT
‣ Will be used for the final analysis of the ATLAS-CMS Combined Higgs Working Group