
Estimating the significance of a signal in a multi-dimensional search

Ofer Vitells, Eilam Gross

TAUP 2011 , 5-9 September , Munich

O.Vitells & E. Gross, Astropart. Phys. (2011) doi:10.1016/j.astropartphys.2011.08.005

Introduction

• Searching for a signal in some parameter space (mass, shape, location in the sky, etc.) involves a “look-elsewhere effect”: the significance calculation needs to account for the possibility that the signal could appear anywhere within the search range.

• Monte Carlo simulation is a straightforward way of estimating the p-value, but it can be computationally very expensive (it requires repeating the entire search procedure many times with background-only simulations).

• The mathematical theory of random fields provides useful analytic results.

[Figure: events / unit mass versus mass]

Random fields

• Usually one defines the test statistic

$$q_0(\theta) = -2\log\frac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat{\mu},\theta)}$$

for testing $H_0: \mu = 0$ against $H_1: \mu > 0$, where $\mu$ is the “signal strength” and $\theta$ is the parameterization of the search space.

• For any fixed $\theta$, $q_0(\theta)$ follows (asymptotically) a $\chi^2$ distribution with one degree of freedom, by Wilks’ theorem.

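• $q_0(\theta)$ is a $\chi^2$ random field over the space of $\theta$ (a random variable indexed by one or more continuous parameters). We are interested in the global maximum

$$\hat{q}_0 \equiv q_0(\hat{\theta}) = -2\log\frac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat{\mu},\hat{\theta})} = \max_\theta[q_0(\theta)],$$

where $\hat{\theta}$ is the global maximum point.

• For this we want to know the p-value:

$$\text{p-value} = P(\max_\theta[q_0(\theta)] \ge u)$$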

Excursion sets & The Euler characteristic

• The set of points where the field has values larger than some number $u$ is called the excursion set $A_u$ above the level $u$:

$$A_u = \{\theta \in \mathcal{M} : q(\theta) > u\}$$

• The Euler characteristic $\varphi$ is a topological property; in two dimensions it is the number of disconnected components minus the number of “holes”.

[Figure: three example excursion sets, with φ=1, φ=0 and φ=2]

Expectation of the Euler characteristic

• For random fields defined over any parameter space (Riemannian manifold) in $D$ dimensions, the expected Euler characteristic of the excursion set, $\varphi(A_u)$, is given by:

$$E[\varphi(A_u)] = \sum_{d=0}^{D} \mathcal{N}_d\,\rho_d(u)$$

[R.J. Adler and J.E. Taylor, Random Fields and Geometry (2007), Springer Monographs in Mathematics]

- $\rho_d$ are “universal” functions (they depend only on the level $u$ and the type of distribution)

- The geometrical shape of the space and the covariance structure of the field are completely encoded in the coefficients $\mathcal{N}_d$ (which do not depend on the level $u$)



e.g. for a $\chi^2$ field with $s$ degrees of freedom:

$$\rho_0(u) = P(\chi^2_s > u)$$
$$\rho_1(u) = u^{(s-1)/2}\, e^{-u/2}$$
$$\rho_2(u) = u^{(s-2)/2}\, e^{-u/2}\,[u-(s-1)]$$
$$\ldots$$
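As a worked special case, the following sketch sets $s=1$ in the $\chi^2$ expressions for $\rho_0$, $\rho_1$ and $\rho_2$. It assumes the one-degree-of-freedom $q_0$ of Wilks' theorem, and that any constant factors are absorbed into the coefficients $\mathcal{N}_d$:

```latex
\begin{align*}
\rho_0(u) &= P(\chi^2_1 > u), \\
\rho_1(u) &= u^{(1-1)/2}\, e^{-u/2} = e^{-u/2}, \\
\rho_2(u) &= u^{(1-2)/2}\, e^{-u/2}\,[u - (1-1)] = \sqrt{u}\; e^{-u/2}.
\end{align*}
```

These are the $u$-dependences that appear in the two-dimensional examples as $e^{-u/2}(\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2)$.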


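• Why is $E[\varphi(A_u)]$ interesting?

Above high levels excursions are rare, so the excursion set is either empty or a single connected region:

$$\varphi(A_u) \approx \begin{cases} 1 & \max_\theta[q(\theta)] > u \\ 0 & \text{otherwise} \end{cases}$$

$$\Rightarrow\quad E[\varphi(A_u)] \approx P(\max_\theta[q_0(\theta)] \ge u)$$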

2-D example: IceCube search for astrophysical neutrino point sources

IceCube looks for neutrino sources: a 2-D search over the sky $(\theta, \phi)$.

Unbinned likelihood:

$$\mathcal{L}(\vec{x}_s, n_s) = \prod_i \left[\frac{n_s}{N}\, f_s(\vec{x}_i) + \left(1-\frac{n_s}{N}\right) f_b(\vec{x}_i)\right]$$

Assume a Gaussian distribution of signal events:

$$f_s(\vec{x}_i \,|\, \vec{x}_s) = \frac{1}{2\pi\sigma^2}\, e^{-|\vec{x}_i-\vec{x}_s|^2/2\sigma^2}, \qquad \vec{x}_s = (\theta, \phi)$$

Detector resolution = 0.7°

Signal parameters can also include energy and time, not considered here.

[J. Braun, J. Dumm, F. De Palma, C. Finley, A. Karle, and T. Montaruli, Astropart. Phys. 29, 299 (2008); arXiv:0801.1604]

2-D example: search for neutrino sources (IceCube)

Properly covering the whole sky requires a grid of ~1000² points.

IceCube simulated background data (1 year), 67,000 events, provided by Jim Braun & Teresa Montaruli.

[Figure: significance map $q_0(\theta, \phi)$, and its excursion set at u=1 with φ=95]

Calculation of the Euler characteristic

• Usually we have q(θ) calculated on a grid of points

• Calculation of the E.C. is straightforward: φ = #points − #edges + #faces

• Generalizes to higher dimensions

[Figure: example excursion set on a grid, with φ = 18 (points) − 23 (edges) + 7 (faces) = 2]
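The counting rule φ = #points − #edges + #faces can be sketched for a 2-D grid as follows. This is a minimal illustration, not the authors' code: the function name `euler_characteristic` is hypothetical, and it assumes that grid points are joined only to their horizontal and vertical neighbours, with a face counted when all four corners of a unit cell are above the level.

```python
def euler_characteristic(q, u):
    """Euler characteristic of the excursion set {q > u} on a 2-D grid.

    q is a list of equal-length rows. Assumed convention (not from the
    slides): points are linked to horizontal/vertical neighbours only, and
    a face is a unit cell whose four corners are all above the level u.
    """
    above = [[value > u for value in row] for row in q]
    n_rows = len(above)
    n_cols = len(above[0]) if n_rows else 0
    points = edges = faces = 0
    for i in range(n_rows):
        for j in range(n_cols):
            if not above[i][j]:
                continue
            points += 1
            right = j + 1 < n_cols and above[i][j + 1]
            down = i + 1 < n_rows and above[i + 1][j]
            if right:
                edges += 1  # horizontal edge to the right-hand neighbour
            if down:
                edges += 1  # vertical edge to the neighbour below
            if right and down and above[i + 1][j + 1]:
                faces += 1  # unit cell with all four corners above u
    return points - edges + faces


# A ring (one component, one hole) and a filled blob (one component, no hole).
ring = [
    [0, 0, 0, 0, 0],
    [0, 2, 2, 2, 0],
    [0, 2, 0, 2, 0],
    [0, 2, 2, 2, 0],
    [0, 0, 0, 0, 0],
]
blob = [[2 if 1 <= i <= 3 and 1 <= j <= 3 else 0 for j in range(5)] for i in range(5)]
phi_ring = euler_characteristic(ring, 1)
phi_blob = euler_characteristic(blob, 1)
```

The ring gives 8 points − 8 edges + 0 faces = 0, and the blob gives 9 − 12 + 4 = 1, matching "components minus holes".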

2-d example: search for neutrino sources (IceCube)

For a χ² field in 2 dimensions:

$$E[\varphi(A_u)] = \tfrac{1}{2}P(\chi^2 > u) + e^{-u/2}\left(\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2\right)$$

Estimate $E[\varphi]$ at two levels, e.g. 0 and 1, and solve for $\mathcal{N}_1$ and $\mathcal{N}_2$.

From 20 background simulations:

$$\varphi_0 = 33.5 \pm 2, \qquad \varphi_1 = 94.6 \pm 1.3$$
$$\mathcal{N}_1 = 33 \pm 2, \qquad \mathcal{N}_2 = 123 \pm 3$$

[Figure: excursion sets of a simulated significance map at u=0 (φ=35) and u=1 (φ=95)]
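Solving for the two coefficients from the averages at two levels is a 2×2 linear problem. The sketch below assumes the form E[φ(A_u)] = ½P(χ²₁ > u) + e^{-u/2}(N₁ + √u N₂), with one degree of freedom, and uses the identity P(χ²₁ > u) = erfc(√(u/2)). The function names are hypothetical.

```python
import math


def chi2_1dof_sf(u):
    """P(chi2 with 1 d.o.f. > u), via the complementary error function."""
    return math.erfc(math.sqrt(u / 2.0))


def solve_coefficients(u_a, phi_a, u_b, phi_b):
    """Solve for (N1, N2) from mean Euler characteristics at two levels.

    Assumed model: E[phi] = 0.5*P(chi2_1 > u) + exp(-u/2)*(N1 + sqrt(u)*N2).
    """
    # Each level gives one linear equation: N1 + sqrt(u)*N2 = rhs
    rhs_a = (phi_a - 0.5 * chi2_1dof_sf(u_a)) * math.exp(u_a / 2.0)
    rhs_b = (phi_b - 0.5 * chi2_1dof_sf(u_b)) * math.exp(u_b / 2.0)
    n2 = (rhs_b - rhs_a) / (math.sqrt(u_b) - math.sqrt(u_a))
    n1 = rhs_a - math.sqrt(u_a) * n2
    return n1, n2


# Central values quoted for the 20 background simulations (u = 0 and u = 1).
n1, n2 = solve_coefficients(0.0, 33.5, 1.0, 94.6)
```

With these inputs the solution lands close to the quoted N₁ = 33 and N₂ = 123. Propagating the uncertainties is not attempted here.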


$$E[\varphi(A_u)] = \tfrac{1}{2}P(\chi^2 > u) + e^{-u/2}\left(\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2\right), \qquad \mathcal{N}_1 = 33 \pm 2, \quad \mathcal{N}_2 = 123 \pm 3$$

[Figure: p-value versus $\hat{q}_0$, E.C. formula compared with ~200,000 random background simulations]

e.g.: P(max q0 > 30) = (2.5 ± 0.4)×10⁻⁴ (estimated)

E.C. formula: (2.28 ± 0.06)×10⁻⁴

~200,000 random background simulations

Note that this is NOT a simple “trial factor” correction: the effective trial factor increases with the local significance.
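To illustrate why this is not a constant trial factor, the sketch below evaluates the E.C. formula and divides it by the local p-value. It assumes the form ½P(χ²₁ > u) + e^{-u/2}(N₁ + √u N₂), takes the local p-value to be ½P(χ²₁ > u), and uses the rounded central values N₁ = 33, N₂ = 123. The function names are hypothetical.

```python
import math


def local_p_value(u):
    """Local p-value at a fixed point, assumed to be 0.5 * P(chi2_1 > u)."""
    return 0.5 * math.erfc(math.sqrt(u / 2.0))


def global_p_value(u, n1, n2):
    """E.C. approximation to P(max q0 > u) for a 2-D search (assumed form)."""
    return local_p_value(u) + math.exp(-u / 2.0) * (n1 + math.sqrt(u) * n2)


def trial_factor(u, n1, n2):
    """Ratio of global to local p-value at level u."""
    return global_p_value(u, n1, n2) / local_p_value(u)


N1, N2 = 33.0, 123.0  # rounded central values from the slide
p_30 = global_p_value(30.0, N1, N2)
factors = [trial_factor(u, N1, N2) for u in (9.0, 16.0, 25.0)]
```

With these inputs `p_30` comes out at about 2×10⁻⁴, and the entries of `factors` grow with `u`, which is the behaviour the note above describes.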

Slicing

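• Exploit the azimuthal angle symmetry to reduce computations:

$$\varphi(A \cup B) = \varphi(A) + \varphi(B) - \varphi(A \cap B)$$

[Figure: example with φ = 0 = 1 + 1 − 2]

Divide into N slices (here N = 18):

$$\varphi = \sum_i\left[\varphi(\text{slice}_i) - \varphi(\text{edge}_i)\right] + \varphi(0)$$
$$E[\varphi] = N \times \left(E[\varphi(\text{slice})] - E[\varphi(\text{edge})]\right) + \varphi(0)$$

From 40 “slice” simulations:

$$\varphi_1(\text{slice}) = 7.8 \pm 0.35, \qquad \varphi_1(\text{edge}) = 2.5 \pm 0.15$$
$$\varphi(\text{slice}) = \left((6 \pm 0.5) + (6.7 \pm 0.8)\sqrt{u}\right)e^{-u/2}$$
$$\varphi(\text{edge}) = (4.4 \pm 0.2)\,e^{-u/2}$$
$$\mathcal{N}_1 = 28 \pm 9, \qquad \mathcal{N}_2 = 120 \pm 14$$

Consistent with the full-sky simulation.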
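The arithmetic that takes the slice and edge fits to full-sky coefficients can be sketched as below. Two assumptions are made here and are not stated on the slide: the φ(0) term does not contribute to the e^{-u/2} and √u e^{-u/2} coefficients, and the slice and edge uncertainties are uncorrelated, so they add in quadrature. The function name is hypothetical.

```python
import math


def combine_slices(n_slices, slice_const, slice_sqrt, edge_const):
    """Scale per-slice fits to full-sky coefficients (N1, N2).

    Each argument after n_slices is a (value, error) pair:
      phi(slice) = (slice_const + slice_sqrt * sqrt(u)) * exp(-u/2)
      phi(edge)  = edge_const * exp(-u/2)
    Assumes uncorrelated errors and no contribution from phi(0).
    """
    (a, da), (b, db), (c, dc) = slice_const, slice_sqrt, edge_const
    n1 = n_slices * (a - c)
    n1_err = n_slices * math.hypot(da, dc)  # quadrature sum of the two errors
    n2 = n_slices * b
    n2_err = n_slices * db
    return (n1, n1_err), (n2, n2_err)


# Values quoted for 18 slices, fitted from 40 "slice" simulations.
(n1, n1_err), (n2, n2_err) = combine_slices(18, (6.0, 0.5), (6.7, 0.8), (4.4, 0.2))
```

This gives N₁ ≈ 28.8 and N₂ ≈ 120.6, in line with the quoted 28 ± 9 and 120 ± 14.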

2-D example #2: resonance search with unknown width

• Gaussian signal on exponential background

• Toy model: 0 < m < 100, 2 < σ < 6

• Unbinned likelihood:

$$\mathcal{L} = \mathrm{Poiss}(N \,|\, N_s+N_b) \times \prod_i \frac{N_s f_s(x_i) + N_b f_b(x_i)}{N_s+N_b}$$

$$f_s(x; m, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-m)^2/2\sigma^2}, \qquad f_b(x) = c\,e^{-cx}$$

[Figure: toy data spectrum, and $q_0$ over the (m, σ) plane with its maximum $\hat{q}_0$]
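The unbinned likelihood of this toy model can be written out directly. The sketch below is illustrative only: the function names, the sample events and the parameter values are hypothetical, and the background density c·e^{-cx} is used as written, without renormalising to the mass range.

```python
import math


def gaussian_pdf(x, m, sigma):
    """Signal density f_s(x; m, sigma)."""
    return math.exp(-((x - m) ** 2) / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)


def exponential_pdf(x, c):
    """Background density f_b(x) = c * exp(-c x)."""
    return c * math.exp(-c * x)


def log_likelihood(xs, n_s, n_b, m, sigma, c):
    """log of Poiss(N | Ns+Nb) * prod_i (Ns f_s(x_i) + Nb f_b(x_i)) / (Ns+Nb)."""
    n = len(xs)
    total = n_s + n_b
    # Poisson term for observing n events when total are expected
    log_poisson = n * math.log(total) - total - math.lgamma(n + 1)
    # Shape term: mixture density evaluated at every event
    log_shape = sum(
        math.log((n_s * gaussian_pdf(x, m, sigma) + n_b * exponential_pdf(x, c)) / total)
        for x in xs
    )
    return log_poisson + log_shape


# Hypothetical events with a small cluster near x = 50.
events = [5.0, 20.0, 48.0, 51.0, 53.0, 80.0]
ll_at_cluster = log_likelihood(events, 2.0, 4.0, 50.0, 3.0, 0.02)
ll_elsewhere = log_likelihood(events, 2.0, 4.0, 10.0, 3.0, 0.02)
```

Scanning `m` and `sigma` over the toy ranges and comparing with the `n_s = 0` case is what produces the $q_0(m, \sigma)$ field. The maximisation over the signal strength is left out of this sketch.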

2-D example #2: resonance search with unknown width

$$E[\varphi(A_u)] = \tfrac{1}{2}P(\chi^2 > u) + e^{-u/2}\left(\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2\right)$$

$$\varphi_0 = 4.5 \pm 0.2, \qquad \varphi_1 = 3 \pm 0.16$$
$$\mathcal{N}_1 = 4 \pm 0.2, \qquad \mathcal{N}_2 = 0.7 \pm 0.3$$

[Figure: excursion sets in the (m, σ) plane at u=0 and u=1; p-value versus $\hat{q}_0$]

Excellent approximation above the ~2σ level.

More dimensions …

• Potential additional search dimensions: time, temporal/angular scale, energy, …

• When possible, slicing can be useful to reduce necessary computations.

• Slightly more complicated to calculate the E.C. on an N-D grid, but the formalism is well suited.

Summary

• The Euler characteristic formula provides a practical way of estimating the look-elsewhere effect.

• Applicable in a wide range of applications, such as astrophysical searches for neutrino sources or resonance searches with unknown width, and in any number of search dimensions.

• The procedure for estimating the p-value is simple and reliable.

$$\text{p-value} \approx E[\varphi(A_u)] = \tfrac{1}{2}P(\chi^2 > u) + e^{-u/2}\left(\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2\right) + \ldots$$

Backup

[Figure: significance map $q_0(\theta, \phi)$ and excursion set (u=1)]

A small modification

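• Usually we only look for “positive” signals:

$$q_0(\theta) = \begin{cases} -2\log\dfrac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat{\mu},\theta)} & \hat{\mu} > 0 \\ 0 & \hat{\mu} \le 0 \end{cases}$$

$q_0(\theta)$ is “half chi2” [H. Chernoff, Ann. Math. Stat. 25, 573–578 (1954)]

The p-value simply gets multiplied by 1/2.

• Or equivalently consider $\hat{\mu}$ as a Gaussian field, since by Wald’s theorem

$$q_0(\theta) = \frac{\hat{\mu}^2(\theta)}{\sigma^2} \quad (\hat{\mu} > 0), \qquad q_0(\theta) = 0 \quad (\hat{\mu} \le 0)$$

[Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727]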

The 1-dimensional case

For a χ² random field, the expected number of upcrossings of a level $u$ is given by [Davies, 1987]:

$$E[N_u] = \mathcal{N}_1 e^{-u/2}$$

[Figure: events / unit mass versus mass, and the corresponding $q_0(m)$ scan with the level $u$ and the maximum $\hat{q}_0$ marked]

To have the global maximum above a level $u$:

- Either have at least one upcrossing ($N_u > 0$) or have $q_0 > u$ at the origin ($q_0(0) > u$):

$$P(\hat{q}_0 > u) \le P(N_u > 0) + P(q_0(0) > u) \le E[N_u] + P(q_0(0) > u)$$

Note the inequality $E[N_u] \ge P(N_u > 0)$:

$$1\cdot P(1) + 2\cdot P(2) + \ldots \ge P(1) + P(2) + \ldots$$

When $P(N_u > 1) \ll P(N_u = 1)$ (large $u$), then $E[N_u] \simeq P(N_u = 1) \simeq P(N_u > 0)$: the bound becomes an equality for large $u$.

[R.B. Davies, Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43 (1987)]

The 1-dimensional case

The only unknown is $\mathcal{N}_1$, which can be estimated from the average number of upcrossings $N_{u_0}$ at some low reference level $u_0$:

$$E[N_u] = \mathcal{N}_1 e^{-u/2} \quad\Rightarrow\quad \mathcal{N}_1 \cong N_{u_0}\, e^{u_0/2}$$

The p-value can then be estimated by Davies’ formula:

$$P(\hat{q}_0 > u) \le E[N_u] + P(q_0(0) > u) = \mathcal{N}_1 e^{-u/2} + \tfrac{1}{2}P(\chi^2_1 > u)$$

[Figure: events / unit mass versus mass, and the corresponding $q_0(m)$ scan with the level $u$ and the maximum $\hat{q}_0$ marked]
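Counting upcrossings on a sampled scan and converting the average into a coefficient can be sketched as follows. The function names and the sample scan are hypothetical, and the scan is assumed to be sampled finely enough that no crossing is missed between grid points.

```python
import math


def count_upcrossings(q, u):
    """Number of times the sampled scan q goes from <= u to > u."""
    return sum(1 for a, b in zip(q, q[1:]) if a <= u < b)


def estimate_n1(scans, u_ref):
    """Estimate N1 from the mean upcrossing count at the reference level.

    Uses E[N_u] = N1 * exp(-u/2), i.e. N1 = <N_u_ref> * exp(u_ref/2).
    """
    mean_upcrossings = sum(count_upcrossings(q, u_ref) for q in scans) / len(scans)
    return mean_upcrossings * math.exp(u_ref / 2.0)


# A made-up scan of q0(m) on a mass grid, for illustration only.
scan = [0.0, 2.0, 0.0, 3.0, 4.0, 1.0, 0.2, 5.0]
n_up = count_upcrossings(scan, 1.0)
n1_hat = estimate_n1([scan], 1.0)
```

In practice `scans` would hold one `q0(m)` scan per background-only simulation, evaluated at a low reference level where upcrossings are frequent.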

1-D example: resonance search

[Figure: events / unit mass versus mass]

The model is a Gaussian signal (with unknown location m) on top of a continuous background (Rayleigh distribution):

$$\mathcal{L} = \prod_i \mathrm{Poiss}\left(n_i \,|\, \mu s_i(m) + \beta b_i\right)$$

In this example we find [from 100 random background simulations]:

$$N(u = 0.5) = 4.34 \pm 0.11 \quad\Rightarrow\quad \mathcal{N}_1 = 5.58 \pm 0.14$$

[Figure: p-value versus $\max_m q_0(m)$, compared with $\mathcal{N}_1 e^{-u/2} + \tfrac{1}{2}P(\chi^2_1 > u)$]

Excellent approximation already from ~2σ (p-value ≈ 5×10⁻²)

[E. Gross and O. Vitells, Eur. Phys. J. C 70, 1-2 (2010), arXiv:1005.1891]
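The numbers quoted for this example can be pushed through Davies' formula in a few lines. This is a sketch with a hypothetical function name. It uses P(χ²₁ > u) = erfc(√(u/2)) and caps the bound at 1, which is a choice made here rather than something stated on the slides.

```python
import math


def davies_p_value(u, n_ref, u_ref):
    """Bound on P(max q0 > u) from the mean upcrossings n_ref at level u_ref.

    N1 = n_ref * exp(u_ref/2), then p <= N1*exp(-u/2) + 0.5*P(chi2_1 > u).
    Returns (N1, bound), with the bound capped at 1.
    """
    n1 = n_ref * math.exp(u_ref / 2.0)
    bound = n1 * math.exp(-u / 2.0) + 0.5 * math.erfc(math.sqrt(u / 2.0))
    return n1, min(bound, 1.0)


# Mean upcrossings 4.34 at the reference level 0.5, as quoted above.
n1, p_at_16 = davies_p_value(16.0, 4.34, 0.5)
```

The recovered `n1` is about 5.57, close to the quoted 5.58 ± 0.14, and `p_at_16` is the estimated global p-value for a maximum of $q_0 = 16$ (a local 4σ excess).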
