Estimating the significance of a signal in a multi-dimensional search

Ofer Vitells, Eilam Gross

TAUP 2011, 5-9 September, Munich

O. Vitells & E. Gross, Astropart. Phys. (2011), doi:10.1016/j.astropartphys.2011.08.005



Introduction

• Searching for a signal in some parameter space (mass, shape, location in the sky, etc.) involves a “look-elsewhere effect”: the significance calculation must account for the possibility that the signal could appear anywhere within the search range.

• Monte Carlo simulation is a straightforward way of estimating the p-value, but can be computationally very expensive (it requires repeating the entire search procedure many times with background-only simulations).

• The mathematical theory of random fields provides useful analytic results.

[Figure: example spectrum, events per unit mass versus mass]


Random fields

• Usually one defines the test statistic

$$q_0(\theta) = -2\log\frac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat{\mu},\theta)}$$

for testing $H_0: \mu = 0$ against $H_1: \mu > 0$, where μ is the “signal strength” and θ is the parameterization of the search space.

• For any fixed θ, q0(θ) follows (asymptotically) a χ² distribution with one degree of freedom by Wilks’ theorem.

• q0(θ) is a χ² random field over the space of θ (a random variable indexed by one or more continuous parameters). We are interested in

$$\hat{q}_0 \equiv q_0(\hat{\theta}) = -2\log\frac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat{\mu},\hat{\theta})} = \max_\theta[q_0(\theta)]$$

where $\hat{\theta}$ is the global maximum point.

• For this quantity we want to know the p-value:

$$\text{p-value} = P(\max_\theta[q_0(\theta)] \ge u)$$


Excursion sets & the Euler characteristic

• The set of points where the field has values larger than some number u is called the excursion set A_u above the level u:

$$A_u = \{\theta \in \mathcal{M} : q(\theta) > u\}$$

[Figure: three example excursion sets, with φ=1, φ=0 and φ=2]

• The Euler characteristic φ is a topological property; in two dimensions it is the number of disconnected components minus the number of ‘holes’.


Expectation of the Euler characteristic

• For random fields defined over any parameter space (Riemannian manifold) in D dimensions, the expected Euler characteristic of the excursion set φ(A_u) is given by:

$$E[\varphi(A_u)] = \sum_{d=0}^{D} \mathcal{N}_d\,\rho_d(u)$$

[R.J. Adler and J.E. Taylor, Random Fields and Geometry (2007), Springer Monographs in Mathematics]

- The $\rho_d$ are ‘universal’ functions (they depend only on the level u and the type of distribution).

- The geometrical shape of the space and the covariance structure of the field are completely encoded in the coefficients $\mathcal{N}_d$ (which do not depend on the level u).


e.g. for a χ² field with s degrees of freedom:

$$\rho_0(u) = P(\chi^2_s > u)$$

$$\rho_1(u) = u^{(s-1)/2}\,e^{-u/2}$$

$$\rho_2(u) = u^{(s-2)/2}\,e^{-u/2}\,[u-(s-1)]$$

...
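As a minimal sketch (not part of the original slides), the functions listed here can be written in Python with the standard library only. The function names are hypothetical, constant factors are left out exactly as on the slide, and the tail probability is implemented only for s = 1, where it reduces to a complementary error function.

```python
import math

def rho0_one_dof(u):
    """P(chi2 with one degree of freedom > u); valid for u >= 0."""
    return math.erfc(math.sqrt(u / 2.0))

def rho1(u, s=1):
    """rho_1(u) = u^((s-1)/2) * exp(-u/2), as written on the slide."""
    return u ** ((s - 1) / 2.0) * math.exp(-u / 2.0)

def rho2(u, s=1):
    """rho_2(u) = u^((s-2)/2) * exp(-u/2) * [u - (s-1)]; requires u > 0."""
    return u ** ((s - 2) / 2.0) * math.exp(-u / 2.0) * (u - (s - 1))

if __name__ == "__main__":
    # Tabulate the three functions for a chi2 field with s = 1
    for u in (1.0, 4.0, 9.0):
        print(u, rho0_one_dof(u), rho1(u), rho2(u))
```

For s = 1, `rho2` reduces to √u·e^(−u/2), which is the form that appears in the 2-D examples of this talk.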


Expectation of the Euler characteristic

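• Why is E[φ(A_u)] interesting?

Above high levels, excursions are rare, so

$$\varphi(A_u) \approx \begin{cases} 1 & \max_\theta[q(\theta)] > u \\ 0 & \text{otherwise} \end{cases}$$

and therefore

$$E[\varphi(A_u)] \approx P(\max_\theta[q_0(\theta)] \ge u)$$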


2-D example: IceCube search for astrophysical neutrino point sources

IceCube looks for neutrino sources: a 2-D search over the sky (θ,φ).

Unbinned likelihood:

$$\mathcal{L}(\vec{x}_s, n_s) = \prod_i \left[\frac{n_s}{N}\, f_s(\vec{x}_i) + \left(1-\frac{n_s}{N}\right) f_b(\vec{x}_i)\right]$$

[J. Braun, J. Dumm, F. De Palma, C. Finley, A. Karle, and T. Montaruli, Astropart. Phys. 29, 299 (2008); arXiv:0801.1604]

Assume a Gaussian distribution of signal events around the source position $\vec{x}_s = (\theta,\varphi)$:

$$f_s(\vec{x}_i \mid \vec{x}_s) = \frac{1}{2\pi\sigma^2}\, e^{-|\vec{x}_i-\vec{x}_s|^2/2\sigma^2}$$

Detector resolution = 0.7°

Signal parameters can also include energy and time, not considered here.


2-D example: search for neutrino sources (IceCube)

Properly covering the whole sky requires a grid of ~1000² points.

[Figure: significance map q0(θ,φ)]

IceCube simulated background data (1 year), 67,000 events, provided by Jim Braun & Teresa Montaruli.


[Figure: significance map q0(θ,φ) and its excursion set (u=1), φ=95]


Calculation of the Euler characteristic

• Usually we have q(θ) calculated on a grid of points

• Calculation of the E.C. is straightforward:

• φ = #points − #edges + #faces

• Generalizes to higher dimensions

[Figure: example excursion set on a grid] φ = 18 (points) − 23 (edges) + 7 (faces) = 2
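A minimal Python sketch of this counting on a 2-D grid. It assumes an illustrative convention that is not spelled out on the slide: grid points above the level are the "points", horizontally or vertically adjacent pairs of such points are the "edges", and unit squares with all four corners above the level are the "faces". The function name is hypothetical.

```python
def euler_characteristic(grid, u):
    """Euler characteristic of the excursion set {q > u} on a rectangular grid.

    grid is a list of equal-length rows of q values.
    Returns #points - #edges + #faces.
    """
    rows, cols = len(grid), len(grid[0])
    above = [[grid[r][c] > u for c in range(cols)] for r in range(rows)]

    points = edges = faces = 0
    for r in range(rows):
        for c in range(cols):
            if not above[r][c]:
                continue
            points += 1
            right = c + 1 < cols and above[r][c + 1]
            down = r + 1 < rows and above[r + 1][c]
            # Each edge is counted once, from its left or upper end point
            edges += int(right) + int(down)
            # Each face is counted once, from its upper-left corner
            if right and down and above[r + 1][c + 1]:
                faces += 1
    return points - edges + faces

if __name__ == "__main__":
    ring = [
        [0, 0, 0, 0, 0],
        [0, 2, 2, 2, 0],
        [0, 2, 0, 2, 0],
        [0, 2, 2, 2, 0],
        [0, 0, 0, 0, 0],
    ]
    # One connected component with one hole
    print(euler_characteristic(ring, 1.0))
```

With this convention a ring gives 8 points − 8 edges + 0 faces = 0, and two isolated points give 2, in line with "components minus holes".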


2-d example: search for neutrino sources (IceCube)

For a chi2 field in 2 dimensions:

$$E[\varphi(A_u)] = \tfrac{1}{2}P(\chi^2 > u) + (\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2)\,e^{-u/2}$$

Estimate E[φ] at two levels, e.g. 0 and 1, and solve for $\mathcal{N}_1$ and $\mathcal{N}_2$.

From 20 bkg. simulations:

$\varphi_0 = 33.5 \pm 2$
$\varphi_1 = 94.6 \pm 1.3$

$\mathcal{N}_1 = 33 \pm 2$
$\mathcal{N}_2 = 123 \pm 3$

[Figure: example excursion sets at u=0 (φ=35) and u=1 (φ=95)]
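The two-level solution can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: it takes the 2-D formula in the form E[φ(A_u)] = ½·P(χ²₁ > u) + (N₁ + √u·N₂)·e^(−u/2), and the function names are hypothetical.

```python
import math

def half_chi2_tail(u):
    """0.5 * P(chi2 with one degree of freedom > u)."""
    return 0.5 * math.erfc(math.sqrt(u / 2.0))

def solve_coefficients(u_a, phi_a, u_b, phi_b):
    """Solve for (N1, N2) from the mean Euler characteristic at two levels.

    Each measurement gives one linear equation  N1 + sqrt(u) * N2 = y,
    with y = (phi - 0.5 * P(chi2 > u)) * exp(u / 2).
    The two levels must differ.
    """
    y_a = (phi_a - half_chi2_tail(u_a)) * math.exp(u_a / 2.0)
    y_b = (phi_b - half_chi2_tail(u_b)) * math.exp(u_b / 2.0)
    n2 = (y_b - y_a) / (math.sqrt(u_b) - math.sqrt(u_a))
    n1 = y_a - math.sqrt(u_a) * n2
    return n1, n2

if __name__ == "__main__":
    # Mean Euler characteristics quoted on the slide for levels 0 and 1
    print(solve_coefficients(0.0, 33.5, 1.0, 94.6))
```

Fed with the central values 33.5 and 94.6, this returns coefficients close to the quoted 33 and 123. Uncertainties are not propagated in this sketch.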


2-d example: search for neutrino sources (IceCube)

$$E[\varphi(A_u)] = \tfrac{1}{2}P(\chi^2 > u) + (\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2)\,e^{-u/2}$$

$\mathcal{N}_1 = 33 \pm 2$
$\mathcal{N}_2 = 123 \pm 3$

[Figure: p-value versus $\hat{q}_0$, from ~200,000 random background simulations]

e.g.: P(max q0 > 30) = (2.5 ± 0.4)×10⁻⁴ (estimated)

E.C. formula: (2.28 ± 0.06)×10⁻⁴


Note this is NOT a simple “trial factor” correction: the effective trial factor increases with local significance.

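A short Python sketch of how the formula is evaluated and why the correction is not a constant trial factor. It assumes the 2-D formula in the form E[φ(A_u)] = ½·P(χ²₁ > u) + (N₁ + √u·N₂)·e^(−u/2), takes the local p-value to be ½·P(χ²₁ > u), and defines the effective trial factor as the ratio of the two; these definitions and the function names are illustrative assumptions.

```python
import math

def half_chi2_tail(u):
    """Local p-value for a one-sided test: 0.5 * P(chi2_1 > u)."""
    return 0.5 * math.erfc(math.sqrt(u / 2.0))

def expected_ec(u, n1, n2):
    """E[phi(A_u)], used as the global p-value estimate at high levels u."""
    return half_chi2_tail(u) + math.exp(-u / 2.0) * (n1 + math.sqrt(u) * n2)

def trial_factor(u, n1, n2):
    """Ratio of the global estimate to the local p-value at level u."""
    return expected_ec(u, n1, n2) / half_chi2_tail(u)

if __name__ == "__main__":
    n1, n2 = 33.0, 123.0  # rounded central values quoted on the slide
    print("global p-value estimate at u=30:", expected_ec(30.0, n1, n2))
    for u in (9.0, 16.0, 25.0):
        print("u =", u, "trial factor =", trial_factor(u, n1, n2))
```

With the rounded coefficients the estimate at u = 30 comes out at about 2.2×10⁻⁴, close to the values quoted on the slide, and the ratio grows steadily with u. At low levels E[φ] exceeds 1 and is not a probability, so the ratio is only meaningful for high u.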


Slicing

• Exploit the azimuthal angle symmetry to reduce computations:

$$\varphi(A \cup B) = \varphi(A) + \varphi(B) - \varphi(A \cap B)$$

[Figure: example, φ = 0 = 1 + 1 − 2]

Divide into N slices (N = 18):

$$\varphi = \sum_i [\varphi(\text{slice}_i) - \varphi(\text{edge}_i)] + \varphi(0)$$

$$E[\varphi] = N \times (E[\varphi(\text{slice})] - E[\varphi(\text{edge})]) + \varphi(0)$$

From 40 “slice” simulations:

$\varphi_1(\text{slice}) = 7.8 \pm 0.35$
$\varphi_1(\text{edge}) = 2.5 \pm 0.15$

$$\varphi(\text{slice}) = ((6 \pm 0.5) + (6.7 \pm 0.8)\sqrt{u})\,e^{-u/2}$$

$$\varphi(\text{edge}) = (4.4 \pm 0.2)\,e^{-u/2}$$

$\mathcal{N}_1 = 28 \pm 9$
$\mathcal{N}_2 = 120 \pm 14$

Consistent with the full-sky simulation.
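The step from the per-slice results to the full-sky coefficients can be sketched as follows. This is one reading of the slide, not the authors' code: the coefficients are obtained by matching the e^(−u/2) and √u·e^(−u/2) terms of N × (E[φ(slice)] − E[φ(edge)]), the constant φ(0) term is ignored, and the uncertainties are assumed uncorrelated. The function and parameter names are hypothetical.

```python
import math

def combine_slices(n_slices, a, a_err, b, b_err, c, c_err):
    """Full-sky (N1, N2) from per-slice and per-edge fits.

    Per-slice fit:  phi(slice) = (a + b * sqrt(u)) * exp(-u / 2)
    Per-edge fit:   phi(edge)  = c * exp(-u / 2)
    Returns ((N1, N1_err), (N2, N2_err)).
    """
    n1 = n_slices * (a - c)
    n2 = n_slices * b
    # Quadrature sum, assuming the slice and edge fits are independent
    n1_err = n_slices * math.hypot(a_err, c_err)
    n2_err = n_slices * b_err
    return (n1, n1_err), (n2, n2_err)

if __name__ == "__main__":
    # Fit values quoted on the slide, with N = 18 slices
    print(combine_slices(18, 6.0, 0.5, 6.7, 0.8, 4.4, 0.2))
```

With the rounded fit values from the slide this gives roughly 29 ± 10 and 121 ± 14, in line with the quoted 28 ± 9 and 120 ± 14.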


2-D example #2: resonance search with unknown width

• Gaussian signal on exponential background

• Toy model: 0<m<100, 2<σ<6

• Unbinned likelihood:

$$\mathcal{L} = \mathrm{Poiss}(N \mid N_s + N_b) \times \prod_i \frac{N_s f_s(x_i) + N_b f_b(x_i)}{N_s + N_b}$$

$$f_s(x; m, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-m)^2/2\sigma^2}, \qquad f_b(x) = c\,e^{-cx}$$

[Figure: toy spectrum on a log scale, and map of q0 over the (m, σ) plane]


2-D example #2: resonance search with unknown width

$$E[\varphi(A_u)] = \tfrac{1}{2}P(\chi^2 > u) + (\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2)\,e^{-u/2}$$

$\varphi_0 = 4.5 \pm 0.2$
$\varphi_1 = 3 \pm 0.16$

$\mathcal{N}_1 = 4 \pm 0.2$
$\mathcal{N}_2 = 0.7 \pm 0.3$

[Figure: excursion sets in the (m, σ) plane at u=0 and u=1]

[Figure: p-value versus $\hat{q}_0$]

Excellent approximation above the ~2σ level.


More dimensions …

• Potential additional search dimensions: time, temporal/angular scale, energy, …

• When possible, slicing can be useful to reduce the necessary computations.

• It is slightly more complicated to calculate the E.C. on an N-D grid, but the formalism is well suited.


Summary

• The Euler characteristic formula provides a practical way of estimating the look-elsewhere effect.

• Applicable in a wide range of applications, such as astrophysical searches for neutrino sources or resonance searches with unknown width, and in any number of search dimensions.

• The procedure for estimating the p-value is simple and reliable:

$$\text{p-value} \approx E[\varphi(A_u)] = \tfrac{1}{2}P(\chi^2 > u) + (\mathcal{N}_1 + \sqrt{u}\,\mathcal{N}_2)\,e^{-u/2} + \dots$$


Backup


2-d example: search for neutrino sources (IceCube)

[Figure: significance map q0(θ,φ) and excursion set (u=1)]


A small modification

• Usually we only look for ‘positive’ signals:

$$q_0(\theta) = \begin{cases} -2\log\dfrac{\mathcal{L}(\mu=0)}{\mathcal{L}(\hat{\mu},\theta)} & \hat{\mu} > 0 \\ 0 & \hat{\mu} \le 0 \end{cases}$$

q0(θ) is ‘half chi2’ [H. Chernoff, Ann. Math. Stat. 25, 573–578 (1954)]

The p-value just gets divided by 2.

• Or equivalently, consider $\hat{\mu}$ as a Gaussian field, since by Wald’s theorem

$$q_0(\theta) = \left(\frac{\hat{\mu}(\theta)}{\sigma}\right)^2$$

[Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727]


The 1-dimensional case

For a chi2 random field, the expected number of upcrossings of a level u is given by [Davies, 1987]:

$$E[N_u] = \mathcal{N}_1\,e^{-u/2}$$

[Figure: example spectrum (events per unit mass versus mass) and the scan q0(m), with the level u and the maximum $\hat{q}_0$ marked]

To have the global maximum above a level u:

- Either have at least one upcrossing (N_u > 0) or have q0 > u at the origin (q0(0) > u):

$$P(\hat{q}_0 > u) \le P(N_u > 0) + P(q_0(0) > u) \le E[N_u] + P(q_0(0) > u)$$

This becomes an equality for large u.

Note the inequality $E[N_u] \ge P(N_u > 0)$:

$$1\cdot P(1) + 2\cdot P(2) + \dots \ge P(1) + P(2) + \dots$$

When $P(N_u > 1) \ll P(N_u = 1)$ (large u), then

$$E[N_u] \simeq P(N_u = 1) \simeq P(N_u > 0)$$

[R.B. Davies, Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43 (1987)]


The 1-dimensional case

$$E[N_u] = \mathcal{N}_1\,e^{-u/2}$$

$$P(\hat{q}_0 > u) \le E[N_u] + P(q_0(0) > u) = \mathcal{N}_1\,e^{-u/2} + \tfrac{1}{2}P(\chi^2_1 > u)$$

The only unknown is $\mathcal{N}_1$, which can be estimated from the average number of upcrossings at some low reference level $u_0$:

$$\mathcal{N}_1 \cong \langle N_{u_0}\rangle\, e^{u_0/2}$$

[Figure: example spectrum (events per unit mass versus mass) and the scan q0(m), with the level u and the maximum $\hat{q}_0$ marked]

The p-value can then be estimated by Davies’ formula.


1-D example: resonance search

[Figure: example spectrum, events per unit mass versus mass]

The model is a Gaussian signal (with unknown location m) on top of a continuous background (Rayleigh distribution):

$$\mathcal{L} = \prod_i \mathrm{Poiss}(n_i \mid \mu\, s_i(m) + \beta\, b_i)$$

In this example we find

$$N(u_0 = 0.5) = 4.34 \pm 0.11$$

[from 100 random background simulations]

$$\mathcal{N}_1 = 5.58 \pm 0.14$$

[Figure: p-value versus $\max_m q_0(m)$, compared with $\mathcal{N}_1 e^{-u/2} + \tfrac{1}{2}P(\chi^2_1 > u)$]

[E. Gross and O. Vitells, Eur. Phys. J. C, 70, 1-2, (2010), arXiv:1005.1891]

Excellent approximation already from ~2σ (p-value ≈ 5×10⁻²).
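A minimal Python sketch of the 1-D recipe: count upcrossings of a scanned q0(m) at a low reference level, scale the average to obtain N₁, and evaluate the bound N₁·e^(−u/2) + ½·P(χ²₁ > u). The function names and the convention used to detect an upcrossing between neighbouring scan points are illustrative assumptions.

```python
import math

def count_upcrossings(q_values, level):
    """Number of times a scanned q0(m) crosses `level` from below.

    q_values is a list of q0 at consecutive scan points.
    """
    return sum(1 for a, b in zip(q_values, q_values[1:]) if a <= level < b)

def n1_from_reference(mean_upcrossings, u_ref):
    """N1 = <N(u_ref)> * exp(u_ref / 2), from E[N_u] = N1 * exp(-u / 2)."""
    return mean_upcrossings * math.exp(u_ref / 2.0)

def davies_bound(u, n1):
    """Upper bound on the global p-value: N1*exp(-u/2) + 0.5*P(chi2_1 > u)."""
    return n1 * math.exp(-u / 2.0) + 0.5 * math.erfc(math.sqrt(u / 2.0))

if __name__ == "__main__":
    # Average upcrossing count at u0 = 0.5 quoted on the slide
    n1 = n1_from_reference(4.34, 0.5)
    print("N1 =", n1)
    print("bound at u=25:", davies_bound(25.0, n1))
```

With the quoted average of 4.34 upcrossings at u0 = 0.5, the scaling gives N₁ ≈ 5.57, consistent with the 5.58 ± 0.14 on the slide. In practice the average would come from applying `count_upcrossings` to each background-only scan.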