analisi statistica dei dati nella fisica nucl. e subnucl...

Post on 29-Aug-2018

222 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Analisi Statistica dei dati nella Fisica Nucl. e Subnucl. [Laboratorio]

Gabriele SirriIstituto Nazionale di Fisica Nucleare

1

Questa è una raccolta delle slides mostrate nell’A.A. 2015-2016 …

… in continuo aggiornamento e rielaborazione.

2

Credits:RooFit slides extracted or adapted from original presentations by Wouter Verkerke .

3

4

Terminology

5

7

Data modelling

What’s it all about ?EstimatorsThe maximum likelihood estimator

9

10

Data modelling – analysis examples

• Typical questions

– We obtained a mass distribution from data

– what are the signal and background(s) yields?

– what is the significance of the signal?

• Typical tasks

– Creation of an adequate model for the data

– Description of detector effects such as acceptances and resolutions

– Make sure the model is correctly implemented - toy Monte Carlo studies

– Fit the model to the data

– Graphical representation of the data and fit results

11

12

Intermezzo – Functions vs probability density functions

• Why use probability density functions rather than ‘plain’ functions to describe your data?

– Easier to interpret your models. If Blue and Green pdf are each guaranteed to be normalized to 1, then fractions of Blue,Green can be cleanly interpreted as #events

– Many statistical techniques onlyfunction properly with PDFs(e.g maximum likelihood)

– Can sample ‘toy Monte Carlo’ eventsfrom p.d.f because value is always guaranteed to be >=0

• So why is not everybody always using them

– The normalization can be hard to calculate(e.g. it can be different for each set of parameter values p)

– In >1 dimension (numeric) integration can be particularly hard

– RooFit aims to simplify these tasks

13

Estimators

i

i

i

i

xN

xV

xN

x

2)(1

)(ˆ

1)(ˆ

Estimator of the mean

Estimator of the variance

aan )ˆ(lim

2)ˆˆ()ˆ( aaaV This is called theMinimum Variance Bound

Note: Cramer-Rao theorem says there is a limit to the accuracy of an estimator

ie. that there is some estimator for which the variance is a minimum (MVB).

Estimators are called efficient if V(estimator)=MVB

• An estimator is a procedure giving a value for a parameter or a property of a distribution as a function of the actual data values, i.e.

• A perfect estimator is

– Consistent:

– Unbiased – With finite statistics you get the right answer on average

– Efficient

– There are no perfect estimators for real-life problems

14

The Likelihood estimator

)...;();();()(i.e.,);()( 210 pxFpxFpxFpLpxFpLi

i

i

i pxFpL );(ln)(ln

0)(ln

ˆ

ii pp

pd

pLd

Functions used in likelihoods must be Probability Density Functions:

0);(,1);( pxFxdpxF

• Definition of Likelihood

– given D(x) and F(x;p)

– For convenience the negative log of the Likelihood is often used

• Parameters are estimated by maximizing the Likelihood, or equivalently minimizing –log(L)

15

Maximum Likelihood – Variance on ML parameter estimates

p

1

2

22 ln

)(ˆ)(ˆ

pd

LdpVp

pd

Ld

dpdb

pV2

2 ln

1)ˆ(

From Rao-Cramer-Frechetinequality

b = bias as function of p,inequality becomes equalityin limit of efficient estimator

2

1ln)(ln

ˆ2

)ˆ(ln

2

)ˆ(lnln

)ˆ(ln

)ˆ(ln

)ˆ(ln)(ln

max2

2

max

2

ˆ

2

2

max

2

ˆ

2

2

21

ˆ

LpLpp

L

pp

pd

LdL

pppd

Ldpp

dp

LdpLpL

p

pp

pppp

-lo

g(L)

0.5

• Estimator for the parameter variance is

– I.e. variance is estimated from 2nd derivative of –log(L) at minimum

– Valid if estimator is efficient and unbiased!

• Visual interpretation of variance estimate

– Taylor expand –log(L) around minimum

16

Maximum Likelihood – Properties of MLEs

22ˆ pp Use of 2nd derivative of –log(L)

for variance estimate is usually OK

• In general, Maximum Likelihood Estimators are

– Consistent (gives right answer for N)

– Mostly unbiased (bias 1/N, may need to worry at small N)

– Efficient for large N (you get the smallest possible error)

– Invariant: (a transformation of parameters will Not change your answer, e.g

• MLE efficiency theorem: the MLE will be unbiased and efficient if an unbiased efficient estimator exists

17

Maximum Likelihood – Extended ML

)...;();();()(i.e.,);()( 210 pxFpxFpxFpLpxFpLi

i

)log()),(log()(log expexp NNNpxgpL obs

D

i

Log of Poisson(Nexp,Nobs) (modulo a constant)

• Maximum likelihood information only parameterizes shape of distribution

– I.e. one can determine fraction of signal events from ML fit, but not number of signal events

• Extended Maximum likelihood add extra term

– Clever choice of parameters will allows us to extract Nsig and Nbkg in one pass ( Nexp=Nsig+Nbkg, fsig=Nsig/(Nsig+Nbkg) )

18

Introduction & Overview1

24

Introduction -- Focus: coding a probability density function• Focus on one practical aspect of many data analysis in

HEP: How do you formulate your p.d.f. in ROOT– For ‘simple’ problems (gauss, polynomial) this is easy

– But if you want to do unbinned ML fits, use non-trivial functions, or work with multidimensional functions you quickly find that you need some tools to help you

25

Introduction – Why RooFit was developed

);|BkgResol();(BkgDecay);BkgSel()1(

);|SigResol())2sin(,;(SigDecay);SigSel(

bkgbkgbkgsig

sigsigsigsig

rdttqtpmf

rdttqtpmf

• BaBar experiment at SLAC: Extract sin(2) from time dependent CP violation of B decay: e+e-

Y(4s) BB

– Reconstruct both Bs, measure decay time difference

– Physics of interest is in decay time dependent oscillation

• Many issues arise

– Standard ROOT function framework clearly insufficient to handle such complicated functions must develop new framework

– Normalization of p.d.f. not always trivial to calculate may need numeric integration techniques

– Unbinned fit, >2 dimensions, many events computation performance important must try optimize code for acceptable performance

– Simultaneous fit to control samples to account for detector performance

26

Introduction – Relation to ROOT

C++ command line interface & macros

Data management &histogramming

Graphics interface

I/O support

MINUIT

ToyMC dataGeneration

Data/ModelFitting

Data Modeling

Model Visualization

Extension to ROOT – (Almost) no overlap with existing functionality

29

RooFit core design philosophy

variable RooRealVar

function RooAbsReal

PDF RooAbsPdf

space point RooArgSet

list of space points RooAbsData

integral RooRealIntegral

RooFit classMathematical concept

)(xf

x

x

dxxf

x

x

max

min

)(

)(xf

• Mathematical objects are represented as C++ objects

31

RooFit core design philosophy

f(x,y,z)

RooRealVar x RooRealVar y RooRealVar z

RooAbsReal f

RooRealVar x(“x”,”x”,5) ;

RooRealVar y(“y”,”y”,5) ;

RooRealVar z(“z”,”z”,5) ;

RooBogusFunction f(“f”,”f”,x,y,z) ;

Math

RooFitdiagram

RooFitcode

• Represent relations between variables and functionsas client/server links between objects

32

Object-oriented data modeling

RooRealVar mass(“mass”,”Invariant mass”,5.20,5.30) ;

RooRealVar width(“width”,”B0 mass width”,0.00027,”GeV”);

RooRealVar mb0(“mb0”,”B0 mass”,5.2794,”GeV”) ;

RooGaussian b0sig(“b0sig”,”B0 sig PDF”,mass,mb0,width);

Objects representinga ‘real’ value.

PDF object

Initial range

Initial value Optional unit

References to variables

– All objects are self documenting

• Name - Unique identifier of object

• Title – More elaborate description of object

33

Object-oriented data modeling

RooRealVar mass(“mass”,”Invariant mass”,5.20,5.30) ;

RooRealVar width(“width”,”B0 mass width”,0.00027,”GeV”);

RooRealVar mb0(“mb0”,”B0 mass”,5.2794,”GeV”) ;

RooGaussian b0sig(“b0sig”,”B0 sig PDF”,mass,mb0,width);

Objects representinga ‘real’ value.

PDF object

Initial range

Initial value Optional unit

References to variables

• In RooFit every variable, data point, function, PDF represented in a C++ object

– Objects classified by data/function type they represent,not by their role in a particular setup

– All objects are self documenting

• Name - Unique identifier of object

• Title – More elaborate description of object

34

Basic use2

35

The simplest possible example

RooRealVar x(“x”,”Observable”,-10,10) ;

RooRealVar mean(“mean”,”B0 mass”,0.00027,”GeV”);

RooRealVar sigma(“sigma”,”B0 mass width”,5.2794,”GeV”) ;

RooGaussian model(“model”,”signal pdf”,mass,mean,sigma)

Objects representinga ‘real’ value.

PDF object

Initial range

Initial value Optional unit

References to variables

Name of object Title of object

• We make a Gaussian p.d.f. with three variables: mass, mean and sigma

36

Basics – Creating and plotting a Gaussian p.d.f

// Create an empty plot frame

RooPlot* xframe = w::x.frame() ;

// Plot model on frame

model.plotOn(xframe) ;

// Draw frame on canvas

xframe->Draw() ;

Plot range taken from limits of x

Axis label from gauss title

Unit normalization

Setup gaussian PDF and plot

A RooPlot is an empty frame

capable of holding anythingplotted versus it variable

37

Basics – Generating toy MC events

// Generate an unbinned toy MC set

RooDataSet* data = w::gauss.generate(w::x,10000) ;

// Generate an binned toy MC set

RooDataHist* data = w::gauss.generateBinned(w::x,10000) ;

// Plot PDF

RooPlot* xframe = w::x.frame() ;

data->plotOn(xframe) ;

xframe->Draw() ;

Generate 10000 events from Gaussian p.d.f and show distribution

Can generate both binned andunbinned datasets

43

Basics – Importing data

// Import unbinned data

RooDataSet data(“data”,”data”,w::x,Import(*myTree)) ;

// Import unbinned data

RooDataHist data(“data”,”data”,w::x,Import(*myTH1)) ;

• Unbinned data can also be imported from ROOT TTrees

– Imports TTree branch named “x”.

– Can be of type Double_t, Float_t, Int_t or UInt_t.

All data is converted to Double_t internally

– Specify a RooArgSet of multiple observables to import

multiple observables

• Binned data can be imported from ROOT THx histograms

– Imports values, binning definition and SumW2 errors (if defined)

– Specify a RooArgList of observables when importing a TH2/3.

44

Basics – ML fit of p.d.f to unbinned data

// ML fit of gauss to data

w::gauss.fitTo(*data) ;(MINUIT printout omitted)

// Parameters if gauss now

// reflect fitted values

w::mean.Print()

RooRealVar::mean = 0.0172335 +/- 0.0299542

w::sigma.Print()

RooRealVar::sigma = 2.98094 +/- 0.0217306

// Plot fitted PDF and toy data overlaid

RooPlot* xframe = w::x.frame() ;

data->plotOn(xframe) ;

w::gauss.plotOn(xframe) ;

PDFautomaticallynormalizedto dataset

45

Basics – ML fit of p.d.f to unbinned data

RooFitResult* r = w::gauss.fitTo(*data,Save()) ;

r->Print() ;

RooFitResult: minimized FCN value: 25055.6,

estimated distance to minimum: 7.27598e-08

coviarance matrix quality:

Full, accurate covariance matrix

Floating Parameter FinalValue +/- Error

-------------------- --------------------------

mean 1.7233e-02 +/- 3.00e-02

sigma 2.9809e+00 +/- 2.17e-02

r->correlationMatrix().Print() ;

2x2 matrix is as follows

| 0 | 1 |

-------------------------------

0 | 1 0.0005869

1 | 0.0005869 1

• Can also choose to save full detail of fit

46

Basics – Integrals over p.d.f.s

w::x.setRange(“sig”,-3,7) ;

RooAbsReal* ig = w::g.createIntegral(x,NormSet(x),Range(“sig”)) ;

cout << ig.getVal() ;

0.832519

mean=-1 ;

cout << ig.getVal() ;

0.743677

xdxFxCx

x

min

)()(

RooAbsReal* cdf = gauss.createCdf(x) ;

• It is easy to create an object representing integral over a normalized p.d.f in a sub-range

• Similarly, one can also request the cumulative distribution function

47

RooFit core design philosophy - Workspace

f(x,y,z)

RooRealVar x RooRealVar y RooRealVar z

RooAbsReal f

RooRealVar x(“x”,”x”,5) ;

RooRealVar y(“y”,”y”,5) ;

RooRealVar z(“z”,”z”,5) ;

RooBogusFunction f(“f”,”f”,x,y,z) ;

RooWorkspace w(“w”) ;

w.import(f) ;

Math

RooFitdiagram

RooFitcode

RooWorkspace

• The workspace serves a container class for allobjects created

48

Using the workspace

RooWorkspace w(“w”) ;

RooRealVar x(“x”,”x”,-10,10) ;

RooRealVar mean(“mean”,”mean”,5) ;

RooRealVar sigma(“sigma”,”sigma”,3) ;

RooGaussian f(“f”,”f”,x,mean,sigma) ;

// imports f,x,mean and sigma

w.import(myFunction) ;

• Workspace

– A generic container class for all RooFit objects of your project

– Helps to organize analysis projects

• Creating a workspace

• Putting variables and function into a workspace

– When importing a function or pdf, all its components (variables) are automatically imported too

49

Using the workspace

w.Print() ;

variables

---------

(mean,sigma,x)

p.d.f.s

-------

RooGaussian::f[ x=x mean=mean sigma=sigma ] = 0.249352

// Variety of accessors available

RooPlot* frame = w.var(“x”)->frame() ;

w.pdf(“f”)->plotOn(frame) ;

• Looking into a workspace

• Getting variables and functions out of a workspace

50

Using the workspace

// Variety of accessors available

w.exportToCint() ;

RooPlot* frame = w::x.frame() ;

w::f.plotOn(frame) ;

w.writeToFile(“wspace.root”) ;

• Alternative access to contents through namespace

– Uses CINT extension of C++, works in interpreted code only

• Writing workspace and contents to file

51

Using the workspace

void driver() {

RooWorkspace w(“w”0 ;

makeModel(w) ;

useModel(w) ;

}

void makeModel(RooWorkspace& w) {

// Construct model here

}

void useModel(RooWorkspace& w) {

// Make fit, plots etc here

}

• Organizing your code –Separate construction and use of models

52

RooFit core design philosophy - Factory

f(x,y,z)

RooRealVar x RooRealVar y RooRealVar z

RooAbsReal f

RooWorkspace w(“w”) ;

w.factory(“BogusFunction::f(x[5],y[5],z[5])”) ;

Math

RooFitdiagram

RooFitcode

RooWorkspace

• The factory allows to fill a workspace with pdfs and variables using a simplified scripting language

53

Factory and Workspace

w.factory(“Gaussian::f(x[-10,10],mean[5],sigma[3])”) ;

RooRealVar x(“x”,”x”,-10,10) ;

RooRealVar mean(“mean”,”mean”,5) ;

RooRealVar sigma(“sigma”,”sigma”,3) ;

RooGaussian f(“f”,”f”,x,mean,sigma) ;

• One C++ object per math symbol provides ultimate level of control over each objects functionality, but results in lengthy user code for even simple macros

• Solution: add factory that auto-generates objects from a math-like language. Accessed through factory() method of workspace

• Example: reduce construction of Gaussian pdf and its parameters from 4 to 1 line of code

54

Factory language – Goal and scope

• Aim of factory language is to be very simple.

• The goal is to construct pdfs, functions and variables

– This limits the scope of the factory language (and allows to keep it simple)

– Objects can be customized after creation

• The language syntax has only three elements

1. Simplified expression for creation of variables

2. Expression for creation of functions and pdf is trivial1-to-1 mapping of C++ constructor syntax of corresponding object

3. Multiple objects (e.g. a pdf and its variables) can be nested in a single expression

• Operator classes (sum,product) provide alternate syntax in factory that is closer to math notation

55

Factory syntax

x[-10,10] // Create variable with given range

x[5,-10,10] // Create variable with initial value and range

x[5] // Create initially constant variable

Gaussian::g(x,mean,sigma)

RooGaussian(“g”,”g”,x,mean,sigma)

Polynomial::p(x,{a0,a1})

RooPolynomial(“p”,”p”,x”,RooArgList(a0,a1));

ClassName::Objectname(arg1,[arg2],...)

• Rule #1 – Create a variable

• Rule #2 – Create a function or pdf object

– Leading ‘Roo’ in class name can be omitted

– Arguments are names of objects that already exist in the workspace

– Named objects must be of correct type, if not factory issues error

– Set and List arguments can be constructed with brackets {}

56

Factory syntax

Gaussian::g(x[-10,10],mean[-10,10],sigma[3])

x[-10,10]

mean[-10,10]

sigma[3]

Gaussian::g(x,mean,sigma)

Gaussian::g(x[-10,10],0,3)

SUM::model(0.5*Gaussian(x[-10,10],0,3),Uniform(x)) ;

• Rule #3 – Each creation expression returns the name of the object created

– Allows to create input arguments to functions ‘in place’ rather than in advance

• Miscellaneous points

– You can always use numeric literals where values or functions are expected

– It is not required to give component objects a name, e.g.

57

Model building – (Re)using standard components

RooArgusBG

RooPolynomial

RooBMixDecay

RooHistPdf

RooGaussian

BasicGaussian, Exponential, Polynomial,…Chebychev polynomial

Physics inspiredARGUS,Crystal Ball, Breit-Wigner, Voigtian,B/D-Decay,….

Non-parametricHistogram, KEYS

Easy to extend the library: each p.d.f. is a separate C++ class

• RooFit provides a collection of compiled standard PDF classes

58

Model building – (Re)using standard components

• List of most frequently used pdfs and their factory spec

Gaussian Gaussian::g(x,mean,sigma)

Breit-Wigner BreitWigner::bw(x,mean,gamma)

Landau Landau::l(x,mean,sigma)

Exponential Exponental::e(x,alpha)

Polynomial Polynomial::p(x,{a0,a1,a2})

Chebychev Chebychev::p(x,{a0,a1,a2})

Kernel Estimation KeysPdf::k(x,dataSet)

Poisson Poisson::p(x,mu)

Voigtian Voigtian::v(x,mean,gamma,sigma)

(=BW⊗G)

59

Model building – Making your own

w.factory(“EXPR::mypdf(‘sqrt(a*x)+b’,x,a,b)”) ;

w.factory(“CEXPR::mypdf(‘sqrt(a*x)+b’,x,a,b)”) ;

• Interpreted expressions

• Customized class, compiled and linked on the fly

• Custom class written by you

– Offer option of providing analytical integrals, custom handling of toy MC generation (details in RooFit Manual)

• Compiled classes are faster in use, but require O(1-2) seconds startup overhead

– Best choice depends on use context

60

Model building – Adjusting parameterization

w.factory(“expr::w(‘(1-D)/2’,D[0,1])”) ;

w.factory(“BMixDecay::bmix(t,mixState,tagFlav,

tau,expr(‘(1-D)/2’,D[0,1]),dw,....”) ;

• RooFit pdf classes do not require their parameter arguments to be variables, one can plug in functions as well

• Simplest tool perform reparameterization is interpreted formula expression

– Note lower case: expr builds function, EXPR builds pdf

• Example: Reparameterize pdf that expects mistag rate in terms of dilution

61

Composite models3

62

RooBMixDecay

RooPolynomial

RooHistPdf

RooArgusBG

Model building – (Re)using standard components

RooAddPdf+

RooGaussian

• Most realistic models are constructed as the sum of one or more p.d.f.s (e.g. signal and background)

• Facilitated through operator p.d.f RooAddPdf

63

Adding p.d.f.s – Mathematical side

)()1()()( xGfxfFxS

)(1)(...)()()(1,0

111100 xPcxPcxPcxPcxS n

ni

inn

• From math point of view adding p.d.f is simple

– Two components F, G

– Generically for N components P0-PN

• For N p.d.f.s, there are N-1 fraction coefficients that should sum to less 1

– The remainder is by construction 1 minus the sum of all other coefficients

64

Adding p.d.f.s – Factory syntax

w.factory(“Gaussian::gauss1(x[0,10],mean1[2],sigma[1]”) ;

w.factory(“Gaussian::gauss2(x,mean2[3],sigma)”) ;

w.factory(“ArgusBG::argus(x,k[-1],9.0)”) ;

w.factory(“SUM::sum(g1frac[0.5]*gauss1, g2frac[0.1]*gauss2, argus)”)

SUM::name(frac1*PDF1,frac2*PDF2,...,PDFN)

• Additions created through a SUM expression

– Note that last PDF does not have an associated fraction

• Complete example

65

Extended ML fits

NNxBfxSfxF exp;)()1()()(

BS

BS

B

BS

S NNNxBNN

NxS

NN

NxF

exp;)()()(

BS NNNf ,,

SUM::name(Nsig*S,Nbkg*B)

Write like this, extended term automatically included in –log(L)

shape normalization

• In an extended ML fit, an extra term is added to the likelihood

Poisson(Nobs,Nexp)

• This is most useful in combination with a composite pdf

66

Component plotting - Introduction

// Plot only argus components

w::sum.plotOn(frame,Components(“argus”),LineStyle(kDashed)) ;

// Wildcards allowed

w::sum.plotOn(frame,Components(“gauss*”),LineStyle(kDashed)) ;

• Plotting, toy event generation and fitting works identically for composite p.d.f.s

– Several optimizations applied behind the scenes that are specific to composite models (e.g. delegate event generation to components)

• Extra plotting functionality specific to composite pdfs

– Component plotting

67

Operations on specific to composite pdfs

RooAddPdf::sum[ g1frac * g1 + g2frac * g2 + [%] * argus ] = 0.0687785

RooGaussian::g1[ x=x mean=mean1 sigma=sigma ] = 0.135335

RooGaussian::g2[ x=x mean=mean2 sigma=sigma ] = 0.011109

RooArgusBG::argus[ m=x m0=k c=9 p=0.5 ] = 0

• Tree printing mode of workspace reveals component structure – w.Print(“t”)

– Can also make input files for GraphViz visualization(w::sum.graphVizTree(“myfile.dot”))

– Graph output on ROOT Canvas in near future(pending ROOT integrationof GraphViz package)

68

Convolution

=

• Many experimental observable quantities are well described by convolutions

– Typically physics distribution smeared with experimental resolution (e.g. for B0 J/y KS exponential decay distribution

smeared with Gaussian)

– By explicitly describing observed distribution with a convolution p.d.f can disentangle detector and physics

• To the extent that enough information is in the data to make this possible

69

Common fittingissues4• Understanding MINUIT output

• Instabilities and correlation coefficients

79

What happens when you do

pdf->fitTo(*data) ?

80

Fitting and likelihood minimization

// Construct function object representing –log(L)

RooAbsReal* nll = pdf.createNLL(data) ;

// Minimize nll w.r.t its parameters

RooMinuit m(*nll) ;

m.migrad() ;

m.hesse() ;

• What happens when you do pdf->fitTo(*data)

– 1) Construct object representing –log of (extended) likelihood

– 2) Minimize likelihood w.r.t floating parameters using MINUIT

• Can also do these two steps explicitly by hand

81

Let take a closer look at

Minuit

82

A brief description of MINUIT functionality

1

2

22 ln

)(ˆ)(ˆ

pd

LdpVp

• MIGRAD

– Find function minimum. Calculates function gradient, follow to (local) minimum, recalculate gradient, iterate until minimum found

• To see what MIGRAD does, it is very instructive to do RooMinuit::setVerbose(1). It will print a line for each step through parameter space

– Number of function calls required depends greatly on number of floating parameters, distance from function minimum and shape of function

• HESSE

– Calculation of error matrix from 2nd derivatives at minimum

– Gives symmetric error. Valid in assumption that likelihood is (locally parabolic)

– Requires roughly N2 likelihood evaluations (with N = number of floating parameters)

83

A brief description of MINUIT functionality

• MINOS

– Calculate errors by explicit finding points (or contour for >1D) where D-log(L)=0.5

– Reported errors can be asymmetric

– Can be very expensive in with large number of floating parameters

• CONTOUR

– Find contours of equal D-log(L) in two parameters and draw corresponding shape

– Mostly an interactive analysis tool

84

Note of MIGRAD function minimization

Reason: There may exist multiple (local) minimain the likelihood or c2

p

-lo

g(L)

Local minimum

True minimum

• For all but the most trivial scenarios it is not possible to automatically find reasonable starting values of parameters

– So you need to supply ‘reasonable’ starting values for your parameters

– You may also need to supply ‘reasonable’ initial step size in parameters. (A step size 10x the range of the above plot is clearly unhelpful)

– Using RooMinuit, the initial step size is the value of RooRealVar::getError(), so you can control this by supplying

initial error values

85

Minuit function MIGRAD

**********

** 13 **MIGRAD 1000 1

**********

(some output omitted)

MIGRAD MINIMIZATION HAS CONVERGED.

MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.

COVARIANCE MATRIX CALCULATED SUCCESSFULLY

FCN=257.304 FROM MIGRAD STATUS=CONVERGED 31 CALLS 32 TOTAL

EDM=2.36773e-06 STRATEGY= 1 ERROR MATRIX ACCURATE

EXT PARAMETER STEP FIRST

NO. NAME VALUE ERROR SIZE DERIVATIVE

1 mean 8.84225e-02 3.23862e-01 3.58344e-04 -2.24755e-02

2 sigma 3.20763e+00 2.39540e-01 2.78628e-04 -5.34724e-02

ERR DEF= 0.5

EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5

1.049e-01 3.338e-04

3.338e-04 5.739e-02

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL 1 2

1 0.00430 1.000 0.004

2 0.00430 0.004 1.000

Parameter values and approximate errors reported by MINUIT

Error definition (in this case 0.5 for a likelihood fit)

Progress information,watch for errors here

• Purpose: find minimum

86

Minuit function MIGRAD

**********

** 13 **MIGRAD 1000 1

**********

(some output omitted)

MIGRAD MINIMIZATION HAS CONVERGED.

MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.

COVARIANCE MATRIX CALCULATED SUCCESSFULLY

FCN=257.304 FROM MIGRAD STATUS=CONVERGED 31 CALLS 32 TOTAL

EDM=2.36773e-06 STRATEGY= 1 ERROR MATRIX ACCURATE

EXT PARAMETER STEP FIRST

NO. NAME VALUE ERROR SIZE DERIVATIVE

1 mean 8.84225e-02 3.23862e-01 3.58344e-04 -2.24755e-02

2 sigma 3.20763e+00 2.39540e-01 2.78628e-04 -5.34724e-02

ERR DEF= 0.5

EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5

1.049e-01 3.338e-04

3.338e-04 5.739e-02

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL 1 2

1 0.00430 1.000 0.004

2 0.00430 0.004 1.000

Approximate Error matrix

And covariance matrix

Value of c2 or likelihood at minimum

(NB: c2 values are not divided by Nd.o.f)

• Purpose: find minimum

87

Minuit function MIGRAD

• Purpose: find minimum

**********

** 13 **MIGRAD 1000 1

**********

(some output omitted)

MIGRAD MINIMIZATION HAS CONVERGED.

MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.

COVARIANCE MATRIX CALCULATED SUCCESSFULLY

FCN=257.304 FROM MIGRAD STATUS=CONVERGED 31 CALLS 32 TOTAL

EDM=2.36773e-06 STRATEGY= 1 ERROR MATRIX ACCURATE

EXT PARAMETER STEP FIRST

NO. NAME VALUE ERROR SIZE DERIVATIVE

1 mean 8.84225e-02 3.23862e-01 3.58344e-04 -2.24755e-02

2 sigma 3.20763e+00 2.39540e-01 2.78628e-04 -5.34724e-02

ERR DEF= 0.5

EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5

1.049e-01 3.338e-04

3.338e-04 5.739e-02

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL 1 2

1 0.00430 1.000 0.004

2 0.00430 0.004 1.000

Status: Should be ‘converged’ but can be ‘failed’

Estimated Distance to Minimumshould be small O(10-6)

Error Matrix Qualityshould be ‘accurate’, but can be ‘approximate’ in case of trouble

88

Minuit function HESSE

2

2

dp

Ld

**********

** 18 **HESSE 1000

**********

COVARIANCE MATRIX CALCULATED SUCCESSFULLY

FCN=257.304 FROM HESSE STATUS=OK 10 CALLS 42 TOTAL

EDM=2.36534e-06 STRATEGY= 1 ERROR MATRIX ACCURATE

EXT PARAMETER INTERNAL INTERNAL

NO. NAME VALUE ERROR STEP SIZE VALUE

1 mean 8.84225e-02 3.23861e-01 7.16689e-05 8.84237e-03

2 sigma 3.20763e+00 2.39539e-01 5.57256e-05 3.26535e-01

ERR DEF= 0.5

EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5

1.049e-01 2.780e-04

2.780e-04 5.739e-02

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL 1 2

1 0.00358 1.000 0.004

2 0.00358 0.004 1.000

Error matrix (Covariance Matrix)

calculated from1

2 )ln(

ji

ijdpdp

LdV

• Purpose: calculate error matrix from

89

Minuit function HESSE

2

2

dp

Ld

**********

** 18 **HESSE 1000

**********

COVARIANCE MATRIX CALCULATED SUCCESSFULLY

FCN=257.304 FROM HESSE STATUS=OK 10 CALLS 42 TOTAL

EDM=2.36534e-06 STRATEGY= 1 ERROR MATRIX ACCURATE

EXT PARAMETER INTERNAL INTERNAL

NO. NAME VALUE ERROR STEP SIZE VALUE

1 mean 8.84225e-02 3.23861e-01 7.16689e-05 8.84237e-03

2 sigma 3.20763e+00 2.39539e-01 5.57256e-05 3.26535e-01

ERR DEF= 0.5

EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5

1.049e-01 2.780e-04

2.780e-04 5.739e-02

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL 1 2

1 0.00358 1.000 0.004

2 0.00358 0.004 1.000

Correlation matrix rij

calculated from

ijjiijV r

• Purpose: calculate error matrix from

90

Minuit function HESSE

2

2

dp

Ld

**********

** 18 **HESSE 1000

**********

COVARIANCE MATRIX CALCULATED SUCCESSFULLY

FCN=257.304 FROM HESSE STATUS=OK 10 CALLS 42 TOTAL

EDM=2.36534e-06 STRATEGY= 1 ERROR MATRIX ACCURATE

EXT PARAMETER INTERNAL INTERNAL

NO. NAME VALUE ERROR STEP SIZE VALUE

1 mean 8.84225e-02 3.23861e-01 7.16689e-05 8.84237e-03

2 sigma 3.20763e+00 2.39539e-01 5.57256e-05 3.26535e-01

ERR DEF= 0.5

EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 2 ERR DEF=0.5

1.049e-01 2.780e-04

2.780e-04 5.739e-02

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL 1 2

1 0.00358 1.000 0.004

2 0.00358 0.004 1.000

Global correlation vector:correlation of each parameter

with all other parameters

• Purpose: calculate error matrix from

91

Minuit function MINOS

**********

** 23 **MINOS 1000

**********

FCN=257.304 FROM MINOS STATUS=SUCCESSFUL 52 CALLS 94 TOTAL

EDM=2.36534e-06 STRATEGY= 1 ERROR MATRIX ACCURATE

EXT PARAMETER PARABOLIC MINOS ERRORS

NO. NAME VALUE ERROR NEGATIVE POSITIVE

1 mean 8.84225e-02 3.23861e-01 -3.24688e-01 3.25391e-01

2 sigma 3.20763e+00 2.39539e-01 -2.23321e-01 2.58893e-01

ERR DEF= 0.5

Symmetric error

(repeated result from HESSE)

MINOS errorCan be asymmetric

(in this example the ‘sigma’ error is slightly asymmetric)

• Error analysis through Dnll contour finding

92

Illustration of difference between HESSE and MINOS errors

MINOS error

HESSE error

Extrapolationof parabolicapproximationat minimum

• ‘Pathological’ example likelihood with multiple minima and non-parabolic behavior

93

Practical estimation – Fit converge problems

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL 1 2

1 0.99835 1.000 0.998

2 0.99835 0.998 1.000

Signs of trouble…

• Sometimes fits don’t converge because, e.g.

– MIGRAD unable to find minimum

– HESSE finds negative second derivatives (which would imply negative errors)

• Reason is usually numerical precision and stability problems, but

– The underlying cause of fit stability problems is usually by highly correlated parameters in fit

• HESSE correlation matrix in primary investigative tool

– In limit of 100% correlation, the usual point solution becomes a line solution (or surface solution) in parameter space. Minimization problem is no longer well defined

94

Mitigating fit stability problems

),;()1(),;(),,,;( 221121 msxGfmsxfGssmfxF

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL [ f] [ m] [s1] [s2]

[ f] 0.96973 1.000 -0.135 0.918 0.915

[ m] 0.14407 -0.135 1.000 -0.144 -0.114

[s1] 0.92762 0.918 -0.144 1.000 0.786

[s2] 0.92486 0.915 -0.114 0.786 1.000

HESSE correlation matrix

Widths s1,s2

strongly correlatedfraction f

• Strategy I – More orthogonal choice of parameters

– Example: fitting sum of 2 Gaussians of similar width

95

Mitigating fit stability problems

),;()1(),;( 2212111 mssxGfmsxfG

PARAMETER CORRELATION COEFFICIENTS

NO. GLOBAL [f] [m] [s1] [s2]

[ f] 0.96951 1.000 -0.134 0.917 -0.681

[ m] 0.14312 -0.134 1.000 -0.143 0.127

[s1] 0.98879 0.917 -0.143 1.000 -0.895

[s2] 0.96156 -0.681 0.127 -0.895 1.000

– Different parameterization:

– Correlation of width s2 and fraction f reduced from 0.92 to 0.68

– Choice of parameterization matters!

• Strategy II – Fix all but one of the correlated parameters

– If floating parameters are highly correlated, some of them may be redundant and not contribute to additional degrees of freedom in your model

96

Mitigating fit stability problems -- Polynomials

• Warning: Regular parameterization of polynomials a0+a1x+a2x

2+a3x3 nearly always results in strong

correlations between the coefficients ai.

– Fit stability problems, inability to find right solution common at higher orders

• Solution: Use existing parameterizations of polynomials that have (mostly) uncorrelated variables

– Example: Chebychev polynomials

97

Minuit CONTOUR tool also useful to examine ‘bad’ correlations

• Example of 1,2 sigma contour of two uncorrelated variables

– Elliptical shape. In this example parameters are uncorrelation

• Example of 1,2 sigma contourof two variables with problematic correlation

– Pdf = fG1(x,0,3)+(1-f)G2(x,0,s) with s=4 in data

98

Practical estimation – Bounding fit parameters

Bou

nd

ed

Param

ete

r s

pace

MINUIT internal parameter space (-∞,+∞)

Internal Error

Exte

rn

al Erro

r• Sometimes is it desirable to bound the allowed range of

parameters in a fit

– Example: a fraction parameter is only defined in the range [0,1]

– MINUIT option ‘B’ maps finite range parameter to an internal infinite range using an arcsin(x) transformation:

99

Working withLikelihood8• Using discrete variable to classify data

• Simultaneous fits on multiple datasets

100

Fitting and likelihood minimization

// Construct function object representing –log(L)

RooAbsReal* nll = pdf.createNLL(data) ;

// Minimize nll w.r.t its parameters

RooMinuit m(*nll) ;

m.migrad() ;

m.hesse() ;

• What happens when you do pdf->fitTo(*data)

– 1) Construct object representing –log of (extended) likelihood

– 2) Minimize likelihood w.r.t floating parameters using MINUIT

• Can also do these two steps explicitly by hand

101

Plotting the likelihood

RooAbsReal* nll = w::model.createNLL(data) ;

RooPlot* frame = w::param.frame() ;

nll->plotOn(frame,ShiftToZero()) ;

• A likelihood function is a regular RooFit function

• Can e.g. plot is as usual

102

Constructing a c2 function

// Construct function object representing –log(L)

RooAbsReal* chi2 = pdf.createChi2(data) ;

// Minimize nll w.r.t its parameters

RooMinuit m(chi2) ;

m.migrad() ;

m.hesse() ;

• Along similar lines it is also possible to construct a c2

function

– Only takes binned datasets (class RooDataHist)

– Normalized p.d.f is multiplied by Ndata to obtain c2

– MINUIT error definition for c2 automatically adjusted to 1 (it is 0.5 for likelihoods) as default error level is supplied through virtual method of function base class RooAbsReal

103

Automatic optimizations in the calculation of the likelihood

• Several automatic computational optimizations are applied the calculation of likelihoods inside RooNLLVar

– Components that have all constant parameters are pre-calculated

– Dataset variables not used by the PDF are dropped

– PDF normalization integrals are only recalculated when the ranges of their observables or the value of their parameters are changed

– Simultaneous fits: When a parameters changes only parts of the total likelihood that depend on that parameter are recalculated

• Lazy evaluation: calculation only done when intergal value is requested

• Applicability of optimization techniques is re-evaluated for each use

– Maximum benefit for each use case

• ‘Typical’ large-scale fits see significant speed increase

– Factor of 3x – 10x not uncommon.

104

Statistical procedures involving likelihood

• ‘Simple’ Parameter and error estimation (MINUIT/HESSE/MINOS)

• Construct Bayesian credible intervals

– Likelihood appears in Bayes theorem for hypothesis with continuous parameters

• Construct (Profile) Likelihood Ratio intervals

– ‘Approximate Confidence intervals’ (Wilks theoreom)

– Connection to MINOS errors

• NB: Can also construct Frequentist intervals (Neyman construction), but these are based on PDFs, not likelihoods

105

Likelihood minimization – class RooMinuit

• Class RooMinuit is an interface to the ROOT implementation of the MINUIT minimization and error analysis package.

• RooMinuit takes care of

– Passing value of miminized RooFit function to MINUIT

– Propagated changes in parameters both from RooRealVar to MINUIT and back from MINUIT to RooRealVar, i.e. it keeps the

state of RooFit objects synchronous with the MINUIT internal state

– Propagate error analysis information back to RooRealVar

parameters objects

– Exposing high-level MINUIT operations to RooFit uses (MIGRAD,HESSE,MINOS) etc…

– Making optional snapshots of complete MINUIT information (e.g. convergence state, full error matrix etc)

106

Demonstration of RooMinuit use

// Start Minuit session on above nll

RooMinuit m(nll) ;

// MIGRAD likelihood minimization

m.migrad() ;

// Run HESSE error analysis

m.hesse() ;

// Set sx to 3, keep fixed in fit

sx.setVal(3) ;

sx.setConstant(kTRUE) ;

// MIGRAD likelihood minimization

m.migrad() ;

// Run MINOS error analysis

m.minos()

// Draw 1,2,3 ‘sigma’ contours in sx,sy

m.contour(sx,sy) ;

107

What happens if there are problems in the NLL calculation

[#0] WARNING:Minization -- RooFitGlue: Minimized function has error status.

Returning maximum FCN so far (99876) to force MIGRAD to back out of this region.

Error log follows. Parameter values: m=-7.397

RooGaussian::gx[ x=x mean=m sigma=sx ] has 3 errors

• Sometimes the likelihood cannot be evaluated do due an error condition.

– PDF Probability is zero, or less than zero at coordinate where there is a data point ‘infinitely improbable’

– Normalization integral of PDF evaluates to zero

• Most problematic during MINUIT operations. How to handle error condition

– All error conditions are gather and reported in consolidated way by RooMinuit

– Since MINUIT has no interface deal with such situations, RooMinuit passes instead a large value to MINUIT to force it to retreat from the region of parameter space in which the problem occurred

108

What happens if there are problems in the NLL calculation

pdf and data

-log(L) vs m0dropping problematic events

-log(L) vs m0with ‘wall’ (RooFit default)

• Classic example in B physics: floating the end point of the ARGUS function

– Probability density of ARGUS above end point is zero If end

point is moved to low value in fit you end up with events above end point Probility is zero Likelihood is –log(0) = infinity

109

What happens if there are problems in the NLL calculation

[#0] WARNING:Minization -- RooFitGlue: Minimized function has error status.

Returning maximum FCN so far (-1e+30) to force MIGRAD to back out of this region.

Error log follows

Parameter values: m=-7.397

RooGaussian::gx[ x=x mean=m sigma=sx ]

getLogVal() top-level p.d.f evaluates to zero or negative number

@ x=x=9.09989, mean=m=-7.39713, sigma=sx=0.1

getLogVal() top-level p.d.f evaluates to zero or negative number

@ x=x=6.04652, mean=m=-7.39713, sigma=sx=0.1

getLogVal() top-level p.d.f evaluates to zero or negative number

@ x=x=2.48563, mean=m=-7.39713, sigma=sx=0.1

• Can request more verbose error logging to debug problem

– Add PrintEvalError(N) with N>1

110

Bayesian formalism

∝ ∗

Area that integrates X% of posterior

• Original Bayes Thm:

P(B|A) ∝ P(A|B) P(B).

• Let probability density function p(x|μ) be the conditional pdf for data x, given parameter μ. Then Bayes’ Thm becomes

p(μ|x) ∝ p(x|μ) p(μ).

• Substituting in a set of observed data, x0, and recognizing the likelihood, written as L(x0|μ) ,L(μ), then

p(μ|x0) ∝ L(x0|μ) p(μ),

111

Illustration of nuisance parameters in Bayesian intervals

∫ =

MLE fit fit data-logLR(mean,sigma)

LR(mean,sigma) prior(mean,sigma) posterior(mean)

• Example: data with Gaussian model (mean,sigma)

112

Bayesian formalism and integration

• Bayesian formalism often requires integration

• Straightforward to do in RooFit Integration functionality for pdfs also works for likelihood functions

113

Likelihood ratio intervals

Likelihood ratio interval

HESSE error

Extrapolationof parabolicapproximationat minimum)ˆ,(

),(),(

xL

xLxLR

• Definition of Likelihood Ratio interval (identical to MINOS for 1 parameter)

114

Dealing with nuisance parameters in Likelihood ratio intervals

MLE fit fit data

-logLR(mean,sigma) -logLR(mean,sigma)

)ˆ,ˆ(

))(ˆ̂,()(

L

L

•best L(μ) for any value of s

•best L(μ,σ)

-logPLR(mean)

• Nuisance parameters in LR interval

– For each value of the parameter of interest, search the full subspace of nuisance parameters for the point at which the likelihood is maximized.

– Associate that value of the likelihood with that value of the parameter of interest ‘Profile likelihood’

115

Working with profile likelihood

)ˆ,ˆ(

)ˆ̂,()(

qpL

qpLp

RooAbsReal* ll = model.createNLL(data,NumCPU(8)) ;

RooAbsReal* pll = ll->createProfile(params) ;

RooPlot* frame = w::frac.frame() ;

nll->plotOn(frame,ShiftToZero()) ;

pll->plotOn(frame,LineColor(kRed)) ;

Best L for given p

Best L• A profile likelihood ratio

can be represent by a regular RooFit function(albeit an expensive one to evaluate)

116

Dealing with nuisance parameters in Likelihood ratio intervals

•Likelihood Ratio

•Profile Likelihood Ratio

•Minimizes –log(L) for each value of fsig

by changing bkg shape params(a 6th order Chebychev Pol)

117

On the equivalence of profile likelihood and MINOS

• Demonstration of equivalenceof (RooFit) profile likelihoodand MINOS errors

– Macro to make above plots is34 lines of code (+23 to beautifygraphics appearance)

118

Intervals & Limits9• A brief introduction to RooStats

119

RooStats Project – Overview

• Goals:

– Standardize interface for major statistical procedures so that they can work on an arbitrary RooFit model & dataset and handle many parameters of interest and nuisance parameters.

– Implement most accepted techniques from Frequentist, Bayesian, and Likelihood-based approaches

– Provide utilities to perform combined measurements

• Design:

– Essentially all methods start with the basic probability density function or likelihood function. Building a good model is the hard part. Want to re-use it for multiple methods Use RooFit to

construct models

– Build series of tools that perform statistical procedures on RooFit models

120

RooStats Project – Structure

• RooFit (data modeling)

– Data modeling language (pdfs and likelihoods).Scales to arbitrary complexity

– Support for efficient integration, toy MC generation

– Workspace

• Persistent container for data models

• Completely self-contained (including custom code)

• Complete introspection and access to components

– Workspace factory provides easy scripting language to populate the workspace

• RooStats (limits, interval calculators & utilities)

– Profile Likelihood calculator

– Neyman construction (FC)

– Bayesian calculator (BAT & native MCMC)

– Utilities (combinations, construct pdfs corresponding to standard number counting problems)

121

RooStats Project – Organization

• Joint ATLAS/CMS project

• Core developers

– K. Cranmer (ATLAS)

– Gregory Schott (CMS)

– Wouter Verkerke (RooFit)

– Lorenzo Moneta (ROOT)

• Open project, you are welcome to join

– Max Baak, Mario Pelliccioni, Alfio Lazzaro contributing now

• Included since ROOT v5.22

– Example macros in $ROOTSYS/tutorials/roostats

• Documentation

– Code doc. via ROOT

– Esers manual is in development

122

RooStats Project – Example

RooWorkspace* w = new RooWorkspace(“w”);

w->factory(“Poisson::P(obs[150,0,300],

sum::n(s[50,0,120]*ratioSigEff[1.,0,2.],

b[100,0,300]*ratioBkgEff[1.,0.,2.]))");

w->factory("PROD::PC(P, Gaussian::sigCon(ratioSigEff,1,0.05),

Gaussian::bkgCon(ratioBkgEff,1,0.1))");

)1.0,1,()05.0,1,()|( bsbs rGaussrGaussrbrsxPoisson

RooWorkspace(w) w contents

variables

---------

(b,obs,ratioBkgEff,ratioSigEff,s)

p.d.f.s

-------

RooProdPdf::PC[ P * sigCon * bkgCon ] = 0.0325554

RooPoisson::P[ x=obs mean=n ] = 0.0325554

RooAddition::n[ s * ratioSigEff + b * ratioBkgEff ] = 150

RooGaussian::sigCon[ x=ratioSigEff mean=1 sigma=0.05 ] = 1

RooGaussian::bkgCon[ x=ratioBkgEff mean=1 sigma=0.1 ] = 1

•Create workspace with above model (using factory)

•Contents of workspace from above operation

• Create a model - Example

123

RooStats Project – Example

RooPlot* frame = w::obs.frame(100,200) ;

w::PC.plotOn(frame) ;

frame->Draw()

• Simple use of model

124

RooStats Project – Example

ProfileLikelihoodCalculator plc;

plc.SetPdf(w::PC);

plc.SetData(data); // contains [obs=160]

plc.SetParameters(w::s);

plc.SetTestSize(.1);

ConfInterval* lrint = plc.GetInterval(); // that was easy.

FeldmanCousins fc;

fc.SetPdf(w::PC);

fc.SetData(data); fc.SetParameters(w::s);

fc.UseAdaptiveSampling(true);

fc.FluctuateNumDataEntries(false);

fc.SetNBins(100); // number of points to test per parameter

fc.SetTestSize(.1);

ConfInterval* fcint = fc.GetInterval(); // that was easy.

UniformProposal up;

MCMCCalculator mc;

mc.SetPdf(w::PC);

mc.SetData(data); mc.SetParameters(s);

mc.SetProposalFunction(up);

mc.SetNumIters(100000); // steps in the chain

mc.SetTestSize(.1); // 90% CL

mc.SetNumBins(50); // used in posterior histogram

mc.SetNumBurnInSteps(40);

ConfInterval* mcmcint = mc.GetInterval();

• Confidence intervals calculated with model

– Profile likelihood

– FeldmanCousins

– Bayesian (MCMC)

125

RooStats Project – Example

double fcul = fcint->UpperLimit(w::s);

double fcll = fcint->LowerLimit(w::s);

• Retrieving and visualizing output

126

RooStats Project – Example

• Some notes on example

– Complete working example (with output visualization) shipped with ROOT distribution ($ROOTSYS/tutorials/roofit/rs101_limitexample.C)

– Interval calculators make no assumptions on internal structure of model. Can feed model of arbitrary complexity to same calculator (computational limitations still apply!)

127

Introduzione a RooSTATS

129

130

RooStats

RooStatsTutorial_120323.pdfhttps://indico.desy.de/getFile.py/access?contribId=15&resId=3&materialId=slides&confId=5065slides da 1 a 14

131

TMVA

132

133

TMVA

TMVA (Tool for Multi Variate Analysis) Utilizzo di TMVA come classificatore. Descrizione di TMVAGui.

- Multivariate Methods, di Niklaus Berger - Statistical methods for data analysis, di L. Lista (Multivariate discriminators with TMVA)

Multidimensional models5• Uncorrelated products of p.d.f.s

• Using composition to p.d.f.s with correlation

• Products of conditional and plain p.d.f.s

134

Building realistic models

* =

g(x;m,s)m(y;a0,a1)

=

g(x,y;a0,a1,s)Possible in any PDFNo explicit support in PDF code needed

– Multiplication

– Composition

135

Model building – Products of uncorrelated p.d.f.s

RooBMixDecay

RooPolynomial

RooHistPdf

RooArgusBG

RooGaussian

RooProdPdf*

)()(),( yGxFyxH

136

Uncorrelated products – Mathematics and constructors

)()(),( yGxFyxH i

iii xFxH )()( }{}{}{

2D nD

w.factory(“Gaussian::gx(x[-5,5],mx[2],sx[1])”) ;

w.factory(“Gaussian::gy(y[-5,5],my[-2],sy[3])”) ;

w.factory(“PROD::gxy(gx,gy)”) ;

• Mathematical construction of products of uncorrelated p.d.f.s is straightforward

– No explicit normalization required If input p.d.f.s are unit

normalized, product is also unit normalized (this is true only because of the absence of correlations)

• Corresponding factory operator is PROD

137

How it work – event generation on uncorrelated products

Delegate Generate Merge

• If p.d.f.s are uncorrelated, each observable can be generated separately

– Reduced dimensionality of problem (important for e.g. accept/reject sampling)

– Actual event generation delegated to component p.d.f (can e.g. use internal generator if available)

– RooProdPdf just aggregates output in single dataset

138

Fundamental multi-dimensional p.d.fs

EXPR::mypdf(‘sqrt(x+y)*sqrt(x-y)’,x,y) ;

• It also possible define multi-dimensional p.d.f.s that do not arise through a product construction

– For example

– But usually n-dim p.d.f.s are constructed more intuitively through product constructs. Also correlations can be introduced efficiently (more on that in a moment)

• Example of fundamental 2-D B-physics p.d.f. RooBMixDecay

– Two observables: decay time (t, continuous) mixingState (m, discrete [-1,+1])

139

Plotting multi-dimensional PDFs

RooPlot* xframe = x.frame() ;

data->plotOn(xframe) ;

prod->plotOn(xframe) ;

xframe->Draw() ;

c->cd(2) ;

RooPlot* yframe = y.frame() ;

data->plotOn(yframe) ;

prod->plotOn(yframe) ;

yframe->Draw() ;

dyyxpdfxf ),()(

dxyxpdfyf ),()(

-Plotting a dataset D(x,y) versus x represents a projection over y

-To overlay PDF(x,y), you must plot Int(dy)PDF(x,y)

-RooFit automatically takes care of this!

•RooPlot remembers dimensions of plotted datasets

140

Introduction to slicing

x = x.getVal()

Slice in x

Range in y

• With multidimensional p.d.f.s it is also often useful to be able to plot a slice of a p.d.f

• In RooFit

– A slice is thin

– A range is thick

• Slices mostly usefulin discrete observables

– A slice in a continuous observablehas no width and usually no datawith the corresponding cut (e.g. “x=5.234”)

• Ranges work for bothcontinuous and discrete observables

– Range of discrete observablecan be list of >=1 state

141

Plotting a slice of a dataset

// Mixing dataset defines dt,mixState

RooDataSet* data ;

// Plot the entire dataset

RooPlot* frame = dt.frame() ;

data->plotOn(frame) ;

// Plot the mixed part of the data

RooPlot* frame_mix = dt.frame() ;

data->plotOn(frame,

Cut(”mixState==mixState::mixed”)) ;

• Use the optional cut string expression

– Works the same for binned data sets

142

Plotting a slice of a p.d.f

RooPlot* dtframe = dt.frame() ;

data->plotOn(dtframe,Cut(“mixState==mixState::mixed“)) ;

bmix.plotOn(dtframe,Slice(mixState,”mixed”)) ;

dtframe->Draw() ;

For slices both data and p.d.f normalize with respect to full dataset. If fraction ‘mixed’ in above example disagrees between data and p.d.f prediction, this discrepancy will show in plot

143

Plotting a range of a p.d.f and a dataset

RooPlot* xframe = x.frame() ;

data->plotOn(xframe) ;

model.plotOn(xframe) ;

y.setRange(“sig”,-1,1) ;

RooPlot* xframe2 = x.frame() ;

data->plotOn(xframe2,CutRange("sig")) ;

model.plotOn(xframe2,ProjectionRange("sig")) ;

model(x,y) = gauss(x)*gauss(y) + poly(x)*poly(y)

Works also with >2D projections (just specify projection range on all projected observables)

Works also with multidimensional p.d.fs that have correlations

144

Physics example of combined range and slice plotting

// Plot projection on mB

RooPlot* mbframe = mb.frame(40) ;

data->plotOn(mbframe) ;

model.plotOn(mbframe) ;

// Plot mixed slice projection on deltat

RooPlot* dtframe = dt.frame(40) ;

data>plotOn(dtframe,

Cut(”mixState==mixState::mixed”)) ;

model.plotOn(dtframe,Slice(mixState,”mixed”)) ;

Example setup:Argus(mB)*Decay(dt) +

Gauss(mB)*BMixDecay(dt)

(background)(signal)

mB

dt (mixed slice)

145

Plotting a range - Example

Example setup:Argus(mB)*Decay(dt) +

Gauss(mB)*BMixDecay(dt)

(background)(signal)

mb.setRange(“signal”,5.27,5.30) ;

mbSliceData->plotOn(dtframe2,

Cut("mixState==mixState::mixed“),

CutRange(“signal”))

model.plotOn(dtframe2,Slice(mixState,”mixed”),

ProjectionRange(“signal”))

mB

dt (mixed slice)

dt (mixed slice &&“signal” range)

“signal”

146

Plotting a range - Example

// Generate 80K toy MC events from p.d.f to be projected

RooDataSet *toyMC =

model.generate(RooArgSet(dt,mixState,tagFlav,mB),80000);

// Apply desired cut on toy MC data

RooDataSet* mbSliceToyMC = toyMC->reduce(“mb>5.27”);

// Plot data requesting data averaging over selected toy MC data

model.plotOn(dtframe2,Slice(mixState),ProjWData(mb,mbSliceToyMC))

),(

),,(1

),,(zyD

ii zyxMN

dydzzyxM

• We can also plot the finite width slice with a different technique toy MC integration

147

Plotting non-rectangular PDF regions

4)3()5( 22 yx ‘donut’

• Why is this interesting? Because with this technique we can trivially implement projection over arbitrarily shaped regions.

– Any cut prescription that you can think of to apply to data works

• Example: Likelihood ratio projection plot

– Common technique in rare decay analyses

– PDF typically consist of N-dimensional event selection PDF,where N is large (e.g. 6.)

– Projection of data & PDF in any of the N dimensions doesn’t show a significant excess of signal events

– To demonstrate purity of selected signal, plot data distribution (with overlaid PDF) in one dimension, while selecting events with a cut on the likelihood ratio of signal and background in the remaining N-1 dimensions

148

Likelihood ratio plots

dxzyxBfzyxSf

dxzyxSfzyLR

),,()1(),,(

),,(),(

),,()1(),,(

),,(),,(

zyxBfzyxSf

zyxSfzyxLR

•Integrate over x

•Plot LR vs (y,z)

• Idea: use information on S/(S+B) ratio in projected observables to define a cut

• Example: generalize previous toy model to 3 dimensions

• Express information on S/(S+B) ratio of model in terms of integrals over model components

149

Likelihood ratio plots

),(5.0),(

),,(1

),,(zyD

ii

zyLR

zyxMN

dydzzyxM•Dataset with values of (y,z)sampled from p.d.f andfiltered for events that meetLR(y,z)>0.5

•All events •Only LR(y,z)>0.5

• Decide on s/(s+b) puritycontour of LR(y,z)

– Example s/(s+b) > 50%

• Plot both data and model with corresponding cut.

– For data: calculate LR(y,z) for each event, plot only event with LR>0.5

– For model: using Monte Carlo integration technique:

150

Likelihood ratio plot on model with correlations

151

Likelihood ratio plots – Coded example

// Construct likelihood ratio in projection on (y,z)

w.factory("expr::LR('fsig*psig/ptot',fsig,

PROJ::psig(sig,x),PROJ::ptot(model,x))") ;

// Generate toy dataset for MC integration over region with LR>68%

RooDataSet* tmpdata = model.generate(RooArgSet(x,y,z),10000) ;

tmpdata->addColumn(*w.function(“LR”)) ;

RooDataSet* projdata = (RooDataSet*) tmpdata->reduce(Cut("LR>0.68")) ;

// Add LR to observed data so we can cut on it

data->addColumn(*w.function(“LR”)) ;

RooDataSet* seldata = (RooDataSet*) data->reduce(Cut("LR>0.68")) ;

// Make plot for data and pdf

RooPlot* frame3 = x.frame(Title("Projection with LR(y,z)>68%")) ;

seldata->plotOn(frame3) ;

model.plotOn(frame3,ProjWData(*projdata)) ;

dxzyxBfzyxSf

dxzyxSfzyLR

),,()1(),,(

),,(),(

152

Plotting in more than 2,3 dimensions

TH2D* ph2 = pdf.createHistogram(“ph2”,x,YVar(y)) ;

TH2* dh2 = data.createHistogram(“dg2",x,Binning(10),

YVar(y,Binning(10)));

ph2->Draw("SURF") ;

dh2->Draw("LEGO") ;

• No equivalent of RooPlot for >1 dimensions

– Usually >1D plots are not overlaid anyway

• Easy to use createHistogram() methods provided in both RooAbsData and RooAbsPdf to fill ROOT 2D,3D histograms

153

Building models – Introducing correlations

);,()),(,();( qyxfqypxfpxf

• Easiest way to do this is

– start with 1-dim p.d.f. and change on of its parameters into a function that depends on another observable

– Natural way to think about it

• Example problem

– Observable is reconstructed mass M of some object.

– Fitting Gaussian g(M,mean,sigma) some background to dataset D(M)

– But reconstructed mass has bias depending on some other observable X

– Rewrite fit functions as g(M,meanCorr(mtrue,X,alpha),sigma)where meanCorr is an (emperical) function that corrects for the bias depending on X

154

Introducing correlations through composition

);,()),(,();( qyxfqypxfpxf

w.factory(“expr::mean(‘a*y+b’,y[-10,10],a[0.7],b[0.3])”) ;

w.factory(“Gaussian::g(x[-10,10],mean,sigma[3])”) ;

• RooFit pdf building blocks do not require variables as input, just real-valued functions

– Can substitute any variable with a function expression in parameters and/or observables

– Example: Gaussian with shifting mean

– No assumption made in function on a,b,x,y being observables or parameters, any combination will work

155

What does the example p.d.f look like?

Projection on Y

Projection on X

• Use example model with x,y as observables

• Note flat distribution in y. Unlikely to describe data, solutions:

1. Use as conditional p.d.f g(x|y,a,b)

2. Use in conditional form multiplied by another pdf in y: g(x|y)*h(y)

156

Conditional p.d.f.s – Formulation and construction

xdpyxf

pyxfpyxF

),,(

),,();|(

• Mathematical formulation of a conditional p.d.f

– A conditional p.d.f is not normalized w.r.t its conditional observables

– Note that denominator in above expression depends on y and is thus in general different for each event

• Constructing a conditional p.d.f in RooFit

– Any RooFit p.d.f can be used as a conditional p.d.f as objects have no internal notion of distinction between parameters, observables and conditional observables

– Observables that should be used as conditional observables have to be specified in use context (generation, plotting, fitting etc…)

157

Method 1 – Using a conditional p.d.f – fitting and plotting

pdf.fitTo(data,ConditionalObservables(y))

xdyxf

yxfyxF

),(

),()|(

Ni

D i

ip

dxyxp

yxp

NxP

,1

),(

),(1)(

dxdyyxp

dyyxpxPp

),(

),()(

Sum over all yi in dataset DIntegrate over y

• For fitting, indicate in fitTo() call what the conditional observables are

– You may notice a performance penalty if the normalization integral of the p.d.f needs to be calculated numerically. For a conditional p.d.f it must evaluated again for each event

• Plotting: You cannot project a conditional F(x|y) on xwithout external information on the distribution of y

– Substitute integration with averaging over y values in data

158

How it works – event generation with conditional p.d.f.s

• Just like plotting, event generation of conditional p.d.f.s requires external input on the conditional observables

– Given an external input dataset P(dt)

– For each event in P, set the value of dt in F(d|dt) to dti

generate one event for observable t from F(t|dti)

– Store both ti and dti in the output dataset

159

Physics example with conditional p.d.f.s

),,();()( mtRtDtF

),,();()|( tmtRtDttF

• Want to fit decay time distribution of B0 mesons (exponential) convoluted with Gaussian resolution

• However, resolution on decay time varies from event by event (e.g. more or less tracks available).

– We have in the data an error estimate dt for each measurement from the decay vertex fitter (“per-event error”)

– Incorporate this information into this physics model

– Resolution in physics model is adjusted for each event to expected error.

– Overall scale factor can account for incorrect vertex error estimates (i.e. if fitted >1 then dt was underestimate of true error)

– Physics p.d.f must used conditional conditional p.d.f because it give no sensible prediction on the distribution of the per-event errors

160

Physics example with conditional p.d.f.s

),,();()|( tmtRtDttF

Small dt

Large dt

// Plotting of decay(t|dterr)

RooPlot* frame = dt.frame() ;

data->plotOn(frame2) ;

decay_gm1.plotOn(frame2,ProjWData(*data)) ;

Ni

D i

ip

dxyxp

yxp

NxP

,1

),(

),(1)(

Note that projecting over largedatasets can be slow. You can speedthis up by projecting with a binnedcopy of the projection data

• Some illustrations of decay model with per-event errors

– Shape of F(t|t) for several values of t

• Plot of D(t) and F(t|dt) projected over dt

161

Method 2 – Building products with conditional pdfs

• Use of conditional pdf in fitting, plotting, event generation has some practical drawbacks

– Need external dataset with distribution in conditional observable in all operations

• But there is also a fundamental issue

– If your model has both a signal and a background component, the model assumes that the distribution of the conditional observable (e.g. the per-event error) is the same for signal and background

– This may not be a valid assumption (‘Punzi effect’)

– Way out: Construct a product F(x|y)*G(y) separately for signal and background

162

Example with product of conditional and plain p.d.f.

// I - Use g as conditional pdf g(x|y)

w::g.fitTo(data,ConditionalObservables(w::y)) ;

// II - Construct product with another pdf in y

w.factory(“Gaussian::h(y,0,2)”) ;

w.factory(“PROD::gxy(g|y,h)”) ;

gx(x|y) gy(y)* model(x,y)=

dyygyxgx )()|(163

Example with product of conditional and plain p.d.f.

)()|()()|(),( dtbdttBdtsdttSdttF

• Following the ‘conditional product’ formalism you can now choose different distributions for the conditional observable for signal and background e.g.

• At this point F(t,dt) is a plain pdf: fitting plotting and event generation works ‘as usual’ without external input

• You may want to use an empirical pdf for s(dt) or b(dt) if these distributions are difficult to model

– Histogram based pdf (RooHistPdf)

– Kernel estimatin pdf (RooKeysPdf) Set next slide

164

Special pdfs – Kernel estimation model

Sample of eventsGaussian pdffor each event

Summed pdffor all events

Adaptive Kernel:width of Gaussian depends on local event density

w.import(myData,Rename(“myData”)) ;

w.factory(“KeysPdf::k(x,myData)”) ;

• Kernel estimation model

– Construct smooth pdf from unbinned data, using kernel estimation technique

• Example

• Also available for n-D data

165

Fit validation,

Toy MC studies6• Goodness-of-fit, c2

• Toy Monte Carlo studies for fit validation

166

How do you know if your fit was ‘good’

• Goodness-of-fit broad issue in statistics in general, will just focus on a few specific tools implemented in RooFit here

• For one-dimensional fits, a c2 is usually the right thing to do

– Some tools implemented in RooPlot to be able to calculate c2/ndf of curve w.r.t data

double chi2 = frame->chisquare(nFloatParam) ;

– Also tools exists to plot residual and pull distributions from curve and histogram in a RooPlot

frame->makePullHist() ;

frame->makeResidHist() ;

167

GOF in >1D, other aspects of fit validity

• No special tools for >1 dimensional goodness-of-fit

– A c2 usually doesn’t work because empty bins proliferate with dimensions

– But if you have ideas you’d like to try, there exists generic base classes for implementation that provide the same level of computational optimization and parallelization as is done for likelihoods (RooAbsOptTestStatistic)

• But you can study many other aspect of your fit validity

– Is your fit unbiased?

– Does it (often) have convergence problems?

• You can answer these with a toy Monte Carlo study

– I.e. generate 10000 samples from your p.d.f., fit them all and collect and analyze the statistics of these 10000 fits.

– The RooMCStudy class helps out with the logistics

168

Advanced features – Task automation

Input model Generate toy MC Fit model

Repeat N times

Accumulatefit statistics

Distribution of- parameter values- parameter errors- parameter pulls

// Instantiate MC study manager

RooMCStudy mgr(inputModel) ;

// Generate and fit 100 samples of 1000 events

mgr.generateAndFit(100,1000) ;

// Plot distribution of sigma parameter

mgr.plotParam(sigma)->Draw()

• Support for routine task automation, e.g. goodness-of-fit study

169

How to efficiently generate multiple sets of ToyMC?

• Use RooMCStudy class to manage generation and fitting

• Generating features

– Generator overhead only incurred once Efficient for large number of small samples

– Optional Poisson distribution for #events of generated experiments

– Optional automatic creation of ASCII data files

• Fitting

– Fit with generator PDF or different PDF

– Fit results (floating parameters & NLL) automatically collected in summary dataset

• Plotting

– Automated plotting for distribution of parameters, parameter errors, pulls and NLL

• Add-in modules for optional modifications of procedure

– Concrete tools for variation of generation parameters, calculation of likelihood ratios for each experiment

– Easy to write your own. You can intervene at any stage and offer proprietary data to be aggregated with fit results

170

A RooMCStudy example

// Setup PDF

RooRealVar x("x","x",-5,15) ;

RooRealVar mean("mean","mean of gaussian",-1) ;

RooRealVar sigma("sigma","width of gaussian",4) ;

RooGaussian gauss("gauss","gaussian PDF",x,mean,sigma) ;

// Create manager

RooMCStudy mgr(gauss,gauss,x,””,”mhv”) ;

// Generate and fit 1000 experiments of 100 events each

mgr.generateAndFit(1000,100) ;

RooMCStudy::run: Generating and fitting sample 999

RooMCStudy::run: Generating and fitting sample 998

RooMCStudy::run: Generating and fitting sample 997

Fitting Options

Generator Options

Observables

Generator PDF

Fitting PDF

• Generating and fitting a simple PDF

171

A RooMCStudy example

// Plot the distrution of the value

RooPlot* mframe = mean.frame(-2,0) ;

mgr.plotParamOn(mframe) ;

mframe->Draw() ;

// Plot the distrution of the error

RooPlot* meframe = mgr.plotError(mean,0.,0.1) ;

meframe->Draw() ;

// Plot the distrution of the pull

RooPlot* mpframe = mgr.plotPull(mean,-3,3,40,kTRUE) ;

mpframe->Draw() ;

Add Gaussian fit

• Plot the distribution of the value, error and pull of mean

172

A RooMCStudy example

// Plot the distribution of the NLL

mgr.plotNLL(mframe) ;

mframe->Draw() ;

• Plot the distribution of –log(L)

• NB: likelihood distributions cannot be used to deduce goodness-of-fit information!

173

A RooMCStudy example

mgr.fitParDataSet().get(10)->Print(“v”) ;

RooArgSet:::

1) RooRealVar::mean : 0.14814 +/- 0.191 L(-10 - 10)

2) RooRealVar::sigma : 4.0619 +/- 0.143 L(0 - 20)

3) RooRealVar::NLL : 2585.1 C

4) RooRealVar::meanerr : 0.19064 C

5) RooRealVar::meanpull : 0.77704 C

6) RooRealVar::sigmaerr : 0.14338 C

7) RooRealVar::sigmapull : 0.43199 C

TH2* h = mean.createHistogram("mean vs sigma",sigma) ;

mgr.fitParDataSet().fillHistogram(h,RooArgList(mean,sigma)) ;

h->Draw("BOX") ;

Pulls and errorshave separateentries foreasy accessand plotting

• For other uses, use summarized fit results in RooDataSet form

174

Fit Validation Study – Practical example

);();(),,,;( bkgsigbkgsig BSBS pmANpmGNppNNmF

Nsig(fit)

Nsig(generated)

• Example fit model in 1-D (B mass)

– Signal component is Gaussian centered at B mass

– Background component is Argus function (models phase space near kinematic limit)

• Fit parameter under study: Nsig

– Results of simulation study: 1000 experiments with NSIG(gen)=100, NBKG(gen)=200

– Distribution of Nsig(fit)

– This particular fit looks unbiased…

175

Fit Validation Study – The pull distribution

(Nsig)

fit

N

true

sig

fit

sig NN

)pull(N sig

pull(Nsig)

• What about the validity of the error?

– Distribution of error from simulated experiments is difficult to interpret…

– We don’t have equivalent of Nsig(generated) for the error

• Solution: look at the pull distribution

– Definition:

– Properties of pull:

• Mean is 0 if there is no bias

• Width is 1 if error is correct

– In this example: no bias, correct errorwithin statistical precision of study

176

Fit Validation Study – Low statistics example

• Special care should be taken when fitting small data samples

– Also if fitting for small signal component in large sample

• Possible causes of trouble

– c2 estimators may become approximate as Gaussian approximation of Poisson statistics becomes inaccurate

– ML estimators may no longer be efficient error estimate from 2nd derivative may become inaccurate

– Bias term proportional to 1/N of ML and c2 estimators may no longer be small compared to 1/sqrt(N)

• In general, absence of bias, correctness of error can not be assumed. How to proceed?

– Use unbinned ML fits only – most robust at low statistics

– Explicitly verify the validity of your fit

177

Demonstration of fit bias at low N – pull distributions

NBKG(gen)=200

NSIG(gen)=20

Distributions becomeasymmetric at low statistics

NSIG(fit) (NSIG) pull(NSIG)

NSIG(gen)

Pull mean ~2 away from 0 Fit is positively biased!

• Low statistics example:

– Scenario as before but now with 200 bkg events and only 20 signal events (instead of 100)

• Results of simulation study

• Absence of bias, correct error at low statistics not obvious

178

New developments for automated studies

• A new alternative framework is being put in place to replace class RooMCStudy.

– Class RooStudyManager manages logistics of repeated studies, but does not implement content of study.

– Abstract concept of study interfaced through class RooAbsStudy

– Class RooGenFitStudy manages implementation of ‘generate-and-fit’ style studies (functionality of RooMCStudy)

• Greater flexibility in choice of study (you can put in anything you want)

• Support for multiple backend implementations

– Inline calculation (as done in RooMCStudy)

– Parallelized execution through PROOF (lite)

– Almost complete automation of support for batch submission

– Just need to change one line of your macro to change back-end

179

Demo of parallelization with PROOF-lite

RooStudyManager mcs(*w,gfs) ;

mcs.run(1000) ; // inline running

mcs.runProof(1000,"") ; // empty string is PROOF-lite

mcs.prepareBatchInput("default",1000,kTRUE) ;

• Example – Factor 8 speed up on a dual-quad core box.

– Works with out-of-the box ROOT distribution

– Also: Graceful early termination when users presses ‘Stop’

• Much larger gains can be made with ‘real’ PROOF farms180

Constructing joint models7• Using discrete variable to classify data

• Simultaneous fits on multiple datasets

181

Datasets and discrete observables

Dataset A

X

5.0

3.7

1.2

4.3 Dataset B

X

5.0

3.7

1.2

Dataset A+B

X source

5.0 A

3.7 A

1.2 A

4.3 A

5.0 B

3.7 B

1.2 B

• Discrete observables play an important role in management of datasets

– Useful to classify ‘sub datasets’ inside datasets

– Can collapse multiple, logically separate datasets into a single dataset by adding them and labeling the source with a discrete observable

– Allows to express operations such a simultaneous fits as operation on a single dataset

182

Discrete variables in RooFit – RooCategory

// Define a cat. with explicitly numbered states

w.factory(“b0flav[B0=-1,B0bar=1]”) ;

// Define a category with labels only

w.factory(“tagCat[Lepton,Kaon,NT1,NT2]”) ;

w.factory(“sample[CPV,BMixing]”) ;

• Properties of RooCategory variables

– Finite set of named states self documenting

– Optional integer code associated with each state

• Used for classification of data, or to describe occasional discrete fundamental observable (e.g. B0 flavor)

183

Datasets and discrete observables – part 2

RooDataSet simdata("simdata","simdata",x,source,

Import(“A",*dataA),Import(“B",*dataB)) ;

• Example of constructing a joint dataset from 2 inputs

• But can also derive classification from info within dataset

– E.g. (10<x<20 = “signal”, 0<x<10 | 20<x<30 = “sideband”)

– Encode classification using realdiscrete mapping functions

184

A universal realdiscrete mapping function

// Mass variable

RooRealVar m(“m”,”mass,0,10.);

// Define threshold category

RooThresholdCategory region(“region”,”Region of M”,m,”Background”);

region.addThreshold(9.0, “SideBand”) ;

region.addThreshold(7.9, “Signal”) ;

region.addThreshold(6.1,”SideBand”) ;

region.addThreshold(5.0,”Background”) ;

Sig Sidebandbackground

Default state

Define region boundaries

• Class RooThresholdCategory maps ranges of input RooRealVar to states of a RooCategory

185

Discrete multiplication function

// Define ‘product’ of tagCat and runBlock

RooSuperCategory prod(“prod”,”prod”,RooArgSet(tag,flav))

flav

B0

B0bar

tag

Lepton

Kaon

NT1

NT2

prod

{B0;Lepton} {B0bar;Lepton}

{B0;Kaon} {B0bar;Kaon}

{B0;NT1} {B0bar;NT1}

{B0;NT2} {B0bar;NT2}

X

• RooSuperCategory/RooMultiCategory provides

category multiplication

186

DiscreteDiscrete mapping function

RooCategory tagCat("tagCat","Tagging category") ;

tagCat.defineType("Lepton") ;

tagCat.defineType("Kaon") ;

tagCat.defineType("NetTagger-1") ;

tagCat.defineType("NetTagger-2") ;

RooMappedCategory tagType(“tagType”,”type”,tagCat) ;

tagType.map(“Lepton”,”CutBased”) ;

tagType.map(“Kaon”,”CutBased”) ;

tagType.map(“NT*”,”NeuralNet”) ;

Define inputcategory

Create mappedcategory

Add mapping rules

Wildcard expressionsallowed

tagCat

Lepton

Kaon

NT1

NT2

tagType

CutBased

NeuralNet

• RooMappedCategory provides cat cat mapping

187

Exploring discrete data

RooTable* table=data->table(b0flav) ;

table->Print() ;

Table b0flav : aData+-------+------+| B0 | 4949 || B0bar | 5051 |+-------+------+

Double_t nB0 = table->get(“B0”) ;

Double_t b0Frac = table->getFrac(“B0”);

data->table(tagCat,"x>8.23")->Print() ;

Table tagCat : aData(x>8.23)+-------------+-----+| Lepton | 668 || Kaon | 717 || NetTagger-1 | 632 || NetTagger-2 | 616 |+-------------+-----+

Tabulate contents of datasetby category state

Extract contents by label

Extract contents fraction by label

Tabulate contents of selected part of dataset

• Like real variables of a dataset can be plotted,discrete variables can be tabulated

188

Exploring discrete data

data->table(b0Xtcat)->Print() ;

Table b0Xtcat : aData+---------------------+------+| {B0;Lepton} | 1226 || {B0bar;Lepton} | 1306 || {B0;Kaon} | 1287 || {B0bar;Kaon} | 1270 || {B0;NetTagger-1} | 1213 || {B0bar;NetTagger-1} | 1261 || {B0;NetTagger-2} | 1223 || {B0bar;NetTagger-2} | 1214 |+---------------------+------+

data->table(tcatType)->Print() ;

Table tcatType : aData+----------------+------+| Unknown | 0 || Cut based | 5089 || Neural Network | 4911 |+----------------+------+

Tabulate RooSuperCategory states

Tabulate RooMappedCategory states

• Discrete functions, built from categories in a datasetcan be tabulated likewise

189

Fitting multiple datasets simultaneously

• Simultaneous fitting efficient solution to incorporate information from control sample into signal sample

• Example problem: search rare decay

– Signal dataset has small number entries.

– Statistical uncertainty on shape in fit contributes significantly to uncertainty on fitted number of signal events

– However can constrain shape of signal from control sample (e.g. another decay with similar properties that is not rare), so no need to relay on simulations

190

Fitting multiple datasets simultaneously

• Fit to control sample yields accurate information on shape of signal

• Q: What is the most practical way to combine shape measurement on control sample to measurement of signal on physics sample of interest

• A: Perform a simultaneous fit

– Automatic propagation of errors & correlations

– Combined measurement (i.e. error will reflect contributions from both physics sample and control sample

191

Discrete observable as data subset classifier

mi

i

BB

ni

i

AA DPDFDPDFL,1,1

))(log())(log()log(

‘CTL’‘SIG’

Combined-lo

g(L

)

• Likelihood level definition of a simultaneous fit

• Minimize -logL(a,b,c)= -logL(a,b)+ -logL(b,c)

– Errors, correlations on common par. b automatically propagated192

Discrete observable as data subset classifier

mi

i

BB

ni

i

AA DPDFDPDFL,1,1

))(log())(log()log(

RooSimultaneous implements ‘switch’ PDF:

case (indexCat) {

A: return pdfA ;

B: return pdfB ;

}

Likelihood of switchPdfwith composite datasetautomatically constructssum of likelihoods above

ni

i

BADsimPDFL,1

))(log()log(

• Likelihood level definition of a simultaneous fit

• PDF level definition of a simultaneous fit

193

Practical fitting – Simultaneous fit technique

•Dsig(x), Fsig(x;a,b) •Dctl(x), Fctl(x;b,c)

• given data Dsig(x) and model Fsig(x;a,b) anddata Dctl(x) and model Fctl(x;b,c)

– Construct –log[Lsig(a,b)] and –log[Lctl(b,c)] and

194

Constructing joint pdfs

// Pdfs for channels ‘A’ and ‘B’

w.factory(“Gaussian::pdfA(x[-10,10],mean[-10,10],sigma[3])”) ;

w.factory(“Uniform::pdfB(x)”) ;

// Create discrete observable to label channels

w.factory(“index[A,B]”) ;

// Create joint pdf

w.factory(“SIMUL::joint(index,A=pdfA,B=pdfB)”) ;

RooDataSet *dataA, *dataB ;

RooDataSet dataAB(“dataAB”,”dataAB”,Index(w::index),

Import(“A”,*dataA),Import(“B”,*dataB)) ;

49

• Operator class SIMUL to construct joint models at the pdf level

• Can also construct joint datasets

195

Building simultaneous fits in RooFit

// Signal pdf

w.factory("Gaussian::sig(x[-10,10],mean[0,-10,10],sigma[3,2,4])") ;

w.factory("Uniform::bkg(x)") ;

w.factory("SUM::model(Nsig[800,0,1000]*sig,Nbkg[0,1000]*bkg)") ;

// Background pdf

w.factory("Gaussian::sig_control(x[-10,10],mean[0,-10,10],sigma[3,2,4])") ;

w.factory("Chebychev::bkg_control(x,a0[1])") ;

w.factory("SUM::model_control(Nsig_control[500,0,10000]*sig_control,

Nbkg_control[500,0,10000]*bkg_control)") ;

// Joint pdf construction

w.factory("SIMUL::model_sim(index[sig,control],

sig=model, control=model_control)") ;

// Joint data construction

RooDataSet simdata("simdata","simdata",w::x,Index(w::index),

Import("sig",*data),Import("control",*data_control)) ;

// Joint fit

RooFitResult* rs = w::model_sim.fitTo(simdata,Save()) ;

• Code that construct example shown 2 slides back

196

Constructing joint likelihood

RooAbsReal* nllJoint = w::joint.createNLL(dataAB) ;

RooAbsReal* nllA = w::A.createNLL(*dataA) ; w.import(nllA) ;

RooAbsReal* nllB = w::B.createNLL(*dataB) ; w.import(nllB) ;

w.factory(sum::nllJoint(nllA,nllB)) ;

50

• When you have a simultaneous pdf you can create a joint likelihood from the joint pdf

• Also possible to make likelihood functions of the components first and then add them

• Likelihood constructed either way is the same.

• Minimization of joint likelihood == Joint fit

197

Other scenarios in which simultaneous fits are useful

• Preceding example was ‘asymmetric’

– Very large control sample, small signal sample

– Physics in each channel possibly different (but with some similar properties

• There are also ‘symmetric’ use cases

– Fit multiple data sets that are functionally equivalent, but have slightly different properties (e.g. purity)

– Example: Split B physics data in block separated by flavor tagging technique (each technique results in a different sensitivity to CP physics parameters of interest).

– Split data in block by data taking run, mass resolutions in each run may be slightly different

– For symmetric use cases pdf-level definition of simultaneous fit very convenient as you usually start with a single dataset with subclassing formation derived from its observables

• By splitting data into subsamples with p.d.f.s that can be tuned to describe the (slightly) varying properties you can increase the statistical sensitivity of your measurement

198

A more empirical approach to simultaneous fits

• Instead of investing a lot of time in developing multi-dimensional models Split data in many subsamples, fit all subsamples

simultaneously to slight variations of ‘master’ p.d.f

• Example: Given dataset D(x,y) where observable of interest is x.

– Distribution of x varies slightly with y

– Suppose we’re only interested in the width of the peakwhich is supposed to be invariant under y (unlike mean)

– Slice data in 10 bins of y and simultaneous fit each bin with p.d.f that only has different Gaussian mean parameter, but same width

199

A more empirical approach to simultaneous fits

Floating Parameter FinalValue +/- Error

-------------------- --------------------------

mean_bin1 -4.5302e+00 +/- 1.62e-02

mean_bin2 -3.4928e+00 +/- 1.38e-02

mean_bin3 -2.4790e+00 +/- 1.35e-02

mean_bin4 -1.4174e+00 +/- 9.64e-03

mean_bin5 -4.8945e-01 +/- 7.95e-03

mean_bin6 4.0716e-01 +/- 9.67e-03

mean_bin7 1.4733e+00 +/- 1.37e-02

mean_bin8 2.4912e+00 +/- 1.44e-02

mean_bin9 3.5028e+00 +/- 1.41e-02

mean_bin10 4.5474e+00 +/- 1.68e-02

sigma 2.7319e-01 +/- 2.46e-03

• Fit to sample of preceding page would look like this

– Each mean is fitted to expected value (-4.5 + ibin)

– But joint measurement of sigma

– NB: Correlation matrix is mostly diagonal as all mean_binXX parameters are completely uncorrelated!

200

A more empirical approach to simultaneous fits

• Preceding example was simplistic for illustrational clarity, but more sensible use cases exist

– Example: Measurement CP violation in B decay. Analyzing power of each event is diluted by factor (1-2w) where w is the mistake rate of the flavor tagging algorithm

– Neural net flavor tagging algorithm provides a tagging probability for each event in data. Could use prob(NN) as w, but then we rely on good calibration of NN, don’t want that

– In a simultaneous fit to CPV+Mixing samples, can measure average w from the latter. Now not relying on NN calibration, but not exploiting event-by-event variation in analysis power.

– Improved scenario: divide (CPV+mixing) data in 10 or 20 subsets corresponding to bins in prob(NN). Use identical p.d.f but only have separate parameter to express fitted mistag rate w_binXX.

– Simultaneous fit will now exploit difference in analyzing power of events and be insensitive to calibration of flavor tagging NN.

– If calibration of NN was OK fitting mistag rate in each bin of probNN will be average probNN value for that bin

201

A more empirical approach to simultaneous fits

Event with little analyzing power

Event withgreat analyzing

power

NN predicted power

NN predicted power

NN predicted power

co

ntr

ol sam

ple

m

easu

red

po

wer

co

ntr

ol sam

ple

m

easu

red

po

wer

co

ntr

ol sam

ple

m

easu

red

po

wer

Perfect NN

OK NN

Lousy NN

In all 3 casesfit not biasedby NN calibration

Better precisionon CPV meas.because moresensitive events in sample

Worse precisionon CPV meas.because lesssensitive events in sample

202

Building simultaneous fits from a template

// Template pdf – B0 decay with mixing

w.factory("TruthModel::tm(t[-20,20])") ;

w.factory("BMixDecay::sig(t,mixState[mixed=-1,unmixed=1],

tagFlav[B0=1,B0bar=-1], tau[1.54,1,2],

dm[0.472,0.1,0.8],w[0.1,0,0.5],dw[0],tm)") ;

// Construct index category

w.factory(“tag[Lep,Kao,NT1,NT2]”) ;

// Construct simultaneous pdf with separate mistag rate for each category

w.factory(“SIMCLONE::model(sig,$SplitParam({w,dw},tagCat)”) ;

• In the ‘symmetric’ use case the models assigned to each state are very similar in structure – Usually just one parameter name is different

• Easiest way to construct these from a template pdf and a prescription on how to tailor the template for each index state

• Use operator SIMCLONE instead of SIMUL

203

Building simultaneous fits from a template

RooWorkspace(w) w contents

variables

---------

(dm,dw,dw_Kao,dw_Lep,dw_NT1,dw_NT2,mixState,t,tagCat,tagFlav,tau,w,w_Kao,w_Lep,w_NT1,w_NT2)

p.d.f.s

-------

RooBMixDecay::sig[ mistag=w delMistag=dw mixState=mixState tagFlav=tagFlav tau=tau dm=dm t=t ] = 0.2

RooSimultaneous::model[ indexCat=tagCat Lep=sig_Lep Kao=sig_Kao NT1=sig_NT1 NT2=sig_NT2 ] = 0.2

RooBMixDecay::sig_Kao[ mistag=w_Kao delMistag=dw_Kao ... t=t ] = 0.2

RooBMixDecay::sig_Lep[ mistag=w_Lep delMistag=dw_Lep ... t=t ] = 0.2

RooBMixDecay::sig_NT1[ mistag=w_NT1 delMistag=dw_NT1 ... t=t ] = 0.2

RooBMixDecay::sig_NT2[ mistag=w_NT2 delMistag=dw_NT2 ... t=t ] = 0.2

analytical resolution models

----------------------------

RooTruthModel::tm[ x=t ] = 1

• Result

204

Adding parameter pdfs to the likelihood

w.factory(“Gaussian::g(x[-10,10],mean[-10,10],sigma[3])”) ;

w.factory(“PROD::gprime(f,Gaussian(mean,1.15,0.30))”) ;

))30.0,15.1,(log(),;(log(),(log GaussxfLdata

i

• Systematic/external uncertainties can be modeledwith regular RooFit pdf objects.

• To incorporate in likelihood, simply multiply with orig pdf

– Any pdf can be supplied, e.g. Gaussian most common, but an also use class RooMultiVarGaussian to introduce a Gaussian uncertainty on multiple parameteres including a correlation

• Advantage of including systematic uncertainties in likelihood: error automatically propagated to error reported by MINUIT

205

Adding uncertainties to a likelihood

• Example 1 – Width known exactly

• Example 2 – Gaussian uncertainty on width

206

Using the fit result output

RooAbsPdf* paramPdf =

fr->createHessePdf(RooArgSet(frac,mean,sigma));

• The fit result class contains the full MINUIT output

• Can construct multi-variate Gaussian pdfrepresenting pdf on parameters

– Returned pdf represents HESSE parabolic approximation of fit

• Can also multiply this pdf in parameterswith a pdf in observables

– ‘Simultaneous fit’

207

Another approach to joint fitting

• ‘Asymmetric’ simultaneous fit may spend majority of it CPU time calculating the likelihood of the control sample part

– Because control sample have many more events

– Example: joint fit between CPV golden modes and BMixing samples

• Alternate solution: Make joint fit using likelihood of signal sample and parameterized likelihood of control sample

– Assumption: Likelihood can be described by a multi-variate Gaussian with correlations (i.e. log-likelihood is parabolic)

– Very easy to do in RooFit using RooFitResult->createHessePdf()

– Example on next page

208

Example of joint fit with parameterized likelihood

// Joint pdf construction

w.factory("SIMUL::model_sim(index[sig,ctl],

sig=model, ctl=model_ctl)") ;

// Joint data construction

RooDataSet simdata("simdata","simdata",w::x,Index(w::index),

Import("sig",*data),Import("ctl",*data_ctl)) ;

// Joint fit

RooFitResult* rs = w::model_sim.fitTo(simdata,Save()) ;

// Fit to control sample only

RooFitResult* r = w::model_ctl.fitTo(*data_ctl,Save()) ;

RooAbsPdf* ctrlParamPdf = r->createHessePdf(w::model_ctl.getParameters());

// Make pdf of parameters and import in workspace

ctrlParamPdf->SetName(“ctrlParamPdf”) ;

w.import(*ctrlParamPdf) ;

w.factory(“PROD::model_sim2(model,ctrlParamPdf)”) ;

// Joint fit with parameterized likelihood for control sample

RooFitResult* rs = w::model_sim2.fitTo(*data,Save()) ;

Regular joint fit

Joint fit with parameterized L for ctl sample

209

top related