wouter verkerke, nikhef roofit a tool kit for data modeling in root wouter verkerke (nikhef) david...

27
Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Post on 21-Dec-2015

224 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

RooFitA tool kit for data modeling in ROOT

Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Page 2: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Focus: coding a probability density function

• Focus on one practical aspect of many data analysis in HEP: How do you formulate your p.d.f. in ROOT – For ‘simple’ problems (gauss, polynomial), ROOT built-in models

well sufficient

– But if you want to do unbinned ML fits, use non-trivial functions, or work with multidimensional functions you are quickly running into trouble

Page 3: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

A recent example

• BaBar experiment at SLAC: Extract sin(2) from time dependent CP violation of B decay: e+e- Y(4s) BB– Reconstruct both Bs, measure decay time difference

– Physics of interest is in decay time dependent oscillation

• Many issues arise– Standard ROOT function framework clearly insufficient to handle such

complicated functions must develop new framework

– Normalization of p.d.f. not always trivial to calculate may need numeric integration techniques

– Unbinned fit, >2 dimensions, many events computation performance important must try optimize code for acceptable performance

– Simultaneous fit to control samples to account for detector performance

);|BkgResol();(BkgDecay);BkgSel()1(

);|SigResol())2sin(,;(SigDecay);SigSel(

bkgbkgbkgsig

sigsigsigsig

rdttqtpmf

rdttqtpmf

Page 4: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

A recent example

• Initial approach BaBar: write it from scratch (in FORTRAN!)– Does its designated job quite well, but took a long time to develop

– Possible because sin(2) effort supported by O(50) people.

– Optimization of ML calculations hand-coded error prone and not easy to maintain

– Difficult to transfer knowledge/code from one analysis to another.

• A better solution: A modeling language in C++ that integrates seamlessly into ROOT– Recycle code and knowledge

• Development of RooFit package– Started 5 years ago.

– Very successful virtually everybody in BaBar uses it now in standard ROOT distribution

Page 5: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

RooFit – a data modeling language for ROOT

C++ command line interface & macros

Data management & histogramming

Graphics interface

I/O support

MINUIT

ToyMC dataGeneration

Data/ModelFitting

Data Modeling

Model Visualization

Extension to ROOT – (Almost) no overlap with existing functionality

Page 6: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Data modeling - Desired functionality

Building/Adjusting Models

Easy to write basic PDFs ( normalization)

Easy to compose complex models (modular design)

Reuse of existing functions

Flexibility – No arbitrary implementation-related restrictions

Using Models

Fitting : Binned/Unbinned (extended) MLL fits, Chi2 fits

Toy MC generation: Generate MC datasets from any model

Visualization: Slice/project model & data in any possible way

Speed – Should be as fast or faster than hand-coded model

A n

a l y

s i s

w

o r

k c

y c

l e

Page 7: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

• Idea: represent math symbols as C++ objects

– Result: 1 line of code per symbol in a function (the C++ constructor) rather than 1 line of code per function

Data modeling – OO representation

variable RooRealVar

function RooAbsReal

PDF RooAbsPdf

space point RooArgSet

list of space points RooAbsData

integral RooRealIntegral

RooFit classMathematical concept

),;( qpxF

px,

x

dxxfx

xmax

min

)(

)(xf

kx

Page 8: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Data modeling – Constructing composite objects

• Straightforward correlation between mathematical representation of formula and RooFit code

RooRealVar x

RooRealVar s

RooFormulaVar sqrts

RooGaussian g

RooRealVar x(“x”,”x”,-10,10) ; RooRealVar m(“m”,”mean”,0) ; RooRealVar s(“s”,”sigma”,2,0,10) ; RooFormulaVar sqrts(“sqrts”,”sqrt(s)”,s) ; RooGaussian g(“g”,”gauss”,x,m,sqrts) ;

Math

RooFitdiagram

RooFitcode

RooRealVar m

),,( smxgauss

Page 9: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Model building – (Re)using standard components

• RooFit provides a collection of compiled standard PDF classes

RooArgusBG

RooPolynomial

RooBMixDecay

RooHistPdf

RooGaussian

BasicGaussian, Exponential, Polynomial,…Chebychev polynomial

Physics inspiredARGUS,Crystal Ball, Breit-Wigner, Voigtian,B/D-Decay,….

Non-parametricHistogram, KEYS

Easy to extend the library: each p.d.f. is a separate C++ class

Page 10: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Importance of a good and extendable model library

• If ‘new’ p.d.f.s are made available in a easy-to-use form, more people will use them

• Example 1: non-parametric kernel estimation technique– See article by Kyle Cranmer

– People like it, but are put off by the prospect of thoroughly reading the entire article, coding the concept from scratch, and then using it.

– In RooFit switching from a histogram-based shape to a KEYS shape is a one-line code change (RooHistPdf RooKeysPdf)

• Example 2: Chebychev polynomials – Its easy to convince people to try them if all they need to do is a

one-line code change (RooPolynomial RooChebychev)

Page 11: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Model building – (Re)using standard components

• Library p.d.f.s can be adjusted on the fly.– Just plug in any function expression you like as input variable

– Works universally, even for classes you write yourself

• Maximum flexibility of library shapes keeps library small

g(x,y;a0,a1,s)

g(x;m,s) m(y;a0,a1)

RooPolyVar m(“m”,y,RooArgList(a0,a1)) ;RooGaussian g(“g”,”gauss”,x,m,s) ;

Page 12: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Handling of p.d.f normalization

• Normalization of (component) p.d.f.s to unity is often a good part of the work of writing a p.d.f.

• RooFit handles most normalization issues transparently to the user– P.d.f can advertise (multiple) analytical expressions for integrals

– When no analytical expression is provided, RooFit will automatically perform numeric integration to obtain normalization

– More complicated that it seems: even if gauss(x,m,s) can be integrated analytically over x, gauss(f(x),m,s) cannot. Such use cases are automatically recognized.

– Multi-dimensional integrals can be combination of numeric and p.d.f-provided analytical partial integrals

• Variety of numeric integration techniques is interfaced– Adaptive trapezoid, Gauss-Kronrod, VEGAS MC…

– User can override parameters globally or per p.d.f. as necessary

Page 13: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

RooBMixDecay

RooPolynomial

RooHistPdf

RooArgusBG

Model building – (Re)using standard components

• Most physics models can be composed from ‘basic’ shapes

RooAddPdf+

RooGaussian

Page 14: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

RooBMixDecay

RooPolynomial

RooHistPdf

RooArgusBG

RooGaussian

Model building – (Re)using standard components

• Most physics models can be composed from ‘basic’ shapes

RooProdPdf*)()|(),( ygyxfyxk

RooProdPdf k(“k”,”k”,g, Conditional(f,x))

)()(),( ygxfyxh

RooProdPdf h(“h”,”h”, RooArgSet(f,g))

Page 15: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Using models - Overview

• All RooFit models provide universal and complete fitting and Toy Monte Carlo generating functionality– Model complexity only limited by available memory and CPU power

– Fitting/plotting a 5-D model as easy as using a 1-D model

– Most operations are one-liners

RooAbsPdf

RooDataSet

RooAbsData

gauss.fitTo(data)

data = gauss.generate(x,1000)

Fitting Generating

Page 16: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Fitting functionality

• Binned or unbinned ML fit?– For RooFit this is the essentially the same!

– Nice example of C++ abstraction through inheritance

• Fitting interface takes abstract RooAbsData object– Supply unbinned data object (created from TTree) unbinned fit

– Supply histogram object (created from THx) binned fit

x y z1 3 52 4 61 3 52 4 6

BinnedUnbinned

RooDataSet RooDataHist

RooAbsData

pdf.fitTo(data)

Page 17: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Automated vs. hand-coded optimization of p.d.f.

• Automatic pre-fit PDF optimization– Prior to each fit, the PDF is analyzed for possible optimizations

– Optimization algorithms:

• Detection and precalculation of constant terms in any PDF expression

• Function caching and lazy evaluation

• Factorization of multi-dimensional problems where ever possible

– Optimizations are always tailored to the specific use in each fit.

– Possible because OO structure of p.d.f. allows automated analysis of structure

• No need for users to hard-code optimizations – Keeps your code understandable, maintainable and flexible

without sacrificing performance

– Optimization concepts implemented by RooFit are applied consistently and completely to all PDFs

– Speedup of factor 3-10 reported in realistic complex fits

• Fit parallelization on multi-CPU hosts– Option for automatic parallelization of fit function on multi-CPU hosts

(no explicit or implicit support from user PDFs needed)

Page 18: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

MC Event generation

• Sample “toy Monte Carlo” samples from any PDF– Accept/reject sampling method used by default

• MC generation method automatically optimized– PDF can advertise internal generator if more efficient methods

exists (e.g. Gaussian)

– Each generator request will use the most efficient accept-reject / internal generator combination available

– Operator PDFs (sum,product,…) distribute generation over components whenever possible

– Generation order for products with conditional PDFs is sorted out automatically

– Possible because OO structure of p.d.f. allows automated analysis of structure

• Can also specify template data as input to generator

pdf.generate(vars,nevt)

pdf.generate(vars,data)

Page 19: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Using models – Plotting

• Model visualization geared towards ‘publication plots’ not interactive browsing emphasis on 1-dimensional plots

• Simplest case: plotting a 1-D model over data– Modular structure of composite p.d.f.s allows easy access to components for plotting

– Can show Poisson confidence intervals instead of sqrt(N) errors

RooPlot* frame = mes.frame() ;data->plotOn(frame) ;pdf->plotOn(frame) ;

pdf->plotOn(frame,Components(“bkg”))

frame->Draw() ;

Can store plot with data andall curves as single object

Adaptive spacing of curve

points to achieve 1‰

precision

Page 20: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Visualizing multi-dimensional models

• IMHO: Visualization of multi-dim. models much more complicated than visualization of multi-dim. data

• Simplest scenario: projection plots– Simple example M(m,t) = f[S(m)*T(t)]+(1-f)[B(m)*U(t)]

– Provide intuitive behavior: projection of p.d.f. automatically follows projection of data: 1-D plot frame remembers ‘hidden’ variables in dataset. Subsequent p.d.f.s in same frame are automatically projected

– Works identically for non-factorizable p.d.f.s! Necessary integrals are calculated automatically either analytical or numerically

RooPlot* frame = dt.frame() ;data.plotOn(frame) ;model.plotOn(frame) ;model.plotOn(frame,Components(“bkg”)) ;

mB distribution t distribution

dttmMmc ),()(

dmtmMtc ),()(

Page 21: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Using models – plotting multi-dimensional models

• Projection plot usually doesn’t capture full information content of your p.d.f. equally easy to plot a ‘slice’

RooPlot* frame = dt.frame() ;data.plotOn(frame) ;model.plotOn(frame) ;model.plotOn(frame,Components(“bkg”)) ;

RooPlot* frame = dt.frame() ;dt.setRange(“sel”,5.27,5.30) ;data.plotOn(frame,CutRange(“sel”)) ;model.plotOn(frame,ProjectionRange(“sel”));model.plotOn(frame,ProjectionRange(“sel”), Components(“bkg”)) ;

mB distribution t distribution

More complicated ‘slice’ views such as a likelihood ratio plot only take O(3) lines of extra code

30.5

27.5

),()( dmtmMtc

Page 22: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Using models – plotting

• Can also use template data to show ‘average’ curve– Convenient for visualization of conditional p.d.f.s

– Example plot t distribution of following model over data, averaging over dt values of data

)|,( dttmM

DNi

iD

dmdttmMN

tCurve,1

)|,(1

)(

model.plotOn(frame,ProjWData(dtData)) ;model.plotOn(frame,Components("bkg"), ProjWData(dtData),LineStyle(kDashed)) ;

Page 23: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Advanced features – Task automation

• Support for routine task automation, e.g. useful in finding confidence intervals

Input model Generate toy MC Fit model

Repeat N times

Accumulatefit statistics

Distribution of- parameter values- parameter errors- parameter pulls

// Instantiate MC study managerRooMCStudy mgr(inputModel) ;

// Generate and fit 100 samples of 1000 eventsmgr.generateAndFit(100,1000) ;

// Plot distribution of sigma parametermgr.plotParam(sigma)->Draw()

Page 24: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

RooFit at SourceForge - roofit.sourceforge.net

RooFit moved to SourceForge to facilitate access and

communication with non-BaBar users

Code access–CVS repository via pserver

–File distribution sets for production versions

Page 25: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

RooFit at SourceForge - Documentation

Documentation

Comprehensive set of tutorials

(PPT slide show + example macros)

Five separate tutorials

More than 250 slides and 20

macros in total

Class reference in THtml style

Also available by end of 2005: Pedagogical User Guide + Reference Guide (~100 pages document)

Page 26: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Selection of BaBar publications in 2004 using RooFit

• Improved Measurement of the CKM angle alpha using B0+-

• Measurement of branching fractions and charge asymmetries in B+ decays to +, K+, + and '+, and search for B0 decays to K0 and

• Branching Fraction and CP Asymmetries in B0KSKSKS

• Measurement of CP Asymmetries in B0K0 and B0K+K-K0S

• Measurement of Branching Fractions and Time-Dependent CP-Violating Asymmetries in B' K Decays

• Improved Measurement of Time-Dependent CP Violation in B0 to (ccbar)K0 Decays (‘sin2’)

• Measurements of the Branching Fraction and CP-Violating Asymmetries in B0f0(980)KS Decays

• Measurement of Time-dependent CP-Violating Asymmetries in B0K*, K*KS0 Decays

• Study of the decay B0+- and constraints on the CKM angle alpha.

• Measurement of CP-violating Asymmetries in B0K0S0 Decays

• Measurement of Time-Dependent CP Asymmetries in B0K0

• Search for B+/- [K-/+ +/-]D K+/- and upper limit on the bu amplitude in B+/- DK+/-

• Limits on the Decay-Rate Difference of Neutral B Mesons and on CP, T, and CPT Violation in B0B0bar Oscillations

Page 27: Wouter Verkerke, NIKHEF RooFit A tool kit for data modeling in ROOT Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine)

Wouter Verkerke, NIKHEF

Summary

• RooFit adds a powerful data modeling language to ROOT– Mathematical objects (variables, functions…) represented as C++ objects– Complex models are easily composed from library of standard components

• New fundamental components can be written easily– Powerful tools for fitting, Toy MC generation and visualization are easy to use

• Compact syntax• Universal functionality (almost no arbitrary / implementation related restrictions)

– Automated function optimization analysis and implementation delivers industrial strength performance

• RooFit makes the road from a ‘simple’ to a really elaborate/complex p.d.f. less steep:

• RooFit is distributed with ROOT starting with ROOT version v5.02-00