HEP data analysis using ROOT
Week 2
▪ File I/O, memory and object ownership
▪ Fitting, peak finding and Fourier analysis
Mark Hodgkinson
Week 2
• File I/O
  – recap of I/O from last week
  – more I/O operations
  – balancing load on queues, local/remote files
• Memory and object ownership
  – residency
  – persistency
• Fitting, peak finding and simple Fourier analysis
  – analysis with ROOT built-ins
• We will go through the slides
• Then use the remaining time to work through the tasks (this is the more important part).
• Any questions to [email protected] or find me in D36.
File I/O
• Have already performed some basic I/O operations
  – via the TGraph constructor
  – loading and browsing a TFile
  – writing/saving macros
    • TCanvas::SaveAs("*.C")
TGraph::TGraph(const char* filename, const char* format = "%lg %lg", Option_t* option = "")
$ root tree2.root
root [1] new TBrowser
[Figure: TGraph plot titled “Graph”]
File I/O
• More I/O related operations:
  – the global directory and the system directory
  – hadding and chaining
  – tree friends
gDirectory
• ROOT provides a global pointer to the current (ROOT) directory
  – just as gFile is the global pointer to the current (ROOT) file
  – and gPad is the global pointer to the current pad/canvas
• also
  – gEnv
  – gRandom
• Task: call the associated Print() methods to see more on these
gDirectory
• if you load a file into your session, it exists in the global directory
$ root tree2.root
root [0]
Processing /Users/perkin/rootlogon.C...
Setting MyStyle
Loaded ROOT logon script.
Attaching file tree2.root as _file0...
root [2] gDirectory->ls()
TFile**         tree2.root
 TFile*         tree2.root
  KEY: TTree    t2;1    a Tree with data from a fake Geant3
gDirectory
• and (as shown last week) newly created objects reside in it
• Task: Repeat last week’s exercise below, and check the contents of gDirectory.
root [2] t2->Draw("destep>>hDEStep","destep")
Info in <TCanvas::MakeDefCanvas>: created default TCanvas with name c1
(Long64_t)9999
root [3] h = (TH1F*)gDirectory->Get("hDEStep")
(class TH1F*)0x7fe633db95c0
root [4] h->GetMean()
(const Double_t)2.00359400802118163e-05
Ownership
• The last line removes the object that mark2 points at from memory
• If you don’t delete it, the object stays in memory (a leak) and is only freed once you shut down the process (ROOT). In real-world examples, with thousands of lines of analysis code using complex C++ objects, such leaks can cause serious performance problems.
• In C++, if something is declared inside a pair of {}, then once control exits those braces only objects created with “new” persist. Remember to delete them!
• Most other languages you will use have automatic “garbage collection”, so you don’t need to worry about this kind of thing (e.g. Python, and hence PyROOT)
• C++ does have the concept of smart pointers, which clean up after themselves. They are not supported in CINT.
• Compiled C++ ROOT code can use them, but you may need to ensure you use a newer C++ standard (C++11); see later lectures on compilation.
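The difference between a raw pointer and a smart pointer can be sketched in plain C++11 (compiled, not CINT). Track is a hypothetical stand-in for an analysis object, not a ROOT class; it simply counts how many instances are alive so that a leak becomes visible.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a heavy analysis object; it counts live
// instances so that leaks can be observed.
struct Track {
    static int alive;
    Track() { ++alive; }
    ~Track() { --alive; }
};
int Track::alive = 0;

// Raw pointer: the object outlives the braces unless delete is called.
// Returns the number of objects left behind by the inner scope.
int leakyScope() {
    const int before = Track::alive;
    {
        Track* t = new Track();
        (void)t;                       // no delete: the object is leaked
    }
    return Track::alive - before;      // 1 object left behind
}

// Smart pointer: the destructor runs automatically when the scope ends.
int smartScope() {
    const int before = Track::alive;
    {
        std::unique_ptr<Track> t(new Track());  // C++11
        assert(Track::alive == before + 1);     // alive while in scope
    }
    return Track::alive - before;      // 0: nothing left behind
}
```

Calling leakyScope() reports one object left in memory, while smartScope() reports none, without any explicit delete.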
Some remarks about ownership
• Ownership is dictated by access to the delete method
  – ROOT owns some objects, you own others
• delete calls the class destructor
• Failure to delete objects owned by you results in memory leaks
  – rule of thumb: one delete for every new
• Attempting to delete something owned by ROOT (double delete)
  – segmentation fault!
Some remarks about ownership
• From the ROOT manual:
  – Histograms, trees, and event lists created by the user are owned by the current directory (gDirectory).
    • When the current directory is closed or deleted, the objects it owns are deleted
  – The TROOT master object (gROOT) has several collections of objects.
    • Objects that are members of these collections are owned by gROOT
  – Objects created by another object
    • for example, the function object (e.g. TF1) created by the TH1::Fit method is owned by the histogram.
  – An object created by the DrawCopy methods is owned by the pad it is drawn in.
Example
• Cannot delete this TH1F, because gDirectory took ownership of it when you created it.
• This is probably not the only example of ROOT objects taking ownership: beware!
• It works the other way around though: presumably gDirectory is careful to check that the pointer is valid before calling delete, with something like “if (f) { delete f; }”
• If you are not sure of ownership, being careful to do it this way will offer protection against crashes.
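One caveat on the guarded delete: a check like “if (f)” only helps if the pointer was reset to null when the object was deleted. A minimal sketch in plain C++11, where safeDelete and Counted are hypothetical names introduced for illustration (they are not part of ROOT):

```cpp
#include <cassert>

// Hypothetical helper: delete the object and reset the caller's pointer,
// so that a later guarded delete becomes a harmless no-op.
template <typename T>
void safeDelete(T*& p) {
    delete p;      // deleting a null pointer is well defined and does nothing
    p = nullptr;   // without this, p would dangle and "if (p)" would still pass
}

// Hypothetical object that counts live instances.
struct Counted {
    static int alive;
    Counted() { ++alive; }
    ~Counted() { --alive; }
};
int Counted::alive = 0;

// Deletes the same pointer twice through the helper; returns live objects.
int deleteTwice() {
    Counted* c = new Counted();
    safeDelete(c);
    safeDelete(c);           // second call sees nullptr: no double delete
    assert(c == nullptr);
    return Counted::alive;   // 0
}
```

This only protects pointers that you reset yourself. If ROOT deletes an object it owns, any other pointer you hold to that object is left dangling, and a null check will not catch it.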
gSystem
• also have a global pointer to the local system
  – how to get the list of file names in the current (system) directory?
root [2] TSystemDirectory dir("dir",gSystem->pwd())
root [3] TList* filelist = (TList*)dir.GetListOfFiles()
root [4] TSystemFile *f
root [5] TString fName
root [6] TIter next(filelist)
root [7] while(f=(TSystemFile*)next()){ fName = f->GetName(); cout<<fName<<endl; }
Aside: TList and TIter
• note the use of TList and TIter: ROOT analogues of a standard template library (STL) container and its iterator.
• However, could also have used a standard std::vector, though some ROOT interfaces may only support ROOT TList and TIter.
• To some extent it will be a personal preference (mine is to use standard C++ where possible, as it is more portable to other people).
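For comparison, here is a sketch of the same kind of loop over file names using only the standard library. It assumes the names have already been copied into a std::vector<std::string> (for example inside the TList loop on the gSystem slide); selectRootFiles is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Keep only the names that end in ".root".
std::vector<std::string> selectRootFiles(const std::vector<std::string>& names) {
    const std::string ext = ".root";
    std::vector<std::string> out;
    for (const std::string& n : names) {          // C++11 range-based for
        if (n.size() >= ext.size() &&
            n.compare(n.size() - ext.size(), ext.size(), ext) == 0) {
            out.push_back(n);
        }
    }
    return out;
}
```

The container knows the type of its elements, so no casts are needed, unlike the (TSystemFile*) cast required when iterating a TList.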
hadding
• Sometimes necessary to concatenate/merge several ROOT TFiles
• Facilitated by hadd command-line utility
• Does a generic merge of all contents (TObjects) in all directories in the files
• Can be slow (many hours) if, for example, you have large numbers of histograms in a file
• Advanced Task: Take a look at what hadd does by reading its macro. For those interested in Python, a version we wrote in ATLAS using PyROOT (faster, because it can ignore entire directories) can be found here:
/home/hodgkinson/scripts/mergePhysValFiles.py
$ hadd merged.root file0.root file1.root … fileN.root
hadding example
• histogram 1 – random (white) noise
hadding example
• histogram 2
  – bipolar pulse (setting width and sigma twice below is NOT a typo: if you don’t, then your histogram won’t look like the one on this slide)
hadding example
• now, hadd the two histogram files
  – open merged.root and view in TBrowser
  – Task: what is the result?
$ hadd merged.root hist1.root hist2.root
$ root merged.root
root [1] new TBrowser
hadding example
• The two histograms are merged into one file
  – they are not, however, added together! (h1 and h2 have different names, so both are kept as separate objects)
• hadding simply consolidates the contents of multiple files
  – useful if your analysis output is generated over many small jobs
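A toy model of this behaviour, in plain C++ rather than ROOT: each “file” is a map from object name to bin contents. Objects with different names are kept side by side. Objects with the same name are added bin by bin, which is my understanding of what hadd does for histograms that appear under the same name in several input files. The names Bins, FileContents and mergeInto are hypothetical; this is not hadd's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::vector<double> Bins;                   // bin contents of one histogram
typedef std::map<std::string, Bins> FileContents;   // object name -> histogram

// Toy merge: copy every object of "in" into "out"; if the name already
// exists, add the contents bin by bin (the binning is assumed to match).
void mergeInto(FileContents& out, const FileContents& in) {
    for (FileContents::const_iterator it = in.begin(); it != in.end(); ++it) {
        FileContents::iterator found = out.find(it->first);
        if (found == out.end()) {
            out[it->first] = it->second;            // new name: kept as a separate object
        } else {
            assert(found->second.size() == it->second.size());
            for (std::size_t i = 0; i < it->second.size(); ++i) {
                found->second[i] += it->second[i];  // same name: contents are summed
            }
        }
    }
}
```

Merging a file containing "h1" with one containing "h2" yields two objects, as seen in merged.root. Merging two files that both contain "h1" yields a single "h1" with summed contents, which is the typical use when the same job is split over many small batch jobs.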
residency
• h1 was created, filled, then written to file
  – creating the histogram before the file dictates that h1 is memory resident
    • upon instantiation
• h2 was created after f2 was opened
  – opening a TFile dictates that all subsequent objects are disk resident
    • not permanently until Write() is called
residency
• Implicit method of determining whether objects are disk or memory resident
  – distinctly and uniquely ROOTish
    • cause of much consternation to some conventionalists
• Important to understand how resources are being used
  – though ROOT has some ways of coping with large objects in memory and large objects on file
• Persistency of pointers and references is a deep and nuanced topic
  – covered in detail in the manual
    • for now, sufficient to flag the importance of when/where objects are saved
• You can observe where objects reside via the TBrowser
chaining
• similar to hadding, but specifically for TTrees
  – containing (exactly) the same branches
  – Does not actually merge the files!
• simple, illustrative macro here
  – http://www.hep.shef.ac.uk/people/perkin/tchain.C
• N.B. trees must have the same name
  – will be default when we write user objects to a branch [week 4]
    • i.e. branch name = object name
fitting
• ROOT comes with lots of built-in fit functionality
  – let’s try it: open hist2.root and invoke the browser
• right click the histogram and select FitPanel
  – default fit is a Gaussian (normal) distribution
• can change the range of the fit
• also, invoke the DrawPanel and draw simple histogram errors
  – the default bin error is sqrt(bin entries)
  – use TH1F::SetBinError(bin,error) to apply errors to individual bins
Task: Use the Fit and Draw panels to explore what else you can do.
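The default error rule can be written out in a few lines of plain C++. This is a sketch assuming unweighted, non-negative bin contents; defaultBinErrors is a hypothetical helper, not a ROOT function.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Default (Poisson-like) error per bin: the square root of the bin content.
std::vector<double> defaultBinErrors(const std::vector<double>& contents) {
    std::vector<double> errors;
    errors.reserve(contents.size());
    for (double c : contents) {
        assert(c >= 0.0);                // sketch assumes unweighted counts
        errors.push_back(std::sqrt(c));
    }
    return errors;
}
```

A bin with 100 entries therefore has an error of 10 (10%), while a bin with 4 entries has an error of 2 (50%): the relative error shrinks as the bin fills.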
user fit function
• Don’t fit Gaussians to data that aren’t normally distributed – instead define a user function
• Delete previous fit – right click and select Delete
• Now, try this…
$ root hist2.root
root [1] TF1* fSinFit = new TF1("fSinFit","[0]*sin([1]*x+[2])", -20, 20);
root [2] fSinFit->SetParameter(0,h2->GetMaximum())
root [3] h2->Fit(fSinFit)
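To see what the fit is adjusting, it can help to write out the quantity being minimised. By default a histogram fit in ROOT is a chi-square fit; the sketch below computes a chi-square of that general form for the sine model above in plain C++. The function names and signatures are hypothetical (they are not the TF1 user-function signature), and skipping points with zero error is an assumption made here to keep the sum well defined.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The model from the slide: [0]*sin([1]*x+[2])
double sineModel(double x, const double* par) {
    return par[0] * std::sin(par[1] * x + par[2]);
}

// Chi-square between data points (x, y, err) and the model.
double chiSquare(const std::vector<double>& x, const std::vector<double>& y,
                 const std::vector<double>& err, const double* par) {
    assert(x.size() == y.size() && y.size() == err.size());
    double chi2 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (err[i] <= 0.0) continue;     // skip points without a usable error
        const double pull = (y[i] - sineModel(x[i], par)) / err[i];
        chi2 += pull * pull;
    }
    return chi2;
}
```

The fitter varies the three parameters until this sum is as small as it can make it, which is why sensible starting values (such as the SetParameter call above) matter for a periodic function with many local minima.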
user fit function
user functions
• User functions can reference other user fit functions
  – by function name
root [1] f = new TF1("sinc","sin(x)/x",-20,20)
root [2] f->Draw()
root [3] g = new TF1("g","sinc * x * x",-20,20)
root [4] g->Draw()
peak finding
• Embed the pulse in the noise
  – begin with a clone
  – ROOT complains if one attempts to add histograms with different binning (here, a different number of bins)
$ root merged.root
root [2] summed = (TH1F*)h1->Clone("summed")
root [3] summed->Add(h2)
Error in <TH1F::Add>: Attempt to add histograms with different number of bins
(Bool_t)0
root [4] summed->Draw()
peak finding
• Sum by hand
• Find the peak
root [1] int iSum;
root [2] for (int i=1; i<=h2->GetNbinsX(); i++) {  // bins are numbered 1..N; bin 0 is the underflow bin
           iSum = int(summed->GetNbinsX()/2.)+i;
           summed->SetBinContent(iSum,summed->GetBinContent(iSum)+h2->GetBinContent(i)); }
root [3] summed->Draw();
root [4] summed->ShowPeaks()
peak finding
• threshold too low!
peak finding
• Apply threshold
  – in range 0<threshold<1
• coordinates extracted from array of TPolyMarkers
  – observe marker ownership is via histogram’s list of functions
root [4] summed->ShowPeaks(2,"",0.6)
root [9] pm = (TPolyMarker*)summed->GetListOfFunctions()->FindObject("TPolyMarker")
root [13] pm->GetX()[0]
root [13] pm->GetY()[0]
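The role of the threshold can be illustrated with a plain local-maximum scan in standard C++. This is deliberately simpler than what ShowPeaks does (there is no smoothing or use of sigma here), and findPeaks is a hypothetical helper: candidates below threshold times the highest value are discarded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Return the indices of local maxima whose height is at least
// threshold * (global maximum of y).
std::vector<std::size_t> findPeaks(const std::vector<double>& y, double threshold) {
    assert(threshold > 0.0 && threshold < 1.0);
    std::vector<std::size_t> peaks;
    if (y.size() < 3) return peaks;
    const double cut = threshold * *std::max_element(y.begin(), y.end());
    for (std::size_t i = 1; i + 1 < y.size(); ++i) {
        const bool localMax = y[i] > y[i - 1] && y[i] >= y[i + 1];
        if (localMax && y[i] >= cut) peaks.push_back(i);
    }
    return peaks;
}
```

With a low threshold every noise fluctuation that happens to be a local maximum is reported; raising it towards 1 keeps only candidates comparable in height to the tallest one.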
TSpectrum
• Built-in class for more advanced spectral analysis
  – peak finding
  – background subtraction
  – Fourier analysis
• Worthy of further reading
  – will show simple use of FFT
FFT
• Create a histogram from a periodic function
root [1] f = new TF1("f","[0]*sin([1]*x + [2])",-20,20);
root [2] const Double_t pars[] = {2,10,0};
root [3] f->SetParameters(pars);
root [4] h = new TH1F("h","h",100,-20,20);
root [5] for (int i=1; i<=h->GetNbinsX(); i++) h->SetBinContent(i,f->Eval(h->GetBinCenter(i)));  // bins are numbered 1..N
root [6] h->Draw();
FFT
• Plot FFT
  – where “MAG” is the magnitude
• can also request
  – real component “RE”
  – imaginary component “IM”
  – phase “PH”
• TASK: Work through the examples in the previous slides and make sure you understand them, and reproduce the plots.
root [7] fft = (TH1F*)h->Clone("fft")
root [8] h->FFT(fft,"MAG");
root [9] fft->Draw();
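For reference, the quantity plotted with “MAG” is the magnitude of the discrete Fourier transform of the bin contents. The sketch below computes it directly from the definition in plain C++; it is an O(N²) illustration rather than a fast transform, dftMagnitude is a hypothetical helper, and the normalisation used by ROOT may differ from the unnormalised sum used here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude |X_k| of the discrete Fourier transform
// X_k = sum_j x_j * exp(-2*pi*i*j*k/N), computed directly.
std::vector<double> dftMagnitude(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<double> mag(n, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            const double phase = twoPi * static_cast<double>(k * j) / static_cast<double>(n);
            re += x[j] * std::cos(phase);   // real component ("RE")
            im -= x[j] * std::sin(phase);   // imaginary component ("IM")
        }
        mag[k] = std::sqrt(re * re + im * im);
    }
    return mag;
}
```

For a pure sine wave that completes exactly m cycles over the N samples, the magnitude is concentrated in bins m and N−m, which is the pair of spikes seen in the FFT plot of a periodic histogram.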
Tasks Recap
• Make sure you can print the properties of the global pointers and understand what they are for: gRandom, gDirectory etc.
• Use hadd and understand what it does
  – Bonus: read hadd.C, and look at the PyROOT version mentioned earlier if you are interested in Python.
• Explore the Fit and Draw panels.
• Work through the examples shown in the slides for spectral analysis and ensure you understand them and are able to reproduce the plots.
Closing remarks
• ROOT is somewhat idiosyncratic
  – especially when determining whether objects are disk or memory resident
• Attention must be paid to resource utilisation
  – and object ownership
• Have introduced basic fitting and analysis
  – much more available, worthy of further reading
• Next time:
  – ROOT physics libraries
  – ROOT maths libraries
• Any questions?