PROOF - Parallel ROOT Facility
Kilian Schwarz, Robert Manteufel, Carsten Preuß
GSI
http://root.cern.ch
Bring the KB to the PB not the PB to the KB
Introduction
A step towards a solution: (Ali)ROOT + AliEn
Andreas J. Peters CERN/Geneva @ ACAT03 – Tokyo/Japan
● ROOT is becoming the most popular physics analysis toolkit available.
  ● interactive analysis work in familiar C++ style syntax
  ● data visualisation, an object-oriented I/O system
  ● crucial role for the LCG project ?!
  ● is successfully used within the AliROOT framework of the ALICE experiment as an all-in-one solution
● PROOF extends the workstation-based concept of ROOT to the 'parallel ROOT facility'.
  ● user procedures are kept identical during an analysis session
  ● tasks are distributed automatically in the background
● AliEn as a GRID analysis platform provides two key elements:
  ● a global filesystem
    ➔ files are indexed and tagged in a virtual file catalogue and are globally accessible
  ● a global queue system
    ➔ global job scheduling according to resource requirements
Parallel Analysis of Event Data
[Figure: a ROOT session on the local PC and a remote PROOF cluster. Labels in the figure: Local PC, Remote PROOF Cluster, node1 to node4, proof = master server, proof = slave server, TFile, TNetFile, *.root files on the nodes, ana.C sent to the cluster, stdout/obj returned.]

Session shown in the figure:
$ root
root [0] tree.Process("ana.C")
root [1] gROOT->Proof("remote")
root [2] dset->Process("ana.C")

#proof.conf
slave node1
slave node2
slave node3
slave node4
PROOF - Scalability
GSI environment: the prooflogin script
scanning user parameters for errors
processing user parameters
scanning the LSF cluster for PROOF jobs
testing .rootrc
building scripts (cleanup.sh, proofstarter.C and proofd.sh)
getting / setting the ROOT version
starting the requested number of PROOF daemons
building .proof.conf and .rootauthrc
starting a local rootd
starting ROOT and executing proofstarter.C
killing jobs and processes
removing all generated files
starting PROOF, uploading packages and starting the analysis
User parameters
/usr/local/bin/prooflogin
-s slave-count
-t termination-time
-v ROOT-version
-f ROOT-files
-lib library-files
-par file-packages
-mol starts the master on the localhost
-? / -h / -help help for proof.sh
optional: -file a text file with all parameters written in
dedicated batch queue for PROOF
Only PROOF jobs are started in the dedicated PROOF queue.
A Quick Response Queue is currently in test operation on 30 nodes of the GSI batch farm.
In this queue PROOF jobs have priority; non-PROOF jobs are put on hold.
PROOF configuration files
• $HOME/.rootauthrc (e.g.)
proofserv lxb108:kschwarz:0 lxb109:kschwarz:0 lxb110:kschwarz:0
• $HOME/.proof.conf (e.g.)
node lxb108 port=1095 usrpwd
slave lxb108 port=1096 usrpwd
slave lxb109 port=1095 usrpwd
slave lxb109 port=1096 usrpwd
slave lxb110 port=1095 usrpwd
slave lxb110 port=1096 usrpwd
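The prooflogin script is said to build .proof.conf automatically. As a rough illustration of what such a generator might look like, the following standard C++ sketch reproduces the line layout of the example above. The names ConfEntry, layoutFromHosts and formatProofConf are hypothetical, and the layout rule (first daemon gets the "node" line, all others a "slave" line, ports counted up from a base port) is only inferred from the example; it is not taken from the actual script.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One line of the configuration file, e.g. "slave lxb109 port=1095 usrpwd".
struct ConfEntry {
    std::string keyword;  // "node" or "slave", as in the example above
    std::string host;
    int port;
};

// Hypothetical layout rule, inferred only from the example: the first daemon
// on the first host gets the "node" line, every other daemon a "slave" line.
// Each host runs daemonsPerHost daemons on consecutive ports from basePort.
std::vector<ConfEntry> layoutFromHosts(const std::vector<std::string>& hosts,
                                       int basePort, int daemonsPerHost) {
    assert(daemonsPerHost > 0);
    std::vector<ConfEntry> entries;
    for (const std::string& host : hosts) {
        for (int i = 0; i < daemonsPerHost; ++i) {
            const bool first = entries.empty();
            entries.push_back({first ? "node" : "slave", host, basePort + i});
        }
    }
    return entries;
}

// Formats the entries as text, one line per entry, with the authentication
// keyword ("usrpwd" in the example) appended to each line.
std::string formatProofConf(const std::vector<ConfEntry>& entries,
                            const std::string& auth = "usrpwd") {
    std::ostringstream out;
    for (const ConfEntry& e : entries) {
        out << e.keyword << ' ' << e.host << " port=" << e.port << ' '
            << auth << '\n';
    }
    return out.str();
}
```

Calling formatProofConf(layoutFromHosts({"lxb108", "lxb109", "lxb110"}, 1095, 2)) yields the six lines of the example, which could then be written to $HOME/.proof.conf.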
Interactive Analysis with PROOF
Basic requirements:
● Analysis data has to be stored as objects derived from TObject in ROOT trees
● proofds have to load extension libraries for user-specific objects to access the data members
● Analysis code has to be inserted in the automatically generated selector macro for the object to be analyzed: <classobject>->MakeSelector();
Recipe:
- store your objects in trees
- use the selector macro for analysis code
Interactive Analysis with AliEn + ROOT/PROOF
● interactive analysis with PROOF steered by a data packetizer
● in a local cluster:
  ➔ cluster-wide accessible data can be processed by all slaves
  ➔ packet takeover by all slaves!
● in a grid environment:
  ➔ site-wide accessible data can be processed by all slaves
  ➔ packet takeover by all slaves within one site!
Work Distribution
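The idea of a data packetizer with packet takeover can be illustrated with a small simulation in standard C++. This is only a sketch under simplifying assumptions and not ROOT's packetizer: the entries are cut into fixed-size packets, and whichever slave becomes idle first pulls the next packet, so faster slaves end up processing more entries. The names Packet, Packetizer and simulate, as well as the per-entry cost model, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A contiguous range of tree entries handed to one slave.
struct Packet {
    long first;
    long count;
};

// Cuts [0, totalEntries) into packets of at most packetSize entries.
class Packetizer {
public:
    Packetizer(long totalEntries, long packetSize)
        : total_(totalEntries), size_(packetSize) {
        assert(totalEntries >= 0 && packetSize > 0);
    }

    // Hands out the next unprocessed range; returns false when none is left.
    bool next(Packet& p) {
        if (next_ >= total_) return false;
        p.first = next_;
        p.count = std::min(size_, total_ - next_);
        next_ += p.count;
        return true;
    }

private:
    long total_;
    long size_;
    long next_ = 0;
};

// Simulates slaves with different processing costs per entry (arbitrary time
// units). Returns the number of entries each slave ended up processing.
std::vector<long> simulate(long totalEntries, long packetSize,
                           const std::vector<double>& costPerEntry) {
    assert(!costPerEntry.empty());
    Packetizer packetizer(totalEntries, packetSize);
    std::vector<double> busyUntil(costPerEntry.size(), 0.0);
    std::vector<long> processed(costPerEntry.size(), 0);
    Packet p{0, 0};
    while (packetizer.next(p)) {
        // The slave that becomes idle first asks for the next packet
        // (ties go to the slave with the lowest index).
        const std::size_t s = static_cast<std::size_t>(
            std::min_element(busyUntil.begin(), busyUntil.end()) -
            busyUntil.begin());
        busyUntil[s] += p.count * costPerEntry[s];
        processed[s] += p.count;
    }
    return processed;
}
```

With two equally fast slaves, simulate(1000, 100, {1.0, 1.0}) splits the entries evenly; with unequal costs the faster slave takes over a larger share, while every entry is still processed exactly once.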
Short explanation for creating an analysis script
4 steps:
- open the ROOT file
- make the selector files (analysis files)
- edit the header file
- edit the source file
Open ROOT-file and create Selector files
Open your file: TFile f("/u/dvgamma/hitfile.root");
Create Selector files: hittree->MakeSelector("Anaproof");
the ROOT file contains a tree called "hittree"
Edit header file
Add your branch:
TBranch *b_myHit;
Set branch address: fChain->SetBranchAddress("myHit",&myHit);
Add user-defined objects and some data members (will be explained later)
Class TCounter
Dummy class designed for collecting analysis results from slaves
Keeps the data until it is collected in SlaveTerminate()
Edit source file (1/2)
An explanation of each function is given at the top
Analysis is embedded in Anaproof::Process(Int_t entry)
the analysis checks the hits in the chambers and counts them
the counter object collects the hit counters
In Anaproof::SlaveTerminate() add:
fOutput->Add(counterobj);
Edit source file (2/2)
First you get your counterobj as a TObject from the outList
Convert it back before it can be used as a TCounter object
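The pattern described on the last slides (each slave counts hits locally, adds its counter to the output list, and the results are combined at the end) can be sketched in standard C++ without ROOT. HitCounter and mergeAll are hypothetical stand-ins for the TCounter class and for the collection step; the real code works on TObject-derived objects in fOutput, which this sketch does not model.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Stand-in for a per-slave result object: hit counts keyed by chamber number.
struct HitCounter {
    std::map<int, long> hitsPerChamber;

    // Called once per hit found while processing an entry.
    void count(int chamber) {
        assert(chamber >= 0);
        ++hitsPerChamber[chamber];
    }

    // Adds the counts of another counter to this one.
    void merge(const HitCounter& other) {
        for (const auto& kv : other.hitsPerChamber) {
            hitsPerChamber[kv.first] += kv.second;
        }
    }

    // Total number of hits over all chambers.
    long total() const {
        long sum = 0;
        for (const auto& kv : hitsPerChamber) sum += kv.second;
        return sum;
    }
};

// Combines the counters returned by all slaves into one overall result.
HitCounter mergeAll(const std::vector<HitCounter>& slaveResults) {
    HitCounter merged;
    for (const HitCounter& c : slaveResults) merged.merge(c);
    return merged;
}
```

Each slave would fill its own HitCounter while processing entries; the combined result then holds the per-chamber sums over all slaves.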
libraries and packages
TMytrackerhit and TCounter are two user-defined classes.
To use them in a PROOF session you have to build a package that can be uploaded by a ROOT daemon.
You need the following directory/file layout:
libTMytrackerhit/PROOF-INF/SETUP.C
A look into SETUP.C:
Int_t SETUP()
{
   gSystem->Load("/u/dvgamma/projects/globus/onestep/ROOT/libTMytrackhit.so");
   gSystem->Load("/u/dvgamma/projects/globus/onestep/PROOF/TCounter/TCounter.so");
   return 1;
}
To create a package: tar -czf libTMytrackerhit.par libTMytrackerhit
Finally launch a PROOF-Session and start the analysis
Start the PROOF-session via script or manually
Inside of the session:
Upload packages
Enable packages
Create TDSet
Add file
Start analysis
Finally, a screenshot
Joint venture of AliEn + ROOT
[Figure: architecture diagram. Components shown: TGrid class, TAlien plugin, <other> plugin, AliEn C++ API, gSOAP client, API service, AliEn services, DBI, DB, Proxy, Catalogue DB.]
● AliEn Services + Catalogue are accessible via the TAlien (TGrid) class and the global <gGrid> variable in ROOT
● TAlien uses a SOAP-based AliEn C++ API
Examples:
➔ TGrid::Connect("alien://aliendb1:15000/?direct",""); // initiate gGrid with AliEn plugin (API server at aliendb1, port 15000)
➔ gGrid->mkdir("/alice/acat03"); // create directory in virtual file catalogue
[Figure labels: Client Host | Client or Remote Host | VO Service + DB Hosts]
AliEn Job Description Example
Running a ROOT macro on registered data
Executable = "root";
Packages = "ROOT::3.10.01";
Arguments = "Command::ROOT -x macro.C";
InputData = {"/alice/production/peters/*Tree.root"};
InputFile = {"LF:/alice/user/p/peters/macro.C"};
OutputFile = {"myhisto.root"};
Simple and readable!
Interactive Analysis with AliEn + ROOT/PROOF
AliEn Grid PROOF setup:
[Figure: PROOF user session, PROOF master server, three groups of PROOF slave servers, and TcpRouters.]
● Guaranteed site access through multiplexing TcpRouters
Interactive Analysis with AliEn + ROOT/PROOF
Sample Session: Connect/Query
Interactive Analysis with AliEn + ROOT/PROOF
Sample Session: connection to assigned proofds
Interactive Analysis with AliEn + ROOT/PROOF
Sample Session: data processing
Unification of Batch & Interactive Analysis with AliEn + ROOT/PROOF
current implementation:
● datasets are represented by objects of the type TDSet in ROOT
● a GRID data query assigns data files to TDSet objects
● the "process" method initiates the interactive processing on the assigned GRID PROOF cluster
to come:
● the same "process" method initiates the batch processing of the same data set and automatic merging of results.
ALICE will test the analysis facilities during the physics data challenge at the end of 2004.
Thanks to Fons Rademakers and Andreas Joachim Peters for their contributions to the transparencies
http://www-w2k.gsi.de/root/