ALICE Offline Tutorial Markus Oldenburg – CERN Markus.Oldenburg@cern.ch May 15, 2007 – University of Sao Paulo

ALICE Offline Tutorial

Markus Oldenburg – CERN

Markus.Oldenburg@cern.ch

May 15, 2007 – University of Sao Paulo

Based on:

ALICE Offline Tutorial

F. Carminati, P. Christakoglou, J.F. Grosse-Oetringhaus, P. Hristov, A. Peters, P. Saiz

April 13, 2007 – v1.3

Part III: PROOF

available online at: http://cern.ch/Oldenburg -> Seminars

PROOF

Parallel ROOT Facility

Interactive parallel analysis on a local cluster

PROOF itself is not related to the Grid
• Can be used in the Grid
• Can access Grid files

The usage of PROOF is transparent
• The same code can be run locally and in a PROOF system (certain rules have to be followed)

PROOF is part of ROOT

PROOF Schema

[Figure: PROOF schema. The client (local PC) runs root, sends ana.C to the remote PROOF cluster and gets back stdout/result. The cluster consists of node1 to node4: one master and three slaves running proof, with data on the nodes.]

$ root
root [0] tree->Process("ana.C")
root [1] TProof::Open("remote")
root [2] chain->Process("ana.C")

(The slide also shows gROOT->Proof("remote") as a way to open the connection.)

Terminology

Client: your machine running a ROOT session that is connected to a PROOF master

Master: PROOF machine coordinating work between Slaves

Slave: PROOF machine that processes data

Query: a job submitted from the client to the PROOF system. A query consists of a selector and a chain

Selector: a class containing the analysis code (more details later)

Chain: a list of files (trees) to process (more details later)

TTree

A tree is a container for data storage with disk “overspill”

It consists of several branches

• These can be in one or several files
• Branches are stored contiguously (split mode)
• When reading a tree, certain branches can be switched off: speed-up of analysis when not all data is needed

[Figure: a Tree with three Branches]

TTree

#include "TTree.h"
#include "TFile.h"
#include "TRandom.h"

// Simple data class; each tree entry stores one point
class point {
public:
  // Fill x, y, z with random numbers
  void Set() { x = gRandom->Rndm(); y = gRandom->Rndm(); z = gRandom->Rndm(); }
private:
  Float_t x, y, z;
  ClassDef(point, 1)
};

Int_t t() {
  point *pp = new point();
  TTree *tree = new TTree("Test", "Test Tree", 99);   // 99 = split level
  TFile *file = new TFile("test.root", "recreate");
  tree->Branch("point", &pp);                         // one branch holding the point object
  for (Int_t i = 0; i < 100; ++i) {                   // fill 100 entries
    pp->Set();
    tree->Fill();
  }
  tree->Write();
  file->Close();
  //
  file = new TFile("test.root", "read");
  tree->Print();
  //
  return 0;
}

TTree (2)

******************************************************************************
*Tree    :Test      : Test Tree                                              *
*Entries :      100 : Total =            4090 bytes  File  Size =          0 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Branch  :point                                                              *
*Entries :      100 : BranchElement (see below)                              *
*............................................................................*
*Br    0 :x         :                                                        *
*Entries :      100 : Total  Size=       1006 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :y         :                                                        *
*Entries :      100 : Total  Size=       1006 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    2 :z         :                                                        *
*Entries :      100 : Total  Size=       1006 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

[Figure: the point object is split into the branches x, y and z; in the file the values of each branch are stored together (x x x ..., y y y ..., z z z ...).]

How to use PROOF

Files to be analyzed are put into a chain (TChain)

Analysis written as a selector (TSelector, AliSelector, AliSelectorRL)

Input/Output is sent using dedicated lists

If additional libraries are needed, these have to be distributed as a “package”

[Figure: Input Files (TChain) and Input (TList) go into the Analysis (TSelector), which produces Output (TList).]

TChain

A chain is a list of trees (in several files)

Normal TTree functions can be used

• Draw(...), Scan(...): these iterate over all elements of the chain

Selectors can be used with chains
• Process(const char* selectorFileName)

After using SetProof() these calls are run in PROOF

[Figure: a Chain containing Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3) and Tree5 (File4).]
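How a chain presents several trees as one long list of entries can be sketched in plain C++. This is a model of the idea only, not TChain code: the function name LocateEntry and the representation of a chain as a list of per-tree entry counts are assumptions made for illustration, and the entry number is assumed to be non-negative.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Map a chain-wide entry number to (tree index, entry number inside that tree).
// Returns {number of trees, 0} when the entry lies past the end of the chain.
std::pair<std::size_t, long> LocateEntry(const std::vector<long>& entriesPerTree,
                                         long globalEntry) {
    for (std::size_t i = 0; i < entriesPerTree.size(); ++i) {
        if (globalEntry < entriesPerTree[i]) {
            return {i, globalEntry};          // entry belongs to tree i
        }
        globalEntry -= entriesPerTree[i];     // skip this tree and keep looking
    }
    return {entriesPerTree.size(), 0};        // not found
}
```

With trees of 100, 50 and 25 entries, chain entry 120 is entry 20 of the second tree, which is why a loop over the chain visits every entry of every tree exactly once.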

TSelector

Classes derived from TSelector can run locally and in PROOF

Begin()                    once on your client
SlaveBegin()               once on each Slave
Init(TTree* tree)          for each tree
Process(Long64_t entry)    for each event
SlaveTerminate()           once on each Slave
Terminate()                once on your client
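The order in which these functions are called can be sketched without ROOT. The following is a minimal model in plain C++, not the TSelector API: ToySelector and RunLocally are hypothetical names, trees are modeled as lists of numbers, and a local run is treated as having a single worker.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a selector; it only records which hook ran.
struct ToySelector {
    std::vector<std::string> calls;  // log of hook invocations, in order
    double sum = 0.0;                // toy "analysis result"

    void Begin()           { calls.push_back("Begin"); }
    void SlaveBegin()      { calls.push_back("SlaveBegin"); }
    void Init()            { calls.push_back("Init"); }
    void Process(double v) { calls.push_back("Process"); sum += v; }
    void SlaveTerminate()  { calls.push_back("SlaveTerminate"); }
    void Terminate()       { calls.push_back("Terminate"); }
};

// A "tree" is modeled as a list of entries, a "chain" as a list of trees.
using ToyTree  = std::vector<double>;
using ToyChain = std::vector<ToyTree>;

// Drive the selector through one local pass over the chain.
void RunLocally(ToySelector& sel, const ToyChain& chain) {
    sel.Begin();                       // once, on the client
    sel.SlaveBegin();                  // once per worker (one worker here)
    for (const ToyTree& tree : chain) {
        sel.Init();                    // once per tree
        for (double entry : tree) {
            sel.Process(entry);        // once per event
        }
    }
    sel.SlaveTerminate();              // once per worker
    sel.Terminate();                   // once, on the client
}
```

In a parallel run the middle part (SlaveBegin through SlaveTerminate) would happen on every worker for its share of the entries, while Begin and Terminate stay on the client.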

Input / Output

The TSelector class has two members of type TList:

fInput, fOutput
• These are used to get input data or put output data

Input list
• Before running a query the input list is populated: proof->AddInput(myObj)
• In the selector (Begin, SlaveBegin) the object is retrieved: fInput->FindObject("myObject")

Input / Output (2)

Output list
• After processing, the output has to be added to the output list on each Slave (in SlaveTerminate): fOutput->Add(fResult)
• PROOF merges the results from each query automatically (see next slide)
• On your client (in Terminate) you retrieve the object and save it, display it, ...: fOutput->FindObject("myResult")

Input / Output (3)

Merging
• Objects are identified by name
• Standard merging implementation for histograms available
• Other classes need to implement Merge(TCollection*)
• When no merging function is available all the individual objects are returned

[Figure: Result from Slave 1 and Result from Slave 2 are combined by Merge() into the Final result.]
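The merging rule can be illustrated with a small model in plain C++. This is a sketch of the idea only, not PROOF's implementation: ToyHist, WorkerOutput and MergeOutputs are hypothetical names, a histogram is reduced to a list of bin counts, and objects with the same name are assumed to have identical binning.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy histogram: only bin contents, no axis information.
struct ToyHist {
    std::vector<long> bins;

    // Add the bin contents of another histogram with identical binning.
    void Merge(const ToyHist& other) {
        assert(bins.size() == other.bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) {
            bins[i] += other.bins[i];
        }
    }
};

// Output list of one worker: objects keyed by name.
using WorkerOutput = std::map<std::string, ToyHist>;

// Combine the per-worker outputs; objects are matched by name.
WorkerOutput MergeOutputs(const std::vector<WorkerOutput>& outputs) {
    WorkerOutput merged;
    for (const WorkerOutput& out : outputs) {
        for (const auto& item : out) {
            auto it = merged.find(item.first);
            if (it == merged.end()) {
                merged.insert(item);            // first object with this name
            } else {
                it->second.Merge(item.second);  // same name: add bin contents
            }
        }
    }
    return merged;
}
```

Two workers that each return an object named "myResult" therefore yield a single "myResult" on the client whose bins are the sums of the per-worker bins.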

Workflow Summary

[Figure: the Chain (Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4)), the Analysis (TSelector) and the Input (TList) go to the proof workers; each worker produces an Output (TList), and these are combined into the Merged Output.]

Packages

PAR files: PROOF ARchive, similar to a Java jar
• Gzipped tar file
• PROOF-INF directory
  • BUILD.sh: builds the package, executed per Slave
  • SETUP.C: sets the environment, loads libraries, executed per Slave

API to manage and activate packages
• UploadPackage("package.par")
• EnablePackage("package")

Accessing ESD

Use local ROOT

To access AliESDs.root, the ESD.par package has to be uploaded into the PROOF environment

Selector derives from AliSelector (in STEER)

Access to data by member: fESD

[Figure: inheritance chain TSelector, AliSelector, <YourSelector>; each class derives from the previous one.]

Accessing the RunLoader

Use local AliRoot
• Access to Kinematics, Clusters, etc. requires access to the RunLoader
• Therefore (nearly) full AliRoot needs to be loaded
• An AliRoot version is already deployed on the CAF test system and can be enabled by a 3-line macro (part of the tutorial files, see later)
• The ESD package must not be loaded
• Selector derives from AliSelectorRL (in STEER)

GetStack(), GetRunLoader(), GetHeader()

[Figure: inheritance chain TSelector, AliSelector, AliSelectorRL, <YourSelector>; each class derives from the previous one.]

CERN Analysis Facility

The CERN Analysis Facility (CAF) will run PROOF for ALICE

• Prompt analysis of pp data
• Pilot analysis of PbPb data
• Calibration & Alignment

Available to the whole collaboration but the number of users will be limited for efficiency reasons

Design goals
• 500 CPUs
• 100 TB of selected data locally available

Evaluation of PROOF

Test setup since May 2006
• 40 machines, 2 CPUs each, 200 GB disk

Tests performed
• Usability tests
• Simple speedup plot
• Evaluation of different query types
• Evaluation of the system when running a combination of query types

Goal: Realistic simulation of users using the system

Query Type Cocktail

A realistic stress test consists of different users that submit different types of queries

4 different query types
• 20% very short queries
• 40% short queries
• 20% medium queries
• 20% long queries

User mix
• 33 nodes available for the test
• Maximum average speedup for 10 users = 6.6 (33 nodes = 66 CPUs)
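The quoted maximum of 6.6 is what one gets by dividing the available CPUs evenly among the concurrent users. A minimal sketch of that arithmetic in plain C++; the function name MaxAverageSpeedup is hypothetical, and the even split (with one CPU per user as the baseline) is an assumption of the sketch rather than a statement about how PROOF schedules work.

```cpp
#include <cassert>

// Upper bound on the average speedup per user when all users share the
// cluster evenly and the baseline is a query running on a single CPU.
double MaxAverageSpeedup(int nodes, int cpusPerNode, int users) {
    assert(users > 0);
    return static_cast<double>(nodes * cpusPerNode) / users;
}
```

For the test setup, MaxAverageSpeedup(33, 2, 10) evaluates 66 CPUs / 10 users, which gives the 6.6 stated on the slide.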

Relative Speedup