A prototype for an extended PROOF

A prototype for an extended PROOF • What is PROOF ? • ROOT analysis model … • … on a multi-tier architecture • Status • New development • Prototype based on XRD • Demo G. Ganis / CERN PH-SFT, June 2005

Page 1: A prototype for an extended PROOF

A prototype for an extended PROOF

• What is PROOF?
• ROOT analysis model …
• … on a multi-tier architecture
• Status
• New development
• Prototype based on XRD
• Demo

G. Ganis / CERN PH-SFT, June 2005

Page 2: A prototype for an extended PROOF

The ROOT analysis model: Trees

• Main data structure in ROOT, extending the concept of the PAW ntuple
• Collection of independent entries
• Organized in
  • Leaves (basic type, array, C++ object)
  • Branches (collection of leaves / branches)

Page 3: A prototype for an extended PROOF

The ROOT analysis model: Trees (cont'd)

• Efficient access to portions of entry data
• Several facilities to work with trees
  • Tree friends (TTree::AddFriend)
    • extend an existing tree without touching it
    • e.g. an experiment's read-only tree with user-specific branches / leaves
  • Tree chains (TChain)
    • list of trees to make the tree size virtually unbounded (typical size of a single tree is < 2 GB)
• In all cases the result behaves exactly as a single tree
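
How a list of trees can be presented as a single tree can be illustrated without ROOT. The following is a minimal sketch in plain C++, not the TChain implementation: the hypothetical class `ChainIndex` stores cumulative entry counts and maps a global entry number to a tree and an entry number local to that tree.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a ROOT class): cumulative entry offsets of chained trees.
class ChainIndex {
public:
    // Append a tree with the given number of entries to the chain.
    void Add(long long entries) {
        assert(entries >= 0);
        offsets_.push_back(Entries() + entries);
    }

    // Total number of entries seen through the chain.
    long long Entries() const { return offsets_.empty() ? 0 : offsets_.back(); }

    // Map a global entry number to (tree index, entry number local to that tree).
    std::pair<std::size_t, long long> Locate(long long global) const {
        assert(global >= 0 && global < Entries());
        std::size_t tree = 0;
        while (offsets_[tree] <= global) ++tree;  // first tree whose end lies past 'global'
        const long long first = (tree == 0) ? 0 : offsets_[tree - 1];
        return {tree, global - first};
    }

private:
    std::vector<long long> offsets_;  // offsets_[i] = number of entries in trees 0..i
};
```

A caller loops over global entry numbers from 0 to `Entries() - 1` and never needs to know where one tree ends and the next begins; empty trees are skipped automatically.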

Page 4: A prototype for an extended PROOF

The ROOT analysis model: Selector

• TSelector: main tool to define the data processing strategy
• Simple structure
• Framework automatically generated for a tree
  • tree->MakeSelector("MySelector")

void MySelector::Begin(TTree *tree)
{
   // Method called before starting the event loop
   fPtBranch = tree->GetBranch("pt");
   fPtBranch->SetAddress(&fPt);
   fPtHist = new TH1F("Pt", "Pt", 100, 0., 400.);
}

Bool_t MySelector::Process(Long64_t entry)
{
   // Method called for each entry in the tree
   fPtBranch->GetEntry(entry);
   fPtHist->Fill(fPt);
   return kTRUE;
}

void MySelector::Terminate()
{
   // Method called when the event loop is over
   fPtHist->Draw();
}

Read only what is needed by the algorithm

Page 5: A prototype for an extended PROOF

The ROOT analysis model: h1 analysis example

{
   // localProcessing.C
   // Define the data set
   TChain a("h42");
   a.Add("/home/ganis/rootdata/dstarmb.root");
   a.Add("/home/ganis/rootdata/dstarp1a.root");
   a.Add("/home/ganis/rootdata/dstarp1b.root");
   a.Add("/home/ganis/rootdata/dstarp2.root");

   // Process the selector
   a.Process("h1analysis.C");
}

root [0] .x localProcessing.C
Starting h1analysis with process option:
Starting h1analysis with process option:
Processing file: /home/ganis/rootdata/dstarmb.root
Processing file: /home/ganis/rootdata/dstarp1a.root
Processing file: /home/ganis/rootdata/dstarp1b.root
Processing file: /home/ganis/rootdata/dstarp2.root
 FCN=70.4023 FROM MIGRAD    STATUS=CONVERGED     220 CALLS     221 TOTAL
                     EDM=1.37834e-08    STRATEGY= 1    ERROR MATRIX ACCURATE
  EXT PARAMETER                                  STEP          FIRST
  NO.  NAME      VALUE          ERROR            SIZE       DERIVATIVE
   1   p0     9.59988e+05    9.07051e+04     7.92857e+01   -2.69331e-09
   2   p1     3.51130e-01    2.32881e-02     4.69706e-05    5.29292e-03
   3   p2     1.18502e+03    5.95938e+01     6.72112e-01    2.29626e-06
   4   p3     1.45569e-01    5.93851e-05     8.69320e-07   -1.75027e+00
   5   p4     1.24388e-03    6.63103e-05     7.86533e-07   -6.72432e-01
Real time 0:00:17.563133, CP time 5.880

Page 6: A prototype for an extended PROOF

PROOF

• Why?
  • The data to be analyzed can only rarely all be local
  • Transferring full data sets takes time
• Goal: provide a tool for interactive analysis on a heterogeneous cluster
  • exploit the mutual independence of the entries in a tree
  • basic parallelism achieved by splitting the data into packets of variable size, distributed to the participating nodes
• Focus on:
  • Transparency
    • same selectors, … on PROOF as in a local session
  • Scalability
    • linear scaling up to a large number of workers (tested up to 1000)
  • Adaptability
    • cope automatically with different cluster configurations and varying running conditions / performance

Motto: Bring the KiloBytes to the PetaBytes and not the PetaBytes to the KiloBytes
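
One way to picture "packets of variable size" is a policy that sizes each packet from the processing rate a worker has shown so far. The function below is only a sketch under that assumption, not the PROOF packetizer; the name `NextPacketSize` and all of its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical policy (not PROOF's actual packetizer): size the next packet so
// that it takes about 'targetSeconds' at the worker's measured processing rate.
long long NextPacketSize(double entriesPerSecond, double targetSeconds,
                         long long remaining, long long minSize, long long maxSize) {
    assert(minSize > 0 && minSize <= maxSize && remaining >= 0);
    long long wanted = static_cast<long long>(entriesPerSecond * targetSeconds);
    wanted = std::max(minSize, std::min(wanted, maxSize));  // clamp to [minSize, maxSize]
    return std::min(wanted, remaining);                     // never exceed what is left
}
```

With such a rule a fast node receives large packets and a slow or loaded node small ones, so all workers tend to finish at about the same time.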

Page 7: A prototype for an extended PROOF

PROOF: architecture

Page 8: A prototype for an extended PROOF

PROOF: connection layer

[Figure: connection layer. The client connects to the master, and the master to slaves 1 … n. On each node a parent proofd fork()s a child proofd, which execv()s into proofserv (master) or proofslave (slaves).]

parent proofd (always running)

child proofd (transforming into proofserv / proofslave)

proofserv / proofslave: TProofServ instances

Page 9: A prototype for an extended PROOF

PROOF: simplified message flow

Page 10: A prototype for an extended PROOF

PROOF: workflow

Page 11: A prototype for an extended PROOF

PROOF: data access strategies

• Each slave is assigned, as far as possible, packets representing data in local files

• If there is no (more) local data, it gets remote data via (x)rootd, rfiod or dCache (this needs a good LAN, such as Gigabit Ethernet)

• In case of SAN/NAS, a simple round-robin strategy is used
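
The "local data first, remote data otherwise" rule can be sketched in a few lines. This is an illustration in plain C++ and not the PROOF packetizer; the `Packet` structure and the function `NextPacketFor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a packet: the host holding its file and a flag.
struct Packet {
    std::string host;       // node where the data file is stored
    bool assigned = false;  // already handed to a worker?
};

// Return the index of the packet to give to a worker running on 'workerHost':
// a packet with local data if one is left, otherwise the first remaining packet
// (to be read remotely). Returns -1 when no packet is left.
int NextPacketFor(const std::string& workerHost, std::vector<Packet>& packets) {
    assert(!workerHost.empty());
    int fallback = -1;
    for (std::size_t i = 0; i < packets.size(); ++i) {
        if (packets[i].assigned) continue;
        if (packets[i].host == workerHost) {  // local data: take it immediately
            packets[i].assigned = true;
            return static_cast<int>(i);
        }
        if (fallback < 0) fallback = static_cast<int>(i);  // first remote candidate
    }
    if (fallback >= 0) packets[fallback].assigned = true;
    return fallback;
}
```

When no packet is local to any worker, as with shared SAN/NAS storage, this function simply hands out packets in order to whichever worker asks next.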

Page 12: A prototype for an extended PROOF

PROOF: processing algorithms

TSelector adapted to PROOF

Natural additions
• Input list: code to be run, …
• Output list: results
• Methods to initialize and finalize processing within a slave
• Method to initialize a tree

void MySelector::Begin(TTree *tree)
{
   // Called in the client for local initializations
}

void MySelector::SlaveBegin(TTree *tree)
{
   // Called in each slave before processing
   fPtHist = new TH1F("Pt", "Pt", 100, 0., 400.);
   fOutput->Add(fPtHist);
}

void MySelector::Init(TTree *tree)
{
   // Called at each tree change
   fPtBranch = tree->GetBranch("pt");
   fPtBranch->SetAddress(&fPt);
}

Bool_t MySelector::Process(Long64_t entry)
{
   // Called for each entry in the tree
   fPtBranch->GetEntry(entry);
   fPtHist->Fill(fPt);
   return kTRUE;
}

void MySelector::SlaveTerminate()
{
   // Called in each slave after processing
}

void MySelector::Terminate()
{
   // Called in the client after processing
   fPtHist->Draw();
}

fOutput->Add(...) defines the list of objects wanted back

Objects with a Merge() method are automatically merged in Terminate

The modified TSelector also works in non-PROOF sessions
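
The merging of per-slave results can be pictured with a toy histogram. The sketch below is plain C++ and does not use TH1F or the ROOT merge machinery; `Histogram` and `MergeAll` are hypothetical stand-ins, and values outside the range are simply dropped here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a histogram such as TH1F: fixed range, equal-width bins.
struct Histogram {
    double lo, hi;
    std::vector<long long> bins;

    Histogram(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), bins(nbins, 0) {
        assert(nbins > 0 && lo_ < hi_);
    }

    // Count one value; values outside [lo, hi) are ignored in this sketch.
    void Fill(double x) {
        if (x < lo || x >= hi) return;
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the edge
        ++bins[i];
    }

    // Add the contents of a histogram with identical binning.
    void Merge(const Histogram& other) {
        assert(other.bins.size() == bins.size() && other.lo == lo && other.hi == hi);
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

// Combine the partial results of all slaves into one histogram.
Histogram MergeAll(const std::vector<Histogram>& partial) {
    assert(!partial.empty());
    Histogram total = partial.front();
    for (std::size_t i = 1; i < partial.size(); ++i) total.Merge(partial[i]);
    return total;
}
```

Because entries are independent, adding bin contents gives the same histogram as a single local pass over all the data, which is why the same selector can run with or without PROOF.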

Page 13: A prototype for an extended PROOF

PROOF: the data

Data set: dedicated class TDSet

• Specifies a collection of files with objects
• Understands logical file names
• Could be returned by a query to a database or file catalog or …
• API very close to TChain

{
   // proofProcessing.C
   // Define the data set
   TDSet a("TTree", "h42");
   a.Add("root://oplapro62.cern.ch//tmp/dstarmb.root");
   a.Add("root://oplapro62.cern.ch//tmp/dstarp1a.root");
   a.Add("root://oplapro62.cern.ch//tmp/dstarp1b.root");
   a.Add("root://oplapro62.cern.ch//tmp/dstarp2.root");

   // Process the selector
   a.Process("h1analysis.C");
}

Page 14: A prototype for an extended PROOF

PROOF: running the query

root[0] gROOT->Proof("pcepsft43.cern.ch")
PROOF set to parallel mode (10 slaves)
root[1] .x proofProcessing.C
Starting h1analysis with process option:
Starting h1analysis with process option:
Processing file: /tmp/ganis/rootdata/dstarp1a.root
Processing file: /tmp/ganis/rootdata/dstarp2.root
Starting h1analysis with process option:
Processing file: //tmp/ganis/rootdata/dstarmb.root
Processing file: //tmp/ganis/rootdata/dstarp1b.root
Processing file: //tmp/ganis/rootdata/dstarp2.root
 FCN=70.4023 FROM MIGRAD    STATUS=CONVERGED     220 CALLS     221 TOTAL
                     EDM=1.37834e-08    STRATEGY= 1    ERROR MATRIX ACCURATE
  EXT PARAMETER                                  STEP          FIRST
  NO.  NAME      VALUE          ERROR            SIZE       DERIVATIVE
   1   p0     9.59988e+05    9.07051e+04     7.92857e+01   -2.69331e-09
   2   p1     3.51130e-01    2.32881e-02     4.69706e-05    5.29292e-03
   3   p2     1.18502e+03    5.95938e+01     6.72112e-01    2.29626e-06
   4   p3     1.45569e-01    5.93851e-05     8.69320e-07   -1.75027e+00
   5   p4     1.24388e-03    6.63103e-05     7.86533e-07   -6.72432e-01
root[2]

Executing …

Page 15: A prototype for an extended PROOF

PROOF: additional features

• Possibility to upload and / or build additional packages
  • packed as PAR files (Proof ARchive, like Java JAR …)

gProof->UploadPackage("MyPackage.par")
gProof->EnablePackage("MyPackage")

• Cache system to minimize the number of file transfers
• File identity and integrity using message digest technology
• Feedback information at configurable time intervals
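
The idea of a cache that uses a digest to decide whether a file must be sent again can be sketched as follows. This is an illustration only, not the PROOF cache: `FileCache` is a hypothetical class, and the FNV-1a hash is a non-cryptographic stand-in for a real message digest such as MD5 or SHA-256.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// FNV-1a, a simple non-cryptographic hash used here only as a stand-in
// for a real message digest.
std::uint64_t Digest(const std::string& content) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : content) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical server-side cache: file name -> digest of the copy already held.
class FileCache {
public:
    // Returns true if the file had to be transferred (missing or changed),
    // false if the cached copy was already identical.
    bool Upload(const std::string& name, const std::string& content) {
        assert(!name.empty());
        const std::uint64_t d = Digest(content);
        auto it = digests_.find(name);
        if (it != digests_.end() && it->second == d) return false;  // cache hit
        digests_[name] = d;  // store the new copy and remember its digest
        ++transfers_;
        return true;
    }

    int Transfers() const { return transfers_; }

private:
    std::map<std::string, std::uint64_t> digests_;
    int transfers_ = 0;
};
```

Submitting an unchanged selector a second time then costs only a digest comparison, while an edited file is detected and transferred again.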

Page 16: A prototype for an extended PROOF

PROOF: realtime feedback

[Figure: feedback display]

Feedback histogram, updated every (e.g.) 1 second

Chain definition (header) is fetched from the PROOF master

Page 17: A prototype for an extended PROOF

PROOF on clusters

• PROOF can use "resource brokers" to find out where to start the slaves
• PROOF can use file catalogs to locate the files to be analysed
• Concrete examples:
  • Interface with the Condor Computing-On-Demand (COD) system
    • the master starts the slaves as COD jobs
  • PEAC: PROOF-Enabled Analysis Cluster
    • Complete event analysis solution: data catalog, resource broker, PROOF
  • TGrid: abstract Grid interface for all Grid services
    • Concrete implementation for AliEn

// Connect
TGrid *alien = TGrid::Connect("alien");

// Query
TGridResult *res = alien->Query("lfn:///alice/simulation/2001-04/V0.6*.root");

// Data set
TDSet *treeset = new TDSet("TTree", "AOD");
treeset->Add(res);

// Use files in result set to find remote nodes
gROOT->Proof(res);
treeset->Process("myselector.C");

Page 18: A prototype for an extended PROOF

PROOF: current limitations

• Originally intended for short queries
  • TDSet::Process blocks until it is done
  • Stateful connection
    • everything is lost if the connection is lost or cut
• Originally designed for a local cluster
  • static configuration
• Robustness of some components
  • Interrupt control flow based on Out-Of-Band messages
  • Authentication when different protocols are required at different steps
  • Sandbox when a user account is not available
• Documentation

Page 19: A prototype for an extended PROOF

PROOF: team for new developments

• Maarten Ballintijn
• Marek Biskup
• Rene Brun
• Derek Feichtinger (ARDA)
• G.G.
• Guenter Kickinger
• Andreas Peters (ARDA)
• Fons Rademakers

Page 20: A prototype for an extended PROOF

PROOF: new development fields

• Interactive batch
  • stateless connection
  • non-blocking queries
• Robustness
  • Get rid of OOB messages
• Setup / configuration issues
  • zero-config setup
  • allow slaves to come and go
• Grid interfacing
  • efficient use of grid information (catalogs, resource brokers, …)
• Performance issues
  • targeted read ahead, improved caching, query estimators
• Authentication
  • Adopt XROOTD framework
• Analysis issues:
  • Tree friends, event lists, indices
• GUI, Browsing

Page 21: A prototype for an extended PROOF

Typical query-time distribution

Page 22: A prototype for an extended PROOF

XPD: communication layer for PROOF based on XROOTD

• Transfer of state from the client to the PROOF cluster requires a manager on the cluster side that keeps track of existing sessions and query submissions
• XROOTD (in ROOT since v 4.01.02) provides a generic main component (xrd) for handling networking issues and protocol scheduling, and utility tools (forking, error handling, security, …) on which the manager can be based
• Candidate to introduce
  • interactive-batch mode:
    • possibility to leave a session if a query takes too long and reconnect later to pick up the results
  • non-blocking query submission:
    • possibility to detach from the query while it is being processed (even for potentially short queries)
  • more robust authentication system
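
The role of a cluster-side manager in interactive-batch mode can be sketched in plain C++. The class below is an assumption made for illustration, not the XPD interface: `SessionManager`, `Submit`, `Complete`, `Poll` and `Retrieve` are hypothetical names. The point it shows is that query state lives on the cluster, so a client can submit, disconnect and pick up results later.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical cluster-side manager (illustrative names, not the XPD API):
// it owns the query state, so results survive a client disconnect.
class SessionManager {
public:
    enum class Status { Unknown, Running, Done };

    // Non-blocking submission: register the query and return its id at once.
    int Submit(const std::string& user, const std::string& selector) {
        assert(!user.empty());
        const int id = nextId_++;
        queries_[id] = Query{user, selector, "", Status::Running};
        return id;
    }

    // Called by the processing side when the query has finished.
    void Complete(int id, const std::string& result) {
        auto it = queries_.find(id);
        assert(it != queries_.end());
        it->second.result = result;
        it->second.status = Status::Done;
    }

    // A (re)connecting client asks for the state of one of its queries.
    Status Poll(const std::string& user, int id) const {
        auto it = queries_.find(id);
        if (it == queries_.end() || it->second.user != user) return Status::Unknown;
        return it->second.status;
    }

    // Pick up the result of a finished query; empty string if not available.
    std::string Retrieve(const std::string& user, int id) const {
        return Poll(user, id) == Status::Done ? queries_.at(id).result : std::string();
    }

private:
    struct Query {
        std::string user, selector, result;
        Status status;
    };
    std::map<int, Query> queries_;
    int nextId_ = 1;
};
```

A client would call `Submit`, keep the returned id, and later call `Poll` or `Retrieve` from any new connection; nothing in the client process has to stay alive in between.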

Page 23: A prototype for an extended PROOF

How does XROOTD work

• Multi-component server based on a multi-threaded architecture
• xrd component: provides networking, thread management, protocol scheduling
• Minimal set of threads:
  • Acceptor: opens the connection; matches the protocol; submits a job to the scheduler
  • Pollers: react to any activity on open links; submit a job to the scheduler
  • Scheduler: schedules the work to be done (jobs)
  • Worker(s): wait for jobs and execute them
  • Buffer manager: dynamically optimizes the use of memory buffers
• Workers are created / destroyed as needed
• Links are not attached to a specific worker: the first free worker takes the job
• Jobs ≡ data / information to be processed for a given link

Page 24: A prototype for an extended PROOF

How does XROOTD work

[Figure: XROOTD server internals. Labels: accept, poller, scheduler, BM, WN, XrdJob, links, XrdXrootdProtocol, files]

XrdXrootdProtocol

• one XrdXrootdProtocol instance per physical connection (i.e. per client session)
• client gateway to the files: used to communicate with all the files the client wants to access on that specific server

Page 25: A prototype for an extended PROOF

How does XPROOFD work

[Figure: XPROOFD server internals. Labels: accept, poller, scheduler, WN, XrdJob, links, XrdProofdProtocol, proofserv, static area]

XrdProofdProtocol

• one XrdProofdProtocol instance per physical connection (i.e. per client session)
• client gateway to proofserv
• static area keeps all the relevant information about a user and their activities on the cluster

Page 26: A prototype for an extended PROOF

XPROOFD: communication layer

[Figure: XPROOFD communication layer. Nodes: client, master, slave 1 … slave n. An XrdProofd daemon on each node fork()s proofserv (master) or proofslave (slaves). Other labels: TXPSocket, xc, PO, XRD pollers]

Page 27: A prototype for an extended PROOF

Basic ingredients

• Client side:
  • new class TXPSocket
    • TSocket interface understanding the new communication protocol
  • new class TXProofMgr
    • reflects the status of a client vis-à-vis a given cluster
    • start / attach sessions, described by TProof instances (no longer unique)
• Server side:
  • new implementation of XrdProtocol, XrdProofdProtocol
    • client gateway to the cluster, one-to-one relation to TXProofMgr
    • static area describing the persistent information (server lifetime)
  • new class XrdProofSrv
    • proxy to the external processor (proofserv), submitted queries, results, …
    • one per external processor

Page 28: A prototype for an extended PROOF

TXPSocket

• Separate thread for receiving messages
• Intensive use of unsolicited messages
  • normal asynchronous messages (i.e. in Collect)
  • interrupts (no OOB)
  • ping functionality
• Synchronous and asynchronous messages posted in separate queues
• Interrupt handler woken up with an internal SIGURG (from the reader to the main thread)
• Ping treated as a special interrupt (level 0)
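
The routing of incoming messages into separate queues can be sketched without threads or signals. The code below is a hypothetical illustration, not the TXPSocket implementation: the message layout, the `Kind` values and the `Inbox` class are all assumptions, since the slides do not describe the wire protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical message kinds; the real protocol is not described in the slides.
enum class Kind { SyncReply, Async, Interrupt, Ping };

struct Message {
    Kind kind;
    int level;         // interrupt level; unused for other kinds
    std::string body;
};

// Routes what a reader thread would receive into separate queues.
class Inbox {
public:
    void Post(const Message& m) {
        switch (m.kind) {
            case Kind::SyncReply: sync_.push_back(m); break;
            case Kind::Async:     async_.push_back(m); break;
            case Kind::Interrupt:
                assert(m.level > 0);  // level 0 is reserved for ping in this sketch
                interrupts_.push_back(m);
                break;
            case Kind::Ping:          // ping handled as a special interrupt, level 0
                interrupts_.push_back(Message{Kind::Interrupt, 0, m.body});
                break;
        }
    }

    std::size_t SyncCount() const { return sync_.size(); }
    std::size_t AsyncCount() const { return async_.size(); }
    std::size_t InterruptCount() const { return interrupts_.size(); }

    // Take the oldest pending interrupt; call only if InterruptCount() > 0.
    Message PopInterrupt() {
        assert(!interrupts_.empty());
        Message m = interrupts_.front();
        interrupts_.pop_front();
        return m;
    }

private:
    std::deque<Message> sync_, async_, interrupts_;
};
```

In the design described above, the reader thread would additionally raise SIGURG after queuing an interrupt so that the main thread handles it promptly; that signalling step is left out of this sketch.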

Page 29: A prototype for an extended PROOF

TXPSocket – Reader thread

[Figure: reader thread. recv() on the TCP connection feeds three queues (sync msg, async msg, interrupts); each arrival posts an event, and interrupts raise SIGURG]

Page 30: A prototype for an extended PROOF

XPD: Demo!

Results achieved with the realistic prototype

• Multi-sessions
• Disconnect / Reconnect
• Process: blocking query
• Submit: non-blocking query
• Finalize results from different sessions
• Archive results to /afs using the same daemon as file server

Page 31: A prototype for an extended PROOF

XPD: what next

• Deep test of the communication layer
  • latencies
  • synchronization problems
• Test with a large, realistic number of slaves
• Alternatives for the internal connection
• Enable authentication
• XROOTD load balancing?

Page 32: A prototype for an extended PROOF

Other studies

Advanced prototype using a communication layer based on memory-mapped message queue technology (A. Peters, D. Feichtinger):

• full state in message queues
  • nice recovery features
• multi-threaded master
  • queue insertion, configuration, scheduler, packetizer
  • client frontend
• slave splitting in supervisor and processors
  • not attached to a specific user
  • better use of resources

Page 33: A prototype for an extended PROOF

Summary

• A lot of activity is going on to improve the PROOF system
• A working prototype with a communication layer based on XROOTD exists
  • interactive batch, multi-session, reconnect
• Alternative studies may provide good solutions for some issues
• Goal: have the new system in good shape for ROOT05