PROOF: Status and Perspectives. G. Ganis, CERN / LCG. VII ROOT Users Workshop, CERN, March 2007


27/03/2007 G. Ganis, ROOT Users Workshop

Outline

(Very) quick introduction
What’s new since ROOT05
Current developments and plans

PROOF in a slide

PROOF: a dynamic approach to end-user HEP analysis on distributed systems, exploiting the intrinsic parallelism of HEP data (see backup slides)

[Figure: PROOF-enabled facility. Labels: client; commands, scripts; list of output objects (histograms, …); top master; three geographical domains, each with a sub-master, workers and MSS.]

PROOF aspects / issues

Connection layer
  Xrootd, authentication, error handling
Software distribution
  Optimized package / class handling
Data access
  Optimized distribution of data on worker nodes
Classification / handling of the results
  Query result manager
Resource sharing among users
  Client gets one ROOT session on each machine
  Scheduling

What’s new since ROOT05

Connection layer based on XROOTD
  Coordinator functionality
  Full implementation of “interactive batch” model
Dataset management
Packetizer improvements
Progress in uploading / enabling additional software
Restructuring of the PROOF modules
Progress in the integration with experiment software
PROOF Wiki pages
ALICE experience at the CAF (see J.F. Grosse-Oetringhaus talk)

Coordinator functionality

Independent channel to control the cluster
  Global view
  Independent access to information (e.g. log files)
  Needed for full implementation of “interactive batch”
Not directly achievable with proofd
  Daemon instance “disappearing” into proofserv
  Session lifetime same as client connection lifetime
  Parent proofd not aware of its children
Natural candidate: XROOTD
  Lightweight, industrial-strength networking and protocol handler

New connection layer based on XROOTD

New PROOF-related protocol: XrdProofdProtocol (XPD)
  XPD launches and controls PROOF sessions (proofserv)
Client connection (XrdProofConn) based on XrdClient
  Concept of physical (per client) / logical (per session) connection
  Asynchronous reading via dedicated thread
    Messages read as soon as available and added to a queue
    Sets up a control interrupt network independent of OOB
Cleaner security system
  Physical connection authenticated
  Associated logical connections inherit the “token”
Client disconnection / reconnection handled naturally
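The asynchronous reading described on this slide (a dedicated thread that reads messages as soon as they are available and adds them to a queue) can be sketched in standalone C++. This is only an illustration of the pattern, not the XrdProofConn code: `MessageQueue` and `readAll` are hypothetical names, and a vector of strings stands in for the network link.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue: the reader thread pushes, the consumer pops.
class MessageQueue {
public:
   void Push(std::string msg) {
      {
         std::lock_guard<std::mutex> lock(fMutex);
         fQueue.push(std::move(msg));
      }
      fCond.notify_one();
   }
   // Blocks until a message is available, then returns it.
   std::string Pop() {
      std::unique_lock<std::mutex> lock(fMutex);
      fCond.wait(lock, [this] { return !fQueue.empty(); });
      std::string msg = std::move(fQueue.front());
      fQueue.pop();
      return msg;
   }
private:
   std::mutex fMutex;
   std::condition_variable fCond;
   std::queue<std::string> fQueue;
};

// Stand-in for the link: a dedicated thread queues each incoming message as
// soon as it is "read"; the caller drains the queue at its own pace.
inline std::vector<std::string> readAll(const std::vector<std::string> &incoming) {
   MessageQueue queue;
   std::thread reader([&] {
      for (const auto &m : incoming) queue.Push(m);
   });
   std::vector<std::string> received;
   for (std::size_t i = 0; i < incoming.size(); ++i) received.push_back(queue.Pop());
   reader.join();
   return received;
}
```

With a single reader thread the queue preserves arrival order, so the consumer sees the messages in the order they were read.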

XPD role

[Figure: two parallel setups, each with links and MT stuff.
PROOF farm (XPD): client, XrdProofdProtocol, static area, proofserv, worker servers. XrdProofdProtocol: client gateway to proofserv.
File server (XROOTD): client, XrdXrootdProtocol, files. XrdXrootdProtocol: client gateway to files.]

XPD communication layer

[Figure: communication layout of a PROOF farm. Labels: client (xc); master: XrdProofd (XS), proofserv (xc), fork(); worker 1 … worker n: XrdProofd (XS), proofslave (xc), fork(); XRD links; TXSocket.]

Stateless connection and “Interactive batch”

“Interactive batch”: flexible submission system keeping the advantages of interactivity and batch
  If a query is taking too long, have the option to abort it, to stop and retrieve the results, or to leave it running on the system, coming back later on to browse / retrieve / archive the results
Ingredients
  Non-blocking running mode (v5.04.00, ROOT05)
  Query result management (v5.04.00, ROOT05)
  Stateless client connection (v5.08.00)
  Ctrl-Z functionality (soon)

Exploiting the coordinator: client side

Not yet fully exploited: new functionality added regularly
Examples:
  Log retrieval
    TProofLog contains log files as TMacro and implements display, grep, save, … functionality
    root[] TProofLog *pl = TProof::Mgr("user@master")->GetSessionLogs()
    root[] pl->Grep("violation")
  Session reset
    Cleanup of user’s entry in the coordinator
    Only way out when something bad happens
    root[] TProof::Reset("user@master")

Exploiting the coordinator: server side

Static control of resource usage
  Max number of users
  Max number of workers per user
Access, usage control
  Role of server
  List of users allowed to connect
Define ROOT versions available on the cluster
  Extendable to packages
…

Dataset uploader

Optimized distribution of data files on the farm using XROOTD functionality
  By direct upload
  By staging out from mass storage
Direct upload
  Sources: local directory, list of URLs
  XROOTD/OLBD pool ensures optimal distribution
  No special configuration (except for clean-up)
Using a stager
  Requires XROOTD configuration
  e.g. CASTOR for ALICE @ CAF

Dataset manager

Data-sets are identified by name
Data-sets can be retrieved by name to automatically create TDSet’s

root[0] TProof *proof = TProof::Open("master");
root[1] proof->UploadDataSet("MCppH", "/data1/mc/ppH_*");
Uploading file:///data1/mc/ppH_01.root to \
   root://poolurl//poolpath/ppH_01.root
[TFile::Cp] Total 20.34 MB |===============| 100.00 % [6.9 MB/s]
root[2] proof->ShowDataSets();
Existing Datasets:
MCppH
root[] TDSet *dset = new TDSet(proof->GetDataSet("MCppH"));

Dataset manager (continued)

Metadata stored in sandbox on the master
  New sub-directory <SandBox>/dataset
Concept of private / public data-sets
  User’s private definitions
    readable / writable by owner only
  User’s public definitions
    readable by anybody
  Global public definitions
    Workgroup- / experiment-wide (e.g. 2008 runs)
    readable by anybody (group restrictions?)
    writable by privileged account

Packetizer improvements

Packetizer’s goal: optimize work distribution to process queries as fast as possible
Standard TPacketizer’s strategy
  first process local files, then try to process remote data
End-of-query bottleneck

[Figure: active workers vs. processing time]

New strategy: TAdaptivePacketizer

Predict processing time of local files for each worker
Keep assigning remote files from the start of the query to workers expected to finish faster
Processing time improved by up to 50%

[Figure: processing rate for all packets, with remote packets marked; NEW and OLD strategies shown at the same scale]
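The idea behind the adaptive strategy can be sketched as follows: predict when each worker will finish what it already has, and hand the next remote file to the worker predicted to finish first. This is a simplified standalone illustration, not the TAdaptivePacketizer code; `WorkerState`, `predictedFinish` and `assignRemoteFile` are hypothetical names, and the linear time estimate (bytes divided by rate) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-worker bookkeeping (not ROOT classes).
struct WorkerState {
   double localBytesLeft;  // bytes of local files still to process
   double rate;            // measured processing rate, bytes per second
   double remoteBytes;     // remote bytes already assigned to this worker
};

// Predicted time (s) for a worker to finish everything assigned so far.
inline double predictedFinish(const WorkerState &w) {
   return (w.localBytesLeft + w.remoteBytes) / w.rate;
}

// Assign one remote file of the given size to the worker expected to finish
// first, and return that worker's index.
inline std::size_t assignRemoteFile(std::vector<WorkerState> &workers, double fileBytes) {
   assert(!workers.empty());
   std::size_t best = 0;
   for (std::size_t i = 1; i < workers.size(); ++i)
      if (predictedFinish(workers[i]) < predictedFinish(workers[best])) best = i;
   workers[best].remoteBytes += fileBytes;
   return best;
}
```

Because remote files are handed out from the start of the query, a worker with little local data keeps receiving work instead of idling until the end, which is the bottleneck the standard strategy runs into.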

Progress in using additional software

Package enabling
  Separated behaviour client / cluster
  Real-time feedback during build
Load mechanism extended to single class / macro
  root[] TProof *proof = TProof::Open("master")
  root[] proof->Load("MyClass.C")
Selectors / macros / classes binaries are now cached
  Decreases initialization time
API to modify include / library paths on the workers
  Use packages globally available on the cluster

Restructuring of PROOF modules

Reduce dependencies
  Better control size of executables (proofserv)
  Faster worker startup
First step:
  Get rid of TVirtualProof and PROOF dependencies in ‘tree’
  All PROOF in ‘proof’, ‘proofx’, ‘proofd’
  Still ‘proofserv’ needs a lot of libs
2nd step (current situation):
  Separate out TProofPlayer, TPacketizer, … in ‘proofplayer’ (new libProofPlayer, v5.15.04)
  proofserv size on workers reduced by a factor of ~2 at startup

Further optimization of PROOF libs

Differentiate setups on client and cluster
  Client:
    Needs graphics
    May not need all experiment software
    TSelector: compile only Begin() and Terminate()
  Servers:
    Need all experiment software
    Do not need graphics
    TSelector: do not compile Begin() and Terminate()
Client and Server versions of basic libs

Additional improvements (incomplete)

GUI controller
  Integration of the data set manager
  Integration of the new features of package manager
  Improved session / query history bookkeeping
Improved user-friendliness of parameter setting
  root[] TProof *proof = TProof::Open("master")
  root[] proof->SetParameter("factor", 1.1)
Automatic support for dynamic environment setting
  proofserv is a script launching proofserv.exe
  Envs to define the context in which to run
  Useful for experiment-specific settings (see later) and/or for debugging purposes (e.g. run valgrind on worker …)

Integration with experiment software

Finding, using the experiment software
  Environment settings, libraries loading
Implementing the analysis algorithms
  TSelector framework
    Structured analysis and automated interaction with trees (chains) (+)
    Tightly coupled with the tree (-)
      New analysis implies new selector
      Change in the tree definition implies a new selector
    May conflict with existing experiment technologies
    Add new layer to hide details irrelevant for the end-user

Setting the environment

Experiment software available on nodes
Additional dedicated software handled by the PROOF package manager
  Allows user to run her/his own modifications
The experiment environment can be set
  Statically (e.g. ALICE)
    before starting xrootd (inherited by proofserv)
  Dynamically (e.g. CMS)
    evaluating a user-defined script in front of proofserv
    Allows selecting different versions at run time

Dynamic environment setting: CMS

CMS needs to run SCRAM before proofserv
PROOF_INITCMD contains the path of a script (NEW)
The script initializes the CMS environment using SCRAM

TProof::AddEnvVar("PROOF_INITCMD", "~maartenb/proj/cms/CMSSW_1_1_1/setup_proof.sh")

#!/bin/sh
# Export the architecture
export SCRAM_ARCH=slc3_ia32_gcc323
# Init CMS defaults
cd ~maartenb/proj/cms/CMSSW_1_1_1
. /app/cms/cmsset_default.sh
# Init runtime environment
scramv1 runtime -sh > /tmp/dummy
cat /tmp/dummy

Examples of implementing analysis algorithms

ALICE:
  Generic AliSelector hiding details
  User’s selector derives from AliSelector
    Access to ESD event by member fESD
  Alternative technology using tasks
  See J.F. Grosse-Oetringhaus talk
TAM technology @ PHOBOS
  Based on modularized tasks
  Separate analysis tasks from interaction with tree
  See C. Reed at ROOT05

Analysis algorithms in CMS

CMSSW: provides EDAnalyzer for analysis
Algorithms with a well defined interface can be used with both technologies (EDAnalyzer and TSelector)
Used in a TSelector templated framework TFWLiteSelector
Selector libraries distributed as PAR file

class MyAnalysisAlgorithm {
   void process( const edm::Event & );
   void postProcess( TList & );
   void terminate( TList & );
};

// Load framework library
gSystem->Load("libFWCoreFWLite");
// Load TSelector library
gSystem->Load("libPhysicsToolsParallelAnalysis");

TSelector *mysel = new TFWLiteSelector<MyAnalysisAlgorithm>
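How a templated selector can wrap an algorithm with the three-method interface above may be easier to see in a standalone sketch. Everything here is hypothetical and only mimics the shape: `Event` and `OutputList` stand in for `edm::Event` and `TList`, `SelectorSketch` is not the real TFWLiteSelector, and the mapping of `postProcess` to a worker-side step and `terminate` to a client-side step is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for edm::Event and TList.
struct Event { double pt; };
using OutputList = std::vector<double>;

// An algorithm exposing the three-method interface shown on the slide.
class CountHighPt {
public:
   void process(const Event &e) { if (e.pt > 10.0) ++fCount; }
   // Worker-side step (assumed): publish the partial result.
   void postProcess(OutputList &out) { out.push_back(fCount); }
   // Client-side step (assumed): reduce the partial results to one number.
   void terminate(OutputList &out) {
      double sum = 0.0;
      for (double v : out) sum += v;
      out.assign(1, sum);
   }
private:
   int fCount = 0;
};

// Hypothetical templated wrapper: forwards selector-style calls to the algorithm.
template <typename Algo>
class SelectorSketch {
public:
   void Process(const Event &e) { fAlgo.process(e); }
   void SlaveTerminate(OutputList &out) { fAlgo.postProcess(out); }
   void Terminate(OutputList &out) { fAlgo.terminate(out); }
private:
   Algo fAlgo;
};

// Drive the wrapper over a list of events and return the final count.
inline double runSketch(const std::vector<Event> &events) {
   SelectorSketch<CountHighPt> sel;
   OutputList out;
   for (const auto &e : events) sel.Process(e);
   sel.SlaveTerminate(out);
   sel.Terminate(out);
   return out.front();
}
```

The algorithm class knows nothing about the selector, which is what lets the same code be reused from a different driver.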

Current developments and plans

Scheduling
Consolidation, error handling
  Improved, but there are still cases when we lose control of the session
Processing error report
  Associate to a query an object detailing what went wrong (e.g. data set elements not analyzed) and why
Non-input-file-driven analysis
  Current processing is based on tree or object files
Local multi-core desktop optimization
  No daemons, UNIX sockets (no master?)
GUI: integration in a more general ROOT GUI controller

PROOF exploiting multi-cores

ALICE search for π0’s
  4 GB simulated data
  Instantaneous rates (evt/s, MB/s)
Clear advantage of quad core
  Additional computing power fully exploited
Demo at Intel Quad-Core Launch, Nov 2006

PROOF: scheduling multi-users

Fair resource sharing
  System scheduler not enough if N_users >= ~ N_workers / 2
Enforce priority policies
Two approaches
  Quota-based worker level load balancing
    Simple and solid implementation, no central unit
    Group quotas defined in the configuration file
  Central scheduler
    Per-query decisions based on cluster load, resources needed by the query, user history and priorities
    Generic interface to external schedulers planned
      MAUI, LSF, …

Quota-based worker level load balancing

Lower priority processes slowdown
  sleep before next packet request
Sleeping time proportional to the used CPU time
  factor depends on # users and the quotas
Example: userA, quota 2/3; userB, quota 1/3
  After T seconds:
    CPU(A) = T/2, CPU(B) = T/2
    Sleep B for T/2 seconds
  After T + T/2 seconds:
    CPU(A) = T/2 + T/2 = T = 2 * CPU(B), with CPU(B) = T/2
General case of N users brings a tri-diagonal linear system
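The two-user example can be checked with a few lines of standalone C++. The sketch assumes the situation of the slide only: both users have consumed the same CPU time, and while B sleeps A runs alone, so the sleep time s must satisfy (cpuUsed + s) / cpuUsed = quotaA / quotaB. The function name is hypothetical, and the general N-user case (the tri-diagonal system) is not covered.

```cpp
#include <cassert>

// Sleep time for the lower-quota user B in the two-user case.
// cpuUsed: CPU seconds consumed so far by each user (equal sharing).
// quotaA, quotaB: quotas of the two users, quotaA >= quotaB > 0.
// While B sleeps, A accumulates CPU alone until CPU(A) / CPU(B) = quotaA / quotaB.
inline double sleepForLowQuotaUser(double cpuUsed, double quotaA, double quotaB) {
   assert(quotaB > 0.0 && quotaA >= quotaB);
   return cpuUsed * (quotaA / quotaB - 1.0);
}
```

With quotas 2/3 and 1/3 and CPU(A) = CPU(B) = T/2, this gives a sleep of T/2, matching the slide; with equal quotas the sleep time is zero.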

Quota-based worker level load balancing (continued)

Group quotas defined in the xrootd configuration file
  xpd.group tpc usra,usrb
  xpd.grpparam tpc quota:70%
Factors recalculated by the master XPD each time a user starts or ends processing
  Only active users considered
  A low priority user will get 100% of resources when alone
Under Linux, SCHED_RR system scheduling is enforced for the processes
  The default, dynamic, SCHED_OTHER scheme screws up the whole idea, as sleeping processes get higher priority at restart
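One way to read "only active users considered" is that nominal quotas are renormalized over the users who are currently processing. The slide does not give the formula used by XPD, so the normalization below is an assumption, and `effectiveShares` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Effective share of each active user: nominal quota divided by the sum of
// the quotas of the users currently processing (assumed formula).
inline std::map<std::string, double>
effectiveShares(const std::map<std::string, double> &quotas,
                const std::set<std::string> &active) {
   double total = 0.0;
   for (const auto &q : quotas)
      if (active.count(q.first)) total += q.second;
   std::map<std::string, double> shares;
   if (total <= 0.0) return shares;  // nobody active: nothing to share
   for (const auto &q : quotas)
      if (active.count(q.first)) shares[q.first] = q.second / total;
   return shares;
}
```

Under this reading a user with a 30% quota gets a share of 1 when running alone and 0.3 when the 70% user is also active, consistent with the statement that a low priority user gets 100% of the resources when alone.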

Demo

Same sample analysis (h1, slightly slowed down) repeated 20 times
2 users
  gganis: reserved quota 70%
  ganis: taking what is left
Histograms show processing rate in MB/s

Central scheduling

Entity running on master XPD, loaded as plug-in
  Abstract interface XrdProofSched defined
Input:
  Query info (via XrdProofServProxy -> proofserv)
  Cluster status via OLBD control network
  Policy
Output:
  List of workers to continue with

class XrdProofSched {
   …
public:
   virtual int GetWorkers(XrdProofServProxy *xps, std::list<XrdProofWorker *> &wrks) = 0;
   …
};
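A scheduler plug-in would implement this interface with some policy. The standalone sketch below mirrors only the shape of `GetWorkers`: `QueryInfo` and `Worker` are hypothetical stand-ins for `XrdProofServProxy` and `XrdProofWorker`, the "least loaded first" policy is an invented example, and the return convention (number of workers, or -1 on error) is an assumption, since the slide does not specify it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-ins for XrdProofServProxy and XrdProofWorker.
struct QueryInfo { std::size_t maxWorkers; };
struct Worker { int id; double load; };

// Same shape as the abstract interface on the slide, with stand-in types.
class SchedSketch {
public:
   virtual ~SchedSketch() = default;
   virtual int GetWorkers(QueryInfo *q, std::list<Worker *> &wrks) = 0;
};

// Example policy (an assumption): pick the least loaded workers, up to the
// number requested by the query.
class LeastLoadedSched : public SchedSketch {
public:
   explicit LeastLoadedSched(std::vector<Worker> &cluster) : fCluster(cluster) {}
   int GetWorkers(QueryInfo *q, std::list<Worker *> &wrks) override {
      if (!q) return -1;
      std::vector<Worker *> sorted;
      for (auto &w : fCluster) sorted.push_back(&w);
      std::sort(sorted.begin(), sorted.end(),
                [](const Worker *a, const Worker *b) { return a->load < b->load; });
      const std::size_t n = std::min(q->maxWorkers, sorted.size());
      wrks.assign(sorted.begin(), sorted.begin() + n);
      return static_cast<int>(n);
   }
private:
   std::vector<Worker> &fCluster;  // cluster status, owned by the caller
};
```

Keeping the policy behind an abstract class is what allows different schedulers, including wrappers around external ones, to be loaded without changing the caller.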

Central scheduling (continued)

[Figure: schematic view. Labels: Client: TProof; Master: TProofPlayer (session), TPacketizer (query), Dataset Lookup; Scheduler; XPD; PLB (olbd).]

Needed ingredients:
  Full exploitation of the OLBD network
  Come&Go functionality for workers
  …

Summary

Several improvements in PROOF since ROOT05
  Coordinator functionality
  Data set manager
  Resource control
ALICE is stress testing the system in LHC environment using a test-CAF at CERN
  a lot of useful feedback
Efforts now concentrated on
  Further consolidation and optimization
  Scheduling
PROOF is steadily improving: getting ready for LHC data

Credits

PROOF team
  M. Ballintijn, B. Bellenot, L. Franco, G.G., J. Iwaszkiewicz, F. Rademakers
J.F. Grosse-Oetringhaus, A. Peters (ALICE)
A. Hanushevsky (SLAC)

Backup

See also presentations at previous ROOT workshops and at CHEPxx

27/03/200727/03/2007 G. Ganis, ROOT Users WorkshopG. Ganis, ROOT Users Workshop 3939

The ROOT data model: Trees & Selectors

[Figure: a Chain of trees holding events 1, 2, … n … last; each event is made of branches and leaves, and only the needed parts are read. The Selector loops over the events: Begin() creates histos, … and defines the output list; Process() runs the preselection and, if OK, the analysis; Terminate() does the final analysis (fitting, …). Results are collected in the output list.]
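The Begin() / Process() / Terminate() cycle named on this slide can be sketched in plain C++. This is only an illustration of the control flow: `SketchSelector`, `Event`, `RunSelector` and the cut on `nTracks` are hypothetical names and choices, not the ROOT `TSelector` interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical event record: each member stands in for one branch.
struct Event {
    double pt;
    int    nTracks;
};

// Hypothetical stand-in for a selector; NOT the ROOT TSelector class.
class SketchSelector {
public:
    // Begin(): create the output objects and register them in the output list.
    void Begin() {
        output_["nSelected"] = 0.0;
        output_["sumPt"] = 0.0;
    }

    // Process(): called once per event; preselection first, analysis only if OK.
    bool Process(const Event& ev) {
        if (ev.nTracks < 2) return false;  // preselection (assumed cut)
        output_["nSelected"] += 1.0;       // analysis
        output_["sumPt"] += ev.pt;
        return true;
    }

    // Terminate(): final analysis on what was accumulated.
    void Terminate() {
        assert(output_.count("nSelected") == 1);  // Begin() must have run
        const double n = output_["nSelected"];
        output_["meanPt"] = (n > 0.0) ? output_["sumPt"] / n : 0.0;
    }

    const std::map<std::string, double>& Output() const { return output_; }

private:
    std::map<std::string, double> output_;  // plays the role of the output list
};

// Event loop over a "chain", represented here by a plain vector of events.
std::map<std::string, double> RunSelector(const std::vector<Event>& chain) {
    SketchSelector sel;
    sel.Begin();
    for (const Event& ev : chain) sel.Process(ev);
    sel.Terminate();
    return sel.Output();
}
```

Calling `RunSelector` on a vector of events returns the filled output list; in the real framework the loop is driven by the tree or chain rather than by user code.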


Motivation for PROOF

Provide an alternative, dynamic approach to end-user HEP analysis on distributed systems

Typical HEP analysis is a continuous refinement cycle
  [Figure: cycle of Implement algorithm, Run over data set, Make improvements]

Data sets are collections of independent events
  Large (e.g. ALICE ESD+AOD: ~350 TB / year)
  Spread over many disks and mass storage systems

Exploiting intrinsic parallelism is the only way to analyze the data in reasonable times


The PROOF approach

[Figure: a PROOF query (data file list, myAna.C) is sent to the MASTER of the PROOF farm; other labelled elements are the catalog, the scheduler, the storage and the files; feedbacks and the final outputs (merged) are returned.]

farm perceived as extension of local PC
same syntax as in local session

more dynamic use of resources
real time feedback
automated splitting and merging
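The "automated splitting and merging" point can be illustrated with a small self-contained sketch: the event range is split among workers, each fills a partial histogram, and the partial results are added bin by bin. The names `Histogram`, `FillPartial`, `Merge` and `SplitAndMerge` are hypothetical, and the static, even split is an assumption made for brevity; it is not how PROOF itself distributes work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Histogram = std::vector<double>;  // bin contents only (hypothetical type)

// Work done by one worker: fill a partial histogram from events [first, last).
Histogram FillPartial(const std::vector<int>& binOfEvent,
                      std::size_t first, std::size_t last, std::size_t nBins) {
    assert(first <= last && last <= binOfEvent.size());
    Histogram h(nBins, 0.0);
    for (std::size_t i = first; i < last; ++i) {
        const int bin = binOfEvent[i];
        if (bin >= 0 && static_cast<std::size_t>(bin) < nBins) h[bin] += 1.0;
    }
    return h;
}

// Work done by the master: add the partial histograms bin by bin.
Histogram Merge(const std::vector<Histogram>& partials, std::size_t nBins) {
    Histogram total(nBins, 0.0);
    for (const Histogram& p : partials) {
        assert(p.size() == nBins);
        for (std::size_t b = 0; b < nBins; ++b) total[b] += p[b];
    }
    return total;
}

// Split the events evenly over nWorkers, fill the partial results, merge them.
Histogram SplitAndMerge(const std::vector<int>& binOfEvent,
                        std::size_t nWorkers, std::size_t nBins) {
    assert(nWorkers > 0);
    const std::size_t n = binOfEvent.size();
    std::vector<Histogram> partials;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t first = n * w / nWorkers;
        const std::size_t last = n * (w + 1) / nWorkers;
        partials.push_back(FillPartial(binOfEvent, first, last, nBins));
    }
    return Merge(partials, nBins);
}
```

Because the events are independent, the merged histogram is the same whatever the number of workers, which is what lets the user keep the same syntax as in a local session.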


PROOF design goals

Transparency
  Minimal impact on the ROOT user habits

Scalability
  Full exploitation of the available resources

Adaptability
  Cope transparently with heterogeneous environments

Preserve real-time interaction and feedback

Intended for
  Central Analysis Facilities
  Departmental workgroup computing facilities (Tier-2's)
  Multi-core / multi-disk desktops


PROOF dynamic load balancing

Pull architecture guarantees scalability

Adapts to variations in performance

[Figure: Master, Worker 1 … Worker N]
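A pull architecture can be sketched as a small deterministic simulation: the master only keeps track of the next unassigned event, and whichever worker becomes idle first pulls the next packet. Everything here is an assumption for illustration (`SimulatePull`, `WorkerStats`, fixed packet size, constant per-worker rates); it is not PROOF's packetizer.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct WorkerStats {
    double rate;       // events per unit time (assumed constant)
    double busyUntil;  // time at which the worker asks for its next packet
    long   processed;  // events handled so far
};

// Returns the number of events each worker ended up processing.
std::vector<long> SimulatePull(long totalEvents, long packetSize,
                               const std::vector<double>& rates) {
    assert(packetSize > 0 && !rates.empty());
    std::vector<WorkerStats> workers;
    for (double r : rates) {
        assert(r > 0.0);
        workers.push_back({r, 0.0, 0});
    }

    long next = 0;  // first unassigned event, held by the master
    while (next < totalEvents) {
        // The worker that becomes idle first is the one that pulls next.
        auto it = std::min_element(
            workers.begin(), workers.end(),
            [](const WorkerStats& a, const WorkerStats& b) {
                return a.busyUntil < b.busyUntil;
            });
        const long n = std::min(packetSize, totalEvents - next);
        next += n;
        it->processed += n;
        it->busyUntil += static_cast<double>(n) / it->rate;
    }

    std::vector<long> result;
    for (const WorkerStats& w : workers) result.push_back(w.processed);
    return result;
}
```

With two workers of rates 1 and 2, the faster one ends up with about twice as many events without the master ever having to know the rates in advance, which is the sense in which the scheme adapts to variations in performance.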


PROOF intrinsic scalability

Strictly concurrent user jobs at CAF (100% CPU used)
  In-memory data
  Dual Xeon, 2.8 GHz

CMS analysis
  1 master, 80 workers
  Dual Xeon 3.2 GHz
  Local data: 1.4 GB / node
  Non-blocking GB Ethernet

[Figure: plot with curves for 1 user, 2 users, 4 users, 8 users]

I. Gonzales, Cantabria


PROOF essentials: what can be done?

Ideally everything made of independent tasks

Currently available:
  Processing of trees
  Processing of independent objects in a file

Tree processing and drawing functionality complete

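LOCAL

// Create a chain of trees
// (assumes CreateMyChain.C defines TChain *CreateMyChain())
root[0] .L CreateMyChain.C
root[1] TChain *c = CreateMyChain();
// MySelec is a TSelector
root[2] c->Process("MySelec.C+");

PROOF

// Create a chain of trees
// (assumes CreateMyChain.C defines TChain *CreateMyChain())
root[0] .L CreateMyChain.C
root[1] TChain *c = CreateMyChain();
// Start PROOF and tell the chain to use it
root[2] TProof::Open("masterURL");
root[3] c->SetProof();
// Process goes via PROOF
root[4] c->Process("MySelec.C+");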


The PROOF target

Short analysis using local resources, e.g.
  end-analysis calculations
  visualization

Long analysis jobs with well defined algorithms (e.g. production of personal trees)

Medium term jobs, e.g. analysis design and development using also non-local resources

Optimize response for short / medium jobs
Perceive medium as short


PROOF: additional remarks

Intrinsic serial overhead small
  requires reasonable connection between a (sub-)master and its workers

Hardware considerations
  IO bound analysis (frequent in HEP) often limited by hard drive access: N small disks are much better than 1 big one
  Good amount of RAM for efficient data caching

Data access is The Issue:
  Optimize for data locality, when possible
  Low-latency access to mass storage


PROOF: data access issues

Low latency in data access is essential for high performance
  Not only a PROOF issue

File opening overhead
  Minimized using asynchronous open techniques

Data retrieval
  caching, pre-fetching of data segments to be analyzed
  Recently introduced in ROOT for TTree

Techniques improving network performance (e.g. InfiniBand) or file access (e.g. memory-based file serving, PetaCache) should be evaluated


PROOF: PAR archive files

Allow client to add software to be used in the analysis

Simple structure
  package/
    Source / binary files
  package/PROOF-INF/BUILD.sh
    How to build the package (makefile)
  package/PROOF-INF/SETUP.C
    How to enable the package (load, dependencies)

A PAR is a gzip'ed tar-ball of the package tree
  Versioning support being added


PROOF essentials: monitoring

Internal
  File access rates, packet latencies, processing time, etc.
  Basic set of histograms available at tunable frequency
  Client temporary output objects can also be retrieved
  Possibility of detailed tree for further analysis

MonALISA-based
  Each host reports CPU, memory, swap, network
  Each worker reports CPU, memory, evt/s, IO vs. network rate
  pcalimonitor.cern.ch:8889

[Figure: network traffic between nodes]


PROOF GUI controller

Allows full on-click control
  define a new session
  submit a query, execute a command
  query editor
    create / pick up a chain
    choose selectors
  online monitoring of feedback histograms
  browse folders with results of query
  retrieve, delete, archive functionality
