K.Harrison and A.Soroko
Cosener’s House, Abingdon, UK
22 May 2002
Framework-Grid interfaces: technical survey
– Need for Framework-Grid interfaces
– Outline of required functionality
– Tools for software installation and configuration
– Production tools
– Grid interfaces currently under development
– Conclusions
Aim to give general background, and a brief overview of software products relevant to a Framework-Grid interface for ATLAS and LHCb. Many items are covered in more detail in later presentations.
Need for Framework-Grid interfaces
– Resources for Grid activities are becoming available in increasing numbers. We want to take advantage of these resources as early as possible
– Take Cambridge as an example:
• For Grid activities, we now have:
» 32 × 400 MHz Pentium II processors
» 20 × 1.13 GHz Pentium III processors
» About 0.5 Tbyte of disk space
» Globus 2.0 installed
• In the near future, we will:
» add a 2 Tbyte file server
» install EDG middleware
» connect to the UK eScience Grid and the EU Testbed
– Physics at Cambridge using Grid resources:
• ATLAS: already submitting ATLFAST simulation jobs; plan to participate in data challenges
• LHCb: participating in data challenges (initially non-Grid, later with Grid)
• NA48: preparing to simulate 10^8 events for evaluation of backgrounds in rare kaon decays (300 days of CPU time)
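As a rough scale check on the NA48 figure (assuming the quoted 300 days refer to a single processor comparable to those listed above, and ignoring I/O and scheduling overheads), the request corresponds to about a quarter of a CPU-second per event, and to almost a week of wall-clock time even with all 52 Cambridge processors busy:

```latex
t_{\text{event}} = \frac{300 \times 86\,400\ \text{s}}{10^{8}} \approx 0.26\ \text{s},
\qquad
t_{\text{wall}} \approx \frac{300\ \text{days}}{32 + 20} \approx 5.8\ \text{days}
```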
To fully exploit the possibilities for physics studies, we need a tool that simplifies Grid access and job configuration: a Framework-Grid interface
– First ideas for a Grid interface with built-in knowledge of the Gaudi/Athena framework used by ATLAS and LHCb were developed in summer 2001, in particular by P.Mato and C.Tull: Gaudi/Athena and Grid Alliance (GANGA)
– GANGA might eventually be:
• a completely new Grid interface, or
• an adaptation/evolution of an existing Grid interface
– In all cases, expect GANGA to be modular and to make use of tools/services developed by others
This workshop should help us understand how to proceed
Outline of required functionality
– A Framework-Grid interface for ATLAS and LHCb will need to provide access to services that can be logically divided into two categories
– Grid services are developed in the context of many groups and work packages:
• Security services
• Job submission
• Job decomposition
• Resource allocation and management
• Data replication and cataloguing
• Application-independent monitoring
We would hope to use these as they are (assuming no further development is needed)
– Framework-related services (specific to ATLAS and LHCb) will need to be developed in parallel with the interface implementation:
• Job configuration (algorithms to run, properties, input/output requests)
• Management of software environment (executables, libraries, databases, etc)
• Automatic creation of job-description files
• Error recovery
• Application-specific monitoring
• Bookkeeping
Tools for software installation and configuration
– In general, Grid resources will not be dedicated to a single experiment: a site might run jobs for ATLAS one day, CDF the next and LHCb the day after. The Framework-Grid interface will need access to a tool for setting up the user’s software environment
– Tools of interest include:
• LCFG: developed in the context of EU WP4, based on rpm files
• DAR: developed at FNAL, based on tarballs
• pacman: developed at Boston University; fetches, installs and manages packages based on rpm files or tarballs, and makes use of a software cache (see presentation by S.Youssef)
Production tools
– Production tools already in use can provide ideas for implementing some of the services to be offered by a Framework-Grid interface
– As an example, consider Simulation for LHCb and its Integrated Control Environment (SLICE); see presentation of G.Kuznetsov
– Working in a non-Grid environment:
• Production requests to distributed facilities are submitted via a web page
• Java servlets create job scripts and options files
• Production is monitored using a control system based on PVSS
• Update of the bookkeeping database, transfer of output data to mass storage and quality checks are performed automatically
– Grid-based system at experimental stage
• Physicist: request for production (nr of events, channel, datatype (implies a workflow), configuration, deadline for completion)
• Physics coordinator: ratifies the production request, which gets added as an outstanding request to the database
• Job creation/submission (via Web): identify outstanding requests, select workflow(s), give nr of events, create scripts
• Monitoring (via PVSS): submit jobs to distributed sites; see what jobs are running, how many, channel, datatype, site, current event nr, configuration used by job, submit time; kill jobs
• Bookkeeping database
• Production manager:
– Create required nr of jobs (500 evts each)
– Determine configuration
– Determine/create runtime environment
– Run executable
– Check data
– Copy data/logs
– Flag production as completed, prepare updating of bookkeeping db
Servlet      Purpose
Maprunmc     sicbmc for rawh production
Brunelrun    Brunel for DST production
Bbinclrun    sicbmc + sicbdst for physics production
Mcbrunel     sicbmc v249 + Brunel v9r1 for data challenge tests, dbase v243r1p1, v243r3
LHCb production strategy using SLICE
(From E.vanHerwijnen)
[Diagram: SLICE production steps: update bookkeeping database; transfer data to mass store; data quality check; submit jobs remotely; monitor performance of farm via Web; execute on farm]
(From E.vanHerwijnen)
Grid interfaces currently under development
– Middleware (Globus, EDG, PPDG, other) provides an interface to Grid services via command-line instructions given in a particular sequence
– More user-friendly interfaces are being developed by several groups:
• Alice Environment (AliEn); see also presentations of P.Buncic and L.Goosens
• EDG GUI; see also presentation of D.Colling
• Grid Enabled Web Environment for Site-Independent User Job Submission (GENIUS)
• Grid Access Portal for Physics Applications (Grappa); see also presentation of C.Tull
• Others?
AliEn
• General characteristics of AliEn:
– Under development by Alice Offline Group, but not specific to Alice
– Uses iVDGL or EDG middleware, Globus toolkit, and a variety of external modules (SOAP, PAM, SWIG, etc)
– Based on Perl
– User access via a machine on which AliEn is installed:
• Command-line interface allows authentication, access to the distributed catalogue, job submission, etc
• With the appropriate module installed, a GUI is also available
– Web interface is under development
Functionality of AliEn (I)
• File Catalogue:
– To access the catalogue, the user types: alien
– To authenticate to the server, the user must have either a Globus certificate or ssh keys
– The user can browse the catalogue using UNIX-like commands
– Catalogue entries seen by the user are Logical File Names (LFN)
– Each user has a home directory, and can register files by giving LFN, PFN, and size
Functionality of AliEn (II)
• Getting a file (from local SE)
(From P.Saiz)
[Diagram: components Client, Proxy, Authen and SE (the SE at the site of the client). The client asks to get an LFN; the LFN query returns the PFN and SE; the PFN query to the SE returns the file]
Functionality of AliEn (III)
• Job submission:
– Jobs may be executed on any AliEn cluster
– Output is accessible through the AliEn catalogue
– alien StartMonitor starts a daemon that forwards job requests to a central server
– alien login gives the user the AliEn prompt, which allows access to the AliEn catalogue and provides commands to submit jobs
– The user gives the job description using ClassAds (name of the executable, possible arguments or input files, extra requirements for the job, etc)
Functionality of AliEn (IV)
• Submitting jobs
(From P.Saiz)
[Diagram: components Client, IS, Proxy, Authen, CPUServer and ClusterMonitor (two instances); numbered steps 1 to 4, labelled "submit" and "(4) Registering stdin"]
Functionality of AliEn (V)
• Executing a job
[Diagram: IS, Proxy and CPUServer (one per organization); ClusterMonitor, CE and ProcessMonitor (one per element); numbered steps 1 to 3]
Possible local queues:
• LSF
• PBS
• BQS
• Globus
• CONDOR
• DQS
(From P.Saiz)
AliEn GUI
• AliEn xfiles
– alien xfiles creates a window for browsing the catalogue
AliEn C API
• AliEn C API will provide a C++ (ROOT) binding
– Proposed types

typedef unsigned long Alien_t;   // opaque handle to Alien connection
                                 // associated struct contains ALIEN connection state

typedef struct AlienResultStruct {
    char **results;      // array of result strings
    int    result_count; // number of results
    int    current;      // current result
} AlienResult_t;

typedef struct AlienAttrStruct {
    char **attribute;    // array of attribute names
    char **values;       // array of attribute values
    int    atrr_count;   // number of attribute pairs
    int    current;      // current attribute
} AlienAttr_t;
AliEn C API
– Some function declarations

// Connect to ALIEN server. Return handle to ALIEN instance, 0 in case of failure.
Alien_t AlienConnect(const char *alien_server, const char *user, const char *passwd);

// Close connection to ALIEN server. Returns -1 in case of error.
int AlienClose(Alien_t srv);

// Return ALIEN version string.
const char *AlienGetInfo(Alien_t srv);

// Add physical file to catalog and associate logical file name. Returns -1 on error,
// like lfn,pfn already exists, illegal handle, etc.
int AlienAddFile(Alien_t srv, const char *lfn, const char *pfn);

// Delete lfn and associated pfn's. Returns -1 on error, like illegal handle,
// lfn not existing, no perm, etc.
int AlienDeleteFile(Alien_t srv, const char *lfn);
EDG GUI for Job Submission
GENIUS
• GENIUS general characteristics:
– Under development by NICE s.r.l. (Italy) and INFN
– Uses EDG middleware, the Globus toolkit and the EnginFrame framework of NICE s.r.l.
– Based on Java and XML; the XML is translated by EnginFrame into HTML, WML, PDF and enriched XML
– Unix/NT integration makes extensive use of the available Internet standards (HTML, HTTP, Java, XML, etc.)
– User must obtain an account on an interface machine where GENIUS is installed and upload their Globus certificate
– Testbed access is provided via web page from anywhere (desktop, laptop, PDA, WAP telephone, etc)
GENIUS
• GENIUS modules:
– Service: XML representations of computing-related facilities
– Client Tier: any browser and its extensions; the layer with which users interact
– Server Tier: one or more servlet-enabled web servers, providing content and services to the clients, and controlling resource activities in the back-end
– Resource Tier: where a number of "Agents" control the actual computing resources (clusters, stand-alone hosts, etc) and provide correctly formatted results to the servers
– Plug-ins: developed for the Resource Tier: LSF, AFS, Nfuse, Globus and DataGrid
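GENIUS
• GENIUS modules: the EnginFrame workflow
GENIUS
• GENIUS architecture:
[Diagram: web browser; local web server (Local WS) running Apache, EnginFrame and GENIUS on the EDG UI; the Grid. Links: https + java/xml + rfb between browser and GENIUS; EDG + GSI between the EDG UI and the Grid]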
(From R.Barbera)
GENIUS is built on top of the already existing DataGrid command-line interface
GENIUS functionality (I)
• GENIUS services:
– File Services
– Security Services
– Job Services
– Information Services
– Monitoring Services
– Interactive Services (Virtual Network Computing package)
– VO services
– Statistics
GENIUS functionality (II)
– File Services:
• Create a File
• View a File
• Edit a File
• Rename a File/Directory
• Delete a File/Directory
• Create a Directory
• Upload a File
• Show the Environment
GENIUS functionality (III)
– Security Services:
• Upload Your Certificate
– Upload .globus Tar ball
– Upload Your .p12 Certificate
• Information on proxy
• Renew proxy
• Change GENIUS Password
• Change X.509 PEM phrase
GENIUS functionality (IV)
– Job Services:
• Single Job
– Job Submission
» The user has to provide the JDL file
» Select one of the possible Computing Elements
» Press the button “Submit job”
– Job Queue
» Job identifier, JDL file, time, Computing Element, present status, possible action
– Job Output (the user has to press the button “Get Output”)
– Job Data (the user can inspect their personal spooler area)
– Clean Job Queues
• List Available Resources
GENIUS functionality (V)
– Job Services:
• Job Submission
GENIUS functionality (VI)
– Job Services:
• List Available Resources
GENIUS functionality (VII)
– Information Services:
• Sites belonging to the testbed
• Computing Elements present at each site, with information on the local resource manager
• Storage Elements present at each site, with their connection port, size and mount point
Grappa
• General characteristics of Grappa:
– Under development in context of Grid Physics Network (GriPhyN) Project and ATLAS
– Prototype based on XCAT Science Portal
– Allows user to submit jobs to US-ATLAS testbed resources
– Provides file staging, remote job-option file editing, basic monitoring
– Provides a set of tools for collaborative data analysis
– Packaged with pacman
Grappa
• Grappa current architecture:
[Diagram: Athena Notebook; XCAT Science Portal; Tomcat Server]
(From R.Gardner)
Grappa
• XCAT architecture (From S.Smallen):
[Diagram: the user’s web browser connects to the portal web server (Tomcat server + Java servlets), which contains a Jython interpreter, a notebook database and GSI authentication, and accesses the Grid]
• Jython provides access to Java classes:
– Globus Java CoG kit
– XCAT
– XMESSAGES
Grappa functionality
• Provided via Athena Active Notebook
Users can:
– Submit Athena jobs to the Grid
– Manage resources
– Submit a sequence of jobOptions files to the Grid
– Monitor status of running jobs
Conclusions
– Framework-Grid interfaces will be of immediate use for physics studies
– Various tools and services relevant to a Framework-Grid interface are already available
– User-friendly (GUI-based) Grid interfaces are being developed by several groups
Workshop should help us understand how to proceed with development of Framework-Grid interface for ATLAS and LHCb