F. Fassi, S. Cabrera, R. Vives, S. González de la Hoz,
Á. Fernández, J. Sánchez, L. March, J. Salt, A. Lamas
IFIC-CSIC-UV, Valencia, Spain
Third EELA conference, 4 December 2007, Catania, Italy
Outline:
• ATLAS Grid computing and facilities
• Distributed Analysis model in ATLAS
• GANGA overview
• IFIC Tier-2 infrastructure
  – Resources and services
  – Data transfers
  – Job priority
• A demonstration of using GANGA
• Conclusion
ATLAS Grid computing
• ATLAS computing is based on a hierarchical architecture of tiers: CERN (Tier-0), Tier-1s and Tier-2s
• ATLAS computing operates uniformly on a heterogeneous grid environment based on three grid infrastructures
• The grids have different middleware, replica catalogs and tools to submit jobs
ATLAS facilities
• Event Filter Farm at CERN
  – Located near the experiment; assembles data into a stream to the Tier-0
• Tier-0 at CERN
  – Derives the first-pass calibration within 24 hours
  – Reconstruction of the data, keeping up with data taking
• Tier-1s distributed worldwide (10 centers)
  – Reprocessing of the full data with improved calibrations 2 months after data taking
  – Managed tape access: RAW, ESD
  – Disk access: AOD, fraction of ESD
• Tier-2s distributed worldwide (40+ centers)
  – MC simulation, producing ESD, AOD
  – User physics analysis, disk store (AOD)
• CERN Analysis Facility
  – Primary purpose: calibration
  – Limited access to ESD and RAW
• Tier-3s distributed worldwide
  – Physics analysis
Distributed Analysis model in ATLAS
The Distributed Analysis model is based on the ATLAS computing model:
• Data for analysis will be available, distributed across all Tier-1 and Tier-2 centers
• Tier-2s are open for analysis jobs
• The computing model foresees 50% of grid resources to be allocated for analysis
• User jobs are sent to the data; input datasets are large (100 GB up to several TB)
• Results must be made available to the user (N-tuples or similar)
• Data is annotated with meta-data and bookkeeping in catalogs
The ATLAS strategy is based on making use of all available resources, so the solution must deal with the challenge of heterogeneous grid infrastructures:
• NorduGrid: a backend for ARC submission is integrated
• OSG/Panda: a backend for Panda was recently integrated
• The GANGA front-end supports all ATLAS Grid flavors
The idea behind GANGA
The naive approach to submitting jobs to the Grid involves the following steps:
• Prepare the "Job Description Language" (JDL) file for job configuration
• Find a suitable Athena software application
• Locate the datasets on the different storage elements
• Handle job splitting, monitoring and bookkeeping
GANGA combines several different components, providing a front-end client for interacting with grid infrastructures. It is a user-friendly job definition and management tool that allows simple switching between testing on a local batch system and large-scale data processing on distributed Grid resources, as sketched below.
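As an illustration, a minimal GANGA session could look like the following. This is a sketch in GANGA's Python interface; the Job, Executable, Local and LCG plugin classes are those of GANGA 4.x, though exact attribute names may vary between versions.

```python
# Minimal GANGA job: test locally first, then switch to the Grid
# simply by swapping the backend block of an otherwise identical job.
j = Job()
j.application = Executable(exe='/bin/echo', args=['Hello from GANGA'])
j.backend = Local()          # run on the local machine for testing
j.submit()

j2 = j.copy()                # copy the same job definition...
j2.backend = LCG()           # ...and target the LCG Grid instead
j2.submit()
```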
GANGA features
GANGA is based on a simple but flexible job abstraction: a job is constructed from a set of building blocks, not all of which are required for each job (see the sketch below).
• Support for several applications: generic Executable, ATLAS Athena software, Root
• Support for several back-ends: LCG/gLite Resource Broker, OSG/PANDA, NorduGrid/ARC middleware, batch systems (LSF, PBS, etc.)
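The same job abstraction covers all of these combinations. A sketch, assuming the Root application and LSF batch backend plugin names of GANGA 4.x and a hypothetical ROOT macro:

```python
# The GANGA job abstraction: a Job is a container of pluggable blocks,
# and submission looks the same whatever the application or backend.
j = Job()
j.application = Root()                    # or Executable(), Athena(), ...
j.application.script = File('myMacro.C')  # hypothetical ROOT macro
j.backend = LSF()                         # or LCG(), Panda(), NG() for ARC, PBS, ...
j.submit()                                # identical call for every backend
```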
Resources and services
Equipment (see Santiago González de la Hoz's talk):
• CPU: 132 kSI2k
• Disk: 34 TB
• Tape: a tape robot of 140 TB
Services:
• 2 SRM interfaces, 2 CEs, 4 UIs, 1 BDII, 1 RB, 1 PROXY, 1 MON, 2 GridFTP servers, 2 QUATTOR servers
• QUATTOR installs and configures the resources
Network:
• Connectivity from the site to the network is about 1 Gbps
The facility serves the dual purpose of producing simulated data and supporting data analysis.
[Photos: racks and tape robot]
Data Transfers (I)
• Data management is a crucial aspect of Distributed Analysis
• It is managed by the DDM system, known as DQ2
• Data is being distributed to Tier-1 and Tier-2 centers for analysis, through several exercises organized by the ATLAS collaboration
• IFIC is participating in this effort with the aim to:
  – Have datasets available at the IFIC site for analysis
  – Test the functionality and performance of the data transfer mechanisms
• IFIC's contribution to the data transfer activities is the following:
  – SC4 (Service Challenge 4; October 2006)
  – Functional Tests (August 2007)
  – M4 cosmics run: August 23 – September 3
  – M5 cosmics run: scheduled for October 16-23
Data Transfers (II)
• The datasets exported to IFIC are stored in the Lustre-based Storage Element
• They are made available in a distributed manner through:
  – Registration in the local LFC catalog (illustrated below)
  – Publication throughout the whole grid using the DDM central catalog
• In addition, information on the stored datasets is provided on the IFIC web page: http://ific.uv.es/atlas-t2-es/ific/main.html
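As an illustration of the local catalog step, the standard LCG LFC Python bindings can list the physical replicas behind a logical file name. This is a sketch only: the lfc module and lfc_getreplica call are from the LFC client bindings, while the host name and file path are hypothetical.

```python
import os
import lfc

# Point the client at the site's LFC (hypothetical host name).
os.environ['LFC_HOST'] = 'lfc.ific.uv.es'

# Hypothetical logical file name registered in the local catalog.
lfn = '/grid/atlas/dq2/csc11/some.dataset.file.root'

# lfc_getreplica returns a status code and the list of replicas.
rc, replicas = lfc.lfc_getreplica(lfn, '', '')
if rc == 0:
    for r in replicas:
        print(r.sfn)  # storage file name (physical location) of each replica
```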
Job Priority
• Analysis jobs run in parallel with the production jobs
• A mechanism is needed to steer the resource consumption of the ATLAS community: job priorities
• Objective: allow enforcement of job priorities based on VOMS groups/roles, using the Priorities Working Group schema
• Development and deployment done at IFIC: local mappings for groups/roles and fair share (FS) are defined as follows:
  – atlas: all ATLAS VO users, 50% FS
  – atlb: /atlas/Role=production, 50% FS (ATLAS production activity)
  – atlc: /atlas/Role=software, no FS (but higher priority, only 1 job at a time)
  – atld: /atlas/Role=lcgadmin, no FS (but higher priority, only 1 job at a time)
Introduction
Objective:
• Test the IFIC Tier-2 analysis facility
• Produce Top N-tuples from a large ttbar dataset (6M events, ~217 GB)
• Benefit: enables top-quark analysis studies
Requirements:
• Fast and easy large-scale production in a grid environment
• Runs everywhere, using ATLAS resources
• Easy user interface that hides the grid infrastructure
Our setup (used in the sketch below):
• GANGA version 4.4.2
• Athena 12.0.6, TopView-00-12-13-03, EventView Group Area 12.0.6.8
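With this setup, the production can be driven from GANGA's Python interface. The following is a hedged reconstruction rather than the authors' actual script: the class names follow the GANGA 4 ATLAS plugin, while the job options file, dataset name and sub-job count are placeholders.

```python
# Sketch of the Top N-tuple production job (GANGA 4.4.2, Athena 12.0.6).
j = Job()
j.application = Athena()
j.application.atlas_release = '12.0.6'              # Athena release used in the demo
j.application.option_file = 'TopViewOptions.py'     # placeholder TopView job options
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'the.6M.event.ttbar.dataset'  # placeholder name (2383 files)
j.splitter = AthenaSplitterJob(numsubjobs=100)      # split the work into sub-jobs
j.outputdata = DQ2OutputDataset()                   # register N-tuples back in DQ2
j.backend = LCG()                                   # submit via the LCG Resource Broker
j.submit()
```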
Observations and issues
Before sending jobs to the Grid, some operations were performed:
• Find out where the dataset is complete (this dataset has 2383 files)
• Make sure that the selected site is a good one
• Jobs are then sent correctly to the selected sites (good ones with a complete replica)
General issues:
• In general, jobs fail even on good sites; re-submissions are necessary until they succeed (see the sketch below)
• GANGA submission failures due to missing CE-SE correspondence
• Jobs often fail because the site on which they are executed is not properly configured
• Speed issue in submitting sub-jobs using the LCG RB; addressed by gLite WMS bulk submission
• At some sites jobs end up in a long queue (job priority missing); currently the only solution is to kill the jobs and submit them again
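A typical recovery pattern from the GANGA shell is to loop over the failed sub-jobs and resubmit only those. A minimal sketch, assuming the job/sub-job API of GANGA 4.x and that j is the job object from the submission above:

```python
# Re-submit only the failed sub-jobs of the production job until
# the whole N-tuple production succeeds.
for sj in j.subjobs:
    if sj.status == 'failed':
        sj.resubmit()  # retry just this sub-job
```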
Performance
General:
• Jobs at IFIC finish within 1-2 hours; execution is fast, about 1 hour to run 1M events
• Some jobs also ran successfully at other sites (Lyon, FZK)
• Very high efficiency when running at sites where the dataset is available and there is no site configuration problem
Results
The recombined output N-tuples were analyzed with the Root framework to reconstruct the top quark mass from the hadronic decay.
Conclusion
• Experience in configuring and deploying the IFIC site has been presented
• Lessons learned from using GANGA:
  – GANGA is a lightweight, easy grid job submission tool
  – GANGA does a great job of configuring and splitting jobs and scheduling input and output files
  – Distributed Analysis with GANGA depends strongly on the data distribution and on the quality of site configuration
  – Speed of submission was a major issue with the LCG RB; the gLite WMS, with its bulk submission feature, needs to be deployed