F. Fassi, S. Cabrera, R. Vives, S. González de la Hoz,
Á. Fernández, J. Sánchez, L. March, J. Salt, A. Lamas
IFIC-CSIC-UV, Valencia, Spain
Third EELA conference, 4 December 2007, Catania, Italy
Outline:
• ATLAS Grid computing and facilities
• Distributed Analysis model in ATLAS
• GANGA overview
• IFIC Tier-2 infrastructure
  – Resources and services
  – Data transfers
  – Job priority
• A demonstration of using GANGA
• Conclusion
ATLAS Grid computing
• ATLAS computing is based on a hierarchical architecture of tiers: CERN (Tier-0), Tier-1s and Tier-2s
• ATLAS computing operates uniformly on a heterogeneous grid environment based on three grid infrastructures
• The grids have different middleware, replica catalogs and tools to submit jobs
ATLAS facilities
• Event Filter Farm at CERN
  – Located near the experiment; assembles data into a stream to the Tier-0
• Tier-0 at CERN
  – Derives the first-pass calibration within 24 hours
  – Reconstruction of the data, keeping up with data taking
• Tier-1s distributed worldwide (10 centers)
  – Reprocessing of the full data with improved calibrations 2 months after data taking
  – Managed tape access: RAW, ESD
  – Disk access: AOD, fraction of ESD
• Tier-2s distributed worldwide (40+ centers)
  – MC simulation, producing ESD, AOD
  – User physics analysis, disk store (AOD)
• CERN Analysis Facility
  – Primary purpose: calibration
  – Limited access to ESD and RAW
• Tier-3s distributed worldwide
  – Physics analysis
Distributed Analysis model in ATLAS
The Distributed Analysis model is based on the ATLAS computing model:
• Data for analysis will be available, distributed across all Tier-1 and Tier-2 centers
• Tier-2s are open for analysis jobs
• The computing model foresees 50% of grid resources to be allocated for analysis
• User jobs are sent to the data; input datasets are large (100 GB up to several TB)
• Results must be made available to the user (N-tuples or similar)
• Data is annotated with meta-data and bookkeeping in catalogs
The ATLAS strategy is based on making use of all available resources, so the solution must deal with the challenge of heterogeneous grid infrastructures:
• NorduGrid: a backend for ARC submission is integrated
• OSG/Panda: a backend for Panda was recently integrated
• The GANGA front-end supports all ATLAS Grid flavors
The idea behind GANGA
The naive approach to submitting jobs to the Grid involves the following steps:
• Prepare the "Job Description Language" (JDL) file for job configuration
• Find a suitable Athena software application
• Locate the datasets on the different storage elements
• Handle job splitting, monitoring and bookkeeping
GANGA combines several different components, providing a front-end client for interacting with grid infrastructures. It is a user-friendly job definition and management tool that allows simple switching between testing on a local batch system and large-scale data processing on distributed Grid resources, as sketched below.
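As an illustration, a minimal GANGA session could look like the following. This is a sketch in GANGA's Python interface; the Job, Executable, Local and LCG plugin classes are those of GANGA 4.x, though exact attribute names may vary between versions.

```python
# Minimal GANGA job: test locally first, then switch to the Grid
# simply by swapping the backend block of an otherwise identical job.
j = Job()
j.application = Executable(exe='/bin/echo', args=['Hello from GANGA'])
j.backend = Local()          # run on the local machine for testing
j.submit()

j2 = j.copy()                # copy the same job definition...
j2.backend = LCG()           # ...and target the LCG Grid instead
j2.submit()
```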
GANGA features
GANGA is based on a simple but flexible job abstraction: a job is constructed from a set of building blocks, not all of which are required for each job (see the sketch below).
• Support for several applications: generic Executable, ATLAS Athena software, Root
• Support for several back-ends: LCG/gLite Resource Broker, OSG/PANDA, NorduGrid/ARC middleware, batch systems (LSF, PBS, etc.)
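The same job abstraction covers all of these combinations. A sketch, assuming the Root application and LSF batch backend plugin names of GANGA 4.x and a hypothetical ROOT macro:

```python
# The GANGA job abstraction: a Job is a container of pluggable blocks,
# and submission looks the same whatever the application or backend.
j = Job()
j.application = Root()                    # or Executable(), Athena(), ...
j.application.script = File('myMacro.C')  # hypothetical ROOT macro
j.backend = LSF()                         # or LCG(), Panda(), NG() for ARC, PBS, ...
j.submit()                                # identical call for every backend
```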
Resources and services
Equipment (see Santiago González de la Hoz's talk):
• CPU: 132 kSI2k
• Disk: 34 TB
• Tape: a tape robot of 140 TB
Services:
• 2 SRM interfaces, 2 CEs, 4 UIs, 1 BDII, 1 RB, 1 PROXY, 1 MON, 2 GridFTP servers, 2 QUATTOR servers
• QUATTOR installs and configures the resources
Network:
• Connectivity from the site to the network is about 1 Gbps
The facility serves the dual purpose of producing simulated data and supporting data analysis.
[Photos: racks and tape robot]
Data Transfers (I)
• Data management is a crucial aspect of Distributed Analysis
• It is managed by the DDM system, known as DQ2
• Data is being distributed to Tier-1 and Tier-2 centers for analysis, through several exercises organized by the ATLAS collaboration
• IFIC is participating in this effort with the aim to:
  – Have datasets available at the IFIC site for analysis
  – Test the functionality and performance of the data transfer mechanisms
• IFIC's contribution to the data transfer activities is the following:
  – SC4 (Service Challenge 4; October 2006)
  – Functional Tests (August 2007)
  – M4 cosmics run: August 23 – September 3
  – M5 cosmics run: scheduled for October 16-23
Data Transfers (II)
• The datasets exported to IFIC are stored in the Lustre-based Storage Element
• They are made available in a distributed manner through:
  – Registration in the local LFC catalog (illustrated below)
  – Publication throughout the whole grid using the DDM central catalog
• In addition, information on the stored datasets is provided on the IFIC web page: http://ific.uv.es/atlas-t2-es/ific/main.html
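As an illustration of the local catalog step, the standard LCG LFC Python bindings can list the physical replicas behind a logical file name. This is a sketch only: the lfc module and lfc_getreplica call are from the LFC client bindings, while the host name and file path are hypothetical.

```python
import os
import lfc

# Point the client at the site's LFC (hypothetical host name).
os.environ['LFC_HOST'] = 'lfc.ific.uv.es'

# Hypothetical logical file name registered in the local catalog.
lfn = '/grid/atlas/dq2/csc11/some.dataset.file.root'

# lfc_getreplica returns a status code and the list of replicas.
rc, replicas = lfc.lfc_getreplica(lfn, '', '')
if rc == 0:
    for r in replicas:
        print(r.sfn)  # storage file name (physical location) of each replica
```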
Job Priority
• Analysis jobs run in parallel with the production jobs
• A mechanism is needed to steer the resource consumption of the ATLAS community: job priorities
• Objective: allow enforcement of job priorities based on VOMS groups/roles, using the Priorities Working Group schema
• Development and deployment done at IFIC: local mappings for groups/roles and fair share (FS) are defined as follows:
  – atlas: all ATLAS VO users, 50% FS
  – atlb: /atlas/Role=production, 50% FS (ATLAS production activity)
  – atlc: /atlas/Role=software, no FS (but higher priority, only 1 job at a time)
  – atld: /atlas/Role=lcgadmin, no FS (but higher priority, only 1 job at a time)
Introduction
Objective:
• Test the IFIC Tier-2 analysis facility
• Produce Top N-tuples from a large ttbar dataset (6M events, ~217 GB)
• Benefit: enables top-quark analysis studies
Requirements:
• Fast and easy large-scale production in a grid environment
• Runs everywhere, using ATLAS resources
• Easy user interface that hides the grid infrastructure
Our setup (used in the sketch below):
• GANGA version 4.4.2
• Athena 12.0.6, TopView-00-12-13-03, EventView Group Area 12.0.6.8
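With this setup, the production can be driven from GANGA's Python interface. The following is a hedged reconstruction rather than the authors' actual script: the class names follow the GANGA 4 ATLAS plugin, while the job options file, dataset name and sub-job count are placeholders.

```python
# Sketch of the Top N-tuple production job (GANGA 4.4.2, Athena 12.0.6).
j = Job()
j.application = Athena()
j.application.atlas_release = '12.0.6'              # Athena release used in the demo
j.application.option_file = 'TopViewOptions.py'     # placeholder TopView job options
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'the.6M.event.ttbar.dataset'  # placeholder name (2383 files)
j.splitter = AthenaSplitterJob(numsubjobs=100)      # split the work into sub-jobs
j.outputdata = DQ2OutputDataset()                   # register N-tuples back in DQ2
j.backend = LCG()                                   # submit via the LCG Resource Broker
j.submit()
```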
Observations and issues
Before sending jobs to the Grid, some operations were performed:
• Find out where the dataset is complete (this dataset has 2383 files)
• Make sure that the selected site is a good one
• Jobs are then sent correctly to the selected sites (good ones with a complete replica)
General issues:
• In general, jobs fail even on good sites; re-submissions are necessary until they succeed (see the sketch below)
• GANGA submission failures due to missing CE-SE correspondence
• Jobs often fail because the site on which they are executed is not properly configured
• Speed issue in submitting sub-jobs using the LCG RB; addressed by gLite WMS bulk submission
• At some sites jobs end up in a long queue (job priority missing); currently the only solution is to kill the jobs and submit them again
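A typical recovery pattern from the GANGA shell is to loop over the failed sub-jobs and resubmit only those. A minimal sketch, assuming the job/sub-job API of GANGA 4.x and that j is the job object from the submission above:

```python
# Re-submit only the failed sub-jobs of the production job until
# the whole N-tuple production succeeds.
for sj in j.subjobs:
    if sj.status == 'failed':
        sj.resubmit()  # retry just this sub-job
```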
Performance
General:
• Jobs at IFIC finish within 1-2 hours; execution is fast, about 1 hour to run 1M events
• Some jobs also ran successfully at other sites (Lyon, FZK)
• Very high efficiency when running at sites where the dataset is available and there is no site configuration problem
Results
The recombined output N-tuples were analyzed with the Root framework to reconstruct the top quark mass from the hadronic decay.
Conclusion
• Experience in configuring and deploying the IFIC site has been presented
• Lessons learned from using GANGA:
  – GANGA is a lightweight, easy grid job submission tool
  – GANGA does a great job of configuring and splitting jobs and scheduling input and output files
  – Distributed Analysis with GANGA depends strongly on the data distribution and on the quality of site configuration
  – Speed of submission was a major issue with the LCG RB; the gLite WMS, with its bulk submission feature, needs to be deployed