
From DØ To ATLAS
Jae Yu
ATLAS Grid Test-Bed Workshop, Apr. 4-6, 2002, UTA

• Introduction
• DØ-Grid & DØRACE
• DØ Progress
• UTA DØGrid Activities
• Conclusions


Introduction
• DØ has been taking data since Mar. 1, 2001
  – Accumulated over 20 pb⁻¹ of collider data
  – Current maximum output rate is about 20 Hz, averaging 10 Hz
  – This will improve to an average rate of 25-30 Hz with a 50% duty factor, eventually reaching about 75 Hz in Run IIb
  – The resulting Run IIa (2-year) data sample is about 400 TB including reconstruction (1×10⁹ events)
  – Run IIb will follow (4-year run); data will increase to 3-5 PB (~10¹⁰ events)
• DØ institutions are scattered over 19 countries
  – About 40% European collaborators → natural demand for remote analysis
• The data size poses serious issues
  – Time taken for re-processing of the data (10 sec/event with 40 SpecInt95; see the estimate below)
  – Ease of access through the given network bandwidth
• DØ management recognized the issue and, in Nov. 2001,
  – Created a position for remote analysis coordination
  – Restructured the DØ computing team to include a D0Grid team (3 FTEs)
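For scale, a back-of-the-envelope estimate using only the numbers above (the per-event size and the CPU-year conversion are simple arithmetic, not figures from the talk):

```latex
\frac{400~\mathrm{TB}}{10^{9}~\text{events}} \approx 0.4~\mathrm{MB/event},\qquad
10^{9}~\text{events}\times 10~\mathrm{s/event} = 10^{10}~\mathrm{s} \approx 300~\text{CPU-years}
```

on a single 40-SpecInt95 processor, i.e. roughly a year of turnaround even on a farm of a few hundred such nodes, which is why re-processing time is flagged as a serious issue.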


DØRACE
• DØ Remote Analysis Coordination Efforts
• In existence to accomplish:
  – Setting up and maintaining the remote analysis environment
  – Promoting institutional contributions made remotely
  – Allowing remote institutions to participate in data analysis
  – Preparing for the future of data analysis
    • More efficient and faster delivery of multi-PB data
    • More efficient sharing of processing resources
    • Preparation for possible massive re-processing and MC production to expedite the process
    • Expeditious physics analyses
  – Maintaining self-sustained support among the remote institutions to build a broader base of knowledge
  – Addressing the sociological issues of HEP people at the home institutions and within the field
• Integrate the various remote analysis efforts into one working piece
• The primary goal is to allow individual desktop users to make significant contributions without being at the lab


From a Survey
• Difficulties
  – Having a hard time setting up initially
    • Lack of updated documentation
    • Rather complicated setup procedure
    • Lack of experience → no forum to share experiences
  – OS version differences (RH 6.2 vs. 7.1), let alone different OSes
  – Most of the established sites have an easier time updating releases
  – Network problems affecting successful completion of large releases; a 4 GB release takes a couple of hours (SA) (see the rough estimate below)
  – No specific responsible persons to ask questions
  – Availability of all necessary software via UPS/UPD
  – Time differences between continents affecting efficiency
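For scale, the effective transfer rate implied by that figure (simple arithmetic, assuming roughly two hours for the full 4 GB release):

```latex
\frac{4~\mathrm{GB}}{2~\mathrm{h}} \approx \frac{4000~\mathrm{MB}}{7200~\mathrm{s}} \approx 0.6~\mathrm{MB/s} \approx 4\text{--}5~\mathrm{Mb/s}
```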


Progress in DØGrid and DØRACE?
• Documentation on the DØRACE home page (http://www-hep.uta.edu/~d0race)
  – To ease the barrier posed by the difficulties of the initial setup
  – Updated and simplified setup instructions available on the web → many institutions have participated in refining the instructions
  – Tools for DØ software download and installation made available
• Release-ready notification system activated
  – Success is defined by institutions depending on what they do
• Build error log and dependency-tree utility available
• Change of release procedure to alleviate unstable network dependencies
• SAM (Sequential Access via Metadata) stations set up for data access → 15 institutions transfer files back and forth
• Script for automatic download, installation, component verification, and reinstallation in preparation (see the sketch below)
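The slide only says such a script is in preparation; the outline below is a minimal sketch of what the download → install → verify → reinstall loop might look like. The three commands it invokes (`download_release`, `install_release`, `verify_release`) are hypothetical placeholders, not actual DØ or UPS/UPD tools; only the control flow illustrates the idea.

```python
#!/usr/bin/env python
# Sketch of an automatic download / install / verify / reinstall loop.
# The three commands invoked here are HYPOTHETICAL placeholders, not actual
# DØ or UPS/UPD tools; only the retry logic illustrates the idea on the slide.
import subprocess
import sys

MAX_ATTEMPTS = 3  # retry a flaky network transfer a few times before giving up

def run(cmd):
    """Run a command and return True on success."""
    print("running:", " ".join(cmd))
    return subprocess.call(cmd) == 0

def install_release(version):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # 1) download the release (placeholder command)
        if not run(["download_release", version]):
            print("download failed, attempt", attempt)
            continue
        # 2) unpack and install it (placeholder command)
        if not run(["install_release", version]):
            print("installation failed, attempt", attempt)
            continue
        # 3) verify each component; loop again (reinstall) if anything is missing
        if run(["verify_release", version]):
            print("release", version, "installed and verified")
            return True
        print("verification failed, reinstalling, attempt", attempt)
    return False

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: install_d0_release.py <release-version>")
    sys.exit(0 if install_release(sys.argv[1]) else 1)
```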


DØRACE Strategy
• Categorized the remote analysis system setup by functionality
  – Desktop only
  – A modest analysis server
  – Linux installation
  – UPS/UPD installation and deployment
  – External package installation via UPS/UPD
    • CERNLIB
    • KAI-lib
    • Root
  – Download and install a DØ release
    • Tar-ball for ease of initial setup?
    • Use of existing utilities for latest-release download
  – Installation of cvs
  – Code development
  – KAI C++ compiler
  – SAM station setup
• Setup phases:
  – Phase 0: Preparation
  – Phase I: Rootuple analysis
  – Phase II: Executables
  – Phase III: Code development
  – Phase IV: Data delivery


DØRACE Status by Setup Phases

[Bar chart: number of institutions at each setup stage (No Interest, Phase 0, Phase I, Phase II, Phase III, Phase IV), compared across three series (Nov. survey, before 2/11, after 2/11), showing progressive movement of institutions toward the later phases.]


Where are we?
• DØRACE is entering the next stage
  – Compilation and running
  – Active code development
  – Propagation of the setup to all institutions
• Instructions seem to be taking shape well
  – Need to maintain them and keep them up to date
  – Support to help with the problems people encounter
  – 35 institutions (about 50%) are ready for code development
  – 15 institutions are set up for data transfer
• DØGrid TestBed formed at the Feb. workshop
  – Some 10 institutions participating
  – UTA's own Mark Sosebee is taking charge
• Globus job submission testing in progress → UTA participates in LSF testing
• DØ has SAM in place for data delivery and cataloguing services → SAM is deep in the DØ analysis fabric, so the D0Grid must be established on SAM
• We will also need to establish regional analysis centers


Proposed DØRAM Architecture

[Diagram: a tiered architecture with the Central Analysis Center (CAC) at the top; Regional Analysis Centers (RACs) below it, each storing and processing 10~20% of all data; Institutional Analysis Centers (IACs) under each RAC; and Desktop Analysis Stations (DAS) at the bottom. Normal and occasional interaction/communication paths connect the tiers.]


Regional Analysis Centers
• A few geographically selected sites that satisfy the requirements
• Provide almost the same level of service as FNAL to a few institutional analysis centers
• Analyses carried out within the regional center
  – Store 10~20% of a statistically random data sample permanently
  – Most of the analyses performed on these samples within the regional network
  – Refine the analyses using the smaller but unbiased data set
  – When the entire data set is needed → the underlying Grid architecture provides access to the remaining data


Regional Analysis Center Requirements
• Become a mini-CAC
• Sufficient computing infrastructure
  – Large bandwidth (gigabit or better)
  – Sufficient storage space to hold 10~20% of the data permanently, expandable to accommodate the data increase (see the estimate below)
    • >30 TB just for Run IIa RAW data
  – Sufficient CPU resources to serve regional or institutional analysis requests and reprocessing
• Geographically located to avoid unnecessary network traffic overlap
• Software distribution and support
  – Mirror copy of the CVS database for synchronized updates between the RACs and the CAC
  – Keep the relevant copies of the databases
  – Act as a SAM service station
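For a rough sense of scale (simple arithmetic from the data volumes quoted in the introduction, not figures from the talk), holding 10~20% of the data means approximately:

```latex
0.1\text{--}0.2 \times 400~\mathrm{TB} \approx 40\text{--}80~\mathrm{TB}\ \text{(Run IIa)},\qquad
0.1\text{--}0.2 \times 3\text{--}5~\mathrm{PB} \approx 0.3\text{--}1~\mathrm{PB}\ \text{(Run IIb)}
```

consistent with the >30 TB quoted above for Run IIa RAW data alone.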


What has the UTA team been doing?
• The UTA DØGrid team consists of:
  – 2 faculty (Kaushik and Jae)
  – 2 senior scientists (Tomasz and Mark)
  – 1 external software designer
  – 3 CSE students
• Computing resources for MC farm activities
  – 25 dual P-III 850 MHz machines on the HEP MC farm
  – 5 dual P-III 850 MHz machines on the CSE farm
  – 16 dual P-III 900 MHz ACS machines under LSF
• Installed Globus 2.0 on a few machines
  – Completed "Hello world!" testing a few weeks ago (see the example below)
  – A working version of the MC farm bookkeeper (see the demo) run from an ATLAS machine to our HEP farm machine through Globus → proof of principle
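The slide does not show the actual test; as a minimal sketch, a "Hello world!" submission with the Globus Toolkit 2.0 tools usually amounts to a single globus-job-run call against the remote gatekeeper. The hostname below is a placeholder, and a valid grid proxy (grid-proxy-init) is assumed to exist already.

```python
#!/usr/bin/env python
# Minimal sketch of a Globus Toolkit 2.0 "Hello world!" job-submission test.
# "hep-farm.example.edu" is a placeholder hostname; a grid proxy obtained with
# grid-proxy-init is assumed to be in place.
import subprocess

GATEKEEPER = "hep-farm.example.edu"  # placeholder for the target farm's gatekeeper

# globus-job-run submits a single job to the remote gatekeeper and waits for
# its output; echoing a string back is the canonical first test.
subprocess.call(["globus-job-run", GATEKEEPER, "/bin/echo", "Hello world!"])
```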


Short- and Long-Term Goals
• Plan to take the job submission further
  – Submit MC jobs from remote nodes to our HEP farm for a more complicated chain of job executions
  – Build a prototype high-level interface for job submission and distribution of DØ analysis jobs (next 6 months); see the sketch after this list
    • Store the reconstructed files in the local cache
    • Submit an analysis job from one of the local machines to Condor
    • Put the output into either the requestor's area or the cache
    • Reduced-output analysis job processing from a remote node through Globus
• Plan to become a DØRAC
  – Submitted a $1M MRI proposal for RAC hardware purchase in Jan.
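The prototype interface is still to be built; as a rough illustration of the Condor step in that chain, the snippet below writes a minimal submit description for an analysis job over one cached reconstructed file and hands it to condor_submit. The executable name, input file, and output directory are placeholders, not part of any existing DØ tool.

```python
#!/usr/bin/env python
# Sketch of the Condor-submission step of the proposed analysis-job interface.
# Executable, input file, and output location are placeholder names.
import subprocess

def submit_analysis_job(exe, input_file, out_dir):
    # Minimal Condor submit description: run the analysis executable on one
    # cached reconstructed file and drop the reduced output in out_dir.
    submit = """
universe   = vanilla
executable = %s
arguments  = %s
output     = %s/job.out
error      = %s/job.err
log        = %s/job.log
queue
""" % (exe, input_file, out_dir, out_dir, out_dir)
    with open("analysis.submit", "w") as f:
        f.write(submit)
    subprocess.call(["condor_submit", "analysis.submit"])

if __name__ == "__main__":
    submit_analysis_job("./run_analysis", "/cache/reco/file0001.root", "/home/requestor")
```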


DØ Monte Carlo Production Chain

Generator job (Pythia, Isajet, …) → DØgstar (DØ GEANT) → DØsim (detector response, merged with underlying events prepared in advance) → DØreco (reconstruction) → RecoA (root tuple); the DØreco and RecoA outputs go to SAM storage at FNAL.
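As a sketch only: one way a farm worker script might drive this chain, running each stage as an external executable. The command lines and file names below mirror the stage names on the slide but are placeholders for the real DØ binaries and their options.

```python
#!/usr/bin/env python
# Sketch of the MC production chain as a sequence of external stages.
# Executable names and arguments are placeholders mirroring the slide,
# not the real DØ command lines.
import subprocess
import sys

def run_stage(name, cmd):
    """Run one production stage; abort the chain if it fails."""
    print("stage:", name)
    if subprocess.call(cmd) != 0:
        sys.exit("stage %s failed" % name)

def produce(job_id):
    run_stage("generator", ["pythia_job", "card_%s.dat" % job_id, "gen_%s.evt" % job_id])
    run_stage("d0gstar",   ["d0gstar", "gen_%s.evt" % job_id, "hits_%s.dat" % job_id])
    # detector response, merged with pre-generated underlying events
    run_stage("d0sim",     ["d0sim", "hits_%s.dat" % job_id, "underlying_events.dat",
                            "digis_%s.dat" % job_id])
    run_stage("d0reco",    ["d0reco", "digis_%s.dat" % job_id, "reco_%s.dat" % job_id])
    run_stage("recoA",     ["recoA", "reco_%s.dat" % job_id, "rootuple_%s.root" % job_id])
    # the reco and root-tuple outputs would then be handed to SAM for storage at FNAL

if __name__ == "__main__":
    produce(sys.argv[1] if len(sys.argv) > 1 else "0001")
```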


UTA MC Farm Software Daemons and Their Control

[Diagram: the farm is run by a set of cooperating daemons (distribute, execute, monitor, gather, and root daemons), coordinated by a bookkeeper and a lock manager, and interacting with the job archive, the cache disk, SAM, remote machines, and a WWW interface.]

Job Life Cycle
• Distribute queue → distributer daemon
• Execute queue → executer daemon
• Gatherer queue (or error queue) → gatherer daemon
• Cache, SAM, archive
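A toy illustration (not the actual mcfarm code) of the queue-to-queue transitions sketched above, with crashed jobs parked in the error queue:

```python
# Toy illustration of the job life cycle above: a job record moves from queue
# to queue as each daemon finishes with it.  This is NOT the mcfarm code, just
# the state transitions implied by the diagram.
QUEUES = ["distribute", "execute", "gather", "done"]   # "done" = cache/SAM/archive

class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.queue = "distribute"

    def advance(self, success=True):
        """Move the job to the next queue, or to the error queue on failure."""
        if self.queue == "error":
            return                         # restart logic would requeue it
        if not success:
            self.queue = "error"           # crashed jobs wait here to be restarted
            return
        i = QUEUES.index(self.queue)
        self.queue = QUEUES[min(i + 1, len(QUEUES) - 1)]

job = Job("mc_request_42")
for daemon_ok in (True, True, True):       # distributer, executer, gatherer succeed
    job.advance(daemon_ok)
print(job.job_id, "is now in the", job.queue, "queue")   # -> done
```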


Production Bookkeeping
• During a running period the farms produce a few thousand jobs
• Some jobs crash and need to be restarted
• Users must be kept up to date about the status of their MC requests (waiting? running? done?)
• Dedicated bookkeeping software is needed (see the sketch below)
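A minimal sketch of the kind of per-request status summary such a bookkeeper has to provide (field and state names are illustrative, not the actual mcfarm schema):

```python
# Minimal sketch of a bookkeeper status summary: count jobs per MC request by
# state so users can see at a glance what is waiting, running, done, or crashed.
# Field and state names are illustrative, not the actual mcfarm schema.
from collections import Counter

jobs = [
    {"request": "req_101", "state": "done"},
    {"request": "req_101", "state": "running"},
    {"request": "req_101", "state": "crashed"},   # will need to be restarted
    {"request": "req_102", "state": "waiting"},
]

def summarize(jobs):
    summary = {}
    for job in jobs:
        summary.setdefault(job["request"], Counter())[job["state"]] += 1
    return summary

for request, counts in summarize(jobs).items():
    print(request, dict(counts))
```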


The New, Globus-Enabled Bookkeeper

[Diagram: a machine on the ATLAS farm runs the bookkeeper and a web server; it communicates with the HEP farm and the CSE farm inside the Globus domain using globus-job-run and GridFTP.]


The New Bookkeeper
• One dedicated bookkeeper machine can serve any number of MC production farms running the mcfarm software
• Communication with the remote centers is done using Globus tools only (sketched below)
• No need to install the bookkeeper on every farm, which makes life simpler when many farms participate!
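A rough sketch of how "Globus tools only" communication might look from the bookkeeper's side: run a status command on each farm's gatekeeper with globus-job-run, and pull log files back over GridFTP with globus-url-copy. The farm hostnames, the remote status script, and the file paths are placeholders, not actual mcfarm components.

```python
#!/usr/bin/env python
# Sketch of a bookkeeper polling remote MC farms using only Globus tools.
# Farm hostnames, the remote status script, and file paths are placeholders.
import subprocess

FARMS = ["hep-farm.example.edu", "cse-farm.example.edu"]   # placeholder gatekeepers

for farm in FARMS:
    # Ask the farm for its current queue status (placeholder remote script).
    subprocess.call(["globus-job-run", farm, "/usr/local/mcfarm/bin/farm_status"])

    # Pull the farm's job log back over GridFTP for the web-facing summary.
    subprocess.call(["globus-url-copy",
                     "gsiftp://%s/var/mcfarm/joblog.txt" % farm,
                     "file:///var/bookkeeper/%s-joblog.txt" % farm])
```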


DØ/ATLAS RAC Computing Room
• A 2,000 ft² computing room in the new building
• Specifications given to the designers
  – Multi-gigabit fiber network
  – Power and cooling sufficient for
    • 250 processing PCs
    • 100 IDE-RAID arrays → providing over 50 TB of cache

[Floor plan: a 50' × 40' room with a 10'×12' operations area, rows of 10'×3' and 30'×3' equipment racks separated by 3' and 5' aisles, glass windows, and a double door.]


Conclusions
• DØRACE is making significant progress
  – Preparation for software release and distribution
  – About 50% of the institutions are ready for code development and 20% for data delivery
  – DØGrid TestBed being organized
  – DØGrid software being developed based on SAM
• UTA has been the only US institution with massive MC generation capabilities → sufficient expertise in MC job distribution and resource management
• UTA is playing a leading role in DØGrid activities
  – Major player in DØGrid software design and development
  – Successful in MC bookkeeper job submission and reporting
• DØGrid should be the testing platform for the ATLAS Grid
  – What we do for DØ should be applicable to ATLAS with minimal effort
  – What you do for ATLAS should be easily transferable to DØ and tested with actual data taking
  – The hardware for the DØRAC should be the foundation for ATLAS use