Distributed Grid Analysis using the Ganga job management system Johannes Elmsheuser Ludwig-Maximilians-Universität München, Germany 09 July 2007/DESY Computing Seminar


Page 1: Distributed Grid Analysis using the Ganga job management ... · Distributed Analysis Model II Need for: Distributed Data Management (DDM) † Managed by DDM system DQ2 (Don-Quijote

Distributed Grid Analysis using the Ganga

job management system

Johannes Elmsheuser

Ludwig-Maximilians-Universität München, Germany

09 July 2007/DESY Computing Seminar


Outline

1 Distributed Analysis Model in ATLAS

2 Distributed Analysis with GANGA

3 Conclusions

Johannes Elmsheuser (LMU Munchen) 09/07/2007/DESY 2 / 32


ATLAS Grid Infrastructure

• Heterogeneous grid environment based on 3 grids: LCG/EGEE, OSG and Nordugrid

• The grids have different middleware, catalogs to store data, and software to submit jobs

=⇒ Hide differences from the ATLAS user


Distributed Analysis Model I

The distributed analysis model is based on the LHC computing model

• Data is distributed to Tier1/Tier2 facilities by default and is available 24/7

• User jobs are sent to the data: large input datasets (100 GB up to several TB)

• Results must be made available to the user, potentially already during processing

• Data is registered with metadata and bookkeeping in catalogs
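The principle that jobs go to the data can be illustrated with a minimal sketch. The replica catalog is a hypothetical in-memory dictionary, and the dataset names, site names and the `sites_hosting` helper are illustrative assumptions, not part of DQ2 or Ganga.

```python
# Hypothetical replica catalog: dataset name -> sites holding a complete copy.
# In a real system this information would come from the data management catalogs.
REPLICAS = {
    "dataset_A": {"SITE_1", "SITE_2"},
    "dataset_B": {"SITE_2"},
}


def sites_hosting(datasets, catalog):
    """Return the sorted list of sites that hold every requested dataset.

    A job is only sent to a site where all of its input data already
    resides, so the (large) data never has to be moved to the job.
    """
    candidates = None
    for name in datasets:
        hosts = catalog.get(name, set())
        # Intersect the candidate sites with the hosts of each dataset.
        candidates = set(hosts) if candidates is None else candidates & hosts
    return sorted(candidates or [])


if __name__ == "__main__":
    print(sites_hosting(["dataset_A"], REPLICAS))
    print(sites_hosting(["dataset_A", "dataset_B"], REPLICAS))
```

An empty result means that no single site holds all inputs, in which case the data would first have to be replicated or the job split by dataset.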


ATLAS Data replication and distribution

• Event Filter Farm at CERN
    • Located near the experiment, assembles data into a stream to the Tier0

• Tier0 at CERN
    • Raw data → mass storage at CERN and to Tier1s
    • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
    • Ship ESD, AOD to Tier1s → mass storage at CERN

• Tier1s distributed worldwide (10 centers)
    • Re-reconstruction of raw data, producing new ESD, AOD
    • Scheduled, group access to full ESD and AOD

• Tier2s distributed worldwide (∼ 30 centers)
    • MC simulation, producing ESD, AOD → Tier1s
    • On-demand user physics analysis

• CERN Analysis Facility
    • Analysis
    • Heightened access to ESD and RAW/calibration data on demand

• Tier3s distributed worldwide


Distributed Analysis Model II

Need for: Distributed Data Management (DDM)

• Managed by DDM system DQ2 (Don-Quijote 2)

• Automated file management, distribution and archiving throughout the whole grid using a Central Catalog, FTS, LFCs

• Random access needs a pre-filtering of the data of interest, e.g. Trigger or ID streams or TAGs (event-level metadata)

Current situation and implementation

• Data from the MC Production System is currently consolidated by the DDM-operations team on all Tier1 and then all Tier2 sites

• Analysis model foresees Athena analysis of AODs/ESDs and interactive use of Athena-aware ROOT tuples

• Analysis tuple format(s) are still being enhanced


Data replication status

AOD and NTUP Replication Status (Tier-1s)

• Data replication period: Feb - 19 Jun 2007, DQ2 0.2

• Total data volume: 3200+ datasets, 570+K files, 23+ TB

[Figure: Tier-1 to Tier-1 replication matrix (from/to, in %) for ASGC, BNL, CERN, CNAF, FZK, LYON, NG, PIC, RAL, SARA/NIKHEF and TRIUMF. Legend bands: 95+%, 90-95%, 80-90%, 60-80%, 25-60%, <25%, and "no AODs consolidation within the cloud or/and replication was stopped". Source: A. Klimentov, ATLAS SW WS, Jun 2007]

DESY-HH is also trying to replicate all AODs


Analysis comparisons Tevatron/LHC

Tevatron:

• All data (MC/real data) is available at FNAL

• Single computing center, with a few additional sites for MC or data reprocessing

• Use e.g. d0tools to submit jobs to the local batch system

• Job numbers: 10-100 per analysis cycle

LHC:

• All data distributed to various sites

• CERN, 10 Tier1, ∼100 Tier2 sites for reconstruction, MC and reprocessing

• use grid tools to submit jobs which go to the data

• Job numbers: a few 1000 per analysis cycle


Grid job submission

Naive assumption: Grid ≈ large batch system

• Provide a complicated job configuration JDL (Job Description Language) file

• Find suitable Athena software, installed as distribution kits in the Grid

• Locate the data on different storage elements

• Job splitting, monitoring and book-keeping

• etc.

=⇒ Need for automation and integration of various different components
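One of the steps listed above, job splitting, can be sketched in a few lines. This is a minimal illustration assuming that a job is split by distributing its input files evenly over subjobs; the function name and file names are hypothetical and this is not how any particular grid tool implements it.

```python
def split_into_subjobs(input_files, num_subjobs):
    """Distribute input files over at most num_subjobs subjobs, as evenly as possible."""
    if num_subjobs < 1:
        raise ValueError("num_subjobs must be at least 1")
    # Never create more subjobs than there are input files.
    n = min(num_subjobs, len(input_files))
    if n == 0:
        return []
    base, extra = divmod(len(input_files), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` subjobs receive one additional file.
        size = base + (1 if i < extra else 0)
        chunks.append(input_files[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical input file names, for illustration only.
    files = [f"input_{i:03d}.root" for i in range(7)]
    for number, chunk in enumerate(split_into_subjobs(files, 3)):
        print(number, chunk)
```

Each subjob can then be submitted, monitored and book-kept independently, which is exactly the kind of repetitive work that calls for automation.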


Distributed Analysis

How to combine all of these? With a job scheduler/manager: GANGA


D-Grid Project

• 6 community projects and the D-Grid integration project for a sustainable Grid infrastructure in Germany: DGI, AstroGrid-D, C3-Grid, HEP-Grid, InGrid, MediGrid, TextGrid

• HEP-CG, 3 work packages:
    • Data management with job scheduling and accounting, dCache
    • Job monitoring, error identification and job steering
    • Distributed and interactive analysis

• Close connection to the LHC experiments and the German Tier1 and Tier2 centers

• LMU Munich:
    • Distributed and interactive Grid analysis
    • Gap analysis within D-Grid: identify job scheduler candidates and review needs and existing features


Front-end Client: GANGA

• A user-friendly job definition and management tool.

• Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid)

• Developed in the context of ATLAS and LHCb:
    • For ATLAS, has built-in support for applications based on the Athena framework, for Production System JobTransforms, and for the DQ2 data-management system

• Component architecture readily allows extension

• Python framework


Who is Ganga?


Ganga Job

• Ganga is based on a simple, but flexible, job abstraction

• A job is constructed from a set of building blocks, not all required for every job
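The idea of a job assembled from optional building blocks can be sketched with a small data class. The field names follow the attributes used in the Ganga shell example later in this talk (application, backend, inputdata, outputdata, splitter, merger), but the `SketchJob` class itself and its plain string values are hypothetical stand-ins, not Ganga's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchJob:
    """Toy stand-in for a job built from optional building blocks."""
    application: str                  # what to run (the only mandatory block here)
    backend: str = "Local"            # where to run
    inputdata: Optional[str] = None   # optional: input dataset
    outputdata: Optional[str] = None  # optional: output to be stored
    splitter: Optional[int] = None    # optional: number of subjobs
    merger: Optional[str] = None      # optional: how to merge subjob output

    def configured_blocks(self):
        """Names of the building blocks that were actually set."""
        return [name for name, value in vars(self).items() if value is not None]


# A quick local test needs only an application and the default backend ...
local_test = SketchJob(application="Athena")
# ... while a Grid run adds more blocks and swaps the backend.
grid_run = SketchJob(application="Athena", backend="LCG",
                     inputdata="some.dataset.name", splitter=3)

if __name__ == "__main__":
    print(local_test.configured_blocks())
    print(grid_run.configured_blocks())
```

Because unused blocks simply stay unset, moving from a local test to a Grid run changes only the blocks that differ.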


Ganga Building Blocks


Ganga ATLAS Objects


Ganga Backends and Applications

• Ganga simplifies running of ATLAS (and LHCb) applications on a variety of Grid and non-Grid back-ends


Job definition using ATLAS software I

• Job definition at command line on local desktop:

athena AnalysisSkeleton_topOptions.py

• Job definition at command line for Grid submission:

ganga athena \
    --inDS csc11.005320.PythiaH170wwll.recon.AOD.v11004107 \
    --outputdata AnalysisSkeleton.aan.root \
    --split 3 \
    --lcg \
    --ce ce-fzk.gridka.de:2119/jobmanager-pbspro-atlasS \
    AnalysisSkeleton_topOptions.py

The --ce option is not required; the job can also be sent automatically to the data
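When the same command has to be issued for several datasets, it can help to assemble it programmatically. The following is a minimal sketch that only reuses the options shown on this slide; the helper name `build_ganga_athena_command` is hypothetical, and the sketch builds the argument list without executing anything.

```python
import shlex


def build_ganga_athena_command(job_options, in_ds, output_file, split, ce=None):
    """Assemble the argument list for a `ganga athena` call as shown on the slide."""
    args = ["ganga", "athena",
            "--inDS", in_ds,
            "--outputdata", output_file,
            "--split", str(split),
            "--lcg"]
    # --ce is optional: without it the job can be sent to the data automatically.
    if ce is not None:
        args += ["--ce", ce]
    args.append(job_options)
    return args


cmd = build_ganga_athena_command(
    "AnalysisSkeleton_topOptions.py",
    in_ds="csc11.005320.PythiaH170wwll.recon.AOD.v11004107",
    output_file="AnalysisSkeleton.aan.root",
    split=3,
)

if __name__ == "__main__":
    # shlex.join (Python 3.8+) renders the list as a shell-safe command line.
    print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where Ganga is set up.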


Job definition using ATLAS software II

Job definition within GANGA IPython shell:

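# Define a job that runs an Athena application
j = Job()
j.application = Athena()
j.application.prepare(athena_compile=False)
j.application.option_file = '$HOME/athena/12.0.5/InstallArea/jobOptions/UserAnalysis/AnalysisSkeleton_topOptions.py'
# Split the job into 3 subjobs and merge their output afterwards
j.splitter = AthenaSplitterJob()
j.splitter.numsubjobs = 3
j.merger = AthenaOutputMerger()
# Input: a DQ2 dataset; match_ce sends the job to a site holding the data
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'csc11.005145.PythiaZmumu.recon.AOD.v11004103'
j.inputdata.match_ce = True
# Output: register the ROOT tuple file as a DQ2 output dataset
j.outputdata = DQ2OutputDataset()
j.outputdata.outputdata = ['AnalysisSkeleton.aan.root']
# Submit to the LCG Grid backend
j.backend = LCG()
j.submit()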


Screenshot of the IPython shell


Job Definitions with the GUI


Ganga jobs monitored by the Dashboard

http://dashb-atlas-job.cern.ch/dashboard/request.py/jobsummary


Results

Download the processed events (root-tuples) to the local desktop:

[Figure: distributions for Z/γ* → µµ and H → WW → µνµν events. Panels: number of muons, pT of muon 1 [GeV], pT of muon 2 [GeV], m_µµ [GeV], ∆φ_µµ, E_T final [GeV], number of jets, m_T(µµ, E_T) [GeV]]


Use cases at the LMU Munich

• Production of signal MC sample with AthenaMC
    • 50000 additional events:
        • event generation: 5 jobs, or use already validated evgen samples where only a small fraction has officially been simulated/reconstructed
        • simulation: 1000 jobs
        • reconstruction: 50 jobs
    • From this exercise a prototype for automatic job submission has evolved, which will eventually be part of Ganga
    • Statistics from the dashboard: 11/4-11/6: 79% Grid eff. * 77% Application eff. = 53% overall eff.

• Distributed Analysis as part of the CSC homework:
    • Process signal and background MC samples at well-known and maintained sites: GridKa, Lyon and LRZ
    • This involves: SUSY signal, ttbar (5200), di-boson backgrounds, etc.
    • Using SUSYView
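The numbers in the production use case can be checked with a short calculation. The sketch assumes that the overall efficiency is the product of the Grid and application efficiencies and that each production step divides the 50000 events evenly over its jobs; the function names are illustrative. Multiplying the two quoted factors gives roughly 61%, so the 53% quoted on the slide presumably corresponds to a different value for one of the factors; the sketch does not try to guess which.

```python
def overall_efficiency(grid_eff, application_eff):
    """Fraction of jobs that succeed when both the Grid and the application step must succeed."""
    return grid_eff * application_eff


def events_per_job(total_events, num_jobs):
    """Average number of events per job, assuming jobs of equal size."""
    return total_events / num_jobs


if __name__ == "__main__":
    print(f"overall efficiency: {overall_efficiency(0.79, 0.77):.0%}")
    for step, jobs in [("event generation", 5), ("simulation", 1000), ("reconstruction", 50)]:
        print(f"{step}: {events_per_job(50000, jobs):.0f} events per job")
```

The job counts illustrate why simulation dominates the workload: it needs 20 times more jobs than reconstruction for the same sample.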


Current ATLAS issues

• Data availability and incomplete datasets
    • Newest AOD data has been replicated to many sites by the DDM-ops team

• Repeated problems with direct input file access (POSIX I/O) on Castor, dCache and DPM storage elements, for various reasons:
    • Broken ROOT or middleware plugins
    • Operational issues, such as TURL resolving not working

• gLite WMS is now used for analysis


Distributed Analysis Tutorials and Support

https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial43

• Edinburgh (February 1st-2nd)

• Milan (February 5th-6th)

• Lyon (March 5th-7th)

• Munich (March 29th-30th)

• Toronto (April 18th)

• Bergen (April 27th)

• Valencia (May 3rd-4th)

• Ganga user support and feedback: [email protected]


Usage Statistics

• Over 750 unique users since the beginning of the year

• Over 440 ATLAS users have tried Ganga at least once

• About 50 ATLAS Ganga users per week


Domain Statistics

• Over 50 different domains in the last month


Ganga Users I


Ganga Users II

Ganga is used as a Grid abstraction layer, also in conjunction with DIANE

• Bioinformatics: Avian flu drug search 2006

• Geant4 regression testing

• Cambridge Ontology: content-based image retrieval


Distributed analysis needs and Conclusions

For the distributed analysis it is vital to have:

• Easy interface that does not scare off physicists

• A reliable and robust service of many components

Conclusions

• A growing number of users are using the Grid for analysis; there is still some room to grow

• Have to push analysis to higher Tiers

• Data management is a central issue for distributed analysis

• It might be interesting to see how more interactive use of resources will evolve


Resources

Homepage: http://cern.ch/ganga

Links to documentation, tutorials, etc.: http://ganga.web.cern.ch/ganga/workbook/
