Page 1:

HENP Grid Testbeds, Applications and Demonstrations

Rob Gardner, University of Chicago

CHEP03, March 29, 2003

Ruth Pordes, Fermilab

Page 2:

Overview

High-altitude survey of contributions: group, application, testbed, services/tools

Discussion of common and recurring issues: grid building, services development, use

Concluding thoughts

– Acknowledgements to all the speakers who gave fine presentations, and my apologies in advance that this is only a *very limited* sampling

Page 3:

Testbeds, applications, and development of tools and services

Testbeds:
– AliEn grids
– BaBar Grid
– CrossGrid
– DataTAG
– EDG Testbed(s)
– Grid Canada
– IGT Testbed (US CMS)
– Korean DataGrid
– NorduGrid(s)
– SAMGrid
– US ATLAS Testbed
– WorldGrid

Evaluations:
– EDG testbed evaluations and experience in multiple experiments
– Testbed management experience

Applications:
– ALICE production
– ATLAS production
– BaBar analysis, file replication
– CDF/D0 analysis
– CMS production
– LHCb production
– Medical applications in Italy
– Phenix
– Sloan sky survey

Tools development:
– Use cases (HEPCAL)
– Proof/Grid analysis
– LCG Pool and grid catalogs
– SRM, Magda
– Clarens, Ganga, Genius, Grappa, JAS

Page 4:

EDG TB History

Version  Date
1.1.2    27 Feb 2002
1.1.3    02 Apr 2002
1.1.4    04 Apr 2002
1.2.a1   11 Apr 2002
1.2.b1   31 May 2002
1.2.0    12 Aug 2002
1.2.1    04 Sep 2002
1.2.2    09 Sep 2002
1.2.3    25 Oct 2002
1.3.0    08 Nov 2002
1.3.1    19 Nov 2002
1.3.2    20 Nov 2002
1.3.3    21 Nov 2002
1.3.4    25 Nov 2002
1.4.0    06 Dec 2002
1.4.1    07 Jan 2003
1.4.2    09 Jan 2003
1.4.3    14 Jan 2003
1.4.4    18 Jan 2003
1.4.5    26 Feb 2003
1.4.6    04 Mar 2003
1.4.7    08 Mar 2003

Successes: matchmaking/job management; basic data management.
Known problems: high-rate submissions; long FTP transfers.
Known problems: GASS cache coherency; race conditions in the gatekeeper; unstable MDS.

Intense use by applications! Limitations: resource exhaustion; size of logical collections.

Successes: improved MDS stability; FTP transfers OK.
Known problems: interactions with the RC.

ATLAS phase 1 start

CMS stress test Nov.30 - Dec. 20

CMS, ATLAS, LHCB, ALICE

Emanuele Leonardi

Page 5:

Résumé of experiment DC use of EDG (see experiment talks elsewhere at CHEP)

ATLAS were first, in August 2002. The aim was to repeat part of the Data Challenge. Found two serious problems, which were fixed in 1.3.

CMS stress test production Nov-Dec 2002 – found more problems in the area of job submission and RC handling – led to 1.4.x

ALICE started on Mar 4: production of 5,000 central Pb-Pb events – 9 TB; 40,000 output files; 120k CPU hours
– Progressing with similar efficiency levels to CMS
– About 5% done by Mar 14
– “Pull” architecture

LHCb started mid-February
– ~70K events for physics
– Like ALICE, using a pull architecture

BaBar/D0
– Have so far done small-scale tests
– Larger scale planned with EDG 2

[Plot: number of events vs. time – about 250k events produced over 21 days]

Stephen Burke

Page 6:

CMS Data Challenge 2002 on Grid

Two “official” CMS productions on the grid in 2002:
– CMS-EDG Stress Test on EDG testbed + CMS sites
> ~260K events, CMKIN and CMSIM steps
> Top-down approach: more functionality but less robust; large manpower needed
– USCMS IGT Production in the US
> 1M events Ntuple-only (full chain in single job)
> 500K up to CMSIM (two steps in single job)
> Bottom-up approach: less functionality but more stable; little manpower needed
– See talk by P. Capiluppi

C. Grande

Page 7:

CMS production components interfaced to EDG

• Four submitting UIs: Bologna/CNAF (IT), Ecole Polytechnique (FR), Imperial College (UK), Padova/INFN (IT)

• Several Resource Brokers (WMS), CMS-dedicated and shared with other applications: one RB for each CMS UI + a “backup”

• Replica Catalog at CNAF, MDS (and II) at CERN and CNAF, VO server at NIKHEF

[Diagram: CMS production tools (IMPALA/BOSS on the UI, with RefDB parameters and the BOSS DB) interfaced via JDL to the EDG Workload Management System; jobs run CMS software on CEs/WNs and read/write data on SEs via the Replica Manager, with input data location, data registration, job output filtering, and runtime monitoring flowing between the CMS and EDG sides]

Page 8:

CMS/EDG Production

~260K events produced

~7 sec/event average

~2.5 sec/event peak (12-14 Dec)

[Plot: # events produced vs. time, 30 Nov – 20 Dec; annotations mark CMS Week, an upgrade of the middleware, and hitting some limit of the implementation]

P. Capiluppi talk

Page 9:

US-CMS IGT Production

25 Oct – 28 Dec:
> 1M events
> 4.7 sec/event average
> 2.5 sec/event peak (14-20 Dec 2002)
> Sustained efficiency: about 44%

P. Capiluppi talk

Page 10:

Grid in ATLAS DC1*

US-ATLAS: part of Phase 1 production; full Phase 2 production
EDG Testbed Prod: reproduce part of Phase 1 data production; several tests
NorduGrid: full Phase 1 & 2 production

[ * See other ATLAS talks for more details]

G.Poulard

Page 11:

Contribution to the overall CPU-time (%) per country

[Pie chart: per-country shares of the overall CPU time, ranging from 0.01% to 28.66% across the 16 countries/regions listed below]

ATLAS DC1 Phase 1 : July-August 02

3200 CPUs, 110 kSI95, 71,000 CPU-days

5×10^7 events generated
1×10^7 events simulated
3×10^7 single particles
30 TB, 35,000 files

39 institutes in 18 countries:
1. Australia
2. Austria
3. Canada
4. CERN
5. Czech Republic
6. France
7. Germany
8. Israel
9. Italy
10. Japan
11. Nordic
12. Russia
13. Spain
14. Taiwan
15. UK
16. USA

Grid tools used at 11 sites.

G.Poulard

Page 12:

Meta Systems

[Diagram: the user asks to run applications A, B, and C; the Linker attaches Configurators A, B, and C and makes a job in the framework; a Script Generator then turns the per-application /bin/sh scripts into one concrete wrapper:]

#!/bin/env sh
scriptA
scriptB
scriptC

MCRunJob approach by CMS production team

Framework for dealing with multiple grid resources and testbeds (EDG, IGT)

G.Graham
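As a hedged illustration of this script-generation pattern (the function and file names here are hypothetical, not MCRunJob's actual code), a generator might link per-application shell fragments into one concrete job script like the wrapper above:

    # Sketch only: each "configurator" contributes a shell fragment for one
    # application, and the generator concatenates them into a job script.
    def make_job_script(app_scripts, path="job.sh"):
        lines = ["#!/bin/env sh"]
        for app, script in app_scripts:
            lines.append("# run application " + app)
            lines.append(script)
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")
        return path

    # Usage: attach configurators for applications A, B, and C.
    make_job_script([("A", "scriptA"), ("B", "scriptB"), ("C", "scriptC")])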

Page 13:

Hybrid production model

MCRunJob

[Diagram: a physics group asks RefDB for an official dataset; the Production Manager defines assignments; a Site Manager starts an assignment, or a user starts a private production; MCRunJob then emits the jobs as a DAG for DAGMan (MOP), as JDL for the EDG Scheduler, or as shell scripts for a local batch manager, and as Chimera VDL feeding a Virtual Data Catalogue and Planner; targets include the user's site resources, a computer farm, and the LCG-1 testbed]

C. Grande
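To make the DAGMan branch concrete, here is a minimal hedged sketch (the helper and the job/submit-file names are illustrative, not the CMS tools' actual code) of writing a Condor DAGMan description for a chain of production jobs:

    # Sketch: emit a Condor DAGMan file for a linear job chain, the kind
    # of description the DAG branch above hands to DAGMan (MOP).
    def write_dag(job_names, path="production.dag"):
        with open(path, "w") as f:
            for name in job_names:
                # each job points at a pre-written Condor submit file
                f.write("JOB %s %s.sub\n" % (name, name))
            for parent, child in zip(job_names, job_names[1:]):
                # serialize the chain: each step waits for the previous one
                f.write("PARENT %s CHILD %s\n" % (parent, child))

    write_dag(["cmkin", "cmsim", "ntuple"])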

Page 14:

Interoperability: glue

[Diagram: an EDG-style UI, RB, RC, and IS on one side and a VDT Client and VDT Server on the other, glued together so that jobs submitted through the UI can land on CEs and SEs of either flavor]

Page 15:

Integrated Grid Systems

Two examples of integrating advanced production and analysis across multiple grids

SamGrid AliEn

Page 16:

SamGrid Map

• CDF
– Kyungpook National University, Korea
– Rutgers State University, New Jersey, US
– Rutherford Appleton Laboratory, UK
– Texas Tech, Texas, US
– University of Toronto, Canada

• DØ
– Imperial College, London, UK
– Michigan State University, Michigan, US
– University of Michigan, Michigan, US
– University of Texas at Arlington, Texas, US

Page 17:

Physics with SAM-Grid

Standard CDF analysis job submitted via SAM-Grid and executed somewhere

[Plots: z0(µ1) vs. z0(µ2) track impact parameters, and the J/ψ → µ+µ− invariant-mass peak]

S. Stonjek

Page 18:

The BaBar Grid as of March 2003 (D. Boutigny)

[Diagram: a VO server, Replica Catalog, and Resource Broker coordinating several sites, each with a CE, an SE, and WNs]

Special challenges faced by a running experiment with heterogeneous data requirements (ROOT, Objectivity).

Page 19:

Grid Applications, Interfaces, Portals

Clarens, Ganga, Genius, Grappa, JAS-Grid, Magda, Proof-Grid

and higher-level services:
– Storage Resource Manager (SRM)
– Magda data management
– POOL-Grid interface

Page 20:

PROOF and Data Grids

Many services are a good fit:
– Authentication
– File catalog, replication services
– Resource brokers
– Monitoring

Use abstract interfaces

Phased integration:
– Static configuration
– Use of one or multiple Grid services
– Driven by Grid infrastructure

Fons Rademakers
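A hedged sketch of the "abstract interfaces" point above: analysis code programs against a small catalog interface, and each grid (EDG, AliEn, ...) supplies its own implementation. All class names here are illustrative, not PROOF's actual API:

    # Sketch: PROOF-side code sees only the abstract interface; the
    # "static configuration" phase can be a table-backed implementation.
    from abc import ABC, abstractmethod

    class FileCatalog(ABC):
        @abstractmethod
        def replicas(self, logical_name):
            """Map a logical file name to physical replica URLs."""

    class StaticCatalog(FileCatalog):
        # static config file phase: replicas come from a local table
        def __init__(self, table):
            self.table = table
        def replicas(self, logical_name):
            return self.table.get(logical_name, [])

    catalog = StaticCatalog({"lfn:run1.root": ["root://se.example.org//run1.root"]})
    print(catalog.replicas("lfn:run1.root"))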

Page 21:

Different PROOF–GRID Scenarios

Static stand-alone
– Current version: static config file, pre-installed

Dynamic, PROOF in control
– Using grid file catalog and resource broker, pre-installed

Dynamic, AliEn in control
– Idem, but installed and started on the fly by AliEn

Dynamic, Condor in control
– Idem, but additionally allowing slave migration in a Condor pool

Fons Rademakers

Page 22:

[Diagram: a JDL job submitted from the GENIUS portal / UI to the RB/JSS; input data location is resolved via the Replica Catalog and the top GIIS of the GLUE-Schema based Information System on the GLUE testbed; the job runs on a WN with the ATLAS software and registers its output data]

JDL for GLUE-aware files:

Executable = "/usr/bin/env";
Arguments = "zsh prod.dc1_wrc 00001";
VirtualOrganization = "datatag";
Requirements = Member(other.GlueHostApplicationSoftwareRunTimeEnvironment, "ATLAS-3.2.1");
Rank = other.GlueCEStateFreeCPUs;
InputSandbox = {"prod.dc1_wrc", "rc.conf", "plot.kumac"};
OutputSandbox = {"dc1.002000.test.00001.hlt.pythia_jet_17.log", "dc1.002000.test.00001.hlt.pythia_jet_17.his", "dc1.002000.test.00001.hlt.pythia_jet_17.err", "plot.kumac"};
ReplicaCatalog = "ldap://dell04.cnaf.infn.it:9211/lc=ATLAS,rc=GLUE,dc=dell04,dc=cnaf,dc=infn,dc=it";
InputData = {"LF:dc1.002000.evgen.0001.hlt.pythia_jet_17.root"};
StdOutput = "dc1.002000.test.00001.hlt.pythia_jet_17.log";
StdError = "dc1.002000.test.00001.hlt.pythia_jet_17.err";
DataAccessProtocol = "file";

See the WorldGrid poster at this conference.
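As a hedged aside (this loop is illustrative, not part of the WorldGrid tooling), JDLs like the one above differ essentially only in the partition number, so a set of them can be generated programmatically; the helper name is hypothetical and the naming pattern follows the example:

    # Sketch: generate one JDL per DC1 partition, following the naming
    # pattern visible in the example above (abridged attribute list).
    def dc1_jdl(partition):
        part = "%05d" % partition
        base = "dc1.002000.test.%s.hlt.pythia_jet_17" % part
        return "\n".join([
            'Executable = "/usr/bin/env";',
            'Arguments = "zsh prod.dc1_wrc %s";' % part,
            'VirtualOrganization = "datatag";',
            'Rank = other.GlueCEStateFreeCPUs;',
            'StdOutput = "%s.log";' % base,
            'StdError = "%s.err";' % base,
            'DataAccessProtocol = "file";',
        ])

    for p in range(1, 4):
        with open("job_%05d.jdl" % p, "w") as f:
            f.write(dc1_jdl(p))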

Page 23:

Ganga: ATLAS and LHCb

[Diagram: Ganga architecture – the GANGA Core Module sits on a Python software bus with a GUI, an OS module, and GaudiPython/PythonROOT bindings to Athena/GAUDI; a remote user (client) talks over XML-RPC (LAN/WAN) to a server holding the Bookkeeping DB, Production DB, and Job Configuration DB; jobs are tracked in a local job DB and submitted via the EDG UI to the GRID or to a local resource management system (LRMS)]

C. Tull

Page 24:

Ganga EDG Grid Interface

Classes: Job, JobsRegistry, Job Handler

Services, and the EDG UI commands behind them:
– Job submission: dg-job-list-match, dg-job-submit, dg-job-cancel
– Security service: grid-proxy-init, MyProxy
– Job monitoring: dg-job-status, dg-job-get-logging-info, GRM/PROVE
– Data management: edg-replica-manager, dg-job-get-output, globus-url-copy, GDMP

C. Tull
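A hedged sketch of how a job-handler class might shell out to the EDG UI commands listed above (illustrative, not Ganga's actual implementation; only the command names come from the slide):

    # Sketch: thin wrappers around two of the EDG UI commands above.
    import subprocess

    def submit(jdl_path):
        """Submit a JDL file with dg-job-submit; return the command output."""
        result = subprocess.run(["dg-job-submit", jdl_path],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

    def status(job_id):
        """Query a job's state with dg-job-status."""
        result = subprocess.run(["dg-job-status", job_id],
                                capture_output=True, text=True, check=True)
        return result.stdout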

Page 25:

Comment: Building Grid Applications

P is a dynamic configuration script: it turns an abstract bundle into a concrete one.

Challenge:
– building integrated systems
– distributed developers and support

[Diagram: a Grid Component Library (CTL, ATL, GTL) providing templates and abstract bundles; configuration scripts (P, P1a, P1c), driven by attributes (user info, grid info), turn them into concrete bundles for users U1 and U2]

Page 26:

In summary… common issues

– Installation and configuration of middleware
– Application packaging, run-time environments
– Authentication mechanisms
– Policies differing among sites
– Private networks, firewalls, ports
– Fragility of services and of the job submission chain
– Inaccuracies and poor performance of information services
– Monitoring at several levels
– Debugging, site cleanup

Page 27:

Conclusions

Progress in the past 18 months has been dramatic!
– lots of experience gained in building integrated grid systems
– demonstrated functionality with large-scale production
– more attention being given to analysis

Many pitfalls exposed, areas for improvement identified:
– some of these are in core middleware; feedback given to technology providers
– policy issues remain: using shared resources, authorization
– operation of production services; user interactions and support models to be developed

Many thanks to the contributors to this session.