
Page 1: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005

LHCb Computing and Grid Status

Glenn Patrick, LHCb(UK), Dublin – 23 August 2005

Page 2: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Computing completes TDRs

Jan 2000 – June 2005

Page 3: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


LHCb – June 2005

[Photo of the LHCb detector, 3 June 2005: HCAL, ECAL, LHCb magnet, muon filters MF1–MF3 and MF4.]

Page 4: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005

Grid World

[Diagram: the LHCb online system and its link to the Grid. Front-end electronics feed a multiplexing layer and readout network (Level-1 traffic: 1000 kHz, 5.5 GB/s; HLT traffic: 40 kHz, 1.6 GB/s; 7.1 GB/s total) into 94 subfarm controllers (SFCs) and a CPU farm of ~1600 CPUs, with ~250 MB/s total to the storage system and on to Tier 0. The TFC system and ECS supervise the farm. Scalable in depth (more CPUs, up to ~2200) and in width (more detectors in Level-1).]

Trigger chain: 40 MHz → Level-0 (hardware) → 1 MHz → Level-1 (software) → 40 kHz → HLT (software) → 2 kHz to Tier 0.

Raw data: 2 kHz, 50 MB/s, written to Tier 0 and distributed to the Tier 1s.
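The quoted figures imply an average raw event size of about 25 kB. A minimal derived check, using only the numbers on this slide (the event size itself is not stated):

    # Derived check of the rates quoted above (event size is not stated on the slide).
    hlt_output_rate_hz = 2000        # HLT output: 2 kHz
    raw_rate_mb_per_s = 50.0         # "Raw data: 2 kHz, 50 MB/s"

    event_size_kb = raw_rate_mb_per_s * 1000.0 / hlt_output_rate_hz
    print("Implied raw event size: %.0f kB/event" % event_size_kb)   # ~25 kB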

Page 5: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


HLT Output

                      b-exclusive   dimuon    D*    b-inclusive   Total
Trigger rate (Hz)          200        600     300       900        2000
Fraction                   10%        30%     15%       45%        100%
Events/year (10^9)           2          6       3         9          20

200 Hz hot stream: will be fully reconstructed on the online farm in real time. "Hot stream" (RAW + rDST) written to Tier 0.

2 kHz: RAW data written to Tier 0 for reconstruction at CERN and the Tier 1s.

Dimuon (J/ψ X): calibration for proper-time resolution.
D* (D* → D0(Kπ)π): clean peak allows PID calibration.
b-inclusive: understand bias on other B selections.
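The events/year row follows directly from the trigger rates if one assumes a canonical 10^7 seconds of data taking per year (the seconds-per-year figure is an assumption, not stated on the slide):

    # Events/year from the trigger rates, assuming ~1e7 s of data taking per year.
    SECONDS_PER_YEAR = 1e7           # assumed canonical LHC year, not stated on the slide
    rates_hz = {"b-exclusive": 200, "dimuon": 600, "D*": 300, "b-inclusive": 900}

    for stream, rate in rates_hz.items():
        print("%-12s %4.0f x 10^9 events/year" % (stream, rate * SECONDS_PER_YEAR / 1e9))
    # Total: 2000 Hz x 1e7 s = 20 x 10^9 events/year, matching the table.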

Page 6: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Data Flow

[Diagram: Simulation (Gauss) → Digitisation (Boole) → Reconstruction (Brunel) → Analysis (DaVinci), producing MC Truth → Raw Data → DST → Stripped DST → Analysis Objects. All applications sit on the Gaudi framework and share the Detector Description, Conditions Database and Event Model / Physics Event Model.]
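One way to read the diagram is as a simple chain of applications and data products. A minimal sketch (application and product names are from the slide; the tuple structure and the "generator input" label are purely illustrative):

    # Illustrative only: the processing chain as (stage, application, output) tuples.
    chain = [
        ("Simulation",     "Gauss",   "MC Truth"),
        ("Digitisation",   "Boole",   "Raw Data"),
        ("Reconstruction", "Brunel",  "DST"),
        ("Analysis",       "DaVinci", "Analysis Objects / Stripped DST"),
    ]

    previous = "generator input"
    for stage, app, output in chain:
        print("%-14s (%s): %s -> %s" % (stage, app, previous, output))
        previous = output
    # All four applications are built on Gaudi and share the Detector Description,
    # Conditions Database and common Event Model.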

Page 7: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Grid Architecture

Tier 1 centre (RAL) + 4 virtual Tier 2 centres.

LCG-2/EGEE: the world's largest Grid! ~16,000 CPUs and 5 PB over 192 sites in ~39 countries. GridPP provides ~3,000 CPUs at 20 UK sites.

Page 8: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Grid Ireland

EGEE is made up of regions.

The UKI region consists of 3 federations:

GridPP

Grid Ireland

National Grid Service

We are here

Page 9: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


LHCb Computing Model

14 candidates

CERN Tier 1 essential for accessing the "hot stream" for:
1. First alignment & calibration.
2. First high-level analysis.

Page 10: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


LHC Comparison

Experiment   Tier 1                                                Tier 2
ALICE        Reconstruction; chaotic analysis                      MC production; chaotic analysis
ATLAS        Reconstruction; scheduled analysis/skimming;          Simulation; analysis; calibration
             calibration
CMS          Reconstruction; analysis for 20-100 users             All simulation production
LHCb         Reconstruction; scheduled stripping;                  MC production; no analysis
             chaotic analysis

Page 11: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Distributed Data

RAW DATA: 500 TB
CERN = master copy; 2nd copy distributed over six Tier 1s.

RECONSTRUCTION: 500 TB/pass
Pass 1: During data taking at CERN and Tier 1s (7 months).
Pass 2: During winter shutdown at CERN, Tier 1s and online farm (2 months).

STRIPPING: 140 TB/pass/copy
Pass 1: During data taking at CERN and Tier 1s (7 months).
Pass 2: After data taking at CERN and Tier 1s (1 month).
Pass 3: During shutdown at CERN, Tier 1s and online farm.
Pass 4: Before next year's data taking at CERN and Tier 1s (1 month).
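As a rough consequence of these numbers, a sketch of the per-site shares, assuming the second RAW copy and the reconstruction output are split evenly among the six Tier 1s (the per-site figures are derived, not stated on the slide):

    # Derived per-site figures, assuming an even split over the six Tier 1s.
    RAW_TB = 500              # master copy at CERN, 2nd copy spread over six Tier 1s
    RECO_TB_PER_PASS = 500    # reconstruction output per pass
    STRIP_TB_PER_PASS = 140   # stripping output per pass per copy
    N_TIER1 = 6

    print("RAW (2nd copy) per Tier 1:   %.0f TB" % (RAW_TB / float(N_TIER1)))            # ~83 TB
    print("Reco output per Tier 1/pass: %.0f TB" % (RECO_TB_PER_PASS / float(N_TIER1)))  # ~83 TB
    print("Stripped output per copy:    %d TB/pass" % STRIP_TB_PER_PASS)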

Page 12: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Stripping Job – 2005

[Workflow diagram: read INPUTDATA and stage the files in one go; check file status (staged / not yet staged) against the Prod DB, working through the files in groups (group1, group2, … groupN). For each staged file: check file integrity, sending bad-file info back to the Prod DB; run DaVinci stripping on the good files; merge the DST and ETC outputs; send the file info to the Prod DB. Uses SRM.]

Stripping runs on reduced DSTs (rDST).

Pre-selection algorithms categorise events into streams.

Events that pass are fully reconstructed and full DSTs written.

CERN, CNAF, PIC used so far – sites based on CASTOR.
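A minimal sketch of the workflow in the diagram above. All helper functions here are hypothetical stand-ins for the real staging, integrity-check and DaVinci tools, which are not shown in the slides:

    # Hypothetical sketch of the 2005 stripping workflow shown above. The helper
    # stubs stand in for real staging / integrity / DaVinci tools.
    def stage_all(lfns):              pass                    # stage INPUTDATA in one go
    def file_status(lfn):             return "staged"         # "staged" / "not yet staged"
    def check_integrity(lfn):         return True
    def send_bad_file_info(lfn):      print("bad file reported to Prod DB:", lfn)
    def davinci_stripping(lfns):      return "job.dst", "job.etc"   # DaVinci pre-selections
    def merge_and_register(dst, etc): print("merged", dst, etc, "-> info sent to Prod DB")

    def run_stripping(input_lfns, group_size=20):
        stage_all(input_lfns)
        for i in range(0, len(input_lfns), group_size):       # process files in groups
            group = input_lfns[i:i + group_size]
            staged = [f for f in group if file_status(f) == "staged"]
            good = []
            for f in staged:
                if check_integrity(f):
                    good.append(f)
                else:
                    send_bad_file_info(f)
            dst, etc = davinci_stripping(good)
            merge_and_register(dst, etc)

    run_stripping(["LFN:/lhcb/example/%04d.rdst" % n for n in range(40)])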

Page 13: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


LHCb Resource Profile

Global CPU (MSI2k.yr)
            2006    2007    2008    2009    2010
CERN        0.27    0.54    0.90    1.25    1.88
Tier-1s     1.33    2.65    4.42    5.55    8.35
Tier-2s     2.29    4.59    7.65    7.65    7.65
TOTAL       3.89    7.78   12.97   14.45   17.88

Global DISK (TB)
            2006    2007    2008    2009    2010
CERN         248     496     826    1095    1363
Tier-1s      730    1459    2432    2897    3363
Tier-2s        7      14      23      23      23
TOTAL        984    1969    3281    4015    4749

Global MSS (TB)
            2006    2007    2008    2009    2010
CERN         408     825    1359    2857    4566
Tier-1s      622    1244    2074    4285    7066
TOTAL       1030    2069    3433    7144   11632
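A quick consistency check of the tables (a sketch; the 1-2 unit discrepancies in the disk and MSS tables are rounding already present in the source numbers):

    # Sanity check: TOTAL rows should be the sum of CERN + Tier-1s + Tier-2s.
    cpu = {
        "CERN":    [0.27, 0.54, 0.90, 1.25, 1.88],
        "Tier-1s": [1.33, 2.65, 4.42, 5.55, 8.35],
        "Tier-2s": [2.29, 4.59, 7.65, 7.65, 7.65],
        "TOTAL":   [3.89, 7.78, 12.97, 14.45, 17.88],
    }

    for i, year in enumerate(range(2006, 2011)):
        parts = cpu["CERN"][i] + cpu["Tier-1s"][i] + cpu["Tier-2s"][i]
        assert abs(parts - cpu["TOTAL"][i]) < 0.02, year
    print("CPU totals consistent with the row sums")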

Page 14: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Comparisons – CPU

[Plot: "Tier 1 CPU – integrated" (Nick Brook). Integrated Tier 1 CPU in MSI2k, month by month over 2008–2010, with curves for LHCb, CMS, ATLAS and ALICE.]

Page 15: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Comparisons – Disk (LCG TDR – LHCC, 29.6.2005, Jurgen Knobloch)

[Plot: disk requirements in PB for 2007–2010, broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by tier (CERN, Tier-1, Tier-2); 54% pledged across CERN, Tier-1 and Tier-2.]

Page 16: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK Tier 1 Status

Total available (August 2005):
  CPU = 796 KSI2K (500 dual-CPU)
  Disk = 187 TB (60 servers)
  Tape = 340 TB

Minimum required Tier 1 in 2008:
  CPU = 4732 KSI2K
  Disk = 2678 TB
  Tape = 2538 TB

LHCb(UK) 2008 (15% share):
  CPU = 663 KSI2K
  Disk = 365 TB
  Tape = 311 TB

LHCb(UK) 2008 (1/6 share):
  CPU = 737 KSI2K
  Disk = 405 TB
  Tape = 346 TB

Page 17: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK Tier 1 Utilisation

Hardware purchase scheduled for early 2005 postponed. PPARC discussions ongoing.

[Chart: Tier 1 capacity split into Grid and non-Grid usage; ~70%.]

Grid use increasing. CPU "undersubscribed" (but efficiencies of Grid jobs may be a problem): LHCb 69% (Jan–July 2005); CPU/walltime < 50% for some ATLAS jobs.

Page 18: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK Tier 1 Exploitation

[Charts: UK Tier 1 usage by experiment in 2004 (BaBar, LHCb, ATLAS) and in 2005 (LHCb, ATLAS, BaBar; snapshot 17.8.05).]

Page 19: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK Tier 1 Storage

Classic SE not sufficient as LCG storage solution. SRM now the agreed interface to storage resources. Lack of SRM prevented data stripping at the UK Tier 1.

This year, new storage infrastructure deployed for the UK Tier 1.

Storage Resource Manager (SRM) – interface providing a combined view of secondary and tertiary storage to Grid clients.

dCache – disk pool management system jointly developed by DESY and FermiLab.
• Single namespace to manage 100s of TB of data.
• Access via GRIDFTP and SRM.
• Interfaced to RAL tapestore.

CASTOR under evaluation as replacement for the home-grown (ADS) tape service. CCLRC to deploy a 10,000-tape robot?

LHCb now has a disk allocation of 8.2 TB, with 4×1.6 TB under dCache control (c.f. BaBar = 95 TB, ATLAS = 19 TB, CMS = 40 TB).

Computing Model says LHCb Tier 1 should have ~122 TB in 2006…
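For illustration only, pulling a file back from an SRM/dCache endpoint might look like the sketch below. The host and path are invented placeholders, and the exact lcg-utils options available on the LCG-2 releases of the time are an assumption here:

    # Hypothetical example of copying a file from an SRM/dCache endpoint with lcg-utils.
    import subprocess

    surl = "srm://dcache.example.rl.ac.uk/pnfs/example/lhcb/dst/0001.dst"   # placeholder SURL
    dest = "file:///tmp/0001.dst"

    # lcg-cp copies a Grid file (SRM/gsiftp SURL or LFN) to a local destination for a VO.
    subprocess.call(["lcg-cp", "--vo", "lhcb", surl, dest])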

Page 20: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK Tier 2 Centres

             CPU                                Disk
             ALICE  ATLAS  CMS   LHCb           ALICE  ATLAS  CMS   LHCb
London        0.0    1.0   0.8   0.4             0.0    0.2   0.3   11.0
NorthGrid     0.0    2.5   0.0   0.3             0.0    1.3   0.0   12.1
ScotGrid      0.0    0.2   0.0   0.2             0.0    0.0   0.0   39.6
SouthGrid     0.2    0.5   0.2   0.3             0.0    0.1   0.0    6.8

(Values are the ratio of the committed resources available to each experiment at Tier-2 in 2007 to the size of an average Tier-2 in that experiment's computing model.)

Hopefully, more resources from future funding bids

e.g. SRIF3 April 2006 – March 2008

Under-delivered – Tier1+Tier2 (March 2005): CPU = 2277 KSI2K out of 5184 KSI2K; Disk = 280 TB out of 968 TB.

Improving as hardware is deployed in the Tier 2 institutes.

Page 21: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Tier 2 Exploitation

Over 40 sites in the UKI federation of EGEE + over 20 Virtual Organisations.

GridPP only. Does not include Grid Ireland.

[Chart: Tier 2 usage by VO (LHCb, CMS, ATLAS, BaBar), 17 Aug, Grid Operations Centre; 800 data points – improved accounting prototype on the way…]

…but you get the idea. Tier 2 sites are a vital LHCb Grid resource.

Page 22: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


DIRAC Architecture

[Diagram: DIRAC Services Oriented Architecture.
User interfaces: production manager, GANGA UI, user CLI, job monitor, BK query web page, FileCatalog browser.
DIRAC services: Job Management Service, JobMonitorSvc, JobAccountingSvc (with Accounting DB), InformationSvc, FileCatalogSvc, MonitoringSvc, BookkeepingSvc.
DIRAC resources: DIRAC CEs at DIRAC sites; agents feeding the LCG Resource Broker and its CEs (CE 1, CE 2, CE 3); DIRAC storage (disk files accessed via gridftp, bbftp, rfio).]

Page 23: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Data Challenge 2004

[Plot: "LHCb DC'04" – produced events (M) per month (May–August 2004) and in total, split into LCG and DIRAC contributions; 187 M events produced. Annotations: DIRAC alone; LCG in action (1.8×10^6/day); LCG paused; LCG restarted (3–5×10^6/day); Phase 1 completed.]

20 DIRAC sites + 43 LCG sites were used.

Data written to Tier 1s.

• Overall, 50% of events produced using LCG.

• At end, 75% produced by LCG.

UK second largest producer (25%) after CERN.

Page 24: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


RTTC - 2005

Real Time Trigger Challenge – May/June 2005

150M minimum-bias events to feed the online farm and test the software trigger chain.

Completed in 20 days (169M events) on 65 different sites:
  95% produced with LCG sites
  5% produced with "native" DIRAC sites

Average of 10M events/day. Average of 4,000 CPUs.

Country                          Events produced
UK                                   60 M   (37%)
Italy                                42 M
Switzerland                          23 M
France                               11 M
Netherlands                          10 M
Spain                                 8 M
Russia                                3 M
Greece                                2.5 M
Canada                                2 M
Germany                               0.3 M
Belgium                               0.2 M
Sweden                                0.2 M
Romania, Hungary, Brazil, USA         0.8 M
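Summing the per-country table reproduces the quoted ~37% UK share (a small check; the sum of the rows sits slightly below the 169M headline figure):

    # Summing the per-country table reproduces the quoted UK share of ~37%.
    events_M = {"UK": 60, "Italy": 42, "Switzerland": 23, "France": 11,
                "Netherlands": 10, "Spain": 8, "Russia": 3, "Greece": 2.5,
                "Canada": 2, "Germany": 0.3, "Belgium": 0.2, "Sweden": 0.2,
                "Romania/Hungary/Brazil/USA": 0.8}

    total = sum(events_M.values())
    print("Total %.1f M events; UK share %.0f%%" % (total, 100.0 * events_M["UK"] / total))
    # -> Total 163.0 M events; UK share 37%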

Page 25: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


Looking Forward

[Timeline 2005–2008: SC3 → SC4 → LHC service operation; first beams, cosmics, first physics, full physics run.]

Next challenge: SC3 – Sept. 2005
Analysis at Tier 1s – Nov. 2005
Start DC06 processing phase – May 2006
Alignment/calibration challenge – October 2006
Ready for data taking – April 2007

Excellent support from UK Tier 1 at RAL. 2 application support posts at the Tier 1 appointed in June 2005, BUT the LHCb(UK) technical co-ordinator is still to be appointed.

Page 26: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


LHCb and SC3

Phase 1 (Sept. 2005):

a) Movement of 8 TB of digitised data from CERN/Tier 0 to the LHCb Tier 1 centres in parallel over a 2-week period (~10k files). Demonstrate automatic tools for data movement and bookkeeping.

b) Removal of replicas (via LFN) from all Tier 1 centres.

c) Redistribution of 4 TB of data from each Tier 1 centre to Tier 0 and the other Tier 1 centres over a 2-week period. Demonstrate data can be redistributed in real time to meet stripping demands.

d) Moving of stripped DST data (~1 TB, 190k files) from CERN to all Tier 1 centres.

Phase 2 (Oct. 2005):

a) MC production in Tier 2 centres with DST data collected in Tier 1 centres in real time, followed by stripping in Tier 1 centres (2 months). Data stripped as it becomes available.

b) Analysis of stripped data in Tier 1 centres.
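The Phase 1 targets translate into modest sustained rates. Derived figures only, assuming the transfers run evenly over the full two-week windows:

    # Implied sustained rates for the SC3 Phase 1 transfers, assuming the data
    # flows evenly over the full two-week windows (derived, not stated above).
    TWO_WEEKS_S = 14 * 24 * 3600

    cern_to_tier1_tb = 8.0      # (a) CERN/Tier 0 -> all Tier 1s, ~10k files
    per_tier1_out_tb = 4.0      # (c) each Tier 1 -> Tier 0 + other Tier 1s

    print("(a) %.1f MB/s aggregate out of CERN" % (cern_to_tier1_tb * 1e6 / TWO_WEEKS_S))
    print("(c) %.1f MB/s out of each Tier 1"   % (per_tier1_out_tb * 1e6 / TWO_WEEKS_S))
    # roughly 6.6 MB/s and 3.3 MB/s respectively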

Page 27: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


GridPP Status

GRIDPP1 Prototype Grid – £17M, complete

September 2001 – August 2004

GRIDPP2 Production Grid – £16M, ~20% complete

September 2004 – August 2007

Beyond August 2007?

Funding from September 2007 will be incorporated as part of PPARC’s request for planning input for LHC exploitation.

To be considered by panel (G. Lafferty, S. Watts & P. Harris) providing input to the Science Committee in the autumn.

Input from ALICE, ATLAS, CMS, LHCb and GRIDPP.

Page 28: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


LCG Status

[Timeline 2005–2008: SC3 → SC4 → LHC service operation; first beams, cosmics, first physics, full physics run. LCG-2 (=EGEE-0) prototyping through 2004–2005; LCG-3 (=EGEE-x?) as the product. We are here.]

LCG has two phases.

Phase 1: 2002–2005
  Build a service prototype, based on existing grid middleware.
  Gain experience in running a production grid service.
  Produce the TDR for the final system.
  (LCG and experiment TDRs submitted.)

Phase 2: 2006–2008
  Build and commission the initial LHC computing environment.

Page 29: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK: Workflow Control

Production Desktop – Gennady Kuznetsov (RAL)

[Diagram: production workflows are built from Steps and Modules. Sim (Gauss), Digi (Boole) and Reco (Brunel) steps run over the primary (B) event and the spill-over minimum-bias (MB) events. Each step is composed of modules: software installation, Gauss execution, check logfile, dir listing, bookkeeping report.]

Used for RTTC and current production/stripping.
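An illustrative sketch of the step/module composition in the diagram above (the class names are invented for this sketch, not the actual Production Desktop code):

    # Illustrative sketch: a production Step is an ordered list of Modules.
    class Module:
        def __init__(self, name, action=lambda: None):
            self.name, self.action = name, action
        def run(self):
            print("  module:", self.name)
            self.action()

    class Step:
        def __init__(self, name, modules):
            self.name, self.modules = name, modules
        def run(self):
            print("step:", self.name)
            for module in self.modules:
                module.run()

    sim_step = Step("Sim (Gauss)", [
        Module("software installation"),
        Module("Gauss execution"),
        Module("check logfile"),
        Module("dir listing"),
        Module("bookkeeping report"),
    ])
    sim_step.run()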

Page 30: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK: LHCb Metadata and ARDA – Carmine Cioffi (Oxford)

[Diagram: the LHCb bookkeeping exposed through ARDA. A web browser goes through a Tomcat servlet and the ARDA client API; the GANGA application uses the ARDA client API directly; both reach the ARDA server over TCP/IP streaming, in front of the bookkeeping database.]

Testbed underway to measure performance with ARDA and ORACLE servers.

Page 31: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK: GANGA Grid Interface

Karl Harrison (Cambridge), Alexander Soroko (Oxford), Alvin Tan (Birmingham), Ulrik Egede (Imperial), Andrew Maier (CERN), Kuba Moscicki (CERN)

[Diagram: Ganga 4. Jobs (scripts, Gaudi, Athena applications) are prepared and configured, their definitions stored and retrieved, then submitted, killed, monitored and their output collected on a range of backends: AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost, LSF. Plus split, merge, monitor and dataset selection.]

Ganga 4 beta release, 8th July.
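For flavour, a job configured and submitted in the style sketched in the diagram would look roughly like this. It is meant to be typed at a Ganga prompt rather than run as a standalone script, and the plugin names and options-file path are assumptions, not the exact Ganga 4 beta syntax:

    # Rough Ganga-session sketch; plugin names follow the diagram above and are
    # assumptions rather than the exact Ganga 4 beta API.
    j = Job()
    j.name = "DaVinci_analysis"
    j.application = DaVinci()                    # Gaudi-based application plugin
    j.application.optsfile = "jobOptions.opts"   # placeholder options file
    j.backend = Dirac()                          # could equally be LCG(), Local(), LSF(), ...
    j.submit()                                   # prepare, configure and submit
    print(j.status)                              # monitor; retrieve output when complete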

Page 32: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005


UK: Analysis with DIRAC

[Diagram: a DIRAC analysis job specifies its data as LFNs. Matching checks all SEs which have the data (or any site if no data is specified); the job waits in the Task Queue until an agent pulls it, then executes on a worker node, installs its software, and reads the data from the closest SE.]

Software Installation + Analysis via DIRAC WMS

Stuart Patterson (Glasgow)

DIRAC API for analysis job submission

[
  Requirements = other.Site == "DVtest.in2p3.fr";
  Arguments = "jobDescription.xml";
  JobName = "DaVinci_1";
  OutputData = { "/lhcb/test/DaVinci_user/v1r0/LOG/DaVinci_v12r11.alog" };
  parameters = [ STEPS = "1"; STEP_1_NAME = "0_0_1" ];
  SoftwarePackages = { "DaVinci.v12r11" };
  JobType = "user";
  Executable = "$LHCBPRODROOT/DIRAC/scripts/jobexec";
  StdOutput = "std.out";
  Owner = "paterson";
  OutputSandbox = { "std.out", "std.err", "DVNtuples.root", "DaVinci_v12r11.alog", "DVHistos.root" };
  StdError = "std.err";
  ProductionId = "00000000";
  InputSandbox = { "lib.tar.gz", "jobDescription.xml", "jobOptions.opts" };
  JobId = ID
]

PACMAN – DIRAC installation tools.

See later talk!
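To give a feel for what the API is assembling, here is a small illustrative generator for a classad-style job description like the one above. It is purely a sketch with a reduced field set, not the actual DIRAC API code:

    # Illustrative: build a classad-style JDL fragment like the one shown above.
    def make_jdl(**fields):
        def fmt(value):
            if isinstance(value, (list, tuple)):
                return "{ " + ", ".join('"%s"' % v for v in value) + " }"
            return '"%s"' % value
        body = ";\n  ".join("%s = %s" % (key, fmt(value)) for key, value in fields.items())
        return "[\n  " + body + "\n]"

    print(make_jdl(
        JobName="DaVinci_1",
        JobType="user",
        SoftwarePackages=["DaVinci.v12r11"],
        InputSandbox=["lib.tar.gz", "jobDescription.xml", "jobOptions.opts"],
        OutputSandbox=["std.out", "std.err", "DVHistos.root"],
        StdOutput="std.out",
        StdError="std.err",
    ))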

Page 33: LHCb Computing and Grid Status Glenn Patrick LHCb(UK), Dublin – 23 August 2005

Conclusion

Half way there! But the climb gets steeper and there may be more mountains beyond.

[Diagram: the climb from DC03 and DC04 through 2005 Monte-Carlo production on the Grid towards 2007 data taking, with data stripping, distributed reconstruction and distributed analysis still ahead.]