Computing tools and analysis architectures: the CMS computing strategy
M. Paganoni, HCP2007, La Biodola, 23/5/2007

TRANSCRIPT

Page 1:

Computing tools and analysis architectures: the CMS computing strategy

M. Paganoni, HCP2007
La Biodola, 23/5/2007

Page 2:

Outline
• CMS Computing and Analysis Model
• CMS workflow components
• 25% capacity test (CSA06 challenge)
• CMSSW validation
• LoadTest07, Site Availability Monitor and Grid gLite 3.1
• The goals for 2007
  – Physics validation with high statistics
  – Full detector readout during commissioning
  – 50% capacity test (CSA07 challenge)
• Analysis workflow

Page 3:

CMS schedule

[Timeline: March through November 2007]

1) Detector Installation, Commissioning & Operation
• First Global Readout Test
• Barrel ECAL Inserted
• Tracker Inserted
• Trigger/DAQ Ready for System Commissioning
• CMS Ready to Close

2) Preparation of Software, Computing and Physics Analysis
• HLT exercise complete
• Pre-CSA07 Computing Software Analysis Challenge
• 2007 Physics Analyses completed
• CSA07

All CMS Systems Ready for Global Data Taking

Page 4:

The present status of CMS computing

From development:
• service/data challenges (both WLCG-wide and experiment-specific) of increasing scale and complexity

to operations:
• data distribution
• MC production
• physics analysis

Primary needs:
• Smoothly running Tier-1s and Tier-2s, concurrent with other experiments
• Streamlined and automatic operations to ease the operational load
• Full monitoring, for early detection of Grid and site problems and to reach stability
• Sustainable operations in terms of data management, workload management, user support, site configuration and availability, and continuous significant load

Page 5:

The CMS Computing Model

Tier-0:
• accepts data from DAQ
• prompt reconstruction
• data archiving and distribution to T1s

Tier-1s:
• data and MC archiving
• re-processing
• skimming and other data-intensive analysis tasks
• data serving to T2s

Tier-2s (~30 sites):
• user data analysis
• MC production
• calibration/alignment and detector studies

Page 6:

CMS data formats and data flow

RAW:  ~1.5 MB/ev; 2 copies, 1 at T0 and 1 spread over T1s; 4.5 PB/yr
RECO: ~250 kB/ev; 1 copy spread over T1s; 2.1 PB/yr
AOD:  ~50 kB/ev; 1 copy to each T1, served to T2s; 2.6 PB/yr
TAG:  ~1-10 kB/ev

MC in 1:1 ratio with data
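The yearly volumes above are just event size × number of events × number of copies; a minimal sketch of the arithmetic (the 1.5 × 10⁹ events/yr used below is an assumed round number, not a figure from the slides):

```python
def volume_pb(event_size_bytes, events_per_year, copies=1):
    """Yearly storage volume in PB (1 PB = 1e15 bytes)."""
    return event_size_bytes * events_per_year * copies / 1e15

# RAW: ~1.5 MB/ev, 2 copies, assuming ~1.5e9 events/yr
raw = volume_pb(1.5e6, 1.5e9, copies=2)   # -> 4.5 PB/yr, as quoted
```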

Page 7:

The MC production

Production of 200M events (50M/month), for HLT and Physics Notes, started at T2s with the new MC Production System:
• less man-power consuming; better handling of Grid-site unreliability; better use of resources; automatic retrials; better error reporting/handling
• more flexible and automated architecture

ProdManager (PM) (+ the policy piece):
• manages the assignment of requests to one or more ProdAgents and tracks the global completion of the task

ProdAgent (PA):
• job creation, submission and tracking; management of merges, failures, resubmissions

[Diagram: a policy/scheduling controller and ProdManagers (official and development MC production) dispatch work to ProdAgents at Tier-0/1 and Tier-1/2 sites]
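The automatic-retrial behaviour credited to the ProdAgent can be caricatured as a simple resubmission loop; this is a hypothetical sketch, not the real ProdAgent code or API:

```python
# Hypothetical sketch of ProdAgent-style automatic retrials: a job is
# resubmitted until it succeeds or the retry budget is exhausted.
def run_with_retries(job, submit, max_retries=3):
    """submit(job) returns True on success, False on a recoverable failure."""
    for attempt in range(1, max_retries + 1):
        if submit(job):
            return attempt            # number of attempts actually used
    raise RuntimeError(f"{job} failed after {max_retries} attempts")

# Usage: a fake Grid site that fails twice, then succeeds.
outcomes = iter([False, False, True])
attempts = run_with_retries("job-42", lambda job: next(outcomes))  # -> 3
```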

Page 8:

CMS Remote Analysis Builder (CRAB)

CRAB is a user-oriented tool for Grid submission and handling of physics analysis jobs:
• data discovery (DBS/DLS)
• interactions with the Grid (also error handling, resubmission)
• output retrieval

Routinely used since 2004 on both EGEE and OSG (MTCC, PTDR, CSA06, tracker commissioning…)

New client-server architecture:
• improve scalability
• increase automation
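The CRAB workflow (discover the dataset files, split them into Grid jobs, submit, retrieve output) can be sketched in miniature; the function and file names here are invented, not the real CRAB interface:

```python
# Hypothetical sketch of a CRAB-like job-splitting step: group the
# files discovered via DBS/DLS into Grid jobs of bounded size.
def split_into_jobs(files, files_per_job=2):
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]

files = [f"dataset/file_{n}.root" for n in range(5)]   # invented names
jobs = split_into_jobs(files)    # -> 3 jobs: 2 + 2 + 1 files
```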

Page 9:

The data placement system (PhEDEx)

Data placement system for CMS, in production for more than 3 years:
• large-scale, reliable dataset/fileblock replication
• multi-hop routing following a transfer topology (T0 → T1s → T2s), data pre-staging from tape, data archiving to tape, monitoring, bookkeeping, priorities and policy, fail-over tactics

PhEDEx is made of a set of independent agents, integrated with the gLite File Transfer Service (FTS)

It works with both EGEE and OSG

Automatic subscription to DBS/DLS
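The multi-hop routing over a T0 → T1s → T2s topology amounts to shortest-path search on a site graph; a minimal sketch with invented site names (not the actual PhEDEx routing code):

```python
from collections import deque

# Hypothetical T0 -> T1 -> T2 transfer topology (site names invented).
TOPOLOGY = {
    "T0": ["T1_A", "T1_B"],
    "T1_A": ["T2_X", "T2_Y"],
    "T1_B": ["T2_Z"],
}

def route(src, dst):
    """Breadth-first search: shortest hop sequence from src to dst."""
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in TOPOLOGY.get(path[-1], []):
            queue.append(path + [nxt])
    return None

print(route("T0", "T2_Z"))   # ['T0', 'T1_B', 'T2_Z']
```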

Page 10:

Data processing workflow

[Diagram: data processing workflow]

Page 11:

Computing, Software and Analysis challenge 2006 (CSA06)

The first test of the complete CMS workflow and dataflow: a 60M-event exercise to ramp up to 25% of the 2008 capacity

T0: prompt reconstruction
• 207M events reconstructed (RECO, AOD), applying alignment/calibration from the offline DB
• 0.5 PB transferred to 7 T1s

T1s: skimming (to get manageable datasets), re-reconstruction
• automatic data serving to T2s via injection into PhEDEx and registration in DBS/DLS

T2s: access to the skimmed data, alignment/calibration jobs, physics analysis jobs
• submission of analysis jobs to the Grid with CRAB by single users and groups
• insertion of new constants into the offline DB

Page 12:

CSA06: T0 and T0 → T1

Prompt reconstruction at T0:
• peak rate: >300 Hz for >10 hours
• uptime: 100% over 4 weeks
• best efficiency: 96% (1400 CPUs) for ~12 h

T0 → T1 transfer:
• average rate: 250 MB/s
• peak rate: 650 MB/s
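As a sanity check, the quoted average rate and the 0.5 PB moved are mutually consistent with a roughly four-week challenge:

```python
# How long does 0.5 PB take at the CSA06 average rate of 250 MB/s?
volume_bytes = 0.5e15
rate = 250e6                        # bytes per second
days = volume_bytes / rate / 86400
print(round(days, 1))               # -> 23.1 days, i.e. ~the 4-week run
```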

Page 13:

CSA06: job submission

>50K jobs/day in the final week (of which 30K/day robot jobs):
• production jobs managed by ProdAgent
• analysis jobs submitted via CRAB to the Grid

90% job efficiency

[Plots: a typical CSA06 day; CRAB submissions]

Page 14:

CSA06: calibration

Calibration workflows exercised: φ-symmetry in minimum bias, single electrons, W → eν

ECAL calibration with the φ-symmetry of energy deposits in minimum-bias events: a few hours of data

[Plots: Z mass reconstruction; calibration workflow]

Page 15:

CSA06: alignment

Closing the loop: analysis of re-reconstructed Z → μμ data at a T1/T2 site
• determine the new alignment: run the HIP algorithm on multiple CPUs over a dedicated alignment skim from T0 (1M events ≈ 4 h on 20 CPUs)
• write the new alignment into the offline DB at T0
• distribute the offline DB to T1/T2s for re-reconstruction

[Plots: TIB module positions; reconstructed Z mass]

Page 16:

CMSSW validation: tracking

Reproduce with the CMSSW framework (1.2M lines of simulation, reconstruction and analysis software) the detector performance reported in PTDR vol. 1

[Plots: muons (CMSSW); CMSSW pixel seeding]

Page 17:

CMSSW validation: electrons

[Plots: electron classification; momentum at vertex; electron/supercluster matching]

Already improving on the PTDR results in many areas (forward tracking, electron reconstruction, …)

Page 18:

Site Availability Monitor

Measure site availability by testing:
• analysis submission
• production
• database caching
• data transfer

with the Site Availability Monitor (SAM) infrastructure, developed in collaboration with LCG and CERN/IT

The goal is 90% availability for T1s and 80% for T2s
Tests now run at each EGEE site every 2 hours
5 CMS-specific tests, with more under development
Feedback to site administrators, targeting individual components
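A SAM-style availability figure is just the fraction of 2-hourly test rounds a site passes, compared against the tier's target; a minimal sketch (the pass/fail bookkeeping is invented, only the 90%/80% targets come from the talk):

```python
# Targets quoted in the talk: 90% for T1s, 80% for T2s.
TARGETS = {"T1": 0.90, "T2": 0.80}

def available(results, tier):
    """results: one boolean per 2-hourly test round; True = all tests passed."""
    fraction = sum(results) / len(results)
    return fraction >= TARGETS[tier]

# 11 of 12 rounds passed in a day (~92%): enough for a T1.
print(available([True] * 11 + [False], "T1"))   # True
```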

Page 19:

WMS acceptance tests on gLite 3.1

115,000 jobs submitted in 7 days on a single WMS instance
• ~16,000 jobs/day, well exceeding the acceptance criteria

~0.3% of jobs with problems, well below the required threshold
• recoverable by the user with a proper command

The WMS dispatched jobs to computing elements with no noticeable delay
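The per-day rate and failure count follow directly from the raw totals quoted above:

```python
# Raw totals from the gLite 3.1 acceptance test.
jobs, days = 115_000, 7
per_day = jobs // days            # -> 16428, i.e. the quoted ~16,000/day
failed = round(jobs * 0.003)      # ~0.3% with problems -> ~345 jobs
```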

Page 20:

CMS LoadTest 2007

An infrastructure by CMS to help the Tiers exercise transfers:
• based on a new traffic load generator
• coordinated within the CMS Facilities/Infrastructure project

Exercises:
• T0 → T1 (tape), T1 → T1, T1 → T2 ('regional'), T1 → T2 ('non-regional')

Important achievements:
• routinely transferring
• all Tiers report it's useful
• higher participation of Tiers
• less effort, improved stability
• automatic, streamlined operations

[Plot: CMS LoadTest cycles, ~2.5 PB in 1.5 months, compared with CSA06 (T0-T1 only)]

Page 21:

Goals of Computing in 2007

Support of global data taking during detector commissioning:
• commissioning of the end-to-end chain: P5 → T0 → T1s (tape)
• data transfers and access through the complete data management system
• 3-4 days every month, starting in May

Demonstrate physics-analysis performance using final software with high statistics:
• major MC production of up to 200M events started in March
• analysis starts in June, finishes by September

Ramp up the distributed computing at scale (CSA07):
• 50% challenge of the 2008 system scale
• adding new functionalities: HLT farm (DAQ storage manager → T0), T1 → T1 and non-regional T1 → T2 transfers
• increase the user load for physics analysis

Page 22:

CSA07 workflow

Page 23:

CSA07 success metrics

Page 24:

CSA07 and Physics Analysis

We have roughly 10-15 T2s with sufficient storage and CPU resources to support multiple datasets
• skims in CSA06 were about ~500 GB; the largest of the raw samples was ~8 TB

Improvements in site availability with SAM
Improve non-regional Tier-1 → Tier-2 transfers
Publish data-hosting proposals for Tier-1 and Tier-2 sites

User analysis:
• distributed analysis through CRAB to Tier-2 centers
• dynamic use of the Tier-2 storage

Calibration workflow activities

Page 25:

Ingredients for analysis workflows

Event Filters:
• pre-select the analysis output

Event Producers:
• can create new content to be included in the analysis output

EDM output configurability:
• can keep or drop any collections

Flexibility in the event content; flexibility in the different steps of data reduction

[Diagram: an analysis job maps input to output; filters, producers and output configuration can be mixed in any combination]

Page 26:

Analysis workflow at Tier-0/CAF

HLT output → RAW → RECO → AOD (optional), for one in-time processed stream or the HLT primary streams

• Early discovery: express stream
• Physics Data Quality Monitoring
• Standard Model 'candles'
• Object ID efficiency
• Calibration with control samples
• Dedicated stream(s) for fast calibration → initial fast calibration

Actual output of the HLT farm still to be detailed…

Page 27:

Conclusions

Commissioning and integration remain major tasks in 2007
• balancing the needs of physics, computing and the detector will be a logistics challenge

The transition to Operations has started. Scaling to production level while keeping high efficiency is the critical point
• a continuous effort, to be monitored in detail

Keep the analysis model as flexible as possible

An increasing number of CMS people will be involved in the facilities, commissioning and operations to prepare for CMS physics analysis