
Page 1: ATLAS Common Computing Readiness Challenge II   and Full Dress Rehearsal II  a plan (version 1)


ATLAS

Common Computing Readiness Challenge II and Full Dress Rehearsal II

a plan (version 1)

ATLAS T0/1/2 Jamboree, CERN, April 24

Kors Bos, CERN/NIKHEF, ATLAS

Page 2

Overall plan

1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.

Page 3

Schedule: May

April, Week 18 (28/4 – 29/4): L1Calo + Calo run?

May, Week 18 (30/4 – 4/5): 3 days TRT + 3 days SCT. Sub-systems: transition to tdaq-01-09.

May, Week 19 (5/5 – 11/5): 2 days ID combined running including Pixel DAQ, towards the end of the week after the transition to 01-09. Start of magnet test; ~HLT algos available.

May, Week 20 (12/5 – 18/5): Calo + L1Calo + HLT. Timing, calo DQ, debugging, high rate, algo tests; finish with a stable weekend run? Parallel shifts: week days, morning expert work, evening calo + central desks; weekend, 24/7 calos + central desks.

May, Week 21 (19/5 – 25/5): Muon + L1Mu + HLT. Same as above; finish with a stable weekend run, with calos? Parallel shifts: week days, morning expert work, evening muon (calo?) + central desks; weekend, 24/7 muon (calos?) + central desks.

May, Week 22 (26/5 – 1/6): ID + DAQ + HLT. Beam pipe closure. Same as above; dedicated DAQ test after detector testing and before HLT testing; finish with a stable weekend run, with muons + calos? Parallel shifts: week days, morning expert work, evening ID (muon/calo?) + central desks; weekend, 24/7 ID (muon/calos?) + central desks.

Page 4

Schedule: June

So, we have to adapt to detector data taking:
– Monday - Wednesday: functional tests (data generator)
– Thursday: analysis & changeover
– Friday - Sunday: detector data

June, Week 23 (2/6 – 8/6): FDR-2; no Tier-0 for Detector Commissioning. Magnet test.

June, Week 24 (9/6 – 15/6): Magnet test.

June, Week 25 (16/6 – 22/6): LHC cold? Magnet test.

June, Week 26 (23/6 – 29/6): Magnet test.

July, Week 27: ATLAS running?

Page 5

Overall plan

1. T0 processing and data distribution

2. T1 data re-processing

3. T2 Simulation Production

4. T0/1/2 Physics Group Analysis

5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning

B. Fully rely on srmv2 everywhere

C. Test now at real scale (no data deletions)

D. Test the full show: shifts, communication, etc.

Page 6

-1- T0 processing and data distribution

Monday – Thursday: Data Generator
• Simulate running of 10 hours @ 200 Hz per day
  – nominal is 14 hours
  – run continuously at 40%
• Distribution of data to T1's and T2's
• Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape
• Request T2 storage class ATLASDATADISK
• Request T1 storage space for the full 4 weeks
• Tests of
  – data distribution
  – distribution latency
  – T0-T1, T1-T2, T1-T1

Thursday – Sunday: Detector Data
• Possibly uninterrupted data taking during the weekend
• Distribution of data to T1's and T2's
• Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape
• Request T2 storage class ATLASDATADISK
• Request T1 storage space for the full 4 weeks
• Tests of:
  – merging of small files
  – real T0 processing
  – data also to atldata at CERN
  – special requests

Page 7

ATLAS Data @ T0 - a reminder of the Computing Model

• RAW data arrives on disk and is archived to tape
• initial processing provides ESD, AOD and NTUP
• a fraction (10%) of RAW and ESD, and all AOD, is made available on disk at CERN in the atldata pool
• RAW data is distributed by ratio over the T1's to go to tape
• AOD is copied to each T1 to remain on disk and/or copied to T2's
• ESD follows the RAW to the T1 to remain on disk
• a second ESD copy is sent to the paired T1
• we may change this distribution for early running
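A toy sketch of this placement policy (an illustration, not ATLAS tooling; the choice of RAL/FZK as an example T1 pair is hypothetical):

    # Toy model of the T0 export policy described on this slide.
    SHARES = {"BNL": 0.25, "IN2P3": 0.15, "SARA": 0.15, "RAL": 0.10, "FZK": 0.10,
              "CNAF": 0.05, "ASGC": 0.05, "PIC": 0.05, "NDGF": 0.05, "Triumf": 0.05}

    def destinations(raw_t1: str, paired_t1: str) -> dict:
        """Where one RAW dataset's formats end up, following the model above."""
        return {
            "RAW": [f"{raw_t1}:ATLASDATATAPE"],                # RAW share to tape at one T1
            "ESD": [f"{raw_t1}:ATLASDATADISK",                 # ESD follows the RAW ...
                    f"{paired_t1}:ATLASDATADISK"],             # ... second copy at the paired T1
            "AOD": [f"{t1}:ATLASDATADISK" for t1 in SHARES],   # AOD copied to every T1
        }

    print(destinations("RAL", "FZK"))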

Page 8

[Diagram: data flow for ATLAS DATA. RAW from the Tier-0 t0atlas pool goes to tape and to the Tier-1 ATLASDATATAPE (tape) and ATLASDATADISK (disk) areas; ESD and AOD go to the Tier-1 ATLASDATADISK, with AOD copied on to the Tier-2 ATLASDATADISK; group analysis uses AOD in ATLASGRP at the Tier-1 and Tier-2, and end-user analysis uses AOD in ATLASENDUSER at the Tier-3.]

Page 9

Data sample per day, for when we run the generator (1 day)

10 hrs @ 200 Hz = 7.2 Mevents/day

In the T0:
• 20 TB/day RAW+ESD+AOD to tape
• 1.2 TB/day RAW to disk (10%)
• 0.7 TB/day ESD to disk (10%)
• 1.4 TB/day AOD to disk

The 5-day t0atlas buffer must be 100 TByte (the arithmetic behind these numbers is sketched after the table below).

T1       Share   Tape     Disk
BNL      25 %    2.9 TB   9 TB
IN2P3    15 %    1.7 TB   4 TB
SARA     15 %    1.7 TB   4 TB
RAL      10 %    1.2 TB   3 TB
FZK      10 %    1.2 TB   3 TB
CNAF      5 %    0.6 TB   2 TB
ASGC      5 %    0.6 TB   2 TB
PIC       5 %    0.6 TB   2 TB
NDGF      5 %    0.6 TB   2 TB
Triumf    5 %    0.6 TB   2 TB

RAW = 1.6 MB/event, ESD = 1 MB/event, AOD = 0.2 MB/event
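The daily volumes and T1 tape shares above follow from the trigger rate and the per-event sizes. A minimal Python sketch of that arithmetic (not part of the original slides; the decimal MB-to-TB conversion is an assumption, and the slide values are rounded slightly differently):

    # Back-of-the-envelope check of the daily CCRC-2 data volumes quoted above.
    HOURS_PER_DAY = 10                                   # simulated running per day
    RATE_HZ = 200                                        # event rate
    SIZE_MB = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2}       # per-event sizes from the slide

    events_per_day = HOURS_PER_DAY * 3600 * RATE_HZ      # 7.2 Mevents/day
    volume_tb = {fmt: events_per_day * mb / 1e6 for fmt, mb in SIZE_MB.items()}
    to_tape = sum(volume_tb.values())                    # ~20 TB/day RAW+ESD+AOD

    print(f"events/day           : {events_per_day / 1e6:.1f} M")
    print(f"RAW+ESD+AOD to tape  : {to_tape:.1f} TB/day ({28 * to_tape:.0f} TB in 28 days)")
    print(f"RAW to disk (10%)    : {0.1 * volume_tb['RAW']:.1f} TB/day")
    print(f"ESD to disk (10%)    : {0.1 * volume_tb['ESD']:.1f} TB/day")
    print(f"AOD to disk          : {volume_tb['AOD']:.1f} TB/day")
    print(f"5-day t0atlas buffer : {5 * to_tape:.0f} TB")

    # T1 tape share = fraction of the daily RAW volume (e.g. BNL 25% -> ~2.9 TB/day).
    for site, frac in {"BNL": 0.25, "IN2P3": 0.15, "RAL": 0.10, "CNAF": 0.05}.items():
        print(f"{site:6s}: {frac * volume_tb['RAW']:.1f} TB/day of RAW to tape")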

Page 10

Tape & Disk Space Requirements, for when we run the generator, for the 4 weeks of CCRC

10 hrs @ 200 Hz = 7.2 Mevents/day; CCRC is 4 weeks of 28 days

In the T0:
• 565 TB RAW+ESD+AOD to tape
• 32 TB RAW to disk (10%)
• 20 TB ESD to disk (10%)
• 40 TB AOD to disk

The atldata disk must be 92 TB for the month.

T1       Share   Tape    Disk
BNL      25 %    81 TB   250 TB
IN2P3    15 %    48 TB   106 TB
SARA     15 %    48 TB   106 TB
RAL      10 %    34 TB    84 TB
FZK      10 %    34 TB    84 TB
CNAF      5 %    17 TB    62 TB
ASGC      5 %    17 TB    62 TB
PIC       5 %    17 TB    62 TB
NDGF      5 %    17 TB    62 TB
Triumf    5 %    17 TB    62 TB

RAW = 1.6 MB/event, ESD = 1 MB/event, AOD = 0.2 MB/event

Page 11

ATLAS Data @ T1

• T1's are for data archive and re-processing
• and for group analysis on ESD and AOD data
• the RAW data share goes to tape @ T1
• a fraction (10%) of that also goes to disk
• each T1 receives a copy of all AOD files
• each T1 receives a share of the ESD files
  – ESD data sets follow the RAW data
  – an extra copy of that also goes to the "sister" T1
  – BNL takes a full copy
  – in total 2.5 copies of all ESD files world-wide

Space Token      Storage Type   Used for         Size
ATLASDATADISK    T0D1           RAW, ESD, AOD    by share
ATLASDATATAPE    T1D0           RAW              ~2 day buffer

Page 12

ATLAS Data @ T2

• T2's are for Monte Carlo Simulation Production
• ATLAS assumes there is no tape storage available
• also used for group analysis
  – each physics group has its own space token ATLASGRP<name>
  – e.g. ATLASGRPHIGGS, ATLASGRPSUSY, ATLASGRPMINBIAS
  – some initial volume for testing: 2 TB
• T2's may request AOD datasets
  – defined by the primary interest of the physics community
  – another full copy of all AOD's should be available in the cloud
• also for End-User Analysis
  – accounted as T3 activity, not under ATLAS control
  – storage space not accounted as ATLAS
  – but almost all T2's (and even T1's) need space for the token ATLASENDUSER
  – some initial value for testing: 2 TB

Space Token      Storage Type   Used For        Size [TB]
ATLASDATADISK    T0D1           RAW, ESD, AOD   depends on request
ATLASGRP         T0D1           Group Data      2
ATLASENDUSER     T0D1           End-User Data   to be discussed

Page 13

Nota Bene

• We had many ATLASGRP<group> storage areas, but (almost) none were used
• It seems, at this stage, that one for each VOMS group is over the top
• Much overhead to create too many small storage classes
• For now, a catch-all: ATLASGRP
• We may revert later when we better see the usage

Page 14

Overall plan

1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.

Page 15

-2- T1 re-processing

• Not at full scale yet, but at all T1's at least
• Subset of M5 data staged back from tape, per dataset
  – 10 datasets of 250 files each, plus 1 dataset of 5000 files
  – each file is ~2 GB; the total data volume is ~5 TB, the big dataset is 10 TB
• Conditions data on disk (140 files)
  – each re-processing job opens ~35 of those files
• M5 data file copied to the local disk of the WN
• Output ESD and AOD files
  – kept on disk and archived on tape (T1D1 storage class)
  – ESD files copied to one or two other T1's
  – AOD files copied to all other T1's

Space Token          Storage Type   Used for                           Size
ATLASDATADISKTAPE    T1D1           ESD, AOD, TAG from re-processing   by share
ATLASDATATAPE        T1D0           RAW                                ~2 day buffer

Page 16

Page 17

Resource requirements for M5 re-processing

• M5 RAW data will be distributed over the T1's
• One dataset with 5000 files of 2 GB each
• Pre-staging of 10 TByte (50 cassettes)
• Each job (1 file) takes ~30 minutes
• We request 50 CPUs to be through in 2 days
• Only (small) ESD output from re-processing
• So minimal requirements for the T1D1 pool
• A tape cache of 5 TB will require us to think
• So a 5 TB requirement for the T1D0 pool
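As a rough cross-check of the request above (not from the slides; 1000 GB = 1 TB is an assumption):

    # M5 re-processing: one 5000-file dataset of 2 GB files, one ~30-minute job per file.
    files, file_gb, job_min, cpus = 5000, 2, 30, 50

    prestage_tb = files * file_gb / 1000           # ~10 TB staged back from tape
    wall_hours = files * job_min / 60 / cpus       # ~50 h of wall-clock with 50 CPUs

    print(f"pre-stage volume: {prestage_tb:.0f} TB")
    print(f"wall-clock time : {wall_hours:.0f} h (~{wall_hours / 24:.1f} days)")
    print(f"small datasets  : {10 * 250 * file_gb / 1000:.0f} TB (10 datasets of 250 files)")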

Page 18

Overall plan

1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.

Page 19

-3- T2 Simulation Production

• Simulation of physics and background for FDR-2
• Need to produce ~30M events
• Simulation HITS (4 MB/ev), digitization RDO (2 MB/ev)
• Reconstruction ESD (1.1 MB/ev), AOD (0.2 MB/ev)
• Simulation is done at the T2
• HITS uploaded to the T1 and kept on disk
• In the T1: digitization; RDOs sent to BNL for mixing
• In the T1: reconstruction to ESD, AOD
  – ESD, AOD archived to tape at the T1
  – ESD copied to one or two other T1's
  – AOD copied to each other T1

Page 20

Tape & Disk Space Requirements for the FDR-2 production

0.5 Mevents/day; the FDR-2 production is 8 weeks, 30M events

In total:
• 120 TB HITS
• 60 TB RDO (at BNL)
• 33 TB ESD
• 6 TB AOD

T1       Share   Tape   Disk (HITS+RDO)
BNL      30 %    0 TB   36+60 TB
IN2P3    10 %    0 TB   18 TB
SARA      5 %    0 TB   12 TB
RAL      10 %    0 TB   18 TB
FZK      10 %    0 TB   12 TB
CNAF      5 %    0 TB   6 TB
ASGC      5 %    0 TB   6 TB
PIC       5 %    0 TB   6 TB
NDGF     10 %    0 TB   6 TB
Triumf   10 %    0 TB   6 TB

HITS = 4 MB/event, RDO = 2 MB/event, ESD = 1.1 MB/event, AOD = 0.2 MB/event
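A short cross-check of the totals above from the per-event sizes (not part of the slides; note that 8 weeks at 0.5 Mevents/day gives ~28M rather than 30M events, so the figures are approximate):

    # FDR-2 simulation production volumes for 30M events.
    EVENTS = 30e6
    SIZE_MB = {"HITS": 4.0, "RDO": 2.0, "ESD": 1.1, "AOD": 0.2}   # per event

    for fmt, mb in SIZE_MB.items():
        print(f"{fmt:4s}: {EVENTS * mb / 1e6:5.0f} TB")           # 120, 60, 33, 6 TB

    print(f"duration at 0.5 Mevents/day: {EVENTS / 0.5e6:.0f} days")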

Page 21

[Diagram: data flow for Simulation Production. HITS from simulation at the Tier-2 (ATLASMCDISK) are uploaded to the Tier-1 ATLASMCDISK and archived via ATLASMCTAPE; digitization and pile-up at the Tier-1 produce RDOs, which are sent to BNL for mixing (BS); reconstruction at the Tier-1 produces ESD and AOD, kept on ATLASMCDISKTAPE and archived to tape, with ESD/AOD copies going to the other Tier-1's.]

Page 22

Storage Types @ T2, for simulation production

Space Token    Storage Type   Used For   Size [TB]
ATLASMCDISK    T0D1           HITS       scales with # CPUs

Additional storage types @ T1, for simulation production

Space Token        Storage Type   Used for                             Size
ATLASMCDISK        T0D1           HITS from T2's; ESD, AOD from T1's   by cloud capacity
ATLASMCTAPE        T1D0           HITS from MC                         buffer for tape
ATLASMCDISKTAPE    T1D1           ESD, AOD from reconstruction         by cloud capacity

Page 23

Overall plan

1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.

Page 24

-4- T0/1/2 Physics Group Analysis

• done at T0 & T1 & T2 ... not at T3's
• production of primary Derived Physics Data files (DPD's)
• DPD's are 10% of AOD's in size ... but there are 10 times more of them (see the sketch after the table below)
• primary DPD's are produced from ESD and AOD at the T1's
• secondary DPD's are produced at the T1's and T2's
• also other file types may be produced (ntuples, histograms)
• jobs are always run by group managers; data always goes to/from disk
• writable for group managers only, readable by all of ATLAS

Space Token Storage Type Used For Size [TB]

ATLASGRP T0D1 Group Data 2
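An illustrative reading of the DPD sizing bullet above (my arithmetic, not the slides'): if one primary DPD is ~10% of the AOD size but there are ~10 times as many of them, the total primary DPD volume is roughly the same as the AOD volume.

    # Primary DPD volume relative to AOD, using the AOD rate quoted earlier (~1.4 TB/day).
    aod_tb_per_day = 1.4
    dpd_size_fraction = 0.10      # one DPD is ~10% of the AOD size
    n_dpd_types = 10              # ~10x more DPDs than AODs

    dpd_tb_per_day = aod_tb_per_day * dpd_size_fraction * n_dpd_types
    print(f"primary DPDs: ~{dpd_tb_per_day:.1f} TB/day, comparable to the AOD volume")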

Page 25

Overall plan

1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.

Page 26

-5- T0/1/2/3 End-User Analysis

• done at T0 & T1 & T2 & T3's
• users can run (CPU) anywhere there are ATLAS resources
• but can only write where they have write permission (home institute)
• each site can decide how to implement this (T1D0, T0D1)
• data must be registered in the catalog
• non-registered data is really Tier-3 or laptop
• longer discussion tomorrow in the ATLAS Jamboree

Space Token Storage Type Used For Size [TB]

ATLASENDUSER site defined End-User Data site defined

Page 27

Summary Table for a 10% T1

Space Token          Storage Type   Used for                              Size
ATLASDATADISK        T0D1           RAW, ESD, AOD                         84 TB (T1 share)
ATLASDATATAPE        T1D0           tape buffer for RAW                   5 TB (~2 day buffer)
ATLASDATADISKTAPE    T1D1           ESD, AOD from re-processing           2 TB (T1 share)
ATLASMCDISK          T0D1           HITS from T2's; ESD, AOD from T1's    18 TB (by cloud capacity)
ATLASMCTAPE          T1D0           tape buffer for HITS                  (not used yet)
ATLASMCDISKTAPE      T1D1           ESD, AOD from reconstruction          2 TB (by cloud capacity)
ATLASGRP             T0D1           Group Data                            2 TB
ATLASENDUSER         site defined   End-User Data                         site defined

Page 28

Summary Table for a typical T2

Space Token      Storage Type   Used for                               Size
ATLASDATADISK    T0D1           RAW, ESD, AOD on request from the T1   10 TB (depending on request)
ATLASMCDISK      T0D1           HITS                                   10 TB (scales with # CPUs)
ATLASGRP         T0D1           Group Data                             2 TB
ATLASENDUSER     site defined   End-User Data                          site defined (depending on # users)

Page 29

Detailed Planning

• Functional Tests using the data generator

– First week Monday through Thursday

• T1-T1 tests for all sites (again)– Second week Tuesday through Sunday

• Throughput Tests using data generator– Third week Monday through Thursday

• Contingency– Fourth week Monday through Sunday

• Detector Commissioning with Cosmic Rays– Each week Thursday through Sunday

• Reprocessing M5 data – Each week Tuesday through Sunday

• Clean-up– each Monday

• Remove all test data– last weekend: May 31 & June 1

• Full Dress Rehearsal– June 2 through 10

Page 30

Detailed Planning

Page 31

Metrics and Milestones

• still to be defined

Page 32

References

• CCRC and Space Tokens Twiki: https://twiki.cern.ch/twiki/bin/view/Atlas/SpaceTokens#CCRC08_2_Space_Token_and_Disk_Sp
• ADC Ops. eLog (certificate protected): https://prod-grid-logger.cern.ch/elog/ATLAS+Computer+Operations+Logbook/