1
ATLAS
Common Computing Readiness Challenge II and Full Dress Rehearsal II
a plan (version 1)
ATLAS T0/1/2 Jamboree, CERN, April 24
Kors Bos, CERN/NIKHEF, ATLAS
2
Overall plan
1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.
3
Schedule: May

April, Week 18 (28/4 – 29/4): L1Calo + Calo run?
May, Week 18 (30/4 – 4/5): 3 days TRT + 3 days SCT. Sub-systems: transition to tdaq-01-09.
May, Week 19 (5/5 – 11/5): 2 days of ID combined running including Pixel DAQ, towards the end of the week after the transition to 01-09. Start of the magnet test; ~HLT algos available.
May, Week 20 (12/5 – 18/5): Calo + L1Calo + HLT. Timing, calo DQ, debugging, high rate, algo tests; finish with a stable weekend run? Parallel shifts: weekdays, morning expert work, evening calo + central desks; weekend, 24/7 calos + central desks.
May, Week 21 (19/5 – 25/5): Muon + L1Mu + HLT. Same as above; finish with a stable weekend run, with calos? Parallel shifts: weekdays, morning expert work, evening muon (calo?) + central desks; weekend, 24/7 muon (calos?) + central desks.
May, Week 22 (26/5 – 1/6): ID + DAQ + HLT; beam pipe closure. Same as above; dedicated DAQ test after detector testing and before HLT testing; finish with a stable weekend run, with muons + calos? Parallel shifts: weekdays, morning expert work, evening ID (muon/calo?) + central desks; weekend, 24/7 ID (muon/calos?) + central desks.

4
Schedule: June

So we have to adapt to detector data taking:
– Monday – Wednesday: functional tests (data generator)
– Thursday: analysis & changeover
– Friday – Sunday: detector data

June, Week 23 (2/6 – 8/6): FDR-2; no Tier-0 for Det.Com.; magnet test.
June, Week 24 (9/6 – 15/6): magnet test.
June, Week 25 (16/6 – 22/6): LHC cold? Magnet test.
June, Week 26 (23/6 – 29/6): magnet test.
July, Week 27: ATLAS running?
5
Overall plan
1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis
A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.
6
-1- T0 processing and data distribution
• Monday – Thursday: Data Generator
  • Simulate running of 10 hours @ 200 Hz per day
    – nominal is 14 hours
    – run continuously at 40%
  • Distribution of data to T1's and T2's
  • Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape
  • Request T2 storage class ATLASDATADISK
  • Request T1 storage space for the full 4 weeks
  • Tests of:
    – data distribution
    – distribution latency
    – T0-T1, T1-T2, T1-T1 transfers
• Thursday – Sunday: Detector Data
  • Possibly uninterrupted data taking during the weekend
  • Distribution of data to T1's and T2's
  • Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape
  • Request T2 storage class ATLASDATADISK
  • Request T1 storage space for the full 4 weeks
  • Tests of:
    – merging of small files
    – real T0 processing
    – data also to atldata at CERN
    – special requests
7
ATLAS Data @ T0 (reminder of the Computing Model)
• Raw data arrives on disk and is archived to tape
• Initial processing provides ESD, AOD and NTUP
• A fraction (10%) of RAW and ESD, and all AOD, is made available on disk at CERN in the atldata pool
• RAW data is distributed by ratio over the T1's to go to tape
• AOD is copied to each T1 to remain on disk and/or copied to T2's
• ESD follows the RAW to the T1 to remain on disk
• A second ESD copy is sent to the paired T1
• We may change this distribution for early running
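These distribution rules can be read as a simple routing policy. The sketch below is only an illustration of that policy as stated on this slide; the T1 pairing and the function are placeholders invented for this example, not ATLAS's actual distribution machinery (the shares are the ones quoted later on the per-day slide):

```python
# Illustrative sketch of the T0 export policy described above. The pairing and the
# fallback to BNL are assumptions for this example, not the real ATLAS configuration.
T1_SHARES = {"BNL": 0.25, "IN2P3": 0.15, "SARA": 0.15, "RAL": 0.10, "FZK": 0.10,
             "CNAF": 0.05, "ASGC": 0.05, "PIC": 0.05, "NDGF": 0.05, "TRIUMF": 0.05}
PAIRED_T1 = {"IN2P3": "SARA", "SARA": "IN2P3"}  # hypothetical pairing, illustration only

def destinations(data_type, raw_t1):
    """Destinations for one dataset whose RAW share went to `raw_t1`."""
    if data_type == "RAW":
        # RAW is distributed by share and goes to tape at that T1.
        return [f"{raw_t1}:ATLASDATATAPE"]
    if data_type == "ESD":
        # ESD follows the RAW to its T1 disk; a second copy goes to the paired T1
        # (BNL keeps a full ESD copy, used here as the fallback).
        return [f"{raw_t1}:ATLASDATADISK", f"{PAIRED_T1.get(raw_t1, 'BNL')}:ATLASDATADISK"]
    if data_type == "AOD":
        # AOD is copied to every T1 disk (and may be copied on to T2's).
        return [f"{t1}:ATLASDATADISK" for t1 in T1_SHARES]
    raise ValueError(f"unknown data type: {data_type}")

print(destinations("ESD", "RAL"))   # ['RAL:ATLASDATADISK', 'BNL:ATLASDATADISK']
```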
8
[Diagram: Data flow for ATLAS DATA. RAW in the t0atlas pool at Tier-0 goes to tape and is exported to the Tier-1 ATLASDATATAPE and ATLASDATADISK areas; ESD and AOD flow on to Tier-2 ATLASDATADISK and to ATLASGRP areas for group analysis, and to Tier-3 ATLASENDUSER areas for end-user analysis.]
9
Data sample per day (for when we run the generator, for 1 day)

10 hrs @ 200 Hz = 7.2 Mevents/day

In the T0:
• 20 TB/day RAW+ESD+AOD to tape
• 1.2 TB/day RAW to disk (10%)
• 0.7 TB/day ESD to disk (10%)
• 1.4 TB/day AOD to disk

A 5-day t0atlas buffer must be 100 TB.
T1      Share   Tape [TB]   Disk [TB]
BNL     25 %    2.9         9
IN2P3   15 %    1.7         4
SARA    15 %    1.7         4
RAL     10 %    1.2         3
FZK     10 %    1.2         3
CNAF    5 %     0.6         2
ASGC    5 %     0.6         2
PIC     5 %     0.6         2
NDGF    5 %     0.6         2
Triumf  5 %     0.6         2

RAW = 1.6 MB, ESD = 1 MB, AOD = 0.2 MB (per event)
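As a back-of-the-envelope check of these per-day numbers, the short sketch below recomputes them from the rate and the per-event sizes quoted on this slide (illustrative arithmetic only, not an official tool):

```python
# Rough check of the per-day volumes quoted above (illustrative arithmetic only).
RATE_HZ, HOURS_PER_DAY = 200, 10
MB_PER_EVENT = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2}   # per-event sizes from this slide

events = RATE_HZ * HOURS_PER_DAY * 3600                          # 7.2 Mevents/day
tb = {k: events * mb / 1e6 for k, mb in MB_PER_EVENT.items()}    # TB/day per format

print(f"events/day          : {events / 1e6:.1f} M")
print(f"RAW+ESD+AOD to tape : {sum(tb.values()):.1f} TB/day")    # ~20 TB/day
print(f"RAW to disk (10%)   : {0.1 * tb['RAW']:.1f} TB/day")     # ~1.2 TB/day
print(f"ESD to disk (10%)   : {0.1 * tb['ESD']:.1f} TB/day")     # ~0.7 TB/day
print(f"AOD to disk         : {tb['AOD']:.1f} TB/day")           # ~1.4 TB/day
print(f"5-day t0atlas buffer: {5 * sum(tb.values()):.0f} TB")    # ~100 TB
```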
10
Tape & Disk Space Requirements (for when we run the generator, for the 4 weeks of CCRC)

10 hrs @ 200 Hz = 7.2 Mevents/day; CCRC is 4 weeks of 28 days

In the T0:
• 565 TB RAW+ESD+AOD to tape
• 32 TB RAW to disk (10%)
• 20 TB ESD to disk (10%)
• 40 TB AOD to disk

The atldata disk must be 92 TB for the month.
T1      Share   Tape [TB]   Disk [TB]
BNL     25 %    81          250
IN2P3   15 %    48          106
SARA    15 %    48          106
RAL     10 %    34          84
FZK     10 %    34          84
CNAF    5 %     17          62
ASGC    5 %     17          62
PIC     5 %     17          62
NDGF    5 %     17          62
Triumf  5 %     17          62

RAW = 1.6 MB, ESD = 1 MB, AOD = 0.2 MB (per event)
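The per-T1 numbers in this table can be reproduced approximately from the distribution rules on the "ATLAS Data @ T0/T1" slides. The sketch below is one reading of those rules (a full AOD copy everywhere, own plus paired-T1 ESD share with BNL holding a full ESD copy, 10% of the RAW share on disk, the full RAW share on tape); it is an illustrative cross-check, not an official formula:

```python
# Approximate reconstruction of the 28-day per-T1 tape/disk numbers (illustrative;
# this encodes one reading of the distribution rules, not an official formula).
DAYS = 28
EVENTS = 200 * 10 * 3600 * DAYS                                  # events for the month
RAW, ESD, AOD = (EVENTS * mb / 1e6 for mb in (1.6, 1.0, 0.2))    # TB for the month

SHARES = {"BNL": 0.25, "IN2P3": 0.15, "SARA": 0.15, "RAL": 0.10, "FZK": 0.10,
          "CNAF": 0.05, "ASGC": 0.05, "PIC": 0.05, "NDGF": 0.05, "TRIUMF": 0.05}

for t1, f in SHARES.items():
    tape = f * RAW                                    # RAW share to tape
    esd_disk = ESD if t1 == "BNL" else 2 * f * ESD    # own share + paired-T1 copy; BNL full copy
    disk = AOD + esd_disk + 0.1 * f * RAW             # full AOD + ESD copies + 10% of RAW share
    print(f"{t1:7s} tape {tape:4.0f} TB   disk {disk:4.0f} TB")
```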
11
ATLAS Data @ T1
• T1's are for data archiving and re-processing
• and for group analysis on ESD and AOD data
• the RAW data share goes to tape at the T1
• a fraction (10%) of that also goes to disk
• each T1 receives a copy of all AOD files
• each T1 receives a share of the ESD files
  – ESD datasets follow the RAW data
  – an extra copy of that also goes to the "sister" T1
  – BNL takes a full copy
  – in total ~2.5 copies of all ESD files world-wide
Space Token      Storage Type   Used for         Size
ATLASDATADISK    T0D1           RAW, ESD, AOD    by share
ATLASDATATAPE    T1D0           RAW              ~2 day buffer
12
ATLAS Data @ T2
• T2's are for Monte Carlo simulation production
• ATLAS assumes there is no tape storage available
• also used for group analysis
  – each physics group has its own space token ATLASGRP<name>
  – e.g. ATLASGRPHIGGS, ATLASGRPSUSY, ATLASGRPMINBIAS
  – some initial volume for testing: 2 TB
• T2's may request AOD datasets
  – defined by the primary interest of the physics community
  – another full copy of all AOD's should be available in the cloud
• also used for end-user analysis
  – accounted as T3 activity, not under ATLAS control
  – storage space not accounted as ATLAS
  – but almost all T2's (and even T1's) need space for the token ATLASENDUSER
  – some initial value for testing: 2 TB
Space Token     Storage Type   Used For        Size [TB]
ATLASDATADISK   T0D1           RAW, ESD, AOD   dep. on request
ATLASGRP        T0D1           Group Data      2
ATLASENDUSER    T0D1           End-User Data   to be discussed
13
Nota Bene
• We had many ATLASGRP<group> storage areas, but (almost) none were used
• It seems, at this stage, that one for each VOMS group is over the top
• It is much overhead to create too many small storage classes
• For now, a single catch-all: ATLASGRP
• We may revert later, when we see the usage better
14
Overall plan
1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.
15
-2- T1 re-processing
• Not at full scale yet, but at least at all T1's
• Subset of M5 data staged back from tape, per dataset
  – 10 datasets of 250 files each, plus 1 dataset of 5000 files
  – each file is ~2 GB; total data volume is ~5 TB, the big dataset is 10 TB
• Conditions data on disk (140 files)
  – each re-processing job opens ~35 of those files
• M5 data file copied to the local disk of the WN
• Output ESD and AOD files
  – kept on disk and archived on tape (T1D1 storage class)
  – ESD files copied to one or two other T1's
  – AOD files copied to all other T1's
Space Token Storage Type Used for Size
ATLASDATADISKTAPE   T1D1   ESD, AOD, TAG from re-processing   by share
ATLASDATATAPE       T1D0   RAW                                ~2 day buffer
16
17
Resource requirements for M5 re-processing
• M5 RAW data will be distributed over the T1's
• One dataset with 5000 files of 2 GB each
• Pre-staging of 10 TB (50 cassettes)
• Each job (1 file) takes ~30 minutes
• We request 50 CPUs to get through in 2 days
• Only (small) ESD output from re-processing
• So minimal requirements for the T1D1 pool
• A tape cache of 5 TB will require us to think
• So a 5 TB requirement for the T1D0 pool
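A quick check of the CPU and tape numbers above (illustrative arithmetic only):

```python
# Quick check of the M5 re-processing estimate (illustrative arithmetic only).
files, gb_per_file = 5000, 2
minutes_per_job, cpus = 30, 50

prestage_tb = files * gb_per_file / 1000          # ~10 TB to pre-stage from tape
wall_hours = files * minutes_per_job / 60 / cpus  # total wall time with 50 CPUs
print(f"pre-stage volume: {prestage_tb:.0f} TB")
print(f"wall time       : {wall_hours:.0f} h, i.e. ~{wall_hours / 24:.0f} days")  # ~50 h, ~2 days
```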
18
Overall plan
1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.
19
-3- T2 Simulation Production
• Simulation of physics and background samples for FDR-2
• Need to produce ~30M events
• Simulation produces HITS (4 MB/ev), digitization produces RDO (2 MB/ev)
• Reconstruction produces ESD (1.1 MB/ev) and AOD (0.2 MB/ev)
• Simulation is done at the T2's
• HITS are uploaded to the T1 and kept on disk
• In the T1: digitization; RDOs are sent to BNL for mixing
• In the T1: reconstruction to ESD and AOD
  – ESD and AOD archived to tape at the T1
  – ESD copied to one or two other T1's
  – AOD copied to each other T1
20
Tape & Disk Space Requirements for the FDR-2 production

0.5 Mevents/day; the FDR-2 production runs for 8 weeks, ~30M events

In total:
• 120 TB HITS
• 60 TB RDO (BNL)
• 33 TB ESD
• 6 TB AOD
T1      Share   Tape [TB]   Disk (HITS+RDO) [TB]
BNL     30 %    0           36+60
IN2P3   10 %    0           18
SARA    5 %     0           12
RAL     10 %    0           18
FZK     10 %    0           12
CNAF    5 %     0           6
ASGC    5 %     0           6
PIC     5 %     0           6
NDGF    10 %    0           6
Triumf  10 %    0           6

HITS = 4 MB, RDO = 2 MB, ESD = 1.1 MB, AOD = 0.2 MB (per event)
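A quick cross-check of these totals from the per-event sizes and the 30M-event target (illustrative arithmetic only):

```python
# Rough check of the FDR-2 production volumes (illustrative arithmetic only).
n_events = 30e6
mb_per_event = {"HITS": 4.0, "RDO": 2.0, "ESD": 1.1, "AOD": 0.2}   # sizes from this slide

for fmt, mb in mb_per_event.items():
    print(f"{fmt:4s}: {n_events * mb / 1e6:5.1f} TB")       # HITS 120, RDO 60, ESD 33, AOD 6
print(f"days at 0.5 Mevents/day: {n_events / 0.5e6:.0f}")   # ~60 days, i.e. roughly 8 weeks
```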
21
[Diagram: Data flow for Simulation Production. HITS from simulation at the Tier-2's go to the Tier-1 ATLASMCDISK area and to tape via ATLASMCTAPE; RDOs from digitization/pile-up are sent to BNL for mixing into byte stream (BS); ESD and AOD from reconstruction at the Tier-1 go to ATLASMCDISKTAPE and tape, and are copied to the other Tier-1's and to Tier-2 ATLASMCDISK.]
22
Storage types @ T2 for simulation production

Space Token   Storage Type   Used For   Size [TB]
ATLASMCDISK   T0D1           HITS       scales with #CPUs

Additional storage types @ T1 for simulation production

Space Token       Storage Type   Used For                             Size
ATLASMCDISK       T0D1           HITS from T2's; ESD, AOD from T1's   by cloud capacity
ATLASMCTAPE       T1D0           HITS from MC                         buffer for tape
ATLASMCDISKTAPE   T1D1           ESD, AOD from reconstruction         by cloud capacity
23
Overall plan
1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.
24
-4- T0/1/2 Physics Group Analysis
• done at T0 & T1 & T2 … not at T3's
• production of primary Derived Physics Data files (DPD's)
• DPD's are 10% of AOD's in size … but there are ~10× more of them, so the total DPD volume is roughly that of the AOD's
• primary DPD's are produced from ESD and AOD at the T1's
• secondary DPD's are produced at T1's and T2's
• also other file types may be produced (ntup's, hist's)
• jobs always run by managers; data always goes to/from disk
• writable for group managers only, readable by all of ATLAS
Space Token Storage Type Used For Size [TB]
ATLASGRP T0D1 Group Data 2
25
Overall plan
1. T0 processing and data distribution
2. T1 data re-processing
3. T2 Simulation Production
4. T0/1/2 Physics Group Analysis
5. T0/1/2/3 End-User Analysis

A. Synchronously with Data from Detector Commissioning
B. Fully rely on srmv2 everywhere
C. Test now at real scale (no data deletions)
D. Test the full show: shifts, communication, etc.
26
-5- T0/1/2/3 End-User Analysis
• done at T0 & T1 & T2 & T3's
• users can run (CPU) anywhere where there are ATLAS resources
• but they can only write where they have write permission (home institute)
• each site can decide how to implement this (T1D0, T0D1)
• data must be registered in the catalog
• non-registered data is really Tier-3 or laptop data
• longer discussion tomorrow in the ATLAS Jamboree
Space Token Storage Type Used For Size [TB]
ATLASENDUSER site defined End-User Data site defined
27
Summary Table for a 10% T1

Space Token         Storage Type   Used for                             Size
ATLASDATADISK       T0D1           RAW, ESD, AOD                        84 TB (T1 share)
ATLASDATATAPE       T1D0           tape buffer for RAW                  5 TB (~2 day buffer)
ATLASDATADISKTAPE   T1D1           ESD, AOD from re-processing          2 TB (T1 share)
ATLASMCDISK         T0D1           HITS from T2's; ESD, AOD from T1's   18 TB (by cloud capacity)
ATLASMCTAPE         T1D0           tape buffer for HITS                 (not used yet)
ATLASMCDISKTAPE     T1D1           ESD, AOD from reconstruction         2 TB (by cloud capacity)
ATLASGRP            T0D1           Group Data                           2 TB
ATLASENDUSER        site defined   End-User Data                        site defined
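For illustration, the same summary expressed as a plain data structure, with the implied total disk request derived from it; the format and names are invented for this example and are not an official ATLAS or SRM configuration:

```python
# The 10% T1 summary table as a plain data structure (illustrative only).
SPACE_TOKENS_10PCT_T1 = {
    "ATLASDATADISK":     {"storage": "T0D1", "used_for": "RAW, ESD, AOD",                    "size_tb": 84},
    "ATLASDATATAPE":     {"storage": "T1D0", "used_for": "tape buffer for RAW",              "size_tb": 5},
    "ATLASDATADISKTAPE": {"storage": "T1D1", "used_for": "ESD, AOD from re-processing",      "size_tb": 2},
    "ATLASMCDISK":       {"storage": "T0D1", "used_for": "HITS from T2s; ESD, AOD from T1s", "size_tb": 18},
    "ATLASMCTAPE":       {"storage": "T1D0", "used_for": "tape buffer for HITS",             "size_tb": None},  # not used yet
    "ATLASMCDISKTAPE":   {"storage": "T1D1", "used_for": "ESD, AOD from reconstruction",     "size_tb": 2},
    "ATLASGRP":          {"storage": "T0D1", "used_for": "group data",                       "size_tb": 2},
    "ATLASENDUSER":      {"storage": None,   "used_for": "end-user data",                    "size_tb": None},  # site defined
}

disk_total = sum(v["size_tb"] or 0 for v in SPACE_TOKENS_10PCT_T1.values()
                 if v["storage"] in ("T0D1", "T1D1"))
print(f"requested disk (T0D1 + T1D1): {disk_total} TB")   # 84 + 2 + 18 + 2 + 2 = 108 TB
```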
28
Summary Table for a typical T2
Space Token     Storage Type   Used for                            Size
ATLASDATADISK   T0D1           RAW, ESD, AOD on request from T1    10 TB (depending on request)
ATLASMCDISK     T0D1           HITS                                10 TB (scales with #CPUs)
ATLASGRP        T0D1           Group Data                          2 TB
ATLASENDUSER    site defined   End-User Data                       site defined (depending on # of users)
29
Detailed Planning
• Functional tests using the data generator
  – first week, Monday through Thursday
• T1-T1 tests for all sites (again)
  – second week, Tuesday through Sunday
• Throughput tests using the data generator
  – third week, Monday through Thursday
• Contingency
  – fourth week, Monday through Sunday
• Detector commissioning with cosmic rays
  – each week, Thursday through Sunday
• Reprocessing of M5 data
  – each week, Tuesday through Sunday
• Clean-up
  – each Monday
• Remove all test data
  – last weekend: May 31 & June 1
• Full Dress Rehearsal
  – June 2 through 10
30
Detailed Planning
31
Metrics and Milestones
• still to be defined
32
References
• CCRC and Space Tokens Twiki: https://twiki.cern.ch/twiki/bin/view/Atlas/SpaceTokens#CCRC08_2_Space_Token_and_Disk_Sp
• ADC Ops. eLog (certificate protected): https://prod-grid-logger.cern.ch/elog/ATLAS+Computer+Operations+Logbook/