Offline/MC status report. M. Moulson, 6th KLOE Physics Workshop, Sabaudia, 10-13 May 2006
Offline/MC tasks for 2006
Define definitive KLOE data set:
• Close holes in data reconstruction and DST coverage
  – DST file size problem
  – Make sure all reconstructed runs are complete
• Reprocessing of various data sets
  – Most critical is 2004 data with bad wire maps
• Data quality
  – Completeness of HepDB entries for MC production
  – Fine survey for analysis purposes
MC production for 2004-2005
Re-reconstruction of 2001/2002 MC sample?
  – Requested by K group - problem of bad wire maps
∫L dt (pb-1)
tag 100
2001 169
2002 292
2004 691
2005 1242
total 2394
Reconstruction of 2004-2005 data
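[Bar chart: integrated luminosity in pb-1 (axis from 0 to 2000) available at each processing stage. Labels: drn/drc, d3p, dk0, dkc, recon, tag 100, raw. Values as extracted, without a recoverable assignment to labels: 1930, 1740, 1970, 990, 1480, 1520, 1480.]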
Runs 28700 (9 May 04) to 41902 (5 Dec 05). Updated to reflect DB work 21 Mar 06.
DST size problem
32-bit I/O pointers in Fortran: maximum file size = 2 GB
• There may be a workaround (esp. for reading w/ KID) but not easy!
What does 2 GB limit mean? Big headache!

Max size (nb-1):
Stream     DST     MCDST
dkc/mkc    210     200
dk0/mk0    290     290
d3p/m3p    2700    780
drn/mrn    6800    14600
drc/mrc    730     2800

Max sizes difficult to calculate: fluctuating background components
• 10% for most streams
• 33% for dkc/mkc
MC numbers assume:
• Full cross section for stream
• Size refers to scaled luminosity
Need to split DSTs into pieces
• 100 nb-1 chunks
• Must modify scripts
• Working on DST standalone script to plug holes
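The splitting rule above can be illustrated with a short Python sketch. It assumes the tabulated "max size" is the run luminosity at which a single file reaches the 2 GB limit, and that a run is simply cut into fixed 100 nb-1 pieces. The names `MAX_SIZE_NB`, `n_chunks` and `fits_in_one_file` are hypothetical and are not part of the KLOE offline scripts; the quoted background fluctuations (10%, 33%) are not included as a safety margin.

```python
import math

# Assumed meaning: luminosity (nb-1) at which one file reaches 2 GB,
# copied from the stream table above as stream -> (DST, MCDST).
MAX_SIZE_NB = {
    "dkc/mkc": (210, 200),
    "dk0/mk0": (290, 290),
    "d3p/m3p": (2700, 780),
    "drn/mrn": (6800, 14600),
    "drc/mrc": (730, 2800),
}


def n_chunks(run_lumi_nb, chunk_nb=100.0):
    """Number of pieces needed to cover a run when splitting into fixed-size chunks."""
    if run_lumi_nb <= 0:
        return 0
    return math.ceil(run_lumi_nb / chunk_nb)


def fits_in_one_file(stream, run_lumi_nb, mc=False):
    """True if the run stays at or below the tabulated maximum for this stream."""
    limit = MAX_SIZE_NB[stream][1 if mc else 0]
    return run_lumi_nb <= limit


if __name__ == "__main__":
    run_lumi = 250.0  # hypothetical run luminosity in nb-1
    for stream in MAX_SIZE_NB:
        ok = fits_in_one_file(stream, run_lumi)
        print(f"{stream}: fits in one DST file = {ok}, chunks = {n_chunks(run_lumi)}")
```

With these assumptions a 250 nb-1 run exceeds the dkc limit and would be written as three pieces, while the drn stream would fit in a single file.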
Reconstructed KLOE data
DBV 2001 2002 2004 2005 Scan Comment
12-15: 163, 34
  Comment: Bad cut in FILFO; No bias sample for FILFO; No bias sample for rad
19-21: 645 (bad wires!), 217
  Comment: Various significant ECL mods; Minor changes to recon; No bias sample for rad
22: 222, 395
  Comment: Stable ECL; Old bias sample for rad
23-24: 12, 502, 61
  Comment: Stable ECL for √s = mφ; ufo as bias sample
25: 16, 205
  Comment: Stable ECL for off-peak; ufo as bias sample
analyzed 163 269 645 1114 266
total 169 292 691 1242 284
trgmon luminosity in pb-1 by year and DBV
645 pb-1 to be reprocessed for sure, 278 pb-1 (617 pb-1) as time permits
Notes on reprocessing
Reprocessing most critical for 2004 data with bad wire maps
• Reprocess 2004 data first
• Other reprocessing priorities can be decided upon afterwards
When data are reprocessed, old reconstructed files will be deleted:
• If data have already been used for analysis (e.g. 2001-2002 data):
  – files deleted, database record and old DSTs will be kept.
• If data have not been used for analysis (e.g. 2004-2005 data):
  – database record and DSTs deleted, as if run had never been reconstructed in the first place.
Basically same treatment planned for incomplete runs in data set
Off-peak data reconstructed w/ DBV-24 need reprocessing for use
• 61 pb-1: all scan data, small amount of √s = 1000 MeV data
• Keep current files until ready to start reprocessing
Plan: Start 2004 reprocessing and 2005 MC production in parallel
• If there are problems with DH-induced latency, hold reprocessing until additional disk space for the I/O cache arrives (it is expected in June).
MC development items
New IR geometry CB
Revision of constants in GEANFI CB
Generators: ee,ee0, KSee CG RV ADS
Nuclear interactions/regeneration CB
Adjustments to EmC energy response PG
Adjustments to EmC time resolution CG
Simulation of EmC cluster efficiency MP TS
SQZ (compression) fix MM CB
Fix CELE/CSPS/cluster banks/structures MM MP
Data-quality parameters (√s, etc) GV
Trigger params.: quality, DC thresh, DISH maps MP BS
lsb background insertion MM MP
Correlated noise for charged kaons EDL PDS
dE/dx simulation VP
Check ECL code: ee0 SG
Split large (> 2GB) DSTs (MCDSTs) MM
MC sign-off: Data quality & DB issues
Data-quality parameters & DB loading
Not to be confused with fine survey for analysis (S. Fiore)
• 2004 run parameters (√s, p, etc.) already loaded into HepDB
• 2005 run parameters obtained, need to be loaded
• Trigger parameters in DB2 for 2004 & 2005
• Cluster efficiency, time resolution parameters updated
• Dead/hot wire maps, DC efficiencies: need to unify tables
HepDB-to-DB2 migration?
F. Sborzacchi has developed:
• DB2 tables to contain data currently in HepDB:
  – All detector calibration data
  – Much run-condition information (√s, p, etc.)
• Code to fill and maintain new tables
• Interface code (drop-in replacements for HepDB calls)
Ready to go (but wait until MC started?)
MC sign-off: LSB background
Work on LSB background to deal with:
• Inconsistent cluster/KINE matching
• Timescale alignment for MC and data
• Treatment of “noise” hits (missing tA or tB)
Diagnose performance: t0rec - t0MC distribution
[Figures: t0rec - t0true (ns) in KS, KL crash events, data (line) compared with MC (points), including a linear-scale view; and t0rec - t0MC (ns) for KS → π0π0 from MC with t0rec from T0_FIND, split into t0 from MC track, t0 from LSB cluster, and mixed t0.]
MC sign-off: LSB background
7 MeV in window:
     Accidental rate   Reconstr. LSB files
E    3.6%              3.1%
B    1.8%              1.7%
W    2.8%              2.7%
LSB files look OK!
• Overall prob. for LSB cluster to set t0 = 0.82(4)%
• Roughly half will steal t0 in event

Frac. events w/ stolen t0 in:               Data   MC OR   MC AND
From tail in KS, KL crash events            0.5%   0.7%    1.1%
From MC truth (t0 cluster acc. or mixed)           0.9%    1.4%

• Harder filtering of noise hits increases stolen t0 prob
• Are we dropping LSB events w/ no clusters?
• Before starting production, must check DC timescale alignment!
Compare to data and reconstructed MC (all_phys test runs):
MC sign-off: K background
New A/C module (SIMKBCK) adds background hits correlated to K tracks
Sample distribution of K-correlated background hits in:
– layer, distance (in wires) from reconstructed track, time
Private reconstruction of ~20 pb-1 of 2002 MC events to study how new background parameterization affects MC tracking efficiency
[Figure: trk+vtx as a function of t*(K) (ns), MC compared with data, before and after.]
Note: Large correction (5) for absolute probability of background hits:
• Background measured using reconstructed tracks
• Don’t account for tracks not reconstructed because of background
MC sign-off: dE/dx simulation
Calibration of A/C module (DIGIDCADC) to simulate dE/dx response of DC
• E distribution rescaled and smeared to match data
• Different s-t relations for data/MC: effective integration gates different
General note:
• Difficulties in calibrating for space-charge effects undermine resolution
• dE/dx resolution adequate for K ID, but not e.g. Ke2/Kμ2 separation
[Figures: dE/dx distribution and truncated mean distribution, both in dE/dx (count/cm), for K2 samples; data (+) compared with MC (line).]
Monte Carlo tests in 2006
Dates        L (pb-1)   Runs           Comments
20/2-13/3    105        39000-39600
  • Test new cluster efficiency parameters
  • Test new EmC time resolution parameters
  • Additional EmC studies on energy scale calibration
  • Test lsb cluster background
  • Complications from bugs in MC truth variables for EmC Fortran structures
  • DH problems complicate CPU/runtime analysis
3/5-4/5      19         39601-39750
  • Reconstructed with DBV-26
  • Correct EmC structures
  • Test SIMKBCK module
4/5-5/5      29         39751-40000
  • Fix EmC timescale for noise hits
7/5-8/5      42         40001-40250
  • Attempt to optimize EmC background
All tests based on all_phys card, LSF = 0.2
Monte Carlo production plans
2001-2002 MC production
• 450 pb-1
• 1.85G evts
• 8800 B80 days
• Campaigns:
  – all, scale = 0.2
  – KSKL, scale = 1
  – KK, scale = 1
  – radiative, scale = 5
  – Other (1M evts/pb-1)
• Total: 3.1M evts/pb-1 (about same as number of decays in data)
Estimated time for 2004-2005 MC
• 2000 pb-1
• 8.25G evts
• 39000 B80 days
• all, KSKL, KK campaigns to be combined: all_phys, LSF = 1.2
• Start with 2005 data: Reprocessing not necessary for comparison
Averaged over entire MC sample:
• 0.21M evts/B80 day = 2.4 Hz
• 0.41 s/evt (simulation + reconstruction + DST)
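The quoted average throughput can be cross-checked with a few lines of Python. This is only an arithmetic sketch using the event counts and B80-day totals from the slide; the function names are hypothetical and a day is taken as 86400 s of continuous running.

```python
SECONDS_PER_DAY = 86_400


def events_per_b80_day(n_events, b80_days):
    """Average number of events produced per B80 CPU per day."""
    return n_events / b80_days


def rate_hz(evts_per_day):
    """Event rate per B80 CPU in Hz, assuming continuous running."""
    return evts_per_day / SECONDS_PER_DAY


def seconds_per_event(evts_per_day):
    """Average wall time per event on one B80 CPU."""
    return SECONDS_PER_DAY / evts_per_day


if __name__ == "__main__":
    samples = [
        ("2001-2002", 1.85e9, 8800),   # events, B80 days (from the slide)
        ("2004-2005", 8.25e9, 39000),
    ]
    for label, n_events, b80_days in samples:
        per_day = events_per_b80_day(n_events, b80_days)
        print(f"{label}: {per_day / 1e6:.2f}M evts/B80 day, "
              f"{rate_hz(per_day):.1f} Hz, {seconds_per_event(per_day):.2f} s/evt")
```

Both samples give about 0.21M events per B80 day, consistent with the 2.4 Hz and 0.41 s/evt figures quoted above.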
Offline resources: CPU
Nodes             Type             CPUs   ms/ev MC   ms/ev Rec   B80 “MC”   B80 “CW”
fibm14-15,17-34   P3 375 MHz       80     211        192         80         80
fibm35-44         P4+ 1.45 GHz     40     98         85          88         96
fibm45-47         P5 1.5 GHz SMT   “96”   105        202         126        184
All                                                              294        360
• 1 “B80 CPU” = 1 P3 CPU installed in B80 server - KLOE standard unit
• Accuracy of estimates depends on B80/P4+ and B80/P5 ratios
• “MC” results based on MC tests (all_phys, ‘05 data) 3-8 May 06:
  – 87 CPU (132 B80 “MC”) configuration - no competing offline work
  – No serious DH problems observed
• Conventional wisdom (“CW”) based on past experience:
  – 1 P4+ = 2.4 B80 (compare 2.2 above)
  – 1 P5 = 0.8 P4+ = 1.9 B80 (compare 1.3 above)
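The B80 “MC” equivalents can be reproduced with a small Python sketch. It assumes, without confirmation from the slides, that one CPU's B80 equivalent is the ratio of the P3 time per event (MC plus reconstruction) to that node type's time per event. That assumption does recover the 2.2 and 1.3 ratios and the 294 B80 total quoted above. All names are hypothetical.

```python
# (CPUs, ms/ev MC, ms/ev Rec) per node type, copied from the CPU table above.
NODES = {
    "P3": (80, 211, 192),
    "P4+": (40, 98, 85),
    "P5": (96, 105, 202),  # "96" CPUs counted with SMT, as on the slide
}


def b80_per_cpu(node, reference="P3"):
    """Assumed definition: ratio of total ms/event (MC + Rec) on the reference to this node."""
    _, mc_ref, rec_ref = NODES[reference]
    _, mc, rec = NODES[node]
    return (mc_ref + rec_ref) / (mc + rec)


def b80_total(node):
    """B80-equivalent capacity of all CPUs of one node type."""
    return NODES[node][0] * b80_per_cpu(node)


if __name__ == "__main__":
    for node in NODES:
        print(f"{node}: {b80_per_cpu(node):.1f} B80/CPU, {b80_total(node):.0f} B80 total")
    print(f"All: {sum(b80_total(n) for n in NODES):.0f} B80")
```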
Offline CPU needs
2006 offline projects                               B80 days   Days/214 B80
Reprocessing of 2004 data                           16000      75
MC production for 2004-2005 data                    39000      180
All reprocessing for ’02-’04-’05 data (900 pb-1)    22000      100
Minimum total                                       55000      260
Maximum total                                       77000      360

9-12 months if there are no serious DH problems
Assuming:
• 294 B80 total (MC test, not CW)
• 80 B80 left to users for analysis
• 214 B80 for offline
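The conversion from B80 days to calendar days is a simple division by the number of B80 units left for offline work. A minimal Python sketch follows, with hypothetical names and the assumption that all 214 B80 run continuously with perfect efficiency.

```python
TOTAL_B80 = 294   # from the MC tests, not the "CW" estimate
USER_B80 = 80     # left to users for analysis
OFFLINE_B80 = TOTAL_B80 - USER_B80

# B80 days per project, copied from the table above.
PROJECTS = {
    "Reprocessing of 2004 data": 16000,
    "MC production for 2004-2005 data": 39000,
    "All reprocessing for '02-'04-'05 data": 22000,
}


def wall_days(b80_days, b80_available=OFFLINE_B80):
    """Calendar days needed if all available B80 units work in parallel."""
    return b80_days / b80_available


if __name__ == "__main__":
    minimum = PROJECTS["Reprocessing of 2004 data"] + PROJECTS["MC production for 2004-2005 data"]
    maximum = minimum + PROJECTS["All reprocessing for '02-'04-'05 data"]
    for name, b80_days in PROJECTS.items():
        print(f"{name}: {wall_days(b80_days):.0f} days")
    print(f"Minimum total: {minimum} B80 days = {wall_days(minimum):.0f} days")
    print(f"Maximum total: {maximum} B80 days = {wall_days(maximum):.0f} days")
```

The unrounded results (about 257 and 360 days) correspond to the 9-12 month estimate; the table rounds them to two significant figures.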
Offline resources: Disk space
DSTs cached on nfs-mounted disks for fast analysis access
DST volume*   Current   Final
Data          35 TB     42 TB
MC            10 TB     44 TB
Total         45 TB     86 TB
Current DST cache capacity: 12 TB
New purchases: 21 TB FC disk + new controller
Tender (gara) closed yesterday
Delivery by end of June?
~30 TB available for disk cache
*Updated to account for scan data
Offline resources: Tape space
Type     Current (TB)   Temporary allocation (TB)   Final allocation (TB)
raw      248.7          250                         250
rec      181.2          250                         250
DST      34.7           45                          45
MC       61.2           150                         350
MC DST   10.0           25                          60
Total    535.8          720                         955
1. Allocations include currently occupied space
2. MC DSTs probably appear as datarec files to archiver
3. Current library system capacity ~720 TB
   New cassettes will have to be ordered in future
4. Temporary allocation based on 720 TB library
   Assumes MC production slow
5. Final allocation assumes completion of KLOE offline program
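The tape totals can be checked, and the shortfall against the present library estimated, with a short Python sketch. It assumes the library capacity of 720 is in the same units as the table (TB); the names are hypothetical.

```python
# Tape usage in TB per file type: (current, temporary allocation, final allocation),
# copied from the table above.
TAPE_TB = {
    "raw": (248.7, 250, 250),
    "rec": (181.2, 250, 250),
    "DST": (34.7, 45, 45),
    "MC": (61.2, 150, 350),
    "MC DST": (10.0, 25, 60),
}

LIBRARY_CAPACITY_TB = 720  # assumption: capacity expressed in TB, like the table


def column_total(index):
    """Sum one column of the table (0 = current, 1 = temporary, 2 = final)."""
    return sum(row[index] for row in TAPE_TB.values())


def extra_capacity_needed(capacity_tb=LIBRARY_CAPACITY_TB):
    """Additional tape space required to reach the final allocation."""
    return max(0.0, column_total(2) - capacity_tb)


if __name__ == "__main__":
    print(f"Current: {column_total(0):.1f} TB")
    print(f"Temporary allocation: {column_total(1)} TB")
    print(f"Final allocation: {column_total(2)} TB")
    print(f"New cassettes needed for: {extra_capacity_needed():.0f} TB")
```

Under that assumption the final allocation exceeds the current library by 235 TB, which is the space the new cassettes would have to provide.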
Outlook and summary
• Starting MC production a very high priority, but we need to make a few last checks
• We are planning for a near simultaneous start of
  – MC production, all_phys, LSF = 1.2, 2005 data
  – Reprocessing 2004 data with bad wire maps
• We have the CPU power needed to generate a definitive MC sample and reprocess as necessary on a time scale compatible with beginning of 2007
• We will probably want to expand DST disk cache
• We will need to order new cassettes towards the end of the year