![Page 1: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/1.jpg)
New CERN CAF facility: parameters, usage statistics, user support
Marco MEONI, Jan Fiete GROSSE-OETRINGHAUS
CERN - Offline Week – 24.10.2008
![Page 2: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/2.jpg)
Outline
New CAF: features
CAF1 vs CAF2
Processing Rate comparison
Current Statistics
Users, Groups
Machines, Files, Disks, Datasets, CPUs
Staging problems
Conclusions
![Page 3: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/3.jpg)
New CAF
Timeline
• 28.09 startup of the new CAF cluster
• 01.10 1st day with users on the new cluster
• 07.10 old CAF decommissioned by IT
Usage
• 26 workers instead of 33 (but much faster, see later)
• Head node is « alicecaf » instead of « lxb6046 »
• GSI based authentication, AliEn certificate needed
  – Announced since July, but many last-minute users had AliEn account != afs account or an unknown server certificate
• Datasets clean up, staged only latest data production (First physics - stage 3)
• AF v4-15 meta package redistributed
![Page 4: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/4.jpg)
Technical Differences
• Cmsd (Cluster Management Service Daemon)
  – Why? Olbd not supported any longer
  – What? Dynamic load balancing of files and data name-space
  – How? Stager daemon can benefit from:
    • bulk prepare replaces touch file
    • bulk prepare allows "co-locate" files on the same node
• GSI authentication
  – Secure communication using user certificates and LDAP based configuration management
![Page 5: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/5.jpg)
Architectural Differences
| | New CAF | Old CAF |
|---|---|---|
| Architecture | AMD 64 | Intel 32 |
| Machines | 13 x 8-core | 33 x dual CPU |
| Space for staging | 13 x 2.33 TB | 33 x 200 GB |
| Workers | 26 (2/node) | 33 (1/node) |
| Mperf | 8570 | 1307 |
• Why « only » 26 workers?
• You could use 104 if you are alone
• With 26 workers 4 users can effectively run concurrently
• Estimate average of 8 concurrent users…
• Processing units 6.5x faster than old CAF
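The worker count follows from simple arithmetic on the figures above. A minimal sketch, with variable names chosen here for illustration; it assumes the quoted 6.5x corresponds to the ratio of the two Mperf values, which the slide does not state explicitly:

```python
# Capacity arithmetic for the new CAF, using the numbers from the slide.
machines = 13
cores_per_machine = 8
workers_per_user = 26  # 2 workers per node, as configured

total_cores = machines * cores_per_machine          # upper limit if you are alone
concurrent_users = total_cores // workers_per_user  # users that each get one core per worker

# Assumption: the "6.5x faster" figure is the Mperf ratio new/old.
mperf_new, mperf_old = 8570, 1307
mperf_ratio = mperf_new / mperf_old

print(f"cores available: {total_cores}")
print(f"users running {workers_per_user} workers each without sharing cores: {concurrent_users}")
print(f"Mperf ratio new/old: {mperf_ratio:.2f}")
```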
![Page 6: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/6.jpg)
Outline
• CAF2: features
• CAF1 vs CAF2
• Processing Rate comparison
• Current Statistics
• Users, Groups
• Machines, Files, Disks, Datasets, CPUs
• Staging problems
• Conclusions
![Page 7: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/7.jpg)
CAF1 vs CAF2 (Processing Rate)
• Test Dataset
• First physics (stage 3) pp, Pythia6, 5kG, 10TeV
• /COMMON/COMMON/LHC08c11_10TeV_0.5T
• 1840 files, 276k events
• Tutorial task that runs over ESDs and displays Pt distribution
• Other comparison test: RAW data reconstruction (Cvetan)
![Page 8: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/8.jpg)
Reminder
• The test is dependent on the file distribution for the used dataset
• Parallel code:
  • Creation of workers
  • Files validation (workers opening the files)
  • Events loop (execution of the selector on the dataset)
• Serial code:
  • Initialization of PROOF master, session and query objects
  • Files look up
  • Packetizer (file slices distribution)
  • Merging (biggest task)
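The split into parallel and serial code matters because the serial part limits what extra workers can gain. A minimal sketch using Amdahl's law, which the slide does not name and which is brought in here only as an illustration; the serial fraction is a hypothetical value, not a CAF measurement:

```python
def amdahl_speedup(serial_fraction: float, n_workers: int) -> float:
    """Ideal speedup when only the parallel part scales with n_workers."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Hypothetical serial fraction (initialization, look up, packetizer, merging).
serial_fraction = 0.05

for n_workers in (26, 104):
    speedup = amdahl_speedup(serial_fraction, n_workers)
    print(f"{n_workers:>3} workers: ideal speedup {speedup:.1f}x")
```

Under this model, going from 26 to 104 workers gains far less than a factor of 4 as soon as the serial fraction is not negligible.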
![Page 9: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/9.jpg)
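| #nodes | #events | Size (GB) | Init_time | Proc_time | Ev/s | MB/s | Speedup | Efficiency |
|---|---|---|---|---|---|---|---|---|
| 33 | 2k | 0.25 | 0.8s | 3s | 644 | 50 | | |
| 33 | 20k | 1.35 | | 17s | 1143 | 77 | | |
| 33 | 120k | 8.11 | | 49s | 2423 | 164 | | |
| 33 | 200k | 13.53 | | 1m23s | 2405 | 163 | | |
| 33 | 276k | 18.71 | | 2m34s | 1783 | 120 | | |
| 26 | 2k | 0.25 | 0.4s | 2s | 1062 | 81 | 1.6x | |
| 26 | 20k | 1.35 | | 6s | 3299 | 225 | 2.8x | |
| 26 | 120k | 8.11 | | 28s | 4253 | 289 | 1.8x | |
| 26 | 200k | 13.53 | | 42s | 4743 | 323 | 2.0x | |
| 26 | 276k | 18.71 | | 55s | 4365 | 340 | 2.8x | |
| 104 | 2k | 0.25 | 0.9s | 2s | 848 | 124 | 0.8x | |
| 104 | 20k | 1.35 | | 5s | 3572 | 244 | 1.1x | 27% |
| 104 | 120k | 8.11 | | 19s | 6280 | 427 | 1.4x | 35% |
| 104 | 200k | 13.53 | | 31s | 6365 | 433 | 1.3x | 32% |
| 104 | 276k | 18.71 | | 45s | 6120 | 417 | 1.2x | 30% |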
• Task executed 5 times and averaged
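The Speedup and Efficiency columns for 104 workers can be approximated from the Ev/s values. A minimal sketch, assuming speedup is the rate ratio against the 26-worker run and efficiency is that speedup divided by the worker ratio (104/26 = 4); the slide does not give the formula, and some of its entries look derived from processing times instead, so results can differ from the table by a few percent:

```python
# Ev/s from the table, keyed by number of workers, then by sample size.
rates = {
    26: {"2k": 1062, "20k": 3299, "120k": 4253, "200k": 4743, "276k": 4365},
    104: {"2k": 848, "20k": 3572, "120k": 6280, "200k": 6365, "276k": 6120},
}


def speedup_and_efficiency(base_workers: int, more_workers: int, sample: str):
    """Rate ratio between two runs, and that ratio per added-worker factor."""
    speedup = rates[more_workers][sample] / rates[base_workers][sample]
    efficiency = speedup / (more_workers / base_workers)  # assumed definition
    return speedup, efficiency


for sample in rates[26]:
    speedup, efficiency = speedup_and_efficiency(26, 104, sample)
    print(f"{sample:>5}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```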
![Page 10: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/10.jpg)
Processing Rate Comparison (1)
• The final average rate is the only important information
[Plots: 104 workers, 200k evs; 104 workers, 276k evs]
• Final tail reflects the fact that workers stop working one by one
  • data unevenly distributed
• A longer tail shows a worker overloaded on the last packet(s)
  • 3 workers maximum helping on the same «slow» packet
![Page 11: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/11.jpg)
Processing Rate Comparison (2)
[Plots: Events/sec vs #events and MB/sec vs #events, for 104 workers, 26 workers and 33 workers]
![Page 12: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/12.jpg)
Outline
• CAF2: features
• CAF1 vs CAF2
• Processing Rate comparison
• Current Statistics
• Users/Groups
• Machines, Files, Disks, Datasets, CPUs
• Staging problems
• Conclusions
![Page 13: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/13.jpg)
CAF Usage
• Available resources in CAF must be fairly used
• Highest attention to how disks and CPUs are used
• Users are grouped (sub-detectors / physics working groups)
• Each group
  – has a disk space (quota) which is used to stage datasets from AliEn
  – has a CPU fairshare target (priority) to regulate concurrent queries
![Page 14: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/14.jpg)
CAF Groups

| Groups | #Users |
|---|---|
| PWG0 | 21 (5) |
| PWG1 | 3 (1) |
| PWG2 | 39 (21) |
| PWG3 | 18 (8) |
| PWG4 | 30 (17) |
| EMCAL | 2 (1) |
| HMPID | 1 (1) |
| ITS | 6 (3) |
| T0 | 2 (1) |
| MUON | 4 (3) |
| PHOS | 4 (1) |
| TPC | 3 (2) |
| TOF | 1 (1) |
| TRD | 4 (0) |
| ZDC | 1 (1) |
| VZERO | 2 (0) |
| ACORDE | 1 (0) |
| PMD | 3 (0) |
| DEFAULT | |

– 19 registered groups
– 145 (60) registered users
– In brackets () the situation at the previous offline week
![Page 15: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/15.jpg)
CAFStatusTable
![Page 16: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/16.jpg)
Files Distribution
• Nodes with more files can produce tails in processing rate
• Above a defined threshold files are not stored any longer
Min: 1727
Max: 1863
Max difference: 8%
![Page 17: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/17.jpg)
Disk Usage
Max: 116
Min: 105
Max difference: 10%
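The "Max difference" figures on the Files Distribution and Disk Usage slides can be reproduced from the min and max values. A minimal sketch, assuming the difference is taken relative to the minimum, which is the reading that matches both quoted percentages; the function name is illustrative:

```python
def max_relative_difference(minimum: float, maximum: float) -> float:
    """Spread between the two extremes, relative to the minimum."""
    return (maximum - minimum) / minimum


files_spread = max_relative_difference(1727, 1863)  # files per node
disk_spread = max_relative_difference(105, 116)     # disk usage, units as on the slide

print(f"files per node: {files_spread:.1%}")
print(f"disk usage:     {disk_spread:.1%}")
```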
![Page 18: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/18.jpg)
Dataset Monitoring
- 28TB disk space for staging
- PWG0: 4TB
- PWG1: 1TB
- PWG2: 1TB
- PWG3: 1TB
- PWG4: 1TB
- ITS: 0.2TB
- COMMON: 2TB
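As a quick check of how much of the staging space the listed groups account for, the per-group figures can be summed. A minimal sketch; the slide does not say whether these are quotas or current usage, so the dictionary name is deliberately neutral:

```python
# Per-group figures in TB, copied from the slide (quota or usage: not stated).
per_group_tb = {
    "PWG0": 4,
    "PWG1": 1,
    "PWG2": 1,
    "PWG3": 1,
    "PWG4": 1,
    "ITS": 0.2,
    "COMMON": 2,
}
total_staging_tb = 28

listed_tb = sum(per_group_tb.values())
share = listed_tb / total_staging_tb

print(f"listed groups: {listed_tb:.1f} TB of {total_staging_tb} TB ({share:.0%})")
```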
![Page 19: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/19.jpg)
CPU Quotas
- default group is not the most consuming anymore
![Page 20: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/20.jpg)
Outline
• CAF2: features
• CAF1 vs CAF2
• processing rate comparison
• Current Statistics
• Users, Groups
• Machines, Files, Disks, Datasets, CPUs
• File Staging
• Conclusions
![Page 21: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/21.jpg)
File Stager
• CAF intensively uses 'prepare'
  – 0-size files in Castor2 cannot be staged, but replicas are ok
  – Check at stager level to avoid spawning infinite prepare on the same empty file unable to get online
[Flowchart, boxes as extracted: Loop over the replicas (CERN, if any, taken first); replica[i] in Castor && size==0?; Copy replica (API service); replica[i] is not staged?; Add to StageLIST; Skip it; File corrupted. Skip it; Stage StageLIST; STOP]
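The stager check can be sketched in code. This is one plausible reading of the flowchart, not the actual stager implementation: the branch targets are not recoverable from the slide text, and the `Replica` type, the `check_file` function and the returned action strings are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    site: str        # storage site name, e.g. "CERN"
    in_castor: bool  # replica lives in Castor
    size: int        # size in bytes; 0 means an empty file
    staged: bool     # already online on disk


def order_replicas(replicas: List[Replica]) -> List[Replica]:
    """CERN replicas, if any, are tried first (sorted() is stable)."""
    return sorted(replicas, key=lambda r: r.site != "CERN")


def check_file(replicas: List[Replica], stage_list: List[Replica]) -> str:
    """Decide what to do with one file; may append a replica to stage_list."""
    for replica in order_replicas(replicas):
        if replica.in_castor and replica.size == 0:
            # Empty Castor replica: a prepare would never bring it online,
            # so do not request it and try the next replica instead.
            continue
        if not replica.in_castor:
            # Assumed meaning of "Copy replica (API service)".
            return "copy"
        if not replica.staged:
            stage_list.append(replica)  # "Add to StageLIST"
            return "stage"
        return "skip"  # already staged, nothing to do
    # Every replica was an empty Castor file.
    return "corrupted"


stage_list: List[Replica] = []
example = [Replica("Other", True, 10, False), Replica("CERN", True, 0, False)]
print(check_file(example, stage_list), [r.site for r in stage_list])
```

The point of the check is the first branch: a zero-size Castor replica is skipped before any prepare is issued, so the stager cannot loop on it indefinitely.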
![Page 22: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/22.jpg)
Outline
• CAF2: features
• CAF1 vs CAF2
• Processing Rate comparison
• Current Statistics
• Files Distribution
• Users/Groups
• Staging
• Conclusions
![Page 23: New CERN CAF facility: parameters, usage statistics, user support](https://reader037.vdocument.in/reader037/viewer/2022110210/56812ad4550346895d8eb802/html5/thumbnails/23.jpg)
Conclusions
• CAF Usage
  – Subscribe to [email protected] using CERN SIMBA (http://listboxservices.web.cern.ch/listboxservices)
  – Web page at http://aliceinfo.cern.ch/Offline/Analysis/CAF
  – CAF tutorial once a month
• New CAF
  – Faster machines, more space, more fun
  – Shaky behavior due to higher user activity is under intensive investigation
• Credits
  – PROOF Team and IT for the prompt support
• If (ever) you cannot connect just drop a mail and wait for…
… « please try again »