CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
EIS section review of recent activities
Harry Renshall
Andrea Sciabà
IT-GS group meeting, November 10, 2009
Contents
• Section overview
• VOBox project
• ALICE activities
• ATLAS activities
• CMS activities
• LHCb activities
• Other activities
• Conclusions
Section overview
• Section members
– ALICE: Patricia, Lola
– ATLAS: Alessandro
– CMS: Andrea, Nicolò
– LHCb: Roberto, Harry (as IT-LHCb management liaison)
• Ongoing projects
– VO box project
– Migration of SAM tests to Nagios
• Other activities
– See next slides
VO box project: overview
• Objective
– Enhance the reliability of experiment applications running in the computer centre
– Coordinated by Patricia
• Related activities
– Audit of VO boxes: which services are run and how critical they are
– Ensure that operator procedures exist for critical VO boxes in case of failure; in the simplest case, alert the experiment
– Expand the usage of Lemon and SLS: sensors can probe the application environment or use meta-information from SLS
– Where possible, automate the recovery of a failed application and/or provide simple instructions and tools to the operators
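The recovery policy in the last bullet (fix it automatically if possible, otherwise hand over to the operators or alert the experiment) can be sketched as a bounded retry loop. This is a hypothetical Python illustration, not part of Lemon, SLS or any experiment tool: it assumes each VO-box service offers a `probe` that returns True when healthy and a `restart` action, and the names `check_and_recover`, `alert` and the default of two restarts are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecoveryResult:
    healthy: bool
    restarts: int
    actions: List[str] = field(default_factory=list)


def check_and_recover(probe: Callable[[], bool],
                      restart: Callable[[], None],
                      alert: Callable[[str], None],
                      service: str,
                      max_restarts: int = 2) -> RecoveryResult:
    """Probe a service, try a bounded number of restarts, then escalate.

    All callables are hypothetical hooks supplied by the caller.
    """
    result = RecoveryResult(healthy=False, restarts=0)
    if probe():
        result.healthy = True
        result.actions.append("ok")
        return result
    for _ in range(max_restarts):
        restart()                      # automated recovery attempt
        result.restarts += 1
        result.actions.append("restart")
        if probe():
            result.healthy = True
            return result
    # Still failing: fall back to the operator procedure / experiment alert.
    alert(f"{service} still failing after {max_restarts} restarts; "
          "follow operator procedure")
    result.actions.append("alert")
    return result
```

Bounding the number of restarts keeps a persistently broken service from being restarted in a loop and ensures that a human is eventually notified.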
VO box project: status
• ALICE: first full prototype with Lemon sensors and operator procedures ready
• ATLAS: Flavia has coordinated an audit of the ATLAS VO boxes and presented plans for increasing reliability and security. The emphasis was on computer security, but the plans also contain many elements for enhanced reliability. CMS was also contacted
• CMS: service criticality reviewed. The goal is to provide PhEDEx and DBS with procedures and Lemon alarms by the start of data taking
• LHCb: Jiri Horky (a student) and Roberto set up the basic Lemon infrastructure for some DIRAC sensors. LHCb is currently collecting the list of all needed Lemon sensors
• Andrea has initiated the collection of the required metrics for all experiments, to avoid duplicating metrics and effort
ALICE support
• Assist in migration to SL5 of worker nodes and VO boxes
– Full-scale deployment of SL5/gLite 3.2 VO boxes expected before Christmas
• Setup of CREAM CE for ALICE at all sites
– recently recommended by the MB for all Tier-1/2 sites
• Preparation for a grid-wide MyProxy service for ALICE
– launched a survey of the 102 sites concerned to clean up obsolete registered hosts
ATLAS support
• Completed a web interface to display the disk quota and space used by individual ATLAS users and ATLAS sub-groups in the CASTOR analysis stager spaces (Lola)
– Data extracted from Lemon (Maarten)
– Feedback from Guido Negri, ATLAS “space manager”
– To do: web interface to manipulate quota limits for individuals, subgroups and within subgroups
• Significant contribution to computing operations
– Helped debug FTS 2.2
– Load on operations increased with data taking
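The per-user and per-sub-group accounting behind a quota page like the one above amounts to a simple aggregation. This is a hypothetical Python sketch, not the actual implementation: it assumes usage data arrive as (user, sub-group, size in bytes) records and that quotas are keyed by `("user", name)` or `("group", name)`. The function name and data layout are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[str, str]  # ("user", name) or ("group", name)


def usage_report(records: Iterable[Tuple[str, str, int]],
                 quotas: Dict[Key, int]) -> Dict[Key, dict]:
    """Aggregate space used per user and per sub-group and flag quota overruns.

    records: (user, subgroup, size_bytes) tuples, one per file (hypothetical input).
    quotas:  byte limits keyed by ("user", name) or ("group", name); missing = no limit.
    """
    used: Dict[Key, int] = defaultdict(int)
    for user, subgroup, size in records:
        used[("user", user)] += size
        used[("group", subgroup)] += size

    report: Dict[Key, dict] = {}
    for key, total in used.items():
        limit: Optional[int] = quotas.get(key)
        report[key] = {
            "used": total,
            "quota": limit,
            "over": limit is not None and total > limit,
        }
    return report
```

Keeping users and sub-groups in separate key namespaces prevents a user and a sub-group with the same name from being merged.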
CMS support
• A new Lemon metric for PhEDEx that checks the log file for errors is in production. Next steps:
– Trigger a Lemon alarm
– Define a corresponding procedure
• For DBS, enabled a Lemon alarm with an automatic restart when Tomcat is down
• Site readiness
– Ongoing campaign to fix all known bugs with Dashboard developers
• Starting to test FTS 2.2
• Our script to prestage files via SRM has been adopted by the data operations team
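The slide does not describe how the PhEDEx log-error metric works internally. As a hypothetical illustration, such a check can remember a byte offset between runs so that each invocation counts only newly appended error lines. The error pattern and the name `count_new_errors` are assumptions, and the sketch uses no Lemon API.

```python
import re
from typing import Tuple

# Hypothetical pattern: any line containing the word "error" (case-insensitive).
ERROR_PATTERN = re.compile(r"\berror\b", re.IGNORECASE)


def count_new_errors(path: str, offset: int) -> Tuple[int, int]:
    """Count error lines appended to a log since byte `offset`.

    Returns (number_of_error_lines, new_offset). The caller stores new_offset
    and passes it back on the next run, so each line is counted only once.
    """
    with open(path, "rb") as fh:
        fh.seek(0, 2)              # jump to end of file to learn its size
        size = fh.tell()
        if size < offset:          # file shrank: assume it was rotated or truncated
            offset = 0
        fh.seek(offset)
        data = fh.read()
        new_offset = fh.tell()
    text = data.decode("utf-8", errors="replace")
    errors = sum(1 for line in text.splitlines() if ERROR_PATTERN.search(line))
    return errors, new_offset
```

A monitoring agent would call this periodically, persist the returned offset, and publish the count as the metric value. Raising an alarm when the count is non-zero would correspond to the first "next step" listed on the slide.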
LHCb support
• Using SLS to monitor free space in SRM space tokens
– Information visible in Site Status Board and DIRAC portal
• Some SAM tests successfully migrated to Nagios
• Testing submission to CREAM CE in DIRAC via gLite WMS; investigating direct submission to CREAM
• Working on a new grid JDL ranking expression
– to prevent small sites from being flooded by too many pilot jobs
– to make better use of sites that apply fair-share mechanisms
• Exploring the idea of using the same ping method as SLS to detect hanging experiment applications and services
• Discussions with FIO on how to sustain a 3-fold increase in transaction rates for the MySQL databases used by DIRAC
– IT supports Oracle but agrees to add independent instances of DIRAC and the databases
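The slide does not specify the SLS ping method. As a hypothetical sketch of the general idea, a ping with a timeout can distinguish a service that answers, one that answers with a failure, and one that hangs. A hanging process often still holds its port, so a plain "is the process running" check would report it as up. The `probe` callable and the name `ping_status` are assumptions made for the example.

```python
import concurrent.futures
from typing import Callable


def ping_status(probe: Callable[[], object], timeout_s: float) -> str:
    """Classify a service from one ping: 'available', 'error' or 'hanging'.

    `probe` is any callable that performs a lightweight request against the
    service (hypothetical; e.g. an HTTP GET or a no-op RPC) and raises on failure.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(probe)
    try:
        future.result(timeout=timeout_s)
        return "available"         # answered in time
    except concurrent.futures.TimeoutError:
        return "hanging"           # no answer within the timeout
    except Exception:
        return "error"             # answered, but with a failure
    finally:
        # Do not wait for a hung probe; its thread is left to finish on its own.
        pool.shutdown(wait=False)
```

The probe runs in a worker thread because a hung call cannot be interrupted from outside. The check therefore returns after `timeout_s`, while the stuck thread is abandoned rather than killed.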
Other activities
• Participated in revising the HEP-SSC part of the ROSCOE (Robust Scientific Communities for EGI) proposal
• Increased number of grid pool accounts (500 for ALICE and LHCb, 1000 for ATLAS and CMS)
– Shown to be enough in the CMS October exercise
• Investigating the impact on the MyProxy servers of using SCAS to change identities in multi-user pilot jobs
• Update of the gLite User Guide (Lola, Andrea)
• Data management support (Andrea)
Conclusions
• Support for integration
– CREAM, FTS, SCAS, MyProxy, data management, etc.
• Support for operations
– VO box project, disk space management, troubleshooting, etc.
• Support for monitoring
– Site readiness, Nagios, SLS, Lemon, etc.
• Support for the user community
– gLite User Guide, ROSCOE proposal, etc.