CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it

EIS section review of recent activities

Harry Renshall, Andrea Sciabà

IT-GS group meeting, November 10, 2009

Contents

• Section overview

• VO box project

• ALICE activities

• ATLAS activities

• CMS activities

• LHCb activities

• Other activities

• Conclusions

Section overview

• Section members

– ALICE: Patricia, Lola

– ATLAS: Alessandro

– CMS: Andrea, Nicolò

– LHCb: Roberto, Harry (as IT-LHCb management liaison)

• Ongoing projects

– VO box project

– Migration of SAM tests to Nagios

• Other activities

– See next slides

VO box project: overview

• Objective

– Enhance the reliability of experiment applications running in the computer centre

– Coordinated by Patricia

• Related activities

– Audit of VO boxes: which services are run and how critical they are

– Ensure that operator procedures exist for critical VO boxes in case of failure. Simplest case: alert the experiment

– Expand usage of Lemon and SLS. Sensors can probe the application environment or use meta-information from SLS

– Where possible, automate recovery of a failed application and/or provide simple instructions/tools to the operators
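
The "automate recovery, otherwise alert" pattern above can be sketched roughly as follows. This is only an illustration under stated assumptions: `supervise`, `check`, `restart` and `alert` are hypothetical names, not part of Lemon, SLS or any experiment framework, and a real sensor would plug in its own probe and restart commands.

```python
from typing import Callable


def supervise(check: Callable[[], bool],
              restart: Callable[[], None],
              alert: Callable[[str], None],
              max_restarts: int = 1) -> str:
    """Probe a service, try an automatic restart, and alert if that fails.

    All three callables are hypothetical hooks supplied by the caller.
    """
    if check():
        return "ok"
    for _ in range(max_restarts):
        restart()          # attempt automatic recovery
        if check():        # probe again after the restart
            return "recovered"
    # Simplest operator procedure: tell the experiment.
    alert("service still failing after automatic restart")
    return "alerted"
```

The function returns a status string rather than raising, so a calling sensor could report the outcome as a metric value.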

VO box project: status

• ALICE: first full prototype with Lemon sensors and operator procedures ready

• ATLAS: Flavia has coordinated an audit of ATLAS VO boxes and presented plans for increasing reliability and security. The emphasis was on computer security, but the plans contain many elements that enhance reliability. CMS was also contacted

• CMS: service criticality reviewed. Goal is to provide PhEDEx and DBS with procedures and Lemon alarms by start of data taking

• LHCb: Jiri Horky (a student) and Roberto set up the basic Lemon infrastructure for some DIRAC sensors. LHCb is currently collecting the list of all needed Lemon sensors

• Andrea has triggered the collection of required metrics for all experiments, to avoid duplication of metrics and efforts

ALICE support

• Assist in migration to SL5 of worker nodes and VO boxes

– Full-scale deployment of SL5 / gLite 3.2 VO boxes expected before Christmas

• Setup of CREAM CE for ALICE at all sites

– recently recommended by the MB for all Tier-1/2 sites

• Preparation for a grid-wide MyProxy service for ALICE

– launched a survey of the 102 sites concerned to clean up obsolete registered hosts

ATLAS support

• Completed a web interface to display the disk quota and space used by individual ATLAS users and ATLAS sub-groups in the CASTOR analysis stager spaces (Lola)

– Data extracted from Lemon (Maarten)

– Feedback from Guido Negri, ATLAS “space manager”

– To do: web interface to manipulate quota limits for individuals, subgroups and within subgroups

• Significant contribution to computing operations

– Helped debug FTS 2.2

– Load on operations increased with data taking

CMS support

• A new Lemon metric that checks the PhEDEx log file for errors is in production. Next steps:

– Trigger a Lemon alarm

– Define a corresponding procedure

• On DBS, enabled a Lemon alarm with automatic restart when Tomcat is down

• Site readiness

– Ongoing campaign to fix all known bugs with Dashboard developers

• Starting to test FTS 2.2

• Our script for prestaging via SRM was adopted by the data operations team
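
A log-checking metric of the kind mentioned for PhEDEx could look roughly like the sketch below. It is not the production sensor: the `ERROR`/`FATAL` keywords, the function names and the threshold are assumptions made for illustration, and no Lemon API is used.

```python
import re
from typing import Iterable

# Assumed log convention: error lines contain the word ERROR or FATAL.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def count_errors(lines: Iterable[str],
                 pattern: re.Pattern = ERROR_PATTERN) -> int:
    """Return the number of log lines matching the error pattern."""
    return sum(1 for line in lines if pattern.search(line))


def should_alarm(error_count: int, threshold: int = 1) -> bool:
    """Decide whether the metric value should trigger an alarm."""
    return error_count >= threshold
```

Keeping the metric (a count) separate from the alarm decision (a threshold) mirrors the two next steps listed on the slide: the metric already exists, while the alarm and its procedure are defined afterwards.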

LHCb support

• Using SLS to monitor free space in SRM space tokens

– Information visible in Site Status Board and DIRAC portal

• Some SAM tests successfully migrated to Nagios

• Testing submission to CREAM CE in DIRAC via gLite WMS; investigating direct submission to CREAM

• Working on a new grid JDL ranking expression

– to prevent small sites from being flooded by too many pilot jobs

– to allow more adequate usage of sites that use fair-share mechanisms

• Exploring the idea of using the same ping method as SLS to detect hanging experiment applications and services

• Discussions with FIO to see how to sustain a 3-fold increase in transaction rates for the MySQL databases used by DIRAC

– IT supports Oracle but has agreed to add independent instances of DIRAC and the databases
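
To illustrate why a ranking expression matters for small sites, the sketch below ranks sites by waiting jobs relative to their size instead of by the absolute number of waiting jobs. It is not the LHCb expression, which the slides do not give; the `Site` fields and the formula are assumptions chosen only to show the effect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    name: str
    total_cpus: int     # hypothetical field: size of the site
    waiting_jobs: int   # hypothetical field: jobs already queued


def rank(site: Site) -> float:
    """Higher is better: penalise queued jobs relative to site size."""
    return -site.waiting_jobs / max(site.total_cpus, 1)


def pick_site(sites: List[Site]) -> Site:
    """Choose the site with the highest rank."""
    return max(sites, key=rank)
```

With a small site holding 40 waiting jobs on 50 CPUs and a large site holding 400 on 5000 CPUs, a rank based on the absolute queue length would prefer the small site, while this size-relative rank prefers the large one.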

Other activities

• Participated in revising the HEP-SSC part of the ROSCOE (Robust Scientific Communities for EGI) proposal

• Increased number of grid pool accounts (500 for ALICE and LHCb, 1000 for ATLAS and CMS)

– Shown to be enough in the CMS October exercise

• Investigating the impact on the MyProxy servers of using SCAS to change identities in multi-user pilot jobs

• Update gLite User Guide (Lola, Andrea)

• Data management support (Andrea)

Conclusions

• Support to integration

– CREAM, FTS, SCAS, MyProxy, data management, etc.

• Support to operations

– VO box project, disk space management, troubleshooting, etc.

• Support to monitoring

– Site readiness, Nagios, SLS, Lemon, etc.

• Support to user community

– gLite User Guide, ROSCOE proposal, etc.