International Workshop on Large Scale Computing, VECC, Kolkata, Feb 8-10, 2006
LCG Software Activities in India
Rajesh K.
Computer Division
BARC
DAE-CERN Collaboration on Grid Computing
Agreement for collaboration in software development for LCG
5 year period 2003-2007 (inclusive)
50 FTE years
Participating DAE institutes: BARC, CAT, VECC, TIFR
Current Projects
3 projects currently underway in the area of Grid Monitoring and Fabric Management
GRIDVIEW: A visualization tool for LCG
ELFMS: Extremely Large Fabric Management System
CC Tracker: Computer Centre Tracker
GRIDVIEW – Visualization Tool for LCG
Visualization system for viewing monitoring information from the LCG
Dashboard showing different status views for different kinds of information
– Site-wise
– VO-wise
– Etc.
Intended for use in GOCs and ROCs but not restricted to that
Gridview
Collects monitoring information from different monitoring tools from grid sites using R-GMA as transport
– GridFTP monitor
– SFT
– RB Logs etc.
Archival of monitoring information in a central Oracle database at CERN
Analysis of this data to generate summaries
Visualization of summary data through Web interface and GUI
Gridview: Architecture
Gridview: Current
GridFTP transfer monitoring
– In production use
– Display of network throughput and total data transferred
    Different host/destination
    VO-wise, Site-wise, Host-wise
    Current, Hourly, Daily, Monthly etc.
– Used during SC3 throughput tests
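The kind of summary listed above (total data transferred and throughput, grouped VO-wise or site-wise per hour) can be sketched as follows. This is only an illustration under stated assumptions, not Gridview's actual code: the `Transfer` record layout, the function name `hourly_summary` and the sample values are all hypothetical, and every transfer is attributed to the hour in which it started.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transfer:
    """One finished GridFTP transfer (hypothetical record layout)."""
    vo: str
    src_site: str
    dst_site: str
    start: datetime
    nbytes: int


def hourly_summary(transfers, key):
    """Sum bytes per (group, hour) and derive the average throughput in MB/s.

    `key` selects the grouping, e.g. `lambda t: t.vo` for a VO-wise view.
    All bytes of a transfer are attributed to the hour in which it started.
    """
    totals = defaultdict(int)
    for t in transfers:
        hour = t.start.replace(minute=0, second=0, microsecond=0)
        totals[(key(t), hour)] += t.nbytes
    return {
        group_hour: {"bytes": b, "avg_mb_per_s": b / 1e6 / 3600}
        for group_hour, b in totals.items()
    }


# Made-up sample records, for illustration only.
transfers = [
    Transfer("alice", "CERN", "VECC", datetime(2006, 2, 8, 10, 5), 3_600_000_000),
    Transfer("alice", "CERN", "TIFR", datetime(2006, 2, 8, 10, 40), 3_600_000_000),
    Transfer("cms", "CERN", "TIFR", datetime(2006, 2, 8, 11, 15), 1_800_000_000),
]

vo_wise = hourly_summary(transfers, key=lambda t: t.vo)
site_wise = hourly_summary(transfers, key=lambda t: t.dst_site)
```

Daily or monthly views would follow the same pattern with a coarser time bucket in place of the hourly one.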
Gridview: In development
Job Status Monitoring
– Total number of jobs at grid sites in different states
– VO-wise, RB-wise distribution
– Site wise job failure rate, utilization etc.
Grid Dashboard
– Pictorial representation of site status info on a world map
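As a rough illustration of the job status summaries listed above, the sketch below counts jobs per site and state and derives a site-wise failure rate. The input format, the state names (`done`, `failed`, `running`) and the definition of failure rate as failed / (done + failed) are assumptions made for this example, not taken from Gridview or the RB logs.

```python
from collections import Counter, defaultdict


def job_summary(jobs):
    """Count jobs per site and state, and derive a site-wise failure rate.

    `jobs` is an iterable of (site, vo, state) tuples. The failure rate is
    failed / (done + failed); it is None while no job has finished.
    """
    by_site = defaultdict(Counter)
    for site, _vo, state in jobs:
        by_site[site][state] += 1

    summary = {}
    for site, states in by_site.items():
        finished = states["done"] + states["failed"]
        rate = states["failed"] / finished if finished else None
        summary[site] = {"states": dict(states), "failure_rate": rate}
    return summary


# Made-up sample data, for illustration only.
jobs = [
    ("site-a", "alice", "done"),
    ("site-a", "alice", "failed"),
    ("site-a", "cms", "done"),
    ("site-a", "cms", "running"),
    ("site-b", "cms", "running"),
]
summary = job_summary(jobs)
```

A VO-wise or RB-wise distribution could be obtained the same way by grouping on a different field.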
ELFMS: Extremely Large Fabric Management System
Participation in CERN Project on ELFMS
ELFMS is used to manage and monitor thousands of nodes in the CERN computer centre and other LCG sites
Contribution to development and support for ELFMS modules LEMON and QUATTOR
LEMON: LHC Era MONitoring
Lemon is a system designed to monitor performance metrics, exceptions & status information of extremely large clusters
At CERN it monitors ~2000 nodes in ~70 clusters, with ~150 metrics/host, producing ~1GB of data. It is estimated to scale to up to 10000 nodes
A variety of web-based views of monitored data for sysadmins, managers and users
Highly modular architecture allows the integration of user developed sensors for monitoring site-specific metrics.
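The idea of plugging user-developed sensors into a modular monitoring agent can be illustrated with a small registry. This is a generic sketch and not Lemon's sensor API: the `sensor` decorator, the `SENSORS` table, `sample_all` and the metric name `scratch_free_fraction` are all hypothetical names chosen for the example.

```python
import shutil
import time

SENSORS = {}  # metric name -> function that samples it


def sensor(metric_name):
    """Register a function as the sampler for one named metric."""
    def register(func):
        SENSORS[metric_name] = func
        return func
    return register


@sensor("scratch_free_fraction")
def scratch_free_fraction(path="."):
    """Example of a site-specific metric: free fraction of a filesystem."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def sample_all():
    """Call every registered sensor once; return (name, timestamp, value)."""
    now = time.time()
    return [(name, now, read()) for name, read in SENSORS.items()]


samples = sample_all()
```

A site would add its own metric by writing another decorated function; the agent loop itself does not change.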
Contribution to LEMON project
TCP based communication between agent and server instead of the current UDP based one.
SSL encryption
A lightweight correlation engine to generate exception metrics and launch fault tolerant actuators in response to an undesired state. This sensor supports multiple metric correlation and mathematical operations between correlated metrics
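A correlation engine of the kind described above can be pictured as a set of rules, each combining several raw metrics with arithmetic and comparisons into one exception metric and launching an actuator when the undesired state is detected. The sketch below is a simplified illustration only: the `Rule` class, the metric names and the thresholds are hypothetical, and catching actuator errors so that the engine keeps running is a design assumption, not a description of the Lemon implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical rule: an exception metric plus the action it triggers."""
    name: str
    condition: Callable[[Metrics], bool]  # correlates several raw metrics
    actuator: Callable[[], None]          # corrective action to launch


def evaluate(rules: List[Rule], metrics: Metrics) -> Dict[str, bool]:
    """Return the value of every exception metric, running actuators as needed."""
    exceptions = {}
    for rule in rules:
        raised = rule.condition(metrics)
        exceptions[rule.name] = raised
        if raised:
            try:
                rule.actuator()
            except Exception as err:
                # Design assumption: a failing actuator must not stop the engine.
                print(f"actuator for {rule.name} failed: {err}")
    return exceptions


actions = []  # stands in for real corrective actions
rules = [
    Rule(
        name="swap_pressure",
        # Mathematical operation (a ratio) combined with a second metric.
        condition=lambda m: m["swap_used"] / m["swap_total"] > 0.9 and m["load_avg"] > 8,
        actuator=lambda: actions.append("restart_service"),
    )
]
result = evaluate(rules, {"swap_used": 950.0, "swap_total": 1000.0, "load_avg": 12.0})
```

With the sample values, the swap ratio is 0.95 and the load is above the threshold, so the rule fires and the stand-in actuator records one action.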
Quattor
Quattor is a tool suite providing automated installation, configuration and management of clusters and farms
Highly suitable to install, configure and manage Grid computing clusters correctly and automatically
At CERN, it is currently used to automatically manage more than 2000 nodes with heterogeneous hardware and software applications
Centrally configurable & reproducible installations, run time management for functional & security updates to maximize availability
Contribution to Quattor project
Testing all new releases of Quattor outside CERN.
Presently working on the SWRep (Software Repository) part of Quattor.
Changing the SWRep framework from SSH to SOAP with SSL.
CC Tracker
A tool to Visualize the CERN Computer Centre
Simplifies management of the CERN Computer Centre for LHC scale operations
Allows easy invocation of service management and operational interventions across sets of selected nodes
CC Tracker Functionality
Visualize Computer Centre
– Physical View: Display of all CC rooms with Racks, Disk cabinets, Tape silos, PDUs
– Logical View: Display Domains, Clusters, Sub clusters & hosts in a hierarchical way
Manage Hardware:
– Add, Move, Rename & Retire operations for a set of machines
– Change cluster, update kernel, update OS, shutdown, reboot & set desired state
Manage Infrastructure:
– Addition/deletion of Rack, Disk Cabinet, Tape Silo, PDU and Tape drive
– Updating properties (name, location)
– Check power consumption by rack, zone, room & cluster
– Check cost by rack, zone, room & cluster
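The "check power consumption by rack, zone, room & cluster" operation amounts to a grouped sum over the machine inventory. A minimal sketch follows, assuming each machine record carries its location, cluster and a rated power figure; the field names and sample values are hypothetical and not taken from CC Tracker.

```python
from collections import defaultdict


def power_by(machines, level):
    """Sum rated power in watts per room, zone, rack or cluster."""
    totals = defaultdict(float)
    for m in machines:
        totals[m[level]] += m["watts"]
    return dict(totals)


# Made-up inventory, for illustration only.
machines = [
    {"host": "node001", "room": "R1", "zone": "Z1", "rack": "A01", "cluster": "batch", "watts": 250.0},
    {"host": "node002", "room": "R1", "zone": "Z1", "rack": "A01", "cluster": "batch", "watts": 250.0},
    {"host": "node003", "room": "R1", "zone": "Z2", "rack": "B07", "cluster": "disk", "watts": 300.0},
]

per_rack = power_by(machines, "rack")
per_room = power_by(machines, "room")
per_cluster = power_by(machines, "cluster")
```

The cost check could reuse the same function with a cost field in place of `watts`.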
CC Tracker: Logical & Physical view
Completed Projects
SHIVA – Problem Tracking System
QoS: Quality of Service Prediction for worker nodes
RDBMS backend for POOL
LCG-AliEn Storage Element Interface
Test Suite for perl harnessing with AliEn
SHIVA Problem Tracking System
General purpose bug tracking system for keeping track of bugs, features and other issues in software development projects
SHIVA accepts problem reports from users, routes them to troubleshooters and maintains archives of problems and solutions
Generate reports and statistics
Implement helpdesk systems for services
Used as a problem tracker for SC3 related work
SHIVA: Screenshot
QoS: Predicting Quality of Service of Worker nodes
Deriving a composite metric for the quality of service offered by a worker node
QoS is computed by a correlation engine which takes simple metrics such as load average, free memory and so on as input
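One simple way to turn metrics such as load average and free memory into a single composite value is a weighted average of normalised scores. The slides do not say how the correlation engine combines its inputs, so the linear weighting, the `qos_score` function, the weights and the limits below are all assumptions made for illustration.

```python
def qos_score(metrics, weights, limits):
    """Weighted average of per-metric scores, each scaled to [0, 1].

    limits[name] = (worst, best): the metric values mapped to 0 and to 1.
    This also covers "lower is better" metrics, where worst > best.
    """
    total = 0.0
    for name, weight in weights.items():
        worst, best = limits[name]
        score = (metrics[name] - worst) / (best - worst)
        total += weight * min(1.0, max(0.0, score))  # clamp to [0, 1]
    return total / sum(weights.values())


# Hypothetical weights and limits for a worker node.
weights = {"load_avg": 0.5, "free_mem_mb": 0.5}
limits = {"load_avg": (8.0, 0.0), "free_mem_mb": (0.0, 1024.0)}

qos = qos_score({"load_avg": 2.0, "free_mem_mb": 512.0}, weights, limits)
```

For these sample values the load average scores 0.75 and the free memory 0.5, so the composite QoS is 0.625.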
RelationalStorageSvc for POOL
Designed and Developed the “Relational StorageSvc” module prototype for POOL.
Designed the prototype as a plugin in POOL framework.
Provided solution for Remote Database connectivity.
Implemented interfaces of the POOL Storage Manager for ORACLE backend.
Demonstrated the Navigation, Storage and Retrieval of data using ORACLE and the ODBC connectivity option.
Tested the cross technology referencing concept of POOL Storage Manager using ROOT and ORACLE, for primitive and referenced data types
Courtesy: Anil Rawat, Centre for Advanced Technology, Indore
RSSvc’s place in POOL Architecture
POOL API
– Storage Service
    ROOT I/O Storage Svc
    RDBMS Storage Svc ?
– FileCatalog
    XMLCatalog
    MySQLCatalog
    EDG Replica Location Service
– Collections
    ExplicitCollection
    ImplicitCollection
Courtesy: Anil Rawat, Centre for Advanced Technology, Indore
LCG-AliEn Storage Element Interface
Test Bed Installation for Grid Environment:
– One central server and two Sites.
Installation of Certification Authority Server
Installation of GridFTP Library under AliEn
– The GridFTP daemon in.ftpd has been used as server and globus-url-copy has been used as client
Development of AliEn-SE Interface via GridFTP
– These newly developed modules along with necessary GridFTP libraries and changes made in existing AliEn Code have been committed to CVS Server at CERN.
Courtesy: Tapas Samanta, VECC, Kolkata
Quality Assurance and Test Environment for AliEn-ARDA Prototype
Exploration and Design of Test Scripts using perl.
Implementation of Test Scripts for each individual perl sub-module of AliEn.
Individual perl sub-modules of AliEn code were tested for proper functionality.
The test scripts generate a detailed report of the individual tests and maintain a log.
Validation of Test Scripts and Procedures.
Testing Modules with perl Harnessing Environment.
The complete suite was tested at CERN under the perl Harnessing Environment for testing AliEn online and generating an online consolidated report of the test.
Inline Documentation to the extent possible.
Courtesy: Tapas Samanta, VECC, Kolkata
Acknowledgements
Gridview and Shiva team: Phool Chand, Digamber Sonvane, Kislay Bhatt
ELFMS, CC Tracker and QoS team: R.S.Mundada, R.Sharma, D.Sarode, C.Murthy
Anil Rawat, CAT, Indore
T.Samanta, VECC, Kolkata
Colleagues from IT Dept., CERN