International Workshop on Large Scale Computing, VECC, Kolkata, Feb 8-10, 2006

LCG Software Activities in India

Rajesh K.

Computer Division

BARC

DAE-CERN Collaboration on Grid Computing

Agreement for collaboration in software development for LCG

5 year period 2003-2007 (inclusive)

50 FTE years

Participating DAE institutes: BARC, CAT, VECC, TIFR

Current Projects

3 projects currently underway in the area of Grid Monitoring and Fabric Management

GRIDVIEW: A visualization tool for LCG

ELFMS: Extremely Large Fabric Management System

CC Tracker: Computer Centre Tracker

GRIDVIEW – Visualization Tool for LCG

Visualization system for viewing monitoring information from the LCG

Dashboard showing different status views for different kinds of information

– Site-wise
– VO-wise
– Etc.

Intended for use in GOCs and ROCs but not restricted to that

Gridview

Collects monitoring information from different monitoring tools from grid sites using R-GMA as transport

– GridFTP monitor
– SFT
– RB logs etc.

Archival of monitoring information in a central Oracle database at CERN

Analysis of this data to generate summaries

Visualization of summary data through Web interface and GUI

Gridview: Architecture

Gridview: Current

GridFTP transfer monitoring
– In production use
– Display of network throughput and total data transferred
   Different host/destination
   VO-wise, Site-wise, Host-wise
   Current, Hourly, Daily, Monthly etc.
– Used during SC3 throughput tests
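
The site-wise, VO-wise and hourly views listed above amount to grouping transfer records and summing bytes. A minimal sketch follows; the record fields (`site`, `vo`, `end`, `bytes`), the sample values and the function name are assumptions made for illustration and do not reflect Gridview's actual schema or code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transfer records; field names and values are illustrative only.
TRANSFERS = [
    {"site": "SiteA", "vo": "alice", "end": datetime(2006, 1, 10, 9, 15), "bytes": 3_600_000_000},
    {"site": "SiteA", "vo": "cms", "end": datetime(2006, 1, 10, 9, 40), "bytes": 1_800_000_000},
    {"site": "SiteB", "vo": "alice", "end": datetime(2006, 1, 10, 10, 5), "bytes": 7_200_000_000},
]


def hourly_summary(transfers, key):
    """Sum bytes per (key value, hour bucket) and derive an average rate.

    Each transfer is attributed to the hour in which it ended, which is a
    simplification chosen for this sketch.
    """
    totals = defaultdict(int)
    for t in transfers:
        hour = t["end"].replace(minute=0, second=0, microsecond=0)
        totals[(t[key], hour)] += t["bytes"]
    # Average throughput over the whole hour, in MB/s (1 MB = 10**6 bytes).
    return {
        k: {"bytes": b, "avg_mb_per_s": b / 1e6 / 3600}
        for k, b in totals.items()
    }


site_view = hourly_summary(TRANSFERS, "site")  # Site-wise, hourly
vo_view = hourly_summary(TRANSFERS, "vo")      # VO-wise, hourly
```

Daily or monthly views would follow the same pattern with a coarser time bucket.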

Gridview: In development

Job Status Monitoring
– Total number of jobs at grid sites in different states
– VO-wise, RB-wise distribution
– Site-wise job failure rate, utilization etc.

Grid Dashboard
– Pictorial representation of site status info on a world map
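
As an illustration of the site-wise job failure rate mentioned above, the sketch below counts finished jobs per site and divides failed by finished. The job tuples, the state names (`done`, `aborted`, `running`) and the function name are assumptions for this example, not the states or code used by Gridview.

```python
from collections import Counter

# Hypothetical job records: (site, VO, final state). All values are illustrative.
JOBS = [
    ("SiteA", "alice", "done"),
    ("SiteA", "alice", "aborted"),
    ("SiteA", "cms", "done"),
    ("SiteA", "cms", "done"),
    ("SiteB", "atlas", "aborted"),
    ("SiteB", "atlas", "running"),
]


def failure_rate_by_site(jobs, failed_states=("aborted",),
                         finished_states=("done", "aborted")):
    """Return failed / finished per site; unfinished jobs are ignored."""
    finished = Counter()
    failed = Counter()
    for site, _vo, state in jobs:
        if state in finished_states:
            finished[site] += 1
            if state in failed_states:
                failed[site] += 1
    return {site: failed[site] / count for site, count in finished.items()}


rates = failure_rate_by_site(JOBS)
```

Grouping on the VO or RB field instead of the site would give the VO-wise and RB-wise distributions.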

ELFMS: Extremely Large Fabric Management System

Participation in CERN Project on ELFMS

ELFMS is used to manage and monitor thousands of nodes in the CERN computer centre and other LCG sites

Contribution to development and support for ELFMS modules LEMON and QUATTOR

LEMON: LHC Era MONitoring

Lemon is a system designed to monitor performance metrics, exceptions & status information of extremely large clusters

At CERN it monitors ~2000 nodes, ~70 clusters with ~150 metrics/host producing ~1GB of data. Estimated to monitor up to 10000 nodes

A variety of web-based views of monitored data for sysadmins, managers and users

Highly modular architecture allows the integration of user developed sensors for monitoring site-specific metrics.

Contribution to LEMON project

TCP based communication between agent and server instead of the current UDP based one.

SSL encryption

A lightweight correlation engine to generate exception metrics and launch fault tolerant actuators in response to an undesired state. This sensor supports multiple metric correlation and mathematical operations between correlated metrics.
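
The idea of correlating several metrics with arithmetic and launching an actuator on an undesired state can be sketched as below. This is only an illustration under stated assumptions: the rule format, metric names, thresholds and actuator names are hypothetical and are not LEMON's configuration syntax or API.

```python
import operator

# Arithmetic and comparison operators a rule may use.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
COMPARE = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def evaluate_rule(rule, samples):
    """Combine two metrics arithmetically and compare the result to a threshold."""
    value = OPS[rule["op"]](samples[rule["left"]], samples[rule["right"]])
    return COMPARE[rule["cmp"]](value, rule["threshold"])


def run_engine(rules, samples, actuators):
    """Fire the actuator of every rule whose exception condition holds."""
    fired = []
    for rule in rules:
        if evaluate_rule(rule, samples):
            actuators[rule["actuator"]](samples)
            fired.append(rule["name"])
    return fired


# Hypothetical actuators that only record what they would do.
actions = []
ACTUATORS = {
    "restart_service": lambda samples: actions.append("restart_service"),
    "drain_node": lambda samples: actions.append("drain_node"),
}

# Hypothetical rules: swap usage ratio above 90%, load exceeding CPU count by 4.
RULES = [
    {"name": "swap_pressure", "left": "swap_used_mb", "op": "/", "right": "swap_total_mb",
     "cmp": ">", "threshold": 0.9, "actuator": "restart_service"},
    {"name": "overload", "left": "load_avg", "op": "-", "right": "cpu_count",
     "cmp": ">", "threshold": 4, "actuator": "drain_node"},
]

samples = {"swap_used_mb": 950, "swap_total_mb": 1000, "load_avg": 3.0, "cpu_count": 2}
fired = run_engine(RULES, samples, ACTUATORS)
```

With these sample values only the swap rule fires, since 950 / 1000 exceeds 0.9 while 3.0 - 2 does not exceed 4.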

Quattor

Quattor is a tool suite providing automated installation, configuration and management of clusters and farms

Highly suitable to install, configure and manage Grid computing clusters correctly and automatically

At CERN, currently used to automatically manage more than 2000 nodes with heterogeneous hardware and software applications

Centrally configurable & reproducible installations, run time management for functional & security updates to maximize availability

Contribution to Quattor project

Testing all new Quattor releases outside CERN.

Presently working on the SWRep (Software Repository) part of Quattor.

Changing the SWRep framework from SSH to SOAP with SSL.

CC Tracker

A tool to Visualize the CERN Computer Centre

Simplifies management of the CERN Computer Centre for LHC scale operations

Allows easy invocation of service management and operational interventions across sets of selected nodes

CC Tracker Functionality

Visualize Computer Centre
– Physical View: Display of all CC rooms with Racks, Disk cabinets, Tape silos, PDUs
– Logical View: Display Domains, Clusters, Sub clusters & hosts in a hierarchical way

Manage Hardware:
– Add, Move, Rename & Retire operations for a set of machines
– Change cluster, update kernel, update OS, shutdown, reboot & set desired state

Manage Infrastructure:
– Addition/deletion of Rack, Disk Cabinet, Tape Silo, PDU and Tape drive
– Updating properties (name, location)
– Check power consumption by rack, zone, room & cluster
– Check cost by rack, zone, room & cluster
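
Checking power consumption by rack, zone, room or cluster is a roll-up over the machine inventory. The sketch below shows one way to express it; the inventory fields, host names and wattages are invented for illustration and do not describe CC Tracker's data model.

```python
from collections import defaultdict

# Hypothetical inventory; every name and number here is illustrative.
MACHINES = [
    {"host": "node001", "room": "R1", "zone": "Z1", "rack": "RA01", "cluster": "batch", "watts": 250},
    {"host": "node002", "room": "R1", "zone": "Z1", "rack": "RA01", "cluster": "batch", "watts": 250},
    {"host": "disk001", "room": "R1", "zone": "Z2", "rack": "RB07", "cluster": "storage", "watts": 400},
    {"host": "node101", "room": "R2", "zone": "Z1", "rack": "RC03", "cluster": "batch", "watts": 300},
]


def power_by(machines, levels):
    """Sum watts per combination of the given grouping levels.

    `levels` is a tuple such as ("room",) or ("room", "zone"), so that a
    zone name reused in two rooms is not merged by accident.
    """
    totals = defaultdict(int)
    for m in machines:
        totals[tuple(m[level] for level in levels)] += m["watts"]
    return dict(totals)


by_room = power_by(MACHINES, ("room",))
by_zone = power_by(MACHINES, ("room", "zone"))
by_cluster = power_by(MACHINES, ("cluster",))
```

A cost roll-up would use the same grouping with a cost field in place of `watts`.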

CC Tracker: Logical & Physical view

Completed Projects

SHIVA – Problem Tracking System

QoS: Quality of Service Prediction for worker nodes

RDBMS backend for POOL

LCG-AliEn Storage Element Interface

Test Suite for perl harnessing with AliEn

SHIVA Problem Tracking System

General purpose bug tracking system for keeping track of bugs, features and other issues in software development projects

SHIVA accepts problem reports from users, routes them to troubleshooters and maintains archives of problems and solutions

Generate reports and statistics

Implement helpdesk systems for services

Used as a problem tracker for SC3 related work

SHIVA: Screenshot

QoS: Predicting Quality of Service of Worker nodes

Deriving a composite metric for the quality of service offered by a worker node

QoS computed by a correlation engine which takes simple metrics such as load average, free memory and so on as input
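
One possible shape for such a composite metric is a weighted average of normalised simple metrics. The slides do not give the actual formula, so everything below (the linear normalisation, the weights, the worst/best bounds and the metric names) is an assumption chosen purely for illustration.

```python
def normalise(value, worst, best):
    """Map a raw metric linearly onto [0, 1], with `worst` -> 0 and `best` -> 1."""
    score = (value - worst) / (best - worst)
    return max(0.0, min(1.0, score))


# Hypothetical specification: metric name -> (worst value, best value, weight).
METRIC_SPEC = {
    "load_avg": (8.0, 0.0, 0.5),         # lower load is better
    "free_mem_mb": (0.0, 2048.0, 0.5),   # more free memory is better
}


def qos(sample, spec=METRIC_SPEC):
    """Weighted average of normalised metrics; 1.0 is best, 0.0 is worst."""
    total_weight = sum(weight for _, _, weight in spec.values())
    weighted = sum(
        weight * normalise(sample[name], worst, best)
        for name, (worst, best, weight) in spec.items()
    )
    return weighted / total_weight


score = qos({"load_avg": 2.0, "free_mem_mb": 1024.0})
```

For this sample the load term normalises to 0.75 and the memory term to 0.5, giving a composite score of 0.625.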

RelationalStorageSvc for POOL

Designed and Developed the “Relational StorageSvc” module prototype for POOL.

Designed the prototype as a plugin in POOL framework.

Provided solution for Remote Database connectivity.

Implemented interfaces of the POOL Storage Manager for ORACLE backend.

Demonstrated the Navigation, Storage and Retrieval of data using ORACLE and the ODBC connectivity option.

Tested the cross technology referencing concept of POOL Storage Manager using ROOT and ORACLE, for primitive and referenced data types.

Courtesy: Anil Rawat, Centre for Advanced Technology, Indore

RSSvc’s place in POOL Architecture

POOL API
– Storage Service
   ROOT I/O Storage Svc
   RDBMS Storage Svc ?
– FileCatalog
   XMLCatalog
   MySQLCatalog
   EDG Replica Location Service
– Collections
   ExplicitCollection
   ImplicitCollection

Courtesy: Anil Rawat, Centre for Advanced Technology, Indore

LCG-AliEn Storage Element Interface

Test Bed Installation for Grid Environment:
– One central server and two Sites.

Installation of Certification Authority Server

Installation of GridFTP Library under AliEn
– The GridFTP daemon in.ftpd has been used as server and globus-url-copy has been used as client

Development of AliEn-SE Interface via GridFTP
– These newly developed modules along with necessary GridFTP libraries and changes made in existing AliEn Code have been committed to CVS Server at CERN.

Courtesy: Tapas Samanta, VECC, Kolkata
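
To illustrate how a storage element interface might drive globus-url-copy as its client, the sketch below only builds the command line (source URL followed by destination URL) and does not execute it. The function names, host name and paths are hypothetical; the actual AliEn-SE interface was written as AliEn modules and is not reproduced here.

```python
def build_copy_command(source_url, dest_url):
    """Return the argument list for a globus-url-copy transfer (not executed here)."""
    for url in (source_url, dest_url):
        if not url.startswith(("gsiftp://", "file://")):
            raise ValueError(f"unsupported URL scheme: {url}")
    return ["globus-url-copy", source_url, dest_url]


def se_put(local_path, se_host, remote_path):
    """Build the command that copies a local file to a storage element."""
    if not (local_path.startswith("/") and remote_path.startswith("/")):
        raise ValueError("paths must be absolute")
    return build_copy_command(f"file://{local_path}", f"gsiftp://{se_host}{remote_path}")


def se_get(se_host, remote_path, local_path):
    """Build the command that fetches a file from a storage element."""
    if not (local_path.startswith("/") and remote_path.startswith("/")):
        raise ValueError("paths must be absolute")
    return build_copy_command(f"gsiftp://{se_host}{remote_path}", f"file://{local_path}")


# Hypothetical host and paths, for illustration only.
put_cmd = se_put("/tmp/run42.root", "se.example.org", "/storage/alice/run42.root")
get_cmd = se_get("se.example.org", "/storage/alice/run42.root", "/tmp/copy.root")
```

On a host with the Globus client tools and valid grid credentials, such a list could be handed to `subprocess.run`.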

Quality Assurance and Test Environment for AliEn-ARDA Prototype

Exploration and Design of Test Scripts using perl.

Implementation of Test Scripts for each individual perl sub-module of AliEn.

Individual perl sub-modules of AliEn code were tested for proper functionality.

The test suite generates a detailed report of the individual tests and maintains a log.

Validation of Test Scripts and Procedures.

Testing Modules with perl Harnessing Environment.

The complete suite was tested at CERN under the perl Harnessing Environment for testing AliEn online and generating an online consolidated report of the test.

Inline Documentation to the extent possible.

Courtesy: Tapas Samanta, VECC, Kolkata

Acknowledgements

Gridview and Shiva team: Phool Chand, Digamber Sonvane, Kislay Bhatt

ELFMS, CC Tracker and QoS team: R.S. Mundada, R. Sharma, D. Sarode, C. Murthy

Anil Rawat, CAT, Indore

T. Samanta, VECC, Kolkata

Colleagues from IT Dept., CERN