02/07/09 1 WLCG NAGIOS Kashif Mohammad Deputy Technical Co-ordinator (South Grid) University of Oxford



WLCG NAGIOS

Kashif Mohammad

Deputy Technical Co-ordinator (South Grid)

University of Oxford


WLCG NAGIOS

• WLCG Nagios is part of EGEE SA1 Multi-Level Monitoring (MLM), which provides an integrated project-level monitoring system for EGEE III. https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview

• It is based on the EGEE III Operations Automation Strategy, designed to suit a future federated infrastructure such as EGI.org.

• WLCG Nagios at ROC level is supposed to replace central monitoring such as SAM in the post-EGEE era.


WLCG NAGIOS

WLCG Nagios is built from many components; a few of them are:

• Nagios Configuration Generator (NCG): a configuration tool that creates the Nagios configuration by querying GOCDB, the site BDII and the Metric Description Database.

• Metric Description Database: a project-level database that describes the tests which should be run against grid services at EGEE sites.

• MSG-Nagios bridge: listens on the messaging system for messages destined for this Nagios instance and pushes them to Nagios.
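As a rough illustration of the last step, "pushing" a result into Nagios is commonly done by writing a `PROCESS_SERVICE_CHECK_RESULT` line to the Nagios external command file. The sketch below only formats and writes such a line. Whether the real MSG-Nagios bridge works this way is an assumption, and the host name, service name and command file path used with it are hypothetical.

```python
import time

# Standard Nagios service status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_result_command(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())
    # A newline would terminate the command early, so flatten the output.
    clean_output = output.replace("\n", " ").strip()
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}".format(
        timestamp, host, service, return_code, clean_output
    )


def submit(command_line, command_file):
    """Write one command line to the Nagios command file.

    On a real server the command file is a named pipe read by Nagios;
    its location is site-specific, so it is passed in by the caller.
    """
    with open(command_file, "w") as handle:
        handle.write(command_line + "\n")
```

A bridge built along these lines would call `passive_result_command("ce.example.org", "example-check", CRITICAL, "job failed")` for each incoming message and hand the resulting line to `submit`.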


WLCG NAGIOS

WLCG Nagios uses two types of probes at the regional level.

Remote Probes: These are probes executed against a site by external agents. WLCG Nagios uses two such external agents, namely SAM grid monitoring probes and ENOC Network Monitoring Probes. In Nagios terms, these are passive service checks.

Local Probes: These are tests that the site monitoring service schedules itself. Most of these tests are replicas of SAM tests, written as Nagios probes and submitted through the User Interface using a grid proxy. In Nagios terms, these are active service checks.
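To make the "active service check" side concrete: a Nagios probe reports its result through an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a one-line status message. The following is a minimal sketch of that convention only. The probe name, the thresholds and the `measure` callable are all hypothetical and stand in for a real grid test such as a job submission.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed_seconds, warn_after, crit_after):
    """Map a measured duration to a Nagios status code."""
    if elapsed_seconds >= crit_after:
        return CRITICAL
    if elapsed_seconds >= warn_after:
        return WARNING
    return OK


def format_status_line(name, code, detail):
    """Build the single status line Nagios shows for the service."""
    return "{} {} - {}".format(name, LABELS[code], detail)


def run_probe(measure, name="EXAMPLE", warn_after=30.0, crit_after=120.0):
    """Run a hypothetical test and return (exit_code, status_line).

    `measure` is any callable returning the test duration in seconds;
    an exception (for example an expired proxy) is reported as UNKNOWN.
    """
    try:
        elapsed = measure()
    except Exception as exc:
        return UNKNOWN, format_status_line(name, UNKNOWN, str(exc))
    code = classify(elapsed, warn_after, crit_after)
    return code, format_status_line(name, code, "took {:.1f}s".format(elapsed))
```

A real probe script would print the status line and pass the code to `sys.exit`, which is how Nagios learns the outcome of a check it scheduled itself.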


UKI WLCG NAGIOS SETUP AT OXFORD

[Diagram: Personal Computer, User Interface, Nagios Server, Myproxy Server, ENOC Server, SAM Server and LCG Grid, with arrows labelled "Upload Proxy" and "Local Tests"]


UKI WLCG NAGIOS SETUP AT OXFORD

We have installed a WLCG Nagios instance at Oxford for UKI:

www.gridppnagios.physics.ox.ac.uk/nagios

Access is restricted to members of the dteam and ops VOs. Access can also be granted to non-VO members who have a grid certificate.

A brief introduction is provided at http://www.gridpp.ac.uk/wiki/UKI_Regional_Nagios (I still have to expand it).


[Figure: panels labelled SAM, Local and NPM]


UKI WLCG NAGIOS SETUP AT OXFORD

• You can subscribe to alarm notifications by dropping me a mail.

• Local tests are more frequent than SAM tests, so they can sometimes be useful. Are they?

• Which alarms are useful?

• Alarm notifications can be fine-tuned, but I need feedback.