NAGIOS, Cacti, Prism Monitoring at NERSC

Thomas Davis (tdavis@nersc.gov)
David Skinner (dskinner@nersc.gov)

Overview

• Monitoring at NERSC
  – Node security
  – Web vs. shell vs. ??
• NAGIOS
• Cacti
• Prism
• Real-life examples

Monitoring at NERSC

• Performance vs. fault monitoring
• Each system gets a dedicated monitoring node.
  – The node is owned and maintained by NERSC, but is connected to the system's internals where possible.

Security of monitoring node

• Security is provided by firewalling the nodes from the outside world.
  – HTTPS only for web access.
  – Firewall, iptables, and Apache access controls.
  – Few or no local accounts.
    • Accounts are managed by NIM/LDAP.
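
The slide names iptables as one layer of protection but does not give the rule set. A minimal sketch, assuming the node should accept inbound HTTPS only from a list of trusted subnets (the subnet list and the function name `https_only_rules` are hypothetical), is to generate the rules as reviewable command lines instead of applying them directly:

```python
from typing import List


def https_only_rules(allowed_subnets: List[str]) -> List[List[str]]:
    """Build (but do not run) iptables commands for an HTTPS-only node.

    Returns one argument list per command, in the order they should be applied.
    """
    rules = [
        # Loopback traffic is needed by local daemons on the node.
        ["iptables", "-A", "INPUT", "-i", "lo", "-j", "ACCEPT"],
        # Accept replies to connections the node opened itself (e.g. plugin
        # polling); this also keeps the current admin session alive.
        ["iptables", "-A", "INPUT", "-m", "state",
         "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]
    for subnet in allowed_subnets:
        # HTTPS (TCP port 443) is the only new inbound service accepted.
        rules.append(["iptables", "-A", "INPUT", "-p", "tcp", "-s", subnet,
                      "--dport", "443", "-j", "ACCEPT"])
    # Default-deny goes last, after the ESTABLISHED rule already exists,
    # so applying the list over a remote session does not cut it off.
    rules.append(["iptables", "-P", "INPUT", "DROP"])
    return rules


if __name__ == "__main__":
    # 192.0.2.0/24 is a documentation-only range, used here as a placeholder.
    for rule in https_only_rules(["192.0.2.0/24"]):
        print(" ".join(rule))
```

Emitting the commands rather than executing them lets the rule set be reviewed and kept under version control alongside the Apache access controls.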

Web based

• Portable: the data can be viewed from any web browser.
• The system gives the outside world access to the data via tunnels.
• Access is logged.
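
The slide only says that access is logged. Since the nodes already sit behind Apache access controls, one plausible way to review that log is to parse Apache's access log. The sketch below assumes the Common Log Format, and `summarize_access` is a hypothetical helper rather than part of NERSC's tooling:

```python
import re
from collections import Counter

# Apache Common Log Format:
#   host ident authuser [timestamp] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize_access(lines):
    """Count requests per user, setting aside refused (401/403) ones."""
    per_user = Counter()
    refused = []
    for line in lines:
        match = CLF.match(line)
        if match is None:
            continue  # skip lines that are not in Common Log Format
        if match.group("status") in ("401", "403"):
            refused.append((match.group("host"), match.group("request")))
        else:
            # "-" is logged when the request carried no authenticated user.
            per_user[match.group("user")] += 1
    return per_user, refused
```

The refused list is the part most relevant to a tunnelled, outward-facing service: repeated 401s from one host are worth a closer look.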

NAGIOS

• NAGIOS (http://www.nagios.org/) is the primary fault monitoring system.
  – All systems have a NAGIOS monitoring process.
  – Some plugins are locally written:
    • DDN fault & temperature
    • Engenio/FAStT fault
    • Cray cabinet temperature
    • Cray xtcli node & link faults
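
Nagios runs each plugin as an external program and reads back two things: the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output, with optional performance data after a `|`. The locally written plugins are not shown in the talk. A minimal sketch of a temperature check in that style, with hypothetical thresholds and a placeholder where the real DDN or Cray query would go, looks like this:

```python
# Exit codes defined by the Nagios plugin API.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_temperature(reading, warn, crit, label="temp"):
    """Map one temperature reading to (exit code, status line).

    The status line is what Nagios displays; the text after '|' is
    performance data in the form label=value;warn;crit.
    """
    if reading is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading available"
    if reading > crit:
        state = CRITICAL
    elif reading > warn:
        state = WARNING
    else:
        state = OK
    status = f"TEMP {STATE_NAMES[state]} - {label} is {reading}"
    return state, f"{status} | {label}={reading};{warn};{crit}"


def read_cabinet_temperature():
    """Hypothetical placeholder for the site-specific query (DDN controller,
    Cray cabinet, ...). It returns None here, which is reported as UNKNOWN."""
    return None


def main():
    """Print the status line and return the exit code Nagios expects."""
    # Thresholds of 35 and 40 are illustrative values only.
    state, line = check_temperature(read_cabinet_temperature(), warn=35, crit=40)
    print(line)
    return state

# Installed as a plugin, the script would end with: raise SystemExit(main())
```

Keeping the threshold logic separate from the hardware query means the same `check_temperature` function could serve the DDN, Engenio, and Cray checks, with only the reader changing.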

Cacti

• http://www.cacti.net/
• Used to collect network, temperature, and DDN statistics.
  – Locally written plugins for thermal and DDN statistics.
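
Cacti's script data input methods run a command and read its standard output; when a script returns several values, Cacti expects them on one line as space-separated `name:value` pairs. The DDN and thermal plugins are not shown, so the sketch below assumes a hypothetical controller dump of `name value` lines and converts it to that format. `parse_key_value` and `format_for_cacti` are illustrative names:

```python
import re

# Field names must match the output fields defined for the Cacti
# Data Input Method, so restrict them to simple identifiers.
FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _number(raw):
    """Parse an integer or float counter; return None if it is neither."""
    try:
        return int(raw)
    except ValueError:
        try:
            return float(raw)
        except ValueError:
            return None


def parse_key_value(text):
    """Parse hypothetical 'name value' lines, ignoring anything else."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and FIELD_NAME.fullmatch(fields[0]):
            value = _number(fields[1])
            if value is not None:
                stats[fields[0]] = value
    return stats


def format_for_cacti(stats):
    """Render {field: number} as the single 'field:value field:value' line
    that a Cacti script data input reads from standard output."""
    return " ".join(f"{name}:{value}" for name, value in stats.items())
```

Integers are tried before floats so that large byte counters are passed through exactly rather than rounded.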

Prism

• Locally written
• Understands RDF feeds.
• Can aggregate feeds into one display.
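
Prism's source is not described beyond this slide. As an illustration of the idea only, assuming the feeds are RSS 1.0 (RDF Site Summary) documents, the Python standard library is enough to pull items out of several feeds and merge them into one time-ordered list; `feed_items` and `aggregate` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Namespaces used by RSS 1.0 (RDF Site Summary) feeds.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def feed_items(xml_text, source):
    """Yield (date, source, title, link) for each item in one RSS 1.0 feed."""
    root = ET.fromstring(xml_text)
    # In RSS 1.0, <item> elements are direct children of <rdf:RDF>.
    for item in root.findall("rss:item", NS):
        yield (
            item.findtext("dc:date", default="", namespaces=NS),
            source,
            item.findtext("rss:title", default="", namespaces=NS),
            item.findtext("rss:link", default="", namespaces=NS),
        )


def aggregate(feeds):
    """Merge {source name: feed XML} into one list, newest item first.

    Sorting dc:date as text is only correct when every feed writes
    ISO 8601 timestamps with the same UTC offset (e.g. all ending in 'Z').
    """
    merged = []
    for source, xml_text in feeds.items():
        merged.extend(feed_items(xml_text, source))
    merged.sort(key=lambda row: row[0], reverse=True)
    return merged
```

Tagging each row with its source name is what lets a single display mix, for example, NAGIOS alerts and Cacti notices while still showing where each came from.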

NAGIOS status

NAGIOS – xtcli status

Cacti – DDN statistics

Cacti – Thermal Plugin

Cacti – Thermal Plugin

Cacti – Network Weather Map

Prism