Using Nagios to monitor your WO systems
Nagios for WO systems
Pascal Robert, Druide informatique
Nagios
• Open source project
• Available since 1999 (Netsaint)
• Pretty much the standard
• Interface a bit old (frames!)
Installation
• CentOS/Amazon Linux: yum install nagios nagios-plugins-all
• Ubuntu: apt-get install nagios3
• Mac OS X: port install nagios
Configuration directory
• CentOS/Amazon Linux: /etc/nagios (Apache configuration: /etc/httpd/conf.d/nagios.conf)
• Ubuntu: /etc/nagios3
• Mac OS X: /opt/local/etc/nagios
NRPE
• Agent to check local services
• CentOS/Amazon Linux: yum install nrpe (configuration: /etc/nagios/nrpe.cfg)
• Ubuntu: apt-get install nagios-nrpe-server (configuration: /etc/nagios/nrpe.cfg)
• Mac OS X: port install nrpe (configuration: /opt/local/etc/nrpe.cfg.sample)
Basic monitoring
HTTP
• check_http plugin
• Can check port, string in response, path, etc.
• Can do POST request with content
• Can do GET, HEAD, OPTIONS, TRACE, DELETE requests
• Can do BASIC auth
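The options listed above map onto check_http command-line flags. The following is a minimal Python sketch, not part of the original slides, that builds and runs such a command. It assumes check_http is on the PATH (on many Linux installs it lives in the Nagios plugins directory instead), and the helper names build_check_http_args and run_check are hypothetical.

```python
import subprocess


def build_check_http_args(host, port=None, path=None, expect_string=None,
                          method=None, post_data=None, basic_auth=None,
                          plugin="check_http"):
    """Build the argument list for a check_http call.

    `plugin` is an assumption: adjust it to the full path of check_http
    on your system if it is not on the PATH.
    """
    args = [plugin, "-H", host]
    if port is not None:
        args += ["-p", str(port)]        # TCP port
    if path is not None:
        args += ["-u", path]             # URL path to request
    if expect_string is not None:
        args += ["-s", expect_string]    # string expected in the response body
    if method is not None:
        args += ["-j", method]           # HTTP method (GET, HEAD, OPTIONS, ...)
    if post_data is not None:
        args += ["-P", post_data]        # content to send as a POST request
    if basic_auth is not None:
        args += ["-a", basic_auth]       # "user:password" for BASIC auth
    return args


def run_check(args):
    """Run the plugin and return (exit code, first line of output)."""
    result = subprocess.run(args, capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    return result.returncode, lines[0] if lines else ""
```

For example, build_check_http_args("www.example.com", port=8080, path="/cgi-bin/WebObjects/MyApp.woa", expect_string="Welcome") produces the arguments for a check that fails if the page does not contain "Welcome". The host, path and string are placeholders.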
HTTPS
• Same plugin as HTTP
• Can check date of certificate
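check_http does the certificate check itself (its -C option takes a minimum number of days of validity). To illustrate what such a check involves, here is a rough Python sketch using only the standard library. The function names and the 30/7 day thresholds are assumptions chosen for the example, not values from the slides.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=10):
    """Connect over TLS and return the certificate's 'notAfter' string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Whole days between `now` (epoch seconds) and the expiry date."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return int((expires - now) // 86400)


def classify(days_left, warning_days=30, critical_days=7):
    """Map remaining days to a Nagios exit code and status line."""
    if days_left < critical_days:
        return 2, f"CERT CRITICAL - expires in {days_left} days"
    if days_left < warning_days:
        return 1, f"CERT WARNING - expires in {days_left} days"
    return 0, f"CERT OK - expires in {days_left} days"
```

A plugin built on this would call classify(days_until_expiry(fetch_not_after(host))), print the message and exit with the returned code.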
Using Selenium WebDriver
• Need more complex HTTP check?
• Selenium WebDriver + Google Chrome + script to the rescue!
MySQL
• Two plugins: check_mysql and check_mysql_query
• check_mysql can check the status of a replication slave
• check_mysql_query compares the result of a query against warning/critical thresholds
PostgreSQL
• check_pgsql
• Checks that the specified database is up and accepting connections
Disk
• You don’t want to run out of disk space!
• check_disk plugin
• Checks the available disk space on a specific file system or path
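As an illustration of the kind of test check_disk performs, here is a small Python sketch based on shutil.disk_usage. It is not the check_disk implementation. The 20% and 10% free-space thresholds are assumptions for the example, and the `usage` parameter exists only so the logic can be exercised without touching a real file system.

```python
import shutil


def check_disk(path, warning_pct=20.0, critical_pct=10.0, usage=None):
    """Return (exit code, status line) for free space on `path`.

    `usage` may be a (total, used, free) tuple in bytes; when omitted,
    the values are read from the file system that contains `path`.
    """
    if usage is None:
        usage = shutil.disk_usage(path)
    total, _used, free = usage
    if total <= 0:
        return 3, f"DISK UNKNOWN - cannot compute free space for {path}"

    free_pct = 100.0 * free / total
    if free_pct < critical_pct:
        code, label = 2, "CRITICAL"
    elif free_pct < warning_pct:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"

    # Text after the "|" is performance data that graphing tools can store.
    return code, f"DISK {label} - {path} {free_pct:.1f}% free | free_pct={free_pct:.1f}%"
```

Calling check_disk("/") reads the real usage of the root file system.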
JMX
• Check the heap space of your WO apps!
• check_jmx
• http://exchange.nagios.org/directory/Plugins/Java-Applications-and-Servers/check_jmx/details
check_woapp.py
• Nagios plugin (Python) that checks several things in Monitor
• State
• Number of deaths
• Is refusing new sessions
• Is auto recover on?
• # of active sessions
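The slides do not show the source of check_woapp.py, so the following is only a sketch of how a plugin could combine the values listed above into one Nagios status. The dictionary keys, the "ALIVE" state value and the thresholds are all hypothetical. How the values are fetched from Monitor is left out entirely.

```python
def check_instance(stats, max_deaths=0, warn_sessions=100, crit_sessions=200):
    """Combine several instance values into (exit code, status line).

    `stats` is a hypothetical dict with the keys: state, deaths,
    refusing_new_sessions, auto_recover, active_sessions.
    """
    problems = []  # list of (exit code, description)

    if stats["state"] != "ALIVE":
        problems.append((2, f"state is {stats['state']}"))
    if stats["deaths"] > max_deaths:
        problems.append((1, f"{stats['deaths']} deaths"))
    if stats["refusing_new_sessions"]:
        problems.append((1, "refusing new sessions"))
    if not stats["auto_recover"]:
        problems.append((1, "auto recover is off"))

    sessions = stats["active_sessions"]
    if sessions >= crit_sessions:
        problems.append((2, f"{sessions} active sessions"))
    elif sessions >= warn_sessions:
        problems.append((1, f"{sessions} active sessions"))

    # The overall status is the worst status among the individual checks.
    code = max((c for c, _ in problems), default=0)
    label = {0: "OK", 1: "WARNING", 2: "CRITICAL"}[code]
    detail = ", ".join(text for _, text in problems) or "all checks passed"
    return code, f"WOAPP {label} - {detail} | sessions={sessions}"
```

Reporting the worst individual result as the overall status keeps a single service entry in Nagios per instance while still listing every problem in the status line.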
Plugin development
• Can be anything! Bash, Python, Perl, Java, etc.
• Only needs to return the proper exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN)
• Better to send performance data too
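A minimal plugin skeleton in Python might look like the sketch below. It takes a value and two thresholds on the command line purely for illustration (a real plugin would measure something), prints one status line with performance data after the "|", and returns the exit code. The service name EXAMPLE and the function names are assumptions.

```python
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warning, critical):
    """Return the exit code for a value where higher is worse."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def format_output(service, code, value, warning, critical, label="value"):
    """Status text, then "|" and performance data: label=value;warn;crit."""
    return (f"{service} {LABELS[code]} - {label} is {value}"
            f" | {label}={value};{warning};{critical}")


def main(argv):
    """Parse VALUE WARNING CRITICAL, print one status line, return the code."""
    try:
        value, warning, critical = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Raised for non-numeric arguments and for too few arguments.
        print("EXAMPLE UNKNOWN - usage: plugin VALUE WARNING CRITICAL")
        return UNKNOWN
    code = evaluate(value, warning, critical)
    print(format_output("EXAMPLE", code, value, warning, critical))
    return code

# A real plugin file would end with: sys.exit(main(sys.argv))
```

Nagios reads the exit code to decide the state and shows the first line of output in the interface. Graphing add-ons parse the part after the "|".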
Other useful plugins
• check_load
• check_by_ssh
• check_dns
• check_file_age
• check_tcp/check_udp
• check_linux_raid
• check_ntp_time
• check_swap
Graphing
• Not built-in
• Numerous third-party options
• I use PNP4Nagios
Actions
• Can launch actions (scripts) based on events
• Nagios calls these « event handlers »
• Examples:
• Start new instance if one is down
• Start new VM if host memory is low
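An event handler is an ordinary script that Nagios runs on state changes, typically passing macros such as $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as arguments. The Python sketch below shows one possible decision rule: act on the third soft CRITICAL attempt or on any hard CRITICAL state. The restart command path is a hypothetical placeholder, and the rule itself is an assumption rather than something taken from the slides.

```python
import subprocess

# Hypothetical command that starts a new instance; replace with your own.
RESTART_COMMAND = ["/usr/local/bin/restart-woapp.sh"]


def should_restart(state, state_type, attempt, soft_attempt=3):
    """Decide whether the handler should act on this event."""
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    # Act early on a chosen soft attempt, before the state turns hard.
    return state_type == "SOFT" and attempt == soft_attempt


def handle_event(argv):
    """argv: [script, $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$]."""
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    if should_restart(state, state_type, attempt):
        return subprocess.call(RESTART_COMMAND)
    return 0
```

Acting on a soft state lets the handler try to fix the problem before Nagios sends notifications, which only happen once the state becomes hard.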
Demo
Next: Logstash
Q&A