Have you been stalking your servers?
Presentation for DrupalCon Prague 2013
https://prague2013.drupal.org/session/have-you-been-stalking-your-servers
Marji Cermak
Sysadmin & DevOps Engineer at Morpht
[email protected]
@cermakm
The rule of 3 things
picture: http://www.flickr.com/photos/helenaperezgarcia/5692392667/
1. What is monitoring and why do you want to monitor
2. Some monitoring tools available for you
3. It is easy to start with monitoring.
Part 1
What is monitoring and why do you want to monitor
photo: http://www.flickr.com/photos/tiagopadua/7903366470/
Monitoring
Monitoring is an intermittent (regular or irregular) series of observations in time, carried out to show the extent of compliance with a formulated standard or degree of deviation from an expected norm.
J. M. Hellawell (1991), modified by A. Brown (2000), http://jncc.defra.gov.uk/page-2268 (nature conservation area)
Why you need to monitor
● to know about the bad news before your customers (or your boss)
● to scale up your server in advance
● to tune up your app
Why you need to monitor (cont.)
● to prove your uptime of 99.999% :)
The fun of the nines
Source: http://en.wikipedia.org/wiki/High_availability
Nines: http://en.wikipedia.org/wiki/List_of_unusual_units_of_measurement#Nines
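To make the nines concrete, here is a small sketch that turns an availability percentage into a yearly downtime budget. It assumes a 365-day year and counts every second of the year as service time; the function name is only illustrative.

```python
# Downtime budget per year for a given availability percentage.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return how many seconds per year the service may be down."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for percent in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_seconds(percent) / 60
    print(f"{percent}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Five nines leaves a little over five minutes per year, which is hard to prove without continuous monitoring.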
Why you need to monitor (cont.)
● to minimise downtime (expensive)
● to capture customer information
● to have data / metrics to diagnose
Diagnosing your collected data
watch out for:
● trends
● spikes
● irregularities
● thresholds
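A rough sketch of how three of these patterns (thresholds, spikes, trends) could be checked on a list of collected samples. The readings, the two-sigma spike rule and the function names are assumptions made for illustration; real tools use more careful statistics.

```python
from statistics import mean, pstdev


def over_threshold(samples, limit):
    """Indexes of samples above a fixed limit."""
    return [i for i, value in enumerate(samples) if value > limit]


def spikes(samples, sigmas=2.0):
    """Indexes of samples more than `sigmas` standard deviations from the mean."""
    average = mean(samples)
    spread = pstdev(samples)
    if spread == 0:
        return []
    return [i for i, value in enumerate(samples)
            if abs(value - average) > sigmas * spread]


def trend_slope(samples):
    """Least-squares slope per sample interval; positive means growth."""
    n = len(samples)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical disk usage readings in percent, one per day.
disk_usage = [61, 62, 62, 63, 64, 91, 65, 66, 67, 68]
print("over 90%:", over_threshold(disk_usage, 90))
print("spikes at:", spikes(disk_usage))
print("growth per day:", round(trend_slope(disk_usage), 2))
```

Irregularities (for example a nightly job that stops showing up in the graph) are usually easier to spot by eye, which is where graphing tools help.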
Areas to monitor
● network
● server
● services
● applications
● users
photos:
http://www.flickr.com/photos/misja_klimov/2120956405/
http://www.flickr.com/photos/johnjack/3666997634/
http://www.flickr.com/photos/agustingodet/3691794089/
http://www.flickr.com/photos/agustingodet/3691792393/
http://www.flickr.com/photos/cheerfulstoic/942211994/
http://www.flickr.com/photos/jimmysmith/99528596/
Drupal areas to monitor
● network
● server
● services
  ○ webserver
  ○ database
● applications - your Drupal site(s)
● users
Part 2
Some monitoring tools available for you
Meet Nagios, Munin and others
● Nagios
● Munin
● APC dashboard
● related Drupal modules
Nagios /ˈnɑːɡiːoʊs/
● system, network and infrastructure monitoring software application
● monitors and alerts
Nagios /ˈnɑːɡiːoʊs/
Provides monitoring of:
● network services (SMTP, POP3, HTTP, NNTP, ICMP, SNMP, FTP, SSH),
● host resources (processor load, disk usage, system logs),
● anything else, such as probes (temperature, alarms, etc.).
Many plugins available.
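A Nagios plugin is just a program that prints one status line and reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below checks disk usage in that style; the thresholds, the "DISK" label and the function names are assumptions for illustration, not part of the talk.

```python
import shutil

# Exit codes defined by the Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to an (exit code, status text) pair."""
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.0f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.0f}% used"
    return OK, f"DISK OK - {used_percent:.0f}% used"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Check one filesystem and return (exit code, full plugin output line)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        return UNKNOWN, f"DISK UNKNOWN - {error}"
    used_percent = 100 * usage.used / usage.total
    code, text = evaluate(used_percent, warn, crit)
    # Optional performance data follows the pipe: label=value;warn;crit
    return code, f"{text} | used={used_percent:.0f}%;{warn:.0f};{crit:.0f}"


exit_code, status_line = check_disk("/")
print(status_line)
print("exit code would be", exit_code)
```

To install something like this as a real plugin, the script would end with `sys.exit(exit_code)` instead of printing the code, so that Nagios can read the result.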
Nagios /ˈnɑːɡiːoʊs/
Name and Pronunciation:
● NetSaint -> "Nagios Ain't Gonna Insist On Sainthood"
● "Agios" is a transliteration of the Greek word άγιος (saint)
Nagios /ˈnɑːɡiːoʊs/
● alerts by email/pager/IM...
● alerts to different contacts
● notification escalation
● service / host dependencies
● soft / hard states
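The soft / hard state idea can be sketched as a small simulation: a failing check is retried, the problem stays "soft" until it has failed `max_check_attempts` times in a row, and only then becomes "hard" and triggers a notification. This is a simplified model written for illustration (two result values only, no re-notification interval, no escalation), not Nagios's actual implementation.

```python
def track_states(results, max_check_attempts=3):
    """Return a list of (result, state_type, notify) tuples, one per check.

    Simplified model: results are the strings "OK" or "CRITICAL".
    """
    attempt = 0            # consecutive non-OK results seen so far
    hard_problem = False   # True once the problem has been confirmed
    timeline = []
    for result in results:
        if result == "OK":
            # Recovery is announced only if the problem had become hard.
            notify = hard_problem
            state_type = "HARD" if hard_problem or attempt == 0 else "SOFT"
            attempt = 0
            hard_problem = False
        else:
            attempt = min(attempt + 1, max_check_attempts)
            state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
            # Notify once, at the moment the problem turns hard.
            notify = state_type == "HARD" and not hard_problem
            hard_problem = state_type == "HARD"
        timeline.append((result, state_type, notify))
    return timeline


checks = ["OK", "CRITICAL", "OK", "CRITICAL", "CRITICAL", "CRITICAL", "CRITICAL", "OK"]
for result, state_type, notify in track_states(checks):
    print(f"{result:<8} {state_type:<4} notify={notify}")
```

In this model the single failed check at the start never alerts anyone, while the run of three failures does, which is the point of soft states: short glitches are retried before anybody is paged.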
Nagios Addons
NRPE (Nagios Remote Plugin Executor) - executes plugins on remote Linux/Unix hosts
image source: http://nagios.sourceforge.net/docs/3_0/addons.html
Nagios Addons
NSCA - sends passive checks from remote Linux/Unix hosts to Nagios
image source: http://nagios.sourceforge.net/docs/3_0/addons.html
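With NSCA, the remote host submits a result instead of waiting to be polled. The `send_nsca` client reads one line per service result on standard input, by default with tab-separated fields: host name, service description, return code, plugin output. The sketch below only builds such a line; the host and service names are hypothetical, and it assumes the default tab delimiter.

```python
def passive_result(host, service, return_code, output):
    """Format one passive service check result for send_nsca's standard input."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    # Tabs separate the fields, so they must not appear inside them.
    for field in (host, service, output):
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


# Hypothetical host and service names, for illustration only.
line = passive_result("web01", "Nightly backup", 0, "BACKUP OK - finished in 312s")
print(repr(line))
```

The line would then be piped to `send_nsca`, pointing it at the Nagios server with `-H` and at its configuration file with `-c`; the service must be defined on the Nagios side to accept passive checks.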
Drupal and Nagios
Munin
● network/system monitoring application
● outputs graphs through a web interface
● many plugins
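Munin plugins follow a simple convention: called with the argument `config`, a plugin prints how its graph should look; called without arguments, it prints the current values as `field.value N`. Here is a minimal sketch that graphs the 1-minute load average. The graph title and field name are made up for the example, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import sys


def config_lines():
    """Graph description, printed when the plugin is called with 'config'."""
    return [
        "graph_title Load average (1 min)",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
    ]


def value_lines():
    """Current reading, printed on a normal run."""
    load1, _, _ = os.getloadavg()
    return [f"load.value {load1:.2f}"]


def main(argv):
    """Return the plugin output for the given command line."""
    if len(argv) > 1 and argv[1] == "config":
        return "\n".join(config_lines())
    return "\n".join(value_lines())


print(main(sys.argv))
```

On a node, an executable script like this is typically linked into the munin-node plugin directory, and the master picks it up on its next run.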
Munin
● master / node architecture
● connects to all nodes at regular intervals
● it uses the RRDtool (round robin database tool, handles time-series data)
Munin Example
Drupal and Munin
Drupal and Munin
● they complement each other
● Nagios normally alerts on one “service”
● Munin can be used to correlate different things
Nagios & Munin
APC - what is it?
The Alternative PHP Cache (APC) is a free and open opcode cache for PHP.
Its goal is to provide a free, open, and robust framework for caching and optimising PHP intermediate code.
Inside your webserver (not a webcache)
Monitoring APC
Memory usage, hits & misses
Monitoring APC
Fragmentation
Monitoring APC
Memory usage
Monitoring APC
Files in cache
Other monitoring tools
● Collectd
● Graphite
● Shinken
● Sensu
● New Relic
● Pingdom
Part 3
It is easy to start with monitoring.
How to install these tools?
Munin
sudo apt-get install munin munin-node
Nagios
sudo apt-get install nagios3
APC dashboard
apc.php script from the php-apc package
How to configure these?
● It is a bit fiddly
● There are many guides targeting beginners
● You don’t want to do it again and again
puppet – a quick way to start
system for automating system administration tasks
puppet – a quick way to start
● a declarative language for expressing system configuration,
● a client and server for distributing it
● and a library for realising the configuration.
puppet – a quick way to start
1. clone the stalk-your-box repo
2. run puppet apply on the code
3. monitor!
A quick way to start
$ git clone git://github.com/morpht/stalk-your-box.git /tmp/stalk-your-box
Cloning into '/tmp/stalk-your-box'...
remote: Counting objects: 23, done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 23 (delta 1), reused 23 (delta 1)
Receiving objects: 100% (23/23), 11.35 KiB, done.
Resolving deltas: 100% (1/1), done.
A quick way to start
$ cd /tmp/stalk-your-box/
$ sudo puppet apply --modulepath=modules manifest.pp
notice: /Stage[main]/Nagios::Server/Package[nagios3]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Nagios::Server/File[/etc/nagios3/htpasswd.users]/ensure: created
notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: Adding password for user nagiosadmin
notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: executed successfully
notice: /Stage[main]/Munin::Node/Package[libcache-cache-perl]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Munin::Node/Package[munin-node]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Munin::Node/File[munin-node.conf]/content: content changed '{md5}e486786f866d7d7e025dea401c300e7b' to '{md5}dbf97a87a8da86ef68155815ecae3c1c'
notice: /Stage[main]/Munin::Server/Service[apache2]: Triggered 'refresh' from 1 events
notice: Finished catalog run in 44.26 seconds
What this gives you
Manifest.pp
Summary
It is easy to start with monitoring.
The fun part - what’s wrong?
What’s wrong here?
Questions
Here is the get-started monitoring repo: https://github.com/morpht/stalk-your-box
Resources
Rule of Three: en.wikipedia.org/wiki/Rule_of_three_(writing)
Nagios: http://www.nagios.org/
Munin: http://munin-monitoring.org/
Nagios module: https://drupal.org/project/nagios
Munin module: https://drupal.org/project/munin
Munin plugins (experimental): https://drupal.org/sandbox/murrayw/2084281
Sensu: http://sensuapp.org
MySQLTuner: http://MySQLTuner.pl
THANK YOU!
WHAT DID YOU THINK?
Locate this session at the DrupalCon Prague website: http://prague2013.drupal.org/schedule
Click the “Take the survey” link