lee myers - what to do when nagios notification don't meet your needs

What to do when Nagios notification don't meet your needs? You Push It


Post on 21-Mar-2017


Page 1: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs

What to do when Nagios notification don't meet your needs?

You Push It


Background

Career Start

Intel - ASCI Red Supercomputer

• 1st TeraFlops Supercomputer
• 102 Cabinets - Drive & Compute clusters
• 4,536 Nodes
• 9,216 Processors (Pentium Pros)
• 9,216 Cores
• 1,600 Square Feet

Currently

NCAR - Yellowstone Computer

• 2012: 13th with 1.5 PetaFlops, Now 50th
• 94 Cabinets - 74 Compute & 10 Drive clusters
• 4,542 Nodes
• 9,036 Processors (Intel Xeon E5-2670)
• 72,288 Cores
• 2,000 Square Feet


Nagios Configuration

Primary Instance
• Hosts - 1289
• Services - 3235

Total Instances
• Hosts - 1410
• Services - 3867

Test Instance
• Hosts - 20,007
• Services - 40,045
• Passive Results from scripts

Primary Instance
• 4 Check_MK Monitored Servers
• 5 Remote Servers sending Passive Results
• 4 Sites being Monitored

Normal Load < 1 with 5 instances running.

Load with Test running < 4

Using OMD 1.2 (Nagios 3.5, Check_MK 1.2.4p5, Thruk 1.84-6, PNP4Nagios 0.6.24)


Nagios Notification Configuration

Host / Service

• notification_period
  – 24x7
  – workhours
• contact_groups

Contact

• service_notification_period
  – 24x7
  – workhours
• host_notification_period
  – 24x7
  – workhours
• service_notification_options
  – w,u,c,r,f
• host_notification_options
  – d,u,r


Standard Work Week

Simple distinction between work and home.


Non-Standard Rotating Work Week

Complex and Every Week is Different.


Since we have 24x7 coverage, why did we want notifications?

We are not always in our Operations Center at Night

• Doing nightly Visual Inspections
• Replacing hardware in the Supercomputer
• Working with facilities
• Talking with Security
• Eating a meal in our Kitchen
• Watching fireworks with facilities
• ...


Our initial Failure

No Sound from iPad Web or Apps


What We Needed

• Interface to Nagios Data
• Something to Parse for Unacknowledged Alerts
• Something to send out Notifications
• Program to give us our alerts on our Mobile Devices


Interface to Nagios Data

Check_MK Livestatus
• Nagios Broker Module
• Written by Mathias Kettner
• Direct Connection to Nagios through a UNIX Socket
• No Database to administer
• No Configuration needed
• Single line needs to be added to nagios.cfg
• Access it from the shell with unixcat
• Uses Livestatus Query Language
• http://mathias-kettner.com/checkmk_livestatus.html

Example:

root@linux# echo 'GET hosts' | unixcat /var/lib/nagios/rw/live

acknowledged;action_url;address;alias;check_command;check_period;checks_enabled;contacts;in_check_period;in_notification_period;is_flapping;last_check;last_state_change;name;notes;notes_url;notification_period;scheduled_downtime_depth;state;total_services

0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;Acht;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301430;Acht;;;24X7;0;0;7

0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;DREI;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301431;DREI;;;24X7;0;0;1

0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;Drei;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301435;Drei;;;24X7;0;0;4
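Output like the example above starts with a header line naming every column, so a few fields can be pulled out by name rather than by position. The following is a minimal sketch under that assumption; `pick_columns` is a hypothetical helper name, not part of Livestatus or unixcat.

```shell
# pick_columns NAME...: read Livestatus output on stdin (header line first,
# fields separated by ';') and print only the named columns, in that order.
pick_columns() {
    awk -F';' -v want="$*" '
        NR == 1 {
            # Remember the position of every column named in the header.
            n = split(want, names, " ")
            for (i = 1; i <= NF; i++) pos[$i] = i
            for (k = 1; k <= n; k++)
                if (!(names[k] in pos)) {
                    print "unknown column: " names[k] > "/dev/stderr"
                    exit 2
                }
            next
        }
        NF == 0 { next }    # skip blank lines
        {
            line = ""
            for (k = 1; k <= n; k++)
                line = line (k > 1 ? ";" : "") $(pos[names[k]])
            print line
        }
    '
}
```

With the socket path from the slide, usage might look like `echo 'GET hosts' | unixcat /var/lib/nagios/rw/live | pick_columns name state total_services`.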


Something to Parse - Livestatus

LQL Queries
• “GET” and name of Table
• Arbitrary number of header lines consisting of a keyword, a colon and arguments.
• Empty line or ‘End of Transmission’

Tables
hosts            services             hostgroups
contacts         commands             servicegroups
log              timeperiods          contactgroups
status           downtimes            hostsbygroup
columns          statehist            comments
servicesbygroup  servicesbyhostgroup

Columns
Columns: <list of column names to return in order>

Filters
Filter: <column name> <operator> <value>
Operators: =, ~, =~, ~~, <, >, <=, >=, !=, !~, !=~, !~~
Values: number, text

Combining filters
Or: <last X filters>
And: <last X filters>
Negate:

Others - Counting, Sums, Max, Min, Std Dev, and more
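The query structure above (a GET line, any number of header lines, then an empty line) can be sketched as a small shell helper. The function names `lql`, `unacked_warn_or_crit` and `count_hosts_down` are hypothetical and chosen only for illustration; the headers themselves (`Columns:`, `Filter:`, `Or:`, `Stats:`) are the ones listed on the slide.

```shell
# lql TABLE [HEADER...]: print a Livestatus query: the GET line, one header
# line per remaining argument, then the empty line that ends the query.
lql() {
    local table=$1
    shift
    printf 'GET %s\n' "$table"
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"
    fi
    printf '\n'
}

# Unacknowledged services that are WARNING (1) or CRITICAL (2).
# "Or: 2" combines the two state filters that precede it.
unacked_warn_or_crit() {
    lql services \
        'Columns: host_name description state' \
        'Filter: state = 1' \
        'Filter: state = 2' \
        'Or: 2' \
        'Filter: acknowledged = 0'
}

# Count the hosts that are DOWN (state 1) instead of listing them.
count_hosts_down() {
    lql hosts 'Stats: state = 1'
}
```

Either function prints plain query text, so it can be inspected on its own or piped to the socket, for example `unacked_warn_or_crit | unixcat /var/lib/nagios/rw/live`.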


Send out Notifications

Pushbullet
• Free
• Several APIs
  – Android Extensions
  – iPhone
  – HTTP API
• https://docs.pushbullet.com

We're interested in the HTTP API; we are not writing a custom mobile app.

HTTP API Calls
• Objects
  – /v2/pushes
  – /v2/devices
  – /v2/contacts
  – /v2/users/me
• Accounts
  – /oauth2

And more API calls which we don’t use.
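As a rough illustration of the `/v2/pushes` call, the sketch below sends a note with a JSON body and an `Access-Token` header, which is one of the ways the Pushbullet documentation describes authenticating. The helper names `json_escape`, `note_payload` and `send_note` are hypothetical, and the escaping only covers backslashes and double quotes, which is an assumption that alert text stays on one line.

```shell
PB_PUSHES_URL=https://api.pushbullet.com/v2/pushes

# json_escape STRING: escape backslashes and double quotes for use inside
# a JSON string. Newlines and control characters are NOT handled here.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# note_payload TITLE BODY: print the JSON body for a push of type "note".
note_payload() {
    printf '{"type":"note","title":"%s","body":"%s"}' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# send_note TOKEN TITLE BODY: POST the note to Pushbullet.
send_note() {
    curl --silent --show-error --fail \
        --header "Access-Token: $1" \
        --header 'Content-Type: application/json' \
        --data-binary "$(note_payload "$2" "$3")" \
        "$PB_PUSHES_URL" > /dev/null
}
```

A call would then look like `send_note "$AccessCode" "web01" "PING CRITICAL"`, with the token read from a file as in the script later in the deck.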


Deliver to our Mobile Devices


Our Solution


nagios_push.sh

#!/bin/bash

# Get the person's access code for pushbullet
read AccessCode < /home/$USER/PushBulletAccessCode

# Query nagios for host alerts and send them to pushbullet
for i in $(/opt/omd/versions/1.00/bin/unixcat < /usr/local/sbin/PushBullet_query_hosts /omd/sites/noc/tmp/run/live | tr ' ' '_' | cut -f1,2 -d';'); do
    curl -u $AccessCode: https://api.pushbullet.com/v2/pushes -d type=note -d title="${i%;*}" -d body="${i#*;}" > /dev/null 2>&1
done

# Query nagios for service alerts and send them to pushbullet
for i in $(/opt/omd/versions/1.00/bin/unixcat < /usr/local/sbin/PushBullet_query_services /omd/sites/noc/tmp/run/live | tr ' ' '_' | cut -f1,2 -d';'); do
    curl -u $AccessCode: https://api.pushbullet.com/v2/pushes -d type=note -d title="${i%;*}" -d body="${i#*;}" > /dev/null 2>&1
done
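The script above replaces spaces with underscores so that the `for` loop does not split an alert into several words. One possible alternative, sketched here and not taken from the talk, is to read the query output line by line and split on `;`, which keeps the spaces in the message. `format_alerts` is a hypothetical helper name.

```shell
# format_alerts: read "name;plugin_output;state" lines on stdin and print
# one "title<TAB>body" pair per alert, with spaces left intact.
# As with cut -f1,2 in the original, a ';' inside the plugin output would
# cut the message short.
format_alerts() {
    local name output state
    while IFS=';' read -r name output state; do
        [ -n "$name" ] || continue    # skip blank lines
        printf '%s\t%s\n' "$name" "$output"
    done
}
```

The pairs could then be consumed with a second loop such as `while IFS="$(printf '\t')" read -r title body; do ...; done`, passing `"$title"` and `"$body"` to curl in quotes.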



PushBullet Command Files

/usr/local/sbin/PushBullet_query_hosts

GET hosts
Columns: name plugin_output state
Filter: state > 0
Filter: acknowledged = 0
Filter: host_scheduled_downtime_depth = 0

/usr/local/sbin/PushBullet_query_services

GET services
Columns: name plugin_output state
Filter: state > 0
Filter: acknowledged = 0
Filter: scheduled_downtime_depth = 0


Our Support Scripts


npush_on

#!/bin/bash
#Make sure it is not run as root
if [ $UID -eq 0 ]
then
    echo "Not to be run as root."
    exit 1
fi

if (crontab -l | grep -q nagios_push.sh)
then
    #UnComment out the crontab
    crontab -l | sed -e 's/#*\*\/4 \* \* \* \* \/usr\/local\/sbin\/nagios_push.sh/\*\/4 \* \* \* \* \/usr\/local\/sbin\/nagios_push.sh/' | crontab -
else
    #Append the item to the crontab
    (crontab -l; echo "*/4 * * * * /usr/local/sbin/nagios_push.sh") | crontab -
fi

#Let the user know when you are turning off the npush
hour=$(date +%H)
if [ "$hour" -lt 18 -a "$hour" -ge 6 ]; then
    /usr/bin/at -f /usr/local/bin/npush_off 7pm
    echo "Turning off npush at 7 PM"
else
    /usr/bin/at -f /usr/local/bin/npush_off 7am
    echo "Turning off npush at 7 AM"
fi
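The shift logic at the end of npush_on can be isolated into a small function, which makes the 6:00 to 17:59 boundary easier to check by hand. This is only a sketch; `off_time_for_hour` is a hypothetical name, and the leading zero is stripped so that values such as `08` from `date +%H` are always treated as plain decimal numbers.

```shell
# off_time_for_hour HOUR: print the time at which npush should be switched
# off, given the current hour (00-23): 7pm for a day shift, 7am otherwise.
off_time_for_hour() {
    local hour=${1#0}    # "08" -> "8", "00" -> "0", "18" stays "18"
    if [ "$hour" -ge 6 ] && [ "$hour" -lt 18 ]; then
        echo 7pm
    else
        echo 7am
    fi
}
```

In the script this could be used as `/usr/bin/at -f /usr/local/bin/npush_off "$(off_time_for_hour "$(date +%H)")"`.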


npush_off

#!/bin/bash
#Comment out the crontab
crontab -l | sed -e 's/\*\/4 \* \* \* \* \/usr\/local\/sbin\/nagios_push.sh/#\*\/4 \* \* \* \* \/usr\/local\/sbin\/nagios_push.sh/' | crontab -


Future Upgrades

• Read Google Calendar for our schedule, no more remembering to turn it on.
• Send email alerts to PushBullet. (Without false alerts)
• Remove the Crontab line, instead of commenting it out.
• Anything else we can think of.
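For the "remove the Crontab line" item, one possible approach is to filter the job out of the crontab listing rather than editing it with sed. The sketch below is an assumption about how that might be done, not the author's implementation; `drop_push_job` is a hypothetical name.

```shell
# drop_push_job: copy a crontab listing from stdin to stdout, leaving out
# every line (commented or not) that mentions nagios_push.sh.
drop_push_job() {
    # grep -v exits non-zero when nothing is left to print; that is not
    # an error here, so the status is ignored.
    grep -v -F 'nagios_push.sh' || true
}
```

npush_off could then become `crontab -l | drop_push_job | crontab -`, and npush_on would only ever need its append branch.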


Questions