An Introduction to Monitoring with Nagios
Laurent Andrey Remi Badonnel
LORIA - INRIA Grand Est
ISSNSM’2008, Zurich
Laurent Andrey, Remi Badonnel An Introduction to Monitoring with Nagios 1/32
- Introduction
- Nagios at a glance
  - Key concepts
  - Functional architecture
  - Services and service states
- Nagios configuration
  - Object definitions
  - Other elements
  - Example scenario
- Checks and their execution
  - Local checks
  - Remote checks
- Advanced configurations
- Conclusions
References
- www.nagios.org: official distribution (core, plugins and documentation)
- www.nagiosexchange.org: lots of complementary plugins
- W. Barth. Nagios, System and Network Monitoring. Open Source Press GmbH, first edition, 2006. ISBN: 1-59327-070-4.
What is Nagios? What is Nagios useful for?
- A widely-used monitoring tool for trouble-shooting
  - Simple and open source
  - Network, system, service levels
- With a sophisticated (?) notification system to inform administrators when something goes wrong
- Nagios provides support to administrator(s) for detecting problems before users (including the boss!)
  - Mail server failure
  - Hard drive overload
  - Network outage
Key concepts
- Colored area concept
  - Green/Yellow/Red (Ok/Warning/Critical)
- No performance analysis or display (a priori)
- Checks using external commands (plugins)
- Various possibilities for remote checks
- Possibility for passive checks (from managed resources)
- Web interface + notifications
Functional architecture
[Figure: NAGIOS block diagram. Components: Notification System (filters, contact delivery); Data Reaping (resource state calculation); Web Interface; config; log.]
Architecture at run-time
- Data reaping + notification system = nagios processes
  - Can be run as a service (rcX.d, soft runlevel)
- Web interface
  - External web server (Apache)
  - Bunch of CGI scripts (part of Nagios)
- Configuration
  - Simple text files
  - Or a postgres database
- Logs (local files)
- Named pipe (FIFO) to enable nagios to receive commands (from CGI scripts, passive asynchronous events)
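
As an illustration of the last point, the sketch below builds one external command line of the kind Nagios reads from its named pipe, here a passive service check result (`PROCESS_SERVICE_CHECK_RESULT`). The helper names and the command file path are assumptions made for this example; the actual path is set by the `command_file` directive in nagios.cfg.

```python
import time


def format_passive_result(host, service, code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line.

    Format: [timestamp] COMMAND;host;service description;return code;output
    """
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{code};{output}\n"
    )


def submit(line, command_file="/var/lib/nagios/rw/nagios.cmd"):
    """Write the command to the named pipe.

    The default path is an assumption; it differs between installations.
    Opening the pipe blocks until Nagios has it open for reading.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For instance, `format_passive_result("webloria", "http service", 2, "HTTP CRITICAL")` yields a line that `submit` could hand to a running Nagios instance.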
Service, service check
- Service
  - Service delivered by a software
  - Percentage of free space on a partition
  - Bandwidth usage on a network interface ...
- Service check
  - Provides state information on a service
  - Returns a value: OK, WARNING, CRITICAL (exit status 0, 1, 2), UNKNOWN (exit status 3, due to time out or plugin runtime trouble) to reflect the Nagios view about this service
  - Can be local (OS calls) or remote (ICMP, NRPE, SNMP ...)
  - Is implemented by a plugin (external command/script)
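
The exit-status convention above can be sketched as a tiny plugin. This is a minimal example, not an official plugin: the function names are made up for illustration, and the default thresholds (warning below 30% free, critical below 15%) are arbitrary illustrative values.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]


def classify_free_space(free_percent, warn=30.0, crit=15.0):
    """Map a free-space percentage to a Nagios exit status."""
    if free_percent is None or not 0.0 <= free_percent <= 100.0:
        return UNKNOWN  # the plugin could not obtain a sensible value
    if free_percent < crit:
        return CRITICAL
    if free_percent < warn:
        return WARNING
    return OK


def main(argv):
    """Print one status line and return the exit status for Nagios."""
    try:
        free = float(argv[1])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_free <free_percent>")
        return UNKNOWN
    status = classify_free_space(free)
    print(f"{LABELS[status]} - {free:.1f}% free")
    return status

# A real plugin script would finish with:
#   import sys; sys.exit(main(sys.argv))
```

Nagios only looks at the exit status to compute the state; the printed line is what shows up in the web interface and in notifications.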
Service states
- Service states are the mirror of what nagios observes
- States: OK, WARNING, CRITICAL, UNKNOWN
- Transitions from one state to another are based on results provided by checks
- Critical and warning states are shadowed by related soft states
  - A service goes first to a soft state
  - Attempt count mechanism to reach a definitive hard state
- User notification can only occur when hard states are reached
Service state diagram
[Figure: service state diagram. States: OK, Soft Warning, Hard Warning, Soft Critical, Hard Critical. Each transition is labeled with the check result that triggers it (ok, warning, critical), the attempt-count action or guard (ac++, ac < mca, ac == mca, ac ← 1), and any notification raised (↑problem, ↑recovery).]
Service state diagram legend
- HS: hard state, using normal check interval between 2 checks
- S: soft state, using retry check interval between 2 checks
- RS →: transition triggered by a check with a return status of RS ∈ { ok, warning, critical }
- ac++ →: Attempt Count is incremented when the transition is triggered
Service state diagram legend (ctd)
- ac<mca →: transition is triggered if Attempt Count is smaller than the service configuration attribute max_check_attempts
- ac==mca →: transition is triggered if Attempt Count is equal to the service configuration attribute max_check_attempts
- ac←1 →: Attempt Count is set to 1 when the transition is triggered
- ↑notification →: a user notification ∈ { problem, recovery } is generated when this transition is triggered
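
The soft/hard mechanism can be tried out with a small simulation. This is a sketch under one stated assumption: the n-th consecutive non-OK result counts as attempt n, and the state becomes hard once the attempt count reaches max_check_attempts. The class and method names are invented for the example, UNKNOWN results are left out as in the legend, and notifications are raised only when a hard state is reached from a soft state or left for OK.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceState:
    max_check_attempts: int = 3   # "mca" in the legend
    status: str = "ok"            # "ok", "warning" or "critical"
    hard: bool = True             # OK is treated as a hard state
    attempt: int = 1              # "ac" in the legend
    notifications: list = field(default_factory=list)

    def feed(self, result):
        """Apply one check result; return (status, "HARD" or "SOFT")."""
        if result not in ("ok", "warning", "critical"):
            raise ValueError(f"unsupported check result: {result!r}")
        if result == "ok":
            # Leaving a hard problem state generates a recovery notification.
            if self.status != "ok" and self.hard:
                self.notifications.append("recovery")
            self.status, self.hard, self.attempt = "ok", True, 1
        elif self.status != "ok" and self.hard:
            # Already in a hard problem state: only the status may change.
            self.status = result
        else:
            # Coming from OK or from a soft state: count non-OK results.
            if self.status != "ok":
                self.attempt += 1
            self.status = result
            self.hard = self.attempt >= self.max_check_attempts
            if self.hard:
                self.notifications.append("problem")
        return self.status, "HARD" if self.hard else "SOFT"
```

With max_check_attempts set to 3, feeding three critical results in a row gives two soft states followed by a hard critical state and a single problem notification; a later ok result adds a recovery notification.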
Nagios configuration
- Object-oriented representation
  - A nagios object describes a specific unit: a service, a host, a contact, a contactgroup ... with attributes and values
  - A kind of inheritance mechanism, dependencies amongst objects
- Set of configuration files
  - Main file: nagios.cfg (ref. to other cfg. files)
  - 1 file per object type: services.cfg, hosts.cfg, contacts.cfg, checkcommands.cfg, misccommands.cfg, timeperiods.cfg ...
- Requires a priori knowledge
- Configuration can also be stored in a database
Host
- A service has to be linked to a host
- Only UP and DOWN states
- User notification (problem, recovery)
- Same external checks as for services
  - UP = (WARNING or OK), DOWN = CRITICAL
  - Typically: ICMP-based checks
- No active checks if related services are OK
- Host group (cosmetic ...)
Other Nagios elements
- contact, contactgroup: to specify who to notify, how to notify and when to notify
  - Used in host and service definitions
- command: to execute check plugins or to send user notifications
  - Links Nagios attributes found in definitions (ex: host name) to plugin parameters
  - Already provided for most common plugins (see checkcommands.cfg file)
  - Basic wrapping to send emails (misccommands.cfg file)
Example scenario
[Figure: example network, connected to the Internet.
 Router: cat-481.
 Network 152.81.144.0/20: webloria 152.81.144.22 (http://www.loria.fr), dnsext 152.81.144.14.
 Network 152.81.0.0/20: dns1 (preny) 152.81.1.25, dns2 (hugo) 152.81.1.128, nagios 152.81.9.211.]
Host definition (example)
hosts.cfg
    define host {
        host_name              webloria
        alias                  webloria linux machine
        address                152.81.144.22
        check_command          check-host-alive
        max_check_attempts     3
        check_period           24x7
        # 3 hours
        notification_interval  180
        notification_period    24x7
        # down, recovery, flapping, unreachable
        notification_options   d,r,f,u
        contact_groups         administrators
    }
Service definition (example 1)
services.cfg
    define service {
        host_name              webloria
        service_description    http service
        check_command          check_http
        max_check_attempts     3
        normal_check_interval  5
        retry_check_interval   1
        check_period           24x7
        notification_interval  180
        notification_period    24x7
        # warning, critical, recovery, flapping, unknown
        notification_options   w,c,r,f,u
        contact_groups         administrators
    }
Command definitions (example 1)
checkcommands.cfg
    define command {
        command_name  check-host-alive
        command_line  $USER1$/check_icmp -H $HOSTADDRESS$
    }

    define command {
        command_name  check_http
        command_line  $USER1$/check_http -H $HOSTADDRESS$
    }
Service definition (example 2)
services.cfg
    define service {
        host_name              dnsext
        service_description    dns service
        check_command          check_name_for_given_dns!www.loria.fr!152.81.144.22
        max_check_attempts     3
        normal_check_interval  5
        retry_check_interval   1
        check_period           24x7
        notification_interval  180
        notification_period    24x7
        notification_options   w,c,r,f,u
        contact_groups         administrators
    }
Command definition (example 2)
checkcommands.cfg
    define command {
        command_name  check_name_for_given_dns
        command_line  $USER1$/check_dns -H $ARG1$ -a $ARG2$ -s $HOSTADDRESS$
        # $ARG1$: fully qualified name
        # $ARG2$: IP address
    }
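
To make the link between a service's check_command and the command definition concrete, the following sketch mimics the macro substitution: arguments separated by `!` become `$ARG1$`, `$ARG2$`, and so on. It is a simplified model, not the Nagios implementation; the function name is hypothetical and the `$USER1$` value (the plugin directory) is an assumed path.

```python
def expand_command(check_command, command_lines, macros):
    """Expand a check_command value into the command line to execute.

    check_command: "command_name!arg1!arg2..." as written in a service.
    command_lines: maps command_name to its command_line template.
    macros: values for non-argument macros such as $HOSTADDRESS$.
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values[f"$ARG{index}$"] = arg
    line = command_lines[name]
    for macro, value in values.items():
        line = line.replace(macro, value)
    return line


command_lines = {
    "check_name_for_given_dns":
        "$USER1$/check_dns -H $ARG1$ -a $ARG2$ -s $HOSTADDRESS$",
}
macros = {
    "$USER1$": "/usr/lib/nagios/plugins",  # assumed plugin directory
    "$HOSTADDRESS$": "152.81.144.14",      # address of dnsext
}
expanded = expand_command(
    "check_name_for_given_dns!www.loria.fr!152.81.144.22",
    command_lines,
    macros,
)
```

Here `expanded` is the check_dns invocation that asks the dnsext server to resolve www.loria.fr and expects 152.81.144.22 as the answer.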
Contact definition (example)
contacts.cfg
    define contact {
        contact_name                   andrey
        alias                          laurent andrey
        service_notification_period    24x7
        host_notification_period       24x7
        service_notification_options   w,u,c,r
        host_notification_options      d,r
        service_notification_commands  notify-by-email
        host_notification_commands     host-notify-by-email
        email                          andrey@loria.fr
    }
Contactgroup definition (example)
contacts.cfg
    define contactgroup {
        contactgroup_name  administrators
        alias              group of administrators
        members            andrey
    }
Local checks
- Getting information about your local system
- Plugins based on system commands (such as ps, df, uptime)
  - check_disk -w 30% -c 15% -p /var
  - check_load -w 2.0,1.0,0.5 -c 4.0,2.0,1.0
  - check_procs -w 150 -c 250 --metric=PROCS
Direct network checks
- Checking network services of remote hosts
- Plugins based on network protocols
- Directly executed from the Nagios machine
  - check_icmp -H 1.14.1.2 -w 100.0,20% -c 200.0,40%
  - check_tcp -H boston.loria.fr -p 7000
  - check_ftp -H ftp.inria.fr -p 21 -e 220
  - check_http -H www.esial.uhp-nancy.fr
  - check_smtp, check_imap, check_pop
Checks using SSH (Secure Shell)
[Figure: on the NAGIOS machine, check_by_ssh (plugin) with priv.key; across the NETWORK, on the CLIENT, sshd (daemon) with pub.key, which runs check_xyz (plugin).]
- Nagios executes the check_by_ssh plugin
- To run a plugin deployed on the remote machine
- Based on asymmetric keys to log in without typing a password
Checks using NRPE (Nagios Remote Plugin Executor)
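[Figure: on the NAGIOS machine, check_nrpe (plugin) sends a reference (ref) over a tcp connection across the NETWORK to nrpe (daemon) on the CLIENT, which runs check_xyz (plugin). The daemon holds a table of pre-configured invocations: ref1 = check_xyz arg1 arg2, ref2 = check_abc arg3 arg4, ...]
- Nagios executes the check_nrpe plugin
- To interact with a dedicated daemon called nrpe
- Based on pre-configured plugin invocations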
Other remote checks
- Checks using SNMP
  - Collecting management information from SNMP agents
  - check_snmp plugin ⇔ net-snmp snmpget
  - SNMP reply + warning and critical limits ⇒ service state
- Checks using NSCA (Nagios Service Check Acceptor)
  - Passive method where checks are initiated by the resources themselves (close to SNMP traps)
  - An NSCA daemon waits for incoming check results (on the Nagios machine), while the send_nsca program (on the remote machines) sends messages containing check results
Making configuration more simple
- Monitoring the same service on several hosts
  - setting the host_name attribute of the service as a comma separated list of host names
  - or setting a hostgroup_name attribute for the service
- Defining template-based objects
  - Notion of inheritance
  - Factorizing many low-interest attributes
  - register attribute to define a template
  - use attribute to inherit from a template
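
The effect of the use and register attributes can be modeled with plain dictionaries. This is only a sketch of the idea, assuming a single template per object; the template name generic-service and the function name are hypothetical, and the template's attribute values are borrowed from the service examples.

```python
def resolve(obj, templates):
    """Return the effective attributes of an object definition.

    Attributes come from the template named by "use" (followed
    recursively); values set on the object itself take precedence.
    The bookkeeping attributes (name, use, register) are dropped.
    """
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        merged.update(resolve(templates[parent], templates))
    merged.update(obj)
    for key in ("name", "use", "register"):
        merged.pop(key, None)
    return merged


templates = {
    "generic-service": {            # hypothetical template name
        "name": "generic-service",
        "register": "0",            # a template, not a real service
        "max_check_attempts": "3",
        "normal_check_interval": "5",
        "retry_check_interval": "1",
        "check_period": "24x7",
        "contact_groups": "administrators",
    },
}
http_service = {
    "use": "generic-service",
    "host_name": "webloria",
    "service_description": "http service",
    "check_command": "check_http",
    "normal_check_interval": "10",  # local override of the template value
}
effective = resolve(http_service, templates)
```

The service definition itself then only carries what is specific to it, while the low-interest attributes live in the template.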
Service and hostgroup (example)
    define hostgroup {
        hostgroup_name  dns_hosts
        alias           hosts supporting DNS
        members         dns1,dns2,dnsext
    }

    define service {
        hostgroup_name         dns_hosts
        service_description    dns service
        check_command          check_name_for_given_dns!www.loria.fr!152.81.144.22
        max_check_attempts     3
        normal_check_interval  5
        retry_check_interval   1
        check_period           24x7
        notification_interval  180
        notification_period    24x7
        notification_options   w,c,r,f,u
        contact_groups         administrators
    }
Specifying dependencies
- Reminder: service critical state ⇒ host check
- How does Nagios tell the difference between:
  - case 1: webloria is really down
  - case 2: webloria is unreachable due to the network?
- Solution: defining dependency relationships amongst objects (parents attribute)
  - If a host is detected as down, the parent is checked
  - If the parent is OK, the initial host is really declared down
  - If not, the host is declared unreachable
- Ex: cat-481 router defined as a parent of webloria
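
The decision rule can be written down in a few lines. This is a sketch of the logic described in the bullets, not of Nagios internals; the function name and the `is_up` callback (standing in for a real host check) are assumptions made for the example.

```python
def classify_host(host, parents, is_up):
    """Return "UP", "DOWN" or "UNREACHABLE" for a host.

    parents: maps a host name to the list of its parent host names.
    is_up: callable standing in for the host check (True if it answers).
    """
    if is_up(host):
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        return "DOWN"          # nothing in between that could be blamed
    if any(is_up(parent) for parent in host_parents):
        return "DOWN"          # the path works, so the host itself failed
    return "UNREACHABLE"       # every parent failed: blame the network


parents = {"webloria": ["cat-481"]}

# Case 1: the router answers but webloria does not.
case1 = classify_host("webloria", parents, lambda h: h == "cat-481")
# Case 2: neither the router nor webloria answers.
case2 = classify_host("webloria", parents, lambda h: False)
```

In case 1 the result is "DOWN", in case 2 it is "UNREACHABLE", which lets notifications point at the router rather than at every host behind it.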
Conclusions
- Open source monitoring solution
- Based on simple concepts: checks, states, notifications
- Easily extensible and integrable
- No discovery mechanism
- Experience it during lab exercises!
  - Monitoring hosts and their services
  - Developing and testing our own plugin
  - Experimenting with state transitions and notifications