http://www.grnet.gr GRNET NOC Use puppet and network inventory to populate nagios/icinga configuration TF-NOC Dublin Alexandros Kosiaris ([email protected])


Page 1: GRNET NOC Use puppet and network inventory to populate nagios/icinga configuration TF ... · 2012-06-06 · Use puppet and network inventory to populate nagios/icinga configuration

http://www.grnet.gr

GRNET NOC

Use puppet and network inventory to populate nagios/icinga configuration

TF-NOC Dublin

Alexandros Kosiaris ([email protected])

Page 2:

Network & Equipment

•Storage Equipment: NetApp/IBM N5300, EMC Celerra NS-480
•Computing Equipment: 12 blade servers (HP BL-460c), 12 IBM 1U servers, 128 1U Fujitsu servers, 275 2U HP ProLiant servers
•Virtualization (KVM): ~200 VMs
•Optical Network: ~70 cities (+30 within next year), 15-year leased dark fiber, DWDM/CWDM network
•Optical Equipment: Alcatel 1626LM, 1696MS, 1678MCC, Adva FSP2000
•Routing Equipment: Juniper T1600, Juniper MX960, ~10x Cisco 12000s, a few Cisco 7200s/7300s
•Switching Equipment: Cisco 6500, several Cisco 3750, Cisco 2970, Juniper EX4200, Extreme X450a/X350

Page 3:

Nagios + Network Equipment or (more accurately) Switching and Routing

In-house developed Network Inventory (a.k.a. GRNETDB)

•A MySQL database of almost 150 tables
•Populated multiple times a day by a PHP discovery script (SNMP, telnet + expect)
•Basic Concepts: Node, Interface, Layer, Domain, Location
•These concepts get extended to represent functionality:
    Routing, Switching nodes
    Layer2, Layer3 interfaces
    Switching, administrative domains

Page 4:

Nagios + Network Equipment or (more accurately) Switching and Routing

In-house developed Python Django project, with multiple sub-apps

•Network (the interface to the database)
•RG (router graphs, take a peek at http://mon.grnet.gr/rg)
•Maps (take a look at http://mon.grnet.gr/network/maps)
•Hostmaster
•Optical network (built mostly on Location info)
•Nadjicingo: builds on the Network app and generates a nagios/icinga configuration
•Nagvis: same thing, but generates/updates the nagvis config

Page 5:

Nadjicingo

A Django management command outputting nagios/icinga configuration

•Run by crontab every hour (manage.py nadjicingo)
•Will generate nagios configuration objects for: Routers, Switches, Interfaces
•L3 topology aware (nagios hates cyclic dependencies, a.k.a. redundant links); populates the parents field for most devices
•Hardware checks in devices
•Business logic embedded in interface descriptions. Part of each description is a unique identifier for a customer's link:
    [.NTUA-4] => National Technical University's L3 link
    [AUTH@ERMOU-1] => Aristotle University of Thessaloniki L2 link at Ermou PoP
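One way to get acyclic parents out of a redundant topology is a breadth-first walk from the monitoring root, so that every device gets exactly one parent. The sketch below is only an illustration of that idea, not Nadjicingo's actual code: the link list, the node names and the `compute_parents`/`render_host` helpers are all hypothetical, and a real host definition would also need an address and a template.

```python
from collections import deque


def compute_parents(links, root):
    """Return {node: parent} from a breadth-first walk starting at root.

    Each node gets at most one parent, so the result is a tree even when
    the link list contains redundant (cyclic) paths.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Sorted so that the output is stable between hourly runs.
        for peer in sorted(neighbours.get(node, ())):
            if peer not in parents:
                parents[peer] = node
                queue.append(peer)
    return parents


def render_host(name, parent):
    """Render a deliberately minimal nagios host object."""
    lines = ["define host {", f"    host_name  {name}"]
    if parent is not None:
        lines.append(f"    parents    {parent}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical triangle of routers: a redundant, hence cyclic, topology.
LINKS = [("core1", "core2"), ("core1", "edge1"), ("core2", "edge1")]
PARENTS = compute_parents(LINKS, root="core1")

for host, parent in sorted(PARENTS.items()):
    print(render_host(host, parent))
```

The redundant core2 to edge1 link is simply not used as a parent relation, which is the price of keeping the dependency graph a tree.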

Page 6:

Nagvis

A Django management command (again...)

•Run by crontab every hour (manage.py nagvis)
•Will update a specific nagvis map configuration by:
    Removing obsolete nodes
    Adding new nodes to a special area for manual positioning on the map
•Also features an automated positioning mode based on each device's latitude and longitude. Nice for showing off, but not for an overview in monitoring applications
•Will only populate host objects in the map
•Service objects cluttered it too much, and the information is readily available anyway
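The update step described above amounts to a set difference between the hosts already on the map and the hosts in the inventory. A minimal sketch follows, assuming a map can be reduced to a `{host: (x, y)}` dictionary; the `sync_map` helper, the parking-area coordinates and the host names are hypothetical, not taken from the real command.

```python
def sync_map(map_hosts, inventory_hosts, parking_origin=(10, 10), step=30):
    """Return an updated {host: (x, y)} mapping for a map.

    Hosts no longer in the inventory are dropped, hosts already on the map
    keep their manually chosen position, and new hosts are lined up in a
    parking area where an operator can pick them up and place them.
    """
    updated = {h: pos for h, pos in map_hosts.items() if h in inventory_hosts}
    new_hosts = sorted(set(inventory_hosts) - set(map_hosts))
    x0, y0 = parking_origin
    for index, host in enumerate(new_hosts):
        updated[host] = (x0 + index * step, y0)
    return updated


current = {"r1": (100, 200), "decommissioned": (5, 5)}
inventory = {"r1", "r2", "r3"}
print(sync_map(current, inventory))
```

Keeping existing coordinates untouched is the important design choice here: an hourly cron job must never undo manual positioning.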

Page 7:

Nagvis Network Map

Page 8:

Servers, Services? A little bit of history

•For years, GRNET only had very basic services (DNS, email, Web)
•And some router supporting services (Looking glass, mrtg, rancid)
•And very few servers (<=10)
•3 years ago, major paradigm shift from networking to services
•20 servers bought, and then 132, and recently 275 more
•End user services were born:
    Public cloud storage service (Pithos)
    Virtual Private Servers (ViMa)
    Students books statements (Eudoxus)
    Student Id cards (Paso)
    Public IaaS (Okeanos)
    Academic Professor Elections (Apella)
•Plus many other services and projects (TCS, Whois, NTP, VoD,…)
•The result? => 200 VMs were created for managing all this infrastructure

Page 9:

Puppet to the rescue

What is Puppet?

•It's a stack of applications
•It's a language (a declarative one as well)
•It's a policy and state enforcing tool
•It's an attribute and state discovery tool (kind of...)
•It's a new paradigm in managing systems!

What is Puppet not?

•Not just an automation tool
•Not a "For loop"
•Not a command execution framework (it can be reduced to that though)

AGAIN: A new paradigm, you need to change the way you work

Page 10:

Puppet Concepts

Facts

•Attributes of a system:
    OS version and family
    Available memory
    CPUs
    Block devices
    IP addresses/netmasks
    MAC addresses
•And anything else you can write discovery code for:
    LLDP neighbours
    IPMI functionality
    Hardware info
    Apache vhosts
•Discovered by facter and then made available to Puppet
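As an illustration of "anything you can write code for", here is a sketch of a fact that reports IPMI capability. It assumes a Facter version that supports executable external facts (a script dropped into the external facts directory that prints `key=value` lines), which may postdate this talk; custom facts can also be written in Ruby. The fact name `ipmi_capable`, the `/dev/ipmi0` device test and both helper functions are assumptions made for the example.

```python
import os


def discover_facts(ipmi_device="/dev/ipmi0"):
    """Return a dict of facts; the key is absent when IPMI is not found.

    Omitting the key (instead of emitting "false") matters because facts
    reach older Puppet versions as strings, and a non-empty string is
    truthy in a manifest conditional.
    """
    facts = {}
    if os.path.exists(ipmi_device):
        facts["ipmi_capable"] = "true"
    return facts


def format_facts(facts):
    """Format facts as the key=value lines an external fact prints."""
    return "\n".join(f"{key}={value}" for key, value in sorted(facts.items()))


# An external fact script would end with:
#     print(format_facts(discover_facts()))
```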

Page 11:

Puppet Concepts(2)

Resources

•Files, Directories
•Users, Groups
•Packages
•Vlans
•Interfaces
•Nagios objects!!!!
•And a lot more (http://docs.puppetlabs.com/references/latest/type.html)

Classes

•A way to group resources
•Support inheritance and mixins (aka including)
•The standard class has 3 resources defined:
    package { 'software': }
    file { '/etc/software.conf': }
    service { 'softwared': }

Page 12:

Puppet Concepts(3)

Nodes

•A.k.a. machines (VM or hardware)
•A node CAN (and probably will) have multiple puppet classes
•Node population can be done in multiple ways:
    Puppet language config
    LDAP
    External script

Puppetd agents run on each machine (daemon or crontab)
Central Puppetmaster (with an RDBMS) holds all the configuration and data
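The "external script" option is Puppet's external node classifier: the master runs the script with the node name as its only argument and reads a YAML document listing the node's classes from standard output, while a non-zero exit status means the node is unknown. The sketch below shows the shape of such a script; the node names and the in-memory `NODE_CLASSES` table are hypothetical stand-ins for a real inventory lookup.

```python
import sys

# Hypothetical inventory; a real classifier would query a database or LDAP.
NODE_CLASSES = {
    "ns1.example.org": ["nagios::host", "authoritativedns"],
    "web1.example.org": ["nagios::host"],
}


def classify(certname, inventory=NODE_CLASSES):
    """Return the YAML document for a node, or None if it is unknown."""
    classes = inventory.get(certname)
    if classes is None:
        return None
    lines = ["---", "classes:"]
    lines += [f"  - {name}" for name in classes]
    return "\n".join(lines) + "\n"


def main(argv):
    """Entry point: returns the exit status instead of exiting itself."""
    if len(argv) != 2:
        return 2
    document = classify(argv[1])
    if document is None:
        return 1  # unknown node
    sys.stdout.write(document)
    return 0


# A deployed script would end with: sys.exit(main(sys.argv))
```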

Page 13:

Hello World example

    class helloworld {
      # Manage one file and every attribute we care about.
      file { '/tmp/helloworld':
        ensure  => present,
        owner   => 'root',
        group   => 'root',
        mode    => '0640',
        content => 'Hello world',
      }
    }

    # Apply the class to a node.
    node mynode {
      include helloworld
    }

Will create /tmp/helloworld with all the attributes defined above.
More importantly, if run again it will wipe any changes made in the meantime and restore the state defined above.

Page 14:

Back to nagios

Let's use a puppet native type

    nagios_host { "$hostname":
      address        => '10.10.10.10',
      alias          => 'myhost',
      contact_groups => 'hostadmins',
      hostgroups     => 'Puppeted Servers',
    }

•/etc/nagios/nagios_host.cfg gets populated

Problem is ...

•This is executed on the machine running puppetd, not on the nagios server.

No problem. Puppet supports exported resources.

Page 15:

Exported resources

Let's prepend the definition with two @ signs

    @@nagios_service { 'myservice':
      contact_groups => 'hostadmins',
      host_name      => $hostname,
      tag            => 'collect_me_nagios_server',
    }

•Exports the resource but does not realize it on the machine running puppetd
•No /etc/nagios/nagios_service.cfg file created

    Nagios_service <<| tag == 'collect_me_nagios_server' |>>

•Goes in the nagios server's manifest; the tag must match the exported one exactly
•/etc/nagios/nagios_service.cfg populated
•nagios.cfg/icinga.cfg can now just include the file/directory and monitoring begins
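The export/collect mechanism can be pictured as a shared store that agents write into and the nagios server filters by type and tag. The following is a toy model of that behaviour only, not Puppet's implementation; the `Catalog` class, its methods and `render_service` are hypothetical names.

```python
class Catalog:
    """Toy stand-in for the puppetmaster's store of exported resources."""

    def __init__(self):
        self._exported = []

    def export(self, rtype, title, tag, **params):
        """Record a resource without applying it on the exporting node."""
        self._exported.append(
            {"type": rtype, "title": title, "tag": tag, "params": params}
        )

    def collect(self, rtype, tag):
        """Return every exported resource matching both type and tag."""
        return [
            r for r in self._exported if r["type"] == rtype and r["tag"] == tag
        ]


def render_service(resource):
    """Render one collected resource as a nagios service object."""
    lines = ["define service {"]
    for key, value in sorted(resource["params"].items()):
        lines.append(f"    {key}  {value}")
    lines.append("}")
    return "\n".join(lines)


store = Catalog()
store.export(
    "nagios_service", "myservice", "collect_me_nagios_server",
    host_name="web1", contact_groups="hostadmins",
)

# On the nagios server: collect by type and tag, then write the config.
for res in store.collect("nagios_service", "collect_me_nagios_server"):
    print(render_service(res))
```

A collector whose tag differs by a single character collects nothing and raises no error, which makes tag typos an easy way to silently lose monitoring.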

Page 16:

Simple example

A manifest for all authoritative DNS servers

•Install bind9, install configuration and ensure it is running
•Open up firewall
•Set up a simple DNS check

    class authoritativedns {
      include bind9          # package, configuration, running service
      include service::dns   # firewall opening

      # Exported, so it is realized on the nagios server.
      @@nagios_service { "authdns":
        check_command => "check_dig!www.grnet.gr",
        servicegroups => "DNS,DNS:Authoritative",
      }
    }

Page 17:

Interesting use cases

Class hierarchy means:

•A base class nagios::host that is included in all others
•So all servers are nagios-monitored without any intervention

But: a server is physical and has IPMI capabilities, so export another nagios host for it

    if $ipmi_capable {
      @@nagios_host { "$ipmi_dns":
        address => $ipmi_ipaddress,
        tag     => "hardwarehost",
      }
    }

Page 18:

Interesting use cases (2)

Server is an HP Proliant Server

    class hp-health {
      package { [ 'hp-health', 'hpacucli' ]:
        ensure => present,
      }
      nagios::host::service { 'hpacucli':
        ensure        => present,
        servicegroups => 'HARDWARE',
        command       => 'check_nrpe!dsa-check-hpacucli!0',
      }
      nagios::host::service { 'hpasm':
        ensure        => present,
        servicegroups => 'HARDWARE',
        command       => 'check_nrpe!dsa-check-hpasm!0',
      }
    }

Page 19:

Interesting use cases (3)

Multicast beacons (double exported resources!!!)

    define ssmping_check($ipv4, $ipv6) {
      $local  = $::fqdn   # the node realizing the check
      $remote = $name     # the node that exported it
      if ($::ipaddress and $ipv4 and $local != $remote) {
        @@nagios_service { "ping-ssm-$remote-$local-v4":
          ensure              => present,
          check_command       => "check_nrpe!check_ssmping!$ipv4",
          host_name           => $local,
          service_description => "Multicast from $remote SSM IPv4",
        }
      }
      …
    }

    # export the checks...
    @@ssmping_check { $fqdn: ipv4 => $ipaddress, ipv6 => $ipv6address }
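The net effect of the double export is a full mesh: every beacon ends up with one check per other beacon. The sketch below computes that mesh directly, assuming the beacons are given as a `{fqdn: ipv4}` dictionary; the `ssm_checks` helper, the host names and the documentation-range addresses are hypothetical, and only the IPv4 half is covered.

```python
def ssm_checks(beacons):
    """Return one check per ordered (local, remote) pair of beacons.

    Mirrors the guard in the define: a node never checks itself, and both
    ends need an IPv4 address.
    """
    checks = []
    for local in sorted(beacons):
        if not beacons[local]:
            continue
        for remote, ipv4 in sorted(beacons.items()):
            if remote == local or not ipv4:
                continue
            checks.append({
                "title": f"ping-ssm-{remote}-{local}-v4",
                "host_name": local,
                "check_command": f"check_nrpe!check_ssmping!{ipv4}",
                "service_description": f"Multicast from {remote} SSM IPv4",
            })
    return checks


BEACONS = {
    "b1.example.org": "192.0.2.1",
    "b2.example.org": "192.0.2.2",
    "b3.example.org": "192.0.2.3",
}
MESH = ssm_checks(BEACONS)
print(len(MESH), "checks for", len(BEACONS), "beacons")
```

With N beacons this yields N x (N - 1) services, so the number of exported resources grows quadratically with the number of beacons.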

Page 20:

Interesting use cases (4)

Standard checks for all servers

    nagios::host::service { "disk":
      command => "check_nrpe!check_disk!13% 7%",
    }
    nagios::host::service { "load":
      command => "check_nrpe!check_load!4,3,2 5,4,3",
    }
    nagios::host::service { "users":
      command => "check_nrpe!check_users!20 30",
    }
    nagios::host::service { "swap":
      command => "check_nrpe!check_swap!60 40",
    }
    nagios::host::service { "check_tainted":
      command => "check_nrpe!check_tainted!0",
    }
    nagios::host::service { "check_firewall":
      command => "check_nrpe!check_firewall!0",
    }

Page 21:

Problems arise

/etc/nagios/*.cfg files can quickly become large

•However, each resource collection reads the entire file
•Problem solved by disabling collections and creating the entire config file every time; a more elegant solution would be nice

Exported resources cost

•Each is an entry in the database, and they are not used for nagios alone
•Execution speed suffers and runs sometimes time out
•Problem solved in the database by adding some indexes... but it is bound to show up again
•Puppet devs know about it, and some effort is going into it
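Regenerating the whole file in one pass, as mentioned in the first list above, avoids re-reading it for every resource. A sketch of that approach follows, under the assumption that the service definitions are available as a list of dictionaries; the function names are hypothetical and the rendering is deliberately simplified.

```python
import os
import tempfile


def render_services(services):
    """Render all service objects in one pass, in a stable order."""
    blocks = []
    ordered = sorted(
        services, key=lambda s: (s["host_name"], s["service_description"])
    )
    for svc in ordered:
        body = "\n".join(f"    {k}  {v}" for k, v in sorted(svc.items()))
        blocks.append("define service {\n" + body + "\n}\n")
    return "\n".join(blocks)


def write_atomically(path, text):
    """Write text to path so that readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        os.chmod(tmp_path, 0o644)    # mkstemp creates the file as 0600
        os.replace(tmp_path, path)   # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Sorting the output keeps the file byte-identical when nothing changed, which makes it cheap to decide whether the monitoring daemon needs a reload.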

Page 22:

Problems arise (2)

Puppet's declarative language can cause problems at times

    @@nagios_host { 'myhost':
      hostgroups => $myhostgroups,
    }

•And the host also has classes A, B, C apart from the nagios class
•Which class is going to declare $myhostgroups?
•Multiple solutions exist, none of them elegant:
    Externally (via LDAP)
    Fact based
    Populate hostgroups, not hosts
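The last option turns the problem around: instead of one host resource that must know all of its hostgroups, each hostgroup lists its members. A sketch of that inversion is shown below, assuming it is known which classes each host carries and which hostgroups each class implies; both input mappings and the helper names are hypothetical.

```python
def hostgroup_members(class_hostgroups, host_classes):
    """Invert host -> classes -> hostgroups into hostgroup -> hosts."""
    members = {}
    for host, classes in host_classes.items():
        for cls in classes:
            for group in class_hostgroups.get(cls, ()):
                members.setdefault(group, set()).add(host)
    return {group: sorted(hosts) for group, hosts in sorted(members.items())}


def render_hostgroup(name, members):
    """Render a nagios hostgroup object with an explicit member list."""
    return (
        "define hostgroup {\n"
        f"    hostgroup_name  {name}\n"
        f"    alias           {name}\n"
        f"    members         {','.join(members)}\n"
        "}\n"
    )


CLASS_HOSTGROUPS = {"A": ["web"], "B": ["db"], "C": ["web", "hardware"]}
HOST_CLASSES = {"myhost": ["A", "B", "C"], "other": ["A"]}

for group, hosts in hostgroup_members(CLASS_HOSTGROUPS, HOST_CLASSES).items():
    print(render_hostgroup(group, hosts))
```

Each class then contributes only to the groups it knows about, and no single place has to assemble the complete list for a host.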

Page 23:

Problems arise (3)

Active checks cost. Not a Puppet issue but a nagios one

•check_mk
•Distributed monitoring. Well, obsess_over_services sucks…
•mod_gearman
•For now, splitting the infrastructure into Networking and Services
•But what if Services grow more? Variable tagging on resources

    @@nagios_service { 'myservice':
      contact_groups => 'hostadmins',
      host_name      => $hostname,
      tag            => 'collect_me_nagios_server_N',
    }
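One way to choose the N in such a tag is a stable hash of the host name, so that every service of a host lands on the same nagios server. The slide does not say how N would be picked, so the sketch below is purely an assumption; the `collector_tag` helper and its prefix default are hypothetical.

```python
import hashlib


def collector_tag(hostname, servers, prefix="collect_me_nagios_server_"):
    """Map a host name to one of `servers` collector tags, deterministically.

    hashlib is used instead of the built-in hash(), whose value for strings
    changes between interpreter runs.
    """
    if servers < 1:
        raise ValueError("at least one nagios server is required")
    digest = hashlib.sha256(hostname.encode("utf-8")).hexdigest()
    return f"{prefix}{int(digest, 16) % servers}"


for host in ("web1.example.org", "ns1.example.org"):
    print(host, "->", collector_tag(host, servers=3))
```

A plain modulo reshuffles most hosts whenever the number of servers changes, so a fixed assignment table may be preferable if monitoring history should stay on one server.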

Page 24:

http://www.grnet.gr

Questions?

Alexandros Kosiaris

GRNET NOC Systems Admin [email protected]