Distributed monitoring at Hyves - Puppet


Upload: puppet-labs

Post on 17-May-2015


Page 1: Distributed monitoring at Hyves- Puppet

Welcome

Jeffrey Lensen

System Engineer


Page 2: Distributed monitoring at Hyves- Puppet

Hyves Infrastructure

3000+ Gentoo servers

190 function groups/types

3 datacenters

Database for server management


Page 3: Distributed monitoring at Hyves- Puppet

Using Puppet

Since: January 2007

Puppetmasters: 3 load-balanced, 1 for CA and development

Version: 2.6.1

MySQL backend for (thin_)storeconfigs

Nginx + 8 Mongrel instances per server

100+ modules

nodes.rb uses management database

Puppet run every morning on every server
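The "nodes.rb uses management database" item describes an external node classifier: Puppet calls a script with the node name and reads the node's classes and parameters as YAML. The deck does not show nodes.rb, so the following is only a minimal sketch. The in-memory hash MANAGEMENT_DB, the host names, and the role_* class names are hypothetical stand-ins for the real management database.

```ruby
require 'yaml'

# Hypothetical stand-in for the management database: node name => attributes.
MANAGEMENT_DB = {
  'web001.example.org' => { 'role' => 'webserver', 'systemstatus' => 'operational' },
  'db001.example.org'  => { 'role' => 'database',  'systemstatus' => 'fail' }
}

# Build the structure a node classifier hands back to Puppet:
# a list of classes plus top-scope parameters for the node.
# Returns nil for nodes the database does not know.
def classify(node_name, db = MANAGEMENT_DB)
  record = db[node_name]
  return nil if record.nil?

  {
    'classes'    => ['monitoring', "role_#{record['role']}"],
    'parameters' => {
      'role'         => record['role'],
      'systemstatus' => record['systemstatus']
    }
  }
end

# Serialize the classification as YAML, the format Puppet reads from stdout.
def enc_output(node_name, db = MANAGEMENT_DB)
  result = classify(node_name, db)
  result.nil? ? nil : result.to_yaml
end
```

A real classifier script would print enc_output(ARGV[0]) and exit non-zero for unknown nodes.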


Page 4: Distributed monitoring at Hyves- Puppet

Nagios

8 Nagios hosts in distributed setup

2500 hosts

1 Nagios master for web and alerting

Scripts to generate configuration

Management database for information

Templates for service checks

Died during large fallouts


Page 5: Distributed monitoring at Hyves- Puppet

Icinga

Switched to Icinga November 2010

Distributed Icinga setup doesn’t require centralized host

Very fast standalone Icinga-web interface

Uses database backend

REST API

Switching was easy due to similar configuration


Page 6: Distributed monitoring at Hyves- Puppet

Current monitoring setup

Monitoring hosts: 12 (4 per DC)

Services: over 83,000

Hosts: nearly 3,500

Average check interval: every 5 min

NOC monitoring host: 1

Overview checks using API

Commandline interface
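To put these figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the services are spread evenly over the 12 monitoring hosts and that every service is checked once per 5-minute interval. Neither assumption is stated on the slide.

```ruby
# Average number of checks each monitoring host executes per second,
# assuming an even spread of services and one check per interval.
def checks_per_second(services, monitoring_hosts, interval_seconds)
  services.to_f / monitoring_hosts / interval_seconds
end

per_host = checks_per_second(83_000, 12, 5 * 60)  # about 23 checks/s per host
total    = checks_per_second(83_000, 1, 5 * 60)   # about 277 checks/s overall
```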


Page 7: Distributed monitoring at Hyves- Puppet

Problems with monitoring

Adding new checks meant manually editing a lot of templates

Things that should be monitored aren’t

Won’t realize it until it’s too late

No monitoring makes it harder to find the problem


Page 8: Distributed monitoring at Hyves- Puppet

Using Puppet to configure Icinga

Puppet knows it all so why not use that information?

Exported resources from Naginator to define monitoring checks

Include the monitoring definitions in profiles

Running Puppet defines all necessary monitoring checks for that host


Page 9: Distributed monitoring at Hyves- Puppet

Example

modules/monitoring/manifests/init.pp:

class monitoring {
  service { "nrpe":
    ensure => running,
    enable => true,
  }

  # Exported resources: declared here, collected on the monitoring host
  @@nagios_host { "$hostname":
    address => $ip,
  }

  @@nagios_service { "NRPE $hostname":
    service_description => "NRPE",
    check_command       => "check_nrpe_scripts",
  }
}

Appending $hostname to the nagios_service title prevents duplicate definitions on the monitoring hosts.


Page 10: Distributed monitoring at Hyves- Puppet

Example Nginx

modules/nginx/manifests/init.pp:

class nginx {
  service { "nginx":
    ensure => running,
    enable => true,
  }

  # Every host that includes nginx exports its own HTTP check
  @@nagios_service { "HTTP $hostname":
    service_description => "HTTP",
    check_command       => "check_http",
    event_handler       => "service_restart!nginx",
    contact_groups      => "admins_email, admins_sms",
  }
}

Automatically create HTTP checks when including Nginx


Page 11: Distributed monitoring at Hyves- Puppet

Predefining and distributing

manifests/defines.pp:

# Hosts marked "fail" in the management database do not send notifications
$__notifications_enabled = $systemstatus ? {
  operational => "1",
  fail        => "0",
}

Nagios_host {
  ensure                => present,
  host_name             => "${hostname}.${domain}",
  hostgroups            => $role,
  use                   => "generic-host", # our standard template
  alias                 => $hostname,
  notifications_enabled => $__notifications_enabled,
  target                => "/etc/icinga/puppetgenerated/hosts/$hostname.cfg",
  notes                 => $monitoringhost,
}

Nagios_service {
  ensure                => present,
  host_name             => "${hostname}.${domain}",
  use                   => "generic-service", # our standard template
  notifications_enabled => $__notifications_enabled,
  target                => "/etc/icinga/puppetgenerated/services/$hostname.cfg",
  notes                 => $monitoringhost,
}


Page 12: Distributed monitoring at Hyves- Puppet

Retrieving exported resources

modules/icinga/manifests/init.pp:

class icingacollect {
  # Collect only the resources whose "notes" names this monitoring host
  Nagios_host <<| notes == "$hostname" |>> {
    require => File["/etc/icinga/puppetgenerated/hosts"],
  }

  Nagios_service <<| notes == "$hostname" |>> {
    require => File["/etc/icinga/puppetgenerated/services"],
  }
}
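The collection step can be pictured as a filter over all exported resources. The sketch below is a simplified Ruby model, not Puppet's actual storeconfigs implementation. ExportedResource and collect are hypothetical names. It also shows why the earlier examples put $hostname in each nagios_service title: two collected resources with the same title would clash.

```ruby
# Simplified model of one exported resource stored by the Puppet master.
ExportedResource = Struct.new(:type, :title, :params)

# Return the resources of the given type whose "notes" parameter matches,
# refusing duplicate titles the way a catalog would.
def collect(store, type, notes)
  matches = store.select { |r| r.type == type && r.params['notes'] == notes }
  titles  = matches.map(&:title)
  dupes   = titles.select { |t| titles.count(t) > 1 }.uniq
  unless dupes.empty?
    raise ArgumentError, "duplicate #{type} titles: #{dupes.join(', ')}"
  end
  matches
end
```

With titles such as "NRPE web001" and "NRPE web002" the collection succeeds. If both hosts exported a service titled just "NRPE", the monitoring host could not hold both.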


Page 13: Distributed monitoring at Hyves- Puppet

Why not Tags?

Using “notes” to assign monitoring host

Tagging caused problems when setting require in Nagios_host and Nagios_service

Tagging meant redefining, it’s not inherited

Solution: stages (?)


Page 14: Distributed monitoring at Hyves- Puppet

Fail-safes

modules/icinga/manifests/init.pp:

class icinga {
  include icingacollect

  # Validate the freshly generated configuration before it goes live
  exec { "verify new cfg":
    command => "/usr/bin/icinga -v /etc/icinga/verify-puppetgenerated.cfg",
    require => Class["icingacollect"],
  }

  # Only reached when verification succeeded
  exec { "mv cfgs":
    command => "rm -rf /etc/icinga/puppet/* ; mv /etc/icinga/puppetgenerated/* /etc/icinga/puppet/",
    require => Exec["verify new cfg"],
  }

  exec { "restart icinga":
    command => "/usr/bin/printf '[] RESTART_PROGRAM\n' > /var/icinga/rw/icinga.cmd",
    require => [ Exec["mv cfgs"], Service["icinga"] ],
  }
}
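The chain above is a verify-then-promote pattern: generated files only replace the live configuration after the verification command succeeds. A minimal Ruby sketch of the same idea follows. promote_config is a hypothetical helper, and verifier is a hypothetical callable standing in for running icinga -v against the staged configuration.

```ruby
require 'fileutils'

# Promote generated config files to the live directory only if they verify.
# `verifier` receives the staging directory and returns true when the
# configuration is valid (a stand-in for `icinga -v <cfg>`).
def promote_config(staging_dir, live_dir, verifier)
  # Leave the live configuration untouched when verification fails.
  return false unless verifier.call(staging_dir)

  # Clear the live directory, then move every generated file into place.
  FileUtils.rm_rf(Dir.glob(File.join(live_dir, '*')))
  FileUtils.mv(Dir.glob(File.join(staging_dir, '*')), live_dir)
  true
end
```

A broken export from a single host then leaves the running Icinga configuration as it was instead of taking monitoring down.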


Page 15: Distributed monitoring at Hyves- Puppet

Deploying monitoring

Deploy script starts a Puppet run on all monitoring hosts

Threaded, with a small sleep between starts to prevent a thundering herd on the Puppet masters

Waits for all Puppet runs to finish and reports whether they were successful or not
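The deploy script itself is not shown in the deck. The following is a sketch of the behaviour described: one thread per monitoring host, staggered starts, then a summary. deploy and runner are hypothetical names. runner is a callable that triggers the Puppet run on a host (for example over SSH) and returns true on success.

```ruby
# Start one worker per monitoring host, staggering the starts, then wait
# for all of them and report which runs succeeded.
def deploy(hosts, stagger_seconds, runner)
  threads = hosts.map do |host|
    thread = Thread.new(host) do |h|
      ok = begin
        runner.call(h)
      rescue StandardError
        false # a crashed run counts as a failure
      end
      [h, ok]
    end
    sleep(stagger_seconds) # spread the load on the Puppet masters
    thread
  end

  # Thread#value joins the thread and returns the block's result.
  results = threads.map(&:value).to_h
  failed  = results.reject { |_host, ok| ok }.keys
  { succeeded: results.keys - failed, failed: failed }
end
```

The stagger trades a slightly longer deploy for fewer simultaneous catalog compilations on the Puppet masters.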


Page 16: Distributed monitoring at Hyves- Puppet

Downsides

Puppet run on Icinga hosts takes about 20 minutes (using separate config files for each host helps)

Modifying a servicecheck requires a puppet run on all hosts with that servicecheck (solution: use --noop)

Cleaning up old resources


Page 17: Distributed monitoring at Hyves- Puppet

Cleaning up

fqdn="${host_to_be_removed}.${domain}"
puppet apply --certname "$fqdn" --node_name facter --thin_storeconfigs $dbsettings \
  --execute 'resources { ["nagios_service","nagios_host"]: purge => true }'


Page 18: Distributed monitoring at Hyves- Puppet

What if something isn’t running Puppet?

Configcheck check

Compares management database with Icinga API
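At its core such a check is a set comparison. The sketch below assumes both sides have already been reduced to lists of host names. config_drift is a hypothetical name, and fetching the lists from the management database and the Icinga API is left out.

```ruby
require 'set'

# Compare the hosts the management database expects with the hosts
# Icinga actually monitors, and report the differences in both directions.
def config_drift(expected_hosts, monitored_hosts)
  expected  = Set.new(expected_hosts)
  monitored = Set.new(monitored_hosts)
  {
    missing_from_monitoring: (expected - monitored).to_a.sort,
    unknown_to_database:     (monitored - expected).to_a.sort
  }
end
```

A non-empty missing_from_monitoring list points at servers that never exported their checks, for instance because Puppet is not running on them.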


Page 19: Distributed monitoring at Hyves- Puppet

Other cool stuff

Generating daemon checks

modules/role/lib/facter/customfacters.rb:

Facter.add("hyves_daemons") do
  setcode do
    conf = "/<path_to_config>/daemons.conf"
    daemons = ["None"] # default when the host has no daemon config
    if File.exist?(conf)
      daemons = []
      # Keep the lines containing "name" and strip the "* name:" prefix
      %x{grep name #{conf}}.each_line do |line|
        daemons.push(line.sub(/.*\* name:/, '').chomp)
      end
    end
    daemons.uniq
  end
end
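The parsing step of this fact can be tried without Facter or a real daemons.conf. The sketch below assumes a hypothetical file layout in which each daemon appears on a line of the form "* name: <daemon>". The slide's regular expression suggests this, but the deck does not show the file. daemon_names is a hypothetical helper, and unlike the slide it strips surrounding whitespace.

```ruby
# Extract daemon names from the text of a daemons.conf-style file.
def daemon_names(conf_text)
  conf_text.each_line
           .select { |line| line.include?('name') }            # what `grep name` does
           .map { |line| line.sub(/.*\* name:/, '').strip }    # drop the prefix
           .uniq
end

sample = "  * name: mailer\n    * port: 25\n  * name: indexer\n  * name: mailer\n"
names = daemon_names(sample)
```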


Page 20: Distributed monitoring at Hyves- Puppet

Other cool stuff

Generating daemon checks

modules/daemons/manifests/init.pp:

class daemons {
  define add_daemon_check {
    @@nagios_service { "$name Daemon $hostname":
      use                 => "Daemon-check",
      service_description => "$name Daemon",
      check_command       => "check_daemon!$name",
    }
  }

  # One exported check per entry in the hyves_daemons fact
  add_daemon_check { $hyves_daemons: }
}


Page 21: Distributed monitoring at Hyves- Puppet

Other cool stuff

Generating overview daemon checks

require 'net/http'

module Puppet::Parser::Functions
  newfunction(:get_daemons, :type => :rvalue, :doc => "\
    This function returns an array of all current hyves_autodaemons,
    based on the Icinga API
    ") do |args|

    domain = "<domain_of_icinga_web>"
    url = "/icinga-web/web/api/service/filter[AND(SERVICE_NAME%7Clike%7C*Daemon)]/columns[SERVICE_NAME]/order[SERVICE_NAME;ASC]/authkey=<api_key>/json"
    response = Net::HTTP.get_response(domain, url)
    data = response.body
    results = PSON.parse(data)

    # Strip the " Daemon" suffix so only the daemon name remains
    daemons = Array.new
    results.each { |result|
      daemon = result['SERVICE_NAME']
      daemon.sub!(/ Daemon/, '')
      daemons << daemon
    }

    daemons.uniq
  end
end
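The response handling in this function can be exercised offline. The sketch below assumes the API returns a JSON array of objects with a SERVICE_NAME key, which is what the loop on the slide implies. It uses Ruby's standard json library instead of Puppet's bundled PSON, and daemons_from_response is a hypothetical helper name.

```ruby
require 'json'

# Turn an Icinga API response body into a unique list of daemon names.
def daemons_from_response(body)
  JSON.parse(body)
      .map { |row| row['SERVICE_NAME'].sub(/ Daemon/, '') }
      .uniq
end

body = '[{"SERVICE_NAME":"mailer Daemon"},' \
       '{"SERVICE_NAME":"indexer Daemon"},' \
       '{"SERVICE_NAME":"mailer Daemon"}]'
overview = daemons_from_response(body)
```

Because many hosts run the same daemon, uniq is what collapses the per-host checks into one overview entry per daemon.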


Page 22: Distributed monitoring at Hyves- Puppet

Other cool stuff

Generating overview daemon checks

modules/icinga/manifests/noc.pp:

$__daemons = get_daemons()
templatefile { "/etc/icinga/puppetgenerated/other/daemons.cfg":
  template => template("icinga/daemons.cfg.erb"),
}

hyvesdaemons.cfg.erb:

define host{
  use        generic-host
  host_name  daemons
  alias      daemons
  address    www.hyves.nl
}

<% __daemons.each do |daemon| -%>
define service{
  use                  DaemonOverview-check
  host_name            daemons
  service_description  <%= daemon %>
}
<% end -%>
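To see what the service loop of this template produces, it can be rendered with Ruby's standard ERB library outside Puppet. This is a sketch under assumptions: the template text is copied from the slide with the variable renamed to daemons, render_daemon_services is a hypothetical helper, and the keyword form of ERB.new needs Ruby 2.6 or newer.

```ruby
require 'erb'

# Service part of the template; "-%>" suppresses the newline after a tag.
SERVICE_TEMPLATE = <<~'ERB_TEMPLATE'
  <% daemons.each do |daemon| -%>
  define service{
    use                  DaemonOverview-check
    host_name            daemons
    service_description  <%= daemon %>
  }
  <% end -%>
ERB_TEMPLATE

# Render one "define service" block per daemon name.
def render_daemon_services(daemons)
  ERB.new(SERVICE_TEMPLATE, trim_mode: '-').result(binding)
end

config = render_daemon_services(%w[mailer indexer])
```

Each name returned by get_daemons() becomes one overview service attached to the artificial "daemons" host.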


Page 23: Distributed monitoring at Hyves- Puppet

The End

Jeffrey Lensen | System Engineer | [email protected]

Questions? Remarks? Ideas?
