osmc 2014: naemon 1, 2, 3, n | andreas ericsson
Post on 02-Jul-2015
… more than software
Naemon & Nostalgia
Andreas Ericsson
ageric79@gmail.com
Agenda
● Agenda
● Ego slide
● About op5
● IT now and then
● Naemon
● Roadmap progress
● Up and coming
Ego slide
● Programming since I was seven
● Core architect at op5 since 2003
● Nagios core developer 2009-2013
● Performance fanatic
● Author of Merlin and Nagios 4
● Naemon maintainer
● Voted “most likely to invent the lightsaber and then accidentally kill
himself with it” when we last played that particular drinking game.
● Motivation: “He does a lot of dumb shit but he's really smart on the
inside”
About op5
● Founded 2003
● 900+ customers
● 97% renewal rate
● Focus on large installations
● http://www.op5.com
IT now and then
“IT” ca 1970
● Computer performance measured in CPM (cards per minute) as often as
in MHz
● 1 year of training before you were allowed to touch a machine
● Unix development began to create a multitasking, multiuser system
● First time a computer passes a college-level calculus course
● First successful ARPANET test
● IANA formed
● Computer-powered devices per person: 0.00000001
● Average CPU speed: 1 MHz
● admin:computer ratio 300:1
IT ca 1980
● One-man computers were gaining ground (Apple II)
● PacMan!
● Portable computers were being developed
● Ethernet standards introduced
● ARPANET, BitNet, CSNET et al began merging
● TCP/IP standards formalized
● SMTP, DNS etc quickly followed
● Average CPU speed: 8 MHz
● admin:computer ratio 10:1
IT ca 1990
● IBM PC style computers becoming popular
● GUIs become popular
● First web page created
● Linux invented
● Internet had suffered its first worm (Morris)
● “Software installation” was actually part of a job description
● Average CPU speed: 66 MHz
● admin:computer ratio 1:1
IT ca 2000
● WiFi standards emerge
● The dot-com era
● Google overtakes AltaVista as the most popular search engine
● It was ok for non-nerds to get computers
● Monitoring starts to become a thing
● Nagios development starts
● Average CPU speed: 800 MHz
● admin:computer ratio 1:10
IT ca now
● An average smartphone has 120 million times the computing power of
the first commercially available general-purpose computer (Ferranti Mark 1)
● An average smartphone has 4 million times the amount of main memory
● Giant datacenters house 100,000+ servers
● Average CPU speed: 2.4 GHz
● admin:computer ratio 1:300
Hands up if...
● … the number of servers you
manage has grown faster than
the staff you have to monitor and
manage them
● … you use more than two tools
just to manage your servers
● … you have people dedicated to
managing the servers that
monitor and manage your
servers
Conclusions
● Manpower is getting scarce
● Sharing resources is more important than ever
● The most expensive resources are the most important to share
● Using what works now but doesn't lock one into a corner is key to
remaining effective
● Developers have a duty to minimize the job they do (laziness is
important! :-p)
● Developers have a duty to minimize the job sysadmins do
● Keep stuff simple. If it breaks, you not only get to keep the pieces, but
you get to do the same job again in a different way
Last year's Naemon roadmap
● Completed
● External commands via query handler
● Dropdir support
● Livestatus
● In progress
● Check result transformer
● Object creation/modification at runtime
● Scheduler-controlled helper daemons (well, kinda)
● Scrapped
● Runtime-modifiable main config – no use case found
● Object extensions – custom variables fill the same role
Up and coming
● Backlogged: Runtime object creation and modification
● Backlogged: Check result transformer
● Backlogged: Scheduler-controlled helper daemons
● Performance data handling
● Active agents
● Report data export
● Because http://www.youtube.com/watch?v=8yVFkMXy8rw
Runtime object creation/modification
● User story:
● If users prefer to configure their monitoring on the monitored hosts,
we should automagically add them to the monitoring config without
reloading it.
● On-call schedule handover
● Added as a query handler extension
● Housekeeping events every X seconds
● Allows new stuff to call in and start getting monitored automagically
● Object creation may not happen if I can't get it stable in a reasonable
timeframe, because config reload is superfast nowadays
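One possible reading of the "call in" and "housekeeping" bullets above is a registry that records hosts as they announce themselves and periodically sweeps out the ones that have gone quiet. The sketch below is only an illustration under that assumption: the class `CallInRegistry`, its methods, and the TTL-based expiry are hypothetical and are not taken from Naemon.

```python
class CallInRegistry:
    """Hypothetical registry of hosts that announce themselves at runtime."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.last_seen = {}  # host name -> time of the latest call-in

    def call_in(self, host, now):
        """Record a call-in; return True if the host was not known before."""
        is_new = host not in self.last_seen
        self.last_seen[host] = now
        return is_new

    def housekeeping(self, now):
        """Drop hosts that have been silent for longer than the TTL."""
        stale = sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.ttl
        )
        for host in stale:
            del self.last_seen[host]
        return stale
```

A scheduler would call `housekeeping()` every X seconds; a `True` return from `call_in()` is the point where a real implementation would create the monitored object.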
Check result transformer
● User story:
● Since anomalies in network and application behaviour often indicate
errors, it's important that we can detect them and send notifications about them
● External helper connects to NERD
● Events zip to the helper
● Helper can alter state/output/perfdata/whatever
● Helper zaps result back to core via QH
● Allows for adaptive thresholds
● The monitoring system requires little or no configuration
● Inspired by Bischeck (which we will likely end up using as engine)
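The adaptive-threshold idea above can be sketched roughly as follows. This is a minimal illustration, not the algorithm used by Naemon or by the engine named on the slide: the function name `adaptive_state`, the window of recent values, and the 2 and 3 standard deviation multipliers are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Nagios-style state codes.
OK, WARNING, CRITICAL = 0, 1, 2


def adaptive_state(history, value, warn_sigmas=2.0, crit_sigmas=3.0):
    """Classify `value` by its distance from the mean of `history`.

    Hypothetical helper: thresholds are derived from recent data instead
    of being configured by hand.
    """
    if len(history) < 2:
        return OK  # not enough data to judge an anomaly
    centre = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any deviation at all is an anomaly.
        return OK if value == centre else CRITICAL
    distance = abs(value - centre) / spread
    if distance >= crit_sigmas:
        return CRITICAL
    if distance >= warn_sigmas:
        return WARNING
    return OK


recent = [100, 102, 98, 101, 99, 100, 103, 97]
print(adaptive_state(recent, 101))  # close to the recent mean
print(adaptive_state(recent, 140))  # far outside the recent spread
```

In the design described on the slide, a helper like this would sit outside the core, receive check events, and hand the altered state back.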
Scheduler-controlled helper daemon
● User story:
● Users must be able to trust that exported data is complete
● New twist: Naemon will connect to other systems' sockets instead
● Will most likely be implemented as a module
Performance data handling
● User story:
● Users should be able to produce graphs of all metrics they monitor
with as little impact on available resources as possible.
● Performance data can now be streamed from Naemon
● Reduces I/O from spoolfile writing
● Feeds data to PNP or Graphite
● Gets rid of the last synchronously executed system call
● Already completed
● Builds on top of already-implemented interfaces
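As a rough illustration of the "feeds data to Graphite" step, the sketch below converts a plugin performance data string into lines for Graphite's plaintext protocol (`path value timestamp`). It does not show how Naemon delivers the stream; the function name `perfdata_to_graphite` and the `naemon.` path prefix are assumptions made for the example.

```python
import re

# Perfdata items look like label=value[UOM];warn;crit;min;max.
# Only the label, value and unit are captured here.
_PERF_RE = re.compile(r"(?:'([^']+)'|([^\s=']+))=(-?\d+(?:\.\d+)?)([^\s;]*)")


def perfdata_to_graphite(host, service, perfdata, timestamp):
    """Turn a plugin perfdata string into Graphite plaintext lines."""

    def clean(part):
        # Graphite uses dots as path separators, so keep them out of names.
        return re.sub(r"[^A-Za-z0-9_-]", "_", part)

    lines = []
    for quoted, bare, value, _uom in _PERF_RE.findall(perfdata):
        label = quoted or bare
        path = ".".join(clean(p) for p in ("naemon", host, service, label))
        lines.append(f"{path} {value} {int(timestamp)}")
    return lines


for line in perfdata_to_graphite(
    "web01", "HTTP", "time=0.012s;1;2;0 size=512B;;;0", 1416300000
):
    print(line)
```

Each returned line, terminated with a newline, is what a Graphite plaintext listener expects to receive.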
Perfdata handling design
[Diagram: Naemon, NERD, PNP/Graphite/?]
Active agents
● User stories:
● Naemon should scale to as close to infinite sizes as possible
● To save time, new hosts should report what metrics they're offering
and Naemon should automagically monitor them.
● collectd or a pushing version of check_mk
● Input for automagic host/service creation
● Large providers already write and ship compatible plugins
● Complements active checks, but doesn't replace them
● Improves security in many setups (especially over NRPE)
● Allows us to reuse existing code (i.e., be lazy and do less work)
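To make the receiver/evaluator role concrete, here is a minimal sketch that takes a metric pushed by an agent, compares it against thresholds, and formats the outcome as a standard `PROCESS_SERVICE_CHECK_RESULT` external command. The function names, the "higher is worse" threshold rule, and the `value` perfdata label are assumptions for illustration; how the command is then submitted to the core is left out.

```python
def evaluate(value, warn, crit):
    """Return a Nagios-style return code for a 'higher is worse' metric."""
    if value >= crit:
        return 2
    if value >= warn:
        return 1
    return 0


def passive_result(host, service, value, warn, crit, now):
    """Format a pushed metric as a PROCESS_SERVICE_CHECK_RESULT command."""
    code = evaluate(value, warn, crit)
    state = ("OK", "WARNING", "CRITICAL")[code]
    output = f"{state} - {service} is {value}|value={value};{warn};{crit}"
    return (
        f"[{int(now)}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{code};{output}"
    )


print(passive_result("db01", "load", 7.5, 5, 10, 1416300000))
```

Because the agent only pushes numbers and the evaluation happens on the receiving side, thresholds stay in one place.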
Active agents design
[Diagram components: Naemon (query-handler, livestatus), Receiver/Evaluator, multiple collectd instances]
Report data export
● User story:
● It should be easy to get performance data from Naemon into
SystemX in order to take advantage of the graphing power of SystemX
● Streams host and service state changes, downtime, and start/stop events
from Naemon
● Builds on top of existing interfaces
● Provides excellent performance
● Allows using extreme-performance data warehouse software to store
(possibly huge amounts of) report data.
Report data export design
[Diagram: Naemon, NERD, Data warehouse]
Questions?
● ageric79@gmail.com
● http://www.naemon.org/
● http://www.youtube.com/watch?v=8yVFkMXy8rw
● Or just talk to me. I'm not dangerous until I have that lightsaber ;-)