From monitoringsucks to monitoring love, 2016 edition
From #MonitoringSucks to #MonitoringLove
Open Source Monitoring in 2016
@KrisBuytaert
FlossUK 2016, London, UK
Kris Buytaert
I used to be a Dev,
Then Became an Op
Chief Trolling Officer and Open Source Consultant @inuits.eu
Everything is an effing DNS Problem
Building Clouds since before the bookstore
Organising Conferences
Evangelizing devops
An opinionated talk about the Open Source Monitoring tooling landscape
In which I hope to learn from YOU
#devops=~C(L)AMS
Culture
(Lean)
Automation
Monitoring and Measurement
Sharing
Damon Edwards and John Willis
Gene Kim
Monitoring is usually an afterthought
ENOBUDGET, ENOTIME
A 2008 OLS Paper
We have bloated Java tools
Some open Core stuff
DIY folks want traditional Nagios
DBA Required
#monitoringsucks
John Vincent (@lusis), June 2011
A sub #devops movement
https://github.com/monitoringsucks/
Why #monitoringsucks
Manual config (gui)
Not in sync with reality
Hosts only
Services sometimes
Application never
Chaos or out of sync with reality
Alert Fatigue
#monitoringlove
Ulf Mansson #devopsdays Rome 2011
A new era of tooling
#monitoringlove hacksessions @inuits
#monitorama
What we want
Small, well-suited components
Collect
Transport / Mangle
Store
Analyse
Act / Alert
Visualize
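The component list above can be sketched as a chain of small functions, one per stage. This is only an illustration of the "small components" idea, not any particular tool: the `Sample` tuple, the `Store` class and every function name are hypothetical, and the Visualize stage is left out.

```python
import statistics
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical sample shape: (metric name, value, unix timestamp).
Sample = Tuple[str, float, float]


def collect(readings: Dict[str, float], now: float) -> List[Sample]:
    """Collect: turn raw readings into timestamped samples."""
    return [(name, value, now) for name, value in sorted(readings.items())]


def mangle(samples: Iterable[Sample], prefix: str) -> List[Sample]:
    """Transport / Mangle: namespace metric names on the way through."""
    return [(f"{prefix}.{name}", value, ts) for name, value, ts in samples]


class Store:
    """Store: keep every series in memory, keyed by metric name."""

    def __init__(self) -> None:
        self.series: Dict[str, List[Tuple[float, float]]] = {}

    def write(self, samples: Iterable[Sample]) -> None:
        for name, value, ts in samples:
            self.series.setdefault(name, []).append((ts, value))


def analyse(store: Store, name: str) -> float:
    """Analyse: reduce a stored series to its mean value."""
    return statistics.mean(value for _, value in store.series[name])


def act(name: str, mean: float, threshold: float) -> Optional[str]:
    """Act / Alert: return an alert message only above the threshold."""
    if mean > threshold:
        return f"ALERT {name} mean={mean:.1f} > {threshold}"
    return None
```

Each stage only depends on the shape of the data it receives, so any one of them could be swapped for a real tool without touching the others.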
#monitoringlove
But the love was about :
Sensu
Awesome for non static environments
Scaling a clustered RabbitMQ ?
This is Europe, U no do cloud
Automation of #monitoring brought back the #love
Automation
Monitoring a service vs Monitoring a Service
definition of done:
monitored and in production
A software project is not done until your last end user is dead
Culture
Automation
Measurement: measure all the things
Sharing
CollectD all the metrics, at high intervals
Oldschool graphite
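For reference, Graphite's carbon daemon accepts metrics over a plaintext protocol: one `path value timestamp` line per sample, by default on TCP port 2003. A minimal sketch of a sender follows; the metric path and the `localhost` default are assumptions for illustration, and in practice collectd's Graphite write plugin does this job.

```python
import socket
from typing import Iterable


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one sample for carbon's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


def send_to_graphite(lines: Iterable[str],
                     host: str = "localhost",  # assumed carbon host
                     port: int = 2003) -> None:
    """Send pre-formatted lines to carbon over a single TCP connection."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Calling `send_to_graphite([graphite_line("servers.web01.load.shortterm", 0.42, 1457000000)])` would push one sample, provided a carbon listener is reachable at the assumed address.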
Graphite++
API
Dashboards Grafana
Gdash
Engines : InfluxDB
Cyanite
Draw as Infinite
Time To Deploy
Deploy Frequency
Lifecycle frequency
Map to other metrics
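One way to map deploys onto other metrics is Graphite's `drawAsInfinite()` render function, which draws every non-zero datapoint of a series as a vertical line across the graph. The sketch below only builds a render URL; the base URL and both metric paths are hypothetical, and it assumes each deploy writes a `1` to the event series.

```python
from urllib.parse import urlencode


def deploy_overlay_url(base_url: str, metric: str, deploy_event: str,
                       window: str = "-24h") -> str:
    """Build a Graphite render URL that overlays deploy events on a metric."""
    params = [
        ("target", metric),
        # Each non-zero datapoint in the event series becomes a vertical line.
        ("target", f"drawAsInfinite({deploy_event})"),
        ("from", window),
        ("format", "png"),
    ]
    return f"{base_url}/render?{urlencode(params)}"


url = deploy_overlay_url(
    "http://graphite.example.org",        # hypothetical Graphite host
    "servers.web01.apache.response_time",  # hypothetical metric
    "events.deploy.webshop",               # hypothetical deploy event series
)
```

Counting the datapoints in the same event series over a window gives a rough deploy frequency.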
Graph-Explorer
(Vimeo)
Metrics 2.0
Add Events
Grafana
Graphs to Knowledge
Skyline
Oculus
Creating Information out of this data
Big data
Machine Learning
Aggregation
Alert on streams
Alert on aggregated metrics
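Alerting on an aggregate rather than on single samples could look like the sketch below: the alert only fires when the mean over a full sliding window crosses a threshold, so one spike does not wake anybody up. The `WindowedAlert` class is hypothetical and does not mirror the API of any tool named in this talk.

```python
from collections import deque
from statistics import mean


class WindowedAlert:
    """Fire only when the mean of the last `size` values exceeds `threshold`."""

    def __init__(self, size: int, threshold: float) -> None:
        self.window = deque(maxlen=size)  # oldest values fall off automatically
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Add one value from the stream; return True if the alert should fire."""
        self.window.append(value)
        full = len(self.window) == self.window.maxlen
        return full and mean(self.window) > self.threshold
```

With a window of 5 and a threshold of 200, the stream 100, 100, 400, 100, 100 stays quiet, while a sustained run of 300s after it trips the alert once the window mean passes 200.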
Riemann
I still don't get it ?
Distributed Top
Do you like Clojure ?
Riemann Health plugin ?
s/riemann-health/collectd/g;
Output to graphite
Prometheus
Started 2012
SoundCloud
Metrics Based
Scrapes Endpoints
Existing endpoints for limited tools
Often needs custom code
Graphite Exporter
Push Gateway
Great Alerting
Might need some coding
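The "custom code" usually means exposing an HTTP endpoint (conventionally `/metrics`) that returns samples in Prometheus's text exposition format. The sketch below only renders that text with the standard library; a real service would more likely use an official client library. The function name is hypothetical, and label-value escaping is deliberately left out.

```python
from typing import Dict, Iterable, Tuple


def render_metric(name: str, help_text: str, metric_type: str,
                  samples: Iterable[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric family in the Prometheus text exposition format.

    Assumes label values contain no quotes, backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{val}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


body = render_metric(
    "http_requests_total", "Total HTTP requests.", "counter",
    [({"method": "get", "code": "200"}, 1027)],
)
```

Serving `body` from an HTTP handler is all a scrape target needs; the Prometheus server pulls it on its own schedule.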
But I have log files..
Logs and Metrics
Graylog2
ELSA (Enterprise Log Search and Archive)
ELK Stack
Collect from anywhere
Filter
Send anywhere
Queueing
Infinite Diskspace ?
Logstash output Statsd => Graphite
Keep patterns around,
Selectively purge data
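The log-to-metric step above can be sketched without Logstash: extract the pattern you care about from each log line and emit a StatsD counter (`bucket:1|c`, sent over UDP, port 8125 by default). Once the counter is in Graphite, the raw line can be purged. The regex, the bucket prefix and the function names are assumptions for illustration, and the pattern targets Apache-style access log lines.

```python
import re
import socket
from typing import Optional

# Matches the HTTP status code that follows the quoted request in an access log.
STATUS_RE = re.compile(r'" (\d{3}) ')


def statsd_counter(log_line: str, prefix: str = "apache.status") -> Optional[str]:
    """Turn one access-log line into a StatsD counter packet, or None."""
    match = STATUS_RE.search(log_line)
    if match is None:
        return None
    return f"{prefix}.{match.group(1)}:1|c"


def send_statsd(packet: str, host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget one packet to a StatsD daemon over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("ascii"), (host, port))
```

Because UDP is fire-and-forget, a missing StatsD daemon never blocks the log pipeline; the counters are simply lost.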
APM
But what about my apps ?
Half the world cheers about SAAS tools :(
Packetbeat
Traffic Flow through network
Transactions causing errors
SQL per HTTP
API call usage
Old PacketBeat
Beats ?
Elastic.co
Collect, Parse and Ship
Q: Is all the data you care about suitable for Elasticsearch ?
What about Long Term Storage ?
Do you even want to build alerting from this ?
Checking for Failure
Icinga
Automated config generation
Sensu
Cloudstyle
Prometheus
Metric based
Waking you up at night
Flapjack
flapjack.io
monitoring notification routing + event processing system
OpenDuty
github.com/szechuen/OpenDuty
Duty management
Aggregating
Thruk
Grafana
Dashing
Our Current Stack
I love where Monitoring is heading
We have far fewer false positives these days
Contact
Kris Buytaert [email protected]
Further Reading
@krisbuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.eu/
Find Inuits in
Brasschaat, Ghent, Rotterdam, Prague, Kiev, Brno