DevOps Days Tel Aviv 2013: Ignite Talk: Monitoring Patterns with Riemann - Itai Frenkel & Eli...
DESCRIPTION
Riemann aggregates events from your servers and applications with a powerful stream processing language, which enables concise monitoring rule declarations. This 5-minute ignite talk gives a taste of common monitoring pattern implementations (heartbeat, statistics, event enrichment, state-based filters, multi-tenant monitoring) and reviews what you can do with Riemann after processing these patterns.
Speakers: Itai Frenkel and Eli Polonsky, GigaSpaces
Eli Polonsky and Itai Frenkel work at GigaSpaces, developing the Cloudify open source devops and cloud automation suite. Part of their work includes evaluating open source devops tools such as Riemann.
TRANSCRIPT
Built for monitoring distributed systems
Event Stream Processing (like ESPER/Drools Fusion)
Shared-State (index)
Open Source (written by aphyr (Kyle Kingsbury))
riemann.io
Concepts
host ‘A’
service ‘req_latency’
state ‘ok’
metric 1
ttl 60
tags ‘important’
event
Concepts
(host, service) last event
(‘A’, ‘req_latency’)
(‘B’, ‘req_latency’)
(‘C’, ‘req_latency’)
(‘D’, ‘req_latency’)
index
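The two concept slides can be read as plain data: an event is a map, and the index keeps the last event seen for each (host, service) pair. The sketch below is a minimal model in plain Clojure, not Riemann's own implementation; `index-event` and the sample metrics are hypothetical names and values chosen for illustration.

```clojure
;; Events modelled as plain maps, using the field names from the slide.
(def events
  [{:host "A" :service "req_latency" :state "ok" :metric 1 :ttl 60}
   {:host "B" :service "req_latency" :state "ok" :metric 3 :ttl 60}
   {:host "A" :service "req_latency" :state "ok" :metric 7 :ttl 60}])

;; Hypothetical helper: keep only the last event per [host service] key.
(defn index-event [index event]
  (assoc index [(:host event) (:service event)] event))

(def index (reduce index-event {} events))

(count index)                              ;; => 2
(:metric (get index ["A" "req_latency"]))  ;; => 7
```

The second event from host A replaces the first, which is why the index holds two entries rather than three.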
Heartbeat Trigger
(expired (tagged "keep_alive" (email "[email protected]")))
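To see what the heartbeat rule reacts to, here is a small plain-Clojure model of expiry. It assumes each event carries a `:time` in seconds alongside its `:ttl`; the helper names (`expired?`, `tagged?`, `missed-heartbeats`) and the sample data are hypothetical, and the real expiry sweep is performed by Riemann itself.

```clojure
;; An event is considered expired once more than :ttl seconds have passed
;; since its :time (assumed field, seconds).
(defn expired? [now event]
  (> (- now (:time event)) (:ttl event)))

(defn tagged? [tag event]
  (boolean (some #{tag} (:tags event))))

;; Keep only keep_alive events whose ttl has elapsed and mark them expired.
(defn missed-heartbeats [now indexed-events]
  (->> indexed-events
       (filter #(tagged? "keep_alive" %))
       (filter #(expired? now %))
       (map #(assoc % :state "expired"))))

(def indexed-events
  [{:host "A" :service "heartbeat" :tags ["keep_alive"] :time 100 :ttl 60}
   {:host "B" :service "heartbeat" :tags ["keep_alive"] :time 150 :ttl 60}
   {:host "C" :service "req_latency" :tags [] :time 10 :ttl 60}])

(map :host (missed-heartbeats 170 indexed-events)) ;; => ("A")
```

Host B reported 20 seconds ago and is still alive; host C is stale but not tagged, so only host A would trigger an email.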
Threshold Trigger
(where (and (service "req_latency") (> metric 10)) (email "[email protected]"))
Change State
(host, service)        metric  state
('A', 'req_latency')   20      error
('B', 'req_latency')   1       ok
('C', 'req_latency')   5       error
('D', 'req_latency')   5       ok
(where (service "req_latency")
  (split
    (< metric 2)  (with :state "ok" index)
    (> metric 10) (with :state "error" index)))
(changed-state {:init "ok"}
  (email "[email protected]"))
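The table shows hosts C and D with the same metric (5) but different states. That is what the two thresholds imply: a metric between 2 and 10 matches neither branch, so the previously indexed state stays in place. The sketch below is a simplified plain-Clojure model of that hysteresis together with a state-change alert; `classify`, `step`, and `run` are hypothetical helpers, not Riemann functions.

```clojure
;; Thresholds from the slide: below 2 is "ok", above 10 is "error".
;; Anything in between yields nil, meaning "keep the previous state".
(defn classify [metric]
  (cond (< metric 2)  "ok"
        (> metric 10) "error"
        :else         nil))

;; `states` maps [host service] to the last known state.
;; Returns the updated map plus an alert when the state actually changed.
(defn step [states {:keys [host service metric]}]
  (let [k    [host service]
        prev (get states k "ok")            ; mirrors {:init "ok"}
        cur  (or (classify metric) prev)]
    {:states (assoc states k cur)
     :alert  (when (not= prev cur) {:host host :from prev :to cur})}))

(defn run [events]
  (reduce (fn [{:keys [states alerts]} event]
            (let [{s :states a :alert} (step states event)]
              {:states s :alerts (if a (conj alerts a) alerts)}))
          {:states {} :alerts []}
          events))

(def result
  (run [{:host "C" :service "req_latency" :metric 12}
        {:host "C" :service "req_latency" :metric 5}
        {:host "D" :service "req_latency" :metric 5}]))

(get-in result [:states ["C" "req_latency"]]) ;; => "error"
(get-in result [:states ["D" "req_latency"]]) ;; => "ok"
(count (:alerts result))                      ;; => 1
```

Only the first event from host C crosses a threshold, so a single alert is raised even though three events were processed.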
Time Window Statistics
Cluster Statistics
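; folds/maximum expects a sequence of events, so coalesce (an assumed addition,
; not on the original slide) first gathers the latest median from every host.
(def index-max-of-median
  (coalesce (smap folds/maximum index)))
(by [:host]
  (where (service "req_latency")
    (percentiles 60 [0.5] index-max-of-median)))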
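The arithmetic behind "max of medians" can be sketched in plain Clojure: take the median latency per host over one 60 second window, then keep the largest. The sample data and the `median` helper are hypothetical, and the quantile convention used here (element at floor(0.5 * n) of the sorted samples) is one common choice rather than a statement about Riemann's internals.

```clojure
;; Hypothetical latency samples collected per host during one window.
(def window
  {"A" [20 22 19 30 21]
   "B" [1 2 1 3 2]
   "C" [5 7 6]})

;; Median as the 0.5 quantile: sort, then take the element at floor(0.5 * n).
(defn median [samples]
  (let [sorted (vec (sort samples))]
    (nth sorted (int (* 0.5 (count sorted))))))

(def medians
  (into {} (map (fn [[host samples]] [host (median samples)]) window)))

;; The cluster-wide figure is the largest per-host median.
(def worst (apply max-key val medians))

medians      ;; => {"A" 21, "B" 2, "C" 6}
(key worst)  ;; => "A"
(val worst)  ;; => 21
```

Using the median per host keeps a single slow request from dominating, while the maximum across hosts still surfaces the one machine that is consistently slow.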
Event Storm Filtering
(def alert-devops (throttle 100 3600
(rollup 3 3600 (email "[email protected]"))))
(where (tagged "db-connection-exception") alert-devops)
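The effect of combining a rate limit with a rollup can be modelled on a list of timestamped events. This is a simplified sketch in plain Clojure that assumes fixed, aligned windows of `dt` seconds; `throttle-events`, `rollup-events`, and the `storm` data are hypothetical, and Riemann's own streams work on live time rather than a finished list.

```clojure
;; At most `n` events pass per window of `dt` seconds; the rest are dropped.
(defn throttle-events [n dt events]
  (->> events
       (group-by #(quot (:time %) dt))
       (sort-by key)
       (mapcat (fn [[_ window-events]] (take n window-events)))))

;; Per window, the first `n` events are delivered one by one (as one-element
;; batches) and the remainder is summarised as a single batch.
(defn rollup-events [n dt events]
  (->> events
       (group-by #(quot (:time %) dt))
       (sort-by key)
       (mapcat (fn [[_ window-events]]
                 (let [[firsts others] (split-at n window-events)]
                   (cond-> (mapv vector firsts)
                     (seq others) (conj (vec others))))))))

;; A storm of 500 events in the first hour, then 2 events in the second hour.
(def storm
  (concat (for [t (range 500)] {:time t :tag "db-connection-exception"})
          [{:time 3600} {:time 3700}]))

(def emails (rollup-events 3 3600 (throttle-events 100 3600 storm)))

(count emails)      ;; => 6
(map count emails)  ;; => (1 1 1 97 1 1)
```

In this model 502 exceptions turn into six emails: three immediate ones and one summary for the first hour, then two for the quiet second hour.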
Event Enrichment
host ‘A’
service ‘req_latency’
state ‘ok’
metric 1
ttl 60
tags ‘important’
tenant 1
(defn change-event [my-key my-value & children]
  (fn [event]
    (let [my-event (assoc event my-key my-value)]
      (call-rescue my-event children))))
(change-event :tenant "1" index)
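The enrichment pattern can be tried without Riemann by treating every stream as a plain function of one event. In the sketch below `doseq` stands in for Riemann's `call-rescue`, and the `seen` atom is a hypothetical stand-in for the index; both are assumptions made so that the example runs in a bare Clojure REPL.

```clojure
;; Streams modelled as functions of one event. `children` are the
;; downstream streams that receive the enriched event.
(defn change-event [my-key my-value & children]
  (fn [event]
    (let [my-event (assoc event my-key my-value)]
      (doseq [child children]
        (child my-event)))))

;; Stand-in for the index: an atom collecting whatever reaches it.
(def seen (atom []))
(def enrich (change-event :tenant "1" #(swap! seen conj %)))

(enrich {:host "A" :service "req_latency" :metric 1})

(:tenant (first @seen)) ;; => "1"
```

The original fields are untouched; the event that reaches the child simply carries one extra key.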
Tenant 1 Tenant 2 Tenant 3
Multi-Tenancy
(def riemann-agg (tcp-client :host "agg-hostname"))
(changed-state
  (change-event :tenant "1"
    (forward riemann-agg)))
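On the aggregator side, the tenant field is what keeps identical host names from different tenants apart. The following plain-Clojure sketch models that view; the `forwarded` events, the `track` helper, and the `failing` summary are hypothetical and only illustrate one way the aggregated data might be used.

```clojure
;; Hypothetical state-change events as they might arrive at the aggregator,
;; each already enriched with its :tenant by the sending Riemann instance.
(def forwarded
  [{:tenant "1" :host "A" :service "req_latency" :state "error"}
   {:tenant "2" :host "A" :service "req_latency" :state "error"}
   {:tenant "1" :host "A" :service "req_latency" :state "ok"}
   {:tenant "3" :host "B" :service "req_latency" :state "error"}])

;; Key on tenant as well as host and service, keeping the latest state.
(defn track [states {:keys [tenant host service state]}]
  (assoc states [tenant host service] state))

(def states (reduce track {} forwarded))

;; Tenants that currently have at least one service in the "error" state.
(def failing
  (->> states
       (filter (fn [[_ state]] (= state "error")))
       (map (fn [[[tenant _ _] _]] tenant))
       set))

failing ;; => #{"2" "3"}
```

Tenant 1 recovered with its second event, so only tenants 2 and 3 remain in the failing set even though tenants 1 and 2 both report a host named A.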
http://riemann.io