graphite, an introduction

Post on 11-May-2015

Category:

Technology

DESCRIPTION

An introduction to graphite and why it's so great.

TRANSCRIPT

Graphite: An Introduction

Scaling real-time monitoring

The purpose today

What is graphite

Why it’s so great

How to graph (It’s really easy!)

How we use graphite

First, a definition

Alerts + Metrics = Monitoring

Metrics: Graphite, Cacti, Munin

Alerting: Nagios, Icinga

Both: Zenoss, Hyperic, Zabbix, PNP4Nagios

What is graphite

About graphite

● Django web application consisting of 3 parts:
○ carbon (relays, caches, and aggregates metrics)
○ whisper (graphite’s equivalent of RRD files)
○ Web UI (graph composer, simple dashboard)

Why graphite?

Why graphing?

Discover trends and patterns
What time of the day do we get the most users?
When x happened, what was the effect on y?
How many hits am I getting per hour? How does this compare to last week? Last month?

Predict future events
When will we need to add more servers? Databases?

Negative feedback
Did the release into production fix problem x?

Cacti SUCKS

A few reasons:

Ancient user interface (no JavaScript/AJAX), terrible workflow, cannot push metrics, no formulas, no graph introspection, cannot feed out-of-sequence metrics, ugly graphs, no API, system/OS metrics must be exposed on the host via SNMP, no graph composer, no custom graphs, metrics and graphs must be predefined, static polling interval, unscalable, tons of work to create one graph, no 3rd party ecosystem, etc.

Graphite ++

Simple

Powerful

Functions (sum, derivatives, integrals, timeshift, mostDeviant, scale, averages, etc.)

API (Nagios integration, 3rd party custom dashboards)

Scalable

Easy to feed data

Wide ecosystem of 3rd party tools and dashboards

http://graphite.readthedocs.org/en/latest/tools.html
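
To make the functions and API bullets concrete, here is a minimal sketch of building a request against graphite-web's render endpoint. The host name and the metric names (`web.*.requests`) are hypothetical; `sumSeries` and `timeShift` are standard Graphite functions, and the sketch only builds the URL without contacting a server.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, time_from="-24h", fmt="json"):
    """Build a render API URL for one or more target expressions."""
    params = [("target", t) for t in targets]
    params += [("from", time_from), ("format", fmt)]
    return base_url.rstrip("/") + "/render?" + urlencode(params)


# Hypothetical metric names, for illustration only.
this_week = "sumSeries(web.*.requests)"
last_week = 'timeShift(sumSeries(web.*.requests), "7d")'

# Hypothetical host; two targets overlay this week on last week.
url = render_url("http://graphite.example.com", [this_week, last_week])
```

Fetching that URL should return one JSON series per target, which is what third-party dashboards and Nagios checks typically consume.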

Tools

StatsD

Logster

Skyline

Collectd

Dashboards

Graphite --

No poller

No all in one solution

No easy backups

It probably will become business critical

How to graph

There are tons of ways to feed graphite your data

Bash

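#!/bin/bash
# Send one data point over carbon's plaintext protocol (port 2003).
# Note: bash assignments must not have spaces around "=".
timestamp=$(date +%s)
value=10
echo "dot.delimited.metric.name $value $timestamp" | nc -w 1 graphite.host.name 2003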

Python

import socket

def send_msg(message, host, port):
    # message is one plaintext line: "metric.path value timestamp\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message.encode("ascii"))
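
The message passed to a sender like the one above has to follow carbon's plaintext format: metric path, value, and Unix timestamp, separated by spaces and terminated by a newline. A small helper for building that line might look like this; the function name `format_metric` is made up for illustration.

```python
import time


def format_metric(path, value, timestamp=None):
    """Return one plaintext-protocol line: 'path value timestamp\\n'."""
    if timestamp is None:
        # Default to "now" as whole seconds since the epoch.
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"
```

On Python 3 a socket needs bytes, so encode the returned string with `.encode("ascii")` if the sending function does not already do so.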

Python using graphite-pymetrics

from metrics import timing

@timing("heavy.task")
def heavy_task(x, y, z):
    # do heavy stuff here
    pass
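
For readers who want to see what a timing decorator does under the hood, here is a hand-rolled sketch using only the standard library. It is not graphite-pymetrics' implementation; the name `timed` and the `emit` callback are assumptions chosen for illustration. `emit` receives a ready-to-send plaintext line, so it could be a socket sender or, as in the default, `print`.

```python
import functools
import time


def timed(metric, emit=print):
    """Decorator that reports the wrapped function's run time in milliseconds.

    `emit` is a hypothetical callback that receives one plaintext-protocol
    line ("metric value timestamp\\n") per call.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Report even if the function raised.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                emit(f"{metric} {elapsed_ms:.3f} {int(time.time())}\n")
        return wrapper
    return decorator
```

Opening a connection per call gets expensive for hot functions, which is one reason stacks often put an aggregator such as StatsD between the application and carbon.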

Ruby

require 'socket'

host = 'somegraphitehost'
conn = TCPSocket.new host, 2003
# Plaintext line: "<metric path> <value> <timestamp>"; some.metric is a placeholder
conn.puts "some.metric 10 #{Time.now.to_i}"
conn.close

Java

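import java.io.DataOutputStream;
import java.net.Socket;

public class GraphiteExample {
    public static void main(String[] args) throws Exception {
        try (Socket conn = new Socket("somegraphitehost", 2003);
             DataOutputStream dos = new DataOutputStream(conn.getOutputStream())) {
            // Plaintext line: "<metric path> <value> <timestamp>\n"; some.metric is a placeholder
            dos.writeBytes("some.metric 10 " + (System.currentTimeMillis() / 1000) + "\n");
        }
    }
}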

How we use graphite

700K+ metrics per minute

A Common Graphite Stack

[Diagram of the stack. Components shown: Collectd, poller(s), applications, StatsD, scripts, Nagios, Carbon, Whisper, Graphite-web, dashboards]

Collectd

Agent for system/hardware level metrics

Growing repository of plugins for a wide variety of applications: disk I/O, disk space, CPU, memory, MySQL, JMX, Java, Redis, file sizes, load, etc.

https://collectd.org/wiki/index.php/Table_of_Plugins

Write your custom plugin in python

Nagios integration

You can write Nagios plugins that alert on metric values.

Nagios can also feed graphite performance data, events (e.g., update a counter each time an email is sent), etc.
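
As a rough sketch of the first point, the core of a Nagios check that alerts on a Graphite metric could look like the following. It assumes the series has already been fetched from the render API with `format=json`, where each series is a dict with a `target` name and `datapoints` as `[value, timestamp]` pairs. The function names and thresholds are hypothetical; the return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def latest_value(series):
    """Return the most recent non-null value in a render-API JSON series."""
    for value, _timestamp in reversed(series["datapoints"]):
        if value is not None:
            return value
    return None


def check(series, warn, crit):
    """Map the latest value to a Nagios exit code and a status message."""
    value = latest_value(series)
    name = series["target"]
    if value is None:
        return UNKNOWN, f"UNKNOWN - no data for {name}"
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name} is {value}"
    if value >= warn:
        return WARNING, f"WARNING - {name} is {value}"
    return OK, f"OK - {name} is {value}"
```

A real plugin would print the message and call `sys.exit` with the code. Skipping trailing nulls matters because the newest datapoint is often still empty when the check runs.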

What to collect?

Hardware/OS metrics

Load

Disk space

Disk I/O

Network data

Application metrics

How often function x is called

Average value of function x

Average running time of function x

Database/Datastore

performance metrics

number of records with value == ?

number of slow queries

Events

Deployments

send a 1, draw as infinite
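
The deployment trick above can be sketched in two small helpers: one builds the data point to send at deploy time, the other builds the graph target that renders each such point as a vertical line. `drawAsInfinite` is a standard Graphite function; the metric name `events.deploy` and the helper names are assumptions for illustration.

```python
import time


def deploy_event_line(metric="events.deploy", timestamp=None):
    """Plaintext-protocol line recording a deployment as the value 1."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{metric} 1 {ts}\n"


def deploy_overlay_target(metric="events.deploy"):
    """Graph target that draws each non-zero point as a vertical line."""
    return f"drawAsInfinite({metric})"
```

Adding the overlay target to any graph makes it easy to see whether a change in a metric lines up with a release.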

Log files

http access logs (2xx, 3xx, 4xx, 5xx)

Application logs

Exception counts, results, important events, hits
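
As an illustration of turning access logs into metrics, the sketch below counts responses per status class and formats the counts as plaintext-protocol lines. It assumes common/combined log format, where the status code follows the quoted request; the `http.status` prefix and the function names are hypothetical. Tools such as Logster do this kind of parsing for you.

```python
import re
from collections import Counter

# Status code that follows the quoted request in common/combined log format.
STATUS_RE = re.compile(r'" (\d{3}) ')


def count_status_classes(lines):
    """Count log lines per status class (2xx, 3xx, 4xx, 5xx)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)[0] + "xx"] += 1
    return counts


def to_metric_lines(counts, timestamp, prefix="http.status"):
    """Format the counts as plaintext-protocol lines, sorted by class."""
    return [f"{prefix}.{cls} {n} {timestamp}\n" for cls, n in sorted(counts.items())]
```

Running this once per minute over the lines written since the previous run gives a per-minute rate for each class.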

Final Musings

Treat graphite like ‘Big Data’

You don’t know which metrics you need until you need them

Get RAID 10 SSDs once you decide to scale

More devopsy

You can start graphing today!
