Measure All The Things! - Austin Data Day 2014

Posted on 28-Nov-2014

Category:

Technology

DESCRIPTION

Slides used during presentation that covered metrics gathering and analysis

TRANSCRIPT

Measure All The Things!

Gary Dusbabek, Rackspace

@gdusbabek

Motivation
What You Really Want
Kinds of Metrics
How To Do It
Prognostication

Motivation

It’s all about the data

We are generating data at an insane rate.

2006: IDC estimates 161 exabytes of data on the Internet

That is 161 million 1 TB drives

2009

988 Exabytes of data

6x growth in 4 years

Almost 1 billion 1 TB drives

A zettabyte: 21 zeroes

Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf

2012: the Internet was estimated to be moving roughly 2.5 exabytes of data daily.

Daily

Not counting the NSA

Transferring Data

Generates Data

Metadata!

Secondary Information

A by-product

Example 1

Cloud Monitoring

Is the website up?

GET HTTP/1.1

Status=200 Bytes=432

Time to connect=15ms Time to first byte=21ms

Duration=28ms

Example 2

Netflix

You want to watch an episode of Buffy

Observations
What titles you click on
What time of day you started watching
When you paused
Parts you re-watched
When you finished (if you finished)

Useless to people consuming the primary data.

Priceless when you’re trying to understand behavior.

Understanding = Knowledge

In these cases all the data generated is time-series

Time Series Data

Related events sorted by time of occurrence

Example
0600 – Wake up
0601 – Checked Hacker News
0605 – Shower
0630 – Breakfast
0630 – Checked Hacker News
0700 – Left for work
0730 – Arrived at work
Etc…

Think about how you’d store something like this if you were building a backend system

Relational Database Much?

Who     When  What
You     0600  Wake up
You     0601  Checked Hacker News
You     0605  Shower
You     0630  Breakfast
You     0630  Checked Hacker News
You     0700  Left for work
You     0730  Arrive at work
You     0731  Checked Hacker News
Friend  0603  Wake up
Friend  0604  Checked Hacker News
Friend  0715  Left for work

Other Ways?

Less Appealing

You: 0600 Wake up | 0601 Checked Hacker News | 0605 Shower | 0630 Breakfast | 0630 Checked Hacker News | 0700 Left for work | 0730 Arrive at work | 0731 Checked Hacker News

Friend: 0603 Wake up | 0604 Checked Hacker News | 0715 Left for work

Column Oriented
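
A minimal sketch of the two layouts, assuming plain Python structures stand in for the database. The row-oriented form keeps one (who, when, what) record per event; the column-oriented form keeps one wide row per person with the events ordered by time. The function name `to_wide_rows` is made up for illustration.

```python
from collections import defaultdict

# Row-oriented: one (who, when, what) record per event.
events = [
    ("You", "0600", "Wake up"),
    ("You", "0601", "Checked Hacker News"),
    ("Friend", "0603", "Wake up"),
    ("Friend", "0604", "Checked Hacker News"),
    ("You", "0605", "Shower"),
    ("You", "0630", "Breakfast"),
    ("You", "0630", "Checked Hacker News"),
]


def to_wide_rows(records):
    """Group events into one wide row per person, ordered by time."""
    rows = defaultdict(list)
    for who, when, what in records:
        rows[who].append((when, what))
    # sorted() is stable, so events sharing a timestamp keep arrival order.
    return {who: sorted(cols, key=lambda c: c[0]) for who, cols in rows.items()}


wide = to_wide_rows(events)
for who, cols in wide.items():
    print(who, cols)
```

Reading one person's day is a single row lookup in the wide layout, which is why it suits time-series access patterns.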

What You Really Want

You run a business

You want to make money

Show me the money!

You need to make decisions

You need to make the right decisions

How do you do that?

With your gut

With data

Example

API responses are taking a long time.

It’s probably the database.

You add a few indexes.

You allocate more memory.

You get faster disks.

You get bigger processors.

Maybe it’s the network…

You replace ethernet adapters.

You get faster switches.

You replace the cabling.

Crap!

Trace it!

500 ms for entire request
15 ms on the wire getting there
200 ms to auth
50 ms looking up account
50 ms looking up other stuff
15 ms on the wire getting back
170 ms rendering in the browser

Make the right decisions with data.
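
A small sketch of what the trace buys you, using the timings from the trace above. The span names are shortened labels chosen for this example; the point is that ranking the spans shows where the time goes before any hardware is bought.

```python
# Span timings in milliseconds, taken from the trace above.
spans = {
    "wire (request)": 15,
    "auth": 200,
    "account lookup": 50,
    "other lookups": 50,
    "wire (response)": 15,
    "browser rendering": 170,
}

total_ms = sum(spans.values())
slowest = max(spans, key=spans.get)

# Print the spans from slowest to fastest with their share of the request.
for name, ms in sorted(spans.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {ms:>4} ms  {ms / total_ms:6.1%}")
```

The database lookups and the network together account for 130 ms of the 500 ms; auth alone takes 200 ms.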

You need a metrics system

Take these things into account:

Availability
Redundancy
Accuracy

And your budget

Example: Pretty Graphs

If graphs go away, do you lose money?

The CEO likes them.

Do graphs help you make decisions?

Example: Usage Billing

Will losing data cost you money?

Data Lifecycle

When can I throw it away?

How much work is throwing it away?

More work means it probably won’t happen.

Kinds of Metrics

{Volume, Frequency} ⨯ {Low, High}

Low Volume, High Frequency
5,6,5,6
Things observed infrequently
Almost always changes

Low storage overhead

Bulk operations are easy

Usually uninteresting

Low Volume, Low Frequency

5,5,5,6

Roughly the same as LVHF

High Volume, Low Frequency
5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,7,7
Constantly observed
But doesn’t change much
Optimizations!

Detect and record only level changes
Requires caching
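
One way to sketch the "record only level changes" optimization, assuming an in-memory dict serves as the cache of last-stored values. The class name `ChangeRecorder` and the metric name `queue.depth` are hypothetical.

```python
class ChangeRecorder:
    """Store a sample only when its value differs from the last stored one."""

    def __init__(self):
        self._last = {}   # cache: metric name -> last stored value
        self.stored = []  # (timestamp, name, value) tuples actually kept

    def observe(self, timestamp, name, value):
        if name in self._last and self._last[name] == value:
            return False  # level unchanged: skip the write
        self._last[name] = value
        self.stored.append((timestamp, name, value))
        return True


recorder = ChangeRecorder()
observed = [5, 5, 5, 5, 6, 6, 6, 7, 7]
for second, value in enumerate(observed):
    recorder.observe(second, "queue.depth", value)

print(recorder.stored)
```

Nine observations collapse into three stored points. The cost is the cache: every writer has to remember the last value it stored for each metric.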

High Volume, High Frequency
34,4,7,345,6,4,2,54,67,5,6,55,74,5,3,2,5,6745…

Numeric vs String

Most will be numeric
Some are strings

Usually low frequency
Special handling

High frequency strings are a sign you’re doing something wrong or need a different system.

Gauges

Current value of something

Operation: snapshot

Speedometer

Thermometer

CPU utilization

Counter
Exists as a set of operations
– Operation: increment
– Operation: decrement

Read by selecting over time and summing

Example: hits on a website
Different than unique hits
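
A rough sketch of a counter stored as timestamped operations and read by summing over a time range. The class name `EventCounter` and the half-open `[start, end)` range are choices made for this example.

```python
class EventCounter:
    """A counter kept as a log of timestamped increments and decrements."""

    def __init__(self):
        self._ops = []  # (timestamp, delta)

    def increment(self, timestamp, amount=1):
        self._ops.append((timestamp, amount))

    def decrement(self, timestamp, amount=1):
        self._ops.append((timestamp, -amount))

    def read(self, start, end):
        """Select operations with start <= timestamp < end and sum them."""
        return sum(delta for ts, delta in self._ops if start <= ts < end)


hits = EventCounter()
for ts in (1, 2, 2, 15, 61):
    hits.increment(ts)
hits.decrement(20)

first_minute = hits.read(0, 60)
second_minute = hits.read(60, 120)
print(first_minute, second_minute)
```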

Set (StatsD)
Number of uniquely seen items
Think: conditional counter
Example: number of unique visitors
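
To make the difference between hits and unique visitors concrete, here is a minimal sketch with made-up page views. A Python `set` plays the role of the conditional counter: it only grows when an item has not been seen before.

```python
# Hypothetical (visitor, path) page views.
page_views = [
    ("alice", "/"),
    ("bob", "/"),
    ("alice", "/pricing"),
    ("alice", "/"),
    ("carol", "/docs"),
]

hits = 0                 # plain counter: every view counts
unique_visitors = set()  # set: a visitor counts only the first time

for visitor, _path in page_views:
    hits += 1
    unique_visitors.add(visitor)

print(hits, len(unique_visitors))
```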

Timer

How long something takes
Statistics (mean, median, min, max, percentiles)

How many times it has happened

Rate at which it is happening

Uses a sliding window
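
A simplified sketch of a timer over a sliding window, assuming timestamps are passed in explicitly so the behavior is easy to follow. The class name `Timer`, the nearest-rank percentile, and the fixed 95th percentile are choices for this example, not how any particular metrics library does it. Counts and rates here cover only the samples still inside the window.

```python
import math
import statistics
from collections import deque


class Timer:
    """Durations observed within the last `window_seconds`."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self._samples = deque()  # (timestamp, duration), oldest first

    def record(self, timestamp, duration):
        self._samples.append((timestamp, duration))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop samples that have slid out of the window.
        while self._samples and self._samples[0][0] <= now - self.window:
            self._samples.popleft()

    def snapshot(self, now):
        self._evict(now)
        durations = sorted(d for _, d in self._samples)
        if not durations:
            return {"count": 0, "rate_per_second": 0.0}
        rank = math.ceil(0.95 * len(durations))  # nearest-rank percentile
        return {
            "count": len(durations),
            "rate_per_second": len(durations) / self.window,
            "mean": statistics.mean(durations),
            "median": statistics.median(durations),
            "min": durations[0],
            "max": durations[-1],
            "p95": durations[rank - 1],
        }


timer = Timer(window_seconds=10)
for ts, ms in [(0, 100), (5, 20), (8, 30), (12, 40)]:
    timer.record(ts, ms)

stats = timer.snapshot(now=12)
print(stats)
```

By second 12 the 100 ms sample from second 0 has left the 10-second window, so it no longer affects the statistics.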

Histograms

Distribution of data

Example: when people visit your site
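
A minimal sketch of the site-visit example, assuming visit times arrive as HHMM strings and one bucket per hour is fine-grained enough. The sample data and the function name `hourly_histogram` are invented for illustration.

```python
from collections import Counter

# Hypothetical visit times in HHMM form.
visit_times = ["0601", "0604", "0630", "0630", "0731", "1215", "1230"]


def hourly_histogram(times):
    """Count visits per hour of day; returns a list of 24 bucket counts."""
    counts = Counter(int(t[:2]) for t in times)
    return [counts.get(hour, 0) for hour in range(24)]


histogram = hourly_histogram(visit_times)
for hour, count in enumerate(histogram):
    if count:
        print(f"{hour:02d}00 {'#' * count}")
```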

How Do You Do It?

If you make software

Instrument it!

Java? https://github.com/codahale/metrics

Node.js? https://github.com/mikejihbe/metrics

Others? Of course

If you run systems

Instrument them!

Get data via agent

Get data via pollers
Considerations: inside or outside of your network

StatsD

https://github.com/etsy/statsd
Ingests, aggregates, flushes
Use a client to send your data
Pushes aggregations to:
Graphite
Databases
Flat files of JSON
Wherever
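
A bare-bones sketch of what a client does, assuming a StatsD daemon listening on its default UDP port 8125 on localhost. StatsD lines have the form `name:value|type`, where the type is `c` (counter), `g` (gauge), `ms` (timer) or `s` (set). The metric names and the helper functions `format_metric` and `send_metric` are hypothetical; a real project would normally use an existing client library.

```python
import socket

# Metric type suffixes used by the StatsD line format.
TYPES = {"counter": "c", "gauge": "g", "timer": "ms", "set": "s"}


def format_metric(name, value, kind):
    """Build one StatsD line, e.g. 'api.hits:1|c'."""
    return f"{name}:{value}|{TYPES[kind]}"


def send_metric(name, value, kind, host="127.0.0.1", port=8125):
    """Fire-and-forget send over UDP; nothing is read back."""
    payload = format_metric(name, value, kind).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


lines = [
    format_metric("api.hits", 1, "counter"),
    format_metric("api.cpu", 42, "gauge"),
    format_metric("api.auth_time", 200, "timer"),
    format_metric("api.visitors", "alice", "set"),
]
print("\n".join(lines))
```

Calling `send_metric("api.hits", 1, "counter")` would put the first line on the wire. Because the transport is UDP, instrumented code does not block or fail when the daemon is down; the sample is simply lost.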

Graphite

http://graphite.wikidot.com

Makes graphs

Pluggable backends (NEW!!!11)

Scaling problems

Buy Enterprise Software

These exist, but I’m an open source hacker and can’t say much about them.

Roll Your Own

Easier than you think

Harder than you think

Roll Your Own: three components

Ingestion
Aggregation/Rollup
Query/Graphing
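
A toy sketch of the three components, assuming everything fits in memory and samples are (timestamp in seconds, value) pairs. The function names `ingest`, `rollup` and `query` and the 5-minute bucket size are picked for this example; a real system would put durable storage behind each step.

```python
from collections import defaultdict

raw = []  # ingestion target: (timestamp_seconds, value)


def ingest(timestamp, value):
    """Ingestion: accept a raw sample."""
    raw.append((timestamp, value))


def rollup(samples, bucket_seconds):
    """Aggregation: summarize samples into fixed-size time buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    }


def query(rolled, start, end):
    """Query: return buckets whose start time falls in [start, end)."""
    return [(ts, agg) for ts, agg in rolled.items() if start <= ts < end]


for ts, value in [(0, 5), (60, 7), (299, 6), (300, 10)]:
    ingest(ts, value)

rolled = rollup(raw, bucket_seconds=300)
first_bucket = query(rolled, 0, 300)
print(first_bucket)
```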

Avoid Pileups
1 sample per second
3,600 samples per hour
86,400 samples per day
31,536,000 samples per year
1k of storage? (roughly) 32 gigabytes
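
The arithmetic behind the pileup, as a quick check. It assumes one metric, a 365-day year, and reads "1k" as 1,024 bytes per sample.

```python
SAMPLES_PER_SECOND = 1
BYTES_PER_SAMPLE = 1024  # the "1k of storage" guess

per_hour = SAMPLES_PER_SECOND * 60 * 60
per_day = per_hour * 24
per_year = per_day * 365
bytes_per_year = per_year * BYTES_PER_SAMPLE

print(f"{per_year:,} samples per year")
print(f"{bytes_per_year / 1e9:.1f} GB per metric per year")
```

That is for a single metric sampled once per second; multiply by the number of metrics you keep at full resolution.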

No!

Measure all the right things!

Does this measurement matter?

You don’t care about it when it changes

You aren’t doing anything with it

You can’t figure out what actions to take from it

(it’s meaningless)

Recent data will almost always be most important.

Monitoring vs Aggregation

Graphite collects data that is already aggregated.

You are observing history

Looking for patterns

No alerting

Where Things Are Going

Complex Event Analysis

ESPER (my favorite). Mostly open source.

Not enough projects though :(

Data Intelligence

You need this if you don’t know what questions you ought to ask

Correlating signals in order to make useful conclusions

Thanks!

@gdusbabek
