Timeseries data in Riak - Riak Meetup Stockholm 1/11/2012
Post on 08-May-2015
Metrics with Riak
A retrospective
Martin Törnwall
Many definitions, but here's ours...
Metrics?
So we can visualize it and search for patterns
Recording things that change over time
CPU, network, memory and disk usage, ...
OS
Number of requests, errors, events, ...
Application
Text messages or emails sent, customer service calls, ...
External events
● A named variable: "sys.mem.free"
● With tags: "host=sl075", "code=403", ...
avg("sys.mem.free") from 1 hour ago where host="sl075"
What is a Metric?
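To make the definition concrete, here is a minimal Python sketch of a sample as a named variable with tags, plus an in-memory version of the example query. The `Sample` class and the `avg` helper are hypothetical names chosen for illustration, not part of Metyr, and the integer division is an assumption that follows the "integer only" design decision.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    name: str        # e.g. "sys.mem.free"
    timestamp: int   # seconds since the Epoch
    value: int       # integer only
    tags: dict = field(default_factory=dict)  # e.g. {"host": "sl075"}


def avg(samples, name, since, **tags):
    """Integer average of the samples that match name, time and tags.

    Returns None when nothing matches. Hypothetical helper, for illustration.
    """
    matching = [
        s.value
        for s in samples
        if s.name == name
        and s.timestamp >= since
        and all(s.tags.get(k) == v for k, v in tags.items())
    ]
    return sum(matching) // len(matching) if matching else None


now = 10_000
samples = [
    Sample("sys.mem.free", now - 10, 100, {"host": "sl075"}),
    Sample("sys.mem.free", now - 20, 300, {"host": "sl075"}),
    Sample("sys.mem.free", now - 30, 999, {"host": "other"}),
    Sample("sys.mem.free", now - 7200, 5, {"host": "sl075"}),
]
# avg("sys.mem.free") from 1 hour ago where host="sl075"
result = avg(samples, "sys.mem.free", now - 3600, host="sl075")
```

Only the two recent samples from host sl075 contribute to the average; the other host and the two-hour-old sample are filtered out.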
Going Technical
Why not have distributed metrics?
We have distributed services
Solutions exist, but rely on technology stacks we had no experience with (e.g., HBase)
Reinventing the wheel?
Just how hard can it be?
I mean, really...
Our "weekend hack": glorious metrics storage and processing software
Introducing Metyr
Design Decisions
● Use familiar tools: Erlang, Riak, HTTP
● Not a critical service but ...
● ... Avoid SPOF
● Write performance >> read performance
● Centralized reference clock
● Integer only
● Avoid 2i if possible
● When in doubt, leave it to Riak
In Theory...
[Diagram: three Metyr nodes, a Riak cluster, three clients]
No SQL, no schemas, no indices (?), no aggregate operations
Storing metrics in Riak
The naïve way just never works...
Attempt 1
A bucket per metric; index by Epoch time
Make each sample an object
Atomicity, write-once, fast range queries
The Good™
Slow, large overhead, requires 2i
The Bad
Combine samples into chunks by time
Attempt 2
Key Points
● One bucket per metric as before
● Split into hour-sized chunks (configurable)
● Chunk key: Epoch time
● Chunk value: List of samples
● To read: Fetch chunks within interval
● To write: Fetch chunk, add sample, write back
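The key points above can be sketched in Python with a plain dictionary standing in for Riak. The `ChunkStore` class and `chunk_key` function are hypothetical names, and aligning the chunk key to the start of the hour is an assumption; the slides only say the key is an Epoch time.

```python
CHUNK_SECONDS = 3600  # hour-sized chunks (configurable)


def chunk_key(timestamp, chunk_seconds=CHUNK_SECONDS):
    """Epoch time of the start of the chunk that contains `timestamp`."""
    return timestamp - timestamp % chunk_seconds


class ChunkStore:
    """In-memory stand-in for Riak: one bucket per metric, one object per chunk."""

    def __init__(self, chunk_seconds=CHUNK_SECONDS):
        self.chunk_seconds = chunk_seconds
        self.buckets = {}  # metric name -> {chunk key -> list of samples}

    def write(self, metric, timestamp, value, tags=None):
        bucket = self.buckets.setdefault(metric, {})
        key = chunk_key(timestamp, self.chunk_seconds)
        chunk = list(bucket.get(key, []))              # fetch chunk
        chunk.append((timestamp, value, tags or {}))   # add sample
        bucket[key] = chunk                            # write back

    def read(self, metric, start, end):
        """Fetch every chunk overlapping [start, end] and filter the samples."""
        bucket = self.buckets.get(metric, {})
        found = []
        key = chunk_key(start, self.chunk_seconds)
        while key <= end:
            found.extend(s for s in bucket.get(key, []) if start <= s[0] <= end)
            key += self.chunk_seconds
        return sorted(found, key=lambda s: s[0])


store = ChunkStore()
store.write("sys.mem.free", 3599, 1)
store.write("sys.mem.free", 3600, 2)
store.write("sys.mem.free", 7300, 3)
recent = store.read("sys.mem.free", 3000, 7200)
```

Because chunk keys can be computed from the interval alone, a read needs no index: it simply asks for each key between the start and end of the range.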
Chunk Anatomy
One sample: Time (64 bits), Value (64 bits), Tags ...
Chunk layout: Time0 Value0 Tags0 ... TimeN ValueN TagsN
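One way such a layout could be serialized is sketched below in Python. The 64-bit time and value fields come from the slide; the tag encoding (a 16-bit length followed by comma-separated UTF-8 "key=value" pairs), the big-endian byte order and the function names are all assumptions made for illustration.

```python
import struct

# 64-bit time, 64-bit value, then an assumed 16-bit length for the tag blob.
HEADER = struct.Struct(">qqH")


def encode_sample(timestamp, value, tags):
    """Pack one sample; tags become "k=v" pairs joined by commas (assumed format)."""
    blob = ",".join(f"{k}={v}" for k, v in sorted(tags.items())).encode("utf-8")
    return HEADER.pack(timestamp, value, len(blob)) + blob


def decode_chunk(data):
    """Walk a chunk of back-to-back samples and return (time, value, tags) tuples."""
    samples, offset = [], 0
    while offset < len(data):
        timestamp, value, tag_len = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        blob = data[offset:offset + tag_len].decode("utf-8")
        offset += tag_len
        tags = dict(pair.split("=", 1) for pair in blob.split(",")) if blob else {}
        samples.append((timestamp, value, tags))
    return samples


chunk = encode_sample(1, 42, {"host": "sl075"}) + encode_sample(2, -7, {})
decoded = decode_chunk(chunk)
```

This simple tag format breaks if a tag key or value contains a comma, or a key contains "="; a real format would need escaping or per-tag length prefixes.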
Writing just got harder
Slower since we must fetch a chunk first; potential race conditions, ...
Tests showed that the solution described so far was inadequate
(Arbitrary) Goal: Write 1K samples/sec
Keep per-metric write buffers, flushed every 10 seconds or so
Buffer them writes
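A minimal Python sketch of per-metric write buffering follows. The `BufferedWriter` class, its `flush_fn` callback and the injectable `clock` are hypothetical; the slides give only the idea and the roughly 10-second interval. This version checks the clock on each write rather than running a timer, which is a simplification.

```python
import time
from collections import defaultdict


class BufferedWriter:
    """Collects samples per metric and hands them to flush_fn in batches."""

    def __init__(self, flush_fn, interval=10.0, clock=time.monotonic):
        self.flush_fn = flush_fn      # called as flush_fn(metric, samples)
        self.interval = interval      # seconds between flushes
        self.clock = clock
        self.buffers = defaultdict(list)
        self.last_flush = clock()

    def add(self, metric, timestamp, value):
        self.buffers[metric].append((timestamp, value))
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        # Swap in a fresh buffer first so each sample is flushed exactly once.
        pending, self.buffers = self.buffers, defaultdict(list)
        for metric, samples in pending.items():
            self.flush_fn(metric, samples)
        self.last_flush = self.clock()


# Usage with a fake clock so the example does not have to sleep.
now = [0.0]
flushed = []
writer = BufferedWriter(
    lambda metric, samples: flushed.append((metric, list(samples))),
    interval=10.0,
    clock=lambda: now[0],
)
writer.add("a", 1, 10)
writer.add("a", 2, 20)
flushed_before = list(flushed)
now[0] = 11.0
writer.add("b", 3, 30)
```

With buffering, each flush costs one fetch and one write per touched chunk instead of one per sample, at the price of losing up to one interval of samples if the process dies.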
● Race condition on write
● Storage requirements
● Downsampling of old data
Some Remaining Issues
Thank you!