metrics simplified

Post on 04-Jul-2015

Metrics Simplified

Mark Lin

mlin@admob.com

why?

"If you can not measure it, you can not improve it" -Lord Kelvin

99.999% ("five nines") = 5.26 minutes of downtime per year
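The five-nines figure is plain arithmetic: the unavailable fraction multiplied by the number of minutes in a year. A minimal sketch, assuming a 365-day year; the function name is illustrative and not from the talk:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Print the budget for three common availability targets.
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes per year")
```

Each extra nine cuts the budget by a factor of ten, which is why precise measurement matters more as the target rises.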

previously ...

Sending/collecting is complicated.
Single collection server.
Tedious to configure new metric collection or creation.
Calculating metrics from files is expensive.

bottlenecks ...

Poll-based collection server

Not easy (!fun) to configure new metric collection or creation.

= grunt work for the ops engineer

uhhhh....

enabling technology

Graphite

RabbitMQ

Graphite Local Proxy

RockSteady ( w/ Esper )

path to graph

1min.juicer.output.apple.sc1.jcr1 20 1276822626

echo "1min.juicer.output.apple.sc1.jcr1 20 1276822626" | nc localhost 3400
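The same one-line send can be done from application code. The sketch below builds a line in the "path value timestamp" form shown above and writes it to a TCP socket. The host and port defaults mirror the `nc localhost 3400` example; the function names are illustrative, not part of Graphite or of the local proxy.

```python
import socket
import time
from typing import Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one 'path value timestamp' line, terminated by a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # default to the current Unix time
    return f"{path} {value} {timestamp}\n"


def send_metric(line: str, host: str = "localhost", port: int = 3400) -> None:
    """Send one metric line over TCP (assumes a listener such as the local proxy)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling `send_metric(format_metric("1min.juicer.output.apple.sc1.jcr1", 20, 1276822626))` sends the same bytes as the `echo ... | nc` command, provided something is listening on that port.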

graph

graph = post-event forensics

Rocksteady, metric as event

1min.juicer.common.version.sc1.jcr1 100 1276822626

INSERT INTO Deploy
SELECT * FROM Metric(name='common.revision')
MATCH_RECOGNIZE (
  partition by colo, hostname
  measures A.value as revision, A.colo as colo, A.hostname as hostname,
           A.app as app, A.timestamp as timestamp
  pattern (A)
  define A as A.value > prev(A.value)
)
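To make the pattern concrete, here is a rough Python equivalent of what the statement does: for each (colo, hostname) pair, emit a deploy event whenever the version value is higher than the previous one seen. This is a sketch, not RockSteady code. The path layout `<interval>.<app>.<name>.<colo>.<hostname>` is an assumption inferred from the sample line, and the metric name to watch is a parameter because the sample line and the statement's filter spell it differently.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Metric(NamedTuple):
    app: str
    name: str
    colo: str
    hostname: str
    value: float
    timestamp: int


def parse_line(line: str) -> Metric:
    """Parse 'path value timestamp' into a Metric."""
    path, value, timestamp = line.split()
    parts = path.split(".")
    # Assumed layout: <interval>.<app>.<name...>.<colo>.<hostname>
    app, colo, hostname = parts[1], parts[-2], parts[-1]
    name = ".".join(parts[2:-2])
    return Metric(app, name, colo, hostname, float(value), int(timestamp))


def detect_deploys(metrics: Iterable[Metric], name: str = "common.version") -> List[Metric]:
    """Return metrics whose value rose above the previous value on the same host."""
    last_seen: Dict[Tuple[str, str], float] = {}
    deploys: List[Metric] = []
    for metric in metrics:
        if metric.name != name:
            continue
        key = (metric.colo, metric.hostname)  # mirrors 'partition by colo, hostname'
        previous = last_seen.get(key)
        # The first event per host has no predecessor, so it never matches.
        if previous is not None and metric.value > previous:
            deploys.append(metric)
        last_seen[key] = metric.value
    return deploys
```

Treating the version number as just another metric means deploy detection needs no separate reporting channel: the same line format feeds both the graphs and the event stream.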

auto threshold, prediction

correlation

Deployment-related problems.

Capture sets of metrics when important ones cross a threshold.

Determine dependencies, such as CPU usage to requests per second or response time.
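One simple way to quantify such a dependency is the Pearson correlation between two metric series sampled at the same timestamps. The talk does not say which method RockSteady uses, so the sketch below is only an illustration, and the sample numbers in the usage note are made up.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)
```

For example, `pearson(cpu_percent, requests_per_second)` close to 1 over a window suggests CPU load tracks request rate, while a value near 0 suggests the CPU spike has another cause.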

revelation

beyond simple metric

Timing info per request.

Actual time spent in each component of an application.
Map out dependencies, find the exact area of the problem.
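Per-request timing can be collected by wrapping each component call in a timer. The sketch below is one possible shape for this, not the implementation described in the talk. The class name, component names, and injectable clock are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class RequestTimer:
    """Accumulate the time spent in each named component of one request."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock  # injectable so tests can supply a fake clock
        self.elapsed: Dict[str, float] = defaultdict(float)

    @contextmanager
    def component(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add the duration to the component's total."""
        start = self._clock()
        try:
            yield
        finally:
            self.elapsed[name] += self._clock() - start

    def slowest(self) -> str:
        """Return the component with the largest total time (needs one entry)."""
        return max(self.elapsed, key=lambda name: self.elapsed[name])
```

A request handler would wrap each stage, for example `with timer.component("db"): ...`, and then report `timer.elapsed` as per-component metrics so the slow stage is visible directly.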

what we learned?

1. Make metric sending simple.
2. Nice UI to make sense of data.
3. Real time processing of metric rocks.
