Monitoring Cassandra with Graphite Using the Yammer Coda-Hale Library

Post on 02-Jul-2015


DESCRIPTION

Monitoring Cassandra using Graphite leveraging Yammer Coda-Hale Library

TRANSCRIPT

JMXExpress

Transporting Cassandra Metrics to Graphite

Cassandra Is Awesome

● No Single Point of Failure

● Fault Tolerant

● Multi-DC Is A Picnic

● Great Properties That Let Ops Teams Sleep at 2 AM

Robustness Has a Price

● C* Isn’t A Fire and Forget System :(

● Most Times You Don’t Notice Problems

o Things Can Go Up/Down for a Few Minutes

o C* Simply Queues Requests and Services Keep Running, So Nobody Notices

Be Proactive

Do Daily/Weekly Checkups to Detect and Prevent Problems:

● Capacity

● Exceptions

● Performance Bottlenecks

● Data Modeling Issues

Reactive

● Something Will Go Wrong:

o Hardware Failures

o Bugs

o Malicious or Non-Malicious Users

● Alarms: NOC, PagerDuty

Proactive or Reactive?

● You Need Data

o Form Alerts

o Find Anomalies

o Trends

o Debugging

● You Should Monitor Everything

Gathering Metrics

● Cassandra

o OpsCenter

o JMX

o Nodetool

o Logs

● Environment

o CPU, Memory, Disks, Network, …

o Logs

o JVM

Give Data Context

You Should Give the Data Context …

Otherwise It’s Just Pretty Graphs...

JMX

● Java Management Extensions

● Complex…

● Resources Are Presented as Objects with Attributes

● Used for Both Monitoring and For Actions

Native JMX

● Unfriendly Way to Get Metrics

o Requires Java

o Slow and Has Memory Leaks

o Nightmare for Ops (Network/Security)

[Diagram: Client connecting to Cassandra over JMX]

Client → Cassandra: Init on Port 7199

Cassandra → Client: Reply with Hostname:Port

1- Get new host/port

2- Drop old connection

3- Connect with new host/port (any port in 1024-65535)
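The first step of the handshake above, addressing the init port, can be sketched with the JDK's own JMX classes. This is only an illustration: the host name cassandra-node and the class and method names are made up for the example, and the code only builds the service URL without opening a connection.

```java
import java.net.MalformedURLException;
import javax.management.remote.JMXServiceURL;

class JmxUrlSketch {
    // Builds the conventional JMX-over-RMI service URL for a node.
    // 7199 is the init port from the slide; the host is a placeholder.
    static JMXServiceURL cassandraJmxUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad JMX URL for " + host, e);
        }
    }

    public static void main(String[] args) {
        // "cassandra-node" is a hypothetical host name.
        JMXServiceURL url = cassandraJmxUrl("cassandra-node", 7199);
        System.out.println(url);
    }
}
```

A real client would pass this URL to JMXConnectorFactory.connect, at which point the server may hand back a different host/port, which is what makes firewalling JMX painful.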

JMX Tools

● Visual

o JConsole

o VisualVM

o Commercial

● Command Line

o jmxterm

o jmxsh

● Jolokia

● MX4J

JMX Syntax

[domain]:[key1]=[value1],[key2]=[value2] …

org.apache.cassandra.metrics:type=ColumnFamily,keyspace=outbrain,scope=user_events,name=TotalDiskSpaceUsed
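To make the syntax concrete, here is a minimal sketch that parses the MBean name above with the JDK's javax.management.ObjectName. The toGraphitePath helper and its key order (type, keyspace, scope, name) are assumptions made for illustration, not a naming scheme taken from the slides.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class ObjectNameSketch {
    // Hypothetical helper: flattens an MBean name into a dotted path.
    // The key order below is an assumption chosen for readability.
    static String toGraphitePath(String mbean) {
        ObjectName name;
        try {
            name = new ObjectName(mbean);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Not a valid MBean name: " + mbean, e);
        }
        StringBuilder path = new StringBuilder(name.getDomain());
        for (String key : new String[] {"type", "keyspace", "scope", "name"}) {
            String value = name.getKeyProperty(key); // null when the key is absent
            if (value != null) {
                path.append('.').append(value);
            }
        }
        return path.toString();
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        String mbean = "org.apache.cassandra.metrics:type=ColumnFamily,"
                + "keyspace=outbrain,scope=user_events,name=TotalDiskSpaceUsed";
        ObjectName name = new ObjectName(mbean);
        System.out.println(name.getDomain());            // the [domain] part
        System.out.println(name.getKeyProperty("scope")); // one [key]=[value] pair
        System.out.println(toGraphitePath(mbean));
    }
}
```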

JMX Domains

org.apache.cassandra

● db

● internal

● net

● request

org.apache.cassandra.metrics

JMX Types

org.apache.cassandra.metrics: type=

● Cache

● Client

● ClientRequest

● ClientRequestMetrics

● ColumnFamily

● CommitLog

● Compaction

● DroppedMessages

● FileCache

● Storage

● ThreadPools

Coda-Hale Metrics

● Toolkit Called Metrics

o By Coda Hale at Yammer (the Yammer Coda-Hale Library)

● Easy to Use

● Easy to Read (If you speak Java)

● Popular

Types of Metrics

● Gauge: Instantaneous value

● Counter: Number That Can Be Incremented/Decremented

● Meter: Rate of Events Over Time (events per second: mean and 1/5/15-minute rates)

● Histogram: Statistical Distribution

o 50,75,95,98,99,99.9 percentile

o average/median/min/max/stddev

● Timer: Rate of Events Plus a Histogram of Their Duration

Example: the 75th percentile is 650.75 us (75% of requests took 650.75 us or less)

Example: the one-minute write rate is 13,915 per second
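The two readings above can be reproduced with a small stand-alone sketch. It is deliberately simplified and is not how the Metrics library computes them: the library samples values and uses exponentially weighted moving averages, while this code uses a nearest-rank percentile and a plain mean rate. The sample latencies and all names are invented for the example.

```java
import java.util.Arrays;

class MetricTypesSketch {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Plain mean rate: events divided by the length of the window.
    static double meanRatePerSecond(long events, double windowSeconds) {
        return events / windowSeconds;
    }

    public static void main(String[] args) {
        // Made-up write latencies in microseconds.
        double[] latenciesMicros = {120.0, 310.5, 650.75, 980.0};
        System.out.println(percentile(latenciesMicros, 75));   // prints 650.75
        // 834,900 writes in a 60-second window.
        System.out.println(meanRatePerSecond(834_900, 60));    // prints 13915.0
    }
}
```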

Native JMX

● It’s Overwhelming at First

● Hard to Tell What the Metrics Mean Without the Source

● Metrics Move Around a Lot Between Versions

● Fortunately there is nodetool

Coda-Hale Reporting Interface

Coda-Hale Metrics Library:

● Default

o JMX

o Console

o CSV

o SLF4J

● Addons

o Ganglia / Graphite

● Community

o Cassandra / StatsD / NewRelic / Splunk / Cloudwatch

o Kafka / Riemann / TempDB / Munin / Riak / InfluxDB / Sematext

o MongoDB / OpenTSDB / Librato

o … More

Reporting Interface Activation

● Metrics Library:

o Included in Cassandra since 1.1

o Before 2.0, it required writing your own Java agent reporter

Pluggable Metrics in Cassandra 2.0.2

● Starting from Cassandra 2.0.2, you only need to configure a special YAML file: /etc/cassandra/metrics-reporter-config-graphite.yaml

● Load the Coda-Hale metrics by enabling the built-in agent in the cassandra-env.sh file:

-Dcassandra.metricsReporterConfigFile=yourCoolFile.yaml

● Save the file in the /etc/cassandra/ directory and don’t specify the full path, otherwise it will not work

Pluggable Metrics in Cassandra 2.0.2

YAML Example:

graphite:
  -
    period: 60
    timeunit: 'SECONDS'
    hosts:
      - host: 'graphite'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.Cache.+"
        - "^org.apache.cassandra.metrics.ClientRequest.+"
        - "^org.apache.cassandra.metrics.Storage.+"
        - "^org.apache.cassandra.metrics.ThreadPools.+"

Caveats of Pluggable Metrics

- Works only in 2.0.2 or higher

- Has bad metric names: some begin with ‘.’ and are not suitable for the Graphite tree

- Limited ability to manipulate metrics

Our Approach

- Use an older version (2.0.3) of the Metrics library that works with all C* versions (down to 1.1)

- Write our own Java agent for backward compatibility

- Run the metrics through a Manipulator daemon so we can reformat them to fit our dashboards

The Java Agent

From the Documentation

The Java Agent

● Compiling it:

$ javac -cp $CASSANDRA_HOME/lib/metrics-core-2.0.3.jar:$CASSANDRA_HOME/lib/metrics-graphite-2.0.3.jar com/datastax/example/ReportAgent.java

$ jar -cfM reporter.jar .

● Loading the agent with Cassandra (edit cassandra-env.sh and add the following line to the bottom):

JVM_OPTS="-javaagent:/path/to/your/reporter.jar $JVM_OPTS"

Manipulating the Metrics

● Metrics come in org.apache.cassandra… syntax

● They don’t fit into our Graphite scheme

● Some metrics begin with . (dot)

● We need to be able to filter and manipulate metrics

Manipulating the Metrics

We have built a simple Bash script that poses as a Graphite server and manipulates the metrics as we wish:

● We change the prefix

● We can filter metrics

● Keep unified output

● Solve some syntax issues, like IP addresses read by Graphite as a separate metric tree
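The kind of rewriting listed above can be sketched as a single function over one line of Graphite's plaintext format ("path value timestamp"). This is an illustration in Java, not the Bash script from the talk: the class name, the prefixes, and the sample metric path are all assumptions, and lines that do not match the expected prefix are simply dropped.

```java
import java.util.regex.Pattern;

class MetricRewriter {
    // Four dot-separated numeric groups, treated here as an IPv4 address.
    private static final Pattern IPV4 =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    // Returns the rewritten line, or null when the metric is filtered out.
    static String rewrite(String line, String oldPrefix, String newPrefix) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            return null; // not "path value timestamp"
        }
        String path = parts[0];
        while (path.startsWith(".")) {
            path = path.substring(1); // drop leading dots
        }
        if (!path.startsWith(oldPrefix)) {
            return null; // filter: keep only metrics under the old prefix
        }
        path = newPrefix + path.substring(oldPrefix.length());
        // Keep an IP address as one tree node instead of four.
        path = IPV4.matcher(path).replaceAll("$1_$2_$3_$4");
        return path + " " + parts[1] + " " + parts[2];
    }

    public static void main(String[] args) {
        // Hypothetical incoming line with a leading dot and an embedded IP.
        String in = ".org.apache.cassandra.metrics.Storage.Load.10.0.0.12 42 1435795200";
        System.out.println(rewrite(in, "org.apache.cassandra.metrics", "cassandra.prod"));
    }
}
```

A daemon built around this would read lines from the socket the agent writes to and forward the non-null results to the real Graphite server.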

Metrics in Graphite (Sample: Write Latency Histograms)
