JMXExpress
Transporting Cassandra Metrics
To Graphite
Cassandra Is Awesome
● No Single Point of Failure
● Fault Tolerant
● Multi-DC Is A Picnic
● Great Properties That Let Ops Teams
Sleep at 2 AM
Robustness Has a Price
● C* Isn’t A Fire and Forget System :(
● Most of the Time You Don’t Notice Problems
o Things can go up/down for a few minutes
o C* Simply Queues Requests and Services Keep
Running, so Nobody Notices
Be Proactive
Do Daily/Weekly Checkups to detect and
prevent Problems:
● Capacity
● Exceptions
● Performance Bottlenecks
● Data Modeling Issues
Reactive
● Something Will Go Wrong:
o Hardware Failures
o Bugs
o Malicious or Non-Malicious Users
● Alarms: NOC, Pager-Duty
Proactive or Reactive?
● You Need Data
o Form Alerts
o Find Anomalies
o Trends
o Debugging
● You Should Monitor Everything
Gathering Metrics
● Cassandra
o OpsCenter
o JMX
o Nodetool
o Logs
● Environment
o CPU, Memory, Disks, Network, …
o Logs
o JVM
Give Data Context
You Should Give the
Data Context …
Otherwise it’s just pretty
Graphs...
JMX
● Java Management Extensions
● Complex…
● Resources are presented as Objects with
Attributes
● Used for Both Monitoring and For Actions
Native JMX
● Unfriendly way to get metrics
o Requires Java
o Slow and leaks memory
o Nightmare for Ops (Network/Security)
[Diagram: JMX connection handshake between Client and Cassandra]
Client connects to init port 7199; Cassandra replies with Hostname:Port
1- Get new host/port
2- Drop old conn
3- Connect with new host/port (port in range 1024-65535)
JMX Tools
● Visual
o JConsole
o VisualVM
o Commercial
● Command Line
o jmxterm
o jmxsh
● Jolokia
● MX4J
JMX Syntax
[domain]:[key1]=[value1],[key2]=[value2] …
org.apache.cassandra.metrics:type=ColumnFamily,keyspace=outbrain,scope=user_events,name=TotalDiskSpaceUsed
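To make the syntax concrete, here is a minimal Python sketch that splits an object name into its domain and key properties. The helper name parse_object_name is made up for this illustration, and the sketch assumes simple unquoted values like the ones in the example above; it is not a full JMX object name parser.

```python
def parse_object_name(name):
    """Split a JMX object name into (domain, properties).

    Hypothetical helper for illustration only: it assumes plain
    key=value pairs and does not handle quoted values or wildcards.
    """
    # Everything before the first ':' is the domain.
    domain, _, props = name.partition(":")
    # The rest is a comma-separated list of key=value pairs.
    properties = dict(pair.split("=", 1) for pair in props.split(","))
    return domain, properties


name = ("org.apache.cassandra.metrics:type=ColumnFamily,"
        "keyspace=outbrain,scope=user_events,name=TotalDiskSpaceUsed")
domain, properties = parse_object_name(name)
print(domain)               # the metrics domain
print(properties["scope"])  # the column family the metric belongs to
```

In this name, type selects the metric group, keyspace and scope locate the column family, and name is the metric itself.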
JMX Domains
org.apache.cassandra
● db
● internal
● net
● request
org.apache.cassandra.metrics
JMX Types
org.apache.cassandra.metrics:type=
● Cache
● Client
● ClientRequest
● ClientRequestMetrics
● ColumnFamily
● CommitLog
● Compaction
● DroppedMessages
● FileCache
● Storage
● ThreadPools
Coda-Hale Metrics
● Toolkit called Metrics
o Library by Coda Hale, originally written at Yammer
● Easy to Use
● Easy to Read (If you speak Java)
● Popular
Types of Metrics
● Gauge: Instantaneous value
● Counter: number that can be
incremented/decremented
● Meter: Rate of events over time
(requests per second: mean, 1-min, 5-min and 15-min rates)
● Histogram: Statistical Distribution
o 50,75,95,98,99,99.9 percentile
o average/median/min/max/stddev
● Timer: Rate of events plus histogram of
durations
75th percentile is 650.75 us
(75% took 650.75us or less)
One Minute Write rate is
13,915 per second
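The percentile caption above reads as "p% of requests took this long or less". A minimal Python sketch of that reading, using the nearest-rank method on made-up latency samples. The sample values and the function name are assumptions for illustration; the Metrics library works on sampled data and may compute its percentiles differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small p still returns a sample.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up write latencies in microseconds, for illustration only.
latencies_us = [120, 250, 310, 480, 650, 700, 900, 1500]

p75 = percentile(latencies_us, 75)
at_or_below = sum(1 for v in latencies_us if v <= p75)
print(p75, at_or_below / len(latencies_us))
```

The second printed number is the share of samples at or below the 75th percentile, which is what the "75% took ... or less" reading refers to.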
Native JMX
● It’s overwhelming at first
● Hard to tell what the metrics mean without the source
● Metrics move around a lot between versions
● Fortunately there is nodetool
Coda-Hale Reporting Interface
Coda-Hale Metrics Library:
● Default
o JMX
o Console
o CSV
o Slf4J
● Addons
o Ganglia / Graphite
● Community
o Cassandra / StatsD / NewRelic / Splunk / Cloudwatch
o Kafka / Riemann / TempDB / Munin / Riak / InfluxDB / Sematext
o MongoDB / OpenTSDB / Librato
o … More
Reporting Interface Activation
● Metrics library:
o Included in Cassandra since 1.1
o Before 2.0 it required writing your own Java agent reporter
Pluggable Metrics in Cassandra 2.0.2
● Starting from Cassandra 2.0.2, you only need to configure a special YAML file:
/etc/cassandra/metrics-reporter-config-graphite.yaml
● Load the Coda-Hale metrics reporter by enabling the built-in agent in the cassandra-env.sh file:
-Dcassandra.metricsReporterConfigFile=yourCoolFile.yaml
● Save the file in the /etc/cassandra/ directory and don’t specify the full path,
otherwise it will not work
Pluggable Metrics in Cassandra 2.0.2
YAML Example:
graphite:
  -
    period: 60
    timeunit: 'SECONDS'
    hosts:
      - host: 'graphite'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.Cache.+"
        - "^org.apache.cassandra.metrics.ClientRequest.+"
        - "^org.apache.cassandra.metrics.Storage.+"
        - "^org.apache.cassandra.metrics.ThreadPools.+"
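The whitelist patterns in the example above are regular expressions. A minimal Python sketch of how such a whitelist could select metrics, assuming the reporter tests each pattern against the qualified metric name from its start. The function name is_reported and the sample metric names are hypothetical; the real reporter's matching rules may differ in detail.

```python
import re

# The four whitelist patterns from the YAML example.
PATTERNS = [
    r"^org.apache.cassandra.metrics.Cache.+",
    r"^org.apache.cassandra.metrics.ClientRequest.+",
    r"^org.apache.cassandra.metrics.Storage.+",
    r"^org.apache.cassandra.metrics.ThreadPools.+",
]


def is_reported(metric_name, patterns=PATTERNS):
    """Return True if any whitelist pattern matches the metric name.

    Assumption: patterns are anchored at the start of the name.
    """
    return any(re.match(p, metric_name) for p in patterns)


# Hypothetical qualified names, for illustration only.
for name in [
    "org.apache.cassandra.metrics.ClientRequest.Write.Latency",
    "org.apache.cassandra.metrics.ColumnFamily.outbrain.user_events.TotalDiskSpaceUsed",
]:
    print(name, is_reported(name))
```

Note that the dots in these patterns are unescaped, so each one matches any character; under this reading the ClientRequest pattern would also let names under ClientRequestMetrics through.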
Caveats of Pluggable Metrics
- Works only in 2.0.2 or higher
- Produces bad metric names: some begin
with ‘.’ and do not fit the Graphite tree
- Limited ability to manipulate metrics
Our Approach
- Use an older version (2.0.3) of the Metrics library
that works with all C* versions (down to 1.1)
- Write our own Java agent for backward
compatibility
- Run the metrics through a Manipulator daemon so
we can reformat them and fit them to our
dashboards
The Java Agent
From the Documentation
The Java Agent
● Compiling it:
$ javac -cp $CASSANDRA_HOME/lib/metrics-core-2.0.3.jar:$CASSANDRA_HOME/lib/metrics-graphite-2.0.3.jar \
    com/datastax/example/ReportAgent.java
$ jar -cfM reporter.jar .
● Loading the Agent with Cassandra (edit cassandra-env.sh and add the following line to the bottom):
JVM_OPTS="-javaagent:/path/to/your/reporter.jar $JVM_OPTS"
Manipulating the Metrics
● Metrics come in org.apache.cassandra…
syntax
● They don’t fit into our Graphite scheme
● Some metrics begin with . (dot)
● We need to be able to filter and manipulate
metrics
Manipulating the Metrics
We have built a simple Bash script that poses
as a Graphite server and manipulates the
metrics as we wish:
● We change the prefix
● We can filter metrics
● Keep unified output
● Solve some syntax issues, such as IP addresses
that Graphite reads as separate metric tree levels
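The manipulations listed above can be sketched on a single Graphite plaintext line ("path value timestamp"). This is a Python illustration only, not the Bash script from the talk: the new prefix, the list of kept metric groups, and the sample lines are all assumptions, and the network part (listening as a Graphite server and forwarding the result) is left out.

```python
import re

# Assumptions for illustration: target prefix and which groups to keep.
OLD_PREFIX = "org.apache.cassandra.metrics."
NEW_PREFIX = "cassandra."
KEEP = re.compile(r"cassandra\.(ClientRequest|Storage)\.")
IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def rewrite_line(line):
    """Rewrite one plaintext-protocol line, or return None to drop it."""
    parts = line.split()
    if len(parts) != 3:
        return None  # not a "path value timestamp" line
    path, value, timestamp = parts
    path = path.lstrip(".")                          # fix names starting with '.'
    path = IPV4.sub(r"\1_\2_\3_\4", path)            # keep an IP as one tree node
    path = path.replace(OLD_PREFIX, NEW_PREFIX, 1)   # change the prefix
    if not KEEP.search(path):
        return None                                  # filter unwanted metrics
    return f"{path} {value} {timestamp}"             # unified output format


# Hypothetical input lines.
print(rewrite_line(
    "10.0.0.12.org.apache.cassandra.metrics.ClientRequest.Write.Latency.count 42 1400000000"))
print(rewrite_line(
    "org.apache.cassandra.metrics.Cache.KeyCache.Hits.count 1 1400000000"))
```

Replacing the dots inside the IP address with underscores keeps the host as a single node in the Graphite tree instead of four nested levels.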
Metrics in Graphite (Sample: Write Latency Histograms)