From the front lines: Saving stranded clusters
#CassandraSummit
Who am I and what do I do?
• Ben Bromhead
• Co-founder and CTO of Instaclustr -> www.instaclustr.com
• Instaclustr provides Cassandra-as-a-Service in the cloud.
• Currently on AWS, with Google Cloud in private beta and more to come.
• We currently manage 50+ nodes for a range of customers, who use them for all sorts of things.
In the beginning…
• Well designed schemas
• No data migrations
• Everything was perfect and happy
• Then we got customers
Our first C* patch
• CASSANDRA-6521
• Cassandra wouldn’t check the length of a column name in a range predicate for slice operations.
• So for large column names it would throw an assertion error.
• Which would in turn tie up threads, making the node, and eventually the whole cluster, unresponsive.
Our first C* patch
• What was the size of the column name that would cause this issue?
• Around 130 KB
• wat…
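For context on why 130 KB is absurd: a column name whose length is serialized as a 2-byte unsigned prefix can be at most 65,535 bytes, and 130 KB (133,120 bytes) is roughly twice that. Below is a minimal client-side sketch, assuming that 65,535-byte limit applies to column names in slice predicates. It is not the CASSANDRA-6521 patch, and the helper names are hypothetical. It simply rejects oversized slice bounds before they ever reach the cluster.

```python
# Hypothetical client-side guard, not part of any Cassandra driver API.
# Assumption: column names are length-prefixed with an unsigned short,
# so the longest encodable name is 0xFFFF = 65,535 bytes.
MAX_COLUMN_NAME_BYTES = 0xFFFF


def validate_slice_bound(name: bytes) -> None:
    """Raise ValueError if a slice start/finish column name is too long."""
    if len(name) > MAX_COLUMN_NAME_BYTES:
        raise ValueError(
            f"column name is {len(name)} bytes; "
            f"limit is {MAX_COLUMN_NAME_BYTES}"
        )


def validate_slice_range(start: bytes, finish: bytes) -> None:
    """Check both bounds of a range predicate before issuing the query."""
    for bound in (start, finish):
        validate_slice_bound(bound)


# Usage: a 130 KB bound is rejected up front instead of reaching a node.
try:
    validate_slice_range(b"", b"x" * (130 * 1024))
except ValueError as err:
    print(err)
```

Failing fast on the client costs one length comparison per bound, which is far cheaper than a server thread tripping an assertion mid-request.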
Our first migration
• Receive frantic phone call
• Their self-managed cluster has been down for 48 hours, at a company that gets 25 million unique monthly views.
• They are hurting
Our first migration
• The cluster was running a very early version of C* 2.0
• Update/patch the old cluster, get everything back online
• Start the migration process…
Our first migration
• Bulk load manages to kill their new cluster with us in about 5 minutes.
• Open logs
• Read 1 live and 38456 tombstoned cells (see tombstone_warn_threshold)
• For every column family
• wat…
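That warning means a read scanned 38,456 deleted cells to return a single live one. When it shows up for every column family, a quick first step is to pull the counts out of the logs and see how bad it is. Here is a minimal triage sketch, assuming the log lines contain the text exactly as shown on the slide. The real message may carry more detail, such as which table was read, and this sketch ignores that. The function name is hypothetical, and the default threshold of 1000 is an assumption about the stock `tombstone_warn_threshold` value.

```python
import re

# Matches the part of the warning shown on the slide.
TOMBSTONE_WARN = re.compile(r"Read (\d+) live and (\d+) tombstoned cells")


def tombstone_heavy_reads(lines, warn_threshold=1000):
    """Return (live, tombstoned) pairs for reads at or above warn_threshold.

    warn_threshold defaults to 1000 as an assumed stock setting; raise it
    to narrow the output down to the worst offenders.
    """
    hits = []
    for line in lines:
        m = TOMBSTONE_WARN.search(line)
        if m:
            live, tombstoned = int(m.group(1)), int(m.group(2))
            if tombstoned >= warn_threshold:
                hits.append((live, tombstoned))
    return hits


log = [
    "Read 1 live and 38456 tombstoned cells (see tombstone_warn_threshold)",
    "Compacting sstables...",
]
print(tombstone_heavy_reads(log))  # [(1, 38456)]
```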
Conclusion
• Everything is awesome
• Then reality occurs
• It’s actually way more fun
• Want to make C* even better? We are hiring!