C* Summit 2013: Dude, Where's My Tweet? Taming the Twitter Firehose by Andrew Noonan
DESCRIPTION
Gnip ingests and must serve out hundreds of millions of social activities every day, and social platforms are only growing. This makes the scalability of applications essential for Gnip. Enter Cassandra. Problem solved, right? Not exactly. Gnip's relationship with Cassandra was not all rainbows and unicorns. In this session we will walk you through why we began looking at Cassandra as a data store in the first place, and the valuable lessons we learned with Cassandra that have made it an invaluable part of our infrastructure.
TRANSCRIPT
#Cassandra2013
Dude, Where’s My Tweet? Taming the Twitter Firehose
Andrew Noonan Software Engineer at Gnip
@noonanisms
Gnip
Cassandra
Rainbows
Unicorns
???
Social Data
90% of Fortune 500
120 Billion Activities Delivered Per Month
Lots-O-Data
Redundancy & Reliability
Availability
High Write Throughput ✔
Scalable ✔
Highly Available ✔
Persistent ✔
Right?
Not Exactly…
No Maintenance? Bad Idea
Begin Maintenance -> 2X Data Growth
Scalable, Right?
Bootstrap Failures Due To Cluster Load
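The slide does not say what the maintenance was. If it means routine anti-entropy repair (`nodetool repair`), one constraint worth checking is that a full pass over every node finishes within `gc_grace_seconds` (864000 seconds, or 10 days, by default), or deleted data can reappear. A minimal sketch under that assumption; `repair_cycle_fits` and the per-node repair duration are hypothetical planning inputs, not figures from the talk:

```python
# Cassandra's default gc_grace_seconds: 864000 s = 10 days.
DEFAULT_GC_GRACE_SECONDS = 864_000


def repair_cycle_seconds(num_nodes: int, hours_per_node: float) -> float:
    """Wall-clock time to repair the cluster one node at a time (hypothetical model)."""
    return num_nodes * hours_per_node * 3600


def repair_cycle_fits(num_nodes: int, hours_per_node: float,
                      gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True if a sequential repair of every node completes inside gc_grace_seconds."""
    return repair_cycle_seconds(num_nodes, hours_per_node) < gc_grace_seconds


# 12 nodes at 6 h each is 3 days: fits. 48 nodes at 6 h each is 12 days: does not.
print(repair_cycle_fits(12, 6.0), repair_cycle_fits(48, 6.0))
```

As a cluster grows, the same per-node repair time eventually stops fitting, which is one way skipped maintenance turns into a scaling problem.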
Reconsider (Life) Choices?
Size Tiered Compaction vs Leveled Compaction
How Much Data To Store Per Node
Your Write Pattern Matters Too
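Size-tiered compaction merges SSTables of similar size, so the amount of data on each node and the rate at which it arrives together determine how large the biggest merges get. Leveled compaction instead keeps SSTables in levels, each holding roughly ten times the data of the one before. The sketch below is a simplified model of the size-tiered bucketing step, assuming the default `bucket_low` (0.5), `bucket_high` (1.5) and `min_threshold` (4). It skips the `min_sstable_size` rule, and the function names are illustrative, not Cassandra's code:

```python
from typing import List


def stcs_buckets(sizes_mb: List[float], bucket_low: float = 0.5,
                 bucket_high: float = 1.5) -> List[List[float]]:
    """Simplified size-tiered bucketing: an SSTable joins the first bucket whose
    current average size satisfies avg*bucket_low < size < avg*bucket_high.
    (Cassandra also groups all SSTables below min_sstable_size; omitted here.)"""
    buckets: List[List[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low < size < avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No similar-sized bucket exists yet, so start a new tier.
            buckets.append([size])
    return buckets


def compaction_candidates(buckets: List[List[float]],
                          min_threshold: int = 4) -> List[List[float]]:
    """Only buckets with at least min_threshold SSTables are eligible to compact."""
    return [b for b in buckets if len(b) >= min_threshold]


tiers = stcs_buckets([10, 11, 12, 9, 100, 110, 400])
print(tiers)                         # four small SSTables, two medium, one large
print(compaction_candidates(tiers))  # only the small tier is ready to merge
```

Because a merge of the largest tier rewrites all of its data before the inputs are deleted, nodes holding a lot of data under size-tiered compaction typically need generous free disk headroom.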
compaction_throughput_mb_per_sec
Set it to 16-32X your write rate?
Lots-o-options – explore them
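The 16-32X figure matches the guidance in the comments of `cassandra.yaml`: throttle compaction to 16 to 32 times the rate at which you insert data. Below is a small helper that turns a measured per-node write rate into a starting range. It assumes the rate is in MB/s, and `compaction_throughput_range` is a hypothetical name:

```python
import math
from typing import Tuple


def compaction_throughput_range(write_mb_per_sec: float,
                                low_factor: int = 16,
                                high_factor: int = 32) -> Tuple[int, int]:
    """Suggested (low, high) values for compaction_throughput_mb_per_sec.

    Rounds up to whole MB/s and never returns 0, because 0 disables
    compaction throttling entirely in Cassandra."""
    low = max(1, math.ceil(write_mb_per_sec * low_factor))
    high = max(1, math.ceil(write_mb_per_sec * high_factor))
    return low, high


# A node absorbing 1.5 MB/s of writes: start somewhere between 24 and 48 MB/s.
print(compaction_throughput_range(1.5))
```

The chosen value can go into `compaction_throughput_mb_per_sec` or be applied at runtime with `nodetool setcompactionthroughput`. It is a starting point to tune against observed pending compactions, not a guarantee.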
Lookup by Tweet ID
Read Rate < Write Rate
Dynamic ColumnFamilies
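The slides do not show Gnip's schema. One hypothetical way to serve lookups by tweet ID from a dynamic (wide-row) column family is to derive the row key from the timestamp embedded in a Snowflake tweet ID, so a read needs no secondary index, and store each tweet as a column named by its ID. The sketch below is an in-memory stand-in, not a Cassandra client. The `tweets:<hour>` row-key format and hourly bucketing are assumptions:

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional

# Snowflake IDs carry milliseconds since this custom epoch in bits 22 and up.
TWITTER_EPOCH_MS = 1288834974657


def snowflake_ms(tweet_id: int) -> int:
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def row_key_for(tweet_id: int) -> str:
    """Hypothetical row key: one wide row per hour of tweets."""
    return f"tweets:{snowflake_ms(tweet_id) // 3_600_000}"


class DynamicColumnFamily:
    """Row key -> columns created at write time, kept sorted by column name."""

    def __init__(self) -> None:
        self._names: Dict[str, List[int]] = defaultdict(list)
        self._values: Dict[str, Dict[int, bytes]] = defaultdict(dict)

    def insert(self, tweet_id: int, payload: bytes) -> None:
        key = row_key_for(tweet_id)
        if tweet_id not in self._values[key]:
            bisect.insort(self._names[key], tweet_id)  # keep column names ordered
        self._values[key][tweet_id] = payload

    def get(self, tweet_id: int) -> Optional[bytes]:
        """Point lookup: compute the row key, then read one column."""
        return self._values.get(row_key_for(tweet_id), {}).get(tweet_id)

    def slice(self, row_key: str, start: int, end: int) -> List[int]:
        """Column names in [start, end] within one row, like a column slice."""
        names = self._names.get(row_key, [])
        return names[bisect.bisect_left(names, start):bisect.bisect_right(names, end)]


cf = DynamicColumnFamily()
first = 300_000_000_000_000_000
cf.insert(first, b'{"id": 1}')
cf.insert(first + 1, b'{"id": 2}')
```

Since reads are rarer than writes here, putting the cost of locating the row into a cheap key computation keeps the read path simple without slowing ingestion.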
For realz this time!?
Bloom Filter False Positive Rate
Index Intervals
Only Change Schema On One Node! (For Now)
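The first two settings trade memory for read I/O. A lower false-positive chance means a larger bloom filter. A larger index interval means a smaller in-memory index summary but more index scanning per lookup. The back-of-the-envelope sketch below uses the textbook optimal bloom filter size, m/n = -ln(p) / (ln 2)^2, which approximates Cassandra's own sizing rather than reproducing it. The default of 128 for `index_interval` is an assumption based on Cassandra 1.2's `cassandra.yaml`:

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Optimal bloom filter bits per key for false-positive probability p."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be in (0, 1)")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_mib(keys: int, fp_chance: float) -> float:
    """Approximate bloom filter memory in MiB for a given number of row keys."""
    return keys * bloom_bits_per_key(fp_chance) / 8 / 2**20


def index_summary_entries(keys: int, index_interval: int = 128) -> int:
    """The index summary samples one primary-index entry every index_interval keys."""
    return math.ceil(keys / index_interval)


# One billion keys: about 9.6 bits/key at p=0.01 versus about 4.8 at p=0.1.
print(round(bloom_filter_mib(1_000_000_000, 0.01)))
print(round(bloom_filter_mib(1_000_000_000, 0.1)))
```

On a workload where reads are rare, raising the false-positive chance or the index interval can free a large share of heap at the cost of slightly slower reads.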
You Won’t Always Fit The Mold and That’s Okay
Explore Your Options No Matter What
Understand The Consequences Of Your Choices
Staging Environment Identical To Production
www.gnip.com
@noonanisms