Apache Cassandra at Target - Cassandra Summit 2014
DESCRIPTION
Pioneering NoSQL in a Big Enterprise. The problems we needed to solve, the journey we took to get there, and the lessons we learned along the way.
TRANSCRIPT
Apache Cassandra at Target: Pioneering NoSQL in a Big Enterprise
Dan Cundiff (@pmotch), Target
Context
● Target’s API platform
● mostly REST APIs
● e.g. products, locations, inventory, etc.
● consumers inside and outside of Target
● wide variety of providing systems (legacy, in-house built, SaaS, packages, etc.)
Problems we needed to solve
● slow providing systems
● cost prohibitive to call directly
● unable to scale from increased demand
● need a place to aggregate data from multiple systems
● some data wasn’t even in a database to begin with!
Barriers with existing tools, part 1
● cost too much
● process for traditional DBs wasn’t a fit
● too few tools/vendors
Barriers with existing tools, part 2
● RDBMS isn’t:
○ distributed (multi-tenant)
○ close to Guests (geographic distribution)
○ distributed across our data centers
○ distributed to the cloud!
Barriers with existing tools, part 3
● lack of performance control
○ process, not owning it all, flexibility on changes like indexing, etc.
● availability
○ systems before had outages, downtime, etc.
● not automate-able
Discovering the solution
Taking the idea back
● I just went and talked to Pete and we decided to do it!
● tried other things in the past
● show results by trying; succeed or fail fast
Reasons trying was attractive, part 1
● fit 80% of our need
● years in development
● rich C* dev ecosystem
Reasons trying was attractive, part 2
● google-able
● strong community
● a company who would support it
Reasons trying was attractive, part 3
● chef-able
● aligned well with existing investments
● simple pricing model
Barriers to adoption
● enterprise IT; the nature of it
● selling it
● NoSQL for the first time
● automation (was happening at the time; scary to do)
● political
Challenges integrating
● bulk loading data
● keeping Cassandra in sync
● many systems not event driven
● packaged software
● limited ways to integrate with providing systems
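When a providing system is not event driven, one way to keep a store in sync without reloading everything is to poll a snapshot and write only the differences. The sketch below is illustrative only: `diff_snapshots`, the SKU keys, and the in-memory dicts are hypothetical stand-ins, not the integration described in the talk. A real version would issue Cassandra writes and deletes instead of returning key lists.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def diff_snapshots(
    current: Dict[str, Record], incoming: Dict[str, Record]
) -> Tuple[List[str], List[str]]:
    """Return (keys to upsert, keys to delete) so `current` would match `incoming`."""
    # A key needs an upsert when it is new or its record differs.
    upserts = sorted(
        key for key, record in incoming.items() if current.get(key) != record
    )
    # A key needs a delete when the providing system no longer reports it.
    deletes = sorted(key for key in current if key not in incoming)
    return upserts, deletes


# Hypothetical data: what the store holds vs. the latest poll of a providing system.
store = {
    "sku-1": {"name": "towel", "qty": "4"},
    "sku-2": {"name": "lamp", "qty": "9"},
}
latest = {
    "sku-1": {"name": "towel", "qty": "3"},  # changed
    "sku-3": {"name": "chair", "qty": "1"},  # new
}

upserts, deletes = diff_snapshots(store, latest)
print(upserts, deletes)
```

Writing only the delta keeps the write volume proportional to what changed, which matters when the alternative is a full bulk load.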
Challenges of standing it up, part 1
● early distributed system (new to teams)
● needed local disk (always used SAN before)
● needed SSDs (always used spinning things)
● existing config conflicts (backups, monitoring, RAID, swap, etc.)
● use right sized server (don’t settle for what your infra friends give you by default)
Challenges of standing it up, part 2
● full stack ownership
● it’s new, don’t hand it off
● support response is quick because we own it
● you’re closest to the problem; you’re best suited to solve it
● tuned to meet the needs of our APIs
● data is modeled for API performance gains
Challenges of standing it up, part 3
● skills supply is low (but getting better)
● train your people
● be wary of promises from consultants
○ grill them on what they claim to know
Challenges of development, part 1
● skills ramp up (data modeling, DataStax driver, etc.)
● developers need to care
○ encourage tweaking, research, make things better
○ clients are equally important to get the most out of C*
Challenges of development, part 2
● mind shift from RDBMS
● started with Astyanax; switched to DataStax driver
○ DataStax supported
○ newer features
Ops challenges, part 1
● lots of machines; don’t config by hand
● wrote Chef cookbooks
● support people saw these odd servers and turned on things we disabled (like swap)
● can’t use “legacy” testing, Cassandra works differently; chaos stuff (turn off gossip, thrift, etc.)
Ops challenges, part 2
● made logging awesome; we can see anything
● utilized C* JMX interface to send data in real time to Splunk
● can correlate these events with the app tier (because app logs are in Splunk too!)
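One simple way to make metric snapshots easy to search and correlate in a log platform is to emit each snapshot as a single timestamped line of key=value pairs, a format Splunk can extract fields from. The sketch below is an assumption about how that could look, not the pipeline from the talk: `format_metric_event`, the host name, and the metric names are hypothetical, and the JMX polling itself is left out.

```python
from datetime import datetime, timezone
from typing import Mapping, Union

Number = Union[int, float]


def format_metric_event(
    when: datetime, host: str, metrics: Mapping[str, Number]
) -> str:
    """Render one snapshot of metrics as a single key=value log line."""
    pairs = [f"host={host}"]
    # Sort keys so the same snapshot always produces the same line.
    pairs += [f"{name}={value}" for name, value in sorted(metrics.items())]
    return f"{when.isoformat()} " + " ".join(pairs)


# Hypothetical snapshot; in practice these values would be read over JMX.
snapshot = {"heap_used_mb": 2048, "read_latency_ms": 1.7, "dropped_reads": 0}
line = format_metric_event(
    datetime(2014, 1, 1, 12, 0, 0, tzinfo=timezone.utc), "cass-node-01", snapshot
)
print(line)
```

Putting the host and a timestamp on every line is what makes it possible to line these events up against app-tier logs from the same time window.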
Ops challenges, part 3
● useful MBeans:
○ heap usage
○ specific read/write latencies
○ dropped reads/writes
○ bloom filter ratios
○ column count, size
Ops challenges, part 4
● more useful MBeans:
○ SSTables per read
○ tombstones
○ cache hits and ratios
○ misbehaving queries (range slice)
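Several of the items above are ratios rather than raw counts, so they are usually derived from pairs of counters. The sketch below shows that arithmetic under stated assumptions: the counter names are hypothetical, and the bloom filter ratio is computed as false positives over all positive answers, which is one common definition and may not match what a given Cassandra version reports.

```python
from typing import Dict


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when nothing has been counted yet."""
    return numerator / denominator if denominator else 0.0


def summarize(counters: Dict[str, int]) -> Dict[str, float]:
    """Derive ratio metrics from raw counters (hypothetical counter names)."""
    positives = counters["bloom_false_positives"] + counters["bloom_true_positives"]
    return {
        "cache_hit_ratio": ratio(counters["cache_hits"], counters["cache_requests"]),
        "bloom_false_ratio": ratio(counters["bloom_false_positives"], positives),
    }


# Hypothetical counter values for one node.
summary = summarize(
    {
        "cache_hits": 900,
        "cache_requests": 1000,
        "bloom_false_positives": 5,
        "bloom_true_positives": 95,
    }
)
print(summary)
```

The zero-denominator guard matters right after a restart, when counters are still empty and a naive division would fail.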
Open source cookbook!
● https://github.com/target/dse-cookbook
● by Danny Parker
● pull requests encouraged
Blog post on tuning
● http://target.github.io/infrastructure/tuning-cassandra/
● by Danny Parker (@dcparker88)
Results, part 1
● from n00bs to production ready = 2 months!
○ infra, operation testing, app dev, and deployed!
○ just in time before peak season
● today our highest volume APIs depend on it
Results, part 2
● growth (↑ functions + ↑ volume) = ~2000%
● increased adoption of our APIs
● C* unlocking things we couldn't do before
● quick changes possible
○ makes Agile possible○ gets us close to continuous delivery
Results, part 3
● other teams are using it; more coming
● sharing our cookbooks, lessons, etc.
● opened the door to other distributed systems
Future, part 1
● Use across more of our APIs
● Remove remaining spinning disks
Future, part 2
● move to cloud
● automate full stack down to infra
○ scale, quick geo-distribute, flexibility to tweak new infra settings, etc.
Future, part 3
● get better at data modeling designs
● less bulk loading
○ remove compaction process overhead
● weave in Spark, Kafka
○ more event-based updates
Future, part crazy
● Docker + Cassandra?
We’re hiring!
Come talk to us
#CassandraSummit
Dan Cundiff (@pmotch)
Danny Parker (@dcparker88)
Pete Guidarelli (@pguidarelli)
Heather Mickman (@hmmickman)