apache cassandra at target - cassandra summit 2014

Apache Cassandra at Target: Pioneering NoSQL in a Big Enterprise

Dan Cundiff (@pmotch), Target

Context

● Target’s API platform
● mostly REST APIs
● e.g. products, locations, inventory, etc.
● consumers inside and outside of Target
● wide variety of providing systems (legacy, in-house built, SaaS, packages, etc.)

Problems we needed to solve

● slow providing systems
● cost prohibitive to call directly
● unable to scale from increased demand
● need a place to aggregate data from multiple systems
● some data wasn’t even in a database to begin with!

Barriers with existing tools, part 1

● cost too much
● process for traditional DBs wasn’t a fit
● too few tools/vendors


● RDBMS isn’t:
  ○ distributed (multi-tenant)
  ○ close to Guests (geographic distribution)
  ○ distributed across our data centers
  ○ distributed to the cloud!


● lack of performance control
  ○ process, not owning it all, flexibility on changes like indexing, etc.
● availability
  ○ systems before had outages, downtime, etc.

● not automate-able

Discovering the solution

Taking the idea back

● I just went and talked to Pete and we decided to do it!
● tried other things in the past
● show results by trying; succeed or fail fast

Reasons trying was attractive, part 1

● fit 80% of our need
● years in development
● rich C* dev ecosystem


● google-able
● strong community
● a company who would support it


● chef-able
● aligned well with existing investments
● simple pricing model

Barriers to adoption

● enterprise IT; the nature of it
● selling it
● NoSQL for the first time
● automation (was happening at the time; scary to do)
● political

Challenges integrating

● bulk loading data
● keeping Cassandra in sync
● many systems not event driven
● packaged software
● limited ways to integrate with providing systems
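When a providing system cannot emit events, one common way to keep a copy in sync is to poll it for a full snapshot and write only the differences. The deck does not say how Target did this; the following is a minimal sketch under that assumption. The function name `diff_snapshots` and the sample records are hypothetical, and the actual writes to Cassandra are left out.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def diff_snapshots(
    previous: Dict[str, Record], current: Dict[str, Record]
) -> Tuple[List[str], List[str]]:
    """Compare two keyed snapshots of a providing system.

    Returns (keys to upsert, keys to delete). A key is upserted when it
    is new or its record changed; it is deleted when it disappeared.
    """
    upserts = sorted(k for k, v in current.items() if previous.get(k) != v)
    deletes = sorted(k for k in previous if k not in current)
    return upserts, deletes


# Hypothetical snapshots taken on two consecutive polls.
previous = {
    "item-0": {"name": "old item"},
    "item-1": {"name": "unchanged"},
    "item-2": {"name": "before"},
}
current = {
    "item-1": {"name": "unchanged"},
    "item-2": {"name": "after"},
    "item-3": {"name": "new item"},
}

upserts, deletes = diff_snapshots(previous, current)
print("upsert:", upserts)
print("delete:", deletes)
```

Writing only the changed keys keeps write volume, and therefore compaction work, lower than reloading the full data set on every poll.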

Challenges of standing it up, part 1

● early distributed system (new to teams)
● needed local disk (always used SAN before)
● needed SSDs (always used spinning things)
● existing config conflicts (backups, monitoring, RAID, swap, etc.)
● use right sized server (don’t settle for what your infra friends give you by default)


● full stack ownership
● it’s new, don’t hand it off
● support response is quick because we own it
● you’re closest to the problem; you’re best suited to solve it
● tuned to meet the needs of our APIs
● data is modeled for API performance gains


● skills supply is low (but getting better)
● train your people
● be wary of promises from consultants
  ○ grill them on what they claim to know

Challenges of development, part 1

● skills ramp up (data modeling, DataStax driver, etc.)
● developers need to care
  ○ encourage tweaking, research, make things better
  ○ clients are equally as important to get the most out of C*

Challenges of development, part 2

● mind shift from RDBMS
● started with Astyanax; switched to DataStax driver
  ○ DataStax supported
  ○ newer features

Ops challenges, part 1

● lots of machines; don’t config by hand
● wrote Chef cookbooks
● support people saw these odd servers and turned on things we disabled (like swap)
● can’t use “legacy” testing, Cassandra works differently; chaos stuff (turn off gossip, thrift, etc.)


● made logging awesome; we can see anything

● used the C* JMX interface to send data in real time to Splunk

● can correlate these events with the app tier (because app logs are in Splunk too!)
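The deck does not show how the JMX samples reached Splunk. One simple shape, sketched here as an assumption, is to emit each sample as a single `key=value` log line, which Splunk can extract into fields at search time and which can then be correlated with app logs by timestamp and host. The function name `to_splunk_line`, the metric names, and the host name are hypothetical; reading the values from JMX is out of scope.

```python
import time
from typing import Mapping, Optional, Union

Value = Union[int, float, str]


def to_splunk_line(
    metrics: Mapping[str, Value], host: str, timestamp: Optional[float] = None
) -> str:
    """Format one metric sample as a single key=value log line."""
    ts = time.time() if timestamp is None else timestamp
    fields = {"ts": f"{ts:.3f}", "host": host, **metrics}
    parts = []
    for key, value in fields.items():
        text = str(value)
        # Quote values that would otherwise break key=value parsing.
        if " " in text or "=" in text:
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


# Hypothetical sample; a real one would come from the node's mbeans.
sample = {"read_latency_ms": 1.5, "dropped_mutations": 0}
print(to_splunk_line(sample, host="cass-01"))
```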


● useful mbeans:
  ○ heap usage
  ○ specific read/write latencies
  ○ dropped reads/writes
  ○ bloom filter ratios
  ○ column count, size


● more useful mbeans:
  ○ SSTables per read
  ○ tombstones
  ○ cache hits and ratios
  ○ misbehaving queries (range slice)
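Once values like these are being collected, a small check can turn them into signals. The sketch below assumes the counters have already been read from the mbeans; the metric names and the threshold values are hypothetical and would need tuning per cluster.

```python
from typing import Dict, List


def hit_ratio(hits: int, requests: int) -> float:
    """Cache hit ratio; 0.0 when there have been no requests yet."""
    return hits / requests if requests else 0.0


# Hypothetical alert thresholds, not values from the talk.
THRESHOLDS: Dict[str, float] = {
    "sstables_per_read": 4.0,
    "tombstones_per_read": 100.0,
}


def breaches(sample: Dict[str, float]) -> List[str]:
    """Return the names of metrics in the sample that exceed their threshold."""
    return sorted(
        name for name, limit in THRESHOLDS.items() if sample.get(name, 0.0) > limit
    )


sample = {"sstables_per_read": 6.0, "tombstones_per_read": 10.0}
print("key cache hit ratio:", hit_ratio(90, 100))
print("over threshold:", breaches(sample))
```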

Open source cookbook!

● https://github.com/target/dse-cookbook
● by Danny Parker
● pull requests encouraged


Blog post on tuning

● http://target.github.io/infrastructure/tuning-cassandra/
● by Danny Parker (@dcparker88)


Results, part 1

● from n00bs to production ready = 2 months!
  ○ infra, operation testing, app dev, and deployed!
  ○ just in time before peak season

● today our highest volume APIs depend on it

Results, part 2

● growth (↑ functions + ↑ volume) = ~2000%
● increased adoption of our APIs
● C* unlocking things we couldn't do before
● quick changes possible
  ○ makes Agile possible
  ○ gets us close to continuous delivery

Results, part 3

● other teams are using it; more coming
● sharing our cookbooks, lessons, etc.
● opened the door to other distributed systems

Future, part 1

● use across more of our APIs
● remove remaining spinning disks

Future, part 2

● move to cloud
● automate full stack down to infra
  ○ scale, quick geo-distribute, flexibility to tweak new infra settings, etc.

Future, part 3

● get better at data modeling designs
● less bulk loading
  ○ remove compaction process overhead
● weave in Spark, Kafka
  ○ more event-based updates

Future, part crazy

● Docker + Cassandra?

We’re hiring! Come talk to us

#CassandraSummit

Dan Cundiff (@pmotch)
Danny Parker (@dcparker88)
Pete Guidarelli (@pguidarelli)
Heather Mickman (@hmmickman)

