cassandra queuing

Queuing with Cassandra
David Strauss
david@fourkitchens.com
@davidstrauss

Why we use queues

Loose coupling: different languages/interprocess

Integrate legacy and new systems

Allow publishers to be unaware of listeners

Asynchronous requests: long-running tasks

Failure tolerance: of nodes within the queue system and of systems using the queue

Possible queue guarantees

Deliver exactly once

Deliver at least once

Deliver no more than once
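The difference between the last two guarantees usually comes down to when a consumer acknowledges (removes) a message relative to processing it. The sketch below uses a hypothetical in-memory queue (`ToyQueue` and the crash flags are illustrative, not part of any real queue system): acknowledging before processing gives at most once, while processing before acknowledging gives at least once. Exactly once needs more than either ordering on its own.

```python
from collections import deque


class ToyQueue:
    """Hypothetical in-memory queue: a message stays queued until acknowledged."""

    def __init__(self, messages):
        self.pending = deque(messages)

    def peek(self):
        return self.pending[0] if self.pending else None

    def ack(self):
        self.pending.popleft()


def consume_at_most_once(queue, handled, crash_after_ack=False):
    """Acknowledge first, then process: a crash in between loses the message."""
    msg = queue.peek()
    if msg is None:
        return
    queue.ack()
    if crash_after_ack:
        return  # simulated crash: removed from the queue but never processed
    handled.append(msg)


def consume_at_least_once(queue, handled, crash_before_ack=False):
    """Process first, then acknowledge: a crash in between causes redelivery."""
    msg = queue.peek()
    if msg is None:
        return
    handled.append(msg)
    if crash_before_ack:
        return  # simulated crash: processed but still queued, so it is redelivered
    queue.ack()


# At most once: a crash after the ack means "m1" is never handled.
q, handled = ToyQueue(["m1"]), []
consume_at_most_once(q, handled, crash_after_ack=True)
consume_at_most_once(q, handled)
print(handled)  # []

# At least once: a crash before the ack means "m1" is handled twice.
q, handled = ToyQueue(["m1"]), []
consume_at_least_once(q, handled, crash_before_ack=True)
consume_at_least_once(q, handled)
print(handled)  # ['m1', 'm1']
```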

Enterprise queues

ActiveMQ: delivers at most once

Punts at-least-once delivery to lower-level redundancy

RabbitMQ, clustered: no guarantee of at most once, but will deliver at least once

RabbitMQ, unclustered: delivers at most once, but could fail

Job queues

Beanstalkd: delivers at most once

Can optionally persist to disk

Gearman: delivers at most once

No persistence between restarts

What's annoying about these

Inflexible service levels: the entire installation or cluster guarantees exactly the same delivery semantics for all messages

Not all messages are equal

No scalable at-least-once queue: RabbitMQ replicates all messages to all nodes, which limits scalability to what a single node can do

Sending messages redundantly to multiple job queue nodes makes multiple delivery the common case

Application-integrated sharding doesn't count
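The scalability complaint can be made concrete with idealized arithmetic. Assuming writes are spread evenly, every accepted message is stored on `rf` nodes, and each node can absorb a fixed number of writes per second (all illustrative assumptions, not measurements), a cluster that copies every message to every node has the write capacity of a single node, while a partitioned store with replication factor `rf` scales with `nodes / rf`.

```python
def cluster_write_capacity(nodes: int, rf: int, per_node: float) -> float:
    """Idealized messages/second a cluster can accept.

    Each accepted message costs `rf` replica writes, and the cluster can
    perform `nodes * per_node` replica writes per second in total.
    """
    if not 1 <= rf <= nodes:
        raise ValueError("replication factor must be between 1 and the node count")
    return nodes * per_node / rf


per_node = 10_000  # hypothetical per-node write rate
# Copying every message to every node (rf == nodes) caps the cluster at one node's rate.
print(cluster_write_capacity(nodes=6, rf=6, per_node=per_node))  # 10000.0
# Partitioning with rf=3 lets six nodes accept twice a single node's rate.
print(cluster_write_capacity(nodes=6, rf=3, per_node=per_node))  # 20000.0
```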

Why Cassandra?

Processes queuing messages can use ConsistencyLevel to indicate a service level

CL.ZERO means "would be nice to deliver": same guarantee as a non-persistent queue

CL.ONE is low-latency with some durability: same guarantee as a single-node persistent queue

CL.QUORUM (or more) is at-least-once delivery: same guarantee as clustered persistence (e.g. RabbitMQ)

Queue is sharded/partitioned across nodes, unlike RabbitMQ

Can co-locate queue with data
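Why QUORUM supports at-least-once delivery while the lower levels do not can be checked with the standard overlap rule. With replication factor RF, a write at level W is acknowledged only after W replicas hold the message, and a read at level R consults R replicas; the read is guaranteed to reach a replica holding the message whenever R + W > RF. This sketch assumes consumers always read at QUORUM and varies only the producer's write level; the function names are illustrative.

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond for a request at the given consistency level."""
    required = {
        "ZERO": 0,  # fire and forget: no replica acknowledges the write
        "ONE": 1,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    return required[level]


def read_sees_write(write_level: str, read_level: str, rf: int) -> bool:
    """True if every possible read replica set overlaps every write replica set."""
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


rf = 3
for level in ("ZERO", "ONE", "QUORUM"):
    # Columns: write level, replicas holding the message at ack time,
    # whether a QUORUM read is guaranteed to find it.
    print(level, replicas_for(level, rf), read_sees_write(level, "QUORUM", rf))
# ZERO 0 False
# ONE 1 False
# QUORUM 2 True
```

With RF = 3, only a QUORUM write leaves the message on two replicas, so it survives the loss of any one node and is always visible to a QUORUM read.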

Queue data models for Cassandra

Use rows as queues: best performance for ordered messages

Scale limited to row size (but still huge by queue standards and possible to partition)

Use column families as queues with RandomPartitioner (RP): distributes queue items over a cluster

No message ordering

Use column families as queues with OrderPreservingPartitioner (OPP): distributes less well over a cluster

Provides message ordering
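The "rows as queues" model relies on Cassandra keeping the columns within a row sorted by column name. A sketch, simulated in memory rather than against a real cluster: each queue is one row keyed by queue name, each message is a column whose name is a (timestamp, sequence) pair so names sort in enqueue order, and a consumer slices the oldest columns, handles them, and then deletes them. `RowQueueStore` and its methods are hypothetical; a real deployment would use a time-ordered comparator such as TimeUUIDType and its client library's slice and delete calls.

```python
import bisect
import itertools
import time


class RowQueueStore:
    """In-memory stand-in for a column family where each row is one queue.

    Columns in a row stay sorted by name, as with a time-ordered comparator,
    so a slice from the start of the row returns the oldest messages.
    """

    def __init__(self):
        self.rows = {}  # row key (queue name) -> sorted list of (column name, value)
        self._seq = itertools.count()
        self._last_ns = 0

    def enqueue(self, queue, payload):
        # Never let the timestamp go backwards; the sequence number breaks ties.
        self._last_ns = max(time.time_ns(), self._last_ns)
        name = (self._last_ns, next(self._seq))
        bisect.insort(self.rows.setdefault(queue, []), (name, payload))
        return name

    def slice(self, queue, count):
        """Return up to `count` oldest columns without removing them."""
        return list(self.rows.get(queue, [])[:count])

    def delete(self, queue, names):
        wanted = set(names)
        self.rows[queue] = [c for c in self.rows.get(queue, []) if c[0] not in wanted]


def consume(store, queue, count, handler):
    """Slice the oldest messages, handle each one, and only then delete them.

    Deleting after handling means a crash mid-batch leads to redelivery
    (at least once) rather than loss.
    """
    batch = store.slice(queue, count)
    for _, payload in batch:
        handler(payload)
    store.delete(queue, [name for name, _ in batch])
    return len(batch)


store = RowQueueStore()
for job in ("resize-1", "resize-2", "resize-3"):
    store.enqueue("thumbnails", job)

handled = []
consume(store, "thumbnails", 2, handled.append)
print(handled)  # ['resize-1', 'resize-2']
consume(store, "thumbnails", 2, handled.append)
print(handled)  # ['resize-1', 'resize-2', 'resize-3']
```

Two consumers polling the same row can read the same slice before either one deletes it, so this model pairs naturally with at-least-once delivery and idempotent consumers.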

When you want or need at-most-once semantics

When things are idempotent, you don't

When trying to avoid resource contention or redundant computation: it is possible to make a single consumer the common case with smart consumers

memcached for imperfect but scalable/HA locking

When something absolutely cannot happen more than once: the bank transfer case

Give messages unique identity and use locking managed by consumers

Use a locking framework like ZooKeeper

Audit periodically for the effects of duplication and correct

Maybe don't use Cassandra...
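Consumer-managed locking only works if every message carries a unique identity that all consumers agree on. The sketch below assumes a lock store with memcached-style `add` semantics (store the key only if it is absent, with an expiry); `AddOnlyLockStore`, `new_message`, and `handle_once` are hypothetical names, and the store is an in-memory stand-in, not a real memcached client. A consumer handles a message only if it wins the `add` for that message's ID. This is the imperfect kind of locking mentioned above: if a lock expires or the lock store loses it, a duplicate can still slip through, so cases that truly cannot repeat need a stronger lock service behind the same consumer logic.

```python
import time
import uuid


class AddOnlyLockStore:
    """Hypothetical in-memory stand-in for memcached-style add-with-expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires = {}  # key -> expiry time

    def add(self, key, ttl):
        """Store `key` only if it is absent or expired; return True on success."""
        now = self._clock()
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires[key] = now + ttl
        return True


def new_message(body):
    # Unique identity assigned once by the producer and carried with the message.
    return {"id": str(uuid.uuid4()), "body": body}


def handle_once(locks, message, handler, ttl=300):
    """Run `handler` only if this consumer wins the lock for the message ID."""
    if not locks.add("msg:" + message["id"], ttl):
        return False  # another consumer already claimed (or finished) this message
    handler(message["body"])
    return True


locks = AddOnlyLockStore()
sent = []
msg = new_message("send welcome email to user 42")
# The same message delivered twice (at-least-once delivery) is handled once.
print(handle_once(locks, msg, sent.append))  # True
print(handle_once(locks, msg, sent.append))  # False
print(sent)  # ['send welcome email to user 42']
```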

What's annoying about Cassandra queues

Polling is necessary: makes this bad for low-latency queues

Adding locking requires interfacing code with multiple systems. Even then, locking is usually optimistic rather than a coordinated reservation of work items
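Because consumers have to poll, the poll interval sets a floor on delivery latency: a message that arrives just after an empty poll waits up to one full interval. A common compromise, shown here as a sketch with illustrative parameter names and values, is to poll again immediately while messages keep arriving and to back off exponentially, up to a cap, while the queue is empty. Once the backoff has reached the cap, the cap is the worst-case added latency for a message that arrives during a quiet period.

```python
def next_poll_delay(previous: float, got_messages: bool,
                    base: float = 0.1, cap: float = 5.0) -> float:
    """Seconds to wait before the next poll.

    Poll again right away while work is arriving; otherwise double the
    delay (starting from `base`) until it reaches `cap`.
    """
    if got_messages:
        return 0.0
    if previous <= 0.0:
        return base
    return min(previous * 2, cap)


def delay_schedule(poll_results, base=0.1, cap=5.0):
    """Delays chosen after each poll, given whether each poll found messages."""
    delays, delay = [], 0.0
    for got in poll_results:
        delay = next_poll_delay(delay, got, base, cap)
        delays.append(delay)
    return delays


# Two busy polls, then the queue goes quiet, then a message shows up again.
print(delay_schedule([True, True, False, False, False, False, False, False, False, True]))
# [0.0, 0.0, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 5.0, 0.0]
```

Raising `cap` reduces load from empty polls at the cost of latency; lowering it does the opposite, which is the trade-off behind the "high latency" caveat.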

So, consider Cassandra for queuing if you have...

Different messages with different delivery importance, but most messages need to reach consumers at least once

Limited need for at most once guarantees

Too much message volume for a single node to handle

Willingness to poll and to tolerate high latency
