Cassandra queuing
Posted on 16-Apr-2017
Queuing with Cassandra
David Strauss
david@fourkitchens.com
@davidstrauss
Why we use queues
Loose coupling
  Different languages/interprocess
  Integrate legacy and new systems
  Allow publishers to be unaware of listeners
Asynchronous requests
  Long-running tasks
Failure tolerance
  Of nodes within the queue system
  Of systems using the queue
Possible queue guarantees
Deliver exactly once
Deliver at least once
Deliver no more than once
At least once + no more than once = exactly once
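Combining the two weaker guarantees gives the strongest one. In this minimal in-memory sketch, the channel may redeliver a message (at least once), and a consumer that remembers the IDs it has already processed supplies the "no more than once" half. `Message`, `deliver_at_least_once`, `DedupConsumer`, and the 30% duplication rate are hypothetical; a real consumer would need the seen-ID set to survive its own crashes.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str  # unique identity assigned by the publisher (assumption)
    body: str


def deliver_at_least_once(messages, dup_rate, rng):
    """Yield every message at least once; some are redelivered, as happens
    when an acknowledgement is lost and the sender retries."""
    for msg in messages:
        yield msg
        if rng.random() < dup_rate:
            yield msg  # duplicate delivery


@dataclass
class DedupConsumer:
    """Adds the at-most-once half by skipping IDs it has already processed."""
    seen: set = field(default_factory=set)
    processed: list = field(default_factory=list)

    def handle(self, msg: Message) -> bool:
        if msg.msg_id in self.seen:
            return False  # duplicate: skip it
        self.seen.add(msg.msg_id)
        self.processed.append(msg.body)
        return True


rng = random.Random(42)
msgs = [Message(f"m{i}", f"task {i}") for i in range(100)]
deliveries = list(deliver_at_least_once(msgs, dup_rate=0.3, rng=rng))

consumer = DedupConsumer()
for m in deliveries:
    consumer.handle(m)

print(len(deliveries) >= len(msgs))  # True: some messages arrived twice
print(len(consumer.processed))       # 100: each task processed once
```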
Enterprise queues
ActiveMQ
  Delivers at most once
  Punts at-least-once delivery to lower-level redundancy
RabbitMQ
  Clustered
    No guarantee of at most once
    Will deliver at least once
  Unclustered
    Delivers at most once, but could fail
Job queues
Beanstalkd
  Delivers at most once
  Can optionally persist to disk
Gearman
  Delivers at most once
  No persistence between restarts
What's annoying about these
Inflexible service levels
  An entire installation or cluster guarantees exactly the same delivery semantics for all messages
  Not all messages are equal
No scalable at-least-once queue
  RabbitMQ replicates all messages to all nodes
    Limits scalability to what a single node can do
  Sending messages redundantly to multiple job queue nodes makes multiple delivery the common case
  Application-integrated sharding doesn't count
Why Cassandra?
Processes queuing messages can use ConsistencyLevel to indicate a service level
  CL.ZERO means "it would be nice to deliver"
    Same guarantee as a non-persistent queue
  CL.ONE is low-latency with some durability
    Same guarantee as a single-node persistent queue
  CL.QUORUM (or higher) gives at-least-once delivery
    Same guarantee as clustered persistence (e.g. RabbitMQ)
Queue is sharded/partitioned across nodes
  Unlike RabbitMQ
Can co-locate queue with data
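The per-message service levels above come down to how many replicas must acknowledge a write. The sketch below works that out for a replication factor of 3 and shows why quorum writes pair with quorum reads: any two majorities overlap (R + W > RF), so a consumer polling at QUORUM sees a message that was successfully written at QUORUM. `Level`, `acks_required`, and the `LEVEL_FOR` policy are hypothetical stand-ins, not the Cassandra driver's `ConsistencyLevel` API.

```python
from enum import Enum


class Level(Enum):
    # Hypothetical stand-ins for the slide's CL.ZERO / CL.ONE / CL.QUORUM.
    ZERO = "zero"
    ONE = "one"
    QUORUM = "quorum"


def acks_required(level: Level, replication_factor: int) -> int:
    """Replicas that must acknowledge a write before the publisher moves on."""
    if level is Level.ZERO:
        return 0  # fire and forget: no durability promise
    if level is Level.ONE:
        return 1  # durable on a single replica
    return replication_factor // 2 + 1  # a strict majority of replicas


def quorum_reads_see_quorum_writes(replication_factor: int) -> bool:
    """Read and write quorums overlap whenever R + W > RF."""
    q = acks_required(Level.QUORUM, replication_factor)
    return q + q > replication_factor


# Hypothetical per-message policy: pick the service level by importance.
LEVEL_FOR = {
    "metrics_ping": Level.ZERO,
    "cache_warmup": Level.ONE,
    "invoice_email": Level.QUORUM,
}

for kind, level in LEVEL_FOR.items():
    print(kind, level.name, acks_required(level, replication_factor=3))
# metrics_ping ZERO 0
# cache_warmup ONE 1
# invoice_email QUORUM 2
```

The point of the design is that the choice is made per message, not per installation, which is exactly the flexibility the enterprise and job queues lack.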
Queue data models for Cassandra
Use rows as queues
  Best performance for ordered messages
  Scale limited to row size
    (but still huge by queue standards and possible to partition)
Use column families as queues with RP (RandomPartitioner)
  Distributes queue items over a cluster
  No message ordering
Use column families as queues with OPP (OrderPreservingPartitioner)
  Distributes less well over a cluster
  Provides message ordering
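The RP/OPP trade-off follows from how each partitioner places a key: RandomPartitioner uses an MD5 hash of the key, so keys written in sequence scatter across the ring, while OrderPreservingPartitioner uses the key itself, so sequential queue keys pile onto one node but stay in order. This is a simplified sketch assuming a hypothetical 4-node ring and `msg-NNNNNN` keys; real RandomPartitioner tokens span 0 to 2^127 rather than the 128-bit range used here.

```python
import bisect
import hashlib
from collections import Counter


def rp_token(key: str) -> int:
    # Simplified RandomPartitioner: token derived from the MD5 hash of the key.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


def opp_token(key: str) -> str:
    # OrderPreservingPartitioner: the key itself is the token.
    return key


def owner(token, ring):
    """ring is a sorted list of (token, node); a token belongs to the first
    node whose token is >= it, wrapping past the end of the ring."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


# Hypothetical 4-node rings with evenly spaced (RP) or hand-picked (OPP) tokens.
rp_ring = [((i + 1) * 2**128 // 4, f"node{i}") for i in range(4)]
opp_ring = [("g", "node0"), ("n", "node1"), ("t", "node2"), ("~", "node3")]

# Queue items keyed in publish order.
keys = [f"msg-{i:06d}" for i in range(1000)]

rp_counts = Counter(owner(rp_token(k), rp_ring) for k in keys)
opp_counts = Counter(owner(opp_token(k), opp_ring) for k in keys)

print(opp_counts)                 # Counter({'node1': 1000}): one hot node
print(sorted(rp_counts.items()))  # roughly 250 keys per node

# Scanning in token order: OPP returns publish order, RP scrambles it.
print(sorted(keys, key=opp_token) == keys)  # True
```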
When you want or need at-most-once semantics
  When things are idempotent, you don't
  When trying to avoid resource contention or redundant computation
    Possible to make a single consumer the common case
      With smart consumers
      memcached for imperfect but scalable/HA locking
  When something absolutely cannot happen more than once
    The bank transfer case
    Give messages unique identity and use locking managed by consumers
    Use a locking framework like ZooKeeper
    Audit periodically for the effects of duplication and correct them
    Maybe don't use Cassandra...
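The "imperfect but scalable" memcached locking can be sketched with memcached's `add` command, which stores a key only if it is not already present: the first consumer to add a claim key for a message processes it, and the others skip it. The lock is imperfect because memcached may evict the key (or restart), after which a second consumer can claim the same message. `FakeMemcache` is an in-memory stand-in rather than a real client, and `try_claim` and the `claim:` key prefix are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FakeMemcache:
    """In-memory stand-in for memcached's `add`: store only if the key is
    absent (or expired); entries expire after `ttl` seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def add(self, key: str, value: str, ttl: float) -> bool:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False  # key still held by another consumer
        self._data[key] = (value, now + ttl)
        return True

    def evict(self, key: str) -> None:
        # memcached may drop keys under memory pressure or on restart.
        self._data.pop(key, None)


def try_claim(cache: FakeMemcache, msg_id: str, worker: str, ttl: float = 30.0) -> bool:
    """Hypothetical optimistic claim: only the first worker to add the key
    processes the message."""
    return cache.add(f"claim:{msg_id}", worker, ttl)


now = [0.0]  # controllable clock for the example
cache = FakeMemcache(clock=lambda: now[0])
print(try_claim(cache, "m1", "worker-a"))  # True: worker-a processes m1
print(try_claim(cache, "m1", "worker-b"))  # False: worker-b skips it
cache.evict("claim:m1")
print(try_claim(cache, "m1", "worker-b"))  # True: after eviction, m1 runs twice
```

This makes a single consumer the common case without promising it, which is why the bank-transfer case still needs real locking or auditing.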
What's annoying about Cassandra queues
Polling is necessary
  Makes this bad for low-latency queues
Adding locking requires interfacing code with multiple systems
  Even then, locking is usually optimistic rather than a coordinated reservation of work items
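Why polling costs latency can be shown with a row used as a queue: messages are columns sorted by time, and a consumer repeatedly asks for the columns after its last-seen position. A message therefore waits until the next poll, up to one full poll interval. `RowQueue` is an in-memory stand-in for a Cassandra row with time-ordered column names; the 1-second interval and publish times are hypothetical, and the sketch ignores deleting consumed columns and multiple consumers.

```python
import bisect


class RowQueue:
    """In-memory stand-in for one Cassandra row used as a queue. Column names
    are (timestamp, sequence) pairs kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> message body
        self._seq = 0      # tie-breaker for equal timestamps

    def insert(self, ts, body):
        name = (ts, self._seq)
        self._seq += 1
        bisect.insort(self._names, name)
        self._values[name] = body

    def slice_after(self, start, limit):
        """Up to `limit` columns strictly after `start`, in column order."""
        i = 0 if start is None else bisect.bisect_right(self._names, start)
        return [(n, self._values[n]) for n in self._names[i:i + limit]]


def poll(queue, cursor, limit=100):
    """One polling round: fetch new columns and advance the consumer's cursor."""
    batch = queue.slice_after(cursor, limit)
    if batch:
        cursor = batch[-1][0]
    return batch, cursor


POLL_INTERVAL = 1.0                # hypothetical seconds between polls
publish_times = [0.3, 1.1, 1.9]    # hypothetical publish times, in seconds
pending = list(publish_times)
q, cursor, latencies, t = RowQueue(), None, [], 0.0

while len(latencies) < len(publish_times):
    t += POLL_INTERVAL
    # Publishers write any messages whose publish time has passed.
    while pending and pending[0] <= t:
        ts = pending.pop(0)
        q.insert(ts, f"job published at {ts}")
    # The consumer only notices them when it polls.
    batch, cursor = poll(q, cursor)
    for (ts, _seq), _body in batch:
        latencies.append(round(t - ts, 2))

print(latencies)  # [0.7, 0.9, 0.1]: each message waits for the next poll
```

Shortening the interval lowers latency but multiplies read load on the cluster, which is the trade-off behind "bad for low-latency queues".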
So, consider Cassandra for queuing if you have...
Different messages with different delivery importance
  But most messages need to reach consumers at least once
Limited need for at-most-once guarantees
Too much message volume for a single node to handle
Willingness to poll and tolerate higher latency