Cassandra queuing

Post on 16-Apr-2017


Queuing with Cassandra
David Strauss
david@fourkitchens.com
@davidstrauss

Why we use queues

Loose coupling:

Different languages/interprocess

Integrate legacy and new systems

Allow publishers to be unaware of listeners

Asynchronous requests:

Long-running tasks

Failure tolerance:

Of nodes within the queue system

Of systems using the queue

Possible queue guarantees

Deliver exactly once

Deliver at least once

Deliver no more than once

Deliver at least once + deliver no more than once = deliver exactly once
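The three guarantees can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: every send reaches the consumer, only acknowledgements can be lost, and the function names (`deliver_at_most_once`, `deliver_at_least_once`, `dedupe`) are hypothetical, not part of any queue product.

```python
from typing import Callable, List


def deliver_at_most_once(message: str, send_succeeds: Callable[[int], bool]) -> List[str]:
    """Send once and never retry: the message arrives zero or one times."""
    return [message] if send_succeeds(0) else []


def deliver_at_least_once(
    message: str, ack_received: Callable[[int], bool], max_attempts: int = 5
) -> List[str]:
    """Resend until an acknowledgement comes back: one or more copies arrive."""
    received: List[str] = []
    for attempt in range(max_attempts):
        received.append(message)       # this copy reaches the consumer
        if ack_received(attempt):      # the ack may be lost on the way back
            break
    return received


def dedupe(received: List[str]) -> List[str]:
    """Consumer-side duplicate removal: adds the 'no more than once' half."""
    seen = set()
    unique: List[str] = []
    for message in received:
        if message not in seen:
            seen.add(message)
            unique.append(message)
    return unique
```

Retrying until acknowledged produces duplicates, and discarding duplicates on the consumer side is what turns at least once into exactly once in this toy model.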

Enterprise queues

ActiveMQ:

Delivers at most once

Punts at-least-once delivery to lower-level redundancy

RabbitMQ, clustered:

No guarantee of at most once

Will deliver at least once

RabbitMQ, unclustered:

Delivers at most once, but could fail

Job queues

Beanstalkd:

Delivers at most once

Can optionally persist to disk

Gearman:

Delivers at most once

No persistence between restarts

What's annoying about these

Inflexible service levels:

An entire installation or cluster guarantees exactly the same delivery semantics for all messages

Not all messages are equal

No scalable at-least-once queue:

RabbitMQ replicates all messages to all nodes, which limits scalability to what a single node can do

Sending messages redundantly to multiple job queue nodes makes multiple delivery the common case

Application-integrated sharding doesn't count

Why Cassandra?

Processes queuing messages can use ConsistencyLevel to indicate a service level:

CL.ZERO is "would be nice to deliver": same guarantee as a non-persistent queue

CL.ONE is low-latency with some durability: same guarantee as a single-node persistent queue

CL.QUORUM (or more) is delivery at least once: same guarantee as clustered persistence (e.g. RabbitMQ)

Queue is sharded/partitioned across nodes, unlike RabbitMQ

Can co-locate queue with data
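A per-message service level could be mapped onto a consistency level roughly as follows. This is a minimal sketch, not a client-library API: the `ConsistencyLevel` enum is a local stand-in, and the service-level names (`nice_to_deliver`, `low_latency`, `at_least_once`) are hypothetical labels for the three cases on this slide.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """Local stand-in for the consistency levels named on the slide."""
    ZERO = "ZERO"
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


# Hypothetical service-level names chosen by the publishing application.
SERVICE_LEVELS = {
    "nice_to_deliver": ConsistencyLevel.ZERO,   # like a non-persistent queue
    "low_latency": ConsistencyLevel.ONE,        # like a single-node persistent queue
    "at_least_once": ConsistencyLevel.QUORUM,   # like clustered persistence
}


def level_for(service_level: str) -> ConsistencyLevel:
    """Pick the consistency level to use when writing one message."""
    return SERVICE_LEVELS[service_level]


def required_acks(level: ConsistencyLevel, replication_factor: int) -> int:
    """Number of replicas that must acknowledge a write before it succeeds."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level is ConsistencyLevel.ZERO:
        return 0
    if level is ConsistencyLevel.ONE:
        return 1
    if level is ConsistencyLevel.QUORUM:
        return replication_factor // 2 + 1
    return replication_factor  # ALL
```

With a replication factor of 3, an `at_least_once` message waits for 2 replicas while a `nice_to_deliver` message waits for none, so both kinds of message can share one cluster.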

Queue data models for Cassandra

Use rows as queues:

Best performance for ordered messages

Scale limited to row size (but still huge by queue standards and possible to partition)

Use column families as queues with RP (RandomPartitioner):

Distributes queue items over a cluster

No message ordering

Use column families as queues with OPP (OrderPreservingPartitioner):

Distributes less well over a cluster

Provides message ordering
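The ordering difference between the two partitioners can be sketched in a few lines. This is an illustration only: `random_partitioner_token` approximates the MD5-based token idea behind RandomPartitioner rather than reproducing Cassandra's exact token arithmetic, and the `msg-NNNN` row keys are hypothetical.

```python
import hashlib
from typing import List


def random_partitioner_token(row_key: str) -> int:
    """MD5-derived token, in the spirit of RandomPartitioner (approximation)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def scan_order(row_keys: List[str], order_preserving: bool) -> List[str]:
    """Order in which a range scan over the column family would visit rows."""
    if order_preserving:
        # OPP: rows are placed by key, so a scan follows key order.
        return sorted(row_keys)
    # RP: rows are placed by hashed token, so a scan follows token order.
    return sorted(row_keys, key=random_partitioner_token)


# Hypothetical keys that sort in enqueue order.
keys = ["msg-%04d" % n for n in range(1, 9)]
opp_order = scan_order(keys, order_preserving=True)
rp_order = scan_order(keys, order_preserving=False)
```

Under OPP the scan returns messages in enqueue order, but sequential keys also cluster on neighboring nodes, which is why it distributes less well. Under RP the same keys are spread by hash and come back in token order, which is unrelated to enqueue order.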

When you want or need at-most-once semantics

When things are idempotent, you don't

When trying to avoid resource contention or redundant computation:

Possible to make single consumer the common case with smart consumers

memcached for imperfect but scalable/HA locking

When something absolutely cannot happen more than once:

The bank transfer case

Give messages unique identity and use locking managed by consumers

Use a locking framework like Zookeeper

Audit periodically for the effects of duplication and correct

Maybe don't use Cassandra...
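Consumer-managed locking keyed on a unique message identity might look like the sketch below. `ExpiringLockStore` is a hypothetical in-memory stand-in that mimics the semantics of memcached's `add` command (store only if the key is absent); `consume_once` and the `lock:` key prefix are likewise assumptions made for illustration.

```python
import time
from typing import Callable, Dict


class ExpiringLockStore:
    """In-memory stand-in for a cache with memcached-style `add` semantics."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expiry: Dict[str, float] = {}
        self._clock = clock

    def add(self, key: str, ttl_seconds: float) -> bool:
        """Store the key only if absent or expired; True means the lock was won."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True


def consume_once(
    message_id: str,
    handler: Callable[[str], None],
    locks: ExpiringLockStore,
    ttl_seconds: float = 60.0,
) -> bool:
    """Run the handler only if this consumer claims the message's lock."""
    if not locks.add("lock:" + message_id, ttl_seconds):
        return False  # another consumer already claimed this message
    handler(message_id)
    return True
```

The lock is imperfect by design: once the TTL expires, or if the cache loses the key, a second consumer can claim the same message. That is acceptable for avoiding redundant computation but not for the bank transfer case.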

What's annoying about Cassandra queues

Polling is necessary:

Makes this bad for low-latency queues

Adding locking requires interfacing code with multiple systems:

Even then, locking is usually optimistic rather than a coordinated reservation of work items
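The polling cost mentioned above can be made concrete with a consumer loop that backs off while the queue is empty. This is a sketch under assumptions: `fetch_batch` is a hypothetical callable standing in for whatever read the consumer issues against Cassandra, and the delay parameters are arbitrary.

```python
import time
from typing import Callable, List


def poll_queue(
    fetch_batch: Callable[[], List[str]],
    handle: Callable[[str], None],
    max_polls: int,
    min_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Poll for messages, doubling the wait after each empty read.

    Returns the delays that were slept, which is handy for inspection.
    """
    delay = min_delay
    delays: List[float] = []
    for _ in range(max_polls):
        batch = fetch_batch()
        if batch:
            for message in batch:
                handle(message)
            delay = min_delay              # work arrived: poll eagerly again
        else:
            sleep(delay)                   # queue empty: wait before retrying
            delays.append(delay)
            delay = min(delay * 2, max_delay)
    return delays
```

The backoff bounds the read load on an idle queue, but a message that arrives just after an empty poll waits up to `max_delay` before a consumer sees it, which is the latency trade-off the slide describes.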

So, consider Cassandra for queuing if you have...

Different messages with different delivery importance:

But most messages need to reach consumers at least once

Limited need for at-most-once guarantees

Too much message volume to handle throughput on a single node

Willingness to poll and have high latency
