kafka as a message queue
TRANSCRIPT
@adamwarski, SoftwareMill, Kafka London Meetup
THE PLAN
➤ Acknowledgments in plain Kafka
➤ Why selective acknowledgments?
➤ Why not …MQ?
➤ Kmq implementation
➤ Demo
➤ Performance
@adamwarski, SoftwareMill, Kafka London Meetup
➤ Offset commits:
➤ Using this, we can implement:
➤ at-least-once processing
➤ at-most-once processing
topic
msg25msg24
ACKNOWLEDGMENTS IN PLAIN KAFKA
msg18
partition 1
partition 2
partition 3
msg19 msg20 msg21 msg22 msg23
commit offset: 20commit offset: 24
@adamwarski, SoftwareMill, Kafka London Meetup
WHY SELECTIVE ACKNOWLEDGMENTS?
➤ Integrating with external systems
➤ e.g. HTTP/REST endpoints
➤ other messaging
➤ Individual calls might fail
➤ should be retried
➤ without retrying the whole batch
➤ without delaying subsequent batches
@adamwarski, SoftwareMill, Kafka London Meetup
WHY NOT …MQ?
➤ Typical usage scenario for a message queue
➤ RabbitMQ, ActiveMQ, Artemis, SQS …
➤ Kafka:
➤ proven & reliable clustering & replication mechanisms
➤ performance
➤ convenience: reduce operational complexity
@adamwarski, SoftwareMill, Kafka London Meetup
AMAZON SQS
➤ Message queue as-a-service
➤ Simple API: ➤ CreateQueue
➤ SendMessage
➤ ReceiveMessage
➤ DeleteMessage
➤ Received messages are blocked for a period of time
➤ visibility timeout
@adamwarski, SoftwareMill, Kafka London Meetup
KMQ: IMPLEMENTATION
➤ Two topics:
➤ queue: messages to process
➤ markers: for each message, start/end markers
➤ same number of partitions
➤ A number of queue clients
➤ here data is processed
➤ A number of redelivery trackers
@adamwarski, SoftwareMill, Kafka London Meetup
QUEUE CLIENT
1. Read message from queue
2. Write start [offset] to markers
➤ wait for send to complete!
3. Commit offset to queue
4. Process the message
5. Write end [offset] markers
markers topic
partition 1
partition 2
partition 3
queue topic
partition 1
partition 2
partition 3
msg37
4. process message
fail processing, wait for redelivery
msg39msg40
1. read messages from
topic
start marker offset: 39
2. write start markers
msg38
3. commit offsets
offset: 38
success, confirm message processed
end marker offset: 37
5. write end markers
redelivery tracker
// started, not ended markersoffset=10, time=1488010644offset=15, time=1488141843offset=24, time=1488289812…
marker stream
every second trigger
redeliver timed out messages
read & redeliver messagemsg10
@adamwarski, SoftwareMill, Kafka London Meetup
REDELIVERY TRACKER
➤ A Kafka application
➤ consumes the markers topic
➤ Multiple instances for fail-over
➤ Uses Kafka’s auto-partition-assignment
@adamwarski, SoftwareMill, Kafka London Meetup
REDELIVERY TRACKER
➤ In-memory priority queue
➤ by Kafka’s marker timestamp
➤ messages with start markers, but no end markers
➤ Checks for messages to redeliver at regular intervals
➤ redelivery: seek + send
➤ in order
@adamwarski, SoftwareMill, Kafka London Meetup
PERFORMANCE
➤ 3-node Kafka cluster
➤ m4.2xlarge servers (8 CPUs, 32GiB RAM)
➤ single AZ
➤ 100 byte messages, sent in batches of up to 10
➤ Up to 8 sender/receiver nodes
➤ 64 to 160 partitions ➤ replication-factor=3
➤ min.insync.replicas=2
➤ acks=all (-1)
@adamwarski, SoftwareMill, Kafka London Meetup
LATENCY
➤ Plain Kafka: ~50 milliseconds
➤ kmq: 50ms - 130ms
@adamwarski, SoftwareMill, Kafka London Meetup
KMQ INTERNALS➤ RedeliveryTracker
➤ Implemented in Scala, with a Java API
➤ Uses Akka
➤ One tracking actor per markers topic partition
➤ One redeliver actor per queue topic partition
➤ Started/stopped when partitions are revoked/assigned ➤ KmqClient
➤ Single Java class
➤ + marker value classes
@adamwarski, SoftwareMill, Kafka London Meetup
ABOUT ME
➤ Software engineer, co-founder @
➤ Custom software development: Scala/Kafka/Java/Cassandra/…
➤ Open-source: sttp, QuickLens, ElasticMQ, Envers, MacWire, …
➤ Blog @ softwaremill.com/blog
➤ Twitter @ twitter.com/adamwarski
@adamwarski, SoftwareMill, Kafka London Meetup
SUMMARY
➤ Individual, selective message acknowledgments
➤ similar to SQS
➤ Alternative to batch/up-to-offset acknowledgments in plain Kafka
➤ Storage overhead: additional meta-data topic
➤ Performance overhead: comparable
➤ Integrating with external systems
@adamwarski, SoftwareMill, Kafka London Meetup
LINKS
➤ GitHub: https://github.com/softwaremill/kmq
➤ Introductory blog: https://softwaremill.com/using-kafka-as-a-message-queue/
➤ Message queue performance: https://softwaremill.com/mqperf/
➤ @adamwarski / [email protected]