TRANSCRIPT
PubSub++: How to make your life with Kafka easier
Krzysztof Dębski
@DebskiChris
JavaZone 2015
Who am I
@DebskiChris
http://hermes.allegro.tech
Allegro Group
500+ people in IT
50+ independent teams
16 years on the market
2 years after a technical revolution
Kafka as a backbone
Kafka
Hermes
Kafka data
Partitioning
Round robin partitioning (default)
Key based partitioning
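The two strategies above can be sketched in a few lines — a minimal illustration of how a producer picks a partition, not Kafka's actual implementation (Kafka's Java client hashes the key bytes; Python's `hash()` stands in for that here, and the partition count is made up):

```python
from itertools import count

NUM_PARTITIONS = 3

# Round-robin partitioning (the default for keyless messages):
# spread messages evenly across all partitions.
_counter = count()

def round_robin_partition(num_partitions=NUM_PARTITIONS):
    return next(_counter) % num_partitions

# Key-based partitioning: the same key always maps to the same partition,
# which preserves per-key ordering.
def key_based_partition(key, num_partitions=NUM_PARTITIONS):
    return hash(key) % num_partitions
```

The flip side of key-based partitioning: a hot key sends all of its traffic to a single partition, which is one source of the performance issues discussed next.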
Performance issues
Rebalancing leaders

[Diagram: partition placement - Broker 1: P1, P0; Broker 2: P2, P1; Broker 3: P0, P2]

Topic: test  PartitionCount: 3  ReplicationFactor: 2  Configs: retention.ms=86400000
	Topic: test  Partition: 0  Leader: 3  Replicas: 3,1  ISR: 3,1
	Topic: test  Partition: 1  Leader: 1  Replicas: 1,2  ISR: 1,2
	Topic: test  Partition: 2  Leader: 2  Replicas: 2,3  ISR: 2,3
Replicas: the brokers that should hold a copy of the partition.
ISR: the replicas currently in sync with the leader.
Leader: the ID of the broker currently acting as partition leader.
Rebalancing leaders: when Broker 3 goes down, leadership for partition 0 moves to Broker 1 and Broker 3 drops out of the ISR:

Topic: test  PartitionCount: 3  ReplicationFactor: 2  Configs: retention.ms=86400000
	Topic: test  Partition: 0  Leader: 1  Replicas: 3,1  ISR: 1
	Topic: test  Partition: 1  Leader: 1  Replicas: 1,2  ISR: 1,2
	Topic: test  Partition: 2  Leader: 2  Replicas: 2,3  ISR: 2
Rebalancing leaders: once Broker 3 is back and caught up it rejoins the ISR, but partition 0 is still led by Broker 1 - leadership is not moved back to the preferred replica automatically:

Topic: test  PartitionCount: 3  ReplicationFactor: 2  Configs: retention.ms=86400000
	Topic: test  Partition: 0  Leader: 1  Replicas: 3,1  ISR: 1,3
	Topic: test  Partition: 1  Leader: 1  Replicas: 1,2  ISR: 1,2
	Topic: test  Partition: 2  Leader: 2  Replicas: 2,3  ISR: 2,3
Lost events
ACK levels
0 - don’t wait for a response from the leader
1 - only the leader has to respond
-1 - all replicas must be in sync

The trade-off: lower ACK levels buy speed, higher levels buy safety.
Event identification
Lost events
ERROR [Replica Manager on Broker 2]: Error when processing fetch request for partition [test,1] offset 10000 from consumer with correlation id 0. Possible cause:
Request for offset 10000 but we only have log segments in the range 8000 to 9000. (kafka.server.ReplicaManager)
Lost events

[Diagram: Producer writes to Broker 1 (leader, committed offset = 10000); Broker 2 (follower, committed offset = 9000); Zookeeper tracks the cluster state]

ACK = 1
Replication factor = 2
replica.lag.max.messages = 2000

With ACK = 1 the producer is acknowledged as soon as the leader has the message, so the follower may lag - here by 1000 messages, still within replica.lag.max.messages, so Broker 2 stays in the ISR.
When Broker 1 fails, Broker 2 is elected leader and the committed offset recorded in Zookeeper falls back to 9000 - the 1000 messages that only Broker 1 had acknowledged are lost, and a consumer asking for offset 10000 gets the "offset out of range" error shown earlier.
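The failure sequence can be simulated in a few lines — a toy model (not real Kafka) showing why ACK = 1 plus a lagging follower loses the acknowledged tail of the log:

```python
# Toy model of the scenario: the leader acknowledges writes with ACK = 1
# before the follower has replicated them, then the leader dies.

leader_log = list(range(10_000))    # Broker 1: offsets 0..9999 acknowledged
follower_log = list(range(9_000))   # Broker 2: replicated only up to 8999

lag = len(leader_log) - len(follower_log)
assert lag <= 2000  # within replica.lag.max.messages, so Broker 2 stays in the ISR

# Broker 1 fails; Broker 2 is elected leader with only its own copy of the log.
new_leader_log = follower_log

def fetch(log, offset):
    """Serve a consumer fetch the way a broker does."""
    if offset >= len(log):
        raise ValueError(f"Request for offset {offset} but we only have "
                         f"offsets up to {len(log) - 1}")
    return log[offset]

# A consumer that was acknowledged up to offset 9999 now asks for more:
try:
    fetch(new_leader_log, 10_000)
except ValueError as e:
    print(e)  # the 1000 acknowledged-but-unreplicated messages are gone
```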
Slow responses
[Chart: response time at the 75th, 99th and 99.9th percentiles]
Slow responses

[Chart: response-time percentiles (75%, 99%, 99.9%) against message size]

Is response time correlated to message size?
Slow responses

[Chart: response-time percentiles (75%, 99%, 99.9%) for a fixed message size]

Same distribution for fixed message size.
Slow responses

[Chart: response-time percentiles (75%, 99%, 99.9%)]

Hermes overhead is just about 1 ms.
Kafka

[Charts: response times on kernel 3.2.x versus kernel >= 3.8.x]

On kernel 3.2.x Kafka suffered slow responses; on kernel >= 3.8.x operation was normal.
Slow responses
Message size
Optimize message size

[Chart: message size at the 99.9th percentile - all topics versus the biggest topic]
Optimize message size
JSON
- human readable
- big memory and network footprint
- poor support for Hadoop
Optimize message size
JSON → Snappy
ERROR Error when sending message to topic t3 with key: 4 bytes, value: 100 bytes with error: The server experienced an unexpected error when processing the request (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)

java: target/snappy-1.1.1/snappy.cc:423: char* snappy::internal::CompressFragment(const char*, size_t, char*, snappy::uint16*, int): Assertion `0 == memcmp(base, candidate, matched)' failed.
Snappy produced errors when publishing a large number of messages.
Optimize message size
JSON → Snappy → LZ4

LZ4 failed on distributed data.

[Chart: compression ratio - single topic versus multiple topics]
Optimize message size
JSON → Snappy → LZ4 → Avro

Avro:
- small network footprint
- Hadoop friendly
- easy schema verification
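Avro itself needs a schema and the avro library, but the footprint argument can be illustrated with the standard library alone: when producer and consumer share a schema, field names disappear from the wire. Here `struct` stands in for Avro, and the event fields are made up:

```python
import json
import struct

# A hypothetical purchase event (field names are illustrative).
event = {"user_id": 123456, "item_id": 987654321, "price_cents": 4999, "quantity": 2}

# JSON is self-describing: every single message repeats the field names.
json_bytes = json.dumps(event).encode("utf-8")

# With a shared schema only the values travel - the idea behind Avro.
# Layout: two unsigned 64-bit ints, one unsigned 32-bit int, one unsigned 16-bit int.
SCHEMA = struct.Struct(">QQIH")
binary_bytes = SCHEMA.pack(event["user_id"], event["item_id"],
                           event["price_cents"], event["quantity"])

print(len(json_bytes), len(binary_bytes))  # the binary form is several times smaller
```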
Improvements
Multi data center
Consumer backoff
You can’t have exactly-once delivery
http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
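Since exactly-once delivery is impossible, a common workaround is at-least-once delivery plus an idempotent consumer. A sketch of deduplicating on a message ID — the "id" field is an assumption, since Kafka itself does not add one; the producer has to:

```python
# At-least-once delivery may redeliver messages; deduplicate on the consumer side.
seen_ids = set()
processed = []

def handle(message):
    """Process each distinct message at most once, even if it is redelivered."""
    if message["id"] in seen_ids:
        return False  # duplicate delivery - skip
    seen_ids.add(message["id"])
    processed.append(message["payload"])
    return True

deliveries = [
    {"id": 1, "payload": "a"},
    {"id": 2, "payload": "b"},
    {"id": 1, "payload": "a"},  # redelivery after a retry
]
for m in deliveries:
    handle(m)

print(processed)  # ['a', 'b']
```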
Kafka offsets
<=0.8.1 - Zookeeper
>=0.8.2 - Zookeeper or Kafka
>=0.9(?) - Kafka
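Wherever the offsets live — Zookeeper or Kafka itself — the consumer contract is the same: commit the offset only after processing, so a crash replays events instead of skipping them. A toy sketch with an in-memory stand-in for the offset store:

```python
log = ["e0", "e1", "e2", "e3", "e4"]  # one partition's messages
offset_store = {"my-group": 0}        # stands in for Zookeeper or the Kafka offsets topic

def consume(group, process):
    """At-least-once consumption: process first, commit the offset afterwards.
    A crash between the two steps replays the event rather than losing it."""
    offset = offset_store[group]
    while offset < len(log):
        process(log[offset])
        offset += 1
        offset_store[group] = offset  # commit AFTER processing

handled = []
consume("my-group", handled.append)
print(offset_store["my-group"])  # 5 - the next fetch starts after the last event
```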
Kafka Offset Monitor
Manage your topics
Improved security
Authentication and authorization interfaces provided
By Default:
You can create any topic in your group
You can publish everywhere (in progress)
Group owner defines subscriptions
Improved offset management
Turn back the time
PUT /groups/{group}/topics/{topic}/subscriptions/{subscription}/retransmission -8h
Blog: http://allegro.tech
Twitter: @allegrotechblog
Twitter: @debskichris