TRANSCRIPT
PubSub++: How to make your life with Kafka easier
Krzysztof Dębski
@DebskiChris
JavaZone 2015
Who am I
@DebskiChris
http://hermes.allegro.tech
Allegro Group
500+ people in IT
50+ independent teams
16 years on the market
2 years after a technical revolution
Kafka as a backbone
Kafka
Hermes
Kafka data
Partitioning
Round robin partitioning (default)
Key based partitioning
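The two strategies above can be sketched in a few lines — a minimal illustration of how a producer picks a partition, not Kafka's actual implementation (Kafka's Java client hashes the key bytes; Python's `hash()` stands in for that here, and the partition count is made up):

```python
from itertools import count

NUM_PARTITIONS = 3

# Round-robin partitioning (the default for keyless messages):
# spread messages evenly across all partitions.
_counter = count()

def round_robin_partition(num_partitions=NUM_PARTITIONS):
    return next(_counter) % num_partitions

# Key-based partitioning: the same key always maps to the same partition,
# which preserves per-key ordering.
def key_based_partition(key, num_partitions=NUM_PARTITIONS):
    return hash(key) % num_partitions
```

The flip side of key-based partitioning: a hot key sends all of its traffic to a single partition, which is one source of the performance issues discussed next.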
Performance issues
Rebalancing leaders

[Diagram: partition placement - Broker 1: P1, P0; Broker 2: P2, P1; Broker 3: P0, P2]

Topic: test  PartitionCount: 3  ReplicationFactor: 2  Configs: retention.ms=86400000
	Topic: test  Partition: 0  Leader: 3  Replicas: 3,1  ISR: 3,1
	Topic: test  Partition: 1  Leader: 1  Replicas: 1,2  ISR: 1,2
	Topic: test  Partition: 2  Leader: 2  Replicas: 2,3  ISR: 2,3
Replicas: the brokers that should hold a copy of the partition.
ISR: the replicas currently in sync with the leader.
Leader: the ID of the broker currently acting as partition leader.
Rebalancing leaders: when Broker 3 goes down, leadership for partition 0 moves to Broker 1 and Broker 3 drops out of the ISR:

Topic: test  PartitionCount: 3  ReplicationFactor: 2  Configs: retention.ms=86400000
	Topic: test  Partition: 0  Leader: 1  Replicas: 3,1  ISR: 1
	Topic: test  Partition: 1  Leader: 1  Replicas: 1,2  ISR: 1,2
	Topic: test  Partition: 2  Leader: 2  Replicas: 2,3  ISR: 2
Rebalancing leaders: once Broker 3 is back and caught up it rejoins the ISR, but partition 0 is still led by Broker 1 - leadership is not moved back to the preferred replica automatically:

Topic: test  PartitionCount: 3  ReplicationFactor: 2  Configs: retention.ms=86400000
	Topic: test  Partition: 0  Leader: 1  Replicas: 3,1  ISR: 1,3
	Topic: test  Partition: 1  Leader: 1  Replicas: 1,2  ISR: 1,2
	Topic: test  Partition: 2  Leader: 2  Replicas: 2,3  ISR: 2,3
Lost events
ACK levels
0 - don’t wait for a response from the leader
1 - only the leader has to respond
-1 - all replicas must be in sync

The trade-off: lower ACK levels buy speed, higher levels buy safety.
Event identification
Lost events
ERROR [Replica Manager on Broker 2]: Error when processing fetch request for partition [test,1] offset 10000 from consumer with correlation id 0. Possible cause:
Request for offset 10000 but we only have log segments in the range 8000 to 9000. (kafka.server.ReplicaManager)
Lost events

[Diagram: Producer writes to Broker 1 (leader, committed offset = 10000); Broker 2 (follower, committed offset = 9000); Zookeeper tracks the cluster state]

ACK = 1
Replication factor = 2
replica.lag.max.messages = 2000

With ACK = 1 the producer is acknowledged as soon as the leader has the message, so the follower may lag - here by 1000 messages, still within replica.lag.max.messages, so Broker 2 stays in the ISR.
When Broker 1 fails, Broker 2 is elected leader and the committed offset recorded in Zookeeper falls back to 9000 - the 1000 messages that only Broker 1 had acknowledged are lost, and a consumer asking for offset 10000 gets the "offset out of range" error shown earlier.
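The failure sequence can be simulated in a few lines — a toy model (not real Kafka) showing why ACK = 1 plus a lagging follower loses the acknowledged tail of the log:

```python
# Toy model of the scenario: the leader acknowledges writes with ACK = 1
# before the follower has replicated them, then the leader dies.

leader_log = list(range(10_000))    # Broker 1: offsets 0..9999 acknowledged
follower_log = list(range(9_000))   # Broker 2: replicated only up to 8999

lag = len(leader_log) - len(follower_log)
assert lag <= 2000  # within replica.lag.max.messages, so Broker 2 stays in the ISR

# Broker 1 fails; Broker 2 is elected leader with only its own copy of the log.
new_leader_log = follower_log

def fetch(log, offset):
    """Serve a consumer fetch the way a broker does."""
    if offset >= len(log):
        raise ValueError(f"Request for offset {offset} but we only have "
                         f"offsets up to {len(log) - 1}")
    return log[offset]

# A consumer that was acknowledged up to offset 9999 now asks for more:
try:
    fetch(new_leader_log, 10_000)
except ValueError as e:
    print(e)  # the 1000 acknowledged-but-unreplicated messages are gone
```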
Slow responses
[Chart: response time at the 75th, 99th and 99.9th percentiles]
Slow responses

[Chart: response-time percentiles (75%, 99%, 99.9%) against message size]

Is response time correlated to message size?
Slow responses

[Chart: response-time percentiles (75%, 99%, 99.9%) for a fixed message size]

Same distribution for fixed message size.
Slow responses

[Chart: response-time percentiles (75%, 99%, 99.9%)]

Hermes overhead is just about 1 ms.
Kafka

[Charts: response times on kernel 3.2.x versus kernel >= 3.8.x]

On kernel 3.2.x Kafka suffered slow responses; on kernel >= 3.8.x operation was normal.
Slow responses
Message size
Optimize message size

[Chart: message size at the 99.9th percentile - all topics versus the biggest topic]
Optimize message size
JSON
- human readable
- big memory and network footprint
- poor support for Hadoop
Optimize message size
JSON → Snappy
ERROR Error when sending message to topic t3 with key: 4 bytes, value: 100 bytes with error: The server experienced an unexpected error when processing the request (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)

java: target/snappy-1.1.1/snappy.cc:423: char* snappy::internal::CompressFragment(const char*, size_t, char*, snappy::uint16*, int): Assertion `0 == memcmp(base, candidate, matched)' failed.
Snappy produced errors when publishing a large number of messages.
Optimize message size
JSON → Snappy → LZ4

LZ4 failed on distributed data.

[Chart: compression ratio - single topic versus multiple topics]
Optimize message size
JSON → Snappy → LZ4 → Avro

Avro:
- small network footprint
- Hadoop friendly
- easy schema verification
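Avro itself needs a schema and the avro library, but the footprint argument can be illustrated with the standard library alone: when producer and consumer share a schema, field names disappear from the wire. Here `struct` stands in for Avro, and the event fields are made up:

```python
import json
import struct

# A hypothetical purchase event (field names are illustrative).
event = {"user_id": 123456, "item_id": 987654321, "price_cents": 4999, "quantity": 2}

# JSON is self-describing: every single message repeats the field names.
json_bytes = json.dumps(event).encode("utf-8")

# With a shared schema only the values travel - the idea behind Avro.
# Layout: two unsigned 64-bit ints, one unsigned 32-bit int, one unsigned 16-bit int.
SCHEMA = struct.Struct(">QQIH")
binary_bytes = SCHEMA.pack(event["user_id"], event["item_id"],
                           event["price_cents"], event["quantity"])

print(len(json_bytes), len(binary_bytes))  # the binary form is several times smaller
```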
Improvements
Multi data center
Consumer backoff
You can’t have exactly-once delivery
http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
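Since exactly-once delivery is impossible, a common workaround is at-least-once delivery plus an idempotent consumer. A sketch of deduplicating on a message ID — the "id" field is an assumption, since Kafka itself does not add one; the producer has to:

```python
# At-least-once delivery may redeliver messages; deduplicate on the consumer side.
seen_ids = set()
processed = []

def handle(message):
    """Process each distinct message at most once, even if it is redelivered."""
    if message["id"] in seen_ids:
        return False  # duplicate delivery - skip
    seen_ids.add(message["id"])
    processed.append(message["payload"])
    return True

deliveries = [
    {"id": 1, "payload": "a"},
    {"id": 2, "payload": "b"},
    {"id": 1, "payload": "a"},  # redelivery after a retry
]
for m in deliveries:
    handle(m)

print(processed)  # ['a', 'b']
```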
Kafka offsets
<=0.8.1 - Zookeeper
>=0.8.2 - Zookeeper or Kafka
>=0.9(?) - Kafka
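Wherever the offsets live — Zookeeper or Kafka itself — the consumer contract is the same: commit the offset only after processing, so a crash replays events instead of skipping them. A toy sketch with an in-memory stand-in for the offset store:

```python
log = ["e0", "e1", "e2", "e3", "e4"]  # one partition's messages
offset_store = {"my-group": 0}        # stands in for Zookeeper or the Kafka offsets topic

def consume(group, process):
    """At-least-once consumption: process first, commit the offset afterwards.
    A crash between the two steps replays the event rather than losing it."""
    offset = offset_store[group]
    while offset < len(log):
        process(log[offset])
        offset += 1
        offset_store[group] = offset  # commit AFTER processing

handled = []
consume("my-group", handled.append)
print(offset_store["my-group"])  # 5 - the next fetch starts after the last event
```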
Kafka Offset Monitor
Manage your topics
Improved security
Authentication and authorization interfaces provided
By Default:
You can create any topic in your group
You can publish everywhere (in progress)
Group owner defines subscriptions
Improved offset management
Turn back the time
PUT /groups/{group}/topics/{topic}/subscriptions/{subscription}/retransmission -8h
Blog: http://allegro.tech
Twitter: @allegrotechblog
Twitter: @debskichris