Introducing Exactly Once Semantics in Apache Kafka


Jason Gustafson, Guozhang Wang, Sriram Subramaniam, and Apurva Mehta


On deck..

• Kafka’s existing delivery semantics.

• Why did we improve them?

• What’s new?

• How do you use it?

• Summary.

Apache Kafka’s existing semantics

[Slides 4–12 step through diagrams of the existing delivery semantics; the images are not captured in this transcript.]

TL;DR – What we have today

• At-least-once, in-order delivery per partition.

• Producer retries can introduce duplicates.

Why improve?

• Stream processing is becoming an ever bigger part of the data landscape.

• Apache Kafka is the heart of the streams platform.

• Strengthening Kafka’s semantics expands the universe of streaming applications.

A motivating example..

A peer-to-peer lending platform which processes micro-loans between users.

A Peer-to-Peer Lender

The Basic Flow

Offset commits

Reprocessed transfer, eek!

Lost money! Eek eek!

[Slides 17–21 illustrate this scenario with diagrams; the images are not captured in this transcript.]

What’s new?

• Exactly-once, in-order delivery per partition

• Atomic writes across multiple partitions

• Performance considerations

What’s new, Part 1

Exactly-once, in-order delivery per partition

The idempotent producer

[Slides 25–32 step through diagrams of the idempotent producer protocol; the images are not captured in this transcript.]

TL;DR

• Sequence numbers and producer ids:

  • enable de-dup,

  • are in the log.

• Hence de-dup works transparently across leader changes.

• Will not de-dup application-level resends.

• Works transparently – no API changes (see the sketch below).
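As an illustration, a minimal sketch of enabling this on the Java producer – the broker address and topic name are placeholders, and the max.in.flight setting reflects the 0.11 client’s requirement:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");              // placeholder broker
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
props.put("enable.idempotence", "true");                       // broker de-dups retries via producer id + sequence numbers
props.put("max.in.flight.requests.per.connection", "1");       // required by the 0.11 client

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("transfers", "key", "value"));   // internal retries cannot duplicate
producer.close();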

What’s new, part 2

Multi-partition writes.

Introducing ‘transactions’

producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.sendOffsetsToTransaction(…);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    producer.close();
} catch (KafkaException e) {
    producer.abortTransaction();
}

[Slides 36–43 walk through the transaction protocol with diagrams: initializing ‘transactions’, transactional sends (parts 1 and 2), commit phase 1, commit phase 2, and success; the images are not captured in this transcript.]

Let’s review the APIs

producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.sendOffsetsToTransaction(…);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    producer.close();
} catch (KafkaException e) {
    producer.abortTransaction();
}

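To make the review concrete, here is a hedged end-to-end sketch of the consume-transform-produce loop these APIs enable. The topic names, ids, and broker address are invented for illustration, and it assumes a Java client new enough to have the Duration-based poll:

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.*;

Properties pp = new Properties();
pp.put("bootstrap.servers", "localhost:9092");           // placeholder
pp.put("transactional.id", "transfer-processor-1");      // must be unique per producer instance
pp.put("key.serializer", StringSerializer.class.getName());
pp.put("value.serializer", StringSerializer.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<>(pp);

Properties cp = new Properties();
cp.put("bootstrap.servers", "localhost:9092");           // placeholder
cp.put("group.id", "transfer-processors");               // placeholder group
cp.put("enable.auto.commit", "false");                   // offsets are committed inside the transaction
cp.put("isolation.level", "read_committed");
cp.put("key.deserializer", StringDeserializer.class.getName());
cp.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
consumer.subscribe(Collections.singletonList("transfers"));

producer.initTransactions();   // fences any zombie with the same transactional.id
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (records.isEmpty()) continue;
    try {
        producer.beginTransaction();
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> r : records) {
            producer.send(new ProducerRecord<>("processed-transfers", r.key(), r.value()));
            offsets.put(new TopicPartition(r.topic(), r.partition()),
                        new OffsetAndMetadata(r.offset() + 1));
        }
        producer.sendOffsetsToTransaction(offsets, "transfer-processors");   // offsets + outputs commit atomically
        producer.commitTransaction();
    } catch (ProducerFencedException e) {
        producer.close();              // another instance took over
        break;
    } catch (KafkaException e) {
        producer.abortTransaction();   // the group resumes from the last committed offsets
    }
}
consumer.close();

This is exactly the pattern that protects the micro-loan transfer above: either the processed transfer and its offset commit both land, or neither does.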


Consumer returns only committed messages

Some notes on consuming transactions

• Two ‘isolation levels’: read_committed, and read_uncommitted.

• Messages read in offset order.

• read_committed consumers read to the point where there are no open transactions.

TL;DR

• Transaction coordinator and transaction log maintain transaction state.

• Use the new producer APIs for transactions.

• Consumers can read only committed messages.

What’s new, part 3: Performance boost!

• Up to +20% producer throughput

• Up to +50% consumer throughput

• Up to -20% disk utilization

• Savings start when you batch

• Details: https://bit.ly/kafka-eos-perf


Too good to be true?

Let’s understand how!

The old message format

The new format

The new format -> new fields

The new format -> delta encoding

A visual comparison with 7 records, 10 bytes each

[Slides 55–60: format diagrams not captured in this transcript.]

TL;DR

• With a batch size of 2, the new format starts saving space.

• Savings are maximal for large batches of small messages.

• Hence higher throughput when IO bound.

• Works as soon as you upgrade to the new format.


Cool!

But how do I use this?

Producer Configs

• enable.idempotence = true

• max.in.flight.requests.per.connection = 1

• acks = “all”

• retries > 1 (preferably Integer.MAX_VALUE)

• transactional.id = ‘some unique id’
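Taken together, a hedged sketch of these settings on the Java producer (the transactional id and broker address are placeholders):

import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");            // placeholder
props.put("enable.idempotence", "true");
props.put("max.in.flight.requests.per.connection", "1");
props.put("acks", "all");
props.put("retries", Integer.toString(Integer.MAX_VALUE));
props.put("transactional.id", "transfer-processor-1");       // some unique id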

Consumer configs

• isolation.level:

  • “read_committed”, or

  • “read_uncommitted” (the default)
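For example, on the standard Java consumer (consumerProps being the Properties object built elsewhere):

consumerProps.put("isolation.level", "read_committed");   // the default is "read_uncommitted"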

Streams config

• processing.guarantee = “exactly_once”
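A hedged sketch using the Java StreamsConfig constants (application id and broker address are placeholders):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "p2p-lender");          // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");  // or StreamsConfig.EXACTLY_ONCE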

Putting it together

• We reviewed Kafka’s existing delivery semantics.

• Understood why we wanted to improve them.

• Learned how they have been strengthened.

• Learned how the new semantics work.


When is it available?

Available to try in Kafka 0.11, June 2017.


Thank You!
