Kafka Needs No Keeper
Colin McCabe
QCon SF

Introduction

- Kafka has gotten a lot of mileage out of ZooKeeper
- But it is still a second system
- KIP-500 has been adopted by the community
- This is not a 1:1 replacement
- We've been headed in this direction for years

Evolution of Apache Kafka Clients

- Producer: writes to topics
- Consumer: reads from topics; offset fetch/commit; group partition assignment
- Admin Tools: topic create/delete

Consumer Group Coordinator

- The consumer reads from topics, fetches and commits offsets, and receives group partition assignments from the group coordinator
- Committed offsets are stored in the internal __offsets topic
- Consumer APIs: Fetch, OffsetCommit, OffsetFetch, JoinGroup, SyncGroup, Heartbeat
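The JoinGroup/SyncGroup round ultimately assigns the group's partitions to its members. As a rough illustration, a range-style assignment can be sketched like this (a toy model, not the actual client-side assignor code):

```python
def range_assign(members, partitions):
    """Assign partitions to members in contiguous ranges,
    in the spirit of Kafka's range assignor."""
    members = sorted(members)
    n, m = len(partitions), len(members)
    per, extra = divmod(n, m)
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment

print(range_assign(["c1", "c2"], [0, 1, 2]))
# {'c1': [0, 1], 'c2': [2]}
```

With three partitions and two consumers, the first consumer gets two partitions and the second gets one; re-running the function with a changed member list models a rebalance.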

Kafka Security and the AdminClient

- Admin tools used to create and delete topics by talking to ZooKeeper directly, bypassing the ACL enforcement that lives in the brokers
- The AdminClient sends these requests to the brokers instead, so ACL enforcement applies
- Admin APIs: CreateTopics, DeleteTopics, AlterConfigs, ...
- The client APIs now cover the full surface: Produce, Fetch, Metadata, CreateTopics, DeleteTopics, ...
- Benefits: encapsulation, security, validation, compatibility

Inter-Broker Communication

- The controller is responsible for broker registration, ACL management, dynamic configuration, ISR management, and controller election
- Controller APIs: LeaderAndIsr, UpdateMetadata, StopReplica, AlterIsr
- The controller pushes leader/ISR changes, metadata updates, and stop/delete-replica commands to the brokers
- With AlterIsr, brokers report ISR changes to the controller through an API instead of writing to ZooKeeper
- Benefits: encapsulation, compatibility, ownership

Broker Liveness

- Today, broker liveness is tied to a ZooKeeper session
- Each broker registers an ephemeral znode, e.g. /brokers/1 -> { host: 10.10.10.1:9092, rack: rack-1 }
- When the session expires, the znode goes away and a watch triggers: Broker 1 is offline

Network Partition Resilience

- Case 1: Total partition
- Case 2: Broker partition
- Case 3: ZK partition
- Case 4: Controller partition

Metadata Inconsistency

- ZooKeeper is the metadata source of truth
- The controller keeps a metadata cache: synchronous writes to ZooKeeper, asynchronous updates out to the brokers
- Each broker keeps its own metadata cache, updated asynchronously
- These caches can drift out of sync with the source of truth

Last Resort:
> rmr /controller

- A new controller is elected
- The new controller loads ALL metadata from ZooKeeper
- Then it pushes ALL metadata to the brokers
- But how do you know the metadata has diverged?

Performance of Controller Initialization

- A new controller is elected
- It loads ALL metadata: complexity O(N), N = number of partitions
- It pushes ALL metadata: complexity O(N*M), N = number of partitions, M = number of brokers
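To get a feel for the O(N) and O(N*M) costs, a back-of-the-envelope model (the numbers are illustrative, not measurements):

```python
def controller_failover_cost(num_partitions, num_brokers):
    """Rough unit-of-work model of old-style controller failover:
    loading metadata is O(N), pushing it to every broker is O(N*M)."""
    load = num_partitions                  # read each partition's state once
    push = num_partitions * num_brokers    # send each partition's state to each broker
    return load, push

# A cluster with 100,000 partitions and 50 brokers:
load, push = controller_failover_cost(100_000, 50)
print(load, push)  # 100000 5000000
```

The push phase dominates: doubling the broker count doubles the failover work even if no partition changed.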

Metadata as an Event Log

- Each change becomes a message
- Changes are propagated to all brokers
- Clear ordering
- Can send deltas
- Offset tracks consumer position
- Easy to measure lag

...
924 Create topic "foo"
925 Delete topic "bar"
926 Add node 4 to the cluster
927 Create topic "baz"
928 Alter ISR for "foo-0"
929 Add node 5 to the cluster
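Because the log is ordered, replaying it deterministically rebuilds the cluster metadata. A minimal sketch (the event encoding here is invented for illustration):

```python
def replay(events):
    """Fold an ordered metadata event log into current cluster state."""
    topics, nodes = set(), set()
    for offset, kind, arg in events:
        if kind == "create_topic":
            topics.add(arg)
        elif kind == "delete_topic":
            topics.discard(arg)
        elif kind == "add_node":
            nodes.add(arg)
    return topics, nodes

log = [
    (924, "create_topic", "foo"),
    (925, "delete_topic", "bar"),
    (926, "add_node", 4),
    (927, "create_topic", "baz"),
    (929, "add_node", 5),
]
topics, nodes = replay(log)
print(sorted(topics), sorted(nodes))  # ['baz', 'foo'] [4, 5]
```

A broker's lag is then just the log end offset minus the last offset it has applied, which makes it easy to tell when a replica has fallen behind.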

- Ordinary consumers each track their own position in a topic with an offset (offset=1, offset=2, offset=3)
- Brokers can consume the metadata log from the controller in the same way, each tracking its own offset

Implementing the Controller Log

Can we use the existing Kafka log replication protocol?
- How do we elect the leader?

We need a self-managed quorum.

Enter Raft. Leader election is by simple majority.
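The "simple majority" rule can be stated in one line (a toy check, not the KRaft election protocol):

```python
def wins_election(votes_received, cluster_size):
    """A candidate becomes leader once a strict majority of the
    cluster (counting its own vote) has voted for it."""
    return votes_received > cluster_size // 2

assert wins_election(2, 3)       # 2 of 3 votes: majority, becomes leader
assert not wins_election(2, 5)   # 2 of 5 votes: not enough
assert wins_election(3, 5)
```

Because any two majorities of the same cluster overlap, at most one leader can win a given term, which is what removes the need for ZooKeeper to arbitrate.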

Kafka vs. Raft:

                        Kafka                            Raft
  Writes                Single leader                    Single leader
  Fencing               Monotonically increasing epoch   Monotonically increasing term
  Log reconciliation    Offset and epoch                 Term and index
  Push/Pull             Pull                             Push
  Commit semantics      ISR                              Majority
  Leader election       From ISR, through ZooKeeper      Majority

The Controller Quorum

The Controller Raft Quorum
- The leader is the active controller
- Controls reads/writes to the log
- Typically 3 or 5 nodes, like ZK

Instant Failover
- Low-latency failover via Raft election
- Standbys contain all data in memory
- Brokers do not need to re-fetch

Metadata Caching
- Brokers can persist metadata to disk (e.g. /mnt/logs/kafka/metadata)
- Only fetch what they need
- Use snapshots if they're too far behind

Broker Registration
- The controller builds a map of the cluster: what brokers exist, and how they can be reached
- Brokers send heartbeats to the active controller
- The controller uses these heartbeats to build its map of the cluster
- The controller also tells brokers if they should be fenced or shut down

Fencing
- Brokers need to be fenced if they're partitioned from the controller, or can't keep up
- Brokers self-fence if they can't talk to the controller
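The self-fencing rule amounts to a timeout check. A sketch with an invented timeout value (the real one is a broker configuration):

```python
FENCE_TIMEOUT_MS = 9_000  # hypothetical value; the real timeout is configurable

def should_self_fence(last_ok_heartbeat_ms, now_ms, timeout_ms=FENCE_TIMEOUT_MS):
    """A broker fences itself once it has gone too long without a
    successful heartbeat to the active controller."""
    return now_ms - last_ok_heartbeat_ms > timeout_ms

assert not should_self_fence(0, 5_000)   # recent heartbeat: stay active
assert should_self_fence(0, 10_000)      # heartbeats failing: stop serving
```

The point of self-fencing is symmetry: the controller fences a silent broker, and the broker, seeing the same silence from its side, stops serving clients rather than acting on stale metadata.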

Handling Network Partitions

- Case 1: Total partition
- Case 2: Broker partition
- Case 3: Controller partition

Deployment:

                          Current                                     KIP-500
  Configuration file      Kafka and ZooKeeper                         Kafka
  Metrics                 Kafka and ZK                                Kafka
  Administrative tools    ZK shell, four-letter words, Kafka tools    Kafka tools
  Security                Kafka and ZK                                Kafka

Shared Controller Nodes
- Fewer resources used
- Single-node clusters (eventually)

Separate Controller Nodes
- Better resource isolation
- Good for big clusters

Roadmap

1. Remove client-side ZK dependencies
   - Incremental KIP-4 improvements: create new APIs, deprecate direct ZK access
2. Remove broker-side ZK dependencies
   - Remove deprecated direct ZK access for tools
   - Create broker-side APIs
   - Centralize ZK access in the controller
3. Controller quorum
   - First release without ZooKeeper: Raft, controller quorum

Upgrading directly from an older Kafka release to the KIP-500 release raises issues: tools using ZK, brokers accessing ZK, and state in ZK.

Bridge Release

- The bridge release sits between older Kafka releases and the KIP-500 release
- In the bridge release there is no ZK access from tools or brokers (except the controller)

Upgrading
- Start from the bridge release
- Start new controller nodes (possibly combined with broker nodes); the quorum elects a leader, which claims leadership in ZK
- Roll nodes one by one as usual; the controller continues sending LeaderAndIsr, etc. to old nodes
- When all brokers have been rolled, decommission the ZK nodes

Conclusion

Apache ZooKeeper has served us well
- KIP-500 is not a 1:1 replacement, but a different paradigm

We have already started removing ZK from clients
- Consumer, AdminClient
- Improved encapsulation, security, upgradability

Metadata should be managed as a log
- Deltas, ordering, caching
- Controller failover, fencing
- Improved scalability, robustness, easier deployment

The metadata log must be self-managed
- Raft
- Controller quorum

It will take a few releases to implement KIP-500
- Additional KIPs for APIs, Raft, metadata, etc.

Rolling upgrades will be supported
- Bridge release
- Post-ZK release

Kafka needs no Keeper

THANK YOU
Colin McCabe (cmccabe@confluent.io)
cnfl.io/meetups | cnfl.io/blog | cnfl.io/slack
