ORGANIZATION NAME ©2013 LinkedIn Corporation. All Rights Reserved.
Kafka at Scale: Multi-Tier Architectures
Todd Palino
Staff Site Reliability Engineer
LinkedIn, Data Infrastructure Streaming
You may remember me from such talks as “Apache Kafka Meetup” and “Enterprise Kafka: QoS and Multitenancy”
Who Am I?
§ Kafka, Samza, and Zookeeper SRE at LinkedIn
§ Site Reliability Engineering
  – Administrators
  – Architects
  – Developers
§ Keep the site running, always
What Will We Talk About?
§ Tiered Cluster Architecture
§ Kafka Mirror Maker
§ Performance Tuning
§ Data Assurance
§ What’s Next?
Kafka At LinkedIn
§ 300+ Kafka brokers
§ Over 18,000 topics
§ 140,000+ Partitions
§ 220 Billion messages per day
§ 40 Terabytes In
§ 160 Terabytes Out
§ Peak Load
  – 3.25 Million messages/sec
  – 5.5 Gigabits/sec Inbound
  – 18 Gigabits/sec Outbound
§ 1100+ Kafka brokers
§ Over 31,000 topics
§ 350,000+ Partitions
§ 675 Billion messages per day
§ 150 Terabytes In
§ 580 Terabytes Out
§ Peak Load
  – 10.5 Million messages/sec
  – 18.5 Gigabits/sec Inbound
  – 70.5 Gigabits/sec Outbound
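As a rough sanity check on these figures, the daily totals can be converted into average rates and compared with the stated peaks. The sketch below uses only the numbers from the slide; the function names are illustrative, and the "fan-out" ratio is simply bytes out divided by bytes in.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(messages_per_day: float) -> float:
    """Average messages per second implied by a daily total."""
    return messages_per_day / SECONDS_PER_DAY


def fan_out(terabytes_in: float, terabytes_out: float) -> float:
    """Ratio of outbound to inbound volume."""
    return terabytes_out / terabytes_in


# Figures taken from the two sets of numbers on the slide.
for label, per_day, peak, tb_in, tb_out in [
    ("first set", 220e9, 3.25e6, 40, 160),
    ("second set", 675e9, 10.5e6, 150, 580),
]:
    avg = average_rate(per_day)
    print(f"{label}: avg {avg / 1e6:.2f}M msg/s, "
          f"peak/avg {peak / avg:.2f}, fan-out {fan_out(tb_in, tb_out):.2f}")
```

In both sets the outbound volume is roughly four times the inbound volume, which is consistent with every message being read several times by consumers and mirror makers.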
Tiered Cluster Architecture
One Kafka Cluster
Single Cluster – Remote Clients
Multiple Clusters – Local and Remote Clients
Multiple Clusters – Message Aggregation
Why Not Direct?
§ Network Concerns
  – Bandwidth
  – Network partitioning
  – Latency
§ Security Concerns
  – Firewalls and ACLs
  – Encrypting data in transit
§ Resource Concerns
  – A misbehaving application can swamp production resources
Kafka Mirror Maker
Kafka Mirror Maker
§ Consumes from one cluster, produces to another
§ No communication from producer back to consumer
§ Best practice is to keep the mirror maker local to the target cluster
§ Kafka does not prevent loops
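Because Kafka does not prevent loops, it can help to check a planned mirroring topology before deploying it. A minimal sketch, assuming the topology for a topic is described as (source, target) cluster pairs; the cluster names and the `find_loop` helper are hypothetical and not part of Kafka.

```python
from collections import defaultdict


def find_loop(mirrors):
    """Return a list of clusters forming a mirroring loop, or None.

    `mirrors` is an iterable of (source_cluster, target_cluster) pairs.
    """
    graph = defaultdict(list)
    for source, target in mirrors:
        graph[source].append(target)

    visiting, done = set(), set()

    def visit(cluster, path):
        if cluster in visiting:
            # We came back to a cluster on the current path: a loop.
            return path[path.index(cluster):] + [cluster]
        if cluster in done:
            return None
        visiting.add(cluster)
        for nxt in graph[cluster]:
            loop = visit(nxt, path + [cluster])
            if loop:
                return loop
        visiting.discard(cluster)
        done.add(cluster)
        return None

    for start in list(graph):
        loop = visit(start, [])
        if loop:
            return loop
    return None


# A two-tier layout (local clusters feeding aggregates) has no loop.
tiered = [("local-a", "agg-a"), ("local-b", "agg-a"),
          ("local-a", "agg-b"), ("local-b", "agg-b")]
print(find_loop(tiered))
```

Mirroring an aggregate cluster back into one of its sources would make `find_loop` return the offending path instead of `None`.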
Rules of Aggregation
§ NEVER produce to aggregate clusters
NEVER produce to aggregate clusters!
Rules of Aggregation
§ NEVER produce to aggregate clusters
§ Not every topic needs to be aggregated
  – Log compacted topics do not play nice
  – Most queuing topics are local only
§ But your whitelist/blacklist configurations must be consistent
  – If you have a topic that is aggregated, make sure to do it from all source clusters to all aggregate clusters
§ Carefully consider if you want front-line aggregate clusters
  – It can encourage creating single-master services
  – Sometimes it is necessary, such as for search services
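The whitelist consistency rule can be checked mechanically. A minimal sketch, assuming each mirror maker's whitelist is available as a set of topic names keyed by its (source, aggregate) pair; this data layout and the `inconsistent_topics` helper are assumptions for illustration, not a Kafka API.

```python
def inconsistent_topics(whitelists, sources, aggregates):
    """Map each partially aggregated topic to the (source, aggregate)
    pairs that are missing it.

    `whitelists` maps (source, aggregate) -> set of mirrored topic names.
    """
    all_pairs = [(s, a) for s in sources for a in aggregates]
    aggregated = set().union(*(whitelists.get(p, set()) for p in all_pairs))
    problems = {}
    for topic in sorted(aggregated):
        missing = [p for p in all_pairs if topic not in whitelists.get(p, set())]
        if missing:
            problems[topic] = missing
    return problems


# Hypothetical example: "t2" is mirrored from dc1 but not from dc2.
example = {
    ("dc1", "agg1"): {"t1", "t2"},
    ("dc2", "agg1"): {"t1"},
}
print(inconsistent_topics(example, ["dc1", "dc2"], ["agg1"]))
```

An empty result means every aggregated topic flows from all source clusters to all aggregate clusters.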
Mirror Maker Concerns
§ Adding a site increases the number of mirror maker instances
  – Solution: Multi-consumer mirror makers
§ Mirror maker can lose messages like any producer
  – Solution: reduce in-flight batches and set acks=-1
§ Mirror maker has to decompress and recompress every batch
  – Possible solution: flag compressed batches for keyed messages
§ Message partitions are not preserved
  – Possible solution: an identity mirror maker
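The message-loss mitigation maps onto standard Kafka producer settings. The sketch below renders a producer properties fragment: `acks`, `max.in.flight.requests.per.connection`, and `retries` are real producer configuration keys, while the helper name and the default `retries` value are assumptions to adjust for your environment.

```python
def no_loss_producer_properties(retries: int = 2147483647) -> str:
    """Render producer settings that favor durability over throughput."""
    settings = {
        # Wait for all in-sync replicas to acknowledge each batch.
        "acks": "-1",
        # One batch in flight at a time, so a retry cannot reorder
        # or silently skip past a failed batch.
        "max.in.flight.requests.per.connection": "1",
        # Assumed value: keep retrying rather than dropping a batch.
        "retries": str(retries),
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print(no_loss_producer_properties())
```

These settings trade throughput for safety, so they are worth pairing with the multiple-stream tuning discussed under mirror maker sizing.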
Performance Tuning
Kafka Cluster Sizing
§ How big for your local cluster?
  – How much disk space do you have?
  – How much network bandwidth do you have?
  – CPU, memory, disk I/O
§ How big for your aggregate cluster?
  – In general, multiply the number of brokers by the number of local clusters
  – May have additional concerns with lots of consumers
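The sizing questions above can be turned into a back-of-the-envelope calculation. A minimal sketch, assuming disk and network are the limiting resources; the per-broker capacities and the example numbers are placeholders, not LinkedIn's figures, and CPU, memory, and disk I/O are left out.

```python
import math


def local_brokers(retained_tb, replication, disk_tb_per_broker,
                  peak_in_gbps, usable_gbps_per_broker):
    """Brokers needed for a local cluster, limited by disk or network."""
    by_disk = math.ceil(retained_tb * replication / disk_tb_per_broker)
    by_network = math.ceil(peak_in_gbps / usable_gbps_per_broker)
    return max(by_disk, by_network)


def aggregate_brokers(local_broker_count, local_cluster_count):
    """Rule of thumb from the slide: local brokers times local clusters."""
    return local_broker_count * local_cluster_count


# Hypothetical inputs: 10 TB retained, 3 replicas, 8 TB of disk per broker,
# 5 Gbit/s peak inbound, 2 Gbit/s usable per broker.
local = local_brokers(10, 3, 8, 5, 2)
print(local, aggregate_brokers(local, 3))
```

The aggregate figure is a starting point only; clusters with many consumers may need more brokers than the multiplication suggests.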
Topic Configuration
§ Partition Counts for Local
  – Many theories on how to do this correctly, but the answer is “it depends”
  – How many consumers do you have?
  – Do you have specific partition requirements?
  – Keeping partition sizes manageable
§ Partition Counts for Aggregate
  – Multiply the number of partitions in a local cluster by the number of local clusters
  – Periodically review partition counts in all clusters
§ Message Retention
  – If aggregate is where you really need the messages, only retain them in local for long enough to cover mirror maker problems
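The partition and retention guidance can be expressed as two small calculations. A minimal sketch: `retention.ms` is the real topic-level setting that the second result would feed, while the outage estimate and safety factor are assumptions you would choose yourself.

```python
def aggregate_partitions(local_partitions: int, local_clusters: int) -> int:
    """Rule of thumb from the slide: local partitions times local clusters."""
    return local_partitions * local_clusters


def local_retention_ms(worst_outage_hours: float,
                       safety_factor: float = 2.0) -> int:
    """Retention for a local topic, sized to outlast a mirror maker outage.

    Both arguments are assumptions: how long mirroring might be down,
    and how much margin to add on top of that.
    """
    return int(worst_outage_hours * safety_factor * 3600 * 1000)


# Hypothetical example: 8 partitions locally, 4 local clusters,
# and a worst-case mirror maker outage of 12 hours.
print(aggregate_partitions(8, 4))
print(f"retention.ms={local_retention_ms(12)}")
```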
Mirror Maker Sizing
§ Number of servers and streams
  – Size the number of servers based on the peak bytes per second
  – Co-locate mirror makers
  – Run more mirror makers in an instance than you need
  – Use multiple consumer and producer streams
§ Other tunables to look at
  – Partition assignment strategy
  – In flight requests per connection
  – Linger time
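Sizing by peak bytes per second amounts to a ceiling division with some headroom. A minimal sketch, assuming you have measured what a single mirror maker server can sustain; the `headroom` fraction and the example throughput figures are hypothetical.

```python
import math


def mirror_maker_servers(peak_bytes_per_sec: float,
                         bytes_per_sec_per_server: float,
                         headroom: float = 0.25) -> int:
    """Servers needed to carry the peak while keeping spare capacity.

    `headroom` is the fraction of each server's throughput held in
    reserve for catching up after a restart or an outage.
    """
    usable = bytes_per_sec_per_server * (1 - headroom)
    return max(1, math.ceil(peak_bytes_per_sec / usable))


# Hypothetical: 500 MB/s peak, 100 MB/s per server, 25% held in reserve.
print(mirror_maker_servers(500e6, 100e6))
```

Sizing against the peak rather than the average leaves capacity to catch up when the mirror maker falls behind.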
Segregation of Topics
§ Not all topics are created equal
§ High Priority Topics
  – Topics that change search results
  – Topics used for hourly or daily reporting
§ Run a separate mirror maker for these topics
  – One bloated topic won’t affect reporting
  – Restarting the mirror maker takes less time
  – Less time to catch up when you fall behind
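One way to run a separate mirror maker for high-priority topics is to give each instance its own whitelist pattern. A minimal sketch, assuming the whitelist is supplied as a regular expression of topic names; the topic names and the `split_whitelists` helper are hypothetical.

```python
import re


def split_whitelists(topics, high_priority):
    """Return (high_priority_regex, default_regex) for two mirror makers."""
    high = sorted(t for t in topics if t in high_priority)
    rest = sorted(t for t in topics if t not in high_priority)

    def to_regex(names):
        # Escape each name so that dots in topic names match literally.
        return "|".join(re.escape(name) for name in names)

    return to_regex(high), to_regex(rest)


topics = ["search_updates", "page_views", "hourly_report"]
high_regex, default_regex = split_whitelists(
    topics, {"search_updates", "hourly_report"})
print(high_regex)
print(default_regex)
```

Keeping the two patterns disjoint ensures that no topic is mirrored twice into the same target cluster.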
Data Assurance
Monitoring
§ Kafka is great for monitoring your applications
Monitoring
§ Have a system for monitoring Kafka components that does not use Kafka
  – At least for critical metrics
§ For tiered architectures
  – Simple health check on mirror maker instances
  – Mirror maker consumer lag
§ Is the data intact?
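Mirror maker consumer lag is the gap between the newest offset in the source cluster and the offset the mirror maker has committed. A minimal sketch of that calculation, assuming both sets of offsets have already been fetched into dictionaries keyed by (topic, partition); how they are fetched, and the alert threshold, are left as assumptions.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus committed offset.

    A partition with no committed offset is treated as fully unread
    (an assumption; a real check might start from the earliest offset).
    """
    return {
        tp: end - committed_offsets.get(tp, 0)
        for tp, end in log_end_offsets.items()
    }


def lagging_partitions(lag, threshold):
    """Partitions whose lag exceeds the alert threshold."""
    return sorted(tp for tp, value in lag.items() if value > threshold)


# Hypothetical offsets for a two-partition topic.
lag = consumer_lag({("t", 0): 100, ("t", 1): 50}, {("t", 0): 90})
print(lag, lagging_partitions(lag, 20))
```

Lag shows whether the mirror maker is keeping up; it does not show whether every message arrived, which is the question the auditing slides address.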
Auditing Message Flows
Audit Content
§ Message audit header
  – Timestamp
  – Service and hostname
§ Audit messages
  – Start and end timestamps
  – Topic and tier
  – Count
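The audit message fields listed above suggest a simple shape: count the messages seen at a tier per topic and time bucket, using the timestamp from each message's audit header. A minimal sketch under that reading; the class, the helper, and the 10-minute bucket size are assumptions, not the actual audit implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditMessage:
    topic: str
    tier: str
    start_ts: int  # bucket start, seconds since epoch
    end_ts: int    # bucket end, seconds since epoch
    count: int


def build_audit_messages(events, tier, bucket_seconds=600):
    """Summarize (topic, header_timestamp) events seen at one tier."""
    counts = Counter()
    for topic, ts in events:
        bucket_start = ts - ts % bucket_seconds
        counts[(topic, bucket_start)] += 1
    return [
        AuditMessage(topic, tier, start, start + bucket_seconds, n)
        for (topic, start), n in sorted(counts.items())
    ]


# Three messages on one topic, falling into two buckets.
msgs = build_audit_messages([("t", 5), ("t", 599), ("t", 600)], "local")
print(msgs)
```

Bucketing on the header timestamp rather than on arrival time means a message lands in the same bucket at every tier, which is what makes the counts comparable.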
Audit Concerns
§ We are only counting messages
  – Duplication of messages can hide losses
  – Using the detailed service and host audit criteria, we can get around this
§ We can’t audit all consumers
  – The relational DB has issues keeping up with bootstrapping clients
  – This can be improved with changes to the database backend
§ We cannot handle complex message flows
  – The total number of messages has to appear in each tier that the topic is in
  – Multiple source clusters must have the same tier name
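The count-only limitation is easy to see in a comparison of per-tier counts. A minimal sketch, assuming counts have been collected per tier as dictionaries keyed by (topic, bucket start); the `tier_discrepancies` helper and the tier names are hypothetical.

```python
def tier_discrepancies(counts_by_tier, reference_tier):
    """Difference from the reference tier for every mismatched count.

    `counts_by_tier` maps tier -> {(topic, bucket_start): count}.
    Negative values mean messages are missing; positive values mean
    extra (duplicated) messages.
    """
    reference = counts_by_tier[reference_tier]
    report = {}
    for tier, counts in counts_by_tier.items():
        if tier == reference_tier:
            continue
        for key, expected in reference.items():
            seen = counts.get(key, 0)
            if seen != expected:
                report[(tier, *key)] = seen - expected
    return report


# Two messages missing from the aggregate tier in one bucket.
observed = {
    "local": {("t", 0): 100},
    "aggregate": {("t", 0): 98},
}
print(tier_discrepancies(observed, "local"))
```

If the aggregate tier also reported 100, the result would be empty even when one message was lost and another duplicated, which is exactly how duplication can hide losses.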
Conclusion
Work Needed in Kafka
§ Access controls
§ Encryption
§ Quotas
§ Decompression improvements in mirror maker
Getting Involved With Kafka
§ http://kafka.apache.org
§ Join the mailing lists
  – [email protected]
  – [email protected]
§ irc.freenode.net - #apache-kafka
§ Meetups
  – Apache Kafka - http://www.meetup.com/http-kafka-apache-org
  – Bay Area Samza - http://www.meetup.com/Bay-Area-Samza-Meetup/
§ Contribute code