The Best of Apache Kafka Architecture
Ranganathan Balashanmugam
@ran_than
Apache: Big Data 2015
Hello, Budapest
About Me
❏ Graduated as a civil engineer.
❏ <dev> 10+ years </dev>
❏ <Thoughtworker from="India"/>
❏ Organizer of the Hyderabad Scalability Meetup, with 2000+ members.
“Form follows function.”
- Louis Sullivan
Gravity Dam: Indirasagar Dam, India
img src: http://www.montanhydraulik.in
Forces on a gravity dam:
❏ Dam weight
❏ Headwater
❏ Tailwater
❏ Uplift
❏ publish-subscribe messaging service
❏ distributed commit/write-ahead log
“Producers produce, consumers consume, in a large, distributed, reliable way, in real time.”
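The two bullets above describe one structure: a log that producers append to and consumers read from by offset. A minimal in-memory Java sketch of that idea follows; the class InMemoryCommitLog and its methods are hypothetical, not Kafka's API, and a real Kafka log lives on disk and is partitioned and replicated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a commit log. Producers append to the end;
// each consumer keeps its own read position (offset), so reading a message does
// not remove it and several consumers can read the same log independently.
final class InMemoryCommitLog {
    private final List<String> entries = new ArrayList<>();

    // Appends a message to the end of the log and returns its offset.
    synchronized long append(String message) {
        entries.add(message);
        return entries.size() - 1;
    }

    // Returns up to maxMessages messages, starting at the given offset.
    synchronized List<String> read(long offset, int maxMessages) {
        if (offset < 0) throw new IllegalArgumentException("negative offset");
        int from = (int) Math.min(offset, entries.size());
        int to = (int) Math.min((long) from + maxMessages, entries.size());
        return new ArrayList<>(entries.subList(from, to));
    }
}
```

Two consumers can call read(0, 10) and read(2, 10) on the same log without affecting each other, which is the difference between a commit log and a queue that deletes messages once they are delivered.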
Why Kafka?
❏ DBs
❏ Logs
❏ Brokers
❏ HDFS
“For highly distributed messages, Kafka stands out.”
Kafka Vs ________
src: https://softwaremill.com/mqperf/
Timeline
❏ 2011: Open-sourced by LinkedIn, as version 0.6
❏ 2012: Graduated from the Apache Incubator
❏ 2014: Several engineers who built Kafka create Confluent
❏ 2015: Latest stable release is 0.8.2.1
A Kafka Message
kafka.message.Message layout: CRC | magic | attributes | key length | key | message length | message content
Change requested: KAFKA-2511
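Assuming the message format with magic byte 0 and the field order on the slide, a message can be serialized with java.nio.ByteBuffer and checked with java.util.zip.CRC32. This is a hedged sketch: MessageV0Sketch is a hypothetical name, and the real kafka.message.Message class also handles null values and compressed wrapper messages.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch of the slide's layout, assuming magic byte 0:
// CRC (int32) | magic (int8) | attributes (int8) | key length (int32) | key
// | message length (int32) | message content.
// The CRC covers every byte after the CRC field; a key length of -1 means no key.
final class MessageV0Sketch {
    static ByteBuffer encode(byte[] key, byte[] value, byte attributes) {
        int keyLength = key == null ? -1 : key.length;
        int size = 4 + 1 + 1 + 4 + Math.max(keyLength, 0) + 4 + value.length;
        ByteBuffer buf = ByteBuffer.allocate(size);   // big-endian by default
        buf.position(4);                  // leave room for the CRC
        buf.put((byte) 0);                // magic byte
        buf.put(attributes);              // low bits carry the compression codec
        buf.putInt(keyLength);
        if (key != null) buf.put(key);
        buf.putInt(value.length);
        buf.put(value);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 4, size - 4);
        buf.putInt(0, (int) crc.getValue());  // absolute write at the start
        buf.rewind();
        return buf;
    }

    // Recomputes the CRC over the bytes after the CRC field and compares it.
    static boolean crcMatches(ByteBuffer message) {
        CRC32 crc = new CRC32();
        crc.update(message.array(), 4, message.limit() - 4);
        return message.getInt(0) == (int) crc.getValue();
    }
}
```

A broker or consumer that recomputes the CRC can detect a corrupted message before handing it on.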
Producers - push
Diagram: producers push messages to a Kafka broker.
org.apache.kafka.clients.producer.KafkaProducer
Request => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]
Response => [TopicName [Partition ErrorCode Offset]]
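Following the request grammar above, a produce request body for one topic and one partition can be laid out with ByteBuffer, using the protocol's conventions of big-endian integers, int16-length-prefixed strings and int32-count-prefixed arrays. ProduceRequestSketch is hypothetical, and the common request header (API key, version, correlation id, client id) is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder for the body of a produce request with one topic and
// one partition:
//   RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]
final class ProduceRequestSketch {
    static ByteBuffer encode(short requiredAcks, int timeoutMs, String topic,
                             int partition, byte[] messageSet) {
        byte[] topicBytes = topic.getBytes(StandardCharsets.UTF_8);
        int size = 2 + 4                      // RequiredAcks, Timeout
                 + 4 + 2 + topicBytes.length  // topic array count, TopicName
                 + 4 + 4                      // partition array count, Partition
                 + 4 + messageSet.length;     // MessageSetSize, MessageSet
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putShort(requiredAcks);  // 0 = no response, 1 = leader, -1 = all in-sync replicas
        buf.putInt(timeoutMs);
        buf.putInt(1);               // one topic
        buf.putShort((short) topicBytes.length);
        buf.put(topicBytes);
        buf.putInt(1);               // one partition
        buf.putInt(partition);
        buf.putInt(messageSet.length);
        buf.put(messageSet);
        buf.flip();
        return buf;
    }
}
```

The response mirrors the same nesting, returning an error code and the assigned base offset for each partition.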
Topic
A topic holds a growing number of messages; old messages are removed based on time or size.
kafka.common.Topic
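Retention can be modeled as deleting whole log segments, oldest first, once they exceed a time or size limit (Kafka exposes these as broker settings such as log.retention.hours and log.retention.bytes). The sketch below is a simplified, hypothetical model: the Segment class and the choice to check both rules in one pass are assumptions, not Kafka's own implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of time- and size-based retention. A partition's log is a
// list of segments ordered oldest first; whole segments are deleted from the
// front while they are too old or the log is over its size limit.
final class RetentionSketch {
    static final class Segment {
        final long baseOffset;      // offset of the first message in the segment
        final long lastModifiedMs;  // when the segment was last written
        final long sizeBytes;

        Segment(long baseOffset, long lastModifiedMs, long sizeBytes) {
            this.baseOffset = baseOffset;
            this.lastModifiedMs = lastModifiedMs;
            this.sizeBytes = sizeBytes;
        }
    }

    // retentionBytes < 0 disables the size rule.
    static List<Segment> segmentsToDelete(List<Segment> segments, long nowMs,
                                          long retentionMs, long retentionBytes) {
        long total = 0;
        for (Segment s : segments) total += s.sizeBytes;
        List<Segment> toDelete = new ArrayList<>();
        // Stop before the last (active) segment, which is still being appended to.
        for (int i = 0; i < segments.size() - 1; i++) {
            Segment s = segments.get(i);
            boolean tooOld = nowMs - s.lastModifiedMs > retentionMs;
            boolean overSize = retentionBytes >= 0 && total - s.sizeBytes >= retentionBytes;
            if (!tooOld && !overSize) break;  // newer segments are kept as well
            toDelete.add(s);
            total -= s.sizeBytes;
        }
        return toDelete;
    }
}
```

Deleting whole segment files, rather than individual messages, keeps retention cheap: it is a file delete, not a rewrite.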
Partitions
kafka.cluster.Partition
Partitions serve two purposes: horizontal scaling and parallel consumer reads.
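Partitioning by key is what lets a topic spread across brokers while keeping per-key ordering. The sketch below assumes a simple String.hashCode() modulo scheme; PartitionerSketch is hypothetical, and Kafka's Java producer hashes the serialized key with murmur2 instead, so these partition numbers will not match a real cluster.

```java
// Hypothetical key-based partitioner (a simplification, not Kafka's).
final class PartitionerSketch {
    // Same key -> same partition, so per-key order is preserved within that partition.
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```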
Consumers - pull
kafka.consumer.ConsumerConnector, kafka.consumer.SimpleConsumer
Diagram: Consumer 1 and Consumer 2 each pull messages from the broker.
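Because consumers pull, each consumer in a group fetches only from the partitions it owns, at its own pace. The hypothetical sketch below shows a range-style assignment of partitions to consumers (consumer ids assumed already sorted); it illustrates the idea rather than reproducing Kafka's assignor code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical range-style assignment within one consumer group: each partition
// is read by exactly one consumer, and the first (numPartitions % consumers)
// consumers get one extra partition.
final class RangeAssignmentSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int perConsumer = numPartitions / consumers.size();
        int extra = numPartitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int j = 0; j < count; j++) partitions.add(next++);
            result.put(consumers.get(i), partitions);
        }
        return result;
    }
}
```

Consumers beyond the partition count receive an empty list, which is why the partition count caps read parallelism within one group.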
Consumer offsets: committing and fetching consumer offsets
img src: http://www.reynanprinting.com/photos/undefined/impresion-offset1.jpg
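Committing an offset records how far a consumer group has read; fetching it after a restart tells the consumer where to resume. A minimal in-memory sketch, with a hypothetical OffsetStoreSketch class standing in for wherever the offsets are really stored:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory offset store keyed by (group, topic, partition).
// A consumer commits the offset of the next message it wants to read; after a
// restart it fetches that value and resumes there. -1 means nothing committed.
final class OffsetStoreSketch {
    private final Map<String, Long> committed = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;   // assumes names contain no '/'
    }

    void commit(String group, String topic, int partition, long nextOffset) {
        committed.put(key(group, topic, partition), nextOffset);
    }

    long fetch(String group, String topic, int partition) {
        return committed.getOrDefault(key(group, topic, partition), -1L);
    }
}
```

Offsets are tracked per group, so two groups reading the same topic keep independent positions.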
kafka:// - protocol
● Metadata
● Send
● Fetch
● Offsets
● Offset commit
● Offset fetch
“Binary protocol over TCP”
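On this binary TCP protocol every request is size-prefixed and starts with the same header: API key, API version, correlation id and client id. The sketch below frames a request with ByteBuffer; WireFramingSketch is hypothetical, while the API key numbers in the constants follow the Kafka protocol guide for the requests listed on the slide ("Send" is the Produce request).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical framing helper: int32 size, then header
// (api key int16, api version int16, correlation id int32, client id string), then body.
final class WireFramingSketch {
    static final short PRODUCE = 0, FETCH = 1, OFFSETS = 2, METADATA = 3,
                       OFFSET_COMMIT = 8, OFFSET_FETCH = 9;

    static ByteBuffer frame(short apiKey, short apiVersion, int correlationId,
                            String clientId, byte[] body) {
        byte[] client = clientId.getBytes(StandardCharsets.UTF_8);
        int size = 2 + 2 + 4 + 2 + client.length + body.length;  // excludes the size field
        ByteBuffer buf = ByteBuffer.allocate(4 + size);          // big-endian, like the protocol
        buf.putInt(size);
        buf.putShort(apiKey);
        buf.putShort(apiVersion);
        buf.putInt(correlationId);   // echoed back in the response to match it up
        buf.putShort((short) client.length);
        buf.put(client);
        buf.put(body);
        buf.flip();
        return buf;
    }

    // Reads one size-prefixed frame and returns a view of its payload.
    static ByteBuffer readFrame(ByteBuffer in) {
        int size = in.getInt();
        ByteBuffer payload = in.slice();
        payload.limit(size);
        in.position(in.position() + size);
        return payload;
    }
}
```

On a real socket the reader would read the four size bytes first and then exactly that many bytes, for example with DataInputStream.readFully.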
Mechanical Sympathy
“The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.” - Henry Petroski
Image source: http://www.theguide2surrey.com
Persistence
“Everything is fast until disk I/O.”
Sequential disk access can be faster than random RAM access
src: http://queue.acm.org/detail.cfm?id=1563874
Linear Reads & Writes
At a high level there are only two operations:
❏ Append to the end of the log
❏ Fetch messages from a partition, beginning at a particular message id
Both are sequential file I/O.
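The two operations above map naturally onto a single append-only file. The sketch below appends length-prefixed records and scans forward from a byte position; FileLogSketch is hypothetical, and it uses byte positions within one file rather than Kafka's message offsets, segment files and offset index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-file log: append to the end, read forward from a position.
final class FileLogSketch {
    // Appends [int32 length][bytes] and returns the byte position where it starts.
    static long append(Path file, byte[] record) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            long position = ch.size();
            ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
            buf.putInt(record.length).put(record).flip();
            while (buf.hasRemaining()) ch.write(buf);
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean readFully(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf) < 0) return false;   // end of file
        }
        return true;
    }

    // Sequentially reads every complete record from the given byte position.
    static List<byte[]> readFrom(Path file, long position) {
        List<byte[]> records = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(position);
            ByteBuffer len = ByteBuffer.allocate(4);
            while (readFully(ch, len)) {
                len.flip();
                ByteBuffer body = ByteBuffer.allocate(len.getInt());
                if (!readFully(ch, body)) break;   // partial record at the tail
                records.add(body.array());
                len.clear();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```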
“Let us play Pictionary”
Linux Page Cache
“Kafka ate my RAM”
Zero-copy
src: http://www.ibm.com/developerworks/library/j-zerocopy/
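Java exposes zero-copy transfer through FileChannel.transferTo, which can hand bytes from the page cache straight to a socket (via sendfile on Linux) instead of copying them through a user-space buffer. A hedged sketch: ZeroCopySketch.sendRange is a hypothetical helper, and with a non-socket target, as in a quick local test, the JVM may fall back to an ordinary copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: send bytes [position, position + count) of a file to a channel.
final class ZeroCopySketch {
    static long sendRange(Path file, long position, long count, WritableByteChannel target) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long end = Math.min(position + count, ch.size());
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position + sent < end) {
                long n = ch.transferTo(position + sent, end - position - sent, target);
                if (n <= 0) break;   // target is not accepting more bytes right now
                sent += n;
            }
            return sent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the log is stored in the same format that is sent over the wire, a broker can serve a fetch this way without decoding the messages.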
Batching
Accept a small added latency to improve throughput.
img src: https://prashanthpanduranga.files.wordpress.com/2015/05/tirupati.jpg
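The batching trade-off can be sketched as an accumulator that releases a batch when it is big enough or has waited long enough. BatchAccumulatorSketch and its parameters are hypothetical; the new Java producer exposes a similar trade-off through its batch.size and linger.ms settings. Passing the clock in as nowMs keeps the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch accumulator: send when the batch reaches maxBatchBytes, or
// when the oldest buffered message has waited at least lingerMs.
final class BatchAccumulatorSketch {
    private final int maxBatchBytes;
    private final long lingerMs;
    private final List<byte[]> buffer = new ArrayList<>();
    private int bufferedBytes = 0;
    private long firstAppendMs = -1;

    BatchAccumulatorSketch(int maxBatchBytes, long lingerMs) {
        this.maxBatchBytes = maxBatchBytes;
        this.lingerMs = lingerMs;
    }

    // Adds a message; returns a full batch to send, or null if still accumulating.
    List<byte[]> append(byte[] message, long nowMs) {
        if (buffer.isEmpty()) firstAppendMs = nowMs;
        buffer.add(message);
        bufferedBytes += message.length;
        return bufferedBytes >= maxBatchBytes ? drain() : null;
    }

    // Called periodically; returns the batch if it has lingered long enough, else null.
    List<byte[]> poll(long nowMs) {
        return !buffer.isEmpty() && nowMs - firstAppendMs >= lingerMs ? drain() : null;
    }

    private List<byte[]> drain() {
        List<byte[]> batch = new ArrayList<>(buffer);
        buffer.clear();
        bufferedBytes = 0;
        return batch;
    }
}
```

Raising lingerMs trades a few milliseconds of latency for fewer, larger requests.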
Compression
Bandwidth between data centers is more expensive per byte to scale than disk I/O, CPU, or network bandwidth capacity within a facility.
kafka.message.CompressionCodec
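Compressing a whole batch, rather than each message, lets the codec exploit redundancy across messages such as repeated field names and similar payloads. The sketch below uses the JDK's GZIP streams; GZIP is one of Kafka's codecs, but BatchCompressionSketch is a hypothetical helper, not kafka.message.CompressionCodec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helpers: compress and decompress a serialized batch of messages.
final class BatchCompressionSketch {
    static byte[] gzip(byte[] batch) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(batch);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();   // read after close(), which writes the GZIP trailer
    }

    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) out.write(chunk, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```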
Log compaction
img src: http://kafka.apache.org/083/documentation.html
kafka.log.LogCleaner, LogCleanerManager
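Log compaction retains at least the latest value for every key. The core idea can be sketched in two passes: record each key's last offset, then copy only the records at those offsets. CompactionSketch is hypothetical and deliberately simplified: it drops delete markers (null values) immediately, whereas Kafka keeps them for delete.retention.ms so that consumers can observe the delete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-pass compaction over one partition's records.
final class CompactionSketch {
    static final class Record {
        final long offset;
        final String key;
        final String value;   // null marks a delete ("tombstone")

        Record(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    static List<Record> compact(List<Record> log) {
        // Pass 1: remember the last offset seen for each key (later records win).
        Map<String, Long> lastOffset = new HashMap<>();
        for (Record r : log) lastOffset.put(r.key, r.offset);
        // Pass 2: keep only each key's latest record, in original offset order.
        // Simplification: tombstones are dropped immediately.
        List<Record> kept = new ArrayList<>();
        for (Record r : log) {
            if (lastOffset.get(r.key).longValue() == r.offset && r.value != null) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```

Offsets are preserved, so a consumer reading the compacted log still sees increasing offsets, just with gaps.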
Message Delivery
❏ At least once
❏ At most once
❏ Exactly once
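The first two guarantees follow from when a consumer commits its offset relative to processing. The hypothetical simulation below crashes once and restarts from the last committed offset: committing first loses the in-flight message, committing after processing repeats it. Exactly once needs more than ordering, for example storing the output and the offset atomically; that part is not sketched here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: the consumer crashes once at offset crashAt and then
// resumes from its last committed offset.
final class DeliverySketch {
    static List<String> consume(List<String> log, boolean commitBeforeProcessing, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0;     // next offset to read after a restart
        int offset = 0;
        boolean crashed = false;
        while (offset < log.size()) {
            boolean crashNow = !crashed && offset == crashAt;
            if (commitBeforeProcessing) {
                committed = offset + 1;                 // 1. commit
                if (crashNow) { crashed = true; offset = committed; continue; }  // crash before processing
                processed.add(log.get(offset));         // 2. process
            } else {
                processed.add(log.get(offset));         // 1. process
                if (crashNow) { crashed = true; offset = committed; continue; }  // crash before commit
                committed = offset + 1;                 // 2. commit
            }
            offset++;
        }
        return processed;
    }
}
```

With the log ["a", "b", "c"] and a crash at offset 1, committing first processes a and c (b is lost: at most once), while committing after processing processes a, b, b, c (b is repeated: at least once).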
Replication
Un-replicated = replication factor of one
Quorum-based
● Better latency: a write waits only for the fastest majority of replicas
● To tolerate “f” failures, need “2f+1” replicas
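Both bullets can be made concrete with a few hypothetical helpers: with 2f+1 replicas a write commits once a majority (f+1) acknowledges it, so its latency is the majority-th fastest acknowledgment rather than the slowest one. QuorumSketch is an illustration only, not part of Kafka.

```java
import java.util.Arrays;

// Hypothetical helpers for majority-quorum replication.
final class QuorumSketch {
    // Replicas needed so that f of them can fail and a majority still remains.
    static int replicasToTolerate(int f) {
        return 2 * f + 1;
    }

    // Smallest majority of n replicas.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // A write commits when the majority-th fastest acknowledgment arrives,
    // so replicas slower than that do not add latency.
    static long commitLatencyMs(long[] ackLatenciesMs) {
        long[] sorted = ackLatenciesMs.clone();
        Arrays.sort(sorted);
        return sorted[majority(sorted.length) - 1];
    }
}
```

For three replicas acknowledging after 5 ms, 100 ms and 7 ms, the write commits after 7 ms; the 100 ms straggler is ignored.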
Primary-backup replication
Diagram: Broker 1 to Broker 4, with Topic 1, Topic 2 and Topic 3 each replicated on three of the four brokers.
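Kafka's primary-backup scheme tracks an in-sync replica (ISR) set: a message counts as committed once every replica in the ISR has it, so f+1 in-sync replicas tolerate f failures. The committed point (the high watermark) is then the minimum log end offset over the ISR. IsrSketch is a hypothetical illustration of that rule only; it does not model how replicas join or leave the ISR.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the in-sync replica (ISR) commit rule.
final class IsrSketch {
    // logEndOffsets: broker id -> offset of the next message that replica will write.
    // A message is committed once every ISR member has it, so the committed
    // point (high watermark) is the smallest log end offset in the ISR.
    static long highWatermark(Map<Integer, Long> logEndOffsets, Set<Integer> isr) {
        if (isr.isEmpty()) throw new IllegalArgumentException("empty ISR");
        long hw = Long.MAX_VALUE;
        for (int broker : isr) {
            hw = Math.min(hw, logEndOffsets.get(broker).longValue());
        }
        return hw;
    }

    // With a commit rule that waits for all in-sync replicas, f + 1 replicas tolerate f failures.
    static int replicasToTolerate(int f) {
        return f + 1;
    }
}
```

Dropping a lagging replica from the ISR lets the high watermark advance, which is how this scheme avoids waiting on the slowest replica.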
ZooKeeper: cluster coordinator
THANK YOU
For questions or suggestions:
Ran.ga.na.than B
@ran_than