Download - Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! by Guido Schmutz
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Apache KafkaScalable Message Processing and more!
Guido Schmutz
@gschmutz guidoschmutz.wordpress.com
Agenda
1. Introduction & Motivation2. Kafka Core
3. Kafka Connect
4. Kafka Streams
5. Kafka and ”Big Data” / ”Fast Data” Ecosystem
6. Confluent Data Platform7. Kafka in Architecture
8. Summary
Apache Kafka - Scalable Message Processing and more!3
Introduction & Motivation
Apache Kafka - Scalable Message Processing and more!4
A little story of a “real-life” customer situation
Traditional system interact with its clients and does its workImplemented using legacy technologies (i.e. PL/SQL)
New requirement:• Offer notification service to notify
customer when goods are shipped• Subscription and inform over different
channels• Existing technology doesn’t fit
delivery
LogisticSystem
Oracle
MobileApps
Sensor ship
sort
5
Rich(Web)ClientApps
DB
schedule
Logic(PL/SQL)
delivery
Apache Kafka - Scalable Message Processing and more!5
A little story of a “real-life” customer situation
Events are “owned” by traditional application (as well as the channels they are transported over)
Implement notification as a new Java-based application/system
But we need the events ! => so let’s integrate
delivery
LogisticSystem
Oracle
MobileApps
Sensor ship
sort
6
Rich(Web)ClientApps
DB
schedule
Notification
Logic(PL/SQL)
Logic(Java)delivery
SMS
…
Apache Kafka - Scalable Message Processing and more!6
A little story of a “real-life” customer situation
integrate in order to get the information! Oracle Service Bus was already there
Rule Engine implemented in Java and invoked from OSB message flowNotification system informed via queueHigher Latency introduced (good enough in this case)
delivery
LogisticSystem
OracleOracle
ServiceBus
MobileApps
Sensor AQship
sort
7
Rich(Web)ClientApps
DB
schedule
Filter
Notification
Logic(PL/SQL)
JMS
RuleEngine(Java)
Logic(Java)delivery
shipdelivery
delivery true SMS
…
Apache Kafka - Scalable Message Processing and more!7
A little story of a “real-life” customer situation
Treat events as first-class citizens
Events belong to the “enterprise” and not an individual system => Catalog of Events similar to Catalog of Services/APIs !!
Event (stream) processing can be introduced and by that latency reduced!
delivery
LogisticSystem
OracleOracle
ServiceBus
MobileApps
Sensor AQship
sort
8
Rich(Web)ClientApps
DB
schedule
Filter
Notification
Logic(PL/SQL)
JMS
RuleEngine(Java)
Logic(Java)delivery
shipdelivery
delivery true SMS
…
Apache Kafka - Scalable Message Processing and more!8
Treat Events as Events and share them!
delivery
LogisticSystem
Oracle
OracleServiceBus
MobileApps
Sensorship
sort
9
Rich(Web)ClientApps
DB
schedule
Filter
Notification
Logic(PL/SQL)
JMS
RuleEngine(Java)
Logic(Java)
delivery
ship
delivery true SMS
…
EventBus/Hub
Stream/EventProcessing
Apache Kafka - Scalable Message Processing and more!9
Treat Events as Events,share and make use of them!
delivery
LogisticSystem
Oracle
MobileApps
Sensorship
sort
10
Rich(Web)ClientApps
DB
schedule
Filter
Notification
Logic(PL/SQL)
RuleEngine(Java)
Logic(Java)
delivery
SMS
…
EventBus/Hub
Stream/EventProcessing
notifiableDelivery
notifiableDelivery
delivery
Apache Kafka - Scalable Message Processing and more!10
Kafka Stream Data Platform
Source:ConfluentApache Kafka - Scalable Message Processing and more!11
Kafka Core
Apache Kafka - Scalable Message Processing and more!12
Apache Kafka - Overview
Distributed publish-subscribe messaging system
Designed for processing of real time activity stream data (logs, metrics collections, social media streams, …)
Initially developed at LinkedIn, now part of Apache
Does not use JMS API and standards
Kafka maintains feeds of messages in topics
Apache Kafka - Scalable Message Processing and more!13
Apache Kafka - Motivation
LinkedIn’s motivation for Kafka was:
• “A unified platform for handling all the real-time data feeds a large company might have.”
Must haves
• High throughput to support high volume event feeds.
• Support real-time processing of these feeds to create new, derived feeds.
• Support large data backlogs to handle periodic ingestion from offline systems.
• Support low-latency delivery to handle more traditional messaging use cases.
• Guarantee fault-tolerance in the presence of machine failures.
Apache Kafka - Scalable Message Processing and more!14
Kafka High Level Architecture
The who is who• Producers write data to brokers.• Consumers read data from
brokers.• All this is distributed.
The data• Data is stored in topics.• Topics are split into partitions,
which are replicated.
Kafka Cluster
Consumer Consumer Consumer
Producer Producer Producer
Broker 1 Broker 2 Broker 3
ZookeeperEnsemble
Apache Kafka - Scalable Message Processing and more!15
Apache Kafka - Architecture
Kafka Broker
Movement Processor
MovementTopic
Engine-MetricsTopic
1 2 3 4 5 6
EngineProcessor1 2 3 4 5 6
Truck
Apache Kafka - Scalable Message Processing and more!16
Apache Kafka - Architecture
Kafka Broker
Movement Processor
MovementTopic
Engine-MetricsTopic
1 2 3 4 5 6
EngineProcessor
Partition0
1 2 3 4 5 6Partition0
1 2 3 4 5 6Partition1 Movement
ProcessorTruck
Apache Kafka - Scalable Message Processing and more!17
ApacheKafka
Kafka Broker 1
Movement Processor
Truck
MovementTopicP0
Movement Processor
1 2 3 4 5
P2 1 2 3 4 5
Kafka Broker 2MovementTopic
P2 1 2 3 4 5
P1 1 2 3 4 5
Kafka Broker 3MovementTopic
P0 1 2 3 4 5
P1 1 2 3 4 5
Movement Processor
Apache Kafka - Scalable Message Processing and more!18
Apache Kafka - Architecture
• Write Ahead Log / Commit Log
• Producers always append to tail
• think append to file
Kafka Broker
MovementTopic
1 2 3 4 5
Truck
6 6
Apache Kafka - Scalable Message Processing and more!19
Durability Guarantees
Producer can configure acknowledgements
Value Impact Durability
0 • Producerdoesn’twaitforleader weak1(default) • Producerwaitsforleader
• Leadersends ack whenmessagewrittentolog• Nowaitforfollowers
medium
all • Producerwaitsforleader• Leadersendsack when allIn-SyncReplicahaveacknowledged
strong
Apache Kafka - Scalable Message Processing and more!20
Apache Kafka - Partition offsets
Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset
• Consumers track their pointers via (offset, partition, topic) tuples
ConsumerGroupA ConsumerGroupB
Apache Kafka - Scalable Message Processing and more!21
Source:ApacheKafka
Data Retention – 3 options
1. Never
2. Time based (TTL) log.retention.{ms | minutes | hours}
3. Size based log.retention.bytes
4. Log compaction based (entries with same key are removed)kafka-topics.sh --zookeeper localhost:2181 \
--create --topic customers \--replication-factor 1 --partitions 1 \--config cleanup.policy=compact
Apache Kafka - Scalable Message Processing and more!22
Apache Kafka – Some numbers
Kafka at LinkedIn => over 1800+ broker machines / 79K+ Topics
Kafka Performance at our own infrastructure => 6 brokers (VM) / 1 cluster
• 445’622 messages/second• 31 MB / second • 3.0405 ms average latency between producer / consumer
1.3Trillionmessagesperday
330Terabytesin/day
1.2Petabytesout/day
Peakloadforasinglecluster2millionmessages/sec4.7Gigabits/secinbound15Gigabits/secoutbound
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
https://engineering.linkedin.com/kafka/running-kafka-scale
Apache Kafka - Scalable Message Processing and more!23
Kafka Connect
Apache Kafka - Scalable Message Processing and more!26
Kafka Connect Architecture
Apache Kafka - Scalable Message Processing and more!27
Source:Confluent
Kafka Connector Hub – Certified Connectors
Source:http://www.confluent.io/product/connectors
Apache Kafka - Scalable Message Processing and more!28
Kafka Connector Hub – Additional Connectors
Source:http://www.confluent.io/product/connectors
Apache Kafka - Scalable Message Processing and more!29
Kafka Streams
Apache Kafka - Scalable Message Processing and more!30
Kafka Streams
• Designed as a simple and lightweight library in Apache Kafka
• no external dependencies on systems other than Apache Kafka
• Leverages Kafka as its internal messaging layer
• agnostic to resource management and configuration tools
• Supports fault-tolerant local state
• Event-at-a-time processing (not microbatch) with millisecond latency
• Windowing with out-of-order data using a Google DataFlow-like model
Apache Kafka - Scalable Message Processing and more!31
Kafka Streams Architecture
Apache Kafka - Scalable Message Processing and more!32
Source:Confluent
Kafka and ”Big Data” / ”Fast Data” Ecosystem
Apache Kafka - Scalable Message Processing and more!33
Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular products / frameworks
• Apache Spark Streaming
• Apache Flink
• Apache Storm
• Apache NiFi
• Streamsets
• Apache Flume
• Oracle Stream Analytics
• Oracle Service Bus
• Oracle GoldenGate
• Spring Integration Kafka Support
• …Stormbuilt-inKafkaSpouttoconsumeeventsfromKafka
Apache Kafka - Scalable Message Processing and more!34
Confluent Platform
Apache Kafka - Scalable Message Processing and more!35
Confluent Data Platform 3.0
Apache Kafka - Scalable Message Processing and more!36
Source:Confluent
Kafka in Architecture
Apache Kafka - Scalable Message Processing and more!37
Hadoop ClusterdHadoop ClusterHadoop Cluster
Customer Event Hub – taking Velocity into account
Location
Social
Clickstream
Sensor Data
Billing &Ordering
CRM / Profile
MarketingCampaigns
CallCenter
MobileApps
Batch Analytics
Streaming Analytics
Event HubEvent
HubEvent Hub
NoSQL
ParallelProcessing
DistributedFilesystem
Stream AnalyticsNoSQL
Reference /Models
SQL
Search
Dashboard
BITools
Enterprise Data Warehouse
Search
Online&MobileApps
File Import / SQL Import
WeatherData
Apache Kafka - Scalable Message Processing and more!38
WeatherData
SQL ImportHadoop ClusterdHadoop Cluster
Hadoop Cluster
Location
Social
Clickstream
Sensor Data
Billing &Ordering
CRM / Profile
MarketingCampaigns
CallCenter
MobileApps
Batch Analytics
Streaming Analytics
Event HubEvent
HubEvent Hub
NoSQL
ParallelProcessing
DistributedFilesystem
Stream AnalyticsNoSQL
Reference /Models
SQL
Search
Dashboard
BITools
Enterprise Data Warehouse
Search
Online&MobileApps
Customer Event Hub – mapping of technologies
Apache Kafka - Scalable Message Processing and more!39
Summary
Apache Kafka - Scalable Message Processing and more!40
Summary
• Kafka can scale to millions of messages per second, and more
• Easy to start with for a PoC
• A bit more to invest to setup production environment
• Monitoring is key
• Vibrant community and ecosystem
• Fast pace technology
• Confluent provides Kafka Distribution
Apache Kafka - Scalable Message Processing and more!41
Guido SchmutzTechnology Manager
Apache Kafka - Scalable Message Processing and more!42
@gschmutz guidoschmutz.wordpress.com