Apache Kafka at LinkedIn

Jay Kreps Introduction to Apache Kafka

Upload: discover-pinterest

Posted on 19-Aug-2014

Category: Engineering


DESCRIPTION

Jay Kreps is a Principal Staff Engineer at LinkedIn, where he is the lead architect for online data infrastructure. He is among the original authors of several open source projects, including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza. This talk gives an introduction to Apache Kafka, a distributed messaging system. It covers both how Kafka works and how it is used at LinkedIn for log aggregation, messaging, ETL, and real-time stream processing.

TRANSCRIPT

Page 1: Apache Kafka at LinkedIn

Jay Kreps
Introduction to Apache Kafka

Page 2: Apache Kafka at LinkedIn

The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing

Page 3: Apache Kafka at LinkedIn

Apache Kafka

Page 4: Apache Kafka at LinkedIn

A brief history of Apache Kafka

Page 5: Apache Kafka at LinkedIn

Characteristics
• Scalability of a filesystem
  – Hundreds of MB/sec/server throughput
  – Many TB per server
• Guarantees of a database
  – Messages strictly ordered
  – All data persistent
• Distributed by default
  – Replication
  – Partitioning model

Page 6: Apache Kafka at LinkedIn

Kafka is about logs

Page 7: Apache Kafka at LinkedIn

What is a log?
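
The slide only poses the question, so the following is a minimal sketch of the general idea of an append-only log, not Kafka's implementation or API. The `Log` class and its method names are hypothetical and exist only for illustration: records are appended at the end, each one gets a sequential offset, and reads return records in write order.

```python
class Log:
    """Toy in-memory append-only log (illustrative only, not Kafka's API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset, in write order."""
        return self._records[offset:offset + max_records]


log = Log()
first = log.append("user 1 viewed job 42")
second = log.append("user 2 viewed job 7")
print(first, second)   # offsets are assigned sequentially
print(log.read(0))     # records come back in the order they were written
```

Because existing records are never modified, a reader only needs to remember one number, its offset, to resume where it left off.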

Page 8: Apache Kafka at LinkedIn
Page 9: Apache Kafka at LinkedIn
Page 10: Apache Kafka at LinkedIn

Logs: pub/sub done right
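
One way to read this slide: if every subscriber tracks its own offset into a shared log, then many subscribers can read the same data independently and at their own pace. The sketch below assumes that interpretation. The `Subscriber` class is hypothetical and a plain Python list stands in for the log.

```python
class Subscriber:
    """Toy subscriber that remembers how far it has read (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.offset = 0  # position of the next record to read

    def poll(self, log):
        """Return every record written since the last poll and advance the offset."""
        records = log[self.offset:]
        self.offset = len(log)
        return records


log = ["a", "b"]            # a plain list stands in for the shared log
fast = Subscriber("fast")
slow = Subscriber("slow")

got_fast_1 = fast.poll(log)  # fast reads what exists so far
log.append("c")              # a publisher appends another record
got_fast_2 = fast.poll(log)  # fast sees only the new record
got_slow = slow.poll(log)    # slow starts from the beginning and sees everything

print(got_fast_1, got_fast_2, got_slow)
```

Reading does not remove anything from the log, so a slow subscriber never affects what a fast one sees.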

Page 11: Apache Kafka at LinkedIn

Partitioning

Page 12: Apache Kafka at LinkedIn

Nodes Host Many Partitions

Page 13: Apache Kafka at LinkedIn

Producers Balance Load
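
A rough sketch of how a producer might spread records over partitions, assuming two common strategies: hash the key when there is one (so the same key always lands in the same partition), and rotate through partitions otherwise. `NUM_PARTITIONS`, `choose_partition`, and the use of CRC32 are illustrative assumptions, not the partitioner Kafka clients actually use.

```python
import itertools
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for one topic

# Rotating counter used for records that have no key.
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key=None):
    """Pick a partition: stable hash for keyed records, round-robin otherwise."""
    if key is None:
        return next(_round_robin)
    # CRC32 is used here only because it is stable across runs;
    # it is an arbitrary choice for this sketch.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Keyed records: the same key always maps to the same partition.
print(choose_partition("user-1") == choose_partition("user-1"))

# Unkeyed records: consecutive sends rotate across all partitions.
print([choose_partition() for _ in range(NUM_PARTITIONS)])
```

Keyed placement preserves per-key ordering within a partition, while rotation spreads unkeyed traffic evenly.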

Page 14: Apache Kafka at LinkedIn

Consumers Divide Up Partitions
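
To make the idea concrete, here is a small sketch of dividing partitions among a group of consumers so that each partition has exactly one owner. The round-robin rule and the `assign_partitions` helper are assumptions made for illustration; the slide does not say which assignment strategy Kafka uses.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin (illustrative)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions(range(6), ["c1", "c2", "c3"])
print(assignment)
```

With six partitions and three consumers, each consumer owns two partitions. Adding or removing a consumer only requires recomputing the assignment.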

Page 15: Apache Kafka at LinkedIn

End-to-End

Page 16: Apache Kafka at LinkedIn

Kafka At LinkedIn
• 175 TB of in-flight log data per colo
• Replicated to each datacenter
• Tens of thousands of data producers
• Thousands of consumers
• 7 million messages written/sec
• 35 million messages read/sec
• Hadoop integration

Page 17: Apache Kafka at LinkedIn

Performance
• Producer (3x replication):
  – Async: 786,980 records/sec (75.1 MB/sec)
  – Sync: 421,823 records/sec (40.2 MB/sec)
• Consumer:
  – 940,521 records/sec (89.7 MB/sec)
• End-to-end latency:
  – 2 ms (median)
  – 14 ms (99.9th percentile)
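
The throughput figures above give both records/sec and MB/sec, so the average record size can be worked out from them. The slide does not state the record size; the sketch below is an inference that assumes "MB" means 2^20 bytes.

```python
MIB = 1024 * 1024  # assumption: the slide's "MB" means 2**20 bytes

# (records/sec, MB/sec) pairs copied from the slide.
results = {
    "producer async": (786_980, 75.1),
    "producer sync": (421_823, 40.2),
    "consumer": (940_521, 89.7),
}


def implied_record_size(records_per_sec, mb_per_sec):
    """Average bytes per record implied by a throughput pair."""
    return mb_per_sec * MIB / records_per_sec


for name, (records_per_sec, mb_per_sec) in results.items():
    size = implied_record_size(records_per_sec, mb_per_sec)
    print(f"{name}: about {size:.1f} bytes per record")
```

Under that assumption, all three measurements are consistent with records of roughly 100 bytes.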

Page 18: Apache Kafka at LinkedIn
Page 19: Apache Kafka at LinkedIn

The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing

Page 20: Apache Kafka at LinkedIn

Data Integration

Page 21: Apache Kafka at LinkedIn

Maslow’s Hierarchy

Page 22: Apache Kafka at LinkedIn

For Data

Page 23: Apache Kafka at LinkedIn

New Types of Data
• Database data
  – Users, products, orders, etc.
• Events
  – Clicks, impressions, pageviews, etc.
• Application metrics
  – CPU usage, requests/sec
• Application logs
  – Service calls, errors

Page 24: Apache Kafka at LinkedIn

New Types of Systems
• Live Stores
  – Voldemort
  – Espresso
  – Graph
  – OLAP
  – Search
  – InGraphs
• Offline
  – Hadoop
  – Teradata

Page 25: Apache Kafka at LinkedIn

Bad

Page 26: Apache Kafka at LinkedIn

Good

Page 27: Apache Kafka at LinkedIn

Example: User views job

Page 28: Apache Kafka at LinkedIn

Comparing Data Transfer Mechanisms

Page 29: Apache Kafka at LinkedIn

The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing

Page 30: Apache Kafka at LinkedIn

Stream Processing

Page 31: Apache Kafka at LinkedIn

Stream processing is a generalization of batch processing

Page 32: Apache Kafka at LinkedIn

Stream Processing = Logs + Jobs
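
One possible reading of this equation: a job is code that consumes one log and appends its results to another, so jobs can be chained through logs. The sketch below assumes that reading. `run_job`, the event fields, and the plain lists standing in for logs are all hypothetical, and this is not Samza's API.

```python
def run_job(input_log, output_log, start_offset, transform):
    """Read input_log from start_offset, append transformed records to output_log.

    Returns the next offset to read, so the job can resume later.
    """
    offset = start_offset
    while offset < len(input_log):
        result = transform(input_log[offset])
        if result is not None:  # None means "drop this record"
            output_log.append(result)
        offset += 1
    return offset


def job_views_only(event):
    """Keep job-view events and reduce them to (user, job) pairs."""
    if event["type"] == "job_view":
        return (event["user"], event["job"])
    return None


events = [
    {"type": "job_view", "user": 1, "job": 42},
    {"type": "search", "user": 2, "query": "kafka"},
]
job_views = []

next_offset = run_job(events, job_views, 0, job_views_only)
events.append({"type": "job_view", "user": 2, "job": 7})
next_offset = run_job(events, job_views, next_offset, job_views_only)

print(job_views, next_offset)
```

Running the job once over a finite input is a batch run; calling it again as records arrive is the streaming case.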

Page 33: Apache Kafka at LinkedIn

Examples
• Monitoring
• Security
• Content processing
• Recommendations
• Newsfeed
• ETL

Page 34: Apache Kafka at LinkedIn

Frameworks Can Help

Page 35: Apache Kafka at LinkedIn

Samza Architecture

Page 36: Apache Kafka at LinkedIn

Log-centric Architecture

Page 37: Apache Kafka at LinkedIn

Kafka: http://kafka.apache.org

Samza: http://samza.incubator.apache.org

Log Blog: http://linkd.in/199iMwY

Benchmark: http://t.co/40fkKJvanx

Me: http://www.linkedin.com/in/jaykreps

@jaykreps