Transcript
Page 1: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

©2014 Knowledgent Group Inc. All Rights Reserved

Stream Processing with Big Data

Learn Apache Kafka

Kishore Veleti

Big Data Engineer

Page 2: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

About Me

• Big Data Engineer at Knowledgent

• Background in enterprise application development using the Hadoop stack, Java, and PHP

• Worked on healthcare, banking, and social media applications

• Passionate about sharing knowledge

Page 3: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Tutorial

Page 4: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

We will discuss:

• What is Apache Kafka?

• Apache Kafka Terminology

• Apache Kafka: About Topic and Partition

• Apache Kafka hands-on

Page 5: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

What is Apache Kafka?

• Apache Kafka is a publish-subscribe messaging system implemented as a distributed commit log

• It is written in Scala and Java

• It was built at LinkedIn to process activity stream data from its website

Page 6: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

What is Apache Kafka?

• Messages are delivered in near real time

• A message can have many subscribers

• Kafka persists messages to disk

• Messages are retained for a configured time period

• Subscribers (clients) store the state of their own reads

• Messages are easy to replay
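The points on this slide about retention, client-side read state, and replay can be sketched with a toy in-memory log. This is not the Kafka API: `RetainedLog`, `Subscriber`, and the explicit `now` timestamps are hypothetical names chosen for illustration, and the 60-second retention period is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """Toy stand-in for a Kafka log; not the real Kafka API."""
    retention_seconds: float
    entries: list = field(default_factory=list)  # (offset, timestamp, payload)
    next_offset: int = 0

    def append(self, payload, now):
        """Persist a message and return the offset assigned to it."""
        offset = self.next_offset
        self.entries.append((offset, now, payload))
        self.next_offset += 1
        return offset

    def expire(self, now):
        """Drop messages older than the retention period, read or not."""
        cutoff = now - self.retention_seconds
        self.entries = [e for e in self.entries if e[1] >= cutoff]

    def read_from(self, offset):
        """Return (offset, payload) pairs at or after the given offset."""
        return [(o, p) for (o, _, p) in self.entries if o >= offset]


class Subscriber:
    """Each subscriber stores its own read position; the log does not."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self):
        """Return unread payloads and advance the stored position."""
        batch = self.log.read_from(self.position)
        if batch:
            self.position = batch[-1][0] + 1
        return [p for _, p in batch]

    def rewind(self, offset):
        """Replay by moving the stored position backwards."""
        self.position = offset


log = RetainedLog(retention_seconds=60)
log.append("login", now=0)
log.append("click", now=10)
log.append("logout", now=50)

a, b = Subscriber(log), Subscriber(log)
first_read = a.poll()   # subscriber a reads everything
a.rewind(1)
replayed = a.poll()     # a replays from offset 1
b_read = b.poll()       # b is unaffected by a's reads

log.expire(now=65)      # "login" (timestamp 0) is now older than 60 s
after_expiry = log.read_from(0)
```

Because the read position lives in the subscriber, two subscribers can read the same messages independently, and replay is just a matter of resetting that position while the messages are still retained.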

Page 7: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Apache Kafka Terminology

• Message: A datum to send

• Topic: Kafka maintains messages in categories called “topics”

• Partition: A logical division of a topic

• Producer: An API to publish messages to a Kafka topic

• Broker: A server in the Kafka cluster

• Cluster: A Kafka cluster comprises one or more brokers

• Consumer: An API to consume published messages and process them further

• Replication: Kafka replicates the log for each partition across servers
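One way to see how these terms relate is to model them as plain data structures. The sketch below is an illustration only: `Partition`, `Cluster`, and `create_topic` are hypothetical names, and the round-robin replica placement is an assumption made to keep the example small, not a description of how Kafka actually assigns replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    topic: str
    index: int
    replica_brokers: list  # ids of brokers holding a copy of this log


@dataclass
class Cluster:
    broker_ids: list
    topics: dict = field(default_factory=dict)  # topic name -> [Partition]

    def create_topic(self, name, partitions, replication_factor):
        """Divide a topic into partitions and place replicas on brokers."""
        if replication_factor > len(self.broker_ids):
            raise ValueError("replication factor exceeds number of brokers")
        n = len(self.broker_ids)
        self.topics[name] = [
            Partition(
                topic=name,
                index=p,
                # Hypothetical round-robin placement, for illustration only.
                replica_brokers=[self.broker_ids[(p + r) % n]
                                 for r in range(replication_factor)],
            )
            for p in range(partitions)
        ]
        return self.topics[name]


cluster = Cluster(broker_ids=[0, 1, 2])
layout = cluster.create_topic("page-views", partitions=3, replication_factor=2)
placement = {p.index: p.replica_brokers for p in layout}
```

Here a cluster of three brokers holds one topic with three partitions, and each partition's log is kept on two different brokers.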

Page 8: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Apache Kafka Terminology & Big Picture

[Diagram: Message, Topic, Partition, Producer, Broker, Consumer]

At a high level, producers send messages over the network to the Kafka cluster.

The Kafka cluster in turn serves them up to consumers.

Page 9: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Apache Kafka Terminology & Big Picture

Let’s do a hands-on Kafka exercise using what we have learned so far

Page 10: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Apache Kafka: About Topic and Partition

In Kafka, a partitioned log is maintained for each topic.

Each partition is an ordered, immutable sequence of messages that is continually appended to.

Each message in a partition is assigned a sequential id number called the offset.

[Diagram labels: Partition 1, Partition 2, Partition 3, Writes]
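The idea of an append-only partition with sequential offsets can be sketched in a few lines. `PartitionLog` is a hypothetical class for illustration, not part of Kafka; it uses a Python list so that the list index doubles as the offset.

```python
class PartitionLog:
    """Append-only sequence; the list index doubles as the offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Add a message at the end and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._messages[offset]

    def __len__(self):
        return len(self._messages)


# A topic with three partitions, numbered as in the slide's diagram.
topic = {1: PartitionLog(), 2: PartitionLog(), 3: PartitionLog()}

offsets = [
    topic[1].append("a"),
    topic[1].append("b"),
    topic[2].append("c"),
    topic[1].append("d"),
]
```

Offsets are sequential within each partition, so the message written to Partition 2 gets offset 0 even though it was the third write to the topic. There is no method to change or delete a stored message, which mirrors the immutability described above.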

Page 11: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Apache Kafka: About Topic and Partition

In Kafka, a Producer is an API to publish messages to a topic
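A minimal sketch of what "publishing to a topic" involves, using in-memory lists instead of a real cluster. `InMemoryBroker` and `Producer` are hypothetical classes, not a Kafka client library, and the round-robin choice of partition is an assumption made for this example.

```python
from itertools import count


class InMemoryBroker:
    """Holds topics as lists of partition logs (plain Python lists)."""

    def __init__(self):
        self.topics = {}

    def create_topic(self, name, partitions):
        self.topics[name] = [[] for _ in range(partitions)]


class Producer:
    """Hypothetical producer; real client libraries expose a richer API."""

    def __init__(self, broker):
        self.broker = broker
        self._counter = count()

    def send(self, topic, message):
        """Append the message to one partition; return (partition, offset)."""
        logs = self.broker.topics[topic]
        # Assumption: spread messages round-robin over the partitions.
        partition = next(self._counter) % len(logs)
        logs[partition].append(message)
        return partition, len(logs[partition]) - 1


broker = InMemoryBroker()
broker.create_topic("events", partitions=2)
producer = Producer(broker)
results = [producer.send("events", m) for m in ["e0", "e1", "e2"]]
```

Each call to `send` returns the partition the message landed in and the offset it was assigned there.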

Page 12: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Apache Kafka: About Topic and Partition

In Kafka, a Consumer is an API to consume messages from topics
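The consuming side can be sketched the same way. The `Consumer` class below is hypothetical and works on plain lists rather than a real cluster; it assumes the consumer keeps one "next offset to read" per partition, in line with the earlier point that clients store the state of their reads.

```python
class Consumer:
    """Hypothetical consumer that remembers one offset per partition."""

    def __init__(self, partitions):
        self.partitions = partitions          # list of partition logs
        self.offsets = [0] * len(partitions)  # next offset to read

    def poll(self):
        """Return unread (partition, offset, message) records."""
        records = []
        for p, log in enumerate(self.partitions):
            for offset in range(self.offsets[p], len(log)):
                records.append((p, offset, log[offset]))
            self.offsets[p] = len(log)
        return records


# Two partitions of one topic, already holding published messages.
topic = [["e0", "e2"], ["e1"]]
consumer = Consumer(topic)

first = consumer.poll()
topic[1].append("e3")     # a producer publishes one more message
second = consumer.poll()  # only the new message is returned
upper = [msg.upper() for _, _, msg in first]  # stand-in for further processing
```

The second `poll` returns only the message published after the first one, because the consumer advanced its stored offsets.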

Page 13: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Apache Kafka Terminology & Big Picture

Let’s do a hands-on Kafka exercise using what we have learned so far

Page 14: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Apache Kafka Use Cases

• Trading systems: identifying risk in real time

• Change data capture: capturing changed data into a data lake environment

• Online gaming: identifying the top scorers of a game
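As a small illustration of the online gaming use case, the sketch below aggregates score events of the kind a consumer might read from a topic. The event shape `(player, points)`, the player names, and the `top_scorers` function are assumptions made for this example; the slides do not describe an actual implementation.

```python
import heapq
from collections import defaultdict


def top_scorers(events, n):
    """Return the n highest (player, total) pairs from (player, points) events."""
    totals = defaultdict(int)
    for player, points in events:
        totals[player] += points
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


# Hypothetical score events, in the order they were consumed.
score_events = [("ana", 30), ("bo", 50), ("ana", 40), ("cy", 10)]
leaders = top_scorers(score_events, n=2)
```

In a streaming setting the totals would be updated as each event arrives rather than computed over a finished list, but the aggregation step is the same.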

Page 15: Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-Up

Thank you!

Questions?

