Stream Processing with Big Data: Knowledgent Big Data Palooza Meet-up

Post on 27-Jun-2015

Category:

Data & Analytics

DESCRIPTION

On September 17, 2014, at the NJ Big Data Palooza MeetUp, Kishore Veleti, Big Data Engineer at Knowledgent, presented on stream processing with big data using Apache Kafka. This presentation contains the content he covered at the event, including an overview of Kafka terminology and processes.

TRANSCRIPT

©2014 Knowledgent Group Inc. All Rights Reserved

Stream Processing with Big Data

Learn Apache Kafka

Kishore Veleti

Big Data Engineer

About Me

• Big Data Engineer at Knowledgent

• Background in enterprise application development using the Hadoop stack, Java, and PHP

• Worked on healthcare, banking, and social media applications

• Passionate about sharing knowledge

Tutorial

We will discuss:

• What is Apache Kafka?

• Apache Kafka Terminology

• Apache Kafka: About Topic and Partition

• Apache Kafka hands-on

What is Apache Kafka?

• Apache Kafka is a publish-subscribe messaging system implemented as a distributed commit log

• It is written in Scala and Java

• It was built at LinkedIn to process activity stream data from its website

What is Apache Kafka?

• Kafka delivers messages to subscribers in near real time

• A message can have many subscribers

• Kafka persists messages to disk

• Messages are retained for a configurable time period

• Subscribers (clients) keep track of their own read position

• Messages are easy to replay
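
The retention and replay points above can be illustrated with a small in-memory model. This is only a sketch, not Kafka's implementation or client API: the class names `RetainedLog` and `Subscriber`, the `now` parameter, and the 60-second retention value are all hypothetical and chosen for illustration. It shows that the log keeps messages after they are read, that each subscriber owns its read position, and that replay is just a matter of moving that position back.

```python
class RetainedLog:
    """Toy in-memory log with time-based retention (not Kafka's implementation)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []      # (offset, timestamp, payload), oldest first
        self._next_offset = 0

    def append(self, payload, now):
        """Store a message and return the offset assigned to it."""
        offset = self._next_offset
        self._entries.append((offset, now, payload))
        self._next_offset += 1
        return offset

    def expire(self, now):
        """Drop messages older than the retention period, read or not."""
        cutoff = now - self.retention_seconds
        self._entries = [e for e in self._entries if e[1] >= cutoff]

    def read_from(self, offset):
        """Return (offset, payload) pairs at or after the given offset."""
        return [(o, p) for (o, _, p) in self._entries if o >= offset]


class Subscriber:
    """The subscriber, not the log, remembers how far it has read."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self):
        records = self.log.read_from(self.position)
        if records:
            self.position = records[-1][0] + 1
        return [payload for _, payload in records]

    def rewind(self, offset=0):
        """Replay by moving the read position back."""
        self.position = offset


log = RetainedLog(retention_seconds=60)
log.append("login", now=0)
log.append("click", now=30)

a, b = Subscriber(log), Subscriber(log)
first = a.poll()      # a reads everything currently in the log
again = a.poll()      # nothing new for a
a.rewind()
replayed = a.poll()   # the same messages again, because reading did not delete them

log.expire(now=70)    # "login" is now older than 60 seconds
late = b.poll()       # b never read "login" and can no longer see it
```

Reading never removes a message in this model; only the retention period does, which is why a second subscriber or a rewound subscriber sees the same data.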

Apache Kafka Terminology

• Message: A unit of data to send

• Topic: A category in which Kafka maintains messages

• Partition: A logical division of a topic

• Producer: An API for publishing messages to a Kafka topic

• Broker: A server in a Kafka cluster

• Cluster: One or more brokers working together

• Consumer: An API for consuming published messages and processing them further

• Replication: Kafka replicates the log for each partition across servers
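
To make the relationships between these terms concrete, here is a minimal data-model sketch. It is an illustration only: the `Partition` and `Cluster` classes, the `create_topic` method, and the round-robin replica placement are assumptions made for this example, and real Kafka assigns replicas with its own logic.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    topic: str
    index: int
    replicas: list                             # ids of brokers holding a copy of this log
    log: list = field(default_factory=list)    # the messages, in append order


@dataclass
class Cluster:
    broker_ids: list                           # a cluster is one or more brokers
    topics: dict = field(default_factory=dict)

    def create_topic(self, name, partitions, replication_factor):
        """Create a topic and place each partition's replicas on brokers.

        Placement here is simple round-robin (an assumption for illustration).
        """
        n = len(self.broker_ids)
        if replication_factor > n:
            raise ValueError("replication factor exceeds broker count")
        self.topics[name] = [
            Partition(
                topic=name,
                index=p,
                replicas=[self.broker_ids[(p + r) % n]
                          for r in range(replication_factor)],
            )
            for p in range(partitions)
        ]
        return self.topics[name]


cluster = Cluster(broker_ids=[0, 1, 2])
parts = cluster.create_topic("page-views", partitions=3, replication_factor=2)
layout = {p.index: p.replicas for p in parts}
```

With three brokers and a replication factor of two, every partition's log lives on two different servers, so losing one broker leaves a copy of each partition available.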

Apache Kafka Terminology & Big Picture

Message, Topic, Partition, Producer, Broker, Consumer

At a high level, producers send messages over the network to the Kafka cluster.

The Kafka cluster in turn serves them up to consumers.

Apache Kafka Terminology & Big Picture

Let’s do a hands-on exercise with Kafka using what we have learned so far.

Apache Kafka: About Topic and Partition

For each topic, Kafka maintains a partitioned log.

Each partition is an ordered, immutable sequence of messages that is continually appended to.

Each message in a partition is assigned a sequential ID number called the offset.

[Figure: writes to Partition 1, Partition 2, and Partition 3]
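
The ordered, append-only behavior of a partition and the meaning of an offset can be sketched in a few lines. This is a toy model under stated assumptions, not Kafka code: `PartitionLog` and its methods are hypothetical names, and a Python list stands in for the on-disk log.

```python
class PartitionLog:
    """Toy append-only log for one partition (illustrative only)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Add a message at the end and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset):
        """Return the message stored at an offset; existing entries never change."""
        if not 0 <= offset < len(self._messages):
            raise IndexError(f"offset {offset} out of range")
        return self._messages[offset]


# A topic with three partitions; each partition numbers its own messages.
topic = [PartitionLog() for _ in range(3)]

o1 = topic[0].append("a")   # first message in partition 0
o2 = topic[0].append("b")   # second message in partition 0
o3 = topic[1].append("c")   # first message in partition 1

second_in_p0 = topic[0].read(1)
```

Offsets are sequential within a single partition, so ordering is defined per partition rather than across the whole topic: `o1` and `o3` are both 0 even though they refer to different messages.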

Apache Kafka: About Topic and Partition

In Kafka, a Producer is an API for publishing messages to a topic.
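
One thing a producer has to decide when publishing is which partition of the topic receives each message. The sketch below shows one common approach, assuming messages may carry an optional key: keyed messages are hashed to a partition, and keyless messages are spread round-robin. `ToyProducer` is a hypothetical class, and `zlib.crc32` is only a stand-in hash; real Kafka clients use their own partitioning functions.

```python
import itertools
import zlib


class ToyProducer:
    """Illustrative producer that writes into in-memory partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose_partition(self, key):
        # Keyless messages rotate across partitions.
        if key is None:
            return next(self._round_robin)
        # Keyed messages hash to a stable partition (crc32 is a stand-in hash).
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, value, key=None):
        """Append the value to the chosen partition; return (partition, offset)."""
        index = self.choose_partition(key)
        self.partitions[index].append(value)
        return index, len(self.partitions[index]) - 1


producer = ToyProducer(num_partitions=3)

p1, _ = producer.send("deposit 10", key="account-42")
p2, _ = producer.send("withdraw 5", key="account-42")
same_partition = p1 == p2   # same key, same partition, so relative order is kept

keyless = [producer.send(f"event {i}")[0] for i in range(3)]
```

Routing by key keeps all messages for one key in a single partition, which preserves their relative order for whoever consumes that partition.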

Apache Kafka: About Topic and Partition

In Kafka, a Consumer is an API for consuming messages from topics.
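
A consumer typically pulls messages in batches and records how far it has processed, so that it can resume from that point after a restart. The following is a minimal sketch under stated assumptions: `ToyConsumer`, `consume_all`, the `offset_store` dictionary, and the consumer name `"billing"` are all hypothetical, and a plain list stands in for one partition's log.

```python
class ToyConsumer:
    """Illustrative pull-based consumer for a single partition."""

    def __init__(self, partition_log, offset_store, name):
        self.log = partition_log     # list of messages in offset order
        self.store = offset_store    # dict: consumer name -> next offset to read
        self.name = name

    def poll(self, max_records=2):
        """Return (start offset, batch) beginning at the last committed position."""
        start = self.store.get(self.name, 0)
        return start, self.log[start:start + max_records]

    def commit(self, next_offset):
        """Record the offset of the next message this consumer should read."""
        self.store[self.name] = next_offset


def consume_all(consumer, handler):
    """Poll until the log is drained, committing after each processed batch."""
    while True:
        start, batch = consumer.poll()
        if not batch:
            break
        for message in batch:
            handler(message)
        consumer.commit(start + len(batch))


log = ["m0", "m1", "m2", "m3", "m4"]
offsets = {}
seen = []

# First run: process one batch, commit, then stop (simulating a shutdown).
first = ToyConsumer(log, offsets, name="billing")
start, batch = first.poll()
seen.extend(batch)
first.commit(start + len(batch))

# Second run: a new consumer instance resumes from the committed offset.
second = ToyConsumer(log, offsets, name="billing")
consume_all(second, seen.append)
```

Because the position lives outside the consumer object, the second instance continues at `m2` instead of starting over. Committing after processing, as done here, means a crash between the two steps would cause a batch to be processed again rather than skipped.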

Apache Kafka Terminology & Big Picture

Let’s do a hands-on exercise with Kafka using what we have learned so far.

Apache Kafka Use Cases

• Trading systems: identifying risk in real time

• Change data capture: capturing changed data into a data lake environment

• Online gaming: identifying the top scorers of a game
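
As an illustration of the online gaming use case, the sketch below keeps a running leaderboard over a stream of score events. It is a simplified, single-process example: the event shape `(player, points)`, the function name `top_scorers`, and the sample data are assumptions, and in practice the events would be read from a Kafka topic rather than a Python list.

```python
from collections import Counter


def top_scorers(events, n=3):
    """Yield the current top-n (player, total) list after each score event."""
    totals = Counter()
    for player, points in events:
        totals[player] += points
        yield totals.most_common(n)


# Hypothetical stream of (player, points) events.
events = [("ann", 50), ("bob", 70), ("ann", 40), ("cy", 10)]

leaderboards = list(top_scorers(events, n=2))
final = leaderboards[-1]
```

The leaderboard is updated per event instead of being recomputed from a full batch, which is the essence of stream processing: results are available as soon as each message arrives.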

Thank you!

Questions?
