Kafka Connect by Datio

Kafka Connect

Contents

1. Introduction to Kafka

2. Data Ingestion

3. Kafka Connect

1. INTRODUCTION TO KAFKA

Let’s start with the first set of slides

Data Pipeline Problem

[Diagram: frontend servers, a database server, and a chat server each push metrics over ad hoc, point-to-point inter-process communication channels to a metrics server and its metrics UI.]

Data Pipeline Problem

[Diagram: a single publish/subscribe system decouples the frontend, backend, database, shopping cart, and chat servers from the metrics server and metrics UI; as log search and other consumers appear, separate metrics and logging pub/sub systems multiply.]

A publish/subscribe system vs. multiple publish/subscribe systems

Kafka Goals

✓ Decouple data pipelines

✓ Provide persistence for message data to allow multiple consumers

✓ Optimize for high throughput of messages

✓ Allow for horizontal scaling of the system to grow as the data streams grow

[Diagram: frontend, backend, database, and shopping cart servers publish into Kafka; the metrics server, metrics UI, and log search consume from it.]

Kafka Architecture

[Diagram: producers write to and consumers read from a Kafka cluster of three brokers. Topic A's Partition 0 lives on Broker 1 and its Partition 1 on Broker 2; Broker 3 hosts Topic B, and the cluster also holds Topic C. ZooKeeper coordinates the cluster.]

Disk-based retention · Scalable · High throughput

USE CASES: Gold Data · Data Lake · Data

2. DATA INGESTION

Kafka Ingestion

[Diagram: a Kafka producer writes into the Kafka cluster; a Kafka consumer reads from it.]

Use Case Requirements

DATA LOSS?

EXACTLY ONCE?

LATENCY?

THROUGHPUT?

Synchronous send

[Diagram: a producer record (topic, partition, key, value) is sent to the Kafka broker with producer.send(record).get; the call waits for the broker to return metadata or an exception.]

Asynchronous send

[Diagram: the same producer record is sent with producer.send(record); metadata or an exception comes back later without blocking the caller.]
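A minimal Java sketch of the two send modes, assuming a plain string producer; the broker address, topic ("metrics"), key, and value are placeholders, not part of the slides.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SendModes {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("metrics", "host-1", "cpu=0.42");

            // Synchronous send: block until the broker answers with metadata or an exception
            RecordMetadata meta = producer.send(record).get();
            System.out.printf("sync: partition=%d offset=%d%n", meta.partition(), meta.offset());

            // Asynchronous send: return immediately and handle the result in a callback
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("async: partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
                }
            });
            producer.flush();   // make sure the async send completes before closing
        }
    }
}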

Producer internals

[Diagram: send() takes a producer record (topic, partition, key, value) through the serializer and then the partitioner, which appends it to a batch for its topic-partition (Topic A Partition 0, Topic B Partition 1, ...). Batches are shipped to the broker; if a send fails and retries remain it is retried, otherwise an exception is raised; on success the broker commits the batch and returns metadata (topic, partition, offset).]
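A sketch of how those stages map onto producer configuration; the broker address, topic, and the specific batch size, linger time, and retry count are illustrative values, not recommendations from the slides.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        // Serializer stage: keys and values are turned into byte arrays
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Batching stage: records for the same topic-partition are grouped
        // until the batch is full or linger.ms expires
        props.put("batch.size", 16384);
        props.put("linger.ms", 10);
        // Retry stage: failed batches are re-sent before the exception surfaces
        props.put("retries", 3);

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Partitioner stage: with no explicit partition, the key ("host-1")
            // is hashed to choose the target partition
            producer.send(new ProducerRecord<>("metrics", "host-1", "cpu=0.42"));
        }
    }
}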

Consumers and consumer groups

[Diagram: Topic T1 has four partitions, each an ordered log of messages at increasing offsets (1, 2, 3, ...). The partitions are split across the consumers of Consumer Group 1: one consumer reads all four partitions, two consumers read two each, four consumers read one each, and once there are more consumers than partitions the extra consumers sit idle.]
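A minimal Java consumer sketch for that model; the group id ("metrics-readers"), topic, and broker address are placeholders. Every consumer started with the same group.id joins the same group and takes over a share of the partitions.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "metrics-readers");           // consumers sharing this id split the partitions
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("metrics"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}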

Apache Flume

[Diagram: a Flume agent chains a Source, through a Channel Processor with Interceptors #1..#N, into a Channel, and from the Channel into a Sink.]

Reliable · Fault tolerant · Customizable · Manageable · Centralized sources

Flume + Kafka

[Diagram: two integrations: Kafka acting as a reliable Flume channel between Source and Sink, and Flume acting as a Kafka producer/consumer.]
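A sketch of the "Kafka as a reliable Flume channel" layout as a Flume agent configuration. It assumes Flume 1.7-style Kafka channel property names; the agent name, port, topic, and broker address are placeholders.

# hypothetical agent "a1": netcat source -> Kafka-backed channel -> logger sink
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# the channel itself is a Kafka topic, so buffered events survive agent restarts
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = localhost:9092
a1.channels.c1.kafka.topic = flume-channel

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel    = c1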

3. KAFKA CONNECT

Kafka Connect

[Diagram: a Data Source feeds Kafka Source Connect, which writes into Kafka alongside the Schema Registry; Kafka Sink Connect reads from Kafka and delivers to a Data Sink.]

Ingestion integration

Streaming and batch

Scales to the application

Failover control

Accessible connector API
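To give a feel for that connector API, here is a toy source connector sketch in Java; the class names, the "target.topic" setting, and the random-number source are hypothetical and only illustrate the SourceConnector/SourceTask contract.

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

import java.util.Collections;
import java.util.List;
import java.util.Map;

public class RandomNumberSourceConnector extends SourceConnector {
    private Map<String, String> config;

    @Override public String version() { return "0.1.0"; }

    @Override public void start(Map<String, String> props) { this.config = props; }

    @Override public Class<? extends Task> taskClass() { return RandomNumberSourceTask.class; }

    @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
        // one task is enough for this toy source; real connectors split their
        // inputs (files, tables, partitions) across up to maxTasks config maps
        return Collections.singletonList(config);
    }

    @Override public void stop() { }

    @Override public ConfigDef config() {
        return new ConfigDef().define("target.topic", ConfigDef.Type.STRING,
                ConfigDef.Importance.HIGH, "Topic to write generated records to");
    }

    public static class RandomNumberSourceTask extends SourceTask {
        private String topic;

        @Override public String version() { return "0.1.0"; }

        @Override public void start(Map<String, String> props) { topic = props.get("target.topic"); }

        @Override public List<SourceRecord> poll() throws InterruptedException {
            Thread.sleep(1000);
            double value = Math.random();
            // the source partition/offset maps let Connect track progress for this source
            return Collections.singletonList(new SourceRecord(
                    Collections.singletonMap("source", "random"),
                    Collections.singletonMap("position", System.currentTimeMillis()),
                    topic, Schema.FLOAT64_SCHEMA, value));
        }

        @Override public void stop() { }
    }
}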

Worker Mode

[Diagram: Connect runs inside workers, which embed an ordinary Kafka producer and Kafka consumer; each worker executes its tasks on threads. Workers run in one of two modes: standalone or distributed.]

[Diagram: a connector splits its inputs (Input 1, Input 2, ..., Input N) into tasks, and the tasks are executed by a worker.]
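For standalone mode, a sketch of the two properties files the worker takes on the command line, following the file-source example shipped with Kafka; the file paths and topic name are placeholders.

# connect-standalone.properties (worker config)
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets

# connect-file-source.properties (connector config)
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/test.txt
topic=connect-test

Started with bin/connect-standalone.sh connect-standalone.properties connect-file-source.properties, the single worker runs the connector and its tasks in one process.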

[Diagram: the producer send path from section 2 (serializer, partitioner, per-partition batches, retries, broker metadata) shown again; a Connect worker publishes records through this same producer.]

Worker settings to ensure no data loss:

request.timeout.ms=MAX_VALUE

retries=MAX_VALUE

max.in.flight.requests.per.connection=1

acks=all

max.block.ms=MAX_VALUE
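One way these could be applied, assuming a Connect worker properties file where producer.-prefixed settings are passed through to the embedded producer, with MAX_VALUE spelled out as the maximum of each setting's type:

producer.acks=all
producer.retries=2147483647
producer.max.in.flight.requests.per.connection=1
producer.request.timeout.ms=2147483647
producer.max.block.ms=9223372036854775807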

[Diagram: distributed mode. Workers 1, 2, and 3 share a common worker config, and source (connector) configs are submitted to the cluster. The connectors' tasks are spread across the workers: Connector 1 has Task 1 (partitions 1,2), Task 2 (partitions 3,4), and Task 3 (partitions 5,6); Connector 2 has Task 1 (partitions 1,2) and Task 2 (partitions 3,4).]

Standalone worker: simple; one worker runs N connectors/tasks.

Distributed worker: scalability, fault tolerance, connectors & tasks shared across workers.
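A sketch of a distributed worker config; the topic names and group id are placeholders. In this mode connectors are not given on the command line but submitted to the workers' REST API (by default on port 8083).

# distributed worker config (the same file on every worker in the cluster)
bootstrap.servers=localhost:9092
group.id=connect-cluster                 # workers sharing this id form one Connect cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# connector configs, offsets and task status are stored in Kafka topics
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status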

Schema Registry

[Diagram: a worker task's sendRecords() converts SourceRecords into ProducerRecords for the embedded producer. The Schema Registry stores schemas per subject (tied to a topic) and per version.]

REST API · Serializers · Formatters · Multiple versions of the same schema

[Diagram: when a record is produced, its schema is registered and only the schema id travels with the data; the consumer's deserializer uses that id to fetch the schema back and decode the stream.]
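A sketch of a producer using the registry through Confluent's Avro serializer; the schema, topic, and URLs are placeholders, and the io.confluent.kafka.serializers.KafkaAvroSerializer class plus the schema.registry.url setting come from the Confluent platform rather than from these slides.

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroMetricsProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // the Avro serializer registers the schema and embeds its id in each message
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");   // placeholder registry

        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Metric\","
                + "\"fields\":[{\"name\":\"value\",\"type\":\"double\"}]}");
        GenericRecord metric = new GenericData.Record(schema);
        metric.put("value", 0.42);

        try (Producer<String, Object> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("metrics", "host-1", metric));
        }
    }
}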

THANKS! Any questions?

@datiobd · rbravo@datiobd.com · datio-big-data
