large scale stream processing with apache flink€¦ · large scale stream processing with apache...

Post on 20-May-2020

8 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Large scale stream processing with Apache FlinkNikolay Stoitsev

Sr. Software Engineer at Uber Tech Sofia

Stream Processing?

Stream Processing?User Interaction Logs

Stream Processing?User Interaction Logs

Application Logs

Stream Processing?User Interaction Logs

Application Logs

Sensor Data

Stream Processing?User Interaction Logs

Application Logs

Sensor Data

Database Commit Logs

Infinite Dataset

Producer Stream

Producer Stream HDFS

Producer Stream HDFS

Hive

Producer Stream HDFS

Hive

Big Latency

Producer Stream

HDFS

Real-time service

Apache Stormstorm.apache.org

High-latency & accurate

vs.Low-latency & approximation

Lambda architecture

https://www.oreilly.com/ideas/questioning-the-lambda-architecture

Kappa Architecture

Use Apache KafkaDurable, scalable, fault-tolerant

Producer KafkaStream

Processor

Metrics we want to track

Net payout

Daily items sold

Weekly items sold

Order acceptance rate

Order preparation speed

Item rating

Real time

Scalable

Granular

Highly available

Order Stream

Payment Stream

User Rating Stream

Order Stream

Payment Stream

User Rating Stream

Stream Processor

OLAP

samza.apache.org

Apache Flinkflink.apache.org

Everything is a batch

vs.Everything is a stream

Single JVM Cluster Cloud

Runtime

DataSet API DataStream API

Dataflow graph

Source

Source

Operator

Operator

Operator Sinc

OLAP

https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html

https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html

https://ci.apache.org/projects/flink/flink-docs-release-1.6/concepts/programming-model.html

Flink Program

Optimizer Graph Builder

Client

Flink Program

Optimizer Graph Builder

Client Job Manager

Task Manager Task Manager

Flink Program

Optimizer Graph Builder

Client Job Manager

Task Manager Task Manager

Snapshot Store

Fault tolerant

Flink Program

Optimizer Graph Builder

Client Job Manager

Task Manager Task Manager

Snapshot Store

Lightweight Asynchronous Snapshots for Distributed Dataflows

Paris Carbone, Gyula Fóra, Stephan EwenSeif HaridiKostas Tzoumas

Barrier Msg Msg Barrier Msg Msg Barrier

Operator

Barrier Msg Msg BarrierMsg Msg

Operator

Msg

Snapshot Store

Exactly Once Processing

Can handle very large state

Flink Program

Optimizer Graph Builder

Client Job Manager

Task Manager Task Manager

Snapshot Store

Flink Program

Optimizer Graph Builder

Client Job Manager

Task Manager Task Manager

Snapshot Store

Job Manager

Job Manager

Zookeeper

Flink Program

Optimizer Graph Builder

Client Job Manager

Task Manager Task Manager

Snapshot Store

Job Manager

Job Manager

Zookeeper

Flink Program

Optimizer Graph Builder

Client

Task Manager Task Manager

Snapshot Store

Job Manager

Job Manager

Zookeeper

Joining Streams

Order Stream

User Rating Stream

Order Stream

User Rating Stream

Order Stream

User Rating Stream

Local Join

Local Join

Order Stream

User Rating Stream

Local Join

Local Join

Apache Flink● Can join streams

● Fault tolerant

● Exactly Once Processing

● Combines stream and batch processing

… but it requires Java/Scala code

Scalable, efficient and robust

github.com/uber/AthenaX

SQL → what data to analyze

Flink → how to analyze it

Resource estimation and auto scaling

Monitoring and automatic failure recovery

eng.uber.com/athenax

Thanks!Nikolay Stoitsev @ Uber

top related