
© DIMA 2016 | © 2013 Berlin Big Data Center • All Rights Reserved

Towards Declarative Stream Processing using Apache Flink

Tilmann Rabl
Berlin Big Data Center

www.dima.tu-berlin.de | bbdc.berlin | [email protected]


Agenda


Apache Flink Primer
• Architecture
• Execution Engine
• Some key features
• A short demo (!)

Stream Processing with Apache Flink
• Flexible windows / stream discretization
• Exactly-once processing & fault tolerance

Apache Flink Use Cases
• How companies use Flink (if we have time)

With slides from Data Artisans, Volker Markl, Asterios Katsifodimos, Juan Soto


Flink Primer


What is Apache Flink?

Apache Flink is an open source platform for scalable batch and stream data processing.

http://flink.apache.org

• The core of Flink is a distributed streaming dataflow engine.

• Executing dataflows in parallel on clusters

• Providing a reliable foundation for various workloads

• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers


What can I do with it?

An engine that can natively support all of these workloads:

• Stream processing
• Batch processing
• Machine learning at scale
• Graph analysis


Flink Timeline


Stratosphere: General Purpose Programming + Database Execution

Draws on database technology:
• Relational algebra
• Declarativity
• Query optimization
• Robust out-of-core execution

Draws on MapReduce technology:
• Scalability
• User-defined functions
• Complex data types
• Schema on read

Adds:
• Iterations
• Advanced dataflows
• General APIs
• Native streaming


Flink in the Analytics Ecosystem

By layer (figure):
• Applications & languages: Hive, Mahout, Cascading, Pig, Crunch, Giraph
• Data processing engines: MapReduce, Flink, Spark, Storm, Tez
• App and resource management: Yarn, Mesos
• Storage, streams: HDFS, Kafka, HBase


Where in my cluster does Flink fit?

Sources (server logs, transaction logs, sensor logs) feed a gathering/integration layer that
- gathers and backs up streams,
- offers streams for consumption,
- provides stream recovery.

Flink sits in the analysis layer, which
- analyzes and correlates streams,
- creates derived streams and state,
- provides these to upstream systems.


Flink Execution Model
• Flink program = DAG* of operators and intermediate streams
• Operator = computation + state
• Intermediate streams = logical streams of records
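The model can be sketched in a few lines of plain Scala (illustrative names only, not Flink's API): an operator pairs a computation with operator-local state, and records flow along logical streams between operators.

```scala
// Illustrative sketch, not Flink's API: an operator = computation + state.
case class Operator[I, O](name: String)(f: I => O) {
  var state: Long = 0L                      // operator-local state: records seen
  def process(record: I): O = { state += 1; f(record) }
}

// A two-operator pipeline: source -> parse -> double -> sink.
val parse   = Operator[String, Int]("parse")(_.toInt)
val doubler = Operator[Int, Int]("double")(_ * 2)

val out = List("1", "2", "3").map(parse.process).map(doubler.process)
```

In real Flink, the DAG additionally captures parallelism and the data exchange between parallel operator instances.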


Architecture
• Hybrid MapReduce and MPP database runtime
• Pipelined/streaming engine
  – Complete DAG deployed

A Job Manager coordinates the parallel workers (Workers 1–4 in the figure).


Technology inside Flink

Program (transitive closure):

case class Path(from: Long, to: Long)
val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) => Path(path.from, edge.to) }
    .union(paths)
    .distinct()
  next
}

The program is compiled into a dataflow graph; the example dataflow (figure) reads orders.tbl and lineitem.tbl, applies Filter/Map, performs a hybrid hash join (both inputs hash-partitioned on [0]; one side builds the hash table, the other probes), and finishes with a sorted GroupReduce.

• Pre-flight (client): cost-based optimizer, type extraction stack
• Master: task scheduling, recovery metadata; deploys operators, tracks intermediate results
• Workers: memory manager, out-of-core algorithms, batch & streaming, state & checkpoints


Rich set of operators

Map, Reduce, Join, CoGroup, Union, Iterate, Delta Iterate, Filter, FlatMap, GroupReduce, Project, Aggregate, Distinct, Vertex-Update, Accumulators, …


Effect of optimization

The same program yields a different execution plan (Plans A, B, and C in the figure) when run on a sample on the laptop, on large files on the cluster, or a month later after the data has evolved. The plans differ in choices such as hash vs. sort, partition vs. broadcast, caching, and reusing partitioning/sort order.


Scale Out



Flink Optimizer: Transitive Closure

• What you write is not what is executed
• No need to hardcode execution strategies
• The Flink optimizer decides:
  – Pipelines and dam/barrier placement
  – Sort- vs. hash-based execution
  – Data exchange (partition vs. broadcast)
  – Data partitioning steps
  – In-memory caching

In the transitive-closure plan (figure), the step function Join, Union, Distinct iterates over paths/newPaths read from HDFS; the optimizer hash-partitions the inputs (on [0] and [1]), uses a hybrid hash join, caches loop-invariant data in memory, co-locates JOIN + UNION and DISTINCT + JOIN, and runs the GroupReduce sorted on [0].
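One such decision can be sketched with a toy cost model (entirely illustrative; Flink's real optimizer uses richer cost estimates): broadcast the smaller join input only when shipping it to every worker is cheaper than repartitioning both inputs.

```scala
// Toy cost model (illustrative only): compare the bytes shipped by each strategy.
def chooseExchange(leftSize: Long, rightSize: Long, parallelism: Int): String = {
  val broadcastCost   = math.min(leftSize, rightSize) * parallelism // ship small side everywhere
  val repartitionCost = leftSize + rightSize                        // reshuffle both sides once
  if (broadcastCost < repartitionCost) "broadcast" else "partition"
}

val tinyDimension = chooseExchange(leftSize = 1000000L, rightSize = 10L, parallelism = 8)
val twoBigInputs  = chooseExchange(leftSize = 1000000L, rightSize = 900000L, parallelism = 8)
```

The point of the slide stands either way: because the strategy is chosen from data statistics at optimization time, it need not be hardcoded in the program.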


Built-in vs. driver-based looping

• Driver-based looping: the loop lives outside the system, in the client/driver program. The iterative program looks like many independent jobs (Step, Step, Step, …), e.g., a client repeatedly submitting map/join/reduce jobs.
• Built-in iterations (Flink): dataflows with feedback edges. The system is iteration-aware and can optimize the job.
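The difference can be sketched in plain Scala (a toy stand-in, no engine involved): with driver-based looping, per-job setup cost repeats every step, while an iteration-aware job pays it once and can keep loop-invariant data around.

```scala
// Toy stand-in for a "job": one full pass over the data, with per-job setup.
var setupCount = 0
def runJob(xs: List[Int]): List[Int] = { setupCount += 1; xs.map(_ + 1) }

// Driver-based looping: the client resubmits an independent job per step.
var data = List(0, 0, 0)
for (_ <- 1 to 5) data = runJob(data)          // 5 jobs, 5 setups

// Iteration-aware execution: one job with an internal feedback edge.
def runIterativeJob(xs: List[Int], steps: Int): List[Int] = {
  var cur = xs
  for (_ <- 1 to steps) cur = cur.map(_ + 1)   // the loop stays inside the job
  cur
}
val result = runIterativeJob(List(0, 0, 0), steps = 5)
```

Both produce the same result; the iteration-aware version simply avoids resubmitting and re-planning the job on every step.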


Managed Memory
• Language APIs automatically convert objects to tuples
  – Tuples are mapped to pages/buffers of bytes
  – Operators can work directly on these pages/buffers
• Full control over memory; out-of-core enabled
• Operators (e.g., the hybrid hash join) address individual fields without deserializing the whole object: robust when memory runs out

Example layout (figure) for Tuple3<Integer, Double, Person> with

public class Person { int id; String name; }

serialized as: int F0 (4 B) | double F1 (8 B) | POJO header (1 B) | int id (4 B) | String name (variable length)
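The idea of operating on serialized data can be sketched with java.nio.ByteBuffer as a simplified stand-in for Flink's memory segments (the offsets below assume the fixed-width layout above):

```scala
import java.nio.ByteBuffer

// One "page" holding a record in serialized form.
val page = ByteBuffer.allocate(64)
page.putInt(42)        // F0: int, 4 bytes at offset 0
page.putDouble(3.5)    // F1: double, 8 bytes at offset 4

// An operator can compare or hash F0 directly from the bytes,
// without deserializing the whole tuple into objects.
def readF0(buf: ByteBuffer, recordOffset: Int): Int = buf.getInt(recordOffset)

val f0 = readF0(page, 0)
val f1 = page.getDouble(4)
```

Because fields sit at known offsets, sorting or hash-joining on F0 touches only those bytes, which keeps memory use predictable and makes spilling pages to disk straightforward.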


Sneak peek: two of Flink's APIs

case class Word(word: String, frequency: Int)

DataStream API (streaming):

val lines: DataStream[String] = env.fromSocketStream(...)

lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .keyBy("word")
  .window(Time.of(5, SECONDS)).every(Time.of(1, SECONDS))
  .sum("frequency")
  .print()

DataSet API (batch):

val lines: DataSet[String] = env.readTextFile(...)

lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()


DEMO – BATCH

"Inspired" by http://dataartisans.github.io/flink-training/index.html


Stream Processing with Flink


Ingredients of a Streaming System
• Streaming execution engine
• Windowing (a.k.a. discretization)
• Fault tolerance
• High-level programming API (or language)


Batch is a Special Case of Streaming

• Blocking operators (e.g., a hybrid hash join) are embedded in the streaming topology (map, join, sum in the figure)
• Lower-overhead fault tolerance via replaying intermediate results


Mini-Batching vs. Native Streaming

Discretized Streams (D-Streams): a stream discretizer turns the stream into a series of jobs (Job, Job, Job, …):

while (true) {
  // get next few records
  // issue batch computation
}

Native streaming: long-standing operators process each record as it arrives:

while (true) {
  // process next record
}
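The two loops can be made concrete with a toy sketch in plain Scala (no engine involved): mini-batching runs an independent "job" per chunk, while native streaming keeps one long-standing operator whose state is updated per record.

```scala
// Mini-batching: discretize the stream, one independent "job" per batch.
def miniBatch(stream: Iterator[Int], batchSize: Int): List[Int] =
  stream.grouped(batchSize).map(_.sum).toList

// Native streaming: a single long-standing operator with per-record updates.
def nativeStreaming(stream: Iterator[Int]): Int = {
  var state = 0
  stream.foreach(record => state += record)
  state
}

val batchSums = miniBatch(Iterator(1, 2, 3, 4, 5, 6), batchSize = 2)
val total     = nativeStreaming(Iterator(1, 2, 3, 4, 5, 6))
```

Note how the mini-batch results depend on the batch boundaries, which is exactly the coupling between batch size and query results criticized on the next slide.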


Problems of Mini-Batching
• Latency
  – Each mini-batch schedules a new job, loads user libraries, establishes DB connections, etc.
• Programming model
  – Does not separate business logic from recovery: changing the mini-batch size changes query results
• Power
  – Keeping and updating state across mini-batches is only possible via immutable computations


Stream Discretization
• Data is unbounded
  – We are interested in a (recent) part of it, e.g., the last 10 days
• Most common windows: time and count
  – Mostly in sliding, fixed, and tumbling forms
• Need for data-driven window definitions
  – e.g., user sessions (periods of user activity followed by inactivity), price changes, etc.
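A data-driven session window can be sketched as follows (toy code over a finished list; a real streaming engine does this incrementally): a pause of more than `gap` time units between events closes the current session.

```scala
// Toy session windowing: group timestamps into sessions separated by
// inactivity gaps longer than `gap`.
def sessionize(timestamps: List[Long], gap: Long): List[List[Long]] =
  timestamps.foldLeft(List.empty[List[Long]]) {
    case (Nil, t) => List(List(t))
    case (sessions :+ current, t) if t - current.last <= gap =>
      sessions :+ (current :+ t)          // activity continues: extend session
    case (sessions, t) => sessions :+ List(t)  // gap exceeded: start a new session
  }

val sessions = sessionize(List(1L, 2L, 3L, 20L, 21L, 50L), gap = 5L)
// the 17- and 29-unit pauses split the input into three sessions
```

Unlike time or count windows, the window boundaries here are determined by the data itself, which is exactly what fixed discretization schemes cannot express.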


Great read: "The world beyond batch: Streaming 101" by Tyler Akidau
https://beta.oreilly.com/ideas/the-world-beyond-batch-streaming-101


Flink's Discretization

• Allows very flexible windowing
• Borrows ideas from (and extends) IBM's SPL
  – SLIDE = trigger = when to emit a window
  – RANGE = eviction = what the window contains
• Allows for lots of optimization
  – Not part of this talk
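The trigger/eviction split can be sketched with a count-based sliding window (an illustrative simplification of the SLIDE/RANGE idea, not Flink's or SPL's API): the trigger decides when to emit, and the eviction decides which records the emitted window keeps.

```scala
import scala.collection.mutable.ListBuffer

// Count-based sliding window: emit every `slide` records (trigger),
// keep only the last `range` records (eviction).
def slidingCountWindows[A](stream: List[A], range: Int, slide: Int): List[List[A]] = {
  var buffer = List.empty[A]
  var sinceEmit = 0
  val emitted = ListBuffer.empty[List[A]]
  for (record <- stream) {
    buffer = (buffer :+ record).takeRight(range)  // eviction: RANGE
    sinceEmit += 1
    if (sinceEmit == slide) {                     // trigger: SLIDE
      emitted += buffer
      sinceEmit = 0
    }
  }
  emitted.toList
}

val windows = slidingCountWindows(List(1, 2, 3, 4, 5, 6), range = 4, slide = 2)
```

Because trigger and eviction are independent knobs, tumbling (range == slide), sliding (range > slide), and other shapes all fall out of the same mechanism.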


Flink's Windowing
• Windows can be any combination of (multiple) triggers & evictions
  – Arbitrary tumbling, sliding, session, etc. windows can be constructed
• Common triggers/evictions are part of the API
  – Time (processing vs. event time), count
• Even more flexibility: define your own UDF trigger/eviction
• Examples:

dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5)));
dataStream.keyBy(0).window(TumblingEventTimeWindows.of(Time.seconds(5)));


Example Analysis: Windowed Aggregation

val windowedStream = stockStream
  .window(Time.of(10, SECONDS)).every(Time.of(5, SECONDS))           // (1)
val lowest      = windowedStream.minBy("price")                      // (2)
val maxByStock  = windowedStream.groupBy("symbol").maxBy("price")    // (3)
val rollingMean = windowedStream.groupBy("symbol").mapWindow(mean _) // (4)

Sample window (figure):
(1) Input: StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 23.8), StockPrice(HDP, 26.6)
(2) Lowest price: StockPrice(HDP, 23.8)
(3) Max by stock: StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 26.6)
(4) Rolling mean per stock: StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 25.2)


Current Benchmark Results
Performed by Yahoo! Engineering, Dec 16, 2015

"[..] Storm 0.10.0, 0.11.0-SNAPSHOT and Flink 0.10.1 show sub-second latencies at relatively high throughputs [..]. Spark Streaming 1.5.1 supports high throughputs, but at a relatively higher latency."

Flink achieves the highest throughput with competitive low latency!

Source: http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at


Flink Community

Figure: # unique contributor ids by Git commits, plotted from Jul 09 to Sep 17 (y-axis 0 to 300).


Conclusion

The case for Flink as a stream processor:
– Proper streaming engine foundation
– Flexible windowing
– Fault tolerance with exactly-once guarantees
– Integration with batch
– A large (and growing!) community


Thank You

Contact: Tilmann Rabl, [email protected]