
© DIMA 2017 | © 2013 Berlin Big Data Center • All Rights Reserved

Apache Flink
Big Data Stream Processing

Tilmann Rabl
Berlin Big Data Center

www.dima.tu-berlin.de | bbdc.berlin | [email protected] – 11.10.2017


Agenda

Disclaimer: I am neither a Flink developer nor affiliated with data Artisans.


Agenda

Flink Primer
• Background & APIs (-> Polystore functionality)
• Execution Engine
• Some key features

Stream Processing with Apache Flink
• Key features

With slides from data Artisans, Volker Markl, Asterios Katsifodimos


Flink Timeline


Stratosphere: General Purpose Programming + Database Execution

Draws on Database Technology
• Relational Algebra
• Declarativity
• Query Optimization
• Robust Out-of-core

Draws on MapReduce Technology
• Scalability
• User-defined Functions
• Complex Data Types
• Schema on Read

Adds
• Iterations
• Advanced Dataflows
• General APIs
• Native Streaming


The APIs

Process Function (events, state, time)

DataStream API (streams, windows)

Table API (dynamic tables)

Stream SQL

Stream- & Batch Processing

Analytics

Stateful Event-Driven Applications


Process Function

class MyFunction extends ProcessFunction[MyEvent, Result] {

  // declare state to use in the program
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

  def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
    // work with event and state
    (event, state.value) match { … }

    out.collect(…)   // emit events
    state.update(…)  // modify state

    // schedule a timer callback
    ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
  }

  def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
    // handle callback when event-/processing- time instant is reached
  }
}


Data Stream API

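// read raw lines from Kafka
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09[String](…))

// parse each line into an event
val events: DataStream[Event] = lines.map((line) => parse(line))

// aggregate per sensor over 5-second windows
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())

// write the results to files
stats.addSink(new RollingSink(path))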


Table API & Stream SQL



What can I do with it?

An engine that can natively support all these workloads.

Flink

• Stream processing
• Batch processing
• Machine Learning at scale
• Graph Analysis
• Complex event processing


Flink in the Analytics Ecosystem

Applications & Languages: Hive, Mahout, Cascading, Pig, Crunch, Giraph, …

Data processing engines: MapReduce, Flink, Spark, Storm, Tez

App and resource management: Yarn, Mesos

Storage, streams: HDFS, HBase, Kafka


Where in my cluster does Flink fit?

- Gather and backup streams
- Offer streams for consumption
- Provide stream recovery

- Analyze and correlate streams
- Create derived streams and state
- Provide these to upstream systems

Server logs
Trxn logs
Sensor logs
Upstream systems

Gathering | Integration | Analysis


Architecture
• Hybrid MapReduce and MPP database runtime
• Pipelined/Streaming engine
  – Complete DAG deployed

[Figure: Job Manager with Worker 1, Worker 2, Worker 3, Worker 4]


Flink Execution Model
• Flink program = DAG* of operators and intermediate streams
• Operator = computation + state
• Intermediate streams = logical stream of records



Technology inside Flink

case class Path (from: Long, to: Long)
val tc = edges.iterate(10) {
  paths: DataSet[Path] =>
    val next = paths
      .join(edges)
      .where("to")
      .equalTo("from") {
        (path, edge) => Path(path.from, edge.to)
      }
      .union(paths)
      .distinct()
    next
}
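The step function of this iteration can be mimicked in plain Python to see what it computes. This is only a sketch, not Flink code: the function name `transitive_closure`, the parameter `max_iterations`, the early exit, and the sample edges are assumptions made for illustration, and Python sets stand in for a DataSet followed by distinct().

```python
from typing import Set, Tuple

Path = Tuple[int, int]  # (from, to), mirroring the Path case class


def transitive_closure(edges: Set[Path], max_iterations: int = 10) -> Set[Path]:
    """Bounded iteration: extend known paths by one edge per round."""
    paths = set(edges)
    for _ in range(max_iterations):
        # join(edges).where("to").equalTo("from"), projected to Path(from, to)
        joined = {(p_from, e_to)
                  for (p_from, p_to) in paths
                  for (e_from, e_to) in edges
                  if p_to == e_from}
        # union(paths).distinct(); a set removes duplicates by construction
        next_paths = joined | paths
        if next_paths == paths:
            # early exit is an addition of this sketch, not part of the slide code
            break
        paths = next_paths
    return paths


# hypothetical sample graph: 1 -> 2 -> 3 -> 4
closure = transitive_closure({(1, 2), (2, 3), (3, 4)})
print(sorted(closure))
```

Each round lengthens the known paths by one edge, so a chain of three edges is fully closed after two rounds.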

Pre-flight (Client): Program -> Dataflow Graph
• Cost-based optimizer
• Type extraction stack

Master: deploy operators, track intermediate results
• Task scheduling
• Recovery metadata

Workers
• Memory manager
• Out-of-core algorithms
• Batch & streaming
• State & checkpoints

[Figure: example dataflow graph with DataSource orders.tbl, Filter, Map, DataSource lineitem.tbl, Join (Hybrid Hash, buildHT / probe, hash-part [0] on both inputs), GroupRed (sort, forward)]

Rich set of operators

Map, Reduce, Join, CoGroup, Union, Iterate, Delta Iterate, Filter, FlatMap, GroupReduce, Project, Aggregate, Distinct, Vertex-Update, Accumulators, …

Effect of optimization

• Run on a sample on the laptop: Execution Plan A
• Run on large files on the cluster: Execution Plan B
• Run a month later after the data evolved: Execution Plan C

• Hash vs. Sort
• Partition vs. Broadcast
• Caching
• Reusing partition/sort


Flink Optimizer Transitive Closure

• What you write is not what is executed
• No need to hardcode execution strategies

• Flink Optimizer decides:
  – Pipelines and dam/barrier placement
  – Sort- vs. hash-based execution
  – Data exchange (partition vs. broadcast)
  – Data partitioning steps
  – In-memory caching

[Figure: transitive closure dataflow. Program view: HDFS, paths, Iterate, step function (Join, Union, Distinct), newPaths, replace. Optimized plan annotations: Hash Partition on [0], Hash Partition on [1], Hybrid Hash Join, Loop-invariant data cached in memory, Group Reduce (Sorted (on [0])), Co-locate JOIN + UNION, Hash Partition on [1], Forward, Co-locate DISTINCT + JOIN]

Scale Out


Stream Processing with Flink


8 Requirements of Big Streaming
• Keep the data moving
  – Streaming architecture
• Declarative access
  – E.g. StreamSQL, CQL
• Handle imperfections
  – Late, missing, unordered items
• Predictable outcomes
  – Consistency, event time
• Integrate stored and streaming data
  – Hybrid stream and batch
• Data safety and availability
  – Fault tolerance, durable state
• Automatic partitioning and scaling
  – Distributed processing
• Instantaneous processing and response

The 8 Requirements of Real-Time Stream Processing – Stonebraker et al. 2005


8 Requirements of Streaming Systems
• Keep the data moving
  – Streaming architecture
• Declarative access
  – E.g. StreamSQL, CQL
• Handle imperfections
  – Late, missing, unordered items
• Predictable outcomes
  – Consistency, event time
• Integrate stored and streaming data
  – Hybrid stream and batch – see StreamSQL
• Data safety and availability
  – Fault tolerance, durable state
• Automatic partitioning and scaling
  – Distributed processing
• Instantaneous processing and response

The 8 Requirements of Real-Time Stream Processing – Stonebraker et al. 2005


How to keep data moving?

Discretized Streams (mini-batch)
Stream discretizer -> Job, Job, Job, Job

while (true) {
  // get next few records
  // issue batch computation
}

Native streaming
Long-standing operators

while (true) {
  // process next record
}
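The difference between the two loops can be made concrete with a small Python sketch. It is an illustration under assumptions, not an engine implementation: `mini_batch`, `native_streaming`, the batch size, and the sample stream are all made-up names and values.

```python
from typing import Callable, Iterable, Iterator, List


def mini_batch(records: Iterable[int], batch_size: int,
               job: Callable[[List[int]], int]) -> Iterator[int]:
    """Discretized stream: buffer a few records, then issue one batch job."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield job(batch)
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        yield job(batch)


def native_streaming(records: Iterable[int]) -> Iterator[int]:
    """Long-standing operator: keeps state and emits after every record."""
    running_sum = 0
    for record in records:
        running_sum += record
        yield running_sum


stream = [1, 2, 3, 4, 5]  # hypothetical input
batch_results = list(mini_batch(stream, batch_size=2, job=sum))
native_results = list(native_streaming(stream))
print(batch_results)   # one result per batch
print(native_results)  # one result per record
```

In the mini-batch variant a record waits until its batch is full before it influences any output, while the long-standing operator reacts to each record as it arrives and carries its state across records.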


Declarative Access – Stream SQL

Stream / Table Duality

Table with Primary Key
Table without Primary Key


Handle Imperfections - Event Time et al.

• Event time
  – Data item production time
• Ingestion time
  – System time when data item is received
• Processing time
  – System time when data item is processed
• Typically, these do not match!
• In practice, streams are unordered!

Image: Tyler Akidau


Time: Event Time Example

Processing Time | 1977       | 1980      | 1983       | 1999      | 2002       | 2005        | 2015
Event Time      | Episode IV | Episode V | Episode VI | Episode I | Episode II | Episode III | Episode VII
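The episode example can be replayed in code to show how a system might restore event-time order from a processing-time arrival order. The sketch below is a simplified assumption, not Flink's implementation: `reorder_by_event_time` and `max_delay` are hypothetical names, the episode number stands in for the event timestamp, and the watermark rule (largest timestamp seen minus the tolerated delay minus one) only imitates the idea of bounded out-of-orderness.

```python
import heapq
from typing import List, Tuple

Event = Tuple[int, str]  # (event_time, payload)


def reorder_by_event_time(arrivals: List[Event], max_delay: int):
    """Emit events in event-time order, tolerating max_delay units of disorder."""
    buffer: List[Event] = []   # min-heap ordered by event time
    ordered: List[Event] = []
    late: List[Event] = []
    watermark = float("-inf")
    for event in arrivals:  # list order = processing-time order
        if event[0] <= watermark:
            late.append(event)  # the watermark has already passed this timestamp
            continue
        heapq.heappush(buffer, event)
        # watermark: no event with a timestamp at or below it is expected any more
        watermark = max(watermark, event[0] - max_delay - 1)
        while buffer and buffer[0][0] <= watermark:
            ordered.append(heapq.heappop(buffer))
    while buffer:  # end of input: flush what is left
        ordered.append(heapq.heappop(buffer))
    return ordered, late


# arrival (release) order of the episodes; event time = episode number
arrivals = [(4, "IV"), (5, "V"), (6, "VI"), (1, "I"), (2, "II"), (3, "III"), (7, "VII")]
ordered, late = reorder_by_event_time(arrivals, max_delay=5)
print([name for _, name in ordered])
print(late)
```

With a tolerated delay of 5 every episode is emitted in event-time order; with a smaller delay, for example 2, Episodes I to III arrive behind the watermark and end up in the `late` list instead.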


Flink’s Windowing
• Windows can be any combination of (multiple) triggers & evictions
  – Arbitrary tumbling, sliding, session, etc. windows can be constructed.
• Common triggers/evictions part of the API
  – Time (processing vs. event time), Count
• Even more flexibility: define your own UDF trigger/eviction
• Examples:
  dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5)));
  dataStream.keyBy(0).window(TumblingEventTimeWindows.of(Time.seconds(5)));
• Flink will handle event time, ordering, etc.
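What a keyed tumbling event-time window followed by a sum computes can be sketched in a few lines of Python. This is a conceptual illustration only: `tumbling_window_sums`, the record layout, and the sample data are assumptions, and triggers, evictors, and watermarks are left out entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, float]  # (key, event_time_ms, value)


def tumbling_window_sums(records: Iterable[Record],
                         size_ms: int) -> Dict[Tuple[str, int], float]:
    """Sum values per key and per tumbling window of size_ms, by event time."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    for key, event_time_ms, value in records:
        # each timestamp belongs to exactly one window [start, start + size_ms)
        window_start = event_time_ms - (event_time_ms % size_ms)
        sums[(key, window_start)] += value
    return dict(sums)


# hypothetical records, deliberately not in event-time order
records = [
    ("a", 1000, 1.0),
    ("a", 4999, 2.0),
    ("b", 2000, 5.0),
    ("a", 5000, 4.0),
    ("a", 3000, 8.0),
]
result = tumbling_window_sums(records, size_ms=5000)
print(result)
```

Because the window is derived from the event timestamp rather than from arrival order, the record with timestamp 3000 still lands in the first window of key "a" although it arrives last.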


Example Analysis: Windowed Aggregation

val windowedStream = stockStream.window(Time.of(10, SECONDS)).every(Time.of(5, SECONDS))  // (1)
val lowest = windowedStream.minBy("price")                                                // (2)
val maxByStock = windowedStream.groupBy("symbol").maxBy("price")                          // (3)
val rollingMean = windowedStream.groupBy("symbol").mapWindow(mean _)                      // (4)

(1) StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 23.8), StockPrice(HDP, 26.6)
(2) StockPrice(HDP, 23.8)
(3) StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 26.6)
(4) StockPrice(SPX, 2113.9), StockPrice(FTSE, 6931.7), StockPrice(HDP, 25.2)


Data Safety and Availability

• Ensure that operators see all events
  – “At least once”
  – Solved by replaying a stream from a checkpoint
  – No good for correct results
• Ensure that operators do not perform duplicate updates to their state
  – “Exactly once”
  – Several solutions
• Ensure the job can survive failure
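Why replay alone is "no good for correct results" shows up in a toy simulation. The sketch below is an assumption-laden illustration, not how any engine is implemented: `run`, `checkpoint_at`, `fail_at`, and `restore_state` are hypothetical names, the operator is a plain running sum, and exactly one failure is simulated.

```python
from typing import List, Optional


def run(events: List[int], checkpoint_at: int, fail_at: Optional[int],
        restore_state: bool) -> int:
    """Sum events; optionally fail once at offset fail_at and recover."""
    total = 0
    checkpoint = (0, 0)  # (source offset, operator state)
    offset = 0
    failed = False
    while offset < len(events):
        if offset == checkpoint_at:
            checkpoint = (offset, total)  # offset and state are saved together
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            saved_offset, saved_total = checkpoint
            offset = saved_offset         # replay the stream from the checkpoint
            if restore_state:
                total = saved_total       # also roll the state back
            continue
        total += events[offset]
        offset += 1
    return total


events = [1, 2, 3, 4, 5]  # hypothetical input
no_failure = run(events, checkpoint_at=2, fail_at=None, restore_state=True)
replay_only = run(events, checkpoint_at=2, fail_at=4, restore_state=False)
replay_and_restore = run(events, checkpoint_at=2, fail_at=4, restore_state=True)
print(no_failure, replay_only, replay_and_restore)
```

Replaying without restoring the state applies the events between the checkpoint and the failure twice (at least once), whereas restoring the state together with the source offset gives the same result as the failure-free run (exactly-once state updates).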



Lessons Learned from Batch

• If a batch computation fails, simply repeat computation as a transaction
• Transaction rate is constant
• Can we apply these principles to a true streaming execution?

[Figure: batch-1, batch-2]


Taking Snapshots – the naïve way

Initial approach (e.g., Naiad)
• Pause execution on t1, t2, ...
• Collect state
• Restore execution

[Figure: execution timeline with snapshots at t1 and t2]


Asynchronous Snapshots in Flink

[Figure: execution timeline with snapshotting phases producing snap - t1 and snap - t2 at t1 and t2]

• Propagating markers/barriers
• Full or incremental
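The role of the propagated barriers can be illustrated for a single operator with two input channels. This is a minimal sketch under assumptions, not Flink's code: `aligned_snapshot`, the `BARRIER` marker, and the input encoding are hypothetical, the operator state is just a running sum, and neither the asynchronous state write nor forwarding the barrier downstream is modelled.

```python
from typing import List, Tuple, Union

BARRIER = "barrier"
Item = Union[int, str]


def aligned_snapshot(inputs: List[Tuple[int, Item]], num_channels: int = 2):
    """Sum records from several channels; snapshot once all barriers arrived."""
    total = 0
    snapshots: List[int] = []
    blocked = set()            # channels whose barrier has already arrived
    buffered: List[int] = []   # records held back from blocked channels
    for channel, item in inputs:
        if item == BARRIER:
            blocked.add(channel)
            if len(blocked) == num_channels:
                snapshots.append(total)   # state covers exactly the pre-barrier records
                total += sum(buffered)    # now process what was held back
                buffered.clear()
                blocked.clear()
        elif channel in blocked:
            buffered.append(item)         # post-barrier record, not part of this snapshot
        else:
            total += item
    return total, snapshots


# hypothetical interleaving of two channels: (channel, record or barrier)
inputs = [(0, 1), (1, 10), (0, BARRIER), (0, 2), (1, 20), (1, BARRIER), (1, 30), (0, 3)]
final_total, snapshots = aligned_snapshot(inputs)
print(final_total, snapshots)
```

The snapshot contains the effect of every record that preceded a barrier on its channel (1, 10, and 20) and none that followed one, without pausing the whole job: only the channel that already delivered its barrier is held back until the other barrier arrives.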


Conclusion

Apache Flink!

The case for Flink as a stream processor
• Ideal basis for polystore computations
• Full-featured big data streaming engine
