Chicago Flink Meetup: Flink's Streaming Architecture

Architecture of Flink's Streaming Runtime
Robert Metzger (@rmetzger_, [email protected])
Chicago Apache Flink Meetup, October 29, 2015


TRANSCRIPT

Page 1: Chicago Flink Meetup: Flink's streaming architecture

Architecture of Flink's Streaming Runtime

Robert Metzger (@rmetzger_, [email protected])

Chicago Apache Flink Meetup, October 29, 2015

Page 2: Chicago Flink Meetup: Flink's streaming architecture

What is stream processing?
• Real-world data is unbounded and is pushed to systems.
• Right now, people are using the batch paradigm for stream analysis (there was no good stream processor available).
• New systems (Flink, Kafka) embrace the streaming nature of data.

(Diagram: Web server, Kafka topic, stream processing)

Page 3: Chicago Flink Meetup: Flink's streaming architecture


Flink is a stream processor with many faces

Streaming dataflow runtime

Page 4: Chicago Flink Meetup: Flink's streaming architecture

Flink's streaming runtime


Page 5: Chicago Flink Meetup: Flink's streaming architecture

Requirements for a stream processor
• Low latency: fast results (milliseconds)
• High throughput: handle large data amounts (millions of events per second)
• Exactly-once guarantees: correct results, also in failure cases
• Programmability: intuitive APIs

Page 6: Chicago Flink Meetup: Flink's streaming architecture

Pipelining

Basic building block to "keep the data moving"
• Low latency
• Operators push data forward
• Data shipping as buffers, not tuple-wise
• Natural handling of back-pressure (see the sketch below)
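Shipping buffers through bounded channels is what makes back-pressure fall out naturally: a slow downstream operator stops freeing buffers, so the upstream operator blocks instead of overrunning it. A minimal sketch of that mechanism in plain Scala (the queue capacity, buffer size, and operators are illustrative assumptions, not Flink's actual network stack):

import java.util.concurrent.ArrayBlockingQueue

object BackPressureSketch extends App {
  // Bounded "buffer pool" between two operators: at most 4 buffers in flight.
  val buffers = new ArrayBlockingQueue[Array[Int]](4)

  val producer = new Thread(new Runnable {
    def run(): Unit =
      for (i <- 0 until 100) {
        val buffer = Array.fill(1024)(i)
        // Blocks as soon as all 4 buffers are occupied: natural back-pressure.
        buffers.put(buffer)
      }
  })

  val consumer = new Thread(new Runnable {
    def run(): Unit =
      for (_ <- 0 until 100) {
        val buffer = buffers.take()
        Thread.sleep(1) // simulate a slow downstream operator
        val _ = buffer.sum
      }
  })

  producer.start(); consumer.start()
  producer.join(); consumer.join()
}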

Page 7: Chicago Flink Meetup: Flink's streaming architecture

Fault tolerance in streaming
• At least once: ensure all operators see all events
  • Storm: replay stream in failure case
• Exactly once: ensure that operators do not perform duplicate updates to their state
  • Flink: distributed snapshots
  • Spark: micro-batches on batch runtime
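The difference in one glance (plain Scala, no framework API; the counting operator is an illustrative assumption): replay-based at-least-once may apply the same event twice, while exactly-once restores snapshotted state and replays only events after the snapshot.

// A stateful counting operator.
var count = 0
def onEvent(e: String): Unit = count += 1

onEvent("click") // original delivery
onEvent("click") // redelivered during replay after a failure: count == 2 for one click
// Under exactly-once, recovery restores the snapshotted state (count == 1) and
// resumes the stream from the snapshot's position, so no update is applied twice.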

Page 8: Chicago Flink Meetup: Flink's streaming architecture

Flink's Distributed Snapshots
• Lightweight approach of storing the state of all operators without pausing the execution: high throughput, low latency
• Implemented using barriers flowing through the topology

(Diagram: a barrier flowing through the data stream past two pieces of operator state, a Kafka consumer with offset = 162 and an element counter with value = 152. Elements before the barrier are part of the snapshot; elements after the barrier are not in the snapshot and are backed up until the next snapshot.)
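A minimal sketch of what an operator does when a barrier arrives (plain Scala; the element types and the counting operator are illustrative assumptions, not Flink's internal classes): everything processed before the barrier is captured in the stored state, everything after it belongs to the next snapshot.

sealed trait StreamElement
case class Record(value: Long)       extends StreamElement
case class Barrier(snapshotId: Long) extends StreamElement

class CountingOperator(emit: StreamElement => Unit) {
  private var count = 0L                        // operator state
  private var snapshots = Map.empty[Long, Long] // snapshotId -> state backup

  def process(element: StreamElement): Unit = element match {
    case Record(v) =>
      count += 1     // elements before the barrier are reflected in the next snapshot
      emit(Record(v))
    case Barrier(id) =>
      snapshots += (id -> count) // store state without pausing the stream
      emit(Barrier(id))          // forward the barrier downstream
  }
}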

Pages 9-12: Chicago Flink Meetup: Flink's streaming architecture

(Figure-only slides, no text content)

Page 13: Chicago Flink Meetup: Flink's streaming architecture

Best of all worlds for streaming
• Low latency: thanks to the pipelined engine
• Exactly-once guarantees: distributed snapshots
• High throughput: controllable checkpointing overhead
• Separates app logic from recovery: the checkpointing interval is just a config parameter (see the sketch below)
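A minimal sketch of that last point, assuming the DataStream API's Scala execution environment (the 5-second interval is an arbitrary illustrative value):

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Draw a distributed snapshot every 5000 ms; recovery stays out of the app logic.
env.enableCheckpointing(5000)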

Page 14: Chicago Flink Meetup: Flink's streaming architecture

Throughput of distributed grep

Setup: a Data Generator feeding a "grep" operator on 30 machines, 120 cores.

(Chart: aggregate throughput, y-axis from 0 to 200,000,000 elements per second. Annotations: an aggregate throughput of 175 million elements per second, and an aggregate throughput of 9 million elements per second.)

• Flink achieves 20x higher throughput
• Flink throughput is almost the same with and without exactly-once

Page 15: Chicago Flink Meetup: Flink's streaming architecture

Aggregate throughput for stream record grouping

Setup: 30 machines, 120 cores; records are grouped over the network (network transfer).

(Chart: aggregate throughput, y-axis from 0 to 100,000,000 elements per second, for four configurations: Flink with no fault tolerance, Flink with exactly-once, Storm with no fault tolerance, and Storm with at-least-once. Annotations: an aggregate throughput of 83 million elements per second; 8.6 million elements per second; 309k elements per second.)

• Flink achieves 260x higher throughput with fault tolerance

Page 16: Chicago Flink Meetup: Flink's streaming architecture

Latency in stream record grouping

Setup: a Data Generator feeding the grouping job; a receiver measures throughput and latency.
• Measure time for a record to travel from source to sink (a sketch of the idea follows below)

(Charts: median and 99th-percentile latency for Flink with no fault tolerance, Flink with exactly-once, and Storm with at-least-once; median chart y-axis 0.00 to 30.00 ms, 99th-percentile chart y-axis 0.00 to 60.00 ms. Annotations: median latencies of 1 ms and 25 ms; a 99th-percentile latency of 50 ms.)
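A minimal sketch of the measurement idea in plain Scala (not the benchmark's actual harness; in a distributed run the source and sink clocks must be comparable): the source stamps each record at creation and the sink computes the travel time.

case class TimedRecord(payload: String, createdAtNanos: Long)

// Source side: stamp each record when it is emitted.
def emit(payload: String): TimedRecord =
  TimedRecord(payload, System.nanoTime())

// Sink side: source-to-sink latency of one record, in milliseconds.
def latencyMillis(r: TimedRecord): Double =
  (System.nanoTime() - r.createdAtNanos) / 1e6

// Summarize observed latencies (non-empty), e.g. median (p = 0.5)
// or 99th percentile (p = 0.99).
def percentile(latencies: Seq[Double], p: Double): Double = {
  val sorted = latencies.sorted
  sorted(math.min(sorted.size - 1, (p * sorted.size).toInt))
}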

Page 17: Chicago Flink Meetup: Flink's streaming architecture

(Figure-only slide, no text content)

Page 18: Chicago Flink Meetup: Flink's streaming architecture

Exactly-once with YARN Chaos Monkey
• Validate the exactly-once guarantees with a state machine while containers are killed at random (a conceptual sketch follows below)
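One way such a validation can work (a conceptual sketch; the states, events, and transition table are illustrative assumptions, not the test's actual code): drive each key through a deterministic state machine; any transition that is not legal can only be explained by a lost or duplicated event, i.e. a violation of exactly-once.

sealed trait State
case object Start  extends State
case object Middle extends State
case object End    extends State

// Every (state, event) pair permits exactly one legal transition.
val legal: Map[(State, String), State] = Map(
  (Start, "a")  -> Middle,
  (Middle, "b") -> End,
  (End, "c")    -> Start
)

def step(current: State, event: String): State =
  legal.getOrElse((current, event),
    sys.error(s"illegal transition from $current on '$event': " +
      "an event was lost or duplicated, so exactly-once was violated"))

// Example: if "a" were duplicated after a recovery, the validator would
// attempt (Middle, "a"), find no legal transition, and flag the violation.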

Page 19: Chicago Flink Meetup: Flink's streaming architecture

“Faces” of Flink


Page 20: Chicago Flink Meetup: Flink's streaming architecture

Faces of a stream processor

(Diagram: stream processing, batch processing, machine learning at scale, and graph analysis, all running on the streaming dataflow runtime)

Page 21: Chicago Flink Meetup: Flink's streaming architecture

The Flink Stack

(Diagram, bottom to top: Deployment, Flink Core Runtime (the streaming dataflow runtime), Core APIs, Specialized Abstractions / APIs)

Page 22: Chicago Flink Meetup: Flink's streaming architecture

APIs for stream and batch

case class Word(word: String, frequency: Int)

DataStream API (streaming):

val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap { line => line.split(" ")
    .map(word => Word(word, 1)) }
  .window(Time.of(5, SECONDS)).every(Time.of(1, SECONDS))
  .groupBy("word").sum("frequency")
  .print()

DataSet API (batch):

val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap { line => line.split(" ")
    .map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()
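Both snippets assume an execution environment named env is already in scope; a minimal setup sketch for each (package names as of the Flink 0.9/0.10 era, used in separate programs since each API brings its own implicits):

// Streaming program:
import org.apache.flink.streaming.api.scala._
val env = StreamExecutionEnvironment.getExecutionEnvironment

// Batch program:
// import org.apache.flink.api.scala._
// val env = ExecutionEnvironment.getExecutionEnvironment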

Page 23: Chicago Flink Meetup: Flink's streaming architecture

The Flink Stack

• DataSet (Java/Scala) and DataStream (Java/Scala) APIs on the streaming dataflow runtime
• Experimental Python API also available
• API-independent dataflow graph representation: the batch optimizer and the graph builder both produce it

(Diagram: an optimized batch plan joining orders.tbl and lineitem.tbl: two DataSources with Filter/Map, hash-partitioned on [0] into a hybrid hash join (build-side hash table, probe side), followed by a sorted GroupReduce, forwarded to the sink)

Page 24: Chicago Flink Meetup: Flink's streaming architecture

Batch is a special case of streaming
• Batch: run a bounded stream (data set) on a stream processor
• Form a global window over the entire data set for join or grouping operations (a conceptual sketch follows below)
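A conceptual sketch of that idea in plain Scala (not Flink API): the grouping a batch job performs is exactly a stream aggregation whose single global window closes at end-of-input.

// A bounded "stream" of words.
val boundedStream = Iterator("to", "be", "or", "not", "to", "be")

// One global window spanning the entire input: collect it, then group,
// which is exactly a batch word count.
val globalWindow = boundedStream.toSeq
val counts = globalWindow.groupBy(identity).map { case (w, ws) => w -> ws.size }
// counts: Map(to -> 2, be -> 2, or -> 1, not -> 1)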

Page 25: Chicago Flink Meetup: Flink's streaming architecture

Batch-specific optimizations
• Managed memory, on- and off-heap
  • Operators (join, sort, ...) with out-of-core support
  • Optimized serialization stack for user types
• Cost-based optimizer
  • Job execution depends on data size

Page 26: Chicago Flink Meetup: Flink's streaming architecture

The Flink Stack

(Diagram, bottom to top: Deployment, Flink Core Runtime (the streaming dataflow runtime), Core APIs (DataSet for Java/Scala and DataStream for Java/Scala), Specialized Abstractions / APIs)

Page 27: Chicago Flink Meetup: Flink's streaming architecture

FlinkML: Machine Learning
• API for ML pipelines, inspired by scikit-learn
• Collection of packaged algorithms: SVM, Multiple Linear Regression, Optimization, ALS, ...

val trainingData: DataSet[LabeledVector] = ...
val testingData: DataSet[Vector] = ...

val scaler = StandardScaler()
val polyFeatures = PolynomialFeatures().setDegree(3)
val mlr = MultipleLinearRegression()

val pipeline = scaler.chainTransformer(polyFeatures).chainPredictor(mlr)

pipeline.fit(trainingData)

val predictions: DataSet[LabeledVector] = pipeline.predict(testingData)

Page 28: Chicago Flink Meetup: Flink's streaming architecture

Gelly: Graph Processing
• Graph API and library
• Packaged algorithms: PageRank, SSSP, Label Propagation, Community Detection, Connected Components

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Graph<Long, Long, NullValue> graph = ...

DataSet<Vertex<Long, Long>> verticesWithCommunity =
    graph.run(new LabelPropagation<Long>(30)).getVertices();

verticesWithCommunity.print();

env.execute();

Page 29: Chicago Flink Meetup: Flink's streaming architecture

Flink Stack += Gelly, ML

(Diagram: the Gelly and ML libraries sit on top of the DataSet (Java/Scala) API, alongside DataStream, on the streaming dataflow runtime)

Page 30: Chicago Flink Meetup: Flink's streaming architecture

Integration with other systems

Hadoop MapReduce:
• Use Hadoop input/output formats
• Mapper / Reducer implementations
• Hadoop's FileSystem implementations
• Built-in serializer support for Hadoop's Writable types

Google Dataflow:
• Run applications implemented against Google's Dataflow API on premise with Flink

Cascading:
• Run Cascading jobs on Flink, with almost no code change
• Benefit from Flink's vastly better performance than MapReduce

Zeppelin:
• Interactive, web-based data exploration

SAMOA:
• Machine learning on data streams

Storm:
• Compatibility layer for running Storm code
• FlinkTopologyBuilder: one-line replacement for existing jobs
• Wrappers for Storm Spouts and Bolts
• Coming soon: exactly-once with Storm

Page 31: Chicago Flink Meetup: Flink's streaming architecture

Deployment options

(Diagram: the Flink stack (libraries Gelly, Table, ML, SAMOA over the DataSet (Java/Scala) and DataStream APIs over the streaming dataflow runtime) runs on any of the deployment targets: Local, Cluster, YARN, Tez, Embedded)

Local:
• Start Flink in your IDE / on your machine
• Local debugging / development using the same code as on the cluster

Cluster:
• "Bare metal" standalone installation of Flink on a cluster

YARN:
• Flink on Hadoop YARN (Hadoop 2.2.0+)
• Restarts failed containers
• Support for Kerberos-secured YARN/HDFS setups

Page 32: Chicago Flink Meetup: Flink's streaming architecture

The full stack

(Diagram, top to bottom:
• Libraries and compatibility layers: Gelly, Table, ML, SAMOA, Hadoop M/R, Dataflow (WiP), MRQL, Table, Cascading, Storm (WiP), Zeppelin
• Core APIs: DataSet (Java/Scala), DataStream
• Streaming dataflow runtime
• Deployment: Local, Cluster, YARN, Tez, Embedded)

Page 33: Chicago Flink Meetup: Flink's streaming architecture

Closing


Page 34: Chicago Flink Meetup: Flink's streaming architecture

tl;dr summary
• Flink is a software stack built on a streaming runtime
  • low latency
  • high throughput
  • fault-tolerant, exactly-once data processing
• Rich APIs for batch and stream processing
  • library ecosystem
  • integration with many systems
• A great community of devs and users
• Used in production

Page 35: Chicago Flink Meetup: Flink's streaming architecture

What is currently happening?
• Features for the upcoming 0.10 release:
  • Master high availability
  • Vastly improved monitoring GUI
  • Watermarks / event-time processing / windowing rework
  • Stable streaming API (no more "beta" flag)
• Next release: 1.0

Page 36: Chicago Flink Meetup: Flink's streaming architecture

How do I get started?

Mailing lists: (news | user | dev)@flink.apache.org
Twitter: @ApacheFlink
Blogs: flink.apache.org/blog, data-artisans.com/blog/
IRC channel: irc.freenode.net#flink

Start Flink on YARN in 4 commands:

# get the hadoop2 package from the Flink download page at
# http://flink.apache.org/downloads.html
wget <download url>
tar xvzf flink-0.9.1-bin-hadoop2.tgz
cd flink-0.9.1/
./bin/flink run -m yarn-cluster -yn 4 ./examples/flink-java-examples-0.9.1-WordCount.jar

Page 37: Chicago Flink Meetup: Flink's streaming architecture

flink.apache.org

• Check out the slides: http://flink-forward.org/?post_type=session

• Video recordings will be available very soon

Page 38: Chicago Flink Meetup: Flink's streaming architecture

Appendix


Page 39: Chicago Flink Meetup: Flink's streaming architecture

Managed (off-heap) memory and out-of-core support

(Chart annotation: memory runs out)

Page 40: Chicago Flink Meetup: Flink's streaming architecture

Cost-based Optimizer

(Diagram: two alternative execution plans for the same job joining orders.tbl and lineitem.tbl. One plan broadcasts one filtered input into a hybrid hash join (build-side hash table, probe side) and adds a combiner before the sorted GroupReduce; the other hash-partitions both inputs on the join key [0], joins, and hash-partitions on [0,1] into the sorted GroupReduce.)

The best plan depends on the relative sizes of the input files (a conceptual sketch follows below).
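A conceptual sketch of the kind of decision the optimizer makes (plain Scala; the threshold and strategy descriptions are illustrative assumptions, not Flink's actual cost model):

// Pick a data-shipping strategy for a two-input join from estimated input sizes.
def choosePlan(ordersBytes: Long, lineitemBytes: Long): String = {
  val broadcastThreshold = 64L * 1024 * 1024 // hypothetical cutoff
  if (math.min(ordersBytes, lineitemBytes) < broadcastThreshold)
    "broadcast the smaller input; forward the larger one unchanged"
  else
    "hash-partition both inputs on the join key"
}

// choosePlan(10L << 20, 100L << 30)
//   => broadcast the smaller input; forward the larger one unchanged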

Page 41: Chicago Flink Meetup: Flink's streaming architecture

case class Path(from: Long, to: Long)
val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) => Path(path.from, edge.to) }
    .union(paths)
    .distinct()
  next
}

(Diagram: from program to execution. Pre-flight, on the client: the type extraction stack and the optimizer translate the program into a dataflow graph. The JobManager handles task scheduling and dataflow metadata, deploys operators to the TaskManagers, and tracks intermediate results. The example plan joins orders.tbl and lineitem.tbl via a hash-partitioned hybrid hash join (build-side hash table, probe side) followed by a sorted GroupReduce. Runs local or on a cluster: YARN, standalone.)

Page 42: Chicago Flink Meetup: Flink's streaming architecture

Iterative processing in Flink

Flink offers built-in iterations and delta iterations to execute ML and graph algorithms efficiently.

(Diagram: an iterative dataflow of map, join, and sum operators over intermediate datasets ID1, ID2, ID3)

Page 43: Chicago Flink Meetup: Flink's streaming architecture

Example: Matrix Factorization

Factorizing a matrix with 28 billion ratings for recommendations.

More at: http://data-artisans.com/computing-recommendations-with-flink.html

Page 44: Chicago Flink Meetup: Flink's streaming architecture

Batch aggregation: ExecutionGraph

(Diagram: the JobManager holds the ExecutionGraph and coordinates TaskManager 1 and TaskManager 2, which run map tasks M1/M2, result partitions RP1/RP2, and reduce tasks R1/R2; numbered steps 1, 2, 3a/3b, 4a/4b, 5a/5b mark the scheduling and data-exchange sequence. For batch aggregation the result partitions are "blocked": they are fully produced before the consumers start.)

Page 45: Chicago Flink Meetup: Flink's streaming architecture

Streaming window aggregation: ExecutionGraph

(Diagram: the same setup as on the previous slide, but with "pipelined" result partitions: the reducers R1/R2 consume RP1/RP2 while the mappers M1/M2 are still producing.)