ufuc celebi – stream & batch processing in one system

Post on 16-Apr-2017

5.799 Views

Category:

Technology

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Ufuk Celebi uce@apache.org

Flink Forward October 13, 2015

Stream & Batch Processing in One System

Apache Flink’s Streaming Data Flow Engine

System Architecture

DeploymentLocal (Single JVM) · Cluster (Standalone, YARN)

DataStream API Unbounded Data

DataSet API Bounded Data

Runtime Distributed Streaming Data Flow

Libraries Machine Learning · Graph Processing · SQL-like API

1

User

DeploymentLocal (Single JVM) · Cluster (Standalone, YARN)

DataStream API Unbounded Data

DataSet API Bounded Data

Runtime Distributed Streaming Data Flow

Libraries Machine Learning · Graph Processing · SQL-like API

1

System

DeploymentLocal (Single JVM) · Cluster (Standalone, YARN)

DataStream API Unbounded Data

DataSet API Bounded Data

Runtime Distributed Streaming Data Flow

Libraries Machine Learning · Graph Processing · SQL-like API

1

TodayJourney from APIs to

Parallel Execution

A look behind the scenes. You don’t have to worry about this.

Components

JobManager MasterClient

TaskManager Worker

TaskManager Worker

TaskManager Worker

TaskManager Worker

User System

public class WordCount {

public static void main(String[] args) throws Exception { // Flink’s entry point StreamExecutionEnvironment env = StreamExecutionEnvironment .getExecutionEnvironment();

DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?", "Deny thy father and refuse thy name", "Or, if thou wilt not, be but sworn my love,", "And I'll no longer be a Capulet.");

// Split by whitespace to (word, 1) and sum up ones DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) .keyBy(0) .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1);

counts.print();

// Today: What happens now? env.execute(); } }

2

Components

JobManager MasterClient

TaskManager Worker

TaskManager Worker

TaskManager Worker

TaskManager Worker

User System

public class WordCount {

public static void main(String[] args) throws Exception { // Flink’s entry point StreamExecutionEnvironment env = StreamExecutionEnvironment .getExecutionEnvironment();

DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?", "Deny thy father and refuse thy name", "Or, if thou wilt not, be but sworn my love,", "And I'll no longer be a Capulet.");

// Split by whitespace to (word, 1) and sum up ones DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) .keyBy(0) .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1);

counts.print();

// Today: What happens now? env.execute(); } }

Submit Program

2

Components

JobManager MasterClient

TaskManager Worker

TaskManager Worker

TaskManager Worker

TaskManager Worker

User System

public class WordCount {

public static void main(String[] args) throws Exception { // Flink’s entry point StreamExecutionEnvironment env = StreamExecutionEnvironment .getExecutionEnvironment();

DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?", "Deny thy father and refuse thy name", "Or, if thou wilt not, be but sworn my love,", "And I'll no longer be a Capulet.");

// Split by whitespace to (word, 1) and sum up ones DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) .keyBy(0) .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1);

counts.print();

// Today: What happens now? env.execute(); } }

Submit Program

Schedule

2

Components

JobManager MasterClient

TaskManager Worker

TaskManager Worker

TaskManager Worker

TaskManager Worker

User System

public class WordCount {

public static void main(String[] args) throws Exception { // Flink’s entry point StreamExecutionEnvironment env = StreamExecutionEnvironment .getExecutionEnvironment();

DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?", "Deny thy father and refuse thy name", "Or, if thou wilt not, be but sworn my love,", "And I'll no longer be a Capulet.");

// Split by whitespace to (word, 1) and sum up ones DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) .keyBy(0) .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1);

counts.print();

// Today: What happens now? env.execute(); } }

Submit Program

Schedule

Execute

2

Client

Translates the API code to a data flow graph called JobGraph and

submits it to the JobManager.

Source

Transform

Sink

public class WordCount {

public static void main(String[] args) throws Exception { // Flink’s entry point StreamExecutionEnvironment env = StreamExecutionEnvironment .getExecutionEnvironment();

DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?", "Deny thy father and refuse thy name", "Or, if thou wilt not, be but sworn my love,", "And I'll no longer be a Capulet.");

// Split by whitespace to (word, 1) and sum up ones DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) .keyBy(0) .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1);

counts.print();

// Today: What happens now? env.execute(); } }

Translate

3

JobGraph

JobVertex IntermediateResult

Computation Data

4

JobGraph

JobVertex IntermediateResult

JobVertex IntermediateResultProduce

Computation Data

4

JobGraph

JobVertex IntermediateResult

JobVertex IntermediateResult

JobVertexIntermediateResult

Produce

Consume

Computation Data

4

The JobGraph

Vertices and results are combined to a directed acyclic graph (DAG) representing the user program.

5

Source

Source

Sink

SinkJoin

Map

JobGraph Translation• Translation includes optimizations like chaining:

f g

f · g

• DataSet API translation with cost-based optimization

6

JobGraph

JobVertex Parameters • Parallelism • Code to run • Consumed result(s) • Connection pattern

JobGraph is common abstraction for both DataStream and DataSet API.

Result Parameters • Producer • Type

Runtime is agnostic to the respective API. It’s only a question of JobGraph parameterization.

7

JobGraph

JobVertex Parameters • Parallelism • Code to run • Consumed result(s) • Connection pattern

JobGraph is common abstraction for both DataStream and DataSet API.

Result Parameters • Producer • Type

Runtime is agnostic to the respective API. It’s only a question of JobGraph parameterization.

7

TaskManagerTaskManager

Coordination• Coordination between components via Akka Actors • Actors exchange asynchronous messages • Each actor has own isolated state

JobManager MasterClient

Actor SystemActor System

8 TaskManager

System Components

JobManager MasterClient

TaskManager Worker

TaskManager Worker

TaskManager Worker

TaskManager Worker

User System

public class WordCount {

public static void main(String[] args) throws Exception { // Flink’s entry point StreamExecutionEnvironment env = StreamExecutionEnvironment .getExecutionEnvironment();

DataStream<String> data = env.fromElements( "O Romeo, Romeo! wherefore art thou Romeo?", "Deny thy father and refuse thy name", "Or, if thou wilt not, be but sworn my love,", "And I'll no longer be a Capulet.");

// Split by whitespace to (word, 1) and sum up ones DataStream<Tuple2<String, Integer>> counts = data .flatMap(new SplitByWhitespace()) .keyBy(0) .timeWindow(Time.of(10, TimeUnit.SECONDS)) .sum(1);

counts.print();

// Today: What happens now? env.execute(); } }

Submit Program

JobManager• All coordination via JobManager (master):

• Scheduling programs for execution • Checkpoint coordination (Till’s talk later today) • Monitoring workers

Actor System

Scheduling

Checkpoint Coordination

9

ExecutionGraph• Receive JobGraph and span out to ExecutionGraph

JobVertex Result JobVertex

10

ExecutionGraph• Receive JobGraph and span out to ExecutionGraph

EV1

EV3

EV2

EV4

RP1

RP2

RP3

RP4

EV1

EV2

Point to PointJobVertex Result

ExecutionVertex (EV)ResultPartition (RP)

JobVertex

10

ExecutionGraph• Receive JobGraph and span out to ExecutionGraph

EV1

EV3

EV2

EV4

RP1

RP2

RP3

RP4

EV1

EV2

All to AllJobVertex Result

ExecutionVertex (EV)ResultPartition (RP)

JobVertex

10

TaskManager

Actor System

Task SlotTask SlotTask SlotTask Slot

• All data processing in TaskManager (worker): • Communicate with JobManager via Actor messages • Exchange data between themselves via dedicated

data connections • Expose task slots for execution

I/O Manager

Memory Manager

11

Scheduling• Each ExecutionVertex will be executed one or more times • The JobManager maps Execution to task slots • Pipelined execution in same slot where applicable

12

p=4 p=4 p=3

All to allPointwise

TaskManager 1 TaskManager 2

Scheduling

TaskManager 1 TaskManager 2

• Each ExecutionVertex will be executed one or more times • The JobManager maps Execution to task slots • Pipelined execution in same slot where applicable

p=4 p=4 p=3

All to allPointwise

12

Scheduling

TaskManager 1 TaskManager 2

• Each ExecutionVertex will be executed one or more times • The JobManager maps Execution to task slots • Pipelined execution in same slot where applicable

p=4 p=4 p=3

All to allPointwise

12

Scheduling

TaskManager 1 TaskManager 2

• Each ExecutionVertex will be executed one or more times • The JobManager maps Execution to task slots • Pipelined execution in same slot where applicable

p=4 p=4 p=3

All to allPointwise

12

Scheduling

TaskManager 1 TaskManager 2

• Each ExecutionVertex will be executed one or more times • The JobManager maps Execution to task slots • Pipelined execution in same slot where applicable

p=4 p=4 p=3

All to allPointwise

12

Scheduling

TaskManager 1 TaskManager 2

• Each ExecutionVertex will be executed one or more times • The JobManager maps Execution to task slots • Pipelined execution in same slot where applicable

p=4 p=4 p=3

All to allPointwise

12

Scheduling• Scheduling happens from the sources • Later tasks are scheduled during runtime

• Depending on the result type

JobManager Master

Actor System

TaskManager Worker

Actor System

Submit Task

State Updates

13

Execution• The ExecutionGraph tracks the state of each parallel

Execution • Asynchronous messages from the

TaskManager and Client Failed

FinishedCancellingCancelled

Created Scheduled RunningDeploying

14

Task Execution• TaskManager receives Task per Execution • Task descriptor is limited to:

• Location of consumed results • Produced results • Operator & user code

User CodeOperator

Task

15

? ?

Task ExecutionDataStream<Tuple2<String, Integer>> counts = data.flatMap(new SplitByWhitespace());

User Code

17

Task ExecutionDataStream<Tuple2<String, Integer>> counts = data.flatMap(new SplitByWhitespace());

User Code

for (…) { out.collect(new Tuple2<>(w, 1));}

17

Task ExecutionDataStream<Tuple2<String, Integer>> counts = data.flatMap(new SplitByWhitespace());

User Code

StreamTask withStreamFlatMap

operator

for (…) { out.collect(new Tuple2<>(w, 1));}

17

Task ExecutionDataStream<Tuple2<String, Integer>> counts = data.flatMap(new SplitByWhitespace());

User Code

StreamTask withStreamFlatMap

operator

Task with one consumed and one produced

result

for (…) { out.collect(new Tuple2<>(w, 1));}

17

Data Connections• Input Gates request input from local and remote

channels on first read

Task Result

ResultManager

TaskManager

ResultManager

TaskManager

NetworkManagerNetworkManager

Input Gate

18

Data Connections• Input Gates request input from local and remote

channels on first read

Task Result

ResultManager

TaskManager

ResultManager

TaskManager

NetworkManagerNetworkManager

Input Gate

1.Initiate TCP connection

18

Data Connections• Input Gates request input from local and remote

channels on first read

Task Result

ResultManager

TaskManager

ResultManager

TaskManager

NetworkManagerNetworkManager

Input Gate

2. Request1.Initiate TCP connection

18

Data Connections• Input Gates request input from local and remote

channels on first read

Task Result

ResultManager

TaskManager

ResultManager

TaskManager

NetworkManagerNetworkManager

Input Gate

2. Request3. Send via

TCP

1.Initiate TCP connection

18

Result Characteristics

vs.

vs.

Ephemeral Checkpointed

Pipelined Blocking

How and when to do data exchange?

How long to keep results around?

20

Map Pipelined Result

110101010100

Pipelined Results

21

Map Pipelined Result11010101

0100

Pipelined Results

21

Map Pipelined Result

110101010100

Pipelined Results

21

Map Pipelined Result Reduce

110101010100

Pipelined Results

21

Map Pipelined Result Reduce

110101010100

Pipelined Results

21

Map Pipelined Result Reduce

11010101

0100

Pipelined Results

21

Map Pipelined Result Reduce

11010101

0100

Pipelined Results

21

Map Pipelined Result Reduce

11010101

0100

Pipelined Results

21

Map Pipelined Result Reduce

11010101

0100

Pipelined Results

21

Map Pipelined Result Reduce

110101010100

Pipelined Results

21

Map Pipelined Result Reduce

11010101

0100

Pipelined Results

21

Map Pipelined Result Reduce

110101010100

Pipelined Results

21

Map Blocking Result

110101010100

Blocking Results

22

Map Blocking Result11010101

0100

Blocking Results

22

Map Blocking Result

110101010100

Blocking Results

22

Map Blocking Result

11010101

0100

Blocking Results

22

Map Blocking Result

11010101

0100

Blocking Results

22

Map Blocking Result

110101010100

Blocking Results

22

Map Blocking Result

110101010100

Blocking Results

22

Map Blocking Result Reduce

110101010100

Blocking Results

22

Map Blocking Result Reduce

110101010100

Blocking Results

22

Map Blocking Result Reduce

110101010100

Blocking Results

22

Map Blocking Result Reduce

11010101

0100

Blocking Results

22

Map Blocking Result Reduce

110101010100

Blocking Results

22

Batch Pipelines

Batch Pipelines

Data exchange is mostly streamed

Batch Pipelines

Data exchange is mostly streamed

Some operators block (e.g. sort, hash table)

Recap

Client JobManager TaskManager

Communication Actor-only (coordination)

Actor-only (coordination)

Actor & Data Streams

Central Abstraction JobGraph ExecutionGraph Task

State Tracking – Completeprogram Single Task

23

Thank You!

Stream & Batch Processing

DataStream DataSet

JobGraph Chaining Chaining and cost-based optimisation

Intermediate Results Pipelined Pipelined and Blocking

Operators Stream operators Batch operators

User function Common interface formap, reduce, …

25

Stream & Batch Processing• Stream and Batch programs are different

parameterizations of the JobGraph • Everything goes down to the same runtime • Streaming first, batch as special case

• Cost-based optimizer on translation • Blocking results for less resource fragmentation • But still profit from streaming

• DataSet and DataStream API are essentially all user code to the runtime

24

Result Characteristics

vs.

vs.

Ephemeral Checkpointed

Pipelined Blocking

How and when to do data exchange?

How long to keep results around?

Map Ephemeral Result

45

Ephemeral Results

110101010100

Map Ephemeral Result

45

Ephemeral Results

110101010100

Map Ephemeral Result

45

Ephemeral Results

110101010100

Map Ephemeral Result Reduce

45

Ephemeral Results

110101010100

Map Ephemeral Result Reduce

45

Ephemeral Results

110101010100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Ephemeral Result Reduce

45

Ephemeral Results

1101

01010100

Map Ephemeral Result Reduce

45

Ephemeral Results

1101

01010100

Map Ephemeral Result Reduce

45

Ephemeral Results

1101

01010100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Ephemeral Result Reduce

45

Ephemeral Results

11010101

0100

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

1101

Map Checkpointed Result Reduce

46

Checkpointed Results

11010101

0100

1101

Map Checkpointed Result Reduce

46

Checkpointed Results

11010101

0100

1101

Map Checkpointed Result Reduce

46

Checkpointed Results

11010101

0100

11010101

Map Checkpointed Result Reduce

46

Checkpointed Results

11010101010011010101

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

11010101

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

110101010100

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

110101010100

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

11010101

0100

Map Checkpointed Result Reduce

46

Checkpointed Results

110101010100

110101010100

top related