flink. pure streaming

Flink Pure Streaming

Paco Guerrero, Big Data & Solutions Architect, 9/21/16

Not for Geeks

Life as Time

Anything as Time

Flink as Time

Streaming vs Batch

“Abstraction of reality used to facilitate information processing”

MicroBatch Batch

Batch

All Input -> Batch Job -> All Output

Nothing about time. Timestamps are used as a trick to keep a real-time fingerprint.

Streaming

“Continuous processing of data that is continuously produced”

Streaming

“Streaming is the next programming paradigm for data applications, and you need to start thinking in terms of streams”


Streaming



Data Stream: Infinite sequence of data arriving in a continuous fashion.

Streaming




Stream processing is the backbone of the new data infrastructure.

“The world beyond batch” A high-level tour of modern data-processing concepts. By Tyler Akidau

August 5, 2015 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

Streaming

Streaming Job

Real Life Time !!

Streaming is the biggest change in data infrastructure since Hadoop

Streaming

The biggest change in moving from batch to streaming is handling time explicitly

Streaming

Micro Batch

All Input -> Batch Job 1, Batch Job 2, Batch Job 3 -> All Output

Batch frequency? Timestamps keep the real-time fingerprint.

Streaming Technologies

Batch, Streaming, Micro Batch

- Stateless: record acknowledgements
- CPU-bound performance
- No expressive declarative functional API: low-level API
- No auto scaling
- Low-level programmatic topology
- Poor streaming window functionality
- Not compatible with Hadoop APIs

Streams


Apache Flink

Flink: Open Source Stream Processing Framework. Last available release: 1.1.1

Top-Level Apache Project since Dec '14

Main Features

- Native stream
- Low latency
- High throughput
- Stateful
- Exactly-once guarantees
- Distributed
- Expressive APIs
- And more …

Flink Integration

YARN upcoming...

Flink Stack

Flink Runtime Engine

Distributed pipelined processing

Execute everything as Stream

Iterative ( cyclic ) dataflows

Mutable state in operations

Operate on managed memory (*)

Also works on batch !!

Job Manager

Client

Optimizer

Dataflow Graph


Workers (Task Managers)

Execution Graph


Stream Job

Batch Job

ML Job


Graph Job

optimizer

optimizer

optimizer

optimizer

Tasks scheduled and executed in workers ( slots )

Tasks as chain of operators

Run operator logic in a pipelined fashion

State is kept in operators

If you want to know one thing about Flink, it is that you don't need to know the internals of Flink

Building Blocks

- Event Time & Windows
- State Handling
- Fault Tolerance & Correctness
- Low Latency & High Throughput
- API, Libraries, SQL

Event Time & Windowing

- Time references
- Out-of-order events
- Powerful windowing


[Figure: a record carries its Event Time from where it is generated; it gets an Ingestion Time at the Flink Data Source and a Processing Time at the Flink Window Operator.]

Explicit handling of time. 3 choices:

- Event time: when data is generated
- Ingestion time: when data is loaded from the source
- Processing time: when data is processed

Event time helps to process out-of-order events and to replay elements as they occurred (deterministic results).
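To make the three notions of time concrete, here is a minimal Python sketch. It is not Flink code: the `Event` record, its field names and the sample timestamps are illustrative assumptions. Each record carries all three timestamps, and ordering by event time gives the same sequence in both runs even though the arrival order differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    value: str
    event_time: int       # when the data was generated
    ingestion_time: int   # when the source loaded it
    processing_time: int  # when an operator processed it


# The same three events, seen in two runs with different arrival orders.
run_a = [Event("a", 1, 10, 11), Event("c", 3, 12, 13), Event("b", 2, 14, 15)]
run_b = [Event("b", 2, 20, 21), Event("a", 1, 22, 23), Event("c", 3, 24, 25)]


def by_event_time(events):
    """Order values by the time they were generated."""
    return [e.value for e in sorted(events, key=lambda e: e.event_time)]


def by_processing_time(events):
    """Order values by the time they happened to be processed."""
    return [e.value for e in sorted(events, key=lambda e: e.processing_time)]
```

Ordering by processing time depends on each run, which is why replays are only deterministic under event time.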


Event time. Out of Order

[Figure: two input streams carrying events 1 2 3 5 7 and 4 6 8 9 10 are merged and arrive out of order. Panels: Ingestion Time Windows, Event Time Windows.]

Event time. Watermarks

[Figure: the same out-of-order streams (1 2 3 5 7 and 4 6 8 9 10) are assigned to Event Time Windows as watermarks advance.]

Watermark 5: no event with an event time before 5 will come. Late time of 2.

Watermark 10: no event with an event time before 10 will come. Late time of 2.
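The watermark idea can be sketched in a few lines of Python. This is a simplified illustration, not Flink's implementation: the function name `run_windows`, the tumbling window size of 5, the rule "watermark = highest event time seen minus the allowed lateness" and the choice to drop events whose window has already fired are all assumptions made for the example.

```python
from collections import defaultdict


def run_windows(events, window_size=5, lateness=2):
    """Assign events to tumbling event-time windows and fire them on watermarks.

    events: event times in arrival order (possibly out of order).
    Returns (fired windows, dropped late events, still-open windows).
    """
    open_windows = defaultdict(list)   # window start -> buffered event times
    fired, dropped = [], []
    max_seen = float("-inf")
    for t in events:
        start = (t // window_size) * window_size
        if start + window_size <= max_seen - lateness:
            dropped.append(t)          # its window has already fired
        else:
            open_windows[start].append(t)
        max_seen = max(max_seen, t)
        watermark = max_seen - lateness  # no older event is expected any more
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, sorted(open_windows.pop(s))))
    return fired, dropped, dict(open_windows)


arrival = [1, 2, 3, 5, 7, 4, 6, 8, 9, 10]
fired_2, dropped_2, _ = run_windows(arrival, lateness=2)
fired_3, dropped_3, _ = run_windows(arrival, lateness=3)
```

With a lateness of 2 the window [0, 5) fires when event 7 arrives, so the late event 4 is dropped; with a lateness of 3 the window waits one more step and still includes it. That trade-off between completeness and latency is what the lateness setting controls.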

Windowing

Windows: grouping of events according to time, session*, count

Powerful built-in windows:

- Count: number of events to trigger the window. Process the last X events every Y events.
- Time, tumbling: trigger every X time units with the events received
- Time, sliding: trigger every X time units with the events received in the last Y time units
- Session: all events from session/user X until the session time expires (gap)

High-level API for user-defined windows: Window Assigner, Trigger, Evictor
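The window types listed above can be sketched as plain Python functions. These are illustrative helpers, not Flink's Window Assigner API: the function names and the convention that time starts at 0 are assumptions.

```python
def tumbling(ts, size):
    """Start of the single tumbling window that contains timestamp ts."""
    return [ts - ts % size]


def sliding(ts, size, slide):
    """Starts of all sliding windows (length size, one every slide) containing ts."""
    last = ts - ts % slide
    return [s for s in range(last, ts - size, -slide) if s >= 0][::-1]


def sessions(timestamps, gap):
    """Group timestamps into sessions; a pause longer than gap starts a new one."""
    out = []
    for ts in sorted(timestamps):
        if out and ts - out[-1][-1] <= gap:
            out[-1].append(ts)
        else:
            out.append([ts])
    return out


def count_windows(events, size, slide):
    """Every slide events, emit the last size events."""
    return [events[max(0, i - size):i] for i in range(slide, len(events) + 1, slide)]
```

For example, `sliding(7, size=10, slide=5)` returns the starts of the two windows [0, 10) and [5, 15) that contain timestamp 7, while `tumbling(7, 5)` returns only the start of [5, 10).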

Stateful Streaming

- Managed operator state for backup/recovery
- Savepoints

[Figure: Stateless Stream Processing (operator only) vs. Stateful Stream Processing (operator with state)]

- Built-in internal state in each operator for exactly-once semantics
- User state can be declared in each operator and is saved locally in memory (API, key/value pairs)
- Snapshots: local in-memory states are periodically persisted as lightweight distributed snapshots. No global pause !!
- Checkpoint: a globally consistent point-in-time snapshot built from a set of distributed snapshots
- Pluggable state backend for snapshots: JobManager, HDFS, RocksDB
- Savepoints: user-triggered retained checkpoints
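As a rough illustration of operator state with snapshots, the following Python sketch keeps a key/value state in local memory and can copy it out and back. The class `CountPerKey` and its methods are hypothetical names for this example; a real state backend would persist the snapshot to durable storage rather than return it.

```python
import copy


class CountPerKey:
    """Stateful operator sketch: a running count per key, kept in local memory."""

    def __init__(self):
        self.state = {}

    def process(self, key):
        """Update the state for one record and emit the new count."""
        self.state[key] = self.state.get(key, 0) + 1
        return key, self.state[key]

    def snapshot(self):
        """Copy of the local state, standing in for what a backend would persist."""
        return copy.deepcopy(self.state)

    def restore(self, snap):
        """Roll the operator back to a previously taken snapshot."""
        self.state = copy.deepcopy(snap)


op = CountPerKey()
for key in ["a", "b", "a"]:
    op.process(key)
snap = op.snapshot()      # state is {"a": 2, "b": 1}
op.process("a")           # state moves on after the snapshot
op.restore(snap)          # recovery returns to the snapshot
```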

- Exactly-once semantics with managed operator state
- Distributed snapshotting algorithm

Periodically

Chandy-Lamport Snapshots

“The global-state-detection algorithm is to be superimposed on the underlying computation: it must run concurrently with, but not alter, this underlying computation”

- Triggers snapshots asynchronously
- Snapshot algorithm embedded in the stream of data (barriers)
- No global pause, lightweight impact on performance

Handling Checkpoints

Job Manager:

- Periodically pushes barriers for a new state (new state X+1)
- Receives an ack for snapshot state X from each task N
- Once all acks are received, registers the checkpoint for restore in case of failure
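The acknowledgement protocol on this slide can be sketched as a small coordinator in Python. This is a simplified model, not Flink's Job Manager code: the class `CheckpointCoordinator`, its method names and the task names are assumptions, and barrier transport and state storage are left out.

```python
class CheckpointCoordinator:
    """Job Manager role in the sketch: a checkpoint completes once every task acks."""

    def __init__(self, tasks):
        self.tasks = set(tasks)
        self.pending = {}     # checkpoint id -> tasks that still have to ack
        self.completed = []   # checkpoint ids registered for restore
        self.next_id = 1

    def trigger(self):
        """Start a new checkpoint; barriers with this id would go to all tasks."""
        cid = self.next_id
        self.next_id += 1
        self.pending[cid] = set(self.tasks)
        return cid

    def ack(self, cid, task):
        """Record one task's ack; return True when the checkpoint is complete."""
        waiting = self.pending.get(cid)
        if waiting is None:
            return False
        waiting.discard(task)
        if not waiting:                 # all acks received
            del self.pending[cid]
            self.completed.append(cid)  # register for restore on failure
            return True
        return False

    def latest(self):
        """Id of the last complete checkpoint, or None if there is none yet."""
        return self.completed[-1] if self.completed else None


coordinator = CheckpointCoordinator(["source", "map", "sink"])
cid = coordinator.trigger()
partial = [coordinator.ack(cid, "source"), coordinator.ack(cid, "map")]
before = coordinator.latest()           # still None: one ack is missing
done = coordinator.ack(cid, "sink")
```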

Streaming Fault Tolerance

In case of failure, the last global checkpoint is recovered (recovery from partial checkpoints / individual snapshots is coming).

A stateful source such as Kafka is needed to ensure end-to-end exactly-once semantics in case of failure.

The Kafka sink doesn't guarantee end-to-end exactly-once (a topic may receive multiple writes); it is at-least-once.

Semantics in Flink:

- At least once: never loses events, but events might be reprocessed
- Exactly once: events are neither lost nor counted twice

Exactly once by default, with low impact on performance and development effort, unlike other tools such as Storm or Spark.
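The difference between the two guarantees can be illustrated with a toy counter over a replayable log. This is a deliberately simplified Python model, not how Flink or Kafka work internally: the function `count_with_failure` and its parameters are assumptions. The checkpoint stores the read offset together with the count; after a failure the source rewinds to that offset, and the result depends on whether the count is rolled back too.

```python
def count_with_failure(records, checkpoint_at, fail_at, restore_state):
    """Count records from a replayable log, failing once at index fail_at.

    The checkpoint taken at index checkpoint_at saves (offset, count).
    On failure the offset is always rewound; the count is only rolled
    back when restore_state is True.
    """
    count, offset, saved, failed = 0, 0, (0, 0), False
    while offset < len(records):
        if offset == checkpoint_at:
            saved = (offset, count)
        if offset == fail_at and not failed:
            failed = True
            offset = saved[0]          # the source replays from the checkpoint
            if restore_state:
                count = saved[1]       # state goes back with the offset
            continue
        count += 1
        offset += 1
    return count


records = list(range(10))
exactly_once = count_with_failure(records, checkpoint_at=4, fail_at=7, restore_state=True)
at_least_once = count_with_failure(records, checkpoint_at=4, fail_at=7, restore_state=False)
```

With state and offset restored together every record is counted once (10). When only the source rewinds, records 4 to 6 are counted again (13): nothing is lost, but some events are reprocessed.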

Performance Tuning

- Pipelined runtime
- Latency vs. throughput tuning

Exactly-once semantics with low impact on performance

Controllable checkpointing overhead

Higher throughput using processing time

Performance improvements thanks to:

- operator chaining during the optimization phase
- own optimized serialization stack with code generation

Benchmark for “Streaming Computation” published by Yahoo, Dec 18, 2015: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at

Production use-case

- Counting ad impressions grouped by campaign
- Aggregations over a 10-second window
- Save the current aggregate value to Redis every second
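The core aggregation of this use-case can be sketched in Python as a count per campaign and 10-second tumbling window. The function name, the `(event_time_ms, campaign_id)` record shape and the sample data are assumptions; the periodic write to Redis is left out.

```python
from collections import Counter


def impressions_per_campaign(events, window_ms=10_000):
    """Count ad impressions per (window start, campaign).

    events: iterable of (event_time_ms, campaign_id) pairs.
    """
    counts = Counter()
    for ts, campaign in events:
        counts[(ts - ts % window_ms, campaign)] += 1
    return dict(counts)


sample = [(1_000, "c1"), (9_000, "c1"), (9_500, "c2"), (12_000, "c1")]
totals = impressions_per_campaign(sample)
```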

Streaming Benchmark

[Figure: Throughput vs. Latency graph. Axes: throughput (1000 events/sec) and 99th percentile latency (ms).]

No operator combining in Storm: more complicated topology, more steps per event and more overhead

Apache Storm without Trident

- At least once / double counting after failure / lost state after failures
- CPU bound

Apache Spark

- Latency increases with throughput

Apache Flink

- Exactly once / no double counting / no state loss
- Limited by the bandwidth between Kafka and the Flink cluster (1 GigE)
- Kafka brokers within Kafka cluster (10 GigE): achieved 15 million messages/sec (before: 3 million messages/sec) with exactly-once semantics

API

- High-level API
- Wide range of basic and advanced operators
- Java, Scala. Python soon !!

Working on data streams (bounded?)

Stream Processing: Explicit Handling of Time

Java & Scala. Python coming. Java: Bean-type classes vs. Tuples addressed by position. Scala: case classes.

Operators:

Sources: Kafka, FileSystem, Cassandra …

Sinks: Kafka, HDFS, Cassandra …

Transformations:

- Basic: map, flatMap, filter, grouping, iterate, project, join, cross, …
- Streaming: Windowing + Aggregations, Temporal Binary Iterative Stream operators

DataStream<?> DataSet<?>

Core API

1 implementation*, 2 interfaces

Operators

[Figure: dataflow built from the operators Source, Map, Filter, Reduce, Join, Sum and Sink; successive builds highlight Map, Filter, Reduce and Join.]
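A chain of operators like the one in the figure can be imitated with Python generators, where each record flows through the whole chain before the next one is read. This is only an analogy for pipelined execution, not the DataStream API: all function names and the sample input are assumptions.

```python
def source(lines):
    """Source: emit the input records one by one."""
    yield from lines


def flat_map(stream, fn):
    """FlatMap: turn each record into zero or more records."""
    for item in stream:
        yield from fn(item)


def filter_(stream, pred):
    """Filter: keep only the records that satisfy pred."""
    return (x for x in stream if pred(x))


def key_sum(stream):
    """Keyed sum: keep a running total per key and emit every update."""
    totals = {}
    for key, value in stream:
        totals[key] = totals.get(key, 0) + value
        yield key, totals[key]


def sink(stream):
    """Sink: collect the results."""
    return list(stream)


words = flat_map(source(["to be", "or not to be"]), lambda line: line.split())
kept = filter_(words, lambda w: w != "or")
pairs = ((w, 1) for w in kept)
result = sink(key_sum(pairs))
```

Nothing runs until the sink pulls from the chain, and no intermediate collection is materialized between operators.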

Table API

- Easy to use. SQL !!
- Based on Apache Calcite

API extension for DataSets and DataStreams

Based on a relational Table abstraction

Table <=> Source / DataSet / DataStream

Operators like: where, select, as, groupBy, join, union, minus, distinct, orderBy, ...

SQL Support

Execute SQL-like statements on DataSets and DataStreams

Results are returned as a Table (Table API), convertible to DataStreams or DataSets

SQL and Table API can be seamlessly mixed over DataStreams/DataSets

Flink’s SQL support is not feature complete, yet.

Queries that include unsupported SQL will fail !!

Apache Calcite

SQL

Parsing and logical planning for Table operators and SQL are optimized using Apache Calcite

Only a subset of the comprehensive SQL standard is supported

Apache Calcite provides:

- SQL parsing
- API for building expressions in relational algebra
- Query planning engine

Provides SQL for streaming queries with window aggregations

    SELECT STREAM
      TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime,
      productId,
      COUNT(*) AS c,
      SUM(units) AS units
    FROM Orders
    GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId;
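To show what the streaming query above computes, here is a plain Python equivalent over a finite list of orders. It is a sketch of the semantics only, not of how Calcite or Flink execute the query: the function name, the `(rowtime_seconds, product_id, units)` record shape and the sample rows are assumptions.

```python
from collections import defaultdict

HOUR = 3600  # seconds


def hourly_totals(orders):
    """Per one-hour tumbling window and product: order count and summed units.

    orders: iterable of (rowtime_seconds, product_id, units).
    Returns sorted rows (window_end, product_id, count, units).
    """
    agg = defaultdict(lambda: [0, 0])   # (window_end, product) -> [count, units]
    for rowtime, product_id, units in orders:
        window_end = rowtime - rowtime % HOUR + HOUR   # plays the role of TUMBLE_END
        agg[(window_end, product_id)][0] += 1
        agg[(window_end, product_id)][1] += units
    return sorted((end, pid, c, u) for (end, pid), (c, u) in agg.items())


orders = [(10, "p1", 2), (3599, "p1", 3), (3600, "p1", 1), (20, "p2", 5)]
rows = hourly_totals(orders)
```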

SQL Sentence -> Apache Calcite: SQL to Logical Plan as Relational Algebra -> Flink Optimizer: Logical Plan to Execution Plan

So … Batch

Batch on Stream

Stream: Unbounded Data Stream

Batch: Bounded stream (dataset) on a stream processor

Global window over the entire dataset

Optimization in operators for joins and grouping, with blocking data exchange if needed

Batch-specific optimizations:

- Cost-based optimizer: dataset size known beforehand
- Managed memory on/off-heap for join, sort, …
- Optimized serialization stack for user types
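The "batch is a bounded stream with a global window" idea can be sketched with one Python generator that serves both cases. The function `stream_sum` and its `window` parameter (a count window, with `None` meaning a single global window) are assumptions made for illustration.

```python
def stream_sum(stream, window=None):
    """Sum a stream per count window; window=None means one global window.

    With a global window the result can only be emitted when the
    (bounded) input ends, which is exactly the batch case.
    """
    total, n = 0, 0
    for x in stream:
        total += x
        n += 1
        if window is not None and n % window == 0:
            yield total     # a full window closes while the stream goes on
            total = 0
    if window is None or n % window != 0:
        yield total         # the end of a bounded stream closes the last window


batch_result = list(stream_sum([1, 2, 3, 4, 5]))
windowed_result = list(stream_sum([1, 2, 3, 4, 5], window=2))
```

The same operator code handles both modes; only the window definition and the fact that the input ends differ.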


Conclusions

Flink Pure streaming engine matches real life. No Abstraction

Batch on streaming

Flexible Windowing Semantics with Explicit Time handling

Competitive Performance, low latency and high throughput

Apache Beam, open sourced by Google, uses Flink as its first order runner for Batch and Streaming processing, in partnership with Data Artisans.

100% Compliance of data processing model “what, where, when, how”

Indizen Technologies, S.L

Paseo de la Castellana, 130 - 4ª planta

28046 Madrid, Spain

Tel. 91 535 85 68

www.indizen.com

@indizen_corp

Thank you!

Francisco José Guerrero

Big Data & Solutions Architect

Tel. 91 535 85 68

Mov. XXX YYY ZZZ

[email protected]
