flink/ an unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. flink_ an unified...

49
FLINKUNIFIED STREAM ENGINE Li Chengxiang

Upload: others

Post on 28-May-2020

22 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

FLINK:UNIFIED STREAM ENGINELi Chengxiang

Page 2: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

ABOUT ME• Apache Hive Committer

• Apache Flink Committer

• Focus on large scale distributed computer engine and distributed SQL engine.

Page 3: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

AGENDA• Batch and Stream

• Core feature of Stream

• Advanced Flink Stream feature

Page 4: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH AND STREAM

Page 5: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()
Page 6: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

IT’S ALL ABOUT DATA

Bounded Data Unbounded Data

Page 7: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 8: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 9: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 10: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 11: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 12: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 13: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 14: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 15: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 16: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 17: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 18: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 19: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE

Page 20: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 21: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 22: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 23: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 24: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 25: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 26: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 27: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 28: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 29: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 30: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 31: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 32: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE

Page 33: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE FOR BOUNDED DATA AND STREAM ENGINE

FOR UNBOUNDED DATA, WHY?

Page 34: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

BATCH ENGINE FOR UNBOUNDED DATA

Divide unbounded data into bounded data by window.

Page 35: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

STREAM ENGINE FOR BOUNDED DATA

Better resource utility with the cost of task level fault tolerant.

Page 36: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

Flink Spark

Which one is better?

Stream Engine

DataSet(Bounded data)

DataStream(Unbounded Data)

Batch Engine

RDD(Bounded data)

StreamRDD(Unbounded Data)

Page 37: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

FLINK STREAM ENGINE

• Flink Runtime Engine support both Batch and Pipeline mode.

• For DataSet API, use Batch or Pipeline.

• For DataStream API, use Pipeline.

Page 38: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

orBatchEngine

S Stream Engine

StreamEngine

BatchEngine

Page 39: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

WHAT SHOULD YOU CARE ABOUT STREAM?

Page 40: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

CORRECTNESS IS THE STAPLE FOOD INSTEAD OF DESSERT

Page 41: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

Batch Engine intermediate state = all data + partial computation

Stream Engine intermediate state = partial data + all computation

Page 42: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

• Distributed Snapshot

• Exactly-Once support.

1. do distributed snapshot.

2. rewind messages after barrier from source.

3. support rewinding messages at source side.

4. support UPDATE at sink side.

data stream

barrier

before barrier, part of snapshot

after barrier, not in snapshot

barrier

Stat

Page 43: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

WHEN IT’S HAPPENED INSTEAD OF WHEN IT’S

PROCESSED

Page 44: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

• Process Time

The time at which events are observed in the system.

• Event Time

The time at which events actually occurred.

• Watermark

Page 45: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

• Window

• Process Time/Event Time/Session

• Tumbling/Sliding

• Trigger

• Process Time/Event Time/Count

• Continuous/Purging/Delta

• Evictor

• Time/Count/Delta

Page 46: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

CEP(COMPLEX EVENT PROCESSING)

Pattern<MonitoringEvent, ?> warningPattern = Pattern.<MonitoringEvent>begin("First Event") .subtype(TemperatureEvent.class) .where(evt -> evt.getTemperature() >= TEMPERATURE_THRESHOLD) .next("Second Event") .subtype(TemperatureEvent.class) .where(evt -> evt.getTemperature() >= TEMPERATURE_THRESHOLD) .within(Time.seconds(10));

Page 47: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

FLINK STREM SQLval env = StreamExecutionEnvironment.getExecutionEnvironment()

// create some stream in the Streaming API

val stream : DataStream[(String, Double, Int)] = env

.addSource(new FlinkKafkaConsumer(...))

.map { line => parse(line) }

// create Table Environment and register stream with column names

val tabEnv = new TableEnvironment(env)

tabEnv.registerDataStream(stream, “myStreamTab”, (“ID”, “MEASURE”, “COUNT”))

val rTable: Table = tabEnv.sql(

“SELECT ID, MEASURE FROM myStreamTab WHERE COUNT > 17”)

Page 48: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

THANKS

Page 49: Flink/ An unified stream engine - files.meetup.comfiles.meetup.com/16395762/2. Flink_ An unified stream engine.pdf · FLINK STREM SQL val env = StreamExecutionEnvironment.getExecutionEnvironment()

REFERENCE1. https://www.oreilly.com/ideas/the-world-beyond-

batch-streaming-101

2. http://radar.oreilly.com/2014/07/why-local-state-is-a-fundamental-primitive-in-stream-processing.html

3. http://flink.apache.org/

4. http://spark.apache.org/