Flink Forward SF 2017: Ufuk Celebi - The Stream Processor as a Database: Building Online...

Ufuk Celebi @iamuce The Stream Processor as a Database

Upload: flink-forward

Post on 21-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Flink forward SF 2017: Ufuk Celebi - The Stream Processor as a Database: Building Online Applications directly on Streams

Ufuk Celebi @iamuce

The Stream Processor as a Database

Page 2

The (Classic) Use Case

Realtime Counts and Aggregates

Page 3

(Real-)Time Series Statistics

[Diagram labels: Stream of Events; Real-time Statistics]

Page 4

The Architecture

[Diagram labels: collect; message queue; analyze; serve & store]

Page 5

The Flink Job

case class Impressions(id: String, impressions: Long)

// Read raw events from Kafka
val events: DataStream[Event] = env.addSource(new FlinkKafkaConsumer09(…))

// Keep only impression events and project them to (id, impressions)
val impressions: DataStream[Impressions] = events
  .filter(evt => evt.isImpression)
  .map(evt => Impressions(evt.id, evt.numImpressions))

// Sum impressions per id over one-hour windows
val counts: DataStream[Impressions] = impressions
  .keyBy("id").timeWindow(Time.hours(1)).sum("impressions")
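To make the semantics of the job concrete, here is a minimal plain-Scala sketch of what the filter, map, keyBy, one-hour window, and sum steps compute over a finite batch of events. It does not use Flink. The `Event` fields (in particular `timestampMs`) and the `HourlyCounts` helper are assumptions made for illustration, and the window assignment assumes non-negative timestamps.

```scala
// Hypothetical event type; the timestamp field is an assumption for this sketch.
final case class Event(id: String, isImpression: Boolean, numImpressions: Long, timestampMs: Long)
final case class Impressions(id: String, impressions: Long)

object HourlyCounts {
  val HourMs: Long = 60L * 60L * 1000L

  // Start of the one-hour tumbling window that contains the timestamp.
  def windowStart(timestampMs: Long): Long = timestampMs - (timestampMs % HourMs)

  // Batch equivalent of filter -> map -> keyBy("id") -> timeWindow(1h) -> sum("impressions").
  // The result is keyed by (id, window start).
  def hourlyCounts(events: Seq[Event]): Map[(String, Long), Impressions] =
    events
      .filter(_.isImpression)
      .groupBy(evt => (evt.id, windowStart(evt.timestampMs)))
      .map { case (key @ (id, _), evts) =>
        key -> Impressions(id, evts.map(_.numImpressions).sum)
      }
}
```

The streaming job produces the same per-key, per-window sums, but incrementally and with the running sums kept as operator state.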

Page 10

The Flink Job

[Diagram: two parallel instances of the pipeline, each labelled Kafka Source, filter(), map(), keyBy(), window()/sum() and Sink, with State attached]

Page 11

Putting it all together

Periodically (every second) flush new aggregates to Redis
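The periodic flush can be sketched in plain Scala as follows. This is only an illustration under stated assumptions: `KeyValueStore`, `PeriodicFlusher`, and `InMemoryStore` are hypothetical names, the store trait stands in for a real Redis client, and the current time is passed in explicitly so the behaviour is deterministic.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an external key/value store such as Redis.
trait KeyValueStore {
  def putAll(entries: Map[String, Long]): Unit
}

// Buffers the latest aggregate per key and writes the buffer out
// at most once per interval.
final class PeriodicFlusher(store: KeyValueStore, intervalMs: Long) {
  private val dirty = mutable.Map.empty[String, Long]
  private var lastFlushMs = 0L

  // Record a new aggregate; later updates to the same key overwrite earlier ones.
  def update(key: String, value: Long, nowMs: Long): Unit = {
    dirty(key) = value
    if (nowMs - lastFlushMs >= intervalMs) flush(nowMs)
  }

  // Write all buffered aggregates in one batch and reset the timer.
  def flush(nowMs: Long): Unit = {
    if (dirty.nonEmpty) {
      store.putAll(dirty.toMap)
      dirty.clear()
    }
    lastFlushMs = nowMs
  }
}

// In-memory store used only to make the sketch runnable.
final class InMemoryStore extends KeyValueStore {
  val data = mutable.Map.empty[String, Long]
  var writes = 0
  def putAll(entries: Map[String, Long]): Unit = { data ++= entries; writes += 1 }
}
```

Even with batching, every flush is a round trip to an external system, which is the cost the following slides focus on.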

Page 12

The Bottleneck

Writes to the key/value store take too long

Page 13

Queryable State

Page 15

Queryable State

Optional, and only at the end of windows

Page 16

Queryable State: Application View

[Diagram labels: Application; Query Service; Database; realtime results (current time windows); older results (past time windows)]
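One way to read the application view is as a query service that answers requests for current time windows from the stream processor's state and requests for past time windows from the database. The sketch below illustrates that routing in plain Scala. `WindowLookup`, `QueryService`, and `MapLookup` are hypothetical names, not Flink APIs, and the rule that decides which backend to ask is an assumption.

```scala
// Hypothetical lookup interface; neither backend here is a Flink API.
trait WindowLookup {
  def get(id: String, windowStartMs: Long): Option[Long]
}

final class QueryService(liveState: WindowLookup, database: WindowLookup, windowSizeMs: Long) {
  // Windows that are still open are answered from the stream processor's state,
  // windows that have already closed are answered from the database.
  def impressions(id: String, windowStartMs: Long, nowMs: Long): Option[Long] = {
    val currentWindowStart = nowMs - (nowMs % windowSizeMs)
    if (windowStartMs >= currentWindowStart) liveState.get(id, windowStartMs)
    else database.get(id, windowStartMs)
  }
}

// Map-backed lookup used only to make the sketch runnable.
final class MapLookup(entries: Map[(String, Long), Long]) extends WindowLookup {
  def get(id: String, windowStartMs: Long): Option[Long] = entries.get((id, windowStartMs))
}
```

The application only talks to the query service, so it does not need to know which backend served a given window.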

Page 17

Queryable State Enablers

• Flink has state as a first class citizen
• State is fault tolerant (exactly once semantics)
• State is partitioned (sharded) together with the operators that create/update it
• State is continuous (not mini batched)
• State is scalable

Page 18

State in Flink

[Diagram: Source/filter()/map() and window()/sum() operators]

State index (e.g., RocksDB)

Events are persistent and ordered (per partition/key) in the message queue (e.g., Apache Kafka)

Events flow without replication or synchronous writes

Page 19

State in Flink

[Diagram: Source/filter()/map() and window()/sum() operators]

Trigger checkpoint: inject checkpoint barrier

Page 20

State in Flink

[Diagram: Source/filter()/map() and window()/sum() operators]

Take state snapshot: trigger state copy-on-write

Page 21

State in Flink

[Diagram: Source/filter()/map() and window()/sum() operators]

Persist state snapshots: durably persist snapshots asynchronously

Processing pipeline continues
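The snapshot steps on the preceding slides (take a snapshot when the barrier arrives, persist it asynchronously, keep processing) can be illustrated with a small plain-Scala sketch. This is not Flink's implementation: it assumes state held in an immutable map, so that keeping a reference acts as a copy-on-write snapshot, and `CountingOperator` and the `persist` callback are hypothetical names.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Keyed operator state held in an immutable map: keeping a reference is a
// consistent snapshot, and later updates build new maps instead of mutating it.
final class CountingOperator {
  private var counts: Map[String, Long] = Map.empty

  // Regular event processing: add impressions to the running sum for the id.
  def process(id: String, impressions: Long): Unit =
    counts = counts.updated(id, counts.getOrElse(id, 0L) + impressions)

  def current: Map[String, Long] = counts

  // Called when the checkpoint barrier arrives: the snapshot reference is taken
  // synchronously, the (hypothetical) durable write runs in the background.
  def checkpoint(persist: Map[String, Long] => Unit)(
      implicit ec: ExecutionContext): Future[Map[String, Long]] = {
    val snapshot = counts
    Future { persist(snapshot); snapshot }
  }
}
```

Events processed after `checkpoint` returns change the operator's current state but not the snapshot that is being persisted.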

Page 22

Queryable State: Implementation

[Diagram components: Query Client; JobManager (ExecutionGraph, State Location Server); two TaskManagers (each with State Registry, window()/sum(), local state). Arrow labels: deploy, status, register]

Query: /job/state-name/key

(1) Get location of "key-partition" of "job"

(2) Look up location

(3) Respond location

(4) Query state-name and key
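The four query steps can be sketched in plain Scala as below. The class names follow the diagram labels, but every signature is hypothetical and the hash-based key partitioning is an assumption, so this mirrors the flow rather than Flink's client API.

```scala
// All signatures here are hypothetical; this mirrors the four steps only.
final case class Location(taskManager: String)

// Lives on a TaskManager: holds the local state of the key partitions it owns.
final class StateRegistry(localState: Map[String, Map[String, Long]]) {
  def query(stateName: String, key: String): Option[Long] =
    localState.get(stateName).flatMap(_.get(key))
}

// Lives on the JobManager: knows which TaskManager owns which key partition.
final class StateLocationServer(numPartitions: Int, owners: Map[Int, Location]) {
  def partitionOf(key: String): Int = Math.floorMod(key.hashCode, numPartitions)
  def lookup(key: String): Option[Location] = owners.get(partitionOf(key))
}

final class QueryClient(server: StateLocationServer, taskManagers: Map[Location, StateRegistry]) {
  def query(stateName: String, key: String): Option[Long] =
    for {
      location <- server.lookup(key)             // steps 1-3: request and receive the location
      registry <- taskManagers.get(location)     // reach the TaskManager at that location
      value    <- registry.query(stateName, key) // step 4: query state-name and key
    } yield value
}
```

Only the location lookup involves the JobManager. The state itself is read from the TaskManager that holds the key's partition.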

Page 23

Queryable State Performance

Page 24

Conclusion

Page 25

Takeaways

• Streaming applications are often not bound by the stream processor itself. Cross-system interaction is frequently the biggest bottleneck
• Queryable state mitigates a big bottleneck: communication with external key/value stores to publish realtime results
• Apache Flink's sophisticated support for state makes this possible

Page 26

Takeaways: Performance of Queryable State

• Data persistence is fast with logs
  • Append only, and streaming replication
• Computed state is fast with local data structures and no synchronous replication
• Flink's checkpoint method makes computed state persistent with low overhead

Page 27

Questions?

• eMail: [email protected]
• Twitter: @iamuce
• Code/Demo: https://github.com/dataArtisans/flink-queryable_state_demo

Page 28

Appendix

Page 29

Flink Runtime + APIs

[Diagram labels: Table API & Stream SQL; DataStream API; ProcessFunction API; Runtime: Distributed Streaming Data Flow; Building Blocks: Streams, Time, State]

Page 30

Apache Flink Architecture Review