modern stream processing with apache flink @ goto berlin 2017

Post on 21-Jan-2018

355 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Modern Stream Processing with Apache Flink®

Till Rohrmann@stsffap

GOTO Berlin 20171

2

Original creators ofApache Flink®

dA Platform 2Open Source Apache Flink + dA Application Manager

What changes faster? Data or Query?

3

Data changes slowlycompared to fastchanging queries

ad-hoc queries, data exploration, ML training and

(hyper) parameter tuning

Batch ProcessingUse Case

Data changes fastapplication logic

is long-lived

continuous applications,data pipelines, standing queries,

anomaly detection, ML evaluation, …

Stream ProcessingUse Case

Batch Processing

4

Stream Processing

5

6

Apache Flink in a Nutshell

Apache Flink in a Nutshell

7

Queries

Applications

Devices

etc.

Database

Stream

File/ObjectStorage

Stateful computations over streamsreal-time and historic

fast, scalable, fault tolerant, in-memory,event time, large state, exactly-once

HistoricData

Streams

Application

8

Event Streams State (Event) Time Snapshots

The Core Building Blocks

real-time andhindsight

complexbusiness logic

consistency without-of-order data

and late data

forking /versioning /time-travel

Powerful Abstractions

9

Process Function (events, state, time)

DataStream API (streams, windows)

Stream SQL / Tables (dynamic tables)

Stream- & Batch Data Processing

High-levelAnalytics API

Stateful Event-Driven Applications

val stats = stream.keyBy("sensor").timeWindow(Time.seconds(5)).sum((a, b) -> a.add(b))

def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {// work with event and state(event, state.value) match { … }

out.collect(…) // emit eventsstate.update(…) // modify state

// schedule a timer callbackctx.timerService.registerEventTimeTimer(event.timestamp + 500)

}

Layered abstractions tonavigate simple to complex use cases

Hardened at scale

10

Athena X Streaming SQLPlatform Service

Streaming Platform as a Service

Fraud detectionStreaming Analytics Platform

100s jobs, 1000s nodes, TBs statemetrics, analytics, real time MLStreaming SQL as a platform

11

Distributed applicationinfrastructure

Good old centralized architecture

12

The big meancentral database

$$$

The grumpyDBA

Application Application Application Application

Write log and tables

13FromtheApacheCassandradocs

Modern distributed app. architecture

14

EventSourcing

TurningtheDBinsideout

CQRS

Modern distributed app. architecture

15

Application

Sensor

APIsApplication

Application

Application

The limit to what you can do is how sophisticatedcan you compute over the stream!

Boils down to: How well do you handle state and time

16

A Flink-favored approach

Event Sourcing + Memory Image

17

event logpersists events(temporarily)

event /command

Process

main memory

update localvariables/structures

periodically snapshot the

memory

Event Sourcing + Memory Image

18

Recovery: Restore snapshot and replay events since snapshot

event logpersists events(temporarily)

Process

Distributed Memory Image

19

Distributed application, many memory images.Snapshots are all consistent together.

Stateful Event & Stream Processing

20

Scalable embedded state

Access at memory speed &scales with parallel operators

Compute, State, and Storage

21

Classic tiered architecture Streaming architecture

databaselayer

computelayer

application state+ backup

compute+

stream storageand

snapshot storage(backup)

application state

Performance

22

synchronous reads/writesacross tier boundary

asynchronous writesof large blobs

all modificationsare local

Classic tiered architecture Streaming architecture

Consistency

23

distributed transactions

at scale typicallyat-most / at-least once

exactly onceper state =1 =1

Classic tiered architecture Streaming architecture

Scaling a Service

24

separately provision additionaldatabase capacity

provision computeand state together

Classic tiered architecture Streaming architecture

provisioncompute

Rolling out a new Service

25

provision a new database(or add capacity to an existing one)

simply occupies someadditional backup space

Classic tiered architecture

Streaming architecture

provision computeand state together

What users built on checkpoints…§ Upgrades and Rollbacks§ Cross Datacenter Failover§ State Archiving§ Application Migration§ Spot Instance Region Chasing§ A/B testing§ …

26

27

What is the next wave ofstream processing

applications?

What changes faster? Data or Query?

28

Data changes slowlycompared to fastchanging queries

ad-hoc queries, data exploration, ML training and

tuning

Batch ProcessingUse Case

Data changes fastapplication logic

is long-lived

continuous applications,data pipelines, standing queries,

anomaly detection, ML evaluation, …

Stream ProcessingUse Case

Analytics & Business Logic

29

Stream

Source analytics metricsBusiness

Logic

Stream Processor

events

ApplicationDatabase

Blending Analytics & Business Logic

30

Stream

Source StreamingApplication

Stream Processor

events

RPCs

(Actor) Messages

AthenaX by Uber

31

32

Can one build an entire sophisticatedweb application (say a social network)

on a stream processor?

(Yes, we can!™)

33

Social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation) on Kafka/Flink/Elasticsearch/Redis

More:https://data-artisans.com/blog/drivetribe-cqrs-apache-flink

@

34

The next wave of stream processing applications…

… is all types of statefulapplications that react to

data and time!

35

Stateful stream applications

versioning, upgrading,rollback, duplicating,

migrating, …

Continuous applications

The dA Platform Architecture

36

dA Platform 2

Apache FlinkStateful stream processing

KubernetesContainer platform

Logging

Streams from Kafka, Kinesis,

S3, HDFS, Databases, ...

dA Application

ManagerApplication lifecycle

management

Metrics

CI/CD

Real-timeAnalytics

Anomaly- &Fraud Detection

Real-timeData Integration

ReactiveMicroservices (and more)

Versioned Applications, not Jobs/Jars

37

Stream ProcessingApplication

Version 3

Version 2

Version 1

Code and ApplicationSnapshot

upgrade

upgrade

New Application

Version 3a

Version 2afork /

duplicate

Deployments, not Flink Clusters

38

Testing / QA Kubernetes Cluster Production Kubernetes Cluster

Threat MetricsApp. Testing

Activity MonitorApplication

Fraud DetectionApplication

Fraud DetectionApp. Testing

Hooks for CI/CD pipelines

39

Kubernetes Cluster

ApplicationVersion 1

CI Service

dA Application

Manager

pushupdate

triggerCI

upgradeAPI

statefulupgrade

ApplicationVersion 2

40

Thank you!@stsffap@ApacheFlink @dataArtisans

We are hiring!data-artisans.com/careers

41

top related