Christian Kreutzfeldt – Static vs Dynamic Stream Processing


Upload: flink-forward

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Christian Kreuzfeld – Static vs Dynamic Stream Processing

STATIC VS DYNAMIC STREAM PROCESSING

Christian Kreutzfeldt (@mnxfst)


Page 2: Christian Kreuzfeld – Static vs Dynamic Stream Processing

1. Introduction

2. Stream Processing - First Encounter

3. Increasing number of Use Cases

4. Arising Implementation Issues

5. Requirements for Stream Processing Framework

6. Way to SPQR (+ short demo)

7. Way to Apache Flink (extension points + short demo)

8. Future (hope to come)

9. Q&A

Page 3: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Christian Kreutzfeldt (@mnxfst)

Senior Software Developer & Architect at Otto Group Business Intelligence Department

Tech Lead “Real-Time Stream Processing”

Computer Science at University of Luebeck

Page 4: Christian Kreuzfeld – Static vs Dynamic Stream Processing

World’s Second-Largest Online Retailer in End-Consumer Business

Europe’s Largest Online Retailer in End-Consumer Fashion & Lifestyle Business

Multichannel Retail: w/ catalogue business, e-commerce and over-the-counter retail

Financial Services: providing retail-related financial services across the value-added chain

Services: covering the entire portfolio of retail services across the value-added chain

Page 5: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Otto Group Business Intelligence Department: driven by data, inspired by our customers

BI Strategy: definition of business intelligence strategy

Consulting: talent recruitment & training, networking & consulting

Business Development: evaluation & implementation of data-driven business models

Data Pool: maintaining & providing data pools

SaaS Products: software-as-a-service solutions

Page 6: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Otto Group Business Intelligence Department: dedicated to open source

SPQR: stream processing framework

Schedoscope: scheduling framework for painfree agile development of your datahub

Palladium: framework for developing real-world machine learning solutions

follow us on github.com/ottogroup

Page 7: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Stream Processing: first steps w/ unified tracking

Unified Tracking

Page 8: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Stream Processing: prevent quality problems

[Diagram: Unified Tracking with four Tagging Template boxes]

Page 9: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Stream Processing: prevent quality problems

[Diagram: Unified Tracking, four Tagging Template boxes, Event Stream, Event Validator]

akka-based real stream processing

Page 10: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Stream Processing: developing project ideas

customer sessions

search sessions

user-agent identification

dynamic profile selection

dynamic stream queries

Page 11: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Umberto Salvagnin https://www.flickr.com/photos/kaibara/4688161016 (cc by 2.0)

Stream Processing: software development issues

resource intensive use-case implementation

required ops support for topology deployment and monitoring

rather static implementations than highly flexible ones

highly time consuming

Static Topologies (Queries)

Dynamic Data

Highly Flexible Context

Page 12: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Stream Processing: requirements to ease the pain

unified runtime environment

operations support

support for multiple sources and sinks

real stream processing

easy-to-extend

steep learning curve

Page 13: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Stream Processing: working w/ data the business way

no-code topology definition (the SQL way)

self dependent, immediate deployments

consistent monitoring (behavior / result retrieval)

adjustment through re-deployments

Dynamic Topologies (Queries)

Dynamic Data

Highly Flexible Context

Page 14: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Stream Processing: framework decision

unified runtime environment

operations support

support for multiple sources and sinks

real stream processing

easy-to-extend

steep learning curve

SPQR (spooker)

no-code topology definition

self dependent deployments

consistent monitoring

immediate deployments

short feedback circuit

Page 15: Christian Kreuzfeld – Static vs Dynamic Stream Processing

SPQR: concepts

library deployment: independent library deployments into node repositories for later use

zero-code topologies: configuration based pipeline descriptions

ad hoc queries: support for ad hoc queries, immediate adjustments and short feedback circuits

https://github.com/ottogroup/spqr
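
To make the idea of a configuration based, zero-code pipeline more concrete, here is a minimal sketch in plain Java. It is not SPQR's actual configuration format or API: the `component:argument` line syntax, the component names and the class `ZeroCodePipelineSketch` are all assumptions made for illustration. The only point is that operators are deployed once into a repository and a pipeline is then assembled from a textual description, without compiling new code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ZeroCodePipelineSketch {

    // Hypothetical component repository: maps a component name used in the
    // configuration to a factory that builds an operator from its argument.
    // An operator returns null to drop an event.
    static final Map<String, Function<String, Function<String, String>>> REPOSITORY = new HashMap<>();

    static {
        REPOSITORY.put("uppercase", arg -> event -> event.toUpperCase());
        REPOSITORY.put("prefix", arg -> event -> arg + event);
        REPOSITORY.put("filter-contains", arg -> event -> event.contains(arg) ? event : null);
    }

    // Builds a pipeline from lines such as "prefix:evt-" (made-up syntax).
    static Function<String, String> build(List<String> config) {
        Function<String, String> pipeline = Function.identity();
        for (String line : config) {
            String[] parts = line.split(":", 2);
            Function<String, Function<String, String>> factory = REPOSITORY.get(parts[0]);
            if (factory == null) {
                throw new IllegalArgumentException("unknown component: " + parts[0]);
            }
            Function<String, String> step = factory.apply(parts.length > 1 ? parts[1] : "");
            Function<String, String> previous = pipeline;
            // Chain the new step behind the existing ones; dropped events stay dropped.
            pipeline = event -> {
                String intermediate = previous.apply(event);
                return intermediate == null ? null : step.apply(intermediate);
            };
        }
        return pipeline;
    }

    // Runs all events through the configured pipeline and collects the survivors.
    static List<String> run(List<String> config, List<String> events) {
        Function<String, String> pipeline = build(config);
        List<String> results = new ArrayList<>();
        for (String event : events) {
            String result = pipeline.apply(event);
            if (result != null) {
                results.add(result);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> config = List.of("filter-contains:click", "uppercase", "prefix:evt-");
        System.out.println(run(config, List.of("click:home", "view:cart", "click:search")));
    }
}
```

Changing the query then means editing the three configuration lines and re-deploying the description, which is the short feedback circuit the slide refers to.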

Page 16: Christian Kreuzfeld – Static vs Dynamic Stream Processing

SPQR: architecture

Page 17: Christian Kreuzfeld – Static vs Dynamic Stream Processing

D E M O

Page 18: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Dynamic Stream Processing: importance for (business) acceptance

no-code topology definition

self dependent deployments

consistent monitoring

immediate deployments

short feedback circuit

steep learning curve, focus on functionality instead of implementation, better representation

no or less ops support, shorter time-to-execution, independency from tech teams, easier to use

short feedback circuit, easier to adjust

support people to try out new ideas, get more people to work with data streams

choose representation defined by topology author as foundation for monitoring to have common understanding (topology author, ops team)

Page 19: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Dynamic Stream Processing: from spqr to apache flink - it’s all there

Martin Grandjean - http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)

akka

Page 20: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Dynamic Stream Processing: variety of ways to interact with apache flink

Martin Grandjean - http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)

variety of message types (request/response) available to interact with job manager / cluster:

● RequestNumberRegisteredTaskManager
● RequestTotalNumberOfSlots
● SubmitJob
● CancelJob
● RequestPartitionState
● RequestJobStatus
● RequestRunningJobs
● RequestRunningJobsStatus
● RequestJob
● RequestRegisteredTaskManagers
● RequestStackTrace
● RequestJobManagerStatus
● AccumulatorMessage (RequestAccumulatorResultsStringified, ...)
● ...

Page 21: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Apache Flink: short feedback circuit & consistent monitoring (impl)

Martin Grandjean - http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)

akka

FlinkMetricsCollector spawns RunningJobsManager

RunningJobsManager queries JobManager

RunningJobsManager spawns a JobMetricsCollector for each job

JobMetricsCollector queries JobManager

Page 22: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Apache Flink: short feedback circuit & consistent monitoring (impl)

Martin Grandjean - http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)

akka

RunningJobsManager

    public void preStart() throws Exception {
        // every 5 seconds, ask the job manager for the status of all running jobs
        context().system().scheduler().schedule(
            FiniteDuration.Zero(),
            FiniteDuration.apply(5, TimeUnit.SECONDS),
            this.remoteJobManagerRef,
            JobManagerMessages.getRequestRunningJobsStatus(),
            context().dispatcher(),
            getSelf()
        );
    }

receive RunningJobsStatus

extract job identifier

start job metrics collector

JobMetricsCollector

    public void preStart() throws Exception {
        // every 5 seconds, ask the job manager for this job's accumulator results
        context().system().scheduler().schedule(
            FiniteDuration.Zero(),
            FiniteDuration.apply(5, TimeUnit.SECONDS),
            this.remoteJobManagerRef,
            new RequestAccumulatorResults(this.jobId),
            context().dispatcher(),
            getSelf()
        );
    }

receive AccumulatorResultsFound
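
The manager side of this slide (receive the running jobs status, extract the job identifiers, start one metrics collector per job) can be sketched without Akka or Flink on the classpath. The following is a simplified stand-in, not the code from the talk: the `Supplier` replaces the request to the job manager, `tick()` replaces the scheduled message, and the manager drives the collectors directly instead of each collector scheduling its own requests. Dropping collectors for jobs that are no longer running is an assumption; the slides do not say how finished jobs are handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class RunningJobsManagerSketch {

    /** Stand-in for the per-job collector; here it only counts its polls. */
    static final class JobMetricsCollector {
        final String jobId;
        int polls;

        JobMetricsCollector(String jobId) {
            this.jobId = jobId;
        }

        void poll() {
            polls++; // a real collector would request the job's accumulator results here
        }
    }

    // Hypothetical stand-in for the "running jobs status" request to the job manager.
    private final Supplier<List<String>> runningJobIds;
    private final Map<String, JobMetricsCollector> collectors = new LinkedHashMap<>();

    RunningJobsManagerSketch(Supplier<List<String>> runningJobIds) {
        this.runningJobIds = runningJobIds;
    }

    /** One scheduler tick: receive status, extract job identifiers, start collectors. */
    void tick() {
        List<String> ids = runningJobIds.get();
        // Assumption: collectors of jobs that are no longer running are dropped.
        collectors.keySet().retainAll(ids);
        for (String id : ids) {
            // Start a collector the first time a job id is seen, then poll it.
            collectors.computeIfAbsent(id, JobMetricsCollector::new).poll();
        }
    }

    Map<String, JobMetricsCollector> collectors() {
        return collectors;
    }

    public static void main(String[] args) {
        // Three canned status responses, one per tick.
        List<List<String>> responses = List.of(
            List.of("job-a"), List.of("job-a", "job-b"), List.of("job-b"));
        int[] call = {0};
        RunningJobsManagerSketch manager =
            new RunningJobsManagerSketch(() -> responses.get(call[0]++));
        for (int i = 0; i < responses.size(); i++) {
            manager.tick();
            System.out.println("tick " + i + ": collectors for " + manager.collectors().keySet());
        }
    }
}
```

In a setup closer to the slides, `tick()` would be triggered by a scheduler every five seconds and the status would arrive as an asynchronous reply rather than a return value.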

Page 23: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Apache Flink: metrics retrieval through accumulators

D E M O
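
For readers without the demo at hand, the accumulator idea can be illustrated roughly as follows: each parallel task adds to a local value, and the partial values are merged into one job-wide result that a metrics collector can then request. The sketch below is plain Java and deliberately does not use Flink's classes; `CounterSketch`, `runTask` and `rejectedTotal` are hypothetical names, and counting empty events as "rejected" is an invented example metric.

```java
import java.util.List;

public class AccumulatorSketch {

    /** Hypothetical, simplified counter: add locally, merge partial results later. */
    static final class CounterSketch {
        private long value;

        void add(long delta) {
            value += delta;
        }

        long getLocalValue() {
            return value;
        }

        void merge(CounterSketch other) {
            value += other.value;
        }
    }

    /** Simulates one parallel task: counts the empty events it rejects. */
    static CounterSketch runTask(List<String> partition) {
        CounterSketch rejected = new CounterSketch();
        for (String event : partition) {
            if (event.isEmpty()) {
                rejected.add(1);
            }
        }
        return rejected;
    }

    /** Merges the per-task partial counts into the job-wide metric. */
    static long rejectedTotal(List<List<String>> partitions) {
        CounterSketch total = new CounterSketch();
        for (List<String> partition : partitions) {
            total.merge(runTask(partition));
        }
        return total.getLocalValue();
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
            List.of("click", "", "view"),
            List.of("", "", "search"));
        System.out.println("rejected events: " + rejectedTotal(partitions));
    }
}
```

Because the metric is just a merged value attached to the job, it can be read from outside while the job runs, which is what makes accumulators usable as a monitoring channel in the approach shown on the slides.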

Page 24: Christian Kreuzfeld – Static vs Dynamic Stream Processing

https://nifi.apache.org/

Apache Flink: how to move on

deploy metrics

under construction

Page 25: Christian Kreuzfeld – Static vs Dynamic Stream Processing

Apache Flink: topology definition & deployments (integration points)

akka

Martin Grandjean - http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)

no-code topology definition

self dependent deployments

immediate deployments

expects code

requires far too much framework modifications

the place to be

Page 26: Christian Kreuzfeld – Static vs Dynamic Stream Processing

https://nifi.apache.org/

Apache Flink: relevance

metrics

deploy

Static Data, Static Queries

Static Data, Dynamic Queries

Dynamic Data, Static Queries

Dynamic Data, Dynamic Queries

SQL

Page 27: Christian Kreuzfeld – Static vs Dynamic Stream Processing

https://nifi.apache.org/

Apache Flink: apache zeppelin points the right direction

metrics

deploy

Static Data, Static Queries

Static Data, Dynamic Queries

Dynamic Data, Static Queries

Dynamic Data, Dynamic Queries

SQL

Page 28: Christian Kreuzfeld – Static vs Dynamic Stream Processing

http://www.ottogroup.com/en/karriere/

We are hiring!