scalable streaming analytics - york universitypapaggel/courses/eecs4415/... · 2015-10-08 ·...

60
Scalable Streaming Analytics KARTHIK RAMASAMY @karthikz

Upload: others

Post on 28-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

Scalable Streaming Analytics

KARTHIK RAMASAMY

@karthikz

Page 2: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

BEGIN

END

Overview

!I

Storm Overview

(II

Operational Experiences

KV

Heron

ZIV

TALK OUTLINE

Storm Internals

bIII

Page 3: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

!

!

WHAT IS ANALYTICS?

DISCOVERY

Ability to identify patterns in data

!

!

COMMUNICATION

Provide insights in a meaningful way

according to Wikipedia

Page 4: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

CUBE ANALYTICS

PREDICTIVE ANALYTICS

E!

varietiesTYPES OF ANALYTICS

Page 5: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

ô

Ability to analyze the data immediately

after it is produced

"

Ability to provide results instantly when a query is

posed

Ü

Ability to provide insights after several hours/days when a

query is posed

STREAMING INTERACTIVE BATCH

DIMENSIONS OF ANALYTICSvariants

Page 6: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STREAMING VS INTERACTIVE

Static Batch Results/Reports

Database Server

Data$Storage$

Queries

Bulkload

Data

INTERACTIVE ANALYTICS

Data Stream Processing

Real time alerts, Real time analytics

Continuous visibility

Data$Storage$

Results

STREAMING ANALYTICS

Queries

Page 7: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

BATCHOLTP REAL TIME

high throughput

> 1 hour

monthly active users relevance for ads

latency sensitive

< 500 ms

fanout Tweets search for Tweets

ad impressions count hash tag trends

WHAT IS REAL TIME?msecs or secs or mins?

deterministic workflows

adhoc queries

approximate

> 1 sec

ComplementFeedback

Page 8: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

varietiesSTREAMING DATA FLOW

Page 9: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STREAMING SYSTEMSfirst generation - SQL based

NIAGARA Query Engine

Stanford Stream Data Manager

Aurora Stream Processing Engine

Borealis Distributed Stream Processing Engine

Cayuga - Stateful Event Monitoring

Page 10: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STREAMING SYSTEMSnext generation - too many

Page 11: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STORM OVERVIEW

![

I

Page 12: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

GUARANTEED MESSAGE

PROCESSING

HORIZONTAL SCALABILITY

ROBUST FAULT

TOLERANCE

CONCISE CODE- FOCUS

ON LOGIC

/b \ Ñ

WHAT IS STORM?

Streaming platform for analyzing realtime data as they arrive, so you can react to data as it happens.

Page 13: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STORM DATA MODELTOPOLOGY

Directed acyclic graph

Vertices=computation, and edges=streams of data tuples

SPOUTS

Sources of data tuples for the topology

Examples - Kafka/Kestrel/MySQL/Postgres

BOLTS

Process incoming tuples and emit outgoing tuples

Examples - filtering/aggregation/join/arbitrary function

,

%

Page 14: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STORM TOPOLOGY

%

%

%

%

%

SPOUT 1

SPOUT 2

BOLT 1

BOLT 2

BOLT 3

BOLT 4

BOLT 5

Page 15: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

WORD COUNT TOPOLOGY

% %TWEET SPOUT PARSE TWEET BOLT WORD COUNT BOLT

Live stream of Tweets

LOGICAL PLAN

Page 16: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

WORD COUNT TOPOLOGY

% %TWEET SPOUT

TASKSPARSE TWEET BOLT

TASKSWORD COUNT BOLT

TASKS

%%%% %%%%

When a parse tweet bolt task emits a tuple which word count bolt task should it send to?

Page 17: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STREAM GROUPINGS

Random distribution of tuples

Group tuples by a field or multiple

fields

Replicates tuples to all tasks

SHUFFLE GROUPING FIELDS GROUPING ALL GROUPING

Sends the entire stream to one task

GLOBAL GROUPING

/ - ,.

Page 18: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

WORD COUNT TOPOLOGY

% %TWEET SPOUT

TASKSPARSE TWEET BOLT

TASKSWORD COUNT BOLT

TASKS

%%%% %%%%

SHUFFLE GROUPING FIELDS GROUPING

Page 19: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STORM INTERNALS

(II

Page 20: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STORM ARCHITECTURE

Nimbus

ZK CLUSTER

SUPERVISOR

W1 W2 W3 W4

SUPERVISOR

W1 W2 W3 W4

TOPOLOGY SUBMISSION

ASSIGNMENT MAPS

SYNC CODE

SLAVE NODE SLAVE NODE

MASTER NODE

Page 21: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STORM WORKER

TASK

EXECUTOR

TASK

TASK

EXECUTOR

TASK

TASK

TASK

EXECUTOR

JVM

PR

OC

ESS

Page 22: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

DATA FLOW IN STORM WORKERS

In QueueIn QueueIn QueueIn QueueIn Queue

TCP Receive Buffer

In QueueIn QueueIn QueueIn QueueOut Queue

Outgoing Message Buffer

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic ThreadSend Thread

Global Send Thread

TCP Send Buffer

Global Receive Thread

Kernel

Disruptor Queues

0mq Queues

Page 23: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

Large amount of data produced every day

Largest storm cluster Several topologies deployed

Several billion messages every day

>2400l

>50tbh

>250P

>3bb

STORM @TWITTER

1 stage 8 stages

Page 24: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STORM ARCHITECTURE

Nimbus

ZK CLUSTER

SUPERVISOR

W1 W2 W3 W4

SUPERVISOR

W1 W2 W3 W4

TOPOLOGY SUBMISSION ASSIGNMENT

MAPS

SLAVE NODE SLAVE NODE

MASTER NODE

Multiple Functionality Scheduling/Monitoring

Single point of failure

Storage Contention

Page 25: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STORM WORKER

TASK4

TASK5

EXECUTOR2

TASK2

TASK3

TASK1

EXECUTOR1

JVM

PR

OC

ESS

Complex hierarchy

Difficult to tune

Hard to debug

Page 26: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

DATA FLOW IN STORM WORKERS

In QueueIn QueueIn QueueIn QueueIn Queue

TCP Receive Buffer

In QueueIn QueueIn QueueIn QueueOut Queue

Outgoing Message Buffer

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic Thread

User Logic ThreadSend Thread

Global Send Thread

TCP Send Buffer

Global Receive Thread

Kernel

Queue Contention

Multiple Languages

Page 27: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

OVERLOADED ZOOKEEPER

zk

S1

S2

S3

Scaled up

W

W

WSTORM

zk

Handled unto to 1200 workers per cluster

Page 28: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

67%

33%

OVERLOADED ZOOKEEPER

KAFKA SPOUT

Offset/partition is written every 2 secs

!

!

STORM RUNTIME

Workers write heart beats every 3 secs

Analyzing zookeeper traffic

Page 29: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

OVERLOADED ZOOKEEPER

zk

S1

S2

S3

Heart beat daemons

W

W

WSTORM

zk

5000 workers per cluster

HHH

KVKVKV

Page 30: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

EVOLUTION OR REVOLUTION?

FUNDAMENTAL ISSUES- REQUIRE EXTENSIVE REWRITING

Several queues for moving data

Inflexible and requires longer development cycle

USE EXISTING OPEN SOURCE SOLUTIONS

Issues working at scale/lacks required performance

Incompatible API and long migration process

,

fix storm or develop a new system?

Page 31: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERONb

III

Page 32: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON DESIGN GOALS

FULLY API COMPATIBLE WITH STORM

Directed acyclic graph

Topologies, spouts and bolts,

USE OF WELL KNOWN LANGUAGES

No Clojure

C++/JAVA/Python

Page 33: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON ARCHITECTURE

Topology 1

TOPOLOGY SUBMISSION

Scheduler

Topology 2

Topology 3

Topology N

Aurora ECS

YARN Mesos

Page 34: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

TOPOLOGY ARCHITECTURE

Topology Master

ZK CLUSTER

Stream Manager

I1 I2 I3 I4

Stream Manager

I1 I2 I3 I4

Logical Plan, Physical Plan and Execution State

Sync Physical Plan

CONTAINER CONTAINER

Metrics Manager

Metrics Manager

Page 35: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

TOPOLOGY MASTER

ASSIGNS ROLE MONITORING METRICS

b \ Ñ

Solely responsible for the entire topology

Page 36: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

TOPOLOGY MASTER

Topology Master

ZK CLUSTER

Logical Plan, Physical Plan and Execution State

PREVENT MULTIPLE TM BECOMING MASTERS"

" ALLOWS OTHER PROCESS TO DISCOVER TM

Page 37: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STREAM MANAGER

ROUTES TUPLES BACKPRESSURE ACK MGMT

Ñ

Routing Engine

/ ,

Page 38: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

S1 B2

B3

STREAM MANAGER

Stream Manager

Stream Manager

Stream Manager

Stream Manager

S1 B2

B3 B4

S1 B2

B3

S1 B2

B3 B4

O(n2) O(k2)

B4

Page 39: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

S1 B2

B3

STREAM MANAGER

Stream Manager

Stream Manager

Stream Manager

Stream Manager

S1 B2

B3 B4

S1 B2

B3

S1 B2

B3 B4

tcp back pressure

B4

SLOWS UPSTREAM AND DOWNSTREAM INSTANCES

Page 40: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

S1 B2

B3

STREAM MANAGER

Stream Manager

Stream Manager

Stream Manager

Stream Manager

S1 B2

B3 B4

S1 B2

B3

S1 B2

B3 B4

spout back pressure

B4

S1 S1

S1S1

Page 41: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

STREAM MANAGER

PREDICTABILITY

Tuple failures are more deterministic

SELF ADJUSTS

Topology goes as fast as the slowest component

"

"

back pressure advantages

Page 42: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON INSTANCE

RUNS ONE TASK EXPOSES API COLLECTS METRICS

|

Does the real work!

p

>>

>

Page 43: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON INSTANCE

Stream Manager

Metrics Manager

Gateway Thread

Task Execution Thread

data-in queue

data-out queue

metrics-out queue

BOUNDED QUEUES - TRIGGERS GC IN LARGE TOPOLOGIES

Page 44: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

METRICS MANAGER

GATHERS METRICS SCRIBES ABSTRACTED

ò

Optical Nerve

| *

Page 45: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON PERFORMANCEm

illion

tupl

es/m

in

0

350

700

1050

1400

Spout Parallelism25 100 200 500

Storm Heron

Throughput with acknowledgements - Word count topology

Page 46: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON PERFORMANCEla

tenc

y (m

s)

0

625

1250

1875

2500

Spout Parallelism25 100 200 500

Storm Heron

Latency with acknowledgements enabled - Word Count Topology

Page 47: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON PERFORMANCE#

core

s us

ed

0

625

1250

1875

2500

Spout Parallelism25 100 200 500

Storm Heron

CPU usage with acknowledgements enabled - Word Count Topology

Page 48: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON PERFORMANCEm

illion

tupl

es/m

in

0

1250

2500

3750

5000

Spout Parallelism25 100 200 500

Storm Heron

Throughput with no acknowledgements - Word count topology

Page 49: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON PERFORMANCE#

core

s us

ed

0

625

1250

1875

2500

Spout Parallelism25 100 200 500

Storm Heron

CPU usage with no acknowledgements - Word Count Topology

Page 50: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON PERFORMANCE

Acknowledgements enabled

# co

res

used

0

100

200

300

400

Storm Heron

CPU usage - RTAC Topology

No acknowledgements

# co

res

used

0

100

200

300

400

Storm Heron

Page 51: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON PERFORMANCEla

tenc

y (m

s)

0

17.5

35

52.5

70

Storm Heron

Latency with acknowledgements enabled - RTAC Topology

Page 52: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

OPERATIONAL EXPERIENCES

K$

IV

Page 53: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON DEPLOYMENTTopology 1

Topology 2

Topology 3

Topology N

Heron Tracker

Heron VIZ

Heron Web

ZK CLUSTER

Aurora Services

Aurora Scheduler

Observability

Page 54: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

HERON SAMPLE TOPOLOGIES

Page 55: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

OPERATIONAL EXPERIENCE

All topologies run under topology

owner’s role

Everything runs on Aurora

No more 2am pages

SERVICE-LESS CLUSTER-LESS TENSION-LESS

4 " ,\

Page 56: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

DEVELOPER EXPERIENCE

Faster iteration Better resource utilization

Devel to prod in 5min

DEBUG TUNE DEPLOY

J G ,a

Page 57: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

MIGRATION EXPERIENCE

Couple of hours Lots of savings Summingbird tuning takes time

SMALL MEDIUM LARGE

J # ,L

Page 58: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

CURRENT WORK

x

9V

Page 59: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

CURRENT WORK

Use Java Reflection Determine optimal set of parameters

Grow/Shrink based on data

SERIALIZATION TUNING ELASTIC

Update topology without restarting

CONFIGURATION

< q é"

Page 60: Scalable Streaming Analytics - York Universitypapaggel/courses/eecs4415/... · 2015-10-08 · STREAMING INTERACTIVE BATCH DIMENSIONS OF ANALYTICS variants. STREAMING VS ... Real time

QUESTIONS

and

ANSWERS

R% Go ahead. Ask away.