Apache Storm vs. Spark Streaming – Two Stream Processing Platforms Compared

2014 © Trivadis

BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA


DBTA Workshop on Stream Processing, Berne, 3.12.2014

Guido Schmutz


Guido Schmutz

§  Working for Trivadis for more than 18 years
§  Oracle ACE Director for Fusion Middleware and SOA
§  Co-author of different books
§  Consultant, trainer and software architect for Java, Oracle, SOA and Big Data / Fast Data
§  Member of Trivadis Architecture Board
§  Technology Manager @ Trivadis
§  More than 25 years of software development experience
§  Contact: [email protected]
§  Blog: http://guidoschmutz.wordpress.com
§  Twitter: gschmutz



Our company

Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany and Austria.

We offer our services in the following strategic business fields:

OPERATION: Trivadis Services takes over the interacting operation of your IT systems.

Agenda

1.  Introduction

2.  Apache Storm

3.  Apache Spark (Streaming)

4.  Unified Log

5.  Stream Processing Architectures



What is Stream Processing?

Infrastructure for continuous data processing

Computational model can be as general as MapReduce but with the ability to produce low-latency results

Data collected continuously is naturally processed continuously

aka. Event Processing / Complex Event Processing (CEP)


Why Stream Processing?

Response latency:

•  RPC: synchronous
•  Stream Processing: milliseconds to minutes
•  Batch Processing: later, possibly much later

How to design a Stream Processing System?


Three basic designs:

•  Event Stream => Collecting => Queue (Persist) => Processing => result
•  Event Stream => Collecting => Processing => result
•  Event Stream => Collecting/Processing (combined) => result
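The first design, with a queue between collecting and processing, can be sketched roughly as follows. This is a minimal illustration only: the function names are hypothetical, `queue.Queue` is an in-memory stand-in for a persistent queue, and upper-casing an event stands in for real processing logic.

```python
import queue
import threading


def collect(event_stream, q):
    """Collector: read raw events and hand them to the queue."""
    for event in event_stream:
        q.put(event)
    q.put(None)  # sentinel marking the end of the stream (illustration only)


def process(q, results):
    """Processor: take events off the queue and append a result for each."""
    while True:
        event = q.get()
        if event is None:
            break
        results.append(event.upper())  # stand-in for real processing logic


def run_pipeline(event_stream):
    """Run one collector and one processor, decoupled by a queue."""
    q = queue.Queue()  # in-memory stand-in for the persistent queue
    results = []
    collector = threading.Thread(target=collect, args=(event_stream, q))
    processor = threading.Thread(target=process, args=(q, results))
    collector.start()
    processor.start()
    collector.join()
    processor.join()
    return results
```

Because the queue decouples the two steps, the collector never has to wait for slow processing logic, which is the main argument for the first design.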

How to scale a Stream Processing System?


Scaling options:

•  Multiple threads: Event Stream => Collecting Thread 1 … n => Queue (Persist) => Processing Thread 1 … n => result
•  Multiple processes: Event Stream => Collecting Thread/Process => Queue 1 … n (Persist) => Processing Process 1 … n => result
•  Multiple processing steps: Processing A and Processing B, each running in threads 1 … n (optionally in separate processes), connected by queues Q1 … Qn
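Scaling out to n queues raises the question of which queue an event goes to. One common answer, sketched here under the assumption that every event carries a string key, is to hash the key so that all events with the same key land in the same queue and therefore reach the same processing thread or process. The function names and the choice of CRC32 are illustrative.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Distribute (key, value) events over num_partitions queues."""
    queues = [[] for _ in range(num_partitions)]
    for key, value in events:
        queues[partition_for(key, num_partitions)].append((key, value))
    return queues
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would break routing across several collecting processes.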

How to make a (stateful) Stream Processing System reliable?

Faults and stragglers are inevitable in large clusters running big data applications

Streaming applications must recover from them quickly




Solution 1: using active/passive system (hot replication)

•  Both systems process the full load

•  In case of a failure, automatically switch and use the “passive” system

•  Stragglers slow down both active and passive system


State = state in-memory and/or on-disk

[Diagram: the Event Stream feeds both an Active and a Passive instance of the processing pipeline (Processing B, Thread 2, Q2), each with its own State]


Solution 2: Upstream backup

•  Nodes buffer sent messages and replay them to a new node in case of failure

•  Stragglers are treated as failures


State = state in-memory and/or on-disk

Buffer = buffer for replay in-memory and/or on-disk

[Diagram: Event Stream => Processing B (Thread 2, Q2) with State]
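Upstream backup can be illustrated with a small sketch. All class and method names below are hypothetical, the buffer and state live in memory only, and the "processing" is a running sum chosen purely for brevity.

```python
class UpstreamNode:
    """Sends messages downstream and buffers them until acknowledged."""

    def __init__(self):
        self.buffer = []  # replay buffer (in-memory in this sketch)

    def send(self, message, downstream):
        self.buffer.append(message)
        downstream.receive(message)

    def ack_up_to(self, count):
        """Downstream has checkpointed the first `count` buffered messages."""
        del self.buffer[:count]

    def replay(self, new_downstream):
        """After a failure, resend everything not yet acknowledged."""
        for message in self.buffer:
            new_downstream.receive(message)


class SummingNode:
    """Stateful downstream node: keeps a running sum as its state."""

    def __init__(self, state=0):
        self.state = state

    def receive(self, message):
        self.state += message
```

If the downstream node fails, a replacement is started from the last checkpointed state (or from an empty state) and the upstream node calls `replay()` to bring it up to date. The acknowledgement step is what keeps the buffer from growing without bound.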

Processing Models

Batch Processing
•  Familiar concept of processing data en masse
•  Generally incurs high latency

(Event-) Stream Processing
•  A one-at-a-time processing model
•  A datum is processed as it arrives
•  Sub-second latency
•  Difficult to process state data efficiently

Micro-Batching
•  A special case of batch processing with very small (tiny) batch sizes
•  A nice mix between batching and streaming
•  Comes at the cost of latency
•  Gives stateful computation, making windowing an easy task
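The difference between one-at-a-time processing and micro-batching can be made concrete with a small sketch. It assumes batches are cut by item count; real micro-batching engines typically cut batches by time interval, which is not modelled here. The function name is illustrative.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from a stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With `batch_size=1` this degenerates to one-at-a-time processing. Larger batches let each step work on a whole list, for example to compute a per-batch sum, at the price of waiting until the batch is full.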

Message Delivery Semantics

At most once [0,1]
•  Messages may be lost
•  Messages are never redelivered

At least once [1 .. n]
•  Messages are never lost
•  Messages may be redelivered (might be OK if the consumer can handle it)

Exactly once [1]
•  Messages are never lost
•  Messages are never redelivered
•  Perfect message delivery
•  Incurs higher latency for transactional semantics
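What "the consumer can handle it" may look like under at-least-once delivery is sketched below. The sketch assumes every message carries a unique ID; the class name is hypothetical, and the set of seen IDs is kept in memory, whereas a real system would have to persist it together with the state.

```python
class IdempotentConsumer:
    """Applies each message ID at most once, even if it is redelivered."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def handle(self, message_id, amount):
        if message_id in self.seen_ids:
            return False  # duplicate delivery: ignore it
        self.seen_ids.add(message_id)
        self.total += amount
        return True
```

Combining at-least-once delivery with such a deduplicating consumer gives results that look like exactly-once processing, without requiring transactional delivery.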

Requirements dictate the choice

Latency
•  Is the performance of the streaming application paramount?

Development Cost
•  Is it desired to have similar code bases for batch and stream processing? => lambda architecture

Message Delivery Guarantees
•  Is it highly important to process every single record, or is some nominal amount of data loss acceptable?

Process Fault Tolerance
•  Is high availability of primary concern?



Apache Storm

A platform for doing analysis on streams of data as they come in, so you can react to data as it happens.

•  A highly distributed real-time computation system

•  Provides general primitives to do real-time computation

•  To simplify working with queues & workers

•  scalable and fault-tolerant

•  complementary to Hadoop

•  Written in Clojure, supports Java, Clojure

•  Originated at Backtype, acquired by Twitter in 2011

•  Open Sourced late 2011

•  Part of Apache Incubator since September 2013



Apache Storm – Core concepts

Tuple
•  Core data structure in Storm
•  Immutable set of key/value pairs
•  You can think of Storm tuples as events
•  Values must be serializable

Stream
•  Key abstraction of Storm
•  An unbounded sequence of tuples that can be processed in parallel by Storm
•  Each stream is given an ID, and bolts can produce and consume tuples from these streams on the basis of their ID
•  Each stream also has an associated schema for the tuples that will flow through it

Apache Storm – Core concepts

Topology
•  Wires data and functions via a DAG (directed acyclic graph)
•  Executes on many machines, similar to a MapReduce job in Hadoop

Spout
•  Source of data streams (tuples)
•  Can be run in “reliable” and “unreliable” mode

Bolt
•  Consumes 1+ streams and potentially produces new streams
•  Complex operations often require multiple steps and thus multiple bolts
•  Calculate, filter, aggregate, join, talk to a database

[Diagram: example topology with two spouts and four bolts. Labels: Source of Stream B; Subscribes: A, Emits: C; Subscribes: A, Emits: D; Subscribes: A & B, Emits: -; Subscribes: C & D, Emits: -]

Storm – How does it work?

Word count over a stream of tweets, in three steps:

Input tweets emitted by the TwitterSpout:
•  NFL: Peyton Manning and Denver’s elite offense fall flat in #Superbowl XLVIII ow.ly/tdQZn #seahawks #broncos
•  … #Superbowl

1.  Two Split Sentence bolts split each tweet into words: NFL, Peyton, Manning, Superbowl, ...
2.  Two WordCount bolts increment a counter per word: INCR NFL (NFL = 1), INCR Peyton (Peyton = 1), INCR Manning (Manning = 1), INCR Superbowl (Superbowl = 1), INCR Superbowl (Superbowl = 2)
3.  A Report bolt collects the totals: NFL = 1, Manning = 1, Superbowl = 2, Peyton = 1
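The word count flow can be imitated in a few lines of plain Python. This is a single-process sketch of the logic only, not Storm code: the function names are hypothetical, the tokenizer simply keeps runs of letters, and there is no parallelism or fault tolerance.

```python
import re
from collections import Counter


def split_sentence(tweet):
    """Split Sentence step: emit the words of a tweet (letters only)."""
    return re.findall(r"[A-Za-z]+", tweet)


def word_count(tweets):
    """Run tweets through the split and count steps and return the totals."""
    counts = Counter()
    for tweet in tweets:                    # spout: emits one tweet at a time
        for word in split_sentence(tweet):  # Split Sentence step
            counts[word] += 1               # WordCount step: INCR word
    return counts                           # a report step would read these
```

In a Storm topology each of these steps would run as several parallel tasks, and the counting step would have to receive all occurrences of a given word in the same task.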

Storm - Topology

Each spout or bolt runs as N instances (tasks) in parallel

[Diagram: TwitterSpout => (Shuffle) => Split Sentence => (Fields) => WordCount => (Global) => Report]

•  Shuffle grouping: tuples are distributed randomly across tasks
•  Fields grouping: tuples are grouped by field value, so that equal values always go to the same task
•  All grouping: every tuple is replicated to all tasks
•  Global grouping: all tuples go to one task
•  None grouping: you don’t care how the stream is grouped; currently equivalent to shuffle grouping (Storm may eventually run such bolts in the same thread as the bolt/spout they subscribe to)
•  Direct grouping: the producer (the task that emits) controls which consumer task receives the tuple
•  Local or shuffle grouping: similar to shuffle grouping, but shuffles tuples among bolt tasks running in the same worker process, if any; otherwise falls back to shuffle grouping behavior
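The routing decision behind the main groupings can be sketched as functions that return the target task indices for a tuple. These are illustrative stand-ins, not Storm's implementation: tuples are modelled as dicts, CRC32 is an assumed hash, and picking task 0 for the global grouping is an assumption of this sketch.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Pick one task at random."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, field):
    """Pick the task by hashing a field, so equal values share a task."""
    digest = zlib.crc32(str(tup[field]).encode("utf-8"))
    return [digest % num_tasks]


def all_grouping(tup, num_tasks):
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Send every tuple to a single task (task 0 in this sketch)."""
    return [0]
```

For the word count example, the fields grouping on the word is what guarantees that both occurrences of "Superbowl" are counted by the same WordCount task.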

Storm - Creating Topology



Using a NoSQL database for storing results (keeping state with counter type columns)


[Diagram: Twitter Stream => TwitterSpout => HashtagSplitter (two instances) => HashtagCounter (two instances). The tweets contain #seahawks, #broncos and #Superbowl; the counters issue INCR seahawks, INCR broncos and INCR superbowl (twice) against the database, giving seahawks = 1, broncos = 1, superbowl = 2]

Storm Trident

High-level abstraction on top of Storm

Simplifies building topologies

Core data model is the stream
•  Processed as a series of batches (micro-batches)
•  Stream is partitioned among nodes in the cluster

5 kinds of operations in Trident
•  Operations that apply locally to each partition and cause no network transfer
•  Repartitioning operations that don’t change the contents
•  Aggregation operations that do network transfer
•  Operations on grouped streams
•  Merges and joins

Storm Trident - Creating Topology


[Diagram: Twitter Stream => TwitterSpout => (tweet) => HashtagSplitter => (hashtag) => HashtagNormalizer => groupBy => Persistent Aggregate. Further labels: local, Bolt, Bolt]

Trident Concepts - Function

•  Takes in a set of input fields and emits zero or more tuples as output
•  Fields of the output tuple are appended to the original input tuple in the stream
•  If a function emits no tuples, the original input tuple is filtered out
•  Otherwise the input tuple is duplicated for each output tuple
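The semantics of a Trident function described above can be mimicked in plain Python. This is not the Trident API: tuples are modelled as dicts, and `apply_function` and `split_hashtags` are hypothetical names used only for illustration.

```python
def apply_function(stream, input_field, function, output_field):
    """Apply a function to one field of every tuple in the stream.

    Each value the function emits is appended to a copy of the input
    tuple; if it emits nothing, the input tuple is dropped.
    """
    for tup in stream:
        for value in function(tup[input_field]):
            yield {**tup, output_field: value}


def split_hashtags(text):
    """Emit every hashtag of a text, lower-cased and without the '#'."""
    return [w[1:].lower() for w in text.split() if w.startswith("#") and len(w) > 1]
```

A tweet with two hashtags therefore yields two output tuples that both still carry the original tweet, while a tweet without hashtags disappears from the stream.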

Storm Core vs. Storm Trident


                     Core Storm                             Storm Trident
Community            > 100 contributors                     > 100 contributors
Adoption             ***                                    *
Language Options     Java, Clojure, Scala, Python, Ruby, …  Java, Clojure, Scala
Processing Models    Event-Streaming                        Micro-Batching
Processing DSL       No                                     Yes
Stateful Ops         No                                     Yes
Distributed RPC      Yes                                    Yes
Delivery Guarantees  At most once / At least once           Exactly Once
Latency              sub-second                             seconds
Platform             Storm Cluster, YARN                    Storm Cluster, YARN


Apache Spark

Apache Spark is a fast and general engine for large-scale data processing

•  The hot trend in Big Data!

•  Based on 2007 Microsoft Dryad paper

•  Written in Scala, supports Java, Python, SQL and R

•  Can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk

•  Runs everywhere – runs on Hadoop, Mesos, standalone or in the cloud

•  One of the largest OSS communities in big data with over 200 contributors in 50+ organizations

•  Originally developed in 2009 at UC Berkeley’s AMPLab

•  Open sourced in 2010; a top-level project of the Apache Software Foundation since 2014

Apache Spark

Spark Core
•  General execution engine for the Spark platform
•  In-memory computing capabilities deliver speed
•  General execution model supports a wide variety of use cases
•  DAG-based
•  Ease of development: native APIs in Java, Scala and Python

Spark Streaming
•  Runs a streaming computation as a series of very small, deterministic batch jobs
•  Batch size as low as ½ sec, latency of about 1 sec
•  Exactly-once semantics
•  Potential for combining batch and streaming processing in the same system
•  Started in 2012, first alpha release in 2013

Apache Spark - Generality


Libraries
•  Spark SQL (Batch Processing)
•  BlinkDB (Approximate Querying)
•  Spark Streaming (Real-Time)
•  MLlib, SparkR (Machine Learning)
•  GraphX (Graph Processing)

Core Runtime
•  Spark Core API and Execution Model

Cluster Resource Managers
•  Spark Standalone, Mesos, YARN

Data Stores
•  HDFS, Elasticsearch, Cassandra, S3 / DynamoDB

Adapted from C. Fregly: http://slidesha.re/11PP7FV

Apache Spark – Core concepts

Resilient Distributed Dataset (RDD)
•  Core Spark abstraction
•  Collections of objects (partitions) spread across the cluster
•  Partitions can be stored in-memory or on-disk (local)
•  Enables parallel processing on data sets
•  Built through parallel transformations
•  Immutable, recomputable, fault tolerant
•  Contains transformation history (“lineage”) for the whole data set

Operations
•  Stateless transformations (map, filter, groupBy)
•  Actions (count, collect, save)
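The interplay of lazy transformations, lineage and actions can be shown with a toy class. `MiniRDD` is a hypothetical, single-machine stand-in written for this illustration; it is not Spark's RDD and has no partitions or distribution.

```python
class MiniRDD:
    """Toy stand-in for an RDD: immutable, lazy, remembers its lineage."""

    def __init__(self, source, lineage=()):
        self._source = source   # zero-argument callable producing the data
        self.lineage = lineage  # names of the transformations applied so far

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda: iter(data), ("parallelize",))

    def map(self, f):
        # Transformation: nothing is computed here, only recorded
        return MiniRDD(lambda: (f(x) for x in self._source()),
                       self.lineage + ("map",))

    def filter(self, pred):
        # Transformation: returns a new object, the parent stays unchanged
        return MiniRDD(lambda: (x for x in self._source() if pred(x)),
                       self.lineage + ("filter",))

    def collect(self):
        # Action: walks the whole chain and materializes the result
        return list(self._source())

    def count(self):
        # Action: recomputes the chain from the source each time
        return sum(1 for _ in self._source())
```

Because every object only stores how to derive itself from its parent, a lost result can always be recomputed from the source, which is the idea behind lineage-based fault tolerance.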

RDD Lineage Example


[Diagram: RDD lineage.
HDFS File Input 1 => SparkContext.hadoopFile() => HadoopRDD => filter() => FilteredRDD => map() => MappedRDD
HDFS File Input 2 => SparkContext.hadoopFile() => HadoopRDD => map() => MappedRDD
Both MappedRDDs => join() => ShuffledRDD => SparkContext.saveAsHadoopFile() => HDFS File Output
Transformations are lazy; the action executes the transformations]

Adapted from Chris Fregly: http://slidesha.re/11PP7FV

RDD Execution Example


[Diagram: RDD execution. Several FileRDDs, a MappedRDD and several ShuffledRDDs, each split into partitions (Partition 1, Partition 2, …, Partition 5), are connected by filter(), map(), groupByKey() and join()]

Apache Spark Streaming – Core concepts

Discretized Stream (DStream)
•  Core Spark Streaming abstraction
•  Micro-batches of RDDs
•  Operations similar to RDD

Input DStreams
•  Represent the stream of raw data received from streaming sources
•  Data can be ingested from many sources: Kafka, Kinesis, Flume, Twitter, ZeroMQ, TCP Socket, Akka actors, etc.
•  Custom sources can easily be written for custom data sources

Operations
•  Same as Spark Core
•  Additional stateful transformations (window, reduceByWindow)
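A windowed reduction over micro-batches can be sketched in plain Python. This only illustrates the idea and is not the Spark Streaming API: batches are plain lists, the window is measured in number of batches rather than in time, and it slides by one batch at a time.

```python
from collections import deque
from functools import reduce


def reduce_by_window(batches, window_length, reduce_func):
    """For each new batch, reduce all items of the last window_length batches."""
    window = deque(maxlen=window_length)  # the oldest batch drops out by itself
    for batch in batches:
        window.append(list(batch))
        items = [item for b in window for item in b]
        yield reduce(reduce_func, items) if items else None
```

The window has to remember earlier batches, which is why such transformations are called stateful, in contrast to `map` or `filter`, which look at one batch in isolation.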

Discretized Stream (DStream)


[Diagram: an Input Stream of messages is divided over time (time 1, time 2, time 3, …, time n) into a DStream of RDDs (RDD @time 1 … RDD @time n, each holding message 1 … message n). map() yields a MappedDStream whose RDDs hold f(message 1) … f(message n); saveAsHadoopFiles() writes result 1 … result n for each batch. Side labels: Time Increasing; DStream Transformation Lineage; Actions Trigger Spark Jobs]

Adapted from Chris Fregly: http://slidesha.re/11PP7FV

Storm Core vs. Storm Trident vs. Spark Streaming


                     Core Storm                             Storm Trident          Spark Streaming
Community            > 100 contributors                     > 100 contributors     > 280 contributors
Adoption             ***                                    *                      *
Language Options     Java, Clojure, Scala, Python, Ruby, …  Java, Clojure, Scala   Java, Scala, Python (coming)
Processing Models    Event-Streaming                        Micro-Batching         Micro-Batching, Batch (Spark Core)
Processing DSL       No                                     Yes                    Yes
Stateful Ops         No                                     Yes                    Yes
Distributed RPC      Yes                                    Yes                    No
Delivery Guarantees  At most once / At least once           Exactly Once           Exactly Once
Latency              sub-second                             seconds                seconds
Platform             Storm Cluster, YARN                    Storm Cluster, YARN    YARN, Mesos, Standalone, DataStax EE

Unified Log

That’s what most people think about logs

137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-admin/images/date-button.gif HTTP/1.1" 200 111
137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 200 13593
137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/wp-tinymce.php?c=1&ver=349-20805 HTTP/1.1" 200 101114
137.229.78.245 - - [02/Jul/2012:13:22:28 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30747
137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "POST /wp-admin/post.php HTTP/1.1" 302 -
137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "GET /wp-admin/post.php?post=387&action=edit&message=1 HTTP/1.1" 200 73160
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/css/editor.css?ver=3.4.1 HTTP/1.1" 304 -
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 304 -
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30809

But this is what we mean here by Log

•  a structured log (records are numbered beginning with 0 based on order they are written)

•  aka. commit log or journal


[Diagram: log records numbered 0 to 11; the 1st record is at position 0 and the next record is written after position 11]

Central Unified Log for (real-time) subscription

Take all the organization’s data and put it into a central log for subscription

Properties of the Unified Log:

•  Unified: “Enterprise”, single deployment

•  Append-Only: events are appended, no update in place => immutable

•  Ordered: each event has an offset, which is unique within a shard

•  Fast: should be able to handle thousands of messages / sec

•  Distributed: lives on a cluster of machines


[Diagram: a Collector writes to the log (positions 0 to 11); Consumer System A reads at position 6 (time = 6) and Consumer System B reads at position 10 (time = 10)]
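The properties of the log, append-only and ordered with independent readers, can be sketched as follows. The class names are hypothetical and the log is a single in-memory list, so the "distributed" and "fast" properties are not modelled.

```python
class Log:
    """Append-only, ordered sequence of records addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=None):
        end = None if max_records is None else offset + max_records
        return self._records[offset:end]


class Consumer:
    """Each consumer keeps its own position and reads at its own pace."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_records=None):
        records = self.log.read(self.position, max_records)
        self.position += len(records)
        return records
```

Since reading never modifies the log, a slow consumer and a fast consumer can subscribe to the same data without affecting each other, and a consumer can be rewound simply by resetting its position.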

Apache Kafka - Overview

•  A distributed publish-subscribe messaging system

•  Designed for processing of real time activity stream data (logs, metrics collections, social media streams, …)

•  Initially developed at LinkedIn, now part of Apache

•  Does not follow JMS Standards and does not use JMS API

•  Kafka maintains feeds of messages in topics


[Diagram: Producers write to the Kafka Cluster and Consumers read from it. Anatomy of a topic: Partition 0 (offsets 0 to 12), Partition 1 (offsets 0 to 9), Partition 2 (offsets 0 to 12); writes are appended at the new end of each partition (old => new)]

Apache Kafka - Motivation

LinkedIn’s motivation for Kafka was:
§  “A unified platform for handling all the real-time data feeds a large company might have.”

Must haves
§  High throughput to support high volume event feeds.
§  Support real-time processing of these feeds to create new, derived feeds.
§  Support large data backlogs to handle periodic ingestion from offline systems.
§  Support low-latency delivery to handle more traditional messaging use cases.
§  Guarantee fault-tolerance in the presence of machine failures.

Apache Kafka - Performance

Kafka at LinkedIn
•  10+ billion writes per day
•  172k messages per second (average)
•  55+ billion messages per day to real-time consumers

Up to 2 million writes/sec on 3 cheap machines
§  Using 3 producers on 3 different machines

http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Apache Kafka - Partition offsets

Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset

•  Consumers track their pointers via (offset, partition, topic) tuples


[Diagram label: Consumer group C1]
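How per-partition offsets and consumer positions fit together can be sketched in plain Python. This is a model of the concept, not the Kafka client API: `Topic` and `GroupOffsets` are hypothetical names, partitions are in-memory lists, and CRC32 of the key is an assumed partitioning scheme.

```python
import zlib


class Topic:
    """A topic made of several partitions, each an append-only list."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset  # offsets are sequential per partition


class GroupOffsets:
    """Positions of one consumer group, keyed by (topic, partition)."""

    def __init__(self):
        self.positions = {}

    def fetch(self, topic, partition):
        start = self.positions.get((topic.name, partition), 0)
        records = topic.partitions[partition][start:]
        self.positions[(topic.name, partition)] = start + len(records)
        return records
```

An offset is only meaningful together with its partition and topic, which is why the position is tracked per (topic, partition) pair and not as one number per topic.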

Apache Kafka – two Options for Log Cleanup


Retaining a window of data
•  Ideal for event data
•  Window can be defined in time (days) or space (GBs); defaults to 1 week

Retaining a complete log (log compaction)
•  Ideal for keyed data
•  Keeps a space-efficient complete log of changes
•  Log compaction runs in the background
•  Ensures that at least the last known value for each message key within the log is always retained
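Both cleanup options can be expressed as small functions over a list of records. These sketch the effect only, under the assumption that records are `(key, value)` pairs for compaction and `(timestamp, value)` pairs for the retention window; how and when Kafka actually performs the cleanup is not modelled.

```python
def compact(log):
    """Keep only the last record for each key, preserving log order."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]


def retain_window(log, now, max_age):
    """Keep only records whose timestamp is at most max_age old."""
    return [(ts, value) for ts, value in log if now - ts <= max_age]
```

Compaction keeps the log bounded by the number of distinct keys, so a new consumer can still rebuild the latest state for every key, whereas the retention window simply forgets everything that is too old.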

Data Flow Graphs using Unified Log

Stream processing allows for computing feeds off of other feeds

Derived feeds are no different from the original feeds they are computed from

Single deployment of “Unified Log” but logically different feeds


[Diagram: data flow graph. Processing steps: Meter Readings Collector, Enrich / Transform, Aggregate by Minute, Customer Aggregate by Minute, Persist (twice). Feeds: Raw Meter Readings, Meter with Customer, Meter by Minute, Meter by Customer by Minute]

Architectural Pattern: Standalone Event Stream Processing


[Diagram components: Event Cloud (Internet of Things, Social Media Streams); Enterprise Event Bus (Ingress); Event Processing (ESP / CEP) with State Store / Event Store, Business Rule Management System (Rules) and Result Store; Enterprise Event Bus; Enterprise Service Bus; Analytical Applications; DB]

Architectural Pattern: Event Stream Processing as part of Lambda Architecture

[Diagram components: Event Cloud (Internet of Things); Enterprise Event Bus (Ingress); Event Processing with State Store / Event Store and Result Store; Hadoop Big Data Infrastructure (Map/Reduce, HDFS, Result Store); Enterprise Event Bus; Enterprise Service Bus; DB]

Architectural Pattern: Event Stream Processing as part of Kappa Architecture

[Diagram components: Event Cloud (Internet of Things); Enterprise Event Bus (Ingress); Event Processing with State Store / Event Store and Result Store; Hadoop Big Data Infrastructure (Replay, HDFS); Enterprise Service Bus; DB]

Questions and answers ...


Guido Schmutz Technology Manager

