stream processing frameworks

23
Stream Processing DAVID OSTROVSKY | COUCHBASE

Upload: sirketchup

Post on 22-Jan-2018

977 views

Category:

Software


0 download

TRANSCRIPT

Page 1: Stream Processing Frameworks

Stream ProcessingDAVID OSTROVSKY | COUCHBASE

Page 2: Stream Processing Frameworks

Why Streaming?

Page 3: Stream Processing Frameworks

Streaming Data

Stream Processing

Stream Processing

Engines

Complex Event Processing

Engines

Page 4: Stream Processing Frameworks

Types of Data ProcessingThroughput / sec

Time frame100s

1000s

100000s

daysec min hrms

Real-Time Processing (CEP, ESP)

Interactive Query

DBMS

In-Memory Computing

Batch Processing

(MapReduce)

Page 5: Stream Processing Frameworks

All Apache, all the Time

Page 6: Stream Processing Frameworks

No Love for Microsoft?

Orleans

Page 7: Stream Processing Frameworks

Processing Model

OperatorEvents

OperatorOperator

Operator

OperatorEvents

OperatorOperator

Operator

CollectorBatches

(Time Window)

Continuous Micro-Batching

Page 8: Stream Processing Frameworks

Programming Model

Continuous Micro-Batch Micro-Batch Continuous Continuous*

* Has a batch abstraction on top of streaming

Page 9: Stream Processing Frameworks

API and Expressiveness

public class PrinterBolt extends BaseBasicBolt{

public void execute(Tuple tuple, ...) {System.out.println(tuple);

}}

topology.setBolt("print", new PrinterBolt()).shuffleGrouping("twitter");

val ssc = new StreamingContext(conf, Seconds(1))ssc.socketTextStream("localhost", 9999)

.flatMap(_.split(" "))

.map(word => (word, 1))

.reduceByKey(_ + _)

.print()

Compositional Declarative

Page 10: Stream Processing Frameworks

API and Expressiveness

Compositional Compositional Declarative Compositional Declarative

JVM, Python, Ruby, JS, Perl

JVM JVM, Python JVM JVM, Python*

* Only for the DataSet API (batch)

Page 11: Stream Processing Frameworks

Storm + TridentTopology:

◦ Spouts

◦ Bolts

Stream Groupings:◦ Shuffle

◦ Fields

◦ All

◦ …

Nimbus (Master)◦ Workers

Page 12: Stream Processing Frameworks

Spark Streaming

Resilient Distributed Datasets (RDD)

DStreams – sequences of RDDs

Page 13: Stream Processing Frameworks

SamzaUses Kafka for streaming

◦ Topics (streams)◦ Partitioned across Brokers

◦ Producers

◦ Consumers

Uses YARN for resource management◦ ResourceManager

◦ NodeManager

◦ ApplicationMaster

Page 14: Stream Processing Frameworks

FlinkDataflows

◦ Streams◦ Source(s)

◦ Sink(s)

◦ Transformations (operators)

Page 15: Stream Processing Frameworks

OrleansVirtual Actor System in .NET◦ Grains (operators)

◦ Silos (containers)

◦ Streams

Page 16: Stream Processing Frameworks

Message Delivery Guarantees

At Most Once At Least Once Exactly Once

SourceSockets

Twitter Streaming APIAny non-repeatable

FilesSimple Queues

Any forward-only

Kafka, RabbitMQCollections

Stateful

SinkData Stores

SocketsFiles

HDFS rolling sink

Page 17: Stream Processing Frameworks

Highest Possible Guarantee

At least once Exactly once* Exactly once** At least once Exactly once*

* Doesn’t apply to side-effects** Only at the batch level

Page 18: Stream Processing Frameworks

Reliability and Fault Tolerance

ACK per tuple RDD checkpointsPartition offset

checkpointsBarrier

checkpoints

Page 19: Stream Processing Frameworks

State Management

Manual

Dedicated state providers (memory, external)

RDD with per-key state

Local K/V store + changelog in

Kafka

Stored with snapshots,

configurable backends

Page 20: Stream Processing Frameworks

Performance

Latency Low Medium Medium-High* Low Low**

Throughput Medium Medium High High High

* Depends on batching** For streaming, not micro-batching

Page 21: Stream Processing Frameworks

Extended Ecosystem

SAMOA (ML) Trident-MLSpark SQL,

MLlibGraphX

SAMOA (ML)

CEPGelly*

FlinkML*Table API (SQL)*

* DataSet API (batch)** Currently v0.0.4

Page 22: Stream Processing Frameworks

Production and Maturity

Mature, many users,

224 contributors

Relatively mature,many users

957 contributors*

Newer, built on mature

components,fewer users,

57 contributors

New,high momentum,

few users,219 contributors

* Spark, not just spark streaming** Contributor numbers as of 5/9/2016

Page 23: Stream Processing Frameworks