discretized stream - fault-tolerant streaming computation at scale - sosp

36
Discretized Streams Fault-Tolerant Streaming Computation at Scale Matei Zaharia, Tathagata Das (TD), Haoyuan (HY) Li, Timothy Hunter, Scott Shenker, Ion Stoica

Upload: tathagata-das

Post on 12-Aug-2015

41 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Discretized StreamsFault-Tolerant Streaming Computation at Scale

Matei Zaharia, Tathagata Das (TD), Haoyuan (HY) Li, Timothy Hunter, Scott Shenker, Ion Stoica

Page 2: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Motivation

Many big-data applications need to process large data streams in near-real timeWebsite

monitoring Fraud detection Ad

monetizationRequire tens to hundreds of nodes

Require second-scale latencies

Page 3: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Page 4: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Challenge

• Stream processing systems must recover from failures and stragglers quickly and efficiently– More important for streaming systems than batch

systems

• Traditional streaming systems don’t achieve these properties simultaneously

Page 5: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Outline

• Limitations of Traditional Streaming

Systems

• Discretized Stream Processing

• Unification with Batch and Interactive

Processing

Page 6: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Traditional Streaming Systems

• Continuous operator modelmutable state

node 1

node 3

input records

node 2

input records

– Each node runs an operator with in-memory mutable state

– For each input record, state is updated and new records are sent out

• Mutable state is lost if node fails

• Various techniques exist to make state fault-tolerant

Page 7: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Fault-tolerance in Traditional Systems

• Separate set of “hot failover” nodes process the same data streams

• Synchronization protocols ensures exact ordering of records in both sets

• On failure, the system switches over to the failover nodes

sync protocol

input

input

hot failover nodes

Fast recovery, but 2x hardware cost

Node Replication [e.g. Borealis, Flux ]

Page 8: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

input

input

Fault-tolerance in Traditional Systems

Upstream Backup [e.g. TimeStream, Storm ]

cold failover

node

• Each node maintains backup of the forwarded records since last checkpoint

• A “cold failover” node is maintained

• On failure, upstream nodes replay the backup records serially to the failover node to recreate the state

Only need 1 standby, but slow recovery

backup

replay

Page 9: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Slow Nodes in Traditional Systems

Node Replication

Upstream Backup

input

input

input

input

Neither approach handles stragglers

Page 10: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Our Goal

• Scales to hundreds of nodes

• Achieves second-scale latency

• Tolerate node failures and stragglers

• Sub-second fault and straggler recovery

• Minimal overhead beyond base

processing

Page 11: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Our Goal

• Scales to hundreds of nodes

• Achieves second-scale latency

• Tolerate node failures and stragglers

• Sub-second fault and straggler recovery

• Minimal overhead beyond base

processing

Page 12: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Why is it hard?

Stateful continuous operators tightly integrate “computation” with “mutable

state”

Makes it harder to define clear boundaries when computation and state can be moved around

statefulcontinuous operator

mutable state

inputrecords

outputrecords

Page 13: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Dissociate computation from state

Make state immutable and break computation into small, deterministic,

stateless tasks

Defines clear boundaries where state and computation

can be moved around independently

stateless task

state 1

input 1

state 2

stateless task

state 2

input 2

stateless task

input 3

Page 14: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Batch Processing Systems!

Page 15: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Batch Processing Systems

Batch processing systems like MapReduce divide

– Data into small partitions– Jobs into small, deterministic, stateless map /

reduce tasks

M

M

M

M

M

M

M

M

M

M

M

M

stateless map tasks

stateless reduce tasks

R

R

R

R

R

R

immutable input

dataset

immutable output dataset

immutablemap outputs

Page 16: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Parallel Recovery

Failed tasks are re-executed on the other nodes in parallel

M

M

M

R

R

M

M

M

M

M

M

stateless map tasks

stateless reduce tasks

RR

R

R

MMM

M

M

MR

R

immutable input

dataset

immutable output dataset

Page 17: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Discretized Stream Processing

Page 18: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Discretized Stream Processing

Run a streaming computation as a series of small, deterministic

batch jobs

Store intermediate state data in cluster memory

Try to make batch sizes as small as possible to get second-scale latencies

Page 19: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Discretized Stream Processing

time = 0 - 1: batch operations

Input: replicated dataset stored in

memory

Output or State: non-replicated

datasetstored in memory

input stream

output / state stream

… …

input

time = 1 - 2:input

Page 20: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Example: Counting page views

Discretized Stream (DStream) is a sequence of immutable, partitioned datasets– Can be created from live data streams or by

applying bulk, parallel transformations on other DStreams

views = readStream("http:...", "1 sec")

ones = views.map(ev => (ev.url, 1))

counts = ones.runningReduce((x,y) => x+y)

creating a DStream

transformation

views ones counts

t: 0 - 1

t: 1 - 2

map reduce

Page 21: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Fine-grained Lineage

• Datasets track fine-grained operation lineage

• Datasets are periodically checkpointed asynchronously to prevent long lineages

views ones counts

t: 0 - 1

t: 1 - 2

map reduce

t: 2 - 3

Page 22: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

• Lineage is used to recompute partitions lost due to failures

• Datasets on different time steps recomputed in parallel

• Partitions within a dataset also recomputed in parallel

views ones counts

t: 0 - 1

t: 1 - 2

map reduce

t: 2 - 3

Parallel Fault Recovery

Page 23: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Upstream Backup

stream replayed serially

state

views ones counts

t: 0 - 1

t: 1 - 2

t: 2 - 3

Discretized Stream

Processing

parallelism across time intervals

parallelism within a batch

Faster recovery than upstream backup,

without the 2x cost of node replication

Comparison to Upstream Backup

Page 24: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

How much faster than Upstream Backup?

Recover time = time taken to recompute and catch up

– Depends on available resources in the cluster– Lower system load before failure allows faster

recovery

Parallel recovery

with 5 nodes

faster than upstream backup

Parallel recovery with 10 nodes

faster than 5 nodes

Page 25: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Parallel Straggler Recovery

• Straggler mitigation techniques– Detect slow tasks (e.g. 2X slower than other

tasks)– Speculatively launch more copies of the tasks

in parallel on other machines

• Masks the impact of slow nodes on the progress of the system

Page 26: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Evaluation

Page 27: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Spark Streaming

• Implemented using Spark processing engine*– Spark allows datasets to be stored in memory,

and automatically recovers them using lineage

• Modifications required to reduce jobs launching overheads from seconds to milliseconds

[ *Resilient Distributed Datasets - NSDI, 2012 ]

Page 28: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

How fast is Spark Streaming?

Can process 60M records/second on 100 nodes at 1 second latency

Tested with 100 4-core EC2 instances and 100 streams of text

28

0 50 1000

1

2

3

4WordCount

1 sec2 sec

# Nodes in Cluster

Clus

ter T

hrou

ghpu

t (G

B/s)

0 20 40 60 80100

0

2

4

6

8Grep

1 sec2 sec

# Nodes in Cluster

Clus

ter T

hhro

ughp

ut (G

B/s)

Count the sentences having

a keyword

WordCount over 30 sec sliding

window

Page 29: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

How does it compare to others?

Throughput comparable to other commercial stream processing systems

29

SystemThroughput per core [ records / sec ]

Spark Streaming 160k

Oracle CEP 125k

Esper 100k

StreamBase 30k

Storm 30k

[ Refer to the paper for citations ]

Page 30: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Before

fai...

On fa

ilure

Next 3

s

Second 3

s

Third 3

s

Fourth 3

s

Fifth 3

s

Sixth 3

s0.0

1.0

2.0

3.0

4.0

5.0

30s ckpts, 20 nodes

30s ckpts, 40 nodes

10s ckpts, 20 nodes

10s ckpts, 40 nodes

Ba

tch

Pro

ce

ss

ing

Tim

e (

s)

How fast can it recover from faults?

Recovery time improves with more frequent checkpointing and more

nodesFailure

Word Count over 30 sec

window

Page 31: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

How fast can it recover from stragglers?

WordCount Grep0.0

1.0

2.0

3.0

4.0

0.55 0.54

3.02 2.40

1.000.64

No straggler Straggler, no speculation

Straggler, with speculation

Ba

tch

Pro

ce

ss

ing

Tim

e (

s)

Speculative execution of slow tasks mask the effect of stragglers

Page 32: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Unification with Batch and Interactive Processing

Page 33: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Unification with Batch and Interactive Processing

• Discretized Streams creates a single programming and execution model for running streaming, batch and interactive jobs

• Combine live data streams with historic dataliveCounts.join(historicCounts).map(...)

• Interactively query live streamsliveCounts.slice(“21:00”, “21:05”).count()

Page 34: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

App combining live + historic data

Mobile Millennium Project: Real-time estimation of traffic transit times using live and past GPS observations

34

0 20 40 60 800

400

800

1200

1600

2000

# Nodes in Cluster

GP

S o

bs

erv

ati

on

s p

er

se

co

nd

• Markov chain Monte Carlo simulations on GPS observations

• Very CPU intensive

• Scales linearly with cluster size

Page 35: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Recent Related Work

• Naiad – Full cluster rollback on recovery

• SEEP – Extends continuous operators to enable parallel recovery, but does not handle stragglers

• TimeStream – Recovery similar to upstream backup

• MillWheel – State stored in BigTable, transactions per state update can be expensive

Page 36: Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP

Takeaways

• Discretized Streams model streaming computation as series of batch jobs– Uses simple techniques to exploit parallelism in

streams

– Scales to 100 nodes with 1 second latency

– Recovers from failures and stragglers very fast

• Spark Streaming is open source - spark-project.org– Used in production by ~ 10 organizations!

• Large scale streaming systems must handle faults and stragglers