Stream Processing under Latency Constraints
Björn Lohrmann, Daniel Warneke, Odej Kao
Technische Universität Berlin


DESCRIPTION

Presented on Nov 22, 2012 at the Stratosphere Onsite Meeting 2012. The presentation explains extensions to Stratosphere's execution engine Nephele for latency-constrained stream processing. It also sheds light on future work on programming models for scalable, real-time stream processing. The Stratosphere Streaming Distribution, which implements the presented techniques, is available as open source on github.com: https://github.com/bjoernlohrmann/stratosphere More about me: http://www.cit.tu-berlin.de/menue/personen/lohrmann_bjoern/parameter/en/

TRANSCRIPT

Page 1: Stream Processing Under Latency Constraints

Stream Processing under Latency Constraints

Björn Lohrmann

Daniel Warneke

Odej Kao

Technische Universität Berlin

Page 2: Stream Processing Under Latency Constraints

Background

Nephele and PACTs currently focus on batch-job workloads

What about streaming workloads?

Generally possible with Nephele

PACT support is WIP

Streaming may have different goals:

Meet pipeline latency and throughput requirements

Maximize/minimize other custom metrics

Page 3: Stream Processing Under Latency Constraints

Motivation

Live processing of streamed data is also worth looking at. Some examples:

Incremental search index updates (Google Percolator replaced MapReduce!)

Social media streams (see Twitter Storm)

Sensor networks in science and industry

Multimedia streams

User-generated content from mobile phones

CCTV cameras

Page 4: Stream Processing Under Latency Constraints

Agenda

1. Latency-constrained stream processing with Nephele
   1. Internal framework design
   2. Meeting latency requirements
      1. Current Nephele design implications
      2. Latency constraints and measurement
      3. Strategy 1: Adaptive Output Buffer Sizing
      4. Strategy 2: Task Chaining
      5. Experimental results
2. Streaming on the PACTs layer (WIP)
   1. Sliding window semantics

Page 5: Stream Processing Under Latency Constraints

Nephele I/O Layer Design

[Diagram: Nephele I/O layer. Tasks run as one thread/process each on compute nodes X, Y, and Z. Each channel between a Task n and a Task n+1 passes data items through an output buffer on the sender side into an input buffer queue on the receiver side.]

Page 6: Stream Processing Under Latency Constraints

Sample Application: Video Livestreaming

[Diagram: video livestreaming job spanning compute nodes 1 to n, with Partitioner, Decoder, Merger, Overlay, Encoder, and RTP Server tasks.]

Page 7: Stream Processing Under Latency Constraints

Latency w/o Optimizations

Setup:

200 nodes, 800 cores

32 KB output buffer size

6400 video streams

Results:

Latency oscillates around 4 s

Large buffers cause bursts

Page 8: Stream Processing Under Latency Constraints

Implications for Streaming Applications

Effects of output buffers:

Large buffer = high throughput, high latency

Small buffer = low throughput, low latency

A trade-off needs to be found to meet latency goals (rough numbers below)
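A rough back-of-the-envelope sketch of the trade-off (the per-channel data rate is an assumed figure, not from the slides): a buffer is only shipped once it is full, so the first record written into an empty buffer waits roughly bufferSize / dataRate before it is sent.

```java
// Rough model of output buffer latency (hypothetical data rates).
public class BufferTradeOff {

    // First record in an empty buffer waits ~bufferSize / rate until shipping.
    static double bufferLatencyMillis(int bufferSizeBytes, double bytesPerSecond) {
        return 1000.0 * bufferSizeBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        double perChannelRate = 10 * 1024; // assumption: ~10 KB/s per video stream
        System.out.printf("32 KB buffer: %.0f ms%n",
                bufferLatencyMillis(32 * 1024, perChannelRate)); // ~3200 ms
        System.out.printf(" 1 KB buffer: %.0f ms%n",
                bufferLatencyMillis(1024, perChannelRate));      // ~100 ms
    }
}
```

At such an (assumed) rate, a single 32 KB buffer already accounts for seconds of delay, consistent with the ~4 s latency observed in the unoptimized experiment.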

Thread/process model:

1 task = 1 thread model is flexible, but has overhead

Thread scheduling, synchronization, communication

Serialization may be necessary (bad for throughput and latency)

N tasks = 1 thread model can sometimes provide better throughput and latency

Page 9: Stream Processing Under Latency Constraints

Latency Constraints

QoS goal: meet latency constraint X, keep throughput as high as possible

We designed two strategies:

1. Adaptive Output Buffer Sizing
2. Dynamic Task Chaining

Both strategies

work autonomously (only the latency constraint is required)

are applied on demand at runtime

[Diagram: example job graph annotated with a 300 ms latency constraint.]

Page 10: Stream Processing Under Latency Constraints

Measuring Latency

To meet latency constraints, we need to measure first!

General approach:

Determine which tasks & channels need measuring

Add & evaluate periodic timestamps on data items (a sketch follows below)

Determine wait time caused by buffers

Determining task latency is a little trickier w/o knowing task semantics (e.g. reduce)

Ship measurement data to a collector node
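A minimal sketch of the timestamp mechanism (class and method names are illustrative, not Nephele's actual interfaces): the sender attaches a wall-clock timestamp to at most one record per interval, and the receiver turns each tag into a latency sample for the collector.

```java
import java.io.Serializable;

// Illustrative latency tagging; not Nephele's real measurement API.
class TimestampTag implements Serializable {
    final long emitTimeMillis;
    TimestampTag(long emitTimeMillis) { this.emitTimeMillis = emitTimeMillis; }
}

class ChannelLatencyProbe {
    private static final long TAG_INTERVAL_MILLIS = 1000; // one sample per second
    private long lastTagTime = 0;

    // Sender side: tag at most one record per interval to keep overhead low.
    TimestampTag maybeCreateTag() {
        long now = System.currentTimeMillis();
        if (now - lastTagTime >= TAG_INTERVAL_MILLIS) {
            lastTagTime = now;
            return new TimestampTag(now);
        }
        return null; // most records travel untagged
    }

    // Receiver side: convert a received tag into a latency sample.
    long latencySampleMillis(TimestampTag tag) {
        return System.currentTimeMillis() - tag.emitTimeMillis;
    }
}
```

Across machines this assumes reasonably synchronized clocks.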


Page 11: Stream Processing Under Latency Constraints

Measuring Latency

Problem:

Combinatorial explosion of paths through the execution graph for which constraints must be evaluated; e.g., five fully connected groups of 100 parallel tasks already contain 100^5 = 10^10 distinct source-to-sink paths

Infeasible to do on a central node for large-scale workflows

Current approach:

Split the execution graph into subgraphs (heuristic)

Assign each subgraph to a worker node responsible for collecting measurements & applying runtime optimizations when a constraint is violated

Successfully scaled to 200 nodes in experiments


Page 12: Stream Processing Under Latency Constraints

Adaptive Output Buffer Sizing

Only applied when the latency constraint is violated

For each channel r:

Determine its output buffer latency obl_r

If obl_r > threshold, decrease the buffer size:

size_r := max(0.98 * size_r, 200)

If obl_r < threshold, increase the buffer size again:

size_r := min(1.1 * size_r, 500 * 10^3)

(buffer sizes in bytes)
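A minimal sketch of this adaptation rule, using the constants from the reconstructed formulas above (the class is illustrative; the real strategy runs per channel inside Nephele):

```java
// Sketch of adaptive output buffer sizing; constants as reconstructed above.
class AdaptiveBufferSizer {
    static final int MIN_BUFFER_BYTES = 200;
    static final int MAX_BUFFER_BYTES = 500_000;

    static int adapt(int currentSizeBytes, double oblMillis, double thresholdMillis) {
        if (oblMillis > thresholdMillis) {
            // Buffer fills too slowly: shrink it so it flushes sooner.
            return Math.max((int) (0.98 * currentSizeBytes), MIN_BUFFER_BYTES);
        }
        // Constraint currently met: cautiously grow again to regain throughput.
        return Math.min((int) (1.1 * currentSizeBytes), MAX_BUFFER_BYTES);
    }
}
```

Shrinking while the constraint is violated and growing again once it is met lets each channel settle near the largest buffer size that still satisfies its latency share.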

Page 13: Stream Processing Under Latency Constraints

Task Chaining

Conditions:

Pipeline of unchained tasks

Sum of CPU utilizations is < 90% of the capacity of one core (reuses available Nephele profiling data)

Apply to the longest chainable pipeline of tasks (see the sketch below)

Control-flow manipulation requires map-like tasks

[Diagram: Task n and Task n+1 on one compute node, first running as two separate threads, then chained into a single thread.]

Again, this is only applied when the overall latency constraint is violated
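As an illustration of the chaining decision (a simplified sketch, not Nephele's actual planner; it assumes per-task CPU utilizations as fractions of one core, taken from profiling data):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: find the longest contiguous run of pipeline tasks whose summed
// CPU utilization stays below 90% of one core, so they can share a thread.
class ChainPlanner {
    static List<Integer> longestChainableRun(double[] cpuUtilization) {
        int bestStart = 0, bestLen = 0;
        for (int start = 0; start < cpuUtilization.length; start++) {
            double sum = 0;
            for (int end = start; end < cpuUtilization.length; end++) {
                sum += cpuUtilization[end];
                if (sum >= 0.9) break; // adding this task would exceed the budget
                if (end - start + 1 > bestLen) {
                    bestStart = start;
                    bestLen = end - start + 1;
                }
            }
        }
        List<Integer> run = new ArrayList<>(); // empty if no run fits
        for (int i = bestStart; i < bestStart + bestLen; i++) run.add(i);
        return run;
    }
}
```

For example, utilizations {0.5, 0.2, 0.15, 0.6} yield the first three tasks (summed load 0.85) as the longest run that still fits on one core.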

Page 14: Stream Processing Under Latency Constraints

Schematic Overview

[Diagram: schematic overview of a running job with a 300 ms latency constraint. The Job Manager (JM) ships the scheduled tasks together with the distributed measurement setup to the Task Managers (TMs). The TMs exchange payload data along with latency and throughput measurements, and buffer size updates and chain commands are issued at runtime to enforce the constraint.]

Page 15: Stream Processing Under Latency Constraints

Latency w/ Adaptive Buffer Sizing

[Plot: job latency over time with adaptive output buffer sizing enabled; the final latency is a large improvement over the unoptimized ~4 s baseline.]

Page 16: Stream Processing Under Latency Constraints

Latency w/ ABS+TC

[Plot: job latency over time with adaptive buffer sizing plus task chaining; the final latency shows a further improvement.]

Page 17: Stream Processing Under Latency Constraints

Moving further up the Stack

Key steps to push constraints up to PACTs:

1. Find a model to express stream semantics for the blocking PACTs (reduce, match, cogroup, cross) and implement it in the PACT runtime

2. Define latency constraint annotations for PACT jobs

3. Adapt the PACT compiler to produce streamable plans and push constraints down to the Nephele layer

Page 18: Stream Processing Under Latency Constraints

Stream Semantics

Literature shows many different ways of defining stream semantics for blocking relational operators

Key aspect: the sliding data window

Degrees of freedom (a tuple-based sketch follows below):

Tuple-based: take the N most recent tuples

Time-based: take all tuples whose timestamp is at most T time units old

Partition-based: partition the stream and take the union of the N most recent tuples from each partition

Slide length (tuples or time units)

Timestamp sources
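To make the tuple-based window concrete, here is a minimal container sketch (plain generic code, not the PACT runtime), with window size N and a slide length of one tuple:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal tuple-based sliding window: keep the N most recent tuples,
// sliding by one tuple on every arrival.
class TupleWindow<T> {
    private final int size;
    private final Deque<T> window = new ArrayDeque<>();

    TupleWindow(int size) { this.size = size; }

    // Insert a tuple and return the current window contents, which would
    // be handed to the blocking operator (e.g. a reduce) on each slide.
    List<T> addAndGet(T tuple) {
        if (window.size() == size) {
            window.removeFirst(); // evict the oldest tuple
        }
        window.addLast(tuple);
        return new ArrayList<>(window);
    }
}
```

A time-based variant would evict by comparing tuple timestamps instead of counting tuples, and a partition-based variant would keep one such window per partition key.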


Page 19: Stream Processing Under Latency Constraints

Timestamp Sources

Obviously, all window types require timestamped tuples

This affects determinism: depending on where timestamps are added, replaying the same data will or will not yield the same results

Degrees of freedom:

Timestamping at the data source, outside Stratosphere

Yields identical results upon replay

Timestamping within Stratosphere

Yields different results upon replay (caused by scheduling, network, etc.)

Page 20: Stream Processing Under Latency Constraints

Video Workflow Translated to PACTs

[Diagram: the video pipeline expressed as PACTs: data source → map (Decoder) → reduce (Merger) → map (Overlay) → reduce (Encoder) → data sink (RTP-Server). Record types along the edges: (Stream-ID, Packet) out of the source, (Group-ID, Frame) between the inner stages, and (Group-ID, Packet) into the sink. The timestamp source is placed at the data source.]

Merger reduce annotation:
Reduce-Key: Group-ID
Window-Type: Partition-Based
Window-Size: 1 tuple (per partition)
Partitioning-By: Stream-ID
Slide-Length: 1

Encoder reduce annotation:
Reduce-Key: Group-ID
Window-Type: Tuple-Based
Window-Size: 1 tuple
Slide-Length: 1

Page 21: Stream Processing Under Latency Constraints

Proposed Model

Which semantics are needed is largely application-dependent

Therefore, provide PACTs with common, user-configurable window semantics (a possible shape is sketched below)

Key configuration parameters:

Window type and slide length

Force the user to define timestamp source locations
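A sketch of what such user-configurable semantics could look like as a configuration object (the API is hypothetical, not an existing Stratosphere interface; the parameter names mirror the annotations on the PACT translation slide):

```java
// Hypothetical window configuration for a blocking PACT such as reduce.
enum WindowType { TUPLE_BASED, TIME_BASED, PARTITION_BASED }

class WindowSpec {
    WindowType type;
    int windowSize;       // in tuples, or time units for TIME_BASED windows
    int slideLength;      // in tuples or time units
    String partitionKey;  // only used for PARTITION_BASED windows

    // Example: the Merger reduce from the video workflow, which takes
    // the most recent tuple of every stream within a group.
    static WindowSpec mergerWindow() {
        WindowSpec spec = new WindowSpec();
        spec.type = WindowType.PARTITION_BASED;
        spec.windowSize = 1;            // 1 tuple per partition
        spec.slideLength = 1;
        spec.partitionKey = "Stream-ID";
        return spec;
    }
}
```

The timestamp source location would be declared separately (at the data source or at a task inside the job), since, as noted above, it determines whether replays are deterministic.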
