the benefits of fine-grained synchronization in deterministic and efficient stream processing

61
The benefits of fine-grained synchronization in deterministic and efficient stream processing Vincenzo Gulisano, Yiannis Nikolakopoulos 2022-07-05 Vincenzo Gulisano - Yiannis Nikolakopoulos 1 Chalmers University of technology

Upload: vincenzo-gulisano

Post on 12-Apr-2017

327 views

Category:

Science


1 download

TRANSCRIPT

Page 1: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 1

The benefits of fine-grained synchronization in deterministic and efficient stream processing

Vincenzo Gulisano, Yiannis Nikolakopoulos

Vincenzo Gulisano - Yiannis Nikolakopoulos

Chalmers University of technology

Page 2: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 2

Agenda

• Who we are• Motivation and System Model• ScaleJoin: a Deterministic, Disjoint-Parallel and

Skew-Resilient Stream Join• The ScaleGate data structure• Fine-grained synchronization and other

use-cases• Research in progress• Conclusions

Page 3: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 3

Agenda

• Who we are• Motivation and System Model• ScaleJoin: a Deterministic, Disjoint-Parallel and

Skew-Resilient Stream Join• The ScaleGate data structure• Fine-grained synchronization and other

use-cases• Research in progress• Conclusions

Page 4: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 4

Who we are

Chalmers university

Computer Science and Engineering department

Distributed Computing and Systems research group

Page 5: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 5

Who we are

• 12 PhD degrees awarded• ~20 researchers, 3 Postdocs and 10 PhD

students. • acknowledged internationally as leading

group in practical multicore synchronization algorithms and programming (results adopted by Java JDK, C++, Intel, NVIDIA)

• extensive network of academic and industrial collaborations and has been continuously supported by national and international projects

Distributed Computing and Systems research group

Page 6: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 6

Who we areVincenzo Gulisano (PhD)Assistant Professor at Chalmers University of Technology

Research focus: • scalable big data analysis in cyber-physical systems

(Smart Grids, Advanced Metering Infrastructures and Vehicular Networks).

• streaming operators’ parallelization, load balancing, provisioning, decommissioning and fault tolerance protocols.

• enhancement of parallel stream operators by means of streaming- and energy-aware concurrent data structures that boost analysis performance (both in throughput and latency).

• applied use of the streaming paradigm in data validation, intrusion detection systems, privacy-preserving online analysis, distributed embedded networks and cloud infrastructures.

Page 7: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 7

Who we areYiannis NikolakopoulosPhD Student Research interests:

• Parallel and distributed computing• Shared memory concurrent data structures• Multicore/Manycore systems • Consistency problems• Synchronization• Data Streaming:

operator parallelism & scalability

Page 8: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 8

Agenda

• Who we are• Motivation and System Model• ScaleJoin: a Deterministic, Disjoint-Parallel and

Skew-Resilient Stream Join• The ScaleGate data structure• Fine-grained synchronization and other

use-cases• Research in progress• Conclusions

Page 9: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 9

MotivationApplications in sensor networks, cyber-physical systems:• large and fluctuating volumes of data generated

continuouslydemand for:• Continuous processing of data streams• In a real-time fashion

Store-then-process is not feasible

Main Memory1 Data

Query Processing

3 Query results

2 Query

Main Memory

Query Processing

Disk Main Memory

Query Processing

ContinuousQuery Data Query

results

Page 10: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos

System Model• Data Stream: unbounded sequence of tuples

– Example: Call Description Record (CDR)

time

Field Field

Caller text

Callee text

Time (secs) int

Price (€) double

A B 8:00 3 C D 8:20 7 A E 8:35 6

10

Page 11: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos

System Model

• Operators:

OP Stateless1 input tuple1 output tuple

OP Stateful1+ input tuple(s)1 output tuple

11

Page 12: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos

System Model

• Continuous Query: graph operators/streams

Convert€ $

Only> 10$

Count callsmade by eachCaller number

Map

Filter Agg

12

Count the number of calls whose price is more than 10 dollars made by each caller

Page 13: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos

System Model

• Infinite sequence of tuples / bounded memory windows

• Example: 1 hour windows

time[8:00,9:00)

[8:20,9:20)

[8:40,9:40)

13

Page 14: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos

System Model

• Infinite sequence of tuples / bounded memory windows

• Example: count tuples - 1 hour windows

time[8:00,9:00)

8:05 8:15 8:22 8:45 9:05

Output: 4

14

[8:20,9:20)

Page 15: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 15

The evolution of SPEs

CentralizedSPEs

DistributedSPEs

Parallel-DistributedSPEs

ElasticSPEs

Inter-operator parallelism … …

Intra-operator parallelism

… …

+/- +/-

Scale up / down

Page 16: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 16

Agenda

• Who we are• Motivation and System Model• ScaleJoin: a Deterministic, Disjoint-Parallel and

Skew-Resilient Stream Join• The ScaleGate data structure• Fine-grained synchronization and other

use-cases• Research in progress• Conclusions

Page 17: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 17

What is a stream join?

t1

t2

t3

t4

t1

t2

t3

t4

R S

Sliding window Window

size WSWSWR

Predicate P

Page 18: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 18

Why parallel stream joins?• WS = 600 seconds

• R receives 500 tuples/second• S receives 500 tuples/second

• WR will contain 300,000 tuples

• WS will contain 300,000 tuples

• Each new tuple from R gets compared with all the tuples in WS

• Each new tuple from S gets compared with all the tuples in WR

… 300,000,000 comparisons/second!

t1

t2

t3

t4

t1

t2

t3

t4

R S

WSWR

Page 19: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 19

Which are the challenges of a parallel stream join?

Scalability

High throughput

Low latency

Disjoint parallelism

Skew resilience

Determinism

Page 20: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 20

The 3-step procedure (sequential stream join)

For each incoming tuple t:1. compare t with all tuples in opposite window given predicate P2. add t to its window3. remove stale tuples from t’s window

Add tuples to S

Add tuples to R

Prod R

Prod S

Consume resultsConsPU

We assume each producer delivers tuples

in timestamp order

Page 21: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 21

The 3-step procedure, is it enough?

Scalability

High throughput

Low latency

Disjoint parallelism

Skew resilience

Determinism

t1

t2

t1

t2

R S

WSWR

t3

t1

t2

t1

t2

R S

WSWR

t4

t3

Page 22: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 22

Enforcing determinism in sequential stream joins

• Next tuple to process = earliest(tS,tR)

• The earliest(tS,tR) tuple is referred to as the next ready tuple

• Process ready tuples in timestamp order Determinism

PU

tS tR

Page 23: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 23

Deterministic 3-step procedure

Pick the next ready tuple t:1. compare t with all tuples in opposite window given predicate P2. add t to its window3. remove stale tuples from t’s window

Add tuples to S

Add tuples to R

Prod R

Prod S

Consume resultsConsPU

Page 24: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 24

Shared-nothing parallel stream join(state-of-the-art)

Prod R

Prod S

PU1

PU2

PUN

ConsAdd tuple to PUi S

Add tuple to PUi R

Consume results

Pick the next ready tuple t:1. compare t with all tuples in opposite window given P2. add t to its window3. remove stale tuples from t’s window

Chose a PU

Chose a PU

Take the next ready output tuple

Scalability

High throughput

Low latency

Disjoint parallelism

Skew resilience

Determinism

Merge

Page 25: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 25

Shared-nothing parallel stream join(state-of-the-art)

Prod R

Prod S

PU1

PU2

PUN…

enqueue()dequeue()

ConsMerge

Page 26: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 26

From coarse-grained to fine-grained synchronization

Prod R

Prod S

PU1

PU2

PUN

… Cons

Page 27: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 27

ScaleGate

addTuple(tuple,sourceID)allows a tuple from sourceID to be merged by ScaleGate in the resulting timestamp-sorted stream of ready tuples.

getNextReadyTuple(readerID)provides to readerID the next earliest ready tuple that has not been yet consumed by the former.

https://github.com/dcs-chalmers/ScaleGate_Java

Page 28: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 28

ScaleJoin

Prod R

Prod S

PU1

PU2

PUN

… ConsAdd tuple SGin

Add tuple SGin

Get next ready output tuplefrom SGout

Get next ready input tuple from SGin

1. compare t with all tuples in opposite window given P2. add t to its window in a round-robin fashion3. remove stale tuples from t’s window

SGin SGout

Steps for PU

Page 29: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 29

t1

t2

R S

WR

t3

t4

R S

t4

t1WR

R S

t4

t2

WR

R S

t4

WR

t3

Sequential stream join:

ScaleJoin with 3 PUs:

ScaleJoin (example)

Page 30: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 30

ScaleJoin

Prod R

Prod S

PU1

PU2

PUN

… ConsAdd tuple SGin

Add tuple SGin

Get next ready output tuplefrom SGout

SGin SGout

Scalability

High throughput

Low latency

Disjoint parallelism

Skew resilience

Determinism

Prod S

Prod S

Prod R Get next ready input tuple from SGin

1. compare t with all tuples in opposite window given P2. add t to its window in a round robin fashion3. remove stale tuples from t’s window

Steps for PUi

Page 31: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 32

ScaleJoin Scalability – comparisons/second

Number of PUs

4 billion comparison / second!!!

Page 32: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 35

Agenda

• Who we are• Motivation and System Model• ScaleJoin: a Deterministic, Disjoint-Parallel and

Skew-Resilient Stream Join• The ScaleGate data structure• Fine-grained synchronization and other

use-cases• Research in progress• Conclusions

Page 33: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 36

addTuple(tuple,sourceID)allows a tuple from sourceID to be merged by ScaleGate in the resulting timestamp-sorted stream of ready tuples.

getNextReadyTuple(readerID)provides to readerID the next earliest ready tuple that has not been yet consumed by the former.

Page 34: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

ScaleGate: Main Functionality

• addTuple(tuple,sourceID)– Insert tuple from sourceID

effectively in ts order

<6,…>

<1,…>

<3,…>Insert

concurrently

Page 35: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

ScaleGate: Main Functionality

• getNextReadyTuple(readerID)– readerID gets the next ready tuple t in ts order

i.e. no tuple with ts<t.ts will arrive afterwards– return null if no ready tuple

<1,…>

Get ready tuples

<6,…>

<3,…>

<7,…>

<9,…>

<8,…>

Page 36: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

Implementation?

• getNextReadyTuple(readerID)– we get tuples in ts order,

so maybe an extractMin()? (priority queue)– still no guarantee that t is ready

• addTuple(tuple, sourceID)– sorted insertion maybe could help

<6,…> <1,…><3,…>

Is there a concurrent data structure with cheap extractMin() + insert()

Binary Search Trees,Heaps:

Expensive rebalancing and heap property

Page 37: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

ScaleGate Anatomy (1)

• Inspired from lock-free skip lists– randomized height of nodes– expected cost for search/insertion O(logN)

• Search by traversing from higher to lower levels BigData’15 [GNPT]

Page 38: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

ScaleGate Anatomy (2)

• Reader-local view of "head" (also has minimum ts for that reader)

• Flagging mechanism: – If "head" is not flagged can be safely returned – Flag the last written tuple of each source

• Nodes free to be garbage-collected after every reader passes (almost...)

head0 head1

BigData’15 [GNPT]

Page 39: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

ScaleGate Properties

• Safety: Linearizable– Every operation appears to take effect

instantaneously• Progress: Lock-free

– At least one thread makes progress, independently of the state of others

• Basic building block for providing deterministic processing (through the ready semantics)

Page 40: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

Why not a Skip List?

Page 41: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 44

Agenda• Who we are• Motivation and System Model• ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient

Stream Join• The ScaleGate data structure• Fine-grained synchronization and other

use-cases– Streaming Aggregation– Towards real-time analytics– Some ongoing work

• Research in progress• Conclusions

Page 42: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

Timestamp sorted

Multi-way Streaming Aggregation

1 5

Aggregate F

3 7

5 9

<6,…>

<1,…>

<3,…>

When to process a tuple?

Tuples not in TS order across streams;not ready for

processing at arrival…

Merge & sort streams

Timestamp sorted tuples per stream

Page 43: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Multi-way Streaming Aggregation(baseline [Streamcloud, Borealis])

Vincenzo Gulisano - Yiannis Nikolakopoulos

Aggregate F:Update

windows and output<6,…>

<1,…>

<3,…>

When to process a tuple?Keep checkingall buffers!!!

Merge & sort streamsCan Data

Structures do more than insert

and extract?

Page 44: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Data Structures to the Rescue

<6,…>

<1,…>

<3,…>Insert

concurrently

Concurrently update

windows

Get ready tuples

Manage sorted windows

Output tuples

ScaleGate objects allow concurrent access with - consistency/safety , progress guarantees - determinism- appropriate interface parallelization of pipeline stages

Contribution:SPAA’14 [CGNPT]

“Brief announcement: concurrent data structures for

efficient streaming aggregation”

Page 45: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

Aggregate Scaling

Page 46: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

Latency with Increasing# Input Streams

Page 47: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

Best Solution Award

Analyze taxi trip reports from NYC and compute:

ACM DEBS 2015 [GNWPT]“Deterministic Real-Time

Analytics of Geospatial Data Streams through ScaleGate

Objects”

Page 48: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

System Architecture Overview

Page 49: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

• Networks on chip (NoC)• Short distance

between cores• Message passing

model support• Shared memory support

Can we have Data Structures:Fast

ScalableGood progress guarantees

Cache Cache

IA Core

Shared Local

• Eliminatedcache coherency

• Limited support for synchronization primitives

What’s happening in hardware?

Page 50: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

Off-chip DRAM

Adapteva’s Epiphany Architecture:16 or 64 cores

core

core

On-chip32kb per

core

core

core core

core

Cores communicate through mesh by reading and writing the on-chip memory

Distributed shared memory

Page 51: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

Ongoing work with Ericsson:ScaleGate on Epiphany

• Task-graph based processing• (Baseband) signal processing applications

Can ScaleGate help?

Page 52: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 55

Agenda

• Who we are• Motivation and System Model• ScaleJoin: a Deterministic, Disjoint-Parallel and

Skew-Resilient Stream Join• The ScaleGate data structure• Fine-grained synchronization and other use-cases• Research in progress• Conclusions

Page 53: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 56

Source: http://storm.apache.org/

Source: http://storm.apache.org/

… …

Intra-operator parallelism

Storm is a parallel-distributed Stream Processing Engine• Data streaming operators

Spout / Bolt• Splitting of work into tasks / workers

Intra-operator parallelism

Page 54: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos

Parallelization

• General approach

LB: Load BalancerIM: Input Merger

OPA OPB

…… …

…OPALB

IM

Node 1

OPALB

IM

Node m

Subcluster A

OPBLB

IM

Node 1

OPBLB

IM

Node n

Subcluster B

57

Page 55: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos

Parallelization• Stateful operators: Semantic awareness

– Aggregate: count within last hour, group-by caller number

Previous Subcluster

LB…

LB…

IM

Agg1

IM

Agg2

IM

Agg3

Caller A

A B 8:00 3

58

A C 8:30 5

A D 9:05 2

Page 56: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos

Parallelization

Previous Subcluster

LB…

LB…

IM

Agg1

IM

Agg2

IM

Agg3

59

Source: http://storm.apache.org/

Fields groupingSource: http://storm.apache.org/

User defined

Page 57: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 60

On-going researchSource: http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/

Coarse-grained synchronization!!!• Individual queues for

executors• 2 threads per executor• Dedicated Receive and Send

thread queues…

… What about fine-grained synchronization?

Page 58: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 61

Agenda

• Who we are• Motivation and System Model• ScaleJoin: a Deterministic, Disjoint-Parallel and

Skew-Resilient Stream Join• The ScaleGate data structure• Fine-grained synchronization and other

use-cases• Research in progress• Conclusions

Page 59: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 62

1.Parallel stream joins2.Parallel stream aggregation3.“Ad-hoc” streaming applications4.Determinism in Storm…

https://github.com/dcs-chalmers/ScaleGate_Java

Page 60: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

2023-05-03 Vincenzo Gulisano - Yiannis Nikolakopoulos 63

The benefits of fine-grained synchronization in deterministic and efficient stream processing

Vincenzo Gulisano, Yiannis Nikolakopoulos

Chalmers University of technology

Thank you! Questions?

Page 61: The benefits of fine-grained synchronization in  deterministic and efficient stream processing

Vincenzo Gulisano - Yiannis Nikolakopoulos

References• Vincenzo Gulisano, Yiannis Nikolakopoulos, Marina Papatriantafilou, and

Philippas Tsigas, “ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join”, to appear in IEEE BigData 2015

• Vincenzo Gulisano, Yiannis Nikolakopoulos, Ivan Walulya, Marina Papatriantafilou, Philippas Tsigas, “Deterministic Real-time Analytics of Geospatial Data Streams Through ScaleGate Objects”, 9th ACM International Conference on Distributed Event-Based Systems (DEBS '15)

• Yiannis Nikolakopoulos, Anders Gidenstam, Marina Papatriantafilou, Philippas Tsigas “A Consistency Framework for Iteration Operations in Concurrent Data Structures”, 2015 IEEE 29th International Symposium on Parallel & Distributed Processing (IPDPS)

• Daniel Cederman, Vincenzo Gulisano, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas “Brief announcement: concurrent data structures for efficient streaming aggregation”, 2014 26th ACM symposium on Parallelism in algorithms and architectures (SPAA '14)