challenges in data stream processing - ce.uniroma2.it · challenges in data stream processing corso...

28
Università degli Studi di Roma Tor VergataDipartimento di Ingegneria Civile e Ingegneria Informatica Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini

Upload: others

Post on 26-May-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Università degli Studi di Roma “Tor Vergata” Dipartimento di Ingegneria Civile e Ingegneria Informatica

Challenges in Data Stream Processing

Corso di Sistemi e Architetture per Big Data A.A. 2016/17

Valeria Cardellini

Page 2: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 1: Optimize the DSP application

•  Apply some transformation to streaming graph –  At design time or run-time

•  Operator reordering –  To avoid unnecessary data transfers

•  Redundancy elimination

A B B A

A B

B D

C A B

D

C

1 Valeria Cardellini - SABD 2016/17

Page 3: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 1: Optimize the DSP application

•  Operator separation

•  Fusion

A A1 A2

A B AB

2 Valeria Cardellini - SABD 2016/17

Page 4: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 2: Place the operators

•  Operator placement decision: a complex problem –  Trade communication cost against resource

utilization •  When

–  Initial (static) operator placement •  Can be more expensive and comprehensive

–  Can also be at run-time •  Move only relocatable operators •  Require operator migration

•  We will focus on this issue in a next lesson

3 Valeria Cardellini - SABD 2016/17

Page 5: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 3: Manage load variations

•  Typical stream processing workloads are: –  with high volume and high rates –  bursty and with workload spikes not known in

advance •  Twitter in 2013: rate of tweets per second = 5700 •  … but significant peak of 144,000 tweets per second

4 Valeria Cardellini - SABD 2016/17

Page 6: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 3: Manage load variations

•  Possible approaches: –  Admission control –  Static reservation

•  Reserve specific resources in advance •  Cons: over-provisioning and cost increase

–  Apply dynamic techniques such as load shedding •  Selectively drop tuples at strategic points (e.g., when CPU

usage exceeds a specific limit) •  Cons: sacrifice accuracy and completeness

A Shedder A

5 Valeria Cardellini - SABD 2016/17

Page 7: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 3: Manage load variations

•  Possible approaches (continued): –  Use adaptive rate allocation

•  E.g., backpressure : the upstream operator that precedes the bottleneck stores data in an internal buffer to reduce the pressure; backpressure recursively propagates up to the source operators

–  Redistribute load, e.g., determine new operator placement and relocate operators on computing nodes

•  Cons: available resources could be insufficient

•  What else?

6 Valeria Cardellini - SABD 2016/17

Page 8: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

•  Alternative solution: –  Detect bottleneck –  Use data-parallelism (aka operator fission)

•  Apply SIMD paradigm: concurrent execution of multiple replicas of the same operator on different data portions

•  By hand: possible, but cumbersome

Exploit data parallelism

A

A

A

A

Split Merge

7 Valeria Cardellini - SABD 2016/17

Page 9: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Elastic stream processing

•  Exploit elasticity: acquire and release resources when needed

•  Where? –  At application layer (i.e., data parallelism)

•  Scale out (or scale in) operators by adding (removing) operator replicas

•  Activate (or deactivate) already replicated operators

–  At infrastructure layer •  Scale out (or scale in) computing nodes

8 Valeria Cardellini - SABD 2016/17

Page 10: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Elastic stream processing

•  When and how to scale? –  Open issues that deserve investigation –  Some simple example:

•  When: threshold-based •  How: add/remove one replica at time, but where to place it?

•  Be careful: elasticity overhead is not zero! –  In most streaming systems: run a new placement

decision to take the new replicas into account –  Dynamic scaling impacts stateful operators

9 Valeria Cardellini - SABD 2016/17

Page 11: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 4: Self-adapt at run-time •  To cope with highly dynamic operative

environment –  Unpredictable workload –  Computational characteristics of operators not

known a-priori –  Need to sustained load for long provisioning times –  Node availability, network congestion, …

•  Exploit run-time adaptation capabilities of DSP systems

•  What adaption actions? –  Scale the number of operator instances, relocate the

operators, …

10 Valeria Cardellini - SABD 2016/17

Page 12: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Self-adaptive deployment

•  MAPE (Monitor, Analyze, Plan and Execute)

•  Plan phase: how to reconfigure the application deployment

11 Valeria Cardellini - SABD 2016/17

Page 13: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Distributed Storm

•  We developed an extension of Storm •  Goals: to provide

–  distributed monitoring –  distributed placement –  and adaptation capabilities

•  Where: large-scale environment •  Code available on GitHub

matnar.github.io/uniroma2-storm/

12 Valeria Cardellini - SABD 2016/17

V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, “Distributed QoS-aware scheduling in Storm”, ACM DEBS 2015.

Page 14: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Distributed Storm architecture

13 Valeria Cardellini - SABD 2016/17

Page 15: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Distributed Storm: monitoring

•  QoSMonitor (for each worker node) –  Estimate network latencies

•  Use a network coordinate system •  Vivaldi’s algorithm: decentralized and gossip-based

–  Monitor QoS attributes •  Node utilization and availability

•  Worker Monitor (for each worker process) –  Monitor exchanged data rate among the operators

14 Valeria Cardellini - SABD 2016/17

Page 16: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Distributed Storm: performance

Load spike on a subset of nodes

~50%

15 Valeria Cardellini - SABD 2016/17

Page 17: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Reconfiguration challenges

•  Reconfiguring the deployment has a non negligible cost!

•  Can affect negatively application performance in the short term –  Application freezing times caused by operator

migration and scaling, especially for stateful operators

Perform reconfiguration only when needed

Take into account the overhead for migrating and scaling the operators

16 Valeria Cardellini - SABD 2016/17

Page 18: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 5: stateful operators

•  State complicates things… 1. Dynamic scaling 2. Operator re-placement 3. Recovery from failure

Loss of state!

impact state

17 Valeria Cardellini - SABD 2016/17

impact state

Page 19: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Approaches for stateful migration

•  Most of streaming systems do not support stateful processing and migration (e.g., Storm) –  Developers manage state –  Typically combine with external system to store state –  Design complexity

•  Requirements for stateful operatior migration –  Safety (i.e., to preserve the consistency of the

operations) –  Application transparency –  Minimal footprint

18 Valeria Cardellini - SABD 2016/17

Page 20: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Stateful operator migration •  Two approaches:

–  Pause-and-resume –  Parallel-track

•  Pause-and-resume approach

Stop migrating

task Save state

Terminate migrating task and start it on new node

Restore state

Resume stream processing

19 Valeria Cardellini - SABD 2016/17

Page 21: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Stateful operator migration

•  Pause-and-resume drawback –  Peak in the application latency during the migration

•  Parallel-track approach –  Old and new operator instances run concurrently until

the state of both is synchronized and the new instance can safely take over

–  Drawback: requires enhanced mechanisms

•  No clear winner

20 Valeria Cardellini - SABD 2016/17

Page 22: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Issues for stateful migration

•  How to identify the portion of state to migrate? –  Expose an API to let the user manually manage

the state –  Support only partitioned stateful operators

•  Partitioned stateful operators store independent state for each sub-stream identified by a partitioning key

•  Automatically determine, on the basis of a partitioning key, the optimal number of state partitions to be used and migrate

21 Valeria Cardellini - SABD 2016/17

Page 23: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Issues for stateful migration

•  How to balance the load among multiple stateful replicas?

•  Can use consistent hashing •  Can use partial key grouping

–  Uses two hash functions where a key can be sent to two different replicas instead of one

•  Only available in research prototypes

Valeria Cardellini - SABD 2016/17

22

Page 24: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Elastic stateful migration in Storm •  We developed mechanisms for elastic stateful

migration in Storm

Supervisor Supervisor Supervisor Supervisor

wor

ker

proc

ess

wor

ker

proc

ess

wor

ker

slot

wor

ker

slot

wor

ker

slot

wor

ker

slot

wor

ker

proc

ess

wor

ker

proc

ess

wor

ker

proc

ess

wor

ker

proc

ess

wor

ker

proc

ess

wor

ker

proc

ess

DDS DDS DDS DDS

Network

schedulerMigrationNotifier

ElasticityManager

Nimbus ZooKeeper

23 Valeria Cardellini - SABD 2016/17

V. Cardellini, M. Nardelli, D. Luzi, "Elastic stateful stream processing in Storm", HPCS 2016.

Page 25: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Elastic stateful migration in Storm

•  Scaling decisions at the framework level –  Adapt the number of parallel instances for each

application operator –  Simple threshold-based scaling policy

•  Relocate the operator internal state on a different node and enable Storm to change the application deployment at run-time

MIGRATION NOTIFIED

MIGRATIONMODE

SAVESTATE

first synchronizationbarrier

the migrating taskcan be terminated

MIGRATION MODE

RESTORE STATE(if any)

OPERATIONALMODE

new task

second synchronizationbarrier

streams areresumed

time

DDS DDS

24 Valeria Cardellini - SABD 2016/17

Page 26: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Time (s)500 1000 1500 2000 2500 3000 3500 4000 4500

App

licat

ion

Late

ncy

(ms)

0

200

400

600

800

1000

1200

1400

1600

Data rate ScalingSchedulingwith E+SMw/o E+SM

120 tweets/s120 tweets/s 250 tweets/s350 tweets/s 900 tweets/s

Time (s)500 1000 1500 2000 2500 3000 3500 4000 4500

Num

ber

of E

xecu

tors

0

5

10

15

20

25

30

Data rateScalingSchedulingwith E+SM

120 tweets/s250 tweets/s900 tweets/s350 tweets/s120 tweets/s

Performance results

•  Elastic scaling and stateful migration improves the application latency

•  DSP app: frequent pattern detection

25 Valeria Cardellini - SABD 2016/17

Page 27: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

Challenge 6: guarantee fault tolerance

•  DSP applications run for long time intervals failures are unavoidable

•  Possible solutions: –  Active replication –  Check-pointing –  Replay logs –  Hybrid solutions

•  Having different trade-offs between runtime cost in absence of failures and recovery cost

•  Large-scale complicates things… –  Network partitions and CAP theorem

26 Valeria Cardellini - SABD 2016/17

Page 28: Challenges in Data Stream Processing - ce.uniroma2.it · Challenges in Data Stream Processing Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini . ... ACM

References

•  M. Hirzel, R. Soulé, S. Schneider, B. Gedik, R. Grimm, “A catalog of stream processing optimizations”, ACM Comput. Surv., 2014. http://bit.ly/2rtLljf

•  T. Heinze T, L. Aniello, L. Querzoni, Z. Jerzak, “Cloud-based data stream processing”, Proc. ACM DEBS 2014. http://bit.ly/2sMzxMM

•  V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, “Distributed QoS-aware scheduling in Storm”, Proc. ACM DEBS 2015.

•  V. Cardellini, M. Nardelli, D. Luzi, “Elastic stateful stream processing in Storm”, Proc. HPCS 2016.

•  B. Gedik, S. Schneider, M. Hirzel, and K.-L. Wu, “Elastic scaling for data stream processing”, IEEE Trans. Parallel Distrib. Syst. 25, 6, 2014. http://bit.ly/2rKunfP

Valeria Cardellini - SABD 2016/17

27