
Politecnico di Milano

Continuous Architecting of Stream-Based Systems

M.M. Bersani, F. Marconi, D.A. Tamburri
Politecnico di Milano, Milan, Italy

P. Jamshidi, A. Nodari
Imperial College London, London, UK

WICSA 2016
Venice, Italy, April 6, 2016


Roadmap

- Context: What is DevOps?
- Our Playground and Challenge: Continuous Architecting and Big-Data
- Apache Storm
- Research Solution
  - Anti-Patterns
  - Algorithmic Manipulation
  - Formal Verification
- Evaluation
- Conclusions & Future Work

WICSA 2016- 2 -


Context: What is DevOps?


Figure: Before DevOps, Development and IT Operations each use their own tools across the Development, Test and Production stages. With DevOps, Development & IT Operations share tools, supporting Continuous Integration, Continuous Testing and Continuous Production, with QoS (Quality Assurance Service).


Our Playground and Challenge: Continuous Architecting and Big-Data

Beyond the tremendous hype and diffusion of Big-Data applications in recent years:

- High infrastructure costs
- Steep learning curve for the different frameworks
- Complex governance of large-scale architectures

Key challenge: quickly and continuously support design, deployment, operation, refactoring and re-deployment.


Figure: the continuous cycle of (Re)Design, Deployment and Operation.


Our Playground and Challenge: Continuous Architecting and Big-Data

Our focus (given a Big Data application): supporting continuous, incremental improvement of its architectural design by means of:

- a constant stream of analyses on the running application
- monitoring of the platform and infrastructure

Desired benefits:

- Reducing (re-)design effort
- Accelerating and facilitating (re-)deployment

Need to narrow down the scope…


Apache Storm

- Open-source distributed stream processing system
- Use cases: analytics, log event processing, etc.
- Reliability: at-least-once semantics
- Wide adoption in production
- Main concepts:
  - Streams
  - Topologies
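At-least-once semantics means every tuple emitted by a spout is eventually processed, possibly more than once: a reliable spout keeps each emitted tuple pending until it is acked and emits it again if it fails. The following framework-free Python simulation is only a sketch of that behaviour; the `ReplayingSpout` class and the `run` helper are hypothetical and do not use Storm's API.

```python
from collections import deque


class ReplayingSpout:
    """Hypothetical reliable spout: tuples stay pending until acked."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (msg_id, payload) pairs to emit
        self.pending = {}  # msg_id -> payload, emitted but not yet acked

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Fully processed downstream: forget the tuple.
        del self.pending[msg_id]

    def fail(self, msg_id):
        # Processing failed: re-enqueue the tuple so it is emitted again.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def run(messages, fail_first_attempt_of):
    """Hypothetical driver: the first attempt on the given ids fails *after*
    its downstream side effect happened (e.g. a partial database write)."""
    spout = ReplayingSpout(messages)
    processed = []  # side effects seen downstream, duplicates included
    attempts = {}
    while (item := spout.next_tuple()) is not None:
        msg_id, payload = item
        attempts[msg_id] = attempts.get(msg_id, 0) + 1
        processed.append(payload)
        if msg_id in fail_first_attempt_of and attempts[msg_id] == 1:
            spout.fail(msg_id)
        else:
            spout.ack(msg_id)
    assert not spout.pending  # every tuple was eventually acked
    return processed


print(run(["a", "b", "c"], fail_first_attempt_of={1}))  # ['a', 'b', 'c', 'b']
```

The duplicate `'b'` is the price of the guarantee: the failed attempt had already produced its downstream effect, so bolts with external side effects should tolerate replays.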


Storm Applications

Applications are defined as topologies, i.e. graphs of computation composed of:

- Spouts: sources of data streams (tuples)
- Bolts: calculate, filter, aggregate, join, talk to databases


Figure: a topology made of spouts and bolts. Data flows in from external systems (message broker, database, queue, web API) and out to external systems (queue, database, file).
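To make the structure concrete, a topology can be modelled as a directed graph whose nodes are spouts and bolts and whose edges are labelled with a stream grouping. The sketch below is a plain-Python data structure that loosely mirrors Storm's `TopologyBuilder` declarations (`setSpout`, `setBolt(...).shuffleGrouping(...)`) without depending on Storm; the `Topology` class and the component names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Hypothetical graph model of a Storm topology."""
    spouts: set = field(default_factory=set)
    bolts: set = field(default_factory=set)
    # (source, target) -> grouping name, e.g. "shuffleGrouping"
    edges: dict = field(default_factory=dict)

    def add_spout(self, name):
        self.spouts.add(name)

    def add_bolt(self, name, inputs):
        """Register a bolt and the (source, grouping) pairs it subscribes to."""
        self.bolts.add(name)
        for source, grouping in inputs:
            self.edges[(source, name)] = grouping

    def successors(self, node):
        return sorted(t for (s, t) in self.edges if s == node)

    def terminal_bolts(self):
        """Bolts with no outgoing edges (end points of the topology)."""
        sources = {s for (s, _) in self.edges}
        return sorted(b for b in self.bolts if b not in sources)


topo = Topology()
topo.add_spout("reader")
topo.add_bolt("parser", [("reader", "shuffleGrouping")])
topo.add_bolt("counter", [("parser", "fieldsGrouping")])
topo.add_bolt("archiver", [("parser", "shuffleGrouping")])

print(topo.successors("parser"))  # ['archiver', 'counter']
print(topo.terminal_bolts())      # ['archiver', 'counter']
```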


Research Question

“How can we assist the continuous architecting of stream processing systems?”


Research Solution

- Identify common anti-patterns
- Identify possible algorithmic manipulations
- Elicit structural properties and consistency checks
- Investigate further analyses based on formal verification techniques
- Design a tool that supports incremental and iterative refinement of streaming topologies by leveraging the above findings
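As an illustration of structural consistency checks, the sketch below flags edges that refer to undeclared components and bolts that no spout can reach. Both the topology representation and the choice of checks are assumptions made for this example, not the checks implemented in the authors' tool.

```python
from collections import deque


def consistency_issues(spouts, bolts, edges):
    """Hypothetical checks on a topology given as sets of component names
    and a list of (source, target) edges; returns human-readable issues."""
    declared = set(spouts) | set(bolts)
    issues = []
    # 1. Every edge endpoint must refer to a declared component.
    for source, target in edges:
        for end in (source, target):
            if end not in declared:
                issues.append(f"undeclared component: {end}")
    # 2. Every bolt should be reachable from at least one spout (BFS).
    reachable = set(spouts)
    frontier = deque(spouts)
    while frontier:
        node = frontier.popleft()
        for source, target in edges:
            if source == node and target not in reachable:
                reachable.add(target)
                frontier.append(target)
    for bolt in sorted(set(bolts) - reachable):
        issues.append(f"unreachable bolt: {bolt}")
    return issues


print(consistency_issues(
    spouts={"spout"},
    bolts={"parse", "index", "orphan"},
    edges=[("spout", "parse"), ("parse", "index"), ("parse", "missing")],
))  # ['undeclared component: missing', 'unreachable bolt: orphan']
```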


OSTIA: On-the-fly Static Topology Inference Analysis

Goals:

- Inferring the application architecture through on-the-fly reverse engineering and architecture recovery
- Enacting continuous architecting by applying iterative refinement
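One simple way to approximate static topology inference, shown purely as an illustration and not as OSTIA's implementation, is to scan the Java code that builds a Storm topology for `setSpout`, `setBolt` and `...Grouping(...)` calls. The regular expressions below assume string-literal component ids and grouping calls chained onto the `setBolt(...)` statement; ids held in variables, declarations built in loops, or semicolons inside string literals would require a real Java parser.

```python
import re

# Hypothetical regex-based approximation of static topology inference.
# Assumes string-literal component ids and grouping calls chained onto setBolt.
SPOUT_RE = re.compile(r'setSpout\(\s*"([^"]+)"')
BOLT_RE = re.compile(r'setBolt\(\s*"([^"]+)"')
GROUPING_RE = re.compile(r'\.(\w+Grouping)\(\s*"([^"]+)"')


def infer_topology(java_source):
    """Return (spouts, bolts, edges) declared in a Java topology definition;
    edges are (source, bolt, grouping) triples."""
    spouts, bolts, edges = [], [], []
    # Split on ';' so chained grouping calls stay with their setBolt(...).
    for statement in java_source.split(";"):
        spouts.extend(SPOUT_RE.findall(statement))
        bolt = BOLT_RE.search(statement)
        if bolt is None:
            continue
        bolts.append(bolt.group(1))
        for grouping, source in GROUPING_RE.findall(statement):
            edges.append((source, bolt.group(1), grouping))
    return spouts, bolts, edges


JAVA = '''
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("reader", new ReaderSpout(), 1);
builder.setBolt("parser", new ParserBolt(), 2).shuffleGrouping("reader");
builder.setBolt("counter", new CounterBolt(), 2)
       .fieldsGrouping("parser", new Fields("word"));
'''

spouts, bolts, edges = infer_topology(JAVA)
print(spouts)  # ['reader']
print(bolts)   # ['parser', 'counter']
print(edges)   # [('reader', 'parser', 'shuffleGrouping'), ('parser', 'counter', 'fieldsGrouping')]
```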


Figure: OSTIA architecture.


Topology Visualization


Figure: topology visualization with components wpSpout_1, wpDeserializer_2, expander_8, miSpout_1, miDeserializer_2, vIndexer_16, articleExtraction_1, mediaExtraction_1, webPageUpdater_1, textIndexer_1, mediaupdater_1, mediaTextIndexer_1 and clusterer_1, connected via shuffleGrouping.
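A recovered topology can be rendered with standard graph tooling. As a sketch, the hypothetical `to_dot` function below emits Graphviz DOT text with one edge per subscription, labelled with its grouping, in the spirit of the visualization on this slide; the edge list is made up for the example. The output can be rendered with, for instance, `dot -Tpng topology.dot -o topology.png`.

```python
def to_dot(edges, name="topology"):
    """Hypothetical helper: render (source, target, grouping) triples as a
    left-to-right Graphviz digraph."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for source, target, grouping in edges:
        lines.append(f'  "{source}" -> "{target}" [label="{grouping}"];')
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("reader", "parser", "shuffleGrouping"),
    ("parser", "counter", "fieldsGrouping"),
]
print(to_dot(edges))
```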


Anti-Patterns Detection

Cycle-in

Persistent Data

Multi-Anchoring

Computational Funnel

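As one possible way to detect the Cycle-in anti-pattern (assuming it denotes a cycle in the topology graph), the sketch below runs a depth-first search that marks nodes as unvisited, on the current path, or finished; reaching a node that is still on the current path means a back edge, and the path from that node onwards is a cycle. The edge lists are hypothetical.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes (first node repeated at the end),
    or None if the directed graph given by (source, target) edges is acyclic."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS path / finished
    color = {node: WHITE for node in graph}
    path = []  # nodes on the current DFS path, in order

    def dfs(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: nxt is an ancestor of node
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None


print(find_cycle([("spout", "A"), ("A", "B"), ("B", "C"), ("C", "A")]))  # ['A', 'B', 'C', 'A']
print(find_cycle([("spout", "A"), ("A", "B")]))  # None
```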


Algorithmic Manipulation (1/2)

- Fan-in/Fan-out
- Clustering


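Figure: Fan-In / Fan-Out, showing a node X with its out-degree (edges towards A, B, ..., N) and in-degree (edges from P, Q, ..., Z); Clustering, showing the nodes of a topology (A, B, C, D, E, F, H, Q) grouped into Cluster A and Cluster B.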
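Fan-in and fan-out of a component are its in-degree and out-degree in the topology graph. The sketch below computes both from an edge list and flags nodes whose fan-in exceeds a threshold as candidate hot spots; the threshold and the edges (named after the nodes in the Fan-In / Fan-Out diagram) are illustrative assumptions.

```python
from collections import Counter


def degrees(edges):
    """Return (fan_in, fan_out) counters for (source, target) edges."""
    fan_in = Counter(target for _, target in edges)
    fan_out = Counter(source for source, _ in edges)
    return fan_in, fan_out


def high_fan_in(edges, threshold):
    """Nodes fed by more than `threshold` distinct upstream nodes
    (the threshold is a hypothetical tuning parameter)."""
    fan_in, _ = degrees(set(edges))  # ignore duplicate edges
    return sorted(node for node, d in fan_in.items() if d > threshold)


edges = [("P", "X"), ("Q", "X"), ("Z", "X"), ("X", "A"), ("X", "B"), ("X", "N")]
fan_in, fan_out = degrees(edges)
print(fan_in["X"], fan_out["X"])         # 3 3
print(high_fan_in(edges, threshold=2))   # ['X']
```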


Algorithmic Manipulation (2/2)

- Topology Cascading
- Linearization


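Figure: Topology Cascading, where a topology with nodes A, B, C, D, E, F, H and Q is connected through a message queue (MQ) to a second topology with nodes A, B and C.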
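Reading the cascading figure as two topologies connected by a message queue (MQ), the composition can be sketched at graph level: the terminal nodes of the upstream topology publish to a queue node, and the entry nodes of the downstream topology consume from it. The `cascade` function, the `t1.`/`t2.` prefixes and the queue name are hypothetical; in a real deployment the topologies would be submitted separately and the queue would be an external broker.

```python
def cascade(upstream, downstream, queue="MQ"):
    """Hypothetical graph-level cascading: join two topologies, each a list of
    (source, target) edges, through a queue node. Names are prefixed so the
    two graphs stay distinct even if they reuse component names."""
    def prefixed(edges, prefix):
        return [(f"{prefix}.{s}", f"{prefix}.{t}") for s, t in edges]

    up = prefixed(upstream, "t1")
    down = prefixed(downstream, "t2")
    up_sinks = sorted({t for _, t in up} - {s for s, _ in up})          # no outgoing edges
    down_entries = sorted({s for s, _ in down} - {t for _, t in down})  # no incoming edges
    bridge = [(sink, queue) for sink in up_sinks]
    bridge += [(queue, entry) for entry in down_entries]
    return up + bridge + down


print(cascade([("E", "A"), ("A", "B")], [("A", "B"), ("B", "C")]))
# [('t1.E', 't1.A'), ('t1.A', 't1.B'), ('t1.B', 'MQ'), ('MQ', 't2.A'), ('t2.A', 't2.B'), ('t2.B', 't2.C')]
```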

Figure: Topological Weaving, where the nodes of the topology (A, B, C, D, E, F, H, Q) are rearranged and some are combined into composite nodes labelled BCD and BC.
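If linearization is taken to mean arranging the components in a single order that respects every stream dependency (an assumption, as the slide does not define it), the standard technique is a topological sort. The sketch below uses Kahn's algorithm, which also detects cycles: if some nodes never reach in-degree zero, the topology is not a DAG.

```python
from collections import deque


def linearize(edges):
    """Topologically sort (source, target) edges with Kahn's algorithm.
    Raises ValueError if the graph has a cycle."""
    successors, in_degree = {}, {}
    for s, t in edges:
        successors.setdefault(s, []).append(t)
        successors.setdefault(t, [])
        in_degree[t] = in_degree.get(t, 0) + 1
        in_degree.setdefault(s, 0)
    # Start from nodes with no inputs (e.g. spouts), sorted for determinism.
    ready = deque(sorted(n for n, d in in_degree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(in_degree):
        raise ValueError("topology contains a cycle")
    return order


print(linearize([("E", "A"), ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
# ['E', 'A', 'B', 'C', 'D']
```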


Formal Verification

- Automatic encoding of temporal logic formulae based on the topology structure
- Transparently extends the core of OSTIA
- Complements the empirical analysis
- Non-functional properties based on real-time constraints
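The idea of deriving temporal logic formulae from the topology structure can be illustrated with a small generator that emits one response property per stream and one bounded-response (real-time) property per bolt, in an LTL-like textual syntax. The formula shapes, the predicates `emit_X`, `receive_X`, `process_X` and the per-bolt `deadline` are illustrative assumptions; they are not OSTIA's actual encoding, which the slides do not detail.

```python
def encode(edges, bolts, deadline):
    """Hypothetical generator of LTL-style formulae (as strings).

    G = always, F = eventually, F[0,d] = eventually within d time units.
    The predicates emit_X / receive_X / process_X are illustrative only.
    """
    formulae = []
    for source, target in edges:
        # Every tuple emitted on a stream is eventually received downstream.
        formulae.append(f"G(emit_{source} -> F receive_{target})")
    for bolt in bolts:
        # Real-time constraint: a received tuple is processed within the deadline.
        formulae.append(f"G(receive_{bolt} -> F[0,{deadline[bolt]}] process_{bolt})")
    return formulae


for formula in encode([("spout", "parser")], ["parser"], {"parser": 5}):
    print(formula)
# G(emit_spout -> F receive_parser)
# G(receive_parser -> F[0,5] process_parser)
```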


Qualitative Evaluation - Industrial case study

- Industrial case study
- Benefits from visualization
- Relevant refactoring hints from the algorithmic analysis, endorsed by formal verification


Figure: topology of the industrial case study, with components wpSpout_1, WpDeserializer_4, expander_8, articleExtraction_1, mediaExtraction_1, webPageUpdater_4, textIndexer_1, mediaupdater_1 and mediatextindexer_1, connected via shuffleGrouping.


Qualitative Evaluation - Open-source software

- Visualization
  - Immediate help in reverse-engineering new topologies
  - Better understanding of their complexity
- Several anti-patterns highlighted


Figure: two open-source topologies. The first connects spout, partitioner, fetch, sitemap, parse, index and status via shuffleGrouping and fieldsGrouping; the second connects DatasetSpout, TokenizerBolt, PreprocessorBolt, POSTaggerBolt, FeatureGenerationBolt and SVMBolt via shuffleGrouping.


Conclusions

An approach supporting reverse engineering and architecture recovery of deployed applications for incremental improvement:

- Anti-pattern detection
- Algorithmic analysis
- Formal verification

Qualitative evaluation on open- and closed-source use cases


Future Work

- Additional anti-patterns
- Suggestions for performance improvements
- Support for new technologies: Apache Spark, Flink, Kafka
- Quantitative evaluation
- Further formal analyses


Questions?

Thanks!
