Modularizing Flink Programs to Enable Stream Analytics in IoT Mashup Tools
Federico Fernández
Master’s Thesis Defense
19th July 2018
Supervisor: Prof. (Chang’an Univ.) PD Dr. habil. Christian Prehofer
Advisors: Tanmaya Mahapatra, M.Sc., Dr. Ilias Gerostathopoulos
20 BILLION DEVICES IN 2020
Apache Flink
Big Data
Stream Analytics
Mashups
aFlux
Source: IDC, Intel, United Nations
Informatics 4 – Chair of Software and Systems Engineering
Department of Informatics
Technical University of Munich
Outline
1. Introduction & Demo
2. Objectives & Methodology
3. Related Work
4. Conceptual Approach
5. Implementation
6. Evaluation
   1. The SmartSantander Project
   2. Evaluation Scenario
   3. Results
7. Conclusions
Federico Fernández — Modularizing Flink Programs to Enable Stream Analytics in IoT Mashup Tools 3 of 20
Objectives & Methodology
• Research questions:
  • Which abstractions are necessary to modularize Flink programs so that they can be created from flow-based, graphical mashup tools?
  • How can end users get support during the process of creating Flink programs graphically?
• Research questions revisited:
  • How can end users get support during the process of creating Flink programs graphically so that they place visual components in the right order?
  • Which abstractions are necessary to modularize Flink streaming programs so that they can be created from flow-based, graphical mashup tools?
• Methodology
  1. Literature review
  2. Design
     • Analyze mashup tools (aFlux) and Flink
     • Outcome: mashup components that allow the creation of Flink jobs
  3. Implementation
     • Java code generation + packaging of final job
     • Continuous validation to support users
  4. Evaluation → SmartSantander
Related Work
• Nussknacker
  • Open-source solution
  • Architecture
  • Engine
  • User Interface
  • Integrations
• Other tools
  • IBM SPSS Modeler
  • Microsoft Azure Stream Analytics
Source: https://touk.github.io/nussknacker/
Conceptual Approach
Model 1: Translator
Enable the creation of programs for Stream Analytics graphically.
Model 2: End-User Continuous Support
Continuously assess the end-user flow composition for semantic validity and provide feedback about it.
Model 1: Translator
Enable the creation of programs for Stream Analytics graphically.
• Graphical Parser
  • Create internal model from GUI
  • Instantiate actors
• Actor System & Actors
  • Specific Flink functionality
  • Parameterized, generic structure of Flink statements
  • Exchange messages → Specific Tree-Like Data Structure (STDS)
• Code Generator
  • Mapping of the actual Flink API
  • User-defined properties
  • Generates, compiles, packages
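The translator pipeline above can be illustrated with a minimal sketch. It is not the aFlux implementation: the class names `StatementNode` and `CodeGeneratorSketch`, the variable name `traffic`, and the argument strings are all hypothetical. Each node stands for one parameterized Flink statement contributed by an actor; the generator walks the tree and emits one chained statement per branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical node of a tree-like structure (illustrative, not aFlux's STDS).
final class StatementNode {
    final String method;    // name of the API method, e.g. "filter"
    final String argument;  // user-defined property rendered as source text
    final List<StatementNode> children = new ArrayList<>();

    StatementNode(String method, String argument) {
        this.method = method;
        this.argument = argument;
    }

    // Attaches a follow-up operation and returns it so that calls can be chained.
    StatementNode then(StatementNode child) {
        children.add(child);
        return child;
    }
}

// Hypothetical generator: emits one chained statement per root-to-leaf path.
final class CodeGeneratorSketch {
    static List<String> render(String prefix, StatementNode node) {
        String current = prefix + "." + node.method + "(" + node.argument + ")";
        if (node.children.isEmpty()) {
            return Collections.singletonList(current + ";");
        }
        List<String> statements = new ArrayList<>();
        for (StatementNode child : node.children) {
            statements.addAll(render(current, child));
        }
        return statements;
    }

    public static void main(String[] args) {
        StatementNode root = new StatementNode("filter", "new ChargeFilter()");
        root.then(new StatementNode("map", "new ToAlert()"));
        root.then(new StatementNode("print", ""));
        for (String statement : render("traffic", root)) {
            System.out.println(statement);
        }
    }
}
```

A real generator would also have to emit imports, class scaffolding, and the packaging step; the sketch only shows how a tree of parameterized statements can be turned into source text.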
Model 2: End-User Continuous Support
Continuously assess the end-user flow composition for semantic validity and provide feedback about it.
[Diagram: Visual Component A (main visual component) should | must (immediately) come before | after Visual Component B (argument visual component). Condition flags: isPrecedent, isConsecutive, isMandatory.]
• Semantics between nodes
• Checked every time two mashup components are wired together
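A minimal sketch of how such an ordering rule could be checked. The class `OrderConditionSketch` and the component names are hypothetical, the flow is simplified to a linear list of component names, and the meaning given to the three flags (before/after, immediately, must/should) is an assumption read off the diagram, not taken from the aFlux code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical ordering rule between two visual components (illustrative only).
final class OrderConditionSketch {
    final String main;            // component the rule is attached to
    final String argument;        // component it is compared against
    final boolean isPrecedent;    // assumed: true = main comes before argument, false = after
    final boolean isConsecutive;  // assumed: true = "immediately" before/after
    final boolean isMandatory;    // assumed: true = "must" (error), false = "should" (warning)

    OrderConditionSketch(String main, String argument,
                         boolean isPrecedent, boolean isConsecutive, boolean isMandatory) {
        this.main = main;
        this.argument = argument;
        this.isPrecedent = isPrecedent;
        this.isConsecutive = isConsecutive;
        this.isMandatory = isMandatory;
    }

    // True when the flow satisfies the rule; the rule only applies if both components are present.
    boolean holds(List<String> flow) {
        int m = flow.indexOf(main);
        int a = flow.indexOf(argument);
        if (m < 0 || a < 0) {
            return true;
        }
        int distance = isPrecedent ? a - m : m - a; // positive when the order is as required
        return isConsecutive ? distance == 1 : distance >= 1;
    }

    // Feedback message for the user: "OK", or an error/warning describing the violated rule.
    String check(List<String> flow) {
        if (holds(flow)) {
            return "OK";
        }
        return (isMandatory ? "ERROR: " : "WARNING: ") + main
                + (isMandatory ? " must " : " should ")
                + (isConsecutive ? "immediately " : "")
                + "come " + (isPrecedent ? "before " : "after ") + argument;
    }

    public static void main(String[] args) {
        OrderConditionSketch rule =
                new OrderConditionSketch("Window", "Aggregate", true, true, true);
        System.out.println(rule.check(Arrays.asList("Source", "Window", "Aggregate", "Sink")));
        System.out.println(rule.check(Arrays.asList("Source", "Aggregate", "Window")));
    }
}
```

Running such a check each time two components are wired together gives the continuous feedback the model describes.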
Implementation: Translator
• Graphical parser → embedded into aFlux
• Actors
  • Exchange FlinkFlowMessage → contains an STDS
  • 12 actors that map Flink’s DataStream API and CEP Library
• Code generator
  • Java source code generation → JavaPoet library
  • FlinkAPIMapper → based on the JavaParser library
    • Generates Abstract Syntax Tree (AST) from Flink sources
    • Singleton design pattern to boost performance
  • Package final job → MavenInvoker
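The performance argument for the singleton can be sketched as follows. This is an assumption about the intent, not the actual FlinkAPIMapper: the class `ApiMapperSketch` is hypothetical, and the string it returns is a stand-in for the result of parsing Flink sources into an AST. The point is only that one shared instance can cache expensive lookups so each is computed once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical singleton that caches expensive API lookups (illustrative only).
final class ApiMapperSketch {
    private static final ApiMapperSketch INSTANCE = new ApiMapperSketch();

    private final Map<String, String> signatures = new ConcurrentHashMap<>();
    int parses = 0; // counts how often the expensive step ran (not thread-safe, demo only)

    private ApiMapperSketch() {
    }

    static ApiMapperSketch getInstance() {
        return INSTANCE;
    }

    // Returns the cached entry, or runs the (simulated) expensive parse exactly once per name.
    String signatureOf(String methodName) {
        return signatures.computeIfAbsent(methodName, name -> {
            parses++;
            return "DataStream." + name + "(...)"; // placeholder for an AST-derived signature
        });
    }

    public static void main(String[] args) {
        ApiMapperSketch mapper = ApiMapperSketch.getInstance();
        mapper.signatureOf("filter");
        ApiMapperSketch.getInstance().signatureOf("filter");
        System.out.println("expensive parses: " + mapper.parses);
    }
}
```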
Implementation: End-User Support
• Visual Components → aFlux Mashup Components
• Conditions implemented in ToolSemanticsCondition
  • An array can be defined when developing a new mashup component
• Errors are shown to the user when creating the flow
  • Component becomes red
  • Component name gets an asterisk (“*”)
  • Details are shown in the right-hand panel
• Available to all mashup components in aFlux
Evaluation: The SmartSantander Project
• City-scale experimental research facility
  • 3000 IEEE 802.15.4 devices
  • 200 GPRS modules
  • Static locations + on board mobile vehicles
• Here focusing on:
  • Traffic Intensity Monitoring
  • Environmental Monitoring
• Flink extension to retrieve live data
  • Independent of aFlux! → Can be contributed to the community
Image source: WeMaps
Evaluation: Scenario
Goal → show how easily Flink jobs can be created from aFlux
UC1: Real Time Data Processing
AggregateFunction, AllWindowedStream, DataStream, FilterFunction, MapFunction, RichSourceFunction, SlidingWindow, StreamExecutionEnvironment, TumblingWindow
Code    Description
UC1E1   Temperature vs. air quality in a certain area in relation to the city average
UC1E2   Air quality vs. traffic charge in the city center
UC1E3   Noise vs. traffic charge in the city center
UC1E4   Max/min monitor
UC2: Pattern Detection
DataStream, Pattern, PatternSelectFunction, PatternStream
Code    Description
UC2E1   Traffic increasing in a certain area
UC2E2   Heatwave in the city
Evaluation: Results
• Use Case 1, Experiment 1 (Temperature vs. Air Quality)
  • Tumbling windows: size = 5 min
  • Sliding windows: size = 5 min, slide = 1 min
Live data from SmartSantander API @ 9th July 2018.
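The difference between the two window configurations can be made concrete with a small plain-Java sketch (no Flink dependency). It assumes epoch-aligned windows with zero offset and non-negative timestamps; the class name `WindowSketch` is hypothetical. With tumbling windows each reading falls into exactly one 5-minute window; with a 5-minute size and a 1-minute slide it falls into five overlapping windows.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of window assignment (assumes zero offset, timestamps >= 0).
final class WindowSketch {
    static final long MINUTE = 60_000L;

    // Start of the single tumbling window that contains the timestamp.
    static long tumblingStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Starts of all sliding windows [start, start + size) that contain the timestamp,
    // listed from the latest start to the earliest.
    static List<Long> slidingStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - (timestampMs % slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long reading = 7 * MINUTE + 30_000L; // a reading taken at minute 7:30
        System.out.println("tumbling window starts at minute "
                + tumblingStart(reading, 5 * MINUTE) / MINUTE);
        System.out.println("number of sliding windows: "
                + slidingStarts(reading, 5 * MINUTE, MINUTE).size());
    }
}
```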
• Use Case 2, Experiment 1 (Traffic Jams Detection)
Live data from SmartSantander API @ 9th July 2018.
AfterMatchSkipStrategy strat = AfterMatchSkipStrategy.noSkip();

// Three-stage pattern: the traffic charge reaches 50, then 60, then 75.
// followedBy() uses relaxed contiguity: non-matching events may occur between stages.
Pattern<TrafficObservation, TrafficObservation> myPattern =
    Pattern.<TrafficObservation>begin("start", strat)
        .where(new SimpleCondition<TrafficObservation>() {
            @Override
            public boolean filter(TrafficObservation trafficObservation) throws Exception {
                return trafficObservation.getCharge() >= 50;
            }
        })
        .followedBy("middle")
        .where(new SimpleCondition<TrafficObservation>() {
            @Override
            public boolean filter(TrafficObservation trafficObservation) throws Exception {
                return trafficObservation.getCharge() >= 60;
            }
        })
        .within(Time.minutes(10))
        .followedBy("end")
        .where(new SimpleCondition<TrafficObservation>() {
            @Override
            public boolean filter(TrafficObservation trafficObservation) throws Exception {
                return trafficObservation.getCharge() >= 75;
            }
        })
        .within(Time.minutes(10));

// Apply the pattern to the traffic stream (filteredTraffic is defined earlier in the job).
PatternStream<TrafficObservation> patternStream = CEP.pattern(filteredTraffic, myPattern);

// Build one alert per match from the first event of the "end" stage.
DataStream<SmartSantanderAlert> alerts = patternStream.select(
    new PatternSelectFunction<TrafficObservation, SmartSantanderAlert>() {
        @Override
        public SmartSantanderAlert select(Map<String, List<TrafficObservation>> map) throws Exception {
            TrafficObservation event = map.get("end").get(0);
            return new SmartSantanderAlert("Charge went too high in " + event.toString());
        }
    });
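To show what the relaxed contiguity of followedBy means for this pattern, here is a simplified plain-Java sketch without Flink. It is not how Flink CEP evaluates patterns: it ignores the 10-minute time constraint, reports only the first match rather than every match, and the class `RelaxedContiguitySketch` and the sample values are hypothetical. Each stage consumes one event whose charge reaches its threshold, and events that do not advance the pattern are skipped.

```java
// Simplified illustration of relaxed contiguity over a sequence of charge readings.
final class RelaxedContiguitySketch {

    // Returns the index of the event that completes the first match, or -1 if none.
    static int firstMatchEnd(int[] charges, int[] thresholds) {
        int stage = 0;
        for (int i = 0; i < charges.length; i++) {
            if (charges[i] >= thresholds[stage]) {
                stage++; // this event is consumed by the current stage
                if (stage == thresholds.length) {
                    return i;
                }
            }
            // otherwise the event is skipped and the pattern keeps waiting
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] thresholds = {50, 60, 75};             // start, middle, end
        int[] readings = {40, 55, 52, 63, 58, 80};   // made-up sample data
        System.out.println("match completed at index "
                + firstMatchEnd(readings, thresholds));
    }
}
```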
Conclusions
• Stream Analytics suits the IoT use case
• IoT mashup tools as enabling technology
• Research questions
  1. Abstractions to modularize Flink streaming programs so that they can be created graphically
  2. End-user support while creating programs graphically
• Main contributions
  1. A new extension for aFlux that allows the creation of Flink jobs
  2. Support for semantics validation in aFlux
  3. A new extension for Flink that allows the integration of live data from SmartSantander
• Future work
  • Flink APIs
  • User Experience
  • Unattended mechanism to deploy the jobs
Bibliography
[1] E. Friedman and K. Tzoumas, Introduction to Apache Flink. O’Reilly, Sep. 2016.
[2] The Apache Software Foundation. (2018). Apache Flink, [Online]. Available: https://flink.apache.org/ (visited on 06/2018).
[3] M. A. Overton, “The IDAR Graph,” Queue, vol. 15, no. 2, pp. 20:29–20:48, Apr. 2017, ISSN: 1542-7730. DOI: 10.1145/3084693.3089807.
[4] Smart Santander. (2018). Santander Facility, [Online]. Available: http://www.smartsantander.eu/index.php/testbeds/item/132-santander-summary (visited on 06/2018).
[5] T. Mahapatra and C. Prehofer, “Service mashups and developer support,” Digital Mobility Platforms and Ecosystems, p. 48, 2016.
[6] T. Mahapatra, I. Gerostathopoulos, and C. Prehofer, “Towards integration of Big Data analytics in Internet of Things mashup tools,” in Proceedings of the Seventh International Workshop on the Web of Things, ser. WoT ’16, Stuttgart, Germany: ACM, 2016, pp. 11–16, ISBN: 978-1-4503-4874-4. DOI: 10.1145/3017995.3017998.
Apache Flink
Source: https://flink.apache.org/
Source: E. Friedman and K. Tzoumas
Streaming Architecture
Windows
Tumbling Windows, Sliding Windows, Session Windows, Global Window
Programming Flink
Source: https://flink.apache.org/
Flink Connector for SmartSantander
aFlux