Pilot SC4 BDE Workshop Brussels 14 Sept. 2017
Objective of the Pilot SC4
A scalable, fault-tolerant and flexible platform based on open source frameworks that can process unbounded data sets.
Microservice Architecture
Message Broker
Apache Kafka is a high-throughput distributed durable messaging system
Apache Kafka
Kafka Cluster
Apache Kafka
Stream and Batch Processor
Apache Flink is an open source platform for distributed stream and batch data processing.
Apache Flink
Flink Cluster
Apache Flink
Storage and Indexing
PostGIS is a spatial database that stores the road network data. Elasticsearch is a distributed, open source document store and search engine built on top of Apache Lucene. It stores the results of the workflow.
Elasticsearch Cluster
Pilot Architecture
BDE Components
The FCD Pipeline
Pilot Cluster
Minimum requirement for fault-tolerance and scalability
● Cluster of 3 nodes (Docker swarm)
● 4 CPU cores x node
● 1 (Flink) worker x node
● 1 (Flink) slot x CPU core
Max parallelism = 12
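The maximum parallelism quoted above follows from multiplying out the cluster sizing. A minimal sketch of that arithmetic in Python, where the function and parameter names are hypothetical and only the numbers come from the slide:

```python
def max_parallelism(nodes: int, cores_per_node: int,
                    workers_per_node: int = 1, slots_per_core: int = 1) -> int:
    """Total number of task slots available in the cluster.

    Assumes the cores of a node are split evenly among its workers
    and that every slot can run one parallel pipeline instance.
    """
    slots_per_node = cores_per_node * slots_per_core
    if slots_per_node % workers_per_node != 0:
        raise ValueError("slots cannot be split evenly among the workers")
    slots_per_worker = slots_per_node // workers_per_node
    return nodes * workers_per_node * slots_per_worker


# Sizing from the slide: 3 nodes, 4 cores each, 1 worker per node, 1 slot per core
print(max_parallelism(nodes=3, cores_per_node=4))  # 12
```

With this sizing each worker would offer 4 slots, which in Flink is typically set through the `taskmanager.numberOfTaskSlots` option; whether the pilot configures it that way is an assumption.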
Parallelization: map-match subtasks
1. source()
2. mapMatch()
3. keyBy()/window()/apply()
4. sink()
The subtasks can be distributed in slots with different parallelism (e.g. from 1 to 12)
Parallelization: map-match subtasks
A slot can process all the subtasks in a pipeline
Parallelization: input and output data
device_id timestamp lat lon speed orientation transit
The mapMatch subtask keeps the time order so that the next task keyBy(road_seg)/window(15’)/apply(average_speed) will return the correct result within the time window for each road segment.
road_seg_id start_date num_vehicles avg_speed
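The keyBy(road_seg)/window(15’)/apply(average_speed) step can be illustrated without a Flink cluster. The sketch below is plain Python, not the Flink API: it groups already map-matched records by road segment and 15-minute tumbling window and emits rows shaped like the output schema above. The record layout, the function names, and the choice to count distinct device ids as `num_vehicles` are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 15 * 60  # 15-minute tumbling window, as on the slide


def window_start(timestamp: int) -> int:
    """Align an epoch timestamp (in seconds) to the start of its window."""
    return timestamp - timestamp % WINDOW_SECONDS


def average_speed_per_segment(records):
    """Aggregate map-matched records per road segment and time window.

    records: iterable of (road_seg_id, device_id, timestamp, speed) tuples
    (hypothetical layout). Returns rows of
    (road_seg_id, start_date, num_vehicles, avg_speed), sorted by key.
    """
    groups = defaultdict(list)
    for road_seg_id, device_id, timestamp, speed in records:
        # Equivalent of keyBy(road_seg) followed by window assignment
        groups[(road_seg_id, window_start(timestamp))].append((device_id, speed))

    rows = []
    for (road_seg_id, start), items in sorted(groups.items()):
        vehicles = {device_id for device_id, _ in items}  # assumption: distinct devices
        avg_speed = sum(speed for _, speed in items) / len(items)
        rows.append((road_seg_id, start, len(vehicles), avg_speed))
    return rows


sample = [
    (1, "a", 0, 30.0),
    (1, "b", 60, 50.0),
    (1, "a", 900, 20.0),  # falls into the next 15-minute window
    (2, "c", 10, 40.0),
]
for row in average_speed_per_segment(sample):
    print(row)
```

In a streaming engine the window is closed by time progress rather than by sorting a finished list, which is why the slide stresses that mapMatch must preserve the time order of the records.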
SC4 Pilot Pipeline
Data Upload
Producer and Consumer
Visualization
The SC4 pilot can map-match real-time floating car data (FCD) and classify each road segment according to its traffic level.
Short-term traffic forecast
Algorithm: Feedforward ANN
Short-term traffic forecast
Algorithm: Feedforward ANN
Hyperparameters (spatial and temporal correlation):
Input layer units: (Dd*24*60*Cr)/Tw
Dd = number of days (e.g. working days, 5 days)
Tw = time window (e.g. 30’)
Cr = connected road segments (e.g. 3)
-> 720 input units
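The input layer size above can be checked with a few lines of Python. This is only a sketch of the slide's formula; the function and argument names are hypothetical, and the time window is assumed to be expressed in minutes.

```python
def input_layer_units(days: int, window_minutes: int, connected_segments: int) -> int:
    """Number of input units: (Dd * 24 * 60 * Cr) / Tw.

    days               -> Dd, days of history fed to the network
    window_minutes     -> Tw, aggregation window in minutes
    connected_segments -> Cr, road segments considered together
    """
    total_minutes = days * 24 * 60 * connected_segments
    if total_minutes % window_minutes != 0:
        raise ValueError("the time window must divide the history evenly")
    return total_minutes // window_minutes


# Example values from the slide: 5 working days, 30-minute windows, 3 segments
print(input_layer_units(days=5, window_minutes=30, connected_segments=3))  # 720
```

Each unit corresponds to one time window of one connected road segment, so halving the window to 15 minutes would double the input layer to 1440 units.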
SANSA-Stack: Big Data + Machine Learning + Semantic Technologies
SANSA-Stack, part of the BDE project, and RDF data sets based on semantic technologies such as LinkedGeoData, will enable more use cases related to SC4
Thanks
BDE project website: https://www.big-data-europe.eu/
Code repository: https://github.com/big-data-europe
Contact: [email protected]