DEMONSTRATING THE SOCIETAL VALUE OF BIG & SMART DATA MANAGEMENT
Apache: Big Data Europe, Seville, 14 November 2016
Talk outline
• The BigDataEurope Project & Mission
• The Big Data Integrator (BDI) platform
• 7 Pilots for the 7 Societal Challenge Domains
• A look into the BDI platform [DEMO]
• Collocated Event – Today @ 16:30
14-nov.-16 www.big-data-europe.eu
BigDataEurope Project
Supporting the Societal Domains with Big Data Technology
BigDataEurope Action
• EC Horizon 2020 Coordination & Support Action
o ~€5 million, 2015–2017
• Show the societal value of Big Data
o Across all societal challenges addressed by H2020
• Lower the barrier to using Big Data technologies
o Effort and resources needed to convert tools and workflows
o Skills and expertise
• Help establish data value chains across domains and organisations
Consortium
NCSR Demokritos
Stakeholder Engagement Cycle
• Present the action and showcase deployments
• Raise awareness of BDE results and what they mean for stakeholders
• Collect requirements to drive further development
[Figure: engagement cycle timeline, project months M6, M12, M18, M24, M30]
Data Value Chain Evolution
[Figure: data value chain steps (Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis) across the Health, Transport, Security, Food, Societies, Climate and Energy domains, drawing on data repositories and Linked Open Data; over time, proprietary, ‘locked-in’ solutions give way to open-source solutions and Big Data stacks]
Source: Gesellschaft für Informatik
Variety – The most neglected V?
• Data source heterogeneity
• Lack of interoperability/semantics
Big Data Integrator
A flexible, generic platform for (Big) Data Value Chain deployment
Big Data Integrator
• Prototype developed by BDE
o Incorporates existing Big Data technology
o Facilitates integration and deployment
• Main points of the architecture
o Dockerization
o Support layer, including an integrated UI
o Semantification layer
Generic Architecture
• Plug-and-play Big Data platform
• Cloud-deployment ready
• Domain-independent, customisable
• Built by stacking open-source solutions
BDI Prototype Releases
1. [July 2016]
2. December 2016
3. …
Docker containers
• Docker offers lightweight virtualization
o Containers can be shared/provisioned on different Linux distributions/versions
• Identical base system
o NOT required
• All BDI components
o Packaged as Docker containers
BDI Docker Containers (so far)
• Data serving: HDFS, Cassandra, 4store, PostGIS, Strabon, Elasticsearch, Hive, Semagrow
• Processing: Spark, Flink, SANSA
• Stream ingestion middleware: Flume, Kafka
BDI Instances – An example
• Processing and storage components
o Re-used existing Docker containers (where available)
o Dockerized by BDE otherwise
o Ensuring all can be provisioned through Docker Swarm
• Other BDI components:
o Support Layer
o Semantic Layer
Support Layer
• Integrator UI
o Web UIs of BDE Docker containers (including 3rd-party components) follow the BDE stylesheets
• Stack Monitor App
o Configure stack order
• Swarm UI
o Launch, install and manage stacks
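The slides do not say how the Stack Monitor determines stack order. One plausible reading is the order in which services must start (for example, an HDFS namenode before Spark workers that read from it). Assuming that reading, the sketch below derives such an order from declared dependencies with a topological sort from Python's standard `graphlib` module. The service names and the `startup_order` helper are hypothetical, not taken from the BDI code.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical stack: each service maps to the services it must wait for.
# The names are illustrative and not taken from the BDI repositories.
EXAMPLE_STACK: dict[str, set[str]] = {
    "namenode": set(),
    "datanode": {"namenode"},
    "spark-master": {"namenode"},
    "spark-worker": {"spark-master", "datanode"},
    "integrator-ui": {"spark-master"},
}


def startup_order(stack: dict[str, set[str]]) -> list[str]:
    """Return a start order in which every service comes after its dependencies."""
    try:
        return list(TopologicalSorter(stack).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular dependency between services: {err.args[1]}") from err


if __name__ == "__main__":
    print(startup_order(EXAMPLE_STACK))
```

A cycle (two services each waiting for the other) is reported as an error rather than silently broken, since no valid start order exists in that case.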
Semantic Layer
• Semantic Data Lakes
o Minimal ingestion pre-processing
o Semantic layer maintains metadata
o Add meaning when retrieving/processing
[Figure labels: Data Lake (scalable unstructured data store); relationship definitions and metadata; JSON-LD, CSVW, R2RML, XML2RDF; Knowledge Graphs]
• Ongoing research on Semantic Big Data & Analytics
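The "minimal ingestion pre-processing, add meaning when retrieving" idea can be illustrated with JSON-LD, one of the mapping formats named on this slide. The sketch below is an assumption, not BDE's tooling: it leaves a raw CSV file as stored and wraps its rows in a JSON-LD document carrying a `@context` only at read time. The `example.org` vocabulary IRIs and the `lift_csv` helper are hypothetical.

```python
import csv
import io
import json

# Hypothetical vocabulary: maps raw CSV column names to IRIs. In a semantic
# data lake this context would be kept in the metadata layer, not in the file.
EXAMPLE_CONTEXT = {
    "station": "http://example.org/vocab#stationId",
    "temp_c": "http://example.org/vocab#temperatureCelsius",
}


def lift_csv(raw_csv: str, context: dict) -> dict:
    """Wrap raw CSV rows in a JSON-LD document at read time.

    The stored CSV stays untouched; meaning is attached only when the data
    is retrieved, so ingestion needs no schema-specific pre-processing.
    """
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    return {"@context": context, "@graph": rows}


if __name__ == "__main__":
    doc = lift_csv("station,temp_c\nS1,12.5\nS2,9.8\n", EXAMPLE_CONTEXT)
    print(json.dumps(doc, indent=2))
```

A JSON-LD processor expanding this document would map the `station` and `temp_c` keys to the IRIs in the context; the values remain plain strings here, so typing them (e.g. as decimals) would be a further step.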
Semantic Layer tools
• BDE tooling for the Semantic Data Lake:
o Swagger: semantics of RESTful APIs
o Semantic Analytics Stack (SANSA): distributed data processing over large-scale Knowledge Graphs
o Semagrow: SPARQL over Big Data stores
o Ontario: querying over Semantic Data Lakes
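Semagrow and Ontario both expose SPARQL querying. For orientation, here is a hedged sketch of how a client might send a SELECT query to any SPARQL 1.1 endpoint using the protocol's "query via GET" form and the JSON results format, with only the Python standard library. The endpoint URL and helper names are assumptions; the slides do not give actual service addresses.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint address; any SPARQL 1.1 endpoint is addressed the same way.
EXAMPLE_ENDPOINT = "http://localhost:8080/sparql"


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def select(endpoint: str, query: str, timeout: float = 30.0) -> list[dict]:
    """Run a SELECT query and return its variable bindings (needs a reachable endpoint)."""
    with urllib.request.urlopen(build_query_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


if __name__ == "__main__":
    req = build_query_request(EXAMPLE_ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
    print(req.full_url)
```

Long queries can exceed practical URL lengths; the same protocol also allows sending the query in a POST body, which this sketch does not cover.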
More Information
• Big Data Integrator:
https://github.com/big-data-europe
• README includes extensive documentation, instructions and information on supported components
• “Integrators at Work! Real-Life Applications of Apache Big Data Components” @ 4:30 PM
o Includes more details & demo
BigDataEurope Pilots
Demonstrating the Societal Value through 7 Real-World Pilot Use Cases
Pilots: Overview
• SC1: Health & Pharma
• SC2: Food & Agriculture
• SC3: Energy
• SC4: Transport
• SC5: Climate
• SC6: Social Sciences
• SC7: Security
7 Pilots
◎ BDI Platform Instantiations
o Allow end-users to easily deploy functionality in their own system environment
o Modularized Docker approach makes it easier to replace components
o Reduces the effort of keeping 3rd-party software updated & integrated
◎ 7 Societal Challenge Pilots
o Aligned with the 7 European Commission H2020 Societal Challenges
o Real-world use cases (data, objectives, solutions)
o Some pilots have different data & objectives but a similar solution
SC1: Pharmacology research
Life Sciences & Health
• Query a large number of datasets, some of them large
• Existing elaborate ingestion and homogenization by Open PHACTS
• Extensive toolset developed by the Open PHACTS Foundation (OPF) and others
Objective: Large-scale linking & integration of heterogeneous pharma research data
SC1: Architecture & Components
• Replicate Open PHACTS functionality on the BDE infrastructure using open-source solutions
o Based on Virtuoso, a proprietary distributed database
• Apply to other domains (e.g. Agriculture)
• Porting to BDI gives flexibility and enables new functionality
o Logging & system health monitoring
SC2: Viticulture resources
Food and Agriculture
Objective: Automate publication ingestion and thematic classification
• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services
SC2: Architecture & Components
• BDI deployed as an external infrastructure for processing text (viticulture publications)
• Storing and processing text at a larger scale than AgInfra can currently manage
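The slides do not describe how the thematic classification works. As a deliberately simple stand-in, assuming hand-picked theme keywords, a keyword-overlap classifier ranks themes by how many of their terms occur in a publication's text. The theme names and keyword sets are hypothetical; a production classifier would more likely draw labels from an agricultural thesaurus and be trained on labelled publications.

```python
import re
from collections import Counter

# Hypothetical theme vocabularies for viticulture publications.
THEMES: dict[str, set[str]] = {
    "pest_control": {"pest", "fungicide", "mildew", "insecticide"},
    "irrigation": {"irrigation", "water", "drought", "evapotranspiration"},
    "oenology": {"wine", "fermentation", "yeast", "tannin"},
}


def classify(text: str, themes: dict[str, set[str]] = THEMES, min_hits: int = 1) -> list[str]:
    """Return themes ranked by keyword hits, dropping those below min_hits."""
    # Lower-case word tokens and their frequencies.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {name: sum(tokens[w] for w in words) for name, words in themes.items()}
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked if scores[name] >= min_hits]


if __name__ == "__main__":
    print(classify("Downy mildew control with fungicide sprays in the vineyard"))
    # ['pest_control']
```

Because it only counts exact word matches, this baseline misses inflected forms ("fungicides") and synonyms, which is one reason trained classifiers are preferred at scale.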
SC3: Predictive maintenance
Energy
• Wind turbine monitoring applies computational models to sensor data streams
• Models are re-parameterized weekly using the week's data from multiple turbines
Objective: Real-time turbine monitoring stream processing and analytics
SC3: Architecture & Components
• Existing in-house, non-scalable solution for model parameterization
o Reliable Fortran software for data analysis
o Efficient, but does not scale to the data volume
• Developing a BDI orchestrator
o Re-uses the existing software unmodified
o Makes it easy to apply the software in parallel to many datasets and manage the outputs
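The orchestrator pattern described for SC3 (running existing, unmodified analysis software in parallel over many datasets and collecting the outputs) might look like the following minimal sketch. It assumes the legacy program is a command-line executable that takes a dataset path as its last argument and writes results to standard output; `run_unmodified`, `orchestrate` and the demo command are hypothetical stand-ins for the pilot's Fortran software.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_unmodified(command: list[str], dataset: str) -> tuple[str, str]:
    """Run the existing program on one dataset and return (dataset, stdout)."""
    # The program itself is invoked as-is; only its command line is assembled here.
    result = subprocess.run(command + [dataset], capture_output=True, text=True, check=True)
    return dataset, result.stdout


def orchestrate(command: list[str], datasets: list[str], workers: int = 4) -> dict[str, str]:
    """Apply the same program to many datasets in parallel; outputs are keyed by dataset."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda d: run_unmodified(command, d), datasets))


if __name__ == "__main__":
    # Stand-in for the legacy analysis binary: a one-line Python program.
    demo_command = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
    print(orchestrate(demo_command, ["week1.csv", "week2.csv"]))
```

Threads suffice here because each worker mostly waits on a child process; the legacy program runs in its own process either way. A failing run raises `subprocess.CalledProcessError` instead of silently producing an empty output.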
SC4: Traffic conditions estimation
Transport
• Combines:
o Traffic modelling from historical data
o Current measurements from a taxi fleet of 1,200 vehicles
Objective: Estimation of real-time traffic conditions in Thessaloniki
SC4: Architecture & Components
• New Flink implementations of map matching and traffic prediction algorithms
• BDI provides access to varied data sources:
o PostGIS database with the city map
o Elasticsearch database of historical data
o Kafka stream of real-time data
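Map matching snaps each raw GPS fix from a vehicle onto a road of the city map. The pilot's Flink implementation is not shown in the slides; the sketch below shows only the simplest form of the technique, assigning a point to the nearest road segment within a distance cutoff. It assumes projected planar coordinates in metres, and `Segment`, `match_point` and the 50 m cutoff are hypothetical. Practical map matchers usually also exploit the sequence of fixes (for example with hidden Markov models) rather than matching each point independently.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """A straight road segment from (ax, ay) to (bx, by), in projected metres."""
    road_id: str
    ax: float
    ay: float
    bx: float
    by: float


def distance_to_segment(px: float, py: float, seg: Segment) -> float:
    """Euclidean distance from point (px, py) to the closest point on the segment."""
    dx, dy = seg.bx - seg.ax, seg.by - seg.ay
    length_sq = dx * dx + dy * dy
    # Parameter t of the orthogonal projection, clamped to the segment's end points.
    t = 0.0 if length_sq == 0 else ((px - seg.ax) * dx + (py - seg.ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (seg.ax + t * dx), py - (seg.ay + t * dy))


def match_point(px: float, py: float, segments: list[Segment],
                max_dist: float = 50.0) -> Optional[str]:
    """Return the road_id of the nearest segment, or None if none lies within max_dist."""
    if not segments:
        return None
    best = min(segments, key=lambda s: distance_to_segment(px, py, s))
    return best.road_id if distance_to_segment(px, py, best) <= max_dist else None


if __name__ == "__main__":
    roads = [Segment("A", 0, 0, 100, 0), Segment("B", 0, 0, 0, 100)]
    print(match_point(50, 5, roads))
    # A
```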
SC5: Climate modelling
Climate
• Preparing modelling experiments
o Slicing, transforming and combining datasets
o Submission to and retrieval from the modelling infrastructure
• Discovering and re-using previously computed derivatives
o Lineage annotation: derivatives computed from datasets and model parameters
o Finding appropriate past runs avoids repeating weeks-long modelling runs
Objective: Supporting data-intensive climate research
SC5: Architecture & Components
• Existing infrastructure: stable, reliable software for parallel computation of models
• BDI is deployed as an external infrastructure for preparing and managing datasets
• BDI offers:
o Hive for managing data so that it can be retrieved and manipulated as data rather than as file blocks
o Cassandra for storing structured and textual metadata, for searching headers and lineage
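Re-using previously computed derivatives requires recognising that a requested run matches a past one. Assuming that a run is fully identified by its input dataset identifiers and model parameters, a minimal lineage lookup can key past results on a canonical hash of both. In the pilot this metadata is stored in Cassandra; the in-memory `LineageIndex`, `get_or_run` and the result locations below are hypothetical stand-ins.

```python
import hashlib
import json
from typing import Callable, Optional


class LineageIndex:
    """Hypothetical in-memory index from run lineage to the location of its results."""

    def __init__(self) -> None:
        self._runs: dict[str, str] = {}

    @staticmethod
    def key(datasets: list[str], params: dict) -> str:
        # Canonical JSON: parameter key order does not change the key,
        # but dataset order does, since combining datasets may be order-sensitive.
        payload = json.dumps({"datasets": datasets, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, datasets: list[str], params: dict, location: str) -> None:
        self._runs[self.key(datasets, params)] = location

    def find(self, datasets: list[str], params: dict) -> Optional[str]:
        return self._runs.get(self.key(datasets, params))


def get_or_run(index: LineageIndex, datasets: list[str], params: dict,
               run_model: Callable[[list[str], dict], str]) -> str:
    """Reuse a matching past run if one exists; otherwise run the model once and record it."""
    found = index.find(datasets, params)
    if found is not None:
        return found
    location = run_model(datasets, params)
    index.record(datasets, params, location)
    return location
```

Hashing a canonical serialisation keeps the lookup key short and fixed-size regardless of how many parameters a model run has; the trade-off is that the key itself is opaque, so the readable lineage annotation must be stored alongside it for searching.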
SC6: Municipality budgets
Social Sciences
• Ingestion of budget and budget execution data
• From multiple municipalities, in varied formats and data models
Objective: Homogenized budgetary data made available for analysis and comparison
SC6: Architecture & Components
• BDI deployed as ingestion and storage infrastructure for external tools
o Homogenizes a variety of data formats (JSON, CSV, XML, etc.)
o Exposes the homogenized data through a SPARQL endpoint
• Existing analytics and visualization tools
o Use SPARQL queries to retrieve only the relevant slices of the overall data
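Homogenization here means mapping each municipality's format onto one shared record structure before publication. The sketch below assumes three hypothetical source layouts (CSV with `label`/`value` columns, JSON with a `lines` array, XML with `<line>` elements) and maps all of them to the same dictionary shape. In the pilot the homogenized data is then served through a SPARQL endpoint; that step is not shown.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def record(municipality: str, item: str, amount) -> dict:
    """The shared target shape that every source format is mapped onto."""
    return {"municipality": municipality, "item": item.strip(), "amount": float(amount)}


def from_csv(municipality: str, text: str) -> list[dict]:
    # Assumed layout: header row with 'label' and 'value' columns.
    return [record(municipality, row["label"], row["value"])
            for row in csv.DictReader(io.StringIO(text))]


def from_json(municipality: str, text: str) -> list[dict]:
    # Assumed layout: {"lines": [{"description": ..., "total": ...}, ...]}
    return [record(municipality, line["description"], line["total"])
            for line in json.loads(text)["lines"]]


def from_xml(municipality: str, text: str) -> list[dict]:
    # Assumed layout: <budget><line name="..." amount="..."/>...</budget>
    root = ET.fromstring(text)
    return [record(municipality, el.get("name"), el.get("amount")) for el in root.iter("line")]


if __name__ == "__main__":
    rows = (from_csv("Municipality A", "label,value\nRoads,1000\n")
            + from_xml("Municipality C", '<budget><line name="Parks" amount="75"/></budget>'))
    print(rows)
```

Keeping one small adapter per source layout means a new municipality format only needs a new parser, while everything downstream (comparison, publication) sees a single structure.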
SC7: Change detection & verification
Secure Societies
• Events are extracted from text published by news agencies and on social networking sites
• Events are geo-located, and relevant changes are detected by comparing current and previous satellite images
Objective: Detect and verify events based on satellite imagery, news and social media
SC7: Architecture & Components
• Event detection: parallel orchestrator for text analytics
o Re-uses existing software
o Scales to many input streams
• Change detection: re-implementation of change detection algorithms for Spark
• BDI provides:
o Cassandra for text content and metadata
o Strabon GIS store for detected change locations
o Homogeneous access to both for analysis and visualization
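Comparing current and previous satellite images can be illustrated with the most basic change detection technique, pixel-wise image differencing with a threshold. This is not the pilot's Spark algorithm; it assumes two co-registered, single-band images given as equally sized nested lists, and the threshold value is arbitrary. The bounding box it returns is in pixel coordinates and would still need converting to geographic coordinates before storage in a GIS store such as Strabon.

```python
from typing import Optional


def detect_changes(before: list[list[float]], after: list[list[float]],
                   threshold: float) -> list[tuple[int, int]]:
    """Return (row, col) of pixels whose absolute intensity change exceeds threshold."""
    if len(before) != len(after) or any(len(b) != len(a) for b, a in zip(before, after)):
        raise ValueError("images must have the same shape (and be co-registered)")
    return [
        (r, c)
        for r, (row_b, row_a) in enumerate(zip(before, after))
        for c, (vb, va) in enumerate(zip(row_b, row_a))
        if abs(va - vb) > threshold
    ]


def bounding_box(pixels: list[tuple[int, int]]) -> Optional[tuple[int, int, int, int]]:
    """Smallest (min_row, min_col, max_row, max_col) box around changed pixels, or None."""
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    previous = [[10, 10, 10], [10, 10, 10]]
    current = [[10, 90, 10], [10, 80, 12]]
    changed = detect_changes(previous, current, threshold=20)
    print(changed, bounding_box(changed))
    # [(0, 1), (1, 1)] (0, 1, 1, 1)
```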
BigDataEurope Activities
Free Workshops, Hangouts & Webinars
2nd round of Societal Workshops
• Transport: 22 September 2016, Brussels; collocated with Big Data for Transport, TISA workshop
• Food & Agri: 30 September 2016, Brussels; collocated with the DG AGRI WP2018-20 stakeholder consultation
• Energy: 4 October 2016, Brussels; collocated with the EC H2020 Info Day on “Smart Grids and Storage”
• Climate: 11 October 2016, Brussels; collocated with the Melodies Project event “Exploiting Open Data”
• Security: 18 October 2016, Brussels; standalone workshop
• Societies: 5 December 2016, Cologne; collocated with EDDI16, the 8th Annual European DDI User Conference
• Health: 9 December 2016, Brussels; standalone workshop
Other Activities
• A fresh set of 7 societal workshops in 2017
• Various SC-focused and general hangouts; follow us!
o Apache Flink & BDE (20 Oct): available online
o More to follow!
o Keep track via the Events section of the BDE website
BDI Platform – A Demo
Demonstrating the ease of use in deploying custom instances of the BDI Platform
WEB: www.big-data-europe.eu EMAIL: [email protected]
BIG DATA INTEGRATOR
www.github.com/big-data-europe
PROJECT COORDINATION (Fraunhofer IAIS)
Prof. Sören Auer (auer © cs.uni-bonn · de), Dr. Simon Scerri (scerri © cs.uni-bonn · de)
EIS Department/Group, Fraunhofer IAIS & CS Department, University of Bonn, Bonn, Germany
Questions & Contacts
#BigDataEurope
Fraunhofer IAIS leads the Fraunhofer Big Data Alliance