apache big_data europe event: "demonstrating the societal value of big & smart data...


Upload: bigdataeurope

Post on 22-Jan-2017






Page 1: Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smart Data Management"



Apache Big_Data Europe, Seville, 14 November 2016


Talk outline

• The BigDataEurope Project & Mission
• The Big Data Integrator (BDI) platform
• 7 Pilots for the 7 Societal Challenge Domains
• A look into the BDI platform [DEMO]
• Collocated Event – Today @ 16:30



Supporting the Societal Domains with Big Data Technology

BigDataEurope Project



BigDataEurope Action

• EC Horizon 2020 Coordination & Support Action
  o ~€5 million, 2015-2017
• Show societal value of Big Data
  o Across all societal challenges addressed by H2020
• Lower barrier for using big data technologies
  o Effort and resources to convert tools and workflows
  o Skills and expertise
• Help establish data value chains across domains & organisations



Stakeholder Engagement Cycle

• Present action, showcase deployments
• Raise awareness about BDE results, what they mean for stakeholders
• Collect requirements to drive further development

[Timeline: M6, M12, M18, M24, M30]


Data Value Chain Evolution


[Figure: data value chain stages: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis. Other labels: Data Repositories; Linked Open Data; Food, Societies, Climate, Energy; OS Solutions, Big Data Stacks]


Variety – The most neglected V?

• Data Source Heterogeneity
• Lack of interoperability/semantics

Source: Gesellschaft für Informatik


A flexible, generic platform for (Big) Data Value Chain Deployment

Big Data Integrator



Big Data Integrator

• Prototype developed by BDE
  o Incorporates existing BD technology
  o Facilitates integration and deployment
• Main points of the architecture
  o Dockerization
  o Support layer, including integrated UI
  o Semantification layer


Generic Architecture


• Plug-and-play BD Platform
• Cloud-deployment ready
• Domain independent, Customisable
• Stacks Open Source solutions

BDI Prototype Releases

1. [July 2016]

2. December 2016

3. ….


Docker containers


• Docker offers lightweight virtualization
  o Containers can be shared/provisioned on different Linux variations/versions
• Identical base system
  o Not required
• All BDI components
  o Docker containers


BDI Docker Containers (so far)


• Data serving: HDFS, Cassandra, 4store, PostGIS, Strabon, Elasticsearch, Hive, Semagrow
• Processing: Spark, Flink, SANSA
• Stream ingestion middleware: Flume, Kafka


BDI Instances – An example


• Processing and storage components
  o Re-used existing Docker containers (where available)
  o Dockerized by BDE otherwise
  o Ensuring all can be provisioned through Docker Swarm
• Other BDI Components:
  o Support Layer
  o Semantic Layer


Support Layer


• Integrator UI
  o Web UIs from BDE dockers (including 3rd party components) follow these BDE stylesheets
• Stack Monitor App
  o Configure Stack order
• Swarm UI
  o Launch, Install and Manage Stacks



Semantic Layer


• Semantic Data Lakes
  o Minimal ingestion pre-processing
  o Semantic layer maintains metadata
  o Add meaning when retrieving/processing
• Ongoing Research for Semantic Big Data & Analytics

[Figure labels: Data Lake (scalable unstructured data store); Relationship definitions and metadata; Knowledge Graphs]
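The semantic data lake idea on this slide (store data with minimal pre-processing, keep the metadata in a separate layer, add meaning at retrieval time) can be illustrated with a small sketch. This is not BDI code: the store, the source names, the `ex:` vocabulary terms and the function `read_with_semantics` are all hypothetical.

```python
RAW_STORE = [
    # Records are kept as ingested, with source-specific field names.
    {"source": "sensors_a", "data": {"temp_c": 21.5, "ts": "2016-11-14T10:00"}},
    {"source": "sensors_b", "data": {"temperature": 19.0, "time": "2016-11-14T10:05"}},
]

# Metadata held by the semantic layer: a per-source mapping from raw field
# names to shared vocabulary terms (all names here are hypothetical).
MAPPINGS = {
    "sensors_a": {"temp_c": "ex:temperature", "ts": "ex:observedAt"},
    "sensors_b": {"temperature": "ex:temperature", "time": "ex:observedAt"},
}


def read_with_semantics(store, mappings):
    """Apply the mappings at read time; the stored records stay untouched."""
    for record in store:
        mapping = mappings[record["source"]]
        # Fields without a mapping keep their raw name.
        yield {mapping.get(k, k): v for k, v in record["data"].items()}
```

Both sources can then be read through the same term, for example `[r["ex:temperature"] for r in read_with_semantics(RAW_STORE, MAPPINGS)]`, while the raw records remain as ingested.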


Semantic Layer tools


• BDE tooling for Semantic Data Lake:
  o Swagger: Semantics of RESTful APIs
  o Semantic Analytics Stack (SANSA): Distributed data processing over large-scale Knowledge Graphs
  o Semagrow: SPARQL over Big Data stores
  o Ontario: Querying over Semantic Data Lakes


More Information

• Big Data Integrator:

• README includes extensive documentation, instructions and information on supported components
• “Integrators at Work! Real-Life Applications of Apache Big Data Components” @ 4:30 PM
  o Includes more details & demo



Demonstrating the Societal Value through 7 Pilot ‘Real-world’ use-cases

BigDataEurope Pilots



Pilots: Overview

• SC1: Health & Pharm.
• SC2: Food & Agr.
• SC3: Energy
• SC4: Transport
• SC5: Climate
• SC6: Social Sciences
• SC7: Security


7 Pilots

◎ BDI Platform Instantiations
  o Allow end-users to easily deploy functionality in their own system environment
  o Modularized Docker approach: easier to replace components
  o Reduces effort to keep 3rd party software updated & integrated

◎ 7 Societal Challenge Pilots
  o Aligned with the 7 European Commission H2020 Societal Challenges
  o Real-world use-cases (Data, Objectives, Solutions)
  o Some pilots have different data & objectives but a similar solution



SC1: Pharmacology research



Life Sciences & Health

• Query a large number of datasets, some large
• Existing elaborate ingestion and homogenization by Open PHACTS
• Extensive toolset developed by OPF and others

Objective: Large-scale heterogeneous pharma-research data linking & integration


SC1: Architecture & Components


• Replicate Open PHACTS functionality on the BDE infrastructure using OS solutions
• Based on Virtuoso, proprietary distributed database
• Apply to other domains (e.g. Agriculture)
• Porting to BDI gives flexibility and enables new functionalities
• Logging & system health monitoring


SC2: Viticulture resources


Food and Agriculture

Objective: Automate publication ingestion and thematic classification

• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services



SC2: Architecture & Components

• BDI deployed as an external infrastructure for processing text (viticulture publications)

• Storing and processing text at a larger scale than AgInfra can currently manage


SC3: Predictive maintenance



• Wind turbine monitoring applies computational models to sensor data streams
• Models are re-parameterized weekly using that week's data from multiple turbines

Objective: Real-time turbine monitoring stream processing and analytics



SC3: Architecture & Components

• Existing in-house non-scalable solution for model parameterization
  o Reliable Fortran software for data analysis
  o Efficient, but not scalable to data volume
• Developing a BDI orchestrator
  o Re-uses existing software unmodified
  o Makes it easy to apply in parallel to many datasets and manage the outputs
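The orchestrator idea for SC3 (run existing analysis software unmodified over many datasets in parallel and collect the outputs) can be sketched roughly as follows. This is an illustration, not the BDE orchestrator: the names `run_one`, `run_many` and `STAND_IN` are hypothetical, and it assumes the tool takes a dataset path as its last argument and prints its result to stdout.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command, dataset):
    """Run the unmodified external tool on one dataset and capture its output.

    Assumption: the tool accepts the dataset path as its last argument
    and prints its result to stdout.
    """
    done = subprocess.run(
        [*command, dataset], capture_output=True, text=True, check=False
    )
    return {
        "dataset": dataset,
        "ok": done.returncode == 0,
        "output": done.stdout.strip(),
        "error": done.stderr.strip(),
    }


def run_many(command, datasets, workers=4):
    """Apply the same tool to many datasets in parallel, keyed by dataset."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda d: run_one(command, d), datasets)
        return {r["dataset"]: r for r in results}


# Stand-in for the real analysis binary: a one-line Python program that
# echoes its argument. Replace with the actual executable in practice.
STAND_IN = [sys.executable, "-c", "import sys; print('analysed ' + sys.argv[1])"]
```

Threads are enough here because the heavy work happens in the child processes; a call such as `run_many(STAND_IN, ["turbine_a.dat", "turbine_b.dat"])` returns one result record per dataset, including failures.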


SC4: Traffic conditions estimation



• Combines:
  o Traffic modelling from historical data
  o Current measurements from a taxi fleet of 1200 vehicles

Objective: Estimation of real-time traffic conditions in Thessaloniki



SC4: Architecture & Components

• New Flink implementations of map matching and traffic prediction algorithms
• BDI provides access to varied data sources
  o PostGIS database with city map
  o Elasticsearch database of historical data
  o Kafka stream of real-time data
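Map matching, mentioned for SC4, assigns each GPS measurement to a road on the map. The sketch below shows only the simplest geometric variant, snapping a point to the nearest road segment. It is not the pilot's Flink implementation; it assumes planar coordinates, and the road ids and function names are hypothetical.

```python
import math


def project_onto_segment(p, a, b):
    """Return the point on segment a-b closest to p (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a  # degenerate segment: both ends coincide
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment ends
    return (ax + t * dx, ay + t * dy)


def match_point(p, segments):
    """Snap p to the nearest road segment.

    segments maps a (hypothetical) road id to a pair of end points.
    Returns (road_id, snapped_point, distance), or None if there are no segments.
    """
    best = None
    for road_id, (a, b) in segments.items():
        q = project_onto_segment(p, a, b)
        d = math.dist(p, q)
        if best is None or d < best[2]:
            best = (road_id, q, d)
    return best
```

Production map matching usually also takes road topology and the vehicle's trajectory into account; nearest-segment snapping alone can pick the wrong road near junctions.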


SC5: Climate modelling



• Preparing modelling experiments
  o Slicing, transforming, combining datasets
  o Submission and retrieval from modelling
• Discovering and re-using previously computed derivatives
  o Lineage annotation: computed derivatives from datasets and model parameters
  o Finding appropriate past runs avoids repeating weeks-long modelling runs

Objective: Supporting data-intensive climate research


SC5: Architecture & Components

• BDI offers:
  o Hive for managing data in a way that can be retrieved and manipulated, rather than file blocks
  o Cassandra stores structured and textual metadata for searching headers and lineage
• Existing infrastructure; stable, reliable software for parallel computation of models
  o BDI is deployed as an external infrastructure for preparing and managing datasets
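The lineage idea in SC5 (find a past run with the same input datasets and model parameters instead of repeating it) can be sketched with a simple keyed lookup. This is an assumption-laden illustration: the in-memory `LineageIndex` stands in for the metadata store, and the dataset ids, parameter names and result locations are hypothetical.

```python
import hashlib
import json


def lineage_key(dataset_ids, parameters):
    """Build a stable key from the input datasets and model parameters."""
    payload = json.dumps(
        {"datasets": sorted(dataset_ids), "parameters": parameters},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LineageIndex:
    """In-memory stand-in for a metadata store of past modelling runs."""

    def __init__(self):
        self._runs = {}

    def record(self, dataset_ids, parameters, result_location):
        """Remember where the result of a finished run is stored."""
        self._runs[lineage_key(dataset_ids, parameters)] = result_location

    def find(self, dataset_ids, parameters):
        """Return the stored result location, or None if no run matches."""
        return self._runs.get(lineage_key(dataset_ids, parameters))
```

Sorting the dataset ids and the JSON keys makes the key independent of the order in which inputs and parameters are listed, so an equivalent experiment description finds the earlier run.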


SC6: Municipality budgets


Social Sciences

• Ingestion of budget and budget execution data
• Multiple municipalities in varied formats and data models

Objective: Homogenized budgetary data made available for analysis and comparison



SC6: Architecture & Components

• BDI deployed as ingestion and storage infrastructure for external tools
  o Homogenizes variety of data (JSON, CSV, XML, etc.)
  o Exposes data as SPARQL endpoint serving homogenized data
• Existing analytics and visualization tools
  o Use SPARQL queries to retrieve only the relevant slices of the overall data
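The homogenization step in SC6 (budget data arriving as JSON, CSV and XML, brought into one shape) might look like the sketch below. The input layouts, the field names and the target record shape are all assumptions made for illustration; the pilot publishes the result through a SPARQL endpoint, which this sketch does not cover.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def _record(municipality, year, category, amount):
    """Common target shape; the field names are illustrative assumptions."""
    return {
        "municipality": municipality,
        "year": int(year),
        "category": category,
        "amount": float(amount),
    }


def from_json(text):
    """Hypothetical JSON layout: a list of objects with city/year/category/amount."""
    return [
        _record(r["city"], r["year"], r["category"], r["amount"])
        for r in json.loads(text)
    ]


def from_csv(text):
    """Hypothetical CSV layout with its own column names."""
    return [
        _record(r["municipality"], r["fiscal_year"], r["item"], r["euros"])
        for r in csv.DictReader(io.StringIO(text))
    ]


def from_xml(text):
    """Hypothetical XML layout: <entry town=".." year=".."> elements."""
    return [
        _record(
            e.get("town"), e.get("year"), e.findtext("category"), e.findtext("amount")
        )
        for e in ET.fromstring(text).iter("entry")
    ]
```

Each source format gets its own small adapter, so adding a municipality with a new layout means adding one function rather than touching the analysis tools.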


SC7: Change detection & verification


Secure Societies

• Events are extracted from text published by news agencies and on social networking sites
• Events are geo-located and relevant changes are detected by comparing current and previous satellite images

Objective: Detect and Verify Events based on Satellite Imagery, News and Social Media



SC7: Architecture & Components

[Figure labels: Event Detection; Change Detection]

• Re-implementation of change detection algorithms for Spark
• Parallel orchestrator for text analytics
  o Re-uses existing software
  o Scales to many input streams
• BDI provides:
  o Cassandra for text content and metadata
  o Strabon GIS store for detected change location
  o Homogeneous access to both for analysis and visualization
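Change detection in SC7 compares a current satellite image with a previous one. As a rough illustration of the principle only, the sketch below flags pixels whose intensity differs by more than a threshold. It is plain image differencing on nested lists, not the pilot's Spark algorithms, and the function names and threshold are hypothetical.

```python
def changed_pixels(previous, current, threshold):
    """Return (row, col) positions whose intensity changed by more than threshold.

    previous and current are equally sized 2D lists of numbers.
    """
    if len(previous) != len(current) or any(
        len(p) != len(c) for p, c in zip(previous, current)
    ):
        raise ValueError("images must have the same shape")
    return [
        (r, c)
        for r, (prev_row, cur_row) in enumerate(zip(previous, current))
        for c, (p, q) in enumerate(zip(prev_row, cur_row))
        if abs(q - p) > threshold
    ]


def change_ratio(previous, current, threshold):
    """Fraction of pixels flagged as changed (0.0 for empty images)."""
    total = sum(len(row) for row in previous)
    return len(changed_pixels(previous, current, threshold)) / total if total else 0.0
```

Real satellite imagery would first need co-registration and radiometric correction so that the two images are comparable pixel by pixel; the sketch assumes that has already been done.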


Free Workshops, Hangouts & Webinars

BigDataEurope Activities



2nd round of Societal Workshops


Domain     Date               Location  Notes
Transport  22 September 2016  Brussels  Collocated with Big Data for Transport, Tisa workshop
Food&Agri  30 September 2016  Brussels  Collocated with DG AGRI WP2018-20 stakeholder consultation
Energy     4 October 2016     Brussels  Collocated with EC H2020 Info Day on “Smart Grids and Storage”
Climate    11 October 2016    Brussels  Collocated with Melodies Project Event – Exploiting Open Data
Security   18 October 2016    Brussels  Standalone Workshop
Societies  5 December 2016    Cologne   Collocated with EDDI16, 8th Annual European DDI User Conference
Health     9 December 2016    Brussels  Standalone Workshop


Other Activities

• Fresh set (7) of Societal Workshops in 2017
• Various SC-focussed and general hangouts, follow!
  o Apache Flink & BDE (20 Oct) – available online
  o More to follow!
  o Keep track on BDE Website (Events)



Demonstrating the ease-of-use in deploying custom instances of the BDI Platform

BDI Platform – A Demo



WEB: www.big-data-europe.eu EMAIL: [email protected]




Questions & Contacts

Prof. Sören Auer, auer © cs.uni-bonn · de
Dr. Simon Scerri, scerri © cs.uni-bonn · de
EIS Department/Group, Fraunhofer IAIS & CS Department Uni-Bonn, Bonn, Germany

leads the Fraunhofer Big Data Alliance