BDE SC1 Webinar: Open PHACTS Re-engineered with Big Data Europe

Posted on 08-Apr-2017


BIG DATA EUROPE

H2020 CSA (2015-17)

SC1 – HEALTH CHALLENGE WEBINAR

Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges

April 4th 2017

Kiera McNeice, Ronald Siebes, Hajira Jabeen and Nick Lynch

BigDataEurope

5-Apr-17 www.big-data-europe.eu

The 7 Societal Challenges and their first pilots

SC1: Life Sciences & Health


SC2: Food & Agriculture


Partners: FAO, the largest autonomous agency within the United Nations system and one of the main players in the agricultural information community.

Big Data Focus area: Large-scale distributed agricultural data integration

Selected Key Data assets: INFOODS, AQUASTAT, Green Learning Network (GLN), Agricultural Bibliography Network (ABN), AgroVoc, AquaMaps, Fishbase

Semantic Web Company (SWC) is a technology provider headquartered in Vienna (Austria). SWC supports organizations from all industrial sectors worldwide in improving their information management. Its core product extracts meaning from big data by making use of linked data technologies.

Agroknow is a company that captures, organizes and adds value to the rich information available in agricultural and food sciences, in order to make it universally accessible, useful and meaningful.


Pilot focus area: Viticulture (from the Latin word for vine) is the science, production, and study of grapes. It deals with the series of events that occur in the vineyard.


Pilot 2: Support advanced crop data discovery, processing, combining and visualization from distributed and heterogeneous data repositories

Reasons:

• Vine and wine sector: an emerging market in the EU

• Sustainability and biodiversity challenges: local varieties are being lost

• Exploitation of new grapevine varieties and clones in terms of climate change adaptation

• Quality and health status of viticultural products

• Contribution to human health (antioxidants, prevention of heart disease, etc.)

• Wide variety of heterogeneous (and big) data from various information sources

SC3: Energy


Partners: A public entity supervised by the Ministry of Environment, Energy and Climate Change in Greece, founded in September 1987, active in the fields of Renewable Energy Sources (RES), Rational Use of Energy (RUE) and Energy Saving (ES).

Big Data Focus area: Real-time turbine monitoring stream processing and analytics

Selected Key Data assets: European Energy Exchange Data, smart meter sensor data, gas/fuels market/price data, consumption statistics, stratigraphic model data (geology, geophysics)

NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.


Pilot focus area: System monitoring in energy production units.


Pilot 3: Operation, maintenance and production forecasting for wind turbines based on real-time sensor data.

Reasons:

• Current technology is not able to deal with the full amount of available, valuable data

• Economic benefit of predicting output and preventing damage (if a part that is about to fail can be identified in advance, damage to other parts can be prevented)

• Large continuous stream of sensor data, perfect to test our platform
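To make "real-time turbine monitoring stream processing" concrete, here is a minimal Python sketch of flagging unusual readings in a sensor stream. It assumes a single numeric sensor channel, a fixed-size rolling window and a z-score threshold; the function name `flag_anomalies`, the window size and the threshold are illustrative assumptions, not the pilot's actual method.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def flag_anomalies(
    readings: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings that deviate strongly from recent history.

    Hypothetical rule: a reading is anomalous when it lies more than `threshold`
    population standard deviations from the mean of the previous `window` readings.
    """
    history = deque(maxlen=window)  # keeps only the most recent `window` readings
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # Perfectly flat history: any change at all is unusual.
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > threshold
            if anomalous:
                yield i, value
        history.append(value)


# Example: a steady vibration signal with one spike at index 7.
stream = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 5.0, 1.0, 1.02]
print(list(flag_anomalies(stream)))  # [(7, 5.0)]
```

Because the function is a generator over an iterable, the same code works on an unbounded stream: each reading is inspected once, and memory stays bounded by the window size.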

SC4: Transport


Partners: The Fraunhofer Society is a German research organization with 67 institutes spread throughout Germany, each focusing on different fields of applied science.

Big Data Focus area: Streaming sensor network & geo-spatial data integration

Selected Key Data assets: GTFS data, OSM/LinkedGeoData, MobilityMaps, Transport sensor data, ROSATTE Road safety attributes, European Road Data Infrastructure - EuroRoadS

The Centre for Research and Technology-Hellas (CERTH), founded in 2000, is one of the leading research centres in Greece. CERTH includes the Hellenic Institute of Transport (HIT): Land, Sea and Air Transportation as well as Sustainable Mobility services.

ERTICO - ITS Europe is a partnership of around 100 companies and institutions involved in the production of Intelligent Transport Systems (ITS).


Pilot focus area: Info mobility and traffic planning


Pilot 4: Multisource data collection for the provision of accurate info-mobility and advanced transport planning services in Thessaloniki, Greece

Reasons:

• Congestion is a major problem in Europe, especially in urban areas.

• Utilizing real-time probe data for the provision of accurate info-mobility services and advanced transport planning leads to better decisions.

• The use of mobility data coming from multiple sources presents significant challenges, especially because the datasets differ both in content and in spatio-temporal terms, and because the data should be collected and processed in real time.
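The last reason, combining probe data that differs in content and in spatio-temporal terms, can be illustrated with a minimal Python sketch. It assumes hypothetical probe records (source, map-matched road-segment ID, Unix timestamp, speed in km/h), bins them into fixed time windows per segment regardless of source, and flags windows whose mean speed falls below a threshold. The record layout, the 5-minute window and the 20 km/h threshold are illustrative assumptions, not the pilot's design.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class ProbeRecord:
    source: str      # hypothetical source names, e.g. "taxi_gps" or "bluetooth"
    segment_id: str  # road segment the probe was map-matched to
    timestamp: int   # Unix time in seconds
    speed_kmh: float


def congestion_by_window(
    records: List[ProbeRecord], window_s: int = 300, threshold_kmh: float = 20.0
) -> Dict[Tuple[str, int], Tuple[float, bool]]:
    """Return {(segment_id, window_start): (mean_speed, congested)}."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for r in records:
        # Records from all sources share the same (segment, time window) key,
        # which is what fuses the heterogeneous feeds.
        window_start = r.timestamp - r.timestamp % window_s
        buckets[(r.segment_id, window_start)].append(r.speed_kmh)
    result: Dict[Tuple[str, int], Tuple[float, bool]] = {}
    for key, speeds in buckets.items():
        avg = mean(speeds)
        result[key] = (avg, avg < threshold_kmh)
    return result


records = [
    ProbeRecord("taxi_gps", "seg-1", 1000, 12.0),
    ProbeRecord("bluetooth", "seg-1", 1100, 18.0),
    ProbeRecord("taxi_gps", "seg-2", 1050, 45.0),
]
print(congestion_by_window(records))
# {('seg-1', 900): (15.0, True), ('seg-2', 900): (45.0, False)}
```

In a real-time setting the same keying would be applied incrementally by a stream processor rather than over a list held in memory.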

SC5: Climate


Partners: A public entity supervised by the Ministry of Environment, Energy and Climate Change in Greece, founded in September 1987, active in the fields of Renewable Energy Sources (RES), Rational Use of Energy (RUE) and Energy Saving (ES).

Big Data Focus area: Enormous simulation time. Extremely complicated computing model.

Selected Key Data assets: European Grid Infrastructure (EGI). Access to several data centres hosted at CNRS-Lyon, NCSR-D Athens, INFN-Milan, NIKhEF-Amsterdam.

NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.


Pilot focus area: Supporting data-intensive climate research


Pilot 5: Downscaling and retrieval process on (raw) climate data via user-defined parameters (e.g. geographical areas, time period, physical variables, computational grids, time steps)

Reasons:

• The provision of climate model data serves an important objective: assessing the potential impacts of climate change on well-being, in support of adaptation, prevention and mitigation measures and of other policy-making decisions.

• Awareness of these impacts has led to the availability of huge datasets.

• Downscaling is a computationally intensive process.
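The retrieval half of the pilot, selecting climate data through user-defined parameters, can be sketched in Python. Downscaling itself is model-specific and is not shown. The sketch assumes a hypothetical in-memory dataset keyed by (variable, day, latitude, longitude); the `Request` fields, the example coordinates and the variable names "tas" (near-surface air temperature) and "pr" (precipitation) are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical in-memory dataset: {(variable, day, lat, lon): value}
Grid = Dict[Tuple[str, date, float, float], float]


@dataclass
class Request:
    """User-defined retrieval parameters (field names are illustrative)."""
    variable: str
    start: date
    end: date
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def retrieve(grid: Grid, req: Request) -> List[Tuple[date, float, float, float]]:
    """Return (day, lat, lon, value) points inside the requested subset, sorted."""
    out = []
    for (var, day, lat, lon), value in grid.items():
        if (
            var == req.variable
            and req.start <= day <= req.end          # time period
            and req.lat_min <= lat <= req.lat_max    # geographical area
            and req.lon_min <= lon <= req.lon_max
        ):
            out.append((day, lat, lon, value))
    return sorted(out)


grid: Grid = {
    ("tas", date(2017, 1, 1), 38.0, 23.5): 9.1,
    ("tas", date(2017, 1, 2), 38.0, 23.5): 8.7,
    ("tas", date(2017, 1, 1), 52.0, 13.4): -1.2,  # outside the requested area
    ("pr", date(2017, 1, 1), 38.0, 23.5): 3.4,    # different physical variable
}
req = Request("tas", date(2017, 1, 1), date(2017, 1, 31), 34.0, 42.0, 19.0, 29.0)
print(retrieve(grid, req))
```

A production service would push such filters down into the storage or array library instead of scanning every point, but the parameters play the same role.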

SC6: Social Sciences


Partners: CESSDA provides large-scale, integrated and sustainable data services to the social sciences. CESSDA is organised as a limited company under Norwegian law, owned and financed by the individual EU member states’ ministries of research or a delegated institution.

Big Data Focus area: Statistical and research data linking & integration

Selected Key Data assets: Federated social sciences data catalogs, statistical data from public data portals and statistical offices (e.g. Eurostat, UNESCO, World Bank)

NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.


Pilot focus area: Citizens' budget spending at municipal level


Pilot 6: Citizens' budget at municipal level

Reasons:

• Budget: the most important document of public policy

• Budget execution affects everyday lives

• Citizens are more involved at city level

• A platform that integrates heterogeneous budget data (many municipalities have their own data formats) and generates infographics would benefit citizens, the research community and policy makers
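Integrating budget data when "many municipalities have their own data formats" usually means mapping each format onto one common schema before any aggregation. A minimal Python sketch of that idea follows; the municipality names, column names and the common schema (category, amount_eur, year) are hypothetical, and the amount parsing assumes a decimal point or comma with no thousands separators.

```python
from typing import Dict, List

# Hypothetical column mappings from each municipality's own format onto one
# common schema (category, amount_eur, year).
COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "municipality_a": {"Category": "category", "Amount (EUR)": "amount_eur", "Year": "year"},
    "municipality_b": {"cat": "category", "value": "amount_eur", "fiscal_year": "year"},
}


def harmonize(municipality: str, rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Rename source columns to the common schema and normalize value types."""
    mapping = COLUMN_MAPS[municipality]
    records: List[Dict[str, object]] = []
    for row in rows:
        record: Dict[str, object] = {target: row[source] for source, target in mapping.items()}
        # Assumes a decimal comma or point and no thousands separators.
        record["amount_eur"] = float(str(record["amount_eur"]).replace(",", "."))
        record["year"] = int(str(record["year"]))
        record["municipality"] = municipality
        records.append(record)
    return records


def total_by_category(records: List[Dict[str, object]]) -> Dict[str, float]:
    """Sum harmonized amounts per budget category, across municipalities."""
    totals: Dict[str, float] = {}
    for r in records:
        category = str(r["category"])
        totals[category] = totals.get(category, 0.0) + float(str(r["amount_eur"]))
    return totals


rows_a = [{"Category": "Roads", "Amount (EUR)": "1200.50", "Year": "2016"}]
rows_b = [{"cat": "Roads", "value": "300,25", "fiscal_year": "2016"}]
records = harmonize("municipality_a", rows_a) + harmonize("municipality_b", rows_b)
print(total_by_category(records))  # {'Roads': 1500.75}
```

Once every source is in the common schema, aggregates such as per-category totals (the raw material for infographics) can be computed uniformly.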

SC7: Security


Partners: The Centre supports the decision making of the European Union in the field of the Common Foreign and Security Policy (CFSP) by providing products and services resulting from the exploitation of relevant space assets and collateral data, including satellite imagery and aerial imagery, and related services.

NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.


Big Data Focus area: Image data analysis

Selected Key Data assets: Earth Observation data (e.g. Very High Resolution Satellite Imagery acquired from commercial providers and governmental systems) and collateral data for supporting CFSP/CSDP missions and operations


Pilot focus area: Getting insight into man-made surface changes triggered by automatic detection, news, or social media information


Pilot 7: Ingestion of remote sensing images and social sensing data to detect and verify man-made changes on the Earth's surface for security applications:

• Evacuation route planning

• Monitoring of critical infrastructures

• Border security

Reasons:

• Satellite image data is HUGE and computationally intensive to compare

• Smart ‘focus’ algorithms are needed to prioritize the analysis jobs
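Prioritizing analysis jobs, the second reason, is at heart a priority-queue problem. The Python sketch below gives each candidate area a score computed from the triggering signals named in the pilot focus (automatic detection, news, social media) and hands out the highest-scoring image-comparison job first. The weights, area IDs and the `JobQueue` class are hypothetical; this is not the pilot's 'focus' algorithm, only an illustration of job prioritization with Python's standard `heapq`.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical weights for the signals that can trigger a change-detection job.
WEIGHTS: Dict[str, float] = {"automatic_detection": 0.5, "news": 0.3, "social_media": 0.2}


class JobQueue:
    """Max-priority queue of image-analysis jobs.

    heapq is a min-heap, so scores are stored negated; a running counter breaks
    ties so that equally scored jobs come out in submission order.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()

    def submit(self, area_id: str, signals: Dict[str, float]) -> float:
        score = sum(WEIGHTS.get(name, 0.0) * strength for name, strength in signals.items())
        heapq.heappush(self._heap, (-score, next(self._counter), area_id))
        return score

    def next_job(self) -> str:
        _, _, area_id = heapq.heappop(self._heap)
        return area_id


q = JobQueue()
q.submit("border-crossing-17", {"news": 1.0, "social_media": 1.0})
q.submit("port-area-3", {"automatic_detection": 1.0, "news": 1.0})
q.submit("rural-area-9", {"social_media": 0.5})
order = [q.next_job() for _ in range(3)]
print(order)  # ['port-area-3', 'border-crossing-17', 'rural-area-9']
```

Expensive comparisons of huge satellite images are then spent first on the areas with the strongest combined evidence of change.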

Big Data Europe Integrator Platform

Dr Hajira Jabeen, University of Bonn

SC1 Webinar

Platform Goals

◎Open source

◎Simple to get started with Big Data

◎Support a variety of use cases

◎Embrace emerging Big Data technologies

◎Simple integration with custom components

Key actors

Platform Architecture

Support Layer: Init Daemon, GUIs, Monitor

App Layer: Traffic Forecast, Satellite Image Analysis, ...

Platform Layer: Spark, Flink, Kafka, Real-time Stream Monitoring, ...

Semantic Layer: Ontario, SANSA, Semagrow

Resource Management Layer (Swarm)

Hardware Layer: Premises, Cloud (AWS, GCE, MS Azure, …)

Data Layer: Hadoop, NOSQL Store, Cassandra, Elasticsearch, RDF Store, ...

Supported Frameworks

Search/indexing: Apache Solr

Data acquisition: Apache Flume

Message passing: Apache Kafka

Data storage: Hue, Apache Cassandra, ScyllaDB, Apache Hive, PostGIS

Data processing: Apache Spark, Apache Flink

Semantic Components: Strabon, Sextant, GeoTriples, Silk, SEMAGROW, LIMES, 4Store, OpenLink Virtuoso

BDI Stack Lifecycle

◎ Deploy BDE Platform/Stack to the Cluster

◎ Stack/Cluster Monitor

◎ Developing Custom Applications

◎ Docker Images

◎ BDI Stack (workflow) builder

◎ Custom Components

o Init Daemon

o Integrator UI

◎ High level picture

o docker-compose.yml describes the pipeline topology

◎ BDE-provided components

o extend the template image with your code

◎ New components

o build a Docker image for your component

o this is your own little "virtual machine" for your component

◎ Sharing

o publish the topology as a git repository

o publish new components on Docker Hub

Platform development

Actors

◎ Cluster Setup

◎ Developer

◎ Packaging

◎ Stack Composition / Integration

◎ Deployment

◎ Monitoring

Development

◎ Base Docker images

o Serve as a template for a (Big Data) technology

o Easily extendable with a custom algorithm/data

◎ Published components

o Image repositories on GitHub

o Automated builds on Docker Hub

o Documentation on the BDE Wiki

Deploying a Big Data Stack

◎ Stack

o a collection of communicating components

o to solve a specific problem

◎ Described in Docker Compose

o Component configuration

o Application topology
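Because a stack is described in Docker Compose, its application topology can be read straight from the file. The Python sketch below assumes the compose file has already been parsed into a dict (for example with a YAML library, not shown) and uses hypothetical service and image names, not published BDE images. It checks that every `depends_on` entry names a defined service and returns the dependency edges.

```python
from typing import Dict, List, Tuple

# A compose file as it might look after YAML parsing. Service and image names
# are hypothetical placeholders.
compose = {
    "version": "2",
    "services": {
        "namenode": {"image": "example/hadoop-namenode", "environment": {"CLUSTER_NAME": "demo"}},
        "datanode": {"image": "example/hadoop-datanode", "depends_on": ["namenode"]},
        "spark-master": {"image": "example/spark-master", "depends_on": ["namenode"]},
        "my-app": {"image": "example/my-spark-app", "depends_on": ["spark-master", "datanode"]},
    },
}


def topology(compose_doc: dict) -> List[Tuple[str, str]]:
    """Return (service, dependency) edges; fail on references to undefined services."""
    services: Dict[str, dict] = compose_doc["services"]
    edges: List[Tuple[str, str]] = []
    for name, spec in services.items():
        # Iterating works for both the list form of depends_on and the mapping
        # form (service name -> condition), since iterating a dict yields its keys.
        for dep in spec.get("depends_on", []):
            if dep not in services:
                raise ValueError(f"{name!r} depends on undefined service {dep!r}")
            edges.append((name, dep))
    return edges


print(topology(compose))
# [('datanode', 'namenode'), ('spark-master', 'namenode'), ('my-app', 'spark-master'), ('my-app', 'datanode')]
```

Keeping configuration (`image`, `environment`) and topology (`depends_on`) in one file is what lets a stack be shared as a single git repository.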


Enhancing the Components

◎ Orchestrator required for the initialization process (init_daemon)

o Components may depend on each other

o Components may require manual intervention

◎ User Interface Integration

o Standard interfaces from components

o Combine and align the interfaces
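The orchestration problem described for the init_daemon (start components in dependency order, pause where a step needs manual intervention) can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). This is not the BDE init daemon's implementation or API; the component names, the `manual` set and the `confirm` callback are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List, Set


def startup_plan(
    depends_on: Dict[str, Iterable[str]],
    manual: Set[str],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Return the order in which components are started.

    Components listed in `manual` are only started once `confirm(name)` returns
    True, standing in for a human operator approving the step.
    """
    sorter = TopologicalSorter(depends_on)  # maps component -> its prerequisites
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    started: List[str] = []
    while sorter.is_active():
        # Every component in this batch has all its prerequisites started;
        # sorting keeps the plan deterministic.
        for name in sorted(sorter.get_ready()):
            if name in manual and not confirm(name):
                raise RuntimeError(f"manual step for {name!r} was not confirmed")
            started.append(name)
            sorter.done(name)
    return started


deps = {
    "hdfs-datanode": {"hdfs-namenode"},
    "spark-master": {"hdfs-namenode"},
    "data-loader": {"hdfs-datanode"},
    "spark-job": {"spark-master", "data-loader"},
}
plan = startup_plan(deps, manual={"data-loader"}, confirm=lambda name: True)
print(plan)  # ['hdfs-namenode', 'hdfs-datanode', 'spark-master', 'data-loader', 'spark-job']
```

Components in the same batch could be started in parallel; the sketch starts them one after another for simplicity.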


User Interfaces

◎ Target: facilitate use of the platform

o User interface adaptation

◎ Available interfaces

o Workflow UIs

❖ Workflow Builder

❖ Workflow Monitor

o Swarm UI

o Integrator UI

BDE Workflow Builder

BDE Workflow Monitor

Swarm UI

Integrator UI

Beyond the state of the art ...

Smart Big Data

Increase the value of Big Data by adding meaning to it!


Semantic Data Lake (Ontario)

◎Data Swamp

o Repository of data in its raw format

o Structured, semi-structured, unstructured

o Schema-less

◎Data Lake

o Add a Semantic layer on top of the source datasets

o The data is semantically lifted using existing ontology terms
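One way to picture "semantic lifting using existing ontology terms" is a small Python sketch that maps fields of a raw, schema-less record onto existing vocabulary terms and emits RDF as N-Triples. The field-to-property mapping, the choice of schema.org terms and the placeholder base IRI `http://example.org/person/` are assumptions for illustration; the slides do not describe how Ontario performs the lifting.

```python
from typing import Dict, List

# Hypothetical mapping from raw field names to existing ontology terms (schema.org).
FIELD_TO_PROPERTY: Dict[str, str] = {
    "name": "http://schema.org/name",
    "email": "http://schema.org/email",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PERSON_CLASS = "http://schema.org/Person"
BASE = "http://example.org/person/"  # placeholder namespace


def escape_literal(value: str) -> str:
    """Escape the characters N-Triples forbids unescaped inside a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def lift(record: Dict[str, str]) -> List[str]:
    """Turn one raw record (with an 'id' field) into N-Triples lines."""
    subject = f"<{BASE}{record['id']}>"
    triples = [f"{subject} <{RDF_TYPE}> <{PERSON_CLASS}> ."]
    for field, value in record.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue  # unmapped fields stay in the raw layer, untouched
        triples.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return triples


for line in lift({"id": "42", "name": "Ada Lovelace", "email": "ada@example.org", "shoe_size": "38"}):
    print(line)
```

The raw record itself is left unchanged; the lifted triples form the semantic layer on top of it, and fields without a mapping simply are not lifted.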


SANSA Stack

Thank you

https://github.com/big-data-europe

jabeen@iai.uni-bonn.de

BDE vs Hadoop distributions (Hortonworks / Cloudera / MapR / Bigtop / BDE)

File system: HDFS / HDFS / NFS / HDFS / HDFS

Installation: Native / Native / Native / Native / lightweight virtualization

Plug & play components (no rigid schema): no / no / no / no / yes

High availability: single failure recovery (YARN) / single failure recovery (YARN) / self-healing, multiple failure recovery / single failure recovery (YARN) / multiple failure recovery

Cost: Commercial / Commercial / Commercial / Free / Free

Scaling: Freemium / Freemium / Freemium / Free / Free

Addition of custom components: Not easy / No / No / No / Yes

Integration testing: yes / yes / yes / yes / --

Operating systems: Linux / Linux / Linux / Linux / All

Management tool: Ambari / Cloudera Manager / MapR Control System / - / Docker Swarm UI + Custom

BDE vs Hadoop distributions

◎ BDE is not built on top of existing distributions

◎ Targets

o Communities

o Research institutions

◎ Bridges scientists and open data

◎ Multi-tier research efforts towards Smart Data

Stian Soiland-Reyes, University of Manchester

Nick Lynch, CTO Open PHACTS Foundation

4 Apr 2017


Summary


• Update on Docker and Open PHACTS

• Learnings & transition to AWS

• Next Steps & Future Releases

Open PHACTS @dockerhub


https://hub.docker.com/r/openphacts/

Open PHACTS Next Steps

• Data refresh planned for API 2.2:

– Phase 1: ChEMBL, WikiPathways, UniProt + Chemistry refreshed (RDF and linksets)

– Phases 2 & 3: Remaining data sources

– Build data refresh processes

• Wider Architecture Review

• Science and Open PHACTS Webinar:

– Science and Open PHACTS: Workflow tools for Life Science Research

– https://register.gotowebinar.com/register/2550359383420450817

Open PHACTS

• Custom Data Staging:

– Different licensing options to cover Annotated SureChEMBL for members/non-members

• Microservices?

– Part of the architecture review to discuss future services/API

– Interested in others' experiences with this

• Workflow

– BioExcel workflow blocks in development

– See Bio.tools
