Advances with respect to the State Of The Art atmosphere-eubrazil.eu

ATMOSPHERE - Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring, Hybrid Ecosystem for REsilient Cloud Computing (2017-2019) is a Research and Innovation Action funded by the European Commission under the Horizon 2020 Programme (Call identifier: H2020-EUB-2017, grant agreement No. 777154, topic: EUB-1-2017 Cloud Computing, including security aspects) and by the Secretary of Politics of Informatics (SEPIN) of the Brazilian Ministry of Science and Technology (MCTI) under the corresponding matching Brazilian Call for proposals: 4ª Chamada Coordenada de Programa de Cooperação Brasil-União Europeia em Tecnologias da Informação e Comunicação - TIC.

This document contains information on core activities, findings, and outcomes of the ATMOSPHERE project. Any reference to content from the project website or its documents should clearly indicate the authors, source, organisation and date of publication.

The document has been produced with the co-funding of the European Commission and the Secretary of Politics of Informatics of Brazil. The content of this publication is the sole responsibility of the ATMOSPHERE consortium and cannot be considered to reflect the views of the European Commission nor the Secretary of Politics of Informatics of Brazil.


Table of Contents

ATMOSPHERE Consortium
Glossary
Foreword by Ignacio Blanquer & Francisco Brasileiro
Summary of ATMOSPHERE
ATMOSPHERE Targets
Progress with respect to SOTA by components
Progress in TMA
Progress in TDPS
Progress in TDMS
Progress in IMS
Overall progress with respect to SOTA


ATMOSPHERE Consortium

ATMOSPHERE is led by Ignacio Blanquer (Full Professor at Universitat Politècnica de València, Spain) and Francisco Brasileiro (Full Professor at Universidade Federal de Campina Grande, Brazil). ATMOSPHERE brings together 15 institutions from Europe and Brazil to collaborate on designing and implementing a framework and platform that rely on lightweight virtualization, hybrid resources and federated infrastructures across Europe and Brazil to develop, build, deploy, measure and evolve trustworthy, cloud-enabled applications.

[Figure: logos of the consortium partners, each labelled as European or Brazilian]


Glossary

ATMOSPHERE: Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring, Hybrid Ecosystem for REsilient Cloud Computing
CTIC: Centro de pesquisa e desenvolvimento em tecnologias digitais para informação e comunicação
CLUES: Cluster Energy Savings
VALLUM: Core component of the ATMOSPHERE TDMS
EC3: Elastic Compute Clusters in the Cloud
EC: European Commission
Fogbow: Framework for federating clouds
GDPR: General Data Protection Regulation
GPU: Graphics Processing Unit
ICT: Information and communications technology
IaaS: Infrastructure as a Service
IMS: Infrastructure Management Services
IM: Infrastructure Manager
INCT: Instituto Nacional de Ciência e Tecnologia
K8S: Kubernetes
LGDP: Lei Geral sobre a Proteção de Dados
LEMONADE: Live Exploration and Mining of massive Amounts of Data coming from Everywhere
MCTIC: Ministério da Ciência, Tecnologia, Inovações e Comunicações
CNPq: Conselho Nacional de Desenvolvimento Científico e Tecnológico
NoSQL: Not Only SQL
ODBC/JDBC: Open DataBase Connectivity / Java Database Connectivity
PAF: Privacy Assessment Framework
RNP: Rede Nacional de Ensino e Pesquisa
SCONE: Secure Linux Containers with SGX
SGX: Software Guard Extensions
SOTA: State of the Art
TOSCA: Topology and Orchestration Specification for Cloud Applications
IM-TOSCA: TOSCA-compliant version of Infrastructure Manager
TEE: Trusted Execution Environment
TDMS: Trustworthy Data Management Services
TDPS: Trustworthy Data Processing Services
TMA: Trustworthy Monitoring and Assessment
VM: Virtual Machine


Foreword by Ignacio Blanquer & Francisco Brasileiro

With the increasing trend towards data processing in the cloud (79% in 2019 [1] and 94% expected in 2021 [2]), the need for security, privacy, fairness, and transparency in big data applications looms ever larger in the public consciousness. Big data applications should comply with security and other trustworthiness properties, namely privacy, fairness, and transparency, to avoid collateral damage. This is where ATMOSPHERE comes into play.

Over a 24-month period, the ATMOSPHERE project (www.atmosphere-eubrazil.eu), funded by the European Commission and the Brazilian Government, designed and developed a set of toolboxes and federation services to build and evaluate trustworthy data analytics applications in federated clouds. The ATMOSPHERE platform makes life easier for different experts dealing with digital data. As a result, data owners, system administrators, application developers and managers, and data scientists can now develop more trustworthy and secure cloud computing applications while complying with data protection regulations on both sides of the Atlantic.

Under the experienced management and technical coordination of the Universitat Politècnica de València and the Universidade Federal de Campina Grande, the fifteen partners of the ATMOSPHERE consortium developed eight services which, together, support an entirely new spectrum of trustworthy services usable by a diverse set of organisations from different business sectors: DNAt, Fogbow, IM-TOSCA/EC3, LEMONADE, Capacity Planner, SCONE, TMA and Vallum.

[1] Source: RightScale 2019 - State of the Cloud Report
[2] Source: Cisco Global Cloud Index: Forecast and Methodology, 2016-2021 White Paper


Figure 1: The ATMOSPHERE EU-Brazil international use case improving the trustworthiness of distributed big data applications

An additional service, the RHD screening Artificial Intelligence tool, focuses on the health sector. It processes large sets of medical images, together with their metadata and clinical information, efficiently and securely, and can evaluate trustworthiness in terms of performance, privacy, availability, robustness, and dependability, leading to better and quicker disease diagnosis. ATMOSPHERE assets are already being used by research and business organisations in both Europe and Brazil, including Dell, Vodane, Talkdesk, INDRA, the EGI Foundation and the Brazilian National Education and Research Network (RNP). In addition, funded initiatives such as PRIMAGE, RESECA-CPS, EOSC Synergy and TETRAMAX will apply the services in their work plans, and others are expected to use the assets, particularly those available in the European Open Science Cloud (EOSC) marketplace portal.

ATMOSPHERE constitutes the latest step in a long trajectory of Europe-Brazil collaborative projects in cloud computing that started back in 2010. EUBrazil Cloud Connect (eubrazilcloudconnect.eu) laid the basis for creating federated infrastructures for scientific collaborations between Europe and Brazil. EUBraBIGSEA (www.eubra-bigsea.eu) created a data analytics platform to build data science applications on top of cloud resources. SecureCloud (securecloudproject.eu) built the basic foundations for the secure access and processing of data, which are used in SCONE. Finally, ATMOSPHERE provided the concept of trustworthiness and implemented the means to measure, monitor, assess and improve it for data analysis applications.

Furthermore, ATMOSPHERE delivered a "Final Research & Innovation Research Priorities" report, proposing three research topics for future joint initiatives between the regions, with high potential economic impact in both. This analysis is a new iteration of the one performed by the EUBrasilCloudFORUM project (eubrasilcloudforum.eu), which also produced a Final Research Roadmap between Brazil and Europe in 2017 that was transferred to and updated by ATMOSPHERE.


The Cloudscape Brazil series, a forum to showcase success stories between Brazil and Europe, was initiated by the EUBrazil Cloud Connect project and later transferred to EUBrasilCloudFORUM and then to ATMOSPHERE. Continuing the legacy of connecting ICT experts from both sides of the Atlantic, the Cloudscape Brazil editions organised by ATMOSPHERE were the meeting point for business people, public sector representatives, research scientists and developers behind some of the most exciting developments in cloud technologies from both regions. The event has facilitated consensus on actions that matter for the European and Brazilian economies and societies, and has connected innovative ICT SMEs to promote international collaboration with the potential to have a positive impact on local citizens.

Figure 2: The EU-Brazil long story of ICT collaboration

The trans-oceanic federated infrastructure built by ATMOSPHERE, which has resulted in technology transfer to new initiatives and industry players, could not have been set up without a long history of collaboration between Europe and Brazil. This collaboration has benefited both societies at large, and it has been an honour for us to play such an active role in contributing to society. We look forward to the new opportunities that will arise in the future to continue the legacy created over the past decade.

Ignacio Blanquer & Francisco Brasileiro
Full Professor at Universitat Politècnica de València (Spain) & Full Professor at Universidade Federal de Campina Grande (Brazil)

[Figure 2 labels: Federation capabilities (Fogbow) to work in a trans-continental scenario; High-level Data Analytics Framework (LEMONADE), a TOSCA orchestrator (IM) and a performance modelling system (DAGSIM); Basic foundations (SCONE) for the secure access and processing of data; Cloudscape Brazil & Research Roadmap]


Summary of ATMOSPHERE

Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring, Hybrid Ecosystem for REsilient Cloud Computing (2017-2019), hereinafter "ATMOSPHERE", is a 24-month Research and Innovation Action funded by the European Commission under the H2020 Programme (Call identifier: H2020-EUB-2017, Grant Agreement No 777154, topic: EUB-1-2017 Cloud computing, including security aspects) and by the Secretary of Politics of Informatics (SEPIN) of the Brazilian Ministry of Science and Technology (MCTI) under the corresponding matching Brazilian Call for proposals: 4ª Chamada Coordenada Programa de Cooperação Brasil-União Europeia em Tecnologias da Informação e Comunicação - TIC.

ATMOSPHERE has designed and developed a framework and a platform to implement trustworthy cloud services on a federated intercontinental resource pool. Trust in a cloud environment is considered as the reliance of a customer on a cloud service and, consequently, on its provider. ATMOSPHERE focuses on a broad spectrum of trustworthiness properties and their measures, such as Security, Privacy, Coherence, Isolation, Stability, Fairness, Transparency and Dependability. Based on this definition of trust in cloud computing, trustworthiness can be defined as the worthiness of a service and its provider for being trusted.

ATMOSPHERE supports the development, build, deployment, measurement and adaptation of trustworthy cloud resources, data management and processing services, demonstrated on a sensitive scenario of distributed telemedicine, achieving the following three technical results:

• A hybrid federated VM and container platform;
• A development framework with four sets of services:
    • Infrastructure Management Services (Cloud Computing Platform)-IMS;
    • Trustworthy Monitoring and Assessment Framework-TMA;
    • Trustworthy Distributed Data Management Services-TDMS;
    • Trustworthy Data Processing Services-TDPS.
• A pilot use case on Medical Imaging Processing.


ATMOSPHERE Targets

The roles of the potential target users of the ATMOSPHERE platform, each denoted by an avatar, and the layers of the platform described in the previous section are indicated below.

Application Developers: the technical experts who use the software libraries and services of the ATMOSPHERE platform to build trustworthy cloud applications.

Application Managers: the users who hold infrastructure credentials and deploy a specific application and the ATMOSPHERE services on top of an ATMOSPHERE infrastructure. They also manage the needs of the users with respect to the application.

Data Scientists: the users of both the final applications and the high-level Trustworthy Data Processing Services deployed on top of the ATMOSPHERE platform by Application Managers.

Data Owners: those who upload and share their data on the Trustworthy Data Management Services of the ATMOSPHERE platform.

Site Admins: those who install and configure the Infrastructure Management Services on their cloud sites to implement the federation.

[Figure: layers of the ATMOSPHERE platform, in the order shown: Application; Trustworthy Data Processing Services (TDPS); Trustworthy Data Management Services (TDMS); Infrastructure Management Services (IMS); Federated Infrastructure]


Progress with respect to SOTA by components

1. Progress in TMA

The Trustworthy Monitoring and Assessment (TMA) component provides a quantitative evaluation of the trustworthiness of a service or application by composing the metrics obtained from different services and components. TMA defines Quality Models, which are hierarchical representations of several entities. Multiple Quality Models can be combined to evaluate aggregated, higher-level metrics, taking into account customisable weights.
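As an illustration only, the weighted composition of metrics in a hierarchical Quality Model could be sketched as follows. The `Metric` and `Attribute` classes, the weights and the sample scores are hypothetical and are not taken from the TMA implementation; scores are assumed to be normalised to the range [0, 1] and combined by a weighted average.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Metric:
    """Leaf of the hierarchy: one normalised measurement in [0, 1] (assumed)."""
    name: str
    value: float

    def score(self) -> float:
        return self.value


@dataclass
class Attribute:
    """Inner node: weighted average of its children (metrics or attributes)."""
    name: str
    children: List[Tuple[float, Union["Attribute", Metric]]] = field(default_factory=list)

    def add(self, weight: float, child: Union["Attribute", Metric]) -> "Attribute":
        self.children.append((weight, child))
        return self

    def score(self) -> float:
        total_weight = sum(weight for weight, _ in self.children)
        if total_weight == 0:
            raise ValueError(f"{self.name} has no weighted children")
        return sum(weight * child.score() for weight, child in self.children) / total_weight


# Hypothetical model: two lower-level models combined into a higher-level one.
privacy = Attribute("privacy").add(1.0, Metric("reidentification_safety", 0.9))
performance = (
    Attribute("performance")
    .add(2.0, Metric("response_time", 0.6))
    .add(1.0, Metric("throughput", 0.9))
)
trustworthiness = Attribute("trustworthiness").add(3.0, privacy).add(1.0, performance)

print(round(performance.score(), 3), round(trustworthiness.score(), 3))
```

Changing a weight changes how strongly a branch influences the aggregated score, which is the role the customisable weights play in this sketch.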

The TMA is a service relevant to application managers, who can monitor the application and virtual infrastructure; application developers, who can develop callbacks to adapt the applications when trustworthiness thresholds are not met; data scientists, who can gather information on the high-level trustworthiness metrics related to stability and fairness; and data owners, who can define and evaluate the privacy reidentification risks of the datasets they want to share.

Despite the existing monitoring solutions, it is hard to monitor cloud health without ATMOSPHERE, as measures typically come from different metrics that are not linked to one another. There is a lack of data that represents the health of the application together with that of the cloud services. Moreover, adaptations in response to a decline in the trustworthiness of an application are manual and potentially error-prone. Finally, all these tasks are time-consuming.

Figure 2: Issues (left) and solutions (right) addressed by TMA.

ATMOSPHERE provides an integrated, detailed and configurable measure of the application cloud health through navigable quality models that aggregate the information from the infrastructure services, platform services and application services. Moreover, the TMA can trigger adaptation plans without the need for human interaction to mitigate even complex scenarios such as anomalies in the execution time, lack of accuracy, reduced isolation or unacceptable reidentification risks. ATMOSPHERE TMA reduces the time needed for human interaction and provides historical data on the status of the application.
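The idea of triggering an adaptation plan when a trustworthiness score falls below a threshold, while keeping a history of the scores, might be sketched as below. This is a minimal illustration under stated assumptions: `ThresholdMonitor`, `on_below`, `report` and `scale_out` are hypothetical names, and the code does not reproduce the TMA interfaces.

```python
from typing import Callable, Dict, List, Tuple

# An adaptation plan receives the metric name and the offending score.
AdaptationPlan = Callable[[str, float], None]


class ThresholdMonitor:
    """Hypothetical monitor: stores scores and runs plans on threshold violations."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Tuple[float, AdaptationPlan]]] = {}
        self.history: List[Tuple[str, float]] = []

    def on_below(self, metric: str, threshold: float, plan: AdaptationPlan) -> None:
        """Register a plan to run whenever `metric` drops below `threshold`."""
        self._rules.setdefault(metric, []).append((threshold, plan))

    def report(self, metric: str, score: float) -> int:
        """Record a score, run every violated plan, return how many were run."""
        self.history.append((metric, score))
        triggered = 0
        for threshold, plan in self._rules.get(metric, []):
            if score < threshold:
                plan(metric, score)
                triggered += 1
        return triggered


actions: List[str] = []


def scale_out(metric: str, score: float) -> None:
    # Placeholder for a real adaptation, e.g. requesting more resources.
    actions.append(f"scale_out because {metric}={score:.2f}")


monitor = ThresholdMonitor()
monitor.on_below("performance", 0.7, scale_out)
monitor.report("performance", 0.9)   # above the threshold: no plan runs
monitor.report("performance", 0.55)  # below the threshold: scale_out runs
print(actions)
```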


2. Progress in TDPS

The Trustworthy Data Processing Services (TDPS) of ATMOSPHERE provide a framework to implement trustworthy data analytics applications in the cloud. The TDPS provides means for annotating the legal framework for data and services, and a programming environment that generates executable code for data analytics from graphic workflows (Live Exploration and Mining of A Non-trivial Amount of Data from Everywhere - LEMONADE). LEMONADE provides Data Scientists and Application Developers with measures for fairness, explainability, privacy and stability of Machine Learning Applications, integrated in a Quality Model. Application Managers can link application back-ends to LEMONADE for the execution of data analytics with automatic management of resources through horizontal elasticity.

Without ATMOSPHERE, developers and data scientists find it hard to determine whether models use sensitive attributes or discriminate (i.e., are unfair). Models may embed too much private information from the training data, increasing the risk of indirectly exposing such data through the models. Additionally, model-based estimations and predictions are black boxes, and data scientists have a hard time debugging and calibrating them. Developers and data scientists must therefore determine on their own whether sensitive attributes are the basis of models and audit their fairness. Developers must also code for themselves the mechanisms for understanding the trade-off between bias and variance, as well as the model's stability.

Figure 3: Issues (left) and solutions (right) addressed by TDPS.

With ATMOSPHERE, data scientists can easily develop data processing workflows without requiring programming skills and incorporate components that implement the evaluation of fairness and stability. Application developers can leverage several stability, explanation evaluation and fairness evaluation mechanisms and incorporate them into their applications in a straightforward way. Moreover, data scientists get both model outcomes and the respective explanations, which can be used to debug and calibrate models more easily.
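To make the notion of a fairness evaluation concrete, the sketch below computes one common fairness measure, the demographic parity difference (the gap between the highest and lowest positive-prediction rates across groups defined by a sensitive attribute). The choice of this measure, the function name and the sample data are assumptions made for illustration; the source does not state which fairness measures LEMONADE implements.

```python
from typing import Dict, Sequence


def demographic_parity_difference(predictions: Sequence[int], groups: Sequence[str]) -> float:
    """Gap between the highest and lowest positive-prediction rate across groups.

    `predictions` holds binary model outputs (0 or 1); `groups` holds the value of
    a sensitive attribute for each prediction. 0.0 means all groups receive
    positive predictions at the same rate.
    """
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")

    rates: Dict[str, float] = {}
    for group in set(groups):
        member_predictions = [p for p, g in zip(predictions, groups) if g == group]
        rates[group] = sum(member_predictions) / len(member_predictions)
    return max(rates.values()) - min(rates.values())


# Hypothetical data: group "a" gets 3 of 4 positives, group "b" gets 1 of 4.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(preds, sensitive)
print(gap)
```

A gap close to zero suggests that the model's positive predictions are spread evenly across the groups; a large gap is a signal to audit the model further, not proof of discrimination on its own.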


3. Progress in TDMS

The Trustworthy Data Management Services (TDMS) in ATMOSPHERE include a set of services and components to securely store and access sensitive information even in untrusted cloud services (Vallum). The TDMS includes mechanisms for Data Scientists and Application Managers to store data encrypted on disk, process them encrypted in memory and guarantee that only authorised processes can access such data, preventing even users with administrative credentials from accessing data on disk or in memory. Application Developers can leverage the TDMS for the secure storage of and access to sensitive data. The TDMS provides Data Owners and Data Scientists with means for privacy attestation and annotation of legal grounds.

Without ATMOSPHERE, Application Managers have to trust the administrators of the infrastructure provider, as they may be able to access data volumes or dump the memory of a running Virtual Machine. When a data owner shares access to valuable but sensitive data for training AI models, access tracking is limited, as credentials are typically granted at the level of the user, who could then perform any kind of processing. Data owners and data scientists do not know whether operations may lead to privacy violations.

Figure 4: Issues (left) and solutions (right) addressed by TDMS.

By using ATMOSPHERE, a data owner can share sensitive data with a data scientist while explicitly limiting which applications may use the data and where those applications may run (e.g. only on trustworthy providers). By running in enclaves, the information remains encrypted in memory, so privacy is preserved even in untrusted cloud offerings. Data owners are aware of the risk of privacy violation associated with a given operation and may request countermeasures to prevent its execution. Finally, by means of the Privacy Access Forms application, both the data owner and the data scientist can annotate the legal grounds that require the processing of the data, even in an international scenario.
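The kind of restriction a data owner places on a dataset (which applications may use it and where they may run) can be pictured as a simple policy check. The sketch below is purely illustrative: `DataPolicy`, `AccessRequest`, `is_authorised` and all sample names are hypothetical and do not describe the Vallum or SCONE interfaces, and a real deployment would be expected to verify the application and its enclave cryptographically rather than trust self-reported fields.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical policy a data owner attaches to a sensitive dataset."""
    allowed_applications: FrozenSet[str]
    allowed_providers: FrozenSet[str]
    require_enclave: bool = True


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical description of who asks for the data and where it runs."""
    application: str
    provider: str
    runs_in_enclave: bool


def is_authorised(policy: DataPolicy, request: AccessRequest) -> bool:
    """Grant access only if every constraint set by the data owner is satisfied."""
    if request.application not in policy.allowed_applications:
        return False
    if request.provider not in policy.allowed_providers:
        return False
    if policy.require_enclave and not request.runs_in_enclave:
        return False
    return True


policy = DataPolicy(
    allowed_applications=frozenset({"image-classifier"}),
    allowed_providers=frozenset({"trusted-site-a"}),
)

ok = is_authorised(policy, AccessRequest("image-classifier", "trusted-site-a", True))
wrong_app = is_authorised(policy, AccessRequest("bulk-export", "trusted-site-a", True))
no_enclave = is_authorised(policy, AccessRequest("image-classifier", "trusted-site-a", False))
print(ok, wrong_app, no_enclave)
```

Attaching the policy to the dataset rather than to a user account reflects the contrast drawn above with user-level credentials, which allow any kind of processing once granted.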


4. Progress in IMS

The Infrastructure Management Services (IMS) of ATMOSPHERE is a framework for cloud federation, cloud orchestration and performance modelling that provides the upper layers of ATMOSPHERE with a reliable and efficient framework for running distributed cloud applications in international collaborations. The IMS includes a lightweight cloud federation framework (Fogbow) that can be used to deploy cloud applications across several cloud sites, using a federated private network among them, easing the work of site administrators when managing distributed infrastructures. On top of this federated cloud, the IMS provides a cloud orchestrator (IM-TOSCA) that can be used to deploy virtual infrastructures described as code. This way, application managers can easily deploy applications and their associated dependencies in the cloud without being tightly bound to the back-ends. The work of application managers in monitoring the applications is reduced, as performance models characterise their expected behaviour, so deviations can be detected automatically.

Without ATMOSPHERE, site administrators have to configure cloud federation resources manually. This requires close coordination among site administrators to federate cloud sites and to allow private connectivity between resources of different sites. Application managers have to configure the applications manually or develop scripts to automate their configuration; such scripts are typically bound to platform specificities, which reduces repeatability and portability. Finally, applications run without a priori performance guarantees, which requires extra effort by the application managers to monitor them and manually fine-tune the allocated resources if applications have strict deadlines for their execution.

Figure 5: Issues (left) and solutions (right) addressed by IMS.

ATMOSPHERE provides templates for the automatic deployment and adaptation of complex and distributed applications on cloud resources and network federations. The cloud federation frees site administrators from the burden of managing credentials of external users, relying instead on trust in the application managers, and provides the application managers with a federated private network, which reduces the need for public IPs, which are both a scarce resource and a vulnerability risk. Finally, performance modelling can determine the right allocation of resources, minimising waste while still achieving the results by a given deadline.
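As a rough illustration of how a performance model can guide resource allocation for a deadline, the sketch below picks the smallest number of nodes whose predicted execution time meets the deadline. The model (a fixed serial part plus a perfectly parallel part divided by the number of nodes) is a deliberately simple assumption made for this example, and the function names are hypothetical; it is not the performance modelling system used by ATMOSPHERE.

```python
from typing import Optional


def predicted_time(serial_s: float, parallel_s: float, nodes: int) -> float:
    """Assumed model: serial work is fixed, parallel work scales ideally with nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return serial_s + parallel_s / nodes


def smallest_allocation(
    serial_s: float, parallel_s: float, deadline_s: float, max_nodes: int
) -> Optional[int]:
    """Return the fewest nodes that meet the deadline, or None if none can."""
    for nodes in range(1, max_nodes + 1):
        if predicted_time(serial_s, parallel_s, nodes) <= deadline_s:
            return nodes
    return None


# Hypothetical job: 60 s of serial work and 600 s of parallelisable work.
nodes_needed = smallest_allocation(serial_s=60, parallel_s=600, deadline_s=200, max_nodes=16)
infeasible = smallest_allocation(serial_s=60, parallel_s=600, deadline_s=50, max_nodes=16)
print(nodes_needed, infeasible)
```

Choosing the smallest feasible allocation, rather than the largest available one, is what avoids wasting resources in this sketch; the second call shows that some deadlines cannot be met however many nodes are added.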


Overall progress with respect to SOTA

The whole ATMOSPHERE platform constitutes a novel approach for the complete management of trustworthiness in cloud applications related to data analytics, providing components for secure storage and processing, user-friendly building of data analytics pipelines, efficient execution and legal compliance support in a cloud-agnostic and federated environment. A summary of the benefits for the four user roles is provided in Figure 6.

Figure 6: Issues (left) and solutions (right) addressed by ATMOSPHERE.


ATMOSPHERE is funded by the European Union under the Cooperation Programme, Horizon 2020 grant agreement No. 777154. This project results from the 4th BR-EU Coordinated Call in Information and Communication Technologies (ICT), announced by the Rede Nacional de Ensino e Pesquisa (RNP) and the Ministério da Ciência, Tecnologia, Inovações e Comunicações (MCTIC), under cooperation agreement number 51119.