
Page 1 of 23

SEVENTH FRAMEWORK PROGRAMME

Scalable, Secure Storage Biobank

Grant Agreement Number: 317871

BiobankCloud Security:

D3.2, Security Toolset Design

Final Version: 0.9
Responsible Partner: Ali Gholami, KTH
Date: 2013-11-29

Ref. Ares(2013)3610758 - 02/12/2013


Project and Deliverable Information Sheet

Scalable Secure Storage Biobank Project
Project Ref. №: 317871
Project Title: Scalable, Secure Storage Biobank
Project Web Site: http://www.biobankcloud.eu
Deliverable ID: D3.2
Deliverable Nature: Report
Deliverable Level*: PU
Contractual Date of Delivery: 30 / November / 2013
Actual Date of Delivery: 29 / November / 2013
EC Project Officer: Wolfgang Treinen
Partner Responsible: Ali Gholami, KTH
Contributing Partners: KTH & KI & Charité

* The dissemination levels are indicated as follows: PU – Public, RE – Restricted to other participants, CO – Confidential, only for members of the project (including the Commission Services).

Document Status Sheet

Version | Date | Description | Author/Partner
0.1 | 2013-06-10 | Initial version, TOC | Ali Gholami/KTH
0.2 | 2013-07-02 | Security Requirements | Ali Gholami/KTH
0.3 | 2013-09-25 | Conceptual Architecture | Ali Gholami/KTH
0.4 | 2013-09-30 | Logical Architecture | Ali Gholami/KTH
0.5 | 2013-10-14 | Physical Architecture | Ali Gholami/KTH
0.6 | 2013-11-13 | Fixed comments of internal WP3 meetings | Ali Gholami, Jim Dowling/KTH, Roxana Merino Martinez, Jane Reichel/KI, Lora Dimitrova/Charité
0.7 | 2013-11-22 | General comments on the design | Ulf Leser/Humboldt University, Jane Reichel/KI
0.8 | 2013-11-25 | General comments on the design | Roxana Merino Martinez/KI
0.9 | 2013-11-29 | Final version | Ali Gholami/KTH


Contents

List of Tables
List of Acronyms and Abbreviations
EXECUTIVE SUMMARY
1. Introduction
2. Security Requirements of the BiobankCloud
2.1 Security Principles
2.2 Security Policies
2.2.1 Identification
2.2.2 Authentication
2.2.3 Authorization
2.2.4 Auditing
3. Conceptual Architecture
3.1 Identity Management System
3.1.1 Identities Store
3.1.2 Federated Identities Services
3.1.3 Identity Mapping
3.1.4 Identity Administration
3.1.5 Provisioning Services
3.2 Access Control Management
3.2.1 Request Structure
3.2.2 Policy Administration Point
3.2.3 Policy Decision Point
3.2.4 Policy Enforcement Point
3.2.5 Access Control Table
3.3 Auditing
3.3.1 Security Events Logger
3.3.2 Audit Management
4. Logical Security Architecture
4.1 Identity Management System
4.1.1 Identity Management Services
4.1.1.1 Identity Administration
4.1.1.2 User Manager
4.1.2 Directory Services
4.1.3 Provisioning Services
4.2 Access Control Management
4.2.1 PAP
4.2.2 PDP
4.2.3 PEP
4.2.4 Security Login Handler
4.3 Auditing
4.3.1 Event Suppliers
4.3.2 Events Management Service
4.3.3 Event Consumer
5. Physical Security Architecture
5.1 IdM System
5.2 Authorization System
5.3 Auditing System
6. Conclusions and Future Work
References


List of Figures

Figure 1, Conceptual Architecture of the IdM System
Figure 2, Conceptual Architecture of the Access Control System
Figure 3, Conceptual Architecture of the Auditing System
Figure 4, Logical Architecture of the IdM System
Figure 5, LDAP Namespace of the BiobankCloud
Figure 6, Access Control Management Logical Architecture
Figure 7, Auditing Logical Architecture
Figure 8, Physical Security Architecture

List of Tables

Table 1, The BiobankCloud Roles and Groups
Table 2, LDAP Structure of the BiobankCloud
Table 3, Common Log Event Contents

List of Acronyms and Abbreviations

AAA Authentication, Authorization, Auditing
ABAC Attribute-Based Access Control
ACL Access Control List
ACT Access Control Table
API Application Programming Interface
CA Certificate Authority
CRL Certificate Revocation List
DN Distinguished Name
DPD Data Protection Directive
EC European Commission
EU European Union
HDFS Hadoop Distributed File System
HOTP HMAC-Based One-Time Password
HTTPS Hypertext Transfer Protocol Secure
IdM Identity Management
IDS Intrusion Detection System
JSON JavaScript Object Notation
KI Karolinska Institutet
KTH Kungliga Tekniska Högskolan
LDAP Lightweight Directory Access Protocol
MR MapReduce
NDB Network Database
OATH Initiative for Open Authentication
O-ESA Open Enterprise Security Architecture
ORCID Open Researcher and Contributor ID
OTP One-Time Password
PaaS Platform as a Service
PAP Policy Administration Point
PDP Policy Decision Point
PEP Policy Enforcement Point
PII Personally Identifiable Information
PKI Public Key Infrastructure
RBAC Role-Based Access Control
SAML Security Assertion Markup Language
SP Service Provider
SPML Service Provisioning Markup Language
SQL Structured Query Language
SSH Secure Shell
SSL Secure Sockets Layer
SSO Single Sign-On
URI Uniform Resource Identifier
UUID Universally Unique Identifier
WP Work Package
XACML eXtensible Access Control Markup Language
XDAS Distributed Audit Service


EXECUTIVE SUMMARY

Deliverable D3.2, Security Toolset Design, presents the design of the security framework for the Scalable, Secure Storage Biobank (BiobankCloud) project. The security framework consists of several components, to be developed in WP3, that ensure the confidentiality and integrity of genomic data and the non-repudiation of access to it, where the data will be stored and processed in a platform-as-a-service (PaaS) cloud model. These security requirements stem from legislation such as the European Union (EU) Data Protection Directive (DPD), which protects the privacy of individuals within the Member States.

To address the security requirements of processing genomic data, which contains sensitive content such as personally identifiable information (PII), we define the security policies of the BiobankCloud in terms of an authentication, authorization and auditing (AAA) infrastructure to be developed in WP3.

We propose a security framework for the BiobankCloud that maps the security requirements and components to a conceptual architecture, a logical architecture and a physical architecture. The conceptual architecture describes the structure of the security services according to the requirements. The logical architecture defines the structure of the components and services, subject to the constraints defined in the conceptual architecture. The physical architecture describes the relationships between the components of the IdM, access control and auditing systems that deliver the functionality defined in the logical architecture.

These security architectures provide the design of a federated identity management (IdM) system that uses the security assertion markup language (SAML), X.509 certificates and one-time passwords (OTP) for authentication. Furthermore, the IdM system uses the service provisioning markup language (SPML) for account provisioning across the BiobankCloud resources.

Our access control system will be based on Argus, which provides attribute-based access control (ABAC) for the authorization of privileged users, combined with a role-based access control (RBAC) model. It will be deployed on MySQL Cluster network database (NDB) technology to serve high-availability authorization requests, approximately 4 × 10^9 requests/minute, for the metadata server of the Hadoop Distributed File System (HDFS), which stores big datasets.

We also design an auditing system based on the Distributed Audit Service (XDAS) standard that generates log files in JavaScript Object Notation (JSON) format from various resources in the BiobankCloud, for use in auditing. The audit information will also support the integration of intrusion detection systems (IDS) in the future.


1. Introduction

The BiobankCloud is a platform-as-a-service (PaaS) cloud that provides the capability to store and analyze big genomic data in a private cloud deployment model [1]. WP3 of the BiobankCloud addresses the security issues that prevent organizations and researchers from migrating the existing genomic data of biobanks to the cloud and exploiting the capabilities of cloud computing. This document, deliverable D3.2 of the WP3 activities, presents the security design of the BiobankCloud in the form of a security toolset design. D3.2 describes the design of an authentication, authorization and auditing (AAA) infrastructure that meets the requirements of EU Directive 95/46/EC on the protection of personal data (EU DPD).

We propose a security framework that ensures legitimate access to the BiobankCloud services and genomic data, according to the security requirements of WP1 and WP6 [2,3] mandated by the EU DPD. Deliverable D3.1, state of the art in cloud computing security, provided guidelines and recommendations to be investigated in the design of the security framework [4]. Furthermore, our design approach is based on the architectural guidelines described in the Open Enterprise Security Architecture (O-ESA) [6]. The rest of this document is structured as follows. Section 2 describes the security requirements of the project and the corresponding policies. Section 3 sketches the conceptual architecture of the different components. Section 4 discusses the logical architecture of the security framework. Section 5 describes the physical architecture of all elements. Finally, Section 6 presents the conclusions and future work.

2. Security Requirements of the BiobankCloud

This section describes the security principles and policies of the BiobankCloud, including data confidentiality, data integrity and non-repudiation of data access, which are to be addressed according to the recommendations of the D3.1 state-of-the-art deliverable [5].

2.1 Security Principles

Data confidentiality, data integrity and non-repudiation of access are the main security concerns in the BiobankCloud project. The BiobankCloud stores and processes genomic data that contains human genomic information, whose use requires the acquisition of informed consent (animal models and cell lines are usually considered anonymous and can be shared freely). By preserving the confidentiality of the data, we ensure that human genomic data is not available or disclosed to unauthorized users or services.

Data integrity is another aspect that must be addressed to assure the accuracy and completeness of the genomic data over its lifecycle, both in transmission and in storage. Such requirements can be defined by the controller, who delegates the data processing to the processor, as shown in Table 1.

Furthermore, non-repudiation of data access provides accountability for the actions of BiobankCloud users, so that users cannot deny the actions they performed.


The following requirements must be enforced in the BiobankCloud platform:

2.1.1 Data Confidentiality

• Enforcing the EU DPD privacy requirements by protecting personally identifiable information (PII)
• Ensuring the data retention requirements, including the retention period, data format and archiving rules, set by the controller according to the relevant laws and legislation
• Preventing unauthorized users and services from reading the data
• Granting only the least privilege required to perform a task
• Protecting data in transit against traffic interception

2.1.2 Data Integrity

• Protecting the data stored by the BiobankCloud services with strong encryption at rest
• Preventing unauthorized users and services from tampering with genomic data
• Storing digital signatures along with the original data to validate its integrity

2.1.3 Non-Repudiation of Data Access

• Deploying an audit mechanism to detect unauthorized access
• Auditing system access and operations
• Providing forensic support through the audit system

2.2 Security Policies

In this section, we describe the BiobankCloud security policies for identification, authentication, authorization and auditing that must be implemented to address the security principles described in Section 2.1.

2.2.1 Identification

The BiobankCloud users will be identified through their ORCID iD¹, an HTTP URI containing a unique 16-digit number that distinguishes researchers from each other. The URI starts with “http://orcid.org/” followed by the number, with a hyphen inserted after every 4 digits [5]. For example, “http://orcid.org/0000-0003-3133-6802” represents a researcher’s username.
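The identifier format above can be checked mechanically before an account is created. The following Python sketch is illustrative only: the function names are hypothetical, and accepting https URIs is an assumption. It relies on the fact that the last of the 16 characters is a check character computed with the ISO 7064 MOD 11-2 algorithm, so it may be "X" rather than a digit.

```python
import re
from typing import Optional

# "http://orcid.org/" followed by four hyphen-separated groups; the last group
# ends in a check character (0-9 or X). https is accepted too (assumption).
ORCID_URI = re.compile(r"^https?://orcid\.org/(\d{4})-(\d{4})-(\d{4})-(\d{3}[0-9X])$")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def parse_orcid_uri(uri: str) -> Optional[str]:
    """Return the bare ORCID iD (e.g. 0000-0003-3133-6802) or None if invalid."""
    match = ORCID_URI.match(uri)
    if match is None:
        return None
    digits = "".join(match.groups())  # 16 characters without hyphens
    if orcid_check_char(digits[:15]) != digits[15]:
        return None
    return "-".join(match.groups())
```

For the example above, `parse_orcid_uri("http://orcid.org/0000-0003-3133-6802")` returns `"0000-0003-3133-6802"`, while a mistyped last digit makes it return `None`.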

2.2.2 Authentication

Authentication is the process of verifying the identity claimed by a BiobankCloud user or service. It is therefore important to deploy strong authentication mechanisms that protect the platform against malicious attacks. To authenticate users across the platform, we use two authentication mechanisms, one based on username/password and one based on PKI X.509 certificates, for usability reasons.

The first authentication method, username/password, will be used by generic users, e.g., guest researchers, who have no administrative privileges and only run analytical tasks in the platform. However, because static passwords are a weak authentication factor, we add a second factor, a one-time password (OTP), which is more secure than a static password.

On every attempt to access the BiobankCloud services, users will be asked to authenticate with their static password and an OTP generated by a physical token or a mobile device. The OTP can be generated by a YubiKey² device, which produces high-quality random passwords based on the open authentication (OATH) standard for every login attempt. All credentials used for user authentication will be passed through secure channels such as the secure sockets layer (SSL).

¹ ORCID, http://www.orcid.org

Mobile device authentication is another possible form of multi-factor authentication, in which users receive OTPs on their mobile devices. However, this model is not as secure as using YubiKey devices, because mobile devices are vulnerable to threats such as password theft. For usability reasons, the BiobankCloud will implement this form of authentication as an extra feature, so that the controller, as the authority responsible for delegating data processing, can decide which authentication mechanism to use.
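OATH one-time passwords are commonly HMAC-based one-time passwords (HOTP, RFC 4226). As a sketch, assuming the token is configured for OATH-HOTP rather than the Yubico OTP format, the server side could compute and verify codes as follows. The function names and the look-ahead window of 3 are assumptions for illustration; the window tolerates a token whose counter advanced without a successful login.

```python
import hashlib
import hmac
import struct
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP per RFC 4226: HMAC-SHA-1 over the 8-byte big-endian counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset from the last nibble
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_hotp(secret: bytes, candidate: str, counter: int,
                look_ahead: int = 3) -> Optional[int]:
    """Check the candidate against counters counter..counter+look_ahead.

    Returns the counter value to store for the next login on success,
    or None if no counter in the window matches."""
    for c in range(counter, counter + look_ahead + 1):
        if hmac.compare_digest(hotp(secret, c), candidate):
            return c + 1
    return None
```

With the RFC 4226 test secret `b"12345678901234567890"`, `hotp(secret, 0)` yields `"755224"`; a stored counter must only ever move forward so that each OTP is accepted once.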

Furthermore, users will be given a static password that must not be lent or shared with anyone else. Each static password must contain at least 8 characters, including at least one uppercase letter, one lowercase letter, one digit and one special character such as #, !, $ or @. The static password must be stored in the system only in hashed form, and a copy will be sent directly to the user. As a security measure, a user account must be blocked after a specific number of failed password attempts. In such cases, users will be notified by letter, and a new password will be sent to them directly to reactivate the blocked account.
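The password rules above combine three mechanisms: a composition check, hashed storage and account blocking. The sketch below shows one way to implement them; the names are hypothetical, salted PBKDF2-HMAC-SHA256 is our choice of hashing scheme (the policy only says "hashed"), and the threshold of 5 failures is an assumed value for the "specific number".

```python
import hashlib
import hmac
import os
import re
from typing import Dict, Set, Tuple

SPECIAL_CHARS = "#!$@"  # the examples from the policy; a deployment would likely allow more


def meets_policy(password: str) -> bool:
    """At least 8 characters with an uppercase, lowercase, digit and special character."""
    return (
        len(password) >= 8
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[0-9]", password) is not None
        and any(ch in SPECIAL_CHARS for ch in password)
    )


def hash_password(password: str, iterations: int = 200_000) -> Tuple[bytes, int, bytes]:
    """Salted PBKDF2-HMAC-SHA256; only (salt, iterations, digest) is stored."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


class LockoutTracker:
    """Blocks an account after max_failures consecutive failed attempts."""

    def __init__(self, max_failures: int = 5) -> None:
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {}
        self.blocked: Set[str] = set()

    def attempt(self, user: str, password_ok: bool) -> bool:
        """Record one login attempt; return True only if the login may proceed."""
        if user in self.blocked:
            return False  # stays blocked until a new password is issued
        if password_ok:
            self.failures[user] = 0
            return True
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.failures[user] >= self.max_failures:
            self.blocked.add(user)
        return False
```

Storing a random salt per user means that two users with the same password get different digests, and the iteration count can be raised later without changing the stored format.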

Administrative staff in the BiobankCloud use another multi-factor mechanism based on X.509 certificates, each bound to a public/private key pair, where users protect their private keys with a password that complies with the static password policy described in this section. In X.509 user authentication, the certificate serves to identify the user, because it contains the subject’s distinguished name (DN).

X.509 certificates will be issued for a limited period of time by an accredited certification authority (CA) that the BiobankCloud trusts. Certificate owners are responsible for having their certificates revoked if they are compromised.
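Because certificates are time-limited and can be revoked, every authentication must check both the validity period and the CA's certificate revocation list (CRL). A minimal sketch, assuming the certificate has already been parsed and its signature chain to the trusted CA verified elsewhere (in practice by a TLS or PKI library); `CertificateInfo` and `certificate_usable` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import AbstractSet, Optional


@dataclass(frozen=True)
class CertificateInfo:
    """Fields already extracted from a parsed X.509 certificate (hypothetical type)."""
    subject_dn: str
    serial_number: int
    not_before: datetime
    not_after: datetime


def certificate_usable(cert: CertificateInfo, revoked_serials: AbstractSet[int],
                       now: Optional[datetime] = None) -> bool:
    """True if the certificate is inside its validity period and not on the CRL.

    Signature and CA chain validation are assumed to have been done elsewhere."""
    now = now or datetime.now(timezone.utc)
    if not (cert.not_before <= now <= cert.not_after):
        return False
    return cert.serial_number not in revoked_serials
```

Once a certificate passes these checks, its `subject_dn` is what the platform uses to identify the administrator.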

2.2.3 Authorization

To protect the confidentiality and integrity of data, we implement the BiobankCloud role model using role-based access control (RBAC). As specified by D1.1, the platform users are grouped into researcher, platform, ethics board and organization groups, each with the roles listed in Table 1:

Group | Roles
Researcher | Controller, Trusted Researcher, Guest
Platform | Administrator, Processor, Auditor, Access Committee
Ethics Board | Auditor, Guest
Organization | Guest

Table 1, The BiobankCloud Roles and Groups

² YubiKey, http://www.yubico.com


Our authorization system includes attribute-based access control (ABAC) in addition to RBAC. RBAC supports the confidentiality and integrity of the genomic data stored in the Hadoop Distributed File System (HDFS). ABAC enables fine-grained authorization for platform users with the administrator role, so that they can grant or revoke users’ privileges. To preserve the confidentiality of data, the authorization system must deny any attempt to transfer unprotected genomic data. For instance, when data is transferred between biobanks, a service validates that the genomic data is encrypted with well-known encryption standards, to ensure sufficient security.
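The combination of RBAC and an attribute rule can be sketched as two checks that must both pass. The role names come from Table 1, but the actions each role may perform are hypothetical, since the deliverable does not define them; the attribute rule encodes the requirement that unprotected genomic data must never be transferred.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set

# Hypothetical role-to-action mapping; Table 1 lists the roles, but the
# deliverable does not specify which actions each role may perform.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "Controller": {"read", "update", "execute", "transfer"},
    "Trusted Researcher": {"read", "execute"},
    "Guest": {"read"},
    "Administrator": {"grant", "revoke"},
    "Auditor": {"read_audit_log"},
}


@dataclass(frozen=True)
class AccessRequest:
    roles: FrozenSet[str]
    action: str
    data_encrypted: bool  # attribute of the resource being accessed


def rbac_allows(req: AccessRequest) -> bool:
    """Coarse-grained check: does any of the subject's roles grant the action?"""
    return any(req.action in ROLE_ACTIONS.get(role, set()) for role in req.roles)


def abac_allows(req: AccessRequest) -> bool:
    """Attribute rule from the policy: never transfer unprotected genomic data."""
    return not (req.action == "transfer" and not req.data_encrypted)


def authorize(req: AccessRequest) -> bool:
    # Both checks must pass, so the attribute rule can only narrow what RBAC grants.
    return rbac_allows(req) and abac_allows(req)
```

For example, a Controller may transfer encrypted data, but the same request is denied when `data_encrypted` is `False`, regardless of role.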

2.2.4 Auditing

The BiobankCloud requires logging all significant security events and related operations in audit trails, to prove non-repudiation of data access operations. Each user is responsible for all transactions resulting from the use of his/her account. The audit information will be stored for a period of time defined by the controller. The following events must be logged:

• User provisioning and de-provisioning
• Failed authentication and authorization attempts
• Successful logins and logouts
• Adding or deleting user roles and privileges
• Relevant privileged user activities
• Changes to sensitive information
• Data access events by users and services executing tasks on the Hadoop cluster

All audit information will be stored securely, in chronological order, in the audit repository and will be accessed for auditing purposes through a log browser, an auditing tool or an intrusion detection system (IDS) that raises alerts on suspicious or malicious behavior such as:

• Logins from different geographic locations
• Logins at unusual times, outside office hours
• Greedy browsing of the resources in the BiobankCloud
• Attempts to run tasks not defined by the policies
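The first two behaviors in the list above can be expressed as simple rules over login events. This is a sketch under stated assumptions: the office-hours range is hypothetical, the country of a login is assumed to be derived from the client address elsewhere, and `LoginEvent` and `flag_suspicious_logins` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

OFFICE_HOURS = range(7, 20)  # hypothetical: 07:00-19:59 local time counts as usual


@dataclass(frozen=True)
class LoginEvent:
    user: str
    time: datetime
    country: str  # e.g. derived from the client IP address (assumption)


def flag_suspicious_logins(events: Iterable[LoginEvent]) -> List[Tuple[str, datetime, str]]:
    """Apply two rules in chronological order: a login outside office hours,
    and a login from a different country than the user's previous login."""
    alerts = []
    last_country: Dict[str, str] = {}
    for ev in sorted(events, key=lambda e: e.time):
        if ev.time.hour not in OFFICE_HOURS:
            alerts.append((ev.user, ev.time, "login outside office hours"))
        previous = last_country.get(ev.user)
        if previous is not None and previous != ev.country:
            alerts.append((ev.user, ev.time, f"location changed from {previous} to {ev.country}"))
        last_country[ev.user] = ev.country
    return alerts
```

Sorting first matters because the location rule compares each login with the same user's previous one.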

The audit tools support queries for certain types of events or users. It is also possible to issue cross-tabular queries to find patterns and perform statistical analysis over them.
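Both kinds of query can be illustrated over parsed audit records. In the sketch below, records are plain dictionaries and the field names (`user`, `type`) are hypothetical; a selection filters on field values, and a cross-tabulation counts events per pair of field values, e.g. per user and event type.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]  # a parsed audit log record (field names are hypothetical)


def select_events(events: Iterable[Event], **criteria: Any) -> List[Event]:
    """Return events whose fields equal all given criteria, e.g. user="alice"."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


def cross_tab(events: Iterable[Event], row_field: str, col_field: str) -> Counter:
    """Count events for every (row value, column value) pair."""
    return Counter((e.get(row_field), e.get(col_field)) for e in events)
```

A cross-tab of `user` against `type` immediately shows, for instance, which user accounts for most failed logins.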

3. Conceptual Architecture

In this section, we describe the conceptual architecture of the BiobankCloud security framework to illustrate the structure of the security services, according to the requirements of Section 2.

3.1 Identity Management System

An identity management (IdM) system provides organizational benefits by offering low-cost solutions while reducing the risk and complexity of enforcing security policies. In this section, we describe our IdM system, which is composed of an identities store, federated identities services, identity mapping, identity administration and provisioning services, as shown in Figure 1.


3.1.1 Identities Store

The BiobankCloud identities are maintained in an identity store with the following information:

• User identities: information about users such as username, user id, first name, last name, title, organization, X.509 certificate information, role, telephone, email and address
• Groups: group name, group id, member ids
• Credentials: YubiKey token OTPs, passwords
• Resources: resource name, resource id, group id
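The identity store records listed above map naturally onto simple data types. The following sketch is one possible shape, not the project's schema: field names follow the list, and storing a password hash plus a token identifier (rather than raw passwords or OTPs) is our assumption, in line with the hashed-password policy of Section 2.2.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserIdentity:
    username: str            # e.g. the user's ORCID URI
    user_id: str
    first_name: str
    last_name: str
    organization: str
    role: str
    email: str
    title: Optional[str] = None
    telephone: Optional[str] = None
    address: Optional[str] = None
    x509_subject_dn: Optional[str] = None  # X.509 certificate information


@dataclass
class Group:
    group_name: str
    group_id: str
    member_ids: List[str] = field(default_factory=list)


@dataclass
class Credential:
    user_id: str
    password_hash: bytes                 # never the plain password
    yubikey_id: Optional[str] = None     # token used to verify OTPs (assumption)


@dataclass
class Resource:
    resource_name: str
    resource_id: str
    group_id: str                        # group that owns the resource


def members_of(group: Group, users: List[UserIdentity]) -> List[UserIdentity]:
    """Resolve a group's member ids to user records, skipping unknown ids."""
    by_id = {u.user_id: u for u in users}
    return [by_id[m] for m in group.member_ids if m in by_id]
```

Keeping credentials in a separate record from the user profile makes it easier to restrict who can read them.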

3.1.2 Federated Identities Services

Identity federation will enable users to use cross-domain single sign-on (SSO) across different security domains and partners, e.g., for biobank data transfer. Identity providers need to exchange identity information within each domain, which will be stored by the federated identities services.

3.1.3 Identity Mapping

The identity mapping service correlates user identifiers through pseudonyms. For instance, the ORCID iD of an external user will be mapped to a universally unique identifier (UUID) in a local service provider.
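A pseudonymous mapping of this kind can be kept as a per-service-provider table. In the sketch below (a hypothetical class, not the project's API), each service provider draws a random version 4 UUID the first time it sees an ORCID iD. We chose random UUIDs over name-based ones (uuid5) deliberately: a name-based UUID in a public namespace could be recomputed by anyone who knows the ORCID iD, which would defeat the pseudonym.

```python
import uuid
from typing import Dict, Optional


class PseudonymMapper:
    """Maps external ORCID iDs to random local UUIDs for one service provider."""

    def __init__(self) -> None:
        self._to_local: Dict[str, uuid.UUID] = {}
        self._to_external: Dict[uuid.UUID, str] = {}

    def local_id(self, orcid_id: str) -> uuid.UUID:
        """Return the existing pseudonym, or create a new random one."""
        if orcid_id not in self._to_local:
            pseudonym = uuid.uuid4()
            self._to_local[orcid_id] = pseudonym
            self._to_external[pseudonym] = orcid_id
        return self._to_local[orcid_id]

    def external_id(self, pseudonym: uuid.UUID) -> Optional[str]:
        """Reverse lookup, intended only for authorized identity administration."""
        return self._to_external.get(pseudonym)
```

Because each service provider keeps its own table, the same researcher appears under unrelated UUIDs at different providers.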

3.1.4 Identity Administration

Identity administration is another component of our IdM system. It provides the functionality for creating unique identities and their related attributes, such as groups, roles and resources, for users, applications and other resources.

3.1.5 Provisioning Services

A user’s identity needs to be provisioned within a service provider (SP) when the user is granted access to the BiobankCloud services, and deprovisioned when the user is no longer entitled to access the SP. Provisioning services automate the maintenance of platform accounts and entitlements, for instance provisioning users’ privileges to access HDFS.
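The provisioning lifecycle can be sketched as an in-memory service; this is not an SPML implementation, and the class, method names and entitlement strings such as `"hdfs:/projects/p1:read"` are hypothetical. Each provisioning and de-provisioning step is also recorded, since Section 2.2.4 requires these events to be logged.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Set, Tuple


class ProvisioningService:
    """In-memory stand-in for account provisioning (not an SPML implementation)."""

    def __init__(self) -> None:
        self.entitlements: Dict[str, Set[str]] = {}
        self.audit_trail: List[Tuple[str, str, str]] = []  # (timestamp, event, user)

    def _log(self, event: str, user_id: str) -> None:
        # Provisioning and de-provisioning are among the events that must be logged.
        self.audit_trail.append((datetime.now(timezone.utc).isoformat(), event, user_id))

    def provision(self, user_id: str, grants: Iterable[str]) -> None:
        self.entitlements.setdefault(user_id, set()).update(grants)
        self._log("provision", user_id)

    def deprovision(self, user_id: str) -> Set[str]:
        """Remove the account and return the entitlements that were revoked."""
        revoked = self.entitlements.pop(user_id, set())
        self._log("deprovision", user_id)
        return revoked

    def is_entitled(self, user_id: str, grant: str) -> bool:
        return grant in self.entitlements.get(user_id, set())
```

Returning the revoked entitlements from `deprovision` lets the caller propagate the removal to each affected resource, such as HDFS.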

[Diagram components: Identities Store (Users Identities, Groups, Group Memberships, Roles, Credentials, Resources, Resource Accesses); Federated Identities Services; Identity Mapping; Identity Administration; Provisioning Services; Platform Accounts and Entitlements]

Figure 1, Conceptual Architecture of the IdM System


3.2 Access Control Management

The BiobankCloud access control system serves as the authorization framework, providing functionality such as policy administration and policy enforcement, as shown in Figure 2. A policy is a set of rules that determine whether a subject may perform a task.

[Diagram components: Subject; Resource; Environment; Policy Enforcement Point (PEP); Access Control Table (ACT); Policy Decision Point (PDP); Policy Administration Point (PAP); Access Control Policies]

Figure 2, Conceptual Architecture of the Access Control System

The building blocks of the access control system are described as follows:

3.2.1 Request Structure

An authorization request follows the eXtensible Access Control Markup Language (XACML) request structure:

• Subject: who the requester is, e.g., X.509 distinguished name (DN) or ORCID iD attributes
• Action: which action is requested, e.g., create, read, update, delete, execute, transfer
• Resource: the target, e.g., a server, file, directory, service, table, process or application
• Environment: date, time, location

3.2.2 Policy Administration Point

The policy administration point (PAP) provides administrative functions to create, maintain and delete the XACML policies in the access control policies repository.

3.2.3 Policy Decision Point

The policy decision point (PDP) evaluates authorization requests against the policies and determines which requests the client is permitted to perform. The authorization response is one of the following decisions:

• PERMIT: the action in the request is permitted
• DENY: the action in the request is denied
• INDETERMINATE: an error occurred while evaluating the policy
• NOT_APPLICABLE: no decision could be made, because the policies could not be applied to the values in the request
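The four decisions above can be produced by a small evaluator. The sketch below models a request with the four XACML parts and rules as Python predicates; it is a simplified form of XACML's deny-overrides combining algorithm (any applicable DENY wins, an evaluation error yields INDETERMINATE, otherwise PERMIT if some rule permits, else NOT_APPLICABLE), not Argus or a full XACML engine. All names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Decision(Enum):
    PERMIT = "Permit"
    DENY = "Deny"
    INDETERMINATE = "Indeterminate"
    NOT_APPLICABLE = "NotApplicable"


@dataclass(frozen=True)
class Request:
    subject: Dict[str, str]
    action: str
    resource: str
    environment: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Rule:
    target: Callable[[Request], bool]  # does the rule apply to this request?
    effect: Decision                   # PERMIT or DENY


def evaluate(rules: List[Rule], request: Request) -> Decision:
    """Simplified deny-overrides: any applicable DENY wins; otherwise an
    evaluation error gives INDETERMINATE; otherwise PERMIT if some rule
    permits; otherwise NOT_APPLICABLE."""
    permitted = errored = False
    for rule in rules:
        try:
            applies = rule.target(request)
        except Exception:  # e.g. a missing attribute in the request
            errored = True
            continue
        if not applies:
            continue
        if rule.effect is Decision.DENY:
            return Decision.DENY
        permitted = True
    if errored:
        return Decision.INDETERMINATE
    return Decision.PERMIT if permitted else Decision.NOT_APPLICABLE
```

A request that lacks an attribute a rule needs (for example, no `role` in the subject) comes back as INDETERMINATE rather than PERMIT, which the enforcement point should treat as a denial.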


3.2.4 Policy Enforcement Point

The policy enforcement point (PEP) is the component that enforces the decisions made by the PDP, restricting usage of and access to the BiobankCloud resources and applications.
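A PEP typically wraps each protected operation and only runs it on an explicit permit. The sketch below assumes, for illustration, that the PDP is reachable as a callable returning the decision as a string; `enforce` and `AccessDenied` are hypothetical names. It fails closed: Deny, Indeterminate, NotApplicable and an unreachable PDP all block the operation.

```python
from typing import Any, Callable, Dict


class AccessDenied(PermissionError):
    """Raised by the PEP when an operation may not proceed."""


def enforce(decide: Callable[[Dict[str, Any]], str],
            request: Dict[str, Any],
            operation: Callable[[], Any]) -> Any:
    """Run operation only if the PDP returns "Permit".

    Any other answer (Deny, Indeterminate, NotApplicable) and any failure to
    reach the PDP is treated as a denial, i.e. the PEP fails closed."""
    try:
        decision = decide(request)
    except Exception as exc:
        raise AccessDenied("PDP could not be reached") from exc
    if decision != "Permit":
        raise AccessDenied(f"PDP decision: {decision}")
    return operation()
```

Failing closed means that an outage of the authorization service makes data temporarily unavailable rather than unprotected.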

3.2.5 Access Control Table

Applications and resources that require a fine-grained access control mechanism within a specific domain will use a local access control table (ACT) that stores users’ permissions for that application or resource. For instance, the HDFS metadata server, which keeps UNIX-style inodes for files, will maintain a mapping between users’ permissions and the actual genomic files stored in HDFS.
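An HDFS-style ACT entry can be checked like UNIX permission bits: the owner bits apply if the user owns the file, otherwise the group bits apply if the user is in the file's group, otherwise the "other" bits apply. This sketch mirrors that POSIX-like model; it ignores superusers and any ACL extensions, and `Inode` and `is_permitted` are hypothetical names.

```python
from dataclasses import dataclass
from typing import AbstractSet

PERMISSION_BITS = {"read": 0o4, "write": 0o2, "execute": 0o1}


@dataclass(frozen=True)
class Inode:
    owner: str
    group: str
    mode: int  # UNIX permission bits, e.g. 0o640 = rw-r-----


def is_permitted(inode: Inode, user: str, user_groups: AbstractSet[str], action: str) -> bool:
    """POSIX-style check: only the first matching class (owner, group, other) counts."""
    bit = PERMISSION_BITS[action]
    if user == inode.owner:
        shift = 6   # owner bits
    elif inode.group in user_groups:
        shift = 3   # group bits
    else:
        shift = 0   # other bits
    return bool((inode.mode >> shift) & bit)
```

For a genomic file with mode `0o640`, the owner may read and write it, members of its group may only read it, and everyone else is denied.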

3.3 Auditing

The auditing system is composed of two main components, the security events logger and audit management, as shown in Figure 3. All logs generated when a subject accesses a BiobankCloud resource will be stored in the audit logs repository.

Figure 3, Conceptual Architecture of the Auditing System (components: Subject; Resource; Security Events Logger with Authentication and Authorization Events, Identity and Access Management Events and Data Access Events; Audit Logs; Audit Management with Audit Browsing, Audit Report and Intrusion Detection Service (IDS))

3.3.1 Security Events Logger

The security events logger produces the following security events and pushes the messages to the audit logs data store:

• Authentication and Authorization Events: this component produces log messages for users and administrative staff, including successful and failed logins.
• Identity and Access Management Events: this component produces messages related to creating, modifying, disabling or deleting accounts.
• Data Access Events: this component produces access logs for the genomic data stored in HDFS.


3.3.2 Audit Management

Log messages recorded in the audit logs data store are accessible to the audit management components, which comprise the audit browser, the audit report and the intrusion detection service (IDS).

• Audit Browser: provides interfaces for a log analyst or an auditor to review the log events in the log data store.
• Audit Report: provides a chronological view of user and process activities for monitoring purposes, based on the log sources and the correlations between them.
• IDS: performs signature analysis of the communicated packets and alerts the system managers in cases of suspicious behavior, e.g., transfer of unprotected data, logins from suspicious locations, or multiple failed authentication or authorization attempts.
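One of the IDS triggers listed above, multiple failed authentication attempts, can be detected with a per-user sliding time window over the audit events. This is a sketch only: the threshold, window length and class name are hypothetical, it assumes events arrive in timestamp order, and it does not cover signature analysis of network packets.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginDetector:
    """Flag a user once `threshold` failed logins fall within `window_seconds`."""

    def __init__(self, threshold: int = 5, window_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, user: str, timestamp: float, success: bool) -> bool:
        """Record one login event; return True if an alert should be raised."""
        if success:
            return False  # only failures count towards the threshold
        failures = self._failures[user]
        failures.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while failures and timestamp - failures[0] > self.window_seconds:
            failures.popleft()
        return len(failures) >= self.threshold
```

Keeping only the timestamps inside the window bounds memory per user, which matters when the detector consumes the full stream of authentication events.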

4. Logical Security Architecture

The logical architecture defines the structure of the components and services within the constraints defined in the conceptual architecture. In this section, we describe the relationships between the different components of the IdM, access control and auditing systems.

4.1 Identity Management System

The BiobankCloud IdM system is composed of three components: identity management services, directory services and provisioning services, as shown in Figure 4.

Figure 4, Logical Architecture of the IdM System (components: Identity Management Services with Identity Administration (Directory Access Manager, SAML Identity Mapping Service) and User Manager (User Registration, Self-Service); Local Identity Service; Directory Services; Identities Directory; Schemas; Provisioning Services with the Provisioning Server, Account Synchronization and Entitlement Manager; Hadoop Cluster; Audit; Intrusion Detection Service (IDS); Crypto Services; Access Control Service; actors: System Administrator, User)

4.1.1 Identity Management Services

Identity management services provide the functionality required for user account administration and user self-service, through the identity administration and user manager services respectively.


4.1.1.1 Identity Administration

The identity administration service provides two capabilities. First, it enables administrators to manage user accounts in a central directory through the directory access manager. Second, it provides federated SSO services, as described in the following.

• Directory Access Manager: provides functionality for creating, editing and deleting users, groups and resources, in addition to privilege entitlement and group and account management.
• SAML Identity Mapping Service: creates or updates identities for federated SSO, and maps ORCID identifiers to UUIDs.
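The mapping of ORCID identifiers to UUIDs could first validate the iD using the ISO 7064 MOD 11-2 check character defined in the ORCID identifier structure [5]. The deliverable does not state how the UUID itself is derived, so the name-based (version 5) UUID used below is purely an assumption; a lookup table of randomly generated UUIDs would work equally well. The function names are hypothetical.

```python
import re
import uuid

# ORCID iD format: four groups of four characters; only the last may be "X".
_ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character over the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    if not _ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


def orcid_to_uuid(orcid: str) -> uuid.UUID:
    """Hypothetical deterministic mapping: a version 5 UUID of the ORCID URI."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"invalid ORCID iD: {orcid}")
    return uuid.uuid5(uuid.NAMESPACE_URL, "https://orcid.org/" + orcid)
```

For example, `0000-0002-1825-0097` passes the check, while changing its last digit makes it fail. A deterministic mapping has the property that every service computes the same UUID without consulting the directory, at the cost of making the UUID derivable from the public ORCID iD.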

4.1.1.2 User Manager

The user manager service facilitates user registration and self-service to reduce administration costs and complexity.

• User Registration: enables guest researchers and other users to apply for a new account in the BiobankCloud. Users fill in all required information and questionnaires, and confirm the entered information to complete the membership application.
• Self-Service: users can log in to their account at any time to reset their static password, deactivate the account or modify other personal information.

4.1.2 Directory Services

Information about all BiobankCloud users will be accessed and maintained using the Lightweight Directory Access Protocol (LDAP), as described in Table 2. For each entry, the table gives the entry name followed by the structure of the entry.

The Top Level LDAP server
  dn: ou=biobankcloud.eu,ou=ua,dc=org
  ou: biobankcloud.eu
  objectClass: top
  objectClass: organizationalUnit

Organization
  dn: ou=Organization,ou=biobankcloud.eu,ou=ua,dc=org
  ou: Organization
  objectClass: top
  objectClass: organizationalUnit

Group
  dn: ou=Group,ou=biobankcloud.eu,ou=ua,dc=org
  ou: Group
  objectClass: top
  objectClass: organizationalUnit

Resource
  dn: rn=HDFS,ou=Resource,ou=biobankcloud.eu,ou=ua,dc=org
  rn: HDFS
  objectClass: biobankCloudResource
  objectClass: organizationalUnit

User
  dn: ou=People,ou=biobankcloud.eu,ou=ua,dc=org
  ou: People
  objectClass: top
  objectClass: organizationalUnit

Table 2, LDAP Structure of the BiobankCloud


The BiobankCloud's LDAP namespace consists of a root organization, groups, resources and people, as shown in Figure 5. Organizational information such as name, postal address and telephone number will be recorded in an organization entry. Each organization may define several groups, each described by a group name, a group id number (gidNumber) and member user ids (memberUid). BiobankCloud resources such as workflow execution engines, the file system and the audit system will be defined through resource entries, to be accessed by entitled groups. Finally, users will be described through people entries with information such as name, email address, group, role and password.

dn: ou=biobankcloud.eu,ou=ua,dc=org

dn: ou=PDC,ou=biobankcloud.eu,ou=ua,dc=org
ou: PDC
postalAddress:
telephoneNumber:

dn: cn=pdc00001,ou=Group,ou=biobankcloud.eu,ou=ua,dc=org
cn: pdc00001
gidNumber: 70000
memberUid: pdc00001

dn: uid=pdc00001,ou=People,ou=biobankcloud.eu,ou=ua,dc=org
cn: Ali Gholami
deactivated: FALSE
deactivationReason: N/A
bbcSubjectDN: N/A
gidNumber: 70000
givenName: Ali
sn: Gholami
title: Mr.
uid: pdc00001
uidNumber: 70000
telephoneNumber: +46 8 93 4843
mail: [email protected]
bbcAccountRole: controller
userPassword: 2e99758548972a8e8822ad47fa1017ff72f06f3ff6a016851f45c398732bc50c

dn: cn=HDFS,ou=Resource,ou=biobankcloud.eu,ou=ua,dc=org
cn: HDFS
gidNumber: 70000

Figure 5, LDAP Namespace of the BiobankCloud

4.1.3 Provisioning Services

The provisioning services consist of the account synchronization and entitlement manager services, which update user accounts across all services and resources of the BiobankCloud platform according to the information pulled from the central directory services.
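Pulling information from the central directory and updating each resource amounts to a reconciliation step: compare the directory with the accounts a resource currently holds, then provision, deprovision or update. The sketch below only computes that plan; the attribute dictionaries, the `deactivated` flag (mirroring the deactivation attribute of the user entry in Figure 5) and the function names are assumptions, and applying the plan (for example through SPML, as planned in Section 5.1) is not shown.

```python
from typing import Dict, List, Tuple

Accounts = Dict[str, Dict[str, object]]  # uid -> attributes


def active_accounts(directory: Accounts) -> Accounts:
    """Keep accounts that are not deactivated, dropping the flag itself for comparison."""
    return {
        uid: {k: v for k, v in attrs.items() if k != "deactivated"}
        for uid, attrs in directory.items()
        if not attrs.get("deactivated", False)
    }


def plan_sync(directory: Accounts, resource: Accounts) -> Tuple[List[str], List[str], List[str]]:
    """Compare the central directory with one resource's local accounts.

    Returns sorted lists of uids to provision, deprovision and update.
    """
    wanted = active_accounts(directory)
    to_provision = sorted(wanted.keys() - resource.keys())
    to_deprovision = sorted(resource.keys() - wanted.keys())
    to_update = sorted(uid for uid in wanted.keys() & resource.keys()
                       if wanted[uid] != resource[uid])
    return to_provision, to_deprovision, to_update
```

Treating a deactivated account as absent means that deactivation in the directory automatically leads to deprovisioning in every destination resource on the next synchronization run.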

• Account synchronization: the account synchronization service keeps accounts and their attributes up to date based on the directory services information. For instance, accounts are provisioned in or deprovisioned from the destination resources after a new account is created or an existing account is terminated.


• Entitlement manager: the entitlement manager service supports adding and removing permissions by updating the local ACT in the destination resources. For users' access to files in HDFS, we implement read, write and execute permissions, which a controller or trusted researcher can grant to or revoke from specific groups.
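The grant and revoke operations of the entitlement manager can be sketched as updates to a local ACT keyed by path and group, guarded by a role check. This is a minimal sketch under stated assumptions: the role identifiers `controller` and `trusted_researcher`, the table layout and the class name are hypothetical, and propagation of the changes to the destination resources is not shown.

```python
from typing import Dict, Set, Tuple

PERMISSIONS = {"read", "write", "execute"}
# Hypothetical identifiers for the controller and trusted researcher roles.
GRANTING_ROLES = {"controller", "trusted_researcher"}


class EntitlementManager:
    """Maintains a local ACT of (path, group) -> permissions."""

    def __init__(self) -> None:
        self._act: Dict[Tuple[str, str], Set[str]] = {}

    def _authorize(self, actor_role: str, permission: str) -> None:
        if actor_role not in GRANTING_ROLES:
            raise PermissionError(f"role {actor_role!r} cannot change entitlements")
        if permission not in PERMISSIONS:
            raise ValueError(f"unknown permission {permission!r}")

    def grant(self, actor_role: str, path: str, group: str, permission: str) -> None:
        self._authorize(actor_role, permission)
        self._act.setdefault((path, group), set()).add(permission)

    def revoke(self, actor_role: str, path: str, group: str, permission: str) -> None:
        self._authorize(actor_role, permission)
        self._act.get((path, group), set()).discard(permission)

    def permissions(self, path: str, group: str) -> Set[str]:
        # Return a copy so callers cannot mutate the table directly.
        return set(self._act.get((path, group), set()))
```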

4.2 Access Control Management

The access control management system requires several components for policy administration and enforcement, namely the PAP, PDP and PEP, together with handler services that extract user identities at runtime after authentication, as shown in Figure 6.

Figure 6, Access Control Management Logical Architecture (components: Client; Policy Enforcement Point (PEP); Policy Decision Point (PDP); Policy Administration Point (PAP); Access Control Repository; SPL; System Administrator; Security Login Handler with X.509, SAML and Multifactor Login Handlers and a PEP client; ACT client; Access Control Table (ACT); Certification Authority; request/response flow)

4.2.1 PAP

The access control repository stores access policies written in the simplified policy language (SPL), which are managed through the command-line interface provided by the PAP. This feature enables central banning of users across the BiobankCloud resources. The structure of an access control policy written in SPL is as follows:

resource <value> {
    action <value> {
        rule <permit | deny> { <attributeId> = <attributeValue> }
    }
}
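The nesting of resource, action and rule blocks can be illustrated by generating policy text from a small data model. This Python sketch is only an illustration of the structure above: double-quoting values follows our understanding of the Argus SPL convention and should be checked against the Argus PAP documentation, and the resource value `hdfs`, the attribute identifiers `subject` and `group`, and the DN are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SplRule:
    effect: str            # "permit" or "deny"
    attribute_id: str      # e.g. "subject"
    attribute_value: str   # e.g. an X.509 DN


@dataclass
class SplAction:
    value: str
    rules: List[SplRule] = field(default_factory=list)


@dataclass
class SplResource:
    value: str
    actions: List[SplAction] = field(default_factory=list)


def render_spl(resource: SplResource) -> str:
    """Render one resource block following the nesting resource > action > rule."""
    lines = [f'resource "{resource.value}" {{']
    for action in resource.actions:
        lines.append(f'    action "{action.value}" {{')
        for rule in action.rules:
            if rule.effect not in ("permit", "deny"):
                raise ValueError(f"invalid rule effect: {rule.effect!r}")
            lines.append(f'        rule {rule.effect} {{ {rule.attribute_id}="{rule.attribute_value}" }}')
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)
```

Rendering a DENY rule for a banned DN before a PERMIT rule for a group yields a policy in which the ban takes precedence, assuming rules are evaluated in the order they appear.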

4.2.2 PDP

The PDP service extracts the XACML request from incoming messages and retrieves the access control policies related to the subject of the request from the PAP service. The retrieved policies can be cached for a specific period of time to reduce the communication overhead between the PDP and PAP services.
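The policy caching described above can be sketched as a per-subject cache with a time-to-live (TTL). The `fetch` callable stands in for a call to the PAP service and, like the class name and TTL value, is an assumption; an injectable clock is used only so the expiry behavior can be tested.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class PolicyCache:
    """Caches policies fetched from the PAP for `ttl_seconds` per subject."""

    def __init__(self, fetch: Callable[[str], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical call to the PAP service
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def policies_for(self, subject: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(subject)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                  # fresh cache hit: no PAP round trip
        policies = self._fetch(subject)      # miss or expired: ask the PAP
        self._entries[subject] = (now, policies)
        return policies

    def invalidate(self, subject: Optional[str] = None) -> None:
        """Drop one subject's entry, or the whole cache when subject is None."""
        if subject is None:
            self._entries.clear()
        else:
            self._entries.pop(subject, None)
```

The TTL is a trade-off: a longer TTL saves more PDP-to-PAP traffic, but a newly banned user may keep access until the cached policies expire unless the cache is explicitly invalidated when policies change.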


4.2.3 PEP

The PEP service receives client authorization requests and generates a corresponding XACML request to be forwarded to the PDP service. The PEP routes the XACML message to the PDP according to its configured PDP information.

4.2.4 Security Login Handler

The security login handler consists of different components that handle different user authentication mechanisms, such as X.509, SAML and multifactor authentication, as shown in Figure 6. The authorization requests are composed by the PEP client, which is deployed in the security login handler of the BiobankCloud resources. Furthermore, the security login handler can be configured to provide highly available authorization decisions based on the ACT information.

For this purpose, BiobankCloud resources such as the HDFS services will forward authorization requests directly to the security login handler through the ACT client. The ACT client retrieves the relevant access control information from a highly available database service and forwards the response to the request initiator.

• X.509 Login Handler: decodes authentication messages from the administrative staff and validates their signatures using the public keys in the staff's X.509 certificates, issued by a trusted CA.
• SAML Token Login Handler: generates SAML tokens for outgoing messages and extracts SAML tokens from incoming messages.
• Multifactor Login Handler: requires a username and password (checked against LDAP) plus an extra authentication factor, such as an OATH-HOTP token (YubiKey device) or an OTP delivered to a mobile device.
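The OATH-HOTP factor used by the multifactor login handler is specified in RFC 4226. The following Python sketch shows code generation and server-side verification with a small look-ahead window; the look-ahead size and the function names are assumptions, the LDAP password check is out of scope, and a real deployment must store each token's secret and counter securely.

```python
import hashlib
import hmac
import struct
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP value as defined in RFC 4226 (HMAC-SHA-1 with dynamic truncation)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_hotp(secret: bytes, candidate: str, counter: int,
                look_ahead: int = 5) -> Optional[int]:
    """Return the next counter value if `candidate` matches, otherwise None.

    Counters counter..counter+look_ahead are tried so that a token pressed
    without logging in can resynchronize (hypothetical window size).
    """
    for c in range(counter, counter + look_ahead + 1):
        if hmac.compare_digest(hotp(secret, c), candidate):
            return c + 1
    return None
```

Returning the next counter value lets the caller persist it, so each one-time password is accepted at most once.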

4.3 Auditing

Logging events will be sent from event suppliers to the event consumer through the event management service, as shown in Figure 7.

4.3.1 Event Suppliers

BiobankCloud subsystems that generate access logs, whether related to the IdM system or to the data platform itself, will deploy a distributed audit service (XDAS) agent that collects the logs and sends them to the event management service [7].

Each audit event consists of the following entries: initiator, event type, target, event timestamp, environment of access and event outcome, as described in Table 3.


Initiator: principal username; UUID
Event type: create account; delete account; disable account; enable account; modify account; create data; read data; delete data; modify data
Target: files stored in HDFS; BiobankCloud services
Event timestamp: date; local time
Environment of access: host name; host IP address
Event outcome: successful; failed

Table 3, Common Log Event Contents
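A record with the entries of Table 3 could be represented and serialized to JSON as follows, which is also the kind of document the MongoDB backend of Section 5.3 would store. The field names, enum spellings and example values are hypothetical; this is not the XDAS event format, which defines its own taxonomy.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from enum import Enum


class EventType(Enum):
    CREATE_ACCOUNT = "create_account"
    DELETE_ACCOUNT = "delete_account"
    DISABLE_ACCOUNT = "disable_account"
    ENABLE_ACCOUNT = "enable_account"
    MODIFY_ACCOUNT = "modify_account"
    CREATE_DATA = "create_data"
    READ_DATA = "read_data"
    DELETE_DATA = "delete_data"
    MODIFY_DATA = "modify_data"


class Outcome(Enum):
    SUCCESSFUL = "successful"
    FAILED = "failed"


@dataclass
class AuditEvent:
    initiator_username: str   # Initiator: principal username
    initiator_uuid: str       # Initiator: UUID
    event_type: EventType
    target: str               # e.g. an HDFS path or a BiobankCloud service name
    timestamp: datetime       # date and time, ideally timezone-aware
    host_name: str            # Environment of access
    host_ip: str              # Environment of access
    outcome: Outcome

    def to_json(self) -> str:
        record = asdict(self)
        # Convert non-JSON types to plain strings.
        record["event_type"] = self.event_type.value
        record["outcome"] = self.outcome.value
        record["timestamp"] = self.timestamp.isoformat()
        return json.dumps(record, sort_keys=True)
```

Restricting event types and outcomes to enumerations keeps the stored records consistent, which simplifies later queries by the audit analysis application.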

4.3.2 Event Management Service

The event management service supports the configuration and management of audit events so that intermediate components relaying events cannot modify the filtering or routing of audit events; for example, an alarm cannot be filtered out before it reaches its destination.
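The rule that filtering is configured centrally and that alarms always reach their destination can be sketched as a small publish/subscribe router. The severity levels, the `"*"` wildcard subscription and the class names are hypothetical and are not part of the XDAS specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    event_type: str
    severity: str                    # hypothetical levels: "info" or "alarm"
    payload: Dict[str, str] = field(default_factory=dict)


Consumer = Callable[[Event], None]


class EventManager:
    """Central routing: filters are configured here, not by intermediate components."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Consumer]] = {}
        self._filters: List[Callable[[Event], bool]] = []

    def subscribe(self, event_type: str, consumer: Consumer) -> None:
        # event_type "*" subscribes to every event type.
        self._routes.setdefault(event_type, []).append(consumer)

    def add_filter(self, keep: Callable[[Event], bool]) -> None:
        self._filters.append(keep)   # predicate returns True to keep the event

    def publish(self, event: Event) -> int:
        """Deliver an event; return how many consumers received it."""
        # Alarms bypass every filter so they always reach their destination.
        if event.severity != "alarm" and not all(keep(event) for keep in self._filters):
            return 0
        consumers = self._routes.get(event.event_type, []) + self._routes.get("*", [])
        for consumer in consumers:
            consumer(event)
        return len(consumers)
```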

4.3.3 Event Consumer

The event consumer processes the collected logging information and provides an audit management API to manage filters, in addition to an audit read API that can be used by the audit analysis application.

Figure 7, Auditing Logical Architecture (components: BiobankCloud Event Suppliers, namely the IdM System (IdM Services with an XDAS Agent) and the Hadoop Cluster (HDFS, YARN, MR with an XDAS Agent), each using an Audit Service Event API; Event Management Service (XDAS Service); Event Consumer with the Audit Event Management Application, Audit Analysis Application, Audit Event Management API and Audit Event Analysis API; Audit Logs)


5. Physical Security Architecture

The physical architecture defines how the different libraries and services are connected to deliver the functionality defined in the logical architecture. Figure 8 shows a high-level physical architecture of the BiobankCloud, where all services are secured and controlled behind the firewall. The dashboard service provides interfaces to interact with the internal services through a GlassFish application server.

The servers of the security system described in this section, including the IdM, authorization and auditing servers, require X.509 host certificates issued by a trusted CA. The CA publishes a certificate revocation list (CRL) of revoked certificates, and services are denied to any certificate listed in the CRL.

We use the CentOS distribution of Linux to run the security components because of its reliability and its community support for production environments. We also aim to support installing the security framework on Ubuntu.

5.1 IdM System

The IdM system is composed of the Shibboleth, provisioning, directory and IdM servers. Identity federation and SSO, along with multifactor authentication components such as YubiKey or other OTP mechanisms, will be implemented in the Shibboleth server.

The IdM server deploys the administration management components, which administrators, and users performing self-service, access through Hypertext Transfer Protocol Secure (HTTPS) connections. The directory server will deploy an OpenLDAP³ service with a MySQL Cluster network database (NDB)⁴ backend for high availability. The provisioning server will deploy a Service Provisioning Markup Language (SPML) service to automate account synchronization in BiobankCloud services, such as the NDB-based ACT of the Hadoop cluster, according to the OpenLDAP information.

5.2 Authorization System

The authorization system deploys the Argus PAP, PDP and PEP servers. The PAP server provides interfaces to create and maintain policies through a secure shell (SSH). The other Argus⁵ components (PDP and PEP) can be installed on separate servers, as shown in Figure 8, or together on one server.

5.3 Auditing System

The auditing system contains an XDAS audit server that deploys the auditing components, accessed through HTTPS. The audit server backend will be a NoSQL database such as MongoDB⁶, storing large volumes of log messages in JavaScript Object Notation (JSON) format, since a transactional database system is not required. MongoDB provides fast write operations for log events and supports complex queries that a future IDS can use.

³ OpenLDAP, http://www.openldap.org/
⁴ MySQL Cluster overview, http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-overview.html
⁵ Argus Authorization Framework, https://twiki.cern.ch/twiki/bin/view/EGEE/AuthorizationFramework
⁶ MongoDB, http://www.mongodb.org/

Implementing the IDS server is outside the scope of the current project; however, we provide proof-of-concept interfaces for future integration with IDS systems.

Figure 8, Physical Security Architecture

6. Conclusions and Future Work

In this document, we defined the security requirements for protecting PII according to the EU DPD, to be addressed through the implementation of an AAA infrastructure. We also presented the security architecture of the BiobankCloud, including the IdM, access management and auditing systems. The proposed security toolset design can be extended with new security mechanisms in the future, e.g., to support a new authentication method while remaining backward compatible.

As the next step in the WP3 activities, we will implement an alpha version of the security toolset design and test it with the existing platform, in the context of deliverable D3.4.


References

[1] BiobankCloud STREP Proposal, Call Identifier: FP7 ICT-2011-8.
[2] Jane Reichel, Roxana Merino Martinez, Jan-Eric Litton, BiobankCloud Deliverable D1.5 v.01, Regulatory and Ethical Requirements for Biobanking Data Storage and Analysis, 2013-01.
[3] Jane Reichel, Roxana Merino Martinez, BiobankCloud Model Data Management Policy (MDMP) and some considerations about the user interface, WP1, 2013-03.
[4] The BiobankCloud Deliverable D3.1, State of the Art, WP3, 2013-05.
[5] Open Researcher and Contributor ID (ORCID) Structure, http://support.orcid.org/knowledgebase/articles/116780-structure-of-the-orcid-identifier.
[7] OpenXDAS, http://www.opengroup.org/security/das/xdas_int.htm.