Designing and implementing a cloud-hosted SaaS for data movement and sharing with SlapOS

Authors: Walid Saad, Heithem Abbes, Mohamed Jemni and Christophe Cerin
Journal: International Journal of Big Data Intelligence
Online Date: Thursday, July 24, 2014
By: Arnob Saha (L20339084), Hari Prasad Dhonju Shrestha (L20352046)




Outline
• Abstract
• Introduction
• Motivation and fundamental issues
• Related work
• SlapOS overview
• Design and implementation issues
• Experimental results
• Conclusion and future work
• Acknowledgements
• References


Abstract
• Tools and frameworks have been developed to manage and handle large amounts of data on grid platforms.
• These tools are often not adopted because of the complexity of their installation and configuration processes.
• SlapOS (Simple Language for Accounting and Provisioning Operating System) has emerged.
  • Main aim: to hide the complexity of IT infrastructures and software deployment from users.
• The paper proposes a cloud-hosted data grid using the SlapOS cloud.
• Through a software as a service (SaaS) solution, users can request and automatically install data movement and sharing tools such as Stork and Bitdew without any intervention from a system administrator.


Introduction
• Many real-world scientific and enterprise applications deal with huge amounts of data. The emergence of data-intensive applications has prompted scientists around the world to build data grids. Examples include bioinformatics, medical imaging, high energy physics, coastal and environmental modelling and geospatial analysis.
• To work with large datasets, users need to access, process and transfer data stored in distributed repositories.
• The paper proposes a self-configurable desktop grid (DG) platform on demand.
• The Simple Language for Accounting and Provisioning Operating System (SlapOS) cloud presents a configurable environment, in terms of the OS and the software stack to manage, without the need for virtualisation techniques.


Introduction (contd.)
• This paper focuses on a subset of the overall research on interoperability between DGs and clouds, namely data tools delivered as hosted software as a service (SaaS) frameworks.
• We present the design and implementation of two SaaS tools for data management. The first service provides a means for users to transfer data from their sites to the computation or simulation sites. The second service is used to share data in widely distributed environments.
• The challenge is how to:
  • devise automatic data management tools that are able to mask the installation and configuration difficulties of data management software;
  • deliver data management functionality as hosted services via web user interfaces.



Motivations and fundamental issues
• e-Science applications require efficient data management and transfer software in wide-area, distributed computing environments.
• To achieve data management on demand, users need a resilient service that moves data transparently.
• No IT knowledge is required, and there are no software download, installation or configuration steps.
• The implementation is based on:
  • Stork data scheduler: manages data movement over wide area networks, using intermediate data grid storage and different protocols.
  • Bitdew: makes data accessible and shareable from other resources, including end-user desktops and servers.
  • SlapOS: with only a ‘one-click’ process, instantiates and configures the data managers (Stork + Bitdew) and deploys them over the Internet.


Related work
• Tools that manage low-level data handling issues on grid systems.
• High-level tools for co-scheduling of data and computation in grid environments.
• Research on data management using SaaS-based services.
Data management and transfer in grid environments
• GridFTP is the most widely used transfer tool; it speeds up transfers through parallel streams.
• Representative examples of storage systems include SRMs, SRB, IBP and NeST.
• The FreeLoader framework is designed to aggregate space and I/O bandwidth contributions from volatile desktop storage.
• Farsite builds a secure file system using untrusted desktop computers.
• Chirp is a user-level file system for collaboration across distributed systems such as clusters, clouds and grids.


Related work (contd.)
• Bitdew is an open source data management framework for grid, DG and cloud computing.
Higher-level tools for data scheduling
• Stork: a scheduler for data placement activities in grid environments.
• Using Stork, input data can be queued, scheduled, monitored, managed and even check-pointed.
• Stork provides solutions for data placement problems in both grid and DG environments, since it can interact with different data transfer protocols such as FTP, GridFTP, HTTP and DiskRouter.
Data orchestration through SaaS technologies
• Globus Online (GO) is a project that delivers data management functionality not as downloadable software but as hosted SaaS.
• It allows users to move, synchronize and share their data using a web browser.


SlapOS overview
• An open source distributed operating system.
• Provides an environment for automating the deployment of applications.
• Based on the idea that ‘everything is a process’, SlapOS combines grid computing, in particular the concepts inherited from BonjourGrid, with techniques inherited from the field of ERP in order to manage IaaS, PaaS and SaaS cloud services through the SlapGrid daemon.
• SlapOS strengths are its compatibility with any operating system (in particular GNU/Linux) and with all software technologies, and its support for several infrastructures.
• More than 500 different recipes are available for consumer applications such as Linux, Apache, MySQL, PHP (LAMP).


SlapOS key concepts
• The SlapOS architecture is composed of two types of components: the SlapOS master and SlapOS nodes.
• SlapOS master: acts as a centralized directory for all SlapOS nodes; it knows where software is located and which software is installed.
• SlapOS node: can be a dedicated or a volunteer node. The master’s role is to install applications and run processes on SlapOS nodes.
• In comparison with traditional clouds, SlapOS is based on an opportunistic view.
• In normal use, requests are serviced by the data center nodes. Whenever the number of requests reaches a peak, SlapOS can redirect some of them to volunteer nodes.


SlapOS key concepts (contd.)
• In doing so, the system gains on two points:
  • It maintains a good response time in request handling.
  • If the number of cloud customers increases, it offers a good alternative for guaranteeing the SLAs without buying new machines.
• A SlapOS node consists essentially of a basic Linux distribution, a daemon named SlapGrid, a Buildout environment for bootstrapping applications and supervisord to control processes.
• A node can receive from the master a request to install software or a request to deploy an instance of that software.
• SlapOS software on a node is called a ‘Software Release’; it consists of all the binaries needed to run the software.
• A ‘Software Instance’ is one deployment of a Software Release; multiple instances of the same software can exist.
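
The overflow behaviour described under the key concepts (data center nodes serve requests in normal use, volunteer nodes absorb peaks) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Node` class, its `capacity` field and the `dispatch` function are hypothetical names, not part of the SlapOS API, and the slides do not describe the actual placement policy in this much detail.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a SlapOS node with a fixed request capacity."""
    name: str
    capacity: int
    volunteer: bool = False
    assigned: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.assigned) < self.capacity


def dispatch(requests, nodes):
    """Place each request on a data center node first; use volunteers only on overflow."""
    # sorted() is stable and False < True, so dedicated nodes come before volunteers.
    ordered = sorted(nodes, key=lambda n: n.volunteer)
    rejected = []
    for request in requests:
        target = next((n for n in ordered if n.has_room()), None)
        if target is None:
            rejected.append(request)  # no capacity left anywhere
        else:
            target.assigned.append(request)
    return rejected


datacentre = Node("dc-1", capacity=2)
volunteer = Node("home-1", capacity=2, volunteer=True)
rejected = dispatch(["r1", "r2", "r3"], [volunteer, datacentre])
print(datacentre.assigned, volunteer.assigned, rejected)
```

In this sketch the third request spills over to the volunteer node only because the data center node is full, which is the opportunistic idea in miniature.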


How to join SlapOS?
• SlapOS is a voluntary cloud, which means that anyone can potentially add their own server to the cloud.
• To participate in a BOINC and/or Condor project, one has to:
  • Register on a SlapOS master.
  • Install the SlapOS node software on the node.
  • Add a virtual server on the master and link it to the physical server by configuring the node installed on the physical server.
  • Select and install applications, from the list of applications available on the master, that will be allowed to be deployed on the node.
• The number of instances that can run on the node depends on the capacity and the configuration of SlapOS on the server.
• To make applications available on the SlapOS master, they must be integrated into SlapOS.
• Integrating an application into SlapOS involves writing Buildout profiles, consisting mainly of the file software.cfg, which in turn references all other required files.


Design and implementation issues
• Implementation steps:
  • SlapOS uses Buildout technologies to install software and deploy instances.
  • In the Stork case, the software is divided into three profiles:

1. Component (slapos/component/stork/buildout.cfg): contains all the dependencies used by Stork. Buildout integrates the profile and its dependencies through the extends rule, mainly in order to install the Globus client and the Globus GSI (Grid Security Infrastructure).

2. Software Release (SR) profile: located on a remote git server and identified by its URL (http://git-repository/slapos/software/stork/software.cfg). The SR describes the installation of Stork and its dependencies, without configuration files or disk image creation. When SlapOS installs a Stork SR, it launches a Buildout command with the correct URL.

3. Software Instance: reuses an installed Software Release by creating wrappers, configuration files and anything else specific to an instance. The whole process creates a Stork configuration file.
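
To make the role of the extends rule more concrete, the following Python sketch parses a small, made-up Software Release profile and lists the profiles it extends. Buildout profiles are INI-style files, so the standard `configparser` module is enough for this illustration. The profile text, the relative paths and the part names are assumptions for the example, not the actual Stork profile from the paper.

```python
import configparser

# Hypothetical Software Release profile; paths and part names are illustrative only.
SOFTWARE_CFG = """
[buildout]
extends =
    ../../component/stork/buildout.cfg
    ../../stack/slapos.cfg
parts =
    stork
    instance-template
"""


def list_option(parser, section, option):
    """Return a multi-line Buildout option as a list of non-empty entries."""
    raw = parser.get(section, option)
    return [line.strip() for line in raw.splitlines() if line.strip()]


parser = configparser.ConfigParser()
parser.read_string(SOFTWARE_CFG)

extended = list_option(parser, "buildout", "extends")  # profiles pulled in first
parts = list_option(parser, "buildout", "parts")       # parts built by this profile
print(extended)
print(parts)
```

The point of the sketch is only that a Software Release profile stays short: it names the component profiles it builds on and the parts it adds.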


Design and implementation issues (contd.)
• Architecture overview: SlapOS is based on a master-slave paradigm. The steps that allow a user to participate in the SlapOS community and use Stork services are as follows:

1. Slapos-connect(Login, Password)

2. Request-stork-software(Slave_Node_Name, Software_Release_Name)

3. Download-stork-software(Stork_Software_Release_URL)

4. Request-instance-parameter(Slap-Parameters_List)

5. Deploy-instance(Slap_Parameter_List)

6. Submit-data-job(submit_dap_file, stork_server)

7. Move-data(src_data_url, dest_data_url)
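
The seven steps above can be read as a fixed sequence of calls. The Python sketch below records that sequence without contacting any real service. `StorkSession`, `run_workflow` and all argument values are hypothetical and exist only for this illustration; the slides do not specify a client API.

```python
from dataclasses import dataclass, field


@dataclass
class StorkSession:
    """Hypothetical record of the workflow steps; not a real SlapOS or Stork API."""
    log: list = field(default_factory=list)

    def step(self, name, **params):
        # Record the step instead of contacting a real master or node.
        self.log.append((name, params))
        return self


def run_workflow(login, password, node, release_url, parameters, src, dest):
    """Replay the seven steps in order and return the recorded session."""
    session = StorkSession()
    # The password would be sent to the master; it is deliberately not logged here.
    session.step("slapos-connect", login=login)
    session.step("request-stork-software", node=node, release="stork")
    session.step("download-stork-software", url=release_url)
    session.step("request-instance-parameter", parameters=parameters)
    session.step("deploy-instance", parameters=parameters)
    session.step("submit-data-job", dap_file="transfer.dap", server=node)
    session.step("move-data", src=src, dest=dest)
    return session


session = run_workflow(
    login="alice",
    password="secret",
    node="slave-node-1",
    release_url="http://git-repository/slapos/software/stork/software.cfg",
    parameters={"stork_server": "slave-node-1"},
    src="ftp://example.org/data.tar.gz",
    dest="file:///tmp/data.tar.gz",
)
for name, params in session.log:
    print(name, params)
```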


Architecture overview
• Figure 2: Schematic of the Stork SaaS via SlapOS cloud


Security and authentication process
• Security in Stork is an important issue with many aspects to consider. The most important is the way in which users want to run the Stork daemons. Current Stork releases support three main schemes:

1. Single host: Stork_Server and Stork_Client run on the same machine.

2. Multiple hosts: Stork_Server runs in one location and Stork_Client in another.

3. Multiple hosts with third-party transfer: Stork_Server manages the movement of data among two or more remote locations.

• Many authentication mechanisms are available, such as SSL, Kerberos, PASSWORD and GSI.

• Stork_Server provides only GSI authentication to allow different client machines to connect to it.


Security configuration
• Users can easily run 100+ Stork instances on a ‘small cluster’, each with its own independent daemons and configuration.
• Security settings depend on the manner in which users want to deploy their Stork instances:

1. Running Stork in the SlapOS cloud: after installing the SlapOS slave node, the user requests one instance that includes two Stork components (server and client tools); both use the same configuration file.

2. Submitting jobs to an external Stork server: an important property of our approach is the ability to handle transfers using an existing Stork_Server.

3. Remote GridFTP transfer: to use GSIFTP transfers with Stork, users need a valid grid proxy and user credentials in place.



Data sharing via SaaS
• Once data are placed on SlapOS, a second SaaS based on Bitdew is automatically launched to publish and distribute data over the SlapOS community.
• BitDew is a programmable framework for large-scale data management and distribution for DG systems.
• Bitdew offers two sets of nodes: servers (service hosts) and clients (consumers).
• To share data with Bitdew, end-users need to connect to SlapOS, request the Bitdew software and specify information for instance deployment.
• The cloud-hosted approach divides the world into three sets of nodes:
  • cloud-middleware node (SlapOS master), cloud-provider node (SlapOS slave node) and SaaS instances (Bitdew server and client).


Data sharing via SaaS (contd.)
• A SlapOS user must invoke the following steps:
  • Request-instance-parameters: for client instances, slap-parameters are classified into two groups:
    a. Bitdew_Server: the user sets information about the remote server hostname.
    b. Data information parameters: the user must specify the protocol used to get remote data and the signature of the file.
  • Deploy-instance
  • Share-data(transfer_protocol, data-path, properties.json)
  • Get-data(transfer_protocol, file_md5_ID)
• Bitdew Buildout profiles
  • Integrating Bitdew into SlapOS requires writing multiple Buildout profiles. The profiles are divided into three types (Component, Software Release, Software Instance), organized into several directories.
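
The Get-data step identifies a file by a signature (`file_md5_ID`). As a minimal sketch, and assuming the signature is simply the hexadecimal MD5 digest of the file content (the slides do not say how it is computed), such an identifier could be produced in Python like this. The helper name `file_md5` is hypothetical.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
signature = file_md5(tmp.name)
os.remove(tmp.name)
print(signature)
```

Reading in chunks keeps memory use flat, which matters for the large archives used in the experiments.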



Experimental results
• Experiments were performed on the experimental grid computing infrastructure Grid'5000. They were conducted on four clusters of the Lyon site using more than 50 machines. Two Debian Linux distribution images of SlapOS were set up.
• Deployment steps of SlapOS on Grid'5000
  • SlapOS is designed to work natively with IPv6. Several restrictions are applied to limit access to and from outside the Grid'5000 infrastructure. To overcome these restrictions, we prepared pre-compiled images containing all the standard install files of SlapOS: the kernel and runtime daemons. These images are also configured to run IPv6 at startup. A slapos-vifib image is implemented and a slapos-image is used.
• Usage scenario
  • The aim is to show the capacity of our cloud-hosted model to build a scalable platform for managing data-intensive bag-of-tasks applications.


Experimental results (contd.)
• Two types of metrics:
  • Scalability in terms of how many instance requests are supported. If the master is overloaded, the time needed to respond to an instance request may increase.
  • The time required to create Stork and Bitdew instances as a function of the number of SlapOS nodes.
• In our experiments, we use the blastn program to search for human DNA sequences in DNA databases. To run BLAST jobs we need the BLAST application package, the DNA Genebase (a large compressed archive containing millions of sequences) and the DNA sequence to compare with the sequences in the Genebase.
• The scenario used in our experiments is shown in the Algorithm. At the end of the computation, each job creates a result file containing all matched sequences.
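
The second metric, creation time as a function of the number of nodes, can be illustrated with a small timing harness. The Python sketch below is not the experimental setup from the paper: `deploy_instance` is a placeholder that only sleeps, and both function names and the default of two instances per node are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def deploy_instance(node_id):
    """Placeholder for a real instance request; it only sleeps briefly."""
    time.sleep(0.01)
    return node_id


def time_deployment(node_count, instances_per_node=2):
    """Launch all instance requests at once and return (elapsed_seconds, completed)."""
    jobs = [n for n in range(node_count) for _ in range(instances_per_node)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        completed = list(pool.map(deploy_instance, jobs))
    return time.perf_counter() - start, len(completed)


for nodes in (1, 5, 10):
    elapsed, done = time_deployment(nodes)
    print(f"{nodes} nodes: {done} instances in {elapsed:.3f} s")
```

With a real deployment call in place of the placeholder, plotting the elapsed time against the node count would give the kind of curve the metric describes.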



Experimental results (contd.)
• Experimentation steps:


Experimental results (contd.)
• Result analysis
• Data movement service completion time
  • All instances are launched simultaneously and complete successfully. The total completion time includes the time to:
    • register the SlapOS node with the master
    • deploy the Stork instances
    • transfer the BLAST files from the NCBI FTP server to the SlapOS nodes
  • The completion time of instances is proportional to:
    • the number of nodes connected to the master
    • the number of instances requested simultaneously
• Data sharing service completion time
  • Deployment of server instances
  • Deployment of client instances
  • BLAST execution


Experimental results (contd.)
• The figure illustrates the total completion time for two Stork instances on each of 50 SlapOS nodes (a total of 100 instances). All instances are launched simultaneously and complete successfully.


Conclusion and future work
• The emergence of data-intensive applications and cloud SaaS technologies has brought the flexibility to introduce new data management mechanisms that help basic scientists and grid users to deploy their distributed platforms easily.
• This work focuses on data management as SaaS-based solutions in order to mask the complexity of the installation and configuration processes and the IT infrastructure requirements.
• Since the SaaS solutions are already in production in the SlapOS cloud at Paris 13 University, our future research focuses more on self-configuration, scalability and transfer security.


Acknowledgements
• In France, this work is funded by the FUI-12 Resilience project from the Ministry of Industry. Experiments presented in this paper were partly carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organisations (see https://www.grid5000.fr). Some experiments were carried out on the SlapOS cloud available at the University of Paris 13 (see https://slapos.cloud.univ-paris13.fr).


References
• http://pypi.python.org/pypi/slapos.cookbook/
• Abbes, H., Cerin, C. and Jemni, M. (2008) ‘Bonjourgrid as a decentralized scheduler’, IEEE APSCC, December.
• Foster, I. (2011) ‘Globus online: accelerating and democratizing science through cloud-based services’, IEEE Internet Computing, Vol. 15, No. 3, pp.70–73.

Thank YOU!!!