migrating application data to the cloud using cloud data patterns › publications › inproc... ·...

Institute of Architecture of Application Systems, University of Stuttgart, Germany,

[email protected], [email protected]

Migrating Application Data to the Cloud Using Cloud Data Patterns

Steve Strauch, Vasilios Andrikopoulos, Thomas Bachmann, Frank Leymann

This publication and contributions have been presented at CLOSER 2013

CLOSER 2013 Web site: http://closer.scitevents.org © 2013 SciTePress. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the SciTePress.

@inproceedings{Strauch2013, author = {Steve Strauch and Vasilios Andrikopoulos and Thomas Bachmann and Frank Leymann}, title = {Migrating Application Data to the Cloud Using Cloud Data Patterns}, booktitle = {Proceedings of the 3rd International Conference on Cloud Computing and Service Science, CLOSER 2013, 8-10 May 2013, Aachen, Germany}, year = {2013}, pages = {36-46}, publisher = {SciTePress} }

:

Institute of Architecture of Application Systems

http://closer.scitevents.org/

Migrating Application Data to the Cloud using Cloud Data Patterns

Steve Strauch, Vasilios Andrikopoulos, Thomas Bachmann and Frank LeymannInstitute of Architecture of Application Systems, University of Stuttgart, Stuttgart, Germany

[email protected], [email protected]

Keywords: Application Data Migration, Cloud Data Patterns, Cloud Migration Scenarios, Application Refactoring.

Abstract: Taking advantage of the capabilities offered by Cloud computing requires either an application to be builtspecifically for it, or for existing applications to be migrated to it. In this work we focus on the latter case,and in particular on migrating the application data. Migrating data to the Cloud creates a series of technical,architectural and legal challenges that the State of the Art attempts to address. We organize such efforts into aset of migration scenarios and connect them with a list of reusable solutions for the application data migrationin the form of patterns. From there we define an application data migration methodology and we demonstratehow it can be used in practice.

1 INTRODUCTION

Cloud computing has become increasingly popularwith the industry due to the clear advantage of reduc-ing capital expenditure and transforming it into op-erational costs (Armbrust et al., 2009). To take ad-vantage of Cloud computing, an existing applicationmay be moved to the Cloud (Cloud-enabling it) or de-signed from the beginning to use Cloud technologies(Cloud-native application). Applications are typicallybuilt using a three layer architecture pattern consist-ing of a presentation layer, a business logic layer,and a data layer (Fowler et al., 2002). The presenta-tion layer describes the application-users interactions,the business layer realizes the business logic and thedata layer is responsible for application data storage.The data layer is in turn subdivided into the Data Ac-cess Layer (DAL) and the Database Layer (DBL). TheDAL encapsulates the data access functionality, whilethe DBL is responsible for data persistence and datamanipulation. Figure 1 visualizes the positioning ofthe various layers.

Each application layer can be hosted using differ-ent Cloud deployment models. Possible Cloud de-ployment models, also shown in Figure 1, are: Pri-vate, Public, Community, and Hybrid Cloud (Melland Grance, 2009). Figure 1 shows the various pos-sibilities for distributing an application using the dif-ferent Cloud types. The “traditional” application, notusing any Cloud technology, is shown on the left ofthe figure. In this context, “on-premise” denotes thatthe Cloud infrastructure is hosted inside the company

and “off-premise” denotes that it is hosted outside thecompany.

In this work we focus on the lower two layers ofFigure 1, the DAL and DBL layers of the application.Application data is typically moved to the Cloud be-cause of e. g., Cloud bursting, data analysis or backupand archiving. The migration of the Data Layer to theCloud includes two main steps to be considered: mi-gration of the DBL to the Cloud, and adaptation ofthe DAL to enable Cloud data access. This separationof concerns allows for taking into consideration exist-ing applications for migration that could potentiallykeep their business logic (partially) on-premises anduse more than one Cloud data store provider at thesame time. It has also to be noted here that, for thepurposes of this work, we assume that the decision tomigrate the data layer to the Cloud has already beenmade based on criteria such as cost, effort etc. (Tak

Overview of Cloud Application Hosting Topologies

Traditional

ApplicationLayers

PresentationLayer

Application

Business Layer

Data Access Layer

Database Layer

PresentationLayer

Business Layer

Data Access Layer

Database Layer

Private Cloud

Community Cloud

DeploymentModels

Hybrid Cloud

Public Cloud

PresentationLayer

PresentationLayer

Business Layer

Business Layer

Data Access Layer

Data Access Layer

Database Layer

Database Layer

DataLayer

Figure 1: Overview of Cloud Deployment Models and Ap-plication Layers.

36

et al., 2011; Menzel and Ranjan, 2012; Andrikopou-los et al., 2012).

As previously discussed (Strauch et al., 2012b),using Cloud technology leads to challenges such asincompatibilities with the database layer previouslyused, or even accidental disclosing of critical data bye. g., moving them to a Public Cloud. For this pur-pose, in (Strauch et al., 2012b), a set of Cloud DataPatterns is identified addressing these challenges us-ing the format defined by Hohpe and Woolf (Hohpeand Woolf, 2003). A Cloud Data Pattern describes areusable and implementation technology-independentsolution for a challenge related to the data layer of anapplication in the Cloud for a specific context.

The contribution of this work is focused on:

1. analyzing the State of the Art in migrating appli-cations, and in particular application data to theCloud, and organizing these efforts into a set dis-tinct migration scenarios with particular charac-teristics,

2. mapping the identified Cloud Data Patterns withthe migration scenarios as best practices in orderto deal with each of the scenarios,

3. providing an application data migration method-ology based on the scenario/pattern mapping, anddemonstrating its applicability by means of an il-lustrative case.

The remainder of this paper is organized as fol-lows: Section 2 discusses a motivating scenario thatwill be used throughout the paper. Section 3 presentsthe existing work on migrating the application and ap-plication data and Section 4 organizes it into migra-tion scenarios. Section 5 summarizes the Cloud DataPatterns that were identified in (Strauch et al., 2012b)and (Strauch et al., 2012a) and highlights their keypoints. Section 6 then discusses the mapping betweenscenarios and patterns, and the application data mi-gration methodology that is based on this mapping.The motivating scenario from Section 2 is refactoredin order to demonstrate our proposal in practice. Fi-nally, Section 7 concludes the paper and discusses fu-ture work.

2 MOTIVATING SCENARIO

For purposes of an illustrative example let us con-sider the case of a Health Insurance Company (HIC)in Germany. As a result of the increase of the num-bers of its clients, the company stores their data in twodata centers in different parts of Germany. The datacenters, covering geographical regions A and B, re-spectively, form a Private Cloud data hosting solution.

This Private Cloud acts as a uniform access point andoffers a unified view of the data to the various appli-cations used by the employees of the company. Thecompany is required to provide access to an Exter-nal Auditing Company (EAC) to audit the financialtransactions processed by the company. EAC exe-cutes a series of predefined complex queries on thefinancial transactions data at irregular intervals andreports back to HIC and the responsible authoritiestheir findings. HIC however is also obliged by law toprotect the personal data privacy and confidentialityof the medical record of its clients. For this purpose,the company takes special care to anonymize the re-sults of the queries executed by the auditor in order toensure that no client information is accidentally ex-posed.

Providing EAC with direct access to the databaseof HIC raises a series of concerns about ensuring thesecurity of the company-internal data, and the perfor-mance of the company systems, as an indirect resultof the unpredictable additional load imposed by thecomplex queries executed by the auditor. As a so-lution to these issues, it is proposed to use a PublicCloud data hosting solution provider and migrate aconsistent replica of the financial transaction recordsto the Public Cloud, stripped (to the extent that it ispossible) of any personal data. The auditing companywould then be able to retrieve the necessary informa-tion without burdening the company systems. Sucha migration to the Cloud however, even if only par-tial, requires addressing different kinds of challenges:confidentiality-related (ensuring that it is impossibleto recreate the medical records and other personal in-formation of the company clients using the data in thePublic Cloud), functionality-related (providing bothall the necessary data and the querying mechanismsfor EAC to operate as required), and non-functional(ensuring that the partial migration does not encum-ber in any way the performance of HIC’s systems). Inthe following we discuss how Cloud Data MigrationScenarios and Patterns can be used in conjunction toaddress such issues.

3 BACKGROUND

Data migration can either be seen as part of the mi-gration of the whole application, or as the migrationof only the Data Layer. For this purpose, in the fol-lowing we investigate not only the State of the Art onuse cases for application migration to the Cloud, in-cluding which types of applications are suitable formigration to the Cloud, but also use cases for data-base layer migration only. Some of the cases include

Migrating�Application�Data�to�the�Cloud�using�Cloud�Data�Patterns

37

also migration of application from the Cloud back to atraditional on-premises model. Based on this analysisin Section 4 we derive specific Cloud Data MigrationScenarios.

Application Migration. Amazon proposes a phase-driven approach for Cloud application migration,which includes one phase focusing on data migra-tion (Varia, 2010). The data migration is done in twosteps: selection of the Amazon AWS service, and mi-gration of the data. Furthermore, Amazon providesrecommendations regarding which of their data andstorage services best fit for storing a specific typeof data, e.g., Amazon Relational Database Service(Amazon RDS)1. Apart from that, Amazon describesthree application migration use cases (Varia, 2010):

� Marketing and collaboration Web sites.

� Digital asset management solution using batchprocessing pipelines.

� Claims processing systems using back-end pro-cessing workflows.

Microsoft identifies the following eight types ofapplications to be considered for migration to theCloud (Microsoft Azure, 2012):

1. SaaS applications

2. Highly-scalable Web sites

3. Enterprise applications

4. Business intelligence and data warehouse applica-tions

5. Social or customer-oriented applications

6. Social (online) games

7. Mobile applications

8. High performance or parallel computing applica-tions

Microsoft also provides a step-by-step migrationguideline for migrating local MS SQL Serverdatabases to Azure SQL Database Service mainlyconsisting of two phases: database schema migrationsupported by a wizard, and migration of the data itselfbased on command-line tools (Lee et al., 2009).

Furthermore, Cunningham specifies the followingfour general scenarios when an application is suitablefor migration to the Cloud (Cunningham, 2010):

1. The application is used only in predefined periods.

2. The rapid increase in the need for resources can-not be compensated by buying new hardware.

1http://aws.amazon.com/rds/

3. The application load can be anticipated, e.g.,for seasonal businesses, allowing to optimize re-source utilization.

4. In case of unanticipated load, the load increaseswithout prior indication.

Laszewski et al. (Laszewski and Nauduri, 2011)identify five scenarios for migration of a legacy ap-plication to the Cloud: rewrite/re-architect the appli-cation, changing the platform, automated conversionusing suitable tools, emulation, e.g., through Service-Oriented Architecture enablement and Web services,and data modernization. Data modernization entailsthe migration of the database to the Cloud and itis therefore directly relevant for this work. Finally,Armbrust et al. identify the following types of ap-plications as drivers for Cloud computing (Armbrustet al., 2009):

� Mobile interactive applications

� Parallel batch processing

� Business analytics as a special case of batch pro-cessing

� Extension of computationally-intensive desktopapplications

Data Migration. With respect to the migration ofthe database layer in particular, Microsoft identifiesthe following migration scenarios where only thedatabase layer is migrated to the Cloud and the otherlayers are kept hosted traditionally (Lee et al., 2009):

1. Web applications.

2. Applications only used by departments or smallerworking groups within a company.

3. Data hubs, where the data is mirrored, e.g., on thelaptops of employees, and regularly synchronizedwith the data store in the Cloud.

Especially for the first two scenarios, the performanceissues and complexity due to the potential latencychallenges after the migration of the database layerto the Cloud have to be considered.

Informatica2 is an SaaS provider for secure datamigration, data replication and data synchronizationinto the Cloud. The user configures the synchroniza-tion and migration service via a Web interface and thedata migration is done using locally installed secureagents and encryption.

Different solutions and approaches can thereforebe used for migrating application data to the Cloud. Inthe following section we organize them into distinctmigration scenarios with particular characteristics.

2http://www.informatica.com

CLOSER�2013�-�3rd�International�Conference�on�Cloud�Computing�and�Services�Science

38

Table 1: Cloud Data Migration Scenarios Overview.

Migration Scenario Migration Direction (Traditional or Cloud)

Database Layer Outsourcing Traditional �! CloudUsing Highly-Scalable Data Stores Traditional/Cloud �! CloudGeographical Replication Traditional/Cloud �! CloudSharding Traditional/Cloud �! CloudCloud Bursting Traditional/Cloud ! CloudWorking on Data Copy Traditional �! CloudData Synchronization Traditional ! CloudBackup Traditional/Cloud �! CloudArchiving Traditional/Cloud �! CloudData Import from the Cloud Cloud �! Traditional/Cloud

4 CLOUD DATA MIGRATIONSCENARIOS

Table 1 summarizes the Cloud Data Migration Sce-narios we derived from the State of the Art as dis-cussed in the previous section. The table also iden-tifies the direction of data migration (from/to the tra-ditional to/from any Cloud-based deployment modelin Fig. 1). The migration scenarios Backup, Archiv-ing, and Data Import from the Cloud can be appliedindependently from the migration of an application tothe Cloud. All other scenarios have been derived fromthe types of applications and use cases for applicationmigration to the Cloud.

Database Layer Outsourcing is the complete orpartial migration of the local, traditionally hosteddatabase layer to the Cloud without changing the typeof the data store, e.g., relational, NoSQL, or BinaryLarge Object (BLOB) data store. An example of acomplete Database Layer Outsourcing of a relationaldata store is the migration of a local MySQL databaseto Amazon RDS with MySQL configuration.

The migration scenario Using Highly-ScalableData Stores assumes that before migration the dataare hosted in a non highly-scalable relational datastore, which might be hosted traditionally, or in aCloud environment. The data in this scenario are mi-grated to a NoSQL or BLOB data store. Two ex-amples taken from the industry are the media con-tent delivery company Netflix (Anand, 2010) and theWeb information company Alexa (Amazon.com, Inc.,2011). Both of them migrated their data from a re-lational data store to a NoSQL (AmazonSimpleDB3)

3http://aws.amazon.com/simpledb/

and BLOB data store (Amazon S34) in the Cloud forthe purpose of achieving better scalability.

For latency critical applications, the data aremoved as near as possible either to the user, or tothe processing logic in order to reduce data accesslatency. The Geographical Replication migrationscenario considers replicating the database layer forstatic data, as in the case of content delivery networks,or for data changing dynamically using, e.g., readreplicas or read-write replicas. This migration sce-nario implies that the different replicas have to be keptconsistent by realizing synchronization mechanisms.An example for replication of static data is AmazonCloudFront5 and an example for replication of datachanging dynamically is Microsoft Azure SQL usedwith its Data Sync functionality (Redkar and Guidici,2011).

In contrast to the previous scenario, Sharding dis-tributes the data into disjoint groups without requir-ing synchronization between the different data stores(shards) (Pritchett, 2008). The distribution can be ei-ther done geographically or based on the functionalgrouping of the data. Guidelines for the realization ofa solution considering sharding are available for Mi-crosoft Azure SQL6 and Amazon RDS7.

Cloud Bursting is the temporary outsourcing ofthe database layer to the Cloud in order to use ad-ditional resources for off-loading of peak loads, be-cause the available on-premise resources are not suf-

4http://aws.amazon.com/s3/5http://aws.amazon.com/cloudfront/6http://social.technet.microsoft.com/wiki/contents/

articles/1926.how-to-shard-with-windows-azure-sql-database.aspx

7http://aws.amazon.com/articles/0040302286264415


39

ficient. Typical cases where Cloud bursting is appliedare seasonal businesses like Christmas shopping out-lets. A realization of a solution based on this scenariocan be implemented using Microsoft Azure SQL usedwith its Data Sync functionality (Redkar and Guidici,2011).

The migration scenario Working on Data Copy re-quires the creation of a complete or partial copy of thedatabase layer in the Cloud for the purpose of avoid-ing additional load on the production system by, e.g.,running complex data analysis. As NoSQL data storesare highly scalable and thus are able to handle addi-tional load in parallel to the production process in thegeneral case the source and target data store for thismigration scenario are relational data stores. An ex-ample of this migration scenario are data warehouseapplications operating on non-live data.

Data Synchronization enables a set of users towork in parallel and temporarily off-line while shar-ing relatively up-to-date data without latency for thedata access. The database layer is copied into theCloud and a synchronization mechanism is realized.Each user works on a local replica of the databaselayer which is regularly synchronized with the data-base layer in the Cloud. A detailed example for DataSynchronization is provided in (Lee et al., 2009). Thesales staff are working on local copies of customersdata and lists of company product prices on their lap-tops. The data are regularly synchronized in the back-ground when the employees of the company are con-nected to the Web.

In order to fulfill compliance regulations such asSOX (United States Congress, 2002) it is required tokeep business data and store them over a fixed pe-riod of time. These data provide a snapshot at a spe-cific point in time and serve as a save point to re-turn to, e.g., in case of data loss. Especially whenusing Cloud providers it is recommended to regu-larly backup your data as you have no direct controlover the data when stored on the infrastructure of theCloud provider (Badger et al., 2012). Backup there-fore constitutes a particular case of data migration tothe Cloud, occurring at predefined intervals. An ex-ample backup solution in the Cloud is provided byDatacastle8. Archiving also creates a complete copyof the database layer to the Cloud, but serving a differ-ent purpose than Backup. The usage of archived datais not foreseen in case of errors or data loss, but foranalysis and as evidence in court cases (ConsultativeCommittee for Space Data Systems, 2002). Amazonprovides the Cloud archiving service Glacier 9 for thispurpose.

8http://www.datacastlecorp.com9http://aws.amazon.com/glacier/

Several Cloud services are offering access to datavia an API. Thus, the database layer of the Cloud ser-vice can be partially or completely imported to thelocal data center in order to minimize the latency fordata access during processing. Examples for Cloudservices enabling the migration scenario Data Importfrom the Cloud are governments providing their datain a machine readable format like Open Data10 andservices publishing data such as Twitter 11 in order toenable social media monitoring.

5 CLOUD DATA PATTERNS

In order to provide support when migrating applica-tion data to the Cloud there is a clear need for de-scribing reusable solutions to overcome the recurringchallenges such as incompatibilities on an abstractand technology independent level as patterns. Forthis purpose, in previous work (Strauch et al., 2012a;Strauch et al., 2012b) an initial list of reusable andtechnology independent solutions for the identifiedchallenges was proposed in the form of Cloud DataPatterns. A Cloud Data Pattern describes a reusableand implementation technology-independent solutionfor a challenge related to the Data Layer of an ap-plication in the Cloud for a specific context (Strauchet al., 2012b).

These patterns have been identified as part of thework in various EU research projects, and especiallyduring the collaboration with industry partners. Theidentified patterns are also based on literature researchfocusing on available reports from companies that al-ready migrated their application data to the Cloud andadapted their application accordingly, e.g., by Net-flix (Anand, 2010). Additionally, guidelines and bestpractices on how to design and build applications inthe Cloud, e.g., for enabling scalability (Adler, 2011),are also taken into consideration. The patterns are fur-thermore based on requirements and challenges con-cerning adaptation of applications for the Cloud en-vironment (Andrikopoulos et al., 2012) and manage-ment of these applications in the Cloud (Conway andCurry, 2012).

So far, three categories of Cloud Data Patternshave been identified:1. Functional Patterns2. Non-functional Patterns3. Confidentiality Patterns

Confidentiality patterns can be considered a subcate-gory of the non-functional patterns; they are treated

10http://www.opendata-network.org11http://dev.twitter.com/docs/streaming-api/methods


40

Table 2: Cloud Data Patterns Summary.

Category Name Icon Challenge

Functional

Data Store FunctionalityExtension

How can a Cloud data store provide a missingfunctionality?

Emulator of StoredProcedures

How can a Cloud data store not supporting storedprocedures provide such functionality?

Non-functional

Local Database ProxyHow can a Cloud data store not supporting hori-zontal data read scalability provide that function-ality?

Local Sharding-BasedRouter

How can a Cloud data store not supporting hori-zontal data read and write scalability provide thatfunctionality?

Confidentiality

Confidentiality LevelData Aggregator

How can data of different confidentiality levelsfrom different data sources be aggregated to onecommon confidentiality level?

Confidentiality LevelData Splitter

How can data of one common confidentiality levelbe categorized and split into separate data partsbelonging to different confidentiality levels?

Filter of Critical Data

How can data-access rights be kept when movingthe Database Layer into the private Cloud and apart of the Business Layer and a part of the dataaccess layer into the public Cloud?

Pseudonymizer ofCritical Data

How can a private Cloud data store ensure passingcritical data in pseudonymized form to the publicCloud?

Anonymizer of CriticalData

How can a private Cloud data store ensure passingcritical data only in anonymized form to the publicCloud?

separately however due to their importance to theData Layer. Table 2 provides an overview of the iden-tified and described Cloud Data Patterns so far.

Functional Cloud Data Patterns like Data StoreFunctionality Extension and Emulator of Stored Pro-cedures provide reusable solutions for challenges re-lated to offered functionality by Cloud data stores andservices. In case the type of data store changes dur-ing the migration, e.g., from RDBMS to NoSQL, orBLOB store, it might be not sufficient to emulateor add additional functionality by using FunctionalCloud Data Patterns. For example, there may be noequivalent database schema, the consistency modelmay change from strict to eventual consistency (Vo-gels, 2009), and ACID transactions may not be sup-ported.

Non-Functional Cloud Data Patterns focus onproviding solutions for ensuring an acceptable Qual-ity of Service (QoS) level by means of scalability in

case of increasing data read of data write load. Thereare two options for this purpose: vertical and hor-izontal data scaling (Pritchett, 2008; Zawodny andBalling, 2004). Elasticity with respect to data reads isnormally achieved by data replication (Buretta, 1997)using read replicas with a master/slave configuration.This is because when write replicas (several masterdatabases) are used, the performance might decreasedepending on the consistency model (strict or even-tual consistency (Vogels, 2009)).

Confidentiality Cloud Data Patterns provide solu-tions for avoiding disclosure of confidential data (Ta-ble 2). Confidentiality includes security and privacy.With respect to confidentiality, we consider the data tobe kept secure and private as critical data such as busi-ness secrets of companies, personal data, and healthcare data, for instance. The personal data and accountinformation of the customers as part of the billingdata of HIC, for example, have to be pseudonymized


41

or anonymized before migrating the billing data tothe Public Cloud. In this case, the actual peoplethe data is about (HIC customers), and the ownerand user of the data (HIC and EAC, respectively) areclearly distinguished. The migration of anonymizedor pseudonymized data to the Cloud therefore doesnot effect the management of the identity of users ofCloud data hosting solutions, e.g., in large-scale Pub-lic Cloud environments. The interested reader is re-ferred to (Strauch et al., 2012a; Strauch et al., 2012b)for an in-depth discussion on these patterns.

6 PATTERN-BASEDAPPLICATION REFACTORING

In this section we first introduce the mapping ofCloud Data Migration Scenarios to Cloud Data Pat-terns (Section 6.1). Afterwards we propose a step-by-step methodology for the migration of the data layerto the Cloud covering all migration scenarios and us-ing the patterns (Section 6.2). Finally, we apply themethodology to the application introduced in the mo-tivating scenario focusing on application architecturerefactoring using the patterns (Section 6.3).

6.1 Mapping Scenarios to Patterns

Table 3 provides an overview on how Cloud Data Mi-gration Scenarios map to Cloud Data Patterns. In or-der to avoid repetition, in the following we discuss themapping of the migration scenario Database LayerOutsourcing to the various patterns in detail and forthe other scenarios we focus on explaining the dif-ferences compared to the mapping of this migrationscenario.

All patterns are applicable for the scenario Data-base Layer Outsourcing depending on the specificconditions of the required solution. As no change ofthe data store type takes place, but there might be achange of the product, potential incompatibilities, e.g.regarding realization of data types, or missing func-tionalities used before, like stored procedures, haveto be added or emulated using the functional patterns.In case the DBL is completely moved to the Cloud,a solution to reduce the latency for data access is tohost read replicas locally and to use a Local Data-base Proxy to forward data reads to these replicasinstead of querying the DBL in the Cloud for everyread request. In order to scale both data writes andreads, a Sharding-Based Router can be used to enablesharding in case it is not supported by the Cloud datastore(s) chosen for migration.

When the database layer is migrated to the pub-lic Cloud, it has to be decided which critical datawill not be moved to the Cloud, e.g., as business se-crets. The patterns Confidentiality Level Data Aggre-gator and Confidentiality Level Data Splitter enablethe differentiation and harmonization of the confiden-tiality level of different data sets from potentially dif-ferent domains and different data sources. The otherconfidentiality patterns enable filtering of critical datathat have to stay locally in case the database layeris only partially migrated to the Cloud, and remov-ing or masking the critical data by anonymization orpseudonymization before moving them to the publicCloud.

As in the migration scenario Using Highly-Scalable Data Stores the data in the DBL can alsobe only partially migrated, the non-functional pat-terns might be applicable as in the fist migration sce-nario. Thus, there are no differences in the patternmapping compared to the migration scenario Data-base Layer Outsourcing. The usage of the Sharding-Based Router is not typical for the migration scenarioGeographical Replication since sharding the data dis-jointly distributes them, instead of replicating them.When combining replication and sharding however,e.g., by sharding some write replicas that are more of-ten accessed, this pattern is also applicable.

A non-typical use of the Local Database Proxypattern for the scenario Sharding can be applied incase, e.g., some shards that are more often accessedvia read are replicated for read scalability. The Lo-cal Sharding-based Router might be even used in thescenario Sharding in order to keep the sharding in thedatabase layer transparent for the business layer toavoid adaptations to the business logic. In the Work-ing on Data Copy scenario, data analysis is performedon the complete or partial copy of the database layer,instead of the actual data. As such, the non-functionalpatterns are not applicable for this scenario.

None of the Cloud Data Patterns are applicablein the Backup scenario, because the purpose of thisscenario is to restore the latest, saved state of the sys-tem as fast as possible in case an error or outage oc-curs. No further processing on the backup data takesplace. Thus, in such a time critical situation eventhe usage of the Pseudonymizer of Critical Data isto much overhead that delays restoring the latest sta-tus. The Pseudonymizer of Critical Data however canbe used in the scenario Archiving to save even criticaldata, e.g. personal data, in the public Cloud. The re-lation of the masked data to the original data has tobe stored separately and safely in order to avoid re-vealing and to be able to revert the masking of thedata before the archived data can be used. All other


42

Table 3: Scenario/Pattern Mapping.

Functional Non-functional Confidentiality

Dat

aSt

ore

Func

tiona

lity

Ext

ensi

on

Em

ulat

orof

Stor

edPr

oced

ures

Loc

alD

atab

ase

Prox

y

Loc

alSh

ardi

ng-b

ased

Rou

ter

Con

fiden

tialit

yL

evel

Dat

aA

ggeg

rato

r

Con

fiden

tialit

yL

evel

Dat

aSp

litte

r

Filte

rofC

ritic

alD

ata

Pseu

dony

miz

erof

Cri

tical

Dat

a

Ano

nym

izer

ofC

ritic

alD

ata

Database Layer Outsourcing X X X X X X X X X

Using Highly-Scalable DataStores

X X X X X X X X X

Geographical Replication X X X (X) X X X X X

Sharding X X (X) (X) X X X X X

Cloud Bursting X X X X X X X X X

Working on Data Copy X X – – X X X X X

Data Synchronization X X X X X X X X X

Backup – – – – – – – – –Archiving – – – – – – – X –Data Import from the Cloud – – – – (X) (X) (X) (X) (X)

Legend: X: applicable, (X) : might be applicable, – : not applicable

patterns are not applicable to scenario Archiving, asthere will be no further processing on the data after ithas been archived.

As the user has no impact on the API offered bythe Cloud service to import data, the functional andnon-functional patterns are not relevant for the mi-gration scenario Data Import from the Cloud. Theconfidentiality patterns might be applicable depend-ing on the goal of using the imported data. Thus, thedata might be filtered, anonymized, pseudonymized,categorized or harmonized based on pre-defined con-fidentiality levels before they are imported.

6.2 Cloud Data Migration

In this section we propose a methodology for migra-tion of the database layer to the Cloud and adaptationof the data access and business layer accordingly. Themethodology entails the following steps:

1. Identify relevant Cloud Data Migration Scenario.

2. Describe desired Cloud data store.

3. Select suitable Cloud Data Store.

4. Identify the applicable patterns to solve incompat-ibilities and to refactor the application.

5. Refactor the application by adapting the databaselayer and upper architectural layers while imple-menting the identified patterns.

6. Migrate the data to the selected Cloud Data Store.First of all the Cloud Data Migration Scenario has

to be identified. Therefore, the specific requirementsof the solution needed have to be mapped to the char-acteristics of the migration scenarios defined in Sec-tion 4. Afterwards, the functional and non-functionalrequirements such as data store type and read/writethroughput of the desired Cloud data store have tobe defined. By mapping this set of requirements tothe specifications provided by the Cloud data storeproviders, a specific Cloud data store can be selected.This step can be automated, e.g., by querying a Clouddata store repository containing the concrete proper-ties of available Cloud data stores.

In order to identify the patterns to be used tosolve incompatibilities and refactor the application,the non-functional and functional requirements of thedatabase layer used before migration have to be spec-ified. The comparison and mapping of these require-


43

Evaluation Scenario

Traditional / Cloud Private Cloud

Deployment Models Public Cloud

Ap

plic

atio

n L

ayer

s External Auditing Company

External Cloud Data Store Provider

Health Insurance Company

Database Layer*

Data Access Layer

Database Layer

Region A

Data Access Layer

Legend

Dataflow

Partial Migration

Modified

*

Region B

Data Layer

Component

Figure 2: Refactoring of motivating scenario using Cloud Data Patterns (Data Layer).

ments of the database layer previously used, to thedefined requirements for the desired Cloud data storelead to the identification of potential conflicts and in-compatibilities. The mapping of the identified mi-gration scenario to the corresponding patterns (Sec-tion 6.1) should be used for orientation and limitingthe number of patterns to be considered for applica-tion refactoring. Based on the characteristics of theCloud Data Patterns, and on the specific conditionswhen they can be applied (Section 5), the patternsto solve the incompatibilities and conflicts have to bechosen.

After the patterns to be applied have been identi-fied, the actual application refactoring can take placeby implementing the patterns and thus adapting thedatabase layer and upper architectural layers accord-ingly. This is because the adaptation of the databaselayer (and/or data access layer) by using Cloud DataPatterns may require adaptations of the business layeras well. For example the usage of confidentiality pat-terns for filtering, anonymization, or pseudonymiza-tion creates a different view on the data for the busi-ness layer by retrieving the application data in an ag-gregated format in comparison to the original data-base layer. Finally, the data themselves have to bemigrated to the selected Cloud data store, includingthe necessary DB schema creation. Existing tools andservices like the ones discussed in Section 3 can beused for this purpose.

6.3 Refactoring the Motivating Scenario

In the following, we discuss how the functional, non-functional, and confidentiality patterns can be used in

practice to realize the application based on the mo-tivating scenario introduced in Section 2 and refac-tor the application accordingly. As discussed in Sec-tion 2, HIC has to provide access to EAC to auditthe financial transactions processed by the companywhile avoiding additional load on the system. This es-sentially means that only the data on financial transac-tions have to be replicated. Furthermore, EAC can dono changes to the actual application data and thereforeno synchronization is required. These specific condi-tions on the required solution lead to the choice of themigration scenario Geographical Replication in thefirst step of the methodology (Section 6.2). Thus, thedatabase layer of the HIC is partially replicated in thepublic Cloud by focusing on the financial transactionsonly.

In the following we focus on identifying the pat-terns to be used and on the refactoring of the applica-tion based on the requirements described in the moti-vating scenario. According to the mapping of the sce-nario Geographical Replication to Cloud Data Pat-terns (Table 3) all patterns are potentially applicableto this scenario. Figure 2 provides an overview of therefactored application and realization of the scenariousing Cloud Data Patterns.

More specifically, in order to provide horizon-tal scalability for reads and writes to the data of theclients and their transactions, the Local Sharding-Based Router pattern is used. The data is separatedbetween the two data centers according to the geo-graphical location. The Google App Engine Datastoreis chosen for replicating the data on financial transac-tions, configured accordingly. The client informationand their corresponding medical records are critical


44

data to be kept only in the company’s private Cloud.Financial transactions information will appear both inthe private and in the public Cloud. In order to keepboth the data stored in the private data store and theone outsourced to the public Cloud consistent, dataupdates and inserts should be done in parallel to boththe private and the public part of the database layer.

However, only a part of the financial data is nec-essary for the auditing (e. g., client names can be re-moved) and the remaining can be pseudonymized be-fore moving to the public Cloud (e. g., bank accountnumbers replaced by serial IDs). A composition ofthe Filter of Critical Data and the Pseudonymizer ofCritical Data is used to fulfill both requirements. Thefilter is configured so that only data on financial trans-actions pass it. After passing the filter, the informa-tion on financial transactions is pseudonymized be-fore it is stored in the public Cloud. Storage of dataon the auditing company side can be done either in aprivate Cloud or in the traditional manner; this is outof the scope of this discussion.

Due to the challenges identified in the motivat-ing scenario regarding the data access of the audit-ing company, the query results must not contain anyrelations concerning clients and their correspondingmedical records or personal data. Therefore, this in-formation has to be completely deleted before pass-ing the query results to the auditing company (in-stead of simply obfuscating this relation by usingpseudonymization). In addition, the queries to be ex-ecuted by the auditor have to be agreed upon in ad-vance. For these purposes, the realization uses a com-position of the Anonymizer of Critical Data and theEmulator of Stored Procedures patterns. The emula-tor is used to predefine and restrict the data queriesallowed to be executed by the auditing company. TheAnonymizer of Critical Data additionally ensures theremoval of any critical information from the results ofthe queries.

A combination of functional, non-functional, andconfidentiality patterns can therefore be used in tan-dem by the proposed methodology to address the re-quirements of the use case posed by the motivatingscenario.

7 CONCLUSIONS AND FUTUREWORK

In the previous sections we discussed the need forsupporting the migration of application data to theCloud. A number of technical, architectural and legalissues related to this effort were identified by meansof a motivating scenario. The State of the Art in mi-

grating applications and application data to the Cloudwas then analyzed for best practices and associatedchallenges, and it was organized into ten distinct datamigration scenarios. These scenarios cover differ-ent aspects of data migration to and from the Cloudand include, among others, outsourcing the databaselayer, replication of data to reduce latency, Cloudbursting, and synchronization. A set of Cloud DataPatterns as reusable solutions to common migration-related challenges was then briefly summarized, and amapping between the identified scenarios and the pat-terns was presented. An application data migrationmethodology was then introduced based on this map-ping, and the motivating scenario application used inthe beginning of the paper was refactored to demon-strate the applicability of our work.

Neither the list of the migration scenarios, nor theone of the patterns is claimed to be complete. Furthereffort in augmenting and refining both scenarios andpatterns is required in the future. Providing toolingsupport for applying the application data migrationmethodology discussed in Section 6 could providesignificant help in this direction. A decision supportsystem built for this purpose, in the manner of worklike (Menzel and Ranjan, 2012) and (Khajeh-Hosseiniet al., 2012), could be an important contribution on itsown, guiding practictioners and researchers throughthe migration of their application to the Cloud. Fi-nally, the discussion in this work has to be positionedwithin a larger framework dealing with the migrationof applications to the Cloud, and the adaptations thatcould result from this migration.

ACKNOWLEDGEMENTS

The research leading to these results has re-ceived funding from the 4CaaSt project (http://www.4caast.eu) part of the European Union’s SeventhFramework Programme (FP7/2007-2013) under grantagreement no. 258862.

REFERENCES

Adler, B. (2011). Building Scalable Applications In theCloud: Reference Architecture & Best Practices,RightScale Inc.

Amazon.com, Inc. (2011). AWS Case Study: Alexa.Anand, S. (2010). Netflix’s Transition to High-Availability

Storage Systems.Andrikopoulos, V., Binz, T., Leymann, F., and Strauch, S.

(2012). How to Adapt Applications for the Cloud En-vironment. Springer Computing.


45

Armbrust, M. et al. (2009). Above the Clouds: A Berke-ley View of Cloud Computing. Technical ReportUCB/EECS-2009-28, EECS Department, Universityof California, Berkeley.

Badger, L., Grance, T., R., P.-C., and Voas, J. (2012). CloudComputing Synopsis and Recommendations - Recom-mendations of the National Institute of Standards andTechnology. NIST Special Publication 800-146.

Buretta, M. (1997). Data Replication: Tools and Tech-niques for Managing Distributed Information. JohnWiley & Sons, Inc.

Consultative Committee for Space Data Systems (2002).Reference Model for an Open Archival InformationSystem (OAIS).

Conway, G. and Curry, E. (2012). Managing Cloud Com-puting: A Life Cycle Approach. In Proceedings ofCLOSER’12. SciTePress.

Cunningham, S. R. (2010). Windows Azure Applica-tion Profile Guidance. Custom E-Commerce (Elastic-ity Focus) Application Migration Scenario.

Fowler, M. et al. (2002). Patterns of Enterprise ApplicationArchitecture. Addison-Wesley Professional.

Hohpe, G. and Woolf, B. (2003). Enterprise IntegrationPatterns: Designing, Building, and Deploying Mes-saging Solutions. Addison-Wesley Longman Publish-ing Co., Inc. Boston, MA, USA.

Khajeh-Hosseini, A., Greenwood, D., Smith, J. W., andSommerville, I. (2012). The Cloud Adoption Toolkit:Supporting Cloud Adoption Decisions in the Enter-prise. Software: Practice and Experience, 42(4):447–465.

Laszewski, T. and Nauduri, P. (2011). Migrating to theCloud: Oracle Client / Server Modernization. Else-vier Science.

Lee, J., Malcolm, G., and Matthews, A. (2009). Overviewof Microsoft SQL Azure Database.

Mell, P. and Grance, T. (2009). Cloud Computing Defini-tion. National Institute of Standards and Technology,Version 15.

Menzel, M. and Ranjan, R. (2012). CloudGenius: DecisionSupport for Web Server Cloud Migration. In Proceed-ings of WWW ’12, New York, NY, USA. ACM.

Microsoft Azure (2012). Generic Migration Scenarios andCase Studies.

Pritchett, D. (2008). BASE: An ACID Alternative. Queue,6(3):48–55.

Redkar, T. and Guidici, T. (2011). Windows Azure Platform.Apress.

Strauch, S., Andrikopoulos, V., Breitenbucher, U., Kopp,O., and Leymann, F. (2012a). Non-Functional DataLayer Patterns for Cloud Applications. In Proceedingsof CloudCom’12. IEEE Computer Society Press.

Strauch, S., Breitenbucher, U., Kopp, O., Leymann, F., andUnger, T. (2012b). Cloud Data Patterns for Confiden-tiality. In Proceedings of CLOSER’12. SciTePress.

Tak, B. C., Urgaonkar, B., and Sivasubramaniam, A.(2011). To Move or Not to Move: The Economics ofCloud Computing. In Proceedings of HotCloud’11,Berkeley, CA, USA. USENIX Association.

United States Congress (2002). Sarbanes-Oxley Act (SOX).

Varia, J. (2010). Migrating your Existing Applications tothe AWS Cloud. A Phase-driven Approach to CloudMigration.

Vogels, W. (2009). Eventually Consistent. Communicationsof the ACM, 52(1):40–44.

Zawodny, J. and Balling, D. (2004). High PerformanceMySQL: Optimization, Backups, Replication, Load-balancing, and More. O’Reilly & Associates, Inc. Se-bastopol, CA, USA.

All links were last followed on February 5, 2013.


46

migrating application data to the cloud using cloud data patterns › publications › inproc... ·...

Documents