Scalable Distributed Service Integrity Attestation for Software-as-a-Service Clouds

Juan Du, Member, IEEE, Daniel J. Dean, Student Member, IEEE, Yongmin Tan, Member, IEEE, Xiaohui Gu, Senior Member, IEEE, and Ting Yu, Member, IEEE

Abstract—Software-as-a-service (SaaS) cloud systems enable application service providers to deliver their applications via massive cloud computing infrastructures. However, due to their sharing nature, SaaS clouds are vulnerable to malicious attacks. In this paper, we present IntTest, a scalable and effective service integrity attestation framework for SaaS clouds. IntTest provides a novel integrated attestation graph analysis scheme that can provide stronger attacker pinpointing power than previous schemes. Moreover, IntTest can automatically enhance result quality by replacing bad results produced by malicious attackers with good results produced by benign service providers. We have implemented a prototype of the IntTest system and tested it on a production cloud computing infrastructure using IBM System S stream processing applications. Our experimental results show that IntTest can achieve higher attacker pinpointing accuracy than existing approaches. IntTest does not require any special hardware or secure kernel support and imposes little performance impact on the application, which makes it practical for large-scale cloud systems.

Index Terms—Distributed service integrity attestation, cloud computing, secure distributed data processing


1 INTRODUCTION

CLOUD computing has emerged as a cost-effective resource leasing paradigm, which obviates the need for users to maintain complex physical computing infrastructures by themselves. Software-as-a-service (SaaS) clouds (e.g., Amazon Web Services (AWS) [1] and Google AppEngine [2]) build upon the concepts of software as a service [3] and service-oriented architecture (SOA) [4], [5], which enable application service providers (ASPs) to deliver their applications via the massive cloud computing infrastructure. In particular, our work focuses on data stream processing services [6], [7], [8], which are considered to be one class of killer applications for clouds, with many real-world applications in security surveillance, scientific computing, and business intelligence.

However, cloud computing infrastructures are often shared by ASPs from different security domains, which makes them vulnerable to malicious attacks [9], [10]. For example, attackers can pretend to be legitimate service providers to provide fake service components, and the service components provided by benign service providers may include security holes that can be exploited by attackers. Our work focuses on service integrity attacks that cause the user to receive untruthful data processing results, as illustrated in Fig. 1. Although confidentiality and privacy protection problems have been extensively studied by previous research [11], [12], [13], [14], [15], [16], the service integrity attestation problem has not been properly addressed. Moreover, service integrity is the most prevalent problem, which needs to be addressed regardless of whether public or private data are processed by the cloud system.

Although previous work has provided various software integrity attestation solutions [9], [17], [18], [19], [20], [21], [22], [23], those techniques often require special trusted hardware or secure kernel support, which makes them difficult to deploy on large-scale cloud computing infrastructures. Traditional Byzantine fault tolerance (BFT) techniques [24], [25] can detect arbitrary misbehavior using full-time majority voting (FTMV) over all replicas, which, however, incurs high overhead on the cloud system. A detailed discussion of the related work can be found in Section 5 of the online supplementary material, which is available on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.62.

In this paper, we present IntTest, a new integrated service integrity attestation framework for multitenant cloud systems. IntTest provides a practical service integrity attestation scheme that does not assume trusted entities on third-party service provisioning sites or require application modifications. IntTest builds upon our previous work, RunTest [26] and AdapTest [27], but provides stronger malicious attacker pinpointing power than both. Specifically, RunTest and AdapTest, as well as traditional majority voting schemes, need to assume that benign service providers form the majority in every service function. However, in large-scale multitenant cloud systems, multiple malicious attackers may launch colluding attacks on certain targeted service functions to invalidate this assumption. To address this challenge, IntTest takes a holistic approach by systematically examining both the consistency and the inconsistency relationships among different service providers within the entire cloud system. IntTest examines both per-function consistency graphs and the global inconsistency graph.


. J. Du is with Amazon. E-mail: [email protected].
. D.J. Dean, X. Gu, and T. Yu are with the Department of Computer Science, North Carolina State University. E-mail: [email protected], {gu, yu}@csc.ncsu.edu.
. Y. Tan is with MathWorks. E-mail: [email protected].

Manuscript received 23 Oct. 2012; revised 16 Jan. 2013; accepted 12 Feb. 2013; published online 27 Feb. 2013. Recommended for acceptance by L.E. Li. Digital Object Identifier no. 10.1109/TPDS.2013.62.



The per-function consistency graph analysis can limit the scope of damage caused by colluding attackers, while the global inconsistency graph analysis can effectively expose those attackers that try to compromise many service functions. Hence, IntTest can still pinpoint malicious attackers even if they become the majority for some service functions.

By taking an integrated approach, IntTest not only pinpoints attackers more efficiently but also suppresses aggressive attackers and limits the scope of the damage caused by colluding attacks. Moreover, IntTest provides result autocorrection, which can automatically replace corrupted data processing results produced by malicious attackers with good results produced by benign service providers.

Specifically, this paper makes the following contributions:

. We provide a scalable and efficient distributed service integrity attestation framework for large-scale cloud computing infrastructures.
. We present a novel integrated service integrity attestation scheme that can achieve higher pinpointing accuracy than previous techniques.
. We describe a result autocorrection technique that can automatically correct the corrupted results produced by malicious attackers.
. We conduct both an analytical study and an experimental evaluation to quantify the accuracy and overhead of the integrated service integrity attestation scheme.

We have implemented a prototype of the IntTest system and tested it on NCSU's virtual computing lab (VCL) [28], a production cloud computing infrastructure that operates in a similar way to the Amazon Elastic Compute Cloud (EC2) [29]. The benchmark applications we use to evaluate IntTest are distributed data stream processing services provided by the IBM System S stream processing platform [8], [30], an industry-strength data stream processing system. Experimental results show that IntTest can achieve more accurate pinpointing than existing schemes (e.g., RunTest, AdapTest, and full-time majority voting) under strategically colluding attacks. IntTest is scalable and can reduce the attestation overhead by more than one order of magnitude compared to the traditional full-time majority voting scheme.

The rest of the paper is organized as follows: Section 2 presents our system model. Section 3 presents the design details. Section 4 provides an analytical study of the IntTest system. Section 5 presents the experimental results. Section 6 summarizes the limitations of our approach. Finally, the paper concludes in Section 7.

2 PRELIMINARY

In this section, we first introduce the software-as-a-service cloud system model. We then describe our problem formulation, including the service integrity attack model and our key assumptions. Table 1 summarizes the notations used in this paper.

2.1 SaaS Cloud System Model

A SaaS cloud builds upon the concepts of software as a service [3] and service-oriented architecture [4], [5], which allow application service providers to deliver their applications via large-scale cloud computing infrastructures. For example, both Amazon Web Services and Google AppEngine provide a set of application services supporting enterprise applications and big data processing. A distributed application service can be dynamically composed from individual service components provided by different ASPs (p_i) [31], [32]. For example, a disaster assistance claim processing application [33] consists of a voice-over-IP (VoIP) analysis component, an e-mail analysis component, a community discovery component, and clustering and join components. Our work focuses on data processing services [6], [8], [34], [35], which have become increasingly popular with applications in many real-world usage domains such as business intelligence, security surveillance, and scientific computing. Each service component, denoted by c_i, provides a specific data processing function, denoted by f_i, such as sorting, filtering, correlation, or data mining utilities. Each service component can have one or more input ports for receiving input data tuples, denoted by d_i, and one or more output ports to emit output tuples.

In a large-scale SaaS cloud, the same service function can be provided by different ASPs. These functionally equivalent service components exist because: 1) service providers may create replicated service components for load balancing and fault tolerance purposes; and 2) popular services may attract different service providers for profit. To support automatic service composition, we can deploy a set of portal nodes [31], [32] that serve as the gateway for the user to access the composed services in the SaaS cloud. The portal node can aggregate different service components into composite services based on the user's requirements. For security protection, the portal node can perform authentication on users to prevent malicious users from disturbing normal service provisioning.


Fig. 1. Service integrity attack in cloud-based data processing. S_i denotes different service components and VM denotes virtual machines.

TABLE 1. Notations.


Different from other open distributed systems such as peer-to-peer networks and volunteer computing environments, SaaS cloud systems possess a set of unique features. First, third-party ASPs typically do not want to reveal the internal implementation details of their software services, for intellectual property protection. Thus, it is difficult to rely only on challenge-based attestation schemes [20], [36], [37], where the verifier is assumed to have certain knowledge about the software implementation or access to the software source code. Second, both the cloud infrastructure provider and the third-party service providers are autonomous entities. It is impractical to impose any special hardware or secure kernel support on individual service provisioning sites. Third, for privacy protection, only portal nodes have global information about which service functions are provided by which service providers in the SaaS cloud. Neither cloud users nor individual ASPs have global knowledge of the SaaS cloud, such as the number of ASPs or the identities of the ASPs offering a specific service function.

2.2 Problem Formulation

Given an SaaS cloud system, the goal of IntTest is to pinpoint any malicious service provider that offers an untruthful service function. IntTest treats all service components as black boxes and does not require any special hardware or secure kernel support on the cloud platform. We now describe our attack model and our key assumptions.

Attack model. A malicious attacker can pretend to be a legitimate service provider or take control of vulnerable service providers to provide untruthful service functions. Malicious attackers can be stealthy, which means they can misbehave on a selective subset of input data or service functions while pretending to be benign service providers on other input data or functions. The stealthy behavior makes detection more challenging for the following reasons: 1) the detection scheme needs to be hidden from the attackers, to prevent attackers from learning which data processing results will be verified and thus easily escaping detection; and 2) the detection scheme needs to be scalable while still capturing misbehavior that may be both unpredictable and occasional.

In a large-scale cloud system, we need to consider colluding attack scenarios where multiple malicious attackers collude or multiple service sites are simultaneously compromised and controlled by a single malicious attacker. Attackers could collude sporadically, which means an attacker can collude with an arbitrary subset of its colluders at any time. We assume that malicious nodes have no knowledge of other nodes except those they interact with directly. However, attackers can communicate with their colluders in an arbitrary way. Attackers can also change their attacking and colluding strategies arbitrarily.

Assumptions. We first assume that the total number of malicious service components is less than the total number of benign ones in the entire cloud system. Without this assumption, it would be very hard, if not totally impossible, for any attack detection scheme to work when comparable ground-truth processing results are not available. However, unlike RunTest, AdapTest, or any previous majority voting scheme, IntTest does not assume that benign service components form the majority for every service function, which greatly enhances our pinpointing power and limits the scope of service functions that can be compromised by malicious attackers.

Second, we assume that the data processing services are input-deterministic, that is, given the same input, a benign service component always produces the same or similar output (based on a user-defined similarity function). Many data stream processing functions fall into this category [8]. We can also extend our attestation framework to support stateful data processing services [38], which, however, is outside the scope of this paper.

Third, we assume that result inconsistencies caused by hardware or software faults can be marked by fault detection schemes [39] and are excluded from our malicious attack detection.

3 DESIGN AND ALGORITHMS

In this section, we first present the basis of the IntTest system: the probabilistic replay-based consistency check and the integrity attestation graph model. We then describe the integrated service integrity attestation scheme in detail. Next, we present the result autocorrection scheme.

3.1 Baseline Attestation Scheme

To detect service integrity attacks and pinpoint malicious service providers, our algorithm relies on replay-based consistency checks to derive the consistency/inconsistency relationships between service providers. For example, Fig. 2 shows the consistency check scheme for attesting three service providers p1, p2, and p3 that offer the same service function f. The portal sends the original input data d1 to p1 and gets back the result f(d1). Next, the portal sends d1', a duplicate of d1, to p3 and gets back the result f(d1'). The portal then compares f(d1) and f(d1') to see whether p1 and p3 are consistent.

The intuition behind our approach is that if two service providers disagree with each other on the processing result of the same input, at least one of them must be malicious. Note that we do not send an input data item and its duplicates (i.e., attestation data) concurrently. Instead, we replay the attestation data on different service providers after receiving the processing result of the original data. Thus, malicious attackers cannot avoid the risk of being detected when they produce false results on the original data. Although the replay scheme may cause delay in processing a single tuple, we can overlap the attestation and normal processing of consecutive tuples in the data stream to hide the attestation delay from the user.


Fig. 2. Replay-based consistency check.


If two service providers always give consistent output results on all input data, there exists a consistency relationship between them. Otherwise, if they give different outputs on at least one input data item, there is an inconsistency relationship between them. We do not limit the consistency relationship to an equality function, since two benign service providers may produce similar but not exactly identical results. For example, the credit scores for the same person may vary by a small difference when obtained from different credit bureaus. We allow the user to define a distance function to quantify the largest tolerable result difference.

Definition 1. For two output results r1 and r2, which come from two functionally equivalent service providers, respectively, result consistency is defined as either r1 = r2, or the distance between r1 and r2 according to a user-defined distance function D(r1, r2) falling within a threshold δ.
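As a concrete illustration, the minimal sketch below (Python; the absolute-difference distance and the threshold value are illustrative stand-ins for the user-defined D and δ) implements this consistency test:

```python
def consistent(r1, r2, distance=lambda a, b: abs(a - b), delta=10):
    """Definition 1: two results are consistent if they are equal, or if
    the user-defined distance between them falls within threshold delta."""
    return r1 == r2 or distance(r1, r2) <= delta

# Credit-score illustration: a tolerance of 10 points accepts 500 vs. 505
# but rejects 500 vs. 550.
assert consistent(500, 505)
assert not consistent(500, 550)
```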

For scalability, we propose randomized probabilistic attestation, an attestation technique that randomly replays a subset of the input data for attestation. For composite data-flow processing services consisting of multiple service hops, each service hop is composed of a set of functionally equivalent service providers. Specifically, for an incoming tuple di, the portal may decide to perform integrity attestation with probability p_u. If the portal decides to perform attestation on di, the portal first sends di to a predefined service path p1 → p2 → ... → pl providing functions f1 → f2 → ... → fl. After receiving the processing result for di, the portal replays the duplicate(s) of di on alternative service path(s) such as p1' → p2' → ... → pl', where pj' provides the same function fj as pj. The portal may replay data on multiple service providers to perform concurrent attestation.

After receiving the attestation results, the portal compares each intermediate result between pairs of functionally equivalent service providers pi and pi'. If pi and pi' receive the same input data but produce different output results, we say that pi and pi' are inconsistent. Otherwise, we say that pi and pi' are consistent with regard to function fi. For example, consider two different credit score service providers p1 and p1'. Suppose the distance function tolerates a credit score difference of no more than 10. If p1 outputs 500 and p1' outputs 505 for the same person, we say p1 and p1' are consistent. However, if p1 outputs 500 and p1' outputs 550 for the same person, we consider p1 and p1' to be inconsistent. We evaluate both intermediate and final data processing results between functionally equivalent service providers to derive the consistency/inconsistency relationships. For example, if data processing involves a subquery to a database, we evaluate the final data processing result along with the intermediate subquery result. Note that although we do not attest all service providers at the same time, all service providers will be covered by the randomized probabilistic attestation over a period of time.
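The following minimal sketch, assuming a single service hop and modeling each provider as a callable, shows the replay logic: attestation is triggered with probability p_u, and the duplicate is sent only after the original result has been received (all names here are hypothetical):

```python
import random

def process_with_attestation(d, primary, alternates, p_u=0.2):
    """Process tuple d on the primary provider and, with probability p_u,
    replay a duplicate of d on a randomly chosen functionally equivalent
    provider after the original result arrives. Returns the original
    result plus an attestation observation (None if not attested)."""
    result = primary(d)              # original processing
    if random.random() >= p_u:       # no attestation for this tuple
        return result, None
    alt = random.choice(alternates)  # replay d' (duplicate of d)
    replayed = alt(d)
    # A user-defined distance function (Definition 1) would go here;
    # plain equality is used for brevity.
    return result, (alt, replayed == result)
```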

With replay-based consistency checks, we can test functionally equivalent service providers and obtain their consistency and inconsistency relationships. We employ both the consistency graph and the inconsistency graph to aggregate pairwise attestation results for further analysis. The graphs reflect the consistency/inconsistency relationships across multiple service providers over a period of time. Before introducing the attestation graphs, we first define consistency links and inconsistency links.

Definition 2. A consistency link exists between two service providers that always give consistent output for the same input data during attestation. An inconsistency link exists between two service providers that give at least one inconsistent output for the same input data during attestation.

We then construct a consistency graph for each function to capture consistency relationships among the service providers provisioning the same function. Fig. 3a shows the consistency graphs for two functions. Note that two service providers that are consistent for one function are not necessarily consistent for another function. This is the reason why we confine consistency graphs to individual functions.

Definition 3. A per-function consistency graph is an undirected graph with all the attested service providers that provide the same service function as the vertices and consistency links as the edges.

We use a global inconsistency graph to capture inconsistency relationships among all service providers. Two service providers are said to be inconsistent as long as they disagree in any function. Thus, we can derive more comprehensive inconsistency relationships by integrating inconsistency links across functions. Fig. 3b shows an example of the global inconsistency graph. Note that service provider p5 provides both functions f1 and f2. In the inconsistency graph, there is a single node p5, with its links reflecting inconsistency relationships in both functions f1 and f2.

Definition 4. The global inconsistency graph is an undirected graph with all the attested service providers in the system as the vertex set and inconsistency links as the edges.

The portal node is responsible for constructing and maintaining both the per-function consistency graphs and the global inconsistency graph. To generate these graphs, the portal maintains counters for the number of consistent results and counters for the total number of attestation data between each pair of service providers. The portal updates the counters each time it receives attestation results. At any time, if the counter for consistent results has the same value as that for the total attestation data, there is a consistency link between the pair of service providers. Otherwise, there is an inconsistency link between them.


Fig. 3. Attestation graphs.

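A minimal sketch of this portal-side bookkeeping, assuming attestation outcomes arrive as (function, provider, provider, consistent?) observations (class and method names are hypothetical):

```python
from collections import defaultdict

class AttestationGraphs:
    """Maintains, per provider pair, a counter of consistent results and
    a counter of total attestations, and derives per-function consistency
    links and global inconsistency links from them."""

    def __init__(self):
        self.n_consistent = defaultdict(int)  # (f, pi, pj) -> consistent count
        self.n_total = defaultdict(int)       # (f, pi, pj) -> attestation count

    def record(self, f, pi, pj, is_consistent):
        key = (f,) + tuple(sorted((pi, pj)))
        self.n_total[key] += 1
        self.n_consistent[key] += int(is_consistent)

    def consistency_links(self, f):
        """Pairs that agreed on every attested tuple of function f."""
        return {(pi, pj) for (g, pi, pj), n in self.n_total.items()
                if g == f and self.n_consistent[(g, pi, pj)] == n}

    def inconsistency_links(self):
        """Pairs that disagreed at least once, in any function."""
        return {(pi, pj) for (g, pi, pj), n in self.n_total.items()
                if self.n_consistent[(g, pi, pj)] < n}
```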

3.2 Integrated Attestation Scheme

We now present our integrated attestation graph analysis algorithm.

Step 1: Consistency graph analysis. We first examine per-function consistency graphs to pinpoint suspicious service providers. The consistency links in per-function consistency graphs tell us which sets of service providers keep consistent with each other on a specific service function. For any service function, since benign service providers always keep consistent with each other, the benign service providers form a clique in terms of consistency links. For example, in Fig. 3a, p1, p3, and p4 are benign service providers, and they always form a consistency clique. In our previous work [26], we developed a clique-based algorithm to pinpoint malicious service providers. If we assume that the number of benign service providers is larger than that of the malicious ones, a benign node will always stay in a clique formed by all benign nodes, which has size larger than ⌊k/2⌋, where k is the number of service providers provisioning the service function. Thus, we can pinpoint suspicious nodes by identifying nodes that are outside all cliques of size larger than ⌊k/2⌋. For example, in Fig. 3a, p2 and p5 are identified as suspicious because they are excluded from the clique of size 3.
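This per-function clique test can be sketched as follows; brute force is acceptable because k, the number of providers of a single function, is small, and the edge list is an illustrative reading of Fig. 3a:

```python
from itertools import combinations

def suspicious_nodes(providers, consistency_edges):
    """Flag providers that lie outside every consistency clique of size
    greater than floor(k/2) in a per-function consistency graph."""
    k = len(providers)
    edges = {frozenset(e) for e in consistency_edges}

    def is_clique(nodes):
        return all(frozenset(pair) in edges for pair in combinations(nodes, 2))

    cleared = set()
    for size in range(k // 2 + 1, k + 1):   # clique sizes > floor(k/2)
        for nodes in combinations(providers, size):
            if is_clique(nodes):
                cleared.update(nodes)
    return set(providers) - cleared

# Fig. 3a-style example: p1, p3, p4 form a consistency clique of size 3
# (> floor(5/2)), so p2 and p5 are flagged as suspicious.
print(suspicious_nodes(
    ['p1', 'p2', 'p3', 'p4', 'p5'],
    [('p1', 'p3'), ('p1', 'p4'), ('p3', 'p4'), ('p2', 'p5')]))
```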

However, strategically colluding attackers can try to become the majority in a specific service function to escape detection. Thus, it is insufficient to examine the per-function consistency graphs only. We need to integrate the consistency graph analysis with the inconsistency graph analysis to achieve more robust integrity attestation.

Step 2: Inconsistency graph analysis. Given an inconsistency graph containing only the inconsistency links, there may exist different possible combinations of the benign node set and the malicious node set. However, if we assume that the total number of malicious service providers in the whole system is no more than K, we can pinpoint a subset of truly malicious service providers. Intuitively, given two service providers connected by an inconsistency link, at least one of them must be malicious, since any two benign service providers always agree with each other. Thus, we can derive a lower bound on the number of malicious service providers by examining the minimum vertex cover of the inconsistency graph. The minimum vertex cover of a graph is a minimum set of vertices such that each edge of the graph is incident to at least one vertex in the set. For example, in Fig. 3b, p2 and p5 form the minimum vertex cover. We present two propositions as part of our approach. The proofs for these propositions can be found in Section 1 of the online supplementary material.

Proposition 1. Given an inconsistency graph G, let C_G be a minimum vertex cover of G. Then, the number of malicious service providers is no less than |C_G|.

We now define the residual inconsistency graph for a node p_i as follows.

Definition 5. The residual inconsistency graph of node p_i is the inconsistency graph after removing the node p_i and all links adjacent to p_i.

For example, Fig. 4 shows the residual inconsistency graph after removing the node p2. Based on the lower bound on the number of malicious service providers and Definition 5, we have the following proposition for pinpointing a subset of malicious nodes.

Proposition 2. Given an integrated inconsistency graph G and the upper bound K of the number of malicious service providers, a node p must be a malicious service provider if and only if

|N_p| + |C_{G'_p}| > K, (1)

where |N_p| is the number of neighbors of p, and |C_{G'_p}| is the size of the minimum vertex cover of the residual inconsistency graph G'_p obtained by removing p and its neighbors from G.

For example, in Fig. 3b, suppose we know the number of malicious service providers is no more than two. Let us examine the malicious node p2 first. After we remove p2 and its neighbors p1, p3, and p4 from the inconsistency graph, the residual inconsistency graph has no links, so its minimum vertex cover has size 0. Since p2 has three neighbors, we have 3 + 0 > 2. Thus, p2 is malicious. Let us now check the benign node p1. After removing p1 and its two neighbors p2 and p5, the residual inconsistency graph again has no links, so its minimum vertex cover has size 0. Since p1 has only two neighbors, (1) does not hold. We will not pinpoint p1 as malicious in this step.
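The following sketch implements the Proposition 2 test, computing the exact minimum vertex cover by brute force (viable only for small graphs); the edge list assumes, for illustration, that the only inconsistency links in Fig. 3b are p1-p2, p2-p3, p2-p4, and p1-p5:

```python
from itertools import combinations

def min_vertex_cover_size(nodes, edges):
    """Exact minimum vertex cover size by brute force (small graphs only)."""
    edge_sets = [set(e) for e in edges]
    if not edge_sets:
        return 0
    nodes = list(nodes)
    for m in range(1, len(nodes) + 1):
        for cover in combinations(nodes, m):
            chosen = set(cover)
            if all(e & chosen for e in edge_sets):
                return m
    return len(nodes)

def must_be_malicious(p, nodes, edges, K):
    """Proposition 2: pinpoint p if |N_p| + |C_{G'_p}| > K, where G'_p is
    the residual graph after removing p and its neighbors."""
    neighbors = {q for e in edges for q in e if p in e and q != p}
    removed = neighbors | {p}
    res_nodes = [n for n in nodes if n not in removed]
    res_edges = [e for e in edges if not (set(e) & removed)]
    return len(neighbors) + min_vertex_cover_size(res_nodes, res_edges) > K

nodes = ['p1', 'p2', 'p3', 'p4', 'p5']
edges = [('p1', 'p2'), ('p2', 'p3'), ('p2', 'p4'), ('p1', 'p5')]
print(must_be_malicious('p2', nodes, edges, K=2))  # True:  3 + 0 > 2
print(must_be_malicious('p1', nodes, edges, K=2))  # False: 2 + 0 = 2
```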

Note that benign service providers that do not serve the same functions as malicious ones will be isolated nodes in the inconsistency graph, since they are not involved in any inconsistency links. For example, in Fig. 5, nodes p4, p5, p6, and p7 are isolated nodes, since they are not associated with any inconsistency links in the global inconsistency graph. Thus, we can remove these nodes from the inconsistency graph without affecting the computation of the minimum vertex cover.

We now describe how to estimate the upper bound K of the number of malicious service providers. Let N denote the total number of service providers in the system. Since we assume that the total number of malicious service providers is less than that of benign ones, the number of malicious service providers is no more than ⌊N/2⌋. According to Proposition 1, the number of malicious service providers is no less than the size of the minimum vertex cover |C_G| of the global inconsistency graph.


Fig. 4. Inconsistency graph G and its residual graph.


Thus, K is first bounded below by |C_G| and above by ⌊N/2⌋. We then use an iterative algorithm to tighten the bound on K. We start from the lower bound of K and compute the set of malicious nodes identified by Proposition 2, denoted by Ω. Then, we gradually increase K by one each time. For each specific value of K, we obtain a set of malicious nodes. With a larger K, fewer nodes satisfy |N_s| + |C_{G'_s}| > K, which causes the set Ω to shrink. When Ω = ∅, we stop increasing K, since any larger K cannot yield more malicious nodes. Intuitively, when K is large, fewer nodes may satisfy (1), so we may identify only a small subset of malicious nodes. In contrast, when K is small, more nodes may satisfy (1), which may mistakenly pinpoint benign nodes as malicious. To avoid false positives, we want to pick a large enough K that pinpoints a set of truly malicious service providers.
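One plausible rendering of this iterative tightening loop, reusing min_vertex_cover_size and must_be_malicious from the previous sketch (keeping the last nonempty Ω, i.e., the largest workable K, is our reading of the false-positive-avoidance rule):

```python
def pinpoint_by_inconsistency(nodes, edges):
    """Sweep K from its lower bound |C_G| (Proposition 1) toward its
    upper bound floor(N/2). The pinpointed set Omega shrinks as K grows;
    keep the last nonempty Omega to avoid false positives."""
    lower = min_vertex_cover_size(nodes, edges)
    upper = len(nodes) // 2
    last_omega = set()
    for K in range(lower, upper + 1):
        omega = {p for p in nodes if must_be_malicious(p, nodes, edges, K)}
        if not omega:
            break          # a larger K cannot yield more malicious nodes
        last_omega = omega
    return last_omega
```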

Step 3: Combining consistency and inconsistency graph analysis results. Let G_i be the consistency graph generated for service function f_i, and let G be the global inconsistency graph. Let M_i denote the list of suspicious nodes obtained by analyzing the per-function consistency graph G_i (i.e., nodes belonging to minority cliques), and let Ω denote the list of suspicious nodes obtained by analyzing the global inconsistency graph G, given a particular upper bound K on the number of malicious nodes. We examine the per-function consistency graphs one by one. Let Ω_i denote the subset of Ω that serves function f_i. If Ω_i ∩ M_i ≠ ∅, we add the nodes in M_i to the identified malicious node set. The idea is that since the majority of nodes serving function f_i have successfully excluded the malicious nodes in Ω_i, we can trust their decision to propose M_i as malicious nodes. Pseudocode of our algorithm can be found in Section 1 of the online supplemental material.
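A minimal sketch of this combination step, assuming the per-function provider lists, the inconsistency-graph verdict Ω, and the minority-clique sets M_i are produced by the earlier analyses (all names hypothetical):

```python
def integrated_pinpoint(functions, omega, minority_cliques):
    """Combine the global inconsistency verdict Omega with per-function
    consistency verdicts: for each function f_i, if Omega_i (the members
    of Omega serving f_i) overlaps the minority-clique set M_i, trust the
    per-function majority and mark all of M_i as malicious."""
    malicious = set(omega)
    for f, providers in functions.items():   # f -> providers of f
        omega_i = set(omega) & set(providers)
        m_i = set(minority_cliques.get(f, ()))
        if omega_i & m_i:
            malicious |= m_i
    return malicious

# Fig. 6 example (discussed below): Omega = {p9} overlaps M1 = {p8, p9},
# so p8 is inferred to be malicious as well.
print(integrated_pinpoint({'f1': ['p1', 'p8', 'p9']},
                          {'p9'}, {'f1': ['p8', 'p9']}))
```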

For example, Fig. 6 shows both the per-function consistency graphs and the global inconsistency graph. If the upper bound K on the number of malicious nodes is set to 4, the inconsistency graph analysis will capture the malicious node p9 but will miss the malicious node p8. The reason is that p8 has only three neighbors, and the minimum vertex cover of the residual inconsistency graph after removing p8 and its three neighbors has size 1. Note that we will not pinpoint any benign node as malicious according to Proposition 2. For example, the benign node p1 has two neighbors, and the minimum vertex cover of the residual graph after removing p1 and its two neighbors p8 and p9 is 0, since the residual graph does not include any links. However, by checking the consistency graph of function f1, we find that Ω_1 = {p9} overlaps the minority clique M_1 = {p8, p9}. We then infer that p8 must be malicious too.

Note that even if we have an accurate estimate of the number of malicious nodes, the inconsistency graph analysis scheme may not identify all malicious nodes. However, our integrated algorithm can pinpoint more malicious nodes than the inconsistency-graph-only algorithm. An example showing how our algorithm pinpoints more malicious nodes than the inconsistency-graph-only algorithm can be found in Section 1 of the online supplemental material.

3.3 Result Autocorrection

IntTest not only pinpoints malicious service providers but also automatically corrects corrupted data processing results to improve the result quality of the cloud data processing service, as illustrated in Fig. 7. Without our attestation scheme, once an original data item is manipulated by any malicious node, the processing result of this data item can be corrupted, resulting in degraded result quality. IntTest leverages the attestation data and the malicious node pinpointing results to detect and correct compromised data processing results.

Specifically, after the portal node receives the result f(d) of the original data d, the portal node checks whether d has been processed by any malicious node pinpointed by our algorithm. We label the result f(d) as a "suspicious result" if d has been processed by any pinpointed malicious node. Next, the portal node checks whether d has been chosen for attestation. If d was selected for attestation, we check whether any attestation copy of d traversed only good nodes. If so, we use the result of that attestation data to replace f(d). For example, in Fig. 7, the original data d are processed by the pinpointed malicious node s6, while one of its attestation copies d'' is processed only by benign nodes. The portal node will use the attestation result f(d'') to replace the original result, which may be corrupted if s6 cheated on d.
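A minimal sketch of this autocorrection logic, assuming the portal records each attestation copy's service path and result (all names hypothetical):

```python
def autocorrect(result, path, attestation_runs, pinpointed):
    """If the original tuple's path touched a pinpointed malicious node,
    mark the result suspicious and, when some attestation copy traversed
    only benign nodes, return that copy's result instead."""
    if not set(path) & pinpointed:
        return result                      # path is clean; keep f(d)
    for alt_path, alt_result in attestation_runs:
        if not set(alt_path) & pinpointed:
            return alt_result              # clean replica replaces f(d)
    return result                          # suspicious, but no clean copy

# Fig. 7 example: d traversed the pinpointed node s6, while its duplicate
# d'' traversed only benign nodes, so f(d'') replaces f(d).
print(autocorrect('f(d)', ['s1', 's6'],
                  [(['s2', 's5'], "f(d'')")], pinpointed={'s6'}))
```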


Fig. 5. Isolated benign service providers in the global inconsistency graph.

Fig. 6. An example of integrated graph analysis with two malicious nodes.

Fig. 7. Automatic data correction using attestation data processing results.


4 SECURITY ANALYSIS

We now present a summary of the results of our analytical study of IntTest. Additional details, along with a proof of the proposition presented in this section, can be found in Section 2 of the online supplemental material.

Proposition 3. Given an accurate upper bound K of the number of malicious service providers, if malicious service providers always collude together, IntTest has zero false positives.

Although our algorithm cannot guarantee zero false positives when there are multiple independent colluding groups, it will be difficult for attackers to escape our detection with multiple independent colluding groups, since attackers will have inconsistency links not only with benign nodes but also with other groups of malicious nodes. Additionally, our approach limits the damage colluding attackers can cause, even if they evade detection, in two ways. First, our algorithm limits the number of functions that can be simultaneously attacked. Second, our approach ensures that a single attacker cannot participate in compromising an unlimited number of service functions without being detected.

5 EXPERIMENTAL EVALUATION

In this section, we present the experimental evaluation of the IntTest system. We first describe our experimental setup. We then present and analyze the experimental results.

5.1 Experiment Setup

We have implemented a prototype of the IntTest system and tested it using NCSU's virtual computing lab [28], a production cloud infrastructure operating in a similar way to Amazon EC2 [29]. We add portal nodes to VCL and deploy the IBM System S stream processing middleware [8], [30] to provide a distributed data stream processing service. System S is an industry-strength, high-performance stream processing platform that can analyze massive volumes of continuous data streams and scale to hundreds of processing elements (PEs) per application. In our experiments, we used 10 VCL nodes running 64-bit CentOS 5.2. Each node runs multiple virtual machines (VMs) on top of Xen 3.0.3.

The data-flow processing application we use in our experiments is adapted from the sample applications provided by System S. This application takes stock information as input, performs windowed aggregation on the input stream according to the specified company name, and then performs calculations on the stock data. We use a trusted portal node to accept the input stream, perform comprehensive integrity attestation on the PEs, and analyze the attestation results. The portal node constructs one consistency graph for each service function and one global inconsistency graph across all service providers in the system.

For comparison, we have also implemented three alternative integrity attestation schemes: 1) the full-time majority voting scheme, which employs all functionally equivalent service providers at all times for attestation and determines malicious service providers through majority voting on the processing results; 2) the part-time majority voting (PTMV) scheme, which employs all functionally equivalent service providers over a subset of the input data for attestation and determines malicious service providers using majority voting; and 3) the RunTest scheme [26], which pinpoints malicious service providers by analyzing only per-function consistency graphs, labeling as malicious those service providers that are outside all cliques of size larger than ⌊k/2⌋, where k is the number of service providers participating in the service function. Note that AdapTest [27] uses the same attacker pinpointing algorithm as RunTest. Thus, AdapTest has the same detection accuracy as RunTest but with less attestation overhead.

The three major metrics for evaluating our scheme are the detection rate, the false alarm rate, and the attestation overhead. We calculate the detection rate, denoted by A_D, as the number of pinpointed malicious service providers over the total number of malicious service providers that have misbehaved at least once during the experiment. During runtime, the detection rate starts from zero and increases as more malicious service providers are detected. The false alarm rate A_F is defined as N_fp/(N_fp + N_tn), where N_fp denotes false alarms, i.e., the number of benign service providers that are incorrectly identified as malicious, and N_tn denotes true negatives, i.e., the number of benign service providers that are correctly identified as benign. The attestation overhead is evaluated by both the number of duplicated data tuples that are redundantly processed for service integrity attestation and the extra data-flow processing time incurred by the integrity attestation.
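In code form, the two accuracy metrics reduce to the following trivial sketch:

```python
def detection_rate(pinpointed, misbehaved):
    """A_D: pinpointed malicious providers over all providers that
    misbehaved at least once during the experiment."""
    return len(set(pinpointed) & set(misbehaved)) / len(misbehaved)

def false_alarm_rate(n_fp, n_tn):
    """A_F = N_fp / (N_fp + N_tn)."""
    return n_fp / (n_fp + n_tn)
```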

While evaluating the IntTest system, we assume that the colluding attackers know our attestation scheme and take the best strategy. According to the security analysis in Section 4, to escape detection, the best practice for attackers is to attack as a colluding group. Colluding attackers can take different strategies. They may attack conservatively by first attacking those service functions with a smaller number of service providers, where they can easily take the majority, assuming they know the number of participating service providers for each service function. Alternatively, they may attack aggressively by attacking service functions randomly, assuming they do not know the number of participating service providers. We investigate the impact of these attack strategies on our scheme in terms of both detection rate and false alarm rate.

5.2 Results and Analysis

We first investigate the accuracy of our scheme in pinpointing malicious service providers. Fig. 8a compares our scheme with the other alternative schemes (i.e., FTMV, PTMV, and RunTest) when malicious service providers aggressively attack different numbers of service functions. In this set of experiments, we have 10 service functions and 30 service providers. The number of service providers in each service function randomly ranges in [1, 8]. Each benign service provider provides two randomly selected service functions. The data rate of the input stream is 300 tuples per second. We set 20 percent of the service providers as malicious. After the portal receives the processing result of a new data tuple, it randomly decides whether to perform data attestation. Each tuple has a 0.2 probability of being attested (i.e., attestation probability p_u = 0.2), and two attestation data replicas are used (i.e., the total number of data copies including the original data is r = 3). Each experiment is repeated three times. We report the average detection rate and false alarm rate achieved by the different schemes. Note that RunTest can achieve the same detection accuracy as the majority voting based schemes after the randomized probabilistic attestation covers all attested service providers and discovers the majority clique [26]. In contrast, IntTest comprehensively examines both the per-function consistency graphs and the global inconsistency graph to make the final pinpointing decision. We observe that IntTest can achieve a much higher detection rate and lower false alarm rate than the other alternatives. Moreover, IntTest achieves better detection accuracy when malicious service providers attack more functions. We also observe that when malicious service providers attack aggressively, our scheme can detect them even if they attack a low percentage of service functions.

Fig. 8b shows the malicious service provider detection accuracy under the conservative attack scenario. All other experiment parameters are kept the same as in the previous experiments. The results show that IntTest consistently achieves a higher detection rate and lower false alarm rate than the other alternatives. In the conservative attack scenario, as shown by Fig. 8b, the false alarm rate of IntTest first increases when a small percentage of service functions are attacked and then drops to zero quickly as more service functions are attacked. This is because when attackers attack only a few service functions where they can take the majority, they can hide themselves from our detection scheme while tricking our algorithm into labeling benign service providers as malicious. However, if they attack more service functions, they can be detected, since they incur more inconsistency links with benign service providers in the global inconsistency graph. Note that majority voting-based schemes can also detect malicious attackers if the attackers fail to take the majority in the attacked service function. However, majority voting-based schemes have high false alarm rates, since attackers can always trick these schemes into labeling benign service providers as malicious as long as the attackers can take the majority in each individual service function.

The results of increasing the percentage of malicious service providers to 40 percent can be found in Section 3 of the online supplemental material.

We now show one example of the detection time comparison during the above experiments, shown in Fig. 9. In this case, malicious service providers misbehave on an incoming data tuple with a small probability (0.2). For probabilistic attestation schemes such as IntTest, PTMV, and RunTest, the attestation probability is also set to a small value (0.2). For IntTest and RunTest, two attestation data replicas are used (r = 3). Here, the attackers may attack different service functions with different subsets of their colluders. As expected, the FTMV scheme needs the least time to detect malicious service providers because it attests all service components all the time. Although PTMV has the same attestation probability as IntTest and RunTest, it has a shorter detection time, since it uses all service components for each attestation datum. IntTest achieves a shorter detection time than RunTest. Similar to the previous experiments, IntTest achieves the highest detection rate among all algorithms. RunTest, FTMV, and PTMV cannot achieve a 100 percent detection rate, since they cannot detect those attackers that misbehave only in service functions where they can take the majority.

We also conducted a sensitivity study to evaluate the impact of various system parameters on the effectiveness of our algorithm. Those results can be found in Section 3 of the online supplementary material.

We now evaluate the effectiveness of our result autocorrection scheme. We compare the result quality with and without autocorrection and also investigate the impact of the attestation probability. Figs. 10a and 10b show the result quality under noncolluding attacks with 20 percent malicious nodes and colluding attacks with 40 percent malicious nodes, respectively. We vary the attestation probability from 0.2 to 0.4. In both scenarios, IntTest achieves significant result quality improvement without incurring any extra overhead beyond the attestation overhead. IntTest achieves higher result quality improvement under a higher node misbehaving probability. This is because IntTest can detect the malicious nodes earlier, so that it can correct more compromised data using the attestation data.


Fig. 8. Malicious attacker pinpointing accuracy comparison with 20 percent of service providers being malicious.

Fig. 9. Detection rate time line results.

Fig. 10. Result quality detection and autocorrection performance under noncolluding attacks.


Fig. 11 compares the overhead of the four schemes in terms of the percentage of attestation traffic relative to the original data traffic (i.e., the total number of duplicated data tuples used for attestation over the number of original data tuples). The data rate is 300 tuples per second. Each experiment run processes 20,000 data tuples. IntTest and RunTest save more than half of the attestation traffic compared to PTMV and incur an order of magnitude less attestation overhead than FTMV. Additional overhead analysis details are available in Section 3 of the online supplemental material.

Fig. 11. Attestation overhead comparison.

6 LIMITATION DISCUSSION

Although we have shown that IntTest can achieve better scalability and higher detection accuracy than existing schemes, IntTest still has a set of limitations that require further study. A detailed limitation discussion can be found in Section 4 of the online supplementary material. We now provide a summary of the limitations of our approach. First, malicious attackers can still escape detection if they attack only a few service functions, take the majority in all the compromised service functions, and have fewer inconsistency links than benign service providers. However, IntTest can effectively limit the attack scope and make it difficult to attack popular service functions. Second, IntTest needs to assume that the attested services are input-deterministic, where benign services return the same or similar results, defined by a distance function, for the same input. Thus, IntTest cannot support service functions whose results vary significantly based on random numbers or timestamps.

7 CONCLUSION

In this paper, we have presented the design and implementation of IntTest, a novel integrated service integrity attestation framework for multitenant software-as-a-service cloud systems. IntTest employs randomized replay-based consistency checks to verify the integrity of distributed service components without imposing high overhead on the cloud infrastructure. IntTest performs integrated analysis over both consistency and inconsistency attestation graphs to pinpoint colluding attackers more efficiently than existing techniques. Furthermore, IntTest provides result autocorrection to automatically correct compromised results and improve the result quality. We have implemented IntTest and tested it on a commercial data stream processing platform running inside a production virtualized cloud computing infrastructure. Our experimental results show that IntTest can achieve higher pinpointing accuracy than existing alternative schemes. IntTest is lightweight, imposing low performance impact on the data processing services running inside the cloud computing infrastructure.

ACKNOWLEDGMENTS

This work was sponsored in part by the US Army Research Office (ARO) under grant W911NF-08-1-0105, managed by the NCSU Secure Open Systems Initiative (SOSI), by US National Science Foundation (NSF) grant CNS-0915567, and by NSF grant IIS-0430166. Any opinions expressed in this paper are those of the authors and do not necessarily reflect the views of the ARO, NSF, or US Government. This work was done when Juan Du and Yongmin Tan were PhD students at North Carolina State University.


Juan Du received the BS and MS degrees from the Department of Computer Science, Tianjin University, China, in 2001 and 2004, respectively, and the PhD degree from the Department of Computer Science, North Carolina State University, in 2011. She is a software engineer at Amazon. She interned at Ericsson, Cisco Systems, and the IBM T. J. Watson Research Center while she was a PhD student. She is a member of the IEEE.

Daniel J. Dean received the BS and MS degrees in computer science from Stony Brook University, New York, in 2007 and 2009, respectively. He is currently working toward the PhD degree in the Department of Computer Science, North Carolina State University. He interned with NEC Labs America in the summer of 2012. He is a student member of the IEEE.

Yongmin Tan received the BE and ME degrees in electrical engineering from Shanghai Jiaotong University in 2005 and 2008, respectively, and the PhD degree from the Department of Computer Science, North Carolina State University, in 2012. He is a software engineer at MathWorks, where he currently focuses on modeling distributed systems for Simulink. His general research interests include reliable distributed systems and cloud computing. He interned with NEC Labs America in 2010. He is a recipient of the Best Paper Award from ICDCS 2012. He is a member of the IEEE.

Xiaohui Gu received the BS degree in computer science from Peking University, Beijing, China, in 1999, and the MS and PhD degrees from the Department of Computer Science, University of Illinois at Urbana-Champaign, in 2001 and 2004, respectively. She is an assistant professor in the Department of Computer Science at North Carolina State University. She was a research staff member at the IBM T. J. Watson Research Center, Hawthorne, New York, between 2004 and 2007. She received the ILLIAC Fellowship, the David J. Kuck Best Master Thesis Award, and the Saburo Muroga Fellowship from the University of Illinois at Urbana-Champaign. She also received IBM Invention Achievement Awards in 2004, 2006, and 2007. She has filed eight patents and has published more than 50 research papers in international journals and major peer-reviewed conference proceedings. She is a recipient of the US National Science Foundation (NSF) CAREER Award, four IBM Faculty Awards (2008, 2009, 2010, 2011), two Google Research Awards (2009, 2011), a Best Paper Award from IEEE CNSM 2010, and the NCSU Faculty Research and Professional Development Award. She is a senior member of the IEEE.

Ting Yu received the BS degree from Peking University in 1997, the MS degree from the University of Minnesota in 1998, and the PhD degree from the University of Illinois at Urbana-Champaign in 2003, all in computer science. He is currently an associate professor in the Department of Computer Science, North Carolina State University. His research interests include security, with a focus on data security and privacy, trust management, and security policies. He is a recipient of the US National Science Foundation (NSF) CAREER Award. He is a member of the IEEE.

