Cooperative Provable Data Possession for Integrity Verification in Multi-Cloud Storage



    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 1

Cooperative Provable Data Possession for Integrity Verification in Multi-Cloud Storage

    Yan Zhu, Hongxin Hu, Gail-Joon Ahn, Senior Member, IEEE, Mengyang Yu

Abstract—Provable data possession (PDP) is a technique for ensuring the integrity of data in storage outsourcing. In this paper, we address the construction of an efficient PDP scheme for distributed cloud storage to support the scalability of service and data migration, in which we consider the existence of multiple cloud service providers to cooperatively store and maintain the clients' data. We present a cooperative PDP (CPDP) scheme based on homomorphic verifiable response and hash index hierarchy. We prove the security of our scheme based on a multi-prover zero-knowledge proof system, which can satisfy completeness, knowledge soundness, and zero-knowledge properties. In addition, we articulate performance optimization mechanisms for our scheme, and in particular present an efficient method for selecting optimal parameter values to minimize the computation costs of clients and storage service providers. Our experiments show that our solution introduces lower computation and communication overheads in comparison with non-cooperative approaches.

Index Terms—Storage Security, Provable Data Possession, Interactive Protocol, Zero-knowledge, Multiple Cloud, Cooperative

    1 INTRODUCTION

IN recent years, cloud storage service has become a faster profit growth point by providing a comparably low-cost, scalable, position-independent platform for clients' data. Since the cloud computing environment is constructed based on open architectures and interfaces, it has the capability to incorporate multiple internal and/or external cloud services together to provide high interoperability. We call such a distributed cloud environment a multi-Cloud (or hybrid cloud). Often, by using virtual infrastructure management (VIM) [1], a multi-cloud allows clients to easily access his/her resources remotely through interfaces such as Web services provided by Amazon EC2.

There exist various tools and technologies for multi-cloud, such as Platform VM Orchestrator, VMware vSphere, and Ovirt. These tools help cloud providers construct a distributed cloud storage platform (DCSP) for managing clients' data. However, if such an important platform is vulnerable to security attacks, it would bring irretrievable losses to the clients. For example, the confidential data in an enterprise may be illegally accessed through a remote interface provided by a multi-cloud, or relevant data and archives may be lost or tampered with when they are stored into an

A preliminary version of this paper appeared under the title "Efficient Provable Data Possession for Hybrid Clouds" in Proc. of the 17th ACM Conference on Computer and Communications Security (CCS), Chicago, IL, USA, 2010, pp. 881-883.

Y. Zhu is with the Institute of Computer Science and Technology, Peking University, Beijing 100871, China, and the Beijing Key Laboratory of Internet Security Technology, Peking University, Beijing 100871, China. E-mail: {yan.zhu,huzexing}@pku.edu.cn.

H. Hu and G.-J. Ahn are with Arizona State University, Tempe, Arizona, 85287. E-mail: {hxhu,gahn}@asu.edu.

M. Yu is with the School of Mathematical Sciences, Peking University, Beijing 100871, China. E-mail: [email protected].

uncertain storage pool outside the enterprise. Therefore, it is indispensable for cloud service providers (CSPs) to provide security techniques for managing their storage services.

Provable data possession (PDP) [2] (or proofs of retrievability (POR) [3]) is such a probabilistic proof technique for a storage provider to prove the integrity and ownership of clients' data without downloading data. The proof-checking without downloading makes it especially important for large-size files and folders (typically including many clients' files) to check whether these data have been tampered with or deleted without downloading the latest version of data. Thus, it is able to replace traditional hash and signature functions in storage outsourcing. Various PDP schemes have been recently proposed, such as Scalable PDP [4] and Dynamic PDP [5]. However, these schemes mainly focus on PDP issues at untrusted servers in a single cloud storage provider and are not suitable for a multi-cloud environment (see the comparison of POR/PDP schemes in Table 1).

Motivation. To provide a low-cost, scalable, location-independent platform for managing clients' data, current cloud storage systems adopt several new distributed file systems, for example, the Apache Hadoop Distributed File System (HDFS), Google File System (GFS), Amazon S3 File System, CloudStore, etc. These file systems share some similar features: a single metadata server provides centralized management by a global namespace; files are split into blocks or chunks and stored on block servers; and the systems are comprised of interconnected clusters of block servers. Those features enable cloud service providers to store and process large amounts of data. However, it is crucial to offer an efficient verification on the integrity

Digital Object Identifier 10.1109/TPDS.2012.66 1045-9219/12/$31.00 © 2012 IEEE

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.



TABLE 1
Comparison of POR/PDP schemes for a file consisting of n blocks.

Scheme      | Type | CSP Comp.  | Client Comp. | Comm.      | Frag. | Privacy | Multiple Clouds | Prob. of Detection
PDP [2]     | HVT  | O(t)       | O(t)         | O(1)       | ✓     | ✗       | ✗               | 1 − (1 − ρ)^t
SPDP [4]    | MHT  | O(t)       | O(t)         | O(t)       | ✓     | ✓       | ✗               | 1 − (1 − ρ)^{t·s}
DPDP-I [5]  | MHT  | O(t log n) | O(t log n)   | O(t log n) | ✓     | ✗       | ✗               | 1 − (1 − ρ)^t
DPDP-II [5] | MHT  | O(t log n) | O(t log n)   | O(t log n) | ✓     | ✗       | ✗               | 1 − (1 − ρ)^{Ω(n)}
CPOR-I [6]  | HVT  | O(t)       | O(t)         | O(1)       | ✗     | ✗       | ✗               | 1 − (1 − ρ)^t
CPOR-II [6] | HVT  | O(t + s)   | O(t + s)     | O(s)       | ✓     | ✗       | ✗               | 1 − (1 − ρ)^{t·s}
Our Scheme  | HVR  | O(t + c·s) | O(t + s)     | O(s)       | ✓     | ✓       | ✓               | 1 − ∏_k (1 − ρ_k)^{t_k}

s is the number of sectors in each block, c is the number of CSPs in a multi-cloud, t is the number of sampling blocks, and ρ and ρ_k are the probabilities of block corruption in a cloud server and in the k-th cloud server of a multi-cloud P = {P_k}, respectively; ♯ denotes the verification process in a trivial approach; and MHT, HVT, HVR denote Merkle Hash tree, homomorphic verifiable tags, and homomorphic verifiable responses, respectively.
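The detection probabilities in the last column can be made concrete with a short calculation. A minimal sketch in Python (the corruption rate and confidence target below are illustrative values, not figures from the paper):

```python
# Probability that sampling t blocks detects corruption, when each block
# is corrupted independently with probability rho: P = 1 - (1 - rho)**t.
def detection_probability(rho: float, t: int) -> float:
    return 1.0 - (1.0 - rho) ** t

def blocks_needed(rho: float, target: float) -> int:
    """Smallest number of sampled blocks t with detection prob. >= target."""
    t = 0
    while detection_probability(rho, t) < target:
        t += 1
    return t

# With 1% of blocks corrupted, ~300 samples give ~95% detection confidence.
print(detection_probability(0.01, 300))   # ≈ 0.951
print(blocks_needed(0.01, 0.95))          # 299
```

This is why random sampling keeps the verification cost t independent of the file size n: the detection probability depends only on t and the corruption rate.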

and availability of stored data for detecting faults and automatic recovery. Moreover, this verification is necessary to provide reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures.

Although existing schemes can make a true-or-false decision for data possession without downloading data at untrusted stores, they are not suitable for a distributed cloud storage environment, since they were not originally constructed on an interactive proof system. For example, the schemes based on a Merkle Hash tree (MHT), such as DPDP-I, DPDP-II [5], and SPDP [4] in Table 1, use an authenticated skip list to check the integrity of file blocks adjacently in space. Unfortunately, they did not provide any algorithms for constructing distributed Merkle trees that are necessary for efficient verification in a multi-cloud environment. In addition, when a client asks for a file block, the server needs to send the file block along with a proof of the intactness of the block. However, this process incurs significant communication overhead in a multi-cloud environment, since the server in one cloud typically needs to generate such a proof with the help of other cloud storage services, where the adjacent blocks are stored. The other schemes, such as PDP [2], CPOR-I, and CPOR-II [6] in Table 1, are constructed on homomorphic verification tags, by which the server can generate tags for multiple file blocks in terms of a single response value. However, that does not mean the responses from multiple clouds can also be combined into a single value on the client side. For lack of homomorphic responses, clients must invoke the PDP protocol repeatedly to check the integrity of file blocks stored in multiple cloud servers. Also, clients need to know the exact position of each file block in a multi-cloud environment. In addition, the verification process in such a case will lead to high communication overheads and computation costs on the client side as well. Therefore, it is of utmost necessity to design a cooperative PDP model to reduce the storage and network overheads and enhance the transparency of verification activities in cluster-based cloud storage systems. Moreover, such a cooperative PDP scheme should provide features for timely detecting abnormality and renewing multiple copies of data.

Even though existing PDP schemes have addressed various security properties, such as public verifiability [2], dynamics [5], scalability [4], and privacy preservation [7], we still need a careful consideration of some potential attacks, including two major categories: the Data Leakage Attack, by which an adversary can easily obtain the stored data through the verification process after running or wiretapping sufficient verification communications (see Attacks 1 and 3 in Appendix A), and the Tag Forgery Attack, by which a dishonest CSP can deceive the clients (see Attacks 2 and 4 in Appendix A). These two attacks may cause potential risks for privacy leakage and ownership cheating. Also, these attacks can more easily compromise the security of a distributed cloud system than that of a single cloud system.

Although various security models have been proposed for existing PDP schemes [2], [7], [6], these models still cannot cover all security requirements, especially for provably secure privacy preservation and ownership authentication. To establish a highly effective security model, it is necessary to analyze the PDP scheme within the framework of a zero-knowledge proof system (ZKPS), because a PDP system is essentially an interactive proof system (IPS), which has been well studied in the cryptography community. In summary, a verification scheme for data integrity in distributed storage environments should have the following features:

Usability aspect: a client should utilize the integrity check in the way of collaboration services. The scheme should conceal the details of the storage to reduce the burden on clients;

Security aspect: the scheme should provide adequate security features to resist existing attacks, such as the data leakage attack and the tag forgery attack;

Performance aspect: the scheme should have lower communication and computation overheads than non-cooperative solutions.


Related Works. To check the availability and integrity of outsourced data in cloud storages, researchers have proposed two basic approaches called Provable Data Possession (PDP) [2] and Proofs of Retrievability (POR) [3]. Ateniese et al. [2] first proposed the PDP model for ensuring possession of files on untrusted storages and provided an RSA-based scheme for a static case that achieves O(1) communication cost. They also proposed a publicly verifiable version, which allows anyone, not just the owner, to challenge the server for data possession. This property greatly extended the application areas of the PDP protocol due to the separation of data owners and users. However, these schemes are insecure against replay attacks in dynamic scenarios because of the dependencies on the index of blocks. Moreover, they do not fit multi-cloud storage due to the loss of the homomorphism property in the verification process.

In order to support dynamic data operations, Ateniese et al. developed a dynamic PDP solution called Scalable PDP [4]. They proposed a lightweight PDP scheme based on cryptographic hash functions and symmetric key encryption, but the servers can deceive the owners by using previous metadata or responses due to the lack of randomness in the challenges. The numbers of updates and challenges are limited and fixed in advance, and users cannot perform block insertions anywhere. Based on this work, Erway et al. [5] introduced two Dynamic PDP schemes with a hash function tree to realize O(log n) communication and computational costs for an n-block file. The basic scheme, called DPDP-I, retains the drawback of Scalable PDP, and in the blockless scheme, called DPDP-II, the data blocks {m_i}_{i∈[1,t]} can be leaked by the response of a challenge, M = Σ_{i=1}^t a_i · m_i, where a_i is a random challenge value. Furthermore, these schemes are also not effective for a multi-cloud environment because the verification path of the challenge block cannot be stored completely in a cloud [8].
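The DPDP-II leakage just described can be made concrete: an adversary who records enough challenge coefficients a_i and blockless responses M = Σ a_i · m_i can recover the blocks by linear algebra. A toy sketch over a small prime field (all values illustrative):

```python
# Toy demonstration: responses M = sum(a_i * m_i) mod p leak the blocks
# once enough (challenge, response) pairs are known -- solve A * m = M.
p = 101                      # small prime field, illustrative only
blocks = [42, 7, 99]         # "secret" data blocks m_1..m_3

def respond(coeffs):
    """Server's blockless response to the challenge coefficients."""
    return sum(a * m for a, m in zip(coeffs, blocks)) % p

# Adversary gathers 3 independent challenges and responses...
A = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
M = [respond(row) for row in A]

# ...and solves by forward substitution (A is lower-triangular here).
recovered = []
for i, row in enumerate(A):
    acc = M[i]
    for j in range(i):
        acc = (acc - row[j] * recovered[j]) % p
    recovered.append((acc * pow(row[i], -1, p)) % p)

print(recovered)  # [42, 7, 99] -- the blocks are fully leaked
```

With random coefficients the challenge matrix is invertible with high probability, so the attack needs only about as many responses as challenged blocks.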

Juels and Kaliski [3] presented a POR scheme, which relies largely on preprocessing steps that the client conducts before sending a file to a CSP. Unfortunately, these operations prevent any efficient extension for updating data. Shacham and Waters [6] proposed an improved version of this protocol called Compact POR, which uses the homomorphic property to aggregate a proof into an O(1) authenticator value and O(t) computation cost for t challenge blocks, but their solution is also static and could not prevent the leakage of data blocks in the verification process. Wang et al. [7] presented a dynamic scheme with O(log n) cost by integrating the Compact POR scheme and a Merkle Hash Tree (MHT) into DPDP. Furthermore, several POR schemes and models have been recently proposed, including [9], [10]. In [9], Bowers et al. introduced a distributed cryptographic system that allows a set of servers to solve the PDP problem. This system is based on an integrity-protected error-correcting code (IP-ECC), which improves the security and efficiency of existing tools, like POR. However, a file must be transformed into distinct segments of the same length, which are distributed across the servers. Hence, this system is more suitable for RAID rather than cloud storage.

Our Contributions. In this paper, we address the problem of provable data possession in distributed cloud environments from the following aspects: high security, transparent verification, and high performance. To achieve these goals, we first propose a verification framework for multi-cloud storage along with two fundamental techniques: hash index hierarchy (HIH) and homomorphic verifiable response (HVR).

We then demonstrate the possibility of constructing a cooperative PDP (CPDP) scheme without compromising data privacy based on modern cryptographic techniques, such as the interactive proof system (IPS). We further introduce an effective construction of the CPDP scheme using the above-mentioned structures. Moreover, we give a security analysis of our CPDP scheme from the IPS model. We prove that this construction is a multi-prover zero-knowledge proof system (MP-ZKPS) [11], which has completeness, knowledge soundness, and zero-knowledge properties. These properties ensure that the CPDP scheme can implement security against the data leakage attack and the tag forgery attack.

To improve the system performance with respect to our scheme, we analyze the performance of probabilistic queries for detecting abnormal situations. This probabilistic method also has an inherent benefit in reducing computation and communication overheads. Then, we present an efficient method for the selection of optimal parameter values to minimize the computation overheads of CSPs and the clients' operations. In addition, we show that our scheme is suitable for existing distributed cloud storage systems. Finally, our experiments show that our solution introduces very limited computation and communication overheads.

Organization. The rest of this paper is organized as follows. In Section 2, we describe a formal definition of CPDP and the underlying techniques, which are utilized in the construction of our scheme. We introduce the details of the cooperative PDP scheme for multi-cloud storage in Section 3. We describe the security and performance evaluation of our scheme in Sections 4 and 5, respectively. We discuss the related work and conclude this paper in Section 6.

    2 STRUCTURE AND TECHNIQUES

In this section, we present our verification framework for multi-cloud storage and a formal definition of CPDP. We introduce two fundamental techniques for constructing our CPDP scheme: the hash index hierarchy (HIH), on which the responses of the clients' challenges computed from multiple CSPs can be combined into a single response as the final result; and the homomorphic verifiable response (HVR), which supports distributed cloud storage.


However, it would cause significant communication and computation overheads for the verifier, as well as a loss of location transparency. Such a primitive approach obviously diminishes the advantages of cloud storage: scaling arbitrarily up and down on demand [13]. To solve this problem, we extend the above definition by adding an organizer (O), which is one of the CSPs that directly contacts the verifier, as follows:

⟨ Σ_{P_k∈P} P_k(F^{(k)}, σ^{(k)}) ↔ O, V ⟩ (pk, ψ),

where the action of the organizer is to initiate and organize the verification process. This definition is consistent with the aforementioned architecture, e.g., a client (or an authorized application) is considered as V, the CSPs are as P = {P_k}_{k∈[1,c]}, and the Zoho cloud is as the organizer in Figure 1. Often, the organizer is an independent server or a certain CSP in P. The advantage of this new multi-prover proof system is that it makes no difference for the clients between the multi-prover verification process and the single-prover verification process in the way of collaboration. Also, this kind of transparent verification is able to conceal the details of data storage to reduce the burden on clients. For the sake of clarity, we list the signals used in Table 2.

TABLE 2
The signals and their explanations.

Sig. | Meaning
n    | the number of blocks in a file
s    | the number of sectors in each block
t    | the number of index-coefficient pairs in a query
c    | the number of clouds to store a file
F    | the file with n × s sectors, i.e., F = {m_{i,j}}_{i∈[1,n], j∈[1,s]}
σ    | the set of tags, i.e., σ = {σ_i}_{i∈[1,n]}
Q    | the set of index-coefficient pairs, i.e., Q = {(i, v_i)}
θ    | the response for the challenge Q

    2.3 Hash Index Hierarchy for CPDP

To support distributed cloud storage, we illustrate a representative architecture used in our cooperative PDP scheme, as shown in Figure 2. Our architecture has a hierarchical structure which resembles a natural representation of file storage. This hierarchical structure consists of three layers to represent relationships among all blocks for stored resources. They are described as follows:

    1) Express Layer: offers an abstract representationof the stored resources;

    2) Service Layer: offers and manages cloud storageservices; and

    3) Storage Layer: realizes data storage on manyphysical devices.

We make use of this simple hierarchy to organize data blocks from multiple CSP services into a large-size file by shading their differences among these cloud storage systems. For example, in Figure 2 the resources in the Express Layer are split and stored into three CSPs in the Service Layer, which are indicated by different colors. In turn, each CSP fragments and stores the assigned data into the storage servers in the Storage Layer. We also make use of colors to distinguish different CSPs. Moreover, we follow the logical order of the data blocks to organize the Storage Layer. This architecture also provides special functions for data storage and management, e.g., there may exist overlaps among data blocks (as shown in dashed boxes) and discontinuous blocks, but these functions may increase the complexity of storage management.

Fig. 2. Index-hash hierarchy of CPDP model. [Figure: the Express, Service, and Storage layers, with blocks distributed across CSP1, CSP2, and CSP3 and an overlap among data blocks; the per-layer hash values ξ^{(1)}, ξ^{(2)}, ξ^{(3)} are derived from the file name Fn, the cloud names C_k, and the index records χ_i.]

In the storage layer, we define a common fragment structure that provides probabilistic verification of data integrity for outsourced storage. The fragment structure is a data structure that maintains a set of block-tag pairs, allowing searches, checks, and updates in O(1) time. An instance of this structure is shown in the storage layer of Figure 2: an outsourced file F is split into n blocks {m_1, m_2, …, m_n}, and each block m_i is split into s sectors {m_{i,1}, m_{i,2}, …, m_{i,s}}. The fragment structure consists of block-tag pairs (m_i, σ_i), where σ_i is a signature tag of block m_i generated by a set of secrets τ = (τ_1, τ_2, …, τ_s). In order to check the data integrity, the fragment structure implements probabilistic verification as follows: given a randomly chosen challenge (or query) Q = {(i, v_i)}_{i∈I}, where I is a subset of the block indices and v_i is a random coefficient, there exists an efficient algorithm to produce a constant-size response (μ_1, μ_2, …, μ_s, σ′), where μ_j comes from all {m_{i,j}, v_i}_{i∈I} and σ′ is from all {σ_i, v_i}_{i∈I}.
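The challenge-response flow over the fragment structure can be sketched as follows; the tag part σ′ is omitted since it depends on the concrete signature scheme, and the sizes and modulus below are illustrative:

```python
import random

p = 7919                 # illustrative prime modulus
n, s = 8, 4              # n blocks, s sectors per block
# outsourced file as an n x s matrix of sectors m[i][j] in Z_p
m = [[random.randrange(p) for _ in range(s)] for _ in range(n)]

def challenge(t):
    """Query Q = {(i, v_i)}: t random block indices with random coefficients."""
    idx = random.sample(range(n), t)
    return [(i, random.randrange(1, p)) for i in idx]

def respond(Q):
    """Data part of the response: one aggregated mu_j per sector column."""
    return [sum(v * m[i][j] for i, v in Q) % p for j in range(s)]

Q = challenge(3)
mu = respond(Q)
assert len(mu) == s      # response size depends on s, not on the number
                         # of challenged blocks t
```

The point of the structure is visible in the last line: however many blocks are challenged, the data part of the response stays at s field elements.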


Given a collision-resistant hash function H_κ(·), we make use of this architecture to construct a Hash Index Hierarchy H (viewed as a random oracle), which is used to replace the common hash function in prior PDP schemes, as follows:

1) Express layer: given s random {τ_i}_{i=1}^s and the file name Fn, sets ξ^{(1)} = H_{Σ_{i=1}^s τ_i}("Fn") and makes it public for verification but keeps {τ_i}_{i=1}^s secret;

2) Service layer: given the ξ^{(1)} and the cloud name C_k, sets ξ^{(2)}_k = H_{ξ^{(1)}}("C_k");

3) Storage layer: given the ξ^{(2)}_k, a block number i, and its index record χ_i = "B_i ∥ V_i ∥ R_i", sets ξ^{(3)}_{i,k} = H_{ξ^{(2)}_k}(χ_i), where B_i is the sequence number of a block, V_i is the updated version number, and R_i is a random integer to avoid collision.

As a virtualization approach, we introduce a simple index-hash table χ = {χ_i} to record the changes of file blocks, as well as to generate the hash value of each block in the verification process. The structure of χ is similar to the structure of the file block allocation table in file systems. The index-hash table consists of serial number, block number, version number, random integer, and so on. Different from the common index table, we ensure that all records in our index table differ from one another to prevent forgery of data blocks and tags. By using this structure, especially the index records {χ_i}, our CPDP scheme can also support dynamic data operations [8].
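As an illustration, the three-layer derivation can be sketched with a keyed hash; the paper leaves H(·) abstract, so the use of HMAC-SHA-256 and the string encodings below are assumptions made only for this sketch:

```python
import hashlib
import hmac

def H(key: bytes, msg: str) -> bytes:
    """Keyed collision-resistant hash H_key(msg), instantiated with HMAC."""
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

# Express layer: xi1 = H_{sum(tau_i)}("Fn"); the tau_i themselves stay secret.
taus = [11, 23, 5]                        # illustrative file secrets
xi1 = H(str(sum(taus)).encode(), "Fn")    # public value for verification

# Service layer: one sub-key per cloud name C_k.
xi2 = {ck: H(xi1, ck) for ck in ["CSP1", "CSP2", "CSP3"]}

# Storage layer: per-block value from the index record chi_i = B_i||V_i||R_i.
def xi3(cloud: str, block_no: int, version: int, rnd: int) -> bytes:
    chi_i = f"{block_no}||{version}||{rnd}"
    return H(xi2[cloud], chi_i)

# Distinct index records yield distinct per-block hash values.
assert xi3("CSP1", 1, 1, 42) != xi3("CSP1", 1, 2, 42)
```

Because each layer's value is keyed by the layer above it, a storage-layer hash is bound to a specific file, cloud, and index record, which is what lets distinct records prevent tag forgery.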

The proposed structure can be readily incorporated into MAC-based, ECC, or RSA schemes [2], [6]. These schemes, built from collision-resistant signatures (see Section 3.1) and the random oracle model, have the shortest query and response with public verifiability. They share several common characteristics for the implementation of the CPDP framework in multiple clouds: 1) a file is split into n × s sectors and each block (s sectors) corresponds to a tag, so that the storage of signature tags can be reduced with the increase of s; 2) a verifier can verify the integrity of the file in a random sampling approach, which is of utmost importance for large files; 3) these schemes rely on homomorphic properties to aggregate data and tags into a constant-size response, which minimizes the overhead of network communication; and 4) the hierarchy structure provides a virtualization approach to conceal the storage details of multiple CSPs.

    2.4 Homomorphic Verifiable Response for CPDP

A homomorphism is a map f : P → Q between two groups such that f(g_1 ⊕ g_2) = f(g_1) ⊗ f(g_2) for all g_1, g_2 ∈ P, where ⊕ denotes the operation in P and ⊗ denotes the operation in Q. This notation has been used to define Homomorphic Verifiable Tags (HVTs) in [2]: given two tag values σ_i and σ_j for two messages m_i and m_j, anyone can combine them into a value σ′ corresponding to the sum of the messages m_i + m_j. When provable data possession is considered as a challenge-response protocol, we extend this notation to the concept of Homomorphic Verifiable Responses (HVR), which is used to integrate multiple responses from the different CSPs in the CPDP scheme, as follows:

Definition 2 (Homomorphic Verifiable Response): A response is called a homomorphic verifiable response in a PDP protocol if, given two responses θ_i and θ_j for two challenges Q_i and Q_j from two CSPs, there exists an efficient algorithm to combine them into a response θ corresponding to the sum of the challenges Q_i ∪ Q_j.

Homomorphic verifiable response is the key technique of CPDP because it not only reduces the communication bandwidth, but also conceals the location of outsourced data in the distributed cloud storage environment.
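The effect of an HVR can be illustrated with plain modular arithmetic: per-CSP responses for disjoint sub-challenges combine component-wise into the response for the full challenge. A toy sketch (shapes and modulus are illustrative, and the tag component is omitted):

```python
p = 7919   # illustrative prime modulus

def respond(Q, data):
    """Response of one CSP for its share of the challenge Q."""
    s = len(next(iter(data.values())))
    return [sum(v * data[i][j] for i, v in Q) % p for j in range(s)]

def combine(*responses):
    """HVR: merge per-CSP responses into one response for Q1 ∪ Q2 ∪ ..."""
    return [sum(col) % p for col in zip(*responses)]

# Blocks 0,1 live on CSP1 and block 2 on CSP2 (s = 2 sectors each).
csp1 = {0: [3, 9], 1: [5, 1]}
csp2 = {2: [8, 4]}
r1 = respond([(0, 2), (1, 3)], csp1)
r2 = respond([(2, 5)], csp2)

# The combined value equals a single server's answer to the full challenge.
full = respond([(0, 2), (1, 3), (2, 5)], {**csp1, **csp2})
assert combine(r1, r2) == full
```

This is why the verifier cannot tell a multi-cloud deployment from a single server: the combined response is identical either way.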

    3 COOPERATIVE PDP SCHEME

In this section, we propose a CPDP scheme for a multi-cloud system based on the above-mentioned structure and techniques. This scheme is constructed on a collision-resistant hash, a bilinear map group, an aggregation algorithm, and homomorphic responses.

    3.1 Notations and Preliminaries

Let H = {H_κ} be a family of hash functions H_κ : {0, 1}^* → {0, 1}^n indexed by κ ∈ K. We say that an algorithm A has advantage ε in breaking the collision-resistance of H if Pr[A(κ) = (m_0, m_1) : m_0 ≠ m_1, H_κ(m_0) = H_κ(m_1)] ≥ ε, where the probability is over the random choice of κ ∈ K and the random bits of A. With this, we have the following definition.

Definition 3 (Collision-Resistant Hash): A hash family H is (t, ε)-collision-resistant if no t-time adversary has advantage at least ε in breaking the collision-resistance of H.

We set up our system using bilinear pairings proposed by Boneh and Franklin [14]. Let G and G_T be two multiplicative groups using elliptic curve conventions with a large prime order p. The function e is a computable bilinear map e : G × G → G_T with the following properties: for any G, H ∈ G and all a, b ∈ Z_p, we have 1) Bilinearity: e([a]G, [b]H) = e(G, H)^{ab}; 2) Non-degeneracy: e(G, H) ≠ 1 unless G = 1 or H = 1; and 3) Computability: e(G, H) is efficiently computable.

Definition 4 (Bilinear Map Group System): A bilinear map group system is a tuple S = ⟨p, G, G_T, e⟩ composed of the objects as described above.
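Bilinearity can be sanity-checked in a toy model that works purely in the exponents: writing elements of G = ⟨g⟩ as their discrete logs, e(g^a, g^b) = e(g, g)^{ab} becomes multiplication mod the group order. This is only an arithmetic model of Definition 4, not a secure pairing:

```python
q = 1009  # illustrative prime group order

# Model a group element g^a by its exponent a in Z_q; the pairing
# e(g^a, g^b) = e(g, g)^(a*b) then becomes multiplication mod q.
def pair(a: int, b: int) -> int:
    return (a * b) % q

# Bilinearity: e([x]U, [y]V) = e(U, V)^(x*y), checked in the exponent model.
u, v, x, y = 5, 7, 12, 30
assert pair(x * u % q, y * v % q) == (pair(u, v) * x * y) % q

# Non-degeneracy: e(g, g) is not the identity (exponent 0 in this model).
assert pair(1, 1) != 0
```

Real instantiations use pairings on elliptic curves; the exponent model only checks that equations built from the bilinearity rule are algebraically consistent.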

    3.2 Our CPDP Scheme

In our scheme (see Fig. 3), the manager first runs algorithm KeyGen to obtain the public/private key pairs for CSPs and users. Then, the clients generate the tags of outsourced data by using TagGen. At any time, the protocol Proof is performed by a 5-move interactive


KeyGen(1^κ): Let S = (p, G, G_T, e) be a bilinear map group system with randomly selected generators g, h ∈ G, where G, G_T are two bilinear groups of a large prime order p, |p| = O(κ). Makes a hash function H(·) public. For a CSP, chooses a random number s_k ∈ Z_p and computes S_k = g^{s_k}. Thus, sk_k = s_k and pk_k = (g, S_k). For a user, chooses two random numbers α, β ∈ Z_p and sets sk = (α, β) and pk = (g, h, H_1 = h^α, H_2 = h^β).

TagGen(sk, F, P): Splits F into n × s sectors {m_{i,j}}_{i∈[1,n], j∈[1,s]}. Chooses s random τ_1, …, τ_s ∈ Z_p as the secret of this file and computes u_j = g^{τ_j} for j ∈ [1, s]. Constructs the index table χ = {χ_i}_{i=1}^n and fills out the record χ_i (a) in χ for i ∈ [1, n], then calculates the tag for each block m_i as

    ξ^{(1)} = H_{Σ_{j=1}^s τ_j}("Fn"),  ξ^{(2)}_k = H_{ξ^{(1)}}("C_k"),  ξ^{(3)}_i = H_{ξ^{(2)}_k}(χ_i),  σ_i = (ξ^{(3)}_i)^α · (∏_{j=1}^s u_j^{m_{i,j}})^β,

where Fn is the file name and C_k is the CSP name of P_k. It then stores ψ = (u, ξ^{(1)}, χ) into the TTP and σ^{(k)} = {σ_i}_{m_i∈P_k} to P_k, where u = (u_1, …, u_s). Finally, the data owner saves the secret τ = (τ_1, …, τ_s).

Proof(P, V): This is a 5-move protocol among the Provers P = {P_k}_{k∈[1,c]}, an organizer O, and a Verifier V with the common input (pk, ψ), which is stored in the TTP, as follows:

1) Commitment (O → V): the organizer chooses a random γ ∈ Z_p and sends H′_1 = H_1^γ to the verifier;

2) Challenge1 (V → O): the verifier chooses a set of challenge index-coefficient pairs Q = {(i, v_i)}_{i∈I} and sends Q to the organizer, where I is a set of random indexes in [1, n] and v_i is a random integer in Z_p^*;

3) Challenge2 (O → P): the organizer forwards Q_k = {(i, v_i)}_{m_i∈P_k} to each P_k in P;

4) Response1 (P_k → O): P_k chooses a random r_k ∈ Z_p and random λ_{j,k} ∈ Z_p for j ∈ [1, s], and calculates a response

    μ_{j,k} ← λ_{j,k} + Σ_{(i,v_i)∈Q_k} v_i · m_{i,j},  σ′_k ← S^{r_k} · ∏_{(i,v_i)∈Q_k} σ_i^{v_i},  π_k ← e(∏_{j=1}^s u_j^{λ_{j,k}}, H_2),

where μ_k = {μ_{j,k}}_{j∈[1,s]} and S is the organizer's public key. Let λ_k = g^{r_k}; each P_k sends θ_k = (π_k, σ′_k, λ_k, μ_k) to the organizer;

5) Response2 (O → V): After receiving all responses from {P_k}_{k∈[1,c]}, the organizer aggregates {θ_k} into a final response θ as

    σ ← (∏_{P_k∈P} σ′_k · λ_k^{−s})^γ,  μ_j ← γ · Σ_{P_k∈P} μ_{j,k},  π ← (∏_{P_k∈P} π_k)^γ. (1)

Let μ = {μ_j}_{j∈[1,s]}. The organizer sends θ = (π, σ, μ) to the verifier.

Verification: Now the verifier can check whether the response was correctly formed by checking that

    π · e(σ, h) ?= e(∏_{(i,v_i)∈Q} (H_{ξ^{(2)}_k}(χ_i))^{v_i}, H′_1) · e(∏_{j=1}^s u_j^{μ_j}, H_2). (2)

a. For χ_i = "B_i ∥ V_i ∥ R_i" in Section 2.3, we can set χ_i = (B_i = i, V_i = 1, R_i ∈_R {0, 1}^*) at the initial stage of the CPDP scheme.

Fig. 3. Cooperative Provable Data Possession for Multi-Cloud Storage.

proof protocol between a verifier and more than one CSP, in which the CSPs need not interact with each other during the verification process; rather, an organizer is used to organize and manage all CSPs.

This protocol can be described as follows: 1) the organizer initiates the protocol and sends a commitment to the verifier; 2) the verifier returns a challenge set of random index-coefficient pairs Q to the organizer; 3) the organizer relays them to each P_k in P according to the exact position of each data block; 4) each P_k returns its response of challenge to the organizer; and 5) the organizer synthesizes a final response from the received responses and sends it to the verifier. The above process would guarantee that the verifier accesses files without knowing on which CSPs or in what geographical locations their files reside.

In contrast to a single CSP environment, our scheme differs from the common PDP scheme in two aspects:

1) Tag aggregation algorithm: In the stage of Commitment, the organizer generates a random γ and returns its commitment H′_1 to the verifier. This assures that the verifier and the CSPs do not obtain the value of γ. Therefore, our approach guarantees that only the organizer can compute the final σ by using γ and the σ′_k received from the CSPs.

    After each $\sigma'_k$ is computed, it must be transferred to the organizer in the Response1 stage. In order to ensure the security of transmission of data tags, our scheme employs a new method, similar to ElGamal encryption, to encrypt the combination of tags $\prod_{(i,v_i)\in Q_k} \sigma_i^{v_i}$: for $s \in \mathbb{Z}_p$ and $pk = (g, h = g^{s}) \in \mathbb{G}^2$, the cipher of a message $m$ is $C = (C_1 = g^{r}, C_2 = m \cdot h^{r})$ and its decryption is performed by $m = C_2 \cdot C_1^{-s}$. Thus, we hold the equation

$\sigma' = \prod_{P_k\in\mathcal{P}} \sigma'_k = \prod_{P_k\in\mathcal{P}} \prod_{(i,v_i)\in Q_k} \sigma_i^{v_i \gamma} = \prod_{(i,v_i)\in Q} \sigma_i^{v_i \gamma} = \Big(\prod_{(i,v_i)\in Q} \sigma_i^{v_i}\Big)^{\gamma}.$
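The ElGamal-style masking described above can be sketched numerically. This is a minimal illustration over the toy group $\mathbb{Z}_p^*$ (standing in for the pairing group), with invented helper names `encrypt`/`decrypt`:

```python
# Multiplicative ElGamal sketch: secret s, pk = (g, h = g^s); a CSP encrypts
# a group element m as C = (g^r, m*h^r); decryption is m = C2 * C1^{-s}.
import random

p = 2**127 - 1            # Mersenne prime; Z_p^* replaces the pairing group
g = 3
s = random.randrange(2, p - 1)      # organizer's ephemeral secret
h = pow(g, s, p)                    # public key component

def encrypt(m):
    r = random.randrange(2, p - 1)
    return pow(g, r, p), (m * pow(h, r, p)) % p

def decrypt(c1, c2):
    # C1^{-s} computed as C1^{(p-1)-s} since the group order is p-1
    return (c2 * pow(c1, p - 1 - s, p)) % p

sigma_k = pow(g, 123456789, p)      # stand-in for a combined tag
assert decrypt(*encrypt(sigma_k)) == sigma_k

# multiplicative homomorphism: a product of ciphertexts decrypts to the
# product of tags, which is what lets the organizer aggregate sigma'_k
(a1, a2), (b1, b2) = encrypt(sigma_k), encrypt(sigma_k)
assert decrypt(a1 * b1 % p, a2 * b2 % p) == sigma_k * sigma_k % p
```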

    2) Homomorphic responses: Because of the homomorphic property, the responses computed from CSPs

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


    in a multi-cloud can be combined into a single final response as follows: given a set of responses $\theta_k = (\pi_k, \sigma'_k, \mu_k)$ received from the $P_k$, let $r_j = \sum_{P_k\in\mathcal{P}} r_{k,j}$; the organizer can compute

$\mu_j = \sum_{P_k\in\mathcal{P}} \mu_{k,j} = \sum_{P_k\in\mathcal{P}} \Big( r_{k,j} + \sum_{(i,v_i)\in Q_k} v_i \cdot m_{i,j} \Big) = \sum_{P_k\in\mathcal{P}} r_{k,j} + \sum_{(i,v_i)\in Q} v_i \cdot m_{i,j} = r_j + \sum_{(i,v_i)\in Q} v_i \cdot m_{i,j}.$

    The commitment of $r_j$ is also computed by

$\pi = \prod_{P_k\in\mathcal{P}} \pi_k = \prod_{P_k\in\mathcal{P}} \prod_{j=1}^{s} e\big(u_j^{r_{k,j}}, H_2\big) = \prod_{j=1}^{s} e\big(u_j^{\sum_{P_k\in\mathcal{P}} r_{k,j}}, H_2\big) = \prod_{j=1}^{s} e\big(u_j^{r_j}, H_2\big).$

    It is obvious that the final response received by the verifier from multiple CSPs is the same as that from one single CSP. This means that our CPDP scheme is able to provide transparent verification for the verifiers. Two response algorithms, Response1 and Response2, comprise an HVR: given two responses $\theta_i$ and $\theta_j$ for two challenges $Q_i$ and $Q_j$ from two CSPs, i.e., $\theta_i = \mathrm{Response1}(Q_i, \{m_k\}, \{\sigma_k\})$, there exists an efficient algorithm to combine them into a final response $\theta$ corresponding to the sum of the challenges $Q_i \cup Q_j$, that is,

$\theta = \mathrm{Response1}\big(Q_i \cup Q_j, \{m_k\}, \{\sigma_k\}\big) = \mathrm{Response2}(\theta_i, \theta_j).$

For multiple CSPs, the above equation can be extended to $\theta = \mathrm{Response2}(\{\theta_k\}_{P_k\in\mathcal{P}})$. More importantly, the HVR is a triple of values $\theta = (\pi, \sigma, \mu)$, which has a constant size even for different challenges.
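The HVR identity can be illustrated with a toy linear response in which the masks simply add. This sketch only mirrors the shape of Response1/Response2 (the cryptographic tags and pairings are omitted, and the fixed mask `r = 5` is an assumption made so the identity is easy to check):

```python
# Toy check: combining two CSPs' responses for Q1 and Q2 equals a single
# response to Q1 ∪ Q2, up to the accumulated masking randomness.
p = 2**61 - 1
blocks = {0: 7, 1: 13, 2: 21, 3: 42}           # block index -> sector value

def response1(challenge):
    # mu = r + sum of v_i * m_i; r is the privacy-preserving mask
    r = 5
    return {"mu": (r + sum(v * blocks[i] for i, v in challenge)) % p, "r": r}

def response2(theta1, theta2):
    # the organizer just adds components; this is the homomorphic step
    return {"mu": (theta1["mu"] + theta2["mu"]) % p,
            "r": theta1["r"] + theta2["r"]}

q1, q2 = [(0, 3), (1, 4)], [(2, 5), (3, 6)]
combined = response2(response1(q1), response1(q2))
single = response1(q1 + q2)
# the masks differ (2r vs r), but the masked linear combinations agree
assert (combined["mu"] - combined["r"]) % p == (single["mu"] - single["r"]) % p
```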

    4 SECURITY ANALYSIS

    We give a brief security analysis of our CPDP construction. This construction is directly derived from the multi-prover zero-knowledge proof system (MP-ZKPS), which satisfies the following properties for a given assertion $L$:

    1) Completeness: whenever $x \in L$, there exists a strategy for the provers that convinces the verifier that this is the case;

    2) Soundness: whenever $x \notin L$, whatever strategy the provers employ, they will not convince the verifier that $x \in L$;

    3) Zero-knowledge: no cheating verifier can learn anything other than the veracity of the statement.

    According to existing IPS research [15], these properties can protect our construction from various attacks, such as the data leakage attack (privacy leakage) and the tag forgery attack (ownership cheating). In detail, the security of our scheme can be analyzed as follows:

    4.1 Collision resistance of index-hash hierarchy

    In our CPDP scheme, the collision resistance of the index-hash hierarchy is the basis and prerequisite for the security of the whole scheme, which is described as being secure in the random oracle model. Although the hash function is collision resistant, a successful hash collision can still be used to produce a forged tag when the same hash value is reused multiple times, e.g., when a legitimate client modifies the data or repeatedly inserts and deletes data blocks of outsourced data. To avoid such collisions, the hash value $\xi_i^{(3)}$, which is used to generate the tag $\sigma_i$ in the CPDP scheme, is computed from a set of values including the sectors $\{m_{i,j}\}$, the file name, the cloud names, and the index records $\{\chi_i\}$. As long as there exists a one-bit difference among these data, we can avoid a hash collision. As a consequence, we have the following theorem (see Appendix B):
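An index-hash of this kind can be sketched as follows. The field layout and names (`xi2`, `block_index`, `version`, `rnd`) are illustrative assumptions, not the paper's exact record format; the point is only that any one-bit change in the bound values yields a fresh hash:

```python
# Illustrative index-hash: bind the tag hash to the secondary hash xi^(2)
# and the per-block record chi_i = (index, version, random value).
import hashlib

def index_hash(xi2: bytes, block_index: int, version: int, rnd: bytes) -> bytes:
    chi_i = f"{block_index}||{version}||".encode() + rnd
    return hashlib.sha256(xi2 + chi_i).digest()

h1 = index_hash(b"xi2-of-file", 7, 1, b"r0")
h2 = index_hash(b"xi2-of-file", 7, 2, b"r0")   # version bumped after an update
assert h1 != h2                                 # the old hash value is not reused
```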

    Theorem 1 (Collision Resistant): The index-hash hierarchy in the CPDP scheme is collision resistant, even if the client generates $\sqrt{2^{\xi} \ln \frac{1}{1-\epsilon}}$ files with the same file name and cloud name, and the client repeats $\sqrt{2^{\xi+1} \ln \frac{1}{1-\epsilon}}$ times to modify, insert and delete data blocks, where the collision probability is at least $\epsilon$ and $\xi$ denotes the bit length of the hash values.
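A quick numeric reading of this birthday-style bound (the formula below assumes the $\sqrt{2^{\xi}\ln\frac{1}{1-\epsilon}}$ form stated in Theorem 1):

```python
# Number of same-named files needed before a collision occurs with
# probability eps, for xi-bit hash values.
import math

def files_for_collision(xi_bits: int, eps: float) -> float:
    return math.sqrt(2**xi_bits * math.log(1 / (1 - eps)))

# 160-bit hashes: even eps = 0.5 needs about 2^79.7 files
print(f"{math.log2(files_for_collision(160, 0.5)):.1f}")   # → 79.7
```

In other words, for realistic hash lengths the generation effort required to force a collision is far beyond a client's capability.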

    4.2 Completeness property of verification

    In our scheme, the completeness property implies the public verifiability property, which allows anyone, not just the client (data owner), to challenge the cloud server for data integrity and data ownership without the need for any secret information. First, for every available data-tag pair $(F, \sigma) \in \mathrm{TagGen}(sk, F)$ and a random challenge $Q = \{(i, v_i)\}$, the verification protocol should complete with success probability according to Equation (3), that is,

$\Pr\big[\langle P(F, \sigma), V\rangle(pk, \psi) = 1\big] = 1.$

    In this process, anyone can obtain the owner's public key $pk = (g, h, H_1 = h^{\alpha}, H_2 = h^{\beta})$ and the corresponding file parameter $\psi = (u, \xi^{(1)}, \chi)$ from the TTP to execute the verification protocol; hence this is a publicly verifiable protocol. Moreover, for different owners, the secrets $\alpha$ and $\beta$ hidden in their public keys are also different, so a successful verification can only be carried out under the real owner's public key. In addition, the parameter $\psi$ is used to store the file-related information, so an owner can employ a single public key to deal with a large number of outsourced files.

    4.3 Zero-knowledge property of verification

    The CPDP construction is in essence a Multi-Prover Zero-knowledge Proof (MP-ZKP) system [11], which can be considered as an extension of the notion of


    $\pi \cdot e(\sigma', h) = \prod_{j=1}^{s} e\big(u_j^{r_j}, H_2\big) \cdot e\Big(\prod_{(i,v_i)\in Q} \sigma_i^{v_i}, h\Big)$

    $= \prod_{j=1}^{s} e\big(u_j^{r_j}, H_2\big) \cdot e\Big(\prod_{(i,v_i)\in Q} \Big(\big(\xi_i^{(3)}\big)^{\alpha} \cdot \prod_{j=1}^{s} u_j^{\beta\, m_{i,j}}\Big)^{v_i}, h\Big)$

    $= \prod_{j=1}^{s} e\big(u_j^{r_j}, H_2\big) \cdot e\Big(\prod_{(i,v_i)\in Q} \big(\xi_i^{(3)}\big)^{v_i}, H_1\Big) \cdot e\Big(\prod_{j=1}^{s} u_j^{\sum_{(i,v_i)\in Q} v_i\, m_{i,j}}, H_2\Big)$

    $= e\Big(\prod_{(i,v_i)\in Q} \big(\xi_i^{(3)}\big)^{v_i}, H_1\Big) \cdot \prod_{j=1}^{s} e\big(u_j^{\mu_j}, H_2\big).$  (3)

    an interactive proof system (IPS). Roughly speaking, in the scenario of MP-ZKP, a polynomial-time bounded verifier interacts with several provers whose computational powers are unlimited. According to a simulator model, in which every cheating verifier has a simulator that can produce a transcript that "looks like" an interaction between an honest prover and a cheating verifier, we can prove that our CPDP construction has the zero-knowledge property (see Appendix C):

    Theorem 2 (Zero-Knowledge Property): The verification protocol $\langle P, V\rangle(pk, \psi)$ in the CPDP scheme is a computational zero-knowledge system under a simulator model; that is, for every probabilistic polynomial-time interactive machine $V^*$, there exists a probabilistic polynomial-time algorithm $S^*$ such that the ensembles $\mathrm{View}\big(\langle P(F, \sigma), V^*\rangle(pk, \psi)\big)$ and $S^*(pk, \psi)$ are computationally indistinguishable.

    Zero-knowledge is a property that achieves the CSPs' robustness against attempts to gain knowledge by interacting with them. For our construction, we make use of the zero-knowledge property to preserve the privacy of data blocks and signature tags. Firstly, randomness is adopted into the CSPs' responses in order to resist the data leakage attacks (see Attacks 1 and 3 in Appendix A). That is, the random integer $r_{k,j}$ is introduced into the response $\mu_{k,j}$, i.e., $\mu_{k,j} = r_{k,j} + \sum_{(i,v_i)\in Q_k} v_i \cdot m_{i,j}$. This means that the cheating verifier cannot obtain $m_{i,j}$ from $\mu_{k,j}$, because he does not know the random integer $r_{k,j}$. At the same time, a random integer $\gamma$ is also introduced to randomize the verification tag $\sigma$, i.e., $\sigma' \leftarrow \sigma^{\gamma}$. Thus, the tag $\sigma$ cannot be revealed to the cheating verifier through this randomness.

    4.4 Knowledge soundness of verification

    For every data-tag pair $(F^*, \sigma^*) \notin \mathrm{TagGen}(sk, F)$, in order to prove the nonexistence of fraudulent $F^*$ and $\sigma^*$, we require that the scheme satisfies the knowledge soundness property, that is,

$\Pr\big[\langle P^*(F^*, \sigma^*), V\rangle(pk, \psi) = 1\big] \le \epsilon,$

    where $\epsilon$ is a negligible error. We prove that our scheme has the knowledge soundness property by using reduction to absurdity^1: we make use of $P^*$ to construct a knowledge extractor $M$ [7], [13], which gets the common input $(pk, \psi)$ and rewindable black-box access to the prover $P^*$, and then attempts to break the computational Diffie-Hellman (CDH) problem in $\mathbb{G}$: given $\mathbb{G}, g, g_1 = g^{a}, g_2 = g^{b} \in \mathbb{G}$, output $g^{ab} \in \mathbb{G}$. But this is unacceptable, because the CDH problem is widely regarded as unsolvable in polynomial time. Thus, the opposite direction of the theorem also follows. We have the following theorem (see Appendix D):

    Theorem 3 (Knowledge Soundness Property): Our scheme has $(t, \epsilon')$ knowledge soundness in the random oracle and rewindable knowledge extractor model, assuming the $(t, \epsilon)$-computational Diffie-Hellman (CDH) assumption holds in the group $\mathbb{G}$ for $\epsilon' \ge \epsilon$.

    Essentially, soundness means that it is infeasible to fool the verifier into accepting false statements. Often, soundness can also be regarded as a stricter notion of unforgeability for file tags, preventing ownership cheating. This means that the CSPs, even if collusion is attempted, can neither tamper with the data nor forge the data tags if the soundness property holds. Thus, Theorem 3 denotes that the CPDP scheme can resist the tag forgery attacks (see Attacks 2 and 4 in Appendix A) and thereby prevent ownership cheating by the CSPs.

    5 PERFORMANCE EVALUATION

    In this section, to detect abnormality in a low-overhead and timely manner, we analyze and optimize the performance of the CPDP scheme from two aspects: evaluation of probabilistic queries and optimization of block length. To validate the effects of the scheme, we introduce a prototype of a CPDP-based audit system and present the experimental results.

    5.1 Performance Analysis for CPDP Scheme

    We present the computation cost of our CPDP scheme in Table 3. We use $[E]$ to denote the computation cost of an exponent operation $g^{x}$ in $\mathbb{G}$, where $x$ is a positive integer in $\mathbb{Z}_p$ and $g \in \mathbb{G}$ or $\mathbb{G}_T$. We neglect the computation cost of algebraic operations and

    1. It is a proof method in which a proposition is proved to betrue by proving that it is impossible to be false.


    simple modular arithmetic operations because they run fast enough [16]. The most complex operation is the computation of a bilinear map $e(\cdot, \cdot)$ between two elliptic curve points (denoted as $[B]$).

    TABLE 3

    Comparison of computation overheads between our CPDP scheme and the non-cooperative (trivial) scheme.

                 CPDP Scheme              Trivial Scheme
    KeyGen       3[E]                     2[E]
    TagGen       (2n + s)[E]              (2n + s)[E]
    Proof(P)     c[B] + (t + s + 1)[E]    c[B] + (t + s)[E]
    Proof(V)     3[B] + (t + s)[E]        3c[B] + (t + s)[E]

    (n: number of blocks; s: sectors per block; t: number of challenged index-coefficient pairs; c: number of CSPs.)

    Then, we analyze the storage and communication costs of our scheme. We define the bilinear pairing to take the form $e : E(\mathbb{F}_{p^m}) \times E(\mathbb{F}_{p^{km}}) \rightarrow \mathbb{F}^*_{p^{km}}$ (the definition given here is from [17], [18]), where $p$ is a prime, $m$ is a positive integer, and $k$ is the embedding degree (or security multiplier). In this case, we utilize an asymmetric pairing $e : \mathbb{G}_1 \times \mathbb{G}_2 \rightarrow \mathbb{G}_T$ to replace the symmetric pairing in the original schemes. From Table 3, it is easy to see that the clients' computation overheads are entirely independent of the number of CSPs. Further, our scheme has better performance than the non-cooperative approach, because the total computation overhead decreases by $3(c-1)$ bilinear map operations, where $c$ is the number of clouds in a multi-cloud. The reason is that, before the responses are sent to the verifier from $c$ clouds, the organizer aggregates these responses into a single response by using the

    TABLE 4

    Comparison of communication overheads between our CPDP and the non-cooperative (trivial) scheme.

                 CPDP Scheme                  Trivial Scheme
    Commitment   $l_2$                        $l_2$
    Challenge1   $2t \cdot l_0$               $2t \cdot l_0$
    Challenge2   $2t \cdot l_0$               —
    Response1    $s \cdot l_0 + 2l_1 + l_T$   $c(s \cdot l_0 + l_1 + l_T)$
    Response2    $s \cdot l_0 + l_1 + l_T$    —

    Without loss of generality, let the security parameter $\lambda$ be 80 bits; we then need elliptic curve domain parameters over $\mathbb{F}_p$ with $|p| = 160$ bits and $m = 1$ in our experiments. This means that the length of an integer is $l_0 = 2\lambda$ in $\mathbb{Z}_p$. Similarly, we have $l_1 = 4\lambda$ in $\mathbb{G}_1$, $l_2 = 24\lambda$ in $\mathbb{G}_2$, and $l_T = 24\lambda$ in $\mathbb{G}_T$ for the embedding degree $k = 6$. The storage and communication costs of our scheme are shown in Table 4. The storage overhead of a file with $\mathrm{size}(F) = 1$M-bytes is $n \cdot s \cdot l_0 + n \cdot l_1 = 1.04$M-bytes for $n = 10^3$ blocks and $s = 50$ sectors per block. The storage overhead of its index table $\chi$ is $n \cdot l_0 = 20$K-bytes. We define the overhead rate as $\lambda' = \mathrm{size}(\sigma)/\mathrm{size}(F) = l_1/(s \cdot l_0)$, and it should be kept as low as possible in order to minimize the storage at the cloud storage providers. It is obvious that a higher $s$ means much lower storage overhead. Furthermore, in the verification protocol, the communication overhead of the challenge is $2t \cdot l_0 = 40t$ bytes in terms of the number of challenged blocks $t$, but its response (Response1 or Response2) has a constant-size communication overhead of $s \cdot l_0 + l_1 + l_T \approx 1.3$K-bytes for different file sizes. This also implies that the client's communication overheads are of a fixed size, entirely independent of the number of CSPs.
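The size arithmetic above can be checked directly (parameter names such as `l0`, `l1`, `lT` follow the notation in Table 4):

```python
# Section 5.1 size arithmetic: lambda = 80 bits, l0 = 2*lambda (Z_p element),
# l1 = 4*lambda (G1 element), lT = 24*lambda (GT element), all in bytes.
lam_bits = 80
l0, l1, lT = 2 * lam_bits // 8, 4 * lam_bits // 8, 24 * lam_bits // 8

n, s, t = 1000, 50, 100            # blocks, sectors/block, challenged blocks
file_size = n * s * l0             # 1,000,000 bytes, i.e. the 1 MB file
tag_size = n * l1                  # 40,000 bytes of verification tags
challenge = 2 * t * l0             # 40 bytes per index-coefficient pair
response = s * l0 + l1 + lT        # constant, independent of file size

assert file_size + tag_size == 1_040_000   # the "1.04 MB" storage overhead
assert challenge == 40 * t
assert response == 1280                    # ~1.3 KB for any file size
```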

    5.2 Probabilistic Verification

    We recall the probabilistic verification of the common PDP scheme (which involves only one CSP), in which the verification process achieves the detection of CSP misbehavior in a random sampling mode in order to reduce the workload on the server. The detection probability $P$ of disrupted blocks is an important parameter to guarantee that these blocks can be detected in time. Assume the CSP modifies $e$ blocks out of the $n$-block file; the probability of disrupted blocks is then $\rho_b = e/n$. Let $t$ be the number of queried blocks for a challenge in the verification protocol. We have the detection probability^2

$P(\rho_b, t) \ge 1 - \Big(\frac{n-e}{n}\Big)^t = 1 - (1 - \rho_b)^t,$

    where $P(\rho_b, t)$ denotes that the probability $P$ is a function of $\rho_b$ and $t$. Hence, the number of queried blocks is $t \approx \frac{\log(1-P)}{\log(1-\rho_b)} \approx P \cdot n/e$ for sufficiently large $n$.^3 This means that the number of queried blocks $t$ is directly proportional to the total number of file blocks $n$ for constant $P$ and $e$. Therefore, for a uniform random verification in a PDP scheme with fragment structure, given a file with $sz = n \cdot s$ sectors and a probability $\rho$ of sector corruption, the detection probability of the verification protocol satisfies $P \ge 1 - (1-\rho)^{sz \cdot \omega}$, where $\omega$ denotes the sampling probability in the verification protocol. We can obtain this result as follows: because $\rho_b = 1 - (1-\rho)^s$ is the probability of block corruption with $s$ sectors in the common PDP scheme, the verifier can detect block errors with probability $P \ge 1 - (1-\rho_b)^t = 1 - \big((1-\rho)^s\big)^{n\cdot\omega} = 1 - (1-\rho)^{sz\cdot\omega}$ for a challenge with $t = n \cdot \omega$ index-coefficient pairs. In the same way, given a multi-cloud $\mathcal{P} = \{P_k\}_{k\in[1,c]}$, the detection probability of the CPDP scheme satisfies

    ((1 )) = 1 (1 ) for a challenge with = index-coefficient pairs. In the same way, givena multi-cloud = {}[1,], the detection probabilityof CPDP scheme has

    (, {, }, )

    1

    ((1 )

    )

    = 1

    (1 )

    ,

    where $r_k$ denotes the proportion of data blocks in the $k$-th CSP and $\rho_k$ denotes the probability of file corruption

    2. Exactly, we have $P = 1 - \frac{n-e}{n} \cdot \frac{n-1-e}{n-1} \cdots \frac{n-t+1-e}{n-t+1}$. Since $\frac{n-i-e}{n-i} \le \frac{n-e}{n}$ for $i \in [0, t-1]$, we have $P = 1 - \prod_{i=0}^{t-1}\frac{n-i-e}{n-i} \ge 1 - \prod_{i=0}^{t-1}\frac{n-e}{n} = 1 - \big(\frac{n-e}{n}\big)^t$.

    3. In terms of $(1-\rho_b)^t \approx 1 - t \cdot \rho_b$, we have $P \approx 1 - (1 - t \cdot \rho_b) = t \cdot \rho_b = t \cdot e/n$.


    TABLE 5

    The influence of $s$ and $t$ under different corruption probabilities $\rho$ and detection probabilities $P$ (each entry is $s/t$; $r = \{0.5, 0.3, 0.2\}$ in all columns).

    P     ρ={0.1,0.2,0.01}  ρ={0.01,0.02,0.001}  ρ={0.001,0.002,0.0001}  ρ={0.0001,0.0002,0.00001}
    0.8        3/4               7/20                 23/62                   71/202
    0.85       3/5               8/21                 26/65                   79/214
    0.9        3/6               10/20                28/73                   87/236
    0.95       3/8               11/29                31/86                   100/267
    0.99       4/10              13/31                39/105                  119/345
    0.999      5/11              16/38                48/128                  146/433

    in the $k$-th CSP, and $r_k \cdot sz \cdot \omega$ denotes the expected number of blocks queried by the verifier in the $k$-th CSP. Furthermore, we observe the ratio $\omega$ of queried blocks to the total file blocks under different detection probabilities. Based on the above analysis, it is easy to find that this ratio satisfies

$\omega \ge \frac{\log(1 - P)}{sz \cdot \sum_{P_k\in\mathcal{P}} r_k \cdot \log(1-\rho_k)}.$

    When this probability $\omega$ is a constant, the verifier can detect server misbehavior with a certain probability $P$ by asking for proof of $t = \frac{\log(1-P)}{\log(1-\rho_b)}$ blocks for PDP, or $t = \frac{\log(1-P)}{s \cdot \sum_{P_k\in\mathcal{P}} r_k \cdot \log(1-\rho_k)}$ blocks for CPDP, where $t = sz \cdot \omega / s = n \cdot \omega$. Note that the value of $t$ is dependent on the total number of file blocks $n$ [2], because $t$ increases along with the decrease of $\rho_b$, and $\log(1-\rho_b) < 0$ holds for a constant number of disrupted blocks $e$ and a larger $n$.
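These detection-probability formulas are easy to exercise numerically. The sketch below uses the paper's example values for $\rho_k$ and $r_k$; the helper names are illustrative:

```python
# P >= 1 - prod_k (1-rho_k)^(r_k*sz*w) for a multi-cloud, plus a check of
# footnote 2's bound against exact sampling without replacement.
import math

def detect_multi(rhos, rs, sz, w):
    prod = 1.0
    for rho, r in zip(rhos, rs):
        prod *= (1 - rho) ** (r * sz * w)
    return 1 - prod

rhos, rs = [0.01, 0.02, 0.001], [0.5, 0.3, 0.2]   # example corruption/shares
sz = 1000 * 50                                    # n blocks * s sectors
# sampling probability w required for P = 0.99, from the ratio equation
w = math.log(1 - 0.99) / (sz * sum(r * math.log(1 - rho)
                                   for rho, r in zip(rhos, rs)))
assert abs(detect_multi(rhos, rs, sz, w) - 0.99) < 1e-9

# footnote 2: sampling without replacement beats the stated lower bound
n, e, t = 1000, 10, 100
exact = 1 - math.prod((n - i - e) / (n - i) for i in range(t))
assert exact >= 1 - ((n - e) / n) ** t
```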

    Fig. 4. The relationship between computational cost

    and the number of sectors in each block.

    Another advantage of probabilistic verification based on random sampling is that it makes it easy to identify tampered or forged data blocks or tags. The identification procedure is straightforward: when the verification fails, we can choose a partial set of the challenge indexes as a new challenge set and continue to execute the verification protocol. This search process can be repeated until the bad block is found. The complexity of such a search process is $O(\log n)$.
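The halving search described above can be sketched as follows. Here `verify` is a stand-in oracle for one run of the verification protocol, and the sketch assumes a single corrupted block:

```python
# Identify a bad block by repeatedly challenging half of the failing index
# set -- O(log n) protocol rounds per bad block.
stored = list(range(100))
stored[37] = -1                     # one silently corrupted block

def verify(indexes):                # True iff every challenged block is intact
    return all(stored[i] == i for i in indexes)

def find_bad(indexes):
    if len(indexes) == 1:
        return indexes[0]
    mid = len(indexes) // 2
    left = indexes[:mid]
    # recurse into whichever half fails verification
    return find_bad(left) if not verify(left) else find_bad(indexes[mid:])

assert find_bad(list(range(100))) == 37
```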

    5.3 Parameter Optimization

    In the fragment structure, the number of sectors per block $s$ is an important parameter affecting the performance of both storage services and audit services. Hence, we propose an optimization algorithm for the value of $s$ in this section. Our results show that the optimal value not only minimizes the computation and communication overheads, but also reduces the size of the extra storage required to hold the verification tags in the CSPs.

    Assume $\rho$ denotes the probability of sector corruption. In the fragment structure, the choice of $s$ is extremely important for improving the performance of the CPDP scheme. Given the detection probability $P$ and the probabilities of sector corruption $\rho_k$ for the multiple clouds $\mathcal{P} = \{P_k\}$, the optimal value of $s$ can be computed by

$\min_{s\in\mathbb{N}} \Big\{ \frac{a \cdot \log(1-P)}{s \cdot \sum_{P_k\in\mathcal{P}} r_k \cdot \log(1-\rho_k)} + b \cdot s + c \Big\},$

    where $a \cdot t + b \cdot s + c$ denotes the computational cost of the verification protocol in the PDP scheme, $a, b, c \in \mathbb{R}$,

    and $c$ is a constant. This conclusion can be obtained from the following process: let $sz = n \cdot s = \mathrm{size}(F)/l_0$. According to the above-mentioned results, the sampling probability satisfies $\omega \ge \frac{\log(1-P)}{sz \cdot \sum_{P_k\in\mathcal{P}} r_k \cdot \log(1-\rho_k)} = \frac{\log(1-P)}{n \cdot s \cdot \sum_{P_k\in\mathcal{P}} r_k \cdot \log(1-\rho_k)}$. In order to minimize the computational cost, we have

$\min_{s\in\mathbb{N}} \{a \cdot t + b \cdot s + c\} = \min_{s\in\mathbb{N}} \{a \cdot n \cdot \omega + b \cdot s + c\} \ge \min_{s\in\mathbb{N}} \Big\{ \frac{a \cdot \log(1-P)}{s \cdot \sum_{P_k\in\mathcal{P}} r_k \cdot \log(1-\rho_k)} + b \cdot s + c \Big\},$

    where $r_k$ denotes the proportion of data blocks in the $k$-th CSP and $\rho_k$ denotes the probability of file corruption in the $k$-th CSP. Since $\frac{1}{s}$ is a monotonically decreasing function and $b \cdot s$ is a monotonically increasing function for $s > 0$, there exists an optimal value of $s$ in the above equation. From this conclusion, the optimal value of $s$ is unrelated to any particular file if the probability $\rho$ is a constant value.

    For instance, we assume a multi-cloud storage involving three CSPs, $\mathcal{P} = \{P_1, P_2, P_3\}$, where the probability of sector corruption is a constant value $\{\rho_1, \rho_2, \rho_3\} = \{0.01, 0.02, 0.001\}$. We set the detection probability $P$ in a range from 0.8 to 1, e.g., $P = \{0.8, 0.85, 0.9, 0.95, 0.99, 0.999\}$. For a file, the


    [Figure: clients and applications, a third party auditor (TPA) running the CPDP verification protocol, and an HDFS cluster consisting of a NameNode and DataNodes exchanging protocol parameters.]

    Fig. 5. Applying CPDP scheme in Hadoop distributed file system (HDFS).

    proportion of data blocks is 50%, 30%, and 20% in the three CSPs, respectively; that is, $r_1 = 0.5$, $r_2 = 0.3$, and $r_3 = 0.2$. In terms of Table 3, the computational cost of the CSPs can be simplified to $t + 3s + 9$. We can then observe the computational cost under different $s$ and $P$ in Figure 4. When $s$ is less than the optimal value, the computational cost decreases evidently with the increase of $s$, and it rises again once $s$ exceeds the optimal value.

    TABLE 6

    The influence of parameters $sz \cdot \omega$, $s$, and $t$ under different detection probabilities $P$ ($\rho = \{\rho_1, \rho_2, \rho_3\} = \{0.01, 0.02, 0.001\}$, $\{r_1, r_2, r_3\} = \{0.5, 0.3, 0.2\}$).

    P        0.8      0.85     0.9      0.95     0.99     0.999
    sz·ω     142.60   168.09   204.02   265.43   408.04   612.06
    s        7        8        10       11       13       16
    t        20       21       20       29       31       38

    More accurately, we show the influence of the parameters $sz \cdot \omega$, $s$, and $t$ under different detection probabilities in Table 6. It is easy to see that the computational cost rises with the increase of $P$. Moreover, we can determine the sampling number of a challenge from the following conclusion: given the detection probability $P$, the probability of sector corruption $\rho$, and the number of sectors in each block $s$, the sampling number of the verification protocol is a constant $t = \big\lceil \frac{\log(1-P)}{s \cdot \sum_{P_k\in\mathcal{P}} r_k \cdot \log(1-\rho_k)} \big\rceil$ for different files.

    Finally, we observe the change of $s$ under different $\rho$ and $P$. The experimental results are shown in Table 5. It is obvious that the optimal value of $s$ rises with the increase of $P$ and with the decrease of $\rho$. We choose the optimal value of $s$ on the basis of practical settings and system requirements. For the NTFS format, we suggest setting $s$ to 200 and the block size to 4K-bytes, which is the same as the default cluster size in NTFS when the file size is less than 16TB. In this case, the value of $s$ ensures that the extra storage does not exceed 1% on the storage servers.

    5.4 CPDP for Integrity Audit Services

    Based on our CPDP scheme, we introduce an audit system architecture for outsourced data in multiple clouds by replacing the TTP of Figure 1 with a third party auditor (TPA). This architecture can be built on a virtualization infrastructure for cloud-based storage services [1]. In Figure 5, we show an example of applying our CPDP scheme in the Hadoop distributed file system (HDFS)^4, which is a distributed, scalable, and portable file system [19]. The HDFS architecture is composed of a NameNode and DataNodes, where the NameNode maps a file name to a set of block indexes and the DataNodes actually store the data blocks. To support our CPDP scheme, the index-hash hierarchy and the metadata of the NameNode should be integrated together to provide an enquiry service for the hash value $\xi_i^{(3)}$ or the index-hash record $\chi_i$. Based on the hash value, the clients can run the verification protocol via CPDP services. Hence, it is easy to replace the checksum methods with the CPDP scheme for anomaly detection in current HDFS.

    To validate the effectiveness and efficiency of our proposed approach for audit services, we have implemented a prototype of an audit system. We simulated the audit service and the storage service using two local IBM servers, each with two Intel Core 2 processors at 2.16 GHz and 500M RAM, running Windows Server 2003. These servers were connected via a network with 250 MB/s bandwidth. Using the GMP and PBC libraries, we implemented a cryptographic library upon which our scheme can be constructed. This C library contains approximately 5,200 lines of code and has been tested on both Windows and Linux platforms. The elliptic curve utilized in the experiments is an MNT curve, with a base field size of 160 bits and embedding degree 6. The security level is chosen to be 80 bits, which means $|p| = 160$.

    4. Hadoop enables applications to work with thousands of nodes and petabytes of data, and it has been adopted by current mainstream cloud platforms from Apache, Google, Yahoo, Amazon, IBM and Sun.


    Fig. 6. Experimental results under different file size, sampling ratio, and sector number.

    Firstly, we quantify the performance of our audit scheme under different parameters, such as the file size $sz$, the sampling ratio $\omega$, and the sector number per block $s$. Our analysis shows that the value of $s$ should grow with the increase of $sz$ in order to reduce computation and communication costs. Thus, our experiments were carried out as follows: the stored files were chosen from 10KB to 10MB; the sector numbers were varied from 20 to 250 according to the file sizes; and the sampling ratios were varied from 10% to 50%. The experimental results are shown in the left side of Figure 6. These results indicate that the computation and communication costs (including I/O costs) grow with increases in file size and sampling ratio.

    Next, we compare the performance of each activity in our verification protocol. We have shown the theoretical results in Table 4: the overheads of "commitment" and "challenge" resemble one another, and the overheads of "response" and "verification" resemble one another as well. To validate the theoretical results, we changed the sampling ratio $\omega$ from 10% to 50% for a 10MB file with 250 sectors per block in a multi-cloud $\mathcal{P} = \{P_1, P_2, P_3\}$, in which the proportions of data blocks are 50%, 30%, and 20% in the three CSPs, respectively. In the right side of Figure 6, our experimental results show that the computation and communication costs of "commitment" and "challenge" change only slightly with the sampling ratio, but those of "response" and "verification" grow with the increase of the sampling ratio. Here, "challenge" and "response" can each be divided into two sub-processes: "challenge1" and "challenge2", as well as "response1" and "response2", respectively. Furthermore, the proportions of data blocks in each CSP have a greater influence on the computation costs of the "challenge" and "response" processes. In summary, our scheme has better performance than the non-cooperative approach.

    6 CONCLUSIONS

    In this paper, we presented the construction of an efficient PDP scheme for distributed cloud storage.

    Based on homomorphic verifiable responses and a hash index hierarchy, we proposed a cooperative PDP scheme to support dynamic scalability over multiple storage servers. We also showed that our scheme provides all the security properties required by a zero-knowledge interactive proof system, so that it can resist various attacks even when deployed as a public audit service in clouds. Furthermore, we optimized the probabilistic query and periodic verification to improve the audit performance. Our experiments clearly demonstrated that our approach introduces only a small amount of computation and communication overhead. Therefore, our solution can be treated as a new candidate for data integrity verification in outsourced data storage systems.

    As part of future work, we would extend our work to explore more effective CPDP constructions. First, from our experiments we found that the performance of the CPDP scheme, especially for large files, is affected by the bilinear mapping operations due to their high complexity. To solve this problem, RSA-based constructions may be a better choice, but this is still a challenging task because the existing RSA-based schemes have too many restrictions on performance and security [2]. Next, from a practical point of view, we still need to address some issues about integrating our CPDP scheme smoothly with existing systems, for example, how to match the index-hash hierarchy with HDFS's two-layer name space, how to match the index structure with the cluster-network model, and how to dynamically update the CPDP parameters according to HDFS-specific requirements. Finally, it remains a challenging problem to generate tags whose length is independent of the size of the data blocks. We would explore this issue to provide support for variable-length block verification.

    ACKNOWLEDGMENTS

    The work of Y. Zhu and M. Yu was supported by the National Natural Science Foundation of China (Project No. 61170264 and No. 10990011). The work of Gail-J.


    Ahn and Hongxin Hu was partially supported by thegrants from US National Science Foundation (NSF-IIS-0900970 and NSF-CNS-0831360) and Departmentof Energy (DE-SC0004308).

    REFERENCES

    [1] B. Sotomayor, R. S. Montero, I. M. Llorente, and I. T. Foster,Virtual infrastructure management in private and hybridclouds, IEEE Internet Computing, vol. 13, no. 5, pp. 1422,2009.

    [2] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, L. Kissner,Z. N. J. Peterson, and D. X. Song, Provable data possessionat untrusted stores, in ACM Conference on Computer andCommunications Security, P. Ning, S. D. C. di Vimercati, andP. F. Syverson, Eds. ACM, 2007, pp. 598609.

    [3] A. Juels and B. S. K. Jr., Pors: proofs of retrievability forlarge files, in ACM Conference on Computer and CommunicationsSecurity, P. Ning, S. D. C. di Vimercati, and P. F. Syverson, Eds.ACM, 2007, pp. 584597.

    [4] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, Scal-able and efficient provable data possession, in Proceedings

    of the 4th international conference on Security and privacy incommunication netowrks, SecureComm, 2008, pp. 110.

    [5] C. C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia,Dynamic provable data possession, in ACM Conference onComputer and Communications Security, E. Al-Shaer, S. Jha, andA. D. Keromytis, Eds. ACM, 2009, pp. 213222.

    [6] H. Shacham and B. Waters, Compact proofs of retrievabil-ity, in ASIACRYPT, ser. Lecture Notes in Computer Science,

    J. Pieprzyk, Ed., vol. 5350. Springer, 2008, pp. 90107.[7] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, Enabling public

    verifiability and data dynamics for storage security in cloudcomputing, in ESORICS, ser. Lecture Notes in ComputerScience, M. Backes and P. Ning, Eds., vol. 5789. Springer,2009, pp. 355370.

    [8] Y. Zhu, H. Wang, Z. Hu, G.-J. Ahn, H. Hu, and S. S. Yau, Dy-namic audit services for integrity verification of outsourcedstorages in clouds, in SAC, W. C. Chu, W. E. Wong, M. J.Palakal, and C.-C. Hung, Eds. ACM, 2011, pp. 15501557.

    [9] K. D. Bowers, A. Juels, and A. Oprea, Hail: a high-availabilityand integrity layer for cloud storage, in ACM Conference onComputer and Communications Security, E. Al-Shaer, S. Jha, andA. D. Keromytis, Eds. ACM, 2009, pp. 187198.

    [10] Y. Dodis, S. P. Vadhan, and D. Wichs, Proofs of retrievabilityvia hardness amplification, in TCC, ser. Lecture Notes inComputer Science, O. Reingold, Ed., vol. 5444. Springer, 2009,pp. 109127.

    [11] L. Fortnow, J. Rompel, and M. Sipser, On the power of multi-prover interactive protocols, in Theoretical Computer Science,1988, pp. 156161.

    [12] Y. Zhu, H. Hu, G.-J. Ahn, Y. Han, and S. Chen, Collaborativeintegrity verification in hybrid clouds, in IEEE Conference on

    the 7th International Conference on Collaborative Computing: Net-working, Applications and Worksharing, CollaborateCom, Orlando,Florida, USA, October 15-18, 2011, pp. 197206.

    [13] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz,A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, andM. Zaharia, Above the clouds: A berkeley view of cloud com-puting, EECS Department, University of California, Berkeley,Tech. Rep., Feb 2009.

    [14] D. Boneh and M. Franklin, "Identity-based encryption from the Weil pairing," in Advances in Cryptology (CRYPTO 2001), ser. Lecture Notes in Computer Science, vol. 2139. Springer, 2001, pp. 213-229.

    [15] O. Goldreich, Foundations of Cryptography: Basic Tools. Cambridge University Press, 2001.

    [16] P. S. L. M. Barreto, S. D. Galbraith, C. Ó hÉigeartaigh, and M. Scott, "Efficient pairing computation on supersingular abelian varieties," Designs, Codes and Cryptography, vol. 42, no. 3, pp. 239-271, 2007.

    [17] J.-L. Beuchat, N. Brisebarre, J. Detrey, and E. Okamoto, "Arithmetic operators for pairing-based cryptography," in CHES, ser. Lecture Notes in Computer Science, P. Paillier and I. Verbauwhede, Eds., vol. 4727. Springer, 2007, pp. 239-255.

    [18] H. Hu, L. Hu, and D. Feng, "On a class of pseudorandom sequences from elliptic curves over finite fields," IEEE Transactions on Information Theory, vol. 53, no. 7, pp. 2598-2605, 2007.

    [19] A. Bialecki, M. Cafarella, D. Cutting, and O. O'Malley, "Hadoop: A framework for running applications on large clusters built of commodity hardware," Tech. Rep., 2005. [Online]. Available: http://lucene.apache.org/hadoop/

    [20] E. Al-Shaer, S. Jha, and A. D. Keromytis, Eds., Proceedings of the 2009 ACM Conference on Computer and Communications Security, CCS 2009, Chicago, Illinois, USA, November 9-13, 2009. ACM, 2009.

    Yan Zhu received the Ph.D. degree in computer science from Harbin Engineering University, China, in 2005. He has been an associate professor of computer science in the Institute of Computer Science and Technology at Peking University since 2007. He worked at the Department of Computer Science and Engineering, Arizona State University, as a visiting associate professor from 2008 to 2009. His research interests include cryptography and network security.

    Hongxin Hu is currently working toward the Ph.D. degree in the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering, Arizona State University. He is also a member of the Security Engineering for Future Computing Laboratory, Arizona State University. His current research interests include access control models and mechanisms, security and privacy in social networks, security in distributed and cloud computing, network and system security, and secure software engineering.

    Gail-Joon Ahn is an Associate Professor in the School of Computing, Informatics, and Decision Systems Engineering, Ira A. Fulton Schools of Engineering, and the Director of the Security Engineering for Future Computing Laboratory, Arizona State University. His research interests include information and systems security, vulnerability and risk management, access control, and security architecture for distributed systems; his research has been supported by the U.S. National Science Foundation, National Security Agency, U.S. Department of Defense, U.S. Department of Energy, Bank of America, Hewlett-Packard, Microsoft, and the Robert Wood Johnson Foundation. Dr. Ahn is a recipient of the U.S. Department of Energy CAREER Award and the Educator of the Year Award from the Federal Information Systems Security Educators' Association. He was an Associate Professor at the College of Computing and Informatics, and the Founding Director of the Center for Digital Identity and Cyber Defense Research and the Laboratory of Information Integration, Security, and Privacy, University of North Carolina at Charlotte. He received the Ph.D. degree in information technology from George Mason University, Fairfax, VA, in 2000.

    Mengyang Yu received his B.S. degree from the School of Mathematical Sciences, Peking University, in 2010. He is currently an M.S. candidate at Peking University. His research interests include cryptography and computer security.

    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
