Lecture I: Data Storage Security in Cloud Computing
Kui Ren Associate Professor
Department of Computer Science and Engineering
University at Buffalo
Disclaimer!
The lecture slides are partially collected from the Internet for the educational purpose only. The lecturer does not claim any credit for them and the copyrights belong to the original authors.
Outline
• Introduction to Cloud Computing
• Cloud Data Storage and Security Challenges
• Our Research Efforts
• Further Discussion on the Subject
Cloud Computing: the Big Thing
• Tremendous momentum:
– Prediction of the share of Federal IT spending that can move to the cloud, from US CIO.gov, Feb. 2011.
– Prediction of cloud computing revenue in 2012, from market-research firm IDC.
– Cloud providers bring in $2B in the first quarter. Source: Synergy Research Group, May 2013.
– The overall cloud market will hit $71 billion in 2015. Source: Gartner company data, Macquarie Capital (USA), Jan. 2013.
Cloud Computing: Advantages
– Cloud computing enjoys a “pay-per-use model for enabling available, convenient and on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” – NIST
Cloud Service Stacks
• Software as a Service
• Platform as a Service
• Infrastructure as a Service
Cloud Deployment Models
• Public
• Private
Challenges for Cloud Computing
Cloud Raises Big Security Challenges!
• Data Loss and Leakage
• Insider Attacks
• Service Vulnerability
• Denial of Service
• Service Abuse
Broad Attack Surface for Public Cloud
• Traditional adversaries: hackers, malware, etc.
• As well as:
– Cross-VM attacks from multiple tenants;
– Leaking of personally identifiable information by rogue employees;
– Even providers who control the entire infrastructure…
– Many others yet to be identified…
• Main concerns: Will my data be safe? Will anyone see it? Can anyone modify it? What if I don’t trust the cloud operator?
Figure: data owners send their data to a virtualized server in the cloud (applications on guest OSes over a hypervisor and shared hardware), losing physical control over it.
Security Challenges in Cloud
• Storage Outsourcing vs. Storage Security
• Cloud Data Encryption vs. Data Utilization
• Storage Outsourcing vs. Access Control
• Computation Outsourcing vs. Data Security
• Utility Computing vs. Trustworthy Metering & Pricing
• Resource Virtualization vs. Virtualization Security
• Security Overhead vs. Cloud Benefits
• And many more…
Outline
• Introduction to Cloud Computing
• Cloud Data Storage and Security Challenges
• Our Research Efforts and Proposed Designs
• Further Discussion on the Subject
Storage Outsourcing vs. Storage Security
Figure: data owners outsource their data to the cloud and lose physical control over it.
• Cloud storage services allow owners to outsource their data to cloud servers for storage and maintenance.
– Low capital costs on hardware and software, low management and maintenance overheads, universal on-demand data access, etc.
– E.g., Amazon S3.
• However, data outsourcing also eliminates owners’ ultimate control over their data.
Storage Outsourcing vs. Storage Security (Cont’d)
• The cloud currently offers no guarantee:
– Amazon S3 is not liable for any data damage or data loss.
• A broad range of threats to data integrity exists:
– Internal: Byzantine failures, management errors, software bugs, etc.
– External: malicious malware, economically motivated attacks, etc.
– E.g., Amazon S3 (Feb. and Jul. 2008), Gmail (Dec. 2006, Mar. 2011), Apple MobileMe (Jul. 2008), Hotmail (Dec. 2010), …
• Cloud servers might behave unfaithfully:
– Discard rarely accessed data for monetary reasons;
– Hide data loss incidents to protect their reputation.
• Data owners demand continuous storage correctness assurance for their data in the cloud.
Need to Create Security Visibility inside Cloud
• A proactive storage auditing mechanism to ensure the continuous correctness of outsourced cloud data.
– To help extend the data trust perimeter into the cloud.
– To meet security, system, and performance requirements.
Figure: the owner asks “Is my data correctly stored?” and the cloud answers with storage correctness proofs.
Secure Cloud Storage Auditing
• Demand an efficient storage correctness guarantee without requiring local data copies.
– Traditional methods for storage security cannot be directly adopted.
– Retrieving massive data for checking is impractical (large bandwidth).
• Allow meaningful tradeoffs between security and overhead.
– Communication and computation costs should be low.
– The auditing cost should not outweigh its benefits.
• Cope with frequently changing cloud data while ensuring continuous data auditing.
– Cloud data may be frequently updated by the owner for application purposes.
– Auditing mechanisms inherently need to support data dynamics.
Secure Cloud Storage Auditing (Cont’d)
• Enable public auditing for unified risk evaluation.
– Introducing a third-party auditor saves owners’ computing resources and simplifies auditing management in the cloud.
– Public auditing should not affect the owner’s data privacy.
• Handle multiple auditing tasks simultaneously (batch auditing).
– Individually auditing each data file can be tedious and inefficient.
– Batch auditing improves efficiency and saves computation overhead.
Outline
• Cloud Computing Background
• Cloud Data Storage and Security Challenges
• Our Research Efforts and Proposed Designs
– Storage auditing with data dynamics support
– Privacy-preserving public auditing
– Efficiency improvement via batch auditing
• Further Discussion on the Subject
Dynamic Storage Auditing
• Outsourced data can change frequently due to updates.
– Outsourced file storage, databases, email data, log files, etc.
• How can we design an efficient storage auditing mechanism with inherent support for data dynamics?
– The most general forms of data update include data block modification, insertion, and deletion.
Figure: the cloud hosts not only static but also dynamic data.
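Straightforward Approaches
• The traditional approach is not applicable.
– The owner pre-computes MACs of the data under several keys, MAC_K1(Data), MAC_K2(Data), MAC_K3(Data), and keeps them.
– To audit, the owner reveals K1. The cloud server computes MAC_K1(Data*) over the data Data* it actually stores, and the owner checks whether the two MACs are equal.
– Drawbacks: keys may be used up, because a revealed key cannot be reused! No data dynamics support! The cloud processes the entire data online per audit!
Straightforward Approaches (Cont’d)
• The random-sampling approach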
– Check only a small portion of the data per audit.
– Achieve a probabilistic integrity guarantee via random sampling.
– The owner pre-computes an authenticator σ_i (e.g., a signature or MAC) for each data block m_i and outsources the pairs (m_1, σ_1), …, (m_n, σ_n) to the cloud server.
– To audit, the owner randomly samples block/authenticator pairs, e.g., (m_1, σ_1), (m_2, σ_2), (m_4, σ_4), and the server returns them.
– Drawbacks: 1. linear bandwidth cost with respect to the sample size; 2. linear computational cost, since each block/authenticator pair must be verified.
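Construct Homomorphic Authenticator
• A homomorphic authenticator provides integrity authentication and has the aggregation property.
– BLS signature-based instantiation: (x, g^x) is the private/public key pair, H(·) is a hash-to-point function, u and g are generators of group G, and e is a bilinear pairing on G.
– Data block: $m_i \in \mathbb{Z}_p$.
– Authenticator: $\sigma_i = (H(\mathrm{name}\|i)\cdot u^{m_i})^{x}$.
– Verification: $e(\sigma_i, g) = e(H(\mathrm{name}\|i)\cdot u^{m_i}, g^{x})$.
– Homomorphic: authenticators and data blocks aggregate. E.g., $\sigma = \sigma_1\cdot\sigma_2$ and $\mu = m_1 + m_2$ satisfy $e(\sigma, g) = e(H(\mathrm{name}\|1)\cdot H(\mathrm{name}\|2)\cdot u^{\mu}, g^{x})$.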
Construct Homomorphic Authenticator (Cont’d)
• Audit the aggregated block and authenticator to achieve constant bandwidth cost and much lower computational cost.
– The owner randomly samples block/authenticator pairs, e.g., (m_1, σ_1), (m_2, σ_2), (m_4, σ_4).
– The homomorphic property allows the cloud server to combine the sampled blocks and authenticators into single values μ and σ. The bandwidth is therefore small and constant, and the owner verifies μ and σ only once.
• Not designed to support data dynamics!
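The aggregation above can be tried out with the following minimal Python sketch. It is a toy built on stated assumptions. It uses a tiny prime-order subgroup of Z_P^* (P = 2039, Q = 1019) instead of a pairing-friendly curve. Because there is no pairing, the verifier must hold the secret key x, so this is a privately verifiable stand-in for the BLS check, not the publicly verifiable scheme. The names `hash_to_group`, `make_tag`, `prove`, and `verify` are hypothetical, and the parameters are far too small to be secure.

```python
import hashlib

# Toy parameters (insecure, illustration only): P = 2*Q + 1 is a safe prime and
# we work in the order-Q subgroup of quadratic residues mod P.
P, Q = 2039, 1019
U = 9  # 3^2 mod P, a generator of the order-Q subgroup


def hash_to_group(label: bytes) -> int:
    """Hypothetical toy hash-to-group: hash the label, then square into the subgroup."""
    h = int.from_bytes(hashlib.sha256(label).digest(), "big") % (P - 3) + 2
    return pow(h, 2, P)


def label(name: bytes, i: int) -> bytes:
    return name + b"||" + i.to_bytes(4, "big")


def make_tag(x: int, name: bytes, i: int, m: int) -> int:
    """sigma_i = (H(name||i) * u^{m_i})^x mod P."""
    return pow(hash_to_group(label(name, i)) * pow(U, m, P) % P, x, P)


def prove(blocks, tags, challenge):
    """Server: mu = sum v_i * m_i mod Q and sigma = prod sigma_i^{v_i} mod P."""
    mu = sum(v * blocks[i] for i, v in challenge) % Q
    sigma = 1
    for i, v in challenge:
        sigma = sigma * pow(tags[i], v, P) % P
    return mu, sigma


def verify(x, name, challenge, mu, sigma) -> bool:
    """Owner (knows x): check sigma == (prod H(name||i)^{v_i} * u^mu)^x mod P."""
    acc = pow(U, mu, P)
    for i, v in challenge:
        acc = acc * pow(hash_to_group(label(name, i)), v, P) % P
    return pow(acc, x, P) == sigma


if __name__ == "__main__":
    x = 123  # owner's secret key (toy value)
    blocks = [17, 42, 8, 99, 5, 61, 23, 70]
    tags = [make_tag(x, b"file", i, m) for i, m in enumerate(blocks)]
    challenge = [(0, 3), (4, 7), (5, 11), (7, 2)]  # (index, coefficient) pairs
    print(verify(x, b"file", challenge, *prove(blocks, tags, challenge)))  # True
```

If any sampled block is changed after tagging, `verify` returns `False`. The reason is that μ shifts by a nonzero multiple of that block's coefficient while σ stays the same.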
• Direct extension to data dynamics is insecure.
– E.g., modifying block m_i to m_i + Δm lets the adversary obtain Δm, and also $u^{x\Delta m} = \sigma_i'/\sigma_i$ by dividing the newly computed $\sigma_i'$ by the original $\sigma_i$.
– The adversary can now maliciously modify any block m_s to $m_s^* = m_s + \Delta m$ and forge a legitimate authenticator $\sigma_s^* = \sigma_s\cdot u^{x\Delta m} = (H(\mathrm{name}\|s)\cdot u^{m_s+\Delta m})^{x}$.
• A new authenticator construction is required to avoid this attack.
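The attack can be reproduced in the same kind of toy group. This hypothetical sketch uses insecure parameters, invented helper names, and index-based tags σ_i = (H(name||i)·u^{m_i})^x. It shows that one legitimate modification hands the server u^{xΔm}. The server can then reuse that value to forge a tag for a different block without ever learning x.

```python
import hashlib

# Toy group (insecure, illustration only): order-Q subgroup of Z_P^*, P = 2Q + 1.
P, Q = 2039, 1019
U = 9


def hash_to_group(data: bytes) -> int:
    """Hypothetical toy hash-to-group (hash, then square into the subgroup)."""
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 3) + 2
    return pow(h, 2, P)


def index_tag(x: int, i: int, m: int) -> int:
    """Index-based authenticator sigma_i = (H(name||i) * u^{m_i})^x."""
    return pow(hash_to_group(b"file||" + i.to_bytes(4, "big")) * pow(U, m, P) % P, x, P)


x = 321  # owner's secret key, never seen by the adversary
blocks = {1: 40, 2: 77, 3: 15}
tags = {i: index_tag(x, i, m) for i, m in blocks.items()}

# Legitimate update: the owner changes block 2 by dm and uploads a fresh tag.
dm = 5
new_tag_2 = index_tag(x, 2, blocks[2] + dm)

# Adversary: sigma_2' / sigma_2 = u^{x*dm}, a value independent of the block index.
u_x_dm = new_tag_2 * pow(tags[2], -1, P) % P

# Forge a valid tag for the malicious change m_3 -> m_3 + dm without knowing x.
forged_tag_3 = tags[3] * u_x_dm % P
print(forged_tag_3 == index_tag(x, 3, blocks[3] + dm))  # True
```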
Analysis of Existing Work
Figure: authenticators σ_1, …, σ_n over blocks m_1, …, m_n in H. Shacham et al. 08 (BLS signature based) and G. Ateniese et al. 07 (RSA based). Notation: v, name: randomly chosen labels for data names; d, x: related private keys; H(·), h(·): hash-to-point functions.
Analysis of Existing Work (Cont’d)
• A secure authenticator must enforce the block index, i.e., position/sequence information.
– This prevents the adversary from using authenticators to obtain proofs for different blocks, e.g., using any valid (m_s, σ_s) pair to pass challenges for a corrupted m_t.
• But keeping index information makes data updates highly inefficient.
– E.g., inserting a block at any position requires retrieving all subsequent data blocks and re-computing all corresponding authenticators.
• Can we eliminate the index information but still enforce block positions without affecting security?
Our Design Overview
• Construct a new authenticator using H(m_i) instead of H(name||i), i.e., $\sigma_i = (H(m_i)\cdot u^{m_i})^{x}$.
• The new authenticator supports secure block modification.
– H(m_i) changes with every block update, so the attack on block modification described earlier no longer works.
• Eliminating the index enables efficient block insertion and deletion.
• However, we do not yet have a way to enforce the block index sequence.
Our Design Overview (Cont’d)
• Construct a novel sequence-enforced Merkle Hash Tree (sMHT).
– Rank of each tree node: the number of leaves that can be reached from the node. It is also the sum of its children’s ranks.
– Construct the sMHT with the ordered set {H(m_i)}, i = 1, …, n, as the leaf nodes, and use the root (R, n) to ensure correct block position information.
• Figure: a four-leaf sMHT with x_i = H(m_i). Level 0 holds leaf nodes (h_1, 1), …, (h_4, 1), where h_1 = h(x_1||1). Level 1 holds A = (h_A, 2) and B = (h_B, 2), where h_A = h(h_1||h_2||2). Level 2 holds the root (R, 4), where R = h(h_A||h_B||4). The auxiliary authentication information (AAI) of a leaf lists the sibling nodes on its path to the root.
• To verify x_3’s value and position, we use the root (R, 4) and AAI = {(h_4, 1, 0), (h_A, 2, 1)}:
1. Compute the rank of B as 1 + 1 = 2 and h_B = h(h(x_3||1)||h_4||2);
2. Compute the rank of the root as 2 + 2 = 4 and R' = h(h_A||h_B||4);
3. Verify that R = R' and that LEFT(x_3) = 2, i.e., exactly two leaves precede x_3.
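The three verification steps can be sketched in Python as follows. This is an illustration built on assumptions, not the authors' implementation. h(·) is SHA-256. Ranks are encoded as 8-byte big-endian integers before hashing. Each AAI entry carries an explicit `sibling_is_left` flag so the verifier knows the concatenation order, and that flag is an assumption of this sketch. All function names are hypothetical.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def enc(rank: int) -> bytes:
    return rank.to_bytes(8, "big")


def leaf(x: bytes):
    """Leaf node (h(x||1), rank 1)."""
    return h(x, enc(1)), 1


def node(left, right):
    """Internal node (h(h_left||h_right||rank), rank); rank is the sum of the children's ranks."""
    rank = left[1] + right[1]
    return h(left[0], right[0], enc(rank)), rank


def verify_leaf(root, x: bytes, aai, expected_left: int) -> bool:
    """Recompute the root from x and its AAI, and check LEFT(x), the number of
    leaves to the left of x. Each AAI entry is (hash, rank, sibling_is_left)."""
    cur = leaf(x)
    left_count = 0
    for sib_hash, sib_rank, sibling_is_left in aai:
        sib = (sib_hash, sib_rank)
        if sibling_is_left:
            cur = node(sib, cur)
            left_count += sib_rank  # every leaf under a left sibling precedes x
        else:
            cur = node(cur, sib)
    return cur == root and left_count == expected_left


if __name__ == "__main__":
    xs = [hashlib.sha256(m).digest() for m in (b"m1", b"m2", b"m3", b"m4")]
    leaves = [leaf(x) for x in xs]
    A, B = node(leaves[0], leaves[1]), node(leaves[2], leaves[3])
    root = node(A, B)  # (R, 4)
    aai = [(leaves[3][0], 1, False), (A[0], 2, True)]  # siblings of x3: h4, then A
    print(verify_leaf(root, xs[2], aai, expected_left=2))  # True
```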
The Protocol Illustration
• Preparation: The owner computes an authenticator σ_i for each block m_i (i = 1, …, 8 in the example), generates the sMHT, keeps the root (R, n), and outsources {Data, σ_i’s, sMHT} to the cloud.
• Auditing: The owner challenges the cloud on randomly selected data blocks by sending random positions and coefficients, e.g., {v_1, v_5, v_6, v_8}. The cloud responds with the corresponding {μ, σ, Ω}, where μ = v_1m_1 + v_5m_5 + v_6m_6 + v_8m_8 and σ is the matching aggregated authenticator. The owner verifies μ and σ with Ω.
The Protocol Illustration: Auditing
• Step 1: The owner uses the root (R, 8) and Ω to authenticate the positions of {H(m_i)}, i = 1, 5, 6, 8, and hence those of {m_i}, i = 1, 5, 6, 8.
– Ω = {H(m_i)}, i = 1, 5, 6, 8, together with the corresponding AAI from the sMHT, i.e., the nodes h_2, h_7, and h_D with their ranks.
– Figure: an eight-leaf sMHT over x_i = H(m_i), i = 1, …, 8, with leaf nodes (h_i, 1), level-1 nodes C, D, E, F of rank 2, level-2 nodes A, B of rank 4, and root (R, 8).
– The owner computes h_1 = h(x_1||1), h_5 = h(x_5||1), h_6 = h(x_6||1), h_8 = h(x_8||1); then h_C = h(h_1||h_2||2), h_E = h(h_5||h_6||2), h_F = h(h_7||h_8||2); then h_A = h(h_C||h_D||4), h_B = h(h_E||h_F||4).
– Check whether R = h(h_A||h_B||8) and whether LEFT(x_i) = i − 1 for i = 1, 5, 6, 8.
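The Protocol Illustration: Auditing (Cont’d)
• Step 2: With {H(m_i)}, i = 1, 5, 6, 8, authenticated, the owner further checks
$$e(\sigma, g) = e\Big(\prod_{i\in\{1,5,6,8\}} H(m_i)^{v_i}\cdot u^{\mu},\; g^{x}\Big),$$
where the v_i are the random coefficients chosen by the owner, g^x is the public key, and σ, μ, and {H(m_i)} are the auditing materials from the cloud.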
The Protocol Illustration: Support Data Dynamics
• Support general block-level operations: modification (M), deletion (D), and insertion (I).
– One step closer to practical auditing mechanisms.
• Update operation: update the block, its corresponding authenticator, and the sMHT.
– When a block is inserted or deleted, the authenticators of all other blocks remain the same, i.e., no authenticator re-computation or data retrieval is necessary.
Support Data Dynamics: Block Insertion
• Example: insert a new block m* after m_2 in a four-block file, where x_i = H(m_i), h_i = h(x_i||1), and x* = H(m*).
1. The owner computes σ* for the new block m* and sends {m*, σ*} to the cloud server with the request “insert m* after m_2”.
2. The cloud server inserts m* and updates the sMHT. The leaf (h(x*||1), 1) is inserted after (h_2, 1) under a new node C = (h_C, 2), node A becomes (h_A*, 3), and the root changes from (R, 4) to (R*, 5). The server returns Ω = {(h_1, 1, 0), (h_2, 1, 0), (h_B, 2, 1)} and h(H(m*)).
3. Owner-side update: authenticate the received Ω with the local root (R, 4).
4. Compute (R*, 5) with Ω and the locally computed h(H(m*)||1) = h(x*||1).
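The owner-side steps 3 and 4 for this specific example can be sketched as below. The sketch assumes the same rank-annotated hashing conventions as a plain sMHT sketch would use (SHA-256, 8-byte rank encoding). The function hard-codes the four-leaf shape from the figure and passes Ω as (hash, rank) pairs. A general implementation would derive the update path from the AAI instead. All names are hypothetical.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def enc(rank: int) -> bytes:
    return rank.to_bytes(8, "big")


def leaf(x: bytes):
    return h(x, enc(1)), 1


def node(left, right):
    rank = left[1] + right[1]
    return h(left[0], right[0], enc(rank)), rank


def owner_insert_after_m2(old_root, omega, x_new: bytes):
    """Owner-side update for 'insert m* after m2' in the four-leaf example.

    omega = (leaf h1, leaf h2, node B), each as (hash, rank). Returns the new
    root (R*, 5), or None if omega does not authenticate against (R, 4).
    """
    h1, h2, hB = omega
    # Step 3: authenticate omega against the locally kept root (R, 4).
    if node(node(h1, h2), hB) != old_root:
        return None
    # Step 4: C = (h2, h(x*||1)), A* = (h1, C), R* = (A*, B).
    c = node(h2, leaf(x_new))
    return node(node(h1, c), hB)
```

The owner then replaces its stored (R, 4) with the returned (R*, 5). The server, which rebuilds its own tree after inserting the leaf, must arrive at the same root.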
Remarks
• In our scheme, we store additional metadata in the tree structure to assist authentication.
– E.g., we store the rank information of the tree nodes at the server.
• This eliminates the need for the owner to keep track of the tree structure, while keeping our design secure.
• Otherwise, the owner would have to record local state information for every update, which is quite a burden from a practical point of view.
Example: Storing Rank of Nodes
• The rank of node i is the number of leaf nodes in the sub-tree rooted at node i.
• The owner can directly use authenticated rank values to verify that node F is indeed the 750th leaf.
• Figure: part of an sMHT with leaves x_i = H(m_i), i = 1, …, n. The root has rank 1000, with children A (rank 400) and B (rank 600). B has children C (rank 400) and D (rank 200). C has children E (rank 349) and F (rank 1), where F is the leaf node for H(m_750).
– Root = h(h_A || h_B || 1000); h_B = h(h_C || h_D || 600); h_C = h(h_E || h_F || 400); h_F = h(H(m_750) || 1).
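In this example, the position check only needs the ranks of the left siblings on F’s path to the root: E (left of F) and A (left of B). D is a right sibling and contributes nothing:

```latex
\mathrm{LEFT}(F) = \mathrm{rank}(E) + \mathrm{rank}(A) = 349 + 400 = 749
\quad\Longrightarrow\quad F \text{ is the } (749 + 1) = 750\text{th leaf.}
```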
Efficiency Enhancement
• With an MHT, repeated insertions at the same position would lead to a worst-case complexity of O(n).
– Because the tree height keeps increasing.
• Other, more balanced tree structures can directly replace the MHT and keep the worst-case performance at O(log n).
– E.g., a skip list or a B+ tree can be used.
– Homework: check these details by reading the corresponding papers.
Security Analysis
• Our proposed authenticator construction can be proved to be existentially unforgeable.
– Use the fact that the BLS signature is existentially unforgeable.
– By contradiction: if an adversary can forge our authenticator scheme, then we can use the adversary to forge a BLS signature.
– Figure: a simulator runs the adversary, and the adversary’s forgery yields a forged BLS signature that passes verification. Contradiction!
Security Analysis (cont’d)
• The soundness of our storage correctness guarantee is based on the hardness of the Computational Diffie-Hellman (CDH) problem.
– CDH: given g, g^α, h ∈ G for unknown α ∈ Z_p, output h^α.
– By contradiction: if an adversary can pass the verification with a response computed over corrupted data, then we can solve the CDH problem.
– Figure: a simulator uses such an adversary and thus solves CDH. Contradiction!
Probabilistic Guarantee of Random Sampling
• Assume r out of n blocks are corrupted. How many blocks should we randomly sample to detect the corruption with high probability?
• Let X denote the number of corrupted blocks picked by random sampling. Sampling c blocks gives the detection probability
$$P = 1 - P\{X = 0\} = 1 - \prod_{i=0}^{c-1}\left(1 - \min\left\{\frac{r}{n-i}, 1\right\}\right) \ge 1 - \left(\frac{n-r}{n}\right)^{c} = 1 - (1-t)^{c}, \quad \text{where } t = \frac{r}{n}.$$
• If t = 1% of the file is corrupted, randomly sampling a constant c = 460 blocks maintains a detection probability of P = 0.99.
• Error-correcting codes can be used to correct small data errors.
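The formula can be evaluated directly. The sketch below computes the exact detection probability for sampling without replacement, and the smallest c that satisfies the lower bound 1 − (1 − t)^c ≥ P. Under this bound, t = 1% with P = 0.99 needs c = 459, t = 3% needs c = 152, and P = 0.95 needs c = 299. The slides’ 460 and 300 are one block more conservative. The function names are hypothetical.

```python
import math


def detection_probability(n: int, r: int, c: int) -> float:
    """Exact P = 1 - prod_{i=0}^{c-1} (1 - min(r / (n - i), 1)) when sampling
    c distinct blocks out of n, r of which are corrupted (requires c <= n)."""
    miss = 1.0
    for i in range(c):
        miss *= 1.0 - min(r / (n - i), 1.0)
    return 1.0 - miss


def min_samples(t: float, p: float) -> int:
    """Smallest c with 1 - (1 - t)^c >= p, i.e., using the lower bound on P."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - t))


print(min_samples(0.01, 0.99), min_samples(0.03, 0.99), min_samples(0.01, 0.95))
```

Because each factor 1 − r/(n − i) is at most 1 − t, the exact probability is never below the bound. A sample size chosen from `min_samples` is therefore safe for any file size n.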
Performance Evaluation
Table 1: Comparisons with the state of the art.
Metric | Ateniese et al. CCS'07 | Shacham et al. ASIACRYPT'08 | Ateniese et al. SecureComm'08 | Our TPDS’11/ESORICS’09
Data dynamics | No | No | Partially+ | Yes
Server comp. complexity | O(1) | O(1) | O(1) | O(log n)
Owner comp. complexity | O(1) | O(1) | O(1) | O(log n)
Comm. complexity | O(1) | O(1) | O(1) | O(log n)
Owner storage complexity | O(1) | O(1) | O(1) | O(1)
+: The scheme only supports a bounded number of integrity challenges and partial data updates, i.e., data insertion is not supported.
Performance Evaluation (cont’d)
Table 2: Performance comparisons of different instantiations.
• Our experiment is conducted in C on a system with a processor running at 2.4 GHz and 768 MB of RAM.
• Performance is measured for 1 GB of data under data corruption rates t = 1% and 3% while maintaining detection probability P = 0.99, where P ≥ 1 − (1 − t)^c and c is the sample size. The block size of the RSA-based instantiation is chosen to be 4 KB. Note that error-correcting codes can be used to correct small data errors (e.g., t < 1%).
Instantiation | Our BLS-based | Our BLS-based | Our RSA-based | Our RSA-based
System parameters
Data corruption rate t | 1% | 3% | 1% | 3%
Detection probability P | 0.99 | 0.99 | 0.99 | 0.99
Randomly sampled blocks c | 460 | 152 | 460 | 152
Performance results
Server comp. time (ms) | 6.45 | 2.11 | 13.81 | 4.55
Owner comp. time (ms) | 806.01 | 284.17 | 779.10 | 210.47
Comm. cost (KB) | 239 | 80 | 223 | 76
Short Summary
• We explore the problem of cloud storage auditing with data dynamics support.
• We carefully design a new homomorphic authenticator and achieve the goal with a novel sequence-enforced Merkle Hash Tree (sMHT).
• We conduct experiments with both BLS-based and RSA-based instantiations. Extensive security and performance analysis shows that the proposed scheme is provably secure and highly efficient.
Outline
• Cloud Computing Background
• Cloud Data Storage and Security Challenges
• Our Research Efforts and Proposed Designs
– Storage auditing with data dynamics support
– Privacy-preserving public auditing
– Efficiency improvement via batch auditing
• Further Discussion on the Subject
Public Auditing with Third-party Auditor
• Maintaining a storage correctness guarantee demands continuous auditing.
– This brings high computation/communication costs and online burdens for data owners, who may be resource constrained while holding large amounts of data.
• Introduce a third-party auditor (TPA) for correctness evaluation.
– Owners can be worry-free by resorting to the TPA for auditing tasks.
Public Auditing vs. Data Privacy
• The TPA should not learn the content of the data when it audits on behalf of data owners.
– Data owners do not want unauthorized information leakage.
– Legal regulations, e.g., HIPAA, may mandate such protection.
• A privacy-preserving public auditing mechanism is desired.
Revisit Existing Approaches
• μ = v_1m_1 + v_5m_5 + v_6m_6 + v_8m_8 leaks the data to the TPA.
– Direct adoption is unsuitable for public auditing.
– By collecting enough such linear combinations of the same blocks, the TPA can recover all m_i’s by solving a system of linear equations.
• Assume data encryption before outsourcing? NOT satisfying.
– The method is not self-contained; it leaves the problem to key management.
– It is overkill for certain types of data, e.g., libraries, scientific data, …
• Figure: the owner outsources blocks m_i with authenticators σ_i to the cloud server. The TPA sends random positions and coefficients {v_1, v_5, v_6, v_8}, receives μ = v_1m_1 + v_5m_5 + v_6m_6 + v_8m_8 (and σ), and verifies with the public key g^x.
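The leak is easy to demonstrate. In this hypothetical sketch, the TPA repeats the challenge on the same four positions with fresh coefficients, records each unmasked μ, and recovers the blocks by Gauss-Jordan elimination over Z_Q. The modulus 2^127 − 1 (a Mersenne prime) is only a stand-in for the group order p. The random coefficient matrix is assumed to be invertible, which fails only with negligible probability.

```python
import random

Q = 2**127 - 1  # a Mersenne prime, standing in for the group order p


def solve_mod(a, b, q):
    """Solve a @ m = b over Z_q by Gauss-Jordan elimination.
    Raises StopIteration if `a` is singular mod q."""
    n = len(a)
    rows = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] % q)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, q)
        rows[col] = [v * inv % q for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % q for x, y in zip(rows[r], rows[col])]
    return [rows[r][n] for r in range(n)]


rng = random.Random(7)
secret_blocks = [rng.randrange(Q) for _ in range(4)]  # m1, m5, m6, m8

# Four challenges on the same positions with fresh random coefficients; each
# unmasked response is mu_j = sum_i v_{j,i} * m_i mod Q.
coeffs = [[rng.randrange(1, Q) for _ in range(4)] for _ in range(4)]
responses = [sum(v * m for v, m in zip(row, secret_blocks)) % Q for row in coeffs]

recovered = solve_mod(coeffs, responses, Q)
print(recovered == secret_blocks)  # True: the TPA learned every block
```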
Privacy-preserving Public Auditing
• Achieve privacy-preserving auditing regardless of data encryption.
• Construct homomorphic aggregation with random masking.
– The TPA sends random positions and coefficients {v_1, v_5, v_6, v_8}. The server combines the corresponding blocks into μ = v_1m_1 + v_5m_5 + v_6m_6 + v_8m_8 and randomly masks it.
– The TPA verifies the masked μ and σ.
– Random masking must not affect storage correctness validation!
– With a randomly masked μ, the owner’s data content is no longer exposed!
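Privacy-preserving Public Auditing (Cont’d)
• System parameters: a bilinear map $e: G \times G \to G_T$, generators g, u ∈ G, secret key x with public key g^x, a hash-to-point function H(·), and a hash function $h(\cdot): G_T \to \mathbb{Z}_p$.
• The TPA sends random positions and coefficients {v_1, v_5, v_6, v_8}. Let μ' = v_1m_1 + v_5m_5 + v_6m_6 + v_8m_8 denote the unmasked combination. The cloud server then:
1. picks a random r ∈ Z_p;
2. computes $R = e(u, g^{x})^{r}$ and $\gamma = h(R)$;
3. computes μ = r + γμ' mod p and responds with {R, σ, μ}.
• The soundness of our privacy-preserving auditing mechanism can be proved in the random oracle model.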
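The Correctness Elaboration
• Let σ = ∏_i σ_i^{v_i} with σ_i = (H_i · u^{m_i})^x, where H_i is the hashed term of the authenticator (e.g., H(m_i) in the sMHT-based construction). Let μ' be the original (unblinded) combination and μ = r + γμ' mod p the blinded one, with R = e(u, g^x)^r and γ = h(R). Then the TPA’s check holds:
$$R\cdot e(\sigma^{\gamma}, g) = e(u, g^{x})^{r}\cdot e\Big(\big(\textstyle\prod_i H_i^{v_i}\big)^{\gamma}\cdot u^{\gamma\mu'}, g^{x}\Big) = e\Big(\big(\textstyle\prod_i H_i^{v_i}\big)^{\gamma}\cdot u^{\mu}, g^{x}\Big).$$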
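The masking identity can be checked numerically in a toy setting. This is a sketch built on stated assumptions. The pairing is replaced by exponentiation with the secret x in a tiny subgroup, so e(Y, g) becomes Y and e(B, g^x) becomes B^x. This makes the check privately rather than publicly verifiable. In addition, w = u^x stands in for e(u, g^x), `hv` stands in for ∏ H_i^{v_i}, and `oracle`, `server_respond`, and `tpa_verify` are hypothetical names.

```python
import hashlib
import secrets
from typing import Optional

# Toy group (insecure, illustration only): order-Q subgroup of Z_P^*, P = 2Q + 1.
P, Q = 2039, 1019
U = 9  # generator of the order-Q subgroup


def oracle(R: int) -> int:
    """Toy stand-in for the random oracle gamma = h(R), mapping into Z_Q."""
    return int.from_bytes(hashlib.sha256(R.to_bytes(2, "big")).digest(), "big") % Q


def server_respond(mu_prime: int, sigma: int, w: int, r: Optional[int] = None):
    """Blind mu' as mu = r + gamma * mu' mod Q, with R = w^r (w = u^x plays e(u, g^x))."""
    if r is None:
        r = secrets.randbelow(Q)
    R = pow(w, r, P)
    gamma = oracle(R)
    return R, sigma, (r + gamma * mu_prime) % Q


def tpa_verify(x: int, hv: int, R: int, sigma: int, mu: int) -> bool:
    """Toy analogue of R * e(sigma^gamma, g) == e(hv^gamma * u^mu, g^x); needs x."""
    gamma = oracle(R)
    lhs = R * pow(sigma, gamma, P) % P
    rhs = pow(pow(hv, gamma, P) * pow(U, mu, P) % P, x, P)
    return lhs == rhs


x = 77                      # owner's secret key (toy value)
hv = pow(4, 5, P)           # stands in for prod_i H_i^{v_i}
mu_prime = 321              # unmasked combination; never sent to the TPA
sigma = pow(hv * pow(U, mu_prime, P) % P, x, P)  # aggregated authenticator
w = pow(U, x, P)
print(tpa_verify(x, hv, *server_respond(mu_prime, sigma, w)))  # True
```

An honest response verifies for every choice of r. A response built from a different μ' fails unless γ happens to be 0.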
Remarks on Privacy-preserving Auditing
• We have proved that constructing R and γ as γ = h(R) does not affect the security of the storage auditing equation.
• The scheme works under a semi-trusted security model.
– I.e., collusion between the cloud server and the TPA is not considered.
• The scheme can support data dynamics straightforwardly.
– The block index is eliminated from the authenticator.
– The sequence-enforced MHT (sMHT) is used.
• Other privacy-preserving auditing constructions are possible.
Security Analysis
• The privacy-preserving guarantee is proved in the random oracle model using γ = h(R).
– We prove the existence of a simulator that controls the random oracle h(·) and can produce a valid response {R, σ, μ} without knowledge of μ'.
– Assume the simulator is given a valid σ.
1. The simulator randomly picks γ and μ from Z_p.
2. The simulator sets $R = e\big((\prod_i H_i^{v_i})^{\gamma}\cdot u^{\mu}, g^{x}\big) / e(\sigma^{\gamma}, g)$, where H_i is the hashed term in $\sigma_i = (H_i\cdot u^{m_i})^{x}$.
3. The simulator back-patches (sets) γ = h(R), since it controls the random oracle h(·).
• Since the simulator generates a valid response {R, σ, μ} without knowing μ', the TPA learns nothing about μ' from the response {R, σ, μ}.
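The simulator’s three steps translate directly into the same toy setting. As before, the pairing is replaced by exponentiation with x, the parameters are insecure, and the names are hypothetical. In the real proof the simulator needs only public values, and the toy uses x solely because it has no pairing. Note that `simulate` never receives μ', yet its output passes verification once h(R) is programmed to γ.

```python
import secrets

# Toy group (insecure, illustration only): order-Q subgroup of Z_P^*, P = 2Q + 1.
P, Q = 2039, 1019
U = 9


def verify(x: int, hv: int, R: int, sigma: int, mu: int, gamma: int) -> bool:
    """Toy analogue of R * e(sigma^gamma, g) == e(hv^gamma * u^mu, g^x)."""
    return R * pow(sigma, gamma, P) % P == pow(pow(hv, gamma, P) * pow(U, mu, P) % P, x, P)


def simulate(x: int, hv: int, sigma: int):
    """Build a valid {R, mu} from sigma alone; mu' is never used."""
    gamma = secrets.randbelow(Q)                        # step 1: random gamma ...
    mu = secrets.randbelow(Q)                           # ... and random mu
    target = pow(pow(hv, gamma, P) * pow(U, mu, P) % P, x, P)
    R = target * pow(pow(sigma, gamma, P), -1, P) % P   # step 2: solve for R
    programmed_oracle = {R: gamma}                      # step 3: back-patch h(R) := gamma
    return R, mu, programmed_oracle


x, hv = 77, pow(4, 5, P)
sigma = pow(hv * pow(U, 321, P) % P, x, P)  # honest aggregated tag (mu' = 321)
R, mu, programmed = simulate(x, hv, sigma)
print(verify(x, hv, R, sigma, mu, programmed[R]))  # True
```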
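Security Analysis (Cont’d)
• The soundness of our modified auditing mechanism is based on that of the underlying (original) storage auditing mechanism.
– We prove the existence of an extractor that can extract μ' from a valid {R, σ, μ}. Here σ_i = (H_i · u^{m_i})^x.
– The extractor controls the random oracle h(·) and answers the cloud server’s queries for h(R).
1. The extractor answers γ = h(R), and the cloud server outputs a valid {R, σ, μ} such that $R\cdot e(\sigma^{\gamma}, g) = e\big((\prod_i H_i^{v_i})^{\gamma}\cdot u^{\mu}, g^{x}\big)$.
2. The extractor rewinds (resets) the cloud server and returns γ* = h(R) for the query of h(R). The cloud server outputs {R, σ, μ*} such that $R\cdot e(\sigma^{\gamma^*}, g) = e\big((\prod_i H_i^{v_i})^{\gamma^*}\cdot u^{\mu^*}, g^{x}\big)$.
3. By dividing the two equations, the extractor obtains $e(\sigma^{\gamma^*-\gamma}, g) = e\big((\prod_i H_i^{v_i})^{\gamma^*-\gamma}\cdot u^{\mu^*-\mu}, g^{x}\big)$. Setting $\mu' = (\mu^* - \mu)/(\gamma^* - \gamma)$ yields a valid {σ, μ'} for the original storage auditing equation $e(\sigma, g) = e\big(\prod_i H_i^{v_i}\cdot u^{\mu'}, g^{x}\big)$, as in Shacham’s scheme. With a valid {σ, μ'}, the soundness of our auditing scheme follows from existing soundness proofs.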
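Step 3 reduces to modular arithmetic on the two masked responses. The following sketch assumes a prime modulus standing in for p (here 2^127 − 1) and uses hypothetical function names. It shows that two answers sharing the same r but different challenges γ and γ* determine μ' uniquely.

```python
import secrets

Q = 2**127 - 1  # a Mersenne prime standing in for the group order p


def respond(r: int, gamma: int, mu_prime: int) -> int:
    """Masked response mu = r + gamma * mu' mod Q for the committed randomness r."""
    return (r + gamma * mu_prime) % Q


def extract(gamma: int, mu: int, gamma_star: int, mu_star: int) -> int:
    """Recover mu' = (mu* - mu) / (gamma* - gamma) mod Q from two rewound runs."""
    return (mu_star - mu) * pow((gamma_star - gamma) % Q, -1, Q) % Q


r = secrets.randbelow(Q)          # same commitment R in both runs (rewinding)
mu_prime = secrets.randbelow(Q)   # value hidden behind the mask
gamma = secrets.randbelow(Q)
gamma_star = (gamma + 1 + secrets.randbelow(Q - 1)) % Q  # any value != gamma
mu, mu_star = respond(r, gamma, mu_prime), respond(r, gamma_star, mu_prime)
print(extract(gamma, mu, gamma_star, mu_star) == mu_prime)  # True
```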
Cost of Privacy-Preserving Guarantee
Table 3: Performance comparisons with previous work.
• Our experiment is conducted in C on a system with an Intel Core 2 processor running at 1.86 GHz and 2048 MB of RAM.
• Our analysis shows that if the server is missing t = 1% of the data blocks, the TPA only needs to audit c = 460 or 300 randomly chosen blocks to detect this misbehavior with probability P larger than 0.99 or 0.95, respectively.
Scheme | Our INFOCOM’10 | Our INFOCOM’10 | Shacham et al. ASIACRYPT'08 | Shacham et al. ASIACRYPT'08
System parameters
Data corruption rate t | 1% | 1% | 1% | 1%
Detection probability P | 0.99 | 0.95 | 0.99 | 0.95
Randomly sampled blocks c | 460 | 300 | 460 | 300
Performance results
Server comp. time (ms) | 411.00 | 270.20 | 407.66 | 265.87
TPA comp. time (ms) | 507.79 | 476.81 | 504.25 | 472.55
Comm. cost (Byte) | 160 | 160 | 40 | 40
Privacy-preserving | Yes | Yes | No | No
Short Summary
• Enabling public auditing is of critical importance for unified risk evaluation of cloud storage services, but public auditing should not affect the owner’s data privacy.
• We design a public storage auditing scheme that uses a new random-masking construction with homomorphic authenticators.
• The design also supports data dynamics straightforwardly.
• Extensive security and performance analysis shows that the proposed schemes are provably secure and highly efficient.
Outline
• Cloud Computing Background
• Cloud Data Storage and Security Challenges
• Our Research Efforts and Proposed Designs
– Storage auditing with data dynamics support
– Privacy-preserving public auditing
– Efficiency improvement via batch auditing
• Further Discussion on the Subject
Batch Auditing
• The TPA may concurrently handle multiple auditing delegations.
• Individually auditing each task can be tedious and inefficient overall.
• We exploit the algebraic property of the BLS signature and slightly modify the single-owner protocol to support simultaneous auditing (details skipped).
• Figure: owners 1, 2, …, k each outsource blocks m_i with authenticators σ_i to the cloud server. The TPA sends randomly chosen coefficients, e.g., {v_1, v_5, v_6, v_8}. Instead of verifying (μ_1, σ_1), (μ_2, σ_2), …, (μ_k, σ_k) one by one, it verifies μ_1, μ_2, …, μ_k and an aggregated σ in a single equation.
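Recap on Bilinear Pairing
• A bilinear map e: G × G → G_T satisfies e(u^a, v^b) = e(u, v)^{ab} for all u, v ∈ G and a, b ∈ Z_p. It is also non-degenerate and efficiently computable.
Batch Auditing: Efficiency Enhancement Highlight
• Individually, task k (k = 1, …, K) checks $e(\sigma_k, g) = e\big(\prod_i H_{k,i}^{v_{k,i}}\cdot u_k^{\mu_k}, g^{x_k}\big)$, which costs 2 pairings.
• Aggregate K equations into a single one: $e\big(\prod_{k=1}^{K}\sigma_k, g\big) = \prod_{k=1}^{K} e\big(\prod_i H_{k,i}^{v_{k,i}}\cdot u_k^{\mu_k}, g^{x_k}\big)$, which costs K + 1 pairings.
Privacy-preserving Batch Auditing: Efficiency Enhancement Highlight
• Aggregate K equations into a single one: $\prod_{k=1}^{K} R_k\cdot e\big(\prod_{k=1}^{K}\sigma_k^{\gamma_k}, g\big) = \prod_{k=1}^{K} e\big((\prod_i H_{k,i}^{v_{k,i}})^{\gamma_k}\cdot u_k^{\mu_k}, g^{x_k}\big)$.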
Remarks on Batch Auditing
• Aggregating K (K ≥ 2) verification equations into one reduces the number of expensive pairing operations from 2K to K + 1.
– A considerable amount of auditing time can be saved.
• A successful verification means all checked blocks are valid.
– This follows from the security strength of the BLS-based authenticators and the verification equation.
• A failed verification means one or more owners’ data are corrupted.
– A divide-and-conquer approach (binary search) finds the invalid responses.
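The divide-and-conquer search can be sketched independently of the cryptography. Here `batch_ok` is a hypothetical callback standing in for the aggregated verification equation: it returns True iff every response in the sub-batch is valid. The search halves any failing batch until it isolates the individual invalid responses.

```python
from typing import Callable, List, Sequence


def find_invalid(responses: Sequence, batch_ok: Callable[[Sequence], bool]) -> List[int]:
    """Return indices of invalid responses using recursive halving.

    batch_ok(sub) must return True iff every response in `sub` is valid, i.e.,
    it models one run of the aggregated batch verification equation.
    """
    invalid: List[int] = []

    def search(lo: int, hi: int) -> None:
        if batch_ok(responses[lo:hi]):
            return                      # the whole sub-batch verifies
        if hi - lo == 1:
            invalid.append(lo)          # isolated a single failing response
            return
        mid = (lo + hi) // 2
        search(lo, mid)
        search(mid, hi)

    if responses:
        search(0, len(responses))
    return invalid
```

With responses `[True] * 5 + [False] + [True] * 10 + [False]` and `batch_ok=all`, the function returns `[5, 16]`. When invalid responses are rare, most sub-batches pass on the first check, which is why batch auditing stays competitive.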
Batch Auditing Efficiency
Figure: auditing time per task (ms) versus the number of auditing tasks (up to 200), for individual auditing and for batch auditing with c = 460 and c = 300.
• Batch auditing indeed helps reduce the TPA’s computation cost: more than 11% and 14% of the per-task auditing time is saved when c = 460 and c = 300, respectively.
Sorting Out Invalid Responses
• Even when the number of invalid responses exceeds 15% of the total batch size, batch auditing still performs better than individual auditing.
Figure: auditing time per task (ms) versus the fraction of invalid responses α (%), for individual auditing and for batch auditing with c = 460 and c = 300.
Short Summary
• Handling multiple auditing tasks simultaneously (batch auditing) is greatly needed as data are increasingly outsourced to the cloud.
– Individually auditing each data file can be tedious and inefficient.
– Batch auditing improves efficiency and saves computation overhead.
• We leverage the algebraic property of BLS-signature-based homomorphic authenticators to construct correct and secure batch auditing protocols.
• We demonstrate via experiments that the proposed batch auditing schemes outperform individual auditing in terms of per-task auditing time.
Related Publications
• Q. Wang, C. Wang, J. Li, Kui Ren, and W. Lou, "Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud Computing", IEEE Transactions on Parallel and Distributed Systems, Vol. 22, No. 5, pp. 847-859, May 2011. (Also appears in Proc. of ESORICS, 2009, AR = 19%.)
– #1 top-accessed IEEE TPDS article in IEEE Xplore as of December 2011.
• C. Wang, Q. Wang, Kui Ren, and W. Lou, "Privacy-preserving Public Auditing for Data Storage Security in Cloud Computing", IEEE Transactions on Computers, Vol. 62, No. 2, pp. 362-375, 2013. (Also appears in Proc. of IEEE INFOCOM, 2010, AR = 17.5%.)
– #1 top-accessed INFOCOM'10 article in IEEE Xplore as of December 2011.
• C. Wang, Q. Wang, Kui Ren, and W. Lou, "Ensuring Data Storage Security in Cloud Computing", IEEE Transactions on Services Computing, Vol. 5, No. 2, pp. 220-232, 2012. (Also appears in Proc. of IWQoS, 2009.)
• C. Wang, Kui Ren, W. Lou, and J. Li, "Towards Publicly Auditable Secure Cloud Data Storage Services", IEEE Network, Vol. 24, No. 4, pp. 19-24, 2010.
– #2 top-accessed IEEE Network article in IEEE Xplore as of July 2011.
• Kui Ren, C. Wang, and Q. Wang, "Security Challenges for the Public Cloud", IEEE Internet Computing, Vol. 16, No. 1, pp. 69-73, Jan./Feb. 2012. (Invited paper.)
Outline
• Cloud Computing Background
• Cloud Data Storage and Security Challenges
• Our Research Efforts and Proposed Designs
– Storage auditing with data dynamics support
– Privacy-preserving public auditing
– Efficiency improvement via batch auditing
• Further Discussion on the Subject
More Cloud Storage Security Related Topics
• Proofs of data redundancy
• Proofs of data encryption
• Assured deletion
• Proofs of geolocation
• Proofs of ownership vs. deduplication
• More to be identified…
Proofs of Data Redundancy: Challenges on the Physical Layer
• Amazon claims to store three distinct copies of my file for resilience. Can they prove it?
– Auditing won’t do the trick, nor will downloading!
Figure: Alice’s file F is supposedly stored as three copies F, F, F, or is it?
Slide credits to Ari Juels et al.
Virtualization is a complication
• Erasure coding across Disks 1 to 5: “My file can survive two disk crashes!”
• But if the five disks are only virtual and share physical hardware, a single disk crash can destroy my file!
How to Tell if Your Cloud Files Are Vulnerable to Drive Crashes
• Proofs that the tenant’s files can survive drive crashes.
Prove Disk-crash Resilience
• Claim: the file, spread across Disks 1 to 5, can survive two disk crashes!
• The challenge: how can a cloud provider prove that certain bits sit on certain disks?
Motivation and Idea
• Cloud server: “We store 3 copies of your file on 3 different drives. We are 2-fault-tolerant.”
• Pizza store: “We have 2 ovens.”
• How do you know if it’s true?
• Idea: multiple devices can do work in parallel, but a single device can’t.
Example – pizza store
• Assume we know that:
– The pizza store has 2 ovens.
– An oven “usually” takes 5 min to bake a pizza.
– The store is a 15 min drive from here.
• Time needed for 24 pizzas?
– 1 oven: 5 × 24 = 120 min.
– 2 ovens: 60 min.
– Drive time: 15 min.
• Task for the pizza store: “Send me 24 pizzas in 80 min.” With 2 ovens the pizzas arrive after 60 + 15 = 75 min; with 1 oven they arrive after 120 + 15 = 135 min.
• Task for the cloud server: “Send me a block of the file from each drive in xxxx milliseconds.”
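The arithmetic behind the deadline fits in a few lines. This simplified model (hypothetical names) assumes every job takes exactly `unit_time`, devices work in parallel one job at a time, and transport adds a fixed delay. Real drives and networks add variance, which the protocol handles with its time threshold.

```python
import math


def completion_time(items: int, workers: int, unit_time: float, transport: float) -> float:
    """Time to finish `items` one-at-a-time jobs on `workers` parallel devices, plus transport."""
    rounds = math.ceil(items / workers)
    return rounds * unit_time + transport


def passes(items: int, workers: int, unit_time: float, transport: float, deadline: float) -> bool:
    return completion_time(items, workers, unit_time, transport) <= deadline


# 24 pizzas, 5 min per pizza, 15 min drive, 80 min deadline (numbers from the slide).
print(completion_time(24, 2, 5, 15), passes(24, 2, 5, 15, 80))  # 75 True
print(completion_time(24, 1, 5, 15), passes(24, 1, 5, 15, 80))  # 135 False
```

The same model maps onto drives, with items as blocks to read, workers as independent drives, and transport as network latency.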
The Pizza Oven Protocol
• Eeta Pizza Pi orders “Six pizzas!” from Cheapskate Pizza. Figure: an oven failure (marked X) at Cheapskate Pizza disrupts the order.
• Cheapskate now claims it can survive an oven failure! How can Eeta Pizza Pi verify this without visiting?
• Suppose that:
– A pizza oven bakes one pizza at a time and takes 10 minutes.
– The Cheapskate truck takes 15 minutes to deliver to Eeta Pizza Pi.
• Eeta Pizza Pi orders “Six pizzas!” at time T0 and receives them at T1. Is T1 − T0 = 45 min? Two ovens need 3 × 10 + 15 = 45 min, while a single oven needs 6 × 10 + 15 = 75 min.
Protocol Design for Cloud Servers
• Core part: choose the threshold of the time limit.
• Challenges:
– Network latency (cf. pizza delivery traffic time).
– Drive read time (cf. oven baking speed): seek time, throughput, RPM, buffer.
– Make the queries to disks unpredictable.
Network latency
• Ping hosts in Santa Clara and Shanghai from Boston.
• Several strategies to factor in the variability of network latency:
– Latency 1 ≈ Latency 2 if the hosts are geographically close.
– Abort the protocol if the response time exceeds 110% of the average.
• Reduce network-timing variance when bandwidth is limited:
– The server applies a hash function before transmitting.
Drive – read time
• Task: the server reads a block from each drive.
– The block size (the size of each g_i)?
– The time limit for this task?
• Two main factors of drive read time:
– Seek: the disk head moves to the right track and sector.
– Data transfer rate (throughput).
• The drive used in this paper: 3.5 ms seek time and 73 MB/s to 125 MB/s throughput.
Drive – determine the block size
• Seek time depends on the distance the disk head needs to move.
• Throughput depends on the position of the block:
– Outer tracks are faster than inner tracks.
– Sequential data are faster than scattered data.
• Force a seek for EVERY block:
– Use a small block size.
– Query a random pattern of blocks.
Drive – determine time limit
• Recall the two examples:
– Pizza store with 2 ovens: query 24 pizzas (12 steps).
– Cloud server with 3 drives: query 3 blocks (1 step).
• Why use 12 steps instead of 1 step for the pizza store?
– To enlarge the gap between one oven and two ovens.
• How can we play the same trick on the cloud server, i.e., query q steps (c·q blocks)?
– Solution: lock-step, which makes the queries to disks unpredictable.
Lockstep Idea
• Specify query Q in an initial step consisting of c random challenge blocks, one per drive.
• For each subsequent step, the set of c challenge blocks depends on the content of the file blocks accessed in the previous step.
• The server can proceed to the next step only after fully completing the previous one.
• Lock-step ensures security by increasing the number of steps: the more steps, the larger the gap.
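One way to realize the lock-step dependency is to chain a hash over the blocks read so far, so that the next step’s indices cannot be known before the previous reads finish. The sketch below is hypothetical: SHA-256 chaining and the helper names are assumptions, not necessarily the construction used in the paper. It models only the challenge derivation, not the timing measurement.

```python
import hashlib
from typing import List, Sequence


def next_challenge(digest: bytes, drive_sizes: Sequence[int]) -> List[int]:
    """Derive one block index per drive from the running digest."""
    indices = []
    for d, size in enumerate(drive_sizes):
        sub = hashlib.sha256(digest + d.to_bytes(4, "big")).digest()
        indices.append(int.from_bytes(sub, "big") % size)
    return indices


def lockstep_proof(drives: Sequence[Sequence[bytes]], nonce: bytes, steps: int) -> bytes:
    """Run `steps` lock-step rounds; each round reads one block per drive, and the
    next round's indices depend on the contents read in this round."""
    sizes = [len(blocks) for blocks in drives]
    digest = hashlib.sha256(nonce).digest()  # initial step: c random challenges
    for _ in range(steps):
        indices = next_challenge(digest, sizes)
        read = b"".join(drives[d][i] for d, i in enumerate(indices))
        digest = hashlib.sha256(digest + read).digest()
    return digest
```

A verifier holding a copy of the file would recompute `lockstep_proof` with the same nonce. It would accept only if the digests match and the response arrived within the time threshold.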
Figure: the gap, the number of steps, and the time-limit threshold.
Experiments: c = 5 drives; the response-time gap between the honest server’s maximum and the adversary’s minimum.
More Cloud Storage Security Related Topics
• Proofs of data redundancy
• Proofs of data encryption
• Assured deletion
• Proofs of geolocation
• Proofs of ownership vs. deduplication
• More to be identified…
Proofs of Data Encryption: Motivation
• Public cloud has a large attack surface – Thousands of computers – Dozens of storage systems and interfaces
• Amazon alone: S3, EBS, Instance Storage, Glacier, Storage Gateway, CloudFront, RDS, DynamoDB, ElastiCache, CloudSearch, SQS
– Shared resources among thousands of tenants • Many possibilities for accidental data leakage. – Data encryption is a must.
Slides credit to Stefanov et al.
Defending Against Accidental Data Leakage
• Simple view: – Just encrypt your data in the cloud.
– Problem solved?
Defending Against Accidental Data Leakage
• More realistic view: – We often want to use the cloud for more than just raw storage.
– Why? We want to outsource storage AND computation (services).
– In that case, the cloud needs access to your decrypted data.
Encrypt at Rest & Decrypt on the Fly
• Split the cloud into a computation front-end and a storage back-end – Already the case in many clouds (e.g., Amazon, Azure)
• The storage back-end only sees encrypted data. • The computation front-end decrypts data on the fly
– It only accesses the data it really needs at any one time • Can be combined with tight access control and logging.
– Key servers
[Figure: services front end and storage back end, with possible leakage]
Encrypt at Rest & Decrypt on the Fly
• Protects against data leakage by the storage back-end infrastructure.
• Limits the amount of data leaked by the front-end at any one time.
• Common practice. • Much better than no encryption.
[Figure: services front end and storage back end; encrypting data at rest complies with government regulations]
The Problem
• Lack of visibility – Users only see results (e.g., web pages) from the front-end. What is happening internally?
• Download data and check encryption? – The cloud can always just encrypt on the fly.
• Seems impossible!
How can we be reasonably sure that the cloud is encrypting data at rest? Plaintext is simpler for the cloud to manage.
One Proposed Solution
• Impose financial penalties on misbehaving cloud providers.
• We ensure that an economically rational cloud provider encrypts data at rest.
• A misbehaving cloud must use double storage. – It must store both the decrypted and the encrypted file.
Economically motivate the cloud to encrypt data at rest.
One Solution: Hourglass Schemes
[Figure: Original File → (encryption) → Encrypted File → (hourglass) → Encapsulated File. The client uploads the file and assists in the process; it then verifies encryption by periodically challenging random file blocks.]
• The client never needs to permanently store and manage keys.
Intuition
[Figure: Original File → (encryption) → Encrypted File → (hourglass) → Encapsulated File; the client checks the encapsulated file, while an adversarial cloud wants to store only the original file.]
Hourglass property: the encapsulated file is costly to compute “on the fly”.
So an adversarial cloud must store both files.
Double the storage!
Hourglass Framework: More than a Scheme
• Encodings: – Encryption – Watermarking – File Bindings
• Hourglass functions: – Butterfly – Permutation – RSA
Modular Components
Encodings • Encryption: $G = E(F)$ • Watermarking: $G = F \,\|\, \mathrm{Tag}$
– Embed a tag into the file – Tag says that the file is stored on a specific cloud – Tag signed by the cloud – Evidence of data leakage origin.
• File Binding: $G = F_1 \,\|\, F_2 \,\|\, \cdots \,\|\, F_m$ – Combine multiple files into one encoding. – E.g., an embedded license.
Hourglass Functions
• Costly to apply “on the fly” • Impose a resource lower bound on the cloud to compute G → H, and hence F → H
[Figure: Original File F → (encoding, e.g., encryption) → Encrypted File G → (hourglass) → Encapsulated File H]
Hourglass Function: RSA
• The cloud can always recover the plaintext: – Gi = RSA-recoverMessage(Hi) (using the client's public RSA key) – Fi = Decode(Gi)
• Resource bound: computation – Completely infeasible for the cloud: F → H – It doesn't have the RSA signing key to do G → H
[Figure: file F = F1, F2, …, Fn is encoded block by block into G = G1, …, Gn, which is transformed into H = H1, …, Hn]
• Apply the encoding (encryption, watermarking, file binding) to obtain G from F.
• The client computes Hi = RSA-Sign(Gi) using a random RSA private key.
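The RSA hourglass step can be illustrated with textbook RSA. This toy sketch uses tiny primes and no padding, so it is completely insecure; it only shows that the client signs each Gi with a private key it then discards, while the cloud can still recover Gi with the public key. Treating "RSA-Sign" as raw modular exponentiation and all function names are assumptions.

```python
def toy_rsa_keys(p: int = 61, q: int = 53, e: int = 17) -> tuple[int, int, int]:
    """Textbook RSA key generation with tiny primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return n, e, d


def hourglass_encapsulate(g_blocks: list[int], n: int, d: int) -> list[int]:
    """Client side: Hi = RSA-Sign(Gi) = Gi^d mod n, block by block."""
    return [pow(g, d, n) for g in g_blocks]


def recover_encoded(h_blocks: list[int], n: int, e: int) -> list[int]:
    """Cloud side: Gi = Hi^e mod n using only the public key (n, e)."""
    return [pow(h, e, n) for h in h_blocks]


n, e, d = toy_rsa_keys()
G = [65, 1000, 42]          # encoded blocks, each smaller than n
H = hourglass_encapsulate(G, n, d)
del d                       # the client discards the private key
print(recover_encoded(H, n, e) == G)  # True
```

Because the private exponent is gone, nobody (including the cloud) can recompute H from G, so a cloud that kept only F would be unable to answer challenges on H.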
Hourglass Function: Permutation
• The client later challenges the cloud for sequential ranges of H. – Sequential range in H → Random blocks in G → Random blocks in F
• Resource bound: disk seeks – A misbehaving cloud (that only stores F) will need to do many random accesses to respond to a challenge.
[Figure: file F = F1, …, Fn is encoded into G = G1, …, Gn, whose blocks are permuted to form H = H1, …, Hn]
• Apply the encoding (encryption, watermarking, file binding) to obtain G from F.
• Randomly permute the blocks of G to form H. No cryptographic operations. Operates on tiny blocks.
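The permutation hourglass can be sketched as a shuffle of block positions. In this illustration Python's `random.Random` stands in for the keyed permutation a real deployment would use, which is an assumption since it is not cryptographic; function names are hypothetical.

```python
import random


def permute_blocks(g_blocks: list[bytes], seed: int) -> tuple[list[bytes], list[int]]:
    """Form H by permuting G: H[k] = G[perm[k]].

    random.Random is NOT a cryptographic PRP; it stands in for whatever
    keyed permutation a real deployment would use (assumption).
    """
    perm = list(range(len(g_blocks)))
    random.Random(seed).shuffle(perm)
    return [g_blocks[k] for k in perm], perm


def g_indices_for_range(perm: list[int], start: int, length: int) -> list[int]:
    """Which G (and hence F) blocks a cloud storing only F must fetch."""
    return [perm[k] for k in range(start, start + length)]


G = [f"G{i}".encode() for i in range(1, 17)]
H, perm = permute_blocks(G, seed=2012)
print(g_indices_for_range(perm, start=4, length=4))  # scattered positions in G
```

An honest cloud storing H answers a sequential challenge with one seek and a contiguous read; a cloud storing only F must seek to each scattered position, which is the resource bound on the slide.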
Hourglass Function: Butterfly
[Figure: butterfly network over blocks G1 … G8]
• w = a known-key PRP (pseudorandom permutation) applied to a pair of file blocks
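The butterfly wiring can be sketched as log2(n) levels, each applying w to disjoint pairs of blocks. The comparison of hourglass functions lists AES operations for this construction; since Python's standard library has no AES, this sketch substitutes an unkeyed 4-round Feistel network built from SHA-256 as the known-key permutation w. Treat it as an illustration of the network structure, not a vetted PRP; the block size and names are assumptions.

```python
import hashlib

BLOCK = 16  # bytes per block (assumption for this sketch)


def _round_fn(r: int, half: bytes) -> bytes:
    return hashlib.sha256(bytes([r]) + half).digest()[:BLOCK]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def w(a: bytes, b: bytes, rounds: int = 4) -> tuple[bytes, bytes]:
    """Known-key permutation on a pair of blocks: an unkeyed Feistel network."""
    for r in range(rounds):
        a, b = b, _xor(a, _round_fn(r, b))
    return a, b


def w_inv(a: bytes, b: bytes, rounds: int = 4) -> tuple[bytes, bytes]:
    """Inverse of w: undo the Feistel rounds in reverse order."""
    for r in reversed(range(rounds)):
        a, b = _xor(b, _round_fn(r, a)), a
    return a, b


def butterfly(blocks: list[bytes], inverse: bool = False) -> list[bytes]:
    """Apply w across log2(n) butterfly levels; level j pairs block i with i + j."""
    n = len(blocks)
    if n == 0 or n & (n - 1):
        raise ValueError("number of blocks must be a power of two")
    levels = []
    j = 1
    while j < n:
        levels.append(j)
        j <<= 1
    out = list(blocks)
    for j in (reversed(levels) if inverse else levels):
        for i in range(n):
            if i & j == 0:  # i is the lower index of the pair (i, i + j)
                pair = (out[i], out[i + j])
                out[i], out[i + j] = w_inv(*pair) if inverse else w(*pair)
    return out


G = [bytes([k]) * BLOCK for k in range(8)]  # G1 … G8
H = butterfly(G)
print(butterfly(H, inverse=True) == G)      # True
```

After log2(n) levels every output block depends on every input block, so a cloud that stores only G cannot produce a single block of H without touching the whole file.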
Comparison of Hourglass Functions
(from less practical with fewer assumptions to more practical with more assumptions)
• RSA: RSA exponentiations – Assumption: RSA assumptions
• Butterfly: AES operations – Assumption: storage speed
• Permutation: random memory accesses – Assumption: seek inefficiency in rotational drives
Experiments were run on Amazon EC2 (using a quadruple-extra-large high-memory instance and EBS storage).
Comparison of Hourglass Functions
Challenge-Response Protocol • The client challenges the cloud for blocks of the encapsulated file H. – At random, unpredictable times
– Few challenges, e.g., O(log n) • Cloud must respond quickly.
• Doable by an external auditor. – Auditor doesn’t see the plaintext F.
[Figure: encapsulated file H = H1, H2, …, Hn]
Limitations
• Assumes files are not accessed too often. – Great for archiving files.
• File updates are costly. – The RSA hourglass function allows for updates. – Other hourglass functions must be reapplied to the entire file.
• Works mainly for large files.
Assured Data Deletion: Motivation • After outsourcing, can we reliably remove data from the cloud? – We don't want backups to exist after a predefined time
• e.g., to avoid future exposure due to a data breach or mismanagement by operators
– If an employee quits, we want to remove his/her data • e.g., to avoid legal liability
• Cloud makes backup copies. We don’t know if all backup copies are reliably removed.
• We need assured deletion: – Data becomes inaccessible upon a request for deletion
Slides credit to Patrick Lee et al.
One Solution: FADE (SecureComm'10) • FADE: an overlay cloud storage system with file assured deletion
[Figure: FADE architecture: data owners run FADE, which works with a key manager and stores each encrypted file together with a metadata file on the cloud]
• FADE decouples key management from data management • The key manager can be flexibly deployed in another trusted third party, or within the data owner • No implementation changes on the cloud
Threat Models and Assumptions
• File assured deletion is achieved – If we request to delete a file, it is inaccessible
• The key manager is minimally trusted – It can reliably remove keys of revoked policies – It can be compromised, but only files with active policies can be recovered
• Data owner forms an authen7cated channel with key manager for key management opera7ons
Policy-based File Assured Deletion
• Each file is associated with a data key and a file access policy
• Each policy is associated with a control key • All control keys are maintained by a key manager • When a policy is revoked, its respective control key will be removed from the key manager
Policy-based File Assured Deletion
• Main idea: – File protected with data key – Data key protected with control key
[Figure: the file is encrypted with the data key; the data key is encrypted with the control key, which is maintained by the key manager]
Policy-based File Assured Deletion
• When a policy is revoked, its control key is removed. The encrypted data key, and hence the encrypted file, can no longer be recovered.
• The file is thus deleted: even if a copy exists, it is encrypted and inaccessible to everyone
[Figure: without the control key, the data key, and hence the file, cannot be recovered]
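The two-layer key structure above can be sketched in a few lines. This is a toy under loud assumptions: the cipher is a SHA-256 keystream XOR with no authentication, the key manager hands out a symmetric control key directly, and all names are hypothetical. The published FADE design performs the key operations differently (with public-key cryptography), so treat this only as an illustration of why removing the control key makes every copy unreadable.

```python
import hashlib
import secrets


def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with SHA-256(key || nonce || counter); same call decrypts.

    No authentication: fine for a demo, not for real use.
    """
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(x ^ y for x, y in zip(data[start:start + 32], pad))
    return bytes(out)


class KeyManager:
    """Hypothetical key manager: one control key per policy."""

    def __init__(self) -> None:
        self._control_keys: dict[str, bytes] = {}

    def create_policy(self, policy: str) -> None:
        self._control_keys[policy] = secrets.token_bytes(32)

    def revoke_policy(self, policy: str) -> None:
        del self._control_keys[policy]  # the control key is gone for good

    def control_key(self, policy: str) -> bytes:
        return self._control_keys[policy]  # raises KeyError once revoked


def upload(km: KeyManager, policy: str, plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    nonce = secrets.token_bytes(16)
    data_key = secrets.token_bytes(32)  # fresh per file, never stored in the clear
    encrypted_file = toy_xor_cipher(data_key, nonce, plaintext)
    encrypted_data_key = toy_xor_cipher(km.control_key(policy), nonce, data_key)
    return nonce, encrypted_file, encrypted_data_key  # all three go to the cloud


def download(km: KeyManager, policy: str, nonce: bytes, encrypted_file: bytes,
             encrypted_data_key: bytes) -> bytes:
    data_key = toy_xor_cipher(km.control_key(policy), nonce, encrypted_data_key)
    return toy_xor_cipher(data_key, nonce, encrypted_file)


km = KeyManager()
km.create_policy("project-x")
nonce, blob, wrapped_key = upload(km, "project-x", b"quarterly report")
print(download(km, "project-x", nonce, blob, wrapped_key))  # b'quarterly report'
km.revoke_policy("project-x")
# Any remaining copy of (nonce, blob, wrapped_key) is now unreadable:
# download(km, "project-x", nonce, blob, wrapped_key) raises KeyError.
```

Deleting one small control key at the key manager stands in for tracking down every backup copy of the file in the cloud.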
Proofs of Geolocation of Data
• The motivation comes from regulatory compliance. – Many laws require storage providers to keep customer data within, say, national boundaries
• One open problem is the remote verification of the geographical location of cloud data. – Of particular commercial interest
Proofs of Geolocation of Data
• Given the challenge of ensuring that data is not duplicated, any solution probably requires – a trusted data-management system, e.g., via trusted hardware
– localizing the pieces of the above system.
• A promising exploration direction – Geolocation of trusted hardware via remote timing from trusted anchor points.
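The timing idea rests on a physical bound: a reply cannot travel faster than light, so a round-trip time measured from a trusted anchor caps the distance to the hardware. This sketch assumes signals cover roughly 200 km per millisecond in optical fiber and that any known processing delay can be subtracted; real routes are indirect, so the bound is loose, and the names are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per ms (approximation)


def max_distance_km(rtt_ms: float, processing_ms: float = 0.0) -> float:
    """Upper bound on anchor-to-device distance implied by a measured round trip.

    Signals cannot travel faster than light in fiber, so half of the round-trip
    time (minus any known processing delay) bounds the one-way distance.
    """
    one_way_ms = max(rtt_ms - processing_ms, 0.0) / 2
    return one_way_ms * FIBER_KM_PER_MS


print(max_distance_km(10.0))                     # 1000.0
print(max_distance_km(10.0, processing_ms=4.0))  # 600.0
```

Several anchors at known locations each yield such a radius, and the device must lie in the intersection of those discs.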
Attacks and Motivations
• Many cloud storage providers deduplicate the files that their users have stored online. – They usually use a file hash to detect duplicates and keep a single copy of the original file
– This saves storage and bandwidth costs
• An adversary can simply leverage a file hash to become one of the file's owners.
Attacks and Motivations
• The data owner uploads File1 to the cloud; the cloud stores File1 and its hash, hash1.
• The cloud uses hash1 to detect future upload requests of File1.
• An adversary requests to upload File1, presenting only its hash, hash1.
• Using just the simple file hash, the adversary becomes one of the owners of File1.
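The attack above can be reproduced against a deliberately naive server. This is a hypothetical sketch of the flawed design (deduplication keyed on a SHA-256 hash that the client supplies), not any particular provider's implementation.

```python
import hashlib
from typing import Optional


class NaiveDedupStore:
    """Hypothetical server that deduplicates by file hash alone (the flawed design)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}      # hash -> the single stored copy
        self.owners: dict[str, set[str]] = {}  # hash -> users who "own" the file

    def upload(self, user: str, digest: str, content: Optional[bytes] = None) -> str:
        if digest in self.blobs:  # duplicate detected: no upload needed
            self.owners[digest].add(user)
            return "deduplicated"
        if content is None or hashlib.sha256(content).hexdigest() != digest:
            raise ValueError("new files must be uploaded with matching content")
        self.blobs[digest] = content
        self.owners[digest] = {user}
        return "stored"

    def download(self, user: str, digest: str) -> bytes:
        if user not in self.owners.get(digest, set()):
            raise PermissionError(f"{user} does not own {digest}")
        return self.blobs[digest]


store = NaiveDedupStore()
file1 = b"confidential payroll data"
hash1 = hashlib.sha256(file1).hexdigest()

print(store.upload("owner", hash1, file1))          # stored
print(store.upload("adversary", hash1))             # deduplicated: hash1 alone was enough
print(store.download("adversary", hash1) == file1)  # True
```

The hash is a short, often leaked identifier rather than evidence of possession, which is the gap that proofs of ownership close.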
Proofs of Ownership (POW)
• POW is not a proof of storage – No preprocessing step – The client has less computing power and storage space
• The basic idea: – The server challenges the client – The client has to prove that it has the file – A client that does not have the file can convince the server otherwise only with negligible probability
Solution Highlight
• Solution 1: Proofs of a random portion of the file – Use a Merkle Hash Tree (MHT) over the file
• The client sends the root of the MHT, built over the blocks of the file • The server asks for random leaves to verify
– If the file has low entropy, first encode the file with an erasure code • to enlarge the unknown portion of the file, making it less predictable
• Solution 2: Proofs of a random portion of a summary of the file – Assume a buffer bounded by the user's memory size – Build the MHT over the buffer only
• Other advanced solutions have also been proposed
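Solution 1 can be sketched with a plain Merkle Hash Tree. This minimal illustration assumes the server kept the root when the file was first uploaded and that a claimant answers each challenged leaf with the block and its sibling path; the erasure-coding step and Solution 2's buffer are omitted, and all names are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """All MHT levels, leaves first; an odd last node is paired with itself."""
    levels = [[_h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([_h(cur[i] + (cur[i + 1] if i + 1 < len(cur) else cur[i]))
                       for i in range(0, len(cur), 2)])
    return levels


def auth_path(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes from leaf `index` up to (not including) the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return path


def verify_leaf(root: bytes, block: bytes, index: int, path: list[bytes]) -> bool:
    node = _h(block)
    for sibling in path:
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root


# Server side: the root is stored when the file is first uploaded.
file_blocks = [bytes([i]) * 64 for i in range(10)]
root = build_levels(file_blocks)[-1][0]

# Challenge: random leaf indices; the claimant answers from its own copy.
challenge = [secrets.randbelow(len(file_blocks)) for _ in range(3)]
claimant_levels = build_levels(file_blocks)
answers = [(i, file_blocks[i], auth_path(claimant_levels, i)) for i in challenge]
print(all(verify_leaf(root, blk, i, p) for i, blk, p in answers))  # True
```

The server only needs the 32-byte root to check any leaf, and a claimant holding just the hash of the whole file cannot produce the challenged blocks.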
To learn more
• K. Bowers, M. van Dijk, A. Juels, A. Oprea, and R. Rivest. How to Tell if Your Cloud Files Are Vulnerable to Drive Crashes. In Proc. of CCS, 2011.
• M. van Dijk, A. Juels, A. Oprea, R. Rivest, E. Stefanov, and N. Triandopoulos. Hourglass Schemes: How to Prove that Cloud Files Are Encrypted. In Proc. of CCS, 2012.
• Y. Tang, P. P. C. Lee, J. C. S. Lui, and R. Perlman. Secure Overlay Cloud Storage with Access Control and Assured Deletion. IEEE TDSC, vol. 9, no. 6, pp. 903-916, 2012.
• A. Juels and A. Oprea. New Approaches to Security and Availability for Cloud Data. Commun. ACM, 56(2): 64-73, 2013.
• S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg. Proofs of Ownership in Remote Storage Systems. In Proc. of CCS, 2011.
To learn even more
• A. Juels and B. Kaliski. PORs: Proofs of Retrievability for Large Files. In Proc. of CCS, 2007.
• K. D. Bowers, A. Juels, and A. Oprea. HAIL: A High-Availability and Integrity Layer for Cloud Storage. In Proc. of CCS, 2009.
• K. Bowers, A. Juels, and A. Oprea. Proofs of Retrievability: Theory and Implementation. In Proc. of CCSW, 2009.
• G. Ateniese, S. Kamara, and J. Katz. Proofs of Storage from Homomorphic Identification Protocols. In Proc. of ASIACRYPT, 2009, pp. 319-333.
• Y. Dodis, S. Vadhan, and D. Wichs. Proofs of Retrievability via Hardness Amplification. In Proc. of TCC, 2009, pp. 109-127.
• G. Ateniese et al. Remote Data Checking Using Provable Data Possession. ACM Trans. Inf. Syst. Secur., 14(1): 12, 2011.
• H. Shacham and B. Waters. Compact Proofs of Retrievability. J. Cryptology, 26(3): 442-483, 2013.