Secure Data Outsourcing

Uploaded by: maegan

Posted on 02-Feb-2016


Page 1: Secure Data Outsourcing

Secure Data Outsourcing

Page 2: Secure Data Outsourcing

Outline
- Motivation
- Background
- Research issues
- Summary

Page 3: Secure Data Outsourcing

Motivation
- Cost of maintaining/mining large data: 4-5 times the cost of data acquisition; DBAs are paid well
- More and more data service providers
  - Low cost: cloud computing
  - Maintain one database for one user, or for multiple users
  - Examples: Alentus.com, Datapipe.com, Discountasp.net, …
- Concerns about data security and privacy: untrusted service provider

Page 4: Secure Data Outsourcing

Untrusted service provider
- Lazy: incentives to perform less work
- Curious: incentives to acquire information
- Malicious: denial of service, incorrect results
- Possibly compromised

Page 5: Secure Data Outsourcing

Challenges
- Data confidentiality
  - Do data need to be encrypted? What is the utility of the protected data?
    - Query utility
    - Mining utility
  - Access pattern privacy
- Integrity
  - Data integrity
  - Query integrity: results must be correct, complete, and fresh

Page 6: Secure Data Outsourcing

Why is it hard for query services?
- Arbitrary expressivity: SQL statements
  - Often restricted to certain types of query for simplicity (e.g., range query, kNN query)
- Cost
  - Communication
  - Computation (server side vs. client side)

Page 7: Secure Data Outsourcing

Why is it hard for mining services?
- Many data mining models
- Different utilities to preserve
- No one-size-fits-all solution

Page 8: Secure Data Outsourcing

Data confidentiality
- Bucketization method (crypto-index)
- Order preserving encryption
- Perturbations

Page 9: Secure Data Outsourcing

Bucketization method: Hacigumus (SIGMOD02)

Page 10: Secure Data Outsourcing

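Main steps
- Partition the sensitive attributes and assign an ID to each partition
  - Order-preserving IDs: support comparison
  - Random IDs: query rewriting becomes hard
- Build an index on the partitions
- Rewrite queries to target partitions, e.g., if 'john doe' falls into partition 105, the condition name='john doe' becomes: Select * from T' where name=105
- Execute the queries on the server and return the results
- Prune/post-process the results on the client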
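The main steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Hacigumus et al.'s implementation: the names (`make_buckets`, `rewrite_range`, `server_select`, `range_query`) are hypothetical, the buckets are equi-width partitions of a single numeric attribute `age` with random IDs, and `toy_encrypt` is a SHA-256 counter-mode XOR stream used only as a stand-in for a real cipher.

```python
import hashlib
import json
import os
import random

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # SHA-256 in counter mode: a TOY stream cipher standing in for real encryption
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def toy_encrypt(key: bytes, record: dict) -> tuple:
    data = json.dumps(record).encode()
    nonce = os.urandom(16)
    return nonce, bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))

def toy_decrypt(key: bytes, etuple: tuple) -> dict:
    nonce, ct = etuple
    return json.loads(bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct)))))

def make_buckets(lo: int, hi: int, width: int, rng: random.Random) -> list:
    # equi-width partitions [start, end) of the attribute domain, each with a random ID
    starts = list(range(lo, hi, width))
    ids = rng.sample(range(100, 1000), len(starts))
    return [(s, s + width, bid) for s, bid in zip(starts, ids)]

def bucket_of(buckets: list, value: int) -> int:
    for start, end, bid in buckets:
        if start <= value < end:
            return bid
    raise ValueError(f"value {value} outside the bucketized domain")

def rewrite_range(buckets: list, a: int, b: int) -> set:
    # the client maps "a <= age <= b" to the IDs of all buckets overlapping [a, b]
    return {bid for start, end, bid in buckets if start <= b and end > a}

# client uploads one (bucket ID, encrypted tuple) pair per row
rng = random.Random(7)
key = os.urandom(32)
buckets = make_buckets(0, 100, 20, rng)
rows = [{"name": "john doe", "age": 34}, {"name": "jane roe", "age": 41},
        {"name": "bob", "age": 22}, {"name": "ann", "age": 67}]
server_table = [(bucket_of(buckets, r["age"]), toy_encrypt(key, r)) for r in rows]

def server_select(table: list, ids: set) -> list:
    # the server sees only bucket IDs and opaque ciphertexts
    return [etuple for bid, etuple in table if bid in ids]

def range_query(a: int, b: int) -> tuple:
    ids = rewrite_range(buckets, a, b)
    candidates = [toy_decrypt(key, e) for e in server_select(server_table, ids)]
    # client-side post-processing prunes false positives from partially matching buckets
    return candidates, [r for r in candidates if a <= r["age"] <= b]

candidates, result = range_query(30, 40)
print(len(candidates), [r["name"] for r in result])   # 3 ['john doe']
```

The server returns three candidate rows (buckets [20, 40) and [40, 60) both overlap the query), and only the client, after decryption, can discard the two that do not match.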

Page 11: Secure Data Outsourcing

Trade-off between confidentiality and overhead: larger partitions give increased privacy, but also increased overhead, since more non-matching records are returned and must be filtered on the client
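One way to see the trade-off is to count how many records the server returns for a fixed range query as the partition width grows. The data set (one record per value 0..99), the query 30 <= x <= 40, and the function name `returned_records` are illustrative assumptions; bucket width stands in for privacy (more distinct values hidden per bucket), and the returned/true ratio stands in for overhead.

```python
def returned_records(values, width, a, b):
    # a record is returned when its bucket [k*width, (k+1)*width) overlaps [a, b]
    return [v for v in values if (v // width) * width <= b and (v // width + 1) * width > a]

values = list(range(100))                     # hypothetical column: one record per value 0..99
a, b = 30, 40                                 # query: 30 <= x <= 40
true_hits = sum(a <= v <= b for v in values)  # 11 records really match
for width in (5, 10, 25, 50):
    got = len(returned_records(values, width, a, b))
    # wider buckets hide more values each, but ship more non-matching records
    print(f"width={width:2d} returned={got:2d} overhead={got / true_hits:.2f}x")
```

With these inputs the server returns 15, 20, 25 and 50 records for widths 5, 10, 25 and 50, against 11 true matches.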

Page 12: Secure Data Outsourcing

Order preserving encryption (Agrawal 2004, Boldyreva 2009)
- The data set is securely transformed so that the order is preserved but the distribution and domain are changed
- Benefit: indexing/searching directly on OPE-encrypted data
- Weakness: once the original distribution is known, OPE is broken
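The toy order-preserving scheme below illustrates the benefit: because ciphertexts keep the plaintext order, the server can keep them in a sorted index and answer range queries by binary search. It is a hypothetical construction (a secret table of cumulative random gaps over a small integer domain), not the Agrawal 2004 or Boldyreva 2009 scheme, and it offers no real security.

```python
import random
from bisect import bisect_left, bisect_right

def ope_keygen(domain_size: int, seed: int) -> list:
    # secret key: a strictly increasing table, enc(x) = sum of random positive gaps up to x
    rng = random.Random(seed)
    table, c = [], 0
    for _ in range(domain_size):
        c += rng.randint(1, 100)
        table.append(c)
    return table

def ope_encrypt(table: list, x: int) -> int:
    return table[x]

def ope_decrypt(table: list, c: int) -> int:
    i = bisect_left(table, c)
    if i == len(table) or table[i] != c:
        raise ValueError("not a valid ciphertext")
    return i

key = ope_keygen(100, seed=42)
data = [3, 17, 17, 42, 58, 90]
server_index = sorted(ope_encrypt(key, x) for x in data)   # the server indexes ciphertexts directly

def range_query(a: int, b: int) -> list:
    lo, hi = ope_encrypt(key, a), ope_encrypt(key, b)      # the client encrypts only the bounds
    hits = server_index[bisect_left(server_index, lo):bisect_right(server_index, hi)]
    return [ope_decrypt(key, c) for c in hits]

print(range_query(10, 50))   # [17, 17, 42]
```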

Page 13: Secure Data Outsourcing

Not attribute-wise order preserving: order preserving encryption (OPE, Agrawal et al. 2004) is not resilient to distribution-based attacks

[Figure: the known original distribution of Xi is matched against the OPE-transformed distribution of Xi' using bucket-based estimation]
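The distribution-based attack can be illustrated with the same kind of toy OPE table. Here we assume the strongest form of background knowledge, the exact sorted multiset of the original values (for example, from a public histogram); because OPE preserves ranks, matching ranks then recovers every value. With only an estimated distribution, as in bucket-based estimation, recovery would be approximate rather than exact. All names are hypothetical.

```python
import random

def ope_keygen(domain_size: int, seed: int) -> list:
    # toy OPE key: strictly increasing table of cumulative random gaps (hypothetical scheme)
    rng = random.Random(seed)
    table, c = [], 0
    for _ in range(domain_size):
        c += rng.randint(1, 100)
        table.append(c)
    return table

rng = random.Random(1)
plain = [min(99, max(0, round(rng.gauss(50, 15)))) for _ in range(500)]   # original column X_i
table = ope_keygen(100, seed=2024)
cipher = [table[x] for x in plain]                                         # what the server stores: X_i'

# assumed attacker knowledge: the original distribution, as a sorted multiset of values
known_sorted = sorted(plain)

def rank_matching_attack(ciphertexts: list, known_sorted_plain: list) -> list:
    # OPE keeps ranks, so the r-th smallest ciphertext decrypts to the r-th smallest plaintext
    order = sorted(range(len(ciphertexts)), key=lambda i: ciphertexts[i])
    guess = [0] * len(ciphertexts)
    for rank, i in enumerate(order):
        guess[i] = known_sorted_plain[rank]
    return guess

guess = rank_matching_attack(cipher, known_sorted)
print(sum(g == p for g, p in zip(guess, plain)), "of", len(plain), "values recovered")
```

The attacker never touches the key: the order of the ciphertexts plus the known distribution is enough.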

Page 14: Secure Data Outsourcing

Data perturbation
Definition:
1. Randomly change the original data
2. The attacker cannot effectively recover the original data
3. The desired properties are preserved
Techniques:
- Single dimension: noise addition
- Multidimensional: geometric perturbation, random projection, RASP (random space perturbation)

Page 15: Secure Data Outsourcing

Noise addition: Y = X + R
- X: original data column; R: random noise (its distribution is published); Y: published data
- Application in data mining: reconstructing the column distribution (Rakesh Agrawal, SIGMOD 2000); applied to privacy-preserving decision trees and naïve Bayes classifiers
- Attacks: spectral filtering (Kargupta, ICDM 2004), PCA reconstruction (Huang, SIGMOD 2005)
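Because the distribution of R is published and R is independent of X, aggregate properties of X can be estimated from Y alone. The sketch below recovers only the mean and variance, using E[X] = E[Y] - E[R] and Var[X] = Var[Y] - Var[R]; Agrawal and Srikant's SIGMOD 2000 method reconstructs the whole distribution with an iterative Bayesian procedure, which is not shown here. The uniform X and the Gaussian noise with SIGMA = 20 are assumptions.

```python
import random

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
n = 100_000
SIGMA = 20.0                                   # published noise distribution: R ~ N(0, SIGMA^2)
X = [rng.uniform(0, 100) for _ in range(n)]    # original column, never released
Y = [x + rng.gauss(0, SIGMA) for x in X]       # released column: Y = X + R

# R is independent of X with a known distribution, so
# E[X] = E[Y] - E[R] = E[Y] and Var[X] = Var[Y] - Var[R] = Var[Y] - SIGMA^2
m_y, v_y = mean_var(Y)
est_mean, est_var = m_y, v_y - SIGMA ** 2
true_mean, true_var = mean_var(X)
print(f"mean: {est_mean:.1f} (true {true_mean:.1f}); variance: {est_var:.0f} (true {true_var:.0f})")
```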

Page 16: Secure Data Outsourcing

Multiplicative perturbations
- Geometric data perturbation for outsourced data mining
- Random projection
- RASP perturbation for query services (range query, kNN query)

Page 17: Secure Data Outsourcing

Perturbation-based framework

[Figure: mining service]

Page 18: Secure Data Outsourcing

Geometric data perturbation: Y = RX + T + D
- R: secret rotation matrix (preserves Euclidean distances); T: secret random translation matrix; D: secret random noise matrix
- Distances are approximately preserved (because of D)
- Resilient to most attacks on rotation perturbation
- Applications: outsourced privacy-preserving data mining; applicable to many classification and clustering algorithms
- Attacks: population-based attacks (when the covariance matrix is revealed)
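A minimal sketch of Y = RX + T + D in plain Python, assuming the records are the columns of X (stored here as a list of k-dimensional lists), R is a random orthogonal matrix obtained by Gram-Schmidt on Gaussian vectors, T adds the same secret vector t to every record, and D is i.i.d. Gaussian noise. It checks that pairwise Euclidean distances are preserved exactly (up to floating-point error) without noise and approximately with it. Function names are hypothetical.

```python
import math
import random

def random_rotation(k, rng):
    # Gram-Schmidt on Gaussian vectors: the rows come out orthonormal, so R is orthogonal
    rows = []
    while len(rows) < k:
        v = [rng.gauss(0, 1) for _ in range(k)]
        for r in rows:
            p = sum(a * b for a, b in zip(v, r))
            v = [a - p * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def geometric_perturb(records, R, t, sigma, rng):
    # y = R x + t + d for every record x
    out = []
    for x in records:
        rx = [sum(a * b for a, b in zip(row, x)) for row in R]
        out.append([r + ti + rng.gauss(0, sigma) for r, ti in zip(rx, t)])
    return out

def max_distance_change(X, Y):
    n = len(X)
    return max(abs(math.dist(X[i], X[j]) - math.dist(Y[i], Y[j]))
               for i in range(n) for j in range(i + 1, n))

rng = random.Random(3)
k = 3
X = [[rng.uniform(0, 10) for _ in range(k)] for _ in range(10)]
R = random_rotation(k, rng)                      # secret rotation
t = [rng.uniform(-50, 50) for _ in range(k)]     # secret translation, same for every record
Y_exact = geometric_perturb(X, R, t, 0.0, rng)   # D = 0
Y_noisy = geometric_perturb(X, R, t, 0.01, rng)  # small secret noise D
print(max_distance_change(X, Y_exact), max_distance_change(X, Y_noisy))
```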

Page 19: Secure Data Outsourcing

Random projection: Y = AX + D
- A: random projection matrix, e.g., with entries drawn from N(0,1)
- Distances are approximately preserved
- Applications: many classification and clustering algorithms; worse accuracy than geometric perturbation
- Good for sparse high-dimensional data (text data), i.e., sketch methods (A is randomly generated for EACH record)
- Attacks: possibly more resilient than the other two perturbation methods, but utility (distance) is not well preserved
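Random projection can be sketched in the same style. Assumptions: A has i.i.d. N(0,1) entries scaled by 1/sqrt(m) (the usual scaling so that squared norms are preserved in expectation), a single matrix A is shared by all records here for simplicity (unlike the per-record matrices of sketch methods), D is omitted (sigma = 0), and the data are synthetic sparse vectors. Unlike an orthogonal rotation, the projection preserves distances only approximately, in line with the accuracy remark above.

```python
import math
import random

def random_projection(records, m, sigma, rng):
    # A: m x d matrix with i.i.d. N(0, 1) entries scaled by 1/sqrt(m); D: Gaussian noise, std sigma
    d = len(records[0])
    A = [[rng.gauss(0, 1) / math.sqrt(m) for _ in range(d)] for _ in range(m)]
    return [[sum(a * b for a, b in zip(row, x)) + rng.gauss(0, sigma) for row in A]
            for x in records]

rng = random.Random(5)
d, m, n = 500, 100, 10
# synthetic sparse records: about 5% of the 500 coordinates are non-zero
X = [[rng.random() if rng.random() < 0.05 else 0.0 for _ in range(d)] for _ in range(n)]
Y = random_projection(X, m, 0.0, rng)
ratios = [math.dist(Y[i], Y[j]) / math.dist(X[i], X[j])
          for i in range(n) for j in range(i + 1, n)]
print(f"distance ratio after projection: min {min(ratios):.2f}, max {max(ratios):.2f}")
```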

Page 20: Secure Data Outsourcing

RASP perturbation: k-dimensional numeric data, n records, represented as a k × n matrix; x: a record
(1) Extend x to k+2 dimensions:
- the (k+1)th dimension is always 1 (homogeneous dimension)
- the (k+2)th dimension v is a positive real random number drawn from …
(2) Encryption: $y = A\,(x^T, 1, v)^T$
- A is a (k+2) × (k+2) invertible real-valued matrix, with at least two non-zero values in each row, and the last column of A has all non-zero values
- A is shared by all records
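The two RASP steps can be sketched with exact rational arithmetic so that decryption is exact. Assumptions: A has small random integer entries and is re-drawn until it meets the stated constraints and is invertible; v is drawn uniformly from (0, 10] in steps of 0.01 (the slide's distribution for v is not given); decryption solves A z = y and drops the last two coordinates. All function names are hypothetical.

```python
import random
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over exact fractions; x with A x = b, or None if A is singular."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def rasp_keygen(k, rng):
    # (k+2)x(k+2) invertible A; >= 2 non-zeros per row; last column all non-zero
    n = k + 2
    while True:
        A = [[Fraction(rng.randint(-9, 9)) for _ in range(n)] for _ in range(n)]
        if (all(sum(a != 0 for a in row) >= 2 for row in A)
                and all(row[-1] != 0 for row in A)
                and solve(A, [Fraction(0)] * n) is not None):
            return A

def rasp_encrypt(A, x, rng):
    # extend x with the homogeneous 1 and a fresh positive random v, then y = A z
    v = Fraction(rng.randint(1, 1000), 100)
    z = [Fraction(xi) for xi in x] + [Fraction(1), v]
    return [sum(a * zi for a, zi in zip(row, z)) for row in A]

def rasp_decrypt(A, y):
    z = solve(A, y)
    assert z[-2] == 1, "homogeneous dimension must be 1"
    return z[:-2]

rng = random.Random(11)
k = 3
A = rasp_keygen(k, rng)
x = [3, 7, 12]
y1, y2 = rasp_encrypt(A, x, rng), rasp_encrypt(A, x, rng)
print(rasp_decrypt(A, y1) == x, rasp_decrypt(A, y2) == x)   # True True
```

Encrypting the same record twice uses two independent draws of v, so the two ciphertexts differ with high probability while both decrypt to x.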

Page 21: Secure Data Outsourcing

Properties
- Not an OPE
- Preserves the convexity of the dataset: a convex dataset in R^k → another convex dataset in R^(k+2)
- Good for range queries: each range query in R^k → a hyperplane-based query → a range query in R^(k+2)

Page 22: Secure Data Outsourcing

RASP properties: convexity preserving
- The queried range (a hypercube) is convex; RASP transforms it into another convex set (a polyhedron)
- Hyperplane: $w^T x = a$
- Half-space: $w^T x \le a$
- The intersection of convex sets is also convex
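Assuming the encryption $y = A\,(x^T, 1, v)^T$, a short derivation shows why a half-space stays a half-space, which is the core of convexity preservation. Write $z = (x^T, 1, v)^T$, so that $z = A^{-1} y$:

```latex
w^T x \le a
\iff \begin{pmatrix} w \\ -a \\ 0 \end{pmatrix}^{T} z \le 0
\iff \underbrace{\begin{pmatrix} w \\ -a \\ 0 \end{pmatrix}^{T} A^{-1}}_{u^T}\, y \le 0
```

A hypercube $l \le x \le h$ is the intersection of $2k$ such half-spaces ($x_i \le h_i$ and $-x_i \le -l_i$), so its image is an intersection of $2k$ half-spaces $u^T y \le 0$ in R^(k+2), i.e., a convex polyhedron.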

Page 23: Secure Data Outsourcing

Illustration of convexity preserving

[Figure: original space vs. encrypted space]

Page 24: Secure Data Outsourcing

Secure query transformation: a naïve solution
- Based on the convexity-preserving property
- Problems: (1) $A^{-1}$ can be probed; (2) … If a is known, the whole dimension i is breached.

Page 25: Secure Data Outsourcing

Secure query transformation: enhanced solution
- $X_{k+2}$ is always positive, so $(X_i - a) \le 0 \iff (X_i - a)\,X_{k+2} \le 0$
- Correspondingly, in the encrypted space the condition becomes $y^T \Theta y \le 0$
- Problems addressed: (1) $A^{-1}$ cannot be derived from $\Theta$; (2) $(X_i - a)\,X_{k+2} \le 0$ contains the random component $X_{k+2}$, which protects the condition $(X_i - a) \le 0$
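One way to obtain such a $\Theta$, under the encryption $y = A\,(x^T, 1, v)^T$ and writing $z = A^{-1} y$: with $u = e_i - a\,e_{k+1}$ and $w = e_{k+2}$, we have $(x_i - a)\,v = (u^T z)(w^T z) = y^T (A^{-T} u w^T A^{-1}) y$, so $\Theta = A^{-T} u w^T A^{-1}$. The sketch checks this identity exactly with fractions for a fixed toy key (k = 2). This construction is our reading of the slide, not necessarily the exact RASP one, and its security is not analyzed here; the names `theta_upper` and `quad` are hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    # Gauss-Jordan elimination over exact fractions; x with A x = b (A assumed invertible)
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def inverse(A):
    n = len(A)
    cols = [solve(A, [Fraction(int(r == j)) for r in range(n)]) for j in range(n)]
    return [[cols[j][r] for j in range(n)] for r in range(n)]   # B[r][j] = (A^{-1})_{rj}

def theta_upper(B, i, a, k):
    # Theta = A^{-T} u w^T A^{-1} for "x_i <= a", with u = e_i - a e_{k+1}, w = e_{k+2}
    n = k + 2
    u = [Fraction(0)] * n
    u[i], u[k] = Fraction(1), Fraction(-a)
    s = [sum(B[r][p] * u[r] for r in range(n)) for p in range(n)]   # A^{-T} u
    t = B[k + 1]                                                     # A^{-T} w: last row of A^{-1}
    return [[s[p] * t[q] for q in range(n)] for p in range(n)]

def quad(y, Th):
    # what the server evaluates: y^T Theta y
    n = len(y)
    return sum(y[p] * Th[p][q] * y[q] for p in range(n) for q in range(n))

k = 2
A = [[Fraction(c) for c in row] for row in
     [[2, 1, 0, 1], [0, 1, 3, 2], [1, 0, 1, 1], [1, 2, 0, 3]]]   # toy secret key (det = 11)
B = inverse(A)
records = [((3, 8), Fraction(7, 2)), ((6, 1), Fraction(1, 3)), ((5, 5), Fraction(9))]
ys = [[sum(a * c for a, c in zip(row, [Fraction(x0), Fraction(x1), Fraction(1), v])) for row in A]
      for (x0, x1), v in records]
Th = theta_upper(B, 0, 5, k)   # condition x_0 <= 5
values = [quad(y, Th) for y in ys]
print([str(q) for q in values], [q <= 0 for q in values])   # ['-7', '1/3', '0'] [True, False, True]
```

Each value equals $(x_0 - 5)\,v$ for its record, so its sign answers the condition while its magnitude is masked by the random $v$.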

Page 26: Secure Data Outsourcing

Efficient two-stage query processing

[Figure: the query region in the original space and in the transformed space]

Stage 1: query the bounding box of the transformed range. A multidimensional tree index has been built on the encrypted data (in the transformed space) on the server.

Stage 2: filter out the junk records.

Page 27: Secure Data Outsourcing

Stage 1: the client calculates the large bounding box; the server uses the index to find the results.
Stage 2: filter the initial results with the conditions $y^T \Theta_i y \le 0$ for $i = 1, \dots, 2k$.

Note: the two-stage strategy works if the output of stage 1 is significantly smaller than the original database and fits into memory. Otherwise, use a linear scan with stage-2 filtering.
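The two stages can be sketched end to end with a toy key and exact fractions. Assumptions: v is known to lie in [V_MIN, V_MAX] = [1, 10], which lets the client compute the bounding box of the transformed query by interval arithmetic; the server's multidimensional tree index is replaced by a plain scan for points inside the box; and each of the 2k stage-2 conditions uses a rank-one $\Theta = A^{-T} u w^T A^{-1}$, with $w = e_{k+2}$ and $u$ chosen so that $u^T z$ is $x_i - h_i$ or $l_i - x_i$ (a hypothetical reading of the slides, not the published implementation).

```python
import random
from fractions import Fraction

K = 2
A = [[Fraction(c) for c in row] for row in
     [[2, 1, 0, 1], [0, 1, 3, 2], [1, 0, 1, 1], [1, 2, 0, 3]]]   # toy secret key (invertible)
V_MIN, V_MAX = Fraction(1), Fraction(10)                         # assumed range of the random v
N = K + 2

def solve(M0, b):
    # Gauss-Jordan elimination over exact fractions (M0 assumed invertible)
    n = len(M0)
    M = [list(row) + [bi] for row, bi in zip(M0, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

COLS = [solve(A, [Fraction(int(r == j)) for r in range(N)]) for j in range(N)]
B = [[COLS[j][r] for j in range(N)] for r in range(N)]            # B = A^{-1}

def encrypt(x, v):
    z = [Fraction(c) for c in x] + [Fraction(1), v]
    return [sum(a * c for a, c in zip(row, z)) for row in A]

def bounding_box(lo, hi):
    # client, stage 1: per-coordinate range of y = A z over lo <= x <= hi, V_MIN <= v <= V_MAX
    zlo = [Fraction(c) for c in lo] + [Fraction(1), V_MIN]
    zhi = [Fraction(c) for c in hi] + [Fraction(1), V_MAX]
    return [(sum(min(a * l, a * h) for a, l, h in zip(row, zlo, zhi)),
             sum(max(a * l, a * h) for a, l, h in zip(row, zlo, zhi))) for row in A]

def thetas(lo, hi):
    # client, stage 2: 2k matrices; y^T Theta y = (x_i - hi_i) v or (lo_i - x_i) v, <= 0 iff the bound holds
    t = B[N - 1]                                                  # A^{-T} e_{k+2}
    out = []
    for i in range(K):
        for sign, bound in ((1, hi[i]), (-1, lo[i])):
            u = [Fraction(0)] * N
            u[i], u[K] = Fraction(sign), Fraction(-sign * bound)
            s = [sum(B[r][p] * u[r] for r in range(N)) for p in range(N)]
            out.append([[s[p] * t[q] for q in range(N)] for p in range(N)])
    return out

def quad(y, Th):
    return sum(y[p] * Th[p][q] * y[q] for p in range(N) for q in range(N))

def server_stage1(table, box):
    # stand-in for the tree index: keep the points that fall inside the bounding box
    return [y for y in table if all(l <= c <= h for c, (l, h) in zip(y, box))]

def server_stage2(candidates, conditions):
    # filter out the junk records: keep y only if y^T Theta_i y <= 0 for all 2k conditions
    return [y for y in candidates if all(quad(y, Th) <= 0 for Th in conditions)]

rng = random.Random(9)
records = [(rng.randint(0, 20), rng.randint(0, 20)) for _ in range(200)]
table = [encrypt(x, Fraction(rng.randint(100, 1000), 100)) for x in records]
lo, hi = (5, 5), (9, 12)
stage1 = server_stage1(table, bounding_box(lo, hi))
stage2 = server_stage2(stage1, thetas(lo, hi))
truth = [y for x, y in zip(records, table) if all(l <= c <= h for c, l, h in zip(x, lo, hi))]
print(len(table), len(stage1), len(stage2), stage2 == truth)
```

Stage 1 returns a superset of the answer (the box over-approximates the transformed polyhedron), and stage 2 removes exactly the records that violate one of the original bounds.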

Page 28: Secure Data Outsourcing

RASP-based data mining
- Preserving range queries → linear classifiers
- Use the boosting framework to get strong classifiers (PerturBoost, in ICDM 2013)

Page 29: Secure Data Outsourcing

Access pattern privacy on database queries
- The problem is the same as private information retrieval (PIR)
- Attackers may use the access pattern to breach data confidentiality
- Each of the previous approaches should handle this problem!

Page 30: Secure Data Outsourcing

PIR is impractical
- Solutions based on private information retrieval (PIR) exist, but PIR is still impractical

Page 31: Secure Data Outsourcing

For the bucketization approach
- Based on the architecture of Hacigumus (SIGMOD02); Hore (VLDB04) for range queries
- Privacy concern: the buckets reveal the distribution of values in each bucket
- "Diffusion": split buckets and combine parts of different buckets
- Trade-off: the server now needs to return more noisy results, i.e., a larger result size

Page 32: Secure Data Outsourcing

For OPE: use queries to find out the distributions, then break the encryption

Page 33: Secure Data Outsourcing

For RASP: secure query transformation; attacks on transformed queries

Page 34: Secure Data Outsourcing

Oblivious RAM
- Access pattern: which data items are read/written
- Setting:
  - The client has a small secure memory
  - The server has large insecure storage and is semi-honest
  - Data items are encrypted, but encryption alone does not let the client hide the accessed locations
- An active research area

Page 35: Secure Data Outsourcing

Existing approaches
Inside a level:
- Some real blocks: useful data
- Some dummy blocks: random data
- Blocks are randomly permuted; only the client knows the permutation

[Figure: a level holding interleaved real and dummy blocks]

Page 36: Secure Data Outsourcing

Existing approaches
Reading: read a block from each level
- One block is real; the remaining ones are dummy blocks

[Figure: the client reads one real block and one dummy block from each other level on the server]
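The read step can be sketched as follows. This is a simplified, hypothetical illustration of one read in a hierarchical layout, not a complete ORAM: the class `Level` and the function `oblivious_read` are made up, contents are not actually encrypted, and the write-back of the read block to the top level and the periodic reshuffling are omitted, so the sketch only supports reading each block once between shuffles. What it shows is that the server observes exactly one slot per level per read, and never the same slot twice.

```python
import random

class Level:
    """One level: real and dummy blocks in a random order that only the client knows."""
    def __init__(self, blocks, n_dummies, rng):
        slots = [("real", bid, data) for bid, data in blocks.items()]
        slots += [("dummy", None, None)] * n_dummies
        rng.shuffle(slots)
        self.slots = slots   # server-side array (contents would be encrypted)
        # client-side secrets: slot of each real block, and the still-unused dummy slots
        self.pos = {s[1]: i for i, s in enumerate(slots) if s[0] == "real"}
        self.unused_dummies = [i for i, s in enumerate(slots) if s[0] == "dummy"]

def oblivious_read(levels, block_id, trace):
    found = None
    for level_no, level in enumerate(levels):
        if found is None and block_id in level.pos:
            slot = level.pos.pop(block_id)        # the one real read
            found = level.slots[slot][2]
        else:
            slot = level.unused_dummies.pop()     # a fresh dummy at every other level
        trace.append((level_no, slot))            # the only thing the server observes
    return found

rng = random.Random(4)
levels = [Level({"a": "A"}, 4, rng),
          Level({"b": "B", "c": "C"}, 4, rng),
          Level({f"d{i}": i for i in range(4)}, 4, rng)]
trace = []
results = [oblivious_read(levels, bid, trace) for bid in ["c", "d2", "a", "d0"]]
print(results)   # ['C', 2, 'A', 0]
print(trace)
```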

Page 37: Secure Data Outsourcing

Existing approaches
Writing:
- Shuffle the consecutively filled levels
- Write into the next unfilled level
- Clear the source levels

[Figure: server levels before and after the write; the client shuffles the blocks]

Page 38: Secure Data Outsourcing

Continuous Shuffling

To write:

Page 39: Secure Data Outsourcing

The Problem with Existing Approaches

Page 40: Secure Data Outsourcing

Integrity guarantee: Merkle hash tree
- A parent node stores H(H(x1) + H(x2)), where + is string concatenation
- Can be stored with tree-like structures: indexes, XML
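A minimal Merkle hash tree with SHA-256, assuming a power-of-two number of records and that the data owner keeps (or signs) the root. A proof for one record is the list of sibling hashes on its path to the root; the verifier recomputes the root as H(left + right) at each level. Function names are hypothetical.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(records):
    # levels[0]: leaf hashes H(x_i); each parent is H(left + right), '+' = byte concatenation
    assert records and (len(records) & (len(records) - 1)) == 0, "sketch assumes 2^m records"
    levels = [[H(r) for r in records]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def prove(levels, index):
    # sibling hashes from leaf to root, each flagged with whether the sibling is the left child
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(root, record, proof):
    h = H(record)
    for sibling, sibling_is_left in proof:
        h = H(sibling + h) if sibling_is_left else H(h + sibling)
    return h == root

records = [b"alice,30", b"bob,25", b"carol,41", b"dave,35"]
levels = build_levels(records)
root = levels[-1][0]            # kept (or signed) by the data owner
proof = prove(levels, 2)        # the server's answer for "carol" comes with this proof
print(verify(root, b"carol,41", proof), verify(root, b"carol,14", proof))   # True False
```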

Page 41: Secure Data Outsourcing

Hash chains

Page 42: Secure Data Outsourcing

Query correctness with Merkle trees, by Devanbu et al.

Page 43: Secure Data Outsourcing

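Using a Merkle tree
Example: 5 <= q <= 10
LUB(q) = 4, GLB(q) = 11: the server also returns these neighboring values just outside the range, so the client can check that no qualifying record was omitted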
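Extending the same idea to this example, the server returns the answer together with the two boundary records (4 and 11) and a Merkle proof for each; the client checks every proof, checks that the authenticated leaf positions are consecutive, and checks that the boundaries enclose the range, which rules out silently dropped records. This is a sketch under our assumptions (sorted leaves, power-of-two leaf count, boundaries always present), not Devanbu et al.'s exact protocol; all names are hypothetical.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    assert (len(leaves) & (len(leaves) - 1)) == 0, "sketch assumes 2^m leaves"
    levels = [[H(x) for x in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def prove(levels, index):
    proof = []
    for level in levels[:-1]:
        proof.append((level[index ^ 1], (index ^ 1) < index))
        index //= 2
    return proof

def verify_position(root, leaf, proof):
    # recompute the root; the left/right flags also authenticate the leaf's position
    h, index = H(leaf), 0
    for depth, (sibling, sibling_is_left) in enumerate(proof):
        if sibling_is_left:
            h, index = H(sibling + h), index | (1 << depth)
        else:
            h = H(h + sibling)
    return index if h == root else None

values = [1, 4, 6, 7, 9, 11, 15, 20]   # sorted attribute values (the leaves)
levels = build_levels([str(v).encode() for v in values])
root = levels[-1][0]

def server_range(a, b):
    # the answer plus the neighbors just outside [a, b] (assumed to exist), each with a proof
    first = min(p for p, v in enumerate(values) if v >= a) - 1
    last = max(p for p, v in enumerate(values) if v <= b) + 1
    return [(values[p], prove(levels, p)) for p in range(first, last + 1)]

def client_check(answer, a, b):
    if len(answer) < 2:
        return None
    pos = [verify_position(root, str(v).encode(), pf) for v, pf in answer]
    if None in pos or pos != list(range(pos[0], pos[0] + len(pos))):
        return None                    # forged record or a gap in the answer
    vals = [v for v, _ in answer]
    if not (vals[0] < a and vals[-1] > b):
        return None                    # the boundaries must enclose the range
    return [v for v in vals[1:-1] if a <= v <= b]

answer = server_range(5, 10)                          # 4, 6, 7, 9, 11 with proofs
print(client_check(answer, 5, 10))                    # [6, 7, 9]
print(client_check(answer[:2] + answer[3:], 5, 10))   # None: record 7 was dropped
```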

Page 44: Secure Data Outsourcing

Operations: selections, projections, equijoins, set operations
Issues: works only on data with verification objects; limited query expressiveness; expensive
Related work: Pang et al. (ICDE04, SIGMOD05), using the ElGamal function; Sion (VLDB05): challenge tokens; F. Li (SIGMOD06): freshness

Page 45: Secure Data Outsourcing

Secure keyword search: simple information retrieval
- For a keyword, find the documents containing the keyword
- What if the documents are encrypted word by word, and the keyword is also encrypted?

Page 46: Secure Data Outsourcing

Secure keyword search (Song 2000)
• The seed is random and different for each Wi
• Key idea: Li and Ri are self-verifiable
• Advantage of XOR

Page 47: Secure Data Outsourcing
Page 48: Secure Data Outsourcing

How to set K?

Page 49: Secure Data Outsourcing

Setting of $k_i$: $k_i = F_{k'}(W_i)$, where $k'$ is secret
- The user publishes $W$ and $k = F_{k'}(W)$
- The server checks, for each $C_i$, whether $C_i \oplus W = \langle L_i, F_k(L_i) \rangle$
- This reveals nothing if $C_i$ is not the ciphertext for $W$. And $L_i$ is random for different $W_i$, so the server cannot find any information from $L_i$.
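The search described above can be sketched in Python. Assumptions, not from the slides: HMAC-SHA256 (truncated) plays the role of the pseudorandom function F, words are padded to N = 16 bytes, L_i has N - M = 8 bytes and the check part F_{k_i}(L_i) has M = 8 bytes, and L_i is derived from a secret seed and the word position. Decryption is omitted. Names are hypothetical.

```python
import hashlib
import hmac
import os

N, M = 16, 8   # word length and check length in bytes (assumed sizes)

def prf(key: bytes, msg: bytes, size: int) -> bytes:
    # HMAC-SHA256 standing in for the pseudorandom function F (a substitution)
    return hmac.new(key, msg, hashlib.sha256).digest()[:size]

def pad(word: str) -> bytes:
    w = word.encode()
    assert len(w) <= N, "sketch assumes words of at most N bytes"
    return w.ljust(N, b"\0")

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_words(words, k_prime: bytes, seed: bytes):
    out = []
    for i, word in enumerate(words):
        w = pad(word)
        l_i = prf(seed, i.to_bytes(8, "big"), N - M)   # L_i: pseudorandom, different per position
        k_i = prf(k_prime, w, 32)                       # k_i = F_k'(W_i)
        out.append(xor(w, l_i + prf(k_i, l_i, M)))      # C_i = W_i XOR <L_i, F_ki(L_i)>
    return out

def trapdoor(word: str, k_prime: bytes):
    w = pad(word)
    return w, prf(k_prime, w, 32)                       # the user reveals W and k = F_k'(W)

def server_search(ciphertexts, td):
    w, k = td
    hits = []
    for i, c in enumerate(ciphertexts):
        t = xor(c, w)                                   # equals <L_i, F_k(L_i)> iff C_i encrypts W
        if hmac.compare_digest(t[N - M:], prf(k, t[:N - M], M)):
            hits.append(i)
    return hits

k_prime, seed = os.urandom(32), os.urandom(32)
doc = ["secure", "data", "outsourcing", "secure", "query"]
cts = encrypt_words(doc, k_prime, seed)
print(server_search(cts, trapdoor("secure", k_prime)))
```

A non-matching ciphertext passes the check only with probability about 2^-64 here, so searching for "secure" is expected to return positions [0, 3]; the two occurrences still have different ciphertexts because their L_i differ.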

Page 50: Secure Data Outsourcing

Hidden search: in the previous schemes, W is revealed
- Weakness: each search has to release k for W, so it is easy to collect information
- Solution: encrypt $W_i$ with a private key, then XOR it with $\langle L_i, F_k(L_i) \rangle$

Page 51: Secure Data Outsourcing

Recent developments: Reza 2006
"Searchable symmetric encryption: improved definitions and efficient constructions"
Gives stronger security definitions for this problem, including security against adaptive chosen-keyword attacks, together with efficient constructions that meet them

Page 52: Secure Data Outsourcing

Trusted hardware

Page 53: Secure Data Outsourcing

Possible benefits

Page 54: Secure Data Outsourcing

Discussion: data confidentiality/access pattern
- Strict cryptographic definition (keyword search), or
- Relaxed definition (perturbation, bucketization, OPE, etc.)
- It is very difficult to formulate and prove the security of non-traditional approaches. Do we need to reformulate the security model? And how?