Generalizing PIR for Practical Private Retrieval of Public Data

Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi Department of Computer Science UC Santa Barbara DBSec 2010

Page 1: Generalizing PIR for Practical Private Retrieval of Public Data

Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi

Department of Computer Science, UC Santa Barbara

DBSec 2010

Page 2: Generalizing PIR for Practical Private Retrieval of Public Data

Outline

The Problem
◦ Practical private retrieval of public data

Main Challenges
◦ Strong privacy, practical cost of retrieval

Our proposal
◦ Absolute privacy in a bounding box

Contributions
◦ Private retrieval service charge model
◦ Bounding-box PIR: generalizing k-Anonymity and PIR
◦ Query by key in one round

6/21/2010S.Wang, D.Agrawal and A.El Abbadi 2

Page 3: Generalizing PIR for Practical Private Retrieval of Public Data

[Figure: problem setting. A client sends a query through a private query method, which forwards an obfuscated query to an untrusted server holding public data. Client: "I don't want to reveal my personal interest." Server: "I can provide this private retrieval service, if you pay for it." Remaining label: private data profile.]

Page 4: Generalizing PIR for Practical Private Retrieval of Public Data

Desiderata
◦ Practical: minimize computation and communication costs
◦ Flexible: allow clients to specify their desired degree of privacy ρ and service charge budget µ; satisfy ρ without exceeding µ

Metrics of interest
◦ Performance metrics
  Computation cost Ccomp
  Communication cost Ccomm
◦ Quality of service metrics
  Privacy breach probability Pbrh (Pbrh ≤ ρ)
  Server charge Csrv (Csrv ≤ µ)

Challenge
◦ Difficult to achieve both strong privacy and practical retrieval cost at the same time

Page 5: Generalizing PIR for Practical Private Retrieval of Public Data

k-Anonymity

Principle
◦ Blur a data value with a range or partition s.t. each value is indistinguishable among at least k values. [Sama98, Swee02]

Analysis: use k bits of data to anonymize 1 requested bit
◦ E.g. k = 30, query "June 17, 1972" -> obfuscated query "June, 1972"
◦ Ccomp = k, Ccomm = k + 1
◦ Pbrh = 1/k, Csrv = k

Pros
◦ Flexible
◦ Computationally cheap

Cons
◦ Potential proximity breach for numeric data (due to a narrow anonymous range) [Li08]
◦ Plain text communication, subject to attack with background knowledge
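The k-Anonymity analysis can be illustrated with a small Python sketch: it hides one queried key inside a range of k consecutive keys and reports the cost formulas listed on the slide. The function names, the sorted key list and the random placement of the range are assumptions made for illustration; the slides do not prescribe how the range is chosen.

```python
import random

def k_anonymous_range(sorted_keys, target, k, rng=random):
    """Return k consecutive keys that cover target (hypothetical helper).

    The range is placed at a random offset so that the target is not
    always at the same position inside the range.
    """
    n = len(sorted_keys)
    i = sorted_keys.index(target)
    lo_min = max(0, i - k + 1)   # leftmost start that still covers the target
    lo_max = min(i, n - k)       # rightmost start that stays inside the list
    lo = rng.randint(lo_min, lo_max)
    return sorted_keys[lo:lo + k]

def k_anonymity_costs(k):
    """Cost formulas as stated on the slide."""
    return {"Ccomp": k, "Ccomm": k + 1, "Pbrh": 1 / k, "Csrv": k}

keys = [1, 5, 7, 8, 16, 23, 26, 33, 45, 53, 54, 56, 72, 79, 80, 89]
obfuscated = k_anonymous_range(keys, 53, 4, random.Random(0))
costs = k_anonymity_costs(4)
```

Because the range consists of neighbouring keys, a small k yields a narrow range around the true value, which is the proximity breach listed under Cons.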

Page 6: Generalizing PIR for Practical Private Retrieval of Public Data

Computational PIR (cPIR)

Principle
◦ Achieve computationally complete privacy by applying cryptographic computations over the entire public data [Kush97]

Pros
◦ Complete privacy for clients
◦ Secure communication

Cons
◦ Orders of magnitude less efficient than simply transferring the entire data from the server to the client [Sion07]

[Figure: the server holds the public data X = (X1, X2, ..., Xn). The client encrypts q = "give me the i-th record" and sends encrypted(q); the server returns encrypted-result = f(X, encrypted(q)), which yields Xi for the client.]

Page 7: Generalizing PIR for Practical Private Retrieval of Public Data

Quadratic Residue (QR)
◦ x is a quadratic residue (QR) mod N if there exists y such that y² ≡ x (mod N); otherwise x is a quadratic non-residue (QNR)
◦ E.g. N = 35: 11 is a QR (9² ≡ 11 mod 35), 3 is a QNR (no y exists with y² ≡ 3 mod 35)
◦ Essential properties: QR × QR = QR, QR × QNR = QNR

Let N = p1 × p2, where p1 and p2 are large primes of m/2 bits each.

Quadratic Residuosity Assumption (QRA)
◦ Determining if a number is a QR or a QNR is computationally hard if p1 and p2 are not given.
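The residue sets and the two multiplication properties can be checked for the toy modulus N = 35 = 5 × 7 with a short Python sketch. It classifies each unit with Euler's criterion on the two prime factors, which is only possible when the factors are known. One reading is assumed here: "QNR" means a value that is a non-residue modulo both primes (Jacobi symbol +1), which is the reading that reproduces the sets QR = {1,4,9,11,16,29} and QNR = {3,12,13,17,27,33} used in this deck. The function names are illustrative.

```python
from math import gcd

def classify(x, p1, p2):
    """Classify a unit x mod p1*p2 using Euler's criterion on each prime."""
    a = pow(x, (p1 - 1) // 2, p1) == 1   # x is a square mod p1
    b = pow(x, (p2 - 1) // 2, p2) == 1   # x is a square mod p2
    if a and b:
        return "QR"
    if not a and not b:
        return "QNR"      # non-residue that is not detectable without p1, p2
    return "other"        # Jacobi symbol -1, not used by the protocol

def residue_sets(p1, p2):
    """Return the sorted QR and QNR lists for N = p1 * p2."""
    N = p1 * p2
    units = [x for x in range(1, N) if gcd(x, N) == 1]
    qr = [x for x in units if classify(x, p1, p2) == "QR"]
    qnr = [x for x in units if classify(x, p1, p2) == "QNR"]
    return qr, qnr

qr, qnr = residue_sets(5, 7)
```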

Page 8: Generalizing PIR for Practical Private Retrieval of Public Data

[Figure, adapted from Tan's presentation: cPIR example]

Public data size: n = 16
Organize data in an s×t (4×4) binary matrix M
Matrix row digits as extracted: 0 1 01 / 1 1 01 / 0 1 01 / 0 1 11
Goal: get M2,3 (row e = 2, column g = 3), with N = 35, m = 6
QR = {1,4,9,11,16,29}
QNR = {3,12,13,17,27,33}
Client query: 4 16 17 11 (the QNR is in position g)
Server response z1 ... z4, values shown: 17 33 17 27
Decoding: z2 = QNR => M2,3 = 1; z2 = QR => M2,3 = 0
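The slide shows only the inputs and outputs of the cPIR example, so the following Python sketch fills in the server computation under an assumption: it uses the usual formulation of the scheme in [Kush97], where the server multiplies, for each row, y_j if the bit is 1 and y_j² if the bit is 0. The matrix below is hypothetical (it is not the matrix from the slide), indices are 0-based, and N = 35 is a toy modulus that offers no security.

```python
import random
from math import gcd

P1, P2 = 5, 7            # secret primes, known to the client only
N = P1 * P2              # public modulus

def is_qr(x):
    """Client-side test: x is a QR mod N iff it is a square mod both primes."""
    return pow(x, (P1 - 1) // 2, P1) == 1 and pow(x, (P2 - 1) // 2, P2) == 1

def is_qnr(x):
    """Non-residue modulo both primes (the QNR values used in the query)."""
    return pow(x, (P1 - 1) // 2, P1) != 1 and pow(x, (P2 - 1) // 2, P2) != 1

UNITS = [x for x in range(1, N) if gcd(x, N) == 1]
QRS = [x for x in UNITS if is_qr(x)]
QNRS = [x for x in UNITS if is_qnr(x)]

def make_query(g, t, rng):
    """Client: one value per column, a QNR in column g and QRs elsewhere."""
    return [rng.choice(QNRS) if j == g else rng.choice(QRS) for j in range(t)]

def server_answer(M, y):
    """Server: one product per row; it never learns which y_j is the QNR."""
    z = []
    for row in M:
        acc = 1
        for bit, yj in zip(row, y):
            acc = acc * (yj if bit else yj * yj) % N
        z.append(acc)
    return z

def decode(z, e):
    """Client: z[e] is a QNR exactly when the requested bit is 1."""
    return 0 if is_qr(z[e]) else 1

# Hypothetical 4x4 binary matrix, for illustration only.
M = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 1, 0, 0]]
rng = random.Random(1)
bit = decode(server_answer(M, make_query(2, 4, rng)), 1)   # retrieves M[1][2]
```

The decoding step works because of the two properties on the previous slide: every factor in row e is a QR except possibly the one in column g, which stays a QNR only when the bit is 1.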

Page 9: Generalizing PIR for Practical Private Retrieval of Public Data

Bounding-Box PIR (bbPIR)

Principles
◦ Rely on cPIR cryptographic operations to achieve strong privacy
◦ Trade partial privacy of cPIR for practical performance
◦ Adopt the flexible privacy principle of k-Anonymity

Basic idea
◦ Bound expensive cryptographic computations in an r×c bounding box BB, a sub-matrix of M.
◦ (1) Satisfy the client's privacy requirement: r×c = 1/ρ
◦ (2) Minimize Ccomm -> minimize (c + b×r)

Properties
◦ The bounding box contains both the data whose values are close to the query value and the data whose values are not close.
◦ Unify k-Anonymity and cPIR by varying the dimensions of the bounding box
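The two conditions in the basic idea amount to a small optimization problem, sketched below in Python. Several points are assumptions rather than part of the slides: b is treated as a given positive constant, the equality r×c = 1/ρ is relaxed to r×c ≥ 1/ρ so that integer dimensions always exist, the box must fit in the s×t matrix, and the search is a plain enumeration over r. The function name is illustrative.

```python
import math

def choose_bounding_box(rho, b, s, t):
    """Pick integer (r, c) with r*c >= 1/rho that minimizes c + b*r.

    Returns None when no box of the required area fits in an s x t matrix.
    """
    k = math.ceil(1 / rho - 1e-9)    # required cells; tolerance guards float error
    best = None
    for r in range(1, s + 1):
        c = math.ceil(k / r)         # fewest columns that reach k cells with r rows
        if c > t:
            continue                 # box would be wider than the matrix
        cost = c + b * r
        if best is None or cost < best[0]:
            best = (cost, r, c)
    return None if best is None else (best[1], best[2])

square_box = choose_bounding_box(0.25, 1, 4, 4)   # privacy 1/4 on a 4x4 matrix
flat_box = choose_bounding_box(0.25, 3, 4, 4)     # rows are 3x as expensive
```

With b = 1 the cheapest box is close to square; a larger b makes each row more expensive and pushes the choice toward a flatter box.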

Page 10: Generalizing PIR for Practical Private Retrieval of Public Data

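[Figure: bbPIR example on the 4×4 binary matrix M]

Matrix row digits as extracted: 0 1 01 / 1 1 01 / 0 1 01 / 0 1 11
Goal: get M2,3 (row e = 2, column g = 3), with N = 35, m = 6
QR = {1,4,9,11,16,29}
QNR = {3,12,13,17,27,33}
Bounding box BB: a sub-matrix of M that contains M2,3
Client query y (restricted to BB): 16 17 (17 is the QNR)
Server response z (restricted to BB): 17 27
Decoding: z2 = QNR => M2,3 = 1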
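A bounding-box retrieval can be sketched in Python as the cPIR computation restricted to a sub-matrix. The same assumptions apply as for the standard [Kush97] formulation (multiply y_j for a 1 bit and y_j² for a 0 bit); the matrix, the box placement arguments and the function names are hypothetical, indices are 0-based, and N = 35 is a toy modulus.

```python
import random
from math import gcd

P1, P2 = 5, 7            # secret primes, known to the client only
N = P1 * P2              # public modulus

def residuosity(x):
    """Classify a unit mod N with Euler's criterion on each prime factor."""
    a = pow(x, (P1 - 1) // 2, P1) == 1
    b = pow(x, (P2 - 1) // 2, P2) == 1
    return "QR" if a and b else "QNR" if not a and not b else "other"

UNITS = [x for x in range(1, N) if gcd(x, N) == 1]
QRS = [x for x in UNITS if residuosity(x) == "QR"]
QNRS = [x for x in UNITS if residuosity(x) == "QNR"]

def bbpir_retrieve(M, e, g, top, left, r, c, rng):
    """Retrieve M[e][g] inside the r x c box whose top-left cell is (top, left)."""
    assert top <= e < top + r and left <= g < left + c
    # Client: c query values, a QNR for the wanted column and QRs elsewhere.
    y = [rng.choice(QNRS) if left + j == g else rng.choice(QRS) for j in range(c)]
    # Server: r products, touching only the r*c bits inside the box.
    z = []
    for i in range(top, top + r):
        acc = 1
        for j in range(c):
            acc = acc * (y[j] if M[i][left + j] else y[j] * y[j]) % N
        z.append(acc)
    # Client: the row of interest decodes to 1 exactly when z is a QNR.
    return 0 if residuosity(z[e - top]) == "QR" else 1

# Hypothetical 4x4 binary matrix, for illustration only.
M = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 1, 0, 0]]
bit = bbpir_retrieve(M, e=1, g=2, top=1, left=1, r=2, c=2, rng=random.Random(3))
```

The server sees that the request lies somewhere in the box but cannot tell which of the r×c cells is wanted, which corresponds to a breach probability of 1/(r×c).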

Page 11: Generalizing PIR for Practical Private Retrieval of Public Data

[Figure: one query answered by three methods (panels: cPIR, k-Anonymity, bbPIR with its bounding box); rows indexed by e, columns by g]

Public data size: n = 16
Query: retrieve the item with key 53

Key matrix (the same in each panel):
8 33 56 89
7 26 54 80
5 23 53 79
1 16 45 72

Costs shown (the k-Anonymity formulas with k = 4):
Ccomp = k = 4
Ccomm = k + 1 = 5
Pbrh = 1/k = 1/4
Csrv = k = 4

Page 12: Generalizing PIR for Practical Private Retrieval of Public Data

Limitation of previous formulation: query by matrix address

Solution for query by key: find address by key
◦ Candidate solution I: third party translation, as in Casper [Mokb07]
  Cons: security depends on a third party
◦ Candidate solution II: an index structure on the server mapping key to address [Chor97]
  Cons: needs O(b × log n) times the communication
◦ Our proposal: the server publishes a histogram H on the key field to authorized clients. The client calculates an address range for the queried entry by searching for the bin in which the entry falls.
  Pros: if the bin size w ≤ s, only one round of bbPIR is needed

Page 13: Generalizing PIR for Practical Private Retrieval of Public Data

In the client's view, the server matrix M is a histogram matrix HM; thus the address of the requested item x maps to an address range of the items in the same bin as x.

[Figure: histogram matrix HM (w = 2) next to the server matrix M; rows indexed by e, columns by g, row 1 at the bottom]

HM (key range of each bin):
7-13   26-40  54-70  80-93  101-138
1-5    16-23  45-53  72-79  94-100

M:
13  40  70  93  138
8   33  60  89  107
7   26  54  80  101
5   23  53  79  100
1   16  45  72  94

Example: M2,3 falls in bin HM1,3, which covers (M1,3, M2,3).
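The histogram lookup can be sketched in Python as follows. The layout is an assumption drawn from the example matrices: keys are sorted and stored column by column, row 1 is the smallest key of a column, and every bin holds w consecutive entries with w dividing the column height s so that no bin straddles two columns. The function names are illustrative, and the keys are the 16 keys of the 4×4 example.

```python
from bisect import bisect_right

def build_histogram(sorted_keys, w):
    """Server side: publish the (low, high) key range of every bin of w entries."""
    return [(sorted_keys[i], sorted_keys[min(i + w, len(sorted_keys)) - 1])
            for i in range(0, len(sorted_keys), w)]

def address_range(histogram, key, w, s):
    """Client side: map a key to (column, rows) of its bin, 1-based.

    Returns None when the key falls outside every bin and so cannot exist.
    """
    assert s % w == 0, "bins must not straddle columns"
    lows = [low for low, _ in histogram]
    b = bisect_right(lows, key) - 1          # last bin whose lower bound <= key
    if b < 0 or key > histogram[b][1]:
        return None
    start = b * w                            # position of the bin in sorted order
    column = start // s + 1
    first_row = start % s + 1
    return column, list(range(first_row, first_row + w))

keys = [1, 5, 7, 8, 16, 23, 26, 33, 45, 53, 54, 56, 72, 79, 80, 89]
H = build_histogram(keys, w=2)
where = address_range(H, 53, w=2, s=4)       # column 3, rows 1 and 2
```

The client can then place its bounding box so that it covers the whole returned address range, which is why a single round of bbPIR suffices when w ≤ s.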

Page 14: Generalizing PIR for Practical Private Retrieval of Public Data

Implementation of three private retrieval methods
◦ bbPIR, cPIR
◦ k-Anonymity: anonymize the private query item by specifying a consecutive range that covers the item

Data set
◦ Generated n = 10^6 data records with 3 attributes based on an Adult census data set with 32561 records of 15 attributes.
◦ Only for the experiment on proximity privacy of numeric data, generated 10^6 numeric data values following a Zipf distribution in [0.0, 1.0].

Settings
◦ Test bed: Intel 2.40GHz CPU, 3GB memory, Fedora Core 8 OS
◦ Default parameter values: ρ = 0.001, µ = 50, k = 1000, m = 1024

Page 20: Generalizing PIR for Practical Private Retrieval of Public Data

We proposed a practical, flexible and secure approach for private retrieval of public data in single server settings, called Bounding-Box PIR (bbPIR).

bbPIR generalizes cPIR and k-Anonymity based private retrieval methods.

We incorporated the realistic assumption of charging clients for the exposed service data.

We achieved query by key without running additional rounds of bbPIR.


Page 21: Generalizing PIR for Practical Private Retrieval of Public Data

[Sama98] P. Samarati et al. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, 1998.

[Swee02] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002.

[Li08] J. Li et al. Preservation of proximity privacy in publishing numerical sensitive data. In SIGMOD 2008.

[Mokb07] M. Mokbel et al. The new casper: A privacy-aware location-based database server. In ICDE 2007.

[Kush97] E. Kushilevitz et al. Replication is not needed: Single database, computationally-private information retrieval. In FOCS 1997.

[Sion07] R. Sion et al. On the computational practicality of private information retrieval. In NDSS 2007.

[Chor97] B. Chor et al. Private information retrieval by keywords. Technical Report, TRCS 0917, Technion.