

Page 1: DATA LEAKAGE DETECTION, Stanford University, infolab.stanford.edu/~ppapadim/papers/Data_Leakage_Detection_po…

DATA LEAKAGE DETECTION
Panagiotis Papadimitriou and Hector Garcia-Molina

Stanford University

[Figure: example profiles (Jeremy, Sarah, Mark, Kathryn) with fields such as "Name: Sarah, Sex: Female" and "Name: Mark, Sex: Male", the agents App. U1 and App. U2, and other sources, e.g., Mark’s friend.]

Leakage Problem
A data distributor, e.g., Facebook, owns a set T of private data items, e.g., FB profiles. The distributor gives to supposedly trusted agents U1, …, Un, e.g., Facebook Apps, the sets R1, …, Rn ⊆ T. A leaker obtains data from agents or from other sources and publishes a set S ⊆ T. An agent who provides the leaker with data is guilty.
• Given the leaked set S, what is the probability that Ui is guilty?
• How can the distributor allocate data items to agents so that he can detect guilty agents?

[Figure: leakage scenarios for the two guilt models, "Independently" and "All OR nothing", with scenario probabilities (1-p)^2, (1-p)p, p(1-p) and p^2.]

Guilt Models
p: Posterior probability that a leaked profile comes from other sources (other than the agents).
Pr(Gi|S): Probability that agent Ui is guilty, given the leaked set of profiles S.

Models’ Assumptions
• Agents leak each of their data items independently.
• Agents leak all their data items OR nothing.

[Figure: Pr(G1|S) and Pr(G2|S) under each of the two models.]
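To make the independent-leak model concrete, here is a minimal sketch of how Pr(Gi|S) could be estimated. The poster does not state a formula, so the estimator below is an assumption: Pr(Gi|S) = 1 − ∏ over t in S ∩ Ri of (1 − (1 − p)/|Vt|), where Vt (a name introduced here) is the set of agents that received item t. The function name, the sample allocations and the value p = 0.2 are likewise hypothetical.

```python
from math import prod


def guilt_probability(agent, leaked, allocations, p):
    """Estimate Pr(G_i | S) assuming agents leak items independently.

    Assumed estimator (not stated on the poster):
        1 - prod over t in (S ∩ R_i) of (1 - (1 - p) / |V_t|)
    where V_t is the set of agents holding item t and p is the
    probability that a leaked item came from other sources.
    """
    shared = leaked & allocations[agent]
    factors = []
    for item in shared:
        # Count how many agents were given this item (|V_t|).
        holders = sum(1 for items in allocations.values() if item in items)
        factors.append(1 - (1 - p) / holders)
    # With no shared items the product is 1, so the estimate is 0.
    return 1 - prod(factors)


# Hypothetical allocation: two apps that share one profile.
allocations = {"U1": {"sarah", "mark"}, "U2": {"mark", "kathryn"}}
leaked = {"sarah", "mark"}

for agent in allocations:
    print(agent, f"{guilt_probability(agent, leaked, allocations, p=0.2):.2f}")
```

Under these assumptions U1, who holds both leaked profiles (one of them exclusively), gets a much higher estimate than U2, who holds only the shared profile.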

Data Allocation Problem
Agents’ Requests
• Sample, e.g., any 100 Stanford profiles.
• Explicit, e.g., all people who added an application.

Objective
Allocate data to agents so that if Ui leaks his set Ri, then
Pr(Gi|S) >> Pr(Gj|S) for i ≠ j.

Example
• 4 agents U1, U2, U3 and U4.
• Each agent requests a sample of (any) 2 profiles.

[Figure: three example allocations of profiles to agents U1, U2, U3 and U4.]
• Good: Agents U1 and U2 are not suspects if U3 or U4 leak data.
• Poor: All agents have the same guilt prob. in case of leakage.
• Optimal: Agent Ui who leaks its data has the highest guilt prob.

minimize (over R1, ..., Rn)
$$\sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j| \qquad (1)$$
OR
minimize (over R1, ..., Rn)
$$\max_{i \neq j} \frac{|R_i \cap R_j|}{|R_i|} \qquad (2)$$
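The two objectives can be evaluated directly for a candidate allocation. The sketch below computes (1) and (2) for three hypothetical allocations in the 4-agent, 2-profile setting of the example; the function names and the concrete item sets are illustrative assumptions, not taken from the poster's figure.

```python
def sum_objective(allocations):
    """Objective (1): sum over i of (1/|R_i|) * sum over j != i of |R_i ∩ R_j|."""
    total = 0.0
    for i, r_i in enumerate(allocations):
        overlap = sum(
            len(r_i & r_j) for j, r_j in enumerate(allocations) if j != i
        )
        total += overlap / len(r_i)
    return total


def max_objective(allocations):
    """Objective (2): max over i != j of |R_i ∩ R_j| / |R_i|."""
    return max(
        len(r_i & r_j) / len(r_i)
        for i, r_i in enumerate(allocations)
        for j, r_j in enumerate(allocations)
        if i != j
    )


# Hypothetical allocations; profiles are identified by integers.
identical = [{1, 2}, {1, 2}, {1, 2}, {1, 2}]   # every agent gets the same pair
paired = [{1, 2}, {1, 2}, {3, 4}, {3, 4}]      # two groups of two agents
disjoint = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]    # no shared profiles

for name, alloc in [("identical", identical), ("paired", paired), ("disjoint", disjoint)]:
    print(name, sum_objective(alloc), max_objective(alloc))
# identical 12.0 1.0
# paired 4.0 1.0
# disjoint 0.0 0.0
```

Lower values mean less overlap between agents' sets, which is what makes a leaking agent stand out from the others.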

Allocation Strategies
Sample Requests
• s-random: Allocates at random.
• s-overlap: Minimizes the sum of overlaps |Ri ∩ Rj|.
• s-sum: Minimizes (1).
• s-max: Minimizes (2).

Explicit Requests
• no fake: Allocates exactly the requested data items.
• e-random, e-optimal: In addition to the requested real items, they allocate B fake items. e-random allocates them at random. e-optimal minimizes (1) and (2).
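A rough sketch of three of these strategies follows. The poster only names them, so the details are assumptions: `s_overlap` is written as a greedy heuristic that gives each agent the items held by the fewest agents so far, and `e_random` gives each of the B fake items (labeled `fake-k` here) to a randomly chosen agent. All function and parameter names are hypothetical.

```python
import random


def s_random(items, n_agents, m, rng):
    """s-random: each agent receives m items drawn uniformly at random."""
    return [set(rng.sample(items, m)) for _ in range(n_agents)]


def s_overlap(items, n_agents, m, rng):
    """Greedy stand-in for s-overlap: prefer the least-allocated items."""
    counts = {item: 0 for item in items}
    allocations = []
    for _ in range(n_agents):
        candidates = list(items)
        rng.shuffle(candidates)                        # random tie-breaking
        candidates.sort(key=lambda item: counts[item])  # stable sort by usage
        chosen = set(candidates[:m])
        for item in chosen:
            counts[item] += 1
        allocations.append(chosen)
    return allocations


def e_random(requests, n_fake, rng):
    """e-random sketch: keep the requested items, add n_fake fake items at random."""
    allocations = [set(r) for r in requests]
    for k in range(n_fake):
        target = rng.randrange(len(allocations))
        allocations[target].add(f"fake-{k}")  # each fake item goes to one agent
    return allocations


def total_overlap(allocations):
    """Sum of |R_i ∩ R_j| over all unordered pairs of agents."""
    return sum(
        len(a & b)
        for i, a in enumerate(allocations)
        for b in allocations[i + 1:]
    )


rng = random.Random(0)
profiles = list(range(8))
print(total_overlap(s_random(profiles, 4, 2, rng)))
print(total_overlap(s_overlap(profiles, 4, 2, rng)))
print(e_random([{1, 2}, {2, 3}], 3, rng))
```

With 8 profiles, 4 agents and 2 profiles per agent, the greedy variant always produces disjoint sets, while the random one may not. Fake items enlarge |Ri| without adding overlap, which lowers the ratio in (1) and (2) for explicit requests whose real items cannot be changed.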

[Figure: results for Sample Requests and Explicit Requests, plotting min over i ≠ j of Pr(Gi | S = Ri) − Pr(Gj | S = Ri).]