d leakage detection - stanford...

DATA LEAKAGE DETECTION

Panagiotis Papadimitriou and Hector Garcia-Molina

Stanford University

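[Figure: the distributor's profiles (Jeremy, Sarah, Mark, Kathryn; e.g., Name: Sarah, Sex: Female; Name: Mark, Sex: Male) are given to applications U1 and U2; a leaker may also obtain data from other sources, e.g., Mark's friend.]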

Leakage Problem

A data distributor, e.g., Facebook, owns a set T of private data items, e.g., FB profiles. The distributor gives to supposedly trusted agents U1, …, Un, e.g., Facebook Apps, the sets R1, …, Rn ⊆ T. A leaker obtains data from agents or from other sources and publishes a set S ⊆ T. An agent who provides the leaker with data is guilty.
• Given the leaked set S, what is the probability that Ui is guilty?
• How can the distributor allocate data items to agents so that he can detect guilty agents?

[Figure: the two guilt models, "Independently" and "All OR nothing", with outcome probabilities (1-p)^2, (1-p)p, p(1-p) and p^2.]

Guilt Models

p: Posterior probability that a leaked profile comes from other sources (other than the agents).
Pr(Gi|S): Probability that agent Ui is guilty, given the leaked set of profiles S.

Models’ Assumptions
• Agents leak each of their data items independently.
• Agents leak all their data items OR nothing.
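The poster does not state how Pr(Gi|S) is computed. As a minimal sketch under the independent-leak assumption, one plausible formulation treats each leaked item t held by Ui as coming from other sources with probability p, and otherwise as coming from any of the agents holding t with equal chance. The function name, the dictionary layout, the sample data and the formula itself are assumptions made for illustration; none of them appear on the poster.

```python
from math import prod


def guilt_probability(agent, leaked, allocations, p):
    """Estimate Pr(G_i | S) assuming agents leak items independently.

    ASSUMPTION (not stated on the poster): for each leaked item t that the
    agent holds, the agent is the source with probability (1 - p) / |V_t|,
    where V_t is the set of agents that were given t.
    """
    own = allocations[agent]
    factors = []
    for item in leaked & own:
        # Number of agents that received this item (always >= 1 here).
        holders = sum(1 for items in allocations.values() if item in items)
        # Probability that this particular item did NOT come from the agent.
        factors.append(1 - (1 - p) / holders)
    # The agent is innocent only if none of the leaked items came from it.
    return 1 - prod(factors)


# Hypothetical data: two agents sharing item "a".
allocations = {"U1": {"a", "b"}, "U2": {"a", "c"}}
leaked = {"a", "b"}
g1 = guilt_probability("U1", leaked, allocations, p=0.2)
g2 = guilt_probability("U2", leaked, allocations, p=0.2)
```

With these hypothetical allocations, U1 holds both leaked items and is the only holder of "b", so its estimate (about 0.88) is well above that of U2 (about 0.4).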

[Figure: Pr(G1|S) and Pr(G2|S) under the two guilt models.]

Data Allocation Problem

Agents’ Requests
• Sample, e.g., any 100 Stanford profiles.
• Explicit, e.g., all people who added an application.

Objective

Allocate data to agents so that if Ui leaks his set Ri, then Pr(Gi|S) >> Pr(Gj|S) for i ≠ j.

Example
• 4 agents U1, U2, U3 and U4.
• Each agent requests a sample of (any) 2 profiles.

[Figure: three example allocations of profiles to U1, U2, U3 and U4.]
• Good: Agents U1 and U2 are not suspects if U3 or U4 leak data.
• Poor: All agents have the same guilt prob. in case of leakage.
• Optimal: Agent Ui who leaks its data has the highest guilt prob.

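$$\underset{R_1, \dots, R_n}{\text{minimize}} \quad \sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j| \tag{1}$$

OR

$$\underset{R_1, \dots, R_n}{\text{minimize}} \quad \max_{i \neq j} \frac{|R_i \cap R_j|}{|R_i|} \tag{2}$$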
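To make the two objectives concrete, the sketch below evaluates them on small allocations. It reads objective (1) as the sum over agents of the total overlap of Ri with every other set, divided by |Ri|, and objective (2) as the largest relative overlap |Ri ∩ Rj| / |Ri| over ordered pairs i ≠ j. The function names and the three sample allocations are hypothetical; the poster's figure does not specify which profiles each agent receives.

```python
from itertools import permutations


def sum_objective(allocs):
    """Objective (1): sum_i (1/|R_i|) * sum_{j != i} |R_i ∩ R_j|."""
    total = 0.0
    for i, ri in enumerate(allocs):
        overlap = sum(len(ri & rj) for j, rj in enumerate(allocs) if j != i)
        total += overlap / len(ri)
    return total


def max_objective(allocs):
    """Objective (2): max over ordered pairs i != j of |R_i ∩ R_j| / |R_i|."""
    # permutations works by position, so equal sets still count as distinct agents.
    return max(len(ri & rj) / len(ri) for ri, rj in permutations(allocs, 2))


# Hypothetical allocations for 4 agents requesting 2 profiles each.
same_pair = [{"t1", "t2"}] * 4                                  # everyone gets the same pair
two_groups = [{"t1", "t2"}, {"t1", "t2"}, {"t3", "t4"}, {"t3", "t4"}]
disjoint = [{"t1", "t2"}, {"t3", "t4"}, {"t5", "t6"}, {"t7", "t8"}]

scores = {
    "same_pair": (sum_objective(same_pair), max_objective(same_pair)),
    "two_groups": (sum_objective(two_groups), max_objective(two_groups)),
    "disjoint": (sum_objective(disjoint), max_objective(disjoint)),
}
```

Under this reading, lower values mean less overlap between agents, so the disjoint allocation scores best on both objectives, provided enough distinct profiles exist to make it possible.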

Allocation Strategies

Sample Requests
• s-random: Allocates at random.
• s-overlap: Minimizes sum of overlaps |Ri ∩ Rj|.
• s-sum: Minimizes (1).
• s-max: Minimizes (2).
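The poster names the sample-request strategies without giving their procedures. The following is a rough sketch of two of them, assuming each agent simply asks for a number of items: a random allocation, and a greedy heuristic that always hands out the items given to the fewest agents so far in order to keep overlaps small. Both function names and the greedy rule are illustrative assumptions, not the authors' algorithms.

```python
import random


def s_random(items, requests, seed=0):
    """Give each agent a uniformly random sample of the requested size."""
    rng = random.Random(seed)
    return [set(rng.sample(items, m)) for m in requests]


def s_overlap_greedy(items, requests):
    """Greedy heuristic (ASSUMPTION): prefer items held by the fewest agents."""
    counts = {t: 0 for t in items}  # how many agents already hold each item
    allocs = []
    for m in requests:
        # sorted() is stable, so ties are broken by the order of `items`.
        chosen = sorted(items, key=lambda t: counts[t])[:m]
        for t in chosen:
            counts[t] += 1
        allocs.append(set(chosen))
    return allocs


# Hypothetical example: 4 agents, 2 items each, 8 items available.
items = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"]
random_allocs = s_random(items, [2, 2, 2, 2])
greedy_allocs = s_overlap_greedy(items, [2, 2, 2, 2])
```

In this hypothetical setting the greedy rule produces four disjoint sets, while the random allocation may or may not overlap depending on the seed.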

Explicit Requests
• no fake: Allocates exactly the requested data items.
• e-random, e-optimal: In addition to the requested real items, they allocate B fake items. e-random allocates them at random. e-optimal minimizes (1) and (2).
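As a sketch of how fake items could be distributed for explicit requests, the code below assumes every fake item is unique and goes to exactly one agent, so it enlarges |Ri| without adding any overlap. The random variant picks the receiving agent uniformly; the greedy variant gives each fake item to the agent whose term in objective (1) drops the most. The function names, the "fake-k" labels and the greedy rule are assumptions for illustration, and every request is assumed to be non-empty.

```python
import random


def e_random(requests, budget, seed=0):
    """Add `budget` unique fake items, each to a uniformly random agent."""
    rng = random.Random(seed)
    allocs = [set(r) for r in requests]
    for k in range(budget):
        allocs[rng.randrange(len(allocs))].add(f"fake-{k}")
    return allocs


def e_greedy(requests, budget):
    """Greedy heuristic (ASSUMPTION): place each fake where objective (1) drops most."""
    allocs = [set(r) for r in requests]
    # Overlap of each agent's real items with all other agents; unique fakes
    # never change these values.
    overlaps = [
        sum(len(ri & rj) for j, rj in enumerate(allocs) if j != i)
        for i, ri in enumerate(allocs)
    ]

    def gain(i):
        # Decrease of overlap_i / |R_i| when |R_i| grows by one.
        size = len(allocs[i])
        return overlaps[i] * (1 / size - 1 / (size + 1))

    for k in range(budget):
        best = max(range(len(allocs)), key=gain)
        allocs[best].add(f"fake-{k}")
    return allocs


# Hypothetical explicit requests that share item "a".
requests = [{"a", "b"}, {"a"}]
random_allocs = e_random(requests, budget=3)
greedy_allocs = e_greedy(requests, budget=1)
```

In this example the single fake item goes to the second agent, whose whole set otherwise overlaps with the first agent's.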

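[Figure: two plots, "Sample Requests" and "Explicit Requests", labeled with the quantity below.]

$$\min \Delta = \min_{i \neq j} \left[ \Pr(G_i \mid S = R_i) - \Pr(G_j \mid S = R_i) \right]$$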
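One way to compare allocations is the smallest gap between the guilt of an agent Ui who leaks its whole set Ri and the guilt of any other agent Uj in that same scenario. The sketch below computes that gap. The guilt formula used (each leaked item comes from other sources with probability p, otherwise from one of its holders with equal chance) is an assumption, since the poster does not spell it out, and the function names and sample allocations are hypothetical.

```python
from math import prod


def guilt(i, leaked, allocs, p):
    """ASSUMED guilt estimate for agent i under the independent-leak model."""
    return 1 - prod(
        1 - (1 - p) / sum(t in r for r in allocs)  # holders of item t
        for t in leaked & allocs[i]
    )


def min_delta(allocs, p):
    """Smallest gap Pr(G_i | S = R_i) - Pr(G_j | S = R_i) over all i != j."""
    n = len(allocs)
    return min(
        guilt(i, allocs[i], allocs, p) - guilt(j, allocs[i], allocs, p)
        for i in range(n)
        for j in range(n)
        if i != j
    )


# Hypothetical allocations for 4 agents with 2 profiles each.
same_pair = [{"t1", "t2"}] * 4
disjoint = [{"t1", "t2"}, {"t3", "t4"}, {"t5", "t6"}, {"t7", "t8"}]

gap_same = min_delta(same_pair, p=0.5)
gap_disjoint = min_delta(disjoint, p=0.5)
```

A larger gap means the leaking agent stands out more clearly. When all agents hold the same pair the gap is zero, while the disjoint allocation gives a gap of 0.75 under these assumptions.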
