Data Leakage Detection by Akshay Vishwanathan (0801003) Joseph George (0801027) S. Prasanth (0801069) Guided by: Ms. Krishnapriya

Post on 29-Dec-2015

Data Leakage Detection: Introduction

• In the course of doing business, sometimes sensitive data must be handed over to supposedly trusted third parties.

• For example, a hospital may give patient records to researchers who will devise new treatments. We call the owner of the data the distributor and the supposedly trusted third parties the agents.

• Our goal is to detect when the distributor’s sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data.

Existing System

• We develop a model for assessing the “guilt” of agents.

• We also consider the option of adding “fake” objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members.

• If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent is guilty.
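The fake-object idea above can be sketched as a set intersection. This is only an illustration, not the authors' implementation: the function name, the agent labels, and the object identifiers are all hypothetical.

```python
def agents_with_leaked_fakes(leaked, fakes_by_agent):
    """Map each agent to the fake objects given to it that appear in the leaked set."""
    hits = {}
    for agent, fakes in fakes_by_agent.items():
        found = set(fakes) & set(leaked)
        if found:
            hits[agent] = found
    return hits

# Hypothetical allocation: f1, f2, f3 are fake objects, t4 and t9 are real ones.
fakes_by_agent = {"U1": {"f1"}, "U2": {"f2", "f3"}, "U3": set()}
leaked = {"t4", "t9", "f3"}

suspects = agents_with_leaked_fakes(leaked, fakes_by_agent)
print(suspects)  # {'U2': {'f3'}}
```

Because each fake object is given to one agent only in this sketch, a leaked fake object points directly at that agent.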

PROBLEM DEFINITION

• The distributor’s data allocation to agents has one constraint and one objective.

• The distributor’s constraint is to satisfy agents’ requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions.

• The distributor’s objective is to be able to detect an agent who leaks any portion of the data.

Problem Setup And Notation

• Entities and Agents: A distributor owns a set T = {t1, . . . , tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, . . . , Un, but does not wish the objects to be leaked to other third parties.

• Guilty Agents: Suppose that after giving objects to agents, the distributor discovers that a set S ⊆ T has leaked. This means that some third party, called the target, has been caught in possession of S.

Agent Guilt Model

• To compute the probability that the agent is guilty given set S, Pr{Gi|S}, we need an estimate for the probability that values in S can be “guessed” by the target.

• Assumption 1. For all t, t' ∈ S such that t ≠ t', the provenance of t is independent of the provenance of t'.

• Assumption 2. An object t ∈ S can only be obtained by the target in one of two ways:
    • A single agent Ui leaked t from its own Ri set; or
    • The target guessed (or obtained through other means) t without the help of any of the n agents.
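A minimal sketch of how Pr{Gi|S} could be computed under the two assumptions. The slides do not state the formula; the one used here, Pr{Gi|S} = 1 − ∏ over t in S ∩ Ri of (1 − (1 − p)/|Vt|), follows the Papadimitriou and Garcia-Molina paper listed in the references, where p is the probability that the target guessed an object and Vt is the set of agents holding t. It additionally assumes that all agents holding t are equally likely to have leaked it. The function name and the example data are hypothetical.

```python
def guilt_probability(agent, leaked, allocations, p):
    """Estimate Pr{Gi|S} for one agent.

    leaked      -- the leaked set S
    allocations -- dict mapping each agent to its set Ri
    p           -- assumed probability that the target guessed an object
    """
    prob_innocent = 1.0
    for t in leaked & allocations[agent]:
        # |Vt|: number of agents that were given object t
        holders = sum(1 for r in allocations.values() if t in r)
        # Probability that this agent did NOT leak t
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent

# Hypothetical example: two agents sharing t1, each with one private object.
allocations = {"U1": {"t1", "t2"}, "U2": {"t1", "t3"}}
leaked = {"t1", "t2", "t3"}

for agent in allocations:
    print(agent, round(guilt_probability(agent, leaked, allocations, p=0.2), 2))
# U1 0.88
# U2 0.88
```

An agent whose Ri does not overlap S gets probability 0, and raising p (objects easier to guess) lowers every agent's estimate.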

Disadvantages of the Existing System

• In a sense, the fake objects act as a type of watermark for the entire set. An agent who learns that fake objects exist can remove them, for example with software that strips watermarks from data.

• There is no way to notify the distributor when the data is leaked.

Proposed System

• We present algorithms for distributing objects to agents, in a way that improves our chances of identifying a leaker.

• We also design a system where an email is sent to the distributor when the fake object is downloaded by another agent.
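A rough sketch of the notification step, assuming the system logs each download and knows which agent every fake object was given to. The slides target ASP.NET; Python is used here only for illustration. The function names, the sender address, and the `fake_owner` mapping are hypothetical, and the message is only built, not sent.

```python
from email.message import EmailMessage

def build_leak_alert(fake_id, owner, downloader, distributor_addr):
    """Compose the alert e-mail for the distributor (no network access here)."""
    msg = EmailMessage()
    msg["To"] = distributor_addr
    msg["From"] = "leak-monitor@example.com"  # hypothetical sender address
    msg["Subject"] = f"Possible leak: fake object {fake_id} downloaded"
    msg.set_content(
        f"Fake object {fake_id} was allocated to {owner} "
        f"but has been downloaded by {downloader}."
    )
    return msg

def check_download(object_id, downloader, fake_owner, distributor_addr):
    """Return an alert if a fake object is downloaded by someone other than its owner."""
    owner = fake_owner.get(object_id)
    if owner is None or owner == downloader:
        return None  # real object, or the owner fetching its own data
    return build_leak_alert(object_id, owner, downloader, distributor_addr)

# Hypothetical bookkeeping: fake object f3 was given to agent U2.
fake_owner = {"f3": "U2"}
alert = check_download("f3", "U5", fake_owner, "distributor@example.com")
print(alert["Subject"])  # Possible leak: fake object f3 downloaded
```

To deliver the message, the returned object could be passed to `smtplib.SMTP(...).send_message(alert)` with the mail server details of the deployment.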

Advantages of the proposed system

• It is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and the data of other agents.

• The algorithms we have presented implement data distribution strategies that can improve the distributor’s chances of identifying a leaker.

Data Allocation Problem

• The main focus of the proposed system is the data allocation problem: how can the distributor “intelligently” give data to agents in order to improve the chances of detecting a guilty agent?

• The two types of requests we handle are sample and explicit. Fake objects are objects generated by the distributor that are not in set T but are designed to look like real objects. They are distributed to agents together with the T objects in order to increase the chances of detecting agents that leak data.
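The two request types can be sketched as follows, assuming T is a list of records, an explicit request is a predicate over records, and a sample request is a uniform random draw. The function names and the record fields are hypothetical.

```python
import random

def explicit_request(T, cond):
    """EXPLICIT(T, cond): every object in T that satisfies the condition."""
    return [t for t in T if cond(t)]

def sample_request(T, m, rng=None):
    """SAMPLE(T, m): any m objects from T; here chosen uniformly at random."""
    rng = rng or random.Random()
    return rng.sample(T, m)

# Hypothetical customer records.
T = [
    {"id": 1, "state": "CA"},
    {"id": 2, "state": "NY"},
    {"id": 3, "state": "CA"},
    {"id": 4, "state": "TX"},
]

california = explicit_request(T, lambda t: t["state"] == "CA")
two_random = sample_request(T, 2, rng=random.Random(1))

print([t["id"] for t in california])  # [1, 3]
print(len(two_random))                # 2
```

With explicit requests the distributor has no freedom in choosing the real objects, which is why fake objects matter most in that case; with sample requests the choice of subset is itself a tool for the distributor.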

Explicit Data Requests

• Explicit request Ri = EXPLICIT(T,condi): Agent Ui receives all T objects that satisfy condi.

Algorithm 1. Allocation for Explicit Data Requests (EF)
Input: R1, . . . , Rn, cond1, . . . , condn, b1, . . . , bn, B
Output: R1, . . . , Rn, F1, . . . , Fn

    R <- ∅    // Agents that can receive fake objects
    for i = 1, . . . , n do
        if bi > 0 then
            R <- R ∪ {i}
        Fi <- ∅
    while B > 0 do
        i <- SELECTAGENT(R, R1, . . . , Rn)
        f <- CREATEFAKEOBJECT(Ri, Fi, condi)
        Ri <- Ri ∪ {f}
        Fi <- Fi ∪ {f}
        bi <- bi - 1
        if bi = 0 then
            R <- R \ {i}
        B <- B - 1
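A runnable sketch of Algorithm 1 under stated assumptions. SELECTAGENT is implemented here as a uniform random choice, which is only one possible strategy; CREATEFAKEOBJECT is a hypothetical placeholder that returns a labelled string instead of a realistic record. The guard `and eligible` is an addition so the loop stops if B exceeds the total of the per-agent limits bi.

```python
import random

def select_agent(eligible, R_sets, rng):
    # Assumption: pick uniformly at random among agents that may still get fakes.
    return rng.choice(sorted(eligible))

def create_fake_object(R_i, F_i, cond_i, serial):
    # Hypothetical placeholder: a real system would generate a record that
    # satisfies cond_i and looks like the genuine objects in R_i.
    return f"fake-{serial}"

def allocate_explicit(R_sets, conds, b, B, rng=None):
    """Distribute up to B fake objects, at most b[i] to agent i."""
    rng = rng or random.Random()
    R_sets = [set(r) for r in R_sets]   # copy so the inputs are not modified
    b = list(b)
    F = [set() for _ in R_sets]         # F[i]: fake objects given to agent i
    eligible = {i for i, quota in enumerate(b) if quota > 0}
    serial = 0
    while B > 0 and eligible:
        i = select_agent(eligible, R_sets, rng)
        f = create_fake_object(R_sets[i], F[i], conds[i], serial)
        serial += 1
        R_sets[i].add(f)
        F[i].add(f)
        b[i] -= 1
        if b[i] == 0:
            eligible.discard(i)
        B -= 1
    return R_sets, F

R_out, F_out = allocate_explicit(
    R_sets=[{"t1", "t2"}, {"t1"}, {"t3"}],
    conds=[None, None, None],
    b=[2, 0, 1],
    B=3,
    rng=random.Random(0),
)
print([len(f) for f in F_out])  # [2, 0, 1]
```

In this example the budget B equals the sum of the limits, so every agent receives exactly its limit regardless of the random choices.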

Sample Data Requests

• Sample request Ri = SAMPLE(T, mi): Any subset of mi records from T can be given to Ui.

Algorithm 2. Allocation for Sample Data Requests (SF)
Input: m1, . . . , mn, |T|    // Assuming mi <= |T|
Output: R1, . . . , Rn

    a <- 0|T|    // zero vector of length |T|; a[k]: number of agents who have received object tk
    R1 <- ∅, . . . , Rn <- ∅
    remaining <- Σi=1 to n mi
    while remaining > 0 do
        for all i = 1, . . . , n : |Ri| < mi do
            k <- SELECTOBJECT(i, Ri)    // May also use additional parameters
            Ri <- Ri ∪ {tk}
            a[k] <- a[k] + 1
            remaining <- remaining - 1
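A runnable sketch of Algorithm 2, assuming SELECTOBJECT picks uniformly at random among the objects the agent does not yet hold. That is only one possible selection strategy; the helper names and the example data are hypothetical.

```python
import random

def select_object(i, held, size, rng):
    # Assumption: uniform random choice among indices agent i does not hold yet.
    return rng.choice([k for k in range(size) if k not in held])

def allocate_sample(m, T, rng=None):
    """Give agent i exactly m[i] distinct objects from T (requires m[i] <= len(T))."""
    rng = rng or random.Random()
    a = [0] * len(T)               # a[k]: number of agents that received T[k]
    held = [set() for _ in m]      # indices of the objects held by each agent
    remaining = sum(m)
    while remaining > 0:
        # One round: every agent that still needs objects receives one more.
        for i in range(len(m)):
            if len(held[i]) < m[i]:
                k = select_object(i, held[i], len(T), rng)
                held[i].add(k)
                a[k] += 1
                remaining -= 1
    R = [{T[k] for k in idx} for idx in held]
    return R, a

T = ["t1", "t2", "t3", "t4"]
R, a = allocate_sample(m=[2, 2, 1], T=T, rng=random.Random(0))
print([len(r) for r in R])  # [2, 2, 1]
print(sum(a))               # 5
```

The counter a is returned so that a smarter SELECTOBJECT could prefer objects held by few agents, which reduces the overlap between agents' sets.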

Software Requirements

Language : C#.NET
Technology : ASP.NET
IDE : Visual Studio 2008
Operating System : Microsoft Windows XP SP2
Backend : Microsoft SQL Server 2005

Hardware Requirements

Processor : Intel Pentium or higher
RAM : 512 MB (minimum)
Hard Disk : 40 GB

Conclusion

• In a perfect world there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so that we could trace its origins with absolute certainty.

References

• P. Papadimitriou and H. Garcia-Molina, “Data Leakage Detection,” IEEE Trans. on Knowledge and Data Engineering, pp. 51-63, 2011.

• P. Buneman and W.-C. Tan, “Provenance in Databases,” Proc. ACM SIGMOD, pp. 1171-1173, 2007.

• R. Agrawal and J. Kiernan, “Watermarking Relational Databases,” Proc. 28th Int’l Conf. Very Large Data Bases (VLDB ’02), VLDB Endowment, pp. 155-166, 2002.

• B. Mungamuru and H. Garcia-Molina, “Privacy, Preservation and Performance: The 3 P’s of Distributed Data Management,” technical report, Stanford Univ., 2008.

Thank You