Limiting Disclosure in Hippocratic Databases. Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt. VLDB, August 31, 2004

TRANSCRIPT

  • Slide 1
  • Limiting Disclosure in Hippocratic Databases. Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt. VLDB, August 31, 2004
  • Slide 2
  • Presentation Outline
      • Hippocratic Databases: a framework for managing privacy, including the problem of limiting disclosure
      • Overview of our proposal for integrating policy-driven disclosure control into an existing relational database environment
      • Brief discussion of alternative cell-level enforcement models
      • Optimized implementation of opt-in and opt-out choices
      • Overview of performance evaluation
      • Conclusions
  • Slide 3
  • Hippocratic Databases and Limited Disclosure
      • Hippocratic Databases have been proposed as a framework for managing privacy-sensitive information
      • Limited disclosure is one of the defining principles of this framework
      • Limited disclosure includes three main ideas:
          • Privacy Policy: organizations define a set of rules describing to whom data may be disclosed (recipients) and how the data may be used (purposes)
          • Consent: data subjects are given control over who may see their personal information and under what circumstances
          • Disclosure Control: the database ensures that the privacy policy and data subject consent are enforced with respect to all data access; this limits the outflow of information from the database
  • Slide 4
  • Motivating Example
      • Consider a group of athletes registering for a major international competition
      • Personal information is collected from each athlete, possibly including name, age, nationality, address, phone number, and visa status
      • Data must be managed according to the organizing committee's privacy policy:
          • Government officials are allowed to see visa information for the purpose of venue security
          • Team travel agents may see the contact information for athletes from their own country for making travel arrangements
          • The organizing committee may not disclose athletes' information to journalists without the athletes' consent
  • Slide 5
  • Limited Disclosure Framework Goals
      • Provide techniques for enforcing a broad class of privacy policy rules
      • Privacy policy enforcement should require little or no modification to existing application code
      • Policy rules should be stored and managed by the database
      • Provide limited disclosure enforcement at the cell level
  • Slide 6
  • Limited Disclosure Framework Overview
      • [Diagram components: Policy Definition, Privacy Meta-Data, Subject Consent, Consent Info, Data Table, Query, Query Modifier]
      • Start with an existing database environment with associated applications
      • Privacy policy is defined and stored in the database in privacy meta-data tables
      • When providing information, data subjects also provide consent for various data uses
      • Queries are modified so that results respect privacy policy and consent
  • Slide 7
  • Policy Definition
      • Privacy policy is defined using one of the following XML-based policy definition languages:
          • Platform for Privacy Preferences (P3P)
          • Enterprise Privacy Authorization Language (EPAL)
  • Slide 8
  • Privacy Meta-Data and Policy Meta-Language
      • Privacy meta-language for expressing the privacy policy in the database
          • Not tied to one particular policy language
          • Many practical P3P and EPAL policies can be translated to this language
      • A privacy policy is a set of rules; the condition attached to a rule must be a predicate that can be expressed in SQL
      • Privacy policy rules are stored in the database
  • Slide 9
  • Privacy Meta-Data Example
      • Government officials may see athletes' visa information for security purposes.
      • Journalists may only see athletes' names for the purpose of writing articles with explicit consent.

        Policy  Rule  Purpose   Recipient   Table     Column   CondID
        P1      R1    Security  Govt Off.   Athletes  Visa     -
        P1      R2    Security  Govt Off.   Athletes  Name     -
        P1      R3    Travel    Travel Ag.  Athletes  Name     -
        P1      R4    Travel    Travel Ag.  Athletes  Phone    -
        P1      R5    Articles  Journalist  Athletes  Name     C1
        P1      R6    Articles  Journalist  Athletes  Address  C2

        CondID  Predicate
        C1      EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Name_choice = 1)
        C2      EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Address_choice = 1)
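The meta-data tables above can be tried out directly. The following is a minimal sketch, not the authors' implementation: it assumes SQLite (through Python's `sqlite3` module) as a stand-in for the database, and the table names `policy_rules` and `conditions` as well as the helper `rules_for` are hypothetical names chosen for illustration. It loads the six example rules and looks up which columns a recipient may see for a given purpose, together with the SQL condition attached to each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policy_rules (policy TEXT, rule TEXT, purpose TEXT, recipient TEXT,
                           tbl TEXT, col TEXT, cond_id TEXT);
CREATE TABLE conditions (cond_id TEXT PRIMARY KEY, predicate TEXT);
""")

# Rules R1..R6 from the slide; cond_id is NULL for unconditional rules.
conn.executemany("INSERT INTO policy_rules VALUES (?,?,?,?,?,?,?)", [
    ("P1", "R1", "Security", "Govt Off.", "Athletes", "Visa", None),
    ("P1", "R2", "Security", "Govt Off.", "Athletes", "Name", None),
    ("P1", "R3", "Travel", "Travel Ag.", "Athletes", "Name", None),
    ("P1", "R4", "Travel", "Travel Ag.", "Athletes", "Phone", None),
    ("P1", "R5", "Articles", "Journalist", "Athletes", "Name", "C1"),
    ("P1", "R6", "Articles", "Journalist", "Athletes", "Address", "C2"),
])

# Conditions are stored as SQL predicate text, copied from the slide.
conn.executemany("INSERT INTO conditions VALUES (?,?)", [
    ("C1", "EXISTS (SELECT Name_choice FROM Athlete_choices "
           "WHERE Athletes.Athlete# = Athlete_choices.Athlete# "
           "AND Athlete_choices.Name_choice = 1)"),
    ("C2", "EXISTS (SELECT Name_choice FROM Athlete_choices "
           "WHERE Athletes.Athlete# = Athlete_choices.Athlete# "
           "AND Athlete_choices.Address_choice = 1)"),
])


def rules_for(purpose, recipient, table):
    """Return {column: predicate text, or None if the rule is unconditional}."""
    rows = conn.execute(
        """SELECT r.col, c.predicate
           FROM policy_rules r LEFT JOIN conditions c ON c.cond_id = r.cond_id
           WHERE r.purpose = ? AND r.recipient = ? AND r.tbl = ?""",
        (purpose, recipient, table)).fetchall()
    return dict(rows)


journalist_rules = rules_for("Articles", "Journalist", "Athletes")
official_rules = rules_for("Security", "Govt Off.", "Athletes")
```

A column that does not appear in the returned dictionary has no rule permitting it, so a query modifier built on this lookup would treat it as prohibited.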
  • Slide 10
  • Query Modification
      • Implemented two alternative algorithms for modifying queries to incorporate policy rules and consent information
      • Queries are modified in such a way that query results follow one of our cell-level semantic models
  • Slide 11
  • Enforcement Models
      • Row (tuple)-level enforcement is insufficient for enforcing arbitrary policies when existing database schemas were not designed with the policy in mind
  • Slide 12
  • An Example

        Table Athletes
        Athlete#  Name              Age  Address    Phone
        1         Michael Phelps    19   Baltimore  111-1111
        2         Natalie Coughlin  22   Berkeley   222-2222
        3         Ian Thorpe        23   Sydney     333-3333
        4         Jenny Thompson    31   New York   444-4444

        Consent information for journalists writing stories (X marks a prohibited cell)
        #  Athlete#  Name  Age  Address  Phone
        1
        2  X         X     X    X        X
        3            X     X
        4                  X    X        X
  • Slide 13
  • Row-Level Enforcement
      • Starting point: table Athletes (athletes 1 to 4) and the consent information for journalists writing stories, in which athlete 2 prohibits every cell, athlete 3 prohibits Name and Age, and athlete 4 prohibits Age, Address, and Phone
  • Slide 14
  • Row-Level Enforcement
      • Filter Athlete #2 because no consent is provided

        Result of row-level enforcement
        Athlete#  Name            Age  Address    Phone
        1         Michael Phelps  19   Baltimore  111-1111
        3         Ian Thorpe      23   Sydney     333-3333
        4         Jenny Thompson  31   New York   444-4444

        Consent information for journalists writing stories (X marks a prohibited cell)
        #  Athlete#  Name  Age  Address  Phone
        1
        2  X         X     X    X        X
        3            X     X
        4                  X    X        X

      • Row-level enforcement must either disclose prohibited information or restrict information that should be available!
  • Slide 15
  • Enforcement Models
      • Cell-level enforcement
          • Table Semantics model
          • Query Semantics model
  • Slide 16
  • Table Semantics Enforcement
      1. Mask prohibited cells with the null value
      2. Filter rows where the primary key is prohibited
      3. Conceptually, the query is performed on top of this view
  • Slide 17
  • Table Semantics Enforcement
      • SQL's null value represents "no value"
      • Null has desirable semantics for prohibited values:
          • Predicates applied to null never evaluate to true
          • Null does not join with other values
          • Null is not included when computing aggregates
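The three null properties listed above are easy to check on a toy table. This is a small sketch using SQLite through Python's `sqlite3` module; the table `t` and its three rows are made up for illustration, with the second row's age standing in for a masked (prohibited) cell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, age INTEGER)")
# The NULL age plays the role of a prohibited cell that has been masked.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("A", 19), ("B", None), ("C", 31)])

# Predicates on null are never true: B matches neither test nor its negation.
over_18 = conn.execute(
    "SELECT name FROM t WHERE age > 18 ORDER BY name").fetchall()
not_over_18 = conn.execute(
    "SELECT name FROM t WHERE NOT (age > 18)").fetchall()

# Null does not join: B's null age does not even match itself.
self_join_count = conn.execute(
    "SELECT COUNT(*) FROM t a JOIN t b ON a.age = b.age").fetchone()[0]

# Aggregates skip null: the average is taken over 19 and 31 only.
avg_age = conn.execute("SELECT AVG(age) FROM t").fetchone()[0]
counted = conn.execute("SELECT COUNT(age) FROM t").fetchone()[0]
```

Because a masked cell drops out of selections, joins, and aggregates in this way, it cannot influence the result of a query that runs on top of the masked table.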
  • Slide 18
  • Table Semantics Enforcement

        Table Athletes
        Athlete#  Name              Age  Address    Phone
        1         Michael Phelps    19   Baltimore  111-1111
        2         Natalie Coughlin  22   Berkeley   222-2222
        3         Ian Thorpe        23   Sydney     333-3333
        4         Jenny Thompson    31   New York   444-4444

        Consent Information (X marks a prohibited cell)
        #  Athlete#  Name  Age  Address  Phone
        1
        2  X         X     X    X        X
        3            X     X
        4                  X    X        X

        Step 1: mask prohibited cells with null
        Athlete#  Name            Age   Address    Phone
        1         Michael Phelps  19    Baltimore  111-1111
        null      null            null  null       null
        3         null            null  Sydney     333-3333
        4         Jenny Thompson  null  null       null

        Step 2: filter rows where the primary key is prohibited
        Athlete#  Name            Age   Address    Phone
        1         Michael Phelps  19    Baltimore  111-1111
        3         null            null  Sydney     333-3333
        4         Jenny Thompson  null  null       null
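The two steps above can be sketched as a view. This is an illustrative reconstruction under stated assumptions, not the query-modification algorithm from the talk: SQLite stands in for the database, `athlete_id` replaces the `Athlete#` column name, and the `athlete_choices` table (1 = consent, 0 = prohibited) and the view name `athletes_journalist` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                       address TEXT, phone TEXT);
INSERT INTO athletes VALUES
  (1, 'Michael Phelps',   19, 'Baltimore', '111-1111'),
  (2, 'Natalie Coughlin', 22, 'Berkeley',  '222-2222'),
  (3, 'Ian Thorpe',       23, 'Sydney',    '333-3333'),
  (4, 'Jenny Thompson',   31, 'New York',  '444-4444');

-- Consent for journalists, one 0/1 choice per cell (assumed layout).
CREATE TABLE athlete_choices (athlete_id INTEGER PRIMARY KEY, id_choice INTEGER,
                              name_choice INTEGER, age_choice INTEGER,
                              address_choice INTEGER, phone_choice INTEGER);
INSERT INTO athlete_choices VALUES
  (1, 1, 1, 1, 1, 1),
  (2, 0, 0, 0, 0, 0),
  (3, 1, 0, 0, 1, 1),
  (4, 1, 1, 0, 0, 0);

CREATE VIEW athletes_journalist AS
SELECT a.athlete_id,
       -- Step 1: CASE without ELSE yields NULL for prohibited cells.
       CASE WHEN c.name_choice = 1    THEN a.name    END AS name,
       CASE WHEN c.age_choice = 1     THEN a.age     END AS age,
       CASE WHEN c.address_choice = 1 THEN a.address END AS address,
       CASE WHEN c.phone_choice = 1   THEN a.phone   END AS phone
FROM athletes a JOIN athlete_choices c ON c.athlete_id = a.athlete_id
-- Step 2: drop rows whose primary key is prohibited.
WHERE c.id_choice = 1;
""")

rows = conn.execute(
    "SELECT * FROM athletes_journalist ORDER BY athlete_id").fetchall()
```

Any query a journalist's application issues would then run against `athletes_journalist` instead of `athletes`, which matches the "conceptually, the query is performed on top of this view" step.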
  • Slide 19
  • Enforcement Models
      • Cell-level enforcement
          • Table Semantics model
          • Query Semantics model
  • Slide 20
  • Query Semantics Enforcement
      1. Mask prohibited cells with the null value
      2. Execute the query on top of the masked table
      3. Filter rows that are entirely null from the result set
  • Slide 21
  • Query Semantics Enforcement
      • Mask prohibited cells with null:

        Athlete#  Name            Age   Address    Phone
        1         Michael Phelps  19    Baltimore  111-1111
        3         null            null  Sydney     333-3333
        4         Jenny Thompson  null  null       null

      • Issue query: SELECT Name, Age FROM Athletes
      • Filter rows that are entirely null from the result set

        Table Semantics result
        Name            Age
        Michael Phelps  19
        null            null
        Jenny Thompson  null

        Query Semantics result
        Name            Age
        Michael Phelps  19
        Jenny Thompson  null
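The difference between the two models for `SELECT Name, Age FROM Athletes` can be reproduced with a short sketch. The assumptions are the same as before and are not taken from the talk: SQLite as a stand-in database, `athlete_id` in place of `Athlete#`, and a hypothetical `athlete_choices` table and `masked` view. The tables are trimmed to the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO athletes VALUES (1, 'Michael Phelps', 19), (2, 'Natalie Coughlin', 22),
                            (3, 'Ian Thorpe', 23), (4, 'Jenny Thompson', 31);
CREATE TABLE athlete_choices (athlete_id INTEGER PRIMARY KEY, id_choice INTEGER,
                              name_choice INTEGER, age_choice INTEGER);
INSERT INTO athlete_choices VALUES (1, 1, 1, 1), (2, 0, 0, 0),
                                   (3, 1, 0, 0), (4, 1, 1, 0);

-- Shared first step: mask every prohibited cell, including the key.
CREATE VIEW masked AS
SELECT CASE WHEN c.id_choice = 1   THEN a.athlete_id END AS athlete_id,
       CASE WHEN c.name_choice = 1 THEN a.name       END AS name,
       CASE WHEN c.age_choice = 1  THEN a.age        END AS age
FROM athletes a JOIN athlete_choices c ON c.athlete_id = a.athlete_id;
""")

# Table semantics: keep every row whose primary key is visible.
table_semantics = conn.execute(
    "SELECT name, age FROM masked "
    "WHERE athlete_id IS NOT NULL ORDER BY athlete_id").fetchall()

# Query semantics: run the query, then drop rows that are null in every
# selected column.
query_semantics = conn.execute(
    "SELECT name, age FROM masked "
    "WHERE name IS NOT NULL OR age IS NOT NULL ORDER BY athlete_id").fetchall()
```

Under table semantics the journalist still sees an all-null row for athlete 3, whose key is visible; under query semantics that row disappears because none of the selected columns carries a value.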
  • Slide 22
  • Query Modification Example (Table Semantics)
      • Original query:

        SELECT Name
        FROM Athletes
        WHERE Name = 'Michael Phelps'

      • Modified query:

        SELECT CASE WHEN EXISTS
                 (SELECT Name_Choice
                  FROM Athlete_Choices
                  WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
                    AND Athlete_Choices.Name_Choice = 1)
               THEN Name ELSE null END
        FROM Athletes
        WHERE Name = 'Michael Phelps'
          AND EXISTS
                 (SELECT Athlete#_Choice
                  FROM Athlete_Choices
                  WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
                    AND Athlete_Choices.Athlete#_Choice = 1)
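One way to automate a rewrite of this kind is to replace the table reference with a masked derived table. The sketch below is a swapped-in variant, not either of the two algorithms mentioned in the talk: unlike the inline form on the slide, it evaluates the user's predicate against the masked columns. The function `masked_source`, the `<column>_choice` naming convention, and `athlete_id` (in place of `Athlete#`) are assumptions for illustration, and identifiers are interpolated without escaping, so they must come from trusted meta-data.

```python
import sqlite3


def masked_source(table, key, columns, choices):
    """Return SQL for a derived table exposing `columns` of `table`, masked.

    Assumes `choices` holds one row per key value with a 0/1 column named
    <column>_choice for every column, and <key>_choice for the key itself.
    """
    def consent(col):
        return (f"EXISTS (SELECT 1 FROM {choices} ch "
                f"WHERE ch.{key} = t.{key} AND ch.{col}_choice = 1)")

    cells = ", ".join(
        f"CASE WHEN {consent(c)} THEN t.{c} ELSE NULL END AS {c}"
        for c in columns)
    # Rows whose primary key is prohibited are filtered out entirely.
    return f"(SELECT {cells} FROM {table} t WHERE {consent(key)})"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO athletes VALUES (1, 'Michael Phelps'), (2, 'Natalie Coughlin'),
                            (3, 'Ian Thorpe'), (4, 'Jenny Thompson');
CREATE TABLE athlete_choices (athlete_id INTEGER PRIMARY KEY,
                              athlete_id_choice INTEGER, name_choice INTEGER);
INSERT INTO athlete_choices VALUES (1, 1, 1), (2, 0, 0), (3, 1, 0), (4, 1, 1);
""")

source = masked_source("athletes", "athlete_id", ["name"], "athlete_choices")


def find_by_name(name):
    # Original query: SELECT name FROM athletes WHERE name = ?
    sql = f"SELECT name FROM {source} AS athletes WHERE name = ?"
    return conn.execute(sql, (name,)).fetchall()
```

With this data, `find_by_name("Michael Phelps")` returns his row, while a search for "Ian Thorpe" returns nothing: his name is masked to null before the predicate is applied, so the query cannot confirm that he is in the table.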
  • Slide 23
  • Database-level disclosure control
      • The database is the best place to enforce limited disclosure
      • More efficient, flexible, and secure than an application-level approach
          • Need not fetch prohibited data from the database
          • When applied naively at the cell level, an application-level approach leads to privacy leaks
          • Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30
  • Slide 24
  • Example: Difficulties of application-level disclosure control

        Table Athletes
        Athlete#  Name              Age  Address    Phone
        1         Michael Phelps    19   Baltimore  111-1111
        2         Natalie Coughlin  22   Berkeley   222-2222
        3         Ian Thorpe        23   Sydney     333-3333
        4         Jenny Thompson    31   New York   444-4444

        Consent Information (X marks a prohibited cell)
        #  Athlete#  Name  Age  Address  Phone
        1
        2  X         X     X    X        X
        3            X     X
        4                  X    X        X

      • Step 1: query the database; retrieve results to the application

        Name            Age
        Jenny Thompson  31

      • Step 2: check policy and consent info; replace prohibited cells with null

        Name            Age
        Jenny Thompson  null

      • Based on this query, it is easy to infer that Jenny Thompson's age is greater than 30!
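The leak described above can be reproduced in a few lines. This sketch assumes SQLite as a stand-in database and hypothetical names (`athlete_id` for `Athlete#`, an `athlete_choices` table with 0/1 choices, and the helper `mask_in_app`); it contrasts masking after the fact in the application with masking before the predicate is evaluated in the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO athletes VALUES (1, 'Michael Phelps', 19), (2, 'Natalie Coughlin', 22),
                            (3, 'Ian Thorpe', 23), (4, 'Jenny Thompson', 31);
CREATE TABLE athlete_choices (athlete_id INTEGER PRIMARY KEY, id_choice INTEGER,
                              name_choice INTEGER, age_choice INTEGER);
INSERT INTO athlete_choices VALUES (1, 1, 1, 1), (2, 0, 0, 0),
                                   (3, 1, 0, 0), (4, 1, 1, 0);
""")


def mask_in_app(rows):
    """Naive application-level control: null out prohibited cells afterwards."""
    masked = []
    for athlete_id, name, age in rows:
        name_ok, age_ok = conn.execute(
            "SELECT name_choice, age_choice FROM athlete_choices "
            "WHERE athlete_id = ?", (athlete_id,)).fetchone()
        masked.append((name if name_ok else None, age if age_ok else None))
    return masked


# Application level: the predicate runs on the raw age, masking comes later.
raw = conn.execute(
    "SELECT athlete_id, name, age FROM athletes WHERE age > 30").fetchall()
app_result = mask_in_app(raw)

# Database level: cells are masked first, so the predicate sees null.
db_result = conn.execute("""
    SELECT name, age
    FROM (SELECT CASE WHEN c.name_choice = 1 THEN a.name END AS name,
                 CASE WHEN c.age_choice = 1  THEN a.age  END AS age
          FROM athletes a JOIN athlete_choices c ON c.athlete_id = a.athlete_id
          WHERE c.id_choice = 1)
    WHERE age > 30""").fetchall()
```

The application-level result still contains a row for Jenny Thompson with a null age, which reveals that her age exceeds 30; the database-level result is empty because the comparison against her masked age is never true.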
  • Slide 25
  • Database-level disclosure control
      • The database is a logical place to enforce limited disclosure
      • More efficient and flexible than an application-level rule engine approach
          • Need not fetch prohibited data from the database
          • When applied naively at the cell level, an application-level approach leads to privacy leaks
          • Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30
          • The alternative approach performs much of the query processing in the application
          • It is even more complicated to compute aggregates and joins when some cells are prohibited!
  • Slide 26
  • Optimized Implementation of Opt-in and Opt-out Conditions
      • SQL queries offer much flexibility for defining disclosure conditions
      • In practice, simple opt-in and opt-out choices are often used to express subject consent, which makes them extremely important
          • Sufficient for expressing P3P policy rules
          • Sufficient for expressing many HIPAA-mandated policies, for example
      • Implemented several techniques for storing consent and optimizing this type of condition
  • Slide 27
  • Optimized Implementation of Opt-in and Opt-out Conditions
      • Several alternative storage techniques:
          • Internal column (inline) representation
          • External, single table representation
          • External, multiple table representation
  • Slide 28
  • Optimized Implementation of Opt-in and Opt-out Conditions
      • Internal column representation: table Athletes carries one choice column per data column (Athlete#, Name, Age, Address, Phone); only some choice values survive in the transcript and are listed in order

        Table Athletes
        Athlete#  Name              Age  Address    Phone     Choice values (as transcribed)
        1         Michael Phelps    19   Baltimore  111-1111  yes
        2         Natalie Coughlin  23   Berkeley   222-2222  no
        3         Ian Thorpe        23   Sydney     333-3333  yes, no, yes
        4         Jenny Thompson    31   New York   444-4444  yes, no
  • Slide 29
  • Optimized Implementation of Opt-in and Opt-out Conditions
      • External, single table representation: one Consent Table with columns ID, Athlete#, Name, Age, Address, Phone; only some choice values survive in the transcript and are listed in order

        Table Athletes
        Athlete#  Name              Age  Address    Phone
        1         Michael Phelps    19   Baltimore  111-1111
        2         Natalie Coughlin  23   Berkeley   222-2222
        3         Ian Thorpe        23   Sydney     333-3333
        4         Jenny Thompson    31   New York   444-4444

        Consent Table
        ID  Choice values (as transcribed)
        1   yes
        2   no
        3   yes, no, yes
        4   no
  • Slide 30
  • Optimized Implementation of Opt-in and Opt-out Conditions
      • External, multiple table representation: one positive consent table per column, listing the athletes who consent

        Table Athletes
        Athlete#  Name              Age  Address    Phone
        1         Michael Phelps    19   Baltimore  111-1111
        2         Natalie Coughlin  23   Berkeley   222-2222
        3         Ian Thorpe        23   Sydney     333-3333
        4         Jenny Thompson    31   New York   444-4444

        Positive Consent Tables
        Athlete#:  1, 3, 4
        Name:      1, 4
        Age:       1
        Address:   1, 3
        Phone:     1, 3
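The external, multiple table representation can be sketched as follows. The consent data is taken from the slide; everything else is an assumption for illustration: SQLite as the database, `athlete_id` in place of `Athlete#`, and the table names `consent_id`, `consent_name`, and `consent_phone`. Only the Name and Phone choices are enforced here, so only those consent tables are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                       address TEXT, phone TEXT);
INSERT INTO athletes VALUES
  (1, 'Michael Phelps',   19, 'Baltimore', '111-1111'),
  (2, 'Natalie Coughlin', 23, 'Berkeley',  '222-2222'),
  (3, 'Ian Thorpe',       23, 'Sydney',    '333-3333'),
  (4, 'Jenny Thompson',   31, 'New York',  '444-4444');

-- One positive consent table per column: a row means "consent given".
CREATE TABLE consent_id    (athlete_id INTEGER PRIMARY KEY);
CREATE TABLE consent_name  (athlete_id INTEGER PRIMARY KEY);
CREATE TABLE consent_phone (athlete_id INTEGER PRIMARY KEY);
INSERT INTO consent_id    VALUES (1), (3), (4);
INSERT INTO consent_name  VALUES (1), (4);
INSERT INTO consent_phone VALUES (1), (3);
""")

# Each enforced choice costs one lookup in its own consent table.
rows = conn.execute("""
    SELECT a.athlete_id,
           CASE WHEN EXISTS (SELECT 1 FROM consent_name n
                             WHERE n.athlete_id = a.athlete_id)
                THEN a.name END AS name,
           CASE WHEN EXISTS (SELECT 1 FROM consent_phone p
                             WHERE p.athlete_id = a.athlete_id)
                THEN a.phone END AS phone
    FROM athletes a
    WHERE EXISTS (SELECT 1 FROM consent_id k
                  WHERE k.athlete_id = a.athlete_id)
    ORDER BY a.athlete_id""").fetchall()
```

In this layout a query touches only the consent tables for the columns it selects, but every additional enforced choice adds another lookup against a separate table.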
  • Slide 31
  • Overview of Performance Experiments
      • Implemented query modification algorithms on top of DB2 version 8.1
      • Focused on measuring performance for unconditional rules and for rules with opt-in and opt-out choices
      • Experimental setup:
          • Synthetic dataset based on the Wisconsin Benchmark
          • Dual-processor 1.8 GHz AMD machine running Windows 2000 Server
          • 2 gigabytes of memory
          • 50-megabyte buffer pool
      • Queries were run warm and cold
      • Here we report the warm numbers (error less than 5% with 95% confidence)
  • Slide 32
  • [Figure: elapsed time (seconds) versus choice selectivity (%) for three series: Unmodified, Modified Internal, Modified External Multiple]
      • Measured performance of a query selecting all records from a 5 million-record table
      • Compared performance of original and modified queries for varied choice selectivity
      • Not surprisingly, performance is actually better for modified queries when we use privacy enforcement as an additional selection condition
          • Able to use indexes on choice values
      • Shows the importance of database-level privacy enforcement for performance
  • Slide 33
  • Measured overhead cost using a query that selects all records
      • Choice selectivity = 100%
      • This is the worst-case scenario: no rows are filtered due to privacy constraints, but the query still incurs all costs of cell-level checking
      • In the chart, the full bar represents elapsed time and the bottom portion of the bar is CPU time
      • Much of the cost of privacy enforcement is CPU cost, so it scales well as queries become more I/O intensive
  • Slide 34
  • Additional Performance Results
      • Cost of rewriting queries is small
          • Must only be done once if the query is pre-compiled
      • Found that the query semantics enforcement model is often faster than table semantics because more rows are frequently filtered
      • Tradeoffs between choice storage techniques:
          • Number of choices stored for a particular table: as more choices are stored, performance of the internal representation suffers
          • Number of choices enforced for a particular query: as more choices are enforced, performance of the external multiple representation suffers
      • Tradeoffs between query modification algorithms
          • Described in the paper
  • Slide 35
  • Conclusions
      • Limited disclosure is a necessary component of a comprehensive data privacy management system
      • Proposed a framework for enforcing limited disclosure at the database level
          • More efficient and flexible than application-level disclosure control
          • Techniques also have broader use for other applications requiring policy-driven, fine-grained disclosure control
      • The framework can be deployed to an existing environment with minimal modification to legacy applications and existing schemas
      • Query modification and consent storage approaches are efficient enough to be viable in practice
  • Slide 36
  • Questions