Paper Authors: Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu
Presented By: Camille Gaspard
Originally taken from:
pages.cpsc.ucalgary.ca/~hammad/Fall04-00_files/Reg_Hippocratic%20Databases2.ppt
Hippocratic Databases
2
Introduction
Privacy with respect to databases
Related work
Founding principles of Hippocratic databases
Strawman design
Future research
Limiting Disclosure in Hippocratic Databases
Hippocratic Oath: “And about whatever I may see or hear in treatment, or even without treatment, in the life of human beings – things that should not ever be blurted out outside – I will remain silent, holding such things to be unutterable.”
3
Dramatic and escalating increase in digital data
Lack of privacy awareness, lax security
We need to re-architect database systems to include responsibility for the privacy of data.
Overview
Digitization
Storage
Processors
Networks
Techniques
Automatic collection
4
Privacy is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others.
Computer security is the prevention of, or protection against, access to information by unauthorized recipients, and intentional but unauthorized destruction or alteration of that information.
Vocabulary
5
Traditional databases
Statistical databases
Secure databases
Related Work
6
Fundamental to a database system:
1. Ability to manage persistent data.
2. Ability to access a large amount of data efficiently.
Universal capabilities of a database system:
1. Support for at least one data model.
2. Support for certain high-level languages that allow the user to define the structure of data, access data, and manipulate data.
3. Transaction management, the capability to provide correct, concurrent access to the database by many users at once.
4. Access control, the ability to deny access to data by unauthorized users and the ability to check the validity of the data.
5. Resiliency, the ability to recover from system failures without losing data.
Traditional Databases
7
Hippocratic databases require all the capabilities provided by current database systems
Different focus
Need to rethink data definition and query languages, query processing, indexing and storage structures, and access control mechanisms
Traditional Databases
• Privacy
• Consented sharing
• Efficiency
• Maximizing Concurrency
• Resiliency
8
Goal: Provide statistical information (sum, count, average, maximum, minimum, pth percentile) without compromising sensitive information about individuals
Query restriction
• Restricting the size of query results
• Controlling the overlap among successive queries
• Keeping audit trails of all answered queries and checking for possible compromises
Data perturbation
• Swapping values between records
• Replacing the original database by a sample from the same distribution
• Adding noise to the values in the database
• Adding noise to the results of a query
Hippocratic databases share the goal of preventing disclosure of private information but the class of queries for Hippocratic databases is much broader.
Statistical Databases
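Two of the techniques listed for statistical databases, restricting the size of query results and adding noise to the results of a query, can be illustrated with a minimal Python sketch. The function names, the threshold of 5, and the choice of Gaussian noise are assumptions made for illustration; the slides do not prescribe any of them.

```python
import random

# Hypothetical threshold: refuse to answer if fewer records contribute.
MIN_QUERY_SET_SIZE = 5


def restricted_average(values, min_size=MIN_QUERY_SET_SIZE):
    """Query restriction: answer an average only for large enough query sets."""
    if len(values) < min_size:
        raise PermissionError("query set too small; answer could expose an individual")
    return sum(values) / len(values)


def noisy_average(values, scale=1.0, rng=None):
    """Data perturbation: add zero-mean Gaussian noise to the query result."""
    rng = rng or random.Random()
    return sum(values) / len(values) + rng.gauss(0, scale)
```

A caller would see an exception for a query that touches only two salaries, while a query over many records returns either the exact average or a slightly perturbed one, depending on which function is used.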
9
Secure Databases
Sensitive information is transmitted over a secure channel and stored securely
Access controls
Encryption
Multilevel secure databases
• Multiple levels of security (top secret, secret, confidential, unclassified)
• E.g., Bell-LaPadula model (no read up, no write down)
High Sensitivity Level
Medium Sensitivity Level
Low Sensitivity Level
Write OK
Read OK
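The two Bell-LaPadula rules named above (no read up, no write down) reduce to comparisons between the subject's level and the object's level. The sketch below assumes the four levels from the slide and uses hypothetical function names.

```python
from enum import IntEnum


class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def can_read(subject: Level, obj: Level) -> bool:
    """No read up: a subject may read only at or below its own level."""
    return subject >= obj


def can_write(subject: Level, obj: Level) -> bool:
    """No write down: a subject may write only at or above its own level."""
    return subject <= obj
```

For example, a SECRET subject may read a CONFIDENTIAL object but may not write to it, so information cannot flow from a higher level to a lower one.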
10
United States Privacy Act of 1974 requires federal agencies to
1. permit an individual to determine what records pertaining to him are collected, maintained, used, or disseminated;
2. permit an individual to prevent records pertaining to him obtained for a particular purpose from being used or made available for another purpose without his consent;
3. permit an individual to gain access to information pertaining to him in records, and to correct or amend such records;
4. collect, maintain, use or disseminate any record of personally identifiable information in a manner that assures that such action is for a necessary and lawful purpose, that the information is current and accurate for its intended use, and that adequate safeguards are provided to prevent misuse of such information;
5. permit exemptions from the requirements with respect to the records provided in this Act only in those cases where there is an important public policy need for such exemption as has been determined by specific statutory authority; and
6. be subject to civil suit for any damages which occur as a result of willful or intentional action which violates any individual’s right under this Act.
Privacy Regulations
11
Recent privacy documents
1996 Health Insurance Portability and Accountability Act (HIPAA)
1999 Gramm-Leach-Bliley Financial Services Modernization Act
2000 Canada Personal Information Protection and Electronic Documents Act (PIPEDA)
Privacy Regulations
12
Guidelines
Collection
Retention
Use
Disclosure
Example: Grad student information at the university
13
1. Purpose Specification. For personal information stored in the database, the purposes for which the information has been collected shall be associated with that information.
2. Consent. The purposes associated with personal information shall have consent of the donor of the personal information.
3. Limited Collection. The personal information collected shall be limited to the minimum necessary for accomplishing the specified purposes.
4. Limited Use. The database shall run only those queries that are consistent with the purposes for which the information has been collected.
5. Limited Disclosure. The personal information stored in the database shall not be communicated outside the database for purposes other than those for which there is consent from the donor of the information.
Ten Founding Principles
14
6. Limited Retention. Personal information shall be retained only as long as necessary for the fulfillment of the purposes for which it has been collected.
7. Accuracy. Personal information stored in the database shall be accurate and up-to-date.
8. Safety. Personal information shall be protected by security safeguards against theft and other misappropriations.
9. Openness. A donor shall be able to access all information about the donor stored in the database.
10. Compliance. A donor shall be able to verify compliance with the above principles. Similarly, the database shall be able to address a challenge concerning compliance.
Ten Founding Principles
15
Use scenario: Mississippi Bookstore
Mississippi is an on-line bookseller who needs to obtain certain minimum personal information to complete a purchase transaction. This information includes name, shipping address, and credit card number. Mississippi also needs an email address to notify the customer of the status of the order.
Mississippi uses the purchase history of customers to offer book recommendations on its site. It also publishes information about books popular in the various regions of the country (purchase circles).
Strawman Design
16
The Characters
Name: Super Mario
Privacy pragmatist
Likes the convenience of providing his email and shipping address only once by registering at Mississippi. He also likes recommendations but he does not want his transactions used for purchase circles.
17
The Characters
Name: Princess Peach
Privacy fundamentalist
Does not want Mississippi to retain any information once her purchase transaction is complete.
18
The Characters
Name: Luigi
Mississippi employee with questionable ethics
The database and privacy officer must ensure that he is not able to obtain more information than he is supposed to.
19
Privacy Metadata Schema
Database Schema
Privacy-Policies Table
Privacy Metadata
20
Privacy-Authorizations Table
Privacy Metadata
This object is copied from the original paper.
21
This design may be too restrictive or may not fit in some situations.
May create a new table with the columns
• User
• Table
• Attribute
• Purpose
• External recipient
Alternate Organizations
22
This object is copied from the original paper.
23
Privacy Constraint Validator checks whether the business’s privacy policy is acceptable to the user
Example: If Princess Peach required a 2-week retention period and Mississippi’s policy retained the data for longer, the database would reject the transaction
Data is inserted with the purpose for which it may be used
Data Accuracy Analyzer addresses the Principle of Accuracy. For example, verify that the postal code corresponds to the street address.
Data Collection
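The retention example for the Privacy Constraint Validator can be sketched as a comparison between what the business's policy asks for and what the user accepts. The dictionary layout (purpose mapped to retention days) and the function name are assumptions for illustration, not the validator described in the paper.

```python
def policy_acceptable(policy_retention, user_limits):
    """Return True if every purpose in the policy is accepted by the user
    and its retention period does not exceed the user's limit.

    policy_retention: purpose -> days the business wants to keep the data
    user_limits:      purpose -> maximum days the user accepts
    """
    for purpose, days in policy_retention.items():
        if purpose not in user_limits:
            return False  # no consent for this purpose
        if days > user_limits[purpose]:
            return False  # business keeps data longer than the user allows
    return True


# Hypothetical numbers: Mississippi keeps purchase data for 30 days.
mississippi_policy = {"purchase": 30}
peach_limits = {"purchase": 14}   # 2-week retention requirement
mario_limits = {"purchase": 365}
```

With these values, `policy_acceptable(mississippi_policy, peach_limits)` is False, so the transaction would be rejected, while the same policy is acceptable to Mario.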
24
Queries are submitted to the database along with their purpose. Example: recommendations
Before query execution: Attribute Access Control checks privacy-authorizations table for a match on purpose, attribute and user.
During query execution: Record Access Control ensures that only records whose purpose attribute includes the query’s purpose will be visible to the query.
Queries
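The two checks described above can be sketched in a few lines: an attribute-level check against the privacy-authorizations entries before execution, and a record-level filter on each record's purposes during execution. The tuple layout of the authorizations, the user name `mining`, and the record fields are hypothetical.

```python
# Hypothetical privacy-authorizations entries: (purpose, table, attribute, user).
AUTHORIZATIONS = {
    ("recommendations", "order", "book_info", "mining"),
    ("purchase", "order", "book_info", "shipping"),
}


def attribute_access_ok(user, purpose, table, attributes):
    """Attribute Access Control: every requested attribute must be authorized."""
    return all((purpose, table, a, user) in AUTHORIZATIONS for a in attributes)


def run_query(user, purpose, table, attributes, records):
    """Run a projection query tagged with a purpose."""
    if not attribute_access_ok(user, purpose, table, attributes):
        raise PermissionError("no authorization for this purpose, attribute and user")
    # Record Access Control: only records whose purposes include the query's purpose.
    visible = [r for r in records if purpose in r["purposes"]]
    return [{a: r[a] for a in attributes} for r in visible]


orders = [
    {"customer": "Mario", "book_info": "Plumbing 101",
     "purposes": {"purchase", "recommendations"}},
    {"customer": "Peach", "book_info": "Castle Design",
     "purposes": {"purchase"}},
]
```

A recommendations query by `mining` sees only Mario's order, because Peach's record does not carry the recommendations purpose; the same query issued by an unauthorized user fails before execution.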
25
After query execution: Query Intrusion Detector is run on the query results to spot queries whose access pattern is different from the usual access pattern for queries with that purpose and by that user.
Detector uses the Query Intrusion Model built by analyzing past queries for each purpose and each authorized user.
An audit trail of all queries is maintained for external privacy audits, as well as addressing challenges regarding compliance.
Queries
26
Data Retention Manager deletes data items that have outlived their purpose.
Data Collection Analyzer examines the set of queries for each purpose to determine if any information is being collected but not used. This supports the Principle of Limited Collection.
The Data Collection Analyzer (DCA) also determines if data is being kept for longer than necessary. (Limited Retention)
DCA determines if people have unused (unnecessary) authorizations to issue queries with a given purpose. (Limited Use)
Encryption Support allows some data items to be stored in encrypted form to guard against snooping.
Other Features
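A minimal sketch of the Data Retention Manager's job, assuming each data item records its purpose and collection date and that each purpose has a retention period in days. The field names and retention values are hypothetical.

```python
from datetime import date, timedelta


def purge_expired(records, retention_days, today):
    """Keep only records whose retention period has not yet elapsed.

    records:        list of dicts with "purpose" and "collected" (a date)
    retention_days: purpose -> number of days the data may be kept
    """
    kept = []
    for r in records:
        limit = timedelta(days=retention_days[r["purpose"]])
        if today - r["collected"] <= limit:
            kept.append(r)
    return kept


records = [
    {"owner": "Peach", "purpose": "purchase", "collected": date(2024, 1, 1)},
    {"owner": "Mario", "purpose": "registration", "collected": date(2024, 1, 1)},
]
retention = {"purchase": 14, "registration": 365}
```

Running `purge_expired(records, retention, date(2024, 2, 1))` keeps only Mario's registration record, since the purchase record has outlived its 14-day period. As a later slide notes, removing the same information from logs, checkpoints, and backups is the harder part.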
27
P3P: Platform for Privacy Preferences (emerging standard developed by the WWW Consortium)
P3P provides a way for a web site to encode its data-collection practices in an XML P3P policy
The site’s policy is programmatically compared to a user’s privacy preferences
How to enforce?
Integrate with Hippocratic databases
The Electronic Privacy Information Center (EPIC) has been critical of P3P and believes P3P makes it too difficult for users to protect their privacy
P3P and Hippocratic Databases
28
29
P3P language insufficient
Developed for web shopping; language is restricted
P3P is a good starting point for a language which can be used in a wider variety of environments such as finance, insurance, and health care
Difficult to find balance between expressibility and usability
Work is being done to arrange purposes in a hierarchy rather than the flat space that P3P uses
Other research treats the problem as a coalition game [Kleinberg et al. 2001] where information is valuable and users are compensated for revealing some private information.
New Challenges – Language
30
Current databases have undergone years of optimization without concern for privacy. What type of performance hit will integrated privacy checking entail?
Some techniques from multilevel secure databases will apply
Technique that can reduce the cost of each check:
• Assume number of purposes is 32 or less
• Encode the set of purposes by setting a bit in a word
• Then the Record Access Control check only requires a bit-wise AND and XOR of two words with a check that the result is 0
Storage of purpose – space versus efficiency
New Challenges – Efficiency
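The bit-encoding technique can be sketched as follows. The purpose names and their bit positions are assumptions; the check itself follows the slide: AND the record's word with the query's word, XOR with the query's word, and test for 0, which holds exactly when every purpose bit of the query is set in the record.

```python
# Hypothetical purposes; at most 32 so that a set fits in one 32-bit word.
PURPOSES = ["purchase", "registration", "recommendations", "purchase_circles"]
assert len(PURPOSES) <= 32
BIT = {name: 1 << i for i, name in enumerate(PURPOSES)}


def encode(purposes):
    """Encode a set of purposes as a word with one bit per purpose."""
    word = 0
    for p in purposes:
        word |= BIT[p]
    return word


def record_visible(record_word, query_word):
    """Record Access Control: (record AND query) XOR query must be 0."""
    return ((record_word & query_word) ^ query_word) == 0


mario = encode(["purchase", "registration", "recommendations"])
peach = encode(["purchase"])
```

A query with purpose `recommendations` passes the check for Mario's record and fails for Peach's, using two bit-wise operations and one comparison per record.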
31
Access Analysis: Analyze the queries for each purpose and identify attributes that are collected for a given purpose but not used.
Problem: Necessity of one attribute may depend on others
Granularity Analysis: Analyze the queries for each purpose and numeric attribute and determine the granularity at which information is needed.
Minimal Query Generation: Generate the minimal query that is required to solve a given problem.
Challenges – Limited Collection
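Access analysis, in its simplest form, is a set difference between the attributes collected for a purpose and the attributes that queries with that purpose actually touch. The sketch below assumes a query log of (purpose, attributes) pairs, which is a hypothetical format, and it ignores the dependency problem noted above.

```python
def unused_attributes(collected, query_log):
    """Return, per purpose, the attributes collected but never queried.

    collected: purpose -> set of attributes collected for that purpose
    query_log: list of (purpose, list of attributes referenced by the query)
    """
    used = {}
    for purpose, attrs in query_log:
        used.setdefault(purpose, set()).update(attrs)
    return {p: sorted(attrs - used.get(p, set())) for p, attrs in collected.items()}


collected = {"purchase": {"name", "address", "card", "phone"}}
query_log = [("purchase", ["name", "address"]), ("purchase", ["card"])]
```

Here `unused_attributes(collected, query_log)` reports `phone` as collected for `purchase` but never used, which makes it a candidate for removal under Limited Collection.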
32
Strawman design as it exists now does not protect against a dynamically determined set of recipients
Example: EquiRate is a credit rating agency that maintains a database of credit ratings
Princess Peach has asked EquiRate to only disclose her credit rating to companies that she has contacted
Luigi has stolen Princess Peach’s identity and has contacted EasyCredit pretending to be her.
EasyCredit contacts EquiRate with Princess Peach’s credentials and obtains her credit rating
Both companies have adhered to the protocol but it has failed
Challenges – Limited Disclosure
33
Potential solution
1. Princess Peach digitally signs EasyCredit’s company ID with her cryptographic private key
2. She provides the result to EasyCredit
3. EasyCredit contacts EquiRate and gives them Princess Peach’s signed permission
4. EquiRate verifies the signature and provides the information to EasyCredit
Challenges – Limited Disclosure
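The four steps above can be sketched with a toy RSA signature built only from the Python standard library. This is an assumption-laden illustration: the key is derived from two publicly known Mersenne primes, there is no padding, and the scheme is not secure. A real deployment would use a vetted cryptographic library; the slides do not name a signature scheme.

```python
import hashlib

# Toy RSA key from two known Mersenne primes. INSECURE, illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # Peach's private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Steps 1-2: Peach signs EasyCredit's company ID with her private key."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Step 4: EquiRate checks the permission with Peach's public key (N, E)."""
    return pow(signature, E, N) == digest(message)


permission = sign(b"EasyCredit")
```

EquiRate would release the rating only if `verify(b"EasyCredit", permission)` holds. Luigi, who lacks the private key, cannot produce a valid permission, and a permission signed for one company does not verify for another.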
34
Deleting a record from a database is simple but how does one completely remove the information?
Deleting information from all logs, checkpoints, and backups is difficult
Challenges – Limited Retention
35
How do we protect against someone who is prevented by access controls from reaching tables through the DBMS but who can access the underlying files via other means?
Example: Suppose Luigi has no access to the DBMS but has root privileges on the system
Encryption incurs performance penalties
How can one index or query encrypted columns?
New Challenges – Safety
36
A person challenging information about themselves that was not provided by them (e.g., a credit report)
Super Mario wishes to find out what databases have information about him
If the database does not have information about him, it should not know who issued the query
Related work: symmetrically private information retrieval
New Challenges – Openness
37
Provide each user whose data is accessed with a log of that access along with the query reading the data, without paying a large performance penalty
Fingerprinting – sign up with a company that inserts fingerprint records into the database with an email address, phone number and credit card number along with various approved purposes. If one is used for an unauthorized purpose it would be immediately detected and the company notified that a breach has occurred
Challenge: use the fewest fingerprint records to maximum effect.
New Challenges – Compliance
38
Presented a vision, inspired by the Hippocratic Oath, of databases that preserve privacy
Enunciated key privacy principles
Discussed a strawman design for a Hippocratic database
Identified technical challenges
Discussed related work
Conclusion
Paper Authors: Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, and David DeWitt
Presented By: Camille Gaspard
Limiting Disclosure in Hippocratic Databases
40
It’s essential to manage disclosure in a more fine-grained manner.
Cell-level disclosure schemes have not been well studied before this work.
Motivation
41
Investigate the alternative semantics for limited disclosure in relational databases and present two models of cell-level limited disclosure: table semantics and query semantics, weighing the relative semantic and performance tradeoffs implicit in the two models.
Rather than viewing the disclosure control problem as one of checking against a list of privileges, they transform it into a query modification problem.
Experimental evaluation.
Contribution
42
The table semantics model conceptually defines a view of each data table for each purpose-recipient pair, based on the disclosure constraints specified in the privacy policy. These views combine to produce a coherent relational database for each purpose-recipient pair, independent of any queries, and queries are evaluated against this database.
Table semantics
43
The query semantics model takes the query into account when enforcing disclosure.
Both models mask prohibited values using SQL's null value.
Query semantics
44
Policy definition: Privacy policies are expressed electronically and stored in the database where they can be used to enforce limited disclosure.
Privacy meta-data: The rules and conditions prescribed by the privacy policy are stored as privacy meta-data tables in the database.
Application context: Each query must be associated with a purpose and recipient. In their system, this information is inferred based on the context of the application issuing the query. The query interceptor infers the purpose and recipient of the query based on context information stored in an additional meta-data table.
System Overview (1)
45
Query modifier: SQL queries issued to the database are intercepted and augmented to reflect the privacy policy rules regarding the purpose and recipient issuing the query. The results of this new query are returned to the issuer.
Disclosure model: The result of executing a privacy modified query will reflect one of two cell-level limited disclosure models.
System Overview (2)
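The query-modification idea can be sketched with Python's built-in sqlite3 module: instead of checking a list of privileges, the query is rewritten so that prohibited cells come back as SQL null. The `customer` and `choices` tables, their columns, and the rewritten SQL are all assumptions for illustration and are not the rewrite rules from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    -- Hypothetical choice table: 1 means the donor allows disclosure of that
    -- column for the given purpose and recipient.
    CREATE TABLE choices (id INTEGER, purpose TEXT, recipient TEXT,
                          name_ok INTEGER, email_ok INTEGER);
    INSERT INTO customer VALUES (1, 'Mario', 'mario@example.com'),
                                (2, 'Peach', 'peach@example.com');
    INSERT INTO choices VALUES (1, 'marketing', 'partner', 1, 1),
                               (2, 'marketing', 'partner', 1, 0);
""")


def privacy_modified_select(purpose, recipient):
    """Stand-in for 'SELECT id, name, email FROM customer' after rewriting:
    each cell is returned only if the matching choice allows it, else NULL."""
    sql = """
        SELECT c.id,
               CASE WHEN ch.name_ok = 1 THEN c.name END AS name,
               CASE WHEN ch.email_ok = 1 THEN c.email END AS email
        FROM customer c
        LEFT JOIN choices ch
               ON ch.id = c.id AND ch.purpose = ? AND ch.recipient = ?
        ORDER BY c.id
    """
    return conn.execute(sql, (purpose, recipient)).fetchall()
```

For purpose `marketing` and recipient `partner`, Peach's email comes back as `None` (SQL null) while her name is disclosed. For a purpose with no recorded choices, every name and email cell is null. The sketch always discloses `id`; how rows with prohibited keys are treated is one of the points on which table semantics and query semantics differ.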
46
47
Results
48
Results
49
Results
50
Limited Disclosure. Can we really limit it?
Limited Retention. Can we really delete the data?
Isn’t it better to keep such a complex privacy mechanism as part of the application?
Why not use new techniques in writing stored procedures?
Databases are only part of the whole system. Securing the database helps but does not prevent attacks. A holistic solution is what might be needed.
Discussion
51
Thank You!