Differential Privacy on Linked Data: Theory and Implementation
Yotam Aron
Table of Contents
• Introduction
• Differential Privacy for Linked Data
• SPIM Implementation
• Evaluation
Contributions
• Theory: how to apply differential privacy to linked data.
• Implementation: privacy module for SPARQL queries.
• Experimental evaluation: differential privacy on linked data.
Introduction
Overview: Privacy Risk
• Statistical data can leak privacy.
• Mosaic Theory: different data sources become harmful when combined.
• Examples:
  • Netflix Prize data set
  • GIC medical data set
  • AOL query logs
• Linked data adds ontologies and metadata, making it even more vulnerable.
Current Solutions
• Accountability:
  • Privacy ontologies
  • Privacy policies and laws
• Problems:
  • Requires agreement among parties.
  • Does not actually prevent breaches; it is only a deterrent.
Current Solutions (Cont’d)
• Anonymization
  • Delete “private” data
  • k-anonymity (strong privacy guarantee)
• Problems:
  • Deletion provides no strong guarantees.
  • Must be carried out for every data set.
  • What data should be anonymized?
  • High computational cost (optimal k-anonymity is NP-hard).
Differential Privacy
• Definition for relational databases (from the PINQ paper):

A randomized function K gives ε-differential privacy if, for all data sets D1 and D2 differing on at most one record, and for all S ⊆ Range(K):

Pr[K(D1) ∈ S] ≤ exp(ε) × Pr[K(D2) ∈ S]
Differential Privacy
• What does this mean?
• Adversaries get roughly the same results from D1 and D2, meaning a single individual’s data will not greatly affect the knowledge they acquire from either data set.
How Is This Achieved?
• Add noise to the result.
• Simplest: add Laplace noise.
Laplace Noise Parameters
• Mean = 0 (so no bias is added).
• Scale b = ΔQ/ε, where the sensitivity ΔQ is defined as the largest change |Q(D) − Q(D − j)| that removing a single record j can cause.
• Theorem: for a query Q with result R, the output R + Laplace(0, ΔQ/ε) is ε-differentially private.
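The theorem above can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis code; `laplace_noise` and `privatize` are hypothetical helper names:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) by inverse-CDF: u uniform on (-0.5, 0.5)."""
    u = random.random() - 0.5
    u = max(min(u, 0.4999999999), -0.4999999999)  # avoid log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize(true_result: float, sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism: add Laplace(0, sensitivity/epsilon) noise to a result."""
    return true_result + laplace_noise(sensitivity / epsilon)
```

Smaller ε means a larger noise scale, i.e. stronger privacy at the cost of accuracy.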
Other Benefit of Laplace Noise
• A set of queries, each with sensitivity ΔQ, will have an overall sensitivity equal to the sum of the individual sensitivities.
• Implementation-wise, one can allocate an overall budget ε for a client; for each query, the client specifies how much of the budget to use.
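Budget bookkeeping of this kind might look like the following sketch (a hypothetical `EpsilonBudget` class, not the SPIM code):

```python
class EpsilonBudget:
    """Track a client's remaining privacy budget; the epsilons of
    answered queries add up (sequential composition)."""

    def __init__(self, total: float):
        self.remaining = total

    def spend(self, eps: float) -> bool:
        # Refuse queries that would exceed the remaining budget.
        if eps <= 0 or eps > self.remaining:
            return False
        self.remaining -= eps
        return True
```

Once `spend` starts returning False, the client has exhausted its budget and further statistical queries are refused.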
Benefits of Differential Privacy
• Strong privacy guarantee.
• Mechanism-based, so there is no need to modify the data.
• Independent of the data set’s structure.
• Works well for statistical analysis algorithms.
Problems with Differential Privacy
• Potentially poor performance:
  • Complexity
  • Noise
• Only works with statistical data (though this has fixes).
• How to calculate the sensitivity of an arbitrary query without brute force?
Theory: Differential Privacy for Linked Data
Differential Privacy and Linked Data
• Want the same privacy guarantees for linked data, but there are no “records.”
• What should be the “unit of difference”?
  • One triple
  • All URIs related to a person’s URI
  • All links going out from a person’s URI
“Records” for Linked Data
• Reduce links in the graph to attributes.
• Idea:
  • Identify the individual contributions from a single individual to the total answer.
  • Find the contribution that affects the answer most.
“Records” for Linked Data
• Reducing links in the graph to attributes turns them into records. For example, the link “P1 Knows P2” becomes the row:

| Person | Knows |
|--------|-------|
| P1     | P2    |
“Records” for Linked Data
• Repeated attributes and null values are allowed.

(Figure: a graph in which P1 knows P2 and loves P4, and P3 knows P2 and P4.)
“Records” for Linked Data
• Repeated attributes and null values are allowed (not good RDBMS form, but it makes the definitions easier).

| Person | Knows | Knows | Loves |
|--------|-------|-------|-------|
| P1     | P2    | Null  | P4    |
| P3     | P2    | P4    | Null  |
Query Sensitivity in Practice
• Need to find the triples that “belong” to a person.
• Idea:
  • Identify the individual contributions from a single individual to the total answer.
  • Find the contribution that affects the answer most.
• Done using sorting and limiting functions in SPARQL.
Example
• COUNT of places visited.

(Figure: persons P1 and P2 with “Visited” links to states S1, S2, and S3, and a “State of Residence” link to MA.)
Answer: Sensitivity of 2
Using SPARQL
• Query:

SELECT (COUNT(?s) AS ?num_places_visited)
WHERE { ?p :visited ?s }
Using SPARQL
• Sensitivity calculation query (ideally):

SELECT ?p (COUNT(?s) AS ?num_places_visited)
WHERE {
  ?p :visited ?s ;
     foaf:name ?n .
}
GROUP BY ?p
ORDER BY DESC(?num_places_visited)
LIMIT 1
In reality…
• LIMIT, ORDER BY, and GROUP BY don’t work together in 4store…
• For now: don’t use LIMIT; get the top answers manually.
  • I.e., simulate these keywords in Python.
• This will affect results, so better testing should be carried out in the future.
• Ideally this would stay on the SPARQL side, so less data is transmitted (e.g. on large data sets).
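Simulating GROUP BY / ORDER BY / LIMIT 1 on the client could look like this sketch, assuming the endpoint’s bindings arrive as a list of dicts (the function and variable names are illustrative):

```python
from collections import defaultdict

def max_contribution(bindings, group_var="p"):
    """Simulate GROUP BY ?p / ORDER BY DESC(count) / LIMIT 1 in Python:
    count each person's matching rows and return the largest contribution."""
    counts = defaultdict(int)
    for row in bindings:
        counts[row[group_var]] += 1
    # Equivalent of ORDER BY DESC + LIMIT 1: take the single largest group.
    return max(counts.items(), key=lambda kv: kv[1])
```

For example, `max_contribution([{"p": "P1"}, {"p": "P1"}, {"p": "P2"}])` returns `("P1", 2)`. The drawback, as noted above, is that all bindings must be transferred to the client first.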
(Side rant) 4store Limitations
• Many operations are not supported in unison.
  • E.g. cannot always FILTER and use ORDER BY together, for some reason.
• Severely limits the types of queries available for testing.
• It may be desirable to work with a more up-to-date triplestore (e.g. ARQ).
  • Didn’t, because the code was to stay in Python.
  • Also, all the code had already been written for 4store.
Problems with This Approach
• Need to identify “people” in the graph.
  • Assume, for example, that a URI with a foaf:name is a person, and use its triples in the privacy calculations.
  • Imposes some constraints on the linked-data format for this to work.
  • Future work: investigate whether private data can be identified automatically, perhaps by using ontologies.
• Complexity is tied to the speed of performing the query over a large data set.
• Still not generalizable to all functions.
…and on the Plus Side
• The model for sensitivity calculation can be expanded to arbitrary statistical functions.
  • E.g. dot products, distance functions, variance, etc.
• Relatively simple to implement using SPARQL 1.1.
Implementation: Design of Privacy System
SPARQL Privacy Insurance Module
• I.e. SPIM.
• Combines authentication, AIR, and differential privacy in one system:
  • Authentication to manage ε-budgets.
  • AIR to control the flow of information and non-statistical data.
  • Differential privacy for statistics.
• Goal: provide a module that can integrate into SPARQL 1.1 endpoints and provide privacy.
Design

(Architecture diagram: an HTTP server with OpenID authentication fronts the SPIM main process, which consults the AIR reasoner (backed by privacy policies), the differential privacy module, and the triplestore (holding user data).)
HTTP Server and Authentication
• HTTP server: Django server that handles HTTP requests.
• OpenID authentication: Django module.
SPIM Main Process
• Controls the flow of information.
• First checks the user’s budget, then uses AIR, then performs the final differentially private query.
AIR Reasoner
• Performs access control by translating SPARQL queries to N3 and checking them against policies.
• Can potentially perform more complicated operations (e.g. checking user credentials).
Differential Privacy Protocol

Scenario: a client wishes to make a standard SPARQL 1.1 statistical query. The client has an ε “budget” of overall accuracy for all queries.
Step 1: The query and an epsilon value (ε > 0) are sent to the endpoint and intercepted by the enforcement module.
Step 2: The sensitivity of the query is calculated using a rewritten, related query.
Step 3: The actual query is sent.
Step 4: The result, with Laplace noise added, is sent back to the client.
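The four protocol steps can be sketched end to end as follows. This is a minimal sketch, not the SPIM module itself: `run` is an assumed callable that executes a query against the endpoint, and the function and parameter names are hypothetical.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    u = max(min(u, 0.4999999999), -0.4999999999)  # avoid log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def handle_statistical_query(query, sens_query, epsilon, remaining_budget, run):
    """One pass through the four protocol steps."""
    # Step 1: intercept the query and epsilon; refuse if over budget.
    if epsilon <= 0 or epsilon > remaining_budget:
        raise PermissionError("epsilon budget exhausted")
    # Step 2: calculate sensitivity with the rewritten, related query.
    sensitivity = run(sens_query)
    # Step 3: send the actual query.
    result = run(query)
    # Step 4: return the noisy result, plus the client's reduced budget.
    return result + laplace_noise(sensitivity / epsilon), remaining_budget - epsilon
```

The enforcement module, not the client, decides whether the budget allows the query, so a client cannot evade the guarantee by retrying.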
Experimental Evaluation
Evaluation
• Three things to evaluate:
  • Correctness of operation
  • Correctness of differential privacy
  • Runtime
• Used an anonymized clinical database as the test data, with fake names, social security numbers, and addresses added.
Correctness of Operation
• Can the system do what we want?
  • Authentication provides access control.
  • AIR restricts information and types of queries.
  • Differential privacy gives strong privacy guarantees.
• Can we do better?
Use Case Used in Thesis
• Clinical database data protection.
  • HIPAA: federal protection of private information fields, such as name and social security number, for patients.
• Three users:
  • Alice: works at the CDC; needs unhindered access.
  • Bob: researcher who needs access to private fields (e.g. addresses).
  • Charlie: amateur researcher to whom HIPAA should apply.
• Assumptions:
  • Django is secure enough to handle “clever attacks.”
  • Users do not collude, so individual epsilon values can be allocated.
Use Case Solution Overview
• What should happen:
  • Dynamically apply different AIR policies at runtime.
  • Give different epsilon budgets.
• How allocated:
  • Alice: no AIR policy, no noise.
  • Bob: access to addresses, but all other private information fields hidden. Epsilon budget: E1.
  • Charlie: all private information fields hidden, in accordance with HIPAA. Epsilon budget: E2.
Example: A Clinical Database
• The client accesses the triplestore via the HTTP server.
• OpenID authentication verifies the user has access to the data and finds the user’s epsilon value.
Example: A Clinical Database
• The AIR reasoner checks incoming queries for HIPAA violations.
• The privacy policies contain the HIPAA rules.
Example: A Clinical Database
• Differential privacy is applied to statistical queries.
• The statistical result plus noise is returned to the client.
Correctness of Differential Privacy
• Need to test how much noise is added.
  • Too much noise = poor results.
  • Too little noise = no guarantee.
• Test: run queries and compare the calculated sensitivity with the actual sensitivity.
How to Test Sensitivity?
• Ideally:
  • Test that the noise calculation is correct.
  • Test that the noisy data is still useful (e.g. by applying machine-learning algorithms).
• For this project, only the former was tested.
  • Machine-learning APIs are not as prevalent for linked data.
  • What results would we compare against?
Test Suite
• 10 queries for each operation (COUNT, SUM, AVG, MIN, MAX).
• 10 different WHERE clauses.
• Test:
  • Sensitivity calculated from the original query.
  • Remove each personal URI using the MINUS keyword and see which removal is most sensitive.
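The MINUS-based check can be sketched as follows. Here `run_query(excluded_name)` is an assumed callable that runs the aggregate with the named person’s triples removed via MINUS, or over the full graph when passed None; the names are illustrative, not the thesis code:

```python
def empirical_sensitivity(run_query, names):
    """Brute-force sensitivity: re-run the aggregate with each person
    removed and take the largest deviation from the full answer."""
    full = run_query(None)  # aggregate over the whole graph
    return max(abs(full - run_query(name)) for name in names)
```

This is the ground truth the calculated sensitivity is compared against; it costs one extra query per person, which is exactly the brute force the rewritten sensitivity query tries to avoid.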
Example for Sens Test
• Query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1#>
PREFIX mimic: <http://air.csail.mit.edu/spim_ontologies/mimicOntology#>
SELECT (SUM(?o) AS ?aggr)
WHERE {
  ?s foaf:name ?n .
  ?s mimic:event ?e .
  ?e mimic:m1 "Insulin" .
  ?e mimic:v1 ?o .
  FILTER(isNumeric(?o))
}
Example for Sens Test
• Sensitivity query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1#>
PREFIX mimic: <http://air.csail.mit.edu/spim_ontologies/mimicOntology#>
SELECT (SUM(?o) AS ?aggr)
WHERE {
  ?s foaf:name ?n .
  ?s mimic:event ?e .
  ?e mimic:m1 "Insulin" .
  ?e mimic:v1 ?o .
  FILTER(isNumeric(?o))
  MINUS { ?s foaf:name "%s" }
}

(The %s placeholder is filled in with each person’s name via Python string formatting.)
Results: Query 6 Error (chart not reproduced; see the Appendix tables for the numbers)
Runtime
• Queries were also tested for runtime:
  • Bigger WHERE clauses
  • More keywords
  • Extra overhead of doing the calculations
Results: Query 6 Runtime (chart not reproduced; see the Appendix tables for the numbers)
Interpretation
• Sensitivity calculation time is on par with query time.
  • Might not be good for big data.
  • Find ways to reduce sensitivity calculation time?
• AVG does not do so well…
  • Approximation yields too much noise versus trying all possibilities.
  • Runs ~4x slower than simple querying.
  • Solution 1: look at all the data manually (large data transfer).
  • Solution 2: can we use NOISY_SUM / NOISY_COUNT instead?
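Solution 2 could work roughly like this: split ε between a noisy SUM and a noisy COUNT and divide, instead of computing AVG’s sensitivity directly. This is a sketch under that assumption, not the thesis implementation; `value_sensitivity` (a bound on one person’s contribution to the sum) is an assumed parameter:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    u = max(min(u, 0.4999999999), -0.4999999999)  # avoid log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_avg(values, epsilon, value_sensitivity):
    """NOISY_SUM / NOISY_COUNT: each half-query gets epsilon/2
    (sequential composition); the count's sensitivity is 1."""
    half = epsilon / 2.0
    noisy_sum = sum(values) + laplace_noise(value_sensitivity / half)
    noisy_count = len(values) + laplace_noise(1.0 / half)
    return noisy_sum / max(noisy_count, 1.0)  # guard against tiny counts
```

For large groups the count noise is negligible, so the quotient is close to the true average without ever computing AVG’s awkward sensitivity.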
Conclusion
Contributions
• Theory on how to apply differential privacy to linked data.
• An overall privacy module for SPARQL queries.
  • Limited, but a good start.
• Experimental implementation of differential privacy.
  • Verification that it is applied correctly.
• Other:
  • Updated the SPARQL-to-N3 translation to SPARQL 1.1.
  • Expanded upon the IARPA project to create policies against statistical queries.
Shortcomings and Future Work
• Triplestores need some structure for this to work.
  • Personal information must be explicitly defined in triples.
  • Is there a way to automatically detect which triples constitute private information?
• Complexity.
• Lots of noise for sparse data.
  • Can divide data into disjoint sets to reduce noise, like PINQ does.
  • Use localized sensitivity measures?
• Third-party software problems.
  • Would this work better using a different triplestore implementation?
Diff. Privacy and an Open Web
• How applicable is this to an open web?
  • High sample numbers, but potentially high data variance.
  • Sensitivity calculation might take too long; it may need to be approximated.
• Can use disjoint subsets of the web to increase the number of queries possible within ε budgets.
Demo
• air.csail.mit.edu:8800/spim_module/
References
• Differential privacy implementations:
  • “Privacy Integrated Queries (PINQ)” by Frank McSherry: http://research.microsoft.com/pubs/80218/sigmod115-mcsherry.pdf
  • “Airavat: Security and Privacy for MapReduce” by Roy, Indrajit; Setty, Srinath T. V.; Kilzer, Ann; Shmatikov, Vitaly; and Witchel, Emmett: http://www.cs.utexas.edu/~shmat/shmat_nsdi10.pdf
  • “Towards Statistical Queries over Distributed Private User Data” by Chen, Ruichuan; Reznichenko, Alexey; Francis, Paul; and Gehrke, Johannes: https://www.usenix.org/conference/nsdi12/towards-statistical-queries-over-distributed-private-user-data
References
• Theoretical work:
  • “Differential Privacy” by Cynthia Dwork: http://research.microsoft.com/pubs/64346/dwork.pdf
  • “Mechanism Design via Differential Privacy” by McSherry, Frank; and Talwar, Kunal: http://research.microsoft.com/pubs/65075/mdviadp.pdf
  • “Calibrating Noise to Sensitivity in Private Data Analysis” by Dwork, Cynthia; McSherry, Frank; Nissim, Kobbi; and Smith, Adam: http://people.csail.mit.edu/asmith/PS/sensitivity-tcc-final.pdf
  • “Differential Privacy for Clinical Trial Data: Preliminary Evaluations” by Vu, Duy; and Slavković, Aleksandra: http://sites.stat.psu.edu/~sesa/Research/Papers/padm09sesaSep24.pdf
References
• Other:
  • “Privacy Concerns of FOAF-Based Linked Data” by Nasirifard, Peyman; Hausenblas, Michael; and Decker, Stefan: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.5772
  • “The Mosaic Theory, National Security, and the Freedom of Information Act” by David E. Pozen: http://www.yalelawjournal.org/pdf/115-3/Pozen.pdf
  • “A Privacy Preference Ontology (PPO) for Linked Data” by Sacco, Owen; and Passant, Alexandre: http://ceur-ws.org/Vol-813/ldow2011-paper01.pdf
  • “k-Anonymity: A Model for Protecting Privacy” by Latanya Sweeney: http://arbor.ee.ntu.edu.tw/archive/ppdm/Anonymity/SweeneyKA02.pdf
References
• Other:
  • “Approximation Algorithms for k-Anonymity” by Aggarwal, Gagan; Feder, Tomás; Kenthapadi, Krishnaram; Motwani, Rajeev; Panigrahy, Rina; Thomas, Dilys; and Zhu, An: http://research.microsoft.com/pubs/77537/k-anonymity-jopt.pdf
Appendix: Results Q1, Q2

Q1:
| Op    | Error | Query_Time | Sens_Calc_Time |
|-------|-------|------------|----------------|
| COUNT | 0     | 0.020976   | 0.05231        |

Q2:
| Op    | Error    | Query_Time  | Sens_Calc_Time |
|-------|----------|-------------|----------------|
| COUNT | 0        | 0.015823126 | 0.011798859    |
| SUM   | 0        | 0.010298967 | 0.01198101     |
| AVG   | 868.8379 | 0.010334969 | 0.04432416     |
| MAX   | 0        | 0.010645866 | 0.012124062    |
| MIN   | 0        | 0.010524988 | 0.012120962    |
Appendix: Results Q3, Q4

Q3:
| Op    | Error    | Query_Time  | Sens_Calc_Time |
|-------|----------|-------------|----------------|
| COUNT | 0        | 0.007927895 | 0.00800705     |
| SUM   | 0        | 0.007529974 | 0.007997036    |
| AVG   | 375.8253 | 0.00763011  | 0.030416012    |
| MAX   | 0        | 0.007451057 | 0.008117914    |
| MIN   | 0        | 0.007512093 | 0.008100986    |

Q4:
| Op    | Error  | Query_Time  | Sens_Calc_Time |
|-------|--------|-------------|----------------|
| COUNT | 0      | 0.01048708  | 0.012546062    |
| SUM   | 0      | 0.01123786  | 0.012809038    |
| AVG   | 860.91 | 0.011286974 | 0.048202038    |
| MAX   | 0      | 0.01145792  | 0.01297307     |
| MIN   | 0      | 0.011392117 | 0.012881041    |
Appendix: Results Q5, Q6

Q5:
| Op    | Error    | Query_Time  | Sens_Calc_Time |
|-------|----------|-------------|----------------|
| COUNT | 0        | 0.08081007  | 0.098078012    |
| SUM   | 0        | 0.085678816 | 0.097680092    |
| AVG   | 115099.5 | 0.087270975 | 0.373119116    |
| MAX   | 0        | 0.084903955 | 0.097922087    |
| MIN   | 0        | 0.083213806 | 0.098366022    |

Q6:
| Op    | Error    | Query_Time  | Sens_Calc_Time |
|-------|----------|-------------|----------------|
| COUNT | 0        | 0.136605978 | 0.153807878    |
| SUM   | 0        | 0.139995098 | 0.155878067    |
| AVG   | 115118.4 | 0.139881134 | 0.616436958    |
| MAX   | 0        | 0.148360014 | 0.160467148    |
| MIN   | 0        | 0.144635916 | 0.158998966    |
Appendix: Results Q7, Q8

Q7:
| Op    | Error | Query_Time  | Sens_Calc_Time |
|-------|-------|-------------|----------------|
| COUNT | 0     | 0.006100178 | 0.004678965    |
| SUM   | 0     | 0.004260063 | 0.004747868    |
| AVG   | 0     | 0.004283905 | 0.017117977    |
| MAX   | 0     | 0.004103184 | 0.004703999    |
| MIN   | 0     | 0.004188061 | 0.004717112    |

Q8:
| Op    | Error | Query_Time  | Sens_Calc_Time |
|-------|-------|-------------|----------------|
| COUNT | 0     | 0.002182961 | 0.002643108    |
| SUM   | 0     | 0.002092123 | 0.002592087    |
| AVG   | 0     | 0.002075911 | 0.002662182    |
| MAX   | 0     | 0.00207901  | 0.002576113    |
| MIN   | 0     | 0.002048969 | 0.002597094    |
Appendix: Results Q9, Q10

Q9:
| Op    | Error   | Query_Time  | Sens_Calc_Time |
|-------|---------|-------------|----------------|
| COUNT | 0       | 0.004920959 | 0.010298014    |
| SUM   | 0       | 0.004822016 | 0.010312796    |
| AVG   | 0.00037 | 0.004909992 | 0.024574041    |
| MAX   | 0       | 0.004843235 | 0.01032114     |
| MIN   | 0       | 0.004893064 | 0.010319948    |

Q10:
| Op    | Error  | Query_Time  | Sens_Calc_Time |
|-------|--------|-------------|----------------|
| COUNT | 0      | 0.012365818 | 0.014447212    |
| SUM   | 0      | 0.013066053 | 0.014631987    |
| AVG   | 860.91 | 0.013166904 | 0.056000948    |
| MAX   | 0      | 0.013354063 | 0.014893055    |
| MIN   | 0      | 0.013329029 | 0.014914989    |