![Page 1: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/1.jpg)
OCEAN: Open-source Collation of eGovernment data And Networks
Understanding Privacy Leaks in Open Government Data
Srishti Gupta
Advisor: Dr. Ponnurangam Kumaraguru
M.Tech Thesis Defense
20-November-2013
![Page 2: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/2.jpg)
Thesis Committee
Dr. Muttukrishnan Rajarajan, City University, London
Dr. Vinayak Naik, IIIT-Delhi
Dr. PK (Chair), IIIT-Delhi
![Page 3: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/3.jpg)
Demo
![Page 4: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/4.jpg)
Academic Honors
Gupta, S., Gupta, M., and Kumaraguru, P. OCEAN: Open-source Collation of eGovernment data And Networks. Poster at Security and Privacy Symposium (SPS), IIT-K, 2013.
Gupta, S., Gupta, M., and Kumaraguru, P. Is Government a Friend or Foe? Privacy in Open Government Data. Poster at IBM-ICARE, IISc Bangalore, 2012.
BEST Poster
![Page 5: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/5.jpg)
Recognition
IIITD Homepage [ Aug ’13 ]
Hindustan [ April ’13 ]
550 Unique Visitors
(as on Nov 17, 2013)
![Page 6: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/6.jpg)
Presentation Outline
Research Motivation and Aim
Related Work
Research Contribution
Methodology
Experiments and Analysis
Conclusion
Future Work
Questions
![Page 7: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/7.jpg)
Identity Theft: On the Rise!
Research Motivation and Aim
![Page 8: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/8.jpg)
Ways to get PII
Mail Thefts, Pharming
OSN
Social Engineering (e.g., Fake accounts)
Not credible, Limited Info.
Open Government Data Source
E-mail, Docs, Spreadsheet
Shoulder Surfing, Dumpster Diving
![Page 9: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/9.jpg)
Open Government Data Sources
‘Open’: Publicly available
eGovernment initiatives by different state governments in the form of databases / services.
Objective: Improve the information gathering procedure and reduce the burden on citizens to access their data.
Pros: Improved data availability, easy verification.
Cons: Databases are publicly available, leading to information disclosure and privacy breaches.
![Page 10: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/10.jpg)
Information Leakage in Open Government Data Sources?
![Page 11: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/11.jpg)
PII Leakage
Personally Identifiable Information (PII)
Voter ID, Name, Father’s name, Age, Gender, Date Of Birth, DL number, PAN, Phone number
![Page 12: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/12.jpg)
The Other Side! “People’s View”
CONSCIOUS DECISION !
(Kumaraguru, 2012)
![Page 13: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/13.jpg)
Citizens do not want their PII to be leaked!
![Page 14: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/14.jpg)
Research Aim
To develop a technology to showcase publicly available personal information online
To highlight the privacy issues arising from aggregation of available personal information
![Page 15: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/15.jpg)
System Outline
Identification of data sources
Threat Modelling
Data Extraction
Information Aggregation
Evaluation (Privacy Score, Recall, SUS)
![Page 16: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/16.jpg)
![Page 17: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/17.jpg)
Related Work
Related Work and Research Contribution
Yasni (www.yasni.com)
![Page 18: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/18.jpg)
Related Work
Pipl (www.pipl.com)
![Page 19: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/19.jpg)
Related Work
Various country-specific systems built with Open Government Data

| Name | Country | Description |
|---|---|---|
| IndianKanoon (http://www.indiankanoon.org/) | India | Legal search engine; indexes judgements of the Supreme Court and several High Courts |
| OpenCivic.in (http://www.opencivic.in/) | India | Application Programming Interface; gives data about state assembly elections and profiles of MPs in Maharashtra |
| ABQ Ride (http://www.cabq.gov/abq-apps/city-apps-listing/abq-ride) | USA | Real-time locations of city buses; fares for other public transportation |
| Illustreets (http://data.gov.uk/apps/illustreets) | UK | Comparing locations; gives crime, education, transport and census data for a location |
![Page 20: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/20.jpg)
Research Gap
OCEAN
Yasni / Pipl
Open Source Data Aggregation
Indian Kanoon
Open Government Data
PII Leakage
![Page 21: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/21.jpg)
![Page 22: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/22.jpg)
Research Contribution
First deployed system that shows aggregated personal information about the residents of Delhi.
Threat modelling on the various open government databases.
Privacy Score: the risk to a person associated with the leaked PII.
Empirical understanding of users' privacy perceptions, awareness and expectations regarding open government data.
![Page 23: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/23.jpg)
Presentation Outline
Research Motivation and Aim
Related Work
Research Contribution
Methodology
  Identification of open government data sources
  Threat Modelling
  Data Extraction
  Information Aggregation
Experiments and Analysis
Conclusion
Future Work
Questions
![Page 24: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/24.jpg)
System Architecture
![Page 25: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/25.jpg)
![Page 26: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/26.jpg)
Driving Licence
DL-XXYYYYAAAAAAA, where
DL: state (Delhi), XX: location in Delhi, YYYY: year of issue of the licence, AAAAAAA: unique number
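The licence-number layout described above can be checked mechanically. The following is a minimal sketch, assuming that the location, year and unique parts are all numeric (the slide gives only the field widths); the function name `parse_dl_number` and the sample number are made up for illustration.

```python
import re

# Assumed layout: "DL-" + 2-digit location + 4-digit year + 7-digit unique part.
# The all-numeric assumption is ours; the slide only gives the field widths.
DL_PATTERN = re.compile(r"DL-(?P<location>\d{2})(?P<year>\d{4})(?P<unique>\d{7})")


def parse_dl_number(value):
    """Split a Delhi driving-licence number into its fields, or return None."""
    match = DL_PATTERN.fullmatch(value.strip().upper())
    if match is None:
        return None
    return {
        "state": "DL",
        "location": match.group("location"),
        "year": int(match.group("year")),
        "unique": match.group("unique"),
    }


# Made-up example number, not a real licence.
print(parse_dl_number("DL-0120051234567"))
```

Because the year and location are embedded in the number, anyone who sees a licence number learns those two facts without querying any database.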
Methodology
![Page 27: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/27.jpg)
Voter ID
XXX12345678, where
X: ‘A’ – ‘Z’ and the last 8 characters are numerals
![Page 28: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/28.jpg)
PAN
XXXTL1234X, where
XXX: ‘A’ – ‘Z’, T: type of holder, L: first character of last name, 1234: sequential number, X: check digit
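As an illustration of how much a PAN reveals by itself, the sketch below splits a PAN into the fields listed above. It assumes the final check character is an uppercase letter and does not validate the holder-type code; `parse_pan` and the sample value are hypothetical.

```python
import re

# Layout from the slide: 3 letters, holder type, surname initial,
# 4-digit sequential number, check character.
# Assumption: the check character is a letter; holder-type codes are not validated.
PAN_PATTERN = re.compile(
    r"(?P<prefix>[A-Z]{3})(?P<holder_type>[A-Z])(?P<surname_initial>[A-Z])"
    r"(?P<sequence>\d{4})(?P<check>[A-Z])"
)


def parse_pan(value):
    """Return the PAN fields as a dict, or None if the layout does not match."""
    match = PAN_PATTERN.fullmatch(value.strip().upper())
    return match.groupdict() if match else None


# Made-up example, not a real PAN.
print(parse_pan("ABCPK1234F"))
```

The fifth character exposes the first letter of the holder's last name, which is one reason a PAN can be cross-checked against a name obtained from another source.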
![Page 29: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/29.jpg)
Online Social Networks
Name, Gender, Profile image, Profile url
Name, Followers / Following count, Location, Profile image, Profile url
Name, Gender, Facebook / Twitter contact, Friend / Follower count, Badge / Mayorship / Check-in count, Location, Profile image, Profile url
Name, Location, Profile image, Profile url
Name, Gender, Relationship status, Location, Organization, Birthday, E-mail, Language, Profile image, Profile url
![Page 30: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/30.jpg)
![Page 31: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/31.jpg)
II. Threat Modelling
Data flows between the USER and OPEN GOVERNMENT DATA across the TRUST BOUNDARY:

| Source | User supplies | Source returns |
|---|---|---|
| DRIVING LICENSE | Driving License number | Name, Address, Father’s name, Driving License no., DOB |
| VOTER ROLLS | Name, Constituency | Name, Address, Relation name, Age, Gender, Voter ID |
| PAN | Name, DOB | Name, PAN |
![Page 32: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/32.jpg)
Attack Scenario (I)
Online Voter ID card: multiple fake voter ID cards can be created from the available PII.
![Page 33: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/33.jpg)
Attack Scenario (II)
View tax statements (Income tax e-filing): fake accounts can be created to view TDS statements.
![Page 34: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/34.jpg)
Attack Scenario (III)
Procure a SIM card / phone connection
Fake documents can be created
Credit / debit cards can be applied for in the victim’s name
Networking accounts can be created
![Page 35: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/35.jpg)
II. Threat Modelling
DREAD Model: Microsoft’s Risk Assessment Model

| Term | Remarks |
|---|---|
| Damage | How big would the damage be if the attack succeeded? |
| Reproducibility | How easy is it to reproduce the attack? |
| Exploitability | How much time, effort, and expertise is needed to exploit the threat? |
| Affected Users | If a threat were exploited, what percentage of users would be affected? |
| Discoverability | How easy is it for an attacker to discover this threat? |
![Page 36: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/36.jpg)
II. Threat Modelling
Scheme: High (3), Medium (2), Low (1)
Threat: A malicious user can identify PII of Delhi residents
[Threat modelling: http://msdn.microsoft.com/en-us/library/ff648644.aspx]
![Page 37: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/37.jpg)
II. Threat Modelling
According to Microsoft’s DREAD model:

| Range | Level of risk |
|---|---|
| 5 – 7 | Low |
| 8 – 11 | Medium |
| 12 – 15 | High |

In our case, overall rating = 2 + 3 + 2 + 3 + 3 = 13 (High).
This means that the threat poses a significant risk to the various information portal websites of the Delhi government and needs to be addressed as soon as possible.
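The rating arithmetic can be written out as a small function. This is a sketch only: it assumes the five addends on the slide are listed in the order Damage, Reproducibility, Exploitability, Affected Users, Discoverability, and the function name `dread_rating` is ours.

```python
def dread_rating(damage, reproducibility, exploitability,
                 affected_users, discoverability):
    """Sum five DREAD components (1 = Low, 2 = Medium, 3 = High) and band the total."""
    scores = (damage, reproducibility, exploitability,
              affected_users, discoverability)
    if any(score not in (1, 2, 3) for score in scores):
        raise ValueError("each component must be 1 (Low), 2 (Medium) or 3 (High)")
    total = sum(scores)
    # Bands from the slide: 5-7 Low, 8-11 Medium, 12-15 High.
    if total <= 7:
        level = "Low"
    elif total <= 11:
        level = "Medium"
    else:
        level = "High"
    return total, level


# Component values from the slide, in the assumed order.
print(dread_rating(2, 3, 2, 3, 3))  # (13, 'High')
```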
![Page 38: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/38.jpg)
![Page 39: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/39.jpg)
III. Data Extraction
Data was collected from various open government data sources using PHP scripts and stored in MySQL databases.

| Open govt. website | Collection approach | Records |
|---|---|---|
| VOTER | Alphabets a-z for name, across 70 constituencies | 81,95,053 |
| DRIVING LICENCE | Random 5 seeds, ‘Incremental attack’ | 2,24,982 |
| PAN | Name and DOB from DL | 53,419 |
![Page 40: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/40.jpg)
III. Data Extraction
Public data from various online social networking sites was collected using public API calls.
OAuth tokens were used for authentication and authorization.
UNIQUE NAME → API CALLS

| Network | Records |
|---|---|
| FACEBOOK | 33,77,102 |
| TWITTER | 15,57,715 |
| LINKEDIN | 1,86,798 |
| FOURSQUARE | 29,393 |
| GOOGLEPLUS | 28,900 |
![Page 41: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/41.jpg)
![Page 42: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/42.jpg)
IV. Information Aggregation
Family Tree
Information within the Voter ID database is aggregated to find relationships among records.
OCEAN has 3,90,353 such users.
![Page 43: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/43.jpg)
IV. Information Aggregation
Mapping of users across the Voter ID and Driving Licence databases.
Table schema:

| Database | Attributes |
|---|---|
| Voter ID | Voter ID, Name, Address, Father's / Mother's / Husband's name, Age, Gender |
| Driving Licence | Name, Address, Father's name, DOB, Validity period, Vehicle category |

Done on the basis of similarity between the name, relation name and address of the users across the databases.
OCEAN has 6,384 such users.
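The similarity-based mapping can be sketched as follows. The slides do not say which similarity measure or threshold was used, so this example assumes a simple character-level ratio from Python's `difflib`, an unweighted average over the three fields, and a hypothetical threshold of 0.85; the field names and records are invented.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_person(voter, licence, threshold=0.85):
    """Compare name, relation name and address; the threshold is an assumption."""
    pairs = [
        (voter["name"], licence["name"]),
        (voter["relation_name"], licence["father_name"]),
        (voter["address"], licence["address"]),
    ]
    score = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    return score >= threshold


# Fictitious records for illustration only.
voter = {"name": "Asha Verma", "relation_name": "Mohan Verma",
         "address": "H.No. 12, Park Road"}
licence = {"name": "ASHA VERMA", "father_name": "Mohan  Verma",
           "address": "H No 12 Park Road"}
print(is_same_person(voter, licence))
```

Normalizing before comparison matters most for the address field, where punctuation and abbreviations vary between sources.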
![Page 44: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/44.jpg)
IV. Information Aggregation
Challenge: The address formats for the various sources are different.
![Page 45: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/45.jpg)
IV. Information Aggregation
Mapping of users across the Voter ID, Driving Licence and PAN databases.
The subset of DL records having a PAN was chosen.
OCEAN has 1,693 such users.
![Page 46: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/46.jpg)
IV. Information Aggregation
Mapping users across Foursquare, Facebook and Twitter.
Some users specify their other OSN contacts on Foursquare. The information available from such users is aggregated together.
OCEAN has 11 such users.
![Page 47: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/47.jpg)
IV. Information Aggregation
Challenge: Not many users link their OSN accounts
![Page 48: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/48.jpg)
Presentation Outline
Research Motivation and Aim
Related Work and Research Contribution
Methodology
System User Interface
Experiments and Analysis
Conclusion
Future Work
Questions
![Page 49: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/49.jpg)
![Page 50: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/50.jpg)
Survey Dataset
62 complete responses.
51% males, 49% females.
77% in the age group 20 – 25.
23% had experienced online identity theft themselves or had friends who had.
Experiments and Analysis
![Page 51: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/51.jpg)
Evaluation Metric I - Privacy Score
Privacy score measures the risk associated with a person on the basis of how much PII about that person is revealed from open government data sources.
Privacy score (user) = Σ Sensitivity score (attributes)
Sensitivity score ∈ {1, 2, 3, 4, 5}

| Range | Level |
|---|---|
| < 20% | 1 |
| 21 – 30% | 2 |
| 31 – 50% | 3 |
| 51 – 60% | 4 |
| > 61% | 5 |
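The range-to-level mapping can be expressed as a lookup. The ranges as printed leave small gaps (for example between 20% and 21%), so this sketch assumes each range's upper bound is inclusive; `sensitivity_level` is a hypothetical name.

```python
def sensitivity_level(percent_unwilling):
    """Map a percentage (0-100) to a sensitivity level from 1 to 5.

    Assumption: upper bounds are inclusive, which closes the gaps
    between the printed ranges (e.g. 20.5% falls in level 2).
    """
    if not 0 <= percent_unwilling <= 100:
        raise ValueError("percentage must be between 0 and 100")
    for upper_bound, level in ((20, 1), (30, 2), (50, 3), (60, 4)):
        if percent_unwilling <= upper_bound:
            return level
    return 5


print(sensitivity_level(14.5))  # 1
print(sensitivity_level(67.7))  # 5
```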
![Page 52: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/52.jpg)
Privacy Score
Willingness to share: Level 5 (least willing) to Level 1 (most willing)

| Attribute | Percentage of users unwilling to share personal information with anyone | Privacy Level |
|---|---|---|
| Voter ID | 56.4% | 4 |
| Driving licence no. | 58% | 4 |
| PAN | 67.7% | 5 |
| Full name | 14.5% | 1 |
| Home address | 82.25% | 5 |
| Age | 29% | 2 |
| DOB | 50% | 3 |
| Father’s name | 38.7% | 3 |
| Gender | 14.5% | 1 |
![Page 53: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/53.jpg)
Privacy Score
Privacy score for 84,22,459 users:
Case 1: Users having only Voter ID (97.3%)
PS = Σ(Voter ID, name, father’s name, age, gender, address) = 16
Case 2: Users having only Driving licence number (2%)
PS = Σ(DL number, name, relative’s name, DOB, address) = 17
Case 3: Users having only PAN (1%)
PS = Σ(PAN, DL number, name, relative’s name, DOB, address) = 25
![Page 54: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/54.jpg)
Privacy Score
Case 4: Users having Voter ID and DL number (0.07%)
PS = Σ(Voter ID, DL number, name, father’s name, age, gender, DOB, address) = 24
Case 5: Users having Voter ID, DL number and PAN (0.02%)
PS = Σ(Voter ID, DL number, PAN, name, father’s name, age, gender, DOB, address) = 29
Highest risk: 1,693 people
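The summation behind a privacy score can be sketched as below, using the per-attribute privacy levels from the survey table. The dictionary keys and the function name are our own. Only Case 1 is checked here; the other case totals are quoted as they appear on the slides and are not recomputed.

```python
# Privacy levels per attribute, copied from the survey table.
# The key names are hypothetical identifiers chosen for this sketch.
SENSITIVITY = {
    "voter_id": 4,
    "dl_number": 4,
    "pan": 5,
    "name": 1,
    "address": 5,
    "age": 2,
    "dob": 3,
    "father_name": 3,
    "gender": 1,
}


def privacy_score(leaked_attributes):
    """Sum the sensitivity levels of the distinct attributes leaked for a user."""
    return sum(SENSITIVITY[attribute] for attribute in set(leaked_attributes))


# Case 1: a user who appears only in the Voter ID database.
case_1 = ["voter_id", "name", "father_name", "age", "gender", "address"]
print(privacy_score(case_1))  # 16
```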
![Page 55: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/55.jpg)
Evaluation Metric II
Evaluation Metrics
Recall (based on user study)
$$\text{Recall} = \frac{\text{Number of people who could be identified in the system}}{\text{Total number of search operations done on the system}}$$
Thus, Recall = 179 / 389 = 46%
Reasons for low recall:
Data collection is not 100% complete (out of 12 million voter records, we have ~8 million records).
Respondents might be unclear about their constituency.
![Page 56: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/56.jpg)
Evaluation Metric III
System Usability Scale (SUS) score
Measured using the standard method defined by Brooke (1996).
For OCEAN, the value was 74.5 / 100, which means that people found the system usable and convenient to use.
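For reference, the standard SUS scoring for one respondent can be sketched as follows: ten items answered on a 1 to 5 scale, where odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5. The sample responses are invented and are not data from this study.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly ten responses, each from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


# Invented responses for a single respondent.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

A study-level value such as the one reported for OCEAN would then be the mean of the per-respondent scores.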
![Page 57: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/57.jpg)
User Awareness
The government started various open initiatives to increase the level of transparency with citizens.
However, only 19% of survey respondents were aware of them.
Around 76% have been using these for less than 2 years.
Proper schemes are required to convey their existence.
![Page 58: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/58.jpg)
User Experience
A majority (62%) were shocked to see the availability of personal information to this extent.
People felt that the information could be used maliciously against them.
People now feel scared of sharing their information with various government departments.
![Page 59: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/59.jpg)
User Expectations
![Page 60: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/60.jpg)
Feedback
“It was an eye-opener to a common man.”
“I am really shocked that the exact ID numbers are available online without much security against data mining at this scale.”
“Waiting for an upgraded version which will work for other states also.”
“Good system. Great work! Didn't know such a system existed.”
“A great shortcoming and security flaw has been pointed out by OCEAN. Great work.”
![Page 61: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/61.jpg)
![Page 62: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/62.jpg)
Conclusion
A large amount of personal information is available on government servers.
Information aggregation yields more information about a person.
Threat modelling on open government data sources shows the risk associated with PII leakage and the need for preventive measures.
1,693 users are most vulnerable to identity theft risks.
People felt the need for access control on the data and proper privacy laws against the misuse of information.
![Page 63: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/63.jpg)
![Page 64: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/64.jpg)
Future Work
Datasets can be extended to other states in India.
Mapping users across the offline (govt. databases) and online (social networking sites) worlds.
Data collection can be expanded to improve the recall.
![Page 65: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/65.jpg)
Acknowledgments
Mayank Gupta, B.Tech, DCE
Niharika Sachdeva, PhD, IIIT-Delhi
Precog members, friends and family
![Page 66: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/66.jpg)
References
Kumaraguru, P., and Sachdeva, N. Privacy in India: Attitudes and Awareness V 2.0. Tech. rep., PreCog-TR-12-001, PreCog@IIIT-Delhi, 2012. http://precog.iiitd.edu.in/research/privacyindia/
McCallister, Erika, Tim Grance, and Karen Scarfone. "Guide to protecting the confidentiality of personally identifiable information (PII) (draft), January 2009." NIST Special Publication 800-122.
Schwartz, Paul M., and Daniel J. Solove. "The PII Problem: Privacy and a New Concept of Personally Identifiable Information." NYUL Rev. 86 (2011): 1814.
Mont, Marco Casassa, Siani Pearson, and Pete Bramhall. "Towards accountable management of identity and privacy: Sticky policies and enforceable tracing services." Database and Expert Systems Applications, 2003. Proceedings. 14th International Workshop on. IEEE, 2003.
Jones, Rosie, et al. "I know what you did last summer: query logs and user privacy." Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. ACM, 2007.
![Page 67: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/67.jpg)
References (I)
Nashash, Hyam. "Education as a Building Block in Opening Up Government Data." European Scientific Journal 9.13 (2013).
Barber, Grayson. "Personal Information in Government Records: Protecting the Public Interest in Privacy." St. Louis U. Pub. L. Rev. 25 (2006): 63.
Krishnamurthy, Balachander, and Craig E. Wills. "On the leakage of personally identifiable information via online social networks." Proceedings of the 2nd ACM workshop on Online social networks. ACM, 2009.
Jurgens, David. "That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships." Seventh International AAAI Conference on Weblogs and Social Media. 2013.
Zheleva, Elena, and Lise Getoor. "To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles." Proceedings of the 18th international conference on World wide web. ACM, 2009.
![Page 68: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/68.jpg)
References (II)
Mislove, Alan, et al. "You are who you know: inferring user profiles in online social networks." Proceedings of the third ACM international conference on Web search and data mining. ACM, 2010.
Harel, Amir, et al. "M-score: estimating the potential damage of data leakage incident by assigning misuseability weight." Proceedings of the 2010 ACM workshop on Insider threats. ACM, 2010.
Wright, Glover, Pranesh Prakash, Sunil Abraham, and Nishant Shah. "Open government data study: India." Study commissioned by the Transparency and Accountability Initiative (2010).
Godse, Mr Vinayak, and Director–Data Protection. "RISE PROJECT." (2010).
Brooke, John. "SUS-A quick and dirty usability scale." Usability evaluation in industry 189 (1996): 194.
Social media report 2012: Social media comes of age. http://www.nielsen.com/us/en/reports/2012/state-of-the-media-the-social-media-report-2012.html
![Page 69: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/69.jpg)
Thank You!
![Page 70: OCEAN: Open-source Collation of eGovernment data And Networks: Understanding Privacy Leaks in Open Government Data](https://reader034.vdocument.in/reader034/viewer/2022051610/5491c211ac795939288b45f7/html5/thumbnails/70.jpg)
Questions?