Microdata anonymization considerations

Timing, data access types and degree of anonymization in microdata dissemination. Rajiv Ranjan, NISR/UNDP-Rwanda. Reflections on data confidentiality, privacy, and curation: Regional Workshop on Microdata Dissemination Policy, Kigali, Rwanda: 27 – 29 August 2014


Page 1: Microdata anonymization considerations

Timing, data access types and degree of anonymization in microdata dissemination

…Rajiv Ranjan

NISR/UNDP-Rwanda

Reflections on data confidentiality, privacy, and curation

Regional Workshop on Microdata Dissemination Policy

Kigali, Rwanda: 27 – 29 August 2014

Page 2: Microdata anonymization considerations

Scheme of the presentation

Timing, data access types and degree of anonymization in microdata dissemination

• Confidentiality concerns

• Access issues

• Legal basis

• Assurance

• Challenges

• Harmony

• Governance

• Practices

Page 3: Microdata anonymization considerations

Confidentiality

Page 4: Microdata anonymization considerations

Caveat

Microdata dissemination must maintain confidentiality of individual units: people, households or enterprises.

Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used exclusively for statistical purposes.

Principle 6

United Nations Fundamental Principles of Official Statistics

http://unstats.un.org/unsd/dnss/gp/fundprinciples.aspx

Page 5: Microdata anonymization considerations

Legal basis in Rwanda

Source: Law on the organisation of statistical activities in Rwanda. Chapter VI: Statistical Confidentiality, Article 17: Prohibited dissemination of information (N° 45/2013 of 16/06/2013)

Data collected by the institutions of the national statistical system through surveys or any other method of collection are protected by statistical confidentiality. Statistical confidentiality implies that the dissemination of such data as well as statistical information which can be calculated from them, shall be conducted in a way that those who provided it are not identified whether directly or indirectly.

Page 6: Microdata anonymization considerations

Access

Page 7: Microdata anonymization considerations

Access benefits

• Fosters diversity of research

• Increases transparency and accountability

• Mitigates duplication of data collection work

• Increases the quality of data

https://unstats.un.org/unsd/accsub-public/microdata.pdf

Page 8: Microdata anonymization considerations

Access assurance in Rwanda

The anonymous basic databases on individuals and other institutions shall be accessible to researchers who, however, shall be committed to:

1° make a written note, that they shall not communicate to any person the contents of such databases without the written authorization of the National Institute of Statistics of Rwanda;

2° give to the National Institute of Statistics of Rwanda, the findings of their research.

Source: Law on the organisation of statistical activities in Rwanda. Chapter VI: Statistical Confidentiality, Article 19: Accessibility to anonymous basic database not to be published (N° 45/2013 of 16/06/2013)

Page 9: Microdata anonymization considerations

Challenges

Page 10: Microdata anonymization considerations

Balancing act

Disclosure risks vs. information loss

• In practice, the more the disclosure risks are reduced, the lower the expected utility of the microdata sets.

• The objective remains to manage the trade-off between disclosure risks and information loss.

Source: Chris Skinner: Statistical Disclosure Control for Survey Data: http://personal.lse.ac.uk/skinnecj/SDC%20for%20survey%20data%20S3RI.pdf
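The trade-off on this slide can be illustrated with a minimal sketch. It assumes a toy dataset and two simple proxies that are not taken from the presentation: disclosure risk is approximated by the share of records that are unique on their quasi-identifiers, and information loss by how few distinct age groups remain after recoding. The records, the `band` helper and the band widths are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (age, district). Not real survey data.
RECORDS = [
    (23, "A"), (24, "A"), (27, "A"), (31, "B"),
    (34, "B"), (38, "B"), (42, "C"), (47, "C"),
]


def band(age, width):
    """Recode an exact age to the lower bound of its band."""
    return (age // width) * width


def unique_share(keys):
    """Share of records whose key combination occurs exactly once."""
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def tradeoff(records, width):
    """Return (risk proxy, detail proxy) for a given age band width."""
    keys = [(band(age, width), district) for age, district in records]
    risk = unique_share(keys)                   # lower is safer
    distinct_ages = len({k[0] for k in keys})   # lower means more information lost
    return risk, distinct_ages


if __name__ == "__main__":
    for width in (1, 5, 10, 20):
        risk, detail = tradeoff(RECORDS, width)
        print(f"band width {width:>2}: unique share {risk:.2f}, "
              f"distinct age groups {detail}")
```

On this toy data, wider age bands leave fewer unique records but also fewer distinct age groups, which is the balancing act in miniature. Real assessments would use dedicated risk and utility measures, such as those documented for sdcMicro.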

Page 11: Microdata anonymization considerations

Challenges

[Emerging mash-ups]

Datasets are being reused and combined with other datasets in ways never before thought possible, including for uses that go beyond the original intent.

[Growing motives]

While there are promising research efforts underway to protect privacy, far more advanced efforts are presently in use to re-identify seemingly “anonymous” data.

[Improved access]

Easier access to datasets has improved their discoverability, and such data could be used to re-identify previously de-identified datasets.

http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_5.1.14_final_print.pdf

Page 12: Microdata anonymization considerations

Complicating the challenges

Disclosure risks vs. information loss

Machine readability, open standards and free for reuse

Post 2015

Images: (1) From the cover of ‘Open Data Now’, a book by Joel Gurin exploring how open data within public records will create new jobs, applications and other technology innovations. http://www.opendatanow.com (2) A project at PARIS21 on data revolution for post 2015 SDGs. http://www.paris21.org/node/1654

Page 13: Microdata anonymization considerations

Harmony

Page 14: Microdata anonymization considerations

Coexistence

“There is nothing inherently contradictory about hiding one piece of information while revealing another, so long as the information we want to hide is different from the information we want to disclose.”

- Felix T. Wu in Defining Privacy and Utility in Data Sets. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2031808

Though not easy, it is possible and desirable for openness and privacy to coexist.

Page 15: Microdata anonymization considerations

Decision factors

Disclosure risks Information loss

Sensitivity of the dataset

Usage intent

Page 16: Microdata anonymization considerations

Enabling dimensions

Anonymization

Practices

• Asserting user types

• Controlling release timing

• Categorizing access methods

• Varying the degree of anonymization

Governance

• Legal basis

• Policy backing

• Institutionalization

Tools & Methods (1)

• sdcMicro

• sdcMicroGUI

• Deterministic

• Probabilistic

(1) http://cran.r-project.org/web/packages/sdcMicro/vignettes/sdc_guidelines.pdf

Page 17: Microdata anonymization considerations

Governance

Page 18: Microdata anonymization considerations

Law on the organisation of statistical activities in Rwanda (Feb 14, 2006)

Law

Page 19: Microdata anonymization considerations

Microdata Release Policy @ National Institute of Statistics of Rwanda

Policy

Page 20: Microdata anonymization considerations

Microdata Release Committee & Data curation team @ NISR

Institutionalization

Page 21: Microdata anonymization considerations

Practices

Page 22: Microdata anonymization considerations

User types served

Govt. (Policy makers and researchers)

International development agencies

Research and academic institutions

Students and professors

Others (scientific researchers)

Page 23: Microdata anonymization considerations

Release timing

Within 6 – 24 months after the 1st release of aggregated data from a survey/census

Examples (months, on a 24-month scale):

• DHS 2010: 7

• EICV(3) 2010-2011: 7

• Census 2012: ?

• Seasonal Agri Survey 2013: ?

EICV: Integrated Household Living Conditions Survey

Page 24: Microdata anonymization considerations

Access methods

Web-based distribution

Page 25: Microdata anonymization considerations

Types of files/access

[Chart: number of studies by type of file/access. Total no. of studies = 24]

Chart legend:

• Open access (no restriction)

• Direct access or Public Use Files (some restrictions on use, but no screening of users)

• Research Use Files (or Scientific Use Files, or Licensed Files)

• Availability only in an enclave

• No access authorized

• Data not available

• Data available from external repo: 4

Other chart values as extracted (category labels not recoverable): 16, 1, 3

Page 26: Microdata anonymization considerations

Degree of anonymization

Removing or modifying the identifying variables contained in the microdata

The usual practice at NISR is to release microdata as Public Use Files.

For example, in EICV3 (the Integrated Household Living Conditions Survey, conducted in 2010-2011), the methods applied for anonymizing data were:

• Suppressing/deleting the records of direct identifiers (e.g. name of the head of household) and a few indirect identifiers (e.g. sub-national admin boundaries)

• Generalizing/replacing (recoding) some indirect identifiers with less specific but semantically consistent groupings of observation values (e.g. place of birth, occupation)

• Perturbing/distorting some indirect identifiers by randomizing the values (e.g. clusters)

Variations in the degree of anonymization (and the resulting access files/types) may be considered depending on the sensitivity of the dataset and the intended use.
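The three methods named on this slide (suppression, generalization and perturbation) can be sketched in a few lines. This is an illustrative sketch only, not the procedure NISR applied to EICV3. The variable names, the occupation groupings and the choice to perturb cluster codes by randomly relabeling them are all assumptions made for the example.

```python
import random

# Hypothetical records and variable names, for illustration only.
RECORDS = [
    {"head_name": "Person A", "sector": "S1", "occupation": "maize farmer", "cluster": 101},
    {"head_name": "Person B", "sector": "S1", "occupation": "nurse", "cluster": 101},
    {"head_name": "Person C", "sector": "S2", "occupation": "coffee farmer", "cluster": 102},
    {"head_name": "Person D", "sector": "S3", "occupation": "teacher", "cluster": 103},
]

# Assumed choices: which variables to drop and how to group occupations.
SUPPRESSED = {"head_name", "sector"}
OCCUPATION_GROUPS = {
    "maize farmer": "agriculture",
    "coffee farmer": "agriculture",
    "nurse": "health",
    "teacher": "education",
}


def suppress(record):
    """Drop direct identifiers and selected indirect identifiers."""
    return {k: v for k, v in record.items() if k not in SUPPRESSED}


def generalize(record):
    """Recode a detailed occupation into a broader group."""
    out = dict(record)
    out["occupation"] = OCCUPATION_GROUPS.get(record["occupation"], "other")
    return out


def perturb_clusters(records, rng):
    """Randomly relabel cluster codes (one possible form of perturbation).

    Records from the same cluster keep a shared code, so cluster-level
    analysis still works, but codes no longer match the field codes.
    """
    codes = sorted({r["cluster"] for r in records})
    shuffled = codes[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(codes, shuffled))
    return [{**r, "cluster": mapping[r["cluster"]]} for r in records]


def anonymize(records, seed=0):
    """Apply suppression, then generalization, then perturbation."""
    rng = random.Random(seed)
    step = [generalize(suppress(r)) for r in records]
    return perturb_clusters(step, rng)


if __name__ == "__main__":
    for row in anonymize(RECORDS):
        print(row)
```

In practice these steps would normally be carried out with a dedicated tool such as sdcMicro, and the degree of each step would be tuned to the sensitivity of the dataset and the intended access type.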

Page 27: Microdata anonymization considerations

e.g.: Recoding (Occupation)

Page 28: Microdata anonymization considerations

@rajiv_r_in…

Thank you!

“87% of the U.S. population can be uniquely identified by date of birth + gender + zip”

Latanya Sweeney, CMU (latanyasweeney.org)