Towards a socio-demographic fingerprint (CH, IASSIST 2013)

Towards a procedure to anonymise micro data

Anonymising data from official statistics for public use

IASSIST, Köln - 30.05.2013

Katelijne Gysen

Outline

1. Promotion of official statistics

2. Anonymisation of data

2.1 Trade-off: disclosure risk versus data utility

2.2 Procedure

2.3 Parameter setting for Statistical Disclosure Control (SDC)

3. Uniqueness and k-anonymity

3.1 Concepts

3.2 Recent research on mobility data

3.3 The real fingerprint

3.4 Socio-demographic fingerprint

1. Promotion of official statistics

Data from the National Statistical Institute (NSI):
Labour Force Survey
Survey on Structure of Earnings
SILC (Survey on Income and Living Conditions)
PISA (Education)
Swiss Health Survey
Population Census and Business Census, …

Micro data for research and teaching purposes

Collaboration with our NSI:

2. Anonymisation of data

2.1 Trade-off dilemma: disclosure risk versus data utility

Researcher (data utility) versus data owner (data protection)

2.2 Procedure (1)

Flowchart, box labels: Dataset; Describe dataset characteristics; Define target public; Describe intrusion scenario; Apply SDC methods; Measure data utility; Describe access conditions; Release data.

Decision points: Disclosure risk? Risk / utility balance?

2.2 Procedure (2)

Flowchart, box labels: Dataset; Describe dataset characteristics; Define target public; Describe intrusion scenario; Set SDC parameters; Apply SDC methods; Measure data utility; Describe access conditions; Release data.

Decision points: Disclosure risk? SDC parameters met? Data utility?

2.3 Parameter setting for Statistical Disclosure Control (SDC)

1. Age of the data (min.)

2. Subsample (min.)

3. Level of geographical detail (max.)

4. Global and individual risk (max.)

5. Number of indirect identifying variables (max.)

6. Degree of anonymity for socio-demographic characteristics (min.)
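The "SDC parameters met?" check from the procedure can be sketched as a small set of bounds, each marked "min" or "max" as on the slide. This is only an illustration: the parameter names, units and threshold values below are hypothetical and are not taken from the presentation, and only four of the six parameters are shown; the others would follow the same pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """One SDC release parameter with its direction ("min" or "max")."""
    name: str
    kind: str     # "min": value must be >= limit; "max": value must be <= limit
    limit: float  # placeholder threshold, NOT taken from the presentation

    def is_met(self, value: float) -> bool:
        if self.kind == "min":
            return value >= self.limit
        if self.kind == "max":
            return value <= self.limit
        raise ValueError(f"unknown bound kind: {self.kind!r}")


def unmet_parameters(bounds, measured):
    """Return the names of all parameters whose bound is violated."""
    return [b.name for b in bounds if not b.is_met(measured[b.name])]


# Hypothetical thresholds for four of the six parameters on the slide.
BOUNDS = [
    Bound("age_of_data_years", "min", 2),
    Bound("individual_risk", "max", 0.05),
    Bound("indirect_identifiers", "max", 6),
    Bound("k_anonymity", "min", 5),
]

# Invented measurements for a candidate release file.
candidate = {
    "age_of_data_years": 3,
    "individual_risk": 0.08,
    "indirect_identifiers": 5,
    "k_anonymity": 2,
}

print(unmet_parameters(BOUNDS, candidate))  # ['individual_risk', 'k_anonymity']
```

An empty list would mean that all configured bounds are met; any name in the list points to a parameter that still needs an SDC method applied.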

3. Uniqueness and k-anonymity - 3.1 Concepts

Diagram: Micro data, split into identifying variables and non-identifying variables. Further labels: Rare, Observable, Searchable.
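As a minimal sketch of the two concepts, assume the micro data are a list of records and the identifying variables are given by name. A file is k-anonymous if every combination of identifying values is shared by at least k records, and a record is unique if its combination occurs only once. The records and variable names below are invented for illustration.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of identifying values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group: the file is k-anonymous for this k."""
    return min(class_sizes(records, quasi_identifiers).values())


def unique_records(records, quasi_identifiers):
    """Records whose combination of identifying values occurs exactly once."""
    sizes = class_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    ]


# Invented toy records, not real data.
people = [
    {"gender": "f", "yob": 1980, "municipality": "A"},
    {"gender": "f", "yob": 1980, "municipality": "A"},
    {"gender": "m", "yob": 1975, "municipality": "A"},
    {"gender": "m", "yob": 1975, "municipality": "B"},
]

print(k_anonymity(people, ["gender", "yob"]))                           # 2
print(len(unique_records(people, ["gender", "yob", "municipality"])))  # 2
```

Adding the municipality to the identifying variables drops the toy file from 2-anonymity to 1-anonymity, which is the effect the later slides quantify for the Swiss population.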

3.2 Recent research about mobility data

“… four, randomly chosen “spatio-temporal points” (for example, mobile device pings to antennas) is enough to uniquely identify 95% of the individuals”.

The mobility pattern is apparently unique.
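The kind of measurement behind such a figure can be sketched as follows. This is not the method of the cited study, only a simplified illustration: each trace is assumed to be a set of (antenna, hour) points, p points are drawn at random from a user's trace, and the user counts as uniquely identified if no other trace contains all of them. The function name, data layout and toy traces are assumptions.

```python
import random


def fraction_unique(traces, p, seed=0):
    """Share of users whose p randomly drawn points match no other trace."""
    rng = random.Random(seed)
    unique = 0
    for user, points in traces.items():
        # Draw up to p points from this user's own trace.
        sample = set(rng.sample(sorted(points), min(p, len(points))))
        # A trace "matches" if it contains every sampled point.
        matches = [u for u, other in traces.items() if sample <= other]
        if matches == [user]:
            unique += 1
    return unique / len(traces)


# Invented toy traces: sets of (antenna, hour) points per user.
traces = {
    "u1": {("a", 8), ("b", 12), ("c", 18)},
    "u2": {("a", 8), ("b", 12), ("d", 18)},
    "u3": {("e", 9), ("f", 13), ("g", 19)},
}

print(fraction_unique(traces, p=3))  # 1.0: full traces are all distinct
print(fraction_unique(traces, p=1))  # depends on which point is drawn
```

With fewer points the result depends on the random draw, so a real analysis would average over many draws per user.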

3.3 The real fingerprint

“There are as many as 150 ridge characteristics (points) in the average fingerprint.

So how many points must a fingerprint examiner match in order to safely say the prints are indeed those of a particular suspect?”

The answer is surprising.

“There is no standard number required. …

… In fact, the decision as to whether or not there is a match is left entirely to the individual examiner. However, individual departments and agencies may have their own set of standards in place that requires a certain number of points be matched before making a positive identification.”

Source: http://www.leelofland.com/wordpress/comparing-fingerprints-whats-the-point/

3.4 The socio-demographic fingerprint

Gender, date of birth, municipality, civil status, nationality

3.4 The socio-demographic fingerprint (2)

Anonymity of the Swiss population given simple socio-demographics

Source: STATPOP 2010, BFS.

Variable combination                              k-anonymity
                                                  1     2     5     20    100   1000
Gender * DOB * Municipality                       74    86.9  95.3  100   100   100
Gender * YOB * Municipality                       0.7   1.9   6.3   27.6  68.3  92.1
Gender * YOB * Civil status * Municipality        3.2   6.4   14.9  41.5  77.9  96.6
Gender * YOB * Nationality * Municipality         7.9   12.9  21.3  47.1  82    97.1
Gender * YOB * Civil * Nation * Municip.          12    18.6  31.1  59.6  87.4  98.9

DOB: date of birth; YOB: year of birth.
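One plausible reading of the table, stated here as an assumption, is that each cell gives the percentage of persons who share their combination of values with at most k persons in total, so the k = 1 column is the share of unique persons. Under that reading, a row could be computed as in the sketch below. The records are invented and far too few to say anything about a real population; the function and field names are hypothetical.

```python
from collections import Counter


def share_in_small_groups(records, keys, thresholds):
    """For each k, the % of records whose group has at most k members."""
    group_of = [tuple(r[key] for key in keys) for r in records]
    sizes = Counter(group_of)
    n = len(records)
    return {
        k: 100 * sum(1 for g in group_of if sizes[g] <= k) / n
        for k in thresholds
    }


# Invented toy records, not STATPOP data.
people = [
    {"gender": "f", "dob": "1980-03-01", "municipality": "A"},
    {"gender": "f", "dob": "1980-07-15", "municipality": "A"},
    {"gender": "f", "dob": "1980-11-30", "municipality": "A"},
    {"gender": "m", "dob": "1975-01-20", "municipality": "B"},
]
for person in people:
    person["yob"] = person["dob"][:4]  # generalise date of birth to year

print(share_in_small_groups(people, ["gender", "dob", "municipality"], [1, 2, 5]))
# {1: 100.0, 2: 100.0, 5: 100.0}
print(share_in_small_groups(people, ["gender", "yob", "municipality"], [1, 2, 5]))
# {1: 25.0, 2: 25.0, 5: 100.0}
```

Replacing the date of birth by the year of birth lowers the share of unique records in the toy data, the same direction as the difference between the first two rows of the table.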

References

de Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M., Blondel, V. D. Unique in the crowd: the privacy bounds of human mobility. Scientific Reports 3, article 1376. DOI: 10.1038/srep01376. 2013.

Franconi, L. Public Use Files: practices and methods to increase quality of released microdata. OECD. 2012.

Golle, P. Revisiting the uniqueness of simple demographics in the US population. Palo Alto Research Center. 2006.

Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Schulte Nordholt, E., Spicer, K., De Wolf, P.-P. Statistical Disclosure Control. Wiley. 2012.

Sweeney, L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh. 2000.

Sweeney, L. k-Anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (5), 557-570. 2002.

Meindl, B., Kowarik, A., Templ, M. Guidelines for the anonymisation of microdata using R-package sdcMicro. Vienna. 2012.

Find out more?

About FORS: www.fors.unil.ch

About public microdata for research in CH: www.compass.unil.ch

Let’s connect!