The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM Researcher Janika Konnu Manchester, United Kingdom 17-19 December 2007

Page 1: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Researcher Janika Konnu

Manchester, United Kingdom

17-19 December 2007

Page 2: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Tuesday 18 December 2007

Outline

Data

SDC-methods

Results

Conclusions

Forthcoming research

Page 3: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Data used in the study

The teacher data were originally collected for administrative purposes.

Only high school teachers (N = 7798) were included in our study.

The data included information about:

teachers: age, gender, position, etc.

the schools those teachers taught in: the location of the school, number of students, etc.

Page 4: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

SDC Methods: Microaggregation

First, the data are divided into groups of k observations, and the group averages are released instead of the original values of the variable.

The MDAV (Maximum Distance to Average Vector) algorithm was used for grouping: the algorithm finds the average observation with respect to the values and forms groups by using the distance from this average observation.

Grouping the data is the crucial step of this method: when each group contains the most similar observations, information loss is minimised.

In our study, microaggregation was applied to categorical data, although it is intended for numerical data.
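The grouping-and-averaging idea can be illustrated with a minimal Python sketch. It follows the commonly described fixed-size MDAV scheme (take the record farthest from the average, then the record farthest from that one, and build a group of k around each); it is not the implementation used in the study, and the function names and the small age data set are hypothetical.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def mdav_groups(points, k):
    """Partition record indices into groups of at least k records (MDAV-style)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    remaining = list(range(len(points)))
    groups = []

    def farthest(target):
        # Index of the remaining record farthest from the target point.
        return max(remaining, key=lambda i: dist(points[i], target))

    def pop_group(seed):
        # Move the k remaining records closest to the seed into a new group.
        nonlocal remaining
        group = sorted(remaining, key=lambda i: dist(points[i], points[seed]))[:k]
        remaining = [i for i in remaining if i not in group]
        groups.append(group)

    while len(remaining) >= 3 * k:
        r = farthest(centroid([points[i] for i in remaining]))
        s = farthest(points[r])
        pop_group(r)
        pop_group(s)
    if len(remaining) >= 2 * k:
        pop_group(farthest(centroid([points[i] for i in remaining])))
    if remaining:
        groups.append(remaining)  # the leftover records form the last group
    return groups


def microaggregate(points, k):
    """Replace every record by the average of its group."""
    result = [None] * len(points)
    for group in mdav_groups(points, k):
        mean = centroid([points[i] for i in group])
        for i in group:
            result[i] = mean
    return result


# Hypothetical one-dimensional example: ages of seven teachers, k = 3.
ages = [(23,), (25,), (24,), (58,), (60,), (59,), (40,)]
protected = microaggregate(ages, 3)
print(protected)
```

Records are given as tuples so that the same sketch works for several numerical variables at once. How categorical values were coded for microaggregation in the study is not stated on the slides, so that step is left out here.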

Page 5: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

SDC Methods: The Post RAndomization Method

The method changes the values of a variable according to a probability matrix (Markov matrix).

Example:

$$
\begin{pmatrix}
0.80 & 0.20 & 0 & 0 \\
0.10 & 0.80 & 0.10 & 0 \\
0 & 0.10 & 0.80 & 0.10 \\
0 & 0 & 0.20 & 0.80
\end{pmatrix}
$$

When PRAM is applied, the data user must take the probability matrix into account in order to obtain correct results.

In our study we tested the usefulness of PRAM when the probability matrix is not used in the analysis.
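As a rough illustration of how such a matrix is applied, the sketch below draws a released category for each record from the row of the matrix that belongs to its original category. The matrix is the 4x4 example from this slide, read as rows (0.80, 0.20, 0, 0), (0.10, 0.80, 0.10, 0), (0, 0.10, 0.80, 0.10) and (0, 0, 0.20, 0.80); the function name `pram`, the seed and the toy data are assumptions made for the example, and this is not the software's own implementation.

```python
import random

# Transition matrix from the example: row i holds the probabilities that
# original category i is released as category 0, 1, 2 or 3.
P = [
    [0.80, 0.20, 0.00, 0.00],
    [0.10, 0.80, 0.10, 0.00],
    [0.00, 0.10, 0.80, 0.10],
    [0.00, 0.00, 0.20, 0.80],
]


def pram(values, matrix, rng):
    """Return a perturbed copy of values, one independent draw per record."""
    categories = range(len(matrix))
    return [rng.choices(categories, weights=matrix[v])[0] for v in values]


rng = random.Random(2007)  # fixed seed only to make the sketch repeatable
original = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5  # hypothetical data
protected = pram(original, P, rng)

# Compare the frequency tables before and after perturbation.
for category in range(4):
    print(category, original.count(category), protected.count(category))
```

Because each record is perturbed independently, the released frequencies differ from the original ones by a random amount, which is the source of both the protection and the information loss discussed in the results.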

Page 6: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Empirical work: μ-Argus software

The software includes disclosure risk measurement and the following methods: global recoding, local suppression, top and bottom coding, PRAM, numerical microaggregation, numerical rank swapping and Sullivan masking.

The software produces protected data if suppressions are allowed.

In our case, only the SDC methods PRAM and numerical microaggregation were studied. No suppressions were made, because we needed information on the difference between the original and the protected data.

Page 7: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Results: Data protected by Microaggregation

Group sizes used in protection are 2, 5, 8, 10 and 15

Microaggregation does not have an effect on frequencies.

Unfortunately, this implies that hardly any change occurs in the values.

Conclusion: microaggregation does not give strong enough protection when it comes to categorical data.

Page 8: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Results: Data protected by PRAM (no bandwidth)

Changing probabilities: 0.05, 0.10, 0.20, 0.30 and 0.40

PRAM changes the values of variables, and in that way the data are protected.

Unfortunately, PRAM leads to problems when the category frequencies differ greatly: the larger frequencies keep getting smaller and the smaller ones keep growing.
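The drift in frequencies can be seen from the expected counts after PRAM: the expected released count of category j is the sum over i of (original count of i) times (probability of changing i to j). The sketch below works this out for an assumed "no bandwidth" matrix in which a value is kept with probability 1 - p and otherwise moved to any other category with equal probability. Both this matrix shape and the category counts are hypothetical; only the changing probabilities and the total N = 7798 come from the slides.

```python
def uniform_pram_matrix(n_categories, p_change):
    """Keep a value with probability 1 - p_change, otherwise move it
    to one of the other categories with equal probability (assumed shape)."""
    off_diagonal = p_change / (n_categories - 1)
    return [
        [1 - p_change if i == j else off_diagonal for j in range(n_categories)]
        for i in range(n_categories)
    ]


def expected_counts(counts, matrix):
    """Expected released frequency of each category after PRAM."""
    n = len(counts)
    return [sum(counts[i] * matrix[i][j] for i in range(n)) for j in range(n)]


counts = [7000, 500, 200, 98]  # hypothetical skewed frequencies, total 7798
for p in (0.05, 0.10, 0.20, 0.30, 0.40):
    expected = expected_counts(counts, uniform_pram_matrix(len(counts), p))
    print(p, [round(e, 1) for e in expected])
```

With skewed counts like these, the dominant category loses far more records than it receives, while every small category receives more than it loses, and the effect grows with the changing probability.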

Page 9: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Results: Data protected by PRAM (bandwidth is 2)

Changing probabilities: 0.05, 0.10, 0.20, 0.30 and 0.40

Restricting the change of values cannot solve the problem caused by differences in frequencies.

Our study shows that the frequencies in categories next to the one with the largest frequency still grow too fast.
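A small sketch may help to see why the neighbouring categories grow. It assumes that a bandwidth of b means a value can only change to a category at most b positions away, with the changing probability split equally among those neighbours. That reading of the bandwidth parameter, the function names and the category counts are assumptions for illustration, not a description of the software's exact behaviour.

```python
def banded_pram_matrix(n_categories, p_change, bandwidth):
    """Keep a value with probability 1 - p_change, otherwise move it to a
    category at most `bandwidth` positions away (assumed interpretation)."""
    matrix = []
    for i in range(n_categories):
        neighbours = [
            j for j in range(n_categories) if j != i and abs(i - j) <= bandwidth
        ]
        row = [0.0] * n_categories
        row[i] = 1 - p_change
        for j in neighbours:
            row[j] = p_change / len(neighbours)
        matrix.append(row)
    return matrix


def expected_counts(counts, matrix):
    """Expected released frequency of each category after PRAM."""
    n = len(counts)
    return [sum(counts[i] * matrix[i][j] for i in range(n)) for j in range(n)]


counts = [100, 300, 6000, 900, 300, 198]  # hypothetical, total 7798
matrix = banded_pram_matrix(len(counts), 0.20, 2)
expected = expected_counts(counts, matrix)
for category, (before, after) in enumerate(zip(counts, expected)):
    print(category, before, round(after, 1))
```

Under these assumptions, all records leaving the largest category land in the few categories within the band, so those categories grow even faster than they would without the restriction.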

Page 10: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Results: Data protected by PRAM

[Figure panels: No bandwidth; Bandwidth is 2]

Page 11: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Conclusion: Microaggregation

Microaggregation performs well with numerical data, but its application to categorical data needs more research.

Data protected by microaggregation includes almost the same information as the original data.

Are we sure that microaggregation is able to protect categorical data properly?

Page 12: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Conclusion: PRAM

PRAM seems to perform quite well when it comes to protecting data, but there are some issues to overcome.

PRAM can protect data even with small changing probabilities, because the protection is based on the uncertainty of identification.

In this case our concern is with information loss. Is the protected data useful without using the probability matrix?
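For contrast, the sketch below shows what using the probability matrix could look like for a frequency table. Since the expected released counts equal the transposed matrix times the original counts, a moment-based estimate of the original counts is obtained by solving that linear system for the released counts. This is a generic correction written for illustration under the assumption that the matrix is invertible; the helper names and the toy counts are hypothetical, and the matrix is the example from the PRAM slide.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (assumes a is square and invertible)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_original_counts(released_counts, matrix):
    """Moment-based estimate: solve (matrix transposed) x = released counts."""
    n = len(matrix)
    transposed = [[matrix[i][j] for i in range(n)] for j in range(n)]
    return solve(transposed, released_counts)


P = [
    [0.80, 0.20, 0.00, 0.00],
    [0.10, 0.80, 0.10, 0.00],
    [0.00, 0.10, 0.80, 0.10],
    [0.00, 0.00, 0.20, 0.80],
]

# Hypothetical released table that equals the expectation for the
# original counts 50, 30, 15 and 5.
released = [43.0, 35.5, 16.0, 5.5]
print([round(x, 2) for x in estimate_original_counts(released, P)])
```

With real released data the estimate is only correct on average and can even produce negative counts for small categories, which is one reason the question of using the data without the matrix is of practical interest.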

Page 13: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Forthcoming research

Include more methods:

rank swapping

noise adding

Include disclosure risk measures

Include more precise measurement for information loss

Page 14: The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM

Some references

Domingo-Ferrer, J., Torra, V. 2001. A Quantitative Comparison of Disclosure Control Methods for Microdata. In Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Amsterdam: North-Holland.

Gouweleeuw, J., Kooiman, P., Willenborg, L., and de Wolf, P. 1998. Post Randomisation for Statistical Disclosure Control: Theory and Implementation. Journal of Official Statistics. Vol. 14, No. 4, pp. 463-478.

Group Crises. 2004. Research Reports: Microaggregation for Privacy Protection in Statistical Databases. In July 2005. <http://vneumann.etse.urv.es/publications/reports/>.

Thank You!