![Page 1: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/1.jpg)
Anonymising quantitative data
Dr Sharon Bolton
UK Data Service
UK Data Archive, University of Essex
Anonymising Research Data workshop
Dublin, 22 June 2016
![Page 2: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/2.jpg)
The UK Data Service
• Single point of access to wide range of social science data:
ukdataservice.ac.uk
• Funded by the ESRC to serve the academic community: training
and guidance; UK Data Archive established 1967
• Used by academic researchers and students; government analysts;
charities; business; research centres; think tanks
• Survey microdata; cohort studies; international macrodata; census
data; qualitative/mixed methods data
• Support and guide data creators, including disclosure review
(anonymisation) and preparation for archiving
![Page 3: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/3.jpg)
Protecting confidentiality: the ‘5 Safes’
Five guiding principles:
• Safe people - educate researchers to use data safely
• Safe projects - research projects for ‘public good’
• Safe settings - SecureLab system for sensitive data
• Safe outputs - outputs from SecureLab projects are screened
• Safe data - treat the data to protect respondent
confidentiality
• For this session, we will concentrate (mostly) on Safe
data
![Page 4: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/4.jpg)
Data collection: planning
• Explain to respondents what archiving entails and gain agreement for data sharing – informed consent
• Think about disclosure risks before starting – what kind of information do you need to collect?
• Direct identifiers include: names; addresses; telephone numbers; email addresses; photos; (perhaps) IP addresses. Do you really need them?
• Unless explicit consent has been obtained for sharing, direct identifiers should always be removed from data
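As a minimal sketch of removing direct identifiers before sharing, the Python below drops a set of identifying fields from each record. The field names and sample records are hypothetical and would need to match the variables in your own dataset.

```python
# Hypothetical field names for direct identifiers; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "photo", "ip_address"}


def remove_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with direct-identifier fields dropped."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Made-up example records, for illustration only.
respondents = [
    {"name": "A. Example", "email": "a@example.org", "age": 34, "region": "East"},
    {"name": "B. Example", "email": "b@example.org", "age": 51, "region": "North"},
]

cleaned = remove_direct_identifiers(respondents)
print(cleaned)
```

The function returns new records rather than editing in place, so the original file can be kept securely while only the cleaned copy is prepared for archiving.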
![Page 5: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/5.jpg)
Anonymising data: indirect identifiers
Indirect identifiers include:
• Sensitive information: health information/medical
conditions; crime victimisation/offending; drug/alcohol
use etc.
• ‘Less sensitive’ information: age/birth date; educational
characteristics; employment details; religious affiliation;
household size; geographic area
• Look at demographics in combination (e.g.
demographics + geographies)
• Text/string variables – too detailed?
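One way to look at indirect identifiers in combination is to count how many respondents share each combination of values and flag the rare ones. The sketch below assumes hypothetical variable names and a threshold of 2 (so it flags unique combinations); a suitable threshold depends on the data and the access level.

```python
from collections import Counter


def rare_combinations(records, keys, threshold=2):
    """Return combinations of `keys` shared by fewer than `threshold` records."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Made-up records using hypothetical variable names.
records = [
    {"age_band": "16-20", "religion": "Catholic", "area": "Colchester"},
    {"age_band": "16-20", "religion": "Catholic", "area": "Colchester"},
    {"age_band": "61-64", "religion": "Moravian", "area": "Colchester"},
]

flagged = rare_combinations(records, keys=("age_band", "religion", "area"))
print(flagged)
```

Combinations flagged this way are candidates for aggregation or banding rather than automatic removal.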
![Page 6: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/6.jpg)
Anonymising indirect identifiers
• Aggregate categories to reduce precision
• Band ages, incomes, expenditure, etc. to disguise outliers
• Use standard coding frames – e.g. SOC2010
• Generalise meaning of detailed text
• Document the changes you make
• Talk to other researchers, archives, data services
Published guides:
• UCD Research Data Management Guide http://libguides.ucd.ie/data/ethics
• ONS Disclosure control guidance for microdata produced from social surveys http://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/disclosurecontrol/policyforsocialsurveymicrodata
![Page 7: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/7.jpg)
Anonymising data: new developments and tools
Statistical Disclosure Control (SDC) software is available:
• mu-Argus
  • standalone software package recommended by Eurostat for government statisticians
  • software and manual: http://neon.vb.cbs.nl/casc/mu.htm
• R tool: sdcMicro (GUI)
  • software and manual: http://www.inside-r.org/packages/cran/sdcMicro/docs/sdcMicro
  • new documentation being developed by UK Data Service, working with R developers
![Page 8: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/8.jpg)
Quiz 1: disclosive text in job title

| Job title | Frequency | Valid Percent |
|---|---|---|
| nurse | 73 | 73.0 |
| carer for elderly man | 1 | 1.0 |
| hospital ward cleaner | 1 | 1.0 |
| social science researcher | 1 | 1.0 |
| head of dental practice | 2 | 2.0 |
| cleaner in electronics factory | 1 | 1.0 |
| Financial Director, Sunnyview Care Home, Colchester | 1 | 1.0 |
| general manager | 1 | 1.0 |
| GP | 1 | 1.0 |
| Manager, Cotterill Village Stores | 1 | 1.0 |
| works in electronics factory | 1 | 1.0 |
| on benefits, not working | 1 | 1.0 |
| police officer | 2 | 2.0 |
| consultant, geriatric psychiatry | 1 | 1.0 |
| Reetired | 1 | 1.0 |
| retired | 1 | 1.0 |
| Retired | 1 | 1.0 |
| retirement | 1 | 1.0 |
| geography teacher | 2 | 2.0 |
| Teacher, music | 2 | 2.0 |
| Seondary school teeacher | 1 | 1.0 |
| unemployed | 1 | 1.0 |
| web designer | 2 | 2.0 |
| Total | 100 | 100.0 |
![Page 9: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/9.jpg)
Quiz 1: jobs coded with SOC2010

| Job title: SOC2010 | Frequency | Valid Percent |
|---|---|---|
| 1131: Director, financial | 1 | 1.0 |
| 1171: Manager, general | 1 | 1.0 |
| 1190: Manager, retail | 1 | 1.0 |
| 2231: Nurse | 73 | 73.0 |
| 2426: Researcher | 1 | 1.0 |
| 2215: Dentist | 2 | 2.0 |
| 2211: Doctor, medical | 2 | 2.0 |
| 3312: Officer, police | 2 | 2.0 |
| 2314: Teacher, secondary | 3 | 3.0 |
| 2137: Designer, web | 2 | 2.0 |
| 6145: Carer | 1 | 1.0 |
| 9139: Worker, factory | 1 | 1.0 |
| 9233: Cleaner | 2 | 2.0 |
| Retired | 4 | 4.0 |
| Unemployed | 2 | 2.0 |
| Total | 100 | 100.0 |
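
Coding free-text job titles to a standard frame can be sketched as a normalise-then-look-up step. The lookup below is illustrative only: it covers a handful of the Quiz 1 entries with the code labels shown on the slide, whereas a real project would use the full SOC2010 coding index. Anything it cannot match is returned as `None` for manual review rather than guessed.

```python
from collections import Counter

# Illustrative lookup only, not the SOC2010 coding index.
# Keys are normalised free-text answers; misspellings seen in the data are listed explicitly.
JOB_CODES = {
    "nurse": "2231: Nurse",
    "gp": "2211: Doctor, medical",
    "police officer": "3312: Officer, police",
    "web designer": "2137: Designer, web",
    "retired": "Retired",
    "reetired": "Retired",
    "retirement": "Retired",
}


def code_job_title(raw_title):
    """Normalise case and whitespace, then look the title up; None means 'code by hand'."""
    key = " ".join(raw_title.lower().split())
    return JOB_CODES.get(key)


titles = ["Retired", "Reetired", " retirement ", "GP", "Manager, Cotterill Village Stores"]
coded = [code_job_title(title) for title in titles]
print(Counter(coded))
```

Leaving unmatched titles for manual coding means disclosive entries such as a named employer are reviewed by a person rather than passed through unchanged.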
![Page 10: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/10.jpg)
Quiz 2: detailed religion categories
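
| Code | Religious affiliation | Frequency | Valid Percent |
|---|---|---|---|
| 1 | Protestant | 41 | 41.4 |
| 2 | Anglican | 4 | 4.0 |
| 3 | Catholic | 26 | 26.3 |
| 4 | Muslim | 8 | 8.1 |
| 5 | Sikh | 5 | 5.1 |
| 6 | Jehovah's Witness | 6 | 6.1 |
| 7 | Methodist | 1 | 1.0 |
| 8 | Mormon | 1 | 1.0 |
| 9 | Baptist | 1 | 1.0 |
| 10 | Buddhist | 3 | 3.0 |
| 11 | None | 1 | 1.0 |
| 12 | No religion | 1 | 1.0 |
| 13 | Moravian | 1 | 1.0 |
| | Total | 99 | 100.0 |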
![Page 11: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/11.jpg)
Quiz 2: religion categories aggregated
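
| Code | Religious affiliation | Frequency | Valid Percent |
|---|---|---|---|
| 1 | Protestant | 49 | 49.0 |
| 3 | Catholic | 26 | 26.0 |
| 4 | Muslim | 8 | 8.0 |
| 5 | Sikh | 5 | 5.0 |
| 6 | Other religion | 10 | 10.0 |
| 7 | No religion | 2 | 2.0 |
| | Total | 100 | 100.0 |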
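
Aggregating detailed categories can be sketched as a recode map applied to the detailed frequencies. The detailed counts below are taken from the Quiz 2 table; the grouping itself is an assumption about which categories were merged, so the resulting totals need not match the aggregated slide exactly.

```python
from collections import Counter

# Detailed frequencies from the Quiz 2 table.
detailed_counts = {
    "Protestant": 41, "Anglican": 4, "Catholic": 26, "Muslim": 8, "Sikh": 5,
    "Jehovah's Witness": 6, "Methodist": 1, "Mormon": 1, "Baptist": 1,
    "Buddhist": 3, "None": 1, "No religion": 1, "Moravian": 1,
}

# Assumed grouping, for illustration; categories not listed keep their own label.
RECODE = {
    "Anglican": "Protestant", "Methodist": "Protestant",
    "Baptist": "Protestant", "Moravian": "Protestant",
    "Jehovah's Witness": "Other religion", "Mormon": "Other religion",
    "Buddhist": "Other religion",
    "None": "No religion",
}

aggregated = Counter()
for category, frequency in detailed_counts.items():
    aggregated[RECODE.get(category, category)] += frequency

print(dict(aggregated))
```

Keeping the recode map alongside the data is also a simple way to document the change.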
![Page 12: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/12.jpg)
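Quiz 3: age in years

| Age in years | Frequency | Valid Percent |
|---|---|---|
| 16 | 3 | 3.0 |
| 17 | 3 | 3.0 |
| 18 | 9 | 9.0 |
| 19 | 9 | 9.0 |
| 20 | 16 | 16.0 |
| 21 | 4 | 4.0 |
| 22 | 2 | 2.0 |
| 23 | 2 | 2.0 |
| 24 | 2 | 2.0 |
| 25 | 2 | 2.0 |
| 26 | 2 | 2.0 |
| 27 | 2 | 2.0 |
| 28 | 2 | 2.0 |
| 29 | 2 | 2.0 |
| 30 | 2 | 2.0 |
| 31 | 1 | 1.0 |
| 32 | 1 | 1.0 |
| 40 | 11 | 11.0 |
| 41 | 1 | 1.0 |
| 42 | 1 | 1.0 |
| 43 | 3 | 3.0 |
| 49 | 1 | 1.0 |
| 50 | 13 | 13.0 |
| 51 | 1 | 1.0 |
| 60 | 1 | 1.0 |
| 61 | 1 | 1.0 |
| 62 | 1 | 1.0 |
| 63 | 1 | 1.0 |
| 64 | 1 | 1.0 |
| Total | 100 | 100.0 |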
![Page 13: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/13.jpg)
Quiz 3: banded age

| Code | Age (banded) | Frequency | Valid Percent |
|---|---|---|---|
| 1 | 16-20 | 40 | 40.0 |
| 2 | 21-30 | 22 | 22.0 |
| 3 | 31-40 | 13 | 13.0 |
| 4 | 41-50 | 19 | 19.0 |
| 5 | 51-64 | 6 | 6.0 |
| | Total | 100 | 100.0 |
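
Banding can be sketched as assigning each age to a non-overlapping range and summing the frequencies. The single-year counts below come from the Quiz 3 age table; the band edges are an assumption chosen so that every age from 16 to 64 falls into exactly one band.

```python
from collections import Counter

# Assumed, non-overlapping bands (inclusive at both ends).
BANDS = [(16, 20), (21, 30), (31, 40), (41, 50), (51, 64)]


def band_age(age, bands=BANDS):
    """Return the band label for an age, or None if it falls outside every band."""
    for low, high in bands:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


# Single-year frequencies from the Quiz 3 table.
age_counts = {
    16: 3, 17: 3, 18: 9, 19: 9, 20: 16, 21: 4, 22: 2, 23: 2, 24: 2, 25: 2,
    26: 2, 27: 2, 28: 2, 29: 2, 30: 2, 31: 1, 32: 1, 40: 11, 41: 1, 42: 1,
    43: 3, 49: 1, 50: 13, 51: 1, 60: 1, 61: 1, 62: 1, 63: 1, 64: 1,
}

banded = Counter()
for age, frequency in age_counts.items():
    banded[band_age(age)] += frequency

print(dict(banded))
```

Checking that the banded frequencies still sum to the original total is a quick guard against gaps or overlaps in the band definitions.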
![Page 14: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/14.jpg)
Access control
• Don’t over-anonymise - find a balance between protecting
respondents’ confidentiality and maintaining the research
usability of the data
• Can’t fully anonymise data without removing all the
useful detail? Go back to the 5 Safes – think about
access control: Safe people, Safe settings, Safe outputs
![Page 15: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/15.jpg)
Access control
• At UK Data Service, data available under 3 access levels:
  • OPEN – open public access
  • SAFEGUARDED – downloadable, but use is traceable
    • Registered users only (agree not to try to identify any individual respondents)
    • Special agreements/licence: permission-only access; approved projects – usage agreed in advance
  • CONTROLLED – accredited users take a further training course
    • Access via on-site safe setting or virtual secure environment (SecureLab)
    • Outputs disclosure-checked before publication
![Page 16: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/16.jpg)
Anonymising quantitative data: summary
• Informed consent
• Think about level of detail needed before data collection
• Remove direct identifiers
• Check and treat indirect identifiers to reduce disclosure
risk
• Document your changes
• Balance anonymisation with access control to preserve
data usability
![Page 17: Anonymising quantative data](https://reader037.vdocument.in/reader037/viewer/2022100807/58752dc71a28abe7728b4efd/html5/thumbnails/17.jpg)
Questions?
Guidance on anonymisation:
• UCD: http://libguides.ucd.ie/data/ethics
• UKDS: www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation
• Managing and Sharing Research Data book: https://uk.sagepub.com/en-gb/eur/managing-and-sharing-research-data/book240297