Output Consultation Plans and Statistical Disclosure Control Strategy Developments
Angele Storey and Jane Longhurst ONS
Background
• Census users require a range of different outputs
• SARs are a key output
• Detailed information may raise confidentiality issues
• Requirements to protect data:
– Public trust and cooperation
– Legal rights and obligations
– National and international standards for statistics
• To balance confidentiality requirements with maximising utility to meet user needs
• Need comprehensive understanding of user needs and then
Update on…..
• User consultation
• Progress on Microdata
• Statistical Disclosure Control (SDC) developments
User consultation – tabular output
• Current activities are focussing on understanding users’ high-level output requirements for tabular output and microdata samples
• 38 in-depth interviews conducted with users drawn from across census user communities
• On-line survey planned to test emerging themes with wider user base
User consultation – tabular output cont…
• Areas being explored with users in the interviews and on-line consultation include:
– Balance of additivity, consistency and accuracy in tables
– Level and complexity of outputs used and required
– Strength of user requirement for flexible output and level of flexibility required
– Metadata use and requirements
– Access, dissemination methods and media
Census Personas as aid to output design
2011 Census output geography
• ONS consultation on small area geography for England and Wales: Nov ’06 to Feb ’07
http://www.statistics.gov.uk/about/consultations/Small_Area_Geography_Policy.asp
• Conclusions:
– High degree of stability at both OA and SOA level
– Minimal changes after Census 2011, limited to less than 5% of the OAs nationally
– Changes made by simple mergers and splits
• OA/SOA hierarchy will be primary output geography for 2011 Census output
Progress on microdata
• Census Microdata Strategy Working Group (CMSWG) set up May 2007
• Includes representatives from CCSR, GROS and NISRA
• Responsible for determining strategy for specification and production of 2011 Census microdata samples
• First step – to explore high level user requirements for 2011 Census microdata samples
• CCSR: presenting initial findings from survey this pm
• Survey findings will be considered at next meeting of CMSWG
Key future dates in microdata work programme
• April ’08 – Feb ’09
- CCSR consultation with users on content of microdata samples
- CCSR work with ONS to develop draft of statistical specification
• Feb ’09 – June ’09
- CCSR consult formally on statistical specification of microdata samples
• Dec ’10
- Final agreed specifications for microdata samples
• Jan ’11 onwards
- Development, production, testing and delivery stages
Approach to developing SDC policy and methodology for 2011 Census output
• SDC for 2011 Census outputs is a major concern for users
• Different SDC methodologies were adopted for standard tabular 2001 Census outputs across the UK
• Late addition of small cell adjustment by ONS/NISRA resulted in a high level of user confusion and dissatisfaction
• Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs
Approach continued…
• Phase 1 (March ’06 – Jan ’07)
– UK agreement of key SDC policy issues
• Phase 2 (Jan ’07 – Sept ’08)
– Evaluation of all methods complying with agreed SDC policy position in terms of risk/utility framework and feasibility of implementation
• Phase 3 (Sept ’08 – April ’09)
– Recommendations and UK agreement of SDC methodologies for 2011 Census tabular outputs
• Phase 4 (Jan ’09 onwards)
– Evaluate and develop SDC methods for microdata, future work on output specification, system specification, development and testing
SDC progress
• RsG agreed high-level SDC policy – 2006
• UK SDC working group established to take forward work
• UKCDMAC subgroup established to QA work
• Phase 2 (focus on tabular outputs):
– Evaluation of 2001
– High-level qualitative evaluation and short-listing of SDC methods for tabular output
– Detailed quantitative evaluation of short-list
Qualitative criteria
• Primary criteria
– Additivity and consistency
– Overall user acceptability
– Protection against differencing
– Feasibility of implementation
• Secondary criteria
– Impact on microdata releases
– Simple to understand
– Easy to account for in analyses
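To illustrate the "protection against differencing" criterion: when tables are published for two slightly different geographies, subtracting one from the other can expose counts for the small area between them. The sketch below uses invented counts and an assumed small-cell threshold of 3; neither comes from the Census programme.

```python
# Hypothetical counts for two overlapping areas (illustrative data only).
area_large = {"employed": 120, "unemployed": 9, "student": 14}
area_small = {"employed": 119, "unemployed": 9, "student": 13}

SMALL_CELL_THRESHOLD = 3  # assumed threshold, chosen for illustration


def difference_tables(larger, smaller):
    """Subtract one published table from another, cell by cell."""
    return {category: larger[category] - smaller[category] for category in larger}


# Counts for the "sliver" of geography covered by one table but not the other.
sliver = difference_tables(area_large, area_small)

# Non-zero cells below the threshold are the ones an intruder could exploit.
risky_cells = {c: n for c, n in sliver.items() if 0 < n < SMALL_CELL_THRESHOLD}
print(sliver, risky_cells)
```

Neither published table contains a small cell, yet the difference does, which is why a method that only treats small cells within each table may not be sufficient.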
SDC short-list
• Record swapping
• Over-imputation
• ABS Cell Perturbation method
• Small cell adjustment with record swapping (to provide comparison with 2001)
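As a rough illustration of the first method on the short-list, record swapping exchanges the geography of a record with that of a similar record in another area. The sketch below is a simplified toy version, not the method evaluated for the Census: the field names ("area", "hh_size"), the matching rule and the swap rate are all assumptions.

```python
import random


def swap_records(records, swap_rate, match_key, rng):
    """Return a copy of records in which roughly swap_rate of them have
    exchanged 'area' with a record from a different area that has the
    same value of match_key. The input list is left unchanged."""
    swapped = [dict(r) for r in records]
    n_target = int(len(swapped) * swap_rate)
    order = list(range(len(swapped)))
    rng.shuffle(order)
    used = set()
    n_swapped = 0
    for i in order:
        if n_swapped >= n_target:
            break
        if i in used:
            continue
        # Candidate partners: unused records in another area with the same key.
        partners = [
            j for j in range(len(swapped))
            if j != i and j not in used
            and swapped[j][match_key] == swapped[i][match_key]
            and swapped[j]["area"] != swapped[i]["area"]
        ]
        if not partners:
            continue
        j = rng.choice(partners)
        swapped[i]["area"], swapped[j]["area"] = swapped[j]["area"], swapped[i]["area"]
        used.update((i, j))
        n_swapped += 2
    return swapped


# Hypothetical household records in two areas.
households = [
    {"id": k, "area": "A" if k < 5 else "B", "hh_size": k % 2 + 1}
    for k in range(10)
]
protected = swap_records(households, 0.4, "hh_size", random.Random(0))
```

Because records are exchanged in matched pairs, the number of records in each area and the distribution of the matching variable are preserved, while the location of individual records becomes uncertain.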
Quantitative evaluation
• Evaluate short-list
• Range of tables using 2001 Census data
• Balance between risk and utility
• Combine quantitative results with qualitative criteria
• Aligned with work on Outputs, Downstream Processing and Geography
• Continued communication and consultation with users
Timetable
• Phase 1 (March ’06 – Jan ’07)
– UK agreement of key SDC policy issues
• Phase 2 (Jan ’07 – Sept ’08)
– Evaluation of all methods complying with agreed SDC policy position in terms of risk/utility framework and feasibility of implementation
• Phase 3 (Sept ’08 – Spring/Summer ’09)
– Recommendations and UK agreement of SDC methodologies for 2011 Census tabular outputs
• Phase 4 (Feb ’09 onwards)
– Evaluate and develop SDC methods for microdata, future work on output specification, system specification, development and testing
Risk Assessment for microdata
• Disclosure risk depends on records that are unique in the sample and in the population
• Evaluate risk using scenarios and quantitative risk measures
Risk assessment for microdata
• Disclosure risk scenarios
– Assumptions concerning prior knowledge of intruder and information available to him, e.g. private database, journalist, nosy neighbour
– Identify key variables (indirectly identifying variables)
– Use this process to decide what needs to be protected against
– Need to update scenarios
• Quantitative risk measures
– Percentage of population uniques that are in the file
– Identify high risk records
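The first quantitative measure above can be sketched as follows, assuming the full population file is available (as it is for a census) and that each record carries an identifier. The key variables, field names and records are invented for illustration.

```python
from collections import Counter


def key_of(record, key_vars):
    """Combination of key-variable values for one record."""
    return tuple(record[v] for v in key_vars)


def population_uniques_in_sample(population, sample_ids, key_vars):
    """Return (percentage of population uniques that are in the sample,
    ids of those high-risk records)."""
    counts = Counter(key_of(r, key_vars) for r in population)
    pop_unique_ids = {
        r["id"] for r in population if counts[key_of(r, key_vars)] == 1
    }
    in_sample = pop_unique_ids & set(sample_ids)
    if not pop_unique_ids:
        return 0.0, in_sample
    return 100.0 * len(in_sample) / len(pop_unique_ids), in_sample


# Hypothetical population and sample (illustrative data only).
population = [
    {"id": 1, "age_band": "20-29", "sex": "F", "occupation": "nurse"},
    {"id": 2, "age_band": "20-29", "sex": "F", "occupation": "nurse"},
    {"id": 3, "age_band": "30-39", "sex": "M", "occupation": "farmer"},
    {"id": 4, "age_band": "30-39", "sex": "F", "occupation": "teacher"},
    {"id": 5, "age_band": "60-69", "sex": "M", "occupation": "judge"},
    {"id": 6, "age_band": "20-29", "sex": "M", "occupation": "clerk"},
]
key_vars = ["age_band", "sex", "occupation"]
pct, high_risk = population_uniques_in_sample(population, [1, 3, 5], key_vars)
print(pct, sorted(high_risk))
```

Records 3 and 5 are unique on the key variables in the population and appear in the sample, so they would be flagged as high risk under this scenario.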
SDC for Microdata
• Perturbative methods
– PRAM (implemented for 2001 Census SAR)
– Record swapping
– Adding noise
• Non-perturbative methods
– Recoding (implemented for 2001 Census SAR)
– Suppression
– Sub-sampling
• Mixed strategies
• Different methods will be evaluated
• Need to recognise interdependence with SDC method for tabular outputs
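As a rough illustration of PRAM (the post-randomisation method), each value of a categorical variable is replaced at random according to a table of transition probabilities. The sketch below is not the implementation used for the 2001 Census SAR; the categories and probabilities are assumptions chosen only to show the mechanism.

```python
import random


def pram(values, transition, rng):
    """Replace each categorical value by a draw from its row of the
    transition table: transition[v][w] = P(publish w | true value v)."""
    perturbed = []
    for v in values:
        categories = list(transition[v].keys())
        weights = list(transition[v].values())
        perturbed.append(rng.choices(categories, weights=weights, k=1)[0])
    return perturbed


# Hypothetical transition table: most values are kept, a few are changed.
transition = {
    "owner":  {"owner": 0.90, "renter": 0.05, "other": 0.05},
    "renter": {"owner": 0.05, "renter": 0.90, "other": 0.05},
    "other":  {"owner": 0.05, "renter": 0.05, "other": 0.90},
}
tenure = ["owner", "renter", "other", "owner", "owner", "renter"]
perturbed_tenure = pram(tenure, transition, random.Random(42))
```

Because the transition probabilities are known, an analyst can in principle correct aggregate estimates for the perturbation, which relates to the "easy to account for in analyses" criterion.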
Access Options
• ‘End user licence’ access to low (but not zero) risk data
• ‘Special licence’ access to detailed but anonymised data
• On-site laboratory access - ‘safe centre’
• Remote access/execution
Impact of legislation
• Statistics and Registration Service Act – April 2008
• Personal information held by the Board must not be disclosed unless an exemption holds
– E.g. to an approved researcher
• ONS approved researcher working group
• ONS SDC microdata standards working group
Summary
• Ongoing progress made for 2011 Census
• Output consultation
– Tables
– Microdata
– Geography
• Statistical disclosure control
– Short-list of SDC methods for protecting tables
– Quantitative evaluation of short-list
– Future work on microdata