Charles (Chuck) Humphrey University of Alberta August 2007 Progressing from Safe Data to Safe Output to Safe Metadata Managing the risk of disclosure

Charles (Chuck) Humphrey
University of Alberta
August 2007

Progressing from Safe Data to Safe Output to Safe Metadata

Managing the risk of disclosure

2

[Figure: Research Data Centre locations. Labels: Victoria, Sudbury, Laval, McGill, Sherbrooke, UQAM, COOL, Queen's, Western, Manitoba, BCIRDC, Prairie, Alberta, SWORDC, McMaster, Toronto, CRISP, CIQSS, Atlantic. Legend: Centres in first CFI application; Provincial funds leveraged by first CFI application.]

3

Cumulative Growth of Projects up to April 13, 2007

[Chart; vertical axis: count, 0 to 1400]

Status                  2000  2001  2002  2003  2004  2005  2006  2007
Active/Dormant            31   122   251   350   481   600   754   534
Proposal in evaluation     1     7     0    19    14    18    30    72
Completed                  0     0     0     2    47   131   233   448

Incomplete/Withdrawn: 196 (a single value; the transcript does not show which year it belongs to)

Source: Gustave Goldmann, RDC Manager's Report, April 2007

4

Data & Knowledge: an Old Problem

How does one maximise the creation of new knowledge from confidential data?

For data producers, the problem is expressed as the risk of disclosure. What level of access can be permitted while minimising the risk of disclosure?

From the researcher's perspective, the problem is one of having the fullness of data to create new knowledge. Do all the variables exist to explain the phenomena?

5

6

The Canadian Context

Two microdata dissemination mechanisms
  - 30 years of Public Use Microdata Files (PUMFs)
  - 7 years of Research Data Centres (RDCs)

25-year gap between PUMFs and RDCs
  - 3 computing paradigm shifts
  - Statistics Canada introduces exorbitant data prices
  - No longitudinal PUMFs coming out of the 1990s

7

The Canadian Context

The context coming out of the 1990s
  - Downsizing of the public service, especially policy analyst positions.
  - A crisis of the national health care system.
  - The introduction of a new wave of longitudinal surveys.
  - The rise of evidence-based policy making and its demands for policy-relevant knowledge.

Recently, microdata from administrative records and the Census

8

Approaches to Managing Risk

Public Use Microdata
  - Created at the tail end of a survey's production stream with a focus on satisfying the Data Release Committee's concerns about confidentiality

Confidential Data
  - Access is created on a project-by-project basis in a secure environment controlled by Statistics Canada
  - Focus is on disclosure analysis and 'safe' output

9

Adding More Information

Adding more organised information in the form of metadata.

Metadata consist of information describing other information in a format that is web actionable, can be migrated easily with changes in technology and can be preserved.

In this instance, what is being described is the information produced within the separate stages of the research life cycle.

10

The RDC Research Life Cycle

Stages in the life cycle:
  1. Project Application
  2. Project Approval
  3. Project Creation
  4. Access to Data
  5. Generate Analysis Files
  6. Output Disclosure Analysis
  7. Research Communications

11

The RDC Research Life Cycle

[Life-cycle diagram, repeated. Stages in the life cycle: Project Application, Project Approval, Project Creation, Access to Data, Generate Analysis Files, Output Disclosure Analysis, Research Communications]

12

Metadata in the Life Cycle

[Diagram labels: Statistics Canada Master Files; Access to master files; Repurpose master files; Analysis through Multiple Working Files]

Statistics Canada Master Files
  - Generate DDI metadata to 1.0/2.0 standards in a retrospective conversion project
  - Develop tools to convert DDI 1.0/2.0 to DDI 3.0 and incorporate the Questionnaire module

Analysis through Multiple Working Files
  - Subset, Recode, Compute, Merge
  - Multiple Versions

13

Metadata in the Life Cycle

[Diagram labels: Analysis through Multiple Working Files; Access to master files; Repurpose master files; Multiple Versions]

Generate DDI metadata for working files using tools that read statistical system and syntax / log files. These metadata files can be used to document the working files, to produce products from the metadata (e.g., a codebook listing or PowerPoint slides), to be linked to research communications and to recreate the working file from its master data file.

A validation tool will compare a working file against a virtually generated working file.
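The idea of recreating a working file from its master file, and then validating it, might be sketched as follows. This is only an illustration under stated assumptions: the step vocabulary (`subset`, `recode`, `compute`, `keep_vars`), the function names, and the sample records are all hypothetical, and the steps are held as plain Python dictionaries rather than real DDI markup.

```python
from typing import Any

Row = dict[str, Any]


def apply_steps(master: list[Row], steps: list[dict[str, Any]]) -> list[Row]:
    """Replay recorded derivation steps to build a 'virtual' working file."""
    rows = [dict(r) for r in master]  # copy so the master file is never changed
    for step in steps:
        op = step["op"]
        if op == "subset":
            # keep only the cases whose value is in the allowed list
            rows = [r for r in rows if r[step["var"]] in step["keep_values"]]
        elif op == "recode":
            # map old codes to new ones; unmapped codes pass through
            mapping = step["mapping"]
            for r in rows:
                r[step["var"]] = mapping.get(r[step["var"]], r[step["var"]])
        elif op == "compute":
            # this sketch supports only one derivation: a sum of variables
            for r in rows:
                r[step["new_var"]] = sum(r[v] for v in step["sum_of"])
        elif op == "keep_vars":
            rows = [{v: r[v] for v in step["vars"]} for r in rows]
        else:
            raise ValueError(f"unknown operation: {op}")
    return rows


def validate(working: list[Row], master: list[Row], steps: list[dict[str, Any]]) -> bool:
    """Compare an actual working file with the virtually generated one."""
    return working == apply_steps(master, steps)


# Hypothetical master records and the recorded steps that describe a working file.
master = [
    {"id": 1, "prov": "AB", "sex": 1, "wages": 40000, "other_income": 5000},
    {"id": 2, "prov": "ON", "sex": 2, "wages": 52000, "other_income": 0},
    {"id": 3, "prov": "AB", "sex": 2, "wages": 31000, "other_income": 1200},
]
steps = [
    {"op": "subset", "var": "prov", "keep_values": ["AB"]},
    {"op": "recode", "var": "sex", "mapping": {1: "M", 2: "F"}},
    {"op": "compute", "new_var": "total_income", "sum_of": ["wages", "other_income"]},
    {"op": "keep_vars", "vars": ["id", "sex", "total_income"]},
]

working = apply_steps(master, steps)
print(working)
print(validate(working, master, steps))
```

Keeping the steps as data rather than code is what would let the same record serve as documentation, as a recipe for regenerating the file, and as the input to a validation check.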

14

The RDC Research Life Cycle

[Life-cycle diagram, repeated. Stages in the life cycle: Project Application, Project Approval, Project Creation, Access to Data, Generate Analysis Files, Output Disclosure Analysis, Research Communications]

15

Metadata in the Life Cycle

[Diagram labels: Analysis through Working Files; Output from Working Files; Working Data Files; Metadata; Disclosure Analysis; Tables; Reports; Research Communications; Website; Journals; Conferences; Repository; Supporting metadata]

16

A Third Method of Managing Risk

Metadata-driven risk assessment
  - Introduces the researcher's data needs as the central focus
  - Uses metadata to drive the process of determining the access mechanism
  - Requires standard metadata across all surveys
  - Must be a service outside the survey production stream

17

New Role for Research Data Centres

Research Data Centres are not part of a survey production stream and provide the only continuous dissemination service for confidential data in Statistics Canada.
  - Canada Foundation for Innovation funding is providing infrastructure allowing a rationalisation of RDC operations

Research Data Centre Network is creating DDI-compliant metadata for all confidential data files in its Centres.
  - Canada Foundation for Innovation is funding metadata creation and tools development projects

18

User-driven Continuum of Risk Management

[Flow diagram boxes, in order: Researcher, RDCN, Continuum of Access, Researcher]

Researcher
  - Searches metadata for public and confidential microdata and identifies a subset of variables and cases.
  - Submits subset request to RDCN. Request is automatically in metadata format.

RDCN
  - Request received in metadata format.
  - Risk scores calculated and threshold determined.

Continuum of Access
  - RDC application prepared and returned to researcher
  - Confidential subset created in RDC; data sent pending contract
  - Metadata setup generated for extracting subset from PUMF
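The routing step in this flow could look roughly like the sketch below. Everything numeric here is an assumption made for illustration: the per-variable risk levels, the two thresholds, the variable names and the simple additive score are hypothetical, and a real disclosure assessment would also need to consider combinations of variables.

```python
from dataclasses import dataclass

# Hypothetical per-variable risk levels (higher means more identifying).
VARIABLE_RISK = {
    "age_group": 1,
    "province": 1,
    "occupation_4digit": 3,
    "income_exact": 4,
    "postal_code": 5,
}
UNKNOWN_RISK = 5   # assumption: undocumented variables are treated as high risk

# Hypothetical thresholds separating the three points on the continuum of access.
PUMF_MAX = 3       # at or below: subset can be extracted from the PUMF
CONTRACT_MAX = 7   # at or below: confidential subset sent pending contract


@dataclass(frozen=True)
class SubsetRequest:
    """A researcher's request, already expressed as metadata."""
    survey: str
    variables: tuple[str, ...]


def risk_score(request: SubsetRequest) -> int:
    """Overall score as the sum of the individual variable risk levels."""
    return sum(VARIABLE_RISK.get(v, UNKNOWN_RISK) for v in request.variables)


def access_route(request: SubsetRequest) -> str:
    """Pick an access mechanism from the overall score."""
    score = risk_score(request)
    if score <= PUMF_MAX:
        return "pumf_extract"
    if score <= CONTRACT_MAX:
        return "confidential_subset_under_contract"
    return "rdc_application"


low = SubsetRequest("example_survey", ("age_group", "province"))
mid = SubsetRequest("example_survey", ("age_group", "income_exact"))
high = SubsetRequest("example_survey", ("postal_code", "occupation_4digit"))

for req in (low, mid, high):
    print(req.variables, risk_score(req), access_route(req))
```

Because the request arrives as metadata, the same record that drives the score could also drive the output: a PUMF extraction setup, a confidential subset, or a pre-filled RDC application.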

19

Future Work

Develop risk-level measures for individual variables and ascertain thresholds of overall scores.
  - Make use of survey managers' experiences with preparing public use microdata files for the Data Release Committee
  - Build a database of RDC disclosure decisions and incorporate it in project metadata

Through metadata we can improve access to detailed microdata while safeguarding the confidentiality of respondents.