Confidentiality Issues with “Small Cell” Data

Michael C. Samuel, DrPH

STD Control Branch

California Department of Public Health

2008 National STD Prevention Conference

Confronting Challenges, Applying Solutions

Chicago, Illinois, March 10-13, 2008

* NOT the issue of “small cells” with expected values < 5 and their impact on chi-square tests

            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       1          3          1      20
20-24          20      10         10         15      55
25-34           3      10         10          2      25
35+            12      14          7          2      35
Total          50      35         30         20     135

Confidentiality – Types of Disclosure

• Identity Disclosure
  – Identity of an individual can be determined based on the released data
    • Or …can reasonably be determined…

• Attribute Disclosure
  – Confidential information about an individual is revealed based on the released data
    • Or “sensitive” information; or “embarrassing” information

Extensive Literature

http://www.fcsm.gov/working-papers/spwp22.html

Key Resource:

Federal Committee on Statistical Methodology

Office of Management and Budget

Key Concepts

• Release of public health data
  – Balance obligations to protect the public’s health with obligations to respect individual privacy & confidentiality

• If “significant” risks
  – “Statistical Disclosure Limitation”

• True Risk versus Perception of Risk

Disclosure Limitation with Tabular Data

• If cells are deemed sensitive based on a specified threshold rule
  – Alter underlying “line-listed” data or “microdata” before the tables are constructed (may be a particularly relevant technique for on-line query systems)
  – Change table: aggregate rows or columns
  – Suppress cells

Threshold Rules

• Numerator rule
  – e.g. cell size <3, <5 (many)

• Population denominator rule
  – e.g. population < 20,000 (HIPAA-based), <50

• Numerator and population denominator rule
  – numerator > 10 AND denominator > 50 (Oregon cancer registry)

• Population denominator minus numerator rule
  – e.g. population – cell count < 10 (Missouri)
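The four threshold rules above can be written as small predicates that flag a cell as sensitive. This is a minimal sketch, not any agency's implementation: the function names and default thresholds are hypothetical, and the Oregon rule is read here as a condition for release (a cell is flagged when it fails the condition), which is an assumption about the slide's wording.

```python
def numerator_rule(count, threshold=5):
    """Flag a cell whose count is below the threshold (e.g. <3 or <5)."""
    return count < threshold


def denominator_rule(population, threshold=20_000):
    """Flag a cell whose underlying population is below the threshold."""
    return population < threshold


def numerator_and_denominator_rule(count, population,
                                   min_count=10, min_population=50):
    """Flag a cell unless numerator > min_count AND denominator > min_population.

    Assumption: the slide's condition describes when release is allowed.
    """
    return not (count > min_count and population > min_population)


def denominator_minus_numerator_rule(count, population, threshold=10):
    """Flag a cell when too few people in the population are NOT cases."""
    return population - count < threshold
```

The last rule targets attribute disclosure: when nearly everyone in a group is a case, membership in the group alone reveals the attribute.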

Cell Suppression

• Simple Cell Suppression
• Random Rounding
• Controlled Rounding
• Controlled Tabular Adjustment
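As an illustration of one item on this list, random rounding replaces each count with one of the two nearest multiples of a rounding base. The sketch below assumes one common unbiased variant, in which the probability of rounding up equals the remainder divided by the base; the function name and the base of 5 are hypothetical choices, not taken from the talk.

```python
import random


def random_round(count, base=5, rng=random):
    """Round count to an adjacent multiple of base, unbiased in expectation."""
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    # Round up with probability remainder/base, so E[result] == count.
    if rng.random() < remainder / base:
        return lower + base
    return lower
```

Because each cell is rounded independently, rounded cells generally no longer add up to rounded totals; controlled rounding addresses that by constraining the choices.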

No Suppression (“With Disclosure”)

Numbers from Working Paper 22, from Cox 1986

            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       1          3          1      20
20-24          20      10         10         15      55
25-34           3      10         10          2      25
35+            12      14          7          2      35
Total          50      35         30         20     135

Simple Suppression

Numbers from Working Paper 22, from Cox 1986

            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20      10         10         15      55
25-34           s      10         10          s      25
35+            12      14          7          s      35
Total          50      35         30         20     135

s – data withheld to limit disclosure
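Simple suppression on its own does not protect this table, because any row or column with a single withheld cell can be completed by subtraction from its total. The sketch below repeats that subtraction until nothing more can be filled in; the function name and the use of `None` for a suppressed cell are illustrative choices.

```python
def recover_single_gaps(cells, row_totals, col_totals):
    """Fill every suppressed cell (None) that is the only gap in its row or column."""
    cells = [row[:] for row in cells]  # work on a copy
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(cells):
            gaps = [j for j, v in enumerate(row) if v is None]
            if len(gaps) == 1:
                row[gaps[0]] = row_totals[i] - sum(v for v in row if v is not None)
                changed = True
        for j, total in enumerate(col_totals):
            gaps = [i for i, row in enumerate(cells) if row[j] is None]
            if len(gaps) == 1:
                known = sum(row[j] for row in cells if row[j] is not None)
                cells[gaps[0]][j] = total - known
                changed = True
    return cells


# The simple-suppression table above (columns: Black, White, Hispanic, Asian/PI).
simple = [
    [15, None, None, None],
    [20, 10, 10, 15],
    [None, 10, 10, None],
    [12, 14, 7, None],
]
recovered = recover_single_gaps(simple, [20, 55, 25, 35], [50, 35, 30, 20])
```

For this table every withheld value is recovered, which is why complementary suppression is needed.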

Simple & Complementary Row and/or Column Suppression

Numbers from Working Paper 22, from Cox 1986

            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20       s          s         15      55
25-34           s      10         10          s      25
35+             s      14          7          s      35
Total          50      35         30         20     135

s – data withheld to limit disclosure

Simple & Complementary Suppression

Numbers from Working Paper 22, from Cox 1986

            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20       s          s         15      55
25-34           s      10         10          s      25
35+             s      14          7          s      35
Total          50      35         30         20     135

s – data withheld to limit disclosure

Suppressed 15-19 Asian/PI cell = 1, based on linear combinations of the published cells and totals
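The recovery can be checked by hand. Write W_1, H_1, A_1 for the suppressed 15-19 White, Hispanic and Asian/PI cells, and W_2, H_2 for the suppressed 20-24 White and Hispanic cells (symbols introduced here for illustration; they do not appear on the slides). The published values in the table above then give:

```latex
\begin{align*}
W_1 + H_1 + A_1 &= 20 - 15 = 5        && \text{(15-19 row)} \\
W_2 + H_2       &= 55 - 20 - 15 = 20  && \text{(20-24 row)} \\
W_1 + W_2       &= 35 - 10 - 14 = 11  && \text{(White column)} \\
H_1 + H_2       &= 30 - 10 - 7 = 13   && \text{(Hispanic column)} \\
W_1 + H_1       &= (11 + 13) - 20 = 4 \\
A_1             &= 5 - 4 = 1
\end{align*}
```

No single row or column exposes the cell. It takes the combination of two rows and two columns to do so.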

Simple & Complementary – “Protected by Suppression”

Numbers from Working Paper 22, from Cox 1986

            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20      10         10         15      55
25-34           s       s         10          s      25
35+             s      14          s          s      35
Total          50      35         30         20     135

s – data withheld to limit disclosure

Methods are available to select appropriate cells for suppression and to audit a proposed suppression pattern
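One simple form of audit treats each suppressed cell as an unknown, writes one equation per row and column total, and checks which unknowns the equations determine exactly. The sketch below does this with Gauss-Jordan elimination over exact fractions. It is an illustrative check under stated assumptions, not the method used in the talk: the function name is hypothetical, and it ignores the fact that counts are non-negative, so it will not detect cells that are merely confined to a narrow range.

```python
from fractions import Fraction


def exactly_disclosed(cells, row_totals, col_totals):
    """Return {(row, col): value} for suppressed cells (None) fixed by the totals."""
    unknowns = [(i, j) for i, row in enumerate(cells)
                for j, v in enumerate(row) if v is None]
    index = {cell: k for k, cell in enumerate(unknowns)}
    n = len(unknowns)

    # One equation per row and per column:
    # sum of suppressed cells = total - sum of published cells.
    system = []
    for i, row in enumerate(cells):
        eq = [Fraction(0)] * (n + 1)
        eq[n] = Fraction(row_totals[i] - sum(v for v in row if v is not None))
        for j, v in enumerate(row):
            if v is None:
                eq[index[(i, j)]] = Fraction(1)
        system.append(eq)
    for j, total in enumerate(col_totals):
        eq = [Fraction(0)] * (n + 1)
        eq[n] = Fraction(total - sum(r[j] for r in cells if r[j] is not None))
        for i, row in enumerate(cells):
            if row[j] is None:
                eq[index[(i, j)]] = Fraction(1)
        system.append(eq)

    # Gauss-Jordan elimination to reduced row echelon form.
    pivots = []
    r = 0
    for c in range(n):
        p = next((k for k in range(r, len(system)) if system[k][c] != 0), None)
        if p is None:
            continue
        system[r], system[p] = system[p], system[r]
        pivot = system[r][c]
        system[r] = [x / pivot for x in system[r]]
        for k in range(len(system)):
            if k != r and system[k][c] != 0:
                factor = system[k][c]
                system[k] = [a - factor * b for a, b in zip(system[k], system[r])]
        pivots.append((r, c))
        r += 1

    # A cell is pinned down when its pivot row involves no other unknown.
    return {unknowns[c]: int(system[pr][n]) for pr, c in pivots
            if all(system[pr][k] == 0 for k in range(n) if k != c)}


row_totals, col_totals = [20, 55, 25, 35], [50, 35, 30, 20]
complementary = [[15, None, None, None], [20, None, None, 15],
                 [None, 10, 10, None], [None, 14, 7, None]]
protected = [[15, None, None, None], [20, 10, 10, 15],
             [None, None, 10, None], [None, 14, None, None]]
leaks = exactly_disclosed(complementary, row_totals, col_totals)
safe = exactly_disclosed(protected, row_totals, col_totals)
```

Applied to the two complementary patterns shown earlier, the check reports the 15-19 Asian/PI cell for the first pattern and nothing for the “Protected by Suppression” pattern. A fuller audit would also bound the feasible range of each suppressed cell.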

Los Angeles County - 2006

CA STD Control Suppression Rule

• Suppress any cell if
  – numerator ≠ 0 AND
  – 0 < (cell denominator – cell numerator) < 100

• AND, if so
  – Suppress any complementary cells necessary to avoid re-calculation of suppressed cell
  – OR
  – Suppress all cells in a table if any cell meets the criteria above
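The cell test and the simpler whole-table option can be sketched as follows. The function names and the representation of a table as a list of (numerator, denominator) pairs are hypothetical, and the complementary-suppression option is not implemented here.

```python
def ca_std_sensitive(numerator, denominator):
    """True when a cell meets the suppression criteria stated above."""
    return numerator != 0 and 0 < (denominator - numerator) < 100


def suppress_table(cells):
    """Whole-table option: withhold every cell if any single cell is sensitive.

    cells is a list of (numerator, denominator) pairs; None marks a withheld cell.
    """
    if any(ca_std_sensitive(n, d) for n, d in cells):
        return [None] * len(cells)
    return [n for n, _ in cells]
```

For example, `suppress_table([(3, 50), (10, 5000)])` withholds both cells because the first has only 47 non-cases, while `suppress_table([(0, 50), (10, 5000)])` releases both counts. The numbers are made up for illustration.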

Fresno County - 2006

Modoc County - 2006

Alpine County - 2006

Sierra County - 2006

Attribute Disclosure

Solano County - 2004

Recommendations

• Confidentiality Concerns
  – Assess real versus perceived risk
  – If real, determine best rule(s)
  – Proposition: suppress if:
    • Denominator – Numerator < 100 AND Numerator ≠ 0
    • If denominator unknown, estimate reasonably or use reasonable “numerator only” rule
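The proposed rule, including the fallback for an unknown denominator, might look like the sketch below. The function name, the use of `None` for an unknown denominator and the fallback threshold of 5 are assumptions made for illustration; the slide leaves the choice of “numerator only” rule open.

```python
from typing import Optional


def should_suppress(numerator: int,
                    denominator: Optional[int] = None,
                    numerator_only_threshold: int = 5) -> bool:
    """Apply the proposed rule, falling back to a numerator-only rule."""
    if numerator == 0:
        return False  # the proposition never suppresses empty cells
    if denominator is None:
        # Assumed fallback: a numerator-only rule with a hypothetical threshold.
        return numerator < numerator_only_threshold
    # As written on the slide, there is no lower bound on the difference.
    return denominator - numerator < 100
```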

?

Michael C. Samuel, Dr.P.H.

Michael.Samuel@cdph.ca.gov

510.620.3198
