Confidentiality Issues with “Small Cell” Data Michael C. Samuel, DrPH STD Control Branch California Department of Public Health 2008 National STD Prevention Conference Confronting Challenges, Applying Solutions Chicago, Illinois, March 10-13, 2008

Page 1:

Confidentiality Issues with “Small Cell” Data

Michael C. Samuel, DrPH
STD Control Branch

California Department of Public Health

2008 National STD Prevention Conference
Confronting Challenges, Applying Solutions

Chicago, Illinois, March 10-13, 2008

Page 2:

* NOT the issue of “small cells” with expected values < 5 and their impact on chi-square tests

                     Race/Ethnicity
Age Group  Black   White   Hispanic   Asian/PI   Total
15-19         15       1          3          1      20
20-24         20      10         10         15      55
25-34          3      10         10          2      25
35+           12      14          7          2      35
Total         50      35         30         20     135

Page 3:

Confidentiality – Types of Disclosure

• Identity Disclosure
  – Identity of an individual can be determined based on the released data
    • Or …can reasonably be determined…

• Attribute Disclosure
  – Confidential information about an individual is revealed based on the released data
    • Or “sensitive” information; or “embarrassing” information

Page 4:

Extensive Literature

http://www.fcsm.gov/working-papers/spwp22.html

Key Resource:

Federal Committee on Statistical Methodology
Office of Management and Budget

Page 5:

Key Concepts

• Release of public health data
  – Balance obligations to protect the public’s health with obligations to respect individual privacy & confidentiality

• If “significant” risks
  – “Statistical Disclosure Limitation”

• True Risk versus Perception of Risk

Page 6:

Disclosure Limitation with Tabular Data

• If cells are deemed sensitive based on a specified threshold rule
  – Alter underlying “line-listed” data or “microdata” before the tables are constructed (may be a particularly relevant technique for on-line query systems)
  – Change table: aggregate rows or columns
  – Suppress cells

Page 7:

Threshold Rules

• Numerator rule
  – e.g. cell size < 3, < 5 (many)

• Population denominator rule
  – e.g. population < 20,000 (HIPAA-based), < 50

• Numerator and population denominator rule
  – numerator > 10 AND denominator > 50 (Oregon cancer registry)

• Population denominator minus numerator rule
  – e.g. population minus cell count < 10 (Missouri)
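The threshold rules above can be read as simple yes/no tests on a cell. The following is a minimal sketch, not any agency's actual implementation: the function names are hypothetical, the default cutoffs are just the example values from the slide, and each function returns True when the cell would be flagged as sensitive. The Oregon rule is left out because the slide does not say whether it describes cells to release or cells to suppress.

```python
def numerator_rule(count: int, cutoff: int = 5) -> bool:
    """Flag a cell whose count is below the cutoff (slide examples: <3, <5)."""
    return count < cutoff


def denominator_rule(population: int, cutoff: int = 20_000) -> bool:
    """Flag a cell whose underlying population is below the cutoff."""
    return population < cutoff


def denominator_minus_numerator_rule(count: int, population: int, cutoff: int = 10) -> bool:
    """Flag a cell when too few people in the population are NOT counted in it."""
    return (population - count) < cutoff


if __name__ == "__main__":
    # A cell with 45 cases in a population of 50 passes the numerator rule
    # but is flagged by the denominator-minus-numerator rule.
    print(numerator_rule(45), denominator_minus_numerator_rule(45, 50))
```

The last rule targets attribute disclosure: when nearly everyone in a small group is a case, knowing that someone belongs to the group reveals their status even though the count itself is not small.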

Page 8:

Cell Suppression

• Simple Cell Suppression
• Random Rounding
• Controlled Rounding
• Controlled Tabular Adjustment
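As an illustration of one item in this list, random rounding replaces each count with a neighbouring multiple of a base. The sketch below assumes the common unbiased variant (round up with probability remainder/base, so the expected value equals the true count); the function name, the base of 5, and the seeded generator are illustrative choices, not taken from the slides.

```python
import random
from typing import Optional


def random_round(count: int, base: int = 5, rng: Optional[random.Random] = None) -> int:
    """Randomly round count to an adjacent multiple of base (unbiased variant)."""
    rng = rng or random.Random()
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base, left unchanged
    lower = count - remainder
    # Round up with probability remainder/base so that E[result] == count.
    if rng.random() < remainder / base:
        return lower + base
    return lower


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed only so the demo is repeatable
    row = [15, 1, 3, 1]        # the 15-19 row of the example table
    print([random_round(v, 5, rng) for v in row])
```

Because each cell is rounded independently, rounded cells generally no longer add up to the rounded totals; controlled rounding is the variant that restores additivity.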

Page 9:

No Suppression (“With Disclosure”)
Numbers from Working Paper 22, from Cox 1986

                     Race/Ethnicity
Age Group  Black   White   Hispanic   Asian/PI   Total
15-19         15       1          3          1      20
20-24         20      10         10         15      55
25-34          3      10         10          2      25
35+           12      14          7          2      35
Total         50      35         30         20     135

Page 10:

No Suppression (“With Disclosure”)
Numbers from Working Paper 22, from Cox 1986

                     Race/Ethnicity
Age Group  Black   White   Hispanic   Asian/PI   Total
15-19         15       1          3          1      20
20-24         20      10         10         15      55
25-34          3      10         10          2      25
35+           12      14          7          2      35
Total         50      35         30         20     135

Page 11:

Simple Suppression
Numbers from Working Paper 22, from Cox 1986

                     Race/Ethnicity
Age Group  Black   White   Hispanic   Asian/PI   Total
15-19         15       s          s          s      20
20-24         20      10         10         15      55
25-34          s      10         10          s      25
35+           12      14          7          s      35
Total         50      35         30         20     135

s – data withheld to limit disclosure
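Simple suppression alone does not protect this table, because the published row and column totals let a reader subtract their way back to the withheld values. The sketch below is an illustration under stated assumptions: a numerator rule with cutoff 5 (which reproduces the "s" pattern above), and hypothetical helper names `suppress_small` and `recover_by_subtraction`. It repeatedly fills any row or column that has exactly one withheld cell.

```python
# Example table from the slides (rows: age groups, columns: race/ethnicity).
COUNTS = [
    [15, 1, 3, 1],     # 15-19
    [20, 10, 10, 15],  # 20-24
    [3, 10, 10, 2],    # 25-34
    [12, 14, 7, 2],    # 35+
]


def suppress_small(counts, cutoff=5):
    """Primary suppression: replace every count below the cutoff with None."""
    return [[None if v < cutoff else v for v in row] for row in counts]


def recover_by_subtraction(published, row_totals, col_totals):
    """Fill in withheld cells wherever a row or column has exactly one gap."""
    table = [row[:] for row in published]
    changed = True
    while changed:
        changed = False
        for r, row in enumerate(table):
            gaps = [c for c, v in enumerate(row) if v is None]
            if len(gaps) == 1:
                row[gaps[0]] = row_totals[r] - sum(v for v in row if v is not None)
                changed = True
        for c in range(len(col_totals)):
            col = [row[c] for row in table]
            gaps = [r for r, v in enumerate(col) if v is None]
            if len(gaps) == 1:
                table[gaps[0]][c] = col_totals[c] - sum(v for v in col if v is not None)
                changed = True
    return table


row_totals = [sum(row) for row in COUNTS]
col_totals = [sum(col) for col in zip(*COUNTS)]
published = suppress_small(COUNTS)
recovered = recover_by_subtraction(published, row_totals, col_totals)
print(recovered == COUNTS)  # prints True: every withheld cell is recovered
```

For example, the 35+ row has a single withheld cell, so 35 − 12 − 14 − 7 = 2 gives it back immediately, and the remaining cells follow one at a time. This is the motivation for complementary suppression.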

Page 12:

Simple & Complementary Row and/or Column Suppression
Numbers from Working Paper 22, from Cox 1986

                     Race/Ethnicity
Age Group  Black   White   Hispanic   Asian/PI   Total
15-19         15       s          s          s      20
20-24         20       s          s         15      55
25-34          s      10         10          s      25
35+            s      14          7          s      35
Total         50      35         30         20     135

s – data withheld to limit disclosure

Page 13:

Simple & Complementary Suppression
Numbers from Working Paper 22, from Cox 1986

                     Race/Ethnicity
Age Group  Black   White   Hispanic   Asian/PI   Total
15-19         15       s          s          s      20
20-24         20       s          s         15      55
25-34          s      10         10          s      25
35+            s      14          7          s      35
Total         50      35         30         20     135

s – data withheld to limit disclosure

A withheld cell (by the arithmetic of this table, the Asian/PI 15-19 cell) can still be shown to = 1 based on linear combinations of the published cells and totals
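The linear-combination step can be checked by hand from the published numbers alone. The sketch below is a worked example, not part of the original slides; the variable names are illustrative. It combines the 25-34 and 35+ rows with the Black and Asian/PI columns to pin down the withheld Asian/PI 15-19 cell.

```python
# Withheld cells in the 25-34 row (Black, Asian/PI) sum to 25 - 10 - 10.
hidden_25_34 = 25 - (10 + 10)                     # 5
# Withheld cells in the 35+ row (Black, Asian/PI) sum to 35 - 14 - 7.
hidden_35_plus = 35 - (14 + 7)                    # 14

# Withheld cells in the Black column (25-34 and 35+) sum to 50 - 15 - 20.
hidden_black = 50 - (15 + 20)                     # 15

# Subtracting the Black cells leaves the two withheld Asian/PI cells
# in the 25-34 and 35+ rows.
hidden_asian_lower = hidden_25_34 + hidden_35_plus - hidden_black  # 4

# All three withheld Asian/PI cells sum to 20 - 15.
hidden_asian_all = 20 - 15                        # 5

# What remains is the Asian/PI 15-19 cell.
asian_15_19 = hidden_asian_all - hidden_asian_lower
print(asian_15_19)  # prints 1
```

No single row or column exposes this cell, which is why a suppression pattern needs to be audited against combinations of rows and columns, not only one margin at a time.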

Page 14:

Simple & Complementary – “Protected by Suppression”
Numbers from Working Paper 22, from Cox 1986

                     Race/Ethnicity
Age Group  Black   White   Hispanic   Asian/PI   Total
15-19         15       s          s          s      20
20-24         20      10         10         15      55
25-34          s       s         10          s      25
35+            s      14          s          s      35
Total         50      35         30         20     135

s – data withheld to limit disclosure

Methods are available to select appropriate cells for suppression and to audit a proposed suppression pattern

Page 15:
Page 16:
Page 17:
Page 18:

Los Angeles County - 2006

Page 19:

CA STD Control Suppression Rule

• Suppress any cell if
  – numerator ≠ 0 AND
  – 0 < (cell denominator – cell numerator) < 100

• AND, if so
  – Suppress any complementary cells necessary to avoid re-calculation of the suppressed cell
  – OR
  – Suppress all cells in a table if any cell meets the criteria above
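The rule above translates directly into a per-cell test plus the "suppress the whole table" option. This is a minimal sketch: the function names and the (numerator, denominator) tuple layout are hypothetical, and the selective complementary-suppression option is not implemented here.

```python
def ca_std_suppress(numerator: int, denominator: int, gap_cutoff: int = 100) -> bool:
    """True if the cell meets the rule: numerator != 0 and 0 < (denominator - numerator) < cutoff."""
    gap = denominator - numerator
    return numerator != 0 and 0 < gap < gap_cutoff


def suppress_whole_table(cells, gap_cutoff: int = 100):
    """Second option on the slide: if any cell meets the rule, withhold every cell.

    cells is a list of (numerator, denominator) pairs; the result is a list of
    booleans, one per cell, where True means the cell is withheld.
    """
    any_flagged = any(ca_std_suppress(n, d, gap_cutoff) for n, d in cells)
    return [any_flagged for _ in cells]


if __name__ == "__main__":
    # Made-up cells: (cases, population) for three groups in a small county.
    cells = [(0, 40), (3, 60), (12, 5000)]
    print([ca_std_suppress(n, d) for n, d in cells])  # only the middle cell is flagged
    print(suppress_whole_table(cells))                # so the whole table is withheld
```

Note that a zero count is never flagged, and neither is a cell where denominator minus numerator is large, however small the count.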

Page 20:

Fresno County - 2006

Page 21:

Modoc County - 2006

Page 22:

Alpine County - 2006

Page 23:

Sierra County - 2006

Page 24:

Attribute Disclosure

Page 25:
Page 26:

Solano County - 2004

Page 27:

Recommendations

• Confidentiality Concerns
  – Assess real versus perceived risk
  – If real, determine best rule(s)
  – Proposition: suppress if:
    • Denominator – Numerator < 100 AND Numerator ≠ 0
    • If denominator unknown, estimate reasonably or use reasonable “numerator only” rule

Page 28:

?

Michael C. Samuel, Dr.P.H.

[email protected]