Confidentiality Issues with “Small Cell” Data
Michael C. Samuel, DrPH
STD Control Branch
California Department of Public Health
2008 National STD Prevention Conference
Confronting Challenges, Applying Solutions
Chicago, Illinois, March 10-13, 2008
* NOT issue of “small cells” with expected value < 5 and impact on chi-square tests
            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       1          3          1      20
20-24          20      10         10         15      55
25-34           3      10         10          2      25
35+            12      14          7          2      35
Total          50      35         30         20     135
Confidentiality – Types of Disclosure
• Identity Disclosure
  – Identity of an individual can be determined based on the released data
    • Or …can reasonably be determined…
• Attribute Disclosure
  – Confidential information about an individual is revealed based on the released data
    • Or “sensitive” information; or “embarrassing” information
Extensive Literature
http://www.fcsm.gov/working-papers/spwp22.html
Key Resource:
Federal Committee on Statistical Methodology
Office of Management and Budget
Key Concepts
• Release of public health data
  – Balance obligations to protect the public’s health with obligations to respect individual privacy & confidentiality
• If “significant” risks
  – “Statistical Disclosure Limitation”
• True Risk versus Perception of Risk
Disclosure Limitation with Tabular Data
• If cells are deemed sensitive based on specified threshold rule
  – Alter underlying “line-listed” or “microdata” before the tables are constructed
    – may be particularly relevant technique for on-line query systems
  – Change table: aggregate rows or columns
  – Suppress cells
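The "aggregate rows or columns" option can be sketched as follows, using the example age by race/ethnicity table shown earlier in the talk. The merged age bands (15-24 and 25+) and the helper name `aggregate_rows` are assumptions chosen for illustration; the talk does not prescribe a particular grouping.

```python
# Example table from the talk: rows are age groups; columns are
# Black, White, Hispanic, Asian/PI (marginal totals omitted).
counts = {
    "15-19": [15, 1, 3, 1],
    "20-24": [20, 10, 10, 15],
    "25-34": [3, 10, 10, 2],
    "35+":   [12, 14, 7, 2],
}

def aggregate_rows(table, groups):
    """Collapse rows into broader bands by summing counts column-wise."""
    merged = {}
    for new_label, old_labels in groups.items():
        rows = [table[label] for label in old_labels]
        merged[new_label] = [sum(col) for col in zip(*rows)]
    return merged

# Hypothetical grouping: two broad age bands instead of four.
broad = aggregate_rows(counts, {"15-24": ["15-19", "20-24"],
                                "25+": ["25-34", "35+"]})
smallest = min(min(row) for row in broad.values())
print(broad)
print("smallest cell after aggregation:", smallest)
```

With this grouping the smallest cell rises from 1 to 4, so aggregation alone may still leave cells below a threshold such as 5 and may need to be combined with suppression.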
Threshold Rules
• Numerator rule
  – e.g. cell size <3, <5 (many)
• Population denominator rule
  – e.g. population < 20,000 (HIPAA-based), <50
• Numerator and population denominator rule
  – numerator > 10 AND denominator > 50 (Oregon cancer registry)
• Population denominator minus numerator rule
  – e.g. population – cell count < 10 (Missouri)
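The four rule families above can be written as small predicates that return True when a cell should be treated as sensitive. This is a minimal sketch: the function names are hypothetical, the default thresholds are simply the example values from the list, and the Oregon example is read here as a release condition (release only when both inequalities hold), which is an assumption about how that rule is applied.

```python
def numerator_rule(count, threshold=5):
    """Sensitive if the cell count is below the threshold (e.g. <3 or <5)."""
    return count < threshold

def denominator_rule(population, threshold=20_000):
    """Sensitive if the underlying population is below the threshold."""
    return population < threshold

def numerator_and_denominator_rule(count, population, min_count=10, min_pop=50):
    """Assumed reading of the Oregon example: release only when
    count > 10 AND population > 50; otherwise treat the cell as sensitive."""
    return not (count > min_count and population > min_pop)

def denominator_minus_numerator_rule(count, population, threshold=10):
    """Sensitive if fewer than `threshold` people in the population
    are outside the cell (population - count < threshold)."""
    return population - count < threshold

# Hypothetical cell, not from the talk: 3 cases in a population of 12.
count, population = 3, 12
flags = {
    "numerator": numerator_rule(count),
    "denominator": denominator_rule(population),
    "numerator_and_denominator": numerator_and_denominator_rule(count, population),
    "denominator_minus_numerator": denominator_minus_numerator_rule(count, population),
}
print(flags)
```

Keeping each rule as a separate predicate makes it easy to compare how many cells each rule would flag on the same table before choosing one.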
Cell Suppression
• Simple Cell Suppression
• Random Rounding
• Controlled Rounding
• Controlled Tabular Adjustment
No Suppression (“With Disclosure”)
Numbers from Working Paper 22, from Cox 1986
            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       1          3          1      20
20-24          20      10         10         15      55
25-34           3      10         10          2      25
35+            12      14          7          2      35
Total          50      35         30         20     135
Simple Suppression
Numbers from Working Paper 22, from Cox 1986
            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20      10         10         15      55
25-34           s      10         10          s      25
35+            12      14          7          s      35
Total          50      35         30         20     135
s – data withheld to limit disclosure
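Simple suppression amounts to applying a numerator rule to each interior cell. The sketch below assumes a threshold of 5, which reproduces the pattern shown in the table above; the talk does not state which threshold produced it, and the function name `simple_suppress` is hypothetical.

```python
SUPPRESSED = "s"

# Interior cells of the example table (columns: Black, White, Hispanic, Asian/PI).
table = [
    [15, 1, 3, 1],      # 15-19
    [20, 10, 10, 15],   # 20-24
    [3, 10, 10, 2],     # 25-34
    [12, 14, 7, 2],     # 35+
]

def simple_suppress(rows, threshold=5):
    """Replace every interior cell below the threshold with a marker.
    Marginal totals are left untouched (they are not part of `rows`)."""
    return [[SUPPRESSED if value < threshold else value for value in row]
            for row in rows]

published = simple_suppress(table)
for row in published:
    print(row)
```

Because the marginal totals are still published, a lone suppressed cell in a row can be recovered by subtraction: for the 35+ row, 35 − 12 − 14 − 7 = 2. Simple suppression on its own therefore does not protect that cell.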
Simple & Complementary Row and/or Column Suppression
Numbers from Working Paper 22, from Cox 1986
            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20       s          s         15      55
25-34           s      10         10          s      25
35+             s      14          7          s      35
Total          50      35         30         20     135
s – data withheld to limit disclosure
Simple & Complementary Suppression
Numbers from Working Paper 22, from Cox 1986
            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20       s          s         15      55
25-34           s      10         10          s      25
35+             s      14          7          s      35
Total          50      35         30         20     135
s – data withheld to limit disclosure
The suppressed 15-19 Asian/PI cell = 1 based on linear combinations of the published rows and columns
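One such linear combination can be worked through directly from the published values in the table above: add the hidden amounts in the 15-19 and 20-24 rows, then subtract the hidden amounts in the White and Hispanic columns. What remains is the 15-19 Asian/PI cell. The variable names are illustrative only.

```python
# Published values from the complementary-suppression table.
row_total = {"15-19": 20, "20-24": 55}
col_total = {"White": 35, "Hispanic": 30}

# Sum of suppressed cells in each of the first two rows
# (row total minus the published cells in that row).
row_15_19_hidden = row_total["15-19"] - 15          # White + Hispanic + Asian/PI = 5
row_20_24_hidden = row_total["20-24"] - 20 - 15     # White + Hispanic = 20

# Sum of suppressed cells in the White and Hispanic columns
# (column total minus the published 25-34 and 35+ cells).
white_hidden = col_total["White"] - 10 - 14         # 15-19 + 20-24 = 11
hispanic_hidden = col_total["Hispanic"] - 10 - 7    # 15-19 + 20-24 = 13

# Rows minus columns cancels every hidden White and Hispanic cell,
# leaving only the 15-19 Asian/PI cell.
asian_15_19 = (row_15_19_hidden + row_20_24_hidden) - (white_hidden + hispanic_hidden)
print(asian_15_19)  # 1
```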
Simple & Complementary – “Protected by Suppression”
Numbers from Working Paper 22, from Cox 1986
            Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20      10         10         15      55
25-34           s       s         10          s      25
35+             s      14          s          s      35
Total          50      35         30         20     135
s – data withheld to limit disclosure
Methods available to select appropriate cells for suppression and to audit a proposed suppression pattern
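An audit of a suppression pattern can be sketched by brute force: enumerate every non-negative integer fill of the suppressed cells that reproduces the published totals, and record the range each cell can take. A cell whose minimum equals its maximum is disclosed. This is an illustrative sketch only, not the method described in Working Paper 22; production tools typically use linear programming instead, and the function name `feasible_ranges` is hypothetical.

```python
def feasible_ranges(table, row_totals, col_totals):
    """Return {(row, col): (min, max)} for every suppressed (None) cell,
    over all non-negative integer fills consistent with the totals."""
    unknown = [(r, c) for r, row in enumerate(table)
               for c, v in enumerate(row) if v is None]
    row_left = [t - sum(v for v in row if v is not None)
                for row, t in zip(table, row_totals)]
    col_left = [t - sum(row[c] for row in table if row[c] is not None)
                for c, t in enumerate(col_totals)]
    last_in_row = {r: c for r, c in unknown}  # row-major order: last column wins
    values, lo, hi = {}, {}, {}

    def search(i):
        if i == len(unknown):
            if any(col_left):              # a column total is not matched
                return
            for cell, v in values.items():
                lo[cell] = min(lo.get(cell, v), v)
                hi[cell] = max(hi.get(cell, v), v)
            return
        r, c = unknown[i]
        if last_in_row[r] == c:            # last hidden cell in a row is forced
            candidates = [row_left[r]] if row_left[r] <= col_left[c] else []
        else:
            candidates = range(min(row_left[r], col_left[c]) + 1)
        for v in candidates:
            values[(r, c)] = v
            row_left[r] -= v
            col_left[c] -= v
            search(i + 1)
            row_left[r] += v
            col_left[c] += v

    search(0)
    return {cell: (lo[cell], hi[cell]) for cell in unknown}

# "Protected by suppression" pattern (None = suppressed).
protected = [
    [15, None, None, None],
    [20, 10, 10, 15],
    [None, None, 10, None],
    [None, 14, None, None],
]
ranges = feasible_ranges(protected, [20, 55, 25, 35], [50, 35, 30, 20])
for cell, bounds in ranges.items():
    print(cell, bounds)
```

For this pattern every suppressed cell has more than one feasible value; the 15-19 Asian/PI cell, for example, can be anything from 0 to 5. How wide a range counts as adequate protection is a policy choice that the sketch does not settle.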
Los Angeles County - 2006
CA STD Control Suppression Rule
• Suppress any cell if
  – numerator ≠ 0 AND
  – 0 < (cell denominator – cell numerator) < 100
• AND, if so
  – Suppress any complementary cells necessary to avoid re-calculation of suppressed cell
  – OR
  – Suppress all cells in a table if any cell meets criteria above
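The cell-level test and the "suppress all cells" option can be sketched as below. The function names and the example (numerator, denominator) pairs are hypothetical; only the two inequalities come from the rule as stated.

```python
def meets_suppression_criteria(numerator, denominator):
    """Rule as stated: numerator != 0 AND 0 < (denominator - numerator) < 100."""
    return numerator != 0 and 0 < (denominator - numerator) < 100

def suppress_whole_table(cells):
    """Second option in the rule: withhold every cell if any cell qualifies.
    `cells` is a list of (numerator, denominator) pairs."""
    if any(meets_suppression_criteria(n, d) for n, d in cells):
        return ["s"] * len(cells)
    return [n for n, _ in cells]

# Hypothetical (numerator, denominator) pairs, not taken from the talk.
example = [(4, 60), (0, 40), (12, 5000)]
print([meets_suppression_criteria(n, d) for n, d in example])
print(suppress_whole_table(example))
```

Here only the first pair qualifies (60 − 4 = 56), which is enough to withhold the whole table under the second option. Note that, as the inequality is written, a cell whose numerator equals its denominator (difference of 0) is not flagged by this test.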
Fresno County - 2006
Modoc County - 2006
Alpine County - 2006
Sierra County - 2006
Attribute Disclosure
Solano County - 2004
Recommendations
• Confidentiality Concerns
  – Assess real versus perceived risk
  – If real, determine best rule(s)
  – Proposition: suppress if:
    • Denominator – Numerator < 100 AND Numerator ≠ 0
    • If denominator unknown, estimate reasonably or use reasonable “numerator only” rule
?
Michael C. Samuel, Dr.P.H.
Michael.Samuel@cdph.ca.gov
510.620.3198