Vital Event Data Release Scoring Criteria
5 June, 2005
NAPHSIS Meeting – Cincinnati, OH
Mark Flotow
Illinois Center for Health Statistics, IDPH
< Relate amusing anecdote here >
Quantifying Experience
The purpose of our Vital Event Data Release Scoring Criteria is to establish clear guidelines for determining whether tabulated data can be released to any requester.
Regarding scope:
- the criteria provide safeguards and assurances for maintaining the confidentiality of the persons who make up the data (a re-identification issue)
- the criteria are NOT for determining whether vital event counts are too small to calculate stable or reliable point estimates or rates (a statistical issue)
Quantifying Experience
Bruce Cohen (MA) suggested that the following would be reasonable criteria and goals for a tabular data release system:
1) Protect the confidentiality of individuals
2) Be simple and clear
3) Be able to be programmed electronically
4) Be sensitive and flexible
With regard to maintaining confidentiality, at what point in the following data request example does your level of comfort begin to drop off appreciably?
Quantifying Experience
Number of births in Chicago (or your jurisdiction’s most populous city) . . .
By five-year age group of mother . . .
Who delivered a low birthweight baby . . .
And had no or only third trimester prenatal care . . .
Cross-tabulated by four race categories . . .
By census tract . . .
For June-August 2003.
[Flowchart: Data Request -> Confidential data? -> if yes, DRRC*]
What constitutes confidential data?
- data that contain direct individual identifiers
- aggregated data that could lead to re-identification of individuals indirectly
What is NOT confidential data?
- public use data sets (e.g., sterilized files)
- aggregated data that could not lead to re-identification of individuals
* DRRC = Data Release and Research Committee
Data Release Guidelines in IDPH
The purpose of the Vital Event Data Release Scoring Criteria is to aid the determination of:
- confidential data (DRRC’s purview)
- data that require a minimum cell size or partial suppression* when releasing, to protect confidentiality
- data that can be released with no restrictions on cell size
* IDPH currently uses complementary suppression
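Complementary suppression hides additional cells so that a suppressed small cell cannot be recovered by subtracting the published cells from a row or column total. The sketch below is a simplified heuristic for illustration only, not the procedure IDPH uses. It assumes a two-way table whose row and column totals are published, treats counts from 1 to 11 as primary suppressions (borrowing the 12-or-more rule from the scoring criteria), and then keeps hiding the smallest remaining cell in any row or column that has exactly one hidden cell. The function name `suppress` and the table values are hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) position in the table


def suppress(table: List[List[int]], threshold: int = 12) -> Set[Cell]:
    """Return the cells to hide: primary plus complementary suppressions.

    Simplified, hypothetical heuristic: afterwards no row or column
    contains exactly one hidden cell.
    """
    n_rows, n_cols = len(table), len(table[0])

    # Primary suppression: nonzero counts below the minimum cell size.
    hidden: Set[Cell] = {
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if 0 < table[r][c] < threshold
    }

    # Every row and every column, each listed as its cell positions.
    lines = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    lines += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]

    def rank(cell: Cell) -> Tuple[bool, int]:
        # Prefer the smallest nonzero count; zero cells are chosen last.
        count = table[cell[0]][cell[1]]
        return (count == 0, count)

    changed = True
    while changed:
        changed = False
        for line in lines:
            n_hidden = sum(1 for cell in line if cell in hidden)
            if n_hidden != 1:
                continue  # nothing hidden, or already protected
            candidates = [cell for cell in line if cell not in hidden]
            if not candidates:
                continue
            # Complementary suppression: hide one more cell in this line.
            hidden.add(min(candidates, key=rank))
            changed = True
    return hidden


example = [
    [40, 3, 25],
    [30, 50, 60],
    [20, 45, 15],
]
print(sorted(suppress(example)))  # [(0, 1), (0, 2), (2, 1), (2, 2)]
```

Here the single small cell (3) forces a second hidden cell in its row and in its column, and those in turn force a fourth. The heuristic only guarantees that no row or column has a lone hidden cell; it does not rule out every inference (for example, tight bounds derived from several totals at once), which calls for more rigorous methods.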
Data Release Guidelines in IDPH
The Scoring Release Criteria are based on the following principles:
- the greater the degree of cross-tabulations of vital event variables, the greater the likelihood of re-identification
- the more categories or detail of any one vital event variable, the more it contributes to re-identification
- some vital event variables can lead towards re-identification to a greater degree than others
- the greater the aggregation of time and geography, the lesser the likelihood of re-identification
Data Release Guidelines in IDPH
The Scoring Release Criteria are generally based on the following research:
Dr. Tiefu Shen’s (of IDPH) “uniqueness” SAS program, now used as a NAACCR standard*
This is a “brute force” program that checks all the combinations of variables at an individual record level to determine the uniqueness of each combination.
It tests for potential confidentiality breaches.
* Please see http://www.naaccr.org/index.asp?Col_SectionKey=11&Col_ContentID=81
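The “brute force” idea can be sketched in a few lines. This is not Dr. Shen’s SAS program (its code is not reproduced here); it is a hypothetical Python illustration that, for every combination of variables, counts the records whose combination of values is shared with no other record. The field names and the six records are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]


def uniqueness(records: List[Record],
               variables: List[str]) -> Dict[Tuple[str, ...], float]:
    """Share of records that are unique on each combination of variables."""
    if not records:
        return {}
    result: Dict[Tuple[str, ...], float] = {}
    # Brute force: every combination of 1, 2, ..., n variables.
    for k in range(1, len(variables) + 1):
        for combo in combinations(variables, k):
            keys = [tuple(rec[v] for v in combo) for rec in records]
            counts = Counter(keys)
            unique = sum(1 for key in keys if counts[key] == 1)
            result[combo] = unique / len(records)
    return result


def max_by_size(result: Dict[Tuple[str, ...], float]) -> Dict[int, float]:
    """Highest share of unique records for each number of variables."""
    best: Dict[int, float] = {}
    for combo, share in result.items():
        best[len(combo)] = max(best.get(len(combo), 0.0), share)
    return best


records = [
    {"sex": "F", "race": "White", "age": "25-29"},
    {"sex": "F", "race": "White", "age": "30-34"},
    {"sex": "M", "race": "White", "age": "25-29"},
    {"sex": "M", "race": "Black", "age": "25-29"},
    {"sex": "F", "race": "Black", "age": "30-34"},
    {"sex": "M", "race": "White", "age": "25-29"},
]
shares = uniqueness(records, ["sex", "race", "age"])
for k, share in sorted(max_by_size(shares).items()):
    print(f"{k} variable(s): {share:.0%} unique")  # 0%, 50%, 67%
```

With n variables the program examines 2^n - 1 combinations, which is why it is described as brute force. Even on this toy file the share of unique records rises as variables are added to the combination.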
Data Release Guidelines in IDPH
The Scoring Release Criteria are generally based on the following research (continued):
In general, the proportion of unique records increases as the number of variables in a cross-tabulation or combination increases.
In cancer data files, a five-year age group variable contributes more to uniqueness than race category does, and much more than sex does.
Data Release Guidelines in IDPH
Score Values for Vital Event Data Release
Variable            Characteristic                           Score
Sex                                                          +1
Age                 >10-year age groups                      +2
                    6-10 year age groups                     +3
                    3-5 year age groups                      +5
                    1-2 year age groups                      +7
Race group          any                                      +3
Hispanic ethnicity  yes or no                                +2
                    detailed ethnicity                       +3
Cause of death      1,000+ deaths for geography              +2
                    100-999 deaths                           +3
                    <100 deaths                              +5
Quantifying Experience
Score Values (continued)
Variable            Characteristic                           Score
Geography           Illinois, Chicago, Cook County           -5
                    CA, city, or county > 20,000 pop.        0
                    CA, city, or county 20,000 pop. or less  +5
Data year           5 years aggregated                       -5
                    2-4 years aggregated                     0
                    1 year (e.g., 2001)                      +3
                    quarter                                  +5
Other variables     < 5 groups or categories                 +3
                    5-9 groups                               +5
                    10+ groups                               +7
CA = Chicago Community Area
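The two score tables can be encoded as a lookup so that a request’s total is computed mechanically, in keeping with the goal that the system can be programmed electronically. This is a minimal sketch, not IDPH’s implementation: the dictionary keys (short labels such as "6-10-yr groups") and the function names are hypothetical, and "other variables" are scored by their number of categories.

```python
from typing import Dict, List, Sequence, Tuple

# Point values transcribed from the two score tables. The short labels
# used as keys are hypothetical, chosen only for this sketch.
SCORES: Dict[str, Dict[str, int]] = {
    "sex": {"any": 1},
    "age": {">10-yr groups": 2, "6-10-yr groups": 3,
            "3-5-yr groups": 5, "1-2-yr groups": 7},
    "race": {"any": 3},
    "hispanic": {"yes/no": 2, "detailed": 3},
    "cause_of_death": {"1000+ deaths": 2, "100-999 deaths": 3,
                       "<100 deaths": 5},
    "geography": {"Illinois/Chicago/Cook": -5,
                  "CA/city/county >20,000": 0,
                  "CA/city/county <=20,000": 5},
    "data_year": {"5 years": -5, "2-4 years": 0, "1 year": 3, "quarter": 5},
}


def other_variable_score(n_categories: int) -> int:
    """Score an 'other' variable by how many groups or categories it has."""
    if n_categories < 5:
        return 3
    if n_categories <= 9:
        return 5
    return 7


def request_score(items: List[Tuple[str, str]],
                  other_categories: Sequence[int] = ()) -> int:
    """Add up the points for the most detailed cross-tabulation requested.

    items: (variable, characteristic) pairs looked up in SCORES.
    other_categories: category counts of any other variables.
    """
    total = sum(SCORES[variable][level] for variable, level in items)
    return total + sum(other_variable_score(n) for n in other_categories)


# Example 1: 12 birthweight groups, Chicago, one year: +7 - 5 + 3.
print(request_score([("geography", "Illinois/Chicago/Cook"),
                     ("data_year", "1 year")], other_categories=[12]))  # 5
```

An unrecognized variable or characteristic raises a `KeyError`, which is arguably the safe default for a release tool: a request that cannot be scored should not be released automatically.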
Quantifying Experience
Release Scoring Criteria
Process: for the resulting most detailed cross-tabulation, add up the scores based on the point values.
If the score is . . .
< 9: data are releasable, with no minimum cell size
9-11: discuss with a supervisor whether the data are okay as is or whether categories need to be aggregated
12+: cell sizes must be 12 or more before data are released; otherwise, data in the small cells must be suppressed
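The decision rule can likewise be written as a small function. In this sketch the two thresholds are parameters that default to the values above (9 and 12), so an agency could adjust them to its own comfort levels; the `Action` names and the `release_action` function are hypothetical.

```python
from enum import Enum

MIN_CELL_SIZE = 12  # cells below this are suppressed when score is 12+


class Action(Enum):
    RELEASE = "release as is, no minimum cell size"
    DISCUSS = "discuss with supervisor"
    SUPPRESS_SMALL_CELLS = f"suppress cells smaller than {MIN_CELL_SIZE}"


def release_action(score: int, discuss_at: int = 9,
                   suppress_at: int = 12) -> Action:
    """Map a total release score to an action.

    Defaults follow the criteria: below 9 release freely, 9-11 discuss
    with a supervisor, 12 or more enforce the minimum cell size.
    """
    if score < discuss_at:
        return Action.RELEASE
    if score < suppress_at:
        return Action.DISCUSS
    return Action.SUPPRESS_SMALL_CELLS


for s in (5, 11, 24):
    print(s, release_action(s).value)
```

Keeping the thresholds as arguments rather than constants makes it easy to compare how a stricter or looser setting would have treated past requests.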
Quantifying Experience
Example 1
Births by 12 birthweight groups for Chicago for 1995
Scoring = +7 (other variables) -5 (geography) +3 (data year) = 5
Action: data can be released, as is, regardless of cell size.
Quantifying Experience
Example 2
Teen suicides by race categories for Chicago Community Areas (CAs) for 1999-2000 (combined)
Scoring = +3 (age) +3 (race) +5 (geography) +0 (data year) = 11
Action: discuss with a supervisor; CAs with fewer than 12 suicides would likely be suppressed.
Quantifying Experience
Revisiting our level of comfort example, here’s where ICHS’s level drops off . . .
Number of births in Chicago . . .
By five-year age group of mother . . . +5 (age) okay to release
Who delivered a low birthweight baby . . . +3 (other variables) still okay
And had no or only third trimester prenatal care . . . +3 (other variables) “discuss with supervisor”
Cross-tabulated by four race categories . . . +3 (race group) suppress small cell data
By census tract . . . +5 (geography) suppress small cell data
For June-Aug. 2003. +5 (data year) suppress small cell data
Total score = +5 (age) +3 (other var.) +3 (other var.) +3 (race) +5 (geography) +5 (data year) = 24
Possible Alternatives:
- Chicago only and 2 years of data = 5 + 3 + 3 + 3 - 5 + 0 = 9
- no age of mother and 5 years of data = 0 + 3 + 3 + 3 + 5 - 5 = 9
- make two separate requests:
a) births by census tract for June-Aug. 2003 = 5 + 5 = 10
b) Chicago births by low birthweight (LBW) by prenatal care by race categories = 3 + 3 + 3 = 9
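The running totals in this example, and the scores of the alternatives, can be rechecked with a few lines of Python. The point values are copied from the slide; the labels and the `action` helper are hypothetical, and the helper applies the same thresholds as the Release Scoring Criteria (under 9, 9-11, 12 and over).

```python
from itertools import accumulate
from typing import Dict, List, Tuple

# Point values from the comfort-level example, in the order added.
STEPS: List[Tuple[str, int]] = [
    ("five-year age group of mother", 5),
    ("low birthweight", 3),
    ("prenatal care", 3),
    ("four race categories", 3),
    ("census tract", 5),
    ("June-Aug. 2003", 5),
]

# Component scores of each alternative, as listed on the slide.
ALTERNATIVES: Dict[str, List[int]] = {
    "Chicago only, 2 years of data": [5, 3, 3, 3, -5, 0],
    "no age of mother, 5 years of data": [0, 3, 3, 3, 5, -5],
    "(a) census tract, June-Aug. 2003": [5, 5],
    "(b) Chicago, LBW x prenatal care x race": [3, 3, 3],
}


def action(score: int) -> str:
    """Hypothetical helper applying the < 9 / 9-11 / 12+ thresholds."""
    if score < 9:
        return "release"
    if score < 12:
        return "discuss with supervisor"
    return "suppress small cells"


# Cumulative score after each variable is added to the request.
running = list(accumulate(points for _, points in STEPS))
for (label, _), total in zip(STEPS, running):
    print(f"{label:<32}{total:>4}  {action(total)}")

for label, parts in ALTERNATIVES.items():
    print(f"{label:<40}{sum(parts):>4}  {action(sum(parts))}")
```

Every alternative lands in the 9-11 band, so each would still be discussed with a supervisor, but none would require small-cell suppression.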
Quantifying Experience
These Scoring Release Criteria are your starting point.
They are designed to be flexible; experiment with them and compare the results with how you release vital event tabulations now.
Add scores for variables that are frequently requested.
Change variable characteristics to reflect your commonly used categories or selections (e.g., levels of geography).
Consider changing the thresholds to better meet your (agency’s) “comfort levels.”
Quantifying Experience
Contact information:
Mark Flotow
Illinois Center for Health Statistics
Illinois Department of Public Health
525 West Jefferson Street
Springfield, IL 62761
Telephone: (217) 785-1064
Fax: (217) 785-4308