Vital Event Data Release Scoring Criteria 5 June, 2005 NAPHSIS Meeting – Cincinnati, OH Mark Flotow Illinois Center for Health Statistics, IDPH


Page 1

Vital Event Data Release Scoring Criteria

5 June, 2005

NAPHSIS Meeting – Cincinnati, OH

Mark Flotow

Illinois Center for Health Statistics, IDPH

Page 2

< Relate amusing anecdote here >

Quantifying Experience

Page 3

Quantifying Experience

The purpose of our Vital Event Data Release Scoring Criteria is to establish clear guidelines for determining whether tabulated data can be released to any requester.

Note that the criteria:

- are meant to provide safeguards and assurances for maintaining the confidentiality of the persons who make up the data (re-identification issue)

- are NOT meant to determine whether vital event counts are too small for calculating stable or reliable point estimates or rates (statistical issue)

Page 4

Quantifying Experience

Bruce Cohen (MA) suggested that the following would be reasonable criteria and goals for a tabular data release system:

1) Protect the confidentiality of individuals

2) Be simple and clear

3) Be programmable electronically

4) Be sensitive and flexible

Page 5

Quantifying Experience

Regarding maintaining confidentiality, where does your level of comfort begin to appreciably drop off in the following data request example?

Number of births in Chicago (or your jurisdiction’s most populous city) . . .

By five-year age group of mother . . .

Who delivered a low birthweight baby . . .

And had no or only third trimester prenatal care . . .

Cross-tabulated by four race categories . . .

By census tract . . .

For June-August 2003.

Page 6

Data Request → Confidential data? → DRRC*

What constitutes confidential data?

- data that contain direct individual identifiers

- aggregated data that could lead to re-identification of individuals indirectly

What is NOT confidential data?

- public use data sets (e.g., sterilized files)

- aggregated data that could not lead to re-identification of individuals

* DRRC = Data Release and Research Committee

Data Release Guidelines in IDPH

Page 7

The purpose of the Vital Event Data Release Scoring Criteria is to aid the determination of:

- confidential data (DRRC’s purview)

- data that require a minimum cell size or partial suppression* when releasing, to protect confidentiality

- data that can be released with no restrictions on cell size

* currently use complementary suppression
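The footnote refers to a suppression technique in which hiding one small cell is not enough: if a row total is published, a single hidden cell can be recovered by subtraction, so a second cell is hidden as well. The following is a minimal one-dimensional sketch of that idea, not IDPH's actual procedure. The function name `suppress_row`, the choice to hide the smallest remaining cell, and the example counts are all assumptions made for illustration.

```python
from typing import List, Optional


def suppress_row(counts: List[int], min_size: int) -> List[Optional[int]]:
    """Suppress small cells in one row whose total is also published.

    Assumption for this sketch: every cell below min_size is hidden
    (primary suppression). If exactly one cell ends up hidden, the
    smallest remaining cell is hidden too, so the first one cannot be
    recovered as "total minus visible cells".
    """
    hidden = {i for i, c in enumerate(counts) if c < min_size}
    if len(hidden) == 1:
        visible = [i for i in range(len(counts)) if i not in hidden]
        if visible:
            hidden.add(min(visible, key=lambda i: counts[i]))
    # None marks a suppressed cell.
    return [None if i in hidden else c for i, c in enumerate(counts)]


# Illustrative counts only (not real data).
row = suppress_row([40, 7, 25, 60], min_size=12)
print(row)
```

A real table also has column totals and marginal tables to protect, which this sketch does not attempt to handle.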

Data Release Guidelines in IDPH

Page 8

The Scoring Release Criteria are based on the following principles:

- the greater the degree of cross-tabulations of vital event variables, the greater the likelihood of re-identification

- the more categories or detail of any one vital event variable, the more it contributes to re-identification

- some vital event variables can lead towards re-identification to a greater degree than others

- the greater the aggregation of time and geography, the lesser the likelihood of re-identification

Data Release Guidelines in IDPH

Page 9

The Scoring Release Criteria are generally based on the following research:

Dr. Tiefu Shen’s (of IDPH) “uniqueness” SAS program, now used as NAACCR standard*

This is a “brute force” program that checks all the combinations of variables at an individual record level to determine the uniqueness of each combination.

This is a test of potential confidentiality breach.

* Please see http://www.naaccr.org/index.asp?Col_SectionKey=11&Col_ContentID=81
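The "brute force" idea described above can be sketched in a few lines. This is not Dr. Shen's SAS program; it is a Python illustration of the same general approach, assuming records are held as dictionaries. The function name `unique_proportions`, the variable names, and the four sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unique_proportions(records, variables):
    """For every combination of variables, return the share of records
    whose combination of values occurs exactly once in the file.

    Assumes records is a non-empty list of dicts keyed by variable name.
    """
    results = {}
    for r in range(1, len(variables) + 1):
        for combo in combinations(variables, r):
            keys = [tuple(rec[v] for v in combo) for rec in records]
            counts = Counter(keys)
            n_unique = sum(1 for k in keys if counts[k] == 1)
            results[combo] = n_unique / len(records)
    return results


# Hypothetical records, for illustration only.
records = [
    {"sex": "F", "age_group": "20-24", "race": "A"},
    {"sex": "F", "age_group": "20-24", "race": "B"},
    {"sex": "M", "age_group": "20-24", "race": "A"},
    {"sex": "M", "age_group": "25-29", "race": "A"},
]
props = unique_proportions(records, ["sex", "age_group", "race"])
for combo, share in props.items():
    print(combo, share)
```

Because the number of combinations doubles with each added variable, a check like this is practical only for a modest number of variables.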

Data Release Guidelines in IDPH

Page 10

The Scoring Release Criteria are generally based on the following research (continued):

In general, the proportion of unique records increases as the number of variables in a cross-tabulation or combination increases.

Among cancer data files, a variable of five-year age groups contributes to uniqueness more than race category and much more than sex.

Data Release Guidelines in IDPH

Page 11

Score Values for Vital Event Data Release

Variable            Characteristic                  Score

Sex                                                 +1

Age                 >10-year age groups             +2
                    6-10 year age groups            +3
                    3-5 year age groups             +5
                    1-2 year age groups             +7

Race group          any                             +3

Hispanic ethnicity  yes or no                       +2
                    detailed ethnicity              +3

Cause of death      1,000+ deaths for geography     +2
                    100-999 deaths                  +3
                    <100 deaths                     +5

Quantifying Experience

Page 12

Score Values (continued)

Variable            Characteristic                                Score

Geography           Illinois, Chicago, Cook County                -5
                    CA, city, or county > 20,000 pop.              0
                    CA, city, or county of 20,000 pop. or less    +5

Data year           5 years aggregated                            -5
                    2-4 years aggregated                           0
                    1 year (e.g., 2001)                           +3
                    quarter                                       +5

Other variables     < 5 groups or categories                      +3
                    5-9 groups                                    +5
                    10+ groups                                    +7
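Since one stated goal is that the criteria can be programmed, the score values lend themselves to a simple lookup table. The sketch below is one possible encoding, not an official implementation: the dictionary keys, the function name `total_score`, and the example request are hypothetical, and the third geography row is read here as "20,000 population or less".

```python
# Point values transcribed from the score tables.
# Keys are hypothetical labels chosen for this sketch.
SCORES = {
    "sex": {"any": 1},
    "age": {
        ">10-year groups": 2,
        "6-10 year groups": 3,
        "3-5 year groups": 5,
        "1-2 year groups": 7,
    },
    "race": {"any": 3},
    "hispanic": {"yes/no": 2, "detailed": 3},
    "cause_of_death": {"1000+": 2, "100-999": 3, "<100": 5},
    # Third row assumed to mean "20,000 population or less".
    "geography": {"state/Chicago/Cook": -5, ">20000": 0, "<=20000": 5},
    "data_year": {"5 years": -5, "2-4 years": 0, "1 year": 3, "quarter": 5},
    "other": {"<5 groups": 3, "5-9 groups": 5, "10+ groups": 7},
}


def total_score(request):
    """Sum the points for a list of (variable, characteristic) pairs.

    "other" may appear more than once, one entry per additional variable.
    """
    return sum(SCORES[variable][characteristic]
               for variable, characteristic in request)


# Hypothetical request: deaths by sex and 10-year age group for one
# county of more than 20,000 people, for a single year.
example = [
    ("sex", "any"),
    ("age", "6-10 year groups"),
    ("geography", ">20000"),
    ("data_year", "1 year"),
]
print(total_score(example))  # 1 + 3 + 0 + 3 = 7
```

Keeping the point values in data rather than in code makes it easy for another agency to swap in its own categories or scores.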

Quantifying Experience

Page 13

Release Scoring Criteria

Process: for the resulting most detailed cross-tabulation, add up the scores based on the point values.

If the score is . . .

< 9     data are releasable, with no minimum cell size

9-11    discuss with supervisor whether the data are okay as is or whether categories need to be aggregated

12+     cell sizes must be 12 or more before releasing data; otherwise, the data in smaller cells must be suppressed
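The three score bands translate into a short decision function. This is a minimal sketch under stated assumptions: the function names `release_action` and `suppress_small_cells` are hypothetical, a table is represented as a dictionary of cell label to count, and the example counts are made up.

```python
def release_action(score):
    """Map a total score to one of the three actions in the criteria."""
    if score < 9:
        return "release"   # no minimum cell size
    if score <= 11:
        return "discuss"   # check with supervisor, maybe aggregate
    return "suppress"      # cells must be 12 or more


def suppress_small_cells(cells, min_cell=12):
    """Replace counts below min_cell with None (suppressed)."""
    return {label: (n if n >= min_cell else None)
            for label, n in cells.items()}


# Illustrative counts only (not real data).
cells = {"area A": 30, "area B": 11, "area C": 12}
if release_action(14) == "suppress":
    cells = suppress_small_cells(cells)
print(cells)
```

The middle band deliberately returns a prompt for human review rather than an automatic decision, which matches the "discuss with supervisor" step.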

Quantifying Experience

Page 14

Example 1

Births by 12 birthweight groups for Chicago for 1995

Scoring = +7 (other variables) –5 (geography) +3 (data year) = 5

Action: data can be released, as is, regardless of cell size.

Quantifying Experience

Page 15

Example 2

Teen suicides by race categories for Chicago Community Areas (CAs) for 1999-2000 (combined)

Scoring = +3 (age) +3 (race) +5 (geography) +0 (data year) = 11

Action: discuss with supervisor; CAs with fewer than 12 suicides would likely be suppressed.

Quantifying Experience

Page 16

Quantifying Experience

Revisiting our level of comfort example, here’s where ICHS’s level drops off . . .

Request element                                         Score                 Running total  ICHS action

Number of births in Chicago . . .
By five-year age group of mother . . .                  +5 (age)              5              okay to release
Who delivered a low birthweight baby . . .              +3 (other variables)  8              still okay
And had no or only third trimester prenatal care . . .  +3 (other variables)  11             “discuss with supervisor”
Cross-tabulated by four race categories . . .           +3 (race group)       14             suppress small cell data
By census tract . . .                                   +5 (geography)        19             suppress small cell data
For June-Aug. 2003.                                     +5 (data year)        24             suppress small cell data

Page 17

Total score = +5 (age) +3 (other var.) +3 (other var.) +3 (race) +5 (geography) +5 (data year) = 24

Possible Alternatives:

- Chicago only and 2 years of data = 5 + 3 + 3 + 3 - 5 + 0 = 9

- no age of mother and 5 years of data = 0 + 3 + 3 + 3 + 5 - 5 = 9

- make two separate requests:

a) births by census tract for June-Aug. 2003 = 5 + 5 = 10

b) Chicago births by LBW by prenatal care by race categories = 3 + 3 + 3 = 9
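The arithmetic in the comfort-level example and its alternatives can be checked mechanically. The sketch below simply re-adds the point values quoted on the slides; the variable names and labels are hypothetical, and the cut-offs of 9 and 12 are the thresholds from the release scoring criteria.

```python
from itertools import accumulate

# Point values for each element of the original request, in order.
steps = [
    ("five-year age group of mother", 5),
    ("low birthweight", 3),
    ("prenatal care", 3),
    ("four race categories", 3),
    ("census tract", 5),
    ("June-Aug. 2003", 5),
]
running = list(accumulate(points for _, points in steps))
print(running)  # [5, 8, 11, 14, 19, 24]

# First element at which the running total reaches the 12+ band.
first_suppress = next(label for (label, _), total in zip(steps, running)
                      if total >= 12)
print(first_suppress)  # four race categories

# The alternatives, with point values as listed on the slide.
alternatives = {
    "Chicago only, 2 years of data": [5, 3, 3, 3, -5, 0],
    "no age of mother, 5 years of data": [0, 3, 3, 3, 5, -5],
    "a) births by census tract, June-Aug. 2003": [5, 5],
    "b) Chicago births by LBW, prenatal care, race": [3, 3, 3],
}
totals = {name: sum(points) for name, points in alternatives.items()}
for name, total in totals.items():
    print(name, total)
```

Each alternative totals 9 or 10, so all of them fall in the 9-11 "discuss with supervisor" band rather than the 12+ band that requires suppression.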

Quantifying Experience

Page 18

These Scoring Release Criteria are your starting point.

They are designed to be flexible; experiment and compare to how you release vital event tabulations now.

Add scores for variables that are frequently requested.

Change variable characteristics to reflect your commonly used categories or selections (e.g., levels of geography).

Consider changing the thresholds to better meet your (agency’s) “comfort levels.”

Quantifying Experience

Page 19

Contact information:

Mark Flotow
Illinois Center for Health Statistics
Illinois Department of Public Health
525 West Jefferson Street
Springfield, IL 62761

Telephone: (217) 785-1064
Fax: (217) 785-4308

E-mail: [email protected]