Comparing linked maternity data sets to check data quality in SPSS Preeti Datta-Nemdharry, Nirupa Dattani and Alison Macfarlane

Post on 13-Jan-2016


Page 1: Comparing linked maternity data sets to check data quality in SPSS 

Comparing linked maternity data sets to check data quality in SPSS

Preeti Datta-Nemdharry, Nirupa Dattani and Alison Macfarlane

Page 2: Comparing linked maternity data sets to check data quality in SPSS 

Background (1)

Birth registration

• By law, live births must be registered within 42 days of birth

• Information recorded from parents is mainly socio-demographic, such as names, address of residence, occupation of parents, marital status and country of birth

Page 3: Comparing linked maternity data sets to check data quality in SPSS 

Background (2)

NHS Numbers for Babies (NN4B)

• Central Issuing System introduced in 2002 for issuing NHS numbers at birth for babies born in England, Wales and the Isle of Man

• A small set of data is collected, including gestational age for live births, ethnicity of baby and date and time of birth

Page 4: Comparing linked maternity data sets to check data quality in SPSS 

Background (3)

Maternity Hospital Episode Statistics (HES)

• Data should be collected for all births occurring in England

• Core admitted patient care record for mother plus ‘maternity tail’ with details of delivery and the baby.

• Core birth record for baby plus ‘baby tail(s)’

Page 5: Comparing linked maternity data sets to check data quality in SPSS 

Background (4)

National Community Child Health Database (NCCHD) and Patient Episode Database for Wales (PEDW)

• Data collected for all births occurring in Wales

• Information collected on maternity similar to HES

Page 7: Comparing linked maternity data sets to check data quality in SPSS 

Method

• Link data for 2005 and 2006 for England and Wales

• Phase 1 involving linkage of birth registration data to NN4B data

• Phase 2 involving linkage of registration/NN4B data to Maternity HES for England and Child Health/PEDW databases for Wales

Page 8: Comparing linked maternity data sets to check data quality in SPSS 

Method cont…

Phase 2

• Linkage to maternity HES carried out by Northgate Solutions using an algorithm devised by City University

• Key data items for linkage (e.g. NHS number, date of birth and a unique ID) compiled by ONS and sent to Northgate Solutions for linkage

• Linkage to Child Health and PEDW databases carried out by NHS Wales Informatics Service using the same algorithm

Page 9: Comparing linked maternity data sets to check data quality in SPSS 

After the linkage was done…

• HES records linked to registration/NN4B data included multiple records for the same mother for each episode.

• The duplicates therefore had to be omitted, keeping the record with the most information.

• This ensures one-to-one linkage to registration/NN4B

Page 10: Comparing linked maternity data sets to check data quality in SPSS 

Identifying duplicates, triplicates...

• GET FILE='C:\Users\trial\Desktop\exampleHES.sav'.
• DATASET NAME DataSet1 WINDOW=FRONT.

• * Identify duplicate cases after sorting by id and, within id, by epikeys.
• DATASET ACTIVATE DataSet1.
• SORT CASES BY id(D) epikeys(D). /* sorts the cases first by id (descending) and then by epikeys (descending) */
• COMPUTE flag=1. /* creates a variable called flag with a default value of 1 */
• IF id=LAG(id) flag=0. /* resets flag from 1 to 0 if id equals the id in the row before */
• EXECUTE.

Page 11: Comparing linked maternity data sets to check data quality in SPSS 

• id and epikey sorted in descending order

• flag = 1.00 allocated to the highest epikey per id

Page 12: Comparing linked maternity data sets to check data quality in SPSS 

Creating a file with only one id per row…

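• * Create wodups: a dataset without duplicates.
• DATASET ACTIVATE DataSet1. /* exampleHES dataset is the active dataset */
• DATASET COPY wodups.
• DATASET ACTIVATE wodups. /* make the copy active so the selection applies to wodups and DataSet1 keeps all records */
• SELECT IF (flag=1). /* selects the record with the most information, i.e. the highest epikey */
• EXECUTE.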
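
The requirement of one record per id can be checked before merging. The lines below are a minimal sketch and not part of the original slides: they assume the dataset wodups and the key id as used above, and n_per_id is a hypothetical variable name introduced only for this check.

```spss
* Sketch: count the records per id in wodups (n_per_id is a hypothetical name).
DATASET ACTIVATE wodups.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /n_per_id=N.
* Every case should show n_per_id = 1 if no duplicates remain.
FREQUENCIES VARIABLES=n_per_id.
```

Any value of n_per_id above 1 would indicate that the duplicate removal did not take effect for that id.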

Page 13: Comparing linked maternity data sets to check data quality in SPSS 

Merge with exampleNN4BREG data

• * Merge exampleHES with exampleNN4BREG.
• * First sort both datasets by the key variable, e.g. id.

• * Dataset to be merged.
• DATASET ACTIVATE NN4BREG.
• SORT CASES BY id(A).

• * Main dataset.
• DATASET ACTIVATE wodups.
• SORT CASES BY id(A). /* make sure the cases are sorted in both datasets */

• * Merging. FILE=* refers to the active dataset (wodups), so the merged data stay in wodups.
• MATCH FILES /FILE=*
    /FILE=NN4BREG
    /BY id.
• EXECUTE.
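
A variant of the merge step can also record which source each merged case came from, which makes unmatched records easy to count. This is a sketch only, intended to be run in place of the merge above rather than after it; it assumes both datasets are already sorted by id, and inHES and inReg are hypothetical flag names.

```spss
* Sketch: merge and flag the source of each case (inHES, inReg are hypothetical names).
DATASET ACTIVATE wodups.
MATCH FILES
  /FILE=*
  /IN=inHES
  /FILE=NN4BREG
  /IN=inReg
  /BY id.
EXECUTE.
* Cases with inHES=1 and inReg=1 were found in both datasets.
CROSSTABS /TABLES=inHES BY inReg.
```

Each IN variable is 1 when the case was present in the corresponding file and 0 otherwise.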

Page 14: Comparing linked maternity data sets to check data quality in SPSS 

Data quality checks

• Quality of maternity HES based on completeness and consistency of the HES data in relation to birth registration data wherever possible

• NN4B data used to validate maternity HES where information is not available from registration.

Page 15: Comparing linked maternity data sets to check data quality in SPSS 

Missing data

• * Missing data - for string variables, e.g. NHS No.
• DATASET ACTIVATE wodups.
• MISSING VALUES NHSnoHES (" ").
• FREQUENCIES VARIABLES=NHSnoHES /FORMAT=NOTABLE. /* gives only the total numbers */

• * OR.
• COMPUTE var1=(LENGTH(RTRIM(NHSnoHES))=0). /* 1 = blank, 0 = not blank */
• EXECUTE.
• DESCRIPTIVES VARIABLES=var1
    /STATISTICS=SUM.

Page 16: Comparing linked maternity data sets to check data quality in SPSS 

• * Missing data - for dates, after checking formats.
• FREQUENCIES VARIABLES=dobHES /FORMAT=NOTABLE.

• * Missing data for numeric variables, e.g. birthweight.
• FREQUENCIES VARIABLES=birthweightHES /FORMAT=NOTABLE.

• * OR.
• COMPUTE noBWT=MISSING(birthweightHES). /* 1 = missing, 0 = not missing */
• EXECUTE.

Page 17: Comparing linked maternity data sets to check data quality in SPSS 

Cross checking dates…

• * Cross checking baby's dob.
• * 1) Formatting dates.
• * If one date is a string, convert it to a date.
• COMPUTE datevar2=NUMBER(dobReg,ADATE10). /* converts a date held as a string, e.g. 01/01/2005, into date format */
• FORMATS datevar2 (ADATE10).
• EXECUTE.

• * If both are in date format but need to be displayed as e.g. yyyy/mm/dd.
• FORMATS dobHES (SDATE10). /* for the other way around, i.e. mm/dd/yyyy, use ADATE10 */
• EXECUTE.

• * 2) Cross checking dates (use datevar2 in place of dobReg if it was converted from a string above).
• COMPUTE equal=(dobHES=dobReg). /* 1 = same dates, 0 = dates differ */
• EXECUTE.
• FREQUENCIES VARIABLES=equal. /* shows how many are equal */

Page 18: Comparing linked maternity data sets to check data quality in SPSS 

Birthweight

• * Cross checking birthweight between two datasets.
• * One way: create another variable which takes the value 0 if not equal and 1 if equal.
• DATASET ACTIVATE wodups.
• COMPUTE birthweight3=(birthweightHES=birthweightReg).
• EXECUTE.
• FREQUENCIES VARIABLES=birthweight3.

Page 19: Comparing linked maternity data sets to check data quality in SPSS 

• * OR group birthweight into categories and see how many cases fall into each category.
• * Recoding birthweight data for HES.
• RECODE birthweightHES (0=0) (9998=0) (MISSING=0) (1 thru 499=1) (500 thru 999=2)
    (1000 thru 1499=3) (1500 thru 1999=4) (2000 thru 2499=5) (2500 thru 2999=6)
    (3000 thru 3499=7) (3500 thru 3999=8) (4000 thru 4499=9) (4500 thru 4999=10)
    (5000 thru 5499=11) (5500 thru Highest=12) INTO BWTgroupHES.
• VARIABLE LABELS BWTgroupHES 'BWTgroupHES'.
• EXECUTE.

• * Recoding birthweight data for registration.
• RECODE birthweightReg (0=0) (9998=0) (MISSING=0) (1 thru 499=1) (500 thru 999=2)
    (1000 thru 1499=3) (1500 thru 1999=4) (2000 thru 2499=5) (2500 thru 2999=6)
    (3000 thru 3499=7) (3500 thru 3999=8) (4000 thru 4499=9) (4500 thru 4999=10)
    (5000 thru 5499=11) (5500 thru Highest=12) INTO BWTgroupReg.
• VARIABLE LABELS BWTgroupReg 'BWTgroupReg'.
• EXECUTE.

• * Cross-tabulate the grouped variables. Add ROW and/or COLUMN to /CELLS for row or column percentages.
• CROSSTABS
    /TABLES=BWTgroupHES BY BWTgroupReg
    /FORMAT=AVALUE TABLES
    /CELLS=COUNT
    /COUNT ROUND CELL.

Page 20: Comparing linked maternity data sets to check data quality in SPSS 

Gestational age

• * Recoding gestational age data.
• RECODE gestNN4B (0=0) (MISSING=0) (1 thru 21=1) (44 thru Highest=2) (ELSE=COPY) INTO GestGroupNN4B.
• VARIABLE LABELS GestGroupNN4B 'GestGroupNN4B'.
• EXECUTE.

• RECODE gestHES (0=0) (MISSING=0) (1 thru 21=1) (44 thru Highest=2) (ELSE=COPY) INTO GestGroupHES.
• VARIABLE LABELS GestGroupHES 'GestGroupHES'.
• EXECUTE.

• CROSSTABS
    /TABLES=GestGroupHES BY GestGroupNN4B
    /FORMAT=AVALUE TABLES
    /CELLS=COUNT ROW COLUMN TOTAL
    /COUNT ROUND CELL.

Page 21: Comparing linked maternity data sets to check data quality in SPSS 

Ethnicity

• * Recoding ethnicity.
• * The MISSING keyword in RECODE applies to numeric variables, so blank string codes are matched with ' ' (assumes a one-character code).
• RECODE ethnicNN4B ('A'=1) ('B'=1) ('C'=1) ('D'=9) ('E'=9) ('F'=9) ('G'=9)
    ('H'=2) ('J'=3) ('K'=4) ('L'=9) ('M'=6) ('N'=5) ('P'=7) ('R'=8)
    ('S'=9) ('Z'=10) (' '=10) INTO ethnicgroupNN4B.
• VARIABLE LABELS ethnicgroupNN4B 'ethnicgroupNN4B'.
• EXECUTE.

• RECODE ethnicHES ('A'=1) ('B'=1) ('C'=1) ('D'=9) ('E'=9) ('F'=9) ('G'=9)
    ('H'=2) ('J'=3) ('K'=4) ('L'=9) ('M'=6) ('N'=5) ('P'=7) ('R'=8)
    ('S'=9) ('Z'=10) (' '=10) INTO ethnicgroupHES.
• VARIABLE LABELS ethnicgroupHES 'ethnicgroupHES'.
• EXECUTE.

• * Also label the recoded values with the relevant ethnic group names.
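
Labelling the recoded values could look like the sketch below. The slides do not give the group names, so the labels here are assumptions inferred from the standard NHS ethnic category letter codes used in the recode (A to C White, D to G Mixed, H Indian, J Pakistani, K Bangladeshi, L other Asian, M Caribbean, N African, P other Black, R Chinese, S other, Z not stated); they should be checked against the coding frame actually used.

```spss
* Sketch: value labels for the grouped ethnicity variables (label wording is assumed).
VALUE LABELS ethnicgroupNN4B ethnicgroupHES
  1 'White'
  2 'Indian'
  3 'Pakistani'
  4 'Bangladeshi'
  5 'Black African'
  6 'Black Caribbean'
  7 'Other Black'
  8 'Chinese'
  9 'Mixed, other Asian or other'
  10 'Not stated or missing'.
CROSSTABS /TABLES=ethnicgroupHES BY ethnicgroupNN4B.
```

With labels in place, the cross-tabulation shows group names instead of numeric codes.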

Page 22: Comparing linked maternity data sets to check data quality in SPSS 

Results

91% of maternity HES delivery records could be linked to the birth registration/NN4B records

Page 23: Comparing linked maternity data sets to check data quality in SPSS 

Linked records for singleton births with missing data items in common data fields, 2005

                  NN4B                 Registration         Maternity HES
Data item         Number    Percent    Number    Percent    Number    Percent
Mother NHS No     164,458   30         NA        NA         16,685    3
Mother's DOB      960       0.2        0         0          0         0
Ethnicity         59,865    11         NA        NA         77,771    14
Gestation         3,829     1          NA        NA         264,877   48
Birth-weight      2,721     1          874       0.2        135,144   25
Birth status      615       0.1        0         0          176,455   32
Sex baby          1,098     0.2        0         0          144,115   26

Page 24: Comparing linked maternity data sets to check data quality in SPSS 

Comparison of sex for singletons in the linked records, 2005

                  Birth registration
Maternity HES*    Male      Female    Total     Percentage
Male              204,613   791       205,404   51
Female            2,814     196,524   199,338   49
Total             207,427   197,315   404,742   100

Page 25: Comparing linked maternity data sets to check data quality in SPSS 

Concordance in data items between NN4B and maternity HES, 2005

                  Percentage
Data item         Stated    Missing   Concordance where stated
Birthweight*      75        25        99
Gestational age   52        48        89
Ethnicity         81        19        87

* using birth registration rather than NN4B
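
One way a "concordance where stated" figure could be derived is sketched below for gestational age. The slides do not define the calculation, so two assumptions are made here: a value counts as stated when it is non-missing and greater than 0 in both sources, and concordance means exact equality of the two values. gestStated and gestAgree are hypothetical variable names.

```spss
* Sketch: concordance in gestational age where stated in both sources (assumed definitions).
DATASET ACTIVATE wodups.
COMPUTE gestStated=0.
* The IF is skipped when either value is missing, leaving gestStated at 0.
IF (gestHES > 0 AND gestNN4B > 0) gestStated=1.
* gestAgree stays system-missing unless both values are stated.
IF (gestStated=1) gestAgree=(gestHES=gestNN4B).
EXECUTE.
FREQUENCIES VARIABLES=gestStated gestAgree.
```

Under these assumptions, the percent of gestStated equal to 1 corresponds to the "Stated" column and the valid percent of gestAgree equal to 1 to the concordance column.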

Page 28: Comparing linked maternity data sets to check data quality in SPSS 

Conclusion

• Good linkage rate was obtained

• To gain maximum benefit, data quality and completeness need to improve in maternity HES

• SPSS is useful in data quality checks.