
Assessing the quality of last menstrual period date on California birth records

Michelle Pearl^a, Megan L. Wier^a and Martin Kharrazi^b

^a Sequoia Foundation, La Jolla, and ^b California Department of Health Services, Genetic Disease Screening Program, Richmond, CA, USA

Summary

Correspondence: Michelle Pearl, Sequoia Foundation c/o Genetic Disease Screening Program, California Department of Public Health, 850 Marina Bay Parkway, Rm. F175, Mail Stop 8200, Richmond, CA 94804, USA. E-mail: [email protected]

Pearl M, Wier ML, Kharrazi M. Assessing the quality of last menstrual period date on California birth records. Paediatric and Perinatal Epidemiology 2007; 21(Suppl. 2): 50–61. ©2007 The Authors. Journal Compilation ©2007 Blackwell Publishing Ltd.

Birth certificate last menstrual period (LMP) date is widely used to estimate gestational age in the US. While data quality concerns have been raised, no large population-based study has isolated data quality issues by comparing birth record LMP (Birth LMP) with reliable LMP dates from another source. We assessed LMP data quality in 2002 California singleton livebirth records (n = 515 381) and in a subset of records with linked prenatally collected LMP from California’s statewide Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) (n = 105 936). Missing or incomplete LMP data affected 13% of birth records; 17% of those had complete LMP within XAFP records.

Data quality indicators supported XAFP LMP as more accurate than Birth LMP, with a lower prevalence of digit preference, post-term delivery, out-of-range gestational age estimates and implausible birthweight-for-gestational age. The bimodal birthweight distribution evident at 20–31 weeks’ gestation based on Birth LMP was nearly absent with XAFP LMP-based gestational age. Approximately 32% of the second birthweight mode was explained by apparent clerical errors in Birth LMP month. Digit preference errors, particularly for day 1, were associated with gestational age overestimation. Preterm delivery rates were higher according to Birth LMP (7.6%) than XAFP LMP (7.2%). One-fifth of observed preterm and over half of observed post-term births using Birth LMP were not true cases; 15% of true preterm cases were missed. African American or Hispanic, less educated, and publicly insured or uninsured women were most likely to be misclassified and to have large LMP date discrepancies attributable to clerical or digit preference error. The implementation of a revised birth certificate is an opportunity for targeted training and data entry checks that could substantially improve LMP accuracy on birth records.

Keywords: birth records, LMP date, accuracy, gestational age.

Conflicts of interest: the authors have declared no conflicts of interest.

Introduction

Last menstrual period (LMP) date is the most widely available source for estimating gestational age from birth certificates in the US, and was the only source on the California certificate of livebirth before 2007. However, gestational age estimates from LMP in general, and from birth records in particular, are prone to error, as exhibited by digit preference [1–3] and implausible values relative to birthweight [4]. Errors in gestational age estimates from LMP have resulted in excess post-term births relative to ultrasound estimates [1,5] and a bimodal birthweight distribution among very early preterm deliveries [6,7] that is not observed for very early preterm deliveries identified through clinical and ultrasound estimates [8,9].

It is unknown to what extent birth certificate LMP data quality is affected by recall difficulties and clerical error, beyond limitations inherent in the LMP dating method and its assumption of conception 14 days after the first day of menstrual bleeding (e.g. cycle length variability, amenorrhoea, non-menstrual vaginal bleeding mistaken for a normal period) [10]. Digit preference in the reported day of the month, an indication of recall error, is prevalent in LMP from birth records as well as medical records [1–3]. Quantification of clerical errors in recording and entry, such as month or year discrepancies and month/day transpositions, requires comparison of LMP from different sources, yet no such population-based comparisons have been published.

The goals of this analysis are to (1) establish whether prenatally collected LMP data from California’s centralised prenatal screening programme is more accurate than LMP data from linked birth records; (2) quantify the magnitude and impact of gestational age reporting errors; (3) determine to what extent clerical and recall error contribute to discrepancies in LMP dating; and (4) identify population subgroups most affected by poor LMP data quality. By comparing LMP dates from birth records with a population-based source of reliable LMP data, the study design isolates reporting error in LMP rather than errors inherent in the LMP dating methodology.

Methods

California singleton livebirth records from 2002 (n = 515 389) were linked to data from pregnant women enrolled in the statewide Expanded Alpha-fetoprotein Screening Program (XAFP) between July 2001 and December 2002. The XAFP is a voluntary, triple marker screening programme offered to all women entering prenatal care by 20 weeks’ gestation. In order to interpret serological markers, the programme requires an estimate of gestational age based on ultrasound, LMP, or physical examination, which is reported by the medical provider at the time of maternal blood collection (between 15 and 20 weeks’ gestation) and double-key entered by programme personnel. The programme assigns a ‘best estimate’ of gestational age that prioritises ultrasound when available as the ‘gold standard’, unless otherwise specified by the provider. Between 20% and 25% of records are routinely verified with providers before serological interpretation, and those with positive or uninterpretable screen results (roughly an additional 8%) receive further follow-up to confirm gestational age.

Probabilistic matching was used to link records from the XAFP and birth certificates, using mother’s name, date of birth, social security number, delivery date, XAFP accession date, telephone number, street address, city and zip code [11]. A conservative certainty cut-off was used to minimise false matches. Overall, 327 218 livebirth records (63%) linked to an XAFP record from the same pregnancy. As a quality control measure, 1800 records with large gestational age discrepancies or whose birth records indicated no prenatal care before 6 months’ gestation were reviewed for matching accuracy, yielding six likely mismatches (0.4%). No mismatches were found from manual review of records with out-of-range gestational age values based on XAFP LMP (<20 or >44 weeks, n = 45).
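The paper cites its linkage procedure [11] without giving field weights or the cut-off value. Purely as an illustration of how a probabilistic match score with a conservative cut-off can work, the sketch below scores a candidate record pair in the generic Fellegi-Sunter style: each field adds log2(m/u) when it agrees and log2((1 − m)/(1 − u)) when it disagrees. This is a swapped-in textbook technique, not the study’s documented method; the field subset, the m/u probabilities and the cut-off of 20 are all hypothetical placeholders.

```python
import math

# Hypothetical m/u probabilities, NOT values from the study:
#   m = P(field agrees | records belong to the same pregnancy)
#   u = P(field agrees | records belong to different women)
FIELD_PROBS = {
    "mother_name": (0.95, 0.01),
    "mother_dob": (0.97, 0.003),
    "ssn": (0.90, 0.0001),
    "zip_code": (0.85, 0.05),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum Fellegi-Sunter style log2 likelihood-ratio weights over fields."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            total += math.log2(m / u)                 # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))     # disagreement weight
    return total


def is_link(agreements: dict[str, bool], cutoff: float = 20.0) -> bool:
    """Accept a pair only above a deliberately high (conservative) cut-off."""
    return match_weight(agreements) >= cutoff


candidate = {"mother_name": True, "mother_dob": True, "ssn": False, "zip_code": True}
print(round(match_weight(candidate), 1), is_link(candidate))
```

Raising `cutoff` trades missed links for fewer false matches, which is the direction the authors describe as conservative. Real linkage work would estimate m and u from the data (for example with an EM algorithm) rather than fixing them by hand.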

Of 515 389 birth records in 2002, eight birth records with missing birthweight, 29 468 missing date of LMP and 37 155 missing only day of LMP were excluded, yielding 448 758 complete records. Comparisons with XAFP LMP data are based on 105 936 birth records with complete LMP date linked to an XAFP record with LMP date as the ‘best estimate’ of gestational age.
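The exclusion arithmetic above can be checked directly. The sketch below uses only the counts stated in this paragraph and assumes the three exclusion groups are disjoint (which the totals bear out); the resulting shares reproduce the 12.9% missing or incomplete and 55.8% missing-day-only figures reported elsewhere in the paper.

```python
# Counts stated in the Methods for 2002 singleton livebirths.
TOTAL_BIRTHS = 515_389
MISSING_BIRTHWEIGHT = 8
MISSING_LMP_DATE = 29_468
MISSING_LMP_DAY_ONLY = 37_155

# Assumes the three exclusion groups do not overlap.
eligible = TOTAL_BIRTHS - MISSING_BIRTHWEIGHT           # Table 1 denominator
incomplete = MISSING_LMP_DATE + MISSING_LMP_DAY_ONLY    # missing or incomplete LMP
complete = eligible - incomplete                        # records with full Birth LMP

print(eligible, incomplete, complete)
print(f"{incomplete / eligible:.1%} missing or incomplete; "
      f"{MISSING_LMP_DAY_ONLY / incomplete:.1%} of those missing day only")
```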

Data quality indicators evaluated include the proportion with post-term deliveries, out-of-range gestational age, implausible birthweight-for-gestational age, very preterm births with implausibly high birthweights (second birthweight mode), and digit preference. Gestational age was calculated as the neonate’s date of birth minus the LMP date, with those <20 completed weeks or >44 completed weeks considered out-of-range and excluded from rate calculations. Preterm was defined as 20–36 completed weeks and post-term as 42–44 completed weeks. Implausible birthweight-for-gestational age was determined according to National Center for Health Statistics cut points (<20 weeks, ≥1000 g; 20–23 weeks, ≥2000 g; 24–27 weeks, ≥3000 g; 28–31 weeks, ≥4000 g; 32–47 weeks, ≤1000 g) [12].
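A minimal sketch of the gestational age rules in this paragraph, assuming dates are Python `datetime.date` objects and reading the NCHS cut points as "birthweight too high" (≥) for 31 weeks or less and "too low" (≤1000 g) for 32–47 weeks. The function and constant names are hypothetical, not taken from the study’s code.

```python
from datetime import date


def completed_weeks(lmp: date, dob: date) -> int:
    """Gestational age in completed weeks: date of birth minus LMP date."""
    return (dob - lmp).days // 7


def ga_category(weeks: int) -> str:
    """Categories used for rate calculations in the text."""
    if weeks < 20 or weeks > 44:
        return "out-of-range"   # excluded from rate denominators
    if weeks <= 36:
        return "preterm"        # 20-36 completed weeks
    if weeks <= 41:
        return "term"           # 37-41 completed weeks
    return "post-term"          # 42-44 completed weeks


# NCHS implausibility cut points as listed above: (first week, last week, rule on grams).
IMPLAUSIBLE_RULES = [
    (0, 19, lambda g: g >= 1000),
    (20, 23, lambda g: g >= 2000),
    (24, 27, lambda g: g >= 3000),
    (28, 31, lambda g: g >= 4000),
    (32, 47, lambda g: g <= 1000),
]


def implausible_birthweight(weeks: int, grams: int) -> bool:
    for first, last, rule in IMPLAUSIBLE_RULES:
        if first <= weeks <= last:
            return rule(grams)
    return False  # outside 0-47 weeks: no rule applies


weeks = completed_weeks(date(2002, 1, 1), date(2002, 10, 8))  # 280 days
print(weeks, ga_category(weeks), implausible_birthweight(weeks, 3400))
```

Using integer division on the day count gives completed weeks, so a birth at 36 weeks and 6 days is still classed as preterm.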

To examine the bimodal birthweight distribution, birthweight density plots were generated from birth records for births between 20 and 27 weeks and 28–31 weeks’ gestation, as defined by Birth LMP and XAFP LMP, using kernel density estimation [13]. Birthweights ≥2200 g at 20–27 weeks’ gestation and ≥2700 g at 28–31 weeks’ gestation were considered to be in the second birthweight mode. LMP days of the month with frequency greater than expected by chance include 1, 5, 10, 15, 20, 25 and 28. The overall expected proportion with preferred digits is 23.0%, and the expected proportion for each of days 1–28 is 3.3%. The magnitude of measurement error in gestational age from Birth LMP dates was estimated by the difference between Birth LMP and XAFP LMP gestational age estimates. Positive differences represent overestimation of Birth LMP gestational age relative to XAFP. Among discrepant records, the R² from linear regression models of birthweight on gestational age, defined by either Birth LMP or XAFP LMP, was assessed. We further examined false-positive preterm and post-term rates (1 − specificity = false positives/all gold-standard negatives), false-negative preterm rates (1 − sensitivity = false negatives/all gold-standard positives), and false-positive preterm and post-term screen rates (1 − positive predictive value = false positives/screen positives), treating XAFP gestational age as the gold standard.
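The text does not say how the 3.3% and 23.0% expected proportions were derived. One derivation that reproduces both figures, sketched below under the assumption that LMP dates are spread uniformly over the calendar, is that each of days 1–28 occurs once in every month, i.e. on 12 of about 365.25 days a year. The `excess_ratio` helper is a hypothetical addition for comparing an observed share against that expectation.

```python
DAYS_PER_YEAR = 365.25          # averages in leap years (assumption)
PREFERRED_DAYS = (1, 5, 10, 15, 20, 25, 28)

# Days 1-28 occur once in each of the 12 months.
expected_per_day = 12 / DAYS_PER_YEAR
expected_any_preferred = len(PREFERRED_DAYS) * expected_per_day


def excess_ratio(observed_share: float) -> float:
    """Observed share of a given day relative to its expected share."""
    return observed_share / expected_per_day


print(f"{expected_per_day:.1%} per day, {expected_any_preferred:.1%} for any preferred day")
# Day 1 share among all 2002 Birth LMP dates, as reported in Table 2.
print(f"Day 1, Birth LMP: {excess_ratio(0.073):.1f}x expected")
```

By this reckoning, day 1 appears on Birth LMP dates a little over twice as often as chance alone would predict.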

Two error flags were evaluated to explain discrepancies: clerical error and digit preference (indicating recall error). Clerical error types were suggested by Blair et al. [14] as well as by the distribution of observed discrepancies, and include: dates that differ in only the month or year field; dates that differ by 1 in the tens digit of the day field (e.g. day 1 vs. 11); transposed month and day; LMP equal to the delivery date; or LMP 28 days or less before the child’s date of birth, possibly reflecting an estimated delivery date. The electronic birth recording system used to enter 90% of records in California in 2002 did not allow LMP entries with dates beyond the delivery date. The XAFP data entry programme triggers a double-check for LMP dates beyond the date of blood collection. Records with preferred digit LMP days were labelled ‘digit preference errors’ if the date was discrepant from the XAFP date and the discrepancy was not also considered a clerical error. The proportion of discrepancies and poor data quality indicators ‘explained’ by each error type was evaluated by calculating the percentage change in prevalence of each indicator when substituting XAFP LMP values for Birth LMP values for records flagged as either clerical or digit preference error.
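The flagging rules in this paragraph can be sketched as a pair of functions. The rule list comes from the text; the order in which the checks run, the label strings and the function names are illustrative assumptions. Clerical checks take precedence, so a preferred-digit date is only labelled a digit preference error when no clerical pattern applies.

```python
from datetime import date
from typing import Optional

PREFERRED_DAYS = {1, 5, 10, 15, 20, 25, 28}


def clerical_error_type(birth_lmp: date, xafp_lmp: date, dob: date) -> Optional[str]:
    """Return a clerical error label for a Birth/XAFP discrepancy, else None."""
    if birth_lmp == xafp_lmp:
        return None                                    # no discrepancy
    if birth_lmp == dob:
        return "lmp_equals_delivery"
    if 0 < (dob - birth_lmp).days <= 28:
        return "possible_due_date"                     # LMP <=28 days before birth
    same_y = birth_lmp.year == xafp_lmp.year
    same_m = birth_lmp.month == xafp_lmp.month
    same_d = birth_lmp.day == xafp_lmp.day
    if same_m and same_d:
        return "year_only"
    if same_y and same_d:
        return "month_only"
    if same_y and same_m and abs(birth_lmp.day - xafp_lmp.day) == 10:
        return "ten_day"                               # tens digit off by 1, e.g. 1 vs 11
    if same_y and birth_lmp.month == xafp_lmp.day and birth_lmp.day == xafp_lmp.month:
        return "month_day_transposed"
    return None


def error_flag(birth_lmp: date, xafp_lmp: date, dob: date) -> Optional[str]:
    """Clerical error first; otherwise digit preference on a discrepant date."""
    if clerical_error_type(birth_lmp, xafp_lmp, dob):
        return "clerical"
    if birth_lmp != xafp_lmp and birth_lmp.day in PREFERRED_DAYS:
        return "digit_preference"
    return None


dob = date(2002, 10, 8)
print(clerical_error_type(date(2002, 3, 5), date(2002, 5, 3), dob))
print(error_flag(date(2002, 1, 15), date(2002, 1, 18), dob))
```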

The relationship between birth certificate demographic and obstetric characteristics and data quality, misclassification and gestational age estimates was examined by comparing prevalence across subgroups defined by: self-reported race/ethnicity, with Hispanic ethnicity stratified by mother’s birthplace [US-born or foreign-born (Mexico in 87.5% of cases)]; maternal age; years of completed education categorised as <12, 12 and >12; parity (number of livebirths before current delivery); and source of payment for delivery, grouped as Medi-Cal (California’s Medicaid programme), private insurance, uninsured or other (Medicare, worker’s compensation, other governmental and non-governmental programmes).

Results

Data completeness and population selection factors are assessed in Table 1. In 2002 birth records, 12.9% of deliveries were missing LMP dates, 55.8% of those missing day only (data not shown). Missing or incomplete LMP data on birth records were associated with African American and US-born Hispanic race/ethnicity, younger maternal age, higher prevalence of low birthweight, less than high-school education, and Medi-Cal coverage (Table 1). Of records with missing or incomplete LMP, 39.2% had complete ultrasound data and 16.6% had complete LMP data in linked XAFP records (data not shown).

Compared with non-XAFP participants, XAFP participants were more likely to be aged 34 years or younger, to have no previous livebirths, to have completed more than 12 years of education, and to be privately insured (Table 1). Among XAFP participants, women with LMP as opposed to ultrasound ‘best estimates’ were more likely to be foreign-born Hispanic, have less than high-school education, and have Medi-Cal coverage. Both preterm and post-term birth rates derived from birth certificate LMP were higher among XAFP participants with ultrasound best estimates compared with those with LMP best estimates (8.9% vs. 7.7% and 8.2% vs. 3.7%, respectively).

XAFP LMP appears to suffer from fewer data quality problems than Birth LMP, as evidenced by fewer out-of-range gestational age values, fewer preferred digits, lower post-term rates and lack of a bimodal birthweight distribution at early gestational ages (Table 2). Preterm birth prevalence was higher according to linked Birth LMP than XAFP LMP (7.6% vs. 7.2%). Birth records linked to XAFP records had lower prevalence of out-of-range gestational age, post-term births and implausible birthweight-for-gestational age than the overall birth population. Day 1 was the most commonly reported day in overall birth records, and day 15 was most commonly reported by both Birth LMP and XAFP LMP within the linked sample. While digit preference is evident in both data sources for LMP date, over-reporting of days 1 and 15 of the month was higher in Birth LMP vs. XAFP LMP dates (Table 2).

The proportion of very preterm births falling within the second birthweight mode was largest among the overall birth population (26.7% of all births between 20 and 31 weeks), and was four times greater when using Birth LMP than XAFP LMP to estimate gestational age in the linked sample (Table 2). The second birthweight mode all but disappeared within 20–27 weeks’ gestation when gestational age was derived from XAFP LMP (Fig. 1), and was greatly attenuated between 28 and 31 weeks (Fig. 2).

Table 1. Characteristics of linked and unlinked study populations, California 2002 Live Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records

Columns (all values %): (1) missing or incomplete LMP date, n = 66 623 (12.9%), and (2) with LMP date, n = 448 758 (87.1%), together make up 2002 livebirths^a (n = 515 381); (3) not linked to XAFP^b, n = 178 012 (39.7%), and (4) linked to XAFP^c, n = 270 746 (60.3%), together make up 2002 livebirths^a with Birth LMP (n = 448 758); (5) XAFP ultrasound^d, n = 164 810 (60.9%), and (6) XAFP LMP^d, n = 105 936 (39.1%), together make up 2002 livebirths^a with Birth LMP linked to XAFP (n = 270 746).

Characteristic | (1) | (2) | (3) | (4) | (5) | (6)
Race/ethnicity
White | 29.1 | 31.2 | 32.1 | 30.7 | 32.0 | 28.6
African American | 7.8 | 5.6 | 6.0 | 5.4 | 5.5 | 5.3
Asian | 8.1 | 8.8 | 7.5 | 9.6 | 9.9 | 9.1
Hispanic, US-born | 21.9 | 17.5 | 16.1 | 18.3 | 18.1 | 18.7
Hispanic, foreign-born | 28.9 | 33.0 | 34.4 | 32.2 | 30.6 | 34.7
Pacific Islander | 3.5 | 3.4 | 3.4 | 3.4 | 3.6 | 3.2
American Indian/Alaskan Native | 0.6 | 0.4 | 0.5 | 0.4 | 0.4 | 0.3
Age (years)
<20 | 11.8 | 9.5 | 11.4 | 8.2 | 7.4 | 9.4
20–24 | 26.1 | 23.1 | 24.5 | 22.2 | 21.2 | 23.9
25–34 | 47.5 | 51.0 | 41.0 | 57.6 | 58.3 | 56.4
>34 | 14.6 | 16.4 | 23.1 | 12.0 | 13.1 | 10.3
Education (years)
<12 | 31.3 | 28.8 | 32.2 | 26.5 | 24.9 | 29.1
12 | 31.6 | 28.3 | 28.2 | 28.3 | 28.2 | 28.7
>12 | 37.1 | 42.9 | 39.6 | 45.1 | 47.0 | 42.3
Previous livebirths (parity)
0 | 34.5 | 40.0 | 38.1 | 41.2 | 40.7 | 41.9
1 | 32.6 | 31.7 | 30.1 | 32.7 | 32.9 | 32.5
2+ | 32.9 | 28.4 | 31.8 | 26.1 | 26.5 | 25.6
Birthweight (g)
<1500 | 1.2 | 0.9 | 0.9 | 0.8 | 0.9 | 0.7
1500–2499 | 5.0 | 4.0 | 4.2 | 3.8 | 4.0 | 3.6
≥2500 | 93.8 | 95.2 | 95.0 | 95.4 | 95.1 | 95.8
Method of payment for delivery
Medi-Cal | 47.7 | 42.6 | 47.3 | 39.6 | 36.0 | 45.1
Any private | 48.8 | 53.0 | 44.4 | 58.7 | 62.3 | 53.1
Uninsured | 2.4 | 2.3 | 4.1 | 1.2 | 1.1 | 1.3
Other | 1.1 | 2.0 | 4.2 | 0.6 | 0.6 | 0.5
Birth LMP gestational age (completed weeks)
<20 | NA | 0.1 | 0.1 | 0.1 | 0.1 | 0.1
20–31 | NA | 1.3 | 1.5 | 1.1 | 1.2 | 0.9
32–36 | NA | 7.6 | 8.3 | 7.2 | 7.5 | 6.7
37–41 | NA | 83.0 | 81.7 | 83.9 | 81.1 | 88.2
42–44 | NA | 6.5 | 6.7 | 6.3 | 8.0 | 3.6
>44 | NA | 1.6 | 1.7 | 1.5 | 2.2 | 0.6
Preterm^e: 20–36 | NA | 9.0 | 10.0 | 8.4 | 8.9 | 7.7
Post-term^e: 42–44 | NA | 6.6 | 6.8 | 6.4 | 8.2 | 3.7

^a Excludes n = 8 records missing birthweight.
^b Also includes n = 12 239 records that linked to XAFP but had no XAFP LMP or ultrasound data.
^c Records with XAFP LMP or ultrasound data.
^d Best estimate of gestational age used by the state-sponsored prenatal screening programme to interpret serological markers.
^e Denominator excludes records with gestational ages <20 and >44 completed weeks.
LMP, last menstrual period; NA, not applicable.

Table 2. Data quality indicators by study population and LMP data source, California 2002 Live Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records

Columns (all values %): (1) 2002 livebirths, Birth LMP (n = 448 758); (2) 2002 livebirths linked to XAFP records with LMP, Birth LMP (n = 105 936); (3) the same linked records, XAFP LMP (n = 105 936).

Indicator | (1) | (2) | (3)
Gestational age at birth (completed weeks)
Out-of-range: <20 | 0.1 | 0.1 | 0.02
Out-of-range: >44 | 1.6 | 0.6 | 0.03
Preterm^a: 20–36 | 9.0 | 7.6 | 7.2
Term^a: 37–41 | 84.4 | 88.7 | 90.5
Post-term^a: 42–44 | 6.6 | 3.7 | 2.3
Digit preference, LMP day^b
Day 1 | 7.3 | 6.2 | 4.3
Day 5 | 4.4 | 4.3 | 4.3
Day 10 | 4.6 | 4.4 | 4.4
Day 15 | 6.9 | 6.3 | 5.7
Day 20 | 5.2 | 4.9 | 4.7
Day 25 | 4.3 | 4.2 | 4.1
Day 28 | 4.0 | 4.0 | 4.0
Any preferred digit | 36.7 | 34.3 | 31.4
Implausible birthweight-for-gestational age | 0.2 | 0.1 | 0.02
Second birthweight mode^c, % (n)
20–27 weeks | 22.6 (477) | 14.1 (47) | 1.9 (6)
28–31 weeks | 29.2 (1015) | 21.0 (124) | 6.4 (33)
Overall: 20–31 weeks | 26.7 (1492) | 18.5 (171) | 4.7 (39)

^a Denominator excludes records with gestational ages <20 and >44 completed weeks.
^b Expected frequency of each preferred digit is 3.3%.
^c Proportion with birthweight ≥2200 g among deliveries 20–27 weeks and ≥2700 g among deliveries 28–31 weeks.
LMP, last menstrual period.

[Figure 1. Birthweight distribution within LMP-based gestational age 20–27 completed weeks from birth (n = 333) and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP, n = 315) records, California 2002 Linked Birth and Prenatal Screening records. Density curves for XAFP and Birth; x-axis: birthweight (g), 0–5000; y-axis: probability density.]
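The second-mode definition and the kernel density plots behind Figs 1 and 2 can be sketched in plain Python. The paper does not report its kernel or bandwidth, so this sketch assumes a Gaussian kernel with Silverman’s normal-reference rule of thumb; the birthweights in the usage lines are made-up toy values, not study data.

```python
import math
from statistics import stdev
from typing import Callable, Optional, Sequence


def gaussian_kde(sample: Sequence[float],
                 bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a Gaussian kernel density estimate f(x) for a 1-D sample."""
    n = len(sample)
    if bandwidth is None:
        # Assumed: Silverman's rule of thumb (not stated in the paper).
        bandwidth = 1.06 * stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density


def in_second_mode(weeks: int, grams: float) -> bool:
    """Second birthweight mode as defined in the Methods."""
    if 20 <= weeks <= 27:
        return grams >= 2200
    if 28 <= weeks <= 31:
        return grams >= 2700
    return False


# Toy birthweights (g) for births dated at 25 weeks, forming two clusters.
toy = [700, 800, 900, 1000, 3100, 3200, 3300]
f = gaussian_kde(toy)
share = sum(in_second_mode(25, g) for g in toy) / len(toy)
print(f"density at 850 g: {f(850):.2e}, at 2000 g: {f(2000):.2e}; second-mode share {share:.2f}")
```

A dip in the estimated density between the two clusters is what shows up as the bimodal shape in the figures; the share of births above the cut-off is the quantity tabulated as the second birthweight mode in Table 2.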

The majority of Birth LMP and XAFP LMP dates are identical (71.1%), and 65.0% of discrepancies amount to ≤1 week in either direction (Table 3). Among discrepant records, XAFP LMP-derived days of gestation have a stronger association with birthweight than Birth LMP-derived days of gestation (n = 30 624; R² = 0.27 and R² = 0.01, respectively). Large (>2 weeks) gestational age overestimates are 75% more common than large underestimates (Table 3; 3.7% vs. 2.1%), and account for 97.2% of gestational ages >44 weeks and 41.6% of post-term births (data not shown). Birth LMP dates with preferred digits have larger discrepancies and greater gestational age overestimation than dates with non-preferred digits. Among Birth LMP dates with day 1 of the month, 16.2% overestimate gestational age by more than 2 weeks whereas 2.6% underestimate it by more than 2 weeks. The vast majority of records in the second birthweight mode (79.5%) underestimate gestational age by more than 31 days relative to XAFP LMP gestational age (Table 3).
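Because both gestational ages are measured to the same delivery date, the Birth-minus-XAFP difference reduces to XAFP LMP minus Birth LMP: a Birth LMP recorded too early inflates gestational age, and one recorded too late shrinks it. The sketch below computes that difference and assigns the Table 3 bins; the bin labels mirror Table 3, while the function names are hypothetical.

```python
from datetime import date
from typing import Optional


def ga_difference_days(birth_lmp: date, xafp_lmp: date) -> int:
    """Birth minus XAFP gestational age in days.

    (dob - birth_lmp) - (dob - xafp_lmp) simplifies to xafp_lmp - birth_lmp,
    so the delivery date cancels out.
    """
    return (xafp_lmp - birth_lmp).days


# Table 3 bins as (lowest, highest, label); None means unbounded.
TABLE3_BINS = [
    (32, None, "32+"), (15, 31, "15 to 31"), (8, 14, "8 to 14"), (1, 7, "1 to 7"),
    (0, 0, "0"), (-7, -1, "-1 to -7"), (-14, -8, "-8 to -14"),
    (-31, -15, "-15 to -31"), (None, -32, "-32+"),
]


def table3_bin(diff: int) -> str:
    for low, high, label in TABLE3_BINS:
        if (low is None or diff >= low) and (high is None or diff <= high):
            return label
    raise ValueError(diff)  # unreachable: the bins cover every integer


def large_discrepancy(diff: int) -> Optional[str]:
    """'Large' means more than 2 weeks in either direction."""
    if diff > 14:
        return "overestimate"
    if diff < -14:
        return "underestimate"
    return None


# Birth LMP one month earlier than XAFP LMP: gestational age overestimated.
d = ga_difference_days(date(2002, 1, 1), date(2002, 2, 1))
print(d, table3_bin(d), large_discrepancy(d))
```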

Table 4 shows the cross-classification of gestational age categories according to Birth LMP and XAFP LMP. Within Birth LMP-based gestational age groups of 20–31 and 32–36 weeks, 12.4% and 21.4%, respectively, are term births based on XAFP LMP. More than half of post-term births and 83.3% of those >44 weeks according to Birth LMP are term births based on XAFP LMP. The rate of false-negative preterm births is 15.2%, and 20.5% of observed preterm births and 53.9% of observed post-term births are false positives. While the majority of these misclassifications result from discrepancies of >2 weeks, 30.6% of preterm false negatives, 22.4% of preterm false positives and 22.9% of post-term false positives result from discrepancies of ≤14 days (data not shown). Birth records missing LMP dates with linked XAFP LMP data have higher preterm rates than linked records not missing LMP dates (9.8% and 7.2%, respectively) (Table 4).

[Figure 2. Birthweight distribution within LMP-based gestational age 28–31 completed weeks from birth (n = 590) and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP, n = 517) records, California 2002 Linked Birth and Prenatal Screening records. Density curves for XAFP and Birth; x-axis: birthweight (g), 0–5000; y-axis: probability density.]

Table 3. Magnitude of difference between gestational ages calculated from Birth LMP vs. XAFP LMP date, by data quality indicators, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)

Rows: gestational age difference in days (Birth minus XAFP). Columns (Birth LMP data quality indicators, all values %): (1) overall (n = 105 936); (2) among preferred digits (n = 36 333); (3) among day 1 (n = 6614); (4) implausible birthweight-for-gestational age (n = 91); (5) among second birthweight mode (n = 171).

Difference (days) | (1) | (2) | (3) | (4) | (5)
32+ | 0.9 | 1.3 | 3.5 | 6.6 | 0.0
15 to 31 | 2.8 | 4.6 | 12.8 | 0.0 | 0.0
8 to 14 | 2.5 | 3.9 | 9.1 | 0.0 | 0.0
1 to 7 | 8.7 | 9.8 | 13.6 | 1.1 | 0.6
0 (no difference) | 71.1 | 65.7 | 52.1 | 7.7 | 9.9
-1 to -7 | 10.1 | 9.8 | 4.9 | 0.0 | 1.2
-8 to -14 | 1.8 | 2.3 | 1.5 | 1.1 | 1.2
-15 to -31 | 1.6 | 2.1 | 2.2 | 1.1 | 7.6
-32+ | 0.5 | 0.5 | 0.5 | 82.4 | 79.5
Underestimate >14 days (< -14) | 2.1 | 2.6 | 2.6 | 83.5 | 87.1
Overestimate >14 days (> 14) | 3.7 | 5.9 | 16.2 | 6.6 | 0.0

LMP, last menstrual period.

Table 4. Distribution of XAFP gestational age within Birth LMP gestational age categories (completed weeks), California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)

Rows: Birth LMP-based gestational age (completed weeks). Columns: XAFP LMP-based gestational age^a (completed weeks); cells are counts.

Birth LMP | <20 | 20–31 | 32–36 | 37–41 | 42–44 | >44 | Total % | Total N
<20 | 10 | 4 | 2 | 33 | 2 | 0 | 0.0 | 51
20–31 | 3 | 723 | 77 | 114 | 5 | 1 | 0.9 | 923
32–36 | 1 | 52 | 5540 | 1523 | 10 | 2 | 6.7 | 7128
37–41 | 3 | 30 | 1055 | 91 694 | 598 | 7 | 88.2 | 93 387
42–44 | 0 | 1 | 57 | 2010 | 1770 | 5 | 3.6 | 3843
>44 | 0 | 22 | 34 | 503 | 32 | 13 | 0.6 | 604
Total % | 0.0 | 0.8 | 6.4 | 90.5 | 2.3 | 0.0 | 100.0 |
Total N | 17 | 832 | 6765 | 95 877 | 2417 | 28 | | 105 936
Missing Birth LMP, % | 0.0 | 1.1 | 8.7 | 87.6 | 2.6 | 0.1 | 100.0 |
Missing Birth LMP, N | 2 | 123 | 959 | 9693 | 284 | 9 | | 11 070

Preterm false-positive rate^b: 1 652/97 724 = 1.7%
Preterm false-negative rate^b: 1 143/7 535 = 15.2%
Preterm false-positive screen rate^b: 1 652/8 044 = 20.5%
Post-term false-positive rate^b: 2 068/102 876 = 2.0%
Post-term false-positive screen rate^b: 2 068/3 838 = 53.9%

^a Diagonal cells (same category in both sources) indicate birth records correctly categorised according to XAFP gestational age categories.
^b Calculations exclude Birth and XAFP gestational ages <20 and >44 completed weeks (total n = 105 259). Because post-term births derived from either LMP source may be unreliable, a post-term false-negative rate is not presented.
LMP, last menstrual period.

[Figure 3. Birth/XAFP LMP date discrepancies and poor data quality indicators: proportion explained by clerical error and digit preference error, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936). Stacked horizontal bars on a 0–100% scale for: all discrepant records (n = 30 624); underestimates of more than 14 days (n = 2 236); overestimates of more than 14 days (n = 3 929); <20 weeks, Birth LMP (n = 51); >44 weeks, Birth LMP (n = 604); preterm false negatives (n = 1 143); preterm false positives (n = 1 652); post-term false positives (n = 2 068); second birthweight mode, 20–31 weeks (n = 171). Bar segments: any clerical error; day 1 digit preference error (non-clerical); other digit preference error (non-clerical). LMP, last menstrual period.]
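The misclassification rates in Table 4 follow from a 2×2 collapse of the cross-classification, with XAFP LMP as the gold standard and the out-of-range rows and columns dropped. The sketch below recomputes them from the Table 4 counts; the data layout and function name are illustrative. It also yields a post-term false-negative rate, which the paper deliberately does not present because post-term births from either LMP source may be unreliable.

```python
# Table 4 counts restricted to 20-44 completed weeks in both sources (n = 105 259).
# Rows: Birth LMP category; columns (same order): XAFP LMP category.
CATEGORIES = ("20-31", "32-36", "37-41", "42-44")
COUNTS = {
    "20-31": (723, 77, 114, 5),
    "32-36": (52, 5540, 1523, 10),
    "37-41": (30, 1055, 91694, 598),
    "42-44": (1, 57, 2010, 1770),
}


def misclassification_rates(positive: set[str]) -> dict[str, float]:
    """Collapse the table to 2x2 for the given 'positive' categories."""
    tp = fp = fn = tn = 0
    for birth_cat, row in COUNTS.items():
        for xafp_cat, n in zip(CATEGORIES, row):
            birth_pos = birth_cat in positive
            xafp_pos = xafp_cat in positive
            if birth_pos and xafp_pos:
                tp += n
            elif birth_pos:
                fp += n
            elif xafp_pos:
                fn += n
            else:
                tn += n
    return {
        "false_positive_rate": fp / (fp + tn),         # 1 - specificity
        "false_negative_rate": fn / (fn + tp),         # 1 - sensitivity
        "false_positive_screen_rate": fp / (fp + tp),  # 1 - PPV
    }


preterm = misclassification_rates({"20-31", "32-36"})
post_term = misclassification_rates({"42-44"})
for name, value in preterm.items():
    print(f"preterm {name}: {value:.1%}")
```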

Of all gestational age discrepancies, 46.3% can be described as either clerical or digit preference errors. Clerical errors observed from discrepancies between Birth LMP and XAFP LMP dates represent 2.7% of all linked records and 9.3% of all discrepancies, whereas the prevalence of non-clerical digit preference error is 10.7% of all linked records and 37.0% of all discrepancies. Among clerical errors, 2.2% are whole year deviations, 0.9% possible confusions with estimated delivery date, 47.7% whole month deviations, 1.2% month/day transpositions, and 47.8% 10-day deviations. Among clerical errors, XAFP LMP gestational age is more closely related to birthweight (R² = 0.33), whereas no relationship exists between Birth LMP gestational age and birthweight (R² = 0.00).

Table 5. Maternal and infant characteristics by gestational age categories and data quality indicators, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)

Columns (all values %): (1) gestational age underestimate >14 days (Birth minus XAFP < -14 days); (2) gestational age overestimate >14 days (> +14 days); (3) digit preference error, Birth LMP; (4) clerical error, Birth LMP; (5) preterm rate, Birth LMP; (6) preterm rate, XAFP LMP; (7) preterm false-negative rate^a; (8) preterm false-positive rate^a; (9) preterm false-positive screen rate^a.

Characteristic | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9)
Overall | 2.1 | 3.7 | 10.7 | 2.7 | 7.6 | 7.2 | 15.2 | 1.7 | 20.5
Race/ethnicity
White | 1.2 | 2.9 | 9.6 | 1.8 | 6.1 | 5.9 | 11.5 | 1.0 | 14.8
African American | 2.2 | 5.4 | 12.9 | 2.7 | 11.3 | 10.9 | 13.1 | 2.1 | 16.1
Asian | 1.6 | 2.8 | 8.9 | 2.2 | 7.0 | 6.5 | 13.3 | 1.3 | 18.7
Hispanic, US-born | 2.1 | 4.0 | 11.6 | 2.7 | 8.1 | 7.8 | 16.6 | 1.8 | 20.0
Hispanic, foreign-born | 3.1 | 4.2 | 11.4 | 3.6 | 8.1 | 7.3 | 18.2 | 2.3 | 26.4
Pacific Islander | 1.4 | 3.1 | 9.1 | 2.3 | 9.9 | 9.5 | 11.9 | 1.7 | 15.6
Native American | 1.7 | 3.9 | 11.7 | 1.7 | 7.6 | 7.3 | 19.2 | 1.8 | 22.2
Age (years)
<20 | 2.9 | 3.9 | 11.3 | 3.1 | 9.1 | 8.9 | 17.4 | 1.9 | 19.4
20–24 | 2.5 | 4.0 | 11.6 | 3.0 | 7.8 | 7.1 | 16.8 | 2.0 | 23.5
25–34 | 1.9 | 3.5 | 10.3 | 2.5 | 7.1 | 6.6 | 14.6 | 1.6 | 20.5
>35 | 1.8 | 3.8 | 10.5 | 2.7 | 9.0 | 8.6 | 12.3 | 1.5 | 15.7
Education (years)
<12 | 3.2 | 4.6 | 11.8 | 3.6 | 8.6 | 7.9 | 19.7 | 2.4 | 25.9
12 | 2.2 | 3.9 | 11.4 | 2.7 | 7.7 | 7.3 | 16.0 | 1.7 | 20.5
>12 | 1.3 | 2.9 | 9.4 | 2.0 | 6.9 | 6.5 | 10.8 | 1.2 | 16.0
Previous livebirths (parity)
0 | 1.9 | 3.2 | 9.7 | 2.4 | 7.8 | 7.5 | 13.7 | 1.4 | 17.0
1 | 2.1 | 3.7 | 10.8 | 2.7 | 6.8 | 6.1 | 14.3 | 1.7 | 23.4
>1 | 2.5 | 4.6 | 12.2 | 3.1 | 8.5 | 8.0 | 18.3 | 2.1 | 23.0
Method of payment for delivery
Medi-Cal | 3.0 | 4.4 | 12.0 | 3.4 | 8.5 | 7.9 | 18.4 | 2.2 | 24.2
Any private | 1.4 | 3.1 | 9.6 | 2.1 | 6.9 | 6.5 | 12.0 | 1.2 | 16.6
Uninsured | 2.4 | 4.9 | 9.6 | 3.5 | 8.2 | 7.6 | 17.0 | 2.0 | 23.2
Other | 2.0 | 2.3 | 12.1 | 2.7 | 6.1 | 5.2 | 3.4 | 1.1 | 17.7

^a Excludes Birth and XAFP gestational ages <20 and >44 completed weeks (total n = 105 259, see Table 4 for detail).
LMP, last menstrual period.


Proportions displayed in Fig. 3 represent the amount by which the prevalence of each data quality indicator decreases when clerical or digit preference errors in birth records are corrected using XAFP LMP as the gold standard. Clerical errors are associated more with large underestimates than overestimates of gestational age, accounting for 33.1% of the preterm false positives and 31.6% of the second birthweight mode observed between 20 and 31 weeks (all of the latter involved errors in the month field). Digit preference error, especially day 1 error, is associated with large gestational age overestimates. Day 1 errors, while representing only 2.7% of linked records, disproportionately contribute to post-term out-of-range gestational ages, post-term false positives and missed preterm cases.
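The "proportion explained" measure can be sketched as a relative drop in prevalence: recompute an indicator after swapping in XAFP LMP for flagged records only, then compare with the uncorrected prevalence. Because both prevalences share the same denominator, counts suffice. The `Record` type, field names and the four toy records below are hypothetical illustrations, not study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Record:
    flagged: bool          # clerical or digit preference error flag
    birth_positive: bool   # indicator computed from Birth LMP (e.g. post-term)
    xafp_positive: bool    # same indicator computed from XAFP LMP


def proportion_explained(records: Sequence[Record]) -> float:
    """Relative drop in indicator prevalence after correcting flagged records."""
    before = sum(r.birth_positive for r in records)
    after = sum(r.xafp_positive if r.flagged else r.birth_positive for r in records)
    return (before - after) / before if before else 0.0


toy = [
    Record(flagged=True, birth_positive=True, xafp_positive=False),   # corrected away
    Record(flagged=False, birth_positive=True, xafp_positive=False),  # unflagged, kept
    Record(flagged=True, birth_positive=False, xafp_positive=False),
    Record(flagged=False, birth_positive=False, xafp_positive=False),
]
print(proportion_explained(toy))
```

A correction can also create new positives (a flagged record that becomes positive only under XAFP LMP), which lowers the explained share; that is why the measure is a net change in prevalence rather than a simple count of flagged cases.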

Discrepancies between Birth LMP and XAFP LMP gestational age estimates vary by population subgroup (Table 5). Large underestimation of gestational age, clerical errors and false-positive preterm rates are apparent among foreign-born Hispanics, younger women, women with less than a high-school education, women with high parity and women with Medi-Cal or no insurance. Large overestimation of gestational age, digit preference and post-term false-positive rates are elevated among African Americans, Native Americans, women with a low education level, women with high parity and women with Medi-Cal or no insurance. Preference for LMP day 1 is most prevalent among African Americans (data not shown), whereas clerical errors are more prevalent among foreign-born Hispanics.

Rates of preterm birth are approximately 5% lower across population subgroups when defined according to XAFP LMP than according to Birth LMP, with the exception of foreign-born Hispanics, whose preterm rates are 10% lower using XAFP LMP (Table 5). The preterm birth rate among foreign-born Hispanics appears to be lower than that for US-born Hispanics using XAFP LMP, whereas the rates are identical using Birth LMP. Other preterm birth rate comparisons among subgroups change little based on LMP data source. However, overall preterm birth rates mask substantial misclassification in both directions. Among African Americans, for example, 13.1% of preterm cases are missed using Birth LMP, whereas 16.1% of presumed preterm cases are not true cases. Among racial/ethnic groups, foreign-born Hispanics have the highest preterm false-positive and false-positive screen rates based on Birth LMP (2.3% and 26.4%, respectively), and Native Americans and foreign-born Hispanics have the highest preterm false-negative rates (19.2% and 18.2%, respectively). Medi-Cal coverage and lack of insurance, high parity, young age and low education level are associated with high misclassification of preterm births in both directions.
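The misclassification rates above can be derived from a cross-classification of preterm status (<37 completed weeks) by the two LMP sources, with XAFP LMP as the reference. The sketch below assumes the following definitions, which are inferred rather than stated in this section. The false-negative rate is the share of XAFP-preterm births not preterm by Birth LMP. The preterm false-positive rate is the share of XAFP non-preterm births classified preterm by Birth LMP. The false-positive screen rate is the share of Birth-LMP preterm births that are not preterm by XAFP LMP. The class name and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretermCrossTab:
    """Births cross-classified by preterm status (<37 completed weeks) under two LMP sources."""
    both: int        # preterm by Birth LMP and by XAFP LMP
    birth_only: int  # preterm by Birth LMP only (false positives)
    xafp_only: int   # preterm by XAFP LMP only (missed cases)
    neither: int     # not preterm by either source

    @property
    def total(self) -> int:
        return self.both + self.birth_only + self.xafp_only + self.neither

    def preterm_rate_birth(self) -> float:
        return 100 * (self.both + self.birth_only) / self.total

    def preterm_rate_xafp(self) -> float:
        return 100 * (self.both + self.xafp_only) / self.total

    def false_negative_rate(self) -> float:
        """% of XAFP-preterm births missed by Birth LMP."""
        return 100 * self.xafp_only / (self.both + self.xafp_only)

    def false_positive_rate(self) -> float:
        """% of XAFP non-preterm births labelled preterm by Birth LMP."""
        return 100 * self.birth_only / (self.birth_only + self.neither)

    def false_positive_screen_rate(self) -> float:
        """% of Birth-LMP preterm births that are not preterm by XAFP LMP."""
        return 100 * self.birth_only / (self.both + self.birth_only)

    def relative_reduction(self) -> float:
        """% by which the preterm rate falls when XAFP LMP replaces Birth LMP."""
        birth = self.preterm_rate_birth()
        return 100 * (birth - self.preterm_rate_xafp()) / birth


tab = PretermCrossTab(both=60, birth_only=20, xafp_only=10, neither=910)  # made-up counts
print(tab.preterm_rate_birth(), tab.preterm_rate_xafp())  # 8.0 7.0
print(round(tab.false_negative_rate(), 1))                # 14.3
print(round(tab.false_positive_rate(), 1))                # 2.2
print(tab.false_positive_screen_rate())                   # 25.0
print(tab.relative_reduction())                           # 12.5
```

These definitions match the rounded values reported for foreign-born Hispanics. Preterm rates of 8.1% and 7.3% with 18.2% of true cases missed imply a false-positive screen rate of about 26.3% and a false-positive rate of about 2.3%, against the reported 26.4% and 2.3%.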

Overall post-term rates are 36% lower using XAFP LMP than using Birth LMP; however, this decrease is larger among African Americans, women aged over 35 years, women with high parity and women with Medi-Cal or no insurance. African Americans and the uninsured have the highest post-term false-positive rates (2.6% each), followed by Native Americans, women with less than a high-school education and women with Medi-Cal (data not shown).

Discussion

This is the first study to compare LMP dates from birth certificates with a large, population-based source of reliable, prenatally collected LMP data in order to isolate data reporting errors. Birth LMP was discrepant with XAFP LMP nearly a third of the time. As a result, one-fifth of preterm births and half of post-term births identified from birth records were false positives, and 15% of true preterm cases were missed. Agreement within 1 week was higher in the current study than in a previous comparison of LMP-based gestational age from birth records with gestational age from medical charts among normal-birthweight babies in northern California (89% vs. 77–78%); however, some chart estimates in that smaller study were derived from ultrasound.15
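Both estimates share the same date of birth, so the difference between two LMP-based gestational ages equals the difference between the LMP dates themselves. Discrepancy and agreement within 1 week can therefore be tabulated from the dates directly. The sketch below assumes complete dates from both sources and reads 'within 1 week' as LMP dates no more than 7 days apart. A comparison in completed weeks could give slightly different figures. The helper names and example dates are hypothetical.

```python
from datetime import date
from typing import List, Sequence, Tuple

LmpPair = Tuple[date, date]  # (Birth LMP, XAFP LMP) for one linked record


def share_discrepant(pairs: Sequence[LmpPair]) -> float:
    """Share of linked records whose two LMP dates are not identical."""
    return sum(b != x for b, x in pairs) / len(pairs)


def share_within(pairs: Sequence[LmpPair], max_days: int = 7) -> float:
    """Share of linked records whose LMP dates differ by at most max_days."""
    return sum(abs((b - x).days) <= max_days for b, x in pairs) / len(pairs)


pairs: List[LmpPair] = [
    (date(2002, 1, 1), date(2002, 1, 1)),    # identical
    (date(2002, 3, 1), date(2002, 3, 4)),    # 3 days apart
    (date(2002, 5, 15), date(2002, 5, 1)),   # 14 days apart
    (date(2002, 7, 10), date(2002, 8, 10)),  # month differs: 31 days apart
]
print(share_discrepant(pairs))  # 0.75
print(share_within(pairs))      # 0.5
```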

While menstrual dating has inherent flaws for estimating gestational age, the recording of LMP date itself is prone to errors that are amenable to improvement. California’s centralised XAFP prenatal screening programme is the largest in the country, serving approximately 70% of pregnant women in the State. Because accurate gestational age is needed to interpret risks for trisomies and neural tube defects, XAFP data provide a population-based source of gestational age in California. Until now, only vital records have provided sufficient numbers of very early deliveries to examine the bimodal distribution of birthweight. The second birthweight mode at early gestations appears to be largely an issue of clerical and recall error, rather than pathological non-menstrual bleeding misidentified as a normal menstrual cycle.6 XAFP LMP is more accurate than LMP from birth certificates, as demonstrated by lower rates of digit preference, out-of-range gestational ages, implausible birthweight-for-gestational age and post-term births. Over half of large discrepancies in LMP dates were explained by suspected clerical and digit preference errors, indicating that quality control measures have the potential to improve gestational age estimates.

Clerical errors may arise from recording dates from the wrong field (e.g. estimated due date14 or child’s date of birth), manual error transcribing a date into a chart or worksheet, or typographical error on data entry. In this analysis, assessment of clerical error may have been incomplete, as misread digits in the day field were only assessed if the tens digit differed by one or the month and day were transposed. On the other hand, random or recall error may have produced apparent clerical errors by chance. Among discrepant records flagged for clerical error, birthweight was strongly associated with XAFP LMP gestational age but showed no association with Birth LMP gestational age, suggesting that the errors lie predominantly in Birth LMP.
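The day-field patterns assessed here, a tens digit differing by one and a month/day transposition, can be screened for mechanically when both LMP dates are complete. So can a wrong-month error. The sketch below is illustrative only. The function, its labels and the exact rules, including the treatment of a wrong month as any case where only the month differs, are assumptions built from the description above, not the study's algorithm.

```python
from datetime import date
from typing import Optional


def suspected_clerical_error(birth_lmp: date, xafp_lmp: date) -> Optional[str]:
    """Label a Birth LMP that looks like a clerical variant of the XAFP LMP (hypothetical rules)."""
    if birth_lmp == xafp_lmp:
        return None
    same_year = birth_lmp.year == xafp_lmp.year
    # Month and day transposed, e.g. 2002-03-07 recorded as 2002-07-03.
    if same_year and (birth_lmp.month, birth_lmp.day) == (xafp_lmp.day, xafp_lmp.month):
        return "month/day transposed"
    # Only the month field differs.
    if same_year and birth_lmp.day == xafp_lmp.day:
        return "month error"
    # Same month; units digit of the day equal, tens digit off by one (e.g. 12 vs 22).
    if same_year and birth_lmp.month == xafp_lmp.month:
        b_tens, b_units = divmod(birth_lmp.day, 10)
        x_tens, x_units = divmod(xafp_lmp.day, 10)
        if b_units == x_units and abs(b_tens - x_tens) == 1:
            return "day tens digit"
    return None


print(suspected_clerical_error(date(2002, 7, 3), date(2002, 3, 7)))   # month/day transposed
print(suspected_clerical_error(date(2002, 5, 12), date(2002, 5, 22)))  # day tens digit
print(suspected_clerical_error(date(2002, 4, 9), date(2002, 2, 9)))    # month error
print(suspected_clerical_error(date(2002, 4, 1), date(2002, 4, 9)))    # None
```

Random recall error can produce the same patterns by chance, as noted above, so flags like these mark suspected rather than confirmed clerical errors.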

In 2002, California’s XAFP programme required double-key entry of all dates, which provided built-in error checks during data entry. It also required verification of key fields with providers where any data element was missing, and follow-up of non-negative screening results. These measures probably account for its better data quality. The State vital statistics electronic data entry programme requires confirmation of LMP dates that precede birth by more than 1 year, and of gestations shorter than 140 days with birthweight >2000 g. Implementing double-key entry and expanding data checks to other situations could yield substantial improvements in Birth LMP data quality. Such situations include mistaking the estimated due date for the LMP, additional implausible birthweight entries and out-of-range gestational age estimates.
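The confirmation rules described above, plus the additional checks proposed, could be encoded as a record-level edit check. In this sketch the function names are hypothetical and 'more than 1 year' is taken as more than 365 days. A due date entered as the LMP is assumed to appear as an LMP on or after the date of birth. 'Out-of-range' uses the 20–44 completed-week range applied in Table 5. None of this is the State programme's actual implementation.

```python
from datetime import date
from typing import List


def lmp_edit_checks(lmp: date, birth: date, birthweight_g: int) -> List[str]:
    """Return reasons why a record should be confirmed before acceptance (sketch)."""
    reasons: List[str] = []
    days = (birth - lmp).days
    # Checks described for the State electronic data entry programme.
    if days > 365:
        reasons.append("LMP precedes birth by more than 1 year")
    if days < 140 and birthweight_g > 2000:
        reasons.append("gestation under 140 days with birthweight over 2000 g")
    # Additional checks suggested in the text (thresholds are assumptions).
    if days <= 0:
        reasons.append("LMP on or after date of birth (due date entered as LMP?)")
    elif not 20 <= days // 7 <= 44:
        reasons.append("gestational age outside 20-44 completed weeks")
    return reasons


def double_key_mismatch(first_keying: date, second_keying: date) -> bool:
    """Double-key entry: two independent keyings of the same date must agree."""
    return first_keying != second_keying


print(lmp_edit_checks(date(2001, 9, 10), date(2002, 6, 15), 3300))  # []
print(lmp_edit_checks(date(2002, 9, 1), date(2002, 6, 15), 3200))
print(double_key_mismatch(date(2002, 3, 7), date(2002, 7, 3)))  # True
```

The second call mimics a due date keyed into the LMP field. It trips both the existing 140-day rule and the proposed 'on or after birth' check.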

Birth LMP dates with preferred digits were more likely than those with non-preferred digits to differ from XAFP LMP dates by more than 2 weeks (8.5% vs. 4.4%). Increased discrepancies associated with preferred vs. non-preferred digits have also been reported when comparing gestational age estimates from XAFP LMP dates with ultrasound gestational age estimates.3
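This comparison of preferred and non-preferred LMP days amounts to a grouped proportion. The set of preferred days is not defined in this section. For illustration only, the sketch assumes days 1, 5, 10, 15, 20, 25 and 30. It treats 'more than 2 weeks' as an absolute difference greater than 14 days between the two LMP dates. The function name and example dates are hypothetical.

```python
from datetime import date
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Assumed set of "preferred" LMP days; the study's definition may differ.
PREFERRED_DAYS: FrozenSet[int] = frozenset({1, 5, 10, 15, 20, 25, 30})


def discrepancy_by_digit(
    pairs: Iterable[Tuple[date, date]],
    preferred: FrozenSet[int] = PREFERRED_DAYS,
    threshold_days: int = 14,
) -> Dict[str, float]:
    """Share of (Birth LMP, XAFP LMP) pairs differing by > threshold, split by Birth LMP day."""
    counts: Dict[str, List[int]] = {"preferred": [0, 0], "non-preferred": [0, 0]}  # [discrepant, total]
    for birth_lmp, xafp_lmp in pairs:
        group = "preferred" if birth_lmp.day in preferred else "non-preferred"
        counts[group][1] += 1
        if abs((birth_lmp - xafp_lmp).days) > threshold_days:
            counts[group][0] += 1
    return {g: (d / n if n else float("nan")) for g, (d, n) in counts.items()}


pairs = [
    (date(2002, 1, 1), date(2001, 12, 10)),  # preferred day, 22 days apart
    (date(2002, 2, 15), date(2002, 2, 13)),  # preferred day, 2 days apart
    (date(2002, 3, 7), date(2002, 3, 7)),    # non-preferred, identical
    (date(2002, 4, 18), date(2002, 4, 2)),   # non-preferred, 16 days apart
    (date(2002, 5, 22), date(2002, 5, 20)),  # non-preferred, 2 days apart
    (date(2002, 6, 9), date(2002, 6, 9)),    # non-preferred, identical
]
print(discrepancy_by_digit(pairs))  # {'preferred': 0.5, 'non-preferred': 0.25}
```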

The higher prevalence of digit preference in Birth LMP relative to XAFP LMP implies that mothers are directly asked for LMP information during birth registration. Increasing duration of LMP recall has been associated with gestational age overestimation16 and may be one explanation for the overestimation of gestational age that we observed for Birth LMP dates with preferred digits. Maternal querying and missing LMP dates may both result from missing prenatal charts at the time of birth registration.

The prevalence of digit preference in day of LMP dates changed little between 1987 and 2002 (35.9% and 36.7%, respectively).3 While preference for day 1 in Birth LMP dates was associated with large gestational age overestimations and missed preterm cases in our study, its prevalence in California birth records has decreased, from 7.7% in 2001 and 7.3% in 2002 to 5.7% in 2003. Researchers should assess the degree of day 1 digit preference when relying on LMP dates for gestational age estimates, particularly among vulnerable subpopulations.
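One way to gauge day 1 preference is to divide the observed share of complete LMP dates recorded on day 1 by the share expected if LMP days were spread evenly across the calendar. That expected share is 12/365.25, or about 3.3%. This ratio is a suggested check under that uniformity assumption, not a measure used in the paper, and the function name is hypothetical. By this yardstick, the 7.3% day 1 prevalence reported for 2002 is roughly 2.2 times the expected share.

```python
from datetime import date
from typing import Iterable

# Share of calendar days that fall on day 1 of a month, averaged over leap cycles.
EXPECTED_DAY1_SHARE = 12 / 365.25


def day1_excess_ratio(lmp_dates: Iterable[date]) -> float:
    """Observed share of LMPs on day 1 divided by the share expected under a uniform spread."""
    dates = list(lmp_dates)
    if not dates:
        raise ValueError("no complete LMP dates")
    observed = sum(d.day == 1 for d in dates) / len(dates)
    return observed / EXPECTED_DAY1_SHARE


# Made-up sample in which 7.3% of LMP dates fall on day 1.
sample = [date(2002, 1, 1)] * 73 + [date(2002, 1, 2)] * 927
print(round(day1_excess_ratio(sample), 2))  # 2.22
```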

LMP data in California birth records were no more complete in 2002 than in 1987 (12.9% vs. 12.7% of records with missing or incomplete LMP).3 Nationally in 2002, 5.1% of birth records were missing only day of LMP and 5.5% were also missing month and year (J. Martin, 7 Nov 2005, pers. comm.). Missing LMP data threaten the external validity of preterm birth estimates. Births missing LMP data are disproportionately from vulnerable populations and have a higher risk of infant mortality.17,18 Implausible gestational ages are frequently excluded from analysis, further compounding the missing data problem. In California birth records, a missing day is imputed as 15 for gestational age calculation. However, unlike in other States, no clinical or obstetric estimate of gestational age was available until 2007 to substitute for records with incomplete LMP or out-of-range gestational age estimates.
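The day-imputation rule described above can be written as a short function. This sketch assumes that gestational age is counted in completed weeks from LMP to the date of birth. It leaves records missing the LMP month or year without an estimate. The 20–44 week plausibility range mirrors the exclusion used in Table 5, and the function names are hypothetical.

```python
from datetime import date
from typing import Optional


def gestational_age_weeks(
    birth: date,
    lmp_year: Optional[int],
    lmp_month: Optional[int],
    lmp_day: Optional[int],
) -> Optional[int]:
    """Completed weeks from LMP to birth; a missing LMP day is imputed as 15."""
    if lmp_year is None or lmp_month is None:
        return None  # month or year missing: no LMP-based estimate
    day = 15 if lmp_day is None else lmp_day
    return (birth - date(lmp_year, lmp_month, day)).days // 7


def is_plausible(weeks: Optional[int], low: int = 20, high: int = 44) -> bool:
    """True when an estimate exists and lies within [low, high] completed weeks."""
    return weeks is not None and low <= weeks <= high


birth = date(2002, 6, 20)
print(gestational_age_weeks(birth, 2001, 9, None))    # 39 (day imputed as 15)
print(gestational_age_weeks(birth, 2001, 9, 1))       # 41
print(gestational_age_weeks(birth, 2001, None, None)) # None
```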

Direct linkage of birth records to XAFP records in which LMP dates were considered the ‘best estimate’ of gestational age ensures the most direct LMP error assessment possible on a large, population-based sample. However, the population of women with XAFP LMP as the best gestational age estimate differed from the general birth population. It included fewer women over the age of 35 years or with post-high school education, and fewer preterm or low-birthweight deliveries. Women aged over 35 years often elect to have a diagnostic test (e.g. amniocentesis) rather than a screening test. Beyond selection factors related to prenatal screening participation, women in this study had LMP dates considered reliable for screening interpretation. It is likely that the LMP dates in birth records for these women are more reliable than the LMP dates of women who were referred for ultrasound. This is suggested by the higher Birth LMP-based post-term rates among women with ultrasound dating. Similarly, relative to linked records, the overall birth population had higher digit preference and post-term rates and a larger proportion of births in the second birthweight mode. For these reasons, the discrepancies we observed probably underestimate the true extent of LMP reporting errors in the general population of California births.

Studies comparing birth certificate LMP with ultrasound gestational age estimates need to consider the role of reporting error in vital records in addition to the inherent biological or methodological limitations of LMP dating. Direct comparison of XAFP LMP with ultrasound could also lead to biased conclusions regarding the quality of the LMP dating method. We observed an excess of post-term births based on XAFP LMP dates within the subsample of XAFP records with both LMP and ultrasound data. This suggests an over-representation of unreliable XAFP LMP dates that necessitated ultrasound confirmation. This small subgroup of XAFP participants with both XAFP LMP and ultrasound data, comprising 14% of all XAFP participants and 1% of 1987 California livebirths, has been the focus of previous research.3

Women of African American and Hispanic origin, women with less education and higher parity, and women with public or no insurance coverage were disproportionately affected by misclassification and missing LMP data on birth records. Among racial/ethnic groups, foreign-born Hispanics had the highest rates of clerical error and underestimated gestational age, and a high preterm false-positive rate. However, recall error, indicated by digit preference, was less pronounced among them than among African Americans and Native Americans. Using gestational age from XAFP LMP dates, US-born Hispanics had higher preterm rates than foreign-born Hispanics. This suggests that reporting error in birth records, and clerical error in particular, may hide a Hispanic paradox for preterm delivery similar to that observed for birthweight, as hypothesised by others.19 Indices based on gestational age, such as small-for-gestational age or adequacy of prenatal care, may also be biased among these segments of the population.

Beginning in 2007, the obstetric estimate was added to the California birth certificate, intended to reflect ultrasound dating where available.20 Linked data from the XAFP programme suggest that at least 39% of birth records missing LMP could potentially have an obstetric estimate informed by ultrasound. Women obtaining ultrasound during pregnancy are not representative of the birth population. For this and other reasons, LMP dating will remain the primary source for population monitoring of preterm delivery. We conclude that some limitations previously attributed to the LMP dating method may be ameliorated through data quality control measures. The training surrounding the implementation of the revised birth certificate provides an opportunity to emphasise appropriate sources for gestational age data and to enhance data-checking protocols.

Acknowledgements

This paper was partially supported through contract CQ004942-LOS with the Centers for Disease Control and Prevention, Atlanta, GA. The authors are indebted to Joyce A. Martin of the Centers for Disease Control and Prevention, National Center for Health Statistics, and Alan Oppenheim of the California Department of Health Services, Center for Health Statistics, for insight regarding national and State birth certificate data. They also thank Bob Currier and Marie Roberson of the California Department of Health Services, Genetic Disease Branch, and Patricia M. Dietz of the Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, for their thoughtful comments. Alan Hubbard of the University of California, Berkeley, provided statistical support, Allen Hom and Steve Graham of the Sequoia Foundation provided data linkage, and Deborah Hildebrandt and Marissa Root provided manuscript assistance.

References

1 Savitz DA, Terry JW Jr, Dole N, Thorp JM Jr, Siega-Riz AM, Herring AH. Comparison of pregnancy dating by last menstrual period, ultrasound scanning, and their combination. American Journal of Obstetrics and Gynecology 2002; 187:1660–1666.

2 Frazier TM. Error in reported date of last menstrual period. American Journal of Obstetrics and Gynecology 1959; 77:915–918.

3 Waller DK, Spears WD, Gu Y, Cunningham GC. Assessing number-specific error in the recall of onset of last menstrual period. Paediatric and Perinatal Epidemiology 2000; 14:263–267.

4 Alexander GR, Himes JH, Kaufman RB, Mor J, Kogan M. A United States national reference for fetal growth. Obstetrics and Gynecology 1996; 87:163–168.

5 Kramer MS, McLean FH, Boyd ME, Usher RH. The validity of gestational age estimation by menstrual dating in term, preterm, and postterm gestations. JAMA 1988; 260:3306–3308.

6 David RJ. The quality and completeness of birthweight and gestational age data in computerized birth files. American Journal of Public Health 1980; 70:964–973.


7 Vahratian A, Buekens P, Bennett TA, Meyer RE, Kogan MD, Yu SM. Preterm delivery rates in North Carolina: are they really declining among non-Hispanic African Americans? American Journal of Epidemiology 2004; 159:59–63.

8 Mustafa G, David RJ. Comparative accuracy of clinical estimate versus menstrual gestational age in computerized birth certificates. Public Health Reports 2001; 116:15–21.

9 Dietz PM, England LJ, Callaghan WM, Pearl M, Wier ML, Kharrazi M. A comparison of LMP-based and ultrasound-based estimates of gestational age using linked California livebirth and prenatal screening records. Paediatric and Perinatal Epidemiology 2007; 21 (Suppl. 2):62–71.

10 Alexander GR, Allen MC. Conceptualization, measurement, and use of gestational age. I. Clinical and public health practice. Journal of Perinatology 1996; 16:53–59.

11 SuperMATCH Concepts and Reference, Version 3.10. Boston: Vality Technology Incorporated, March 2001.

12 National Center for Health Statistics. Instruction Manual, Computer Edits for Natality Data, Part 12. Hyattsville, MD: US Department of Health and Human Services, Centers for Disease Control and Prevention, 1995.

13 Scott DW. Multivariate Density Estimation: Theory, Practice and Visualization. New York: John Wiley & Sons, 1992.

14 Blair E, Liu Y, Cosgrove P. Choosing the best estimate of gestational age from routinely collected population-based perinatal data. Paediatric and Perinatal Epidemiology 2004; 18:270–276.

15 Emery ES 3rd, Eaton A, Grether JK, Nelson KB. Assessment of gestational age using birth certificate data compared with medical record data. Paediatric and Perinatal Epidemiology 1997; 11:313–321.

16 Wegienka G, Baird DD. A comparison of recalled date of last menstrual period with prospectively recorded dates. Journal of Women’s Health 2005; 14:248–252.

17 Buekens P, Delvoye P, Wollast E, Robyn C. Epidemiology of pregnancies with unknown last menstrual period. Journal of Epidemiology and Community Health 1984; 38:79–80.

18 Gould JB, Chavez G, Marks AR, Liu H. Incomplete birth certificates: a risk marker for infant mortality. American Journal of Public Health 2002; 92:79–81.

19 Deeb-Sossa N, Agans RP, Butron-Riveros BC, Balcazar H, Kalsbeek WD, Buekens P. Development and testing of interview questions to determine last menstrual period in Mexican immigrant populations. Journal of Immigrant Health 2004; 6:127–136.

20 Wier ML, Pearl M, Kharrazi M. Gestational age estimation on United States live birth certificates: a historical overview. Paediatric and Perinatal Epidemiology 2007; 21 (Suppl. 2):4–12.
