Research Article
Use of Multiple Imputation to Correct for Bias in Lung Cancer Incidence Trends by Histologic Subtype
Mandi Yu1, Eric J. Feuer1, Kathleen A. Cronin1, and Neil E. Caporaso2
Abstract
Background: Over the past several decades, advances in lung cancer research and practice have led to refinements of the histologic diagnosis of lung cancer. The differential use and subsequent alterations of nonspecific morphology codes, however, may have caused artifactual fluctuations in the incidence rates for histologic subtypes, thus biasing temporal trends.
Methods: We developed a multiple imputation (MI) method to correct lung cancer incidence for nonspecific histology using data from the Surveillance, Epidemiology, and End Results Program for 1975 to 2010.
Results: For adenocarcinoma in men and squamous cell carcinoma in both genders, the change to an increasing trend around 2005, after more than 10 years of decreasing incidence, is apparently an artifact of the changes in histopathology practice and coding system. After imputation, the rates remained decreasing for adenocarcinoma and squamous cell carcinoma in men, and became constant for squamous cell carcinoma in women.
Conclusions: As molecular features of distinct histologies are increasingly identified by new technologies, accurate histologic distinctions are becoming increasingly relevant to more effective "targeted" therapies and are therefore important to track in patients. However, without accounting for the coding changes, the incidence trends estimated for histologic subtypes could be misleading.
Impact: The MI approach provides a valuable tool for bridging the different histology definitions, thus permitting meaningful inferences about the long-term trends of lung cancer by histologic subtype. Cancer Epidemiol Biomarkers Prev; 23(8); 1546–58. ©2014 AACR.
Introduction
Lung cancer is the leading cause of cancer death in women and men in the United States. On average, only approximately 15% of newly diagnosed cases survive for 5 years or longer (1). Histologically, lung cancers are classified as small cell and non–small cell (NSC) carcinoma (2). The latter is usually further divided into squamous cell carcinoma, adenocarcinoma, and large cell carcinoma. Within the NSC category, etiologic and morphologic differences by histology have been recognized, but in the past, treatment and prognosis were considered relatively homogeneous for different histologies of the same stage. Emerging data now increasingly identify subsets of adenocarcinoma (3) and squamous histologies (4) with specific genetic alterations. For example, epidermal growth factor receptor (EGFR) protein overexpression and activating EGFR mutations, which are associated with responsiveness to EGFR-targeted therapies (tyrosine kinase inhibitors; refs. 5 and 6), are almost exclusively found in adenocarcinoma histology. Similarly, echinoderm microtubule-associated protein-like 4 (EML4)-anaplastic lymphoma kinase (ALK) rearrangements are more common in adenocarcinoma, and these rearrangements indicate responsiveness to another therapeutic agent, crizotinib (7). As we move into the future, clinical strategy for tumor management will be determined by molecular studies of the tumors and their underlying mutations (8). Inherited variation in lung cancer that has been identified may eventually have therapeutic implications in terms of efficacy and side effects. The recent results from the National Lung Screening Trial further suggested that histology might account for differences in the efficiency of computed tomography (CT) screening (9). As the broader implications of histologic classification become increasingly relevant to screening, treatment, prognosis, and etiology, so will the examination of temporal trends separately for each subtype.
Authors' Affiliations: 1 Division of Cancer Control and Population Sciences; and 2 Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).
Corresponding Author: Mandi Yu, Division of Cancer Control and Population Sciences, National Cancer Institute, 9606 Medical Center Drive, Room 4E560, Rockville, MD 20850. Phone: 240-276-6866; Fax: 240-276-7908; E-mail: yum3@mail.nih.gov
doi: 10.1158/1055-9965.EPI-14-0130
©2014 American Association for Cancer Research.
Cancer Epidemiology, Biomarkers & Prevention
Cancer Epidemiol Biomarkers Prev; 23(8) August 2014 1546
Downloaded from cebp.aacrjournals.org on March 15, 2020. © 2014 American Association for Cancer Research.
Published OnlineFirst May 22, 2014; DOI: 10.1158/1055-9965.EPI-14-0130
Cancer registry data collected by the National Cancer Institute (NCI)'s Surveillance, Epidemiology, and End Results (SEER) Program have been a primary source of data for national trends in lung cancer incidence and mortality (10). SEER registries have been coding cancer histology according to the International Classification of Diseases for Oncology (ICD-O). In the 1990s, pathologists tended not to report NSC carcinomas with specificity because their treatments and prognoses were considered similar; thus, an increasing number of cases have been coded as 8010 (carcinoma, NOS) since 1980. In recognition of this trend, 8046 (NSC carcinoma) was added to ICD-O-3 in 2001 to group cases that could not be classified beyond the exclusion of small cell. Collectively, the percentage of cases coded as 8010 or 8046 increased dramatically, from 5% in 1982 to more than 22% in 2005 (11). Some of these cases presumably belonged to one of the specific histologic subtypes, so coding them as nonspecific would have lowered the recorded incidence rates of those subtypes. However, this increasing use of nonspecific codes did not continue. In light of the advances in cancer research and therapy, NSC cases have increasingly been diagnosed with more histologic specificity (12) over the last few years, which may have driven up the rates for squamous cell carcinoma or adenocarcinoma. Such differential use of nonspecific morphology codes could bias the estimated temporal trends of histologic subtypes and complicate their interpretation. Appropriate statistical adjustments are therefore necessary to improve the quality of inferences drawn from the authoritative cancer registry data, which would otherwise be compromised by the unavoidable limitations of the earlier, imperfect classification system.
Multiple imputation (MI) has been shown to be a useful approach for handling measurement or coding changes in settings both with (13–16) and without (17) calibration data (observations measured on all measurement scales or in all coding systems). When calibration data (usually on a random subsample) are available, one can generate plausible values on all measurement scales from an imputation model and analyze the imputed data using the preferred scale. For the change in the use of the nonspecific morphology codes 8010 and 8046, 2 types of calibration data could be useful for correcting coding inconsistency. The first type comprises cancer cases that were originally assigned a nonspecific code but were updated with a specific code through reexamination. Such data provide information about the association between nonspecific and specific histologies that one can use to recover the missing histology for all cases with a nonspecific code. Because nonspecific codes no longer exist in the imputed data, the trend analysis of incidence by histology is valid (provided that the imputation model is correct). The second type consists of cancer cases coded in multiple classification systems. Using these data as a bridge, one can convert data from one system to another. Although nonspecific codes still exist, temporal comparisons of imputed histology in any classification system are valid because coding consistency is maintained. However, neither type of calibration data could be easily obtained for practical reasons, such as budget constraints and the lack of diagnostic data sources. Thus, this problem becomes a missing data issue in which the specific histologies of cases with a nonspecific morphology code are missing, and an assumption about the association between the missing specific histology and the observed data (18–20) is required. We make the reasonable assumption that, among cancer cases with similar tumor, treatment, survival, and patient demographic characteristics, cases with a nonspecific code have the same distribution of specific histology as cases with an observed specific histology. Based on this assumption, we developed an MI approach using the sequential regression imputation method (SRMI; ref. 21) to redistribute cases without specific histology to one of the specific subtypes, thus correcting the biased estimates of incidence rates.
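The core SRMI idea, cycling through the incomplete variables and re-imputing each one from a regression on all the others, can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not the authors' method: every variable is numeric and is imputed from a normal linear regression, and the regression coefficients are not drawn from their posterior, so it is not a fully "proper" imputation. The function name `srmi_gaussian` is hypothetical; the article's algorithm instead matches the conditional model to each variable's type.

```python
import numpy as np


def srmi_gaussian(data, n_rounds=10, seed=None):
    """Sequential regression (chained-equation) imputation of a numeric matrix.

    Each incomplete column is re-imputed, in turn, from a normal linear
    regression on all other columns; the whole cycle is repeated n_rounds times.
    """
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)
    miss = np.isnan(x)
    n, k = x.shape
    # Initialize the holes with random draws from each column's observed values.
    for j in range(k):
        if miss[:, j].any():
            observed = x[~miss[:, j], j]
            x[miss[:, j], j] = rng.choice(observed, size=miss[:, j].sum())
    for _ in range(n_rounds):
        for j in range(k):
            rows_mis = miss[:, j]
            if not rows_mis.any():
                continue
            rows_obs = ~rows_mis
            # Design matrix: intercept plus the current values of all other columns.
            design = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
            beta, *_ = np.linalg.lstsq(design[rows_obs], x[rows_obs, j], rcond=None)
            resid = x[rows_obs, j] - design[rows_obs] @ beta
            dof = max(int(rows_obs.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Replace the missing entries with a prediction plus normal noise.
            pred = design[rows_mis] @ beta
            x[rows_mis, j] = pred + rng.normal(0.0, sigma, size=pred.shape)
    return x
```

Observed entries are never overwritten; only the originally missing cells are refreshed on each pass, so later variables are imputed using the most recent values of earlier ones.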
Materials and Methods
Data sources
We selected 522,416 malignant lung cancer cases diagnosed from 1975 to 2010 from the SEER 9 registries database (Atlanta, Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco–Oakland, Seattle–Puget Sound, and Utah). We created 6 histologic categories according to the most recent NCI SEER Cancer Statistics Review (1): small cell carcinoma (8041–8045); squamous and transitional cell carcinoma (8051–8052, 8070–8084, 8120–8131); adenocarcinoma (8050, 8140–8149, 8160–8162, 8190–8221, 8250–8263, 8270–8280, 8290–8337, 8350–8390, 8400–8560, 8570–8576, 8940–8941); large cell carcinoma (8011–8015); other NSC carcinoma (8020–8022, 8030–8040, 8090–8110, 8150–8156, 8170–8175, 8180, 8230–8231, 8240–8249, 8340–8347, 8561–8562, 8580–8671); and other specified and unspecified types (8680–8713, 8800–8912, 8990–8991, 9040–9044, 9120–9136, 9150–9252, 9370–9373, 9540–9582, 8720–8790, 8930–8936, 8950–8983, 9000–9030, 9060–9110, 9260–9365, 9380–9539, 8000–8005). We singled out 8010 and 8046 from these categories; these are the codes for which we performed statistical adjustments. We excluded cases of other specified and unspecified types because their incidence is not likely to be affected by the recent change in the coding system. We also excluded cases that were not histologically confirmed or had unknown histologic confirmation status, because their diagnoses tended to be inaccurate and lacked specificity. The final sample size for this analysis is 470,326.
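The category definitions above amount to a lookup from ICD-O-3 morphology code to histologic group. The minimal sketch below assumes inclusive ranges exactly as listed and reads the adenocarcinoma block as running from 8400 to 8560. It keeps 8010 and 8046 as separate nonspecific groups so that they can later be imputed, and returns None for codes not covered by any listed range. The names `CATEGORY_RANGES`, `NONSPECIFIC`, and `classify` are hypothetical.

```python
# Inclusive (low, high) ICD-O-3 morphology ranges for each histologic group.
CATEGORY_RANGES = {
    "small cell": [(8041, 8045)],
    "squamous": [(8051, 8052), (8070, 8084), (8120, 8131)],
    "adenocarcinoma": [(8050, 8050), (8140, 8149), (8160, 8162), (8190, 8221),
                       (8250, 8263), (8270, 8280), (8290, 8337), (8350, 8390),
                       (8400, 8560), (8570, 8576), (8940, 8941)],
    "large cell": [(8011, 8015)],
    "other NSC": [(8020, 8022), (8030, 8040), (8090, 8110), (8150, 8156),
                  (8170, 8175), (8180, 8180), (8230, 8231), (8240, 8249),
                  (8340, 8347), (8561, 8562), (8580, 8671)],
    "other specified and unspecified": [
        (8680, 8713), (8800, 8912), (8990, 8991), (9040, 9044), (9120, 9136),
        (9150, 9252), (9370, 9373), (9540, 9582), (8720, 8790), (8930, 8936),
        (8950, 8983), (9000, 9030), (9060, 9110), (9260, 9365), (9380, 9539),
        (8000, 8005)],
}

# Nonspecific codes are kept apart because their specific subtype is imputed later.
NONSPECIFIC = {8010: "8010 (carcinoma, NOS)", 8046: "8046 (NSC carcinoma)"}


def classify(code):
    """Return the histologic group for a morphology code, or None if unlisted."""
    if code in NONSPECIFIC:
        return NONSPECIFIC[code]
    for category, ranges in CATEGORY_RANGES.items():
        if any(low <= code <= high for low, high in ranges):
            return category
    return None
```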
Data analysis
We treated the cases coded as 8010 or 8046 as missing data, which we handled by MI (22). This MI approach took each case with missing histology and imputed a specific histologic subtype for it. Cases coded as 8010 were imputed as one of the 5 carcinoma subtypes: small cell, squamous, adenocarcinoma, large cell, or other NSC. For cases coded as 8046, the imputation was limited to the NSC subtypes (i.e., small cell was excluded). This process was repeated independently 10 times to create 10 completed datasets, which accounts for imputation uncertainty. Age-adjusted incidence rates (using the 2000 U.S. standard population in 19 age groups) were estimated from each completed dataset in the same way as from the original dataset, producing 10 sets of estimates. We then combined these estimates to produce MI estimates.
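Two pieces of this per-dataset workflow are sketched below under stated assumptions. First, drawing a specific subtype for a nonspecific case, restricted to the 5 carcinoma subtypes for 8010 and to the 4 NSC subtypes for 8046; the predicted probabilities are supplied by the caller here, whereas in the article they come from the imputation model. Second, direct age standardization, in which each age-specific rate is weighted by that age group's share of a standard population. The weights in the usage test are made-up placeholders, not the 2000 U.S. standard population, and the function names are hypothetical.

```python
import random

NSC_SUBTYPES = ["squamous", "adenocarcinoma", "large cell", "other NSC"]
CARCINOMA_SUBTYPES = ["small cell"] + NSC_SUBTYPES


def draw_subtype(code, probs, rng):
    """Draw one specific subtype for a case coded 8010 or 8046.

    probs maps subtype -> predicted probability. Subtypes outside the allowed
    set are ignored; rng.choices renormalizes the remaining weights itself.
    """
    if code == 8010:
        allowed = CARCINOMA_SUBTYPES
    elif code == 8046:
        allowed = NSC_SUBTYPES  # 8046 already rules out small cell
    else:
        raise ValueError(f"not a nonspecific code: {code}")
    return rng.choices(allowed, weights=[probs[s] for s in allowed], k=1)[0]


def age_adjusted_rate(counts, populations, std_weights, per=100_000):
    """Directly standardized rate: sum over age groups of w_a * count_a / pop_a.

    std_weights are standard-population sizes (or shares); they are rescaled to sum to 1.
    """
    total_w = sum(std_weights)
    return per * sum((w / total_w) * (c / p)
                     for c, p, w in zip(counts, populations, std_weights))
```

Running both functions on each of the 10 completed datasets yields the 10 sets of rate estimates that are then combined.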
For a single incidence rate, the MI point estimate was the average of the 10 imputed-data estimates. The associated standard error was calculated by combining the average of the squared standard errors of the 10 estimates and the variance of the 10 rate estimates (22). Joinpoint linear regression models (23) were used to fit connected linear trends on a log scale with up to 4 joinpoints, using the Joinpoint regression program version 3.5.0 developed by the NCI. The annual percentage change (APC) with a corresponding 95% confidence interval (CI) was calculated to describe each joined trend.
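The combining step described above corresponds to Rubin's rules. A minimal sketch using only the Python standard library follows. Note that the standard form of Rubin's rules inflates the between-imputation variance by the factor (1 + 1/m), which equals 1.1 for the m = 10 imputations used here; the degrees-of-freedom formula is included for completeness. The function name and the toy numbers in the test are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, std_errors):
    """Combine m completed-data estimates with Rubin's rules.

    Returns (MI point estimate, total standard error, degrees of freedom).
    """
    m = len(estimates)
    q_bar = mean(estimates)                        # MI point estimate
    within = mean(se ** 2 for se in std_errors)    # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    if between == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return q_bar, math.sqrt(total), df
```

For example, two imputed rates of 10.0 and 12.0, each with standard error 1.0, combine to a point estimate of 11.0 and a standard error of 2.0.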
Imputation method
The nonspecific histologic diagnoses are highly likely to have nonrandom characteristics. For example, patients may not merit further histologic diagnostic procedures because their disease is too advanced to permit curative surgery (i.e., stage IIIB or greater) or because their medical status precludes surgery or other modalities with curative intent. When surgery is not a clinical option, obtaining adequate tissue to establish a histologic subtype may be impossible, and in this circumstance clinicians may elect to forgo further histologic classification. Therefore, we used information that is predictive of both histology and the missingness of specific histology to recover the missing specific histology. We assumed that the missingness is random conditional on this information, an assumption that has been shown to be reasonable in most practical situations (24, 25).
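To see why conditioning on predictors of missingness matters, consider a deliberately simplified, hypothetical example; all numbers are invented for illustration and are not SEER figures. There are 2 stage strata, nonspecific coding is much more common in late-stage cases, and within each stratum the missingness is unrelated to the true subtype (missing at random). Redistributing nonspecific cases by the pooled observed mix reproduces the biased complete-case share, whereas redistributing them within strata recovers the true share. Exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction as F

# Hypothetical strata: share of all cases, P(adenocarcinoma), P(nonspecific code).
STRATA = [
    {"name": "early stage", "p_stratum": F(2, 5), "p_adeno": F(1, 2), "p_missing": F(1, 20)},
    {"name": "late stage", "p_stratum": F(3, 5), "p_adeno": F(3, 10), "p_missing": F(3, 10)},
]


def true_share(strata):
    """Adenocarcinoma share if every case had a specific code."""
    return sum(s["p_stratum"] * s["p_adeno"] for s in strata)


def complete_case_share(strata):
    """Share among specifically coded cases; redistributing nonspecific cases
    by this pooled mix leaves the overall share unchanged."""
    num = sum(s["p_stratum"] * (1 - s["p_missing"]) * s["p_adeno"] for s in strata)
    den = sum(s["p_stratum"] * (1 - s["p_missing"]) for s in strata)
    return num / den


def stratified_redistribution_share(strata):
    """Allocate each stratum's nonspecific cases using that stratum's observed mix."""
    total = F(0)
    for s in strata:
        observed_total = s["p_stratum"] * (1 - s["p_missing"])
        # Within a stratum, missingness is unrelated to subtype (MAR).
        observed_adeno = observed_total * s["p_adeno"]
        stratum_mix = observed_adeno / observed_total
        missing_total = s["p_stratum"] * s["p_missing"]
        total += observed_adeno + missing_total * stratum_mix
    return total


print(float(true_share(STRATA)))                       # 0.38
print(float(complete_case_share(STRATA)))              # 0.395
print(float(stratified_redistribution_share(STRATA)))  # 0.38
```

In the article, the "strata" are replaced by a regression on many covariates, but the logic is the same: the imputation is only as good as its conditioning on the variables that drive missingness.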
Specifically, we selected the covariates to be included in the imputation model following the principle of reducing missing data bias in a statistical analysis (26). Sociodemographic covariates include age, gender, race, Hispanic origin, nativity, and marital status. Covariates describing tumor characteristics and treatment include tumor size (27), grade, stage, survival time, and receipt of cancer-directed surgery. Some histologic subtypes have been shown to respond better to certain therapies, which makes treatment information an important predictor. However, such information is available only for patients aged 65 years and older, through the linked SEER-Medicare database (28), and only for 1991 and later. Given the lack of analytic tools to handle how the availability of, and access to, particular regimens vary over time and with patients' age, we did not include more detailed treatment variables in the model. We also did not include lymph node involvement in the final model because it is highly collinear with stage. We included a nominal variable for the 9 SEER registries to reflect the variability among registries in the use of nonspecific morphology codes. Cancer diagnosis year was entered into the models as a nominal variable (instead of a continuous variable) to relax the temporal assumptions about the intervariable relationships. Smoking and socioeconomic deprivation are also strongly predictive of histology (29), but they are not routinely collected in SEER. As substitutes, we used county-level smoking prevalence estimates obtained from the Model-based Small Area Estimates Projects of NCI (http://sae.cancer.gov/; ref. 30) and poverty prevalence estimates from the 2000 U.S. Census Bureau (31).
Because simple regression-based imputation approaches cannot impute missing histology for cases that also have missing covariates, we developed an algorithm using the SRMI technique to deal with multivariate missing data with arbitrary missing patterns. Specifically, SRMI fits a conditional model for one variable at a time on the remaining variables, cycling sequentially for multiple rounds to achieve convergence. The form of each conditional model depends on the type of variable imputed. Our algorithm offers 2 new capabilities beyond what is available in existing SRMI-based imputation packages, such as IVEware (http://www.isr.umich.edu/src/smp/ive/) and MICE (http://cran.r-project.org/web/packages/mice/index.html). First, for imputing binary data (categorical variables with more than 2 levels can be expressed as a series of nested dummy variables), we used ridge-penalized logistic regressions (32, 33) to improve imputation precision in the presence of a binary outcome with a skewed distribution and highly correlated covariates (34). The standard approach for imputing missing binary data is usually based on a logistic regression model (21, 35). However, the adequacy of logistic models can depend heavily on how balanced the binary outcome is and on the absence of collinearity. When the outcome is highly unbalanced, the covariates are collinear, or both, the logistic regression coefficients may still be unbiased, but their precision can be very low, which can lead to poorly imputed data. The proposed approach improves the imputation by maximizing a penalized log likelihood to obtain coefficient estimates with minimum prediction error. Optimizing the penalty parameters is critical and usually requires intensive cross-validation studies (36). We follow the simplified approach proposed by Yu (34) and obtain the optimized parameters directly from the data by estimating the unrestricted log likelihood. The remaining steps are similar to those used with standard logistic models (21).
Second, we added a module to impute discrete right-censored survival data. In the data used for this study, more than 25% of survival times were censored because the patient was still alive at the end of the study or died from other causes. Because both survival and censoring are highly correlated with histology as well as with other covariates such as age, stage, tumor size, and grade, it is problematic to use relatively simple approaches, such as the indicator method, in which censoring is handled by including a censoring indicator (37, 38). The proposed method applies the MI principle to impute each censored time with a plausible future survival time. Specifically, to generate the imputed values, we first aggregate continuous survival time (in months) into several meaningful categories and sort them in increasing order of survival. We then define an imputing risk set for each censored case as the cases with observed survivals no shorter than the censoring time. Using data from this imputing risk set, we finally estimate the predictive conditional distribution of the survival categories, from which we randomly draw a value to be the imputed survival. Note that an imputed survival is always equal to or longer than the censoring time category. This is reasonable because a censored case could only die later within its own survival category or in a later category, never in an earlier one. This imputation process starts with censored cases in the first survival category and cycles through all categories to complete one imputation of the survival data. Because survival is now a discrete variable, we estimate its predictive conditional distribution using nested ridge-penalized logistic models similar to those outlined for categorical data. Furthermore, to deal with the inconsistency in stage definitions over time, we conducted the imputation separately for 1975 to 1982, 1983 to 1987, and 1988 to 2010, so that staging is comparable within each period.
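A simplified sketch of the risk-set step follows. It assumes survival has already been grouped into ordered categories 0, 1, …, K−1, and, unlike the article, it draws each imputed category from the empirical distribution of the imputing risk set (a hot-deck style draw) rather than from covariate-dependent nested ridge-penalized logistic models. Cases imputed for earlier categories join the risk sets of later categories, mirroring the category-by-category cycle described above. The function name is hypothetical.

```python
import random


def impute_censored_categories(categories, censored, n_categories, rng):
    """Impute survival categories for censored cases.

    categories[i]: survival category (0 = shortest) in which case i died or was censored.
    censored[i]:   True if case i was censored rather than observed to die.
    """
    values = list(categories)
    complete = [not c for c in censored]        # observed deaths need no imputation
    for cat in range(n_categories):             # earliest censoring category first
        # Imputing risk set: complete cases whose survival category is >= cat.
        risk_set = [values[i] for i in range(len(values))
                    if complete[i] and values[i] >= cat]
        for i, (c, is_cens) in enumerate(zip(categories, censored)):
            if is_cens and c == cat:
                # Every donor value is >= cat, so no case "dies" before it was censored.
                values[i] = rng.choice(risk_set) if risk_set else cat
                complete[i] = True
    return values
```

Each call produces one completed set of survival categories; repeating it with independent random draws gives the multiple imputations.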
Simulation study
To explore information recovery from the MI in estimating the distribution of histology, we generated a simulated dataset from the analysis data with only complete observations included (n = 10,659). We considered a situation similar to the main analysis, in which histology is missing at random and the probability of the induced missingness is determined by a logistic regression model with coefficients estimated from the analysis data. The rate of induced missing data was 8.4% (the observed missing rate was 10.0% for the portion of data with all covariates observed). Twenty imputed datasets were generated using the proposed approach and the standard logistic regression method, respectively.
The ridge-penalized logistic regression model outperformed the standard logistic regression model in recovering the missing information based on the Akaike information criterion (AIC; ridge-penalized method: AIC = 31,008; standard method: AIC = 31,065). The imputed distributions of histology obtained using the proposed method were similar to the complete data distribution (with absolute differences of less than 2% in estimating the percentage of cases in each histology and gender group). We also calculated the overlap probability (39) to evaluate how much the 95% CIs estimated from the imputed and complete data overlap. Suppose (L_imp, U_imp) and (L_com, U_com) are the 95% CIs for estimating P, the percentage of adenocarcinoma among men, using the imputed and complete data, respectively. The overlap probability of the CIs for P is

$$I = \frac{1}{2}\left[\int_{L_{imp}}^{U_{imp}} f_{com}(t)\,dt + \int_{L_{com}}^{U_{com}} f_{imp}(t)\,dt\right],$$

where f_imp and f_com are the distributions of P computed under the imputed and complete data, respectively. Note that f_com could take a different distributional form depending on the type of statistic one wishes to estimate, but f_imp is always t-distributed according to Rubin's rules (22). I takes the value 0.95 if the 2 CIs overlap perfectly and 0 if they do not overlap at all. A large value of I suggests that the imputed data closely preserve the analytical properties of the complete data. This measure provides more information than a simple comparison of 2 point estimates because it also takes the standard errors into account: estimates with large standard errors might still have a high CI overlap even if their point estimates differ considerably, because the CI widens with the standard error of the estimate. In this simulation study, most overlap probabilities (for estimating the distributions of cases by histology and gender) were more than 0.8, which suggests very strong agreement, with a few exceptions in which the probabilities were around 0.75, which still suggests strong agreement. These evaluation results provided
Table 1. The numbers and percentages of lung cancer cases by histologic type and histologic confirmation status, SEER 9ᵃ, 1975 to 2010
Values are column percentages; the last 3 columns give histologic confirmation status.

Histologic type                              Overall     Confirmed Not confirmed       Unknown
                                         n = 522,416   n = 470,326    n = 38,657    n = 13,433
                                            (100.0%)       (90.0%)        (7.4%)        (2.6%)
Small cell carcinoma                            14.4          15.7           1.7           3.4
NSC carcinoma                                   68.5          75.2           7.0           8.9
  Squamous                                      22.6          24.8           1.9           2.3
  Adenocarcinoma                                32.1          35.3           3.1           4.0
  Large-cell                                     5.6           6.2           0.3           0.6
  Other specified NSC                            3.1           3.4           0.2           0.3
  8046 (NSC carcinoma)                           5.1           5.5           1.5           1.7
8010 (carcinoma, NOS)                           12.5           7.6          61.5          43.0
Other specified and unspecified types            4.6           1.4          29.8          44.7

Abbreviation: NOS, not otherwise specified.
ᵃThe SEER 9 registries include Atlanta, Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco–Oakland, Seattle–Puget Sound, and Utah.
Table 2. Distribution of histologically confirmed lung cancer cases by histology and selected covariates, SEER 9ᵃ, 1975 to 2010
Columns: Overall; SC, small cell; Sq, squamous; Ad, adenocarcinoma; LC, large cell; ONSC, other specified NSC; 8010, carcinoma, NOS; 8046, NSC carcinoma. The n row gives the number of cases (100.0% in each column); all other values are column percentages.

                           Overall      SC      Sq      Ad      LC    ONSC    8010    8046
n (100.0%)                 463,609  73,994 116,775 166,006  29,123  15,914  35,954  25,843
Age
  <50 y                        6.5     5.4     4.1     7.8     8.5    14.6     6.3     5.7
  50–<60 y                    17.9    19.5    15.2    19.1    20.6    19.7    16.4    16.6
  60–<70 y                    32.4    35.3    33.5    31.5    33.0    30.5    30.4    26.8
  70–<80 y                    31.2    30.3    34.9    29.6    28.4    25.9    32.3    32.7
  ≥80 y                       12.0     9.4    12.3    11.9     9.5     9.4    14.6    18.2
Sex
  Male                        59.8    56.1    71.4    53.4    63.0    53.2    62.1    55.8
Race
  White                       84.0    88.2    83.3    83.1    84.6    86.1    79.5    83.0
  Black                       10.4     7.8    12.1     9.9    11.2     9.5    12.6    11.2
  Other                        5.5     4.0     4.5     6.9     4.1     4.1     7.7     5.8
  Missing                      0.1     0.1     0.1     0.1     0.1     0.3     0.2     0.1
Ethnicity
  Non-Hispanic                 2.7     2.4     2.4     3.0     2.6     3.3     2.6     3.8
Marital status
  Single                       9.0     8.2     8.8     9.0     8.2    10.7     3.4     3.7
  Married                     58.2    57.2    59.0    59.0    60.4    59.2     9.2    12.2
  Sep/Div/Wid                 29.7    31.6    29.1    29.0    28.4    27.1    56.1    51.1
  Missing                      3.1     3.0     3.1     3.1     3.0     3.0    31.3    33.0
Nativity
  Native-born                 81.0    86.1    83.1    77.9    84.9    72.8    83.7    74.2
  Foreign-born                 8.2     7.0     7.9     9.0     8.1     7.2     9.0     8.0
  Missing                     10.8     7.0     9.1    13.1     7.0    20.1     7.3    17.8
Data source
  Non-hospital                 1.8     1.6     1.5     1.8     1.1     1.9     2.8     2.9
Grade
  Grade 1                      4.0     0.1     4.1     7.8     0.2     4.9     0.2     0.2
  Grade 2                     13.1     0.7    24.8    18.2     0.5     2.6     0.6     1.8
  Grade 3                     27.4     5.9    36.0    30.6    21.8     8.0    39.5    30.5
  Grade 4                     13.2    44.7     2.2     1.8    48.3    41.9     2.8     2.4
  Missing                     42.3    48.7    32.9    41.7    29.3    42.7    57.0    65.0
Tumor size
  <2 cm                        8.3     4.6     5.9    12.2     5.3    17.1     4.2     8.1
  2–<3 cm                     10.9     6.3     9.1    15.0     8.8    12.4     7.4    11.6
  3–<4 cm                     10.2     6.4    10.2    12.3     9.6     8.4     8.4    11.9
  4–<5 cm                      8.1     5.7     9.1     8.4     8.3     6.0     7.1    10.4
  ≥5 cm                       19.6    17.9    24.4    15.7    23.0    14.7    18.8    28.2
  Missing                     43.0    59.0    41.3    36.4    45.0    41.4    54.1    29.8
Stage
  Localized                   19.7     7.9    24.7    23.8    16.5    33.0    11.4    11.9
  Regional                    27.8    24.7    36.2    25.6    30.0    21.7    21.4    23.1
  Distant                     46.5    61.6    31.7    46.1    47.1    40.3    56.1    62.0
  Missing                      6.0     5.9     7.4     4.4     6.4     5.0    11.1     3.1
Surgery
  Performed                   27.4     5.9    31.9    38.6    25.6    43.6    11.7    10.3
  Not performed               69.1    89.2    64.3    58.8    69.0    52.5    83.4    89.4
  Missing                      3.5     4.9     3.9     2.6     5.4     3.8     4.9     0.3
Survival
  <1 y                        43.8    53.7    39.9    38.7    52.1    37.0    54.6    45.5
  1–<2 y                      11.2    15.5    11.4     9.9    10.7     7.3    10.1    10.3
  2–<3 y                       3.7     3.1     4.1     4.0     3.4     2.3     3.1     3.2
  ≥3 y                        16.8     6.7    18.1    22.1    14.4    30.8     9.0     9.7
  Censored                    24.7    21.0    26.5    25.4    19.4    22.7    23.2    31.3
SEER 9 registry
  SMS                         15.3    13.2    13.5    16.6    17.9    15.0    16.0    17.6
  Connecticut                 16.2    15.9    15.4    17.3    15.4    15.7    16.9    14.5
  Detroit                     20.8    21.3    23.0    19.8    20.9    21.2    20.8    15.2
  Hawaii                       4.0     3.4     3.6     4.6     2.5     3.6     4.4     4.0
  Iowa                        13.6    15.5    15.6    12.7    10.5    13.6    11.7    11.6
  New Mexico                   4.5     4.9     4.5     4.2     4.2     3.9     4.3     5.6
  Seattle                     15.0    15.3    13.8    15.1    11.5    15.1    16.9    19.9
  Utah                         2.8     2.8     2.9     2.7     2.8     4.6     2.4     2.6
  Atlanta                      7.9     7.7     7.8     6.9    14.4     7.3     6.7     9.1
% Below poverty
  0–<5                         1.7     1.8     1.7     1.9     1.2     1.6     1.8     1.6
  5–<10                       54.9    55.6    52.7    56.3    52.2    56.2    53.8    57.5
  10–<20                      41.7    40.7    43.9    40.4    44.7    40.8    42.7    38.6
  ≥20                          1.7     1.9     1.7     1.4     1.9     1.4     1.7     2.2
% Current smoker (mean)       21.4    21.7    21.9    21.1    20.9    21.5    21.4    21.4

NOTE: All two-way associations are significant at the 0.001 level.
Abbreviations: Atlanta, Atlanta metropolitan; Seattle, Seattle–Puget Sound; SMS, San Francisco–Oakland.
ᵃThe SEER 9 registries include Atlanta, Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco–Oakland, Seattle–Puget Sound, and Utah.
strong evidence for model adequacy in the proposed method.
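The overlap probability I defined in the simulation study can be computed directly once f_imp and f_com are specified. The sketch below assumes both are normal, centered at the point estimates with the reported standard errors. This replaces the t distribution that Rubin's rules give for f_imp, which is a close approximation when the degrees of freedom are large. The function name and the test values are hypothetical.

```python
from statistics import NormalDist


def ci_overlap(est_imp, se_imp, est_com, se_com, level=0.95):
    """Overlap probability I for two confidence intervals under normal approximations."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    f_imp = NormalDist(est_imp, se_imp)
    f_com = NormalDist(est_com, se_com)
    lo_imp, hi_imp = est_imp - z * se_imp, est_imp + z * se_imp
    lo_com, hi_com = est_com - z * se_com, est_com + z * se_com
    # Mass of f_com inside the imputed-data CI plus mass of f_imp inside the
    # complete-data CI, averaged.
    return 0.5 * ((f_com.cdf(hi_imp) - f_com.cdf(lo_imp))
                  + (f_imp.cdf(hi_com) - f_imp.cdf(lo_com)))
```

Identical intervals give I = 0.95 (the nominal level), and intervals that are far apart give I close to 0, matching the limits stated in the text.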
Results
Table 1 shows the distribution of histologic categories by histologic confirmation status. Ninety percent of cases are histologically confirmed. Among the cases that are not confirmed and the cases whose confirmation status is unknown, 8010 accounts for about 50% of the total, whereas 8046 accounts for less than 2%. One possible explanation for the differential use of 8010 and 8046 is that 8046 is mainly used when a histologic diagnosis exists but is not fully specific, whereas 8010 is also used when no histologic diagnosis is available.
Table 2 shows the distributions of lung cancer cases by histology and selected covariates. All covariates are closely associated with histology. Men and older patients were more likely to be diagnosed with squamous cell carcinoma. Squamous and adenocarcinoma tumors tended to be better differentiated than large cell and other specified NSC tumors. Squamous and large cell tumors tended to be larger at diagnosis. Small cell tumors were more likely than other types to be detected at a late (distant) stage (61.6%). In contrast, squamous and adenocarcinoma tumors tended to be detected at an early stage. There are also a few notable differences in the use of nonspecific codes across registries. For example, a lower use of 8046 is observed in Detroit (15.2% of 8046 cases compared with 20.8% of all cases), and a higher use of both 8010 (16.9% compared with the overall percentage of 15.0%) and 8046 (19.9%) is observed in Seattle. The use of nonspecific codes is also slightly higher for cases not reported by a hospital (2.8% of 8010 and 2.9% of 8046 cases compared with 1.8% overall). These variables are therefore also predictive of the use of nonspecific morphology codes. As we expected, tumors without a specific histologic diagnosis tended to be less well differentiated, were diagnosed at a later stage, had shorter survival, and were less likely to be candidates for surgery.
Figure 1 shows the percentages of cases coded as 8046 and 8010 by year of diagnosis for men and women separately. The temporal distributions are similar for both genders. The percentage of cases coded as 8010 increased from 1982 until the introduction of 8046 into ICD-O-3 in 2001, when it dropped to around 3%. There seems to be a smooth compensation between 8010 and 8046 in 2001, which suggests that the 2 codes are probably used interchangeably in practice.
Figure 2 shows the incidence rates by imputed histology among cases coded as 8010 or 8046. Overall, the amount of imputed histology differs by histologic subtype and year. For both 8010 and 8046, the incidence added by imputation was greatest for adenocarcinoma and squamous cell carcinoma. For both of these subtypes, the imputed rates followed an n-shaped pattern over the most recent 15 years. Small cell received the third-largest share, although it came only from imputing 8010 cases (8046 cases were not imputed as small cell), and the amount was relatively stable over time.
Figure 3 compares the temporal trends in age-adjusted lung cancer incidence rates by histology before and after imputation, for men and women separately (see Table 3 for detailed results of the joinpoint trend analysis). The number listed over (imputed) or under (original) each segment is the APC for that portion of the trend, and an asterisk indicates a trend that is statistically significant at the 0.05 level. To help examine how cases are distributed by the imputation procedure, the plots also include the rates for 8010 (alongside small cell) and for 8010 and 8046 combined (alongside the NSC subtypes).
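Each APC reported for a joinpoint segment follows from the slope of a log-linear fit: APC = 100 × (e^b − 1), where b is the slope of log(rate) on calendar year within the segment. The sketch below fits a single segment by ordinary least squares. It is a simplification: the Joinpoint program also locates the joinpoints, can weight observations by their standard errors, and reports CIs, none of which is reproduced here. The function name is hypothetical.

```python
import math


def annual_percent_change(years, rates):
    """APC for one log-linear segment: 100 * (exp(slope) - 1)."""
    logs = [math.log(r) for r in rates]
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    # Ordinary least-squares slope of log(rate) on calendar year.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, logs)) / sxx
    return 100 * (math.exp(slope) - 1)
```

A series that rises by exactly 5% per year has APC 5.0; a series that falls by 3% per year has APC −3.0.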
The imputation adjustment affected the incidencetrends differently for each histologic subtype. For smallcell in both genders, the original and imputed trends aresimilar. For squamous cell cancer in both genders andadenocarcinoma in men, the trends showed a similarpattern overall from 1970 to early 1990s before and afterimputation. From early 1990s to 2005, the decreasingtrends also remained unchanged after imputation, butthe pace of decline slowed. After 2005, the increasingtrends based on the original data had been replaced by
50
1015
2025
30
Year of diagnosis
1975
1980
1985
1990
1995
2000
2005
2010
80108046
Men
Per
cent
age
of c
ases
(%
)
50
1015
2025
30
Year of diagnosis
1975
1980
1985
1990
1995
2000
2005
2010
80108046
Women
Figure 1. Percentages ofhistologically confirmed lungcancer cases coded as 8010 and8046, SEER 9, 1975 to 2010.
Yu et al.
Cancer Epidemiol Biomarkers Prev; 23(8) August 2014 Cancer Epidemiology, Biomarkers & Prevention1552
on March 15, 2020. © 2014 American Association for Cancer Research. cebp.aacrjournals.org Downloaded from
Published OnlineFirst May 22, 2014; DOI: 10.1158/1055-9965.EPI-14-0130
the steady continuation of earlier decreasing trends for squamous cell carcinoma and adenocarcinoma in men, and by a constant trend for squamous cell carcinoma in women, after imputation. For adenocarcinoma in women, the trends before and after imputation exhibited similar overall patterns before the early 1990s. From 1992 to 2007, the plateau followed by an increasing trend starting in 2004 was replaced, after imputation, by a continuously increasing trend. It is also worth noting that the imputed rates showed a nonsignificant decreasing tendency during the most recent 3 years, starting in 2007. For large cell carcinoma and cancers of other specified NSC types, the imputed rates were similar to the original rates, and the imputation did not change the overall trends.
To rule out the possibility that the changes in trends resulted from excluding cases that were not histologically confirmed or had missing confirmation status, we conducted a sensitivity analysis on all cases. The imputation affected the trends similarly (see Supplementary Fig. S1 and Table S1 for detailed results on the rates and joinpoint analysis), which suggests that excluding these cases does not affect the overall findings and conclusions.
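The confidence intervals for imputed rates and trends must reflect variation both within and between the imputed data sets. This section does not spell out the combining step; assuming the standard rules of Rubin (22), the sketch below pools M completed-data estimates of a quantity (for example, a rate or an APC) and their squared standard errors. The function name `pool_rubin` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Combine M completed-data results with Rubin's rules.

    estimates: point estimate from each completed data set
    variances: squared standard error from each completed data set
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)             # pooled point estimate
    w = mean(variances)                 # within-imputation variance
    b = variance(estimates)             # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b         # total variance
    if b == 0:
        df = math.inf                   # imputations agree: no added uncertainty
    elif w == 0:
        df = float(m - 1)
    else:
        r = (1 + 1 / m) * b / w         # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df
```

A 95% interval would then be the pooled estimate plus or minus the 0.975 quantile of the t distribution with `df` degrees of freedom times the square root of the total variance.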
Discussion
In cancer surveillance data collections, it is common for morphological classification systems to change to reflect contemporary pathology practice. Hence, the data often comprise cancer cases coded one way at one time and others coded a different way at another time. When classification systems differ in how they code histology, temporal inferences by histologic subtype can be misleading and difficult to interpret. Without access to calibration data to inform the underlying distribution of histology among cases coded without specificity, or the association in histology between editions of the classification systems, we carefully developed an MI approach, based on the MAR assumption, to correct for biases in statistical inferences about temporal trends in lung cancer incidence.
Although this assumption is not empirically testable, we argue that MAR is reasonable in our setting because we identified and included in the imputation models an extensive set of auxiliary variables that can explain the missingness of specific histology (for example, receipt of cancer-directed surgery) and that are correlates of histology (for example, the stage, grade, and size of a tumor, as well as patient survival). Other important
Figure 2. Imputed incidence rates by histologic subtype and gender, histologically confirmed cases that were originally coded as 8010 or 8046, SEER 9, 1975 to 2010.
Adjust for Bias in Lung Cancer Incidence Trends by Histology
www.aacrjournals.org Cancer Epidemiol Biomarkers Prev; 23(8) August 2014 1553
variables that could enhance the plausibility of the MAR assumption are patients' smoking status and socioeconomic status (40, 41). Because these are not routinely collected in SEER, we substituted county-level estimates for 2000 (pooled estimates from 2000 to 2003 for smoking) from the decennial census. Although such estimates are not available for every diagnosis year, we believe that a county's ranking in smoking prevalence or poverty level relative to the rest of the country remains relatively unchanged over time. The potential confounding between smoking status and poverty (40) is not likely to be a concern in our analysis because both are aggregate measures and neither is a strong predictor of histology after conditioning on other patient-level information.
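Attaching such area-level proxies to individual cases amounts to a join on county. The sketch below is one possible encoding consistent with the stable-ranking argument above: percentile ranks computed once from the 2000 estimates and reused for every diagnosis year. The source does not say whether ranks or raw estimates entered the imputation models, and the field names (`county_fips`, `smoking_rank`, `poverty_rank`) are hypothetical.

```python
from typing import Dict, List

def percentile_ranks(values: Dict[str, float]) -> Dict[str, float]:
    """Percentile rank (0 to 100) of each county; ties are broken by sort order."""
    ordered = sorted(values, key=lambda county: values[county])
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 50.0}
    return {county: 100.0 * i / (n - 1) for i, county in enumerate(ordered)}

def attach_county_proxies(cases: List[dict], smoking_2000: Dict[str, float],
                          poverty_2000: Dict[str, float]) -> List[dict]:
    """Return copies of the cases with the 2000 county-level proxies added,
    regardless of each case's year of diagnosis."""
    smoke_rank = percentile_ranks(smoking_2000)
    poverty_rank = percentile_ranks(poverty_2000)
    enriched = []
    for case in cases:
        fips = case["county_fips"]
        enriched.append({**case,
                         "smoking_rank": smoke_rank.get(fips),     # None if county unknown
                         "poverty_rank": poverty_rank.get(fips)})
    return enriched
```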
Ensuring the plausibility of the MAR assumption imposed 2 modeling challenges: handling a large number of variables with missing data, and handling a general missing-data pattern, which often cannot be adequately addressed by simple imputation methods (21). The proposed MI approach based on SRMI is particularly suited to this complex situation because of its flexibility in specifying and fitting conditional distributions. The search for refined ridge-penalized logistic regression imputation models was necessary because the standard SRMI approach (based on logistic regressions) might be inadequate for handling a categorical outcome with a skewed distribution (e.g., certain histology categories contain only 3%–6% of cases) and correlated covariates (e.g., stage and survival). The simulation study demonstrated the adequacy and predictive benefits of the proposed semiparametric models.
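The idea of imputing a skewed categorical histology variable with a ridge-penalized logistic model can be sketched as follows. This is a simplified illustration, not the authors' semiparametric SRMI models: it fits an L2-penalized multinomial logistic regression by gradient descent, imputes only one variable given fully observed covariates (a single step of a sequential regression cycle), and approximates parameter uncertainty by refitting on a bootstrap resample for each of the M imputations. The optional `allowed` argument is an assumption used to restrict draws, for example so that a case coded 8046 cannot be imputed as small cell. All names and tuning values (`lam`, `lr`, `iters`) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Set

def softmax(zs: Sequence[float]) -> List[float]:
    top = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W: List[List[float]], x: Sequence[float]) -> List[float]:
    row = [1.0] + list(x)  # prepend the intercept term
    return softmax([sum(w * v for w, v in zip(wk, row)) for wk in W])

def fit_ridge_multinomial(X, y, n_classes, lam=1.0, lr=0.1, iters=500):
    """Minimize mean negative log-likelihood + lam/(2n) * (sum of squared slopes)
    by batch gradient descent; intercepts are not penalized."""
    n, p = len(X), len(X[0]) + 1
    W = [[0.0] * p for _ in range(n_classes)]
    for _ in range(iters):
        grad = [[0.0] * p for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)
            probs = predict_proba(W, xi)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j in range(p):
                    grad[k][j] += err * row[j] / n
        for k in range(n_classes):
            for j in range(p):
                ridge = lam * W[k][j] / n if j > 0 else 0.0
                W[k][j] -= lr * (grad[k][j] + ridge)
    return W

def impute_histology(X_obs, y_obs, X_mis, n_classes, m=5, lam=1.0,
                     allowed: Optional[List[Set[int]]] = None, seed=0):
    """Return m lists of imputed class labels, one label per row of X_mis."""
    rng = random.Random(seed)
    n = len(X_obs)
    imputations = []
    for _ in range(m):
        # Bootstrap the observed cases so each imputation reflects parameter uncertainty.
        idx = [rng.randrange(n) for _ in range(n)]
        W = fit_ridge_multinomial([X_obs[i] for i in idx], [y_obs[i] for i in idx],
                                  n_classes, lam=lam)
        draws = []
        for r, x in enumerate(X_mis):
            probs = predict_proba(W, x)
            if allowed is not None:  # zero out categories ruled out by the original code
                probs = [pk if k in allowed[r] else 0.0 for k, pk in enumerate(probs)]
            draws.append(rng.choices(range(n_classes), weights=probs)[0])
        imputations.append(draws)
    return imputations
```

Penalizing the slopes shrinks coefficients for rare categories and correlated covariates toward zero, which is the motivation given above for moving beyond unpenalized logistic regressions.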
The number of lung cancer cases lacking a specific histologic subtype was predominantly associated with the year of diagnosis, reflecting the evolution of SEER coding algorithms and recent changes in diagnostic practice. The imputation raised the incidence rates across the entire study period for both genders and all histologic subgroups, although the magnitudes of the elevations varied. Of the various histologic subtypes, the most affected were squamous and adenocarcinoma, on which the
Figure 3. Observed and imputed incidence rates by histologic subtype and gender, histologically confirmed malignant cancer cases, SEER 9, 1975 to 2010.
Table 3. Joinpoint analysis for histologically confirmed malignant lung cancers by imputation status, gender, and histology, SEER 9ᵃ, 1975 to 2010
Each row lists the joinpoint segments in order (Trend 1 to Trend 5) as years of diagnosis, APC (95% CI).
Men
Small cell (original): 1975–1981, 5.6 (3.8 to 7.4); 1981–1988, 0.0 (−1.4 to 1.5); 1988–2010, −3.2 (−3.4 to −3.0)
Small cell (imputed): 1975–1978, 7.0 (0.3 to 14.2); 1978–1986, 1.7 (0.2 to 3.2); 1986–1996, −2.1 (−3.0 to −1.1); 1996–2010, −4.0 (−4.5 to −3.5)
Squamous (original): 1975–1982, 1.7 (0.6 to 2.7); 1982–1990, −2.1 (−3.1 to −1.2); 1990–2005, −4.0 (−4.3 to −3.6); 2005–2010, 0.9 (−1.0 to 2.8)
Squamous (imputed): 1975–1982, 0.6 (−0.1 to 1.3); 1982–1992, −1.8 (−2.3 to −1.4); 1992–1996, −4.8 (−7.2 to −2.2); 1996–1999, 0.1 (−5.2 to 5.6); 1999–2010, −2.4 (−2.8 to −2.0)
Adenocarcinoma (original): 1975–1978, 10.8 (4.3 to 17.6); 1978–1992, 2.0 (1.5 to 2.5); 1992–2005, −1.8 (−2.3 to −1.3); 2005–2010, 2.5 (0.7 to 4.3)
Adenocarcinoma (imputed): 1975–1978, 8.0 (2.6 to 13.7); 1978–1992, 2.1 (1.7 to 2.6); 1992–2010, −0.2 (−0.4 to 0.0)
Large cell (original): 1975–1980, 17.5 (12.1 to 23.2); 1980–1988, 2.3 (0.3 to 4.4); 1988–1999, −5.9 (−7.0 to −4.8); 1999–2010, −11.4 (−12.9 to −10.0)
Large cell (imputed): 1975–1979, 20.1 (12.4 to 28.3); 1979–1988, 3.3 (1.7 to 4.9); 1987–2004, −5.5 (−6.1 to −4.9); 2004–2010, −14.5 (−18.0 to −10.9)
Other specific NSC (original): 1975–1977, −17.9 (−28.7 to −5.3); 1977–1990, −6.4 (−7.5 to −5.3); 1990–2010, −0.7 (−1.3 to −0.1)
Other specific NSC (imputed): 1975–1977, −20.0 (−31.1 to −7.2); 1977–1990, −6.5 (−7.6 to −5.5); 1990–2007, 1.2 (0.4 to 2.0); 2007–2010, −8.5 (−17.7 to 1.6)
Women
Small cell (original): 1975–1982, 9.5 (7.4 to 11.6); 1982–1991, 3.0 (1.9 to 4.2); 1991–2010, −1.7 (−2.0 to −1.5)
Small cell (imputed): 1975–1987, 6.3 (5.3 to 7.4); 1987–1997, 0.4 (−0.8 to 1.6); 1997–2010, −3.0 (−3.6 to −2.3)
Squamous (original): 1975–1984, 5.8 (4.8 to 6.8); 1984–1995, 1.0 (0.3 to 1.6); 1995–2004, −2.2 (−3.1 to −1.4); 2004–2010, 2.1 (0.8 to 3.5)
Squamous (imputed): 1975–1988, 4.3 (3.7 to 4.9); 1988–2010, 0.1 (−0.1 to 0.3)
Adenocarcinoma (original): 1975–1981, 7.0 (5.1 to 9.0); 1981–1992, 3.8 (3.2 to 4.5); 1992–2004, 0.1 (−0.3 to 0.5); 2004–2010, 2.8 (1.8 to 3.8)
Adenocarcinoma (imputed): 1975–1990, 4.7 (4.3 to 5.0); 1990–2007, 1.9 (1.6 to 2.1); 2007–2010, −1.2 (−3.6 to 1.2)
(Continued on the following page)
most pronounced impacts occurred during the last decade. This result further supports our hypothesis that codes 8010 and 8046 are mainly used to group cases that could have been coded as either adenocarcinoma or squamous type had more coding information been extracted and available to support detailed histologic coding. For both subtypes, the decreasing trends from the early or mid-1990s to 2005 persisted, although at a slower pace. The increasing trends after 2005 are apparently an artifact of this coding change and of imprecision in histopathologic classification; after imputation, they became a continuation of the earlier decreasing trends (or, for squamous cell carcinoma in women, a constant trend). The sensitivity analyses that included cases that were not histologically confirmed or had missing histologic confirmation information showed similar results.
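How the reallocated 8010 and 8046 cases enter the subtype-specific rates can be made concrete. Assuming that each of the M completed data sets assigns every nonspecific case a subtype, the MI point estimate of a count is its average over the imputations, so each subtype receives its specifically coded cases plus its average share of the nonspecific ones. Because that share is never negative, the adjustment can only raise a subtype's rate, consistent with the elevations described above. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Key = Tuple[int, str]  # (year of diagnosis, histologic subtype)

def pooled_counts(specific: Sequence[Key],
                  imputed_sets: Sequence[Sequence[Key]]) -> Dict[Key, float]:
    """Average case count per (year, subtype) across M completed data sets.

    specific: cases originally coded with a specific histology
    imputed_sets: M lists giving the imputed subtype of each nonspecific case
    """
    base = Counter(specific)
    m = len(imputed_sets)
    reallocated = Counter()
    for imputation in imputed_sets:
        reallocated.update(imputation)  # tally imputed cases over all M sets
    keys = set(base) | set(reallocated)
    return {key: base[key] + reallocated[key] / m for key in keys}
```

Dividing each pooled count by the corresponding population (and age-adjusting) would then give the imputed incidence rates.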
We classified lung cancers according to a schema based on Travis and colleagues (42) and earlier editions of the ICD-O. Different histologic classification systems have been used in practice; for example, the International Agency for Research on Cancer of the WHO published a revised histologic grouping for lung cancers in 2007 (43). The differences between this newer classification and the one used in this research are summarized in Supplementary Table S2. Because the groupings of the most frequently used morphologic codes are consistent between the 2 schemas, we expect that the effect of using this alternative schema on inferences about incidence trends is unlikely to be noticeable for the histologic subtypes that we investigated in this research.
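In code, a classification schema of this kind is a lookup from morphology code to histologic group, with nonspecific codes flagged for imputation. The sketch is illustrative only: it maps a single well-known ICD-O-3 code per group (8070 squamous cell carcinoma, NOS; 8140 adenocarcinoma, NOS; 8041 small cell carcinoma, NOS; 8012 large cell carcinoma, NOS), not the full schema of ref. 42 or Supplementary Table S2. It also assumes that a case coded 8046 (non-small cell carcinoma) may be imputed only to NSC subtypes, whereas 8010 (carcinoma, NOS) may be imputed to any subtype.

```python
from typing import Optional, Tuple

# Illustrative subset only; the study's schema maps many more codes per group.
SPECIFIC_CODES = {
    8070: "squamous",
    8140: "adenocarcinoma",
    8041: "small cell",
    8012: "large cell",
}
NONSPECIFIC_CODES = {8010, 8046}  # carcinoma, NOS and non-small cell carcinoma
NSC_GROUPS = ("squamous", "adenocarcinoma", "large cell", "other specific NSC")
ALL_GROUPS = ("small cell",) + NSC_GROUPS

def classify(code: int) -> Optional[str]:
    """Histologic group for a specific code, or None if the code must be imputed."""
    if code in NONSPECIFIC_CODES:
        return None
    return SPECIFIC_CODES.get(code, "unmapped")  # codes outside this toy subset

def candidate_groups(code: int) -> Tuple[str, ...]:
    """Groups an imputation may draw from, given the original morphology code."""
    if code == 8046:
        return NSC_GROUPS   # the code itself already rules out small cell
    if code == 8010:
        return ALL_GROUPS
    return (classify(code),)
```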
In summary, molecular, genetic, and etiologic features are increasingly associated with histologic distinctions (3, 4, 44). Progress in linking molecular features to morphology will facilitate mechanistic understanding and further characterization of the molecular and genetic features specific to histologic subtypes in lung cancer. These considerations, along with the emergence of targeted therapies within specific histologic subtypes, especially adenocarcinoma, clearly indicate that accurate population tracking of trends by lung cancer histology will become increasingly important, and that the MI technique applied in this study can help refine these trends. Planned future collection of bridge data will further enhance the quality of data augmented by MI.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: M. Yu, E.J. Feuer, K.A. Cronin, N.E. Caporaso
Development of methodology: M. Yu, E.J. Feuer, K.A. Cronin
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M. Yu
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M. Yu, E.J. Feuer, K.A. Cronin, N.E. Caporaso
Writing, review, and/or revision of the manuscript: M. Yu, E.J. Feuer, K.A. Cronin, N.E. Caporaso
Table 3. Joinpoint analysis for histologically confirmed malignant lung cancers by imputation status, gender, and histology, SEER 9ᵃ, 1975 to 2010 (Cont'd)
Each row lists the joinpoint segments in order (Trend 1 to Trend 5) as years of diagnosis, APC (95% CI).
Women (continued)
Large cell (original): 1975–1978, 40.4 (24.1 to 59.0); 1978–1988, 6.6 (5.2 to 7.9); 1988–1997, −3.0 (−4.3 to −1.7); 1997–2010, −9.9 (−10.7 to −9.1)
Large cell (imputed): 1975–1978, 37.3 (20.8 to 56.2); 1978–1988, 6.6 (5.2 to 8.0); 1988–1995, −2.1 (−4.1 to 0.1); 1997–2004, −5.2 (−6.6 to −3.7); 2004–2010, −12.4 (−15.3 to −9.4)
Other specific NSC (original): 1975–1985, −3.5 (−5.3 to −1.6); 1985–2010, 1.5 (1.1 to 1.9)
Other specific NSC (imputed): 1975–1985, −3.9 (−5.7 to −2.1); 1985–2010, 2.3 (1.8 to 2.7)
ᵃThe SEER 9 registries include Atlanta, Connecticut, Detroit, Hawaii, Iowa, New Mexico, San Francisco–Oakland, Seattle–Puget Sound, and Utah.
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M. Yu
Study supervision: M. Yu, K.A. Cronin
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received February 5, 2014; revised April 22, 2014; accepted May 5, 2014; published OnlineFirst May 22, 2014.
References
1. Howlader N, Noone AM, Krapcho M, Garshell J, Neyman N, Altekruse SF, et al., editors. SEER Cancer Statistics Review, 1975–2010, National Cancer Institute. Bethesda, MD. Available from: http://seer.cancer.gov/csr/1975_2010/. Based on November 2012 SEER data submission, posted to the SEER website, April 2013.
2. Lamb D. Histological classification of lung cancer. Thorax 1984;39:161–5.
3. Landi MT, Chatterjee N, Yu K, Goldin LR, Goldstein AM, Rotunno M, et al. A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. Am J Hum Genet 2009;85:679–91.
4. Shi J, Chatterjee N, Rotunno M, Wang Y, Pesatori AC, Consonni D, et al. Inherited variation at chromosome 12p13.33, including RAD52, influences the risk of squamous cell lung carcinoma. Cancer Discov 2012;2:131–9.
5. Lynch TJ, Bell DW, Sordella R, Gurubhagavatula S, Okimoto RA, Brannigan BW, et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N Engl J Med 2004;350:2129–39.
6. Paez JG, Jänne PA, Lee JC, Tracy S, Greulich H, Gabriel S, et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science 2004;304:1497–500.
7. Husain H, Rudin CM. ALK-targeted therapy for lung cancer: ready for prime time. Oncology 2011;25:597–60.
8. Kim ES, Herbst RS, Wistuba II, Lee JJ, Blumenschein GR, Tsao A, et al. The BATTLE trial: personalizing therapy for lung cancer. Cancer Discov 2011;1:44–53.
9. Pinsky P. National Lung Screening Trial (NLST) subset analysis. Board of Scientific Advisors and National Cancer Advisory Board, Bethesda, MD: National Cancer Institute; 2013.
10. Jemal A, Simard E, Dorell C, Noone A, Markowitz L, Kohler B, et al. Annual report to the nation on the status of cancer, 1975–2009, featuring the burden and trends in HPV-associated cancers and HPV vaccination coverage levels. J Natl Cancer Inst 2013;105:175–201.
11. Surveillance, Epidemiology, and End Results (SEER) Program. Available from: www.seer.cancer.gov. SEER*Stat Database: Incidence - SEER 9 Regs Research Data, Nov 2011 Sub (1975–2010) <Katrina/Rita Population Adjustment> - Linked To County Attributes - Total U.S., 1969–2010 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, Surveillance Systems Branch, released April 2013, based on the November 2012 submission. [Internet].
12. Travis W, Brambilla E, Noguchi M, Nicholson A, Geisinger K, Yatabe Y, et al. International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society international multidisciplinary classification of lung adenocarcinoma: executive summary. Proc Am Thorac Soc 2011;8:381–5.
13. Cole SR, Chu H, Greenland S. Multiple-imputation for measurement-error correction. Int J Epidemiol 2006;35:1074–81.
14. Durrant GB, Skinner C. Using missing data methods to correct for measurement error in a distribution function. Surv Methodol 2006;32:25–36.
15. Schenker N, Parker JD. From single-race reporting to multiple-race reporting: using imputation methods to bridge the transition. Stat Med 2003;22:1571–87.
16. Thomas N, Raghunathan TE, Schenker N, Katzo MJ, Johnson CL. An evaluation of matrix sampling methods using data from the National Health and Nutrition Examination Survey. Surv Methodol 2006;32:217–32.
17. Burgette LF, Reiter JP. Nonparametric Bayesian multiple imputation for missing data due to mid-study switching of measurement methods. J Am Stat Assoc 2012;107:439–49.
18. Anderson WF, Katki HA, Rosenberg PS. Incidence of breast cancer in the United States: current and future trends. J Natl Cancer Inst 2011;103:1397–402.
19. Howlader N, Noone A, Yu M, Cronin K. Use of imputed population-based cancer registry data as a method of accounting for missing information: application to estrogen receptor status for breast cancer. Am J Epidemiol 2012;176:347–56.
20. Little RJA, Rubin DB. Statistical analysis with missing data. Hoboken, NJ: John Wiley & Sons, Inc.; 2002.
21. Raghunathan TE, Lepkowski JM, van Hoewyk J, Solenberger P. A multivariate technique for multiply imputing missing values using a sequence of regression models. Surv Methodol 2001;27:85–95.
22. Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley & Sons; 1987.
23. Kim H-J, Fay MP, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Stat Med 2000;19:335–51.
24. David M, Little RJA, Samuhel ME, Triest RK. Alternative methods for CPS income imputation. J Am Stat Assoc 1986;81:29–41.
25. Rubin DB, Stern HS, Vehovar V. Handling "don't know" survey responses: the case of the Slovenian plebiscite. J Am Stat Assoc 1995;90:822–8.
26. Little RJA. Missing-data adjustments in large surveys. J Bus Econom Statist 1988;6:287–96.
27. Lin P-Y, Chang Y-C, Chen H-Y, Chen C-H, Tsui H-C, Yang P-C. Tumor size matters differently in pulmonary adenocarcinoma and squamous cell carcinoma. Lung Cancer 2010;67:296–300.
28. Warren JL, Klabunde CN, Schrag D, Bach PB, Riley GF. Overview of the SEER-Medicare data: content, research applications, and generalizability to the United States elderly population. Med Care 2002;40:IV-3–18.
29. Thun MJ, Lally CA, Calle EE, Heath CW, Flannery JT, Flanders WD. Cigarette smoking and changes in the histopathology of lung cancer. J Natl Cancer Inst 1997;89:1580–6.
30. Small Area Estimates for Cancer Risk Factors & Screening Behaviors. National Cancer Institute, DCCPS, Statistical Methodology & Applications Branch, released May 2010 (sae.cancer.gov). Underlying data provided by Behavioral Risk Factor Surveillance System (http://www.cdc.gov/brfss/) and National Health Interview Survey (http://www.cdc.gov/nchs/nhis.htm). [Internet].
31. U.S. Census Bureau; Census 2000, Summary File 3, Table QT-P35; using American FactFinder. Available from: http://factfinder2.census.gov [Internet].
32. Le Cessie S, van Houwelingen JC. Ridge estimators in logistic regression. Appl Statist 1992;41:191–201.
33. Schaefer R, Roi L, Wolfe R. A ridge logistic estimator. Commun Stat-Theor M 1984;13:99–113.
34. Yu M. Disclosure risk assessments and control. University of Michigan, Ann Arbor, MI: ProQuest/UMI; 2008.
35. SAS Institute Inc. SAS/STAT 9.2 user's guide. Cary, NC: SAS Institute Inc.; 2008.
36. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York, NY: Springer-Verlag; 2009.
37. Greenland S, Finkle W. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol 1995;142:1255–64.
38. van der Heijden G, Donders A, Stijnen T, Moons K. Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: a clinical example. J Clin Epidemiol 2006;59:1102–9.
39. Karr AF, Kohnen CN, Oganian A, Reiter JP, Sanil AP. A framework for evaluating the utility of data altered to protect confidentiality. Amer Stat 2006;3:224–32.
40. Menvielle G, Boshuizen H, Kunst A, Dalton S, Vineis P, Bergmann M, et al. The role of smoking and diet in explaining educational inequalities in lung cancer incidence. J Natl Cancer Inst 2009;101:321–30.
41. Bennett VA, Davies EA, Jack RH, Mak V, Møller H. Histological subtype of lung cancer in relation to socio-economic deprivation in South East England. BMC Cancer 2008;8:139.
42. Travis WD, Travis LB, Devesa SS. Lung cancer. Cancer 1995;75:191–202.
43. Curado MP, Shin HR, Storm H, Ferlay J, Heanue M, Boyle P, editors. Cancer incidence in five continents, vol. IX. Lyon, France: IARC; 2007.
44. Rotunno M, Yu K, Lubin JH, Consonni D, Pesatori AC, Goldstein AM, et al. Phase I metabolic genes and risk of lung cancer: multiple polymorphisms and mRNA expression. PLoS ONE 2009;4:e5652.
Cancer Epidemiol Biomarkers Prev 2014;23:1546–1558. Published OnlineFirst May 22, 2014.
Mandi Yu, Eric J. Feuer, Kathleen A. Cronin, et al. Use of Multiple Imputation to Correct for Bias in Lung Cancer Incidence Trends by Histologic Subtype
Updated version: access the most recent version of this article at doi:10.1158/1055-9965.EPI-14-0130
Supplementary Material: access the most recent supplemental material at http://cebp.aacrjournals.org/content/suppl/2014/05/28/1055-9965.EPI-14-0130.DC1