· web viewthe tumor detection and characterization of tumor phenotype appeared to be improved...
TRANSCRIPT
A radiomics approach based on support vector machine using MR images for preoperative lymph
node status evaluation in intrahepatic cholangiocarcinoma
Authors:
Lei Xu1,2*, Pengfei Yang1,2,3*, Wenjie Liang4*, Weihai Liu4, Weigen Wang5, Chen Luo1,2, Jing Wang1,2, Zhiyi
Peng4†, Lei Xing6, Mi Huang7†, Shusen Zheng8†, Tianye Niu1,2,9†
Affiliations:
1. Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
2. Institute of Translational Medicine, Zhejiang University, Hangzhou, Zhejiang, China
3. College of Biomedical Engineering &Instrument Science, Zhejiang University, Hangzhou, Zhejiang,
China
4. Department of Radiology, the First Affiliated Hospital, Zhejiang University School of Medicine,
Hangzhou, Zhejiang, China
5. Department of Radiology, the First Hospital of Ninghai County Medical Centre, Ningbo, Zhejiang,
China
6. Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California, USA
7. Department of Radiation Oncology, Duke University Medical Center, Durham, USA
8. Department of Hepatobiliary and Pancreatic Surgery, the First Affiliated Hospital, Zhejiang University
School of Medicine, Hangzhou, Zhejiang, China
9. Engineering Research Center of Cognitive Healthcare of Zhejiang Province
*: Both authors contribute equally.
†: Corresponding authors: Tianye Niu, [email protected]; Shusen Zheng, [email protected]; Mi
Huang, [email protected] ; Zhiyi Peng, [email protected]
Keywords:
Radiomics, intrahepatic cholangiocarcinoma, lymph node metastasis
Abstract
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Purpose: Accurate lymph node (LN) status evaluation for intrahepatic cholangiocarcinoma (ICC) patients is
essential for surgical planning. This study aimed to develop and validate a prediction model for preoperative
LN status evaluation in ICC patients.
Methods and Materials: A group of 106 ICC patients, who were diagnosed between April 2011 and
February 2016, was used for prediction model training. Image features were extracted from T1-weighted
contrast-enhanced MR images. A support vector machine (SVM) model was built by using the most LN
status-related features, which were selected using the maximum relevance minimum redundancy (mRMR)
algorithm. The mRMR method ranked each feature according to its relevance to the LN status and
redundancy with other features. An SVM score was calculated for each patient to reflect the LN metastasis
(LNM) probability from the SVM model. Finally, a combination nomogram was constructed by
incorporating the SVM score and clinical features. An independent group of 42 patients who were diagnosed
from March 2016 to November 2017 was used to validate the prediction models. The model performances
were evaluated on discrimination, calibration, and clinical utility.
Results: The SVM model was constructed based on five selected image features. Significant differences
were found between patients with LNM and non-LNM in SVM scores in both groups (the training group:
0.5466 (interquartile range (IQR), 0.4059-0.6985) vs. 0.3226 (IQR, 0.0527-0.4659), P<0.0001; the
validation group: 0.5831 (IQR, 0.3641-0.8162) vs. 0.3101 (IQR, 0.1029-0.4661), P=0.0015). The
combination nomogram based on the SVM score, the CA 19-9 level, and the MR-reported LNM factor
showed better discrimination in separating patients with LNM and non-LNM, comparing to the SVM model
alone (AUC: the training group: 0.842 vs. 0.788; the validation group: 0.870 vs. 0.787). Favorable clinical
utility was observed using the decision curve analysis for the nomogram.
Conclusion: The nomogram, incorporating the SVM score, CA 19-9 level and the MR-reported LNM
factor, provided an individualized LN status evaluation and helped clinicians guide the surgical decisions.
Introduction
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
For liver, intrahepatic cholangiocarcinoma (ICC) is the second common malignancy with steadily growing
incidence rate, with 5%-56% 5-year survival rate worldwide [1, 2]. When diagnosed, about 35% of ICC
patients suffer from synchronous lymph node (LN) metastases [1]. Lymph node metastasis (LNM) generally
indicates a negative prognosis for patients with ICC [3, 4]. Accurate preoperative evaluation of LN status
could provide crucial information for treatment strategy decisions, especially for lymph node dissection
(LND). In current clinical practice, the preoperative LN status in ICC is evaluated mainly based on the
morphological features of the lymph nodes by reviewing the medical images preoperatively (for example,
size and morphology of lymph nodes, signal changes within lymph nodes, etc.)[5, 6]. The prediction
accuracy of current LN status evaluation method is often unstable and unsatisfactory.
A common strategy to predict the LN status was developed based on histopathologic findings, such as tumor
differentiation and lymphatic invasion. However, the predictors based on this strategy were only available
postoperatively. A clinical model built upon the clinical factors including tumor size, pathological
differentiation, and tumor boundary could achieve high sensitivity of 96.1%, but the achievable specificity
and accuracy were quite low, with a value of 23.0% in specificity and 40.3% in accuracy [7]. The above
clinical model was reported to be useful in predicting LN status in patients with ICC. Nevertheless, this
method was also challenging to apply in clinical practice, because subjectivity may exist in determination of
tumor size and tumor boundary, based on clinician’s experience and judgment. When the tumor volume was
small, or when the tumor boundary was unclear, the situation became exacerbated, and the prediction
accuracy could be questionable.
On the other hand, several image-based methods have been proposed [5, 6, 8-10]. Seo et al. used the
standardized uptake value as the LNM image marker based on positron emission tomography (PET) images
[9]. However, the high cost of PET scan limited the utility in clinical practice. Nanashima et al. developed
an LN status prediction model by combining CT findings and serum carbohydrate antigen 19-9 level [5].
The CT findings were used as image markers, which defined by radiologists according to the node status of
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
hepatoduodenal ligament, common hepatic artery, and para-aorta based on CT images. The model showed a
higher prediction accuracy than previous models based on clinical features only, or medical images alone.
However, the underlying geometry and texture features of medical images were not fully excavated in these
models. A comprehensive model incorporating clinical features and image features is needed.
Radiomics refers to mining the underlying relationships between quantitative image features and
pathophysiology characteristics and then developing predictive models for clinical outcomes, such as
survival, distant metastases, and molecular characteristics classification [11-15]. The use of nomograms has
been widely accepted as a reliable tool by incorporating quantitative risk factors for clinical events.
Recently, several researchers have developed nomograms for preoperative LN status evaluation in colorectal
cancer and bladder cancer by incorporating clinical features and image features [16-18]. These nomograms
achieved desirable predictive accuracies. The related studies demonstrated the feasibility of using the
radiomics method to evaluate the LN status for patients with ICC.
In this study, a support vector machine (SVM) model was developed by using the radiomics method for
preoperative LN status evaluation in ICC patients. A nomogram was then constructed by combining the
clinical features and LNM probability, which was calculated based on the SVM model with the radiomics
features from the MR images. We then investigated the difference in prediction accuracies between the
combination model and the SVM model.
Methods
Workflow
Figure 1 presents the workflow of this study. It includes two major parts: ( ) imaging and segmentation; ( )ⅰ ⅱ
feature extraction and model construction. The specific descriptions for these two parts are provided in the
next sections.
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
Patient Population
The institutional review board of the institution (First Affiliated Hospital, Zhejiang University School of
Medicine) approved this study, and the requirement of written informed consent was waived. In this
retrospective study, we reviewed clinical records and T1-weighted contrast-enhanced MR images of ICC
patients undergoing partial hepatectomy and LND between April 2011 and November 2017. The clinical LN
status was obtained based on the clinicopathologic analysis of each patient (X-M Z, a pathologist with more
than 10 years of experience in cancer diagnosis). Supplementary Material presents theⅠ patient inclusion-
exclusion criteria and recruitment pathway. We divided the overall patient population into two independent
sub-groups according to the diagnosis time. The first sub-group was used as the training group to test the
robustness for image features and conduct the model constructing purpose. This training group consisted of
106 patients diagnosed between April 2011 and February 2016 (53 females and 53 females; 35 to 86 years of
age). The second sub-group was used as the validation group to test the proposed model. The validation
group involved 42 patients (13 females and 29 females; 40 to 80 years of age) diagnosed between March
2016 and November 2017. All patients underwent pre-treatment T1-weighted contrast-enhanced MRI scans
in our institution, since the contrast agent (Gadopentetate dimeglumine) was paramagnetic, and the signal
intensity within the tumor would be enhanced after the injection in the images. The tumor detection and
characterization of tumor phenotype appeared to be improved compared with the un-enhanced MRI and
contrast-enhanced CT. The MRI acquisition parameters are presented in Supplementary Material .Ⅱ
Baseline clinical features were derived from medical records, including gender, age, cholelithiasis (with or
without), hepatitis B (with or without), cirrhosis (with or without), primary hepatic lobe site (left or right)
and number of the primary tumors (single or multiple). The serum carbohydrate antigen 19-9 (CA19-9) level
(abnormal or normal) and serum carcinoembryonic antigen (CEA) level (abnormal or normal) were
achieved with the threshold value of the former 37 u/ml, and the latter 5 ng/ml in our institution. All MR
images were evaluated by two experienced abdomen radiologists, both of whom were blind to the actual
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
clinicopathologic results. The definitions for hepatitis B, number of the primary tumors and the MR-reported
LNM factor are provided in Supplementary Material .Ⅲ The number of the primary tumors was used as a
clinical predictive factor, which referred to the number of solid primary tumors for each patient. To justify
the use of baseline clinical features of patients in the training and validation groups, we performed the
demographic comparison for each clinical feature between the training group and validation group for
patients with LNM and non-LNM, respectively.
VOI Segmentation and Feature Extraction
We used the ITK-SNAP software to perform a 3D volume of interest (VOI) manual segmentation [19].
When multiple tumors were present, the tumor with the largest diameter was used to analyze. The VOI was
segmented by two experienced radiologists independently. Radiologist-1 had experience of 12 years in MR
images interpretation, and Radiologist-2 had experience of 14 years in MR images interpretation.
Radiologist-1 finished the segmentation of patients in the training group only (106 patients). Radiologist-2
finished the segmentation of the overall patient population including the training and validation groups (148
patients). We then obtained two feature sets for the training group (feature set-1 was extracted based on the
VOI segmentation of Radiologist-1; feature set-2 from the VOI segmentation of Radiologist-2) and a feature
set for the validation group (feature set-3 from the VOI segmentation of Radiologist-2). The feature set-1
was used to perform the model training task. The feature set-2 was used to test the robustness and
reproducibility of radiomics features from feature set-1. The feature set-3 was used to evaluate the predictive
power of the proposed model.
Image preprocessing was applied before the feature extraction, including image resampling of the arterial
phase contrast-enhanced MR images to a 1×1×1 mm3 voxel size, and image grey level normalization to a
scale of 1 to 32. A total number of 491 image features was extracted for each patient based on the VOI. The
feature set included histogram features (number=6), geometry features (number=8), gray level co-
occurrence matrix features (number=22), grey-level run-length matrix features (number=13), grey-level size
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
zone matrix features (number=13), neighborhood gray-tone difference matrix features (number=5), and
wavelet-based texture features (number=424). These features could characterize intratumor heterogeneity, as
well as the underlying tumor genotypes and protein structures [20-26]. Supplementary Material providesⅣ
the specific descriptions for all the radiomics features. The feature extraction procedure was implemented in
MATLAB V2017b (MathWorks, Natick, MA, USA).
Feature Selection and SVM Model Construction
To eliminate the differences in the value scales of the radiomics features, feature normalization was
performed before feature selection. For features in the training group, each feature for a specific patient was
subtracted by the mean value and divided by standard deviation value from this group. The same
normalization method was applied to features in the validation group using the mean values and standard
deviation values calculated based on the training group.
Due to the relatively low-dimensional patient sample size and high-dimensional feature size, we then
performed feature selection process to select the most LN status-related features to construct an SVM model.
Feature selection was performed including two steps. First, we tested the robustness and reproducibility of
image features. Since the features were extracted based on the VOIs segmented by radiologists manually, we
only used the features that were most robust against the manual segmentation among different radiologists
[26]. The correlation coefficient for each feature was calculated between the feature set-1 (from Radiologist-
1) and feature set-2 (from Radiologist-2) by using the Spearman rank correlation test. Features with
correlation coefficients greater than 0.8 were regarded as robust features, since a correlation coefficient of
0.8 indicated a high correlation according to a rule of thumb [26, 27]. Second, we applied the maximum
relevance minimum redundancy (mRMR) algorithm to assess the relevance and redundancy for each feature
[28, 29]. The maximum-relevance selection was aimed to select features that had the maximal correlation to
the actual LN status. The minimum-redundancy selection ensured that the selected features had the minimal
redundancy among each other. By using the mRMR method, the features were ranked according to their
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
relevance-redundancy indexes. The several top features with high-relevance and low-redundancy were used
to construct the SVM model by a linear kernel. In addition, we tried several typical feature selection
methods, including mRMR, least absolute shrinkage and selection operator (LASSO), Random Forest,
Elastic Net, Wilcoxon, and Gini index [30-32]. A comparison of these methods was also performed.
To demonstrate the association between the selected features and the actual LN status, we performed
univariate analysis and correlation test for each selected features in the training group. An SVM score was
calculated by using the SVM model for each patient to reflect the LNM probability. The discrimination
measured the capacity of prediction models in separating patients with LNM and non-LNM. The
discriminative capability was measured using receiver operating characteristic (ROC) curve, area under the
curve (AUC) and prediction accuracy. AUC had a range from 0.5 to 1.0 (0.5 means no discriminative ability
and 1.0 means ideal discriminative ability). The AUC was reported with a 95% confidence interval (CI). The
prediction accuracy was calculated based on a threshold from a Youden Index, which could classify the
patients into the predicted LNM group and non-LNM group according to the SVM score [33, 34]. To
estimate the prediction error and confidence interval for both groups, we further tested the proposed model
using a 10000-iteration bootstrap analysis in both training and validation group [35]. For each repetition, we
randomly selected a subset of 75% patients from the training group or the validation group (the training
group: 80 patients; the validation group: 32 patients) and calculated the corresponding AUC.
Development and Validation of Combination Nomogram
Multivariable analysis was applied to combine the clinical features and the SVM score with multivariable
logistic regression model [17, 36]. The clinical features involved gender, age, cholelithiasis status, hepatitis
status, cirrhosis status, primary site, number of the primary tumors, CEA level, CA19-9 level, and the MR-
reported LNM. Then, a nomogram (called combination nomogram) was generated based on the proposed
multivariate model. An LNM probability defined as nomogram score was then calculated for each patient by
using the developed combination nomogram. To detect the multi-collinearity among variables in the
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
combination nomogram, the collinearity diagnosis was conducted by calculating the variance inflation factor
(VIF) for variables in the combination nomogram [36, 37]. The VIF was defined as a ratio of the variance of
the model with more than two variables, divided by the variance of the model with the single variable.
Variables with VIFs > 10 indicated severity multicollinearities [37]. The threshold for the nomogram score
was determined and used to classify patients in the validation group. We tested the nomogram by using the
overall group including the feature set-2 and feature set-3. In addition, the combination nomogram was also
tested using the bootstrap method in both training and validation groups.
The model performances were evaluated in three aspects: discrimination, calibration and clinical utility [17,
24]. The discrimination performance was accessed by using ROC, AUC, and prediction accuracy. The
calibration was detected by using the calibration curves accompanied by the Hosmer-Lemeshow test (H-L
test). The calibration curves measured the consistency between the predicted LNM probability and the actual
LNM probability. The H-L test accessed the goodness-of-fit of the prediction models [17, 38]. A significant
statistic from the H-L test indicated the significant difference between the predicted LNM probability and
the actual LNM probability, meaning that the model was poor fitting. For the test of the overall group, we
used the Delong test to measure the differences in ROC curves between combination nomogram and the
SVM model [39, 40].
The decision curve analysis was applied to measure the clinical utility of models [41, 42]. The horizontal
axis of the decision curve indicates the threshold probability in the range of 0.0 to 1.0. The vertical axis
shows the clinical net benefit values resulted from the prediction models against the threshold probability.
The decision curves corresponding to the “treat-all plan” and the “treat-none plan” are plotted as references.
The detailed descriptions of the clinical net benefit, the“treat-all plan” and the“treat-none plan” are provided
in Supplementary Material . A larger area under the decision curve suggested a better clinical utility.Ⅴ
Statistical Analysis Procedure
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
All statistical tests used in this study were executed on MedCalc Statistical Software V15.2.2 (MedCalc
Software bvba, Ostend, Belgium) or R software V3.4.1 (R Core Team, Vienna, Austria). Univariate analysis
for clinical features was implemented by using the Chi-square test or Mann-Whitney U test, as appropriate.
The categorical variable was analyzed using the Chi-square test, such as gender, primary site, number of the
primary tumors, CEA level, etc. The continuous variable was analyzed using the Mann-Whitney U test,
including age and tumor size. The P<0.05 in two-tailed analyses was defined as the statistical significance.
Results
Clinical Features
Table 1 listed the clinical features of patients in the training group and validation group. No statistically
significant difference existed in LNM rate (P = 0.9210) between the two groups. The LNM rate was defined
as the ratio between the number of patients with LNM and the number of patients involved in the certain
group. The LNM rate was 44.34% in the training group, and 45.24% in the validation group. While a
temporal interval existed between the training and validation groups, there were no significant differences in
the baseline clinical features between the training group and the validation group neither for patients with
LNM nor non-LNM, justifying their use as the training and validation groups. The detailed results of
univariable association analysis were presented in Supplementary Material . Ⅵ
Feature Selection and SVM Model Construction
Among the 491 image features, 91 features were retained through the robustness and reproducibility test
with correlation coefficients greater than 0.8 between the feature set-1 and feature set-2 by using the
Spearman rank correlation test. The mRMR based feature selection was used to decrease the redundancy of
the feature set and build the optimal subset of complementary predictive features. The five highest mRMR-
ranked features were selected to build the SVM model. The calculation formula for the SVM model was
provided in Supplementary Material .Ⅶ The selected features were HLH_GLCM_maxpr,
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
LLH_GLCM_sosvh, HLL_GLCM_corrm, LLL_GLCM_denth and HLL_GLSZM_LGZE. Among the five
features, three features of HLH_GLCM_maxpr, LLL_GLCM_denth, and HLL_GLSZM_LGZE showed
significant correlation with the actual LN status and significant difference between the patients with LNM
and non-LNM in the training group with P < 0.05. The univariate analysis and correlation analysis for the
selected features were summarized in Table 2. By comparing the prediction performances of different
feature selection methods, it was noticed that the mRMR method showed the optimal performance. The
specific prediction performances of different methods are summarized in Supplementary Material .Ⅷ
Validation and Evaluation of SVM Model
Significant differences were observed in SVM scores between the patients with synchronous LNM and non-
LNM in both groups (the training group: 0.5466 (interquartile range (IQR), 0.4059-0.6985) vs. 0.3226 (IQR,
0.0527-0.4659), P<0.0001; the validation group: 0.5831 (IQR, 0.3641-0.8162) vs. 0.3101 (IQR, 0.1029-
0.4661), P =0.0015). The AUC value was 0.788 (95% CI, 0.698-0.862) for the training group, and 0.787
(95% CI, 0.634-0.898) for the validation group. These values were consistent with the AUC values
calculated by using the 10000 times bootstrap analysis in both training and validation groups (mean ±
standard deviation; the training group: 0.788±0.027; the validation group: 0.787±0.041). Histograms
describing the distributions of AUCs from the bootstrap method for the SVM model were provided in
Supplementary Material Ⅸ. By using the Youden Index in the training group, the threshold for the SVM
score was defined as 0.4915. By using this threshold, patients with SVM scores higher than 0.4915 were
classified as synchronous LNM, while patients with scores lower than 0.4915 were classified as non-LNM.
The prediction accuracy was 73.58% for the training group and 69.05% for the validation group. The ROC
curves and scatter plots for the SVM score were presented in Figure 2.
Development of Combination Nomogram
In the multivariable analysis, we used the Akaike information criterion (AIC) and the independence analysis
to select the optimal feature combination. A combination of the SVM score, CA 19-9 level, and the MR-
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
reported LNM factor was finally selected. The detailed descriptions of the model construction procedure
were provided in Supplementary Material .Ⅹ By using the collinearity diagnosis, the VIFs for the SVM
score, CA19-9 level, and the MR-reported LNM factor were less than 10 (SVM score: 4.9109; CA19-9
level: 3.7210; MR-reported LNM: 1.9614), indicating no severe collinearity existing in these factors. Using
the multivariable analysis, the three factors including the SVM score (P<0.0001), CA19-9 level (P=0.0081),
and the MR-reported LNM factor (P=0.0307) were all statistically significant and independent in the
training group (Supplementary Material Ⅹ). The combination nomogram was displayed in Figure 3. The
calculation formula for the combination nomogram was provided in Supplementary Material .Ⅶ
Validation and Evaluation of Combination Nomogram
Compared to patients with synchronous non-LNM, patients with synchronous LNM had higher nomogram
scores (the training group: 0.5928 (IQR, 0.1422-1.6073) vs. -1.2560 (IQR, -2.2466- -0.1691), P<0.0001; the
validation group: 0.5151 (IQR, -0.4691-1.0837) vs. -1.6298 (IQR, -2.2261- -0.3005), P<0.0001). The
calibration curves demonstrated good consistency between the predicted LNM probability and the actual
LNM probability for the combination nomogram in both training and validation groups. For the training
group, a non-significant statistic (P=0.4650) of the H-L test suggested no significant deviation from an ideal
fitting. The AUC value was 0.842 (95% CI, 0.758-0.906). For the validation group, a non-significant statistic
of P=0.8578 and an AUC of 0.870 (95% CI, 0.730-0.953) were obtained. By using the bootstrap method, the
AUC values were generally consistent with that calculated based on the two groups (mean ± standard
deviation; the training group: 0.842±0.026; the validation group: 0.869±0.033). Histograms describing the
distributions of AUCs from the bootstrap method for the combination nomogram were provided in
Supplementary Material . Figure 4 displayed the ROC curves and scatter plots for the nomogram score.Ⅸ
The calibration curves for the combination nomogram were provided in Figure 5A-B.
The prediction accuracy for the nomogram was calculated based on the threshold for the nomogram score.
By using the Youden Index in the training group, the optimal threshold of -0.8270 was selected in the ROC
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
analysis. Patients with the nomogram scores greater than -0.8270 were predicted as synchronous LNM,
while patients with scores lower than -0.8270 were predicted as non-LNM. The prediction accuracy was
72.64% for the training group and 78.57% for the validation group. The decision curves for the combination
nomogram and the SVM model were used to evaluate the clinical utilities. In both training and validation
groups, the combination nomogram (red) showed a higher area under decision curves than the SVM model
(black) (Figure 5C-D). The specific performances of the combination nomogram, the SVM model and the
MR-reported LNM factor in both groups were summarized in Table 3.
Overall Validation of the SVM Model and Combination Nomogram
The prediction models were developed based on training group segmented by Radiologist-1 (feature set-1).
To test the robustness and deliverability of the prediction models, we further tested the SVM model, and the
combination nomogram using the overall dataset segmented by Radiologist-2 (feature set-2 and feature set-
3). The combination nomogram showed better performance (Accuracy, 74.32%; AUC, 0.846 (95% CI,
0.777-0.900); Sensitivity, 87.88%; Specificity, 60.98%) than the SVM model alone (Accuracy, 67.57%;
AUC, 0.787 (95% CI, 713-0.850); Sensitivity, 56.06%; Specificity, 78.05%). Further, significant differences
from Delong test suggested significant improvements in predictive performances between the combination
nomogram and the SVM model (P=0.0219). The ROC curves of the combination nomogram and the SVM
model for the overall group were shown in Figure 6.
Discussion
We developed and validated a nomogram by using radiomics approach for LN status preoperative evaluation
in this study. The combination nomogram was constructed by incorporating the SVM score from the
radiomics method and two clinical features of CA19-9 level and the MR-reported LNM factor. SVM score
was an LNM probability calculated from the SVM model, which was developed based on five selective
image features. The combination nomogram outperformed the SVM model in both training and validation
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
groups (the training group: 0.842 vs. 0.788; the validation group 0.870 vs. 0.787). Thus, the favorable
preoperative LN status prediction power of the proposed non-invasive method made it a potential
preoperative evaluation tool in clinical practice.
The manual process of tumor segmentation and the reproducibility of radiomics features are the most
debatable aspects in the radiomics analysis. Subjectivity in the determination of tumor volume and tumor
boundary would occur. The uncertainties in tumor segmentation adversely affect the reproducibility of
radiomics features [43]. A recent study investigating the robustness and reproducibility of radiomics features
in different MRI sequences suggested that radiomics features extracted from T1-weighted images should be
used with care, and only those reproducible features should be selected in building a radiomics model [44].
In this study, all patients were scanned in the same MRI scanner with liver acceleration volume acquisition
(LAVA) sequence. The tumor segmentations were performed by two radiologists independently.
Furthermore, we tested the robustness and reproducibility of image features by using two feature sets
extracted based on the segmentations of the two radiologists. The five selected features and the proposed
SVM model were found to be robust against tumor segmentation.
Note that all the selected five image features used in the SVM model were wavelet features. These features
were extracted from images decomposed by undecimated 3D wavelet transforms. The wavelet
transformation was a multiscale image analysis method by splitting the 3D image data into different
frequency components along three axes. Fine and coarse texture extracted from the wavelet decomposed
images could further present the spatial heterogeneity at multiple scales within tumor regions [45]. By using
the correlation analysis, three out of the five features showed significant correlation with the actual LN
status with P<0.05. The possible reason was that the wavelet features had underlying associations with
clinicopathology and tumor lymphatic system invasion. This observation was consistent with previous
studies which used wavelet-based features in the radiomics models [46-48]. Recently, a study developed a
prediction model to preoperatively differentiate pathological grades in patients with pancreatic
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
neuroendocrine tumors [46]. The prediction model was constructed using eight image features, and seven
out of them were wavelet features. A radiomics study employed machine-learning methods to predict
histologic subtypes for patients with lung cancer. Four out of five features included in the model were
wavelet features [48]. These studies confirmed that wavelet features are important imaging biomarkers for
predicting the phenotype of tumors because they are closely related to the biological behavior of tumors.
In this study, CA19-9 level was served as an independent marker in prognosis stratification in patients with
ICC, which was consistent with the previous studies [49]. In 2001, Jiang et al. proposed a clinical feature
based prognostic score to accurately predict the prognosis for patients with ICC regardless of resection
status, in which CA 19-9 was the only laboratory marker. It was also used as an independent predictive
factor in prognosis evaluation in ICC patients with partial excision. More importantly, a study reported that
the CA 19-9 level was also associated with the tumor progression of ICC [50]. Two recent studies both
reported that the preoperative abnormal level of CA 19-9 was valuable in preoperative LN status evaluation
[51, 52]. Similarly, the CA19-9 level was used as an independent predictor in the combination nomogram in
this study, which also could improve the predictive power of the SVM model.
The proposed preoperative LN status prediction model has potential in assisting clinicians in making the
effective surgical decision for patients with ICC. Although a series of studies had reported that LNM was
highly correlative to the prognosis of ICC, the benefit of lymph node dissection (LND) is still controversial
[53, 54]. de Jong et al. found that among patients who underwent routine LND, patients with LNM showed a
worse median survival [55]. Meng et al. revealed that LND only benefited a subset of patients with a
moderate survival benefit of about five months [56]. LNM-related prognostic stratification is a significant
clinical problem in the management of ICC patients. Accurate preoperative LN status evaluation represents a
key step in individualized and precision treatment of ICC patients.
Our study still had several limitations. Firstly, the patient population was collected from a single institution
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
retrospectively. A total of 106 patients was enrolled in the training group, and 42 patients in the validation
group. To evaluate the sample size for the validation group, we performed a power analysis based on the
LNM rates of the training dataset and the validation dataset. Normally, a power value greater than 0.8
suggests a sufficient sample size [57, 58]. Our estimated power value was 0.85 for the current study. Thus,
the sample sizes for the training and the validation groups were sufficient, meaning that the result and
conclusion of this study were statistically significant. In the future, we will test the proposed model with
multi-center and larger sample size. Secondly, we did not incorporate genomic characteristics in this study.
Recently, increased researches with gene markers had been proposed to detect LNM in patients with ICC,
such as VEGF and EGFR [59]. Though it might be an interesting study to combine genomics and radiomics
analysis, it has not yet been determined how to incorporate genomic characteristics, image features, and
clinical features together. Thirdly, because the diffusion-weighted imaging (DWI) sequences were altered
several times during the long-time span of the study, we used only T1-weighted arterial phase MR images to
mitigate any possible adverse effect caused by the changes in the DWI sequences and enhance the stability
and robustness of the predictive model.
Conclusions
This study developed and validated a combination nomogram for the LN status preoperative evaluation in
ICC patients. The combination nomogram developed using SVM score, CA19-9 level, and the MR-reported
LNM factor showed better prediction accuracy than the MR-reported LNM factor and the SVM model
alone. The proposed model could be used for individualized LN status evaluation and would help clinicians
guide the surgical decisions. Multi-institution retrospective and prospective validation studies should be
implemented before the practical application in the future clinical surgical plan determination.
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377378
379
Abbreviations
ICC: intrahepatic cholangiocarcinoma
LNM: lymph node metastasis
SVM: support vector machine
mRMR: maximum relevance minimum redundancy
LASSO: least absolute shrinkage and selection operator
ROC: receiver operating characteristic
AUC: area under the curve
CI: confidence interval
IQR: interquartile range
VOI: volume of interest
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
AIC: Akaike information criterion
VIF: variance inflation factor
LND: lymph node dissection
CEA: serum carcinoembryonic antigen
CA 19-9: serum carbohydrate antigen 19-9
Acknowledgements
We thank the staff in the center of Cryo-Electron Microscopy (CCEM), Zhejiang University for his technical
assistance.
Sources of Funding
This work was supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No.
LR16F010001, LY17H160010, LY17E050008), National High-tech R&D Program for Young Scientists by
the Ministry of Science and Technology of China (Grant No. 2015AA020917), National Key Research Plan
by the Ministry of Science and Technology of China (Grant No. 2016YFC0104507), Natural Science
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
Foundation of China (NSFC Grant No. 81871351), Zhejiang Province 151 Talents Program, Zhejiang
University Education Foundation ZJU-Stanford Collaboration Fund, the Opening Fund of Engineering
Research Center of Cognitive Healthcare of Zhejiang Province.
Contributions
Conception and design: WL, SZ, TN
Development of methodology: LX, PY, TN
Analysis and interpretation of data: LX, PY, WL
Writing, review, and/or revision of the manuscript: LX, PY, WL, LX, MH, TN
Administrative, technical, or material support: LX, PY, WL, WL, WW, CL, JW, ZP, LX, MH, SZ, TN
Competing Interests
The authors have declared that no competing interest exists.
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
References
1. Mavros MN, Economopoulos KP, Alexiou VG, Pawlik TM. Treatment and prognosis for patients with intrahepatic cholangiocarcinoma: systematic review and meta-analysis. JAMA Surg. 2014; 149: 565-74.2. Aljiffry M, Abdulelah A, Walsh M, Peltekian K, Alwayn I, Molinari M. Evidence-based approach to cholangiocarcinoma: a systematic review of the current literature. J Am Coll Surg. 2009; 208: 134-47.3. de Jong MC, Nathan H, Sotiropoulos GC, Paul A, Alexandrescu S, Marques H, et al. Intrahepatic cholangiocarcinoma: an international multi-institutional analysis of prognostic factors and lymph node assessment. J Clin Oncol. 2011; 29: 3140-5.4. Nakagawa T, Kamiyama T, Kurauchi N, Matsushita M, Nakanishi K, Kamachi H, et al. Number of lymph node metastases is a significant prognostic factor in intrahepatic cholangiocarcinoma. World J Surg. 2005; 29: 728-33.5. Nanashima A, Sakamoto I, Hayashi T, Tobinaga S, Araki M, Kunizaki M, et al. Preoperative diagnosis of lymph node metastasis in biliary and pancreatic carcinomas: evaluation of the combination of multi-detector CT and serum CA19-9 level. Dig Dis Sci. 2010; 55: 3617-26.6. Noji T, Kondo S, Hirano S, Tanaka E, Suzuki O, Shichinohe T. Computed tomography evaluation of regional lymph node metastases in patients with biliary cancer. Br J Surg. 2008; 95: 92-6.7. Chen Y, Zeng Z, Tang Z, Fan J, Zhou J, Jiang W, et al. Prediction of the lymph node status in patients with intrahepatic cholangiocarcinoma: analysis of 320 surgical cases. Front Oncol. 2011; 42: 1-6.8. Kattan MW. Judging new markers by their ability to improve predictive accuracy. J Natl Cancer Inst. 2003; 95: 634-5.9. Seo S, Hatano E, Higashi T, Nakajima A, Nakamoto Y, Tada M, et al. Fluorine-18 fluorodeoxyglucose positron emission tomography predicts lymph node metastasis, P-glycoprotein expression, and recurrence after resection in mass-forming intrahepatic cholangiocarcinoma. Surgery. 2008; 143: 769-77.10. Songthamwat M, Chamadol N, Khuntikeo N, Thinkhamrop J, Koonmee S, Chaichaya N, et al. Evaluating a preoperative protocol that includes magnetic resonance imaging for lymph node metastasis in the Cholangiocarcinoma Screening and Care Program (CASCAP) in Thailand. World J Surg Oncol. 2017; 15: 176.11. Caudell JJ, Torres-Roca JF, Gillies RJ, Enderling H, Kim S, Rishi A, et al. The future of personalised radiotherapy for head and neck cancer. Lancet Oncol. 2017; 18: e266-73.12. Lambin P, Leijenaar RT, Deist TM, Peerlings J, de Jong EE, van Timmeren J, et al. Radiomics: the bridge between medical
453
454
455
456
457
458
459
460
461
462
463
464
465466467468469470471472473474475476477478479480481482483484485486487488489
imaging and personalized medicine. Nat Rev Clin Oncol. 2017; 14: 749-62.13. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012; 48: 441-6.14. Limkin E, Sun R, Dercle L, Zacharaki E, Robert C, Reuzé S, et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann Oncol. 2017; 28: 1191-206.15. Verma V, Simone CB, Krishnan S, Lin SH, Yang J, Hahn SM. The rise of radiomics and implications for oncologic management. J Natl Cancer Inst. 2017; 109: djx055.16. Wu S, Zheng J, Li Y, Yu H, Shi S, Xie W, et al. A radiomics nomogram for the preoperative prediction of lymph node metastasis in bladder cancer. Clin Cancer Res. 2017; 23: 6904-11.17. Huang Y, Liang C, He L, Tian J, Liang C, Chen X, et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J Clin Oncol. 2016; 34: 2157-64.18. Wu S, Zheng J, Li Y, Wu Z, Shi S, Huang M, et al. Development and validation of an MRI-based radiomics signature for the preoperative prediction of lymph node metastasis in bladder cancer. EBioMedicine. 2018; 34: 76-84.19. Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage. 2006; 31: 1116-28.20. Amadasun M, King R. Textural features corresponding to textural properties. IEEE Trans Syst Man Cybern. 1989; 19: 1264-74. 21. Bocchino C, Carabellese A, Caruso T, Della Sala G, Ricart S, Spinella A. Use of gray value distribution of run lengths for texture analysis. Pattern Recognit Lett. 1990; 11: 415-9.22. Dasarathy BV, Holder EB. Image characterizations based on joint gray level-run length distributions. Pattern Recognit Lett. 1991; 12: 497-502.23. Galloway MM. Texture analysis using gray level run lengths. Comput Graph Image Process. 1975; 4: 172-9.24. Halabi S, Small EJ, Kantoff PW, Kattan MW, Kaplan EB, Dawson NA, et al. Prognostic model for predicting survival in men with hormone-refractory metastatic prostate cancer. J Clin Oncol. 2003; 21: 1232-7.25. Thibault G, Fertil B, Navarro C, Pereira S, Levy N, Sequeira J, et al. Texture indexes and gray level size zone matrix: application to cell nuclei classification. Pattern Recognition Inf Process. 2009; 140-5.26. Wu J, Aguilera T, Shultz D, Gudur M, Rubin DL, Loo Jr BW, et al. Early-stage non-small cell lung cancer: quantitative imaging characteristics of 18F fluorodeoxyglucose PET/CT allow prediction of distant metastasis. Radiology. 2016; 281: 270-8.27. Mukaka MM. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012; 24: 69-71.28. Unler A, Murat A, Chinnam RB. mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf Sci. 2011; 181: 4625-41.29. Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res. 2004; 5: 1205-24.30. Coroller TP, Grossmann P, Hou Y, Velazquez ER, Leijenaar RT, Hermann G, et al. CT-based radiomic signature predicts distant metastasis in lung adenocarcinoma. Radiother Oncol. 2015; 114: 345-50.31. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts HJ. Machine learning methods for quantitative radiomic biomarkers. Sci Rep. 2015; 5: 13087.32. Zhang B, He X, Ouyang F, Gu D, Dong Y, Zhang L, et al. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett. 2017; 403: 21-7.33. Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cutoff point. Biom J. 2005; 47: 458-72.34. Youden WJ. Index for rating diagnostic tests. Cancer. 1950; 3: 32-5.35. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985; 39: 783-91.36. Wu Y, Xu L, Yang P, Lin N, Huang X, Pan WB, et al. Survival Prediction in high-grade osteosarcoma using radiomics of diagnostic computed tomography. EBioMedicine. 2018; 34: 27-34.37. O'Brien RM. A caution regarding rules of thumb for variance inflation factors. Qual Quant. 2007; 41: 673-90.38. Kramer AA, Zimmerman JE. Assessing the calibration of mortality benchmarks in critical care: the Hosmer-Lemeshow test revisited. Crit Care Med. 2007; 35: 2052-6.39. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988; 44: 837-45.
490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538
40. Demler OV, Pencina MJ, D'Agostino RB, Sr. Misuse of DeLong test to compare AUCs for nested models. Stat Med. 2012; 31: 2577-87.41. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006; 26: 565-74.42. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016; 352: i6.43. Polan DF, Brady SL, Kaufman RA. Tissue segmentation of computed tomography images using a Random Forest algorithm: a feasibility study. Phys Med Biol. 2016; 61: 6553-69.44. Baeßler B, Weiss K, Pinto DDS. Robustness and reproducibility of radiomics in magnetic resonance imaging: a phantom study. Invest Radiol. 2018; 54: 221-8.45. Kassner A, Thornhill R. Texture analysis: a review of neurologic MR imaging applications. AJNR Am J Neuroradiol. 2010; 31: 809-16.46. Liang W, Yang P, Huang R, Xu L, Wang J, Liu W, et al. A combined nomogram model to preoperatively predict histologic grade in pancreatic neuroendocrine tumors. Clin Cancer Res. 2019; 25: 584-94.47. Wilson R, Devaraj A. Radiomics of pulmonary nodules and lung cancer. Transl Lung Cancer Res. 2017; 6: 86-91.48. Wu W, Parmar C, Grossmann P, Quackenbush J, Lambin P, Bussink J, et al. Exploratory study to identify radiomics classifiers for lung cancer histology. Front Oncol. 2016; 6: 71.49. Jiang W, Zeng Z, Tang Z, Fan J, Sun H, Zhou J, et al. A prognostic scoring system based on clinical features of intrahepatic cholangiocarcinoma: the Fudan score. Ann Oncol. 2011; 22: 1644-52.50. Bergquist JR, Ivanics T, Storlie CB, Groeschl RT, Tee MC, Habermann EB, et al. Implications of CA19‐9 elevation for survival, staging, and treatment sequencing in intrahepatic cholangiocarcinoma: a national cohort analysis. J Surg Oncol. 2016; 114: 475-82.51. Yamada T, Nakanishi Y, Okamura K, Tsuchikawa T, Nakamura T, Noji T, et al. Impact of serum carbohydrate antigen 19‐9 level on prognosis and prediction of lymph node metastasis in patients with intrahepatic cholangiocarcinoma. J Gastroenterol Hepatol. 2018; 33: 1626-33.52. Meng Z, Lin X, Zhu J, Han S, Chen Y. A nomogram to predict lymph node metastasis before resection in intrahepatic cholangiocarcinoma. J Surg Res. 2018; 226: 56-63.53. Weber SM, Ribero D, O'reilly EM, Kokudo N, Miyazaki M, Pawlik TM. Intrahepatic cholangiocarcinoma: expert consensus statement. HPB. 2015; 17: 669-80.54. Adachi T, Eguchi S. Lymph node dissection for intrahepatic cholangiocarcinoma: a critical review of the literature to date. J Hepatobiliary Pancreat Sci. 2014; 21: 162-8.55. Jong MCd, Nathan H, Sotiropoulos GC, Paul A, Alexandrescu S, Marques H, et al. Intrahepatic cholangiocarcinoma: an international multi-institutional analysis of prognostic factors and lymph node assessment. J Clin Oncol. 2011; 29: 3140-5.56. Vitale A, Moustafa M, Spolverato G, Gani F, Cillo U, Pawlik TM. Defining the possible therapeutic benefit of lymphadenectomy among patients undergoing hepatic resection for intrahepatic cholangiocarcinoma. J Surg Oncol. 2016; 113: 685-91.57. Bock J. Power and Sample Size Calculations. New York, USA: Springer; 2001: 309-333.58. Wei J, Yang G, Hao X, Gu D, Tan Y, Wang X, et al. A multi-sequence and habitat-based MRI radiomics signature for preoperative prediction of MGMT promoter methylation in astrocytomas with prognostic implication. Eur Radiol. 2018; 29: 877-88.59. Yoshikawa D, Ojima H, Iwasaki M, Hiraoka N, Kosuge T, Kasai S, et al. Clinicopathological and prognostic significance of EGFR, VEGF, and HER2 expression in cholangiocarcinoma. Br J Cancer. 2008; 98: 418-25.
539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581
582
583
584
Tables
Table 1
Table 1. Patients and preoperative clinical featureClinical features Training group (n=106) P Validation group (n=42) P
LNM non-LNM LNM non-LNMAge (Mean± SD) 58.02 ± 10.54 60.05 ± 8.52 0.2755 55.93 ± 15.25 60.34 ± 7.17 0.4662
Range (35, 77) (39, 86) (40, 80) (43, 76)Gender 0.8450 0.5547
Male 23 30 5 8Female 24 29 14 15
Primary hepatic lobe site 0.6683 0.2479Left 30 40 14 13
Right 17 19 5 10Number of the primary tumors 0.0013 0.1536
Single 30 53 12 19Multiple 17 6 7 4
Hepatitis 0.7065 0.6179Without 35 42 15 24
With 12 17 4 9Cirrhosis 0.4274 0.6673
Without 46 56 18 21With 1 3 1 2
Cholelithiasis 0.2506 0.7030Without 38 42 15 17
With 9 17 4 6CA19-9 0.0086 0.0339
Normal 10 27 7 16Abnormal 37 32 12 7CEA 0.0674 0.0143
Normal 29 46 10 20Abnormal 18 13 9 3
MR-reported LNM 0.0012 0.0251Negative 17 40 5 14Positive 30 19 14 9
Note: LNM, lymph node metastasis; CA19-9, serum carbohydrate antigen 19-9; CEA, serum carcinoembryonic antigen; SD, standard deviation.
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
Table 2
Table 2. Univariate analysis and correlation test for radiomics features used in the SVM model for the training group
Radiomics features Training group (n=106) P Correlation coefficient P
LNM non-LNM
HLH_GLCM_maxpr 0.2854 (0.2651 to 0.3175) 0.2665(0.2425 to 0.2805) 0.0164 0.2343 0.0156
LLH_GLCM_sosvh 1.0462 (0.9128 to 1.1260) 1.1206 (0.9829 to 1.1929) 0.0963 -0.1623 0.0965
HLL_GLCM_corrm -0.0178 (-0.0212 to -0.0152) -0.0146 (-0.0175 to -0.0115) 0.0629 -0.1815 0.0626
LLL_GLCM_denth 2.7902 (2.7389 to 2.8908) 2.9404 (2.8816 to 2.9863) 0.0014 -0.3112 0.0012
HLL_GLSZM_LGZE 0.0013 (0.0010 to 0.0014) 0.0018 (0.0014 to 0.0023) 0.0028 0.2920 0.0024
Note: The univariate analysis for radiomics features was applied by using the Mann-Whitney U test.
The correlation between radiomics features and the LN status was applied by using the Spearman rank correlation test.
All features were reported as median and 95% confidence interval.
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
Table 3
Table 3. Performances of the SVM model, combination nomogram and MR-reported LNM
Models Training group Validation group
Accuracy AUC (95%CI) Sensitivity Specificity S. E. Accuracy AUC (95%CI) Sensitivity Specificity S. E.
SVM score 73.58% 0.788 (0.698 - 0.862) 65.96% 79.66% 0.0441 69.05% 0.787 (0.634 - 0.898) 52.63% 91.30% 0.0695
Combination model 72.64% 0.842 (0.758 - 0.906) 89.36% 57.63% 0.0387 78.57% 0.870 (0.730 - 0.953) 89.47% 69.57% 0.0540
MR-reported LNM 66.04% 0.658 (0.560 - 0.748) 63.83% 67.80% 0.0469 66.67% 0.673 (0.511 - 0.809) 73.68% 60.87% 0.0735
Note: SVM, support vector machine; S.E., standard error; CI, confidence interval.
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
Figures and legends
Figure 1
Figure 1. Workflow of this study. The letters k and n are used to number the patients.
655
656
657
658
659
660
661662
663
664
665
666
667
668
669
670
671
672
Figure 2
Figure 2. The ROC curves of the SVM model in the training group (A) and the validation group (B). The scatter plots of the SVM scores in the training group (C) and the validation group (D). The blue markers indicate patients with synchronous LNM; the red markers indicate patients with non-LNM. The black horizontal line presents the threshold. Patients with SVM scores higher than 0. 4915 are classified as LNM; patients with scores lower than 0. 4915 are classified as non-LNM.
673
674
675
676
677
678
679
680681682683684
685
686
687
688
689
Figure 3
Figure 3. The combination nomogram, combining SVM score, CA 19-9 level, and the MR-reported LNM factor.
690
691
692
693
694
695
696
697698
699
700
701
702
703
704
705
706
707
Figure 4
Figure 4. The ROC curves of the combination nomogram in the training group (A) and the validation group (B). The scatter plots for the nomogram score in the training group (C) and the validation group (D). The blue markers indicate patients with synchronous LNM; the red markers indicate patients with non-LNM. The black horizontal line presents the threshold. Patients with nomogram scores higher than -0.8270 are classified as LNM; patients with scores lower than -0.8270 are classified as non-LNM.
708
709
710
711
712
713
714
715716717718719
720
721
722
723
724
Figure 5
Figure 5. The calibration curves of the combination nomogram in the training group (A) and the validation group (B). Vertical axis: the actual probability of LNM probability; horizontal axis: the nomogram predicted LNM probability; the diagonal line: the perfect prediction with predicted LNM probabilities equal to the actual LNM probabilities. The decision curves of the SVM model and the combination nomogram in the training group (C) and the validation group (D). Vertical axis: the net benefit; horizontal axis: the threshold probability at a range of 0.0 to 1.0. The red and black dotted lines represent the decision curve of the combination nomogram and the SVM model, respectively. The gray line represents the decision curve of the assumption that all patients suffer from LNM; the black line represents the decision curve of the assumption that no patients suffer from LNM.
725
726
727
728
729
730
731
732733734735736737738739740
Figure 6
Figure 6. ROC curves for the SVM model and combination nomogram in the overall group.
741
742
743
744
745
746
747
748749
750
751
752
753
754
755
756
757
758
759
760
Graphical abstract
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
Supplementary Material
A radiomics approach based on support vector machine using MR images for preoperative lymph
node status evaluation in intrahepatic cholangiocarcinoma
Authors
Lei Xu*, Pengfei Yang*, Wenjie Liang*, Weihai Liu, Weigen Wang, Chen Luo, Jing Wang, Zhiyi Peng†, Lei
Xing, Mi Huang†, Shusen Zheng†, Tianye Niu†
. The patient inclusion Ⅰ and exclusion criteria
. MR acquisition parametersⅡ
. Determination ofⅢ hepatitis B, number of the primary tumors, and the MR-reported LNM factor
. The feature set developed in this studyⅣ
. The detailed descriptions of clinical net benefit, the “treat-all plan”, and the “treat-none plan” Ⅴ
. Demographic comparison of baseline clinical features between the training and validation groupsⅥ
. Calculation formulas for SVM model and combination nomogram Ⅶ
. Predictive performances of different feature selection methodsⅧ
. Histograms regarding the distributions of AUCs for the SVM model and combination nomogramⅨ
Ⅹ. The multivariable analysis for model construction
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
. The Ⅰ patient inclusion and exclusion criteria
The patient inclusion criteria included the following: (1) All patients were surgically resected with
pathologically confirmed ICC. (2) Lymph node dissection was performed during operation. (3) T1-weighted
contrast-enhanced MRI scan was performed within one month before the operation. (4) Preoperative clinical
records were complete.
The patient exclusion criteria included the following: (1) The disease was diagnosed to be mixed
hepatocellular cholangiocarcinoma. (2) The patient underwent chemotherapy before contrast-enhanced MRI
scan.
Figure S1. Patient data inclusion/exclusion pathway
. Ⅱ MR acquisition parameters
All patients underwent preoperative abdominal T1-weighted contrast-enhanced MRI scans. The MRI scans
were performed on an MRI scanner (3.0T, GE Medical Systems, Milwaukee, USA) with liver acceleration
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
volume acquisition (LAVA) sequence. By using the abdominal coil, the scan range was defined from the
diaphragmatic dome to the lower pole of the kidney. The contrast-enhanced MR imaging acquisition
parameters included: flip angle: 10°; repetition time: 2.75-3.20 ms; echo time: 1.25-1.50 ms; reverse time: 5
ms; bandwidth: 390.63-488.28 kHz; field of view: 380×304 mm; pixel spacing: 0.7031-0.8203 mm; slice
thickness: 4-6 mm. Each patient was rapid intravenous injected with 15 ml of Gadopentetate Dimeglumine
(2.5 ml/s). The arterial phase was scanned at 14s, then portal vein phase and delayed phase were 55s and
120s after the injection, respectively. Finally, we used the T1-weighted arterial phase enhanced MR images
in this radiomics study.
. Determination of hepatitis B, number of the primary tumors, and the MR-Ⅲ
reported LNM factor
All patients involved in this study were diagnosed with chronic hepatitis B. We divided the patients into
cirrhosis group (F4) and non-cirrhosis group (F0-3), according to their clinical and liver imaging
characteristics. Due to the absence of liver biopsy data, we did not perform accurate liver fibrosis grading.
We used the terminology “number of the primary tumors (single or multiple)” to refer to the case with the
number of solid primary tumors. “Single” refers to the case with only 1 solid tumor, while “multiple” refers
to the case with the number of solid tumors more than 2.
The MR-reported LNM factor was defined by an agreement of 3 radiologists based on the preoperative MR
images. The presence of the maximum short-axis diameter of regional LN ≥ 10mm and/or lymph node
margin hyperintense in diffusion MRI images, and/or marginal enhancement was scored as positive LNM,
while the absence of enlarged or lymph node margin hyperintense or marginal enhancement was scored as
negative LNM, consistent with the definition for LN status evaluation criteria in most previous studies.
. Ⅳ The feature set developed in this study
In this study, a number of 491 image features was extracted for each patient. These features comprised of
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
four groups; the detailed descriptions of the image features were provided in Table S1-S4. The histogram
statistics features described the voxel intensities statistical distribution within the tumors. The geometry
features described the 3D volume and shape characteristics of the tumors. The texture features described the
spatial intensity correlation and distributions of the voxels.
The gray-level co-occurrence matrix (GLCM) is a N g× N gmatrix defined as C (m ,n ;δ , α ), where m and n
represent gray levels, δ (dx ,dy) is the given distance and α indicates the certain direction which has 13
potential value for 3-dimension. The entry C (m ,n) represents the repetition of the correlation of the gray
levels m and n . N g represents the maximum gray level value within the volume of interest (VOI). The gray-
level run length matrix (GLRLM) is used to quantify run length matrices within the VOI. It is a N g× N g
matrix defined as R(m, n∨θ). The element R(m, n) describes the frequency value that the VOI includes a
run of length m, consisting of points of gray level n in the certain direction θ. The gray level size zone
matrix (GLSZM) is used to quantify size zone matrices within the VOI. It is a N g× N g matrix defined as
S(m,n). The element S(m, n) specifies the frequency of block of size n with the gray level m. The
neighborhood gray-tone difference matrix (NGTDM) is a column matrix of N g. It is a sum of the absolute
difference values between central voxel and the average value of its neighborhood. The neighborhood is
defined as the certain distance of 2 voxels. The detailed feature names and abbreviations are presented
below. The feature extraction program was implemented based on MATLAB (Version 2017b; MathWorks,
Natick, MA, USA).
Table S1 Histogram feature
Histogram feature Feature names and abbreviations
Variance, Skewness, Kurtosis, Mean, Energy, Entropy
Table S2 Geometry features
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
Geometry feature Feature names and abbreviations
Max DiameterUniformity, Surface Volume Ratio(SVR), Compactness1(Cpt1), Compactness2(Cpt2),Surface Area, Spherical Disproportion(SphDisp), Sphericity
Table S3 Texture features
Feature type Feature names and abbreviations
GLCM(Grey-level co-occurrence matrix)
Autocorrelation(autoc), Contrast(contr), Correlation(corrm), Correlation2(corrp), Cluster Prominence(cprom), Cluster Shade(cshad), Dissimilarity(dissi), Energy(energ), Entropy(entro), Homogeneity(homom), Homogeneity2(homop), Maximum probability(maxpr), Sum of squares Variance(sosvh), Sum average(savgh), Sum variance(svarh), Sum entropy(senth), Difference variance(dvarh), Difference entropy(denth), Information measure of correlation1(inf1h), Information measure of correlation2(inf2h), Inverse difference normalized (INN) (indnc), Inverse difference moment normalized(idmnc)
GLRLM(Grey-level run-length matrix)
Short Run Emphasis (SRE), Long Run Emphasis (LRE), Grey-Level Non-uniformity (GLN), Run-Length Non-uniformity (RLN), Run Percentage (RP), Low Grey-Level Run Emphasis (LGRE),
High Grey-Level Run Emphasis (HGRE),
Short Run Low Grey-Level Emphasis (SRLGE), Short Run High Grey-Level Emphasis (SRHGE), Long Run Low Grey-Level Emphasis (LRLGE), Long Run High Grey-Level Emphasis (LRHGE), Grey-Level Variance (GLV), Run-Length Variance (RLV)
865
GLSZM(Grey-level size zone matrix)
Small Zone Emphasis (SZE), Large Zone Emphasis (LZE), Grey-Level Non-uniformity (GLN), Zone-Size Non-uniformity (ZSN), Zone Percentage (ZP), Low Grey-Level Zone Emphasis (LGZE), High Grey-Level Zone Emphasis (HGZE),
Small Zone Low Grey-Level Emphasis (SZLGE), Small Zone High Grey-Level Emphasis (SZHGE), Large Zone Low Grey-Level Emphasis (LZLGE), Large Zone High Grey-Level Emphasis (LZHGE), Grey-Level Variance (GLV), Zone-Size Variance (ZSV)
NGTDM(Neighbourhood grey-tone difference matrix)
Coarseness, Contrast, Busyness, Complexity, Strength
Wavelet features: We use the discrete undecimated wavelet transform for decomposing the original
images. The high-pass and low-pass wavelet functions were used in three axials; then, the original image
could be decomposed into eight decompositions. We marked the original 3D images as G, the high-pass
wavelet function as H and the low-pass wavelet function as L. Them, the decompositions could be express
as GLLL, GLLH, GLHL, GHLL, GLHH, GHLH, GHHL, GHHH. Specificity, the decomposition GLHL indicated that the
original image was processed by using a low-pass filter, a high-pass filter and a low-pass filter in the x-axis,
y-axis and z-axis, respectively. Based on these 8 decomposed images, histogram and textural features are
extracted again. A number of 424 features could be obtained through wavelet transform. The filters used for
wavelet transform satisfy the perfect reconstruction conditions.
Table S4 Wavelet features
Wavelet type Feature names
LLL (low, low, low) LLL_GLCM_autoc, LLL_GLCM_contr, LLL_GLCM_corrm, LLL_GLCM_corrp, LLL_GLCM_cprom, LLL_GLCM_cshad, LLL_GLCM_dissi, LLL_GLCM_energ, LLL_GLCM_entro, LLL_GLCM_homom, LLL_GLCM_homop, LLL_GLCM_maxpr, LLL_GLCM_sosvh, LLL_GLCM_savgh, LLL_GLCM_svarh, LLL_GLCM_senth, LLL_GLCM_dvarh, LLL_GLCM_denth, LLL_GLCM_inf1h, LLL_GLCM_inf2h, LLL_GLCM_indnc, LLL_GLCM_idmnc, LLL_GLRLM_SRE, LLL_GLRLM_LRE,
866
867
868
869
870
871
872
873
874
875
LLL_GLRLM_GLN, LLL_GLRLM_RLN, LLL_GLRLM_RP, LLL_GLRLM_LGRE, LLL_GLRLM_HGRE, LLL_GLRLM_SRLGE, LLL_GLRLM_SRHGE, LLL_GLRLM_LRLGE, LLL_GLRLM_LRHGE, LLL_GLRLM_GLV, LLL_GLRLM_RLV, LLL_GLSZM_SZE, LLL_GLSZM_LZE, LLL_GLSZM_GLN, LLL_GLSZM_ZSN, LLL_GLSZM_ZP, LLL_GLSZM_LGZE, LLL_GLSZM_HGZE, LLL_GLSZM_SZLGE, LLL_GLSZM_SZHGE, LLL_GLSZM_LZLGE, LLL_GLSZM_LZHGE, LLL_GLSZM_GLV, LLL_GLSZM_ZSV, LLL_Coarseness, LLL_Contrast, LLL_Busyness, LLL_Complexity, LLL_Strength
LLH (low, low, high) LLH_GLCM_autoc, LLH_GLCM_contr, LLH_GLCM_corrm, LLH_GLCM_corrp, LLH_GLCM_cprom, LLH_GLCM_cshad, LLH_GLCM_dissi, LLH_GLCM_energ), LLH_GLCM_entro, LLH_GLCM_homom, LLH_GLCM_homop, LLH_GLCM_maxpr, LLH_GLCM_sosvh, LLH_GLCM_savgh, LLH_GLCM_svarh, LLH_GLCM_senth, LLH_GLCM_dvarh, LLH_GLCM_denth, LLH_GLCM_inf1h, LLH_GLCM_inf2h, LLH_GLCM_indnc, LLH_GLCM_idmnc, LLH_GLRLM_SRE, LLH_GLRLM_LRE, LLH_GLRLM_GLN, LLH_GLRLM_RLN, LLH_GLRLM_RP, LLH_GLRLM_LGRE, LLH_GLRLM_HGRE, LLH_GLRLM_SRLGE, LLH_GLRLM_SRHGE, LLH_GLRLM_LRLGE, LLH_GLRLM_LRHGE, LLH_GLRLM_GLV, LLH_GLRLM_RLV, LLH_GLSZM_SZE, LLH_GLSZM_LZE, LLH_GLSZM_GLN, LLH_GLSZM_ZSN, LLH_GLSZM_ZP, LLH_GLSZM_LGZE, LLH_GLSZM_HGZE, LLH_GLSZM_SZLGE, LLH_GLSZM_SZHGE, LLH_GLSZM_LZLGE, LLH_GLSZM_LZHGE, LLH_GLSZM_GLV, LLH_GLSZM_ZSV, LLH_Coarseness, LLH_Contrast, LLH_Busyness, LLH_Complexity, LLH_Strength
LHL (low, high, low) LHL_GLCM_autoc, LHL_GLCM_contr, LHL_GLCM_corrm, LHL_GLCM_corrp, LHL_GLCM_cprom, LHL_GLCM_cshad, LHL_GLCM_dissi, LHL_GLCM_energ), LHL_GLCM_entro, LHL_GLCM_homom, LHL_GLCM_homop, LHL_GLCM_maxpr, LHL_GLCM_sosvh, LHL_GLCM_savgh, LHL_GLCM_svarh, LHL_GLCM_senth, LHL_GLCM_dvarh, LHL_GLCM_denth, LHL_GLCM_inf1h, LHL_GLCM_inf2h, LHL_GLCM_indnc, LHL_GLCM_idmnc, LHL_GLRLM_SRE, LHL_GLRLM_LRE, LHL_GLRLM_GLN, LHL_GLRLM_RLN, LHL_GLRLM_RP, LHL_GLRLM_LGRE, LHL_GLRLM_HGRE, LHL_GLRLM_SRLGE, LHL_GLRLM_SRHGE, LHL_GLRLM_LRLGE, LHL_GLRLM_LRHGE, LHL_GLRLM_GLV, LHL_GLRLM_RLV, LHL_GLSZM_SZE, LHL_GLSZM_LZE, LHL_GLSZM_GLN, LHL_GLSZM_ZSN, LHL_GLSZM_ZP, LHL_GLSZM_LGZE, LHL_GLSZM_HGZE, LHL_GLSZM_SZLGE, LHL_GLSZM_SZHGE, LHL_GLSZM_LZLGE, LHL_GLSZM_LZHGE, LHL_GLSZM_GLV, LHL_GLSZM_ZSV, LHL_Coarseness, LHL_Contrast, LHL_Busyness, LHL_Complexity, LHL_Strength
HLL (high, low, low) HLL_GLCM_autoc, HLL_GLCM_contr, HLL_GLCM_corrm, HLL_GLCM_corrp, HLL_GLCM_cprom, HLL_GLCM_cshad,
HLL_GLCM_dissi, HLL_GLCM_energ), HLL_GLCM_entro, HLL_GLCM_homom, HLL_GLCM_homop, HLL_GLCM_maxpr, HLL_GLCM_sosvh, HLL_GLCM_savgh, HLL_GLCM_svarh, HLL_GLCM_senth, HLL_GLCM_dvarh, HLL_GLCM_denth, HLL_GLCM_inf1h, HLL_GLCM_inf2h, HLL_GLCM_indnc, HLL_GLCM_idmnc, HLL_GLRLM_SRE, HLL_GLRLM_LRE, HLL_GLRLM_GLN, HLL_GLRLM_RLN, HLL_GLRLM_RP, HLL_GLRLM_LGRE, HLL_GLRLM_HGRE, HLL_GLRLM_SRLGE, HLL_GLRLM_SRHGE, HLL_GLRLM_LRLGE, HLL_GLRLM_LRHGE, HLL_GLRLM_GLV, HLL_GLRLM_RLV, HLL_GLSZM_SZE, HLL_GLSZM_LZE, HLL_GLSZM_GLN, HLL_GLSZM_ZSN, HLL_GLSZM_ZP, HLL_GLSZM_LGZE, HLL_GLSZM_HGZE, HLL_GLSZM_SZLGE, HLL_GLSZM_SZHGE, HLL_GLSZM_LZLGE, HLL_GLSZM_LZHGE, HLL_GLSZM_GLV, HLL_GLSZM_ZSV, HLL_Coarseness, HLL_Contrast, HLL_Busyness, HLL_Complexity, HLL_Strength
HHL (high, high, low)
HHL_GLCM_autoc, HHL_GLCM_contr, HHL_GLCM_corrm, HHL_GLCM_corrp, HHL_GLCM_cprom, HHL_GLCM_cshad, HHL_GLCM_dissi, HHL_GLCM_energ), HHL_GLCM_entro, HHL_GLCM_homom, HHL_GLCM_homop, HHL_GLCM_maxpr, HHL_GLCM_sosvh, HHL_GLCM_savgh, HHL_GLCM_svarh, HHL_GLCM_senth, HHL_GLCM_dvarh, HHL_GLCM_denth, HHL_GLCM_inf1h, HHL_GLCM_inf2h, HHL_GLCM_indnc, HHL_GLCM_idmnc, HHL_GLRLM_SRE, HHL_GLRLM_LRE, HHL_GLRLM_GLN, HHL_GLRLM_RLN, HHL_GLRLM_RP, HHL_GLRLM_LGRE, HHL_GLRLM_HGRE, HHL_GLRLM_SRLGE, HHL_GLRLM_SRHGE, HHL_GLRLM_LRLGE, HHL_GLRLM_LRHGE, HHL_GLRLM_GLV, HHL_GLRLM_RLV, HHL_GLSZM_SZE, HHL_GLSZM_LZE, HHL_GLSZM_GLN, HHL_GLSZM_ZSN, HHL_GLSZM_ZP, HHL_GLSZM_LGZE, HHL_GLSZM_HGZE, HHL_GLSZM_SZLGE, HHL_GLSZM_SZHGE, HHL_GLSZM_LZLGE, HHL_GLSZM_LZHGE, HHL_GLSZM_GLV, HHL_GLSZM_ZSV, HHL_Coarseness, HHL_Contrast, HHL_Busyness, HHL_Complexity, HHL_Strength
HLH (high, low, high) HLH_GLCM_autoc, HLH_GLCM_contr, HLH_GLCM_corrm, HLH_GLCM_corrp, HLH_GLCM_cprom, HLH_GLCM_cshad, HLH_GLCM_dissi, HLH_GLCM_energ), HLH_GLCM_entro, HLH_GLCM_homom, HLH_GLCM_homop, HLH_GLCM_maxpr, HLH_GLCM_sosvh, HLH_GLCM_savgh, HLH_GLCM_svarh, HLH_GLCM_senth, HLH_GLCM_dvarh, HLH_GLCM_denth, HLH_GLCM_inf1h, HLH_GLCM_inf2h, HLH_GLCM_indnc, HLH_GLCM_idmnc, HLH_GLRLM_SRE, HLH_GLRLM_LRE, HLH_GLRLM_GLN, HLH_GLRLM_RLN, HLH_GLRLM_RP, HLH_GLRLM_LGRE, HLH_GLRLM_HGRE, HLH_GLRLM_SRLGE, HLH_GLRLM_SRHGE, HLH_GLRLM_LRLGE, HLH_GLRLM_LRHGE, HLH_GLRLM_GLV, HLH_GLRLM_RLV, HLH_GLSZM_SZE, HLH_GLSZM_LZE, HLH_GLSZM_GLN, HLH_GLSZM_ZSN, HLH_GLSZM_ZP, HLH_GLSZM_LGZE, HLH_GLSZM_HGZE,
HLH_GLSZM_SZLGE, HLH_GLSZM_SZHGE, HLH_GLSZM_LZLGE, HLH_GLSZM_LZHGE, HLH_GLSZM_GLV, HLH_GLSZM_ZSV, HLH_Coarseness, HLH_Contrast, HLH_Busyness, HLH_Complexity, HLH_Strength
LHH (low, high, high) LHH_GLCM_autoc, LHH_GLCM_contr, LHH_GLCM_corrm, LHH_GLCM_corrp, LHH_GLCM_cprom, LHH_GLCM_cshad, LHH_GLCM_dissi, LHH_GLCM_energ), LHH_GLCM_entro, LHH_GLCM_homom, LHH_GLCM_homop, LHH_GLCM_maxpr, LHH_GLCM_sosvh, LHH_GLCM_savgh, LHH_GLCM_svarh, LHH_GLCM_senth, LHH_GLCM_dvarh, LHH_GLCM_denth, LHH_GLCM_inf1h, LHH_GLCM_inf2h, LHH_GLCM_indnc, LHH_GLCM_idmnc, LHH_GLRLM_SRE, LHH_GLRLM_LRE, LHH_GLRLM_GLN, LHH_GLRLM_RLN, LHH_GLRLM_RP, LHH_GLRLM_LGRE, LHH_GLRLM_HGRE, LHH_GLRLM_SRLGE, LHH_GLRLM_SRHGE, LHH_GLRLM_LRLGE, LHH_GLRLM_LRHGE, LHH_GLRLM_GLV, LHH_GLRLM_RLV, LHH_GLSZM_SZE, LHH_GLSZM_LZE, LHH_GLSZM_GLN, LHH_GLSZM_ZSN, LHH_GLSZM_ZP, LHH_GLSZM_LGZE, LHH_GLSZM_HGZE, LHH_GLSZM_SZLGE, LHH_GLSZM_SZHGE, LHH_GLSZM_LZLGE, LHH_GLSZM_LZHGE, LHH_GLSZM_GLV, LHH_GLSZM_ZSV, LHH_Coarseness, LHH_Contrast, LHH_Busyness, LHH_Complexity, LHH_Strength
HHH (high, high, high) HHH_GLCM_autoc, HHH_GLCM_contr, HHH_GLCM_corrm, HHH_GLCM_corrp, HHH_GLCM_cprom, HHH_GLCM_cshad, HHH_GLCM_dissi, HHH_GLCM_energ), HHH_GLCM_entro, HHH_GLCM_homom, HHH_GLCM_homop, HHH_GLCM_maxpr, HHH_GLCM_sosvh, HHH_GLCM_savgh, HHH_GLCM_svarh, HHH_GLCM_senth, HHH_GLCM_dvarh, HHH_GLCM_denth, HHH_GLCM_inf1h, HHH_GLCM_inf2h, HHH_GLCM_indnc, HHH_GLCM_idmnc, HHH_GLRLM_SRE, HHH_GLRLM_LRE, HHH_GLRLM_GLN, HHH_GLRLM_RLN, HHH_GLRLM_RP, HHH_GLRLM_LGRE, HHH_GLRLM_HGRE, HHH_GLRLM_SRLGE, HHH_GLRLM_SRHGE, HHH_GLRLM_LRLGE, HHH_GLRLM_LRHGE, HHH_GLRLM_GLV, HHH_GLRLM_RLV, HHH_GLSZM_SZE, HHH_GLSZM_LZE, HHH_GLSZM_GLN, HHH_GLSZM_ZSN, HHH_GLSZM_ZP, HHH_GLSZM_LGZE, HHH_GLSZM_HGZE, HHH_GLSZM_SZLGE, HHH_GLSZM_SZHGE, HHH_GLSZM_LZLGE, HHH_GLSZM_LZHGE, HHH_GLSZM_GLV, HHH_GLSZM_ZSV, HHH_Coarseness, HHH_Contrast, HHH_Busyness, HHH_Complexity, HHH_Strength
. The detailed descriptions of clinical net benefit,Ⅴ the “treat-all plan”, and the “treat-
none plan”
876
877
878
The net benefit was defined using the following formula:
Net benefit=TPRN
− FPRN
×P t
1−Pt
In this formula,N was the sample size, Pt was the threshold probability to stratify patients as the predicted
synchronous lymph node metastasis (LNM) or non-LNM. Patients with the predicted LNM probabilities
greater than Pt were predicted as synchronous LNM, while patients with the predicted LNM probabilities
lower than Pt were predicted as non-LNM. For patients with predicted synchronous LNM, the lymph node
dissection (LND) were recommended. While for patients with predicted non-LNM, the LND was not
recommended. TPR was the true positive rate. TPR was defined as the ratio of patients with predicted
synchronous LNM in the patients with actual LNM. FPR was the false positive rate. FPR was defined as the
ratio of patients with predicted synchronous LNM in the patients without LNM.
The “treat-none plan” was defined that no patients were predicted as LNM. In this case, the TPR and FPR
equaled to zero, and the net benefit was zero. The “treat-all plan” was defined that all patients were
predicted as LNM. In this case, the TPR and FPR equaled to one, and the net benefit calculation formula
was changed:
Net benefit treat−all plan=1−2 × Pt
N ×(1−P t)
. Demographic comparison of baseline clinical features between the training and Ⅵ
validation groups
While a temporal interval existed between the training and validation groups, there were no significant
differences in the baseline clinical features between the training group and the validation group neither for
patients with LNM (P = 0.9773 for age, 0.0923 for gender, 0.4419 for primary hepatic lobe site, 0.9590 for
number of the primary tumors 0.9464 for hepatitis, 0.9044 for cirrhosis, 0.8684 for cholelithiasis, 0.1904 for
CA 19-9 level, 0.4974 for CEA level, 0.4419 for the MR-reported LNM factor) and patients with non-LNM
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
(P = 0.8829 for age, 0.1900 for gender, 0.3374 for primary hepatic lobe site, 0.6015 for number of the
primary tumors 0.8749 for hepatitis, 0.9202 for cirrhosis, 0.8050 for cholelithiasis, 0.0525 for CA 19-9 level,
0.5401 for CEA level, 0.5523 for the MR-reported LNM factor). Thus, the baseline clinical features for
patients in the training and validation groups justify their use as the training and validation groups.
. Calculation formulas for SVM model and combination nomogram ⅦSVM score=0.3386+0.0988× HLH¿−0.1524 × LLH¿−0.2111 × HLL¿−0.4333 × LLL¿−0.2087 × HLL¿
Nomogram score=−3.4872+4.1198× SVM score+1.4461×CA 19 ˗ 9level+1.0490× MR ˗reported LNM
. Predictive performances of different feature selection methodsⅧTo find the optimal feature selection algorithm for the problem here, we compared the performances of
several feature selection methods, including mRMR, least absolute shrinkage and selection operator
(LASSO), Random forest, Elastic net, Wilcoxon, and Gini index, and the results were summarized in Table
S5 below. In the training group, the P values calculated based on the Delong test showed that the mRMR,
LASSO, Elastic net, and Random forest were all less than 0.0001. Using the AUC value as an evaluation
index, the mRMR method and LASSO method showed the best performances. In the validation group, the P
value for the mRMR method was the lowest, while its AUC was the highest. Therefore, the mRMR was
chosen as the optimal method in this paper.
Table S5. Predictive performances of different feature selection methods
Methods Training group Validation group
Sensitivity Specificity AUC (95% CI) P Sensitivity Specificity AUC (95% CI) P
mRMR 65.96% 79.66% 0.788 (0.698-0.862) <0.0001 52.63% 91.30% 0.787 (0.634-0.898) <0.0001
LASSO 70.21% 71.19% 0.773 (0.692-0.857) <0.0001 63.16% 60.87% 0.714 (0.554-0.843) 0.0077
Elastic Net 82.98% 55.93% 0.737 (0.643-0.818) <0.0001 73.68% 43.48% 0.629 (0.467-0.773) 0.1385
Random Forest 65.96% 79.66% 0.759 (0.666-0.837) <0.0001 42.11% 73.91% 0.673 (0.511-0.809) 0.0396
Wilcoxon 57.45% 79.66% 0.689 (0.592-0.775) 0.0004 57.89% 78.26% 0.693 (0.532-0.826) 0.0238
Gini Index 49.15% 80.85% 0.679 (0.581-0.766) 0.0006 43.48% 78.95% 0.719 (0.559-0.846) 0.0074
Note: AUC: area under the curve; CI: confidence interval. P value was calculated using the Delong test.
. Histograms regarding the distributions of AUCs for the SVM mode and Ⅸ
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
combination nomogram
Figure S2. Histograms regarding the distributions of AUCs from the bootstrap method for SVM model and combination
nomogram in both training and validation groups. (A) Histogram for SVM model in the training group; (B) histogram for
SVM model in the validation group; (C) histogram for combination nomogram in the training group; (B) histogram for
combination nomogram in the validation group.
Ⅹ. The multivariable analysis for model construction
In the multivariable analysis, we used the Akaike information criterion (AIC) and the independence analysis
to select the optimal factors. The detailed AIC values in the model construction procedure were showed in
Table S6. Firstly, a combination with the minimum AIC value of 117.49 was selected, involving SVM score,
CA 19-9 level, number of the primary tumors, primary hepatic lobe site, and the MR-reported LNM factor
was selected. By using the independence analysis for features in the model, three features of SVM score, CA
19-9 level, and the MR-reported LNM factor were reported independent with P-values < 0.05, while two
features of primary hepatic lobe site and number of the primary tumors were reported non-independent with
P-values > 0.05. Table S8 showed the P-values for these five features. Then, we removed these two
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
redundant features and construct a new model with the independent features only. Table S9 showed the P-
values for the selected factors in the new prediction model. Thus, we used the model with the SVM score,
CA 19-9 level, and the MR-reported LNM factor in this study.
Table S6. AIC value changes in the model construction
Variable AIC
SVM score & Gender & Age & Cholelithiasis & Hepatitis B & Cirrhosis & Primary hepatic lobe site & Number of the primary tumors &CA 19-9 level & CEA level & MR-reported LNM
127.41
SVM score & Gender & Age & Cholelithiasis & Cirrhosis & Primary hepatic lobe site & Number of the primary tumors &CA 19-9 level & CEA level & MR-reported LNM
125.43
SVM score & Gender & Age & Cholelithiasis & Cirrhosis & Primary hepatic lobe site & Number of the primary tumors &CA 19-9 level & MR-reported LNM
123.45
SVM score & Gender & Age & Cholelithiasis & Primary hepatic lobe site & Number of the primary tumors &CA 19-9 level & MR-reported LNM
121.53
SVM score & Age & Cholelithiasis & Primary hepatic lobe site & Number of the primary tumors &CA 19-9 level & MR-reported LNM
119.76
SVM score & Cholelithiasis & Primary hepatic lobe site & Number of the primary tumors &CA 19-9 level & MR-reported LNM
118.19
SVM score & Primary hepatic lobe site & Number of the primary tumors &CA 19-9 level & MR-reported LNM
117.49
Note: SVM, support vector machine; LNM, lymph node metastasis; CA19-9, serum carbohydrate antigen 19-9; CEA, serum carcinoembryonic antigen.
Table S7. VIFs for all the candidate variables in the logistic regression analysis
Variable VIF
SVM score 5.9610Gender 2.3877Age 47.8974Cholelithiasis 1.4593Hepatitis B 1.6686Cirrhosis 1.0680Primary hepatic lobe site 1.8118Number of the primary tumors 1.4319CA 19-9 level 4.1702CEA level 1.9743MR-reported LNM 0.0772
Note: LNM, lymph node metastasis; CA19-9, serum carbohydrate antigen 19-9; CEA, serum carcinoembryonic antigen.
Table S8. Multivariable analysis for five features selected
Variable P
SVM score 0.0003CA 19-9 level 0.0078MR-reported LNM 0.0249
933
934
935
936
937
938
Primary hepatic lobe site 0.0546Number of the primary tumors 0.0772
Note: LNM: lymph node metastasis; CA19-9: serum carbohydrate antigen 19-9.
Table S9. Multivariable analysis for features used in the nomogram
Variable Coefficients P OR (95% CI)
SVM score 4.1198 <0.0001 61.5448 (7.8097-485.0073)CA 19-9 level 1.4461 0.0081 4.2467 (1.4569-12.3785)MR-reported LNM 1.0490 0.0307 2.8548 (1.1022-7.3941)
Note: SVM, support vector machine; OR, odds ratio; CI, confidence interval.
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
Reference:
1. Haralick RM, Shanmugam K. Textural features for image classification. IEEE Trans Syst Man Cybern. 1973; SMC-3: 610-21.2. Galloway MM. Texture analysis using gray level run lengths. Comput Graph Image Process. 1975; 4: 172-9.3. Amadasun M, King R. Textural features corresponding to textural properties. IEEE Trans Syst Man Cybern. 1989; 19: 1264-74. 4. Chu A, Sehgal CM, Greenleaf JF. Use of gray value distribution of run lengths for texture analysis. Pattern Recognit Lett. 1990; 11: 415-9. 5. Dasarathy BV, Holder EB. Image characterizations based on joint gray level-run length distributions. Pattern Recognit Lett. 1991; 12: 497-502.6. Thibault G, Fertil B, Navarro C, L., Pereira S, Cau P, Lévy N, et al. Texture indexes and gray level size zone matrix: application to cell nuclei classification. Pattern Recognition Inf Process. 2009; 140-5.7. Vallieres M, Freeman CR, Skamene SR, El Naqa I. A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Phys Med Biol. 2015; 60: 5471-96. 8. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006; 26: 565-74. 9. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016; 352: 5.10. Coroller TP, Grossmann P, Hou Y, Velazquez ER, Leijenaar RTH, Hermann G, et al. CT-based radiomic signature predicts distant metastasis in lung adenocarcinoma. Radiother Oncol. 2015; 114: 345-50. 11. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts H. Machine learning methods for quantitative radiomic biomarkers. Sci Rep. 2015; 5: 11. 12. Huang Y, Liang C, He L, Tian J, Liang C, Chen X, et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J Clin Oncol. 2016; 34: 2157-64.13. Zhang B, He X, Ouyang F, Gu D, Dong Y, Zhang L, et al. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett. 2017; 403: 21-7.14. Li H, Galperin-Aizenberg M, Pryma D, Simone CB, Fan Y. Unsupervised machine learning of radiomic features for predicting treatment response and overall survival of early stage non-small cell lung cancer patients treated with stereotactic body radiation therapy. Radiother Oncol. 2018; 129: 218-26. 15. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974; 19: 716-23. 16. Wu S, Zheng J, Li Y, Yu H, Shi S, Xie W, et al. A radiomics nomogram for the preoperative prediction of lymph node metastasis in bladder cancer. Clin Cancer Res. 2017; 23: 6904-11.
967
968
969970971972973974975976977978979980981982983984985986987988989990991992993994995996997998999