A radiomics approach based on support vector machine using MR images for preoperative lymph

node status evaluation in intrahepatic cholangiocarcinoma

Authors:

Lei Xu1,2*, Pengfei Yang1,2,3*, Wenjie Liang4*, Weihai Liu4, Weigen Wang5, Chen Luo1,2, Jing Wang1,2, Zhiyi

Peng4†, Lei Xing6, Mi Huang7†, Shusen Zheng8†, Tianye Niu1,2,9†

Affiliations:

1. Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China

2. Institute of Translational Medicine, Zhejiang University, Hangzhou, Zhejiang, China

3. College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou, Zhejiang,

China

4. Department of Radiology, the First Affiliated Hospital, Zhejiang University School of Medicine,

Hangzhou, Zhejiang, China

5. Department of Radiology, the First Hospital of Ninghai County Medical Centre, Ningbo, Zhejiang,

China

6. Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California, USA

7. Department of Radiation Oncology, Duke University Medical Center, Durham, USA

8. Department of Hepatobiliary and Pancreatic Surgery, the First Affiliated Hospital, Zhejiang University

School of Medicine, Hangzhou, Zhejiang, China

9. Engineering Research Center of Cognitive Healthcare of Zhejiang Province

*: These authors contributed equally.

†: Corresponding authors: Tianye Niu, [email protected]; Shusen Zheng, [email protected]; Mi

Huang, [email protected] ; Zhiyi Peng, [email protected]

Keywords:

Radiomics, intrahepatic cholangiocarcinoma, lymph node metastasis

Abstract

Purpose: Accurate lymph node (LN) status evaluation for intrahepatic cholangiocarcinoma (ICC) patients is

essential for surgical planning. This study aimed to develop and validate a prediction model for preoperative

LN status evaluation in ICC patients.

Methods and Materials: A group of 106 ICC patients, who were diagnosed between April 2011 and

February 2016, was used for prediction model training. Image features were extracted from T1-weighted

contrast-enhanced MR images. A support vector machine (SVM) model was built by using the most LN

status-related features, which were selected using the maximum relevance minimum redundancy (mRMR)

algorithm. The mRMR method ranked each feature according to its relevance to the LN status and

redundancy with other features. An SVM score was calculated for each patient to reflect the LN metastasis

(LNM) probability from the SVM model. Finally, a combination nomogram was constructed by

incorporating the SVM score and clinical features. An independent group of 42 patients who were diagnosed

from March 2016 to November 2017 was used to validate the prediction models. The model performances

were evaluated on discrimination, calibration, and clinical utility.

Results: The SVM model was constructed based on five selected image features. Significant differences

were found between patients with LNM and non-LNM in SVM scores in both groups (the training group:

0.5466 (interquartile range (IQR), 0.4059-0.6985) vs. 0.3226 (IQR, 0.0527-0.4659), P<0.0001; the

validation group: 0.5831 (IQR, 0.3641-0.8162) vs. 0.3101 (IQR, 0.1029-0.4661), P=0.0015). The

combination nomogram based on the SVM score, the CA 19-9 level, and the MR-reported LNM factor

showed better discrimination in separating patients with LNM and non-LNM than the SVM model

alone (AUC: the training group: 0.842 vs. 0.788; the validation group: 0.870 vs. 0.787). Favorable clinical

utility was observed using the decision curve analysis for the nomogram.

Conclusion: The nomogram, incorporating the SVM score, CA 19-9 level and the MR-reported LNM

factor, provided an individualized LN status evaluation and helped clinicians guide the surgical decisions.

Introduction

Intrahepatic cholangiocarcinoma (ICC) is the second most common liver malignancy. Its incidence is steadily

growing, and the 5-year survival rate ranges from 5% to 56% worldwide [1, 2]. At diagnosis, about 35% of ICC

patients suffer from synchronous lymph node (LN) metastases [1]. Lymph node metastasis (LNM) generally

indicates a negative prognosis for patients with ICC [3, 4]. Accurate preoperative evaluation of LN status

could provide crucial information for treatment strategy decisions, especially for lymph node dissection

(LND). In current clinical practice, the preoperative LN status in ICC is evaluated mainly based on the

morphological features of the lymph nodes by reviewing the medical images preoperatively (for example,

size and morphology of lymph nodes, signal changes within lymph nodes, etc.) [5, 6]. The prediction

accuracy of this evaluation method is often unstable and unsatisfactory.

A common strategy to predict the LN status was developed based on histopathologic findings, such as tumor

differentiation and lymphatic invasion. However, the predictors based on this strategy were only available

postoperatively. A clinical model built upon clinical factors including tumor size, pathological

differentiation, and tumor boundary achieved a high sensitivity of 96.1%, but its specificity (23.0%) and

accuracy (40.3%) were low [7]. Although this clinical model was reported to be useful in predicting LN

status in patients with ICC, it is challenging to apply in clinical practice because the determination of

tumor size and tumor boundary is subjective and depends on the clinician’s experience and judgment. The

problem is exacerbated when the tumor volume is small or the tumor boundary is unclear, and the prediction

accuracy may then be questionable.

On the other hand, several image-based methods have been proposed [5, 6, 8-10]. Seo et al. used the

standardized uptake value as the LNM image marker based on positron emission tomography (PET) images

[9]. However, the high cost of PET scans limits their utility in clinical practice. Nanashima et al. developed

an LN status prediction model by combining CT findings and serum carbohydrate antigen 19-9 level [5].

The CT findings were used as image markers, which were defined by radiologists according to the node status of

the hepatoduodenal ligament, common hepatic artery, and para-aortic region on CT images. The model showed a

higher prediction accuracy than previous models based on clinical features only, or medical images alone.

However, the underlying geometry and texture features of medical images were not fully exploited in these

models. A comprehensive model incorporating clinical features and image features is needed.

Radiomics refers to mining the underlying relationships between quantitative image features and

pathophysiology characteristics and then developing predictive models for clinical outcomes, such as

survival, distant metastases, and molecular characteristics classification [11-15]. Nomograms have been

widely accepted as reliable tools that incorporate quantitative risk factors to predict clinical events.

Recently, several researchers have developed nomograms for preoperative LN status evaluation in colorectal

cancer and bladder cancer by incorporating clinical features and image features [16-18]. These nomograms

achieved desirable predictive accuracies. These studies suggest that the radiomics method may also be

feasible for evaluating the LN status of patients with ICC.

In this study, a support vector machine (SVM) model was developed by using the radiomics method for

preoperative LN status evaluation in ICC patients. A nomogram was then constructed by combining the

clinical features and LNM probability, which was calculated based on the SVM model with the radiomics

features from the MR images. We then investigated the difference in prediction accuracies between the

combination model and the SVM model.

Methods

Workflow

Figure 1 presents the workflow of this study. It includes two major parts: (ⅰ) imaging and segmentation; (ⅱ)

feature extraction and model construction. The specific descriptions for these two parts are provided in the

next sections.

Patient Population

The institutional review board of the institution (First Affiliated Hospital, Zhejiang University School of

Medicine) approved this study, and the requirement of written informed consent was waived. In this

retrospective study, we reviewed clinical records and T1-weighted contrast-enhanced MR images of ICC

patients undergoing partial hepatectomy and LND between April 2011 and November 2017. The clinical LN

status was obtained based on the clinicopathologic analysis of each patient (X-M Z, a pathologist with more

than 10 years of experience in cancer diagnosis). Supplementary Material Ⅰ presents the patient

inclusion-exclusion criteria and recruitment pathway. We divided the overall patient population into two

independent sub-groups according to the diagnosis time. The first sub-group was used as the training group to

test the robustness of image features and to construct the models. This training group consisted of

106 patients diagnosed between April 2011 and February 2016 (53 males and 53 females; 35 to 86 years of

age). The second sub-group was used as the validation group to test the proposed model. The validation

group involved 42 patients (13 females and 29 females; 40 to 80 years of age) diagnosed between March

2016 and November 2017. All patients underwent pre-treatment T1-weighted contrast-enhanced MRI scans

in our institution. Because the contrast agent (gadopentetate dimeglumine) is paramagnetic, the signal

intensity within the tumor is enhanced in the images after injection, and tumor detection and

characterization of tumor phenotype appeared to be improved compared with un-enhanced MRI and

contrast-enhanced CT. The MRI acquisition parameters are presented in Supplementary Material Ⅱ.

Baseline clinical features were derived from medical records, including gender, age, cholelithiasis (with or

without), hepatitis B (with or without), cirrhosis (with or without), primary hepatic lobe site (left or right)

and number of the primary tumors (single or multiple). The serum carbohydrate antigen 19-9 (CA19-9) level

and serum carcinoembryonic antigen (CEA) level were each classified as abnormal or normal using the

threshold values applied in our institution: 37 U/ml for CA19-9 and 5 ng/ml for CEA. All MR

images were evaluated by two experienced abdominal radiologists, both of whom were blinded to the actual

clinicopathologic results. The definitions for hepatitis B, number of the primary tumors and the MR-reported

LNM factor are provided in Supplementary Material Ⅲ. The number of the primary tumors was used as a

clinical predictive factor, which referred to the number of solid primary tumors for each patient. To justify

the use of the training and validation groups, we compared each baseline clinical feature between the

training group and the validation group, separately for patients with LNM and for patients without LNM.

VOI Segmentation and Feature Extraction

We used the ITK-SNAP software to perform a 3D volume of interest (VOI) manual segmentation [19].

When multiple tumors were present, the tumor with the largest diameter was analyzed. The VOI was

segmented by two experienced radiologists independently. Radiologist-1 had 12 years of experience in MR

image interpretation, and Radiologist-2 had 14 years of experience in MR image interpretation.

Radiologist-1 finished the segmentation of patients in the training group only (106 patients). Radiologist-2

finished the segmentation of the overall patient population including the training and validation groups (148

patients). We then obtained two feature sets for the training group (feature set-1 was extracted based on the

VOI segmentation of Radiologist-1; feature set-2 from the VOI segmentation of Radiologist-2) and a feature

set for the validation group (feature set-3 from the VOI segmentation of Radiologist-2). The feature set-1

was used to perform the model training task. The feature set-2 was used to test the robustness and

reproducibility of radiomics features from feature set-1. The feature set-3 was used to evaluate the predictive

power of the proposed model.

Image preprocessing was applied before the feature extraction, including resampling of the arterial

phase contrast-enhanced MR images to a 1×1×1 mm³ voxel size, and gray-level normalization to a

scale of 1 to 32. A total of 491 image features were extracted for each patient based on the VOI. The

feature set included histogram features (number=6), geometry features (number=8), gray-level

co-occurrence matrix features (number=22), gray-level run-length matrix features (number=13), gray-level size

zone matrix features (number=13), neighborhood gray-tone difference matrix features (number=5), and

wavelet-based texture features (number=424). These features could characterize intratumor heterogeneity, as

well as the underlying tumor genotypes and protein structures [20-26]. Supplementary Material Ⅳ provides

the specific descriptions for all the radiomics features. The feature extraction procedure was implemented in

MATLAB V2017b (MathWorks, Natick, MA, USA).
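The gray-level normalization step described above can be illustrated with a minimal sketch. It assumes a linear rescaling of the VOI intensities into 32 equal-width bins; the source does not state the binning scheme, and the function name `quantize_gray_levels` is hypothetical rather than part of the authors' MATLAB implementation.

```python
def quantize_gray_levels(intensities, n_levels=32):
    """Map raw VOI intensities to integer gray levels 1..n_levels.

    Assumption: equal-width bins between the VOI minimum and maximum.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # A constant region carries no texture; place it in the lowest level.
        return [1 for _ in intensities]
    width = (hi - lo) / n_levels
    levels = []
    for value in intensities:
        # Bin index 0..n_levels-1; the maximum is clamped into the top bin.
        index = min(int((value - lo) / width), n_levels - 1)
        levels.append(index + 1)
    return levels


# Toy intensities standing in for the voxels of one VOI (not patient data).
voi = [102.0, 340.5, 87.2, 512.9, 250.0]
print(quantize_gray_levels(voi))
```

Texture matrices such as the gray-level co-occurrence matrix would then be computed on these discrete levels rather than on the raw intensities.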

Feature Selection and SVM Model Construction

To eliminate the differences in the value scales of the radiomics features, feature normalization was

performed before feature selection. For the training group, each feature value was standardized by

subtracting the group mean of that feature and dividing by the group standard deviation. The same

normalization method was applied to features in the validation group using the mean values and standard

deviation values calculated based on the training group.
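The normalization just described is a z-score computed with training-group statistics only. A minimal sketch follows; the helper names are hypothetical, the numbers are toy values, and the use of the sample (n-1) standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def fit_zscore(train_values):
    """Return the mean and sample standard deviation of one training feature."""
    return mean(train_values), stdev(train_values)


def apply_zscore(values, mu, sigma):
    """Standardize values with statistics estimated on the training group."""
    return [(v - mu) / sigma for v in values]


# Toy values for a single feature (not patient data).
train_feature = [2.0, 4.0, 6.0, 8.0]
valid_feature = [5.0, 9.0]

mu, sigma = fit_zscore(train_feature)
train_z = apply_zscore(train_feature, mu, sigma)
# The validation group reuses mu and sigma from the training group.
valid_z = apply_zscore(valid_feature, mu, sigma)
print(train_z, valid_z)
```

Reusing the training statistics keeps information from the validation group out of model construction.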

Because the patient sample was small relative to the high-dimensional feature set, we then

performed feature selection to identify the most LN status-related features for constructing an SVM model.

Feature selection comprised two steps. First, we tested the robustness and reproducibility of

image features. Since the features were extracted based on the VOIs segmented by radiologists manually, we

only used the features that were most robust against the manual segmentation among different radiologists

[26]. The correlation coefficient for each feature was calculated between the feature set-1 (from Radiologist-

1) and feature set-2 (from Radiologist-2) by using the Spearman rank correlation test. Features with

correlation coefficients greater than 0.8 were regarded as robust features, since a correlation coefficient of

0.8 indicated a high correlation according to a rule of thumb [26, 27]. Second, we applied the maximum

relevance minimum redundancy (mRMR) algorithm to assess the relevance and redundancy for each feature

[28, 29]. The maximum-relevance selection aimed to select features that had the maximal correlation to

the actual LN status. The minimum-redundancy selection ensured that the selected features had the minimal

redundancy among each other. By using the mRMR method, the features were ranked according to their

relevance-redundancy indexes. The top-ranked features with high relevance and low redundancy were used

to construct the SVM model with a linear kernel. In addition, we tried several typical feature selection

methods, including mRMR, least absolute shrinkage and selection operator (LASSO), Random Forest,

Elastic Net, Wilcoxon, and Gini index [30-32]. A comparison of these methods was also performed.
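The two selection steps can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the feature names and values are toy data, and the greedy ranking uses the absolute Pearson correlation as a simple stand-in for the mutual-information terms of the original mRMR formulation.

```python
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def robust_features(set1, set2, threshold=0.8):
    """Step 1: keep features whose inter-reader Spearman correlation > threshold."""
    return [n for n in set1 if spearman(set1[n], set2[n]) > threshold]


def mrmr_rank(features, labels, names, n_select):
    """Step 2: greedy relevance-minus-redundancy ranking (|Pearson r| proxy)."""
    relevance = {n: abs(pearson(features[n], labels)) for n in names}
    selected, remaining = [], list(names)
    while remaining and len(selected) < n_select:
        def score(n):
            if not selected:
                return relevance[n]
            redundancy = mean([abs(pearson(features[n], features[s])) for s in selected])
            return relevance[n] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: four hypothetical features from two readers, six patients.
labels = [0, 0, 0, 1, 1, 1]
reader1 = {"f_a": [1, 2, 3, 7, 8, 9], "f_b": [1, 2, 4, 6, 8, 9],
           "f_c": [2, 1, 2, 3, 2, 3], "f_d": [5, 1, 4, 2, 6, 3]}
reader2 = {"f_a": [1.2, 2.1, 2.9, 7.3, 8.1, 8.8], "f_b": [1.1, 1.9, 4.2, 5.8, 8.3, 9.1],
           "f_c": [2, 1, 2, 3, 2, 3], "f_d": [1, 6, 2, 5, 3, 4]}

robust = robust_features(reader1, reader2)
selected = mrmr_rank(reader1, labels, robust, n_select=2)
print(robust, selected)
```

In this toy case `f_d` fails the robustness filter, and `f_b` is passed over in the second round because it is nearly redundant with `f_a`. The selected features would then be passed to a linear-kernel SVM.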

To demonstrate the association between the selected features and the actual LN status, we performed

univariate analysis and a correlation test for each selected feature in the training group. An SVM score was

calculated by using the SVM model for each patient to reflect the LNM probability. Discrimination

is the capacity of a prediction model to separate patients with LNM from those without LNM. The

discriminative capability was measured using the receiver operating characteristic (ROC) curve, area under the

curve (AUC) and prediction accuracy. An AUC of 0.5 indicates no discriminative ability

and an AUC of 1.0 indicates ideal discriminative ability. The AUC was reported with a 95% confidence interval (CI). The

prediction accuracy was calculated based on a threshold determined by the Youden index, which classified the

patients into the predicted LNM group and non-LNM group according to the SVM score [33, 34]. To

estimate the prediction error and confidence interval for both groups, we further tested the proposed model

using a 10000-iteration bootstrap analysis in both the training and validation groups [35]. For each repetition, we

randomly selected a subset of 75% of the patients from the training group or the validation group (the training

group: 80 patients; the validation group: 32 patients) and calculated the corresponding AUC.
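The discrimination measures in this paragraph can be sketched with the standard library alone. The scores and labels below are toy values, the function names are hypothetical, and drawing each 75% subset without replacement is an assumption about how the resampling was carried out.

```python
import math
import random


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores, labels):
    """Return the cut-off maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j


def subsample_aucs(scores, labels, fraction=0.75, n_iter=1000, seed=0):
    """AUCs over repeated random subsets containing `fraction` of the patients."""
    rng = random.Random(seed)
    k = math.ceil(fraction * len(scores))  # 106 -> 80 and 42 -> 32, as in the text
    indices = list(range(len(scores)))
    results = []
    while len(results) < n_iter:
        pick = rng.sample(indices, k)
        sub_labels = [labels[i] for i in pick]
        if 0 < sum(sub_labels) < k:  # both classes must be present
            results.append(auc([scores[i] for i in pick], sub_labels))
    return results


# Toy SVM scores (1 = LNM, 0 = non-LNM); not patient data.
scores = [0.10, 0.35, 0.40, 0.80, 0.30, 0.55, 0.70, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

cut, j = youden_threshold(scores, labels)
aucs = subsample_aucs(scores, labels, n_iter=200)
print(auc(scores, labels), cut, j, sum(aucs) / len(aucs))
```

The mean and standard deviation of the resampled AUCs correspond to the "mean ± standard deviation" values reported in the Results.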

Development and Validation of Combination Nomogram

Multivariable analysis with a logistic regression model was applied to combine the clinical features and the

SVM score [17, 36]. The clinical features involved gender, age, cholelithiasis status, hepatitis

status, cirrhosis status, primary site, number of the primary tumors, CEA level, CA19-9 level, and the MR-

reported LNM. Then, a nomogram (called combination nomogram) was generated based on the proposed

multivariate model. An LNM probability defined as nomogram score was then calculated for each patient by

using the developed combination nomogram. To detect the multi-collinearity among variables in the

combination nomogram, the collinearity diagnosis was conducted by calculating the variance inflation factor

(VIF) for variables in the combination nomogram [36, 37]. The VIF of a variable was defined as the ratio of the

variance of its coefficient in the model containing multiple variables to the variance in a model containing that variable alone.

VIFs > 10 indicated severe multicollinearity [37]. The threshold for the nomogram score

was determined and used to classify patients in the validation group. We tested the nomogram by using the

overall group including the feature set-2 and feature set-3. In addition, the combination nomogram was also

tested using the bootstrap method in both training and validation groups.
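In standard notation, the multivariable logistic regression model and the VIF can be written as follows. The symbols are introduced here for illustration and do not appear in the source: p is the predicted LNM probability, x_j are the predictors, β_j their coefficients, and R_j² is the coefficient of determination obtained by regressing x_j on the remaining predictors.

```latex
\[
\ln\frac{p}{1-p} \;=\; \beta_0 + \sum_{j=1}^{k} \beta_j x_j ,
\qquad
\mathrm{VIF}_j \;=\; \frac{1}{1 - R_j^{2}} .
\]
```

A VIF of 1 means that x_j is uncorrelated with the other predictors. A VIF of 10 corresponds to R_j² = 0.9, which is the severity threshold used above.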

The model performances were evaluated in three aspects: discrimination, calibration and clinical utility [17,

24]. The discrimination performance was assessed by using ROC, AUC, and prediction accuracy. The

calibration was assessed by using the calibration curves accompanied by the Hosmer-Lemeshow test (H-L

test). The calibration curves measured the consistency between the predicted LNM probability and the actual

LNM probability. The H-L test assessed the goodness-of-fit of the prediction models [17, 38]. A significant

statistic from the H-L test indicated a significant difference between the predicted LNM probability and

the actual LNM probability, meaning that the model fitted poorly. For the test of the overall group, we

used the DeLong test to measure the differences in ROC curves between the combination nomogram and the

SVM model [39, 40].

The decision curve analysis was applied to measure the clinical utility of models [41, 42]. The horizontal

axis of the decision curve indicates the threshold probability in the range of 0.0 to 1.0. The vertical axis

shows the clinical net benefit of the prediction models at each threshold probability.

The decision curves corresponding to the “treat-all plan” and the “treat-none plan” are plotted as references.

The detailed descriptions of the clinical net benefit, the “treat-all plan” and the “treat-none plan” are provided

in Supplementary Material Ⅴ. A larger area under the decision curve suggested a better clinical utility.
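A decision curve can be sketched as below, assuming the commonly used definition of net benefit (true positives per patient minus false positives per patient weighted by the odds of the threshold probability). The authors' exact definition is given in their supplementary material and is not reproduced here; the probabilities and labels are toy values.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference curve: every patient is treated as LNM."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy predicted LNM probabilities (1 = LNM, 0 = non-LNM); not patient data.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

for pt in (0.1, 0.3, 0.5, 0.7):
    model = net_benefit(probs, labels, pt)
    treat_all = net_benefit_treat_all(labels, pt)
    # The treat-none reference is zero at every threshold.
    print(pt, round(model, 3), round(treat_all, 3), 0.0)
```

Plotting the model curve against the two references over the threshold range gives the decision curve; a model is useful at thresholds where its net benefit exceeds both references.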

Statistical Analysis Procedure

All statistical tests used in this study were executed on MedCalc Statistical Software V15.2.2 (MedCalc

Software bvba, Ostend, Belgium) or R software V3.4.1 (R Core Team, Vienna, Austria). Univariate analysis

for clinical features was implemented by using the Chi-square test or Mann-Whitney U test, as appropriate.

Categorical variables, such as gender, primary site, number of the

primary tumors, and CEA level, were analyzed using the Chi-square test. Continuous variables,

including age and tumor size, were analyzed using the Mann-Whitney U test. A two-tailed P<0.05 was considered statistically significant.

Results

Clinical Features

Table 1 lists the clinical features of patients in the training group and validation group. No statistically

significant difference existed in LNM rate (P = 0.9210) between the two groups. The LNM rate was defined

as the ratio of the number of patients with LNM to the total number of patients in the

group. The LNM rate was 44.34% in the training group, and 45.24% in the validation group. Although a

temporal interval existed between the training and validation groups, there were no significant differences in

the baseline clinical features between the training group and the validation group, either for patients with

LNM or for patients without LNM, justifying their use as the training and validation groups. The detailed results of

univariable association analysis are presented in Supplementary Material Ⅵ.
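The LNM-rate comparison can be checked with a short worked example. The patient counts (47 of 106 and 19 of 42) are not stated in the text; they are inferred here from the reported rates and group sizes. The test is assumed to be a Pearson Chi-square test without continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and P value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Inferred counts (assumption): patients with LNM / group size.
train_lnm, train_n = 47, 106
valid_lnm, valid_n = 19, 42

print(round(100 * train_lnm / train_n, 2), round(100 * valid_lnm / valid_n, 2))

chi2, p_value = chi_square_2x2(train_lnm, train_n - train_lnm,
                               valid_lnm, valid_n - valid_lnm)
print(round(chi2, 4), round(p_value, 4))
```

Under these assumptions the rates come out at 44.34% and 45.24%, and the P value is about 0.921, in line with the reported P = 0.9210.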

Feature Selection and SVM Model Construction

Among the 491 image features, 91 features were retained through the robustness and reproducibility test

with correlation coefficients greater than 0.8 between the feature set-1 and feature set-2 by using the

Spearman rank correlation test. The mRMR-based feature selection was used to decrease the redundancy of

the feature set and build the optimal subset of complementary predictive features. The five highest mRMR-

ranked features were selected to build the SVM model. The calculation formula for the SVM model was

provided in Supplementary Material Ⅶ. The selected features were HLH_GLCM_maxpr,

LLH_GLCM_sosvh, HLL_GLCM_corrm, LLL_GLCM_denth and HLL_GLSZM_LGZE. Among the five

features, three (HLH_GLCM_maxpr, LLL_GLCM_denth, and HLL_GLSZM_LGZE) showed

a significant correlation with the actual LN status and a significant difference between the patients with LNM

and non-LNM in the training group (P < 0.05). The univariate analysis and correlation analysis for the

selected features are summarized in Table 2. Among the feature selection methods compared,

the mRMR method showed the best prediction performance. The

specific prediction performances of different methods are summarized in Supplementary Material Ⅷ.

Validation and Evaluation of SVM Model

Significant differences were observed in SVM scores between the patients with synchronous LNM and non-

LNM in both groups (the training group: 0.5466 (interquartile range (IQR), 0.4059-0.6985) vs. 0.3226 (IQR,

0.0527-0.4659), P<0.0001; the validation group: 0.5831 (IQR, 0.3641-0.8162) vs. 0.3101 (IQR, 0.1029-

0.4661), P =0.0015). The AUC value was 0.788 (95% CI, 0.698-0.862) for the training group, and 0.787

(95% CI, 0.634-0.898) for the validation group. These values were consistent with the AUC values

calculated by using the 10000-iteration bootstrap analysis in both training and validation groups (mean ±

standard deviation; the training group: 0.788±0.027; the validation group: 0.787±0.041). Histograms

describing the distributions of AUCs from the bootstrap method for the SVM model were provided in

Supplementary Material Ⅸ. By using the Youden Index in the training group, the threshold for the SVM

score was defined as 0.4915. By using this threshold, patients with SVM scores higher than 0.4915 were

classified as synchronous LNM, while patients with scores lower than 0.4915 were classified as non-LNM.

The prediction accuracy was 73.58% for the training group and 69.05% for the validation group. The ROC

curves and scatter plots for the SVM score were presented in Figure 2.

Development of Combination Nomogram

In the multivariable analysis, we used the Akaike information criterion (AIC) and the independence analysis

to select the optimal feature combination. A combination of the SVM score, CA 19-9 level, and the MR-

reported LNM factor was finally selected. The detailed descriptions of the model construction procedure

were provided in Supplementary Material Ⅹ. In the collinearity diagnosis, the VIFs for the SVM

score, CA19-9 level, and the MR-reported LNM factor were less than 10 (SVM score: 4.9109; CA19-9

level: 3.7210; MR-reported LNM: 1.9614), indicating no severe collinearity among these factors. In

the multivariable analysis, the three factors, namely the SVM score (P<0.0001), CA19-9 level (P=0.0081),

and the MR-reported LNM factor (P=0.0307) were all statistically significant and independent in the

training group (Supplementary Material Ⅹ). The combination nomogram was displayed in Figure 3. The

calculation formula for the combination nomogram was provided in Supplementary Material Ⅶ.

Validation and Evaluation of Combination Nomogram

Compared with patients without LNM, patients with synchronous LNM had higher nomogram

scores (the training group: 0.5928 (IQR, 0.1422 to 1.6073) vs. -1.2560 (IQR, -2.2466 to -0.1691), P<0.0001; the

validation group: 0.5151 (IQR, -0.4691 to 1.0837) vs. -1.6298 (IQR, -2.2261 to -0.3005), P<0.0001). The

calibration curves demonstrated good consistency between the predicted LNM probability and the actual

LNM probability for the combination nomogram in both training and validation groups. For the training

group, a non-significant statistic (P=0.4650) of the H-L test suggested no significant deviation from an ideal

fitting. The AUC value was 0.842 (95% CI, 0.758-0.906). For the validation group, a non-significant statistic

of P=0.8578 and an AUC of 0.870 (95% CI, 0.730-0.953) were obtained. By using the bootstrap method, the

AUC values were generally consistent with those calculated based on the two groups (mean ± standard

deviation; the training group: 0.842±0.026; the validation group: 0.869±0.033). Histograms describing the

distributions of AUCs from the bootstrap method for the combination nomogram were provided in

Supplementary Material Ⅸ. Figure 4 displayed the ROC curves and scatter plots for the nomogram score.

The calibration curves for the combination nomogram were provided in Figure 5A-B.

The prediction accuracy for the nomogram was calculated based on the threshold for the nomogram score.

By using the Youden Index in the training group, the optimal threshold of -0.8270 was selected in the ROC

analysis. Patients with the nomogram scores greater than -0.8270 were predicted as synchronous LNM,

while patients with scores lower than -0.8270 were predicted as non-LNM. The prediction accuracy was

72.64% for the training group and 78.57% for the validation group. The decision curves for the combination

nomogram and the SVM model were used to evaluate the clinical utilities. In both training and validation

groups, the combination nomogram (red) showed a larger area under the decision curve than the SVM model

(black) (Figure 5C-D). The specific performances of the combination nomogram, the SVM model and the

MR-reported LNM factor in both groups were summarized in Table 3.
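The thresholding rule for the nomogram score can be illustrated as follows. This sketch assumes that the nomogram score is the linear predictor (log-odds) of the logistic regression model, which would explain the negative values reported; the source does not state this explicitly, so the probabilities printed here are illustrative only. The function names are hypothetical.

```python
import math

NOMOGRAM_THRESHOLD = -0.8270  # Youden-index cut-off reported for the training group


def logistic(score):
    """Convert a log-odds score to a probability (assumed relationship)."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(score, cut=NOMOGRAM_THRESHOLD):
    """Scores above the cut-off are predicted as LNM, otherwise non-LNM."""
    return "LNM" if score > cut else "non-LNM"


# Central nomogram scores reported for the training group.
for name, score in [("LNM group", 0.5928), ("non-LNM group", -1.2560)]:
    print(name, score, classify(score), round(logistic(score), 3))

print("threshold as probability:", round(logistic(NOMOGRAM_THRESHOLD), 3))
```

If the log-odds assumption holds, the cut-off of -0.8270 corresponds to a predicted LNM probability of roughly 0.30.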

Overall Validation of the SVM Model and Combination Nomogram

The prediction models were developed based on the training group segmented by Radiologist-1 (feature set-1).

To test the robustness and transferability of the prediction models, we further tested the SVM model and the

combination nomogram using the overall dataset segmented by Radiologist-2 (feature set-2 and feature set-

3). The combination nomogram showed better performance (Accuracy, 74.32%; AUC, 0.846 (95% CI,

0.777-0.900); Sensitivity, 87.88%; Specificity, 60.98%) than the SVM model alone (Accuracy, 67.57%;

AUC, 0.787 (95% CI, 0.713-0.850); Sensitivity, 56.06%; Specificity, 78.05%). Further, the

DeLong test indicated that the combination nomogram significantly improved predictive performance over

the SVM model (P=0.0219). The ROC curves of the combination nomogram and the SVM

model for the overall group were shown in Figure 6.

Discussion

We developed and validated a nomogram by using a radiomics approach for preoperative LN status evaluation

in this study. The combination nomogram was constructed by incorporating the SVM score from the

radiomics method and two clinical features: the CA19-9 level and the MR-reported LNM factor. The SVM score

was an LNM probability calculated from the SVM model, which was developed based on five selected

image features. The combination nomogram outperformed the SVM model in both training and validation

groups (AUC, the training group: 0.842 vs. 0.788; the validation group: 0.870 vs. 0.787). Thus, the favorable

predictive power of the proposed non-invasive method makes it a potential tool for preoperative LN status

evaluation in clinical practice.

The manual process of tumor segmentation and the reproducibility of radiomics features are the most

debated aspects of radiomics analysis. Subjectivity can occur in the determination of tumor volume and tumor

boundary. The uncertainties in tumor segmentation adversely affect the reproducibility of

radiomics features [43]. A recent study investigating the robustness and reproducibility of radiomics features

in different MRI sequences suggested that radiomics features extracted from T1-weighted images should be

used with care, and only those reproducible features should be selected in building a radiomics model [44].

In this study, all patients were scanned on the same MRI scanner with the liver acquisition with volume acceleration

(LAVA) sequence. The tumor segmentations were performed by two radiologists independently.

Furthermore, we tested the robustness and reproducibility of image features by using two feature sets

extracted based on the segmentations of the two radiologists. The five selected features and the proposed

SVM model were found to be robust against tumor segmentation.

Note that all five selected image features used in the SVM model were wavelet features. These features

were extracted from images decomposed by undecimated 3D wavelet transforms. The wavelet

transform is a multiscale image analysis method that splits the 3D image data into different

frequency components along three axes. Fine and coarse textures extracted from the wavelet-decomposed

images can further represent the spatial heterogeneity at multiple scales within tumor regions [45]. In

the correlation analysis, three of the five features showed a significant correlation with the actual LN

status (P<0.05). A possible reason is that the wavelet features have underlying associations with

clinicopathology and tumor lymphatic system invasion. This observation is consistent with previous

studies that used wavelet-based features in radiomics models [46-48]. Recently, a study developed a

prediction model to preoperatively differentiate pathological grades in patients with pancreatic

neuroendocrine tumors [46]. The prediction model was constructed using eight image features, seven

of which were wavelet features. Another radiomics study employed machine-learning methods to predict

histologic subtypes in patients with lung cancer; four of the five features included in the model were

wavelet features [48]. These studies suggest that wavelet features are important imaging biomarkers for

predicting tumor phenotype, possibly because they are closely related to the biological behavior of tumors.

In this study, the CA19-9 level served as an independent marker in prognosis stratification in patients with

ICC, which is consistent with previous studies [49]. In 2011, Jiang et al. proposed a clinical-feature-based

prognostic score to accurately predict the prognosis of patients with ICC regardless of resection

status, in which CA 19-9 was the only laboratory marker. It was also used as an independent predictive

factor in prognosis evaluation of ICC patients with partial excision. More importantly, a study reported that

the CA 19-9 level was also associated with tumor progression in ICC [50]. Two recent studies both

reported that an abnormal preoperative CA 19-9 level was valuable in preoperative LN status evaluation

[51, 52]. Similarly, the CA19-9 level was used as an independent predictor in the combination nomogram in

this study, where it improved on the predictive power of the SVM model.

The proposed preoperative LN status prediction model has the potential to assist clinicians in making

effective surgical decisions for patients with ICC. Although a series of studies have reported that LNM is

highly correlated with the prognosis of ICC, the benefit of lymph node dissection (LND) is still controversial

[53, 54]. de Jong et al. found that among patients who underwent routine LND, patients with LNM showed a

worse median survival [55]. Vitale et al. revealed that LND benefited only a subset of patients, with a

moderate survival benefit of about five months [56]. LNM-related prognostic stratification is a significant

clinical problem in the management of ICC patients. Accurate preoperative LN status evaluation represents a

key step in the individualized and precise treatment of ICC patients.

Our study still had several limitations. Firstly, the patient population was collected from a single institution

retrospectively. A total of 106 patients were enrolled in the training group and 42 patients in the validation

group. To evaluate the sample size for the validation group, we performed a power analysis based on the

LNM rates of the training dataset and the validation dataset. Normally, a power value greater than 0.8

suggests a sufficient sample size [57, 58]. Our estimated power value was 0.85 for the current study. Thus,

the sample sizes for the training and the validation groups were considered sufficient to support the results and

conclusions of this study. In the future, we will test the proposed model in a

multi-center cohort with a larger sample size. Secondly, we did not incorporate genomic characteristics in this study.

Recently, a growing number of studies have proposed gene markers, such as VEGF and EGFR, to detect LNM in patients with ICC

[59]. Although combining genomics and radiomics analysis would be interesting,

it has not yet been determined how to incorporate genomic characteristics, image features, and

clinical features together. Thirdly, because the diffusion-weighted imaging (DWI) sequences were altered

several times during the long time span of the study, we used only T1-weighted arterial phase MR images to

mitigate any possible adverse effect caused by the changes in the DWI sequences and to enhance the stability

and robustness of the predictive model.

Conclusions

This study developed and validated a combination nomogram for preoperative LN status evaluation in

ICC patients. The combination nomogram, built from the SVM score, the CA19-9 level, and the MR-reported

LNM factor, showed better prediction accuracy than either the MR-reported LNM factor or the SVM model

alone. The proposed model could be used for individualized LN status evaluation and could help clinicians

guide surgical decisions. Multi-institution retrospective and prospective validation studies should be

conducted before the model is applied to surgical planning in clinical practice.

Abbreviations

ICC: intrahepatic cholangiocarcinoma

LNM: lymph node metastasis

SVM: support vector machine

mRMR: maximum relevance minimum redundancy

LASSO: least absolute shrinkage and selection operator

ROC: receiver operating characteristic

AUC: area under the curve

CI: confidence interval

IQR: interquartile range

VOI: volume of interest

AIC: Akaike information criterion

VIF: variance inflation factor

LND: lymph node dissection

CEA: serum carcinoembryonic antigen

CA 19-9: serum carbohydrate antigen 19-9

Acknowledgements

We thank the staff of the Center of Cryo-Electron Microscopy (CCEM), Zhejiang University, for their technical

assistance.

Sources of Funding

This work was supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No.

LR16F010001, LY17H160010, LY17E050008), National High-tech R&D Program for Young Scientists by

the Ministry of Science and Technology of China (Grant No. 2015AA020917), National Key Research Plan

by the Ministry of Science and Technology of China (Grant No. 2016YFC0104507), Natural Science

Foundation of China (NSFC Grant No. 81871351), Zhejiang Province 151 Talents Program, Zhejiang

University Education Foundation ZJU-Stanford Collaboration Fund, and the Opening Fund of Engineering

Research Center of Cognitive Healthcare of Zhejiang Province.

Contributions

Conception and design: WL, SZ, TN

Development of methodology: LX, PY, TN

Analysis and interpretation of data: LX, PY, WL

Writing, review, and/or revision of the manuscript: LX, PY, WL, LX, MH, TN

Administrative, technical, or material support: LX, PY, WL, WL, WW, CL, JW, ZP, LX, MH, SZ, TN

Competing Interests

The authors have declared that no competing interest exists.

References

1. Mavros MN, Economopoulos KP, Alexiou VG, Pawlik TM. Treatment and prognosis for patients with intrahepatic cholangiocarcinoma: systematic review and meta-analysis. JAMA Surg. 2014; 149: 565-74.
2. Aljiffry M, Abdulelah A, Walsh M, Peltekian K, Alwayn I, Molinari M. Evidence-based approach to cholangiocarcinoma: a systematic review of the current literature. J Am Coll Surg. 2009; 208: 134-47.
3. de Jong MC, Nathan H, Sotiropoulos GC, Paul A, Alexandrescu S, Marques H, et al. Intrahepatic cholangiocarcinoma: an international multi-institutional analysis of prognostic factors and lymph node assessment. J Clin Oncol. 2011; 29: 3140-5.
4. Nakagawa T, Kamiyama T, Kurauchi N, Matsushita M, Nakanishi K, Kamachi H, et al. Number of lymph node metastases is a significant prognostic factor in intrahepatic cholangiocarcinoma. World J Surg. 2005; 29: 728-33.
5. Nanashima A, Sakamoto I, Hayashi T, Tobinaga S, Araki M, Kunizaki M, et al. Preoperative diagnosis of lymph node metastasis in biliary and pancreatic carcinomas: evaluation of the combination of multi-detector CT and serum CA19-9 level. Dig Dis Sci. 2010; 55: 3617-26.
6. Noji T, Kondo S, Hirano S, Tanaka E, Suzuki O, Shichinohe T. Computed tomography evaluation of regional lymph node metastases in patients with biliary cancer. Br J Surg. 2008; 95: 92-6.
7. Chen Y, Zeng Z, Tang Z, Fan J, Zhou J, Jiang W, et al. Prediction of the lymph node status in patients with intrahepatic cholangiocarcinoma: analysis of 320 surgical cases. Front Oncol. 2011; 42: 1-6.
8. Kattan MW. Judging new markers by their ability to improve predictive accuracy. J Natl Cancer Inst. 2003; 95: 634-5.
9. Seo S, Hatano E, Higashi T, Nakajima A, Nakamoto Y, Tada M, et al. Fluorine-18 fluorodeoxyglucose positron emission tomography predicts lymph node metastasis, P-glycoprotein expression, and recurrence after resection in mass-forming intrahepatic cholangiocarcinoma. Surgery. 2008; 143: 769-77.
10. Songthamwat M, Chamadol N, Khuntikeo N, Thinkhamrop J, Koonmee S, Chaichaya N, et al. Evaluating a preoperative protocol that includes magnetic resonance imaging for lymph node metastasis in the Cholangiocarcinoma Screening and Care Program (CASCAP) in Thailand. World J Surg Oncol. 2017; 15: 176.
11. Caudell JJ, Torres-Roca JF, Gillies RJ, Enderling H, Kim S, Rishi A, et al. The future of personalised radiotherapy for head and neck cancer. Lancet Oncol. 2017; 18: e266-73.
12. Lambin P, Leijenaar RT, Deist TM, Peerlings J, de Jong EE, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017; 14: 749-62.
13. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012; 48: 441-6.
14. Limkin E, Sun R, Dercle L, Zacharaki E, Robert C, Reuzé S, et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann Oncol. 2017; 28: 1191-206.
15. Verma V, Simone CB, Krishnan S, Lin SH, Yang J, Hahn SM. The rise of radiomics and implications for oncologic management. J Natl Cancer Inst. 2017; 109: djx055.
16. Wu S, Zheng J, Li Y, Yu H, Shi S, Xie W, et al. A radiomics nomogram for the preoperative prediction of lymph node metastasis in bladder cancer. Clin Cancer Res. 2017; 23: 6904-11.
17. Huang Y, Liang C, He L, Tian J, Liang C, Chen X, et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J Clin Oncol. 2016; 34: 2157-64.
18. Wu S, Zheng J, Li Y, Wu Z, Shi S, Huang M, et al. Development and validation of an MRI-based radiomics signature for the preoperative prediction of lymph node metastasis in bladder cancer. EBioMedicine. 2018; 34: 76-84.
19. Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage. 2006; 31: 1116-28.
20. Amadasun M, King R. Textural features corresponding to textural properties. IEEE Trans Syst Man Cybern. 1989; 19: 1264-74.
21. Bocchino C, Carabellese A, Caruso T, Della Sala G, Ricart S, Spinella A. Use of gray value distribution of run lengths for texture analysis. Pattern Recognit Lett. 1990; 11: 415-9.
22. Dasarathy BV, Holder EB. Image characterizations based on joint gray level-run length distributions. Pattern Recognit Lett. 1991; 12: 497-502.
23. Galloway MM. Texture analysis using gray level run lengths. Comput Graph Image Process. 1975; 4: 172-9.
24. Halabi S, Small EJ, Kantoff PW, Kattan MW, Kaplan EB, Dawson NA, et al. Prognostic model for predicting survival in men with hormone-refractory metastatic prostate cancer. J Clin Oncol. 2003; 21: 1232-7.
25. Thibault G, Fertil B, Navarro C, Pereira S, Levy N, Sequeira J, et al. Texture indexes and gray level size zone matrix: application to cell nuclei classification. Pattern Recognition Inf Process. 2009; 140-5.
26. Wu J, Aguilera T, Shultz D, Gudur M, Rubin DL, Loo Jr BW, et al. Early-stage non-small cell lung cancer: quantitative imaging characteristics of 18F fluorodeoxyglucose PET/CT allow prediction of distant metastasis. Radiology. 2016; 281: 270-8.
27. Mukaka MM. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012; 24: 69-71.
28. Unler A, Murat A, Chinnam RB. mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf Sci. 2011; 181: 4625-41.
29. Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res. 2004; 5: 1205-24.
30. Coroller TP, Grossmann P, Hou Y, Velazquez ER, Leijenaar RT, Hermann G, et al. CT-based radiomic signature predicts distant metastasis in lung adenocarcinoma. Radiother Oncol. 2015; 114: 345-50.
31. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts HJ. Machine learning methods for quantitative radiomic biomarkers. Sci Rep. 2015; 5: 13087.
32. Zhang B, He X, Ouyang F, Gu D, Dong Y, Zhang L, et al. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett. 2017; 403: 21-7.
33. Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cutoff point. Biom J. 2005; 47: 458-72.
34. Youden WJ. Index for rating diagnostic tests. Cancer. 1950; 3: 32-5.
35. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985; 39: 783-91.
36. Wu Y, Xu L, Yang P, Lin N, Huang X, Pan WB, et al. Survival prediction in high-grade osteosarcoma using radiomics of diagnostic computed tomography. EBioMedicine. 2018; 34: 27-34.
37. O'Brien RM. A caution regarding rules of thumb for variance inflation factors. Qual Quant. 2007; 41: 673-90.
38. Kramer AA, Zimmerman JE. Assessing the calibration of mortality benchmarks in critical care: the Hosmer-Lemeshow test revisited. Crit Care Med. 2007; 35: 2052-6.
39. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988; 44: 837-45.
40. Demler OV, Pencina MJ, D'Agostino RB, Sr. Misuse of DeLong test to compare AUCs for nested models. Stat Med. 2012; 31: 2577-87.
41. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006; 26: 565-74.
42. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016; 352: i6.
43. Polan DF, Brady SL, Kaufman RA. Tissue segmentation of computed tomography images using a Random Forest algorithm: a feasibility study. Phys Med Biol. 2016; 61: 6553-69.
44. Baeßler B, Weiss K, Pinto DDS. Robustness and reproducibility of radiomics in magnetic resonance imaging: a phantom study. Invest Radiol. 2018; 54: 221-8.
45. Kassner A, Thornhill R. Texture analysis: a review of neurologic MR imaging applications. AJNR Am J Neuroradiol. 2010; 31: 809-16.
46. Liang W, Yang P, Huang R, Xu L, Wang J, Liu W, et al. A combined nomogram model to preoperatively predict histologic grade in pancreatic neuroendocrine tumors. Clin Cancer Res. 2019; 25: 584-94.
47. Wilson R, Devaraj A. Radiomics of pulmonary nodules and lung cancer. Transl Lung Cancer Res. 2017; 6: 86-91.
48. Wu W, Parmar C, Grossmann P, Quackenbush J, Lambin P, Bussink J, et al. Exploratory study to identify radiomics classifiers for lung cancer histology. Front Oncol. 2016; 6: 71.
49. Jiang W, Zeng Z, Tang Z, Fan J, Sun H, Zhou J, et al. A prognostic scoring system based on clinical features of intrahepatic cholangiocarcinoma: the Fudan score. Ann Oncol. 2011; 22: 1644-52.
50. Bergquist JR, Ivanics T, Storlie CB, Groeschl RT, Tee MC, Habermann EB, et al. Implications of CA19‐9 elevation for survival, staging, and treatment sequencing in intrahepatic cholangiocarcinoma: a national cohort analysis. J Surg Oncol. 2016; 114: 475-82.
51. Yamada T, Nakanishi Y, Okamura K, Tsuchikawa T, Nakamura T, Noji T, et al. Impact of serum carbohydrate antigen 19‐9 level on prognosis and prediction of lymph node metastasis in patients with intrahepatic cholangiocarcinoma. J Gastroenterol Hepatol. 2018; 33: 1626-33.
52. Meng Z, Lin X, Zhu J, Han S, Chen Y. A nomogram to predict lymph node metastasis before resection in intrahepatic cholangiocarcinoma. J Surg Res. 2018; 226: 56-63.
53. Weber SM, Ribero D, O'Reilly EM, Kokudo N, Miyazaki M, Pawlik TM. Intrahepatic cholangiocarcinoma: expert consensus statement. HPB. 2015; 17: 669-80.
54. Adachi T, Eguchi S. Lymph node dissection for intrahepatic cholangiocarcinoma: a critical review of the literature to date. J Hepatobiliary Pancreat Sci. 2014; 21: 162-8.
55. de Jong MC, Nathan H, Sotiropoulos GC, Paul A, Alexandrescu S, Marques H, et al. Intrahepatic cholangiocarcinoma: an international multi-institutional analysis of prognostic factors and lymph node assessment. J Clin Oncol. 2011; 29: 3140-5.
56. Vitale A, Moustafa M, Spolverato G, Gani F, Cillo U, Pawlik TM. Defining the possible therapeutic benefit of lymphadenectomy among patients undergoing hepatic resection for intrahepatic cholangiocarcinoma. J Surg Oncol. 2016; 113: 685-91.
57. Bock J. Power and Sample Size Calculations. New York, USA: Springer; 2001: 309-333.
58. Wei J, Yang G, Hao X, Gu D, Tan Y, Wang X, et al. A multi-sequence and habitat-based MRI radiomics signature for preoperative prediction of MGMT promoter methylation in astrocytomas with prognostic implication. Eur Radiol. 2018; 29: 877-88.
59. Yoshikawa D, Ojima H, Iwasaki M, Hiraoka N, Kosuge T, Kasai S, et al. Clinicopathological and prognostic significance of EGFR, VEGF, and HER2 expression in cholangiocarcinoma. Br J Cancer. 2008; 98: 418-25.

Tables

Table 1

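Table 1. Patients and preoperative clinical features

| Clinical features | Training group (n=106): LNM | Training group: non-LNM | P | Validation group (n=42): LNM | Validation group: non-LNM | P |
|---|---|---|---|---|---|---|
| Age (Mean ± SD) | 58.02 ± 10.54 | 60.05 ± 8.52 | 0.2755 | 55.93 ± 15.25 | 60.34 ± 7.17 | 0.4662 |
| Range | (35, 77) | (39, 86) | | (40, 80) | (43, 76) | |
| Gender | | | 0.8450 | | | 0.5547 |
| Male | 23 | 30 | | 5 | 8 | |
| Female | 24 | 29 | | 14 | 15 | |
| Primary hepatic lobe site | | | 0.6683 | | | 0.2479 |
| Left | 30 | 40 | | 14 | 13 | |
| Right | 17 | 19 | | 5 | 10 | |
| Number of the primary tumors | | | 0.0013 | | | 0.1536 |
| Single | 30 | 53 | | 12 | 19 | |
| Multiple | 17 | 6 | | 7 | 4 | |
| Hepatitis | | | 0.7065 | | | 0.6179 |
| Without | 35 | 42 | | 15 | 24 | |
| With | 12 | 17 | | 4 | 9 | |
| Cirrhosis | | | 0.4274 | | | 0.6673 |
| Without | 46 | 56 | | 18 | 21 | |
| With | 1 | 3 | | 1 | 2 | |
| Cholelithiasis | | | 0.2506 | | | 0.7030 |
| Without | 38 | 42 | | 15 | 17 | |
| With | 9 | 17 | | 4 | 6 | |
| CA19-9 | | | 0.0086 | | | 0.0339 |
| Normal | 10 | 27 | | 7 | 16 | |
| Abnormal | 37 | 32 | | 12 | 7 | |
| CEA | | | 0.0674 | | | 0.0143 |
| Normal | 29 | 46 | | 10 | 20 | |
| Abnormal | 18 | 13 | | 9 | 3 | |
| MR-reported LNM | | | 0.0012 | | | 0.0251 |
| Negative | 17 | 40 | | 5 | 14 | |
| Positive | 30 | 19 | | 14 | 9 | |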

Note: LNM, lymph node metastasis; CA19-9, serum carbohydrate antigen 19-9; CEA, serum carcinoembryonic antigen; SD, standard deviation.

Table 2

Table 2. Univariate analysis and correlation test for radiomics features used in the SVM model for the training group

| Radiomics features | Training group (n=106): LNM | Training group: non-LNM | P | Correlation coefficient | P |
|---|---|---|---|---|---|
| HLH_GLCM_maxpr | 0.2854 (0.2651 to 0.3175) | 0.2665 (0.2425 to 0.2805) | 0.0164 | 0.2343 | 0.0156 |
| LLH_GLCM_sosvh | 1.0462 (0.9128 to 1.1260) | 1.1206 (0.9829 to 1.1929) | 0.0963 | -0.1623 | 0.0965 |
| HLL_GLCM_corrm | -0.0178 (-0.0212 to -0.0152) | -0.0146 (-0.0175 to -0.0115) | 0.0629 | -0.1815 | 0.0626 |
| LLL_GLCM_denth | 2.7902 (2.7389 to 2.8908) | 2.9404 (2.8816 to 2.9863) | 0.0014 | -0.3112 | 0.0012 |
| HLL_GLSZM_LGZE | 0.0013 (0.0010 to 0.0014) | 0.0018 (0.0014 to 0.0023) | 0.0028 | 0.2920 | 0.0024 |

Note: The univariate analysis of the radiomics features was performed using the Mann-Whitney U test.

The correlation between the radiomics features and the LN status was assessed using the Spearman rank correlation test.

All features are reported as median and 95% confidence interval.
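
As an illustration of the correlation test named in the note, the following minimal sketch computes the Spearman rank correlation coefficient between one feature and the binary LN status. The feature values and labels are hypothetical, the P value is not computed, and the code is not the analysis pipeline used in the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical feature values and LN status (1 = LNM, 0 = non-LNM).
feature = [2.75, 2.79, 2.83, 2.90, 2.94, 2.97]
ln_status = [1, 1, 1, 0, 1, 0]
rho = spearman_rho(feature, ln_status)
```

With these toy values the coefficient is negative, meaning that lower feature values tend to go with LNM.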

Table 3

Table 3. Performances of the SVM model, combination nomogram and MR-reported LNM

| Model | Group | Accuracy | AUC (95% CI) | Sensitivity | Specificity | S.E. |
|---|---|---|---|---|---|---|
| SVM score | Training | 73.58% | 0.788 (0.698 - 0.862) | 65.96% | 79.66% | 0.0441 |
| SVM score | Validation | 69.05% | 0.787 (0.634 - 0.898) | 52.63% | 91.30% | 0.0695 |
| Combination model | Training | 72.64% | 0.842 (0.758 - 0.906) | 89.36% | 57.63% | 0.0387 |
| Combination model | Validation | 78.57% | 0.870 (0.730 - 0.953) | 89.47% | 69.57% | 0.0540 |
| MR-reported LNM | Training | 66.04% | 0.658 (0.560 - 0.748) | 63.83% | 67.80% | 0.0469 |
| MR-reported LNM | Validation | 66.67% | 0.673 (0.511 - 0.809) | 73.68% | 60.87% | 0.0735 |

Note: SVM, support vector machine; S.E., standard error; CI, confidence interval.
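
As a worked check of how accuracy, sensitivity, and specificity are defined, the MR-reported LNM rows of Table 3 can be reproduced from the MR-reported LNM counts in Table 1. The function name below is ours, chosen for illustration only.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    total = tp + fn + fp + tn
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return accuracy, sensitivity, specificity


# MR-reported LNM counts taken from Table 1.
# Training: LNM patients 30 positive / 17 negative; non-LNM 19 positive / 40 negative.
training = diagnostic_metrics(tp=30, fn=17, fp=19, tn=40)
# Validation: LNM patients 14 positive / 5 negative; non-LNM 9 positive / 14 negative.
validation = diagnostic_metrics(tp=14, fn=5, fp=9, tn=14)

print("training:   %.2f%% %.2f%% %.2f%%" % training)
print("validation: %.2f%% %.2f%% %.2f%%" % validation)
```

The printed values (66.04%, 63.83%, 67.80% for training; 66.67%, 73.68%, 60.87% for validation) agree with the MR-reported LNM entries of Table 3.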

Figures and legends

Figure 1

Figure 1. Workflow of this study. The letters k and n are used to number the patients.

Figure 2

Figure 2. The ROC curves of the SVM model in the training group (A) and the validation group (B). The scatter plots of the SVM scores in the training group (C) and the validation group (D). The blue markers indicate patients with synchronous LNM; the red markers indicate patients with non-LNM. The black horizontal line represents the threshold. Patients with SVM scores higher than 0.4915 are classified as LNM; patients with scores lower than 0.4915 are classified as non-LNM.
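
The thresholding rule in this caption can be sketched as follows. How the cutoff was chosen is not stated in the caption; the sketch assumes it was selected by maximizing the Youden index (refs 33, 34 in the reference list), and the scores and labels are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is called LNM when score > cutoff, matching the caption's rule.
    labels: 1 for LNM, 0 for non-LNM.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:          # keep the lowest cutoff among ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical SVM scores (LNM probabilities) and true LN status.
scores = [0.12, 0.30, 0.41, 0.47, 0.52, 0.58, 0.66, 0.81]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(scores, labels)
predicted = [1 if s > cutoff else 0 for s in scores]
```

In practice the cutoff would be fixed on the training group and then applied unchanged to the validation group, as the single threshold of 0.4915 in both scatter plots suggests.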

Figure 3

Figure 3. The combination nomogram, combining SVM score, CA 19-9 level, and the MR-reported LNM factor.

Figure 4

Figure 4. The ROC curves of the combination nomogram in the training group (A) and the validation group (B). The scatter plots of the nomogram scores in the training group (C) and the validation group (D). The blue markers indicate patients with synchronous LNM; the red markers indicate patients with non-LNM. The black horizontal line represents the threshold. Patients with nomogram scores higher than -0.8270 are classified as LNM; patients with scores lower than -0.8270 are classified as non-LNM.
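
A negative threshold is natural if the nomogram score is the linear predictor (log-odds) of a logistic regression model. That reading is an assumption here, since the formula itself is given in Supplementary section Ⅶ and not in this caption. Under that assumption, the sketch below converts the reported cutoff into a probability and applies the classification rule; the example scores are hypothetical.

```python
import math


def logit_to_probability(score):
    """Inverse logit: map a log-odds score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


NOMOGRAM_CUTOFF = -0.8270  # threshold reported in the Figure 4 caption


def classify(score, cutoff=NOMOGRAM_CUTOFF):
    """Return 'LNM' for scores above the cutoff, otherwise 'non-LNM'."""
    return "LNM" if score > cutoff else "non-LNM"


cutoff_probability = logit_to_probability(NOMOGRAM_CUTOFF)
# Hypothetical nomogram scores for two patients.
calls = [classify(-1.50), classify(0.25)]
```

Under the log-odds assumption, the cutoff of -0.8270 corresponds to a predicted LNM probability of roughly 0.30.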

Figure 5

Figure 5. The calibration curves of the combination nomogram in the training group (A) and the validation group (B). Vertical axis: the actual LNM probability; horizontal axis: the nomogram-predicted LNM probability; the diagonal line: the perfect prediction, with predicted LNM probabilities equal to the actual LNM probabilities. The decision curves of the SVM model and the combination nomogram in the training group (C) and the validation group (D). Vertical axis: the net benefit; horizontal axis: the threshold probability, ranging from 0.0 to 1.0. The red and black dotted lines represent the decision curves of the combination nomogram and the SVM model, respectively. The gray line represents the decision curve under the assumption that all patients have LNM; the black line represents the decision curve under the assumption that no patients have LNM.
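
For readers unfamiliar with decision curves, the net benefit on the vertical axis follows the standard definition of Vickers and Elkin [41]: net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The sketch below evaluates it at a single, arbitrarily chosen threshold (pt = 0.30, an assumption) for the treat-all and treat-none strategies and, for illustration, for the MR-reported LNM factor using the validation counts in Table 1. It does not reproduce the plotted curves.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    return tp / n - fp / n * threshold / (1.0 - threshold)


n, lnm, non_lnm = 42, 19, 23   # validation group sizes from Table 1
pt = 0.30                      # illustrative threshold probability (assumption)

# Treat-all: every patient is called LNM, so TP = all LNM and FP = all non-LNM.
treat_all = net_benefit(tp=lnm, fp=non_lnm, n=n, threshold=pt)
# Treat-none: nobody is called LNM, so the net benefit is zero by definition.
treat_none = net_benefit(tp=0, fp=0, n=n, threshold=pt)
# MR-reported LNM in the validation group (Table 1): 14 true and 9 false positives.
mr_reported = net_benefit(tp=14, fp=9, n=n, threshold=pt)
```

A model adds clinical value at a given threshold only where its net benefit exceeds both the treat-all and the treat-none values.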

Figure 6

Figure 6. ROC curves for the SVM model and combination nomogram in the overall group.

Graphical abstract

Supplementary Material

A radiomics approach based on support vector machine using MR images for preoperative lymph

node status evaluation in intrahepatic cholangiocarcinoma

Authors

Lei Xu*, Pengfei Yang*, Wenjie Liang*, Weihai Liu, Weigen Wang, Chen Luo, Jing Wang, Zhiyi Peng†, Lei

Xing, Mi Huang†, Shusen Zheng†, Tianye Niu†

Ⅰ. The patient inclusion and exclusion criteria

Ⅱ. MR acquisition parameters

Ⅲ. Determination of hepatitis B, number of the primary tumors, and the MR-reported LNM factor

Ⅳ. The feature set developed in this study

Ⅴ. The detailed descriptions of clinical net benefit, the “treat-all plan”, and the “treat-none plan”

Ⅵ. Demographic comparison of baseline clinical features between the training and validation groups

Ⅶ. Calculation formulas for SVM model and combination nomogram

Ⅷ. Predictive performances of different feature selection methods

Ⅸ. Histograms regarding the distributions of AUCs for the SVM model and combination nomogram

Ⅹ. The multivariable analysis for model construction

Ⅰ. The patient inclusion and exclusion criteria

The patient inclusion criteria were as follows: (1) The patient underwent surgical resection with

pathologically confirmed ICC. (2) Lymph node dissection was performed during the operation. (3) A T1-weighted

contrast-enhanced MRI scan was performed within one month before the operation. (4) Preoperative clinical

records were complete.

The patient exclusion criteria were as follows: (1) The disease was diagnosed as mixed

hepatocellular cholangiocarcinoma. (2) The patient underwent chemotherapy before the contrast-enhanced MRI

scan.

Figure S1. Patient data inclusion/exclusion pathway

Ⅱ. MR acquisition parameters

All patients underwent preoperative abdominal T1-weighted contrast-enhanced MRI scans. The MRI scans

were performed on a 3.0 T MRI scanner (GE Medical Systems, Milwaukee, USA) with the liver acceleration

volume acquisition (LAVA) sequence. With the abdominal coil, the scan range extended from the

diaphragmatic dome to the lower pole of the kidney. The contrast-enhanced MR imaging acquisition

parameters were as follows: flip angle: 10°; repetition time: 2.75-3.20 ms; echo time: 1.25-1.50 ms; reverse time: 5

ms; bandwidth: 390.63-488.28 kHz; field of view: 380×304 mm; pixel spacing: 0.7031-0.8203 mm; slice

thickness: 4-6 mm. Each patient received a rapid intravenous injection of 15 ml of gadopentetate dimeglumine

(2.5 ml/s). The arterial phase was acquired 14 s after the injection, and the portal venous phase and delayed phase at 55 s and

120 s after the injection, respectively. Finally, we used the T1-weighted arterial phase enhanced MR images

in this radiomics study.

Ⅲ. Determination of hepatitis B, number of the primary tumors, and the MR-reported LNM factor

All patients involved in this study were diagnosed with chronic hepatitis B. We divided the patients into a

cirrhosis group (F4) and a non-cirrhosis group (F0-3) according to their clinical and liver imaging

characteristics. Because liver biopsy data were unavailable, we did not perform accurate liver fibrosis grading.

We used the term “number of the primary tumors (single or multiple)” to refer to the

number of solid primary tumors. “Single” refers to cases with only 1 solid tumor, while “multiple” refers

to cases with more than 2 solid tumors.

The MR-reported LNM factor was defined by agreement of 3 radiologists based on the preoperative MR

images. A regional LN with a maximum short-axis diameter ≥ 10 mm, and/or a hyperintense lymph node

margin on diffusion MRI images, and/or marginal enhancement was scored as positive LNM,

while the absence of enlargement, margin hyperintensity, and marginal enhancement was scored as

negative LNM, consistent with the LN status evaluation criteria used in most previous studies.

Ⅳ. The feature set developed in this study

In this study, a total of 491 image features were extracted for each patient. These features comprised

four groups; the detailed descriptions of the image features are provided in Tables S1-S4. The histogram

statistics features describe the statistical distribution of voxel intensities within the tumors. The geometry

features describe the 3D volume and shape characteristics of the tumors. The texture features describe the

spatial intensity correlation and distribution of the voxels.

The gray-level co-occurrence matrix (GLCM) is an $N_g \times N_g$ matrix defined as $C(m, n; \delta, \alpha)$, where $m$ and $n$

represent gray levels, $\delta(dx, dy)$ is the given distance, and $\alpha$ indicates the direction, which has 13

possible values in three dimensions. The entry $C(m, n)$ counts how often the gray

levels $m$ and $n$ co-occur at the given distance and direction. $N_g$ represents the maximum gray level value within the volume of interest (VOI). The gray-level

run length matrix (GLRLM) is used to quantify gray-level runs within the VOI. It is an $N_g \times N_g$

matrix defined as $R(m, n \mid \theta)$. The element $R(m, n)$ describes how often the VOI includes a

run of length $m$ consisting of points of gray level $n$ in the direction $\theta$. The gray level size zone

matrix (GLSZM) is used to quantify size zones within the VOI. It is an $N_g \times N_g$ matrix defined as

$S(m, n)$. The element $S(m, n)$ specifies the frequency of zones of size $n$ with gray level $m$. The

neighborhood gray-tone difference matrix (NGTDM) is a column matrix of length $N_g$. Each entry is the sum of the absolute

differences between the central voxels of one gray level and the average value of their neighborhoods. The neighborhood is

defined by a distance of 2 voxels. The detailed feature names and abbreviations are presented

below. The feature extraction program was implemented in MATLAB (Version 2017b; MathWorks,

Natick, MA, USA).
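
To make the GLCM definition concrete, the following minimal sketch (in Python, not the MATLAB program used in the study) builds a normalized co-occurrence matrix for one 3D offset and derives the maximum probability feature (maxpr). Counting each voxel pair in both orders, ignoring the VOI mask, and the toy volume are all assumptions made for illustration.

```python
def glcm_3d(volume, offset, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one 3D offset.

    volume: nested list volume[z][y][x] of integer gray levels in [0, levels).
    offset: (dz, dy, dx) displacement defining distance and direction.
    """
    dz, dy, dx = offset
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    counts = [[0] * levels for _ in range(levels)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                z2, y2, x2 = z + dz, y + dy, x + dx
                if 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx:
                    m, n = volume[z][y][x], volume[z2][y2][x2]
                    counts[m][n] += 1
                    counts[n][m] += 1   # count the pair in both orders
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


# The 13 unique directions among the 26 neighbors (one per opposite pair).
DIRECTIONS = [(dz, dy, dx)
              for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dz, dy, dx) > (0, 0, 0)]

# Hypothetical 2x2x2 volume quantized to 2 gray levels.
volume = [[[0, 0], [1, 1]],
          [[0, 1], [1, 1]]]
p = glcm_3d(volume, offset=(0, 0, 1), levels=2)
max_probability = max(max(row) for row in p)   # the "maxpr" feature
```

In a full pipeline the matrix would typically be computed for each of the 13 directions and the resulting features averaged or pooled; how the directions were combined in this study is not stated here.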

Table S1 Histogram feature

| Feature type | Feature names and abbreviations |
|---|---|
| Histogram feature | Variance, Skewness, Kurtosis, Mean, Energy, Entropy |

Table S2 Geometry features

| Feature type | Feature names and abbreviations |
|---|---|
| Geometry feature | Max Diameter, Uniformity, Surface Volume Ratio (SVR), Compactness1 (Cpt1), Compactness2 (Cpt2), Surface Area, Spherical Disproportion (SphDisp), Sphericity |

Table S3 Texture features

| Feature type | Feature names and abbreviations |
|---|---|
| GLCM (Grey-level co-occurrence matrix) | Autocorrelation (autoc), Contrast (contr), Correlation (corrm), Correlation2 (corrp), Cluster Prominence (cprom), Cluster Shade (cshad), Dissimilarity (dissi), Energy (energ), Entropy (entro), Homogeneity (homom), Homogeneity2 (homop), Maximum probability (maxpr), Sum of squares Variance (sosvh), Sum average (savgh), Sum variance (svarh), Sum entropy (senth), Difference variance (dvarh), Difference entropy (denth), Information measure of correlation1 (inf1h), Information measure of correlation2 (inf2h), Inverse difference normalized (INN) (indnc), Inverse difference moment normalized (idmnc) |
| GLRLM (Grey-level run-length matrix) | Short Run Emphasis (SRE), Long Run Emphasis (LRE), Grey-Level Non-uniformity (GLN), Run-Length Non-uniformity (RLN), Run Percentage (RP), Low Grey-Level Run Emphasis (LGRE), High Grey-Level Run Emphasis (HGRE), Short Run Low Grey-Level Emphasis (SRLGE), Short Run High Grey-Level Emphasis (SRHGE), Long Run Low Grey-Level Emphasis (LRLGE), Long Run High Grey-Level Emphasis (LRHGE), Grey-Level Variance (GLV), Run-Length Variance (RLV) |
| GLSZM (Grey-level size zone matrix) | Small Zone Emphasis (SZE), Large Zone Emphasis (LZE), Grey-Level Non-uniformity (GLN), Zone-Size Non-uniformity (ZSN), Zone Percentage (ZP), Low Grey-Level Zone Emphasis (LGZE), High Grey-Level Zone Emphasis (HGZE), Small Zone Low Grey-Level Emphasis (SZLGE), Small Zone High Grey-Level Emphasis (SZHGE), Large Zone Low Grey-Level Emphasis (LZLGE), Large Zone High Grey-Level Emphasis (LZHGE), Grey-Level Variance (GLV), Zone-Size Variance (ZSV) |
| NGTDM (Neighbourhood grey-tone difference matrix) | Coarseness, Contrast, Busyness, Complexity, Strength |

Wavelet features: We used the discrete undecimated wavelet transform to decompose the original images. The high-pass and low-pass wavelet functions were applied along the three axes, so that the original image was decomposed into eight decompositions. We denote the original 3D image as G, the high-pass wavelet function as H and the low-pass wavelet function as L. The decompositions can then be expressed as G_LLL, G_LLH, G_LHL, G_HLL, G_LHH, G_HLH, G_HHL and G_HHH. Specifically, the decomposition G_LHL indicates that the original image was processed with a low-pass filter, a high-pass filter and a low-pass filter along the x-axis, y-axis and z-axis, respectively. Histogram and textural features were then extracted again from these 8 decomposed images. A total of 424 features were obtained through the wavelet transform. The filters used for the wavelet transform satisfy the perfect reconstruction conditions.
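The eight-way decomposition can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions, not the authors' implementation: the supplement does not name the wavelet, so an undecimated Haar-like filter pair is assumed here; array axes 0, 1 and 2 are assumed to correspond to the x-, y- and z-axes; and the boundary is treated as periodic. The function names are hypothetical.

```python
import itertools
import numpy as np

def haar_filter(volume, axis, band):
    """Undecimated Haar-like filter along one axis (periodic boundary assumed)."""
    shifted = np.roll(volume, 1, axis=axis)
    if band == "L":
        return (volume + shifted) / 2.0  # low-pass: local average
    return (volume - shifted) / 2.0      # high-pass: local difference

def decompose(volume):
    """Return the 8 decompositions keyed 'LLL', 'LLH', ..., 'HHH' (x, y, z order)."""
    out = {}
    for bands in itertools.product("LH", repeat=3):
        filtered = np.asarray(volume, dtype=float)
        for axis, band in enumerate(bands):
            filtered = haar_filter(filtered, axis, band)
        out["".join(bands)] = filtered
    return out

# Hypothetical 3D image
rng = np.random.default_rng(0)
vol = rng.random((4, 5, 6))
bands = decompose(vol)
```

Because no downsampling is applied, every decomposition keeps the shape of the original image. With this particular filter pair the low-pass and high-pass outputs along each axis add up to the input, so the eight decompositions sum back to the original volume, which is one simple way to see a perfect reconstruction property.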

Table S4 Wavelet features

Wavelet type Feature names

LLL (low, low, low): LLL_GLCM_autoc, LLL_GLCM_contr, LLL_GLCM_corrm, LLL_GLCM_corrp, LLL_GLCM_cprom, LLL_GLCM_cshad, LLL_GLCM_dissi, LLL_GLCM_energ, LLL_GLCM_entro, LLL_GLCM_homom, LLL_GLCM_homop, LLL_GLCM_maxpr, LLL_GLCM_sosvh, LLL_GLCM_savgh, LLL_GLCM_svarh, LLL_GLCM_senth, LLL_GLCM_dvarh, LLL_GLCM_denth, LLL_GLCM_inf1h, LLL_GLCM_inf2h, LLL_GLCM_indnc, LLL_GLCM_idmnc, LLL_GLRLM_SRE, LLL_GLRLM_LRE, LLL_GLRLM_GLN, LLL_GLRLM_RLN, LLL_GLRLM_RP, LLL_GLRLM_LGRE, LLL_GLRLM_HGRE, LLL_GLRLM_SRLGE, LLL_GLRLM_SRHGE, LLL_GLRLM_LRLGE, LLL_GLRLM_LRHGE, LLL_GLRLM_GLV, LLL_GLRLM_RLV, LLL_GLSZM_SZE, LLL_GLSZM_LZE, LLL_GLSZM_GLN, LLL_GLSZM_ZSN, LLL_GLSZM_ZP, LLL_GLSZM_LGZE, LLL_GLSZM_HGZE, LLL_GLSZM_SZLGE, LLL_GLSZM_SZHGE, LLL_GLSZM_LZLGE, LLL_GLSZM_LZHGE, LLL_GLSZM_GLV, LLL_GLSZM_ZSV, LLL_Coarseness, LLL_Contrast, LLL_Busyness, LLL_Complexity, LLL_Strength

LLH (low, low, high): LLH_GLCM_autoc, LLH_GLCM_contr, LLH_GLCM_corrm, LLH_GLCM_corrp, LLH_GLCM_cprom, LLH_GLCM_cshad, LLH_GLCM_dissi, LLH_GLCM_energ, LLH_GLCM_entro, LLH_GLCM_homom, LLH_GLCM_homop, LLH_GLCM_maxpr, LLH_GLCM_sosvh, LLH_GLCM_savgh, LLH_GLCM_svarh, LLH_GLCM_senth, LLH_GLCM_dvarh, LLH_GLCM_denth, LLH_GLCM_inf1h, LLH_GLCM_inf2h, LLH_GLCM_indnc, LLH_GLCM_idmnc, LLH_GLRLM_SRE, LLH_GLRLM_LRE, LLH_GLRLM_GLN, LLH_GLRLM_RLN, LLH_GLRLM_RP, LLH_GLRLM_LGRE, LLH_GLRLM_HGRE, LLH_GLRLM_SRLGE, LLH_GLRLM_SRHGE, LLH_GLRLM_LRLGE, LLH_GLRLM_LRHGE, LLH_GLRLM_GLV, LLH_GLRLM_RLV, LLH_GLSZM_SZE, LLH_GLSZM_LZE, LLH_GLSZM_GLN, LLH_GLSZM_ZSN, LLH_GLSZM_ZP, LLH_GLSZM_LGZE, LLH_GLSZM_HGZE, LLH_GLSZM_SZLGE, LLH_GLSZM_SZHGE, LLH_GLSZM_LZLGE, LLH_GLSZM_LZHGE, LLH_GLSZM_GLV, LLH_GLSZM_ZSV, LLH_Coarseness, LLH_Contrast, LLH_Busyness, LLH_Complexity, LLH_Strength

LHL (low, high, low): LHL_GLCM_autoc, LHL_GLCM_contr, LHL_GLCM_corrm, LHL_GLCM_corrp, LHL_GLCM_cprom, LHL_GLCM_cshad, LHL_GLCM_dissi, LHL_GLCM_energ, LHL_GLCM_entro, LHL_GLCM_homom, LHL_GLCM_homop, LHL_GLCM_maxpr, LHL_GLCM_sosvh, LHL_GLCM_savgh, LHL_GLCM_svarh, LHL_GLCM_senth, LHL_GLCM_dvarh, LHL_GLCM_denth, LHL_GLCM_inf1h, LHL_GLCM_inf2h, LHL_GLCM_indnc, LHL_GLCM_idmnc, LHL_GLRLM_SRE, LHL_GLRLM_LRE, LHL_GLRLM_GLN, LHL_GLRLM_RLN, LHL_GLRLM_RP, LHL_GLRLM_LGRE, LHL_GLRLM_HGRE, LHL_GLRLM_SRLGE, LHL_GLRLM_SRHGE, LHL_GLRLM_LRLGE, LHL_GLRLM_LRHGE, LHL_GLRLM_GLV, LHL_GLRLM_RLV, LHL_GLSZM_SZE, LHL_GLSZM_LZE, LHL_GLSZM_GLN, LHL_GLSZM_ZSN, LHL_GLSZM_ZP, LHL_GLSZM_LGZE, LHL_GLSZM_HGZE, LHL_GLSZM_SZLGE, LHL_GLSZM_SZHGE, LHL_GLSZM_LZLGE, LHL_GLSZM_LZHGE, LHL_GLSZM_GLV, LHL_GLSZM_ZSV, LHL_Coarseness, LHL_Contrast, LHL_Busyness, LHL_Complexity, LHL_Strength

HLL (high, low, low): HLL_GLCM_autoc, HLL_GLCM_contr, HLL_GLCM_corrm, HLL_GLCM_corrp, HLL_GLCM_cprom, HLL_GLCM_cshad, HLL_GLCM_dissi, HLL_GLCM_energ, HLL_GLCM_entro, HLL_GLCM_homom, HLL_GLCM_homop, HLL_GLCM_maxpr, HLL_GLCM_sosvh, HLL_GLCM_savgh, HLL_GLCM_svarh, HLL_GLCM_senth, HLL_GLCM_dvarh, HLL_GLCM_denth, HLL_GLCM_inf1h, HLL_GLCM_inf2h, HLL_GLCM_indnc, HLL_GLCM_idmnc, HLL_GLRLM_SRE, HLL_GLRLM_LRE, HLL_GLRLM_GLN, HLL_GLRLM_RLN, HLL_GLRLM_RP, HLL_GLRLM_LGRE, HLL_GLRLM_HGRE, HLL_GLRLM_SRLGE, HLL_GLRLM_SRHGE, HLL_GLRLM_LRLGE, HLL_GLRLM_LRHGE, HLL_GLRLM_GLV, HLL_GLRLM_RLV, HLL_GLSZM_SZE, HLL_GLSZM_LZE, HLL_GLSZM_GLN, HLL_GLSZM_ZSN, HLL_GLSZM_ZP, HLL_GLSZM_LGZE, HLL_GLSZM_HGZE, HLL_GLSZM_SZLGE, HLL_GLSZM_SZHGE, HLL_GLSZM_LZLGE, HLL_GLSZM_LZHGE, HLL_GLSZM_GLV, HLL_GLSZM_ZSV, HLL_Coarseness, HLL_Contrast, HLL_Busyness, HLL_Complexity, HLL_Strength

HHL (high, high, low): HHL_GLCM_autoc, HHL_GLCM_contr, HHL_GLCM_corrm, HHL_GLCM_corrp, HHL_GLCM_cprom, HHL_GLCM_cshad, HHL_GLCM_dissi, HHL_GLCM_energ, HHL_GLCM_entro, HHL_GLCM_homom, HHL_GLCM_homop, HHL_GLCM_maxpr, HHL_GLCM_sosvh, HHL_GLCM_savgh, HHL_GLCM_svarh, HHL_GLCM_senth, HHL_GLCM_dvarh, HHL_GLCM_denth, HHL_GLCM_inf1h, HHL_GLCM_inf2h, HHL_GLCM_indnc, HHL_GLCM_idmnc, HHL_GLRLM_SRE, HHL_GLRLM_LRE, HHL_GLRLM_GLN, HHL_GLRLM_RLN, HHL_GLRLM_RP, HHL_GLRLM_LGRE, HHL_GLRLM_HGRE, HHL_GLRLM_SRLGE, HHL_GLRLM_SRHGE, HHL_GLRLM_LRLGE, HHL_GLRLM_LRHGE, HHL_GLRLM_GLV, HHL_GLRLM_RLV, HHL_GLSZM_SZE, HHL_GLSZM_LZE, HHL_GLSZM_GLN, HHL_GLSZM_ZSN, HHL_GLSZM_ZP, HHL_GLSZM_LGZE, HHL_GLSZM_HGZE, HHL_GLSZM_SZLGE, HHL_GLSZM_SZHGE, HHL_GLSZM_LZLGE, HHL_GLSZM_LZHGE, HHL_GLSZM_GLV, HHL_GLSZM_ZSV, HHL_Coarseness, HHL_Contrast, HHL_Busyness, HHL_Complexity, HHL_Strength

HLH (high, low, high): HLH_GLCM_autoc, HLH_GLCM_contr, HLH_GLCM_corrm, HLH_GLCM_corrp, HLH_GLCM_cprom, HLH_GLCM_cshad, HLH_GLCM_dissi, HLH_GLCM_energ, HLH_GLCM_entro, HLH_GLCM_homom, HLH_GLCM_homop, HLH_GLCM_maxpr, HLH_GLCM_sosvh, HLH_GLCM_savgh, HLH_GLCM_svarh, HLH_GLCM_senth, HLH_GLCM_dvarh, HLH_GLCM_denth, HLH_GLCM_inf1h, HLH_GLCM_inf2h, HLH_GLCM_indnc, HLH_GLCM_idmnc, HLH_GLRLM_SRE, HLH_GLRLM_LRE, HLH_GLRLM_GLN, HLH_GLRLM_RLN, HLH_GLRLM_RP, HLH_GLRLM_LGRE, HLH_GLRLM_HGRE, HLH_GLRLM_SRLGE, HLH_GLRLM_SRHGE, HLH_GLRLM_LRLGE, HLH_GLRLM_LRHGE, HLH_GLRLM_GLV, HLH_GLRLM_RLV, HLH_GLSZM_SZE, HLH_GLSZM_LZE, HLH_GLSZM_GLN, HLH_GLSZM_ZSN, HLH_GLSZM_ZP, HLH_GLSZM_LGZE, HLH_GLSZM_HGZE, HLH_GLSZM_SZLGE, HLH_GLSZM_SZHGE, HLH_GLSZM_LZLGE, HLH_GLSZM_LZHGE, HLH_GLSZM_GLV, HLH_GLSZM_ZSV, HLH_Coarseness, HLH_Contrast, HLH_Busyness, HLH_Complexity, HLH_Strength

LHH (low, high, high): LHH_GLCM_autoc, LHH_GLCM_contr, LHH_GLCM_corrm, LHH_GLCM_corrp, LHH_GLCM_cprom, LHH_GLCM_cshad, LHH_GLCM_dissi, LHH_GLCM_energ, LHH_GLCM_entro, LHH_GLCM_homom, LHH_GLCM_homop, LHH_GLCM_maxpr, LHH_GLCM_sosvh, LHH_GLCM_savgh, LHH_GLCM_svarh, LHH_GLCM_senth, LHH_GLCM_dvarh, LHH_GLCM_denth, LHH_GLCM_inf1h, LHH_GLCM_inf2h, LHH_GLCM_indnc, LHH_GLCM_idmnc, LHH_GLRLM_SRE, LHH_GLRLM_LRE, LHH_GLRLM_GLN, LHH_GLRLM_RLN, LHH_GLRLM_RP, LHH_GLRLM_LGRE, LHH_GLRLM_HGRE, LHH_GLRLM_SRLGE, LHH_GLRLM_SRHGE, LHH_GLRLM_LRLGE, LHH_GLRLM_LRHGE, LHH_GLRLM_GLV, LHH_GLRLM_RLV, LHH_GLSZM_SZE, LHH_GLSZM_LZE, LHH_GLSZM_GLN, LHH_GLSZM_ZSN, LHH_GLSZM_ZP, LHH_GLSZM_LGZE, LHH_GLSZM_HGZE, LHH_GLSZM_SZLGE, LHH_GLSZM_SZHGE, LHH_GLSZM_LZLGE, LHH_GLSZM_LZHGE, LHH_GLSZM_GLV, LHH_GLSZM_ZSV, LHH_Coarseness, LHH_Contrast, LHH_Busyness, LHH_Complexity, LHH_Strength

HHH (high, high, high): HHH_GLCM_autoc, HHH_GLCM_contr, HHH_GLCM_corrm, HHH_GLCM_corrp, HHH_GLCM_cprom, HHH_GLCM_cshad, HHH_GLCM_dissi, HHH_GLCM_energ, HHH_GLCM_entro, HHH_GLCM_homom, HHH_GLCM_homop, HHH_GLCM_maxpr, HHH_GLCM_sosvh, HHH_GLCM_savgh, HHH_GLCM_svarh, HHH_GLCM_senth, HHH_GLCM_dvarh, HHH_GLCM_denth, HHH_GLCM_inf1h, HHH_GLCM_inf2h, HHH_GLCM_indnc, HHH_GLCM_idmnc, HHH_GLRLM_SRE, HHH_GLRLM_LRE, HHH_GLRLM_GLN, HHH_GLRLM_RLN, HHH_GLRLM_RP, HHH_GLRLM_LGRE, HHH_GLRLM_HGRE, HHH_GLRLM_SRLGE, HHH_GLRLM_SRHGE, HHH_GLRLM_LRLGE, HHH_GLRLM_LRHGE, HHH_GLRLM_GLV, HHH_GLRLM_RLV, HHH_GLSZM_SZE, HHH_GLSZM_LZE, HHH_GLSZM_GLN, HHH_GLSZM_ZSN, HHH_GLSZM_ZP, HHH_GLSZM_LGZE, HHH_GLSZM_HGZE, HHH_GLSZM_SZLGE, HHH_GLSZM_SZHGE, HHH_GLSZM_LZLGE, HHH_GLSZM_LZHGE, HHH_GLSZM_GLV, HHH_GLSZM_ZSV, HHH_Coarseness, HHH_Contrast, HHH_Busyness, HHH_Complexity, HHH_Strength

Ⅴ. The detailed descriptions of clinical net benefit, the “treat-all plan”, and the “treat-none plan”


The net benefit was defined using the following formula:

$$\text{Net benefit} = \frac{TPR}{N} - \frac{FPR}{N} \times \frac{P_t}{1 - P_t}$$

In this formula, $N$ is the sample size and $P_t$ is the threshold probability used to stratify patients as predicted synchronous lymph node metastasis (LNM) or non-LNM. Patients with predicted LNM probabilities greater than $P_t$ were classified as synchronous LNM, while patients with predicted LNM probabilities lower than $P_t$ were classified as non-LNM. Lymph node dissection (LND) was recommended for patients with predicted synchronous LNM and was not recommended for patients with predicted non-LNM. TPR is the true positive rate, defined as the proportion of patients with actual LNM who were predicted as synchronous LNM. FPR is the false positive rate, defined as the proportion of patients without LNM who were predicted as synchronous LNM.

Under the “treat-none plan”, no patients were predicted as LNM. In this case, the TPR and FPR both equal zero, and the net benefit is zero. Under the “treat-all plan”, all patients were predicted as LNM. In this case, the TPR and FPR both equal one, and the net benefit formula reduces to:

$$\text{Net benefit}_{\text{treat-all plan}} = \frac{1 - 2 \times P_t}{N \times (1 - P_t)}$$
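The definitions in this section can be checked numerically with a short Python sketch. It follows the formulas exactly as written in this supplement (TPR and FPR each divided by the sample size), and is an illustration only: the function names and the toy labels and probabilities are hypothetical.

```python
import numpy as np

def net_benefit(y_true, prob, pt):
    """Net benefit as written in this supplement: TPR/N - FPR/N * pt / (1 - pt)."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(prob, dtype=float) > pt  # predicted LNM if probability exceeds pt
    n = y_true.size
    tpr = pred[y_true].mean() if y_true.any() else 0.0
    fpr = pred[~y_true].mean() if (~y_true).any() else 0.0
    return tpr / n - fpr / n * pt / (1.0 - pt)

def net_benefit_treat_all(n, pt):
    """Closed form for the treat-all plan: (1 - 2 * pt) / (n * (1 - pt))."""
    return (1.0 - 2.0 * pt) / (n * (1.0 - pt))

# Toy cohort: two patients with LNM, two without (hypothetical data)
y = [1, 1, 0, 0]
treat_none = net_benefit(y, [0.0, 0.0, 0.0, 0.0], pt=0.3)
treat_all = net_benefit(y, [1.0, 1.0, 1.0, 1.0], pt=0.3)
```

Setting every predicted probability to zero reproduces the treat-none plan, and setting every probability to one reproduces the treat-all plan, whose value agrees with the closed-form expression above.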

Ⅵ. Demographic comparison of baseline clinical features between the training and validation groups

Although there was a temporal interval between the training and validation groups, there were no significant differences in the baseline clinical features between the two groups, either for patients with LNM (P = 0.9773 for age, 0.0923 for gender, 0.4419 for primary hepatic lobe site, 0.9590 for number of the primary tumors, 0.9464 for hepatitis, 0.9044 for cirrhosis, 0.8684 for cholelithiasis, 0.1904 for CA 19-9 level, 0.4974 for CEA level, 0.4419 for the MR-reported LNM factor) or for patients with non-LNM


(P = 0.8829 for age, 0.1900 for gender, 0.3374 for primary hepatic lobe site, 0.6015 for number of the primary tumors, 0.8749 for hepatitis, 0.9202 for cirrhosis, 0.8050 for cholelithiasis, 0.0525 for CA 19-9 level, 0.5401 for CEA level, 0.5523 for the MR-reported LNM factor). Thus, the comparable baseline clinical features support the use of these two cohorts as the training and validation groups.

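Ⅶ. Calculation formulas for SVM model and combination nomogram

$$\text{SVM score} = 0.3386 + 0.0988 \times \text{HLH}_{\ldots} - 0.1524 \times \text{LLH}_{\ldots} - 0.2111 \times \text{HLL}_{\ldots} - 0.4333 \times \text{LLL}_{\ldots} - 0.2087 \times \text{HLL}_{\ldots}$$

$$\text{Nomogram score} = -3.4872 + 4.1198 \times \text{SVM score} + 1.4461 \times \text{CA 19-9 level} + 1.0490 \times \text{MR-reported LNM}$$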
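As a worked illustration of the nomogram formula, the Python sketch below evaluates the linear score from the coefficients given above. Two points are assumptions made for the example and are not stated in this section: that CA 19-9 level and MR-reported LNM enter the formula as binary indicators (0 or 1), and that the score is converted to a probability with the standard logistic function, as is usual for a logistic regression model. The function and variable names are hypothetical.

```python
import math

# Coefficients copied from the nomogram formula in this section
INTERCEPT = -3.4872
COEF_SVM = 4.1198
COEF_CA199 = 1.4461
COEF_MR_LNM = 1.0490

def nomogram_score(svm_score, ca199_level, mr_reported_lnm):
    """Linear nomogram score; the 0/1 coding of the last two inputs is an assumption."""
    return (INTERCEPT
            + COEF_SVM * svm_score
            + COEF_CA199 * ca199_level
            + COEF_MR_LNM * mr_reported_lnm)

def predicted_probability(score):
    """Assumed logistic link: probability = 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical patient: SVM score 0.6, elevated CA 19-9, no LNM reported on MR
score = nomogram_score(0.6, 1, 0)
prob = predicted_probability(score)
```

For the hypothetical patient, the score is −3.4872 + 4.1198 × 0.6 + 1.4461 = 0.43078, which the assumed logistic link maps to a probability slightly above 0.5.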

Ⅷ. Predictive performances of different feature selection methods

To find the optimal feature selection algorithm for this problem, we compared the performances of several feature selection methods, including mRMR, least absolute shrinkage and selection operator (LASSO), Random forest, Elastic net, Wilcoxon, and Gini index. The results are summarized in Table S5 below. In the training group, the P values from the Delong test for mRMR, LASSO, Elastic net, and Random forest were all less than 0.0001. Using the AUC as the evaluation index, the mRMR and LASSO methods showed the best performances. In the validation group, the mRMR method had the lowest P value and the highest AUC. Therefore, mRMR was chosen as the feature selection method in this paper.

Table S5. Predictive performances of different feature selection methods

Methods        Training group                                            Validation group
               Sensitivity  Specificity  AUC (95% CI)         P          Sensitivity  Specificity  AUC (95% CI)         P
mRMR           65.96%       79.66%       0.788 (0.698-0.862)  <0.0001    52.63%       91.30%       0.787 (0.634-0.898)  <0.0001
LASSO          70.21%       71.19%       0.773 (0.692-0.857)  <0.0001    63.16%       60.87%       0.714 (0.554-0.843)  0.0077
Elastic Net    82.98%       55.93%       0.737 (0.643-0.818)  <0.0001    73.68%       43.48%       0.629 (0.467-0.773)  0.1385
Random Forest  65.96%       79.66%       0.759 (0.666-0.837)  <0.0001    42.11%       73.91%       0.673 (0.511-0.809)  0.0396
Wilcoxon       57.45%       79.66%       0.689 (0.592-0.775)  0.0004     57.89%       78.26%       0.693 (0.532-0.826)  0.0238
Gini Index     49.15%       80.85%       0.679 (0.581-0.766)  0.0006     43.48%       78.95%       0.719 (0.559-0.846)  0.0074

Note: AUC: area under the curve; CI: confidence interval. P value was calculated using the Delong test.
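For readers who want to reproduce the kind of metrics reported in Table S5, the following Python sketch computes sensitivity, specificity, and AUC from predicted scores. It is a generic illustration, not the authors' code: the AUC is computed through its rank (Mann-Whitney) interpretation, the decision threshold is a free parameter, and the function names and toy data are hypothetical. The Delong test and confidence intervals are not covered here.

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    """AUC as the probability that a positive case scores higher than a negative one."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # all positive-negative pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when scores above the threshold are called positive."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(scores, dtype=float) > threshold
    sens = (pred & y_true).sum() / y_true.sum()
    spec = (~pred & ~y_true).sum() / (~y_true).sum()
    return sens, spec

# Toy example (hypothetical labels and scores)
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = auc_mann_whitney(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.45)
```

In the toy example, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.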

Ⅸ. Histograms regarding the distributions of AUCs for the SVM model and combination nomogram

Figure S2. Histograms regarding the distributions of AUCs from the bootstrap method for the SVM model and the combination nomogram in both the training and validation groups. (A) Histogram for the SVM model in the training group; (B) histogram for the SVM model in the validation group; (C) histogram for the combination nomogram in the training group; (D) histogram for the combination nomogram in the validation group.

Ⅹ. The multivariable analysis for model construction

In the multivariable analysis, we used the Akaike information criterion (AIC) and the independence analysis to select the optimal factors. The detailed AIC values in the model construction procedure are shown in Table S6. First, the combination with the minimum AIC value of 117.49 was selected, involving SVM score, CA 19-9 level, number of the primary tumors, primary hepatic lobe site, and the MR-reported LNM factor. In the independence analysis of the features in this model, three features (SVM score, CA 19-9 level, and the MR-reported LNM factor) were independent, with P-values < 0.05, while two features (primary hepatic lobe site and number of the primary tumors) were non-independent, with P-values > 0.05. Table S8 shows the P-values for these five features. Then, we removed these two


redundant features and constructed a new model with the independent features only. Table S9 shows the P-values for the selected factors in the new prediction model. Thus, we used the model with the SVM score, CA 19-9 level, and the MR-reported LNM factor in this study.
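The AIC-driven part of this procedure can be sketched as a backward elimination loop in Python. This is a minimal sketch under assumptions, not the authors' procedure: the supplement does not state the search strategy, and backward elimination is assumed here only because the AIC sequence in Table S6 drops one variable per step. The `fit_aic` callback stands in for fitting a logistic regression and returning its AIC; the toy version below uses hypothetical log-likelihood gains instead of real data.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def backward_eliminate(variables, fit_aic):
    """Drop one variable at a time while doing so lowers the AIC."""
    current = list(variables)
    current_aic = fit_aic(current)
    history = [(tuple(current), current_aic)]
    while len(current) > 1:
        # Score every model obtained by removing exactly one variable
        scored = [(fit_aic([v for v in current if v != drop]),
                   [v for v in current if v != drop]) for drop in current]
        best_aic, best_subset = min(scored, key=lambda item: item[0])
        if best_aic >= current_aic:
            break  # no removal improves the AIC
        current, current_aic = best_subset, best_aic
        history.append((tuple(current), current_aic))
    return current, history

# Hypothetical per-variable log-likelihood gains (not taken from the study)
GAIN = {"SVM score": 10.0, "CA 19-9 level": 6.0, "Age": 0.05, "Gender": 0.02}

def toy_fit_aic(subset):
    log_lik = -60.0 + 0.5 * sum(GAIN[v] for v in subset)
    return aic(log_lik, n_params=len(subset) + 1)  # +1 for the intercept

selected, history = backward_eliminate(list(GAIN), toy_fit_aic)
```

In the toy run, the two variables with negligible gains are removed one after the other and the loop stops once every further removal would raise the AIC. The independence analysis described above (keeping only factors with P < 0.05) is a separate step that this sketch does not cover.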

Table S6. AIC value changes in the model construction

Variable: AIC

SVM score & Gender & Age & Cholelithiasis & Hepatitis B & Cirrhosis & Primary hepatic lobe site & Number of the primary tumors & CA 19-9 level & CEA level & MR-reported LNM: 127.41

SVM score & Gender & Age & Cholelithiasis & Cirrhosis & Primary hepatic lobe site & Number of the primary tumors & CA 19-9 level & CEA level & MR-reported LNM: 125.43

SVM score & Gender & Age & Cholelithiasis & Cirrhosis & Primary hepatic lobe site & Number of the primary tumors & CA 19-9 level & MR-reported LNM: 123.45

SVM score & Gender & Age & Cholelithiasis & Primary hepatic lobe site & Number of the primary tumors & CA 19-9 level & MR-reported LNM: 121.53

SVM score & Age & Cholelithiasis & Primary hepatic lobe site & Number of the primary tumors & CA 19-9 level & MR-reported LNM: 119.76

SVM score & Cholelithiasis & Primary hepatic lobe site & Number of the primary tumors & CA 19-9 level & MR-reported LNM: 118.19

SVM score & Primary hepatic lobe site & Number of the primary tumors & CA 19-9 level & MR-reported LNM: 117.49

Note: SVM, support vector machine; LNM, lymph node metastasis; CA19-9, serum carbohydrate antigen 19-9; CEA, serum carcinoembryonic antigen.

Table S7. VIFs for all the candidate variables in the logistic regression analysis

Variable                        VIF
SVM score                       5.9610
Gender                          2.3877
Age                             47.8974
Cholelithiasis                  1.4593
Hepatitis B                     1.6686
Cirrhosis                       1.0680
Primary hepatic lobe site       1.8118
Number of the primary tumors    1.4319
CA 19-9 level                   4.1702
CEA level                       1.9743
MR-reported LNM                 0.0772

Note: LNM, lymph node metastasis; CA19-9, serum carbohydrate antigen 19-9; CEA, serum carcinoembryonic antigen.

Table S8. Multivariable analysis for five features selected

Variable                        P
SVM score                       0.0003
CA 19-9 level                   0.0078
MR-reported LNM                 0.0249
Primary hepatic lobe site       0.0546
Number of the primary tumors    0.0772

Note: LNM: lymph node metastasis; CA19-9: serum carbohydrate antigen 19-9.

Table S9. Multivariable analysis for features used in the nomogram

Variable           Coefficients    P          OR (95% CI)
SVM score          4.1198          <0.0001    61.5448 (7.8097-485.0073)
CA 19-9 level      1.4461          0.0081     4.2467 (1.4569-12.3785)
MR-reported LNM    1.0490          0.0307     2.8548 (1.1022-7.3941)

Note: SVM, support vector machine; OR, odds ratio; CI, confidence interval.
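The odds ratios in Table S9 are related to the coefficients by OR = exp(coefficient), which is the standard relationship in logistic regression. The short Python check below recomputes them from the tabulated coefficients. The variable names are hypothetical, and small differences in the last digits are expected because the coefficients are rounded to four decimals.

```python
import math

# Coefficients and reported odds ratios copied from Table S9
table_s9 = {
    "SVM score": (4.1198, 61.5448),
    "CA 19-9 level": (1.4461, 4.2467),
    "MR-reported LNM": (1.0490, 2.8548),
}

# Recompute each odds ratio as exp(coefficient)
recomputed = {name: math.exp(coef) for name, (coef, _) in table_s9.items()}

# Relative difference between recomputed and reported odds ratios
rel_diff = {name: abs(recomputed[name] - reported) / reported
            for name, (_, reported) in table_s9.items()}
```

All three recomputed values agree with the reported odds ratios to within rounding of the coefficients.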


Reference:

1. Haralick RM, Shanmugam K. Textural features for image classification. IEEE Trans Syst Man Cybern. 1973; SMC-3: 610-21.
2. Galloway MM. Texture analysis using gray level run lengths. Comput Graph Image Process. 1975; 4: 172-9.
3. Amadasun M, King R. Textural features corresponding to textural properties. IEEE Trans Syst Man Cybern. 1989; 19: 1264-74.
4. Chu A, Sehgal CM, Greenleaf JF. Use of gray value distribution of run lengths for texture analysis. Pattern Recognit Lett. 1990; 11: 415-9.
5. Dasarathy BV, Holder EB. Image characterizations based on joint gray level-run length distributions. Pattern Recognit Lett. 1991; 12: 497-502.
6. Thibault G, Fertil B, Navarro C, Pereira S, Cau P, Lévy N, et al. Texture indexes and gray level size zone matrix: application to cell nuclei classification. Pattern Recognition Inf Process. 2009; 140-5.
7. Vallieres M, Freeman CR, Skamene SR, El Naqa I. A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Phys Med Biol. 2015; 60: 5471-96.
8. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006; 26: 565-74.
9. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016; 352: 5.
10. Coroller TP, Grossmann P, Hou Y, Velazquez ER, Leijenaar RTH, Hermann G, et al. CT-based radiomic signature predicts distant metastasis in lung adenocarcinoma. Radiother Oncol. 2015; 114: 345-50.
11. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts H. Machine learning methods for quantitative radiomic biomarkers. Sci Rep. 2015; 5: 11.
12. Huang Y, Liang C, He L, Tian J, Liang C, Chen X, et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J Clin Oncol. 2016; 34: 2157-64.
13. Zhang B, He X, Ouyang F, Gu D, Dong Y, Zhang L, et al. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett. 2017; 403: 21-7.
14. Li H, Galperin-Aizenberg M, Pryma D, Simone CB, Fan Y. Unsupervised machine learning of radiomic features for predicting treatment response and overall survival of early stage non-small cell lung cancer patients treated with stereotactic body radiation therapy. Radiother Oncol. 2018; 129: 218-26.
15. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974; 19: 716-23.
16. Wu S, Zheng J, Li Y, Yu H, Shi S, Xie W, et al. A radiomics nomogram for the preoperative prediction of lymph node metastasis in bladder cancer. Clin Cancer Res. 2017; 23: 6904-11.
