
A deep neural network improves endoscopic detection of early gastric cancer without blind spots

Authors

Lianlian Wu1,2,3,*, Wei Zhou1,2,3,*, Xinyue Wan1,2,3, Jun Zhang1,2,3, Lei Shen1,2,3, Shan Hu4, Qianshan Ding1,2,3, Ganggang Mu1,2,3, Anning Yin1,2,3, Xu Huang1,2,3, Jun Liu1,3, Xiaoda Jiang1,2,3, Zhengqiang Wang1,2,3, Yunchao Deng1,2,3, Mei Liu5, Rong Lin6, Tingsheng Ling7, Peng Li8, Qi Wu9, Peng Jin10, Jie Chen11, Honggang Yu1,2,3

Institutions

1 Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, China
2 Key Laboratory of Hubei Province for Digestive System Disease, Renmin Hospital of Wuhan University, Wuhan, China
3 Hubei Provincial Clinical Research Center for Digestive Disease Minimally Invasive Incision, Renmin Hospital of Wuhan University, Wuhan, China
4 School of Resources and Environmental Sciences of Wuhan University, Wuhan, China
5 Department of Gastroenterology, Tongji Hospital of Huazhong University of Science and Technology, Wuhan, China
6 Department of Gastroenterology, Wuhan Union Hospital of Huazhong University of Science and Technology, Wuhan, China
7 Department of Gastroenterology, Nanjing Drum Tower Hospital of Nanjing University, Nanjing, China
8 Department of Gastroenterology, Beijing Friendship Hospital of the Capital University of Medical Sciences, Beijing, China
9 Endoscopy Center, Beijing Cancer Hospital of Peking University, Beijing, China
10 Department of Gastroenterology, Beijing Military Hospital, Beijing, China
11 Department of Gastroenterology, Changhai Hospital of the Second Military Medical University, Shanghai, China

submitted 10.4.2018

accepted after revision 14.9.2018

Bibliography

DOI https://doi.org/10.1055/a-0855-3532

Published online: 12.3.2019 | Endoscopy 2019; 51: 522–531

© Georg Thieme Verlag KG Stuttgart · New York

ISSN 0013-726X

Corresponding author

Yu Honggang, MD, Department of Gastroenterology, Renmin Hospital of Wuhan University, 99 Zhangzhidong Road, Wuhan 430060, Hubei Province, China

[email protected]

ABSTRACT

Background Gastric cancer is the third most lethal malignancy worldwide. A novel deep convolutional neural network (DCNN) to perform visual tasks has recently been developed. The aim of this study was to build a system using the DCNN to detect early gastric cancer (EGC) without blind spots during esophagogastroduodenoscopy (EGD).

Methods 3170 gastric cancer and 5981 benign images were collected to train the DCNN to detect EGC. A total of 24549 images from different parts of the stomach were collected to train the DCNN to monitor blind spots. Class activation maps were developed to automatically cover suspicious cancerous regions. A grid model for the stomach was used to indicate the existence of blind spots in unprocessed EGD videos.

Results The DCNN identified EGC from non-malignancy with an accuracy of 92.5%, a sensitivity of 94.0%, a specificity of 91.0%, a positive predictive value of 91.3%, and a negative predictive value of 93.8%, outperforming all levels of endoscopists. In the task of classifying gastric locations into 10 or 26 parts, the DCNN achieved an accuracy of 90% or 65.9%, respectively, on a par with the performance of experts. In real-time unprocessed EGD videos, the DCNN achieved automated performance for detecting EGC and monitoring blind spots.

Conclusions We developed a system based on a DCNN to accurately detect EGC and recognize gastric locations better than endoscopists, and to proactively track suspicious cancerous lesions and monitor blind spots during EGD.

TRIAL REGISTRATION Single-center, retrospective trial ChiCTR1800014809 at the Chinese Clinical Trial Registry (http://www.chictr.org.cn/)

Appendix e1, Fig. e2–e6, Fig. e8, Table e1, e2
Online content viewable at: https://doi.org/10.1055/a-0855-3532

* Contributed equally to this work



Introduction

Gastric cancer is the third most lethal and the fifth most common malignancy from a global perspective [1]. It is estimated that about 1 million new gastric cancer cases were diagnosed and about 700 000 people died of gastric cancer in 2012, which represents up to 10% of the cancer-related deaths worldwide [1]. The 5-year survival rate for gastric cancer is 5%–25% in its advanced stages, but reaches 90% in the early stages [2, 3]. Early detection is therefore a key strategy to improve patient survival.

In recent decades, endoscopic technology has seen remarkable advances and endoscopy has been widely used as a screening test for early gastric cancer (EGC) [4]. In one series, however, 7.2% of patients with gastric cancer had been misdiagnosed at an endoscopy performed within the previous year, and 73% of these cases arose from endoscopist errors [5].

The performance quality of EGD varies significantly because of cognitive and technical factors [6]. In the cognitive domain, EGC lesions are difficult to recognize because the mucosa often shows only subtle changes, which requires endoscopists to be well trained and armed with thorough knowledge [4, 7]. In addition, endoscopists can be affected by their subjective state during endoscopy, which restricts the detection of EGC to a large extent [8]. In the technical domain, guidelines to map the entire stomach exist, but are often not well followed, especially in developing countries [9, 10]. It is therefore important to develop a feasible and reliable method to alert endoscopists to possible EGC lesions and blind spots.

A potential solution to mitigate these skill variations is to apply artificial intelligence (AI) to EGD examinations. The past decades have seen an explosion of interest in the application of AI in medicine [11]. More recently, a method of AI known as the deep convolutional neural network (DCNN), which transforms the representation at one level into a more abstract level to make predictions, has opened the door to elaborate image analysis [12]. Recent studies have successfully used DCNNs in the field of endoscopy. Chen et al. achieved accurate classification of diminutive colorectal polyps based on colonoscopy images [13], and Byrne et al. achieved real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps using colonoscopy videos [14]. However, the application of a DCNN to real-time EGC detection along with blind spot monitoring has not yet been researched.

In this work, we first developed a novel system using a DCNN to analyze EGD images of EGC and gastric locations. Furthermore, we exploited an activation map to proactively track suspicious cancerous regions and built a grid model for the stomach to indicate the existence of blind spots in unprocessed EGD videos.

Methods

Datasets, data preparation, and sample distribution

The flowchart of the data preparation and training/testing procedure of the DCNN is shown in ▶Fig. 1. Networks playing different roles were independently trained, and their functions, inclusion criteria, exclusion criteria, image views, data sources, and data preparation are described in ▶Table e1 (▶Fig. e2; [15, 16]), available online in Supplementary materials. The sample distribution is presented in ▶Fig. e3. Images of the same lesion from multiple viewpoints, or similar lesions from the same person, were included. Extensive attention was paid to ensure that images from the same person were not split between the training, validation, and test sets.
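A patient-level split of this kind is commonly implemented by partitioning on patient identifiers rather than on individual images. The sketch below illustrates the idea with scikit-learn's GroupShuffleSplit; the variable names and split ratio are illustrative assumptions, not the authors' actual pipeline.

```python
from sklearn.model_selection import GroupShuffleSplit

def patient_level_split(images, patient_ids, test_size=0.2, seed=0):
    """Split image paths so that all images from one patient land on
    the same side of the train/test boundary (no patient leakage)."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size,
                                 random_state=seed)
    train_idx, test_idx = next(splitter.split(images, groups=patient_ids))
    return ([images[i] for i in train_idx],
            [images[i] for i in test_idx])
```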


▶Fig. 1 Flowchart of the data preparation and training/testing procedure of the deep convolutional neural network (DCNN) model. The functions of networks 1, 2, and 3 are filtering blurry images, early gastric cancer identification, and classification of gastric location, respectively. The three networks were independently trained. Blurry images and some clear images were used for the training of network 1. Clear images were further classified as malignant or benign depending on the pathology evidence for the training and testing of network 2. In parallel, clear images were classified into 10 or 26 gastric locations by two endoscopists with more than 10 years of esophagogastroduodenoscopy (EGD) experience for the training and testing of network 3. When running on videos, all frames are first filtered via network 1, so that only clear images enter networks 2 and 3.




The videos used came from stored data at Renmin Hospital of Wuhan University. The instruments used included gastroscopes with an optical magnification function (CVL-290SL, Olympus Optical Co. Ltd., Tokyo, Japan; VP-4450HD, Fujifilm Co., Kanagawa, Japan).

The number of enrolled images was based on data availability, which meant that malignant images were relatively rare compared with non-malignant images and that the number of images from different locations varied widely. The standards of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) were used to justify these numbers [17].

Training algorithm

VGG-16 [18] and ResNet-50 [19], two state-of-the-art DCNN architectures pretrained with 1.28 million images from 1000 object classes, were used to train our system. Using transfer learning [20], we replaced the final classification layer with another fully connected layer, retrained it using our datasets, and fine-tuned the parameters of all layers. Images were resized to 224 × 224 pixels to suit the original dimensions of the models. Google's TensorFlow [21] deep learning framework was used to train, validate, and test our system. Confusion matrices, learning curves, and the methods used to avoid overfitting are described in Appendix e1 (▶Table e2; ▶Figs. e4–e6 [22–26]), available online in Supplementary materials.
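As an illustration of this transfer-learning setup, the following TensorFlow/Keras sketch loads an ImageNet-pretrained ResNet-50, replaces the 1000-class head with a new fully connected layer, and leaves all layers trainable for fine-tuning. The two-class head, optimizer, and learning rate are assumptions made for the sketch, not the authors' published configuration.

```python
import tensorflow as tf

NUM_CLASSES = 2  # e.g. malignant vs. benign; assumption for illustration

# Load ResNet-50 pretrained on ImageNet, dropping its 1000-class head
# and ending in global average pooling.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False,
    input_shape=(224, 224, 3), pooling="avg")

# Replace the final classification layer with a new fully connected layer.
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
model = tf.keras.Model(inputs=base.input, outputs=outputs)

# All layers stay trainable, so fitting fine-tunes the whole network.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=...)
```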

Comparison between DCNN and endoscopists

To evaluate the DCNN's diagnostic ability for EGC, 200 images independent from the training/validation sets were selected as the test set. The performance of the DCNN was compared with that of six expert endoscopists, eight seniors, and seven novices. Lesions that are easily missed, including EGC types 0-I, 0-IIa, 0-IIb, 0-IIc, and 0-mixed, were selected (▶Table 3; ▶Fig. 7). Two endoscopists with more than 10 years of EGD experience reviewed these images. In the test, endoscopists were asked whether a suspicious malignant lesion was shown in each image. The calculation formulas for the comparison metrics are described in Appendix e1, available online in Supplementary materials.
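Those formulas are not reproduced in the main text; assuming the appendix follows the standard confusion-matrix definitions, the reported metrics would be: Accuracy = (TP + TN)/(TP + TN + FP + FN); Sensitivity = TP/(TP + FN); Specificity = TN/(TN + FP); PPV = TP/(TP + FP); NPV = TN/(TN + FN), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.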

To evaluate the DCNN's ability to classify gastric locations, we compared the accuracy of the DCNN with that of 10 experts, 16 seniors, and 9 novices. The test dataset consisted of 170 images, independent from the training/validation sets, randomly selected from each gastric location. In the test, endoscopists were asked which location each picture referred to. Tests were performed using Document Star, a Chinese online test service company. A description of the endoscopists participating in the experiment is presented in Appendix e1, available online in Supplementary materials.

Class activation maps

Class activation maps (CAMs) indicating suspicious cancerous regions were established, as described previously [27]. In brief, before the final output layer of ResNet-50, global average pooling was performed on the convolutional feature maps, and these were used as the features for a fully connected layer that produced the desired output. The color depth of the CAMs is positively correlated with the confidence of the prediction.
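A minimal sketch of the CAM computation for such an architecture follows: the weights of the final fully connected layer for one class are projected back onto the last convolutional feature maps [27]. The layer name and the global-average-pooling/single-dense-head structure are assumptions matching the description above, not the authors' exact code.

```python
import numpy as np
import tensorflow as tf

def class_activation_map(model, image, class_idx,
                         conv_layer_name="conv5_block3_out"):
    """CAM as in Zhou et al. [27]: weight the final conv feature maps
    by the dense-layer weights of one class. Assumes the model ends in
    global average pooling followed by a single Dense layer."""
    conv_out = model.get_layer(conv_layer_name).output
    cam_model = tf.keras.Model(model.input, [conv_out, model.output])
    feats, _ = cam_model(image[np.newaxis, ...])     # feats: (1, h, w, c)
    w = model.layers[-1].get_weights()[0]            # (c, num_classes)
    cam = np.einsum("hwc,c->hw", feats[0].numpy(), w[:, class_idx])
    cam = np.maximum(cam, 0.0)                       # keep positive evidence
    return cam / (cam.max() + 1e-8)  # normalize; upsample for overlay
```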

A grid model for the stomach

In order to automatically remind endoscopists of blind spots during EGD, a grid model for the stomach was developed to present the covered parts. The onion-skin display method was used to build the grid model, as previously described [28]. The characteristics of each part of the stomach were extracted and put together to generate a virtual stomach model. The model was set to be transparent before EGD. As soon as the scope was inserted into the stomach, the DCNN model began to capture images and fill them into the corresponding parts of the model, coloring the various parts.
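The coverage bookkeeping behind such a grid model can be sketched in a few lines, assuming the location network emits one of 26 site labels per clear frame; the rendering of the onion-skin stomach model itself is omitted.

```python
NUM_SITES = 26
covered = [False] * NUM_SITES  # all parts start transparent (unobserved)

def update_grid(site_prediction: int) -> list:
    """Mark a predicted site as observed; return remaining blind spots."""
    covered[site_prediction] = True
    return [s for s in range(NUM_SITES) if not covered[s]]
```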

Running the DCNN on videos

Frame-wise prediction was used on unprocessed videos via client-server interaction (▶Fig. e8a). Images were captured at 2 frames per second (fps). Noise was smoothed by a Random Forest classifier model [29] and by the rule of "output results only when three of the five consecutive images show the same result" (▶Fig. e8b). The time used to output a prediction per frame in the clinical setting includes time consumed in the client (image capture, image resizing, and rendering images based on predicted results), network communication, and the server (reading and loading images, running the three networks, and saving images).
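The three-of-five rule can be sketched as follows, assuming one predicted label per captured frame; the Random Forest smoothing stage is omitted here.

```python
from collections import Counter, deque

window = deque(maxlen=5)  # the five most recent per-frame predictions

def smooth(prediction):
    """Emit a label only when at least 3 of the last 5 frames agree;
    return None while the window is still unstable."""
    window.append(prediction)
    label, count = Counter(window).most_common(1)[0]
    return label if count >= 3 else None
```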

The speed of the DCNN in the clinical setting was evaluated by 926 independent tests, calculating the total time used to output a prediction per frame in the endoscopy center of Renmin Hospital of Wuhan University.

Human subjects

The endoscopists who participated in our tests gave informed consent. This study was approved by the Ethics Committee of Renmin Hospital of Wuhan University, and was registered as trial number ChiCTR1800014809 in the Primary Registries of the WHO Registry Network.

▶Table 3 Lesion characteristics in the test set for the detection of early gastric cancer (EGC).

                                              EGC   Control
0-I                                             2
0-IIa                                          10
0-IIb                                          25
0-IIc                                          31
0-mixed (0-IIa + IIc, IIc + IIb, IIc + III)    32
Normal                                                   18
Superficial gastritis                                    40
Mild erosive gastritis                                   42
Total                                         100       100


Statistical analysis

A two-tailed unpaired Student's t test with a significance level of 0.05 was used to compare differences in the accuracy, sensitivity, specificity, and positive and negative predictive values (PPV and NPV, respectively) of the DCNN and the endoscopists. Interobserver and intraobserver agreement of the endoscopists and intraobserver agreement of the DCNN were evaluated using Cohen's kappa coefficient. All calculations were performed using SPSS 20 (IBM, Chicago, Illinois, USA).
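Cohen's kappa is defined as κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement between two readings and p_e is the agreement expected by chance. As a toy illustration (hypothetical labels, not study data):

```python
from sklearn.metrics import cohen_kappa_score

# Two hypothetical readings of the same images (1 = malignant, 0 = benign).
first_reading  = [1, 0, 1, 1, 0, 0, 1, 0]
second_reading = [1, 0, 1, 0, 0, 0, 1, 1]
# p_o = 0.75, p_e = 0.50, so kappa = (0.75 - 0.50) / (1 - 0.50) = 0.5
print(cohen_kappa_score(first_reading, second_reading))
```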

Results

The performance of DCNN on identification of EGC

Comparison between the performance of the DCNN and endoscopists

▶Table 4 shows the predictions of the DCNN and the endoscopists in identifying EGC. Among 200 gastroscopy images, with or without malignant lesions, the DCNN correctly diagnosed malignancy with an accuracy of 92.5%, a sensitivity of 94%, a specificity of 91%, a PPV of 91.3%, and an NPV of 93.8%. The six experts, eight seniors, and seven novices attained accuracies of 89.7% (standard deviation [SD] 2.2%), 86.7% (SD 5.6%), and 81.2% (SD 5.7%), respectively. The accuracy of the DCNN was significantly higher than that of all endoscopists.

▶Fig. 7 Representative images predicted by the deep convolutional neural network (DCNN) in the test set for the classification of gastric locations into 26 parts, showing the gastric locations determined by the DCNN and their prediction confidence.
L, lesser curvature; A, anterior wall; G, greater curvature; P, posterior wall.
Class 0, esophagus; 1, squamocolumnar junction; 2–5, antrum (G, P, A, L, respectively); 6, duodenal bulb; 7, descending duodenum; 8–11, lower body in forward view (G, P, A, L, respectively); 12–15, middle-upper body in forward view (G, P, A, L, respectively); 16–19, fundus (G, P, A, L, respectively); 20–22, middle-upper body in retroflexed view (P, A, L, respectively); 23–25, angulus (P, A, L, respectively).

▶Table 4 Performance of the deep convolutional neural network (DCNN) and endoscopists in the detection of early gastric cancer. All results are given as a percentage (standard deviation).

              DCNN     Experts (n = 6)   Seniors (n = 8)   Novices (n = 7)
Accuracy      92.50    89.73 (2.15)¹     86.68 (5.58)²     81.16 (5.72)¹
Sensitivity   94.00    93.86 (7.65)      90.00 (6.05)      75.33 (6.31)¹
Specificity   91.00    87.33 (7.43)      85.05 (16.18)     88.83 (6.03)
PPV           91.26    91.75 (4.15)      90.91 (5.69)      80.47 (8.75)¹
NPV           93.81    92.52 (5.76)      88.01 (6.55)²     82.32 (11.46)

PPV, positive predictive value; NPV, negative predictive value.
¹ P < 0.01; ² P < 0.05



▶Fig. 9 Representative images predicted by the deep convolutional neural network (DCNN) in the test set for the detection of early gastric cancer (EGC). a The displayed normal mucosa, superficial gastritis, and mild erosive gastritis were predicted to be non-malignant by the DCNN with confidence levels of 0.98, 0.95, and 0.91, respectively. b The mucosal images from EGC types 0-IIa, 0-IIb, and 0-IIc were predicted to be malignant by the DCNN with confidence levels of 0.86, 0.82, and 0.89, respectively. c Cancerous regions were indicated in the images from b after establishing class activation maps (CAMs). The color depth of the CAMs was positively correlated with the prediction confidence.


▶Fig. 9 shows representative images predicted by the model in the test set. The CAMs highlighted the cancerous regions after the images were evaluated by the model.

Comparison between the stability of the DCNN and endoscopists

To evaluate the stability of the DCNN and the endoscopists in identifying EGC, we mixed up all of the test pictures and randomly selected six endoscopists (2 experts, 2 seniors, and 2 novices) to do the same test again. As shown in ▶Table 5, the experts had substantial interobserver agreement (kappa 0.80), and the seniors and novices achieved moderate interobserver agreement (kappa 0.49 and 0.42, respectively). The intraobserver agreement of experts and nonexperts was moderate or better (kappa 0.84 in the expert group and 0.54–0.77 in the nonexpert group). The DCNN achieved perfect intraobserver agreement (kappa 1.0).

The performance of DCNN on classification of gastric locations

Comparison between the performance of the DCNN and endoscopists

▶Table 6 shows the predictions of the DCNN and the endoscopists in classifying gastric locations. A group of 10 experts, 16 seniors, and 9 novices classified EGD images into 10 stomach parts with accuracies of 90.2% (SD 5.1%), 86.8% (5.2%), and 83.3% (10.3%), respectively, and into 26 sites with accuracies of 63.8% (6.9%), 59.3% (6.4%), and 46.5% (7.2%), respectively. The DCNN correctly classified EGD images into 10 parts with an accuracy of 90% and into 26 parts with an accuracy of 65.9%, showing no significant difference from endoscopists at any level. ▶Fig. 7 and ▶Fig. 10 show representative images in the test set predicted by the DCNN in the tasks of classifying gastric locations into 26 parts and 10 parts, respectively.

Comparison between the stability of the DCNN and endoscopists

In the task of classifying gastric locations into 10 parts, all endoscopists achieved substantial interobserver or intraobserver agreement (kappa 0.75–0.96). In the 26-part classification, all endoscopists achieved moderate interobserver or intraobserver agreement (kappa 0.50–0.68) (▶Table 7). The DCNN achieved perfect intraobserver agreement (kappa 1.0).

Testing of the DCNN in unprocessed gastroscope videos

To explore the ability of the DCNN to detect EGC and monitor blind spots in a real-time clinical setting, we tested the model on two unprocessed gastroscope videos. In ▶Video 1, which contained no cancerous lesion, the DCNN accurately presented the covered parts synchronized with the process of EGD to verify that the entire stomach was mapped.

In ▶Video 2, which contained cancerous lesions, the DCNN gave alerts about blind spots synchronized with the process of EGD, and automatically indicated the suspicious EGC regions with CAMs. All lesions were successfully detected; however, a false-positive error occurred when the mucosa was covered by unwashed foam.

To test the speed of the DCNN, 926 independent tests were conducted in a clinical setting. The total time to output a prediction using all three networks for each frame was 230 milliseconds (SD 60; range 180–350). In the test of identifying EGC, the six experts, eight seniors, and seven novices took 3.29 seconds per picture (SD 0.42), 3.96 (0.80), and 6.19 (1.92), respectively. In the test of classifying gastric locations into 10 parts, the 10 experts, 16 seniors, and 9 novices required 4.51 seconds per picture (SD 2.07), 4.52 (0.65), and 4.76 (0.67), respectively; for classification into 26 parts, they took 14.23 seconds per picture (SD 2.41), 19.33 (9.34), and 24.15 (6.93), respectively. The prediction time of the DCNN in the clinical setting was considerably shorter than the time taken by the endoscopists.

▶Table 5 Intra- and interobserver agreement (kappa value) of endoscopists in identifying early gastric cancer.

           Expert 1   Expert 2   Senior 1   Senior 2   Novice 1   Novice 2
Expert 1   0.84
Expert 2   0.80       0.84
Senior 1                         0.77
Senior 2                         0.50       0.66
Novice 1                                               0.66
Novice 2                                               0.42       0.54

▶Table 6 Accuracy of the deep convolutional neural network (DCNN) and endoscopists in classifying gastric location. All results are given as a percentage (standard deviation).

                   10 parts        26 parts
DCNN               90.00           65.88
Experts (n = 10)   90.22 (5.09)    63.76 (6.87)
Seniors (n = 16)   86.81 (5.19)    59.26 (6.36)
Novices (n = 9)    83.30 (10.27)   46.47 (7.23)*

* P < 0.01



▶Fig. 10 Representative images predicted by the deep convolutional neural network (DCNN) in the test set for the classification of gastric locations into 10 parts, showing the gastric locations determined by the DCNN and their prediction confidence.
Class 0, esophagus; 1, squamocolumnar junction; 2, antrum; 3, duodenal bulb; 4, descending duodenum; 5, lower body in forward view; 6, middle-upper body in forward view; 7, fundus; 8, middle-upper body in retroflexed view; 9, angulus.

▶Table 7 Intra- and interobserver agreement (kappa value) of endoscopists in classifying gastric location into 10 or 26 parts.

10 parts
           Expert 1   Expert 2   Senior 1   Senior 2   Novice 1   Novice 2
Expert 1   0.96
Expert 2   0.91       0.90
Senior 1                         0.86
Senior 2                         0.75       0.89
Novice 1                                               0.89
Novice 2                                               0.85       0.93

26 parts
           Expert 1   Expert 2   Senior 1   Senior 2   Novice 1   Novice 2
Expert 1   0.68
Expert 2   0.61       0.61
Senior 1                         0.55
Senior 2                         0.52       0.60
Novice 1                                               0.53
Novice 2                                               0.55       0.50



Discussion

Endoscopy plays a pivotal role in the diagnosis of gastric cancer, the third leading cause of cancer death worldwide [1, 4]. Unfortunately, endoscopic diagnosis of gastric cancer at an early stage is difficult, requiring endoscopists first to obtain thorough knowledge and good technique [7]. The training process for a qualified endoscopist is time- and cost-consuming. In many countries, especially in Western Europe and China, the demand for endoscopists familiar with the diagnosis of EGC means they are in short supply, which greatly limits the effectiveness of endoscopy in the diagnosis and prevention of gastric cancer [7–10].

The DCNN is one of the most important deep learning methods for computer vision and image classification [12–14]. Chen et al. [13] achieved accurate classification of diminutive colorectal polyps based on images captured during colonoscopy using a DCNN, and Byrne et al. [14] achieved real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps on colonoscopy videos. The most recent study [30] used a DCNN to detect EGC with an overall sensitivity of 92.2% and a PPV of 30.6% in their dataset. Here, we developed a DCNN system that detects EGC, with a sensitivity of 94% and a PPV of 91.3%, and distinguishes gastric locations on a par with the level of expert endoscopists. Furthermore, we compared the competence of the DCNN with that of endoscopists and used the DCNN on unprocessed EGD videos to proactively track suspicious cancerous lesions without blind spots.

Observing the whole stomach is a basic prerequisite for the diagnosis of gastric cancer at an early stage [7, 16]. In order to avoid blind spots, standardized procedures have been developed to map the entire stomach during gastroscopy. The European Society of Gastrointestinal Endoscopy (ESGE) published a protocol including 10 images of the stomach in 2016 [15]. Japanese researchers published a minimum required "systematic screening protocol for the stomach" (SSS) standard including 22 images of the stomach so as not to miss suspicious cancerous lesions [16]. However, these protocols are often not well followed, and endoscopists may ignore some parts of the stomach because of subjective factors or limited operative skill, which can lead to the misdiagnosis of EGC [7, 8, 10].

In the present study, using 24549 images from EGDs, we developed a DCNN model that accurately and stably recognizes each part of the stomach on a par with the expert level, automatically captures images during endoscopy, and maps these onto a grid model of the stomach to prompt the operator about blind spots. This real-time assistance system will improve the quality of EGD and ensure that the whole stomach is observed during endoscopy, thereby providing an important prerequisite for the detection of EGC.

White-light imaging is the standard endoscopic examination for the identification of gastric cancer lesions, although accurate detection of EGC with it is difficult [31, 32]. It has been reported that the sensitivity of white-light imaging in the diagnosis of superficial EGC ranges from 33% to 75% [33].

Video 1 Testing of the deep convolutional neural network (DCNN) in an unprocessed esophagogastroduodenoscopy (EGD) video from case 1. The DCNN accurately presented the covered parts synchronized with the process of EGD to verify that the entire stomach had been mapped. In the grid model for the stomach, any transparent area indicated that this part had not been observed during the EGD. No cancerous lesion was detected.
Online content viewable at: https://doi.org/10.1055/a-0855-3532

Video 2 Testing of the deep convolutional neural network (DCNN) model in an unprocessed esophagogastroduodenoscopy (EGD) video from case 2. The DCNN indicated suspicious gastric cancer regions and presented the covered parts synchronized with the process of EGD. Suspicious cancerous regions were indicated by class activation maps (CAMs), and the color depth of the CAMs was positively correlated with the confidence of the DCNN prediction. In the grid model for the stomach, any transparent area indicated that this part had not been observed during the EGD.
Online content viewable at: https://doi.org/10.1055/a-0855-3532


Many new technologies have been developed to improve diagnostic abilities for EGC, including image-enhanced endoscopy and magnifying endoscopy [34]. Plenty of time and money has been put into training endoscopists to become familiar with the characteristics of early cancer lesions under different views [7]. However, current technologies are still not strong for some secluded lesions, such as type 0-IIb EGC [34], and manual diagnosis greatly depends on the experience and subjective state of the operator performing the endoscopy [8].

This subjective dependence on the operator decreases the accuracy and stability of EGC diagnosis [8]. To overcome this limitation, the DCNN, with its strong learning ability and good reproducibility, has gained attention as a clinical tool for endoscopists. In the present study, 3170 gastric cancer and 6541 normal control images from EGD examinations were collected to train a DCNN model with reliable and stable diagnostic ability for EGC. An independent test was conducted to evaluate the diagnostic ability of the DCNN and the endoscopists. In our study, the DCNN achieved an accuracy of 92.5%, a sensitivity of 94%, a specificity of 91%, a PPV of 91.3%, and an NPV of 93.8%, outperforming all levels of endoscopists.

In terms of the stability evaluation, the nonexperts achieved moderate interobserver agreement (kappa 0.42–0.49), and the experts achieved substantial interobserver agreement (kappa 0.80). Because of the subjective interpretation of EGC characteristics, human learning curves in the diagnosis of EGC exist, and an objective diagnosis is therefore necessary [5, 8]. In our study, we used up-to-date neural network models to develop a DCNN system. It achieved perfect intraobserver agreement (kappa 1.0), while the endoscopists had variable intraobserver agreement. Our results indicate that this gastric cancer screening system based on a DCNN has adequate and consistent diagnostic performance, removes some of the diagnostic subjectivity, and could be a powerful tool to assist endoscopists, especially nonexperts, in detecting EGC.

The diagnostic ability and stability of the DCNN seem to outperform those of experienced endoscopists. In addition, the diagnostic time of the DCNN was considerably shorter than that of the endoscopists. It should be noted that the shorter screening time and the absence of fatigue with the DCNN may make it possible to provide quick predictions of EGC following an endoscopic examination. Importantly, diagnosis of EGC by the DCNN can be achieved completely automatically and online, which may contribute to the development of telemedicine, thereby alleviating the problem of inadequate numbers and experience of doctors in remote regions.

Another strength of this study is that the DCNN is armed with CAMs and a grid model for the stomach to cover suspicious cancerous regions and indicate the existence of potential blind spots. The CAMs are a weighted linear sum of the presence of visual patterns with different characteristics, by which the discriminative regions of the target classification are highlighted [27]. As soon as the scope is inserted into the stomach, it can be determined whether and where EGC is present in the background mucosa using the DCNN with CAMs. In addition, the observed areas are instantaneously recorded and colored in the grid model of the stomach, indicating whether blind spots exist during the EGD. Through these two auxiliary tools, the DCNN can proactively track suspicious cancerous lesions without blind spots, reducing the pressure and workload on endoscopists during real-time EGD.

There are some limitations to the present study. First, the detection of EGC was based only on images in white-light, narrow-band imaging (NBI), and blue-laser imaging (BLI) views. With images under more views, such as chromoendoscopy using indigo carmine [7], i-scan optical enhancement [34], and even optical coherence tomography [35], it would be possible to design a more universal EGC detection system.

Second, in the control group of the gastric cancer dataset, only normal, superficial gastritis, and mild erosive gastritis mucosa were enrolled. Other benign conditions, such as atrophic gastritis, verrucous gastritis, and typical benign ulcers, could be enrolled in the control group later. In this way, the practicability of the DCNN in the detection of EGC would be further improved.

Third, when the DCNN was applied to unprocessed EGD videos, false-positive errors occurred when the mucosa had not been washed clean. We plan to train the DCNN to recognize poorly prepared mucosa, in order to avoid these mistakes and to convert the false-positive errors into a suggestion to the operator to clean the mucosa.

Fourth, although the DCNN presented satisfactory results in the detection of EGC and the monitoring of EGD quality in real-time unprocessed videos, its competence was only quantitatively evaluated on still images, not on videos. We will keep collecting data to assess its ability on unprocessed videos and to provide accurate effectiveness data for the DCNN in a real clinical setting in the near future.

In summary, a computer-aided system based on a DCNN provides automated, accurate, and consistent diagnostic performance for the detection of EGC and the recognition of gastric locations. The CAMs and the grid model for the stomach enable the DCNN to proactively track suspicious cancerous lesions without blind spots during EGD. The DCNN is a promising technique in computer-based recognition and is not inferior to experts. It may be a powerful tool to assist endoscopists in detecting EGC without blind spots. More research should be conducted and clinical applications should be tested to further verify and improve this system's effectiveness.

Acknowledgments

This work was partly supported by a grant from the Research Funds for Key Laboratory of Hubei Province (No. 2016CFA066), the National Natural Science Foundation of China (grant no. 81672387 [to Yu Honggang]), and the China Youth Development Foundation (grant no. 81401959 [to Zhou Wei] and grant no. 81703030 [to Ding Qianshan]).


Competing interests

None

References

[1] Torre LA, Bray F, Siegel RL et al. Global cancer statistics, 2012. CA Cancer J Clin 2015; 65: 87–108
[2] Soetikno R, Kaltenbach T, Yeh R et al. Endoscopic mucosal resection for early cancers of the upper gastrointestinal tract. J Clin Oncol 2005; 23: 4490–4498
[3] Laks S, Meyers MO, Kim HJ. Surveillance for gastric cancer. Surg Clin 2017; 97: 317–331
[4] Pasechnikov V, Chukov S, Fedorov E et al. Gastric cancer: prevention, screening and early diagnosis. World J Gastroenterol 2014; 20: 13842–13862
[5] Yalamarthi S, Witherspoon P, McCole D et al. Missed diagnoses in patients with upper gastrointestinal cancers. Endoscopy 2004; 36: 874–879
[6] Rutter MD, Senore C, Bisschops R et al. The European Society of Gastrointestinal Endoscopy quality improvement initiative: developing performance measures. United European Gastroenterol J 2016; 4: 30–41
[7] Yao K, Uedo N, Muto M et al. Development of an e-learning system for teaching endoscopists how to diagnose early gastric cancer: basic principles for improving early detection. Gastric Cancer 2017; 20: S28–S38
[8] Scaffidi MA, Grover SC, Carnahan H et al. Impact of experience on self-assessment accuracy of clinical colonoscopy competence. Gastrointest Endosc 2018; 87: 827–836.e2
[9] Kim GH, Bang SJ, Ende AR et al. Is screening and surveillance for early detection of gastric cancer needed in Korean Americans? Korean J Intern Med 2015; 30: 747
[10] O'Mahony S, Naylor G, Axon A. Quality assurance in gastrointestinal endoscopy. Endoscopy 2000; 32: 483–488
[11] Torkamani A, Andersen KG, Steinhubl SR et al. High-definition medicine. Cell 2017; 170: 828–843
[12] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521: 436–444
[13] Chen PJ, Lin MC, Lai MJ et al. Accurate classification of diminutive colorectal polyps using computer-aided analysis. Gastroenterology 2018; 154: 568–575
[14] Byrne MF, Chapados N, Soudan F et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut 2018: doi:10.1136/gutjnl-2017-314547
[15] Bisschops R, Areia M, Coron E et al. Performance measures for upper gastrointestinal endoscopy: a European Society of Gastrointestinal Endoscopy (ESGE) quality improvement initiative. Endoscopy 2016; 48: 843–864
[16] Yao K. The endoscopic diagnosis of early gastric cancer. Ann Gastroenterol 2013; 26: 11–22
[17] Russakovsky O, Deng J, Su H et al. ImageNet large scale visual recognition challenge. Int J Comput Vision 2015; 115: 211–252
[18] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. CoRR arXiv:1409.1556 https://arxiv.org/abs/1409.1556
[19] He K, Zhang X, Ren S et al. Deep residual learning for image recognition. Proc IEEE Conf Comput Vision Pattern Recogn 2016: 770–778
[20] Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng 2010; 22: 1345–1359
[21] Abadi M, Agarwal A, Barham P et al. TensorFlow: a system for large-scale machine learning. 12th Symposium on Operating Systems Design and Implementation 2016: 265–283 https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf
[22] Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation. J Am Stat Assoc 1987; 82: 528–540
[23] Wen Z, Li B, Ramamohanarao K et al. Improving efficiency of SVM k-fold cross-validation by alpha seeding. AAAI 2017: 2768–2774
[24] Prechelt L. Automatic early stopping using cross validation: quantifying the criteria. Neural Netw 1998; 11: 761–767
[25] Li S, Liu G, Tang X et al. An ensemble deep convolutional neural network model with improved DS evidence fusion for bearing fault diagnosis. Sensors 2017; 17: 1729
[26] Jung Y, Hu J. A K-fold averaging cross-validation procedure. J Nonparametr Stat 2015; 27: 167–179
[27] Zhou B, Khosla A, Lapedriza A et al. Learning deep features for discriminative localization. Proc IEEE Conf Comput Vision Pattern Recogn 2016: 2921–2929
[28] O'Hailey T. Hybrid animation: integrating 2D and 3D assets. Abingdon, Oxon: Taylor and Francis; 2010
[29] Liaw A, Wiener M. Classification and regression by randomForest. R News 2002; 2: 18–22
[30] Hirasawa T, Aoyama K, Tanimoto T et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 2018; 21: 653–660
[31] Teh JL, Hartman M, Lau L et al. Mo1579 Duration of endoscopic examination significantly impacts detection rates of neoplastic lesions during diagnostic upper endoscopy. Gastrointest Endosc 2011; 73: AB393
[32] Zhang Q, Wang F, Chen ZY et al. Comparison of the diagnostic efficacy of white light endoscopy and magnifying endoscopy with narrow band imaging for early gastric cancer: a meta-analysis. Gastric Cancer 2016; 19: 543–552
[33] Ezoe Y, Muto M, Uedo N et al. Magnifying narrowband imaging is more accurate than conventional white-light imaging in diagnosis of gastric mucosal cancer. Gastroenterology 2011; 141: 2017–2025
[34] Song M, Ang TL. Early detection of early gastric cancer using image-enhanced endoscopy: current trends. Gastrointest Intervent 2014; 3: 1–7
[35] Tsai TH, Leggett CL, Trindade AJ et al. Optical coherence tomography in gastroenterology: a review and future outlook. J Biomed Opt 2017; 22: 1–17
