deepconvolutionalneuralnetworkforulcerrecognitionin ...e emergence of wireless capsule endoscopy...

15
Research Article DeepConvolutionalNeuralNetworkforUlcerRecognitionin Wireless Capsule Endoscopy: Experimental Feasibility and Optimization SenWang , 1,2 YuxiangXing, 1,2 LiZhang , 1,2 HeweiGao, 1,2 andHaoZhang 3 1 Key Laboratory of Particle & Radiation Imaging (Tsinghua University), Ministry of Education, Beijing, China 2 Department of Engineering Physics, Tsinghua University, Beijing 100084, China 3 Ankon Technologies Co., Ltd., Wuhan, Shanghai, China Correspondence should be addressed to Li Zhang; [email protected] Received 4 April 2019; Accepted 18 August 2019; Published 18 September 2019 Academic Editor: Manuel F. G. Penedo Copyright © 2019 Sen Wang et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Wireless capsule endoscopy (WCE) has developed rapidly over the last several years and now enables physicians to examine the gastrointestinal tract without surgical operation. However, a large number of images must be analyzed to obtain a diagnosis. Deep convolutional neural networks (CNNs) have demonstrated impressive performance in different computer vision tasks. us, in this work, we aim to explore the feasibility of deep learning for ulcer recognition and optimize a CNN-based ulcer recognition architecture for WCE images. By analyzing the ulcer recognition task and characteristics of classic deep learning networks, we propose a HAnet architecture that uses ResNet-34 as the base network and fuses hyper features from the shallow layer with deep features in deeper layers to provide final diagnostic decisions. 1,416 independent WCE videos are collected for this study. e overall test accuracy of our HAnet is 92.05%, and its sensitivity and specificity are 91.64% and 92.42%, respectively. According to our comparisons of F1, F2, and ROC-AUC, the proposed method performs better than several off-the-shelf CNN models, including VGG, DenseNet, and Inception-ResNet-v2, and classical machine learning methods with handcrafted features for WCE image classification. Overall, this study demonstrates that recognizing ulcers in WCE images via the deep CNN method is feasible and could help reduce the tedious image reading work of physicians. Moreover, our HAnet architecture tailored for this problem gives a fine choice for the design of network structure. 1.Introduction Gastrointestinal (GI) diseases pose great threats to human health. Gastric cancer, for example, ranks fourth among the most common type of cancers globally and is the second most common cause of death from cancer worldwide [1]. Conventional gastroscopy can provide accurate localization of lesions and is one of the most popular diagnostic mo- dalities for gastric diseases. However, conventional gas- troscopy is painful and invasive and cannot effectively detect lesions in the small intestine. e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive alternative to the con- ventional method and allows exploration of the GI tract with direct visualization. WCE has been proven to have great value in evaluating focal lesions, such as those related to GI bleeding and ulcers, in the digestive tract [2]. WCE was first induced in 2000 by Given Imaging and approved for use by the U.S. Food and Drug Administration in 2001 [3]. In the examination phase, a capsule is swallowed by a patient and propelled by peristalsis or magnetic fields to travel along the GI tract [3, 4]. While travelling, the WCE takes colored pictures of the GI tract for hours at a frame rate of 2–4 photographs per second [3] and transmits the same to a data-recording device. e recorded images are viewed by physicians to arrive at a diagnosis. Figure 1 illustrates a wireless capsule. Examination of WCE images is a time-consuming and tedious endeavor for doctors because a single scan for a Hindawi Computational and Mathematical Methods in Medicine Volume 2019, Article ID 7546215, 14 pages https://doi.org/10.1155/2019/7546215

Upload: others

Post on 10-Aug-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

Research ArticleDeep Convolutional Neural Network for Ulcer Recognition inWireless Capsule Endoscopy Experimental Feasibilityand Optimization

Sen Wang 12 Yuxiang Xing12 Li Zhang 12 Hewei Gao12 and Hao Zhang3

1Key Laboratory of Particle amp Radiation Imaging (Tsinghua University) Ministry of Education Beijing China2Department of Engineering Physics Tsinghua University Beijing 100084 China3Ankon Technologies Co Ltd Wuhan Shanghai China

Correspondence should be addressed to Li Zhang zlitsinghuaeducn

Received 4 April 2019 Accepted 18 August 2019 Published 18 September 2019

Academic Editor Manuel F G Penedo

Copyright copy 2019 Sen Wang et al is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Wireless capsule endoscopy (WCE) has developed rapidly over the last several years and now enables physicians to examine thegastrointestinal tract without surgical operation However a large number of images must be analyzed to obtain a diagnosis Deepconvolutional neural networks (CNNs) have demonstrated impressive performance in different computer vision tasks us inthis work we aim to explore the feasibility of deep learning for ulcer recognition and optimize a CNN-based ulcer recognitionarchitecture for WCE images By analyzing the ulcer recognition task and characteristics of classic deep learning networks wepropose a HAnet architecture that uses ResNet-34 as the base network and fuses hyper features from the shallow layer with deepfeatures in deeper layers to provide final diagnostic decisions 1416 independent WCE videos are collected for this study eoverall test accuracy of our HAnet is 9205 and its sensitivity and specificity are 9164 and 9242 respectively According toour comparisons of F1 F2 and ROC-AUC the proposed method performs better than several off-the-shelf CNN modelsincluding VGG DenseNet and Inception-ResNet-v2 and classical machine learning methods with handcrafted features forWCEimage classification Overall this study demonstrates that recognizing ulcers inWCE images via the deep CNNmethod is feasibleand could help reduce the tedious image reading work of physicians Moreover our HAnet architecture tailored for this problemgives a fine choice for the design of network structure

1 Introduction

Gastrointestinal (GI) diseases pose great threats to humanhealth Gastric cancer for example ranks fourth among themost common type of cancers globally and is the secondmost common cause of death from cancer worldwide [1]Conventional gastroscopy can provide accurate localizationof lesions and is one of the most popular diagnostic mo-dalities for gastric diseases However conventional gas-troscopy is painful and invasive and cannot effectively detectlesions in the small intestine

e emergence of wireless capsule endoscopy (WCE)has revolutionized the task of imaging GI issues thistechnology offers a noninvasive alternative to the con-ventional method and allows exploration of the GI tract

with direct visualization WCE has been proven to havegreat value in evaluating focal lesions such as those relatedto GI bleeding and ulcers in the digestive tract [2]

WCE was first induced in 2000 by Given Imaging andapproved for use by the US Food and Drug Administrationin 2001 [3] In the examination phase a capsule is swallowedby a patient and propelled by peristalsis or magnetic fields totravel along the GI tract [3 4] While travelling the WCEtakes colored pictures of the GI tract for hours at a frame rateof 2ndash4 photographs per second [3] and transmits the same toa data-recording device e recorded images are viewed byphysicians to arrive at a diagnosis Figure 1 illustrates awireless capsule

Examination of WCE images is a time-consuming andtedious endeavor for doctors because a single scan for a

HindawiComputational and Mathematical Methods in MedicineVolume 2019 Article ID 7546215 14 pageshttpsdoiorg10115520197546215

patient may include up to tens of thousands of images ofthe GI tract Experienced physicians may spend hoursreviewing each case Furthermore abnormal frames mayoccupy only a tiny portion of all of the images obtained [5]us physicians may miss the actual issue due to fatigue oroversight

Several features have motivated researchers to turn tocomputer-aided systems including improved ulcer de-tection polyp recognition and bleeding area segmentation[3 5ndash15] to reduce the burden on physicians and guaranteediagnostic precision

Ulcers are one of the most common lesions in the GItract an estimated 1 out of every 10 persons is believed tosuffer from ulcers [13] An ulcer is defined as an area oftissues destroyed by gastric juice and showing a disconti-nuity or break in a bodily membrane [9 11] e color andtexture of the ulcerated area are different from those of anormal GI tract Some representative ulcer frames in WCEvideos are demonstrated in Figure 2 Ulcer recognitionrequires classification of each image in a WCE video asulcerated or not similar to the classification work incomputer vision tasks

Deep learning methods based on the convolutionalneural network (CNN) have seen several breakthroughs inclassification tasks in recent years Considering the diffi-culty in mathematically describing the great variation in theshapes and features of abnormal regions in WCE imagesand the fact that deep learning is powerful in extractinginformation from data we propose the application of deeplearning methods to ulcer recognition using a large WCEdataset of big volume to provide adequate diversity In thispaper we carefully analyze the problem of ulcer frameclassification and propose a deep learning framework basedon a multiscale feature concatenated CNN hereinafterreferred to as HAnet to assist in the WCE video exami-nation task of physicians Our network is verified to beeffective on a large dataset containing WCE videos of 1416patients

Our main contributions can be summarized in termsof the following three aspects (1) e proposed archi-tecture adopts state-of-the-art CNN models to efficientlyextract features for ulcer recognition It incorporates aspecial design that fuses hyper features from shallowlayers and deep features from deep layers to improve therecognition of ulcers at vastly distributed scales (2) To thebest of our knowledge this work is the first experimentalstudy to include a large dataset consisting of over 1400WCE videos from ulcer patients to explore the feasibility

of deep CNN for ulcer diagnosis Some representativedatasets presented in published works are listed in Table 1e 9205 accuracy and 09726 ROC-AUC of our pro-posed model demonstrate its great potential for practicalclinic applications (3) An extensive comparison withdifferent state-of-the-art CNN network structures isprovided to evaluate the most promising network for ulcerrecognition

2 Materials and Methods

21 Abnormality Recognition in WCE Videos Prior relatedmethods for abnormality recognition in WCE videos can beroughly divided into two classes conventional machinelearning techniques with handcrafted features and deeplearning methods

Conventional machine learning techniques are usuallybased on manually selected handcrafted features followed byapplication of some classifier Features commonly employedin conventional techniques include color and texturalfeatures

Lesion areas are usually of a different color from thesurrounding normal areas for example bleeding areas maypresent as red and ulcerated areas may present as yellow orwhite Fu et al [12] proposed a rapid bleeding detectionmethod that extracts color feature in the RGB color spaceBesides the RGB color space other color spaces like the HSIHSV [9] and YCbCr [3] are also commonly used to extractfeatures

Texture is another type of feature commonly used forpattern recognition Texture features include local binarypatterns (LBP) and filter-based features [7] An LBPdescriptor is based on a simple binary coding scheme thatcompares each pixel with its neighbors [19] e LBPdescriptor as well as its extended versions such asuniform LBP [8] and monogenic LBP [20] has beenadopted in various WCE recognition tasks Filter-basedfeatures such as Gabor filters and wavelet transforms arewidely used in WCE image recognition tasks for theirability to describe images in multistage space In addi-tion different textural features can be combined forbetter recognition performance As demonstrated in [8]the combination of wavelet transformation and uniformLBP can achieve automatic polyp detection with goodaccuracy

CNN-based deep learning methods are known to showimpressive performance e error rate in computer visionchallenges (eg ImageNet COCO) has decreased rapidly

CameraCMOS

Battery

Magnet

Antenna

Figure 1 Illustration of a wireless capsule is capsule is a product of Ankon Technologies Co Ltd (Wuhan Shanghai China)

2 Computational and Mathematical Methods in Medicine

(a) (b) (c)

(d) (e) (f )

Figure 2 Typical WCE images of an ulcer Ulcerated areas in each image are marked by a white box

Table 1 Representative studies and datasets of WCE videos in the literature

Experiment Cases DetailLi and Meng [5] 10 patientsrsquo videos 10 patientsrsquo videos 200 images

Li and Meng [16] 10 patientsrsquo videos 10 patientsrsquo videos (five for bleeding and the other fivefor ulcer)

Li et al [17] mdash 80 representative small intestine ulcer WCE imagesand 80 normal images

Karargyris and Bourbakis [9] mdash A WCE video containing 10 frames with polyps and40 normal frames and extra 20 frames with ulcer

Li and Meng [8] 10 patientsrsquo videos

10 patientsrsquo videos 600 representative polyp imagesand 600 normal images from data 60 normal images

and 60 polyp images from each patientrsquos videosegments

Yu et al [10] 60 patientsrsquo videos60 patientsrsquo videos 344 endoscopic images for

training another 120 ulcer images and 120 normalimages for testing

Fu et al [12] 20 patientsrsquo videos 20 patientsrsquo videos 5000 WCE images consisting of1000 bleeding frames and 4000 nonbleeding frames

Yeh et al [11] mdash 607 images containing 220 159 and 228 images ofbleeding ulcers and nonbleedingulcers respectively

Yuan et al [3] 10 patientsrsquo videos 10 patientsrsquo videos 2400 WCE images that consist of400 bleeding frames and 2000 normal frames

Yuan and Meng [7] 35 patientsrsquo videos35 patientsrsquo videos 3000 normal WCE images (1000bubbles 1000 TIs and 1000 CIs) and 1000 polyp

imagesHe et al [6] 11 patientsrsquo videos 11 patientsrsquo videos 440K WCE images

Aoki et al [18] 180 patientsrsquo videos115 patientsrsquo videos 5360 images of small-bowelerosions and ulcerations for training 65 patientsrsquovideos 10440 independent images for validation

Ours 1416 patientsrsquo videos 1416 patientsrsquo videos with 24839 representative ulcerframes

Computational and Mathematical Methods in Medicine 3

with the emergence of various deep CNN architectures suchas AlexNet VGGNet GoogLeNet and ResNet [21ndash28]

Many researchers have realized that handcrafted fea-tures merely encode partial information in WCE images[29] and that deep learning methods are capable ofextracting powerful feature representations that can beused in WCE lesion recognition and depth estimation[6 7 15 18 30ndash34]

A framework for hookworm detection was proposed in[14] this framework consists of an edge extraction networkand a hookworm classification network Inception modulesare used to capture multiscale features to capture spatialcorrelationse robustness and effectiveness of this methodwere verified in a dataset containing 440000WCE images of11 patients Yuan and Meng [7] proposed an autoencoder-based neural network model that introduces an imagemanifold constraint to a traditional sparse autoencoder torecognize polyps in WCE images Manifold constraint caneffectively enforce images within the same category to sharesimilar features and keep images in different categories farway ie it can preserve large intervariances and smallintravariance among images e proposed method wasevaluated using 3000 normal WCE images and 1000 WCEimages with polyps extracted from 35 patient videos Uti-lizing temporal information of WCE videos with 3D con-volution has also been explored for poly detection [33] Deeplearning methods are also adopted for ulcer diagnosis ereare also some investigations of ulcer recognition with deeplearning methods [18 35] Off-the-shelf CNN models aretrained and evaluated in these studies Experimental resultsand comparisons in these studies clearly demonstrate thesuperiority of deep learning methods over conventionalmachine learning techniques

From ulcer size analysis of our dataset we find that mostof the ulcers occupy only a tiny area in the whole imageDeep CNNs can inherently compute feature hierarchieslayer by layer Hyper features from shallow layers have highresolution but lack representation capacity by contrast deepfeatures from deep layers are semantically strong but havepoor resolution [36ndash38] ese features motivate us topropose a framework that fuses hyper and deep features toachieve ulcer recognition at vastly different scales We willgive detailed description of ulcer size analysis and theproposed method in Sections 22 and 23

22 Ulcer Dataset Our dataset is collected using a WCEsystem provided by Ankon Technologies Co Ltd(Wuhan Shanghai China) e WCE system consists ofan endoscopic capsule a guidance magnet robot a datarecorder and a computer workstation with software forreal-time viewing and controlling e capsule is28mm times 12mm in size and contains a permanent magnetin its dome Images are recorded and transferred at aspeed of 2 framess e resolution of the WCE image is480 times 480 pixels

e dataset used in this work to evaluate the perfor-mance of the proposed framework contains 1416 WCEvideos from 1416 patients (males 73 female 27) ie one

video per patient e WCE videos are collected from morethan 30 hospitals and 100 medical examination centersthrough the Ankon WCE system Each video is in-dependently annotated by at least two gastroenterologists Ifthe difference between annotation bounding boxes of thesame ulcer is larger than 10 an expert gastroenterologistwill review the annotation and provide a final decision eage distribution of patients is illustrated in Figure 3 eentire dataset consists of 1157 ulcer videos and 259 normalvideos In total 24839 frames are annotated as ulcers bygastroenterologists To balance the volume of each class24225 normal frames are randomly extracted from normalvideos for this study to match the 24839 representative ulcerframes A mask of diameter 420 pixels was used to crop thecenter area of each image in preprocessing is pre-processing did not change the image size

We plot the distribution of ulcer size in our dataset inFigure 4 e vertical and horizontal axes denote thenumber of images and the ratio of the ulcerated area to thewhole image size respectively Despite the inspiring suc-cess of CNNs in ImageNet competition ulcer recognitionpresents some challenge to the ImageNet classification taskbecause lesions normally occupy only a small area of WCEimages and the structures of lesions are rather subtle InFigure 4 about 25 of the ulcers occupy less than 1 of thearea of the whole image and more than 80 of the ulcersfound occupy less than 5 of the area of the image Hencea specific design of a suitable network is proposed to ac-count for the small ulcer problem and achieve goodsensitivity

23 HAnet-Based Ulcer Recognition Network with FusedHyper and Deep Features In this section we introduce ourdesign and the proposed architecture of our ulcer recog-nition network

Inspired by the design concept of previous works thatdeal with object recognition in vastly distributed scales[36ndash38] we propose an ulcer recognition network with ahyperconnection architecture (HAnet) e overall pipelineof this network is illustrated in Figure 5 FundamentallyHAnet fuses hyper and deep features Here we use ResNet-34 as the base feature-extraction network because accordingto our experiments (demonstrated in Section 3) it providesthe best results Global average pooling (GAP) [39] is used togenerate features for each layer GAP takes an average ofeach feature layer so that it reduces tensors with dimensionshtimes w times d to 1times 1times d Hyper features can be extracted frommultiple intermediate layers (layers 2 and 3 in this case) ofthe base network they are concatenated with the features oflast feature-extraction layer (layer 4 in this case) to make thefinal decision

Our WCE system outputs color images with a resolutionof 480times 480 pixels Experiments by the computer visioncommunity [36 40] have shown that high-resolution inputimages are helpful to the performance of CNN networks Tofully utilize the output images from the WCE system wemodify the base network to receive input images with a sizeof 480times 480times 3 without cropping or rescaling

4 Computational and Mathematical Methods in Medicine

24 Loss ofWeightedCross Entropy Cross-entropy (CE) lossis a common choice for classification tasks For binaryclassification [40] CE is defined as

CE(p y) minus y log(p) minus (1 minus y)log(1 minus p) (1)

where y isin 0 1 denotes the ground-truth label of thesample and p isin [0 1] is the estimated probability of asample belonging to the class with label 1 Mathematicallythe minimization process of CE is to enlarge the probabilitiesof samples with label 1 and suppress the probabilities ofsamples with label 0

To deal with possible imbalance between classes aweighting factor can be applied to different classes whichcan be called weighted cross-entropy (wCE) loss [41]

wCE(p y w) minusw

w + 1middot y log(p) minus

1w + 1

middot (1 minus y)log(1 minus p)

(2)

where w denotes the weighting factor to balance the loss ofdifferent classes Considering the overall small and varia-tional size distribution of ulcers as well as possible imbal-ance in the large dataset we set wCE as our loss function

25 Evaluation Criteria To evaluate the performance ofclassification accuracy (AC) sensitivity (SE) and specificity(SP) are exploited as metrics [6]

AC (TP + TN)

N

SE TP

(TP + FN)

SP TN

(TN + FP)

(3)

Here N is the total number of test images and TP FPTN and FN are the number of correctly classified imagescontaining ulcers the number of normal images falselyclassified as ulcer frames the number of correctly classifiedimages without ulcers and the number of images with ulcersfalsely classified as normal images respectively

AC gives an overall assessment of the performance of themodel SE denotes the modelrsquos ability to detect ulcer imagesand SP denotes its ability to distinguish normal imagesIdeally we expect both high SE and SP although some trade-offs between these metrics exist Considering that furthermanual inspection by the doctor of ulcer images detected bycomputer-aided systems is compulsory SE should be as highas possible with no negative impact on overall AC

We use a 5-fold cross-validation strategy at the case levelto evaluate the performances of different architectures thisstrategy splits the total number of cases evenly into fivesubsets Here one subset is used for testing and the fourother subsets are used for training and validation Figure 6illustrates the cross-validation operation In the presentstudy the volumes of train validation and test are about70 10 and 20 respectively Normal or ulcer frames arethen extracted from each case to form the trainingvalida-tiontesting dataset We perform case-level splitting becauseadjacent frames in the same case are likely to share similardetails We do not conduct frame-level cross-validationsplitting to avoid overfittinge validation dataset is used toselect the best model in each training process ie the modelwith the best validation accuracy during the training iter-ation is saved as the final model

3 Results

In this section the implementation process of the proposedmethod is introduced and its performance is evaluated bycomparison with several other related methods includingstate-of-the-art CNN methods and some representativeWCE recognition methods based on conventional machinelearning techniques

31 Network Architectures and Training Configurationse proposed HAnet connects hyper features to the finalfeature vector with the aim of enhancing the recognition ofulcers of different sizes e HAnet models are distinguishedby their architecture and training settings which includethree architectures and three training configurations in total

10ndash30 30ndash40 40ndash50 50ndash60 60ndash70 70ndash0

10

20

30

40

356

1804

3665

2810

1015

451

Age distribution

Case

por

tion

()

Figure 3 Age distribution of patients providing videos for thisstudy e horizontal axis denotes the age range and the verticalaxis denotes the case portion

lt1 1ndash25 25ndash5 gt50

2000

4000

6000

8000

6239

8969

5650

4115

Size ()

Case

num

ber

Figure 4 Distribution of ulcer size among patients e horizontalaxis denotes the ratio of ulcer area to the whole image while thevertical axis presents the number of frames

Computational and Mathematical Methods in Medicine 5

We illustrate these different architectures and configurationsin Table 2

ree different architectures can be obtained whenhyper features from layers 2 and 3 are used for decision incombination with features from layer 4 of our ResNetbackbone in Figure 7(a) hyper(l2) which fuses the hyperfeatures from layer 2 with the deep features of layer 4 inFigure 7(b) hyper(l3) which fuses features from layers 3and 4 and in Figure 7(c) hyper(l23) which fuses featuresfrom layers 2 3 and 4 to form the third HAnet Figure 7provides comprehensive diagrams of these different HAnetarchitectures

Each HAnet can be trained with three configurationsFigure 8 illustrates these configurations

311 ImageNet e whole HAnet is trained using pre-trained ResNet weights from ImageNet from initialization(denoted as ImageNet in Table 2) e total training processlasts for 40 epochs and the batch size is fixed to 16 samplese learning rate was initialized to be 10minus 3 and decayed by afactor of 10 at each period of 20 epochs e parameter ofmomentum is set to 09 Experimental results show that 40epochs are adequate for training to converge Weightedcross-entropy loss is used as the optimization criterion ebest model is selected based on validation results

312 All-Update ResNet(480) is first fine-tuned on ourdataset using pretrained ResNet weights from ImageNet for

initialization e training settings are identical to those in(1) Convergence is achieved during training and the bestmodel is selected based on validation results We then trainthe whole HAnet using the fine-tuned ResNet (480)models for initialization and update all weights in HAnet(denoted as all-update in Table 2) Training lasts for 40epochs e learning rate is set to 10minus 4 momentum is set to09 and the best model is selected based on validationresults

313 FC-Only e weights of the fine-tuned ResNet(480)model are used and only the last fully connected (FC) layeris updated in HAnet (denoted as FC-only in Table 2) ebest model is selected based on validation results Traininglasts for 10 epochs the learning rate is set to 10minus 4 andmomentum is set to 09

For example the first HAnet in Table 2 hyper(l2) FC-only refers to the architecture fusing the features from layer2 and the final layer 4 it uses ResNet (480) weights as thefeature extractor and only the final FC layer is updatedduring HAnet training

To achieve better generalizability data augmentation wasapplied online in the training procedure as suggested in [7]e images are randomly rotated between 0deg and 90deg andflipped with 50 possibility Our network is implementedusing PyTorch

e experiments are conducted on an Intel Xeon ma-chine (Gold 6130 CPU210GHz) with Nvidia Quadro

GAP (global average pooling)

GAP

Input Layer 1 Layer 2 Layer 3 Layer 4

GAP

Features

120 times 120 times 64 60 times 60 times 128 30 times 30 times 256 15 times 15 times 512 1 times 1 times 512 1 times 1 times 256 1 times 1 times 128480 times 480 times 3

Figure 5 Ulcer recognition network framework ResNet-34 which has 34 layers is selected as the feature extractor Here we only displaythe structural framework for clarity Detailed layer information can be found in Appendix

Split 0

Split 1

Split 4

hellip

helliphellip

Test Train Val

Figure 6 Illustration of cross-validation (green test dataset blue train dataset yellow validation dataset)

6 Computational and Mathematical Methods in Medicine

GP100 graphics cards Detailed results are presented inSections 32ndash34

32 Refinement of the Weighting Factor for Weighted Cross-Entropy To demonstrate the impact of different weightingfactors ie w in equation (2) we examine the cross-vali-dation results of model recognition accuracy with differentweighting factors e AC SE and SP curves are shown inFigure 9

AC varies with changes in weighting factor In generalSE improves while SP is degraded as the weighting factorincreases Detailed AC SE and SP values are listed inTable 3 ResNet-18(480) refers to experiments on a ResNet-18 network with a WCE full-resolution image input of480times 480 times 3 A possible explanation for the observed effectof the weighting factor is that the ulcer dataset containsmany consecutive frames of the same ulcer and theseframes may share marked similarities us while theframe numbers of the ulcer and normal dataset are com-parable the information contained by each dataset remainsunbalanced e weighting factor corrects or compensatesfor this imbalance

In the following experiments 40 is used as the weightingfactor as it outperforms other choices and simultaneouslyachieves good balance between SE and SP

33 Selection of Hyper Architectures We tested 10 models intotal as listed in Table 4 including a ResNet-18(480) modeland nine HAnet models based on ResNet-18(480) eresolution of input images is 480times 480times 3 for all models

According to the results in Table 4 the FC-only and all-update hyper models consistently outperform the ResNet-18(480) model in terms of the AC criterion which dem-onstrates the effectiveness of HAnet architectures More-over FC-only models generally perform better than all-update models thus implying that ResNet-18(480) extractsfeatures well and that further updates may corrupt thesefeatures

e hyper ImageNet models including hyper(l2)ImageNet hyper(l3) ImageNet and hyper(l23) ImageNetseem to give weak performance Hyper ImageNet modelsand the other hyper models share the same architecturese difference between these types of models is that thehyper ImageNet models are trained with the pretrainedImageNet ResNet-18 weights while the other models useResNet-18(480) weights that have been fine-tuned on theWCE datasetis finding reveals that a straightforward basenet such as ResNet-18(480) shows great power in extractingfeatures e complicated connections of HAnet may pro-hibit the network from reaching good convergence points

To fully utilize the advantages of hyper architectures werecommend a two-stage training process (1) Train a ResNet-18(480) model based on the ImageNet-pretrained weightsand then (2) use the fine-tuned ResNet-18(480) model as abackbone feature extractor to train the hyper models Wedenote the best model in all hyper architectures as HAnet-18(480) ie a hyper(l23) FC-only model

Additionally former exploration is based on ResNet-18and results indicate that a hyper(l23) FC-only architecturebased on the ResNet backbone feature extractor fine-tunedbyWCE images may be expected to improve the recognition

Table 2 Illustration of different architectures and configurations

ModelFeatures

Network initialization weights TrainingLayer 2 Layer 3 Layer 4

ResNet(480) ImageNet Train on WCE datasethyper(l2) FC-only

ResNet(480) Update FC-layer onlyhyper(l3) FC-only hyper(l23) FC-only hyper(l2) all-update

ResNet(480) Update all layershyper(l3) all-update hyper(l23) all-update hyper(l2) ImageNet

ImageNet Update all layershyper(l3) ImageNet hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(a)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(b)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(c)

Figure 7 Illustrations of HAnet architectures (a) hyper(l2) (b) hyper(l3) (c) hyper(l23)

Computational and Mathematical Methods in Medicine 7

ResNet(480) weights

ImageNet weights

Other trained weights

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

ResNet from ImageNet

Trai

n

Layer 1

Layer 2

Layer 3

Layer 4

ResNet(480)

FC la

yer

FC la

yer

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) all-update

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) FC-only

Train ResNet backbone to get ResNet(480)

all-update use the trained ResNet(480) weights as initialization and update all layers during training

FC-only use the trained ResNet(480) weights as initialization and update the FC layer during training

ImageNet use the ImageNet ResNet weights as initialization and update all layers during training

Figure 8 Illustration of three different training configurations including ImageNet all-update and FC-only Different colors are used torepresent different weights Green denotes pretrained weights from ImageNet and purple denotes the weights of ResNet(480) that have beenfine-tuned on the WCE dataset Several other colors denote other trained weights

1 2 3 4 5 688

89

90

91

92

93

94

Weighting factor

Val

ue (

)

ACSESP

Figure 9 AC SE and SP evolution against the wCE weighting factor Red purple and blue curves denote the results of AC SE and SPrespectively e horizontal axis is the weighting factor and the vertical axis is the value of AC SE and SP

8 Computational and Mathematical Methods in Medicine

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 2: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

patient may include up to tens of thousands of images ofthe GI tract Experienced physicians may spend hoursreviewing each case Furthermore abnormal frames mayoccupy only a tiny portion of all of the images obtained [5]us physicians may miss the actual issue due to fatigue oroversight

Several features have motivated researchers to turn tocomputer-aided systems including improved ulcer de-tection polyp recognition and bleeding area segmentation[3 5ndash15] to reduce the burden on physicians and guaranteediagnostic precision

Ulcers are one of the most common lesions in the GItract an estimated 1 out of every 10 persons is believed tosuffer from ulcers [13] An ulcer is defined as an area oftissues destroyed by gastric juice and showing a disconti-nuity or break in a bodily membrane [9 11] e color andtexture of the ulcerated area are different from those of anormal GI tract Some representative ulcer frames in WCEvideos are demonstrated in Figure 2 Ulcer recognitionrequires classification of each image in a WCE video asulcerated or not similar to the classification work incomputer vision tasks

Deep learning methods based on the convolutionalneural network (CNN) have seen several breakthroughs inclassification tasks in recent years Considering the diffi-culty in mathematically describing the great variation in theshapes and features of abnormal regions in WCE imagesand the fact that deep learning is powerful in extractinginformation from data we propose the application of deeplearning methods to ulcer recognition using a large WCEdataset of big volume to provide adequate diversity In thispaper we carefully analyze the problem of ulcer frameclassification and propose a deep learning framework basedon a multiscale feature concatenated CNN hereinafterreferred to as HAnet to assist in the WCE video exami-nation task of physicians Our network is verified to beeffective on a large dataset containing WCE videos of 1416patients

Our main contributions can be summarized in termsof the following three aspects (1) e proposed archi-tecture adopts state-of-the-art CNN models to efficientlyextract features for ulcer recognition It incorporates aspecial design that fuses hyper features from shallowlayers and deep features from deep layers to improve therecognition of ulcers at vastly distributed scales (2) To thebest of our knowledge this work is the first experimentalstudy to include a large dataset consisting of over 1400WCE videos from ulcer patients to explore the feasibility

of deep CNN for ulcer diagnosis Some representativedatasets presented in published works are listed in Table 1e 9205 accuracy and 09726 ROC-AUC of our pro-posed model demonstrate its great potential for practicalclinic applications (3) An extensive comparison withdifferent state-of-the-art CNN network structures isprovided to evaluate the most promising network for ulcerrecognition

2 Materials and Methods

21 Abnormality Recognition in WCE Videos Prior relatedmethods for abnormality recognition in WCE videos can beroughly divided into two classes conventional machinelearning techniques with handcrafted features and deeplearning methods

Conventional machine learning techniques are usuallybased on manually selected handcrafted features followed byapplication of some classifier Features commonly employedin conventional techniques include color and texturalfeatures

Lesion areas are usually of a different color from thesurrounding normal areas for example bleeding areas maypresent as red and ulcerated areas may present as yellow orwhite Fu et al [12] proposed a rapid bleeding detectionmethod that extracts color feature in the RGB color spaceBesides the RGB color space other color spaces like the HSIHSV [9] and YCbCr [3] are also commonly used to extractfeatures

Texture is another type of feature commonly used forpattern recognition Texture features include local binarypatterns (LBP) and filter-based features [7] An LBPdescriptor is based on a simple binary coding scheme thatcompares each pixel with its neighbors [19] e LBPdescriptor as well as its extended versions such asuniform LBP [8] and monogenic LBP [20] has beenadopted in various WCE recognition tasks Filter-basedfeatures such as Gabor filters and wavelet transforms arewidely used in WCE image recognition tasks for theirability to describe images in multistage space In addi-tion different textural features can be combined forbetter recognition performance As demonstrated in [8]the combination of wavelet transformation and uniformLBP can achieve automatic polyp detection with goodaccuracy

CNN-based deep learning methods are known to showimpressive performance e error rate in computer visionchallenges (eg ImageNet COCO) has decreased rapidly

CameraCMOS

Battery

Magnet

Antenna

Figure 1 Illustration of a wireless capsule is capsule is a product of Ankon Technologies Co Ltd (Wuhan Shanghai China)

2 Computational and Mathematical Methods in Medicine

(a) (b) (c)

(d) (e) (f )

Figure 2 Typical WCE images of an ulcer Ulcerated areas in each image are marked by a white box

Table 1 Representative studies and datasets of WCE videos in the literature

Experiment Cases DetailLi and Meng [5] 10 patientsrsquo videos 10 patientsrsquo videos 200 images

Li and Meng [16] 10 patientsrsquo videos 10 patientsrsquo videos (five for bleeding and the other fivefor ulcer)

Li et al [17] mdash 80 representative small intestine ulcer WCE imagesand 80 normal images

Karargyris and Bourbakis [9] mdash A WCE video containing 10 frames with polyps and40 normal frames and extra 20 frames with ulcer

Li and Meng [8] 10 patientsrsquo videos

10 patientsrsquo videos 600 representative polyp imagesand 600 normal images from data 60 normal images

and 60 polyp images from each patientrsquos videosegments

Yu et al [10] 60 patientsrsquo videos60 patientsrsquo videos 344 endoscopic images for

training another 120 ulcer images and 120 normalimages for testing

Fu et al [12] 20 patientsrsquo videos 20 patientsrsquo videos 5000 WCE images consisting of1000 bleeding frames and 4000 nonbleeding frames

Yeh et al [11] mdash 607 images containing 220 159 and 228 images ofbleeding ulcers and nonbleedingulcers respectively

Yuan et al [3] 10 patientsrsquo videos 10 patientsrsquo videos 2400 WCE images that consist of400 bleeding frames and 2000 normal frames

Yuan and Meng [7] 35 patientsrsquo videos35 patientsrsquo videos 3000 normal WCE images (1000bubbles 1000 TIs and 1000 CIs) and 1000 polyp

imagesHe et al [6] 11 patientsrsquo videos 11 patientsrsquo videos 440K WCE images

Aoki et al [18] 180 patientsrsquo videos115 patientsrsquo videos 5360 images of small-bowelerosions and ulcerations for training 65 patientsrsquovideos 10440 independent images for validation

Ours 1416 patientsrsquo videos 1416 patientsrsquo videos with 24839 representative ulcerframes

Computational and Mathematical Methods in Medicine 3

with the emergence of various deep CNN architectures suchas AlexNet VGGNet GoogLeNet and ResNet [21ndash28]

Many researchers have realized that handcrafted fea-tures merely encode partial information in WCE images[29] and that deep learning methods are capable ofextracting powerful feature representations that can beused in WCE lesion recognition and depth estimation[6 7 15 18 30ndash34]

A framework for hookworm detection was proposed in[14] this framework consists of an edge extraction networkand a hookworm classification network Inception modulesare used to capture multiscale features to capture spatialcorrelationse robustness and effectiveness of this methodwere verified in a dataset containing 440000WCE images of11 patients Yuan and Meng [7] proposed an autoencoder-based neural network model that introduces an imagemanifold constraint to a traditional sparse autoencoder torecognize polyps in WCE images Manifold constraint caneffectively enforce images within the same category to sharesimilar features and keep images in different categories farway ie it can preserve large intervariances and smallintravariance among images e proposed method wasevaluated using 3000 normal WCE images and 1000 WCEimages with polyps extracted from 35 patient videos Uti-lizing temporal information of WCE videos with 3D con-volution has also been explored for poly detection [33] Deeplearning methods are also adopted for ulcer diagnosis ereare also some investigations of ulcer recognition with deeplearning methods [18 35] Off-the-shelf CNN models aretrained and evaluated in these studies Experimental resultsand comparisons in these studies clearly demonstrate thesuperiority of deep learning methods over conventionalmachine learning techniques

From ulcer size analysis of our dataset we find that mostof the ulcers occupy only a tiny area in the whole imageDeep CNNs can inherently compute feature hierarchieslayer by layer Hyper features from shallow layers have highresolution but lack representation capacity by contrast deepfeatures from deep layers are semantically strong but havepoor resolution [36ndash38] ese features motivate us topropose a framework that fuses hyper and deep features toachieve ulcer recognition at vastly different scales We willgive detailed description of ulcer size analysis and theproposed method in Sections 22 and 23

22 Ulcer Dataset Our dataset is collected using a WCEsystem provided by Ankon Technologies Co Ltd(Wuhan Shanghai China) e WCE system consists ofan endoscopic capsule a guidance magnet robot a datarecorder and a computer workstation with software forreal-time viewing and controlling e capsule is28mm times 12mm in size and contains a permanent magnetin its dome Images are recorded and transferred at aspeed of 2 framess e resolution of the WCE image is480 times 480 pixels

e dataset used in this work to evaluate the perfor-mance of the proposed framework contains 1416 WCEvideos from 1416 patients (males 73 female 27) ie one

video per patient e WCE videos are collected from morethan 30 hospitals and 100 medical examination centersthrough the Ankon WCE system Each video is in-dependently annotated by at least two gastroenterologists Ifthe difference between annotation bounding boxes of thesame ulcer is larger than 10 an expert gastroenterologistwill review the annotation and provide a final decision eage distribution of patients is illustrated in Figure 3 eentire dataset consists of 1157 ulcer videos and 259 normalvideos In total 24839 frames are annotated as ulcers bygastroenterologists To balance the volume of each class24225 normal frames are randomly extracted from normalvideos for this study to match the 24839 representative ulcerframes A mask of diameter 420 pixels was used to crop thecenter area of each image in preprocessing is pre-processing did not change the image size

We plot the distribution of ulcer size in our dataset inFigure 4 e vertical and horizontal axes denote thenumber of images and the ratio of the ulcerated area to thewhole image size respectively Despite the inspiring suc-cess of CNNs in ImageNet competition ulcer recognitionpresents some challenge to the ImageNet classification taskbecause lesions normally occupy only a small area of WCEimages and the structures of lesions are rather subtle InFigure 4 about 25 of the ulcers occupy less than 1 of thearea of the whole image and more than 80 of the ulcersfound occupy less than 5 of the area of the image Hencea specific design of a suitable network is proposed to ac-count for the small ulcer problem and achieve goodsensitivity

23 HAnet-Based Ulcer Recognition Network with FusedHyper and Deep Features In this section we introduce ourdesign and the proposed architecture of our ulcer recog-nition network

Inspired by the design concept of previous works thatdeal with object recognition in vastly distributed scales[36ndash38] we propose an ulcer recognition network with ahyperconnection architecture (HAnet) e overall pipelineof this network is illustrated in Figure 5 FundamentallyHAnet fuses hyper and deep features Here we use ResNet-34 as the base feature-extraction network because accordingto our experiments (demonstrated in Section 3) it providesthe best results Global average pooling (GAP) [39] is used togenerate features for each layer GAP takes an average ofeach feature layer so that it reduces tensors with dimensionshtimes w times d to 1times 1times d Hyper features can be extracted frommultiple intermediate layers (layers 2 and 3 in this case) ofthe base network they are concatenated with the features oflast feature-extraction layer (layer 4 in this case) to make thefinal decision

Our WCE system outputs color images with a resolutionof 480times 480 pixels Experiments by the computer visioncommunity [36 40] have shown that high-resolution inputimages are helpful to the performance of CNN networks Tofully utilize the output images from the WCE system wemodify the base network to receive input images with a sizeof 480times 480times 3 without cropping or rescaling

4 Computational and Mathematical Methods in Medicine

24 Loss ofWeightedCross Entropy Cross-entropy (CE) lossis a common choice for classification tasks For binaryclassification [40] CE is defined as

CE(p y) minus y log(p) minus (1 minus y)log(1 minus p) (1)

where y isin 0 1 denotes the ground-truth label of thesample and p isin [0 1] is the estimated probability of asample belonging to the class with label 1 Mathematicallythe minimization process of CE is to enlarge the probabilitiesof samples with label 1 and suppress the probabilities ofsamples with label 0

To deal with possible imbalance between classes aweighting factor can be applied to different classes whichcan be called weighted cross-entropy (wCE) loss [41]

wCE(p y w) minusw

w + 1middot y log(p) minus

1w + 1

middot (1 minus y)log(1 minus p)

(2)

where w denotes the weighting factor to balance the loss ofdifferent classes Considering the overall small and varia-tional size distribution of ulcers as well as possible imbal-ance in the large dataset we set wCE as our loss function

25 Evaluation Criteria To evaluate the performance ofclassification accuracy (AC) sensitivity (SE) and specificity(SP) are exploited as metrics [6]

AC (TP + TN)

N

SE TP

(TP + FN)

SP TN

(TN + FP)

(3)

Here N is the total number of test images and TP FPTN and FN are the number of correctly classified imagescontaining ulcers the number of normal images falselyclassified as ulcer frames the number of correctly classifiedimages without ulcers and the number of images with ulcersfalsely classified as normal images respectively

AC gives an overall assessment of the performance of themodel SE denotes the modelrsquos ability to detect ulcer imagesand SP denotes its ability to distinguish normal imagesIdeally we expect both high SE and SP although some trade-offs between these metrics exist Considering that furthermanual inspection by the doctor of ulcer images detected bycomputer-aided systems is compulsory SE should be as highas possible with no negative impact on overall AC

We use a 5-fold cross-validation strategy at the case levelto evaluate the performances of different architectures thisstrategy splits the total number of cases evenly into fivesubsets Here one subset is used for testing and the fourother subsets are used for training and validation Figure 6illustrates the cross-validation operation In the presentstudy the volumes of train validation and test are about70 10 and 20 respectively Normal or ulcer frames arethen extracted from each case to form the trainingvalida-tiontesting dataset We perform case-level splitting becauseadjacent frames in the same case are likely to share similardetails We do not conduct frame-level cross-validationsplitting to avoid overfittinge validation dataset is used toselect the best model in each training process ie the modelwith the best validation accuracy during the training iter-ation is saved as the final model

3 Results

In this section the implementation process of the proposedmethod is introduced and its performance is evaluated bycomparison with several other related methods includingstate-of-the-art CNN methods and some representativeWCE recognition methods based on conventional machinelearning techniques

31 Network Architectures and Training Configurationse proposed HAnet connects hyper features to the finalfeature vector with the aim of enhancing the recognition ofulcers of different sizes e HAnet models are distinguishedby their architecture and training settings which includethree architectures and three training configurations in total

10ndash30 30ndash40 40ndash50 50ndash60 60ndash70 70ndash0

10

20

30

40

356

1804

3665

2810

1015

451

Age distribution

Case

por

tion

()

Figure 3 Age distribution of patients providing videos for thisstudy e horizontal axis denotes the age range and the verticalaxis denotes the case portion

lt1 1ndash25 25ndash5 gt50

2000

4000

6000

8000

6239

8969

5650

4115

Size ()

Case

num

ber

Figure 4 Distribution of ulcer size among patients e horizontalaxis denotes the ratio of ulcer area to the whole image while thevertical axis presents the number of frames

Computational and Mathematical Methods in Medicine 5

We illustrate these different architectures and configurationsin Table 2

ree different architectures can be obtained whenhyper features from layers 2 and 3 are used for decision incombination with features from layer 4 of our ResNetbackbone in Figure 7(a) hyper(l2) which fuses the hyperfeatures from layer 2 with the deep features of layer 4 inFigure 7(b) hyper(l3) which fuses features from layers 3and 4 and in Figure 7(c) hyper(l23) which fuses featuresfrom layers 2 3 and 4 to form the third HAnet Figure 7provides comprehensive diagrams of these different HAnetarchitectures

Each HAnet can be trained with three configurationsFigure 8 illustrates these configurations

311 ImageNet e whole HAnet is trained using pre-trained ResNet weights from ImageNet from initialization(denoted as ImageNet in Table 2) e total training processlasts for 40 epochs and the batch size is fixed to 16 samplese learning rate was initialized to be 10minus 3 and decayed by afactor of 10 at each period of 20 epochs e parameter ofmomentum is set to 09 Experimental results show that 40epochs are adequate for training to converge Weightedcross-entropy loss is used as the optimization criterion ebest model is selected based on validation results

312 All-Update ResNet(480) is first fine-tuned on ourdataset using pretrained ResNet weights from ImageNet for

initialization e training settings are identical to those in(1) Convergence is achieved during training and the bestmodel is selected based on validation results We then trainthe whole HAnet using the fine-tuned ResNet (480)models for initialization and update all weights in HAnet(denoted as all-update in Table 2) Training lasts for 40epochs e learning rate is set to 10minus 4 momentum is set to09 and the best model is selected based on validationresults

313 FC-Only e weights of the fine-tuned ResNet(480)model are used and only the last fully connected (FC) layeris updated in HAnet (denoted as FC-only in Table 2) ebest model is selected based on validation results Traininglasts for 10 epochs the learning rate is set to 10minus 4 andmomentum is set to 09

For example the first HAnet in Table 2 hyper(l2) FC-only refers to the architecture fusing the features from layer2 and the final layer 4 it uses ResNet (480) weights as thefeature extractor and only the final FC layer is updatedduring HAnet training

To achieve better generalizability data augmentation wasapplied online in the training procedure as suggested in [7]e images are randomly rotated between 0deg and 90deg andflipped with 50 possibility Our network is implementedusing PyTorch

e experiments are conducted on an Intel Xeon ma-chine (Gold 6130 CPU210GHz) with Nvidia Quadro

GAP (global average pooling)

GAP

Input Layer 1 Layer 2 Layer 3 Layer 4

GAP

Features

120 times 120 times 64 60 times 60 times 128 30 times 30 times 256 15 times 15 times 512 1 times 1 times 512 1 times 1 times 256 1 times 1 times 128480 times 480 times 3

Figure 5 Ulcer recognition network framework ResNet-34 which has 34 layers is selected as the feature extractor Here we only displaythe structural framework for clarity Detailed layer information can be found in Appendix

Split 0

Split 1

Split 4

hellip

helliphellip

Test Train Val

Figure 6 Illustration of cross-validation (green test dataset blue train dataset yellow validation dataset)

6 Computational and Mathematical Methods in Medicine

GP100 graphics cards Detailed results are presented inSections 32ndash34

32 Refinement of the Weighting Factor for Weighted Cross-Entropy To demonstrate the impact of different weightingfactors ie w in equation (2) we examine the cross-vali-dation results of model recognition accuracy with differentweighting factors e AC SE and SP curves are shown inFigure 9

AC varies with changes in weighting factor In generalSE improves while SP is degraded as the weighting factorincreases Detailed AC SE and SP values are listed inTable 3 ResNet-18(480) refers to experiments on a ResNet-18 network with a WCE full-resolution image input of480times 480 times 3 A possible explanation for the observed effectof the weighting factor is that the ulcer dataset containsmany consecutive frames of the same ulcer and theseframes may share marked similarities us while theframe numbers of the ulcer and normal dataset are com-parable the information contained by each dataset remainsunbalanced e weighting factor corrects or compensatesfor this imbalance

In the following experiments 40 is used as the weightingfactor as it outperforms other choices and simultaneouslyachieves good balance between SE and SP

33 Selection of Hyper Architectures We tested 10 models intotal as listed in Table 4 including a ResNet-18(480) modeland nine HAnet models based on ResNet-18(480) eresolution of input images is 480times 480times 3 for all models

According to the results in Table 4 the FC-only and all-update hyper models consistently outperform the ResNet-18(480) model in terms of the AC criterion which dem-onstrates the effectiveness of HAnet architectures More-over FC-only models generally perform better than all-update models thus implying that ResNet-18(480) extractsfeatures well and that further updates may corrupt thesefeatures

e hyper ImageNet models including hyper(l2)ImageNet hyper(l3) ImageNet and hyper(l23) ImageNetseem to give weak performance Hyper ImageNet modelsand the other hyper models share the same architecturese difference between these types of models is that thehyper ImageNet models are trained with the pretrainedImageNet ResNet-18 weights while the other models useResNet-18(480) weights that have been fine-tuned on theWCE datasetis finding reveals that a straightforward basenet such as ResNet-18(480) shows great power in extractingfeatures e complicated connections of HAnet may pro-hibit the network from reaching good convergence points

To fully utilize the advantages of hyper architectures werecommend a two-stage training process (1) Train a ResNet-18(480) model based on the ImageNet-pretrained weightsand then (2) use the fine-tuned ResNet-18(480) model as abackbone feature extractor to train the hyper models Wedenote the best model in all hyper architectures as HAnet-18(480) ie a hyper(l23) FC-only model

Additionally former exploration is based on ResNet-18and results indicate that a hyper(l23) FC-only architecturebased on the ResNet backbone feature extractor fine-tunedbyWCE images may be expected to improve the recognition

Table 2 Illustration of different architectures and configurations

ModelFeatures

Network initialization weights TrainingLayer 2 Layer 3 Layer 4

ResNet(480) ImageNet Train on WCE datasethyper(l2) FC-only

ResNet(480) Update FC-layer onlyhyper(l3) FC-only hyper(l23) FC-only hyper(l2) all-update

ResNet(480) Update all layershyper(l3) all-update hyper(l23) all-update hyper(l2) ImageNet

ImageNet Update all layershyper(l3) ImageNet hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(a)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(b)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(c)

Figure 7 Illustrations of HAnet architectures (a) hyper(l2) (b) hyper(l3) (c) hyper(l23)

Computational and Mathematical Methods in Medicine 7

ResNet(480) weights

ImageNet weights

Other trained weights

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

ResNet from ImageNet

Trai

n

Layer 1

Layer 2

Layer 3

Layer 4

ResNet(480)

FC la

yer

FC la

yer

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) all-update

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) FC-only

Train ResNet backbone to get ResNet(480)

all-update use the trained ResNet(480) weights as initialization and update all layers during training

FC-only use the trained ResNet(480) weights as initialization and update the FC layer during training

ImageNet use the ImageNet ResNet weights as initialization and update all layers during training

Figure 8 Illustration of three different training configurations including ImageNet all-update and FC-only Different colors are used torepresent different weights Green denotes pretrained weights from ImageNet and purple denotes the weights of ResNet(480) that have beenfine-tuned on the WCE dataset Several other colors denote other trained weights

1 2 3 4 5 688

89

90

91

92

93

94

Weighting factor

Val

ue (

)

ACSESP

Figure 9 AC SE and SP evolution against the wCE weighting factor Red purple and blue curves denote the results of AC SE and SPrespectively e horizontal axis is the weighting factor and the vertical axis is the value of AC SE and SP

8 Computational and Mathematical Methods in Medicine

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 3: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

(a) (b) (c)

(d) (e) (f )

Figure 2 Typical WCE images of an ulcer Ulcerated areas in each image are marked by a white box

Table 1 Representative studies and datasets of WCE videos in the literature

Experiment Cases DetailLi and Meng [5] 10 patientsrsquo videos 10 patientsrsquo videos 200 images

Li and Meng [16] 10 patientsrsquo videos 10 patientsrsquo videos (five for bleeding and the other fivefor ulcer)

Li et al [17] mdash 80 representative small intestine ulcer WCE imagesand 80 normal images

Karargyris and Bourbakis [9] mdash A WCE video containing 10 frames with polyps and40 normal frames and extra 20 frames with ulcer

Li and Meng [8] 10 patientsrsquo videos

10 patientsrsquo videos 600 representative polyp imagesand 600 normal images from data 60 normal images

and 60 polyp images from each patientrsquos videosegments

Yu et al [10] 60 patientsrsquo videos60 patientsrsquo videos 344 endoscopic images for

training another 120 ulcer images and 120 normalimages for testing

Fu et al [12] 20 patientsrsquo videos 20 patientsrsquo videos 5000 WCE images consisting of1000 bleeding frames and 4000 nonbleeding frames

Yeh et al [11] mdash 607 images containing 220 159 and 228 images ofbleeding ulcers and nonbleedingulcers respectively

Yuan et al [3] 10 patientsrsquo videos 10 patientsrsquo videos 2400 WCE images that consist of400 bleeding frames and 2000 normal frames

Yuan and Meng [7] 35 patientsrsquo videos35 patientsrsquo videos 3000 normal WCE images (1000bubbles 1000 TIs and 1000 CIs) and 1000 polyp

imagesHe et al [6] 11 patientsrsquo videos 11 patientsrsquo videos 440K WCE images

Aoki et al [18] 180 patientsrsquo videos115 patientsrsquo videos 5360 images of small-bowelerosions and ulcerations for training 65 patientsrsquovideos 10440 independent images for validation

Ours 1416 patientsrsquo videos 1416 patientsrsquo videos with 24839 representative ulcerframes

Computational and Mathematical Methods in Medicine 3

with the emergence of various deep CNN architectures suchas AlexNet VGGNet GoogLeNet and ResNet [21ndash28]

Many researchers have realized that handcrafted fea-tures merely encode partial information in WCE images[29] and that deep learning methods are capable ofextracting powerful feature representations that can beused in WCE lesion recognition and depth estimation[6 7 15 18 30ndash34]

A framework for hookworm detection was proposed in[14] this framework consists of an edge extraction networkand a hookworm classification network Inception modulesare used to capture multiscale features to capture spatialcorrelationse robustness and effectiveness of this methodwere verified in a dataset containing 440000WCE images of11 patients Yuan and Meng [7] proposed an autoencoder-based neural network model that introduces an imagemanifold constraint to a traditional sparse autoencoder torecognize polyps in WCE images Manifold constraint caneffectively enforce images within the same category to sharesimilar features and keep images in different categories farway ie it can preserve large intervariances and smallintravariance among images e proposed method wasevaluated using 3000 normal WCE images and 1000 WCEimages with polyps extracted from 35 patient videos Uti-lizing temporal information of WCE videos with 3D con-volution has also been explored for poly detection [33] Deeplearning methods are also adopted for ulcer diagnosis ereare also some investigations of ulcer recognition with deeplearning methods [18 35] Off-the-shelf CNN models aretrained and evaluated in these studies Experimental resultsand comparisons in these studies clearly demonstrate thesuperiority of deep learning methods over conventionalmachine learning techniques

From ulcer size analysis of our dataset we find that mostof the ulcers occupy only a tiny area in the whole imageDeep CNNs can inherently compute feature hierarchieslayer by layer Hyper features from shallow layers have highresolution but lack representation capacity by contrast deepfeatures from deep layers are semantically strong but havepoor resolution [36ndash38] ese features motivate us topropose a framework that fuses hyper and deep features toachieve ulcer recognition at vastly different scales We willgive detailed description of ulcer size analysis and theproposed method in Sections 22 and 23

22 Ulcer Dataset Our dataset is collected using a WCEsystem provided by Ankon Technologies Co Ltd(Wuhan Shanghai China) e WCE system consists ofan endoscopic capsule a guidance magnet robot a datarecorder and a computer workstation with software forreal-time viewing and controlling e capsule is28mm times 12mm in size and contains a permanent magnetin its dome Images are recorded and transferred at aspeed of 2 framess e resolution of the WCE image is480 times 480 pixels

e dataset used in this work to evaluate the perfor-mance of the proposed framework contains 1416 WCEvideos from 1416 patients (males 73 female 27) ie one

video per patient e WCE videos are collected from morethan 30 hospitals and 100 medical examination centersthrough the Ankon WCE system Each video is in-dependently annotated by at least two gastroenterologists Ifthe difference between annotation bounding boxes of thesame ulcer is larger than 10 an expert gastroenterologistwill review the annotation and provide a final decision eage distribution of patients is illustrated in Figure 3 eentire dataset consists of 1157 ulcer videos and 259 normalvideos In total 24839 frames are annotated as ulcers bygastroenterologists To balance the volume of each class24225 normal frames are randomly extracted from normalvideos for this study to match the 24839 representative ulcerframes A mask of diameter 420 pixels was used to crop thecenter area of each image in preprocessing is pre-processing did not change the image size

We plot the distribution of ulcer size in our dataset inFigure 4 e vertical and horizontal axes denote thenumber of images and the ratio of the ulcerated area to thewhole image size respectively Despite the inspiring suc-cess of CNNs in ImageNet competition ulcer recognitionpresents some challenge to the ImageNet classification taskbecause lesions normally occupy only a small area of WCEimages and the structures of lesions are rather subtle InFigure 4 about 25 of the ulcers occupy less than 1 of thearea of the whole image and more than 80 of the ulcersfound occupy less than 5 of the area of the image Hencea specific design of a suitable network is proposed to ac-count for the small ulcer problem and achieve goodsensitivity

23 HAnet-Based Ulcer Recognition Network with FusedHyper and Deep Features In this section we introduce ourdesign and the proposed architecture of our ulcer recog-nition network

Inspired by the design concept of previous works thatdeal with object recognition in vastly distributed scales[36ndash38] we propose an ulcer recognition network with ahyperconnection architecture (HAnet) e overall pipelineof this network is illustrated in Figure 5 FundamentallyHAnet fuses hyper and deep features Here we use ResNet-34 as the base feature-extraction network because accordingto our experiments (demonstrated in Section 3) it providesthe best results Global average pooling (GAP) [39] is used togenerate features for each layer GAP takes an average ofeach feature layer so that it reduces tensors with dimensionshtimes w times d to 1times 1times d Hyper features can be extracted frommultiple intermediate layers (layers 2 and 3 in this case) ofthe base network they are concatenated with the features oflast feature-extraction layer (layer 4 in this case) to make thefinal decision

Our WCE system outputs color images with a resolutionof 480times 480 pixels Experiments by the computer visioncommunity [36 40] have shown that high-resolution inputimages are helpful to the performance of CNN networks Tofully utilize the output images from the WCE system wemodify the base network to receive input images with a sizeof 480times 480times 3 without cropping or rescaling

4 Computational and Mathematical Methods in Medicine

24 Loss ofWeightedCross Entropy Cross-entropy (CE) lossis a common choice for classification tasks For binaryclassification [40] CE is defined as

CE(p y) minus y log(p) minus (1 minus y)log(1 minus p) (1)

where y isin 0 1 denotes the ground-truth label of thesample and p isin [0 1] is the estimated probability of asample belonging to the class with label 1 Mathematicallythe minimization process of CE is to enlarge the probabilitiesof samples with label 1 and suppress the probabilities ofsamples with label 0

To deal with possible imbalance between classes aweighting factor can be applied to different classes whichcan be called weighted cross-entropy (wCE) loss [41]

wCE(p y w) minusw

w + 1middot y log(p) minus

1w + 1

middot (1 minus y)log(1 minus p)

(2)

where w denotes the weighting factor to balance the loss ofdifferent classes Considering the overall small and varia-tional size distribution of ulcers as well as possible imbal-ance in the large dataset we set wCE as our loss function

25 Evaluation Criteria To evaluate the performance ofclassification accuracy (AC) sensitivity (SE) and specificity(SP) are exploited as metrics [6]

AC (TP + TN)

N

SE TP

(TP + FN)

SP TN

(TN + FP)

(3)

Here N is the total number of test images and TP FPTN and FN are the number of correctly classified imagescontaining ulcers the number of normal images falselyclassified as ulcer frames the number of correctly classifiedimages without ulcers and the number of images with ulcersfalsely classified as normal images respectively

AC gives an overall assessment of the performance of themodel SE denotes the modelrsquos ability to detect ulcer imagesand SP denotes its ability to distinguish normal imagesIdeally we expect both high SE and SP although some trade-offs between these metrics exist Considering that furthermanual inspection by the doctor of ulcer images detected bycomputer-aided systems is compulsory SE should be as highas possible with no negative impact on overall AC

We use a 5-fold cross-validation strategy at the case levelto evaluate the performances of different architectures thisstrategy splits the total number of cases evenly into fivesubsets Here one subset is used for testing and the fourother subsets are used for training and validation Figure 6illustrates the cross-validation operation In the presentstudy the volumes of train validation and test are about70 10 and 20 respectively Normal or ulcer frames arethen extracted from each case to form the trainingvalida-tiontesting dataset We perform case-level splitting becauseadjacent frames in the same case are likely to share similardetails We do not conduct frame-level cross-validationsplitting to avoid overfittinge validation dataset is used toselect the best model in each training process ie the modelwith the best validation accuracy during the training iter-ation is saved as the final model

3 Results

In this section the implementation process of the proposedmethod is introduced and its performance is evaluated bycomparison with several other related methods includingstate-of-the-art CNN methods and some representativeWCE recognition methods based on conventional machinelearning techniques

31 Network Architectures and Training Configurationse proposed HAnet connects hyper features to the finalfeature vector with the aim of enhancing the recognition ofulcers of different sizes e HAnet models are distinguishedby their architecture and training settings which includethree architectures and three training configurations in total

10ndash30 30ndash40 40ndash50 50ndash60 60ndash70 70ndash0

10

20

30

40

356

1804

3665

2810

1015

451

Age distribution

Case

por

tion

()

Figure 3 Age distribution of patients providing videos for thisstudy e horizontal axis denotes the age range and the verticalaxis denotes the case portion

lt1 1ndash25 25ndash5 gt50

2000

4000

6000

8000

6239

8969

5650

4115

Size ()

Case

num

ber

Figure 4 Distribution of ulcer size among patients e horizontalaxis denotes the ratio of ulcer area to the whole image while thevertical axis presents the number of frames

Computational and Mathematical Methods in Medicine 5

We illustrate these different architectures and configurationsin Table 2

ree different architectures can be obtained whenhyper features from layers 2 and 3 are used for decision incombination with features from layer 4 of our ResNetbackbone in Figure 7(a) hyper(l2) which fuses the hyperfeatures from layer 2 with the deep features of layer 4 inFigure 7(b) hyper(l3) which fuses features from layers 3and 4 and in Figure 7(c) hyper(l23) which fuses featuresfrom layers 2 3 and 4 to form the third HAnet Figure 7provides comprehensive diagrams of these different HAnetarchitectures

Each HAnet can be trained with three configurationsFigure 8 illustrates these configurations

311 ImageNet e whole HAnet is trained using pre-trained ResNet weights from ImageNet from initialization(denoted as ImageNet in Table 2) e total training processlasts for 40 epochs and the batch size is fixed to 16 samplese learning rate was initialized to be 10minus 3 and decayed by afactor of 10 at each period of 20 epochs e parameter ofmomentum is set to 09 Experimental results show that 40epochs are adequate for training to converge Weightedcross-entropy loss is used as the optimization criterion ebest model is selected based on validation results

312 All-Update ResNet(480) is first fine-tuned on ourdataset using pretrained ResNet weights from ImageNet for

initialization e training settings are identical to those in(1) Convergence is achieved during training and the bestmodel is selected based on validation results We then trainthe whole HAnet using the fine-tuned ResNet (480)models for initialization and update all weights in HAnet(denoted as all-update in Table 2) Training lasts for 40epochs e learning rate is set to 10minus 4 momentum is set to09 and the best model is selected based on validationresults

313 FC-Only e weights of the fine-tuned ResNet(480)model are used and only the last fully connected (FC) layeris updated in HAnet (denoted as FC-only in Table 2) ebest model is selected based on validation results Traininglasts for 10 epochs the learning rate is set to 10minus 4 andmomentum is set to 09

For example the first HAnet in Table 2 hyper(l2) FC-only refers to the architecture fusing the features from layer2 and the final layer 4 it uses ResNet (480) weights as thefeature extractor and only the final FC layer is updatedduring HAnet training

To achieve better generalizability data augmentation wasapplied online in the training procedure as suggested in [7]e images are randomly rotated between 0deg and 90deg andflipped with 50 possibility Our network is implementedusing PyTorch

e experiments are conducted on an Intel Xeon ma-chine (Gold 6130 CPU210GHz) with Nvidia Quadro

GAP (global average pooling)

GAP

Input Layer 1 Layer 2 Layer 3 Layer 4

GAP

Features

120 times 120 times 64 60 times 60 times 128 30 times 30 times 256 15 times 15 times 512 1 times 1 times 512 1 times 1 times 256 1 times 1 times 128480 times 480 times 3

Figure 5 Ulcer recognition network framework ResNet-34 which has 34 layers is selected as the feature extractor Here we only displaythe structural framework for clarity Detailed layer information can be found in Appendix

Split 0

Split 1

Split 4

hellip

helliphellip

Test Train Val

Figure 6 Illustration of cross-validation (green test dataset blue train dataset yellow validation dataset)

6 Computational and Mathematical Methods in Medicine

GP100 graphics cards Detailed results are presented inSections 32ndash34

32 Refinement of the Weighting Factor for Weighted Cross-Entropy To demonstrate the impact of different weightingfactors ie w in equation (2) we examine the cross-vali-dation results of model recognition accuracy with differentweighting factors e AC SE and SP curves are shown inFigure 9

AC varies with changes in weighting factor In generalSE improves while SP is degraded as the weighting factorincreases Detailed AC SE and SP values are listed inTable 3 ResNet-18(480) refers to experiments on a ResNet-18 network with a WCE full-resolution image input of480times 480 times 3 A possible explanation for the observed effectof the weighting factor is that the ulcer dataset containsmany consecutive frames of the same ulcer and theseframes may share marked similarities us while theframe numbers of the ulcer and normal dataset are com-parable the information contained by each dataset remainsunbalanced e weighting factor corrects or compensatesfor this imbalance

In the following experiments 40 is used as the weightingfactor as it outperforms other choices and simultaneouslyachieves good balance between SE and SP

33 Selection of Hyper Architectures We tested 10 models intotal as listed in Table 4 including a ResNet-18(480) modeland nine HAnet models based on ResNet-18(480) eresolution of input images is 480times 480times 3 for all models

According to the results in Table 4 the FC-only and all-update hyper models consistently outperform the ResNet-18(480) model in terms of the AC criterion which dem-onstrates the effectiveness of HAnet architectures More-over FC-only models generally perform better than all-update models thus implying that ResNet-18(480) extractsfeatures well and that further updates may corrupt thesefeatures

e hyper ImageNet models including hyper(l2)ImageNet hyper(l3) ImageNet and hyper(l23) ImageNetseem to give weak performance Hyper ImageNet modelsand the other hyper models share the same architecturese difference between these types of models is that thehyper ImageNet models are trained with the pretrainedImageNet ResNet-18 weights while the other models useResNet-18(480) weights that have been fine-tuned on theWCE datasetis finding reveals that a straightforward basenet such as ResNet-18(480) shows great power in extractingfeatures e complicated connections of HAnet may pro-hibit the network from reaching good convergence points

To fully utilize the advantages of hyper architectures werecommend a two-stage training process (1) Train a ResNet-18(480) model based on the ImageNet-pretrained weightsand then (2) use the fine-tuned ResNet-18(480) model as abackbone feature extractor to train the hyper models Wedenote the best model in all hyper architectures as HAnet-18(480) ie a hyper(l23) FC-only model

Additionally former exploration is based on ResNet-18and results indicate that a hyper(l23) FC-only architecturebased on the ResNet backbone feature extractor fine-tunedbyWCE images may be expected to improve the recognition

Table 2 Illustration of different architectures and configurations

ModelFeatures

Network initialization weights TrainingLayer 2 Layer 3 Layer 4

ResNet(480) ImageNet Train on WCE datasethyper(l2) FC-only

ResNet(480) Update FC-layer onlyhyper(l3) FC-only hyper(l23) FC-only hyper(l2) all-update

ResNet(480) Update all layershyper(l3) all-update hyper(l23) all-update hyper(l2) ImageNet

ImageNet Update all layershyper(l3) ImageNet hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(a)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(b)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(c)

Figure 7 Illustrations of HAnet architectures (a) hyper(l2) (b) hyper(l3) (c) hyper(l23)

Computational and Mathematical Methods in Medicine 7

ResNet(480) weights

ImageNet weights

Other trained weights

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

ResNet from ImageNet

Trai

n

Layer 1

Layer 2

Layer 3

Layer 4

ResNet(480)

FC la

yer

FC la

yer

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) all-update

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) FC-only

Train ResNet backbone to get ResNet(480)

all-update use the trained ResNet(480) weights as initialization and update all layers during training

FC-only use the trained ResNet(480) weights as initialization and update the FC layer during training

ImageNet use the ImageNet ResNet weights as initialization and update all layers during training

Figure 8 Illustration of three different training configurations including ImageNet all-update and FC-only Different colors are used torepresent different weights Green denotes pretrained weights from ImageNet and purple denotes the weights of ResNet(480) that have beenfine-tuned on the WCE dataset Several other colors denote other trained weights

1 2 3 4 5 688

89

90

91

92

93

94

Weighting factor

Val

ue (

)

ACSESP

Figure 9 AC SE and SP evolution against the wCE weighting factor Red purple and blue curves denote the results of AC SE and SPrespectively e horizontal axis is the weighting factor and the vertical axis is the value of AC SE and SP

8 Computational and Mathematical Methods in Medicine

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 4: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

with the emergence of various deep CNN architectures suchas AlexNet VGGNet GoogLeNet and ResNet [21ndash28]

Many researchers have realized that handcrafted fea-tures merely encode partial information in WCE images[29] and that deep learning methods are capable ofextracting powerful feature representations that can beused in WCE lesion recognition and depth estimation[6 7 15 18 30ndash34]

A framework for hookworm detection was proposed in[14] this framework consists of an edge extraction networkand a hookworm classification network Inception modulesare used to capture multiscale features to capture spatialcorrelationse robustness and effectiveness of this methodwere verified in a dataset containing 440000WCE images of11 patients Yuan and Meng [7] proposed an autoencoder-based neural network model that introduces an imagemanifold constraint to a traditional sparse autoencoder torecognize polyps in WCE images Manifold constraint caneffectively enforce images within the same category to sharesimilar features and keep images in different categories farway ie it can preserve large intervariances and smallintravariance among images e proposed method wasevaluated using 3000 normal WCE images and 1000 WCEimages with polyps extracted from 35 patient videos Uti-lizing temporal information of WCE videos with 3D con-volution has also been explored for poly detection [33] Deeplearning methods are also adopted for ulcer diagnosis ereare also some investigations of ulcer recognition with deeplearning methods [18 35] Off-the-shelf CNN models aretrained and evaluated in these studies Experimental resultsand comparisons in these studies clearly demonstrate thesuperiority of deep learning methods over conventionalmachine learning techniques

From ulcer size analysis of our dataset we find that mostof the ulcers occupy only a tiny area in the whole imageDeep CNNs can inherently compute feature hierarchieslayer by layer Hyper features from shallow layers have highresolution but lack representation capacity by contrast deepfeatures from deep layers are semantically strong but havepoor resolution [36ndash38] ese features motivate us topropose a framework that fuses hyper and deep features toachieve ulcer recognition at vastly different scales We willgive detailed description of ulcer size analysis and theproposed method in Sections 22 and 23

22 Ulcer Dataset Our dataset is collected using a WCEsystem provided by Ankon Technologies Co Ltd(Wuhan Shanghai China) e WCE system consists ofan endoscopic capsule a guidance magnet robot a datarecorder and a computer workstation with software forreal-time viewing and controlling e capsule is28mm times 12mm in size and contains a permanent magnetin its dome Images are recorded and transferred at aspeed of 2 framess e resolution of the WCE image is480 times 480 pixels

e dataset used in this work to evaluate the perfor-mance of the proposed framework contains 1416 WCEvideos from 1416 patients (males 73 female 27) ie one

video per patient e WCE videos are collected from morethan 30 hospitals and 100 medical examination centersthrough the Ankon WCE system Each video is in-dependently annotated by at least two gastroenterologists Ifthe difference between annotation bounding boxes of thesame ulcer is larger than 10 an expert gastroenterologistwill review the annotation and provide a final decision eage distribution of patients is illustrated in Figure 3 eentire dataset consists of 1157 ulcer videos and 259 normalvideos In total 24839 frames are annotated as ulcers bygastroenterologists To balance the volume of each class24225 normal frames are randomly extracted from normalvideos for this study to match the 24839 representative ulcerframes A mask of diameter 420 pixels was used to crop thecenter area of each image in preprocessing is pre-processing did not change the image size

We plot the distribution of ulcer size in our dataset inFigure 4 e vertical and horizontal axes denote thenumber of images and the ratio of the ulcerated area to thewhole image size respectively Despite the inspiring suc-cess of CNNs in ImageNet competition ulcer recognitionpresents some challenge to the ImageNet classification taskbecause lesions normally occupy only a small area of WCEimages and the structures of lesions are rather subtle InFigure 4 about 25 of the ulcers occupy less than 1 of thearea of the whole image and more than 80 of the ulcersfound occupy less than 5 of the area of the image Hencea specific design of a suitable network is proposed to ac-count for the small ulcer problem and achieve goodsensitivity

23 HAnet-Based Ulcer Recognition Network with FusedHyper and Deep Features In this section we introduce ourdesign and the proposed architecture of our ulcer recog-nition network

Inspired by the design concept of previous works thatdeal with object recognition in vastly distributed scales[36ndash38] we propose an ulcer recognition network with ahyperconnection architecture (HAnet) e overall pipelineof this network is illustrated in Figure 5 FundamentallyHAnet fuses hyper and deep features Here we use ResNet-34 as the base feature-extraction network because accordingto our experiments (demonstrated in Section 3) it providesthe best results Global average pooling (GAP) [39] is used togenerate features for each layer GAP takes an average ofeach feature layer so that it reduces tensors with dimensionshtimes w times d to 1times 1times d Hyper features can be extracted frommultiple intermediate layers (layers 2 and 3 in this case) ofthe base network they are concatenated with the features oflast feature-extraction layer (layer 4 in this case) to make thefinal decision

Our WCE system outputs color images with a resolutionof 480times 480 pixels Experiments by the computer visioncommunity [36 40] have shown that high-resolution inputimages are helpful to the performance of CNN networks Tofully utilize the output images from the WCE system wemodify the base network to receive input images with a sizeof 480times 480times 3 without cropping or rescaling

4 Computational and Mathematical Methods in Medicine

24 Loss ofWeightedCross Entropy Cross-entropy (CE) lossis a common choice for classification tasks For binaryclassification [40] CE is defined as

CE(p y) minus y log(p) minus (1 minus y)log(1 minus p) (1)

where y isin 0 1 denotes the ground-truth label of thesample and p isin [0 1] is the estimated probability of asample belonging to the class with label 1 Mathematicallythe minimization process of CE is to enlarge the probabilitiesof samples with label 1 and suppress the probabilities ofsamples with label 0

To deal with possible imbalance between classes aweighting factor can be applied to different classes whichcan be called weighted cross-entropy (wCE) loss [41]

wCE(p y w) minusw

w + 1middot y log(p) minus

1w + 1

middot (1 minus y)log(1 minus p)

(2)

where w denotes the weighting factor to balance the loss ofdifferent classes Considering the overall small and varia-tional size distribution of ulcers as well as possible imbal-ance in the large dataset we set wCE as our loss function

25 Evaluation Criteria To evaluate the performance ofclassification accuracy (AC) sensitivity (SE) and specificity(SP) are exploited as metrics [6]

AC (TP + TN)

N

SE TP

(TP + FN)

SP TN

(TN + FP)

(3)

Here N is the total number of test images and TP FPTN and FN are the number of correctly classified imagescontaining ulcers the number of normal images falselyclassified as ulcer frames the number of correctly classifiedimages without ulcers and the number of images with ulcersfalsely classified as normal images respectively

AC gives an overall assessment of the performance of themodel SE denotes the modelrsquos ability to detect ulcer imagesand SP denotes its ability to distinguish normal imagesIdeally we expect both high SE and SP although some trade-offs between these metrics exist Considering that furthermanual inspection by the doctor of ulcer images detected bycomputer-aided systems is compulsory SE should be as highas possible with no negative impact on overall AC

We use a 5-fold cross-validation strategy at the case levelto evaluate the performances of different architectures thisstrategy splits the total number of cases evenly into fivesubsets Here one subset is used for testing and the fourother subsets are used for training and validation Figure 6illustrates the cross-validation operation In the presentstudy the volumes of train validation and test are about70 10 and 20 respectively Normal or ulcer frames arethen extracted from each case to form the trainingvalida-tiontesting dataset We perform case-level splitting becauseadjacent frames in the same case are likely to share similardetails We do not conduct frame-level cross-validationsplitting to avoid overfittinge validation dataset is used toselect the best model in each training process ie the modelwith the best validation accuracy during the training iter-ation is saved as the final model

3 Results

In this section the implementation process of the proposedmethod is introduced and its performance is evaluated bycomparison with several other related methods includingstate-of-the-art CNN methods and some representativeWCE recognition methods based on conventional machinelearning techniques

31 Network Architectures and Training Configurationse proposed HAnet connects hyper features to the finalfeature vector with the aim of enhancing the recognition ofulcers of different sizes e HAnet models are distinguishedby their architecture and training settings which includethree architectures and three training configurations in total

10ndash30 30ndash40 40ndash50 50ndash60 60ndash70 70ndash0

10

20

30

40

356

1804

3665

2810

1015

451

Age distribution

Case

por

tion

()

Figure 3 Age distribution of patients providing videos for thisstudy e horizontal axis denotes the age range and the verticalaxis denotes the case portion

lt1 1ndash25 25ndash5 gt50

2000

4000

6000

8000

6239

8969

5650

4115

Size ()

Case

num

ber

Figure 4 Distribution of ulcer size among patients e horizontalaxis denotes the ratio of ulcer area to the whole image while thevertical axis presents the number of frames

Computational and Mathematical Methods in Medicine 5

We illustrate these different architectures and configurationsin Table 2

ree different architectures can be obtained whenhyper features from layers 2 and 3 are used for decision incombination with features from layer 4 of our ResNetbackbone in Figure 7(a) hyper(l2) which fuses the hyperfeatures from layer 2 with the deep features of layer 4 inFigure 7(b) hyper(l3) which fuses features from layers 3and 4 and in Figure 7(c) hyper(l23) which fuses featuresfrom layers 2 3 and 4 to form the third HAnet Figure 7provides comprehensive diagrams of these different HAnetarchitectures

Each HAnet can be trained with three configurationsFigure 8 illustrates these configurations

311 ImageNet e whole HAnet is trained using pre-trained ResNet weights from ImageNet from initialization(denoted as ImageNet in Table 2) e total training processlasts for 40 epochs and the batch size is fixed to 16 samplese learning rate was initialized to be 10minus 3 and decayed by afactor of 10 at each period of 20 epochs e parameter ofmomentum is set to 09 Experimental results show that 40epochs are adequate for training to converge Weightedcross-entropy loss is used as the optimization criterion ebest model is selected based on validation results

312 All-Update ResNet(480) is first fine-tuned on ourdataset using pretrained ResNet weights from ImageNet for

initialization e training settings are identical to those in(1) Convergence is achieved during training and the bestmodel is selected based on validation results We then trainthe whole HAnet using the fine-tuned ResNet (480)models for initialization and update all weights in HAnet(denoted as all-update in Table 2) Training lasts for 40epochs e learning rate is set to 10minus 4 momentum is set to09 and the best model is selected based on validationresults

313 FC-Only e weights of the fine-tuned ResNet(480)model are used and only the last fully connected (FC) layeris updated in HAnet (denoted as FC-only in Table 2) ebest model is selected based on validation results Traininglasts for 10 epochs the learning rate is set to 10minus 4 andmomentum is set to 09

For example the first HAnet in Table 2 hyper(l2) FC-only refers to the architecture fusing the features from layer2 and the final layer 4 it uses ResNet (480) weights as thefeature extractor and only the final FC layer is updatedduring HAnet training

To achieve better generalizability data augmentation wasapplied online in the training procedure as suggested in [7]e images are randomly rotated between 0deg and 90deg andflipped with 50 possibility Our network is implementedusing PyTorch

e experiments are conducted on an Intel Xeon ma-chine (Gold 6130 CPU210GHz) with Nvidia Quadro

GAP (global average pooling)

GAP

Input Layer 1 Layer 2 Layer 3 Layer 4

GAP

Features

120 times 120 times 64 60 times 60 times 128 30 times 30 times 256 15 times 15 times 512 1 times 1 times 512 1 times 1 times 256 1 times 1 times 128480 times 480 times 3

Figure 5 Ulcer recognition network framework ResNet-34 which has 34 layers is selected as the feature extractor Here we only displaythe structural framework for clarity Detailed layer information can be found in Appendix

Split 0

Split 1

Split 4

hellip

helliphellip

Test Train Val

Figure 6 Illustration of cross-validation (green test dataset blue train dataset yellow validation dataset)

6 Computational and Mathematical Methods in Medicine

GP100 graphics cards Detailed results are presented inSections 32ndash34

32 Refinement of the Weighting Factor for Weighted Cross-Entropy To demonstrate the impact of different weightingfactors ie w in equation (2) we examine the cross-vali-dation results of model recognition accuracy with differentweighting factors e AC SE and SP curves are shown inFigure 9

AC varies with changes in weighting factor In generalSE improves while SP is degraded as the weighting factorincreases Detailed AC SE and SP values are listed inTable 3 ResNet-18(480) refers to experiments on a ResNet-18 network with a WCE full-resolution image input of480times 480 times 3 A possible explanation for the observed effectof the weighting factor is that the ulcer dataset containsmany consecutive frames of the same ulcer and theseframes may share marked similarities us while theframe numbers of the ulcer and normal dataset are com-parable the information contained by each dataset remainsunbalanced e weighting factor corrects or compensatesfor this imbalance

In the following experiments 40 is used as the weightingfactor as it outperforms other choices and simultaneouslyachieves good balance between SE and SP

33 Selection of Hyper Architectures We tested 10 models intotal as listed in Table 4 including a ResNet-18(480) modeland nine HAnet models based on ResNet-18(480) eresolution of input images is 480times 480times 3 for all models

According to the results in Table 4 the FC-only and all-update hyper models consistently outperform the ResNet-18(480) model in terms of the AC criterion which dem-onstrates the effectiveness of HAnet architectures More-over FC-only models generally perform better than all-update models thus implying that ResNet-18(480) extractsfeatures well and that further updates may corrupt thesefeatures

e hyper ImageNet models including hyper(l2)ImageNet hyper(l3) ImageNet and hyper(l23) ImageNetseem to give weak performance Hyper ImageNet modelsand the other hyper models share the same architecturese difference between these types of models is that thehyper ImageNet models are trained with the pretrainedImageNet ResNet-18 weights while the other models useResNet-18(480) weights that have been fine-tuned on theWCE datasetis finding reveals that a straightforward basenet such as ResNet-18(480) shows great power in extractingfeatures e complicated connections of HAnet may pro-hibit the network from reaching good convergence points

To fully utilize the advantages of hyper architectures werecommend a two-stage training process (1) Train a ResNet-18(480) model based on the ImageNet-pretrained weightsand then (2) use the fine-tuned ResNet-18(480) model as abackbone feature extractor to train the hyper models Wedenote the best model in all hyper architectures as HAnet-18(480) ie a hyper(l23) FC-only model

Additionally former exploration is based on ResNet-18and results indicate that a hyper(l23) FC-only architecturebased on the ResNet backbone feature extractor fine-tunedbyWCE images may be expected to improve the recognition

Table 2 Illustration of different architectures and configurations

ModelFeatures

Network initialization weights TrainingLayer 2 Layer 3 Layer 4

ResNet(480) ImageNet Train on WCE datasethyper(l2) FC-only

ResNet(480) Update FC-layer onlyhyper(l3) FC-only hyper(l23) FC-only hyper(l2) all-update

ResNet(480) Update all layershyper(l3) all-update hyper(l23) all-update hyper(l2) ImageNet

ImageNet Update all layershyper(l3) ImageNet hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(a)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(b)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(c)

Figure 7 Illustrations of HAnet architectures (a) hyper(l2) (b) hyper(l3) (c) hyper(l23)

Computational and Mathematical Methods in Medicine 7

ResNet(480) weights

ImageNet weights

Other trained weights

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

ResNet from ImageNet

Trai

n

Layer 1

Layer 2

Layer 3

Layer 4

ResNet(480)

FC la

yer

FC la

yer

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) all-update

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) FC-only

Train ResNet backbone to get ResNet(480)

all-update use the trained ResNet(480) weights as initialization and update all layers during training

FC-only use the trained ResNet(480) weights as initialization and update the FC layer during training

ImageNet use the ImageNet ResNet weights as initialization and update all layers during training

Figure 8 Illustration of three different training configurations including ImageNet all-update and FC-only Different colors are used torepresent different weights Green denotes pretrained weights from ImageNet and purple denotes the weights of ResNet(480) that have beenfine-tuned on the WCE dataset Several other colors denote other trained weights

1 2 3 4 5 688

89

90

91

92

93

94

Weighting factor

Val

ue (

)

ACSESP

Figure 9 AC SE and SP evolution against the wCE weighting factor Red purple and blue curves denote the results of AC SE and SPrespectively e horizontal axis is the weighting factor and the vertical axis is the value of AC SE and SP

8 Computational and Mathematical Methods in Medicine

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 5: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

24 Loss ofWeightedCross Entropy Cross-entropy (CE) lossis a common choice for classification tasks For binaryclassification [40] CE is defined as

CE(p y) minus y log(p) minus (1 minus y)log(1 minus p) (1)

where y isin 0 1 denotes the ground-truth label of thesample and p isin [0 1] is the estimated probability of asample belonging to the class with label 1 Mathematicallythe minimization process of CE is to enlarge the probabilitiesof samples with label 1 and suppress the probabilities ofsamples with label 0

To deal with possible imbalance between classes aweighting factor can be applied to different classes whichcan be called weighted cross-entropy (wCE) loss [41]

wCE(p y w) minusw

w + 1middot y log(p) minus

1w + 1

middot (1 minus y)log(1 minus p)

(2)

where w denotes the weighting factor to balance the loss ofdifferent classes Considering the overall small and varia-tional size distribution of ulcers as well as possible imbal-ance in the large dataset we set wCE as our loss function

25 Evaluation Criteria To evaluate the performance ofclassification accuracy (AC) sensitivity (SE) and specificity(SP) are exploited as metrics [6]

AC (TP + TN)

N

SE TP

(TP + FN)

SP TN

(TN + FP)

(3)

Here N is the total number of test images and TP FPTN and FN are the number of correctly classified imagescontaining ulcers the number of normal images falselyclassified as ulcer frames the number of correctly classifiedimages without ulcers and the number of images with ulcersfalsely classified as normal images respectively

AC gives an overall assessment of the performance of themodel SE denotes the modelrsquos ability to detect ulcer imagesand SP denotes its ability to distinguish normal imagesIdeally we expect both high SE and SP although some trade-offs between these metrics exist Considering that furthermanual inspection by the doctor of ulcer images detected bycomputer-aided systems is compulsory SE should be as highas possible with no negative impact on overall AC

We use a 5-fold cross-validation strategy at the case levelto evaluate the performances of different architectures thisstrategy splits the total number of cases evenly into fivesubsets Here one subset is used for testing and the fourother subsets are used for training and validation Figure 6illustrates the cross-validation operation In the presentstudy the volumes of train validation and test are about70 10 and 20 respectively Normal or ulcer frames arethen extracted from each case to form the trainingvalida-tiontesting dataset We perform case-level splitting becauseadjacent frames in the same case are likely to share similardetails We do not conduct frame-level cross-validationsplitting to avoid overfittinge validation dataset is used toselect the best model in each training process ie the modelwith the best validation accuracy during the training iter-ation is saved as the final model

3 Results

In this section the implementation process of the proposedmethod is introduced and its performance is evaluated bycomparison with several other related methods includingstate-of-the-art CNN methods and some representativeWCE recognition methods based on conventional machinelearning techniques

31 Network Architectures and Training Configurationse proposed HAnet connects hyper features to the finalfeature vector with the aim of enhancing the recognition ofulcers of different sizes e HAnet models are distinguishedby their architecture and training settings which includethree architectures and three training configurations in total

10ndash30 30ndash40 40ndash50 50ndash60 60ndash70 70ndash0

10

20

30

40

356

1804

3665

2810

1015

451

Age distribution

Case

por

tion

()

Figure 3 Age distribution of patients providing videos for thisstudy e horizontal axis denotes the age range and the verticalaxis denotes the case portion

lt1 1ndash25 25ndash5 gt50

2000

4000

6000

8000

6239

8969

5650

4115

Size ()

Case

num

ber

Figure 4 Distribution of ulcer size among patients e horizontalaxis denotes the ratio of ulcer area to the whole image while thevertical axis presents the number of frames

Computational and Mathematical Methods in Medicine 5

We illustrate these different architectures and configurationsin Table 2

ree different architectures can be obtained whenhyper features from layers 2 and 3 are used for decision incombination with features from layer 4 of our ResNetbackbone in Figure 7(a) hyper(l2) which fuses the hyperfeatures from layer 2 with the deep features of layer 4 inFigure 7(b) hyper(l3) which fuses features from layers 3and 4 and in Figure 7(c) hyper(l23) which fuses featuresfrom layers 2 3 and 4 to form the third HAnet Figure 7provides comprehensive diagrams of these different HAnetarchitectures

Each HAnet can be trained with three configurationsFigure 8 illustrates these configurations

311 ImageNet e whole HAnet is trained using pre-trained ResNet weights from ImageNet from initialization(denoted as ImageNet in Table 2) e total training processlasts for 40 epochs and the batch size is fixed to 16 samplese learning rate was initialized to be 10minus 3 and decayed by afactor of 10 at each period of 20 epochs e parameter ofmomentum is set to 09 Experimental results show that 40epochs are adequate for training to converge Weightedcross-entropy loss is used as the optimization criterion ebest model is selected based on validation results

312 All-Update ResNet(480) is first fine-tuned on ourdataset using pretrained ResNet weights from ImageNet for

initialization e training settings are identical to those in(1) Convergence is achieved during training and the bestmodel is selected based on validation results We then trainthe whole HAnet using the fine-tuned ResNet (480)models for initialization and update all weights in HAnet(denoted as all-update in Table 2) Training lasts for 40epochs e learning rate is set to 10minus 4 momentum is set to09 and the best model is selected based on validationresults

313 FC-Only e weights of the fine-tuned ResNet(480)model are used and only the last fully connected (FC) layeris updated in HAnet (denoted as FC-only in Table 2) ebest model is selected based on validation results Traininglasts for 10 epochs the learning rate is set to 10minus 4 andmomentum is set to 09

For example the first HAnet in Table 2 hyper(l2) FC-only refers to the architecture fusing the features from layer2 and the final layer 4 it uses ResNet (480) weights as thefeature extractor and only the final FC layer is updatedduring HAnet training

To achieve better generalizability data augmentation wasapplied online in the training procedure as suggested in [7]e images are randomly rotated between 0deg and 90deg andflipped with 50 possibility Our network is implementedusing PyTorch

e experiments are conducted on an Intel Xeon ma-chine (Gold 6130 CPU210GHz) with Nvidia Quadro

GAP (global average pooling)

GAP

Input Layer 1 Layer 2 Layer 3 Layer 4

GAP

Features

120 times 120 times 64 60 times 60 times 128 30 times 30 times 256 15 times 15 times 512 1 times 1 times 512 1 times 1 times 256 1 times 1 times 128480 times 480 times 3

Figure 5 Ulcer recognition network framework ResNet-34 which has 34 layers is selected as the feature extractor Here we only displaythe structural framework for clarity Detailed layer information can be found in Appendix

Split 0

Split 1

Split 4

hellip

helliphellip

Test Train Val

Figure 6 Illustration of cross-validation (green test dataset blue train dataset yellow validation dataset)

6 Computational and Mathematical Methods in Medicine

GP100 graphics cards Detailed results are presented inSections 32ndash34

32 Refinement of the Weighting Factor for Weighted Cross-Entropy To demonstrate the impact of different weightingfactors ie w in equation (2) we examine the cross-vali-dation results of model recognition accuracy with differentweighting factors e AC SE and SP curves are shown inFigure 9

AC varies with changes in weighting factor In generalSE improves while SP is degraded as the weighting factorincreases Detailed AC SE and SP values are listed inTable 3 ResNet-18(480) refers to experiments on a ResNet-18 network with a WCE full-resolution image input of480times 480 times 3 A possible explanation for the observed effectof the weighting factor is that the ulcer dataset containsmany consecutive frames of the same ulcer and theseframes may share marked similarities us while theframe numbers of the ulcer and normal dataset are com-parable the information contained by each dataset remainsunbalanced e weighting factor corrects or compensatesfor this imbalance

In the following experiments 40 is used as the weightingfactor as it outperforms other choices and simultaneouslyachieves good balance between SE and SP

33 Selection of Hyper Architectures We tested 10 models intotal as listed in Table 4 including a ResNet-18(480) modeland nine HAnet models based on ResNet-18(480) eresolution of input images is 480times 480times 3 for all models

According to the results in Table 4 the FC-only and all-update hyper models consistently outperform the ResNet-18(480) model in terms of the AC criterion which dem-onstrates the effectiveness of HAnet architectures More-over FC-only models generally perform better than all-update models thus implying that ResNet-18(480) extractsfeatures well and that further updates may corrupt thesefeatures

e hyper ImageNet models including hyper(l2)ImageNet hyper(l3) ImageNet and hyper(l23) ImageNetseem to give weak performance Hyper ImageNet modelsand the other hyper models share the same architecturese difference between these types of models is that thehyper ImageNet models are trained with the pretrainedImageNet ResNet-18 weights while the other models useResNet-18(480) weights that have been fine-tuned on theWCE datasetis finding reveals that a straightforward basenet such as ResNet-18(480) shows great power in extractingfeatures e complicated connections of HAnet may pro-hibit the network from reaching good convergence points

To fully utilize the advantages of hyper architectures werecommend a two-stage training process (1) Train a ResNet-18(480) model based on the ImageNet-pretrained weightsand then (2) use the fine-tuned ResNet-18(480) model as abackbone feature extractor to train the hyper models Wedenote the best model in all hyper architectures as HAnet-18(480) ie a hyper(l23) FC-only model

Additionally former exploration is based on ResNet-18and results indicate that a hyper(l23) FC-only architecturebased on the ResNet backbone feature extractor fine-tunedbyWCE images may be expected to improve the recognition

Table 2 Illustration of different architectures and configurations

ModelFeatures

Network initialization weights TrainingLayer 2 Layer 3 Layer 4

ResNet(480) ImageNet Train on WCE datasethyper(l2) FC-only

ResNet(480) Update FC-layer onlyhyper(l3) FC-only hyper(l23) FC-only hyper(l2) all-update

ResNet(480) Update all layershyper(l3) all-update hyper(l23) all-update hyper(l2) ImageNet

ImageNet Update all layershyper(l3) ImageNet hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(a)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(b)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(c)

Figure 7 Illustrations of HAnet architectures (a) hyper(l2) (b) hyper(l3) (c) hyper(l23)

Computational and Mathematical Methods in Medicine 7

ResNet(480) weights

ImageNet weights

Other trained weights

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

ResNet from ImageNet

Trai

n

Layer 1

Layer 2

Layer 3

Layer 4

ResNet(480)

FC la

yer

FC la

yer

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) all-update

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) FC-only

Train ResNet backbone to get ResNet(480)

all-update use the trained ResNet(480) weights as initialization and update all layers during training

FC-only use the trained ResNet(480) weights as initialization and update the FC layer during training

ImageNet use the ImageNet ResNet weights as initialization and update all layers during training

Figure 8 Illustration of three different training configurations including ImageNet all-update and FC-only Different colors are used torepresent different weights Green denotes pretrained weights from ImageNet and purple denotes the weights of ResNet(480) that have beenfine-tuned on the WCE dataset Several other colors denote other trained weights

1 2 3 4 5 688

89

90

91

92

93

94

Weighting factor

Val

ue (

)

ACSESP

Figure 9 AC SE and SP evolution against the wCE weighting factor Red purple and blue curves denote the results of AC SE and SPrespectively e horizontal axis is the weighting factor and the vertical axis is the value of AC SE and SP

8 Computational and Mathematical Methods in Medicine

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 6: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

We illustrate these different architectures and configurationsin Table 2

ree different architectures can be obtained whenhyper features from layers 2 and 3 are used for decision incombination with features from layer 4 of our ResNetbackbone in Figure 7(a) hyper(l2) which fuses the hyperfeatures from layer 2 with the deep features of layer 4 inFigure 7(b) hyper(l3) which fuses features from layers 3and 4 and in Figure 7(c) hyper(l23) which fuses featuresfrom layers 2 3 and 4 to form the third HAnet Figure 7provides comprehensive diagrams of these different HAnetarchitectures

Each HAnet can be trained with three configurationsFigure 8 illustrates these configurations

311 ImageNet e whole HAnet is trained using pre-trained ResNet weights from ImageNet from initialization(denoted as ImageNet in Table 2) e total training processlasts for 40 epochs and the batch size is fixed to 16 samplese learning rate was initialized to be 10minus 3 and decayed by afactor of 10 at each period of 20 epochs e parameter ofmomentum is set to 09 Experimental results show that 40epochs are adequate for training to converge Weightedcross-entropy loss is used as the optimization criterion ebest model is selected based on validation results

312 All-Update ResNet(480) is first fine-tuned on ourdataset using pretrained ResNet weights from ImageNet for

initialization e training settings are identical to those in(1) Convergence is achieved during training and the bestmodel is selected based on validation results We then trainthe whole HAnet using the fine-tuned ResNet (480)models for initialization and update all weights in HAnet(denoted as all-update in Table 2) Training lasts for 40epochs e learning rate is set to 10minus 4 momentum is set to09 and the best model is selected based on validationresults

313 FC-Only e weights of the fine-tuned ResNet(480)model are used and only the last fully connected (FC) layeris updated in HAnet (denoted as FC-only in Table 2) ebest model is selected based on validation results Traininglasts for 10 epochs the learning rate is set to 10minus 4 andmomentum is set to 09

For example the first HAnet in Table 2 hyper(l2) FC-only refers to the architecture fusing the features from layer2 and the final layer 4 it uses ResNet (480) weights as thefeature extractor and only the final FC layer is updatedduring HAnet training

To achieve better generalizability data augmentation wasapplied online in the training procedure as suggested in [7]e images are randomly rotated between 0deg and 90deg andflipped with 50 possibility Our network is implementedusing PyTorch

e experiments are conducted on an Intel Xeon ma-chine (Gold 6130 CPU210GHz) with Nvidia Quadro

GAP (global average pooling)

GAP

Input Layer 1 Layer 2 Layer 3 Layer 4

GAP

Features

120 times 120 times 64 60 times 60 times 128 30 times 30 times 256 15 times 15 times 512 1 times 1 times 512 1 times 1 times 256 1 times 1 times 128480 times 480 times 3

Figure 5 Ulcer recognition network framework ResNet-34 which has 34 layers is selected as the feature extractor Here we only displaythe structural framework for clarity Detailed layer information can be found in Appendix

Split 0

Split 1

Split 4

hellip

helliphellip

Test Train Val

Figure 6 Illustration of cross-validation (green test dataset blue train dataset yellow validation dataset)

6 Computational and Mathematical Methods in Medicine

GP100 graphics cards Detailed results are presented inSections 32ndash34

32 Refinement of the Weighting Factor for Weighted Cross-Entropy To demonstrate the impact of different weightingfactors ie w in equation (2) we examine the cross-vali-dation results of model recognition accuracy with differentweighting factors e AC SE and SP curves are shown inFigure 9

AC varies with changes in weighting factor In generalSE improves while SP is degraded as the weighting factorincreases Detailed AC SE and SP values are listed inTable 3 ResNet-18(480) refers to experiments on a ResNet-18 network with a WCE full-resolution image input of480times 480 times 3 A possible explanation for the observed effectof the weighting factor is that the ulcer dataset containsmany consecutive frames of the same ulcer and theseframes may share marked similarities us while theframe numbers of the ulcer and normal dataset are com-parable the information contained by each dataset remainsunbalanced e weighting factor corrects or compensatesfor this imbalance

In the following experiments 40 is used as the weightingfactor as it outperforms other choices and simultaneouslyachieves good balance between SE and SP

33 Selection of Hyper Architectures We tested 10 models intotal as listed in Table 4 including a ResNet-18(480) modeland nine HAnet models based on ResNet-18(480) eresolution of input images is 480times 480times 3 for all models

According to the results in Table 4 the FC-only and all-update hyper models consistently outperform the ResNet-18(480) model in terms of the AC criterion which dem-onstrates the effectiveness of HAnet architectures More-over FC-only models generally perform better than all-update models thus implying that ResNet-18(480) extractsfeatures well and that further updates may corrupt thesefeatures

e hyper ImageNet models including hyper(l2)ImageNet hyper(l3) ImageNet and hyper(l23) ImageNetseem to give weak performance Hyper ImageNet modelsand the other hyper models share the same architecturese difference between these types of models is that thehyper ImageNet models are trained with the pretrainedImageNet ResNet-18 weights while the other models useResNet-18(480) weights that have been fine-tuned on theWCE datasetis finding reveals that a straightforward basenet such as ResNet-18(480) shows great power in extractingfeatures e complicated connections of HAnet may pro-hibit the network from reaching good convergence points

To fully utilize the advantages of hyper architectures werecommend a two-stage training process (1) Train a ResNet-18(480) model based on the ImageNet-pretrained weightsand then (2) use the fine-tuned ResNet-18(480) model as abackbone feature extractor to train the hyper models Wedenote the best model in all hyper architectures as HAnet-18(480) ie a hyper(l23) FC-only model

Additionally former exploration is based on ResNet-18and results indicate that a hyper(l23) FC-only architecturebased on the ResNet backbone feature extractor fine-tunedbyWCE images may be expected to improve the recognition

Table 2 Illustration of different architectures and configurations

ModelFeatures

Network initialization weights TrainingLayer 2 Layer 3 Layer 4

ResNet(480) ImageNet Train on WCE datasethyper(l2) FC-only

ResNet(480) Update FC-layer onlyhyper(l3) FC-only hyper(l23) FC-only hyper(l2) all-update

ResNet(480) Update all layershyper(l3) all-update hyper(l23) all-update hyper(l2) ImageNet

ImageNet Update all layershyper(l3) ImageNet hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(a)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(b)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(c)

Figure 7 Illustrations of HAnet architectures (a) hyper(l2) (b) hyper(l3) (c) hyper(l23)

Computational and Mathematical Methods in Medicine 7

ResNet(480) weights

ImageNet weights

Other trained weights

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

ResNet from ImageNet

Trai

n

Layer 1

Layer 2

Layer 3

Layer 4

ResNet(480)

FC la

yer

FC la

yer

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) all-update

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) FC-only

Train ResNet backbone to get ResNet(480)

all-update use the trained ResNet(480) weights as initialization and update all layers during training

FC-only use the trained ResNet(480) weights as initialization and update the FC layer during training

ImageNet use the ImageNet ResNet weights as initialization and update all layers during training

Figure 8 Illustration of three different training configurations including ImageNet all-update and FC-only Different colors are used torepresent different weights Green denotes pretrained weights from ImageNet and purple denotes the weights of ResNet(480) that have beenfine-tuned on the WCE dataset Several other colors denote other trained weights

1 2 3 4 5 688

89

90

91

92

93

94

Weighting factor

Val

ue (

)

ACSESP

Figure 9 AC SE and SP evolution against the wCE weighting factor Red purple and blue curves denote the results of AC SE and SPrespectively e horizontal axis is the weighting factor and the vertical axis is the value of AC SE and SP

8 Computational and Mathematical Methods in Medicine

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 7: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

GP100 graphics cards Detailed results are presented inSections 32ndash34

32 Refinement of the Weighting Factor for Weighted Cross-Entropy To demonstrate the impact of different weightingfactors ie w in equation (2) we examine the cross-vali-dation results of model recognition accuracy with differentweighting factors e AC SE and SP curves are shown inFigure 9

AC varies with changes in weighting factor In generalSE improves while SP is degraded as the weighting factorincreases Detailed AC SE and SP values are listed inTable 3 ResNet-18(480) refers to experiments on a ResNet-18 network with a WCE full-resolution image input of480times 480 times 3 A possible explanation for the observed effectof the weighting factor is that the ulcer dataset containsmany consecutive frames of the same ulcer and theseframes may share marked similarities us while theframe numbers of the ulcer and normal dataset are com-parable the information contained by each dataset remainsunbalanced e weighting factor corrects or compensatesfor this imbalance

In the following experiments 40 is used as the weightingfactor as it outperforms other choices and simultaneouslyachieves good balance between SE and SP

33 Selection of Hyper Architectures We tested 10 models intotal as listed in Table 4 including a ResNet-18(480) modeland nine HAnet models based on ResNet-18(480) eresolution of input images is 480times 480times 3 for all models

According to the results in Table 4 the FC-only and all-update hyper models consistently outperform the ResNet-18(480) model in terms of the AC criterion which dem-onstrates the effectiveness of HAnet architectures More-over FC-only models generally perform better than all-update models thus implying that ResNet-18(480) extractsfeatures well and that further updates may corrupt thesefeatures

e hyper ImageNet models including hyper(l2)ImageNet hyper(l3) ImageNet and hyper(l23) ImageNetseem to give weak performance Hyper ImageNet modelsand the other hyper models share the same architecturese difference between these types of models is that thehyper ImageNet models are trained with the pretrainedImageNet ResNet-18 weights while the other models useResNet-18(480) weights that have been fine-tuned on theWCE datasetis finding reveals that a straightforward basenet such as ResNet-18(480) shows great power in extractingfeatures e complicated connections of HAnet may pro-hibit the network from reaching good convergence points

To fully utilize the advantages of hyper architectures werecommend a two-stage training process (1) Train a ResNet-18(480) model based on the ImageNet-pretrained weightsand then (2) use the fine-tuned ResNet-18(480) model as abackbone feature extractor to train the hyper models Wedenote the best model in all hyper architectures as HAnet-18(480) ie a hyper(l23) FC-only model

Additionally former exploration is based on ResNet-18and results indicate that a hyper(l23) FC-only architecturebased on the ResNet backbone feature extractor fine-tunedbyWCE images may be expected to improve the recognition

Table 2 Illustration of different architectures and configurations

ModelFeatures

Network initialization weights TrainingLayer 2 Layer 3 Layer 4

ResNet(480) ImageNet Train on WCE datasethyper(l2) FC-only

ResNet(480) Update FC-layer onlyhyper(l3) FC-only hyper(l23) FC-only hyper(l2) all-update

ResNet(480) Update all layershyper(l3) all-update hyper(l23) all-update hyper(l2) ImageNet

ImageNet Update all layershyper(l3) ImageNet hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(a)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(b)

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

(c)

Figure 7 Illustrations of HAnet architectures (a) hyper(l2) (b) hyper(l3) (c) hyper(l23)

Computational and Mathematical Methods in Medicine 7

ResNet(480) weights

ImageNet weights

Other trained weights

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

ResNet from ImageNet

Trai

n

Layer 1

Layer 2

Layer 3

Layer 4

ResNet(480)

FC la

yer

FC la

yer

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) all-update

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) FC-only

Train ResNet backbone to get ResNet(480)

all-update use the trained ResNet(480) weights as initialization and update all layers during training

FC-only use the trained ResNet(480) weights as initialization and update the FC layer during training

ImageNet use the ImageNet ResNet weights as initialization and update all layers during training

Figure 8 Illustration of three different training configurations including ImageNet all-update and FC-only Different colors are used torepresent different weights Green denotes pretrained weights from ImageNet and purple denotes the weights of ResNet(480) that have beenfine-tuned on the WCE dataset Several other colors denote other trained weights

1 2 3 4 5 688

89

90

91

92

93

94

Weighting factor

Val

ue (

)

ACSESP

Figure 9 AC SE and SP evolution against the wCE weighting factor Red purple and blue curves denote the results of AC SE and SPrespectively e horizontal axis is the weighting factor and the vertical axis is the value of AC SE and SP

8 Computational and Mathematical Methods in Medicine

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 8: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

ResNet(480) weights

ImageNet weights

Other trained weights

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) ImageNet

Layer 1

Layer 2

Layer 3

Layer 4

ResNet from ImageNet

Trai

n

Layer 1

Layer 2

Layer 3

Layer 4

ResNet(480)

FC la

yer

FC la

yer

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) all-update

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23)

Train

Layer 1

Layer 2

Layer 3

Layer 4

FC la

yer

hyper(l23) FC-only

Train ResNet backbone to get ResNet(480)

all-update use the trained ResNet(480) weights as initialization and update all layers during training

FC-only use the trained ResNet(480) weights as initialization and update the FC layer during training

ImageNet use the ImageNet ResNet weights as initialization and update all layers during training

Figure 8 Illustration of three different training configurations including ImageNet all-update and FC-only Different colors are used torepresent different weights Green denotes pretrained weights from ImageNet and purple denotes the weights of ResNet(480) that have beenfine-tuned on the WCE dataset Several other colors denote other trained weights

1 2 3 4 5 688

89

90

91

92

93

94

Weighting factor

Val

ue (

)

ACSESP

Figure 9 AC SE and SP evolution against the wCE weighting factor Red purple and blue curves denote the results of AC SE and SPrespectively e horizontal axis is the weighting factor and the vertical axis is the value of AC SE and SP

8 Computational and Mathematical Methods in Medicine

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 9: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

capability of lesions in WCE videos To optimize our net-work we examine the performance of various ResNet seriesmembers to determine an appropriate backbone e cor-responding results are listed in Table 5 ResNet-34(480) hasbetter performance than ResNet-18(480) and ResNet-50(480) us we take ResNet-34(480) as our backbone totrain HAnet-34(480) e training settings are described inSection 31 Figure 10 gives the overall progression ofHAnet-34(480)

34 Comparison with Other Methods To evaluate the per-formance of HAnet we compared the proposed methodwith several other methods including several off-the-shelfCNN models [26ndash28] and two representative handcrafted-feature based methods for WCE recognition [3 42] e off-the-shelf CNN models including VGG [28] DenseNet [26]and Inception-ResNet-v2 [27] are trained to converge withthe same settings as ResNet-34(480) and the best model isselected based on the validation results For handcrafted-feature based methods grid searches to optimize hyperparameters are carried out

We performed repeated 2times 5-fold cross-validation toprovide sufficient measurements for statistical tests Table 6compares the detailed results of HAnet-34(480) with thoseof other methods On average HAnet-34(480) performsbetter in terms of AC SE and SP than the other methodsFigure 11(a) gives the location of each model considering itsinference time and accuracy Figure 11(b) is the statisticalresults of paired T-Test

Among the models tested HAnet-34(480) yields the bestperformance with good efficiency and accuracy Addition-ally the statistical test results demonstrate the improvementof our HAnet-34 is statistically significant Number in eachgrid cell denotes the p value of the two models in thecorresponding row and column We can see that the

improvement of HAnet-34(480) is statistically significant atthe 001 level compared with other methods

Table 7 gives more evaluation results based on severalcriteria including precision (PRE) recall (RECALL) F1 andF2 scores [33] and ROC-AUC [6] HAnet-34 outperformsall other models based on these criteria

4 Discussion

In this section the recognition capability of the proposedmethod for small lesions is demonstrated and discussedRecognition results are also visualized via the class activationmap (CAM) method [43] which indicates the localizationpotential of CNN networks for clinical diagnosis

41 Enhanced Recognition of Small Lesions by HAnet Toanalyze the recognition capacity of the proposed model thesensitivities of ulcers of different sizes are studied and theresults of ResNet-34(480) and the best hyper model HAnet-34(480) are listed in Table 8

Based on the results of each row in Table 8 most of theerrors noticeably occur in the small size range for bothmodels In general the larger the ulcer the easier its rec-ognition In the vertical comparison the ulcer recognition ofHAnet-34(480) outperforms that of ResNet-34(480) at allsize ranges including small lesions

42 Visualization of Recognition Results To better un-derstand our network we use a CAM [43] generated fromGAP to visualize the behavior of HAnet by highlighting therelatively important parts of an image and providing objectlocation information CAM is the weighted linear sum of theactivation map in the last convolutional layer e imageregions most relevant to a particular category can be simplyobtained by upsampling the CAM Using CAM we canverify what indeed has been learned by the network Six casesof representative results are displayed in Figure 12 For eachpair of images the left image shows the original frame whilethe right image shows the CAM result

ese results displayed in Figure 12 demonstrate thepotential use of HAnet for locating ulcers and easing thework of clinical physicians

Table 3 Cross-validation accuracy of ResNet-18(480) with different weighting factors

Weighting factor (w) 10 20 30 40 50 60AC () 9095plusmn 064 9100plusmn 049 9096plusmn 068 9100plusmn 070 9095plusmn 083 9072plusmn 075SE () 8865plusmn 064 8985plusmn 047 8986plusmn 101 9067plusmn 093 9150plusmn 076 9112plusmn 194SP () 9327plusmn 105 9215plusmn 122 9205plusmn 109 9145plusmn 184 9038plusmn 126 9032plusmn 188

Table 4 Performances of different architectures

ModelCross validation

AC () SE () SP ()ResNet-18(480) 9100plusmn 070 9055plusmn 086 9145plusmn 184hyper(l2) FC-only 9164plusmn 079 9122plusmn 108 9205plusmn 165hyper(l3) FC-only 9162plusmn 065 9115plusmn 056 9207plusmn 145hyper(l23) FC-only 9166 plusmn 081 9148 plusmn 090 9183 plusmn 175hyper(l2) all-update 9139plusmn 102 9128plusmn 104 9147plusmn 186hyper(l3) all-update 9150plusmn 081 9063plusmn 106 9233plusmn 174hyper(l23) all-update 9137plusmn 072 9133plusmn 063 914plusmn 142hyper(l2) ImageNet 9096plusmn 090 9052plusmn 110 9138plusmn 201hyper(l3) ImageNet 9104plusmn 080 9052plusmn 134 9154plusmn 131hyper(l23) ImageNet 9082plusmn 085 9026plusmn 133 9137plusmn 148

Table 5 Model recognition accuracy with different settings

ResNet-18(480) ResNet-34(480) ResNet-50(480)AC () 9100plusmn 070 9150plusmn 070 9129plusmn 091SE () 9055plusmn 086 9074plusmn 074 8963plusmn 183SP () 9145plusmn 184 9225plusmn 172 9294plusmn 182

Computational and Mathematical Methods in Medicine 9

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 10: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

3 HAnet architectures

hyper(l2)

hyper(l3)

hyper(l23)

3 Training configurations

ImageNet

all-update

FC-only

hyper(l23)

FC-only

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

3 Backbones

ResNet-34(480)

HAnet-34(480)

Cross validation

Cross validation

Figure 10 Progression of HAnet-34(480)

Table 6 Comparison of HAnet with other methods

AC () SE () SP ()SPM-BoW-SVM [42] 6138plusmn 122 5147plusmn 265 7067plusmn 224Words-based-color-histogram [3] 8034plusmn 029 8221plusmn 039 7844plusmn 029vgg-16(480) [28] 9085plusmn 098 9012plusmn 117 9202plusmn 252dense-121(480) [26] 9126plusmn 043 9047plusmn 167 9207plusmn 204Inception-ResNet-v2(480) [27] 9145plusmn 080 9081plusmn 195 9212plusmn 271ResNet-34(480) [25] 9147plusmn 052 9053plusmn 114 9241plusmn 166HAnet-34(480) 9205plusmn 052 9164plusmn 095 9242plusmn 154

0 10 20 30 409050

9075

9100

9125

9150

9175

9200

9225

9250

dense-121(480)

Inception-res-v2(480)

vgg-16(480)

HAnet-18(480)

HAnet-34(480)

HAnet-50(480)

ResNet-18(480)

ResNet-34(480)

ResNet-50(480)

Average inference time (ms)

Accu

racy

()

(a)

Figure 11 Continued

10 Computational and Mathematical Methods in Medicine

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 11: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

5 Conclusion

In this work we proposed a CNN architecture for ulcerdetection that uses a state-of-the-art CNN architecture(ResNet-34) as the feature extractor and fuses hyper anddeep features to enhance the recognition of ulcers of varioussizes A large ulcer dataset containing WCE videos from1416 patients was used for this studye proposed networkwas extensively evaluated and compared with other methodsusing overall AC SE SP F1 F2 and ROC-AUC as metrics

Experimental results demonstrate that the proposed ar-chitecture outperforms off-the-shelf CNN architectures es-pecially for the recognition of small ulcers Visualization with

CAM further demonstrates the potential of the proposedarchitecture to locate a suspicious area accurately in a WCEimage Taken together the results suggest a potential methodfor the automatic diagnosis of ulcers from WCE videos

Additionally we conducted experiments to investigatethe effect of number of cases We used split 0 datasets in thecross-validation experiment 990 cases for training 142 casesfor validation and 283 cases for testing We constructeddifferent training datasets from the 990 cases while fixed thevalidation and test dataset Firstly we did experiments onusing different number of cases for training We randomlyselected 659 cases 423 cases and 283 cases from 990 casesen we did another experiment using similar number of

00096

00096 00002

08018

00002 08018

00003

02384

00228

00003 02384 00228

00002

00026

00080

00989

00002 00026 00080 00989

HAnet-34(480)

HAnet-34(480)

Inception-res-v2(480)

Inception-res-v2(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

ResNet-34(480)

dense-121(480)

vgg-16(480)

lt001

gt005

001ndash005

(b)

Figure 11 Comparison of different models (a) Accuracy and inference time comparison e horizontal and vertical axes denote theinference time and test accuracy respectively (b) Statistical test results of paired T-Test Number in each grid cell denotes the p value of thetwo models in the corresponding row and column

Table 7 Evaluation of different criteria

PRE RECALL F1 F2 ROC-AUCvgg-16(480) 09170 09012 09087 09040 09656dense-121(480) 09198 09047 09118 09074 09658Inception-ResNet-v2(480) 09208 09081 09138 09102 09706ResNet-34(480) 09218 09053 09133 09084 09698HAnet-34(480) 09237 09164 09199 09177 09726

Table 8 Recognition of ulcer with different sizes

ModelUlcer size

lt1 1ndash25 25ndash5 gt5ResNet-34(480) 8144plusmn 307 9186plusmn 140 9416plusmn 126 9651plusmn 143HAnet-34(480) 8237plusmn 360 9278plusmn 133 9540plusmn 074 9711plusmn 111

Computational and Mathematical Methods in Medicine 11

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 12: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

frames as last experiment that distributed in all 990 casesResults demonstrate that when similar number of frames areused for training test accuracies using training datasets withmore cases are better is should be attributed to richerdiversity introduced by more cases We may recommend touse as many cases as possible to train the model

While the performance of HAnet is very encouragingimproving its SE and SP further is necessary For examplethe fusion strategy in the proposed architecture involves

concatenation of features from shallow layers after GAPSemantic information in hyper features may not be as strongas that in deep features ie false-activated neural units dueto the relative limited receptive field in the shallow layersmay add unnecessary noise to the concatenated featurevector when GAP is utilized An attention mechanism [44]that can focus on the suspicious area may help address thisissue Temporal information from adjacent frames couldalso be used to provide external guidance during recognition

(a) (b)

(c) (d)

(e) (f )

Figure 12 Visualization network results of some representative ulcers with CAM A total of six groups of representative frames are obtained Foreach group the left image reflects the original frame while the right image shows the result of CAM (a) Typical ulcer (b) ulcer on the edge (c) ulcerin a turbid backgroundwith bubbles (d)multiple ulcers in one frame (e) ulcer in a shadow and (f) ulcer recognition in the framewith a flashlight

Table 9 Architectures of ResNet series members

Layer name Output size 18-Layer 34-Layer 50-Layer 101-Layer 152-LayerConv1 240 times 240 7 times 7 64 stride 2Maxpool 120 times 120 3 times 3 max pool stride 2

Layer 1 120 times 120 3 times 3 643 times 3 641113890 1113891 times 2 3 times 3 64

3 times 3 641113890 1113891 times 31 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 643 times 3 641 times 1 256

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Layer 2 60 times 60 3 times 3 1283 times 3 1281113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 41 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 4

1 times 1 1283 times 3 1281 times 1 512

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 8

Layer 3 30 times 30 3 times 3 2563 times 3 2561113890 1113891 times 2 3 times 3 128

3 times 3 1281113890 1113891 times 61 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 6

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 23

1 times 1 2563 times 3 2561 times 1 1024

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 36

Layer 4 15 times 15 3 times 3 5123 times 3 5121113890 1113891 times 2 3 times 3 512

3 times 3 5121113890 1113891 times 31 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

1 times 1 5123 times 3 5121 times 1 2048

⎡⎢⎢⎢⎢⎢⎣⎤⎥⎥⎥⎥⎥⎦ times 3

Avgpool and fc 1 times 1 Global average pool 2-d fc

12 Computational and Mathematical Methods in Medicine

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 13: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

of the current frame In future work we will explore moretechniques in network design to improve ulcer recognition

Appendix

e architectures of members of the ResNet series includingResNet-18 ResNet-34 ResNet-50 ResNet-101 and ResNet-152 are illustrated in Table 9ese members share a similarpipeline consisting of seven components (ie conv1maxpool layer 1 layer 2 layer 3 layer 4 and avgpool + fc)Each layer in layers 1ndash4 has different subcomponentsdenoted as [ ] times n where [ ] represents a specific basicmodule consisting of 2-3 convolutional layers and n is thenumber of times the basic module is repeated Each row in[ ] describes the convolution layer setting eg [(3 times 3)64]

means a convolution layer with a kernel size and channel of3 times 3 and 64 respectively

Data Availability

eWCE datasets used to support the findings of this studyare available from the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interest

Acknowledgments

Hao Zhang is currently an employee of Ankon TechnologiesCo Ltd (Wuhan Shanghai China) He helped in providingWCE data and organizing annotation e authors wouldlike to thank Ankon Technologies Co Ltd (WuhanShanghai China) for providing the WCE data (anko-ninccomcn)ey would also like to thank the participatingengineers of Ankon Technologies Co Ltd for theirthoughtful support and cooperation is work was sup-ported by a Research Project from Ankon Technologies CoLtd (Wuhan Shanghai China) the National Key ScientificInstrument and Equipment Development Project underGrant no 2013YQ160439 and the Zhangjiang NationalInnovation Demonstration Zone Special Development Fundunder Grant no ZJ2017-ZD-001

References

[1] L Shen Y-S Shan H-M Hu et al ldquoManagement of gastriccancer in Asia resource-stratified guidelinesrdquo e LancetOncology vol 14 no 12 pp e535ndashe547 2013

[2] Z Liao X Hou E-Q Lin-Hu et al ldquoAccuracy of magneticallycontrolled capsule endoscopy compared with conventionalgastroscopy in detection of gastric diseasesrdquo Clinical Gastro-enterology and Hepatology vol 14 no 9 pp 1266ndash1273e1 2016

[3] Y Yuan B Li and M Q-H Meng ldquoBleeding frame andregion detection in the wireless capsule endoscopy videordquoIEEE Journal of Biomedical and Health Informatics vol 20no 2 pp 624ndash630 2015

[4] N Shamsudhin V I Zverev H Keller et al ldquoMagneticallyguided capsule endoscopyrdquo Medical Physics vol 44 no 8pp e91ndashe111 2017

[5] B Li and M Q-H Meng ldquoComputer aided detection ofbleeding regions for capsule endoscopy imagesrdquo IEEETransactions on Biomedical Engineering vol 56 no 4pp 1032ndash1039 2009

[6] J-Y He X Wu Y-G Jiang Q Peng and R Jain ldquoHook-worm detection in wireless capsule endoscopy images withdeep learningrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2379ndash2392 2018

[7] Y Yuan and M Q-H Meng ldquoDeep learning for polyprecognition in wireless capsule endoscopy imagesrdquo MedicalPhysics vol 44 no 4 pp 1379ndash1389 2017

[8] B Li and M Q-H Meng ldquoAutomatic polyp detection forwireless capsule endoscopy imagesrdquo Expert Systems withApplications vol 39 no 12 pp 10952ndash10958 2012

[9] A Karargyris and N Bourbakis ldquoDetection of small bowelpolyps and ulcers in wireless capsule endoscopy videosrdquo IEEETransactions on Biomedical Engineering vol 58 no 10pp 2777ndash2786 2011

[10] L Yu P C Yuen and J Lai ldquoUlcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 21st In-ternational Conference on Pattern Recognition (ICPR2012)pp 45ndash48 IEEE Tsukuba Japan November 2012

[11] J-Y Yeh T-H Wu and W-J Tsai ldquoBleeding and ulcerdetection using wireless capsule endoscopy imagesrdquo Journalof Software Engineering and Applications vol 7 no 5pp 422ndash432 2014

[12] Y Fu W Zhang M Mandal and M Q-H Meng ldquoCom-puter-aided bleeding detection in WCE videordquo IEEE Journalof Biomedical and Health Informatics vol 18 no 2pp 636ndash642 2014

[13] V Charisis L Hadjileontiadis and G Sergiadis ldquoEnhancedulcer recognition from capsule endoscopic images usingtexture analysisrdquo in New Advances in the Basic and ClinicalGastroenterology IntechOpen London UK 2012

[14] A K Kundu and S A Fattah ldquoAn asymmetric indexed imagebased technique for automatic ulcer detection in wirelesscapsule endoscopy imagesrdquo in Proceedings of the 2017 IEEERegion 10 Humanitarian Technology Conference (R10-HTC)pp 734ndash737 IEEE Dhaka Bangladesh December 2017

[15] E Ribeiro A Uhl W Georg andM Hafner ldquoExploring deeplearning and transfer learning for colonic polyp classifica-tionrdquo Computational andMathematical Methods in Medicinevol 2016 Article ID 6584725 16 pages 2016

[16] B Li and M Q-H Meng ldquoComputer-based detection ofbleeding and ulcer in wireless capsule endoscopy images bychromaticity momentsrdquo Computers in Biology and Medicinevol 39 no 2 pp 141ndash147 2009

[17] B Li L Qi M Q-H Meng and Y Fan ldquoUsing ensembleclassifier for small bowel ulcer detection in wireless capsuleendoscopy imagesrdquo in Proceedings of the 2009 IEEE In-ternational Conference on Robotics amp Biomimetics GuilinChina December 2009

[18] T Aoki A Yamada K Aoyama et al ldquoAutomatic detection oferosions and ulcerations in wireless capsule endoscopy imagesbased on a deep convolutional neural networkrdquo Gastrointes-tinal Endoscopy vol 89 no 2 pp 357ndash363e2 2019

[19] B Li andM Q-H Meng ldquoTexture analysis for ulcer detectionin capsule endoscopy imagesrdquo Image and Vision Computingvol 27 no 9 pp 1336ndash1342 2009

[20] Y Yuan and M Q-H Meng ldquoA novel feature for polypdetection in wireless capsule endoscopy imagesrdquo in Pro-ceedings of the 2014 IEEERSJ International Conference onIntelligent Robots and Systems pp 5010ndash5015 IEEE ChicagoIL USA September 2014

Computational and Mathematical Methods in Medicine 13

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 14: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

[21] A Krizhevsky I Sutskever and G Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inAdvances in Neural Information Processing Systems vol 25no 2 Curran Associates Inc Red Hook NY USA 2012

[22] M D Zeiler and R Fergus ldquoVisualizing and understandingconvolutional networksrdquo in Proceedings of the EuropeanConference on Computer Vision pp 818ndash833 Springer ChamSwitzerland September 2014

[23] C Szegedy W Liu Y Jia et al ldquoGoing deeper with con-volutionsrdquo in Proceedings of the 2015 IEEE Conference onComputer Vision and Pattern Recognition (CVPR) IEEEBoston MA USA June 2015

[24] C Szegedy V Vanhoucke S Ioffe J Shlens and Z WojnaldquoRethinking the inception architecture for computer visionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2818ndash2826 Las Vegas NV USA June2016

[25] K He X Zhang S Ren and J Sun ldquoDeep residual learningfor image recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 770ndash778 LasVegas NV USA June 2016

[26] G Huang Z Liu L Van Der Maaten and K Q WeinbergerldquoDensely connected convolutional networksrdquo in Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition pp 4700ndash4708 Honolulu HI USA July 2017

[27] C Szegedy S Ioffe V Vanhoucke and A Alemi ldquoInception-v4 inception-ResNet and the impact of residual connectionson learningrdquo in Proceedings of the irty-First AAAI Con-ference on Artificial Intelligence San Francisco CA USAFebruary 2017

[28] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo 2014 httpsarxivorgabs14091556

[29] Y LeCun Y Bengio and G Hinton ldquoDeep learningrdquoNaturevol 521 no 7553 pp 436ndash444 2015

[30] M Turan E P Ornek N Ibrahimli et al ldquoUnsupervisedodometry and depth learning for endoscopic capsule robotsrdquo2018 httpsarxivorgabs180301047

[31] M Ye E Johns A Handa L Zhang P Pratt and G-Z YangldquoSelf-supervised Siamese learning on stereo image pairs fordepth estimation in robotic surgeryrdquo 2017 httpsarxivorgabs170508260

[32] M Faisal and N J Durr ldquoDeep learning and conditionalrandom fields-based depth estimation and topographicalreconstruction from conventional endoscopyrdquoMedical ImageAnalysis vol 48 pp 230ndash243 2018

[33] L Yu H Chen Q Dou J Qin and P A Heng ldquoIntegratingonline and offline three-dimensional deep learning for auto-mated polyp detection in colonoscopy videosrdquo IEEE Journal ofBiomedical andHealth Informatics vol 21 no1 pp 65ndash75 2017

[34] A A Shvets V I Iglovikov A Rakhlin and A A KalininldquoAngiodysplasia detection and localization using deep con-volutional neural networksrdquo in Proceedings of the 2018 17thIEEE International Conference on Machine Learning andApplications (ICMLA) pp 612ndash617 IEEE Orlando FL USADecember 2018

[35] S Fan L Xu Y Fan K Wei and L Li ldquoComputer-aideddetection of small intestinal ulcer and erosion in wirelesscapsule endoscopy imagesrdquo Physics in Medicine amp Biologyvol 63 no 16 p 165001 2018

[36] W Liu D Anguelov D Erhan et al ldquoSSD single shotmultibox detectorrdquo in Proceedings of the European Conferenceon Computer Vision pp 21ndash37 Springer Cham SwitzerlandOctober 2016

[37] J Redmon and A Farhadi ldquoYOLO9000 better fasterstrongerrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 7263ndash7271 Honolulu HIUSA July 2017

[38] T-Y Lin P Dollar R Girshick K He B Hariharan andS Belongie ldquoFeature pyramid networks for object detectionrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2117ndash2125 Honolulu HI USA July2017

[39] M Lin Q Chen and S Yan ldquoNetwork in networkrdquo 2013httpsarxivorgabs13124400

[40] T-Y Lin P Goyal R Girshick K He and P Dollar ldquoFocalloss for dense object detectionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision pp 2980ndash2988Venice Italy October 2017

[41] H R Roth H Oda Y Hayashi et al ldquoHierarchical 3D fullyconvolutional networks for multi-organ segmentationrdquo 2017httpsarxivorgabs170406382

[42] X Liu J Bai G Liao Z Luo and C Wang ldquoDetection ofprotruding lesion in wireless capsule endoscopy videos of smallintestinerdquo in Proceedings of the Medical Imaging 2018 Com-puter-Aided Diagnosis Houston TX USA February 2018

[43] B Zhou A Khosla A Lapedriza A Oliva and A TorralbaldquoLearning deep features for discriminative localizationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition pp 2921ndash2929 Las Vegas NV USA June2016

[44] F Wang M Jiang C Qian et al ldquoResidual attention networkfor image classificationrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition pp 3156ndash3164Honolulu HI USA July 2017

14 Computational and Mathematical Methods in Medicine

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom

Page 15: DeepConvolutionalNeuralNetworkforUlcerRecognitionin ...e emergence of wireless capsule endoscopy (WCE) has revolutionized the task of imaging GI issues; this technology offers a noninvasive

Stem Cells International

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

MEDIATORSINFLAMMATION

of

EndocrinologyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Disease Markers

Hindawiwwwhindawicom Volume 2018

BioMed Research International

OncologyJournal of

Hindawiwwwhindawicom Volume 2013

Hindawiwwwhindawicom Volume 2018

Oxidative Medicine and Cellular Longevity

Hindawiwwwhindawicom Volume 2018

PPAR Research

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Immunology ResearchHindawiwwwhindawicom Volume 2018

Journal of

ObesityJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational and Mathematical Methods in Medicine

Hindawiwwwhindawicom Volume 2018

Behavioural Neurology

OphthalmologyJournal of

Hindawiwwwhindawicom Volume 2018

Diabetes ResearchJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Research and TreatmentAIDS

Hindawiwwwhindawicom Volume 2018

Gastroenterology Research and Practice

Hindawiwwwhindawicom Volume 2018

Parkinsonrsquos Disease

Evidence-Based Complementary andAlternative Medicine

Volume 2018Hindawiwwwhindawicom

Submit your manuscripts atwwwhindawicom