DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018

Automated Pulmonary Nodule Detection on Computed Tomography Images with 3D Deep Convolutional Neural Network

ANTOINE BROYELLE

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science
Master in Computer Science
Degree Project
June 2018

Author: Antoine Broyelle

Principal: Optellum Ltd., Oxford Centre for Innovation, Oxford, OX1 1BY, United Kingdom

Examiner at KTH CSC: Hedvig Kjellström

Supervisor at KTH CSC: Pawel Herman

Supervisor at the principal: Timor

Abstract

Object detection on natural images has become a single-stage end-to-end process thanks to recent breakthroughs in deep neural networks. By contrast, automated pulmonary nodule detection is usually a three-step method: lung segmentation, generation of nodule candidates and false positive reduction. This project tackles the nodule detection problem with a single-stage model using a deep neural network.

Pulmonary nodules have unique shapes and characteristics which are not present outside of the lungs. We expect the model to capture these characteristics and to focus only on elements inside the lungs when working on raw CT scans (without the segmentation). Nodules are small, scattered and infrequent. We show that a well-trained deep neural network can spot relevant features and keep a low number of region proposals without any extra pre-processing or post-processing.

Due to the visual nature of the task, we designed a three-dimensional convolutional neural network with residual connections. It was inspired by the region proposal network of the Faster R-CNN detection framework.

The evaluation is performed on the LUNA16 dataset. The final score is 0.826, which is the average sensitivity at 0.125, 0.25, 0.5, 1, 2, 4 and 8 false positives per scan. This is an average score compared to other submissions to the challenge. However, the solution described here was trained end-to-end and has fewer trainable parameters.

KEYWORDS: Deep Learning, Artificial Neural Networks, Lung damage, Lung cancer, CT scans, Pulmonary Nodules, Detection, Region Proposal.

Summary (Sammanfattning)

Object detection in natural images has been reduced to a single-stage process thanks to breakthroughs in deep neural networks. Automated detection of pulmonary nodules is usually a three-step problem: lung segmentation, generation of nodule candidates and reduction of false positives. This project tackles nodule detection with a single-stage model using a deep neural network.

Pulmonary nodules have unique characteristics that are not found outside the lungs. The model is expected to capture these characteristics and to focus only on elements inside the lungs when working on computed tomography images. Nodules are small and sparsely distributed. We show that a well-trained network can find relevant features and propose a low number of regions of interest without extra pre- or post-processing.

Because of the visual nature of this problem, we designed a three-dimensional convolutional neural network with residual connections. The project was inspired by Faster R-CNN, a network that stands out for its ability to detect regions of interest.

The network was evaluated on a dataset named LUNA16. The final network scored 0.826, which is the average sensitivity at 0.125, 0.25, 0.5, 1, 2, 4 and 8 false positives per scan. This can be considered average compared with other participants in the challenge, but the solution proposed here is a single-stage solution that performs detection end to end and has fewer trainable parameters.

Acknowledgment

I would like to express my profound gratitude to Carlos Arteta, my supervisor at the principal. He guided me in the right direction whenever I needed it. He also took the time to answer all my questions regarding my research or his personal experience as a former PhD student and post-doc at Oxford University. I learnt a lot working alongside him.

I would also like to thank the Optellum team for their welcome. I learned a lot about entrepreneurship and early-stage companies.

I would also like to thank my supervisor at KTH, Dr. Pawel Herman, for guiding us through the requirements of the degree project.

My sincere thanks also go to Quentin Chometon. The back-and-forth between us contributed to both our projects, and all our lunchtime conversations made the British weather bearable.

Finally, I would like to acknowledge Gabriel Carrizo for the Swedish abstract and Lottie Woodward, who had the kindness to review and proofread a report written in a French fashion.

The authors would like to thank the LUNA16 challenge organizers for providing the dataset and evaluating results.

Last but not least, I would like to thank my family for their support throughout my education.

Glossary of terms

CAD Computer-Aided Diagnosis

CNN Convolutional Neural Network

CT Computed Tomography

FDA Food and Drug Administration, federal agency of the United States Department of Health and Human Services

FP False Positive

FROC Free-response Receiver Operating Characteristic

GLOBOCAN Project carried out by the International Agency for Research on Cancer, which aims to provide estimates of the mortality, incidence and prevalence of major types of cancer worldwide

GPU Graphics Processing Unit

HU Hounsfield Unit

IoU Intersection Over Union

LUNA16 Lung Nodule Analysis 2016

NMS Non-Maxima Suppression

R-CNN Region-based Convolutional Neural Network

ReLU Rectified Linear Unit

ROC Receiver Operating Characteristic

TP True Positive

TPR True Positive Rate

Contents

Introduction

1 Background
    1.1 Artificial Neural Networks
    1.2 Computed Tomography Imaging
    1.3 Detection Framework
    1.4 Nodule Detection

2 Methods
    2.1 Clinical Dataset: Lung Nodule Analysis 2016 (LUNA16)
    2.2 Preparation of the data
    2.3 Pipeline
    2.4 Network Architecture
    2.5 Training
    2.6 Evaluation

3 Results
    3.1 Anchors Selection
    3.2 Evaluation
    3.3 Localization
    3.4 Comparison

4 Discussion
    4.1 Main findings
    4.2 Impacts
    4.3 Limitations
    4.4 Ethic and Sustainability

5 Conclusion
    5.1 Future work

References

A Visual Evaluation and Interpretation

Introduction

According to the GLOBOCAN series of the International Agency for Research on Cancer published in 2014[1], lung cancer was the most commonly diagnosed cancer (1.82 million cases) and the most lethal cancer, with 1.6 million deaths in 2012 across 184 countries. Cancer incidence varies widely depending on the individual, the gender, the development of the country, etc. Interestingly, lung cancer incidence is higher in developed and industrialised regions of Europe, America and Asia.

In general, cancers are due to an abnormal development of cells. Inside the lungs, the result is called a pulmonary nodule. Lesions formed by the nodules are detected with non-invasive imaging processes like computed tomography (CT) scans, where nodules appear as a radiographic opacity. The malignancy is assessed from characteristics of the nodule such as size, morphology, location and multiplicity, but also from patient characteristics such as age, sex, race and medical background[2]. Some studies have shown that a screening program can reduce lung cancer mortality as it allows early-stage detection and better follow-up[3]. However, screening programs remain the exception and are prone to a high false positive rate, increasing patient stress and cost. Nowadays, malignant pulmonary nodules are mainly discovered as incidental findings (during liver or cardiovascular scans). The sooner nodules are detected, the more effective the treatment can be.

The challenge surrounding diagnosis is to detect the early onset of lung cancer with only a small amount of information. Over the past 5 years, artificial intelligence has been able to solve many tasks, especially visual tasks. It can now be used in the medical industry as a computer-aided diagnosis (CAD) system, and help radiologists, ophthalmologists and other health professionals during decision-making processes. The United States Food and Drug Administration (FDA) has just approved a first decision software based on artificial intelligence. The system is able to detect diabetic retinopathy by looking at the retina[4]. It provides a screening decision without the need for a doctor to interpret the results. Another, more research-oriented example is the classification of natural images showing epidermis patches in order to prevent skin cancer[5].

Optellum Ltd, the company where the thesis was carried out, is currently building such a system. The company aims to provide an expert-level pulmonary nodule risk assessment using computer vision techniques trained on a large set of chest CT scans. To achieve this, Optellum Ltd is carrying out research on nodule detection, nodule malignancy assessment and nodule segmentation, among others. Since the company has to turn this research into a commercial product, it is important to explore the capabilities of simple models which require few resources. Additionally, unlike models for object detection on natural images (see Section 1.3), algorithms for nodule detection seem to rely on more processing steps, not necessarily well justified (see Section 1.4). Finally, the segmentation of the lungs as a pre-processing operation might discard nodules located on the lung walls.

The scientific aspect of the thesis is to gauge the effectiveness of a single-stage deep neural network based approach to the problem of nodule detection in medical images. In other words, the project discusses the need for pre-processing (segmentation) and post-processing (false positive reduction) stages in this specific diagnostic image analysis pipeline. The hypothesis is that a well-trained CNN can produce a low number of region proposals and learn to focus only inside the lung. To this end, we intend to design a CNN and compare its performance to other CNN-based models used by previous submissions to the Lung Nodule Analysis 2016 challenge.

The following key concepts and postulates underlie our approach to the pulmonary nodule detection problem:

• There is more information in a volume than in a few slices extracted from the CT scans.

• The network should be isotropic to cope with the variety of inputs.

• Nodules are small and sparse. This means a lot of information will be lost if the spatial compression of the network is too high. However, keeping a large spatial representation leads to more computation.

• Nodules can be approximated as spheres. Hence, the predefined bounding boxes (anchors) are isotropic (spheres) and only vary in scale.

• The input represents a continuous volume, so there are no obfuscations or occlusions which could lead to restrictive and conservative rules and strategies.

• The number of region proposals should be quite low as the number of nodules per patient is also low.

Chapter 1 presents some background information and interesting work related to medical image analysis and computer vision algorithms for object detection. Chapter 2 describes our approach in terms of data processing and design of the model. Chapter 3 describes the results and a comparative study. Finally, the report ends with a discussion of impacts and limitations in Chapter 4 and the conclusion.

Chapter 1 Background

1.1 Artificial Neural Networks

Artificial neural networks are a graph-based class of machine learning algorithms where nodes are called neurons. This is inspired by the way the brain works. Each neuron fires or inhibits its output potential based on a weighted sum of inputs coming from other neurons. The goal is to find the weights that produce the most meaningful output given the input and a criterion. This search is called the training phase.

Feed-forward artificial neural networks are a particular type of neural network where the information moves in only one direction, from the input to the output, without any cycle inside the graph. In most cases, when feed-forward artificial neural networks are used, neurons are grouped to form layers and layers are stacked so that each neuron in one layer has directed connections to the neurons of the subsequent layer. Deep neural networks are feed-forward artificial neural networks where many layers are stacked between the input and the output layers. The mapping from one layer to the next can vary between algorithms. CNNs are one specific type of feed-forward artificial neural network where the operations between layers are convolutions with kernels which have learnable weights and biases.

CNNs have proven useful and efficient in computer vision thanks to weight sharing and translation invariance[6][7]. They are used for different tasks such as classification[8][9], segmentation[10] and detection[7][11][12]. Medical imaging is no exception. Among many other projects, people have worked on paediatric bone age estimation with X-ray[13], pneumonia detection on X-ray[14] and skin cancer classification on natural images of the epidermis[5].

1.2 Computed Tomography Imaging

Computed tomography (CT) is a noninvasive medical technique which uses X-rays to produce several cross-sections of the body. Each cross-section represents a slice of a few millimetres of the patient. A CT scan is a set of slices usually representing a continuous part of the person being screened. For each slice, the X-ray source and detector rotate around the patient. During the rotation, several snapshots are taken and are then processed to produce an image. Each pixel of the image expresses an intensity in Hounsfield Units (HU). Table 1.1 describes the relation between the HU values and the associated substances. Radiologists have to visualise every single slice of the scan when looking for nodules. They perform the detection by spotting unexpected shapes and intensities. However, this visual inspection has been shown to be prone to failures; only 68% of nodules are found[15].

Substance      Value (HU)
Air            -1000
Lung           -500
Water          0
Blood          30 to 45
Soft tissue    100 to 300
Bone           700 to 3000

Table 1.1: Hounsfield Units meaning

1.3 Detection Framework

Object detection is a combination of object classification and object localisation. The classification is the task of choosing one label among a predefined set. The localisation is solved by putting a bounding box around the object. The tighter the bounding box, the better.

The first attempt at the detection problem using a CNN was made in 2014 by Sermanet et al. when they released OverFeat[7], a network with a small receptive field applied on the input image at different scales and positions in a sliding-window fashion. This is computationally expensive as the network is run many times.

Later in 2014, a region-based convolutional neural network (R-CNN)[11] was published. This model contains four independent components. The first step is to find regions of interest, called region proposals. The second step uses a CNN for the feature extraction of each proposal. Finally, the feature vector at the output of the CNN is fed into a support vector machine (SVM) for classification and a linear regressor for the localisation. The main issue with R-CNN is the amount of computation required. Indeed, the network is applied independently on every single candidate region even though some of them may overlap (see Figure 1.2a).

Figure 1.1: Illustration of anchors. The input is a 2D array (16x16). The feature map is a volume to represent the channels. The CNN has a subsampling ratio of 4. The dots on the image represent the mesh grid where anchors are applied. Three square anchors are used and are represented with dashed lines.

This drawback was later fixed with Fast R-CNN[16], where the CNN takes a raw image and directly generates a class prediction and a class-specific bounding box. Hence, the image evaluation is single-stage. The improvement comes from the feature extraction directly performed on the full image using a CNN. The generated feature map is then cropped based on the proposed regions. The assumption is that the feature map still embeds some spatial information. This operation is performed by the region of interest pooling layer (RoIpool). The sub-patches of the feature map go into the final part of the network, which performs the classification and regression of the bounding box (see Figure 1.2b). The model still relies on an external region proposal algorithm, which is the bottleneck of the computations.

(a) R-CNN (b) Fast R-CNN (c) Faster R-CNN (d) R-FCN

Figure 1.2: Illustration of the pipeline for different detection frameworks. For each graph, the information flows from the bottom to the top following the arrows. Dark boxes represent a sub-task to optimize. Blue elements represent intermediate states. Blue squares represent a portion of the image and blue cubes represent feature maps.

The first end-to-end deep learning detection algorithm was Faster R-CNN[17]. Ren et al. introduced a Region Proposal Network (RPN), generating region proposals with a CNN. An additional improvement, aiming to reduce computations, is the sharing of parameters between the RPN and the detection network (Fast R-CNN) by using the same first convolutional layers. The RPN region of interest proposals are used exactly as in Fast R-CNN with the RoIpool (crop of the feature map). They use a key concept, later adopted by all detection frameworks, called anchors[18]. Anchors can be seen as pre-defined bounding boxes used as references. They come in different shapes, ratios and scales. Each of them captures different information due to its characteristics. They are all applied throughout the entire image in a sliding-window fashion. For each anchor, at each location, the problem is broken into two parts: is there a relevant object (binary classification), and how to adjust the anchor for a better fit (regression of the bounding box). The Fast R-CNN component performs object classification and a class-specific bounding box refinement for each anchor (see Figure 1.2c).
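The way anchors tile an image can be illustrated with a short sketch. It reproduces the setting of Figure 1.1 (a 16x16 input and a subsampling ratio of 4); the function name `anchor_grid` and the three anchor sides are hypothetical and chosen only for illustration.

```python
import itertools


def anchor_grid(input_size, stride, scales):
    """Return (centre, side) pairs for square anchors laid out on a regular grid."""
    cells = input_size // stride
    # Centre, in input pixels, of the block of pixels behind each feature-map cell.
    centres = [stride * i + stride / 2 for i in range(cells)]
    return [
        ((cy, cx), side)
        for cy, cx in itertools.product(centres, centres)
        for side in scales
    ]


# 16x16 input and subsampling ratio of 4, as in Figure 1.1.
# The three anchor sides (2, 4, 8) are illustrative values only.
anchors = anchor_grid(input_size=16, stride=4, scales=(2, 4, 8))
print(len(anchors))  # 4 x 4 grid positions x 3 anchors = 48
print(anchors[:3])
```

Each grid position carries every anchor scale, so the number of anchors grows with the square (or, in 3D, the cube) of the feature-map side.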

Region-based Fully Convolutional Networks (R-FCN)[19] are a modified Faster R-CNN. Compared to Faster R-CNN, the feature map is cropped later, on the very last convolution before the classification and regression of the bounding box (see Figure 1.2d). Other systems remove the need for an RPN as they perform the object classification and class-specific bounding box regression directly (see Single Shot MultiBox Detector (SSD)[20] and You Only Look Once (YOLO)[12]).

At the end of each of these algorithms, a post-processing operation is conducted: the non-maxima suppression (NMS)[21]. Each anchor proposes a bounding box with a different confidence score. The problem is that an object often accumulates several region proposals. The NMS is a local search for the best proposal for each object. It aims to reduce duplication between the region proposals by only keeping the one with the highest confidence score compared to its neighbours.

1.4 Nodule Detection

In 2016, a dataset and a challenge were released for pulmonary nodule detection (see Section 2.1 below). Relevant work has been published in the context of the competition. Unfortunately, some of the reports lack the details required for reproducibility, for example due to intellectual property constraints.

The detection is often tackled with a multi-step pipeline: segmentation of the lungs, nodule detection and false positive reduction. The motivation for segmenting the inner part of the lungs is to present only areas of interest to the network. The nodule detection works on the segmented lungs and often returns a high number of candidate regions (high sensitivity but also a high false positive rate). This motivates the need for the third component, which separates the interesting region proposals from the background samples.

Ding et al.[22] (team name qfpxfd) presented a solution using a 2D Faster R-CNN[17] (RPN + Fast R-CNN) with a feature extraction based on VGG-16[23] pre-trained on ImageNet[24]. The second part is a 3D CNN with 6 convolutional layers and 3 fully connected layers. The interesting component of their solution is the presence of a deconvolution layer as the last operation of the feature extraction, after the VGG-16 network. This reduces the downsampling ratio between the input and the feature map. They used 6 anchors of sizes ranging from 4mm to 32mm, leading to a high computational cost. As the network only sees three consecutive slices at a time, screening a full CT scan is expensive and time-consuming.

Berens et al.[25] (team name ZNET) made a unique interpretation of the labels. They used the annotations to generate a mask and tackled the detection problem by segmenting the nodules. Moving from the generated mask to nodule candidates required some post-processing based on heuristics. This was the nodule detection part, and the neural network architecture used was a U-Net[26]. They performed the false positive reduction with a 2D ResNet[27] containing 18 convolutions.

The team PAtech[28] and Zhu et al., who introduced DeepLung[29], had the same approach. The detection network is based on the RPN from the Faster R-CNN framework and the feature extraction is made with a network inspired by the U-Net encoder-decoder[26]. Thus, they decoupled the field of view and the sampling ratio. Also, they both performed the selection of the lungs with hand-crafted rules based on Hounsfield units and geometrical considerations. For DeepLung, the false positive reduction was made with a dual path network[30] while, for the other solution, the same network architecture is used but trained with a different objective function.

Chapter 2 Methods

2.1 Clinical Dataset: Lung Nodule Analysis 2016 (LUNA16)

In 2011, Armato et al. published the LIDC-IDRI dataset (Lung Image Database Consortium and Image Database Resource Initiative). This publicly available dataset contains the annotations of four radiologists. The LUNA16 challenge[31] dataset is a subset of LIDC-IDRI but provides a single ground truth for each nodule. LUNA16 does not contain CT scans with a slice thickness greater than 3mm, CT scans with missing slices or CT scans with inconsistent spacing. Nodules smaller than 3mm or not marked by at least three radiologists are not kept. After these exclusions, the dataset is a collection of 888 CT scans. The localisation is provided for 1186 nodules spread among 601 CT scans. The challenge has two tracks: nodule detection and false positive reduction. This thesis is an attempt to solve the nodule detection problem. Figure 2.1 presents the distribution of nodules.

(a) Nodule diameter distribution (b) Spatial distribution, in a direct orthonormal system expressed in millimetres

Figure 2.1: Distribution of nodules in LUNA16

The organisers also provide lung segmentation masks. However, the masks were generated automatically by an algorithm[32]. Figure 2.2 illustrates one slice of a generated mask. As advised by the organisers, the output is not suitable for a segmentation study.

Figure 2.2: Lung mask. The white represents the trachea and main bronchus. The light grey and the dark grey are the inner volume of the left lung and right lung respectively.

2.2 Preparation of the data

CT scans come in various sizes and resolutions. The first step is to resample them to an isotropic resolution of 1mm between the centres of two consecutive voxels in the axial, coronal and sagittal directions. Trilinear interpolation is used to determine the resampled voxel values. Sub-patches of 128 × 128 × 128 voxels containing at least one nodule are then extracted. Patches are strictly included in the CT volume. These patches define the working samples.

Values are clipped between -1000 and 400 HU. Hounsfield units (HU) lower than -1000 do not have any semantic meaning and are used for padding. Figure 2.3 represents the HU distribution on the LUNA16 dataset. Values higher than 400 do not bring any information to the task; they represent bones or foreign bodies like pacemakers (see Table 1.1).

Data augmentation is used during training to reduce the risk of overfitting. A random crop of 96 × 96 × 96 is extracted from the bigger patches, all three axes are randomly flipped and 3D 90-degree rotations are randomly applied. Values are standardised so that, on average, samples have zero mean and unit variance.
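The clipping and augmentation steps can be sketched with NumPy as follows. This is only a minimal illustration under stated assumptions: the function names are hypothetical, a single 90-degree rotation in one randomly chosen plane stands in for the "3D 90-degree rotations", and the mean and standard deviation passed to `standardise` are placeholders for statistics that would presumably be computed over the training set.

```python
import numpy as np

HU_MIN, HU_MAX = -1000.0, 400.0


def clip_hu(volume):
    """Clip intensities to the [-1000, 400] HU window."""
    return np.clip(volume, HU_MIN, HU_MAX)


def augment(patch, crop=96, rng=None):
    """Random crop, random flips and a random 90-degree rotation of a 3D patch."""
    rng = np.random.default_rng() if rng is None else rng
    # Random crop strictly inside the patch.
    z, y, x = (int(rng.integers(0, dim - crop + 1)) for dim in patch.shape)
    out = patch[z:z + crop, y:y + crop, x:x + crop]
    # Flip each axis with probability 0.5.
    for axis in range(3):
        if rng.random() < 0.5:
            out = np.flip(out, axis=axis)
    # Rotate by a multiple of 90 degrees in one randomly chosen plane (assumption).
    planes = [(0, 1), (0, 2), (1, 2)]
    plane = planes[int(rng.integers(0, 3))]
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=plane)
    return np.ascontiguousarray(out)


def standardise(volume, mean, std):
    """Shift and scale intensities with precomputed statistics."""
    return (volume - mean) / std


# Synthetic stand-in for a 128 x 128 x 128 working sample.
patch = clip_hu(np.random.default_rng(0).uniform(-2000, 3000, size=(128, 128, 128)))
sample = augment(patch, rng=np.random.default_rng(1))
sample = standardise(sample, mean=patch.mean(), std=patch.std())
print(sample.shape)
```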

Figure 2.3: HU values distribution on LUNA16

2.3 Pipeline

The nodule detection is based on the RPN of the Faster R-CNN[17]. As the problem is binary (is there a nodule or not?), the RPN is similar to single-stage networks, like SSD[20] or YOLO[12]. The network uses 3D convolutions and requires a volume as input. The first layers are used for the extraction of a high-dimensional, low-resolution feature map. The number of anchors is set to three per spatial position of the output feature map. Due to the generally spherical shape of nodules, anchors have the same ratio and only come in different scales.

Unlike 2D detection on natural images (e.g. photos), where the objects are roughly centred and use most of the space, nodules are small and sparse. The average diameter is 7.3mm (see Figure 2.1a), which makes the feature 4 million times smaller than the input volume on average. Thus the anchor diameters and the spatial compression of the network need to be set carefully. All anchors are applied in the input space at the centre of each voxel cluster contributing to one element of the feature map. It is important to understand that anchors are defined in the input space but are evenly positioned based on the shape of the feature map. Figure 2.5 describes the impact of both the scale ratio and the selection of anchors on the amount of information captured. Networks with the smallest ratio are the most interesting even though they are more computationally heavy. For an input of size $W \times H \times D$, $K$ anchors and an isotropic network with a scale ratio of $S$, the final feature map of the network has $5 \times K \times W \times H \times D / S^3$ elements. The factor 5 corresponds to the 3D position (i.e. x, y and z), the diameter, and the confidence score. As described below, in our case we have 3 anchors ($K = 3$), a scale ratio of 4 ($S = 4$) and an input of 96 × 96 × 96 ($W = H = D = 96$).
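As a quick check of the output size formula, the sketch below evaluates it for the training configuration given in the text. The function name `rpn_output_size` is hypothetical.

```python
def rpn_output_size(w, h, d, n_anchors, scale_ratio):
    """Number of output values: 5 per anchor per feature-map position."""
    positions = (w // scale_ratio) * (h // scale_ratio) * (d // scale_ratio)
    return 5 * n_anchors * positions


# Training configuration from the text: K = 3, S = 4, W = H = D = 96.
print(rpn_output_size(96, 96, 96, n_anchors=3, scale_ratio=4))  # 207360
```

With a scale ratio of 4, the 96 × 96 × 96 input maps to 24 × 24 × 24 positions, each carrying 3 anchors with 5 values.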

2.4 Network Architecture

The feature extraction is inspired by ResNet[27], with 2D convolutions replaced by 3D convolutions. A first convolution with kernels of size 7 × 7 × 7 and a stride of 2 in each direction is applied. As we try to keep a low ratio between the input and the output feature map, only 3 ResNet blocks made of 2 residual connections are used and only one of them has a feature stride greater than 1. Two convolutions with kernels of size 1 × 1 × 1 are then applied for the classification and the regression of the bounding box. Figure 2.4 presents the architecture of the CNN used for this experiment.

The input is set to 96 × 96 × 96 due to memory limitations during training. This architecture was designed to obtain a spatial scale factor of 4 between the input and the output space. Figure 2.5 depicts the need for a small scale factor between the input space and the output space in order to capture as much information as possible.

Figure 2.4: Convolutional Neural Network. Tensors are described with their shape (K, D × H × W), where K is the number of channels, D the depth, H the height and W the width. These values correspond to a sample at training time. $C_x^s$ defines a 3D convolution with a kernel of size x and a stride of s, followed by a batch normalisation and a rectified linear unit (ReLU) activation. $R^s$ defines a ResNet[27] block with 2 residual connections using convolutions with a kernel size of 3. For each block, the stride of the first convolution and the first residual connection is set to s.

2.5 Training

The multitask loss function for the anchor i is defined as:

$$\mathrm{Loss}_i = L_{cls}(p_i, p_i^*) + 2\,p_i^* \times L_{reg}(t_i, t_i^*) \qquad (2.1)$$

where $*$ denotes the ground truth, $p$ is the probability of being a positive anchor and $t$ is the representation of the bounding box defined as a relative offset vector:

$$t = \left( \frac{x - x_a}{d_a},\ \frac{y - y_a}{d_a},\ \frac{z - z_a}{d_a},\ \log\frac{d}{d_a} \right) \qquad (2.2)$$

where $(x, y, z, d)$ represents the position and the diameter, and $(x_a, y_a, z_a, d_a)$ represents the anchor position and scale. In Equation 2.1, the factor two comes from the fact that the regression is only performed for positive anchors. The binary cross-entropy is used to compute the classification loss $L_{cls}$. The regression loss $L_{reg}$ is the smooth-L1[16].
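The offset encoding of Equation 2.2 and the per-anchor loss of Equation 2.1 can be sketched in plain Python. All function names are hypothetical, the smooth-L1 follows the usual Fast R-CNN definition summed over the four offsets, and the clamping constant `eps` is an assumption added for numerical safety.

```python
import math


def encode(box, anchor):
    """Equation 2.2: offsets of a box (x, y, z, d) relative to an anchor."""
    x, y, z, d = box
    xa, ya, za, da = anchor
    return ((x - xa) / da, (y - ya) / da, (z - za) / da, math.log(d / da))


def decode(t, anchor):
    """Inverse of encode: recover (x, y, z, d) from offsets and an anchor."""
    tx, ty, tz, td = t
    xa, ya, za, da = anchor
    return (xa + tx * da, ya + ty * da, za + tz * da, da * math.exp(td))


def smooth_l1(pred, target):
    """Smooth-L1 summed over coordinates: quadratic near zero, linear beyond 1."""
    total = 0.0
    for p, q in zip(pred, target):
        diff = abs(p - q)
        total += 0.5 * diff ** 2 if diff < 1.0 else diff - 0.5
    return total


def binary_cross_entropy(p, label, eps=1e-7):
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def anchor_loss(p, p_star, t, t_star):
    """Equation 2.1: the regression term only counts for positive anchors."""
    return binary_cross_entropy(p, p_star) + 2 * p_star * smooth_l1(t, t_star)


anchor = (10.0, 10.0, 10.0, 5.0)
nodule = (12.0, 10.0, 10.0, 10.0)
t_star = encode(nodule, anchor)
print(t_star)
print(decode(t_star, anchor))
print(anchor_loss(0.5, 1, (0.0, 0.0, 0.0, 0.0), t_star))
```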

An anchor is considered positive if it has the highest intersection over union (IoU) with any ground truth bounding box or if its IoU is higher than 0.5. The IoU between 2 elements is defined as the volume of overlap divided by the volume of union. The motivation is that the closest anchor, the one requiring the minimum transformation, should be the anchor capturing the information of the nodule. Anchors with an IoU lower than 0.02 are considered negative. In any other case, anchors are not considered relevant and their contributions are not taken into account in the final loss. The learning should not take ambiguous cases into account, to prevent the system from being confused. Figure 2.5 validates the need for two rules to consider an anchor as positive. The rule of being the best match is mainly used by small anchors. The other rule is mainly used by larger anchors.
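The labelling rules can be sketched as follows. One assumption is made loudly here: anchors and nodules are both treated as axis-aligned cubes given as (x, y, z, side), because the exact geometry used for the IoU is not stated in this section. The function names are hypothetical.

```python
def cube_iou(a, b):
    """IoU of two axis-aligned cubes given as (x, y, z, side). Cubes are an assumption."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i] - a[3] / 2, b[i] - b[3] / 2)
        hi = min(a[i] + a[3] / 2, b[i] + b[3] / 2)
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    return inter / (a[3] ** 3 + b[3] ** 3 - inter)


def label_anchors(anchors, nodules, pos_iou=0.5, neg_iou=0.02):
    """Return one label per anchor: 1 (positive), 0 (negative), -1 (ignored)."""
    ious = [[cube_iou(a, n) for n in nodules] for a in anchors]
    labels = []
    for row in ious:
        best = max(row, default=0.0)
        if best > pos_iou:
            labels.append(1)
        elif best < neg_iou:
            labels.append(0)
        else:
            labels.append(-1)  # ambiguous: left out of the loss
    # Best-match rule: the anchor with the highest IoU for each nodule is positive.
    if anchors:
        for j in range(len(nodules)):
            best_i = max(range(len(anchors)), key=lambda i: ious[i][j])
            if ious[best_i][j] > 0.0:
                labels[best_i] = 1
    return labels


anchors = [(10, 10, 10, 5), (10, 10, 10, 10), (50, 50, 50, 10), (13, 10, 10, 10)]
print(label_anchors(anchors, nodules=[(10, 10, 10, 6)]))  # [1, -1, 0, -1]
```

In this example the first anchor passes the 0.5 threshold, the far-away anchor is negative, and the two partially overlapping anchors are ignored.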

Hard negative mining is applied to improve generalisation. The final loss is a weighted sum of the anchor losses (Equation 2.1). For a positive anchor, the weight is set to 1; for a negative anchor, the weight is the predicted probability of being classified as a nodule. Thus, hard examples contribute more during learning. To deal with the high imbalance between classes, weights are normalised for each class.
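A minimal sketch of this weighting is given below. It assumes that "normalised for each class" means the weights of the positive anchors and the weights of the negative anchors are each rescaled to sum to one; the function name and the label convention (1 positive, 0 negative, -1 ignored) are hypothetical.

```python
def weighted_loss(losses, probs, labels):
    """Weighted sum of per-anchor losses with per-class weight normalisation.

    losses: per-anchor loss values (Equation 2.1).
    probs: predicted probability of being a nodule, per anchor.
    labels: 1 for positive, 0 for negative, -1 for ignored anchors.
    """
    positives = [(1.0, l) for l, y in zip(losses, labels) if y == 1]
    negatives = [(p, l) for l, p, y in zip(losses, probs, labels) if y == 0]
    total = 0.0
    for group in (positives, negatives):
        norm = sum(w for w, _ in group)
        if norm > 0:
            # Assumption: weights are rescaled to sum to one within each class.
            total += sum(w * l for w, l in group) / norm
    return total


losses = [1.0, 2.0, 4.0, 9.0]
probs = [0.9, 0.1, 0.3, 0.5]
labels = [1, 0, 0, -1]
print(weighted_loss(losses, probs, labels))
```

The negative anchor predicted with probability 0.3 weighs three times as much as the one predicted with 0.1, and the ignored anchor does not contribute at all.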

RMSprop is the optimiser used for training, with α = 0.99, ε = 1e-08 and a momentum of 0.9. The initial learning rate is set to 0.001. A learning rate decay strategy is used: the learning rate is divided by 2 every 30 epochs. The system is trained for 150 epochs and the best model is the network with the lowest loss on the validation set. The batch size is 32. For computations, the graphics processing units (GPU) used were Nvidia GeForce GTX 1080 Ti.
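The step decay described above amounts to the following rule, shown here as a framework-independent sketch with a hypothetical function name. A framework scheduler such as PyTorch's `StepLR(step_size=30, gamma=0.5)` would express the same rule; which framework was used is not stated here.

```python
def learning_rate(epoch, initial=0.001, factor=0.5, step=30):
    """Learning rate at a given epoch: halved every `step` epochs."""
    return initial * factor ** (epoch // step)


for epoch in (0, 29, 30, 60, 149):
    print(epoch, learning_rate(epoch))
```

Over the 150 training epochs the rate is halved four times, ending at one sixteenth of its initial value.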

2.6 Evaluation

As recommended by the LUNA16 organisers, who provide the folds, a 10-fold cross-validation is used. One fold is used for testing, two for validation and seven for training. At test time, the full CT scans are sent as the input (as opposed to 96 × 96 × 96 crops during training).
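One possible way to derive the three subsets from a test fold index is sketched below. The text does not say which two folds serve for validation, so taking the two folds that follow the test fold (cyclically) is purely an assumption, as is the function name.

```python
def split_folds(test_fold, n_folds=10, n_val=2):
    """Split fold indices into test, validation and training sets."""
    # Assumption: the validation folds are the ones following the test fold.
    val = [(test_fold + k) % n_folds for k in range(1, n_val + 1)]
    train = [f for f in range(n_folds) if f != test_fold and f not in val]
    return {"test": [test_fold], "val": val, "train": train}


print(split_folds(0))
print(split_folds(9))
```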

Non-maxima suppression (NMS) is applied as a post-processing filtering operation. The neighbourhood is defined using the IoU and the value to optimise is the probability of being a nodule. If candidate bounding boxes intersect with the bounding box with the highest confidence score, and the IoU between them is greater than a threshold, these region proposals are discarded. Due to the absence of obfuscation, a low threshold is used (pt = 0.1).
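A greedy version of this filtering step might look like the sketch below. It assumes cubical proposals given as a centre and a side length, which is a simplification, and the names `cube_iou` and `non_maxima_suppression` are hypothetical.

```python
def cube_iou(a, b):
    """IoU of two axis-aligned cubes given as (cz, cy, cx, side)."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis] - a[3] / 2, b[axis] - b[3] / 2)
        hi = min(a[axis] + a[3] / 2, b[axis] + b[3] / 2)
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    return inter / (a[3] ** 3 + b[3] ** 3 - inter)


def non_maxima_suppression(candidates, iou_thr=0.1):
    """candidates: list of (score, box). Keeps the highest-scoring box of
    each neighbourhood and discards boxes overlapping it by more than iou_thr."""
    kept = []
    for score, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        if all(cube_iou(box, kept_box) <= iou_thr for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

With the threshold of 0.1 used here, even a modest overlap with a more confident proposal is enough to discard a candidate.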

If a bounding box proposal is located within a certain distance of the centre of a nodule defined by the ground truth, it is considered a positive match. This distance is set to the radius of the nodule defined by the ground truth. Other candidates are considered false positives. Note that only the central location of the region proposals is evaluated with this metric; the size of the bounding box is not taken into account. The LUNA16 challenge uses the free-response receiver operating characteristic (FROC) analysis, which can be seen as an adaptation of the receiver operating characteristic (ROC). A point of the FROC curve is generated by computing the average number of false positive (FP) region proposals per scan and the sensitivity at a given score threshold. The sensitivity, also called true positive rate (TPR), is defined as:

TPR = TP/P = TP/(TP + FN)

where TP is the number of true positives, P the number of positives and FN the number of false negatives.

The final metric is the average of the sensitivity at 0.125, 0.25, 0.5, 1, 2, 4 and 8 FPs per scan. The purpose is to take into account the scarcity of nodules and therefore to expect only a few candidates. Under these settings, the worst model gets a score of 0 and the best model a score of 1.
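The hit criterion and the final score can be sketched as follows. This is a simplified illustration and not the official LUNA16 evaluation script: it takes, for each FP rate, the best sensitivity reachable without exceeding that rate, with no interpolation between curve points. The names `is_hit` and `froc_score` are hypothetical.

```python
import math

FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def is_hit(candidate_centre, nodule_centre, nodule_diameter):
    """A proposal matches a nodule if its centre lies within the nodule radius."""
    return math.dist(candidate_centre, nodule_centre) <= nodule_diameter / 2


def froc_score(candidates, n_nodules, n_scans, fp_rates=FP_RATES):
    """candidates: list of (score, nodule_id), with nodule_id None for a
    false positive. Returns the average sensitivity over fp_rates."""
    found, fps = set(), 0
    curve = [(0.0, 0.0)]  # (FPs per scan, sensitivity)
    for _, nodule_id in sorted(candidates, key=lambda c: c[0], reverse=True):
        if nodule_id is None:
            fps += 1
        else:
            found.add(nodule_id)  # TPR = TP / (TP + FN)
        curve.append((fps / n_scans, len(found) / n_nodules))
    sensitivities = [
        max(sens for fp, sens in curve if fp <= rate) for rate in fp_rates
    ]
    return sum(sensitivities) / len(sensitivities)
```

A model that finds every nodule before its first false positive scores 1, and a model that finds none scores 0, matching the bounds stated above.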


(a) IoU mean 0.40318, median 0.38023 (b) IoU mean 0.36657, median 0.36165

(c) IoU mean 0.23376, median 0.19764 (d) IoU mean 0.232133, median 0.193360

(e) IoU mean 0.098513, median 0.053458 (f) IoU mean 0.098380, median 0.053943

Figure 2.5: Distribution of the maximum intersection over union for each nodule (IoU with the best anchor). Nodules are described with their diameter. The scale ratio is 4 for the first row (2.5a and 2.5b), 8 for the second row (2.5c and 2.5d), and 16 for the last row (2.5e and 2.5f). On the left side, anchors have a diameter of 5, 10 and 20 mm; on the right the diameters are 8, 16 and 32 mm.

Chapter 3 Results

3.1 Anchors Selection

As described in Section 2.3, anchors are predefined bounding boxes. Therefore, the greater the number of anchors, the better the characteristics of the nodules are captured. Nonetheless, having many anchors is computationally costly. In our experiments, having more than 3 anchors did not provide a significant improvement in the results. Consequently, the number of anchors was set to 3.

It is important that the anchors are evenly distributed to match the size distribution of the nodules in the data. A k-means algorithm using Euclidean distance with three clusters returns centroids at 5.7, 9.9 and 18.7 mm. These values were rounded to 5, 10 and 20 mm. Figure 2.1 represents the distribution of nodules.
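The clustering step can be illustrated with a small one-dimensional k-means. This is a sketch under assumptions: the quantile-based initialisation is chosen here only to make the example deterministic, the function name `kmeans_1d` is hypothetical, and the diameters used in any call are placeholders, not the LUNA16 annotations.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Plain k-means on scalar values (Euclidean distance is |a - b| in 1D)."""
    values = sorted(values)
    # Deterministic initialisation at evenly spaced quantiles (an assumption).
    centroids = [values[(2 * i + 1) * len(values) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids
```

Applied to the nodule diameters of the dataset, the returned centroids would then be rounded to convenient anchor sizes, as done above.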

Figure 2.5 presents the amount of information captured depending on the distribution of anchors for several models with different scale ratios. The captured information is measured with the intersection over union; the higher, the better. As expected, networks with a low scale ratio can more easily model the distribution of the data and minimise the shifts between the anchors and the bounding boxes. An even distribution of anchors over the data distribution also helps to minimise this shift. Our model corresponds to the setting of Figure 2.5a.

3.2 Evaluation

We designed a single-stage deep learning architecture based on convolutions and residual connections to tackle the pulmonary nodule detection task. All hyper-parameters were set carefully. The most important ones were the number of anchors, the sizes of the anchors and the compression rate of the network. The model was trained on small patches but can run on full CT scans. The preparation of the data consisted of resampling the CT scans to a resolution of 1 mm × 1 mm × 1 mm and normalising the values between -1 and 1.
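The two preparation steps can be sketched as below. The intensity window used for the normalisation is not stated in this section, so the bounds `hu_min` and `hu_max` are placeholders chosen for illustration only; the function names are hypothetical as well. The resampling itself (interpolation) is not shown, only the resulting grid size.

```python
def normalise_hu(value, hu_min=-1000.0, hu_max=400.0):
    """Clip an intensity to [hu_min, hu_max] (assumed window) and map it to [-1, 1]."""
    clipped = min(max(value, hu_min), hu_max)
    return 2.0 * (clipped - hu_min) / (hu_max - hu_min) - 1.0


def resampled_shape(shape, spacing, target=(1.0, 1.0, 1.0)):
    """Grid size after resampling a scan from `spacing` (mm) to `target` (mm)."""
    return tuple(round(n * s / t) for n, s, t in zip(shape, spacing, target))
```

For example, a scan of 120 slices with 2.5 mm slice spacing becomes 300 slices once resampled to 1 mm.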


Figure 3.1: FROC curve. The continuous line is generated from the submissions over the 10 folds. The dotted lines represent the 95% confidence interval using bootstrapping with 1000 re-samplings with replacement.

The results of the proposed CAD system are reported in Figure 3.1. To get better insight into the overall performance, bootstrapping was used (random resampling with replacement). We achieved a final score of 0.826. At test time, processing an entire CT scan takes three seconds on average on an Nvidia GeForce GTX 1080 Ti.
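A percentile bootstrap of the kind mentioned above could be sketched as follows. It assumes that the resampling unit is the scan and that the interval is read from the empirical percentiles of the resampled statistic; the function name `bootstrap_ci` and its parameters are hypothetical.

```python
import random


def bootstrap_ci(per_scan_values, statistic, n_resamples=1000, level=0.95, seed=0):
    """Percentile confidence interval of `statistic` under resampling with replacement."""
    rng = random.Random(seed)
    n = len(per_scan_values)
    stats = sorted(
        statistic([per_scan_values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int((1 - level) / 2 * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 + level) / 2 * n_resamples))]
    return lower, upper
```

For a FROC curve, `statistic` would compute the sensitivity at a given number of FPs per scan from the resampled set of scans, once per operating point.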

Appendix A gives a visual interpretation of the quality of the detection. As expected, the detection worked well across different sizes, intensities and shapes. This is presented in Figure A.1. Failure modes were categorized and are presented in Figure A.2 for false negatives and Figure A.3 for false positives.

Most false negatives were related to nodules with low intensity or soft boundaries. Sometimes the localization was the failure point. In this case, the post-processing was often to blame; keeping only the bounding box with the highest confidence score is sometimes not the best strategy. The main source of false positives was vessels. Other failure cases came from potential regions of interest that were not included in the annotations due to the inclusion criteria. This includes nodules smaller than 3 mm or regions annotated by too few radiologists.


3.3 Localization

Figure 3.2 reports the proportion of bounding box proposals lying outside of the lungs. This proportion was computed for several classification confidence score thresholds. A nodule was considered as part of the inner volume of the lungs if a given number of pixels contained by its bounding box overlapped with the lung masks provided by the LUNA16 challenge organisers. Otherwise, the nodule belonged to the outer space.

This experiment did not aim at assessing the correctness of the regression of the bounding box position. Also, we wanted to make sure that small nodules on the edges are considered as part of the inside of the lungs. Therefore, a nodule was considered as part of the lung volume if a single pixel of the bounding box overlapped with the lung mask.
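The single-pixel overlap rule can be sketched as follows. The box and mask representations (half-open index bounds and nested lists indexed as `[z][y][x]`) are assumptions made to keep the example self-contained, and the function names are hypothetical.

```python
def inside_lungs(box, lung_mask):
    """box: (z0, y0, x0, z1, y1, x1) with half-open bounds in voxel indices.
    Returns True as soon as one element of the box overlaps the lung mask."""
    z0, y0, x0, z1, y1, x1 = box
    depth, height, width = len(lung_mask), len(lung_mask[0]), len(lung_mask[0][0])
    for z in range(max(z0, 0), min(z1, depth)):
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                if lung_mask[z][y][x]:
                    return True
    return False


def proportion_outside(boxes, lung_mask):
    """Share of proposals that do not touch the lung mask at all."""
    if not boxes:
        return 0.0
    outside = sum(1 for box in boxes if not inside_lungs(box, lung_mask))
    return outside / len(boxes)
```

Computing `proportion_outside` on the proposals kept at each confidence threshold yields one point of the curve in Figure 3.2.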

For any threshold, the proposals located outside of the lungs represent less than 4% of the bounding boxes generated by the model.

Figure 3.2: Proportion of bounding box proposals outside of the lungs over different classification confidence score thresholds.

3.4 Comparison

By mid-February 2018, 29 submissions had been made to the LUNA16 challenge. The median score was 0.845, the average was 0.82, and the standard deviation was 0.09. Unfortunately, only a few submissions contained a proper description. Our final result can be considered as average.


Figure 3.3 compares the performance of the models with their number of parameters. Interestingly, the final result appeared to be uncorrelated with the number of trainable parameters. Due to their two-stage process (nodule detection and false positive reduction), most of the models have a large number of parameters compared to the model built for this project.

Unfortunately, a lot of submissions to the LUNA16 challenge do not have a complete report. Either information is withheld to protect intellectual property, or the process is not fully described, making it impossible to reproduce. This makes a more detailed comparison difficult to perform.

Figure 3.3: LUNA16 submission comparison. The top plot presents the evolution of the sensitivity for several submissions. The bottom one shows the final score against the number of trainable weights for the same models.

Chapter 4 Discussion

4.1 Main findings

Several machine learning approaches could have been used for this problem. A deep CNN was chosen based on the outstanding results reported in visual tasks and, specifically, obtained in the LUNA16 challenge[33][28][29]. However, for pulmonary nodule detection, CNNs have been used only as a component in a larger information pipeline.

This degree project describes the pulmonary nodule detection task and proposes a single stage model using deep convolutional neural networks. As a result, our pipeline is simple and similar to the ones used for object detection on natural images.

We identified two major elements which significantly influenced our experiments:

• The sizes of anchors and their quantity. The anchors need to represent the data and follow the same distribution in order to maximise the information captured. In our case, cubical anchors were a good fit due to the spherical shape of the nodules.

• The subsampling ratio between the input and the last feature map of the CNN. Small ratios allow the CNN to deal with small features.

Some detection algorithms on natural images are criticised for their performance on small objects. This is, for example, the case for YOLOv2[12]. We have shown that CNNs can learn to identify small and sparse features. These capabilities are related to the distribution of anchors, but also to the criteria by which anchors are considered positive or negative. Under the YOLOv2 framework, only the anchor at the centre of the ground truth bounding box is considered as positive. In our case, we combine two rules and, as a result, we might have several positive anchors representing the same element.

In addition, the rarity of nodules is tackled with a training setup based on hard example mining and a suitable weighting strategy.


Figure 3.2 shows that the model found some nodules outside of the lungs. However, the proportion of bounding box proposals outside of the lungs is small. Removing them by using segmentation would improve results. However, we believe that this proportion is small enough to question the use of the segmentation. In addition, clinicians can easily spot these mistakes.

4.2 Impacts

Nowadays, image processing is a key component of the decisions taken by clinicians. CADs represent the next step by providing insights and key elements to clinicians. Our detection system, combined with other models, could deliver an expert diagnosis on lung cancer, making health care more efficient and cost-effective. In order to do so, the detection model will have to work with, for example, a nodule classifier (is it a malignant nodule? what is the stage of development?), a nodule segmentation tool (what is the radius and the volume?) but also with patient information such as sex, age and family history.

For our model, an interesting operating point would be a classification confidence score of 0.9, leading to an average of 7 false positives per scan and a sensitivity of 93.6%. Also, some false positives are really easy to spot even for non-trained eyes. Indeed, some bounding boxes represent vessels or are outside of the lungs.

Our observation is that CNN architectures developed for object detection on natural images can easily be tuned and applied to different types of images. A single stage model based on a CNN leads to an end-to-end training. This end-to-end paradigm is an interesting approach as it provides a robust and implicit solution for complex problems.

4.3 Limitations

The confidence scores associated with generated bounding boxes do not have any semantic or physiological meaning. This raises two issues. The first is finding a suitable operating point depending on the application. The second is the potential impact this information could have on the clinician. Will two bounding boxes with two different confidence scores receive the same consideration? More work has to be done by the scientific community on the interpretation of the decisions made by CNNs.

Even though one can develop an impressive model and achieve a high score, this work on its own has a small impact. As in any medical project, the research is often a small part of the project. Indeed, in order to have a meaningful impact on society, the model has to be evaluated in clinical trials and obtain approval from health authorities around the world. Finally, clinicians have to be trained to use the tools.

Since we compared our model to the mean performance of only a few other submissions, we cannot support the comparison with statistically convincing evidence (null hypothesis testing).

Another limitation of this project is related to the size of the dataset. LUNA16 contains 888 scans which cannot represent the diversity of morphologies, devices and protocols present around the world. More data will have to be curated and annotated in order to build a more robust CAD. Optellum Ltd is currently working on this point.

A more technical limitation is related to the use of CNNs. In general, CNNs are trained with supervised learning, which means all the data have to be annotated, often manually and in large quantities. Also, the model performs a lot of operations computed on large GPUs, which are expensive and not eco-friendly because they consume a lot of electricity. It might not be possible to deploy the solution on computers like the ones used in hospitals due to the absence of GPUs. The company is now looking into requirements for a cloud-based solution.

4.4 Ethic and Sustainability

From an ethical point of view, on the one hand, one can be worried about the data which will be gathered and stored by Optellum Ltd. Indeed, such a system will store medical records, which are highly personal, and all technology devices are prone to security issues. On the other hand, such a CAD gives an expert-level decision to anyone at almost no cost, which promotes gender, ethnicity and financial equality. One can imagine that a CAD will replace radiologists; on the contrary, such a tool will boost their productivity by taking obvious decisions and providing useful insights on tricky cases.

As described before, CNN architectures require components using a lot of energy compared to other hardware components. Consequently, the impact on the environment depends on how and where the electricity is generated. In general, CADs do not need to be real time. As a result, old GPUs can be used and their life-cycle can be extended. Overall, this project has the same impact as any project relying on GPUs. From an ethical point of view, the project is promising, but no doubt more questions will arise as the system becomes more widely used.

Chapter 5 Conclusion

In this study, a novel pulmonary nodule detection CAD system has been developed which uses deep convolutional neural networks. The detection is inspired by the region proposal network of the Faster R-CNN framework. The evaluation is made using the free-response receiver operating characteristic (FROC). The proposed architecture yields a score of 0.826, which corresponds to an average result. The final network relies on fewer convolutions and fewer trainable parameters compared to previous submissions[33][28][29].

This thesis questioned the necessity of segmentation and false positive reduction as elements of the information pipeline. Our result, although not competing with the current state of the art which used multi-stage processing, shows that a well-designed single-stage model based on deep learning can achieve interesting results. This is possible due to a low number of region proposals and a network mainly focusing on features inside the lung. The final model is simple and closer to the pipeline used for object detection on natural images. The results are encouraging and this work could lead to many other experiments.

5.1 Future work

From this promising work, a lot of new experiments can be derived. We believe that the architecture of the CNN has a minor impact on the overall performance as long as the subsampling ratio is kept low. However, this has to be tested. New architectures will probably present trade-offs between speed and accuracy, as is seen for object detection in natural images[34].

Single-shot detectors have been shown to be faster and simpler, but have lower accuracy than two-stage detectors because of the extreme class imbalance between background and object for the final loss[34]. Focal loss[35] is a modified version of the cross entropy loss used as the classification loss for single-shot detectors. It weighs down the loss assigned to well-classified examples and therefore reduces the contribution of easy backgrounds. As our binary RPN could be seen as a single-shot detector, an interesting experiment could be to replace the hard negative mining with the focal loss. It could also be interesting to add the second part of the Faster R-CNN on top of the RPN and to evaluate the performance gained.
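For reference, the binary focal loss mentioned above can be sketched per anchor as follows. The defaults `gamma=2.0` and `alpha=0.25` follow the values commonly reported for focal loss; they were not tuned or tested in this project, and the function name `focal_loss` is hypothetical.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one anchor.
    p: predicted nodule probability; y: 1 for nodule, 0 for background."""
    p = min(max(p, eps), 1 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor disappears and the expression reduces to a class-weighted cross entropy, which makes the relation to the current loss easy to check.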

Collecting CT scans of organs surrounding the lungs could also improve performance. These samples, if properly integrated during the training, could help reduce the number of false positive detections outside of the lungs.

Another interesting direction would be to learn the non-maxima suppression with supervised end-to-end learning, as described by Hosang et al.[36]. Further work should explore more complex data augmentation. We believe the pyramidal feature hierarchy would be quite appropriate[37]. Under this framework, predictions are made on several intermediate feature maps and not only on the last one.

Another set of experiments concerns the preparation of the data. We did not test any other resampling resolution, nor anisotropic ones. Also, to improve their productivity, some radiologists use maximum intensity projection over several contiguous slices. This results in vessels displayed as lines and nodules displayed as circles. However, this technique tends to drop small nodules on the edges. Inspired by this use case, it could be interesting to try different projections (minimum, average, maximum, etc.) over different thicknesses (3 mm, 5 mm, 10 mm, 20 mm, etc.). This could lead to faster detection as the deep neural network will have fewer operations to compute due to a smaller input.
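The proposed projections can be sketched as below. This was not implemented in the project; the sketch assumes non-overlapping slabs along the axial direction and a volume stored as nested lists indexed as `[z][y][x]`, and the names `slab_projection` and `mean` are hypothetical. At a 1 mm isotropic resolution, the thickness in slices equals the thickness in mm.

```python
def mean(values):
    """Average of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def slab_projection(volume, thickness, reduce=max):
    """Project consecutive groups of `thickness` slices into single slices.
    `reduce` selects the projection: max, min or mean."""
    slabs = []
    for start in range(0, len(volume), thickness):
        slab = volume[start:start + thickness]
        height, width = len(slab[0]), len(slab[0][0])
        slabs.append([
            [reduce(s[y][x] for s in slab) for x in range(width)]
            for y in range(height)
        ])
    return slabs
```

A 5 mm maximum intensity projection of a resampled scan would then be `slab_projection(scan, 5, max)`, which divides the number of slices seen by the network by five.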

Multi-task training has been shown to improve performance on each sub-task. This has been demonstrated by Sermanet et al. when working on OverFeat[7] and more recently by He et al. with Mask R-CNN[38]. The latter uses a single network for object detection, instance segmentation and key point identification. In our case, combining nodule detection with lung segmentation or lobe segmentation could be interesting.

Bibliography

[1] J. Ferlay, I. Soerjomataram, R. Dikshit, S. Eser, C. Mathers, M. Rebelo, D. M. Parkin, D. Forman, and F. Bray, “Cancer incidence and mortality worldwide: sources, methods and major patterns in globocan 2012,” International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.

[2] H. MacMahon, J. H. Austin, G. Gamsu, C. J. Herold, J. R. Jett, D. P. Naidich, E. F. Patz Jr, and S. J. Swensen, “Guidelines for management of small pulmonary nodules detected on ct scans: a statement from the fleischner society,” Radiological Society of North America, vol. 237, no. 2, pp. 395–400, 2005.

[3] J. Abraham, “Reduced lung cancer mortality with low-dose computed tomographic screening,” Community Oncology, vol. 8, no. 10, pp. 441–442, 2011.

[4] A. A. van der Heijden, M. D. Abramoff, F. Verbraak, M. V. Hecke, A. Liem, and G. Nijpels, “Validation of automated screening for referable diabetic retinopathy with the idx-dr device in the hoorn diabetes care system,” Acta ophthalmologica, vol. 96, no. 1, pp. 63–68, 2018.

[5] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017.

[6] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning.” in Association for the Advancement of Artificial Intelligence (AAAI), 2017, pp. 4278–4284.

[7] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “Overfeat: Integrated recognition, localization and detection using convolutional networks,” in International Conference on Learning Representations (ICLR), 2014.


[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[9] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.

[10] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.

[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587.

[12] J. Redmon and A. Farhadi, “Yolo9000: Better, faster, stronger,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525.

[13] V. Iglovikov, A. Rakhlin, A. Kalinin, and A. Shvets, “Pediatric bone age assessment using deep convolutional neural networks,” arXiv preprint arXiv:1712.05053, 2017.

[14] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya et al., “Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning,” arXiv preprint arXiv:1711.05225, 2017.

[15] R. Heelan, B. Flehinger, M. Melamed, M. Zaman, W. Perchick, J. Caravelli, and N. Martini, “Non-small-cell lung cancer: results of the new york screening program.” Radiological Society of North America, vol. 151, no. 2, pp. 289–293, 1984.

[16] R. Girshick, “Fast r-cnn,” in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.

[17] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.


[18] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, 2010.

[19] J. Dai, Y. Li, K. He, and J. Sun, “R-fcn: Object detection via region-based fully convolutional networks,” in Advances in Neural Information Processing Systems (NIPS), 2016, pp. 379–387.

[20] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European Conference on Computer Vision (ECCV), 2016, pp. 21–37.

[21] A. Neubeck and L. Van Gool, “Efficient non-maximum suppression,” in International Conference Pattern Recognition (ICPR), vol. 3, 2006, pp. 850–855.

[22] J. Ding, A. Li, Z. Hu, and L. Wang, “Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks,” arXiv preprint arXiv:1706.04303, 2017.

[23] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015.

[24] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in IEEE Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.

[25] B. Moira, v. d. G. Robbert, d. K. Michael, M. Jeroen, and Z. Guido, “Dual path networks,” in Advances in Neural Information Processing Systems, 2016, pp. 4470–4478.

[26] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds., Cham, 2015, pp. 234–241.

[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.

[28] “3dcnn for lung nodule detection and false positive reduction,” 2 January 2018.


[29] W. Zhu, C. Liu, W. Fan, and X. Xie, “Deeplung: 3d deep convolutional nets for automated pulmonary nodule detection and classification,” arXiv preprint arXiv:1709.05538, 2017.

[30] Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, and J. Feng, “Dual path networks,” in Advances in Neural Information Processing Systems (NIPS), 2017, pp. 4470–4478.

[31] A. A. A. Setio, A. Traverso, T. De Bel, M. S. Berens, C. van den Bogaard, P. Cerello, H. Chen, Q. Dou, M. E. Fantacci, B. Geurts et al., “Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge,” Medical Image Analysis, vol. 42, pp. 1–13, 2017.

[32] E. M. van Rikxoort, B. de Hoop, M. A. Viergever, M. Prokop, and B. van Ginneken, “Automatic lung segmentation from thoracic computed tomography scans using a hybrid approach with error detection,” Medical physics, vol. 36, no. 7, pp. 2934–2947, 2009.

[33] J. Ding, A. Li, Z. Hu, and L. Wang, “Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks,” 2017.

[34] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama et al., “Speed/accuracy trade-offs for modern convolutional object detectors,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[35] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” CoRR, vol. abs/1708.02002, 2017.

[36] J. Hosang, R. Benenson, and B. Schiele, “Learning non-maximum suppression,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[37] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, no. 2, 2017, p. 4.

[38] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988.

Appendix A Visual Evaluation and Interpretation

[Figure A.1 row labels: various sizes; various intensities; various shapes]

Figure A.1: Illustrations of the quality of the detection on a variety of nodules: True positive. Bounding box proposals are filtered; only those with a confidence score greater than 0.9 are returned. The blue boxes represent the ground truths and the red boxes represent the region proposals. The system works across several characteristics (size, intensity and shape).


[Figure A.2 row labels: poor intensity or resolution; wrong location]

Figure A.2: Illustrations of the quality of the detection on a variety of nodules: False negative. Bounding box proposals are filtered; only those with a confidence score greater than 0.9 are returned. The blue boxes represent the ground truths and the red boxes represent the region proposals. Major failures are due to nodules with poor intensity or a wrong localisation of the bounding boxes.


[Figure A.3 row labels: vessels; potential regions of interest; corners or lobe boundary; outside lungs]

Figure A.3: Illustrations of the quality of the detection on a variety of nodules: False positive. Bounding box proposals are filtered; only those with a confidence score greater than 0.9 are returned. The blue boxes represent the ground truths and the red boxes represent the region proposals. Most of the false positives correspond to damaged areas or tricky cases.

www.kth.se
