
Size-Invariant Fully Convolutional Neural Network for Vessel Segmentation of Digital Retinal Images

    Yuansheng Luo, Hong Cheng and Lu Yang
    Machine Intelligence Institute, School of Automation Engineering,
    University of Electronic Science and Technology of China, Chengdu
    Email: [email protected]

    Abstract: Vessel segmentation of digital retinal images plays an important role in the diagnosis of diseases such as diabetes, hypertension and retinopathy of prematurity, because these diseases affect the retina. In this paper, a novel Size-Invariant Fully Convolutional Neural Network (SIFCN) is proposed to address the automatic retinal vessel segmentation problem. The input to the network is a set of image patches with their corresponding pixel-wise labels. Consecutive convolution and pooling layers follow the input, so that the network can learn the abstract features needed to segment retinal vessels. The network is designed to preserve the height and width of the data at each layer, via padding and the choice of pooling stride, so that spatial information is maintained and no up-sampling is required. Compared with pixel-wise retinal vessel segmentation approaches, our patch-wise segmentation is much more efficient, since each cycle predicts all the pixels of a patch. Our overlapped SIFCN achieves an accuracy of 0.9471 with an AUC of 0.9682, and our non-overlapped SIFCN is the most efficient of the deep learning approaches, costing only 3.68 seconds per image; the overlapped SIFCN costs 31.17 seconds per image.

    I. INTRODUCTION

    It has been shown that retinal vessel morphology is highly correlated with the evolution of diseases such as diabetes, hypertension and retinopathy of prematurity [1], since these diseases damage the retinal vessels. For example, diabetes, a very common disease, damages, through high blood sugar levels, the blood vessels that nourish the retina, the tissue that converts and conveys visual signals to the brain [2]. Retinal vessels are therefore an important indicator of these diseases, and segmentation of retinal vessels from images is significant for physiologists and pathologists who diagnose disease with the assistance of retinal images. A particular advantage is that retinal imaging is a non-invasive way to visualize vessels [3]: it only requires a special optical camera to photograph the retina, without the help of X-rays [3, 4]. About 10% of all diabetic patients have diabetic retinopathy, one of the main causes of blindness [5], yet this blindness can be prevented if treatment is applied early. The WHO therefore suggests yearly ocular screening of patients. Retinal image segmentation is an indispensable step in such screening, so automatic segmentation will greatly facilitate it.

    Automatic segmentation of retinal vessels from images is mainly impeded by poor local contrast and capricious, unbalanced illumination [6]. This is readily apparent in the DRIVE (Digital Retinal Images for Vessel Extraction) database [5]: vessel edges are so obscure that even specialists cannot determine the exact vessel boundaries, since the organic tissue has no clear visual boundary [7]. Moreover, the unbalanced illumination across whole images makes it impossible for traditional threshold-based segmentation to reach state-of-the-art results.

    II. RELATED WORKS

    In this paper, the problem we address is the segmentation of retinal vessels from images. Existing approaches can be categorized into two groups: rule-based and learning-based.

    A. Rule-based approaches

    The former group primarily relies on image processing algorithms, including pre-processing, segmentation and post-processing. Chaudhuri et al. [6] proposed approximating the gray-level profile of the cross-section of a blood vessel with a Gaussian-shaped curve, and used 12 different templates (matched filters) to search for vessel segments along all possible directions. Al-Rawi et al. [8] also used 12 templates, generated by a set of parameters {L, σ, T}, and selected the best parameters to fit the vessel edges. This kind of matched-filter approach is good at finding line-shaped vessels in digital images, but it is easily misled by line-shaped background structures and takes little abstract context into account. Martínez-Pérez et al. [9] proposed an approach combining region growing and scale-space analysis to classify pixels into vessel and non-vessel classes using the gradient magnitude and the ridge strength at different scales. Martinez-Perez et al. [10] developed the region-growing idea further, proposing an approach that automatically segments retinal vessels based on multiscale feature extraction using the first and second spatial derivatives of the intensity image. This kind of region-growing approach can detect blood vessels of different widths, lengths and orientations, but initial seeds for region growing must be assigned.
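    To make the matched-filter idea concrete, the following is a minimal NumPy/SciPy sketch under assumed parameter values (the kernel length, σ, and the 15-degree angular step are illustrative choices, not the values used in [6] or [8]):

```python
import numpy as np
from scipy.ndimage import convolve, rotate

def gaussian_line_kernel(length=9, sigma=2.0):
    """Template whose cross-section is an inverted Gaussian, matching the
    gray-level profile of a vessel (darker than its background)."""
    half = length // 2
    xs = np.arange(-half, half + 1)
    profile = -np.exp(-xs**2 / (2 * sigma**2))  # 1-D Gaussian cross-section
    kernel = np.tile(profile, (length, 1))      # extend along the vessel axis
    return kernel - kernel.mean()               # zero mean, as matched filtering requires

def matched_filter_response(image, n_angles=12):
    """Maximum response over n_angles rotated templates (15-degree steps)."""
    base = gaussian_line_kernel()
    angles = np.arange(0, 180, 180 / n_angles)
    responses = [convolve(image, rotate(base, a, reshape=False)) for a in angles]
    return np.max(responses, axis=0)

# e.g. response = matched_filter_response(green_channel), then threshold it
```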

    Zana et al. [11] presented an algorithm based on mathematical morphology and curvature evaluation for the detection of vessel-like patterns in a noisy context. Bankhead et al. [1] presented the Isotropic Undecimated Wavelet Transform (IUWT) approach to enhance the foreground (retinal vessel pixels) against the background, followed by morphological transformation, vessel edge detection and threshold segmentation. Frameworks of this kind rely heavily on image pre-processing, but a general pre-processing algorithm that is effective across many different retinal imaging conditions does not exist, so their applicability is limited.
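    For illustration, here is a minimal sketch of an IUWT (à trous) decomposition in the spirit of [1]; the B3-spline scaling kernel is the standard choice for this transform, while the number of levels and the final thresholding step are assumptions left as comments:

```python
import numpy as np
from scipy.ndimage import convolve1d

B3 = np.array([1, 4, 6, 4, 1], dtype=float) / 16  # B3-spline scaling kernel

def iuwt(image, n_levels=3):
    """A trous scheme: at level j, smooth with the B3 kernel dilated by 2**j;
    each wavelet plane is the difference of successive smoothed images."""
    planes, current = [], image.astype(float)
    for level in range(n_levels):
        kernel = np.zeros(4 * 2**level + 1)
        kernel[::2**level] = B3                   # insert zeros between taps (dilation)
        smooth = convolve1d(convolve1d(current, kernel, axis=0), kernel, axis=1)
        planes.append(current - smooth)           # detail (wavelet) plane at this scale
        current = smooth
    return planes, current                        # detail planes + coarse residual

# Vessel enhancement as in [1]: sum a few detail planes, then threshold.
```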

    B. Learning-based approaches

    The learning-based group primarily focuses on training parameters to classify pixels. Niemeijer et al. [7] used a kNN classifier to classify each pixel of the digital retinal image, with a feature vector extracted from the green channel of the retinal image only. Soares et al. [12] presented a novel approach using a Bayesian classifier with class-conditional probability density functions; the feature vectors are composed of the pixel's intensity and two-dimensional Gabor wavelet transform responses at different scales. Xu et al. [13] used an SVM to classify each pixel of the retinal image with features extracted from vessel fragments, but designing such features demands considerable skill. Melinscak et al. [14] and Ciresan et al. [15] proposed deep convolutional neural network frameworks that classify each pixel of the retinal image and showed good results, but they are not efficient.
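    As a rough sketch of this pixel-wise learning-based scheme (loosely following [7]; the two-feature design, window size and scikit-learn classifier here are simplifying assumptions, not the setup of that paper):

```python
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.neighbors import KNeighborsClassifier

def pixel_features(rgb_image):
    """Per-pixel features from the green channel only: raw intensity plus
    a local mean, stacked into an (n_pixels, 2) matrix."""
    green = rgb_image[:, :, 1].astype(float)
    local_mean = uniform_filter(green, size=9)
    return np.stack([green.ravel(), local_mean.ravel()], axis=1)

# Placeholder data; in practice these are DRIVE images and manual vessel masks.
train_img = np.random.rand(64, 64, 3)
train_mask = np.random.randint(0, 2, (64, 64))
test_img = np.random.rand(64, 64, 3)

clf = KNeighborsClassifier(n_neighbors=15)
clf.fit(pixel_features(train_img), train_mask.ravel())
pred = clf.predict(pixel_features(test_img)).reshape(test_img.shape[:2])
```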

    Generally speaking, rule-based approaches use deliberately designed parameters to process retinal images, so they are fast, whereas learning-based approaches train their parameters on ground truth, so they are time-consuming. In particular, pixel-wise classification needs considerable computation, since each pixel requires one classification and an image contains tens of thousands of pixels to classify.

    The approach we propose belongs to the latter group: we use a fully convolutional neural network to segment retinal images, and we use a GPU to accelerate classification. More importantly, compared with pixel-wise segmentation, the patch-wise segmentation approach we propose saves tremendous computation. For example, with a patch size of 120 × 120, a single classification pass predicts 14,400 pixels, not 1. The deep learning framework we use is the deep SIFCN. Our work was inspired by [16], which proposed the fully convolutional network for semantic segmentation; unlike that work, we keep the sizes of the input and output the same by padding the border of each layer and setting the stride to 1, so that thin vessels are not lost during down-sampling.
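    To make the saving concrete, the arithmetic below compares the number of classification passes for a 584 × 565 DRIVE image under the two schemes (a non-overlapping tiling is assumed for this count):

```python
import math

h, w = 584, 565            # DRIVE image size
patch = 120                # patch side length, as in the text

pixelwise_passes = h * w                                        # one pass per pixel
patchwise_passes = math.ceil(h / patch) * math.ceil(w / patch)  # non-overlapping tiles

print(pixelwise_passes)    # 329960
print(patchwise_passes)    # 5 * 5 = 25 passes, each predicting up to 14400 pixels
```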

    The contributions of this paper can be summarized as follows:

    • We propose a novel patch-wise retinal image segmentation approach. Compared with pixel-wise approaches, our patch-wise approach takes more context information into account during segmentation, and therefore outperforms them. Moreover, in each segmentation cycle, patch-wise segmentation predicts tens of thousands of pixels of a patch, which is far more efficient than pixel-wise segmentation, which predicts only one pixel per cycle.

    • We propose a size-invariant fully convolutional network that maintains the height and width of the data in each layer, so that small vessels are not overlooked during down-sampling. To the best of our knowledge, this is the first paper to use a size-invariant network to preserve this detailed information.

    Fig. 1. Heat-map of the retinal image. The intensity of the heat-map indicates the vessel probability of the corresponding pixel: (a) ground truth; (b) heat-map without overlap; (c) heat-map with overlap; (d) heat-map of [14]. Compared with the result (d) from [14], our results (b) and (c) show less noise; (d) also makes the vessels thicker than the ground truth, so the accuracy of our approach (c) is higher than that of (d).

    The rest of the paper is organized as follows. The deep SIFCN framework is described in Section III. The experimental results and a detailed analysis are presented in Section IV. Finally, a summary and conclusions are drawn in Section V.

    III. SIFCN FRAMEWORK

    A. Working layers

    Our SIFCN is composed of convolution layers and max-pooling layers. Each layer's data is organized as a three-dimensional array of size h × w × d, where h and w denote the height and width of the input, and d denotes the feature dimension. Data in higher layers correspond to the locations of their receptive fields in the image.
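    A minimal sketch of one such size-invariant stage follows (PyTorch and these exact kernel sizes are our assumptions; the paper does not specify a framework). A 3 × 3 convolution with padding 1 and a 3 × 3 max-pool with stride 1 and padding 1 both leave h × w unchanged:

```python
import torch
import torch.nn as nn

class SizeInvariantBlock(nn.Module):
    """One conv + max-pool stage whose output keeps the input's h x w.
    conv: 3x3, padding=1            -> same spatial size
    pool: 3x3, stride=1, padding=1  -> same spatial size (no down-sampling)"""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return self.pool(self.relu(self.conv(x)))

x = torch.randn(1, 3, 120, 120)   # one 120 x 120 RGB patch
y = SizeInvariantBlock(3, 32)(x)
print(y.shape)                    # torch.Size([1, 32, 120, 120])
```

    Because every layer preserves h × w, the final per-pixel scores align directly with the input patch, which is why no up-sampling (deconvolution) stage is needed.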