Convolutional Neural Network based Medical Imaging Segmentation: Recent Progress and Challenges
Jiaxing Tan
www.gc.cuny.edu
Road Map
• Introduction
• CNN based Models
• Encoder-Decoder based Models
• GAN Based Models
• Some Challenges
• Conclusion
Introduction
• One key research topic in medical imaging is image segmentation.
• Image segmentation, or semantic segmentation, is a pixel-level image understanding task: a pixel-by-pixel classification that decides the class of each pixel.
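As a minimal sketch of what "pixel-by-pixel classification" means, the snippet below turns per-class score maps into a label map by picking the best-scoring class at every pixel. The function name `segment`, the two-class setup and the score values are illustrative assumptions, and plain nested lists stand in for the array library a real pipeline would use.

```python
def segment(scores):
    """scores[c][y][x] is the score of class c at pixel (y, x).

    Returns a label map holding the best-scoring class for every pixel.
    """
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical scores for two classes (0 = background, 1 = nodule) on a 2x3 image.
scores = [
    [[0.9, 0.2, 0.6], [0.7, 0.1, 0.8]],  # background scores
    [[0.1, 0.8, 0.4], [0.3, 0.9, 0.2]],  # nodule scores
]
label_map = segment(scores)
```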
• In medical imaging, such segmentation methods can be used to solve problems such as nodule detection, anomaly detection and organ segmentation.
• Medical images are often noisy and of low quality, which makes segmentation much harder.
Examples of lung CT slices with nodules marked by a red mask; the nodules are quite small compared to the whole image.
Finding the needle in the haystack
• Traditionally, medical image segmentation is performed by classification based on hand-engineered features.
• One common attribute of such methods is that empirical “magic numbers” are used for thresholding and preprocessing.
• The risk remains that those empirical values are dataset specific, which may keep the model from being a general solution.
• As the final goal of medical imaging is to build computer-aided detection/diagnosis (CAD) systems that serve a large population, a robust and adaptive model is needed.
• A Convolutional Neural Network can learn features automatically by itself, which casts new light on the problem.
Basic Structure of CNN
• Convolutional Neural Networks are similar to other deep neural networks in that they are made up of neurons with learnable weights and biases.
• In the more commonly seen introduction, a CNN is a deep neural network, inspired by biological studies of the human visual cortex, built from four types of layers:
– Input Layer
– Convolution Layer
– Sampling (Pooling) Layer
– Fully Connected Layer
An Example of CNN
CNN - Input Layer
• The input layer reads data of a predefined size without changing it.
• In the example figure, the input layer reads a CT scan image of size 256×256.
Convolution Layer
• 2D convolution: an operation on two functions f and g that produces a third function, which can be interpreted as a modified ("filtered") version of f.
• Here * denotes convolution and · denotes ordinary multiplication.
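To make the operation concrete, here is a small sketch of a discrete 2D convolution without padding ("valid" mode), written with plain nested lists so it has no dependencies. The function name `conv2d` and the toy image and kernel are assumptions for illustration. Note that the kernel is flipped before sliding, as the mathematical definition requires; most deep learning packages actually compute the unflipped variant (cross-correlation), which makes no practical difference because the kernels are learned.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution of a nested-list image with a nested-list kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # True convolution flips the kernel along both axes before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    return [
        [
            sum(
                image[y + i][x + j] * flipped[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]  # toy kernel, chosen only for the example
feature_map = conv2d(image, kernel)
```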
CNN - Convolution Layer
• The output of a convolution layer is k feature maps, each generated by convolving one kernel over the whole image.
• In the example figure, there are two convolution layers: conv1 has 32 kernels of size 7×7, and conv2 has 64 kernels of size 5×5.
Pooling Layer (2D)
• Accepts a volume of size W1×H1×D1
• Requires two hyper-parameters:
– the spatial extent F,
– the stride S.
• Produces a volume of size W2×H2×D2 where:
– W2 = (W1−F)/S + 1
– H2 = (H1−F)/S + 1
– D2 = D1
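The size formulas above can be checked with a short sketch. `pool_output_size` applies them directly, and `max_pool2d` pools a single channel; both names are illustrative, and max pooling is assumed as the pooling function since the slide does not name one.

```python
def pool_output_size(w1, h1, d1, f, s):
    """Apply W2 = (W1 - F)/S + 1, H2 = (H1 - F)/S + 1, D2 = D1."""
    if (w1 - f) % s or (h1 - f) % s:
        raise ValueError("the pooling window does not tile the input evenly")
    return (w1 - f) // s + 1, (h1 - f) // s + 1, d1


def max_pool2d(channel, f, s):
    """Max-pool one channel (nested lists) with extent f and stride s."""
    out_h = (len(channel) - f) // s + 1
    out_w = (len(channel[0]) - f) // s + 1
    return [
        [
            max(channel[y * s + i][x * s + j] for i in range(f) for j in range(f))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# A 256x256 input with 32 feature maps, pooled with F = 2 and S = 2.
size_after_pool = pool_output_size(256, 256, 32, f=2, s=2)
pooled = max_pool2d(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]], f=2, s=2
)
```

Pooling only shrinks the spatial dimensions; the depth (number of feature maps) is unchanged.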
Fully-connected Layer
Neurons in a fully connected layer have full connections to all activations in the previous layer, as seen in regular neural networks. Their activations can hence be computed with a matrix multiplication followed by a bias offset.
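The "matrix multiplication followed by a bias offset" can be sketched in a few lines. The function name and the toy weights are assumptions for illustration, and no activation function is applied here.

```python
def fully_connected(x, weights, bias):
    """Compute y = W x + b, with one row of `weights` per output neuron."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


x = [1.0, 2.0, 3.0]                            # activations of the previous layer
weights = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]   # 2 output neurons, 3 inputs each
bias = [0.0, 1.0]
y = fully_connected(x, weights, bias)
```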
CNN - Fully Connected Layer
• The last layer of a CNN model is usually a fully connected layer that serves as the output layer. The number of neurons in the output layer equals the number of classes in the classification task.
Some Deep Learning Packages
• One advantage of CNNs is that several public packages are available.
• Instead of building your CNN from scratch, you can take advantage of publicly available packages with GPU support.
• Caffe: http://caffe.berkeleyvision.org/
• Lasagne: https://lasagne.readthedocs.io/en/latest/
• TensorFlow: https://www.tensorflow.org/
• Theano: http://deeplearning.net/software/theano/
• Torch: http://torch.ch/
• PyTorch: http://pytorch.org/
• MXNet: http://mxnet.io/
• Keras: https://keras.io/
• Caffe2: https://caffe2.ai/
• And this list keeps growing...
• Which one do I like most? It depends on the available code for the model...
CNN Models
• The general idea is to infer the class label of the center pixel(s) from its nearby neighbors.
• We classify the models into three categories:
– 2D CNN based
– 3D CNN based
– Holistic CNN based
• Bounding box models: the 2D and 3D methods differ in whether a 2D or a 3D neighborhood is considered when classifying the center pixel(s).
• As a complement to the bounding box methods, we also introduce methods that do not rely on a bounding box, named holistic.
2D CNN based lung nodule detection
• A set of CT scans of a patient usually contains more than 300 slices, depending on the body size of the patient; radiologists check the scan slice by slice to detect nodules.
• For each slice, radiologists observe every sub-region in it. This procedure is performed on 2D slices.
• To simulate this procedure, a 2D CNN is used.
• After pre-processing such as lung segmentation and noise elimination, the slice is cut into small sub-regions.
• Each region is an input to the CNN, and the output is a decision on whether the region contains a nodule.
• The combined result shows whether a nodule exists in this slice.
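The pipeline above can be sketched as a sliding window over one slice. Everything named here is an assumption for illustration: `extract_patches`, `slice_has_nodule`, the patch size and stride, and in particular `bright`, a trivial threshold rule that merely stands in for a trained CNN classifier.

```python
def extract_patches(slice_2d, size, stride):
    """Cut a 2D slice (nested lists) into square sub-regions.

    Returns a list of ((top, left), patch) pairs.
    """
    height, width = len(slice_2d), len(slice_2d[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = [row[left:left + size] for row in slice_2d[top:top + size]]
            patches.append(((top, left), patch))
    return patches


def slice_has_nodule(slice_2d, size, stride, classify_patch):
    """Combine the per-patch decisions: the slice is positive if any patch is."""
    return any(
        classify_patch(patch)
        for _, patch in extract_patches(slice_2d, size, stride)
    )


# Toy 4x4 "slice" with one bright spot; `bright` is a placeholder for the CNN.
slice_2d = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0]]
patches = extract_patches(slice_2d, size=2, stride=2)
bright = lambda patch: max(max(row) for row in patch) > 5
found = slice_has_nodule(slice_2d, size=2, stride=2, classify_patch=bright)
```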
• Most recently, Tajbakhsh et al. compared the performance of several CNN structures on lung nodule detection against MTANNs.
Nima Tajbakhsh, et al. “Comparing two classes of end-to-end machine-learning models in lung nodule detection and classification: MTANNs vs. CNNs”, Pattern Recognition, in press.
3D CNN based lung nodule detection
• When radiologists check a scan, for a better inspection they not only go through each region of a slice but also check the same region on the slices before and after the current one to decide whether there is a nodule inside.
• This detection procedure takes advantage of the 3D nature of the CT scan, which can also inspire the design of CNN based detection.
• Traditionally, a CNN takes a 2D matrix as input.
• However, some recent publications in computer vision introduce 3D CNNs for tasks such as video scene recognition or 3D object recognition, and achieve promising results.
• In lung nodule detection, the 3D nature of CT scans makes it reasonable to apply 3D CNNs, and some efforts have been made.
• Rushil Anirudh et al. applied a 3D CNN to a weakly labelled lung nodule dataset. They use a voxel block v as the input to the CNN to decide whether its center point, located at (x; y; z), is a nodule or not.
• v is defined as (x-w : x+w; y-w : y+w; z-h : z+h), which means that not only the neighbors in the same slice but also the neighbors on the previous and following slices are considered when making the decision.
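A minimal sketch of extracting such a 3D neighborhood from a CT volume is given below. The indexing convention `volume[z][y][x]`, the inclusive endpoints of each range, and the choice to reject centers too close to the border are all assumptions; the slide notation does not settle them.

```python
def voxel_neighborhood(volume, x, y, z, w, h):
    """Return the block (x-w..x+w, y-w..y+w, z-h..z+h) around a center point.

    `volume` is indexed as volume[z][y][x]; endpoints are treated as inclusive.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    inside = (
        w <= x < width - w and w <= y < height - w and h <= z < depth - h
    )
    if not inside:
        raise ValueError("neighborhood would extend outside the volume")
    return [
        [row[x - w:x + w + 1] for row in plane[y - w:y + w + 1]]
        for plane in volume[z - h:z + h + 1]
    ]


# Toy 5x5x5 volume whose values encode their own coordinates as z*100 + y*10 + x.
volume = [
    [[zi * 100 + yi * 10 + xi for xi in range(5)] for yi in range(5)]
    for zi in range(5)
]
block = voxel_neighborhood(volume, x=2, y=2, z=2, w=1, h=1)
```

With w = 1 and h = 1 the block covers three slices of 3×3 pixels each, centered on the queried point.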
• This design is closer to how radiologists perform lung nodule detection.
• A sensitivity of 80% at 10 false positives per scan was reported on their weakly labelled dataset.
Anirudh, Rushil, et al. “Lung nodule detection using 3D convolutional neural networks trained on weakly labeled data.” SPIE Medical Imaging. International Society for Optics and Photonics, 2016.
Holistic CNN lung nodule detection
• All the methods mentioned in the previous two sections follow a pipeline in which the detection result for a slice is based on the results of a group of sub-tasks, each performing nodule detection in one region of the slice with a sliding window.
• Such a pipeline is not very efficient. The question is whether we can perform nodule detection on the whole image, without dividing it into sub-tasks.
YOLO
• It models detection as a regression problem.
• It divides the image into an even S × S grid and, for each grid cell, simultaneously predicts B bounding boxes, confidence in those boxes, and C class probabilities.
• Each bounding box consists of 5 predictions: the center of the box (x and y, relative to the bounds of the grid cell), width, height, and confidence.
• These predictions are encoded as an S × S × (B ∗ 5 + C) tensor.
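As a worked example of the tensor size, the sketch below evaluates S × S × (B ∗ 5 + C). The values S = 7, B = 2, C = 20 are the ones used in the original YOLO paper for PASCAL VOC; the single-class nodule setting is a hypothetical configuration added only for comparison.

```python
def yolo_output_shape(s, b, c):
    """Shape of the prediction tensor: S x S grid cells, each holding
    B boxes of 5 values (x, y, width, height, confidence) plus C class scores."""
    return (s, s, b * 5 + c)


voc_shape = yolo_output_shape(s=7, b=2, c=20)      # settings from the YOLO paper
nodule_shape = yolo_output_shape(s=7, b=2, c=1)    # hypothetical one-class setup
```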
Holistic CNN lung nodule detection
Gao, Mingchen, et al. “Holistic classification of CT attenuation patterns for interstitial lung diseases via deep convolutional neural networks.” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization (2016): 1-6.
Mingchen Gao et al. applied holistic classification to lung CT scans to detect 6 different kinds of diseases. Although the task is lung disease detection rather than lung cancer detection, the paper sheds some light on using a different pipeline for nodule detection.
Auto-encoder-like structure
• An auto-encoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs.
• My understanding: given an input x, the auto-encoder passes on only the information needed while filtering out the noise.
[Figure: Encoder → Hidden Representation → Decoder]
FCN
• Features are merged from different stages in the encoder, which vary in the coarseness of their semantic information.
• The upsampling of the learned low-resolution semantic feature maps is done using deconvolutions initialized with bilinear interpolation filters.
• An excellent example of knowledge transfer from modern classifier networks such as VGG16 and AlexNet to semantic segmentation.
SegNet
• Auto-encoder-like structure (symmetric).
• SegNet uses unpooling to upsample feature maps in the decoder, which keeps high-frequency details intact in the segmentation.
• The encoder doesn't use fully connected layers (by convolutionizing them, as in FCN) and hence is a lightweight network with fewer parameters.
U-Net
• U-Net simply concatenates the encoder feature maps to the upsampled feature maps from the decoder at every stage, forming a ladder-like structure.
• Through its skip concatenation connections, the architecture allows the decoder at each stage to learn back relevant features that are lost when pooled in the encoder.
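The skip concatenation can be illustrated on toy feature maps. In this sketch a feature map is a list of channels, each a nested list; nearest-neighbor upsampling is used as a simple stand-in for the learned up-convolution of the actual U-Net, and the function names are assumptions.

```python
def upsample2x(channel):
    """Nearest-neighbor 2x upsampling of one channel (a stand-in for the
    learned up-convolution used in the real network)."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def skip_concat(encoder_maps, decoder_maps):
    """Upsample the decoder maps, then concatenate along the channel axis."""
    upsampled = [upsample2x(channel) for channel in decoder_maps]
    enc_shape = (len(encoder_maps[0]), len(encoder_maps[0][0]))
    dec_shape = (len(upsampled[0]), len(upsampled[0][0]))
    if enc_shape != dec_shape:
        raise ValueError("spatial sizes must match before concatenation")
    return encoder_maps + upsampled


encoder_maps = [[[0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels, 4x4
decoder_maps = [[[1, 2], [3, 4]] for _ in range(3)]             # 3 channels, 2x2
merged = skip_concat(encoder_maps, decoder_maps)
```

The merged map keeps the full-resolution encoder channels next to the upsampled decoder channels, which is what lets the next decoder stage recover detail lost to pooling.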
Link-Net
• Similar to U-Net
• Links each encoder with its decoder
• Each encoder block is a ResNet block
• Reduced parameters
PSP-Net
• Embeds difficult scenery context features in an FCN based pixel prediction framework.
• First uses a pretrained ResNet model with the dilated network strategy to extract the feature map.
• Then fuses features of different levels for further analysis.
• Given an input image (a)
• Use a CNN to get the feature map of the last convolutional layer (b)
• A pyramid parsing module is applied to harvest different sub-region representations, followed by up-sampling and concatenation layers to form the final feature representation, which carries both local and global context information (c).
• Finally, the representation is fed into a convolution layer to get the final per-pixel prediction (d).
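The pooling half of the pyramid module in step (c) can be sketched as average pooling of one feature-map channel into grids of several sizes. The bin sizes 1, 2, 3 and 6 follow the PSPNet paper; the function names, the square single-channel input and the requirement that the side be divisible by each bin size are simplifying assumptions. The up-sampling and concatenation that follow in the real module are not shown.

```python
def average_pool_to_bins(channel, bins):
    """Average-pool a square channel (nested lists) into a bins x bins grid."""
    size = len(channel)
    if size % bins:
        raise ValueError("side length must be divisible by the number of bins")
    cell = size // bins
    return [
        [
            sum(
                channel[y][x]
                for y in range(by * cell, (by + 1) * cell)
                for x in range(bx * cell, (bx + 1) * cell)
            ) / cell ** 2
            for bx in range(bins)
        ]
        for by in range(bins)
    ]


def pyramid_pool(channel, bin_sizes=(1, 2, 3, 6)):
    """Pool the same channel at several scales, from global to fine."""
    return {bins: average_pool_to_bins(channel, bins) for bins in bin_sizes}


channel = [[y * 6 + x for x in range(6)] for y in range(6)]  # toy 6x6 feature map
pyramid = pyramid_pool(channel)
```

The 1×1 level summarizes the whole map (global context), while the finer levels keep progressively more local information.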
Why Global Features?
An illustration showcasing the importance of global spatial context for semantic segmentation.
Generative Adversarial Network (GAN)
The loss for the G network is:
The loss for the D network is:
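The formulas from this slide are not present in the transcript. For orientation only, the standard GAN losses from the original formulation by Goodfellow et al. are given below; the notation (data distribution p_data, noise prior p_z) is the conventional one and may differ from what the slide showed.

```latex
% Generator loss (minimax form): G tries to make D(G(z)) large.
\mathcal{L}_G = \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% Discriminator loss: D tries to score real samples high and generated ones low.
\mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
                -\,\mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

In practice the generator is often trained with the non-saturating alternative, minimizing $-\mathbb{E}_{z \sim p_z}[\log D(G(z))]$, which gives stronger gradients early in training.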
GAN Based Segmentation
Given a dataset of N training images x_n and corresponding label maps y_n, the loss is defined as:
Training the adversarial model:
Training the segmentation model:
Some Challenges
• Data Source
• Data Preparation
• Some Other Challenges:
– HPC support
– High Cost
– Multi-disciplinary Cooperation Required
Data Source
• CNNs, like some other big data technologies, require a large enough dataset to learn the classification rules.
• Unlike in computer vision, where large and clean benchmark datasets are available, only limited lung nodule datasets are available to the public.
• Most people have their own datasets, containing different numbers of patients from various sources.
• Where to get data is a big challenge for deep learning based detection.
• SPIE-AAPM-LUNGx dataset: a dataset used for a lung challenge, originally to decide whether a nodule is benign or malignant.
• LIDC-IDRI: contains 1018 cases; the largest public database, founded by the Lung Image Database Consortium and Image Database Resource Initiative. Lung nodule CT scans are available for download on the website.
• ELCAP Public Lung Image Database: contains 50 low-dose thin-slice chest CT images with annotations for small nodules.
• NSCLC-Radiomics: contains 422 non-small cell lung cancer (NSCLC) patients.
Data Preparation
• The major purposes of data preparation are to make the training data less confusing, to make it fit the CNN better, and to enlarge the dataset.
• To make the data less confusing, some works in the literature perform lung segmentation to reduce noise.
• Smoothing methods can then be applied to the segmented lung parts.
• Also, other unnecessary parts, such as light dots or air, can be filtered out by thresholding or other techniques.
• When modifying the data to fit a CNN better, one challenge is the difference between a CT scan and an RGB image.
• An RGB image contains three channels, each with values ranging from 0 to 255. A CT scan has only one channel, with values in the range [-1000, 3000], which is much larger.
• Based on our experiments, if one directly feeds CT scans with such a large range into a CNN, the performance will be limited.
• To make a CT scan more similar to the images originally processed by CNNs in computer vision, one solution is to rescale the data range of the CT scan to [0, 255].
• This inevitably causes some information loss.
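A minimal sketch of such a rescaling, assuming a simple clip-then-linear-map of the [-1000, 3000] range mentioned earlier onto [0, 255]; the function name and the rounding choice are illustrative assumptions. The last two lines show the information loss: roughly 4000 input levels are squeezed into 256 output levels, so nearby CT values collapse to the same number.

```python
def rescale_ct(values, lo=-1000, hi=3000):
    """Clip CT values to [lo, hi] and map them linearly onto integers in [0, 255]."""
    return [
        round((min(max(v, lo), hi) - lo) * 255 / (hi - lo))
        for v in values
    ]


extremes = rescale_ct([-2000, -1000, 3000, 5000])  # out-of-range values are clipped
collapsed = rescale_ct([0, 5])                     # two distinct inputs, one output
```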
• One idea raised in the literature is to turn the one-channel CT scan into three channels by separating attenuations into three levels: low, normal and high.
• The three-channel image is then rescaled into [0, 255]. One benefit of this method is that the CT image is now in the same format as an RGB image.
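One way to realize this idea is to apply three attenuation windows to the same scan and rescale each to [0, 255]. The window boundaries in `WINDOWS` below are hypothetical placeholders, not the values used in the literature, and the function names are likewise assumptions.

```python
# Hypothetical (lo, hi) windows for the low, normal and high attenuation channels.
WINDOWS = ((-1000, -500), (-500, 500), (500, 3000))


def window_channel(ct_image, lo, hi):
    """Clip one CT image (nested lists) to [lo, hi] and rescale it to [0, 255]."""
    return [
        [round((min(max(v, lo), hi) - lo) * 255 / (hi - lo)) for v in row]
        for row in ct_image
    ]


def to_three_channels(ct_image, windows=WINDOWS):
    """Build a three-channel image, one channel per attenuation window."""
    return [window_channel(ct_image, lo, hi) for lo, hi in windows]


channels = to_three_channels([[-1000, -500, 3000]])  # a toy 1x3 "scan"
```

Each channel spends its 256 levels on a narrower range, so less detail is lost than with a single rescaling of the full range.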
• To enlarge the dataset to meet the CNN's need for large amounts of data, methods such as image translation can be applied.
• The generated images are treated as different from the original image.
• Adding random noise, such as white noise, to the original image is another way to enlarge the dataset.
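Both augmentations can be sketched with the standard library alone. The function names, the zero fill value for pixels shifted in from outside the image, and the Gaussian model for white noise are assumptions made for this illustration.

```python
import random


def translate(image, dx, dy, fill=0):
    """Shift an image (nested lists) by dx pixels right and dy pixels down."""
    height, width = len(image), len(image[0])
    return [
        [
            image[y - dy][x - dx]
            if 0 <= y - dy < height and 0 <= x - dx < width
            else fill
            for x in range(width)
        ]
        for y in range(height)
    ]


def add_white_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every pixel."""
    return [[value + rng.gauss(0, sigma) for value in row] for row in image]


image = [[1, 2], [3, 4]]
shifted = translate(image, dx=1, dy=0)
noisy = add_white_noise(image, sigma=0.5, rng=random.Random(0))
augmented = [image, shifted, noisy]  # one original sample becomes three
```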
• One more issue is the imbalanced dataset.
• As nodule detection is a binary classification problem (nodule or non-nodule), the dataset used to train a classifier should be balanced, meaning both classes have an equal number of samples.
• However, in a set of CT scans, the number of slices containing a nodule is much smaller than the number of slices that do not.
• So when preparing the training dataset, we need to balance it so that the two types of samples, with and without a nodule, are equal in number.
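The slides do not say how the balancing is done. One simple option, shown here purely as an assumption, is random undersampling of the larger class; oversampling the nodule class with the augmentation methods mentioned earlier would be another. The function name and the label convention (1 = nodule, 0 = non-nodule) are illustrative.

```python
import random


def balance_by_undersampling(samples, labels, seed=0):
    """Return shuffled (sample, label) pairs with equally many 0s and 1s,
    obtained by randomly dropping samples from the larger class."""
    rng = random.Random(seed)
    positives = [s for s, label in zip(samples, labels) if label == 1]
    negatives = [s for s, label in zip(samples, labels) if label == 0]
    n = min(len(positives), len(negatives))
    balanced = [(s, 1) for s in rng.sample(positives, n)]
    balanced += [(s, 0) for s in rng.sample(negatives, n)]
    rng.shuffle(balanced)
    return balanced


# Toy dataset: 2 slices with a nodule, 10 without.
samples = list(range(12))
labels = [1, 1] + [0] * 10
balanced = balance_by_undersampling(samples, labels)
```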
Some Other Challenges
• HPC support: Training a CNN based model requires a huge amount of computation on a huge amount of data. Only with the help of HPC can the CNN model be trained in a tolerable length of time. Nowadays, besides training CNNs purely on CPUs, CUDA-accelerated GPUs are also used for training.
• High cost: With the need for HPC comes another challenge, cost. HPC consumes a large amount of energy and requires dedicated facilities.
• Multi-disciplinary cooperation required: designing a CNN based lung nodule detection system requires cooperation across multiple disciplines such as medicine, radiology and computer science.
Conclusion
• We gave a brief introduction to recent progress in using CNNs for lung nodule detection.
• A list of public packages and a list of public datasets were given.
• CNNs have shown great potential in medical imaging segmentation.
• Meanwhile, challenges remain, and researchers are working on solving them.
Questions?