![Page 1: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/1.jpg)
Encoder-Decoder Networks for Semantic
Segmentation
Sachin Mehta
![Page 2: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/2.jpg)
Outline
> Overview of Semantic Segmentation > Encoder-Decoder Networks > Results
![Page 3: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/3.jpg)
What is Semantic Segmentation?
Input: RGB Image Output: A segmentation Mask
![Page 4: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/4.jpg)
Encoder-Decoder Networks
Encoder Decoder
Encoder • Takes an input image
and generates a high-dimensional feature vector
• Aggregate features at multiple levels
Decoder • Takes a high-
dimensional feature vector and generates a semantic segmentation mask
• Decode features aggregated by encoder at multiple levels
![Page 5: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/5.jpg)
Building Blocks of CNNs
> Convolution > Down-Sampling > Up-Sampling
![Page 6: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/6.jpg)
Convolution
a0 a1 a2
a3 a4 a5
a6 a7 a8
x0 x1 x2
x3 x4 X5
x6 x7 x8
Filter weights are learned from data
![Page 7: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/7.jpg)
Down-Sampling
> Max-pooling > Average Pooling > Strided Convolution
a0 a1 b0 b1
a2 a3 b2 b3
c0 c1 d0 d1
c2 c3 d2 d3
a0 b1
c1 d3 Max-Pooling
𝒂� 𝒃�
𝒄� 𝒅� Avg. Pooling
𝒙𝒙 𝒙𝒙
𝒙𝒙 𝒙𝒙 Strided Convolution
(stride = 2)
![Page 8: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/8.jpg)
Un-Pooling
Deconvolution
Up-Sampling
> Un-pooling > Deconvolution
a0 a1 b0 b1
a2 a3 b2 b3
c0 c1 d0 d1
c2 c3 d2 d3
a0 b1
c1 d3
a0 0 0 b1
0 0 0 0
0 c1 0 0
0 0 0 d3
Max-pooling
0 0 0 0
0 a0 b1 0
0 c1 d3 0
0 0 0 0
x0 x1 x2 x3
x4 x5 x6 x7
x8 x9 x10 x11
x12 x13 x14 x15
![Page 9: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/9.jpg)
Encoder-Decoder Networks
Encoder Decoder
![Page 10: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/10.jpg)
Encoder-Decoder Networks Different Encoding Block Types
• VGG • Inception • ResNet
![Page 11: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/11.jpg)
Encoder-Decoder Networks Different Encoding Block Types
Max-Pool
Conv 3x3
Conv 3x3
Conv 3x3
Input
Output
• VGG • Inception • ResNetd
![Page 12: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/12.jpg)
Encoder-Decoder Networks Different Encoding Block Types
• VGG • Inception • ResNet
Max-Pool
Conv 1x1
Conv 3x3
Concat
Input
Output
Max-Pool
Conv 1x1
Conv 1x1
Conv 5x5
Conv 1x1
![Page 13: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/13.jpg)
Encoder-Decoder Networks Different Encoding Block Types
• VGG • Inception • ResNet
Conv 3x3
Conv 3x3
Sum
Input
Output
![Page 14: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/14.jpg)
Different Encoding Block Types Performance on the ImageNet 2012 Validation Dataset
0
20
40
60
80
Mem
ory
(in M
B) Memory per image
0
50
100
150
Para
met
ers (
in
Mill
ion)
Parameters
0
10
20
30
40
Infe
renc
e Ti
me
(in
ms)
Inference Time
7.5
8
8.5
9
9.5
10
Erro
r (in
%)
Classification Error
VGG Inception ResNet-18
![Page 15: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/15.jpg)
Encoder-Decoder Networks
Encoder Decoder
![Page 16: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/16.jpg)
Encoder-Decoder Networks
Encoder Decoder
![Page 17: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/17.jpg)
Encoder-Decoder Networks Different Decoding Block Types
• VGG • Inception • ResNet
![Page 18: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/18.jpg)
Encoder-Decoder Networks Different Decoding Block Types
Un-Pool
Conv 3x3
Conv 3x3
Conv 3x3
Input
Output
• VGG • Inception • ResNet
![Page 19: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/19.jpg)
Encoder-Decoder Networks Different Decoding Block Types
Deconv 1x1
Conv 1x1
Conv 3x3
Concat
Input
Output
Max-Pool
Conv 1x1
Conv 1x1
Conv 5x5
Conv 1x1
• VGG • Inception • ResNet
![Page 20: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/20.jpg)
Encoder-Decoder Networks Different Decoding Block Types
DeConv 3x3
Sum
Input
Output
• VGG • Inception • ResNet
![Page 21: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/21.jpg)
Classification vs Segmentation
VGG Inception
Source: Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." TPAMI. 2016.
9.05
9.1
9.15
9.2
9.25
9.3
9.35
VGG Inception
Erro
r (in
%)
Top-5 Classification Error
30
35
40
45
50
55
60
VGG Inception
Accu
racy
(in
%)
FCN-32s Segmentation Accuracy
(a) ImageNet Classificaiton Validation Set
(b) PASCAL VOC 2011 Validation Set
![Page 22: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/22.jpg)
Our Work on Segmenting GigaPixel Breast Biopsy Images
![Page 23: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/23.jpg)
Challenges with the dataset
> Limited computational resources > Sliding window approach is promising but
– Size of patch determines the context – Some biological structures may cover several patches
![Page 24: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/24.jpg)
Challenges with the dataset
> Some biological structures are rare – Necrosis and Secretion have less than 1% of all the pixels
![Page 25: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/25.jpg)
Training details
> Training Set: 30 ROIs – 25,992 patches of size 256x256 with augmentation – Split into training and validation set using 90:10 ratio
> Test Set: 28 ROIs
> Stochastic Gradient Descent for optimization
> Implemented in Torch – http://torch.ch/
![Page 26: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/26.jpg)
Segmentation Results
RGB Image Ground Truth Label
![Page 27: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/27.jpg)
Segmentation Results
Encoder-Decoder Network with skip connection
RGB Image Prediction
![Page 28: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/28.jpg)
Segmentation Results
Multi-Resolution Encoder-Decoder Network
RGB Image Prediction
![Page 29: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/29.jpg)
Segmentation Results
RGB Image Ground Truth
Plain Multi-Resolution
![Page 30: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/30.jpg)
Segmentation Results
F1-Score
![Page 31: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/31.jpg)
Why Segmentation?
> Segmented whole dataset (428 ROIs) with the model trained on 30 ROIs
> Extracted histograms from segmentation masks and then trained different classifiers
> Weak classifiers are as good as strong classifiers
Results on Diagnosis
![Page 32: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/32.jpg)
Thank You!!
![Page 33: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/33.jpg)
References
1. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." TPAMI. 2016. (FCN-8s)
2. Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." ICCV. 2015. (DeConvNet)
3. V. Badrinarayanan; A. Kendall; R. Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Scene Segmentation," TPAMI, 2017 (SegNet)
4. Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions.“, ICLR, 2016 (Dilation)
5. Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." arXiv preprint arXiv:1606.00915 (2016). (DeepLab)
6. Zheng, Shuai, et al. "Conditional random fields as recurrent neural networks." ICCV. 2015. (CRFasRNN)
7. Hariharan, Bharath, et al. "Hypercolumns for object segmentation and fine-grained localization." CVPR. 2015. (HyperColumn)
![Page 34: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels](https://reader030.vdocument.in/reader030/viewer/2022040908/5e7ff551a0e16a1cc328845b/html5/thumbnails/34.jpg)
Two 3x3 filters are same as one 5x5 filter
Source: Rethinking the Inception Architecture for Computer Vision by Szegedy et al.