![Page 1: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO](https://reader035.vdocument.in/reader035/viewer/2022062807/5697c0211a28abf838cd2bdd/html5/thumbnails/1.jpg)
Feedforward semantic segmentation with zoom-out features
MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH
TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO
Main Ideas
Casting semantic segmentation as classifying a set of superpixels.
Extracting CNN features from different levels of spatial context around the superpixel at hand.
Using an MLP as the classifier.
Photo credit: Mostajabi et al.
Zoom-out feature extraction
Photo credit: Mostajabi et al.
Zoom-out feature extraction
Subscene-level features:
Bounding box of the superpixels within radius three of the superpixel at hand.
Warp the bounding box to 256 x 256 pixels.
Activations of the last fully connected layer.
Scene-level features:
Warp the image to 256 x 256 pixels.
Activations of the last fully connected layer.
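The subscene-level region can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that "radius three" means graph distance in the superpixel adjacency graph, that superpixels are given as an integer label map, and that a nearest-neighbour resize stands in for the warp. All function names are hypothetical, and the CNN that would consume the 256 x 256 patch is omitted.

```python
from collections import deque

import numpy as np


def superpixel_adjacency(labels):
    """Map each superpixel id to the ids it touches (4-connectivity)."""
    adjacency = {int(i): set() for i in np.unique(labels)}
    # Compare horizontally and vertically adjacent pixel pairs.
    pairs = ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :]))
    for a, b in pairs:
        diff = a != b
        for u, v in zip(a[diff].tolist(), b[diff].tolist()):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def neighbors_within_radius(adjacency, start, radius=3):
    """Breadth-first search: ids at graph distance <= radius from start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def subscene_box(labels, start, radius=3):
    """Bounding box (top, left, bottom, right), end-exclusive."""
    region = neighbors_within_radius(superpixel_adjacency(labels), start, radius)
    rows, cols = np.nonzero(np.isin(labels, list(region)))
    return int(rows.min()), int(cols.min()), int(rows.max()) + 1, int(cols.max()) + 1


def warp_nearest(patch, size=256):
    """Nearest-neighbour resize of an (h, w[, c]) patch to size x size."""
    h, w = patch.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return patch[rows][:, cols]
```

The scene-level input is the same warp applied to the whole image, for example `warp_nearest(image)`.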
Training
Extracting the features from the image and its mirror image, and taking the element-wise max over the two resulting feature vectors.
12416-dimensional representation for each superpixel.
Training two classifiers:
Linear classifier (softmax)
MLP: hidden layer (1024 neurons) + ReLU + hidden layer (1024 neurons) with dropout
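A minimal NumPy sketch of the two steps on this slide: fusing the original and mirrored feature vectors with an element-wise max, and the forward pass of the MLP classifier. Several details are assumptions rather than facts from the slides: the weight initialization, the dropout rate of 0.5, the second ReLU after the second hidden layer, and the 21 output classes (20 VOC categories plus background). All function names are hypothetical.

```python
import numpy as np


def fuse_mirror(features, mirror_features):
    """Element-wise max over the original and mirrored feature vectors."""
    return np.maximum(features, mirror_features)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_mlp(rng, in_dim=12416, hidden=1024, num_classes=21):
    """Small random weights; the initialization scheme is an assumption."""
    sizes = [in_dim, hidden, hidden, num_classes]
    return [(rng.normal(0.0, 0.01, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x, drop_prob=0.5, rng=None):
    """Class probabilities per superpixel; pass rng to enable dropout."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h = relu(x @ w1 + b1)
    h = relu(h @ w2 + b2)  # second ReLU is an assumption
    if rng is not None:  # training mode: inverted dropout
        keep = rng.random(h.shape) >= drop_prob
        h = h * keep / (1.0 - drop_prob)
    return softmax(h @ w3 + b3)
```

The linear softmax classifier from the same slide corresponds to dropping both hidden layers and applying `softmax` to a single affine map of the 12416-dimensional input.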
Loss Function
Imbalanced dataset: weighted loss function.
Loss function: let $f_c$ be the frequency of class $c$ in the training data, with $\sum_c f_c = 1$.
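The formula on this slide did not survive extraction. The sketch below is one common way to weight a loss for an imbalanced dataset, assuming the log-loss of each superpixel is scaled by the inverse frequency of its true class. That weighting and both function names are assumptions for illustration, not a transcription of the slide.

```python
import numpy as np


def class_frequencies(labels, num_classes):
    """Fraction of training samples per class; the fractions sum to 1."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()


def weighted_log_loss(probs, labels, freqs):
    """Mean log-loss with each sample scaled by 1 / frequency of its class.

    probs:  (N, C) predicted class probabilities
    labels: (N,) integer ground-truth classes
    freqs:  (C,) class frequencies, all positive for classes in labels
    """
    picked = probs[np.arange(len(labels)), labels]
    weights = 1.0 / freqs[labels]
    return float(np.mean(-weights * np.log(picked)))
```

With this weighting, an error on a rare class costs more than the same error on a frequent class, which counteracts the dominance of large classes such as background.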
Effect of Zoom-out Levels
Photo and Table credit: Mostajabi et al.
Panels: Image, Ground Truth, G1:3, G1:5, G1:5+S1, G1:5+S1+S2
Quantitative Results
Softmax Results on VOC 2012
Table credit: Mostajabi et al.
Quantitative Results
MLP Results
Table credit: Mostajabi et al.
Qualitative Results
Photo credit: Mostajabi et al.
Learning Deconvolution Network for Semantic Segmentation
NOH, HONG AND HAN
POSTECH, KOREA
Motivations
Photo credit: Noh et al.
Panels: Image, Ground Truth, FCN Prediction
Motivations
Photo credit: Noh et al.
Deconvolution Network Architecture
Photo credit: Noh et al.
Unpooling
Photo credit: Noh et al.
Deconvolution
Photo credit: Noh et al.
Unpooling and Deconvolution Effects
Photo credit: Noh et al.
Pipeline
Generating 2K object proposals using Edge-Box and selecting the top 50 based on their objectness scores.
Aggregating the segmentation maps generated for each proposal using the pixel-wise maximum or average.
Constructing the class-conditional probability map using softmax.
Applying a fully connected CRF to the probability map.
Ensemble with FCN: computing the mean of the probability maps generated by DeconvNet and FCN, then applying the CRF.
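The aggregation and ensemble steps can be sketched as below. This is a simplified illustration: proposal generation, the network itself, and the CRF are omitted. It assumes each proposal yields a score map already resized to its box, that pixels covered by no proposal receive a score of zero, and that "average" means the mean over the proposals covering a pixel. Function names and the box format are hypothetical.

```python
import numpy as np


def aggregate_proposals(shape, boxes, score_maps, mode="max"):
    """Paste per-proposal score maps into image space and combine them.

    shape:      (H, W, C) of the full-image score map
    boxes:      list of (top, left, bottom, right), end-exclusive
    score_maps: list of (h, w, C) arrays, one per box, matching its size
    mode:       "max" for pixel-wise maximum, anything else for the mean
    """
    total = np.zeros(shape)
    count = np.zeros(shape[:2] + (1,))
    best = np.full(shape, -np.inf)
    for (top, left, bottom, right), scores in zip(boxes, score_maps):
        window = (slice(top, bottom), slice(left, right))
        best[window] = np.maximum(best[window], scores)
        total[window] += scores
        count[window] += 1
    if mode == "max":
        # Pixels covered by no proposal fall back to a score of zero.
        return np.where(count > 0, best, 0.0)
    return total / np.maximum(count, 1)


def softmax(scores):
    """Class-conditional probabilities over the last (class) axis."""
    z = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble(prob_deconvnet, prob_fcn):
    """Mean of the two probability maps; a CRF would be applied afterwards."""
    return 0.5 * (prob_deconvnet + prob_fcn)
```

In this sketch the CRF would run on the output of `softmax` for DeconvNet alone, or on the output of `ensemble` for the combined model.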
Photo credit: Noh et al.
Training Deep Network
Adding a batch normalization layer to the output of every convolutional and deconvolutional layer.
Two-stage training: train on easy examples first, then fine-tune with more challenging ones.
Constructing easy examples: crop object instances using ground-truth annotations.
Limiting the variations in object location and size substantially reduces the search space for semantic segmentation.
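Constructing an easy example can be sketched as a crop around one ground-truth instance, so that the object sits roughly centered at a predictable scale. The function name and the `scale` margin factor are hypothetical, and the default value is an assumption rather than a figure taken from the slides.

```python
import numpy as np


def crop_instance(image, instance_mask, scale=1.2):
    """Crop the image and mask around one annotated object instance.

    image:         (H, W[, C]) array
    instance_mask: (H, W) boolean mask of a single object instance
    scale:         box enlargement factor around the tight bounding box
    """
    rows, cols = np.nonzero(instance_mask)
    top, bottom = int(rows.min()), int(rows.max()) + 1
    left, right = int(cols.min()), int(cols.max()) + 1
    # Grow the tight box symmetrically, then clip to the image bounds.
    pad_r = int(round((bottom - top) * (scale - 1.0) / 2.0))
    pad_c = int(round((right - left) * (scale - 1.0) / 2.0))
    top, bottom = max(top - pad_r, 0), min(bottom + pad_r, image.shape[0])
    left, right = max(left - pad_c, 0), min(right + pad_c, image.shape[1])
    return image[top:bottom, left:right], instance_mask[top:bottom, left:right]
```

Crops of this kind would feed the first training stage; the second stage would then fine-tune on harder examples where objects are not centered.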
Effect of Number of Proposals
Photo credit: Noh et al.
Quantitative Results
Table credit: Noh et al.
Qualitative Results
Photo credit: Noh et al.
Qualitative Results
Examples where FCN produces better results than DeconvNet.
Photo credit: Noh et al.
Qualitative Results
Examples where inaccurate predictions from DeconvNet and FCN are improved by the ensemble.
Photo credit: Noh et al.