Feedforward semantic segmentation with zoom-out features

MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH

TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO

Post on 21-Jan-2016



Main Ideas

Casting semantic segmentation as classifying a set of superpixels.

Extracting CNN features from different levels of spatial context around the superpixel at hand.

Using a multilayer perceptron (MLP) as the classifier.

Photo credit: Mostajabi et al.

Zoom-out feature extraction

Photo credit: Mostajabi et al.

Zoom-out feature extraction

Subscene-level features:

Take the bounding box of the superpixels within radius three of the superpixel at hand.

Warp the bounding box to 256 x 256 pixels.

Use the activations of the last fully connected layer.

Scene-level features:

Warp the whole image to 256 x 256 pixels.

Use the activations of the last fully connected layer.
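The subscene region can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes that "radius three" means at most three hops in the superpixel adjacency graph and that superpixels are given as an integer label map. The function names `superpixel_adjacency` and `subscene_bbox` are hypothetical.

```python
from collections import deque

import numpy as np


def superpixel_adjacency(labels):
    """Map each superpixel id to the set of ids it touches (4-connectivity)."""
    adj = {int(l): set() for l in np.unique(labels)}
    # Compare every pixel with its right neighbour and its bottom neighbour.
    pairs = ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :]))
    for a, b in pairs:
        diff = a != b
        for u, v in zip(a[diff].tolist(), b[diff].tolist()):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def subscene_bbox(labels, sp, radius=3):
    """Inclusive (top, left, bottom, right) box around all superpixels
    within `radius` graph hops of superpixel `sp` (assumed definition)."""
    adj = superpixel_adjacency(labels)
    dist = {sp: 0}
    queue = deque([sp])
    while queue:  # breadth-first search, cut off at `radius` hops
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    rows, cols = np.nonzero(np.isin(labels, list(dist)))
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())
```

The resulting box would then be cropped, warped to 256 x 256 pixels and passed through the network; that step is omitted here.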

Training

Extract features from both the image and its mirror image, then take the element-wise max over the two resulting feature vectors.

12416-dimensional representation for each superpixel.
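A minimal sketch of the mirror-image step, assuming a horizontal flip and a feature extractor passed in as a callable. The name `extract` is a hypothetical stand-in for the zoom-out feature pipeline, which is not shown in the slides.

```python
import numpy as np


def mirror_max_features(image, extract):
    """Element-wise max of the features of an image and of its mirror.

    `extract` is a hypothetical callable mapping an image array to a
    1-D feature vector; the mirror is assumed to be a horizontal flip.
    """
    feats = np.asarray(extract(image))
    feats_mirror = np.asarray(extract(image[:, ::-1]))
    return np.maximum(feats, feats_mirror)
```

Taking the max keeps the stronger response of each feature dimension, so the representation has the same length as a single feature vector.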

Training two classifiers:

Linear classifier (softmax)

MLP: hidden layer (1024 neurons) + ReLU + hidden layer (1024 neurons) with dropout
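The MLP classifier can be sketched as a plain forward pass. This is an illustration under stated assumptions, not the authors' implementation: the ReLU after the second hidden layer, the dropout probability of 0.5, the placement of dropout, the weight initialization, and the 21 output classes (20 VOC classes plus background) are all assumptions.

```python
import numpy as np


def init_mlp(rng, in_dim=12416, hidden=1024, n_classes=21):
    """Small random weights and zero biases for a 2-hidden-layer MLP."""
    sizes = [in_dim, hidden, hidden, n_classes]
    return [(rng.normal(0.0, 0.01, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x, rng=None, drop_prob=0.5):
    """Return class probabilities for a batch `x` of shape (N, in_dim).

    Dropout is applied only when `rng` is given (training mode).
    """
    (w1, b1), (w2, b2), (w3, b3) = params
    h = np.maximum(x @ w1 + b1, 0.0)   # hidden layer 1 + ReLU
    h = np.maximum(h @ w2 + b2, 0.0)   # hidden layer 2 + ReLU (assumed)
    if rng is not None:                # inverted dropout (assumed placement)
        keep = rng.random(h.shape) >= drop_prob
        h = h * keep / (1.0 - drop_prob)
    logits = h @ w3 + b3
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)
```

Dropping the two hidden layers and mapping the input directly to the softmax gives the linear classifier listed above.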

Loss Function

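Imbalanced dataset: weighted loss function

Loss function: let f_c be the frequency of class c in the training data and let the f_c sum to 1 over all classes; the log-loss of each example is scaled by the inverse frequency 1/f_c of its class.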
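A minimal sketch of a class-weighted log-loss, assuming each example is weighted by the inverse frequency of its ground-truth class. The helper names and the averaging over examples are assumptions made for illustration.

```python
import numpy as np


def class_frequencies(labels, n_classes):
    """Fraction of training examples in each class (sums to 1)."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts / counts.sum()


def weighted_log_loss(probs, labels, freqs):
    """Mean over examples of -log p(y_i) / f_{y_i}.

    `probs` has shape (N, n_classes); `labels` holds the N true class ids.
    Every class that occurs in `labels` must have a non-zero frequency.
    """
    n = len(labels)
    p_true = probs[np.arange(n), labels]
    return float(np.mean(-np.log(p_true) / freqs[labels]))
```

With this weighting a mistake on a rare class costs more than the same mistake on a frequent class such as background.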

Effect of Zoom-out Levels

Photo and Table credit: Mostajabi et al.

Figure panels: Image; Ground Truth; G1:3; G1:5; G1:5+S1; G1:5+S1+S2

Quantitative Results

Softmax Results on VOC 2012

Table credit: Mostajabi et al.

Quantitative Results

MLP Results

Table credit: Mostajabi et al.

Qualitative Results

Photo credit: Mostajabi et al.

Learning Deconvolution Network for Semantic Segmentation

NOH, HONG AND HAN

POSTECH, KOREA

Motivations

Photo credit: Noh et al.

Figure panels: Image; Ground Truth; FCN Prediction

Motivations

Photo credit: Noh et al.

Deconvolution Network Architecture

Photo credit: Noh et al.

Unpooling

Photo credit: Noh et al.
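A minimal sketch of unpooling, assuming the common formulation in which max pooling records the location ("switch") of each maximum and unpooling writes each pooled value back to that location, leaving zeros elsewhere. The 2 x 2 window, the single-channel input and the function names are assumptions for illustration.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W.

    Returns the pooled map and, per window, the flat index (0..3) of the max.
    """
    h, w = x.shape
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))
    return windows.max(axis=-1), windows.argmax(axis=-1)


def unpool_2x2(pooled, switches):
    """Place each pooled value at its recorded position; zeros elsewhere."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    rows, cols = np.indices((ph, pw))
    windows[rows, cols, switches] = pooled
    return (windows.reshape(ph, pw, 2, 2)
                   .transpose(0, 2, 1, 3)
                   .reshape(ph * 2, pw * 2))
```

The unpooled map is sparse: only one position per window is non-zero, which is why a learned filtering step typically follows it to densify the activations.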

Deconvolution

Photo credit: Noh et al.

Unpooling and Deconvolution Effects

Photo credit: Noh et al.

Pipeline

Generate 2K object proposals with Edge-Box and select the top 50 by objectness score.

Aggregate the segmentation maps generated for the individual proposals using a pixel-wise maximum or average.

Construct the class-conditional probability map using softmax.

Apply a fully connected CRF to the probability map.

Ensemble with FCN: compute the mean of the probability maps generated by DeconvNet and FCN, then apply the CRF.
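The aggregation, softmax and ensemble steps can be sketched as below. This is a rough illustration under assumptions: proposal boxes are given as (top, left, bottom, right) with exclusive bottom/right, each proposal's score map is zero outside its box, and the average is taken over all proposals. Proposal generation and the CRF are not shown, and all function names are hypothetical.

```python
import numpy as np


def aggregate_proposals(image_shape, boxes, score_maps, mode="max"):
    """Combine per-proposal class score maps into one (C, H, W) map.

    Each entry of `score_maps` has shape (C, box_height, box_width).
    """
    n_classes = score_maps[0].shape[0]
    h, w = image_shape
    total = np.zeros((n_classes, h, w))
    for (top, left, bottom, right), scores in zip(boxes, score_maps):
        region = total[:, top:bottom, left:right]  # view into `total`
        if mode == "max":
            np.maximum(region, scores, out=region)  # pixel-wise maximum
        else:
            region += scores                        # running sum for the average
    if mode != "max":
        total /= len(boxes)
    return total


def softmax(scores, axis=0):
    """Class-conditional probabilities along the class axis."""
    z = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ensemble(prob_deconvnet, prob_fcn):
    """Mean of two probability maps of identical shape."""
    return 0.5 * (prob_deconvnet + prob_fcn)
```

In the full pipeline, the output of `softmax` (or of `ensemble`) would be passed to the fully connected CRF.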

Photo credit: Noh et al.

Training Deep Network

Add a batch normalization layer to the output of every convolutional and deconvolutional layer.

Two-stage training: train on easy examples first, then fine-tune with more challenging ones.

Constructing easy examples: crop object instances using ground-truth annotations.

Limiting the variation in object location and size substantially reduces the search space for semantic segmentation.
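Constructing an easy example could look roughly like this. The sketch assumes a ground-truth bounding box in (top, left, bottom, right) form with exclusive bottom/right, and a hypothetical `scale` parameter that enlarges the box to keep some context. The default of 1.2 is illustrative and not taken from the slides.

```python
import numpy as np


def crop_instance(image, bbox, scale=1.2):
    """Crop a window centred on a ground-truth box.

    The box is enlarged by `scale` (hypothetical parameter) around its
    centre and clipped to the image borders.
    """
    top, left, bottom, right = bbox
    cy, cx = (top + bottom) / 2.0, (left + right) / 2.0
    half_h = scale * (bottom - top) / 2.0
    half_w = scale * (right - left) / 2.0
    h, w = image.shape[:2]
    t, b = max(int(round(cy - half_h)), 0), min(int(round(cy + half_h)), h)
    l, r = max(int(round(cx - half_w)), 0), min(int(round(cx + half_w)), w)
    return image[t:b, l:r]
```

Because every crop is centred on one object at a similar relative size, the network sees far less variation in location and scale during the first training stage.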

Effect of Number of Proposals

Photo credit: Noh et al.

Quantitative Results

Table credit: Noh et al.

Qualitative Results

Photo credit: Noh et al.

Qualitative Results

Examples where FCN produces better results than DeconvNet.

Photo credit: Noh et al.

Qualitative Results

Examples where inaccurate predictions from DeconvNet and FCN are improved by the ensemble.

Photo credit: Noh et al.