context encoders - stanford...

Context Encoders: Feature Learning by Inpainting

By Pathak et al. (2016)

Photo: Live on the Edge Photography

Unsupervised Semantic Feature Learning

Intro Related Work Main Contributions Results Conclusion

[Figure: tasks arranged along two axes, "More supervised" and "More semantic". Labels: ImageNet, Image Captioning, Learning to Generate Chairs, Image reconstruction, Semantic Inpainting, GAN, Image denoising, Context Prediction, Odometry Prediction]

[Figure: inputs given as a pair ( , ); task: learn a mapping f( ) = ]

Semantic Inpainting


Photo: Live on the Edge Photography

Semantic Inpainting

+ For large regions, requires semantics
+ Unsupervised
- Ill-posed (not well-defined)


Photo: Zhang et al (ECCV 2016)

Hypothesis Selection in Semantic Inpainting

How to choose between possibilities?

L2: Choose them all (averages the plausible fills, which tends to look blurry)

Adversarial: Pick the most believable
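
A minimal sketch of why an L2 loss "chooses them all". It assumes two equally plausible fills for the same missing strip; the pixel values and the helper name `expected_l2` are hypothetical and chosen only for illustration. The prediction that minimizes the expected L2 error is the per-pixel mean of the plausible fills, which matches neither of them.

```python
import numpy as np

# Two equally plausible fills for the same missing 1x4 strip (hypothetical values):
# a bright region on the left, or a bright region on the right.
hypotheses = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

def expected_l2(prediction, targets):
    """Mean squared error of one prediction, averaged over all plausible targets."""
    return float(np.mean((targets - prediction) ** 2))

# The minimizer of expected squared error is the per-pixel mean of the targets.
l2_optimal = hypotheses.mean(axis=0)

loss_mean = expected_l2(l2_optimal, hypotheses)      # blurry average
loss_sharp = expected_l2(hypotheses[0], hypotheses)  # one sharp hypothesis

print("L2-optimal fill:", l2_optimal)
print("loss of blurry average:", loss_mean)
print("loss of a sharp hypothesis:", loss_sharp)
```

The blurry average scores a lower L2 loss than either sharp fill, which is the motivation for adding a loss that rewards believable outputs instead.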


Photo: Pathak et al. (2016)

Related Work

Visual Memex

Creates a graph of previously seen objects and compares the query image to the graph


Malisiewicz et al. (2009)


Learning to Generate Chairs

Dosovitskiy et al. (2015)



Autoencoders


Shinya Yuki (2016)


Context Prediction


Doersch et al. (2016)


Learning to See by Moving


Agrawal et al. (2015)

Main Contributions


Context-Aware L2

10x scaled loss in context region
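
A rough sketch of a 10x scaled L2 loss, assuming the "context region" is a band of pixels around the edge of the predicted patch where it overlaps the known context. The function names `weighted_l2_loss` and `border_mask` and the band width are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def border_mask(height, width, border):
    """Boolean mask that is True on a band of `border` pixels around the patch edge."""
    mask = np.ones((height, width), dtype=bool)
    mask[border:height - border, border:width - border] = False
    return mask

def weighted_l2_loss(prediction, target, overlap_mask, overlap_weight=10.0):
    """Mean squared error where pixels in overlap_mask count overlap_weight times as much."""
    weights = np.where(overlap_mask, overlap_weight, 1.0)
    return float(np.mean(weights * (prediction - target) ** 2))

# Toy example: an 8x8 predicted patch with a 2-pixel band treated as the context region.
prediction = np.ones((8, 8))
target = np.zeros((8, 8))
mask = border_mask(8, 8, border=2)
loss = weighted_l2_loss(prediction, target, mask)
print("weighted L2 loss:", loss)
```

Weighting the band more heavily penalizes visible seams where the predicted patch meets the surrounding image.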


[Figure: inputs given as a pair ( , ); label: Random Patches]


AlexNet Architecture


Channel-Wise Fully Connected

Followed by 1x1 convolution to propagate across channels


Parameters: 100M → <0.4M
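
A small sketch of a channel-wise fully connected layer and the parameter saving it gives. It assumes m = 256 feature maps of size 6x6, which is not stated on the slide; the function names `channelwise_fc` and `param_counts` are hypothetical. Each channel gets its own dense matrix over spatial positions, so no information crosses channels at this step.

```python
import numpy as np

def channelwise_fc(features, weights):
    """Apply an independent fully connected layer to each channel.

    features: (m, n, n) feature maps
    weights:  (m, n*n, n*n), one dense matrix per channel
    returns:  (m, n, n)
    """
    m, n, _ = features.shape
    flat = features.reshape(m, n * n)
    # For each channel c: out[c] = weights[c] @ flat[c]
    out = np.einsum("cij,cj->ci", weights, flat)
    return out.reshape(m, n, n)

def param_counts(m, n):
    """Weight counts (no biases) for a dense layer vs. a channel-wise one."""
    dense = (m * n * n) ** 2        # every unit connected to every unit
    channelwise = m * (n * n) ** 2  # one n^2 x n^2 matrix per channel
    return dense, channelwise

# Assumed sizes: 256 channels of 6x6 features.
dense, channelwise = param_counts(m=256, n=6)
print("dense:", dense, "channel-wise:", channelwise)
```

Under these assumed sizes the dense layer needs roughly 85M weights, the same order as the 100M figure above, and the channel-wise layer needs about 0.33M, consistent with <0.4M.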

Context Encoder Architecture


Context Encoder Architecture Continued


GAN Objective:

Adversarial Loss Term:

Context Encoder Objective:
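
For reference, a sketch of the three objectives named above, written in the notation used by Pathak et al. (2016). The symbols are assumptions where the slide does not show them: F is the context encoder, D the discriminator, \hat{M} the binary mask of the dropped region, and \lambda_{rec}, \lambda_{adv} the loss weights.

```latex
% GAN objective (generator G, discriminator D)
\min_G \max_D \;
  \mathbb{E}_{x \in \mathcal{X}}\big[\log D(x)\big]
  + \mathbb{E}_{z \in \mathcal{Z}}\big[\log\big(1 - D(G(z))\big)\big]

% Adversarial loss term, with the context encoder F in the role of the generator
\mathcal{L}_{adv} = \max_D \;
  \mathbb{E}_{x \in \mathcal{X}}\Big[\log D(x)
  + \log\Big(1 - D\big(F((1 - \hat{M}) \odot x)\big)\Big)\Big]

% Reconstruction (L2) term on the masked region
\mathcal{L}_{rec}(x) =
  \big\lVert \hat{M} \odot \big(x - F((1 - \hat{M}) \odot x)\big) \big\rVert_2^2

% Context encoder objective: weighted sum of the two terms
\mathcal{L} = \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{adv}\,\mathcal{L}_{adv}
```

The adversarial term differs from the plain GAN objective in that the generator input is the masked image rather than a noise vector.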

Results


Feature Transfer Evaluation Methodology


● Feature transfer capability evaluated on three tasks:
    a. Classification pretraining
    b. Detection pretraining
    c. Semantic segmentation pretraining

● Compared against:
    a. Random weight initialization
    b. Autoencoder initialization
    c. Learning to See by Moving (Agrawal et al.)
    d. Context prediction (Doersch et al.)
    e. Unsupervised learning with videos (Wang et al.)

Further Details


● Classification
    ○ Pascal VOC 2007 dataset
    ○ ~10000 images for training
    ○ Output generated by voting from 10 random croppings of input image

● Detection
    ○ Pascal VOC 2007 Detection Challenge dataset
    ○ Fast R-CNN method (Girshick, 2015) used to generate detection hypotheses from features

● Segmentation
    ○ Pascal VOC 2012 dataset
    ○ Fully convolutional network (FCN) (Shelhamer et al., 2015) used to generate segmentation hypotheses from features
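
The voting over 10 random croppings used for classification can be sketched as follows. This is an illustration under assumptions: `score_fn` is a hypothetical stand-in for the fine-tuned classifier, the 227-pixel crop size is assumed, and averaging the per-crop scores (a soft vote) is one plausible reading of "voting".

```python
import numpy as np

def random_crops(image, crop_size, num_crops, rng):
    """Cut num_crops square crops at uniformly random positions from an HxWxC image."""
    h, w = image.shape[:2]
    crops = []
    for _ in range(num_crops):
        top = rng.integers(0, h - crop_size + 1)
        left = rng.integers(0, w - crop_size + 1)
        crops.append(image[top:top + crop_size, left:left + crop_size])
    return np.stack(crops)

def vote_over_crops(image, score_fn, crop_size=227, num_crops=10, seed=0):
    """Average the class scores that score_fn (hypothetical classifier) gives each crop."""
    rng = np.random.default_rng(seed)
    crops = random_crops(image, crop_size, num_crops, rng)
    scores = np.stack([score_fn(crop) for crop in crops])
    return scores.mean(axis=0)

# Toy usage with a dummy scorer that uses mean brightness as a two-class score.
image = np.random.default_rng(1).random((256, 256, 3))
dummy_scorer = lambda crop: np.array([crop.mean(), 1.0 - crop.mean()])
print(vote_over_crops(image, dummy_scorer))
```

Averaging over several crops reduces the sensitivity of the prediction to where the object happens to sit in the frame.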


Pretraining Results

Doersch et al. (Modified): 65.3%  51.1%


Inpainting Results


Encoded Features Nearest Neighbors

Recapitulation



Paper Contributions

● Idea of using semantic inpainting as a supervisory signal for unsupervised feature learning
● Idea of using adversarial loss as a modular loss function that can be combined with other losses
● Qualitatively nice inpainting results


Negatives of Paper

● Seemed to be two “separate tasks”
    a. Unsupervised feature learning
    b. Semantic inpainting
● No feature transfer results for context encoder
● No results for how adversarial loss affects pre-trainability of context encoder features
● Worked on par with other pre-training methods

Semantic Inpainting

Feature Learning
