context encoders - stanford...
Context Encoders: Feature Learning by Inpainting
By Pathak et al. (2016)
Photo: Live on the Edge Photography
Unsupervised Semantic Feature Learning
Intro Related Work Main Contributions Results Conclusion
[Figure: tasks arranged along two axes, "More supervised" and "More semantic": ImageNet, Image Captioning, Learning to Generate Chairs, Image Reconstruction, Semantic Inpainting, GAN, Image Denoising, Context Prediction, Odometry Prediction]
Inputs: ( [image] , [image] )
Task: Learn a function f( [image] ) = [image]
Semantic Inpainting
Photo: Live on the Edge Photography
Semantic Inpainting
+ For large regions, requires semantics
+ Unsupervised
- Ill-posed (not well-defined)
Photo: Zhang et al. (ECCV 2016)
Hypothesis Selection in Semantic Inpainting
How to choose between possibilities?
L2: Choose them all
Adversarial: Pick the most believable
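The contrast can be illustrated with a toy case. Assume a missing pixel has two equally plausible values (the hypothetical values 0.0 and 1.0 below). The prediction that minimizes expected squared error is their weighted mean, which matches neither hypothesis; this is the sense in which L2 "chooses them all." Picking one believable answer is sketched here, as a stand-in only, by returning the most probable candidate. The function names are illustrative and not taken from the paper.

```python
def l2_optimal_prediction(candidates, probs):
    """Prediction minimizing expected squared error: the probability-weighted mean."""
    return sum(p * c for c, p in zip(candidates, probs))


def expected_l2(prediction, candidates, probs):
    """Expected squared error of a single prediction over the candidate values."""
    return sum(p * (prediction - c) ** 2 for c, p in zip(candidates, probs))


def most_believable(candidates, probs):
    """Stand-in for mode selection: return the single most probable candidate."""
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best]


candidates = [0.0, 1.0]   # hypothetical pixel values
probs = [0.5, 0.5]        # hypothetical, equally likely
blend = l2_optimal_prediction(candidates, probs)
```

The blended value has lower expected L2 error than either candidate, which is why an L2-only objective tends toward averaged, blurry fills.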
Photo: Pathak et al. (2016)
Related Work
Visual Memex
Creates a graph of previously seen objects and compares the query image to that graph
Malisiewicz et al. (2009)
Learning to Generate Chairs
Dosovitskiy et al. (2015)
Autoencoders
Shinya Yuki (2016)
Context Prediction
Doersch et al. (2015)
Learning to See by Moving
Agrawal et al. (2015)
Main Contributions
Context Aware L2
10x scaled loss in context region,
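A minimal sketch of a region-weighted L2 loss, assuming images are given as flat lists of pixel values and a 0/1 flag marks the pixels that receive the 10x weight. Which pixels are flagged, and the names `weighted_l2` and `emphasized`, are assumptions for illustration; the slide states only the 10x scale.

```python
def weighted_l2(pred, target, emphasized, scale=10.0):
    """Sum of squared errors in which flagged pixels count `scale` times."""
    total = 0.0
    for p, t, flag in zip(pred, target, emphasized):
        weight = scale if flag else 1.0
        total += weight * (p - t) ** 2
    return total


pred = [0.0, 0.5, 1.0]      # hypothetical predicted pixels
target = [1.0, 0.5, 0.0]    # hypothetical ground-truth pixels
loss = weighted_l2(pred, target, emphasized=[1, 0, 0])
```

With the first pixel flagged, its unit error contributes 10 to the loss while the equally wrong third pixel contributes 1, so the optimizer is pushed hardest toward getting the emphasized region right.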
Inputs: ( [image] , [image] )
Random Patches
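One way to picture random-patch masking is sketched below, assuming an image stored as a list of rows and a single rectangular block dropped at a random position. The single block, the block size, and the helper names are simplifying assumptions; the slide says only "Random Patches".

```python
import random


def random_block_mask(height, width, block_h, block_w, rng):
    """Return a height x width grid of 0/1 with one random block set to 1 (masked)."""
    top = rng.randrange(height - block_h + 1)
    left = rng.randrange(width - block_w + 1)
    mask = [[0] * width for _ in range(height)]
    for r in range(top, top + block_h):
        for c in range(left, left + block_w):
            mask[r][c] = 1
    return mask


def apply_mask(image, mask, fill=0.0):
    """Replace masked pixels with `fill`, leaving the surrounding context untouched."""
    return [[fill if m else px for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


rng = random.Random(0)                      # seeded for repeatability
mask = random_block_mask(8, 8, 3, 3, rng)   # hypothetical 8x8 image, 3x3 block
image = [[1.0] * 8 for _ in range(8)]
masked = apply_mask(image, mask)
```

The masked image is what the encoder would see, and the mask itself identifies which pixels the network is asked to fill in.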
AlexNet Architecture
Channel-Wise Fully Connected
Followed by 1x1 convolution to propagate across channels
100M → <0.4M parameters
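The parameter saving can be checked with a short count. The sketch below assumes a feature map of 256 channels at 6x6 resolution (an AlexNet-style pool5 shape, which is an assumption and not stated on the slide) and ignores biases. A fully connected layer links every unit to every unit, while the channel-wise version keeps one spatial map per channel and leaves cross-channel mixing to the 1x1 convolution.

```python
def fully_connected_params(channels, size):
    """Dense layer mapping every input unit to every output unit (biases ignored)."""
    units = channels * size * size
    return units * units


def channel_wise_fc_params(channels, size):
    """One dense (size*size -> size*size) map per channel, no cross-channel weights."""
    per_channel = (size * size) ** 2
    return channels * per_channel


dense = fully_connected_params(256, 6)          # 84,934,656
channel_wise = channel_wise_fc_params(256, 6)   # 331,776
```

Under these assumed dimensions the counts land at roughly 85M versus 0.33M, the same order of magnitude as the figures on the slide, with the reduction factor equal to the number of channels.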
Context Encoder Architecture
Context Encoder Architecture Continued
GAN Objective:
Adversarial Loss Term:
Context Encoder Objective:
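The equations for these three labels did not survive extraction. The forms below are a reconstruction based on the standard GAN objective and the notation of Pathak et al. (2016), with F the context encoder, M the binary mask of the dropped region, and ⊙ the element-wise product. Treat them as a sketch to check against the paper, not as a copy of the slide.

```latex
\begin{align}
\text{GAN objective:}\quad
  & \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
    + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right] \\
\text{Adversarial loss term:}\quad
  & \mathcal{L}_{adv} = \max_D \; \mathbb{E}_{x}\left[\log D(x)
    + \log\left(1 - D\left(F((1 - M) \odot x)\right)\right)\right] \\
\text{Reconstruction loss term:}\quad
  & \mathcal{L}_{rec}(x) = \left\| M \odot \left(x - F((1 - M) \odot x)\right) \right\|_2^2 \\
\text{Context encoder objective:}\quad
  & \mathcal{L} = \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{adv}\,\mathcal{L}_{adv}
\end{align}
```

The adversarial term replaces the generator's noise input z with the masked image, so the discriminator judges filled-in images against real ones, and the two weights trade off pixel accuracy against realism.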
Results
Feature Transfer Evaluation Methodology
● Feature transfer capability evaluated on three tasks:
  a. Classification pretraining
  b. Detection pretraining
  c. Semantic segmentation pretraining
● Compared against:
  a. Random weight initialization
  b. Autoencoder initialization
  c. Learning to see by moving (Agrawal et al.)
  d. Context prediction (Doersch et al.)
  e. Unsupervised learning with videos (Wang et al.)
Further Details
● Classification
  ○ Pascal VOC 2007 Dataset
  ○ ~10000 images for training
  ○ Output generated by voting from 10 random croppings of input image
● Detection
  ○ Pascal VOC 2007 Detection Challenge Dataset
  ○ Fast R-CNN method (Girshick, 2015) used to generate detection hypotheses from features
● Segmentation
  ○ Pascal VOC 2012 Dataset
  ○ Fully convolutional network (FCN) (Shelhamer et al., 2015) used to generate segmentation hypotheses from features
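The crop-voting step for classification can be sketched as follows, assuming "voting" means averaging per-class scores over the crops; the slide does not specify the aggregation rule, so this is one plausible reading. The example uses three hypothetical crops and two classes for brevity instead of ten crops, and returns a single winning class even though VOC classification is scored per class.

```python
def vote_over_crops(crop_scores):
    """Average per-class scores across crops; return (winning class index, mean scores)."""
    num_crops = len(crop_scores)
    num_classes = len(crop_scores[0])
    mean_scores = [sum(scores[k] for scores in crop_scores) / num_crops
                   for k in range(num_classes)]
    winner = max(range(num_classes), key=lambda k: mean_scores[k])
    return winner, mean_scores


# Hypothetical scores: one row per crop, one column per class.
crop_scores = [[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
winner, mean_scores = vote_over_crops(crop_scores)
```

Averaging lets a single misleading crop (the second row here) be outvoted by the others.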
Pretraining Results
Doersch et al. 65.3% 51.1% Modified
Inpainting Results
Encoded Features Nearest Neighbors
Recapitulation
Paper Contributions
● Idea of using semantic inpainting as a supervisory signal for unsupervised feature learning
● Idea of using adversarial loss as a modular loss function that can be combined with other losses
● Qualitatively nice inpainting results
Negatives of Paper
● Seemed to be two “separate tasks”
  a. Unsupervised feature learning
  b. Semantic inpainting
● No feature transfer results for context encoder
● No results for how adversarial loss affects pre-trainability of context encoder features
● Worked on par with other pre-training methods
Semantic Inpainting
Feature Learning