![Page 1: Tables and Cars with Convolutional Networksweb.stanford.edu/class/cs331b/2016/presentations/paper6.pdf · Tables and Cars with Convolutional Networks Alexey Dosovitskiy, Jost Tobias](https://reader034.vdocument.in/reader034/viewer/2022042220/5ec6130262bf6e599008e744/html5/thumbnails/1.jpg)
Learning to Generate Chairs, Tables and Cars with Convolutional Networks
Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox
Liu Jiang and Ian Tam
Introduction and Related Work
Overview (Part 1)
● Goal: Using a dataset of 3D models (chairs, tables, and cars), train generative ‘up-convolutional’ neural networks that can generate realistic 2D projections of objects from high-level descriptions:
  ○ Object style
  ○ Viewpoint
  ○ Additional transformation parameters (e.g. color and brightness)
Overview (Part 2)
● The networks do not merely memorize images but find a meaningful representation of 3D models, allowing them to:
  ○ Transfer knowledge within an object class
  ○ Transfer knowledge between classes
  ○ Interpolate between different objects within a class and between classes
  ○ Invent new objects not present in the training set
Related Work
● Train undirected graphical models, which treat encoding and generation as a joint inference problem
  ○ Deep Boltzmann Machines (DBMs)
  ○ Restricted Boltzmann Machines (RBMs)
● Train directed graphical models of the data distribution
  ○ Gaussian mixture models
  ○ Autoregressive models
  ○ Stochastic variations of neural networks
Previous Work vs. This Paper
● Previous work
  ○ Unsupervised generative models that can be extended to incorporate label information, forming semi-supervised models
  ○ Restricted to small models and images (at most 48 x 48 pixels)
  ○ Require an extensive inference procedure for both training and image generation
● This paper
  ○ Uses supervised learning and assumes a high-level latent representation of the images is given
  ○ Generates large, high-quality images of 128 x 128 pixels
  ○ Gives complete control over which images to generate; the downside is the need for labels that fully describe the appearance of each image
Network Architectures and Training
Network Architecture
● The targets are the RGB output image x and the segmentation mask s. The generative network g(c, v, θ) takes three input vectors:
  ○ c: model style (a one-hot encoding of the model identity)
  ○ v: horizontal angle and elevation of the camera position
  ○ θ: parameters of additional transformations applied to the images
● Mostly generated 128 x 128 pixel images, but also experimented with 64 x 64 and 256 x 256
  ○ The only difference between the architectures is one fewer or one more up-convolution
  ○ Adding a convolutional layer after each up-convolution improves the quality of the generated images
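The three inputs of g(c, v, θ) can be assembled as in the minimal sketch below. It assumes three things: c is a one-hot vector over the training models, v stores the sines and cosines of azimuth and elevation, and θ is a plain float vector. The function name `encode_inputs`, the model count and the two θ entries are hypothetical.

```python
import numpy as np

def encode_inputs(model_id, num_models, azimuth_deg, elevation_deg, theta):
    """Build the (c, v, theta) input vectors for a generator g(c, v, theta).

    Assumed (hypothetical) layout, for illustration only:
      - c: one-hot vector identifying the 3D model, i.e. its style
      - v: sin/cos of azimuth and elevation, so 0 and 360 degrees coincide
      - theta: float vector of additional transformation parameters
    """
    c = np.zeros(num_models, dtype=np.float32)
    c[model_id] = 1.0

    az, el = np.deg2rad(azimuth_deg), np.deg2rad(elevation_deg)
    v = np.array([np.sin(az), np.cos(az), np.sin(el), np.cos(el)], dtype=np.float32)

    theta = np.asarray(theta, dtype=np.float32)
    return c, v, theta

# Model 3 of 100 hypothetical models, camera at 30 deg azimuth and 20 deg elevation,
# plus two hypothetical transformation parameters (brightness scale, hue shift).
c, v, theta = encode_inputs(3, 100, 30.0, 20.0, [1.1, 0.0])
```

The sin/cos encoding is a common way to keep a periodic angle continuous. Under this sketch's assumptions, a viewpoint of 359 degrees lands next to 0 degrees in input space.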
2-Stream Network Architecture (FC: fully connected, unconv: unpooling + convolution)
Build a shared, high-dimensional hidden representation
Generate an image and an object segmentation mask
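The "unconv" layers (unpooling followed by convolution) can be sketched for a single channel as below. The sketch rests on three assumptions: an unpooling scheme that places each value in the top-left corner of a 2 x 2 block of zeros, an arbitrary stand-in kernel, and a hypothetical 8 x 8 starting feature map for the stage count. Real layers have many channels and learned kernels. Because each unconv doubles the resolution, moving between 64 x 64, 128 x 128 and 256 x 256 outputs changes the depth by exactly one stage.

```python
import math
import numpy as np

def unpool2x2(fmap):
    """Unpooling: each value becomes the top-left entry of a 2x2 block of zeros.
    fmap has shape (H, W); the output has shape (2H, 2W)."""
    h, w = fmap.shape
    out = np.zeros((2 * h, 2 * w), dtype=fmap.dtype)
    out[::2, ::2] = fmap
    return out

def conv2d_same(x, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def unconv(fmap, kernel):
    """One 'unconv' step: unpooling, then a convolution that fills in the zeros."""
    return conv2d_same(unpool2x2(fmap), kernel)

def num_unconv_stages(output_size, start_size=8):
    """Number of 2x upsampling stages from start_size to output_size
    (start_size=8 is an assumption, not a detail from the slides)."""
    ratio = output_size // start_size
    if ratio == 0 or ratio * start_size != output_size or ratio & (ratio - 1):
        raise ValueError("output_size must be start_size times a power of two")
    return int(math.log2(ratio))

fmap = np.arange(16, dtype=np.float64).reshape(4, 4)
kernel = np.ones((3, 3))   # hypothetical kernel; a trained network learns these
up = unconv(fmap, kernel)  # shape (8, 8)
stages = {size: num_unconv_stages(size) for size in (64, 128, 256)}  # 3, 4, 5
```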
Network Training
Network parameters W are trained by minimizing the error of reconstructing the segmented-out chair image and the segmentation mask.
Qualitative results with different networks trained on chairs
Per-pixel mean squared error of generated images and number of parameters in the expanding network parts
The “1s-S-deep” network is best both qualitatively and quantitatively
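The training objective can be sketched as the squared error on the masked ("segmented-out") RGB image plus the squared error on the mask. The weighting factor `lam` and the use of squared error for the mask term are assumptions for illustration. `per_pixel_mse` is the kind of per-pixel error used to compare the networks.

```python
import numpy as np

def reconstruction_loss(x_pred, s_pred, x_target, s_target, lam=1.0):
    """Squared-error loss on the segmented-out RGB image plus the segmentation mask.

    x_*: (H, W, 3) RGB images; s_*: (H, W) masks in [0, 1].
    The target image is multiplied by its mask so the background is zeroed.
    lam weights the RGB term; its value here is a hypothetical default.
    """
    x_segmented = x_target * s_target[..., None]
    rgb_term = np.sum((x_pred - x_segmented) ** 2)
    mask_term = np.sum((s_pred - s_target) ** 2)
    return lam * rgb_term + mask_term

def per_pixel_mse(x_pred, x_target):
    """Mean squared error averaged over all pixels and channels."""
    return float(np.mean((x_pred - x_target) ** 2))

x_target = np.ones((4, 4, 3))
s_target = np.zeros((4, 4))
s_target[1:3, 1:3] = 1.0
# A perfect prediction of the masked image and of the mask gives zero loss.
perfect = reconstruction_loss(x_target * s_target[..., None], s_target, x_target, s_target)
```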
Training Set Size and Data Augmentation
● Fixed the network architecture and varied the training set size, with and without data augmentation
  ○ The effect of data augmentation is qualitatively similar to increasing the training set size
  ○ Both lead to worse reconstruction of fine details but better generalization
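Color and brightness augmentation of the kind listed under θ can be sketched as below. This is one possible approach under stated assumptions: the ranges are hypothetical, geometric transformations are omitted, and the sampled values are returned as a θ-like vector so the generator can be told which transformation to produce.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_colors(image, mask, rng, max_brightness=0.2, max_channel_scale=0.1):
    """Randomly change the brightness and per-channel color of the foreground object.

    image: (H, W, 3) floats in [0, 1]; mask: (H, W) in {0, 1}.
    Ranges are hypothetical. Returns the augmented image and a theta-like vector
    [brightness, scale_r, scale_g, scale_b].
    """
    brightness = 1.0 + rng.uniform(-max_brightness, max_brightness)
    channel_scale = 1.0 + rng.uniform(-max_channel_scale, max_channel_scale, size=3)
    augmented = image * brightness * channel_scale               # broadcasts over H, W
    augmented = np.clip(augmented, 0.0, 1.0) * mask[..., None]   # keep background at zero
    params = np.concatenate([[brightness], channel_scale])
    return augmented, params

image = np.full((8, 8, 3), 0.5)
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
augmented, theta = augment_colors(image, mask, rng)
```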
Qualitative results for different numbers of car models in the training set
Interpolation between two car models (top: without data augmentation; bottom: with data augmentation)
Key Experiments / Results
Modeling Transformations
Viewpoint Interpolation
Elevation Transfer / Extrapolation
● A network trained on both tables and chairs can transfer knowledge about elevations from the table dataset to the chair dataset and vice versa
● Training on both object classes forces the network to model general 3D geometry
Style Interpolation
● Interpolation between feature/label input vectors
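Interpolation between two input vectors can be sketched as a linear blend. The same blend applies to feature vectors taken from a hidden layer. The choice of one-hot codes, the number of styles and the step count here are hypothetical. Each blended row would be fed to the generator with the viewpoint and θ held fixed.

```python
import numpy as np

def interpolate_styles(c_a, c_b, num_steps):
    """Linearly interpolate between two style vectors.

    Returns an array of shape (num_steps, len(c_a)); row 0 is c_a and the last
    row is c_b. Intermediate rows mix the two styles.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - alphas) * c_a[None, :] + alphas * c_b[None, :]

c_a = np.eye(5)[1]   # hypothetical one-hot codes for two of five chairs
c_b = np.eye(5)[3]
path = interpolate_styles(c_a, c_b, 5)
# path[2] is the halfway point: weight 0.5 on each of the two chairs.
```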
Style Interpolation II
● Interpolation between multiple chairs
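Interpolating between more than two chairs can be sketched as a convex combination of several style vectors. The grid layout below, with four chairs at the corners blended bilinearly, is an illustrative assumption rather than the exact arrangement on the slide.

```python
import numpy as np

def bilinear_style_grid(corners, rows, cols):
    """Blend four style vectors placed at the corners of a rows x cols grid.

    corners: array (4, D) ordered top-left, top-right, bottom-left, bottom-right.
    Returns (rows, cols, D); every entry is a convex combination of the corners.
    """
    tl, tr, bl, br = corners
    u = np.linspace(0.0, 1.0, cols)[None, :, None]   # horizontal blend weight
    w = np.linspace(0.0, 1.0, rows)[:, None, None]   # vertical blend weight
    top = (1 - u) * tl + u * tr
    bottom = (1 - u) * bl + u * br
    return (1 - w) * top + w * bottom

corners = np.eye(4)   # four hypothetical chairs with one-hot codes
grid = bilinear_style_grid(corners, 3, 3)
```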
Feature Space Arithmetic
Correspondences
● Given two images from the training set, generate style interpolations between the two (say, 64 images)
● Compute optical flow between neighboring interpolated images, refine it, and concatenate the flows to determine correspondences between objects in the two images
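The concatenation step can be sketched as follows. The sketch assumes the per-frame flows between neighboring interpolated images are already available, for example from an off-the-shelf optical flow method, and it omits the refinement step. Looking up flows at the nearest pixel is a simplification of proper bilinear sampling.

```python
import numpy as np

def chain_flows(flows):
    """Concatenate per-frame optical flows into one correspondence field.

    flows: list of (H, W, 2) arrays; flows[k][y, x] = (dx, dy) maps a pixel of
    frame k to frame k+1. Returns an (H, W, 2) array holding, for each pixel of
    frame 0, its (x, y) position in the last frame.
    """
    h, w, _ = flows[0].shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    pos = np.stack([xs, ys], axis=-1)                 # current (x, y) of each tracked pixel
    for flow in flows:
        xi = np.clip(np.rint(pos[..., 0]).astype(int), 0, w - 1)
        yi = np.clip(np.rint(pos[..., 1]).astype(int), 0, h - 1)
        pos = pos + flow[yi, xi]                       # follow the flow one frame further
    return pos

# Three hypothetical flows that each shift everything one pixel to the right.
flows = [np.tile([1.0, 0.0], (4, 6, 1)) for _ in range(3)]
corr = chain_flows(flows)
```

With 64 interpolated images there would be 63 such flows to chain.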
Analysis of the Network
Reminder: “2s-E” Network Architecture
Images Generated from Single Unit Activations in Feature Maps of Different Fully Connected Layers
Activating neurons of FC-1 and FC-2 feature maps of the class stream while fixing viewpoint and transformation inputs
Activating neurons of FC-3 and FC-4 feature maps of the class stream with non-fixed viewpoints
‘Zoom Neuron’
Increasing the activation of a specialized neuron while keeping all other activations fixed results in these transformations
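The probing procedure can be sketched generically. `toy_decoder` is a hypothetical stand-in (a fixed random linear map), not the trained network, so its outputs will not show zooming. The sketch only illustrates the mechanics of varying one unit while holding every other activation fixed.

```python
import numpy as np

def sweep_unit(decoder, hidden, unit, values):
    """Generate one image per activation value of a single hidden unit.

    decoder: callable mapping a hidden vector to an image (a stand-in for the
    layers after the probed one). All other activations in `hidden` stay fixed.
    """
    images = []
    for value in values:
        h = hidden.copy()
        h[unit] = value          # change only the probed unit
        images.append(decoder(h))
    return np.stack(images)

# Hypothetical stand-in decoder: a fixed random linear map reshaped to a 4x4 "image".
rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 8))
toy_decoder = lambda h: (weights @ h).reshape(4, 4)

base = rng.normal(size=8)
frames = sweep_unit(toy_decoder, base, unit=2, values=np.linspace(0.0, 3.0, 4))
```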
Single neurons in later layers produce edge-like images. Neurons of higher deconvolutional layers generate blurry ‘clouds’.
Images Generated from Single Neuron Activations in Feature Maps of Some Layers of the “2s-E” Network
(Panels: Unconv-2, Unconv-1, FC-5)
Network Can Generate Fine Details Through a Combination of Spatially Neighboring Neurons
Smooth interpolation between a single activation and the whole chair: neurons are activated in the center, and the size of the center region is increased from 2 x 2 to 8 x 8.
Interaction of neighboring neurons is important. In the center, where many neurons are active, the image is sharp, while in the periphery it is blurry.
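The activation patterns in this experiment can be sketched as centered blocks of growing size. Each pattern would then be passed through the remaining layers. The sketch assumes a single 8 x 8 feature map and a fixed activation value, both hypothetical.

```python
import numpy as np

def center_activation(map_size, region, value=1.0):
    """Feature map whose centered region x region block of units is active."""
    fmap = np.zeros((map_size, map_size))
    start = (map_size - region) // 2
    fmap[start:start + region, start:start + region] = value
    return fmap

# Center regions growing from 2 x 2 to 8 x 8 (the full map under this assumption).
maps = [center_activation(8, r) for r in (2, 4, 6, 8)]
```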
Conclusion and Recap
● Supervised training of CNNs can be used to generate images given high-level information
● Network does not simply learn to generate training samples but instead learns an implicit 3D shape and geometry representation
● When trained stochastically, the network can even invent new chair styles
Other Approaches to Generative Networks
Generative Adversarial Networks
Deep Convolutional Generative Adversarial Networks
● Generator network A generates images
● Discriminator network B distinguishes generated images from real images
● Backpropagate through both the generator and the discriminator:
  ○ The discriminator learns to distinguish real images from generated images
  ○ The generator learns to “fool” the discriminator by generating images similar to real images
● Ideally, the generator improves until the discriminator can no longer distinguish generated images from real ones
● However, training can be unstable: the generator solution may oscillate or collapse
(Figures: Generator Architecture; Generator-Discriminator Network)
Radford, Metz and Chintala
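The two competing objectives can be sketched with the standard binary cross-entropy losses from the original GAN formulation, using the common "non-saturating" generator loss. This is the textbook formulation, not necessarily the exact loss used by Radford, Metz and Chintala. The networks that produce the probabilities are omitted.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(generated) toward 0.
    d_real, d_fake: lists of discriminator probabilities in (0, 1)."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: push D(generated) toward 1 ('fool' D)."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# At the ideal equilibrium the discriminator outputs 0.5 everywhere:
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 4))  # 1.3863 (= 2 ln 2)
print(round(generator_loss([0.5, 0.5]), 4))                  # 0.6931 (= ln 2)
```

Training alternates gradient steps on the two losses. The oscillation mentioned on the slide arises because each step changes the objective the other network is optimizing.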
Bedrooms in Latent Space
Face Rotations
Face Arithmetic
Generated Faces and Albums
InfoGAN
● Maximizes the mutual information between a subset of the latent variables (the latent codes) and the generated observations
● Learns disentangled representations: each latent code corresponds to some meaningful variable in semantic space (e.g. viewing angle, lighting)
Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
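In InfoGAN the mutual-information term is maximized through a variational lower bound. An auxiliary network Q predicts the latent code from the generated image. The sketch below computes that bound for a uniform categorical code. The function name and inputs are illustrative, and the generator and Q networks themselves are omitted.

```python
import math

def infogan_mi_lower_bound(q_probs_of_true_code, num_categories):
    """Variational lower bound L_I = E[log Q(c|x)] + H(c) for a uniform categorical code.

    q_probs_of_true_code: for each generated sample x = G(z, c), the probability
    that the auxiliary network Q assigns to the code c actually used.
    H(c) = log(num_categories) for a uniform prior over the categories.
    """
    expected_log_q = sum(math.log(p) for p in q_probs_of_true_code) / len(q_probs_of_true_code)
    return expected_log_q + math.log(num_categories)

# If Q recovers the code perfectly, the bound reaches its maximum H(c) = log 10.
best = infogan_mi_lower_bound([1.0, 1.0, 1.0], 10)
# If Q only guesses uniformly, the bound is zero: the code carries no information.
chance = infogan_mi_lower_bound([0.1] * 4, 10)
```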
Voxel-Based Approaches
Predictable and Generative Object Representations
● An autoencoder ensures that the representation is generative
● A convolutional network ensures that the representation is predictable from images
Rohit Girdhar, David Fouhey
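The two requirements can be combined into one training objective, sketched below under assumptions. Per-voxel binary cross-entropy stands in for the autoencoder's "generative" term, and squared Euclidean distance between the image network's output and the autoencoder's embedding stands in for the "predictable" term. The loss choices and the `weight` parameter are illustrative, and the networks producing the inputs are omitted.

```python
import numpy as np

def joint_embedding_loss(voxel_pred, voxel_target, z_from_image, z_from_voxels, weight=1.0):
    """Sketch of a joint objective for a generative + predictable embedding.

    - Generative part: the autoencoder must reconstruct the voxel grid
      (binary cross-entropy averaged over voxels).
    - Predictable part: the image ConvNet's output must match the autoencoder's
      embedding (squared Euclidean distance).
    Loss choices and `weight` are assumptions for illustration.
    """
    eps = 1e-7
    p = np.clip(voxel_pred, eps, 1.0 - eps)
    recon = -np.mean(voxel_target * np.log(p) + (1 - voxel_target) * np.log(1 - p))
    predict = np.sum((z_from_image - z_from_voxels) ** 2)
    return float(recon + weight * predict)

target = np.zeros((4, 4, 4))
target[1:3, 1:3, 1:3] = 1.0
loss_perfect = joint_embedding_loss(target, target, np.zeros(3), np.zeros(3))
loss_offset = joint_embedding_loss(target, target, np.ones(3), np.zeros(3))
```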
Results on IKEA Dataset
Results on IKEA Dataset
Thank You
Variational Autoencoders
● Bayesian inference on a probabilistic graphical model with latent variables
● Jointly learn the recognition model (encoder) parameters φ and the generative model (decoder) parameters θ
● The recognition model qφ(z|x) approximates the intractable posterior pθ(z|x)
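Encoder and decoder parameters are usually learned jointly by maximizing the evidence lower bound (ELBO) with the reparameterization trick. The sketch below covers the two pieces specific to a diagonal Gaussian recognition model with a standard normal prior. Both are standard modeling assumptions, not the only choice. The encoder and decoder networks, and the estimate of the reconstruction term, are omitted.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as a differentiable function of
    (mu, log_var): z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian q."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

def negative_elbo(recon_log_likelihood, mu, log_var):
    """Training loss: -(E_q[log p(x|z)] - KL). The reconstruction log-likelihood
    is assumed to be estimated elsewhere from decoder outputs on sampled z."""
    return -recon_log_likelihood + kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
z = reparameterize(np.zeros(3), np.zeros(3), rng)
```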
Deep Recurrent Attentive Writer (DRAW)
● Variational Autoencoders + Recurrent Networks
● At each time step, the network decides:
  ○ Where to read
  ○ Where to write
  ○ What to write
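In DRAW, "where to read" and "where to write" are controlled by a grid of Gaussian filters. The network emits the grid's center, stride and width at each step. Below is a minimal read-side sketch following the filterbank construction described in the DRAW paper, with two simplifications: pixel indices are 0-based, and the attention parameters are passed in directly instead of being predicted from a recurrent state.

```python
import numpy as np

def filterbank(g, delta, sigma2, N, size):
    """N Gaussian filters along one image axis.

    g: grid center, delta: stride between filter centers, sigma2: filter variance,
    size: image length along this axis. Returns an (N, size) matrix whose rows
    are normalized to sum to 1.
    """
    i = np.arange(1, N + 1)
    mu = g + (i - N / 2 - 0.5) * delta                      # filter centers
    a = np.arange(size)
    F = np.exp(-((a[None, :] - mu[:, None]) ** 2) / (2 * sigma2))
    return F / np.maximum(F.sum(axis=1, keepdims=True), 1e-8)

def read_glimpse(image, gx, gy, delta, sigma2, gamma, N):
    """'Where to read': extract an N x N glimpse gamma * F_Y @ image @ F_X^T."""
    B, A = image.shape                                       # height B, width A
    Fx = filterbank(gx, delta, sigma2, N, A)
    Fy = filterbank(gy, delta, sigma2, N, B)
    return gamma * Fy @ image @ Fx.T

img = np.random.default_rng(0).random((28, 28))
glimpse = read_glimpse(img, gx=14.0, gy=14.0, delta=2.0, sigma2=1.0, gamma=1.0, N=5)
```

The write operation uses the transposed filters (F_Y^T w F_X) to place an N x N patch back onto the canvas.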
DRAWings
PixelRNN
● Models the conditional distribution of each individual pixel given the previous pixels
● LSTM layers approximate the ideal context (all previously generated pixels)
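The autoregressive factorization p(x) = ∏ p(x_i | x_1, ..., x_{i-1}) can be sketched as a log-likelihood computed in raster order. The `conditional` callable is a hypothetical stand-in for the LSTM-based model, and the image is simplified to a single grayscale channel. PixelRNN also conditions the color channels on each other.

```python
import math

def autoregressive_nll(image, conditional):
    """Negative log-likelihood of an image under p(x) = prod_i p(x_i | x_<i).

    image: 2D list of integer pixel values in [0, 255], visited in raster order.
    conditional: callable taking the list of previous pixel values and returning
    256 probabilities for the next pixel (a stand-in for the LSTM-based model).
    """
    previous = []
    nll = 0.0
    for row in image:
        for value in row:
            probs = conditional(previous)    # may only look at earlier pixels
            nll -= math.log(probs[value])
            previous.append(value)
    return nll

uniform = lambda prev: [1.0 / 256] * 256     # hypothetical baseline model
img = [[0, 128], [255, 7]]
bits_per_pixel = autoregressive_nll(img, uniform) / (4 * math.log(2))  # about 8 bits
```

A learned model scores better than this uniform baseline exactly when its conditionals put more probability on the pixels that actually occur.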
PixelRNN - Inpainting
PixelRNN - Generated ImageNet 64x64