Lecture 12: Generative Models
TRANSCRIPT
Fei-Fei Li, Ranjay Krishna, Danfei Xu. Lecture 12, May 11, 2021.
Administrative
● A3 is out. Due May 25.
● Milestone was due May 10th.
○ Read the website page for milestone requirements.
○ Data preprocessing and initial results needed to be finished by then.
● Midterm and A2 grades will be out this week
Supervised vs Unsupervised Learning
Supervised Learning
Data: (x, y), where x is data and y is the label
Goal: Learn a function to map x -> y
Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Classification example: "Cat". (This image is CC0 public domain.)
Image captioning example: "A cat sitting on a suitcase on the floor". (Caption generated using neuraltalk2; image is CC0 public domain.)
Object detection example: DOG, DOG, CAT. (This image is CC0 public domain.)
Semantic segmentation example: GRASS, CAT, TREE, SKY.
Unsupervised Learning
Data: x. Just data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Example: K-means clustering. (This image is CC0 public domain.)
Example: Principal Component Analysis (dimensionality reduction), e.g. projecting 3-d data to 2-d. (This image from Matthias Scholz is CC0 public domain.)
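A sketch of the idea (not the slide's code, and a made-up dataset): find the top principal direction of mean-centered 2-d data by power iteration on its covariance matrix, then points can be projected onto that direction:

```python
import math

def top_principal_direction(xs, iters=100):
    """Power iteration on the 2x2 covariance matrix of mean-centered 2-d data."""
    n = len(xs)
    mx = sum(p[0] for p in xs) / n
    my = sum(p[1] for p in xs) / n
    centered = [(p[0] - mx, p[1] - my) for p in xs]
    # Covariance matrix entries (biased estimate is fine for this sketch).
    cxx = sum(x * x for x, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for x, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        # Multiply by the covariance matrix and renormalize.
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v

# Points spread along the line y = x: top direction should be ~(1/sqrt(2), 1/sqrt(2)).
data = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
v = top_principal_direction(data)
```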
Example: density estimation, i.e. modeling p(x), in 1-d or 2-d. (2-d density images left and right are CC0 public domain; 1-d density figure copyright Ian Goodfellow, 2016, reproduced with permission.)
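In its simplest form, modeling p(x) means fitting a parametric density to samples. A minimal 1-d sketch (toy numbers): fit a Gaussian by maximum likelihood, which is just the sample mean and variance:

```python
import math

def fit_gaussian(xs):
    """MLE for a 1-d Gaussian: sample mean and (biased) sample variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

xs = [1.8, 2.0, 2.2, 1.9, 2.1]
mu, var = fit_gaussian(xs)  # mu = 2.0, var = 0.02
```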
Generative Modeling
Training data ~ p_data(x). Given training data, generate new samples from the same distribution: learn p_model(x), then sample from it.

Objectives:
1. Learn p_model(x) that approximates p_data(x)
2. Sample new x from p_model(x)
Formulate as density estimation problems:
- Explicit density estimation: explicitly define and solve for p_model(x)
- Implicit density estimation: learn a model that can sample from p_model(x) without explicitly defining it.
Why Generative Models?
- Realistic samples for artwork, super-resolution, colorization, etc.
- Learn useful features for downstream tasks such as classification.
- Get insights from high-dimensional data (physics, medical imaging, etc.)
- Model the physical world for simulation and planning (robotics and reinforcement learning applications)
- Many more ...

Figures from left to right are copyright: (1) Alec Radford et al., 2016; (2) Phillip Isola et al., 2017, reproduced with authors' permission; (3) BAIR Blog.
Taxonomy of Generative Models
Generative models
- Explicit density
  - Tractable density: Fully Visible Belief Nets (NADE, MADE, PixelRNN/CNN, NICE / RealNVP, Glow, FFJORD)
  - Approximate density
    - Variational: Variational Autoencoder
    - Markov Chain: Boltzmann Machine
- Implicit density
  - Direct: GAN
  - Markov Chain: GSN

Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Today: the three most popular types of generative models.
PixelRNN and PixelCNN (a very brief overview)
Fully visible belief network (FVBN)
Explicit density model: the likelihood of an image x is p(x) = p(x_1, ..., x_n), the joint likelihood of its n pixels.
Use the chain rule to decompose the likelihood of an image x into a product of 1-d distributions:

p(x) = ∏_{i=1}^{n} p(x_i | x_1, ..., x_{i-1})

where p(x_i | x_1, ..., x_{i-1}) is the probability of the i-th pixel value given all previous pixels. Then maximize the likelihood of the training data.
Complex distribution over pixel values => Express using a neural network!
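The factorization itself can be sketched in code. The conditional model below is a made-up stand-in for the neural network, and it only looks one pixel back; a real PixelRNN/CNN conditions on all previous pixels:

```python
import math

def cond_prob(pixel, prev):
    """Toy conditional p(x_i = 1 | x_{i-1}): pixels tend to repeat their neighbor.
    A hypothetical stand-in for the learned network."""
    p_one = 0.9 if prev == 1 else 0.2
    return p_one if pixel == 1 else 1.0 - p_one

def log_likelihood(pixels):
    """log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1}); this toy model only
    conditions on the immediately previous pixel value."""
    ll = 0.0
    prev = 0  # fixed start-of-image context
    for px in pixels:
        ll += math.log(cond_prob(px, prev))
        prev = px
    return ll

smooth = [1, 1, 1, 1]  # run of identical pixels: high likelihood under this model
noisy = [1, 0, 1, 0]   # alternating pixels: low likelihood
```

Maximizing the training-data likelihood means adjusting the conditional model so that sequences like the training images score high under exactly this sum of log-conditionals.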
Recurrent Neural Network

[Diagram: an unrolled RNN. Hidden states h0 → h1 → h2 → h3 → ...; at each step the RNN reads pixel x_i and outputs a prediction of the next pixel: x1 → x2, x2 → x3, x3 → x4, ..., x_{n-1} → x_n.]
PixelRNN
Generate image pixels starting from corner
Dependency on previous pixels modeled using an RNN (LSTM)
[van den Oord et al. 2016]
Drawback: sequential generation is slow in both training and inference!
PixelCNN
[van den Oord et al. 2016]
Still generate image pixels starting from corner
Dependency on previous pixels now modeled using a CNN over a context region (masked convolution)
Figure copyright van den Oord et al., 2016. Reproduced with permission.
Training is faster than PixelRNN: the convolutions can be parallelized, since the context-region values are known from the training images.

Generation is still slow: for a 32x32 image, we need 1024 forward passes of the network to generate a single image.
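The masked convolution can be sketched as follows: a "type A" mask in the style of van den Oord et al., which zeroes out kernel positions at pixels not yet generated in raster order (the helper name is ours):

```python
def pixelcnn_mask(k, include_center=False):
    """k x k binary mask: 1 for kernel positions at already-generated pixels
    (rows above the center, or left of center within the center row), 0 elsewhere.
    include_center=True corresponds to the 'type B' mask used in later layers,
    which may also look at the current pixel's features."""
    c = k // 2
    mask = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i < c or (i == c and j < c):
                mask[i][j] = 1
    if include_center:
        mask[c][c] = 1
    return mask

mask_a = pixelcnn_mask(3)
# [[1, 1, 1],
#  [1, 0, 0],
#  [0, 0, 0]]
```

Multiplying the convolution weights elementwise by this mask is what lets training run in parallel while still respecting the raster-order factorization at generation time.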
Generation Samples
Figures copyright Aaron van den Oord et al., 2016. Reproduced with permission.
32x32 CIFAR-10 and 32x32 ImageNet samples.
PixelRNN and PixelCNN: improving PixelCNN performance
- Gated convolutional layers
- Short-cut connections
- Discretized logistic loss
- Multi-scale
- Training tricks
- Etc.

See:
- van den Oord et al., NIPS 2016
- Salimans et al., 2017 (PixelCNN++)

Pros:
- Can explicitly compute likelihood p(x)
- Easy to optimize
- Good samples

Con:
- Sequential generation => slow
Variational Autoencoders (VAE)
So far...

PixelRNN/CNNs define a tractable density function and optimize the likelihood of the training data:

p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, ..., x_{i-1})
Variational Autoencoders (VAEs) define an intractable density function with latent z:

p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz

We cannot optimize this directly; instead we derive and optimize a lower bound on the likelihood. (Unlike PixelCNN, there are no dependencies among pixels given z, so all pixels can be generated at the same time!)
Why latent z?
Some background first: Autoencoders
Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data
[Diagram: input data x → encoder → features z → decoder]
z is usually smaller than x (dimensionality reduction).

Q: Why dimensionality reduction?
A: Want features to capture meaningful factors of variation in data
How to learn this feature representation? Train such that the features can be used to reconstruct the original data. "Autoencoding" means encoding the input itself.
Encoder: 4-layer conv. Decoder: 4-layer upconv.
L2 loss function, ||x - x̂||²: train such that the features can be used to reconstruct the original data. Doesn't use labels!
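A minimal sketch of this training loop, shrunk to scalars: a hypothetical 1-d linear encoder and decoder trained by plain gradient descent on the L2 reconstruction loss (the slide's actual model is a 4-layer conv encoder/decoder):

```python
# Minimal linear autoencoder on scalars: z = we * x, xhat = wd * z.
# Gradient descent on the L2 reconstruction loss (xhat - x)^2.
def train_autoencoder(xs, lr=0.01, steps=500):
    we, wd = 0.5, 0.5  # encoder and decoder weights
    for _ in range(steps):
        for x in xs:
            z = we * x           # encode
            xhat = wd * z        # decode (reconstruct)
            err = xhat - x       # d(loss)/d(xhat), up to a factor of 2
            grad_wd = err * z    # chain rule through the decoder
            grad_we = err * wd * x  # chain rule through the encoder
            wd -= lr * grad_wd
            we -= lr * grad_we
    return we, wd

xs = [1.0, -2.0, 0.5, 1.5]
we, wd = train_autoencoder(xs)
# After training, we * wd ≈ 1, so reconstructions match the inputs.
```

No labels appear anywhere: the reconstruction target is the input itself.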
After training, throw away the decoder.
The encoder can be used to initialize a supervised model: attach a classifier on top of the features to predict a label (plane, dog, deer, bird, truck), with a loss function (softmax, etc.), and fine-tune the encoder jointly with the classifier. Train for the final task (sometimes with small data): transfer from a large, unlabeled dataset to a small, labeled dataset.
Autoencoders can reconstruct data, and can learn features to initialize a supervised model
Features capture factors of variation in training data.
But we can’t generate new images from an autoencoder because we don’t know the space of z.
How do we make an autoencoder a generative model?
Variational Autoencoders

Probabilistic spin on autoencoders: will let us sample from the model to generate data!
Assume the training data is generated from the distribution of an unobserved (latent) representation z: first sample z from a true prior p(z), then sample x from a true conditional p(x|z).

Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014
Intuition (remember autoencoders!): x is an image, z is the latent factors used to generate x (attributes, orientation, etc.).
We want to estimate the true parameters of this generative model given training data x.
How should we represent this model?
Choose prior p(z) to be simple, e.g. Gaussian. Reasonable for latent attributes, e.g. pose, how much smile.
The conditional p(x|z) is complex (it generates an image) => represent it with a neural network: the decoder network.
How to train the model?
Learn the model parameters to maximize the likelihood of the training data.
Q: What is the problem with this? Intractable!
Variational Autoencoders: Intractability
Data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
p_θ(z): simple Gaussian prior ✔ (tractable)
p_θ(x|z): decoder neural network ✔ (tractable for any given z)
But the integral is intractable: we cannot compute p_θ(x|z) for every z!
Monte Carlo estimation of the integral has too high variance.
Posterior density: p_θ(z|x) = p_θ(x|z) p_θ(z) / p_θ(x). Also intractable, since the denominator is the intractable data likelihood p_θ(x).
Solution: in addition to modeling p_θ(x|z), learn q_φ(z|x) that approximates the true posterior p_θ(z|x).

We will see that this approximate posterior allows us to derive a tractable lower bound on the data likelihood, which we can optimize.

Variational inference means approximating the unknown posterior distribution from only the observed data x.
Variational Autoencoders
Taking expectation wrt. z (using encoder network) will come in handy later
The expectation wrt. z (using the encoder network) lets us write nice KL terms.
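Written out, the derivation (following Kingma and Welling) starts from the data log-likelihood and proceeds as:

```latex
\begin{aligned}
\log p_\theta(x)
&= \mathbb{E}_{z \sim q_\phi(z \mid x)}\left[\log p_\theta(x)\right]
  && \text{($p_\theta(x)$ does not depend on $z$)} \\
&= \mathbb{E}_{z}\left[\log \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(z \mid x)}\right]
  && \text{(Bayes' rule)} \\
&= \mathbb{E}_{z}\left[\log \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(z \mid x)}
     \cdot \frac{q_\phi(z \mid x)}{q_\phi(z \mid x)}\right]
  && \text{(multiply by a constant)} \\
&= \mathbb{E}_{z}\left[\log p_\theta(x \mid z)\right]
   - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)
   + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)
\end{aligned}
```

The last KL term is >= 0 and involves the intractable p_θ(z|x); dropping it leaves the tractable lower bound (the ELBO) on log p_θ(x).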
This KL term (between the Gaussian encoder distribution and the z prior) has a nice closed-form solution!

p_θ(z|x) is intractable (as we saw earlier), so we can't compute this KL term :( But we know the KL divergence is always >= 0.

The decoder network gives p_θ(x|z); we can estimate this term through sampling (we need a trick to differentiate through the sampling).
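That closed form, for a diagonal Gaussian q against a standard normal prior (as in Kingma and Welling, Appendix B), is small enough to write out; a pure-Python sketch with made-up values:

```python
import math

def kl_diag_gaussian_vs_std_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) )
       = 0.5 * sum_j (sigma_j^2 + mu_j^2 - 1 - log sigma_j^2)."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

# KL is zero exactly when q equals the prior ...
kl_zero = kl_diag_gaussian_vs_std_normal([0.0, 0.0], [1.0, 1.0])
# ... and positive otherwise.
kl_pos = kl_diag_gaussian_vs_std_normal([1.0, -0.5], [0.5, 2.0])
```

Because this term is an exact formula in the encoder outputs (μ, σ), its gradient is available directly, with no sampling needed.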
Tractable lower bound which we can take gradient of and optimize! (pθ(x|z) differentiable, KL term differentiable)
We want to maximize the data likelihood
Decoder: reconstruct the input data
Encoder: make approximate posterior distribution close to prior
Variational Autoencoders
Putting it all together: maximizing the likelihood lower bound
Let’s look at computing the KL divergence between the estimated posterior and the prior given some data
Encoder network
Input Data
Make approximate posterior distribution close to prior
Have analytical solution
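That analytical solution can be written down directly. A minimal sketch (NumPy; assuming the encoder outputs a mean vector and a log-variance vector for a diagonal Gaussian):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    # summed over the latent dimensions.
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

# The KL is exactly zero when the approximate posterior equals the prior
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)) == 0.0)  # → True
```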
Sample z from
Not part of the computation graph!
Reparameterization trick to make sampling differentiable:
Sample
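The trick itself is one line. A sketch (NumPy; mu and log_var stand in for the encoder's outputs):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, I).  The randomness enters as
    # an *input* eps, so z is a deterministic, differentiable function
    # of mu and sigma -- gradients can flow through them.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
z = reparameterize(np.zeros(3), np.zeros(3), rng)
print(z.shape)  # → (3,)
```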
Part of computation graph
Input to the graph
Decoder network
Maximize likelihood of original input being reconstructed
For every minibatch of input data: compute this forward pass, and then backprop!
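The whole forward pass can be sketched in one function. A toy sketch (NumPy, with hypothetical one-layer linear encoder/decoder just so it runs; the loss is the negative of the lower bound, so we minimize it):

```python
import numpy as np

def vae_loss(x, encode, decode, rng):
    # One minibatch: encode -> reparameterize -> decode, then sum the
    # reconstruction term and the KL term (this is the negative ELBO).
    mu, log_var = encode(x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps            # reparameterization
    x_hat = decode(z)
    recon = np.sum((x - x_hat) ** 2)                # Gaussian log-lik., up to constants
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return recon + kl

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((6, 2))               # hypothetical tied weights
encode = lambda x: (x @ W, np.full((x.shape[0], 2), -2.0))
decode = lambda z: z @ W.T
x = rng.standard_normal((8, 6))
loss = vae_loss(x, encode, decode, rng)
print(np.isfinite(loss) and loss > 0)  # → True
```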
Variational Autoencoders: Generating Data!
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Sample from true prior
Sample from true conditional
Decoder network
Our assumption about data generation process
Sample z from
Sample x|z from
Now given a trained VAE: use decoder network & sample z from prior!
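Generation after training needs only the decoder. A sketch (NumPy; the linear decode is a stand-in for the trained decoder network):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 5))      # stand-in for trained decoder weights
decode = lambda z: z @ W             # stand-in for the decoder network

z = rng.standard_normal((4, 2))      # sample z from the prior N(0, I)
x = decode(z)                        # mean of p(x|z); optionally add noise to sample
print(x.shape)  # → (4, 5)
```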
Data manifold for 2-d z
Vary z1
Vary z2
Degree of smile
Head pose
Diagonal prior on z => independent latent variables
Different dimensions of z encode interpretable factors of variation
Also good feature representation that can be computed using qɸ(z|x)!
32x32 CIFAR-10 / Labeled Faces in the Wild
Figures copyright (L) Dirk Kingma et al. 2016; (R) Anders Larsen et al. 2017. Reproduced with permission.
Variational Autoencoders
Probabilistic spin to traditional autoencoders => allows generating data
Defines an intractable density => derive and optimize a (variational) lower bound

Pros:
- Principled approach to generative models
- Interpretable latent space
- Allows inference of q(z|x); can be a useful feature representation for other tasks

Cons:
- Maximizes a lower bound of the likelihood: okay, but not as good an evaluation as PixelRNN/PixelCNN
- Samples blurrier and lower quality compared to state-of-the-art (GANs)

Active areas of research:
- More flexible approximations, e.g. richer approximate posteriors instead of diagonal Gaussian: Gaussian Mixture Models (GMMs), categorical distributions
- Learning disentangled representations
Taxonomy of Generative Models
Generative models
- Explicit density
  - Tractable density: Fully Visible Belief Nets (NADE, MADE, PixelRNN/CNN, NICE / RealNVP, Glow, Ffjord)
  - Approximate density
    - Variational: Variational Autoencoder
    - Markov Chain: Boltzmann Machine
- Implicit density
  - Direct: GAN
  - Markov Chain: GSN

Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Generative Adversarial Networks (GANs)
So far...
PixelCNNs define tractable density function, optimize likelihood of training data:
VAEs define intractable density function with latent z:
Cannot optimize directly, derive and optimize lower bound on likelihood instead
What if we give up on explicitly modeling density, and just want ability to sample?
GANs: not modeling any explicit density function!
Generative Adversarial Networks
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Problem: Want to sample from complex, high-dimensional training distribution. No direct way to do this!
Solution: Sample from a simple distribution we can easily sample from, e.g. random noise. Learn transformation to training distribution.
Input: random noise z
Generator Network
Output: Sample from training distribution
But we don’t know which sample z maps to which training image -> can’t learn by reconstructing training images
Objective: generated images should look “real”
Discriminator Network
Real? Fake?
Solution: Use a discriminator network to tell whether the generated image is within the data distribution (“real”) or not
Training GANs: Two-player game
Discriminator network: try to distinguish between real and fake images
Generator network: try to fool the discriminator by generating real-looking images
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
z: random noise
Generator Network
Discriminator Network
Fake Images (from generator)
Real Images (from training set)
Real or Fake
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
Generator learning signal
Discriminator learning signal
Train jointly in minimax game
Minimax objective function:
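The objective itself (the rendered equation is not in the transcript; this is a reconstruction of the standard form from Goodfellow et al. 2014, which the annotations below refer to):

```latex
\min_{\theta_g}\;\max_{\theta_d}\;
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D_{\theta_d}(x)\right]
  + \mathbb{E}_{z \sim p(z)}\!\left[\log\!\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right]
```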
Generator objective / Discriminator objective
Discriminator output for real data x
Discriminator output for generated fake data G(z)
Discriminator outputs likelihood in (0,1) of real image
- Discriminator (θd) wants to maximize objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake)
- Generator (θg) wants to minimize objective such that D(G(z)) is close to 1 (discriminator is fooled into thinking generated G(z) is real)
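As losses to minimize, the two objectives look like this. A sketch (NumPy; d_real and d_fake stand for the discriminator's probabilities D(x) and D(G(z))):

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator maximizes log D(x) + log(1 - D(G(z))):
    # equivalently, minimize the negative.
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss_minimax(d_fake):
    # Generator minimizes log(1 - D(G(z))) in the original minimax game.
    return np.mean(np.log(1.0 - d_fake))

# A near-perfect discriminator (D(x)≈1 on real, D(G(z))≈0 on fake) gets ~0 loss
print(d_loss(np.array([0.999]), np.array([0.001])) < 0.01)  # → True
```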
Alternate between:
1. Gradient ascent on discriminator
2. Gradient descent on generator
In practice, optimizing this generator objective does not work well!
When sample is likely fake, want to learn from it to improve generator (move to the right on X axis).
But gradient in this region is relatively flat!
Gradient signal dominated by region where sample is already good
2. Instead: Gradient ascent on generator, different objective
Instead of minimizing likelihood of discriminator being correct, now maximize likelihood of discriminator being wrong. Same objective of fooling discriminator, but now higher gradient signal for bad samples => works much better! Standard in practice.
High gradient signal
Low gradient signal
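The difference in gradient signal is easy to check numerically. A sketch (NumPy), comparing slopes near D(G(z)) = 0, i.e. a confidently rejected sample:

```python
import numpy as np

def g_loss_minimax(d):   # minimize log(1 - D(G(z)))
    return np.log(1.0 - d)

def g_loss_nonsat(d):    # minimize -log D(G(z))  (maximize log D(G(z)))
    return -np.log(d)

# Finite-difference slopes at d = 0.01 (discriminator says "fake")
h = 1e-6
slope_minimax = (g_loss_minimax(0.01 + h) - g_loss_minimax(0.01)) / h
slope_nonsat = (g_loss_nonsat(0.01 + h) - g_loss_nonsat(0.01)) / h
print(abs(slope_nonsat) > abs(slope_minimax))  # → True: far stronger signal for bad samples
```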
Putting it together: GAN training algorithm
Some find k=1 more stable, others use k > 1, no best rule.
Followup work (e.g. Wasserstein GAN, BEGAN) alleviates this problem, better stability!
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Arjovsky et al. "Wasserstein gan." arXiv preprint arXiv:1701.07875 (2017)Berthelot, et al. "Began: Boundary equilibrium generative adversarial networks." arXiv preprint arXiv:1703.10717 (2017)
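The alternating schedule can be sketched as a loop skeleton (the update functions here are hypothetical placeholders, just to exercise the structure; k is the number of discriminator steps per generator step):

```python
import numpy as np

def gan_train(num_iters, k, sample_real, sample_noise, d_step, g_step):
    # Each iteration: k gradient steps on the discriminator,
    # then one step on the generator's (non-saturating) objective.
    for _ in range(num_iters):
        for _ in range(k):
            d_step(sample_real(), sample_noise())
        g_step(sample_noise())

# Counting stand-ins to verify the schedule
rng = np.random.default_rng(0)
calls = {"d": 0, "g": 0}
gan_train(
    num_iters=3, k=2,
    sample_real=lambda: rng.standard_normal(8),
    sample_noise=lambda: rng.standard_normal(8),
    d_step=lambda x, z: calls.update(d=calls["d"] + 1),
    g_step=lambda z: calls.update(g=calls["g"] + 1),
)
print(calls)  # → {'d': 6, 'g': 3}
```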
After training, use generator network to generate new images
Generative Adversarial Nets
Nearest neighbor from training set
Generated samples
Figures copyright Ian Goodfellow et al., 2014. Reproduced with permission.
Generated samples (CIFAR-10)
Generative Adversarial Nets: Convolutional Architectures
Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
Generator is an upsampling network with fractionally-strided convolutions
Discriminator is a convolutional network
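A quick sanity check on how fractionally-strided (transposed) convolutions upsample. The size formula below is the standard one (no output padding); the kernel=4, stride=2, padding=1 settings mirror the DCGAN-style doubling of spatial size per layer:

```python
def conv_transpose_out(in_size, kernel, stride, padding):
    # Output spatial size of a transposed convolution (no output_padding).
    return (in_size - 1) * stride - 2 * padding + kernel

size = 4
for _ in range(4):               # 4 -> 8 -> 16 -> 32 -> 64
    size = conv_transpose_out(size, kernel=4, stride=2, padding=1)
print(size)  # → 64
```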
Samples from the model look much better!
Interpolating between random points in latent space
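Interpolation is just a line between two latent codes, with each point decoded by the generator. A sketch (NumPy):

```python
import numpy as np

def interpolate(z0, z1, steps):
    # Linearly interpolate between two latent codes; feeding each row
    # to the generator gives an image sequence like the one on the slide.
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * z0 + alphas * z1

zs = interpolate(np.zeros(4), np.ones(4), steps=5)
print(zs[2])  # midpoint → [0.5 0.5 0.5 0.5]
```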
Generative Adversarial Nets: Interpretable Vector Math
Smiling woman Neutral woman Neutral man
Samples from the model
Average Z vectors, do arithmetic
Smiling Man
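The arithmetic is done on averaged z vectors before decoding. A sketch (NumPy; the arrays stand for the z vectors of several sampled images per concept):

```python
import numpy as np

def concept_arithmetic(smiling_woman, neutral_woman, neutral_man):
    # Average each concept's z vectors, then:
    # smiling woman - neutral woman + neutral man -> decode as "smiling man"
    return (smiling_woman.mean(axis=0)
            - neutral_woman.mean(axis=0)
            + neutral_man.mean(axis=0))

# Toy check: suppose "smile" adds +1 to dim 0 and "man" adds +1 to dim 1
sw = np.array([[1.0, 0.0]])
nw = np.array([[0.0, 0.0]])
nm = np.array([[0.0, 1.0]])
print(concept_arithmetic(sw, nw, nm))  # → [1. 1.]
```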
Glasses man No glasses man No glasses woman
Woman with glasses
2017: Explosion of GANs
“The GAN Zoo”: https://github.com/hindupuravinash/the-gan-zoo
See also: https://github.com/soumith/ganhacks for tips and tricks for training GANs
Better training and generation
LSGAN, Mao 2017. Wasserstein GAN, Arjovsky 2017. Improved Wasserstein GAN, Gulrajani 2017.
Progressive GAN, Karras 2018.
CycleGAN. Zhu et al. 2017.
Source->Target domain transfer
Many GAN applications
Pix2pix. Isola 2017. Many examples at https://phillipi.github.io/pix2pix/
Reed et al. 2017.
Text -> Image Synthesis
2019: BigGAN
Brock et al., 2019
Scene graphs to GANs
Specifying exactly what kind of image you want to generate.
The explicit structure in scene graphs provides better image generation for complex scenes.
Johnson et al. Image Generation from Scene Graphs, CVPR 2018
Figures copyright 2019. Reproduced with permission.
HYPE: Human eYe Perceptual Evaluations (hype.stanford.edu)
Zhou, Gordon, Krishna et al. HYPE: Human eYe Perceptual Evaluations, NeurIPS 2019
Figures copyright 2019. Reproduced with permission.
Summary: GANs
Don’t work with an explicit density function
Take a game-theoretic approach: learn to generate from the training distribution through a 2-player game

Pros:
- Beautiful, state-of-the-art samples!

Cons:
- Trickier / more unstable to train
- Can’t solve inference queries such as p(x), p(z|x)

Active areas of research:
- Better loss functions, more stable training (Wasserstein GAN, LSGAN, many others)
- Conditional GANs, GANs for all kinds of applications
Summary
Autoregressive models: PixelRNN, PixelCNN
Van den Oord et al, “Conditional image generation with PixelCNN decoders”, NIPS 2016
Variational Autoencoders
Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
Generative Adversarial Networks (GANs)
Goodfellow et al, “Generative Adversarial Nets”, NIPS 2014
Useful Resources on Generative Models
CS 236: Deep Generative Models (Stanford)
CS 294-158 Deep Unsupervised Learning (Berkeley)
Next: Self-Supervised Learning