deep learning part iii - yisong yue€¦ · conditional image generation with pixelcnn decoders,...
TRANSCRIPT
![Page 1: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/1.jpg)
D E E P L E A R N I N GPA RT T H R E E - D E E P G E N E R AT I V E M O D E L S
C S / C N S / E E 1 5 5 - M A C H I N E L E A R N I N G & D ATA M I N I N G - L E C T U R E 1 7
![Page 2: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/2.jpg)
G E N E R AT I V E M O D E L S
![Page 3: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/3.jpg)
3
DATA
![Page 4: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/4.jpg)
4
DATA
![Page 5: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/5.jpg)
5
example 1
DATA
![Page 6: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/6.jpg)
6
example 2
DATA
![Page 7: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/7.jpg)
7
example 3
DATA
![Page 8: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/8.jpg)
8
num
ber
of d
ata
exam
ple
s
DATA
![Page 9: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/9.jpg)
9
num
ber
of d
ata
exam
ple
sfeature 1
DATA
![Page 10: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/10.jpg)
10
num
ber
of d
ata
exam
ple
sfeature 2
DATA
![Page 11: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/11.jpg)
11
num
ber
of d
ata
exam
ple
sfeature 3
DATA
![Page 12: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/12.jpg)
12
num
ber
of d
ata
exam
ple
snumber of features
DATA
![Page 13: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/13.jpg)
13
DATA DISTRIBUTION
feature 1
feature 2
feature 3
example 1
example 2
example 3
![Page 14: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/14.jpg)
14
DATA DISTRIBUTION
feature 1
feature 2
feature 3
![Page 15: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/15.jpg)
15
DENSITY ESTIMATION
feature 1
feature 2
feature 3
estimating the density of the empirically observed data distribution
![Page 16: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/16.jpg)
16
GENERATIVE MODEL
a model of the density of the data distribution
![Page 17: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/17.jpg)
17
by modeling the data distribution, generative models are able to generate new data examples
feature 1
feature 2
feature 3
generated examples
![Page 18: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/18.jpg)
18
generative modeldiscriminative model
![Page 19: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/19.jpg)
19
discriminative models vs. generative models
can both be trained using supervised learning
generative models typically require more modeling assumptions
generative models are often easier to train with unsupervised methods
straightforward to quantify uncertainty with generative models
![Page 20: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/20.jpg)
20
one of the main benefits of generative modeling is the ability to automatically extract structure from data
reducing the effective dimensionality of the data can make it easier to learn and generalize on new tasks
![Page 21: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/21.jpg)
21
one of the main benefits of generative modeling is the ability to automatically extract structure from data
reducing the effective dimensionality of the data can make it easier to learn and generalize on new tasks
labeled examples
![Page 22: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/22.jpg)
22
one of the main benefits of generative modeling is the ability to automatically extract structure from data
reducing the effective dimensionality of the data can make it easier to learn and generalize on new tasks
labeled examples
![Page 23: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/23.jpg)
23
any model that has an output in the data space can be considered a generative model
nervous systems appear to use this mechanism in part
prediction of sensory input using “top-down” pathways
![Page 24: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/24.jpg)
24
deep generative model
a generative model that uses deep neural networks to model the data distribution
![Page 25: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/25.jpg)
25
auto-regressive models
latent variable models
implicit models
FAMILIES OF (DEEP) GENERATIVE MODELS
![Page 26: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/26.jpg)
A U T O - R E G R E S S I V E M O D E L S
![Page 27: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/27.jpg)
27
a data example
number of features
x1 x2 x3 xM
p(x) = p(x1, x2, . . . , xM )
![Page 28: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/28.jpg)
28
x1 x2 x3 xM
p(x) = p(x1, x2, . . . , xM )
use chain rule of probability to split the joint distribution into a product of conditional distributions
note: conditioning order is arbitrary
definition of conditional probability p(a|b) = p(a, b)
p(b)p(a, b) = p(a|b)p(b)
p(x1, x2, . . . , xM ) = p(x1|x2, . . . , xM )p(x2, . . . , xM )
recursively apply to p(x1, x2, . . . , xM ) = p(x1|x2, . . . , xM )p(x2, . . . , xM )
p(x1, x2, . . . , xM ) = p(x1|x2, . . . , xM )p(x2|x3, . . . , xM ) . . . p(xM�1|xM )p(xM )
p(x1, . . . , xM ) =MY
j=1
p(xj |x1, . . . , xj�1)
![Page 29: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/29.jpg)
29
x1 x2 x3 xM
model the conditional distributions of the data
learn to auto-regress to the missing values
![Page 30: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/30.jpg)
30
x1 x2 x3 xM
model the conditional distributions of the data
learn to auto-regress to the missing values
p(x1)
![Page 31: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/31.jpg)
31
x1 x2 x3 xM
model the conditional distributions of the data
learn to auto-regress to the missing values
MODEL
p(x2|x1)
![Page 32: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/32.jpg)
32
x1 x2 x3 xM
model the conditional distributions of the data
learn to auto-regress to the missing values
MODEL
p(x3|x2, x1)
![Page 33: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/33.jpg)
33
x1 x2 x3 xM
model the conditional distributions of the data
learn to auto-regress to the missing values
MODEL
p(x4|x3, x2, x1)
![Page 34: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/34.jpg)
34
x1 x2 x3 xM
model the conditional distributions of the data
learn to auto-regress to the missing values
MODEL
p(x5|x4, x3, x2, x1)
![Page 35: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/35.jpg)
35
x1 x2 x3 xM
model the conditional distributions of the data
learn to auto-regress to the missing values
MODEL
p(x6|x5, x4, x3, x2, x1)
![Page 36: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/36.jpg)
36
x1 x2 x3 xM
model the conditional distributions of the data
learn to auto-regress to the missing values
MODEL
p(xM |xM�1, . . . , x1)
![Page 37: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/37.jpg)
37
maximum likelihood
to fit the model to the empirical data distribution, maximize the likelihood of the true data examples
likelihood: p(x) =MY
j=1
p(xj |x<j)
optimize the parameters to assign high (log) probability to the true data examples
learning: ✓⇤ = argmax✓ log p(x)
auto-regressive conditionals
logarithm for numerical stability
![Page 38: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/38.jpg)
38
models
can parameterize conditional distributions using a recurrent neural network
unrolling auto-regressive generation from an RNN “teacher forcing”
Deep Learning, Goodfellow et al., 2016
(chapter 10)
![Page 39: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/39.jpg)
39
models
can parameterize conditional distributions using a recurrent neural network
The Unreasonable Effectiveness of Recurrent Neural Networks, Karpathy, 2015
Pixel Recurrent Neural Networks, van den Oord et al., 2016
![Page 40: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/40.jpg)
can also condition on a local window using convolutional neural networks
40
models
Pixel Recurrent Neural Networks, van den Oord et al., 2016
Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016
WaveNet: A Generative Model for Raw Audio, van den Oord et al., 2016
![Page 41: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/41.jpg)
41
output distributions
need to choose a form for the conditional output distribution, i.e. how do we express ?p(xj |x1, . . . , xj�1)
model the data as categorical variables
model the data as continuous variables
Gaussian, logistic, etc. output
multinomial output
![Page 42: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/42.jpg)
42
example applications
text images
Pixel Recurrent Neural Networks, van den Oord et al., 2016
WaveNet: A Generative Model for Raw Audio, van den Oord et al., 2016
speech
![Page 43: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/43.jpg)
43
recap: auto-regressive models
difficult to capture “high-level” global structure
need to impose conditioning order
sequential sampling is computationally expensive
x1 x2 x3 xM
MODEL
p(x6|x5, x4, x3, x2, x1)
model conditional distributions to auto-regress to missing values
Pros
tractable and straightforward to evaluate the (log) likelihood
great at capturing details
superior quantitative performance
Cons
![Page 44: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/44.jpg)
E X P L I C I T L AT E N T VA R I A B L E M O D E L S
![Page 45: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/45.jpg)
45
reality generates sensory stimuli from underlying latent phenomena
REALITY
matterenergyforces
etc.
laws of nature inference
STIMULI PERCEPTION
object identitiesobject locationscommunication
etc.
can use latent variables to help model these phenomena
![Page 46: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/46.jpg)
46
probabilistic graphical models provide a framework for modeling relationships between random variables
observed variable
unobserved (latent) variable
x y
x y
directed
undirected
PLATE NOTATION
set of variables
x
N
![Page 47: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/47.jpg)
47
N
review exercise: represent an auto-regressive model of 3 random variables
with plate notation
x1 x2 x3
✓
p✓(x1) p✓(x2|x1) p✓(x3|x1, x2)
![Page 48: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/48.jpg)
48
comparing auto-regressive models and latent variable models
N
x1 x2 x3
✓
p✓(x1) p✓(x2|x1) p✓(x3|x1, x2)
auto-regressive model
N
x1 x2 x3
✓
z
p✓(x1|z) p✓(x2|z) p✓(x3|z)
p✓(z)
latent variable model
![Page 49: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/49.jpg)
49
x
z
N
example: undirected latent variable model
cut f
or tim
e
restricted Boltzmann machine
![Page 50: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/50.jpg)
50
example: directed latent variable model
x
z
N
✓
![Page 51: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/51.jpg)
51
example: directed latent variable model
p(x, z) = p(x|z)p(z)GENERATIVE MODEL
jointconditional likelihood
prior
x
z
N
✓
Generation
1. sample from z p(z)
2. use samples to sample from z p(x|z)x
intuitive example
object ~ p(objects)
lighting ~ p(lighting)
background ~ p(bg)
RENDER
![Page 52: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/52.jpg)
52
example: directed latent variable model
posterior
joint
marginal likelihood
INFERENCE
p(z|x) = p(x, z)
p(x)
x
z
N
✓
Posterior Inference
use Bayes’ rule (def. of cond. prob.)
provides conditional distribution over latent variables
intuitive example
observation
what is the probability that I am observing a cat given these pixel observations?
p(cat | )p( |cat) p(cat)______________=
p( )
![Page 53: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/53.jpg)
53
example: directed latent variable model
marginal likelihood
joint
MARGINALIZATION
p(x) =
Zp(x, z)dz
x
z
N
✓
Model Evaluation
to evaluate the likelihood of an observation, we need to marginalize over all latent variables
i.e. consider all possible underlying states
intuitive example
observation
how likely is this observation under my model?(what is the probability of observing this?)
for all objects, lighting, backgrounds, etc.:
how plausible is this example?
![Page 54: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/54.jpg)
54
example: directed latent variable model
to fit the model, we want to evaluate the marginal (log) likelihood of the data
✓⇤ = argmax✓ log p(x)
however, this is generally intractable, due to the integration over latent variables
p(x) =
Zp(x, z)dz
integration in high-dimensions
x
z
N
✓
![Page 55: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/55.jpg)
55
variational inference
introduce an approximate posterior, then minimize KL-divergence to the true posterior
q⇤(z|x) = argminqDKL(q(z|x)||p(z|x))
✓̃⇤ = argmax✓Lprovides a lower bound onL log p(x), so we can use to (approximately) fit the modelL
instead of optimizing the (log) likelihood, optimize a lower bound on it
main idea
p(z|x)
z
q⇤(z|x) = argmaxqL
where is the evidence lower bound (ELBO), defined asL L ⌘ Ez⇠q(z|x) [log p(x, z)� log q(z|x)]
evaluating KL-divergence involves evaluating , instead maximize :p(z|x) L
q(z|x)
![Page 56: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/56.jpg)
56
interpreting the ELBO
L ⌘ Ez⇠q(z|x) [log p(x, z)� log q(z|x)]
we can write the ELBO as
= Ez⇠q(z|x) [log p(x|z)p(z)� log q(z|x)]
= Ez⇠q(z|x) [log p(x|z) + log p(z)� log q(z|x)]{reconstruction
{regularization
is optimized to represent the data while staying close to the priorq(z|x)
resembles the “auto-encoding” framework
many connections to compression, information theory
= Ez⇠q(z|x) [log p(x|z)]�DKL(q(z|x)||p(z))
![Page 57: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/57.jpg)
57
variational inference involves optimizing the approximate posterior for each data example
q⇤(z|x) = argmaxqLcan be solved using gradient ascent and (stochastic) backpropagation / REINFORCE,
but can be computationally expensive
can instead amortize inference over data examples by learning a separate inference model to output approximate posterior estimates
“variational auto-encoder”
x inference model
z
q(z|x)
p(z)
generative model p(
x|z)
Learning to Infer, Marino et al., 2017
![Page 58: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/58.jpg)
58
hierarchical latent variable models
xN
✓
z1
z2
Learning Hierarchical Features from Generative Models, Zhao et al., 2017
Improving Variational Inference with Inverse
Auto-regressive Flow, Kingma et al., 2016
iterative inference modelssequential latent variable models
A Recurrent Latent Variable Model for
Sequential Data, Chung et al., 2015
Deep Variational Bayes Filters:
Unsupervised Learning of State Space
Models from Raw Data, Karl et al., 2016Learning to Infer, Marino et al., 2017
![Page 59: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/59.jpg)
59
introducing latent variables to a generative model generally makes evaluating the (log) likelihood intractable
need to consider all possible “hypotheses” to evaluate (marginal) likelihood of the “outcome”
“hypotheses”
“outcome”
![Page 60: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/60.jpg)
60
change of variables
under certain conditions, we can use the change of variables formula to exactly evaluate the log likelihood
consider a variable in one dimension: x ⇠ Uniform(0, 1)
x
p(x)
1
1
then let be an affine transformation of , e.g. .
y xy = 2x+ 1
1
1
p(y)
y2 3
0.5
x
y
dx
dy
dy
dx> 0
x
y
dx
dy
dy
dx< 0
to conserve probability mass, p(y) = p(x)
����dx
dy
����
Normalizing Flows Tutorial, Eric Jang, 2018
![Page 61: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/61.jpg)
61
change of variables
in higher dimensions, conservation of probability mass generalizes to
p(y) = p(x)
����detdx
dy
���� = p(x)��detJ�1
��
where is the Jacobian matrix of the transformation, J J =dy
dx
CHANGE OF VARIABLES FORMULA
“law of the unconscious statistician” (LOTUS) can evaluate the probability from one variable’s distribution by evaluating the
probability of a transformed variable and the volume transformation
for certain classes of transformations, this is tractable to evaluate
expresses the local distortion in volume from the linear transformation��detJ�1
��
![Page 62: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/62.jpg)
62
change of variables
to use the change of variables formula, we need to evaluate��detJ�1
��
for an arbitrary Jacobian matrix, this is worst caseN ⇥N O(N3)
restricting the transformations to those with diagonal or triangular inverse Jacobians allows us to compute in .
��detJ�1��
O(N)
product of diagonal entries
![Page 63: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/63.jpg)
63
change of variables
Density Estimation Using Real NVP, Dinh et al., 2016
can transform the data into a space that is easier to model
![Page 64: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/64.jpg)
64
change of variables for variational inference: normalizing flows
use more complex approximate posterior, but evaluate a simpler distribution
chain together multiple transforms to get more expressive model
Variational Inference with Normalizing Flows, Rezende & Mohamed, 2015
target distribution:
transforms
Normalizing Flows Tutorial, Eric Jang, 2018
transform q(z|x)
![Page 65: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/65.jpg)
65
transforms
additive coupling layer Dinh et al., 2014
yd+1:D = xd+1:D + f(x1:d)y1:d = x1:d
planar flow Rezende & Mohamed, 2015
y = x+ f(x)� g(h(x)|x+ b(x))
affine coupling layer Dinh et al., 2016
yd+1:D = xd+1:D � exp(f(x1:d)) + g(x1:d)
y1:d = x1:d
inverse auto-regressive flow (IAF) Kingma et al., 2016
y =x� f(x)
exp(g(x))
masked auto-regressive flow (MAF) Papamakarios et al., 2017
y = x� exp(g(x)) + f(x)
![Page 66: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/66.jpg)
66
NICE: Non-linear Independent Components EstimationDinh et al., 2014
Variational Inference with Normalizing FlowsRezende & Mohamed, 2015
Density Estimation Using Real NVPDinh et al., 2016
Improving Variational Inference with Inverse Autoregressive FlowKingma et al., 2016
recent work
![Page 67: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/67.jpg)
67
recap: explicit latent variable models
difficult to capture details
requires additional assumptions on latent variables
x
z
N
✓
model the data through latent variables
Pros Cons
can capture abstract variables, good for semi supervised learning
relatively fast sampling / training
theoretical foundations from info. theory
likelihood evaluation / inference often intractable
![Page 68: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/68.jpg)
I M P L I C I T L AT E N T VA R I A B L E M O D E L S
![Page 69: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/69.jpg)
69
instead of using an explicit probability density, learn a model that defines an implicit density
p(x̂)
x
specify a stochastic procedure for generating the data that does not require an explicit likelihood evaluation
Learning in Implicit Generative Models, Mohamed & Lakshminarayanan, 2016
p(x̃)
![Page 70: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/70.jpg)
70
Generative Stochastic Networks (GSNs)
train an auto-encoder to learn Monte Carlo sampling transitions
the generative distribution is implicitly defined by this transition
Deep Generative Stochastic Networks Trainable by Backprop, Bengio et al., 2013
![Page 71: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/71.jpg)
71
estimate density ratio through Bayesian two-sample test
p(x̃)p(x̂)
x
p(x̂)data distribution p(x̃)generated distribution
p(x̂)
p(x̃)=
p(x|data)p(x|gen.)
density estimation becomes a sample discrimination task
p(x̂)
p(x̃)=
p(data|x)p(x)/p(data)p(gen.|x)p(x)/p(gen.) (Bayes’ rule)
p(x̂)
p(x̃)=
p(data|x)p(gen.|x) (assuming equal dist. prob.)
![Page 72: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/72.jpg)
72
Generative Adversarial Networks (GANs)
learn the discriminator:
p(data|x) = D(x) p(gen.|x) = 1�D(x)
Bernoulli outcome:
log p(y|x) = logD(x̂) + log(1�D(x̃))
y 2 {data, gen.}
Mohamed, 2016
Goodfellow, 2016
two-sample criterion:
minG
maxD
Ep(x̂) [logD(x̂)] + Ep(x̃) [log(1�D(x̃))]
x̂ ⇠ p(x̂)
x̃ ⇠ p(x̃)
![Page 73: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/73.jpg)
73
Generative Adversarial Networks (GANs)
x̂ ⇠ p(x̂)
x̃ ⇠ p(x̃)
two-sample criterion:
minG
maxD
Ep(x̂) [logD(x̂)] + Ep(x̃) [log(1�D(x̃))]
in practice:maxD
Ep(x̂) [logD(x̂)] + Ep(x̃) [log(1�D(x̃))]
maxG
Ep(x̃) [logD(x̃)]
Generative Adversarial Networks, Goodfellow et al., 2014Goodfellow, 2016
![Page 74: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/74.jpg)
74
interpretation
data manifold
Aaron Courville
explicit model
explicit models tend to cover the entire data manifold, but are constrained
implicit model
implicit models tend to capture part of the data manifold, but can neglect other parts
“mode collapse”
![Page 75: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/75.jpg)
75
Generative Adversarial Networks (GANs)
GANs can be difficult to optimize
Improved Training of Wasserstein GANs, Gulrajani et al., 2017
![Page 76: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/76.jpg)
76
evaluation
inception score
A Note on the Inception Score, Barratt & Sharma, 2018
without an explicit likelihood, it is difficult to quantify the performance
use a pre-trained Inception v3 model to quantify class and distribution entropy
IS(G) = exp�Ep(x̃)DKL(p(y|x̃)||p(y))
�
p(y|x̃) is the class distribution for a given image
p(y) =
Zp(y|x̃)dx̃ is the marginal class distribution
should be highly peaked (low entropy)
want this to be uniform (high entropy)
Improved Techniques for Training GANs, Salimans et al., 2016
![Page 77: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/77.jpg)
77
extensions: Wasserstein GAN
under an “ideal” discriminator, the generator minimizes the Jensen-Shannon divergence
DJS(p(x̂)||p(x̃)) =1
2DKL(p(x̂)||
1
2(p(x̂) + p(x̃))) +
1
2DKL(p(x̃)||
1
2(p(x̂) + p(x̃)))
however, this metric can be discontinuous, making it difficult to train
can instead use the Wasserstein (Earth Mover’s) distance (which is continuous and diff. almost everywhere):
W (p(x̂), p(x̃)) = inf�2⇧(p(x̂),p(x̃))
E(x̂,x̃)⇠� [||x̂� x̃||]
think of it as the “minimum cost of transporting points between two distributions”
intractable to actually evaluate Wasserstein distance, but by constraining the discriminator, can evaluate
minG
maxD2D
Ep(x̂) [D(x̂)]� Ep(x̃) [D(x̃)]
is the set of Lipschitz functions, which can be enforced through weight clipping or gradient penaltyD
Was
sers
tein
✓
Jens
en-S
hann
on
✓
✓ is a gen. model parameter
Wasserstein GANs, Arjovsky et al., 2017
Improved Training of Wasserstein GANs, Gulrajani et al., 2017
![Page 78: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/78.jpg)
78
extensions: inference
Adversarially Learned Inference, Dumoulin et al., 2017
Adversarial Feature Learning, Donahue et al., 2017
can we also learn to infer a latent representation?
![Page 79: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/79.jpg)
79
applicationsimage to image translation
Image-to-Image Translation with Conditional Adversarial Networks, Isola et al., 2016
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, Zhu et al., 2017
experimental simulation
Learning Particle Physics by Example, de Oliveira et al., 2017
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative
Adversarial Nets, Chen et al., 2016
interpretable representationstext to image synthesis
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, Zhang et al., 2016
music synthesis
MIDINET: A CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORK FOR SYMBOLIC-DOMAIN MUSIC GENERATION, Yang et al., 2017
![Page 80: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/80.jpg)
80
recap: implicit latent variable models
x̂ ⇠ p(x̂)
x̃ ⇠ p(x̃)
Pros
able to learn flexible models
requires fewer modeling assumptions
capable of learning latent representation
Cons
difficult to evaluate
sensitive, difficult to optimize
can be difficult to incorporate model assumptions
![Page 81: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/81.jpg)
D I S C U S S I O N
![Page 82: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/82.jpg)
82
generative models: what are they good for?
generative models model the data distribution
1. can generate and simulate data
2. can extract structure from data
![Page 83: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/83.jpg)
83
generative models: what’s next?
applying generative models to new forms of data
incorporating generative models into complementary learning systems
![Page 84: Deep Learning Part III - Yisong Yue€¦ · Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord](https://reader034.vdocument.in/reader034/viewer/2022050216/5f62018a562da2561a527a68/html5/thumbnails/84.jpg)