
Page 1: (source: alinlab.kaist.ac.kr/resource/AI602_Lec09_Explicit_generative_models.pdf)

Algorithmic Intelligence Lab

AI602: Recent Advances in Deep Learning

Lecture 9

Slides made by Sungsoo Ahn and Sejun Park

KAIST EE

Generative Models (other than GANs)

Page 2:

1. Introduction
   • Explicit vs. implicit generative models

2. Auto-Regressive Models
   • Pixel recurrent neural network

3. Flow-based Models
   • Real-valued non-volume preserving transformation

4. Variational Auto-Encoder
   • Variational auto-encoder (VAE)
   • Improving VAE with importance weighted samples
   • Improving VAE with normalizing flows

Table of Contents

2

Page 3:

1. Introduction
   • Explicit vs. implicit generative models

2. Auto-Regressive Models
   • Pixel recurrent neural network

3. Flow-based Models
   • Real-valued non-volume preserving transformation

4. Variational Auto-Encoder
   • Variational auto-encoder (VAE)
   • Improving VAE with importance weighted samples
   • Improving VAE with normalizing flows

Table of Contents

3

Page 4:

• From now on, we study generative models with an explicit data distribution:

• Autoregressive models use conditional distributions to express the target distribution sequentially.

• Flow-based models warp a simple distribution via invertible transformations to match the target distribution.

• Variational auto-encoders model the target distribution by maximizing a lower bound on its likelihood over the training dataset.

• Recall that generative adversarial networks (GANs) model an implicit distribution.

Explicit vs. Implicit Generative Models

4

(figure: random noise is mapped through model parameters to data; the question is what density the model assigns to the data)
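For reference, the three explicit families can be summarized in standard notation (a sketch, not reproduced from the slide figures):

  Autoregressive:  p_\theta(x) = \prod_{i=1}^{D} p_\theta(x_i \mid x_1, \dots, x_{i-1})

  Flow-based:      p_\theta(x) = p_Z(f_\theta(x)) \, \bigl| \det \partial f_\theta(x) / \partial x \bigr|, with f_\theta invertible

  VAE (ELBO):      \log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))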

Page 5:

1. Introduction
   • Explicit vs. implicit generative models

2. Auto-Regressive Models
   • Pixel recurrent neural network

3. Flow-based Models
   • Real-valued non-volume preserving transformation

4. Variational Auto-Encoder
   • Variational auto-encoder (VAE)
   • Improving VAE with importance weighted samples
   • Improving VAE with normalizing flows

Table of Contents

5

Page 6:

• Autoregressive generation (e.g., pixel-by-pixel for images) [Oord et al., 2016]:

• For example, each RGB pixel is generated autoregressively:

• Each pixel is treated as a discrete variable, sampled from a softmax distribution over the 256 intensity values:

Pixel-Recurrent Neural Network

6
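For reference, the factorization used here, in the standard notation of the PixelRNN paper for an n x n image (the slide's own equations are not reproduced):

  p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \dots, x_{i-1})

  p(x_i \mid x_{<i}) = p(x_{i,R} \mid x_{<i}) \; p(x_{i,G} \mid x_{<i}, x_{i,R}) \; p(x_{i,B} \mid x_{<i}, x_{i,R}, x_{i,G})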

Page 7:

• Using CNNs and RNNs to model the conditional distributions
  • Simply treat the image as a one-dimensional (instead of two-dimensional) vector:

Pixel-Recurrent Neural Network

7

CNN-based

masked convolution

(input)

(hidden layer)

Page 8:

• Using CNNs and RNNs to model the conditional distributions
  • Simply treat the image as a one-dimensional (instead of two-dimensional) vector:

Pixel-Recurrent Neural Network

8

CNN-based

(input)

(generation) (hidden layer)

Page 9:

• Using CNNs and RNNs to model the conditional distributions
  • Simply treat the image as a one-dimensional (instead of two-dimensional) vector:

Pixel-Recurrent Neural Network

9

CNN-based

effective receptive field

Page 10:

• Using CNNs and RNNs to model the conditional distributions
  • Simply treat the image as a one-dimensional (instead of two-dimensional) vector:

Pixel-Recurrent Neural Network

10

CNN-based

LSTM

RNN-based

effective receptive field

Page 11:

• Using CNNs and RNNs to model the conditional distributions
  • Simply treat the image as a one-dimensional (instead of two-dimensional) vector:

• Image generation requires multiple forward passes, one for each pixel.

• Evaluating the likelihood (at training time) requires a single forward pass for the CNN, but multiple (sequential, non-parallelizable) LSTM passes are required for the RNN (slow).

• The effective receptive field (the context for pixel generation) is unbounded for the RNN, but bounded for the CNN (constrained).

Pixel-Recurrent Neural Network

11

(figure: effective receptive fields of the CNN-based and RNN-based (LSTM) models)

Next, extending to two-dimensional data

Page 12:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

Pixel-Recurrent Neural Network

12

masked convolution

(input)

(hidden layer)

Pixel CNN
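To make the masking concrete, here is a minimal sketch of a masked 2-D convolution in the spirit of Pixel CNN. It assumes PyTorch; the class name, layer sizes, and mask handling are illustrative rather than the paper's exact implementation.

import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """2-D convolution whose kernel is masked so each output pixel only depends on
    pixels above it, or to its left in the same row (already-generated pixels)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')  # 'A': exclude the current pixel (input layer), 'B': include it
        _, _, kh, kw = self.weight.shape
        mask = torch.ones(kh, kw)
        # In the center row, zero out weights from the center pixel ('A') or just after it ('B') onward.
        mask[kh // 2, kw // 2 + (mask_type == 'B'):] = 0
        # Zero out all weights in rows below the center row.
        mask[kh // 2 + 1:, :] = 0
        self.register_buffer('mask', mask[None, None])

    def forward(self, x):
        self.weight.data *= self.mask  # re-apply the mask before every convolution
        return super().forward(x)

# Usage sketch: a type-'A' layer for the input, type-'B' layers afterwards.
layer = MaskedConv2d('A', in_channels=3, out_channels=64, kernel_size=7, padding=3)
out = layer(torch.randn(1, 3, 32, 32))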

Page 13:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

Pixel-Recurrent Neural Network

13

masked convolution

masked convolution

(input)

(generation)

(hidden layer)

Pixel CNN

Page 14:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

  • Row LSTM uses LSTMs, generating the image row by row (not pixel by pixel).

Pixel-Recurrent Neural Network

14

(row) masked convolution

masked convolution

masked convolution

Pixel CNN Row LSTM

Page 15:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

  • Row LSTM uses LSTMs, generating the image row by row (not pixel by pixel).

Pixel-Recurrent Neural Network

15

(row) masked convolution

1-dimensional convolution

masked convolution

masked convolution

Pixel CNN Row LSTM

Page 16:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

  • Row LSTM uses LSTMs, generating the image row by row (not pixel by pixel).

Pixel-Recurrent Neural Network

16

(row) masked convolution

time sequence for LSTMs

convolutional connections for LSTM hidden states

1-dimensional convolution

masked convolution

masked convolution

Pixel CNN Row LSTM

Page 17:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

  • Row LSTM uses LSTMs, generating the image row by row (not pixel by pixel).

Pixel-Recurrent Neural Network

17

(row) masked convolution

time sequence for LSTMs

convolutional connections for LSTM hidden states

1-dimensional convolution

masked convolution

masked convolution

Pixel CNN Row LSTM

Next, introducing column-wise dependencies using LSTMs

Page 18:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

  • Row LSTM uses LSTMs, generating the image row by row (not pixel by pixel).

  • Diagonal BiLSTM uses bi-directional LSTMs to generate the image pixel by pixel.

Pixel-Recurrent Neural Network

18

masked convolution

masked convolution

Diagonal BiLSTM

Page 19:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

  • Row LSTM uses LSTMs, generating the image row by row (not pixel by pixel).

  • Diagonal BiLSTM uses bi-directional LSTMs to generate the image pixel by pixel.

Pixel-Recurrent Neural Network

19

masked convolution

masked convolution

bi-directional LSTM

Diagonal BiLSTM

Page 20:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

  • Row LSTM uses LSTMs, generating the image row by row (not pixel by pixel).

  • Diagonal BiLSTM uses bi-directional LSTMs to generate the image pixel by pixel.

Pixel-Recurrent Neural Network

20

“diagonal” time sequence for LSTMs

masked convolution

masked convolution

bi-directional LSTM

Diagonal BiLSTM

Page 21:

• Using CNNs and RNNs to model the conditional distributions
  • Pixel CNN uses masked convolutional layers (to enforce the autoregressive ordering).

  • Row LSTM uses LSTMs, generating the image row by row (not pixel by pixel).

  • Diagonal BiLSTM uses bi-directional LSTMs to generate the image pixel by pixel.

Pixel-Recurrent Neural Network

21

masked convolution

• The receptive field now covers every pixel generated previously.

masked convolution

bi-directional LSTM

“diagonal” time sequence for LSTMs

Diagonal BiLSTM

Page 22:

• Image generation results from CIFAR-10 and ImageNet:

• Evaluation of negative log-likelihood (NLL) on the MNIST and CIFAR-10 datasets:

• In general, pixel CNN is easiest to train and diagonal BiLSTM performs best.

Pixel-Recurrent Neural Network

22

(tables: NLL on MNIST and CIFAR-10; figure: generated samples from CIFAR-10 and ImageNet)

Only explicit models (not GANs) can compute the NLL.

Page 23:

• Pixel CNN++ [Salimans et al., 2017] improves pixel CNN by replacing the softmax distribution with a discretized logistic mixture likelihood.

• WaveNet [Oord et al., 2017] applies pixel CNN to speech data by introducing dilated convolutional layers for scalability.

• Applications of pixel CNN to video [Kalchbrenner et al., 2016a] and machine translation [Kalchbrenner et al., 2016b] have also been investigated.

Other Auto-Regressive Models

23

(figure: a standard convolutional layer vs. a dilated convolutional layer)
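As a minimal sketch of the dilated causal convolution idea used by WaveNet (assuming PyTorch; the class name and layer sizes are illustrative, not the paper's configuration):

import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalConv1d(nn.Module):
    """Causal 1-D convolution with dilation: the output at time t only sees inputs at times <= t."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.left_pad = dilation  # with kernel_size=2, left-padding by `dilation` keeps the layer causal
        self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

    def forward(self, x):                  # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))   # pad only on the left (past), never the right (future)
        return self.conv(x)

# Doubling the dilation at each layer (1, 2, 4, ...) grows the receptive field exponentially with depth.
stack = nn.Sequential(*[DilatedCausalConv1d(64, 2 ** i) for i in range(6)])
out = stack(torch.randn(1, 64, 1000))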

Page 24:

1. Introduction
   • Explicit vs. implicit generative models

2. Auto-Regressive Models
   • Pixel recurrent neural network

3. Flow-based Models
   • Real-valued non-volume preserving transformation

4. Variational Auto-Encoder
   • Variational auto-encoder (VAE)
   • Improving VAE with importance weighted samples
   • Improving VAE with normalizing flows

Table of Contents

24

Page 25:

* source: Jang, https://blog.evjang.com/2018/01/nf1.html; Mohamed et al., https://www.shakirm.com/slides/DeepGenModelsTutorial.pdf

• Modifying the data distribution by a flow (sequence) of invertible transformations [Rezende et al., 2015]:

• The final variable follows some specified prior.

• The data distribution is explicitly modeled by the change-of-variables formula:

• The log-likelihood can be maximized directly.

Flow-based Models

25
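For reference, the change-of-variables identity behind these models, written in standard notation for a flow h_0 = x, h_k = f_k(h_{k-1}), k = 1, ..., K, with prior p_Z on the final variable (a sketch, not the slide's own equation):

  \log p_X(x) = \log p_Z\bigl(f_K \circ \cdots \circ f_1(x)\bigr) + \sum_{k=1}^{K} \log \Bigl| \det \frac{\partial f_k(h_{k-1})}{\partial h_{k-1}} \Bigr|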

Page 26:

• Modifying the data distribution by a flow (sequence) of invertible transformations [Rezende et al., 2015]:

• The final variable follows some specified prior.

• The data distribution is explicitly modeled by the change-of-variables formula:

• The log-likelihood can be maximized directly.

• For training, it is important to design transformations with tractable computation of the Jacobian determinant, which in general takes cubic time in the dimension.

Flow-based Models

26

Next, introducing a flexible and tractable form of invertible transformation

Page 27:

• Coupling layer for flow with tractable inference [Dinh et al., 2017]:

  1. Partition the variable into two parts:

  2. A coupling law defines a simple invertible transformation of the first partition given the second partition (its component functions are described later).

  3. The second partition is left invariant.

Real Valued Non-Volume Preserving Flow

27

Next, specifying a particular effective form of the coupling layer

(figure: channel-wise partition vs. spatial partition)

Page 28:

• The affine coupling layer was shown to be effective in practice:

• The Jacobian of each transformation becomes a lower triangular matrix:

• Inference for such transformations can be done in tractable time.
  • The determinant of a lower triangular matrix can be computed in linear time (the product of its diagonal entries).

Real Valued Non-Volume Preserving Flow

28

element-wise product neural networks
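In the slide's convention (the second partition x_2 is left unchanged), the affine coupling layer and its log-determinant can be written as follows, where s and t denote the scale and translation networks in the notation of the Real NVP paper (a sketch, not the slide's own equation):

  y_1 = x_1 \odot \exp\bigl(s(x_2)\bigr) + t(x_2), \qquad y_2 = x_2

  \log \Bigl| \det \frac{\partial y}{\partial x} \Bigr| = \sum_j s(x_2)_j

The Jacobian is triangular with diagonal entries exp(s(x_2)), so the determinant is just a sum of the outputs of s, and s and t never need to be inverted.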

Page 29:

• For each coupling layer, there is an asymmetry since one partition is left invariant.
  • Coupling layers are therefore paired, alternating the partition roles, to overcome this issue.

• Multi-scale architectures are used.
  • Half of the variables follow a Gaussian distribution at each scale.

Real Valued Non-Volume Preserving Flow

29

paired coupling layer
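A minimal sketch of an affine coupling layer with an alternating binary mask, in the Real NVP spirit; it assumes PyTorch, and the small MLPs for s and t as well as the layer sizes are placeholders, not the paper's architecture.

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden, mask):
        super().__init__()
        self.register_buffer('mask', mask)  # 1 = kept fixed, 0 = transformed
        self.s = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim), nn.Tanh())
        self.t = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):                   # x -> y, plus log|det Jacobian| per example
        x_fixed = x * self.mask
        s = self.s(x_fixed) * (1 - self.mask)
        t = self.t(x_fixed) * (1 - self.mask)
        y = x_fixed + (1 - self.mask) * (x * torch.exp(s) + t)
        return y, s.sum(dim=1)

    def inverse(self, y):                   # y -> x (used for sampling)
        y_fixed = y * self.mask
        s = self.s(y_fixed) * (1 - self.mask)
        t = self.t(y_fixed) * (1 - self.mask)
        return y_fixed + (1 - self.mask) * (y - t) * torch.exp(-s)

# Alternate the mask between paired layers so every dimension is eventually transformed.
mask = (torch.arange(4) % 2 == 0)
layers = [AffineCoupling(4, 32, mask.float()), AffineCoupling(4, 32, (~mask).float())]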

Page 30:

• Test log-likelihood results on held-out images:

• Synthetic image generation results:

Real Valued Non-Volume Preserving Flow

30

(Log-likelihood evaluation uses the forward pass of the coupling functions; sampling uses their inverse pass.)

Page 31:

• If one uses a fully factorized prior distribution, then flow-based models can be interpreted as non-linear independent components estimation (NICE) [Dinh et al., 2015].

• The alternating pairing of coupling layers has been generalized to learned linear transformations (invertible 1x1 convolutions) [Kingma et al., 2018].

• More advanced flows, specialized for density estimation (not sampling), have been proposed using autoregressive transformations [Papamakarios et al., 2017].

Other Flow-based Models

31

(figures: factorized independent components mapped by non-linear transformations to the non-factorized data distribution; invertible 1x1 convolution)

Page 32:

1. Introduction
   • Explicit vs. implicit generative models

2. Auto-Regressive Models
   • Pixel recurrent neural network

3. Flow-based Models
   • Real-valued non-volume preserving transformation

4. Variational Auto-Encoder
   • Variational auto-encoder (VAE)
   • Improving VAE with importance weighted samples
   • Improving VAE with normalizing flows

Table of Contents

32

Page 33:

• Consider the following generative model:

• Fixed prior on random latent variable.

• e.g., standard Normal distribution

• Parameterized likelihood (decoder) for generation:

• e.g., Normal distribution parameterized by neural network

• Resulting generative distribution (to optimize):

Variational Autoencoder

33

(figure: a latent variable is mapped to data through the “decoding” distribution)

Page 34:

• The variational autoencoder (VAE) introduces an auxiliary distribution (encoder) [Kingma et al., 2013].

• Each term is replaced by its lower bound:

• The bound becomes an equality when the encoding distribution matches the true posterior.

Variational Autoencoder

34

(figure: data is mapped to a latent representation through the “encoding” distribution)
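For reference, the bound referred to above in standard VAE notation (a sketch, not the slide's own derivation):

  \log p_\theta(x) = \log \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \frac{p_\theta(x \mid z)\, p(z)}{q_\phi(z \mid x)} \right] \ge \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr] - \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr),

with equality if and only if q_\phi(z \mid x) equals the true posterior p_\theta(z \mid x).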

Page 35:

• The training objective becomes:

where the latent variables are sampled from the encoding distribution.

• However, it is non-trivial to train with backpropagation due to the sampling procedure:

Variational Autoencoder

35

(Since the latent sample is fixed after being drawn, how can gradients flow back to the encoder parameters?)

(The KL term is tractable between two Gaussian distributions.)

Page 36:

• Reparameterization trick is based on the change-of-variables formula:

• Latent variable can be similarly parameterized by encoder network:

Variational Autoencoder

36

scaling shifting
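A minimal sketch of the reparameterization trick for a diagonal Gaussian encoder, assuming PyTorch; the encoder and decoder names in the usage comment are placeholders.

import torch

def reparameterize(mu, log_var):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), keeping gradients w.r.t. mu and log_var."""
    eps = torch.randn_like(mu)             # all randomness is isolated in eps
    return mu + torch.exp(0.5 * log_var) * eps

# Usage inside a VAE forward pass (the encoder outputs the mean and log-variance):
# mu, log_var = encoder(x)
# z = reparameterize(mu, log_var)
# x_recon = decoder(z)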

Page 37:

• Total loss of variational autoencoder:

• Recall that the Gaussian parameters (mean and variance) are produced by the encoder network.

• Derivative of first part:

Variational Autoencoder

37

log-normal distribution

reparameterization trick

Page 38:

• Total loss of variational autoencoder:

• Recall that the Gaussian parameters (mean and variance) are produced by the encoder network.

• Derivative of second part:

Variational Autoencoder

38

element-wise factorization

KL divergence between normal distributions
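The closed form used for this KL term, for a diagonal Gaussian encoder and a standard Normal prior (a standard result, stated here for reference):

  \mathrm{KL}\bigl(\mathcal{N}(\mu, \mathrm{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\bigr) = \tfrac{1}{2} \sum_{j} \bigl( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \bigr)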

Page 39:

• Based on the proposed scheme, the variational autoencoder successfully generates images:

• Interpolation of latent variables induces transitions in the generated images:

Variational Autoencoder

39

Training on MNIST

Page 40:

• Recall: evidence lower bound (ELBO, also variational lower bound).

• Approximating the log-likelihood in this way is also called variational inference.
  • Considered to be an important topic in theory.

• However, the gap in the bound does not decrease no matter how many samples we use.

• Two ways of improving variational inference:
  • Replace the ELBO with a tighter bound: importance weighted samples

  • Use a more powerful parameterization of the approximate posterior: inverse auto-regressive flow

Towards Improving Variational Inference

40

Page 41:

• Observe that the ELBO can also be derived from Jensen's inequality:

• Based on the concavity of the logarithm, interchange the order of logarithm and expectation.

• The importance weighted AE (IWAE) relaxes the inequality [Burda et al., 2016]:

• This becomes the original ELBO with a single sample and becomes tight as the number of samples goes to infinity.

Importance Weighted Auto-Encoder

41

also called importance weights
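For reference, the IWAE bound with k importance-weighted samples, in standard notation (not the slide's own equation):

  \mathcal{L}_k = \mathbb{E}_{z_1, \dots, z_k \sim q_\phi(z \mid x)}\left[ \log \frac{1}{k} \sum_{i=1}^{k} \frac{p_\theta(x, z_i)}{q_\phi(z_i \mid x)} \right] \le \log p_\theta(x),

where \mathcal{L}_1 is the ordinary ELBO and \mathcal{L}_k approaches \log p_\theta(x) as k \to \infty.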

Page 42:

• IWAE tightens the ELBO, i.e., it yields a lower negative log-likelihood (NLL).

• It also alleviates the so-called posterior collapse problem: hidden units becoming inactive (invariant for all inputs).

Importance Weighted Auto-Encoder

42

best performance

Page 43:

• A fully factorized Gaussian often leads to a poor approximation of the true posterior distribution.

• This can be improved by introducing a flow-based distribution as a flexible parameterization of the approximate posterior.
  • Recall: the flow functions are invertible in order to allow tractable density estimation.

• Inverse autoregressive flow (IAF) was proposed for improving the variational auto-encoder [Kingma et al., 2016] (details on the next slide).
  • IAF is specialized for variational inference since its forward pass is fast, while its backward pass is slow (the backward pass is not needed for variational inference).

Improving Variational Inference with Inverse Auto-Regressive Flow

43

Page 44:

• Inverse autoregressive flow (IAF) modifies each dimension of the latent variable in an auto-regressive manner [Kingma et al., 2016]:
  • For each dimension:

• Inference for the corresponding normalizing flow is efficient:

Improving Variational Inference with Inverse Auto-Regressive Flow

44

case of updates done in parallel
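A sketch of one IAF step in the notation of Kingma et al. [2016]: the autoregressive networks producing \mu_t and \sigma_t read only the previous iterate z_{t-1} (the i-th outputs depend on z_{t-1,1:i-1}), so every dimension of the update can be computed in parallel:

  z_t = \mu_t(z_{t-1}) + \sigma_t(z_{t-1}) \odot z_{t-1},
  \qquad
  \log q(z_T \mid x) = \log q(z_0 \mid x) - \sum_{t=1}^{T} \sum_{i} \log \sigma_{t,i}

The Jacobian of each step is triangular with diagonal \sigma_t, which gives the simple log-density correction above.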

Page 45:

• Variational lower bound and estimated log likelihood for test dataset (MNIST):

• Average negative log likelihood for test dataset (CIFAR-10):

Improving Variational Inference with Inverse Auto-Regressive Flow

45

Page 46:

• Overcoming the posterior collapse problem (hidden units becoming inactive) is an active topic of research [He et al., 2019].

• Latent variables can be disentangled, i.e., trained to be sensitive to only a single generative factor of the dataset [Higgins et al., 2017]:

• Recent variational autoencoders have been successfully applied to generating high-fidelity images [Razavi et al., 2019]:

Other Variational Autoencoders

46

(figure: interpolation of single latent variables controlling rotation, smiling-ness, and hair)

Page 47:

[Kingma et al., 2013] Auto-Encoding Variational Bayes, ICLR 2014. Link: https://arxiv.org/abs/1312.6114

[Dinh et al., 2015] NICE: Non-Linear Independent Components Estimation, ICLR 2015. Link: https://arxiv.org/abs/1410.8516

[Rezende et al., 2015] Variational Inference with Normalizing Flows, ICML 2015. Link: https://arxiv.org/abs/1505.05770

[Oord et al., 2016] Pixel Recurrent Neural Networks, ICML 2016. Link: https://arxiv.org/abs/1601.06759

[Burda et al., 2016] Importance Weighted Autoencoders, ICLR 2016. Link: https://arxiv.org/abs/1509.00519

[Kingma et al., 2016] Improved Variational Inference with Inverse Autoregressive Flow, NIPS 2016. Link: https://arxiv.org/abs/1606.04934

[Oord et al., 2017] WaveNet: A Generative Model for Raw Audio, SSW 2016. Link: https://arxiv.org/abs/1609.03499

[Dinh et al., 2017] Density Estimation using Real NVP, ICLR 2017. Link: https://arxiv.org/abs/1605.08803

[Higgins et al., 2017] beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, ICLR 2017. Link: https://openreview.net/pdf?id=Sy2fzU9gl

[Papamakarios et al., 2017] Masked Autoregressive Flow for Density Estimation, NIPS 2017. Link: https://arxiv.org/abs/1705.07057

[Kingma et al., 2018] Glow: Generative Flow with Invertible 1x1 Convolutions, NIPS 2018. Link: https://arxiv.org/abs/1807.03039

[He et al., 2019] Lagging Inference Networks and Posterior Collapse in Variational Autoencoders, ICLR 2019. Link: https://openreview.net/forum?id=rylDfnCqF7

References

47