Reviewing Autoencoding Variational Bayes & Deep Generative Models

Elham Dolatabadi, March 2019

Overview

Deep Latent-Variable Models

Variational Approximation

Variational Autoencoding

• Background

• Paper review

• Discussion on deep generative models

Probabilistic graphical models

• Our system is a collection of random variables

Latent variables Z

Observed variables X

Model parameters ϴ

might be huge

might be deep & continuous

Computational Challenge

• For complex models or big data, it may be infeasible to compute the posterior and the marginal likelihood

• Intractable posterior -> No EM

• Approaches to compute posterior:

• Analytical Integration (MAP)

• Approximation:

• Stochastic (sampling, MCMC)

• Variational




e.g. neural nets as components


[Figure: graphical model with latent variable Z, observed variable X, and parameters ϴ and ɸ]




Variational Lower bound

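How: minimize KL(qɸ(z|x) || pϴ(z|x)) w.r.t. the variational parameters ɸ

$$\mathrm{KL}(q_\phi \,\|\, p_\theta) = \int q_\phi(z|x)\,\log\frac{q_\phi(z|x)}{p_\theta(z|x)}\,dz = \log p_\theta(x) + \int q_\phi(z|x)\,\log\frac{q_\phi(z|x)}{p_\theta(z,x)}\,dz$$

The last integral is -L, where L is the variational lower bound:

$$L = -\int q_\phi(z|x)\,\log\frac{q_\phi(z|x)}{p_\theta(z,x)}\,dz$$

$$\log p_\theta(x) = L + \mathrm{KL}(q_\phi \,\|\, p_\theta)$$

$$\log p_\theta(x) \ge L \quad \text{since } \mathrm{KL} \ge 0$$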
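The identity log p(x) = L + KL(q||p) can be checked numerically on a tiny discrete model. The sketch below is only an illustration: the three-state latent variable and all probability values are made up for the example and do not come from the paper or the slides.

```python
import numpy as np

# Hypothetical toy model: z takes 3 values, x is one fixed observation.
p_z = np.array([0.5, 0.3, 0.2])          # prior p(z)
p_x_given_z = np.array([0.9, 0.2, 0.4])  # likelihood p(x|z) of the observed x, per z
q = np.array([0.6, 0.1, 0.3])            # an arbitrary approximate posterior q(z|x)

p_joint = p_z * p_x_given_z              # joint p(z, x)
p_x = p_joint.sum()                      # marginal likelihood p(x)
p_post = p_joint / p_x                   # exact posterior p(z|x)

elbo = -np.sum(q * np.log(q / p_joint))  # L = -sum_z q log(q / p(z, x))
kl = np.sum(q * np.log(q / p_post))      # KL(q || p(z|x))

# The two printed numbers agree, and elbo never exceeds log p(x).
print(np.log(p_x), elbo + kl)
```

Replacing q with the exact posterior p_post drives the KL term to zero, at which point L equals log p(x).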

Variational Lower bound

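Minimize KL(qɸ(z|x) || pϴ(z|x)) w.r.t. ɸ

log pϴ(x) is fixed w.r.t. ɸ, and log pϴ(x) = L + KL, so minimizing the KL is equivalent to maximizing L:

$$L = -\int q_\phi(z|x)\,\log\frac{q_\phi(z|x)}{p_\theta(z,x)}\,dz$$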




Connection to Auto-encoders

Stochastic Encoder qɸ

Encoder maps x into a distribution qɸ. Sampling z ~ qɸ(z|x) is an “encoding” that converts observations to a latent code.

Stochastic Decoder pϴ

Decoder maps z into a distribution pϴ. Sampling x ~ pϴ(x|z) is a “decoding” that reconstructs observations from z.

[Figure: X → stochastic encoder qɸ → sampled latent vector Z → stochastic decoder pϴ → X]

Terms of L:

• Reconstruction loss, or expected negative log-likelihood

• Regularizer
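The reconstruction and regularizer terms can be recovered by factoring the joint as p(z, x) = p(x|z) p(z) inside the lower bound L. The following is a short derivation sketch using the symbols from the slides; writing the prior as p(z) without parameters is an assumption matching the example model p(z) = N(0, I).

```latex
\begin{aligned}
L &= -\int q_\phi(z|x)\,\log\frac{q_\phi(z|x)}{p_\theta(z,x)}\,dz \\
  &= \int q_\phi(z|x)\,\log p_\theta(x|z)\,dz
     \;-\; \int q_\phi(z|x)\,\log\frac{q_\phi(z|x)}{p(z)}\,dz \\
  &= \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]
     \;-\; \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)
\end{aligned}
```

The negated first term is the reconstruction loss (expected negative log-likelihood); the KL term is the regularizer that keeps the encoder distribution close to the prior.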

Deep Latent Variable Model

Benefits:

1. Representing complex p(x)

2. Representing complicated conditional dependencies

Example model:

p(z) = N(0, I)

pϴ(x|z) = N(µ, σ²), µ = fϴ(z) = deep NN

With neural net fϴ(z),

• pϴ(x) would be complicated

• pϴ(z|x) would be intractable

Image source: Kingma's talk, NIPS 2015
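To make the example model concrete, here is a minimal sketch of ancestral sampling from it: draw z from the prior, push it through a network fϴ, then add Gaussian noise. The layer sizes, the tanh MLP, the randomly initialised weights (standing in for trained parameters ϴ) and the fixed noise level sigma are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim, hidden_dim, data_dim = 2, 16, 5   # hypothetical sizes
# Randomly initialised weights stand in for trained decoder parameters theta.
W1 = rng.normal(size=(latent_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(hidden_dim, data_dim))
b2 = np.zeros(data_dim)
sigma = 0.1                                   # fixed observation noise (assumption)

def f_theta(z):
    """Small MLP mapping latent codes z to the mean mu of p(x|z)."""
    h = np.tanh(z @ W1 + b1)
    return h @ W2 + b2

def sample_x(n):
    """Ancestral sampling: z ~ N(0, I), then x ~ N(f_theta(z), sigma^2 I)."""
    z = rng.standard_normal((n, latent_dim))
    mu = f_theta(z)
    x = mu + sigma * rng.standard_normal(mu.shape)
    return z, x

z, x = sample_x(1000)
print(z.shape, x.shape)
```

Sampling is cheap in this direction. Going the other way, from x back to z, requires the posterior pϴ(z|x), which has no closed form once fϴ is a nonlinear network.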

Key Reparameterization Trick

• Θ*, ɸ* = argmax L(Θ, ɸ; x)

Issue: how to take derivatives w.r.t. the parameters (e.g. to backpropagate) when sampling z ~ qɸ(z|x) is a stochastic process
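A small numerical sketch of how the reparameterization trick addresses this issue: writing z = µ + σ·ϵ with ϵ ~ N(0, 1) moves the randomness into ϵ, so z becomes a differentiable function of µ and σ. The toy objective f(z) = z² and the parameter values below are assumptions picked because the exact gradients are known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 1.5, 0.8          # hypothetical variational parameters
n_samples = 200_000

# Reparameterise: z = mu + sigma * eps with eps ~ N(0, 1), so the noise
# no longer depends on (mu, sigma) and z is differentiable in both.
eps = rng.standard_normal(n_samples)
z = mu + sigma * eps

# Toy objective f(z) = z^2, so df/dz = 2z. Chain rule through z:
#   dz/dmu = 1,  dz/dsigma = eps
grad_mu = np.mean(2 * z * 1.0)
grad_sigma = np.mean(2 * z * eps)

# Closed form for comparison: E[z^2] = mu^2 + sigma^2,
# so the exact gradients are 2*mu and 2*sigma.
print(grad_mu, 2 * mu)
print(grad_sigma, 2 * sigma)
```

The Monte Carlo averages approach the closed-form gradients as the number of samples grows. In a VAE the same chain rule is applied automatically by backpropagation through the encoder outputs.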





SGVB estimator




AEVB





AEVB


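Stochastic Encoder qɸ: X → (µ, σ)

Sampled latent: z = µ + σ * ϵ, with ϵ ~ N(0, I)

Stochastic Decoder pϴ: z → f(z)

Reconstruction loss (pixel differences): || x – f(z) ||²

Regularizer: (½) Σ [exp(σ(x)) + µ(x)² – 1 – σ(x)], where σ(x) in this term plays the role of the log-variance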
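The two loss terms can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's implementation: the function name vae_loss, the linear stand-in decoder and the array shapes are hypothetical, and the squared error is used as the reconstruction term exactly as written on the slide (it matches a Gaussian negative log-likelihood only up to scale and an additive constant).

```python
import numpy as np

def vae_loss(x, mu, logvar, decode, rng):
    """Single-sample estimate of the two loss terms for one batch.

    x:      (batch, data_dim) observations
    mu:     (batch, latent_dim) encoder means
    logvar: (batch, latent_dim) encoder log-variances
    decode: callable f(z) returning an array shaped like x
    """
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps             # z = mu + sigma * eps
    recon = np.sum((x - decode(z)) ** 2, axis=1)    # ||x - f(z)||^2
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian q
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)
    return recon, kl

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))                 # hypothetical linear decoder weights
decode = lambda z: z @ W
x = rng.normal(size=(3, 4))
mu = np.zeros((3, 2))
logvar = np.zeros((3, 2))
recon, kl = vae_loss(x, mu, logvar, decode, rng)
print(recon, kl)
```

With mu = 0 and logvar = 0 the encoder distribution equals the prior, so the regularizer vanishes; any other setting makes it strictly positive.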

VAE


The latent variable space q(z|x) of VAE trained on MNIST

VAE as generative model


Decoder trained on MNIST

Demo


VAE interactive demo

https://www.siarez.com/projects/variational-autoencoder

Summary

• VAE is rooted in Bayesian inference: it models p(x) and optimizes a lower bound on log p(x)

• VAE is a scalable approach to generative modeling with continuous latent variables

• Simple and fast

• Potential applications:

• Deep generative models of images, videos, audio

• Broader application of SGVB estimator

Discussion: Deep Generative Models

• State of the art deep generative models

image source: https://openai.com/blog/generative-models/#contributions

Discussion: Deep Generative Models

• Three main approaches:

• VAE

• GAN

• PixelRNN

Discussion: VAE vs. GAN

VAE:

• Find q(z|x), map x onto z

• Interpretable p(x)

• Produces blurry images

GAN:

• No explicit p(x)

• Unstable training dynamics / difficult to optimize

• Difficult to find q(z|x)

• Produces sharp images

Image Credit: Autoencoding beyond pixels using a learned similarity metric
