
Page 1:

CS 480/680 Lecture 23: July 24, 2019

Normalizing Flows

CS480/680, Guest Lecture, Priyank Jaini, University of Waterloo, Spring 2019

[GBC] Sec. 20.10.7
Complementary Reading:

• Sum-of-Squares Polynomial Flows, ICML 2019
• Tutorial on Normalizing Flows, Eric Jang

Page 2:

density estimation

data = {x1, x2, …, xn}

estimate q(x)

importance sampling

many applications…

Jaini, et al., Online Bayesian Transfer Learning for Sequential Data Modeling, ICLR 2017

image & audio synthesis

van den Oord, et al., Pixel Recurrent Neural Networks, ICML 2016
van den Oord, et al., WaveNet: A Generative Model for Raw Audio, SSW 2016
Kingma, et al., Glow: Generative Flow with Invertible 1x1 Convolutions, NeurIPS 2018

bayesian inference

El Moselhy, T. and Marzouk, Y., Bayesian Inference with Optimal Maps, JCP 2012

Page 3:

conditional generative models

[images omitted; courtesy: Wimbledon, 9GAG]

Page 4:

GANs and VAEs

[diagrams omitted: GAN: z ∼ p(z) → generator → x ∼ qsynth, with a discriminator labeling samples Real (1) / Fake (0) against x ∼ qreal; VAE: x → Encoder → z ∼ p(z) → Decoder → x]

implicit representation of density functions

Page 5:

agenda

explicit representation of density functions

find deterministic maps T from the source distribution p(z) to the target distribution q(x)

Page 6:

conservation of probability mass

[figure omitted: a density p(z) over z and the transformed density q(x) over x; the probability mass p(z)dz in a sliver dz equals the mass q(x)dx in the corresponding sliver dx]

p(z) dz = q(x) dx   ⟹   q(x) = p(z) |dz/dx|

Eric Jang, Tutorial on Normalizing Flows

[figure omitted: p(z) uniform on [0, 1] with height 1 maps to q(x) uniform on [1, 4] with height 1/3]

x := T(z) = 3z + 1

Transforming a uniform random variable into another uniform random variable
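A quick numerical check of this picture (a minimal sketch using numpy; the sample count and histogram grid are arbitrary choices): pushing Uniform(0, 1) samples through T(z) = 3z + 1 lands them on [1, 4] with density 1/3, exactly p(z) · |dz/dx|.

```python
import numpy as np

rng = np.random.default_rng(0)

# Source: z ~ Uniform(0, 1), so p(z) = 1 on [0, 1].
z = rng.uniform(0.0, 1.0, size=100_000)

# Deterministic map from the slide: x := T(z) = 3z + 1.
x = 3.0 * z + 1.0

# Conservation of mass: q(x) = p(z) * |dz/dx| = 1 * (1/3) on [1, 4].
hist, edges = np.histogram(x, bins=30, range=(1.0, 4.0), density=True)
print(x.min(), x.max())   # ~1.0 and ~4.0
print(hist.mean())        # ~0.333, matching q(x) = 1/3
```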

Page 7:

change of variables

[figure omitted: the same picture as before, densities p(z) and q(x) with matching slivers dz and dx]

p(z) dz = q(x) dx   ⟹   q(x) = p(z) |dz/dx|

univariate: T : 𝖹 → 𝖷
  𝖹 : source random variable with density p(z)
  𝖷 : target random variable with density q(x)

  q(x) = p(z) |∂T(z)/∂z|⁻¹

multivariate: T : ℝd → ℝd

  q(x) = p(z) |det(∇zT(z))|⁻¹

Eric Jang, Tutorial on Normalizing Flows
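The multivariate formula is easy to sanity-check on an affine map, where ∇zT is constant. A small sketch (assuming numpy and scipy; the matrix A and shift b are arbitrary illustrative choices): for T(z) = Az + b with a standard normal source, q(x) = p(z) · |det A|⁻¹ must agree with the density of 𝒩(b, AAᵀ).

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 2

# Source density p(z): standard normal on R^d.
A = np.array([[2.0, 0.0], [1.0, 0.5]])   # invertible map T(z) = A z + b
b = np.array([1.0, -1.0])

z = rng.standard_normal(d)
x = A @ z + b

# Change of variables: q(x) = p(z) * |det(grad_z T)|^-1, and grad_z T = A here.
p_z = multivariate_normal(mean=np.zeros(d)).pdf(z)
q_x = p_z / abs(np.linalg.det(A))

# Cross-check: x is exactly N(b, A A^T), so its density must agree.
q_x_direct = multivariate_normal(mean=b, cov=A @ A.T).pdf(x)
print(q_x, q_x_direct)   # the two numbers match
```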

Page 8:

recipe for learning

Given: dataset 𝒟 := {x1, x2, x3, …, xn} ∼ q(x)

learn the density q(x)

choose a simple source density p(z)

use maximum likelihood

∏i=1..n q(xi) = ∏i=1..n p(zi) |det(∇zT(zi))|⁻¹

T := arg maxT ∏i=1..n p(zi) |det(∇zT(zi))|⁻¹

T := arg maxT ∑i=1..n [ log p(zi) − log |det(∇zT(zi))| ]

But… what are the zi?  zi = T⁻¹(xi), so we need the inverse T⁻¹

and computing the determinant is computationally expensive

⟹ triangular maps: computation of inverse and Jacobian must be cheap
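A one-dimensional sketch of this recipe (numpy only; the affine family T(z) = a·z and the grid search are illustrative stand-ins for a real parameterization and optimizer, not the lecture's method): every likelihood evaluation needs zi = T⁻¹(xi) and the log-Jacobian term, and in d dimensions that determinant costs O(d³), which is the bottleneck the next slides remove.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=5_000)     # data from q(x), true scale 2

def log_p(z):                            # simple source density: N(0, 1)
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_likelihood(a):
    # z_i = T^-1(x_i) for T(z) = a z; the log-Jacobian is log|dT/dz| = log a.
    z = x / a
    return np.sum(log_p(z) - np.log(a))

# Maximum likelihood over a 1-parameter family of maps, by grid search.
grid = np.linspace(0.5, 4.0, 200)
a_hat = grid[np.argmax([log_likelihood(a) for a in grid])]
print(a_hat)   # ~2.0: the learned map recovers the data scale
```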

Page 9:

increasing triangular maps

T : ℝd → ℝd

x1 = T1(z1)
x2 = T2(z1, z2)
x3 = T3(z1, z2, z3)
⋮
xd = Td(z1, z2, z3, …, zd)

∇zT = [ ∂T1/∂z1      0         …       0
        ∂T2/∂z1   ∂T2/∂z2      …       0
           ⋮          ⋮         ⋱       ⋮
        ∂Td/∂z1   ∂Td/∂z2      …    ∂Td/∂zd ]

triangular : Tj is a function of z1, z2, …, zj only
increasing : Tj is increasing w.r.t. zj, i.e. ∂Tj/∂zj > 0

triangular maps: the inverse and the Jacobian are easy to compute

Bogachev, V., et al., Triangular Transformations of Measures, Sbornik: Mathematics, 2005

Theorem (paraphrase) : there always exists a unique* increasing triangular map that transforms a source density to a target density

* for a fixed ordering
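A minimal sketch of why triangular maps are cheap (numpy; the matrix L is an arbitrary increasing triangular linear map chosen for illustration): the Jacobian determinant reduces to the product of the diagonal, and the inverse is a single forward-substitution pass.

```python
import numpy as np

# Increasing triangular map on R^3: x_j = T_j(z_1, ..., z_j),
# here a lower-triangular linear map with positive diagonal.
L = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.5, 0.0],
              [0.3, 0.2, 1.0]])

z = np.array([0.1, -0.4, 0.7])
x = L @ z

# Jacobian determinant of a triangular map = product of its diagonal,
# so log|det| is an O(d) sum instead of an O(d^3) determinant.
log_det = np.sum(np.log(np.diag(L)))

# Inverse by forward substitution: solve for z_1, then z_2, ... in order.
z_rec = np.empty_like(x)
for j in range(len(x)):
    z_rec[j] = (x[j] - L[j, :j] @ z_rec[:j]) / L[j, j]

print(np.allclose(z_rec, z), log_det)   # True, log(2 * 1.5 * 1)
```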

Page 10:

framework

[diagram omitted: z ∼ p(z) enters a bank of maps T1, T2, …, Td, where each Tj consumes zj (and z<j) and emits xj, producing x ∼ q(x)]

learn T by maximizing the likelihood:

minT ∑i=1..n [ −log p(T⁻¹(xi)) + ∑j log ∂jTj(T⁻¹(xi)) ]

Marzouk, Y., et al., Sampling via Measure Transport: An Introduction, Springer, 2016
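The objective above can be written down directly for a toy triangular family. A hedged sketch (numpy; the 2-D Gaussian data, the lower-triangular linear parameterization, and the crude random search are all illustrative choices, not the lecture's setup): note how the log-det term is just the sum of log-diagonals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data with correlated coordinates.
X = rng.multivariate_normal([0, 0], [[4.0, 1.0], [1.0, 1.0]], size=2_000)

def neg_log_lik(params):
    # Triangular map T(z) = L z with L = [[a, 0], [b, c]], a, c > 0.
    a, b, c = params
    L = np.array([[a, 0.0], [b, c]])
    Z = np.linalg.solve(L, X.T).T                 # z_i = T^-1(x_i)
    log_p = -0.5 * np.sum(Z**2, axis=1) - np.log(2 * np.pi)
    log_diag = np.log(a) + np.log(c)              # sum_j log d_j T_j
    return np.sum(-log_p + log_diag)

# Crude random search stands in for a gradient-based optimizer.
best = min((neg_log_lik(p), tuple(p))
           for p in rng.uniform(0.1, 3.0, size=(4_000, 3)))
print(best)   # a ~ 2, b ~ 0.5, c ~ 0.87: the Cholesky factor of the covariance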

Page 11:

flow models as triangular maps

study the commonalities & differences of flow-based methods

[diagrams omitted: previews of the architectures that follow: a conditioner C producing weights w1 for the map z1 → x1, and a coupling split of z into {z1, …, zl−1} and {zl, …, zd} mapped by T to {x1, …, xl−1} and {xl, …, xd}]

Page 12:

autoregressive models

q(x) = q1(x1) ⋅ q2(x2 | x1) ⋅ … ⋅ qd(xd | x<d)

[diagram omitted: each zj is mapped through Tj to xj, with xj distributed as the conditional qj(xj | x<j)]

xj = Tj(zj ; θj(z<j))

choosing a conditional implicitly fixes a family of triangular maps

Larochelle, H. and Murray, I., The Neural Autoregressive Distribution Estimator, AISTATS, 2011
Uria, et al., Neural Autoregressive Distribution Estimation, JMLR, 2016

Page 13:

AR with Gaussian conditionals

[diagram omitted: zj → Tj → xj for each coordinate j]

x1 = σ1 ⋅ z1 + μ1 =: T1(z1)
x2 = σ2(z1) ⋅ z2 + μ2(z1) =: T2(z2 ; z1)
⋮
xd = σd(z<d) ⋅ zd + μd(z<d) =: Td(zd ; z<d)

q(x) = q1(x1) ⋅ q2(x2 | x1) ⋅ … ⋅ qd(xd | x<d)
with q1 = 𝒩(μ1, σ1²), q2 = 𝒩(μ2, σ2²), …, qd = 𝒩(μd, σd²)

Kingma, et al., Improved Variational Inference with Inverse Autoregressive Flow, NeurIPS, 2016
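A small sketch of sampling and inverting such a flow (numpy; the conditioners μ and σ below are hand-picked toy functions, not learned networks): both directions are sequential in j, and the inverse reuses the already-recovered z<j.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditioners: mu_j and sigma_j depend only on z_<j.
def mu(z_prev):    return 0.5 * np.sum(z_prev)
def sigma(z_prev): return np.exp(0.1 * np.sum(z_prev))   # kept positive

d = 4
z = rng.standard_normal(d)                 # z ~ p(z) = N(0, I)

# Forward pass: x_j = sigma_j(z_<j) * z_j + mu_j(z_<j).
x = np.empty(d)
for j in range(d):
    x[j] = sigma(z[:j]) * z[j] + mu(z[:j])

# Inverse is just as sequential: z_j = (x_j - mu_j) / sigma_j,
# using the z_<j already recovered.
z_rec = np.empty(d)
for j in range(d):
    z_rec[j] = (x[j] - mu(z_rec[:j])) / sigma(z_rec[:j])

print(np.allclose(z, z_rec))   # True: the AR map is invertible
```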

Page 14:

masked autoregressive flows (MAFs)

deep autoregressive flows with Gaussian conditionals*

Papamakarios, et al., Masked Autoregressive Flow for Density Estimation, NeurIPS, 2017
* can also use a mixture of Gaussians

[diagram omitted: a stack of autoregressive (AR) blocks with permutations (P) between them: z → T(1) → z1 → T(2) → z2 → T(3) → z3 → T(4) → x]

q(x) = p(z) ⋅ |det ∇T(1)|⁻¹ ⋅ |det ∇T(2)|⁻¹ ⋅ |det ∇T(3)|⁻¹ ⋅ |det ∇T(4)|⁻¹

xj = zj ⋅ exp(αj(z<j)) + μj(z<j) =: Tj(zj ; z<j)

triangular maps are fundamental blocks for complex models
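A sketch of one MAF-style layer plus stacking (numpy; α and μ are toy stand-ins for the masked conditioner network, and the reversal permutation mimics the P blocks): the per-layer log-determinant is just the sum of the αj.

```python
import numpy as np

# One MAF-style layer: x_j = z_j * exp(alpha_j(z_<j)) + mu_j(z_<j).
# alpha and mu below are hypothetical stand-ins for the masked network.
def alpha(z_prev): return 0.2 * np.sum(z_prev)
def mu(z_prev):    return 0.5 * np.sum(z_prev)

def maf_forward(z):
    x = np.empty_like(z)
    for j in range(len(z)):
        x[j] = z[j] * np.exp(alpha(z[:j])) + mu(z[:j])
    # log|det grad T| of this triangular map is just sum_j alpha_j.
    log_det = sum(alpha(z[:j]) for j in range(len(z)))
    return x, log_det

rng = np.random.default_rng(0)
z = rng.standard_normal(5)
x, log_det = maf_forward(z)

# Stacking: permute (here, reverse) between layers so every coordinate
# eventually conditions on every other one.
x2, log_det2 = maf_forward(x[::-1])
print(x2, log_det + log_det2)   # total log-det adds across layers
```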

Page 15:

real-NVP

Tj(zj ; z<l) = exp(αj(z<l) ⋅ 1{j ∉ [l−1]}) ⋅ zj + μj(z<l) ⋅ 1{j ∉ [l−1]}

[diagram omitted: z is split into {z1, …, zl−1} and {zl, …, zd}; T passes the first block through unchanged and transforms the second conditioned on the first, giving {x1, …, xl−1} and {xl, …, xd}]

Dinh, et al., Density Estimation Using Real NVP, ICLR, 2017
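A sketch of one coupling layer (numpy; α and μ are simplified scalar conditioners rather than networks): unlike the autoregressive layers above, the inverse needs no sequential loop, because the conditioning block passes through unchanged.

```python
import numpy as np

# One Real-NVP-style coupling layer on R^d, split at index l:
# the first block passes through; the second is scaled and shifted
# by functions of the first block only. alpha/mu are toy stand-ins.
def alpha(z_a): return 0.3 * z_a.sum()
def mu(z_a):    return 0.1 * z_a.sum()

def coupling_forward(z, l):
    z_a, z_b = z[:l], z[l:]
    x_b = np.exp(alpha(z_a)) * z_b + mu(z_a)
    log_det = alpha(z_a) * len(z_b)      # Jacobian diagonal: exp(alpha) or 1
    return np.concatenate([z_a, x_b]), log_det

def coupling_inverse(x, l):
    x_a, x_b = x[:l], x[l:]              # x_a == z_a, so alpha is known
    z_b = (x_b - mu(x_a)) * np.exp(-alpha(x_a))
    return np.concatenate([x_a, z_b])

rng = np.random.default_rng(0)
z = rng.standard_normal(6)
x, log_det = coupling_forward(z, l=3)
print(np.allclose(coupling_inverse(x, l=3), z))   # True, with no loop
```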

Page 16:

neural autoregressive flows (NAFs)

[diagram omitted: for each coordinate j, a conditioner C reads z1, …, zj−1 and outputs the weights wj of a small network that maps zj to xj]

xj = DNN(zj ; wj(z<j)) =: Tj(zj ; z<j)

strictly positive weights & a strictly monotonic activation function ensure that the map is increasing

universal: can approximate any density

Huang, et al., Neural Autoregressive Flows, ICML, 2018
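A toy illustration of that claim (numpy; the tiny one-input network and its random weights are arbitrary): exponentiating free parameters keeps the weights strictly positive, and composing them with sigmoids yields a strictly increasing scalar map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy monotone network for one coordinate: strictly positive weights
# (via exp of free parameters) + strictly increasing activations (sigmoid)
# compose to a strictly increasing scalar map, as the slide states.
W1 = np.exp(rng.standard_normal((8, 1)))    # positive weights
b1 = rng.standard_normal(8)
W2 = np.exp(rng.standard_normal((1, 8)))
b2 = rng.standard_normal(1)

def sigmoid(t): return 1.0 / (1.0 + np.exp(-t))

def T(z):
    h = sigmoid(W1 @ np.array([z]) + b1)    # each unit increasing in z
    return (W2 @ h + b2).item()             # positive combo stays increasing

zs = np.linspace(-3, 3, 100)
vals = np.array([T(z) for z in zs])
print(np.all(np.diff(vals) > 0))            # True: the map is increasing
```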

Page 17:

sum-of-squares polynomial flows

goal : learn a universal increasing triangular function

[diagram omitted: for each coordinate, a conditioner C outputs polynomial coefficients; the transform integrates a sum of squared polynomials and adds a constant]

x1 = T1(z1) = a0 + ∫0^z1 [ 𝔓r(u; a1)² + 𝔓r(u; a2)² ] du,  so  ∂T1/∂z1 = 𝔓r(z1; a1)² + 𝔓r(z1; a2)² ≥ 0

where 𝔓r(z; ak) = ∑l=0..r al,k z^l is a degree-r polynomial

for later coordinates, the coefficients {a′0, a′1, a′2} come from a conditioner C on z1, …, zd−1:
xd = Td(zd ; z<d) = a′0 + ∫0^zd [ 𝔓r(u; a′1)² + 𝔓r(u; a′2)² ] du,  with ∂Td/∂zd ≥ 0

increasing functions ≈ primitives of non-negative polynomials = primitives of sums of squares of polynomials

Jaini, et al., Sum-of-Squares Polynomial Flow, ICML, 2019
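A numeric sketch of the construction (numpy; the coefficients a0, a1, a2 and the crude Riemann integration are illustrative, not the paper's implementation): the integrand is a sum of squares, so the transform is increasing for any unconstrained coefficients.

```python
import numpy as np

# SOS-flow transform for one coordinate (hypothetical coefficients):
# T(z) = a0 + integral_0^z [ P(u; a1)^2 + P(u; a2)^2 ] du,
# where P(.; a_k) is a degree-r polynomial. The integrand is a sum of
# squares, hence non-negative, so T is increasing by construction.
a1 = np.array([0.5, -1.0, 0.3])      # coefficients of P(.; a1), r = 2
a2 = np.array([1.0, 0.2, -0.4])
a0 = 0.1                             # constant of integration

def T(z, n=1000):
    u = np.linspace(0.0, z, n)
    integrand = np.polyval(a1[::-1], u)**2 + np.polyval(a2[::-1], u)**2
    du = u[1] - u[0]                 # signed step: handles z < 0 too
    return a0 + integrand.sum() * du # crude Riemann approximation

zs = np.linspace(-2, 2, 50)
vals = np.array([T(z) for z in zs])
print(np.all(np.diff(vals) > 0))     # True: increasing, no constraints needed
```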

Page 18:

SOS flows

Interpretable : coefficients control higher-order moments
Easier to train : no constraints on the coefficients
Universal : approximates any density given enough capacity
Application to stochastic simulation
Generalization of other flows (e.g. IAF, MAF, Real-NVP, etc.)

[diagram omitted: a conditioner C outputs coefficients {a1, a2}; x1 = c + ∫ [𝔓r(z1; a1)² + 𝔓r(z1; a2)²] dz1]

Jaini, et al., Sum-of-Squares Polynomial Flow, ICML, 2019

Page 19:

Glow : invertible 1x1 convolutions

[diagram omitted: z → x through a flow whose invertible 1x1 convolutions are data-dependent linear maps, initialized as random rotation matrices]

Kingma, D.P. and Dhariwal, P., Glow: Generative Flow with Invertible 1x1 Convolutions, NeurIPS, 2018
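A sketch of the idea (numpy; the channel count and feature-map size are arbitrary): a 1x1 convolution is a c×c matrix applied at every spatial position; a random rotation matrix makes it invertible with log|det| = 0 at initialization, and the inverse and log-det cost depend only on c, not on the image size.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 4                                   # number of channels

# A random rotation (orthogonal) matrix: det = +/-1, so log|det| starts at 0.
Q, _ = np.linalg.qr(rng.standard_normal((c, c)))

# A 1x1 convolution is just this c x c matrix applied at every pixel.
x = rng.standard_normal((8, 8, c))      # toy feature map, H x W x C
y = x @ Q.T                             # mixes channels at each position

# Inverse and log-det are cheap for a c x c matrix (c << H*W):
sign, log_det = np.linalg.slogdet(Q)
x_rec = y @ np.linalg.inv(Q).T
print(np.allclose(x_rec, x), log_det)   # True, ~0.0
```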