
Page 1:

Learning representations for efficient inference

Matt Johnson, David Duvenaud, Alex Wiltschko, Bob Datta, Ryan Adams

approximateinference.org/2016/schedule/Johnson2016.pdf

Pages 2–4:

Natural gradient SVI for nice PGMs

    q^*(x) \triangleq \operatorname*{argmax}_{q(x)} \mathcal{L}[\, q(\theta)\, q(x) \,]

Variational autoencoders and inference networks

    q^*(x) \triangleq \mathcal{N}(x \mid \mu(y; \phi),\, \Sigma(y; \phi))

Page 5:

Natural gradient SVI for nice PGMs [1,2]

    q^*(x) \triangleq \operatorname*{argmax}_{q(x)} \mathcal{L}[\, q(\theta)\, q(x) \,]

[1] Hoffman, Bach, Blei. Online learning for latent Dirichlet allocation. NIPS 2010.
[2] Hoffman, Blei, Wang, and Paisley. Stochastic variational inference. JMLR 2013.

Pages 6–11:

p(x | θ) is a linear dynamical system
p(y | x, θ) is a linear-Gaussian observation
p(θ) is a conjugate prior

[diagrams: graphical model with global parameters θ, latent chain x_1 → x_2 → x_3 → x_4, observations y_1, ..., y_4, plate over N sequences; and the variational factorization over θ and x_1, ..., x_4]

    q(\theta)\, q(x) \approx p(\theta, x \mid y)

    \mathcal{L}(\eta_\theta, \eta_x) \triangleq \mathbb{E}_{q(\theta) q(x)}\!\left[ \log \frac{p(\theta, x, y)}{q(\theta)\, q(x)} \right]

    \eta_x^*(\eta_\theta) \triangleq \operatorname*{argmax}_{\eta_x} \mathcal{L}(\eta_\theta, \eta_x),
    \qquad
    \mathcal{L}_{\mathrm{SVI}}(\eta_\theta) \triangleq \mathcal{L}(\eta_\theta, \eta_x^*(\eta_\theta))

Proposition (natural gradient SVI of Hoffman et al. 2013):

    \widetilde{\nabla} \mathcal{L}_{\mathrm{SVI}}(\eta_\theta)
    = \eta_\theta^0 + \sum_{n=1}^{N} \mathbb{E}_{q^*(x_n)}\!\big[ (t_{xy}(x_n, y_n),\, 1) \big] - \eta_\theta
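
Since the natural gradient has this closed form, the global update is a few array operations. Here is a minimal sketch, assuming the rescaled expected statistics have already been computed and flattened into an array matching η_θ; the function and argument names are illustrative, not the authors' API:

```python
def natural_gradient_step(eta_theta, eta_prior, expected_stats, step_size):
    # Natural gradient from the proposition: prior + expected stats - current.
    nat_grad = eta_prior + expected_stats - eta_theta
    # Ascent step; step_size = 1 on the full dataset recovers the
    # coordinate-ascent VB update for the conjugate global factor.
    return eta_theta + step_size * nat_grad
```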

Pages 12–18:

Step 1: compute evidence potentials
Step 2: run fast message passing
Step 3: compute natural gradient

[1] Johnson and Willsky. Stochastic variational inference for Bayesian time series models. ICML 2014.
[2] Foti, Xu, Laird, and Fox. Stochastic variational inference for hidden Markov models. NIPS 2014.
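
A sketch of how the three steps compose into one SVI update for the linear-Gaussian case. `evidence_potentials`, `kalman_smoother`, and `expected_suff_stats` are hypothetical stand-ins for the standard LDS computations in [1,2], and `natural_gradient_step` is the sketch above:

```python
def svi_update(eta_theta, eta_prior, minibatch, scale, step_size):
    stats = 0.0
    for y in minibatch:
        # Step 1: evidence potentials on each x_t under the expected
        # (linear-Gaussian) observation parameters
        potentials = evidence_potentials(eta_theta, y)
        # Step 2: fast message passing (Kalman smoothing) gives q*(x)
        posterior = kalman_smoother(eta_theta, potentials)
        stats = stats + expected_suff_stats(posterior, y)
    # Step 3: rescale to the full dataset and take a natural gradient step
    return natural_gradient_step(eta_theta, eta_prior, scale * stats, step_size)
```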

Page 19:

arbitrary inference queries

Page 20:

What about more general observation models?

Pages 21–24:

p(x | θ) is a linear dynamical system
p(y | x, γ) is a neural network decoder
p(θ) is a conjugate prior, p(γ) is generic

[diagrams: graphical model with θ, latent chain x_1 → x_2 → x_3 → x_4, neural-network observations y_1, ..., y_4; and the variational factorization over x_1, ..., x_4]

    q(\theta)\, q(\gamma)\, q(x) \approx p(\theta, \gamma, x \mid y)

    \mathcal{L}(\eta_\theta, \eta_\gamma, \eta_x) \triangleq \mathbb{E}_{q(\theta) q(\gamma) q(x)}\!\left[ \log \frac{p(\theta, \gamma, x)\, p(y \mid x, \gamma)}{q(\theta)\, q(\gamma)\, q(x)} \right]

    \eta_x^*(\eta_\theta, \eta_\gamma) \triangleq \operatorname*{argmax}_{\eta_x} \mathcal{L}(\eta_\theta, \eta_\gamma, \eta_x),
    \qquad
    \mathcal{L}_{\mathrm{SVI}}(\eta_\theta, \eta_\gamma) \triangleq \mathcal{L}(\eta_\theta, \eta_\gamma, \eta_x^*(\eta_\theta, \eta_\gamma))

Page 25:

Variational autoencoders and inference networks [1,2]

    q^*(x) \triangleq \mathcal{N}(x \mid \mu(y; \phi),\, \Sigma(y; \phi))

[1] Kingma and Welling. Auto-encoding variational Bayes. ICLR 2014.
[2] Rezende, Mohamed, and Wierstra. Stochastic backpropagation and approximate inference in deep generative models. ICML 2014.

Pages 26–29:

[diagram: recognition network with parameters φ maps each observation y_n to the local factor over x_n]

    q^*(x_n) \triangleq \mathcal{N}(x_n \mid \mu(y_n; \phi),\, \operatorname{diag} \Sigma(y_n; \phi))

    \mathcal{L}_{\mathrm{VAE}}(\eta_\gamma, \phi) \triangleq \mathcal{L}(\eta_\gamma, \eta_x^*(\phi))
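
A minimal sketch of such a recognition network, with a single tanh hidden layer chosen purely for illustration (not the authors' architecture): an MLP maps y_n to the mean and log-variance of q*(x_n), and the reparameterization trick gives differentiable samples:

```python
import numpy as np

def recognition_network(y, W1, b1, W2, b2):
    h = np.tanh(W1 @ y + b1)          # hidden features of y_n
    out = W2 @ h + b2                 # stacked [mu, log variance]
    mu, log_var = np.split(out, 2)
    return mu, log_var

def sample_q(mu, log_var, rng):
    # Reparameterization: x = mu + sigma * eps with eps ~ N(0, I),
    # so gradients flow to phi = (W1, b1, W2, b2) through mu and log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```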

Pages 30–32:

Per-timestep Gaussian potentials with block-tridiagonal structure [1,2]:

    \mu_t(y_t; \phi_\mu), \qquad J_{t,t}(y_t; \phi_D), \qquad J_{t,t+1}(y_t, y_{t+1}; \phi_B)

Recurrent recognition of the full sequence [3]:

    \mu_t(y_{1:T}, \hat{x}_{t-1}; \phi), \qquad \Sigma_t(y_{1:T}, \hat{x}_{t-1}; \phi)

[1] Archer, Park, Buesing, Cunningham, Paninski. Black box variational inference for state space models. ICLR 2016 Workshops.
[2] Gao*, Archer*, Paninski, Cunningham. Linear dynamical neural population models through nonlinear embeddings. NIPS 2016.
[3] Krishnan, Shalit, Sontag. Structured inference networks for nonlinear state space models. AISTATS 2017.
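
For the structured networks of [1,2], the output is a joint Gaussian over x_{1:T} with block-tridiagonal precision rather than a fully factorized one. A sketch, where `net_mu`, `net_diag`, and `net_offdiag` are hypothetical networks with parameters φ_µ, φ_D, φ_B:

```python
def chain_potentials(y_seq, net_mu, net_diag, net_offdiag):
    # Per-timestep means and diagonal precision blocks J_{t,t}
    mus = [net_mu(y_t) for y_t in y_seq]
    J_diag = [net_diag(y_t) for y_t in y_seq]
    # Off-diagonal blocks J_{t,t+1} couple neighboring timesteps, giving a
    # Gaussian q(x_{1:T}) with block-tridiagonal precision
    J_off = [net_offdiag(y_t, y_next) for y_t, y_next in zip(y_seq, y_seq[1:])]
    return mus, J_diag, J_off
```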

Pages 33–42:

Natural gradient SVI
– expensive for general obs.
+ optimal local factor
+ exploits conj. graph structure
+ arbitrary inference queries
+ natural gradients

    q^*(x) \triangleq \operatorname*{argmax}_{q(x)} \mathcal{L}[\, q(\theta)\, q(x) \,]

Variational autoencoders
+ fast for general obs.
– suboptimal local factor
– does all local inference
– limited inference queries
– no natural gradients

    q^*(x) \triangleq \mathcal{N}(x \mid \mu(y; \phi),\, \Sigma(y; \phi))

Structured VAEs [1]
+ fast for general obs.
± optimal given conj. evidence
+ exploits conj. graph structure
+ arbitrary inference queries
+ natural gradients on η_θ

    q^*(x) \triangleq\ ?

[1] Johnson, Duvenaud, Wiltschko, Datta, and Adams. Composing graphical models and neural networks. NIPS 2016.

Page 43:

SVAEs: recognition networks output conjugate potentials, then apply fast graphical model algorithms

Pages 44–48:

    \mathcal{L}(\eta_\theta, \eta_\gamma, \eta_x) \triangleq \mathbb{E}_{q(\theta) q(\gamma) q(x)}\!\left[ \log \frac{p(\theta, \gamma, x)\, p(y \mid x, \gamma)}{q(\theta)\, q(\gamma)\, q(x)} \right]

The non-conjugate likelihood terms \mathbb{E}_{q(\gamma)} \log p(y_t \mid x_t, \gamma) are swapped for recognition potentials \psi(x_t; y_t, \phi), where \psi(x; y, \phi) is a conjugate potential for p(x \mid \theta), giving the surrogate objective

    \widehat{\mathcal{L}}(\eta_\theta, \eta_x, \phi) \triangleq \mathbb{E}_{q(\theta) q(\gamma) q(x)}\!\left[ \log \frac{p(\theta, \gamma, x)\, \exp\{\psi(x; y, \phi)\}}{q(\theta)\, q(\gamma)\, q(x)} \right]

The local factor optimizes the surrogate, and the SVAE objective evaluates the true ELBO there:

    \eta_x^*(\eta_\theta, \phi) \triangleq \operatorname*{argmax}_{\eta_x} \widehat{\mathcal{L}}(\eta_\theta, \eta_x, \phi),
    \qquad
    \mathcal{L}_{\mathrm{SVAE}}(\eta_\theta, \eta_\gamma, \phi) \triangleq \mathcal{L}(\eta_\theta, \eta_\gamma, \eta_x^*(\eta_\theta, \phi))

[diagrams: the graphical model with neural-network likelihoods on y_1, ..., y_4, and the surrogate model in which each likelihood node is replaced by the conjugate potential ψ(x_t; y_t, φ)]
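
When p(x | θ) is a Gaussian LDS, a conjugate potential ψ(x_t; y_t, φ) is itself Gaussian in x_t, so the recognition network only has to emit Gaussian natural parameters. A sketch reusing the hypothetical `recognition_network` from earlier:

```python
import numpy as np

def conjugate_potential(y_t, phi):
    mu, log_var = recognition_network(y_t, *phi)
    precision = np.exp(-log_var)
    # Interpret the output as the Gaussian potential
    # psi(x) = h^T x - (1/2) x^T J x + const, i.e. natural parameters (h, J):
    h = precision * mu
    J = np.diag(precision)
    return h, J
```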

Pages 49–58:

Step 1: apply recognition network
Step 2: run fast PGM algorithms
Step 3: sample, compute flat grads
Step 4: compute natural gradient
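
Putting the four steps together, one SVAE update might look like the following sketch. Every helper here (`kalman_smoother`, `reparam_sample`, `elbo_flat_grads`, `expected_suff_stats`, `sgd_update`) is a hypothetical stand-in for message passing, reparameterized sampling, autodiff, and an optimizer; the reference implementation is at github.com/mattjj/svae:

```python
def svae_update(eta_theta, eta_prior, phi, gamma, y_seq, step_sizes):
    # Step 1: recognition network emits conjugate evidence potentials
    potentials = [conjugate_potential(y_t, phi) for y_t in y_seq]
    # Step 2: fast PGM algorithms give the surrogate-optimal local factor q*(x)
    posterior = kalman_smoother(eta_theta, potentials)
    # Step 3: reparameterized sample; flat (autodiff) gradients of the ELBO
    # estimate with respect to the recognition and decoder parameters
    x = reparam_sample(posterior)
    g_phi, g_gamma = elbo_flat_grads(phi, gamma, x, y_seq, posterior)
    # Step 4: natural gradient on the conjugate global parameters
    nat_grad = eta_prior + expected_suff_stats(posterior) - eta_theta
    eta_theta = eta_theta + step_sizes["theta"] * nat_grad
    phi = sgd_update(phi, g_phi, step_sizes["phi"])
    gamma = sgd_update(gamma, g_gamma, step_sizes["gamma"])
    return eta_theta, phi, gamma
```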

Pages 59–60:

[figure: data space and latent space]

Pages 61–62:

[figure: data, predictions, and latent states, plotted against frame index]

Pages 63–66:

Gaussian mixture model [1] · Linear dynamical system [2] · Hidden Markov model [3] · Switching LDS [4]

Mixture of Experts [5] · Driven LDS [2] · IO-HMM [6] · Factorial HMM [7]

Canonical correlations analysis [8,9] · admixture / LDA / NMF [10]

[1] Corduneanu and Bishop. Variational Bayesian model selection for mixture distributions. AISTATS 2001.
[2] Ghahramani and Beal. Propagation algorithms for variational Bayesian learning. NIPS 2001.
[3] Beal. Variational algorithms for approximate Bayesian inference, Ch. 3. U. of London Ph.D. thesis 2003.
[4] Ghahramani and Hinton. Variational learning for switching state-space models. Neural Computation 2000.
[5] Jordan and Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation 1994.
[6] Bengio and Frasconi. An input output HMM architecture. NIPS 1995.
[7] Ghahramani and Jordan. Factorial hidden Markov models. Machine Learning 1997.
[8] Bach and Jordan. A probabilistic interpretation of canonical correlation analysis. Tech. report 2005.
[9] Archambeau and Bach. Sparse probabilistic projections. NIPS 2008.
[10] Hoffman, Bach, Blei. Online learning for latent Dirichlet allocation. NIPS 2010.

Page 67:

arbitrary inference queries*

*see next slide

Page 68:

SVAEs can use any inference network architecture

[1] Archer, Park, Buesing, Cunningham, Paninski. Black box variational inference for state space models. ICLR 2016 Workshops.
[2] Gao*, Archer*, Paninski, Cunningham. Linear dynamical neural population models through nonlinear embeddings. NIPS 2016.

Pages 69–70:

SVAEs
Pages 71–72:

supervised learning

unsupervised learning

Page 73:

github.com/mattjj/svae