
Scalable Training of Inference Networks for Gaussian-Process Models

Jiaxin Shi (Tsinghua University)

Joint work with Jun Zhu and Mohammad Emtiyaz Khan

Gaussian Process

● mean function
● covariance function / kernel

Sparse variational GP [Titsias, 09; Hensman et al., 13]

● Gaussian field, inducing points
● Posterior inference: exact inference has O(N³) complexity and requires conjugate likelihoods
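For concreteness, a minimal numpy sketch of exact GP posterior inference under a Gaussian likelihood (the RBF kernel and all hyperparameter values here are illustrative assumptions, not from the talk); the Cholesky factorization of the N×N Gram matrix is the cubic-cost step that sparse and network-based approximations avoid:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance k(x, x') = v * exp(-||x - x'||^2 / (2 l^2)).
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xstar, noise_var=0.1):
    # Exact GP regression posterior mean and covariance at test points Xstar.
    # The Cholesky of the N x N matrix K is the O(N^3) bottleneck.
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(X, Xstar)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = rbf_kernel(Xstar, Xstar) - v.T @ v
    return mean, cov

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
mean, cov = gp_posterior(X, y, np.array([[0.0]]))
```

Conjugacy is what makes the posterior available in closed form here; for non-Gaussian likelihoods even this cubic-cost route is unavailable.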

Inference Networks for GP Models: remove the sparse assumption

[Diagram: inputs → Gaussian field → observations; an inference network maps data to predictions]

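As a toy illustration of what "inference network" means here (this architecture is made up for the sketch, not the paper's): any parametric function mapping an input location x to the mean and variance of a Gaussian marginal over the function value f(x) can play the role:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inference network: a one-hidden-layer MLP mapping each input x to the
# mean and log-variance of a Gaussian marginal over the function value f(x).
W1, b1 = rng.normal(size=(1, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 2)) * 0.1, np.zeros(2)

def inference_net(X):
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    mean, log_var = out[:, 0], out[:, 1]
    return mean, np.exp(log_var)  # exponentiate so the variance is positive

X = np.linspace(-3, 3, 5)[:, None]
mean, var = inference_net(X)
```

Unlike inducing-point methods, the approximation quality is limited only by the network's capacity, not by a fixed set of M pseudo-inputs.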

Examples of Inference Networks

weight space vs. function space

● Bayesian neural networks:
○ intractable output density

[Figure: random-feature network with sin/cos units, frequencies (s) and weights (w)]

[Sun et al., 19]

● Inference network architecture can be derived from the weight-space posterior
○ Random feature expansions
○ Deep neural nets

[Cutajar et al., 18]
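A minimal sketch of the random-feature-expansion idea (feature count and lengthscale are arbitrary choices for the demo): frequencies drawn from the RBF kernel's spectral density give cos/sin features whose inner products approximate the kernel, so a linear-in-features network inherits GP-like structure:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, num_features=1000, lengthscale=1.0):
    # Frequencies sampled from the spectral density of the RBF kernel;
    # the cos/sin feature map satisfies phi(x) @ phi(x') ~= k(x, x').
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, num_features))
    proj = X @ W
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(num_features)

X = rng.normal(size=(5, 2))
Phi = random_fourier_features(X)
K_approx = Phi @ Phi.T                                 # approximate Gram matrix
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, -1)
K_exact = np.exp(-0.5 * d2)                            # exact RBF Gram matrix
```

Stacking such layers (or replacing them with generic deep nets) yields the two architecture families named on the slide.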

Minibatch Training is Difficult: Functional Variational Bayesian Neural Networks (Sun et al., 19)

● Consider matching the variational and true posterior processes at arbitrary measurement points
● This objective performs an improper minibatch approximation of the KL divergence term
● Full-batch fELBO
● Practical fELBO (evaluated at sampled measurement points)
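For reference, a reconstruction of the two objectives in assumed notation (following Sun et al., 19): the functional ELBO replaces the weight-space KL with a KL between processes, defined as a supremum over finite measurement sets, while the practical objective evaluates it only at a sampled set $\mathbf{X}^M$:

```latex
\mathcal{L}_{\mathrm{fELBO}}(q)
  = \mathbb{E}_{q(f)}\!\left[\log p(\mathcal{D}\mid f)\right]
  - \sup_{\mathbf{X}} \mathrm{KL}\!\left[\,q(f^{\mathbf{X}})\,\big\|\,p(f^{\mathbf{X}})\,\right],
\qquad
\hat{\mathcal{L}}(q)
  = \mathbb{E}_{q(f)}\!\left[\log p(\mathcal{D}\mid f)\right]
  - \mathrm{KL}\!\left[\,q(f^{\mathbf{X}^M})\,\big\|\,p(f^{\mathbf{X}^M})\,\right]
```

Subsampling the KL term this way is the "improper minibatch" issue: the sampled KL is not an unbiased estimate of the supremum.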

Scalable Training of Inference Networks for GP Models: Stochastic, functional mirror descent

● work with the functional density directly

○ natural gradient in the density space

○ minibatch approximation with stochastic functional gradient

● closed-form solution as an adaptive Bayesian filter

[Dai et al., 16; Cheng & Boots, 16]

○ after seeing the next data point, the current posterior becomes the adapted prior

● sequentially applying Bayes’ rule is the most natural gradient

○ in conjugate models: equivalent to natural gradient for exponential families

[Raskutti & Mukherjee, 13; Khan & Lin, 17]
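A reconstruction of the resulting update in assumed notation (step size $\beta_t$, minibatch $B_t$ of size $|B_t|$ from $N$ data points): one stochastic functional mirror descent step has a closed form that tempers the current iterate and applies a minibatch-rescaled Bayes update, which is the adaptive-Bayesian-filter view:

```latex
q_{t+1}(f) \;\propto\; q_t(f)^{\,1-\beta_t}
  \left[ p(f)\, \exp\!\Big( \tfrac{N}{|B_t|} \sum_{i \in B_t} \log p(y_i \mid f) \Big) \right]^{\beta_t}
```

With $\beta_t = 1$ this is exactly Bayes' rule with a rescaled likelihood, matching the bullet above: sequential Bayesian updating is the natural-gradient step in density space.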

Scalable Training of Inference Networks for GP Models: Minibatch training of inference networks

[Diagram: student network trained to match a teacher]

● an idea from filtering: bootstrap
○ similar idea: temporal difference (TD) learning with function approximation

Scalable Training of Inference Networks for GP Models: Minibatch training of inference networks

● (Gaussian likelihood case) closed-form marginals of the update at the measurement locations
○ equivalent to GP regression
● (Nonconjugate case) optimize an upper bound of the KL divergence
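A toy sketch of the bootstrap in the Gaussian-likelihood case (the random-feature student, step sizes, and data here are all illustrative, not the paper's setup): the teacher's marginals at random measurement points are a closed-form GP-regression update that uses the student's current prediction as the adapted prior mean, and the student is regressed onto those targets:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X1, X2, ls=1.0):
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-0.5 * d2 / ls**2)

# Student "network": linear in fixed random Fourier features,
# a stand-in for a real inference network.
F = rng.normal(size=(1, 100))
feat = lambda X: np.concatenate([np.cos(X @ F), np.sin(X @ F)], axis=1) / 10.0
theta = np.zeros(200)
student = lambda X: feat(X) @ theta

def teacher_mean(Z, Xb, yb, noise_var=0.1):
    # Gaussian-likelihood case: the target marginals at measurement points Z
    # are closed form -- GP regression on the minibatch, bootstrapped from the
    # student's current prediction as the (adapted) prior mean.
    K = rbf(Xb, Xb) + noise_var * np.eye(len(Xb))
    return student(Z) + rbf(Z, Xb) @ np.linalg.solve(K, yb - student(Xb))

X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
for _ in range(200):
    idx = rng.choice(200, size=20, replace=False)  # minibatch
    Z = rng.uniform(-3, 3, size=(10, 1))           # random measurement points
    target = teacher_mean(Z, X[idx], y[idx])       # teacher (no gradient through it)
    Phi = feat(Z)
    theta += 0.1 * Phi.T @ (target - Phi @ theta)  # regress student onto teacher
```

As in TD learning, the regression target is built from the student's own current prediction plus a data-driven correction, so no full pass over the dataset is ever needed.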

Scalable Training of Inference Networks for GP Models: Measurement points vs. inducing points

[Figure: GPNet vs. SVGP posterior fits with M = 2, 5, 20]

● inducing points: expressiveness of the variational approximation
● measurement points: variance of training

Scalable Training of Inference Networks for GP Models: Effect of proper minibatch training

[Figure: FBNN (M=20) vs. GPNet (M=20) on Airline Delay (700K)]

● Proper minibatch training fixes underfitting (N=100, batch size 20)
● Better performance with more measurement points

Scalable Training of Inference Networks for GP Models: Regression & Classification

Regression benchmarks

GP classification with a prior derived from infinite-width Bayesian ConvNets

Poster #227
Code: https://github.com/thjashin/gp-infer-net
