Variational Methods for Discrete-Data
Latent Gaussian Models
Mohammad Emtiyaz Khan
University of British Columbia
Vancouver, Canada
March 6, 2012
Variational Methods for Discrete-Data Latent Gaussian Models
The Big Picture
• Joint density models for data with mixed data types
• Bayesian models – principled and robust approach
• Algorithms that are not only accurate and fast, but also intuitive
and easy to tune and implement (speed-accuracy tradeoffs)
Sources of Discrete Data
• User rating data
• Survey/voting data and blogs for sentiment analysis
• Health data
• Tag correlation
• Consumer choice data
• Sports/game data
Motivation: Recommendation system
Movie rating dataset – Missing values – Different types of data
User1 User2 User3 User4 User5 User6 ….
Movie1 9 2 3 9 ….
Movie2 8 8 2 ….
Movie3 2 8 ….
Movie4 3 8 8 1 ….
Movie5 2 7 1 ….
Movie6 7 2 1 ….
Missing Ratings
From Wikipedia, on the Netflix Prize dataset:
“The training set is such that the average user rated over 200 movies,
and the average movie was rated by over 5000 users. But there is wide
variance in the data—some movies in the training set have as few as 3
ratings, while one user rated over 17,000 movies.”
MovieLens Dataset
What we need!
For these datasets, we need a method of analysis that
• Handles missing values efficiently
• Makes efficient use of the data by weighting “reliable”
data vectors more than the “unreliable” ones
• Makes efficient use of the data by “fusing” different
types of data (binary, ordinal, categorical, count, text)
Factor Model
[Figure: data matrix Y (D movies × N users) expressed through W (D movies × L factors) and Z (L factors × N users)]
Likelihoods shown: Gaussian and ExpFamily (equations not recovered)
Collins et al. 2002, Khan et al. 2010, Yu et al. 2009
Bayesian Learning
[Figure: graphical model with parameters µ, Σ and W, latent variable zn and observation yn, in a plate over n = 1:N]
This talk: Lower bound maximization
Variational Methods
• Design tractable bounds to reduce approximation error
• Efficient optimization since lower bounds are concave: good
convergence rates and easy convergence diagnostics
• Efficient expectation-maximization (EM) algorithms for
parameter learning
• Comparable performance to MCMC, but much faster
• Algorithms with a wide range of speed-accuracy trade-offs
Outline
• Latent Gaussian models
• Bounds for binary data
• Bounds for categorical data
• Results
• Future work and conclusions
Outline
• Latent Gaussian models
   • Definition and examples
   • Problem with parameter learning
• Bounds for binary data
• Bounds for categorical data
• Results
• Future work and conclusions
Latent Gaussian Model (LGM)
[Figure: a D × N matrix of observations y (some entries missing) paired with a D × N matrix of predictors η, built from the D × L matrix W and the L × N matrix of latent variables z. Graphical model: parameters µ, Σ and W, latent variable zn and observation yn, in a plate over n = 1:N]
Likelihood Examples
Data type      Distribution
Real           Gaussian
Count          Poisson
Binary         Bernoulli-Logit
Categorical    Multinomial-Logit
Ordinal        Proportional-odds
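A minimal sketch of how the likelihoods in the table can be paired with Gaussian latent factors. It assumes predictors of the form η = W z and standard link functions (log link for counts, logistic for binary, softmax for categorical). The sizes, the seed, the unit noise variance and the assignment of rows to data types are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, N = 6, 2, 5                     # illustrative sizes (assumed)

# Latent Gaussian part: z_n ~ N(mu, Sigma), predictors eta_n = W z_n
mu, Sigma = np.zeros(L), np.eye(L)
W = rng.normal(size=(D, L))
Z = rng.multivariate_normal(mu, Sigma, size=N).T   # L x N
Eta = W @ Z                                        # D x N

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One likelihood per data type (the row assignment is arbitrary here)
y_real = rng.normal(loc=Eta[0], scale=1.0)         # Gaussian
y_count = rng.poisson(np.exp(Eta[1]))              # Poisson with log link (assumed)
y_binary = rng.binomial(1, sigmoid(Eta[2]))        # Bernoulli-logit

# Multinomial-logit over K = 3 classes, using three predictor rows as logits
logits = Eta[3:6]                                  # K x N
probs = np.exp(logits - logits.max(axis=0))        # subtract max for stability
probs /= probs.sum(axis=0)
y_cat = np.array([rng.choice(3, p=probs[:, n]) for n in range(N)])
```

The ordinal (proportional-odds) case is left out of the sketch because it needs a set of ordered thresholds that the slide does not specify.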
Parameter Estimation
[Figure: graphical model with parameters µ, Σ and W, latent variable zn and Bernoulli-logit observation yn, in a plate over n = 1:N]
Jensen’s Lower Bound
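The equation on this slide did not survive extraction. For reference, the standard form of Jensen's lower bound for a latent Gaussian model is sketched below. The notation is an assumption chosen to match the rest of the talk: θ = {µ, Σ, W} are the parameters, and q(z_n) is a Gaussian with mean m_n and covariance V_n.

```latex
\log p(y_n \mid \theta)
  = \log \int q(z_n)\,
      \frac{p(y_n \mid z_n, W)\, \mathcal{N}(z_n \mid \mu, \Sigma)}{q(z_n)}\, dz_n
  \;\ge\; \mathbb{E}_{q(z_n)}\!\left[ \log p(y_n \mid z_n, W) \right]
  - \mathrm{KL}\!\left[\, q(z_n) \,\|\, \mathcal{N}(z_n \mid \mu, \Sigma) \,\right],
\qquad q(z_n) = \mathcal{N}(z_n \mid m_n, V_n)
```

The KL term is available in closed form for Gaussians. The expectation of the log-likelihood is the part that is intractable for discrete-data likelihoods.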
Variational Lower Bound
• Generalized EM algorithm
• E-step involves minimizing a convex function
• Early stopping in E-step
• (Almost) no tuning parameters
• Easy convergence diagnostics
Outline
• Latent Gaussian models
• Bounds for binary data
   • Bernoulli-logistic likelihood
   • The Bohning bound (Khan, Marlin, Bouchard, Murphy, NIPS 2010)
   • Piecewise bounds (Marlin, Khan, Murphy, ICML 2011)
• Bounds for categorical data
• Results
• Future work and conclusions
Bernoulli-Logit Likelihood
[Figure and equation not recovered; surviving annotation follows]
some other tractable terms in m and V + [term not recovered]
Local Variational Bounds
some other tractable terms in m and V + [term not recovered]
• Bohning’s bound (Khan, Marlin, Bouchard, Murphy 2010)
• Jaakkola’s bound (Jaakkola and Jordan 1996)
• Piecewise quadratic bounds (Marlin, Khan, Murphy 2011)
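A minimal sketch of the first two bounds in the list, written as quadratic upper bounds on the log-sum-exp term log(1 + exp(x)) of the Bernoulli-logit likelihood. The function names and the expansion points psi and xi are illustrative. The formulas are the standard textbook forms of the Bohning and Jaakkola-Jordan bounds, not transcribed from the slide.

```python
import numpy as np

def llp(x):
    """log(1 + exp(x)), computed stably."""
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bohning_bound(x, psi):
    """Quadratic upper bound with fixed curvature a = 1/4, tight at x = psi."""
    a = 0.25
    b = a * psi - sigmoid(psi)
    c = 0.5 * a * psi**2 - sigmoid(psi) * psi + llp(psi)
    return 0.5 * a * x**2 - b * x + c

def jaakkola_bound(x, xi):
    """Quadratic upper bound with curvature 2*lambda(xi), tight at x = +xi and x = -xi."""
    lam = np.tanh(xi / 2.0) / (4.0 * xi)   # requires xi != 0
    return lam * (x**2 - xi**2) + 0.5 * (x - xi) + llp(xi)

x = np.linspace(-10.0, 10.0, 2001)
gap_bohning = bohning_bound(x, psi=1.0) - llp(x)     # non-negative everywhere
gap_jaakkola = jaakkola_bound(x, xi=2.0) - llp(x)    # non-negative everywhere
```

Because the Bohning curvature does not depend on the expansion point, the resulting posterior covariance is the same for every data point, which is the source of the speed-up discussed next.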
Bohning Bound is Faster
With a data-dependent An:
For n = 1:N
    Vn = (W^T An W + I)^-1
    mn = …..
end
O(L^3 ND)

With the Bohning bound (fixed A):
V = (W^T A W + I)^-1
For n = 1:N
    mn = …..
end
O(L^2 ND)
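A minimal sketch of the covariance computations contrasted above. It assumes binary data, an identity prior covariance (the "+ I" term), A = I/4 for the Bohning bound, and An = diag(2 λ(ξn)) for a Jaakkola-style bound. The variational parameters xi are random placeholders, and the update for mn, which is elided on the slide, is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, N = 15, 3, 435                       # sizes borrowed from the voting example
W = rng.normal(size=(D, L))

# Bohning: the curvature is fixed at 1/4, so one covariance serves all n.
A = 0.25 * np.eye(D)
V_shared = np.linalg.inv(W.T @ A @ W + np.eye(L))        # a single L x L inversion

# Jaakkola-style: the curvature depends on per-datum variational parameters,
# so a separate L x L matrix must be formed and inverted for every n.
xi = rng.uniform(0.5, 3.0, size=(N, D))                  # placeholder values
lam = np.tanh(xi / 2.0) / (4.0 * xi)
V_per_n = np.stack([
    np.linalg.inv(W.T @ (2.0 * lam[n][:, None] * W) + np.eye(L))
    for n in range(N)
])
```

The loop performs N inversions where the shared-covariance version performs one, which is the difference the two complexity figures on the slide point at.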
Piecewise bounds are more accurate
[Figure: three panels comparing the Bohning, Jaakkola and Piecewise bounds; the piecewise panel shows the pieces Q1(x), Q2(x), Q3(x)]
Details of Piecewise bounds
• Find cut points and parameters of each piece by
minimizing the maximum error
• Linear pieces (Hsiung, Kim and Boyd, 2008)
• Quadratic pieces (Nelder-Mead method)
• Fixed piecewise bounds!
• Increase accuracy by increasing the number of pieces
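A minimal sketch of the last point: a piecewise-linear upper bound on log(1 + exp(x)) whose maximum error shrinks as pieces are added. This is not the minimax-optimal fit described above. As a simplifying assumption it uses evenly spaced cut points on [-6, 6], chords between them, a constant left tail and a unit-slope right tail. All names are illustrative.

```python
import numpy as np

def llp(x):
    """log(1 + exp(x)), computed stably."""
    return np.logaddexp(0.0, x)

def piecewise_linear_upper(x, cuts):
    """Upper bound on llp built from chords between sorted cut points."""
    x = np.asarray(x, dtype=float)
    cuts = np.asarray(cuts, dtype=float)
    vals = llp(cuts)
    # Chords of a convex function lie above it. np.interp also returns the
    # constant vals[0] left of the first cut, which is valid since llp increases.
    middle = np.interp(x, cuts, vals)
    # Right tail: llp(x) = x + log(1 + exp(-x)) <= x + log(1 + exp(-cuts[-1])).
    right = x + (vals[-1] - cuts[-1])
    return np.where(x > cuts[-1], right, middle)

x = np.linspace(-12.0, 12.0, 4801)
max_errors = {}
for n_cuts in (2, 4, 9):                   # gives 3, 5 and 10 pieces
    cuts = np.linspace(-6.0, 6.0, n_cuts)
    gap = piecewise_linear_upper(x, cuts) - llp(x)
    max_errors[n_cuts + 1] = float(gap.max())
```

Optimizing the cut points and slopes, as the slide describes, would give a smaller maximum error for the same number of pieces than this evenly spaced construction.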
Outline
• Latent Gaussian models
• Bounds for binary data
• Bounds for categorical data
   • Multinomial-logistic likelihood and local variational bounds
   • Stick-breaking likelihood (Khan, Mohamed, Marlin, Murphy, AISTATS 2012)
• Results
• Future work and conclusions
Multinomial-Logit Likelihood
Local Variational bounds
• The Bohning bound
   • Fast and closed-form updates
• The log bound (Blei and Lafferty 2006)
   • More accurate than the Bohning bound, but slower
• The product of sigmoids bound (Bouchard 2007)
Stick-Breaking Likelihood
[Figure: unit interval from 0 to 1]
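A minimal sketch of a stick-breaking likelihood over K categories, assuming the usual parameterization: each of the first K-1 categories takes a sigmoid fraction of whatever remains of a unit-length stick, and the last category takes the rest. The function names are illustrative, and the details may differ from the formulation in the cited paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stick_breaking_probs(eta):
    """Map K-1 real-valued predictors to K category probabilities."""
    s = sigmoid(np.asarray(eta, dtype=float))
    # Length of stick remaining before each break, plus what is left at the end
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - s)))
    return np.concatenate((s * remaining[:-1], remaining[-1:]))

p = stick_breaking_probs(np.zeros(3))   # → [0.5, 0.25, 0.125, 0.125]
```

Each factor is a logistic function of a single predictor, so bounds developed for the binary case can be applied to each break in turn.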
Outline
• Latent Gaussian models
• Bounds for binary data
• Bounds for categorical data
• Results
• Future work and conclusions
Speed Accuracy Trade-offs
Binary FA: UCI voting dataset (D=15, N=435)
[Figure: speed-accuracy comparison of the following methods]
• Bohning
• Jaakkola
• Piecewise Linear with 3 pieces
• Piecewise Quad with 3 pieces
• Piecewise Quad with 10 pieces
Comparison with EP
Binary Gaussian Process: Ionosphere dataset (D=200)
Σij = σ exp[-||xi - xj||^2 / s]
[Figure: graphical model with parameters µ, Σ and W, latent variable zn and observation yn, in a plate over n = 1:N]
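A minimal sketch of the covariance function given on this slide, Σij = σ exp[-||xi - xj||^2 / s], evaluated for a matrix of inputs. The function name, the input sizes and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def se_kernel(X, sigma, s):
    """Sigma[i, j] = sigma * exp(-||x_i - x_j||^2 / s) for the rows x_i of X."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)    # clip tiny negatives from round-off
    return sigma * np.exp(-sq_dists / s)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                # 20 inputs with 4 features (assumed)
Sigma = se_kernel(X, sigma=2.0, s=5.0)
```

In practice a small multiple of the identity is often added to such a matrix before factorizing it, to guard against numerical loss of positive definiteness.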
EP vs PW: Posterior Distribution
[Figure: panels titled "(Neg) KL-Lower Bound to MargLik", "Approximation to MargLik" and "Pred Error"; rows labeled "Piecewise Bound" and "EP"]
Comparison with EP
• Both methods give very similar results for GPs
• Our approach can be easily extended to factor models
• Variational EM objective function is well-defined
and can be obtained by minimizing convex functions
• Numerically stable
MultiClass Gaussian Process
Glass dataset (D=143, K=6)
[Figure: NegLogLik and Prediction Error for MCMC, Bohning, Log, VB-probit and Stick-PW]
Categorical Factor Analysis
Glass dataset (D = 10, N = 958, sum of K = 29)
[Figure: comparison of Logit-log and Stick-PW]
Outline
• Latent Gaussian models
• Bounds for binary data
• Bounds for categorical data
• Results
• Future work and conclusions
Future Work
• Large-scale collaborative filtering
• Use convexity to design approximate gradient methods
• Sparse Gaussian Posterior Distribution
• Tuning HMC using Bayesian optimization methods
• Latent Sparse-factor model
• Conditional models (e.g. to model tag-image correlation)
Conclusions
• Variational methods show comparable performance with existing
approaches
• The main source of error is the bounding error
• Design of piecewise bounds to control these errors
• A good control over speed-accuracy trade-offs can be obtained
• Variational lower bounds can be optimized efficiently
• Use of convex optimization methods to get fast convergence
rates and easy convergence diagnostics
• Design of efficient expectation-maximization (EM) algorithms
Collaborators
Kevin Murphy, UBC
Benjamin Marlin, U. Mass-Amherst
Guillaume Bouchard, XRCE, France
Shakir Mohamed, U. Cambridge, now at UBC
Thank You