TRANSCRIPT
Simple Bayesian Supervised Models
Saskia Klein & Steffen Bollmann
Content
Recap from last week
Bayesian Linear Regression
• What is linear regression?
• Application of Bayesian theory to linear regression
• Example
• Comparison to conventional linear regression
Bayesian Logistic Regression
Naive Bayes classifier
Source: Bishop (ch. 3,4); Barber (ch. 10)
Maximum a posteriori estimation
• The Bayesian approach to estimating the parameters of a distribution given a set of observations is to maximize the posterior distribution.
• It allows prior information to be taken into account:

$\text{posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{evidence}}, \qquad p(\boldsymbol{\theta} \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathcal{D})}$
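The MAP idea can be sketched numerically. A minimal example, assuming a Gaussian likelihood with known precision `beta` and a Gaussian prior `N(mu0, 1/alpha)` on the unknown mean (all names and values hypothetical): the posterior mode is a precision-weighted compromise between the prior mean and the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: estimate the mean mu of a Gaussian with known
# noise precision beta, under a Gaussian prior N(mu0, 1/alpha) on mu.
mu_true, beta, alpha, mu0 = 2.0, 1.0, 1.0, 0.0
x = rng.normal(mu_true, 1.0 / np.sqrt(beta), size=20)
N = len(x)

# Maximum likelihood estimate: just the sample mean.
mu_ml = x.mean()

# MAP estimate: mode of the (Gaussian) posterior -- a precision-weighted
# average that shrinks the sample mean towards the prior mean mu0.
mu_map = (alpha * mu0 + beta * x.sum()) / (alpha + N * beta)
```

With more data (larger N), the likelihood term dominates and the MAP estimate approaches the maximum likelihood estimate.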
Conjugate prior
• In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior.
• For any member of the exponential family, there exists a conjugate prior that can be written in the form
$p(\boldsymbol{\eta} \mid \boldsymbol{\chi}, \nu) = f(\boldsymbol{\chi}, \nu)\, g(\boldsymbol{\eta})^{\nu} \exp\{\nu \boldsymbol{\eta}^T \boldsymbol{\chi}\}$
• Important conjugate pairs include:
Binomial – Beta
Multinomial – Dirichlet
Gaussian – Gaussian (for the mean)
Gaussian – Gamma (for the precision)
Exponential – Gamma
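As a concrete instance of the Binomial–Beta pair, here is a minimal sketch (with hypothetical prior parameters and coin flips) of how conjugacy reduces the posterior update to simple addition of counts:

```python
import numpy as np

# Beta(a0, b0) prior on a coin's success probability; the Bernoulli/
# binomial likelihood is conjugate to it, so the posterior is again a
# Beta whose parameters are the prior counts plus the observed counts.
a0, b0 = 2.0, 2.0
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical coin flips
heads = int(data.sum())
tails = len(data) - heads

a_post, b_post = a0 + heads, b0 + tails      # posterior: Beta(a_post, b_post)
posterior_mean = a_post / (a_post + b_post)
```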
Linear Regression
goal: predict the value of a target variable $t$ given the value of a D-dimensional vector $\mathbf{x}$ of input variables
linear regression models: linear functions of the adjustable parameters
for example: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D$
Linear Regression
Training
training data set comprising $N$ observations $\{\mathbf{x}_n\}$, where $n = 1, \dots, N$, with corresponding target values $\{t_n\}$; compute the weights $\mathbf{w}$
Prediction
goal: predict the value of $t$ for a new value of $\mathbf{x}$ = model the predictive distribution $p(t \mid \mathbf{x})$ and make predictions of $t$ in such a way as to minimize the expected value of a loss function
Examples of linear regression models
simplest linear regression model: a linear function of the weights/parameters and the data
$y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D$
linear regression models using basis functions $\phi_j(\mathbf{x})$:
$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$
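The basis-function idea can be sketched as a design matrix. The Gaussian basis and its width `s` are one common choice (names and values hypothetical); the model stays linear in the weights even though the basis is nonlinear in the input:

```python
import numpy as np

def design_matrix(x, centers, s=0.5):
    """Stack phi_0(x) = 1 (bias) with Gaussian basis functions
    phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), one column per basis."""
    cols = [np.ones_like(x)]
    cols += [np.exp(-(x - mu) ** 2 / (2 * s ** 2)) for mu in centers]
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 5)
Phi = design_matrix(x, centers=np.linspace(0.0, 1.0, 3))
# a prediction is then y = Phi @ w for some weight vector w
```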
Bayesian Linear Regression
model: $t = y(\mathbf{x}, \mathbf{w}) + \epsilon$
$t$ … target variable, $y$ … model, $\mathbf{x}$ … data, $\mathbf{w}$ … weights/parameters, $\epsilon$ … additive Gaussian noise with zero mean and precision (inverse variance) $\beta$
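A sketch of sampling data from this model, assuming a straight-line y(x, w) and hypothetical true weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# t = y(x, w) + eps with eps ~ N(0, 1/beta); hypothetical true weights
w0_true, w1_true = -0.3, 0.5
beta = 25.0                                  # noise precision (1/variance)

x = rng.uniform(-1.0, 1.0, size=25)
eps = rng.normal(0.0, 1.0 / np.sqrt(beta), size=x.shape)
t = w0_true + w1_true * x + eps
```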
Bayesian Linear Regression - Likelihood
likelihood function:
$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$
observation of $N$ training data points of inputs $\mathbf{X}$ and target values $\mathbf{t}$ (independently drawn from the distribution)
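Under this model, the log-likelihood of i.i.d. Gaussian targets can be sketched as follows (toy data with a bias-plus-identity basis; all values hypothetical):

```python
import numpy as np

def log_likelihood(w, Phi, t, beta):
    """log prod_n N(t_n | w^T phi_n, 1/beta) for independent targets."""
    N = len(t)
    resid = t - Phi @ w
    return 0.5 * N * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * (resid @ resid)

# toy example: bias + one feature, two data points
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0]])
t = np.array([0.1, 0.9])

ll_good = log_likelihood(np.array([0.0, 1.0]), Phi, t, beta=10.0)
ll_bad = log_likelihood(np.array([5.0, -5.0]), Phi, t, beta=10.0)
# weights whose predictions lie close to the data score higher
```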
Bayesian Linear Regression - Prior
prior probability distribution over the model parameters $\mathbf{w}$
conjugate prior: Gaussian distribution
$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$ with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$
Bayesian Linear Regression – Posterior Distribution
due to the conjugate prior, the posterior will also be Gaussian:
$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$
with $\mathbf{m}_N = \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^T \mathbf{t} \right)$ and $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$
(derivation: Bishop p. 112)
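The posterior update can be sketched directly, assuming the common zero-mean isotropic prior $\mathbf{m}_0 = \mathbf{0}$, $\mathbf{S}_0 = \alpha^{-1}\mathbf{I}$ (the data and precisions below are hypothetical):

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """N(w | m_N, S_N) with prior m0 = 0, S0 = (1/alpha) I, so that
    S_N^{-1} = alpha I + beta Phi^T Phi and m_N = beta S_N Phi^T t."""
    M = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# hypothetical noise-free straight-line data
x = np.linspace(-1.0, 1.0, 20)
Phi = np.column_stack([np.ones_like(x), x])
t = -0.3 + 0.5 * x

m_N, S_N = posterior(Phi, t, alpha=2.0, beta=25.0)
# m_N lies close to the true weights, slightly shrunk towards zero
```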
Example Linear Regression
(Matlab demo)
Predictive Distribution
making predictions of $t$ for new values of $\mathbf{x}$; predictive distribution:
$p(t \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) = \mathcal{N}(t \mid \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}))$
variance of the distribution:
$\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^T \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$
the first term represents the noise in the data; the second term reflects the uncertainty associated with the parameters $\mathbf{w}$
the optimal prediction for a new value of $\mathbf{x}$ would be the conditional mean of the target variable:
$\mathbb{E}[t \mid \mathbf{x}] = \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x})$
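A minimal sketch of evaluating the predictive mean and variance, given a hypothetical posterior (m_N, S_N) and noise precision beta:

```python
import numpy as np

# posterior from a tiny hypothetical fit (two weights, identity basis)
beta = 25.0
m_N = np.array([-0.3, 0.5])
S_N = np.array([[0.02, 0.0],
                [0.0, 0.05]])

def predictive(phi_new, m_N, S_N, beta):
    """Predictive mean m_N^T phi and variance 1/beta + phi^T S_N phi."""
    mean = m_N @ phi_new
    # first term: irreducible data noise; second: parameter uncertainty
    var = 1.0 / beta + phi_new @ S_N @ phi_new
    return mean, var

mean0, var0 = predictive(np.array([1.0, 0.0]), m_N, S_N, beta)
mean1, var1 = predictive(np.array([1.0, 2.0]), m_N, S_N, beta)
```

Note that the predictive variance never drops below the noise floor 1/beta, and grows where the features amplify the parameter uncertainty.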
Common Problem in Linear Regression: Overfitting / Model Complexity
• Least-squares approach (maximizing the likelihood): point estimate of the weights
• Regularization: regularization term and its value need to be chosen
• Cross-validation: requires large datasets and high computational power
• Bayesian approach: distribution of the weights; needs a good prior; model comparison is computationally demanding, but validation data is not required
From Regression to Classification
for regression problems: the target variable $t$ was a vector of real numbers whose values we wish to predict
in case of classification: target values represent class labels
two-class problem: binary representation, $t \in \{0, 1\}$
$K > 2$ classes: 1-of-$K$ coding, e.g. $\mathbf{t} = (0, 1, 0, \dots, 0)^T$ for class 2
Classification
goal: take an input vector $\mathbf{x}$ and assign it to one of $K$ discrete classes
the classes are separated by a decision boundary
Bayesian Logistic Regression
model the class-conditional densities $p(\mathbf{x} \mid C_k)$ and the prior probabilities $p(C_k)$ and apply Bayes' theorem:
$p(C_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})}$
Bayesian Logistic Regression
exact Bayesian inference for logistic regression is intractable
the Laplace approximation aims to find a Gaussian approximation to a probability density defined over a set of continuous variables
the posterior distribution is approximated by a Gaussian centred at its mode (the MAP solution)
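A one-dimensional sketch of the Laplace approximation for logistic regression (data and prior precision are hypothetical): Newton's method finds the mode w_MAP of the log-posterior, and the negative inverse second derivative there gives the variance of the approximating Gaussian.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# hypothetical 1-D data: label t = 1 when x is positive
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
t = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
alpha = 1.0          # precision of the Gaussian prior N(0, 1/alpha) on w

w = 0.0
for _ in range(50):                                  # Newton iterations
    y = sigmoid(w * x)
    grad = np.sum((t - y) * x) - alpha * w           # d log posterior / dw
    hess = -np.sum(y * (1 - y) * x * x) - alpha      # d^2 log posterior / dw^2
    w -= grad / hess

# Laplace approximation: posterior ~ N(w_map, -1/hess)
w_map, var_laplace = w, -1.0 / hess
```

The log-posterior here is strictly concave (the Hessian is always below -alpha), so Newton's method converges reliably to the single mode.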
Example
Barber: DemosExercises\demoBayesLogRegression.m
Naive Bayes classifier
Why naive? strong independence assumptions:
• assumes that the presence/absence of a feature of a class is unrelated to the presence/absence of any other feature, given the class variable
• ignores relations between features and assumes that all features contribute independently to a class
[http://en.wikipedia.org/wiki/Naive_Bayes_classifier]
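A Gaussian naive Bayes sketch on toy data (all numbers hypothetical): each feature gets an independent per-class Gaussian, and a class score simply adds the per-feature log densities to the log prior, which is exactly the naive independence assumption.

```python
import numpy as np

def fit(X, y):
    """Per class: prior, per-feature means, per-feature variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),           # class prior
                     Xc.mean(axis=0),            # per-feature means
                     Xc.var(axis=0) + 1e-9)      # variances (smoothed)
    return params

def predict(params, x):
    def score(c):
        prior, mu, var = params[c]
        # sum of independent per-feature Gaussian log densities
        logpdf = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(prior) + logpdf.sum()
    return max(params, key=score)

X = np.array([[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])
params = fit(X, y)
```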
Thank you for your attention
Saskia Klein & Steffen Bollmann