TRANSCRIPT
Simple Bayesian Supervised Models
Saskia Klein & Steffen Bollmann
Content
Recap from last week
Bayesian Linear Regression
• What is linear regression?
• Application of Bayesian theory to linear regression
• Example
• Comparison to conventional linear regression
Bayesian Logistic Regression
Naive Bayes classifier
Source: Bishop (ch. 3,4); Barber (ch. 10)
Maximum a posteriori estimation
• The Bayesian approach to estimating the parameters of a distribution given a set of observations is to maximize the posterior distribution.
• It allows prior information to be taken into account:

$\text{posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{evidence}}, \qquad p(\boldsymbol{\theta} \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathcal{D})}$
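The MAP idea can be sketched numerically. A minimal example, assuming a Gaussian likelihood with known precision `beta` and a Gaussian prior `N(mu0, 1/alpha)` on the unknown mean (all names and values hypothetical): the posterior mode is a precision-weighted compromise between the prior mean and the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: estimate the mean mu of a Gaussian with known
# noise precision beta, under a Gaussian prior N(mu0, 1/alpha) on mu.
mu_true, beta, alpha, mu0 = 2.0, 1.0, 1.0, 0.0
x = rng.normal(mu_true, 1.0 / np.sqrt(beta), size=20)
N = len(x)

# Maximum likelihood estimate: just the sample mean.
mu_ml = x.mean()

# MAP estimate: mode of the (Gaussian) posterior -- a precision-weighted
# average that shrinks the sample mean towards the prior mean mu0.
mu_map = (alpha * mu0 + beta * x.sum()) / (alpha + N * beta)
```

With more data (larger N), the likelihood term dominates and the MAP estimate approaches the maximum likelihood estimate.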
Conjugate prior
• In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior.
• For any member of the exponential family, there exists a conjugate prior that can be written in the form
$p(\boldsymbol{\eta} \mid \boldsymbol{\chi}, \nu) = f(\boldsymbol{\chi}, \nu)\, g(\boldsymbol{\eta})^{\nu} \exp\{\nu \boldsymbol{\eta}^T \boldsymbol{\chi}\}$
• Important conjugate pairs include:
Binomial – Beta
Multinomial – Dirichlet
Gaussian – Gaussian (for the mean)
Gaussian – Gamma (for the precision)
Exponential – Gamma
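As a concrete instance of the Binomial–Beta pair, here is a minimal sketch (with hypothetical prior parameters and coin flips) of how conjugacy reduces the posterior update to simple addition of counts:

```python
import numpy as np

# Beta(a0, b0) prior on a coin's success probability; the Bernoulli/
# binomial likelihood is conjugate to it, so the posterior is again a
# Beta whose parameters are the prior counts plus the observed counts.
a0, b0 = 2.0, 2.0
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical coin flips
heads = int(data.sum())
tails = len(data) - heads

a_post, b_post = a0 + heads, b0 + tails      # posterior: Beta(a_post, b_post)
posterior_mean = a_post / (a_post + b_post)
```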
Linear Regression
goal: predict the value of a target variable $t$ given the value of a D-dimensional vector $\mathbf{x}$ of input variables
linear regression models: linear functions of the adjustable parameters
for example: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D$
Linear Regression
Training
training data set comprising $N$ observations $\{\mathbf{x}_n\}$, where $n = 1, \dots, N$, with corresponding target values $\{t_n\}$; compute the weights $\mathbf{w}$
Prediction
goal: predict the value of $t$ for a new value of $\mathbf{x}$ = model the predictive distribution $p(t \mid \mathbf{x})$ and make predictions of $t$ in such a way as to minimize the expected value of a loss function
Examples of linear regression models
simplest linear regression model: a linear function of the weights/parameters and the data
$y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D$
linear regression models using basis functions $\phi_j(\mathbf{x})$:
$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$
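The basis-function idea can be sketched as a design matrix. The Gaussian basis and its width `s` are one common choice (names and values hypothetical); the model stays linear in the weights even though the basis is nonlinear in the input:

```python
import numpy as np

def design_matrix(x, centers, s=0.5):
    """Stack phi_0(x) = 1 (bias) with Gaussian basis functions
    phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), one column per basis."""
    cols = [np.ones_like(x)]
    cols += [np.exp(-(x - mu) ** 2 / (2 * s ** 2)) for mu in centers]
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 5)
Phi = design_matrix(x, centers=np.linspace(0.0, 1.0, 3))
# a prediction is then y = Phi @ w for some weight vector w
```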
Bayesian Linear Regression
model: $t = y(\mathbf{x}, \mathbf{w}) + \epsilon$
$t$ … target variable, $y$ … model, $\mathbf{x}$ … data, $\mathbf{w}$ … weights/parameters, $\epsilon$ … additive Gaussian noise with zero mean and precision (inverse variance) $\beta$
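A sketch of sampling data from this model, assuming a straight-line y(x, w) and hypothetical true weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# t = y(x, w) + eps with eps ~ N(0, 1/beta); hypothetical true weights
w0_true, w1_true = -0.3, 0.5
beta = 25.0                                  # noise precision (1/variance)

x = rng.uniform(-1.0, 1.0, size=25)
eps = rng.normal(0.0, 1.0 / np.sqrt(beta), size=x.shape)
t = w0_true + w1_true * x + eps
```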
Bayesian Linear Regression - Likelihood
likelihood function:
$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$
observation of $N$ training data points of inputs $\mathbf{X}$ and target values $\mathbf{t}$ (independently drawn from the distribution)
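Under this model, the log-likelihood of i.i.d. Gaussian targets can be sketched as follows (toy data with a bias-plus-identity basis; all values hypothetical):

```python
import numpy as np

def log_likelihood(w, Phi, t, beta):
    """log prod_n N(t_n | w^T phi_n, 1/beta) for independent targets."""
    N = len(t)
    resid = t - Phi @ w
    return 0.5 * N * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * (resid @ resid)

# toy example: bias + one feature, two data points
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0]])
t = np.array([0.1, 0.9])

ll_good = log_likelihood(np.array([0.0, 1.0]), Phi, t, beta=10.0)
ll_bad = log_likelihood(np.array([5.0, -5.0]), Phi, t, beta=10.0)
# weights whose predictions lie close to the data score higher
```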
Bayesian Linear Regression - Prior
prior probability distribution over the model parameters $\mathbf{w}$
conjugate prior: Gaussian distribution
$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$ with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$
Bayesian Linear Regression – Posterior Distribution
due to the conjugate prior, the posterior will also be Gaussian:
$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$
with $\mathbf{m}_N = \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^T \mathbf{t} \right)$ and $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$
(derivation: Bishop p. 112)
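The posterior update can be sketched directly, assuming the common zero-mean isotropic prior $\mathbf{m}_0 = \mathbf{0}$, $\mathbf{S}_0 = \alpha^{-1}\mathbf{I}$ (the data and precisions below are hypothetical):

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """N(w | m_N, S_N) with prior m0 = 0, S0 = (1/alpha) I, so that
    S_N^{-1} = alpha I + beta Phi^T Phi and m_N = beta S_N Phi^T t."""
    M = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# hypothetical noise-free straight-line data
x = np.linspace(-1.0, 1.0, 20)
Phi = np.column_stack([np.ones_like(x), x])
t = -0.3 + 0.5 * x

m_N, S_N = posterior(Phi, t, alpha=2.0, beta=25.0)
# m_N lies close to the true weights, slightly shrunk towards zero
```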
Example Linear Regression
(Matlab demo)
Predictive Distribution
making predictions of $t$ for new values of $\mathbf{x}$; predictive distribution:
$p(t \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) = \mathcal{N}(t \mid \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}))$
variance of the distribution:
$\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^T \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$
the first term represents the noise in the data; the second term reflects the uncertainty associated with the parameters $\mathbf{w}$
the optimal prediction for a new value of $\mathbf{x}$ would be the conditional mean of the target variable:
$\mathbb{E}[t \mid \mathbf{x}] = \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x})$
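A minimal sketch of evaluating the predictive mean and variance, given a hypothetical posterior (m_N, S_N) and noise precision beta:

```python
import numpy as np

# posterior from a tiny hypothetical fit (two weights, identity basis)
beta = 25.0
m_N = np.array([-0.3, 0.5])
S_N = np.array([[0.02, 0.0],
                [0.0, 0.05]])

def predictive(phi_new, m_N, S_N, beta):
    """Predictive mean m_N^T phi and variance 1/beta + phi^T S_N phi."""
    mean = m_N @ phi_new
    # first term: irreducible data noise; second: parameter uncertainty
    var = 1.0 / beta + phi_new @ S_N @ phi_new
    return mean, var

mean0, var0 = predictive(np.array([1.0, 0.0]), m_N, S_N, beta)
mean1, var1 = predictive(np.array([1.0, 2.0]), m_N, S_N, beta)
```

Note that the predictive variance never drops below the noise floor 1/beta, and grows where the features amplify the parameter uncertainty.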
Common Problem in Linear Regression: Overfitting / Model Complexity
• Least-squares approach (maximizing the likelihood): point estimate of the weights
• Regularization: regularization term and its value need to be chosen
• Cross-validation: requires large datasets and high computational power
• Bayesian approach: distribution of the weights; needs a good prior; model comparison is computationally demanding, but validation data is not required
From Regression to Classification
for regression problems: the target variable $t$ was a vector of real numbers whose values we wish to predict
in case of classification: target values represent class labels
two-class problem: binary representation, $t \in \{0, 1\}$
$K > 2$ classes: 1-of-$K$ coding, e.g. $\mathbf{t} = (0, 1, 0, \dots, 0)^T$ for class 2
Classification
goal: take an input vector $\mathbf{x}$ and assign it to one of $K$ discrete classes
the classes are separated by a decision boundary
Bayesian Logistic Regression
model the class-conditional densities $p(\mathbf{x} \mid C_k)$ and the prior probabilities $p(C_k)$ and apply Bayes' theorem:
$p(C_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})}$
Bayesian Logistic Regression
exact Bayesian inference for logistic regression is intractable
the Laplace approximation aims to find a Gaussian approximation to a probability density defined over a set of continuous variables
the posterior distribution is approximated by a Gaussian centred at its mode (the MAP solution)
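A one-dimensional sketch of the Laplace approximation for logistic regression (data and prior precision are hypothetical): Newton's method finds the mode w_MAP of the log-posterior, and the negative inverse second derivative there gives the variance of the approximating Gaussian.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# hypothetical 1-D data: label t = 1 when x is positive
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
t = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
alpha = 1.0          # precision of the Gaussian prior N(0, 1/alpha) on w

w = 0.0
for _ in range(50):                                  # Newton iterations
    y = sigmoid(w * x)
    grad = np.sum((t - y) * x) - alpha * w           # d log posterior / dw
    hess = -np.sum(y * (1 - y) * x * x) - alpha      # d^2 log posterior / dw^2
    w -= grad / hess

# Laplace approximation: posterior ~ N(w_map, -1/hess)
w_map, var_laplace = w, -1.0 / hess
```

The log-posterior here is strictly concave (the Hessian is always below -alpha), so Newton's method converges reliably to the single mode.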
Example
Barber: DemosExercises\demoBayesLogRegression.m
Naive Bayes classifier
Why naive? strong independence assumptions:
• assumes that the presence/absence of a feature of a class is unrelated to the presence/absence of any other feature, given the class variable
• ignores relations between features and assumes that all features contribute independently to a class
[http://en.wikipedia.org/wiki/Naive_Bayes_classifier]
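A Gaussian naive Bayes sketch on toy data (all numbers hypothetical): each feature gets an independent per-class Gaussian, and a class score simply adds the per-feature log densities to the log prior, which is exactly the naive independence assumption.

```python
import numpy as np

def fit(X, y):
    """Per class: prior, per-feature means, per-feature variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),           # class prior
                     Xc.mean(axis=0),            # per-feature means
                     Xc.var(axis=0) + 1e-9)      # variances (smoothed)
    return params

def predict(params, x):
    def score(c):
        prior, mu, var = params[c]
        # sum of independent per-feature Gaussian log densities
        logpdf = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(prior) + logpdf.sum()
    return max(params, key=score)

X = np.array([[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])
params = fit(X, y)
```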
Thank you for your attention
Saskia Klein & Steffen Bollmann