Machine Learning
Lecture 23: Statistical Estimation with Sampling
Iain Murray’s MLSS lecture on videolectures.net: http://videolectures.net/mlss09uk_murray_mcmc/
Today
• In service of EM in graphical models
• Sampling – a technique to approximate the expected value of a distribution
• Gibbs sampling – sampling of latent variables in a graphical model
What is the average height of professors of CS at Queens College?
• What’s the size of C?
What is the average height of students at Queens College?
• What’s the size of C?
What is the average height of people in Queens?
So we’re comfortable approximating statistical parameters…
• Why don’t we use this to do inference in complicated graphical models?
• Or where it is difficult to count everything?
Statistical sampling
• Make a prediction about a variable, x, based on data D.
Expected Values
• We want to know the expected value of a distribution.
– E[p(t | x)] is a classification problem.
• We can evaluate p(x), but integration is difficult.
• Given a graphical model describing the relationships between variables, we’d like to compute E[p(x)] where x is only partially observed.
Sampling
• We have a representation of p(x) and f(x), but integration is intractable.
• E[f] is difficult as an integral, but easy as a sum.
• Randomly select points from the distribution p(x) and use these as representative of the distribution of f(x).
• It turns out that if correctly sampled, only 10-20 points can be sufficient to estimate the mean and variance of a distribution.
– Samples must be independently drawn.
– The expectation may be dominated by regions of high probability, or high function values.
Monte Carlo Example
• Sampling techniques to solve difficult integration problems.
• What is the area of a circle with radius 1?
– What if you don’t know trigonometry?
Monte Carlo Estimation
• How can we approximate the area of a circle without trigonometry?
• Take a random x and a random y between -1 and 1.
– Sample x and sample y independently.
• Determine if x² + y² ≤ 1.
• Repeat many times.
• Count the number of times that the inequality is true.
• Divide the count by the number of samples, and multiply by the area of the bounding square (4).
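The steps above can be sketched in a few lines of Python; the sample count and seed are arbitrary choices, not from the slides.

```python
import random

def circle_area(n_samples=100_000, seed=0):
    """Monte Carlo estimate of the area of the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:    # the point falls inside the circle
            inside += 1
    # fraction of hits, scaled by the area of the bounding square (2 x 2 = 4)
    return 4.0 * inside / n_samples
```

With enough samples the estimate converges toward π, without any trigonometry.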
How is sampling used in EM?
• E-Step
– What are the responsibilities in a GMM?
– p(x_hidden | x_observed)
• M-Step
– Re-estimate parameters based on a convex optimization.
– Get new parameters.
Sampling in a Graphical Model
• Sample each root variable from its marginal.
• Sample children after their parents, from the conditionals given the sampled parent values.

[Figure: a directed graphical model with nodes A, B, C, D, and E]
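As a sketch of this ancestral-sampling scheme: the five binary variables below mirror the figure's node names, but the edge structure (A → C ← B, with C → D and C → E) and all probability tables are hypothetical stand-ins, since the figure does not specify them.

```python
import random

rng = random.Random(0)

# Hypothetical conditional probability tables (all values are illustrative).
p_A = 0.6                   # p(A = 1), a root variable
p_B = 0.3                   # p(B = 1), a root variable
p_C = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}   # p(C = 1 | A, B)
p_D = {0: 0.2, 1: 0.7}      # p(D = 1 | C)
p_E = {0: 0.8, 1: 0.3}      # p(E = 1 | C)

def bernoulli(p):
    return 1 if rng.random() < p else 0

def ancestral_sample():
    """Sample parents before children; each node uses only its parents' values."""
    a = bernoulli(p_A)
    b = bernoulli(p_B)
    c = bernoulli(p_C[(a, b)])
    d = bernoulli(p_D[c])
    e = bernoulli(p_E[c])
    return a, b, c, d, e
```

Each call returns one joint sample (a, b, c, d, e) drawn from the full joint distribution defined by the network.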
How do you sample from a distribution?
• Known algorithms exist for many standard distributions.
• Use this book: http://luc.devroye.org/rnbookindex.html
Basic Algorithm
• Sample u uniformly from [0, 1].
• The probability mass to the left of x, h(x), follows a uniform distribution when x is drawn from p(x).
Basic Algorithm
• y(u) = h⁻¹(u): transforming a uniform sample u through the inverse of the cumulative distribution h yields a sample from p(y).
• h is not always easy to calculate or invert.
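When h does have a closed-form inverse, the method is only a few lines. Below is a sketch for the exponential distribution (the choice of Exponential(λ) is illustrative, not from the slides): h(y) = 1 − exp(−λy), so h⁻¹(u) = −ln(1 − u)/λ.

```python
import math
import random

def sample_exponential(lam, rng):
    """Inverse-CDF sampling for Exponential(lam).

    h(y) = 1 - exp(-lam * y), so y(u) = h^{-1}(u) = -ln(1 - u) / lam.
    """
    u = rng.random()                  # u ~ Uniform[0, 1)
    return -math.log(1.0 - u) / lam
```

The returned values have mean 1/λ, as expected for an exponential distribution.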
Rejection Sampling
• The distribution p(x) is easy to evaluate
– as in a graphical model representation
• but difficult to integrate.
• Identify a simpler distribution, kq(x), which bounds p(x), and draw a sample x0 from q.
– q is called the proposal distribution.
• Generate another sample u from a uniform distribution between 0 and kq(x0).
– If u ≤ p(x0), accept the sample.
• E.g. use it in the calculation of an expectation of f.
– Otherwise, reject the sample.
• E.g. omit it from the calculation of an expectation of f.
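A minimal sketch of this procedure, assuming an easy-to-evaluate target: here p(x) = 6x(1 − x) (a Beta(2, 2) density on [0, 1], chosen for illustration), with a Uniform(0, 1) proposal q and envelope constant k = 1.5, which bounds p since p peaks at 1.5.

```python
import random

def p(x):
    """Target density we can evaluate but (pretend we) cannot integrate:
    Beta(2, 2), p(x) = 6 x (1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(rng, k=1.5):
    """Propose x0 ~ q = Uniform(0, 1); accept when u <= p(x0)
    for u drawn uniformly from [0, k * q(x0)]."""
    while True:
        x0 = rng.random()            # sample from the proposal q
        u = rng.uniform(0.0, k)      # k * q(x0) = k, since q is uniform
        if u <= p(x0):
            return x0                # accepted samples are distributed as p
```

Accepted samples follow p exactly; the price is that proposals falling above the curve are discarded.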
Rejection Sampling Example
Importance Sampling
• One problem with rejection sampling is that you lose information when throwing out samples.
• If we are only looking for the expected value of f(x), we can incorporate unlikely samples of x in the calculation.
• Again use a proposal distribution q to approximate the expected value.
– Weight each sample from q by the likelihood that it was also drawn from p, i.e. by the ratio p(x)/q(x).
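A sketch of the weighting scheme above, reusing the Beta(2, 2) target and Uniform(0, 1) proposal from the rejection-sampling example (both illustrative choices): every sample contributes, weighted by p(x)/q(x), and nothing is thrown away.

```python
import random

def p_tilde(x):
    """Target density we can evaluate: Beta(2, 2), p(x) = 6 x (1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)

def importance_expectation(f, n_samples, seed=0):
    """Estimate E_p[f] by sampling from a Uniform(0, 1) proposal q
    and weighting each sample by w = p(x) / q(x); here q(x) = 1."""
    rng = random.Random(seed)
    total_w = 0.0
    total_wf = 0.0
    for _ in range(n_samples):
        x = rng.random()            # sample from the proposal q
        w = p_tilde(x)              # importance weight p(x)/q(x), with q(x) = 1
        total_w += w
        total_wf += w * f(x)
    return total_wf / total_w       # self-normalized importance-sampling estimate
```

Normalizing by the total weight also lets p be unnormalized, which is exactly the situation in many graphical models.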
Graphical Example of Importance Sampling
Markov Chain Monte Carlo
• Markov chain: p(x1 | x2, x3, x4, x5, …) = p(x1 | x2)
– each state depends only on its immediate predecessor.
• For MCMC sampling, start in a state z(0).
• At each step, draw a candidate sample z* based on the previous state z(m), using a proposal distribution.
• Accept this step with some probability.
– If the step is accepted: z(m+1) = z*
– Else: z(m+1) = z(m)
• Or only accept if the sample is consistent with an observed value.
Markov Chain Monte Carlo
• Goal: p(z(m)) → p*(z) as m → ∞
– MCMC samplers that have this property are called ergodic.
– Ergodicity implies that the sampled distribution converges to the true distribution.
• Need to define a transition function to move from one state to the next.
– How do we draw a sample at state m+1 given state m?
– Often, z(m+1) is drawn from a Gaussian with mean z(m) and a constant variance.
Markov Chain Monte Carlo
• Goal: p(z(m)) → p*(z) as m → ∞
– MCMC samplers that have this property are ergodic.
• Transition probabilities that satisfy detailed balance guarantee an ergodic MCMC process.
– Such chains are also called reversible.
Metropolis-Hastings Algorithm
• Assume the current state is z(m).
• Draw a sample z* from q(z | z(m)).
• Accept with probability A(z*, z(m)) = min(1, p(z*) q(z(m) | z*) / (p(z(m)) q(z* | z(m)))).
• Often use a normal distribution for q.
– Tradeoff between convergence and acceptance rate, based on the variance.
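A minimal sketch of the algorithm, assuming an unnormalized standard-normal target (an illustrative choice) and a Gaussian random-walk proposal; because this q is symmetric, the two q terms cancel in the acceptance ratio.

```python
import math
import random

def p_tilde(z):
    """Unnormalized target density: a standard normal, exp(-z^2 / 2)."""
    return math.exp(-0.5 * z * z)

def metropolis_hastings(n_steps, step_sd=1.0, seed=0):
    """Random-walk Metropolis sampler for p_tilde."""
    rng = random.Random(seed)
    z = 0.0                                   # initial state z(0)
    samples = []
    for _ in range(n_steps):
        z_star = rng.gauss(z, step_sd)        # draw z* from q(z | z(m))
        # symmetric proposal, so A = min(1, p(z*) / p(z(m)))
        a = min(1.0, p_tilde(z_star) / p_tilde(z))
        if rng.random() < a:
            z = z_star                        # accept: z(m+1) = z*
        samples.append(z)                     # reject keeps z(m+1) = z(m)
    return samples
```

A larger step_sd lowers the acceptance rate but explores faster; a smaller one accepts often but mixes slowly, which is the tradeoff mentioned above.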
Gibbs Sampling
• We’ve been treating z as a vector to be sampled as a whole.
• However, in high dimensions, the acceptance probability becomes vanishingly small.
• Gibbs sampling allows us to sample one variable at a time, conditioned on the other variables in z.
Gibbs sampling
• Assume a distribution over 3 variables.
• Generate a new sample for each variable conditioned on all of the other variables:
– z1(t+1) ~ p(z1 | z2(t), z3(t))
– z2(t+1) ~ p(z2 | z1(t+1), z3(t))
– z3(t+1) ~ p(z3 | z1(t+1), z2(t+1))
Gibbs Sampling in a Graphical Model
• The appeal of Gibbs sampling in a graphical model is that the conditional distribution of a variable depends only on its Markov blanket: its parents, its children, and its children’s other parents.
• Gibbs sampling fixes n-1 variables, and generates a sample for the nth.
• If each of the variables has an easily sampled conditional distribution, we can simply sample from the conditionals given by the graphical model, starting from some initial states.
Gibbs Sampling
• Fix four variables, sample the fifth.
• Repeat until convergence.

[Figure: the directed graphical model with nodes A, B, C, D, and E]
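As a runnable illustration of the Gibbs update (using a bivariate normal rather than the five-node graph above, since its conditionals have a simple closed form, an illustrative substitution): with unit variances and correlation ρ, each conditional is x_i | x_j ~ N(ρ x_j, 1 − ρ²).

```python
import math
import random

def gibbs_bivariate_normal(n_steps, rho=0.8, seed=0):
    """Gibbs sampler for a bivariate normal with unit variances
    and correlation rho; each conditional is N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0                     # arbitrary initial state
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sd)      # fix x2, resample x1 from p(x1 | x2)
        x2 = rng.gauss(rho * x1, sd)      # fix the new x1, resample x2 from p(x2 | x1)
        samples.append((x1, x2))
    return samples
```

Every proposal is accepted (the Gibbs conditional is its own proposal), so there is no vanishing acceptance probability; the chain still converges to the joint distribution.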
Next Time
• Perceptrons
• Neural Networks