bayesian wrap-up (probably). administrivia office hours tomorrow on schedule woo hoo! office hours...

Bayesian Wrap-Up (probably)

Administrivia

•Office hours tomorrow on schedule

•Woo hoo!

•Office hours today deferred... [sigh]

•4:30-5:15

Retrospective/prospective

•Last time:

•Maximum likelihood

• IID samples

•The MLE recipe

•Today:

•Finish up MLE recipe

•Bayesian posterior estimation

Exercise

•Find the maximum likelihood estimator of μ for the univariate Gaussian:

•Find the maximum likelihood estimator of β for the degenerate gamma distribution:

•Hint: consider the log of the likelihood functions in both cases

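Solutions

•PDF for one data point: $f(x_i; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$

•Joint likelihood of N (IID) data points: $L(\mu, x_1, \ldots, x_N) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$

•Log-likelihood: $\ln L(\mu, x_1, \ldots, x_N) = -N \ln\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (x_i - \mu)^2$

•Differentiate w.r.t. μ and set to zero: $\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{N} (x_i - \mu) = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$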

Example

•1-d Gaussian w/ σ=1, unknown μ

•x1=4.35

•[Plot: likelihood L(μ,x1) as a function of μ]

Example

•1-d Gaussian w/ σ=1, unknown μ

•x1=4.35, x2=3.12, x3=4.91
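The example can be checked numerically. The sketch below is a minimal illustration, assuming the three data points above and σ=1; the function names `log_likelihood` and `gaussian_mean_mle` are made up for this example and are not from the lecture.

```python
import math

def log_likelihood(mu, xs, sigma=1.0):
    """Log-likelihood of IID Gaussian samples xs with mean mu and std dev sigma."""
    n = len(xs)
    const = -n * math.log(math.sqrt(2.0 * math.pi) * sigma)
    return const - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2)

def gaussian_mean_mle(xs):
    """Closed-form maximum likelihood estimate of mu: the sample mean."""
    return sum(xs) / len(xs)

xs = [4.35, 3.12, 4.91]  # data points from the example slide
mu_hat = gaussian_mean_mle(xs)
print(f"MLE of mu: {mu_hat:.4f}")

# Sanity check: the log-likelihood at mu_hat should beat nearby values of mu.
for mu in (mu_hat - 0.5, mu_hat, mu_hat + 0.5):
    print(f"mu = {mu:.4f}  log L = {log_likelihood(mu, xs):.4f}")
```

The estimate is just the sample mean, 12.38/3 ≈ 4.127; moving μ in either direction lowers the log-likelihood.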

Solutions

•What about for the gamma PDF?

Putting the parts together

[X,Y]: complete training data

Assumed distribution family (hyp. space) w/ parameters Θ

Parameters for class a:

Specific PDF for class a

Gaussian Distributions

5 minutes of math...

•Recall your friend the Gaussian PDF:

•I asserted that the d-dimensional form is:

•Let’s look at the parts...
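For reference, here is a sketch of the standard textbook forms of the two densities. The notation is an assumption chosen to match the surrounding bullets: x is a data point, μ the mean, σ the standard deviation, Σ the covariance matrix, and d the number of dimensions.

```latex
% Univariate Gaussian with mean \mu and standard deviation \sigma
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

% d-dimensional Gaussian with mean vector \boldsymbol{\mu}
% and covariance matrix \Sigma
f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
       \exp\!\left(-\tfrac{1}{2}
       (\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}
       (\mathbf{x}-\boldsymbol{\mu})\right)
```

The parts are the normalizing constant (which depends on the determinant |Σ|), the mean vector, and the quadratic form built from the inverse covariance matrix. With d=1 and Σ=σ², the second expression reduces to the first.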


5 minutes of math...

•Ok, but what do the parts mean?

•Mean vector, μ: mean of data along each dimension

5 minutes of math...

•Covariance matrix

•Like variance, but describes spread of data

5 minutes of math...

•Note: the entries on the diagonal of the covariance matrix Σ are the same as the standard variances along each dimension of the data

•But what about skewed data?

5 minutes of math...

•Off-diagonal covariances (σij, i≠j) describe the pairwise variance

•How much xi changes as xj changes (on avg)

5 minutes of math...

•Calculating from data:

•In practice: you want to measure the covariance between every pair of random variables (dimensions):

•Or, in linear algebra:
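A minimal sketch of the pairwise calculation, in plain Python. It assumes the 1/N normalization (the maximum likelihood form); the 1/(N-1) unbiased variant is equally common, and the slides do not say which was intended. The function names and the toy data are made up for illustration.

```python
def mean_vector(rows):
    """Mean of the data along each dimension (column)."""
    n = len(rows)
    d = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(d)]

def covariance_matrix(rows):
    """d x d covariance matrix; entry (i, j) averages the product of the
    centered i-th and j-th coordinates over all N data points (1/N form)."""
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    # Subtract the mean from every data point.
    centered = [[row[j] - mu[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / n for j in range(d)]
        for i in range(d)
    ]

# Toy 2-d data set (hypothetical): the second coordinate is twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mu = mean_vector(data)
cov = covariance_matrix(data)
print("mean vector:", mu)
for row in cov:
    print(row)
```

The diagonal entries are the per-dimension variances, and the matrix is symmetric because entry (i, j) and entry (j, i) average the same products. In matrix terms this is the centered data matrix, transposed, times itself, divided by N.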

5 minutes of math...

•Marginal probabilities

•If you have a joint PDF:

•... and want to know about the probability of just one RV (regardless of what happens to the others)

•Marginal PDF of one variable or the other:
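As a sketch, take a two-variable joint PDF written f(H,W). This notation is an assumption borrowed from the conditional-probability example in this deck. The marginals then follow by integrating out the variable you do not care about:

```latex
% Marginal PDF of H: integrate the joint over all values of W
f_H(h) = \int_{-\infty}^{\infty} f(h, w)\, dw

% Marginal PDF of W: integrate the joint over all values of H
f_W(w) = \int_{-\infty}^{\infty} f(h, w)\, dh
```

For discrete random variables the integrals become sums over the values of the other variable.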

5 minutes of math...

•Conditional probabilities

•Suppose you have a joint PDF, f(H,W)

•Now you get to see one of the values, e.g., H=“183cm”

•What’s your probability estimate of W, given this new knowledge?

5 minutes of math...

•From cond prob. rule, it’s 2 steps to Bayes’ rule:

•(Often helps algebraically to think of “given that” operator, “|”, as a division operation)
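The two steps can be sketched as follows, using the f(H,W) notation from the conditional-probability example. The exact symbols used on the original slide are an assumption here.

```latex
% Conditional probability rule ("given that" as division by the conditioning variable)
f(W \mid H) = \frac{f(H, W)}{f(H)}

% Step 1: apply the same rule with the roles swapped, then rearrange
f(H \mid W) = \frac{f(H, W)}{f(W)}
\quad\Longrightarrow\quad
f(H, W) = f(H \mid W)\, f(W)

% Step 2: substitute into the first line to get Bayes' rule
f(W \mid H) = \frac{f(H \mid W)\, f(W)}{f(H)}
```

The denominator f(H) is the marginal of H, so it can be computed by integrating the numerator over all values of W.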

Everything’s random...

•Basic Bayesian viewpoint:

•Treat (almost) everything as a random variable

•Data/independent var: X vector

•Class/dependent var: Y

•Parameters: Θ

•E.g., mean, variance, correlations, multinomial params, etc.

•Use Bayes’ Rule to assess probabilities of classes

•Allows us to say: “It is very unlikely that the mean height is 2 light years”

Uncertainty over params

•Maximum likelihood treats parameters as (unknown) constants

•Job is just to pick the constants so as to maximize data likelihood

•Full-blown Bayesian modeling treats params as random variables

•PDF over parameter variables tells us how certain/uncertain we are about the location of that parameter

•Also allows us to express prior beliefs (probabilities) about params
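To make "a PDF over a parameter" concrete, here is a rough sketch that approximates the posterior over μ on a grid, reusing the three data points from the earlier 1-d Gaussian example with σ=1. The broad Gaussian prior (mean 0, std dev 10), the grid range, and the function names are all assumptions chosen for illustration, not part of the lecture.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def grid_posterior(xs, grid, prior_mu, prior_sigma, sigma=1.0):
    """Posterior probability of each grid value of mu (normalized to sum to 1).
    Unnormalized posterior = likelihood of the data * prior on mu (Bayes' rule)."""
    unnorm = []
    for mu in grid:
        like = 1.0
        for x in xs:
            like *= gaussian_pdf(x, mu, sigma)
        unnorm.append(like * gaussian_pdf(mu, prior_mu, prior_sigma))
    total = sum(unnorm)
    return [u / total for u in unnorm]

xs = [4.35, 3.12, 4.91]                   # data from the earlier example
grid = [i * 0.01 for i in range(0, 801)]  # candidate mu values, 0.00 to 8.00
post = grid_posterior(xs, grid, prior_mu=0.0, prior_sigma=10.0)  # assumed prior

map_mu = grid[post.index(max(post))]                 # most probable grid value
post_mean = sum(m * p for m, p in zip(grid, post))   # posterior mean of mu
print(f"MAP estimate of mu: {map_mu:.2f}")
print(f"Posterior mean of mu: {post_mean:.3f}")
```

With a prior this broad, the posterior peaks very close to the maximum likelihood estimate; a narrower prior would pull the peak toward the prior mean. The spread of `post` around its peak is what expresses how certain or uncertain we are about μ.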
