Bayesian Learning 1 of (probably) 2

Post on 21-Dec-2015


TRANSCRIPT

Bayesian Learning 1 of (probably) 2

Administrivia

•Readings 1 back today

•Good job, overall

•Watch your spelling/grammar!

•Nice analyses, though

•Possible fruit for final proj?

•HW2 assigned today

•Due Oct 12 (2 weeks)

•Do start early...

ML trivia of the day...

•Which data mining techniques [have] you used in a successfully deployed application?

http://www.kdnuggets.com/

Bayesian Classification

Assumptions

•“ ‘Assume’ makes an ass out of ‘U’ and ‘me’”

•Bull****

•Assumptions about data are unavoidable

•Learn faster/better when you know (assume) more about data

•Decision tree:

•Axis orthogonality; accuracy cost (0/1 loss)

•k-NN:

•Distance function/metric; accuracy cost

•LSE:

•Linear separator; squared error cost

Assumptions

•SVMs

•Linear separator

•High-dim projection (kernel function)

•Generalized inner product/cosine

•Max margin cost function

Specifying assumptions

•Bayesian learning assumes:

•Data were generated by some stochastic process

•Can write down (some) mathematical form for that process

•CDF/PDF/PMF

•Mathematical form needs to be parameterized

•Have some “prior beliefs” about those params

•Essentially, an attempt to make assumptions explicit and to divorce them from learning algorithm

•In practice, not a single learning algorithm, but a recipe for generating problem-specific algs.

Example

•F={height, weight}

•C={male, female}

•Q1: Any guesses about individual distributions of height/weight by class?

•What probability function (PDF)?

•Q2: What about the joint distribution?

•Q3: What about the means of each?

•Reasonable guess for the upper/lower bounds on the means?

Some actual data*

* Actual synthesized data, anyway...
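A sketch of how data like this could be synthesized, assuming independent class-conditional Gaussians for height and weight. The means and std devs below are made-up illustrative values, not the ones actually used for the slide's data.

```python
# Hypothetical sketch of synthesizing height/weight data per class.
# Assumption: each class is Gaussian in each feature (values are invented).
import numpy as np

rng = np.random.default_rng(0)

def synth_class(n, mean, std):
    """Draw n (height_cm, weight_kg) samples from independent Gaussians."""
    return rng.normal(loc=mean, scale=std, size=(n, 2))

males = synth_class(500, mean=[178.0, 84.0], std=[7.0, 12.0])
females = synth_class(500, mean=[164.0, 67.0], std=[6.0, 11.0])

print(males.mean(axis=0))   # sample means land near the chosen [178, 84]
```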

General idea

•Find probability distribution that describes classes of data

•Find decision surface in terms of those probability distributions

H/W data as PDFs

Or, if you prefer...

General idea

•Find probability distribution that describes classes of data

•Find decision surface in terms of those probability distributions

•What would be a good rule?

The Bayes optimal decision

•For 0/1 loss (accuracy), it is provable that the optimal decision is:

c* = argmax_c f(x|c) P(c)

•Equivalently, it’s sometimes useful to use the log odds ratio test:

ln[ f(x|c1) P(c1) ] − ln[ f(x|c2) P(c2) ] > 0 ⇒ choose c1, else c2

Bayes decisions in pictures

[Figure: class-conditional densities f(x1|c1) and f(x1|c2) plotted over x1, with the resulting decision regions labeled c1 and c2]

Bayesian learning process

•So where do the probability distributions come from?

•The art of Bayesian data modeling is:

•Deciding what probability models to use

•Figuring out how to find the parameters

•In Bayesian learning, the “learning” is (almost) all in finding the parameters

Back to the H/W data

•Gaussian (a.k.a. normal or bell curve) is a reasonable assumption for this data

•Other distributions better for other data

•Can make reasonable guesses about means

•Probably not -3 kg or 2 million lightyears

•Assumptions like these are called

•Model assumptions (Gaussian)

•Parameter priors (means)

•How do we incorporate these into learning?
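One standard way to incorporate a parameter prior, sketched under assumptions not stated on the slides: put a Gaussian prior N(mu0, tau^2) on an unknown mean with known data variance sigma^2; the posterior mean is then a precision-weighted blend of the prior mean and the sample mean. All numbers below are illustrative.

```python
# Hedged sketch: conjugate Gaussian-prior update for an unknown mean.
# Assumption: known data variance sigma2; prior N(mu0, tau2) on the mean.
def posterior_mean(data, mu0, tau2, sigma2):
    n = len(data)
    xbar = sum(data) / n
    # precision-weighted average of prior mean and sample mean
    return (mu0 / tau2 + n * xbar / sigma2) / (1 / tau2 + n / sigma2)

heights = [172.0, 168.0, 175.0, 170.0]  # invented observations
print(posterior_mean(heights, mu0=170.0, tau2=25.0, sigma2=49.0))
```

The posterior mean always lands between the prior mean (170) and the sample mean (171.25), pulled toward the data as n grows.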

Prior knowledge

5 minutes of math...

•Our friend the Gaussian distribution

•In 1 dimension:

f(x) = (1 / (σ√(2π))) exp( −(x − μ)² / (2σ²) )

•Mean: μ

•Std deviation: σ

•Both parameters scalar

•Usually, we talk about variance rather than std dev: σ²
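The 1-d density can be sanity-checked numerically: a minimal sketch that evaluates the formula and confirms it integrates to about 1 (the μ and σ values are arbitrary).

```python
# Sketch of the 1-d Gaussian density, checked by numerical integration.
import math

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))"""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 170.0, 7.0  # arbitrary parameters
step = 0.01
xs = [mu - 5 * sigma + i * step for i in range(int(10 * sigma / step))]
area = sum(normal_pdf(x, mu, sigma) * step for x in xs)  # Riemann sum over +/- 5 sigma
print(round(area, 3))  # ~1.0
```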

5 minutes of math...

•In d dimensions:

f(x) = (2π)^(−d/2) |Σ|^(−1/2) exp( −(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ) )

•Where:

•Mean vector: μ

•Covariance matrix: Σ

•Determinant of covariance: |Σ|
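The d-dimensional density is a direct transcription of the formula into numpy: normalize by (2π)^(−d/2)|Σ|^(−1/2) and exponentiate the quadratic form. The mean vector and covariance matrix below are illustrative 2-d values.

```python
# Sketch of the d-dimensional Gaussian density (illustrative parameters).
import numpy as np

def mvn_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    # normalizer: (2 pi)^(-d/2) * |Sigma|^(-1/2)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** -0.5
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu), via solve (no explicit inverse)
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([178.0, 84.0])
Sigma = np.array([[49.0, 20.0], [20.0, 144.0]])  # made-up height/weight covariance
print(mvn_pdf(np.array([178.0, 84.0]), mu, Sigma))  # density at the mean
```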

Exercise:

•For the 1-d Gaussian:

•Given two classes, with means μ1 and μ2 and std devs σ1 and σ2

•Find a description of the decision point if the std devs are the same, but diff means

•For the d-dim Gaussian,

•What shapes are the isopotentials? Why?

•Repeat above exercise for d-dim Gaussian
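A hedged numeric check for the first part of the exercise (it does not give the full derivation away): with equal std devs and equal priors, the log odds ratio is linear in x and crosses zero at the midpoint of the two means, which is therefore the decision point. Parameter values below are arbitrary.

```python
# Numeric check: equal-sigma, equal-prior 1-d Gaussians have their
# decision point at the midpoint of the means.
def log_odds(x, mu1, mu2, sigma):
    # log f(x|c1) - log f(x|c2); normalizers cancel when sigmas are equal
    return (-(x - mu1) ** 2 + (x - mu2) ** 2) / (2 * sigma ** 2)

mu1, mu2, sigma = 164.0, 178.0, 7.0  # arbitrary
mid = (mu1 + mu2) / 2
print(log_odds(mid, mu1, mu2, sigma))  # -> 0.0 at the midpoint
```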