Classification
Derek Hoiem, CS 598, Spring 2009
Jan 27, 2009

TRANSCRIPT

Page 1: Classification Derek Hoiem CS 598, Spring 2009 Jan 27, 2009

Classification

Derek Hoiem
CS 598, Spring 2009

Jan 27, 2009

Page 2

Outline

• Principles of generalization

• Survey of classifiers

• Project discussion

• Discussion of Rosch

Page 3

Pipeline for Prediction

Imagery → Representation → Classifier → Predictions

Page 4

No Free Lunch Theorem

Page 5

Bias and Variance

[Figure: error vs. model complexity; low complexity gives high bias / low variance, high complexity gives low bias / high variance]

Page 6

Overfitting
• Need validation set
• Validation set not same as test set
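One common way to realize the validation/test distinction is a three-way split; a minimal sketch in pure Python (the function name and split fractions are illustrative, not from the slides):

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve out disjoint validation and test sets.

    The validation set tunes hyperparameters (e.g. to detect overfitting);
    the test set is touched only once, for the final error estimate.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
```

The key property is disjointness: a hyperparameter chosen on `val` has never seen `test`, so the final test error remains an unbiased estimate.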

Page 7

Bias-Variance View of Features
• More compact = lower variance, potentially higher bias
• More features = higher variance, lower bias
• More independence among features = simpler classifier, lower variance

Page 8

How to reduce variance
• Parameterize model
  – E.g., linear vs. piecewise

Page 9

How to measure complexity?
• VC dimension

Upper bound on generalization error, holding with probability $1 - \eta$:

$$E_{\text{test}} \le E_{\text{train}} + \sqrt{\frac{h\left(\ln(2N/h) + 1\right) - \ln(\eta/4)}{N}}$$

where $N$ is the size of the training set and $h$ is the VC dimension.
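To see the bound's behavior numerically, here is a small sketch assuming Vapnik's standard form of the VC bound (the exact constants on the original slide may differ):

```python
import math

def vc_generalization_bound(train_error, n, h, eta=0.05):
    """Upper bound on test error, holding with probability 1 - eta.

    n: number of training examples; h: VC dimension of the hypothesis class.
    Assumes Vapnik's standard bound (an assumption about the slide's formula).
    """
    penalty = math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)
    return train_error + penalty

# More data tightens the bound; higher VC dimension loosens it.
loose = vc_generalization_bound(0.10, n=1_000, h=50)
tight = vc_generalization_bound(0.10, n=100_000, h=50)
```

Note the penalty shrinks roughly like sqrt(h/N): growing the training set or shrinking the hypothesis class both tighten the guarantee.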

Page 10

How to reduce variance
• Parameterize model
• Regularize
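As an illustration of regularization reducing variance, here is a hypothetical 1D ridge-style fit (an L2 penalty on the weight); the data, learning rate, and function name are made up for the example:

```python
def ridge_fit_1d(xs, ys, lam, steps=2000, lr=0.01):
    """Fit y ~ w*x by gradient descent on mean squared error + lam * w**2.

    The penalty shrinks w toward 0: larger lam -> lower variance, more bias.
    """
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2 * lam * w
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]              # true slope is 2
w_free = ridge_fit_1d(xs, ys, lam=0.0)   # recovers ~2.0
w_reg = ridge_fit_1d(xs, ys, lam=5.0)    # shrunk toward 0
```

With `lam=0` the fit matches the data exactly; any positive `lam` trades some bias (a smaller slope) for stability under noisy data.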

Page 11

How to reduce variance
• Parameterize model
• Regularize
• Increase number of training examples

Page 12

Effect of Training Size

[Figure: error (y-axis) vs. number of training examples (x-axis)]

Page 13

Risk Minimization
• Margins

[Figure: two classes (x's and o's) in feature space (x1, x2)]

Page 14

Classifiers
• Generative methods
  – Naïve Bayes
  – Bayesian Networks
• Discriminative methods
  – Logistic Regression
  – Linear SVM
  – Kernelized SVM
• Ensemble methods
  – Randomized Forests
  – Boosted Decision Trees
• Instance-based
  – K-nearest neighbor
• Unsupervised
  – K-means

Page 15

Components of classification methods
• Objective function
• Parameterization
• Regularization
• Training
• Inference

Page 16

Classifiers: Naïve Bayes
• Objective
• Parameterization
• Regularization
• Training
• Inference

[Figure: graphical model with class y as parent of features x1, x2, x3]

Page 17

Classifiers: Logistic Regression
• Objective
• Parameterization
• Regularization
• Training
• Inference
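A sketch of logistic regression trained by stochastic gradient ascent on the log-likelihood (no regularization; the toy 1D data and learning rate are illustrative):

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Maximize the log-likelihood of y in {0,1} under p = sigmoid(w.x + b).

    The per-example gradient of the log-likelihood is simply (y - p) * x.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1 / (1 + math.exp(-z))
            err = t - p
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict_logistic(w, b, x):
    """Threshold the linear score at 0 (equivalently, probability at 0.5)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

X = [[0.0], [1.0], [3.0], [4.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
```

Unlike Naïve Bayes, this directly models p(y | x) with a linear decision boundary, making no assumption about how x is generated.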

Page 18

Classifiers: Linear SVM
• Objective
• Parameterization
• Regularization
• Training
• Inference

[Figure: two classes (x's and o's) in feature space (x1, x2)]
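One illustrative way to train a linear SVM is subgradient descent on the regularized hinge loss (a Pegasos-style sketch with made-up data; not necessarily the formulation used in the lecture):

```python
def fit_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Minimize hinge loss + lam * ||w||^2 by per-example subgradient steps.

    Labels y are in {-1, +1}; the margin constraint is y * (w.x + b) >= 1,
    and lam controls the regularization (variance reduction).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin: hinge subgradient plus L2 shrinkage.
                w = [wi + lr * (t * xi - 2 * lam * wi) for wi, xi in zip(w, x)]
                b += lr * t
            else:
                # Outside the margin: only the L2 shrinkage applies.
                w = [wi * (1 - 2 * lr * lam) for wi in w]
    return w, b

def svm_score(w, b, x):
    """Signed distance proxy: positive -> class +1, negative -> class -1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

X = [[1.0, 1.0], [2.0, 1.5], [4.0, 4.0], [5.0, 4.5]]
y = [-1, -1, 1, 1]
w, b = fit_linear_svm(X, y)
```

Only points with margin below 1 push on the weights, which is why the solution depends on a few support vectors rather than on every example.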


Page 20

Classifiers: Linear SVM
• Objective
• Parameterization
• Regularization
• Training
• Inference

[Figure: two classes (x's and o's) in feature space (x1, x2); one o falls among the x's, so a hard-margin separator fails]

Needs slack

Page 21

Classifiers: Kernelized SVM
• Objective
• Parameterization
• Regularization
• Training
• Inference

[Figure: 1D data (x's and o's) along x1 is not linearly separable; after mapping x1 → (x1, x1²) the classes separate]
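The idea in the figure, lifting 1D inputs to (x1, x1²) so a linear boundary works, can be checked directly; a toy sketch (data invented for illustration):

```python
def feature_map(x1):
    """Lift a 1D input to (x1, x1**2). A kernelized SVM computes inner
    products in such a lifted space without building it explicitly."""
    return (x1, x1 ** 2)

# o's cluster near 0; x's lie on both sides. No single threshold on x1
# separates them, but a threshold on x1**2 does.
xs_o = [-0.5, 0.0, 0.5]
xs_x = [-3.0, -2.5, 2.5, 3.0]

def separable_1d(a, b):
    """True if some threshold splits the two 1D value sets."""
    return max(a) < min(b) or max(b) < min(a)

sep_raw = separable_1d(xs_o, xs_x)
sep_lifted = separable_1d([feature_map(v)[1] for v in xs_o],
                          [feature_map(v)[1] for v in xs_x])
```

This explicit map is only feasible for tiny feature spaces; the kernel trick (e.g. polynomial or RBF kernels) gets the same effect implicitly for very high-dimensional lifts.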

Page 22

Classifiers: Decision Trees
• Objective
• Parameterization
• Regularization
• Training
• Inference

[Figure: two classes (x's and o's) in feature space (x1, x2)]
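The greedy step a decision tree repeats at every node, picking the best axis-aligned split, can be sketched for a single node (a misclassification-count criterion on invented data; real trees usually use entropy or Gini):

```python
def best_stump(X, y):
    """Exhaustively pick the (feature, threshold, sign) axis-aligned split
    with the fewest training errors on labels in {-1, +1}."""
    best = None
    for f in range(len(X[0])):
        for thresh in sorted({x[f] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[f] > thresh else -sign for x in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, thresh, sign)
    return best

X = [[1.0, 5.0], [2.0, 4.0], [6.0, 1.0], [7.0, 2.0]]
y = [-1, -1, 1, 1]
errors, feature, thresh, sign = best_stump(X, y)
```

A full tree recurses on the two sides of this split; tree depth is the main bias/variance knob.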

Page 23

Ensemble Methods: Boosting

figure from Friedman et al. 2000

Page 24

Boosted Decision Trees

[Figure: two boosted decision trees for classifying image regions as Ground, Vertical, or Sky. Internal nodes test features such as Gray?, High in Image?, Many Long Lines?, Very High Vanishing Point?, Smooth?, Green?, and Blue?, with Yes/No branches leading to the three labels.]

[Collins et al. 2002]

P(label | good segment, data)

Page 25

Boosted Decision Trees
• How to control bias/variance trade-off
  – Size of trees
  – Number of trees
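As a sketch of the "number of weak learners" knob, here is a minimal discrete AdaBoost over 1D threshold stumps (toy data; not necessarily the exact variant used in the lecture):

```python
import math

def stump_1d(xs, ys, weights):
    """Best weighted threshold classifier on 1D data (the weak learner)."""
    best = None
    for thresh in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, t, w in zip(xs, ys, weights)
                      if (sign if x > thresh else -sign) != t)
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    return best

def adaboost_1d(xs, ys, rounds=3):
    """AdaBoost: upweight the examples the current ensemble gets wrong.

    The number of rounds plays the role of "number of trees" on the slide.
    """
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thresh, sign = stump_1d(xs, ys, weights)
        err = max(err, 1e-9)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thresh, sign))
        weights = [w * math.exp(-alpha * t * (sign if x > thresh else -sign))
                   for w, x, t in zip(weights, xs, ys)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def predict_boosted(ensemble, x):
    score = sum(a * (s if x > th else -s) for a, th, s in ensemble)
    return 1 if score > 0 else -1

# An interval pattern no single stump can fit, but three weighted stumps can.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [-1, 1, 1, -1]
model = adaboost_1d(xs, ys, rounds=3)
```

Each round adds capacity (lower bias); too many rounds on noisy data raises variance, which is why the round count and stump/tree size are the tuning knobs.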

Page 26

K-nearest neighbor
• Objective
• Parameterization
• Regularization
• Training
• Inference

[Figure: two classes (x's and o's) in feature space (x1, x2)]
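A minimal k-NN sketch (illustrative data); note there is no training step, and k itself acts as the regularizer:

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points.

    There is no model to fit: the "model" is the data. Larger k gives a
    smoother boundary (lower variance, higher bias).
    """
    by_dist = sorted(train,
                     key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), 'o'), ((1.5, 1.2), 'o'), ((0.8, 0.9), 'o'),
         ((5.0, 5.0), 'x'), ((5.2, 4.8), 'x'), ((4.9, 5.1), 'x')]
```

The cost shifts entirely to inference: each prediction scans the training set, so spatial indexes (k-d trees, etc.) matter at scale.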

Page 27

Clustering

[Figure: points in feature space (x1, x2), shown first unlabeled and then with cluster assignments (marked +)]
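A minimal K-means sketch (Lloyd's algorithm) with a simple first-k initialization; data and names are illustrative:

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    centers = list(points[:k])  # naive init; k-means++ is a better choice
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl
                   else centers[i] for i, cl in enumerate(clusters)]
    return centers

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers = kmeans(points, k=2)
```

As unsupervised learning, there are no labels to fit; the objective is purely the within-cluster squared distance, and different initializations can reach different local minima.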

Page 28

References
• General
  – Tom Mitchell, Machine Learning, McGraw Hill, 1997
  – Christopher Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995
• Adaboost
  – Friedman, Hastie, and Tibshirani, “Additive logistic regression: a statistical view of boosting”, Annals of Statistics, 2000
• SVMs
  – http://www.support-vector.net/icml-tutorial.pdf

Page 29

Project ideas?

Page 30

Discussion of Rosch