Steep learning curves

Reading: DH&S, Ch 4.6, 4.5

Posted on 21-Dec-2015

Administrivia

•HW1 due now

•Late days are ticking...

•No other news today.

Viewing and re-viewing

•Last time:

•HW1 FAQ

•5 minutes of math: function optimization

•Measuring performance

•Cross-validation

•Today:

•Learning curves

•Metrics

•The nearest-neighbor rule

Exercise

•Given the function:

•Find the extremum

•Show that the extremum is really a minimum

Mea culpa!

•I copied the wrong example out of the book.

•Oops. My bad.

•You guys did a great job figuring it out, though...

The saddle point

Cross-validation in words

•Shuffle the data vectors

•Break them into k chunks

•Train on the first k-1 chunks

•Test on the last chunk

•Repeat, with a different chunk held out each time

•Average all k test accuracies together
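The cross-validation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's reference code: the function name `k_fold_cv`, the `fit_predict(train_X, train_y, test_X)` callback, and the toy `majority_fit_predict` classifier are all hypothetical names introduced here. Shuffling indices (rather than X and y separately) keeps each feature vector paired with its label.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical classifier interface: fit on (train_X, train_y), return predictions for test_X.
FitPredict = Callable[[Sequence, Sequence, Sequence], List]


def k_fold_cv(X: Sequence, y: Sequence, k: int, fit_predict: FitPredict,
              seed: int = 0) -> Tuple[float, List[float]]:
    """Return (mean accuracy, list of per-fold accuracies) for k-fold cross-validation."""
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of data vectors")
    # Shuffle indices so each feature vector stays paired with its label.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Break into k contiguous chunks of (nearly) equal size.
    folds = [idx[i * n // k:(i + 1) * n // k] for i in range(k)]
    accuracies = []
    for held_out in range(k):
        # Train on the other k-1 chunks, test on the held-out chunk.
        test_idx = folds[held_out]
        train_idx = [j for f in range(k) if f != held_out for j in folds[f]]
        preds = fit_predict([X[j] for j in train_idx],
                            [y[j] for j in train_idx],
                            [X[j] for j in test_idx])
        correct = sum(p == y[j] for p, j in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    # Average all k test accuracies together.
    return sum(accuracies) / k, accuracies


def majority_fit_predict(train_X, train_y, test_X):
    """Toy stand-in classifier: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)
```

Any classifier can be plugged in through `fit_predict`; the majority-class one is only there so the block runs on its own.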

CV in pix

[Figure: original data [X;y] → random shuffle [X’;y’] → k-way partition into [X1’;y1’], [X2’;y2’], ..., [Xk’;yk’] → k train/test sets → k accuracies (53.7%, 85.1%, 73.2%, ...)]

But is it really learning?

•Now we know how well our models are performing

•But are they really learning?

•Maybe any classifier would do as well

•E.g., a default classifier (pick the most likely class) or a random classifier

•How can we tell if the model is learning anything?
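One way to answer that question is to compare against the baselines named above. The sketch below assumes labels are plain hashable values and uses hypothetical helper names (`majority_baseline`, `random_baseline`, `accuracy`); a learned model whose accuracy is not clearly above these numbers is probably not learning much.

```python
import random
from collections import Counter
from typing import List, Sequence


def majority_baseline(train_y: Sequence, n_test: int) -> List:
    """Default classifier: always predict the most frequent class in the training labels."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * n_test


def random_baseline(train_y: Sequence, n_test: int, seed: int = 0) -> List:
    """Random classifier: pick uniformly among the classes seen in training."""
    rng = random.Random(seed)
    classes = list(dict.fromkeys(train_y))  # distinct labels, first-seen order
    return [rng.choice(classes) for _ in range(n_test)]


def accuracy(preds: Sequence, truth: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


train_y = ["no"] * 7 + ["yes"] * 3
test_y = ["no", "no", "yes", "no"]
print(accuracy(majority_baseline(train_y, len(test_y)), test_y))  # 0.75
```

Here always answering "no" already scores 75%, so a learned model on this (made-up) data would need to beat that to show it is learning.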

The learning curve

•Train on successively larger fractions of the data

•Watch how accuracy (performance) changes

[Figure: accuracy vs. amount of training data, with curves labeled “Learning”, “Static classifier (no learning)”, and “Anti-learning (forgetting)”]
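A learning curve can be computed by training on growing fractions of the training set and scoring each model on a fixed test set. In this sketch the subsamples are nested (each is a prefix of one shuffled order), which is a design choice assumed here rather than something the slide specifies; `learning_curve` and the `fit_predict` callback are hypothetical names.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical classifier interface: fit on (train_X, train_y), return predictions for test_X.
FitPredict = Callable[[Sequence, Sequence, Sequence], List]


def learning_curve(train_X: Sequence, train_y: Sequence,
                   test_X: Sequence, test_y: Sequence,
                   fit_predict: FitPredict,
                   fractions: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5,
                                                 0.6, 0.7, 0.8, 0.9, 1.0),
                   seed: int = 0) -> List[Tuple[float, float]]:
    """Return (fraction of training data used, test accuracy) pairs."""
    order = list(range(len(train_X)))
    random.Random(seed).shuffle(order)
    curve = []
    for frac in fractions:
        # Use a prefix of one shuffled order, so each subsample contains the smaller ones.
        m = max(1, round(frac * len(order)))
        sub = order[:m]
        preds = fit_predict([train_X[i] for i in sub], [train_y[i] for i in sub], test_X)
        acc = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
        curve.append((frac, acc))
    return curve
```

Plotting accuracy against the fraction gives the curve: a rising curve suggests learning, a flat one a static classifier, and a falling one anti-learning.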

Measuring variance

•Cross-validation helps you get a better estimate of accuracy when you have little data

•Randomization (shuffling the data) helps guard against poor splits/ordering of the data

•Learning curves help assess learning rate/asymptotic accuracy

•Still one big missing component: variance

•Definition: Variance of a classifier is the fraction of error due to the specific data set it’s trained on

Measuring variance

•Variance tells you how much you expect your classifier/performance to change when you train it on a new (but similar) data set

•E.g., take 5 samplings of a data source; train/test 5 classifiers

•Accuracies: 74.2%, 90.3%, 58.1%, 80.6%, 90.3%

•Mean accuracy: 78.7%

•Std dev of acc: 13.4% (sample standard deviation, dividing by n-1 = 4)

•Variance is usually a function of both classifier and data source

•High variance classifiers are very susceptible to small changes in data
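The summary numbers above can be reproduced with Python's standard `statistics` module. The 13.4% figure matches the sample standard deviation (dividing by n-1); dividing by n instead gives about 12.0%.

```python
import statistics

# Accuracies (%) from the 5 train/test runs on the slide.
accuracies = [74.2, 90.3, 58.1, 80.6, 90.3]

mean_acc = statistics.mean(accuracies)    # 78.7
std_acc = statistics.stdev(accuracies)    # sample std dev (n - 1 denominator), about 13.39
pop_std = statistics.pstdev(accuracies)   # population std dev (n denominator), about 11.98

print(f"mean = {mean_acc:.1f}%, std = {std_acc:.1f}%")  # mean = 78.7%, std = 13.4%
```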

Putting it all together

•Suppose you want to measure the expected accuracy of your classifier, assess its learning rate, and measure its variance, all at the same time?

    for (i=0; i<10; ++i) {                        // variance reps
        shuffle data
        do 10-way CV partition of data
        for each train/test partition {           // xval
            for (pct=0.1; pct<=0.9; pct+=0.1) {   // LC
                subsample pct fraction of training set
                train on subsample, test on test set
            }
        }
        avg across all folds of CV partition
        generate learning curve for this partition
    }
    get mean and std across all curves
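Below is one possible runnable translation of the pseudocode, assuming a hypothetical `fit_predict(train_X, train_y, test_X)` callback as the classifier interface. It draws a fresh random subsample for each `pct`, averages the fold accuracies into one learning curve per repetition, and then takes the point-wise mean and sample standard deviation across the `reps` curves (so `reps` must be at least 2). The function name `repeated_cv_learning_curves` is illustrative only.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical classifier interface: fit on (train_X, train_y), return predictions for test_X.
FitPredict = Callable[[Sequence, Sequence, Sequence], List]

PCTS = tuple(round(0.1 * i, 1) for i in range(1, 10))  # 0.1, 0.2, ..., 0.9


def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def repeated_cv_learning_curves(X, y, fit_predict: FitPredict,
                                reps=10, k=10, pcts=PCTS, seed=0):
    """Return (mean curve, std curve) over `reps` shuffles of k-fold CV learning curves."""
    rng = random.Random(seed)
    n = len(X)
    curves = []                                   # one learning curve per variance rep
    for _ in range(reps):                         # variance reps
        idx = list(range(n))
        rng.shuffle(idx)                          # shuffle data
        folds = [idx[i * n // k:(i + 1) * n // k] for i in range(k)]  # k-way CV partition
        per_fold = []                             # per_fold[f][p] = accuracy
        for f in range(k):                        # xval
            test_idx = folds[f]
            train_idx = [j for g in range(k) if g != f for j in folds[g]]
            row = []
            for pct in pcts:                      # LC
                m = max(1, round(pct * len(train_idx)))
                sub = rng.sample(train_idx, m)    # subsample pct fraction of training set
                preds = fit_predict([X[j] for j in sub], [y[j] for j in sub],
                                    [X[j] for j in test_idx])
                row.append(accuracy(preds, [y[j] for j in test_idx]))
            per_fold.append(row)
        # Average across all folds of this CV partition: one learning curve for this rep.
        curves.append([statistics.mean(col) for col in zip(*per_fold)])
    # Point-wise mean and sample standard deviation across all curves.
    mean_curve = [statistics.mean(col) for col in zip(*curves)]
    std_curve = [statistics.stdev(col) for col in zip(*curves)]
    return mean_curve, std_curve
```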

Putting it all together

[Figure: results on the “hepatitis” data]

5 minutes of math...

•Decision trees are non-metric

•They don’t know anything about relations between instances, except the sets induced by feature splits

•Often, we have well-defined distances between points

•The idea of distance is encapsulated by a metric

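5 minutes of math...

•Definition: a metric function $d(\cdot,\cdot)$ is a function that obeys the following properties for all vectors $a$, $b$, $c$:

1. Non-negativity: $d(a,b) \ge 0$

2. Reflexivity: $d(a,b) = 0$ if and only if $a = b$

3. Symmetry: $d(a,b) = d(b,a)$

4. Triangle inequality: $d(a,b) + d(b,c) \ge d(a,c)$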
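Checking the four properties on sample points is a quick sanity test for a candidate distance function. This sketch (the names `check_metric`, `euclidean`, and `squared_euclidean` are introduced here for illustration) can only find counterexamples among the points it is given; passing is evidence, not proof. Squared Euclidean distance is included because it fails the triangle inequality: for the points 0, 1, 2 on a line it gives 1 + 1 < 4.

```python
import itertools
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def check_metric(d: Callable[[Vector, Vector], float], points: Sequence[Vector],
                 tol: float = 1e-9) -> bool:
    """Test the four metric properties on every pair/triple drawn from `points`."""
    for a, b in itertools.product(points, repeat=2):
        if d(a, b) < -tol:                                    # 1. non-negativity
            return False
        if (abs(d(a, b)) <= tol) != (tuple(a) == tuple(b)):   # 2. reflexivity: d = 0 iff a = b
            return False
        if abs(d(a, b) - d(b, a)) > tol:                      # 3. symmetry
            return False
    for a, b, c in itertools.product(points, repeat=3):
        if d(a, b) + d(b, c) < d(a, c) - tol:                 # 4. triangle inequality
            return False
    return True


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def squared_euclidean(a, b):
    # Not a metric: it can violate the triangle inequality.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
```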

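5 minutes of math...

•Euclidean distance: $d_E(x_a, x_b) = \sqrt{\sum_i (x_{a,i} - x_{b,i})^2}$, where $x_{a,i}$ is the $i$-th feature of $x_a$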

5 minutes of math

[Figure: two points xa and xb joined by the straight-line distance dE(xa,xb)]
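A direct sketch of Euclidean distance for equal-length feature vectors; `euclidean_distance` is a hypothetical helper name. (Python 3.8+ also provides `math.dist`, which computes the same quantity.)

```python
import math
from typing import Sequence


def euclidean_distance(xa: Sequence[float], xb: Sequence[float]) -> float:
    """d_E(xa, xb): square root of the sum of squared per-feature differences."""
    if len(xa) != len(xb):
        raise ValueError("vectors must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```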

5 minutes of math...

•Manhattan (taxicab) distance: $d_M(x_a, x_b) = \sum_i |x_{a,i} - x_{b,i}|$

•Distance travelled along a grid between two points

•No diagonals allowed

•Good for integer features

5 minutes of math

[Figure: two points xa and xb joined by a grid path of length dM(xa,xb)]
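Manhattan distance sums the absolute per-feature differences; `manhattan_distance` is a hypothetical helper name. For the points (0, 0) and (3, 4), the grid route has length 3 + 4 = 7, whereas the straight-line (Euclidean) distance is 5.

```python
from typing import Sequence


def manhattan_distance(xa: Sequence[float], xb: Sequence[float]) -> float:
    """d_M(xa, xb): sum of absolute per-feature differences (grid moves, no diagonals)."""
    if len(xa) != len(xb):
        raise ValueError("vectors must have the same number of features")
    return sum(abs(a - b) for a, b in zip(xa, xb))


print(manhattan_distance([0, 0], [3, 4]))  # 7
```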

5 minutes of math...

•What if some attribute is categorical?

•The typical answer is Hamming (sometimes “0/1”) distance:

•For each attribute, add 1 if the instances differ in that attribute, else 0
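The per-attribute rule above translates into a one-line sum; `hamming_distance` is a hypothetical name, and attribute values are assumed to be comparable with `!=`.

```python
from typing import Sequence


def hamming_distance(xa: Sequence[object], xb: Sequence[object]) -> int:
    """Add 1 for each attribute on which the two instances differ, else 0."""
    if len(xa) != len(xb):
        raise ValueError("instances must have the same number of attributes")
    return sum(a != b for a, b in zip(xa, xb))


print(hamming_distance(["red", "small", "round"], ["red", "large", "square"]))  # 2
```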

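Distances in classification

•Nearest neighbor rule: find the nearest instance to the query point in feature space, and return the class of that instance

•Simplest possible distance-based classifier

•With more notation: given training instances $x_1, \dots, x_N$ with labels $y_1, \dots, y_N$ and a query point $x$, predict $y_{j^*}$, where $j^* = \arg\min_j d(x, x_j)$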

•Distance here is “whatever’s appropriate to your data”
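A minimal 1-NN classifier might look like this. The class name and the `fit`/`predict` method names are assumptions made for the sketch (they follow a common convention but do not refer to any particular library), and Euclidean distance is only a default: any distance function appropriate to the data can be passed in. Ties go to the earliest stored instance, because `min` returns the first minimum it finds.

```python
import math
from typing import Callable, Hashable, List, Sequence

Distance = Callable[[Sequence, Sequence], float]


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class NearestNeighbor:
    """1-NN: fit() just stores the data; prediction scans it for the closest instance."""

    def __init__(self, distance: Distance = euclidean):
        self.distance = distance  # "whatever's appropriate to your data"
        self.X: List[Sequence] = []
        self.y: List[Hashable] = []

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, query):
        # j* = argmin_j d(query, x_j); return that instance's label.
        j_star = min(range(len(self.X)), key=lambda j: self.distance(query, self.X[j]))
        return self.y[j_star]

    def predict(self, queries):
        return [self.predict_one(q) for q in queries]


nn = NearestNeighbor().fit([[0, 0], [5, 5]], ["A", "B"])
print(nn.predict([[1, 1], [4, 6]]))  # ['A', 'B']
```

In this sketch `fit` only copies the data, and each prediction computes the distance to every one of the N stored instances.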

Properties of NN

•Training time of NN?

•Classification time?

•Geometry of model?

[Figure: distances d( , ) from a query point to training instances, with “Closer to” annotations]

NN miscellany

•Slight generalization: k-nearest neighbors (k-NN)

•Find the k training instances closest to the query point

•Vote among them for the label

•Q: How does this affect the system?
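A sketch of the k-NN vote, with hypothetical names (`knn_predict`, `euclidean`). Ties in the vote are broken in favor of the label that appears first among the distance-sorted neighbors, i.e. the label whose nearest member is closer; other tie-breaking rules are equally valid. The example shows that changing k can change the answer for the same query.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3, distance=euclidean) -> Hashable:
    """Find the k training instances closest to `query` and return their majority label."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training instances")
    # Sort training indices by distance to the query; keep the k nearest.
    nearest = sorted(range(len(train_X)), key=lambda j: distance(query, train_X[j]))[:k]
    votes = Counter(train_y[j] for j in nearest)
    # Counter.most_common lists equal counts in first-encountered order,
    # so a tied vote goes to the label seen first among the nearer neighbors.
    return votes.most_common(1)[0][0]


X = [[0], [1], [2], [10], [11]]
y = ["A", "A", "B", "B", "B"]
print(knn_predict(X, y, [1.6], k=1))  # B  (nearest point is [2])
print(knn_predict(X, y, [1.6], k=3))  # A  (neighbors [2], [1], [0] vote B, A, A)
```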