
Page 1:

Lazy Learning k-Nearest Neighbour

Motivation: availability of large amounts of processing power improves our ability to tune k-NN classifiers

Page 2:

What is Lazy Learning?

• Compare ANNs with CBR or k-NN classifiers
  – Artificial Neural Networks are eager learners
    • training examples compiled into a model at training time
    • not available at runtime
  – CBR or k-Nearest Neighbour are lazy
    • little offline learning done
    • work deferred to runtime

Compare the conventional use of lazy vs. eager evaluation in computer science

Page 3:

Outline

• Classification problems

• Classification techniques

• k-Nearest Neighbour
  – Condense training set
  – Feature selection
  – Feature weighting

• Ensemble techniques in ML

Page 4:

Classification problems

• Exemplar characterised by a set of features; decide class to which exemplar belongs

Compare regression problems

• Exemplar characterised by a set of features; decide the value of a continuous output (dependent) variable

Page 5:

Classifying apples and pears

No.     Greenness  Height  Width  Taste  Weight  Height/Width  Class
No. 1   210        60      62     Sweet  186     0.97          Apple
No. 2   220        70      51     Sweet  180     1.37          Pear
No. 3   215        55      55     Tart   152     1.00          Apple
No. 4   180        76      40     Sweet  152     1.90          Pear
No. 5   220        68      45     Sweet  153     1.51          Pear
No. 6   160        65      68     Sour   221     0.96          Apple
No. 7   215        63      45     Sweet  140     1.40          Pear
No. 8   180        55      56     Sweet  154     0.98          Apple
No. 9   220        68      65     Tart   221     1.05          Apple
No. 10  190        60      58     Sour   174     1.03          Apple

No. x   222        70      55     Sweet  185     1.27          ?

To what class does this belong?

Page 6:

Distance/Similarity Function

For query q and training set X (described by features F), compute d(q, x) for each x ∈ X, where

d(q, x) = Σ_{f ∈ F} w_f · δ(q_f, x_f)

and where

δ(q_f, x_f) = |q_f − x_f|   if f is continuous
δ(q_f, x_f) = 0             if f is discrete and q_f = x_f
δ(q_f, x_f) = 1             if f is discrete and q_f ≠ x_f

Category of q decided by its k Nearest Neighbours
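A minimal sketch of this distance in Python (the feature names, uniform weights, and the two records below are illustrative assumptions based on the apples/pears table, not values prescribed by the slides):

# Weighted distance d(q, x) = sum_f w_f * delta(q_f, x_f):
# 0/1 overlap for discrete features, absolute difference for continuous ones
# (continuous features would normally be rescaled/normalised first).

def delta(q_val, x_val, is_discrete):
    if is_discrete:
        return 0.0 if q_val == x_val else 1.0
    return abs(q_val - x_val)

def distance(q, x, weights, discrete):
    """q, x: dicts feature -> value; weights, discrete: dicts keyed by feature."""
    return sum(w * delta(q[f], x[f], discrete[f]) for f, w in weights.items())

features = ["Greenness", "Height", "Width", "Taste", "Weight"]
discrete = {f: (f == "Taste") for f in features}
weights = {f: 1.0 for f in features}   # uniform weights; feature weighting would tune these
q = {"Greenness": 222, "Height": 70, "Width": 55, "Taste": "Sweet", "Weight": 185}   # No. x
x = {"Greenness": 220, "Height": 70, "Width": 51, "Taste": "Sweet", "Weight": 180}   # No. 2
print(distance(q, x, weights, discrete))   # 2 + 0 + 4 + 0 + 5 = 11.0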

Page 7:

k-NN and Noise

• 1-NN easy to implement
  – susceptible to noise
  – a misclassification every time a noisy pattern is retrieved

• k-NN with k ≥ 3 will overcome this
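A small sketch of the effect in Python (the distances and labels are made up so that the single nearest example is a noisy, mislabelled pattern):

from collections import Counter

def knn_predict(neighbours, k):
    """neighbours: list of (distance, label); return the majority label of the k nearest."""
    nearest = sorted(neighbours)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

neighbours = [(0.9, "Pear"),   # noisy / mislabelled pattern, happens to be nearest
              (1.1, "Apple"),
              (1.3, "Apple"),
              (2.0, "Apple")]
print(knn_predict(neighbours, k=1))   # Pear  -> 1-NN misclassifies
print(knn_predict(neighbours, k=3))   # Apple -> the two clean neighbours outvote the noise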

Page 8:

e.g. Pregnancy prediction

http://svr-www.eng.cam.ac.uk/projects/qamc/

Page 9:

e.g. MVT

• Machine Vision for inspection of PCBs
  – components present or absent
  – solder joints good or bad

Page 10:

Components present?

[Example images: Absent / Present]

Page 11:

Characterise image as a set of features

type        name  Wid2  Wid3  CenX  CenY  M1   Sig1  M2   Sig2  M3   Sig3  Min2
c0402_mvc   c815   556  1344     3    28  134     7   61    16  109     5    51
c0402_mvc   c804  1221  1253   -20   -49  127    30   78    34   97    39    54
c0402_mvc   c802   441  1189   -45   -52  122    28   91    24   89    40    68
c0402_mvc   c808   532  1294    59    60  130    23   74    29  138     9    58
c0402_mvc   c806  1384  1492    -9    65  140     6   72    15  144    13    62
c0402_mvc   c605   943  1278    51    -9  116    29   68    28  139     7    54
c0402_mvc   c813  1446  1462   209    48   93    15  139    29  162     6   100
c0402_mvc   c606  1219  1302    40    -8  161     7   93    25  135     3    65
c0402_mvc   c710  1113  1128   -99   -13  145     6   95    40   88    38    56
c0402_mvc   c703  1090  1386   -56   -18  149    11   72    28  147    14    52
c0402_mvc   c761  1214  1203   -95   -21  149    11   77    34  113    40    56
c0402_mvc   c701  1487  1296   -30    33  142     9   73    28  135    12    54
c0402_mvc   c732  1038  1196   -19    -3  148     8   62    10  100    44    56
c0402_mvc   c753  1015  1288    58   -16  123    13   73    35  128     8    54
c0402_mvc   c751  1146  1036  -163   -25  140     5  102    34   85     2    80
c0402_mvc   c760  1113  1091  -121    44  133    11   94    44   96    37    57

Page 12:

Classification techniques

• Artificial Neural Networks
  – also good for non-linear regression
  – black box
    • development tricky
    • users do not know what is going on

• Decision Trees
  – built using induction (information theoretic analysis)

• k-Nearest Neighbour classifiers
  – keep training examples, find the k nearest at run time

Page 13:

Dimension reduction in k-NN

• Not all features required
  – noisy features are a hindrance

• Some examples redundant
  – retrieval time depends on the number of examples

[Diagram: feature selection reduces p features to the q best features; condensing reduces m examples to n covering examples]

Page 14:

Condensed NN

D: set of training samples. Find E, where E ⊂ D; the NN rule used with E should be as good as with D.

choose x ∈ D randomly, D ← D \ {x}, E ← {x},
DO
  learning? ← FALSE,
  FOR EACH x ∈ D
    classify x by NN using E,
    if classification incorrect
    then E ← E ∪ {x}, D ← D \ {x}, learning? ← TRUE,
WHILE (learning? ≠ FALSE)
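A direct Python rendering of the algorithm above, using 1-NN with a plain squared Euclidean distance; the toy data at the end is an assumption for illustration:

import random

def nn_label(features, E):
    """Label of the example in E nearest to the given feature vector."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(E, key=lambda e: sq_dist(e[0], features))[1]

def condensed_nn(D):
    """D: list of (features, label) pairs. Returns the condensed subset E."""
    D = D[:]                                   # work on a copy
    E = [D.pop(random.randrange(len(D)))]      # choose x from D randomly, move it to E
    learning = True
    while learning:                            # repeat passes until a pass adds nothing
        learning = False
        for x in D[:]:
            if nn_label(x[0], E) != x[1]:      # misclassified by NN using E
                E.append(x)
                D.remove(x)
                learning = True
    return E

# Two well-separated 1-D classes; E typically keeps only a few boundary examples
D = [((v,), "A") for v in (1, 2, 3, 4)] + [((v,), "B") for v in (6, 7, 8, 9)]
print(condensed_nn(D))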

Page 15:

Condensed NN

100 examples, 2 categories

Different CNN solutions

Page 16:

Improving Condensed NN

• Sort data based on distance to nearest unlike neighbour (NUN); see the sketch after this list
  – identifies exemplars near the decision surface
  – in the diagram, B (close to the decision surface) is more useful than A

• Different outcomes depending on data order
  – that's a bad thing in an algorithm
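A sketch of the ordering step in Python (assuming numeric feature vectors with class labels; the data values are illustrative):

def nun_distance(example, data):
    """Distance from an example to its Nearest Unlike Neighbour (NUN)."""
    feats, label = example
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return min(dist(feats, f) for f, l in data if l != label)

def order_by_nun(data):
    """Put examples closest to the decision surface (smallest NUN distance) first."""
    return sorted(data, key=lambda x: nun_distance(x, data))

# The examples at 4.5 and 6.0 sit near the class boundary and come out first,
# like exemplar B in the diagram; the deep-interior example at 1.0 comes last.
data = [((1.0,), "apple"), ((4.5,), "apple"), ((6.0,), "pear"), ((9.0,), "pear")]
print(order_by_nun(data))

Feeding the training data to condensed NN in this fixed order removes the dependence on presentation order.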

Page 17:

Condensed NN

100 examples, 2 categories

Different CNN solutions

CNN using NUN

Page 18:


Feature selection

No.     Greenness  Height  Width  Taste  Weight  Height/Width  Class
No. 1   210        60      62     Sweet  186     0.97          Apple
No. 2   220        70      51     Sweet  180     1.37          Pear
No. 3   215        55      55     Tart   152     1.00          Apple
No. 4   180        76      40     Sweet  152     1.90          Pear
No. 5   220        68      45     Sweet  153     1.51          Pear
No. 6   160        65      68     Sour   221     0.96          Apple
No. 7   215        63      45     Sweet  140     1.40          Pear
No. 8   180        55      56     Sweet  154     0.98          Apple
No. 9   220        68      65     Tart   221     1.05          Apple
No. 10  190        60      58     Sour   174     1.03          Apple

• Irrelevant features are noise:
  – they make classification harder

• Extra features add to computation cost: retrieval time grows with the number of examples m and the number of features p
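The slides do not name a particular selection method; as one illustrative sketch (all names and data below are assumptions), a wrapper-style forward selection that greedily keeps whichever feature most improves leave-one-out 1-NN accuracy:

def loo_accuracy(data, feats):
    """Leave-one-out accuracy of 1-NN using only the feature indices in feats."""
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in feats)
    correct = 0
    for i, (xf, xlabel) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        nearest = min(rest, key=lambda e: dist(e[0], xf))
        correct += (nearest[1] == xlabel)
    return correct / len(data)

def forward_select(data, n_features):
    selected, remaining, best_acc = [], list(range(n_features)), 0.0
    while remaining:
        # try adding each remaining feature, keep the one giving the best accuracy
        acc, f = max((loo_accuracy(data, selected + [f]), f) for f in remaining)
        if acc <= best_acc:
            break                      # stop when no feature improves accuracy
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected

# Toy data: feature 0 is informative, feature 1 is random noise
data = [((1.0, 0.3), "apple"), ((1.2, 0.9), "apple"), ((0.8, 0.5), "apple"),
        ((3.0, 0.4), "pear"), ((3.2, 0.8), "pear"), ((2.9, 0.1), "pear")]
print(forward_select(data, n_features=2))   # [0]: the noisy feature is left out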

Page 19:

Ensemble techniques

• For the user with more machine cycles than they know what to do with

[Diagram: several Classifiers feed a Combiner, which produces the Outcome]

• Build several classifiers
  – different training subsets
  – different feature subsets

• Aggregate results
  – voting
    • vote based on generalisation error
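A minimal sketch of this combination in Python (bootstrap-resampled training subsets, random feature subsets, and a plain unweighted vote; weighting votes by generalisation error is left out, and everything here is an assumed illustration):

import random
from collections import Counter

def knn_vote(query, data, feats, k=3):
    """Majority label among the k nearest training examples, using only feats."""
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in feats)
    nearest = sorted(data, key=lambda e: dist(e[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def ensemble_predict(query, data, n_features, n_members=5, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_members):
        subset = [rng.choice(data) for _ in data]                        # different training subset
        feats = rng.sample(range(n_features), max(1, n_features // 2))   # different feature subset
        votes.append(knn_vote(query, subset, feats))
    return Counter(votes).most_common(1)[0][0]                           # aggregate by voting

data = [((1.0, 0.3), "apple"), ((1.2, 0.9), "apple"), ((0.8, 0.5), "apple"),
        ((3.0, 0.4), "pear"), ((3.2, 0.8), "pear"), ((2.9, 0.1), "pear")]
print(ensemble_predict((1.1, 0.6), data, n_features=2))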

Page 20:

Conclusions

• Finding a covering set of training data
  – very good solutions exist

• Compare with results of Ensemble techniques