Instance-Based Learning (k Nearest Neighbors)


SAD Tagus 2004/05 H. Galhardas

The k-Nearest Neighbor Algorithm (1)

All instances correspond to points in the n-dimensional space.

Given an unknown tuple, the k-NN classifier searches the pattern space for the k training tuples that are closest to the unknown tuple.

The nearest neighbor is defined in terms of Euclidean distance:

$$d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{ip} - x_{jp}|^2}$$

where x_i = (x_i1, ..., x_ip) and x_j = (x_j1, ..., x_jp) are two instances described by p attributes.

The target function could be discrete- or real-valued.

For a discrete-valued target function, k-NN returns the most common value among the k training examples nearest to the query instance x_q.
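The distance and the majority vote described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation: the function names `euclidean` and `knn_classify` and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(train, query, k):
    """Return the most common label among the k training tuples nearest to query.

    train is a list of (point, label) pairs. Vote ties go to the label
    that was seen first among the neighbors (closest first).
    """
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two attributes per tuple, two classes.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]

print(euclidean((0.0, 0.0), (3.0, 4.0)))     # 5.0
print(knn_classify(train, (1.2, 1.4), k=3))  # A
```

Sorting the whole training set for every query costs O(N log N) per prediction, which is acceptable for a sketch but is exactly the query-time cost that indexing structures try to reduce.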


The k-Nearest Neighbor Algorithm (2)

Key idea: just store all training examples <x_i, f(x_i)>.

Nearest neighbor: given query instance x_q, first locate the nearest training example x_n, then estimate f(x_q) = f(x_n).

k-nearest neighbor: given x_q,

take a vote among its k nearest neighbors (if the target function is discrete-valued);

take the mean of the f values of its k nearest neighbors (if the target function is real-valued):

$$f(x_q) = \frac{1}{k} \sum_{i=1}^{k} f(x_i)$$
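For the real-valued case, the estimate is simply the mean of the stored f values of the k nearest examples. The sketch below assumes Python 3.8+ (for `math.dist`); the name `knn_regress` and the one-attribute data are hypothetical.

```python
import math

def knn_regress(train, query, k):
    """Estimate f(query) as the mean of f over the k nearest training examples.

    train is a list of (point, value) pairs; k must not exceed len(train).
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(value for _, value in nearest) / k

# Hypothetical stored examples <x_i, f(x_i)> with a single attribute.
train = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 6.0), ((10.0,), 50.0)]

print(knn_regress(train, (1.0,), k=1))  # 2.0, the nearest-neighbor rule f(x_q) = f(x_n)
print(knn_regress(train, (1.0,), k=3))  # 3.0, the mean of 2.0, 1.0 and 6.0
```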


Lazy vs Eager Learning

Instance-based learning: lazy evaluation.

Decision-tree and Bayesian classification: eager evaluation.

Key differences:

Lazy: may consider the query instance x_q when deciding how to generalize beyond the training data D.

Eager: cannot, since it has already chosen its global approximation by the time it sees the query.

Efficiency: lazy learning spends less time training but more time predicting.

Accuracy:

Lazy: effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function.

Eager: must commit to a single hypothesis that covers the entire instance space.


When to Consider Nearest Neighbors

Instances map to points in R^N.

Fewer than 20 attributes per instance, typically normalized.

Lots of training data.

Advantages:

Training is very fast.

Learns complex target functions.

Does not lose information.

Disadvantages:

Slow at query time. Presorting and indexing the training samples into search trees reduces query time.

Easily fooled by irrelevant attributes.
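Normalization matters because an attribute with a large numeric range would otherwise dominate the Euclidean distance. The slide does not say which normalization is meant; the sketch below assumes min-max scaling to [0, 1], and the function name and data are hypothetical.

```python
def min_max_normalize(points):
    """Rescale every attribute to [0, 1]; constant attributes map to 0.0."""
    columns = list(zip(*points))          # one tuple per attribute
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(point, lows, highs))
        for point in points
    ]

# Hypothetical tuples: (age in years, income in euros).
raw = [(20, 10000), (30, 20000), (40, 30000)]
print(min_max_normalize(raw))  # [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```

A query instance has to be rescaled with the same minima and maxima as the training data before distances are computed.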


How to Determine a Good Value for K?

K is determined experimentally.

Start with K = 1 and use a test set to validate the error rate of the classifier.

Repeat with K = K + 1.

Choose the value of K for which the error rate is minimum.
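The procedure above can be sketched as a simple loop over K. The upper bound `max_k`, the helper names, and the toy data (which contain one deliberately noisy training point) are assumptions for illustration; the slides do not say when to stop increasing K.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k training points nearest to query."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_rate(train, test, k):
    """Fraction of test examples that k-NN misclassifies."""
    wrong = sum(1 for x, y in test if knn_classify(train, x, k) != y)
    return wrong / len(test)

def choose_k(train, test, max_k):
    """Try K = 1, 2, ..., max_k; return (best K, its error rate).

    The smallest K wins ties because only a strictly lower error replaces it.
    """
    best_k, best_err = 1, error_rate(train, test, 1)
    for k in range(2, max_k + 1):
        err = error_rate(train, test, k)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

# Hypothetical 1-D data; the "o" at 2.4 is a noisy point among the "x" examples.
train = [((1.0,), "x"), ((2.0,), "x"), ((3.0,), "x"), ((2.4,), "o"),
         ((7.0,), "o"), ((8.0,), "o"), ((9.0,), "o")]
test = [((2.5,), "x"), ((1.5,), "x"), ((7.5,), "o"), ((8.5,), "o")]

print(error_rate(train, test, 1))      # 0.25, the noisy point fools 1-NN
print(choose_k(train, test, max_k=5))  # (3, 0.0)
```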


Definition of Voronoi Diagram

The decision surface induced by 1-NN for a typical set of training examples.

[Figure: Voronoi diagram for a set of training examples labeled + and -, with query point x_q.]


1-Nearest Neighbor

[Figure: query point q_f and its nearest neighbor q_i.]


3-Nearest Neighbors

[Figure: query point q_f and its 3 nearest neighbors: 2 x, 1 o.]


7-Nearest Neighbors

[Figure: query point q_f and its 7 nearest neighbors: 3 x, 4 o.]
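The three figures show that the predicted class can change with k: with 3 neighbors the vote is 2 x against 1 o, with 7 neighbors it is 3 x against 4 o. The sketch below reproduces those vote counts on a hypothetical one-dimensional layout; the coordinates are invented for the example and are not taken from the figures.

```python
import math
from collections import Counter

def neighbor_votes(train, query, k):
    """Count the class labels of the k training points nearest to query."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest)

# Hypothetical layout: labels ordered by increasing distance from the query.
train = [((1.0,), "x"), ((2.0,), "o"), ((3.0,), "x"), ((4.0,), "o"),
         ((5.0,), "o"), ((6.0,), "x"), ((7.0,), "o"), ((20.0,), "x")]
query = (0.0,)

for k in (1, 3, 7):
    votes = neighbor_votes(train, query, k)
    print(k, dict(votes), "->", votes.most_common(1)[0][0])
# 1 {'x': 1} -> x
# 3 {'x': 2, 'o': 1} -> x
# 7 {'x': 3, 'o': 4} -> o
```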


Distance-Weighted k-NN

Give more weight to neighbors closer to the query point:

$$\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$$

where w_i = K(d(x_q, x_i)) and d(x_q, x_i) is the distance between x_q and x_i.
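The slide leaves the kernel K unspecified. A common choice is the inverse squared distance, K(d) = 1/d^2, and the sketch below assumes it; the function names and data are hypothetical. When the query coincides with a stored example the weight would be infinite, so the sketch returns that example's value directly.

```python
import math

def inverse_square(d):
    """Assumed kernel for this sketch: K(d) = 1 / d^2."""
    return 1.0 / (d * d)

def weighted_knn_regress(train, query, k, kernel=inverse_square):
    """Distance-weighted mean of f over the k nearest training examples."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    for point, value in nearest:
        if math.dist(point, query) == 0.0:
            return value  # exact match: avoid dividing by zero
    weights = [kernel(math.dist(point, query)) for point, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Hypothetical stored examples <x_i, f(x_i)>.
train = [((1.0,), 10.0), ((3.0,), 20.0), ((10.0,), 100.0)]

print(weighted_knn_regress(train, (2.0,), k=2))  # 15.0, both neighbors equally far
print(weighted_knn_regress(train, (1.5,), k=2))  # pulled toward 10.0, the closer neighbor
```

With k = 2 and the query at 1.5, the weights are 4 and 1/2.25, so the estimate is about 11 rather than the unweighted mean of 15.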


Curse of Dimensionality

Imagine instances described by 20 attributes, of which only 3 are relevant to the target function.

Curse of dimensionality: nearest neighbor is easily misled when the instance space is high-dimensional.

One approach:

Stretch the j-th axis by weight z_j, where z_1, ..., z_n are chosen to minimize prediction error.

Use cross-validation to automatically choose the weights z_1, ..., z_n.

Note that setting z_j to zero eliminates this dimension altogether (feature subset selection).
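Stretching the axes amounts to multiplying each coordinate difference by z_j before computing the Euclidean distance. The sketch below only shows the effect of a given weight vector; it does not implement the cross-validation search for the weights, and the names and data are hypothetical.

```python
import math

def stretched_distance(a, b, z):
    """Euclidean distance after stretching the j-th axis by weight z[j]."""
    return math.sqrt(sum((zj * (aj - bj)) ** 2
                         for aj, bj, zj in zip(a, b, z)))

def nearest_label(train, query, z):
    """1-NN prediction under the stretched distance."""
    return min(train, key=lambda pair: stretched_distance(pair[0], query, z))[1]

# Hypothetical data: attribute 0 determines the class, attribute 1 is irrelevant.
train = [((0.0, 9.0), "neg"), ((1.0, 0.0), "pos")]
query = (0.1, 0.5)  # attribute 0 is close to the "neg" example

print(nearest_label(train, query, z=(1.0, 1.0)))  # pos, misled by the irrelevant attribute
print(nearest_label(train, query, z=(1.0, 0.0)))  # neg, the irrelevant axis is eliminated
```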


Bibliography

Data Mining: Concepts and Techniques, J. Han & M. Kamber, Morgan Kaufmann, 2001 (Sect. 7.7.1)

Machine Learning, Tom Mitchell, McGraw-Hill, 1997 (Ch. 8)
