Instance-based learning (k-nearest neighbors)
Instance-Based Learning (K-Nearest Neighbors)
SAD Tagus 2004/05 H. Galhardas
The k-Nearest Neighbor Algorithm (1)
All instances correspond to points in the n-D space. Given an unknown tuple, the k-NN classifier searches the pattern space for the k training tuples that are closest to the unknown tuple.
The target function could be discrete- or real-valued. For discrete-valued, k-NN returns the most common value among the k training examples nearest to x_q.
The nearest neighbor is defined in terms of Euclidean distance:
d(x_i, x_j) = \sqrt{ |x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{ip} - x_{jp}|^2 }
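A minimal sketch of this search step in Python, assuming numeric feature vectors; the names X_train and x_query and the toy data are illustrative, not from the slides.

```python
import numpy as np

# Illustrative training tuples (rows are points in n-D space)
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [6.0, 5.0]])
x_query = np.array([2.0, 2.0])  # the unknown tuple

# Euclidean distance from the query to every training tuple
dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))

k = 3
# Indices of the k training tuples closest to the unknown tuple
nearest = np.argsort(dists)[:k]
print(nearest, dists[nearest])
```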
The k-Nearest Neighbor Algorithm (2)
Key idea: just store all training examples <x_i, f(x_i)>
Nearest neighbor: given query instance x_q, first locate the nearest training example x_n, then estimate f(x_q) = f(x_n)
K-nearest neighbor: given x_q, take a vote among its k nearest neighbors (if discrete-valued target function); take the mean of the f values of the k nearest neighbors (if real-valued):
f(x_q) = \sum_{i=1}^{k} f(x_i) / k
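A short sketch of both prediction rules, reusing the setup from the previous sketch; the label arrays y_class and y_real are invented for illustration.

```python
import numpy as np
from collections import Counter

X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [6.0, 5.0]])
x_query = np.array([2.0, 2.0])
k = 3
nearest = np.argsort(np.linalg.norm(X_train - x_query, axis=1))[:k]

# Discrete-valued target: return the most common value among the k
y_class = np.array(["+", "+", "-", "-"])
f_discrete = Counter(y_class[nearest]).most_common(1)[0][0]

# Real-valued target: return the mean, f(x_q) = sum_i f(x_i) / k
y_real = np.array([0.5, 1.0, 2.0, 9.0])
f_real = y_real[nearest].mean()
print(f_discrete, f_real)
```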
Lazy vs Eager Learning
Instance-based learning: lazy evaluation
Decision-tree and Bayesian classification: eager evaluation
Key differences:
Lazy: may consider the query instance x_q when deciding how to generalize beyond the training data D
Eager: cannot, since it has already committed to a global approximation by the time it sees the query
Efficiency: lazy learners spend less time training but more time predicting
Accuracy:
Lazy: effectively uses a richer hypothesis space since it uses many local linear functions to form its implicit global approximation to the target function
Eager: must commit to a single hypothesis that covers the entire instance space
When to Consider Nearest Neighbors
Instances map to points in R^N
Less than 20 attributes per instance, typically normalized
Lots of training data
Advantages:
Training is very fast
Can learn complex target functions
Does not lose information
Disadvantages:
Slow at query time
Presorting and indexing training samples into search trees reduces query time (see the sketch after this list)
Easily fooled by irrelevant attributes
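The slide only says "search trees"; one concrete (assumed) choice is a k-d tree, for example scipy's cKDTree. The data here is random filler, just to show the build-once, query-fast pattern.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 5))  # lots of training data, few attributes

tree = cKDTree(X_train)            # presort/index once, up front
dist, idx = tree.query(rng.random(5), k=7)  # fast k-NN lookup at query time
print(idx, dist)
```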
How to determine a good value for K?
Determined experimentally:
Start with K=1 and use a test set to validate the error rate of the classifier
Repeat with K=K+1
Choose the value of K for which the error rate is minimum
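A sketch of this procedure, with an invented toy dataset, a plain majority-vote k-NN, and a simple train/test split; all names and the range of K values tried are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training examples."""
    preds = []
    for x in X_test:
        nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
        values, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

rng = np.random.default_rng(1)
X = rng.random((200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy target function
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Start with K=1, measure the error rate, repeat with K=K+1,
# and keep the K with the minimum error
best_k, best_err = None, 1.0
for k in range(1, 20):
    err = np.mean(knn_predict(X_train, y_train, X_test, k) != y_test)
    if err < best_err:
        best_k, best_err = k, err
print(best_k, best_err)
```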
Definition of Voronoi Diagram
The decision surface induced by 1-NN for a typical set of training examples.
[Figure: Voronoi diagram — the decision surface induced by 1-NN around a query point x_q, among + and − training examples.]
1-Nearest Neighbor
[Figure: query point q_f and its single nearest neighbor q_i.]
3-Nearest Neighbors
[Figure: query point q_f and its 3 nearest neighbors — 2 of class x, 1 of class o.]
7-Nearest Neighbors
[Figure: query point q_f and its 7 nearest neighbors — 3 of class x, 4 of class o.]
Distance-Weighted k-NN
Give more weight to neighbors closer to the query point:
\hat{f}(x_q) = \sum_{i=1}^{k} w_i f(x_i) / \sum_{i=1}^{k} w_i
where w_i = K(d(x_q, x_i)) and d(x_q, x_i) is the distance between x_q and x_i
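A sketch of the weighted rule for a real-valued target. The kernel w_i = 1/d² is a common choice but an assumption here, since the slide leaves K(·) unspecified; the data is invented.

```python
import numpy as np

def weighted_knn(X_train, y_train, x_q, k):
    d = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(d)[:k]
    if np.any(d[nearest] == 0):  # query coincides with a training point
        return y_train[nearest][d[nearest] == 0][0]
    w = 1.0 / d[nearest] ** 2    # assumed kernel: closer neighbors weigh more
    return np.sum(w * y_train[nearest]) / np.sum(w)

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y_train = np.array([1.0, 2.0, 2.0, 10.0])
print(weighted_knn(X_train, y_train, np.array([0.1, 0.1]), k=3))
```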
Curse of Dimensionality
Imagine instances described by 20 attributes, but only 3 are relevant to the target function.
Curse of dimensionality: nearest neighbor is easily misled when the instance space is high-dimensional.
One approach:
Stretch the j-th axis by weight z_j, where z_1, …, z_n are chosen to minimize prediction error
Use cross-validation to automatically choose the weights z_1, …, z_n
Note that setting z_j to zero eliminates that dimension altogether (feature subset selection)
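A sketch of the axis-stretching idea. The weights z_j are fixed by hand here, zeroing the 17 attributes assumed irrelevant; in practice cross-validation would choose them, as the slide says.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((100, 20))   # 20 attributes per instance
# Assumption for illustration: only the first 3 attributes are relevant
z = np.zeros(20)
z[:3] = 1.0                 # z_j = 0 eliminates a dimension altogether

x_q = rng.random(20)
# Weighted Euclidean distance: stretched axes dominate, zeroed axes vanish
dists = np.sqrt((((X - x_q) * z) ** 2).sum(axis=1))
print(np.argsort(dists)[:5])  # neighbors under the weighted metric
```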
Bibliography
Data Mining: Concepts and Techniques, J. Han & M. Kamber, Morgan Kaufmann, 2001 (Sect. 7.7.1)
Machine Learning, Tom Mitchell, McGraw-Hill, 1997 (Chap. 8)