Instance-Based Learning
Nearest Neighbor
• Remember all your data
• When someone asks a question
– Find the nearest old data point
– Return the answer associated with it
• In order to say what point is nearest, we have to define what we mean by "near".
• Typically, we use Euclidean distance between two points.
$$d(x^{(1)}, x^{(2)}) = \sqrt{(a_1^{(1)} - a_1^{(2)})^2 + (a_2^{(1)} - a_2^{(2)})^2 + \cdots + (a_k^{(1)} - a_k^{(2)})^2}$$
Nominal attributes: distance is set to 1 if values are different, 0 if they are equal
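As a quick illustration, here is a minimal sketch of this mixed distance measure in Python (the function name and example values are my own): numeric attributes contribute squared differences, nominal ones contribute 0 or 1.

```python
import math

def distance(x, y, nominal):
    """x, y: attribute tuples; nominal: set of indices of nominal attributes."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in nominal:
            total += 0.0 if a == b else 1.0   # nominal: 0 if equal, 1 if different
        else:
            total += (a - b) ** 2             # numeric: squared difference
    return math.sqrt(total)

# Hypothetical example: two numeric features and one nominal one (index 2)
print(distance((0.3, 2.0, "red"), (0.5, 1.0, "blue"), nominal={2}))
```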
Predicting Bankruptcy
• Now, let's say we have a new person with R equal to 0.3 and L equal to 2.
• What y value should we predict?
• The nearest training point is a "no", and so our answer would be "no".
Scaling
• The naïve Euclidean distance isn't always appropriate.
• Consider the case where we have two features describing a car.
– f1 = weight in pounds
– f2 = number of cylinders.
• Any effect of f2 will be completely lost because of the relative scales.
• So, rescale the inputs to put all of the features on about equal footing:
$$a_i = \frac{v_i - \min v_i}{\max v_i - \min v_i}$$

where $v_i$ is the raw value of feature $i$ and the min and max are taken over the training set.
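A small sketch of this rescaling in Python (helper name and example values are illustrative), mapping every feature into [0, 1]:

```python
def rescale(dataset):
    """dataset: list of equal-length numeric feature tuples."""
    cols = list(zip(*dataset))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi))
            for row in dataset]

cars = [(3200, 4), (4100, 6), (5200, 8)]   # (weight in pounds, cylinders)
print(rescale(cars))   # both features now lie on the same [0, 1] footing
```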
Time and Space
• Learning is fast
– We just have to remember the training data.
• Space is O(n): we store all n training points.
• What takes longer is answering a query.
• If we do it naively, then for each of the n points in the training set we compute the distance to the query point, which takes about m computations since there are m features to compare.
• So, overall, answering a query takes about m · n time.
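A naive query in code makes this cost visible: one distance computation per training point, each touching all m features (function and variable names are my own).

```python
def nearest_neighbor(query, data):
    """data: list of (features, label); returns the label of the closest point."""
    best_label, best_dist = None, float("inf")
    for features, label in data:          # n iterations ...
        d = sum((a - b) ** 2 for a, b in zip(features, query))  # ... of m steps each
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label
```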
Remedy: K-Nearest Neighbors
• k-nearest neighbor algorithm:
– Just like the old algorithm, except that when we get a query, we search for the k closest points to the query point.
– Output what the majority of the k neighbors says.
• In this case, we've chosen k to be 3.
– The three closest points consist of two "no"s and a "yes", so our answer would be "no".
• Find the optimal k using cross-validation, as in the sketch below.
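A hedged sketch of k-nearest neighbors with majority voting, plus leave-one-out cross-validation to pick k (helper names are mine; squared distances are used since they give the same ranking as Euclidean distances):

```python
from collections import Counter

def knn_predict(query, data, k):
    """data: list of (features, label); returns the majority label of the k closest."""
    ranked = sorted(data, key=lambda p: sum((a - b) ** 2
                                            for a, b in zip(p[0], query)))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]     # majority vote among the k closest

def best_k(data, candidates=(1, 3, 5, 7)):
    """Pick the k with the fewest leave-one-out errors on the training data."""
    def loo_errors(k):
        return sum(knn_predict(x, data[:i] + data[i+1:], k) != y
                   for i, (x, y) in enumerate(data))
    return min(candidates, key=loo_errors)
```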
Other Variants
• IB2: save memory, speed up classification
– Work incrementally
– Only incorporate misclassified instances
– Problem: noisy data gets incorporated
• IB3: deal with noise
– Discard instances that don't perform well
– Keep a record of the number of correct and incorrect classification decisions that each exemplar makes.
– Two predetermined thresholds are set on the success ratio:
• If an exemplar's performance falls below the lower threshold, it is deleted.
• If its performance exceeds the upper threshold, it is used for prediction.
Instance-based learning: IB2
• IB2: save memory, speed up classification
– Work incrementally
– Only incorporate misclassified instances
– Problem: noisy data gets incorporated
Data: “Who buys gold jewelry”
(25,60,no) (45,60,no) (50,75,no) (50,100,no)
(50,120,no) (70,110,yes) (85,140,yes) (30,260,yes)
(25,400,yes) (45,350,yes) (50,275,yes) (60,260,yes)
Instance-based learning: IB2
• Data (in processing order):
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes)
– (50,75,no) – (50,120,no) – (70,110,yes) – (25,400,yes)
– (50,100,no) – (45,350,yes) – (50,275,yes) – (60,260,yes)
• This previews the final answer: we memorize only 5 of these points. However, let's compute the classifier gradually.
Instance-based learning: IB2
• Data:
– (25,60,no) – (85,140,yes)
• Since so far the model has only the first instance memorized, this second instance gets wrongly classified. So, we memorize it as well.
Instance-based learning: IB2
• Data:
– (25,60,no) – (85,140,yes) – (45,60,no)
• So far the model has the first two instances memorized. The third instance gets properly classified, since it happens to be closer to the first. So, we don't memorize it.
Instance-based learning: IB2
• Data:
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes)
• So far the model has the first two instances memorized. The fourth instance gets properly classified, since it happens to be closer to the second. So, we don't memorize it.
Instance-based learning: IB2
• Data:
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes) – (50,75,no)
• So far the model has the first two instances memorized. The fifth instance gets properly classified, since it happens to be closer to the first. So, we don't memorize it.
Instance-based learning: IB2
• Data:
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes) – (50,75,no) – (50,120,no)
• So far the model has the first two instances memorized. The sixth instance gets wrongly classified, since it happens to be closer to the second (a "yes") while it is itself a "no". So, we memorize it.
Instance-based learning: IB2
• Continuing in a similar way, we finally get the figure on the right.
– The colored points are the ones that get memorized.
• This is the final answer, i.e. we memorize only these 5 points.
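A minimal sketch of the IB2 procedure just walked through (names are my own). Note it runs on the raw, unscaled coordinates, so the exact exemplar set it keeps may differ from the figure's colored points depending on how the features are scaled:

```python
def ib2(stream):
    """stream: list of (features, label) in arrival order; returns the memory."""
    memory = []
    for features, label in stream:
        if memory:
            nearest = min(memory,
                          key=lambda p: sum((a - b) ** 2
                                            for a, b in zip(p[0], features)))
            if nearest[1] == label:
                continue                  # classified correctly: don't memorize
        memory.append((features, label))  # first instance or a misclassification
    return memory

gold = [((25, 60), "no"), ((85, 140), "yes"), ((45, 60), "no"),
        ((30, 260), "yes"), ((50, 75), "no"), ((50, 120), "no"),
        ((70, 110), "yes"), ((25, 400), "yes"), ((50, 100), "no"),
        ((45, 350), "yes"), ((50, 275), "yes"), ((60, 260), "yes")]
print(ib2(gold))   # the memorized exemplars
```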
Instance-based learning: IB3
• IB3: deal with noise
– Discard instances that don't perform well
– Keep a record of the number of correct and incorrect classification decisions that each exemplar makes.
– Two predetermined thresholds are set on the success ratio.
– An instance is used for prediction:
• if its number of incorrect classifications is at most the first (lower) threshold, and
• if its number of correct classifications is at least the second (upper) threshold.
Instance-based learning: IB3
• Suppose the lower threshold is 0 and the upper threshold is 1.
• Shuffle the data first:
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes)
– (50,75,no) – (50,120,no) – (70,110,yes) – (25,400,yes)
– (50,100,no) – (45,350,yes) – (50,275,yes) – (60,260,yes)
Instance-based learning: IB3
• Suppose the lower threshold is 0 and the upper threshold is 1.
• Each exemplar's record is its [incorrect, correct] classification counts:
– (25,60,no) [1,1] – (85,140,yes) [1,1] – (45,60,no) [0,1] – (30,260,yes) [0,2]
– (50,75,no) [0,1] – (50,120,no) [0,1] – (70,110,yes) [0,0] – (25,400,yes) [0,1]
– (50,100,no) [0,0] – (45,350,yes) [0,0] – (50,275,yes) [0,1] – (60,260,yes) [0,0]
Instance-based learning: IB3
• The points that will be used in classification are:
– (45,60,no) [0,1] – (30,260,yes) [0,2] – (50,75,no) [0,1]
– (50,120,no) [0,1] – (25,400,yes) [0,1] – (50,275,yes) [0,1]
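The acceptance rule can be written down directly. A small sketch over the [incorrect, correct] records above, with the lower threshold 0 and upper threshold 1 from the example (dictionary layout is my own):

```python
records = {(25, 60, "no"): (1, 1), (85, 140, "yes"): (1, 1),
           (45, 60, "no"): (0, 1), (30, 260, "yes"): (0, 2),
           (50, 75, "no"): (0, 1), (50, 120, "no"): (0, 1),
           (70, 110, "yes"): (0, 0), (25, 400, "yes"): (0, 1),
           (50, 100, "no"): (0, 0), (45, 350, "yes"): (0, 0),
           (50, 275, "yes"): (0, 1), (60, 260, "yes"): (0, 0)}

LOWER, UPPER = 0, 1
accepted = [x for x, (wrong, right) in records.items()
            if wrong <= LOWER and right >= UPPER]   # keep only reliable exemplars
print(accepted)   # the six exemplars listed above
```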
Rectangular generalizations
• When a new exemplar is classified correctly, it is generalized by simply merging it with the nearest exemplar.
• The nearest exemplar may be either a single instance or a hyper-rectangle.
Rectangular generalizations
• Data:
– (25,60,no) – (85,140,yes) – (45,60,no) – (30,260,yes)
– (50,75,no) – (50,120,no) – (70,110,yes) – (25,400,yes)
– (50,100,no) – (45,350,yes) – (50,275,yes) – (60,260,yes)
Classification
[Figure: hyper-rectangles for Class 1 and Class 2, with the separation line between them]
• If the new instance lies within a rectangle then output the rectangle class
• If the new instance lies in the overlap of several rectangles, then output the class of the rectangle whose center is the closest to the new data instance.
• If the new instance lies outside all of the rectangles, output the class of the rectangle that is closest to the data instance.
• The distance of a point from a rectangle is:
1. If the instance lies within the rectangle, d = 0.
2. If it lies outside, d = the distance from the closest part of the rectangle, i.e. the distance to some point on the rectangle boundary.
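This distance is easy to compute for axis-aligned rectangles: clamp each coordinate of the point to the rectangle's range, then measure the Euclidean distance to the clamped point. A small sketch (names are my own):

```python
import math

def rect_distance(point, lo, hi):
    """lo, hi: opposite corners of an axis-aligned (hyper-)rectangle."""
    clamped = [min(max(p, l), h) for p, l, h in zip(point, lo, hi)]
    return math.dist(point, clamped)      # 0.0 whenever the point is inside

print(rect_distance((0, 0), lo=(1, 1), hi=(3, 2)))   # sqrt(2): corner (1,1) is closest
```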