1
NEAREST NEIGHBORS ALGORITHM
Lecturer: Yishay Mansour
Presentation: Adi Haviv and Guy Lev
2
Lecture Overview
• NN general overview
• Various methods of NN
• Models of the Nearest Neighbor Algorithm
• NN – Risk Analysis
• KNN – Risk Analysis
• Drawbacks
• Locality Sensitive Hashing (LSH)
• Implementing KNN using LSH
• Extension for Bounded Values
• Extension for Real Numbers
3
General Overview
• The nearest neighbors algorithm is a simple classification algorithm that classifies a new point according to the class of its nearest neighbor.
• Let S = {(x_1, y_1), ..., (x_m, y_m)} be a sample of points and their classifications. Given a new point x, we find the nearest sample point x_i and classify x with y_i (a sketch of this rule follows below).
• An implicit assumption is that close points have the same classification.
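A minimal sketch, in Python, of the 1-NN rule just described. The function name, the (point, label) pair representation of the sample, and the choice of Euclidean distance are illustrative assumptions, not taken from the lecture.

import math

def nearest_neighbor(sample, x):
    # sample: list of (point, label) pairs; x: query point (a tuple of numbers).
    # Return the label of the sample point closest to x under Euclidean distance.
    def dist(p, q):
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
    _, label = min(sample, key=lambda pair: dist(pair[0], x))
    return label

For example, nearest_neighbor([((0, 0), 0), ((3, 4), 1)], (2, 3)) returns 1, since (3, 4) is the closer sample point.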
4
Various methods of NN
• NN – Given a new point x, we wish to find its nearest sample point and return that point's classification.
• K-NN – Given a new point x, we wish to find its k nearest sample points and return their average (majority) classification.
• Weighted – Given a new point x, we assign weights to all the sample points according to their distance from x and classify x according to the weighted average (both variants are sketched below).
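Sketches of the K-NN (majority vote) and weighted variants, under the same illustrative representation as the previous snippet; the function names, the inverse-distance weighting, and the 0/1 labels are assumptions for illustration.

import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(sample, x, k):
    # Majority vote over the labels of the k nearest sample points.
    neighbors = sorted(sample, key=lambda pair: euclidean(pair[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def weighted_classify(sample, x, eps=1e-9):
    # Weight every sample point by the inverse of its distance to x (labels assumed 0/1),
    # then threshold the weighted average at 1/2.
    weights = [(1.0 / (euclidean(p, x) + eps), y) for p, y in sample]
    total = sum(w for w, _ in weights)
    score = sum(w * y for w, y in weights) / total
    return 1 if score >= 0.5 else 0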
5
Models of the Nearest Neighbor Algorithm
1. Distribution D over X × {0, 1}:
• Sample (x, b) according to D.
• The problem with this model is that the conditional probability of a label given a single point x is not measurable.
2. Two distributions, D− (negative points) and D+ (positive points), and a parameter p:
• Sample a class b using p, such that Pr[b = 1] = p.
• Sample x from D_b and return (x, b).
3. Distribution D over X, together with a labeling function p(x) = Pr[b = 1 | x]:
• Sample x from D, and label it 1 with probability p(x).
6
NN – Risk Analysis
• Let p(x) = Pr[b = 1 | x]. The optimal classifier predicts 1 when p(x) ≥ 1/2 and 0 otherwise.
• r(x) = min{p(x), 1 − p(x)} is the conditional error of the optimal classifier at x.
• The loss of the optimal classifier is (Bayes Risk): R_Bayes = E_x[r(x)] = E_x[min{p(x), 1 − p(x)}].
• We will prove that the expected error of the NN rule (as the sample size grows) is at most 2 R_Bayes.
7
NN vs. Bayes Risk Proof
• We split into 2 cases: a Simple Case and a General Case.
• Simple Case:
• There exists exactly one sample point x_i such that x_i = x (the query point itself appears in the sample).
• Proof: the nearest neighbor of x is then x itself, and the NN rule errs exactly when the label drawn for x_i and the label drawn for x disagree.
• Thus we get that the expected error at x is 2 p(x)(1 − p(x)) (a short derivation follows below).
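Filling in the simple-case computation, using p(x) = Pr[the label of x is 1] and r(x) = min{p(x), 1 − p(x)}:

\Pr[\text{NN errs at } x] = p(x)(1 - p(x)) + (1 - p(x))\,p(x) = 2\,p(x)(1 - p(x)) \le 2\min\{p(x),\, 1 - p(x)\} = 2\,r(x)

since both p(x) ≤ 1 and 1 − p(x) ≤ 1. Taking the expectation over x gives E_x[2 p(x)(1 − p(x))] ≤ 2 E_x[r(x)] = 2 R_Bayes.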
8
NN vs. Bayes Risk Proof Cont.
• General Case:
• The nearest neighbor of x converges to x.
• The classification of the neighborhood of x is close to that of x.
• Proof
• Simplifications:
• D(x) is non-zero
• p(x) is continuous
• Theorem: for every δ > 0, the nearest neighbor of x falls within distance δ of x, with probability 1 (as m → ∞).
• Proof:
• Let B(x, δ) be a sphere with radius δ and center x.
• Pr[no sample point falls in B(x, δ)] = (1 − Pr[B(x, δ)])^m → 0, since Pr[B(x, δ)] > 0.
• Theorem: lim_{m→∞} E[error of NN] ≤ 2 R_Bayes.
• Proof:
• Let M_m = {the event that the NN algorithm made a mistake with a sample space of m points}.
• Pr[M_m] ≤ Pr[the nearest neighbor of x is farther than δ] + Pr[mistake and the nearest neighbor is within δ of x]; the first term tends to 0, and by continuity of p(x) the second tends to 2 p(x)(1 − p(x)) ≤ 2 r(x). Taking the expectation over x gives the bound.
9
KNN – Risk Analysis
• We go on to the case of k points; here we will gain that we won't lose the factor of 2 relative to the Bayes Risk.
• Denote:
• l = the number of positively labeled points among the k nearest neighbors of x.
• The estimate is: p̂(x) = l / k.
• Our conditions are:
• k → ∞ and k/m → 0.
• We want to prove that:
1. All k nearest neighbors of x converge to x.
2. p̂(x) converges to p(x), so the KNN rule converges to the optimal classifier at x.
10
KNN vs. Bayes Risk Proof
• Same as before, we split into 2 cases.
• Simple Case:
• All k nearest neighbors are identical to x, so (1) is satisfied.
• Proof of (2):
• By the Chernoff bound, l/k concentrates around p(x) as k grows (the bound is stated below).
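The bound referred to above is the standard additive Chernoff–Hoeffding bound for the k label draws (ε is an accuracy parameter not named on the slide):

\Pr\left[\left|\tfrac{l}{k} - p(x)\right| > \varepsilon\right] \le 2e^{-2\varepsilon^2 k} \longrightarrow 0 \quad (k \to \infty)

so l/k is a consistent estimate of p(x), and the majority vote over the k labels approaches the optimal decision at x.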
11
KNN vs. Bayes Risk Proof
• Proof for the General Case:
• Define B(x, δ) the same as before; Pr[B(x, δ)] > 0.
• The expected number of points that will fall in B(x, δ) is m · Pr[B(x, δ)].
• Let N = the number of sample points that fall in B(x, δ).
• Pr[at most k − 1 points fall in B(x, δ)] = Pr[N ≤ k − 1] → 0, since k/m → 0 while E[N] grows linearly in m; hence, with probability tending to 1, all k nearest neighbors of x lie inside B(x, δ), and the Simple Case argument applies.
12
Drawbacks
• No bound on the number of samples (m): how many samples are needed depends on the actual distribution.
• For example: for any fixed m we can choose a distribution spread over so many distinct regions that m samples cannot cover them, and then the probability of error stays bounded away from zero.
• Determining the distance function: distances between points should not be affected by different scales. 2 ways to normalize (both sketched below):
• Assuming a normal distribution: per coordinate, use (x − μ)/σ.
• Assuming a uniform distribution: per coordinate, use (x − min)/(max − min).
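Sketches of the two normalizations just listed; the function names and the pure-Python, per-coordinate implementation are illustrative assumptions.

def z_score_normalize(points):
    # Per-coordinate (x - mean) / std, matching the normal-distribution assumption.
    d = len(points[0])
    n = len(points)
    means = [sum(p[j] for p in points) / n for j in range(d)]
    stds = [(sum((p[j] - means[j]) ** 2 for p in points) / n) ** 0.5 or 1.0 for j in range(d)]
    return [[(p[j] - means[j]) / stds[j] for j in range(d)] for p in points]

def min_max_normalize(points):
    # Per-coordinate (x - min) / (max - min), matching the uniform-distribution assumption.
    d = len(points[0])
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]
    return [[(p[j] - lo[j]) / ((hi[j] - lo[j]) or 1.0) for j in range(d)] for p in points]

Constant coordinates are left unscaled (the "or 1.0" guards) rather than divided by zero.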
13
Locality Sensitive Hashing (LSH)
• Trivial algorithm for KNN: for every query point, go over all sample points and compute the distance; this takes linear time per query.
• We want to pre-process the sample set so that search time would be sub-linear.
• We can look at the following problem: given x and R, find all y in the sample such that dist(x, y) ≤ R (a brute-force version is sketched below).
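A brute-force version of this (x, R) problem, for reference; dist stands for any distance function and the name is illustrative.

def range_search(samples, x, R, dist):
    # Linear scan: return every sample point y with dist(x, y) <= R.
    return [y for y in samples if dist(x, y) <= R]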
14
Locality Sensitive Hashing (LSH)
• A Locality Sensitive Hashing family is a set H of hash functions s.t. for any p, q:
• If dist(p, q) ≤ R then Pr_{h∈H}[h(p) = h(q)] ≥ p1
• If dist(p, q) ≥ cR then Pr_{h∈H}[h(p) = h(q)] ≤ p2
for some probabilities p1 > p2.
• Example (bit sampling in the Hamming cube {0,1}^d, sketched below):
• h_i(x) = x_i, with the coordinate i chosen uniformly at random.
• Pr[h(p) = h(q)] = 1 − dist(p, q)/d.
• If dist(p, q) ≤ R then Pr[h(p) = h(q)] ≥ 1 − R/d, and if dist(p, q) ≥ cR then Pr[h(p) = h(q)] ≤ 1 − cR/d, so we have p1 = 1 − R/d > p2 = 1 − cR/d as required.
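A sketch of the bit-sampling family from the example above, assuming points are 0/1 sequences of length d; the function name is illustrative.

import random

def sample_bit_hash(d):
    # One hash function h_i(x) = x_i for a uniformly random coordinate i.
    # For p, q at Hamming distance r: Pr over i that h_i(p) = h_i(q) is 1 - r/d.
    i = random.randrange(d)
    return lambda x: x[i]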
15
Implementing KNN using LSH
• Step 1: Amplification:
• Use functions of the form g(x) = (h_1(x), ..., h_k(x)), where h_1, ..., h_k are randomly selected from H. Then:
• If dist(p, q) ≤ R then Pr[g(p) = g(q)] ≥ p1^k
• If dist(p, q) ≥ cR then Pr[g(p) = g(q)] ≤ p2^k
• k is chosen s.t. p2^k = 1/n, so a "far" point collides with the query in a single table with probability at most 1/n.
• Denote: ρ = log(1/p1) / log(1/p2); then p1^k = n^(−ρ) (a sketch of g is given below).
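A sketch of the amplification step, assuming the bit-sampling family from the previous snippet (any family sampler with the same interface would do):

def amplify(sample_hash, d, k):
    # g(x) = (h_1(x), ..., h_k(x)) with h_1, ..., h_k drawn independently from the family;
    # a single-function collision probability p becomes p^k for g.
    hs = [sample_hash(d) for _ in range(k)]
    return lambda x: tuple(h(x) for h in hs)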
16
Implementing KNN using LSH Cont.
• Step 2: Combination
• Pick L functions g_1, ..., g_L (use L hash tables).
• For each i: a "close" pair p, q satisfies Pr[g_i(p) = g_i(q)] ≥ p1^k = n^(−ρ).
• Probability of no-match for any of the L functions: at most (1 − n^(−ρ))^L.
• For a given δ, choose L = n^ρ · ln(1/δ); then we have (1 − n^(−ρ))^L ≤ e^(−L·n^(−ρ)) = δ.
• For "far" points, the probability to hit in a single table is at most 1/n, so the probability of hitting a given "far" point in any of the tables is bounded by L/n, and the expected number of "far" candidates over all tables is bounded by L.
17
Implementing KNN using LSH Cont.
• We are given an LSH family H and a sample set.
• Pre-processing:
• Pick L functions g_1, ..., g_L (use L hash tables).
• Insert each sample x into each table i, under the key g_i(x).
• Finding near neighbors of q (a sketch follows below):
• For each i, calculate g_i(q) and search in the i-th table.
• Thus obtain a candidate set P (the union of the L retrieved buckets).
• Check the distance between q and each point in P.
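Putting the pre-processing and query steps together, a sketch of the L-table scheme; points are assumed to be tuples so they can live in Python sets and dictionary keys, and all names are illustrative.

from collections import defaultdict

def build_lsh_index(samples, make_g, L):
    # samples: list of points (tuples); make_g: zero-argument function returning a fresh amplified hash g_i.
    # Each sample x is inserted into table i under the key g_i(x).
    index = []
    for _ in range(L):
        g = make_g()
        table = defaultdict(list)
        for x in samples:
            table[g(x)].append(x)
        index.append((g, table))
    return index

def lsh_query(index, q, dist, R):
    # Collect the bucket g_i(q) from every table, then keep only candidates within distance R of q.
    candidates = set()
    for g, table in index:
        candidates.update(table[g(q)])
    return [p for p in candidates if dist(p, q) <= R]

For example, with the earlier sketches the index could be built as build_lsh_index(samples, lambda: amplify(sample_bit_hash, d, k), L).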
18
Implementing KNN using LSH Cont.
• Complexity:
• Space complexity: L tables, each containing n samples. Therefore: O(nL).
• Search time complexity:
• O(L) queries to hash tables.
• We assume lookup time is constant.
• For each sample retrieved we check if it is "close".
• Expected number of "far" points is at most L, therefore rejecting "far" samples is O(L).
• Time for processing "close" samples: O(kL), where k is the number of desired neighbors.
19
Extension for Bounded Values
• Sample space is {0, 1, ..., s}^d.
• We use L1 as the distance metric.
• Use unary encoding (sketched below):
• Represent each coordinate by a block of s bits.
• A value t is represented by t consecutive 1s followed by s − t zeros.
• Example: s = 8, x = <5, 7>
• Representation of x: 1111100011111110
• Hamming distance in this representation is the same as the L1 distance in the original representation.
• Problems with real values can be reduced to this solution by quantization.
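A sketch of the unary encoding just described, including the slide's own example; the function name is illustrative.

def unary_encode(x, s):
    # Map each coordinate t in {0, ..., s} to t ones followed by s - t zeros;
    # the L1 distance between points equals the Hamming distance between their encodings.
    bits = []
    for t in x:
        bits.extend([1] * t + [0] * (s - t))
    return bits

# The slide's example: s = 8, x = <5, 7>  ->  1111100011111110
assert unary_encode([5, 7], 8) == [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0]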
20
Extension for Real Numbers
• Sample space is X = [0, 1]^d.
• Assume R << 1.
• Pick randomly and uniformly a coordinate i ∈ {1, ..., d} and a threshold t ∈ [0, 1].
• Hash function is: h(x) = 1 if x_i ≥ t, and 0 otherwise.
• For p, q ∈ X: Pr[h(p) ≠ h(q)] = ||p − q||_1 / d.
• Therefore: Pr[h(p) = h(q)] = 1 − ||p − q||_1 / d.
• If R is small then: 1 − R/d ≈ e^(−R/d).
21
Extension for Real Numbers Cont.
• Therefore: for "close" points (||p − q||_1 ≤ R) we get Pr[h(p) = h(q)] ≥ 1 − R/d, while for "far" points (||p − q||_1 ≥ cR) we get Pr[h(p) = h(q)] ≤ 1 − cR/d.
• So we get a separation between p1 = 1 − R/d and p2 = 1 − cR/d given a big enough constant c (a sketch of this hash family follows).
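A minimal sketch of one such hash family, assuming the coordinate-plus-threshold hash described above; the function name and interface are illustrative.

import random

def sample_real_hash(d):
    # One hash function h(x) = 1 if x_i >= t else 0, for a uniformly random coordinate i
    # and threshold t in [0, 1]. For p, q in [0, 1]^d, Pr over (i, t) that h(p) != h(q)
    # equals ||p - q||_1 / d, so points within L1 distance R collide with probability >= 1 - R/d.
    i = random.randrange(d)
    t = random.random()
    return lambda x: 1 if x[i] >= t else 0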