K-nearest neighbor methods (William Cohen, 10-601, April 2008)
TRANSCRIPT
1
K-nearest neighbor methods
William Cohen
10-601 April 2008
2
But first….
[Scatter plot: Number of Publications (x-axis, 0-160) vs. Age in Years (y-axis, 0-50), with a fitted univariate regression line ŷ = ŵ₁x]
3
Onward: multivariate linear regression
Univariate:
$$\hat{y} = \hat{w}x, \qquad \hat{w} = (\mathbf{x}^T \mathbf{x})^{-1} \mathbf{x}^T \mathbf{y}$$
with $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ the training examples.

Multivariate:
$$\hat{y} = \hat{w}_1 x_1 + \ldots + \hat{w}_k x_k = \hat{\mathbf{w}}^T \mathbf{x}, \qquad \hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y}$$
$$X = \begin{pmatrix} x_{1,1} & \ldots & x_{1,k} \\ \vdots & & \vdots \\ x_{n,1} & \ldots & x_{n,k} \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
(each row of $X$ is an example, each column is a feature)

In both cases $\hat{\mathbf{w}}$ minimizes the squared error:
$$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \sum_i \left[ y_i - \hat{y}_i(\mathbf{w}) \right]^2, \qquad \hat{y}_i(\mathbf{w}) = \mathbf{w}^T \mathbf{x}_i$$
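The closed form is easy to check numerically. A minimal NumPy sketch (the data and names are illustrative; `np.linalg.lstsq` computes the same least-squares minimizer as the normal equations, but more stably):

```python
import numpy as np

# Toy data: n = 5 examples (rows), k = 2 features (columns).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

# Normal equations: w_hat = (X^T X)^{-1} X^T y.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same minimizer via least squares (numerically preferable).
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ w_lstsq      # predictions: y_hat_i = w^T x_i
print(w_normal, w_lstsq)
```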
4
[Figure: training data plotted as X vs. Y]
5
6
7
ACM Computing Surveys 2002
8
9
Review of K-NN methods (so far)
10
Kernel regression
• aka locally weighted regression, locally linear regression, LOESS, …
What does making the kernel wider do to bias and variance?
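A minimal sketch of kernel regression in the Nadaraya-Watson form (a kernel-weighted average of the training y's), with a Gaussian kernel whose bandwidth h is the "width" the question asks about; the data and names here are illustrative:

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, h=1.0):
    """Nadaraya-Watson estimate: kernel-weighted average of the y's."""
    # Gaussian kernel weight for each training point.
    w = np.exp(-0.5 * ((x_train - x_query) / h) ** 2)
    return np.sum(w * y_train) / np.sum(w)

x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.1 * np.random.randn(50)

# A wider kernel (larger h) averages over more points:
# a smoother fit, with higher bias and lower variance.
print(kernel_regression(x, y, x_query=5.0, h=0.5))
print(kernel_regression(x, y, x_query=5.0, h=5.0))
```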
11
BellCore's MovieRecommender
• Participants sent email to [email protected]
• System replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular); only a subset need to be rated
• New participant P sends in rated movies via email
• System compares ratings for P to ratings of (a random sample of) previous users
• Most similar users are used to predict scores for unrated movies (more later)
• System returns recommendations in an email message.
12
Suggested Videos for: John A. Jamus.
Your must-see list with predicted ratings:
•7.0 "Alien (1979)"
•6.5 "Blade Runner"
•6.2 "Close Encounters Of The Third Kind (1977)"
Your video categories with average ratings:
•6.7 "Action/Adventure"
•6.5 "Science Fiction/Fantasy"
•6.3 "Children/Family"
•6.0 "Mystery/Suspense"
•5.9 "Comedy"
•5.8 "Drama"
13
The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar. Correlation with target viewer:
•0.59 viewer-130 ([email protected])
•0.55 bullert,jane r ([email protected])
•0.51 jan_arst ([email protected])
•0.46 Ken Cross ([email protected])
•0.42 rskt ([email protected])
•0.41 kkgg ([email protected])
•0.41 bnn ([email protected])
By category, their joint ratings recommend:
•Action/Adventure:
•"Excalibur" 8.0, 4 viewers
•"Apocalypse Now" 7.2, 4 viewers
•"Platoon" 8.3, 3 viewers
•Science Fiction/Fantasy:
•"Total Recall" 7.2, 5 viewers
•Children/Family:
•"Wizard Of Oz, The" 8.5, 4 viewers
•"Mary Poppins" 7.7, 3 viewers
•Mystery/Suspense:
•"Silence Of The Lambs, The" 9.3, 3 viewers
•Comedy:
•"National Lampoon's Animal House" 7.5, 4 viewers
•"Driving Miss Daisy" 7.5, 4 viewers
•"Hannah and Her Sisters" 8.0, 3 viewers
•Drama:
•"It's A Wonderful Life" 8.0, 5 viewers
•"Dead Poets Society" 7.0, 5 viewers
•"Rain Man" 7.5, 4 viewers
Correlation of predicted ratings with your actual ratings is: 0.64. This number measures ability to evaluate movies accurately for you. 0.15 means low ability, 0.50 means fair ability, and 0.85 means very good ability.
14
Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al, UAI98)
• $v_{i,j}$ = vote of user i on item j
• $I_i$ = items for which user i has voted
• Mean vote for i is $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
• Predicted vote for "active user" a is a weighted sum
$$p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a, i)\,(v_{i,j} - \bar{v}_i)$$
where the $w(a, i)$ are the weights of the n most similar users and $\kappa$ is a normalizer.
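A sketch of this memory-based scheme, using Pearson correlation over commonly rated items as the weight w(a, i) (one of the weightings Breese et al consider); the tiny ratings dictionary is made up for illustration:

```python
import math

# votes[user][item] = rating; a small illustrative dataset.
votes = {
    "a":  {"alien": 8, "blade_runner": 9},
    "u1": {"alien": 7, "blade_runner": 8, "platoon": 9},
    "u2": {"alien": 3, "blade_runner": 2, "platoon": 5},
}

def mean_vote(u):
    v = votes[u]
    return sum(v.values()) / len(v)

def weight(a, i):
    """Pearson correlation over the items both users have voted on."""
    common = set(votes[a]) & set(votes[i])
    if not common:
        return 0.0
    ma, mi = mean_vote(a), mean_vote(i)
    num = sum((votes[a][j] - ma) * (votes[i][j] - mi) for j in common)
    da = math.sqrt(sum((votes[a][j] - ma) ** 2 for j in common))
    di = math.sqrt(sum((votes[i][j] - mi) ** 2 for j in common))
    return num / (da * di) if da and di else 0.0

def predict(a, j):
    """p_{a,j} = mean(a) + kappa * sum_i w(a,i) * (v_{i,j} - mean(i))."""
    others = [i for i in votes if i != a and j in votes[i]]
    ws = [weight(a, i) for i in others]
    kappa = 1.0 / sum(abs(w) for w in ws) if any(ws) else 0.0
    return mean_vote(a) + kappa * sum(
        w * (votes[i][j] - mean_vote(i)) for w, i in zip(ws, others))

print(predict("a", "platoon"))
```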
15
Basic k-nearest neighbor classification
• Training method:
– Save the training examples
• At prediction time:
– Find the k training examples (x1,y1),…,(xk,yk) that are closest to the test example x
– Predict the most frequent class among those yi's.
• Example: http://cgm.cs.mcgill.ca/~soss/cs644/projects/simard/
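A minimal sketch of the method, using Euclidean distance as "closest" (the following slides discuss better choices); the data and names are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Training was just "save the examples"; all the work happens here.
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every example
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    # Most frequent class among the k neighbors' labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["o", "o", "o", "+", "+", "+"])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0]), k=3))  # "+"
```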
16
What is the decision boundary?
Voronoi diagram
17
Convergence of 1-NN
[Diagram: test example x with conditional P(Y|x); its nearest neighbor x1 has label y1 drawn from P(Y|x1), a second neighbor x2 has label y2]

Let y* = argmax_y Pr(y|x), and assume the training set is dense enough that the nearest neighbor x′ satisfies Pr(y|x′) ≈ Pr(y|x). The 1-NN prediction errs when the neighbor's label differs from the true label:
$$
\begin{aligned}
P(\text{knnError}) &= 1 - \sum_y \Pr(y \mid x)\Pr(y \mid x') \\
&\approx 1 - \sum_y \Pr(y \mid x)^2 \\
&\le 1 - \Pr(y^* \mid x)^2 \\
&\le 2\,\big(1 - \Pr(y^* \mid x)\big) = 2 \times (\text{Bayes optimal error rate})
\end{aligned}
$$
18
Basic k-nearest neighbor classification
• Training method:
– Save the training examples
• At prediction time:
– Find the k training examples (x1,y1),…,(xk,yk) that are closest to the test example x
– Predict the most frequent class among those yi's.
• Improvements:
– Weighting examples from the neighborhood
– Measuring "closeness"
– Finding "close" examples in a large training set quickly
19
K-NN and irrelevant features
[1-D data: a cluster of + examples and a cluster of o examples along a single relevant feature; the query ? falls among the o's]
20
K-NN and irrelevant features
[The same data plotted with an irrelevant feature on the second axis: the + and o examples scatter vertically, and the query ?'s nearest neighbors are no longer clearly o's]
21
K-NN and irrelevant features
[Another scatter of the same + and o examples with an irrelevant feature, again with the query ?]
22
Ways of rescaling for KNN
Normalized L1 distance: L1 distance with each feature rescaled to a common range, so no single feature dominates

Scale by IG: weight each feature by its information gain with respect to the class, so irrelevant features count less

Modified value difference metric: a distance between two values of a discrete feature, based on how differently the class is distributed given each value
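A sketch of the first two rescalings under simple assumptions (numeric features for range scaling, discrete features for information gain). Both reduce to a weighted L1 distance: weights of 1/(max − min) give the normalized L1, and IG weights give the IG scaling. Names are illustrative:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature, labels):
    """IG of one discrete feature column with respect to the class."""
    h = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        h -= mask.mean() * entropy(labels[mask])
    return h

def weighted_l1(x1, x2, weights):
    """L1 distance with per-feature weights (e.g., 1/range or IG)."""
    return np.sum(weights * np.abs(x1 - x2))

X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array(["+", "+", "o", "o"])          # feature 0 decides the class
w = np.array([info_gain(X[:, 0], y), info_gain(X[:, 1], y)])
print(w)                                     # [1. 0.]: feature 1 is ignored
print(weighted_l1(X[0], X[2], w))
```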
23
Ways of rescaling for KNN
Dot product: $\mathbf{x} \cdot \mathbf{x}' = \sum_i x_i x'_i$

Cosine distance: $\frac{\mathbf{x} \cdot \mathbf{x}'}{\|\mathbf{x}\| \, \|\mathbf{x}'\|}$

TFIDF weights for text: for doc j, feature i: $x_i = tf_{i,j} \cdot idf_i$, where $tf_{i,j}$ is the number of occurrences of term i in doc j, and $idf_i$ grows with (#docs in corpus) / (#docs in corpus that contain term i), typically as its logarithm.
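A sketch of TFIDF weighting plus cosine similarity for text, using log(#docs / #docs containing the term) as the IDF (a common variant); the toy corpus is made up:

```python
import math
from collections import Counter

docs = [["total", "recall", "is", "science", "fiction"],
        ["blade", "runner", "is", "science", "fiction"],
        ["driving", "miss", "daisy", "is", "a", "comedy"]]

n_docs = len(docs)
df = Counter(t for d in docs for t in set(d))   # docs containing term i

def tfidf(doc):
    tf = Counter(doc)                           # occurrences of term i in doc j
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf}

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)  # dot product on shared terms
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

v0, v1 = tfidf(docs[0]), tfidf(docs[1])
print(cosine(v0, v1))   # > 0: the docs share "science" and "fiction";
                        # "is" occurs in every doc, so its IDF (and weight) is 0
```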
24
Combining distances to neighbors
Standard KNN:
$$C(y, D) = |\{(x', y') \in D : y' = y\}|, \qquad \hat{y} = \arg\max_y C(y, \mathit{Neighbors}(x))$$

Distance-weighted KNN: with $\mathit{SIM}(x, x') = 1 - \Delta(x, x')$,
$$C(y, D) = \sum_{(x', y') \in D :\, y' = y} \mathit{SIM}(x, x') \quad \text{or} \quad C(y, D) = \sum_{(x', y') \in D :\, y' = y} \frac{1}{1 - \mathit{SIM}(x, x')}$$
and, as before, $\hat{y} = \arg\max_y C(y, \mathit{Neighbors}(x))$.
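A sketch of the distance-weighted vote, with SIM(x, x′) = 1 − Δ(x, x′) and Δ scaled into [0, 1] by the largest distance seen (that scaling is an assumption, not from the slide); the epsilon guards the 1/(1 − SIM) variant when a neighbor coincides with x:

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, k=3, inverse=False, eps=1e-9):
    d = np.linalg.norm(X_train - x, axis=1)
    sim = 1.0 - d / (d.max() + eps)          # SIM(x, x') = 1 - Delta(x, x')
    nearest = np.argsort(d)[:k]
    score = defaultdict(float)
    for i in nearest:
        # C(y, D): sum of SIM, or of 1/(1 - SIM), over neighbors with label y.
        score[y_train[i]] += 1.0 / (1.0 - sim[i] + eps) if inverse else sim[i]
    return max(score, key=score.get)         # y_hat = argmax_y C(y, ...)

X_train = np.array([[0.0, 0], [0, 1], [5, 5], [5, 6]])
y_train = np.array(["o", "o", "+", "+"])
print(weighted_knn_predict(X_train, y_train, np.array([1.0, 1.0]), k=3))  # "o"
```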
25
26
27
William W. Cohen & Haym Hirsh (1998): Joins that Generalize: Text Classification Using WHIRL. In KDD 1998: 169-173.
28
29
30
Vitor Carvalho and William W. Cohen (2008): Ranking Users for Intelligent Message Addressing. In ECIR-2008; and current work with Vitor, Ramnath Balasubramanyan, and me.
31
Computing KNN: pros and cons
• Storage: all training examples are saved in memory
– A decision tree or linear classifier is much smaller
• Time: to classify x, you need to loop over all training examples (x', y') to compute the distance between x and x'.
– However, you get predictions for every class y; KNN is nice when there are many, many classes
– Actually, there are some tricks to speed this up… especially when the data is sparse (e.g., text)
32
Efficiently implementing KNN (for text)
IDF is nice computationally
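One version of the sparse-text trick, as a sketch: terms that appear in every document get IDF 0 and drop out, and an inverted index means a query only touches the documents that share at least one term with it. The tiny vectors are illustrative:

```python
from collections import defaultdict

# doc_vectors[j] = {term: tfidf weight}, e.g. from the tfidf() sketch above.
doc_vectors = {
    "d1": {"science": 0.4, "fiction": 0.4},
    "d2": {"science": 0.4, "comedy": 0.9},
    "d3": {"comedy": 0.9, "drama": 1.1},
}

# Inverted index: term -> list of (doc id, weight).
index = defaultdict(list)
for j, vec in doc_vectors.items():
    for term, w in vec.items():
        index[term].append((j, w))

def top_k(query_vec, k=2):
    """Accumulate dot products, visiting only docs that share a term."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for j, w in index.get(term, []):
            scores[j] += qw * w
    return sorted(scores.items(), key=lambda p: -p[1])[:k]

print(top_k({"science": 0.4, "fiction": 0.4}))  # d3 is never even touched
```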
33
Tricks with fast KNN
K-means using r-NN:
1. Pick k points c1=x1,…,ck=xk as centers
2. For each ci, find Di = Neighborhood(ci), the examples whose nearest center is ci
3. For each ci, let ci = mean(Di)
4. Go to step 2…
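The loop above in plain NumPy. Here the nearest-center assignment in step 2 is done by brute force; the slide's point is that a fast r-NN structure can answer exactly that query quickly:

```python
import numpy as np

def kmeans(X, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest center
        # (the nearest-neighbor search a fast r-NN index would speed up).
        assign = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        # Step 3: c_i = mean(D_i); keep the old center if D_i is empty.
        for i in range(k):
            if np.any(assign == i):
                centers[i] = X[assign == i].mean(axis=0)
    return centers, assign

X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
print(kmeans(X, k=2)[0])   # two centers, near (0,0) and (5,5)
```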
34
Efficiently implementing KNN
Selective classification: given a training set and test set, find the N test cases that you can most confidently classify
35
Train once and select 100 test cases to classify
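The slides don't fix a confidence measure; one simple choice is the margin between the top two k-NN vote counts. A sketch under that assumption (names and data are illustrative):

```python
import numpy as np
from collections import Counter

def most_confident(X_train, y_train, X_test, k=5, n=100):
    """Indices of the n test cases with the largest k-NN vote margin."""
    margins = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        votes = Counter(y_train[np.argsort(d)[:k]]).most_common()
        top = votes[0][1]
        second = votes[1][1] if len(votes) > 1 else 0
        margins.append(top - second)   # unanimous neighbors give margin k
    return np.argsort(margins)[::-1][:n]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["o", "o", "o", "+", "+", "+"])
X_test = np.array([[0.2, 0.5],    # deep inside the o cluster: margin 3
                   [2.5, 3.0]])   # between the clusters: margin 1
print(most_confident(X_train, y_train, X_test, k=3, n=1))  # [0]
```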