![Page 1: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/1.jpg)
Model Selection
1
10-601 Introduction to Machine Learning
Matt GormleyLecture 4
January 29, 2018
Machine Learning DepartmentSchool of Computer ScienceCarnegie Mellon University
![Page 2: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/2.jpg)
Q&A
2
Q: How do we deal with ties in k-Nearest Neighbors (e.g. even k or equidistant points)?
A: I would ask you all for a good solution!
Q: How do we define a distance function when the features are categorical (e.g. weather takes values {sunny, rainy, overcast})?
A: Step 1: Convert from categorical attributes to numeric features (e.g. binary)Step 2: Select an appropriate distance function (e.g. Hamming distance)
![Page 3: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/3.jpg)
Reminders
• Homework 2: Decision Trees– Out: Wed, Jan 24– Due: Mon, Feb 5 at 11:59pm
• 10601 Notation Crib Sheet
3
![Page 4: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/4.jpg)
K-NEAREST NEIGHBORS
7
![Page 5: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/5.jpg)
k-Nearest Neighbors
Chalkboard:– KNN for binary classification– Distance functions– Efficiency of KNN– Inductive bias of KNN– KNN Properties
8
![Page 6: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/6.jpg)
KNN ON FISHER IRIS DATA
9
![Page 7: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/7.jpg)
Fisher Iris DatasetFisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2) collected by Anderson (1936)
10Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
Species Sepal Length
Sepal Width
Petal Length
Petal Width
0 4.3 3.0 1.1 0.1
0 4.9 3.6 1.4 0.1
0 5.3 3.7 1.5 0.2
1 4.9 2.4 3.3 1.0
1 5.7 2.8 4.1 1.3
1 6.3 3.3 4.7 1.6
1 6.7 3.0 5.0 1.7
![Page 8: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/8.jpg)
Fisher Iris DatasetFisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2) collected by Anderson (1936)
11Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
Species Sepal Length
Sepal Width
0 4.3 3.0
0 4.9 3.6
0 5.3 3.7
1 4.9 2.4
1 5.7 2.8
1 6.3 3.3
1 6.7 3.0
Deleted two of the four features, so that
input space is 2D
![Page 9: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/9.jpg)
KNN on Fisher Iris Data
12
![Page 10: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/10.jpg)
KNN on Fisher Iris Data
13
Special Case: Nearest Neighbor
![Page 11: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/11.jpg)
KNN on Fisher Iris Data
14
Special Case: Majority Vote
![Page 12: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/12.jpg)
KNN on Fisher Iris Data
15
![Page 13: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/13.jpg)
KNN on Fisher Iris Data
16
Special Case: Nearest Neighbor
![Page 14: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/14.jpg)
KNN on Fisher Iris Data
17
![Page 15: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/15.jpg)
KNN on Fisher Iris Data
18
![Page 16: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/16.jpg)
KNN on Fisher Iris Data
19
![Page 17: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/17.jpg)
KNN on Fisher Iris Data
20
![Page 18: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/18.jpg)
KNN on Fisher Iris Data
21
![Page 19: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/19.jpg)
KNN on Fisher Iris Data
22
![Page 20: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/20.jpg)
KNN on Fisher Iris Data
23
![Page 21: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/21.jpg)
KNN on Fisher Iris Data
24
![Page 22: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/22.jpg)
KNN on Fisher Iris Data
25
![Page 23: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/23.jpg)
KNN on Fisher Iris Data
26
![Page 24: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/24.jpg)
KNN on Fisher Iris Data
27
![Page 25: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/25.jpg)
KNN on Fisher Iris Data
28
![Page 26: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/26.jpg)
KNN on Fisher Iris Data
29
![Page 27: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/27.jpg)
KNN on Fisher Iris Data
30
![Page 28: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/28.jpg)
KNN on Fisher Iris Data
31
![Page 29: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/29.jpg)
KNN on Fisher Iris Data
32
![Page 30: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/30.jpg)
KNN on Fisher Iris Data
33
![Page 31: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/31.jpg)
KNN on Fisher Iris Data
34
![Page 32: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/32.jpg)
KNN on Fisher Iris Data
35
![Page 33: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/33.jpg)
KNN on Fisher Iris Data
36
Special Case: Majority Vote
![Page 34: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/34.jpg)
KNN ON GAUSSIAN DATA
37
![Page 35: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/35.jpg)
KNN on Gaussian Data
38
![Page 36: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/36.jpg)
KNN on Gaussian Data
39
![Page 37: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/37.jpg)
KNN on Gaussian Data
40
![Page 38: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/38.jpg)
KNN on Gaussian Data
41
![Page 39: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/39.jpg)
KNN on Gaussian Data
42
![Page 40: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/40.jpg)
KNN on Gaussian Data
43
![Page 41: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/41.jpg)
KNN on Gaussian Data
44
![Page 42: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/42.jpg)
KNN on Gaussian Data
45
![Page 43: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/43.jpg)
KNN on Gaussian Data
46
![Page 44: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/44.jpg)
KNN on Gaussian Data
47
![Page 45: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/45.jpg)
KNN on Gaussian Data
48
![Page 46: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/46.jpg)
KNN on Gaussian Data
49
![Page 47: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/47.jpg)
KNN on Gaussian Data
50
![Page 48: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/48.jpg)
KNN on Gaussian Data
51
![Page 49: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/49.jpg)
KNN on Gaussian Data
52
![Page 50: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/50.jpg)
KNN on Gaussian Data
53
![Page 51: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/51.jpg)
KNN on Gaussian Data
54
![Page 52: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/52.jpg)
KNN on Gaussian Data
55
![Page 53: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/53.jpg)
KNN on Gaussian Data
56
![Page 54: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/54.jpg)
KNN on Gaussian Data
57
![Page 55: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/55.jpg)
KNN on Gaussian Data
58
![Page 56: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/56.jpg)
KNN on Gaussian Data
59
![Page 57: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/57.jpg)
KNN on Gaussian Data
60
![Page 58: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/58.jpg)
KNN on Gaussian Data
61
![Page 59: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/59.jpg)
KNN on Gaussian Data
62
![Page 60: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/60.jpg)
K-NEAREST NEIGHBORS
63
![Page 61: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/61.jpg)
Questions
• How could k-Nearest Neighbors (KNN) be applied to regression?
• Can we do better than majority vote? (e.g. distance-weighted KNN)
• Where does the Cover & Hart (1967) Bayes error rate bound come from?
64
![Page 62: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/62.jpg)
KNN Learning ObjectivesYou should be able to…• Describe a dataset as points in a high dimensional space
[CIML]• Implement k-Nearest Neighbors with O(N) prediction• Describe the inductive bias of a k-NN classifier and relate
it to feature scale [a la. CIML]• Sketch the decision boundary for a learning algorithm
(compare k-NN and DT)• State Cover & Hart (1967)'s large sample analysis of a
nearest neighbor classifier• Invent "new" k-NN learning algorithms capable of dealing
with even k• Explain computational and geometric examples of the
curse of dimensionality
65
![Page 63: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/63.jpg)
k-Nearest NeighborsBut how do we choose k?
66
![Page 64: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/64.jpg)
MODEL SELECTION
67
![Page 65: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/65.jpg)
Model Selection
WARNING: • In some sense, our discussion of model
selection is premature. • The models we have considered thus far are
fairly simple.• The models and the many decisions available
to the data scientist wielding them will grow to be much more complex than what we’ve seen so far.
68
![Page 66: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/66.jpg)
Model Selection
Statistics• Def: a model defines the data
generation process (i.e. a set or family of parametric probability distributions)
• Def: model parameters are the values that give rise to a particular probability distribution in the model family
• Def: learning (aka. estimation) is the process of finding the parameters that best fit the data
• Def: hyperparameters are the parameters of a prior distribution over parameters
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
69
![Page 67: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/67.jpg)
Model Selection
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
70
• model = set of all possible trees, possibly restricted by some hyperparameters (e.g. max depth)
• parameters = structure of a specific decision tree
• learning algorithm = ID3, CART, etc.
• hyperparameters = max-depth, threshold for splitting criterion, etc.
Example: Decision Tree
![Page 68: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/68.jpg)
Model Selection
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
71
• model = set of all possible nearest neighbors classifiers
• parameters = none (KNN is an instance-based or non-parametric method)
• learning algorithm = for naïve setting, just storing the data
• hyperparameters = k, the number of neighbors to consider
Example: k-Nearest Neighbors
![Page 69: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/69.jpg)
Model Selection
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
72
• model = set of all linear separators
• parameters = vector of weights (one for each feature)
• learning algorithm = mistake based updates to the parameters
• hyperparameters = none (unless using some variant such as averaged perceptron)
Example: Perceptron
![Page 70: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/70.jpg)
Model Selection
Statistics• Def: a model defines the data
generation process (i.e. a set or family of parametric probability distributions)
• Def: model parameters are the values that give rise to a particular probability distribution in the model family
• Def: learning (aka. estimation) is the process of finding the parameters that best fit the data
• Def: hyperparameters are the parameters of a prior distribution over parameters
Machine Learning• Def: (loosely) a model defines the
hypothesis space over which learning performs its search
• Def: model parameters are the numeric values or structure selected by the learning algorithm that give rise to a hypothesis
• Def: the learning algorithm defines the data-driven search over the hypothesis space (i.e. search for good parameters)
• Def: hyperparameters are the tunable aspects of the model, that the learning algorithm does notselect
73
If “learning” is all about picking the best
parameters how do we pick the best
hyperparameters?
![Page 71: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/71.jpg)
Model Selection• Two very similar definitions:– Def: model selection is the process by which we choose
the “best” model from among a set of candidates– Def: hyperparameter optimization is the process by
which we choose the “best” hyperparameters from among a set of candidates (could be called a special case of model selection)
• Both assume access to a function capable of measuring the quality of a model
• Both are typically done “outside” the main training algorithm --- typically training is treated as a black box
74
![Page 72: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/72.jpg)
Example of Hyperparameter Opt.
Chalkboard:– Special cases of k-Nearest Neighbors– Choosing k with validation data– Choosing k with cross-validation
75
![Page 73: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/73.jpg)
Cross-ValidationCross validation is a method of estimating loss on held out data
Input: training data, learning algorithm, loss function (e.g. 0/1 error)Output: an estimate of loss function on held-out data
Key idea: rather than just a single “validation” set, use many! (Error is more stable. Slower computation.)
76
D = y(1)
y(2)
y(N)
x(1)
x(2)
x(N)
Fold 1
Fold 2
Fold 3
Fold 4
Algorithm: Divide data into folds (e.g. 4)1. Train on folds {1,2,3} and
predict on {4}2. Train on folds {1,2,4} and
predict on {3}3. Train on folds {1,3,4} and
predict on {2}4. Train on folds {2,3,4} and
predict on {1}Concatenate all the predictions and evaluate loss (almostequivalent to averaging loss over the folds)
![Page 74: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/74.jpg)
Model Selection
WARNING (again):– This section is only scratching the surface!– Lots of methods for hyperparameter
optimization: (to talk about later)• Grid search• Random search• Bayesian optimization• Graduate-student descent• …
Main Takeaway: – Model selection / hyperparameter optimization
is just another form of learning
77
![Page 75: Model Selection - Carnegie Mellon School of Computer Sciencemgormley/courses/10601-s18/slides/lecture4-ms.pdf · Model Selection Machine Learning • Def: (loosely) a modeldefines](https://reader030.vdocument.in/reader030/viewer/2022040714/5e1c0c3b21f5c44f01085a48/html5/thumbnails/75.jpg)
Model Selection Learning ObjectivesYou should be able to…• Plan an experiment that uses training, validation, and
test datasets to predict the performance of a classifier on unseen data (without cheating)
• Explain the difference between (1) training error, (2) validation error, (3) cross-validation error, (4) test error, and (5) true error
• For a given learning technique, identify the model, learning algorithm, parameters, and hyperparamters
• Define "instance-based learning" or "nonparametric methods"
• Select an appropriate algorithm for optimizing (aka. learning) hyperparameters
78