Three kinds of learning
Supervised learning: learning some mapping from inputs to outputs
Unsupervised learning: given “data”, what kinds of patterns can you find?
Reinforcement learning: learn from positive/negative reinforcement
Categorical data example
Example from Ross Quinlan, Decision Tree Induction; graphics from Tom Mitchell, Machine Learning
Decision Tree Classification
Which feature to split on?
Try to classify as many examples as possible with each split. (This is a good split.)
Which feature to split on?
These are bad splits – no classifications obtained
Improving a good split
Decision Tree Algorithm Framework
Use splitting criterion to decide on the best attribute to split on
Each child is a new decision tree – recurse with the parent feature removed
If all data points in a child node are the same class, classify the node as that class
If no attributes are left, classify by majority rule
If no data points are left (no such example seen), classify as the majority class from the entire dataset
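The framework above can be sketched in Python. This is a minimal sketch under assumptions not in the slides: examples are dicts of categorical features plus a "label" key, and choose_attribute is a placeholder for the splitting criterion (ID3's information gain, introduced later).

```python
from collections import Counter

def majority(labels):
    # Most common class label.
    return Counter(labels).most_common(1)[0][0]

def choose_attribute(examples, attributes):
    # Placeholder splitting criterion; ID3 picks the attribute
    # with the most information gain (see later slides).
    return attributes[0]

def build_tree(examples, attributes, default):
    if not examples:                  # no data points left: no such example
        return default                # seen, so use the dataset-wide majority
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:         # all data points are the same class
        return labels[0]
    if not attributes:                # no attributes left: majority rule
        return majority(labels)
    best = choose_attribute(examples, attributes)
    remaining = [a for a in attributes if a != best]
    tree = {"split": best, "children": {}}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree["children"][value] = build_tree(subset, remaining, majority(labels))
    return tree
```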
How do we know which splits are good?
Want nodes as “pure” as possible
How do we quantify the “randomness” of a node? Want:
All elements +: “randomness” = 0
All elements –: “randomness” = 0
Half +, half –: “randomness” = 1
What should the “randomness” function look like?
Typical solution: Entropy
p_P = proportion of + examples
p_N = proportion of – examples
A collection with low entropy is good.
Entropy = −p_P lg p_P − p_N lg p_N   (lg = log base 2)
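As a sketch in Python (p_P and p_N as defined above, with 0 · lg 0 taken as 0):

```python
from math import log2

def entropy(p_pos, p_neg):
    # Entropy = -p_P lg p_P - p_N lg p_N, with 0 * lg 0 treated as 0.
    return sum(-p * log2(p) for p in (p_pos, p_neg) if p > 0)

# Pure node (all + or all -): entropy 0.  Half +, half -: entropy 1.
```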
ID3 Criterion
Split on the feature with the most information gain.
Gain = entropy in original node – weighted sum of entropy in child nodes
Gain(split) = Entropy(parent) − Σ_child (size of child / size of parent) × Entropy(child)
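A sketch of the gain computation over lists of class labels (self-contained; the entropy helper here generalizes the two-class formula to counts, and each child list is the subset of parent labels sharing one attribute value):

```python
from math import log2
from collections import Counter

def node_entropy(labels):
    # Entropy of a node from its list of class labels.
    n = len(labels)
    return sum(-c/n * log2(c/n) for c in Counter(labels).values()) if n else 0.0

def gain(parent, children):
    # Gain(split) = Entropy(parent) - sum of (|child|/|parent|) * Entropy(child)
    n = len(parent)
    weighted = sum(len(child)/n * node_entropy(child) for child in children)
    return node_entropy(parent) - weighted
```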
How good is this split?
Child 1 (3+, 4−): −(3/7) lg(3/7) − (4/7) lg(4/7) = 0.985
Child 2 (6+, 1−): −(6/7) lg(6/7) − (1/7) lg(1/7) = 0.592
Weighted average: (7/14)(0.985) + (7/14)(0.592) = 0.789
Parent (9+, 5−): −(9/14) lg(9/14) − (5/14) lg(5/14) = 0.940
Gain = 0.940 − 0.789 = 0.151
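These figures can be verified directly; a quick check in Python (the slide rounds intermediate values, so the last digit of the gain differs slightly):

```python
from math import log2

def H(pos, neg):
    # Entropy of a node with `pos` positive and `neg` negative examples.
    n = pos + neg
    return sum(-c/n * log2(c/n) for c in (pos, neg) if c > 0)

print(round(H(3, 4), 3))    # 0.985  (child with 3+, 4-)
print(round(H(6, 1), 3))    # 0.592  (child with 6+, 1-)
weighted = 7/14 * H(3, 4) + 7/14 * H(6, 1)
gain = H(9, 5) - weighted   # about 0.152 unrounded; slide shows 0.151
```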
How good is this split?
Child 1 (2+, 3−): −(2/5) lg(2/5) − (3/5) lg(3/5) = 0.971
Child 2 (4+, 0−): −(4/4) lg(4/4) − (0/4) lg(0/4) = 0
Child 3 (3+, 2−): −(3/5) lg(3/5) − (2/5) lg(2/5) = 0.971
Weighted average: (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) = 0.694
Parent (9+, 5−): −(9/14) lg(9/14) − (5/14) lg(5/14) = 0.940
Gain = 0.940 − 0.694 = 0.246
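Checking this split the same way (a sketch; its larger gain, 0.246 vs. 0.151, is why ID3 would prefer this split over the previous one):

```python
from math import log2

def H(pos, neg):
    # Entropy of a node with `pos` positive and `neg` negative examples.
    n = pos + neg
    return sum(-c/n * log2(c/n) for c in (pos, neg) if c > 0)

weighted = 5/14 * H(2, 3) + 4/14 * H(4, 0) + 5/14 * H(3, 2)
gain = H(9, 5) - weighted   # about 0.247 unrounded; slide shows 0.246
```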
The big picture
Start with the root
Find the attribute to split on with the most gain
Recurse
Assessment
How do I know how well my decision tree works?
Training set: data that you use to build the decision tree
Test set: data that you did not use for training, which you use to assess the quality of the decision tree
Issues on training and test sets
Do you know the correct classification for the test set?
If you do, why not include it in the training set to get a better classifier?
If you don’t, how can you measure the performance of your classifier?
Cross Validation
Tenfold cross-validation
Ten iterations
Pull a different tenth of the dataset out each time to act as a test set
Train on the remaining training set
Measure performance on the test set
Leave-one-out cross-validation
Similar, but leave only one point out each time, then count correct vs. incorrect
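The tenfold procedure can be sketched generically; `train` and `accuracy` are hypothetical stand-ins for whatever learner (e.g. a decision tree builder) and scoring function you use:

```python
import random

def k_fold_cv(dataset, train, accuracy, k=10):
    # Shuffle, cut into k folds, and let each fold act once as the test set.
    data = dataset[:]
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test_set = folds[i]
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(train_set)
        scores.append(accuracy(model, test_set))
    return sum(scores) / k   # mean performance over the k held-out folds
```

Leave-one-out cross-validation is the special case k = len(dataset): each "fold" is a single point, and the score amounts to counting correct vs. incorrect.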
Noise and Overfitting
Can we always obtain a decision tree that is consistent with the data?
Do we always want a decision tree that is consistent with the data?
Example: predict Carleton students who become CEOs
Features: state/country of origin, GPA letter, major, age, high school GPA, junior high GPA, ...
What happens with only a few features? What happens with many features?
Overfitting
Fitting a classifier “too closely” to the data – finding patterns that aren’t really there
Prevented in decision trees by pruning
When building trees, stop recursion on irrelevant attributes
Do statistical tests at each node to determine whether splitting should continue
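One such statistical test is a chi-squared check (a sketch, not necessarily the exact test intended here): compare the observed +/− counts in each child against the counts expected if the attribute were irrelevant, and prune when the statistic falls below the critical value.

```python
def chi_squared(parent_pos, parent_neg, children):
    # children: list of (pos, neg) counts after the candidate split.
    # A small statistic means the split looks like chance -> prune.
    total = parent_pos + parent_neg
    stat = 0.0
    for pos, neg in children:
        n = pos + neg
        expected_pos = parent_pos * n / total
        expected_neg = parent_neg * n / total
        if expected_pos:
            stat += (pos - expected_pos) ** 2 / expected_pos
        if expected_neg:
            stat += (neg - expected_neg) ** 2 / expected_neg
    return stat   # compare with the chi-squared critical value for the split
```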
Examples of decision trees using Weka
Preventing overfitting by cross validation
Another technique to prevent overfitting (is this valid?):
Keep recursing on the decision tree as long as you continue to get improved accuracy on the test set
Ensemble Methods
Many “weak” learners, when combined, can perform more strongly than any one by itself
Bagging & Boosting: many different learners, voting on the classification
Multiple algorithms, or different features, or both
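The voting idea in miniature (a sketch; each classifier is any function from an input to a class label):

```python
from collections import Counter

def ensemble_predict(classifiers, x):
    # Each weak learner votes; the most common class wins.
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```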
Bagging / Boosting
Bagging: vote to determine answer
Run one algorithm on random subsets of the data to obtain multiple classifiers
Boosting: weighted vote to determine answer
Each iteration, weight more heavily the data that the learner got wrong
What does it mean to “weight more heavily” for k-nn? For decision trees?
AdaBoost is relatively recent (1997) and has quickly become popular
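Bagging as described can be sketched as follows, assuming `train` takes a dataset and returns a classifier function; bootstrap samples (random subsets drawn with replacement) stand in for the slide's "random subsets of data":

```python
import random
from collections import Counter

def bagging(train, dataset, n_models=11):
    # Train the same algorithm on bootstrap samples of the data,
    # then classify by unweighted majority vote.
    models = [train([random.choice(dataset) for _ in dataset])
              for _ in range(n_models)]
    def predict(x):
        votes = [model(x) for model in models]
        return Counter(votes).most_common(1)[0][0]
    return predict
```

Boosting differs in that each round reweights the training examples the previous learner got wrong and combines learners with a weighted vote; AdaBoost is the standard instance.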
Computational Learning Theory
Chapter 20 up next
Moving on to Chapter 20: statistical learning methods
Skipping ahead: will revisit earlier topics (perhaps) near the end of the course
20.5: Neural Networks
20.6: Support vector machines