DECISION TREES
Learning what questions to ask
Decision Trees, 8/29/03
Decision tree
• The job is to build a tree that represents a series of questions that the classifier will ask of a data instance that is to be classified.
• Each node is a question about the value that the instance to be classified has in a particular dimension.

Discrete Data
• The fan-out of each node is determined by how many different values that dimension can take on.

Outlook   Humidity   Wind   Play Tennis?
Sunny     Normal     Weak   ???

How would the decision tree classify this data instance?

[Figure: decision tree for "Play Tennis?"]
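A tree like this can be sketched as nested dictionaries and walked one question at a time. The tree itself is an assumption here (the familiar PlayTennis tree from Mitchell's textbook), since the transcript does not preserve the slide's figure, and `TREE` and `classify` are hypothetical names chosen for illustration.

```python
# Assumed tree: the classic PlayTennis tree, not taken from the slide itself.
# Internal nodes are dicts; leaves are plain class labels.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}


def classify(tree, instance):
    """Follow one branch per question until a leaf (a plain label) is reached."""
    node = tree
    while isinstance(node, dict):
        answer = instance[node["attribute"]]  # the instance's value in this dimension
        node = node["branches"][answer]
    return node


instance = {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}
print(classify(TREE, instance))  # Yes
```

Under this assumed tree, the Sunny / Normal / Weak instance never reaches the Wind question: Outlook sends it to the Humidity node, which answers directly.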
Training
• Training data is used to build the tree.
• How do we decide what question to ask first?
• Remember the curse of dimensionality: there might be just a few dimensions that are important, and the rest could be random.
• Training builds the tree.
• Classifying means using the tree.
What Question to Ask
• What question can I ask about the data that will give me the most information gain, i.e. get me closer to being able to classify?
• Identifying the most important dimension means identifying the most important question.
• What to ask next…
   What is the outlook?
   How humid is it?
   How windy is it?
Information Theory
Another statistical approach
• The approach comes out of information theory.
• From Wikipedia: developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data.
• Basically, how much information can I cram into a given signal (how many bits can I encode)?
Entropy
• Starts with entropy…
• Entropy is a measure of how impure (non-homogeneous) the data is: the more mixed the classes, the higher the entropy.
• Purely random data (nothing but noise) has maximum entropy.
• Linearly separable data has minimum entropy.
• What does that mean with discrete data?
   Given all instances with a sunny outlook, what if all of those with “low humidity” were classified “yes, play tennis” and all of those with “high humidity” were classified “no, do not play tennis”? High entropy or low?
   Given all instances with a sunny outlook, what if half were “yes, play tennis” and half “no, don’t play” no matter what the humidity? High entropy or low?
Entropy
If going to measure…
$$\mathit{Entropy}(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$$
• $S$ is a collection of training samples
• $p_{+}$ is the proportion of positives
• $p_{-}$ is the proportion of negatives
• We define $0 \log_2 0$ as 0
Want a statistical approach that yields…
• Example: 100% positives
• Example: 0% positives
• Example: 50% positives
Example
What if a sample was split 20% / 80%?
   log2(0.2) = log(0.2)/log(2) = -2.321928
   log2(0.8) = log(0.8)/log(2) = -0.3219281
   Entropy = -(0.2)(-2.321928) - (0.8)(-0.3219281) = 0.7219281
What if 80% / 20%? Same: 0.7219281
What if 50% / 50%? Highest entropy: 1
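The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; `binary_entropy` is a hypothetical helper name, and it assumes the two-class entropy definition with 0 log2 0 treated as 0.

```python
import math


def binary_entropy(p_pos):
    """Entropy in bits of a two-class sample with proportion p_pos of positives."""
    total = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0:  # 0 * log2(0) is defined as 0, so skip empty classes
            total -= p * math.log2(p)
    return total


for p in (0.2, 0.8, 0.5, 1.0):
    print(p, round(binary_entropy(p), 7))
```

The 20% and 80% cases give the same value because the formula is symmetric in the two proportions.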
If Not Binary
$$\mathit{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$
Can extend to more classes, not just positive and negative
• If the log base is set to the number of classes, the maximum is back to 1
• If we stick with base 2, the maximum is $\log_2 c$ for $c$ classes
• From book: entropy is a measure of the expected encoding length, measured in bits
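The effect of the log base on the maximum can be seen in a short sketch. The function name `entropy`, its `base` parameter, and the four made-up labels are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(labels, base=2):
    """Entropy of a list of class labels, using the given logarithm base."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log(n / total, base) for n in counts)


labels = ["a", "b", "c", "d"]    # four equally likely classes (made-up labels)
print(entropy(labels))           # base 2: about log2(4) = 2 bits
print(entropy(labels, base=4))   # base = number of classes: about 1
```

Only classes that actually occur are counted, so the 0 log 0 case never arises in this version.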
Humidity question or Windy question?
Information Gain
Simply, the expected reduction in entropy caused by partitioning the examples according to this attribute
$$\mathit{Gain}(S, A) \equiv \mathit{Entropy}(S) - \sum_{v \in \mathit{Values}(A)} \frac{|S_v|}{|S|}\, \mathit{Entropy}(S_v)$$
• The |Sv| / |S| weight scales the contribution of each answer according to membership
• If the entropy of S is 1 and the entropy of each answer is 1, then 1 - 1 = 0: the information gain is zero
• If the entropy of S is 1 and the entropy of each answer is 0, then 1 - 0 = 1: the information gain is 1
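The two extreme cases can be reproduced with a small sketch of the gain formula. The function names and the four-sample toy data are hypothetical, chosen so that one question leaves every answer at 50/50 and the other separates the classes completely.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """values[i] is instance i's answer to the question; labels[i] is its class."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weight each answer's entropy by the share of samples that gave that answer.
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


labels = ["yes", "yes", "no", "no"]   # entropy of S is 1
useless = ["a", "b", "a", "b"]        # each answer is still half yes, half no
perfect = ["a", "a", "b", "b"]        # each answer is pure

print(information_gain(useless, labels))
print(information_gain(perfect, labels))
```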
Example
$S$: 9 yeses to tennis, 5 nos
What is the information gain?
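As a first step, the entropy of this collection follows directly from the two-class formula. Writing the 14 samples as 9 positives and 5 negatives is the only input used here; the gain of any particular question also needs the yes/no counts within each answer, which are not reproduced in this transcript.

```latex
\begin{aligned}
\mathit{Entropy}(S) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \\
                    &\approx 0.4098 + 0.5305 \\
                    &\approx 0.940
\end{aligned}
```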
The algorithm
Recursive algorithm: ID3 (Iterative Dichotomizer 3)

ID3(S, attributes yet to be processed)
   Create a Root node for the tree
   Base cases
      If S are all the same class, return the single-node tree Root with that label
      If attributes is empty, return the Root node with label equal to the most common class
   Otherwise
      Find the attribute with the greatest information gain
      Set the decision attribute for Root
      For each value of the chosen attribute
         Add a new branch below Root
         Determine Sv for that value
         If Sv is empty
            Add a leaf with label of the most common class
         Else
            Add subtree to this branch: ID3(Sv, attributes minus this attribute)
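The recursion can be sketched in Python as follows. This is one possible reading of the pseudocode, not the course's implementation: the nested-dict tree layout, the `domains` argument (the full list of values each attribute can take), and the seven-row toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attributes, domains):
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:        # base case: S is all one class
        return labels[0]
    if not attributes:               # base case: no questions left to ask
        return majority
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    for value in domains[best]:      # one branch per possible answer
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        if not pairs:                # Sv is empty: leaf with the most common class
            node["branches"][value] = majority
        else:
            sub_rows, sub_labels = zip(*pairs)
            node["branches"][value] = id3(list(sub_rows), list(sub_labels),
                                          remaining, domains)
    return node


# Hypothetical toy data, made up for this sketch.
rows = [
    {"outlook": "sunny", "humidity": "high"},
    {"outlook": "sunny", "humidity": "high"},
    {"outlook": "sunny", "humidity": "normal"},
    {"outlook": "overcast", "humidity": "high"},
    {"outlook": "overcast", "humidity": "normal"},
    {"outlook": "rain", "humidity": "high"},
    {"outlook": "rain", "humidity": "normal"},
]
labels = ["no", "no", "yes", "yes", "yes", "yes", "yes"]
domains = {"outlook": ["sunny", "overcast", "rain"], "humidity": ["high", "normal"]}

tree = id3(rows, labels, ["outlook", "humidity"], domains)
print(tree)
```

On this toy data the outlook question has the larger gain, so it becomes the root, and only the sunny branch needs a second question.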
Another example
Which attribute next?
Another Example
Next attribute?
An issue
Is there a branch for every answer?
What if no training samples had overcast as their outlook?
Could you classify a new unknown or test instance if it had overcast in that dimension?
An issue
Overfitting
• The tree often perfectly classifies the training data
   Not guaranteed, but usual: if every dimension is exhausted during the drill-down, the last decision node might have answers that are still “impure” but are labeled with the most abundant class
   For instance: on the cancer data my tree had no leaves deeper than 4 levels
• It basically memorizes the training data
   Is this the best policy?
   What if a node “should” be pure but had a single exception?
Visualizing Overfitting
[Figure: scatter plot titled "Two Classes", Y against X, with a decision boundary drawn between the classes]
Sometimes it is better to live with a little error than to try to get perfection
Overfitting
Wikipedia: In statistics, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship.
[Figure: two plots of Y against X, with X from -10 to 10 and Y from -3000 to 3000]
How to fix
• A Bayesian classifier finds the boundary that minimizes error
• If we trim the decision tree’s leaves, we get a similar effect, i.e. don’t try to memorize every single training sample
Don’t know until you know
• Withhold some data
• Use it to test
Definition
Given a hypothesis space $H$, a hypothesis $h \in H$ is said to overfit the training data if there exists some alternative hypothesis $h' \in H$, such that $h$ has smaller error than $h'$ over the training examples, but $h'$ has a smaller error than $h$ over the entire distribution of instances.
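Withholding data can be sketched as a simple random split. Error on the withheld part then stands in for error over the entire distribution, which cannot be measured directly. The names `holdout_split` and `error_rate`, the 25% fraction, and the fixed seed are assumptions for illustration.

```python
import random


def holdout_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the indices and set aside a fraction of the data for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def error_rate(predict, rows, labels):
    """Fraction of instances that the classifier `predict` gets wrong."""
    wrong = sum(predict(r) != l for r, l in zip(rows, labels))
    return wrong / len(labels)


rows = list(range(8))                       # stand-ins for real instances
labels = ["yes" if r % 2 == 0 else "no" for r in rows]
(train_rows, train_labels), (test_rows, test_labels) = holdout_split(rows, labels)
print(len(train_rows), len(test_rows))      # 6 2
```

A hypothesis that overfits would show a low `error_rate` on the training part and a noticeably higher one on the withheld part.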
How to prevent?
• Stop growing the tree early
   Set some threshold for allowable entropy
• Post pruning
   Build the tree, then remove nodes as long as doing so improves accuracy
Reduced Error Pruning
• Remove each decision node in turn and check performance
   Removing a decision node means removing all sub-trees below it and assigning the most common class
• Remove (permanently) the decision node that caused the greatest increase in accuracy
• Rinse and repeat
• Try it and see
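The loop above might look like this in Python. It is a sketch under several assumptions: trees are nested dicts whose decision nodes carry a `majority` field (the most common training class at that node), accuracy is measured on withheld validation data, and a node is removed only when accuracy strictly increases. The tree and validation rows are made up for the example.

```python
def classify(tree, row):
    while isinstance(tree, dict):
        # Fall back to the node's majority class for an answer with no branch.
        tree = tree["branches"].get(row[tree["attribute"]], tree["majority"])
    return tree


def accuracy(tree, rows, labels):
    return sum(classify(tree, r) == l for r, l in zip(rows, labels)) / len(labels)


def decision_paths(tree, path=()):
    """Yield the branch-value path leading to every decision node."""
    if isinstance(tree, dict):
        yield path
        for value, child in tree["branches"].items():
            yield from decision_paths(child, path + (value,))


def collapse(tree, path):
    """Copy of the tree with the node at `path` replaced by its majority label."""
    if not path:
        return tree["majority"]
    branches = dict(tree["branches"])
    branches[path[0]] = collapse(branches[path[0]], path[1:])
    return {**tree, "branches": branches}


def reduced_error_prune(tree, val_rows, val_labels):
    best_acc = accuracy(tree, val_rows, val_labels)
    while isinstance(tree, dict):
        candidates = [collapse(tree, p) for p in decision_paths(tree)]
        top = max(candidates, key=lambda t: accuracy(t, val_rows, val_labels))
        top_acc = accuracy(top, val_rows, val_labels)
        if top_acc <= best_acc:      # no removal helps any more: stop
            break
        tree, best_acc = top, top_acc
    return tree


# Hypothetical over-trained tree: the "strong" leaf under overcast memorizes one exception.
overfit = {
    "attribute": "outlook", "majority": "yes",
    "branches": {
        "sunny": {"attribute": "humidity", "majority": "no",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": {"attribute": "wind", "majority": "yes",
                     "branches": {"weak": "yes", "strong": "no"}},
        "rain": "yes",
    },
}
val_rows = [
    {"outlook": "overcast", "humidity": "high", "wind": "strong"},
    {"outlook": "overcast", "humidity": "normal", "wind": "weak"},
    {"outlook": "sunny", "humidity": "high", "wind": "weak"},
    {"outlook": "sunny", "humidity": "normal", "wind": "weak"},
    {"outlook": "rain", "humidity": "high", "wind": "weak"},
]
val_labels = ["yes", "yes", "no", "yes", "yes"]

pruned = reduced_error_prune(overfit, val_rows, val_labels)
print(pruned["branches"]["overcast"])  # yes
```

In this example only the wind node under overcast is removed; collapsing the humidity node or the root would lower validation accuracy, so they stay.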
Rule Post Pruning
• Build the complete (over-trained) tree
• Convert the learned tree into a set of rules
   One rule per path from root to leaf
   Each rule is a conjunction of clauses
• Remove any clause from a rule chain if removing it increases accuracy
   Remember each rule chain provides a full classification
• Sort rules by accuracy and classify in that order
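The conversion step (one rule per root-to-leaf path) can be sketched as a short recursion. This shows only the conversion, not the clause removal or sorting. The nested-dict tree is an assumption (the familiar PlayTennis tree from Mitchell's textbook), and `tree_to_rules` is a hypothetical name.

```python
def tree_to_rules(tree, conditions=()):
    """Return one (conditions, label) rule per path from root to leaf."""
    if not isinstance(tree, dict):   # a leaf: the path so far is a complete rule
        return [(conditions, tree)]
    rules = []
    for value, child in tree["branches"].items():
        clause = (tree["attribute"], value)
        rules += tree_to_rules(child, conditions + (clause,))
    return rules


# Assumed tree for illustration, not taken from the slides.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

rules = tree_to_rules(tree)
for conditions, label in rules:
    clauses = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
    print(f"IF {clauses} THEN {label}")
```

Because each rule is stored as an independent tuple of clauses, a clause can later be dropped from one rule without touching the others, which a tree cannot express.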
• Not really a tree any more
   A series of rules
• A node could both be present and not be present
   Imagine a bifurcation where one track has only the first and last “node”
Bagging
Bootstrap aggregating (bagging)
Helps to avoid overfitting
Usually applied to decision tree models (though not exclusively)
Bagging
• Machine learning ensemble meta-algorithm
• Create a bunch of models
• Do so by bootstrap sampling the training data
• Let all the models vote
[Figure: several trees, each with questions Q1 to Q4, surrounded by "Pick me!" callouts]
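Bootstrap sampling and voting can be sketched in a few lines. The base learner here is a deliberately tiny stand-in (a one-question tree on a hypothetical `outlook` attribute) rather than a full decision tree, and the function names, the seven models, and the toy data are all assumptions for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) instances with replacement."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def bagging_fit(train, rows, labels, n_models=7, seed=0):
    """Train one model per bootstrap sample of the training data."""
    rng = random.Random(seed)
    return [train(*bootstrap_sample(rows, labels, rng)) for _ in range(n_models)]


def bagging_predict(models, row):
    """Let all the models vote and return the most common answer."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


def train_stump(rows, labels):
    """Hypothetical base learner: majority class per 'outlook' value."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row["outlook"], []).append(label)
    default = Counter(labels).most_common(1)[0][0]
    table = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
    return lambda row: table.get(row["outlook"], default)


# Made-up training data: sunny days are all "no", overcast days all "yes".
rows = [{"outlook": "sunny"}] * 10 + [{"outlook": "overcast"}] * 10
labels = ["no"] * 10 + ["yes"] * 10

models = bagging_fit(train_stump, rows, labels)
print(bagging_predict(models, {"outlook": "sunny"}))
```

Any function that takes rows and labels and returns a classifier could be passed as `train`, including a full tree builder.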
Random Forest
Forest is a bunch of trees
Each tree has access to a random subset of attributes/dimensions
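Giving each tree its own random subset of attributes can be sketched as below. This follows the slide's per-tree description; the function name, the subset size of 2, and the attribute names are assumptions for illustration.

```python
import random


def attribute_subsets(attributes, n_trees, subset_size, seed=0):
    """Pick, for each tree, a random subset of the attributes it may ask about."""
    rng = random.Random(seed)
    return [rng.sample(attributes, subset_size) for _ in range(n_trees)]


attributes = ["outlook", "humidity", "wind", "temperature"]  # hypothetical names
subsets = attribute_subsets(attributes, n_trees=5, subset_size=2)
for i, subset in enumerate(subsets):
    print(f"tree {i}: {subset}")
```

Each tree would then be built using only its own subset, and the forest votes on the final class.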
The nature of Decision Trees
• Greedy algorithm
   Tries to race to an answer
   Finds the next question that best splits the data into classes by answer
• Result: short trees are preferred
Occam’s razor
The simplest answer is often the best
But does this lead to the best classifier?
Book has a philosophical discussion about this without resolving the issue
Coolness factor
• Many classifiers simply give an answer, with no reason
• Decision trees are one of the few that provide such insights