Decision Tree - Sharif (ce.sharif.edu/courses/91-92/1/ce717-2/resources/root...) · 2020. 9. 7.
TRANSCRIPT
Decision Tree CE-717 : Machine Learning Sharif University of Technology
M. Soleymani
Fall 2012
Some slides have been adapted from: Prof. Tom Mitchell
Decision tree
Approximating functions over a (usually discrete) domain
The learned function is represented by a decision tree
Application: Database mining
2
PlayTennis example: Training samples
3
A PlayTennis example
Feature 1: Humidity
Feature 2: Wind
Feature 3: Outlook
Feature 4: Temperature
Output: Play Tennis
Example:
𝒙 = [low, weak, rain, hot], 𝑦 = No
4
PlayTennis example: decision tree
5
Decision tree structure
Internal node: a test on some attributes
Branch: one of the possible results of the test
Leaf: predicting y
6
Decision tree learning:
Function approximation problem
Problem Setting:
Set of possible instances 𝑋
Unknown target function 𝑓: 𝑋 → 𝑌 (𝑌 is discrete valued)
Set of function hypotheses 𝐻 = * | ∶ 𝑋 → 𝑌 +
is a DT where tree sorts each 𝒙 to a leaf which assigns 𝑦
Input:
Training examples *(𝒙 𝑖 , 𝑦 𝑖 )+ of unknown target function 𝑓
Output:
Hypothesis ∈ 𝐻 that best approximates target function 𝑓
7
Decision trees hypothesis space
Suppose attributes are Boolean
Disjunction of conjunctions
Which trees represent the following functions?
𝑦 = 𝑥1 𝑎𝑛𝑑 𝑥3
𝑦 = 𝑥1 𝑜𝑟 𝑥3
𝑦 = (𝑥1 𝑎𝑛𝑑 𝑥3) 𝑜𝑟(𝑥2 𝑎𝑛𝑑 ¬𝑥4)
8
Decision tree as a rule base
Decision tree = a set of rules
Disjunctions of conjunctions of constraints on the attribute
values of instances
Each path from root to a leaf = conjunction of attribute tests
All of the leaves with 𝑦 = 𝑖 are considered to find the rule for 𝑦 = 𝑖
9
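The path-to-rule reading can be made concrete. With a hypothetical nested-tuple tree encoding (attribute names and the tree itself are illustrative), each root-to-leaf path becomes one conjunctive rule:

```python
def extract_rules(tree, path=()):
    """Yield one (conditions, label) pair per root-to-leaf path.
    An internal node is (attribute, {value: subtree}); a leaf is a label."""
    if not isinstance(tree, tuple):
        yield path, tree
        return
    attr, branches = tree
    for value, subtree in branches.items():
        yield from extract_rules(subtree, path + ((attr, value),))

# Hypothetical tree: test Outlook, then Wind under "rain"
tree = ("outlook", {"sunny": "no",
                    "rain": ("wind", {"weak": "yes", "strong": "no"})})

for conditions, label in extract_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in conditions), "=>", label)
# outlook=sunny => no
# outlook=rain AND wind=weak => yes
# outlook=rain AND wind=strong => no
```

The rules whose leaves share a label ("no" here) are then OR-ed together into the single disjunctive rule for that class.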
How to construct basic decision tree?
Learning an optimal decision tree is NP-Complete
Instead, we use a greedy search based on a heuristic
Which attribute at the root?
The attribute 𝑨 that is the best classifier
Measure: how well it splits the set into homogeneous subsets (having the same value of the target)
We prefer decisions leading to a simple, compact tree with few nodes
How to form descendant?
Descendant is created for each possible value of 𝐴
Training examples are sorted to descendant nodes
10
Basic decision tree learning algorithm (ID3)
ID3(𝐸𝑥𝑎𝑚𝑝𝑙𝑒𝑠, 𝐴𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠)
1) Create a root node 𝑟
2) If all examples have the same label or 𝐴𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠 = {}, return 𝑟 with the proper label
3) 𝐴 ← the best classifier attribute on 𝐸𝑥𝑎𝑚𝑝𝑙𝑒𝑠
4) For each value of 𝐴, add a new branch below 𝑟 and sort the examples accordingly
5) Recursively grow the tree under each branch
Top down, Greedy, No backtrack
11
ID3
12
•ID3(Examples, Target_Attribute, Attributes)
•Create a root node for the tree
•If all examples are positive, return the single-node tree Root, with label = +
•If all examples are negative, return the single-node tree Root, with label = -
•If the set of predicting attributes is empty, then
• return Root, with label = most common value of the target attribute in the examples
•else
•A = the attribute that best classifies the examples
•The decision attribute for Root = A
•for each possible value, 𝑣𝑖, of A
•Add a new tree branch below Root, corresponding to the test A = 𝑣𝑖
•Let Examples(𝑣𝑖) be the subset of examples that have the value 𝑣𝑖 for A
•if Examples(𝑣𝑖) is empty, then
• below this new branch add a leaf node with label = most common target value in the examples
•else below this new branch add the subtree ID3(Examples(𝒗𝒊), Target_Attribute, Attributes – {A})
•return Root
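The pseudocode above can be sketched in runnable form. This is a minimal illustration, not the original implementation: the attribute and label names are hypothetical, examples are dicts, an internal node is an (attribute, branches) tuple, and the empty-branch case from the pseudocode cannot arise here because branches are created only for values observed in the examples:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(examples, attributes, target):
    """The attribute with maximal information gain on these examples."""
    base = entropy([e[target] for e in examples])
    def gain(a):
        n = len(examples)
        rem = 0.0
        for v in set(e[a] for e in examples):
            sub = [e[target] for e in examples if e[a] == v]
            rem += len(sub) / n * entropy(sub)
        return base - rem
    return max(attributes, key=gain)

def id3(examples, attributes, target):
    labels = [e[target] for e in examples]
    # Pure node, or no attributes left -> leaf with the majority label
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(examples, attributes, target)
    branches = {}
    for v in set(e[a] for e in examples):        # one branch per observed value
        subset = [e for e in examples if e[a] == v]
        branches[v] = id3(subset, [x for x in attributes if x != a], target)
    return (a, branches)

def predict(tree, x):
    while isinstance(tree, tuple):
        a, branches = tree
        tree = branches[x[a]]
    return tree

# Hypothetical mini data set (attribute names and values are illustrative)
data = [
    {"outlook": "sunny", "wind": "weak",   "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "rain",  "wind": "weak",   "play": "yes"},
    {"outlook": "rain",  "wind": "strong", "play": "no"},
]
tree = id3(data, ["outlook", "wind"], "play")
print(predict(tree, {"outlook": "rain", "wind": "weak"}))  # yes
```

Note the top-down, greedy, no-backtrack structure: each recursive call commits to the locally best attribute and never revisits that choice.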
Which attribute is the best?
13
Which attribute is the best?
14
A variety of heuristics for picking a good test
Information gain: originated with ID3 (Quinlan, 1979).
Gini impurity
…
Entropy
𝐻(𝑋) = − ∑_{𝑥𝑖 ∈ 𝑋} 𝑃(𝑥𝑖) log 𝑃(𝑥𝑖)
Amount of uncertainty associated with a specific
probability distribution
15
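The sum above can be estimated from empirical label frequencies. A minimal sketch (the data are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum_x P(x) * log2 P(x), with P estimated by counting."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 5 + ["no"] * 5))  # 1.0: a 50/50 split is maximally uncertain
print(entropy(["yes"] * 9 + ["no"]))      # ~0.469: an almost-pure sample has low entropy
```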
Entropy
Information theory:
𝐻 𝑋 : expected number of bits needed to encode a randomly
drawn value of 𝑋 (under most efficient code)
Most efficient code assigns −log 𝑃(𝑋 = 𝑖) bits to encode 𝑋 = 𝑖
⇒ expected number of bits to code one random 𝑋 is 𝐻(𝑋)
16
Entropy for a Boolean variable
[Figure: 𝐻(𝑋) plotted against 𝑃(𝑋 = 1); entropy as a measure of impurity]
𝐻(𝑋) = −1 log₂ 1 − 0 log₂ 0 = 0 (for 𝑃(𝑋 = 1) = 1)
𝐻(𝑋) = −(1/2) log₂ (1/2) − (1/2) log₂ (1/2) = 1 (for 𝑃(𝑋 = 1) = 1/2)
17
Conditional entropy
𝐻(𝑌|𝑋) = − ∑_𝑖 ∑_𝑗 𝑃(𝑋 = 𝑖, 𝑌 = 𝑗) log 𝑃(𝑌 = 𝑗 | 𝑋 = 𝑖)
𝐻(𝑌|𝑋) = ∑_𝑖 𝑃(𝑋 = 𝑖) [− ∑_𝑗 𝑃(𝑌 = 𝑗 | 𝑋 = 𝑖) log 𝑃(𝑌 = 𝑗 | 𝑋 = 𝑖)]
𝑃(𝑋 = 𝑖): probability of following the 𝑖-th branch for 𝑋
[− ∑_𝑗 …]: entropy of 𝑌 for the samples sorted to the 𝑖-th branch
18
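The second form (weight each branch by 𝑃(𝑋 = 𝑖), then take the entropy of 𝑌 within the branch) translates directly into code. A sketch with illustrative data:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y|X) = sum_i P(X=i) * H(Y | X=i), estimated from paired samples."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)          # sort each y to its branch of X
    n = len(xs)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# X determines Y exactly -> no uncertainty is left:
print(conditional_entropy([0, 0, 1, 1], ["n", "n", "y", "y"]))  # 0.0
# X carries no information about Y -> the full one bit remains:
print(conditional_entropy([0, 0, 0, 0], ["n", "y", "n", "y"]))  # 1.0
```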
Mutual Information
𝐼(𝑋, 𝑌) = 𝐻(𝑋) − 𝐻(𝑋|𝑌) = 𝐻(𝑌) − 𝐻(𝑌|𝑋)
The expected reduction in entropy caused by partitioning
examples according to this attribute
𝐻(𝑌): entropy of 𝑌 before the split
𝐻(𝑌|𝑋): entropy after the split
19
Information Gain
𝐺𝑎𝑖𝑛(𝑆, 𝐴) = 𝐼_𝑆(𝐴, 𝑌) = 𝐻_𝑆(𝑌) − 𝐻_𝑆(𝑌|𝐴)
𝑌: target variable
𝑆: samples
𝐴: variable used to sort samples
𝐺𝑎𝑖𝑛(𝑆, 𝐴) ≡ 𝐻_𝑆(𝑌) − ∑_{𝑣 ∈ Values(𝐴)} (|𝑆_𝑣| / |𝑆|) 𝐻_{𝑆_𝑣}(𝑌)
20
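The weighted-entropy form of 𝐺𝑎𝑖𝑛(𝑆, 𝐴) can be sketched directly (function and variable names are illustrative):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A) = H_S(Y) - sum_v |S_v|/|S| * H_{S_v}(Y)."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)                      # S_v: samples with A = v
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Illustrative mini sample: Wind perfectly separates PlayTennis here,
# so splitting on it removes the full 1 bit of label entropy.
wind = ["weak", "weak", "strong", "strong"]
play = ["yes", "yes", "no", "no"]
print(information_gain(wind, play))  # 1.0
```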
Information Gain: Example
21
How to find the best attribute classifier?
Information gain as our criterion for a good split
Choose the attribute that maximizes information gain at each node
Choose the 𝑗-th attribute: 𝑗 = argmax_𝑖 𝐼𝐺(𝑋_𝑖) = argmax_𝑖 [𝐻(𝑌) − 𝐻(𝑌|𝑋_𝑖)]
22
Example
23
Decision tree algorithm
24
The algorithm
either reaches homogeneous nodes
or runs out of attributes
Guaranteed to find a tree consistent with any conflict-free training set
Conflict-free training set: identical feature vectors are always assigned the same class
But it does not necessarily find the simplest such tree.
How partition instance space?
25
Decision tree
Partition the instance space into axis-parallel regions, labeled with
class value
Source: Duda & Hart's book
26
Bias in decision tree induction
Information gain gives a bias for trees with minimal depth.
Implements a search (preference) bias instead of a
language (restriction) bias.
Which Tree Should We Output?
ID3: heuristic search through space of DTs
Stops at smallest acceptable tree
Occam’s razor: prefer the
simplest hypothesis that
fits the data
27
Ockham (1285-1349) Principle of Parsimony:
“One should not increase, beyond what is necessary,
the number of entities required to explain anything.”
Why prefer short hypotheses?
(Occam’s Razor)
There are fewer short hypotheses than long ones
a short hypothesis that fits the data is therefore less likely to be a statistical coincidence
whereas it is highly probable that a sufficiently complex hypothesis will fit the data by chance
28
Over-fitting problem
ID3 usually classifies the training data perfectly
It tries to memorize every training example
Decisions made from very little data may not reflect reliable trends
A node that "should" be pure but has a single exception?
Noise in the training data: the tree fits it erroneously.
With many (irrelevant) attributes, the algorithm will continue to split nodes
this leads to over-fitting!
29
Over-fitting problem: an example
Consider adding a (noisy) training example:
How does the tree change?
| Outlook | Temp | Humidity | Wind | PlayTennis |
|---------|------|----------|--------|------------|
| Sunny | Hot | Normal | Strong | No |
30
General definition of overfitting
Hypothesis ℎ ∈ 𝐻
Training error: 𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑎𝑖𝑛(ℎ)
Expected error: 𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑢𝑒(ℎ)
ℎ overfits the training data if there is an ℎ′ ∈ 𝐻 such that
𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑎𝑖𝑛(ℎ) < 𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑎𝑖𝑛(ℎ′)
𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑢𝑒(ℎ) > 𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑢𝑒(ℎ′)
31
Over-fitting in decision tree learning
32
A question?
33
How can it be made smaller and simpler?
When should a node be declared a leaf?
If a leaf node is impure, how should the category label be assigned?
Pruning?
Avoiding overfitting
Stop growing when the data split is not statistically
significant.
Grow full tree and then prune it.
More successful than stop growing in practice.
How to select “best” tree:
Measure performance over separate validation set
MDL: minimize
𝑠𝑖𝑧𝑒(𝑡𝑟𝑒𝑒) + 𝑠𝑖𝑧𝑒(𝑚𝑖𝑠𝑐𝑙𝑎𝑠𝑠𝑖𝑓𝑖𝑐𝑎𝑡𝑖𝑜𝑛𝑠(𝑡𝑟𝑒𝑒))
34
Reduced-error pruning
Split data into train and validation set
Build tree using training set
Do until further pruning is harmful:
Evaluate impact on validation set when pruning sub-tree
rooted at each node
Temporarily remove sub-tree rooted at node
Replace it with a leaf labeled with the current majority class at that node
Measure and record error on validation set
Greedily remove the one that most improves validation set
accuracy (if any).
35
Produces smallest version of the most accurate sub-tree.
Rule post-pruning
Convert tree to equivalent set of rules
Prune each rule independently of others
Sort final rules into desired sequence for use
Perhaps most frequently used method (e.g., C4.5)
36
Continuous attributes
Tests on continuous variables as Boolean?
Either use a threshold to turn them into binary tests, or discretize
It's possible to compute the information gain for all possible thresholds (there are a finite number of training samples)
Harder if we wish to assign more than two values (can be done recursively)
37
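Because only finitely many thresholds matter (one midway between each pair of adjacent sorted values), the search can be exhaustive. A sketch with illustrative data:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try a binary test (value <= t) at midpoints between adjacent sorted
    values; return (threshold, information gain) of the best one."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, -1.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

temps = [64, 69, 72, 75, 80, 85]
play = ["yes", "yes", "yes", "no", "no", "no"]
print(best_threshold(temps, play))  # (73.5, 1.0)
```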
38
More issues on decision trees
Better splitting criteria
Information gain prefers features with many values.
Missing feature values
Features with costs
Misclassification costs
Incremental learning (ID4, ID5)
Regression trees
Ranking classifiers
Rich Caruana & Alexandru Niculescu-Mizil, An Empirical Comparison of Supervised Learning
Algorithms, ICML 2006
Top 8 are all based on various extensions of
decision trees
39
Decision tree advantages
40
Simple to understand and interpret
Requires little data preparation
Can handle both numerical and categorical data
Performs well with large data in a short time
Robust
Performs well even if its assumptions are somewhat violated