Decision Tree & Random Forest Algorithm
Outline

- Introduction
- Example of Decision Tree
- Principles of Decision Tree
  – Entropy
  – Information gain
- Random Forest
The problem
Given a set of training cases/objects and their attribute values, try to determine the target attribute value of new examples.
– Classification
– Prediction
[Figure: a learning algorithm performs induction on the training set to learn a model; the model is then applied (deduction) to the test set.]
Training Set:

| Tid | Attrib1 | Attrib2 | Attrib3 | Class |
|-----|---------|---------|---------|-------|
| 1   | Yes     | Large   | 125K    | No    |
| 2   | No      | Medium  | 100K    | No    |
| 3   | No      | Small   | 70K     | No    |
| 4   | Yes     | Medium  | 120K    | No    |
| 5   | No      | Large   | 95K     | Yes   |
| 6   | No      | Medium  | 60K     | No    |
| 7   | Yes     | Large   | 220K    | No    |
| 8   | No      | Small   | 85K     | Yes   |
| 9   | No      | Medium  | 75K     | No    |
| 10  | No      | Small   | 90K     | Yes   |
Test Set:

| Tid | Attrib1 | Attrib2 | Attrib3 | Class |
|-----|---------|---------|---------|-------|
| 11  | No      | Small   | 55K     | ?     |
| 12  | Yes     | Medium  | 80K     | ?     |
| 13  | Yes     | Large   | 110K    | ?     |
| 14  | No      | Small   | 95K     | ?     |
| 15  | No      | Large   | 67K     | ?     |
Key Requirements
- Attribute-value description: an object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
- Predefined classes (target values): the target function has discrete output values (Boolean or multiclass).
- Sufficient data: enough training cases must be provided to learn the model.
A simple example

[Figure: a simple example of a decision tree.]
Principled Criterion
Choosing the most useful attribute for classifying examples.

Entropy
- A measure of the homogeneity of the set of examples.
- If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, the entropy is one.

Information Gain
- Measures how well a given attribute separates the training examples according to their target classification.
- This measure is used to select among the candidate attributes at each step while growing the tree.
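For reference, the standard formulas behind these two measures (not written out on the slides) are, for a set S with class proportions p_i:

```latex
% Entropy of a set S whose k classes occur with proportions p_1, ..., p_k:
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i

% Information gain of attribute A, where S_v is the subset of S
% for which A takes the value v:
Gain(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, H(S_v)
```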
Information Gain
Step 1: Calculate the entropy of the target.
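A minimal sketch of this step in Python. The class counts (9 Yes, 5 No) are an assumption based on the classic 14-example play-golf dataset; the slide's own figure is not reproduced here.

```python
from math import log2

def entropy(counts):
    """Entropy of a label distribution given as per-class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Assumed target distribution: 9 "Yes" vs. 5 "No" (classic play-golf data).
print(entropy([9, 5]))  # ~0.940
```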
Information Gain (Cont’d)
Step 2: Calculate the information gain for each attribute.
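Continuing the sketch, information gain is the parent entropy minus the size-weighted entropy of the subsets an attribute induces. The per-value counts for Outlook below (Sunny 2/3, Overcast 4/0, Rainy 3/2) are again assumed from the classic dataset.

```python
def information_gain(parent_counts, splits):
    """Parent entropy minus the size-weighted entropy of each child subset."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in splits)
    return entropy(parent_counts) - weighted

# Splitting on Outlook: Sunny (2 Yes, 3 No), Overcast (4, 0), Rainy (3, 2).
print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # ~0.247
```

Computing this for every candidate attribute and taking the maximum is exactly the selection rule used in Step 3.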
Information Gain (Cont’d)
Step 3: Choose the attribute with the largest information gain as the decision node.
Information Gain (Cont’d)
Step 4a: A branch with entropy of 0 is a leaf node.
Information Gain (Cont’d)
Step 4b: A branch with entropy more than 0 needs further splitting.
Information Gain (Cont’d)
Step 5: The algorithm is run recursively on the non-leaf branches until all data is classified.
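Putting Steps 1-5 together, a compact ID3-style sketch; the helper names and the dict-of-dicts tree representation are illustrative choices, not from the slides:

```python
from collections import Counter
from math import log2

def entropy_of(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target):
    """Information gain of splitting `rows` (a list of dicts) on `attr`."""
    parent = entropy_of([r[target] for r in rows])
    child = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        child += len(subset) / len(rows) * entropy_of(subset)
    return parent - child

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:            # Step 4a: pure branch -> leaf node
        return labels[0]
    if not attributes:                   # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, target))  # Step 3
    subtree = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        rest = [a for a in attributes if a != best]
        subtree[value] = id3(subset, rest, target)  # Steps 4b/5: recurse
    return {best: subtree}
```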
Random Forest
Decision Tree: one tree
Random Forest: more than one tree
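In code, the contrast is just a choice of estimator. A sketch with scikit-learn, assuming it is installed (the slides do not name a library); the synthetic data is only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)    # one tree
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X, y)  # many trees
```

Each tree in the forest is trained on a bootstrap sample of the data and considers a random subset of features at each split, which is what decorrelates the trees.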
Decision Tree & Random Forest
[Figure: a single decision tree versus a random forest of three trees (Tree 1, Tree 2, Tree 3).]
Decision Tree
A new case to classify:

| Outlook | Temp. | Humidity | Windy | Play Golf |
|---------|-------|----------|-------|-----------|
| Rainy   | Mild  | High     | False | ?         |

Result: No
Random Forest
[Figure: the same case is passed through each of the three trees.]

Tree 1: No
Tree 2: No
Tree 3: Yes

Votes: Yes = 1, No = 2
Result: No (majority vote)
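The vote on this slide is a plain majority; a short illustration, with the tree outputs hard-coded to match the slide:

```python
from collections import Counter

votes = ["No", "No", "Yes"]                 # Tree 1, Tree 2, Tree 3
print(Counter(votes).most_common(1)[0][0])  # -> No
```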
OOB Error Rate
The out-of-bag (OOB) error rate can be used to get a running, unbiased estimate of the classification error as trees are added to the forest.
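A sketch of how this estimate is typically obtained in practice, here with scikit-learn's oob_score option (an assumption; the slides don't prescribe a tool):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each tree is fit on a bootstrap sample; the rows it never saw (its
# out-of-bag rows) serve as a built-in validation set for that tree.
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0).fit(X, y)
print("OOB error rate:", 1 - forest.oob_score_)
```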