![Page 1: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/1.jpg)
Tilani Gunawardena
Algorithms: Decision Trees
![Page 2: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/2.jpg)
![Page 3: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/3.jpg)
Decision Tree
• A decision tree builds classification or regression models in the form of a tree structure.
• It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed.
• The final result is a tree with decision nodes and leaf nodes.
– A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast, Rainy)
– A leaf node (e.g., Play=Yes or Play=No) holds a classification
– The topmost decision node in a tree, which corresponds to the best predictor, is called the root node
• Decision trees can handle both categorical and numerical data.
![Page 4: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/4.jpg)
Decision Tree Learning Algorithms
• ID3 (Iterative Dichotomiser 3)
• C4.5 (successor of ID3)
• CART (Classification And Regression Tree)
• CHAID (CHi-squared Automatic Interaction Detector): performs multi-level splits when computing classification trees
• MARS: extends decision trees to handle numerical data better
![Page 5: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/5.jpg)
How it works
• The core algorithm for building decision trees, called ID3, was developed by J. R. Quinlan. It employs a top-down, greedy search through the space of possible branches, with no backtracking.
• ID3 uses Entropy and Information Gain to construct a decision tree.
![Page 6: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/6.jpg)
Divide-and-Conquer (Constructing Decision Trees)
• Divide-and-conquer approach (strategy: top-down)
– First: select an attribute for the root node; create a branch for each possible attribute value
– Then: split the instances into subsets, one for each branch extending from the node
– Finally: repeat recursively for each branch, using only the instances that reach the branch
• Stop if all instances have the same class
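For concreteness, here is a minimal Python sketch of this divide-and-conquer procedure. The dataset format (a list of dicts), the helper names, and the tie-breaking are our own illustrative assumptions, not Quinlan's original code; entropy and information gain are defined on the following slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Parent entropy minus the weighted entropy of the child subsets."""
    parent = entropy([r[target] for r in rows])
    children = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        children += len(subset) / len(rows) * entropy(subset)
    return parent - children

def build_tree(rows, attrs, target="Play"):
    labels = [r[target] for r in rows]
    # Stop if all instances have the same class (or no attributes remain):
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]          # leaf node
    # First: select the attribute for this node (highest information gain):
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    # Then: one branch per value; recurse on the instances reaching each branch:
    return {best: {value: build_tree([r for r in rows if r[best] == value],
                                     [a for a in attrs if a != best], target)
                   for value in {r[best] for r in rows}}}
```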
![Page 7: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/7.jpg)
| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |
Which attribute to select?
![Page 8: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/8.jpg)
![Page 9: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/9.jpg)
Criterion for attribute selection
• Which is the best attribute?
– The one that results in the smallest tree
– Heuristic: choose the attribute that produces the “purest” nodes
• We need a good measure of purity!
– Maximal when? Minimal when?
• Popular impurity criterion: information gain
– Information gain increases with the average purity of the subsets
• Measure information in bits
– Given a probability distribution, the information required to predict an event is the distribution’s entropy
– Entropy gives the information required in bits (this can involve fractions of bits!)
• Formula for computing the entropy:
– Entropy(p1, p2, ..., pn) = −p1 log2(p1) − p2 log2(p2) − ... − pn log2(pn)

A good purity measure for each node guides feature/attribute selection.
![Page 10: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/10.jpg)
Entropy: a common way to measure impurity
• Entropy = − Σi pi log2(pi)
– pi is the probability of class i; compute it as the proportion of class i in the set.
• Entropy comes from information theory: the higher the entropy, the more the information content.

Entropy aims to answer: “How uncertain are we of the outcome?”
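A quick sanity check of the definition (a minimal sketch; the function name is ours):

```python
import math

def entropy(*probs):
    """Entropy in bits; 0 * log2(0) is treated as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy(1.0))         # a pure node: 0.0 bits
print(entropy(0.5, 0.5))    # a 50/50 split: 1.0 bit
print(entropy(9/14, 5/14))  # the weather data's [9,5]: ~0.940 bits
```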
![Page 11: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/11.jpg)
Entropy
• A decision tree is built top-down from the root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous)
• The ID3 algorithm uses entropy to calculate the homogeneity of a sample
• If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one
![Page 12: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/12.jpg)
2-Class Cases:
• What is the entropy of a group in which all examples belong to the same class?
– entropy = ?  (minimum impurity)
• What is the entropy of a group with 50% in either class?
– entropy = ?  (maximum impurity)

Entropy = − Σi pi log2(pi)
![Page 13: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/13.jpg)
2-Class Cases:
• What is the entropy of a group in which all examples belong to the same class?
– entropy = −1 log2(1) = 0  (minimum impurity)
• What is the entropy of a group with 50% in either class?
– entropy = −0.5 log2(0.5) − 0.5 log2(0.5) = 1  (maximum impurity)
![Page 14: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/14.jpg)
Information Gain
Which test is more informative?
• Split over whether Balance exceeds 50K (branches: over 50K / less than or equal to 50K)
• Split over whether the applicant is employed (branches: employed / unemployed)
![Page 15: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/15.jpg)
Information Gain
• Impurity/Entropy (informal): measures the level of impurity in a group of examples
(figure: very impure group → less impure → minimum impurity)

Gain aims to answer: “How much did some test reduce the entropy of the training set?”
![Page 16: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/16.jpg)
Information Gain
• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.
![Page 17: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/17.jpg)
Calculating Information Gain

Information Gain = entropy(parent) − [average entropy(children)]

The entire population (30 instances) is split into two children of 17 and 13 instances:
• parent entropy: info([14,16]) = entropy(14/30, 16/30) = −(14/30) log2(14/30) − (16/30) log2(16/30) = 0.996
• child entropy (17 instances): info([13,4]) = entropy(13/17, 4/17) = −(13/17) log2(13/17) − (4/17) log2(4/17) = 0.787
• child entropy (13 instances): info([1,12]) = entropy(1/13, 12/13) = −(1/13) log2(1/13) − (12/13) log2(12/13) = 0.391
• (weighted) average entropy of children = (17/30) × 0.787 + (13/30) × 0.391 = 0.615

gain(population) = info([14,16]) − info([13,4],[1,12])
Information Gain = 0.996 − 0.615 = 0.38
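The same numbers can be checked mechanically (a sketch; `info` is our helper name):

```python
import math

def info(counts):
    """Entropy, in bits, of a list of class counts, e.g. info([14, 16])."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

parent = info([14, 16])
children = (17 / 30) * info([13, 4]) + (13 / 30) * info([1, 12])
print(parent, children, parent - children)
# ~0.997, ~0.616, gain ~0.38 (the slide rounds these to 0.996 and 0.615)
```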
![Page 18: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/18.jpg)
Calculating Information Gain

Information Gain = entropy(parent) − [average entropy(children)]

gain(population) = info([14,16]) − info([13,4],[1,12])
• info([14,16]) = entropy(14/30, 16/30) = 0.996
• info([13,4]) = entropy(13/17, 4/17) = 0.787
• info([1,12]) = entropy(1/13, 12/13) = 0.391
• info([13,4],[1,12]) = (17/30) × 0.787 + (13/30) × 0.391 = 0.615

Information Gain = info([14,16]) − info([13,4],[1,12]) = 0.996 − 0.615 = 0.38
![Page 19: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/19.jpg)
| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |
Which attribute to select?
![Page 20: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/20.jpg)
![Page 21: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/21.jpg)
Outlook = Sunny: info([2,3]) = ?
Outlook = Overcast: Info([4,0]) = ?
Outlook = Rainy: Info([2,3]) = ?

(Entropy = − Σi pi log2(pi))
![Page 22: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/22.jpg)
Outlook = Sunny: info([2,3]) = entropy(2/5, 3/5)
Outlook = Overcast: Info([4,0]) = entropy(1, 0)
Outlook = Rainy: Info([2,3]) = entropy(3/5, 2/5)

(Entropy = − Σi pi log2(pi))
![Page 23: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/23.jpg)
Outlook = Sunny: info([2,3]) = entropy(2/5, 3/5) = −2/5 log2(2/5) − 3/5 log2(3/5) = 0.971 bits
Outlook = Overcast: Info([4,0]) = entropy(1, 0) = −1 log2(1) − 0 log2(0) = 0 bits
Outlook = Rainy: Info([2,3]) = entropy(3/5, 2/5) = −3/5 log2(3/5) − 2/5 log2(2/5) = 0.971 bits

Expected information for the attribute — the (weighted) average entropy of the children:
Info([3,2],[4,0],[3,2]) = ?

Note: log(0) is normally undefined, but we evaluate 0 × log(0) as zero
![Page 24: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/24.jpg)
Outlook = Sunny: info([2,3]) = entropy(2/5, 3/5) = −2/5 log2(2/5) − 3/5 log2(3/5) = 0.971 bits
Outlook = Overcast: Info([4,0]) = entropy(1, 0) = −1 log2(1) − 0 log2(0) = 0 bits
Outlook = Rainy: Info([2,3]) = entropy(3/5, 2/5) = −3/5 log2(3/5) − 2/5 log2(2/5) = 0.971 bits

Expected information for the attribute: Info([3,2],[4,0],[3,2]) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.693 bits

Information gain = information before splitting − information after splitting
gain(Outlook) = info([9,5]) − info([2,3],[4,0],[3,2])

Note: log(0) is normally undefined, but we evaluate 0 × log(0) as zero
![Page 25: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/25.jpg)
Outlook = Sunny: info([2,3]) = entropy(2/5, 3/5) = −2/5 log2(2/5) − 3/5 log2(3/5) = 0.971 bits
Outlook = Overcast: Info([4,0]) = entropy(1, 0) = −1 log2(1) − 0 log2(0) = 0 bits
Outlook = Rainy: Info([2,3]) = entropy(3/5, 2/5) = −3/5 log2(3/5) − 2/5 log2(2/5) = 0.971 bits

Expected information for the attribute: Info([3,2],[4,0],[3,2]) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.693 bits

Information gain = information before splitting − information after splitting
gain(Outlook) = info([9,5]) − info([2,3],[4,0],[3,2]) = 0.940 − 0.693 = 0.247 bits

Note: log(0) is normally undefined, but we evaluate 0 × log(0) as zero
![Page 26: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/26.jpg)
Humidity = High: info([3,4]) = entropy(3/7, 4/7) = −3/7 log2(3/7) − 4/7 log2(4/7) = 0.524 + 0.461 = 0.985 bits
Humidity = Normal: Info([6,1]) = entropy(6/7, 1/7) = −6/7 log2(6/7) − 1/7 log2(1/7) = 0.191 + 0.401 = 0.592 bits

Expected information for the attribute: Info([3,4],[6,1]) = (7/14) × 0.985 + (7/14) × 0.592 = 0.492 + 0.296 = 0.788 bits

Information gain = information before splitting − information after splitting
gain(Humidity) = info([9,5]) − info([3,4],[6,1]) = 0.940 − 0.788 = 0.152 bits
![Page 27: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/27.jpg)
Info(all features) = Info([9,5]) = 0.940 bits

• Outlook: info(nodes) = Info([2,3],[4,0],[3,2]) = 0.693 bits → gain = 0.940 − 0.693 = 0.247 bits
• Temperature: info(nodes) = Info([2,2],[4,2],[3,1]) = 0.911 bits → gain = 0.940 − 0.911 = 0.029 bits
• Humidity: info(nodes) = Info([3,4],[6,1]) = 0.788 bits → gain = 0.940 − 0.788 = 0.152 bits
• Windy: info(nodes) = Info([6,2],[3,3]) = 0.892 bits → gain = 0.940 − 0.892 = 0.048 bits

gain(Outlook) = 0.247 bits, gain(Temperature) = 0.029 bits, gain(Humidity) = 0.152 bits, gain(Windy) = 0.048 bits

The Outlook = Overcast node is “pure”, with only “yes” instances, hence lower entropy and higher gain.
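All four gains can be reproduced from the class counts on this slide (a sketch; the helper names and the dict layout are ours):

```python
import math

def info(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

splits = {  # class counts [yes, no] in each branch, from the slide
    "Outlook":     [[2, 3], [4, 0], [3, 2]],
    "Temperature": [[2, 2], [4, 2], [3, 1]],
    "Humidity":    [[3, 4], [6, 1]],
    "Windy":       [[6, 2], [3, 3]],
}
parent = info([9, 5])  # 0.940 bits
for attr, children in splits.items():
    avg = sum(sum(c) / 14 * info(c) for c in children)
    print(attr, round(parent - avg, 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```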
![Page 28: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/28.jpg)
gain(Outlook) = 0.247 bits, gain(Temperature) = 0.029 bits, gain(Humidity) = 0.152 bits, gain(Windy) = 0.048 bits
• Select the attribute with the highest information gain
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.
• Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches)
![Page 29: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/29.jpg)
Continuing to split
• gain(Outlook) = 0.247 bits
• gain(Temperature) = 0.029 bits
• gain(Humidity) = 0.152 bits
• gain(Windy) = 0.048 bits
![Page 30: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/30.jpg)
Full data:

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |

Sunny subset:

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |

Sunny subset (Outlook column dropped):

| Temp | Humidity | Windy | Play |
|------|----------|-------|------|
| Hot | High | False | No |
| Hot | High | True | No |
| Mild | High | False | No |
| Cool | Normal | False | Yes |
| Mild | Normal | True | Yes |
![Page 31: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/31.jpg)
| Temp | Humidity | Windy | Play |
|------|----------|-------|------|
| Hot | High | False | No |
| Hot | High | True | No |
| Mild | High | False | No |
| Cool | Normal | False | Yes |
| Mild | Normal | True | Yes |

Candidate splits for this subset:
• Temperature: Hot → {No, No}; Mild → {Yes, No}; Cool → {Yes}
• Windy: False → {No, No, Yes}; True → {No, Yes}
• Humidity: High → {No, No, No}; Normal → {Yes, Yes}
• Play (class distribution): 3 No, 2 Yes
![Page 32: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/32.jpg)
Temperature: Hot → {No, No}; Mild → {Yes, No}; Cool → {Yes}
• Temperature = Hot: info([2,0]) = entropy(1, 0) = −1 log2(1) − 0 log2(0) = 0 bits
• Temperature = Mild: Info([1,1]) = entropy(1/2, 1/2) = −1/2 log2(1/2) − 1/2 log2(1/2) = 0.5 + 0.5 = 1 bit
• Temperature = Cool: Info([1,0]) = entropy(1, 0) = 0 bits
Expected information for the attribute: Info([2,0],[1,1],[1,0]) = (2/5) × 0 + (2/5) × 1 + (1/5) × 0 = 0.4 bits
gain(Temperature) = info([3,2]) − info([2,0],[1,1],[1,0]) = 0.971 − 0.4 = 0.571 bits

Windy: False → {No, No, Yes}; True → {No, Yes}
• Windy = False: info([2,1]) = entropy(2/3, 1/3) = −2/3 log2(2/3) − 1/3 log2(1/3) = 0.918 bits
• Windy = True: Info([1,1]) = entropy(1/2, 1/2) = 1 bit
Expected information for the attribute: Info([2,1],[1,1]) = (3/5) × 0.918 + (2/5) × 1 = 0.951 bits
gain(Windy) = info([3,2]) − info([2,1],[1,1]) = 0.971 − 0.951 = 0.020 bits

Humidity: High → {No, No, No}; Normal → {Yes, Yes}
• Humidity = High: info([3,0]) = entropy(1, 0) = 0 bits
• Humidity = Normal: Info([2,0]) = entropy(1, 0) = 0 bits
Expected information for the attribute: Info([3,0],[2,0]) = (3/5) × 0 + (2/5) × 0 = 0 bits
gain(Humidity) = info([3,2]) − Info([3,0],[2,0]) = 0.971 − 0 = 0.971 bits

gain(Temperature) = 0.571 bits, gain(Humidity) = 0.971 bits, gain(Windy) = 0.020 bits
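A numeric check of the Sunny-branch gains (a sketch, with the same assumed `info` helper):

```python
import math

def info(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

parent = info([3, 2])  # the Sunny subset: 3 No / 2 Yes -> 0.971 bits
for attr, children in {"Temperature": [[2, 0], [1, 1], [1, 0]],
                       "Windy":       [[2, 1], [1, 1]],
                       "Humidity":    [[3, 0], [2, 0]]}.items():
    avg = sum(sum(c) / 5 * info(c) for c in children)
    print(attr, round(parent - avg, 3))
# Temperature 0.571, Windy 0.02, Humidity 0.971
```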
![Page 33: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/33.jpg)
Full data:

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |

Rainy subset:

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Rainy | Mild | Normal | False | Yes |
| Rainy | Mild | High | True | No |

Rainy subset (Outlook dropped):

| Temp | Humidity | Windy | Play |
|------|----------|-------|------|
| Mild | High | False | Yes |
| Cool | Normal | False | Yes |
| Cool | Normal | True | No |
| Mild | Normal | False | Yes |
| Mild | High | True | No |

Rainy subset (Outlook and Humidity dropped):

| Temp | Windy | Play |
|------|-------|------|
| Mild | False | Yes |
| Cool | False | Yes |
| Cool | True | No |
| Mild | False | Yes |
| Mild | True | No |
![Page 34: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/34.jpg)
| Temp | Windy | Play |
|------|-------|------|
| Mild | False | Yes |
| Cool | False | Yes |
| Cool | True | No |
| Mild | False | Yes |
| Mild | True | No |

Candidate splits for the Rainy subset (3 Yes / 2 No):

Temperature: Hot → {}; Mild → {Yes, Yes, No}; Cool → {Yes, No}
• Temperature = Mild: Info([2,1]) = entropy(2/3, 1/3) = 0.918 bits
• Temperature = Cool: Info([1,1]) = entropy(1/2, 1/2) = 1 bit
Expected information for the attribute: Info([2,1],[1,1]) = (3/5) × 0.918 + (2/5) × 1 = 0.551 + 0.4 = 0.951 bits
gain(Temperature) = info([3,2]) − info([2,1],[1,1]) = 0.971 − 0.951 = 0.02 bits

Windy: False → {Yes, Yes, Yes}; True → {No, No}
• Windy = False: info([3,0]) = 0 bits
• Windy = True: Info([2,0]) = 0 bits
Expected information for the attribute: Info([3,0],[2,0]) = 0 bits
gain(Windy) = info([3,2]) − info([3,0],[2,0]) = 0.971 − 0 = 0.971 bits

gain(Temperature) = 0.02 bits, gain(Windy) = 0.971 bits
![Page 35: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/35.jpg)
Final decision tree
R1: If (Outlook=Sunny) And (Humidity=High) then Play=No
R2: If (Outlook=Sunny) And (Humidity=Normal) then Play=Yes
R3: If (Outlook=Overcast) then Play=Yes
R4: If (Outlook=Rainy) And (Windy=False) then Play=Yes
R5: If (Outlook=Rainy) And (Windy=True) then Play=No
Note: not all leaves need to be pure; sometimes identical instances have different classes ⇒ splitting stops when the data cannot be split any further
When the set contains only samples belonging to a single class, the decision tree consists of a single leaf
![Page 36: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/36.jpg)
Wishlist for a purity measure
• Properties we require from a purity measure:
– When a node is pure, the measure should be zero
– When impurity is maximal (i.e., all classes equally likely), the measure should be maximal
– The measure should obey the multistage property (i.e., decisions can be made in several stages):
measure([2,3,4]) = measure([2,7]) + (7/9) × measure([3,4])
• Entropy is the only function that satisfies all three properties!
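The multistage property can be verified numerically for the entropy measure (a sketch; `measure` is our name for class-count entropy):

```python
import math

def measure(counts):
    """The entropy-based purity measure, on a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

lhs = measure([2, 3, 4])
rhs = measure([2, 7]) + (7 / 9) * measure([3, 4])
print(abs(lhs - rhs) < 1e-12)  # True: entropy satisfies the multistage property
```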
![Page 37: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/37.jpg)
Properties of the entropy
• The multistage property:
• Simplification of computation:
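The formulas here are the standard textbook versions (from Witten & Frank's Data Mining, whose weather example this deck follows):

• Multistage property: entropy(p, q, r) = entropy(p, q + r) + (q + r) × entropy(q / (q + r), r / (q + r))
• Simplification of computation: info([2,3,4]) = −(2/9) log2(2/9) − (3/9) log2(3/9) − (4/9) log2(4/9) = [−2 log2(2) − 3 log2(3) − 4 log2(4) + 9 log2(9)] / 9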
![Page 38: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/38.jpg)
Highly-branching attributes
• Problematic: attributes with a large number of values (extreme case: ID code)
• Subsets are more likely to be pure if there is a large number of values
– Information gain is biased towards choosing attributes with a large number of values
– This may result in overfitting (selection of an attribute that is non-optimal for prediction)
• Another problem: fragmentation
![Page 39: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/39.jpg)
Entropy of the split: with ID code, each leaf contains a single instance, so every subset is pure and the information after the split is 0. Information gain is therefore maximal for ID code (namely 0.940 bits).
![Page 40: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/40.jpg)
Gain Ratio
• Gain ratio: a modification of the information gain that reduces its bias
• Gain ratio takes the number and size of branches into account when choosing an attribute
– It corrects the information gain by taking the intrinsic information of a split into account
• Intrinsic information: entropy of the distribution of instances into branches (i.e., how much information do we need to tell which branch an instance belongs to)
![Page 41: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/41.jpg)
Computing the gain ratio
• Example: intrinsic information for ID code
– Info([1,1,...,1]) = 14 × (−1/14 × log2(1/14)) = 3.807 bits
• The value of an attribute decreases as the intrinsic information gets larger
• Definition of gain ratio: gain_ratio(attribute) = gain(attribute) / intrinsic_info(attribute)
• Example: gain_ratio(ID code) = 0.940 bits / 3.807 bits = 0.246
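Numerically (a small check using the slide's values):

```python
import math

# Intrinsic information of the 14-way, all-singleton ID-code split: 3.807 bits.
intrinsic = 14 * (-(1 / 14) * math.log2(1 / 14))
gain_id = 0.940                       # gain(ID code), taken from the slide
print(round(gain_id / intrinsic, 3))  # 0.247 (the slide truncates to 0.246)
```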
![Page 42: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/42.jpg)
Gain ratios for weather data
![Page 43: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/43.jpg)
Building a Decision Tree (ID3 algorithm)
• Assume attributes are discrete
– Discretize continuous attributes
• Choose the attribute with the highest information gain
• Create branches for each value of the attribute
• Partition the examples based on the selected attribute
• Repeat with the remaining attributes
• Stopping conditions:
– All examples are assigned the same label
– No examples left
![Page 44: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/44.jpg)
C4.5 Extensions
Consider every possible binary partition; choose the partition with the highest gain.
![Page 45: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/45.jpg)
Discussion
• Top-down induction of decision trees: ID3, an algorithm developed by Ross Quinlan
– Gain ratio is just one modification of this basic algorithm
– ⇒ C4.5: deals with numeric attributes, missing values, noisy data
• Similar approach: CART
• There are many other attribute selection criteria! (But they make little difference in the accuracy of the result)
![Page 46: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/46.jpg)
Q
• Suppose there is a student who decides whether or not to go to campus on any given day based on the weather, wake-up time, and whether there is a seminar talk he is interested in attending. Data were collected from 13 days.
![Page 47: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/47.jpg)
| Person | Hair Length | Weight | Age | Class |
|--------|-------------|--------|-----|-------|
| Homer | 0" | 250 | 36 | M |
| Marge | 10" | 150 | 34 | F |
| Bart | 2" | 90 | 10 | M |
| Lisa | 6" | 78 | 8 | F |
| Maggie | 4" | 20 | 1 | F |
| Abe | 1" | 170 | 70 | M |
| Selma | 8" | 160 | 41 | F |
| Otto | 10" | 180 | 38 | M |
| Krusty | 6" | 200 | 45 | M |
| Comic | 8" | 290 | 38 | ? |
![Page 48: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/48.jpg)
![Page 49: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/49.jpg)
Let us try splitting on Hair Length

Hair Length <= 5? (yes / no)
Entropy(4F, 5M) = −(4/9) log2(4/9) − (5/9) log2(5/9) = 0.9911
• yes: Entropy(1F, 3M) = −(1/4) log2(1/4) − (3/4) log2(3/4) = 0.8113
• no: Entropy(3F, 2M) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.9710

gain(Hair Length <= 5) = 0.9911 − (4/9 × 0.8113 + 5/9 × 0.9710) = 0.0911
![Page 50: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/50.jpg)
Let us try splitting on Weight

Weight <= 160? (yes / no)
Entropy(4F, 5M) = −(4/9) log2(4/9) − (5/9) log2(5/9) = 0.9911
• yes: Entropy(4F, 1M) = −(4/5) log2(4/5) − (1/5) log2(1/5) = 0.7219
• no: Entropy(0F, 4M) = −(0/4) log2(0/4) − (4/4) log2(4/4) = 0

gain(Weight <= 160) = 0.9911 − (5/9 × 0.7219 + 4/9 × 0) = 0.5900
![Page 51: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/51.jpg)
Let us try splitting on Age

Age <= 40? (yes / no)
Entropy(4F, 5M) = −(4/9) log2(4/9) − (5/9) log2(5/9) = 0.9911
• yes: Entropy(3F, 3M) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1
• no: Entropy(1F, 2M) = −(1/3) log2(1/3) − (2/3) log2(2/3) = 0.9183

gain(Age <= 40) = 0.9911 − (6/9 × 1 + 3/9 × 0.9183) = 0.0183
![Page 52: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/52.jpg)
Weight <= 160? (yes / no); then, on the yes-branch: Hair Length <= 2? (yes / no)

Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified… So we simply recurse!

This time we find that we can split on Hair Length, and we are done!

gain(Hair Length <= 5) = 0.0911
gain(Weight <= 160) = 0.5900
gain(Age <= 40) = 0.0183
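These three gains can be reproduced from the table above (a sketch; the tuple layout and helper names are ours):

```python
import math

# The nine labeled people from the slide: (hair length, weight, age, class).
people = [(0, 250, 36, "M"), (10, 150, 34, "F"), (2, 90, 10, "M"),
          (6, 78, 8, "F"), (4, 20, 1, "F"), (1, 170, 70, "M"),
          (8, 160, 41, "F"), (10, 180, 38, "M"), (6, 200, 45, "M")]

def entropy(labels):
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n)
                for c in set(labels))

def gain(split):
    """Gain of a boolean test over the whole group of nine."""
    yes = [c for *features, c in people if split(features)]
    no = [c for *features, c in people if not split(features)]
    parent = entropy([c for *_, c in people])
    return parent - (len(yes) / 9 * entropy(yes) + len(no) / 9 * entropy(no))

print(round(gain(lambda f: f[0] <= 5), 4))    # hair length <= 5: 0.0911
print(round(gain(lambda f: f[1] <= 160), 4))  # weight <= 160: ~0.5900
print(round(gain(lambda f: f[2] <= 40), 4))   # age <= 40: 0.0183
```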
![Page 53: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/53.jpg)
| Person | Hair Length | Weight | Age | Class |
|--------|-------------|--------|-----|-------|
| Marge | 10" | 150 | 34 | F |
| Bart | 2" | 90 | 10 | M |
| Lisa | 6" | 78 | 8 | F |
| Maggie | 4" | 20 | 1 | F |
| Selma | 8" | 160 | 41 | F |
![Page 54: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/54.jpg)
Hair Length <= 2? (yes / no)
Entropy(4F, 1M) = −(4/5) log2(4/5) − (1/5) log2(1/5) = 0.2575 + 0.464 = 0.721
• yes: Entropy(0F, 1M) = 0
• no: Entropy(4F, 0M) = 0

gain(Hair Length <= 2) = 0.721 − 0 = 0.721
![Page 55: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/55.jpg)
![Page 56: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/56.jpg)
Decision Tree
• Lunch with girlfriend
• Enter the restaurant or not?
![Page 57: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/57.jpg)
• Input: features about the restaurant
• Output: enter or not
• Classification or regression problem? → Classification
• Features/Attributes:
– Type: Italian, French, Thai
– Environment: fancy, classical
– Occupied?
![Page 58: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/58.jpg)
| Occupied | Type | Rainy | Hungry | Gf/friend Happiness | Class |
|----------|------|-------|--------|---------------------|-------|
| T | Pizza | T | T | T | T |
| F | Thai | T | T | T | F |
| T | Thai | F | T | T | F |
| F | Other | F | T | T | F |
| T | Other | F | T | T | T |
![Page 59: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/59.jpg)
Example of the C4.5 algorithm
TABLE 7.1 (p. 145): A simple flat database of examples for training
![Page 60: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/60.jpg)
Rule of Succession
• If I flip a coin N times and get A heads, what is the probability of getting heads on toss N+1?
– (A + 1) / (N + 2)
![Page 61: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/61.jpg)
• I have a weighted coin, but I don't know the likelihoods of flipping heads or tails
• I flip the coin 10 times and always get heads
• What's the probability of getting heads on the 11th try?
– (A + 1)/(N + 2) = (10 + 1)/(10 + 2) = 11/12
![Page 62: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/62.jpg)
• What is the probability that the sun will rise tomorrow?
• N = 1.8 × 10^12 days, A = 1.8 × 10^12 days
• (A + 1)/(N + 2) ≈ 99.999999999944%
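Both examples follow directly from the formula (a minimal check; the function name is ours):

```python
def succession(A, N):
    """Rule of succession: P(success on trial N+1) = (A + 1) / (N + 2)."""
    return (A + 1) / (N + 2)

print(succession(10, 10))          # the weighted coin: 11/12 ~= 0.9167
print(succession(1.8e12, 1.8e12))  # the sunrise: ~= 0.99999999999944
```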
![Page 63: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/63.jpg)
Sunny subset:

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |

gain(Temperature) = 0.571 bits, gain(Humidity) = 0.971 bits, gain(Windy) = 0.020 bits

Rainy subset:

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Rainy | Mild | Normal | False | Yes |
| Rainy | Mild | High | True | No |

gain(Temperature) = 0.02 bits, gain(Windy) = 0.971 bits, gain(Humidity) = 0.02 bits
![Page 64: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/64.jpg)
Example 3:
![Page 65: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/65.jpg)
D =

| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| F | F | F | F | P |
| F | F | T | T | P |
| F | T | F | T | P |
| T | T | T | F | P |
| T | F | F | F | N |
| T | T | T | T | N |
| T | T | T | F | N |

X = {X1, X2, X3, X4}
Entropy(D) = entropy(4/7, 3/7) = 0.98

Gain(X1) = 0.98 − 0.46 = 0.52
Gain(X2) = 0.98 − 0.97 = 0.01
Gain(X1) = 0.52, Gain(X2) = 0.01, Gain(X3) = 0.01, Gain(X4) = 0.01

Split on X1; the remaining attribute set is X = {X2, X3, X4}.

X1 = F:
| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| F | F | F | F | P |
| F | F | T | T | P |
| F | T | F | T | P |

X1 = T:
| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| T | T | T | F | P |
| T | F | F | F | N |
| T | T | T | T | N |
| T | T | T | F | N |
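These gains can be reproduced directly (a sketch; the data layout and helper names are ours):

```python
import math

# The dataset D from the slide: attributes X1..X4, class C.
D = [("F","F","F","F","P"), ("F","F","T","T","P"), ("F","T","F","T","P"),
     ("T","T","T","F","P"), ("T","F","F","F","N"), ("T","T","T","T","N"),
     ("T","T","T","F","N")]

def entropy(labels):
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n)
                for c in set(labels))

def gain(i):
    parent = entropy([row[-1] for row in D])
    children = 0.0
    for v in ("F", "T"):
        subset = [row[-1] for row in D if row[i] == v]
        if subset:
            children += len(subset) / len(D) * entropy(subset)
    return parent - children

for i in range(4):
    print(f"Gain(X{i+1}) = {gain(i):.2f}")
# Gain(X1) = 0.52; Gain(X2..X4) = 0.02 each (the slide's 0.01 comes from
# rounding the entropies to 0.98 and 0.97 before subtracting)
```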
![Page 66: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/66.jpg)
X1 = F:
| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| F | F | F | F | P |
| F | F | T | T | P |
| F | T | F | T | P |
All instances have the same class. Return class P.

X1 = T, X = {X2, X3, X4}:
| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| T | T | T | F | P |
| T | F | F | F | N |
| T | T | T | T | N |
| T | T | T | F | N |
All attributes have the same information gain. Break ties arbitrarily: choose X2.

X2 = F:
| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| T | F | F | F | N |
All instances have the same class. Return class N.

X2 = T, X = {X3, X4}:
| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| T | T | T | F | P |
| T | T | T | T | N |
| T | T | T | F | N |
X3 has zero information gain; X4 has positive information gain. Choose X4.
![Page 67: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/67.jpg)
X4 = T:
| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| T | T | T | T | N |
All instances have the same class. Return N.

X4 = F, X = {X3}:
| X1 | X2 | X3 | X4 | C |
|----|----|----|----|---|
| T | T | T | F | P |
| T | T | T | F | N |
X3 has zero information gain, so there is no suitable attribute for splitting.
Return the most common class (break ties arbitrarily). Note: the data is inconsistent!
![Page 68: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/68.jpg)
Example 4
![Page 69: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/69.jpg)
Weather data for Example 4 (note the first instance: Play = Yes):

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny | Hot | High | False | Yes |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |

Resulting decision tree:
Outlook
• Sunny → Humidity
– High → Temperature
– – Hot → Windy (True → No; False → Yes)
– – Mild → No
– Normal → Yes
• Overcast → Yes
• Rainy → Windy (True → No; False → Yes)
![Page 70: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/70.jpg)
Sunny subset:

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny | Hot | High | False | Yes |
| Sunny | Hot | High | True | No |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |

Gain(Temperature) = 0.971 − 0.8 = 0.171
Gain(Windy) = 0.971 − 0.951 = 0.020
Gain(Humidity) = 0.971 − 0.551 = 0.420

Tree so far: Outlook (Sunny → Humidity; Overcast → Yes; Rainy → Windy), with Humidity = Normal → {Yes, Yes} → Yes.

Humidity = High:
| O | T | H | W | P |
|---|---|---|---|---|
| S | H | H | F | Y |
| S | H | H | T | N |
| S | M | H | F | N |

Humidity = Normal:
| O | T | H | W | P |
|---|---|---|---|---|
| S | C | N | F | Y |
| S | M | N | T | Y |
![Page 71: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/71.jpg)
Tree so far: Outlook (Overcast → Yes; Rainy → Windy; Sunny → Humidity (Normal → Yes; High → Temperature)).

Humidity = High subset:
| O | T | H | W | P |
|---|---|---|---|---|
| S | H | H | F | Y |
| S | H | H | T | N |
| S | M | H | F | N |

Split on Temperature:
• Mild → {No} → leaf No
• Hot → {Yes, No} → split on Windy: False → Yes; True → No

Temperature = Hot:
| O | T | H | W | P |
|---|---|---|---|---|
| S | H | H | F | Y |
| S | H | H | T | N |

Temperature = Mild:
| O | T | H | W | P |
|---|---|---|---|---|
| S | M | H | F | N |
![Page 72: Decision tree](https://reader036.vdocument.in/reader036/viewer/2022062400/58859a791a28abd2498b5757/html5/thumbnails/72.jpg)
Final decision tree:
Outlook
• Sunny → Humidity
– High → Temperature
– – Hot → Windy (True → No: [S H H T N]; False → Yes: [S H H F Y])
– – Mild → No
– Normal → Yes
• Overcast → Yes
• Rainy → Windy (True → No; False → Yes)