
Page 1: Decision tree

Tilani Gunawardena

Algorithms: Decision Trees

Page 2: Decision tree
Page 3: Decision tree

Decision Tree

• A decision tree builds classification or regression models in the form of a tree structure.
• It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed.
• The final result is a tree with decision nodes and leaf nodes.
– A decision node (e.g. Outlook) has two or more branches (e.g. Sunny, Overcast and Rainy).
– A leaf node holds a class label (e.g. Play=Yes or Play=No).
– The topmost decision node in a tree, which corresponds to the best predictor, is called the root node.
• Decision trees can handle both categorical and numerical data.

Page 4: Decision tree

Decision Tree Learning Algorithms

• ID3 (Iterative Dichotomiser 3)
• C4.5 (successor of ID3)
• CART (Classification And Regression Tree)
• CHAID (CHi-squared Automatic Interaction Detector): performs multi-level splits when computing classification trees
• MARS: extends decision trees to handle numerical data better

Page 5: Decision tree

How it works

• The core algorithm for building decision trees, called ID3 and developed by J.R. Quinlan, employs a top-down, greedy search through the space of possible branches, with no backtracking.
• ID3 uses Entropy and Information Gain to construct a decision tree.

Page 6: Decision tree

DIVIDE-AND-CONQUER (CONSTRUCTING DECISION TREES)

• Divide-and-conquer approach (strategy: top-down)
– First: select an attribute for the root node; create a branch for each possible attribute value
– Then: split the instances into subsets, one for each branch extending from the node
– Finally: repeat recursively for each branch, using only the instances that reach that branch
• Stop if all instances have the same class (see the sketch below)
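The recursion just described can be made concrete with a short sketch. This is a minimal illustration of my own, not code from the slides; choose_best_attribute is an assumed helper that returns the attribute with the highest information gain (defined on the following slides), and rows are (attribute_dict, label) pairs.

from collections import Counter

def majority_class(rows):
    """Most common class label among the rows (used when no further split is possible)."""
    return Counter(label for _, label in rows).most_common(1)[0][0]

def build_tree(rows, attributes, choose_best_attribute):
    """Divide-and-conquer construction of a decision tree.

    choose_best_attribute(rows, attributes) is assumed to return the attribute
    with the highest information gain (see the entropy/gain slides below).
    """
    labels = {label for _, label in rows}
    if len(labels) == 1:                 # all instances have the same class -> leaf
        return labels.pop()
    if not attributes:                   # no attributes left -> majority-class leaf
        return majority_class(rows)

    best = choose_best_attribute(rows, attributes)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {attrs[best] for attrs, _ in rows}:   # one branch per attribute value
        subset = [(attrs, label) for attrs, label in rows if attrs[best] == value]
        tree[best][value] = build_tree(subset, remaining, choose_best_attribute)
    return tree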

Page 7: Decision tree

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Which attribute to select?

Page 8: Decision tree
Page 9: Decision tree

Criterion for attribute selection

• Which is the best attribute?
– The one which will result in the smallest tree
– Heuristic: choose the attribute that produces the “purest” nodes
• Need a good measure of purity!
– Maximal when?
– Minimal when?
• Popular impurity criterion: information gain
– Information gain increases with the average purity of the subsets
• Measure information in bits
– Given a probability distribution, the info required to predict an event is the distribution’s entropy
– Entropy gives the information required in bits (can involve fractions of bits!)
• Formula for computing the entropy:
Entropy(p1, p2, ..., pn) = −p1 log p1 − p2 log p2 − ... − pn log pn

A purity measure for each node guides the feature/attribute selection.

Page 10: Decision tree


Entropy: a common way to measure impurity

• Entropy = −Σi pi log2(pi)
– pi is the probability of class i. Compute it as the proportion of class i in the set.
• Entropy comes from information theory. The higher the entropy, the more the information content.

Entropy aims to answer “how uncertain are we of the outcome?”
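As a quick check of the formula, here is a small self-contained sketch (my own illustration, not code from the slides) that computes the entropy of a class distribution given as counts:

from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of class counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]   # 0*log(0) is treated as 0
    return -sum(p * log2(p) for p in probs)

# Pure node, 50/50 split, and the weather data's Play column:
print(entropy([8, 0]))   # -> 0.0
print(entropy([4, 4]))   # -> 1.0
print(entropy([9, 5]))   # -> ~0.940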

Page 11: Decision tree

Entropy

• A decision tree is built top-down from the root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).
• The ID3 algorithm uses entropy to calculate the homogeneity of a sample.
• If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one.

Page 12: Decision tree


2-Class Cases:

Entropy = −Σi pi log2(pi)

• What is the entropy of a group in which all examples belong to the same class?
– entropy = ?  (minimum impurity)
• What is the entropy of a group with 50% in either class?
– entropy = ?  (maximum impurity)

Page 13: Decision tree


2-Class Cases:

• What is the entropy of a group in which all examples belong to the same class?
– entropy = −1 log2(1) = 0  (minimum impurity)
• What is the entropy of a group with 50% in either class?
– entropy = −0.5 log2(0.5) − 0.5 log2(0.5) = 1  (maximum impurity)

Page 14: Decision tree


Information Gain

Which test is more informative?
• Split over whether Balance exceeds 50K (branches: Over 50K / Less or equal 50K)
• Split over whether the applicant is employed (branches: Unemployed / Employed)

Page 15: Decision tree


Information Gain

• Impurity/Entropy (informal): measures the level of impurity in a group of examples.
(Figure: three example groups: a very impure group, a less impure group, and a group with minimum impurity.)

Gain aims to answer: “how much did some test reduce the entropy of the training set?”

Page 16: Decision tree


Information Gain

• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.

Page 17: Decision tree


Calculating Information Gain

Information Gain = entropy(parent) − [average entropy(children)]

Entire population (30 instances):
parent entropy = −(14/30) log2(14/30) − (16/30) log2(16/30) = 0.996

First child (17 instances):
child entropy = −(13/17) log2(13/17) − (4/17) log2(4/17) = 0.787

Second child (13 instances):
child entropy = −(1/13) log2(1/13) − (12/13) log2(12/13) = 0.391

(Weighted) Average Entropy of Children = (17/30)×0.787 + (13/30)×0.391 = 0.615

gain(population) = info([14,16]) − info([13,4],[1,12]) = 0.996 − 0.615 = 0.38

Page 18: Decision tree


Calculating Information Gain

info([14,16]) = entropy(14/30, 16/30) = −(14/30) log2(14/30) − (16/30) log2(16/30) = 0.996  (parent entropy)
info([13,4]) = entropy(13/17, 4/17) = −(13/17) log2(13/17) − (4/17) log2(4/17) = 0.787  (child entropy)
info([1,12]) = entropy(1/13, 12/13) = −(1/13) log2(1/13) − (12/13) log2(12/13) = 0.391  (child entropy)

info([13,4],[1,12]) = (17/30)×0.787 + (13/30)×0.391 = 0.615

Information Gain = entropy(parent) − [average entropy(children)]
gain(population) = info([14,16]) − info([13,4],[1,12]) = 0.996 − 0.615 = 0.38
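The same arithmetic can be scripted. This is a small sketch of my own (not from the slides), assuming the parent's class counts are [14, 16] and the children's are [13, 4] and [1, 12] as above:

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """entropy(parent) minus the size-weighted average entropy of the children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Example from this slide: 0.996 - 0.615 = 0.38
print(information_gain([14, 16], [[13, 4], [1, 12]]))   # -> ~0.381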

Page 19: Decision tree

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Which attribute to select?

Page 20: Decision tree
Page 21: Decision tree

Outlook = Sunny: info([2,3]) =
Outlook = Overcast: Info([4,0]) =
Outlook = Rainy: Info([2,3]) =

Entropy = −Σi pi log2(pi)

Page 22: Decision tree

Outlook = Sunny: info([2,3]) = entropy(2/5, 3/5) =
Outlook = Overcast: Info([4,0]) = entropy(1, 0) =
Outlook = Rainy: Info([2,3]) = entropy(3/5, 2/5) =

Entropy = −Σi pi log2(pi)

Page 23: Decision tree

Outlook = Sunny: info([2,3]) = entropy(2/5, 3/5) = −2/5 log(2/5) − 3/5 log(3/5) = 0.971 bits
Outlook = Overcast: Info([4,0]) = entropy(1, 0) = −1 log(1) − 0 log(0) = 0 bits
Outlook = Rainy: Info([2,3]) = entropy(3/5, 2/5) = −3/5 log(3/5) − 2/5 log(2/5) = 0.971 bits

Expected information for attribute: Info([3,2],[4,0],[3,2]) =
(Weighted) Average Entropy of Children =

Note: log(0) is normally undefined, but we evaluate 0×log(0) as zero.

Entropy = −Σi pi log2(pi)

Page 24: Decision tree

Outlook = Sunny: info([2,3]) = entropy(2/5, 3/5) = −2/5 log(2/5) − 3/5 log(3/5) = 0.971 bits
Outlook = Overcast: Info([4,0]) = entropy(1, 0) = −1 log(1) − 0 log(0) = 0 bits
Outlook = Rainy: Info([2,3]) = entropy(3/5, 2/5) = −3/5 log(3/5) − 2/5 log(2/5) = 0.971 bits

Expected information for attribute: Info([3,2],[4,0],[3,2]) = (5/14)×0.971 + (4/14)×0 + (5/14)×0.971 = 0.693 bits

Information gain = information before splitting − information after splitting
gain(Outlook) = info([9,5]) − info([2,3],[4,0],[3,2])

Note: log(0) is normally undefined, but we evaluate 0×log(0) as zero.

Entropy = −Σi pi log2(pi)

Page 25: Decision tree

Outlook = Sunny: info([2,3]) = entropy(2/5, 3/5) = −2/5 log(2/5) − 3/5 log(3/5) = 0.971 bits
Outlook = Overcast: Info([4,0]) = entropy(1, 0) = −1 log(1) − 0 log(0) = 0 bits
Outlook = Rainy: Info([2,3]) = entropy(3/5, 2/5) = −3/5 log(3/5) − 2/5 log(2/5) = 0.971 bits

Expected information for attribute: Info([3,2],[4,0],[3,2]) = (5/14)×0.971 + (4/14)×0 + (5/14)×0.971 = 0.693 bits

Information gain = information before splitting − information after splitting
gain(Outlook) = info([9,5]) − info([2,3],[4,0],[3,2]) = 0.940 − 0.693 = 0.247 bits

Note: log(0) is normally undefined, but we evaluate 0×log(0) as zero.

Page 26: Decision tree

Humidity = High: info([3,4]) = entropy(3/7, 4/7) = −3/7 log(3/7) − 4/7 log(4/7) = 0.524 + 0.461 = 0.985 bits
Humidity = Normal: Info([6,1]) = entropy(6/7, 1/7) = −6/7 log(6/7) − 1/7 log(1/7) = 0.191 + 0.401 = 0.592 bits

Expected information for attribute: Info([3,4],[6,1]) = (7/14)×0.985 + (7/14)×0.592 = 0.492 + 0.296 = 0.788 bits

Information gain = information before splitting − information after splitting
gain(Humidity) = info([9,5]) − info([3,4],[6,1]) = 0.940 − 0.788 = 0.152 bits

Page 27: Decision tree

gain(Outlook) = 0.247 bits
gain(Temperature) = 0.029 bits
gain(Humidity) = 0.152 bits
gain(Windy) = 0.048 bits

Info(all features) = Info([9,5]) = 0.940 bits

Outlook: info(nodes) = Info([2,3],[4,0],[3,2]) = 0.693 bits, gain = 0.940 − 0.693 = 0.247 bits
Windy: info(nodes) = Info([6,2],[3,3]) = 0.892 bits, gain = 0.940 − 0.892 = 0.048 bits
Temperature: info(nodes) = Info([2,2],[4,2],[3,1]) = 0.911 bits, gain = 0.940 − 0.911 = 0.029 bits
Humidity: info(nodes) = Info([3,4],[6,1]) = 0.788 bits, gain = 0.940 − 0.788 = 0.152 bits

The Outlook = Overcast node is “pure”, containing only “yes” instances, so it has lower entropy and gives a higher gain.
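To tie the numbers together, here is a short self-contained sketch of my own (not from the slides) that recomputes the four gains directly from the weather table:

from math import log2
from collections import Counter

# Weather data: (Outlook, Temp, Humidity, Windy, Play)
data = [
    ("Sunny", "Hot", "High", False, "No"),       ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),   ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),   ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),   ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Windy": 3}

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(rows, attr):
    """Information gain of splitting the rows on the given attribute."""
    col = ATTRS[attr]
    parent = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return parent - remainder

for attr in ATTRS:
    print(attr, round(gain(data, attr), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048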

Page 28: Decision tree

gain(Outlook) = 0.247 bits
gain(Temperature) = 0.029 bits
gain(Humidity) = 0.152 bits
gain(Windy) = 0.048 bits

• Select the attribute with the highest information gain.
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.
• Constructing a decision tree is all about finding the attribute that returns the highest information gain (the most homogeneous branches).

Page 29: Decision tree

Continuing to split

• gain(Outlook) = 0.247 bits
• gain(Temperature) = 0.029 bits
• gain(Humidity) = 0.152 bits
• gain(Windy) = 0.048 bits

Page 30: Decision tree

Full weather data:
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Sunny subset:
Outlook  Temp  Humidity  Windy  Play
Sunny    Hot   High      False  No
Sunny    Hot   High      True   No
Sunny    Mild  High      False  No
Sunny    Cool  Normal    False  Yes
Sunny    Mild  Normal    True   Yes

Sunny subset with Outlook removed:
Temp  Humidity  Windy  Play
Hot   High      False  No
Hot   High      True   No
Mild  High      False  No
Cool  Normal    False  Yes
Mild  Normal    True   Yes

Page 31: Decision tree

Temp  Humidity  Windy  Play
Hot   High      False  No
Hot   High      True   No
Mild  High      False  No
Cool  Normal    False  Yes
Mild  Normal    True   Yes

Candidate splits of this subset (figure):
• Temperature: Hot → No, No; Mild → Yes, No; Cool → Yes
• Windy: False → No, No, Yes; True → No, Yes
• Humidity: High → No, No, No; Normal → Yes, Yes
• Play (class distribution): No, No, No, Yes, Yes

Page 32: Decision tree

Candidate splits of the Sunny subset (figure):
• Temperature: Hot → No, No; Mild → Yes, No; Cool → Yes
• Windy: False → No, No, Yes; True → No, Yes
• Humidity: High → No, No, No; Normal → Yes, Yes

Temperature = Hot: info([2,0]) = entropy(1, 0) = −1 log(1) − 0 log(0) = 0 bits
Temperature = Mild: Info([1,1]) = entropy(1/2, 1/2) = −1/2 log(1/2) − 1/2 log(1/2) = 0.5 + 0.5 = 1 bit
Temperature = Cool: Info([1,0]) = entropy(1, 0) = 0 bits
Expected information for attribute: Info([2,0],[1,1],[1,0]) = (2/5)×0 + (2/5)×1 + (1/5)×0 = 0.4 bits
gain(Temperature) = info([3,2]) − info([2,0],[1,1],[1,0]) = 0.971 − 0.4 = 0.571 bits

Windy = False: info([2,1]) = entropy(2/3, 1/3) = −2/3 log(2/3) − 1/3 log(1/3) = 0.918 bits
Windy = True: Info([1,1]) = entropy(1/2, 1/2) = 1 bit
Expected information for attribute: Info([2,1],[1,1]) = (3/5)×0.918 + (2/5)×1 = 0.951 bits
gain(Windy) = info([3,2]) − info([2,1],[1,1]) = 0.971 − 0.951 = 0.020 bits

Humidity = High: info([3,0]) = entropy(1, 0) = 0 bits
Humidity = Normal: Info([2,0]) = entropy(1, 0) = 0 bits
Expected information for attribute: Info([3,0],[2,0]) = (3/5)×0 + (2/5)×0 = 0 bits
gain(Humidity) = info([3,2]) − Info([3,0],[2,0]) = 0.971 − 0 = 0.971 bits

gain(Temperature) = 0.571 bits
gain(Humidity) = 0.971 bits
gain(Windy) = 0.020 bits

Page 33: Decision tree

Full weather data:
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Rainy subset:
Outlook  Temp  Humidity  Windy  Play
Rainy    Mild  High      False  Yes
Rainy    Cool  Normal    False  Yes
Rainy    Cool  Normal    True   No
Rainy    Mild  Normal    False  Yes
Rainy    Mild  High      True   No

Rainy subset with Outlook removed:
Temp  Humidity  Windy  Play
Mild  High      False  Yes
Cool  Normal    False  Yes
Cool  Normal    True   No
Mild  Normal    False  Yes
Mild  High      True   No

Rainy subset with Outlook and Humidity removed:
Temp  Windy  Play
Mild  False  Yes
Cool  False  Yes
Cool  True   No
Mild  False  Yes
Mild  True   No

Page 34: Decision tree

Temp  Windy  Play
Mild  False  Yes
Cool  False  Yes
Cool  True   No
Mild  False  Yes
Mild  True   No

Candidate splits of this subset (figure):
• Temperature: Hot → (none); Mild → Yes, Yes, No; Cool → Yes, No
• Windy: False → Yes, Yes, Yes; True → No, No
• Play (class distribution): Yes, Yes, Yes, No, No

Temperature = Mild: Info([2,1]) = entropy(2/3, 1/3) = 0.918 bits
Temperature = Cool: Info([1,1]) = 1 bit
Expected information for attribute: Info([2,1],[1,1]) = (3/5)×0.918 + (2/5)×1 = 0.551 + 0.4 = 0.951 bits
gain(Temperature) = info([3,2]) − info([2,1],[1,1]) = 0.971 − 0.951 = 0.02 bits

Windy = False: info([3,0]) = 0 bits
Windy = True: Info([2,0]) = 0 bits
Expected information for attribute: Info([3,0],[2,0]) = 0 bits
gain(Windy) = info([3,2]) − info([3,0],[2,0]) = 0.971 − 0 = 0.971 bits

gain(Temperature) = 0.02 bits
gain(Windy) = 0.971 bits

Page 35: Decision tree

Final decision tree

R1: If (Outlook=Sunny) And (Humidity=High) then Play=No

R2: If (Outlook=Sunny) And (Humidity=Normal) then Play=Yes

R3: If (Outlook=Overcast) then Play=Yes

R4: If (Outlook=Rainy) And (Windy=False) then Play=Yes

R5: If (Outlook=Rainy) And (Windy=True) then Play=No

Note: not all leaves need to be pure; sometimes identical instances have different classes.
⇒ Splitting stops when the data can’t be split any further.

When a set contains only samples belonging to a single class, the decision tree is just a leaf.
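As a concrete rendering of rules R1–R5, here is a small sketch of my own that encodes the final tree as nested conditionals (attribute values as in the weather table):

def play(outlook, humidity, windy):
    """Final decision tree for the weather data, written as the nested rules R1-R5."""
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"   # R1 / R2
    if outlook == "Overcast":
        return "Yes"                                   # R3
    if outlook == "Rainy":
        return "No" if windy else "Yes"                # R5 / R4
    raise ValueError("unknown Outlook value: " + str(outlook))

print(play("Sunny", "High", False))    # -> No
print(play("Rainy", "Normal", True))   # -> No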

Page 36: Decision tree

Wishlist for a purity measure

• Properties we require from a purity measure:
– When a node is pure, the measure should be zero
– When impurity is maximal (i.e. all classes equally likely), the measure should be maximal
– The measure should obey the multistage property (i.e. decisions can be made in several stages):

measure([2,3,4]) = measure([2,7]) + (7/9) × measure([3,4])

• Entropy is the only function that satisfies all three properties!

Page 37: Decision tree

Properties of the entropy

• The multistage property
• Simplification of computation

Page 38: Decision tree

Highly-branching attributes

• Problematic: attributes with a large number of values (extreme case: ID code)
• Subsets are more likely to be pure if there is a large number of values
– Information gain is biased towards choosing attributes with a large number of values
– This may result in overfitting (selection of an attribute that is non-optimal for prediction)
• Another problem: fragmentation

Page 39: Decision tree

Information gain is maximal for ID code (namely 0.940 bits).
Entropy of split:

Page 40: Decision tree

Gain Ratio

• Gain ratio: a modification of the information gain that reduces its bias
• Gain ratio takes the number and size of branches into account when choosing an attribute
– It corrects the information gain by taking the intrinsic information of a split into account
• Intrinsic information: entropy of the distribution of instances into branches (i.e. how much info do we need to tell which branch an instance belongs to)

Page 41: Decision tree

Computing the gain ratio

• Example: intrinsic information for ID code
– Info([1,1,...,1]) = 14 × (−1/14 × log(1/14)) = 3.807 bits
• The value of an attribute decreases as the intrinsic information gets larger
• Definition of gain ratio:
gain_ratio(attribute) = gain(attribute) / intrinsic_info(attribute)
• Example:
gain_ratio(ID code) = 0.940 bits / 3.807 bits = 0.246
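A minimal sketch of my own of the gain-ratio correction (redefining the entropy-from-counts helper so the snippet is self-contained; the ID code attribute splits the 14 weather instances into 14 branches of one instance each, as on this slide):

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain_ratio(gain, branch_sizes):
    """gain / intrinsic information, where the intrinsic info is the entropy of the branch sizes."""
    return gain / entropy(branch_sizes)

print(entropy([1] * 14))             # intrinsic info of ID code -> ~3.807 bits
print(gain_ratio(0.940, [1] * 14))   # gain ratio of ID code     -> ~0.2469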

Page 42: Decision tree

Gain ratios for weather data

Page 43: Decision tree

Building a Decision Tree (ID3 algorithm)

• Assume attributes are discrete
– Discretize continuous attributes
• Choose the attribute with the highest information gain
• Create branches for each value of the attribute
• Examples are partitioned based on the selected attribute
• Repeat with the remaining attributes
• Stopping conditions:
– All examples assigned the same label
– No examples left

Page 44: Decision tree

C4.5 Extensions

• Consider every possible binary partition; choose the partition with the highest gain

Page 45: Decision tree

Discussion

• Top-down induction of decision trees: ID3, an algorithm developed by Ross Quinlan
– Gain ratio is just one modification of this basic algorithm
– ⇒ C4.5: deals with numeric attributes, missing values, noisy data
• Similar approach: CART
• There are many other attribute selection criteria! (But little difference in the accuracy of the result)

Page 46: Decision tree

Q

• Suppose there is a student who decides whether or not to go to campus on any given day based on the weather, wake-up time, and whether there is a seminar talk he is interested in attending. Data were collected from 13 days.

Page 47: Decision tree

Person  Hair Length  Weight  Age  Class
Homer   0"           250     36   M
Marge   10"          150     34   F
Bart    2"           90      10   M
Lisa    6"           78      8    F
Maggie  4"           20      1    F
Abe     1"           170     70   M
Selma   8"           160     41   F
Otto    10"          180     38   M
Krusty  6"           200     45   M
Comic   8"           290     38   ?

Page 48: Decision tree

Person  Hair Length  Weight  Age  Class
Homer   0"           250     36   M
Marge   10"          150     34   F
Bart    2"           90      10   M
Lisa    6"           78      8    F
Maggie  4"           20      1    F
Abe     1"           170     70   M
Selma   8"           160     41   F
Otto    10"          180     38   M
Krusty  6"           200     45   M
Comic   8"           290     38   ?

Page 49: Decision tree

Let us try splitting on Hair Length

Hair Length <= 5?  (yes / no)

Entropy(4F, 5M) = −(4/9)log2(4/9) − (5/9)log2(5/9) = 0.9911  (parent)
Entropy(1F, 3M) = −(1/4)log2(1/4) − (3/4)log2(3/4) = 0.8113  (yes branch)
Entropy(3F, 2M) = −(3/5)log2(3/5) − (2/5)log2(2/5) = 0.9710  (no branch)

gain(Hair Length <= 5) = 0.9911 − (4/9 × 0.8113 + 5/9 × 0.9710) = 0.0911

Page 50: Decision tree

Let us try splitting on Weight

Weight <= 160?  (yes / no)

Entropy(4F, 5M) = −(4/9)log2(4/9) − (5/9)log2(5/9) = 0.9911  (parent)
Entropy(4F, 1M) = −(4/5)log2(4/5) − (1/5)log2(1/5) = 0.7219  (yes branch)
Entropy(0F, 4M) = −(0/4)log2(0/4) − (4/4)log2(4/4) = 0       (no branch)

gain(Weight <= 160) = 0.9911 − (5/9 × 0.7219 + 4/9 × 0) = 0.5900

Page 51: Decision tree

Let us try splitting on Age

Age <= 40?  (yes / no)

Entropy(4F, 5M) = −(4/9)log2(4/9) − (5/9)log2(5/9) = 0.9911  (parent)
Entropy(3F, 3M) = −(3/6)log2(3/6) − (3/6)log2(3/6) = 1       (yes branch)
Entropy(1F, 2M) = −(1/3)log2(1/3) − (2/3)log2(2/3) = 0.9183  (no branch)

gain(Age <= 40) = 0.9911 − (6/9 × 1 + 3/9 × 0.9183) = 0.0183

Page 52: Decision tree

(Figure: the tree so far: Weight <= 160? no → M; yes → Hair Length <= 2?)

Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified... So we simply recurse!

This time we find that we can split on Hair Length, and we are done!

gain(Hair Length <= 5) = 0.0911
gain(Weight <= 160) = 0.5900
gain(Age <= 40) = 0.0183
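Numeric attributes are handled here by testing candidate thresholds. The following is a small sketch of my own (the people data and thresholds are the ones used on these slides) that evaluates a binary split of the form attribute <= threshold:

from math import log2
from collections import Counter

# (hair_length, weight, age, class) for the 9 labelled people
people = [
    (0, 250, 36, "M"), (10, 150, 34, "F"), (2, 90, 10, "M"),
    (6, 78, 8, "F"),   (4, 20, 1, "F"),    (1, 170, 70, "M"),
    (8, 160, 41, "F"), (10, 180, 38, "M"), (6, 200, 45, "M"),
]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def threshold_gain(rows, col, threshold):
    """Information gain of the binary split rows[col] <= threshold."""
    parent = entropy([r[-1] for r in rows])
    yes = [r[-1] for r in rows if r[col] <= threshold]
    no = [r[-1] for r in rows if r[col] > threshold]
    return parent - (len(yes) / len(rows)) * entropy(yes) - (len(no) / len(rows)) * entropy(no)

print(round(threshold_gain(people, 0, 5), 4))    # Hair Length <= 5  -> 0.0911
print(round(threshold_gain(people, 1, 160), 4))  # Weight <= 160     -> 0.59
print(round(threshold_gain(people, 2, 40), 4))   # Age <= 40         -> 0.0183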

Page 53: Decision tree

Person  Hair Length  Weight  Age  Class
Marge   10"          150     34   F
Bart    2"           90      10   M
Lisa    6"           78      8    F
Maggie  4"           20      1    F
Selma   8"           160     41   F

Page 54: Decision tree

Hair Length <= 2?  (yes / no)

Entropy(4F, 1M) = −(4/5)log2(4/5) − (1/5)log2(1/5) = 0.2575 + 0.4644 = 0.7219
Entropy(0F, 1M) = 0  (yes branch)
Entropy(4F, 0M) = 0  (no branch)

gain(Hair Length <= 2) = 0.7219 − 0 = 0.7219

Page 55: Decision tree

Age <= 2?  (yes / no)

Entropy(4F, 1M) = −(4/5)log2(4/5) − (1/5)log2(1/5) = 0.2575 + 0.4644 = 0.7219
Entropy(0F, 1M) = 0  (yes branch)
Entropy(4F, 0M) = 0  (no branch)

gain(Hair Length <= 2) = 0.7219 − 0 = 0.7219

age <= 40?

Page 56: Decision tree

Decision Tree

• Lunch with girlfriend
• Enter the restaurant or not?

Page 57: Decision tree

• Input: features about the restaurant
• Output: enter or not
• Classification or regression problem? Classification
• Features/Attributes:
– Type: Italian, French, Thai
– Environment: Fancy, Classical
– Occupied?

Page 58: Decision tree

Occupied  Type   Rainy  Hungry  Gf/friend Happiness  Class
T         Pizza  T      T       T                    T
F         Thai   T      T       T                    F
T         Thai   F      T       T                    F
F         Other  F      T       T                    F
T         Other  F      T       T                    T

Page 59: Decision tree

Example of C4.5 algorithm

TABLE 7.1 (p.145): A simple flat database of examples for training

Page 60: Decision tree

Rule of Succession

• If I flip a coin N times and get A heads, what is the probability of getting heads on toss N+1?

(A + 1) / (N + 2)

Page 61: Decision tree

• I have a weighted coin, but I don’t know what the likelihoods are for flipping heads or tails.
• I flip the coin 10 times and always get heads. What’s the probability of getting heads on the 11th try?
– (A + 1) / (N + 2) = (10 + 1) / (10 + 2) = 11/12

Page 62: Decision tree

• What is the probability that the sun will rise tomorrow?
• N = 1.8 × 10^12 days
• A = 1.8 × 10^12 days

(A + 1) / (N + 2) ≈ 99.999999999944%
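A tiny sketch of my own of the rule-of-succession arithmetic used on these slides:

from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule of succession: P(next success) = (A + 1) / (N + 2)."""
    return Fraction(successes + 1, trials + 2)

print(rule_of_succession(10, 10))                    # 11/12
days = 1_800_000_000_000                             # ~1.8 x 10^12 observed sunrises
print(float(rule_of_succession(days, days)) * 100)   # ~99.999999999944 %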

Page 63: Decision tree

Sunny subset:
Outlook  Temp  Humidity  Windy  Play
Sunny    Hot   High      False  No
Sunny    Hot   High      True   No
Sunny    Mild  High      False  No
Sunny    Cool  Normal    False  Yes
Sunny    Mild  Normal    True   Yes

Rainy subset:
Outlook  Temp  Humidity  Windy  Play
Rainy    Mild  High      False  Yes
Rainy    Cool  Normal    False  Yes
Rainy    Cool  Normal    True   No
Rainy    Mild  Normal    False  Yes
Rainy    Mild  High      True   No

Sunny subset: gain(Temperature) = 0.571 bits, gain(Humidity) = 0.971 bits, gain(Windy) = 0.020 bits
Rainy subset: gain(Temperature) = 0.02 bits, gain(Windy) = 0.971 bits, gain(Humidity) = 0.02 bits

Page 64: Decision tree

Example 3:

Page 65: Decision tree

D =
X1  X2  X3  X4  C
F   F   F   F   P
F   F   T   T   P
F   T   F   T   P
T   T   T   F   P
T   F   F   F   N
T   T   T   T   N
T   T   T   F   N

X = {X1, X2, X3, X4}
Entropy(D) = entropy(4/7, 3/7) = 0.98

Gain(X1) = 0.98 − 0.46 = 0.52
Gain(X2) = 0.98 − 0.97 = 0.01

Gain(X1) = 0.52, Gain(X2) = 0.01, Gain(X3) = 0.01, Gain(X4) = 0.01

Split on X1.

X1 = F subset, X = {X2, X3, X4}:
X1  X2  X3  X4  C
F   F   F   F   P
F   F   T   T   P
F   T   F   T   P

X1 = T subset, X = {X2, X3, X4}:
X1  X2  X3  X4  C
T   T   T   F   P
T   F   F   F   N
T   T   T   T   N
T   T   T   F   N

Page 66: Decision tree

X1 = F subset, X = {X2, X3, X4}:
X1  X2  X3  X4  C
F   F   F   F   P
F   F   T   T   P
F   T   F   T   P

All instances have the same class. Return class P.

X1 = T subset, X = {X2, X3, X4}:
X1  X2  X3  X4  C
T   T   T   F   P
T   F   F   F   N
T   T   T   T   N
T   T   T   F   N

All attributes have the same information gain. Break ties arbitrarily. Choose X2.

X2 = F subset, X = {X3, X4}:
X1  X2  X3  X4  C
T   F   F   F   N

All instances have the same class. Return class N.

X2 = T subset, X = {X3, X4}:
X1  X2  X3  X4  C
T   T   T   F   P
T   T   T   T   N
T   T   T   F   N

X3 has zero information gain; X4 has positive information gain. Choose X4.

Page 67: Decision tree

X4 = T subset, X = {X3}:
X1  X2  X3  X4  C
T   T   T   T   N

All instances have the same class. Return N.

X4 = F subset, X = {X3}:
X1  X2  X3  X4  C
T   T   T   F   P
T   T   T   F   N

X3 has zero information gain. No suitable attribute for splitting.
Return the most common class (break ties arbitrarily).
Note: the data is inconsistent!

Page 68: Decision tree

Example 4

Page 69: Decision tree

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  Yes
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

(Figure: the resulting decision tree. Outlook is the root: Sunny → Humidity (High → Temperature (Hot → Windy (True → No, False → Yes), Mild → No), Normal → Yes); Overcast → Yes; Rainy → Windy (True → No, False → Yes).)

Page 70: Decision tree

Sunny subset:
Outlook  Temp  Humidity  Windy  Play
Sunny    Hot   High      False  Yes
Sunny    Hot   High      True   No
Sunny    Mild  High      False  No
Sunny    Cool  Normal    False  Yes
Sunny    Mild  Normal    True   Yes

Gain(Temperature) = 0.971 − 0.8 = 0.171
Gain(Windy) = 0.971 − 0.951 = 0.020
Gain(Humidity) = 0.971 − 0.551 = 0.420

(Figure: partial tree. Outlook: Sunny → Humidity (High → ?, Normal → Yes); Overcast → Yes; Rainy → Windy. The Humidity = High branch keeps rows S/H/H/F/Y, S/H/H/T/N, S/M/H/F/N; the Humidity = Normal branch keeps rows S/C/N/F/Y, S/M/N/T/Y.)

Page 71: Decision tree

(Figure: partial tree. Outlook: Sunny → Humidity (High → Temperature, Normal → Yes); Overcast → Yes; Rainy → Windy.)

Splitting the Humidity = High subset (rows S/H/H/F/Y, S/H/H/T/N, S/M/H/F/N):
• Temperature: Hot → Yes, No; Mild → No
• Windy: False → No, Yes; True → No

The Temperature = Hot branch keeps rows S/H/H/F/Y and S/H/H/T/N; the Temperature = Mild branch keeps row S/M/H/F/N.

Page 72: Decision tree

(Figure: final tree. Outlook: Sunny → Humidity (High → Temperature (Hot → Windy (True → No, from row S/H/H/T/N; False → Yes, from row S/H/H/F/Y), Mild → No), Normal → Yes); Overcast → Yes; Rainy → Windy (True → No, False → Yes).)

Page 73: Decision tree


Decision Tree Cont.

Sunny subset:
Outlook  Temp  Humidity  Windy  Play
Sunny    Hot   High      False  No
Sunny    Hot   High      True   No
Sunny    Mild  High      False  No
Sunny    Cool  Normal    False  Yes
Sunny    Mild  Normal    True   Yes

Rainy subset:
Outlook  Temp  Humidity  Windy  Play
Rainy    Mild  High      False  Yes
Rainy    Cool  Normal    False  Yes
Rainy    Cool  Normal    True   No
Rainy    Mild  Normal    False  Yes
Rainy    Mild  High      True   No

Sunny subset: gain(Temperature) = 0.571 bits, gain(Humidity) = 0.971 bits, gain(Windy) = 0.020 bits
Rainy subset: gain(Temperature) = 0.02 bits, gain(Windy) = 0.971 bits, gain(Humidity) = 0.02 bits

Page 74: Decision tree


Example 2:

D =
X1  X2  X3  X4  C
F   F   F   F   P
F   F   T   T   P
F   T   F   T   P
T   T   T   F   P
T   F   F   F   N
T   T   T   T   N
T   T   T   F   N

X = {X1, X2, X3, X4}
Entropy(D) = entropy(4/7, 3/7) = 0.98

Gain(X1) = 0.98 − 0.46 = 0.52
Gain(X2) = 0.98 − 0.97 = 0.01

Gain(X1) = 0.52, Gain(X2) = 0.01, Gain(X3) = 0.01, Gain(X4) = 0.01

Split on X1.

X1 = F subset, X = {X2, X3, X4}:
X1  X2  X3  X4  C
F   F   F   F   P
F   F   T   T   P
F   T   F   T   P

X1 = T subset, X = {X2, X3, X4}:
X1  X2  X3  X4  C
T   T   T   F   P
T   F   F   F   N
T   T   T   T   N
T   T   T   F   N

Page 75: Decision tree


X1 = F subset, X = {X2, X3, X4}:
X1  X2  X3  X4  C
F   F   F   F   P
F   F   T   T   P
F   T   F   T   P

All instances have the same class. Return class P.

X1 = T subset, X = {X2, X3, X4}:
X1  X2  X3  X4  C
T   T   T   F   P
T   F   F   F   N
T   T   T   T   N
T   T   T   F   N

All attributes have the same information gain. Break ties arbitrarily. Choose X2.

X2 = F subset, X = {X3, X4}:
X1  X2  X3  X4  C
T   F   F   F   N

All instances have the same class. Return class N.

X2 = T subset, X = {X3, X4}:
X1  X2  X3  X4  C
T   T   T   F   P
T   T   T   T   N
T   T   T   F   N

X3 has zero information gain; X4 has positive information gain. Choose X4.

Page 76: Decision tree


X4 = T subset, X = {X3}:
X1  X2  X3  X4  C
T   T   T   T   N

All instances have the same class. Return N.

X4 = F subset, X = {X3}:
X1  X2  X3  X4  C
T   T   T   F   P
T   T   T   F   N

X3 has zero information gain. No suitable attribute for splitting.
Return the most common class (break ties arbitrarily).
Note: the data is inconsistent!

Page 77: Decision tree


Example 3

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  Yes
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

(Figure: the resulting decision tree. Outlook is the root: Sunny → Humidity (High → Temperature (Hot → Windy (True → No, False → Yes), Mild → No), Normal → Yes); Overcast → Yes; Rainy → Windy (True → No, False → Yes).)

Page 78: Decision tree


Sunny subset:
Outlook  Temp  Humidity  Windy  Play
Sunny    Hot   High      False  Yes
Sunny    Hot   High      True   No
Sunny    Mild  High      False  No
Sunny    Cool  Normal    False  Yes
Sunny    Mild  Normal    True   Yes

Gain(Temperature) = 0.971 − 0.8 = 0.171
Gain(Windy) = 0.971 − 0.951 = 0.020
Gain(Humidity) = 0.971 − 0.551 = 0.420

(Figure: partial tree. Outlook: Sunny → Humidity (High → ?, Normal → Yes); Overcast → Yes; Rainy → Windy. The Humidity = High branch keeps rows S/H/H/F/Y, S/H/H/T/N, S/M/H/F/N; the Humidity = Normal branch keeps rows S/C/N/F/Y, S/M/N/T/Y.)

Page 79: Decision tree


(Figure: partial tree. Outlook: Sunny → Humidity (High → Temperature, Normal → Yes); Overcast → Yes; Rainy → Windy.)

Splitting the Humidity = High subset (rows S/H/H/F/Y, S/H/H/T/N, S/M/H/F/N):
• Temperature: Hot → Yes, No; Mild → No
• Windy: False → No, Yes; True → No

The Temperature = Hot branch keeps rows S/H/H/F/Y and S/H/H/T/N; the Temperature = Mild branch keeps row S/M/H/F/N.

Page 80: Decision tree


(Figure: final tree. Outlook: Sunny → Humidity (High → Temperature (Hot → Windy (True → No, from row S/H/H/T/N; False → Yes, from row S/H/H/F/Y), Mild → No), Normal → Yes); Overcast → Yes; Rainy → Windy (True → No, False → Yes).)