11 decision trees - tu braunschweig · 2009. 7. 10. · microsoft powerpoint - solutions ex11...

12
Exercise 1.1: Decision Tree I(9, 6) = 0.97 Calculate gain for all the attributes 11 Decision Trees Age Has job Owns house Credit rating Approve loan Young False False Fair No Young False False Good No Young True False Good Yes Young True True Fair Yes Young False False Fair No the attributes Age: E(Age) = 5/15 * I(2, 3)+ 5/15 * I(3, 2)+ 5/15 * I(4, 1) = 0.88 Gain(Age) = 0.97 – 0.88 = 0.09 Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 1 Middle False False Fair No Middle False False Good No Middle True True Good Yes Middle False True Excellent Yes Middle False True Excellent Yes Old False True Excellent Yes Old False True Good Yes Old True False Good Yes Old True False Excellent Yes Old False False Fair No

Upload: others

Post on 17-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

• Exercise 1.1: Decision Tree– I(9, 6) = 0.97

• Calculate gain for allthe attributes

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No

the attributes– Age:

• E(Age) = 5/15 * I(2, 3)+5/15 * I(3, 2)+5/15 * I(4, 1)

= 0.88

• Gain(Age) = 0.97 – 0.88= 0.09

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 1

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

Page 2: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

– Has Job:

• E(..) = 10/15 * I(4, 6)+5/15 * I(5, 0)= 0.64

• Gain(..) = 0.97 – 0.64

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No• Gain(..) = 0.97 – 0.64= 0.33

– Owns house:

• E(..) = 9/15 * I(3, 6)+6/15 * I(6, 0)= 0.54

• Gain(..) = 0.97 – 0.54= 0.43

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

Page 3: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

– Credit rating:

• E(..) = 5/15 * I(1, 4)+6/15 * I(4, 2)+4/15 * I(4, 0)= 0.60

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No

• Gain(..) = 0.97 – 0.60= 0.37

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

Page 4: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

– Split through Own_house

11 Decision Trees

Owns_house?

false truetrue

Age Has job Credit rating Approve loan

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4

Age Has job Credit rating Approve loan

Young True Fair Yes

Middle True Good Yes

Middle False Excellent Yes

Middle False Excellent Yes

Old False Excellent Yes

Old False Good Yes

Age Has job Credit rating Approve loan

Young False Fair No

Young False Good No

Young True Good Yes

Young False Fair No

Middle False Fair No

Middle False Good No

Old True Good Yes

Old True Excellent Yes

Old False Fair No

Page 5: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

– I(3, 6) = 0.91

– Age:

• E(Age) = 4/9 * I(1, 3)+2/9 * I(0, 2)+3/9 * I(2, 1)

11 Decision TreesOwns_house?

false

Age Has job Credit rating Approve loan

Young False Fair No

Young False Good No

Young True Good Yes

Young False Fair No

Middle False Fair No3/9 * I(2, 1)= 0.66

• Gain(Age) = 0.91 – 0.66= 0.25

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 5

Middle False Fair No

Middle False Good No

Old True Good Yes

Old True Excellent Yes

Old False Fair No

Page 6: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

– Has Job:

• E(…) = 6/9 * I(0, 6)+3/9 * I(3, 0)= 0

• Gain(…) = 0.91

11 Decision TreesOwns_house?

false

Age Has job Credit rating Approve loan

Young False Fair No

Young False Good No

Young True Good Yes

Young False Fair No

Middle False Fair No• Gain(…) = 0.91

• We can stop and split throughHas job

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 6

Middle False Fair No

Middle False Good No

Old True Good Yes

Old True Excellent Yes

Old False Fair No

Page 7: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

– Split through Has job

11 Decision TreesOwns_house?

false

Age Has job Credit rating Approve loan

Young False Fair No

Young False Good No

Has Job?

Age Has job Credit rating Approve loan

Young True Good Yes

false true

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7

Young False Good No

Young False Fair No

Middle False Fair No

Middle False Good No

Old False Fair No

Old True Good Yes

Old True Excellent Yes

Owns_house?false

Has Job?

false true

yesno

Page 8: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

– Recur into the right side

• Stop condition met

11 Decision Trees

Owns_house?

truetrue

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8

Age Has job Credit rating Approve loan

Young True Fair Yes

Middle True Good Yes

Middle False Excellent Yes

Middle False Excellent Yes

Old False Excellent Yes

Old False Good Yes

Owns_house?

true

yes

Page 9: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

– Resulting tree

11 Decision Trees

true

yes

Owns_house?false

Has Job?

false true

yesno

– X = Senior person with job, doesn’t own a house and has good credit rating

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9

true

yes

Owns_house?false

Has Job?

false true

yesno

Page 10: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

• Exercise 1.2: Naïve Bayesian Classification

– P(p) = 9/15

– P(n) = 6/15

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No

Age attribute

P(youth|p) = 2/9 P(youth|n) = 3/6

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 10

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

P(middle|p) = 3/9 P(middle|n) = 2/6

P(senior|p) = 4/9 P(senior|n) = 1/6

Has Job

P(true|p) = 5/9 P(true|n) = 0/6

P(false|p) = 4/9 P(false|n) = 6/6

Owns house

P(true|p) = 6/9 P(true|n) = 0/6

P(false|p) = 3/9 P(false|n) = 6/6

Page 11: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

• Exercise 1.2: Naïve Bayesian Classification

11 Decision Trees

Age Has job Owns house Credit rating Approve loan

Young False False Fair No

Young False False Good No

Young True False Good Yes

Young True True Fair Yes

Young False False Fair No

Credit rating

P(fair|p) = 1/9 P(fair|n) = 4/6

P(good|p) = 4/9 P(good|n) = 2/6

P(excellent|p) = 4/9 P(excellent|n) = 0/6

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 11

Middle False False Fair No

Middle False False Good No

Middle True True Good Yes

Middle False True Excellent Yes

Middle False True Excellent Yes

Old False True Excellent Yes

Old False True Good Yes

Old True False Good Yes

Old True False Excellent Yes

Old False False Fair No

Page 12: 11 Decision Trees - TU Braunschweig · 2009. 7. 10. · Microsoft PowerPoint - Solutions Ex11 Author: Silviu Created Date: 7/10/2009 11:24:15 AM

• Exercise 1.2: Naïve Bayesian Classification

– New data:

• X = Senior person with job, doesn’t own a house and hasgood credit rating

11 Decision Trees

Age attribute

P(youth|p) = 2/9 P(youth|n) = 3/6

P(middle|p) = 3/9 P(middle|n) = 2/6

P(senior|p) = 4/9 P(senior|n) = 1/6

Has Job• P(X|p)*P(p) = 4/9 * 5/9 * 3/9

* 4/9 * 9/15= 0.021

• P(X|n)*P(n) = 1/6 * 0 * … = 0

• 0.02 > 0 So X gets thecredit

Data Warehousing & DM– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12

Credit rating

P(fair|p) = 1/9 P(fair|n) = 4/6

P(good|p) = 4/9 P(good|n) = 2/6

P(excellent|p) = 4/9 P(excellent|n) = 0/6

Has Job

P(true|p) = 5/9 P(true|n) = 0/6

P(false|p) = 4/9 P(false|n) = 6/6

Owns house

P(true|p) = 6/9 P(true|n) = 0/6

P(false|p) = 3/9 P(false|n) = 6/6