classification trees hi. what should you get out of this? obtain an introduction to classifiaction...

Classification trees

What should you get out of this?

• Obtain an introduction to classifiaction tree modelling

• Allow you to follow our steps and be able to reproduce them at home.

• Understand the usefulness of classification trees.

• A „Manual” explaining our steps will be posted online.

Schedule

• Part 1 – Introduction to Classification Trees• Part 2 – Hands on example of classification

tree• Part 3 – Examples of research using

classification trees• Part 4 – CHAID and Gini Index models

• Part 5 - Assymetric payoff analysis

Let’s Begin!

• Part 1 – Michel van Dijck• Part 2 – Timo van Dockum• Part 3&4 – Stanisław Guner

• Part 5 – Ruud Moers

Part 1

Introduction to classification trees - Theory

Decision tree

• Root Node• Internal Node• Leaf Node

Binary vs. Multiway split

Binary attributes

Nominal attributes

Ordinal Attributes

Continuous attributes

Measures for selecting best split

Criteria for growing a tree

• Maximize Information gain

• Maximize Gain Ratio

Problems with decision trees

• Under fitting – Not enough data– Algorithm is “clueless”

• Over fitting– Too many nodes– Tree pruning

Part 2

How does it work in practise?

# Home Owner Marital Status Annual Income Defaulted Borrower

1 Yes Single 125K No

2 No Married 100K No

3 No Single 70K No

4 Yes Married 120K No

5 No Divorced 95K Yes

6 No Married 60K No

7 Yes Divorced 220K No

8 No Single 85K Yes

9 No Married 75K No

10 No Single 90K Yes

Node 0

10 peopleDefaulted BorrowerYes 3No 7

Node 0

No home owner

Home owner

Node 0

Single or divorced

Married

Node 0

Income < 100

Income ≥ 100

Selecting the best split

Impurity:• When all people belong to the same class, the

node has zero impurity.• When all people are equally split between the

two classes, the node has the highest impurity.

Goal: minimizing impurity

Node 0

No home owner

Home owner

p(yes) = 3/7p(no) = 4/7

High impurity

p(yes) = 0p(no) = 1

Zero impurity

Degree of impurity

• Self-information is the amount of information you receive when an event occurs.

• The smaller the probability an event occurs, the greater the self-information.

• You want to minimize expected amount of self-information to make more accurate predictions

• Definition:2log(1/p)(if p increases, 2log(1/p) decreases)

• Entropy is the expected value of the self-information.

• The Entropy of Node 1 is:3/7* 2log(1/(3/7)) + 4/7* 2log(1/(4/7))= 0,99

Node 0

Node 1: No home owner

p(yes) = 3/7p(no) = 4/7

High impurity

Similarly, the Entropy of Node 2 is:0* 2log(1/0) + 1* 2log(1/1)

Node 0

Node 2: Home owner

p(yes) = 0p(no) = 1

Zero impurity

Node 0

No home owner

Home owner

p(yes) = 3/7p(no) = 4/7

High impurity=

p(yes) = 0p(no) = 1

Zero impurity=0

Impurity = 0,88

• A split is an improvement if the weighted average of the impurities of the subnodes is smaller than the impurity of the root node.

Node 0

No home owner

Home owner

p(yes) = 3/7p(no) = 4/7

High impurity=

p(yes) = 0p(no) = 1

Zero impurity=0

Impurity = 0,88

Weighted impurity = 7/10 * 0,96 + 3/10 * 0 = 0,672 < 0,88 improvement

Split based on Weighted impurity

No split 0,88

House owner yes/no 0,67

Marital status 0,60

Annual income less or higher than 100 0,60

Let’s go for the split based on income.

Node 0

Income < 100

Income ≥ 100

Impurity = 0No further splitSingle or divorced

Married

Impurity = 0No further split

Impurity = 0,81

Weighted impurity = 4/6*0,81 + 2/6*0 = 0,54 < 0,67 < 0,88 improvement

Impurity = 0,88

Impurity = 0,67

Part 3

Case Studies

Case 1 – Classifying an Iris

Sertosa Versicolor Verginica

• Not enough knowledge to distinguish them by looking at them?

• Use the petal’s width and is length

• Follow the decision tree

• First Decision Tree constructed for this problem in 1936 by the British scientist Fisher.

• Different trees over the last 80 years

• Accuracy >95%

Case 2 – Classifcation of osteoarthritis

List of characteristics of idiopathic and non-idiopathic OA

M.D.s were surveyed in order to determine the classification atributes out of the list of potential characteristics

X-ray as a first exam

Physical examination as a first step

Repeated Synovial fluid test

Conclusions

Case 3 – Carsh vs Non-crash unit independet classification

Non crash specific factors

They used Gini impurity measure

This equation finds the gain (in terms of impurity) from creating a new node

VIM asseses performance of a variable in producing splits

VIM Performance Speed limit seem to be crucial in distinguishing purer nodes

Lift chart – Generic and baseline model

Lift chart – Generic and crash type models

Conclusions

Part 4

Modelling

Data Transformation

• „not donated” and „donated”• Attributes must characterize donators in other

ways than their label. DONAMT out.• SPSS: Aggerage test and train sheet into one

data set. Distinguish between the two by a new, 0 or 1 attribute „train”.

• Create standardized values of attributes (PCA)

DATA REDUCTION

Correlations

TIMELR TIMECL FRQRES MEDTOR AVGDON LSTDON ANNDON

TIMELR

Pearson

Correlation1 -,063** -,736** -,109** -,333** -,048** -,293**

Sig. (2-tailed) ,000 ,000 ,000 ,000 ,000 ,000