Classification trees

Upload: spencer-mccormick

Post on 12-Jan-2016

TRANSCRIPT

Page 1

Classification trees

Hi

Page 2

What should you get out of this?

• Obtain an introduction to classification tree modelling.

• Be able to follow our steps and reproduce them at home.

• Understand the usefulness of classification trees.

• A "manual" explaining our steps will be posted online.

Page 3

Schedule

• Part 1 – Introduction to classification trees

• Part 2 – Hands-on example of a classification tree

• Part 3 – Examples of research using classification trees

• Part 4 – CHAID and Gini index models

• Part 5 – Asymmetric payoff analysis

Page 4

Let’s Begin!

• Part 1 – Michel van Dijck

• Part 2 – Timo van Dockum

• Parts 3 & 4 – Stanisław Guner

• Part 5 – Ruud Moers

Page 5

Part 1

Introduction to classification trees - Theory

Page 6

Decision tree

• Root node

• Internal node

• Leaf node
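One way to picture the three node kinds in code (a hypothetical sketch; the class and field names are ours, not from the slides): the node without a parent is the root, a node with children is internal, and a childless node is a leaf that carries the predicted class.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[str] = None   # attribute tested at this node (None in a leaf)
    children: dict = field(default_factory=dict)  # split outcome -> child Node
    prediction: Optional[str] = None  # class label stored in a leaf

    def is_leaf(self):
        return not self.children

# Tiny illustration based on the Part 2 example:
root = Node(attribute="Home Owner")
root.children["Yes"] = Node(prediction="No")   # all owners in the example: no default
root.children["No"] = Node(prediction="Yes")
```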

Page 7

Binary vs. Multiway split

Page 8

Binary attributes

Page 9

Nominal attributes

Page 10

Ordinal Attributes

Page 11

Continuous attributes

Page 12

Measures for selecting best split

Page 13

Criteria for growing a tree

• Maximize Information gain

• Maximize Gain Ratio

Page 14

Problems with decision trees

• Underfitting – not enough data; the algorithm is "clueless".

• Overfitting – too many nodes; remedied by tree pruning.

Page 15

Part 2

How does it work in practice?

Page 16

#    Home Owner   Marital Status   Annual Income   Defaulted Borrower
1    Yes          Single           125K            No
2    No           Married          100K            No
3    No           Single           70K             No
4    Yes          Married          120K            No
5    No           Divorced         95K             Yes
6    No           Married          60K             No
7    Yes          Divorced         220K            No
8    No           Single           85K             Yes
9    No           Married          75K             No
10   No           Single           90K             Yes

Page 17

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7

Page 18

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7

No home owner: 7 people; Defaulted Borrower: Yes 3, No 4

Home owner: 3 people; Defaulted Borrower: Yes 0, No 3

Page 19

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7

Single or divorced: 6 people; Defaulted Borrower: Yes 3, No 3

Married: 4 people; Defaulted Borrower: Yes 0, No 4

Page 20

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7

Income < 100: 6 people; Defaulted Borrower: Yes 3, No 3

Income ≥ 100: 4 people; Defaulted Borrower: Yes 0, No 4

Page 21

Selecting the best split

Impurity:

• When all people belong to the same class, the node has zero impurity.

• When the people are equally split between the two classes, the node has the highest impurity.

Goal: minimize impurity.

Page 22

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7

No home owner: 7 people; Defaulted Borrower: Yes 3, No 4
p(yes) = 3/7, p(no) = 4/7 → high impurity

Home owner: 3 people; Defaulted Borrower: Yes 0, No 3
p(yes) = 0, p(no) = 1 → zero impurity

Page 23

Degree of impurity

• Self-information is the amount of information you receive when an event occurs.

• The smaller the probability of an event, the greater its self-information.

• You want to minimize the expected amount of self-information to make more accurate predictions.

• Definition: self-information = log₂(1/p). (As p increases, log₂(1/p) decreases.)
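The definition can be checked directly (a minimal sketch; the function name is ours):

```python
import math

# Self-information of an event with probability p, measured in bits: log2(1/p).
def self_information(p: float) -> float:
    return math.log2(1 / p)

# Rarer events carry more information:
assert self_information(0.25) > self_information(0.5)
# A fair coin flip carries exactly 1 bit.
```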

Page 24

• Entropy is the expected value of the self-information.

• The entropy of Node 1 is: 3/7 · log₂(1/(3/7)) + 4/7 · log₂(1/(4/7)) ≈ 0.99

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7

Node 1 (no home owner): 7 people; Defaulted Borrower: Yes 3, No 4
p(yes) = 3/7, p(no) = 4/7 → high impurity

Page 25

Similarly, the entropy of Node 2 is: 0 · log₂(1/0) + 1 · log₂(1/1) = 0 (using the convention 0 · log₂(1/0) = 0).

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7

Node 2 (home owner): 3 people; Defaulted Borrower: Yes 0, No 3
p(yes) = 0, p(no) = 1 → zero impurity

Page 26

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7 (impurity = 0.88)

No home owner: 7 people; Defaulted Borrower: Yes 3, No 4
p(yes) = 3/7, p(no) = 4/7 → high impurity = 0.99

Home owner: 3 people; Defaulted Borrower: Yes 0, No 3
p(yes) = 0, p(no) = 1 → zero impurity = 0

Page 27

• A split is an improvement if the weighted average of the impurities of the subnodes is smaller than the impurity of the root node.

Page 28

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7 (impurity = 0.88)

No home owner: 7 people; Defaulted Borrower: Yes 3, No 4
p(yes) = 3/7, p(no) = 4/7 → high impurity = 0.99

Home owner: 3 people; Defaulted Borrower: Yes 0, No 3
p(yes) = 0, p(no) = 1 → zero impurity = 0

Weighted impurity = 7/10 · 0.99 + 3/10 · 0 = 0.69 < 0.88 → improvement

Page 29

Split                                    Weighted impurity
No split                                 0.88
Home owner yes/no                        0.69
Marital status                           0.60
Annual income less or higher than 100    0.60

Let's go for the split based on income.
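The weighted impurities in the table can be reproduced from the class counts alone (a minimal sketch; the function names are ours):

```python
import math

def entropy(yes: int, no: int) -> float:
    """Entropy in bits of a node with the given class counts."""
    total = yes + no
    h = 0.0
    for c in (yes, no):
        if c:  # convention: 0 * log2(1/0) = 0
            p = c / total
            h -= p * math.log2(p)
    return h

def weighted_impurity(groups):
    """Weighted average entropy of subnodes given as (yes, no) class counts."""
    n = sum(y + no for y, no in groups)
    return sum((y + no) / n * entropy(y, no) for y, no in groups)

splits = {
    "no split":      [(3, 7)],           # 0.88
    "home owner":    [(3, 4), (0, 3)],   # non-owners vs. owners: 0.69
    "marital":       [(3, 3), (0, 4)],   # single/divorced vs. married: 0.60
    "income >= 100": [(3, 3), (0, 4)],   # < 100K vs. >= 100K: 0.60
}
for name, groups in splits.items():
    print(f"{name}: {weighted_impurity(groups):.2f}")
```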

Page 30

Node 0: 10 people; Defaulted Borrower: Yes 3, No 7 (impurity = 0.88)

Income < 100: 6 people; Defaulted Borrower: Yes 3, No 3 (impurity = 1.00)

Income ≥ 100: 4 people; Defaulted Borrower: Yes 0, No 4 (impurity = 0; no further split)

Splitting the Income < 100 node on marital status:

Single or divorced: 4 people; Defaulted Borrower: Yes 3, No 1 (impurity = 0.81)

Married: 2 people; Defaulted Borrower: Yes 0, No 2 (impurity = 0; no further split)

Weighted impurity within this node = 4/6 · 0.81 + 2/6 · 0 = 0.54 < 0.69 < 0.88 → improvement
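The whole Part 2 procedure — pick the split with the lowest weighted impurity, recurse, and stop when a node is pure or no split improves it — can be sketched end to end. Assumptions: the candidate splits and the tuple encoding are ours, and the income/marital-status tie at 0.60 is broken in favour of income, as on the slides.

```python
import math
from collections import Counter

# The ten records from the example table:
# (home_owner, marital_status, annual_income_K, defaulted)
DATA = [
    ("Yes", "Single",   125, "No"),  ("No",  "Married", 100, "No"),
    ("No",  "Single",    70, "No"),  ("Yes", "Married", 120, "No"),
    ("No",  "Divorced",  95, "Yes"), ("No",  "Married",  60, "No"),
    ("Yes", "Divorced", 220, "No"),  ("No",  "Single",   85, "Yes"),
    ("No",  "Married",   75, "No"),  ("No",  "Single",   90, "Yes"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Candidate binary splits; income listed first to break the 0.60 tie as on the slides.
SPLITS = {
    "income >= 100": lambda r: r[2] >= 100,
    "married":       lambda r: r[1] == "Married",
    "home owner":    lambda r: r[0] == "Yes",
}

def grow(rows):
    labels = [r[3] for r in rows]
    current = entropy(labels)

    def weighted(test):
        parts = [[r for r in rows if test(r)], [r for r in rows if not test(r)]]
        return sum(len(p) / len(rows) * entropy([r[3] for r in p])
                   for p in parts if p)

    name, test = min(SPLITS.items(), key=lambda kv: weighted(kv[1]))
    if current == 0 or weighted(test) >= current:  # pure, or no improving split
        return ("leaf", Counter(labels).most_common(1)[0][0])
    true_part = [r for r in rows if test(r)]
    false_part = [r for r in rows if not test(r)]
    return ("split", name, grow(true_part), grow(false_part))

tree = grow(DATA)
# The root split is on income; income >= 100 is a pure "No" leaf, and the
# income < 100 branch splits on marital status, as on the slide.
```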

Page 31

Part 3

Case Studies

Page 32

Case 1 – Classifying an Iris

Setosa, Versicolor, Virginica

Page 33

• Not enough knowledge to distinguish them by looking at them?

• Use the petal's width and length.

• Follow the decision tree.

• The first decision tree for this problem was constructed in 1936 by the British scientist Fisher.

• Different trees over the last 80 years.

• Accuracy > 95%.

Page 34

Case 2 – Classification of osteoarthritis

Page 35

List of characteristics of idiopathic and non-idiopathic OA

Page 36

M.D.s were surveyed to determine the classification attributes from the list of potential characteristics.

Page 37

X-ray as a first exam

Page 38

Physical examination as a first step

Page 39

Repeated Synovial fluid test

Page 40

Conclusions

Page 41

Case 3 – Crash vs. non-crash unit-independent classification

Page 42

Non crash specific factors

Page 43

They used the Gini impurity measure.

This equation finds the gain (in terms of impurity) from creating a new node.

The VIM (variable importance measure) assesses how well a variable performs in producing splits.
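Gini impurity and the gain from a split can be sketched in a few lines (function names are ours; the illustration reuses the Part 2 home-owner split, not the study's own data):

```python
# Gini impurity of a node and the impurity gain from splitting it.
def gini(yes: int, no: int) -> float:
    n = yes + no
    return 1 - (yes / n) ** 2 - (no / n) ** 2

def gini_gain(parent, children):
    """Impurity decrease from splitting `parent` into `children` (class counts)."""
    n = sum(parent)
    return gini(*parent) - sum((y + no) / n * gini(y, no) for y, no in children)

# Home-owner split from the Part 2 example: (3 yes, 7 no) root into
# non-owners (3, 4) and owners (0, 3).
g = gini_gain((3, 7), [(3, 4), (0, 3)])
```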

Page 44

VIM performance: the speed limit seems to be crucial in producing purer nodes.

Page 45

Lift chart – Generic and baseline model

Page 46

Lift chart – Generic and crash type models

Page 47

Conclusions

Page 48

Part 4

Modelling

Page 49

Data Transformation

• Labels: "not donated" and "donated".

• Attributes must characterize donators in ways other than their label, so DONAMT is left out.

• SPSS: aggregate the test and train sheets into one data set; distinguish between the two with a new 0/1 attribute "train".

• Create standardized values of the attributes (for PCA).
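The standardization step is plain z-scoring (a minimal sketch; the sample data is made up, not the donation data set):

```python
import statistics

# Z-score standardization, as done before the PCA: subtract the mean,
# divide by the (sample) standard deviation.
def zscore(values):
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

donations = [10.0, 25.0, 5.0, 40.0, 20.0]
z = zscore(donations)
# Standardized values have mean ~0 and sample standard deviation 1.
```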

Page 50

DATA REDUCTION

Page 51

Correlations

Pearson correlations (N = 8137 for every pair):

          TIMELR    TIMECL    FRQRES    MEDTOR    AVGDON    LSTDON    ANNDON
TIMELR     1        -.063**   -.736**   -.109**   -.333**   -.048**   -.293**
TIMECL    -.063**    1         .056**    .083**    .057**    .031**   -.168**
FRQRES    -.736**    .056**    1        -.025*     .412**   -.025*     .353**
MEDTOR    -.109**    .083**   -.025*     1         .030**    .091**    .014
AVGDON    -.333**    .057**    .412**    .030**    1         .642**    .878**
LSTDON    -.048**    .031**   -.025*     .091**    .642**    1         .618**
ANNDON    -.293**   -.168**    .353**    .014      .878**    .618**    1

**. Correlation is significant at the 0.01 level (2-tailed).

*. Correlation is significant at the 0.05 level (2-tailed).

Page 52

Correlation matrix

• We observe high correlations (|r| above 0.5) between:
– FRQRES and TIMELR
– ANNDON and AVGDON
– LSTDON and AVGDON
– ANNDON and LSTDON

Page 53

Principal Component Analysis – Scree plot

Page 54

PCA – Cumulative variance explained

Total Variance Explained

Factor   Initial Eigenvalues              Extraction Sums of Squared Loadings
         Total    % of Var.   Cum. %      Total    % of Var.   Cum. %
1        1.959    39.172      39.172      1.631    32.619      32.619
2        1.153    23.063      62.235       .532    10.642      43.261
3         .965    19.306      81.542
4         .674    13.480      95.021
5         .249     4.979     100.000

Extraction Method: Principal Axis Factoring.

Page 55

Communalities

                  Initial   Extraction
Zscore(TIMELR)     .559      .675
Zscore(TIMECL)     .052      .443
Zscore(FRQRES)     .575      .804
Zscore(MEDTOR)     .043      .015
Zscore(ANNDON)     .164      .225

Extraction Method: Principal Axis Factoring.

Page 56

Factor loadings – 2 factors found

Factor Matrix(a)

                  Factor 1   Factor 2
Zscore(TIMELR)    -.818      -.074
Zscore(TIMECL)     .034       .665
Zscore(FRQRES)     .896       .018
Zscore(MEDTOR)     .047       .114
Zscore(ANNDON)     .393      -.267

Extraction Method: Principal Axis Factoring.

a. Attempted to extract 2 factors. More than 100 iterations required (convergence = .001). Extraction was terminated.

Page 57

PCA - Conclusions

• Helps us understand the dimensionality of the data.

• Informs us about the uniqueness of attributes.

• We know what to expect in terms of the complexity of a decision tree.

Page 58

MODEL 1 – SPSS CHAID

Page 59

SPSS CHAID model

Notice: 1) two to three levels; 2) attributes can be split over a range.
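CHAID scores candidate splits with a chi-squared test of independence between the split and the label. A minimal hand computation (our own sketch, applied to the Part 2 home-owner table rather than the actual donation data):

```python
# Chi-squared statistic for a contingency table of observed counts,
# comparing each cell against its expected count under independence.
def chi_square(table):
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )

#                defaulted  not defaulted
observed = [[0, 3],   # home owners
            [3, 4]]   # non-owners
stat = chi_square(observed)   # ~1.84 with 1 degree of freedom
```

CHAID would compare this statistic's p-value across candidate splits and keep the most significant one.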

Page 60

SPSS – Right side

Page 61

SPSS – Left side

Page 62

Confusion matrix

Classification (Growing Method: CHAID; Dependent Variable: label)

Sample     Observed       Predicted donated   Predicted not donated   Percent Correct
Training   donated              899                  510                  63.8%
           not donated          523                 2125                  80.2%
           Overall %           35.1%                64.9%                 74.5%
Test       donated              837                  569                  59.5%
           not donated          574                 2100                  78.5%
           Overall %           34.6%                65.4%                 72.0%

Page 63

Ratios

• Overall error rate = (574 + 569)/4080 = 28.0%

• Overall accuracy = (837 + 2100)/4080 = 72.0%

• Sensitivity = 837/(837 + 569) = 59.5%

• Specificity = 2100/(2100 + 574) = 78.5% – the ability to correctly rule out non-donators

• False positive rate = 574/(574 + 837) = 40.7%

• False negative rate = 569/(569 + 2100) = 21.3%
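The ratios above follow directly from the test half of the confusion matrix (a sketch; the variable names are ours):

```python
# Test-sample cells of the confusion matrix above.
tp, fn = 837, 569     # observed donated: predicted donated / not donated
fp, tn = 574, 2100    # observed not donated: predicted donated / not donated
total = tp + fn + fp + tn

accuracy = (tp + tn) / total
error_rate = (fp + fn) / total
sensitivity = tp / (tp + fn)   # donators correctly found
specificity = tn / (tn + fp)   # non-donators correctly ruled out
```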

Page 64

Lift chart: model response vs. naive response as a function of the number of mailings.
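The points of such a lift chart can be built by ranking records on the model's predicted probability (a sketch; the scores and labels below are made up, not the campaign data):

```python
# (predicted probability of donating, actually donated?) for each record.
records = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.4, 0), (0.2, 0)]

# The model curve accumulates actual donators in order of decreasing score;
# the naive curve assumes donators are spread evenly through the mailing list.
records.sort(key=lambda r: r[0], reverse=True)
total_donators = sum(d for _, d in records)

model_curve, naive_curve = [], []
cum = 0
for i, (_, donated) in enumerate(records, start=1):
    cum += donated
    model_curve.append(cum)                                 # captured by rank i
    naive_curve.append(total_donators * i / len(records))   # even spread
```

The model beats the naive baseline wherever its curve lies above the straight naive line.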

Page 65

Why make this model?

• Pros:
– Intuitive
– Easy to read
– Pruned
– Provides a good knowledge overview

Page 66

MODEL 2 – RAPIDMINER GINI INDEX

Page 67

Criteria: based on the Gini criterion

Page 68

RM Gini index model

Page 69

Performance

Overall accuracy of the different models was very similar.

Page 70

Lift chart

Lift chart: expected number of donators (n) vs. cumulative number of sent letters (n); number of responses from the model vs. from the naive classifier.

Page 71

Part 5 – Payoff analysis

Page 72

Asymmetric payoff analysis – different approaches – a challenge

• Method 1: Focus on probabilities and select records based on the chance of being a donator.

• Method 2: Include in the cut-off calculation the amount each client group generates.

Page 73

Method 1

• Select all records with p > 0.5 – classified as donators.

• Add the records between 0.3 and 0.5 – more capture.

• Or select a random sub-sample from the non-donators (20% of the records).

– SIMPLE!

Page 74

Method 2: Asymmetric payoff analysis

• The previous model has high accuracy.

• However, should this be your goal?

Page 75

Asymmetric payoff analysis

• No – you should aim to get as much profit as possible out of your actions.

• The expected revenue of sending a letter to somebody can be positive even at low response rates ⇒ sending a letter only costs 50 cents.

• E.g. in one of the groups only one in ten people is expected to donate, but if that person is willing to donate 10 euros, it will still exceed the costs.

Page 76

Solution

• Expected revenue, calculated as:
– expected average donation per person × probability somebody will donate − cost of a letter
– take the expected donation from the overall average donation ⇒ 7.01

Page 77

Expected revenues

• Expected revenue of the group at node 10: 0.068 × 7.01 − 0.5 = −0.0235 ⇒ negative, so do not send a letter.

• Expected revenue of the group at node 9: 0.205 × 7.01 − 0.5 = 0.9366 ⇒ even though a low percentage is donating, you should send them a letter.
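The decision rule for the two nodes above is a one-liner (a sketch; the constant and function names are ours):

```python
# Expected revenue per person for a node: p(donate) * average donation - letter cost.
AVG_DONATION = 7.01   # overall average donation (euros)
LETTER_COST = 0.50

def expected_revenue(p_donate: float) -> float:
    return p_donate * AVG_DONATION - LETTER_COST

node10 = expected_revenue(0.068)   # negative -> do not send a letter
node9 = expected_revenue(0.205)    # positive -> send, despite the low response rate
```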

Page 78

Expected Revenues

• Expected Revenues for every group at every node

• Conclusion: if the cost of sending a letter were to increase by the amount given in the table, you would stop sending letters to the group belonging to that node.

• Consequences for the classification tree:

Node Number   Expected Revenue
1              0.0676
9              0.9366
10            -0.0235
11             1.6444
12             0.5652
13             0.3900
14             1.1749
16             1.4272
17             2.4083
18             3.2353
19             2.4644
20             0.7825
21             3.2913
22             1.6655
23             3.4035
24             4.5107
25             2.8708
26             5.3096
27             1.4692
28             3.1372

Page 79

Old Classification Tree

Page 80

Change in Classification Tree

• Tremendous simplification of the tree

• However, (0.182, 0.234] is a strange interval

Page 81

Further analyses

• Now take into account that every group will probably donate a different average amount.

• Calculate this amount for every specific group and determine the expected revenue per person per group.

• Now the group belonging to node 1 will also not receive a letter.

Node Number   Expected Revenue
1             -0.3089
9              0.2981
10            -0.2733
11             1.3297
12             0.0796
13             0.2141
14             0.7444
16             1.4505
17             2.6719
18             3.7164
19             2.3553
20             0.4315
21             5.1068
22             2.6946
23             5.5672
24             7.9259
25             6.7790
26            11.4221
27             1.4036
28             3.7041

Page 82

New Classification Tree

• Still there are some remarks

Page 83

Analyses new Tree

• Consider whether you are satisfied with forecasting using only the frequency of response and the time since the last response.

• The tree does make sense → if there has been no response for a long time, most likely no donation.

• The results were obtained with out-of-sample forecasting using the other dataset.

Page 84

Out-of-sample forecasting (error)

• If we apply the last model, we end up with a total revenue of €22,373 and send out 3046 letters.

• Compared to this, the model in which we only send letters to those we expect to donate gives a total revenue of €15,544 with 1477 letters sent, which is indeed lower.

• The model in which we just use the average donation of all (ex-)donators gives a total revenue of €22,460 with 3270 letters sent.

• Even better would be sending a letter to everybody: that would lead to a total revenue of €23,257.50.

Page 85

Discussion

• Unfortunately 'our' model has been beaten by the most simplistic model of sending a letter to everybody, at least for the data we forecasted.

• However, we did deliver some insights into who you might not want to send letters to if the cost of every letter were to increase.