1 Chapter 6: Model Assessment 6.1 Model Fit Statistics 6.2 Statistical Graphics 6.3 Adjusting for Separate Sampling 6.4 Profit Matrices

Upload: roberta-sandra-ball

Post on 17-Dec-2015


Chapter 6: Model Assessment

6.1 Model Fit Statistics

6.2 Statistical Graphics

6.3 Adjusting for Separate Sampling

6.4 Profit Matrices

6.1 Model Fit Statistics

Summary Statistics

Prediction Type    Summary Statistic
Decisions          Accuracy/Misclassification, Profit/Loss, Inverse prior threshold
Rankings           ROC Index (concordance), Gini coefficient
Estimates          Average squared error, SBC/Likelihood
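
As a rough illustration of one statistic per prediction type, the sketch below computes accuracy and misclassification (decisions), the ROC index as a concordance fraction with a Gini coefficient derived from it as 2 x ROC index - 1 (rankings), and average squared error (estimates). The function names, the 0.5 cutoff, and the toy data are assumptions made for illustration; none of this is SAS Enterprise Miner output.

```python
from itertools import product


def accuracy(y_true, p_hat, cutoff=0.5):
    """Fraction of cases whose decision (p_hat >= cutoff) matches the outcome."""
    decisions = [1 if p >= cutoff else 0 for p in p_hat]
    return sum(d == y for d, y in zip(decisions, y_true)) / len(y_true)


def roc_index(y_true, p_hat):
    """Concordance: share of (primary, secondary) pairs ranked correctly.

    Tied pairs count one half.
    """
    primary = [p for p, y in zip(p_hat, y_true) if y == 1]
    secondary = [p for p, y in zip(p_hat, y_true) if y == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a, b in product(primary, secondary))
    return score / (len(primary) * len(secondary))


def average_squared_error(y_true, p_hat):
    """Mean squared difference between the outcome (0/1) and its estimate."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


# Toy assessment data (hypothetical): outcome 1 = primary, 0 = secondary.
y = [1, 0, 1, 0, 0]
p = [0.9, 0.2, 0.4, 0.6, 0.1]

acc = accuracy(y, p)
c = roc_index(y, p)
gini = 2 * c - 1
ase = average_squared_error(y, p)
print(acc, 1 - acc, c, gini, ase)
```

Each statistic answers a different question, so a model that wins on average squared error need not win on accuracy or on the ROC index.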


Comparing Models with Summary Statistics

This demonstration illustrates the use of the Model Comparison tool, which collects assessment information from attached modeling nodes and enables you to easily compare model performance measures.

6.2 Statistical Graphics

Statistical Graphics – ROC Chart

[Figure: ROC chart. Vertical axis: captured response fraction (sensitivity), 0.0 to 1.0. Horizontal axis: false positive fraction (1 - specificity), 0.0 to 1.0. The point for the top 40% of cases is marked.]

The ROC chart illustrates a tradeoff between a captured response fraction and a false positive fraction.

Each point on the ROC chart corresponds to a specific fraction of cases, ordered by their predicted value.

For example, this point on the ROC chart corresponds to the 40% of cases with the highest predicted values.

The y-coordinate shows the fraction of primary outcome cases captured in the top 40% of all cases.

The x-coordinate shows the fraction of secondary outcome cases captured in the top 40% of all cases.

Repeat for all selection fractions.
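
The construction described on these slides can be sketched in a few lines: sort the cases by predicted value, then for each selection depth record the fraction of secondary outcome cases captured (x) and the fraction of primary outcome cases captured (y). The function name `roc_points` and the toy data are hypothetical, and tied predicted values are not given special treatment here.

```python
def roc_points(y_true, p_hat):
    """Return (false positive fraction, captured response fraction) pairs,
    one per selection depth, starting from the empty selection."""
    # Order outcomes from highest to lowest predicted value.
    ranked = [y for _, y in sorted(zip(p_hat, y_true), key=lambda t: -t[0])]
    n_primary = sum(ranked)
    n_secondary = len(ranked) - n_primary
    points = [(0.0, 0.0)]
    captured = false_pos = 0
    for y in ranked:  # select one more case at each step
        if y == 1:
            captured += 1
        else:
            false_pos += 1
        points.append((false_pos / n_secondary, captured / n_primary))
    return points


# Toy data (hypothetical): 5 cases, so the top 40% is the top 2 cases.
y = [1, 0, 1, 0, 0]
p = [0.9, 0.2, 0.4, 0.6, 0.1]
pts = roc_points(y, p)
print(pts[2])  # point for the top 40% of cases
```

In this toy data the top 40% holds one of the two primary cases and one of the three secondary cases, so the marked point sits at x = 1/3, y = 1/2.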

Statistical Graphics – ROC Index

[Figure: ROC curves for a weak model and a strong model, both axes 0.0 to 1.0.]

weak model: ROC Index < 0.6

strong model: ROC Index > 0.7

Comparing Models with ROC Charts

This demonstration illustrates the use of ROC charts to compare models.

Statistical Graphics – Response Chart

[Figure: response chart. Vertical axis: cumulative percent response, ticks at 50% and 100%. Horizontal axis: percent selected, 0% to 100%. The point for the top 40% of cases is marked.]

The response chart shows the expected response rate for various selection percentages.

Each point on the response chart corresponds to a specific fraction of cases, ordered by their predicted values.

For example, this point on the response chart corresponds to the 40% of cases with the highest predicted values.

The x-coordinate shows the percentage of selected cases (40%).

The y-coordinate shows the percentage of cases in the top 40% that have the primary outcome.

Repeat for all selection fractions.
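
A minimal sketch of the same procedure for the response chart: rank the cases by predicted value and, at each selection percentage, report the share of selected cases that have the primary outcome. The function name `cumulative_response`, the depth grid, and the toy data are assumptions for illustration.

```python
def cumulative_response(y_true, p_hat, depths_pct=(20, 40, 60, 80, 100)):
    """Return (percent selected, cumulative percent response) pairs."""
    # Order outcomes from highest to lowest predicted value.
    ranked = [y for _, y in sorted(zip(p_hat, y_true), key=lambda t: -t[0])]
    chart = []
    for pct in depths_pct:
        k = max(1, round(pct * len(ranked) / 100))  # number of cases selected
        chart.append((pct, 100 * sum(ranked[:k]) / k))
    return chart


# Toy data (hypothetical): 10 cases, 4 with the primary outcome.
p = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
y = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
chart = cumulative_response(y, p)
print(chart)
```

At 100% selected the curve always ends at the overall primary outcome rate of the assessment data, which is 40% in this toy example.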

6.01 Poll

In practice, modelers often use several tools, sometimes both graphical and numerical, to choose a best model.

True

False

6.01 Poll – Correct Answer

In practice, modelers often use several tools, sometimes both graphical and numerical, to choose a best model.

True

False

Comparing Models with Score Rankings Plots

This demonstration illustrates comparing models with Score Rankings plots.


Adjusting for Separate Sampling

This demonstration illustrates how to adjust for separate sampling in SAS Enterprise Miner.

6.3 Adjusting for Separate Sampling

Outcome Overrepresentation

A common predictive modeling practice is to build models from a sample with a primary outcome proportion different from the original population.

Separate Sampling

Target-based samples are created by considering the primary outcome cases separately from the secondary outcome cases.

primary outcome: Select all cases.

secondary outcome: Select some cases.
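
A small sketch of target-based sampling as described above: keep every primary outcome case and draw only some of the secondary outcome cases. The function name `separate_sample`, the `(features, outcome)` record layout, and the counts are hypothetical.

```python
import random


def separate_sample(cases, n_secondary, seed=0):
    """Keep all primary-outcome cases plus a random draw of secondary ones.

    `cases` is a list of (features, outcome) pairs with outcome 1 = primary.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    primary = [c for c in cases if c[1] == 1]
    secondary = [c for c in cases if c[1] == 0]
    return primary + rng.sample(secondary, n_secondary)


# Hypothetical population: 100 cases, 5% with the primary outcome.
population = [({"id": i}, 1 if i < 5 else 0) for i in range(100)]
sample = separate_sample(population, n_secondary=15)
n_primary = sum(outcome for _, outcome in sample)
print(len(sample), n_primary / len(sample))
```

The sample holds 20 cases with a 25% primary outcome proportion, against 5% in the population. That gap is what the assessment statistics and prediction estimates later have to be adjusted for.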

The Modeling Sample

+ Similar predictive power with smaller case count

− Must adjust assessment statistics and graphics

− Must adjust prediction estimates for bias
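
The slides do not state how the prediction estimates are adjusted. One widely used correction, shown here only as an assumption about what such an adjustment can look like, reweights each estimate by the ratio of population to sample outcome proportions. The function and parameter names are hypothetical.

```python
def adjust_for_priors(p_sample, sample_prior, population_prior):
    """Rescale a primary-outcome probability from sample to population proportions.

    sample_prior:     primary outcome proportion in the modeling sample
    population_prior: primary outcome proportion in the original population
    """
    w1 = population_prior / sample_prior              # weight for primary outcome
    w0 = (1 - population_prior) / (1 - sample_prior)  # weight for secondary outcome
    return p_sample * w1 / (p_sample * w1 + (1 - p_sample) * w0)


# Hypothetical figures: a 50/50 modeling sample drawn from a 5% population.
adjusted = adjust_for_priors(p_sample=0.5, sample_prior=0.5, population_prior=0.05)
print(round(adjusted, 4))
```

A quick sanity check on this form of the correction: an estimate equal to the sample proportion maps back to the population proportion.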


Adjusting for Separate Sampling (continued)

This demonstration illustrates how to adjust for separate sampling in SAS Enterprise Miner.


Creating a Profit Matrix

This demonstration illustrates how to create a profit matrix.

6.4 Profit Matrices

Profit Matrices

                     solicit   ignore
primary outcome        15.14        0
secondary outcome      -0.68        0

[Figure: profit distribution for solicit decision.]

Decision Expected Profits

Expected Profit Solicit = $15.14\,\hat{p}_1 - 0.68\,\hat{p}_0$

Expected Profit Ignore = 0

Choose the larger.
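
The expected-profit rule can be sketched as follows, using the profit values 15.14 and -0.68 from the slides. The function name `decide`, the assumption that the secondary probability is one minus the primary probability, and the choice to solicit on an exact tie are illustrative.

```python
PROFIT_SOLICIT_PRIMARY = 15.14    # profit when a solicited case is a primary outcome
PROFIT_SOLICIT_SECONDARY = -0.68  # profit when a solicited case is a secondary outcome


def decide(p1_hat):
    """Return the decision with the larger expected profit and that profit."""
    p0_hat = 1.0 - p1_hat  # assumes a binary target
    solicit = PROFIT_SOLICIT_PRIMARY * p1_hat + PROFIT_SOLICIT_SECONDARY * p0_hat
    ignore = 0.0
    return ("solicit", solicit) if solicit >= ignore else ("ignore", ignore)


print(decide(0.10))
print(decide(0.02))
```

A case with a 10% estimated primary probability is worth soliciting, while one at 2% is not, because the expected cost of a wasted solicitation outweighs the expected gain.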

Decision Threshold

$\hat{p}_1 \ge 0.68 / 15.82$: Solicit

$\hat{p}_1 < 0.68 / 15.82$: Ignore
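
The threshold follows from requiring the expected profit of soliciting to be at least that of ignoring. A short derivation, assuming a binary target so that $\hat{p}_0 = 1 - \hat{p}_1$:

```latex
\begin{align*}
15.14\,\hat{p}_1 - 0.68\,\hat{p}_0 &\ge 0 \\
15.14\,\hat{p}_1 - 0.68\,(1 - \hat{p}_1) &\ge 0 \\
15.82\,\hat{p}_1 &\ge 0.68 \\
\hat{p}_1 &\ge \frac{0.68}{15.82} \approx 0.043
\end{align*}
```

The denominator 15.82 is the sum 15.14 + 0.68, which is why it does not appear in the profit matrix itself.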

Average Profit

Average profit = $(15.14\,N_{PS} - 0.68\,N_{SS}) / N$

$N_{PS}$ = number of solicited primary outcome cases

$N_{SS}$ = number of solicited secondary outcome cases

$N$ = total number of assessment cases
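
As a worked instance of the average profit formula, the sketch below solicits every case whose estimate reaches the 0.68 / 15.82 threshold and averages the resulting profit over all assessment cases. The function name `average_profit` and the toy data are hypothetical.

```python
def average_profit(y_true, p_hat, threshold=0.68 / 15.82):
    """Average profit over all assessment cases when soliciting at the threshold."""
    solicited = [y for y, p in zip(y_true, p_hat) if p >= threshold]
    n_ps = sum(1 for y in solicited if y == 1)  # solicited primary outcome cases
    n_ss = sum(1 for y in solicited if y == 0)  # solicited secondary outcome cases
    return (15.14 * n_ps - 0.68 * n_ss) / len(y_true)


# Toy assessment data (hypothetical): 5 cases, 2 with the primary outcome.
y = [1, 0, 0, 0, 1]
p = [0.30, 0.10, 0.02, 0.05, 0.01]
avg = average_profit(y, p)
print(round(avg, 3))
```

Three cases are solicited here (one primary, two secondary), so the total profit of 15.14 - 2 x 0.68 = 13.78 is spread over all five cases, ignored ones included.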


Evaluating Model Profit

This demonstration illustrates viewing the consequences of incorporating a profit matrix.


Viewing Additional Assessments

This demonstration illustrates several other assessments of possible interest.


Optimizing with Profit (Self-Study)

This demonstration illustrates optimizing your model strictly on profit.


Exercises

This exercise reinforces the concepts discussed previously.


Assessment Tools Review

Compare model summary statistics and statistical graphics.

Create decision data; add prior probabilities and profit matrices.

Tune models with average squared error or appropriate profit matrix.

Obtain means and other statistics on data source variables.