TUTORIAL M - Profit Analysis of the German Credit Data Using SAS® Enterprise Miner™ 5.3

Guest Author: Chamont Wang, Ph.D. Department of Mathematics and Statistics The College of New Jersey Tel: (609)771-3041 [email protected] Edited by Gary Miner, Ph.D.

Introduction

In this tutorial, we approach the German credit data from a cost/profit perspective. Specifically, we assume that a correct decision by the bank yields a 35% profit at the end of a specific period, say 3–5 years. Here a correct decision means that the bank predicts that a customer’s credit is in good standing (and hence grants the loan), and the customer indeed has good credit. On the other hand, if the model or the manager falsely predicts that the customer’s credit is in good standing, then the bank incurs a unit loss. This completes the first column of the following profit matrix:

                            Good Customer (predicted)   Bad Customer (predicted)
Good Customer (observed)              +0.35                        0
Bad Customer (observed)               -1.00                        0

In the second column of the matrix, the bank predicts that the customer’s credit is not in good standing and declines the loan. Hence the decision produces neither gain nor loss. Note that the data has 70% credit-worthy (good) customers and 30% not-credit-worthy (bad) customers. A manager who uses no model and gives everybody the loan would obtain the following negative profit per customer:

(700*0.35 - 300*1.00)/1000 = -55/1000 = -0.055 unit loss.

This number (-0.055 unit loss) may seem small. But if the average loan is $20,000 for this population (n = 1000), then the total loss will be

(-0.055 unit loss)*($20,000 per unit per customer)*(1,000 customers) = -$1,100,000,

which is a loss of one million one hundred thousand dollars. On the other hand, if a model produced the following classification matrix:

                              Good (Predicted_Positive)             Bad (Predicted_Negative)   Row total
Good (observed)               608 customers (76%, True_Positive)    92 customers               700 customers
Bad (observed)                192 customers (24%, False_Positive)   108 customers              300 customers
Column total & percentages    800 customers (100%)                  200 customers              1,000 customers

Then the total profit would be

Profit = True_Positive*$20,000*0.35 – False_Positive*$20,000 = 608*$20,000*0.35 – 192*$20,000 = $416,000

The difference of model vs. no-model is $416,000 – (-$1,100,000) = $1,516,000,

which is about 1.5 million dollars of profit. The goal of this tutorial is to build statistical models to maximize the profit.
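The arithmetic above can be checked with a short Python sketch. It uses only the figures quoted in the text (the 0.35 / -1.00 profit matrix, the $20,000 average loan, and the counts of approved good and bad customers); the function and variable names are illustrative and not part of SAS Enterprise Miner.

```python
# Profit matrix from the text, in units of the loan amount.
PROFIT_GOOD_APPROVED = 0.35   # good customer who receives the loan
LOSS_BAD_APPROVED = -1.00     # bad customer who receives the loan
AVERAGE_LOAN = 20_000         # dollars, as assumed in the text


def total_profit(true_positive, false_positive, average_loan=AVERAGE_LOAN):
    """Dollar profit from approved loans; declined loans contribute zero."""
    units = (true_positive * PROFIT_GOOD_APPROVED
             + false_positive * LOSS_BAD_APPROVED)
    return units * average_loan


# No model: everybody is approved (700 good, 300 bad customers).
no_model = total_profit(true_positive=700, false_positive=300)
# With the model: 608 good and 192 bad customers are approved.
with_model = total_profit(true_positive=608, false_positive=192)
gain = with_model - no_model

print(f"No model:   {no_model:,.0f}")
print(f"With model: {with_model:,.0f}")
print(f"Difference: {gain:,.0f}")
```

Only the customers who receive a loan enter the calculation, because the second column of the profit matrix is zero.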

Modeling Strategy

Assume that the data is already cleaned. Then the following steps would help maximize the profit:

1. Try Different Tools: SAS Enterprise Miner provides a variety of data mining tools, including Decision Tree, Regression, Neural Network, Stochastic Gradient Boosting, Ensemble model, and countless variations of these tools. In this tutorial, we will use mainly the default settings of some of these models.

2. Variable Selection: The original data set has 20 predictors. Some of the predictors may not be as important as others, and excluding them may improve the model performance.

3. Bundling the Variables: Some of the predictors may be correlated with each other. In SAS Enterprise Miner, grouping these predictors via two different techniques (the Variable Clustering node and the Principal Components node) often improves the model performance.

4. Binning, Filtering, and Variable Transformation:

5. Parameter Tuning:

6. Change Nominal Predictors to Ordinal and Interval Variables: This subsection discusses a very powerful node in SAS Enterprise Miner: the Replacement node. This node can be very handy in many studies where the input variables are intrinsically Ordinal in scale but are coded in Nominal scale. This is exactly what happens with the German credit data, and this is the reason the Neural Network failed in Section-I.

7. Different Cutoff Values: Given the study population, the model will produce the probabilities of all customers with regard to their credit standing. If the probability of a specific customer is above the cutoff (also known as a threshold), then the customer will be placed in the category of good customers; otherwise, the customer loan application will be denied. By the adjustment of different cutoff values, we may be able to increase the total profit. In our experience, this technique is one of the most important in the maximization of the profit.

8. Decision Rules of Complicated Models: Machine learning techniques such as Neural Network and Gradient Boosting are often criticized as black-box models for which it is “impossible to figure out how an individual input is affecting the predicted outcome” (Ayres 2007, p. 143). This is not true. Given any Neural Network, one can plot its response surface and calculate its marginal effects (Wang and Liu 2008). For Boosted Trees, one can also calculate Interaction Effects (Friedman and Popescu 2005) and draw Partial Dependence Plots to understand and interpret the model (Friedman 2002). In SAS Enterprise Miner, a special technique is to build a Decision Tree after a Neural Network (or other complicated model) to extract decision rules that can be very helpful for managers or other decision makers in real-world applications.

Due to the limited space of (and the time constraints of writing) this tutorial, we will skip Steps 2, 3, 4, 5, and 8. Furthermore, in Step 7, we will use the default in SAS Enterprise Miner, which gives cutoff values at 5% increments. For finer resolution at 1% or 0.5% increments, one needs to write SAS code to accomplish the task.
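To illustrate what a finer cutoff resolution looks like, here is a minimal Python sketch (not the SAS code mentioned above) that sweeps the cutoff in 1% steps. It treats the cutoff as the fraction of top-ranked applicants who are approved, which is an assumption made for this sketch, and it runs on synthetic scores rather than the German credit data; all names are hypothetical.

```python
import random

PROFIT_GOOD, LOSS_BAD = 0.35, -1.00  # profit matrix from the text


def profit_by_depth(scored, step=0.01):
    """Return {depth: total profit} when the top `depth` fraction is approved.

    `scored` is a list of (probability_good, is_good) pairs.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    results = {}
    for k in range(1, round(1 / step) + 1):
        depth = k * step
        approved = ranked[: round(depth * n)]
        results[round(depth, 4)] = sum(
            PROFIT_GOOD if is_good else LOSS_BAD for _, is_good in approved
        )
    return results


# Synthetic holdout set (an assumption, not the German credit data):
# about 70% good customers, with scores that are only loosely informative.
rng = random.Random(0)
holdout = []
for _ in range(300):
    is_good = rng.random() < 0.7
    score = min(1.0, max(0.0, rng.gauss(0.75 if is_good else 0.45, 0.15)))
    holdout.append((score, is_good))

profits = profit_by_depth(holdout, step=0.01)
best_depth = max(profits, key=profits.get)
print(f"best depth = {best_depth:.0%}, "
      f"total profit = {profits[best_depth]:.2f} units")
```

Passing `step=0.005` gives the 0.5% resolution; the best depth found this way depends entirely on the scores supplied.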

SAS Enterprise Miner 5.3 Interface

The following diagram shows the key components of the SAS Enterprise Miner 5.3 interface:

Toolbars: There are three rows of tools that can be activated by clicking to select them, or can be dragged to the workspace. Move the cursor to a specific tool and a window will pop up that gives a brief description of the tool functionality.

Project Panel: To manage and view data sources, diagrams, and results.

Properties Panel: To view and edit the settings of data sources, diagrams, and nodes.

Diagram Workspace: To graphically build, edit, run, and save process flow diagrams.

Section-I: A Primer of SAS Enterprise Miner Predictive Modeling

This section provides information on the construction of the following SAS Enterprise Miner process flow, which contains these nodes:

The construction of the above process flow is sufficient for small or medium-sized data sets. For large data sets, a Sample node can be added with little effort.

1. (Creating a Project)

Select File → New → Project from the main menu. Specify the project name (Profit Analysis) in the Name field of the Create New Project window:

2. (Creating a Data Source)

Select File → New → Data Source to open the Data Source Wizard:

Click the Next button to browse the folder of the Credit_scoring data that resides in the Embook library:

Click OK, click Next four times, and then click Finish to import the data.

3. (Creating a Diagram) Select File → New → Diagram, type the name, Profit Analysis, in the Diagram Name field in the Create New Diagram window, and then click the OK button:

4. (Creating the Process Flow): From the Project Panel (upper-left corner), drag the Profit_Analysis icon that is under Data Sources to the Diagram Workspace to create the Data Sources node:

5. (Edit Variables) In the Diagram Workspace, right-click the Data Sources node, and then select Edit Variables:

In the Variables window, identify the target variable, Creditability. Change the model Role from Input to Target, and change the Level from Nominal to Binary. Then click OK:

6. (Replacement Node) The node can be very handy and very useful in many studies where the predictors are intrinsically Ordinal variables but are coded in Nominal scale.

Clicking the Input Data node will produce the following window, which shows that most predictors are Nominal variables:

To change the nominal predictors into ordinal variables, click the Modify tab, and then drag the Replacement node to the Diagram Workspace. Right-click the Replacement node and select Run. Click the Replacement node to activate its Property Panel, and then click the … icon that is to the right of the Class Variable Replacement Editor:

In the Replacement Editor window, change the Level of the selected Input Variables as follows:

When you are done, click OK.
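The mapping entered in the Replacement Editor was shown in a screenshot and is not reproduced here. As a rough illustration of the idea only, the Python sketch below replaces nominal labels with ordinal ranks. The category labels and the ranks are hypothetical and are not the values used in this tutorial.

```python
# Hypothetical nominal codes for a savings-type predictor and the ordinal
# ranks a modeller might assign to them. Both the labels and the ranks are
# assumptions for illustration; they are not the tutorial's values.
SAVINGS_RANK = {
    "none": 0,
    "under_100": 1,
    "100_to_500": 2,
    "500_to_1000": 3,
    "over_1000": 4,
}


def to_ordinal(values, ranking, missing=None):
    """Replace each nominal label with its rank; unknown labels map to `missing`."""
    return [ranking.get(value, missing) for value in values]


raw = ["over_1000", "none", "100_to_500", "unknown"]
print(to_ordinal(raw, SAVINGS_RANK))
```

A label without an agreed rank is returned as `missing` rather than guessed, which keeps the judgment call visible.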

The conversion of Nominal variables to Ordinal scale requires subject-matter judgment and can sometimes be controversial. Readers of this tutorial are urged to examine the above conversion and use their own numbers when necessary.

7. (Data Partition) Drag the Data Partition icon (the 3rd icon under the Sample tab) to the Diagram Workspace. Connect the Replacement node to the Data Partition node:

In SAS Enterprise Miner 5.3, the default setting is 40%-30%-30% for the partition of the original data into Training, Validation, and Test data sets.
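For readers who want to see what a 40%-30%-30% partition amounts to, here is a minimal Python sketch. It uses a simple random shuffle with an arbitrary seed, which is an assumption of this sketch and not necessarily how the Data Partition node samples; the function name is hypothetical.

```python
import random


def partition(rows, fractions=(0.4, 0.3, 0.3), seed=12345):
    """Shuffle `rows` and split them into training, validation, and test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(fractions[0] * len(shuffled))
    n_valid = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # the remainder goes to the test set
    return train, valid, test


train, valid, test = partition(range(1000))
print(len(train), len(valid), len(test))
```

With 1,000 customers this gives 400 training, 300 validation, and 300 test rows, which matches the N=300 holdout size used later in the tutorial.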

8. (Regression Node)

To use this node, click the Model tab (in the 3rd row of toolbars) to activate the Regression icon. Drag the icon to the Diagram Workspace, and then connect the Data Partition node and the Regression node:

Right-click the Regression node, and then select Run. Click Yes in the pop-up window. Wait for the next pop-up window, and then click Results. The next pop-up window contains a lot of information, which could be useful in many other studies. In this case, we will skip these results and go straight to the profit calculation.

Almost all data mining packages confuse the subtle difference between a misclassification matrix and a decision matrix. SAS Enterprise Miner is a rare exception. Note that the first matrix does not allow non-zero entries on the diagonal, while the second matrix can accommodate different types of cost-profit considerations. The difference may seem small but the consequence is enormous. The following steps show you how to accomplish this with SAS Enterprise Miner.

9. (Decision Weights in the Data Source Node) Click the Data Source node to activate its Property Panel. Then click the icon that is to the right of Decisions. In the Decision Processing window, click Build to activate the decision menu:

Click the Decisions tab, and then select Yes:

Click the Decision Weights tab, and enter weight values for the decision:
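The screenshot of the weights is not reproduced here. Assuming the weights are those of the profit matrix in the Introduction, the Python sketch below shows what such a decision matrix implies for a single applicant: approving is worthwhile only when the expected profit 0.35*p - 1.00*(1 - p) is positive, that is, when the probability p of good credit exceeds 1/1.35, or about 0.741. The names are illustrative and not part of SAS Enterprise Miner.

```python
# Decision weights taken from the profit matrix in the Introduction.
# Outer keys: observed class; inner keys: decision.
DECISION_WEIGHTS = {
    "good": {"approve": 0.35, "decline": 0.0},
    "bad": {"approve": -1.00, "decline": 0.0},
}


def expected_profit(p_good, decision):
    """Expected profit of `decision` for an applicant with P(good) = p_good."""
    return (p_good * DECISION_WEIGHTS["good"][decision]
            + (1.0 - p_good) * DECISION_WEIGHTS["bad"][decision])


def best_decision(p_good):
    """Pick the decision with the larger expected profit."""
    return max(("approve", "decline"),
               key=lambda d: expected_profit(p_good, d))


# Approving breaks even when 0.35*p - (1 - p) = 0, i.e. p = 1/1.35.
break_even = 1.0 / 1.35
print(f"break-even P(good) = {break_even:.3f}")
for p in (0.60, 0.74, 0.75, 0.90):
    print(p, best_decision(p), round(expected_profit(p, "approve"), 4))
```

A misclassification matrix would place the break-even at 0.5; the asymmetric weights move it much higher, which is why the two kinds of matrix lead to different lending decisions.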

10. (Comparison of the Profits) Build the following process flow to compare the profits of different models. To do so, first click the Model tab to activate a number of predictive modeling tools. Then drag the Neural Network and Decision Tree icons to the Diagram Workspace:

Next click the Assess tab to activate a number of new tools. This tab contains a special icon named Model Comparison. Drag a Model Comparison icon to the Diagram Workspace and connect the Regression, Decision Tree, and Neural Network nodes to it.

Run the Model Comparison node, and then select Results in the pop-up window. In the Results window, go to the lower-left corner, and then click the Cumulative Lift arrow to access the drop-down list:

In the drop-down list, scroll to Expected Profit for Cumulative Profit (the Total Profit gives non-Cumulative Profit and is useful for other applications):

At the bottom of the Expected Profit window, click the Neural Network model. Go to the Data Role = TEST window. Point the cursor at any of the blue boxes to see the Expected Profit Mean:

The 0.34 Expected Profit in the top 10% of the data for the Neural Network curve corresponds very closely to the 0.35 entered into the profit matrix earlier. Thus, the Neural Network achieved nearly 100% accuracy in the top 10% of the customers. The following table compares the Mean Profit and Total Profit of the Regression (Reg), Decision Tree (Tree), and Neural Network (NN) models at different cutoff values (Total Profit = Mean Profit*Cutoff*Population Size):

N=300 (Holdout Data)

Cutoff            5%    10%    15%    20%    25%    30%    35%    40%    45%    50%

Mean Profit
  Reg           0.33   0.31   0.29   0.28   0.25   0.22   0.19   0.15   0.10   0.05
  Tree          0.21   0.21   0.21   0.21   0.21   0.21   0.21   0.21   0.21   0.07
  NN            0.35   0.34   0.33   0.33   0.32   0.30   0.27   0.24   0.22   0.18

Total Profit
  Reg            3.9    3.9   2.55   2.55  -0.15   2.55    1.2    1.2    1.2    1.2
  Tree          2.21   2.21   2.21   2.21   2.21   2.21   2.21   2.21   2.21   0.74
  NN            5.25    1.2    1.2   2.55   5.25    3.9   -1.5   -1.5   2.55   -1.5

The table shows that the Neural Network achieves its best profit at the 5% cutoff and the Regression achieves its best profit at the 5% or 10% cutoff. In short, if we use the Neural Network model to select the top 5% of the customers, then the model would produce a Total Profit of 5.25 units in the Holdout data (n=300), where one unit is the amount of one loan.

Discussions

Assume that we have a new population of 1,000 customers with an average loan of $20,000. The Neural Network model would select the top 5% of the customers, resulting in a substantial total profit:

0.35*0.05*1000*$20,000 = $350,000
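The same formula, Total Profit = Mean Profit*Cutoff*Population Size, can be written as a small Python helper to reproduce both the 5.25 units quoted for the holdout data and the dollar figure above. The function name and the `unit_value` parameter are illustrative assumptions.

```python
def total_profit(mean_profit, cutoff, population, unit_value=1.0):
    """Total Profit = Mean Profit * Cutoff * Population Size, scaled by unit_value."""
    return mean_profit * cutoff * population * unit_value


# Neural Network at the 5% cutoff on the holdout data (n=300), in loan units.
holdout_units = total_profit(0.35, 0.05, 300)
# Same model on a new population of 1,000 customers at $20,000 per loan.
new_population_dollars = total_profit(0.35, 0.05, 1000, unit_value=20_000)

print(f"{holdout_units:.2f} units")
print(f"${new_population_dollars:,.0f}")
```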

Section-II: Advanced Techniques of Predictive Modeling

1. Gradient Boosting and Ensemble Models

SAS Enterprise Miner 5.3 has several advanced techniques for building predictive models for classification and regression; many of these techniques also incorporate profit-based model selection and assessment. A new technique available in this version of the software is Stochastic Gradient Boosting, which is based on published work (Friedman 2002) and has shown great promise in many applications. A Support Vector Machine procedure is also available; however, its status is experimental, and it should not be used for production model development.

Consequently it would be desirable to compare the performance of Gradient Boosting with other advanced techniques in this specific study. We will again start with the credit scoring data.

2. This time, we will use another advanced feature of SAS Enterprise Miner 5.3, which is the Advanced Advisor in the Data Source Wizard. This selection, which is shown below, executes a process that scans the data for variables that should be ordinal, nominal, or interval, and for variables that should be rejected because they have too many or too few class levels. The user may control distribution thresholds for these assignments.

Using this method, we find that the variables have been assigned a new set of level and role attributes. Again, we have set the CREDITABILITY variable as the dependent target variable.

3. To complete the analysis, click the Model tab to activate the predictive modeling tools. Then drag the Gradient Boosting icon to the Diagram Workspace.

4. In addition, we will add an Ensemble node to the diagram. Ensembles work by combining the predictions from multiple models into a single prediction that often produces superior results to any of the constituent models. In this case, the Ensemble node will average the probabilities of the input models and will be compared to the input models.
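As a rough picture of what averaging the probabilities of the input models means, consider the following Python sketch. The three score lists are hypothetical values for three applicants, and the function is an illustration of the averaging idea, not the Ensemble node's implementation.

```python
def average_ensemble(*model_probabilities):
    """Average P(good) across models, applicant by applicant."""
    if not model_probabilities:
        raise ValueError("at least one model is required")
    lengths = {len(p) for p in model_probabilities}
    if len(lengths) != 1:
        raise ValueError("all models must score the same applicants")
    return [sum(scores) / len(scores) for scores in zip(*model_probabilities)]


# Hypothetical P(good) for three applicants from three constituent models.
regression = [0.80, 0.40, 0.65]
tree = [0.90, 0.30, 0.50]
neural_net = [0.70, 0.20, 0.95]

ensemble = average_ensemble(regression, tree, neural_net)
print([round(p, 3) for p in ensemble])
```

For the third applicant the models disagree widely (0.50 to 0.95), and the average of 0.70 falls below the break-even probability implied by the profit matrix, so averaging can change the lending decision.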

5. We will again highlight the use of the Replacement node. In this case, we want to replace the values of BALANCE_OF_CURRENT_ACCOUNT with simple values for high, none, low, and missing. Cleaning data is a frequent activity for most data miners. Select the Replacement node in the Diagram Workspace.

6. Change the default interval replacement method to none.

7. Select the class variables replacement editor and make the changes shown below:

8. Now that we have completed that task, select the Model Comparison node and run the path. Once the path has completed, we will select a champion model.

9. Open the results of the Model Comparison node and select the output listing. In this case, the Gradient Boosting model has posted the best Test Average Profit value, followed by the Decision Tree, and has been automatically selected as the champion model. In terms of Average Squared Error (ASE), the Ensemble model is the champion, followed by the Gradient Boosting model. Why are these orderings different? The profit matrix that we entered weights the cells of the classification matrix unevenly, so a measure based on profit and a measure based on squared error need not rank the models in the same order. Alternatively, you may think of these models as having identified different features within the overall pattern.
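A toy Python example makes the point about the two orderings concrete. The four outcomes and the two sets of probabilities are invented for illustration, and loans are approved when the probability of good credit exceeds the break-even value 1/1.35 implied by the profit matrix (an assumption of this sketch).

```python
PROFIT_GOOD, LOSS_BAD = 0.35, -1.00
BREAK_EVEN = 1.0 / 1.35  # approve only when P(good) exceeds this value


def average_squared_error(probabilities, outcomes):
    """Mean of (P(good) - outcome)^2, with outcome 1 for good and 0 for bad."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def average_profit(probabilities, outcomes):
    """Mean profit per applicant when loans are approved above the break-even."""
    total = 0.0
    for p, y in zip(probabilities, outcomes):
        if p > BREAK_EVEN:
            total += PROFIT_GOOD if y == 1 else LOSS_BAD
    return total / len(outcomes)


outcomes = [1, 1, 1, 0]             # three good customers, one bad
model_a = [0.90, 0.90, 0.90, 0.80]  # confident, but approves the bad customer
model_b = [0.75, 0.75, 0.75, 0.70]  # less confident, declines the bad customer

for name, probs in (("A", model_a), ("B", model_b)):
    print(name,
          round(average_squared_error(probs, outcomes), 4),
          round(average_profit(probs, outcomes), 4))
```

Model A has the slightly lower squared error, yet model B earns the higher average profit because it avoids the one costly false positive.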

The ability of Ensembles to outperform their constituent models on classification tasks (Elder 2003) is a very interesting effect. If we look at the expected profit curve, we find a different story. In this case, the Neural Network is the champion, with a consistent expected profit of 0.35 in the top 5% over the training, validation, and test data sets.

10. We can also look at more conventional measures such as the ROC curve, as shown below. If we were selecting models based on the tradeoff between specificity and sensitivity, the Ensemble model is consistently superior, while the Neural Network is a close, perhaps insignificantly different, second. The Decision Tree and Gradient Boosting models are ranked lower on this measure. In fact, the sharp, angular shape of the Decision Tree curve indicates that a shallow tree was created, which produces a coarse distribution of probabilities.

11. We may also examine the lift charts, where we find that the Neural Network is now consistently the highest ranked model, followed by the Gradient Boosting model. The Decision Tree is again ranked lower due to its highly pruned structure.

12. To better understand these results, we can look at the distributions of scores. In the results of the Model Comparison node, select View → Assessment → Score Distribution Plots. First look at the plot of the Gradient Boosting model and find a good separation of true and false events.

Now, we look at the Ensemble model and the Neural Network model. The Ensemble model shows a similar distribution with more separation between the two cases, which can be related to its slightly better score on ASE. The Neural Network model, on the other hand, shows a very different score distribution that produces more low probabilities and more overlap between the two cases. Remember that the Ensemble model is a mixture of the constituent models, including both the Gradient Boosting model and the Neural Network model.

13. Now that we have established that the Gradient Boosting model is our champion, we can look at its results in more detail. Open the results of the Gradient Boosting node and examine the Subseries Plot. This shows the reduction of error as the model grows more complex. However, the ASE plot does not show why the model was selected at iteration 39. Because we entered a profit matrix, SAS Enterprise Miner also shows the evolution of profit. The Gradient Boosting model was selected to maximize profit, which helps us develop a better campaign.

Conclusion

The model selected as champion depends largely on the measurement used to make the decision. Selection based on Average Profit chooses the Gradient Boosting model, yielding an average profit per case of 0.069, but an expected profit of 0.35 in the top 5%. The data miner will select and report rank order measures at a population depth that is appropriate for the business case. The Neural Network and Ensemble models were very close in terms of overall performance, and each would have been selected if the criteria were different. All three models detected the pattern and produced usable models. The selection of a model is often determined by business rules and regulations; however, producing a best model from modern techniques is still valuable for setting bounds on the expected performance of the chosen model.

Section-III: Micro-Target the Profitable Customers

This sub-section presents detailed steps on how to identify the customers that would be most profitable. Recall that the Decision Tree is the best model, and hence we will focus on this model. The following red block highlights the parts of the diagram that will be used for micro-targeting:

1. (Scoring Data via the Input Data Node) Drag the Input Data node to the Diagram Workspace. Import new data for scoring (see details in Section-I, item 2). Click the Input Data node to activate its Property Panel. Change the Role from Raw to Score:

2. (Score Node) Add the Score node to the Decision Tree node. Run the Score node.

3. (SAS Code Node)

    /* Number the scored applicants so each can be traced after sorting. */
    data Customers;
        set &EM_Import_Score;
        Customer_ID = _N_;
    run;

    /* Rank applicants from most to least likely to be credit-worthy. */
    proc sort data=Customers;
        by descending P_CreditabilityGood;
    run;

    /* Keep the top 5% of the 1,000 scored applicants (50 rows). */
    data good_customers;
        set Customers;
        Obsnum = _N_;
        if Obsnum > 0.05*1000 then delete;
    run;

    /* List the selected applicants; '*' marks line breaks in the label. */
    proc print data=good_customers noobs split='*';
        var Obsnum Customer_ID P_CreditabilityGood;
        label P_CreditabilityGood = 'Predicted*Good*Credit';
        title "Credit Worthy Applicants";
    run;

References

Adnan, A., and E. Bastos. 2005. “A Comparative Estimation of Machine Learning Methods on QSAR Data Sets.” SUGI-30 Proceedings.

Ayres, I. 2007. Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart. Bantam Books. http://www.youtube.com/watch?v=cb4d4jl2A6E.

Elder, J.F., IV. 2003. “The Generalization Paradox of Ensembles.” Journal of Computational and Graphical Statistics 12, No. 4: 853-864.

Foster, D.P., and R.A. Stine. 2004. “Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy.” JASA, Vol. 99, No. 466, 303–313.

Friedman, J.H. 2002. “Greedy Function Approximation: a Gradient Boosting Machine.” Annals of Statistics 29, 1189-1232.

Friedman, J.H., and B.E. Popescu. 2005. “Uncovering Interaction Effects.” Presented at the Second International Salford Systems Data Mining Conference.

Wang, C., and B. Liu. 2008. “Data Mining for Large Datasets and Hotspot Detection in an Urban Development Project.” Journal of Data Science.

Yu, J.S., S. Ongarello, R. Fiedler, X.W. Chen, G. Toffolo, C. Cobelli, and Z. Trajanoski. 2005. “Ovarian Cancer Identification Based on Dimensionality Reduction for High-Throughput Mass Spectrometry Data.” Bioinformatics, Vol. 21 No. 10, 2200–2209.

Appendix (Import German Credit Excel data to the SASUSER library)

1. Open Base SAS. Select File → Import Data.

2. Click Next, and then click Browse to locate the data:

3. Select the SASUSER library.

4. Type the name of the file, click Next, and then click Finish to complete the data import.
