     Data Mining in Building Behavioral Scoring Models 

     Horng-I Hsieh

    Graduate Institute of Business Administration

    Fu-Jen Catholic University

    Taipei County, Taiwan

    [email protected]

    Tsung-Pei Lee

    Department of International Trade and Finance

    Fu-Jen Catholic University

    Taipei County, Taiwan

    [email protected]

    Tian-Shyug Lee*

    Department of Business Administration

    Fu-Jen Catholic University

    Taipei County, Taiwan

    [email protected]

    Abstract: Credit scoring and behavioral scoring have become very important credit risk management tasks during the past few years due to the impact of several financial crises. The objective of this study is to explore the performance of behavioral scoring using three commonly discussed data mining techniques: linear discriminant analysis (LDA), backpropagation neural networks (BPN), and support vector machines (SVM). To demonstrate the effectiveness of behavioral scoring using these techniques, behavioral scoring tasks are performed on a credit card dataset from one bank in Taiwan. The results show that BPN outperforms the other techniques in terms of overall scoring accuracy and hence is an effective alternative for implementing behavioral scoring tasks.

    Keywords: behavioral scoring; multi-class classification; neural networks; support vector machine

    * Corresponding author.

    I.  INTRODUCTION

    The credit industry has grown rapidly during the past few decades, and financial institutions would find it impossible to decide whether to grant credit to customers without using various automatic analysis techniques. With increased competition in the financial industry, banks find it harder than before to develop new customers. Because the cost of attracting new customers is increasing rapidly, retaining existing customers is a must for all banks. Banks may easily lose customers if they cannot satisfy them with good-quality service; that is, customers will choose other banks that can provide better personalized services. As contemporary marketing treats customers as valuable assets, it is crucial to extract knowledge from the huge volume of consumer data collected by banks' credit departments in order to maintain viability and optimize the credit book. Financial institutions invest large amounts of resources to understand and explore the contribution of their existing customers. Through analysis of customer contribution, they can identify valuable customers and make further efforts to retain them and enhance their contributions, increase customer satisfaction, build better bank-customer relationships, and locate the causes of customer loss. The analysis results serve as an important reference for marketing and decision making.

    However, it is not sufficient to identify valuable customers solely on the basis of their payment histories. A customer who pays his bills on time every month might also be involved in risky behaviors at the same time. As delinquencies and charge-offs soar, personal bankruptcy has become an international concern, yet it is little understood [1]. Credit risk assessment is therefore crucial for financial institutions, especially in this era of economic globalization. Recently, several financial crises, such as the credit and cash card crises in Taiwan and the subprime mortgage crisis in the US, have not only caused severe recessions in domestic economies but also resulted in a global financial disaster. All of them were triggered by inappropriate credit decisions. Speed and efficiency are key components of the success of credit issuers; however, profitability in the consumer credit industry depends on faster and better decisions. Granting credit to customers with poor credit performance may cause huge losses and even lead banks into bankruptcy. The adage “pay on the spot and borrow a lot; pay slow and you’ll get no dough” fails to attack the problem before it arises. The financial health of a customer may change dramatically and thus must be monitored and managed over the life of the leasing relationship.

    Credit scoring and behavioral scoring are the primary techniques for answering these needs; they aim to increase cash flow and reduce losses from bad credit decisions. Many banks continue to focus attention and resources on automating front-end credit scoring processes, but back-end behavioral scoring is the foundation of their success [2]. The objective of this study is to explore the performance of behavioral scoring using several commonly discussed data mining techniques: linear discriminant analysis (LDA), backpropagation neural networks (BPN), and support vector machines (SVM).

    The remainder of the paper is organized as follows. An overview of credit and behavioral scoring models is given in Section II. Section III provides a brief outline of LDA, BPN, and SVM. The empirical results of behavioral scoring models using LDA, BPN, and SVM are provided in Section IV. Section V concludes the study and discusses possible areas for further research.

    II.  CREDIT AND BEHAVIORAL SCORING MODELS 

    Credit risk management has become one of the most important tasks in the credit industry. Credit scoring and its derivative, behavioral scoring, are the primary techniques that help organizations decide whether to grant credit to consumers based on the credit risk of applicants [3]. The objective of both credit and behavioral scoring models is to assign consumers to either a ‘good credit’ group that is likely to repay its financial obligation or a ‘bad credit’ group whose application should be denied because of its high probability of defaulting on the financial obligation. Credit scoring therefore lies in the domain of the more general and widely discussed classification problems [4]. Credit scoring is done on the front end, when a new consumer applies for credit. Behavioral scoring, on the other hand, evaluates the credit risk of existing consumers based on the same principles. In building behavioral scoring models, one uses the credit scoring variables and adds others that describe the consumer's behavior [3].

    This research was partially supported by the National Science Council of the Republic of China under Grant Number NSC 97-2221-E-030-011-MY2.

    978-1-4244-5392-4/10/$26.00 ©2010 IEEE

    Behavioral scoring has gained more and more attention because the explosion of competition in recent years has made it difficult to attract and retain profitable, low-risk customers. Generally, most business revenue, sometimes up to 90 percent, is produced from repeat transactions with existing customers [2]. In addition, the financial health of customers may change over time and thus must be continuously monitored and managed. By forecasting customers' future performance, behavioral scoring models allow financial institutions to make faster and better decisions to retain creditworthy and valuable customers.

    Traditional research often treats credit scoring and behavioral scoring models as binary classification problems, an approach that might miss useful information [5]. In order to discover more knowledge for advanced credit risk management, multi-class classifiers are needed [6]. This study therefore uses multi-class models to build behavioral scoring models.

    III.  RESEARCH METHODOLOGY

     A.   Artificial Neural Networks

    A neural network, modeled on the neural activity of the human brain, is a computer-intensive, algorithmic procedure for transforming inputs into desired outputs using highly interconnected networks of relatively simple processing elements. Neural networks are increasingly found to be useful in modeling non-stationary processes due to their associated memory characteristics and generalization capabilities. They have been widely used in engineering, science, education, social research, medical research, business, finance, forecasting, and related fields. Neural networks have also been explored for credit and behavioral scoring problems [4], and the results show that their scoring accuracies are better than those of traditional statistical and parametric approaches.

     B.   Support Vector Machine

    SVM is a novel non-parametric statistical learning algorithm developed by Vapnik [7]. The original SVM was designed for solving the binary classification problem and has gained popularity due to its many attractive features and promising generalization performance. Based on the structural risk minimization (SRM) principle, SVM seeks to minimize an upper bound of the generalization error, rather than following the empirical risk minimization (ERM) principle implemented in most traditional neural network models. Neural network models generally tend to fall into local minima. Training an SVM, on the other hand, is equivalent to solving a linearly constrained quadratic programming (QP) problem, so it provides a solution that is both global and unique.
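    For reference, the QP mentioned above is usually written in the following soft-margin form. This is the standard textbook formulation rather than one reproduced from this paper; the symbols $w$, $b$, $\xi_i$, the feature map $\phi$, and the training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ are notation assumed here, and $C$ is the penalty parameter tuned in Section IV.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i \\
\text{subject to}\quad & y_i\,\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1,\dots,N .
\end{aligned}
```

    Because the objective is convex and every constraint is linear, any local minimum of this problem is also a global one, which is the property contrasted with neural network training above.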

    To create a classifier, the basic principle of SVM is to learn a line or hyperplane that separates the two classes in the data set. The hyperplane that maximizes the margin is called the optimal separating hyperplane (OSH). When the data are not linearly separable, the SVM uses the kernel method to map the input data into a high-dimensional feature space via a nonlinear mapping and then performs a linear separation between the two classes in that space.
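    As a concrete illustration of the kernel method, the sketch below evaluates the radial basis function (RBF) kernel, the kernel adopted later in Section IV.C, in plain Python. The function names, the sample points, and the value of gamma are illustrative assumptions and are not taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, gamma):
    """Pairwise kernel values; an SVM solver works with this matrix
    instead of computing the high-dimensional mapping explicitly."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Three hypothetical two-dimensional points (illustrative only).
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = gram_matrix(points, gamma=0.5)
```

    Each entry of K measures similarity between two points: identical points score 1, and the value decays toward 0 as the points move apart, at a rate controlled by gamma.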

    The SVM works very well with high-dimensional data and therefore avoids the curse of dimensionality. It has been widely used in modeling credit and behavioral scoring problems, and preliminary evidence suggests that support vector machines may be the most accurate [8]. For a detailed introduction, readers are referred to [7].

    IV.  EMPIRICAL STUDY 

    The aim of this study is to explore the performance of behavioral scoring using three commonly discussed data mining techniques: LDA, BPN, and SVM. To demonstrate the effectiveness of behavioral scoring using these techniques, behavioral scoring tasks are performed on a credit card dataset from one bank in Taiwan. The dataset consists of a total of 10,769 cardholders. Each cardholder record contains 41 variables, such as demographics, credit history, account balances, and payment history, which describe the cardholder's characteristics, credit status, and credit card usage behavior. The dataset is divided into four groups: transactor users, revolver users, inactive users who do not use their credit cards, and bad credit customers. The four groups contain 3835, 2567, 3884, and 483 cardholders, respectively. Table I summarizes the distribution of the dataset.

    TABLE I. DISTRIBUTION OF DEPENDENT VARIABLE

    Groups    Description            Frequency   Percentage
    Group 1   Transactor users            3835       35.61%
    Group 2   Revolver users              2567       23.84%
    Group 3   Inactive users              3884       36.07%
    Group 4   Bad credit customers         483        4.49%
    Total                                10769       100.0%
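    The percentages in Table I follow directly from the group frequencies. A minimal sketch of that arithmetic is given below; the variable names are illustrative, and only the counts come from the table.

```python
# Group frequencies copied from Table I.
counts = {
    "Transactor users": 3835,
    "Revolver users": 2567,
    "Inactive users": 3884,
    "Bad credit customers": 483,
}

total = sum(counts.values())
# Share of each group as a percentage of all cardholders.
shares = {name: 100.0 * n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:<22}{n:>6}{shares[name]:>8.2f}%")
print(f"{'Total':<22}{total:>6}")
```

    The bad credit group makes up less than 5% of the sample, so the dataset is strongly imbalanced; this is one reason the per-group accuracies reported below matter in addition to the overall rate.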

    To minimize the possible bias associated with the random sampling of the training and testing samples, researchers tend to use an n-fold cross-validation scheme to evaluate the classification capability of the built model. In n-fold cross-validation, the entire dataset is randomly split into n mutually exclusive subsets (also called folds) of approximately equal size, preserving the ratios of the different populations. The classification model is then trained and tested n times. Each time, the model is built using (n-1) folds as the training sample, and the remaining fold is held out for testing. The training sample is used to estimate the behavioral scoring model’s parameters, while the holdout sample is used to test the generalization capability of the built model. The overall classification accuracy of the built model is then simply the average of the n individual accuracy measures. Five-fold cross-validation is adopted in this study, and the detailed behavioral scoring results for the above-mentioned modeling techniques are summarized as follows.
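    The cross-validation scheme described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the random seed, the round-robin assignment of shuffled cases to folds, and the majority-class baseline used as a stand-in model are all assumptions. Only the group sizes are taken from Table I.

```python
import random


def stratified_folds(labels, n_folds, seed=0):
    """Split indices into n_folds disjoint folds that keep class ratios similar."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled cases of each class to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def cross_validate(labels, n_folds, fit_and_score, seed=0):
    """Train on n_folds - 1 folds, test on the held-out fold, average the scores."""
    folds = stratified_folds(labels, n_folds, seed)
    scores = []
    for held_out in range(n_folds):
        test_idx = folds[held_out]
        train_idx = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / n_folds


# Group labels with the sizes reported in Table I.
labels = [1] * 3835 + [2] * 2567 + [3] * 3884 + [4] * 483


def majority_baseline(train_idx, test_idx):
    """Placeholder model: always predict the most frequent training class."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


folds = stratified_folds(labels, n_folds=5)
mean_accuracy = cross_validate(labels, 5, majority_baseline)
```

    Any of the three classifiers studied here could take the place of `majority_baseline`. The fold sizes produced by this sketch differ by a few cases from those in the tables below, since the exact assignment used in the study is not stated.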

     A.   Linear Discriminant Analysis Model

    The stepwise discriminant approach is adopted in building the discriminant analysis behavioral scoring models. Twenty-six of the forty-one independent variables are selected in the final discriminant function. The behavioral scoring classification results for the corresponding testing samples, obtained from the five discriminant functions, are summarized in Table II.

    TABLE II. CROSS-VALIDATION RESULTS OF THE LDA MODELS

    Fold    Behavioral scoring results                                             Average correct
    number  {1-1}             {2-2}             {3-3}             {4-4}            classification rate
    1       92.31% (708/767)  75.05% (385/513)  68.34% (531/777)  94.85% (92/97)   79.67% (1716/2154)
    2       92.70% (711/767)  71.35% (366/513)  72.20% (561/777)  90.72% (88/97)   80.13% (1726/2154)
    3       92.18% (707/767)  70.96% (364/513)  70.01% (544/777)  92.78% (90/97)   79.16% (1705/2154)
    4       94.00% (721/767)  76.07% (391/514)  72.97% (567/777)  97.92% (94/96)   82.31% (1773/2154)
    5       91.66% (703/767)  75.88% (390/514)  72.94% (566/776)  89.58% (86/96)   81.05% (1745/2153)
    Mean    92.57%            73.86%            71.29%            93.17%           80.46%

    Here {x-y} means that a customer in group x is classified into group y; for the definitions of the groups, please refer to Table I.
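    The last column of Table II can be reproduced from the per-group counts: each fold's average correct classification rate is the number of correctly classified cases divided by the fold size, and the reported mean is the simple average of the five fold rates. A short sketch of this arithmetic follows; the names are illustrative, and the counts are copied from Table II.

```python
# (correct, total) per group {1-1}, {2-2}, {3-3}, {4-4} for each fold of Table II.
lda_folds = [
    [(708, 767), (385, 513), (531, 777), (92, 97)],
    [(711, 767), (366, 513), (561, 777), (88, 97)],
    [(707, 767), (364, 513), (544, 777), (90, 97)],
    [(721, 767), (391, 514), (567, 777), (94, 96)],
    [(703, 767), (390, 514), (566, 776), (86, 96)],
]


def fold_accuracy(groups):
    """Overall correct classification rate (%) of one fold."""
    correct = sum(c for c, _ in groups)
    total = sum(t for _, t in groups)
    return 100.0 * correct / total


fold_rates = [fold_accuracy(groups) for groups in lda_folds]
mean_rate = sum(fold_rates) / len(fold_rates)
```

    Rounded to two decimals, `fold_rates` and `mean_rate` agree with the values printed in the table.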

    From the results in Table II, we can observe that the average correct classification rates for the five folds are 79.67%, 80.13%, 79.16%, 82.31%, and 81.05%, respectively, with a mean of 80.46%. Among the four groups of customers, group 4 achieves the best classification accuracy, between 89.58% and 97.92% with an average of 93.17%, while group 3 has the worst classification accuracy, ranging from 68.34% to 72.97%.

     B.   Backpropagation Neural Networks Model

    The popular BPN is used in building the behavioral scoring model, and a single-hidden-layer network structure is adopted. There are 41 input nodes in the input layer and 4 output nodes. Because determining the optimal number of hidden nodes is a crucial yet complicated issue, the most common way to determine it is through experiments, or trial and error. We therefore also use the trial-and-error approach, with a range from 43 to 88 neurons, to determine the appropriate number of hidden nodes for the desired networks. Network training is carried out with learning rates ranging from 0.01 to 0.9 (almost none of the network structures converges with a learning rate greater than 0.9) and training lengths ranging from 56,000 to 500,000 iterations, until the network converges. Network weights are reset for each combination of network parameters, such as learning rate and momentum.
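    The network structure described above (41 inputs, one hidden layer, 4 outputs) can be sketched in plain Python. The paper does not state the activation function, error measure, or weight initialization, so the sigmoid units, squared error, uniform initial weights, and the omission of the momentum term below are all assumptions, as are the class and method names and the randomly generated input.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SingleHiddenLayerNet:
    """Fully connected network with one hidden layer of sigmoid units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One row of weights per unit; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
                  for row in self.w_out]
        return hidden, output

    def train_step(self, x, target, learning_rate):
        """One backpropagation update; returns the squared error before the update."""
        hidden, output = self.forward(x)
        # Output deltas: error times the derivative of the sigmoid.
        delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
        # Hidden deltas: output deltas propagated back through the output weights.
        delta_hidden = [
            h * (1.0 - h) * sum(d * row[j] for d, row in zip(delta_out, self.w_out))
            for j, h in enumerate(hidden)
        ]
        for row, d in zip(self.w_out, delta_out):
            for j, v in enumerate(hidden + [1.0]):
                row[j] -= learning_rate * d * v
        for row, d in zip(self.w_hidden, delta_hidden):
            for j, v in enumerate(x + [1.0]):
                row[j] -= learning_rate * d * v
        return sum((o - t) ** 2 for o, t in zip(output, target))


# One candidate from the trial-and-error range of 43 to 88 hidden nodes.
net = SingleHiddenLayerNet(n_in=41, n_hidden=43, n_out=4)
rng = random.Random(1)
x = [rng.random() for _ in range(41)]   # hypothetical cardholder record
target = [0.0, 0.0, 1.0, 0.0]           # one-hot code for group 3
errors = [net.train_step(x, target, learning_rate=0.1) for _ in range(200)]
```

    In a trial-and-error search, a fresh network of this kind would be created for every combination of hidden-node count and learning rate, which corresponds to resetting the weights for each parameter combination.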

    Because the training of any neural network is itself a stochastic process, the reported neural network result is the median value of 20 repeated trials, which avoids possible extreme values due to unusually well or poorly trained networks. The network topology with the highest correct classification rate is considered the optimal network topology. Five neural network behavioral scoring models were built, and the classification results for the corresponding testing samples are summarized in Table III. From the results in Table III, we can observe that the average correct classification rates for the five folds are 94.52%, 93.55%, 93.92%, 94.99%, and 95.12%, respectively, with a mean of 94.42%. Among the four groups of customers, group 4 achieves the best classification accuracy, between 96.88% and 100.00% with an average of 97.93%, while group 2 has the worst classification accuracy, ranging from 89.67% to 93.39%.
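    The reason for reporting the median of repeated trials rather than the mean can be seen in a small example. The 20 accuracy values below are hypothetical and chosen only for illustration (they are not results from this study); one of them represents a badly trained network.

```python
import statistics

# Hypothetical accuracies (%) of 20 training runs of one network topology.
# The last value stands for a run that failed to train properly.
trial_accuracies = [
    94.1, 94.3, 94.5, 94.4, 94.6, 94.2, 94.5, 94.7, 94.4, 94.3,
    94.6, 94.5, 94.4, 94.8, 94.2, 94.5, 94.3, 94.6, 94.4, 61.0,
]

reported = statistics.median(trial_accuracies)  # robust to the failed run
average = statistics.mean(trial_accuracies)     # pulled down by the failed run
```

    A single poor run lowers the mean noticeably, whereas the median stays at the level of a typical run.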

    TABLE III. CROSS-VALIDATION RESULTS OF THE BPN MODELS

    Fold    Behavioral scoring results                                             Average correct
    number  {1-1}             {2-2}             {3-3}             {4-4}            classification rate
    1       95.83% (735/767)  90.45% (464/513)  95.37% (741/777)  98.97% (96/97)   94.52% (2036/2154)
    2       94.39% (724/767)  89.67% (460/513)  94.85% (737/777)  96.91% (94/97)   93.55% (2015/2154)
    3       95.05% (729/767)  90.64% (465/513)  94.59% (735/777)  96.91% (94/97)   93.92% (2023/2154)
    4       96.87% (743/767)  92.22% (474/514)  94.72% (736/777)  96.88% (93/96)   94.99% (2046/2154)
    5       96.48% (740/767)  93.39% (480/514)  94.33% (732/776)  100.00% (96/96)  95.12% (2048/2153)
    Mean    95.72%            91.27%            94.77%            97.93%           94.42%

    C.  Support Vector Machine Model

    In this study, the most widely used radial basis function (RBF) [7] [9] is adopted as the kernel function of the SVM. One of the key problems in SVM modeling is how to select the model parameters correctly, since this plays an important role in achieving good generalization performance. However, no general guidelines are available for choosing the free parameters of an SVM model. The selection is usually based on trial and error or on the user’s prior knowledge and expertise.

    This study used a ‘grid-search’ method [9] to find the best combination of parameters. Grid search is a straightforward method that uses exponentially growing sequences of C and γ to identify good parameters (for example, $C = 2^{-5}, 2^{-3}, 2^{-1}, \ldots, 2^{15}$). Since a complete grid search is time-consuming for a large dataset, a two-step search process, beginning with a coarse grid and followed by a finer grid search, was conducted to select the model parameters. Five SVM behavioral scoring models were built, and the cross-validation results of these models are presented in Table IV. From the results in Table IV, we can observe that the average correct classification rates for the five folds are 94.06%, 94.05%, 93.31%, 94.52%, and 94.84%, respectively, with a mean of 94.16%. Among the four groups of customers, group 4 achieves the best classification accuracy, between 93.75% and 97.94% across the folds with an average of 95.85%, while group 2 has the worst classification accuracy, ranging from 88.50% to 93.00%.
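    The two-step grid search can be sketched as below. The coarse sequence for C is the one quoted in the text; the γ sequence and the finer step of 0.25 in the exponent follow the example in the guide cited as [9] and are assumptions as far as this study is concerned. The function names are illustrative, and `toy_score` is a made-up stand-in for the cross-validated accuracy of an SVM trained with a given (C, γ).

```python
import math


def exponent_grid(start, stop, step):
    """Exponents start, start + step, ..., stop (inclusive)."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def grid_search(score, c_exponents, gamma_exponents):
    """Return (best score, log2 C, log2 gamma) over all grid points."""
    best = None
    for c_exp in c_exponents:
        for g_exp in gamma_exponents:
            value = score(2.0 ** c_exp, 2.0 ** g_exp)
            if best is None or value > best[0]:
                best = (value, c_exp, g_exp)
    return best


def coarse_to_fine(score):
    # Coarse pass: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3.
    _, c_exp, g_exp = grid_search(
        score, exponent_grid(-5, 15, 2), exponent_grid(-15, 3, 2))
    # Fine pass: exponent step 0.25 within +/-2 of the coarse optimum.
    return grid_search(
        score,
        exponent_grid(c_exp - 2, c_exp + 2, 0.25),
        exponent_grid(g_exp - 2, g_exp + 2, 0.25))


def toy_score(C, gamma):
    """Hypothetical score surface peaking at C = 2^3.5, gamma = 2^-6.25."""
    return -((math.log2(C) - 3.5) ** 2 + (math.log2(gamma) + 6.25) ** 2)


best_value, best_c_exp, best_gamma_exp = coarse_to_fine(toy_score)
```

    The coarse pass evaluates 11 x 10 = 110 parameter pairs and the fine pass a further 17 x 17 = 289, far fewer than a single grid at the fine resolution over the full range would require.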

    TABLE IV. CROSS-VALIDATION RESULTS OF THE SVM MODELS

    Fold    Behavioral scoring results                                             Average correct
    number  {1-1}             {2-2}             {3-3}             {4-4}            classification rate
    1       95.83% (735/767)  89.28% (458/513)  94.98% (738/777)  97.94% (95/97)   94.06% (2026/2154)
    2       96.22% (738/767)  89.47% (459/513)  94.85% (736/777)  94.85% (92/97)   94.05% (2025/2154)
    3       95.05% (729/767)  88.50% (454/513)  94.34% (733/777)  96.91% (94/97)   93.31% (2010/2154)
    4       95.96% (736/767)  91.25% (469/514)  95.37% (741/777)  93.75% (90/96)   94.52% (2036/2154)
    5       95.70% (734/767)  93.00% (478/514)  95.10% (738/776)  95.83% (92/96)   94.84% (2042/2153)
    Mean    95.75%            90.30%            94.93%            95.85%           94.16%

     D.  Comparison of Results of Different Behavioral Models

    Finally, to compare the classification capabilities of the three behavioral scoring models constructed above, the results are summarized in Table V. From Table V, we can conclude that the BPN model has the best behavioral scoring capability in terms of the average classification rate, compared with LDA and SVM. In terms of group classification accuracies, SVM achieved the best accuracy in groups 1 and 3, while BPN achieved the best accuracy in groups 2 and 4.

    TABLE V. GROUP ACCURACY COMPARISONS

    Group     Accuracy
              LDA      BPN      SVM
    Group 1   92.57%   95.72%   95.75%
    Group 2   73.86%   91.27%   90.30%
    Group 3   71.29%   94.77%   94.93%
    Group 4   93.17%   97.93%   95.85%
    Overall   80.46%   94.42%   94.16%
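    The comparison in Table V amounts to picking, for each row, the technique with the highest accuracy. The short sketch below does this mechanically; the accuracies are copied from Table V, and the variable names are illustrative.

```python
# Mean cross-validated accuracy (%) per group and technique, from Table V.
accuracy = {
    "Group 1": {"LDA": 92.57, "BPN": 95.72, "SVM": 95.75},
    "Group 2": {"LDA": 73.86, "BPN": 91.27, "SVM": 90.30},
    "Group 3": {"LDA": 71.29, "BPN": 94.77, "SVM": 94.93},
    "Group 4": {"LDA": 93.17, "BPN": 97.93, "SVM": 95.85},
    "Overall": {"LDA": 80.46, "BPN": 94.42, "SVM": 94.16},
}

# Technique with the highest accuracy in each row.
best = {group: max(scores, key=scores.get) for group, scores in accuracy.items()}

for group, model in best.items():
    print(f"{group}: {model} ({accuracy[group][model]:.2f}%)")
```

    The margins between BPN and SVM are small (at most about two percentage points in any group), while both are well ahead of LDA in groups 2 and 3.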

    V.  CONCLUSIONS AND FUTURE RESEARCH

    Consumer credit risk management has gained more and more attention during the past few years due to the impact of serious financial crises. Behavioral scoring is a widely used tool that lets banks continually assess the ongoing credit risk and consumer behavior of their existing customers. This paper investigates the classification capability of three modeling techniques in behavioral scoring tasks. Experimental results show that the BPN model has better overall classification accuracy than the other models. However, overall classification accuracy is not the only criterion for assessing the capability of a model. For example, if the bank's business strategy is to build a blacklist of bad credit accounts, the BPN model can be used to accomplish this goal, as it achieved the highest classification accuracy in group 4 (see Table V). On the other hand, if the business strategy is to identify inactive customers who are at risk of being lost to a competitor, then the SVM model should be used, as it achieved the highest classification accuracy in group 3.

    Future studies may aim at collecting more important independent variables that will increase behavioral scoring accuracy. Combining feature selection tools with classification techniques is also recommended. Integrating other artificial intelligence techniques, such as fuzzy discriminant analysis, genetic algorithms, and gray theory, with support vector machines to further improve behavioral scoring accuracy may also be explored. Other topics related to credit cards, such as customer retention, market basket analysis, profit scoring, and collection scoring models, may also be investigated in future research.

    REFERENCES

    [1]  J.M. Donato, J.C. Schryver, G.C. Hinkel, R.L. Schmoyer Jr., N.W. Grady, and M.R. Leuze, “Mining multi-dimensional data for decision support,” Future Gener. Comp. Syst., vol. 15, pp. 433–441, 1999.

    [2]  M. Banasiak and E. O'Hare, “Behavior scoring,” Bus. Credit, vol. 103, pp. 52–55, 2001.

    [3]  L.C. Thomas, “A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers,” Int. J. Forecast., vol. 16, pp. 149–172, 2000.

    [4]  T.S. Lee, C.C. Chiu, C.J. Lu, and I.F. Chen, “Credit scoring using the hybrid neural discriminant technique,” Expert Syst. Appl., vol. 23, pp. 245–254, 2002.

    [5]  H.J. Noh, T.H. Roh, and I. Han, “Prognostic personal credit risk model considering censored information,” Expert Syst. Appl., vol. 28, pp. 753–762, 2005.

    [6]  G. Kou, Y. Peng, Y. Shi, M. Wise, and W. Xu, “Discovering credit cardholders’ behavior by multiple criteria linear programming,” Ann. Oper. Res., vol. 135, pp. 261–274, 2005.

    [7]  V.N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed. New York: Springer, 2000.

    [8]  J.N. Crook, D.B. Edelman, and L.C. Thomas, “Recent developments in consumer credit risk assessment,” Eur. J. Oper. Res., vol. 183, pp. 1447–1465, 2007.

    [9]  C.W. Hsu, C.C. Chang, and C.J. Lin, “A practical guide to support vector classification,” Technical Report, Department of Computer Science and Information Engineering, National Taiwan University, 2003.