Texas A&M International University
Research Information Online
Theses and Dissertations
6-4-2015
Feature selection in credit scoring- a quadratic programming approach solving with bisection
method based on Tabu search
Jun Huang
Recommended Citation
Huang, Jun, "Feature selection in credit scoring- a quadratic programming approach solving with bisection method based on Tabu search" (2015). Theses and Dissertations. 2. https://rio.tamiu.edu/etds/2
This Dissertation is brought to you for free and open access by Research Information Online. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of Research Information Online.
FEATURE SELECTION IN CREDIT SCORING- A QUADRATIC PROGRAMMING
APPROACH SOLVING WITH BISECTION METHOD BASED ON TABU SEARCH
A Dissertation
by
JUN HUANG
Submitted to Texas A&M International University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
May 2014
Major Subject: International Business Administration
Copyright 2014 Jun Huang
Approved as to style and content by:
Chair of Committee, Haibo Wang
Committee Members, Jacqueline R Mayfield
Milton R Mayfield
Runchang Lin
Head of Department, Nereu Florencio Kock
ABSTRACT
Feature Selection in Credit Scoring- A Quadratic Programming Approach Solving with Bisection
Method Based On Tabu Search (May 2014)
Jun Huang, Master of Science, Texas A&M International University;
Chair of Committee: Haibo Wang
Credit risk is one of the most important topics in risk management and, according to the Basel
capital accord, the major risk faced by banks and financial institutions. As a form of credit risk
measurement, credit scoring is the credit evaluation process used to reduce the current and
expected risk of a customer being bad credit. Credit scoring models usually use a set of features
to predict the credit status of applicants: good credit (unlikely to default) or bad credit (more
likely to default). However, with the fast growth of the credit industry and the ease of collecting
and storing information brought by new technologies, a huge amount of customer information
is now available. Feature selection, or subset selection, is therefore essential for handling
irrelevant, redundant, or misleading features, so as to improve predictive (classification)
accuracy and to reduce the high complexity, intensive computation, and instability that affect
most credit scoring models.
In this study, a hybrid model is developed for credit scoring problems that selects feature
subsets and evaluates the classification accuracy they achieve. A correlation coefficient based
binary quadratic programming model for feature selection is established first. The model is then
solved with the bisection method based on the Tabu search algorithm (BMTS), which provides
alternative subsets of features of different sizes. From these, the satisfactory subsets for credit
scoring models are selected based on both size and overall classification accuracy rate (OCAR).
The results of the proposed BMTS+SVM method, tested on two benchmark credit datasets, shed
light on how existing credit scoring systems can be improved with flexibility and robustness.
This validated method is then used in an international business context on data from U.S. and
Chinese companies, in order to find the subsets of features that act as key factors in
distinguishing good credit companies from bad credit companies in these two countries.
Finally, the performance of classification models using different classifiers is evaluated in
terms of OCAR and misclassification cost on the U.S. and Chinese datasets. Cutoff values
that give the highest OCAR and the minimum misclassification cost are also discussed.
ACKNOWLEDGEMENTS
I would like to first thank Dr. Haibo Wang for his constant guidance, personal attention,
suggestions, endless encouragement, and full support during the last four and a half years of my
graduate study and research. Special thanks go to my committee members, Dr. Jacqueline R
Mayfield, Dr. Milton R Mayfield, and Dr. Runchang Lin, for their invaluable advice and
feedback. I would also like to express my sincere appreciation to the visiting scholar Dr. Zhibin
Xiong, who offered many valuable discussions during my dissertation research.
Finally, I would like to express my utmost gratitude to my family: my parents, younger
sister, and parents-in-law, whose unparalleled support and constant encouragement helped me
sail through the rigorous journey of the PhD program. I extend my deepest appreciation to my
beloved wife, Weiwei Wu, for her unconditional love, understanding, and inspiration. Her
endless support and encouragement throughout the entire doctorate program contributed greatly
to my success. A true blessing, she is indeed the highly valued significant-other.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I INTRODUCTION
  Background
  Purpose and Contribution
II LITERATURE REVIEW
  Credit Risk Management
  Credit Scoring
    Discriminant Analysis
    Logistic Regression
    Decision Trees
    Neural Networks
    Genetic Programming
    Support Vector Machines
  Feature Selection
III METHODOLOGY
  Model Construction
  Algorithm
  SVM Classifier
  Cross Validation
IV EXPERIMENT RESULTS AND COMPARISON ANALYSIS
  Validation of the Method on Two Benchmark Datasets
  Results and Comparison Analysis
V APPLICATION OF THE CREDIT SCORING AT CORPORATE LEVEL
  Reviews of Applications of Credit Scoring at Corporate Level
  A Study of Credit Scoring for the U.S. and Chinese Companies
  Model Predictive Performance and Evaluation
  ROC Curve
  Misclassification Cost
  Identification of Cutoff Value
VI CONCLUSION AND DISCUSSION
  Summary
  Discussion and Future Research
REFERENCES
APPENDIX
A EXAMPLE OF SOLUTIONS FOR MODEL 3.1
B STATISTICAL DESCRIPTION OF THE U.S. DATASET
C STATISTICAL DESCRIPTION OF CHINESE DATASET
D DEFINITIONS OF LONG TERM CREDIT RATINGS FROM S&P
E COMPLETE SELECTED SUBSETS AND OCAR FOR THE U.S. DATASET
F COMPLETE SELECTED SUBSETS AND OCAR FOR CHINESE DATASET
G SENSITIVITY AND 1-SPECIFICITY FOR THE U.S. DATASET
H SENSITIVITY AND 1-SPECIFICITY FOR CHINESE DATASET
VITA
LIST OF TABLES
Table 1: Summary of customer credit scoring models
Table 2: Penalty conversion
Table 3: Statistic description for Australian and German datasets
Table 4: Complete subsets of features associated with given α for Australian dataset
Table 5: Complete subsets of features associated with given α for German dataset
Table 6: OCAR for Australian case and comparison
Table 7: OCAR of selected subsets for Australian case and comparison
Table 8: OCAR for German case and comparison
Table 9: OCAR of selected subsets for German case and comparison
Table 10: Financial ratios in bankruptcy prediction literatures
Table 11: Financial ratios for the U.S. and Chinese companies
Table 12: Description of the U.S. and Chinese datasets
Table 13: OCAR for the U.S. dataset and comparison
Table 14: OCAR for Chinese dataset and comparison
Table 15: Comparison of financial ratios between the U.S. dataset and S&P
Table 16: Comparison of financial ratios between the U.S. and Chinese dataset
Table 17: ANOVA for features in operating ratios from the U.S. dataset
Table 18: ANOVA for features in operating ratios from Chinese dataset
Table 19: Description of training and testing data for the U.S. and Chinese datasets
Table 20: OCAR of five classifiers for the U.S. and Chinese datasets
Table 21: AUC of different classifiers for the U.S. and Chinese datasets
Table 22: OCAR in new cutoff value of five classifiers for the U.S. and Chinese datasets
Table 23: Misclassification cost for the U.S. and Chinese datasets
Table 24: Misclassification cost with new cutoff values for the U.S. dataset
LIST OF FIGURES
Fig. 1: Dissertation structure
Fig. 2: Relationship between number of research papers and year
Fig. 3: Logistic function P
Fig. 4: An example of a decision tree
Fig. 5: An example of a neuron
Fig. 6: An example of a neural network
Fig. 7: An example of expression of GP
Fig. 8: An example of mutation in GP
Fig. 9: An example of crossover in GP
Fig. 10: An example of a SVM in the two-dimensional space
Fig. 11: Flowchart of filter approaches
Fig. 12: Flowchart of wrapper approaches
Fig. 13: Flowchart of the BMTS+SVM method
Fig. 14: An example of the cross validation
Fig. 15: Relationship between number of features and α
Fig. 16: Structure of credit scoring study at corporate level
Fig. 17: ROC for the U.S. dataset
Fig. 18: ROC for Chinese dataset
This dissertation is modeled on Expert Systems with Applications.
CHAPTER I
INTRODUCTION
1.1 Background
Credit risk is one of the most important topics in risk management and, according to the Basel
capital accord, the major risk faced by banks and financial institutions (Stephanou & Mendoza,
2005). With the rapid development of the credit industry and the increasing complexity of
banking activities, various credit risk problems have arisen. For instance, growing defaults by
borrowers have increased the non-performing assets of banks, which may even cause banks to go
bankrupt (World Bank, 2013); bond holders and investors suffer great losses when a company
defaults because it cannot pay interest on time; and credit risk problems even bear some
responsibility for financial catastrophes such as the 2008 Global Financial Crisis (Utzig, 2010).
Therefore, developing and establishing credit risk measurement is extremely important for
mitigating these risks.
Historically, financial institutions have relied on loan officers’ experience, using techniques
such as the 5 Cs, to assess credit quality. However, with the increasing complexity of banking
activities, qualitative methods based on human judgment could no longer meet the needs of
credit risk management, and credit risk measurements dominated by quantitative methods became
increasingly popular. As a form of credit risk measurement, credit scoring is the credit evaluation
process used to reduce the current and expected risk of a customer being bad credit, so that
losses due to bad debt can be mitigated (Abdou & Pointon, 2011).
Generally, credit scoring models apply statistical approaches and artificial intelligence
approaches (Huang, Chen & Wang, 2007). A main stream of credit scoring research develops
classification models so that, based on analysis of the past performance of consumers, future
credit applicants can be classified into one of the predefined classes, typically a good class
(unlikely to default) or a bad class (more likely to default), according to properties that describe
the demographic characteristics and the economic or financial conditions of the applicants
(García, Marqués & Sánchez, 2012). A variety of credit scoring models have been developed,
including statistical classification approaches, such as logistic regression, linear discriminant
analysis, factor analysis, and probit regression, and artificial intelligence approaches, such as
expert systems, fuzzy algorithms, genetic programming, neural networks, and support vector
machines (Šušteršič, Mramor & Zupan, 2009). The benefits of developing credit scoring models
include reducing the cost of credit analysis, enabling faster credit decisions, better examination
of existing accounts, and prioritizing collections (Brill, 1998). For example, Hibernia
Corporation, a Louisiana bank, reported that before implementing credit scoring in 1993 its
seven loan officers processed 100 small business loan applications per month. By 1995, the
same number of loan officers processed 1,100 applications per month. In addition, the business
loan portfolio increased from $100 million to $600 million between 1993 and 1995, and the
bank made fewer bad loans (Lawson, 1995).
With the rapid growth of the credit industry and the ease of collecting and storing information
brought by new technologies, especially after the rise of e-commerce, a huge amount of
information on customer behavior is available (Wollan, 2008). However, including high
dimensional data often leads to high complexity, intensive computation, instability, or lack of
predictive accuracy for most classification models (Liu & Schumann, 2005). Feature selection, or
subset selection, is therefore necessary to reduce the number of features used, so that
predictions become both more accurate and more efficient.
In many real world problems, feature selection is also treated as a preprocessing step applied to
the variables before other sophisticated analysis tools are used. It is well known that keeping
uninformative variables in the model increases the variance of the response variable and thus
degrades the predictive performance of the model. Feature selection can help to improve
decision making by (1) improving prediction performance by eliminating uninformative
variables; (2) providing faster and more cost-effective predictors, which saves the cost of
collecting data and yields less computationally expensive models; and (3) providing a better
understanding of the underlying process and making the model more interpretable (Guyon &
Elisseeff, 2003). Hence, feature selection is very important in building classification models.
1.2 Purpose and Contribution
In this study, a hybrid model for credit scoring problems is developed. In the first phase, a
correlation coefficient based binary quadratic programming model for feature selection is
established. The model is then solved by a bisection method based on the Tabu search algorithm
(BMTS), which provides alternative subsets of features of different sizes. In the second phase,
the satisfactory subsets for credit scoring models are selected based on both size (the number of
features in a subset) and predictive performance in terms of overall classification accuracy rate
(OCAR), which is derived from Support Vector Machines (SVM) with 10-fold cross validation.
The presented hybrid model, using the BMTS+SVM method, not only reduces the computational
effort required by the classifier but also provides flexible options, so that a tradeoff between
accuracy and subset size is available.
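As an illustration of the second phase, the following minimal sketch shows how an OCAR can be obtained from k-fold cross validation for one candidate subset of features. It is only a sketch under stated assumptions: a nearest-centroid rule stands in for the SVM classifier so that the example needs no external library, the toy data and all function names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_validated_ocar`) are hypothetical, and the folds are taken in the given order without shuffling.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_centroids(X, y):
    """Per-class mean vector (hypothetical stand-in for training an SVM)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_validated_ocar(X, y, subset, k=10):
    """OCAR = correctly classified test cases / all cases, pooled over k folds."""
    Xs = [[row[j] for j in subset] for row in X]  # keep only selected features
    correct = 0
    for test_idx in k_fold_indices(len(Xs), k):
        held_out = set(test_idx)
        train_X = [Xs[i] for i in range(len(Xs)) if i not in held_out]
        train_y = [y[i] for i in range(len(Xs)) if i not in held_out]
        model = fit_centroids(train_X, train_y)
        correct += sum(predict(model, Xs[i]) == y[i] for i in test_idx)
    return correct / len(Xs)


# Toy data (assumption): feature 0 separates the classes, feature 1 is noise.
X, y = [], []
for i in range(20):
    label = i % 2  # 1 = good credit, 0 = bad credit
    X.append([2.0 * label + 0.1 * (i % 5), float((i * 7) % 5)])
    y.append(label)

print("OCAR, informative feature only:", cross_validated_ocar(X, y, [0]))
print("OCAR, noise feature only:", cross_validated_ocar(X, y, [1]))
```

In practice the stand-in rule would be replaced by an SVM, and the OCAR would be reported for every candidate subset so that accuracy can be weighed against subset size.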
This proposed BMTS+SVM method is validated with two benchmark credit datasets, and can be
applied to determine, from a pool of factors, the key factors that provide the best discriminating
power in identifying good credit and bad credit customers. An application of the method is
therefore illustrated in an international business context on U.S. and Chinese companies, in
order to find the subsets of features that act as key factors in differentiating between
creditworthy companies (CWCs), which are unlikely to default, and less creditworthy companies
(LCWCs), which are more likely to default, in these two countries. The most useful factors, in
terms of financial categories and financial ratios, are first identified for the U.S. companies.
The four financial categories are profitability, solvency, cash flow, and leverage ratios, and they
are in line with the four financial categories to which the 8 financial ratios provided by the
widely recognized credit rating agency Standard & Poor’s belong. Similarly, we found the same
four financial categories for Chinese companies, plus an additional category of operating ratios.
This indicates that the key financial categories that discriminate best between CWCs and LCWCs
may vary across countries. Moreover, the application of the findings is twofold. On one hand,
managers of financial institutions can pay more attention to the ratios in the key financial
categories, especially the most representative ratios selected with our proposed method, so that
they gain a better understanding of the credit status of their applicants before making any
further decisions. On the other hand, companies that seek to borrow money from financial
institutions can gain a clear view of which financial factors matter most for being considered a
creditworthy company, and of which improvements are needed immediately to increase the
chance of receiving loans.
Finally, the performance of different classification models (models using different classifiers,
including support vector machines, discriminant analysis, logistic regression, decision trees, and
neural networks) in terms of OCAR and misclassification cost is evaluated on the U.S. and
Chinese datasets. Cutoff values that give the highest overall classification accuracy rate and the
minimum misclassification cost are also discussed. The results show that SVM has stable and
slightly better overall performance. However, there is no strong evidence that a
particular classifier significantly outperforms the others.
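To make the role of the cutoff value concrete, the sketch below scans a few candidate cutoffs and reports the one with the highest OCAR and the one with the lowest misclassification cost. It is an illustrative sketch only: the scores, labels, candidate cutoffs, the cost form (a weighted count of the two error types), and the 5:1 cost ratio are all hypothetical assumptions, not values taken from this study.

```python
def confusion_counts(scores, labels, cutoff):
    """Classify score >= cutoff as good credit (1), otherwise bad credit (0)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1  # good applicant accepted
        elif predicted == 1 and label == 0:
            fp += 1  # bad applicant accepted
        elif predicted == 0 and label == 0:
            tn += 1  # bad applicant rejected
        else:
            fn += 1  # good applicant rejected
    return tp, fp, tn, fn


def ocar(scores, labels, cutoff):
    """Overall classification accuracy rate at a given cutoff."""
    tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
    return (tp + tn) / len(labels)


def misclassification_cost(scores, labels, cutoff,
                           cost_bad_as_good=5.0, cost_good_as_bad=1.0):
    """Assumed cost: weighted count of the two error types (weights hypothetical)."""
    tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
    return cost_bad_as_good * fp + cost_good_as_bad * fn


# Hypothetical classifier scores (higher = more creditworthy) and true labels.
scores = [0.10, 0.20, 0.30, 0.65, 0.50, 0.55, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
cutoffs = [0.4, 0.6, 0.7, 0.8]

best_ocar_cutoff = max(cutoffs, key=lambda c: ocar(scores, labels, c))
best_cost_cutoff = min(cutoffs, key=lambda c: misclassification_cost(scores, labels, c))

for c in cutoffs:
    print(c, ocar(scores, labels, c), misclassification_cost(scores, labels, c))
print("cutoff with highest OCAR:", best_ocar_cutoff)
print("cutoff with lowest cost:", best_cost_cutoff)
```

With these toy numbers the two criteria select different cutoffs, because accepting a bad applicant is assumed to cost more than rejecting a good one. This is why the two cutoff values are discussed separately.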
In sum, the contribution of this study is a hybrid model using the BMTS+SVM method, which
performs competitively in terms of classification accuracy for the credit scoring problem. The
method not only reduces the computational effort required by the classifier but also provides
flexible options, so that a tradeoff between accuracy and subset size is available. In regard to
application, the method is used at the corporate level to identify key factors in differentiating
between creditworthy and less creditworthy companies in both the U.S. and China, providing
insights and guidance for managers in both financial institutions and borrowing companies.
The dissertation is organized as follows. Chapter 2 gives an overview of related work on credit
risk management, credit scoring models, and feature selection. Chapter 3 introduces the
construction of the subset selection model and the way the subsets are identified, using the
bisection method based on the Tabu search algorithm for different values of the parameter α.
Chapter 4 presents the subsets selected with the proposed binary quadratic programming model
and the OCAR derived from the SVM classifier. These results are also compared with some
classic approaches and with results from other studies. In Chapter 5, an application of the
proposed method is presented in an international business context for U.S. and Chinese
companies, and the performance of different classification models is evaluated. Discussion and
concluding remarks are presented in Chapter 6. The structure of the dissertation is given in
Fig. 1.
Fig. 1. Dissertation structure
[Flowchart boxes: Chapter 1, INTRODUCTION (background introduction, purpose and contribution of the dissertation); Chapter 2, LITERATURE REVIEW (reviews on credit risk management, credit scoring, and feature selection); Chapter 3, METHODOLOGY, and Chapter 4, RESULTS AND ANALYSIS (construction of the model and validation of BMTS+SVM method); Chapter 5, APPLICATION (application of the method at firm level in an international business context); Chapter 6, CONCLUSION AND DISCUSSION]
CHAPTER II
LITERATURE REVIEW
2.1 Credit Risk Management
A simple definition of credit risk (also referred to as default risk or counterparty risk) is the
potential that a borrower or counterparty will fail to meet its contractual obligations in
accordance with agreed terms. The objective of credit risk management is to maximize a
financial institution’s risk-adjusted rate of return by maintaining credit risk exposure within
acceptable parameters. For most financial institutions, lending is the largest and most obvious
source of credit risk, and they have encountered difficulties over the past years for a variety of
reasons. A main cause of these difficulties is that credit standards for borrowers and
counterparties are too lax (Njanike, 2009). Therefore, it is vital for financial institutions to
establish well-defined credit granting criteria so that credit is approved in a safe and sound
manner. In addition, credit risk assessment has been actively promoted by the Basel Committee
on Banking Supervision (BCBS), which issued the Basel accord, a set of recommendations for
regulations in the banking industry (Stephanou & Mendoza, 2005). The new Basel capital accord
(Basel II), under development since 1999, proposed a new regulatory framework for measuring
credit risk. It focused on a variety of risk identification and measurement methods, including
the standardized approach and the internal ratings-based (IRB) approach. While the standardized
approach allows less sophisticated banks to use external credit ratings to classify their assets
into risk classes, the IRB approach, which the new accord particularly emphasized, relies heavily
on the bank’s own experience in determining risk characteristics and encourages banks to
develop and use better risk management techniques and models (Stephanou & Mendoza, 2005).
Credit risk measurements can be classified chronologically into classic and modern methods.
The classic credit risk measurements resemble expert systems, relying mostly on human
experts’ experience to judge the probability of default. The 5 Cs method is one such
measurement: credit and loan decisions are made by the judgment of experts based on five
factors, namely Character, Capital, Capacity, Collateral, and Cycle conditions. Character
measures the reputation of a borrower. Capital looks at a borrower’s equity investment and debt
ratio to gauge its financial commitment. Capacity measures a borrower’s ability to repay a loan.
Collateral and third-party guarantees are additional means of securing the loan. Finally, Cycle
conditions measure how sensitive a borrower’s sales are to the overall economy. However, this
method may be inconsistent and subjective, as it specifies no weighting scheme that would
consistently order the 5 Cs in terms of their relative importance in forecasting the probability
of default.
2.2 Credit Scoring
Due to the increasing complexity of banking activities, qualitative methods based on human
experts could no longer meet the needs of credit risk management. Banks and financial
institutions looked for more effective measurements to support complex credit risk
management, and modern credit risk measurements dominated by quantitative methods became
increasingly prevalent.
Modern methods of credit risk measurement can be traced to two approaches: an
options-theoretic structural approach pioneered by Merton (1974), and a reduced form approach.
Merton proposed a model that evaluates the credit risk of a company by treating the company’s
equity as a call option on its assets with a strike price equal to the debt repayment amount. The
model assumes that a company issues zero-coupon debt that becomes due at a future time. The
company defaults if the market value of its assets falls below the value of its promised debt.
The probability of default (PD) is computed directly from the distance to default (DD), which is
given in equation 2.1:
\[
\mathrm{DD} = \frac{\text{Market Value of Assets} - \text{Default Point}}{\text{Market Value of Assets} \times \text{Asset Volatility}} \tag{2.1}
\]
The higher the DD, the lower the PD. In Merton’s (1974) model, a log normal distribution is
assumed when converting the DD into a PD estimate. However, this distributional assumption
is often violated in practice. Thus some other models based on Merton’s (1974) model use
alternative approaches to map the DD into a PD estimate. For example, the KMV model from
Moody’s determines an empirical estimate of the PD, denoted as expected default frequency
(EDF), from a historical database of default rates. Unlike the structural approach models, which
make assumptions about the dynamics of a firm’s assets, its capital structure, and its debt and
shareholders, the reduced form approach models make no assumptions about why a default
occurs. Default is not tied to the dynamics of asset prices but is based on an exogenous Poisson
process. Credit Risk Plus, a model developed by Credit Suisse Financial Products (CSFP), is such
a reduced form model (Crouhy, Galai & Mark, 2000).
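Equation 2.1 and the statement that a higher DD implies a lower PD can be checked numerically with the short sketch below. The DD function follows equation 2.1 directly. The mapping PD = N(-DD), with N the standard normal distribution function, is our assumption about how the distributional assumption mentioned above is applied, and the input figures and function names are hypothetical.

```python
import math


def distance_to_default(asset_value, default_point, asset_volatility):
    """Equation 2.1: (assets - default point) / (assets * asset volatility)."""
    return (asset_value - default_point) / (asset_value * asset_volatility)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pd_from_dd(dd):
    """Assumed mapping from DD to PD: probability mass below -DD."""
    return normal_cdf(-dd)


# Hypothetical firm: assets of 100, default point of 70, 15% asset volatility.
dd = distance_to_default(asset_value=100.0, default_point=70.0, asset_volatility=0.15)
print("DD:", dd)
print("PD:", pd_from_dd(dd))

# A larger distance to default gives a smaller probability of default.
for candidate in (1.0, 2.0, 3.0):
    print(candidate, pd_from_dd(candidate))
```

An empirical approach such as EDF would replace `pd_from_dd` with a lookup of historical default rates for firms with a similar DD; that lookup is not sketched here because the underlying database is not described in the text.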
In the past few decades, credit scoring, as a form of quantitative credit risk measurement, has
been widely used as a credit evaluation process to evaluate the potential risk posed by lending
money to consumers and to mitigate losses due to bad debt. Credit scoring models reduce the
cost of credit analysis, enhance credit decisions, and save time and effort (Ong, Huang & Tzeng,
2005). Generally, credit scoring models use two types of approaches: statistical approaches and
the more recent artificial intelligence approaches. A main stream of credit scoring research
develops classification models in which information from applications is used to separate the
applicants into good and bad credit risks according to properties that describe the demographic
characteristics and the economic or financial conditions of the applicants. These models usually
use statistical methods, e.g., discriminant analysis, logistic regression, factor analysis, and probit
regression, or artificial intelligence approaches, e.g., expert systems, fuzzy algorithms, genetic
programming, neural networks, and SVM (Falangis, 2007; Hand & Henley, 1997; Kim & Sohn,
2004).
In order to discover the trend of credit scoring models over recent years, we searched the SCOPUS
database as our main source with the keyword “credit scoring”. Since feature selection is the core
content of this dissertation, we narrowed down the search by adding AND “feature selection” OR
“subset selection” OR “variable selection”. The search results are presented in Fig. 2.
Fig. 2. Relationship between number of research papers and year
We can see the growth of this research area in recent years, with a big jump in 2009 after the
2008 credit crisis; 93% of the sources are journal articles and conference papers. The area has
attracted interest from Computer Science, Engineering, Decision Sciences, Mathematics, Business,
Management, Accounting, Social Science, and Economics/Finance. The models or solution approaches
used for credit scoring in these papers, together with comments on each approach, are summarized
in Table 1.
Table 1
Summary of customer credit scoring models.

| Model (Approach) | No. of papers | Representative references | Comments |
|---|---|---|---|
| Ant Colony Optimization / Particle Swarm Optimization | 3 | Marinakis, Marinaki, Doumpos, and Zopounidis (2009) | Both methods are easy to implement and converge very fast, but they often find local optima and are difficult to analyze theoretically. |
| Bayesian Network | 3 | Hsieh and Hung (2010) | Bayesian networks incorporate uncertainty in the model and handle data from all sources, including missing data. But they rely on expert input and capture fewer spatial and temporal dynamics due to the lack of feedback loops. |
| Case-Based Reasoning (Decision Tree) | 25 | Cho, Hong, and Ha (2010) | Case-based reasoning is intuitive due to the system’s self-learning capability and is easy to develop. But it will not provide an optimal solution, and there is no perfect match with the cases in the system. |
| Fuzzy Sets | 7 | Zimmermann and Zysno (1983) | Fuzzy sets have great flexibility in variable types and data input, and are easy to design and understand from the rules. However, they are hard to formulate as a mathematical model. |
| Genetic Algorithm / Artificial Neural Network | 38 | Oreski, Oreski, and Oreski (2012) | Both methods are self-guided and self-organized with a high degree of flexibility and robustness, and can be implemented with parallelism. But both produce chance-dependent outcomes and are computationally intensive. |
| LDA, Logistic Regression | 12 | Altman (1968); Yap, Ong, and Husain (2011) | Logistic regression has great flexibility in terms of variables and relationships but requires much more data to achieve stable and meaningful results. (Conventional approaches such as logistic regression and LDA are mainly used for comparison purposes.) |
| Monte Carlo Simulation | 4 | Paisittanand and Olson (2006) | Monte Carlo simulation has an unbiased estimator and is easy to implement, but requires a great deal of computation time due to the large number of simulations. |
| Multi-Objective Optimization | 1 | Wang and Huang (2009) | Multi-objective optimization is simple and easy to use, with each objective addressed in the model, but it is difficult to assign weights and to combine different types of optimization into a single formulation. |
| Rough Set | 2 | Wang, Hedar, Wang, and Ma (2012) | Rough sets yield ‘if-then’ rules involving ordinal values to perform classification tasks, but they can sometimes be impractical to apply because they may lead to an empty set, are sensitive to changes in the data, and can be inaccurate. |
| Support Vector Machine | 48 | Bellotti and Crook (2009) | SVM provides a unique optimal solution based on the choice of kernel functions. However, it has high algorithmic complexity and requires extensive memory. |
A review of the studies in Table 1 reveals that conventional statistical techniques are rarely
used alone as credit scoring models. They were used in the studies for comparison with other,
more sophisticated methods. Among the artificial intelligence techniques, which are gaining
popularity, SVM is the dominant method in credit scoring with 48 out of 168 papers,
followed by neural networks, genetic algorithms, decision trees, fuzzy sets, Monte Carlo
simulation, Bayesian networks, ant colony optimization, and other methods. The two most frequently
applied statistical techniques and four artificial intelligence techniques shown in Table 1 are
discussed in more detail in the following sections of this chapter.
2.2.1 Discriminant Analysis
Discriminant analysis (DA) was proposed by Fisher (1936) for classification and
discrimination where the dependent variable is nonmetric. It is a parametric
statistical technique used in situations in which the primary objective is to identify the group to
which an object belongs. The technique is referred to as two-group discriminant analysis when
two groups are involved and as multiple discriminant analysis (MDA) when three or more groups
are involved. DA involves deriving a variate. The discriminant variate, also known as the
discriminant function, is the linear combination of the independent variables that discriminates
best between the objects in the groups. It is obtained by computing the variate’s weights for each
independent variable so as to maximize the differences between the groups, as in equation 2.2:
$$Z_{jk} = a + W_1 X_{1k} + W_2 X_{2k} + \cdots + W_n X_{nk} \tag{2.2}$$
where $Z_{jk}$ is the discriminant $Z$ score of discriminant function $j$ for object $k$, $a$ is the
intercept, $W_i$ is the discriminant weight for independent variable $i$, and $X_{ik}$ is independent
variable $i$ for object $k$. In a word, Fisher’s discriminant analysis uses the weights
$W_1, W_2, \ldots, W_n$ to construct a discriminant function ($Z$ score) that maximizes the ratio,
$D^2$, of the between-class variation to the within-class variation, shown as equation 2.3:
$$D^2 = \frac{(\bar{Z}_1 - \bar{Z}_2)^2}{S_Z^2} \tag{2.3}$$
where $\bar{Z}_1$ and $\bar{Z}_2$ are the sample means of $Z$ (when two classes are present) and
$S_Z^2$ is the pooled estimate of the sample variances.
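As an illustration of equations 2.2 and 2.3, the following Python sketch computes discriminant scores for two small groups and the ratio of between-class to within-class variation. The data, the weights, and the function names are hypothetical and chosen only for the example; the weights are fixed by hand rather than estimated, so this is not a fitted discriminant function.

```python
from statistics import mean, variance


def discriminant_score(intercept, weights, x):
    """Z = a + W_1*X_1 + ... + W_n*X_n for one object (equation 2.2)."""
    return intercept + sum(w * xi for w, xi in zip(weights, x))


def fisher_ratio(scores_1, scores_2):
    """(mean_1 - mean_2)^2 divided by the pooled variance of the Z scores (equation 2.3)."""
    n1, n2 = len(scores_1), len(scores_2)
    pooled = ((n1 - 1) * variance(scores_1) + (n2 - 1) * variance(scores_2)) / (n1 + n2 - 2)
    return (mean(scores_1) - mean(scores_2)) ** 2 / pooled


# Hypothetical toy data: two features per applicant.
good = [(2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]
bad = [(1.0, 3.0), (1.5, 4.0), (2.0, 5.0)]
weights = (1.0, -1.0)  # assumed weights, not estimated from data

z_good = [discriminant_score(0.0, weights, x) for x in good]
z_bad = [discriminant_score(0.0, weights, x) for x in bad]
ratio = fisher_ratio(z_good, z_bad)
print(ratio)
```

A larger ratio means the chosen weights separate the two groups better; Fisher's method searches for the weights that make this ratio as large as possible.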
The discriminant functions can be used to determine the group to which each case most likely
belongs; in general, we classify a case as belonging to the group for which it has the highest
discriminant score (Hair, Black, Babin & Anderson, 2010). The discriminant approach is still one
of the most broadly established techniques and has been treated as the benchmark for other
modern classification approaches in credit scoring applications (Aidi & Sari, 2012; Altman,
1968; Danenas, Garsva & Gudas, 2011; Glen, 2003; Jo, Han & Lee, 1997; Lam & Moy, 2002;
Swicegood & Clark, 2001).
DA is based on a number of assumptions, including normality of the independent variables,
linearity of relationships, lack of multicollinearity among the independent variables, and
homogeneity of variance/covariance (the variance/covariance matrices of the independent variables
are the same across groups). A criticism of DA is that these assumptions are often violated.
However, the evidence is mixed regarding the sensitivity of discriminant analysis to violations of
these assumptions (Eisenbeis, 1978; Hair et al., 2010; Karels & Prakash, 1987; Lacher, Coats, Sharma
& Fant, 1995).
2.2.2 Logistic Regression
Logistic regression, along with discriminant analysis, is one of the most widely used
statistical techniques in the field. It is a form of regression formulated to predict and explain
a binary categorical variable. Compared with DA, logistic regression is limited to
predicting a two-group dependent measure. However, logistic regression has the advantage
of being less affected than DA when the basic assumptions, particularly normality of the
independent variables and equal variance, are not met. Since the binary dependent variable is
either 0 or 1, the predicted value (probability) must be bounded to fall within the same range. To
define a relationship bounded by 0 and 1, logistic regression uses the logistic curve to represent
the relationship between the independent and dependent variables. At very low levels of the
independent variable, the probability approaches 0 but never reaches it. Likewise, at very high
levels of the independent variable, the probability approaches 1 but never reaches it. The
probability is computed with the logistic function in equation 2.4:
$$P(x) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}} \tag{2.4}$$
where $P(x)$ is the probability of the dependent variable equaling a case, $e$ is the base of the
natural logarithm (about 2.718), and the $\beta$s are the parameters of the model. A graph of the
logistic function $P$ is shown in Fig. 3.
The parameters are usually estimated with maximum likelihood estimation under an
assumption that observations are independent by the following equation 2.5
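$$l(\beta) = \prod_{i=1}^{n} P(x_i)^{y_i} \left[1 - P(x_i)\right]^{1 - y_i} \tag{2.5}$$
where $\beta$ is the vector of unknown parameters, and $x_i$ and $y_i$ are the observations
($i = 1, \ldots, n$). The logistic function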
takes an input with any value from negative infinity to positive infinity, whereas the output is
confined to values between 0 and 1, and hence is interpretable as a probability of the dependent
variable equaling a case. Logistic regression has been widely used in credit scoring applications
(Lee & Jung, 1999; Martin, 1977; Nie, Rowe, Zhang, Tian & Shi, 2011; Salehi & Mansoury,
2011; Srinivasan & Kim, 1987; Ye, Li, Feng & Wang, 2011). Although the logistic regression
can perform well in many applications, the accuracy of logistic regression model decreases when
the relationships between variables are non-linear (Akkoç, 2012).
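The logistic function of equation 2.4 and the likelihood of equation 2.5 can be sketched in a few lines of Python. The function names and the toy observations below are assumptions made for illustration; the code evaluates the log of the likelihood for given parameters and does not perform the maximization itself.

```python
import math


def logistic_probability(betas, x):
    """P(x) from equation 2.4; betas[0] is the intercept, betas[1:] pair with x.

    1 / (1 + e^(-z)) is algebraically the same as e^z / (1 + e^z).
    """
    linear = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-linear))


def log_likelihood(betas, observations):
    """Logarithm of equation 2.5 for (x, y) pairs with y in {0, 1}."""
    total = 0.0
    for x, y in observations:
        p = logistic_probability(betas, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical observations: one predictor, binary outcome.
data = [([1.0], 1), ([2.0], 0)]
print(logistic_probability([0.0, 1.0], [0.0]))  # the midpoint of the curve
print(log_likelihood([0.0, 0.0], data))
```

Maximum likelihood estimation would search for the parameter values that make `log_likelihood` as large as possible, typically with an iterative numerical optimizer.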
Fig. 3. Logistic function P
2.2.3 Decision Trees
Decision trees are well-known classification techniques that organize information extracted
from a training dataset in a tree structure composed of internal nodes, leaf nodes, and
branches. The internal nodes represent the input attributes, the leaf nodes represent the
classification, and the branches represent conjunctions of attributes. The algorithm begins by
selecting an attribute to place at the root node. Based on the impurity measure, the algorithm then
loops over all possible splits in order to find the attribute and its corresponding cutoff
value that give the best split condition. This process is repeated recursively for the new nodes
until a stopping criterion is satisfied. Fig. 4 gives an example of a decision tree for two
classes of customers, those with bad credit and those with good credit.
Fig. 4. An example of a decision tree
The impurity measure defines how well the two classes are separated. The leaf nodes are
classified according to the most prevalent class in them. Various impurity measures are
used in the literature, such as the entropy-based measure in ID3 (Quinlan, 1986) and its successor
C4.5 (with C5.0 as the latest version) (Quinlan, 1993), computed as equation 2.6, and the Gini
measure in classification and regression trees (CARTs) (Breiman, Friedman, Olshen & Stone, 1984;
Loh, 2011), computed as equation 2.7.
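$$I(D) = \mathrm{Entropy}(D) = -\sum_{i=1}^{k} p_i \log p_i \tag{2.6}$$
$$I(D) = \mathrm{Gini}(D) = 1 - \sum_{i=1}^{k} p_i^2 \tag{2.7}$$
where $p_i$ is the proportion of instances in $D$ that belong to class $i$ and $k$ is the number of
classes.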
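The two impurity measures, and the search for a cutoff on a single numeric attribute, can be illustrated with the short Python sketch below. It is a simplified illustration under stated assumptions, not the ID3, C4.5, or CART implementation: the logarithm is taken to base 2 (equation 2.6 does not fix the base), candidate cutoffs are midpoints between sorted distinct values, and all names and data are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Proportion p_i of each class among the labels."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]


def entropy(labels):
    """Equation 2.6, with base-2 logarithm as an assumption."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    """Equation 2.7."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def best_cutoff(values, labels, impurity=gini):
    """Return (cutoff, weighted impurity) of the best binary split on one attribute."""
    best = None
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        cutoff = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= cutoff]
        right = [y for v, y in zip(values, labels) if v > cutoff]
        weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / len(labels)
        if best is None or weighted < best[1]:
            best = (cutoff, weighted)
    return best


# Hypothetical attribute (e.g. income in thousands) and credit classes.
income = [20, 30, 40, 50]
credit = ["bad", "bad", "good", "good"]
print(best_cutoff(income, credit))
```

A tree-growing algorithm would repeat this search over every attribute, split on the best one, and recurse on the two resulting nodes.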
Decision tree models are powerful and flexible classifiers. Their popularity is attributable to the
ease of interpreting and implementing their results. Decision trees have been successfully used
in many classification problems, and have been applied to the development of credit scoring
applications (Frydman, Altman & Kao, 1985; Mandala, Nawangpalupi & Praktikto, 2012; Nie et
al., 2011; Paleologo, Elisseeff & Antonini, 2010; Yi, Yan, Zhimin & Xiangjian, 2008; Zhang,
Zhou, Leung & Zheng, 2010; Zibanezhad, Foroghi & Monadjemi, 2011). However, one of the
limitations of decision trees is their instability, because even small fluctuations in the sample
data may lead to large variations in the classifications assigned to the instances (Li & Belford,
2002).
2.2.4 Neural Networks
Neural networks (NNs) are mathematical techniques developed by simulating the working
principles of the human brain. NNs are structures of highly interconnected artificial nodes, called
neurons or computational units, that form a network mimicking a biological neural network. Fig.
5 gives an example of a neuron.
A neuron has a set of input connections that receive signals from other neurons, a set of
weights, one for each input connection, and a transfer function that transforms the
sum of the weighted inputs into an output (Coakley & Brown, 2000).
Fig. 5. An example of a neuron
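A single neuron of this kind can be written as a weighted sum followed by a transfer function. The sketch below assumes a sigmoid transfer function and an optional bias term; neither is specified in the text, and the names are hypothetical.

```python
import math


def sigmoid(t):
    """A common choice of transfer function (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-t))


def neuron_output(inputs, weights, bias=0.0, transfer=sigmoid):
    """Apply the transfer function to the sum of the weighted inputs."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(weighted_sum)


# Two input signals with hypothetical weights; the weighted sum is 0 here.
print(neuron_output([1.0, 2.0], [0.5, -0.25]))
```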
There are different types of NNs, such as the feedforward neural network, the recurrent neural
network, and the self-organizing network. The feedforward neural network, shown in Fig. 6, is the
most widely used technique, and the two most popular feedforward neural network models are the
multi-layer perceptron (MLP) and the Radial Basis Function (RBF) network (Wlodzislaw &
Norbert, 2001). They have much in common, and the only fundamental difference is the
way in which hidden units combine values coming from preceding layers. MLPs use inner
products, while RBFs use Euclidean distance.
Fig. 6. An example of a neural network
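The difference between the two kinds of hidden unit can be made concrete with a small Python sketch. The sigmoid activation for the MLP unit and the Gaussian form for the RBF unit are common choices assumed here for illustration; the parameter names are hypothetical.

```python
import math


def mlp_hidden_unit(x, weights, bias):
    """MLP unit: inner product of inputs and weights, passed through a sigmoid."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-activation))


def rbf_hidden_unit(x, center, width):
    """RBF unit: Gaussian of the Euclidean distance between the input and a center."""
    distance = math.dist(x, center)
    return math.exp(-(distance ** 2) / (2.0 * width ** 2))


x = [1.0, 2.0]
print(mlp_hidden_unit(x, weights=[0.5, -0.25], bias=0.0))
print(rbf_hidden_unit(x, center=[1.0, 2.0], width=1.0))
```

The MLP unit responds to the direction of the input relative to its weight vector, while the RBF unit responds most strongly when the input lies close to its center.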
A major advantage of neural networks is their ability to provide a flexible mapping between
inputs and outputs. This is achieved by adding a hidden layer between the input layer and the
output layer. The arrangement of the simple units into a multilayer framework produces a map
between inputs and outputs that is consistent with any underlying functional relationship
regardless of its “true” functional form. Having a general map between the input and output
vectors eliminates the need for the unjustified a priori restrictions that are needed in
conventional statistical and econometric modeling. Therefore, a neural network is often considered
a “universal approximator” (Yu, Wang & Lai, 2007, p. 28). Cybenko (1989) and Hornik, Stinchcombe,
and White (1989) demonstrated that arbitrary decision regions can be arbitrarily well approximated
by continuous feedforward neural networks with only a single hidden layer and any continuous
sigmoidal nonlinearity.
The back propagation (BP) algorithm is used to train the feedforward neural network. It is a
supervised learning method: the algorithm trains a given feedforward multilayer neural
network on a given set of input patterns with known classifications. When each entry of the
sample set is presented to the network, the network examines its output response to the sample
input pattern. The output response is then compared to the known, desired output, and the
error value is calculated. Based on the error, the connection weights are adjusted.
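The compare-and-adjust cycle can be illustrated for the simplest case, a single sigmoid output unit trained on one pattern. This sketch is an assumption-laden simplification: it uses a squared-error criterion and a fixed learning rate, and it omits the step that propagates the error back through hidden layers, which a full BP implementation requires.

```python
import math


def train_step(weights, x, target, learning_rate=0.5):
    """One error-driven weight update for a single sigmoid output unit."""
    output = 1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(weights, x))))
    error = target - output
    # Gradient of 0.5 * error^2 with respect to the weighted sum, via the chain rule.
    delta = error * output * (1.0 - output)
    new_weights = [w + learning_rate * delta * xi for w, xi in zip(weights, x)]
    return new_weights, error


# Hypothetical pattern with a known classification of 1.
weights = [0.0, 0.0]
for _ in range(3):
    weights, error = train_step(weights, [1.0, 1.0], target=1.0)
    print(weights, error)
```

Each pass moves the weights in the direction that reduces the error on the presented pattern, so the printed error shrinks from one step to the next.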
Multilayer feed-forward neural networks have been applied to many credit scoring models
(Derelioğlu & Gürgen, 2011; Derelioğlu, Gürgen & Okay, 2009; Dimla & Lister, 2000; Nanni &
Lumini, 2009; Tsai, 2009). However, West (2000) investigated the performance of five different
neural networks on the credit scoring problem. The results showed that the mixture-of-experts and
radial basis function neural networks performed better, whilst the multilayer perceptron (MLP) may
not be the most accurate neural network model. Other types of neural networks have been developed
as well. For example, Piramuthu (1999) used neurofuzzy systems to evaluate credit risk. Ravi
and Pramodh (2008) proposed a principal component neural network (PCNN) architecture to
predict bankruptcy. Some hybrid neural networks were also developed to combine neural
networks with subset selection models when dealing with a large number of variables (Lee & Chen,
2005; Lee, Chiu, Lu & Chen, 2002; Yim & Mitchell, 2005). Meanwhile, comparisons between
neural networks and traditional statistical approaches have been widely studied (Alam, Booth,
Lee & Thordarson, 2000; Bell, 1997; Jo et al., 1997; Lin, Chang, Li & Chao, 2011; Malhotra &
Malhotra, 2003; Zhang, Hu, Patuwo & Indro, 1999). The majority of these studies reported that
neural network models have better predictive accuracy than traditional techniques such as
discriminant analysis and logistic regression, though the results were very close (Abdou &
Pointon, 2011; Crook, Edelman & Thomas, 2007).
2.2.5 Genetic Programming
Genetic programming (GP) was suggested by Koza (1992). It is an evolutionary algorithm
inspired by the Darwinian theory of evolution, and can be viewed as a specialization of genetic
algorithms (GA) (Eiben & Smith, 2003). GP can handle more complicated structures in
optimization than GA and has therefore been applied to a great diversity
of problems (Espejo, Ventura & Herrera, 2010; Sette & Boullart, 2001; Zhang & Bhattacharyya,
2004). GP evolves computer programs, which are represented in memory as tree structures
composed of a function set and a terminal set. The function set contains the operators or
statements, such as arithmetic operators or if-then conditional statements. The terminal set
contains constants, inputs, and other zero-argument nodes in the GP tree. For example,
$(X \times Y) + (8 - Z/3)$ is expressed as in Fig. 7.
Fig. 7. An example of expression of GP
Once a population of rules representing potential solutions to the classification problem
is initialized as GP trees, the subsequent procedures are similar to those of GA. The initial
population of rules is evaluated with a fitness function, and some of these rules are selected for
reproduction. The genetic operators, mutation and crossover, are then applied to produce new rules.
The mutation operator randomly chooses a node in a tree and replaces the subtree at that node with
a new random subtree, as shown in Fig. 8 from MP to MC. The crossover operator swaps
subtrees between the parents to produce the children, as shown in Fig. 9 from CP1 and CP2 to CC1
and CC2. These procedures are repeated until an acceptable classification rule is found for each
class in the dataset (Etemadi, Anvary Rostamy & Dehkordi, 2009; Ong et al., 2005).
Fig. 8. An example of mutation in GP
Fig. 9. An example of crossover in GP
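The tree representation and the two genetic operators can be sketched in Python with nested tuples. This is a minimal illustration, not a GP system: the tuple encoding, the function names, and the example trees are assumptions, and the mutation and crossover points are passed in explicitly for reproducibility, whereas an actual GP run would choose them at random.

```python
import operator

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(tree, variables):
    """Evaluate a tree of the form (operator, left, right), a variable name, or a constant."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, variables), evaluate(right, variables))
    if isinstance(tree, str):
        return variables[tree]
    return tree


def get_subtree(tree, path):
    """Follow a path of child indexes (1 = left, 2 = right) from the root."""
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new_subtree):
    """Return a copy of tree with the subtree at path replaced (mutation)."""
    if not path:
        return new_subtree
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new_subtree)
    return tuple(children)


def crossover(parent_1, path_1, parent_2, path_2):
    """Swap the subtrees at the given paths and return the two children."""
    sub_1, sub_2 = get_subtree(parent_1, path_1), get_subtree(parent_2, path_2)
    return replace_subtree(parent_1, path_1, sub_2), replace_subtree(parent_2, path_2, sub_1)


# (X * Y) + (8 - Z / 3) as a tree; the second parent is a hypothetical example.
tree = ("+", ("*", "X", "Y"), ("-", 8, ("/", "Z", 3)))
other = ("-", ("+", "A", "B"), ("/", "X", ("*", "Y", "Z")))
print(evaluate(tree, {"X": 2, "Y": 3, "Z": 6}))
print(replace_subtree(tree, (2, 2), ("*", "A", "B")))
print(crossover(tree, (2,), other, (1,)))
```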
Genetic programming is a rapidly growing area and one of the most recent techniques to be
applied to classification problems. A number of studies have applied GP in the
field of credit scoring (Abdou, 2009; Alfaro-Cid, Sharman & Esparcia-Alcazar, 2007; Chen,
Zhang, Wei & Chen, 2007; Huang, Tzeng & Ong, 2006; Jiang & Yuan, 2007; Lensberg, Eilifsen
& McKee, 2006; Liu, Wang & Shuai, 2008; Rampone, Frattolillo & Landolfi, 2013; Zhang, Hifi,
Chen & Ye, 2008).
2.2.6 Support Vector Machines
Support Vector Machines (SVMs) are a supervised machine learning method suggested by
Vapnik (1995). An SVM produces a binary classifier, the so-called optimal separating hyper plane,
through an extremely non-linear mapping of the input vectors into a high-dimensional feature space.
SVM constructs a linear model to estimate the decision function using non-linear class
boundaries based on support vectors. If the data are linearly separable, SVM trains linear
machines to find the optimal hyper plane that separates the data without error and with the maximum
distance between the hyper plane and the closest training points. The training points that are
closest to the optimal separating hyper plane are called support vectors.
Specifically, the main idea of SVM is to map samples that are linearly inseparable in a low
dimensional space to samples that are linearly separable in a high dimensional feature space by
means of kernel functions, making it easy to analyze the nonlinear characteristics of the samples
with a linear algorithm in the high dimensional space. There are several types of kernel functions,
including polynomial, radial basis function, and sigmoid kernels, which can be expressed as follows
(Prajapati & Patle, 2010):
1. Polynomial kernel: $K(x, x_i) = (x^T x_i + 1)^q$ (2.8)
2. Radial basis function (RBF) kernel: $K(x, x_i) = \exp\left\{-\frac{\|x - x_i\|^2}{\sigma^2}\right\}$ (2.9)
3. Sigmoid kernel: $K(x, x_i) = \tanh\left(v (x^T x_i) + c\right)$ (2.10)
where RBF and sigmoid kernels are usually applied in classification problems and regression
analysis, respectively.
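The three kernels in equations 2.8 to 2.10 can be written directly in Python. The function names and default parameter values are assumptions made for illustration. In particular, the RBF kernel below divides the squared distance by sigma squared; some formulations divide by two times sigma squared instead.

```python
import math


def dot(x, z):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, q=2):
    """(x.z + 1)^q, as in equation 2.8."""
    return (dot(x, z) + 1.0) ** q


def rbf_kernel(x, z, sigma=1.0):
    """exp(-||x - z||^2 / sigma^2), as in equation 2.9 (scaling is an assumption)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / sigma ** 2)


def sigmoid_kernel(x, z, v=1.0, c=0.0):
    """tanh(v * x.z + c), as in equation 2.10."""
    return math.tanh(v * dot(x, z) + c)


x, z = [1.0, 2.0], [3.0, 4.0]
print(polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```

Each function returns the inner product that the two samples would have in the high dimensional feature space, without computing the mapping explicitly.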
After the samples in the low dimensional space are mapped to samples in the high dimensional
feature space with one of these kernel functions, the best separating hyper plane, which maximizes
the distance between the two classes (or minimizes the number of training errors), is constructed.
Generally, the algorithm builds two parallel hyper planes, H1 and H2, such that they
separate the data with no points between them, and then tries to maximize their distance. The
region bounded by them is called “the margin”. The best separating hyper plane, also known as the
optimal hyper plane H, falls between the two parallel planes so that the distance between the plane
H1 (H2) and the plane H is as large as possible; see Fig. 10.
Fig. 10. An example of a SVM in the two-dimensional space
Given training vectors $x_i$, $i = 1, \ldots, m$, in two classes, labeled by the vector $y$ with
$y_i \in \{1, -1\}$, the support vector machine finds an optimal hyper plane with the maximum
margin by solving the following optimization problem:
$$\begin{aligned} \min_{w, b, \xi} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i \\ \text{subject to:} \quad & y_i (w^T x_i + b) - 1 + \xi_i \ge 0 \\ & \xi_i \ge 0 \end{aligned} \tag{2.11}$$
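To show how the objective in equation 2.11 is evaluated, the sketch below computes it for a given hyper plane. It does not solve the optimization problem. For fixed w and b it uses the smallest slack values that satisfy both constraints, which is max(0, 1 - y_i (w.x_i + b)); the function name, the data, and the value of C are hypothetical.

```python
def soft_margin_objective(w, b, samples, labels, C=1.0):
    """Return (objective value, slacks) of equation 2.11 for a fixed hyper plane."""
    slacks = []
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Smallest slack that satisfies y_i (w.x_i + b) - 1 + slack >= 0 and slack >= 0.
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks


# Hypothetical two-dimensional samples; the third lies inside the margin.
samples = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
labels = [1, -1, 1]
print(soft_margin_objective([1.0, 0.0], 0.0, samples, labels, C=1.0))
```

A larger C penalizes points inside the margin more heavily, trading margin width for fewer training errors.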
This powerful tool for classification has been widely applied in practical problems such as
credit scoring (Bellotti & Crook, 2009; Chen & Li, 2010; Danenas et al., 2011; Harikrishna,
Farquad & Shabana, 2012; Huang et al., 2007; Kim & Sohn, 2010; Li, Li, Kuo, Liu & Huang,
2012; Martens, Baesens, Van Gestel & Vanthienen, 2007; Schebesch & Stecking, 2005; Wang,
Guo & Wang, 2010; Wei, Li & Chen, 2007), financial time-series forecasting (Tay & Cao, 2001;
Van Gestel, Suykens, Baestaens, Lambrechts, Lanckriet, Vandaele, De Moor & Vandewalle,
2001), pattern recognition (Asada, Yun, Nakayama & Tanino, 2004; Camastra, 2007), and
disease diagnosis (Akay, 2009; Huang, Liao & Chen, 2008; Lu, Van Gestel, Suykens, Van
Huffel, Vergote & Timmerman, 2003; Su & Yang, 2008).
2.3 Feature Selection
Due to the rapid expansion of the credit industry and the availability of massive amounts of
information, one of the challenges researchers face when using classification algorithms
to build credit scoring models is the selection of features. On one hand, increasing the
number of features increases collinearity and causes greater variance in the prediction of the
response variable. On the other hand, the inclusion of high dimensional data with many
irrelevant and redundant features leads to high complexity, intensive computation, instability, or
lack of predictive accuracy for most classification models. Therefore, in the past few years, most
credit scoring models have involved feature selection in order to reduce the computational effort
for classifiers and improve the accuracy of credit scoring models (Chen & Li, 2010).
Feature selection is the problem of finding an optimal feature subset, that is, a subset returned
by a feature subset selection algorithm that provides the highest possible accuracy. The following
definition is given by Kohavi and John (1997): “Given an inducer $I$ and a dataset $D$ with features
$X_1, X_2, \ldots, X_n$ from a distribution $\mathcal{D}$ over the labeled instance space, an optimal
feature subset, $X_{opt}$, is a subset of the features such that the accuracy of the induced
classifier $C = I(D)$ is maximal” (p. 276).
An optimal feature subset is not necessarily unique: when one feature is perfectly correlated
with another, either can replace the other, and the accuracy obtained from the
different combinations of features is the same. The reasons for using a subset of variables can be
summarized as: (1) improving prediction performance by eliminating uninformative
variables; (2) providing faster and more cost-effective predictors, which saves the cost of
collecting data and makes models more parsimonious; (3) providing a better understanding of the
underlying process that generated the data, making the model more interpretable (Guyon &
Elisseeff, 2003; Miller, 1984).
Fig. 11. Flowchart of filter approaches
Feature selection algorithms can be classified into two categories, namely, filter algorithms
and wrapper algorithms. Filter algorithms are independent of any learning algorithm and use
particular measures, such as distance measures, information measures, dependency measures,
and consistency measures, to evaluate features. The flowchart of the filter approach is shown in
Fig. 11.
Distance measures are also known as separability, divergence, or discrimination measures.
Euclidean distance and Chebyshev distance are examples of distance measures. The Relief
algorithm proposed by Kira and Rendell (1992) is based on distance measures. The basic
idea of Relief is to draw instances at random, compute their nearest neighbors, and adjust a
feature weighting vector to give more weight to features that discriminate the instance from
neighbors of different classes. Therefore, a useful feature should have the same value for cases
from the same class and different values for cases from different classes. However, the
Relief algorithm does not detect redundancy, so the selected subset may still contain redundant
features: its feature evaluation mechanism assigns a high relevance weight to every discriminative
feature without considering the correlations between features (Yang & Li, 2006).
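The weight update at the heart of Relief can be sketched as follows. This is a simplified variant written for illustration, not the algorithm exactly as published: it assumes numeric features scaled to [0, 1], uses absolute differences and Manhattan distance, and takes the instances to visit as an argument (which a caller could draw at random). All names and data are hypothetical.

```python
def relief_weights(samples, labels, instance_indexes):
    """Reward features that differ on the nearest miss and agree on the nearest hit."""
    n_features = len(samples[0])
    weights = [0.0] * n_features

    def distance(a, b):
        return sum(abs(u - v) for u, v in zip(a, b))

    for i in instance_indexes:
        others = [j for j in range(len(samples)) if j != i]
        hits = [j for j in others if labels[j] == labels[i]]
        misses = [j for j in others if labels[j] != labels[i]]
        near_hit = min(hits, key=lambda j: distance(samples[i], samples[j]))
        near_miss = min(misses, key=lambda j: distance(samples[i], samples[j]))
        for f in range(n_features):
            weights[f] -= abs(samples[i][f] - samples[near_hit][f]) / len(instance_indexes)
            weights[f] += abs(samples[i][f] - samples[near_miss][f]) / len(instance_indexes)
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
samples = [(0.0, 0.2), (0.1, 0.9), (0.9, 0.1), (1.0, 0.8)]
labels = [0, 0, 1, 1]
print(relief_weights(samples, labels, instance_indexes=range(len(samples))))
```

Because each feature is weighted on its own, two perfectly correlated discriminative features would both receive high weights, which is the redundancy problem noted above.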
Mutual information has been used for feature selection as an information measure. Battiti
(1994) introduced mutual information feature selection (MIFS) to investigate the application of
the mutual information criterion to evaluate a set of candidate features and to select an
informative subset with robust estimation. The fast correlation-based filter (FCBF) developed by
Senliol, Gulgezen, Lei, and Cataltepe (2008) is a sequential forward selection algorithm that
creates the feature subset by sequentially adding features in decreasing order of relevance while
excluding redundant features. Yu and Liu (2004) defined feature redundancy and proposed
performing explicit redundancy analysis in feature selection. They developed a correlation-based
method for relevance and redundancy analysis. Peng, Long, and Ding (2005) studied how to
select good features according to the maximal statistical dependency criterion based on mutual
information. They derived an equivalent form, called the minimal-redundancy-maximal-relevance
criterion (mRMR), for first-order incremental feature selection. Fleuret (2004) proposed a fast
feature selection technique based on conditional mutual information. The method ensures the
selection of features that are both individually informative and pairwise weakly dependent
by picking features that maximize their mutual information. Kwak and Choi (2002) proposed a
new method of calculating mutual information between input and class variables based on the
Parzen window. However, in many areas of the experimental sciences it is difficult to
compute mutual information accurately because of limited or imbalanced sample sizes (Sakar &
Kursun, 2012).
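For discrete variables, the mutual information between a feature and the class variable can be estimated from observed frequencies. The sketch below is a plain plug-in estimate in bits; the function name and the toy data are hypothetical, and the estimate is only as reliable as the sample is large.

```python
import math
from collections import Counter


def mutual_information(feature, target):
    """Plug-in estimate of I(feature; target) in bits for two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    count_f = Counter(feature)
    count_c = Counter(target)
    return sum(
        (count / n) * math.log2((count / n) / ((count_f[f] / n) * (count_c[c] / n)))
        for (f, c), count in joint.items()
    )


target = [0, 0, 1, 1]
informative = [0, 0, 1, 1]  # determines the class completely
uninformative = [0, 1, 0, 1]  # independent of the class in this sample
print(mutual_information(informative, target))
print(mutual_information(uninformative, target))
```

With only a handful of observations per cell, the frequencies are noisy estimates of the true probabilities, which is one source of the accuracy problem mentioned above.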
Dependency measures based on statistical information, such as Pearson correlation
coefficients, Fisher score, t-test, F-Statistic, etc., are designed to quantify how strongly two
features are associated or correlated with each other. Many traditional selection methodologies
are based on these measures, such as forward selection, backward elimination, and stepwise
regression. Wei and Billings (2007) presented a new unsupervised forward orthogonal search
(FOS) algorithm for feature selection and ranking. In this algorithm, features are selected in a
stepwise way, one at a time, by estimating the capability of each specified candidate feature
subset to represent the overall features in the measurement space. A squared correlation function
is employed as the criterion to measure the dependency between features, and this makes the new
algorithm easy to implement. Camps, Mooij, and Scholkopf (2010) introduced a nonlinear
measure of independence between random variables for remote sensing supervised feature
selection, where statistical dependence is evaluated with Hilbert–Schmidt independence criterion
(HSIC).
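The Pearson correlation coefficient, and the squared correlation used as a dependency criterion, can be computed as in the following sketch. The function name and data are hypothetical, and the code assumes that neither variable is constant.

```python
import math


def pearson_correlation(x, y):
    """Sample Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


income = [1.0, 2.0, 3.0]
limit = [2.0, 4.0, 6.0]  # a redundant feature: exactly twice the income
r = pearson_correlation(income, limit)
print(r, r ** 2)
```

A squared correlation close to 1 between two features signals that one of them adds little information beyond the other.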
Finally, consistency measures try to retain the discriminating power of the data defined by
original features. Dash and Liu (2003) carried out a study of consistency measure with different
search strategies. The study of the consistency measure with other measures shows that it is
monotonic, fast, multivariate, capable of handling some noise, and can be used to remove
redundant and/or irrelevant features.
In short, filter methods assess the relevance of features by looking at the intrinsic properties
of the data. They are fast and independent of the classifier, and thus can easily scale to very high
dimensional datasets. As a result, feature selection needs to be done only once, and then different
classifiers can be evaluated. However, these methods totally ignore the effects of the selected
feature subset on the performance of the induction algorithm. Many filters provide a feature
ranking rather than an explicit best feature subset. This may lead to worse classification
performance when compared to other types of feature selection techniques. In addition, it is not
clear how to determine the threshold point for rankings to select only the required features and
exclude noise (Kumari & Swarnkar, 2011).
Fig. 12. Flowchart of wrapper approaches
Wrappers use the learning machine of interest as a black box to score subsets of variables
according to their predictive power. The search algorithm for wrappers searches through the
space of possible features and evaluates each subset. The flowchart of wrapper approach is
shown as Fig. 12. The induction algorithm can be the classifier in the problem of classification.
Hsu (2004) employed decision trees for feature selection and found the subset of features
with the lowest classification error rate by using a genetic algorithm wrapper approach for
inducing decision trees. Chiang and Pell (2004) incorporated genetic algorithms with Fisher
discriminant analysis (FDA) for key variable identification; the genetic algorithms are used as an
optimization tool to determine the variables that maximize the FDA classification success rate for
two given data sets. Guyon, Weston, Barnhill, and Vapnik (2002) proposed a method of gene
selection utilizing SVM methods based on Recursive Feature Elimination (RFE), which yielded better
classification performance. Chen and Li (2010) combined an SVM classifier with conventional
statistical linear discriminant analysis, decision tree, rough sets, and F-score approaches for
feature selection. Chen, Ma, and Ma (2009) proposed a hybrid SVM technique based on three
strategies, namely, using classification and regression trees (CART) to select input features, using
multivariate adaptive regression splines (MARS) to select input features, and using grid search to
optimize model parameters. Their results demonstrated that the hybrid SVM provided the best
classification rate. It seems that in the past few years these hybrid models have yielded better
performance and gained popularity in research.
Wrapper algorithms often achieve better results than filters because they are tuned to the
specific interaction between an induction algorithm and its training data. However, when a
search algorithm is wrapped around the classification model, the space of feature subsets grows
exponentially with the number of features. The search problem is known to be NP-hard, and the
search quickly becomes computationally expensive and intractable (Kumari & Swarnkar, 2011; Liu &
Schumann, 2005). Wrapper algorithms also carry a risk of overfitting the model.
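The cost of a wrapper can be seen in a sketch of the most naive search strategy, exhaustive enumeration. The scoring function below is a hypothetical stand-in for the cross-validated accuracy of a classifier, and the feature names are invented; in practice each call to the scorer means training and testing a model.

```python
from itertools import combinations


def wrapper_search(features, score_subset):
    """Score every non-empty subset with a black-box function and keep the best."""
    best_subset, best_score, evaluated = None, float("-inf"), 0
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            evaluated += 1
            score = score_subset(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score, evaluated


def toy_score(subset):
    """Hypothetical score: rewards two useful features, penalizes subset size."""
    return len(set(subset) & {"income", "debt"}) - 0.1 * len(subset)


features = ["income", "age", "debt", "zip"]
print(wrapper_search(features, toy_score))
```

With 4 features the search evaluates 15 subsets; with 30 features it would need more than a billion evaluations, which is why practical wrappers rely on heuristic search.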
In sum, both filter and wrapper methods have advantages and disadvantages. This
paper presents a hybrid model that combines the advantages of filter and wrapper methods. In this
study, subsets of different sizes are selected with a binary quadratic programming model based
on the correlation coefficient. This first phase dramatically reduces the number of subsets to be
evaluated by the classifiers, compared with all possible subsets. The selected subsets then follow
the wrapper approach, in which one or more satisfactory subsets of features are chosen based on
the accuracy of the prediction and the size of the subset.
CHAPTER III
METHODOLOGY
The model used to select subsets of variables in this study is a correlation coefficient based
binary quadratic programming model. The aim is to select optimal subsets of variables for
different sizes of subsets. The model is transformed into the unconstrained binary quadratic
programming (UBQP) problem and solved with the bisection method based on Tabu search
algorithm (BMTS). For $n$ variables from which subsets of variables are selected, there are
$\binom{n}{k}$ possible subsets of variables of size $k$, where $1 \le k \le n$.
The approach for selecting subsets in this paper, in most cases, chooses at least $n$ subsets of
variables of different sizes out of all possible subsets. The selected subsets of variables
associated with the optimal solutions of the presented subset selection model are evaluated in terms
of OCAR with 10-fold cross validation SVM. Finally, satisfactory subsets used as a credit
scoring (classification) model are determined based on both the overall classification accuracy
rate (OCAR) and the size of the subset. It should be noted that there can be multiple satisfactory
subsets of the same size due to different values of the parameter $\alpha$.
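To give a sense of how quickly the search space grows, the following minimal sketch counts the candidate subsets. It assumes that "possible subsets" means all non-empty subsets; the function names are illustrative and not part of the original study.

```python
from math import comb

def subsets_of_size(n: int, k: int) -> int:
    """Number of distinct k-variable subsets of n candidate variables."""
    return comb(n, k)

def all_nonempty_subsets(n: int) -> int:
    """Total number of non-empty subsets: the sum of C(n, k) for k = 1..n."""
    return sum(comb(n, k) for k in range(1, n + 1))

# 14 and 24 are the feature counts of the two benchmark datasets used later.
for n in (14, 24):
    print(n, all_nonempty_subsets(n))
```

The totals equal $2^n - 1$, which is why exhaustive evaluation by a classifier becomes impractical as $n$ grows.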
3.1 Model Construction
The criteria used to build the subset selection model involve two conflicting objectives. On
one hand, we would like the model to include as many variables as possible so that the
information content in these factors can influence the predicted value. On the other hand, we
want the model to include as few variables as possible because the variance of the predicted value
increases as collinearity increases with the number of variables. Therefore,
in a good model, the correlations between the response variable $y$ and the variables $x$ ($y$-$x$ correlations)
should be high, and those between the variables $x$ ($x$-$x$ correlations) should be low (Eksioglu,
Demirer & Capar, 2005). Finally, the objective function is to maximize the difference between
the sum of the $y$-$x$ correlations and the sum of the $x$-$x$ correlations, adjusted by a weight $\alpha$, since there is a
tradeoff between the number of informative factors and the effect of collinearity; in other
words, one must trade off estimating more parameters (bias reduction) against accurately
estimating these parameters (variance reduction).
Let $r_{yi}$ be the sample correlation coefficient between $y$ and $x_i$, and $r_{ij}$ ($r_{ji}$) be the sample
correlation coefficient between $x_i$ ($x_j$) and $x_j$ ($x_i$). The subset selection method derives a subset $S$
containing $k$ variables from the set $N$ with all $n$ variables such that the sum of correlation
coefficients between $y$ and $x_i$, $\sum_{i \in S} |r_{yi}|$, is maximized, and the sum of correlation coefficients
between $x_i$ ($x_j$) and $x_j$ ($x_i$), $\sum_{i, j \in S,\ i \neq j} (|r_{ij}| + |r_{ji}|)$, is minimized. A combination of these
correlation terms is generated using a weight, $\alpha$ ($0 \le \alpha \le 1$), denoting the tradeoff between the
two conflicting objectives. This is shown as the objective function in Model 3.1. $u_i$ and $u_j$ are
defined as the decision variables, where $u_i = 1$ if variable $x_i$ is in $S$, 0 otherwise, and $u_j = 1$ if
variable $x_j$ is in $S$, 0 otherwise. Therefore $u_i u_j = 1$ indicates that both $x_i$ and $x_j$ are in $S$, 0 otherwise.
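\[
\begin{aligned}
\text{Maximize}\quad & \alpha \sum_{i=1}^{n} |r_{yi}|\, u_i \;-\; (1-\alpha) \sum_{i=1}^{n} \sum_{j=1,\ j \neq i}^{n} \left(|r_{ij}| + |r_{ji}|\right) u_i u_j, \qquad i, j \in N \\
\text{s.t.}\quad & u_i, u_j \in \{0, 1\}
\end{aligned}
\tag{3.1}
\]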
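As an illustration of how the objective of Model 3.1 scores a candidate subset, the sketch below evaluates it for a 0/1 selection vector. The correlation values are made-up toy numbers, the function name is hypothetical, and the collinearity sum is assumed to run over ordered pairs $i \neq j$.

```python
def objective(u, r_y, R, alpha):
    """Objective value of Model 3.1 for a 0/1 selection vector u.

    r_y[i] is the correlation between y and x_i, R[i][j] the correlation
    between x_i and x_j. Assumption: the second sum covers ordered pairs i != j.
    """
    n = len(u)
    relevance = sum(abs(r_y[i]) * u[i] for i in range(n))
    collinearity = sum(
        (abs(R[i][j]) + abs(R[j][i])) * u[i] * u[j]
        for i in range(n) for j in range(n) if i != j
    )
    return alpha * relevance - (1 - alpha) * collinearity

# Toy correlations (hypothetical, not taken from the credit datasets).
r_y = [0.8, 0.6, 0.1]
R = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]

print(objective((1, 0, 0), r_y, R, alpha=0.5))  # one informative variable
print(objective((1, 1, 0), r_y, R, alpha=0.5))  # adds a correlated variable
```

With these toy numbers, adding the second variable raises the relevance term but is outweighed by the collinearity penalty at $\alpha = 0.5$.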
3.2 Algorithm
Model 3.1 can be rewritten in the form $u'Qu$, where $u$ is an $n$-vector of binary variables and
$Q$ is an $n \times n$ symmetric matrix. When we expand $u'Qu$, we have the following Equation 3.2.
\[
u'Qu = \sum_{i=1}^{n} q_{ii}\, u_i + \sum_{i=1}^{n} \sum_{j=1,\ j \neq i}^{n} q_{ij}\, u_i u_j
\tag{3.2}
\]
where $u_i$ and $u_j$ are binary variables (so that $u_i^2 = u_i$), $q_{ii}$ are the elements on the diagonal of the $Q$ matrix, and $q_{ij}$ and
$q_{ji}$ are the symmetric non-diagonal elements of the $Q$ matrix. As we can see, Model 3.1 exactly matches
the right hand side of Equation 3.2, and thus Model 3.1 can be rewritten in the form
$u'Qu$, which is recognized as a UBQP problem with $\alpha |r_{yi}|$ as its diagonal elements and
$-(1-\alpha)(|r_{ij}| + |r_{ji}|)$ as its non-diagonal elements of the $Q$ matrix.
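\[
u'Qu =
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}^{T}
\begin{pmatrix}
\alpha |r_{y1}| & -(1-\alpha)(|r_{12}| + |r_{21}|) & \cdots & -(1-\alpha)(|r_{1n}| + |r_{n1}|) \\
-(1-\alpha)(|r_{21}| + |r_{12}|) & \alpha |r_{y2}| & \cdots & -(1-\alpha)(|r_{2n}| + |r_{n2}|) \\
\vdots & \vdots & \ddots & \vdots \\
-(1-\alpha)(|r_{n1}| + |r_{1n}|) & -(1-\alpha)(|r_{n2}| + |r_{2n}|) & \cdots & \alpha |r_{yn}|
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
\]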
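The equivalence between the objective of Model 3.1 and the quadratic form $u'Qu$ can be checked numerically on a small case. The sketch below is only an illustration under assumptions: the toy correlations and function names are hypothetical, and $Q$ is built with $\alpha |r_{yi}|$ on the diagonal and $-(1-\alpha)(|r_{ij}| + |r_{ji}|)$ off the diagonal, with the collinearity sum taken over ordered pairs $i \neq j$.

```python
from itertools import product

def build_q(r_y, R, alpha):
    """Symmetric Q matrix: relevance on the diagonal, collinearity penalties elsewhere."""
    n = len(r_y)
    return [
        [alpha * abs(r_y[i]) if i == j
         else -(1 - alpha) * (abs(R[i][j]) + abs(R[j][i]))
         for j in range(n)]
        for i in range(n)
    ]

def quad_form(u, Q):
    """u'Qu for a 0/1 vector u."""
    n = len(u)
    return sum(u[i] * Q[i][j] * u[j] for i in range(n) for j in range(n))

def direct_objective(u, r_y, R, alpha):
    """Objective written as the two sums of the subset selection model."""
    n = len(u)
    relevance = sum(abs(r_y[i]) * u[i] for i in range(n))
    collinearity = sum((abs(R[i][j]) + abs(R[j][i])) * u[i] * u[j]
                       for i in range(n) for j in range(n) if i != j)
    return alpha * relevance - (1 - alpha) * collinearity

r_y = [0.8, 0.6, 0.1]                                        # toy values
R = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]      # toy values
Q = build_q(r_y, R, alpha=0.7)

# Largest disagreement between the two formulations over all 0/1 vectors.
max_gap = max(abs(quad_form(u, Q) - direct_objective(u, r_y, R, 0.7))
              for u in product((0, 1), repeat=3))
print(max_gap)
```

The gap is zero up to floating-point rounding because $u_i^2 = u_i$ for binary variables, so the diagonal of $Q$ carries the linear terms.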
According to Kochenberger and Glover (2006), the UBQP is used to solve a wide variety of
combinatorial optimization problems. Since the proposed model for feature selection in this
study is already a quadratic function without constraints, no additional transformation is needed in
this case. However, any linear or quadratic discrete problem with linear constraints in bounded
integer variables can be converted to the form of UBQP by using quadratic infeasibility penalties
as an alternative to imposing constraints. The following is a brief introduction to the general constrained
problem:
\[
\begin{aligned}
\text{Min}\quad & x_0 = x'Qx \\
\text{s.t.}\quad & Ax = b,\quad x \text{ binary}
\end{aligned}
\tag{3.3}
\]
The constrained quadratic optimization model can be converted into an equivalent UBQP model by
adding a quadratic infeasibility penalty function to the objective function, and thus Model 3.3
is converted to
\[
\begin{aligned}
\text{Min}\quad x_0 &= x'Qx + P(Ax-b)^{t}(Ax-b) \\
&= x'Qx + x'Dx + c \\
&= x'\hat{Q}x + c
\end{aligned}
\tag{3.4}
\]
where $P$ is a positive scalar, and the matrix $D$ and the additive constant $c$ result from expanding the penalty term. The additive constant $c$ can be dropped later on to attain the final
unconstrained version, Model 3.5, of the constrained Model 3.3. Slack variables can be added to
inequality constraints to comply with the form $Ax = b$.
\[
\text{Min}\quad x'\hat{Q}x,\quad x \text{ binary}
\tag{3.5}
\]
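The conversion from Model 3.3 to Model 3.5 can be sketched for a tiny instance. The code below is illustrative only: the matrix `Q`, the constraint $x_1 + x_2 + x_3 = 2$, and the penalty `P = 10` are hypothetical, and the helper names are not from the original study. It uses the expansion $P(Ax-b)^t(Ax-b) = P(x'A'Ax - 2b'Ax + b'b)$ and the fact that $x_j^2 = x_j$ for binary $x$, so the linear term can be folded onto the diagonal.

```python
from itertools import product

def quad(x, Q):
    """x'Qx for a 0/1 vector x."""
    n = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))

def penalized_q(Q, A, b, P):
    """Return (Q_hat, c) with x'Q_hat x + c = x'Qx + P * ||Ax - b||^2 for binary x."""
    n, m = len(Q), len(A)
    Q_hat = [row[:] for row in Q]
    for i in range(n):
        for j in range(n):
            # quadratic part of the penalty: P * A'A
            Q_hat[i][j] += P * sum(A[k][i] * A[k][j] for k in range(m))
        # linear part -2P * b'A x, placed on the diagonal because x_i^2 = x_i
        Q_hat[i][i] -= 2 * P * sum(A[k][i] * b[k] for k in range(m))
    c = P * sum(bk * bk for bk in b)
    return Q_hat, c

Q = [[-3, 1, 0], [1, -2, 1], [0, 1, -4]]   # hypothetical objective
A, b, P = [[1, 1, 1]], [2], 10             # hypothetical constraint and penalty
Q_hat, c = penalized_q(Q, A, b, P)

all_x = list(product((0, 1), repeat=3))
best_constrained = min((x for x in all_x if sum(x) == 2), key=lambda x: quad(x, Q))
best_unconstrained = min(all_x, key=lambda x: quad(x, Q_hat))
print(best_constrained, best_unconstrained)
```

In this instance both searches return the same solution, which is the behaviour expected when $P$ is chosen large enough to make every infeasible point unattractive.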
On the other hand, the equivalent quadratic penalties for certain types of constraints are well
established, making the conversion from constrained to unconstrained model much easier. For
example, the corresponding quadratic penalties for constraints is ( ), and Table 2
gives some other well established penalties.
Table 2
Penalty conversion.
Classical Constraint Equivalent Penalty
( )
( )
( )
( )
( )
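For orientation, the sketch below checks several constraint and penalty pairs that are commonly quoted in the UBQP literature for binary variables $x$ and $y$. Whether these are exactly the entries of Table 2 is an assumption, since the table cells are not legible here. A penalty is treated as valid when it is zero at every feasible point and positive at every infeasible one.

```python
from itertools import product

P = 1  # any positive scalar works; 1 keeps the check readable

# constraint label -> (feasibility test, quadratic penalty); assumed examples
PENALTIES = {
    "x + y <= 1": (lambda x, y: x + y <= 1, lambda x, y: P * x * y),
    "x + y >= 1": (lambda x, y: x + y >= 1, lambda x, y: P * (1 - x - y + x * y)),
    "x + y == 1": (lambda x, y: x + y == 1, lambda x, y: P * (1 - x - y + 2 * x * y)),
    "x <= y":     (lambda x, y: x <= y,     lambda x, y: P * (x - x * y)),
}

def penalty_is_valid(feasible, penalty):
    """True if the penalty is 0 on feasible points and positive on infeasible ones."""
    return all(
        (penalty(x, y) == 0) if feasible(x, y) else (penalty(x, y) > 0)
        for x, y in product((0, 1), repeat=2)
    )

for label, (feasible, penalty) in PENALTIES.items():
    print(label, penalty_is_valid(feasible, penalty))
```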
A variety of procedures for solving the UBQP problem have been reported. For example,
exhaustive methods based on branch and bound are useful for finding optimal solutions to
problems of small or limited size (Boros, Hammer, Sun & Tavares, 2008). For problems of large
size, Tabu search based algorithms are among the most successful in solving UBQP
problems (Wang, Lü, Glover & Hao, 2012). Tabu search algorithms provide solutions very close
to optimality and are among the most effective, if not the best, at tackling the difficult problems
at hand, since classical methods often encounter great difficulty when facing the challenge of
solving hard optimization problems. These successes have made Tabu search extremely popular
among those interested in finding good solutions to large combinatorial problems. A
distinguishing feature of Tabu search is that it is based on the premise that problem solving must
incorporate adaptive memory and responsive exploration, allowing local search methods to
overcome local optima and enhance performance by using memory structures that describe
the visited solutions or user-provided sets of rules.
In more detail, let $X$ be a feasible solution set. Each $x \in X$ has an associated neighborhood
$N(x) \subset X$, and each solution $x' \in N(x)$ is reached from $x$ by an operation called a move. Tabu
search is an iterative method and begins in the same way as ordinary local or neighborhood
search. The general steps of an iterative procedure are: (1) choose $x \in X$; (2)
find $x' \in N(x)$ such that $f(x') < f(x)$; (3) if no such $x'$ can be found, $x$ is the local optimum
(minimum), denoted by $x^*$, and the method stops; (4) otherwise, designate $x'$ to be the new $x$ and go
to step (2) (Glover & Laguna, 1997). However, algorithms for ordinary local or neighborhood
search often face the risk of being trapped in a local optimum instead of reaching the global optimum. The Tabu
neighborhood search method has been designed to avoid being trapped in a local minimum or
maximum by using a memory structure, which can be considered as modifying the neighborhood
$N(x)$ of the current solution $x$. The modified neighborhood, denoted by $N^*(x)$, may be expanded
to include solutions not ordinarily found in $N(x)$. Therefore, Tabu search can be viewed as a
dynamic neighborhood method where the neighborhood of $x$ is not a static set, but rather a set
that can change according to the history of the search (Glover, 1986, 1989, 1990; Glover &
Laguna, 1997). The dynamic neighborhood is achieved via the memory strategy. Selected
attributes that occur in recently visited solutions are labeled ‘Tabu-active’, and solutions that
contain Tabu-active elements become Tabu. Current moves are recorded in a Tabu list,
which stores the Tabu-active attributes and identifies their current status. These Tabu-active
attributes are prohibited for a number of subsequent iterations, which is defined as the Tabu tenure. This
prevents certain solutions from the recent past from belonging to $N^*(x)$ and hence from being
revisited, and consequently the search explores solutions in a modified neighborhood.
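The memory mechanism described above can be illustrated with a deliberately small one-flip Tabu search. This is a sketch under stated assumptions, not the path relinking algorithm used in this study: the matrix `Q`, the tenure of 2, the iteration limit, and the aspiration rule are hypothetical choices, and the search maximizes $x'Qx$ to match the direction of Model 3.1.

```python
def value(x, Q):
    """x'Qx for a 0/1 vector x."""
    n = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))

def tabu_search(Q, tenure=2, max_iters=50):
    """One-flip Tabu search maximizing x'Qx over binary x (illustrative only)."""
    n = len(Q)
    x = [0] * n
    best_x, best_val = x[:], value(x, Q)
    tabu_until = [0] * n  # last iteration at which flipping variable i is still forbidden
    for it in range(1, max_iters + 1):
        candidates = []
        for i in range(n):
            y = x[:]
            y[i] = 1 - y[i]
            val = value(y, Q)
            is_tabu = tabu_until[i] >= it
            # aspiration: a Tabu move is allowed if it beats the best value found so far
            if not is_tabu or val > best_val:
                candidates.append((val, i))
        if not candidates:
            continue  # every move is Tabu; wait for a tenure to expire
        val, i = max(candidates)
        x[i] = 1 - x[i]              # accept the best admissible move, even if it worsens
        tabu_until[i] = it + tenure  # forbid reversing this flip for `tenure` iterations
        if val > best_val:
            best_x, best_val = x[:], val
    return best_x, best_val

# Hypothetical 3-variable instance: diagonal rewards, off-diagonal penalties.
Q = [[4, -5, -2],
     [-5, 3, -3],
     [-2, -3, 1]]
print(tabu_search(Q))
```

Unlike plain local search, the loop keeps moving after reaching a local optimum, and the Tabu list keeps it from immediately undoing the last flips.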
The first adaptive memory Tabu search algorithm for solving UBQP was presented by Glover,
Kochenberger, and Alidaee (1998), and since then more Tabu search strategies and algorithm
improvements have been presented. Palubeckis (2004) proposed five different multistart Tabu
search strategies for large unconstrained binary quadratic optimization problems and
achieved very good results. Palubeckis (2006) later further improved these results with an
iterated Tabu search algorithm. More recently, Glover, Lü, and Hao (2010) presented a
diversification-driven Tabu search (D2TS) algorithm that alternates between a basic Tabu search
procedure and a memory-based perturbation strategy guided by a long-term memory. Lü, Glover,
and Hao (2010) proposed a hybrid metaheuristic approach (HMA) by incorporating a Tabu
search procedure into the framework of evolutionary algorithms.
Finally, Wang, Lü et al. (2012) proposed an improved algorithm called path relinking, which
is composed of a reference set initialization method, an improvement method based on Tabu search, a
reference set update method, a relinking method, and a path solution selection method. It has
been demonstrated that the algorithm improves both solution quality and computational
efficiency. Therefore, the problem in our paper is solved iteratively for different
values of α with this path relinking algorithm, an improved Tabu search algorithm. In this study,
three parameters need to be set in the
path relinking algorithm. The first one is the time limit after which the whole path relinking algorithm
is stopped. For the current problem, 1 second is sufficient. The second parameter is the stop
condition of each Tabu search procedure in the path relinking algorithm. We choose 5000, which
means the Tabu search stops when the current best known result has not been improved
within the last 5000 iterations. The last parameter is the Tabu tenure, which is the number of
subsequent iterations during which Tabu-active attributes are prohibited. Considering the
size of the problems used in this study, 5 to 7 is a reasonable range for this parameter. The bisection
method and its connection with the path relinking algorithm are implemented in Python, and 10-fold
cross validation is run in MATLAB.
As mentioned earlier, $\alpha$ is the weight that balances the number of informative factors
against the degree of collinearity. When $\alpha = 0$, the first term, $\alpha \sum_{i} |r_{yi}| u_i$, in the objective
function is zero, and the objective becomes to maximize $-(1-\alpha) \sum_{i \neq j} (|r_{ij}| + |r_{ji}|) u_i u_j$.
The maximal value of this objective function is 0, which means all $u_i u_j = 0$,
assuming $|r_{ij}| > 0$. According to the definition of $u_i u_j$, no more than one variable can be
selected to form the subset model. In this case the selected subset of size 1 should
be the variable having the largest correlation coefficient with the response $y$. On the other extreme,
when $\alpha = 1$, the second term, $(1-\alpha) \sum_{i \neq j} (|r_{ij}| + |r_{ji}|) u_i u_j$, in the objective function is
zero, and the objective is to maximize $\sum_{i} |r_{yi}| u_i$, assuming $|r_{yi}| > 0$. Apparently, the
maximum is obtained by setting all $u_i = 1$, which means selecting all variables. It is found
that the number of variables selected by the model changes from 1 to $n$ as $\alpha$ increases from 0 to 1
in this study.
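The two extreme cases can be checked by enumeration on a toy instance. The sketch below is illustrative only: the correlations are hypothetical, the function names are not from the original study, and the collinearity sum is assumed to run over ordered pairs $i \neq j$. Note that at exactly $\alpha = 0$ every subset with at most one variable attains the maximal value 0, so the enumeration reports all of them.

```python
from itertools import product

def objective(u, r_y, R, alpha):
    """Weighted relevance minus weighted collinearity for a 0/1 vector u."""
    n = len(u)
    relevance = sum(abs(r_y[i]) * u[i] for i in range(n))
    collinearity = sum((abs(R[i][j]) + abs(R[j][i])) * u[i] * u[j]
                       for i in range(n) for j in range(n) if i != j)
    return alpha * relevance - (1 - alpha) * collinearity

def maximizers(r_y, R, alpha, tol=1e-12):
    """All 0/1 vectors whose objective value is within tol of the maximum."""
    scored = [(objective(u, r_y, R, alpha), u)
              for u in product((0, 1), repeat=len(r_y))]
    best = max(score for score, _ in scored)
    return [u for score, u in scored if abs(score - best) <= tol]

r_y = [0.8, 0.6, 0.1]                                      # toy values
R = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]    # toy values

print(maximizers(r_y, R, alpha=0.0))  # only subsets with at most one variable
print(maximizers(r_y, R, alpha=1.0))  # the full set of variables
```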
We present a bisection method to identify the values of $\alpha$ that give different sizes of optimal subsets of
variables. The interval $\alpha \in [0, 1]$ is divided into $2^m$ equal parts. The resulting values of $\alpha$ give solutions
many of which are duplicated, since a certain range of $\alpha$ gives the same solution according to our
experiments. Thus these duplicated solutions can be eliminated (see Appendix A for examples of
solutions for Model 3.1 when $\alpha \in [0, 1]$ is divided in this way). When $m$ is extremely large, we can
use parallel computing. $\alpha \in [0, 1]$ is first divided into a number of intervals, each of which can then be
partitioned into $2^{m'}$ parts, where $m'$ is smaller than $m$, and the intervals can be run
simultaneously. For example, if we divide $[0, 1]$ into $2^m$ parts where $m = 15$ (with interval width
$1/32768 \approx 3.05 \times 10^{-5}$) and set the time limit to 1 second for the Tabu search algorithm, then 32768 seconds
(about 9.1 hours) are needed to reach the solutions for the UBQP problem. However, if we first
divide $[0, 1]$ into 10 intervals of width 0.1, then partition each of the 10 intervals into $2^{m'}$ parts where $m' = 12$ (with
interval width $0.1/4096 \approx 2.44 \times 10^{-5}$), and run them at the same time, it takes only 4096 seconds, which
is approximately 1.1 hours, with an even smaller interval. In most cases, at least one solution for
each size of subset is retained, and these solutions are evaluated by 10-fold cross validation
SVM to find the satisfactory subsets. Fig. 13 gives a flowchart of the proposed BMTS+SVM
method.
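The running-time figures in this example follow from simple arithmetic, sketched below. The helper name and the assumption that the work is split evenly across parallel workers are illustrative; the counts 32768 and 4096, the 10 intervals, and the 1-second limit are the ones quoted in the text.

```python
def sweep_hours(num_alphas, seconds_per_alpha=1.0, workers=1):
    """Wall-clock hours to solve num_alphas problems split evenly over workers."""
    per_worker = -(-num_alphas // workers)  # ceiling division
    return per_worker * seconds_per_alpha / 3600

sequential = sweep_hours(2 ** 15)                   # 32768 values of alpha, one after another
parallel = sweep_hours(10 * 2 ** 12, workers=10)    # 10 intervals of 4096 values each

print(round(sequential, 1), "hours sequentially")
print(round(parallel, 2), "hours with 10 intervals in parallel")
```

The parallel layout solves more problems in total (40960 instead of 32768) yet finishes about eight times sooner, because each worker handles only 4096 values.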
Fig. 13. Flowchart of the BMTS+SVM method. Filter stage (BMTS): input features, binary quadratic programming model, subsets with different sizes. Wrapper stage: training data and testing data, classifier (SVM), classifier testing, accuracy evaluation, testing accuracy, satisfactory subset(s).
The two stages for solving the current problem are as follows:
Stage 1: Divide $\alpha \in [0, 1]$ into equal intervals and, for each $\alpha$, do the following:
Step 1. Solve Model 3.1 with the Tabu search algorithm (path relinking).
Step 2. If the $i$th solution = the $(i-1)$th solution, replace the $(i-1)$th solution with the $i$th solution;
otherwise keep both solutions.
Step 3. Evaluate the features of the subsets corresponding to the solutions with 10-fold cross
validation SVM.
Stage 2: Choose the satisfactory subsets based on both the size of the subset and the OCAR.
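Steps 1 and 2 of Stage 1 can be sketched as a sweep over a grid of $\alpha$ values. The code below is a minimal illustration under assumptions: exhaustive enumeration stands in for the path relinking solver (feasible only for a handful of variables), the grid $\alpha = k / 2^m$ and the toy correlations are hypothetical, and the collinearity sum runs over ordered pairs $i \neq j$. Step 3 (SVM evaluation) is omitted.

```python
from itertools import product

def objective(u, r_y, R, alpha):
    """Weighted relevance minus weighted collinearity for a 0/1 vector u."""
    n = len(u)
    relevance = sum(abs(r_y[i]) * u[i] for i in range(n))
    collinearity = sum((abs(R[i][j]) + abs(R[j][i])) * u[i] * u[j]
                       for i in range(n) for j in range(n) if i != j)
    return alpha * relevance - (1 - alpha) * collinearity

def solve_exhaustively(r_y, R, alpha):
    """Stand-in for the Tabu search solver: enumerate every 0/1 vector."""
    return max(product((0, 1), repeat=len(r_y)),
               key=lambda u: (objective(u, r_y, R, alpha), u))

def sweep(r_y, R, m):
    """Solve on the grid alpha = k / 2**m and merge consecutive duplicate solutions."""
    kept = []  # list of (alpha, solution)
    for k in range(2 ** m + 1):
        alpha = k / 2 ** m
        u = solve_exhaustively(r_y, R, alpha)
        if kept and kept[-1][1] == u:
            kept[-1] = (alpha, u)   # Step 2: the i-th solution replaces the (i-1)-th
        else:
            kept.append((alpha, u))
    return kept

r_y = [0.8, 0.6, 0.1]                                      # toy values
R = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]    # toy values
for alpha, u in sweep(r_y, R, m=4):
    print(alpha, u)
```

On this toy instance the 17 grid points collapse to three distinct subsets, of sizes 1, 2 and 3, which mirrors how a range of $\alpha$ values maps to the same solution.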
3.3 SVM Classifier
There are a number of classification techniques. Among conventional statistical methods,
logistic regression and discriminant analysis are the most widely used (Baesens, Gestel, Viaene,
Stepanova, Suykens & Vanthienen, 2003; Šušteršič et al., 2009). However, due to possibly
complex nonlinear relationships between variables, they are reported to lack accuracy.
There are also more sophisticated artificial intelligence methods, such as fuzzy systems,
neural networks, genetic programming (GP) and SVM, that achieve better performance than
traditional statistical methods in classification tasks (Baesens et al., 2003; Desai, Crook &
Overstreet, 1996; Lee & Chen, 2005; Lee et al., 2002; West, 2000). Recently, SVM has received
considerable attention in the machine learning literature, and it is appreciated for its
strong theoretical foundation, adaptive generalization ability, and appealing and stable predictive
performance (Lessmann & Voß, 2009). In addition, compared with other artificial intelligence
techniques, SVM needs only two free parameters, namely the penalty parameter C and the kernel function
parameter, such as gamma ($\gamma$) for the radial basis function (RBF) kernel, and
it guarantees a unique, optimal and global solution since the training of an SVM is done
by solving a linearly constrained quadratic problem (Shin, Lee & Kim, 2005). Due to these
advantages and its popularity, SVM is used as the classifier in this study.
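To make the role of the kernel parameter concrete, the sketch below evaluates the standard RBF kernel $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. It only illustrates how $\gamma$ controls similarity; it is not the SVM implementation used in this study, and the sample points and $\gamma$ values are hypothetical.

```python
from math import exp

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * sq_dist)

x, z = [0.0, 0.0], [1.0, 1.0]
for gamma in (0.1, 0.5, 2.0):
    # larger gamma makes the similarity decay faster with distance
    print(gamma, rbf_kernel(x, z, gamma))
```

A larger $\gamma$ yields a more local decision boundary, which is why $\gamma$ is typically tuned together with the penalty parameter C.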
3.4 Cross Validation
Cross-validation is a model validation technique for estimating how well a model
developed on training data will perform on a future unknown data set. In $k$-fold cross-validation,
the original data set is randomly partitioned into $k$ equally sized subsets of samples (folds).
Subsequently, $k$ iterations of training and validation are performed such that within each iteration
a different fold of the data is retained as the validation data for testing the model while the
remaining $k - 1$ folds are used as training data. The results from the $k$ folds can then be
averaged to produce a single estimate.
Fig. 14 gives an example of $k$-fold cross-validation with $k = 5$. The original data set is randomly divided into
five equally sized subsets, S1 to S5. The estimation of parameters for the model, in each
experiment, is based on the four unshaded training datasets, while predictive performance such
as classification accuracy is obtained from the shaded validating dataset. Finally, the overall
classification accuracy is computed as the average accuracy rate from the five experiments.
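The fold partitioning and averaging can be sketched in a few lines. This is an illustration under assumptions, not the MATLAB procedure used in this study: a trivial majority-label classifier stands in for the SVM so that the snippet needs no third-party library, the data are made up, and the function names are hypothetical. It also assumes $k$ does not exceed the number of samples.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition range(n_samples) into k folds of (almost) equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, k, fit, predict, seed=0):
    """Average validation accuracy over k train/validate iterations."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / k

# Stand-in classifier (NOT an SVM): always predicts the majority training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = [[v] for v in range(10)]            # made-up single-feature samples
y = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]      # made-up labels
accuracy = cross_val_accuracy(X, y, k=5, fit=fit_majority, predict=predict_majority)
print(accuracy)
```

Any classifier exposing a fit and a predict function could be passed in place of the stand-in, including an SVM from a library of the reader's choice.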
Fig. 14. An example of the cross validation (Experiments 1 to 5 over the subsets S1 to S5).
We use 10-fold cross-validation along with the SVM in this study. In data mining and
machine learning, 10-fold cross-validation is the most common choice, and Kohavi
(1995) found that 10-fold cross-validation was among the best model selection methods, since it
provided a less biased estimate of the accuracy when compared with several other approaches to
estimating accuracy.
CHAPTER IV
EXPERIMENT RESULTS AND COMPARISON ANALYSIS
4.1 Validation of the Method on Two Benchmark Datasets
Two real-world datasets, the Australian and German credit datasets, are used to test the
validity of the hybrid model. Both datasets are available from the UCI Repository of Machine
Learning Databases. The Australian dataset consists of 307 ‘good’ applicants and 383 ‘bad’ applicants
who are not creditworthy. Each applicant record contains a class attribute and 14
features, including 6 nominal and 8 numeric attributes. Despite the fact that all attribute names
and values in this dataset have been changed to symbols to protect the confidentiality of the data,
this dataset is interesting and valid for model testing because there is a good mix of
attributes, including continuous, nominal with small numbers of values, and nominal with larger
numbers of values. The original German dataset contains a class attribute and 20
categorical/symbolic attributes including status of existing account, credit history, duration in
month, purpose, credit amount, savings account/bonds, present employment, personal status and
sex, installment rate in percentage of disposable income, other debtors/guarantors, present
residence, property, age in years, other installments plans, housing, number of existing credits at
this bank, job, dependents, telephone, and foreign worker. It contains 700 instances of
creditworthy applicants and 300 bad applicants. An edited German dataset, used in this study, is
also available from the UCI machine learning repository with 24 numerical attributes. Several indicator
variables have been added, and categorical attributes have been coded as integers. A description
of the datasets is shown in Table 3. These two datasets have been studied by many
researchers.
Table 3
Statistical description of the Australian and German datasets.
Dataset     No. of attributes   Nominal features   Numeric features   No. of classes   Sample size   Good credit   Bad credit
Australia   14                  6                  8                  2                690           307           383
German      24                  0                  24                 2                1000          700           300
The first step for the credit scoring problem in this study is to establish subsets of features
from the full dataset with Model 3.1. A bisection method is now imposed on $\alpha$: the interval $[0, 1]$ is divided into
equal parts, and the number of parts can be set to a larger value as needed. The greater the number of parts, the
smaller the interval of $\alpha$, and thus the more solutions, whose number is determined by the number of $\alpha$ values, are
provided by Model 3.1. The complete results are shown in Table 4 and Table 5, which give
subsets of features of different sizes selected with BMTS for the Australian and German datasets
respectively.
Table 4
Complete subsets of features associated with given α for the Australian dataset.
α             No. of Features   OCAR
0             1                 55.51%
0.060546875   2                 85.51%
0.3046875     2                 85.80%
0.400390625   3                 85.65%
0.419921875   3                 85.80%
0.5078125     4                 85.94%
0.59765625    5                 86.96%
0.703125      5                 87.54%
0.71875       5                 87.25%
0.75          6                 86.52%
0.779296875   7                 87.68%
0.802734375   7                 86.52%
0.826171875   8                 87.97%
0.830078125   8                 86.81%
0.837890625   9                 86.81%
0.86328125    10                87.39%
0.880859375   11                87.25%
0.8984375     11                86.96%
0.90234375    12                87.39%
0.955078125   13                87.39%
1             14                87.10%
Table 5
Complete subsets of features associated with given α for the German dataset.
α             No. of Variables   OCAR
0             1                  70.0%
0.120239258   3                  70.6%
0.422912598   3                  73.5%
0.422973633   4                  71.9%
0.531921387   4                  73.4%
0.601623535   4                  75.4%
0.620391846   5                  75.4%
0.658599854   5                  75.8%
0.684204102   6                  75.1%
0.694030762   7                  74.5%
0.711578369   7                  75.8%
0.723571777   8                  76.5%
0.753265381   9                  77.4%
0.839416504   10                 76.8%
0.861663818   11                 76.8%
0.875         11                 77.1%
0.876800537   11                 77.1%
0.886505127   12                 77.5%
0.894195557   13                 77.3%
0.9112854     14                 77.2%
0.930633545   15                 77.2%
0.946868896   16                 77.3%
0.960479736   17                 77.6%
0.976745605   18                 77.4%
0.985107422   19                 77.5%
0.985168457   19                 77.3%
0.991271973   20                 77.5%
0.994476318   21                 77.5%
0.997283936   22                 77.6%
0.99822998    23                 77.2%
1             24                 78.3%
There are 21 subsets of variables selected out of all possible subsets of variables for the
Australian dataset, and 31 subsets of variables selected out of all possible subsets of variables for
the German dataset. As mentioned earlier, $\alpha$ determines the size of the subset, and the number of
variables selected by the model changes from 1 to $n$ as $\alpha$ increases from 0 to 1. Fig. 15 shows the
relationship between $\alpha$ and the subset size in the two cases.
Fig. 15. Relationship between number of features and α (horizontal axis: α from 0 to 1; vertical axis: number of features; series: German dataset and Australian dataset).
4.2 Results and Comparison Analysis
After establishing subsets of variables, the next step is to find out the subsets with the
satisfactory OCAR via the SVM classifier. An advantage of this subset selection method is to
provide options for different sizes of subsets so that we can select a satisfactory solution by
considering both the number of features in the subset and the accuracy comprehensively. For
example, we may consider sacrificing some degree of accuracy for a smaller size of the subset
when the cost of collecting the data of variables is high. Table 6 gives the OCAR derived from
10-fold cross validation SVM based on selected subsets for the Australian dataset.
Table 6
OCAR for Australian case and comparison.
Reference             Method             No. of Features   OCAR
Proposed method       BMTS+SVM           5                 87.54%
                                         5                 87.25%
                                         7                 87.68%
                                         8                 87.97%
Chen and Li (2010)    LDA+SVM            7                 86.52%
                      DT+SVM             7                 86.29%
                      RST+SVM            7                 85.22%
                      Fscore+SVM         7                 85.10%
Huang et al. (2007)   Grid+Fscore+SVM    7.6               84.20%
                      GA+SVM             7.3               86.90%
The accuracy rates of two subsets of size 5, one of size 7, and one of size 8 are reported as satisfactory
solutions (see Table 4 and Table 5 for the accuracy rates of all sizes). We compare these
accuracy rates with the results from Chen and Li (2010) and Huang et al. (2007), since these two
studies also used 10-fold cross validation SVM to evaluate the accuracy rate based on selected
subsets. As we can see, the accuracy rate for the subset of size 7 from our method is 87.68%, which
is higher than the rates from all other methods in the two studies. The subset of size 8 gives the
highest accuracy rate, 87.97%. Nevertheless, the two subsets of size 5 provide satisfactory accuracy
rates with fewer variables.
Unfortunately, Chen and Li (2010) and Huang et al. (2007) did not report which variables
are included in their selected subsets, and thus we can only conclude at this stage that our method, overall,
improves the accuracy rate. In order to provide evidence that the subsets of features
selected by BMTS perform competitively in prediction accuracy, we compare the accuracy
rates based on BMTS-selected subsets with the rates based on subsets selected by forward
selection (FS), backward elimination (BE), and stepwise selection.
Forward selection starts with the intercept term only, and at each step it chooses a variable to
be added if the F-statistic of this variable exceeds a cut-off value (a pre-determined critical
F-value, say $F_{in}$). The first variable, $x_1$, which has the largest simple correlation with the response
variable $y$, is selected to be entered. Now an F-test is carried out for $x_1$ to check whether the
hypothesis that the coefficient of $x_1$ is zero ($H_0: \beta_1 = 0$) can be rejected. If the F-statistic exceeds
$F_{in}$, the variable is entered. $F_{in}$ is usually computed from the F distribution with 1 and $n - p$
degrees of freedom at a chosen significance level, where $n$ and $p$ are the number of observations and
the number of terms, including the variable and the intercept, in the current subset model. As $p$ changes,
the cut-off value changes as well.
Unlike the first variable, the second variable chosen for entry is the one that has the largest
partial correlation with the response variable $y$ given that $x_1$ is also in the model. Partial correlation
measures the correlation between two variables after removing the effects of another variable, or
several other variables, on this relationship. If the F-statistic is greater than the $F_{in}$ associated with the
current step, then the second variable is also included in the model. The procedure is repeated until
the partial F-statistic at a particular step does not exceed $F_{in}$ or when there is no more candidate
variable to be added.
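The first step of forward selection can be sketched as follows. For a single regressor with an intercept, the F-statistic equals $r^2 (n - 2) / (1 - r^2)$, where $r$ is the simple correlation with the response. The data, the cut-off value `f_in = 4.0`, and the function names are hypothetical, and later steps (which require partial F-statistics) are not covered.

```python
from math import sqrt

def pearson(a, b):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def first_forward_step(columns, y, f_in):
    """Return (index, F) of the variable entering first, or None if F <= f_in.

    Assumes no candidate is perfectly correlated with y (r**2 < 1).
    """
    n = len(y)
    r = [pearson(col, y) for col in columns]
    best = max(range(len(columns)), key=lambda j: abs(r[j]))
    f_stat = r[best] ** 2 * (n - 2) / (1 - r[best] ** 2)
    return (best, f_stat) if f_stat > f_in else None

y = [1, 2, 3, 4, 5, 6]              # made-up response
columns = [
    [1, 2, 3, 4, 5, 7],             # strongly related to y
    [2, 1, 2, 1, 2, 1],             # weakly related to y
]
print(first_forward_step(columns, y, f_in=4.0))
```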
The stepwise regression algorithm was developed by Efroymson and has been widely used for
multiple regression calculations. It is an extension of forward selection. In forward selection,
variables are never dropped once they enter the model. In stepwise regression, however,
the variables entered at previous steps are reassessed after the inclusion of a new variable.
Compared with $F_{in}$, the cut-off value for dropping a variable, $F_{out}$, is usually chosen to be smaller,
making it relatively more difficult to add a variable than to delete one. Therefore,
while the procedure for selecting the first variable is the same in forward selection as it is in
stepwise regression, there is one more step when processing the second variable. That is, when
the second variable is entered into the model, we need to reassess the previously entered variable,
$x_1$. If the partial F-statistic for $x_1$ is now less than $F_{out}$, $x_1$ is dropped. Once a variable is
dropped, it cannot be used anymore, and we select the next variable from the remaining
candidate variables. Again, the procedure is terminated when the partial F-statistic does not
exceed $F_{in}$ or when the last candidate variable is added to the model.
The two sequential procedures described above start with no variable in the model and add
one variable at each step. Backward elimination works in the opposite way. It begins with the
full model of all variables and deletes one variable at a time. First, the partial F-statistic for
each variable is computed. The smallest partial F-statistic is selected and compared with
the cut-off value $F_{out}$, which in this first step is computed for the number of terms in the full model.
If the partial F-statistic for the selected variable
is less than $F_{out}$, this variable is dropped. The backward elimination algorithm stops when the smallest
partial F-statistic is greater than the cut-off value $F_{out}$. These three feature selection methods are
classic and are available in most commercial packages.
We also compare our results with the results of two other studies, Wang, Hedar et al. (2012)
and Gönen, Gönen, and Gürgen (2012), which report the features in their selected subsets.
Under this circumstance, each subset can be evaluated under the same experimental condition, where
accuracy rates are all derived from a 10-fold cross validation SVM. The results are shown in
Table 7.
The accuracy rate given by the subset of size 7 from the presented method, 87.68%, is
slightly higher than that given by the subset of size 7 in Wang, Hedar et al. (2012), which is
87.39%. The highest accuracy rate from Gönen et al. (2012) is 87.97%, given by the subset of
size 9, while the BMTS method gives the same accuracy rate with a subset of size 8. The accuracy
rates based on the subsets selected by the classic methods are much lower than the results from
our method.
Table 7
OCAR of selected subsets for Australian case and comparison.
Reference                   Method          No. of Features   OCAR
Proposed method             BMTS+SVM        5                 87.54%
                                            5                 87.25%
                                            7                 87.68%
                                            8                 87.97%
Wang, Hedar et al. (2012)   RSFS            7                 87.39%
Gönen et al. (2012)         PNS/PGNS        11                87.39%
                            PS              9                 87.97%
                            PGS             7                 87.10%
                            MKLNS/MKLGNS    12                87.39%
                            MKLS/MKLGS      9                 86.23%
Classic                     FS+SVM          7                 78.70%
                            BE+SVM          9                 78.99%
                            Stepwise+SVM    7                 78.99%
The BMTS method performs well in the case of the Australian dataset. The accuracy
rate is as high or higher, and the number of variables is smaller, when compared with the results from other
studies and classic methods. We also provide the comparison of accuracy rates, shown in Table
8, between our method and the rates in Chen and Li (2010) and Huang et al. (2007) for the
German dataset. While the accuracy rates in this study are higher than the rates from all
methods in Chen and Li (2010), there are no significant differences between our results and those
in Huang et al. (2007), except that the number of features selected in the subsets is smaller in the
presented method. In addition, we compare the OCAR based on our selected subsets with the rates
based on subsets selected by the classic methods, and the results are given in Table 9. The
accuracy rates are very close to each other, except that one of the subsets given by the BMTS method
contains 9 variables, which is fewer than all the subsets given by the classic methods.
Table 8
OCAR for German case and comparison.
Reference             Method             No. of Features   OCAR
Proposed method       BMTS+SVM           9                 77.40%
                                         12                77.50%
Chen and Li (2010)    LDA+SVM            12                76.10%
                      DT+SVM             12                73.70%
                      RST+SVM            12                75.60%
                      Fscore+SVM         12                76.70%
Huang et al. (2007)   Grid+Fscore+SVM    20.4              77.50%
                      GA+SVM             13.3              77.92%
Table 9
OCAR of selected subsets for German case and comparison.
Reference         Method          No. of Features   OCAR
Proposed method   BMTS+SVM        9                 77.40%
                                  12                77.50%
Classic           FS+SVM          11                77.50%
                  BE+SVM          14                77.40%
                  Stepwise+SVM    12                76.90%
In sum, the BMTS method compares favorably in terms of accuracy rate and the number of
selected features. Moreover, it provides flexibility in that a tradeoff between OCAR and the size
of the subset is available. This is very useful when the cost associated with data collection is high.
Taking the Australian dataset as an example, the highest accuracy rate, 87.97%, is associated with the
subset of size 8, whereas two subsets of size 5 give accuracy rates of 87.54% and 87.25%,
which are only slightly lower than the highest one. When the cost of data collection is high,
selecting a subset of size 5 is preferable to selecting the one with the highest accuracy
rate, because doing so reduces both the number of features and the cost of data collection with
no significant difference in OCAR.
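One simple way to operationalize this tradeoff is to pick the smallest subset whose OCAR lies within a tolerance of the best OCAR. The sketch below applies that rule to the rows of Table 4. The rule itself, the tolerance of 0.5 percentage points, and the function name are assumptions made for illustration and are not the selection criterion stated in this study.

```python
# (alpha, number of features, OCAR in %) rows copied from Table 4
TABLE_4 = [
    (0.0, 1, 55.51), (0.060546875, 2, 85.51), (0.3046875, 2, 85.80),
    (0.400390625, 3, 85.65), (0.419921875, 3, 85.80), (0.5078125, 4, 85.94),
    (0.59765625, 5, 86.96), (0.703125, 5, 87.54), (0.71875, 5, 87.25),
    (0.75, 6, 86.52), (0.779296875, 7, 87.68), (0.802734375, 7, 86.52),
    (0.826171875, 8, 87.97), (0.830078125, 8, 86.81), (0.837890625, 9, 86.81),
    (0.86328125, 10, 87.39), (0.880859375, 11, 87.25), (0.8984375, 11, 86.96),
    (0.90234375, 12, 87.39), (0.955078125, 13, 87.39), (1.0, 14, 87.10),
]

def smallest_within(rows, tolerance):
    """Smallest subset whose OCAR is within `tolerance` points of the best OCAR.

    Ties in size are broken in favour of the higher OCAR.
    """
    best = max(ocar for _, _, ocar in rows)
    eligible = [row for row in rows if best - row[2] <= tolerance]
    return min(eligible, key=lambda row: (row[1], -row[2]))

print(smallest_within(TABLE_4, tolerance=0.5))
```

With a tolerance of 0.5 points the rule selects the size 5 subset with an OCAR of 87.54%, in line with the discussion above; a tolerance of 0 returns the size 8 subset with the highest OCAR.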
CHAPTER V
APPLICATION OF THE CREDIT SCORING AT CORPORATE LEVEL
5.1 Reviews of Applications of Credit Scoring at Corporate Level
In this chapter, we apply the proposed method (BMTS+SVM) to the credit scoring problem at the
corporate level. Credit scoring at this level comprises the assessment of risk, such as the
probability of default, bankruptcy or fraud, associated with lending to an organization (Paleologo
et al., 2010).
A well-known application of corporate credit scoring is bankruptcy prediction or
classification. An early study by Altman (1968) used financial ratios and discriminant analysis
for corporate bankruptcy prediction. The 22 potential financial ratios were grouped into five ratio
categories, namely profitability, liquidity, solvency, activity, and leverage ratios. By following
four criteria including statistical significance of alternative functions, inter-correlations between
the relevant variables, predictive accuracy, and expert opinion, Altman (1968) selected 5 features
(financial ratios) to predict corporate bankruptcy with discriminant analysis. Frydman et al.
(1985) analyzed the financial distress of firms with 20 financial ratios, and they introduced the
recursive partitioning algorithm (RPA), a nonparametric technique based on pattern recognition, to
improve the classification accuracy.
Based on previous studies and inter-correlation, Leshno and Spector (1996) filtered 29
financial parameters from 70 ratios, and evaluated the prediction capability of various neural
network models that differed in terms of data span, number of iterations, and neural
network architecture. McKee and Lensberg (2002) used a rough sets model to identify variables
that are important for the prediction, and developed a structural model of bankruptcy solved with
a genetic programming algorithm. Ryu and Yue (2005) used simple feature reduction techniques
such as stepwise discriminant analysis, sequential elimination, and mutual information based
feature selection to choose features from 23 financial ratios, and they introduced a linear
programming technique called isotonic separation to separate bankrupt and non-bankrupt firms.
Shin et al. (2005) selected 52 variables out of more than 250 financial ratios using an
independent-samples t-test in the first stage. They further selected 10 variables by the MDA
stepwise method, and evaluated the predictive performance of bankruptcy with SVM.
Min and Lee (2008) employed Data Envelopment Analysis (DEA) for bankruptcy prediction.
57 features were classified into the categories of profitability, growth, productivity, liquidity,
activity, and cost structure, and six final financial ratios were chosen by using factor analysis and
the judgment of experts. DEA scores, ranging from 0 to 100, were reported to specify financial
performance. While the best firms have a DEA score of 100, a firm with a lower DEA score is
considered to be relatively worse than other firms, and thus to have a higher probability of bankruptcy.
Etemadi et al. (2009) selected 5 financial ratios out of 43 candidate ratios with a stepwise
discriminant procedure. Prediction of corporate bankruptcy was then conducted using a genetic
programming model. Min and Jeong (2009) identified 9 variables from 27 financial ratios based
on various feature selection methods such as the independent-samples t-test, discriminant analysis,
logistic regression, and decision trees. They proposed a binary classification method, solved with
a genetic approach, to classify observed firms as bankrupt or non-bankrupt according to the
distance between a representative firm and the observed firms.
Olson, Delen, and Meng (2012) illustrated their preference for using decision trees to predict
corporate failure. They argued that decision trees could provide models with transparency and
transportability as well as accuracy. Fedorova, Gilenko, and Dovzhenko (2013) first selected 75
financial ratios from 98 ratios with an ANOVA test, and then applied different combinations of
learning algorithms, including multiple discriminant analysis, logit regression, and classification
and regression trees, to identify the final financial ratios. These ratios were evaluated by two types
of artificial neural networks to derive the classification accuracy rate for bankruptcy prediction.
Table 10 lists the financial ratios used in some of the aforementioned studies. A review of
bankruptcy prediction in banks and firms via statistical and intelligent techniques by Kumar and
Ravi (2007) is another good source that provides lists of financial ratios used in bankruptcy studies.
Table 10
Financial ratios in bankruptcy prediction literature.
Authors (Year)    Financial Ratios    Sample Ratios
Altman (1968) Working capital/Total assets; retained earnings/Total assets; EBIT/Total
assets; Market value equity/Book value of debt; Sales/Total assets
1:1
Frydman et al.
(1985)
Cash/Total assets; Cash/total sales; Cash flow/total debt; Current
assets/Current Liabilities; Current assets/total assets; Current assets/total
sales; EBIT/total assets; Log (interest Coverage + 15); Log (total assets);
Market value of equity/total capitalization; Net income/total assets; Quick
assets/total assets; Quick assets/current liabilities; Quick assets/sales;
Retained earnings/total assets; Standard deviation of (EBIT/total assets);
Total debt/total assets; Total sales/total assets; Working capital/total assets;
Working capital/total sales
2.5:1
Leshno and Spector
(1996)
Working capital/total sales; Retained earnings/total assets; Earning before
income tax/total assets; Market value/total liabilities; Sales/ total assets;
EBIT per share; Cash flow per share; Cost of goods sold/sales; Capital
expenditures per share; Sales/cash; Receivables turnover; Inventory
turnover; ROE; ROI; Investments/assets; Long term debt/total liabilities;
Debt/equity; Long term debt/equity; Quick ratio; price/earnings ratio;
Dividend yield; Total debt/total assets; Quick assets/sales; Sales/total
capital; Log (total assets); Interest coverage; Log (interest coverage);
Earning/5 years maturity; Cash flow/total debt; Working capital/long term
debt; Working capital/cash expenses; Book equity/total capital; Market
equity/total capital; Average market equity/total capital; StDv (log
(EBIT/total assets)); Sales/gross fixed assets; Sales/receivables; ROA; Total
debt/invested capital; Current ratio; Worth/total debt; Net income/total debt;
Operating income/sales; EBIT/total tangible assets; Net available for capital
/total capital; Sales/total tangible assets; EBIT/sales; Current liabilities/total
liabilities; Net available for total capital/sales; Fixed charge coverage; Cash
flow/Fixed charges; earning/total debt; retaining earning/tangible assets;
Capital lease/total assets
1:1
McKee and
Lensberg (2002)
General & Administration expense/net sales; Net income/net worth; Current
assets/current liabilities; Liabilities/total assets; Net worth/net fixed assets;
Working capital/net worth; Net income/total assets; Cash/current liabilities;
Investment cash flow/net income
1:1
Ryu and Yue
(2005)
Cash flow/total assets; cash/sales; cash flow/total debt; current assets/current
liabilities; current assets/total assets; current assets/sales; EBIT/total assets;
Retained earnings/total assets; Net income/total assets; Total debt/total assets;
Sales/total assets; Working capital/total assets; Working capital/sales; Quick
assets/total assets; Quick assets/current liabilities; Quick assets/sales; Market
value of equity/total capitalization; Cash/current liabilities; Current
liabilities/equity; Inventory/sales; Equity/sales; Market value of equity/total
debt; Net income/total capitalization
1:1
Shin et al. (2005) Total asset growth; Contribution margin; Operating income to total asset; Fixed
asset to sales; Owner’s equity to total asset; Net asset to total asset; Net loan
dependence rate; Operating asset constitute ratio
1:1
Etemadi et al.
(2009)
EBIT/total assets; Long term debt/Shareholders’ equity; Retained
earnings/stock capital; Market value of equity/total liabilities; Market value
equity/shareholders’ equity; Market value equity/total assets; Cash/total assets;
Total liabilities/total assets; Current liabilities/shareholders’ equity; Current
liabilities/total liabilities; (Cash + short term investments)/current liabilities;
(Receivables + inventory)/total assets; Receivables/sales;
Receivables/inventory; Shareholders’ equity/total liabilities; Shareholders’
equity/total assets; Current assets/current liabilities; Quick assets/current
liabilities; Quick assets/total assets; Fixed assets/(shareholders’ equity + long
term debt); Fixed assets/total assets; Current assets/total assets; Cash/current
liabilities; Interest expenses/gross profit; Sales/cash; Sales/total assets; Working
capital/total assets; paid in capital/shareholders’ equity; Sales/working capital;
Retained earnings/total assets; Net income/shareholders’ equity; Net income/
sales; Net income/total assets; Operational income/sales; Operational
income/total assets; EBIT/interest expenses; EBIT/sales; Gross profit/sales;
Sales/shareholders’ equity; Sales/fixed assets; Sales/current assets
1:1
Min and Jeong
(2009)
Gross value added/sales; Gross value added/total assets; Growth rate of total
assets; Ordinary income/sales; Net income/sales; Operating income/sales;
Costs of sales/sales; Net interest expenses/sales; Ordinary income/total assets;
Rate of earnings on total capital; Net working capital/total assets; Current
liabilities/total assets; Stockholders’ equity/total assets; Total borrowings and
bonds payable/total assets; Total assets turnover; Ordinary income/total assets;
Net working capital/sales; Stockholders’ equity/sales; Ordinary income/total
assets; Depreciation expenses; Operating assets turnover; Interest expenses/total
expenses; Net interest expenses; Break-even point ratio; Employment costs;
Interest expenses and net income/total assets; Earnings before interest and
taxes/sales
1:1
Fedorova et al.
(2013)
Cash flow/total liabilities; Cash flow/equity; Cash flow/total sales; Cash
flow/total assets; Cash flow/equity; Cash flow/current liabilities; Cash flow/total
assets; Cash flow/total sales; Cash flow/current liabilities; Gross profit/total
sales; Gross profit/total assets; EBT/total liabilities; Profit on sales/total sales;
Profit on sales/total assets; Net income/total liabilities; EBT/total sales;
EBT/total assets; Profit on sales/current liabilities; Gross profit/cost of goods
sold; Profit on sales/equity; Net profit/current liabilities; Profit on sales/cost of
goods sold; Gross profit/total liabilities; Gross profit/current liabilities;
EBT/cost of goods sold; Gross profit/equity; Profit on sales/total liabilities; Net
profit/cost of goods sold; Sales/fixed assets; Sales/equity; (Cost of goods sold -
depreciation)/accounts payable; Sales/current assets; Sales/total liabilities; (Cost
of goods sold - depreciation)/inventories; Sales/(cash + invested funds);
Sales/current liabilities; Sales/(cash + invested funds + accounts receivable);
Sales/accounts receivable; Sales/working capital; Cost of goods sold/finished
goods; Cash/current liabilities; Short-term accounts receivable/accounts
payable; (Cash + invested funds)/(costs/365); (Equity - fixed assets)/current
assets; Quick assets/(costs/365); Quick assets/total assets; Long-term
liabilities/equity; Cash/total assets; Quick assets/current assets; Current
assets/total liabilities; Cash/current assets; Short-term liabilities/total liabilities;
Current assets/total assets; Revenue reserves/equity; Long-term liabilities/fixed
assets; (Cash + invested funds)/total assets; Revenue reserves/total assets;
Long-term liabilities/total liabilities; (Equity + long-term liabilities)/total assets;
Revenue reserves/total liabilities; Current liabilities/total liabilities; Working
capital/inventories; Long-term liabilities/total assets; Accounts payable/total
liabilities; Retained earnings/equity; Fixed assets/total assets; Accounts
payable/accounts receivable; Log (tangible total assets); Debt/total assets; Profit
before tax/current liabilities; Working capital/total debt; Equity/total liabilities;
Working capital/total assets; Log (EBIT)/interest; Net profit/costs; Retained
earnings/total assets; EBT/equity; Current liabilities/(cash + invested funds);
Sales/total assets; EBIT/total assets; Total assets/sales; Cash flow/total debt;
No-credit interval; Current liabilities/total assets; Net profit/equity
6:1
5.2 A Study of Credit Scoring for the U.S. and Chinese Companies
Likewise, this study uses the proposed method as a tool to identify key financial
factors from a pool of financial ratios. These selected key factors are considered to have the best
discriminating power in classifying companies into two groups. We start with 40 features
(financial ratios) for companies from the U.S. and China respectively. The features are grouped
into 7 categories, and all the companies are classified as either creditworthy companies
(CWCs) or less creditworthy companies (LCWCs) according to criteria described later in this
section. Aside from the goal of identifying key financial categories and features, the study in this
chapter also discusses the predictive performance and evaluation of the classification models.
The 7 financial categories are in line with the financial ratio categories provided by the GTA
database: cash flow ratios, profitability ratios, liquidity ratios, solvency ratios,
shareholders’ profitability ratios, operating ratios, and leverage ratios. The GTA database is a
leading global provider of China financial market, industry, and economic data. It also provides
financial analytics, financial education, and related value-added services to financial institutions
(e.g. Morgan Stanley Composite Index (MSCI), the China Securities Regulatory Commission,
etc.), business schools (e.g. Wharton Business School, Harvard Business School, and University
of Chicago, etc.), and individual investors.
Table 11
Financial ratios for the U.S. and Chinese companies.
Category                                  Description of Features
Cash Flow Ratios (5)                      X1: OANCF/LCT; X2: OANCF/DT; X3: CH/SALE; X4: OANCF/NI;
                                          X5: OANCF/CSHI
Profitability Ratios (9)                  X6: NI/SALE; X7: EBIT/SALE; X8: EBIT/AT; X9: NI/AT;
                                          X10: NI/TEQ (C), NI/(AT-LT) (U); X11: TXT/GP;
                                          X12: GP/XT (C), GP/(SALE-EBIT) (U); X13: COGS/SALE;
                                          X14: FEXP/SALE (C), EXP/SALE (U)
Liquidity Ratios (4)                      X15: (ACT-INVT)/LCT; X16: ACT/LCT; X17: (ACT-LCT)/ACT;
                                          X18: WCAP/AT
Solvency Ratio (8)                        X19: LT/AT; X20: (NI+TXT+FEXP)/FEXP (C), EBIT/EXP (U);
                                          X21: LT/CEQ; X22: MKVALT/LT; X23: DT/AT; X24: CEQ/AT;
                                          X25: LCT/LT; X26: DLTT/LT
Shareholder’s Profitability Ratios (5)    X27: PRCC/EPSPX; X28: PRCC/NAVPS (C), PRCC/(AT-LT) (U);
                                          X29: SALE/CSHI; X30: MKVALT/AT; X31: PRCC*/AT
Operating Ratios (7)                      X32: SALE/RECT; X33: INVT/SALE;
                                          X34: OPC/PAYT (C), (SALE-GP)/LCT (U); X35: (ACT-INVT)/SALE;
                                          X36: SALE/FA, SALE/(AT-ACT) (U); X37: SALE/AT; X38: SALE/CEQ
Leverage Ratios (2)                       X39: EBIT/FEXP (C), EBIT/(EBIT-EXP) (U); X40: GP/EBIT
ACT: Total Current Assets; AT: Total Assets; CEQ: Total Common/Ordinary Equity; CH: Cash; COGS: Cost of Goods Sold;
CSHI: Common Shares Issued; DLC: Total Debt in Current Liabilities; DLTT: Total Long-Term Debt; DT: Total Debt
(DT=DLC+DLTT); EBIT: Earnings Before Interest and Taxes; EPSPX: Earnings Per Share (Basic) Excluding Extraordinary
Items; EXP: Expense (EXP=EBIT-NI-TXT); FA: Fixed Assets; FEXP: Financial Expense; GP: Gross Profit (Loss); INVT:
Total Inventories; LCT: Total Current Liabilities; LT: Total Liabilities; MKVALT: Total Market Value; NAVPS: Net Asset
Value per Share; NI: Net Income (Loss); OANCF: Operating Activities Net Cash Flow; OPC: Operating Costs; PAYT: Total
Payables; RECT: Total Receivables; PRCC: Price Close; SALE: Sales; TEQ: Stockholders’ Equity; TXT: Income Taxes;
WCAP: Working Capital; XT: Total Expense
Note: Due to availability of the data, the same feature is slightly different between the U.S. and Chinese companies in some
cases. The letters “U” and “C” in the parentheses indicate the ratios for the U.S. and China respectively.
The 40 financial ratios for Chinese companies are selected from the 7 categories in the GTA
database. The 40 financial ratios for the U.S. companies are computed from the financial
indicators collected from the COMPUSTAT database to match the ratios in the Chinese case.
These financial ratios are also grouped into the same 7 categories. A brief introduction to the 7
financial categories is given below, and the 40 financial ratios are shown in Table 11.
Descriptive statistics for each of the 40 financial ratios in the U.S. and Chinese datasets are
given in Appendix B and Appendix C respectively. The normality of each feature is tested
with skewness and kurtosis. While skewness is a measure of symmetry, kurtosis is a measure
of whether the data are peaked or flat relative to a normal distribution. Values of 0 for both
skewness and (excess) kurtosis indicate a normal distribution. The descriptions show that no
feature is normally distributed in either dataset.
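As a minimal sketch of the normality screen described above, the following computes moment-based skewness and excess kurtosis for a single feature. The estimator (population moments, with 3 subtracted from the fourth standardized moment so that a normal distribution scores 0) is an assumption; the text does not state which formula or software was used, and the simulated data are purely illustrative.

```python
import random
from typing import Sequence, Tuple


def skew_and_excess_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Moment-based skewness and excess kurtosis (population form).
    Both are 0 for a perfectly normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Simulated features (not taken from Appendix B or C).
rng = random.Random(0)
normal_like = [rng.gauss(0.0, 1.0) for _ in range(20000)]
right_skewed = [rng.expovariate(1.0) for _ in range(20000)]

normal_stats = skew_and_excess_kurtosis(normal_like)
skewed_stats = skew_and_excess_kurtosis(right_skewed)
print(normal_stats, skewed_stats)
```

Financial ratios often behave like the right-skewed sample, which is consistent with the finding that no feature passes the screen.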
1. The cash flow ratios category is used to determine companies’ ability to generate
cash from their operating activities. This category is important because companies can make
themselves look profitable through accounting choices and non-cash
transactions such as sales on credit, yet be at financial risk if they generate little
cash from these profits. Therefore, ratios in this category give a better understanding of
the financial health and performance of companies. The OANCF/NI ratio, for example,
compares companies’ net cash flow from operating activities to their net income, indicating
how much cash they generate from net income and how much
cash they have to cover obligations.
2. The profitability ratios category explains how well companies employ their
resources to generate profit. Companies with higher gross profit margins or returns on
capital have a better chance of surviving an economic downturn than those with razor-thin
margins or returns on capital. NI/SALE, for instance, measures the net income earned on
each dollar of sales. Another widely used ratio is NI/AT,
the return on assets (ROA) ratio, which illustrates how well management
utilizes the company's total assets to make profits.
3. The liquidity ratios category reflects companies’ ability to meet their short-term
debt obligations. Generally, higher values of these ratios indicate a larger margin of safety
for paying short-term debts. In contrast, a company with low coverage of short-term debts
by liquid assets may have difficulty running its operations and meeting its obligations.
Two common liquidity ratios are ACT/LCT (current ratio), which measures companies’
ability to meet their current liabilities with their current assets such as cash, accounts
receivable, and inventories, and (ACT-INVT)/LCT (quick ratio), which measures
companies’ ability to pay their short-term obligations with their most liquid assets.
4. The solvency ratios category reflects companies’ capacity to meet their long-term
financial commitments. The stronger a company’s solvency, the lower the
probability that it will default on its long-term debt obligations. An example of a
solvency ratio is DT/AT, which measures what percentage of a company’s assets is
financed with debt. A higher value of this ratio indicates a greater risk that the company
will be unable to pay off its long-term obligations.
5. The shareholders’ profitability ratios category is considered part of the profitability
ratios but focuses more on companies’ ability to generate profit with
shareholders’ equity. PRCC/NAVPS, the ratio of closing price to net asset value per share,
is used to capture this ability.
6. The operating ratios category shows the efficiency of management and of companies’
operations in using their capital. The SALE/AT ratio (total asset turnover), for instance,
measures companies’ ability to use their assets to generate sales revenue. Another
example of an operating ratio is SALE/RECT (accounts receivable turnover), which measures
the effectiveness of companies in extending credit and collecting debts. Companies
should reassess their credit policies when this ratio is low in order to ensure the timely
collection of extended credit that is not earning interest for them.
7. The leverage ratios category shows the percentage of a company’s capital
structure that is made up of debt or liabilities owed to external parties. The financial
leverage ratio indicates the extent to which the business relies on debt financing.
EBIT/(EBIT-Interest Expense) is an example of this ratio. In addition, the operating
leverage of a business is the ratio of the percentage change in EBIT to the percentage change
in sales. The computation of this ratio can be expressed as GP/EBIT.
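To illustrate how the ratios named in the seven categories are computed from raw statement items, here is a short sketch using the mnemonics of Table 11. The function name `example_ratios` and the figures for the firm are hypothetical; they are not drawn from the COMPUSTAT or GTA data.

```python
from typing import Dict


def example_ratios(f: Dict[str, float]) -> Dict[str, float]:
    """Compute a few of the ratios discussed above from raw statement items."""
    return {
        "OANCF/NI": f["OANCF"] / f["NI"],                      # cash flow ratio
        "NI/AT": f["NI"] / f["AT"],                            # return on assets
        "ACT/LCT": f["ACT"] / f["LCT"],                        # current ratio
        "(ACT-INVT)/LCT": (f["ACT"] - f["INVT"]) / f["LCT"],   # quick ratio
        "DT/AT": (f["DLC"] + f["DLTT"]) / f["AT"],             # DT = DLC + DLTT
        "SALE/AT": f["SALE"] / f["AT"],                        # total asset turnover
        "GP/EBIT": f["GP"] / f["EBIT"],                        # leverage ratio X40
    }


# Hypothetical firm, figures in millions.
firm = {
    "OANCF": 120.0, "NI": 80.0, "AT": 1000.0, "ACT": 400.0, "INVT": 150.0,
    "LCT": 200.0, "DLC": 50.0, "DLTT": 250.0, "SALE": 1600.0, "GP": 480.0,
    "EBIT": 160.0,
}
ratios = example_ratios(firm)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

A production version would also need to handle zero or negative denominators (for example, a firm with negative net income), which this sketch does not attempt.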
The classification of U.S. and Chinese companies as creditworthy or less creditworthy is
based on Standard & Poor’s credit ratings from COMPUSTAT and on the ST classification
respectively. The classification standard for the U.S. companies is based on the credit ratings of
Standard & Poor’s (S&P), one of the big three credit-rating agencies. The data can also be
obtained from COMPUSTAT. These ratings are S&P’s opinion about the ability and willingness
of issuers, e.g. corporations, to meet their financial obligations in full and on time. The ratings
also reflect the credit quality and the probability that the debt may default. Although S&P has
stated that rating opinions are not intended as guarantees of credit quality or as exact measures of
the likelihood that a particular issuer or particular debt issue will default, its studies on defaults
indicate a strong correlation between ratings and default frequencies. Generally, the higher the
rating is, the lower the frequency of default, and vice versa. In addition, one of S&P’s studies
has shown that issuers rated ‘B+’ or lower accounted for 61% of defaults over all 7-year
intervals between 1981 and 2010.
More specifically, the dichotomous classification standard for the U.S. companies in
our study rests on S&P’s long-term credit ratings, which are divided into several categories
ranging from ‘AAA’, indicating the strongest credit quality, to ‘D’ or ‘SD’, indicating the lowest
credit quality. Long-term ratings from ‘AA’ to ‘CCC’ may be modified by the addition of a
plus or minus sign to show relative standing within the major rating categories. The definitions
of all the ratings are shown in Appendix D.
These well-known long-term credit ratings from S&P are used as the classification standard,
where companies with obligations rated ‘B’ and above are classified into the creditworthy group. It
is believed that an obligor with an obligation rated ‘B’ still has the capacity to meet its financial
commitment on the obligation, though such an obligation is more vulnerable to default than
obligations rated above ‘B’. According to the description of S&P’s credit ratings, obligations
rated ‘CCC’ are vulnerable to nonpayment, and in the event of adverse business,
financial, or economic conditions, the obligors are unlikely to have the capacity to meet their
financial commitment on the obligation. Therefore, it is reasonable to consider the U.S.
companies with obligations rated ‘B’ and above as the creditworthy group, denoted with 0, while
corporations with obligations rated below ‘B’, starting from ‘CCC’, are classified into the less
creditworthy group, denoted with 1. For the U.S. companies, we use the credit rating, which is
quarterly based, at the last quarter of year , and the financial ratios corresponding to the credit
ratings are obtained from year .
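The labeling rule for the U.S. companies can be sketched as a small lookup. Two points are assumptions rather than statements from the text: plus and minus modifiers are stripped so that a rating is judged by its major category (so ‘B-’ is treated as ‘B’), and the intermediate categories ‘CC’ and ‘C’ are placed in the less creditworthy group because they rank below ‘CCC’. The function name `label_from_rating` is hypothetical.

```python
# Major long-term rating categories, split at the 'B' / 'CCC' boundary
# described in the text.
CREDITWORTHY = {"AAA", "AA", "A", "BBB", "BB", "B"}
LESS_CREDITWORTHY = {"CCC", "CC", "C", "D", "SD"}


def label_from_rating(rating: str) -> int:
    """Return 0 for creditworthy (CWC) and 1 for less creditworthy (LCWC).

    Assumption: '+' and '-' modifiers are ignored, so the decision is made
    on the major rating category only.
    """
    major = rating.strip().upper().rstrip("+-")
    if major in CREDITWORTHY:
        return 0
    if major in LESS_CREDITWORTHY:
        return 1
    raise ValueError(f"unrecognized rating: {rating!r}")


labels = {r: label_from_rating(r) for r in ["AAA", "BB+", "B", "B-", "CCC+", "D", "SD"]}
print(labels)
```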
For Chinese companies, the ST classification standard is used, where ST stands for Special
Treatment. The original idea behind the ST classification is to warn investors to be cautious
about companies labeled ST due to their abnormal financial conditions, according to the rules
issued by the China Securities Regulatory Commission (CSRC), a ministry-level unit directly under
the State Council. ST companies usually face the problems of low profitability and higher default
risk on their debts, and thus can be considered less creditworthy companies (Lü & Zhao, 2004;
Xiong, 2013). Therefore, it is reasonable to use ST and non-ST status as the classification standard
to group Chinese companies into creditworthy companies, denoted with 0, and less creditworthy
companies, denoted with 1. Now suppose a company is announced as ST in year t. The financial
ratios used for classification corresponding to this ST or non-ST status are obtained from year t-2.
The reason is that, according to the disclosure policy for Chinese listed companies, the
announcement that a company is ST in year t is mainly based on its financial performance in
year t-1, and thus using financial ratios from year t-1 to predict the ST status in year t would
overestimate the predictive power of a model. Consequently, we use the
ST status of a company in year t while the matching financial ratios are derived from year
t-2. For example, the financial ratios for a company from 2006 are used to
predict its ST or non-ST status in 2008. The status of ST or non-ST for a company
is derived from the GTA database as well.
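The two-year gap between ratios and status, as in the 2006-to-2008 example, can be sketched as a simple join. The data layout (dictionaries keyed by company and year), the function name `build_samples`, and the sample records are all hypothetical; only the lag of two years comes from the text.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (company id, year)
LAG_YEARS = 2          # ratios from two years before the status year


def build_samples(
    status: Dict[Key, int], ratios: Dict[Key, List[float]]
) -> List[Tuple[str, int, List[float], int]]:
    """Pair each ST (1) / non-ST (0) status with the ratios observed
    LAG_YEARS earlier; statuses without matching ratios are dropped."""
    samples = []
    for (company, year), label in sorted(status.items()):
        features = ratios.get((company, year - LAG_YEARS))
        if features is not None:
            samples.append((company, year, features, label))
    return samples


# Hypothetical records: company A is ST in 2008, company B is not.
status = {("A", 2008): 1, ("B", 2008): 0, ("C", 2008): 1}
ratios = {
    ("A", 2006): [0.12, 1.8],
    ("A", 2007): [0.02, 0.9],  # not used: one year before the status year
    ("B", 2006): [0.30, 2.4],
}
samples = build_samples(status, ratios)
print(samples)
```

Skipping the year immediately before the announcement keeps the information that triggered the ST label out of the predictors.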
We start with 40 features, all numeric financial ratios, for both the U.S. and Chinese
companies. All of the companies are from the nonfinancial sector. The U.S. dataset includes 238
corporations and 297 observations from 1999 to 2011. The Chinese dataset contains 593
corporations and 900 observations from 1998 to 2010. The descriptions of the U.S. and Chinese
datasets are given in Table 12. The ratio of the number of CWCs to LCWCs is set to 2:1. There is
no conclusive evidence on what this ratio should be, as we can see from the studies listed in
Table 10. While most studies used 1:1, some other studies used different ratios, such as 6:1 and
2.5:1. In practice, however, the number of companies with obligations rated ‘B’ or above is larger
than the number rated below ‘B’ in the United States. Also, the number of non-ST companies is
larger than the number of ST companies in China. In order to reflect this reality while avoiding
the problem of extremely unbalanced sample sizes between the two classes, we set the ratio to 2:1.
Table 12
Description of the U.S. and Chinese datasets.
Country   No. of Features   Features Property   No. of Classes   No. of Companies   Sample Size   CWCs   LCWCs
U.S.      40                Numeric             2                238                297           198    99
China     40                Numeric             2                593                900           600    300
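One way to obtain a 2:1 sample like those in Table 12 is to keep every LCWC observation and randomly draw twice as many CWC observations. The text does not say how the observations were actually drawn, so the sketch below, including the function name `sample_two_to_one` and the pool of 500 candidate CWC observations, is an assumption made purely for illustration.

```python
import random
from typing import List, Sequence, Tuple


def sample_two_to_one(
    cwc_ids: Sequence[str], lcwc_ids: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Keep all LCWC observations and randomly draw twice as many CWCs."""
    n_cwc = 2 * len(lcwc_ids)
    if n_cwc > len(cwc_ids):
        raise ValueError("not enough CWC observations for a 2:1 ratio")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(list(cwc_ids), n_cwc), list(lcwc_ids)


# Hypothetical pool sizes; 99 LCWCs matches the U.S. row of Table 12.
cwc_pool = [f"cwc_{i}" for i in range(500)]
lcwc_all = [f"lcwc_{i}" for i in range(99)]
cwc_sample, lcwc_sample = sample_two_to_one(cwc_pool, lcwc_all)
print(len(cwc_sample), len(lcwc_sample), len(cwc_sample) + len(lcwc_sample))
```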
In sum, this section conducts a study of the credit scoring problem for U.S. and Chinese
companies. The classification standard for CWCs and LCWCs is
based on S&P’s credit rating for the U.S. case and on ST status for the Chinese case. Forty
financial ratios, categorized into seven groups, act as the initial features from which satisfactory
subsets of features are selected by the proposed method. The flow chart in Fig. 16 depicts the
structure of the study in this chapter.
First, we try to provide some insights into the application of the proposed feature selection
method in identifying the key factors, in terms of financial categories and ratios, that provide the
best discriminating power to distinguish CWCs from LCWCs in both countries. Second, the
predictive performance, in terms of OCAR, of different classifiers, namely SVM, logistic
regression, discriminant analysis, decision tree, and neural networks, is evaluated on the best
subset among the satisfactory subsets.
Fig. 16. Structure of credit scoring study at corporate level
[Flow chart with two parallel branches. China: 40 features (financial ratios); data source: GTA;
classification standard: ST. U.S.: 40 features (financial ratios); data source: COMPUSTAT;
classification standard: S&P’s credit rating. Each branch contains the boxes: proposed subset
selection method; satisfactory subsets of features; key factors for classification in that country;
performance of different classifiers on the best subset.]
Following the same procedure as in the Australian and German credit scoring problems,
we obtain three satisfactory subsets of features, namely the top three subsets with the highest
overall classification accuracy rate (OCAR) that also have satisfying sizes
(the complete results for all subsets are given in Appendix E and Appendix F). The
features in the selected subsets, the number of features, and the OCAR for the U.S. and Chinese
cases are reported in Table 13 and Table 14 respectively.
The results are encouraging. Not only is the predictive performance, in terms of OCAR, of the
three satisfactory subsets selected by the proposed method better, but the subsets are also
far smaller than the full model, which includes all 40 features. In the U.S. case, the
OCARs of the three satisfactory subsets, with sizes 3 to 5, are 6.73 to 7.74 percentage points
higher than the rate derived from the full model. In the Chinese case, the OCARs of the three
satisfactory subsets, each of size 3, are 3.67 to 4.11 percentage points higher than the rate derived
from the full model.
Table 13
OCAR for the U.S. dataset and comparison.
Method          Features in selected subsets   No. of Features   OCAR
All+SVM                                        40                69.36%
BMTS+SVM                                       3                 77.10%
                                               4                 76.77%
                                               5                 76.09%
FS+SVM                                         6                 75.08%
BE+SVM                                         12                72.39%
STEPWISE+SVM                                   4                 71.38%
Table 14
OCAR for Chinese dataset and comparison.
Method          Features in selected subsets   No. of Features   OCAR
All+SVM                                        40                69.00%
BMTS+SVM                                       3                 73.11%
                                               3                 72.67%
                                               3                 72.67%
FS+SVM                                         8                 71.11%
BE+SVM                                         38                69.78%
STEPWISE+SVM                                   11                71.22%
We also compare the OCARs based on our selected subsets with the rates based on subsets
selected by the classic feature selection methods, namely forward selection (FS), backward
elimination (BE), and stepwise selection. The results give supportive evidence that the subsets
selected with the BMTS method give better predictive performance with fewer features in both
cases.
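The gains over the full model quoted earlier in this section can be checked directly from the OCAR columns of Table 13 and Table 14. The sketch below uses `Decimal` so that the two-decimal percentages subtract exactly; the variable names are illustrative.

```python
from decimal import Decimal

# OCAR (%) of the full 40-feature model and of the three BMTS+SVM subsets,
# copied from Table 13 (U.S.) and Table 14 (China).
full_model = {"U.S.": Decimal("69.36"), "China": Decimal("69.00")}
bmts = {
    "U.S.": [Decimal("77.10"), Decimal("76.77"), Decimal("76.09")],
    "China": [Decimal("73.11"), Decimal("72.67"), Decimal("72.67")],
}

# Gain in percentage points of each satisfactory subset over the full model.
gains = {
    country: [ocar - full_model[country] for ocar in bmts[country]]
    for country in full_model
}
for country, values in gains.items():
    print(country, "gain range:", min(values), "to", max(values))
```

These differences are absolute differences in accuracy rate (percentage points), not relative improvements.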
The best subset selected by the proposed method in the U.S. case has three features, one each
from the profitability ratios, the solvency ratios, and the leverage ratios. The second
and third best subsets, which are also small and give OCARs very close to the
best one, include two features from cash flow ratios, two from profitability ratios,
and two from solvency ratios. Therefore, we conclude that profitability, solvency, cash
flow, and leverage ratios are the four key financial categories, while the nine features listed in
Table 15 are the most representative features that provide the best
discriminating power to differentiate between CWCs and LCWCs in the United States.
This conclusion is supported by S&P’s criteria publications, which provide 8
key industrial financial ratios: EBIT interest coverage, EBITDA interest coverage, long
term debt to capital, funds from operations (FFO) to total debt, free operating cash flow (FOCF)
to total debt, return on capital, operating income to sales, and total debt to capital. Long term
debt to capital belongs to the solvency ratios, reflecting companies’ capacity to meet their long-term
financial commitments. FFO to total debt, FOCF to total debt, and EBITDA interest coverage
pertain to cash flow ratios, revealing companies’ ability to generate cash from their operating
activities. Return on capital, operating income to sales, and EBIT interest coverage are
profitability ratios, explaining how well companies employ their resources to generate profit.
Finally, total debt to capital is considered a leverage ratio, showing the percentage of a
company’s capital structure that is made up of debt.
Table 15 lists the 9 financial ratios selected with the proposed method and their corresponding
categories, together with the 8 financial ratios and their corresponding categories provided by
S&P’s publication. We conduct the comparison at the level of financial categories rather than
individual financial ratios because this study and S&P use different financial ratios.
The result is supportive in that the 9 financial ratios selected by the proposed method and the 8
financial ratios from S&P fall into the same four categories: cash flow,
profitability, solvency, and leverage ratios. This consistency between the categories derived
from our method and those provided by a widely recognized credit rating agency, S&P,
provides evidence that the proposed method can be applied to identify key factors. On the one
hand, financial institutions are able to gain a better understanding of the credit status of their
applicants by focusing on these key factors. On the
other hand, companies that attempt to borrow money from financial institutions are able to attain
a clear view of the most important factors for being considered a creditworthy
company, and of what they need to improve to increase their chance of receiving loans.
Table 15
Comparison of financial ratios between the U.S. dataset and S&P.
BMTS+SVM                                      S&P
Categories            Financial ratios        Categories            Financial ratios
Cash flow ratios      X4: OANCF/NI            Cash flow ratios      FFO/Total debt
                      X5: OANCF/CSHI                                FOCF/Total debt
                                                                    EBITDA interest coverage
Profitability ratios  X7: EBIT/SALE           Profitability ratios  Return on capital
                      X10: NI/(AT-LT)                               Operating income/Sales
                      X13: COGS/SALE                                EBIT interest coverage
Solvency ratios       X19: LT/AT              Solvency ratios       Long term debt/Capital
                      X21: LT/CEQ
                      X24: CEQ/AT
Leverage ratios       X39: EBIT/(EBIT-EXP)    Leverage ratios       Total debt/Capital
The supportive evidence from the U.S. case motivates us to apply the proposed method to
identify the key factors that differentiate between CWCs and LCWCs in China. The seven
financial ratios and their corresponding five categories for the Chinese case are listed in Table 16.
The results show that the four categories of cash flow, profitability, solvency, and leverage
ratios are key financial categories in predicting the classification of CWCs and LCWCs in both
countries. However, the operating ratios category is an additional useful category for separating
CWCs from LCWCs in China. Additional support for the identified categories as the key
financial categories is that if we select one key financial ratio from each of the key financial
categories, the resulting combination gives an even higher OCAR, 78.79%, in the U.S. case,
and a higher OCAR of 73.56% in the Chinese case.
Table 16
Comparison of financial ratios between the U.S. and Chinese dataset.
U.S.                                          China
Categories            Financial ratios        Categories            Financial ratios
Cash flow ratios      X4: OANCF/NI            Cash flow ratios      X1: OANCF/LCT
                      X5: OANCF/CSHI
Profitability ratios  X7: EBIT/SALE           Profitability ratios  X8: EBIT/AT
                      X10: NI/(AT-LT)                               X14: FEXP/SALE
                      X13: COGS/SALE
Solvency ratios       X19: LT/AT              Solvency ratios       X25: LCT/LT
                      X21: LT/CEQ                                   X26: DLTT/LT
                      X24: CEQ/AT
                                              Operating ratios      X37: SALE/AT
Leverage ratios       X39: EBIT/(EBIT-EXP)    Leverage ratios       X39: EBIT/FEXP
This result suggests that the gap in operating capacity between CWCs and LCWCs is not as
significant in the U.S. as it is in China. To verify this statement, we use ANOVA, which
determines whether there are any significant differences between the means of two or more
independent groups. We expect the difference between the means of CWCs and LCWCs for the
features in operating ratios to be insignificant in the U.S. and significant in China. The results
are shown in Table 17 and Table 18.
The result is consistent with our expectation. None of the ANOVA results for the seven
features in operating ratios is significant in the U.S., whereas the ANOVA results for five of the
seven features in operating ratios are significant in China. The P value for feature X37, the
operating ratio selected by our proposed method, is reported as .000, which is a very significant
result.
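As a rough illustration of the test described above, the F statistic of a one-way ANOVA can be computed by hand from the between-group and within-group sums of squares. The sketch below is not the SPSS procedure used in this study; the two small samples are made-up values for a single hypothetical operating ratio, and the function name is an assumption introduced only for this example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values of one operating ratio for CWCs and LCWCs (illustration only).
cwc = [1.2, 0.9, 1.1, 1.3]
lcwc = [0.7, 0.8, 0.6, 0.9]
f_stat, df_b, df_w = one_way_anova_f([cwc, lcwc])
print(f"F = {f_stat:.4f} with df = ({df_b}, {df_w})")
```

With two groups, df between groups is 1, as in Table 17 and Table 18. The significance level would then be read from the F distribution with these degrees of freedom, which this sketch leaves out.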
Table 17
ANOVA for features in operating ratios from the U.S. dataset.
Features                  Sum of Squares  df  Mean Square  F      Sig.
X32       Between Groups  2031.462        1   2031.462     1.597  .207
X33       Between Groups  .009            1   .009         .767   .382
X34       Between Groups  .322            1   .322         .051   .821
X35       Between Groups  .047            1   .047         .335   .563
X36       Between Groups  5.707           1   5.707        .492   .483
X37       Between Groups  1.383           1   1.383        1.863  .173
X38       Between Groups  119202.068      1   119202.068   2.735  .099
Table 18
ANOVA for features in operating ratios from Chinese dataset.
Features                  Sum of Squares  df  Mean Square  F       Sig.
X32       Between Groups  26561.069       1   26561.069    19.948  .000
X33       Between Groups  3.968           1   3.968        8.946   .003
X34       Between Groups  1547.700        1   1547.700     .651    .420
X35       Between Groups  35.167          1   35.167       37.316  .000
X36       Between Groups  407.082         1   407.082      3.330   .068
X37       Between Groups  5.177           1   5.177        26.290  .000
X38       Between Groups  12.277          1   12.277       3.939   .047
A possible explanation for the finding that the operating ratios category is a key financial
category in China but not in the U.S. lies in the different conditions and capacity for obtaining
financial sources to repay debts. In the category of operating ratios, almost all the financial
ratios are related to sales. Since China is still an emerging economy, its companies do not have
as many resources or as much access to commercial finance as companies in the United States.
In China, sales revenue is the most important source of finance for a company to pay its debts,
whereas U.S. companies have more sources of finance from which to raise funds to repay their
debts and need not rely merely on sales revenue. Therefore, the operating ratios category plays a
more important role in differentiating between CWCs and LCWCs in China than in the United
States.
To summarize, based on the data collected for the U.S. dataset, profitability, solvency, cash
flow, and leverage ratios are the four key financial categories, and 9 out of 40 features, namely
OANCF/NI, OANCF/CSHI, EBIT/SALE, NI/(AT-LT), COGS/SALE, LT/AT, LT/CEQ,
CEQ/AT, and EBIT/(EBIT-EXP), are the most useful financial ratios in their corresponding
financial categories for differentiating between CWCs and LCWCs in the U.S. case.
Similarly, the Chinese case has the same four categories plus the operating ratios category as key
financial categories, and 7 out of 40 financial ratios, namely OANCF/LCT, EBIT/AT,
FEXP/SALE, LCT/LT, DLTT/LT, EBIT/FEXP, and SALE/AT, are the most representative
features in their corresponding categories. The application of the findings is twofold. On one
hand, managers of financial institutions can pay more attention to the ratios in the key financial
categories, especially the most representative ratios selected with our proposed method, so that
they gain a better understanding of the credit status of their applicants before making any
further decisions. Managers should also be aware that key financial categories may vary across
countries. On the other hand, companies that attempt to borrow money from financial
institutions can gain a clear view of which financial factors matter most for being considered a
creditworthy company, and of what improvements are needed immediately to increase their
chance of receiving loans.
5.3 Model Predictive Performance and Evaluation
In this section, predictive performance of the models in classifying companies into either
CWCs or LCWCs is measured with overall classification accuracy rate (OCAR) and cost of
misclassification. The OCAR is computed based on the best subsets among the three satisfactory
subsets, namely , and for the U.S. companies, and , and for Chinese
companies. In addition, we compare the performance of five models using different classifiers:
SVM, logistic regression (LR), discriminant analysis (DA), decision trees (DT), and neural
networks (NN). We use SPSS to run LR, DA, and DT, where DT takes the form of
classification and regression trees (CART). SVM and NN are run in MATLAB. Finally, the
impact of the cutoff value on classification and the cost of misclassification associated with
Type I and Type II errors are also discussed. Cutoff values are important for two reasons. On
one hand, in most classification techniques the cutoff value determines whether a company is
assigned to one class or the other. On the other hand, the Type I and Type II errors used to
compute the misclassification cost depend on the cutoff value as well. In statistics, a Type I
error is the incorrect rejection of a true null hypothesis (a false positive), and a Type II error is
the failure to reject a false null hypothesis (a false negative). In this particular credit scoring
problem, a Type I error means that a CWC is misclassified as an LCWC, and a Type II error
means that an LCWC is misclassified as a CWC.
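To make these definitions concrete, the following minimal sketch computes OCAR and the two error rates from class counts. The function name and the counts are assumptions: a test set of 34 CWCs and 19 LCWCs with 4 and 6 misclassifications is not stated in the text, but was chosen because it yields percentages of the same form as those reported for the 53 U.S. test observations.

```python
def classification_rates(cwc_total, cwc_misclassified, lcwc_total, lcwc_misclassified):
    """Return (OCAR, Type I rate, Type II rate) from class counts.

    Type I rate: share of CWCs classified as LCWCs.
    Type II rate: share of LCWCs classified as CWCs.
    """
    type1 = cwc_misclassified / cwc_total
    type2 = lcwc_misclassified / lcwc_total
    correct = (cwc_total - cwc_misclassified) + (lcwc_total - lcwc_misclassified)
    ocar = correct / (cwc_total + lcwc_total)
    return ocar, type1, type2


# Assumed counts (not given in the text): 34 CWCs, 19 LCWCs in a 53-observation test set.
ocar, type1, type2 = classification_rates(34, 4, 19, 6)
print(f"OCAR = {ocar:.2%}, Type I = {type1:.2%}, Type II = {type2:.2%}")
```

Because OCAR weights the two error rates by class size, a model can reach a respectable OCAR while still misclassifying most of the smaller LCWC class.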
For the purpose of analyzing the predictive performance of the models, we randomly select
training data and testing data for each year from 1998 to 2012 for the two cases, with 244
training observations and 53 testing observations for the U.S. dataset, and 787 training
observations and 113 testing observations for the Chinese dataset. The ratio of the number of
CWCs to LCWCs for both the training and testing data is again set to 2:1. Table 19 gives a
description of these two datasets.
Table 19
Description of training and testing data for the U.S. and Chinese datasets.
Country  No. of Features  Features Property  No. of Classes  No. of Companies  Sample Size  Training Sample  Testing Sample
U.S.     3                Numeric            2               238               297          244              53
China    3                Numeric            2               593               900          787              113
In a standard procedure, training data are used to determine the parameters of a model, and
testing data are used to test the performance of the model. Table 20 summarizes the results of
SVM, LR, DA, DT, and NN, with all cutoff values set to 0.5 for the two cases. The model with
SVM achieves the highest OCAR for both the U.S. and Chinese cases.
Table 20
OCAR of five classifiers for the U.S. and Chinese datasets.
                SVM     LR      DA      DT      NN
U.S.   OCAR     73.58%  67.92%  66.04%  67.92%  67.92%
       Type I   5.88%   5.88%   5.88%   11.77%  5.88%
       Type II  57.89%  78.95%  84.21%  68.42%  78.95%
China  OCAR     71.68%  67.26%  68.14%  69.91%  67.25%
       Type I   9.59%   10.96%  6.85%   28.77%  13.70%
       Type II  62.50%  72.50%  77.50%  32.50%  67.50%
The cutoff value is 0.5 for both the U.S. and Chinese cases.
The results show that the model with SVM achieves the highest OCAR for both the U.S. and
Chinese cases, and that the OCARs of the other four models are close to one another. However,
the Type II error is very high for nearly all models (only DT in the Chinese case is below 50%),
which is undesirable in most situations because the cost of a Type II error is usually much
higher than that of a Type I error.
5.4 ROC Curve
A weakness of the above analyses of OCAR is that the selection of the cutoff value directly
affects the accuracy of the classification, and fixing the cutoff value at 0.5 can be arbitrary.
The Receiver Operating Characteristic (ROC) curve is introduced to overcome this weakness.
Given a binary classification problem in which the outcomes are labeled either as positive or
negative, there are four possible outcomes. If the actual value is positive and it is classified
as positive, it is called a true positive; if it is classified as negative, it is called a false
negative. Conversely, if the actual value is negative and it is classified as negative, it is called
a true negative; if it is classified as positive, it is called a false positive. The four outcomes
are laid out in a 2×2 contingency table as follows.
                       Positive condition                Negative condition
Positive test outcome  True positive                     False positive (Type I error) (1-sensitivity)
Negative test outcome  False negative (Type II error)    True negative
                       (1-specificity)
ROC graphs are two-dimensional graphs that illustrate the performance of a binary classifier
system by plotting the true positive rate (sensitivity) against the false positive rate
(1-specificity) for the different possible cutoff points of a diagnostic test (Fawcett, 2006). The
formulae for computing sensitivity and specificity are given below.
$$\text{Sensitivity} = \frac{\text{Positives correctly classified}}{\text{Total positives}}$$
$$\text{Specificity} = \frac{\text{True negatives}}{\text{False positives} + \text{True negatives}}$$
The empirical method for creating an ROC plot is to plot pairs of sensitivity versus
(1-specificity) at all possible values of the decision threshold. Accuracy is measured by the area
under the ROC curve (referred to as AUC). The AUC is an overall summary of diagnostic
accuracy, and the diagonal line is the ROC curve corresponding to random chance. An AUC of 1
represents a perfect test; an AUC of 0.5 represents a worthless test, since its ROC curve
corresponds to random chance. On rare occasions, the estimated AUC is less than 0.5, indicating
that the test does worse than chance. In other words, the closer the curve follows the left-hand
border and the top border of the ROC space, the more accurate the test is (Lasko, Bhagwat, Zou
& Ohno-Machado, 2005; Zou, Resnic, Talos, Goldberg-Zimring, Bhagwat, Haker, Kikinis, Jolesz
& Ohno-Machado, 2005). Values of AUC are reported in Table 21.
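The empirical construction described above can be sketched in a few lines: sweep the threshold over the observed scores, record one (1-specificity, sensitivity) pair per threshold, and integrate with the trapezoidal rule. This is an illustrative sketch, not the SPSS ROC procedure used in this study; the scores and labels are made-up values, with label 1 standing for the positive class.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs from (0, 0) to (1, 1)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a case is classified positive if score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores and true labels (illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 1, 0]
points = roc_points(scores, labels)
area = auc(points)
print(f"AUC = {area:.4f}")
```

The resulting area equals the share of positive-negative pairs in which the positive case receives the higher score (11 of 15 pairs in this toy sample), which is why an AUC of 0.5 corresponds to random ranking.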
Table 21
AUC of different classifiers for the U.S. and Chinese datasets.
       SVM   LR    DA    DT    NN
U.S.   .811  .741  .737  .599  .732
China  .766  .758  .760  .741  .768
The test result variable(s): DT, NN has at least one tie between the positive
actual state group and the negative actual state group.
The results indicate that the model using SVM provides the highest AUC, 0.811, in the U.S.
case, and thus has the best overall summary of diagnostic accuracy. For the Chinese dataset,
however, the AUCs of the five models are close to one another (0.741 to 0.768), and the model
using SVM does not stand out from the models using other classifiers. Fig. 17 and Fig. 18 show
the ROC curves of the 5 models for the U.S. and Chinese cases respectively.
Fig. 17. ROC for the U.S. dataset
Fig. 18. ROC for Chinese dataset
Since sensitivity, or the true positive rate, measures the proportion of positives correctly
classified, whereas specificity, or the true negative rate, measures the proportion of negatives
correctly classified, a cutoff point corresponding to a maximized sum of sensitivity and
specificity gives the highest OCAR. Lists of pairs of sensitivity and 1-specificity for each
classifier are given in Appendix G and Appendix H.
The results with the new cutoff values and their corresponding OCARs are shown in Table 22.
With the new cutoff values, the OCAR increases for every model in both cases except the
decision tree in the U.S. case, whose OCAR remains 67.92%. In the U.S. case, when the cutoff
value changes from 0.5 to 0.3062, the OCAR of SVM increases from 73.58% to 81.13%. Though
the Type I error increases by 5.88 percentage points, the Type II error drops by 26.31
percentage points. The situation is similar for the Chinese case: when the cutoff value changes
from 0.5 to 0.252, the OCAR of SVM increases from 71.68% to 73.45%, the Type I error
increases by 21.92 percentage points, and the Type II error drops by 45 percentage points. Most
models using other classifiers exhibit similar changes. The reason for reporting Type I and
Type II errors is discussed in the next section on misclassification cost.
Table 22
OCAR in new cutoff values of five classifiers for the U.S. and Chinese datasets.
                SVM     LR      DA      DT      NN
U.S.   Cutoff   0.3062  0.2925  0.2819  0.4395  0.2992
       OCAR     81.13%  73.58%  73.58%  67.92%  73.58%
       Type I   11.76%  23.53%  23.53%  11.77%  23.53%
       Type II  31.58%  31.58%  31.58%  68.42%  31.58%
China  Cutoff   0.2520  0.3611  0.3560  0.3761  0.3963
       OCAR     73.45%  74.34%  74.34%  70.80%  75.22%
       Type I   31.51%  26.03%  24.66%  32.88%  27.40%
       Type II  17.50%  25.00%  27.50%  22.50%  20.00%
5.5 Misclassification Cost
Though the overall classification accuracy rate is an important criterion in evaluating the
predictive performance of a credit scoring model, misclassification cost is an effective and
relatively more comprehensive way to assess a model (West, 2000). Here, we employ
equation 5.1 from Lee and Chen (2005) to compute the expected misclassification cost for the
five models in each case:
$$\min\; EC = P(1)\,P(2 \mid 1)\,C(2 \mid 1) + P(2)\,P(1 \mid 2)\,C(1 \mid 2) \qquad (5.1)$$
where $EC$ is the expected cost of misclassification, $P(1)$ and $P(2)$ are the prior
probabilities of the creditworthy and less creditworthy populations, and $P(2 \mid 1)$ and
$P(1 \mid 2)$ are the probabilities of making a Type I error and a Type II error. For example,
the probabilities of making Type I and Type II errors for the model using SVM in the U.S. case
are 0.0588 and 0.5789, as shown in Table 20. $C(2 \mid 1)$ and $C(1 \mid 2)$ are the
corresponding costs of Type I and Type II errors. As equation 5.1 shows, the misclassification
cost depends, on one hand, on the costs of making Type I and Type II errors, and the cost of a
Type II error is usually much higher than that of a Type I error. For example, a bank may lose
the interest revenue from a loan when it rejects the loan application of a creditworthy customer
(Type I error). However, it may suffer a huge loss from a default or even fraud if it accepts the
application and provides a loan to a bad credit customer who is misclassified as a good credit
customer (Type II error). Consequently, a Type II error is usually more undesirable than a
Type I error. On the other hand, the prior probabilities of the creditworthy and less
creditworthy populations also affect the misclassification cost. For instance, if $P(1)$ is
significantly greater than $P(2)$ but the cost of making a Type II error is not sufficiently
larger than that of making a Type I error, then a higher Type I error is the more undesirable
outcome when minimizing the misclassification cost.
Based on the data collected from Standard & Poor’s COMPUSTAT between 1990 and 2012,
there are 38749 companies with a credit rating of B or above and 1352 companies with a credit
rating below B, and thus the prior probabilities in the U.S. case can be set to $P(1) = 0.966$
and $P(2) = 0.034$. According to the data in 2012, the prior probabilities for the Chinese
dataset are $P(1) = 0.927$ and $P(2) = 0.073$, as the ratio of ST companies to non-ST
companies in China was 180 to 2284 in that year.
Table 23
Misclassification cost for the U.S. and Chinese datasets.
       Relative      U.S.                          China
Model  cost ratio n  Type I  Type II  EC           Type I  Type II  EC
SVM    1             0.0588  0.5789   0.076483     0.0959  0.625    0.134524
       5             0.0588  0.5789   0.155214     0.0959  0.625    0.317024
       10            0.0588  0.5789   0.253627     0.0959  0.625    0.545149
LR     1             0.0588  0.7895   0.083644     0.1096  0.725    0.154524
       5             0.0588  0.7895   0.191016     0.1096  0.725    0.366224
       10            0.0588  0.7895   0.325231     0.1096  0.725    0.630849
DA     1             0.0588  0.8421   0.085432     0.0685  0.775    0.120075
       5             0.0588  0.8421   0.199958     0.0685  0.775    0.346375
       10            0.0588  0.8421   0.343115     0.0685  0.775    0.62925
DT     1             0.1177  0.6842   0.136961     0.2877  0.325    0.290423
       5             0.1177  0.6842   0.230012     0.2877  0.325    0.385323
       10            0.1177  0.6842   0.346326     0.2877  0.325    0.503948
NN     1             0.0588  0.7895   0.083644     0.137   0.675    0.176274
       5             0.0588  0.7895   0.191016     0.137   0.675    0.373374
       10            0.0588  0.7895   0.325231     0.137   0.675    0.619749
The cutoff values are 0.5 for both the U.S. and Chinese cases.
Though obtaining valid estimates of the costs of Type I and Type II errors is a challenging task
and such estimates may not be available in this study, a relative cost ratio n between them can
be applied to compute the expected misclassification costs by assuming that the
misclassification cost of a Type II error is n times greater than that of a Type I error, since it
is generally believed that the costs associated with Type II errors are greater than the costs
associated with Type I errors. Here, n is set to 1, 5, and 10 respectively. The results are
summarized in Table 23.
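As a worked illustration of equation 5.1, the sketch below evaluates the expected cost for the SVM model in the U.S. case under the three relative cost ratios. It assumes that the Type I cost is normalized to 1 and the Type II cost equals n, and that the priors are the company counts given above rounded to three decimals (0.966 and 0.034); the function and variable names are introduced only for this example.

```python
def expected_cost(p1, p2, type1_rate, type2_rate, cost_type1=1.0, cost_type2=1.0):
    """Expected misclassification cost: P(1)P(2|1)C(2|1) + P(2)P(1|2)C(1|2)."""
    return p1 * type1_rate * cost_type1 + p2 * type2_rate * cost_type2


# Assumed priors: 38749 / (38749 + 1352) and 1352 / (38749 + 1352), rounded to 3 decimals.
P1, P2 = 0.966, 0.034
# Type I and Type II error rates of the SVM model in the U.S. case at cutoff 0.5.
TYPE1, TYPE2 = 0.0588, 0.5789

costs = {n: expected_cost(P1, P2, TYPE1, TYPE2, cost_type1=1.0, cost_type2=float(n))
         for n in (1, 5, 10)}
for n, ec in costs.items():
    print(f"n = {n:2d}: EC = {ec:.6f}")
```

Under these assumptions the three values agree with the SVM rows of the U.S. columns in Table 23, which suggests the table was computed with the Type I cost fixed at 1.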
We can see that SVM has the best performance under the minimum expected misclassification
cost criterion in comparison with logistic regression, discriminant analysis, decision tree, and
neural networks in all three scenarios for the U.S. dataset. For the Chinese dataset, the model
using SVM obtains the minimum expected misclassification cost when n = 5, the model with
discriminant analysis has the best performance, followed by SVM, when n = 1, and the decision
tree has the best performance, followed by SVM, when n = 10. Overall, the performance of SVM
is stable and slightly better than that of the other classifiers. However, we cannot conclude
that any one classifier is significantly better or worse than the others.
5.6 Identification of Cutoff Value
Finally, we discuss how to identify the cutoff value that gives the minimum misclassification
cost. By the definitions of Type I error, Type II error, sensitivity, and specificity, the Type I
error equals 1-sensitivity and the Type II error equals 1-specificity. Therefore, minimizing
equation 5.1 is the same as maximizing the following objective function 5.2.
$$\max\; P(1) \cdot \text{sensitivity} \cdot C(2 \mid 1) + P(2) \cdot \text{specificity} \cdot C(1 \mid 2) - P(1)\,C(2 \mid 1) - P(2)\,C(1 \mid 2) \qquad (5.2)$$
Since $P(1)\,C(2 \mid 1)$ and $P(2)\,C(1 \mid 2)$ are constants, the problem reduces to
maximizing objective function 5.3 below.
$$\max\; P(1) \cdot \text{sensitivity} \cdot C(2 \mid 1) + P(2) \cdot \text{specificity} \cdot C(1 \mid 2) \qquad (5.3)$$
Using the U.S. case with the SVM classifier as an example, Table 24 reports the values
computed from objective function 5.3, shown in the SS column, and from equation 5.1, shown in
the EC column. The maximum of the SS column is 1.007598, while the minimum of the EC
column is 0.128402. Both correspond to the same cutoff value, 0.3571, demonstrating that the
cutoff value for the minimized misclassification cost can be found with objective function 5.3,
where sensitivity and specificity can be obtained directly from the ROC function in SPSS
(shown in Appendix G and Appendix H). Compared with the misclassification cost of 0.155214
when the cutoff value is 0.5, the misclassification cost falls to 0.128402 when the cutoff value
is set to 0.3571.
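The equivalence between maximizing SS and minimizing EC can be checked numerically. The sketch below uses five (cutoff, sensitivity, specificity) rows copied from Table 24 and assumes priors of 0.966 and 0.034, a Type I cost of 1, and a Type II cost of 5; these parameter values are assumptions chosen to be consistent with the SS and EC columns of that table, and the helper names are hypothetical.

```python
# Assumed parameters: priors P(1), P(2) and costs C(2|1), C(1|2) with relative cost ratio 5.
P1, P2 = 0.966, 0.034
C21, C12 = 1.0, 5.0

# (cutoff, sensitivity, specificity) rows taken from Table 24.
rows = [
    (0.3062, 0.8824, 0.6842),
    (0.3446, 0.9118, 0.5789),
    (0.3571, 0.9412, 0.5789),
    (0.3765, 0.9412, 0.5263),
    (0.4815, 0.9412, 0.4211),
]


def ss(sensitivity, specificity):
    """Objective function 5.3: weighted sum of sensitivity and specificity."""
    return P1 * sensitivity * C21 + P2 * specificity * C12


def ec(sensitivity, specificity):
    """Equation 5.1 with Type I = 1 - sensitivity and Type II = 1 - specificity."""
    return P1 * (1 - sensitivity) * C21 + P2 * (1 - specificity) * C12


best_by_ss = max(rows, key=lambda r: ss(r[1], r[2]))
best_by_ec = min(rows, key=lambda r: ec(r[1], r[2]))
print("cutoff maximizing SS:", best_by_ss[0])
print("cutoff minimizing EC:", best_by_ec[0])
```

For every row, SS and EC add up to the constant P(1)C(2|1) + P(2)C(1|2), so the two criteria must select the same cutoff.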
In sum, if we evaluate the model based on OCAR and would like to find empirically the cutoff
value that gives the best overall classification accuracy, the list of sensitivity and
1-specificity values provided by the ROC function in SPSS, given in Appendix G and
Appendix H, can be used directly by finding the maximum of the sum of sensitivity and
specificity in the list. The cutoff value corresponding to this maximum gives the highest overall
classification accuracy rate among all the cutoff values in the list. If we evaluate the model
with misclassification cost, the cutoff value that gives the minimum misclassification cost can
be obtained by substituting the values of sensitivity and specificity from Table 24 into
objective function 5.3; the cutoff value corresponding to the maximum of this objective function
gives the minimum cost of misclassification in Table 24, as shown in bold.
Table 24
Misclassification cost with new cutoff values for the U.S. dataset.
Cutoff Value  Sensitivity  Specificity  SS  EC  Cutoff Value  Sensitivity  Specificity  SS  EC
.000 0.0000 1.0000 0.1700 0.9660 .2202 0.6765 0.7895 0.7877 0.3483
.0787 0.0294 1.0000 0.1984 0.9376 .2229 0.7059 0.7895 0.8161 0.3199
.0853 0.0588 1.0000 0.2268 0.9092 .2302 0.7353 0.7895 0.8445 0.2915
.0918 0.0882 1.0000 0.2552 0.8808 .2454 0.7353 0.7368 0.8356 0.3004
.0994 0.1176 1.0000 0.2836 0.8524 .2547 0.7647 0.7368 0.8640 0.2720
.1126 0.1471 1.0000 0.3121 0.8239 .2564 0.7647 0.6842 0.8550 0.2810
.1231 0.1765 1.0000 0.3405 0.7955 .2609 0.7941 0.6842 0.8834 0.2526
.1266 0.2059 1.0000 0.3689 0.7671 .2676 0.8235 0.6842 0.9118 0.2242
.1305 0.2353 1.0000 0.3973 0.7387 .2841 0.8529 0.6842 0.9403 0.1957
.1359 0.2647 1.0000 0.4257 0.7103 .3062 0.8824 0.6842 0.9687 0.1673
.1437 0.2941 1.0000 0.4541 0.6819 .3187 0.8824 0.6316 0.9597 0.1763
.1488 0.3235 1.0000 0.4825 0.6535 .3299 0.8824 0.5789 0.9508 0.1852
.1506 0.3529 1.0000 0.5109 0.6251 .3446 0.9118 0.5789 0.9792 0.1568
.1530 0.3824 1.0000 0.5394 0.5966 .3571 0.9412 0.5789 1.0076 0.1284
.1583 0.3824 0.9474 0.5304 0.6056 .3765 0.9412 0.5263 0.9987 0.1374
.1624 0.3824 0.8947 0.5215 0.6145 .4270 0.9412 0.4737 0.9897 0.1463
.1645 0.4118 0.8947 0.5499 0.5861 .4815 0.9412 0.4211 0.9808 0.1552
.1696 0.4118 0.8421 0.5409 0.5951 .5037 0.9412 0.3684 0.9718 0.1642
.1748 0.4412 0.8421 0.5693 0.5667 .5081 0.9412 0.3158 0.9629 0.1731
.1834 0.4706 0.8421 0.5977 0.5383 .5117 0.9412 0.2632 0.9539 0.1821
.1928 0.5000 0.8421 0.6262 0.5098 .5229 0.9412 0.2105 0.9450 0.1910
.1970 0.5294 0.8421 0.6546 0.4814 .6365 0.9412 0.1579 0.9360 0.2000
.1998 0.5588 0.8421 0.6830 0.4530 .7626 0.9412 0.1053 0.9271 0.2089
.2010 0.5882 0.8421 0.7114 0.4246 .7941 0.9412 0.0526 0.9181 0.2179
.2013 0.5882 0.7895 0.7024 0.4336 .8269 0.9706 0.0526 0.9465 0.1895
.2062 0.6176 0.7895 0.7309 0.4051 .8815 0.9706 0.0000 0.9376 0.1984
.2143 0.6471 0.7895 0.7593 0.3767 1.0000 1.0000 0.0000 0.9660 0.1700
CHAPTER VI
CONCLUSION AND DISCUSSION
6.1 Summary
Credit risk is one of the most important topics in risk management and, as stated in the Basel
capital accord, the major risk that banks and financial institutions encounter. As a form of
credit risk measurement, credit scoring is an important decision process used in many business
areas. A main stream of building credit scoring models is to develop classification models so
that, based on an analysis of the past performance of consumers, future credit applicants can
be classified into one of the predefined classes according to features that describe the
demographic characteristics and the economic or financial conditions of the applicants.
However, with the rapid growth of the credit industry and the ease of collecting and storing
information brought by new technologies, a huge amount of customer information is available,
and with it an increasing number of irrelevant and/or redundant features for building credit
scoring models. How to select a subset of useful features from a pool of candidate features to
establish an effective classification model in credit scoring is a practical and challenging
research topic. Feature selection is therefore essential for handling irrelevant, redundant, or
misleading features in order to improve predictive accuracy and reduce the high complexity,
intensive computation, and instability of most classification models.
In this dissertation, a hybrid model is developed to improve predictive accuracy and reduce
high complexity and intensive computation when a pool of candidate features is present. It
combines the advantages of filter and wrapper methods, and completes feature selection and
classification prediction in two phases. In the first phase, where a filter approach is applied, a
correlation coefficient based binary quadratic programming model is constructed for selecting
subsets of features. The model is then solved with the bisection method based on Tabu search
algorithm (BMTS), which provides optional subsets of features in different sizes. In the second
phase, where a wrapper approach is employed, the selected subsets of features are evaluated in
terms of OCAR with 10-fold cross validation SVM, and finally, the satisfactory subsets used to
build the credit scoring model are determined based on both the OCAR and the size of the
subset.
The validity of the hybrid model is demonstrated on two benchmark datasets. Experimental
results on the Australian and German datasets show the effectiveness of the proposed
BMTS+SVM method, which not only performs competitively on OCAR but also reduces the
computational effort required by the classifier and provides alternative options so that a
tradeoff between accuracy and the size of the subset is available, bringing flexibility to the
decision making process.
This validated method is then used in an international business context, on data from U.S. and
Chinese companies, in order to identify key factors in discriminating between CWCs and
LCWCs in these two countries. The most useful financial ratios and their corresponding
financial categories are first identified for the U.S. companies. The four categories are
profitability, solvency, cash flow, and leverage ratios, and they are consistent with the four
financial categories provided by a widely recognized credit rating agency, Standard & Poor’s.
Similarly, we found the same four financial categories for Chinese companies, with operating
ratios as an additional category. Therefore, managers should be aware that key financial
categories may vary across countries. Moreover, the application of the findings is twofold. On
one hand, managers of financial institutions can pay more attention to the ratios in the key
financial categories, especially the most representative ratios selected with our proposed
method, so that they gain a better understanding of the credit status of their applicants before
making any further decisions. On the other hand, companies that attempt to borrow money
from financial institutions can gain a clear view of which financial factors matter most for
being considered a creditworthy company, and of what improvements are needed immediately
to increase their chance of receiving loans.
The performance of classification models (models using different classifiers) in terms of
OCAR and misclassification cost is evaluated on the U.S. and Chinese datasets. Cutoff values
that give the highest OCAR and the lowest misclassification cost are also discussed. The results
show that SVM has stable and slightly better overall performance. However, there is no strong
evidence that a particular classifier significantly outperforms the others.
6.2 Discussion and Future Research
For the proposed method per se, the computational effectiveness can be improved if critical
points of are available. Evidently, the time for finding out different sizes of subsets in phase
one depends on algorithms solving the quadratic programming model and to what extent is
partitioned. While Tabu search algorithms are efficient and among the most successful ones in
solving problems of large size, our future study in improving computational time and effort
based on this study lies on how efficient [ ] can be divided and identified for different
sizes. For example, if we divided into where and set the time limit to 1 second for
Tabu search algorithm, then 1024 seconds is needed to reach the solutions for the UBQP
problem. From the experiment of the two datasets, we know that a certain range of gives the
same solution, which means many of solutions are duplicated. However, if the critical point of
for different sizes can be identified, duplicate solutions will be avoided, thus saving a lot of
computational time.
In addition, the BMTS method can be extended to meet the requirement that a particular number
of features be selected from the candidate features. As a first step, we can set the parameter
to a number of different values between 0 and 1. For example, if subsets of size 5 from 40
candidate features are needed, we can set the parameter to 0.1, 0.2, 0.3, 0.4, 0.5, etc. If the
solution of model 3.1 at 0.3 gives a subset of size 3 while the solution at 0.4 gives a subset of
size 6, we know that by adjusting the parameter between 0.3 and 0.4, the subsets of size 5 can
be identified.
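The search described above can be sketched as a bisection over the parameter. In the sketch below, `toy_subset_size` is a hypothetical stand-in for solving model 3.1 with Tabu search and counting the selected features; it is not the actual solver, and its step values are invented. The sketch also assumes that the subset size does not decrease as the parameter grows, which the example in the text suggests but does not guarantee.

```python
def find_parameter_for_size(subset_size, target, low, high, max_iter=50):
    """Bisect [low, high] for a parameter value whose subset has `target` features.

    Assumes subset_size(value) is non-decreasing in value. Returns None if no
    value yielding the target size is found within max_iter halvings.
    """
    for _ in range(max_iter):
        mid = (low + high) / 2
        size = subset_size(mid)
        if size == target:
            return mid
        if size < target:
            low = mid   # subset too small: move toward larger parameter values
        else:
            high = mid  # subset too large: move toward smaller parameter values
    return None


def toy_subset_size(value):
    """Hypothetical stand-in for solving model 3.1 and counting selected features."""
    if value < 0.33:
        return 3
    if value < 0.36:
        return 4
    if value < 0.39:
        return 5
    return 6


# Size 3 at 0.3 and size 6 at 0.4, as in the example; look for a subset of size 5.
value = find_parameter_for_size(toy_subset_size, target=5, low=0.3, high=0.4)
print("parameter value giving a subset of size 5:", value)
```

Each evaluation of the size function would correspond to one run of the solver, so bisection needs far fewer runs than scanning a fine grid between 0.3 and 0.4.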
The method will also be tested on real big data in the future. The challenge here is twofold.
On one hand, as the number of candidate features increases, the number of subsets of features
identified by the BMTS method for each size increases as well. Coping with the increasing
computational effort caused by the increasing number of subsets is a challenging task for the
future. On the other hand, SVM is used as the classifier in this study due to its strong
theoretical foundation, adaptive generalization ability, and appealing and stable predictive
performance. However, according to the results from comparing different classifiers in Chapter
5, there is no strong evidence that a particular classifier significantly outperforms the
others. Therefore, we can combine different classifiers with the BMTS method in different
cases. For example, in big data with an extremely large number of samples, SVM might not be
the best choice of classifier, since a disadvantage of SVM is its high algorithmic complexity and
extensive memory requirements in large scale tasks (Yu, Miche, Sorjamaa, Guillen, Lendasse &
Severin, 2010).
Furthermore, the BMTS+SVM method has so far been tested only in scenarios with two classes:
good credit and bad credit, or creditworthy companies and less creditworthy companies.
However, real world credit scoring problems often involve more groups. For example, in the
study of credit scoring at the corporate level, companies in the U.S. dataset are classified into
AAA, AA, A, BBB, C, and so on. Therefore, another improvement that can be made in future
research is to test the performance of the proposed hybrid model for credit scoring in
situations where three or more classes or groups are present.
Finally, in the study of the U.S. and Chinese cases, the features used to predict the
classification are all financial ratios. In future studies, we can include features other than
financial ratios, such as the main activity of the business, the borrower’s business expertise,
the status of the borrower’s economic sector and its position within that sector, the age of the
business, the borrower’s sensitivity to economic and market developments, business location,
and even the structure of a company’s board. We can also include macroeconomic features as
well as industry-level features.
REFERENCES
Abdou, H. A. (2009). Genetic programming for credit scoring: The case of Egyptian public
sector banks. Expert Systems with Applications, 36(9), 11402-11417.
Abdou, H. A., & Pointon, J. (2011). Credit scoring, statistical techniques and evaluation criteria:
A review of the literature. Intelligent Systems in Accounting, Finance & Management,
18(2/3), 59-88.
Aidi, M. N., & Sari, R. I. (2012). Classification of debtor credit status and determination amount
of credit risk by using linier discriminant function. Paper presented at the AIP Conference
Proceedings.
Akay, M. F. (2009). Support vector machines combined with feature selection for breast cancer
diagnosis. Expert Systems with Applications, 36(2-2), 3240-3247.
Akkoç, S. (2012). An empirical comparison of conventional techniques, neural networks and the
three stage hybrid adaptive neuro fuzzy inference system (ANFIS) model for credit
scoring analysis: The case of Turkish credit card data. European Journal of Operational
Research, 222(1), 168-178.
Alam, P., Booth, D., Lee, K., & Thordarson, T. (2000). The use of fuzzy clustering algorithm
and self-organizing neural networks for identifying potentially failing banks: an
experimental study. Expert Systems with Applications, 18(3), 185-199.
Alfaro-Cid, E., Sharman, K., & Esparcia-Alcazar, A. I. (2007). A genetic programming approach
for bankruptcy prediction using a highly unbalanced database. In M. Giacobini, A.
Brabazon, S. Cagnoni, G. A. Di Caro, R. Drechsler, M. Farooq, A. Fink, E. Lutton, P.
Machado, S. Minner, M. O’Neill, J. Romero, F. Rothlauf, G. Squillero, H. Takagi, A. S.
Uyar & S. Yang (Eds.), Applications of Evolutionary Computing, EvoWorkshops2007:
EvoCOMNET, EvoFIN, EvoIASP, EvoInteraction, EvoMUSART, EvoSTOC,
EvoTransLog (pp. 169-178). Valencia, Spain: Springer Verlag.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate
bankruptcy. The Journal of Finance, 23(4), 589-609.
Asada, T., Yun, Y., Nakayama, H., & Tanino, T. (2004). Pattern classification by goal
programming and support vector machines. Computational Management Science, 1(3-4),
211-230.
Baesens, B., Gestel, T. V., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2003).
Benchmarking state-of-the-art classification algorithms for credit scoring. The Journal of
the Operational Research Society, 54(6), 627-635.
Battiti, R. (1994). Using mutual information for selecting features in supervised neural net
learning. IEEE Transactions on Neural Networks, 5(4), 537-550.
Bell, T. B. (1997). Neural nets or the logit model? A comparison of each model’s ability to
predict commercial bank failures. Intelligent Systems in Accounting, Finance &
Management, 6(3), 249-264.
Bellotti, T., & Crook, J. (2009). Support vector machines for credit scoring and discovery of
significant features. Expert Systems with Applications, 36(2-2), 3302-3308.
Boros, E., Hammer, P. L., Sun, R., & Tavares, G. (2008). A max-flow approach to improved
lower bounds for quadratic unconstrained binary optimization (QUBO). Discrete
Optimization, 5(2), 501-529.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression
trees. Belmont, CA: Chapman and Hall.
Brill, J. (1998). The importance of credit scoring models in improving cash flow and collections.
Business Credit, 100(1), 16-17.
Camastra, F. (2007). A SVM-based cursive character recognizer. Pattern Recognition, 40(12),
3721-3727.
Camps-Valls, G., Mooij, J., & Schölkopf, B. (2010). Remote sensing feature selection by kernel
dependence measures. IEEE Geoscience and Remote Sensing Letters, 7(3), 587-591.
Chen, F. L., & Li, F. C. (2010). Combination of feature selection approaches with SVM in credit
scoring. Expert Systems with Applications, 37(7), 4902-4909.
Chen, W., Ma, C., & Ma, L. (2009). Mining the customer credit using hybrid support vector
machine technique. Expert Systems with Applications, 36(4), 7611-7616.
Chen, Q., Zhang, D., Wei, L., & Chen, H. (2007, March 1-April 5). A modified genetic
programming for behavior scoring problem. Paper presented at the IEEE Symposium on
Computational Intelligence and Data Mining. doi: 10.1109/CIDM.2007.368921
Chiang, L. H., & Pell, R. J. (2004). Genetic algorithms combined with discriminant analysis for
key variable identification. Journal of Process Control, 14(2), 143-155.
Cho, S., Hong, H., & Ha, B. C. (2010). A hybrid approach based on the combination of variable
selection using decision trees and case-based reasoning using the Mahalanobis distance:
For bankruptcy prediction. Expert Systems with Applications, 37(4), 3482-3488.
Coakley, J. R., & Brown, C. E. (2000). Artificial neural networks in accounting and finance:
Modeling issues. International Journal of Intelligent Systems in Accounting Finance &
Management, 9(2), 119-144.
Crook, J. N., Edelman, D. B., & Thomas, L. C. (2007). Recent developments in consumer credit
risk assessment. European Journal of Operational Research, 183(3), 1447-1465.
Crouhy, M., Galai, D., & Mark, R. (2000). A comparative analysis of current credit risk models.
Journal of Banking & Finance, 24(1–2), 59-117.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of
Control, Signals and Systems, 2(4), 303-314.
Danenas, P., Garsva, G., & Gudas, S. (2011). Credit risk evaluation model development using
support vector based classifiers. Procedia Computer Science, 4, 1699-1707.
Danenas, P., Garsva, G., & Simutis, R. (2011). Development of discriminant analysis and
majority-voting based credit risk assessment classifier. International Conference on
Computational Science. Retrieved from http://world-comp.org/p2011/ICA3513.pdf
Dash, M., & Liu, H. (2003). Consistency-based search in feature selection. Artificial Intelligence,
151(1–2), 155-176.
Derelioğlu, G., & Gürgen, F. (2011). Knowledge discovery using neural approach for SME's
credit risk analysis problem in Turkey. Expert Systems with Applications, 38(8), 9313-
9318.
Derelioğlu, G., Gürgen, F., & Okay, N. (2009). A Neural Approach for SME’s Credit Risk
Analysis in Turkey. In P. Perner (Ed.), Machine Learning and Data Mining in Pattern
Recognition (pp. 749-759). Berlin, Heidelberg: Springer.
Desai, V. S., Crook, J. N., & Overstreet, G. A., Jr. (1996). A comparison of neural networks and
linear scoring models in the credit union environment. European Journal of Operational
Research, 95(1), 24.
Dimla, D. E., Sr., & Lister, P. M. (2000). On-line metal cutting tool condition monitoring. II:
tool-state classification using multi-layer perceptron neural networks. International
Journal of Machine Tools and Manufacture, 40(5), 769-781.
Eiben, A. E., & Smith, J. E. (2003). Introduction to Evolutionary Computing. Berlin Heidelberg:
Springer.
Eisenbeis, R. A. (1978). Problems in applying discriminant analysis in credit scoring models.
Journal of Banking & Finance, 2(3), 205-219.
Eksioglu, B., Demirer, R., & Capar, I. (2005). Subset selection in multiple linear regression: a
new mathematical programming approach. Computers & Industrial Engineering, 49(1),
155-167.
Espejo, P. G., Ventura, S., & Herrera, F. (2010). A survey on the application of genetic
programming to classification. IEEE Transactions on Systems, Man and Cybernetics Part
C: Applications and Reviews, 40(2), 121-144.
Etemadi, H., Anvary Rostamy, A. A., & Dehkordi, H. F. (2009). A genetic programming model
for bankruptcy prediction: Empirical evidence from Iran. Expert Systems with
Applications, 36(2-2), 3199-3207.
Falangis, K. (2007). The use of MSD model in credit scoring. Operational Research, 7(3), 481-
503.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-
874.
Fedorova, E., Gilenko, E., & Dovzhenko, S. (2013). Bankruptcy prediction for Russian
companies: Application of combined classifiers. Expert Systems with Applications,
40(18), 7285-7293.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of
Eugenics, 7(2), 179-188.
Fleuret, F. (2004). Fast binary feature selection with conditional mutual information. Journal of
Machine Learning Research, 5, 1531-1555.
Frydman, H., Altman, E. I., & Kao, D.-L. (1985). Introducing recursive partitioning for financial
classification: The case of financial distress. The Journal of Finance, 40(1), 269-291.
García, V., Marqués, A. I., & Sánchez, J. S. (2012). Improving risk predictions by preprocessing
imbalanced credit data. Neural Information Processing, 7664, 68-75.
Glen, J. J. (2003). An iterative mixed integer programming method for classification accuracy
maximizing discriminant analysis. Computers and Operations Research, 30(2), 181-198.
Glover, F. (1986). Future paths for integer programming and links to artificial intelligence.
Computers & Operations Research, 13(5), 533.
Glover, F. (1989). Tabu search-- Part I. ORSA Journal on Computing, 1(3), 190-206.
Glover, F. (1990). Tabu search-- Part II. ORSA Journal on Computing, 2(1), 4-32.
Glover, F., Kochenberger, G.A., & Alidaee, B. (1998). Adaptive memory Tabu search for binary
quadratic programs. Management Science, 44(3), 336-345.
Glover, F., & Laguna, M. (1997). Tabu search. Boston, MA: Kluwer Academic.
Glover, F., Lü, Z., & Hao, J.-K. (2010). Diversification-driven Tabu search for unconstrained
binary quadratic problems. 4OR, 8(3), 239-253.
Gönen, B. G., Gönen, M., & Gürgen, F. (2012). Probabilistic and discriminative group-wise
feature selection methods for credit risk analysis. Expert Systems with Applications,
39(14), 11709-11717.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of
Machine Learning Research, 3(7), 1157-1182.
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification
using support vector machines. Machine Learning, 46(1-3), 389-422.
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data
analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit
scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society),
160(3), 523-541.
Harikrishna, S., Farquad, M. A. H., & Shabana. (2012). Credit scoring using support vector
machine: A comparative analysis. Advanced Materials Research, 433/440, 6527-6533.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are
universal approximators. Neural Networks, 2(5), 359-366.
Hsieh, N. C., & Hung, L. P. (2010). A data driven ensemble classifier for credit scoring analysis.
Expert Systems with Applications, 37(1), 534-545.
Hsu, W. H. (2004). Genetic wrappers for feature selection in decision tree induction and variable
ordering in Bayesian network structure learning. Information Sciences, 163(1–3), 103-
122.
Huang, C.-L., Chen, M. C., & Wang, C. J. (2007). Credit scoring with a data mining approach
based on support vector machines. Expert Systems with Applications, 33(4), 847-856.
Huang, C.-L., Liao, H.-C., & Chen, M.-C. (2008). Prediction model building and feature
selection with support vector machines in breast cancer diagnosis. Expert Systems with
Applications, 34(1), 578-587.
Huang, J.-J., Tzeng, G.-H., & Ong, C.-S. (2006). Two-stage genetic programming (2SGP) for the
credit scoring model. Applied Mathematics and Computation, 174(2), 1039-1053.
Jiang, M., & Yuan, X. (2007, August 24-27). Personal credit scoring model of non-linear
combining forecast based on GP. Paper presented at the International Conference on
Natural Computation. doi: 10.1109/ICNC.2007.551
Jo, H., Han, I., & Lee, H. (1997). Bankruptcy prediction using case-based reasoning, neural
networks, and discriminant analysis. Expert Systems with Applications, 13(2), 97-108.
Karels, G. V., & Prakash, A. J. (1987). Multivariate normality and forecasting of business
bankruptcy. Journal of Business Finance & Accounting, 14(4), 573-593.
Kim, H., & Sohn, S. (2010). Support vector machines for default prediction of SMEs based on
technology credit. European Journal of Operational Research, 201(3), 838-846.
Kim, Y. S., & Sohn, S. Y. (2004). Managing loan customers using misclassification patterns of
credit scoring model. Expert Systems with Applications, 26(4), 567-573.
Kira, K., & Rendell, L. A. (1992). The feature selection problem: Traditional methods and a new
algorithm. Proceedings of the National Conference on Artificial Intelligence, San Jose,
1992. Menlo Park, CA: The AAAI Press.
Kochenberger, G., & Glover, F. (2006). A Unified Framework for Modeling and Solving
Combinatorial Optimization Problems: A Tutorial. In W. Hager, S.-J. Huang, P. Pardalos
& O. Prokopyev (Eds.), Multiscale Optimization Methods and Applications (pp. 101-124).
New York, NY: Springer.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model
selection. Proceedings of the International Joint Conference on Artificial Intelligence,
Montreal, 1995. San Francisco, CA: Morgan Kaufmann Publishers Inc.
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence,
97(1–2), 273-324.
Koza, J. R. (1992). Genetic programming: On the programming of computers by means of
natural selection. Cambridge, MA: MIT Press.
Kumar, P. R., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via statistical and
intelligent techniques: A review. European Journal of Operational Research, 180(1), 1-
28.
Kumari, B., & Swarnkar, T. (2011). Filter versus wrapper feature subset selection in large
dimensionality micro array: A review. International Journal of Computer Science and
Information Technologies, 2(3), 1048-1053.
Kwak, N., & Choi, C.-H. (2002). Input feature selection by mutual information based on Parzen
window. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1667-
1671.
Lacher, R. C., Coats, P. K., Sharma, S. C., & Fant, L. F. (1995). A neural network for classifying
the financial health of a firm. European Journal of Operational Research, 85(1), 53-65.
Lam, K. F., & Moy, J. W. (2002). Combining discriminant methods in solving classification
problems in two-group discriminant analysis. European Journal of Operational Research,
138(2), 294-301.
Lasko, T. A., Bhagwat, J. G., Zou, K. H., & Ohno-Machado, L. (2005). The use of receiver
operating characteristic curves in biomedical informatics. Journal of Biomedical
Informatics, 38(5), 404-415.
Lawson, J. C. (1995). Knowing the score. U.S. Banker, 105(11), 61.
Lee, T. S., & Chen, I. F. (2005). A two-stage hybrid credit scoring model using artificial neural
networks and multivariate adaptive regression splines. Expert Systems with Applications,
28(4), 743-752.
Lee, T. S., Chiu, C.-C., Lu, C.-J., & Chen, I. F. (2002). Credit scoring using the hybrid neural
discriminant technique. Expert Systems with Applications, 23(3), 245-254.
Lee, T. H., & Jung, S.-C. (1999). Forecasting creditworthiness: Logistic vs. artificial neural net.
Journal of Business Forecasting Methods & Systems, 18(4), 28.
Lensberg, T., Eilifsen, A., & McKee, T. E. (2006). Bankruptcy theory development and
classification via genetic programming. European Journal of Operational Research,
169(2), 677-697.
Leshno, M., & Spector, Y. (1996). Neural network prediction analysis: The bankruptcy case.
Neurocomputing, 10(2), 125-147.
Lessmann, S., & Voß, S. (2009). A reference model for customer-centric data mining with
support vector machines. European Journal of Operational Research, 199(2), 520-530.
Li, C. H., Li, Y. C., Kuo, B. C., Liu, J. F., & Huang, H. Y. (2012). SVM self-contained variable
importance measure for credit scoring. ICIC Express Letters, 6(2), 389-394.
Li, R.-H., & Belford, G. G. (2002). Instability of decision tree classification algorithms.
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, Edmonton, 2002. New York, NY: ACM.
Lin, C. C., Chang, C. C., Li, F. C., & Chao, T. C. (2011, December 6-9). Features selection
approaches combined with effective classifiers in credit scoring. Paper presented at the
IEEE International Conference on Industrial Engineering and Engineering Management.
doi: 10.1109/IEEM.2011.6118017
Liu, S., Wang, Q., & Shuai, L. (2008, July 2-4). Application of Genetic Programming in credit
scoring. Paper presented at the Control and Decision Conference. doi:
10.1109/CCDC.2008.4597485
Liu, Y., & Schumann, M. (2005). Data mining feature selection for credit scoring models.
Journal of the Operational Research Society, 56(9), 1099-1108.
Loh, W.-Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data
Mining and Knowledge Discovery, 1(1), 14-23.
Lu, C., Van Gestel, T., Suykens, J. A. K., Van Huffel, S., Vergote, I., & Timmerman, D. (2003).
Preoperative prediction of malignancy of ovarian tumors using least squares support
vector machines. Artificial Intelligence in Medicine, 28(3), 281-306.
Lü, C., & Zhao, Y. (2004). Researches on the financial position classification of listed companies.
Accounting Research, 11, 53-61 (in Chinese).
Lü, Z., Glover, F., & Hao, J.-K. (2010). A hybrid metaheuristic approach to solving the UBQP
problem. European Journal of Operational Research, 207(3), 1254.
Malhotra, R., & Malhotra, D. K. (2003). Evaluating consumer loans using neural networks.
Omega, 31(2), 83-96.
Mandala, I. G. N. N., Nawangpalupi, C. B., & Praktikto, F. R. (2012). Assessing credit risk: An
application of data mining in a rural bank. Procedia Economics and Finance, 4, 406-412.
Marinakis, Y., Marinaki, M., Doumpos, M., & Zopounidis, C. (2009). Ant colony and particle
swarm optimization for financial classification problems. Expert Systems with
Applications, 36(7), 10604-10611.
Martens, D., Baesens, B., Van Gestel, T., & Vanthienen, J. (2007). Comprehensible credit
scoring models using rule extraction from support vector machines. European Journal of
Operational Research, 183(3), 1466-1476.
Martin, D. (1977). Early warning of bank failure: A logit regression approach. Journal of
Banking & Finance, 1(3), 249-276.
McKee, T. E., & Lensberg, T. (2002). Genetic programming and rough sets: A hybrid approach
to bankruptcy classification. European Journal of Operational Research, 138(2), 436-451.
Merton, R. C. (1974). On the pricing of corporate debt: The risk structure of interest rates. The
Journal of Finance, 29(2), 449-470.
Miller, A. J. (1984). Selection of subsets of regression variables. Journal of the Royal Statistical
Society: Series A, 147(3), 389-425.
Min, J. H., & Jeong, C. (2009). A binary classification method for bankruptcy prediction. Expert
Systems with Applications, 36(3-1), 5256-5263.
Min, J. H., & Lee, Y.-C. (2008). A practical approach to credit scoring. Expert Systems with
Applications, 35(4), 1762-1770.
Nanni, L., & Lumini, A. (2009). An experimental comparison of ensemble of classifiers for
bankruptcy prediction and credit scoring. Expert Systems with Applications, 36(2-2),
3028-3033.
Nie, G., Rowe, W., Zhang, L., Tian, Y., & Shi, Y. (2011). Credit card churn forecasting by
logistic regression and decision tree. Expert Systems with Applications, 38(12), 15273-
15285.
Njanike, K. (2009). The impact of effective credit risk management on bank survival. Annals of
the University of Petrosani Economics, 9(2), 173-184.
Olson, D. L., Delen, D., & Meng, Y. (2012). Comparative analysis of data mining methods for
bankruptcy prediction. Decision Support Systems, 52(2), 464-473.
Ong, C.-S., Huang, J.-J., & Tzeng, G.-H. (2005). Building credit scoring models using genetic
programming. Expert Systems with Applications, 29(1), 41-47.
Oreski, S., Oreski, D., & Oreski, G. (2012). Hybrid system with genetic algorithm and artificial
neural networks and its application to retail credit risk assessment. Expert Systems with
Applications, 39(16), 12605-12617.
Paisittanand, S., & Olson, D. L. (2006). A simulation study of IT outsourcing in the credit card
business. European Journal of Operational Research, 175(2), 1248-1261.
Paleologo, G., Elisseeff, A., & Antonini, G. (2010). Subagging for credit scoring models.
European Journal of Operational Research, 201(2), 490-499.
Palubeckis, G. (2004). Multistart Tabu search strategies for the unconstrained binary quadratic
optimization problem. Annals of Operations Research, 131(1-4), 259-282.
Palubeckis, G. (2006). Iterated Tabu search for the unconstrained binary quadratic optimization
problem. Informatica, 17(2), 279-296.
Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of
max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 27(8), 1226-1238.
Piramuthu, S. (1999). Financial credit-risk evaluation with neural and neurofuzzy systems.
European Journal of Operational Research, 112(2), 310-321.
Prajapati, G. L., & Patle, A. (2010, November 19-21). On performing classification using SVM
with radial basis and polynomial kernel functions. Paper presented at the International
Conference on Emerging Trends in Engineering and Technology. doi:
10.1109/ICETET.2010.134
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
Quinlan, J. R. (1993). C4.5: programs for machine learning. San Francisco, CA: Morgan
Kaufmann Publishers Inc.
Rampone, S., Frattolillo, F., & Landolfi, F. (2013). Assessing consumer credit applications by a
genetic programming approach. In Advanced Dynamic Modeling of Economic and Social
Systems (pp. 79-89). Berlin, Heidelberg: Springer.
Ravi, V., & Pramodh, C. (2008). Threshold accepting trained principal component neural
network and feature subset selection: Application to bankruptcy prediction in banks.
Applied Soft Computing, 8(4), 1539-1548.
Ryu, Y. U., & Yue, W. T. (2005). Firm bankruptcy prediction: Experimental comparison of
isotonic separation and other classification approaches. IEEE Transactions on Systems,
Man and Cybernetics, 35(5), 727-737.
Sakar, O. C., & Kursun, O. (2012). A method for combining mutual information and canonical
correlation analysis: Predictive Mutual Information and its use in feature selection.
Expert Systems with Applications, 39(3), 3333-3344.
Salehi, M., & Mansoury, A. (2011). An evaluation of Iranian banking system credit risk: Neural
network and logistic regression approach. International Journal of Physical Sciences,
6(25), 6082-6090.
Schebesch, K. B., & Stecking, R. (2005). Support vector machines for classifying and describing
credit applicants: Detecting typical and critical regions. The Journal of the Operational
Research Society, 56(9), 1082-1088.
Senliol, B., Gulgezen, G., Lei, Y., & Cataltepe, Z. (2008, October 27-29). Fast Correlation
Based Filter (FCBF) with a different search strategy. Paper presented at the International
Symposium on Computer and Information Sciences. doi: 10.1109/ISCIS.2008.4717949
Sette, S., & Boullart, L. (2001). Genetic programming: Principles and applications. Engineering
Applications of Artificial Intelligence, 14(6), 727-736.
Shin, K.-S., Lee, T. S., & Kim, H.-j. (2005). An application of support vector machines in
bankruptcy prediction model. Expert Systems with Applications, 28(1), 127-135.
Srinivasan, V., & Kim, Y. H. (1987). Credit granting: A comparative analysis of classification
procedures. The Journal of Finance, 42(3), 665-681.
Stephanou, C., & Mendoza, J. C. (2005). Credit risk measurement under Basel II: An overview
and implementation issues for developing countries. World Bank Policy Research
Working Paper 3556.
Su, C.-T., & Yang, C.-H. (2008). Feature selection for the SVM: An application to hypertension
diagnosis. Expert Systems with Applications, 34(1), 754-763.
Šušteršič, M., Mramor, D., & Zupan, J. (2009). Consumer credit scoring models with limited
data. Expert Systems with Applications, 36(3), 4736-4744.
Swicegood, P., & Clark, J. A. (2001). Off-site monitoring systems for predicting bank
underperformance: A comparison of neural networks, discriminant analysis, and
professional human judgment. Intelligent Systems in Accounting, Finance &
Management, 10(3), 169-186.
Tay, F. E. H., & Cao, L. (2001). Application of support vector machines in financial time series
forecasting. Omega, 29(4), 309-317.
Tsai, C. F. (2009). Feature selection in bankruptcy prediction. Knowledge-Based Systems, 22(2),
120-127.
Utzig, S. (2010). The financial crisis and the regulation of credit rating agencies: A European
banking perspective. ADBI Working Paper Series (Vol. 188). Asian Development Bank
Institute.
Van Gestel, T., Suykens, J. A. K., Baestaens, D. E., Lambrechts, A., Lanckriet, G., Vandaele, B.,
De Moor, B., & Vandewalle, J. (2001). Financial time series prediction using least
squares support vector machines within the evidence framework. IEEE Transactions on
Neural Networks, 12(4), 809-821.
Vapnik, V. (1995). The nature of statistical learning theory. New York, NY: Springer.
Wang, C. M., & Huang, Y. (2009). Evolutionary-based feature selection approaches with new
criteria for data mining: A case study of credit approval data. Expert Systems with
Applications, 36(3-2), 5900-5908.
Wang, J., Guo, K., & Wang, S. (2010). Rough set and Tabu search based feature selection for
credit scoring. Procedia Computer Science, 1(1), 2425-2432.
Wang, J., Hedar, A. R., Wang, S., & Ma, J. (2012). Rough set and scatter search metaheuristic
based feature selection for credit scoring. Expert Systems with Applications, 39(6), 6123-
6128.
Wang, Y., Lü, Z., Glover, F., & Hao, J.-K. (2012). Path relinking for unconstrained binary
quadratic programming. European Journal of Operational Research, 223(3), 595-604.
Wei, H., & Billings, S.A. (2007). Feature subset selection and ranking for data dimensionality
reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1), 162-
166.
Wei, L., Li, J., & Chen, Z. (2007, May 27-30). Credit risk evaluation using support vector
machine with mixture of kernel. Paper presented at the International Conference on
Computational Science. doi: 10.1007/978-3-540-72586-2_62
West, D. (2000). Neural network credit scoring models. Computers & Operations Research,
27(11–12), 1131-1152.
Wlodzislaw, D., & Norbert, J. (2001, April 25-27). Transfer functions: hidden possibilities for
better neural networks. Paper presented at the European Symposium on Artificial Neural
Networks. Bruges: De-facto publications.
Wollan, R. (2008). The new rules for customer service: Findings from the Accenture Global
Customer Satisfaction Survey. Accenture Outlook. Retrieved from
http://www.accenture.com/sitecollectiondocuments/pdf/Global20Customer20Satisfaction
20Survey_Outlook_Jan08.pdf
World Bank (2013). Banking crisis. Global Financial Development Report. Retrieved from
http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTGLOBALFINREPORT
/0,,contentMDK:23268770~pagePK:64168182~piPK:64168060~theSitePK:8816097,00.
html
Xiong, Z. (2013). Research on credit evaluation model based on nonlinear principal component
analysis. The Journal of Quantitative & Technical Economics, 30(10), 138-150 (in
Chinese).
Yang, J., & Li, Y.-P. (2006). Orthogonal relief algorithm for feature selection. In D.-S. Huang, K.
Li & G. Irwin (Eds.), Intelligent Computing (pp. 227-234). Berlin, Heidelberg: Springer.
Yap, B. W., Ong, S. H., & Husain, N. H. M. (2011). Using data mining to improve assessment of
credit worthiness via credit scoring models. Expert Systems with Applications, 38(10),
13274-13283.
Ye, H., Li, N., Feng, H., & Wang, Y. (2011). The comparisons of personal credit evaluation
models. Information Technology Journal, 10(11), 2237-2241.
Yi, J., Yan, C., Zhimin, Z., & He, X. (2008, July 8-11). A bank customer credit evaluation based
on the decision tree and the simulated annealing algorithm. Paper presented at the IEEE
International Conference on Computer and Information Technology. doi:
10.1109/CIT.2008.4594674
Yim, J., & Mitchell, H. (2005). Comparison of country risk models: hybrid neural networks,
logit models, discriminant analysis and cluster techniques. Expert Systems with
Applications, 28(1), 137-148.
Yu, L., & Liu, H. (2004). Efficient Feature Selection via Analysis of Relevance and Redundancy.
Journal of Machine Learning Research, 5, 1205-1224.
Yu, L., Wang, S., & Lai, K. K. (2007). Foreign-Exchange-Rate Forecasting With Artificial
Neural Networks. New York, NY: Springer.
Yu, Q., Miche, Y., Sorjamaa, A., Guillen, A., Lendasse, A., & Severin, E. (2010). OP-KNN:
method and applications. Advances in Artificial Neural Systems, 2010, 1-6.
Zhang, D., Hifi, M., Chen, Q., & Ye, W. (2008, October 18-20). A hybrid credit scoring model
based on genetic programming and support vector machines. Paper presented at the
International Conference on Natural Computation. doi: 10.1109/ICNC.2008.205
Zhang, D., Zhou, X., Leung, S. C. H., & Zheng, J. (2010). Vertical bagging decision trees model
for credit scoring. Expert Systems with Applications, 37(12), 7838-7843.
Zhang, G., Hu, Y. M., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in
bankruptcy prediction: General framework and cross-validation analysis. European
Journal of Operational Research, 116(1), 16-32.
Zhang, Y., & Bhattacharyya, S. (2004). Genetic programming in classifying large-scale data: an
ensemble method. Information Sciences, 163(1–3), 85-101.
Zibanezhad, E., Foroghi, D., & Monadjemi, A. (2011, June 10-12). Applying decision tree to
predict bankruptcy. Paper presented at the IEEE International Conference on Computer
Science and Automation Engineering. doi: 10.1109/CSAE.2011.5952826
Zimmermann, H. J., & Zysno, P. (1983). Decisions and evaluations by hierarchical aggregation
of information. Fuzzy Sets and Systems, 10(1-3), 243-260.
Zou, K. H., Resnic, F. S., Talos, I.-F., Goldberg-Zimring, D., Bhagwat, J. G., Haker, S. J.,
Kikinis, R., Jolesz, F. A., & Ohno-Machado, L. (2005). A global goodness-of-fit test for
receiver operating characteristic curve analysis via the bootstrap method. Journal of
Biomedical Informatics, 38(5), 395-403.
APPENDIX A
EXAMPLE OF SOLUTIONS FOR MODEL 3.1
When is divided into
alpha:0.0
best_result = 0
best_t = 0.01
Best Solution is :
0 1 0 0 0 0 0 0 0 0 0 0 0 0
***************************************************
alpha:0.001953125
best_result = 141
best_t = 0.00
Best Solution is :
0 0 0 0 0 0 0 1 0 0 0 0 0 0
***************************************************
alpha:0.00390625
best_result = 281
best_t = 0.00
Best Solution is :
0 0 0 0 0 0 0 1 0 0 0 0 0 0
***************************************************
alpha:0.005859375
best_result = 422
best_t = 0.00
Best Solution is :
0 0 0 0 0 0 0 1 0 0 0 0 0 0
***************************************************
alpha:0.0078125
best_result = 563
best_t = 0.00
Best Solution is :
0 0 0 0 0 0 0 1 0 0 0 0 0 0
***************************************************
alpha:0.060546875
best_result = 4362
best_t = 0.00
Best Solution is :
1 0 0 0 0 0 0 1 0 0 0 0 0 0
***************************************************
alpha:0.0625
best_result = 4506
best_t = 0.00
Best Solution is :
1 0 0 0 0 0 0 1 0 0 0 0 0 0
***************************************************
alpha:0.064453125
best_result = 4649
best_t = 0.00
Best Solution is :
1 0 0 0 0 0 0 1 0 0 0 0 0 0
***************************************************
alpha:0.06640625
best_result = 4792
best_t = 0.00
Best Solution is :
1 0 0 0 0 0 0 1 0 0 0 0 0 0
***************************************************
alpha:0.3046875
best_result = 22332
best_t = 0.00
Best Solution is :
0 0 0 0 0 0 0 1 0 0 0 1 0 0
***************************************************
alpha:0.306640625
best_result = 22503
best_t = 0.00
Best Solution is :
0 0 0 0 0 0 0 1 0 0 0 1 0 0
***************************************************
alpha:0.30859375
best_result = 22674
best_t = 0.00
Best Solution is :
0 0 0 0 0 0 0 1 0 0 0 1 0 0
***************************************************
alpha:0.310546875
best_result = 22847
best_t = 0.00
Best Solution is :
0 0 0 0 0 0 0 1 0 0 0 1 0 0
***************************************************
alpha:0.59375
best_result = 58258
best_t = 0.00
Best Solution is :
1 0 0 0 1 0 0 1 0 0 0 0 0 1
***************************************************
alpha:0.595703125
best_result = 58591
best_t = 0.00
Best Solution is :
1 0 0 0 1 0 0 1 0 0 0 0 0 1
***************************************************
alpha:0.59765625
best_result = 59015
best_t = 0.00
Best Solution is :
1 0 0 0 1 0 0 1 0 1 0 0 0 1
***************************************************
alpha:0.599609375
best_result = 59549
best_t = 0.00
Best Solution is :
1 0 0 0 1 0 0 1 0 1 0 0 0 1
***************************************************
alpha:0.970703125
best_result = 311272
best_t = 0.00
Best Solution is :
0 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.97265625
best_result = 313951
best_t = 0.00
Best Solution is :
0 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.974609375
best_result = 316654
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.9765625
best_result = 319445
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.978515625
best_result = 322224
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.98046875
best_result = 324994
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.982421875
best_result = 327788
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.984375
best_result = 330568
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.986328125
best_result = 333364
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.98828125
best_result = 336144
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.990234375
best_result = 338921
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.9921875
best_result = 341697
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:0.994140625
best_result = 344485
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
***************************************************
alpha:1.0
best_result = 352652
best_t = 0.00
Best Solution is :
1 1 1 1 1 1 1 1 1 1 1 1 1 1
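Every alpha value in the listing above is an integer multiple of 1/512 = 0.001953125, which is what nine successive halvings of the interval [0, 1] would produce. The following Python sketch regenerates such a grid under that assumption and shows how a printed solution vector maps to feature positions. The names bisection_grid and selected_positions and the depth of 9 are illustrative choices, not taken from the program that produced the output, and Model 3.1 itself is not solved here.

```python
from fractions import Fraction


def bisection_grid(depth):
    """Return the points obtained by halving [0, 1] `depth` times (assumed grid)."""
    points = [Fraction(0), Fraction(1)]
    for _ in range(depth):
        refined = []
        for lo, hi in zip(points, points[1:]):
            # Keep the left endpoint and add the midpoint of each interval.
            refined.extend([lo, (lo + hi) / 2])
        refined.append(points[-1])
        points = refined
    return points


def selected_positions(solution_line):
    """Return the 1-based positions of the entries equal to 1 in a solution line."""
    bits = solution_line.split()
    return [i for i, bit in enumerate(bits, start=1) if bit == "1"]


grid = bisection_grid(9)              # 513 points with spacing 1/512
step = float(grid[1])                 # 0.001953125, the second alpha in the listing
positions = selected_positions("1 0 0 0 0 0 0 1 0 0 0 0 0 0")
```

Exact fractions are used so that the grid points can be compared with the printed alpha values without floating-point rounding.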
APPENDIX B
STATISTICAL DESCRIPTION OF THE U.S. DATASET
ST Companies Non-ST Companies
Features Mean Std. Dev. Kurtosis Skewness Mean Std. Dev. Kurtosis Skewness
X1 0.361 0.670 56.355 5.848 -0.021 0.316 8.929 -1.325
X2 0.279 1.749 178.205 13.086 -0.011 0.278 25.587 2.764
X3 480.721 4923.939 194.356 13.884 72.959 200.664 47.371 6.367
X4 -0.207 5.858 31.251 -4.038 0.108 1.257 37.584 4.680
X5 1.167 2.039 10.465 1.791 -0.138 2.108 6.111 -1.123
X6 -0.151 0.325 17.717 -3.554 -0.289 0.454 19.506 -3.952
X7 0.020 0.186 27.220 -4.115 -0.130 0.540 33.472 -5.528
X8 0.024 0.099 7.820 -1.450 -0.043 0.115 12.045 -2.615
X9 -0.100 0.221 33.680 -4.646 -0.222 0.270 10.414 -2.766
X10 -0.433 2.958 66.838 -6.557 1.478 11.604 67.432 7.710
X11 0.027 0.276 33.744 3.622 0.005 1.134 33.821 2.990
X12 0.343 0.236 1.918 0.659 0.228 0.220 1.154 1.002
X13 0.686 0.220 6.159 1.083 0.782 0.256 15.761 2.235
X14 0.168 0.256 25.603 4.166 0.164 0.405 43.853 -4.898
X15 1.167 0.845 21.683 3.650 0.985 0.703 3.135 1.561
X16 1.638 1.012 16.083 3.036 1.352 0.904 1.009 1.093
X17 0.105 0.939 50.233 -6.110 -0.219 0.989 3.277 -1.807
X18 0.089 0.189 8.452 -1.818 -0.009 0.329 8.503 -2.209
X19 0.776 0.273 8.058 1.842 1.026 0.391 8.572 2.226
X20 0.227 5.871 74.786 -6.934 0.070 12.887 81.407 8.492
X21 5.193 31.415 96.966 8.798 -42.205 396.217 98.178 -9.889
X22 0.655 0.815 10.021 2.683 0.297 0.670 33.849 5.298
X23 1.092 3.698 126.040 10.429 0.022 2.635 41.884 -2.696
X24 0.207 0.274 7.607 -1.815 -0.056 0.402 11.013 -2.490
X25 0.319 0.195 1.601 1.294 0.366 0.246 0.265 1.041
X26 0.535 0.214 -0.011 -0.652 0.454 0.289 -1.308 -0.263
X27 0.763 35.510 11.953 0.034 0.661 14.155 39.076 5.599
X28 0.046 0.225 88.334 8.302 0.044 0.229 61.550 7.338
X29 27.833 42.975 43.738 5.721 40.190 77.122 59.210 6.947
X30 0.402 0.420 6.127 2.136 0.265 0.624 23.381 4.633
X31 0.008 0.008 1.938 1.497 0.004 0.007 7.175 2.617
X32 15.091 32.471 108.150 9.529 20.639 41.338 24.463 4.787
X33 0.103 0.103 6.600 2.090 0.091 0.124 35.439 4.905
X34 3.390 2.367 4.917 1.629 3.320 2.761 9.843 2.362
X35 0.315 0.303 15.418 3.432 0.342 0.491 26.724 4.950
X36 2.169 3.748 38.466 5.606 2.463 2.576 12.193 2.903
X37 1.034 0.912 23.926 3.837 1.179 0.749 0.249 0.785
X38 5.050 15.581 72.063 7.069 -37.448 361.537 97.698 -9.855
X39 0.135 5.629 49.157 -4.792 0.638 4.497 91.389 9.388
X40 -11.495 282.697 183.185 -13.191 -49.935 462.083 96.414 -9.768
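The four statistics reported for each feature can be computed as sketched below. This is a minimal illustration, assuming moment-based skewness and excess kurtosis (kurtosis minus 3, which would be consistent with the negative kurtosis values in the table). The estimator actually used for the table is not stated, so bias-corrected output from a statistics package may differ slightly. The function name describe and the sample values are hypothetical.

```python
import math


def describe(values):
    """Return mean, sample standard deviation, excess kurtosis, and skewness.

    Assumes at least two observations that are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments with divisor n.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "std_dev": math.sqrt(m2 * n / (n - 1)),  # divisor n - 1
        "kurtosis": m4 / m2 ** 2 - 3.0,           # excess kurtosis
        "skewness": m3 / m2 ** 1.5,
    }


# Hypothetical values for one feature; not taken from the dataset.
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
```

Running describe once on the ST companies and once on the non-ST companies for each feature would give the two column groups of the table.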
APPENDIX C
STATISTICAL DESCRIPTION OF CHINESE DATASET
ST Companies Non-ST Companies
Features Mean Std. Dev. Kurtosis Skewness Mean Std. Dev. Kurtosis Skewness
X1 0.185 0.295 7.326 1.849 0.060 0.223 23.557 3.084
X2 0.138 0.213 8.916 1.372 0.051 0.198 23.108 2.878
X3 1.072 0.393 333.235 15.610 1.054 0.315 80.835 6.414
X4 4.900 56.749 556.351 23.208 3.772 29.739 77.606 4.020
X5 0.332 0.728 41.215 -3.166 0.149 0.475 3.036 0.359
X6 0.102 0.144 80.319 7.589 0.076 0.123 98.638 8.088
X7 0.151 0.172 71.137 6.906 0.138 0.157 66.033 6.437
X8 0.065 0.035 1.502 1.021 0.046 0.034 13.625 2.830
X9 0.044 0.030 0.810 0.881 0.026 0.029 15.175 3.078
X10 0.085 0.112 374.025 17.279 0.053 0.055 4.797 1.946
X11 0.205 0.177 25.248 -1.349 0.235 0.203 2.007 1.114
X12 0.153 0.221 34.983 5.106 0.097 0.114 8.461 2.510
X13 0.054 0.059 10.346 2.708 0.052 0.051 4.309 1.932
X14 0.026 0.030 33.148 4.326 0.047 0.051 22.747 3.778
X15 1.125 0.817 15.380 3.052 1.029 0.677 17.960 3.180
X16 0.509 0.618 46.304 5.242 0.983 10.826 294.504 17.095
X17 0.148 0.512 15.698 -2.949 0.090 0.486 14.763 -2.705
X18 0.137 0.199 0.116 0.013 0.105 0.205 -0.168 -0.101
X19 0.465 0.152 -0.402 0.041 0.520 0.154 -0.245 -0.363
X20 25.449 117.424 155.687 11.367 10.834 49.928 125.751 10.618
X21 1.086 0.981 76.017 6.453 1.352 1.017 26.121 3.664
X22 0.711 0.619 12.002 2.660 0.828 0.652 4.395 1.803
X23 1.471 1.135 8.308 2.388 1.203 1.131 16.913 3.525
X24 0.535 0.152 -0.402 -0.041 0.480 0.154 -0.245 0.363
X25 0.846 0.171 1.186 -1.324 0.886 0.134 3.241 -1.675
X26 0.154 0.171 1.186 1.324 0.114 0.134 3.241 1.675
X27 85.237 219.937 236.343 13.392 248.966 885.226 236.786 14.621
X28 3.686 8.706 508.459 21.686 4.220 3.731 21.013 3.580
X29 4.820 6.794 74.136 6.899 3.215 4.390 57.606 6.417
X30 2.160 1.289 6.305 2.123 2.251 1.659 21.457 3.751
X31 0.805 0.213 -0.071 -0.298 0.797 0.222 0.808 -0.136
X32 16.668 44.184 43.533 6.006 5.144 9.383 30.065 4.830
X33 0.346 0.583 146.451 9.750 0.487 0.807 38.349 5.528
X34 12.585 34.457 165.356 11.898 15.367 69.005 182.688 12.653
X35 1.368 1.063 9.431 2.322 0.948 0.754 4.011 1.841
X36 4.359 13.160 139.500 10.788 2.932 4.492 24.549 4.521
X37 0.642 0.478 9.958 2.501 0.481 0.365 10.832 2.678
X38 1.460 1.895 64.123 6.655 1.212 1.472 33.752 5.090
X39 1.568 1.382 64.191 7.129 2.506 2.783 24.409 4.459
X40 2.541 2.723 83.724 7.469 2.600 2.051 48.427 5.173
APPENDIX D
DEFINITIONS OF LONG TERM CREDIT RATINGS FROM S&P
AAA An obligation rated ‘AAA’ has the highest rating assigned by Standard & Poor’s. The
obligor’s capacity to meet its financial commitment on the obligation is extremely
strong.
AA An obligation rated ‘AA’ differs from the highest-rated obligations only to a small
degree. The obligor’s capacity to meet its financial commitment on the obligation is
very strong.
A An obligation rated ‘A’ is somewhat more susceptible to the adverse effects of
changes in circumstances and economic conditions than obligations in higher rated
categories. However, the obligor’s capacity to meet its financial commitment on the
obligation is still strong.
BBB An obligation rated ‘BBB’ exhibits adequate protection parameters. However, adverse
economic conditions or changing circumstances are more likely to lead to a weakened
capacity of the obligor to meet its financial commitment on the obligation.
BB An obligation rated ‘BB’ is less vulnerable to nonpayment than other speculative
issues. However, it faces major ongoing uncertainties or exposure to adverse business,
financial, or economic conditions that could lead to the obligor’s inadequate capacity
to meet its financial commitment on the obligation.
B An obligation rated ‘B’ is more vulnerable to nonpayment than obligations rated ‘BB’,
but the obligor currently has the capacity to meet its financial commitment on the
obligation. Adverse business, financial, or economic conditions will likely impair the
obligor’s capacity or willingness to meet its financial commitment on the obligation.
CCC An obligation rated ‘CCC’ is currently vulnerable to nonpayment and is dependent on
favorable business, financial, and economic conditions for the obligor to meet its
financial commitment on the obligation. In the event of adverse business, financial, or
economic conditions, the obligor is not likely to have the capacity to meet its financial
commitment on the obligation.
CC An obligation rated ‘CC’ is currently highly vulnerable to nonpayment.
C The ‘C’ rating may be used when a bankruptcy petition has been filed or similar
action has been taken but payments on this obligation are being continued. ‘C’ is also
used for a preferred stock that is in arrears (as well as for junior debt of issuers rated
‘CCC-’ and ‘CC’).
D/SD The ‘D’ rating, unlike other ratings, is not prospective; rather, it is used only when a
default has actually occurred—and not when a default is only expected.
The ‘SD’ (selective default) rating is assigned when an issuer can be expected to default
selectively, that is, continue to pay certain issues or classes of obligations while not
paying others.
Note: The ratings from ‘AA’ to ‘CCC’ may be modified by the addition of a plus (+) or minus (-) sign.
APPENDIX E
COMPLETE SELECTED SUBSETS AND OCAR FOR THE U.S. DATASET
Features # of Features OCAR
24 1 72.3906%
12, 18 2 75.0842%
13, 19 2 74.7475%
13, 19, 39 3 77.1044%
7, 21, 24 3 74.7475%
13, 19, 21, 39 4 75.7576%
4, 7, 21, 24 4 76.7677%
4, 7, 10, 24 4 74.4108%
7, 24, 26, 38 4 74.0741%
7, 24, 26, 32, 38 5 70.3704%
5, 7, 10, 21, 24 5 76.0943%
5, 7, 10, 21, 24, 32 6 72.0539%
5, 8, 10, 21, 24, 29 6 75.7576%
5, 8, 10, 21, 22, 24 6 74.7475%
5, 8, 10, 21, 22, 24, 32 7 72.0539%
5, 8, 10, 21, 22, 24, 32, 40 8 69.0236%
5, 8, 10, 13, 21, 22, 24, 32 8 72.0539%
5, 8, 10, 13, 21, 22, 24, 32, 40 9 69.0236%
5, 8, 10, 13, 19, 21, 24, 30, 32 9 72.0539%
5, 8, 10, 13, 19, 21, 22, 24, 32 9 72.0539%
5, 8, 10, 13, 19, 21, 22, 24, 32, 40 10 69.0236%
1, 5, 8, 10, 13, 19, 21, 22, 24, 32, 40 11 69.0236%
1, 5, 8, 10, 19, 21, 22, 24, 26, 32, 40 11 69.0236%
1, 5, 7, 8, 10, 19, 21, 22, 24, 26, 32, 40 12 69.0236%
1, 5, 7, 8, 10, 19, 21, 22, 24, 26, 32, 39, 40 13 69.6970%
1, 5, 7, 8, 10, 19, 21, 22, 24, 26, 31, 32, 39, 40 14 69.6970%
1, 5, 7, 8, 10, 13, 19, 21, 22, 24, 26, 31, 32, 39, 40 15 69.6970%
1, 5, 7, 8, 10, 12, 19, 21, 22, 24, 26, 31, 32, 39, 40 15 69.6970%
1, 5, 7, 8, 9, 10, 13, 19, 21, 22, 24, 26, 31, 32, 39, 40 16 69.6970%
1, 5, 7, 8, 9, 10, 12, 19, 21, 22, 24, 26, 31, 32, 39, 40 16 69.6970%
1, 5, 7, 8, 9, 10, 12, 17, 19, 21, 22, 24, 26, 31, 32, 39, 40 17 70.0337%
1, 5, 7, 8, 9, 10, 12, 18, 19, 21, 22, 24, 26, 29, 31, 32, 39, 40 18 71.3805%
1, 5, 7, 8, 9, 10, 12, 18, 19, 21, 22, 24, 26, 29, 31, 32, 38, 39, 40 19 70.3704%
1, 5, 7, 8, 9, 10, 12, 13, 18, 19, 21, 22, 24, 26, 29, 31, 32, 38, 39, 40 20 70.3704%
1, 5, 7, 8, 9, 10, 12, 13, 17, 18, 19, 21, 22, 24, 26, 29, 31, 32, 38, 39, 40 21 70.7071%
1, 5, 7, 8, 9, 10, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 31, 32, 38, 39, 40 22 70.3704%
1, 5, 6, 7, 8, 9, 10, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 31, 32, 38, 39, 40 23 70.3704%
1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 31, 32, 38, 39, 40 24 70.0337%
1, 5, 6, 7, 8, 9, 10, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 30, 31, 32, 38, 39, 40 24 70.7071%
1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 30, 31, 32, 38, 39, 40 25 70.0337%
1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 26, 29, 30, 31, 32, 38, 39, 40 26 70.0337%
1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 26, 29, 30, 31, 32, 38, 39, 40 27 70.7071%
1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 38, 39, 40 28 70.7071%
1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 38, 39, 40 29 70.3704%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 38, 39, 40 30 70.3704%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 37, 38, 39, 40 31 70.3704%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 37, 38, 39, 40 32 70.3704%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 37, 38, 39, 40 33 70.3704%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 36, 37, 38, 39, 40 34 70.7071%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40 35 70.7071%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40 36 70.0337%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40 37 70.0337%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 38 70.7071%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 39 70.3704%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 40 69.3603%
Note: The numbers in this table are the subscripts of the 40 features, X1 to X40.
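One way to read the table is to look for the subset with the highest OCAR, breaking ties in favour of fewer features. The sketch below does this for a handful of rows copied from the table. The tie-breaking rule and the helper name best_subset are assumptions made for illustration, not a selection rule stated in the appendix.

```python
# A few (feature subscripts, OCAR in percent) rows copied from the table above.
rows = [
    ((24,), 72.3906),
    ((12, 18), 75.0842),
    ((13, 19), 74.7475),
    ((13, 19, 39), 77.1044),
    ((4, 7, 21, 24), 76.7677),
    ((5, 7, 10, 21, 24), 76.0943),
]


def best_subset(candidates):
    """Pick the highest OCAR; among equal OCARs prefer the smaller subset."""
    return max(candidates, key=lambda row: (row[1], -len(row[0])))


features, ocar = best_subset(rows)
print(features, ocar)
```

Extending rows to the full table only requires appending the remaining lines in the same (subscripts, OCAR) form.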
APPENDIX F
COMPLETE SELECTED SUBSETS AND OCAR FOR CHINESE DATASET
Features # of Features OCAR
9 1 71.5556%
9, 36 2 66.7778%
1, 17, 37 3 68.6667%
24, 27, 32 3 69.2222%
19, 27, 32 3 69.2222%
1, 37, 39 3 73.1111%
8, 14, 26 3 72.6667%
8, 14, 25 3 72.6667%
1, 17, 27, 37 4 71.4444%
10, 14, 26, 27 4 71.1111%
10, 14, 25, 27 4 71.1111%
8, 14, 25, 27 4 71.1111%
8, 14, 26, 27 4 71.1111%
10, 14, 16, 26, 27 5 71.2222%
10, 14, 16, 25, 27 5 71.2222%
8, 14, 16, 25, 27 5 71.2222%
8, 14, 16, 26, 27 5 71.2222%
2, 10, 14, 25, 27 5 71.1111%
2, 10, 14, 26, 27 5 71.1111%
8, 14, 21, 25, 27 5 71.1111%
8, 14, 21, 26, 27 5 71.1111%
2, 10, 14, 16, 25, 27 6 71.5556%
2, 10, 14, 16, 26, 27 6 71.5556%
5, 10, 14, 24, 25, 27 6 71.6667%
5, 10, 14, 19, 26, 27 6 71.6667%
5, 10, 14, 24, 26, 27 6 71.6667%
5, 10, 14, 19, 25, 27 6 71.6667%
8, 14, 21, 25, 27, 32 6 72.0000%
8, 14, 21, 26, 27, 32 6 72.0000%
8, 14, 16, 21, 26, 27, 32 7 72.0000%
8, 14, 16, 21, 25, 27, 32 7 72.0000%
5, 10, 14, 16, 24, 25, 27, 32 8 70.4444%
5, 10, 14, 16, 24, 26, 27, 32 8 70.4444%
5, 10, 14, 16, 19, 26, 27, 32 8 70.4444%
5, 10, 14, 16, 19, 25, 27, 32 8 70.4444%
5, 8, 14, 16, 19, 26, 27, 32 8 70.4444%
5, 8, 14, 16, 19, 25, 27, 32 8 70.4444%
5, 8, 14, 16, 24, 25, 27, 32 8 70.4444%
5, 8, 14, 16, 24, 26, 27, 32 8 70.4444%
5, 10, 14, 16, 19, 26, 27, 32, 39 9 69.3333%
5, 10, 14, 16, 24, 25, 27, 32, 39 9 69.3333%
5, 10, 14, 16, 24, 26, 27, 32, 39 9 69.3333%
5, 10, 14, 16, 19, 25, 27, 32, 39 9 69.3333%
5, 9, 10, 14, 16, 24, 26, 27, 32 9 70.5556%
5, 9, 10, 14, 16, 24, 25, 27, 32 9 70.5556%
5, 9, 10, 14, 16, 19, 25, 27, 32 9 70.5556%
5, 9, 10, 14, 16, 19, 26, 27, 32 9 70.5556%
5, 8, 14, 16, 19, 26, 27, 32, 39 9 69.3333%
5, 8, 14, 16, 19, 25, 27, 32, 39 9 69.3333%
5, 8, 14, 16, 24, 26, 27, 32, 39 9 69.3333%
5, 8, 14, 16, 24, 25, 27, 32, 39 9 69.3333%
5, 9, 10, 14, 16, 19, 25, 27, 32, 39 10 69.3333%
5, 9, 10, 14, 16, 24, 25, 27, 32, 39 10 69.3333%
5, 9, 10, 14, 16, 24, 26, 27, 32, 39 10 69.3333%
5, 9, 10, 14, 16, 19, 26, 27, 32, 39 10 69.3333%
5, 9, 10, 14, 16, 24, 25, 27, 32, 35, 39 11 69.1111%
5, 9, 10, 14, 16, 24, 26, 27, 32, 35, 39 11 69.1111%
5, 9, 10, 14, 16, 19, 25, 27, 32, 35, 39 11 69.1111%
5, 9, 10, 14, 16, 19, 26, 27, 32, 35, 39 11 69.1111%
2, 5, 9, 10, 14, 16, 24, 26, 27, 32, 37, 39 12 69.5556%
2, 5, 9, 10, 14, 16, 24, 25, 27, 32, 37, 39 12 69.5556%
2, 5, 9, 10, 14, 16, 19, 25, 27, 32, 37, 39 12 69.5556%
2, 5, 9, 10, 14, 16, 19, 26, 27, 32, 37, 39 12 69.5556%
1, 5, 9, 10, 14, 16, 24, 25, 27, 32, 37, 39 12 70.1111%
1, 5, 9, 10, 14, 16, 24, 26, 27, 32, 37, 39 12 70.1111%
1, 5, 9, 10, 14, 16, 19, 25, 27, 32, 37, 39 12 70.1111%
1, 5, 9, 10, 14, 16, 19, 26, 27, 32, 37, 39 12 70.1111%
1, 5, 9, 10, 11, 14, 16, 24, 25, 27, 32, 37, 39 13 70.3333%
1, 5, 9, 10, 11, 14, 16, 24, 26, 27, 32, 37, 39 13 70.3333%
1, 5, 9, 10, 11, 14, 16, 19, 26, 27, 32, 37, 39 13 70.3333%
1, 5, 9, 10, 11, 14, 16, 19, 25, 27, 32, 37, 39 13 70.3333%
1, 5, 8, 9, 10, 14, 16, 19, 26, 27, 32, 37, 39 13 70.2222%
1, 5, 8, 9, 10, 14, 16, 24, 26, 27, 32, 37, 39 13 70.2222%
1, 5, 8, 9, 10, 14, 16, 24, 25, 27, 32, 37, 39 13 70.2222%
1, 5, 8, 9, 10, 14, 16, 19, 25, 27, 32, 37, 39 13 70.2222%
1, 5, 8, 9, 10, 11, 14, 16, 19, 25, 27, 32, 37, 39 14 70.4444%
1, 5, 8, 9, 10, 11, 14, 16, 19, 26, 27, 32, 37, 39 14 70.4444%
1, 5, 8, 9, 10, 11, 14, 16, 24, 25, 27, 32, 37, 39 14 70.4444%
1, 5, 8, 9, 10, 11, 14, 16, 24, 26, 27, 32, 37, 39 14 70.4444%
2, 5, 8, 9, 10, 11, 14, 16, 24, 26, 27, 29, 32, 35, 39 15 69.1111%
2, 5, 8, 9, 10, 11, 14, 16, 19, 26, 27, 29, 32, 35, 39 15 69.1111%
2, 5, 8, 9, 10, 11, 14, 16, 24, 25, 27, 29, 32, 35, 39 15 69.1111%
2, 5, 8, 9, 10, 11, 14, 16, 19, 25, 27, 29, 32, 35, 39 15 69.1111%
2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 26, 27, 29, 32, 35, 39 16 70.2222%
2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 27, 29, 32, 35, 39 16 70.2222%
1, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 27, 29, 32, 35, 39 16 70.2222%
2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 26, 27, 29, 32, 35, 39 16 70.2222%
1, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 27, 29, 32, 35, 39 16 70.2222%
1, 5, 8, 9, 10, 11, 14, 16, 20, 24, 26, 27, 29, 32, 35, 39 16 70.2222%
2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 27, 29, 32, 35, 39 16 70.2222%
1, 5, 8, 9, 10, 11, 14, 16, 19, 20, 26, 27, 29, 32, 35, 39 16 70.2222%
2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 26, 27, 29, 32, 35, 39 17 70.3333%
2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 26, 27, 29, 32, 35, 39 17 70.3333%
2, 5, 8, 9, 10, 11, 14, 16, 19, 24, 25, 26, 27, 29, 32, 35, 39 17 69.8889%
2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 26, 27, 29, 32, 33, 35, 39 18 70.2222%
2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 26, 27, 29, 32, 33, 35, 39 18 70.2222%
1, 2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 26, 27, 29, 32, 33, 35, 39 18 70.3333%
2, 5, 8, 9, 10, 11, 14, 16, 19, 24, 25, 26, 27, 29, 32, 33, 35, 39 18 70.0000%
1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 27, 29, 32, 33, 35, 39 18 70.3333%
1, 2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 27, 29, 32, 35, 37, 39 18 70.3333%
1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 26, 27, 29, 32, 35, 37, 39 18 70.3333%
2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 24, 25, 26, 27, 32, 33, 35, 37, 39 19 69.7778%
1, 2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 26, 27, 29, 32, 35, 37, 39 19 70.4444%
1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 26, 27, 29, 32, 35, 37, 39 19 70.4444%
1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 24, 25, 26, 27, 32, 33, 35, 37, 39 20 69.7778%
1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 24, 25, 26, 27, 29, 32, 33, 35, 37, 39 21 70.3333%
1, 2, 5, 8, 9, 10, 11, 12, 14, 16, 19, 20, 24, 25, 26, 27, 29, 32, 33, 35, 37, 39 22 70.3333%
1, 2, 5, 8, 9, 10, 11, 12, 14, 16, 19, 20, 21, 24, 25, 26, 27, 29, 32, 33, 35, 37, 39 23 70.5556%
1, 2, 5, 8, 9, 10, 11, 12, 14, 16, 19, 20, 21, 23, 24, 25, 26, 27, 29, 32, 33, 35, 37, 39 24 70.5556%
1, 2, 5, 8, 9, 10, 11, 12, 14, 16, 19, 20, 21, 23, 24, 25, 26, 27, 29, 32, 33, 35, 36, 37, 39 25 70.8889%
1, 2, 5, 6, 8, 9, 10, 11, 12, 14, 16, 19, 20, 21, 23, 24, 25, 26, 27, 29, 32, 33, 35, 36, 37, 39 26 70.8889%
1, 2, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27, 29, 32, 33, 35, 36, 37, 39 27 71.0000%
1, 2, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37, 39 28 71.4444%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37, 39 29 71.4444%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 30 70.1111%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37, 39 30 71.4444%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 31 70.3333%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 31 70.5556%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 32 70.5556%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 33 70.4444%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 34 70.5556%
1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39 35 69.0000%
1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39 36 69.0000%
1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39 37 68.8889%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39 38 69.0000%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 39 69.0000%
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 40 69.7778%
Note: The numbers in this table are the subscripts of the 40 features, X1 to X40.
APPENDIX G
SENSITIVITY AND 1-SPECIFICITY FOR THE U.S. DATASET
SVM Logistic Regression Discriminant Analysis Neural Networks Decision Tree
Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe.
.0000 1.000 1.000 .0000 1.000 1.000 .0000 1.000 1.000 .0000 1.000 1.000 .0000 1.000 1.000
.0787 1.000 .971 .0543 1.000 .971 .0568 1.000 .971 .1598 1.000 .971 .4395 .316 .118
.0853 1.000 .941 .0660 1.000 .941 .0699 1.000 .941 .1646 1.000 .941 1.0000 .000 .000
.0918 1.000 .912 .0781 1.000 .912 .0840 1.000 .912 .1691 1.000 .912
.0994 1.000 .882 .0908 1.000 .882 .0969 1.000 .882 .1739 1.000 .882
.1126 1.000 .853 .0971 1.000 .853 .1042 1.000 .853 .1758 1.000 .853
.1231 1.000 .824 .1094 1.000 .824 .1136 1.000 .824 .1851 1.000 .824
.1266 1.000 .794 .1224 1.000 .794 .1236 1.000 .794 .1934 1.000 .794
.1305 1.000 .765 .1260 1.000 .765 .1281 1.000 .765 .1950 1.000 .765
.1359 1.000 .735 .1280 1.000 .735 .1294 1.000 .735 .1978 1.000 .735
.1437 1.000 .706 .1298 1.000 .706 .1309 1.000 .706 .1995 1.000 .706
.1488 1.000 .676 .1306 1.000 .676 .1319 1.000 .676 .1998 .947 .706
.1506 1.000 .647 .1330 1.000 .647 .1350 1.000 .647 .2002 .947 .676
.1530 1.000 .618 .1428 .947 .647 .1447 .947 .647 .2038 .947 .647
.1583 .947 .618 .1540 .947 .618 .1580 .947 .618 .2092 .895 .647
.1624 .895 .618 .1691 .895 .618 .1699 .895 .618 .2186 .895 .618
.1645 .895 .588 .1817 .895 .588 .1781 .842 .618 .2262 .895 .588
.1696 .842 .588 .1861 .895 .559 .1824 .842 .588 .2279 .895 .559
.1748 .842 .559 .1923 .842 .559 .1885 .842 .559 .2297 .842 .559
.1834 .842 .529 .1953 .789 .559 .1963 .842 .529 .2381 .842 .529
.1928 .842 .500 .1988 .789 .529 .1998 .789 .529 .2468 .789 .529
.1970 .842 .471 .2057 .789 .500 .2010 .789 .500 .2480 .789 .500
.1998 .842 .441 .2110 .789 .471 .2023 .789 .471 .2492 .737 .500
.2010 .842 .412 .2156 .737 .471 .2028 .789 .441 .2517 .737 .471
.2013 .789 .412 .2197 .737 .441 .2134 .737 .441 .2566 .684 .471
.2062 .789 .382 .2282 .684 .441 .2289 .684 .441 .2651 .684 .441
.2143 .789 .353 .2436 .684 .412 .2402 .684 .412 .2712 .684 .412
.2202 .789 .324 .2533 .684 .382 .2465 .684 .382 .2752 .684 .382
.2229 .789 .294 .2559 .684 .353 .2472 .684 .353 .2813 .684 .353
.2302 .789 .265 .2644 .684 .324 .2555 .684 .324 .2881 .684 .324
.2454 .737 .265 .2748 .684 .294 .2649 .684 .294 .2923 .684 .294
.2547 .737 .235 .2836 .684 .265 .2738 .684 .265 .2954 .684 .265
.2564 .684 .235 .2925 .684 .235 .2819 .684 .235 .2992 .684 .235
.2609 .684 .206 .2963 .632 .235 .2856 .632 .235 .3024 .632 .235
.2676 .684 .176 .3015 .579 .235 .2893 .632 .206 .3166 .579 .235
.2841 .684 .147 .3072 .579 .206 .2907 .579 .206 .3291 .526 .235
.3062 .684 .118 .3149 .526 .206 .3005 .526 .206 .3345 .526 .206
.3187 .632 .118 .3290 .474 .206 .3150 .474 .206 .3397 .474 .206
.3299 .579 .118 .3378 .474 .176 .3228 .421 .206 .3434 .474 .176
.3446 .579 .088 .3407 .474 .147 .3256 .368 .206 .3487 .474 .147
.3571 .579 .059 .3443 .421 .147 .3287 .368 .176 .3590 .421 .147
.3765 .526 .059 .3571 .368 .147 .3315 .368 .147 .3763 .368 .147
.4270 .474 .059 .3714 .368 .118 .3416 .368 .118 .3922 .316 .147
.4815 .421 .059 .3793 .316 .118 .3531 .316 .118 .4011 .316 .118
.5037 .368 .059 .3957 .316 .088 .3721 .316 .088 .4041 .316 .088
.5081 .316 .059 .4156 .263 .088 .3973 .263 .088 .4107 .263 .088
.5117 .263 .059 .4332 .263 .059 .4146 .263 .059 .4265 .263 .059
.5229 .211 .059 .4861 .211 .059 .4481 .211 .059 .5103 .211 .059
.6365 .158 .059 .5768 .158 .059 .5271 .158 .059 .6279 .158 .059
.7626 .105 .059 .6329 .105 .059 .5930 .105 .059 .6783 .105 .059
.7941 .053 .059 .7063 .053 .059 .6713 .053 .059 .7575 .053 .059
.8269 .053 .029 .7967 .053 .029 .7565 .053 .029 .8491 .053 .029
.8815 .000 .029 .8550 .053 .000 .8120 .053 .000 .8744 .053 .000
1.0000 .000 .000 1.0000 .000 .000 1.0000 .000 .000 1.0000 .000 .000
Note: Sen. refers to Sensitivity; Spe. refers to Specificity
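Each (1-Spe., Sen.) pair in the table is one point on a classifier's ROC curve. As a rough sketch of how such points translate into an area under the curve, the Python snippet below applies the trapezoidal rule to the three Decision Tree points listed above. The function name roc_auc is hypothetical, and the trapezoidal rule is an assumption; the software used to produce the table may have computed the area differently.

```python
def roc_auc(points):
    """Trapezoidal area under ROC points given as (1 - specificity, sensitivity)."""
    ordered = sorted(points)  # order by 1 - specificity, then sensitivity
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Decision Tree rows of the table: cutoffs .0000, .4395, and 1.0000.
decision_tree_us = [(1.000, 1.000), (0.118, 0.316), (0.000, 0.000)]
auc = roc_auc(decision_tree_us)
print(round(auc, 3))
```

The same function applies to the other four classifiers once their column pairs are entered as (1-Spe., Sen.) tuples.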
APPENDIX H
SENSITIVITY AND 1-SPECIFICITY FOR CHINESE DATASET
SVM Logistic Regression Discriminant Analysis Neural Networks Decision Tree
Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe.
0.0000 1.000 1.000 0.0000 1.000 1.000 0.0000 1.000 1.000 0.0000 1.000 1.000 0.0000 1.000 1.000
0.1885 .975 1.000 0.0605 1.000 .986 0.0855 1.000 .986 0.0670 1.000 .986 0.1748 .900 .521
0.1950 .975 .986 0.0647 1.000 .973 0.0901 1.000 .973 0.0678 1.000 .973 0.2598 .825 .438
0.1967 .975 .973 0.0729 .975 .973 0.0990 .975 .973 0.0699 .975 .973 0.3761 .775 .329
0.1993 .950 .973 0.0794 .975 .959 0.1058 .975 .959 0.0719 .975 .959 0.5039 .675 .288
0.2010 .950 .959 0.0988 .975 .945 0.1248 .975 .945 0.0778 .975 .945 1.0000 0.000 0.000
0.2032 .950 .945 0.1277 .975 .932 0.1549 .975 .932 0.0911 .975 .932
0.2081 .950 .932 0.1449 .950 .932 0.1709 .950 .932 0.0999 .975 .918
0.2136 .950 .918 0.1516 .950 .918 0.1761 .950 .918 0.1010 .975 .904
0.2161 .950 .904 0.1528 .950 .904 0.1774 .950 .904 0.1039 .975 .890
0.2166 .950 .890 0.1540 .950 .890 0.1783 .950 .890 0.1068 .975 .877
0.2184 .950 .877 0.1570 .950 .877 0.1809 .950 .877 0.1086 .975 .863
0.2208 .950 .863 0.1609 .950 .863 0.1832 .950 .863 0.1106 .975 .849
0.2219 .950 .849 0.1620 .950 .849 0.1846 .950 .849 0.1204 .975 .836
0.2226 .950 .836 0.1677 .950 .836 0.1896 .950 .836 0.1305 .975 .822
0.2241 .950 .822 0.1741 .950 .822 0.1951 .950 .822 0.1344 .975 .808
0.2252 .950 .808 0.1815 .950 .808 0.2032 .950 .808 0.1429 .975 .795
0.2255 .950 .795 0.1893 .925 .808 0.2124 .950 .795 0.1501 .975 .781
0.2257 .950 .781 0.1993 .925 .795 0.2187 .925 .795 0.1522 .975 .767
0.2261 .950 .767 0.2099 .925 .781 0.2249 .925 .781 0.1530 .975 .753
0.2265 .950 .753 0.2118 .925 .767 0.2274 .925 .767 0.1540 .975 .740
0.2269 .950 .740 0.2132 .925 .753 0.2280 .925 .753 0.1566 .975 .726
0.2275 .950 .726 0.2160 .925 .740 0.2298 .925 .740 0.1592 .975 .712
0.2280 .950 .712 0.2194 .925 .726 0.2329 .925 .726 0.1604 .975 .699
0.2284 .925 .712 0.2216 .925 .712 0.2354 .925 .712 0.1655 .950 .699
0.2294 .925 .699 0.2249 .925 .699 0.2382 .925 .699 0.1714 .950 .685
0.2301 .925 .685 0.2346 .925 .685 0.2454 .925 .685 0.1754 .950 .671
0.2303 .925 .671 0.2422 .925 .671 0.2521 .925 .671 0.1795 .950 .658
0.2305 .925 .658 0.2441 .925 .658 0.2536 .925 .658 0.1815 .950 .644
0.2309 .925 .644 0.2453 .925 .644 0.2561 .925 .644 0.1856 .950 .630
0.2315 .925 .630 0.2471 .925 .630 0.2598 .925 .630 0.2000 .950 .616
0.2320 .900 .630 0.2498 .925 .616 0.2623 .925 .616 0.2128 .950 .603
0.2326 .900 .616 0.2532 .925 .603 0.2639 .925 .603 0.2154 .950 .589
0.2331 .900 .603 0.2562 .900 .603 0.2654 .925 .589 0.2219 .950 .575
0.2335 .900 .589 0.2596 .900 .589 0.2680 .900 .589 0.2293 .950 .562
0.2337 .900 .575 0.2631 .875 .589 0.2717 .875 .589 0.2319 .950 .548
0.2340 .900 .562 0.2662 .875 .575 0.2749 .875 .575 0.2335 .950 .534
0.2347 .875 .562 0.2701 .875 .562 0.2782 .875 .562 0.2410 .925 .534
0.2356 .875 .548 0.2751 .875 .548 0.2812 .875 .548 0.2488 .925 .521
0.2363 .875 .534 0.2804 .875 .534 0.2837 .875 .534 0.2516 .900 .521
0.2367 .875 .521 0.2843 .875 .521 0.2859 .875 .521 0.2573 .900 .507
0.2367 .875 .507 0.2865 .875 .507 0.2866 .875 .507 0.2642 .900 .493
0.2368 .850 .507 0.2874 .875 .493 0.2899 .875 .493 0.2699 .875 .493
0.2369 .850 .493 0.2884 .875 .479 0.2943 .875 .479 0.2797 .875 .479
0.2382 .850 .479 0.2916 .850 .479 0.2971 .875 .466 0.2909 .850 .479
0.2396 .850 .466 0.2954 .850 .466 0.2996 .875 .452 0.3044 .850 .466
0.2397 .850 .452 0.3040 .850 .452 0.3040 .850 .452 0.3155 .850 .452
0.2399 .850 .438 0.3133 .825 .452 0.3083 .825 .452 0.3187 .850 .438
0.2401 .825 .438 0.3181 .825 .438 0.3121 .825 .438 0.3231 .850 .425
0.2410 .825 .425 0.3216 .825 .425 0.3157 .825 .425 0.3332 .850 .411
0.2421 .825 .411 0.3222 .825 .411 0.3167 .825 .411 0.3450 .850 .397
0.2429 .825 .397 0.3234 .800 .411 0.3189 .825 .397 0.3498 .850 .384
0.2435 .825 .384 0.3263 .800 .397 0.3221 .825 .384 0.3513 .850 .370
0.2443 .825 .370 0.3285 .800 .384 0.3236 .800 .384 0.3551 .825 .370
0.2480 .825 .356 0.3296 .800 .370 0.3242 .800 .370 0.3599 .800 .370
0.2514 .825 .342 0.3326 .800 .356 0.3260 .800 .356 0.3658 .800 .356
0.2518 .825 .329 0.3371 .800 .342 0.3278 .775 .356 0.3706 .800 .342
0.2520 .825 .315 0.3430 .775 .342 0.3340 .775 .342 0.3725 .800 .329
0.2536 .800 .315 0.3489 .775 .329 0.3402 .750 .342 0.3786 .800 .315
0.2577 .775 .315 0.3525 .750 .329 0.3409 .750 .329 0.3835 .800 .301
0.2606 .775 .301 0.3543 .750 .315 0.3422 .750 .315 0.3899 .800 .288
0.2622 .750 .301 0.3571 .750 .301 0.3451 .750 .301 0.3963 .800 .274
0.2652 .750 .288 0.3594 .750 .288 0.3474 .750 .288 0.3980 .775 .274
0.2678 .750 .274 0.3602 .750 .274 0.3490 .750 .274 0.4018 .750 .274
0.2694 .725 .274 0.3611 .750 .260 0.3517 .725 .274 0.4045 .750 .260
0.2707 .700 .274 0.3646 .725 .260 0.3545 .725 .260 0.4061 .750 .247
0.2725 .675 .274 0.3683 .725 .247 0.3561 .725 .247 0.4084 .725 .247
0.2748 .675 .260 0.3700 .700 .247 0.3574 .700 .247 0.4128 .700 .247
0.2767 .675 .247 0.3721 .700 .233 0.3585 .675 .247 0.4207 .675 .247
0.2788 .650 .247 0.3732 .675 .233 0.3589 .675 .233 0.4256 .650 .247
0.2814 .625 .247 0.3738 .650 .233 0.3602 .650 .233 0.4258 .650 .233
0.2830 .625 .233 0.3757 .625 .233 0.3625 .625 .233 0.4258 .625 .233
0.2905 .625 .219 0.3779 .625 .219 0.3646 .625 .219 0.4258 .625 .219
0.3033 .625 .205 0.3783 .600 .219 0.3664 .625 .205 0.4259 .525 .205
0.3140 .600 .205 0.3803 .600 .205 0.3677 .625 .192 0.4261 .525 .192
0.3196 .600 .192 0.3833 .600 .192 0.3678 .600 .192 0.4278 .500 .192
0.3396 .575 .192 0.3856 .600 .178 0.3681 .575 .192 0.4295 .475 .192
0.3601 .575 .178 0.3872 .575 .178 0.3732 .550 .192 0.4408 .450 .192
0.3613 .550 .178 0.3920 .550 .178 0.3804 .525 .192 0.4525 .425 .192
0.3637 .550 .164 0.3968 .525 .178 0.3835 .525 .178 0.4579 .425 .178
0.3677 .550 .151 0.3991 .500 .178 0.3843 .500 .178 0.4637 .400 .178
0.3704 .525 .151 0.4031 .475 .178 0.3845 .475 .178 0.4682 .375 .178
0.3722 .500 .151 0.4061 .475 .164 0.3849 .475 .164 0.4744 .350 .178
0.3802 .500 .137 0.4190 .475 .151 0.3988 .475 .151 0.4821 .350 .164
0.3873 .475 .137 0.4322 .450 .151 0.4165 .450 .151 0.4905 .350 .151
0.3918 .450 .137 0.4375 .425 .151 0.4214 .425 .151 0.4949 .325 .151
0.3998 .425 .137 0.4419 .400 .151 0.4220 .425 .137 0.4997 .325 .137
0.4077 .400 .137 0.4448 .400 .137 0.4220 .425 .123 0.5074 .325 .123
0.4277 .400 .123 0.4485 .400 .123 0.4269 .400 .123 0.5127 .325 .110
0.4550 .375 .123 0.4582 .400 .110 0.4349 .375 .123 0.5159 .325 .096
0.4701 .375 .110 0.4715 .350 .110 0.4422 .350 .123 0.5179 .300 .096
0.4993 .375 .096 0.4791 .325 .110 0.4465 .350 .110 0.5185 .300 .082
0.5317 .375 .082 0.4836 .300 .110 0.4478 .325 .110 0.5220 .275 .082
0.5394 .350 .082 0.4936 .275 .110 0.4503 .300 .110 0.5275 .250 .082
0.5467 .325 .082 0.5048 .275 .096 0.4671 .275 .110 0.5389 .225 .082
0.5721 .300 .082 0.5083 .250 .096 0.4845 .275 .096 0.5558 .225 .068
0.5925 .275 .082 0.5165 .225 .096 0.4875 .275 .082 0.5687 .200 .068
0.5948 .275 .068 0.5306 .225 .082 0.4893 .250 .082 0.5743 .175 .068
0.5959 .250 .068 0.5432 .225 .068 0.4949 .250 .068 0.5769 .150 .068
0.6020 .250 .055 0.5618 .200 .068 0.5071 .225 .068 0.5805 .150 .055
0.6138 .225 .055 0.5760 .175 .068 0.5160 .225 .055 0.5828 .150 .041
0.6215 .200 .055 0.5853 .175 .055 0.5329 .200 .055 0.5866 .125 .041
0.6246 .175 .055 0.6314 .150 .055 0.5571 .175 .055 0.5989 .100 .041
0.6381 .150 .055 0.6762 .150 .041 0.6007 .150 .055 0.6130 .075 .041
0.6559 .125 .055 0.7051 .150 .027 0.6530 .150 .041 0.6198 .075 .027
0.6618 .100 .055 0.8006 .125 .027 0.6806 .150 .027 0.6225 .050 .027
0.6730 .100 .041 0.8985 .100 .027 0.7677 .125 .027 0.6277 .025 .027
0.6856 .100 .027 0.9441 .100 .014 0.8741 .100 .027 0.6390 0.000 .027
0.6919 .100 .014 0.9745 .075 .014 0.9277 .100 .014 0.6502 0.000 .014
0.7094 .075 .014 0.9892 .075 0.000 0.9645 .075 .014 1.0000 0.000 0.000
0.7234 .075 0.000 0.9958 .050 0.000 0.9838 .075 0.000 - - -
0.7378 .050 0.000 0.9984 .025 0.000 0.9928 .050 0.000 - - -
0.7712 .025 0.000 1.0000 0.000 0.000 0.9970 .025 0.000 - - -
1.0000 0.000 0.000 - - - 1.0000 0.000 0.000 - - -
Note: Sen. refers to Sensitivity; Spe. refers to Specificity. A dash marks a series that has already reached its final cutoff of 1.0000.
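The coordinates tabulated above can be turned into summary statistics with a few lines of code. The following is a minimal sketch, not the procedure used in this dissertation: it takes six rows copied from the first column group of the table, integrates them with the trapezoidal rule, and picks the cutoff that maximizes sensitivity minus (1 - specificity), a quantity commonly called the Youden index. The names SAMPLE_ROWS, partial_roc_area, and best_youden_row are hypothetical. Because only a subset of rows is used, the area is a partial area over those rows and not the area under the full curve.

```python
from typing import List, Tuple

# One table row: (cutoff, sensitivity, 1 - specificity).
Row = Tuple[float, float, float]

# Six rows copied from the first column group of the table above.
# This is a small illustrative subset, not the complete curve.
SAMPLE_ROWS: List[Row] = [
    (0.2335, 0.900, 0.589),
    (0.2520, 0.825, 0.315),
    (0.2905, 0.625, 0.219),
    (0.4277, 0.400, 0.123),
    (0.6856, 0.100, 0.027),
    (1.0000, 0.000, 0.000),
]


def partial_roc_area(rows: List[Row]) -> float:
    """Trapezoidal area under the ROC points spanned by the given rows."""
    # Sort by false positive rate (1 - specificity), then by sensitivity.
    points = sorted((fpr, sen) for _cutoff, sen, fpr in rows)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def best_youden_row(rows: List[Row]) -> Row:
    """Row that maximizes sensitivity - (1 - specificity) among the given rows."""
    return max(rows, key=lambda row: row[1] - row[2])


if __name__ == "__main__":
    print("partial area over sample rows:", round(partial_roc_area(SAMPLE_ROWS), 6))
    print("cutoff with largest Youden index in sample:", best_youden_row(SAMPLE_ROWS)[0])
```

Passing every row of one classifier's series, from the point (0, 0) to the point (1, 1), would yield the full area under that classifier's curve under the same trapezoidal assumption.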
VITA
Name: Jun Huang
Address: Room 201, No. 46, Guanghanzhi Street, Guangzhou, China, 510224
Education: PhD in International Business, A.R. Sanchez, Jr. School of Business, Texas A&M International University, May 2014
MSc in International Management, Business School of Oxford Brookes University, January 2006
Bachelor of Management in Accounting of Foreign Affairs, School of Economics & Management, Guangdong University of Technology, July 2004